On 21 March 2011 21:56, Adam Martinson amartinson@codeweavers.com wrote:
Cuts CPU time in context_apply_fbo_state() in half.
This is meaningless. Which applications, and how much time of the total is spent in context_apply_fbo_state()? How does this translate into concrete thing like frame time? More importantly, why does this change come before the one for tracking FBO dirty state? Avoiding redundant FBO entry comparisons completely is likely to have a much more significant effect than making the comparisons themselves slightly cheaper in very specific applications.
On 03/22/2011 05:02 AM, Henri Verbeet wrote:
On 21 March 2011 21:56, Adam Martinsonamartinson@codeweavers.com wrote:
Cuts CPU time in context_apply_fbo_state() in half.
This is meaningless. Which applications, and how much time of the total is spent in context_apply_fbo_state()?
In the 3DMark06 batch size 8 test, as of Friday's tip, it was 5.5% of wined3d; 1.2% of total wine CPU time. Searching the FBO list is almost all of that. It's called every time drawPrimitive() is (assuming ORM_FBO), so that shouldn't be much of a shock.
How does this translate into concrete thing like frame time?
It's an improvement, but I can't give you very meaningful FPS numbers there for individual patches. In my testing raw FPS is less reliable than profile results; it's dependent on GPU time as well, which I can't profile. The range of normal FPS variability is larger than the effect of most individual patches, including this one.
More importantly, why does this change come before the one for tracking FBO dirty state?
Really? I sent both at the same time, and the order bothers you? Really??
Avoiding redundant FBO entry comparisons completely is likely to have a much more significant effect than making the comparisons themselves slightly cheaper in very specific applications.
Actually, it would take a much more specific FBO configuration for this to make no difference. The whole list would have to have the same number of render targets, and almost all of them full. memcmp() is expensive. Avoiding unnecessary comparisons is cheap. This makes both successful and unsuccessful FBO entry comparisons cheaper. The other patch helps too, that's why I submitted both.
On 22 March 2011 18:12, Adam Martinson amartinson@codeweavers.com wrote:
On 03/22/2011 05:02 AM, Henri Verbeet wrote:
This is meaningless. Which applications, and how much time of the total is spent in context_apply_fbo_state()?
In the 3DMark06 batch size 8 test, as of Friday's tip, it was 5.5% of wined3d; 1.2% of total wine CPU time. Searching the FBO list is almost all of that. It's called every time drawPrimitive() is (assuming ORM_FBO), so that shouldn't be much of a shock.
I would hope that significant changes to fundamental parts of wined3d would be backed up by a somewhat wider range of benchmark results.
How does this translate into concrete thing like frame time?
It's an improvement, but I can't give you very meaningful FPS numbers there for individual patches. In my testing raw FPS is less reliable than profile results; it's dependent on GPU time as well, which I can't profile. The range of normal FPS variability is larger than the effect of most individual patches, including this one.
GPU time matters. Freeing up CPU time just so you can spend more time waiting for the GPU to finish isn't all that useful.
More importantly, why does this change come before the one for tracking FBO dirty state?
Really? I sent both at the same time, and the order bothers you? Really??
Really. And I did even explain why. Note that while displays of disbelief rarely help with getting patches committed, solid arguments tend to do just fine.
Avoiding redundant FBO entry comparisons completely is likely to have a much more significant effect than making the comparisons themselves slightly cheaper in very specific applications.
Actually, it would take a much more specific FBO configuration for this to make no difference. The whole list would have to have the same number of render targets, and almost all of them full. memcmp() is expensive. Avoiding unnecessary comparisons is cheap. This makes both successful and unsuccessful FBO entry comparisons cheaper. The other patch helps too, that's why I submitted both.
I'm quite aware of the costs of memory accesses, thanks.
So, to be more explicit this time: - I understand just fine what the patch actually does. - I'm not opposed to this change in principle, but you will have to justify changes like these. - How much of a real difference does this patch make, after you avoid the redundant lookups? (I.e., after the second patch in this series.)
On 03/22/2011 01:33 PM, Henri Verbeet wrote:
How much of a real difference does this patch make, after you avoid the redundant lookups? (I.e., after the second patch in this series.)
In that case the additional reduction in context_apply_fbo_state() CPU time from the 1st patch is only about 10%.
On 23 March 2011 23:50, Adam Martinson amartinson@codeweavers.com wrote:
On 03/22/2011 01:33 PM, Henri Verbeet wrote:
How much of a real difference does this patch make, after you avoid the redundant lookups? (I.e., after the second patch in this series.)
In that case the additional reduction in context_apply_fbo_state() CPU time from the 1st patch is only about 10%.
That sounds more realistic. I think it makes sense to just focus on your 2/4 patch for the moment, and worry about the other patches later.