Hi,
Thanks for the comments, I've some replies inlined below. The bigger concern I have is that the patches don't improvement by a lot yet, about 1% on AMD GPUs and 1.5% to 2.5% on Nvidia GPUs(in real apps, my benchmarks are a different issue). Because of that my plan is to do more testing and experiment with things like separating the vdecl-streamsrc-vshader states et cetera and see how much impact this has.
All in all I am not yet convinced that my approach with lots of very targeted tests is the way to salvation, and I want to wait a bit before I start screwing up the code. Unfortunately it is still the best way I have.
I'm sending in some of the patches that make sense even without performance improvements. The MINFILTER stuff is a case like that, but I want to test it on pre-dx10 GPUs first.
On Tuesday 14 June 2011 14:20:12 Henri Verbeet wrote:
I only looked over about half of this, doesn't look too crazy. You do have trailing spaces in a couple of places though. Here are some
Ya, I spotted them myself a while later, and didn't look for style issues before sending them to wine-devel. Though maybe I should look for better tools to watch out for slopiness like this.
As far as I'm concerned you can just submit this. I was going to do this myself, looks like you got there first.
Still didn't get around to test this on geforce 7 GPUs. It's possible that the bug this was supposed to fix is still around.
Besides, it is probably not necessary for the other patches. The consideration was that we'd have to verify the filter each draw, but I don't think setting a texture as sampler and render target simultaneously is allowed in d3d.
@@ -1913,6 +1928,10 @@ void surface_set_texture_name(struct ...
- if (surface_is_framebuffer(surface))
- {
IWineD3DDeviceImpl_MarkStateDirty(surface->resource.device,
STATE_FRAMEBUFFER); + } ...
What are these for?
The texture name one is not needed, I've removed that from the patch already. The allocate_surface check is needed in case a ddraw app changes the pixelformat via SetSurfaceDesc.
You may also have to handle an active RT getting unloaded, though I'm not entirely sure if that's allowed or not.
It shouldn't be. RTs must be in the default pool, which can't be unloaded. Of course that doesn't stop our broken location management from screwing up, but that would be a separate issue.
Subject: [PATCH 09/14] wined3d: Make render target dirtification in draws cheaper
I've been thinking about getting rid of the texture == drawable stuff for FBOs, and instead just making texture <-> drawable loads a nop in load_location(). At the very least this would allow the check to always be against SFLAG_INDRAWABLE instead of surface->rt_location_flags, which IMHO looks more sane.
Yes, this sounds much better.
I wonder if the speedup is mostly for load_location(), modify_location() or both though? Maybe we can improve those functions themselves.
It's caused by both, I think it's plain call overhead. I'll double check that though. It may also be hyper-sensitivity of the draw overhead test. 260->270 fps isn't a lot when you consider that native gets ~1100 fps. But right now I have to take what I can get.