On 14 June 2011 15:26, Stefan Dösinger <stefandoesinger@gmx.at> wrote:
As far as I'm concerned you can just submit this. I was going to do this myself, looks like you got there first.
I still haven't gotten around to testing this on GeForce 7 GPUs. It's possible that the bug this was supposed to fix is still around.
Yes, but I think that by now GF7 GPUs are marginal enough that it's not worth keeping the code around for them. The Steam HW survey, for example, reports over 90% D3D10+ cards. Even if it does regress something, I think it makes more sense to tell people to either file a bug with NVIDIA or help improve the nouveau driver for that card.
Besides, it is probably not necessary for the other patches. The consideration was that we'd have to verify the filter each draw, but I don't think setting a texture as sampler and render target simultaneously is allowed in d3d.
I'm not so sure. E.g. the docs for the INTZ format say you can have an INTZ texture bound as both depth buffer and texture as long as depth writes are disabled. (This makes some sense, since in that case there aren't any read/write conflicts.)
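As a rough illustration of the condition described above: a depth buffer that is also bound as a texture only becomes a read/write conflict once depth writes are enabled. The sketch below uses hypothetical, simplified types and names (`sketch_surface`, `sketch_state`, `sketch_depth_feedback_conflict`); none of them are wined3d's, and the fixed count of 8 samplers is an assumption made only to keep the example small.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; these are not wined3d's real types. */
struct sketch_surface
{
    int id;
};

struct sketch_state
{
    const struct sketch_surface *depth_stencil; /* bound depth buffer, may be NULL */
    const struct sketch_surface *sampler[8];    /* bound textures, NULL when unused */
    bool depth_write_enabled;
};

/* Returns true only if the depth buffer is also bound to a sampler while
 * depth writes are enabled. With depth writes disabled, binding the same
 * surface in both places is treated as read-only and therefore harmless. */
static bool sketch_depth_feedback_conflict(const struct sketch_state *state)
{
    size_t i;

    assert(state != NULL);

    if (!state->depth_stencil || !state->depth_write_enabled)
        return false;

    for (i = 0; i < sizeof(state->sampler) / sizeof(state->sampler[0]); ++i)
    {
        if (state->sampler[i] == state->depth_stencil)
            return true;
    }

    return false;
}
```

A check like this would have to run whenever the depth buffer, the sampler bindings, or the depth write state change, which is the per-draw verification cost being discussed.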
@@ -1913,6 +1928,10 @@ void surface_set_texture_name(struct ...
- if (surface_is_framebuffer(surface))
- {
- IWineD3DDeviceImpl_MarkStateDirty(surface->resource.device, STATE_FRAMEBUFFER);
+ }
...
What are these for?
The texture name one is not needed, I've removed that from the patch already. The allocate_surface check is needed in case a ddraw app changes the pixelformat via SetSurfaceDesc.
I'm not sure that can actually happen. wined3d_surface_set_format() insists the format must be WINED3DFMT_UNKNOWN, so it can't be part of a working FBO entry before the format is changed. That probably also means clearing the allocation flags there is a bit silly.
You may also have to handle an active RT getting unloaded, though I'm not entirely sure if that's allowed or not.
It shouldn't be. RTs must be in the default pool, which can't be unloaded.
Even in ddraw?
I wonder if the speedup is mostly for load_location(), modify_location() or both though? Maybe we can improve those functions themselves.
It's caused by both; I think it's plain call overhead. I'll double check that though. It may also be hypersensitivity of the draw overhead test. 260->270 fps isn't a lot when you consider that native gets ~1100 fps. But right now I have to take what I can get.
Maybe surface_load_location() could do an initial location check a bit earlier. (And some of the code before the current check could also be removed when we get rid of texture == drawable.) For surface_modify_location(), the overlay code probably doesn't belong in there, we could do a similar early check if the flags already match, and maybe we should split it in two functions.