Mostly for what it's worth, I don't think there are many good reasons left these days for ddraw to access the wined3d front buffer directly. In particular, in ddraw_surface_update_frontbuffer() we should be able to just blit to the back buffer and then call wined3d_swapchain_present() with a swap interval of 0.
For my own edification, why have we historically rendered directly to the front buffer?
Originally, because it was the straightforward thing to do; we'd translate ddraw front buffer blits to X11/GL front buffer blits. There was no wined3d at that point, nor necessarily OpenGL. (XFree86-DGA was a thing, once upon a time.)
That model was largely translated as-is when ddraw started using wined3d. There's still a fast path in texture2d_blt() that implements blits from the back buffer to the front buffer as a call to wined3d_swapchain_present(). Then, at some point ddraw_surface_update_frontbuffer() was introduced. The reasons included fixing issues with drawing outside the swapchain window (if any) when using GL, retaining the contents of swapchain surfaces after Flip(), and fixing some performance issues with applications locking the front buffer, by keeping track of the affected rectangle. This effectively virtualised ddraw access to the front buffer, much as modern windowing systems tend to do; the performance advantages of directly blitting to the front buffer largely no longer exist today. The introduction of "AlwaysOffscreen" effectively removed direct access to the back buffer.
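To illustrate the "keeping track of the affected rectangle" part, here is a minimal sketch of dirty-rectangle accumulation. It is not the ddraw code: struct rect, dirty_rect_add() and dirty_rect_flush() are hypothetical names made up for this example, and the real implementation may well track updates differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a RECT; not the ddraw/wined3d type. */
struct rect
{
    int left, top, right, bottom;
};

static bool rect_is_empty(const struct rect *r)
{
    return r->left >= r->right || r->top >= r->bottom;
}

/* Grow *dirty to the bounding box of itself and *update. */
static void dirty_rect_add(struct rect *dirty, const struct rect *update)
{
    if (rect_is_empty(update))
        return;
    if (rect_is_empty(dirty))
    {
        *dirty = *update;
        return;
    }
    if (update->left < dirty->left) dirty->left = update->left;
    if (update->top < dirty->top) dirty->top = update->top;
    if (update->right > dirty->right) dirty->right = update->right;
    if (update->bottom > dirty->bottom) dirty->bottom = update->bottom;
}

/* Return the accumulated region clipped to the surface, and reset it.
 * Only the returned rectangle would need to be copied to the screen. */
static struct rect dirty_rect_flush(struct rect *dirty, int width, int height)
{
    struct rect out = *dirty;

    assert(width > 0 && height > 0);
    if (out.left < 0) out.left = 0;
    if (out.top < 0) out.top = 0;
    if (out.right > width) out.right = width;
    if (out.bottom > height) out.bottom = height;

    dirty->left = dirty->top = dirty->right = dirty->bottom = 0;
    return out;
}
```

The point is just that an application locking and unlocking small parts of the front buffer ends up costing a copy of the touched region, not of the whole surface.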
What remained then was that ddraw_surface_update_frontbuffer() never used wined3d_swapchain_present() until commit 034e88e038e8114ec31261d88dece1e2691185fb. These days it does though, and that pretty much leaves the potential overhead of wined3d_swapchain_present(), perhaps most significantly from swapchain_blit(), as the only reason for not calling it. I think we should try to reduce that overhead in any case.
There are of course some optimisations to wined3d's present path we'd like to make regardless.
What do you have in mind?
Broadly, I'd like to reduce the number of blits/copies involved in getting surfaces to the screen, and I think we should be able to make some progress on that using an approach similar to wined3d_buffer_set_bo()/UPLOAD_BO_RENAME_ON_UNMAP. I.e., if we're completely replacing the contents of a texture without scaling or format conversion, we could just propagate the underlying texture/VkImage. Ideally we'd be able to do that all the way to the display driver in the kernel.
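As a rough sketch of what "propagating the underlying texture" could look like, assuming a reference-counted backing store: everything below (struct storage, struct texture, texture_try_propagate()) is hypothetical and only meant to illustrate the idea; it is not wined3d API, and it assumes 4 bytes per pixel for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted backing store, standing in for a
 * GL texture / VkImage / BO. */
struct storage
{
    unsigned int refcount;
    unsigned char *data;
};

/* Hypothetical texture; "format" is an opaque format ID. */
struct texture
{
    unsigned int width, height, format;
    struct storage *storage;
};

/* Assumes 4 bytes per pixel. Returns NULL on allocation failure. */
static struct storage *storage_create(unsigned int width, unsigned int height)
{
    struct storage *s;

    if (!(s = malloc(sizeof(*s))))
        return NULL;
    if (!(s->data = calloc((size_t)width * height, 4)))
    {
        free(s);
        return NULL;
    }
    s->refcount = 1;
    return s;
}

static void storage_release(struct storage *s)
{
    assert(s->refcount);
    if (!--s->refcount)
    {
        free(s->data);
        free(s);
    }
}

/* Replace the entire contents of dst with src. If no scaling or format
 * conversion is needed, make dst reference src's storage instead of
 * copying. Returns false if the caller has to do a regular blit. */
static bool texture_try_propagate(struct texture *dst, const struct texture *src)
{
    if (dst->width != src->width || dst->height != src->height
            || dst->format != src->format)
        return false;

    if (dst->storage != src->storage)
    {
        ++src->storage->refcount;
        storage_release(dst->storage);
        dst->storage = src->storage;
    }
    return true;
}
```

A real implementation would also need to handle later writes to either texture (e.g. by giving the writer fresh storage), which this sketch leaves out.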