Hi,
(warning: long mail)
Currently the wined3d code does more or less synchronous rendering, which
means that a Direct3D function call from the app directly results in the
equivalent OpenGL call(s). There are a few issues with that:
* Multithreaded Direct3D: OpenGL calls can only be made from the thread that
owns the glX context, while Direct3D calls can be made from any thread.
Passing the context around is only possible with hacks (SetThreadContext or
pthread_kill) and is prone to deadlocks.
* Performance: Applications expect 3D calls to return immediately so they can
do other things while the GPU is rendering. GL works the same way, so our
Direct3D rendering functions should return almost immediately too, but the
state changes and drawStridedSlow seem to make GL wait until the pipeline is
empty.
My suggestion is to create a per-device thread which does the rendering and
owns the GL context, while the rendering calls only place some tokens into a
queue and return immediately. This way the app gets control back immediately,
and multithreaded Direct3D is only a matter of locking the queue correctly.
The rendering thread and all rendering code would be in drawprim.c (and maybe
a new file, e.g. opengl_utils.c). The other files would contain no GL code.
Here are some more concrete suggestions for implementing this:
The pipeline is a block of memory with a fixed size (e.g. 64k, whatever), and
the work orders placed in it consist of an opcode and any number of
arguments. A NULL opcode means that the place is empty. When a new operation
doesn't fit at the bottom of the pipeline, we wrap around and start again at
the top. An instruction pointer points at the opcode of the next instruction;
if that opcode is NULL, the pipeline is empty. When an instruction has been
executed, its memory is zeroed and the instruction pointer is set to the next
byte after the old instruction. A new instruction doesn't fit into the
pipeline if it would overwrite nonzero memory; in that case we issue a warning
and wait until more space is free.
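Below is a minimal single-threaded sketch of the variable-length ring buffer described above. Several details are assumptions that are not in the proposal: the pipeline is an array of 32-bit words, every command stores its argument count right after the opcode, and a hypothetical OP_WRAP marker tells the reader to continue at the top after the writer has wrapped around. The locking and memory barriers needed between the app thread and the render thread are left out. pipe_fetch() copies the arguments out before zeroing them, rather than executing them in place.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical values: the real pipeline would be ~64k, it is tiny here to show wrapping. */
#define PIPE_WORDS 16
enum { OP_NONE = 0, OP_WRAP = 1, OP_SETRENDERSTATE = 2 };

struct pipeline {
    uint32_t mem[PIPE_WORDS];   /* all-zero memory is free */
    unsigned int put;           /* writer: where the next command goes */
    unsigned int ip;            /* reader: opcode of the next command to execute */
};

/* Assumed command layout: opcode, argument count, arguments. */
static int region_is_free(const struct pipeline *p, unsigned int start, unsigned int words)
{
    for (unsigned int i = start; i < start + words; ++i)
        if (p->mem[i] != 0) return 0;   /* would overwrite an unexecuted command */
    return 1;
}

/* Returns 1 if the command was queued, 0 if the caller has to wait for free space. */
int pipe_put(struct pipeline *p, uint32_t op, const uint32_t *args, uint32_t nargs)
{
    unsigned int words = 2 + nargs;
    unsigned int start = p->put;

    assert(op > OP_WRAP && words <= PIPE_WORDS);
    if (start + words > PIPE_WORDS) {
        /* Doesn't fit at the bottom: leave a wrap marker and start again at the top. */
        if (!region_is_free(p, start, 1) || !region_is_free(p, 0, words)) return 0;
        p->mem[start] = OP_WRAP;
        start = 0;
    } else if (!region_is_free(p, start, words)) {
        return 0;
    }
    if (nargs) memcpy(&p->mem[start + 2], args, nargs * sizeof(*args));
    p->mem[start + 1] = nargs;
    p->mem[start] = op;   /* written last; threaded code would need a barrier before this store */
    p->put = (start + words) % PIPE_WORDS;
    return 1;
}

/* Render thread: takes the next command out of the pipeline and frees its memory.
 * Returns the opcode, or OP_NONE if the pipeline is empty. */
uint32_t pipe_fetch(struct pipeline *p, uint32_t *args_out, uint32_t *nargs_out)
{
    if (p->mem[p->ip] == OP_WRAP) {
        p->mem[p->ip] = OP_NONE;   /* free the marker and continue at the top */
        p->ip = 0;
    }
    uint32_t op = p->mem[p->ip];
    if (op == OP_NONE) return OP_NONE;
    uint32_t nargs = p->mem[p->ip + 1];
    memcpy(args_out, &p->mem[p->ip + 2], nargs * sizeof(*args_out));
    *nargs_out = nargs;
    memset(&p->mem[p->ip], 0, (2 + nargs) * sizeof(uint32_t));   /* zero = free again */
    p->ip = (p->ip + 2 + nargs) % PIPE_WORDS;
    return op;
}
```

With the 16-word pipeline, five 3-word commands leave one word at the bottom. A sixth command is rejected until the first one has been executed, and it then goes to the top behind a wrap marker.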
A small modification would be to fix the number of arguments per opcode.
Checking for empty instructions would then be easier, because when placing a
new instruction we only have to check whether the address holding the opcode
is NULL instead of the whole memory it would occupy. On the other hand, this
wastes memory.
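For comparison, here is a sketch of the fixed-size variant. SLOT_ARGS (assumed to be the largest argument count of any opcode) and SLOT_COUNT are hypothetical. The point is that both the free check and freeing a slot only touch the opcode, at the price of every slot being as large as the biggest command.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SLOT_ARGS 4    /* hypothetical: largest argument count of any opcode */
#define SLOT_COUNT 8   /* hypothetical ring size */

struct slot {
    uint32_t opcode;            /* 0 = empty slot */
    uint32_t args[SLOT_ARGS];   /* arguments an opcode doesn't use are wasted space */
};

struct slot_pipe {
    struct slot slots[SLOT_COUNT];
    unsigned int put, ip;
};

/* Only the opcode of the target slot has to be checked. Returns 0 if full. */
int slot_put(struct slot_pipe *p, uint32_t op, const uint32_t *args, unsigned int nargs)
{
    struct slot *s = &p->slots[p->put];
    assert(op != 0 && nargs <= SLOT_ARGS);
    if (s->opcode) return 0;   /* not executed yet: the caller has to wait */
    memset(s->args, 0, sizeof(s->args));
    if (nargs) memcpy(s->args, args, nargs * sizeof(*args));
    s->opcode = op;
    p->put = (p->put + 1) % SLOT_COUNT;   /* wrap-around is just a modulo */
    return 1;
}

/* Returns the opcode of the next slot (0 if empty) and frees the slot. */
uint32_t slot_get(struct slot_pipe *p, uint32_t args_out[SLOT_ARGS])
{
    struct slot *s = &p->slots[p->ip];
    uint32_t op = s->opcode;
    if (!op) return 0;
    memcpy(args_out, s->args, sizeof(s->args));
    s->opcode = 0;   /* clearing the opcode alone frees the slot */
    p->ip = (p->ip + 1) % SLOT_COUNT;
    return op;
}
```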
Of course we could also HeapAlloc the instructions and place pointers to the
allocated memory in the pipeline. That doesn't waste memory (maybe, depending
on how HeapAlloc works), but imposes the overhead of regular HeapAlloc /
HeapFree calls.
So what instructions would we need? Everything that issues GL calls. Here are
some I could think of and some implementation thoughts:
SetRenderState:
IWineD3DDevice::SetRenderState sets the value in the update stateblock and,
if not recording a stateblock, places a SETRENDERSTATE operation. Its
arguments are the render state to set and the value to set it to. When the
instruction is executed from the pipeline, the value is set in the actual
render stateblock and the GL state is updated with the code that is already
in SetRenderState.
IWineD3DDevice::GetRenderState returns the value from the update stateblock,
so it is independent of the execution state of the pipeline.
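The following sketch shows the split between the update stateblock, which the getters read, and the render stateblock, which holds what the render thread has applied. The struct layout, the field names and the plain array used as a queue are all assumptions for illustration. They are not the existing IWineD3DDeviceImpl layout.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_RENDERSTATES 256   /* hypothetical bound */
#define QUEUE_LEN 64           /* hypothetical queue size */

enum { OP_SETRENDERSTATE = 1 };

struct stateblock { uint32_t renderState[MAX_RENDERSTATES]; };

struct command { uint32_t op, arg0, arg1; };

struct device {
    struct stateblock update;   /* values the getters return */
    struct stateblock render;   /* values the render thread has applied */
    int recording;              /* between BeginStateBlock and EndStateBlock */
    struct command queue[QUEUE_LEN];
    unsigned int queued, executed;
};

void device_set_render_state(struct device *d, uint32_t state, uint32_t value)
{
    assert(state < MAX_RENDERSTATES);
    d->update.renderState[state] = value;
    if (d->recording) return;        /* only recorded, nothing is queued */
    assert(d->queued < QUEUE_LEN);   /* a real pipeline would wait for space here */
    d->queue[d->queued++] = (struct command){ OP_SETRENDERSTATE, state, value };
}

uint32_t device_get_render_state(const struct device *d, uint32_t state)
{
    return d->update.renderState[state];   /* independent of pipeline progress */
}

/* Render thread side: execute one queued command, return 0 if none is left. */
int render_thread_step(struct device *d)
{
    if (d->executed == d->queued) return 0;
    struct command c = d->queue[d->executed++];
    if (c.op == OP_SETRENDERSTATE) {
        d->render.renderState[c.arg0] = c.arg1;
        /* ...the existing GL state update code from SetRenderState would run here... */
    }
    return 1;
}
```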
SetTextureStageState:
Arguments are the stage, state and value; otherwise it is similar to
SetRenderState.
SetStreamSource, SetTexture, Set*Shader:
Update the update stateblock, update the refcounts and, if not recording,
queue a setting operation for the stream/texture/shader. This operation
updates the render stateblock, but does not necessarily change the GL setting
(e.g. SetTexture also requires texture coordinates in the vertices).
The getters return the values stored in the update stateblock.
SetDisplayMode, GetDisplayMode: Not GL calls
SetClipPlane, SetMaterial, SetLight, SetLightEnable, SetTransform,
MultiplyTransform, SetViewport: Pretty simple; update the update stateblock,
...
SetFVF, SetVertexDeclaration: Update the update stateblock and queue a
SetDeclaration operation. The declaration is stored in the render stateblock
and referenced for rendering. I'd suggest that the render thread should not
deal with FVFs.
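To show how FVFs could be kept out of the render thread, here is a sketch that converts a simplified FVF into an explicit declaration on the application side. It handles only position, normal, diffuse, specular and 2D texture coordinates. The FVF bit values come from d3d9types.h. The usage enum and the element/declaration structs are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* FVF values as defined in d3d9types.h (subset). */
#define D3DFVF_POSITION_MASK  0x400E
#define D3DFVF_XYZ            0x002
#define D3DFVF_NORMAL         0x010
#define D3DFVF_DIFFUSE        0x040
#define D3DFVF_SPECULAR       0x080
#define D3DFVF_TEXCOUNT_MASK  0xf00
#define D3DFVF_TEXCOUNT_SHIFT 8

enum usage { USAGE_POSITION, USAGE_NORMAL, USAGE_COLOR, USAGE_TEXCOORD };   /* hypothetical */

struct element { enum usage usage; unsigned int index, offset, size; };

struct declaration {
    struct element elements[16];   /* enough for the subset handled here */
    unsigned int count, stride;
};

static void add(struct declaration *d, enum usage u, unsigned int index, unsigned int size)
{
    assert(d->count < 16);
    d->elements[d->count++] = (struct element){ u, index, d->stride, size };
    d->stride += size;
}

/* Converts a simplified FVF into a declaration, in FVF vertex order. */
void fvf_to_declaration(uint32_t fvf, struct declaration *d)
{
    d->count = d->stride = 0;
    if ((fvf & D3DFVF_POSITION_MASK) == D3DFVF_XYZ) add(d, USAGE_POSITION, 0, 3 * sizeof(float));
    if (fvf & D3DFVF_NORMAL)   add(d, USAGE_NORMAL, 0, 3 * sizeof(float));
    if (fvf & D3DFVF_DIFFUSE)  add(d, USAGE_COLOR, 0, 4);   /* packed D3DCOLOR */
    if (fvf & D3DFVF_SPECULAR) add(d, USAGE_COLOR, 1, 4);
    unsigned int tex = (fvf & D3DFVF_TEXCOUNT_MASK) >> D3DFVF_TEXCOUNT_SHIFT;
    for (unsigned int i = 0; i < tex; ++i)
        add(d, USAGE_TEXCOORD, i, 2 * sizeof(float));   /* assumes 2D texture coordinates */
}
```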
Set*ShaderConstant: No idea
UpdateTexture, UpdateSurface: No idea either. Maybe relay to DirectDraw Blits
ApplyStateBlock:
Compare the stateblock contents against the update stateblock, update the
update stateblock and queue Set* commands for the states that differ.
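Here is a sketch of ApplyStateBlock as described: only states whose value differs from the update stateblock produce a queued Set* command. The queue function is a hypothetical stand-in that only counts. Real stateblocks also record which states they captured, and this sketch leaves that out.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_RENDERSTATES 256   /* hypothetical bound */

struct stateblock { uint32_t renderState[MAX_RENDERSTATES]; };

struct device {
    struct stateblock update;
    unsigned int queued;   /* stands in for the real pipeline */
};

/* Hypothetical stand-in for placing a SETRENDERSTATE command in the pipeline. */
static void queue_set_render_state(struct device *d, uint32_t state, uint32_t value)
{
    assert(state < MAX_RENDERSTATES);
    (void)value;
    d->queued++;
}

void apply_state_block(struct device *d, const struct stateblock *sb)
{
    for (uint32_t i = 0; i < MAX_RENDERSTATES; ++i) {
        if (sb->renderState[i] == d->update.renderState[i]) continue;   /* unchanged: queue nothing */
        d->update.renderState[i] = sb->renderState[i];
        queue_set_render_state(d, i, sb->renderState[i]);
    }
}
```

Applying the same stateblock twice queues nothing the second time, because the update stateblock already matches it.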
Surface Locking:
Set up the local memory for the surface and, if necessary, issue a command to
read back the surface from GL. Wait for this command to be executed and until
the last command referencing the surface is finished. If a surface is locked
often, keep the local memory copy to avoid flushing the whole pipeline for
the readback command. When the surface memory is ready, return.
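Below is a single-threaded sketch of this locking rule. It assumes each surface carries a counter of queued commands that still reference it. The readback is itself queued behind the pending commands, so once the counter reaches zero, both the readback and all earlier rendering are done. All names are hypothetical, and the loop stands in for blocking on an event signalled by the render thread. Keeping sysmem_valid set between locks corresponds to keeping the local copy: a second lock with no rendering in between queues nothing.

```c
#include <assert.h>

#define MAX_QUEUED 32   /* hypothetical queue size */

enum op { OP_DRAW_TO, OP_READBACK };

struct surface {
    unsigned int pending_uses;   /* queued commands that still reference the surface */
    int sysmem_valid;            /* local copy is up to date */
};

struct cmd { enum op op; struct surface *surface; };

/* Stand-in for the pipeline: a simple FIFO drained by render_thread_step(). */
static struct cmd queue[MAX_QUEUED];
static unsigned int head, tail;

void queue_cmd(enum op op, struct surface *s)
{
    assert(tail - head < MAX_QUEUED);
    if (op == OP_DRAW_TO) s->sysmem_valid = 0;   /* rendering will change the GL copy */
    s->pending_uses++;
    queue[tail++ % MAX_QUEUED] = (struct cmd){ op, s };
}

static void render_thread_step(void)
{
    struct cmd c = queue[head++ % MAX_QUEUED];
    /* ...issue the draw, or read the GL surface into local memory for OP_READBACK... */
    if (c.op == OP_READBACK) c.surface->sysmem_valid = 1;
    c.surface->pending_uses--;
}

/* LockRect: returns once the local memory is safe to hand to the app. */
void surface_lock(struct surface *s)
{
    if (!s->sysmem_valid) queue_cmd(OP_READBACK, s);
    while (s->pending_uses)   /* real code: wait for an event from the render thread */
        render_thread_step();
}
```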
Surface unlocking:
If necessary, start converting the surface (e.g. for color keying) in a
separate thread and return. If the surface is used for drawing before the
conversion is complete, the rendering thread has to wait until the conversion
is finished. Uploading the surface to GL is done during drawprim when the
surface is used.
Vertex Buffer locking:
Similar to surface locking. If neither the NOOVERWRITE nor the DISCARD
locking flag is provided, wait until all rendering with the buffer is done,
then return the buffer data. We may have to give up the idea of mapping GL
memory via glMapBuffer, or we might have to wait for the whole pipeline to be
executed before placing a command for that.
Unlocking vertex buffers:
If the semantics of the data are known, start fixing up vertices in a
separate thread. When done fixing up the buffer, place a preload command into
the pipeline to load the buffer as early as possible; some GL implementations
seem to need that. Again, if the buffer is used for drawing before the
conversion is done, the drawing thread has to wait. Also convert buffers if
no VBOs are available, to get rid of drawStridedSlow completely.
Drawprim: This is the most complex thing:
First, check that all bound textures and vertex buffers are unlocked (unit
test!). Then increase the rendering reference counter of all textures and
buffers (to count how often the object is used in the queue). Then queue a
drawprim command and return. If drawing from a user pointer, we either have
to wait until drawing is done or create a copy before placing the call (this
is my favorite; we can fix up colors too while we're at it).
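Here is a sketch of the drawprim bookkeeping, using hypothetical types. It refuses to queue anything while a bound resource is locked; what Windows actually does in that case is what the unit test should find out. It takes a rendering reference on every bound texture and on the vertex buffer, and it copies user-pointer data so that the call can return at once. finish_draw() is what the render thread would do after issuing the GL calls.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TEXTURES 8   /* hypothetical number of texture stages */

struct resource {
    int locked;                  /* the app currently holds a lock */
    unsigned int pending_uses;   /* rendering reference counter */
};

struct draw_cmd {
    struct resource *textures[MAX_TEXTURES];
    struct resource *vb;         /* NULL when drawing from a user pointer */
    void *user_copy;             /* private copy of user-pointer vertex data */
    unsigned int vertex_count;
};

/* Fills cmd; returns 0 without taking references if a bound resource is locked. */
int queue_draw(struct draw_cmd *cmd, struct resource *const textures[MAX_TEXTURES],
               struct resource *vb, const void *user_data, size_t user_size,
               unsigned int vertex_count)
{
    for (int i = 0; i < MAX_TEXTURES; ++i)
        if (textures[i] && textures[i]->locked) return 0;
    if (vb && vb->locked) return 0;

    memset(cmd, 0, sizeof(*cmd));
    if (user_data) {
        /* Copy now so the app may reuse its memory as soon as we return. */
        cmd->user_copy = malloc(user_size);
        if (!cmd->user_copy) return 0;
        memcpy(cmd->user_copy, user_data, user_size);
    }
    for (int i = 0; i < MAX_TEXTURES; ++i)
        if ((cmd->textures[i] = textures[i])) textures[i]->pending_uses++;
    if ((cmd->vb = vb)) vb->pending_uses++;
    cmd->vertex_count = vertex_count;
    assert(cmd->vb || cmd->user_copy || vertex_count == 0 || !user_data);
    return 1;   /* the caller would now place cmd in the pipeline and return */
}

/* Render thread, after issuing the GL draw: drop the references again. */
void finish_draw(struct draw_cmd *cmd)
{
    for (int i = 0; i < MAX_TEXTURES; ++i)
        if (cmd->textures[i]) cmd->textures[i]->pending_uses--;
    if (cmd->vb) cmd->vb->pending_uses--;
    free(cmd->user_copy);
    cmd->user_copy = NULL;
}
```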
Blits: Find out if the blit can be handled in OpenGL and queue a blit call
(which will draw a textured quad). If GL can't handle it, fall back to the
GDI code, which performs everything in software; from a GL perspective these
are surface locks. This is slow, so we will want to handle everything in GL.
GetDC: At the moment this is a LockRect from the GL point of view. We may
want to write a GL GDI driver which queues commands on the pipeline.
Present:
Queue a FLIP command and wait for the pipeline to be emptied, then return.
Ideally the rendering is already done when Present is called, so Present
returns immediately.
Destroying objects: In Release, wait until they are no longer needed in the
pipeline.
Open issues:
SetRenderTarget:
AFAIK render targets have their own GL context. Should we have a different
pipeline, or ask the worker to switch to a different GL context?
Synchronization is an issue.
Multiple swapchains: Similar issue.
Anything I forgot?
How do we reference objects in the pipeline? For a start I'd suggest using
the implementation pointer; later we may want to replace it with handles (to
avoid issues with the pointer size on 64 bit). See the roadmap.
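If the pipeline later stores 32-bit handles instead of pointers, a simple table would be enough. This is a hypothetical sketch with a fixed table size, a linear search for a free slot and no locking. Handle 0 is reserved to mean "no object".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_HANDLES 1024   /* hypothetical table size */

static void *handle_table[MAX_HANDLES];   /* slot 0 is never used */

/* Returns a 32-bit handle for obj, or 0 if the table is full. */
uint32_t handle_alloc(void *obj)
{
    assert(obj != NULL);
    for (uint32_t h = 1; h < MAX_HANDLES; ++h)
        if (!handle_table[h]) { handle_table[h] = obj; return h; }
    return 0;
}

/* Render thread: turn a handle from a command back into the object pointer. */
void *handle_lookup(uint32_t h)
{
    return (h && h < MAX_HANDLES) ? handle_table[h] : NULL;
}

void handle_free(uint32_t h)
{
    if (h && h < MAX_HANDLES) handle_table[h] = NULL;
}
```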
My suggestion for the roadmap:
1) Start by protecting the ddraw, d3d8 and d3d9 objects against race
conditions with critical sections (easy).
2) Move the code around in wined3d a bit, split up COM from GL stuff without
actually changing the way rendering is done.
3) Add a stub pipeline and add code queuing the commands
4) Move the context into the worker thread and call the actual gl commands
from there
Additional stuff that can be done if we feel like it:
* Get rid of COM in wined3d
* Move non-rendering things like Private Data, Stateblocks and Getters into
ddraw, d3d8 and d3d9, leave only rendering code in wined3d. The software
ddraw code has to stay in wined3d though
* Adopt the DDI, ddentry or whatever is used in Windows XP / Vista (potential
legal issues, as these interfaces aren't well documented).
Stefan