Hi, I can't compile it with the current CVS head.
drawprim.o: In function `drawPrimitive':
/usr/src/wine/dlls/wined3d/drawprim.c:1696: undefined reference to `list_move'
collect2: ld returned 1 exit status
winegcc: gcc failed.
make: *** [wined3d.dll.so] Error 2
Mirek
Stefan Dösinger wrote:
Hi, over the past few days I've been hacking on implementing my state management ideas, and I think I've reached a point where I don't have to be completely ashamed of my patches :-)
First, what the code does NOT do yet:
- Pixel Shaders, GLSL shaders: I only had my notebook with the M9 available,
so I had no chance to implement them. Expect anything from broken graphics to the sudden release of Duke Nukem Forever if you try to use them.
- Stateblocks
- Register combiners: Disabled right now
- Offscreen rendering: Causes random rendering garbage
- 2D Blits: Commented out
I have described the basic ideas in an earlier mail (http://www.winehq.org/pipermail/wine-devel/2006-October/051868.html), so I won't describe them here again. I pretty much followed the original plan.
Performance: One of the aims was to get better performance, since we apparently lost performance due to excessive state changes, which eat CPU time and may require CPU-GPU syncs. My patches improve performance, but not as much as I originally hoped. I mainly have performance figures for the M9, plus some basic testing on a GeForce 7600.
- Billboard dx8 sdk demo: went from 56 fps to 107 fps :-)
- Half-Life 1: Quite an improvement here too. 110->150 fps in one of my
timedemos. The d3d renderer now outperforms the opengl renderer (140 fps). Both the billboard demo and hl1 hit a special rendering case (no stream source or fvf changes), which is nicely optimized by my changes. The gl renderer in hl1 uses immediate mode drawing while wined3d can use VBOs and array drawing, thus being faster on today's cards.
- Battlefield 1942: Slight improvement too, 32->37 fps on my testing
scene (spawn point on a u.s. carrier@full graphics). BF1942 already exceeded the usual linux/windows driver performance ratio before my changes, so I assume I'm pretty much at the limit of my M9 here.
- 3DMark2000: Unfortunately my driver crashes it before showing the scores, so
I can only watch the in-test counter. It seems to gain 5 to 10 fps in the low detail helicopter test (resolution independent). Native msvcrt.dll gives another +5 fps.
I did only brief testing on my GeForce 7600:
- 3dmark2000: gets 11500 3dmarks, 14500 when forcing drawStridedFast. This is,
I believe, the windows performance. However, the benchmark is too old to be meaningful. Before my state patches the drawStridedFast score was around 13500 if I remember correctly; I have to retest.
- 3dmark2001: Low detail tests run at 150-300 fps, too fast for a meaningful
result. High detail tests are slow and partially broken due to offscreen rendering.
- Battlefield 1942: Runs at a steady 100 fps, but it already did that before.
So it seems that the state patches improve one bottleneck, but we still have many others (offscreen rendering, drawStridedSlow) left. The nvidia profiling driver may help here.
Where to go from here: The state management was also planned to make implementing other features easier:
- Multithreading: Make the dirty states list per context, and the helpers
stored in the device too. Before applying the states activate the correct ctx for the thread.
- Stateblocks: Basic idea is to record a display list and call it:
    glNewList(stateblock->listname, GL_COMPILE);
    for(i = 1; i <= STATE_HIGHEST; i++) {
        States[i].func(i, stateblock);
    }
    glEndList();
To apply the stateblock: glCallList(stateblock->listname);
Ok, we need to split the list to apply only partial states, and the for loop can be improved to create a more efficient list. When the stateblock is altered we have to recreate the list. That's the basic idea...
- Offscreen rendering: Depends on whether we need separate contexts for
pbuffers. If yes, include it with the multithreading ctx finding, then apply the states; otherwise I think we can make selecting the pbuffer/fbo a state like all others. It has interactions with the viewport (I think) and the projection matrix (render_offscreen for upside down rendering).
- sRGB textures: Dirtifies the sampler. All textures now have information
about how many samplers they are bound to, and the number of one of the samplers. Phil?
- Vertex samplers: Ivan said he'd need the state management for them. My idea
is to build a d3d sampler - gl sampler mapping in SetTexture, which will be needed for register combiners too. Based on that we can bind vtf samplers in gl.
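
To make the "dirty states list per context" idea from the multithreading item more concrete, here is a minimal sketch in plain C. It assumes a small fixed number of states and replaces the real GL calls with a counter. All names (struct context, mark_dirty, apply_dirty_states, state_table) are made up for illustration and are not the identifiers used in the wined3d tree.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch: none of these names come from the wined3d sources. */
#define STATE_HIGHEST 8

struct context;
typedef void (*state_func)(unsigned int state, struct context *ctx);

/* One of these per GL context, so each thread can track its own dirty states. */
struct context {
    unsigned int  dirty[STATE_HIGHEST + 1];    /* numbers of the states that need applying */
    unsigned int  num_dirty;                   /* used entries in dirty[] */
    unsigned char is_dirty[STATE_HIGHEST + 1]; /* prevents duplicate list entries */
    unsigned int  applied[STATE_HIGHEST + 1];  /* stand-in for real GL calls: apply count */
};

/* Placeholder handler; a real one would issue the GL calls for this state. */
static void apply_state(unsigned int state, struct context *ctx)
{
    ctx->applied[state]++;
}

/* Table mapping each state number to the function that applies it. */
static state_func state_table[STATE_HIGHEST + 1];

void init_state_table(void)
{
    unsigned int i;
    for (i = 1; i <= STATE_HIGHEST; i++)
        state_table[i] = apply_state;
}

void context_init(struct context *ctx)
{
    memset(ctx, 0, sizeof(*ctx));
}

/* Called when the application changes a state: only remember it, do no GL work. */
void mark_dirty(struct context *ctx, unsigned int state)
{
    assert(state >= 1 && state <= STATE_HIGHEST);
    if (ctx->is_dirty[state])
        return; /* already queued, nothing to do */
    ctx->is_dirty[state] = 1;
    ctx->dirty[ctx->num_dirty++] = state;
}

/* Called before drawing, after the thread's context was made current.
 * Applies every queued state once and returns how many were applied. */
unsigned int apply_dirty_states(struct context *ctx)
{
    unsigned int i, count = ctx->num_dirty;
    for (i = 0; i < count; i++) {
        unsigned int state = ctx->dirty[i];
        state_table[state](state, ctx);
        ctx->is_dirty[state] = 0;
    }
    ctx->num_dirty = 0;
    return count;
}
```

In this sketch a state that is set several times between two draws is still applied only once, and a draw with no state changes applies nothing, which is the kind of saving the billboard demo and hl1 case above relies on.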
I have no clean patches right now (45 chaotic patches), so I decided to share my wined3d directory. However, even compressed this is a bit big for a mailing list, so I uploaded it to http://stud4.tuwien.ac.at/~e0526822/wined3d-statemgmt.tar.bz2
Stefan