2008/12/30 Henri Verbeet hverbeet@codeweavers.com:
From f6e4f88407491db8bb53d22d526f69b9ff761aaf Mon Sep 17 00:00:00 2001
From: Henri Verbeet hverbeet@codeweavers.com
Date: Tue, 30 Dec 2008 14:56:49 +0100
Subject: wined3d: Convert some BOOLs to bitfields in struct IWineD3DDeviceImpl.
Also fills a 3 byte hole.
dlls/wined3d/device.c                | 17 +++++---------
dlls/wined3d/nvidia_texture_shader.c | 2 +-
dlls/wined3d/state.c                 | 4 +-
dlls/wined3d/wined3d_private.h       | 36 ++++++++++++++++-----------------
4 files changed, 26 insertions(+), 33 deletions(-)
How many instances of this structure are likely to be in a process at any one time? It seems to me as though any memory savings gained by making the BOOLs into bitfields will be taken up by increased code size. There is also a risk of a small performance penalty from this and the other similar changes.
These kinds of optimisations need to be backed up by benchmarks, for both memory and performance.
2009/1/2 Rob Shearman robertshearman@gmail.com:
How many instances of this structure are likely to be in a process at any one time? It seems to me as though any memory savings gained by making the BOOLs into bitfields will be taken up by increased code size. There is also a risk of a small performance penalty from this and the other similar changes.
In a typical application there's only one instance of the device struct, but the fields are accessed a lot. The patch isn't so much about saving memory as it's about not wasting cachelines. The SAVEDSTATES struct, which most of the other patches modify, is used a bit more: once for each stateblock. Note that that structure was initially 5448 bytes, using up 86 64-byte cachelines. It should be possible to get that down to 3 or 4.
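To make the cacheline arithmetic concrete, here is a minimal sketch. The 64-byte cacheline size, the flag count, and every struct and function name are assumptions for illustration; this is not the actual SAVEDSTATES layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_SIZE 64u   /* assumed cacheline size in bytes */
#define FLAG_COUNT     1024u /* hypothetical number of flags */

/* One 32-bit BOOL-style integer per flag. */
struct flags_as_bools { int32_t flag[FLAG_COUNT]; };

/* One bit per flag, packed into 32-bit words. */
struct flags_as_bits { uint32_t word[(FLAG_COUNT + 31u) / 32u]; };

/* Number of cachelines spanned by a cacheline-aligned object of this size. */
static size_t cachelines_for(size_t bytes)
{
    return (bytes + CACHELINE_SIZE - 1u) / CACHELINE_SIZE;
}

/* Set flag idx: a shift to build the mask, then an or into the word. */
static void set_flag(struct flags_as_bits *f, unsigned int idx)
{
    assert(idx < FLAG_COUNT);
    f->word[idx >> 5] |= 1u << (idx & 31u);
}

/* Test flag idx: shift the word down and mask off the lowest bit. */
static int test_flag(const struct flags_as_bits *f, unsigned int idx)
{
    assert(idx < FLAG_COUNT);
    return (int)((f->word[idx >> 5] >> (idx & 31u)) & 1u);
}
```

With these assumed sizes, cachelines_for(5448) gives 86, the BOOL array spans 64 cachelines and the packed version spans 2.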
Code size increase should be insignificant for this patch. When setting a flag you essentially replace a mov with an or, and testing stays mostly the same. For the SAVEDSTATES patches a couple of extra shifts are introduced, but I'm pretty sure those are worth it compared to the saved cachelines.
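A rough sketch of the mov versus or difference, using made-up field names rather than the real IWineD3DDeviceImpl members. The instructions a compiler actually emits depend on the compiler and target, so the comments describe only the typical case.

```c
#include <stdint.h>

/* Hypothetical flags stored as BOOL-sized integers: 4 bytes each. */
struct dev_bools {
    int32_t flag_a;
    int32_t flag_b;
    int32_t flag_c;
};

/* The same hypothetical flags as 1-bit bitfields sharing one word. */
struct dev_bits {
    unsigned int flag_a : 1;
    unsigned int flag_b : 1;
    unsigned int flag_c : 1;
};

/* Plain store: typically a single mov of the constant to memory. */
static void set_b_bool(struct dev_bools *d)
{
    d->flag_b = 1;
}

/* Bitfield store: typically an or with a constant mask on the shared word. */
static void set_b_bits(struct dev_bits *d)
{
    d->flag_b = 1;
}

/* Bitfield test: typically a test against the same constant mask. */
static int test_b_bits(const struct dev_bits *d)
{
    return d->flag_b;
}
```

Setting flag_b through either variant leaves the neighbouring flags untouched; only the storage footprint and the instruction used for the store differ.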
These kinds of optimisations need to be backed up by benchmarks, for both memory and performance.
I did of course run some benchmarks before sending these changes in. 3DMark03 shows a small but consistent improvement. The CSS stress test doesn't get much more than a single fps improvement for the average frame rate, but that one is mostly limited by shader constant loading and sample size & rate conversion in dsound (ignoring sRGB texture loading). I didn't notice any performance regressions in any applications.