If I understand it correctly, Ge van Geldorp had an idea to auto-generate the DIB engine code needed for certain color depths and functions, so that you would not have to implement the whole thing at once. If I remember correctly, he did a proof-of-concept implementation for ReactOS by first creating the generic interfaces that were needed and then generating the code for the simpler color depths, so as not to break all the existing hacks they had. Maybe I am off base here; Ge may comment, as he lurks on wine-devel and I have cc'd him on this email.
This was specifically for BitBlt. You have (potentially) 3 surfaces involved in a BitBlt: the source surface, the destination surface and a brush surface. For DIBs, the source and destination surfaces can each be 1, 4, 8, 16, 24 or 32 bits deep, and the source and destination can have different depths. The brush can be either a solid brush or a patterned brush. There are 256 different ways to combine the surfaces (Raster Ops or ROPs). All of these variables mean that, although it's possible to write generic code to handle everything, that code is going to be littered with ifs, so it is going to be slow for simple operations like filling a rectangle with a solid color. So you want to special-case the most often used cases and make them fast, while using the slow, generic code as a fallback.
If you want to get the best performance, you need to write a lot of almost-identical-but-slightly-different code. For example, in the innermost loop you need to actually perform the ROP. But with 256 possible ROPs that can be quite a number of ifs to execute, and you're doing that inside the innermost loop. To speed things up, I moved the ROP determination to the outermost level. Based on the ROP, one of 256 possible subroutines is called, which in its innermost loop can just combine the bits in a way hardcoded for that specific ROP (i.e. no more ifs; there's just e.g. "*Dest = *Src ^ *Dest" there). Actually, in the end I didn't use 256 subroutines but only used subroutines for the most common ROPs (those with a symbol like PATCOPY) and a catch-all generic subroutine for the less-used ROPs. All these subroutines are almost the same; it's just the actual ROP code that differs between them. And for some ROPs there's no source surface involved, so for those ROPs you don't need to advance pointers into the source surface when you're moving from row to row etc. (meaning you can't just use a preprocessor macro; the differences between the subroutines are a bit too complicated for that).
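To make the dispatch idea concrete, here is a minimal sketch in C for 8-bit surfaces with a solid brush only. It is not the ReactOS code: the BltSketch structure and all function names are hypothetical, and only two ROPs get a specialized routine. The ROP values are the index bytes of the standard Windows codes (PATCOPY is 0xF0, SRCINVERT is 0x66).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical blit descriptor for 8-bit surfaces; not a real API. */
typedef struct {
    uint8_t       *dest;        /* top-left destination pixel */
    const uint8_t *src;         /* top-left source pixel, may be NULL */
    ptrdiff_t      dest_stride; /* bytes per destination row */
    ptrdiff_t      src_stride;  /* bytes per source row */
    uint8_t        solid;       /* solid brush colour */
    int            width, height;
} BltSketch;

/* ROP3 index bytes (bits 16..23 of the Windows ROP codes). */
enum { ROP_SRCINVERT = 0x66, ROP_PATCOPY = 0xF0 };

/* Slow path: evaluate any of the 256 ROPs from its truth table.
   Bit i of rop gives the result for pattern=(i&4), source=(i&2), dest=(i&1). */
static uint8_t rop3_eval(uint8_t rop, uint8_t d, uint8_t s, uint8_t p)
{
    uint8_t result = 0;
    for (int i = 0; i < 8; i++) {
        if (rop & (1u << i)) {
            uint8_t pm = (i & 4) ? p : (uint8_t)~p;
            uint8_t sm = (i & 2) ? s : (uint8_t)~s;
            uint8_t dm = (i & 1) ? d : (uint8_t)~d;
            result |= pm & sm & dm;
        }
    }
    return result;
}

/* Catch-all routine: correct for every ROP, but slow per pixel. */
static void blt_generic(const BltSketch *b, uint8_t rop)
{
    for (int y = 0; y < b->height; y++) {
        uint8_t *d = b->dest + y * b->dest_stride;
        const uint8_t *s = b->src ? b->src + y * b->src_stride : NULL;
        for (int x = 0; x < b->width; x++)
            d[x] = rop3_eval(rop, d[x], s ? s[x] : 0, b->solid);
    }
}

/* Specialized: no source surface is touched at all. */
static void blt_patcopy(const BltSketch *b)
{
    for (int y = 0; y < b->height; y++) {
        uint8_t *d = b->dest + y * b->dest_stride;
        for (int x = 0; x < b->width; x++)
            d[x] = b->solid;
    }
}

/* Specialized: the inner loop is a single XOR, no decisions. */
static void blt_srcinvert(const BltSketch *b)
{
    assert(b->src != NULL);
    for (int y = 0; y < b->height; y++) {
        uint8_t *d = b->dest + y * b->dest_stride;
        const uint8_t *s = b->src + y * b->src_stride;
        for (int x = 0; x < b->width; x++)
            d[x] ^= s[x];
    }
}

/* The ROP is examined once per blit, outside all loops. */
void blt_dispatch(const BltSketch *b, uint8_t rop)
{
    switch (rop) {
    case ROP_PATCOPY:   blt_patcopy(b);      break;
    case ROP_SRCINVERT: blt_srcinvert(b);    break;
    default:            blt_generic(b, rop); break;
    }
}
```

Note how blt_patcopy() has no source pointer at all, while blt_srcinvert() carries one from row to row. That structural difference is the kind of variation that makes a plain macro awkward.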
That's where the code generator came in. It generated all those slightly different subroutines for the standard ROPs and a generic routine for the rest. For an example, see http://oss.gse.nl/wine/dib8gen.c which contains the generated BitBlt routines for an 8-bit destination surface. Compare for example the DIB_8BPP_BitBlt_Generic() routine near the top (the catch-all one) with DIB_8BPP_BitBlt_PATCOPY() further down. The latter has very tight inner loops (especially when BltInfo->PatternSurface is NULL, meaning you're filling the destination rectangle with a solid color) compared to the former.
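As a rough illustration of what such a generator does, the following C sketch emits the source text of one specialized routine from a small ROP description. It is a toy under stated assumptions, not the actual generator: rop_spec, emit_routine() and the generated Blt8_* routines with their b-> fields are all hypothetical names. The point is that the source pointer setup and increment are emitted only for ROPs that read the source.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical description of one ROP to generate code for. */
typedef struct {
    const char *name;        /* suffix of the generated function */
    const char *expr;        /* C expression computing one pixel */
    int         uses_source; /* nonzero if expr reads *Src */
} rop_spec;

/* Write the generated routine into out; returns the length written,
   or -1 if the buffer is too small. */
static int emit_routine(char *out, size_t cap, const rop_spec *r)
{
    assert(out != NULL && r != NULL);
    int n = snprintf(out, cap,
        "static void Blt8_%s(const BltSketch *b)\n"
        "{\n"
        "    for (int y = 0; y < b->height; y++) {\n"
        "        unsigned char *Dest = b->dest + y * b->dest_stride;\n"
        "%s"
        "        for (int x = 0; x < b->width; x++, Dest++%s)\n"
        "            *Dest = %s;\n"
        "    }\n"
        "}\n",
        r->name,
        /* Row setup for the source pointer only when it is needed. */
        r->uses_source
            ? "        const unsigned char *Src = b->src + y * b->src_stride;\n"
            : "",
        r->uses_source ? ", Src++" : "",
        r->expr);
    return (n >= 0 && (size_t)n < cap) ? n : -1;
}
```

A real generator would loop over a table of such descriptions, write each routine to a .c file, and append one generic routine for every ROP not in the table.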
Of course, at the time I did measure performance to see if all this optimization stuff indeed improved performance. And it did, dramatically even. It's been a while, so I don't recall most of the performance numbers anymore, but I do remember that I benchmarked some of the DIB BitBlt operations and found that the generated code in ReactOS was about 3 times as fast as the DIB code in Windows XP (on the same hardware of course).
I was actually quite proud of that code generator and the code it produced. The DIB code generator is absolutely clean: MS doesn't ship anything like it, so it's simply impossible that it was created using reverse engineering. I put the code generator under the LGPL, specifically so it could be used by Wine if so desired. Note that the scope of this is limited to BitBlts though; it won't help when you need to draw a 3-pixel wide dash-dotted ellipse on your DIB surface...
Gé van Geldorp.