To be clear, this doesn't implement software shaders; it just moves code around so that implementing them later will be easier.
I already have one more or less working implementation, but it's based on executing a small function for each opcode (like the previous code does), except that it goes through central processing with hardware shaders. That's incredibly slow, so more thought needs to go into how shaders can be implemented. If it were possible to pre-generate the shader code (in asm?) somehow, that'd be nice.
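For illustration only, the "small function per opcode" approach amounts to roughly the C sketch below. Every name in it (vec4, instruction, run_shader, the three opcodes) is hypothetical and not taken from the patch or from the existing code. It is only meant to show where the cost comes from: each instruction pays for a table lookup and an indirect call, and that is repeated for every vertex or pixel the shader runs on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not the patch's real structures. */
typedef struct { float x, y, z, w; } vec4;

typedef enum { OP_MOV, OP_ADD, OP_MUL, OP_COUNT } opcode;

typedef struct {
    opcode op;
    int dst, src0, src1;   /* register indices */
} instruction;

typedef void (*op_handler)(vec4 *regs, const instruction *ins);

/* One small function per opcode. */
static void op_mov(vec4 *r, const instruction *i)
{
    r[i->dst] = r[i->src0];
}

static void op_add(vec4 *r, const instruction *i)
{
    vec4 a = r[i->src0], b = r[i->src1];
    vec4 t = { a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w };
    r[i->dst] = t;
}

static void op_mul(vec4 *r, const instruction *i)
{
    vec4 a = r[i->src0], b = r[i->src1];
    vec4 t = { a.x * b.x, a.y * b.y, a.z * b.z, a.w * b.w };
    r[i->dst] = t;
}

/* Dispatch table indexed by opcode. */
static const op_handler handlers[OP_COUNT] = { op_mov, op_add, op_mul };

/* Interpreter loop: one indirect call per instruction, per invocation. */
static void run_shader(const instruction *code, size_t count, vec4 *regs)
{
    for (size_t n = 0; n < count; n++) {
        assert(code[n].op < OP_COUNT);
        handlers[code[n].op](regs, &code[n]);
    }
}
```

Pre-generating code for a whole shader would, in principle, replace this loop with a single straight-line routine per shader, which is why it looks attractive here.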
However shaders end up being implemented, I think they should hook into the rest of the code as shown in the patch.