On 13.10.2009 at 22:50, Nikolay Sivov wrote:
Stefan Dösinger wrote:
This makes use of the gcc attribute added to gcc svn yesterday: http://gcc.gnu.org/ml/gcc-cvs/2009-10/msg00319.html
Hi, Stefan. Could you briefly explain, if you have time, what this is about? How is this hook going to work? I guess it's something like hard wrapping for forwarding API calls to something non-default, or am I wrong?
CC'ing Wine-devel, since other people will probably have the same question.
Steam has a feature which Valve calls "In Game Overlay". You can access certain Steam features, like chat or voice chat, from inside the game. This works without modifying the game. Other apps like Xfire have a similar feature.
To make the overlay work, Steam injects a DLL into the game before it starts the game. It creates the process suspended, allocates remote memory, and puts some DLL load code there. Then it changes the entrypoint to that allocated area, which calls LoadLibrary and calls the GameOverlayRenderer.dll DllMain. Then it calls the original game entrypoint. This part works fine already.
Now GameOverlayRenderer.dll has to intercept keyboard input, and it has to add its own graphics on top of the game graphics before the final result is sent to D3D. To do this, it tries to intercept calls like opengl32.wglSwapBuffers, IDirect3DSwapChain9::Present. Furthermore it hooks calls like LoadLibrary to find out if the game uses D3D9, and CreateProcess to catch children.
Steam's hooker doesn't blindly replace code bytes like other apps do. It disassembles the code and tries to preserve the instructions. It tries to free the first 5 bytes and places an unconditional immediate jump to its own code there. After its own code has done its business, it executes the replaced opcodes and jumps to the next instruction.
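To make the 5-byte figure concrete, here is a minimal sketch of how such a jump and the matching trampoline could be encoded. It works on plain byte buffers with pretend 32-bit addresses, so it is not Steam's actual code; encode_jmp_rel32 and build_trampoline are hypothetical names, and the sketch assumes the displaced instructions contain no relative operands, so a verbatim copy is valid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define JMP_REL32_LEN 5

/* Write "e9 rel32" into out[0..4]. 'from' is the address of the jump itself,
 * 'to' is the jump target. The displacement is relative to the end of the
 * jump instruction and wraps modulo 2^32, as on 32-bit x86. */
void encode_jmp_rel32(uint8_t out[JMP_REL32_LEN], uint32_t from, uint32_t to)
{
    uint32_t rel = to - (from + JMP_REL32_LEN);

    out[0] = 0xe9;
    out[1] = (uint8_t)(rel & 0xff);
    out[2] = (uint8_t)((rel >> 8) & 0xff);
    out[3] = (uint8_t)((rel >> 16) & 0xff);
    out[4] = (uint8_t)((rel >> 24) & 0xff);
}

/* Hypothetical trampoline builder: copy the instructions that the hook will
 * overwrite, then append a jump back to the first untouched instruction.
 * 'displaced_len' must cover whole instructions and at least 5 bytes.
 * Returns the number of bytes written to 'tramp'. */
size_t build_trampoline(uint8_t *tramp, uint32_t tramp_addr,
                        const uint8_t *func, uint32_t func_addr,
                        size_t displaced_len)
{
    assert(displaced_len >= JMP_REL32_LEN);

    memcpy(tramp, func, displaced_len);
    encode_jmp_rel32(tramp + displaced_len,
                     tramp_addr + (uint32_t)displaced_len,
                     func_addr + (uint32_t)displaced_len);
    return displaced_len + JMP_REL32_LEN;
}
```

The disassembler is needed precisely to find a displaced_len that ends on an instruction boundary, which is why an unknown opcode in the first few bytes makes the hook fail.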
Now Steam doesn't know all possible opcodes. For example, it doesn't know
89 e5 mov %esp, %ebp
which is the same as
8b ec mov %esp, %ebp
generated by MSVC.
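The two byte sequences are the same instruction because x86 has two opcodes for a register-to-register move, and the ModRM byte swaps roles between them. A small sketch of a decoder that treats both forms alike (decode_reg_mov and struct reg_mov are names made up for this example):

```c
#include <stdbool.h>
#include <stdint.h>

/* A decoded 32-bit register-to-register move. Register numbers use the x86
 * encoding: 0 = eax, 4 = esp, 5 = ebp, 7 = edi. */
struct reg_mov {
    uint8_t src;
    uint8_t dst;
};

/* Decode a 2-byte "mov reg, reg".
 *   89 /r  = MOV r/m32, r32 : the ModRM reg field is the source
 *   8b /r  = MOV r32, r/m32 : the ModRM reg field is the destination
 * Returns false for anything that is not a register-direct mov. */
bool decode_reg_mov(const uint8_t insn[2], struct reg_mov *out)
{
    uint8_t opcode = insn[0];
    uint8_t modrm = insn[1];
    uint8_t mod = modrm >> 6;
    uint8_t reg = (modrm >> 3) & 7;
    uint8_t rm = modrm & 7;

    if (mod != 3)           /* only the register-direct form is handled */
        return false;

    if (opcode == 0x89) {
        out->src = reg;
        out->dst = rm;
        return true;
    }
    if (opcode == 0x8b) {
        out->src = rm;
        out->dst = reg;
        return true;
    }
    return false;
}
```

Both 89 e5 and 8b ec decode to src = 4 (esp), dst = 5 (ebp). A hooker that only matches the 8b form byte for byte misses the 89 form.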
Furthermore, it can't deal with the LEA generated on OSX to align the stack, and it doesn't know an xor %eax, %eax used to clear the eax register, etc.
Starting with WinXP SP2, Microsoft added a 2-byte nop at the start of each function. It also adds 5 nops above the entrypoint. So many Win32 API functions look like this on Windows:
      90       nop                  ; cc int 3 in some functions
      90       nop                  ; cc int 3
      90       nop                  ; cc int 3
      90       nop                  ; cc int 3
      90       nop                  ; cc int 3
func: 8b ff    movl.s %edi, %edi
      55       pushl %ebp
      8b ec    movl.s %esp, %ebp
MS' idea is to replace the 8b ff with a short jump back by 5 bytes, into the padding above the entrypoint, and to replace the 5 nops there with a JMP dst. They use this for hotpatching, to apply security fixes to Windows DLLs without restarting the app. They have special compiler and linker switches to create hotpatchable images.
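As an illustration of that scheme, the following sketch applies such a patch to a byte buffer standing in for a loaded image. It is not Microsoft's implementation; is_hotpatchable and apply_hotpatch are hypothetical helpers, and image_base is a made-up load address used only to compute the jump displacement.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if the 5 bytes before func_off are all nop (90) or all int 3 (cc),
 * and the function itself starts with 8b ff (mov %edi, %edi). */
bool is_hotpatchable(const uint8_t *image, size_t size, size_t func_off)
{
    static const uint8_t pad_nop[5] = {0x90, 0x90, 0x90, 0x90, 0x90};
    static const uint8_t pad_int3[5] = {0xcc, 0xcc, 0xcc, 0xcc, 0xcc};
    const uint8_t *pad;

    if (func_off < 5 || func_off + 2 > size)
        return false;

    pad = image + func_off - 5;
    if (memcmp(pad, pad_nop, 5) != 0 && memcmp(pad, pad_int3, 5) != 0)
        return false;

    return image[func_off] == 0x8b && image[func_off + 1] == 0xff;
}

/* Redirect the function at func_off to 'target' (a 32-bit address).
 * Step 1: turn the 5 padding bytes into "e9 rel32" (JMP target).
 * Step 2: turn the 8b ff into "eb f9", a short jump to func - 5.
 * The image is left untouched if the layout is not the expected one. */
bool apply_hotpatch(uint8_t *image, size_t size, size_t func_off,
                    uint32_t image_base, uint32_t target)
{
    uint8_t *pad;
    uint32_t jmp_addr, rel;

    if (!is_hotpatchable(image, size, func_off))
        return false;

    pad = image + func_off - 5;
    jmp_addr = image_base + (uint32_t)func_off - 5;
    rel = target - (jmp_addr + 5);      /* relative to the end of the jmp */

    pad[0] = 0xe9;
    pad[1] = (uint8_t)(rel & 0xff);
    pad[2] = (uint8_t)((rel >> 8) & 0xff);
    pad[3] = (uint8_t)((rel >> 16) & 0xff);
    pad[4] = (uint8_t)((rel >> 24) & 0xff);

    /* Written last, so the entrypoint never jumps into a half-written pad.
     * eb f9 = jmp rel8 with displacement -7 from func + 2, i.e. func - 5. */
    image[func_off] = 0xeb;
    image[func_off + 1] = 0xf9;
    return true;
}
```

Since the 8b ff is a no-op, nothing has to be relocated or re-executed after the patch; the rest of the function body is simply never reached.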
Since Steam makes a fairly reasonable assumption when it assumes that a function starts with 8b ff 55 8b ec, Alexandre and I agreed that it makes sense to do this for all Wine functions that Steam and other apps try to hook. However, the MSVC compile switches are pretty obscure. They only add the 8b ff, they don't work together with /O2 (only /O1), and they break if you use the MSVC equivalent of -fomit-frame-pointer. So we and the gcc maintainer decided that we don't want to clone this feature 1:1 in gcc; instead we added a function attribute.
The ms_hook_prologue attribute will make sure the 5 bytes before and after the function entrypoint look like the above code. This happens no matter what other options, attributes etc. are used. If e.g. -fomit-frame-pointer is used, it generates this code:
      90 90 90 90 90
func: 8b ff    movl.s %edi, %edi
      55       pushl %ebp
      8b ec    movl.s %esp, %ebp
      ??       popl %ebp
      < other code >
      ret
The first 5 bytes are marked as unspec volatile, so the optimizer will not try to optimize them away, even at -O6. If %ebp is pushed as part of <other code> the optimizer realizes this though. So gcc first dumps the 8b ff 55 8b ec, and then bothers about reconciling that with what the app actually wants.
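For reference, using the attribute from C could look roughly like the sketch below. HOOKABLE and hookable_add are names invented for this example, not Wine's. The guard is an assumption about how one might keep the file building on compilers or targets that lack the attribute; there the macro expands to nothing and the function gets an ordinary prologue.

```c
/* Use ms_hook_prologue only where the compiler reports support for it.
 * __has_attribute is itself not available on old compilers, hence the
 * nested checks. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
# if defined(__has_attribute)
#  if __has_attribute(ms_hook_prologue)
#   define HOOKABLE __attribute__((ms_hook_prologue))
#  endif
# endif
#endif

#ifndef HOOKABLE
# define HOOKABLE /* attribute unavailable: ordinary prologue */
#endif

/* With the attribute in effect on 32-bit x86, this function is expected to
 * start with 8b ff 55 8b ec regardless of the optimization level. */
HOOKABLE int hookable_add(int a, int b)
{
    return a + b;
}
```

The attribute only changes the bytes at the entrypoint; callers see a normal function, e.g. hookable_add(2, 3) still returns 5.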
Currently only the 5 bytes after the function start are generated by gcc; it doesn't yet generate the nops. I am working on this, but it will take a bit longer because it needs some changes in the backend-frontend interaction when generating function alignment. Furthermore, there's a bug in binutils: when you explicitly align a function with .align X, 0x90 (i.e., you request that 0x90 bytes are used for alignment), it will optimize those nops into a single 3, 4, 5, ... byte nop.
Stefan Dösinger <stefan <at> codeweavers.com> writes:
Do I read this correctly that a double unlockrect call on a surface fails, while a double unlockrect call on a texture succeeds?
Yes
Here are some more test suggestions:
-> Create a texture, retrieve its surface, and call LockRect/UnlockRect on the surface. Does this show the surface or texture behavior?
-> What happens if you have a texture, retrieve a surface and mix locks on the surface and texture?
I'll see if I can put something together later this week.
-> What is the d3d9 behavior?
Test follows, was about to send