Note that we are no longer doing that in the latest versions; the file descriptor is only transferred once,
Fair enough... I see that ZwClose/NtClose isn't actually a problem (since unlike most other Zw* calls, it can't affect other processes).
Oh... I see how you're doing it... sending the handle->fd translation request to the server, which sends back a response saying you've already got it cached; then using dup() locally to emulate the old behaviour; and then closing the fd.
So this saves you the cost of the fd-transfer network packet. Though you still have to pay for the two context switches, which is my main contention.
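i.e. something like this on the client side, if I've read it right (the helper names below are my own guesses, not the real wineserver requests):

    /* Sketch of how I understand the client side: the fd crosses the
     * unix socket (via sendmsg()/SCM_RIGHTS) only the first time; on
     * later requests the server just says "you've already got it",
     * and we dup() the cached copy so the caller can close() it as
     * before. */
    int handle_to_unix_fd( int handle )              /* hypothetical name */
    {
        int cached = cached_fd_for_handle( handle ); /* hypothetical cache lookup */

        if (cached == -1)                            /* first time we see this handle */
            cached = receive_fd_from_server( handle );  /* one-off SCM_RIGHTS transfer */

        return dup( cached );                        /* caller closes this copy as usual */
    }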
and all further requests are done on a pipe which is faster than a socket.
True, but I'd have thought that the context switches involved are still a cost you can't get rid of so easily. Out of interest, how do you plan on doing the locking for ReadFile/WriteFile? Cache it locally? It's unfortunate, but you can't really make use of UNIX file locking, since it's only advisory and so doesn't actively stop read/write calls.
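e.g. a lock taken like this only matters to processes that also go through fcntl(); it doesn't stop anyone else just read()ing or write()ing the locked range:

    /* POSIX locks are advisory: this write lock on the first 512 bytes
     * does not block read()/write() from another process; it only
     * shows up to processes that also use F_SETLK/F_GETLK on the file. */
    #include <fcntl.h>
    #include <unistd.h>

    int lock_first_sector( int fd )
    {
        struct flock fl;

        fl.l_type   = F_WRLCK;
        fl.l_whence = SEEK_SET;
        fl.l_start  = 0;
        fl.l_len    = 512;
        return fcntl( fd, F_SETLK, &fl );  /* -1 with EACCES/EAGAIN on conflict */
    }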
The kernel module itself may be hard to do incrementally, but you should really consider reusing the existing server API so that your module can be plugged in easily. For instance your module entry points should be the same as the server requests, and use the same request structures.
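For instance, in rough outline (illustrative names only; the real request structures live in the server protocol headers):

    /* Outline only: the same request structure is used whether the
     * request goes to the userspace server or to the kernel module,
     * so the client-side code doesn't need to care which one it is
     * talking to. */
    struct get_handle_fd_request      /* stand-in for a real server request */
    {
        int handle;                   /* in:  object handle */
        int access;                   /* in:  desired access */
        int fd;                       /* out: unix fd, or "already cached" */
    };

    /* userspace server entry point */
    void req_get_handle_fd( struct get_handle_fd_request *req );

    /* kernel module entry point: same structure, reached through an
     * ioctl or syscall instead of the server socket */
    int wine_module_get_handle_fd( struct get_handle_fd_request *req );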
What? Miss the opportunity to implement "int 0x2e" directly? *grin*
Seriously, though, whilst this'd be a lot easier in many ways (and it would allow you to avoid the context-switch penalties), you wouldn't be able to take full advantage of the available support in the kernel, which is more capable than the standard UNIX userspace API suggests.
It'd still have to translate handles to fds for most file operation calls, and you'd still have the PE images soaking up a fair amount of memory.
If this is what you want, then it might be better done as a network protocol module that just pretends to be a wineserver, and supports the same read/write/sendmsg/recvmsg interface. (It'd have to be a network protocol to be able to get sendmsg/recvmsg calls):
    int serv = socket( AF_WINE, SOCK_STREAM, 0 );
    struct sockaddr addr = { AF_WINE };   /* sa_family = AF_WINE */
    connect( serv, &addr, sizeof(addr) );
I still think that it should be possible to improve that by a small kernel hack. It will never be as fast as doing everything in the kernel of course, but it may just be fast enough to avoid the need to reimplement the whole server.
If you want to suggest exactly what you'd like to see as a hack...
Have you measured how many dirty pages you can avoid with your change? It seems to me that in most cases, when the dll is loaded at its preferred address, the number of pages made dirty by the fixups should be quite small anyway.
As far as I've observed (I've got Win2000 available), most Windows DLLs use 512-byte (sector) alignment internally, _not_ 4096-byte (page) alignment, for their sections. Since mmap() requires a page-aligned file offset, the separate sections can't be mmap'd individually (or else they'd lose their required relative relationships):
    VIRTUAL_mmap()
    {
        ...
        /* mmap() failed; if this is because the file offset is not */
        /* page-aligned (EINVAL), or because the underlying filesystem */
        /* does not support mmap() (ENOEXEC,ENODEV), we do it by hand. */
        ...
    }
This appears to happen a lot. And then _all_ the pages in that section are dirty, irrespective of whether fixups are done or not.
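To make the failure concrete (this is just an illustration, not Wine code): mapping a section whose file offset is only sector-aligned is rejected outright, because mmap() insists on a page-multiple offset.

    /* Illustration: a section starting at file offset 0x200 (512-byte
     * aligned, but not 4096-byte aligned) cannot be mmap'd; the kernel
     * returns EINVAL and the loader has to fall back to read()ing the
     * data into anonymous memory, dirtying every page of the section. */
    #include <sys/mman.h>
    #include <errno.h>
    #include <stdio.h>

    void try_map_section( int fd )
    {
        void *p = mmap( NULL, 0x1000, PROT_READ, MAP_PRIVATE, fd, 0x200 );
        if (p == MAP_FAILED && errno == EINVAL)
            printf( "offset not page-aligned, falling back to read()\n" );
    }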
Also, since DLLs and EXEs are not compiled as PIC (the MSDEV compiler not having such an option as far as I can recall), the fixup tables usually seem to apply to just about every page in the code section.
I'll have to write a small program to collect some statistics:-)
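Probably something along these lines (a rough sketch, assuming the IMAGE_* definitions from Wine's winnt.h; PE32 only, error checking omitted): print the alignments from the optional header, and count the base-relocation blocks, each of which covers one 4K page with at least one fixup.

    /* Rough sketch: report a PE file's SectionAlignment/FileAlignment
     * and how many 4K pages have at least one base relocation (each
     * IMAGE_BASE_RELOCATION block in the relocation directory covers
     * one page).  Assumes Wine's winnt.h for the IMAGE_* types. */
    #include <stdio.h>
    #include <stdlib.h>
    #include "winnt.h"

    int main( int argc, char **argv )
    {
        FILE *f = fopen( argv[1], "rb" );
        unsigned char *image;
        long size;
        IMAGE_DOS_HEADER *dos;
        IMAGE_NT_HEADERS *nt;
        IMAGE_SECTION_HEADER *sec;
        DWORD reloc_rva, reloc_size, pos = 0, pages = 0;
        int i;

        fseek( f, 0, SEEK_END );
        size = ftell( f );
        image = malloc( size );
        fseek( f, 0, SEEK_SET );
        fread( image, 1, size, f );
        fclose( f );

        dos = (IMAGE_DOS_HEADER *)image;
        nt  = (IMAGE_NT_HEADERS *)(image + dos->e_lfanew);

        printf( "SectionAlignment=0x%lx FileAlignment=0x%lx\n",
                (unsigned long)nt->OptionalHeader.SectionAlignment,
                (unsigned long)nt->OptionalHeader.FileAlignment );

        reloc_rva  = nt->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC].VirtualAddress;
        reloc_size = nt->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC].Size;

        /* find the section that holds the relocation directory */
        sec = (IMAGE_SECTION_HEADER *)((char *)&nt->OptionalHeader +
                                       nt->FileHeader.SizeOfOptionalHeader);
        for (i = 0; i < nt->FileHeader.NumberOfSections; i++, sec++)
            if (reloc_rva >= sec->VirtualAddress &&
                reloc_rva < sec->VirtualAddress + sec->SizeOfRawData) break;

        while (pos < reloc_size)
        {
            IMAGE_BASE_RELOCATION *rel = (IMAGE_BASE_RELOCATION *)
                (image + sec->PointerToRawData + (reloc_rva - sec->VirtualAddress) + pos);
            if (!rel->SizeOfBlock) break;
            pages++;
            pos += rel->SizeOfBlock;
        }
        printf( "%lu pages have at least one fixup\n", (unsigned long)pages );
        return 0;
    }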
As for the DLL being loaded at its preferred address, the kernel module skips the fixup code entirely and doesn't even consider trying to perform it.
Plus pages that have been altered by the fixup code are actually marked _clean_ by the VM subsystem, and can thus be simply discarded when physical memory needs to be reclaimed.
David