David Howells <dhowells@cambridge.redhat.com> writes:
So this saves you the cost of the fd transfer net packet. Though you still have to do the two context switches, which is my main contention.
I suspect we are doing more than two switches (though I haven't proved it), which is why I think there is a margin for improvement. You'll obviously always have the context switch cost unless everything is in the kernel.
True, but I'd have thought that the context switches involved are still a cost you can't get rid of so easily. Out of interest, how do you plan on doing the locking stuff for ReadFile/WriteFile? Cache it locally? It is unfortunate, but you can't really make use of UNIX file locking, since it is mostly advisory and as such doesn't actively stop read/write calls.
Yes, we'll need to store the locks in the server and check them before each read/write (and probably also release them afterwards if necessary). There may be some optimisations possible, but we should probably do it the easy way first.
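A minimal sketch of the "easy way" described above, in which the server keeps a list of byte-range locks and consults it before each read or write. Everything here is hypothetical: the `range_lock` structure, the `io_allowed` name and the policy (an exclusive lock blocks other owners, a shared lock blocks all writes) are assumptions for illustration, not the server's actual data structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-file lock record kept by the server (not Wine's real structure). */
struct range_lock {
    uint64_t start;   /* first locked byte */
    uint64_t len;     /* number of locked bytes */
    int      owner;   /* id of the process holding the lock */
    bool     shared;  /* true: shared lock, false: exclusive lock */
};

/* True if [a_start, a_start+a_len) and [b_start, b_start+b_len) overlap.
 * Written with subtractions so that start + len cannot overflow. */
static bool ranges_overlap(uint64_t a_start, uint64_t a_len,
                           uint64_t b_start, uint64_t b_len)
{
    if (a_len == 0 || b_len == 0) return false;
    if (a_start >= b_start) return a_start - b_start < b_len;
    return b_start - a_start < a_len;
}

/* Check done by the server before performing a read or write on behalf of
 * `owner`. Assumed policy: a shared lock blocks every write; an exclusive
 * lock blocks reads and writes from any other owner. */
static bool io_allowed(const struct range_lock *locks, size_t count,
                       int owner, uint64_t start, uint64_t len, bool is_write)
{
    for (size_t i = 0; i < count; i++) {
        const struct range_lock *l = &locks[i];
        if (!ranges_overlap(l->start, l->len, start, len)) continue;
        if (l->shared) {
            if (is_write) return false;
        } else if (l->owner != owner) {
            return false;
        }
    }
    return true;
}
```

The linear scan is the simple version; an interval tree or a per-client cache would be among the optimisations left for later.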
Seriously, though, whilst this'd be a lot easier in many ways (and it would allow you to avoid the context-switch penalties), you wouldn't be able to take full advantage of the available support in the kernel, which is more capable than the standard UNIX userspace API suggests.
I don't see why. I'm not suggesting keeping the current socket stuff, just reusing the structures. So basically instead of passing the address of the stack arguments (which is really ugly IMO) to your ioctl, you pass one of the server request structures. This allows your changes to be localized to wine_server_call and doesn't require changing any of the routines that make server calls. Obviously you'd need some more changes for a few calls like ReadFile/WriteFile, but most operations could switch to your mechanism without needing any change. You simply cannot require people to recompile all of Wine to use your module.
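To illustrate the point about keeping the change local, here is a rough sketch in which callers fill in a request structure and a single call function picks the transport. All names (`demo_request`, `demo_server_call`, the two transports) are made up for this example and the transports are stand-ins that only compute a value; the real request structures and the real ioctl interface would look different.

```c
#include <stddef.h>

/* Hypothetical request block; the real server request structures differ. */
struct demo_request {
    int code;        /* which server operation is wanted */
    int arg;         /* single input argument, for illustration */
    int result;      /* reply value filled in by the transport */
    int handled_by;  /* 0 = socket stand-in, 1 = kernel stand-in */
};

typedef void (*demo_transport)(struct demo_request *req);

/* Stand-in for the existing socket path to the user-space server. */
static void transport_socket(struct demo_request *req)
{
    req->result = req->arg + 1;
    req->handled_by = 0;
}

/* Stand-in for a kernel module reached through an ioctl. */
static void transport_kernel(struct demo_request *req)
{
    req->result = req->arg + 1;
    req->handled_by = 1;
}

static demo_transport active_transport = transport_socket;

/* Chosen once at startup, e.g. depending on whether the module is loaded. */
static void demo_select_transport(int use_kernel)
{
    active_transport = use_kernel ? transport_kernel : transport_socket;
}

/* The only function that knows which mechanism is in use. */
static void demo_server_call(struct demo_request *req)
{
    active_transport(req);
}

/* A routine that makes a server call; it is identical for both transports. */
static int demo_increment(int value)
{
    struct demo_request req = { 1, value, 0, -1 };
    demo_server_call(&req);
    return req.result;
}
```

With this shape, `demo_increment` and every routine like it stays binary-identical whichever transport is selected, which is what makes a recompile of the callers unnecessary.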
I still think that it should be possible to improve that by a small kernel hack. It will never be as fast as doing everything in the kernel of course, but it may just be fast enough to avoid the need to reimplement the whole server.
If you want to suggest exactly what you'd like to see as a hack...
I don't know exactly, there are many ways of doing it; you can have a specialized FIFO, a network protocol, an ioctl, etc. Basically any mechanism that ensures that we do the strict minimum number of context switches and schedule() calls for a server call. And probably also a way to transfer chunks of memory from the client address space so that we don't need the shared memory area.
As far as I've observed (I've got Win2000 available), most Windows DLLs have 512-byte (sector) alignment internally, _not_ 4096-byte (page) alignment for the sections. This means that the separate sections can't be mmap'd (or else they'd lose their required relative relationships).
Actually the file alignment doesn't need to be 4096, it needs to match the filesystem block size. On a FAT filesystem the block size is 512 so Linux will happily mmap every section. On a 1k-block ext2 fs it will be able to mmap about 50% of them.
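The arithmetic behind those figures can be checked with a few lines of C. This only counts which section file offsets are multiples of a given block size; whether the kernel really accepts such a mapping is taken from the statement above, not verified here. The sample offsets in the usage note are invented, chosen to be 512-byte aligned like the DLLs under discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count how many section file offsets are multiples of block_size,
 * i.e. how many sections start exactly on a filesystem block boundary. */
static size_t count_block_aligned(const uint32_t *offsets, size_t n,
                                  uint32_t block_size)
{
    size_t aligned = 0;

    assert(block_size != 0);  /* a zero block size makes no sense */
    for (size_t i = 0; i < n; i++)
        if (offsets[i] % block_size == 0)
            aligned++;
    return aligned;
}
```

For a hypothetical DLL with sections at file offsets 0x400, 0x600, 0x1000, 0x1200, 0x1a00 and 0x2000, all six are aligned for a 512-byte block size, three of the six for 1024 bytes, and only two for 4096 bytes. Since every 512-aligned offset is either an even or an odd multiple of 512, roughly half of them land on a 1k boundary.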
Also, since DLLs and EXEs are not compiled as PIC (the MSDEV compiler not having such an option as far as I can recall), the fixup tables usually seem to apply to just about every page in the code section.
Only if the DLL cannot be loaded at the preferred address, which shouldn't happen too often. I'm not saying your patch is useless, but I doubt the gain is as large as you seem to think.
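A simplified sketch of why the preferred address matters: fixups add the difference between the actual and preferred load address to each listed location, so when the difference is zero nothing is written and the mapped pages stay clean. The function and its flat list of offsets are assumptions for illustration; the real PE relocation table is organised in per-page blocks with typed entries.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Apply 32-bit absolute fixups to an image loaded at actual_base.
 * Returns the number of words patched. Values are read and written in
 * host byte order, which matches the PE format on a little-endian host. */
static size_t apply_fixups(uint8_t *image, size_t image_size,
                           const uint32_t *fixup_offsets, size_t n,
                           uint32_t preferred_base, uint32_t actual_base)
{
    uint32_t delta = actual_base - preferred_base;  /* wraps modulo 2^32 */
    size_t patched = 0;

    /* Loaded at the preferred address: no page needs to be modified. */
    if (delta == 0 || image_size < sizeof(uint32_t))
        return 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t value;

        /* Skip entries that would touch memory outside the image. */
        if (fixup_offsets[i] > image_size - sizeof(value))
            continue;
        memcpy(&value, image + fixup_offsets[i], sizeof(value));
        value += delta;
        memcpy(image + fixup_offsets[i], &value, sizeof(value));
        patched++;
    }
    return patched;
}
```

Only in the nonzero-delta case do the touched pages become private copies, which is where sharing between processes is lost.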