Gavriel State gav@transgaming.com wrote:
> As it stands, your approach won't be useful for general Wine usage until you've got *everything* done.
True. But I think there are valid reasons for doing it this way.
One major problem is handles. Either the kernel must allocate all handles, or userspace must allocate all handles. Take something like the implementation of DuplicateHandle() - dead easy really in kernel space, including duplication to another process.
If userspace tells the kernel to use handle X for a mutex, say, and handle Y for a semaphore, then userspace still has to go through atomic management code to (a) allocate a handle, and (b) access a handle. And to implement cross-process duplication, you either need some sort of interrupt & IPC mechanism, or an external process (the wineserver). This being the case, you lose any gains by putting the stuff in-kernel.
Alternatively, if userspace is allowed to implement arbitrary handles that are allocated by the kernel, then the kernel waiting mechanism can get a little tricky. However, that said, for non-waitable objects, this might not be so bad. In fact, I'm planning on implementing the registry access handles this way and may well do some of the process and thread control this way too.
Furthermore, the kernel would have to be given object handles, not object pointers; otherwise you have a gaping security hole.
And another major problem is context switches... they are horribly expensive really. They currently give Wine at least 20 times the system call latency that the kernel method is capable of.
> And then there will be boatloads of debugging to do.
Perhaps not as much as you think... I'm doing as much debugging as I can as I go along, just making sure the kernel objects work as I'd expect (not necessarily the same thing, true, as having "compatible" behaviour).
> If there's any way that you can implement your kernel module more within the context of the existing server architecture - replacing objects in piece-by-piece fashion rather than all at once - that might make it easier to adopt.
See above for why the piece-by-piece method is difficult.
One alternative would be to invent a new network protocol (say AF_WINE), but that again requires a complete implementation before it is really useful.
> For example, you might try implementing the core object/handle management and waiting code in the kernel module, and have the wine server rely on the kernel module for that low-level functionality.
By this, do you mean actually having the wineserver process talk to the kernel module on behalf of the Wine application?
> Waiting for objects would be implemented on the client side through a call to the kernel module.
That's what I'm currently doing, though it's not fully implemented yet.
> When something needs to be done with an object, we would call either the wineserver or the kernel module, depending on how that object is implemented.
> For example, you could do mutexes entirely within the kernel module, but leave file objects on the wine-server side initially.
My main gripe is the slow speed of access to files... Every Read/WriteFile goes to the wineserver to convert the handle into a file descriptor and to check for locking. The FD is then passed back over a UNIX domain socket, used once and then closed.
I suspect it can't really be done otherwise, particularly if ZwCloseObject (or whatever it is called) is implemented, since this allows handles in another process to be closed.
Actually, I've done a fair amount of the file object stuff... Most of it involves mapping down to a "struct file *", which is how the kernel views files, and then invoking the appropriate kernel method.
> In my experience Alexandre far prefers incremental change to the kind of approach you're taking. Using an incremental approach will improve the chances that your code makes it into Wine at some point.
Hmmm... It's difficult to determine how to do it incrementally without making for even more work, but I think I know what you mean.
> One thing I've been wondering that you might be able to answer is this: exactly why is the current Wine Server architecture so slow? Is it just the context switching?
Context switching is the main element of it. Going to the wineserver and back again just for a ReadFile() call or a Wait*() function incurs a fairly serious penalty (particularly on an X86, I think). Plus there's no requirement for the kernel to pass the remains of your timeslice to the wineserver and back again. Also, you have to bundle lots of data through AF_UNIX network packets and/or copy lots of data into and out of _shared_ memory without killing other threads.
One of the problems with the context switch is that you have to flush all the CPU caches, muck around with the MMU and execute scheduling algorithms.
> Is it that the kernel isn't giving the wineserver a high enough priority once the client blocks after having written to the socket?
I don't think priority has anything much to do with it. A more convenient scheduling algorithm might help a little, though.
> Is it other socket overhead (routing, perhaps)? Simply speeding up the communications path between the clients and the server would remove the need for most of the kernel level services.
I don't think so... To communicate with the wineserver you have to use some sort of waitable UNIX object (ie: a socket, a pipe, or a SYSV semaphore or message), do busy waiting (& kill your CPU), or send signals (context switch _and_ signal overhead).
To do it without a wineserver (ie: using shared memory) is also tricky... You have to be able to, for instance, recover from processes going away without releasing mutexes.
> Also, just FYI, the PE image mapping work is nice, but isn't likely to affect speed all that much, since Wine can just mmap in most PE images created with recent compilers. WordPerfect, for example, launches in around 10-12 seconds on my machine (down from 45-60 seconds before the mmapping). I can't imagine that doing the fixups when paging instead would do too much better.
Look at the VM size of something like MS Word, and think of not having to allocate buffers to store all those DLLs; buffers that would otherwise eat a massive chunk out of your machine's total VM.
Plus, launching should be even quicker, because fixups only have to be done when they're actually needed. You certainly don't have to go through and fix up a few tens of megs of DLLs up front.
Cheers, David