Hello all,
There is a Windows 98 program, a game called Nuclear Strike, which wants to do some amount of direct VGA access. Part of this is port I/O, which naturally throws SIGILL that we can trivially catch and emulate in Wine. The other part is direct access to the video memory at 0xa0000, which in general isn't a problem to catch and virtualize as well.
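For concreteness, the emulation has to decode the faulting instruction to know what to emulate and how far to advance the instruction pointer. A minimal sketch of recognizing the single-opcode x86 port-I/O encodings (this is my illustration, not Wine's actual code; the helper name is mine):

```c
#include <stddef.h>

/* Return the instruction length if the bytes at "code" are one of the
 * plain x86 port I/O instructions, or 0 if they are not.
 * E4/E5 = in al/eax, imm8 and E6/E7 = out imm8, al/eax (opcode plus an
 * immediate byte); EC/ED/EE/EF = in/out via dx (opcode only).
 * String forms (6C-6F) and operand-size prefixes are omitted for brevity. */
static size_t port_io_insn_len(const unsigned char *code)
{
    switch (code[0])
    {
    case 0xe4: case 0xe5: /* in al/eax, imm8 */
    case 0xe6: case 0xe7: /* out imm8, al/eax */
        return 2;
    case 0xec: case 0xed: /* in al/eax, dx */
    case 0xee: case 0xef: /* out dx, al/eax */
        return 1;
    default:
        return 0;
    }
}
```

On a fault the handler would emulate the access and then skip the instruction by the returned length.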
However, this program is a bit creative about how it accesses that memory; instead of just writing to 0xa0000 directly, it looks up a segment descriptor whose base is at 0xa0000 and then uses the %es override to write bytes. In pseudo-C, what it does is:
int get_vga_selector()
{
    sgdt(&gdt_size, &gdt_ptr);
    sldt(&ldt_segment);
    ++gdt_size;

    descriptor = gdt_ptr;
    while (descriptor->base != 0xa0000)
    {
        ++descriptor;
        gdt_size -= sizeof(*descriptor);
        if (!gdt_size)
            break;
    }
    if (gdt_size)
        return (descriptor - gdt_ptr) << 3;

    ldt_ptr = gdt_ptr[ldt_segment >> 3]->base;
    ldt_size = gdt_ptr[ldt_segment >> 3]->limit + 1;
    descriptor = ldt_ptr;
    while (descriptor->base != 0xa0000)
    {
        ++descriptor;
        ldt_size -= sizeof(*descriptor);
        if (!ldt_size)
            break;
    }
    if (ldt_size)
        return (descriptor - ldt_ptr) << 3;

    return 0;
}
Currently we emulate IDT access. On a read fault, we execute sidt ourselves, check if the read address falls within the IDT, and return some dummy data from the exception handler if it does [1]. We can easily enough implement GDT access as well this way, and there is even an out-of-tree patch written some years ago that does this, and helps the game run.
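The shape of that check, generalized from the IDT to any descriptor table, is roughly the following (a sketch rather than the actual instr.c code; the names are mine):

```c
#include <stdint.h>

/* The base/limit pair as stored to memory by sidt or sgdt. */
struct table_reg
{
    uint16_t limit;  /* size of the table in bytes, minus one */
    uintptr_t base;  /* linear address of the table */
};

/* Decide whether a faulting read falls inside the descriptor table and
 * should therefore be virtualized instead of raising an exception. */
static int fault_hits_table(const struct table_reg *reg, uintptr_t addr)
{
    return addr >= reg->base && addr <= reg->base + reg->limit;
}
```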
However, there are two problems that I have observed or anticipated:
(1) On systems with UMIP, the kernel emulates sgdt instructions and returns a consistent address which we can guarantee is invalid. However, it also returns a size of zero. The program doesn't expect this (cf. the way the loop is written above) and I believe will effectively loop forever in that case, or until it finds the VGA selector or hits invalid memory.
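To make that concrete, here is a standalone simulation of just the counter logic from the pseudo-code above, assuming 8-byte descriptors and an unsigned byte counter. With a limit of zero the counter starts at 1, wraps after the first descriptor, and stays congruent to 1 mod 8 forever, so the zero test never fires:

```c
#include <stdint.h>

/* Simulate the termination logic of the game's GDT scan for a given
 * sgdt-reported limit. Returns the number of descriptors examined before
 * the byte counter reaches zero, capped at max_iters to stand in for
 * "runs until it faults on unmapped memory". */
static unsigned long scan_iterations(uint32_t limit, unsigned long max_iters)
{
    uint32_t size = limit + 1;  /* the game's ++gdt_size */
    unsigned long iters = 0;

    while (iters < max_iters)
    {
        ++iters;
        size -= 8;  /* sizeof(*descriptor) */
        if (!size)
            break;
    }
    return iters;
}
```

A sane limit such as 0x7f stops after 16 descriptors; a limit of 0 runs to the cap.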
I see two obvious ways to fix this: either adjust the size of the fake kernel GDT, or provide a switch to stop emulating and let Wine handle it. The latter may very well be a more sustainable option in the long term (although I'll admit I can't immediately come up with a reason why, other than "we might need to raise the size yet again").
Does anyone have opinions on this particular topic? I can look into writing a patch but I'm not sure what the best approach is.
(2) On 64-bit systems without UMIP, sgdt returns a truncated address when in 32-bit mode. This truncated address in practice might point anywhere in the address space, including to valid memory.
In order to fix this, we would need the kernel to guarantee that the bottom 32 bits of the GDT base point into memory that is never accessible. This is relatively easy to achieve ourselves by simply mapping those pages as no-access, but it also means those pages can't overlap something we need; we already go to pains to make sure that certain parts of the address space are free. Broadly, anything above the 2G boundary *should* be okay, though. Is this feasible?
We could also just decide we don't care about systems without UMIP, but that seems a bit unfortunate; it's not that old of a feature. But I also have no idea how hard it would be to make this kind of a guarantee on the kernel side.
This is also, theoretically, a problem for the IDT, except that on the machines I've tested, the IDT is always at 0xfffffe0000000000. That's not great either (it's certainly caused some weirdness and confusion when debugging, since we unexpectedly catch unrelated null pointer accesses), but it seems to work in practice.
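For illustration, the truncation that makes this masquerade as a null-pointer catch:

```c
#include <stdint.h>

/* The 64-bit IDT base observed above, truncated the way a 32-bit sidt
 * result is: the low 32 bits are all zero, so bounds checks against the
 * truncated base match accesses near address 0, i.e. null pointer
 * dereferences. */
static uint32_t truncated_idt_base(void)
{
    const uint64_t idt_base = 0xfffffe0000000000ull;
    return (uint32_t)idt_base;
}
```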
--Zeb
[1] https://source.winehq.org/git/wine.git/blob/HEAD:/dlls/krnl386.exe16/instr.c#l702
On December 27, 2023 2:20:37 PM PST, Elizabeth Figura zfigura@codeweavers.com wrote:
A prctl() to set the UMIP-emulated return values or disable it (giving SIGILL) would be easy enough.
For the non-UMIP case, and probably for a lot of other corner cases like relying on certain magic selector values and what not, the best option really would be to wrap the code in a lightweight KVM container. I do *not* mean running the Qemu user space part of KVM; instead have Wine interface with /dev/kvm directly.
Non-KVM-capable hardware is basically historic at this point.
On Wednesday, December 27, 2023 5:58:19 PM CST H. Peter Anvin wrote:
Sorry for the late response; I've been researching what would be necessary to use KVM (plus I made the poor choice of sending this during the holiday season...)
I'm concerned that KVM is going to be difficult or even intractable. Here are some of the problems that I (perhaps incorrectly) understand:
* As I am led to understand, there can only be one hypervisor on the machine at a time, and KVM has a hard limit on the number of vCPUs.
The obvious way to use KVM for Wine is to make each (guest) thread a vCPU. That will, at the very least, run into the vCPU limit. In order to avoid that we'd need to ship a whole scheduler, which is concerning. That's a huge component to ship and a huge burden to keep updated. It also means we need to hoist *all* of the IPC and synchronization code into the guest, which will take an enormous amount of work.
Moreover, because there can only be one hypervisor, and Wine is a multi-process beast, we would suddenly need to throw every process into the same VM. That has unfortunate implications for isolation (it's been a dream for years that we'd be able to share a single Wine "VM" between multiple users), and it complicates memory management (though perhaps not terribly?). It also means you can only have one Wine VM at a time, and can't use Wine at the same time as a "real" VM; neither of those restrictions currently exists.
And it's not even like we can refactor: we'd have to rewrite tons of code to work inside a VM, while also keeping the old code around for the cases where we don't have a VM and want to delegate scheduling to the host OS.
* Besides scheduling, we need to exit the VM every time we would normally call into Unix code, which in practice is every time that the application does an NT syscall, or uses a library which we delegate to the host (including e.g. GPU, multimedia, audio...)
I'm concerned that this will be very expensive. Most VM users don't need to exit on every syscall. While I haven't tested KVM, I think some other Wine developers actually did a similar experiment using a hypervisor to solve some other problem (related to 32-bit support on Mac OS), and exiting the hypervisor was prohibitively slow.
Alternatively we ship *more* components to reimplement these things inside the VM (e.g. virgl/venus for GPU hardware, other virtio bits for interacting with e.g. multimedia hardware? enough of a cache to make block I/O reasonably fast, a few layers of networking code...), which looks more and more ugly.
If nothing else, it's a huge hammer to fix this one problem for an application which doesn't even currently work in Wine, *and* which isn't even a problem on sufficiently new hardware (and to fix other GDT problems which are only theoretical at this point).
--Zeb
On Tuesday, 2 January 2024 at 22:53:26 EAT, Elizabeth Figura wrote:
I'm concerned that this will be very expensive. Most VM users don't need to exit on every syscall. While I haven't tested KVM, I think some other Wine developers actually did a similar experiment using a hypervisor to solve some other problem (related to 32-bit support on Mac OS), and exiting the hypervisor was prohibitively slow.
Just to add to this point: Ken Thomases and I experimented with this on Mac OS, and as Zeb said, we found it to be unworkably slow. In the d3d games we tested, the performance of the hypervisor plus lots of exits was approximately the same as running all 32-bit guest code inside qemu's software CPU emulation, or about 5% of the performance of native 32-bit Mac processes (when they still existed). From what we could tell, the cost was imposed by the CPU and not by MacOS's very lightweight hypervisor API.
There are obviously differences between Mac and Linux, and with Wine's new syscalls we probably don't need to exit as often as my Hangover wrapper DLLs did, but combined with the other reasons Zeb listed, I don't think running Wine inside KVM is ever going to be realistic.