On January 4, 2024 6:47:04 PM PST, Andrew Cooper andrew.cooper3@citrix.com wrote:
On 05/01/2024 1:02 am, H. Peter Anvin wrote:
Note that there is no fundamental reason you cannot run the Unix user space code inside the VM container, too; you only need to vmexit on an actual system call.
I know this is going on a tangent, but getting a VMExit on the SYSCALL instruction is surprisingly difficult.
The "easy" way is to hide EFER.SCE behind the guest's back, intercept #UD and emulate both the SYSCALL and SYSRET instructions. It's slow, but it works.
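As a rough illustration of that #UD path, here is a minimal sketch of the SYSCALL half only, assuming a 64-bit guest. `struct vcpu`, `hw_efer()` and `emulate_syscall()` are hypothetical names and not any hypervisor's real API. The sketch ignores segment descriptor caches, compatibility mode and SYSRET.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EFER_SCE (1ULL << 0)      /* EFER.SCE: SYSCALL enable */

/* Hypothetical vCPU state; a real hypervisor keeps this in its VMCS/VMCB. */
struct vcpu {
    uint64_t rip, rcx, r11, rflags;
    uint16_t cs, ss;
    unsigned cpl;
    uint64_t guest_efer;          /* EFER value the guest believes it has */
    uint64_t star, lstar, fmask;  /* guest's SYSCALL MSRs */
};

/* EFER actually loaded while the guest runs: SCE hidden so SYSCALL raises #UD. */
static inline uint64_t hw_efer(const struct vcpu *v)
{
    return v->guest_efer & ~EFER_SCE;
}

/*
 * Called on an intercepted #UD with the instruction bytes at guest RIP.
 * Returns true if the fault was a SYSCALL and has been emulated, false if
 * the #UD should be reflected back into the guest.
 */
static inline bool emulate_syscall(struct vcpu *v, const uint8_t *insn,
                                   unsigned len)
{
    assert(v && insn);

    if (len < 2 || insn[0] != 0x0f || insn[1] != 0x05)
        return false;             /* not the SYSCALL opcode (0F 05) */
    if (!(v->guest_efer & EFER_SCE))
        return false;             /* guest left SYSCALL disabled: genuine #UD */

    v->rcx = v->rip + 2;          /* return address, past the 2-byte opcode */
    v->r11 = v->rflags;           /* saved flags */
    v->rflags &= ~v->fmask;       /* clear the bits selected by the flag mask MSR */
    v->rip = v->lstar;            /* 64-bit entry point */
    v->cs = (uint16_t)((v->star >> 32) & 0xfffc);  /* STAR[47:32], RPL forced to 0 */
    v->ss = (uint16_t)(v->cs + 8);
    v->cpl = 0;
    return true;
}
```

Every SYSCALL costs a fault, a VMExit, an instruction fetch and decode, and a VMEntry, which is where the slowness comes from.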
However, FRED completely prohibits tricks like this, because what you cannot reasonably do is clear CR4.FRED behind the back of a guest kernel. You'd have to intercept and emulate all event sources in order to catch SYSCALL.
I raised this as a concern during early review, but Intel has no official feature to take a VMExit on privilege change, and FRED (rightly) wasn't an appropriate vehicle to add such a feature. So it was not considered a problem that the FRED design would break the unofficial ways that people were using to intercept/monitor/etc system calls.
~Andrew
P.S. Yes, there are more adventurous tricks like injecting a thunk into the guest kernel and editing MSR_LSTAR behind the guest's back. In principle a similar trick works with FRED, but in order to do this to Windows, you also need to hook PatchGuard to blind it to the thunk, and this is horribly invasive.
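For the MSR_LSTAR half of that trick, the bookkeeping might look roughly like the sketch below. It assumes RDMSR/WRMSR of MSR_LSTAR are intercepted; `struct lstar_shadow` and both handlers are hypothetical names. Injecting the thunk itself, and hiding it from the guest, is the hard part and is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_LSTAR 0xc0000082u     /* 64-bit SYSCALL entry point MSR */

/* Hypothetical per-vCPU shadow state. */
struct lstar_shadow {
    uint64_t guest_value;         /* what the guest wrote and expects to read back */
    uint64_t thunk_addr;          /* assumed address of the injected thunk */
};

/*
 * Intercepted WRMSR. For MSR_LSTAR, remember the guest's value and report
 * the thunk address as the value to load into hardware instead.
 * Returns false for MSRs this sketch does not handle.
 */
static inline bool on_wrmsr(struct lstar_shadow *s, uint32_t msr,
                            uint64_t val, uint64_t *hw_val)
{
    if (msr != MSR_LSTAR)
        return false;
    s->guest_value = val;
    *hw_val = s->thunk_addr;
    return true;
}

/* Intercepted RDMSR. The guest only ever sees the value it wrote itself. */
static inline bool on_rdmsr(const struct lstar_shadow *s, uint32_t msr,
                            uint64_t *val)
{
    if (msr != MSR_LSTAR)
        return false;
    *val = s->guest_value;
    return true;
}
```

The thunk would then be responsible for notifying the hypervisor and jumping on to `guest_value`.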
*In this case* it shouldn't be a problem, since the "guest operating system" would be virtually nonexistent and entirely puppeted by Wine.