After some testing, I found that the patch's behaviour does not match that of Windows.
The attached test program does the following:
1. Set R10 to 0xdeadbeef5a5a5a5a and R11 to 0x0123456789ABCDEF.
2. Generate a page fault.
3. Set R10 to 0xcafebabea5a5a5a5 and R11 to 0xfedcba9876543210.
4. Issue a system call that pauses the current thread.
5. Switch to another thread, and dump the previous thread's registers.
6. Set all bits in EFLAGS to 1 (0xffffffffffffffff).
7. Dump the previous thread's registers again.
Its output on Windows 10 (20H2) is:
SharedUserData.SystemCall = 0000000000
Before set context:
  EFlags = 0x0000000000000246
  R11 = 0x0123456789abcdef
  RIP = 0x00007ffa09e504d4
  RCX = 0x0000000000000088
  RSP = 0x0000000000ccfef8
  R10 = 0xdeadbeef5a5a5a5a
After set context:
  EFlags = 0x0000000000210fd5
  R11 = 0x0123456789abcdef
  RIP = 0x00007ffa09e504d4
  RCX = 0x0000000000000088
  RSP = 0x0000000000ccfef8
  R10 = 0xdeadbeef5a5a5a5a
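To make the two EFlags values easier to compare, here is a small Python sketch (not part of the attached test program) that lists which bits are set in each. The flag names follow the usual x86-64 RFLAGS layout; the helper names `set_bits` and `describe` are made up for this illustration.

```python
# Names of the architecturally defined RFLAGS bits; other bits are reserved.
FLAG_NAMES = {
    0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF", 8: "TF", 9: "IF",
    10: "DF", 11: "OF", 12: "IOPL0", 13: "IOPL1", 14: "NT", 16: "RF",
    17: "VM", 18: "AC", 19: "VIF", 20: "VIP", 21: "ID",
}


def set_bits(value):
    """Return the positions of all bits that are 1 in a 64-bit value."""
    return [bit for bit in range(64) if (value >> bit) & 1]


def describe(value):
    """Return flag names (or 'bitN' for reserved bits) for every set bit."""
    return [FLAG_NAMES.get(bit, f"bit{bit}") for bit in set_bits(value)]


before = 0x0000000000000246  # EFlags in the first dump
after = 0x0000000000210FD5   # EFlags after requesting all ones

print("before:", describe(before))
print("after: ", describe(after))
print("bit 1 before:", (before >> 1) & 1, "after:", (after >> 1) & 1)
```

Since the test requested all ones, the bits that survive in the second value are, on this system at least, the ones that were allowed through when the context was set.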
From this we can observe the following:
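A. KiFastSystemCall doesn't clear bit 1 in R11 by itself. Rather, it's the job of NtSetContextThread: in the dump, EFlags has bit 1 set before the context is set (0x246) and clear afterwards (0x210fd5).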
B. KiFastSystemCall ignores registers clobbered by the SYSCALL instruction. It does try to pretend that the 1st argument is being passed in RCX, which leaves the actual 1st argument register (R10) unmodified in CONTEXT. (Also note that this implies the presence of a flag in the real kernel that records whether R10/R11 hold valid values. Otherwise, the kernel would be unable to use SYSRET, since R11 != RFLAGS, etc.)
C. Other entry points into the kernel (e.g. a page fault) do record all registers. These values are preserved until the next time the thread switches to kernel mode.
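Observations B and C can be summarised with a toy model. This is only a sketch of the behaviour inferred from the dump, not kernel code; the class `SavedContext` and its methods are hypothetical names. The RCX value 0x88 is copied from the dump, and treating it as the 1st argument is an assumption.

```python
class SavedContext:
    """Toy stand-in for the register values a thread's CONTEXT reports."""

    def __init__(self):
        self.regs = {"rcx": 0, "r10": 0, "r11": 0}

    def enter_via_fault(self, cpu_regs):
        # Observation C: a fault entry records every register.
        self.regs.update(cpu_regs)

    def enter_via_syscall(self, first_arg):
        # Observation B: the R10 and R11 slots are not updated; only the
        # RCX slot is written, as if the 1st argument had been passed there.
        self.regs["rcx"] = first_arg


ctx = SavedContext()

# Steps 1-2: register values at the page fault (RCX is arbitrary here).
ctx.enter_via_fault(
    {"rcx": 0, "r10": 0xDEADBEEF5A5A5A5A, "r11": 0x0123456789ABCDEF}
)

# Steps 3-4: R10/R11 are changed in user mode, then a system call is issued.
# 0x88 is the RCX value seen in the dump; its origin is an assumption.
ctx.enter_via_syscall(first_arg=0x88)

for name, value in ctx.regs.items():
    print(f"{name.upper()} = 0x{value:016x}")
```

With these inputs the model reports the same RCX, R10 and R11 values as the dump: the values written in step 3 never show up, because the system call entry leaves the slots saved at the page fault untouched.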