I guess the biggest issue with that change, however, is that it goes against what the DWARF spec says about the CFA:
> Typically, the CFA is defined to be the value of the stack pointer at the call site in the previous frame (which may be different from its value on entry to the current frame).
In particular, your change unfortunately did not work with our version of [libunwindstack](https://github.com/google/orbit/tree/main/third_party/libunwindstack) and LLDB (not yet upstream).
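
To make the CFA definition quoted above concrete, here is a toy sketch of an x86-64 style stack. It is not taken from Wine, libunwindstack, or LLDB; the `Stack` class, the helper names, and the addresses are all made up for illustration. It only shows that the usual entry rule "CFA = RSP + 8" recovers the caller's stack pointer at the call site, and that the return address sits just below the CFA.

```python
# Toy model of an x86-64 stack (grows down, 8-byte slots).
# Every name and address here is hypothetical and only for illustration.
WORD = 8


class Stack:
    def __init__(self, rsp):
        self.rsp = rsp
        self.mem = {}

    def push(self, value):
        # A push decrements RSP first, then stores the value.
        self.rsp -= WORD
        self.mem[self.rsp] = value


def call(stack, return_address):
    """Model `call`: note the caller's RSP at the call site, then push the return address."""
    rsp_at_call_site = stack.rsp
    stack.push(return_address)
    return rsp_at_call_site


stack = Stack(rsp=0x7000)
rsp_at_call_site = call(stack, return_address=0x401234)

# On entry to the callee only the return address has been pushed,
# so the CFA rule is CFA = RSP + 8.
cfa_at_entry = stack.rsp + WORD

# After the callee pushes one more word (e.g. a saved frame pointer),
# the rule has to become CFA = RSP + 16 for the CFA to stay the same.
stack.push(0xDEAD)
cfa_after_push = stack.rsp + 2 * WORD

# The return address is always found at CFA - 8.
return_address = stack.mem[cfa_at_entry - WORD]
```

The point of the sketch is that the CFA stays fixed for the whole lifetime of the frame while RSP moves, which is why an unwinder that assumes the spec's definition breaks if the CFI describes some other value as the CFA.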
Yes, that's what I've been told (maybe even *you* did).
I am actually curious: does your change manage to unwind completely through the dispatcher and through the complete Windows stack, as shown here?
With GDB and with the two other changes I mentioned, it fully works, yes.
I'm a bit surprised that you don't need the flags push/pop change, as I believe it corrupts the return address on the stack, but maybe the unwinder has other ways to get it back.
Now the other part where unwinding doesn't work well is the other side of syscalls, with KiUserCallbackDispatcher callbacks. By massaging the frame pointers a bit before entering it, I'm able to stitch the stack trace to the frame before entering the syscall, but I then lose the stack within the syscall. I guess we'd probably need some assembly there too, to add adequate CFI info, if we want to get that right as well.