Okay, while `.cfi_same_value` in the clean-up part technically worked for the tested unwinders, it was still the wrong thing to use. The content of the register can be computed by looking at the register itself, not at the value computed for the previous frame (which does not exist at that point, since we are in leaf code).
Regarding the XMM registers: I added the instructions so that each register is described by its own content during the execution of the function, and as the "same value" as the previous computation (i.e. the callee, i.e. the syscall) during the call.
In general, this is mostly about doing the right thing because it is right. Windows unwind codes (runtime_functions) only make use of very few registers for their computation (IIRC, rbp and rsp only). But, as said, let's do it right, which I hope is the case now.