I simplified this considerably by not implementing any specific synchronization for the delivery of async cancel APCs. The reasoning is that in the normal case (when the thread kill facility and unix_tid are available), the signal should already have been delivered, and its processing started, before we return from the server call. In the potential corner cases this hopefully won't be a regression, because those asyncs can already complete and be delivered after the thread is dead anyway.