On Mon Sep 18 09:30:20 2023 +0000, Yuxuan Shui wrote:
(I've seen up to 15x slower)
This is because Windows' scheduling granularity is larger, so my test program ended up sleeping longer. When that's accounted for, the overall runtime is comparable. I also measured the overhead of the locking operations themselves. In general this MR is comparable with the current implementation, and is a bit faster than native. However, when there is no contention (i.e. only AcquireShared), native is faster: IIRC it is twice as fast as the current impl, and about 2.5x as fast as this MR.
OK, I remembered incorrectly. I ran the test again and native is about 1.5x faster in the uncontended case; this MR and the current impl are about the same (the measurement's margin of error is a bit large).
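
For anyone who wants to reproduce the "accounting for sleep" part: below is a minimal sketch of the idea, not the actual test program. It measures how long each sleep really took (which depends on the platform's scheduling granularity) and reports it separately from the total wall time, so the non-sleep portion can be compared across platforms. The names `timed_sleep` and `run_workload` are made up for illustration, and the `work` callback stands in for whatever locking operations are being tested.

```python
import time


def timed_sleep(requested_s):
    """Sleep for roughly `requested_s` seconds and return the measured duration.

    The measured value can be noticeably larger than the request when the
    platform's scheduling granularity is coarse.
    """
    start = time.perf_counter()
    time.sleep(requested_s)
    return time.perf_counter() - start


def run_workload(iterations, requested_sleep_s, work):
    """Call `work()` `iterations` times, sleeping after each call.

    Returns (total_wall_time, total_time_asleep), both in seconds.
    `work` is a hypothetical stand-in for the operations under test.
    """
    slept = 0.0
    start = time.perf_counter()
    for _ in range(iterations):
        work()
        slept += timed_sleep(requested_sleep_s)
    total = time.perf_counter() - start
    return total, slept


if __name__ == "__main__":
    total, slept = run_workload(50, 0.001, lambda: None)
    # Subtracting the measured sleep time leaves the part of the runtime
    # that is attributable to the work itself plus loop overhead.
    print(f"wall: {total:.4f}s  asleep: {slept:.4f}s  "
          f"non-sleep: {total - slept:.4f}s")
```

Comparing the "non-sleep" figure rather than the raw wall time is what makes runs with different sleep overshoot comparable.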