v7. Bugs fixed. All mutexes are now 1 to 4 times faster, depending on the load and the number of threads, and mutex operation time is more deterministic, with a lower standard deviation.

Interesting facts:

1. `syscall(SYS_gettid)` kills performance: it makes the code 10-15 times slower than pthread_mutex.
2. `static inline` is slightly faster than `#define`. Apparently macros force all the code to be expanded inline at every call site, and with that much inlined code and branching the processor cache stops working efficiently.
3. Optimizing the fast locking paths can make a mutex several times slower. This is a paradox: the fastest recursive mutex turned out to be the one that looks least optimal by common sense. For the same reason, the recursive pthread_mutex does not optimize its fast paths.

```
pthread_mutex duration=67245843.000000 iterations=100.000000 mean=672458.430000 stdev=242104.122553 min=625500.000000 max=2840101.000000
atomic_mutex duration=29866077.000000 iterations=100.000000 mean=298660.770000 stdev=199965.048512 min=229918.000000 max=2840101.000000
pthread_mutex_recursive duration=384579581.000000 iterations=100.000000 mean=3845795.810000 stdev=631946.613951 min=3796231.000000 max=8124933.000000
atomic_mutex_recursive duration=206283709.000000 iterations=100.000000 mean=2062837.090000 stdev=241309.789530 min=2028265.000000 max=8124933.000000
```