Deadlock calls dbghelp.EnumerateLoadedModulesW64() every 5 minutes while in a match, and that call now takes 400-500ms (with the game having ~180 modules loaded). That is already with the process_vm_readv optimization in place; the server calls made for each ReadProcessMemory() take the majority of the time. The function is de facto called for the current process, but with a handle that is not the current process pseudo-handle. This optimization is a bit ad hoc but simple, and it brings the time down to about 10ms.
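For illustration only, here is a portable sketch of the general idea of skipping per-read server round trips when a real handle turns out to refer to the current process. It is not the patch itself. The types and helpers (mock_handle, mock_reader, mock_get_process_id, mock_server_read) are hypothetical stand-ins rather than Wine or Windows APIs, and the process-id comparison is an assumed way to detect the current process.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; none of these are Wine or Windows APIs. */
typedef struct { unsigned int pid; } mock_handle;

#define MOCK_CURRENT_PID 1234u

/* Counts simulated wineserver round trips. */
static unsigned int server_calls;

/* Simulated slow path: every read costs one server round trip. */
static int mock_server_read(const mock_handle *h, const void *addr,
                            void *buf, size_t len)
{
    (void)h;
    ++server_calls;
    memcpy(buf, addr, len);
    return 1;
}

/* Simulated handle-to-pid lookup: also one server round trip. */
static unsigned int mock_get_process_id(const mock_handle *h)
{
    ++server_calls;
    return h->pid;
}

typedef struct {
    const mock_handle *handle;
    int is_current; /* resolved once, when the reader is set up */
} mock_reader;

/* Assumption: the current process is detected by comparing process ids. */
static void reader_init(mock_reader *r, const mock_handle *h)
{
    assert(r != NULL && h != NULL);
    r->handle = h;
    r->is_current = (mock_get_process_id(h) == MOCK_CURRENT_PID);
}

/* Fast path: plain local copy when the target is the current process. */
static int reader_read(const mock_reader *r, const void *addr,
                       void *buf, size_t len)
{
    if (r->is_current)
    {
        memcpy(buf, addr, len);
        return 1;
    }
    return mock_server_read(r->handle, addr, buf, len);
}
```

In this model, the handle check costs one round trip up front, after which reads from the current process cost none. Reads from any other process still pay one round trip each.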
That is still much slower than on Windows, where the call takes about 0.5ms. Further optimization is possible by not relying on kernelbase functions and instead traversing the process loader information in dbghelp, which would allow traversing it just once instead of 180 times (once for each module). Even so, ReadProcessMemory() is going to take much more time than on Windows due to the extra server calls, so avoiding those calls when possible seems beneficial regardless.
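The difference between walking the loader list once and walking it once per module can be sketched with a small portable model. It is not dbghelp or kernelbase code: mock_module is a hypothetical stand-in for a loader list entry, node_reads models remote memory reads, and the assumption that each per-module lookup restarts from the list head is made only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a loader list entry; not the real layout. */
typedef struct mock_module {
    struct mock_module *next;
    unsigned long base;
} mock_module;

/* Each increment models one remote read of a list entry. */
static unsigned long node_reads;

static const mock_module *read_node(const mock_module *m)
{
    ++node_reads;
    return m;
}

/* Per-module lookup: restarts from the head and stops at the match. */
static const mock_module *find_module(const mock_module *head,
                                      unsigned long base)
{
    const mock_module *m;

    for (m = head; m; m = m->next)
    {
        const mock_module *cur = read_node(m);
        if (cur->base == base) return cur;
    }
    return NULL;
}

/* Repeated walk: one list traversal for every module queried. */
static size_t enum_by_lookup(const mock_module *head,
                             const unsigned long *bases, size_t count)
{
    size_t i, found = 0;

    for (i = 0; i < count; i++)
        if (find_module(head, bases[i])) found++;
    return found;
}

/* Single pass: every entry is read exactly once. */
static size_t enum_single_pass(const mock_module *head,
                               unsigned long *bases_out, size_t capacity)
{
    const mock_module *m;
    size_t n = 0;

    assert(bases_out != NULL);
    for (m = head; m && n < capacity; m = m->next)
        bases_out[n++] = read_node(m)->base;
    return n;
}
```

With 180 entries, the repeated walk in this model performs 180 * 181 / 2 = 16290 entry reads, while the single pass performs 180. Real savings would depend on how many reads each entry actually needs.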