It was noticed that QueryWorkingSetEx() takes an extremely long time when run over large enough ranges.
E.g., Dinogen Online performs a memory scan by querying used memory through CEF, which calls QueryWorkingSetEx() for all of its memory chunks in ProcessMemoryDump::CountResidentBytes(). That scan currently takes ~400-500ms and causes a visible freeze. After the patchset it takes 15-20ms. QueryWorkingSetEx() is of course not one of the most commonly used functions, but when it is used, it is typically used to scan big memory ranges (or the whole process address space) rather than just a few pages.
I made a small benchmark program. I am attaching it together with its output on Windows and after each patch, all from the same computer. According to the benchmark, scans performed in batches are sped up ~50 times and single-page requests ~2.5 times.
[query_working_set.c](/uploads/bf04ff316abcd7b2956c17f0ee96b5a8/query_working_set.c)
```
Windows:
----- anonymous -----.
anonymous sequential: 1.7ms
anonymous reverse: 1.0ms
anonymous each second: 0.5ms
anonymous page by page: 27.5ms
----- file -----.
file sequential: 2.7ms
file reverse: 2.5ms
file each second: 1.3ms
file page by page: 45.3ms

Current git
----- anonymous -----.
anonymous sequential: 166.2ms
anonymous reverse: 166.3ms
anonymous each second: 83.7ms
anonymous page by page: 197.6ms
----- file -----.
file sequential: 161.5ms
file reverse: 161.4ms
file each second: 81.1ms
file page by page: 191.2ms

After "ntdll: Factor OS-specific parts out of get_working_set_ex()."
----- anonymous -----.
anonymous sequential: 166.3ms
anonymous reverse: 166.5ms
anonymous each second: 83.5ms
anonymous page by page: 199.5ms
----- file -----.
file sequential: 163.3ms
file reverse: 163.9ms
file each second: 81.9ms
file page by page: 192.9ms

After "ntdll: Iterate views instead of requested addresses in get_working_set_ex()."
----- anonymous -----.
anonymous sequential: 42.7ms
anonymous reverse: 42.5ms
anonymous each second: 21.9ms
anonymous page by page: 198.7ms
----- file -----.
file sequential: 40.1ms
file reverse: 40.9ms
file each second: 20.1ms
file page by page: 197.3ms

After "ntdll: Limit vprot scan range to the needed interval in get_working_set_ex()."
----- anonymous -----.
anonymous sequential: 42.5ms
anonymous reverse: 42.9ms
anonymous each second: 21.1ms
anonymous page by page: 72.9ms
----- file -----.
file sequential: 40.4ms
file reverse: 40.1ms
file each second: 20.0ms
008c:err:winediag:is_broken_driver Broken NVIDIA RandR detected, falling back to RandR 1.0. Please consider using the Nouveau driver instead.
file page by page: 70.2ms

After "ntdll: Buffer pagemap reads in fill_working_set_info()."
----- anonymous -----.
anonymous sequential: 3.5ms
anonymous reverse: 3.2ms
anonymous each second: 1.8ms
anonymous page by page: 74.7ms
----- file -----.
file sequential: 2.4ms
file reverse: 2.3ms
file each second: 1.4ms
file page by page: 71.6ms
```