General comments:

* This will take up to 32 MiB per wg_parser, which seems like an awful lot, especially since applications can often create more than one of these at a time. I'm concerned about the memory usage and address space implications. Can the chunk size get any smaller without negatively impacting performance again?

* The code as written does handle loads that span chunk boundaries, although I feel like it's done in a non-obvious way. Reading through the loop in src_getrange_cb() I'm left wondering "why would read_parser_cache() return less than the requested size?" and "why are read_parser_cache() and load_parser_cache() separate functions?"

* And along similar lines, is it even worth falling back to the old path for larger read requests? If not, we should factor out a helper to actually perform the read pseudo-callback.

* Instead of a "rank" field, I think it would be simpler just to make the index itself be the rank, and just memmove the entries every time.

* I don't like the naming of "parser_cache"; it's both redundant (everything in the file has to do with the parser) and not specific enough (caching what?). I'd propose "input_cache" or "read_cache" here.

* The parser mutex doesn't seem to be taken around everything that it should be.

-- 
https://gitlab.winehq.org/wine/wine/-/merge_requests/2390#note_26771
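
To illustrate the index-as-rank suggestion above, here is a minimal sketch, assuming a small fixed-size array of cache entries. The struct layout, `CACHE_ENTRIES`, and the `cache_*` helper names are hypothetical and not taken from the merge request. The array position stands in for the rank: index 0 is the most recently used entry and the last index is the eviction candidate.

```c
#include <stddef.h>
#include <string.h>

#define CACHE_ENTRIES 4 /* hypothetical entry count, for illustration only */

/* Hypothetical entry layout; a real entry would also carry a pointer to
 * the chunk data, so moving an entry moves the pointer, not the data. */
struct cache_entry
{
    unsigned long long offset; /* start offset of the cached chunk */
    int valid;                 /* nonzero once the entry holds a chunk */
};

struct input_cache
{
    /* Ordered by recency: entries[0] is the most recently used. */
    struct cache_entry entries[CACHE_ENTRIES];
};

/* Move entries[index] to the front, shifting entries[0..index-1] down by one. */
static void cache_promote(struct input_cache *cache, size_t index)
{
    struct cache_entry tmp = cache->entries[index];

    memmove(&cache->entries[1], &cache->entries[0], index * sizeof(cache->entries[0]));
    cache->entries[0] = tmp;
}

/* Return the entry caching `offset`, promoted to index 0, or NULL on a miss. */
static struct cache_entry *cache_lookup(struct input_cache *cache, unsigned long long offset)
{
    size_t i;

    for (i = 0; i < CACHE_ENTRIES; ++i)
    {
        if (cache->entries[i].valid && cache->entries[i].offset == offset)
        {
            cache_promote(cache, i);
            return &cache->entries[0];
        }
    }
    return NULL;
}

/* Reuse the last (least recently used) slot for a new chunk and put it first. */
static struct cache_entry *cache_insert(struct input_cache *cache, unsigned long long offset)
{
    cache_promote(cache, CACHE_ENTRIES - 1);
    cache->entries[0].offset = offset;
    cache->entries[0].valid = 1;
    return &cache->entries[0];
}
```

With only a handful of entries, the memmove costs little compared to the read it avoids, and there is no separate rank value to keep consistent.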