I've done some performance testing, and here are my results (milliseconds between the two presents when a video starts playing in WILD HEARTS):
keeping fallback path for large reads:
512kb chunk size: 176ms, 176ms, 182ms, 176ms, 186ms
256kb chunk size: 180ms, 193ms, 189ms, 213ms, 204ms
128kb chunk size: 304ms, 311ms, 302ms, 307ms, 314ms
everything goes through cache:
256kb chunk size: 210ms, 199ms, 220ms, 195ms, 211ms
I also looked briefly at a 32kb chunk size: with the fallback path for larger reads we were in the range of 400-500ms, but without the fallback it took a full 7 seconds.
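For a quick side-by-side, the five-run series above can be averaged with a few lines of Python. The numbers are copied from the lists as-is and nothing new is measured; the dictionary labels are just illustrative names, and the 32kb case is left out because only a range was reported for it.

```python
from statistics import mean

# Timings in ms, copied verbatim from the runs listed above.
# The keys are illustrative labels, not names used anywhere in the code.
runs = {
    "fallback, 512kb": [176, 176, 182, 176, 186],
    "fallback, 256kb": [180, 193, 189, 213, 204],
    "fallback, 128kb": [304, 311, 302, 307, 314],
    "all cached, 256kb": [210, 199, 220, 195, 211],
}

# Per-configuration (mean, min, max) in ms.
summary = {name: (mean(ts), min(ts), max(ts)) for name, ts in runs.items()}

for name, (avg, lo, hi) in summary.items():
    print(f"{name}: mean {avg:.1f}ms (min {lo}ms, max {hi}ms)")
```

With only five runs per configuration this is a rough comparison, not a statistically solid one.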
Taking all that into consideration, and unless there are any objections, I'll keep the fallback path out of caution, keep the chunk size at 512kb, and disable the path entirely for 32-bit.
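To make the trade-off concrete, here is a minimal Python sketch of a chunked read cache with a fallback for large reads. It is an illustration only, not the code in this change: the class name `ChunkedReader`, the single-chunk cache, and the rule "reads of at least one chunk bypass the cache" are all assumptions, and the 32-bit case is not modeled.

```python
import io

CHUNK_SIZE = 512 * 1024  # the chunk size settled on above


class ChunkedReader:
    """Hypothetical wrapper: small reads come from one cached chunk,
    large reads go straight to the file (the assumed "fallback path")."""

    def __init__(self, f, chunk_size=CHUNK_SIZE):
        self.f = f
        self.chunk_size = chunk_size
        self.chunk_start = -1   # file offset of the cached chunk, -1 = none
        self.chunk = b""
        self.chunk_loads = 0    # counters, only for observing behaviour
        self.direct_reads = 0

    def read_at(self, offset, size):
        if size >= self.chunk_size:
            # Assumed fallback: a large read bypasses the cache entirely.
            self.direct_reads += 1
            self.f.seek(offset)
            return self.f.read(size)

        out = bytearray()
        while size > 0:
            # Align down to the chunk that contains `offset`.
            start = offset - offset % self.chunk_size
            if start != self.chunk_start:
                self.f.seek(start)
                self.chunk = self.f.read(self.chunk_size)
                self.chunk_start = start
                self.chunk_loads += 1
            rel = offset - start
            piece = self.chunk[rel:rel + size]
            if not piece:
                break  # reached end of file
            out += piece
            offset += len(piece)
            size -= len(piece)
        return bytes(out)


# Usage with a tiny chunk size so the behaviour is easy to follow.
data = bytes(range(256)) * 16            # 4096 bytes of test data
reader = ChunkedReader(io.BytesIO(data), chunk_size=1024)
small = reader.read_at(1000, 100)        # spans two chunks, served via cache
large = reader.read_at(0, 2048)          # >= chunk size, takes the fallback
```

In this sketch a smaller chunk size means more chunk loads for the same sequential access pattern, and without the fallback every large read would be split into many chunk-sized loads. That is one plausible reading of why the 32kb case without the fallback was so slow, though the timings above do not establish the cause.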