https://bugs.winehq.org/show_bug.cgi?id=38558
--- Comment #8 from katsunori.kumatani@gmail.com ---
Thanks, I'll test the patch tomorrow.
About the performance degradation: is there a reason you can't just do an initial check to see whether the buffers overlap (distance between them < 16 bytes) and, only in that case, use this slow but correct method?
A single branch shouldn't really hurt performance, and it will yield correct results in all cases (even if this isn't the cause of this particular bug, it's still a bug).
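
To make the suggestion concrete, here is a rough sketch of what I mean. It is not taken from the patch or from Wine's source: the names copy_checked, copy_bytewise, copy_blocks and too_close are made up for illustration, and I'm assuming the fast path moves data in 16-byte blocks (hence the 16-byte threshold above) and that the "slow but correct" method is a plain forward byte-by-byte copy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 16  /* assumed width of the fast path's loads/stores */

/* Slow path: strict forward copy, one byte at a time. */
static void copy_bytewise(unsigned char *dst, const unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Fast path stand-in: forward copy in 16-byte blocks. Each block is
 * loaded in full before it is stored, like a vector register would be. */
static void copy_blocks(unsigned char *dst, const unsigned char *src, size_t n)
{
    unsigned char tmp[BLOCK];

    while (n >= BLOCK) {
        memcpy(tmp, src, BLOCK);
        memcpy(dst, tmp, BLOCK);
        dst += BLOCK;
        src += BLOCK;
        n -= BLOCK;
    }
    copy_bytewise(dst, src, n);  /* tail of fewer than 16 bytes */
}

/* The single branch: are the two start addresses closer than one block? */
static int too_close(const void *dst, const void *src)
{
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;
    uintptr_t dist = d > s ? d - s : s - d;

    return dist < BLOCK;
}

/* Hypothetical entry point: pick the slow path only when needed. */
void *copy_checked(void *dst, const void *src, size_t n)
{
    if (too_close(dst, src))
        copy_bytewise(dst, src, n);
    else
        copy_blocks(dst, src, n);
    return dst;
}
```

With a distance of 16 bytes or more, a block's store can never land inside the bytes that same block just loaded, so the block copy gives the same result as the byte-by-byte one. Below 16 bytes that no longer holds, which is the only case where the slow path gets taken.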