Hi,
Sorry it took me so long to review it.
I've done some tests, and I don't think the "merge" approach makes sense. I'm attaching a version of your patch that uses the same approach you used on x86/x86_64, but on all platforms. If the patch looks OK, please send it to wine-devel.
Notes about performance:
- it has similar performance to your previous patch on x86/x86_64
- it's faster on ARM compared to what's currently in Wine
- performance on ARM varies a lot depending on hardware capabilities
Thanks,
Piotr