On Thu, 16 Sep 2021, Piotr Caban wrote:
Hi Martin,
On 9/15/21 10:27 PM, Martin Storsjo wrote:
ARM can do 64-bit writes with the STRD instruction, but that instruction requires a 32-bit aligned address, while these stores are unaligned.
The compiler may also fuse two consecutive stores through uint32_t* pointers into a single STRD, since a uint32_t* is supposed to be properly aligned. Therefore, do these stores through volatile uint32_t* pointers to avoid fusing them.
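For illustration, a minimal sketch of the volatile approach described above. The helper name store64_volatile is hypothetical and not from the actual patch. The volatile qualifier keeps the two 32-bit stores as separate accesses, so they cannot be merged into one STRD. Dereferencing a misaligned uint32_t* is still undefined behaviour in ISO C, so this relies on the target's plain word stores tolerating unaligned addresses, as the quoted message assumes.

```c
#include <stdint.h>

/* Hypothetical helper: write the 64-bit value v to p as two separate
 * 32-bit stores. volatile prevents the compiler from fusing them into a
 * single STRD (which would require a 4-byte aligned address).
 * Caveat: if p is misaligned, this is undefined behaviour in ISO C; it
 * relies on the hardware's word stores accepting unaligned addresses. */
static void store64_volatile(void *p, uint64_t v)
{
    volatile uint32_t *d = (volatile uint32_t *)p;
    d[0] = (uint32_t)v;          /* low 32 bits */
    d[1] = (uint32_t)(v >> 32);  /* high 32 bits */
}
```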
How about letting the compiler know that the pointers are unaligned instead? Does the attached patch work for you?
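The attached patch is not reproduced here, so the following is only a sketch of one common way to tell GCC or Clang that a pointer may be unaligned, not necessarily what the patch does. It assumes the GCC/Clang aligned(1) type attribute; the names unaligned_u32, fill_u32_unaligned and store_u32_memcpy are hypothetical. Because accesses through unaligned_u32 carry an alignment of 1, the compiler must pick instructions valid for any address instead of STRD. A memcpy-based store is shown as the portable standard C alternative.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: GCC/Clang. Reducing the alignment of the typedef tells the
 * compiler that accesses through it may be at any address. */
typedef uint32_t unaligned_u32 __attribute__((aligned(1)));

/* Hypothetical helper: fill n 32-bit words starting at a possibly
 * misaligned address dst with value. The compiler is free to combine
 * these stores, but only into instructions that tolerate misalignment. */
static void fill_u32_unaligned(void *dst, uint32_t value, size_t n)
{
    unaligned_u32 *d = (unaligned_u32 *)dst;
    for (size_t i = 0; i < n; i++)
        d[i] = value;
}

/* Portable alternative in standard C: a 4-byte memcpy, which compilers
 * typically lower to a single store where the target allows it. */
static void store_u32_memcpy(void *dst, uint32_t value)
{
    memcpy(dst, &value, sizeof value);
}
```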
Thanks, that's even better!
This way the compiler has more freedom to reason about it and can choose another instruction with looser alignment requirements. Both GCC and Clang seem to compile it into a 16-byte VST (an unaligned SIMD store) instead, which is probably much better than forcing the compiler to emit a long sequence of 32-bit stores.
Clang doesn't seem to know (or exploit) that the regular 32-bit store instructions work on unaligned addresses, though, so the smaller stores get expanded into a long series of single-byte writes. But I guess that's just a missed optimization opportunity in Clang; I'll see if I can report it.
// Martin