On Thu, 16 Sep 2021, Martin Storsjö wrote:
> On Thu, 16 Sep 2021, Martin Storsjö wrote:
>> Clang doesn't seem to know/exploit that the regular 32-bit store instructions work unaligned, though, so the smaller stores get exploded into a long series of single-byte writes. But I guess that's just a missed optimization opportunity in Clang; I'll see if I can report it.
> FWIW, this seems to be a target-specific issue; Clang does optimize it correctly for an armv7-linux-gnueabihf target, but not for armv7-windows. I'll see about getting that fixed.
For the record, this should now be fixed in Clang: https://reviews.llvm.org/D109960
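As an illustration of the kind of code affected (a hypothetical sketch, not code from this thread): a portable unaligned 32-bit access is usually written with memcpy. On ARMv7, where the plain word store/load instructions tolerate misaligned addresses, the compiler can lower each call to a single instruction; with the missed optimization described above, it would instead emit a sequence of byte accesses. The helper names store_u32_unaligned and load_u32_unaligned are assumptions made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write a 32-bit value to an address that may not be
 * 4-byte aligned. Using memcpy avoids the undefined behaviour of
 * dereferencing a misaligned uint32_t pointer; how it is lowered (one
 * store vs. several byte stores) is left to the compiler and target. */
static inline void store_u32_unaligned(void *dst, uint32_t value)
{
    memcpy(dst, &value, sizeof(value));
}

/* Hypothetical helper: read a 32-bit value from a possibly unaligned
 * address, using the same memcpy idiom. */
static inline uint32_t load_u32_unaligned(const void *src)
{
    uint32_t value;
    memcpy(&value, src, sizeof(value));
    return value;
}
```

Inspecting the generated assembly for such helpers (e.g. for armv7-windows vs. armv7-linux-gnueabihf) is one way to see whether the target is treated as supporting unaligned accesses.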
// Martin