I don't know how well compilers will handle unaligned access on archs that don't natively support it.
I have only tested it with clang on ARM, where it produces code without any unaligned memory accesses. The code is not well optimized, but it is still faster than our current implementation. FWIW, recent ARM CPUs can also perform unaligned memory accesses with a "small" performance penalty.
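For reference, here is a minimal sketch of the two usual portable ways to read a 32-bit value from a possibly misaligned pointer. It is not the code from this change, and the helper names `load_u32_unaligned` and `load_le32` are hypothetical. The `memcpy` form is well-defined for any alignment. Compilers typically lower it to a single load on targets with unaligned access and to byte loads plus shifts elsewhere, but how well each compiler does this on a given arch is exactly the open question above. The byte-wise form spells out what a strict-alignment target ends up doing and also fixes the byte order.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: native-endian 32-bit load from any address.
   memcpy avoids the undefined behaviour of casting a misaligned pointer
   to uint32_t *; the compiler picks the actual load instructions. */
static inline uint32_t load_u32_unaligned(const void *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical helper: little-endian 32-bit load built from single-byte
   reads, which never requires alignment. Each byte is widened to uint32_t
   before shifting so that bytes >= 0x80 cannot overflow a signed int. */
static inline uint32_t load_le32(const unsigned char *p) {
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Compiling both with `-O2` for an ARM target with and without unaligned-access support, then comparing the assembly, is a cheap way to check what a particular compiler does.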