Hmmm... the GCC docs say:
http://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Type-Attributes.html#Type-Attrib...
"Note that the alignment of any given struct or union type is required by the ISO C standard to be at least a perfect multiple of the lowest common multiple of the alignments of all of the members of the struct or union in question. This means that you can effectively adjust the alignment of a struct or union type by attaching an aligned attribute to any one of the members of such a type, [...]"
So, with conforming compilers, having M128A aligned makes XMM_SAVE_AREA32 aligned too. I tried it out a bit and wasn't able to find a case in which GCC (4.7.2) didn't align XMM_SAVE_AREA32 to 16 bytes with M128A declared the way it is.
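For anyone who wants to check this on their own compiler, here is a rough sketch of the kind of test I mean. The two struct types are simplified, hypothetical stand-ins (m128a_sketch and save_area_sketch are names I made up, not the real header definitions), and it relies on the GCC extensions __attribute__((aligned)) and __alignof__:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for M128A: a 16-byte struct carrying the
   aligned attribute. Not the real header definition. */
typedef struct {
    unsigned long long low;
    long long high;
} __attribute__((aligned(16))) m128a_sketch;

/* Hypothetical stand-in for XMM_SAVE_AREA32: no aligned attribute of
   its own, only a member whose type is 16-byte aligned. */
typedef struct {
    unsigned short control_word;
    unsigned short status_word;
    m128a_sketch float_registers[8];
} save_area_sketch;

/* Alignment the compiler chose for the containing struct. */
size_t save_area_alignment(void)
{
    return __alignof__(save_area_sketch);
}

/* Offset of the aligned member inside the containing struct. */
size_t float_registers_offset(void)
{
    return offsetof(save_area_sketch, float_registers);
}

/* Aborts if the member's alignment did not propagate outward. */
void check_save_area_layout(void)
{
    assert(save_area_alignment() >= 16);
    assert(float_registers_offset() % 16 == 0);
    assert(sizeof(save_area_sketch) % 16 == 0);
}
```

Calling check_save_area_layout() from a small main() should pass silently if the containing struct picks up the member's alignment; this only checks the type layout, not where any particular variable ends up in memory.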
The documentation goes on to say:
"Note that the effectiveness of aligned attributes may be limited by inherent limitations in your linker. On many systems, the linker is only able to arrange for variables to be aligned up to a certain maximum alignment. (For some linkers, the maximum supported alignment may be very very small.) If your linker is only able to align variables up to a maximum of 8 byte alignment, then specifying aligned(16) in an __attribute__ will still only provide you with 8 byte alignment. See your linker documentation for further information."
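The linker caveat is about where objects actually land, which the type's alignment alone can't prove. A minimal sketch of a runtime check on a file-scope variable, assuming a hosted toolchain with <stdint.h>; m128a_demo, global_reg and the helper functions are hypothetical names for illustration only:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-byte-aligned type, standing in for M128A. */
typedef struct {
    unsigned long long low;
    long long high;
} __attribute__((aligned(16))) m128a_demo;

/* File-scope object: its final address is decided by the assembler
   and linker, which is where the documented limitation would bite. */
static m128a_demo global_reg;

/* Returns 1 if the pointer value is a multiple of 16, else 0. */
int is_aligned_16(const void *p)
{
    return ((uintptr_t)p % 16) == 0;
}

/* Returns 1 if the linker honoured the requested alignment. */
int global_reg_is_aligned(void)
{
    return is_aligned_16(&global_reg);
}
```

If global_reg_is_aligned() ever returns 0 on some toolchain, that would be the case the documentation is warning about.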
Seems to work fine for me without the manual alignment, but that doesn't say much for anyone using some other compiler/linker. Not that I care about those unfortunate souls :P. I was just going for what I figured would get the patch in. Either way will work well enough for me, so I guess I just need to know which way will get the patch in. :)