On 10/20/2012 05:40 PM, James Eder wrote:
- /* Intel says we need a zeroed 16-byte aligned buffer */
- char buffer[512 + 16];
- XMM_SAVE_AREA32 *state = (XMM_SAVE_AREA32 *)(((ULONG_PTR)buffer + 15) & ~15);
- memset(buffer, 0, sizeof(buffer));
- __asm__ __volatile__( "fxsave %0" : "=m" (*state) : "m" (*state) );
Wouldn't this be simpler?
DECLSPEC_ALIGN(16) XMM_SAVE_AREA32 state; memset(state, 0, sizeof(state)); __asm__ __volatile__("fxsave %0" : "=m" (*&state) : "m" (*&state));
May also want to make sure the two structs are packed.
On Sat, Oct 20, 2012 at 7:06 PM, Chris Robinson chris.kcat@gmail.com wrote:
On 10/20/2012 05:40 PM, James Eder wrote:
- /* Intel says we need a zeroed 16-byte aligned buffer */
- char buffer[512 + 16];
- XMM_SAVE_AREA32 *state = (XMM_SAVE_AREA32 *)(((ULONG_PTR)buffer + 15)
& ~15);
- memset(buffer, 0, sizeof(buffer));
- __asm__ __volatile__( "fxsave %0" : "=m" (*state) : "m" (*state) );
Wouldn't this be simpler?
DECLSPEC_ALIGN(16) XMM_SAVE_AREA32 state; memset(state, 0, sizeof(state)); __asm__ __volatile__("fxsave %0" : "=m" (*&state) : "m" (*&state));
May also want to make sure the two structs are packed.
I used that alignment method because I saw it done that way other places in Wine. I figured there must have been a good reason for doing it that way (issue with some build environments?) but perhaps I'm being paranoid.
On 10/21/2012 05:49 PM, James Eder wrote:
On Sat, Oct 20, 2012 at 7:06 PM, Chris Robinson chris.kcat@gmail.com wrote:
On 10/20/2012 05:40 PM, James Eder wrote:
- /* Intel says we need a zeroed 16-byte aligned buffer */
- char buffer[512 + 16];
- XMM_SAVE_AREA32 *state = (XMM_SAVE_AREA32 *)(((ULONG_PTR)buffer + 15)
& ~15);
- memset(buffer, 0, sizeof(buffer));
- __asm__ __volatile__( "fxsave %0" : "=m" (*state) : "m" (*state) );
Wouldn't this be simpler?
DECLSPEC_ALIGN(16) XMM_SAVE_AREA32 state; memset(state, 0, sizeof(state)); __asm__ __volatile__("fxsave %0" : "=m" (*&state) : "m" (*&state));
May also want to make sure the two structs are packed.
I used that alignment method because I saw it done that way other places in Wine. I figured there must have been a good reason for doing it that way (issue with some build environments?) but perhaps I'm being paranoid.
I would think the construct is necessary when allocating memory (the allocation functions don't allow to require a certain alignment as far as I know) That might be where you saw this being done?
On Sun, Oct 21, 2012 at 06:06:56PM +0200, Joris Huizer wrote:
I would think the construct is necessary when allocating memory (the allocation functions don't allow to require a certain alignment as far as I know) That might be where you saw this being done?
A conformant malloc() should return 16 byte aligned memory. But I bet some i386 ones don't. Especially since gcc is quite capable of suddently using the SSE2 (etc) instructions that require such alignment.
David
Gah... hit the wrong reply button so here we go again for some.
On Sun, Oct 21, 2012 at 10:06 AM, Joris Huizer joris_huizer@yahoo.com wrote:
On 10/21/2012 05:49 PM, James Eder wrote:
On Sat, Oct 20, 2012 at 7:06 PM, Chris Robinson chris.kcat@gmail.com wrote:
On 10/20/2012 05:40 PM, James Eder wrote:
- /* Intel says we need a zeroed 16-byte aligned buffer */
- char buffer[512 + 16];
- XMM_SAVE_AREA32 *state = (XMM_SAVE_AREA32 *)(((ULONG_PTR)buffer +
& ~15);
- memset(buffer, 0, sizeof(buffer));
- __asm__ __volatile__( "fxsave %0" : "=m" (*state) : "m" (*state) );
Wouldn't this be simpler?
DECLSPEC_ALIGN(16) XMM_SAVE_AREA32 state; memset(state, 0, sizeof(state)); __asm__ __volatile__("fxsave %0" : "=m" (*&state) : "m" (*&state));
May also want to make sure the two structs are packed.
I used that alignment method because I saw it done that way other places in Wine. I figured there must have been a good reason for doing it that way (issue with some build environments?) but perhaps I'm being paranoid.
I would think the construct is necessary when allocating memory (the allocation functions don't allow to require a certain alignment as far as I know) That might be where you saw this being done?
I was going off of save_fpux in dlls/ntdll/signal_i386.c:
static inline void save_fpux( CONTEXT *context ) { #ifdef __GNUC__ /* we have to enforce alignment by hand */ char buffer[sizeof(XMM_SAVE_AREA32) + 16]; XMM_SAVE_AREA32 *state = (XMM_SAVE_AREA32 *)(((ULONG_PTR)buffer + 15) & ~15);
__asm__ __volatile__( "fxsave %0" : "=m" (*state) ); context->ContextFlags |= CONTEXT_EXTENDED_REGISTERS; memcpy( context->ExtendedRegisters, state, sizeof(*state) ); #endif }
On Sun, Oct 21, 2012 at 10:56:51AM -0600, James Eder wrote:
static inline void save_fpux( CONTEXT *context ) { #ifdef __GNUC__ /* we have to enforce alignment by hand */ char buffer[sizeof(XMM_SAVE_AREA32) + 16]; XMM_SAVE_AREA32 *state = (XMM_SAVE_AREA32 *)(((ULONG_PTR)buffer + 15) & ~15);
__asm__ __volatile__( "fxsave %0" : "=m" (*state) ); context->ContextFlags |= CONTEXT_EXTENDED_REGISTERS; memcpy( context->ExtendedRegisters, state, sizeof(*state) );
#endif }
That is an on-stack buffer, all bets are off!
Traditionally the SYSV ABI (used by everyone except microsoft) only required the stack to br 4 byte aligned. At some point the gcc bods unilaterally decided to maintain 16 byte alignement (so the SSE2 registers can be saved on stack).
The Linux kernel aligns the stack when main (strictly _start) is called, but anything not compiled by gcc (or compiled with other options) can lose that alignment.
If you request 32 byte alignment for the save area, (recent enough) gcc will align the stack before allocating the locals.
David
Hmmm... the GCC docs say:
http://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Type-Attributes.html#Type-Attrib...
"Note that the alignment of any given struct or union type is required by the ISO C standard to be at least a perfect multiple of the lowest common multiple of the alignments of all of the members of the struct or union in question. This means that you can effectively adjust the alignment of a struct or union type by attaching an aligned attribute to any one of the members of such a type, [...]"
So, with conforming compilers, the having M128A aligned makes XMM_SAVE_AREA32 aligned. I tried it out some and I wasn't able to find a case which GCC (4.7.2) didn't align XMM_SAVE_AREA32 at 16 with M128A declared the way it is.
The documentation goes on to say:
"Note that the effectiveness of aligned attributes may be limited by inherent limitations in your linker. On many systems, the linker is only able to arrange for variables to be aligned up to a certain maximum alignment. (For some linkers, the maximum supported alignment may be very very small.) If your linker is only able to align variables up to a maximum of 8 byte alignment, then specifying aligned(16) in an __attribute__ will still only provide you with 8 byte alignment. See your linker documentation for further information."
Seems to work fine for me without the manual alignment, but I that doesn't say much for anyone using some other compiler/linker. Not that I care about those unfortunate souls :P. I was just going for what I figured would get the patch in. Either way will work well enough for me. So I guess I just need to know which way will get the patch in. :)
On Sun, Oct 21, 2012 at 04:16:04PM -0600, James Eder wrote:
Hmmm... the GCC docs say:
http://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Type-Attributes.html#Type-Attrib...
"Note that the alignment of any given struct or union type is required by the ISO C standard to be at least a perfect multiple of the lowest common multiple of the alignments of all of the members of the struct or union in question. This means that you can effectively adjust the alignment of a struct or union type by attaching an aligned attribute to any one of the members of such a type, [...]"
So, with conforming compilers, the having M128A aligned makes XMM_SAVE_AREA32 aligned. I tried it out some and I wasn't able to find a case which GCC (4.7.2) didn't align XMM_SAVE_AREA32 at 16 with M128A declared the way it is.
With the default gcc configuration and parameters used on Linux, 128 bit (16 byte) alignment of on-stack items assumes that the stack is 16 byte aligned.
While the code generated by gcc for function calls maintains this alignement, it isn't part of the standard ABI and other code may not do so. If you read up on this you'll find someone tried detecting misaligned stacks and found they happened all the time. Pertinent to wine is that code compiled with MSC will not maintain that alignment, any assembler wrappers are unlikely to as well.
There are some options to gcc that affect the generated code, one will force the stack to be realigned if any locals need 16 byte alignement. Changing some of the other options will cause internal compiler faults!
The code in gcc is made more complicated by the fact that it is useful to align 8 byte items, but never (AFAICT) necessary to do so, and gcc tries to align on-stack double and long-long items but not realign the stack in order to do so.
We've just gone through a series of hoops in NetBSD trying to fix gcc to generate code that doesn't assume 16 byte aligned stack and doesn't fault internally.
David