Thanks. I did miss it, as I had subscribed to the patches list, but forgot to subscribe the devel list. And thanks for your feedback.
One additional comment regarding the tests, as I didn't see mention of that in this thread. My code passes the tests for this DispCallFunc() function (I did verify that) except for the specialized case that tests for win64 functionality in DispCallFunc(). That test was designed to verify stdcall vs. cdecl on 32-bit x86 platforms. However, like the 64-bit x86, ARM (both 32 and 64) doesn't use stdcall in the traditional sense of stack cleaning.
Thus, that test needs to be #ifdef'd out on ARM as it isn't valid. But I didn't like how that looked and it wasn't consistent with the existing 64-bit version. I thought a better solution would be to change it to only be there for "__i386__" instead, as I think that's the only mode that will ever need it.
So if you are wondering why my patch set didn't touch the test code and why that particular test technically fails, that's why. I figured you guys can decide which way you want to tweak that...
My other replies are inlined below...
On Tue, Oct 24, 2017, at 13:37, André Hentschel wrote:
hi, just in case you missed it
-------- Weitergeleitete Nachricht -------- Betreff: Re: oleaut32: Add ARM support to DispCallFunc(). Datum: Mon, 23 Oct 2017 23:07:43 +0200 Von: André Hentschel [email protected] An: [email protected]
Hi Donna and Welcome to the Wine Project!
Whats your motivation for this patch? You did a very good job for a firsttimer!
Thanks! While new to this project, I've been programming for over 37 years. My motivation on this patch was porting some ActiveX components over to ARM to run on a Raspberry Pi Zero W for a data logging device I'm designing for the CAN bus. Those components use IDispatch for invocation rather than the vtable, as they were designed to be extensible without requiring users to recompile their code, and this was the last piece of the puzzle to make them fully work on ARM.
Some more notes below:
Am 22.10.2017 um 03:04 schrieb Donna Whisnant:
Adds ARM ABI support to DispCallFunc() to allow IDispatch invoke calls to succeed on ARM platforms. This change specifically targets only 32-bit little-endian (ARMEL) platform CPUs. It's believed to likely be compatible with 64-bit and/or big-endian (ARMEB) platforms, but testing for those other platforms should be completed before enabling.
Tested on Raspbian Stretch 2017-09-07 RPi image using a Qemu systemd container to compile and run it on an x86-64 Host.. When running on Raspberry Pi hardware, a 3G/1G split kernel needs to be used for Wine to function.
Signed-off-by: Donna Whisnant [email protected]
dlls/oleaut32/typelib.c | 221 +++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 219 insertions(+), 2 deletions(-)
diff --git a/dlls/oleaut32/typelib.c b/dlls/oleaut32/typelib.c index ebf6d85..d7ef212 100644 --- a/dlls/oleaut32/typelib.c +++ b/dlls/oleaut32/typelib.c @@ -6302,8 +6302,48 @@ static HRESULT WINAPI ITypeInfo_fnGetIDsOfNames( ITypeInfo2 *iface, return DISP_E_UNKNOWNNAME; }
+#if defined(__arm__) && defined(__ARM_32BIT_STATE) && !defined(__ARMEB__) +/* Note: this may work OK on 64-bit and/or ARMEB, but has currently only been tested for 32-bit ARMEL */
While i doubt there is something risky here regarding endianess you can keep the ARMEB check, but get rid of the 32bit check, it might be about Thunk(16bit) vs. ARM(32bit) but I'm not sure. Anyway, this won't work on 64bit, because the instruction set is totally different, but it inspires me to do the 64bit version as soon this lands upstream ;)
Interesting. Between the ARMEB/ARMEL modes and the 32/64 bit check, I would have left the 32-bit check and tossed the !ARMEB check, which is the opposite of your instructions. I'm fairly certain there isn't anything endian-specific. But as you pointed out, the instruction set on the 64-bit is different. So I'm confused why you say to get rid of the 32-bit check. Are you talking about removing the 'define' check for 32-bit or just the comment? Can you please clarify?
-#ifdef __i386__ +extern LONGLONG call_method( void *func, int nb_stk_args, const DWORD *stk_args, const DWORD *reg_args ); +__ASM_GLOBAL_FUNC( call_method,
/* r0 = *func
* r1 = nb_stk_args
* r2 = *stk_args (pointer to 'nb_stk_args' DWORD values to push on stack)
* r3 = *reg_args (pointer to 8, 64-bit d0-d7 (double) values OR as 16, 32-bit s0-s15 (float) values, followed by 4, 32-bit (DWORD) r0-r3 values)
*/
"push {fp, lr}\n\t" /* Save frame pointer and return address (stack still aligned to 8 bytes) */
__ASM_CFI(".cfi_adjust_cfa_offset 4\n\t")
__ASM_CFI(".cfi_rel_offset %fp,0\n\t")
"add fp, sp, #4\n\t" /* Setup frame pointer */
Is this adjusted FP used somewhere? (I forgot a lot of things regarding arm32...)
In my first version of the code, I was using r4 and r5 (which had to be saved/restored) and originally had slightly different arguments coming into the function (more than was fitting in the registers) and so I was setting up the frame for easier debugging. But then I remembered I have the 'ip' register to my disposal, and that was sufficient and allowed me to toss the saving of those other registers.
And, my original function signature was passing both the floating point registers and the core registers in separate structures using separate pointers to them, but then I realized that could be a single structure with both, so I tossed the extra argument and everything would then fit into the registers alone for the call, meaning the stack frame itself isn't really needed anymore. However, it's still necessary to set 'fp', though, as it's needed for cleaning the stack on the return from the callee, but I could toss the +4 adjustment of it or use "#0" for the adjustment like the compiler probably would (as I think it maps to the same instruction binary). If I do that, the ASM_CFI macro usage could be dropped too as there would be nothing for them to report.
__ASM_CFI(".cfi_def_cfa_register %fp\n\t")
"lsls r1, r1, #2\n\t" /* r1 = nb_stk_args * sizeof(DWORD) */
"beq 1f\n\t" /* Skip allocation if no stack args */
"add r2, r2, r1\n" /* Calculate ending address of incoming stack data */
"2:\tldr ip, [r2, #-4]!\n\t" /* Get next value */
"str ip, [sp, #-4]!\n\t" /* Push it on the stack */
"subs r1, r1, #4\n\t" /* Decrement count */
"bgt 2b\n\t" /* Loop till done */
"1:\tvldm r3!, {s0-s15}\n\t" /* Load the s0-s15/d0-d7 arguments */
"mov ip, r0\n\t" /* Save the function call address to ip before we nuke r0 with arguments to pass */
"ldm r3, {r0-r3}\n\t" /* Load the r0-r3 arguments */
"blx ip\n\t" /* Call the target function */
"sub sp, fp, #4\n\t" /* Clean the stack using 'fp' to allow for either stdcall or cdecl calls */
__ASM_CFI(".cfi_def_cfa %sp,4\n\t")
"pop {fp, pc}\n\t" /* Restore FP and return */
__ASM_CFI(".cfi_same_value %fp\n\t") /* Here for completeness and symmetry with other cpus, but won't be reached */
)
+/* same function but returning single/double floating point */ +static float (* const call_float_method)(void *, int, const DWORD *, const DWORD *) = (void *)call_method; +static double (* const call_double_method)(void *, int, const DWORD *, const DWORD *) = (void *)call_method;
+#elif defined(__i386__)
extern LONGLONG call_method( void *func, int nb_args, const DWORD *args, int *stack_offset ); __ASM_GLOBAL_FUNC( call_method, @@ -6637,7 +6677,184 @@ DispCallFunc( void* pvInstance, ULONG_PTR oVft, CALLCONV cc, VARTYPE vtReturn, UINT cActuals, VARTYPE* prgvt, VARIANTARG** prgpvarg, VARIANT* pvargResult) { -#ifdef __i386__ +#if defined(__arm__) && defined(__ARM_32BIT_STATE) && !defined(__ARMEB__) +/* Note: this may work OK on 64-bit and/or ARMEB, but has currently only been tested for 32-bit ARMEL */
Same as above
- int argspos;
- void *func;
- UINT i;
- DWORD *args;
- struct {
union {
float s[16];
double d[8];
} sd;
DWORD r[4];
- } regs;
- int rcount; /* 32-bit register index count */
- int scount; /* single-precision float register index count (will be incremented twice for doubles, plus alignment) */
- TRACE("(%p, %ld, %d, %d, %d, %p, %p, %p (vt=%d))\n",
pvInstance, oVft, cc, vtReturn, cActuals, prgvt, prgpvarg,
pvargResult, V_VT(pvargResult));
You can go up to 100 chars per line, so be free to rearrange this
I could, but the trace line was a cut-and-paste from the i386 path that was already there and I figured consistency in the code was good.
- if (cc != CC_STDCALL && cc != CC_CDECL)
- {
FIXME("unsupported calling convention %d\n",cc);
return E_INVALIDARG;
- }
- argspos = 0;
- rcount = 0;
- scount = 0;
- /* Determine if we need to pass a pointer for the return value as arg 0. If so, do that */
- /* first as it will need to be in the 'r' registers: */
- switch (vtReturn)
- {
- case VT_DECIMAL:
- case VT_VARIANT:
regs.r[rcount++] = (DWORD)pvargResult; /* arg 0 is a pointer to the result */
break;
- case VT_HRESULT:
WARN("invalid return type %u\n", vtReturn);
return E_INVALIDARG;
- default: /* And all others are in 'r', 's', or 'd' registers or have no return value */
break;
- }
- if (pvInstance)
- {
const FARPROC *vtable = *(FARPROC **)pvInstance;
func = vtable[oVft/sizeof(void *)];
regs.r[rcount++] = (DWORD)pvInstance; /* the This pointer is always the first parameter */
- }
- else func = (void *)oVft;
- /* maximum size for an argument is sizeof(VARIANT). Also allow for return pointer and stack alignment. */
- args = heap_alloc(sizeof(VARIANT) * cActuals + sizeof(DWORD) * 4 );
- for (i = 0; i < cActuals; i++)
- {
VARIANT *arg = prgpvarg[i];
DWORD *pdwarg = (DWORD *)(arg); /* a reinterpret_cast of the variant, used for copying structures when they are split between registers and stack */
int ntemp; /* Used for counting words split between registers and stack */
switch (prgvt[i])
{
case VT_EMPTY:
break;
case VT_R4: /* these must be 4-byte aligned, and put in 's' regs or stack, as they are single-floats */
if (scount < 16)
{
regs.sd.s[scount++] = V_R4(arg);
} else {
args[argspos++] = V_UI4(arg);
}
I think no curly brackets are necessary here at all, but we usually don't open them in the same line as the if/else statement for new code
Oh sorry. I thought I was being consistent with the rest of the code-base. I have the curly brace on the next line after the 'if', but didn't think about the 'else'. I will fix that and resubmit it. I personally like the curly braces even for single-line statements, though, as it keeps from either misreading it or accidentally adding something for the conditional and forgetting to add them. But I will fix the formatting.
break;
case VT_R8: /* these must be 8-byte aligned, and put in 'd' regs or stack, as they are double-floats */
case VT_DATE:
if (scount < 15)
{
scount += (scount % 2); /* align scount to next whole double */
regs.sd.d[scount/2] = V_R8(arg);
scount += 2;
} else {
scount = 16; /* Make sure we flag that all 's' regs are full */
argspos += (argspos % 2); /* align argspos to 8-bytes */
memcpy( &args[argspos], &V_R8(arg), sizeof(V_R8(arg)) );
argspos += sizeof(V_R8(arg)) / sizeof(DWORD);
}
break;
case VT_I8: /* these must be 8-byte aligned, and put in 'r' regs or stack, as they are long-longs */
case VT_UI8:
case VT_CY:
if (rcount < 3)
{
rcount += (rcount % 2); /* align rcount to 8-byte register pair */
memcpy( ®s.r[rcount], &V_UI8(arg), sizeof(V_UI8(arg)) );
rcount += sizeof(V_UI8(arg)) / sizeof(DWORD);
} else {
rcount = 4; /* Make sure we flag that all 'r' regs are full */
argspos += (argspos % 2); /* align argspos to 8-bytes */
memcpy( &args[argspos], &V_UI8(arg), sizeof(V_UI8(arg)) );
argspos += sizeof(V_UI8(arg)) / sizeof(DWORD);
}
break;
case VT_DECIMAL: /* these structures are 8-byte aligned, and put in 'r' regs or stack, can be split between the two */
case VT_VARIANT:
/* 8-byte align 'r' and/or stack: */
if (rcount < 3)
{
rcount += (rcount % 2);
} else {
rcount = 4;
argspos += (argspos % 2);
}
ntemp = sizeof(*arg) / sizeof(DWORD);
while (ntemp > 0)
{
if (rcount < 4)
{
regs.r[rcount++] = *pdwarg++;
} else {
args[argspos++] = *pdwarg++;
}
--ntemp;
}
break;
case VT_BOOL: /* VT_BOOL is 16-bit but BOOL is 32-bit, needs to be extended */
if (rcount < 4)
{
regs.r[rcount++] = V_BOOL(arg);
} else {
args[argspos++] = V_BOOL(arg);
}
break;
default:
if (rcount < 4)
{
regs.r[rcount++] = V_UI4(arg);
} else {
args[argspos++] = V_UI4(arg);
}
break;
}
TRACE("arg %u: type %s %s\n", i, debugstr_vt(prgvt[i]), debugstr_variant(arg));
- }
- argspos += (argspos % 2); /* Make sure stack function alignment is 8-byte */
- TRACE("rcount: %d, scount: %d, argspos: %d\n", rcount, scount, argspos);
- switch (vtReturn)
- {
- case VT_EMPTY: /* EMPTY = no return value */
- case VT_DECIMAL: /* DECIMAL and VARIANT already have a pointer argument passed (see above) */
- case VT_VARIANT:
call_method( func, argspos, args, (DWORD*)®s );
break;
- case VT_R4:
V_R4(pvargResult) = call_float_method( func, argspos, args, (DWORD*)®s );
break;
- case VT_R8:
- case VT_DATE:
V_R8(pvargResult) = call_double_method( func, argspos, args, (DWORD*)®s );
break;
- case VT_I8:
- case VT_UI8:
- case VT_CY:
V_UI8(pvargResult) = call_method(func, argspos, args, (DWORD*)®s );
break;
- default:
V_UI4(pvargResult) = call_method( func, argspos, args, (DWORD*)®s );
break;
- }
- heap_free( args );
- if (vtReturn != VT_VARIANT) V_VT(pvargResult) = vtReturn;
- TRACE("retval: %s\n", debugstr_variant(pvargResult));
- return S_OK;
+#elif defined(__i386__) int argspos, stack_offset; void *func; UINT i;
Am 24.10.2017 um 21:58 schrieb Donna Whisnant:
Thanks. I did miss it, as I had subscribed to the patches list, but forgot to subscribe the devel list. And thanks for your feedback.
One additional comment regarding the tests, as I didn't see mention of that in this thread. My code passes the tests for this DispCallFunc() function (I did verify that) except for the specialized case that tests for win64 functionality in DispCallFunc(). That test was designed to verify stdcall vs. cdecl on 32-bit x86 platforms. However, like the 64-bit x86, ARM (both 32 and 64) doesn't use stdcall in the traditional sense of stack cleaning.
Thus, that test needs to be #ifdef'd out on ARM as it isn't valid. But I didn't like how that looked and it wasn't consistent with the existing 64-bit version. I thought a better solution would be to change it to only be there for "__i386__" instead, as I think that's the only mode that will ever need it.
So if you are wondering why my patch set didn't touch the test code and why that particular test technically fails, that's why. I figured you guys can decide which way you want to tweak that...
My other replies are inlined below...
On Tue, Oct 24, 2017, at 13:37, André Hentschel wrote:
hi, just in case you missed it
-------- Weitergeleitete Nachricht -------- Betreff: Re: oleaut32: Add ARM support to DispCallFunc(). Datum: Mon, 23 Oct 2017 23:07:43 +0200 Von: André Hentschel [email protected] An: [email protected]
Hi Donna and Welcome to the Wine Project!
Whats your motivation for this patch? You did a very good job for a firsttimer!
Thanks! While new to this project, I've been programming for over 37 years. My motivation on this patch was porting some ActiveX components over to ARM to run on a Raspberry Pi Zero W for a data logging device I'm designing for the CAN bus. Those components use IDispatch for invocation rather than the vtable, as they were designed to be extensible without requiring users to recompile their code, and this was the last piece of the puzzle to make them fully work on ARM.
Interesting, wow, so you recompile the ActiveX components?
Some more notes below:
Am 22.10.2017 um 03:04 schrieb Donna Whisnant:
Adds ARM ABI support to DispCallFunc() to allow IDispatch invoke calls to succeed on ARM platforms. This change specifically targets only 32-bit little-endian (ARMEL) platform CPUs. It's believed to likely be compatible with 64-bit and/or big-endian (ARMEB) platforms, but testing for those other platforms should be completed before enabling.
Tested on Raspbian Stretch 2017-09-07 RPi image using a Qemu systemd container to compile and run it on an x86-64 Host.. When running on Raspberry Pi hardware, a 3G/1G split kernel needs to be used for Wine to function.
Signed-off-by: Donna Whisnant [email protected]
dlls/oleaut32/typelib.c | 221 +++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 219 insertions(+), 2 deletions(-)
diff --git a/dlls/oleaut32/typelib.c b/dlls/oleaut32/typelib.c index ebf6d85..d7ef212 100644 --- a/dlls/oleaut32/typelib.c +++ b/dlls/oleaut32/typelib.c @@ -6302,8 +6302,48 @@ static HRESULT WINAPI ITypeInfo_fnGetIDsOfNames( ITypeInfo2 *iface, return DISP_E_UNKNOWNNAME; }
+#if defined(__arm__) && defined(__ARM_32BIT_STATE) && !defined(__ARMEB__) +/* Note: this may work OK on 64-bit and/or ARMEB, but has currently only been tested for 32-bit ARMEL */
While i doubt there is something risky here regarding endianess you can keep the ARMEB check, but get rid of the 32bit check, it might be about Thunk(16bit) vs. ARM(32bit) but I'm not sure. Anyway, this won't work on 64bit, because the instruction set is totally different, but it inspires me to do the 64bit version as soon this lands upstream ;)
Interesting. Between the ARMEB/ARMEL modes and the 32/64 bit check, I would have left the 32-bit check and tossed the !ARMEB check, which is the opposite of your instructions. I'm fairly certain there isn't anything endian-specific. But as you pointed out, the instruction set on the 64-bit is different. So I'm confused why you say to get rid of the 32-bit check. Are you talking about removing the 'define' check for 32-bit or just the comment? Can you please clarify?
Well, with __arm__ you already exclude 64-bit, as this only checks for the traditional arm32 instruction set. Code in that ifdef won't be compiled on arm64. If you want to write code that runs on arm64 you'd check for __aarch64__ instead of __arm__. see following link for reference: https://source.winehq.org/git/wine.git/blob/85527dbdf39c70fe5b1a341701813ecf...
-#ifdef __i386__ +extern LONGLONG call_method( void *func, int nb_stk_args, const DWORD *stk_args, const DWORD *reg_args ); +__ASM_GLOBAL_FUNC( call_method,
/* r0 = *func
* r1 = nb_stk_args
* r2 = *stk_args (pointer to 'nb_stk_args' DWORD values to push on stack)
* r3 = *reg_args (pointer to 8, 64-bit d0-d7 (double) values OR as 16, 32-bit s0-s15 (float) values, followed by 4, 32-bit (DWORD) r0-r3 values)
*/
"push {fp, lr}\n\t" /* Save frame pointer and return address (stack still aligned to 8 bytes) */
__ASM_CFI(".cfi_adjust_cfa_offset 4\n\t")
__ASM_CFI(".cfi_rel_offset %fp,0\n\t")
"add fp, sp, #4\n\t" /* Setup frame pointer */
Is this adjusted FP used somewhere? (I forgot a lot of things regarding arm32...)
In my first version of the code, I was using r4 and r5 (which had to be saved/restored) and originally had slightly different arguments coming into the function (more than was fitting in the registers) and so I was setting up the frame for easier debugging. But then I remembered I have the 'ip' register to my disposal, and that was sufficient and allowed me to toss the saving of those other registers.
And, my original function signature was passing both the floating point registers and the core registers in separate structures using separate pointers to them, but then I realized that could be a single structure with both, so I tossed the extra argument and everything would then fit into the registers alone for the call, meaning the stack frame itself isn't really needed anymore. However, it's still necessary to set 'fp', though, as it's needed for cleaning the stack on the return from the callee, but I could toss the +4 adjustment of it or use "#0" for the adjustment like the compiler probably would (as I think it maps to the same instruction binary). If I do that, the ASM_CFI macro usage could be dropped too as there would be nothing for them to report.
__ASM_CFI(".cfi_def_cfa_register %fp\n\t")
"lsls r1, r1, #2\n\t" /* r1 = nb_stk_args * sizeof(DWORD) */
"beq 1f\n\t" /* Skip allocation if no stack args */
"add r2, r2, r1\n" /* Calculate ending address of incoming stack data */
"2:\tldr ip, [r2, #-4]!\n\t" /* Get next value */
"str ip, [sp, #-4]!\n\t" /* Push it on the stack */
"subs r1, r1, #4\n\t" /* Decrement count */
"bgt 2b\n\t" /* Loop till done */
"1:\tvldm r3!, {s0-s15}\n\t" /* Load the s0-s15/d0-d7 arguments */
"mov ip, r0\n\t" /* Save the function call address to ip before we nuke r0 with arguments to pass */
"ldm r3, {r0-r3}\n\t" /* Load the r0-r3 arguments */
"blx ip\n\t" /* Call the target function */
"sub sp, fp, #4\n\t" /* Clean the stack using 'fp' to allow for either stdcall or cdecl calls */
__ASM_CFI(".cfi_def_cfa %sp,4\n\t")
"pop {fp, pc}\n\t" /* Restore FP and return */
__ASM_CFI(".cfi_same_value %fp\n\t") /* Here for completeness and symmetry with other cpus, but won't be reached */
)
+/* same function but returning single/double floating point */ +static float (* const call_float_method)(void *, int, const DWORD *, const DWORD *) = (void *)call_method; +static double (* const call_double_method)(void *, int, const DWORD *, const DWORD *) = (void *)call_method;
+#elif defined(__i386__)
extern LONGLONG call_method( void *func, int nb_args, const DWORD *args, int *stack_offset ); __ASM_GLOBAL_FUNC( call_method, @@ -6637,7 +6677,184 @@ DispCallFunc( void* pvInstance, ULONG_PTR oVft, CALLCONV cc, VARTYPE vtReturn, UINT cActuals, VARTYPE* prgvt, VARIANTARG** prgpvarg, VARIANT* pvargResult) { -#ifdef __i386__ +#if defined(__arm__) && defined(__ARM_32BIT_STATE) && !defined(__ARMEB__) +/* Note: this may work OK on 64-bit and/or ARMEB, but has currently only been tested for 32-bit ARMEL */
Same as above
- int argspos;
- void *func;
- UINT i;
- DWORD *args;
- struct {
union {
float s[16];
double d[8];
} sd;
DWORD r[4];
- } regs;
- int rcount; /* 32-bit register index count */
- int scount; /* single-precision float register index count (will be incremented twice for doubles, plus alignment) */
- TRACE("(%p, %ld, %d, %d, %d, %p, %p, %p (vt=%d))\n",
pvInstance, oVft, cc, vtReturn, cActuals, prgvt, prgpvarg,
pvargResult, V_VT(pvargResult));
You can go up to 100 chars per line, so be free to rearrange this
I could, but the trace line was a cut-and-paste from the i386 path that was already there and I figured consistency in the code was good.
If you add new code, you're allowed to modernize it.
- if (cc != CC_STDCALL && cc != CC_CDECL)
- {
FIXME("unsupported calling convention %d\n",cc);
return E_INVALIDARG;
- }
- argspos = 0;
- rcount = 0;
- scount = 0;
- /* Determine if we need to pass a pointer for the return value as arg 0. If so, do that */
- /* first as it will need to be in the 'r' registers: */
- switch (vtReturn)
- {
- case VT_DECIMAL:
- case VT_VARIANT:
regs.r[rcount++] = (DWORD)pvargResult; /* arg 0 is a pointer to the result */
break;
- case VT_HRESULT:
WARN("invalid return type %u\n", vtReturn);
return E_INVALIDARG;
- default: /* And all others are in 'r', 's', or 'd' registers or have no return value */
break;
- }
- if (pvInstance)
- {
const FARPROC *vtable = *(FARPROC **)pvInstance;
func = vtable[oVft/sizeof(void *)];
regs.r[rcount++] = (DWORD)pvInstance; /* the This pointer is always the first parameter */
- }
- else func = (void *)oVft;
- /* maximum size for an argument is sizeof(VARIANT). Also allow for return pointer and stack alignment. */
- args = heap_alloc(sizeof(VARIANT) * cActuals + sizeof(DWORD) * 4 );
- for (i = 0; i < cActuals; i++)
- {
VARIANT *arg = prgpvarg[i];
DWORD *pdwarg = (DWORD *)(arg); /* a reinterpret_cast of the variant, used for copying structures when they are split between registers and stack */
int ntemp; /* Used for counting words split between registers and stack */
switch (prgvt[i])
{
case VT_EMPTY:
break;
case VT_R4: /* these must be 4-byte aligned, and put in 's' regs or stack, as they are single-floats */
if (scount < 16)
{
regs.sd.s[scount++] = V_R4(arg);
} else {
args[argspos++] = V_UI4(arg);
}
I think no curly brackets are necessary here at all, but we usually don't open them in the same line as the if/else statement for new code
Oh sorry. I thought I was being consistent with the rest of the code-base. I have the curly brace on the next line after the 'if', but didn't think about the 'else'. I will fix that and resubmit it. I personally like the curly braces even for single-line statements, though, as it keeps from either misreading it or accidentally adding something for the conditional and forgetting to add them. But I will fix the formatting.
break;
case VT_R8: /* these must be 8-byte aligned, and put in 'd' regs or stack, as they are double-floats */
case VT_DATE:
if (scount < 15)
{
scount += (scount % 2); /* align scount to next whole double */
regs.sd.d[scount/2] = V_R8(arg);
scount += 2;
} else {
scount = 16; /* Make sure we flag that all 's' regs are full */
argspos += (argspos % 2); /* align argspos to 8-bytes */
memcpy( &args[argspos], &V_R8(arg), sizeof(V_R8(arg)) );
argspos += sizeof(V_R8(arg)) / sizeof(DWORD);
}
break;
case VT_I8: /* these must be 8-byte aligned, and put in 'r' regs or stack, as they are long-longs */
case VT_UI8:
case VT_CY:
if (rcount < 3)
{
rcount += (rcount % 2); /* align rcount to 8-byte register pair */
memcpy( ®s.r[rcount], &V_UI8(arg), sizeof(V_UI8(arg)) );
rcount += sizeof(V_UI8(arg)) / sizeof(DWORD);
} else {
rcount = 4; /* Make sure we flag that all 'r' regs are full */
argspos += (argspos % 2); /* align argspos to 8-bytes */
memcpy( &args[argspos], &V_UI8(arg), sizeof(V_UI8(arg)) );
argspos += sizeof(V_UI8(arg)) / sizeof(DWORD);
}
break;
case VT_DECIMAL: /* these structures are 8-byte aligned, and put in 'r' regs or stack, can be split between the two */
case VT_VARIANT:
/* 8-byte align 'r' and/or stack: */
if (rcount < 3)
{
rcount += (rcount % 2);
} else {
rcount = 4;
argspos += (argspos % 2);
}
ntemp = sizeof(*arg) / sizeof(DWORD);
while (ntemp > 0)
{
if (rcount < 4)
{
regs.r[rcount++] = *pdwarg++;
} else {
args[argspos++] = *pdwarg++;
}
--ntemp;
}
break;
case VT_BOOL: /* VT_BOOL is 16-bit but BOOL is 32-bit, needs to be extended */
if (rcount < 4)
{
regs.r[rcount++] = V_BOOL(arg);
} else {
args[argspos++] = V_BOOL(arg);
}
break;
default:
if (rcount < 4)
{
regs.r[rcount++] = V_UI4(arg);
} else {
args[argspos++] = V_UI4(arg);
}
break;
}
TRACE("arg %u: type %s %s\n", i, debugstr_vt(prgvt[i]), debugstr_variant(arg));
- }
- argspos += (argspos % 2); /* Make sure stack function alignment is 8-byte */
- TRACE("rcount: %d, scount: %d, argspos: %d\n", rcount, scount, argspos);
- switch (vtReturn)
- {
- case VT_EMPTY: /* EMPTY = no return value */
- case VT_DECIMAL: /* DECIMAL and VARIANT already have a pointer argument passed (see above) */
- case VT_VARIANT:
call_method( func, argspos, args, (DWORD*)®s );
break;
- case VT_R4:
V_R4(pvargResult) = call_float_method( func, argspos, args, (DWORD*)®s );
break;
- case VT_R8:
- case VT_DATE:
V_R8(pvargResult) = call_double_method( func, argspos, args, (DWORD*)®s );
break;
- case VT_I8:
- case VT_UI8:
- case VT_CY:
V_UI8(pvargResult) = call_method(func, argspos, args, (DWORD*)®s );
break;
- default:
V_UI4(pvargResult) = call_method( func, argspos, args, (DWORD*)®s );
break;
- }
- heap_free( args );
- if (vtReturn != VT_VARIANT) V_VT(pvargResult) = vtReturn;
- TRACE("retval: %s\n", debugstr_variant(pvargResult));
- return S_OK;
+#elif defined(__i386__) int argspos, stack_offset; void *func; UINT i;
I just posted v2 of the patch with the requisite changes.
[...]
Well, with __arm__ you already exclude 64-bit, as this only checks for the traditional arm32 instruction set. Code in that ifdef won't be compiled on arm64. If you want to write code that runs on arm64 you'd check for __aarch64__ instead of __arm__. see following link for reference: https://source.winehq.org/git/wine.git/blob/85527dbdf39c70fe5b1a341701813ecf...
Interesting! I didn't realize that that was true about __arm__. The majority of my ARM work has been on 32-bit devices. I have done some 64-bit ARM work, but nothing that needed those defines.
I had looked in the ARM and gcc specs on the defines and my interpretation of it was that __arm__ was always defined on any ARM platform (32 or 64) and that on 64 the "__aarch64__" was also defined. I didn't realize they are mutually exclusive with __arm__ implying 32-bit. I must have misinterpreted it.