On 9/13/21 7:56 PM, Rémi Bernon wrote:
On 9/13/21 7:08 PM, Zebediah Figura wrote:
On 9/13/21 12:01 PM, Rémi Bernon wrote:
On 9/13/21 7:00 PM, Zebediah Figura wrote:
On 9/13/21 11:53 AM, Rémi Bernon wrote:
On 9/13/21 6:42 PM, Zebediah Figura wrote:
On 9/13/21 10:25 AM, Rémi Bernon wrote:
> On 9/13/21 4:51 PM, Piotr Caban wrote:
>> Hi Rémi,
>>
>> On 9/13/21 2:23 PM, Rémi Bernon wrote:
>>> +static inline void __stosb(unsigned char* dst, unsigned char c, size_t n)
>>> +{
>>> +    __asm__ __volatile__ ("cld; rep; stosb" : "=D"(dst) : "a"(c), "D"(dst), "c"(n) : "memory", "cc");
>>> +}
>> I don't know if it's important here but Microsoft's i386 cdecl abi
>> specifies direction flag value on function call. Maybe if __cdecl is
>> added cld call may be removed.
>>
>
> All the ABIs are apparently requiring it to be cleared before a function
> call, or am I missing something? So it looks like it's not needed
> anywhere and I was just over cautious.
>
Well, ABIs do, but you're not defining that as an asm function; you're using inline assembly. So you can't guarantee anything.
But it's wrapped in a function, which implies what its calling convention ABI implies?
No, not really. The compiler is free to insert whatever assembly it wants before and after the __asm__ block, as long as it satisfies the constraints.
Not only that, but because it's a static function, the compiler is also free not to give it a standard calling convention at all.
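To illustrate the point about inline assembly: since the compiler gives no guarantee about the direction flag at the point where an __asm__ block is emitted, one conservative option is to clear it inside the block itself. The following is only a sketch under assumptions, not the patch from this thread: the helper name fill_bytes_stosb is hypothetical, the "+D"/"+c" read-write constraints are a choice made here (the quoted patch uses separate "=D"/"D" operands), and the memset fallback exists only so the snippet builds on non-x86 targets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the thread): fill n bytes at dst with c.
 * The cld is part of the asm block, so the block does not rely on the
 * caller's calling convention or on what the compiler emitted before it. */
static inline void fill_bytes_stosb(unsigned char *dst, unsigned char c, size_t n)
{
    assert(dst != NULL || n == 0);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* "+D" and "+c" mark (e/r)di and (e/r)cx as read and written,
     * because rep stosb advances the pointer and counts down to zero. */
    __asm__ __volatile__ ("cld; rep; stosb"
                          : "+D"(dst), "+c"(n)
                          : "a"(c)
                          : "memory", "cc");
#else
    /* Assumed fallback so the sketch compiles elsewhere. */
    memset(dst, c, n);
#endif
}
```

Because the function is static inline, the compiler may inline it or give it a non-standard calling convention, which is why the sketch does not depend on any ABI rule about the flag.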
Well, anyway MSVC doesn't generate cld with this intrinsic so I think we should not either.
I don't see why that means anything. At best, that just means MSVC is checking whether the direction flag was already clear, and not clearing it again. In theory, GCC could do that too, but I don't see any clear way to make the value of DF an input constraint.
I don't think it's doing that, and there's also probably no point.
These intrinsics are meant to generate assembly instructions, so you can very well combine them: set the direction flag beforehand, for instance with __writeeflags, and effectively reverse a later __stosb or __movsb.
Then, although that's what it does, it's not documented and maybe isn't very safe.
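The effect of the direction flag described above can be sketched in GCC-style inline assembly. This is an illustration under assumptions, not code from the thread: the helper name fill_bytes_backward is hypothetical, and it sets and restores the flag inside a single asm block (std ... cld) rather than through __writeeflags, so that the flag is never left set across compiler-generated code. The plain C loop is an assumed fallback for non-x86 targets.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the thread): with DF set, rep stosb
 * stores at the given address and then moves downward, so this fills
 * the n bytes ending at 'last' (inclusive). */
static inline void fill_bytes_backward(unsigned char *last, unsigned char c, size_t n)
{
    assert(last != NULL || n == 0);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* std sets the direction flag; cld clears it again before the block
     * ends, since compiled code generally assumes DF is clear. */
    __asm__ __volatile__ ("std; rep; stosb; cld"
                          : "+D"(last), "+c"(n)
                          : "a"(c)
                          : "memory", "cc");
#else
    /* Assumed fallback: same result written as a loop. */
    for (size_t i = 0; i < n; i++)
        *(last - i) = c;
#endif
}
```

For example, calling fill_bytes_backward(buf + 5, 0x5A, 3) on a zeroed 8-byte buffer writes buf[5], buf[4] and buf[3] and leaves the other bytes untouched.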
Probably I should just add the "cld; rep; stosb" inline instead.
Yes. This will also more or less match what LLVM has done (they are force-inlining memset in this case).