FW: locales, unicode and ansi with msvcrt (bug 8022) - wine-devel

16 Apr 2007


      (Meant to copy wine-devel as well, in case anyone else had any comments)
Subject: RE: locales, unicode and ansi with msvcrt (bug 8022)
...
...
My current plan, unless you have strong objections, is to make the
wprintf
...
...
msvcrt routines use WideCharToMultiByte on the string into the GetACP
codepage before being written out, and add file tests for this into the
msvcrt testsuite
...
You shouldn't really guess, you need to investigate how the things are
supposed
...
to work in Windows. IMO you are going wrong route by looking how msvcrt
works
...
instead of looking how cmd.exe does. Did you try to run native cmd.exe in
WIne
...
and see how it handles cp conversions, or under logger.exe in Windows?
Firstly, leave cmd.exe out of the equation. This has nothing to do with it
at all. Just to emphasize, I am looking at the part of the bug where xcopy
was writing out characters padded with gaps (nulls) which is caused by it
simply calling msvcrt's wprintf.
I can recreate the problem with a simple, one line console application run
through wine (not cmd.exe nor wineconsole):
void main() { wprintf(L"test with wprintf"); }
[program was compiled VC6 : cl /MD test.c ]
wine test >a.a
od -x a.a
0000000 0074 0065 0073 0074 0020 0077 0069 0074
0000020 0068 0020 0077 0070 0072 0069 006e 0074
0000040 0066
export WINEDLLOVERRIDES=msvcrt=n
wine test >a.a
od -x a.a
0000000 6574 7473 7720 7469 2068 7077 6972 746e
0000020 0066
Shows wine internal msvcrt differs from windows msvcrt and incorrectly
outputs Unicode
...
...
I'm also guessing relay and snoop cant see internal dll calls, so that
might
...
...
explain the lack of calls. Perhaps something like wcstombs would be the
key
...
...
to this
...
Even if internal dll calls are not logged, wcstombs does an external call
to do
...
its job.
What makes you say that? It potentially doesn't need to, given the internal
caching it does in process_attach, it could do it all in memory. However...
WINEDEBUG=+all,+relay,+snoop wine test 2>a.a
a.a (removing Set/GetLastError and Tls call) shows:
0009:CALL msvcrt.wprintf(00403010) ret=0040100d
0009:Call ntdll.RtlAllocateHeap(00410000,00000000,00001000) ret=77c2c3c9
0009:trace:heap:RtlAllocateHeap (0x410000,00000002,00001000): returning
0x412168
0009:Ret  ntdll.RtlAllocateHeap() retval=00412168 ret=77c2c3c9
0009:Call kernel32.WriteFile(00000008,0033f9c8,00000011,0033f9ac,00000000)
ret=77c30218
0009:trace:file:WriteFile 0x8 0x33f9c8 17 0x33f9ac (nil)
0009:trace:ntdll:NtWriteFile
(0x8,(nil),(nil),(nil),0x33f8b8,0x33f9c8,0x00000011,(nil),(nil))!
0009: get_handle_fd( handle=0x8, access=00000002, cached=1 )
0009: get_handle_fd() = 0 { type=1, flags=0 }
0009:Ret  kernel32.WriteFile() retval=00000001 ret=77c30218
0009:RET  msvcrt.wprintf() retval=00000011 ret=0040100d
Note the writefile call - it passed 0x11 bytes (number of bytes in the
narrow string)
Logger shows no calls from msvcrt during that time, but I don't trust it.
Running test program under windbg and trying to follow it through it does
appear wctomb is called (it does NOT result in WideCharToMultiByte, possibly
due to the characters or locale I am using?) but it does it on a character
by character basis (I believe during the formatting). This would be the
equivalent on wine to calling pf_vsnprintf with out.unicode set to false.
(I don't know why it doesn't come up on snoop - maybe it's a near call (no
fixups?).
So its back to a necessity to ensure for text streams we only output
multibyte (not wide).
I can see 2 ways of doing it:
1. Change MSVCRT_vfwprintf and add a conversion widechartomultibyte for a
text stream (if (MSVCRT_fdesc[fd].wxflag & WX_TEXT) ...)
Advantages: Changes self contained, easy
Disadvantages: Formatting into Unicode buffer, then need to allocate space
for answer, convert, print, free
2. Change MSVCRT_vfwprintf to use a new internal function which is identical
to MSVCRT_vsnwprintf but sets out->Unicode = FALSE and out.buf.A as
destination string.
Advantages: Fastest way (conversion is WC->MB as the formatting is
processed)
Disadvantage: Needs new internal function
My preference is (2), and I have coded and tested it to confirm it does
solve both the file i/o tests (included in patch) and the console output
tests (performed by hand)
Comments?
Jason