(Meant to copy wine-devel as well, in case anyone else had any comments)
Subject: RE: locales, unicode and ansi with msvcrt (bug 8022)
My current plan, unless you have strong objections, is to make the
wprintf
msvcrt routines use WideCharToMultiByte on the string into the GetACP codepage before being written out, and add file tests for this into the msvcrt testsuite
You shouldn't really guess, you need to investigate how the things are
supposed
to work in Windows. IMO you are going wrong route by looking how msvcrt
works
instead of looking how cmd.exe does. Did you try to run native cmd.exe in
WIne
and see how it handles cp conversions, or under logger.exe in Windows?
Firstly, leave cmd.exe out of the equation. This has nothing to do with it at all. Just to emphasize, I am looking at the part of the bug where xcopy was writing out characters padded with gaps (nulls) which is caused by it simply calling msvcrt's wprintf.
I can recreate the problem with a simple, one line console application run through wine (not cmd.exe nor wineconsole):
void main() { wprintf(L"test with wprintf"); } [program was compiled VC6 : cl /MD test.c ]
wine test >a.a od -x a.a 0000000 0074 0065 0073 0074 0020 0077 0069 0074 0000020 0068 0020 0077 0070 0072 0069 006e 0074 0000040 0066
export WINEDLLOVERRIDES=msvcrt=n wine test >a.a od -x a.a 0000000 6574 7473 7720 7469 2068 7077 6972 746e 0000020 0066
Shows wine internal msvcrt differs from windows msvcrt and incorrectly outputs Unicode
I'm also guessing relay and snoop cant see internal dll calls, so that
might
explain the lack of calls. Perhaps something like wcstombs would be the
key
to this
Even if internal dll calls are not logged, wcstombs does an external call
to do
its job.
What makes you say that? It potentially doesn't need to, given the internal caching it does in process_attach, it could do it all in memory. However...
WINEDEBUG=+all,+relay,+snoop wine test 2>a.a a.a (removing Set/GetLastError and Tls call) shows:
0009:CALL msvcrt.wprintf(00403010) ret=0040100d 0009:Call ntdll.RtlAllocateHeap(00410000,00000000,00001000) ret=77c2c3c9 0009:trace:heap:RtlAllocateHeap (0x410000,00000002,00001000): returning 0x412168 0009:Ret ntdll.RtlAllocateHeap() retval=00412168 ret=77c2c3c9 0009:Call kernel32.WriteFile(00000008,0033f9c8,00000011,0033f9ac,00000000) ret=77c30218 0009:trace:file:WriteFile 0x8 0x33f9c8 17 0x33f9ac (nil) 0009:trace:ntdll:NtWriteFile (0x8,(nil),(nil),(nil),0x33f8b8,0x33f9c8,0x00000011,(nil),(nil))! 0009: get_handle_fd( handle=0x8, access=00000002, cached=1 ) 0009: get_handle_fd() = 0 { type=1, flags=0 } 0009:Ret kernel32.WriteFile() retval=00000001 ret=77c30218 0009:RET msvcrt.wprintf() retval=00000011 ret=0040100d
Note the writefile call - it passed 0x11 bytes (number of bytes in the narrow string)
Logger shows no calls from msvcrt during that time, but I don't trust it.
Running test program under windbg and trying to follow it through it does appear wctomb is called (it does NOT result in WideCharToMultiByte, possibly due to the characters or locale I am using?) but it does it on a character by character basis (I believe during the formatting). This would be the equivalent on wine to calling pf_vsnprintf with out.unicode set to false.
(I don't know why it doesn't come up on snoop - maybe it's a near call (no fixups?).
So its back to a necessity to ensure for text streams we only output multibyte (not wide).
I can see 2 ways of doing it:
1. Change MSVCRT_vfwprintf and add a conversion widechartomultibyte for a text stream (if (MSVCRT_fdesc[fd].wxflag & WX_TEXT) ...) Advantages: Changes self contained, easy Disadvantages: Formatting into Unicode buffer, then need to allocate space for answer, convert, print, free
2. Change MSVCRT_vfwprintf to use a new internal function which is identical to MSVCRT_vsnwprintf but sets out->Unicode = FALSE and out.buf.A as destination string. Advantages: Fastest way (conversion is WC->MB as the formatting is processed) Disadvantage: Needs new internal function
My preference is (2), and I have coded and tested it to confirm it does solve both the file i/o tests (included in patch) and the console output tests (performed by hand)
Comments? Jason
Ann & Jason Edmeades a écrit :
(Meant to copy wine-devel as well, in case anyone else had any comments)
Subject: RE: locales, unicode and ansi with msvcrt (bug 8022)
My current plan, unless you have strong objections, is to make the
wprintf
msvcrt routines use WideCharToMultiByte on the string into the GetACP codepage before being written out, and add file tests for this into the msvcrt testsuite
You shouldn't really guess, you need to investigate how the things are
supposed
to work in Windows. IMO you are going wrong route by looking how msvcrt
works
instead of looking how cmd.exe does. Did you try to run native cmd.exe in
WIne
and see how it handles cp conversions, or under logger.exe in Windows?
Firstly, leave cmd.exe out of the equation. This has nothing to do with it at all. Just to emphasize, I am looking at the part of the bug where xcopy was writing out characters padded with gaps (nulls) which is caused by it simply calling msvcrt's wprintf.
I can recreate the problem with a simple, one line console application run through wine (not cmd.exe nor wineconsole):
void main() { wprintf(L"test with wprintf"); } [program was compiled VC6 : cl /MD test.c ]
wine test >a.a od -x a.a 0000000 0074 0065 0073 0074 0020 0077 0069 0074 0000020 0068 0020 0077 0070 0072 0069 006e 0074 0000040 0066
export WINEDLLOVERRIDES=msvcrt=n wine test >a.a od -x a.a 0000000 6574 7473 7720 7469 2068 7077 6972 746e 0000020 0066
Shows wine internal msvcrt differs from windows msvcrt and incorrectly outputs Unicode
I'm also guessing relay and snoop cant see internal dll calls, so that
might
explain the lack of calls. Perhaps something like wcstombs would be the
key
to this
Even if internal dll calls are not logged, wcstombs does an external call
to do
its job.
What makes you say that? It potentially doesn't need to, given the internal caching it does in process_attach, it could do it all in memory. However...
WINEDEBUG=+all,+relay,+snoop wine test 2>a.a a.a (removing Set/GetLastError and Tls call) shows:
0009:CALL msvcrt.wprintf(00403010) ret=0040100d 0009:Call ntdll.RtlAllocateHeap(00410000,00000000,00001000) ret=77c2c3c9 0009:trace:heap:RtlAllocateHeap (0x410000,00000002,00001000): returning 0x412168 0009:Ret ntdll.RtlAllocateHeap() retval=00412168 ret=77c2c3c9 0009:Call kernel32.WriteFile(00000008,0033f9c8,00000011,0033f9ac,00000000) ret=77c30218 0009:trace:file:WriteFile 0x8 0x33f9c8 17 0x33f9ac (nil) 0009:trace:ntdll:NtWriteFile (0x8,(nil),(nil),(nil),0x33f8b8,0x33f9c8,0x00000011,(nil),(nil))! 0009: get_handle_fd( handle=0x8, access=00000002, cached=1 ) 0009: get_handle_fd() = 0 { type=1, flags=0 } 0009:Ret kernel32.WriteFile() retval=00000001 ret=77c30218 0009:RET msvcrt.wprintf() retval=00000011 ret=0040100d
Note the writefile call - it passed 0x11 bytes (number of bytes in the narrow string)
Logger shows no calls from msvcrt during that time, but I don't trust it.
Running test program under windbg and trying to follow it through it does appear wctomb is called (it does NOT result in WideCharToMultiByte, possibly due to the characters or locale I am using?) but it does it on a character by character basis (I believe during the formatting). This would be the equivalent on wine to calling pf_vsnprintf with out.unicode set to false.
(I don't know why it doesn't come up on snoop - maybe it's a near call (no fixups?).
So its back to a necessity to ensure for text streams we only output multibyte (not wide).
I can see 2 ways of doing it:
- Change MSVCRT_vfwprintf and add a conversion widechartomultibyte for a
text stream (if (MSVCRT_fdesc[fd].wxflag & WX_TEXT) ...) Advantages: Changes self contained, easy Disadvantages: Formatting into Unicode buffer, then need to allocate space for answer, convert, print, free
- Change MSVCRT_vfwprintf to use a new internal function which is identical
to MSVCRT_vsnwprintf but sets out->Unicode = FALSE and out.buf.A as destination string. Advantages: Fastest way (conversion is WC->MB as the formatting is processed) Disadvantage: Needs new internal function
My preference is (2), and I have coded and tested it to confirm it does solve both the file i/o tests (included in patch) and the console output tests (performed by hand)
Comments? Jason
what should be done is: - ensure that all msvcrt's function needing to write unicode strings behave as follows: + if the output handle is a console handle, use WriteConsoleW -> this is the only way to get the proper output + if not, convert the unicode string to ansi string and use WriteFile For the implementation, note that the call to WriteConsoleW will fail if the output stream is not a console handle, ie has been redirected to a file, pipe... => which makes the coding rather straightforward
the current issue is that all the w-functions end up calling WriteFile which is wrong
NB : in your test cases, don't use L"xxx" strings, they won't work when your test case is compiled under linux
A+
Detlef - Thanks... You can see what I added as an afterthough!
Eric, Thanks for taking the time to look at this and for your answers
- ensure that all msvcrt's function needing to write unicode strings
behave as follows:
- if the output handle is a console handle, use WriteConsoleW ->
this is the only way to get the proper output
- if not, convert the unicode string to ansi string and use WriteFile
For the implementation, note that the call to WriteConsoleW will fail if the output stream is not a console handle, ie has been redirected to a file, pipe... => which makes the coding rather straightforward
I disagree...
- My tests have shown that when writing to files as well, the output depends on whether the file was opened as text or binary. If opened as text, then it comes out in ansi.
- What makes you say calling WriteConsoleW is the only way to get proper output. WriteFileA will call WriteConsoleA for a console handle, so the calling code can then not need to care if it's a console or file handle.
- As has been seen when trying to work out what windows does, it converts as it parses character by character which is better than formatting into Unicode, then converting to ansi (possibly twice with a heapalloc inbetween to allocate a temporary buffer). This is exactly the change my patch introduces.
- Finally, debugging under windows seems to show it uses _putwc which does wctomb in memory, followed by a flush which results in a writefile (even for a console) call.
All of this would point at the fact the patch as written is functionally correct, I think. What do you think?
the current issue is that all the w-functions end up calling WriteFile which is wrong
Right, that's what I am looking at solving, and the patch will address all the *printf calls. There may be another exercise to look into some of the other msvcrt calls but I'm concentrating on the issue at hand to start with.
NB : in your test cases, don't use L"xxx" strings, they won't work when your test case is compiled under linux
Sure - I was doing it so the simplicity of the tests stood out (and I was compiling them on windows and running under wine). The tests added in msvcrt did not use this syntax...
Jason
Hi Jason. On Mo, 2007-04-16 at 22:28 +0100, you wrote:
- ok (i = (fread (wbuffer, 1, sizeof (wbuffer), file) == 3 *
sizeof(WCHAR)), "read wrong byte count got %i\n", c);
You put the result in "i", but in the message, you dump "c". (similar bug some lines later: Copy & Paste)
Also calling the function to test in a seperate Line looks much nicer. Thanks