What is your test app doing? It probably needs a test under Windows to see which encoding (ANSI or OEM) a non-Unicode app should receive its input in via a pipe.
I meant things like 'dir >lst.txt' and 'dir | sort > lst.txt'. 'dir' and 'sort' could be replaced by external .exes that take input and produce output.
Hiya,
I wrote an app that calls ReadConsoleW and then prints the first character read as hex, using ALT+157 to enter a character that differs between the code pages I was experimenting with:
(All of the following was under Windows XP.)

Code:
    ReadConsoleW(GetStdHandle(STD_INPUT_HANDLE), buf, sizeof(buf)/sizeof(WCHAR), &nChars, NULL);
    printf("Character at position 0 is %x\n", buf[0]);
Results:
    Active code page: 437 - Character at position 0 is a5
    Active code page: 850 - Character at position 0 is d8
    Active code page: 1252 - Character at position 0 is 9d
So, if I interpret that correctly, I think it is converting from the console code page to Unicode.
I then modified it to write out (**) Unicode character 0xa5 to see whether it is converted back to OEM or ANSI. Although it is hard to prove beyond doubt (*), it appears that I am getting the reverse conversion: the character is converted to the console code page before being output.
(*) In cmd.exe, when it is not running full screen, the font does not change when chcp is executed, so for 437 and 850 I get a 0-like character and a yen sign. Full screen, both give me a yen sign, so I conclude that the code point being output changes with the code page and is displayed according to the font.
(**) Because I am testing this with WriteConsoleW, the output cannot be redirected to a file, so I can't see the raw code points.
Is there anything else I can test, or am I OK to put file tests into the msvcrt test suite and let the msvcrt Unicode printf and friends convert to non-Unicode using the console code page before writing to the file handle?
Suggested tests are welcome, but I was planning to use the Unicode wide-character file I/O functions, then open the file and confirm the bytes are as expected (if I stick to a-z and 0-9, we will know if it has come out with extra zero bytes).
Regards and thanks for your time, Jason
"Ann & Jason Edmeades" us@edmeades.me.uk wrote:
Is there anything else I can test, or am I OK to put file tests into the msvcrt test suite and let the msvcrt Unicode printf and friends convert to non-Unicode using the console code page before writing to the file handle?
Why don't you simply run native MSVCRT under +relay,+snoop and see how it does things on Win9x and NT-based systems, instead of reinventing the wheel?
I started with that, but the problem is that msvcrt doesn't call anything to do the translations, which are definitely happening. What I think happens is that at startup msvcrt uses GetACP to retrieve a code page and then converts each byte from 0x00 to 0xff into a wide character. It then builds non-Unicode uppercase and lowercase versions plus a character mapping (via LCMapString) so it knows what each character is. So I strongly suspect it simply uses these cached tables to do the conversion.
However, since I can't snoop, relay or debug it (I tried; it was far too confusing), that left the 'replicate via tests' alternative...
I was surprised that GetACP was the only code page call I could see msvcrt making (rather than one for the console code page; I guess that's because it doesn't know or care whether the output is going to the console or a file?).
My current plan, unless you have strong objections, is to make the msvcrt wprintf routines use WideCharToMultiByte to convert the string to the GetACP code page before writing it out, and to add file tests for this to the msvcrt test suite.
Jason
OK, a bit more testing and a bit more research are starting to help...
    f1 = fopen("text.file", "w+t");
    fwprintf(f1, L"hello\n");
    fclose(f1);
    f2 = fopen("binary.file", "w+b");
    fwprintf(f2, L"hello\n");
    fclose(f2);
In this case text.file comes out as multibyte (single-byte in my case) and binary.file comes out as Unicode. So I think the answer is to check whether the stream being written to is binary or text and convert only if it is text. This assumes the standard handles are opened in text mode. The next problem is working out how to tell :-)
I'm also guessing that relay and snoop can't see calls made inside a DLL, which might explain the lack of calls. Perhaps something like wcstombs is the key to this.
Jason
"Ann & Jason Edmeades" us@edmeades.me.uk wrote:
My current plan, unless you have strong objections, is to make the msvcrt wprintf routines use WideCharToMultiByte to convert the string to the GetACP code page before writing it out, and to add file tests for this to the msvcrt test suite.
You shouldn't guess; you need to investigate how things are supposed to work in Windows. IMO you are going down the wrong route by looking at how msvcrt works instead of at how cmd.exe does. Did you try running native cmd.exe in Wine to see how it handles code page conversions, or running it under logger.exe in Windows?
I'm also guessing that relay and snoop can't see calls made inside a DLL, which might explain the lack of calls. Perhaps something like wcstombs is the key to this.
Even if calls made inside a DLL are not logged, wcstombs makes an external call to do its job.