On Wed, 27 Jul 2005 20:34, Alexandre Julliard wrote:
Since there is no way of knowing if the target window uses the same code page, or even if its code page won't change between the time the message is stored in the queue and when it is retrieved, the only sane approach is to store messages in the queue in Unicode. Only SendMessage calls that bypass the queue avoid the translation. I'm pretty sure that this is what Windows does too, if you have a test demonstrating the opposite I'd be very interested to see it.
I have just finished running a series of tests using the attached programs - msgchar, msgchar2 and msgchar3.
* The short version:
WM_CHAR messages are delivered "immediately" whether sent by SendMessage or PostMessage. Where there is a conversion from A->W or W->A, the conversion is performed using a modified CP1252 table regardless of the values of CP_THREAD_ACP and CP_ACP. In effect, when SendMessageA is used to send a message to an ANSI window that was created by a thread with a different code page, no translation is performed. This was tested on a Win2k system with a default ACP of CP1252 (Western Europe) and a WinXP system with a default ACP of CP950 (Chinese Traditional).
The table used for the conversion differs from the real CP1252 table in that characters 0x81, 0x8D, 0x8F, 0x90 and 0x9D, which are unassigned in CP1252, are converted to and from the Unicode characters with the same value (+). This makes the conversion round-trippable via Unicode, so a WM_CHAR posted with PostMessageA to an ANSI window will always work provided the data is in the code page expected by the recipient. SendMessageA and PostMessageA to a Unicode window, and SendMessageW and PostMessageW to an ANSI window, are only guaranteed to give the correct result if CP1252 is used or the messages are limited to characters in the range 0x00->0x7f (assuming nothing exotic has been done, like setting the ACP to an EBCDIC code page).
(+) - The CP1252 table in libs/unicode/c_1252.c does the same thing, but it seems Microsoft's CP1252 table also does this, despite the fact that every published document on the code page says those characters are undefined.
* The long version
The first two programs create windows after setting the thread locale to Chinese Traditional, which results in a CP_THREAD_ACP of 950. They then create additional threads with a locale of Japanese, which gets a CP_THREAD_ACP of 932. They create windows using both the W and A versions of the RegisterClass and CreateWindow API calls, and then test sending messages using SendMessageA and SendMessageW for Unicode character 0x6893 (CP932 0x88 0xB2 and CP950 0xB1 0xEA). The difference between msgchar and msgchar2 is that the first uses GetMessageA/DispatchMessageA and the second uses GetMessageW/DispatchMessageW.
The third program tests more characters and the 950->932 direction and 950->950 transmissions, and was used to verify that a modified CP1252 is what is being used.
Note that 0x88, which is a lead byte in CP932, is one of the characters that maps outside the Latin1 page in Unicode (CP1252 0x88 is Unicode 0x02C6), which makes double-byte characters beginning with that code ideal for these tests.
The results were surprising. No matter what I did, when SendMessageW was used to send WM_CHAR to a window registered with RegisterClassA, the conversion was performed using a modified CP1252 - even if the system code page and thread code page for the receiving thread were both CP950. When sending WM_CHAR using SendMessageA to a window registered with RegisterClassW, the conversion was also performed using the modified CP1252 - even if the system code page and thread code page for the receiving thread were CP950 (and the thread code page for the sending thread was CP932).
When using SendMessageA to send WM_CHAR to a window registered with RegisterClassA, no conversion is performed even if the threads have different values for CP_THREAD_ACP.
In other words, where a conversion is performed it is always based on the modified CP1252, which has the effect that no visible conversion is ever performed for A->A messages.
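As an illustration only (this is not taken from the attached test programs), the conversion described above can be sketched in portable C. The function names are hypothetical, and what Windows does with a Unicode character that has no CP1252 equivalent is not something these tests established, so the sketch simply reports failure in that case.

```c
#include <stdint.h>

/* Bytes 0x80..0x9F of CP1252. The five slots that CP1252 leaves
 * unassigned (0x81, 0x8D, 0x8F, 0x90, 0x9D) are mapped to the Unicode
 * code point with the same value, as observed in the tests. */
static const uint16_t cp1252_high[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178
};

/* A->W: every byte is converted on its own; no lead/trail byte pairing. */
uint16_t modified_cp1252_to_unicode(uint8_t byte)
{
    if (byte >= 0x80 && byte <= 0x9F)
        return cp1252_high[byte - 0x80];
    return byte; /* 0x00..0x7F and 0xA0..0xFF map to the same value */
}

/* W->A: returns 1 and stores the byte if the character is in the table,
 * otherwise returns 0 (assumption: the real fallback is not modelled). */
int unicode_to_modified_cp1252(uint16_t wch, uint8_t *out)
{
    int i;

    if (wch < 0x80 || (wch >= 0xA0 && wch <= 0xFF)) {
        *out = (uint8_t)wch;
        return 1;
    }
    for (i = 0; i < 32; i++) {
        if (cp1252_high[i] == wch) {
            *out = (uint8_t)(0x80 + i);
            return 1;
        }
    }
    return 0;
}

/* Returns 1 if every byte value survives A->W->A unchanged. */
int all_bytes_round_trip(void)
{
    int b;

    for (b = 0; b < 256; b++) {
        uint8_t back;
        uint16_t wch = modified_cp1252_to_unicode((uint8_t)b);

        if (!unicode_to_modified_cp1252(wch, &back) || back != b)
            return 0;
    }
    return 1;
}
```

With this table the CP932 bytes 0x88 0xB2 arrive at a Unicode window as the two characters 0x02C6 and 0x00B2 rather than 0x6893, while an A->W->A path returns the original bytes, which is why A->A delivery looks untranslated.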
In the first 3 sets of results (the ones listed as "results.*"), W->W PostMessages lose information because GetMessageA and DispatchMessageA are used. The next 2 sets of results (listed as "results2.*") do not show this loss, suggesting that messages are stored in the queue in "Unicode" based on the modified CP1252 conversion.
A 5 second delay was used between all calls to SendMessage and PostMessage, and lead bytes were never held back to wait for the trail bytes.
Obviously Windows is fundamentally broken in the way it handles this. The general rule for applications has to be: if IsWindowUnicode is true, use SendMessageW with the Unicode character; if it is false, use SendMessageA with the ANSI character, preferably knowing the code page expected by the recipient. Applications should avoid GetMessageA, TranslateMessageA and DispatchMessageA and use the W versions exclusively (since they might be processing a message for a Unicode window - perhaps the rich edit control?).
I also ran the program under Wine to test its behaviour, which does not match the behaviour of Windows at all.
The source to the test programs is attached, together with the output of the tests.