Hi,
I am having a problem with an app of mine. It posts WM_CHAR messages to an ANSI window with a Unicode value in LOWORD(wParam). Wine automatically translates the character code to ANSI (producing a "?" when there is no such code in CP_ACP). Windows does not do this (contrary to the documentation). Attached (source and binary) is a quick and dirty sample application that shows the problem.
I could remove map_wparam_AtoW() and map_wparam_WtoA() from dlls/user/message.c, but probably somebody needs that translation. Does anybody have an idea of a better fix for that bug?
-- Ph.
On Tue, 26 Jul 2005 09:29, Phil Krylov wrote:
I could remove map_wparam_AtoW() and map_wparam_WtoA() from dlls/user/message.c, but probably somebody needs that translation. Does anybody have an idea of a better fix for that bug?
If you call SendMessageW under native Windows the translation happens. If you call SendMessageA it does not. This is all exactly what I would expect.
By my test, sending 0xf301 with SendMessageA gets 0xf301 on Windows, and with SendMessageW gets 0x003f. Sending with SendMessageA on Wine gets 0x0001 and with SendMessageW gets 0x003f.
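Editorial sketch: the Wine numbers above (0xf301 coming back as 0x0001) are consistent with an A->W->A round trip that only looks at the low byte of LOWORD(wParam). The portable C below is a minimal illustration under that assumption, not Wine's actual code. The function names are hypothetical and only ASCII is converted, which is enough for byte 0x01.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an A->W wParam mapping.
 * Assumption: only the low byte of the character code is converted. */
uint32_t sketch_wparam_AtoW(uint32_t wparam)
{
    uint8_t ch = (uint8_t)(wparam & 0xFF);        /* the 0xF3 byte is dropped here */
    /* ASCII maps to itself; a real mapping would consult the code page. */
    uint16_t wch = (ch < 0x80) ? ch : 0xFFFD;
    return (wparam & 0xFFFF0000u) | wch;          /* keep HIWORD, replace LOWORD */
}

/* Hypothetical stand-in for a W->A wParam mapping. */
uint32_t sketch_wparam_WtoA(uint32_t wparam)
{
    uint16_t wch = (uint16_t)(wparam & 0xFFFF);
    uint8_t ch = (wch < 0x80) ? (uint8_t)wch : (uint8_t)'?';  /* unmappable -> '?' */
    return (wparam & 0xFFFF0000u) | ch;
}

void sketch_demo_round_trip(void)
{
    /* 0xF301 through A->W->A: only byte 0x01 survives. */
    assert(sketch_wparam_WtoA(sketch_wparam_AtoW(0xF301u)) == 0x0001u);
    /* 0xF301 through W->A only: no ANSI equivalent, so '?' (0x3F). */
    assert(sketch_wparam_WtoA(0xF301u) == 0x003Fu);
}
```

Both values match the Wine column of the test results quoted above; the Windows SendMessageA result (0xf301 unchanged) corresponds to skipping both functions.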
On Tue, 26 Jul 2005 10:15:58 +1000 Troy Rollo wine@troy.rollo.name wrote:
If you call SendMessageW under native Windows the translation happens. If you call SendMessageA it does not. This is all exactly what I would expect.
By my test, sending 0xf301 with SendMessageA gets 0xf301 on Windows, and with SendMessageW gets 0x003f. Sending with SendMessageA on Wine gets 0x0001 and with SendMessageW gets 0x003f.
Thanks for the clarification. Here is a patch that takes it into account. But I don't know if and when the translation should be done in PostThreadMessageA.
ChangeLog:
Translate Unicode<->ANSI message wParams only when window type (ANSI/Unicode) does not match message handling function postfix (PeekMessageA/W etc.).
-- Ph.
"Phil Krylov" phil@newstar.rinet.ru wrote:
Translate Unicode<->ANSI message wParams only when window type (ANSI/Unicode) does not match message handling function postfix (PeekMessageA/W etc.).
This patch is not correct. All messages which potentially go through wineserver should be posted/sent via unicode. Perhaps map_wparam_AtoW and map_wparam_WtoA should translate the whole wparam, i.e. both high and low bytes. What does Windows do if the current locale is DBCS (Chinese/Japanese/Korean), does it translate the whole wparam in that case?
On Tue, 26 Jul 2005 16:04:22 +0900 "Dmitry Timoshkov" dmitry@baikal.ru wrote:
This patch is not correct. All messages which potentially go through wineserver should be posted/sent via unicode.
OK, then how would you suggest solving the problem described in the previous messages of this thread?
Perhaps map_wparam_AtoW and map_wparam_WtoA should translate the whole wparam, i.e. both high and low bytes.
What about the character codes which can't be converted?
What does Windows do if the current locale is DBCS (Chinese/Japanese/Korean), does it translate the whole wparam in that case?
AFAIK in DBCS two separate messages are used.
-- Ph.
"Phil Krylov" phil@newstar.rinet.ru wrote:
This patch is not correct. All messages which potentially go through wineserver should be posted/sent via unicode.
OK, then how would you suggest solving the problem described in the previous messages of this thread?
One solution is to translate the whole wparam.
Perhaps map_wparam_AtoW and map_wparam_WtoA should translate the whole wparam, i.e. both high and low bytes.
What about the character codes which can't be converted?
A->W conversion doesn't have that problem, A->W translation either if you use a valid source (i.e. the result of the A->W conversion).
What does Windows do if the current locale is DBCS (Chinese/Japanese/Korean), does it translate the whole wparam in that case?
AFAIK in DBCS two separate messages are used.
A test under Windows would say it for sure.
"Dmitry Timoshkov" dmitry@baikal.ru wrote:
What about the character codes which can't be converted?
A->W conversion doesn't have that problem, A->W translation either if
Of course the last A->W should be read as W->A, i.e.:
"W->A translation either if ..."
On Tue, 26 Jul 2005 16:31:50 +0900 "Dmitry Timoshkov" dmitry@baikal.ru wrote:
"Phil Krylov" phil@newstar.rinet.ru wrote:
This patch is not correct. All messages which potentially go through wineserver should be posted/sent via unicode.
OK, then how would you suggest solving the problem described in the previous messages of this thread?
One solution is to translate the whole wparam.
How do you see it? Example: I do a
PostMessageA(hwndAnsi, WM_CHAR, 0xF301, 0);
map_wparam_AtoW takes the "\x01\xF3" string, translates it to Unicode via CP_ACP (for CP1251, this would be probably {'\x01', 0x443})... and how does it fit this info back in LOWORD(wParam)?
AFAIK in DBCS two separate messages are used.
A test under Windows would say it for sure.
I can't test it as I don't have a Windows with DBCS locales installed, but Internet says:
When entering non-ASCII characters on systems with DBCS input locales, the lead byte and trail byte for the DBCS character are passed in two successive WM_CHAR messages. So we are better off processing WM_IME_CHAR messages because we get both bytes at once. If we move to Unicode, however, we'll directly get UTF-16 in WM_CHAR; or on XP: UTF-32 in WM_UNICHAR.
-- Ph.
"Phil Krylov" phil@newstar.rinet.ru wrote:
One solution is to translate the whole wparam.
How do you see it? Example: I do a
PostMessageA(hwndAnsi, WM_CHAR, 0xF301, 0);
map_wparam_AtoW takes the "\x01\xF3" string, translates it to Unicode via CP_ACP (for CP1251, this would be probably {'\x01', 0x443})... and how does it fit this info back in LOWORD(wParam)?
The key word is "the whole" wparam. So, there is no need to truncate it by using LOWORD.
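Editorial sketch: one way to read "translate the whole wparam" is to treat the low 16 bits as two ANSI bytes, convert each, and pack the two resulting WCHARs into LOWORD and HIWORD. The C below illustrates that reading with Phil's CP1251 example. The function names are hypothetical, and the CP1251 conversion is deliberately simplified to ASCII plus the contiguous Cyrillic block 0xC0..0xFF.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified CP1251 -> Unicode for one byte.
 * Assumption: only ASCII and 0xC0..0xFF (U+0410..U+044F) are handled;
 * every other byte becomes U+FFFD. */
uint16_t sketch_cp1251_to_wchar(uint8_t b)
{
    if (b < 0x80) return b;
    if (b >= 0xC0) return (uint16_t)(0x0410 + (b - 0xC0));
    return 0xFFFD;
}

/* Treat the low 16 bits of wParam as two ANSI bytes (low byte first),
 * convert each one, and pack the results: first WCHAR in LOWORD,
 * second WCHAR in HIWORD. */
uint32_t sketch_whole_wparam_AtoW(uint32_t wparam)
{
    uint16_t lo = sketch_cp1251_to_wchar((uint8_t)(wparam & 0xFF));
    uint16_t hi = sketch_cp1251_to_wchar((uint8_t)((wparam >> 8) & 0xFF));
    return ((uint32_t)hi << 16) | lo;
}

void sketch_demo_whole_wparam(void)
{
    /* Bytes "\x01\xF3" become {0x0001, 0x0443}, packed as 0x04430001. */
    assert(sketch_whole_wparam_AtoW(0xF301u) == 0x04430001u);
}
```

The packed value no longer fits in LOWORD(wParam), which is why the whole wParam has to be carried through the queue in this scheme.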
AFAIK in DBCS two separate messages are used.
A test under Windows would say it for sure.
I can't test it as I don't have a Windows with DBCS locales installed,
Just install one of such locales then, NT/2k/XP have built-in support for DBCS locales.
but Internet says:
When entering non-ASCII characters on systems with DBCS input locales, the lead byte and trail byte for the DBCS character are passed in two successive WM_CHAR messages. So we are better off processing WM_IME_CHAR messages because we get both bytes at once. If we move to Unicode, however, we'll directly get UTF-16 in WM_CHAR; or on XP: UTF-32 in WM_UNICHAR.
Again, without a test we can't tell for sure what happens in reality.
Hi Dmitry,
On Tue, 26 Jul 2005 16:56:31 +0900 "Dmitry Timoshkov" dmitry@baikal.ru wrote:
The key word is "the whole" wparam. So, there is no need to truncate it by using LOWORD.
Oh sorry. For some reason I thought that HIWORD(wParam) is used for some other data. Here is a new patch, is it ok?
AFAIK in DBCS two separate messages are used.
A test under Windows would say it for sure.
I can't test it as I don't have a Windows with DBCS locales installed,
Just install one of such locales then, NT/2k/XP have built-in support for DBCS locales.
I'll try to find a distro that has them. My 2000 and XP distros have an installation option for CJK locales, but the needed files are not on the installation CDs.
-- Ph.
"Phil Krylov" phil@newstar.rinet.ru wrote:
Oh sorry. For some reason I thought that HIWORD(wParam) is used for some other data. Here is a new patch, is it ok?
Looks good to me, let's see if Alexandre likes it as well.
On Tue, 26 Jul 2005 21:26:29 +0900 "Dmitry Timoshkov" dmitry@baikal.ru wrote:
"Phil Krylov" phil@newstar.rinet.ru wrote:
Oh sorry. For some reason I thought that HIWORD(wParam) is used for some other data. Here is a new patch, is it ok?
Looks good to me, let's see if Alexandre likes it as well.
Still it is not the best. Now, WM_CHARs posted by PostMessageA and dispatched using GetMessageA work well (are converted back and forth). But when the message loop uses GetMessageW, these WM_CHARs come to the window procedure as 4-byte "garbage" (WCHAR which is converted from CP_ACP to 2 WCHARs). In Windows, the window procedure receives unchanged code...
(See attached test).
-- Ph.
"Phil Krylov" phil@newstar.rinet.ru wrote:
Still it is not the best. Now, WM_CHARs posted by PostMessageA and dispatched using GetMessageA work well (are converted back and forth). But when the message loop uses GetMessageW, these WM_CHARs come to the window procedure as 4-byte "garbage" (WCHAR which is converted from CP_ACP to 2 WCHARs). In Windows, the window procedure receives unchanged code...
I can no longer test on Win2k, but under XP with a Russian locale MsgCheckProcA in your test app receives 0x00000001 in wParam. In Wine it gets 0x00F30001. I wouldn't say that this is unacceptable, but it needs more testing on Win2k and XP with CJK locales to see what the expected behaviour is in that case.
Do you know of a real-world app that depends on that behaviour?
On Tue, 26 Jul 2005 17:31, Dmitry Timoshkov wrote:
What about the character codes which can't be converted?
A->W conversion doesn't have that problem
That is not necessarily true. A DBCS lead byte without a valid trail byte will result in failure in an A->W conversion.
In fact translating from the 'A' version of WM_CHAR to the 'W' version is likely to be wrong. Compare WM_IME_CHAR to WM_CHAR:
WM_IME_CHAR has wParam set to ((lead_byte << 8) | (trail_byte)) for DBCS characters. WM_CHAR receives DBCS characters as two WM_CHAR messages.
This means that if you SendMessageA a DBCS lead byte, any conversion to a W window procedure would need to involve caching that byte and returning immediately, then performing the translation.
On the other hand SendMessageA of any character to an A window procedure (regardless of any DBCS rules that might apply) ought to pass the character through immediately.
This means that ideally, if the window is not a unicode window, then there should be no A->W->A translation.
On the other hand perhaps Windows is doing some kind of caching, but if it is then it's doing something very strange. When I tested this I noticed an anomaly on Win2K - if I called PostMessageA(hWnd, WM_CHAR...) followed by PostMessageW(hWnd, WM_CHAR...), the character posted with PostMessageW arrived at the A window procedure first. I didn't bother to investigate why this happened.
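Editorial sketch: the lead-byte caching described in this message (hold a DBCS lead byte, return, and translate only once the trail byte arrives) can be pictured as a small state machine. The C below is an illustration only, with hypothetical names. It uses the CP932 lead-byte ranges 0x81..0x9F and 0xE0..0xFC and packs a completed character the way WM_IME_CHAR does, as (lead_byte << 8) | trail_byte.

```c
#include <assert.h>
#include <stdint.h>

/* CP932 (Shift-JIS) lead-byte ranges: 0x81..0x9F and 0xE0..0xFC. */
int sketch_is_cp932_lead(uint8_t b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Per-queue conversion state; pending_lead == 0 means nothing is cached. */
typedef struct { uint8_t pending_lead; } sketch_dbcs_state;

/* Feed one WM_CHAR-style byte. Returns 1 and stores a complete character
 * in *out when one is ready, or 0 when the byte was a lead byte that had
 * to be cached until its trail byte arrives. */
int sketch_feed_byte(sketch_dbcs_state *st, uint8_t b, uint16_t *out)
{
    if (st->pending_lead) {
        *out = (uint16_t)((st->pending_lead << 8) | b);  /* WM_IME_CHAR layout */
        st->pending_lead = 0;
        return 1;
    }
    if (sketch_is_cp932_lead(b)) {
        st->pending_lead = b;    /* nothing to deliver yet */
        return 0;
    }
    *out = b;                    /* single-byte character passes straight through */
    return 1;
}

void sketch_demo_dbcs(void)
{
    sketch_dbcs_state st = {0};
    uint16_t out = 0;
    assert(sketch_feed_byte(&st, 0x88, &out) == 0);                  /* lead byte cached */
    assert(sketch_feed_byte(&st, 0xB2, &out) == 1 && out == 0x88B2); /* pair completed */
    assert(sketch_feed_byte(&st, 'A', &out) == 1 && out == 0x41);    /* plain ASCII */
}
```

An A-to-A delivery would skip this state entirely and hand each byte to the window procedure as it arrives, which is the pass-through case argued for above.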
"Troy Rollo" wine@troy.rollo.name wrote:
This means that ideally, if the window is not a unicode window, then there should be no A->W->A translation.
What if the source and target threads are running in different locales? That's the point of converting to Unicode for the interthread PostMessage/SendMessage case.
On Wed, 27 Jul 2005 13:20, Dmitry Timoshkov wrote:
"Troy Rollo" wine@troy.rollo.name wrote:
This means that ideally, if the window is not a unicode window, then there should be no A->W->A translation.
What if the source and target threads are running in different locales? That's the point of converting to Unicode for the interthread PostMessage/SendMessage case.
Locales don't affect the ANSI code page - that is determined either at boot time or by the most recent call of the process to SetCPGlobal (NT only). I could write a test for this, but I won't have time to do it until about a week from now.
"Troy Rollo" wine@troy.rollo.name wrote:
Locales don't affect the ANSI code page - that is determined either at boot time or by the most recent call of the process to SetCPGlobal (NT only). I could write a test for this, but I won't have time to do it until about a week from now.
Take into account that threads can belong to different processes. Another thing that we may want to take into account (that needs a test though) is whether a thread locale affects the translation.
On Wed, 27 Jul 2005 15:28, Dmitry Timoshkov wrote:
Take into account that threads can belong to different processes. Another thing that we may want to take into account (that needs a test though) is whether a thread locale affects the translation.
Locale doesn't affect the A->W translation. I know that is counter-intuitive and it surprised me when I found out (and verified) this. The system dialogs where you can change these things tend to obscure this.
"Troy Rollo" wine@troy.rollo.name wrote:
Locale doesn't affect the A->W translation. I know that is counter-intuitive and it surprised me when I found out (and verified) this. The system dialogs where you can change these things tend to obscure this.
Did you test specifically A->W conversion of the message data or something else? Even if it doesn't work on current Windows platforms I don't see why Microsoft can't fix it and make it work in future versions.
On Wednesday 27 July 2005 18:16, Dmitry Timoshkov wrote:
Did you test specifically A->W conversion of the message data or something else?
I tested the CP_ACP conversions and the GetACP call. I also disassembled kernel32.dll to see how it populated CP_ACP (if I recall correctly it populated it from a registry key under either HKEY_CURRENT_CONFIG or HKEY_LOCAL_MACHINE, so perhaps a change in the registry could cause the same behaviour as a process calling SetCPGlobal, but threads within a single process can never have a different CP_ACP).
"Troy Rollo" wine@troy.rollo.name wrote:
Did you test specifically A->W conversion of the message data or something else?
I tested the CP_ACP conversions and the GetACP call.
Actually you didn't answer the question. GetACP() by itself shows nothing, only actual API tests could show something useful.
I also disassembled kernel32.dll to see how it populated CP_ACP (if I recall correctly it populated it from a registry key under either HKEY_CURRENT_CONFIG or HKEY_LOCAL_MACHINE, so perhaps a change in the registry could cause the same behaviour as a process calling SetCPGlobal, but threads within a single process can never have a different CP_ACP).
I don't believe it, especially since SetThreadLocale exists.
And ...
Even if it doesn't work on current Windows platforms I don't see why Microsoft can't fix it and make it work in future versions.
On Wednesday 27 July 2005 18:40, Dmitry Timoshkov wrote:
I also disassembled kernel32.dll to see how it populated CP_ACP (if I recall correctly it populated it from a registry key under either HKEY_CURRENT_CONFIG or HKEY_LOCAL_MACHINE, so perhaps a change in the registry could cause the same behaviour as a process calling SetCPGlobal, but threads within a single process can never have a different CP_ACP).
I don't believe it, especially since SetThreadLocale exists.
Then I suggest you test it yourself. I already have. SetThreadLocale affects other stuff, but not the code page. The code page is not part of the locale. You'd think it was, but it's not.
Even if it doesn't work on current Windows platforms I don't see why Microsoft can't fix it and make it work in future versions.
You're assuming they consider it broken. Microsoft seem to think many things are perfectly OK that we consider broken.
Troy Rollo wrote:
On Wednesday 27 July 2005 18:40, Dmitry Timoshkov wrote:
I also disassembled kernel32.dll to see how it populated CP_ACP (if I recall correctly it populated it from a registry key under either HKEY_CURRENT_CONFIG or HKEY_LOCAL_MACHINE, so perhaps a change in the registry could cause the same behaviour as a process calling SetCPGlobal, but threads within a single process can never have a different CP_ACP).
I don't believe it, especially since SetThreadLocale exists.
Then I suggest you test it yourself. I already have. SetThreadLocale affects other stuff, but not the code page. The code page is not part of the locale. You'd think it was, but it's not.
Actually it affects the CP_THREAD_ACP code page, but not CP_ACP.
- Filip
On Wednesday 27 July 2005 19:22, Filip Navara wrote:
Actually it affects the CP_THREAD_ACP code page, but not CP_ACP.
Interesting. I wasn't previously aware of this. It isn't used anywhere in Wine (it's returned, but nothing ever calls a routine with CP_THREAD_ACP). I suspect not a whole lot of apps use it either since it's Win2K and higher only. Its existence suggests a whole truckload of new tests needed to check its behaviour.
Interestingly though, SetThreadLocale is NT3.1 or higher, so an app using it in Win2K and higher gets a different result to what it would get on earlier versions if CP_THREAD_ACP is used by Windows for internal A->W and W->A conversions.
Troy Rollo wine@troy.rollo.name writes:
On the other hand SendMessageA of any character to an A window procedure (regardless of any DBCS rules that might apply) ought to pass the character through immediately.
This means that ideally, if the window is not a unicode window, then there should be no A->W->A translation.
Since there is no way of knowing if the target window uses the same code page, or even if its code page won't change between the time the message is stored in the queue and when it is retrieved, the only sane approach is to store messages in the queue in Unicode. Only SendMessage calls that bypass the queue avoid the translation. I'm pretty sure that this is what Windows does too, if you have a test demonstrating the opposite I'd be very interested to see it.
On Wed, 27 Jul 2005 20:34, Alexandre Julliard wrote:
Since there is no way of knowing if the target window uses the same code page, or even if its code page won't change between the time the message is stored in the queue and when it is retrieved, the only sane approach is to store messages in the queue in Unicode. Only SendMessage calls that bypass the queue avoid the translation. I'm pretty sure that this is what Windows does too, if you have a test demonstrating the opposite I'd be very interested to see it.
I have just finished running a series of tests using the attached programs - msgchar, msgchar2 and msgchar3.
* The short version:
WM_CHAR messages are delivered "immediately" whether sent by SendMessage or PostMessage. Where there is a conversion from A->W or W->A, the conversion is performed using a modified CP1252 table regardless of the values of CP_THREAD_ACP and CP_ACP. Effectively, when sending a message to a window that was created by a thread in a different code page, if SendMessageA is used, no translation is performed. This was tested on a Win2k system with a default of CP1252 (Western Europe) and a WinXP system with a default ACP of CP950 (Chinese Traditional).
The table used for the conversion differs from the real CP1252 table in that characters 81, 8D, 8F, 90 and 9D, which are unassigned in CP1252, are converted to and from the Unicode characters with the same value (+). This results in a round-trippable conversion via Unicode, so that for WM_CHAR PostMessageA to an ANSI window will always work provided the data is in the code page expected by the recipient, but SendMessageA and PostMessageA to a Unicode window and SendMessageW and PostMessageW to an ANSI window are only guaranteed to get the correct result if CP1252 is used or the messages are limited to characters in the range 0x00->0x7f (assuming nothing exotic has been done like setting the ACP to an EBCDIC code page).
(+) - The CP1252 table in libs/unicode/c_1252.c does the same thing, but it seems Microsoft's CP1252 table also does this despite the fact that every published document on the code page says those characters are undefined.
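Editorial sketch: the round-trip property of the modified CP1252 table described above can be checked with a few lines of portable C. The table is the standard CP1252 mapping for bytes 0x80..0x9F, with the five unassigned slots (0x81, 0x8D, 0x8F, 0x90, 0x9D) mapped to the Unicode character of the same value. The function names are hypothetical, and the W->A direction is simplified: anything without an exact match becomes '?', with no best-fit mapping.

```c
#include <assert.h>
#include <stdint.h>

/* CP1252 bytes 0x80..0x9F -> Unicode, with the five unassigned slots
 * mapped to the same numeric value. */
static const uint16_t sketch_cp1252_high[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178
};

uint16_t sketch_cp1252_AtoW(uint8_t b)
{
    return (b >= 0x80 && b <= 0x9F) ? sketch_cp1252_high[b - 0x80] : b;
}

/* Simplification: characters with no exact match become '?'. */
uint8_t sketch_cp1252_WtoA(uint16_t w)
{
    int i;
    for (i = 0; i < 32; i++)
        if (sketch_cp1252_high[i] == w) return (uint8_t)(0x80 + i);
    if (w < 0x80 || (w >= 0xA0 && w <= 0xFF)) return (uint8_t)w;
    return (uint8_t)'?';
}

/* Returns 1 if every byte value survives A->W->A unchanged. */
int sketch_cp1252_round_trips(void)
{
    int b;
    for (b = 0; b < 256; b++)
        if (sketch_cp1252_WtoA(sketch_cp1252_AtoW((uint8_t)b)) != b) return 0;
    return 1;
}

void sketch_demo_cp1252(void)
{
    assert(sketch_cp1252_round_trips());
    assert(sketch_cp1252_AtoW(0x88) == 0x02C6);  /* maps outside Latin-1 */
    assert(sketch_cp1252_WtoA(0x6893) == '?');   /* CJK character has no CP1252 byte */
}
```

Because every byte round-trips, an A->W->A path through such a table is invisible to an ANSI sender and receiver whatever code page they believe they are using, which fits the A->A results reported here.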
* The long version
The first two programs create windows after setting the thread locale to be Chinese Traditional, which results in a CP_THREAD_ACP of 950. They then create additional threads with a locale of Japanese, which gets a CP_THREAD_ACP of 932. They create windows using both the W and A versions of the RegisterClass and CreateWindow API calls, and then test sending messages using SendMessageA and SendMessageW for Unicode character 0x6893 (CP932 0x88 0xB2 and CP950 0xB1 0xEA). The difference between msgchar and msgchar2 is that the first uses GetMessageA/DispatchMessageA and the second uses GetMessageW/DispatchMessageW.
The third program tests more characters and the 950->932 direction and 950->950 transmissions, and was used to verify that a modified CP1252 is what is being used.
Note that 0x88, which is a lead byte in CP932, is one of the characters that maps outside the Latin1 page in Unicode (CP1252 0x88 is Unicode 0x02C6), which makes double-byte characters beginning with that code ideal for these tests.
The results were surprising. No matter what I did, when SendMessageW was used to send WM_CHAR to a window registered with RegisterClassA, the conversion was performed using a modified CP1252 - even if the system code page and thread code page for the receiving thread was CP950. When sending WM_CHAR using SendMessageA to a window registered with RegisterClassW, the conversion was also performed using CP1252 - even if the code page and thread code page for the receiving thread was CP950 (and for the sending thread was CP932).
When using SendMessageA to send WM_CHAR to a window registered with RegisterClassA, no conversion is performed even if the threads have different values for CP_THREAD_ACP.
In other words, where a conversion is performed it is always based on the modified CP1252, which has the effect that no visible conversion is ever performed for A->A messages.
In the first 3 sets of results (the ones listed as "results.*"), W->W PostMessages lose information because GetMessageA and DispatchMessageA are used. The next 2 sets of results (listed as "results2.*") do not show this loss, suggesting that messages are stored in the queue in "Unicode" based on the modified CP1252 conversion.
A 5 second delay was used between all calls to SendMessage and PostMessage, and lead bytes were never being held back to wait for the trail bytes.
Obviously Windows is fundamentally broken in the way it handles this. The general rule for applications has to be that if IsWindowUnicode is true, use SendMessageW with the Unicode character, and if false, use SendMessageA with the ANSI character, preferably knowing the code page expected by the recipient. Applications should avoid GetMessageA, TranslateMessageA and DispatchMessageA and use the W ones exclusively (since they might be processing a message for a Unicode window - perhaps the rich edit control?)
I also ran the program under Wine to test its behaviour, which does not match the behaviour of Windows at all.
The source to the test programs is attached, together with the output of the tests.
"Troy Rollo" wine@troy.rollo.name wrote:
I have just finished running a series of tests using the attached programs - msgchar, msgchar2 and msgchar3.
- The short version:
WM_CHAR messages are delivered "immediately" whether sent by SendMessage or PostMessage.
Actually not immediately, just fast enough, so that GetMessage/DispatchMessage delivers a message while the thread does Sleep.
[skipped]
When using SendMessageA to send WM_CHAR to a window registered with RegisterClassA, no conversion is performed even if the threads have different values for CP_THREAD_ACP.
In other words, where a conversion is performed it is always based on the modified CP1252, which has the effect that no visible conversion is ever performed for A->A messages.
That just means that your way of changing the system locale using SetCPGlobal doesn't work (not to mention that it's quite strange to call an API in a DLL after FreeLibrary; that works only because kernel32 is already loaded before LoadLibrary).
I skipped everything else since the tests apparently do not run on a proper (or rather expected) locale, and the results basically show what you would get on an English locale without SetCPGlobal hacks.
On Fri, 5 Aug 2005 12:59, Dmitry Timoshkov wrote:
I skipped everything else since the tests apparently do not run on a proper (or rather expected) locale, and the results basically show what you would get on an English locale without SetCPGlobal hacks.
You clearly neither read my message very carefully nor really looked at what the code was doing. Try again after you have.
"Troy Rollo" wine@troy.rollo.name wrote:
I skipped everything else since the tests apparently do not run on a proper (or rather expected) locale, and the results basically show what you would get on an English locale without SetCPGlobal hacks.
You clearly neither read my message very carefully nor really looked at what the code was doing. Try again after you have.
I have very carefully read your mail, the code, and the results you got. Did you try removing the SetCPGlobal calls to see whether that actually changes anything?
On Fri, 5 Aug 2005 13:22, Dmitry Timoshkov wrote:
I have very carefully read your mail, the code, and the results you got. Did you try removing the SetCPGlobal calls to see whether that actually changes anything?
Obviously you didn't, because if you did you'd realise that each program performs 3 series of tests - one before *any* calls to SetCPGlobal, then a series after each value of SetCPGlobal.
You would also have realised that one of the tests was performed on a system whose native code page is 950, particularly since I (1) said as much and (2) show the default code page as the first line of output in each test file (which you would have noticed either by reading the source file or by looking at the test output).
You haven't mentioned any tests you have done that show some other behaviour. If you have, supply them and show your results.
By the way:
WM_CHAR messages are delivered "immediately" whether sent by SendMessage or PostMessage.
Actually not immediately, just fast enough, so that GetMessage/DispatchMessage delivers a message while the thread does Sleep.
Why do you think I put the word in quotes?