https://bugs.winehq.org/show_bug.cgi?id=54005
Bug ID: 54005 Summary: ole32:clipboard - test_set_clipboard_DRAWCLIPBOARD() sometimes fails in Wine Product: Wine Version: unspecified Hardware: x86-64 OS: Linux Status: NEW Severity: normal Priority: P2 Component: user32 Assignee: wine-bugs@winehq.org Reporter: fgouget@codeweavers.com Distribution: ---
ole32:clipboard - test_set_clipboard_DRAWCLIPBOARD() sometimes fails in Wine:
clipboard.c:1067: Test failed: OleIsCurrentClipboard returned 0 clipboard.c:1135: Test failed: 1 WM_DRAWCLIPBOARD received
See https://test.winehq.org/data/patterns.html#ole32:clipboard
Notes: * This was almost systematic on the debian11 VM until 2022-06-03 when it was updated. * This was also systematic on the debiant VM and stopped on 2022-06-03 (though there is no record of an update on that date), and then it started happening systematically again on 2022-08-02 when it got updated. * It also happens almost systematically on fgtb-debian11 but more rarely on my fg-deb64 desktop.
So I suspect some external factor is at play, like whether there is already some data in the clipboard or something similar. Regardless, that's probably something that either the test or Wine should be able to deal with.
https://bugs.winehq.org/show_bug.cgi?id=54005
François Gouget fgouget@codeweavers.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Keywords| |source, testcase
https://bugs.winehq.org/show_bug.cgi?id=54005
Kevin Puetz PuetzKevinA@JohnDeere.com changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |PuetzKevinA@JohnDeere.com
--- Comment #1 from Kevin Puetz PuetzKevinA@JohnDeere.com --- This seems to be potentially flaky for a long time: ole32:clipboard is blacklisted even in the old winezeug system: Looks like was blacklisted for years in the old winezeug system: https://github.com/Winetricks/winezeug/commit/3d19e65ca133d39b1b968b2171d0c6...
But I also just hit this while debugging some other threading/apartment fixes and dug into it a bit. It seems to be a race condition:
set_clipboard_thread will leave a WM_DRAWCLIPBOARD message in the queue, with QS_SendMessage (sent by CloseClipboard as [`SendNotifyMessageW( viewer, WM_DRAWCLIPBOARD, (WPARAM)owner, 0 )`](dlls/user32/clipboard.c:639). But (since this was SendNotifyMessage, that lack of pumping doesn't prevent the thread from completing and exiting (successfully). [`WaitForSingleObject(thread)`](dlls/ole32/tests/clipboard.c:1127) will not pump any messages, so this will still remain in the queue until we get into OleSetClipboard.
The purpose of test_set_clipboard_DRAWCLIPBOARD seems to be to demonstrate that OleSetClipboard will deliver this pending WM_DRAWCLIPBOARD, *before* the clipboard actually returns the new data (the test failing with [`OleIsCurrentClipboard returned ...`](dlls/ole32/tests/clipboard.c:1068) is expecting it to be the old data on the first WM_DRAWCLIPBOARD, and the new clip_data on the second.
But while this will *sometimes* happen, it doesn't seem guaranteed; OleSetClipboard does no explicit message pumping. When it works, it seems to work because OleSetClipboard will call EmptyClipboard, which will [`SendMessageTimeoutW( owner, WM_DESTROYCLIPBOARD, 0, 0, SMTO_ABORTIFHUNG, 5000, NULL )`](dlls/user32/clipboard.c:736). This message is going to the desktop HWND, which is in another thread, so it will go down send_inter_thread_message. [SMTO_ABORTIFHUNG is not SMTO_BLOCK, so this SendMessageTimeoutW will end up with QS_SENDMESSAGE in its wake mask](dlls/user32/message.c:2950).
We've made the calls to SERVER_START_REQ( send_message ), so now both the desktop thread (in explorer.exe) and the "main" thread (running test_set_clipboard_DRAWCLIPBOARD) would be runnable. If the schedule runs test_set_clipboard_DRAWCLIPBOARD first, wait_message_reply will get [wake_bits = QS_SENDMESSAGE and process_sent_messages()](dlls/user32/message.c:2971) will deliver that long-pending WM_DRAWCLIPBOARD, and the test will succeed. If it instead schedules explorer first, the desktop window will process its WM_DESTROYCLIPBOARD first, and when we get back wait_message_reply will get wake_bits = QS_SMRESULT | QS_SENDMESSAGE, and give precedence to QS_SMRESULT, returning with that WM_DRAWCLIPBOARD still on the queue (thus it left to arrive after the new clip_data is set, and the OleIsCurrentClipboard / wm_drawclipboard checks fail).
I'm not sure what the fix is here.
SendMessage certainly allows reentrant calls, and one could reverse the precedence so it would always drain QS_SENDMESSAGE before returning QS_SMRESULT, but that seems like a rather deep change of priority (and I'm not sure how to consistently engineer a "fast" wndproc to probe what windows SendMessage actually does). Or OleSetClipboard could pump WM_DRAWCLIPBOARD explicitly. Or this might simply be an invalid test (that ought not to be expected to succeed).
The test does seem to pass on windows, at least from what I can see on http://test.winehq.org/data/tests/ole32:clipboard.html, but it might just be that the NT scheduler doesn't tend to switch to the desktop window's thread right away. The back and forth between threads and wineserver does seem like the sort of thing where the linux interactive scheduling policies designed for X11 might be working against us here (to make it more likely that send_inter_thread_message immediately wakes the recipient thread).
https://bugs.winehq.org/show_bug.cgi?id=54005
Zeb Figura z.figura12@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |z.figura12@gmail.com
--- Comment #2 from Zeb Figura z.figura12@gmail.com --- This sounded familiar. In fact I actually investigated this ages ago, and came to the conclusion that inter-thread SendMessage() should *always* process pending sent messages, no matter how fast the other thread returns. I sent a patch for it [1], but it was never accepted, or apparently even looked at.
[1] https://www.winehq.org/pipermail/wine-devel/2018-January/121628.html
https://bugs.winehq.org/show_bug.cgi?id=54005
--- Comment #3 from Kevin Puetz PuetzKevinA@JohnDeere.com --- Hmm. Your patch is what I had in mind when I said
and one could reverse the precedence so it would always drain QS_SENDMESSAGE before returning QS_SMRESULT
Maybe it would be possible to distinguish "SendMessage should flush QS_SENDMESAGE)" from "consistency in thread scheduling" by trying to provoke a race the other way around? Rather than having a thread do the SendNotifyMessage *before* the SendMessage, do it in the WNDPROC responding to the SendMessage:
I.e. two threads (A and B), each with an HWND
1. Thread B just creates its HWND and then starts a message pump
2. Thread A does a SendMessage(hwnd_B, WM_USER)
3. The wndproc in hwnd_B receives this WM_USER, and responds by doing SendNotifyMessage(hwnd_A,WM_USER) then returns 0.
4. the wndproc for hwnd_A receives this WM_USER, and responds by doing PostThreadMessage(thread_B, WM_QUIT, 0, 0)
5. Thread A waits for thread B's handle to be signaled (without pumping). If the handle gets signaled, that shows it got the WM_QUIT, so thread A's SendMessage must have dispatched the SendNotifyMessage posted by thread B. Otherwise, the wait will timeout since thread A won't otherwise pump (and thus won't post the WM_QUIT).
This should (mostly) remove the scheduler vagaries - There was nothing in thread A's queue before the SendMessage, so it's going to *have* to yield to thread B (unlike test_set_clipboard_DRAWCLIPBOARD, where the queued message was already there). Once thread B gets woken, it will post a message back and return, adding first QS_SENDMESSAGE and then very shortly after, QS_SMRESULT. But B *just* got scheduled, and should easily get through both steps before running out of timeslice and being preempted. So (unless SendNotifyMessage explicitly yields or something) thread A ought to quite consistently observe both QS_SENDMESSAGE|QS_SMRESULT simultaneously when it gets control back. At least on a single core machine. If A and B are running simultaneously on a multicore setup, then there is once again a window for A to see QS_POSTMESSAGE before QS_SMRESULT comes back - realistic, but very brief compared to waiting for thread B to get scheduled.
https://bugs.winehq.org/show_bug.cgi?id=54005
t.bussmann@gmx.net changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |t.bussmann@gmx.net
https://bugs.winehq.org/show_bug.cgi?id=54005
--- Comment #4 from Kevin Puetz PuetzKevinA@JohnDeere.com --- Created attachment 73628 --> https://bugs.winehq.org/attachment.cgi?id=73628 Text code exploring the behavior of reentrant cross-thread SendMessage
#include <windows.h> #include <stdio.h> #include <assert.h>
LRESULT magic_result = 0;
//#define SUSPEND_THREAD #ifdef SUSPEND_THREAD HANDLE suspended_thread = 0; #endif
#if 0 #define USE_HWND_MESSAGE HWND_MESSAGE #else #define USE_HWND_MESSAGE NULL #endif
static LRESULT WINAPI MsgCheckProc(HWND hwnd, UINT message, WPARAM wParam, LPARAM lParam) { //fprintf(stderr, "MsgCheckProc %d in thread %d\n",message,GetCurrentThreadId()); switch(message) { case WM_USER: HANDLE thread = (HANDLE)wParam; HWND thread_hwnd = (HWND)lParam;
// nudge our own thread's message pump to quit (and resume the other thread) after this wndproc returns PostQuitMessage(0); #ifdef SUSPEND_THREAD suspended_thread = thread; DWORD suspend_count = SuspendThread(thread); //fprintf(stderr, "SuspendThread(%p) from %d\n",suspended_thread,GetCurrentThreadId()); assert(suspend_count == 0); // nobody else should have been suspending it before us, so previous count ought to be 0...
// FIXME: SuspendThread is itself asynchronous: https://devblogs.microsoft.com/oldnewthing/20150205-00/?p=44743 // But calling GetThreadContext requires that the the scheduler save and read back the (user) context, forcing at least that much synchronization #ifndef __WINE__ CONTEXT thread_context; if(!GetThreadContext(thread,&thread_context)) { fprintf(stderr, "***GetThreadContext(%p) failed\n",thread); } #endif
// but even this isn't *quite* there, becuse https://stackoverflow.com/questions/3444190/windows-suspendthread-doesnt-get... // > SuspendThread is asynchronous with respect to kernel-mode execution, but user-mode execution does not occur until the thread is resumed // So if we happen to catch it when it has already begun some kernel wait primitive (as seems likely), // the we can get the context (that it will be returning to), and yet the wait has begun and might observe the QS_SENDMESSAGE wakeup // alone (once SendNotifyMessage queues it) even though it will be unable to act until after wndproc has returned. // // But the race window is now very small, so it fails **most** of the time #endif
LRESULT magic_orig = magic_result; SendNotifyMessage(thread_hwnd,WM_USER+1,0,magic_result+1); return magic_orig; case WM_USER+1: magic_result = lParam; return magic_result; }
return DefWindowProc(hwnd, message, wParam, lParam); }
static BOOL RegisterWindowClasses(void) { WNDCLASS cls; memset(&cls,0, sizeof(cls));
cls.lpfnWndProc = MsgCheckProc; cls.hInstance = GetModuleHandle(0); cls.lpszClassName = "TestWindowClass"; if(!RegisterClass(&cls)) return FALSE;
return TRUE; }
struct pump_thread_param { HANDLE init_event; HWND hwnd; };
static DWORD WINAPI pump_thread_proc(LPVOID lpParameter) { BOOL bRet = 0; MSG msg; struct pump_thread_param *thread_param = (struct pump_thread_param *)lpParameter; HWND thread_hwnd = thread_param->hwnd = CreateWindow("TestWindowClass", 0, 0, 0, 0, 0, 0, USE_HWND_MESSAGE, NULL, GetModuleHandleA(0), 0); //fprintf(stderr, "thread_hwnd = %p\n",thread_hwnd); SetEvent(thread_param->init_event);
while( (bRet = GetMessage( &msg, 0, 0, 0 )) != 0) { if (bRet == -1) { break; } else { TranslateMessage(&msg); DispatchMessage(&msg); } } #ifdef SUSPEND_THREAD if(suspended_thread) { //fprintf(stderr, "ResumeThread(%p) from %d\n",suspended_thread,GetCurrentThreadId()); ResumeThread(suspended_thread); } #endif DestroyWindow(thread_hwnd); return bRet; }
BOOL test_SendMessage_incoming() { HANDLE main_thread = 0; BOOL ret = DuplicateHandle(GetCurrentProcess(),GetCurrentThread(),GetCurrentProcess(),&main_thread,THREAD_SUSPEND_RESUME | THREAD_GET_CONTEXT | THREAD_QUERY_INFORMATION,FALSE,0); assert(ret); //fprintf(stderr, "main_thread id = %d\n",GetCurrentThreadId()); HWND main_hwnd = CreateWindow("TestWindowClass", NULL, 0, 0, 0, 0, 0, USE_HWND_MESSAGE, NULL, GetModuleHandleA(0), 0);
//fprintf(stderr, "main_hwnd = %p\n",main_hwnd);
struct pump_thread_param pump_thread_param; pump_thread_param.init_event = CreateEvent(NULL,FALSE,FALSE,NULL); DWORD pump_thread_id = 0; HANDLE pump_thread = CreateThread(NULL,0,pump_thread_proc,&pump_thread_param,0,&pump_thread_id); //fprintf(stderr, "pump_thread_id id = %d\n",GetCurrentThreadId());
WaitForSingleObject(pump_thread_param.init_event,INFINITE); CloseHandle(pump_thread_param.init_event);
magic_result = 1;
LRESULT wndproc_magic = SendMessage(pump_thread_param.hwnd,WM_USER,(WPARAM)main_thread,(LPARAM)main_hwnd); LRESULT after_magic = magic_result;
assert(wndproc_magic == 1); // thread_hwnd/WM_USER should see the value we set before doing anything
//fprintf(stderr, "waiting for pump_thread to exit\n",main_hwnd); WaitForSingleObject(pump_thread,INFINITE);
//fprintf(stderr,"magic before PeekMessage: WndProc returned %d, global shows %d\n",wndproc_magic,magic); MSG msg;
// any PeekMessage at all will dispatch all incoming nonqueued message (i.e. QS_SENDMESSAGE) // the filtering by hwnd, msg, etc is only applied posted messages // so this will actually also deliver the WM_USER+1 message to main_hwnd (if it hasnt't already been) PeekMessage(&msg,(HWND)-1,WM_PAINT,WM_PAINT,PM_NOREMOVE); assert(!GetQueueStatus(QS_SENDMESSAGE)); assert(magic_result == 2); // main_hwnd/WM_USER+1 should have stored the new message
if(after_magic == magic_result) { ret = TRUE; /* the WM_USER+1 which changes this was pumped during SendMessage */ } else { /* if magic changed between SendMessage and here (i.e. during PeekMessage), * the WM_USER+1 queued by SendNotifyMessage was not pumped by SendMessage */ //fprintf(stderr,"*** message not pumped by SendMessage changed `magic` from %ld != %ld\n",after_magic, magic_result); ret = FALSE; }
CloseHandle(main_thread); DestroyWindow(main_hwnd); return ret; }
int main() { //SetProcessAffinityMask(GetCurrentProcess(),1);
RegisterWindowClasses();
int pass = 0, fail = 0;
DWORD startTime = GetTickCount();
for(int i = 0; i < 100; ++i) { //fprintf(stderr,"---\n"); if(test_SendMessage_incoming()) { ++pass; } else { ++fail; } }
printf("%d pass / %d fail in %u msec\n",pass, fail, GetTickCount() - startTime);
return fail == 0 ? 0 : 1; }
https://bugs.winehq.org/show_bug.cgi?id=54005
--- Comment #5 from Kevin Puetz PuetzKevinA@JohnDeere.com --- Comment on attachment 73628 --> https://bugs.winehq.org/attachment.cgi?id=73628 Text code exploring the behavior of reentrant cross-thread SendMessage
I came up with an even stricter (albiet very very ugly) black-box test to explore the behavior of SendMessage - I can use SuspendThread/ResumeThread to force the timing. I.e. suspend the thread which performed SendMessage once we get into the WndProc (knowing it the caller currently waiting for a reply), then perform both a SendNotifyMessage and a wndproc return, then have the window's thread's message pump resume the SendMessage thread. That *should* guaranteed that, when SendMessage resumes, both wakeup reasons already present (and are seen simultaneously).
the test below "passes" if SendMessage dispatches the nonqueued WM_USER+1 posted by SendNotifyMessage (the behavior assumed by test_set_clipboard_DRAWCLIPBOARD), and "fails" if this message is still pending (i.e. if it is dispatched by a second call to PeekMessage, rather than before SendMessage returned). There are two knobs to explore some weird quirks - SUSPEND_THREAD (whether or not to do this Suspend/ResumeThread dance) and USE_HWND_MESSAGE (whether or not to use a "real" top-level window, or a message-only child of HWND_MESSAGE).
As seen in test_set_clipboard_DRAWCLIPBOARD, with neither the test usually (though only about 90%) passes on windows. So the scheduler seems to switch threads immediately. Even a little bit more code after the SendMessage nearly guarantees it. This all fits with the behavior seen in test_set_clipboard_DRAWCLIPBOARD, where a message queued before the call to SendMessage just about always gets dispatched.
However, adding SUSPEND_THREAD flips it, now to about 90% failure. It still passes occasionally, which is confusing (the SuspendThread/GetThreadContext/ResumeThread dance should make it *impossible* for the SendMessage call to run dispatch the message queued by SendNotifyMessage before the WndProc has also returned). I think what's happening here is that, even with GetThreadContext, SuspendThread is still only synchronized with respect to user mode; if we catch the thread inside a kernel syscall, it will not actually suspend until the return to user space. So if SendMessage has begun some kind of wait-primitive syscall, that call's return value might still race between the QS_SENDMESSAGE and QS_SMRESULT, and return a state that shows only the former. Then, when the thread is resumed later, it will still receive this stale return value. So it makes the race narrower, but (assuming some of this is in kernel mode, which seems likely) not gone.
I am baffled by the interaction with HWND_MESSAGE - using this makes the test run much faster, makes the !SUSPEND_THREAD fail more often (now only about 50% successful), and makes SUSPEND_THREAD fail ever time (as far as I can tell). I strongly supect the "faster" is related - quite a bit of work is being skipped (e.g. a normal window gets its icon placed on the taskbar), and this shifts the timing enough to greatly affect the whatever race condition we're poking at).
I doubt we want any tests this nasty added to winetest. But I think the answers heavily favor the "consistency in thread scheduling" explanation, and the SUSPEND_THREAD case seems to disprove zf's patch (because it shows that SendMessage does not guarantee dispatch of nonqueued messages which preceded the result); It is interesting how much the pass/fail rate seems to be reciprocal; it's probably dependent on basically the basic same scheduler question (whether the thread calling SendMessage reaches the kernel wait primitive before or after a context switch to the wndproc's thread)
So it's either test_set_clipboard_DRAWCLIPBOARD is indeed just a flaky test (and windows *could* fail, but mostly doesn't since it's hard to cause so much delay), or perhaps the native implementations of OleSetClipboard (or marshaling, or who knows what) just has another PeekMessage or similar buried in it somewhere that wine lacks (given the nature of OLE/RPC being full of such, this is easy to believe). Any PeekMessage would do; the filtering is not applied to nonqueued messages.
https://bugs.winehq.org/show_bug.cgi?id=54005
Kevin Puetz PuetzKevinA@JohnDeere.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Attachment #73628|Text code exploring the |Test code exploring the description|behavior of reentrant |behavior of reentrant |cross-thread SendMessage |cross-thread SendMessage