[PATCH 0/3] MR1559: ntoskrnl.exe: Prevent blocking if kernel timer is reset in DPC.

List overview All Threads

newer

older

Re: [PATCH v3 0/1] MR1614:...

[PATCH 0/4] MR1612: Start of...

Ivo Ivanov (＠logos128)

27 Nov 2022 27 Nov '22

5:09 p.m.

KeCancelTimer (which is called via KeSetTimerEx) would block indefinitely in the DPC, while waiting for the timer to complete. Which never happens, since the internal thread pool implementation never gets the chance to wake up the "finished_event" condition variable on the DPC exit in tp_object_execute().

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/1559

Show replies by date

Ivo Ivanov

27 Nov 27 Nov

5:09 p.m.

New subject: [PATCH 1/3] ntdll: Cancel any pending callbacks on a timer re-set.

From: Ivo Ivanov logos128@gmail.com

--- dlls/ntdll/threadpool.c | 3 +++ 1 file changed, 3 insertions(+)

diff --git a/dlls/ntdll/threadpool.c b/dlls/ntdll/threadpool.c index 99525f831e1..0d88ebd34a0 100644 --- a/dlls/ntdll/threadpool.c +++ b/dlls/ntdll/threadpool.c @@ -2975,6 +2975,9 @@ VOID WINAPI TpSetTimer( TP_TIMER *timer, LARGE_INTEGER *timeout, LONG period, LO

TRACE( "%p %p %lu %lu\n", timer, timeout, period, window_length );

+ /* If the timer has any pending callbacks, cancel them first. */ + tp_object_cancel( this ); + RtlEnterCriticalSection( &timerqueue.cs );

assert( this->u.timer.timer_initialized );

-- GitLab https://gitlab.winehq.org/wine/wine/-/merge_requests/1559

Ivo Ivanov

5:09 p.m.

New subject: [PATCH 2/3] ntoskrnl.exe: Prevent blocking if timer is reset in DPC.

From: Ivo Ivanov logos128@gmail.com

Fixes NaturalPoint's TrackIR5 app hanging indefinitely on exit, while waiting synchronously on an IRP to complete. The blocking happens in the app's npusbio_x64 driver, while it tries to reset a timer in its DPC. Probably fixes other drivers/apps in such situations. --- dlls/ntoskrnl.exe/sync.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/dlls/ntoskrnl.exe/sync.c b/dlls/ntoskrnl.exe/sync.c index d9b5726b920..0b101158eb6 100644 --- a/dlls/ntoskrnl.exe/sync.c +++ b/dlls/ntoskrnl.exe/sync.c @@ -493,10 +493,14 @@ BOOLEAN WINAPI KeSetTimerEx( KTIMER *timer, LARGE_INTEGER duetime, LONG period,

EnterCriticalSection( &sync_cs );

- if ((ret = timer->Header.Inserted)) - KeCancelTimer(timer); - + ret = timer->Header.Inserted; timer->Header.Inserted = TRUE; + timer->Header.SignalState = FALSE; + if (timer->Header.WaitListHead.Blink && !*((ULONG_PTR *)&timer->Header.WaitListHead.Flink)) + { + CloseHandle(timer->Header.WaitListHead.Blink); + timer->Header.WaitListHead.Blink = NULL; + }

if (!timer->TimerListEntry.Blink) timer->TimerListEntry.Blink = (void *)CreateThreadpoolTimer(ke_timer_complete_proc, timer, NULL);

-- GitLab https://gitlab.winehq.org/wine/wine/-/merge_requests/1559

Ivo Ivanov

5:09 p.m.

New subject: [PATCH 3/3] ntoskrnl.exe/tests: Test resetting a kernel timer in its DPC.

From: Ivo Ivanov logos128@gmail.com

--- dlls/ntoskrnl.exe/tests/driver.c | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/dlls/ntoskrnl.exe/tests/driver.c b/dlls/ntoskrnl.exe/tests/driver.c index d293889c823..dfdca2e24ca 100644 --- a/dlls/ntoskrnl.exe/tests/driver.c +++ b/dlls/ntoskrnl.exe/tests/driver.c @@ -578,6 +578,14 @@ static void WINAPI test_sync_dpc(KDPC *dpc, void *context, void *system_argument c->called = TRUE; }

+static void WINAPI test_sync_dpc_2(KDPC *dpc, void *context, void *system_argument1, void *system_argument2) +{ + KTIMER *timer = context; + LARGE_INTEGER timeout = {.QuadPart = -20 * 10000}; + + KeSetTimerEx(timer, timeout, 0, dpc); +} + static void test_sync(void) { static const ULONG wine_tag = 0x454e4957; /* WINE */ @@ -591,7 +599,7 @@ static void test_sync(void) void *objs[2]; KTIMER timer; NTSTATUS ret; - KDPC dpc; + KDPC dpc, dpc_2; int i;

KeInitializeEvent(&manual_event, NotificationEvent, FALSE); @@ -925,6 +933,17 @@ static void test_sync(void) ret = wait_single(&timer, -40 * 10000); ok(ret == WAIT_TIMEOUT, "got %#lx\n", ret);

+ KeCancelTimer(&timer); + /* Test re-setting timer in dpc */ + KeInitializeDpc(&dpc_2, test_sync_dpc_2, &timer); + KeSetTimerEx(&timer, timeout, 0, &dpc_2); + ret = wait_single(&timer, -40 * 10000); + ok(ret == 0, "got %#lx\n", ret); + ret = wait_single(&timer, -40 * 10000); + ok(ret == 0, "got %#lx\n", ret); + ret = wait_single(&timer, -40 * 10000); + ok(ret == 0, "got %#lx\n", ret); + KeCancelTimer(&timer); /* Test reinitializing timer. */ KeSetTimerEx(&timer, timeout, 0, &dpc);

-- GitLab https://gitlab.winehq.org/wine/wine/-/merge_requests/1559

Marvin

5:38 p.m.

New subject: [PATCH 3/3] ntoskrnl.exe/tests: Test resetting a kernel timer in its DPC.

Hi,

It looks like your patch introduced the new failures shown below. Please investigate and fix them before resubmitting your patch. If they are not new, fixing them anyway would help a lot. Otherwise please ask for the known failures list to be updated.

The tests also ran into some preexisting test failures. If you know how to fix them that would be helpful. See the TestBot job for the details:

The full results can be found at: https://testbot.winehq.org/JobDetails.pl?Key=126832

Your paranoid android.

=== w8 (32 bit report) ===

ntoskrnl.exe: ntoskrnl.c:1218: Test failed: Got unexpected ret -1. ntoskrnl.c:1219: Test failed: Received unexpected data.

Zebediah Figura (＠zfigura)

28 Nov 28 Nov

9:35 p.m.

Zebediah Figura (@zfigura) commented about dlls/ntdll/threadpool.c:

...

 TRACE( "%p %p %lu %lu\n", timer, timeout, period, window_length );
/* If the timer has any pending callbacks, cancel them first. */

tp_object_cancel( this );

Why do we need this?

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/1559#note_17731

Zebediah Figura (＠zfigura)

9:55 p.m.

Zebediah Figura (@zfigura) commented about dlls/ntoskrnl.exe/sync.c:

...

 EnterCriticalSection( &sync_cs );
if ((ret = timer->Header.Inserted))
   KeCancelTimer(timer);
ret = timer->Header.Inserted; timer->Header.Inserted = TRUE;

timer->Header.SignalState = FALSE;

if (timer->Header.WaitListHead.Blink && !*((ULONG_PTR *)&timer->Header.WaitListHead.Flink))

{
   CloseHandle(timer->Header.WaitListHead.Blink);
   timer->Header.WaitListHead.Blink = NULL;
}

if (!timer->TimerListEntry.Blink) timer->TimerListEntry.Blink = (void *)CreateThreadpoolTimer(ke_timer_complete_proc, timer, NULL);

I'm a bit concerned about this change, because my assumption—although I never tried to validate it—is that KeSetTimerEx() should probably act as a fence, i.e. it shouldn't be possible for an earlier callback to execute simultaneously or after it's returned. In this case that could result in a non-periodic DPC executing twice, which seems dangerous.

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/1559#note_17732

Zebediah Figura (＠zfigura)

9:57 p.m.

On Mon Nov 28 21:55:48 2022 +0000, Zebediah Figura wrote:

...

I'm a bit concerned about this change, because my assumption—although I never tried to validate it—is that KeSetTimerEx() should probably act as a fence, i.e. it shouldn't be possible for an earlier callback to execute simultaneously or after it's returned. In this case that could result in a non-periodic DPC executing twice, which seems dangerous.

Note that 1/3 won't actually help you, because we could already be in ke_timer_complete_proc() but not executing the callback yet.

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/1559#note_17733

Ivo Ivanov (＠logos128)

29 Nov 29 Nov

5 a.m.

On Mon Nov 28 21:35:09 2022 +0000, Zebediah Figura wrote:

...

Why do we need this?

This cancels the thread pool timer on reset, so it won't call any pending callbacks. Otherwise it only removes the pending timers, which is not enough to stop the pending callbacks. This also allows us to remove the blocking KeCancelTimer() call from KeSetTimerEx(), which uses the thread pool implementation.

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/1559#note_17803

Ivo Ivanov (＠logos128)

6:05 a.m.

On Mon Nov 28 21:57:01 2022 +0000, Zebediah Figura wrote:

...

Note that 1/3 won't actually help you, because we could already be in ke_timer_complete_proc() but not executing the callback yet.

Yes, that won't be enough to stop the pending callbacks. The change in TpSetTimer handles this on thread pool level. If there is ke_timer_complete_proc() running at this time, it will complete in parallel. I have to dig the MS documentation again, but I'm pretty sure they don't guarantee immediate cancellation of pending DPCs after canceling an object. Since the DPCs are queued and run on another IRQ level, I assume they can't be dequeued and stopped easily in time.

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/1559#note_17812

Zebediah Figura (＠zfigura)

4:58 p.m.

On Tue Nov 29 05:00:35 2022 +0000, Ivo Ivanov wrote:

...

This cancels the thread pool timer on reset, so it won't call any pending callbacks. Otherwise it only removes the pending timers, which is not enough to stop the pending callbacks. This also allows us to remove the blocking KeCancelTimer() call from KeSetTimerEx(), which uses the thread pool implementation.

If callbacks are pending at this point, then they've been triggered and are just waiting for thread space to be executed. On the other hand, this isn't going to stop callbacks that are already in process. So this doesn't guarantee anything, and in light of that I don't see why cancelling queued callbacks is necessary or even helpful.

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/1559#note_17865

Zebediah Figura (＠zfigura)

5:01 p.m.

On Tue Nov 29 06:05:51 2022 +0000, Ivo Ivanov wrote:

...

Yes, that won't be enough to stop the pending callbacks. The change in TpSetTimer handles this on thread pool level. If there is ke_timer_complete_proc() running at this time, it will complete in parallel. I have to dig the MS documentation again, but I'm pretty sure they don't guarantee immediate cancellation of pending DPCs after canceling an object. Since the DPCs are queued and run on another IRQ level, I assume they can't be dequeued and stopped easily in time.

That sounds believable, although it'd be nice to be sure. More concerning to me, though, is that currently we can execute the "wrong" DPC if the callback was triggered during/after a concurrent KeSetTimer().

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/1559#note_17866

Ivo Ivanov (＠logos128)

30 Nov 30 Nov

3:37 p.m.

...

If callbacks are pending at this point, then they've been triggered and are just waiting for thread space to be executed.

The purpose of the call here is exactly that - to cancel any pending callbacks that are about to be executed.

...

On the other hand, this isn't going to stop callbacks that are already in process.

Yes. According to the MS documentation for SetThreadpoolTimer(): _"In some cases, callback functions might run after an application closes the threadpool timer."_ (Remarks section) https://learn.microsoft.com/en-us/windows/win32/api/threadpoolapiset/nf-thre...

Although they seem to imply that this includes also the queued callbacks, not just those in the execute phase: (pftDueTime parameter) _"If this parameter is NULL, the timer object will cease to queue new callbacks (but callbacks already queued will still occur)."_

And also in the Remarks section of CloseThreadpoolTimer(): _"The timer object is freed immediately if there are no outstanding callbacks; otherwise, the timer object is freed asynchronously after the outstanding callback functions complete."_ https://learn.microsoft.com/en-us/windows/win32/api/threadpoolapiset/nf-thre...

Since the tp_object_cancel() removes pending callbacks, it can probably be removed. It won't make much of a difference though, and I think it makes more sense to cancel the pending callbacks there.

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/1559#note_18070

Ivo Ivanov (＠logos128)

4:12 p.m.

On Tue Nov 29 17:01:12 2022 +0000, Zebediah Figura wrote:

...

That sounds believable, although it'd be nice to be sure. More concerning to me, though, is that currently we can execute the "wrong" DPC if the callback was triggered during/after a concurrent KeSetTimer().

Here is a quote from the MS documentation: _"However, if the timer is about to expire when KeCancelTimer is called, expiration might occur before KeCancelTimer has a chance to access the time queue, in which case signaling and DPC queuing will occur."_ https://learn.microsoft.com/en-us/windows-hardware/drivers/kernel/using-a-cu...

And also: _"If a second driver routine calls KeSetTimer or KeSetTimerEx to run the same CustomTimerDpc routine before the interval specified by the first caller expires, the CustomTimerDpc routine is run only after the interval specified by the second caller expires. In these circumstances, the CustomTimerDpc does none of the work for which the first routine called KeSetTimer or KeSetTimerEx."_

In general they warn of many race conditions that can occur when using DPCs with kernel timers, and that it is responsibility of the developer to avoid them.

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/1559#note_18072

Zebediah Figura (＠zfigura)

4:59 p.m.

On Wed Nov 30 15:37:41 2022 +0000, Ivo Ivanov wrote:

...

...
If callbacks are pending at this point, then they've been triggered

and are just waiting for thread space to be executed. The purpose of the call here is exactly that - to cancel any pending callbacks that are about to be executed.

...
On the other hand, this isn't going to stop callbacks that are already

in process. Yes. According to the MS documentation for SetThreadpoolTimer(): _"In some cases, callback functions might run after an application closes the threadpool timer."_ (Remarks section) https://learn.microsoft.com/en-us/windows/win32/api/threadpoolapiset/nf-thre... Although they seem to imply that this includes also the queued callbacks, not just those in the execute phase: (pftDueTime parameter) _"If this parameter is NULL, the timer object will cease to queue new callbacks (but callbacks already queued will still occur)."_ And also in the Remarks section of CloseThreadpoolTimer(): _"The timer object is freed immediately if there are no outstanding callbacks; otherwise, the timer object is freed asynchronously after the outstanding callback functions complete."_ https://learn.microsoft.com/en-us/windows/win32/api/threadpoolapiset/nf-thre... Since the tp_object_cancel() removes pending callbacks, it can probably be removed. It won't make much of a difference though, and I think it makes more sense to cancel the pending callbacks there.

What I'm trying to say is I don't see why cancelling pending callbacks is going to *help* anything. In general I can see two reasons, in a situation like this, to cancel or flush some queued event:

* in order to avoid letting that event execute after the cancel

* in order to prevent anything from taking extra time waiting for those queued events to be processed

Neither of these seem to apply here. The former can't apply because the race is still present, as has been described. The latter doesn't seem like it should apply either, since there's nothing actually waiting for those callbacks to execute.

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/1559#note_18087

Zebediah Figura (＠zfigura)

5:03 p.m.

On Wed Nov 30 16:12:36 2022 +0000, Ivo Ivanov wrote:

...

Here is a quote from the MS documentation: _"However, if the timer is about to expire when KeCancelTimer is called, expiration might occur before KeCancelTimer has a chance to access the time queue, in which case signaling and DPC queuing will occur."_ https://learn.microsoft.com/en-us/windows-hardware/drivers/kernel/using-a-cu... And also: _"If a second driver routine calls KeSetTimer or KeSetTimerEx to run the same CustomTimerDpc routine before the interval specified by the first caller expires, the CustomTimerDpc routine is run only after the interval specified by the second caller expires. In these circumstances, the CustomTimerDpc does none of the work for which the first routine called KeSetTimer or KeSetTimerEx."_ In general they warn of many race conditions that can occur when using DPCs with kernel timers, and that it is responsibility of the developer to avoid them.

Sure, but those are specific races, and driver developers might expect and be able to handle them. On the other hand, I could easily see a driver crashing if a DPC that it thought was only going to execute at most once ends up executing twice. Unless Windows actually does that, we should probably try to avoid it. Which shouldn't be too difficult, I think.

(Note also that the Microsoft documentation can often be inaccurate, or incomplete. Programs may also depend on guarantees weaker or stronger than the documentation gives.)

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/1559#note_18088

Ivo Ivanov (＠logos128)

1 Dec 1 Dec

12:15 a.m.

On Wed Nov 30 16:59:15 2022 +0000, Zebediah Figura wrote:

...

What I'm trying to say is I don't see why cancelling pending callbacks is going to *help* anything. In general I can see two reasons, in a situation like this, to cancel or flush some queued event:

in order to avoid letting that event execute after the cancel

in order to prevent anything from taking extra time waiting for those

queued events to be processed Neither of these seem to apply here. The former can't apply because the race is still present, as has been described. The latter doesn't seem like it should apply either, since there's nothing actually waiting for those callbacks to execute.

I'm trying to eliminate those callbacks that are still in the submission phase (num_pending_callbacks). Those that have entered the execution phase (num_associated_callbacks) in tp_object_execute() cannot be stopped and will complete. Anyway, more than one pending callback per timer would be extremely rare, so no problem reverting that change.

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/1559#note_18175

Ivo Ivanov (＠logos128)

12:43 a.m.

...

On the other hand, I could easily see a driver crashing if a DPC that it thought was only going to execute at most once ends up executing twice.

Yeah, that could be a problem. Callbacks can finish executing later than timer reset/cancel, as shown by the test, so at least that is expected behavior.

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/1559#note_18178

961

Age (days ago)

965

Last active (days ago)

wine-gitlab@winehq.org

17 comments

4 participants

tags (0)

participants (4)

Ivo Ivanov
Ivo Ivanov (＠logos128)
Marvin
Zebediah Figura (＠zfigura)