These changes are the documented way to implement IMFMediaEventGenerator for a media source. Avoiding using a destroyed event queue is the only reason to lock the CS that I can find. On top of !6783 it seems to resolve hanging while switching characters in Killsquad (not decisively proven though).
-- v2: winegstreamer: Lock the media source critical section while using the event queue.
From: Conor McCarthy cmccarthy@codeweavers.com
--- dlls/winegstreamer/media_source.c | 76 ++++++++++++++++++++++++++++--- 1 file changed, 70 insertions(+), 6 deletions(-)
diff --git a/dlls/winegstreamer/media_source.c b/dlls/winegstreamer/media_source.c index 030d0c1b9a2..e491d0e9627 100644 --- a/dlls/winegstreamer/media_source.c +++ b/dlls/winegstreamer/media_source.c @@ -1376,38 +1376,102 @@ static ULONG WINAPI media_source_Release(IMFMediaSource *iface) static HRESULT WINAPI media_source_GetEvent(IMFMediaSource *iface, DWORD flags, IMFMediaEvent **event) { struct media_source *source = impl_from_IMFMediaSource(iface); + IMFMediaEventQueue *event_queue = NULL; + HRESULT hr = S_OK;
TRACE("%p, %#lx, %p.\n", iface, flags, event);
- return IMFMediaEventQueue_GetEvent(source->event_queue, flags, event); + EnterCriticalSection(&source->cs); + + /* GetEvent() is blocking, so we safely get an event queue pointer and then + * unlock the CS. The event queue exists until the media source is + * destroyed, and it checks for shutdown, but to be strictly correct we + * must check for shutdown in the source, because the event queue's state + * is updated during execution of IMFMediaSource_Shutdown() and we could + * erroneously return S_OK here. */ + if (source->state == SOURCE_SHUTDOWN) + hr = MF_E_SHUTDOWN; + else + { + event_queue = source->event_queue; + IMFMediaEventQueue_AddRef(event_queue); + } + + LeaveCriticalSection(&source->cs); + + if (SUCCEEDED(hr)) + { + hr = IMFMediaEventQueue_GetEvent(event_queue, flags, event); + IMFMediaEventQueue_Release(event_queue); + } + + return hr; }
+/* The event queue has its own CS, but in the event-handling methods we still + * lock the source CS to ensure execution of the method is atomic wrt all other + * methods. The async handlers for start, stop, pause and request-sample all + * use the event queue, and the below event queue calls should not fall + * randomly between the actions taken in those handlers. Locking here ensures + * robustness and it means we don't need to prove the CS in the event queue is + * sufficient for correct synchronisation. */ + static HRESULT WINAPI media_source_BeginGetEvent(IMFMediaSource *iface, IMFAsyncCallback *callback, IUnknown *state) { struct media_source *source = impl_from_IMFMediaSource(iface); + HRESULT hr;
TRACE("%p, %p, %p.\n", iface, callback, state);
- return IMFMediaEventQueue_BeginGetEvent(source->event_queue, callback, state); + EnterCriticalSection(&source->cs); + + if (source->state == SOURCE_SHUTDOWN) + hr = MF_E_SHUTDOWN; + else + hr = IMFMediaEventQueue_BeginGetEvent(source->event_queue, callback, state); + + LeaveCriticalSection(&source->cs); + + return hr; }
static HRESULT WINAPI media_source_EndGetEvent(IMFMediaSource *iface, IMFAsyncResult *result, IMFMediaEvent **event) { struct media_source *source = impl_from_IMFMediaSource(iface); + HRESULT hr;
TRACE("%p, %p, %p.\n", iface, result, event);
- return IMFMediaEventQueue_EndGetEvent(source->event_queue, result, event); + EnterCriticalSection(&source->cs); + + if (source->state == SOURCE_SHUTDOWN) + hr = MF_E_SHUTDOWN; + else + hr = IMFMediaEventQueue_EndGetEvent(source->event_queue, result, event); + + LeaveCriticalSection(&source->cs); + + return hr; }
static HRESULT WINAPI media_source_QueueEvent(IMFMediaSource *iface, MediaEventType event_type, REFGUID ext_type, - HRESULT hr, const PROPVARIANT *value) + HRESULT status, const PROPVARIANT *value) { struct media_source *source = impl_from_IMFMediaSource(iface); + HRESULT hr;
- TRACE("%p, %lu, %s, %#lx, %p.\n", iface, event_type, debugstr_guid(ext_type), hr, value); + TRACE("%p, %lu, %s, %#lx, %p.\n", iface, event_type, debugstr_guid(ext_type), status, value); + + EnterCriticalSection(&source->cs); + + if (source->state == SOURCE_SHUTDOWN) + hr = MF_E_SHUTDOWN; + else + hr = IMFMediaEventQueue_QueueEventParamVar(source->event_queue, event_type, ext_type, status, value);
- return IMFMediaEventQueue_QueueEventParamVar(source->event_queue, event_type, ext_type, hr, value); + LeaveCriticalSection(&source->cs); + + return hr; }
static HRESULT WINAPI media_source_GetCharacteristics(IMFMediaSource *iface, DWORD *characteristics)
I've edited the comments to state reasons why we should do this.
In similar amounts of Killsquad testing I saw five hangs without this commit and no hangs with it. Enabling mfplat tracing causes occasional hanging even with this change, so it's seems to be an improvement but not a complete fix.
Rémi Bernon (@rbernon) commented about dlls/winegstreamer/media_source.c:
* erroneously return S_OK here. */
- if (source->state == SOURCE_SHUTDOWN)
hr = MF_E_SHUTDOWN;
- else
- {
event_queue = source->event_queue;
IMFMediaEventQueue_AddRef(event_queue);
- }
- LeaveCriticalSection(&source->cs);
- if (SUCCEEDED(hr))
- {
hr = IMFMediaEventQueue_GetEvent(event_queue, flags, event);
IMFMediaEventQueue_Release(event_queue);
- }
I still don't see how this makes any difference? Let's assume there's an event E in the event queue prior to these calls:
Before this change we can have A3-A5 serialized with B5:
``` Thread A: Thread B: A1: IMFMediaSource_GetEvent( source ) B1: IMFMediaSource_Shutdown( source ) B2: EnterCriticalSection( &source->cs ) B3: source->state := SOURCE_SHUTDOWN B4: IMFMediaEventQueue_Shutdown( queue ) A2: IMFMediaEventQueue_GetEvent( queue ) A3: EnterCriticalSection( &queue->cs ) A4: queue->is_shut_down == FALSE A5: LeaveCriticalSection( &queue->cs ) B5: EnterCriticalSection( &queue->cs ) B6: queue->is_shut_down := TRUE B7: LeaveCriticalSection( &queue->cs ) B8: IMFMediaEventQueue_Shutdown -> (S_OK) B9: LeaveCriticalSection( &source->cs ) B10: IMFMediaSource_Shutdown -> S_OK A6: IMFMediaEventQueue_GetEvent -> (S_OK, E) ```
After this change we can have A2-A5 serialized with B3, and A7-A9 serialized with B5:
``` Thread A: Thread B: A1: IMFMediaSource_GetEvent( source ) B1: IMFMediaSource_Shutdown( source ) A2: EnterCriticalSection( &source->cs ) A3: source->state != SOURCE_SHUTDOWN A4: queue := source->queue A5: LeaveCriticalSection( &source->cs ) B2: EnterCriticalSection( &source->cs ) B3: source->state := SOURCE_SHUTDOWN B4: IMFMediaEventQueue_Shutdown( queue ) A6: IMFMediaEventQueue_GetEvent( queue ) A7: EnterCriticalSection( &queue->cs ) A8: queue->is_shut_down == FALSE A9: LeaveCriticalSection( &queue->cs ) B5: EnterCriticalSection( &queue->cs ) B6: queue->is_shut_down := TRUE B7: LeaveCriticalSection( &queue->cs ) B8: IMFMediaEventQueue_Shutdown -> (S_OK) B9: LeaveCriticalSection( &source->cs ) B10: IMFMediaSource_Shutdown -> S_OK A10: IMFMediaEventQueue_GetEvent -> (S_OK, E) ```
Makes strictly no difference in terms of possible outcome?
I think we need to determine the cause of the hang more precisely, especially if it still happens after this change because I don't think it makes any logical difference.
I linked that for another MR, but it could be useful here too https://gitlab.winehq.org/wine/wine/-/merge_requests/6783#note_87504. Sample implementation always grab its own lock, checks for its own shutdown status, and does not call to the event queue if container object is shut down already. I don't know if it helps or not, or if this sample is thread safe in a first place.
Sure, and this is also what the documentation linked above describes. It's probably canonical if you don't make any assumption about the event queue, I just don't see how it is actually solving anything.
Note that the difference is also that in this example, they release their event queue reference in Shutdown, which is why they have to lock and check for shutdown status here.
I can drop the GetEvent() changes as the game doesn't use it.
It's not really that change that bothers me here, it's that I don't see how adding more locks makes anything actually behave differently.
Adding them maybe fixes the hang just because it changes the scheduling, it doesn't prevent it from happening, and it isn't the right way to fix this issue, we need to understand what is causing the hang in the first place.
The hang occurs because the game inexplicably calls `Shutdown()` on a source which is starting, so that asynchronous execution of the start command fails. The `Shutdown()` call doesn't normally occur at all, so the issue is not a race between it and asynchronous start. The reason for the spurious `Shutdown()` is very difficult to find. I'm the third dev to look at this bug, so it's unlikely we will find it. It seems likely to be a game bug, and shifting the timing to more closely match Windows hides the bug. I'll close this if you think the code should remain unchanged.
Giving it a quick look it seems like the game is doing the following:
1. Creates a MF media session. 2. Creates a MF media source, 3. Creates a MF topology with the media source, 4. IMFMediaSession_BeginGetEvent (then handles the produced events asynchronously) 5. IMFMediaSession_ClearTopologies 6. IMFMediaSession_SetTopology 7. IMFMediaSession_Start
Then, when the user selects another character, it does:
8. IMFMediaSource_Shutdown 9. loops to 2.
The IMFMediaSession_Start call queues an internal media session START command, that subscribes to media source events and then calls IMFMediaSource_Start. It then asynchronously waits until the corresponding MESourceStarted event is received before completing its START command with a corresponding media session event. The media session command queue uses its own list of commands and doesn't queue additional async calls while a media source command is pending.
Now, the game IMFMediaSource_Shutdown call can sometimes be received after a IMFMediaSource_Start call has been made by the media session, but before the media source async command has been executed. When the command is executed, the media source has been shutdown already and it cannot queue any kind of event anymore for the media session to receive it.
The media session is then stuck in its PENDING command state, and cannot process any other asynchronous commands.
This looks to me like a media session bug, and it should instead be robust to this, especially as in this use case the ClearTopologies command should probably ignore any previous error or forever pending commands.
On Tue Nov 19 07:50:09 2024 +0000, Rémi Bernon wrote:
Giving it a quick look it seems like the game is doing the following:
- Creates a MF media session.
- Creates a MF media source,
- Creates a MF topology with the media source,
- IMFMediaSession_BeginGetEvent (then handles the produced events asynchronously)
- IMFMediaSession_ClearTopologies
- IMFMediaSession_SetTopology
- IMFMediaSession_Start
Then, when the user selects another character, it does: 8. IMFMediaSource_Shutdown 9. loops to 2. The IMFMediaSession_Start call queues an internal media session START command, that subscribes to media source events and then calls IMFMediaSource_Start. It then asynchronously waits until the corresponding MESourceStarted event is received before completing its START command with a corresponding media session event. The media session command queue uses its own list of commands and doesn't queue additional async calls while a media source command is pending. Now, the game IMFMediaSource_Shutdown call can sometimes be received after a IMFMediaSource_Start call has been made by the media session, but before the media source async command has been executed. When the command is executed, the media source has been shutdown already and it cannot queue any kind of event anymore for the media session to receive it. The media session is then stuck in its PENDING command state, and cannot process any other asynchronous commands. This looks to me like a media session bug, and it should instead be robust to this, especially as in this use case the ClearTopologies command should probably ignore any previous error or forever pending commands.
Thanks for the info. I made a test which shows that `MFUnlockWorkQueue()` does not purge pending items in Windows and instead they are processed. To fix this we need only pass FALSE to `CloseThreadpoolCleanupGroupMembers()`. On top of that, if we delay shutting down the event queue until the source is destroyed, and enqueue events with status MF_E_SHUTDOWN for commands handled after shutdown, it seems to fix the hang. I'll do some more testing. Since we can no longer rely on the event queue tracking shutdown status, we'll need this commit too.
On Tue Nov 19 07:50:09 2024 +0000, Conor McCarthy wrote:
Thanks for the info. I made a test which shows that `MFUnlockWorkQueue()` does not purge pending items in Windows and instead they are processed. To fix this we need only pass FALSE to `CloseThreadpoolCleanupGroupMembers()`. On top of that, if we delay shutting down the event queue until the source is destroyed, and enqueue events with status MF_E_SHUTDOWN for commands handled after shutdown, it seems to fix the hang. I'll do some more testing. Since we can no longer rely on the event queue tracking shutdown status, we'll need this commit too.
Well that's maybe yet another issue but not what I meant.
What I think is happening is related to the `SESSION_FLAG_PENDING_COMMAND` that is used at https://gitlab.winehq.org/wine/wine/-/blob/master/dlls/mf/session.c?ref_type... to decide whether to append the async call to a list or whether to request a new async call from the thread pool.
The flag is set at https://gitlab.winehq.org/wine/wine/-/blob/master/dlls/mf/session.c?ref_type..., so before processing `SESSION_CMD_START`, which calls BeginGetEvent and Start on the media sources at https://gitlab.winehq.org/wine/wine/-/blob/master/dlls/mf/session.c?ref_type....
If `IMFMediaSource_Start` is returning an error immediately (if the source has been shutdown before the call), the flag is correctly cleared with `session_command_complete_with_event` at https://gitlab.winehq.org/wine/wine/-/blob/master/dlls/mf/session.c?ref_type....
If the call doesn't return an error, but the source is shutdown before it has a chance of executing its async command, the media session never completes its command and never clear that flag because `MESourceStarted` is never received, and `session_set_source_object_state` never called.
On Tue Nov 19 08:38:35 2024 +0000, Rémi Bernon wrote:
Well that's maybe yet another issue but not what I meant. What I think is happening is related to the `SESSION_FLAG_PENDING_COMMAND` that is used at https://gitlab.winehq.org/wine/wine/-/blob/master/dlls/mf/session.c?ref_type... to decide whether to append the async call to a list or whether to request a new async call from the thread pool. The flag is set at https://gitlab.winehq.org/wine/wine/-/blob/master/dlls/mf/session.c?ref_type..., so before processing `SESSION_CMD_START`, which calls BeginGetEvent and Start on the media sources at https://gitlab.winehq.org/wine/wine/-/blob/master/dlls/mf/session.c?ref_type.... If `IMFMediaSource_Start` is returning an error immediately (if the source has been shutdown before the call), the flag is correctly cleared with `session_command_complete_with_event` at https://gitlab.winehq.org/wine/wine/-/blob/master/dlls/mf/session.c?ref_type.... If the call doesn't return an error, but the source is shutdown before it has executed its async command, the media session never completes its command and never clear that flag because `MESourceStarted` is never received, and `session_set_source_object_state` never called.
With the changes I've made, `MESourceStarted` should be received even if the source is shutdown before the start async command executes. Is that sufficient to solve that issue?
On Tue Nov 19 13:45:07 2024 +0000, Conor McCarthy wrote:
With the changes I've made, `MESourceStarted` should be received even if the source is shutdown before the start async command executes. Is that sufficient to solve that issue?
Delay shutting down the event queue will keep the media source async commands running, and events to be received, which is not supposed to happen after Shutdown has been called, unless tests show otherwise.
IMO the bug is in the media session state machine and should be fixed there.
On Tue Nov 19 13:49:40 2024 +0000, Rémi Bernon wrote:
Delay shutting down the event queue will keep the media source async commands running, and events to be received, which is not supposed to happen after Shutdown has been called, unless tests show otherwise. IMO the bug is in the media session state machine and should be fixed there.
A test Santino made last year for the same issue in Proton new media source showed that when this case happens in Windows, `MESourceStarted` is sent with a status of `MF_E_SHUTDOWN`. I'll try to make a Wine test which shows that.
On Tue Nov 19 14:05:24 2024 +0000, Conor McCarthy wrote:
A test Santino made last year for the same issue in Proton new media source showed that when this case happens in Windows, `MESourceStarted` is sent with a status of `MF_E_SHUTDOWN`. I'll try to make a Wine test which shows that.
I misread the info on the Proton issue; it was for `MESessionStarted`. For the media source in Windows, when `Shutdown()` is called after `Start()`, the `Invoke()` callback is called but `IMFMediaSource::EndGetEvent()` fails with `MF_E_SHUTDOWN`. If we replicate this behaviour then we have notification of the shutdown.
I propose:
1. Change `MFUnlockWorkQueue()` to execute all pending commands. 2. Keep the event queue alive after source shutdown. 3. When a command executes on a shutdown source, we can call `QueueEventParamVar()` with `MESourceStarted` (and others if necessary). Combined with commit b30da09b this results in `Invoke()` being called but `IMFMediaSource::EndGetEvent()` fails just as in Windows. 4. In media session, use `session_events_callback_Invoke()` and subsequent failure of `EndGetEvent()` to detect an abort condition, and call a `session_abort()` helper. 5. Also call `session_abort()` on failure to re-subscribe.
My current implementation of `session_abort()` does:
1. If the session state is `SESSION_STATE_STARTING_SOURCES`, enqueue an `MESessionStarted` with status `MF_E_SHUTDOWN`. 2. Set the state to `ABORTED`. 3. Delete all commands in `session->commands` which are not `CLOSE` or `SHUTDOWN`. 4. Call `session_command_complete()`. 5. In `session_close()`, if the state is `ABORTED`, call `session_command_complete_with_event()` with `MESessionClosed`.
How does that look?
On Wed Nov 20 09:13:04 2024 +0000, Conor McCarthy wrote:
I misread the info on the Proton issue; it was for `MESessionStarted`. For the media source in Windows, when `Shutdown()` is called after `Start()`, the `Invoke()` callback is called but `IMFMediaSource::EndGetEvent()` fails with `MF_E_SHUTDOWN`. If we replicate this behaviour then we have notification of the shutdown. I propose:
- Change `MFUnlockWorkQueue()` to execute all pending commands.
- Keep the event queue alive after source shutdown.
- When a command executes on a shutdown source, we can call
`QueueEventParamVar()` with `MESourceStarted` (and others if necessary). Combined with commit b30da09b this results in `Invoke()` being called but `IMFMediaSource::EndGetEvent()` fails just as in Windows. 4. In media session, use `session_events_callback_Invoke()` and subsequent failure of `EndGetEvent()` to detect an abort condition, and call a `session_abort()` helper. 5. Also call `session_abort()` on failure to re-subscribe. My current implementation of `session_abort()` does:
- If the session state is `SESSION_STATE_STARTING_SOURCES`, enqueue an
`MESessionStarted` with status `MF_E_SHUTDOWN`. 2. Set the state to `ABORTED`. 3. Delete all commands in `session->commands` which are not `CLOSE` or `SHUTDOWN`. 4. Call `session_command_complete()`. 5. In `session_close()`, if the state is `ABORTED`, call `session_command_complete_with_event()` with `MESessionClosed`. How does that look?
I don't know, it would need tests to make sure what native behavior is and match it.
From the game usage it doesn't seem to me that shutting down a media source should put the media session in any kind of aborted state. Rather, it looks like it should simply just ignore the error and later keep working normally when ClearTopologies / SetTopology / Start are called with a new topology.
- Keep the event queue alive after source shutdown.
This would need tests, especially in the light of the documentation and example links above.
This would need tests, especially in the light of the documentation and example links above.
It would not be externally visible because media source `Begin/EndGetEvent()` etc will return MF_E_SHUTDOWN if source shutdown has occurred.
On Wed Nov 20 14:18:24 2024 +0000, Conor McCarthy wrote:
This would need tests, especially in the light of the documentation
and example links above. It would not be externally visible because media source `Begin/EndGetEvent()` etc will return MF_E_SHUTDOWN if source shutdown has occurred.
Perhaps but it still feels brittle that fixing the media session requires some cooperation from the media source.
Some tests with a mock media source that doesn't send any event in response to IMFMediaSource_Start, and making the same media session calls as the game would show if the session is able to handle the missing events or not.
If native is indeed hanging too in that scenario, then yes the media source would also need some changes. Otherwise they don't seem necessary.
On Wed Nov 20 14:30:21 2024 +0000, Rémi Bernon wrote:
Perhaps but it still feels brittle that fixing the media session requires some cooperation from the media source. Some tests with a mock media source that doesn't send any event in response to IMFMediaSource_Start, and making the same media session calls (SetTopology, Start, ClearTopologies, SetTopology, Start again) as the game would show if the session is able to handle the missing events or not. If native is indeed hanging too in that scenario, then yes the media source would also need some changes. Otherwise they don't seem necessary.
I confirmed that native hangs if `MESourceStarted` is not sent and events are halted as we'd expect for a shutdown source. I tested this by wrapping the native media source in a custom one, which also required wrapping streams returned by `MENewStream` so their `GetMediaSource()` returns the wrapper. The wrapper intercepts `MESourceStarted` and blocks forwarding.
I also tested reusing the session after a source is forcibly shutdown. This requires calling `ClearTopologies()`. Clearing the current topology is not enough.
I have this working in Wine; I need to make sure the details are correct or reasonably close to it for review.
On Fri Nov 22 14:55:40 2024 +0000, Conor McCarthy wrote:
I confirmed that native hangs if `MESourceStarted` is not sent and events are halted as we'd expect for a shutdown source. I tested this by wrapping the native media source in a custom one, which also required wrapping streams returned by `MENewStream` so their `GetMediaSource()` returns the wrapper. The wrapper intercepts `MESourceStarted` and blocks forwarding. I also tested reusing the session after a source is forcibly shutdown. This requires calling `ClearTopologies()`. Clearing the current topology is not enough. I have this working in Wine; I need to make sure the details are correct or reasonably close to it for review.
While event queues will invoke the callback after shutdown, this occurs for media sources only within a narrow window. Probably it enqueues `MESourceStarted`, but shutdown occurs before `EndGetEvent()` is called in the callback function. This is not sufficient to prevent the session hanging. I tested a media session with a wrapped media source which calls `Shutdown()` on the native source object immediately after the `Start()` call has been forwarded to it, and this causes a session hang in Windows. The only explanation I have for the game not hanging in Windows is either Windows has a workaround which only applies when `Shutdown()` is called from a different thread (unlikely), or it uses a private interface to receive shutdown notifications. In Windows the session queries from the source an interface `{0b5e1c7e-bd76-46bc-896c-b2edb40dd803}` which is not documented anywhere. Forwarding this `QueryInterface()` call to the native source doesn't prevent the hang, but wrapping could stil l prevent notification from working.
I see no way to prevent a hang without adding a private interface for shutdown notification. It will be much simpler than wrapping the media source in order to intercept the call to `Shutdown()`.