During engine shutdown we acquire engine lock first, then locks of its constituents (e.g. sample grabbers); whereas normally the order is the other way around (e.g. timer callback -> acquire sample grabber lock -> OnProcessSample callback -> engine lock). This is deadlock prone.
With this commit, engine lock is released before we shutdown the inner media session.
From: Yuxuan Shui yshui@codeweavers.com
During engine shutdown we acquire engine lock first, then locks of its constituents (e.g. sample grabbers); whereas normally the order is the other way around (e.g. timer callback -> acquire sample grabber lock -> OnProcessSample callback -> engine lock). This is deadlock prone.
With this commit, engine lock is released before we shutdown the inner media session. --- dlls/mfmediaengine/main.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/dlls/mfmediaengine/main.c b/dlls/mfmediaengine/main.c index 0d553a40a7e..583437ac952 100644 --- a/dlls/mfmediaengine/main.c +++ b/dlls/mfmediaengine/main.c @@ -2237,6 +2237,7 @@ static HRESULT WINAPI media_engine_Shutdown(IMFMediaEngineEx *iface) { struct media_engine *engine = impl_from_IMFMediaEngineEx(iface); HRESULT hr = S_OK; + IMFMediaSession *session = NULL;
TRACE("%p.\n", iface);
@@ -2247,10 +2248,16 @@ static HRESULT WINAPI media_engine_Shutdown(IMFMediaEngineEx *iface) { media_engine_set_flag(engine, FLAGS_ENGINE_SHUT_DOWN, TRUE); media_engine_clear_presentation(engine); - IMFMediaSession_Shutdown(engine->session); + session = engine->session; + IMFMediaSession_AddRef(session); } LeaveCriticalSection(&engine->cs);
+ if (session) + { + IMFMediaSession_Shutdown(session); + IMFMediaSession_Release(session); + } return hr; }
If this is the right direction to take, it can be extended further.
For example, when shutting down a media session, we can first detach its topology (and other stuff, clocks, whatever), release the session lock, then shutdown all the other stuff.
Could you describe exact path, and topology, when this happens. Where in IMFMediaSession::Shutdown() are we getting blocked on other locks?
On Fri Oct 27 22:14:47 2023 +0000, Nikolay Sivov wrote:
Could you describe exact path, and topology, when this happens. Where in IMFMediaSession::Shutdown() are we getting blocked on other locks?
see https://gitlab.winehq.org/wine/wine/-/merge_requests/3067#note_49787
On Fri Oct 27 22:14:46 2023 +0000, Yuxuan Shui wrote:
see https://gitlab.winehq.org/wine/wine/-/merge_requests/3067#note_49787
I think this deserves more consideration. First we don't currently shut down sinks the same way Windows does, that sort of prevents seeing what happens with this grabber. We should fix that first.
Second, for the grabber, locking does not match what we do. On Windows, blocking in OnProcessSample() does not prevent shut down sequence. So this needs some more manual testing of the sample grabber, to see what happens when OnProcessSample() is blocked, but clock state callbacks are called for example, do they produce events or not, that sort of thing.
After that we'll see if changing call order in the media engine is still useful.
First we don't currently shut down sinks the same way Windows does.
From my observations, Windows calls `MediaSink::Shutdown` in `MediaSession::Shutdown`, and doesn't call `ShutdownObject`; and it doesn't shutdown sinks at all when changing/clearing topologies. Does that match what you know? Is that what you suggest we do?
On Windows, blocking in OnProcessSample() does not prevent shut down sequence.
That would suggest `OnProcessSample` is called on a different thread, wouldn't it? And that doesn't say anything about locking.
On Mon Oct 30 20:00:05 2023 +0000, Yuxuan Shui wrote:
First we don't currently shut down sinks the same way Windows does.
From my observations, Windows calls `MediaSink::Shutdown` in `MediaSession::Shutdown`, and doesn't call `ShutdownObject`; and it doesn't shutdown sinks at all when changing/clearing topologies. Does that match what you know? Is that what you suggest we do?
On Windows, blocking in OnProcessSample() does not prevent shut down sequence.
~~That would suggest `OnProcessSample` is called on a different thread, wouldn't it? And that doesn't say anything about locking.~~
Wait, sorry I got things mixed up, yeah OnProcessSample is not called from Shutdown... 🤦♂️
Did you mean locking engine in OnProcessSample doesn't block shutdown on native? Or did you mean something else by "blocking"? Because that would be expected right? With this MR shutdown won't be blocked in that case either, just like native.
On Mon Oct 30 21:49:29 2023 +0000, Yuxuan Shui wrote:
Wait, sorry I got things mixed up, yeah OnProcessSample is not called from Shutdown... :face_palm: Did you mean locking engine in OnProcessSample doesn't block shutdown on native? Or did you mean something else by "blocking"? Because that would be expected right? With this MR shutdown won't be blocked in that case either, just like native.
Or maybe you mean like putting an `while(true)` inside `OnProcessSample`? In that case, I guess this means either `OnProcessSample` is called without the grabber locked, or shutting down a grabber doesn't require its lock.
I think the latter is less likely, the former is achievable independent of this MR.
On Mon Oct 30 21:52:38 2023 +0000, Yuxuan Shui wrote:
Or maybe you mean like putting an `while(true)` inside `OnProcessSample`? In that case, I guess this means either `OnProcessSample` is called without the grabber locked, or shutting down a grabber doesn't require its lock. I think the latter is less likely, the former is achievable independent of this MR.
I mean that putting long Sleep() in OnProcessSample(), which is called from some thread from the pool/async queue, does not prevent Shutdown() from completing from the main thread.
On Tue Oct 31 17:00:16 2023 +0000, Nikolay Sivov wrote:
I mean that putting long Sleep() in OnProcessSample(), which is called from some thread from the pool/async queue, does not prevent Shutdown() from completing from the main thread.
Right, it's my second case then. Which means `OnProcessSample` is likely called with the grabber lock released, which is doable.
Do you want that as part of this MR? And what do you think about the other stuff I mentioned?
On Tue Oct 31 23:13:42 2023 +0000, Yuxuan Shui wrote:
Right, it's my second case then. Which means `OnProcessSample` is likely called with the grabber lock released, which is doable. Do you want that as part of this MR? And what do you think about the other stuff I mentioned?
If calling it outside of the lock is how it should work, and it fixes the deadlock, I don't see a need of initial proposed change.
On Wed Nov 1 10:13:29 2023 +0000, Nikolay Sivov wrote:
If calling it outside of the lock is how it should work, and it fixes the deadlock, I don't see a need of initial proposed change.
oh, ok. i can write a test to check if session shutdown is called with engine locked.
On Wed Nov 1 13:47:13 2023 +0000, Yuxuan Shui wrote:
oh, ok. i can write a test to check if session shutdown is called with engine locked.
So, I tried, and it's difficult to validate if engine is locked or not during session shutdown, as there isn't many opportunity to hook into the engine.
But that also mean it's hard for the application to observe this fact, in which case this MR still stands. And I want to argue for its inclusion.
1. In general, it's a good idea to avoid nested locks. Of course, we should avoid holding grabber locks while calling `OnProcessSample`, but: 2. That fix is less universal then this one. The problem arise from the lock order being reversed during shutdown. So there could be more potential deadlocks on top of this engine/grabber deadlock. This MR would fix all of them.
I would also open an MR to do what you suggested, but I think this MR should be merged as well.
On Fri Nov 3 18:09:59 2023 +0000, Yuxuan Shui wrote:
So, I tried, and it's difficult to validate if engine is locked or not during session shutdown, as there isn't many opportunity to hook into the engine. But that also mean it's hard for the application to observe this fact, in which case this MR still stands. And I want to argue for its inclusion.
- In general, it's a good idea to avoid nested locks. Of course, we
should avoid holding grabber locks while calling `OnProcessSample`, but: 2. That fix is less universal then this one. The problem arise from the lock order being reversed during shutdown. So there could be more potential deadlocks on top of this engine/grabber deadlock. This MR would fix all of them. I would also open an MR to do what you suggested, but I think this MR should be merged as well.
Adding to point (2), even if we can fix this now by not holding grabber lock while calling `OnProcessSample`, we might introduce more deadlocks of the same kind in the future. This MR is more future proof in that sense.
On Fri Nov 3 18:14:45 2023 +0000, Yuxuan Shui wrote:
Adding to point (2), even if we can fix this now by not holding grabber lock while calling `OnProcessSample`, we might introduce more deadlocks of the same kind in the future. This MR is more future proof in that sense.
I was able to block engine shutdown by injecting a media source via `IMFMediaEngineExtension`. While the engine shutdown is blocked, I was able to call engine shutdown again from a different thread, and the second shutdown returns `MF_E_SHUTDOWN` immediately.
This mean either engine lock is not held; or engine shutdown flag is not protected by a lock, maybe an atomic. I think this is more evidence this MR should be accepted.