https://bugs.winehq.org/show_bug.cgi?id=50779
Bug ID: 50779 Summary: winegstreamer is leaking something Product: Wine Version: unspecified Hardware: x86-64 OS: Linux Status: UNCONFIRMED Severity: normal Priority: P2 Component: winegstreamer Assignee: wine-bugs@winehq.org Reporter: galtgendo@o2.pl Distribution: ---
A rough summary of the problem and things said on chat.
Awhile ago a thing, that had been working reasonably well for quite a few years (with exception of a short period when code changes made the music stop looping), has began to randomly freeze to a point that you'd needed to wait for wm's kill dialog to appear after trying to close the main window.
The messages in the console were saying something about being unable to create thread due to running out of resources. The freezes usually took a quite a bit of time, but didn't seem to have any obvious trigger.
As far as free memory went, the situation was still fine at such time, but zf suggested it was an address space problem. Forcibly setting the executable to LARGE_ADDRESS_AWARE gave it more running time, but eventually seem to have resulted in a whole system freeze.
As such, I was once put into realm of wild ass guesses and even crazier attempted solutions.
So, I took a gamble and removed gst_object_ref calls from mpeg_audio_parser_init_gst and wave_parser_init_gst first.
This didn't seem to have any effect.
Then I've removed the last gst_object_ref - one in pad_added_cb.
This resulted in a flood of 'gst_object_unref: assertion '((GObject *) object)->ref_count > 0' failed', yet after running for quite awhile the freeze is yet to happen.
Now, this is far from conclusive, as we're trying to literally prove a negative here, but it would align well with the fact that free memory used at the time of freeze isn't all that significant.
I'm not sure when the problem began. I think it was somewhere between 5.22 and 6.1 (though only the upper bound being definite).
https://bugs.winehq.org/show_bug.cgi?id=50779
Zebediah Figura z.figura12@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |z.figura12@gmail.com
--- Comment #1 from Zebediah Figura z.figura12@gmail.com --- Can you please give the actual name of the program you are running?
https://bugs.winehq.org/show_bug.cgi?id=50779
--- Comment #2 from Rafał Mużyło galtgendo@o2.pl --- Next update.
So, unfortunately, it seems to take awhile longer, but it still hangs.
While there still might be a valid part in that approach, as gst_element_get_static_pad implicitly refs (that's the way I read that part of api docs), the main problem seems to lie elsewhere.
Now, another WAG, but this time I have no idea where to go with it.
glib is linked with nodelete. winegstreamer recently underwent PE porting. winegstreamer is loaded via that registry guid mechanism and it happens on pretty much every sound effect (likely each bgm too, but that changes far less often).
Could it be that it's being loaded and mapped for every thread, but due to nodelete resources can't be freed after such mapping ?
https://bugs.winehq.org/show_bug.cgi?id=50779
Zebediah Figura z.figura12@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Summary|winegstreamer is leaking |An unidentified program |something |freezes after an | |unspecified period of time CC|z.figura12@gmail.com | Component|winegstreamer |-unknown Version|unspecified |6.3
https://bugs.winehq.org/show_bug.cgi?id=50779
--- Comment #3 from Rafał Mużyło galtgendo@o2.pl --- (below is just me thinking out loud)
So, initially as I began examining this problem, I suspected it was a matter of a change in code that introduced a leak of gstreamer objects. While it still seems explicit refs after gst_element_get_static_pad are mistakes, this doesn't seem to the case.
Now, after finally noticing 'winegstreamer: Manage our own streaming thread' commit, I'm trying to approach it from a different angle.
It would seem threads themselves are being leaked, more exactly they aren't terminated properly and end up waiting forever. Though AFAICT it's something racy - it seems to be tied to playback being terminated early, but doesn't seem to on every single such event (at this point I wouldn't be surprised if internal disk caching played a role).
By 'terminated early', I mean the leak *seems* to happen when a scene change occurs faster than a sound effect is either played or fully read. That is I suspect it happens most often when a sound effect that's tied to loading a save that seems to be slightly longer than the ones usually used in RPGMaker VXAce is being terminated early once the load is finished. Other case seem to be another ogg file with the sound effect for the cursor movement - as it's played on every key press in such menus, it might be that every once in awhile it's being cut short by its next instance. Again, leak doesn't happen every time.
The engine notes in its docs (googleable) that it applies some 'filtering' if the same sound effect is being played in a very short period...whatever that's supposed to mean in this context.
https://bugs.winehq.org/show_bug.cgi?id=50779
Rafał Mużyło galtgendo@o2.pl changed:
What |Removed |Added ---------------------------------------------------------------------------- Version|6.3 |6.1
https://bugs.winehq.org/show_bug.cgi?id=50779
--- Comment #4 from Rafał Mużyło galtgendo@o2.pl --- So, either I need to slightly correct myself or there's an additional element in play, that I fail to identify.
Turning off winegstreamer via WINEDLLOVERRIDES does mitigate the damage, but doesn't 100% prevent the freeze. That is it slows down the growth of the number of threads used, but it still keeps growing.
Now, the problem with that potential additional element is following: while WINEDLLOVERRIDES seems to affect at least some of the sound effects, it seems to be just wav files. But for whatever the reason, it seems to not affect ogg files at all, they seem to go through dsound.dll DSound8 interface and are being played even when winegstreamer is disabled...and I have no idea how.
https://bugs.winehq.org/show_bug.cgi?id=50779
--- Comment #5 from Rafał Mużyło galtgendo@o2.pl --- Well, then...
So, looking a bit deeper into things, I must once again admit I hate all the obfuscators.
In this particular case, executable compressors.
So, main lib does carry some code to read ogg files.
Though, given that the problem happens even with winegstreamer disabled, it just means the problem happens somewhere deeper in the stack. As of now, I've got no idea where.
It's still is related to playback of those files...
https://bugs.winehq.org/show_bug.cgi?id=50779
joaopa jeremielapuree@yahoo.fr changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |jeremielapuree@yahoo.fr
--- Comment #6 from joaopa jeremielapuree@yahoo.fr --- What is the name of the application. Does there exist a freely downloadable version?
https://bugs.winehq.org/show_bug.cgi?id=50779
--- Comment #7 from Rafał Mużyło galtgendo@o2.pl --- As I've said, this is about RPGMaker VXAce engine games.
Pretty much all games are affected, but it takes a special one to get to the point of freeze - it needs to be using a lot of extra sound effects (as it seems each *unique* sound effect gets its own thread).
Note: in VXAce (and likely other two ruby makers) filename is (in a way) unique sound effect identifier (well, it's a bit more complicated, but effectively close to that).
Those threads are being released every now and then (not sure about exact condition - seems to be an internal engine implementation detail), but a game using a lot of them is able to overwhelm wine now and run out of address space.
https://bugs.winehq.org/show_bug.cgi?id=50779
--- Comment #8 from Rafał Mużyło galtgendo@o2.pl --- ...need to correct myself again.
If every sound effect got its own thread, it might have still worked fine, but what seems to be happening is the opposite - every instance of a sound effect gets a thread.
It doesn't seem to effect bgm though...
https://bugs.winehq.org/show_bug.cgi?id=50779
--- Comment #9 from Rafał Mużyło galtgendo@o2.pl --- So,...
I've just stumbled upon this problem in a game using the older version of this engine - RPGMaker XP (this is still basically embedded ruby interpreter + a couple static libs, just an older version of all those).
Definitely not amused.
https://bugs.winehq.org/show_bug.cgi?id=50779
--- Comment #10 from Rafał Mużyło galtgendo@o2.pl --- All those extra threads seem to come from unix side, at least WINEDEBUG=thread seems silent.
In GST_DEBUG shortly before freeze, there are beauties like:
4786:gst_decode_bin_expose:decodebin36:src_0 added new decoded pad
That's quite high, given that I doubt there are more than 8 streams at any given time.
Yet the point remains that disabling winegstreamer doesn't fully cover the problem, so it seems to happen deeper...
Yet, my best other guess - amstream - doesn't print anything obvious and mmdevapi looks clean.
https://bugs.winehq.org/show_bug.cgi?id=50779
--- Comment #11 from Rafał Mużyło galtgendo@o2.pl --- Created attachment 69912 --> https://bugs.winehq.org/attachment.cgi?id=69912 output of 'thread apply all bt' during a freeze
There's likely something of use in this backtrace.
Haven't fully read it yet. What can I say, it's a lot of threads.
https://bugs.winehq.org/show_bug.cgi?id=50779
Rafał Mużyło galtgendo@o2.pl changed:
What |Removed |Added ---------------------------------------------------------------------------- Component|-unknown |quartz
--- Comment #12 from Rafał Mużyło galtgendo@o2.pl --- so, we're finally moving forward
After overriding quartz to native (by putting it in syswow64 - I just *love* that 'load by GUID' scheme), number of threads still goes into 150+, which I strongly suspect isn't proper behavior, but it doesn't seem to freeze anymore.
That quite a few commits between early November and end of January, but at least now I've got a decent target.
https://bugs.winehq.org/show_bug.cgi?id=50779
--- Comment #13 from Rafał Mużyło galtgendo@o2.pl --- ...another bust...
Using native quartz does delay the freeze by a factor of at least 6-8, but it still happens.
...I need a better idea...
https://bugs.winehq.org/show_bug.cgi?id=50779
--- Comment #14 from Rafał Mużyło galtgendo@o2.pl --- ...my current guess goes toward strmbase, but while that would explain why both quartz and winegstreamer are affected, this seems to be something about what it fails to do, as opposed to something it does...
That is it seems to be a case where for one reason or another a call to free/unlock something isn't being done. While there weren't that many commits, the code shifts were significant...
Also, that code feels quite convoluted to me. It might need to be, but it makes it difficult to follow and comprehend.
To make things more difficult, BaseOutputPinImpl_GetDeliveryBuffer is spamming the console after activating that debug channel.
https://bugs.winehq.org/show_bug.cgi?id=50779
--- Comment #15 from Rafał Mużyło galtgendo@o2.pl --- When using 'strmbase' debug channel, there's close to 70 unique filters that end up at refcount 3 at freeze point.
That is last entry in the log for them is:
trace:strmbase:filter_inner_Release <number> decreasing refcount to 3
A few go higher, but overwhelming majority goes down to 3.
That's still a lot, given that from what I know of the engine, sound effects only have .play and .stop methods (bgm and music effects also have .fade), with .stop rarely ever invoked. So, I'd suspect cleanup should happen as soon as the file ends.
https://bugs.winehq.org/show_bug.cgi?id=50779
--- Comment #16 from Rafał Mużyło galtgendo@o2.pl --- For awhile now, I've been trying to reproduce the problem in random other games of that engine.
Given that while the engines are quite flexible, most authors don't really bother making use of that flexibility, that shouldn't have been difficult...
Yet...
That game for RPM XP can be hung on the title screen simply by pressing down enough times. Yet if I substitute both bgm for title screen and the sound effect for the cursor in a different game of the same engine, only thing I've been accomplish was learn that internally the engine caches 32 sound effects, as after that thread count stops growing, yet is able to reach a higher count than in the freezing case.
I'm beginning to suspect that the actual problem actually is the thing on the label - due to the changes made during recent rewrite in winegstreamer we're simply running out of virtual address space.
That is to say, it's just weird, that in one case I can reach 196 threads and be fine and in the other freeze at around 150 while doing the same thing. I suspect that's because in the freezing case other stuff already fills some of that space.
https://bugs.winehq.org/show_bug.cgi?id=50779
Rafał Mużyło galtgendo@o2.pl changed:
What |Removed |Added ---------------------------------------------------------------------------- Summary|An unidentified program |thread changes in |freezes after an |winegstreamer lead to |unspecified period of time |increased virtual address | |space consumption which may | |lead to permanent freezes Component|quartz |winegstreamer Keywords| |regression
https://bugs.winehq.org/show_bug.cgi?id=50779
--- Comment #17 from Rafał Mużyło galtgendo@o2.pl --- ...so, finally I've stumbled upon some other game, not using any of ruby maker engines, that actually seems to crash due to the address exhaustion.
This one, going by all xnb files used, is using MonoGame.
https://bugs.winehq.org/show_bug.cgi?id=50779
--- Comment #18 from Rafał Mużyło galtgendo@o2.pl --- ...
https://gitlab.freedesktop.org/pulseaudio/pulseaudio/-/merge_requests/448 applied to dlls/winepulse.drv/pulse.c helps with ruby makers (though not with MonoGame).
This is actually *not* a good thing - it only stresses that now additional tinkering is required.
https://bugs.winehq.org/show_bug.cgi?id=50779
Zeb Figura z.figura12@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Component|winegstreamer |-unknown
https://bugs.winehq.org/show_bug.cgi?id=50779
--- Comment #19 from Rafał Mużyło galtgendo@o2.pl --- Really, Zeb...
The regression *is* pinned down to winegstreamer changes that went into 6.1.
Unless your changes have uncovered a problem elsewhere, 'winegstreamer' had been the correct component.
https://bugs.winehq.org/show_bug.cgi?id=50779
Zeb Figura z.figura12@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |z.figura12@gmail.com
--- Comment #20 from Zeb Figura z.figura12@gmail.com --- (In reply to Rafał Mużyło from comment #19)
Really, Zeb...
The regression *is* pinned down to winegstreamer changes that went into 6.1.
Unless your changes have uncovered a problem elsewhere, 'winegstreamer' had been the correct component.
I haven't seen a regression SHA1; did I miss one?
https://bugs.winehq.org/show_bug.cgi?id=50779
--- Comment #21 from Rafał Mużyło galtgendo@o2.pl --- SHA1 of the regression would require a bisect, which (for reasons) I haven't done.
What I did do (at least AFAIR - it's been over 1.5 years since already), was establish it happened at 6.1 exactly, then comb it for sound changes (as the bug was triggered upon sound effects playing), then with your help notice it was virtual mapping space exhaustion triggered by spawning multiple threads for each sound effect. That pretty much pointed towards the changeset containing 'winegstreamer: Manage our own streaming thread'. As that changeset was too complex to revert bit by bit (at least at my skill level), I've pretty much stopped there.
Well, not so much stopped as that anything I've tried with adding debug lines and tinkering with little pieces afterwards didn't affect the bug. As the content of this bug is a bit confusing, to summarize in the game I could trigger the problem in the easiest way, all I needed to do was repeatedly press down in the title screen menu; on each press a sound effect from an mp3 file was being played; address exhaustion always managed to hit before engine's internal caching limit; changing that file to an ogg stopped the bug from triggering, as the engine embeds some ogg handling library, bypassing winegstreamer.
Given that patching wine to make pulseaudio *not* use shm makes a difference, it's likely cause something like each effect having its own stream, instead of putting them all into one (...OK, that's extremely badly phrased, but you should get the gist of it).
https://bugs.winehq.org/show_bug.cgi?id=50779
--- Comment #22 from Zeb Figura z.figura12@gmail.com --- (In reply to Rafał Mużyło from comment #21)
SHA1 of the regression would require a bisect, which (for reasons) I haven't done.
What I did do (at least AFAIR - it's been over 1.5 years since already), was establish it happened at 6.1 exactly, then comb it for sound changes (as the bug was triggered upon sound effects playing), then with your help notice it was virtual mapping space exhaustion triggered by spawning multiple threads for each sound effect. That pretty much pointed towards the changeset containing 'winegstreamer: Manage our own streaming thread'. As that changeset was too complex to revert bit by bit (at least at my skill level), I've pretty much stopped there.
So you haven't actually proven anything. You made a guess, then didn't try to verify it at all.
If you're confident in that guess, you don't even need to perform a full bisect. All you need to do is check out that commit, test that you can reproduce the issue, then check out the previous commit and test that you can't. What part of this is infeasible?
https://bugs.winehq.org/show_bug.cgi?id=50779
--- Comment #23 from Rafał Mużyło galtgendo@o2.pl ---
What part of this is infeasible?
Short version: fuck you, that's why.
Longer variant:
1. keeping wine git around is quite annoying when you're not working with it constantly 2. if you need some patches on top of the tarball/git tree, bisect becomes far less automatic 3. I'm not sure I'd be able to pull off non-mingw wine build outside portage and nearly sure I'd botch a mingw one 4. there might be other issues (more on that later)
So as I've said, for reasons.
Having said that, fortunately I've still had the old ebuild for 6.1 and its patches. With some elbow grease and both live ebuild framework and the ebuilds for additional wine variants, I've managed to put together something that won't inconvenience me *too* much.
So, those other issues... In this instance, I needed a far later 'msvcrt: Add sincos to importlib' commit to get things to build at all.
Then I've started looking for the commits behind the problem...and it's a bit more complicated than I thought.
I was right about the patchset, if not about particular commit.
With wine 6.0, VM consumption starts a bit below 3300M, then with the test procedure mentioned earlier rises quite quickly to 3500M, then much more slowly and quite randomly might reach 3700M.
This holds true till 'winegstreamer: Keep the stream in paused state for its entire lifetime'. Now, it rises quickly (seemingly faster than before) to nearly 4000M, but it may take awhile till it goes over and triggers a freeze. So I was kind of right about things getting stuck somewhere - they were, deliberately so.
The process is finalized with 'winegstreamer: Manage our own thread for read requests'. Now, within less than 20 seconds, the procedure leads to a freeze. Comment on that commit sounds a bit ironic in this context.
https://bugs.winehq.org/show_bug.cgi?id=50779
--- Comment #24 from Rafał Mużyło galtgendo@o2.pl --- ...and on unrelated note: it seems the pulseaudio no-shm hack stopped being a working workaround.
https://bugs.winehq.org/show_bug.cgi?id=50779
Zeb Figura z.figura12@gmail.com changed:
What |Removed |Added ---------------------------------------------------------------------------- Regression SHA1| |90107bba944a1da286e9ebd246a | |b3fd5dd7cb115 Component|-unknown |winegstreamer
--- Comment #25 from Zeb Figura z.figura12@gmail.com --- (In reply to Rafał Mużyło from comment #23)
What part of this is infeasible?
Short version: fuck you, that's why.
Rafał, I'm sorry to be short with you.
On the other hand, it's hard to have much patience when you waste people's time like this. This is a recurring pattern: you make guesses and assumptions with little relevance to the code, which often turn out to be wrong; you ask people directed questions about these guesses which end up having nothing to do with your application; you refuse or ignore requests for bisects, logs, backtraces, or even the name of the application that you're reporting bugs against.
Longer variant:
- keeping wine git around is quite annoying when you're not working with it
constantly 2. if you need some patches on top of the tarball/git tree, bisect becomes far less automatic
So don't keep it around? Cloning isn't instant, sure, but if you're concerned about space it seems like a worthwhile tradeoff.
Admittedly this also didn't occur to me since earlier comments here (comment 18) had implied you already had a wine tree available.
I understand that bisecting takes a lot of time; I don't really want to fault anyone for not being able to do a bisect. But for someone who's already built wine, I feel like asking for one build is not unreasonable.
Especially when the implicit expectation, with this kind of bug report, seems to be "please examine all of the winegstreamer commits between 6.0 and 6.1 for errors". That's not a reasonable request for a developer.
Examining a single commit for errors is a lot closer to feasible. Examining a single commit, with appropriate debug logs, is solidly within the realm of feasibility. Being able to test the program myself all but assures it'll get fixed.
- I'm not sure I'd be able to pull off non-mingw wine build outside portage
and nearly sure I'd botch a mingw one
A quick Internet search implies that Gentoo ships a mingw-w64 toolchain. All that should be necessary to build Wine is to have that toolchain and compiler installed and available; our configure script will do the rest. I haven't used Gentoo, and maybe there are bugs that cause it to be broken in subtle ways, but it would be a lot more productive for me to spend that effort on helping to fix those bugs so that Gentoo users can effectively test wine.
With wine 6.0, VM consumption starts a bit below 3300M, then with the test procedure mentioned earlier rises quite quickly to 3500M, then much more slowly and quite randomly might reach 3700M.
This holds true till 'winegstreamer: Keep the stream in paused state for its entire lifetime'. Now, it rises quickly (seemingly faster than before) to nearly 4000M, but it may take awhile till it goes over and triggers a freeze. So I was kind of right about things getting stuck somewhere - they were, deliberately so.
The process is finalized with 'winegstreamer: Manage our own thread for read requests'. Now, within less than 20 seconds, the procedure leads to a freeze. Comment on that commit sounds a bit ironic in this context.
Thank you, that's useful information. Can you please attach a log with WINEDEBUG=+quartz,+timestamp,+seh,+pid GST_DEBUG=6 ?
https://bugs.winehq.org/show_bug.cgi?id=50779
--- Comment #26 from Rafał Mużyło galtgendo@o2.pl --- Created attachment 73210 --> https://bugs.winehq.org/attachment.cgi?id=73210 log in wine 7.18
Hmm, you know, part of the problem might just be on the box. Cause I might be barking up the wrong tree, but glib's "-Wl,-z,nodelete" might play some role here.
Anyway, the log even compressed almost hits the size limit. Also, no 'seh' entries within. Freeze point is 'failed to create thread'.
https://bugs.winehq.org/show_bug.cgi?id=50779
--- Comment #27 from Zeb Figura z.figura12@gmail.com --- (In reply to Rafał Mużyło from comment #26)
Created attachment 73210 [details] log in wine 7.18
Hmm, you know, part of the problem might just be on the box. Cause I might be barking up the wrong tree, but glib's "-Wl,-z,nodelete" might play some role here.
Thanks. I don't think nodelete matters, because that'll only add more VM usage once, rather than every time.
Rather, it looks like the application is just creating a new filter graph every time it plays a given sound, and then never destroying it. It does stop the filter graph after the sound is done, though. This is consistent with 90107bba making things worse; previously we would essentially clean up all the GStreamer resources by also resetting the GStreamer filter to the null state. It's also consistent with 3b885fb5f making things worse, since that's another thread we create for every filter graph.
Testing shows that Windows doesn't even leak the filter graph's message thread (rather, it seems to be shared among all filter graphs). So while it's possible that we're doing something wrong (missing graph events?) and that the game is supposed to be cleaning up its filter graphs, my guess is that we really do need to avoid creating any extra threads. That will be annoying.
https://bugs.winehq.org/show_bug.cgi?id=50779
--- Comment #28 from Rafał Mużyło galtgendo@o2.pl --- I wouldn't say 'always' and 'never'.
As the log suggests, this was tested with a simple sounds effect - essentially a short beep. Engine caches only limited number of them - as I've said in earlier comments, in other games of that engine (even one affected by this bug, just not brutally so), the growth stops at (IIRC) 32 cached sound effects, then they get reused. It's simply that in the brutal example, game runs out of vm before it reaches that point - though the usage is still high enough to risk a random freeze later of a random reason. Given the time taken, some of the freezes with first commit only might have been of this nature.
https://bugs.winehq.org/show_bug.cgi?id=50779
--- Comment #29 from Zeb Figura z.figura12@gmail.com --- https://gitlab.winehq.org/wine/wine/-/merge_requests/1424/diffs helps share the message thread, although I don't remember if that was even a problem here (but my guess is yes, it's kind of the default behaviour.)
We could probably do something similar for the read thread? Or at least throw that into a thread pool.
The streaming threads are already created on pause/stop.
That leaves GStreamer's streaming threads. The problem is that in order to get rid of those, we have to start and stop the filter, and that's actually expensive (not so much because typefinding is expensive, but because prerolling is.) That was one of the motivations for 90107bba9, besides the code cleanup. I don't think there's any way to have it both ways, not within the design of GStreamer. (IIRC typefind doesn't do it since it won't demux.) Maybe we can get away with just letting WoW64 support handle this? It's still not great, not all of that memory is going to be unused, but maybe it's enough, and this is definitely the less common case anyway.
https://bugs.winehq.org/show_bug.cgi?id=50779
--- Comment #30 from Rafał Mużyło galtgendo@o2.pl --- So, did the situation improved with wine 7.22 ?
Well, kind of...
We seem to be back to the situation after the first commit. I don't seem to be able to reliably be able to hang it under 20s, but vm usage still runs close to 4G, so we're essentially one bad mapping away from that hang.
Still, progress ?
https://bugs.winehq.org/show_bug.cgi?id=50779
Julian Rüger jr98@gmx.net changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |jr98@gmx.net