https://bugs.winehq.org/show_bug.cgi?id=49590
Bug ID: 49590
Summary: Battle.net Agent.exe hang/crash
Product: Wine-staging
Version: 5.13
Hardware: x86-64
URL: https://www.blizzard.com/apps/battle.net/desktop
OS: Linux
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: -unknown
Assignee: wine-bugs@winehq.org
Reporter: maciej.stanczew+b@gmail.com
CC: leslie_alistair@hotmail.com, z.figura12@gmail.com
Distribution: ArchLinux

On Staging 5.13, when using Battle.net App, its Agent.exe process will often (not always) misbehave. Example behavior:
- Crashes on launch -- can be verified by processes dying and new ones being spawned, and by an empty "Crash.txt" file in 'drive_c/ProgramData/Battle.net/Agent/Agent.<version>/Errors';
- Hangs using 100% CPU and doesn't exit when Battle.net App is closed;
- Blocks launching of games; for example, when launching Diablo III, I see 'Diablo' in the process list, but the game won't actually launch until I kill Agent.exe (which at that point is hung at 100% CPU).
Sometimes error message BLZBNTBNA00000005 will be shown in Battle.net App, which is described as: "The Blizzard Battle.net desktop app failed to communicate with the Blizzard Update Agent, which is required to install, update, launch, and uninstall Blizzard games." https://battle.net/support/en/article/16531
This happens with neither Staging 5.12 nor vanilla Wine 5.13. With those versions, a single Agent.exe lives alongside Battle.net App, doesn't hang, and exits when Battle.net App is closed. I'm not able to check how it cooperates with games on those versions because of bug 45349 and bug 42741.
Since Agent.exe is spawned by Battle.net App, it's difficult to get Wine logs for its execution. If I launch Agent.exe manually (without Battle.net App), it seems to not hang/crash.
I only managed to get two exceptions while Battle.net App was running. The first appeared when I manually launched Agent.exe after the previous instance had died:
07c0:err:seh:NtRaiseException Unhandled exception code c0000005 flags 0 addr 0x7bc265b8
I'm not even sure how I got the second one, as it happened only once and I've been unable to reproduce it -- but hey, it might be useful:
01f8:err:virtual:virtual_setup_exception stack overflow 976 bytes in thread 01f8 addr 0xf7a91c2e stack 0x220c30 (0x220000-0x221000-0x320000)
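As a reading aid for the stack overflow line, here is a minimal Python sketch that pulls the numbers out of it. It assumes the three addresses in parentheses are, in order, the bottom of the reserved stack, the current stack limit, and the stack base; that interpretation is an assumption, not something the log itself states. `parse_overflow` is a hypothetical helper name.

```python
import re

# The log line exactly as quoted in this report.
LINE = ("01f8:err:virtual:virtual_setup_exception stack overflow 976 bytes "
        "in thread 01f8 addr 0xf7a91c2e stack 0x220c30 "
        "(0x220000-0x221000-0x320000)")

PATTERN = re.compile(
    r"stack overflow (?P<bytes>\d+) bytes .* stack (?P<sp>0x[0-9a-f]+) "
    r"\((?P<low>0x[0-9a-f]+)-(?P<limit>0x[0-9a-f]+)-(?P<base>0x[0-9a-f]+)\)"
)

def parse_overflow(line):
    """Extract the numeric fields from a 'stack overflow' log line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a stack overflow line")
    return {
        "bytes": int(match["bytes"]),
        "sp": int(match["sp"], 16),
        # Assumed meaning of the three parenthesised addresses:
        "low": int(match["low"], 16),
        "limit": int(match["limit"], 16),
        "base": int(match["base"], 16),
    }

info = parse_overflow(LINE)
print(hex(info["limit"] - info["sp"]), info["limit"] - info["sp"])
```

Under that reading, the reported 976 bytes is exactly the distance between the stack pointer and the middle address (0x221000 - 0x220c30 = 0x3d0), and the stack pointer sits in the lowest page of the range.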
I'll try to do a bisection next, to find where in Staging 5.13 the problem was introduced/uncovered.
--- Comment #1 from Maciej Stanczew maciej.stanczew+b@gmail.com ---
Created attachment 67765 --> https://bugs.winehq.org/attachment.cgi?id=67765
Logs
Results are in:
f6954e6e77dfd443f5bdc28190ad478e0d6fb77d is the first bad commit
commit f6954e6e77dfd443f5bdc28190ad478e0d6fb77d
Author: Zebediah Figura z.figura12@gmail.com
Date: Wed Jul 8 20:46:51 2020 -0500

    Rebase against 262e4ab9e0eeb126dde5cb4cba13fbf7f1d1cef0.

For reproduction with logs, I have to kill the Agent.exe process that was spawned by Battle.net, and then quickly launch Agent.exe manually (before Battle.net spawns another one automatically). Repro is sporadic, but I'm able to get it eventually if I try enough times (10+).
wine-5.12-64-ge0e3b6bc91 (Staging) [553c1cff]:
- No crashes in 20 manual retries
- 'Errors' directory in ProgramData is empty

wine-5.12-97-g262e4ab9e0 (Staging) [f6954e6e]:
- 4 crashes in 20 manual retries, plus 2 crashes of automatically spawned Agent
- 'Errors' directory already gets one entry just after starting Battle.net
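To put the retry counts in perspective, here is a rough calculation. It assumes each manual launch crashes independently with the 4-in-20 rate seen on the bad build; that is an assumption, since the real crash probability is unknown. The function names are hypothetical.

```python
import math

def prob_no_crash(crash_rate, attempts):
    """Chance of seeing zero crashes in `attempts` independent launches."""
    return (1.0 - crash_rate) ** attempts

def attempts_needed(crash_rate, confidence):
    """Smallest number of clean launches that rules out `crash_rate`
    with the given confidence (for example 0.95)."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - crash_rate))

rate = 4 / 20  # observed on wine-5.12-97-g262e4ab9e0 (Staging)
print(round(prob_no_crash(rate, 20), 4))
print(attempts_needed(rate, 0.95))
```

Under these assumptions, 20 clean retries would occur only about 1.2% of the time on a build that crashed at the same rate, and roughly 14 clean attempts are needed before a "no crash" verdict is 95% convincing.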
I have attached logs with +seh,+ntdll,+relay,+timestamp from f6954e6e and from Staging 5.13, and also some Agent crash reports. All builds tested are with PE support.
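For anyone wanting to collect comparable logs, the channel list above maps onto the WINEDEBUG environment variable. A minimal Python sketch of building that environment follows; `wine_debug_env` is a hypothetical helper, and the commented launch line uses placeholder paths rather than anything taken from this report.

```python
import os

def wine_debug_env(channels, base_env=None):
    """Return a copy of the environment with WINEDEBUG enabling `channels`."""
    env = dict(os.environ if base_env is None else base_env)
    env["WINEDEBUG"] = ",".join("+" + name for name in channels)
    return env

env = wine_debug_env(["seh", "ntdll", "relay", "timestamp"])
print(env["WINEDEBUG"])  # +seh,+ntdll,+relay,+timestamp

# Hypothetical usage (placeholder paths, requires Wine to be installed):
# import subprocess
# with open("agent.log", "wb") as log:
#     subprocess.run(["wine", "Agent.exe"], env=env, stderr=log)
```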
Maciej Stanczew maciej.stanczew+b@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
    Regression SHA1|                            |f6954e6e77dfd443f5bdc28190ad478e0d6fb77d
i.Dark_Templar idarktemplar@mail.ru changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |idarktemplar@mail.ru
--- Comment #2 from i.Dark_Templar idarktemplar@mail.ru ---
Created attachment 67771 --> https://bugs.winehq.org/attachment.cgi?id=67771
battle.net error screenshot.png
This or a similar issue appeared for me as well after I upgraded to wine-staging 5.13 with PE modules support.
The issue does not reproduce in wine-staging 5.11, either with or without PE modules support.
For me it happens soon after launching a game from Battle.net, for example Starcraft II. Each time the message box is displayed, it brings the Battle.net application's window to the foreground.
--- Comment #3 from i.Dark_Templar idarktemplar@mail.ru ---
(In reply to Maciej Stanczew from comment #1)
> Results are in:
>
> f6954e6e77dfd443f5bdc28190ad478e0d6fb77d is the first bad commit
> commit f6954e6e77dfd443f5bdc28190ad478e0d6fb77d
> Author: Zebediah Figura z.figura12@gmail.com
> Date: Wed Jul 8 20:46:51 2020 -0500
>
>     Rebase against 262e4ab9e0eeb126dde5cb4cba13fbf7f1d1cef0.

I've looked into this commit a bit. It removes the winebuild-Fake_Dlls patchset for wine-staging 5.12. Later, a different implementation, the winebuild-pe_syscall_thunks patchset, was added for wine-staging 5.13. I think the issue might be in that patchset, which would make it a regression in wine-staging 5.13 compared to wine-staging 5.11. But I might be wrong.
--- Comment #4 from Maciej Stanczew maciej.stanczew+b@gmail.com ---
Created attachment 67772 --> https://bugs.winehq.org/attachment.cgi?id=67772
Logs non-staging
(In reply to i.Dark_Templar from comment #2)
> For me it happens soon after launching a game from Battle.net, for example Starcraft II. Each time the message box is displayed, it brings the Battle.net application's window to the foreground.

It looks very similar to my case, probably it's the same issue.
(In reply to i.Dark_Templar from comment #3)
> I've looked into this commit a bit. It removes the winebuild-Fake_Dlls patchset for wine-staging 5.12. Later, a different implementation, the winebuild-pe_syscall_thunks patchset, was added for wine-staging 5.13. I think the issue might be in that patchset, which would make it a regression in wine-staging 5.13 compared to wine-staging 5.11. But I might be wrong.

These were also my first suspicions, but they turned out to be false, or at least incomplete.
I was able to reproduce the issue on plain Wine 5.13. Logs attached. I have done further bisection of upstream commits, and got this as the commit that introduces the regression:
82cd85b07918a4437428497ffaf7f13286b83479 is the first bad commit
commit 82cd85b07918a4437428497ffaf7f13286b83479
Author: Zebediah Figura z.figura12@gmail.com
Date: Tue Jul 7 18:58:34 2020 -0500

    ntdll: Set the thread creation time in NtQuerySystemInformation(SystemProcessInformation).

    Process Hacker displays this information.

With this commit reverted from Staging 5.13:
- I can use Battle.net App and launch games without any popups appearing about Agent crashing;
- I can't reproduce the exception anymore by killing and manually launching Agent.exe.

However, the automatically spawned Agent still sometimes crashes -- I can see it happening in the process list, and by entries in the 'Errors' directory. It seems to happen most often when launching games: Agent will crash, a new one will spawn, and eventually one will "stick" and things will proceed. I don't see any functional impact -- no popups or game hangs. I didn't see any crashes in Staging 5.10. Unfortunately, because of the gap between the disabling of Fake_Dlls and the introduction of pe_syscall_thunks, there's no way to do a bisect that launches games to trigger this crash.
After all the testing and reverting I'm starting to think there might be multiple issues leading to Agent crashing. But the one this bug was initially about seems to be related to 82cd85b07918a4437428497ffaf7f13286b83479.
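Because the reproduction is sporadic, a bisection step can only call a commit good after several clean attempts. Here is a minimal Python sketch of that idea. `first_bad` and `crashes_once` are hypothetical names, the commit list is assumed to start at a known-good commit and end at a known-bad one, and this is an illustration rather than a description of how `git bisect` is implemented.

```python
def first_bad(commits, crashes_once, retries=20):
    """Binary-search `commits` for the first one where the crash shows up.

    `commits[0]` is assumed good and `commits[-1]` assumed bad.
    `crashes_once(commit)` performs one reproduction attempt and returns
    True if the crash was observed.
    """
    def is_bad(commit):
        # A single crash is enough to call the commit bad; calling it good
        # takes `retries` clean attempts in a row.
        return any(crashes_once(commit) for _ in range(retries))

    lo, hi = 0, len(commits) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]

# Toy run with a deterministic stand-in for the real reproduction step.
history = list(range(10))
print(first_bad(history, lambda commit: commit >= 6))
```

A commit that crashes only rarely can still pass all retries and be judged good, in which case the search lands on a later commit than the one that really introduced the problem.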
Maciej Stanczew maciej.stanczew+b@gmail.com changed:
           What    |Removed                                  |Added
----------------------------------------------------------------------------
            Product|Wine-staging                             |Wine
          Component|-unknown                                 |ntdll
    Regression SHA1|f6954e6e77dfd443f5bdc28190ad478e0d6fb77d |82cd85b07918a4437428497ffaf7f13286b83479
Paul Gofman pgofman@codeweavers.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pgofman@codeweavers.com
--- Comment #5 from Paul Gofman pgofman@codeweavers.com ---
The blamed commit is misleading, I suggest removing it from Regression SHA1 field. Bisect showed that because it stopped working after that one, but the crash present now is not related. As far as my testing goes so far, the reintroduced syscall thunks patchset is also not at fault.
I could reproduce crashes in Agent.exe with the latest Staging and Starcraft. It looks like some memory overwrite issue. WINEDEBUG=warn+heap shows tail overwrites, and the crashes are always in ntdll heap allocation / free functions, which clearly suggests that heap control data is smashed. Can you try Staging without ntdll-Heap_Improvements patchset (staging/patchinstall.py --all -W ntdll-Heap_Improvements). That was fixing the issue for me, would be interesting to confirm if that is the same issue I am seeing.
It is not very likely that ntdll-Heap_Improvements is at fault per se; it just introduces a different layout of the memory control structures, which appears to be more vulnerable.
It is yet to be verified if the memory smash is solely due to Agent code or maybe imposed by something in Wine.
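For readers unfamiliar with the term, a "tail overwrite" means a write ran past the end of an allocated block into the guard bytes placed after it. The toy Python model below illustrates the check only; it is not Wine's heap code, and the filler value, tail size, and class name are assumptions chosen for the example.

```python
TAIL_FILLER = 0xAB  # illustrative guard byte, not taken from this report
TAIL_SIZE = 8       # illustrative number of guard bytes

class ToyBlock:
    """A user buffer followed by guard bytes, as a debugging heap might lay it out."""

    def __init__(self, size):
        self.size = size
        self.data = bytearray(size) + bytearray([TAIL_FILLER]) * TAIL_SIZE

    def write(self, offset, payload):
        # Deliberately unchecked, like a raw memcpy into the block.
        self.data[offset:offset + len(payload)] = payload

    def tail_intact(self):
        """True while every guard byte still holds the filler value."""
        return all(byte == TAIL_FILLER for byte in self.data[self.size:])

block = ToyBlock(16)
block.write(0, b"x" * 16)   # fits exactly
print(block.tail_intact())  # True
block.write(12, b"y" * 8)   # 4 bytes spill into the guard area
print(block.tail_intact())  # False
```

In a real heap the bytes after a block may be allocator bookkeeping rather than spare guard space, which is why such overwrites tend to surface later as crashes inside allocation and free routines.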
--- Comment #6 from Paul Gofman pgofman@codeweavers.com ---
As a separate note, mainstream Wine with syscall thunks patchset applied also crashes for me but in a different executable (always the same address in libcef.dll), that does not look related.
Maciej Stanczew maciej.stanczew+b@gmail.com changed:
           What    |Removed                                  |Added
----------------------------------------------------------------------------
    Regression SHA1|82cd85b07918a4437428497ffaf7f13286b83479 |
--- Comment #7 from Maciej Stanczew maciej.stanczew+b@gmail.com ---
(In reply to Paul Gofman from comment #5)
> The blamed commit is misleading, I suggest removing it from Regression SHA1 field. Bisect showed that because it stopped working after that one, but the crash present now is not related.

True, after some more time of using Battle.net I see that those crashes during launching of games can in fact lead to the broken state ("Whoops!" popups and game hangs).
> Can you try Staging without ntdll-Heap_Improvements patchset (staging/patchinstall.py --all -W ntdll-Heap_Improvements). That was fixing the issue for me, would be interesting to confirm if that is the same issue I am seeing.

Initially it looked better, but then I got one Agent crash during game launch, and another just after launching Battle.net. I haven't managed to get it into the same broken state, but based on previous debugging data, I might have just not tried enough times. For now I'll stay on this configuration and report anything new if it happens.
--- Comment #8 from Maciej Stanczew maciej.stanczew+b@gmail.com ---
> I might have just not tried enough times

Yup, that was it. Just now I launched Battle.net and then Diablo III, and I got 4 back to back Agent crashes, resulting in the popup appearing and the game being delayed from starting. (The 5th Agent didn't crash and the game finally launched after about 30 s of delay.)
--- Comment #9 from i.Dark_Templar idarktemplar@mail.ru ---
(In reply to Paul Gofman from comment #5)
> Can you try Staging without ntdll-Heap_Improvements patchset (staging/patchinstall.py --all -W ntdll-Heap_Improvements). That was fixing the issue for me, would be interesting to confirm if that is the same issue I am seeing.

I've rebuilt wine-staging 5.13 without the ntdll-Heap_Improvements patchset and the issue hasn't reproduced for me since. Thank you for the workaround.
--- Comment #10 from Paul Gofman pgofman@codeweavers.com ---
I think I found at least one reason for the heap corruption: Wine frees a random pointer (which accidentally happens to be a pointer already freed previously).
I've sent a patch for that [1].
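To illustrate why freeing a stale pointer is harmful, here is a toy Python model of an allocator that tracks which addresses are live. It is only a sketch: the class, the address values, and the return strings are hypothetical, and a real heap normally cannot tell these cases apart, which is how its bookkeeping ends up corrupted.

```python
class ToyHeap:
    """Tracks live and already-freed addresses so bad frees can be classified."""

    def __init__(self):
        self._next = 0x1000   # arbitrary starting address for the model
        self._live = set()
        self._freed = set()

    def alloc(self):
        ptr = self._next
        self._next += 0x10
        self._live.add(ptr)
        return ptr

    def free(self, ptr):
        if ptr in self._live:
            self._live.remove(ptr)
            self._freed.add(ptr)
            return "ok"
        if ptr in self._freed:
            return "double free"
        return "invalid pointer"

heap = ToyHeap()
p = heap.alloc()
print(heap.free(p))       # ok
print(heap.free(p))       # double free
print(heap.free(0xdead))  # invalid pointer
```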
However, I still see crashes in memory management with ntdll-Heap_Improvements patchset. I am not sure yet if this is a bug in the patchset itself or caused by some memory naughtiness by Wine or Agent.exe, this needs further debugging.
Does the linked patch without the ntdll-Heap_Improvements patchset (that is, with ntdll-Heap_Improvements excluded from the Staging patches by -W ntdll-Heap_Improvements) solve the issue completely?
1. https://source.winehq.org/patches/data/189533
--- Comment #11 from Paul Gofman pgofman@codeweavers.com ---
Here is the fix for another heap corruption I found: https://www.winehq.org/pipermail/wine-devel/2020-July/170551.html
With the patch from comment #10 and this one, Agent is not crashing for me anymore (no Staging patches disabled, ntdll-Heap_Improvements is in place) and I don't see heap validation errors so far.
--- Comment #12 from i.Dark_Templar idarktemplar@mail.ru ---
I haven't tested it for long yet, but it looks like this issue doesn't reproduce for me with wine-staging 5.13 with the ntdll-Heap_Improvements patchset and the 2 patches from the last comments. Thank you!
--- Comment #13 from Maciej Stanczew maciej.stanczew+b@gmail.com ---
Staging 5.13 (all patchsets) + the two patches linked = no crash in about 2 hours of testing. Looks like that was it :)
Maciej Stanczew maciej.stanczew+b@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|ntdll                       |-unknown
--- Comment #14 from Paul Gofman pgofman@codeweavers.com ---
Great, thanks for testing and bisecting. Those patches have been committed upstream as [1] and [2] and will appear in the next Staging rebase, so this can probably be marked as fixed.
1. https://source.winehq.org/git/wine.git/commit/3d54677586eb0a9f379839cd06c04d...
2. https://source.winehq.org/git/wine.git/commit/3feaca754613df248bc576b801d885...
Maciej Stanczew maciej.stanczew+b@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
      Fixed by SHA1|                            |3feaca754613df248bc576b801d885baa8637050
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #15 from Maciej Stanczew maciej.stanczew+b@gmail.com ---
Briefly tested Wine 07030059486e0121051b452c94d37f12931cabf4 + Staging 02be23fa5213c0cb0b377b5120ea256d6b5f1af4 -- also no crashes. Marking as fixed, thank you for all the work!
Alexandre Julliard julliard@winehq.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED

--- Comment #16 from Alexandre Julliard julliard@winehq.org ---
Closing bugs fixed in 5.14.
Justin King-Lacroix justin.kinglacroix@gmail.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |justin.kinglacroix@gmail.com

--- Comment #17 from Justin King-Lacroix justin.kinglacroix@gmail.com ---
(In reply to Maciej Stanczew from comment #15)
> Briefly tested Wine 07030059486e0121051b452c94d37f12931cabf4 + Staging 02be23fa5213c0cb0b377b5120ea256d6b5f1af4 -- also no crashes. Marking as fixed, thank you for all the work!

+1 -- Starcraft II on Wine 5.14 + Staging v5.14, and this bug goes away completely.