http://bugs.winehq.org/show_bug.cgi?id=59450

--- Comment #31 from Henri Verbeet <hverbeet@gmail.com> ---
I'm afraid this is looking more and more like a MoltenVK issue, but for what
it's worth:

(In reply to Tony Fabris from comment #30)
> Attaching another backtrace during a hang. Comparing it to previous ones, I
> see a lot of commonalities, but I don't know if they're significant. For
> example, this bit is always the same, but it sometimes has zero instead of
> "7c0d28" in the addresses below:
>
> Backtracing for thread 0178 in process 0120 (C:\Program Files\Hades2\Ship\Hades2.exe):
> Backtrace:
> =>0 0x006ffffb3d9103 in winevulkan (+0x19103) (0x000000007c0d28)
>   1 0x006ffffc71ddcb vkd3d_fence_worker_main+0x17b(arg=<is not available>) [/home/parallels/src/vkd3d-git/libs/vkd3d/command.c:358] in libvkd3d-1 (0x000000007c0d28)
>   2 0x006ffffc739111 call_thread_main+0x11(data=000000000CD22910) [/home/parallels/src/vkd3d-git/include/private/vkd3d_memory.h:52] in libvkd3d-1 (0000000000000000)
>   3 0x006fffffed1469 in kernel32 (+0x11469) (0000000000000000)
>   4 0x006ffffff50da3 in ntdll (+0x10da3) (0000000000000000)
The vkd3d_fence_worker_main() thread is a vkd3d worker thread for notifying d3d12 fences. The backtrace suggests it's waiting for a fence to finish, which in principle would be a normal state for that thread to be in.
> And I notice this bit is always the same, but with different addresses, and
> it has some information surrounding RenderCommands.cpp line 366.
>
> Backtracing for thread 0124 in process 0120 (C:\Program Files\Hades2\Ship\Hades2.exe):
> Backtrace:
> =>0 0x006ffffff50974 in ntdll (+0x10974) (0x0000000010fc88)
>   1 0x006ffffff79588 in ntdll (+0x39588) (0x0000000010fc88)
>   2 0x006ffffff74ff4 in ntdll (+0x34ff4) (0x00002995ff5fce)
>   3 0x006fffffc7138d in kernelbase (+0x6138d) (0x00002995ff5fce)
>   4 0x000001401fbbf5 Release() [C:\Jenkins\workspace\Iris_PC_Latest\Code\The-Forge\Common_3\OS\Windows\WindowsThread.cpp:67] in hades2 (0x00002995ff5fce)
>   5 0x000001401fbbf5 sgg::RenderCommands::WaitCanWrite+0x45(msTimeout=<register EBX not accessible in this frame>) [C:\Jenkins\workspace\Iris_PC_Latest\Code\Engine.Native\Code\Rendering\RenderCommands.cpp:366] in hades2 (0x00002995ff5fce)
>   6 0x00000140062ae9 sgg::App::UpdateAndDraw+0x169(this=<register RBX not accessible in this frame>, elapsedSeconds=*** Invalid address 0x00002995ff5fce *** ) [C:\Jenkins\workspace\Iris_PC_Latest\Code\Engine.Native\Code\App.cpp:649] in hades2 (0x00002995ff5fce)
>   7 0x00000140302b49 in hades2 (+0x302b49) (0x00002995ff5fce)
>   8 0x0000014001b9c3 WindowsMain+0x3c3(app=*** Invalid address 0x00002995ff5fce *** Internal symbol error: unable to access memory location 0000002995FF5FCE) [C:\Jenkins\workspace\Iris_PC_Latest\Code\The-Forge\Common_3\OS\Windows\WindowsBase.cpp:1210] in hades2 (0x00002995ff5fce)
>   9 0x0000014030109b in hades2 (+0x30109b) (0000000000000000)
>   10 0x00000140312595 main+0x45(argc=<register ESI not accessible in this frame>, argv=<register RDI not accessible in this frame>) [C:\Jenkins\workspace\Iris_PC_Latest\Code\Game.Native\main.cpp:17] in hades2 (0000000000000000)
>   11 0x0000014038e5e0 invoke_main+0x22() [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:78] in hades2 (0000000000000000)
>   12 0x0000014038e5e0 __scrt_common_main_seh+0x10c() [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288] in hades2 (0000000000000000)
>   13 0x006fffffed1469 in kernel32 (+0x11469) (0000000000000000)
>   14 0x006ffffff50da3 in ntdll (+0x10da3) (0000000000000000)
That looks like an application thread waiting to be able to render more
frames. It's consistent with the application hanging, but probably not the
cause.

The lack of debug symbols for CrossOver's Wine makes things a bit harder, but
this backtrace (and similar ones in the other logs) is consistent with the
MVKPresentableSwapchainImage::presentCAMetalDrawable() related crash from
earlier:

Backtracing for thread 0178 in process 0120 (C:\Program Files\Hades2\Ship\Hades2.exe):
Backtrace:
=>0 0x006ffffb3d8f7b in winevulkan (+0x18f7b) (0x0000000b7a6d20)
  1 0x006ffffd4aa041 in dxgi (+0xa041) (0x0000000b7a6d20)
  2 0x006ffffd4aa908 in dxgi (+0xa908) (0x006ffffff6c400)
  3 0x006fffffed1469 in kernel32 (+0x11469) (0000000000000000)
  4 0x006ffffff50da3 in ntdll (+0x10da3) (0000000000000000)

If you were able to get WINEDEBUG=+dxgi to work you may be able to confirm
this, but I'd expect the call stack here to look something like this (some of
these functions may have gotten inlined):

    winevulkan.vkWaitForFences()
    dxgi.d3d12_swapchain_acquire_next_vulkan_image()
    dxgi.d3d12_swapchain_queue_present()
    dxgi.d3d12_swapchain_op_present_execute()
    dxgi.d3d12_swapchain_worker_proc()

I.e., I suspect the dxgi d3d12_swapchain_worker_proc() thread is waiting for
the next swapchain image to become available, which never happens because
that Metal thread crashed. (There are potentially a couple of other places
where d3d12_swapchain_worker_proc() could hang in a similar way; the basics
are the same.)

Some of the messages from the first log, like "[CAMetalLayerDrawable texture]
should not be called after already presenting this drawable. Get a
nextDrawable instead." also seem to point in a similar direction.

I think what it comes down to is that at first sight these all look like the
same underlying issue manifesting itself in slightly different ways. It's not
necessarily clear whether it's the root cause, but I think the thing to figure
out is what's happening with MoltenVK's _mtlDrawable/mtlDrwbl.
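To illustrate why the worker thread would simply sit there, here is a toy model of the suspected hang. It is not Wine, vkd3d or MoltenVK code: FenceModel and waitForFenceModel() are hypothetical names, and the fence is just a flag that the presenting side is supposed to set. If that side dies before setting the flag, a wait with no deadline never returns, while a bounded wait at least reports the timeout.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical model of a fence: a flag the presenting side is expected to set.
struct FenceModel {
    std::atomic<bool> signalled{false};
};

// Polls the flag until it is set or the timeout expires.
// Returns true if the fence was signalled, false if the wait timed out.
bool waitForFenceModel(const FenceModel& fence, std::chrono::milliseconds timeout) {
    assert(timeout.count() >= 0);
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!fence.signalled.load(std::memory_order_acquire)) {
        if (std::chrono::steady_clock::now() >= deadline)
            return false;              // nobody signalled: the signalling side is gone
        std::this_thread::yield();     // let other threads run while we wait
    }
    return true;
}
```

In this model a fence whose signalling thread has crashed behaves like a FenceModel that is never touched again: waitForFenceModel() returns false once the deadline passes, and without the deadline check the loop would spin forever, which is what the hang in the backtrace would look like from the outside.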
My best guess at this point is that some kind of race causes _mtlDrawable to
either be released while it's still in use, or to be released too often. The
main path through which that may happen is probably
MVKPresentableSwapchainImage::releaseMetalDrawable(). The following is mostly
speculation, but here are some ideas:

  - If two different threads were to call
    MVKPresentableSwapchainImage::releaseMetalDrawable() and
    MVKPresentableSwapchainImage::getCAMetalDrawable() at the same time,
    getCAMetalDrawable() could observe "_mtlDrawable" after the
    [_mtlDrawable release] from releaseMetalDrawable(), but before the
    "_mtlDrawable = nil;" assignment. I.e., getCAMetalDrawable() could return
    a (potentially) already destroyed "_mtlDrawable" to its caller.

  - Places that call releaseMetalDrawable() aren't hard to find. For example,
    vkAcquireNextImageKHR() calls MVKSwapchain::acquireNextImage(), which
    calls MVKPresentableSwapchainImage::acquireAndSignalWhenAvailable(), which
    calls releaseMetalDrawable().

  - getCAMetalDrawable() is of course called from
    MVKPresentableSwapchainImage::presentCAMetalDrawable(). The interesting
    part is probably how presentCAMetalDrawable() is called.
    vkQueuePresentKHR() calls
    MVKQueue::submit(const VkPresentInfoKHR* pPresentInfo), which eventually
    calls MVKQueue::submit(MVKQueueSubmission* qSubmit).
    MVKQueue::submit(MVKQueueSubmission* qSubmit) can then execute "qSubmit"
    either synchronously as "execute(qSubmit)", or asynchronously using
    dispatch_async(). (MVKQueuePresentSurfaceSubmission::execute() calls
    MVKPresentableSwapchainImage::presentCAMetalDrawable().) In the
    asynchronous case, that would mean
    MVKPresentableSwapchainImage::releaseMetalDrawable() and
    MVKPresentableSwapchainImage::getCAMetalDrawable() end up being called
    from different threads.

  - I think asynchronous submission is the default.
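The window described in the first idea above can be sketched in a few lines. This is a model under stated assumptions, not MoltenVK code: FakeDrawable and PresentableImageModel are hypothetical, the reference count is a plain integer, and the interleaving is replayed on a single thread so the outcome is deterministic. The guarded variants show one possible way to close the window (a single lock around both halves of the release, with the getter taking its own reference under that lock); whether that would be the right fix for MoltenVK is a separate question.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a CAMetalDrawable: it only tracks a reference count.
struct FakeDrawable {
    int refCount = 1;
    bool destroyed() const { return refCount == 0; }
};

// Hypothetical, heavily simplified model of the image that owns the drawable.
// The comments name the statements from the discussion; the bodies are assumptions.
class PresentableImageModel {
public:
    explicit PresentableImageModel(FakeDrawable* drawable) : _drawable(drawable) {}

    // The two halves of an unsynchronized release, exposed separately so the
    // window between them can be replayed deterministically.
    void releaseStepDropReference() { if (_drawable) --_drawable->refCount; } // [_mtlDrawable release]
    void releaseStepClearPointer()  { _drawable = nullptr; }                  // _mtlDrawable = nil

    // Unsynchronized getter: returns whatever the pointer holds right now.
    FakeDrawable* getUnsynchronized() const { return _drawable; }

    // Guarded variants: both halves of the release happen under one lock, and
    // the getter takes its own reference while holding that lock.
    void releaseGuarded() {
        std::lock_guard<std::mutex> lock(_lock);
        releaseStepDropReference();
        releaseStepClearPointer();
    }
    FakeDrawable* retainGuarded() {
        std::lock_guard<std::mutex> lock(_lock);
        if (_drawable) ++_drawable->refCount;
        return _drawable;
    }

private:
    FakeDrawable* _drawable;
    std::mutex _lock;
};

// Replays the suspected interleaving: the getter runs after the reference was
// dropped but before the pointer was cleared.
bool replayUnsynchronizedInterleaving() {
    FakeDrawable drawable;
    PresentableImageModel image(&drawable);
    image.releaseStepDropReference();                 // "thread A", first half
    FakeDrawable* seen = image.getUnsynchronized();   // "thread B" runs in the window
    image.releaseStepClearPointer();                  // "thread A", second half
    assert(seen != nullptr);
    return seen->destroyed();                         // B was handed a dead drawable
}

// With the guarded variants the getter either keeps a live drawable alive or
// sees nullptr; it cannot observe the intermediate state.
bool guardedGetterSeesLiveDrawable() {
    FakeDrawable drawable;
    PresentableImageModel image(&drawable);
    FakeDrawable* seen = image.retainGuarded();       // refCount 1 -> 2
    image.releaseGuarded();                           // refCount 2 -> 1, pointer cleared
    return seen != nullptr && !seen->destroyed() && image.retainGuarded() == nullptr;
}
```

In the asynchronous submission case the two "threads" in replayUnsynchronizedInterleaving() would be the thread calling vkAcquireNextImageKHR() and the dispatch queue executing the present; the single-threaded replay only makes the ordering explicit.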