I may have a repeatable case of an error in locking critical sections, so I'd like some pointers as to how to debug this.
The case occurs with the installer for Delorme Street Atlas 5 - on my 2GHz Athlon desktop it runs without a hitch, but on my oooooold slow laptop (how old is it? It's sooooo old, it used PIO for the disk!) the program locks up 100% of the time at startup, with 2 threads trying to take different critical section locks and dying. It looks like the standard deadlock condition: one thread tries to lock A-B-C, the other tries to lock A-C-B, and so they deadlock.
Are there any tricks to identifying who locked what where and in what order?
David D. Hagood wrote:
The case occurs with the installer for Delorme Street Atlas 5 - on my 2GHz Athlon desktop it runs without a hitch, but on my oooooold slow laptop (how old is it? It's sooooo old, it used PIO for the disk!) the program locks up 100% of the time at startup, with 2 threads trying to take different critical section locks and dying. It looks like the standard deadlock condition: one thread tries to lock A-B-C, the other tries to lock A-C-B, and so they deadlock.
Unless the installer is using TryEnterCriticalSection, I would expect CPU utilisation to be 0% when deadlocking. Relay logs generally give the best clues in this kind of situation.
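For what it's worth, the question a relay log answers here (which thread took which lock, and in what order) can be sketched outside Wine too. The snippet below is only an illustration in Python, not anything Wine provides: `TracedLock`, `lock_orders` and `demo` are made-up names, and the two threads are run one after the other so the sketch itself cannot deadlock.

```python
import threading


class TracedLock:
    """Hypothetical wrapper: records who waits for, gets and releases a named lock."""

    def __init__(self, name, log):
        self.name = name
        self._lock = threading.Lock()
        self._log = log  # shared list of (thread name, event, lock name)

    def __enter__(self):
        who = threading.current_thread().name
        self._log.append((who, "wait", self.name))
        self._lock.acquire()
        self._log.append((who, "got", self.name))
        return self

    def __exit__(self, *exc):
        self._log.append((threading.current_thread().name, "release", self.name))
        self._lock.release()
        return False


def lock_orders(log):
    """Reduce the trace to the order in which each thread obtained its locks."""
    orders = {}
    for who, event, name in log:
        if event == "got":
            orders.setdefault(who, []).append(name)
    return orders


def demo():
    log = []
    a, b = TracedLock("A", log), TracedLock("B", log)

    def take(first, second):
        with first:
            with second:
                pass

    # Run the threads one after the other (the "fast CPU" case), so no deadlock.
    for name, args in (("thread 1", (a, b)), ("thread 2", (b, a))):
        t = threading.Thread(target=take, args=args, name=name)
        t.start()
        t.join()
    return lock_orders(log)
```

Two threads whose lists show the same pair of locks in opposite orders are the ones to look at.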
Robert Shearman wrote:
Unless the installer is using TryEnterCriticalSection, I would expect CPU utilisation to be 0% when deadlocking.
Yes, *once the deadlock occurs* the CPU drops to 0%. The issue (I think) is more along the lines of this:
On fast CPU:
Thread 1 locks resource A, does something, locks B, unlocks B, unlocks A.
Thread 2 locks B, does something, locks A, does something, unlocks A, unlocks B.
On slow machine:
Thread 1 locks resource A, does something.
Context switch.
Thread 2 locks B, does something.
Context switch.
Thread 1 tries to lock B and blocks.
Context switch.
Thread 2 tries to lock A and blocks.
Deadlock.
In other words, on the fast CPU the deadlock does not happen because thread 1 gets everything done before thread 2 starts. On the slow machine, thread 2 starts while thread 1 is still doing stuff.
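The slow-machine interleaving above can be forced deterministically in a small sketch. This is Python rather than Win32 code, and everything in it (`run_interleaving`, the barrier standing in for the context switch, the timeout) is an assumption made for illustration. The timeout is there only so the sketch reports the deadlock instead of hanging; a plain EnterCriticalSection has no such escape.

```python
import threading


def run_interleaving(timeout=0.2):
    """Force 'thread 1 holds A, thread 2 holds B' and then try the second lock."""
    lock_a = threading.Lock()
    lock_b = threading.Lock()
    barrier = threading.Barrier(2)
    results = {}

    def worker(name, first, second):
        with first:
            # Stand-in for the context switch: wait until both threads
            # hold their first lock.
            barrier.wait()
            got = second.acquire(timeout=timeout)
            results[name] = got
            # Keep holding the first lock until both attempts have finished,
            # so neither thread can sneak in after the other gives up.
            barrier.wait()
            if got:
                second.release()

    t1 = threading.Thread(target=worker, args=("thread 1", lock_a, lock_b))
    t2 = threading.Thread(target=worker, args=("thread 2", lock_b, lock_a))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return results
```

Both entries in the returned dict come back False: each thread is blocked on the lock the other one holds.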
David D. Hagood wrote:
In other words, on the fast CPU the deadlock does not happen because thread 1 gets everything done before thread 2 starts. On the slow machine, thread 2 starts while thread 1 is still doing stuff.
Yep, example of what not to do in concurrent programming. You should decide on one order for the locks, always take them in that order, and always release them in the opposite order.
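A minimal sketch of that rule, again in Python and with made-up names (`locked_in_order`, `demo`): whatever order the caller names the locks in, the helper takes them in one fixed global order and releases them in reverse. Sorting by `id()` is just one arbitrary way to get a consistent order; an explicit rank per lock would do as well.

```python
import threading
from contextlib import contextmanager


@contextmanager
def locked_in_order(*locks):
    """Take the locks in one fixed global order, release them in reverse."""
    ordered = sorted(locks, key=id)  # arbitrary but consistent ordering
    taken = []
    try:
        for lock in ordered:
            lock.acquire()
            taken.append(lock)
        yield
    finally:
        for lock in reversed(taken):
            lock.release()


def demo(iterations=1000):
    lock_a = threading.Lock()
    lock_b = threading.Lock()
    counter = [0]

    def worker(first, second):
        for _ in range(iterations):
            # The caller's order differs per thread; the helper normalises it.
            with locked_in_order(first, second):
                counter[0] += 1

    threads = [
        threading.Thread(target=worker, args=(lock_a, lock_b)),
        threading.Thread(target=worker, args=(lock_b, lock_a)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]
```

With the ordering enforced in one place, the two threads can ask for A-B and B-A as often as they like and still run to completion.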
You have several options from here:
1. File a bug report with the maker of the application.
2. Find a function that thread 2 calls just before it takes its first lock, and that thread 1 doesn't call, and add a Sleep call to it. Obviously this kind of fix won't be accepted into Wine.
3. Do tests to try to find a function that thread 1 calls that is much slower on Wine, and try to fix it.
Robert Shearman wrote:
Yep, example of what not to do in concurrent programming. You should
Tell me about it - I do hard realtime for a living.
One of the locks is the Win16 lock; another does not seem to have a name (shown as "?" when things go bad). I wonder if, under real Windows, the Win16 subsystem does not lock, and that is why this program doesn't die under Windows.