New subject: [Bug 44688] Detect stuck processes

8 Mar 2018


      https://bugs.winehq.org/show_bug.cgi?id=44688
Bug ID: 44688
           Summary: Detect stuck processes
           Product: Wine-Testbot
           Version: unspecified
          Hardware: x86
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: unknown
          Assignee: wine-bugs@winehq.org
          Reporter: fgouget@codeweavers.com
      Distribution: ---

Sometimes a TestBot worker process can get stuck.
This can happen to LibvirtTool.pl, particularly when dealing with offline VMs.
But it can also happen to regular scripts like WineRunTask.pl when using
TestAgent to send or retrieve a file.
In both cases the TestBot Engine should have a way to detect stuck processes
and simply kill them.

To detect stuck processes, add two fields to the VM table:
  ChildStarted - The timestamp at which the current child process started.
  ChildTimeout - How long the current child process is allowed to run.

Most of our tasks already have timeouts, so it's just a matter of reusing
those timeouts and adding some leeway. For the revert and offline tasks we
could use 5 and 60 minutes respectively. The Jobs::_CheckAndClassifyVMs()
method can then check those fields and kill the stuck processes. This works
because the Engine's SafetyNet() method schedules jobs every 10 minutes as a
fallback, so the check runs periodically even when no other event triggers
scheduling.

The reason for using two fields instead of a single ChildDeadline field is
that ChildStarted could also tell us which period to analyze when collecting
the Munin statistics (currently we analyze an arbitrary period of time that's
supposed to cover the worst case).
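
As a rough illustration of the check proposed above, here is a minimal sketch
in Python (the TestBot itself is written in Perl, so this is not the actual
implementation). The VM class, the child_pid field, and the
kill_stuck_children() helper are hypothetical names made up for the example;
only ChildStarted and ChildTimeout come from the proposal. It assumes
ChildStarted is stored as epoch seconds and ChildTimeout as a number of
seconds that already includes the leeway.

```python
import os
import signal
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class VM:
    """Hypothetical stand-in for a row of the VM table."""
    name: str
    child_pid: Optional[int] = None        # assumed field: pid of the current child
    child_started: Optional[float] = None  # ChildStarted, as epoch seconds (assumption)
    child_timeout: Optional[float] = None  # ChildTimeout, in seconds, leeway included


def is_stuck(vm: VM, now: float) -> bool:
    """A child is considered stuck once it has run longer than its timeout."""
    if vm.child_pid is None or vm.child_started is None or vm.child_timeout is None:
        return False  # no child process is being tracked for this VM
    return now - vm.child_started > vm.child_timeout


def kill_stuck_children(
    vms: List[VM],
    now: Optional[float] = None,
    kill: Callable[[int, int], None] = os.kill,
) -> List[str]:
    """Kill every stuck child and return the names of the affected VMs."""
    now = time.time() if now is None else now
    killed = []
    for vm in vms:
        if not is_stuck(vm, now):
            continue
        try:
            kill(vm.child_pid, signal.SIGTERM)
        except ProcessLookupError:
            pass  # the child exited on its own in the meantime
        killed.append(vm.name)
        # Clear the tracking fields so the VM is not reported again.
        vm.child_pid = vm.child_started = vm.child_timeout = None
    return killed
```

Keeping the start time and the timeout separate, rather than storing only
their sum, is what lets other code (such as the statistics collection) reuse
the start time. The kill parameter is only there so the logic can be
exercised without signalling real processes.
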
-- 
Do not reply to this email, post in Bugzilla using the
above URL to reply.
You are receiving this mail because:
You are watching all bug changes.
