https://bugs.winehq.org/show_bug.cgi?id=39412
Bug ID: 39412
Summary: Add failover and load balancing support
Product: Wine-Testbot
Version: unspecified
Hardware: x86
OS: Linux
Status: NEW
Severity: normal
Priority: P2
Component: unknown
Assignee: wine-bugs@winehq.org
Reporter: fgouget@codeweavers.com
Distribution: ---
Created attachment 52530
  --> https://bugs.winehq.org/attachment.cgi?id=52530
Proposed schema update to allow for failover
We currently have 4 VM hosts and it's easy to duplicate a given VM so it's present on multiple hosts.
So we could put a VM such as win7u on both vm1 and vm2. But if vm1 goes offline for whatever reason, the administrator would still have to manually change the WineTestBot configuration so it knows to use the win7u copy on vm2.
Furthermore, if a task is scheduled to run on win7u but vm1 is busy running a task on another VM, it would be nice to automatically run that task on vm2's win7u copy instead. This would lessen the need to carefully figure out which VMs can be put on the same host based on whether they are 64-bit or not, whether they are a base or winetest VM, etc. (and, later, how many configurations they have; see bug 31784).
But we cannot add entries for both copies of win7u because:
1. Both entries would show up in the user interface.
2. Wine patches would be scheduled to run on both copies which is not what we want.
3. Tasks scheduled to run on the first copy would still not be switched to the second copy.
4. The TestBot may start both copies at the same time which would be wrong from a licensing point of view and cause conflicts due to both having the same IP address (unless one of the copies is manually modified to have a different IP address).
What is really needed is a change in the database schema so we can define multiple interchangeable 'VM instances' of a given VM. The attached schema adds a VMInstances table to deal with this:
* A task would initially be connected to a VM (we can ignore the VMConfig vs. VM distinction here).
* When scheduling the task, the TestBot would iterate over the corresponding VM's VMInstances, identify the ones where the VMSnapshot field is set as the active instances, and count them. (Note: the instance's ChildPid field is only set during reverts, and the (JobID, StepNo, TaskNo) triplet is only set after the revert, when the task is actually running. So neither identifies the active instances.)
* Then, if the count of active instances is lower than the VM's MaxActive field, it could revert one of that VM's instances.
* The VM's MaxActive is what allows us to not exceed our number of licenses. It also allows us to have multiple instances active in case we have multiple licenses, for the Linux build machine for instance (in that case each copy would need a different IP address).
* Optionally, a user could request that a task be run on a specific VMInstance by setting the Task's VMNo field.
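
To make the scheduling steps above more concrete, here is a rough Python sketch of the instance selection. It is only an illustration and not the TestBot's actual code or the attached schema. The VMSnapshot, MaxActive and VMNo names come from the description above; the host field, the unavailable_hosts set (hosts that are offline or busy) and the function names are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class VMInstance:
    vm_no: int                      # mirrors the proposed VMNo
    host: str                       # assumption: the VM host holding this copy
    snapshot: Optional[str] = None  # mirrors VMSnapshot; set means "active"


@dataclass
class VM:
    name: str
    max_active: int                 # mirrors MaxActive (license limit)
    instances: List[VMInstance] = field(default_factory=list)


def count_active(vm: VM) -> int:
    """Count the instances whose snapshot field is set."""
    return sum(1 for inst in vm.instances if inst.snapshot is not None)


def pick_instance(vm: VM,
                  unavailable_hosts: Set[str],
                  requested_vm_no: Optional[int] = None) -> Optional[VMInstance]:
    """Return an instance that could be reverted for a task, or None.

    None means the task has to wait: either MaxActive is already
    reached or no suitable copy sits on an available host.
    """
    if count_active(vm) >= vm.max_active:
        return None
    for inst in vm.instances:
        # Honor an explicit request for a specific instance.
        if requested_vm_no is not None and inst.vm_no != requested_vm_no:
            continue
        # Skip copies that are already active.
        if inst.snapshot is not None:
            continue
        # Assumption: skip copies whose host is offline or busy.
        if inst.host in unavailable_hosts:
            continue
        return inst
    return None
```

With win7u defined on vm1 and vm2 and MaxActive set to 1, a call such as pick_instance(win7u, {"vm1"}) would pick the vm2 copy when vm1 is offline or busy, and would return nothing once either copy is active, which is what keeps both copies from running at the same time.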