We set RaiseError in our DBI options so programming errors are easier to debug and have a better chance of being caught. However, this means DBI::connect() will raise an exception if the database server is not running (for instance if it is being upgraded at just the wrong time). Furthermore, the higher layers cannot meaningfully deal with a broken connection. So catch these exceptions and keep trying to reconnect until the connection succeeds.
Signed-off-by: Francois Gouget <fgouget@codeweavers.com>
---
This patch should prevent last Friday's outage from recurring.
At least the admin page for accessing the log worked great! I have also updated a local script so it notifies me not only when a VM goes offline, but also when the TestBot Engine stops responding.
 testbot/lib/ObjectModel/DBIBackEnd.pm | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/testbot/lib/ObjectModel/DBIBackEnd.pm b/testbot/lib/ObjectModel/DBIBackEnd.pm
index 4082a0ac..0594951b 100644
--- a/testbot/lib/ObjectModel/DBIBackEnd.pm
+++ b/testbot/lib/ObjectModel/DBIBackEnd.pm
@@ -53,9 +53,18 @@ sub GetDb($)
   }
 
   if (!defined $self->{Db})
   {
-    $self->{Db} = DBI->connect(@{$self->{ConnectArgs}});
+    while (1)
+    {
+      # Protect this call so we can retry in case RaiseError is set
+      eval { $self->{Db} = DBI->connect(@{$self->{ConnectArgs}}) };
+      last if ($self->{Db});
+
+      # Prints errors on stderr like DBI normally does
+      $@ ||= "DBI::connect() returned undef without setting an error";
+      print STDERR "$@\n";
+      sleep(30);
+    }
   }
-
   return $self->{Db};
 }