Yeah, the CI and testbot are often very noisy. I could probably remove the tests if they prove very flaky; I was mainly interested in checking how the different delivery modes worked, and that's pretty clear now. Or maybe we could increase the QUEUE vs AT_TIME offset and use larger time differences (though at the cost of run time).