IMO tests are meant to be run regularly or are otherwise meaningless and doomed to bitrot and fail without anybody noticing. As our testing policy is generally designed around testing MRs and nightly runs of the test suite in non-interactive mode, I don't see much value in interactive tests. People may run a couple of tests in interactive mode but nobody will run the entire test suite when reviewing.
I am not particularly interested in arguing for interactive tests, since what I'm pushing for here is something else. I agree that, in general, tests that are run regularly and frequently are better than tests that are run irregularly and rarely; but tests that are run irregularly and rarely are still better than tests that are never run because they do not exist. Even if you discover a problem only after a year, that's better than never noticing it, or having to debug it from scratch because an application fails. If nothing else, you don't have to write the test again.
I could find an "extended" test suite useful, but only if it's run regularly. If it is so much more expensive that we can't afford running it in MRs and nightly runs, I kind of doubt we can do that?
What would you think about running it daily? Not on every MR, so that MR pipelines keep giving quick feedback; running an extended test suite daily still gives relatively good feedback without starving the MR pipeline queue.
If necessary I think we could perhaps consider increasing the test timeouts on a case-by-case basis, but it's usually better to try to find some interesting test subset rather than being exhaustive. Looking at the change here I would say that testing the entire parameter matrix seems a bit overkill, and only varying over one dimension at a time would be enough?
I'm not sure. I have tried different approaches, and in many cases I left some test cases around and later discovered that they were meaningful and that I had been making incorrect assumptions. Running the whole matrix isn't terribly slow after all: it takes around a minute (give or take, depending on the hardware and OS). So I see a good reason for not doing that on each MR, but still doing it on an extended run. Getting smart about what to include or not is likely to result in not being smart enough and not catching a regression when there is one. -- https://gitlab.winehq.org/wine/wine/-/merge_requests/9694#note_128757
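
To make the trade-off in the last exchange concrete, here is a minimal sketch comparing the full parameter matrix with the "vary one dimension at a time" subset. The dimension names and values (`format`, `size`, `flags`) are hypothetical and are not taken from the MR; the helper names are also made up for illustration. It only counts and lists cases, it does not run any Wine test.

```python
from itertools import product

# Hypothetical parameter dimensions, for illustration only;
# these are NOT the parameters used in the merge request.
DIMENSIONS = {
    "format": ["a", "b", "c"],
    "size": [1, 2, 3, 4],
    "flags": [0, 1],
}


def full_matrix(dims):
    """Return every combination of every value (the exhaustive run)."""
    names = list(dims)
    return [dict(zip(names, combo)) for combo in product(*dims.values())]


def one_at_a_time(dims):
    """Return a baseline case plus cases that change a single dimension.

    The baseline uses the first value of each dimension (an assumption
    made for this sketch).
    """
    baseline = {name: values[0] for name, values in dims.items()}
    cases = [baseline]
    for name, values in dims.items():
        for value in values[1:]:
            cases.append({**baseline, name: value})
    return cases


def missed_cases(dims):
    """Return the combinations that only the full matrix exercises."""
    reduced = one_at_a_time(dims)
    return [case for case in full_matrix(dims) if case not in reduced]


if __name__ == "__main__":
    print(len(full_matrix(DIMENSIONS)), "cases in the full matrix")
    print(len(one_at_a_time(DIMENSIONS)), "cases when varying one dimension")
    print(len(missed_cases(DIMENSIONS)), "combinations left untested")
```

With these made-up dimensions the reduced subset skips every case in which two or more parameters differ from the baseline, which is exactly the kind of interaction that an occasional extended run of the full matrix would still cover.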