So I've been working on the test.winehq.org site lately. Here is an overall description of what I'm trying to achieve:
1) Improve parsing of the WineTest reports
This is what started the ball rolling.
Initially the issue was detecting a test executable that exits abruptly after a subprocess has already printed a test summary line. Pids were introduced into the test reports, and test.winehq turned out not to need modifications to cope with them. Having it take advantage of the pids does require changes, though. And then a lot of other parsing improvements turned up.
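To illustrate why the pids help, here is a minimal Python sketch (not the site's Perl code). It assumes a summary line of the form "<hex pid>:<unit>: <n> tests executed ...", and it assumes the parser already knows the pid of the main test process; both the line shape and the function name are assumptions made for this example. A summary line carrying a different pid came from a subprocess and says nothing about whether the main process finished.

```python
import re

# Assumed summary-line shape: "<hex pid>:<unit>: <n> tests executed ..."
SUMMARY_RE = re.compile(r"^([0-9a-fA-F]+):(\w+): (\d+) tests executed")


def main_process_summarized(lines, main_pid):
    """Return True if a summary line printed by main_pid is present.

    Summary lines from other pids (subprocesses) are ignored, so a test
    executable that died after a child printed its summary is still
    reported as not having completed.
    """
    for line in lines:
        match = SUMMARY_RE.match(line)
        if match and int(match.group(1), 16) == main_pid:
            return True
    return False


# Hypothetical report fragment: only a child (pid 0x0a3c) printed a summary.
report = [
    "0a3c:process: 12 tests executed (0 marked as todo, 0 failures), 0 skipped.",
]

print(main_process_summarized(report, 0x0A10))  # main process pid
print(main_process_summarized(report, 0x0A3C))  # the child's own pid
```

Without the pid prefix the child's summary line would be indistinguishable from the main process's, which is the situation described above.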
2) Make it possible to check out the source anywhere and use it unmodified no matter where your web server files are located.
Currently winetest.conf, which is checked into Git, contains hardcoded paths to Wine's Git directory and the location of the web server files. I intend to move those to the web site configuration file and to the crontab entry.
The way to get there is to remove $datadir, $queuedir and $gitdir in favor of a single $workdir directory, and then pass that path to the scripts that need it. For the CGI scripts this would be done by setting $workdir in the web server configuration file, and for regular scripts by passing it on the command line, starting with the crontab entry.
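A rough Python sketch of the idea follows (the real scripts are Perl). The --workdir option name, the WINETEST_WORKDIR environment variable and the "data" and "queue" subdirectory names are all assumptions for illustration; the point is only that every path is derived from one work directory supplied from outside the checked-in source.

```python
import argparse
import os


def resolve_workdir(argv, environ):
    """Derive all paths from a single work directory.

    The command line (regular scripts, crontab) takes precedence; the
    environment (as it could be set by the web server for CGI scripts)
    is the fallback. Both names are hypothetical.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--workdir")
    args, _unknown = parser.parse_known_args(argv)

    workdir = args.workdir or environ.get("WINETEST_WORKDIR")
    if not workdir:
        raise SystemExit("no work directory given")

    # Assumed layout: everything lives under the one work directory.
    return {
        "workdir": workdir,
        "data": os.path.join(workdir, "data"),
        "queue": os.path.join(workdir, "queue"),
    }


print(resolve_workdir(["--workdir", "/srv/winetest"], {}))
```

With this shape nothing in the Git checkout needs to know where the files live, so the same source can be used unmodified on any machine.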
3) Get rid of /data in the URLs
The idea is that everything that's supposed to be accessible through the web server should be in the /data directory. That means moving the error files and the /builds directory.
Once that's done the web server can be told to only serve files in $workdir/data, /data becomes redundant in the URL, and redirects can be put in place to not break the old URLs.
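The redirect rule for old URLs amounts to dropping the leading /data component. In practice this would most likely be a rewrite rule in the web server configuration; the Python sketch below only spells out the mapping, and the function name is made up for the example.

```python
def redirect_target(path):
    """Map an old '/data/...' URL path to its new location.

    Returns None when the path is not an old-style URL, in which case no
    redirect is needed.
    """
    prefix = "/data/"
    if path == "/data":
        return "/"
    if path.startswith(prefix):
        return "/" + path[len(prefix):]
    return None


# Hypothetical paths, for illustration only.
for old in ("/data/some-build/index.html", "/data", "/builds/x", "/database"):
    print(old, "->", redirect_target(old))
```

Matching on "/data/" with the trailing slash keeps unrelated paths that merely start with the same letters from being redirected.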
That leaves /old-data which is currently accessible but need not be, and /queue which is partly accessible but probably should not be. After this change neither would be accessible anymore.
There are also a number of scripts that have an unclear purpose and make very tempting targets for git rm: error.cgi, make-winetest, service.cgi, site.
4) Security
I don't know of any specific bug that needs fixing and I'm not an expert in that domain. So I'm just trying to stick to some principles that I hope will help keep things reasonably secure.
- I'm trying to make it so the web server process has write access to as few files as possible. In particular it should not have write access to anything that's executable. Currently the raw reports remain owned by the web server so it can still modify them. That's not necessarily an issue since those are not being run but maybe it should be changed.
- The same goes for the Perl scripts, which is why I documented a way of setting things up based on two accounts: one for the source and one where the scripts are run (and the web server runs in a third one of course).
- Eventually I hope to enable Perl's taint mode. This does not guarantee safety but can help identify places where incoming data should be checked and sanitized.
5) Miscellaneous issues that would be worth looking at one day.
I don't have specific plans for these (hence why they have bug entries) but I'll list them here anyway in case someone volunteers.
- Disk usage seems really very high for such a simple site.
The static HTML pages the web site serves are pretty big: I estimate that for test.winehq they take up 21 GB of disk. And that's not because the reports themselves are big. On any given day we get about 60 reports which typically means:
   50 MB of raw report files
  180 MB of full report HTML files
  285 MB of individual test unit HTML files
    3 MB of index files
So the raw reports represent under 10% of the disk usage. Everything else is duplicated data. But fixing this would require quite a change in the way the site works.
See bug 42756 for more on this part: https://bugs.winehq.org/show_bug.cgi?id=42756
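The "under 10%" figure follows directly from the daily numbers quoted above. A short Python check, using only those numbers (the dictionary keys are just labels chosen for this example):

```python
# Daily disk usage in MB, as quoted above for about 60 reports per day.
daily_mb = {
    "raw reports": 50,
    "full report HTML": 180,
    "test unit HTML": 285,
    "index files": 3,
}

total_mb = sum(daily_mb.values())
raw_share = daily_mb["raw reports"] / total_mb
per_report_mb = total_mb / 60

print(f"total per day: {total_mb} MB")
print(f"raw report share: {raw_share:.1%}")
print(f"average per report: {per_report_mb:.1f} MB")
```

So roughly nine tenths of what is stored each day is derived from the raw reports rather than being new information.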
Then there are the archives, which take 132 GB! There are a number of things we could do to reduce that:
. It looks like we keep the old reports forever. Maybe we could delete old data after a while.
. Maybe archiving just the raw report would be sufficient. After all, all the data is there. We may not be able to parse the old reports in the future but we would still be able to read them. Or we could simply archive the raw report and the single-file full report.
. Instead of bz2 we could use other compression algorithms. On a set of 3 builds I got the following results:
    all        > bz2 -9 :  22 MB / build -> 41 GB (1)
    all        > xz -9  : 5.5 MB / build -> 10 GB (1)
    raw + html > bz2 -9 :  14 MB / build -> 26 GB (2)
    raw + html > xz -9  :   3 MB / build ->  6 GB (2)
    raw report > bz2 -9 : 5.5 MB / build -> 10 GB (3)
    raw report > xz -9  : 1.1 MB / build ->  2 GB (3)
  (1) Archive every single file as we currently do.
  (2) Only archive (the raw) report and report.html
  (3) Only archive (the raw) report.
. Passing the 64 bit builds through UPX (as we already do for the 32 bit ones) would compress them by a factor of around 2.9 (one build that took 82 MB got reduced to 28 MB). This would reduce the size of the /builds directory to around 43 GB.
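For anyone who wants to repeat the bz2 versus xz comparison on their own data, here is a small Python sketch using the standard bz2 and lzma modules at their highest presets. It is not how the figures above were produced, and the sample input is synthetic, so the ratios it prints say nothing about real reports; feed it the bytes of an actual archive candidate to get meaningful numbers.

```python
import bz2
import lzma


def compare_compressors(data):
    """Return the compressed size in bytes for bz2 -9 and xz -9."""
    return {
        "bz2 -9": len(bz2.compress(data, compresslevel=9)),
        "xz -9": len(lzma.compress(data, preset=9)),
    }


# Synthetic, highly repetitive stand-in for a report (not real data).
sample = b"".join(b"line %d: ok\n" % i for i in range(20000))

sizes = compare_compressors(sample)
print(f"original: {len(sample)} bytes")
for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```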
- Provide per machine results over time.
The index files make it possible to see how the number of failures evolved over time for a given platform, such as Windows XP. But anyone running the tests regularly on a machine would appreciate being able to see how it fared over time. However that's currently impossible since you only get specific machine results on a per-build basis. A fix for that would be to add a 'flat all machines' top-level index, as well as 'flat all machines' indexes for each test unit.
https://bugs.winehq.org/show_bug.cgi?id=39379
- Provide some data about the test unit run time.
This would likely take the form of additional per-build index files showing the test units sorted by run time with some min, max and average data. This would help identify which test units take too much time or which machines are having trouble keeping up.
https://bugs.winehq.org/show_bug.cgi?id=42757
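The kind of per-build summary described for the run times could be computed along these lines. This is only a Python sketch: the (unit, machine, seconds) input shape, the function name and the sample values are all made up for the example, and nothing here reflects how the reports actually store timing data.

```python
from statistics import mean


def runtime_summary(samples):
    """Summarize run times per test unit.

    samples: iterable of (unit, machine, seconds) tuples.
    Returns (unit, min, max, average) rows, slowest average first.
    """
    by_unit = {}
    for unit, _machine, seconds in samples:
        by_unit.setdefault(unit, []).append(seconds)

    rows = [(unit, min(t), max(t), mean(t)) for unit, t in by_unit.items()]
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


# Made-up timings for two hypothetical units on two hypothetical machines.
samples = [
    ("unit_a", "machine1", 40.0),
    ("unit_a", "machine2", 60.0),
    ("unit_b", "machine1", 5.0),
]

for unit, lo, hi, avg in runtime_summary(samples):
    print(f"{unit}: min {lo:.1f}s, max {hi:.1f}s, avg {avg:.1f}s")
```

Grouping by machine instead of by unit in the same way would show which machines are having trouble keeping up.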
- Provide some data about the size of the output of individual test units.
The Wine test reports are pretty close to the 1.5 MB limit so knowing which test units are the biggest offenders would be helpful.
https://bugs.winehq.org/show_bug.cgi?id=42758