The pipeline failed for an unrelated reason (an xaudio test that should probably be marked flaky), but it shows the idea. The https://gitlab.winehq.org/rbernon/wine/-/jobs/4242 run shows the nulldrv differences, and thus what is broken in user32 itself versus what is fixed or broken by winex11.
It adds a few minutes overall, which I don't think is too bad, and it would be nice to have for better testing of user32 changes.
Note that the way I'm allowing nulldrv to be used here may not be great: it changes the registry for all tests, so a normal run may now fail to load winex11 and fall back to nulldrv. Maybe that's not much of an issue, as other things will likely fail in that case, but it's still an open question.
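
To make the fallback concern concrete, here is a rough sketch of the selection logic as I understand it. This is a toy model, not Wine's actual driver loader: `pick_driver`, the list values and the "loadable" sets are all made up for illustration.

```python
def pick_driver(allowed, loadable):
    """Return the first driver in `allowed` that is also in `loadable`, else None.

    Hypothetical helper: models "try each configured driver in order".
    """
    for name in allowed:
        if name in loadable:
            return name
    return None


# Hypothetical setting shared by every test run once nulldrv is allowed.
shared_setting = ["winex11", "nulldrv"]

# Normal run, winex11 loads fine: nothing changes.
print(pick_driver(shared_setting, {"winex11", "nulldrv"}))

# Normal run, winex11 fails to load: the run ends up on nulldrv instead.
print(pick_driver(shared_setting, {"nulldrv"}))

# Same failure when nulldrv is not allowed: no driver is picked at all.
print(pick_driver(["winex11"], {"nulldrv"}))
```

The second case is the one I'm unsure about: the run continues on nulldrv, and we would only notice through whatever other tests fail afterwards.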