I see the 2nd and 3rd failures on radeonsi (and radv), and all four failures on llvmpipe (and lavapipe). This is with Mesa 22.2.0-rc3. Nikolay, Henri, do neither of you see the test failures? If not, which drivers are you using?
These pass for me on both Intel SKL / Mesa 20.3.5, and AMD VEGA10 / Mesa 20.1.9.
It's perhaps also worth pointing out that the second-to-last parameter to compare_figure() is a tolerance. It's zero for all four tests in question, IIRC because these are supposed to draw relatively straightforward figures with straight lines that are expected to match exactly, but it may be interesting to check how far it would need to be raised for the tests to pass. Perhaps even more insightful would be to insert an IDXGISwapChain_Present() and a Sleep() and take a screenshot, to compare with e.g. a run on WARP.
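To make the tolerance check concrete, here is a rough sketch of how the required value could be measured. It assumes the tolerance is a count of mismatched pixels, which I haven't double-checked against compare_figure() itself. count_pixel_mismatches() and its 32-bit pixel buffers are hypothetical and not part of the existing tests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the test suite: counts the pixels that
 * differ between a rendered figure and its reference. ASSUMPTION: the
 * tolerance is a count of mismatched pixels, in which case the return value
 * is the smallest tolerance at which the comparison would pass. */
static unsigned int count_pixel_mismatches(const uint32_t *rendered,
        const uint32_t *expected, size_t width, size_t height)
{
    unsigned int diff = 0;
    size_t i;

    assert(rendered != NULL && expected != NULL);

    for (i = 0; i < width * height; ++i)
    {
        /* Exact comparison of the whole 32-bit pixel value. */
        if (rendered[i] != expected[i])
            ++diff;
    }

    return diff;
}
```

Printing that count per driver for each of the four tests would show whether radeonsi and llvmpipe are off by a handful of edge pixels or by something more substantial.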