I had forgotten we do this, but that's actually intentional; see the following lines in wg_format_to_caps_video() where we explicitly strip the framerate. There's potentially an argument for changing that, but in general I don't think we ever need to restrict the framerate we pass to GStreamer. If we want the output framerate to match the input in a transform, we can copy it directly. If I'm not missing anything, that should remove the need for patch 2/4 entirely.
Framerate stripping was introduced in commit 7adcdb6 by Rémi. I think he did this for the H264 decoder. See try_create_wg_transform() (https://gitlab.winehq.org/wine/wine/-/blob/master/dlls/winegstreamer/h264_de...).
In this function, fps_n and fps_d are both set to 0 for the H264 decoder. Then, in wg_format_to_caps_video(), the framerate is stripped only when fps_n and fps_d are both 0.
That has worked fine so far, at least until the WMV decoder was introduced. However, according to my tests, when the WMV decoder tries to match element caps, it requires the caps to contain a framerate field; otherwise, connecting the elements fails.
Therefore, we have to set the framerate field in the caps for the WMV decoder.
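To make the rule concrete, here is a minimal standalone sketch of the behaviour I'm describing: the framerate field is left out only when fps_n and fps_d are both 0, and kept otherwise. This is not the winegstreamer code and does not use GStreamer; struct sketch_video_format, sketch_format_to_caps() and the caps-like string layout are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the video part of a format description;
 * not the real struct wg_format. */
struct sketch_video_format
{
    int width, height;
    unsigned int fps_n, fps_d;
};

/* Hypothetical helper: writes a caps-like string into buf.
 * Returns the string length on success, or -1 if buf is too small. */
static int sketch_format_to_caps(const struct sketch_video_format *format,
        char *buf, size_t size)
{
    int len, ret;

    assert(format && buf && size);

    len = snprintf(buf, size, "video/x-raw, width=%d, height=%d",
            format->width, format->height);
    if (len < 0 || (size_t)len >= size)
        return -1;

    /* Both zero means "unspecified": leave the framerate field out so
     * that the caps can match any framerate. */
    if (!format->fps_n && !format->fps_d)
        return len;

    /* Otherwise keep the field, for elements that insist on it. */
    ret = snprintf(buf + len, size - len, ", framerate=%u/%u",
            format->fps_n, format->fps_d);
    if (ret < 0 || (size_t)ret >= size - len)
        return -1;

    return len + ret;
}
```

With this sketch, a format with fps_n = fps_d = 0 (the H264 decoder case) yields a string without a framerate field, while a format with, say, 30000/1001 (what the WMV decoder would need) keeps it.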