I think you're making things more complicated than they are. The wg_format struct is just another ad-hoc internal struct, and a subset of the native representations. Choosing another internal representation, in this case MFVIDEOFORMAT / WAVEFORMATEX, should be safe and sufficient to achieve exactly the same thing.
How does renaming a struct make anything easier?
It's not renamed; it's a different struct, which allows the frontend to be transitioned gradually. Does it even matter? It better matches native terminology anyway, where `format` is the representation and `media type` is the additional metadata around it.