We actually used to do something kind of like this, but moved away from it in 0d956959fa and b5916adfb6. There are advantages and disadvantages: on the one hand, the current code makes it easier to see exactly which fields are relevant to each format. On the other hand, we could potentially avoid some duplication, even forgetting about Proton. So I'm not really sure whether this is an improvement.
If we do go this way, I'd probably prefer to revert or half-revert the mentioned commits, rather than making the changes proposed here. I.e. don't add more nested structures; instead just merge the existing video_* and audio_* members into a single "video" and a single "audio" structure, the way it used to be. That's a bit simpler. I could see potentially moving the major types back to just audio and video as well, which would probably simplify a lot of the format conversion code. Also, we should probably explicitly mention in the header which fields matter for each format, since that's not spelled out in the API anymore.
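To make that concrete, here is a rough sketch of the shape I have in mind. It is not the actual header: the enum values, the field list (in particular "profile", "level" and "layer") and the helper wg_format_get_size() are all made up for illustration, and the real set of fields would come from whatever the existing video_* and audio_* members carry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative only: names and fields are assumptions, not the real header. */
enum wg_major_type
{
    WG_MAJOR_TYPE_UNKNOWN = 0,
    WG_MAJOR_TYPE_AUDIO,
    WG_MAJOR_TYPE_VIDEO,
};

enum wg_video_format
{
    WG_VIDEO_FORMAT_UNKNOWN = 0,
    WG_VIDEO_FORMAT_BGRA,   /* uncompressed */
    WG_VIDEO_FORMAT_H264,   /* compressed */
};

enum wg_audio_format
{
    WG_AUDIO_FORMAT_UNKNOWN = 0,
    WG_AUDIO_FORMAT_S16LE,  /* uncompressed */
    WG_AUDIO_FORMAT_MPEG1,  /* compressed */
};

struct wg_format
{
    enum wg_major_type major_type;

    union
    {
        /* Valid members, by format:
         *   BGRA: width, height, fps_n, fps_d.
         *   H264: width, height, fps_n, fps_d, profile, level. */
        struct
        {
            enum wg_video_format format;
            int32_t width, height;
            uint32_t fps_n, fps_d;
            uint32_t profile, level;
        } video;

        /* Valid members, by format:
         *   S16LE: channels, rate.
         *   MPEG1: channels, rate, layer. */
        struct
        {
            enum wg_audio_format format;
            uint32_t channels, rate;
            uint32_t layer;
        } audio;
    } u;
};

/* With a single "video" member, shared fields can be read after checking
 * only the major type, without a case per compressed format. */
static bool wg_format_get_size(const struct wg_format *format,
        int32_t *width, int32_t *height)
{
    assert(format && width && height);

    if (format->major_type != WG_MAJOR_TYPE_VIDEO)
        return false;

    *width = format->u.video.width;
    *height = format->u.video.height;
    return true;
}
```

The per-format comments are what replaces the information we currently get from having one structure per format; the trade-off is that the compiler no longer stops you from touching a field that is meaningless for the current format.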
I'm not thrilled about renaming "format" to "type", partly because it's now almost the same term as the one used for the major type, and partly because it matches neither "subtype" (mfplat, dshow/wmv) nor "format" (mfplat, GStreamer). Is there a motivation for this?