I've been thinking about this, and while I don't dislike the idea of feeding opaque formats through the frontend like this, I'm reluctant to deviate from the existing design when doing so still leaves problems that will need to be addressed some other way anyway. That's especially true since it isn't hard to check whether a format uses unrecognized caps. I do want failure modes to be easy to notice, but this doesn't seem like it should be hard to debug.
(I kind of do dislike feeding opaque formats through the frontend, mostly because of latency, but I suppose there aren't going to be many cases where it both matters and is something we can actually avoid.)
Also, is there a reason we need to handle the audio and video major types separately? Couldn't we just use the existing WG_MAJOR_TYPE_UNKNOWN and be done with it? That would mean a lot less work, which would make a change like this much easier to justify.
Relatedly, if the caps are unknown, why expose attributes such as channels, rate, width, height, and FPS at all?
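To make the last two points a bit more concrete, here is a minimal sketch of the shape I have in mind: a single "unknown" major type that carries no attributes at all. The names (`enum major_type`, `struct format`, `format_is_unknown()`, `format_get_rate()`) are hypothetical and deliberately simplified; this is not the real format structure behind WG_MAJOR_TYPE_UNKNOWN, just an illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified format descriptor for illustration only. */
enum major_type
{
    MAJOR_TYPE_UNKNOWN, /* caps we don't recognize: no attributes exposed */
    MAJOR_TYPE_AUDIO,
    MAJOR_TYPE_VIDEO,
};

struct format
{
    enum major_type major_type;
    union
    {
        struct
        {
            uint32_t channels;
            uint32_t rate;
        } audio;
        struct
        {
            int32_t width, height;
            uint32_t fps_n, fps_d;
        } video;
        /* No member for MAJOR_TYPE_UNKNOWN: if we can't parse the caps,
         * there is no channel count, rate, size or frame rate to report. */
    } u;
};

static bool format_is_unknown(const struct format *format)
{
    return format->major_type == MAJOR_TYPE_UNKNOWN;
}

/* Returns the sample rate for audio formats, or 0 if the format has none. */
static uint32_t format_get_rate(const struct format *format)
{
    return format->major_type == MAJOR_TYPE_AUDIO ? format->u.audio.rate : 0;
}
```

With this layout, callers only need one `format_is_unknown()` check rather than separate unknown-audio and unknown-video paths, and there are no half-filled attribute fields for an unrecognized format to get wrong.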