I wouldn't call either d3dbc.c or tpf.c very long, but I suppose that's a matter of opinion. I imagine it doesn't help that I find the practice of spreading source code over a million tiny files to be one of the most irritating things a codebase can do...
In any case, while it's perhaps not quite true at the moment, parsing and serialising the different bytecode formats should be closely related and use the same constants, structures, and so on. There's currently some HLSL-specific code intermixed that would more properly belong in hlsl.c/hlsl_codegen.c, but we can deal with that once we get to it.
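To illustrate what I mean by sharing constants and structures, here's a rough sketch. The token layout and all the `example_*`/`EXAMPLE_*` names are made up for the sake of the example and don't correspond to anything in d3dbc.c or tpf.c; the point is only that the reader and the writer are defined against the same masks and the same structure, which is easiest when they live next to each other.

```c
#include <stdint.h>

/* Hypothetical token layout, for illustration only: the opcode lives in the
 * low 16 bits, the instruction length in bits 24-27. */
#define EXAMPLE_OPCODE_MASK   0x0000ffffu
#define EXAMPLE_LENGTH_SHIFT  24
#define EXAMPLE_LENGTH_MASK   (0xfu << EXAMPLE_LENGTH_SHIFT)

/* Hypothetical in-memory form shared by the parser and the serialiser. */
struct example_instruction
{
    uint32_t opcode;
    uint32_t length;
};

/* Serialiser: packs an instruction into a token using the shared constants. */
static uint32_t example_write_token(const struct example_instruction *ins)
{
    return (ins->opcode & EXAMPLE_OPCODE_MASK)
            | ((ins->length << EXAMPLE_LENGTH_SHIFT) & EXAMPLE_LENGTH_MASK);
}

/* Parser: unpacks a token using the very same constants. */
static void example_read_token(uint32_t token, struct example_instruction *ins)
{
    ins->opcode = token & EXAMPLE_OPCODE_MASK;
    ins->length = (token & EXAMPLE_LENGTH_MASK) >> EXAMPLE_LENGTH_SHIFT;
}
```

If the two functions were split across separate files, the masks and the structure would have to move to a shared header anyway, and a change to the layout would mean touching three places instead of one.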