On Fri Mar 14 06:23:19 2025 +0000, Brendan McGrath wrote:
I've raised another MR (!7563) that shows that the MFT decoders use the timestamps from the input samples on their respective output samples. We don't currently do that; instead we generate our own based on the provided frame rate (which has caused bugs). But we certainly have examples of applications talking directly to the MFTs, and if they are providing timestamps on their input samples then we are currently not honouring them.

But if we want to change the MFTs to respect the timestamps on input samples, then we need to make sure our demuxers are providing them with sane values. And, as these tests show, the demuxers don't always agree. For example, take the results for `test-h264.mp4` (a file with just 5 samples):

Sample 0:

| | PTS | Duration | DTS |
| ---------------- | ------- | -------- | ------- |
| Windows | 1333332 | 333334 | 666666 |
| Wine (gstreamer) | 666666 | 333333 | None |
| Wine (ffmpeg) | 0 | 333333 | -666666 |
| Value in file | 0 | 333333 | 666666 |

Sample 1:

| | PTS | Duration | DTS |
| ---------------- | ------- | -------- | ------- |
| Windows | 2666665 | 333333 | 999999 |
| Wine (gstreamer) | 1000000 | 333333 | None |
| Wine (ffmpeg) | 1333333 | 333333 | -333333 |
| Value in file | 333333 | 333333 | 1666666 |

Sample 2:

| | PTS | Duration | DTS |
| ---------------- | ------- | -------- | ------- |
| Windows | 1999998 | 333334 | 1333332 |
| Wine (gstreamer) | 1333333 | 333333 | None |
| Wine (ffmpeg) | 666666 | 333333 | 0 |
| Value in file | 666666 | 333333 | 666666 |

Sample 3:

| | PTS | Duration | DTS |
| ---------------- | ------- | -------- | ------- |
| Windows | 1666666 | 333332 | None |
| Wine (gstreamer) | 1666666 | 333333 | None |
| Wine (ffmpeg) | 333333 | 333333 | 333333 |
| Value in file | 1000000 | 333333 | 0 |

Sample 4:

| | PTS | Duration | DTS |
| ---------------- | ------- | -------- | ------- |
| Windows | 2333332 | 333333 | 1999999 |
| Wine (gstreamer) | 2000000 | 333333 | None |
| Wine (ffmpeg) | 1000000 | 333333 | 666666 |
| Value in file | 1333333 | 333333 | 333333 |

So Wine with GStreamer doesn't output DTS at all, and with FFmpeg it outputs negative values. Windows is different again, but doesn't seem to use the raw values found within the actual file. Plus, Windows and FFmpeg seem to output the samples in ascending DTS order, whereas GStreamer outputs them in ascending PTS order.
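
The ordering observation can be checked mechanically against the tables. This is only a minimal sketch in Python, with the values copied from the tables above in output order; `is_ascending` is a hypothetical helper for illustration, not anything in Wine:

```python
# Timestamps copied from the tables for test-h264.mp4, in output order.
# None marks a sample that carried no DTS.
windows_dts = [666666, 999999, 1333332, None, 1999999]
ffmpeg_dts = [-666666, -333333, 0, 333333, 666666]
ffmpeg_pts = [0, 1333333, 666666, 333333, 1000000]
gstreamer_pts = [666666, 1000000, 1333333, 1666666, 2000000]


def is_ascending(values):
    """Return True if the non-missing values are strictly increasing."""
    present = [v for v in values if v is not None]
    return all(a < b for a, b in zip(present, present[1:]))


print("Windows DTS ascending:  ", is_ascending(windows_dts))
print("FFmpeg DTS ascending:   ", is_ascending(ffmpeg_dts))
print("FFmpeg PTS ascending:   ", is_ascending(ffmpeg_pts))
print("GStreamer PTS ascending:", is_ascending(gstreamer_pts))
```

The missing Windows DTS on sample 3 is skipped rather than treated as a value, which is an assumption about how to read `None` here.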
Actually, I think the DTS value in the file is a delta: the actual DTS is the PTS minus the value stored in the file. Certainly, for both Windows and FFmpeg, the difference between the PTS and DTS of each sample equals the DTS value from the file. So I guess it's just a matter of understanding why the PTS values are different. Possibly Windows adjusts the PTS values so that there are no negative DTS values (given it's passed as an unsigned 64-bit value). I'm thinking we should probably do the same.