> Dropping buffers whose PTS is too far behind would be SAR functionality, I think.
I haven't tested SAR with an external clock, but I found that when it uses its own clock (SAR implements the `IMFPresentationTimeSource` interface), it never drops samples; it just updates its clock to reflect the sample's PTS. I captured that in a test here: https://gitlab.winehq.org/redmcg/wine/-/commit/f940f12b3c7d562fb2fb00a68a3e7...
In that test, I confirm the clock is at `5510000`, then have SAR process a sample with PTS `20000000` and duration `100000`, sleep for 150 ms (to give SAR time to process the sample), and then confirm the clock has magically jumped to `20100000`, i.e. the sample's PTS plus its duration. Since Media Foundation times are in 100-nanosecond units, that is a jump of about 1.46 seconds (roughly 1.5 s) in a tenth of that time.
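To make the observed behavior concrete, here is a rough model of what the test checks. It is only a sketch: `SarClockModel`, `process_sample` and `to_seconds` are hypothetical names, not Wine or Media Foundation APIs, and it models only the end state the test asserts (the clock landing on PTS + duration instead of the sample being dropped), not how SAR's clock advances in between.

```python
# Hypothetical model of SAR acting as its own presentation time source.
# None of these names exist in Wine or Media Foundation; they are illustrative only.

MFTIME_UNITS_PER_SECOND = 10_000_000  # MF times are in 100-nanosecond units


class SarClockModel:
    """Toy stand-in for the observed clock behavior (an assumption, not SAR's code)."""

    def __init__(self, start_time: int) -> None:
        self.time = start_time  # current presentation time, in 100 ns units

    def process_sample(self, pts: int, duration: int) -> bool:
        """Consume an audio sample without ever dropping it.

        Returns True to signal the sample was rendered rather than discarded.
        """
        # Rather than discarding a sample whose PTS is far from the clock,
        # the clock is moved to the end of the rendered sample.
        self.time = pts + duration
        return True


def to_seconds(mftime: int) -> float:
    """Convert a 100 ns MF time value to seconds."""
    return mftime / MFTIME_UNITS_PER_SECOND


# Replay the numbers from the test.
start = 5_510_000  # 0.551 s
clock = SarClockModel(start_time=start)
rendered = clock.process_sample(pts=20_000_000, duration=100_000)
jump = clock.time - start
print(clock.time, to_seconds(jump))
```

With the test's values the clock ends at `20100000` and the jump works out to 14,590,000 units, i.e. about 1.46 s.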
I've found this matches the behavior of `IMFMediaEngine` too, so it presumably uses the `IMFPresentationTimeSource` provided by SAR. That means audio is never out of sync, because it determines the presentation time; it's up to the video to pause or speed up as necessary to keep audio and video in sync.
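A common way for video to follow an audio master clock is to compare each frame's PTS with the audio clock and either render it, hold it (pause), or skip it (catch up). The sketch below shows that general technique; it is not taken from `IMFMediaEngine` or Wine, and `schedule_video_frame`, `FrameAction` and the 20 ms `TOLERANCE` are all hypothetical choices.

```python
from enum import Enum


class FrameAction(Enum):
    RENDER = "render"
    WAIT = "wait"  # frame is early: hold it until the audio clock catches up
    DROP = "drop"  # frame is late: skip it so video catches up with audio


# Hypothetical tolerance: 20 ms expressed in 100 ns MF time units.
TOLERANCE = 200_000


def schedule_video_frame(frame_pts: int, audio_clock: int,
                         tolerance: int = TOLERANCE) -> FrameAction:
    """Decide what to do with a video frame, treating audio as the master clock."""
    if frame_pts > audio_clock + tolerance:
        return FrameAction.WAIT
    if frame_pts < audio_clock - tolerance:
        return FrameAction.DROP
    return FrameAction.RENDER


# After the audio clock jumps to 20100000 (as in the SAR test), a frame from
# the old position is dropped, a matching frame renders, and a future frame waits.
audio_clock = 20_100_000
print(schedule_video_frame(5_600_000, audio_clock))
print(schedule_video_frame(20_100_000, audio_clock))
print(schedule_video_frame(21_000_000, audio_clock))
```

Dropping late frames is just one way to "speed up"; a real player might instead decode faster or skip to the next keyframe.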
I found I needed to implement this behavior to fix seeking in VRChat (which uses `IMFMediaEngine`). I plan to raise an MR upstream soon, once I'm happy with it.