Just so I understand: does the SAR clock jump to PTS + <time it actually took to play this sample / time it would take according to the sample rate>, or does it jump blindly to PTS + the MF sample duration, ignoring actual playback time? I don't know how well the sample duration is validated against actual buffer sizes.
It will jump to PTS and then progress from there at the playback rate. I don't think it uses the duration value at all, just the buffer size to determine duration (which is easy to calculate from the frame size and sample rate).
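A rough sketch of that arithmetic, just to illustrate the point (the format values below are made-up examples, not anything taken from the renderer):

```c
/* Sketch: derive a buffer's duration from its byte count alone.
 * Byte count / frame size gives the frame count, and dividing by the
 * sample rate converts frames to time. Format values are illustrative. */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t sample_rate = 48000;      /* frames per second */
    uint32_t channels = 2;
    uint32_t bytes_per_sample = 2;     /* 16-bit PCM */
    uint32_t buffer_bytes = 19200;     /* e.g. a 100 ms buffer */

    uint32_t frame_size = channels * bytes_per_sample;
    uint64_t frames = buffer_bytes / frame_size;

    /* MF timestamps and durations are expressed in 100-ns units. */
    uint64_t duration_100ns = frames * 10000000ull / sample_rate;

    printf("duration: %llu (100-ns units), %.1f ms\n",
           (unsigned long long)duration_100ns, duration_100ns / 10000.0);
    return 0;
}
```

So whatever duration the sample claims, the time the buffer actually occupies on the clock falls out of the byte count and the format.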