because the flag can get set immediately after we read it.
But the waiter is not added until we cmpxchg the lock ptr, at which point we would notice that the flag got set and retry.
Then we run into an ABA problem, though: the spinlock can get released again by that point, so the cmpxchg can succeed even though the flag was set and cleared in between.
In fact, yes, I think doing this lock-free isn't going to work. Regardless of whether we need all head pointers to be correct or just the tail's head pointer, I don't see a way to avoid the head pointer being stale in this function.
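
To make the ABA concern concrete, here is a minimal single-threaded sketch that replays the problematic interleaving by hand. Everything in it is hypothetical and only for illustration: `struct demo_lock`, `LOCKED_FLAG`, the integer `head` standing in for the head pointer, and the fake waiter address are assumptions, not the real lock layout. It uses C11 atomics rather than the kernel's cmpxchg.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit stored in the low bit of the lock word. */
#define LOCKED_FLAG ((uintptr_t)1)

/* Hypothetical lock: one atomic word (tail pointer plus flag bit) and a
 * separate head value that changes every time the lock is released. */
struct demo_lock {
    _Atomic uintptr_t word;
    int head; /* stand-in for the head pointer */
};

static void demo_acquire(struct demo_lock *l)
{
    atomic_fetch_or(&l->word, LOCKED_FLAG);
}

static void demo_release(struct demo_lock *l)
{
    l->head++; /* the head moves on */
    atomic_fetch_and(&l->word, ~LOCKED_FLAG);
}

/* Replays the interleaving sequentially. Returns true when the
 * compare-exchange succeeds although head changed after the snapshot. */
bool aba_demo(void)
{
    struct demo_lock l;
    atomic_init(&l.word, 0);
    l.head = 0;

    /* "We" read the lock word and the head. */
    uintptr_t seen = atomic_load(&l.word);
    int seen_head = l.head;
    assert((seen & LOCKED_FLAG) == 0); /* the snapshot saw the flag clear */

    /* "Someone else" sets the flag and releases again before our CAS. */
    demo_acquire(&l);
    demo_release(&l);

    /* The word is back to the value we saw, so the CAS cannot tell. */
    uintptr_t new_word = (uintptr_t)0x1000; /* pretend waiter address */
    bool swapped = atomic_compare_exchange_strong(&l.word, &seen, new_word);

    return swapped && seen_head != l.head;
}
```

Calling `aba_demo()` from a `main` returns true here: the exchange goes through, yet the head value captured before it is already stale. That is the same shape as the issue above, where comparing only the lock word cannot detect an intervening set-and-release.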