I looked at this during development. When an object is freed it must be marked as free within its `REBALANCE_SIZE` packet, and the only way to make that object available for reuse is to add the packet to the cache. So we end up adding packets containing only one free object. But we can't send packets to the rebalancing array unless they are completely free, otherwise multiple threads can allocate and free objects within the same packet, and we are back to square one.
Yeah, I think I wrote that in a confusing way. What I meant is that, instead of storing an array of objects in the rebalance object, you can store an array of pointers to arrays of `REBALANCE_SIZE` objects. When you have `REBALANCE_SIZE` objects to push, you first build an array holding all of them outside the critical region; inside the critical region you then only have to copy one pointer instead of `REBALANCE_SIZE` of them. But I just realized that this means you have to allocate another thing (the array itself), and maybe we don't want to handle that.
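To make the idea concrete, here is a minimal sketch in C, assuming a pthread mutex protects the rebalance object. `rebalance_t`, `rebalance_push`, `rebalance_pop`, and `REBALANCE_SLOTS` are hypothetical names, the value of `REBALANCE_SIZE` is a placeholder, and `malloc` stands in for whatever allocation we would actually use for the batch array:

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define REBALANCE_SIZE  8   /* placeholder value, not the real one */
#define REBALANCE_SLOTS 16  /* hypothetical capacity of the rebalance array */

typedef struct {
    pthread_mutex_t lock;
    /* Each slot points to a separately allocated array of
     * REBALANCE_SIZE object pointers. */
    void **batches[REBALANCE_SLOTS];
    size_t count;
} rebalance_t;

/* Returns 0 on success, -1 if the allocation fails or all slots are full. */
static int rebalance_push(rebalance_t *r, void *const objs[REBALANCE_SIZE])
{
    /* Build the batch outside the critical region.
     * This is the extra allocation mentioned above. */
    void **batch = malloc(REBALANCE_SIZE * sizeof *batch);
    if (batch == NULL)
        return -1;
    memcpy(batch, objs, REBALANCE_SIZE * sizeof *batch);

    int rc = -1;
    pthread_mutex_lock(&r->lock);
    if (r->count < REBALANCE_SLOTS) {
        r->batches[r->count++] = batch;  /* one pointer copy under the lock */
        rc = 0;
    }
    pthread_mutex_unlock(&r->lock);

    if (rc != 0)
        free(batch);  /* no free slot: release the batch array again */
    return rc;
}

/* Returns a batch of REBALANCE_SIZE object pointers, or NULL if none is
 * available. The caller owns the returned array and must free() it. */
static void **rebalance_pop(rebalance_t *r)
{
    void **batch = NULL;
    pthread_mutex_lock(&r->lock);
    if (r->count > 0)
        batch = r->batches[--r->count];
    pthread_mutex_unlock(&r->lock);
    return batch;
}
```

The critical region shrinks to a bounds check and a single pointer store, but every push now pays for a `malloc` and every pop for a `free`, which is the cost I was worried about.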
It may be slightly advantageous to transfer more than `REBALANCE_SIZE` because of cache coherence and the overhead of mutex locking/unlocking. Do you see any other advantages?
Not that I am aware of. OTOH, transferring more than `REBALANCE_SIZE` objects also means that the lock can stay held for longer, which might be a source of jitter. It's really not clear which option is better here.