The rebalancing algorithm seems quite complicated, so we really it that way? Could you explain the reasoning behind it?
A simpler alternative would be to fix some number N (say, 64), and each time a thread reaches N it donates N/2 to the rebalance, and each time a thread is empty it fetches N/2 from the balance (or allocates).