I changed it to overallocate and store the pointer separately. Heap pointers are always aligned to 2 * sizeof(void *), so after rounding up to 16-byte alignment, the padding is either 8 or 16 bytes on 32-bit and always 16 bytes on 64-bit.
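A rough sketch of how the padding arithmetic works out, not the actual patch. It assumes the aligned address is the next 16-byte boundary strictly above the malloc pointer, and that the malloc pointer is kept in the padding just below it; both are guesses about the scheme. The names `padding_for`, `alloc_aligned16` and `free_aligned16` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ALIGNMENT 16u

/* Hypothetical helper: bytes skipped when advancing `raw` to the next
 * 16-byte boundary strictly above it. With raw % 16 == 0 this is 16,
 * with raw % 16 == 8 it is 8. */
static size_t padding_for(uintptr_t raw)
{
    return ALIGNMENT - (size_t)(raw % ALIGNMENT);
}

/* Hypothetical allocator: overallocate by 16 bytes, align, and keep the
 * malloc pointer in the padding directly below the aligned address. */
static void *alloc_aligned16(size_t size)
{
    unsigned char *raw;
    unsigned char *aligned;
    size_t pad;

    if (size > SIZE_MAX - ALIGNMENT)
        return NULL;                      /* size + ALIGNMENT would overflow */

    raw = malloc(size + ALIGNMENT);
    if (raw == NULL)
        return NULL;

    pad = padding_for((uintptr_t)raw);
    /* Assumption: malloc aligns to 2 * sizeof(void *), so pad is 8 or 16
     * and always leaves room for one pointer. */
    assert(pad >= sizeof(void *));

    aligned = raw + pad;
    memcpy(aligned - sizeof(void *), &raw, sizeof raw);
    return aligned;
}

/* Releases a block obtained from alloc_aligned16. */
static void free_aligned16(void *aligned)
{
    void *raw;

    if (aligned == NULL)
        return;
    memcpy(&raw, (unsigned char *)aligned - sizeof(void *), sizeof raw);
    free(raw);
}
```

If the allocator only guaranteed a weaker alignment than 2 * sizeof(void *), the padding could be smaller than a pointer, which is what the assert guards against in this sketch.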