I sent several patches for D3DXMatrixStack. All of them were incorrect.
Henri's idea for implementing it is this: allocating and freeing memory is very time-consuming, so it should be done as rarely as possible.
So we start with a stack of a predefined size. When the stack is full, we double its size. When enough items have been popped from the stack, we halve its size.
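A minimal sketch of that growth policy, assuming a plain array of 4x4 float matrices. The type and function names (`matrix`, `matrix_stack`, `stack_push`, `stack_pop`), the initial size of 32, and the "shrink when only a quarter is used" threshold are all hypothetical choices for illustration, not taken from the patches. Shrinking at a quarter rather than at a half avoids reallocating back and forth when a caller alternates push and pop right at a size boundary.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for D3DXMATRIX. */
typedef struct { float m[4][4]; } matrix;

/* Hypothetical stack layout, for illustration only. */
typedef struct {
    matrix *stack;          /* heap array of matrices */
    unsigned int current;   /* index of the top matrix */
    unsigned int capacity;  /* number of slots allocated */
} matrix_stack;

#define INITIAL_CAPACITY 32u  /* assumed predefined size */

static void set_identity(matrix *m)
{
    memset(m, 0, sizeof(*m));
    for (int i = 0; i < 4; i++)
        m->m[i][i] = 1.0f;
}

int stack_init(matrix_stack *s)
{
    s->stack = malloc(INITIAL_CAPACITY * sizeof(*s->stack));
    if (!s->stack)
        return -1;
    s->capacity = INITIAL_CAPACITY;
    s->current = 0;
    set_identity(&s->stack[0]);  /* the stack always holds at least one matrix */
    return 0;
}

/* Duplicates the top matrix; doubles the array only when it is full. */
int stack_push(matrix_stack *s)
{
    if (s->current + 1 == s->capacity) {
        matrix *grown = realloc(s->stack, 2 * s->capacity * sizeof(*grown));
        if (!grown)
            return -1;
        s->stack = grown;
        s->capacity *= 2;
    }
    assert(s->current + 1 < s->capacity);
    s->stack[s->current + 1] = s->stack[s->current];
    s->current++;
    return 0;
}

/* Removes the top matrix; halves the array once only a quarter is in use. */
int stack_pop(matrix_stack *s)
{
    if (s->current == 0)
        return -1;  /* never pop the last matrix */
    s->current--;
    if (s->capacity > INITIAL_CAPACITY && s->current < s->capacity / 4) {
        matrix *shrunk = realloc(s->stack, s->capacity / 2 * sizeof(*shrunk));
        if (shrunk) {  /* if shrinking fails, keeping the larger block is harmless */
            s->stack = shrunk;
            s->capacity /= 2;
        }
    }
    return 0;
}

void stack_free(matrix_stack *s)
{
    free(s->stack);
    s->stack = NULL;
    s->capacity = s->current = 0;
}
```

With this policy, 100 pushes cost only two reallocations (32 to 64 to 128), and popping everything again shrinks the array back to 32 in two steps.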
Here are the patches that implement that idea.
Looking at your patch, I noticed that in my patch the D3dxMatrixstackImpl_release function does not free the memory of the matrix array. It should.
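A rough sketch of what the fix amounts to, assuming the object keeps its matrices in a separately allocated array next to a COM-style reference count. The struct and function names here are hypothetical, and plain malloc/free stand in for whatever allocator the real code uses. The point is simply that when the count reaches zero, the array must be freed before the object that owns it.

```c
#include <stdlib.h>

/* Hypothetical stand-in for D3DXMATRIX. */
typedef struct { float m[4][4]; } matrix;

/* Hypothetical object layout, for illustration only. */
typedef struct {
    unsigned long ref;      /* COM-style reference count */
    matrix *stack;          /* separately allocated array of matrices */
    unsigned int capacity;
} matrix_stack_impl;

matrix_stack_impl *matrix_stack_create(unsigned int capacity)
{
    matrix_stack_impl *obj = malloc(sizeof(*obj));
    if (!obj)
        return NULL;
    obj->stack = calloc(capacity, sizeof(*obj->stack));
    if (!obj->stack) {
        free(obj);
        return NULL;
    }
    obj->ref = 1;
    obj->capacity = capacity;
    return obj;
}

unsigned long matrix_stack_addref(matrix_stack_impl *obj)
{
    return ++obj->ref;
}

unsigned long matrix_stack_release(matrix_stack_impl *obj)
{
    unsigned long ref = --obj->ref;
    if (ref == 0) {
        free(obj->stack);  /* free the matrix array first... */
        free(obj);         /* ...then the object that owned it */
    }
    return ref;
}
```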
In the test patch, I did not check whether D3DXCreateMatrixStack fails; if it does, the tests should be skipped. I also did not call the stack's Release function at the end of the test. That should be done too.
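The intended test structure can be sketched as follows. Everything here is a hypothetical, self-contained stand-in: `fake_create_stack` plays the role of D3DXCreateMatrixStack (returning 0 on success), `fake_release` plays the role of Release, and printf replaces the reporting macros a real Wine test would presumably use, such as `skip()` and `ok()`. It shows only the shape: bail out before touching the object if creation fails, and release the object at the end.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the COM object. */
typedef struct { unsigned long ref; } fake_stack;

/* Hypothetical stand-in for D3DXCreateMatrixStack: returns 0 on success. */
static int fake_create_stack(int should_fail, fake_stack **out)
{
    *out = NULL;
    if (should_fail)
        return -1;
    *out = malloc(sizeof(**out));
    if (!*out)
        return -1;
    (*out)->ref = 1;
    return 0;
}

/* Hypothetical stand-in for Release. */
static unsigned long fake_release(fake_stack *s)
{
    unsigned long ref = --s->ref;
    if (!ref)
        free(s);
    return ref;
}

/* Returns the number of checks run; 0 means the test was skipped. */
static int test_matrix_stack(int simulate_failure)
{
    fake_stack *stack;
    int checks = 0;

    if (fake_create_stack(simulate_failure, &stack) != 0) {
        printf("skipping: matrix stack creation failed\n");
        return 0;  /* skip: never touch a stack that was not created */
    }

    /* ... the real checks on the stack would go here ... */
    checks++;

    /* Release at the end so the test does not leak the object. */
    checks += fake_release(stack) == 0;
    return checks;
}
```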
Maybe you could use these patches and try to improve them.
All the tests passed on my Windows XP box.
David