Afaik 3DMark can also generate performance graphs.
I prefer to have the raw data myself, to do my own processing, so I'm not sure if that helps us. But graphs are just a nice-to-have; we don't need them.
We could always try writing something ourselves, I guess :-) The problem there is going to be creating something representative, though.
Ya, I don't think our own tests will help us much.
What about random elements in the game? More detailed data from wined3d could also be used to draw nice graphs for other benchmarks :-)
NVPerfSDK could potentially help there, if the license permits.
I'm concerned about the NV in NVPerfSDK :-) Of course it's a tool we can use, but we shouldn't depend on it.
One issue I see is actually interpreting the results. When is a performance drop large enough to be a problem? Sometimes a change will slightly reduce performance for some applications, but significantly improve it for others.
That's a good point, and I've been thinking about it a bit. All we can use is some heuristics. Throwing a warning on a single 0.01 fps drop is overkill, because there are minor differences between runs even with exactly the same code. On the other hand, if a number of 0.5 fps drops sum up to a drop from 80 to 75 fps, that is something that should perhaps be investigated, so we should check against a fixed reference value rather than the previous day's result.
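Something like the following is what I have in mind, as a rough Python sketch; the benchmark names, reference values and the 2% tolerance are made-up placeholders, not anything we have today:

#!/usr/bin/env python3

# Fixed reference values, adjusted by hand whenever we decide a
# drop is acceptable.
REFERENCE_FPS = {
    "3dmark2000": 80.0,
    "3dmark2001": 120.5,
}

# A 2% relative tolerance covers the minor differences between runs
# that occur even with exactly the same code.
TOLERANCE = 0.02

def check_results(results):
    """Return warning strings for drops beyond the tolerance."""
    warnings = []
    for benchmark, fps in results.items():
        reference = REFERENCE_FPS.get(benchmark)
        if reference is None:
            continue
        if fps < reference * (1.0 - TOLERANCE):
            warnings.append("%s: %.2f fps, reference is %.2f fps"
                            % (benchmark, fps, reference))
    return warnings

if __name__ == "__main__":
    # A series of small drops that add up to 80 -> 75 fps still trips
    # the check, because we compare against the fixed reference value
    # rather than the previous day's result.
    print("\n".join(check_results({"3dmark2000": 75.0})))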
Whether a performance drop is a real regression or an unavoidable side effect of something else, like a rendering correctness fix, is something we have to decide per incident. The automated tests can only generate a warning (e.g. per mail). Then we have to look at the warning and decide what to do with it: either fix the regression, if it is one, or adjust the reference value.
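For the warning mail, a sketch along these lines would do, assuming the check above and a local SMTP server; the addresses are placeholders:

import smtplib
from email.message import EmailMessage

def mail_warnings(warnings, sender="perf-bot@example.org",
                  recipient="wine-devel@example.org"):
    """Send the collected warnings as one mail; a human then decides
    whether to fix the regression or to adjust the reference value."""
    if not warnings:
        return
    msg = EmailMessage()
    msg["Subject"] = "Possible performance regression"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(warnings))
    with smtplib.SMTP("localhost") as server:
        server.send_message(msg)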