I don't think the flattener should be "hijacked" for this pass; it is already complicated enough. It's probably appropriate to have a single pass that collects all the small local operations like this one, instead of iterating over the whole instruction array once per operation, but that should be a separate pass from the flattener. As Henri noted, the same pass could handle this as well as lowering texkills (and probably removing DCL_TEMPS too).