"Mikolaj Zalewski" <mikolajz@google.com> writes:
I'm not sure I understood what it's supposed to look like. Is something like this good? It adds an 'if' in the inner loop, but it's an 'if' that is easy to predict, and this version makes fewer memory reads, so maybe it will be better.
You should put the 'if' outside the loops: keep the existing fast case with the memcpys, and add a separate, slower path for when alpha needs patching.
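
Roughly the shape I mean, as an untested sketch rather than the actual patch. The function name, the parameters, the 32-bit pixel layout, and the assumption that "patching" means forcing the alpha byte to 0xff are all made up for illustration:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy `height` rows of `width` 32-bit pixels.
 * Strides are in pixels. If patch_alpha is nonzero, the top byte of
 * every pixel is forced to 0xff (assumed meaning of "patching"). */
void copy_rows(uint32_t *dst, const uint32_t *src,
               int width, int height,
               int dst_stride, int src_stride,
               int patch_alpha)
{
    int x, y;

    if (!patch_alpha)
    {
        /* Fast case: one memcpy per row, no per-pixel work. */
        for (y = 0; y < height; y++)
            memcpy(dst + y * dst_stride, src + y * src_stride,
                   width * sizeof(*dst));
    }
    else
    {
        /* Slower case: visit every pixel and set its alpha byte. */
        for (y = 0; y < height; y++)
        {
            const uint32_t *s = src + y * src_stride;
            uint32_t *d = dst + y * dst_stride;

            for (x = 0; x < width; x++)
                d[x] = s[x] | 0xff000000u;
        }
    }
}
```

The test is done once per call instead of once per pixel, so the common case costs exactly what it does today.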