On Mon Sep 25 16:45:04 2023 +0000, Evan Tang wrote:
You don't have to change the coefficients, just accumulate in the opposite order, like `((coeff_x3 * x + coeff_x2) * x + coeff_x1) * x + coeff_x0`. All modern GPUs have multiply-add instructions, so that will compile to three madds, which should be nice and fast.
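For reference, a rough sketch of that accumulation order in plain C, next to the power form it replaces. The function names `poly3_horner` and `poly3_power` are made up for illustration and are not from the patch. Whether each step really becomes a single multiply-add depends on the compiler and target, so treat that as an expectation rather than a guarantee.

```c
/* Hypothetical helper: evaluates c3*x^3 + c2*x^2 + c1*x + c0 in nested
 * (Horner) order. Each line is one multiply followed by one add, which a
 * shader compiler can typically map to a single multiply-add. */
static float poly3_horner(float x, float c3, float c2, float c1, float c0)
{
    float r = c3 * x + c2;  /* step 1 */
    r = r * x + c1;         /* step 2 */
    r = r * x + c0;         /* step 3 */
    return r;
}

/* Hypothetical helper: the same polynomial in power form, for comparison.
 * Same coefficients, but more multiplies. */
static float poly3_power(float x, float c3, float c2, float c1, float c0)
{
    return c0 + c1 * x + c2 * x * x + c3 * x * x * x;
}
```

Both take the same coefficients, so only the order of evaluation changes.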
And while the atan function gives up a little accuracy to avoid the discontinuity, acos is slightly more accurate on average: https://www.desmos.com/calculator/3ncrqzazxa