On Mon Sep 25 15:41:25 2023 +0000, Petrichor Park wrote:
Evan Tang wrote the algorithms for all these. He solved for the coefficients using Wolfram Alpha because he felt a little worried about using Microsoft's output directly. Apparently, Evan's numbers are slightly less accurate overall, but don't yield a discontinuity at X=0. It doesn't look like he has an account here but we could pull him into the discussion on Matrix?
You don't have to change the coefficients, just accumulate in the opposite order (Horner's method), starting from the highest-order term: `((coeff_x3 * x + coeff_x2) * x + coeff_x1) * x + coeff_x0`.
All modern GPUs have multiply-add instructions, so that will compile to three madds, which should be nice and fast.
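
To make the suggestion concrete, here is a rough sketch in plain C of the two evaluation orders for a cubic. The function names and parameter names are made up for illustration, and the coefficients are passed in as placeholders; they are not the values from the patch. Whether each step really becomes a single fused multiply-add depends on the compiler and target, so treat that part as an assumption.

```c
/* Hypothetical helper: evaluates c3*x^3 + c2*x^2 + c1*x + c0 term by term.
 * The powers of x are built up separately, which costs extra multiplies. */
static float poly3_naive(float x, float c3, float c2, float c1, float c0)
{
    return c3 * x * x * x + c2 * x * x + c1 * x + c0;
}

/* Hypothetical helper: same polynomial, accumulated from the highest-order
 * coefficient down (Horner form). Each line is one multiply followed by one
 * add, which is the shape a multiply-add instruction can cover. */
static float poly3_horner(float x, float c3, float c2, float c1, float c0)
{
    float r = c3 * x + c2;   /* first multiply-add */
    r = r * x + c1;          /* second multiply-add */
    return r * x + c0;       /* third multiply-add */
}
```

Both functions take the same coefficients, so switching between them needs no re-fitting. The results can differ in the last bits because the rounding happens in a different order.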