Giovanni Mascellani (@giomasce) commented about libs/vkd3d-shader/hlsl.y:
const struct parse_initializer *params, const struct vkd3d_shader_location *loc, bool asin_mode)
+{
- struct hlsl_ir_function_decl *func;
- struct hlsl_type *type;
- char *body;
- static const char template[] =
"%s %s(%s x)\n"
"{\n"
" %s abs_arg = abs(x);\n"
" %s correction = sqrt(1.0 - abs_arg);\n"
" %s result = correction * (\n"
" (3.14159265 / 2.0)\n"
" - (0.2127403136003234 * abs_arg)\n"
" + (0.07612092595257536 * abs_arg * abs_arg)\n"
" - (0.01996337677405357 * abs_arg * abs_arg * abs_arg)\n"
Note that native, at least in my tests, evaluates the polynomial using [Horner's method](https://en.wikipedia.org/wiki/Horner%27s_method), which is probably more efficient (it takes only three multiplications instead of six in this case, if I'm not mistaken). That would amount to something like `((-0.01996337677405357f * abs_arg + 0.07612092595257536f) * abs_arg - 0.2127403136003234f) * abs_arg + 1.570796325f`.
Also note that floating point numbers do not need so many significant digits ([see this nice tool](https://evanw.github.io/float-toy/)), and that the native coefficients are slightly different from yours; I'm not sure why. Where do your coefficients come from? Your tests have a rather large error margin set: while this is not a problem on its own, maybe by using the same coefficients as native you could make it smaller.
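
To make the Horner suggestion concrete, here is a minimal C sketch comparing the two evaluation orders. The coefficients are copied from the template in this patch (not the native ones), and the names `coeffs`, `poly_naive` and `poly_horner` are hypothetical, chosen only for illustration. It covers only the polynomial part, not the `sqrt(1.0 - abs_arg)` correction factor.

```c
/* Coefficients taken from the template under review, lowest degree first.
 * The constant term is 3.14159265 / 2.0 as written in the patch. */
static const float coeffs[4] =
{
    1.570796325f,
    -0.2127403136003234f,
    0.07612092595257536f,
    -0.01996337677405357f,
};

/* Term-by-term evaluation, as in the patch: six multiplications. */
static float poly_naive(float a)
{
    return coeffs[0]
            + coeffs[1] * a
            + coeffs[2] * a * a
            + coeffs[3] * a * a * a;
}

/* Horner's method: the same polynomial with three multiplications. */
static float poly_horner(float a)
{
    return ((coeffs[3] * a + coeffs[2]) * a + coeffs[1]) * a + coeffs[0];
}
```

Both functions compute the same cubic, so for `abs_arg` in [0, 1] they should agree up to float rounding; the Horner form just gets there with fewer operations.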