Giovanni Mascellani (@giomasce) commented about libs/vkd3d-shader/hlsl.y:
+ const struct parse_initializer *params, const struct vkd3d_shader_location *loc, bool asin_mode)
+{
+ struct hlsl_ir_function_decl *func;
+ struct hlsl_type *type;
+ char *body;
+
+ static const char template[] =
+ "%s %s(%s x)\n"
+ "{\n"
+ " %s abs_arg = abs(x);\n"
+ " %s correction = sqrt(1.0 - abs_arg);\n"
+ " %s result = correction * (\n"
+ " (3.14159265 / 2.0)\n"
+ " - (0.2127403136003234 * abs_arg)\n"
+ " + (0.07612092595257536 * abs_arg * abs_arg)\n"
+ " - (0.01996337677405357 * abs_arg * abs_arg * abs_arg)\n"

Note that native, at least in my tests, evaluates the polynomial using [Horner's method](https://en.wikipedia.org/wiki/Horner's_method), which is probably more efficient (it takes only three multiplications instead of six in this case, if I'm not mistaken). That would amount to something like `((-0.01996337677405357f * abs_arg + 0.07612092595257536f) * abs_arg - 0.2127403136003234f) * abs_arg + 1.570796325f`.
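To make the comparison concrete, here is a rough C sketch of the two evaluation orders. The function names `poly_naive` and `poly_horner` are made up for illustration, and the coefficients are simply the ones from the quoted template (not the native ones); only the polynomial factor is shown, not the `sqrt(1.0 - abs_arg)` correction.

```c
/* Coefficients copied from the quoted template; C0 is 3.14159265 / 2.0. */
static const float C0 = 1.570796325f;
static const float C1 = 0.2127403136003234f;
static const float C2 = 0.07612092595257536f;
static const float C3 = 0.01996337677405357f;

/* Term-by-term evaluation, as written in the template: six multiplications. */
float poly_naive(float a)
{
    return C0 - (C1 * a) + (C2 * a * a) - (C3 * a * a * a);
}

/* Horner's method: the same cubic with three multiplications. */
float poly_horner(float a)
{
    return ((-C3 * a + C2) * a - C1) * a + C0;
}
```

Both functions should agree up to floating-point rounding; the Horner form just gets there with fewer multiplications.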
Note also that floating point numbers do not need so many significant digits ([see this nice tool](https://evanw.github.io/float-toy/)), and that the native coefficients are slightly different from yours; I'm not sure why. Where do your coefficients come from? Your tests have a rather large error margin set: while this is not a problem on its own, maybe by using the same coefficients as native you could make that margin smaller.

--
https://gitlab.winehq.org/wine/vkd3d/-/merge_requests/364#note_46616