On Wed, 12 Jun 2019 at 15:33, Paul Gofman <gofmanp(a)gmail.com> wrote:

> -    shader_addline(buffer, "tmp0.x = dot(%s, %s);\n",
> -            src_param.param_str, src_param.param_str);
> +    if (mask_size > 3)
> +        shader_addline(buffer, "tmp0.x = dot(vec3(%s), vec3(%s));\n",
> +                src_param.param_str, src_param.param_str);
> +    else
> +        shader_addline(buffer, "tmp0.x = dot(%s, %s);\n",
> +                src_param.param_str, src_param.param_str);

This is fine.

> -    if (mask_size > 1)
> +    if (mask_size == 4)
> +    {
> +        static const float max_float = FLT_MAX;
> +
> +        shader_addline(buffer, "tmp0.x == 0.0 ? vec4(vec3(0.0), sign(%s[3]) * ",
> +                src_param.param_str);
> +        shader_glsl_append_imm_vec(buffer, &max_float, 1, ins->ctx->gl_info);
> +        shader_addline(buffer, ") : (%s * inversesqrt(tmp0.x)));\n", src_param.param_str);
> +    }
> +    else if (mask_size > 1)

This seems like a separate change. I'm also not sure about the FLT_MAX literal. I'd expect that you could achieve the same test results by simply multiplying the .w component by the rsq of tmp0.x. (At least under d3d9's "zero wins" rules; under IEEE rules there would be a potential NaN.)