On Wed, 12 Jun 2019 at 15:33, Paul Gofman <gofmanp@gmail.com> wrote:
> -    shader_addline(buffer, "tmp0.x = dot(%s, %s);\n",
> -            src_param.param_str, src_param.param_str);
> +    if (mask_size > 3)
> +        shader_addline(buffer, "tmp0.x = dot(vec3(%s), vec3(%s));\n",
> +                src_param.param_str, src_param.param_str);
> +    else
> +        shader_addline(buffer, "tmp0.x = dot(%s, %s);\n",
> +                src_param.param_str, src_param.param_str);
This is fine.
> -    if (mask_size > 1)
> +    if (mask_size == 4)
> +    {
> +        static const float max_float = FLT_MAX;
> +        shader_addline(buffer, "tmp0.x == 0.0 ? vec4(vec3(0.0), sign(%s[3]) * ",
> +                src_param.param_str);
> +        shader_glsl_append_imm_vec(buffer, &max_float, 1, ins->ctx->gl_info);
> +        shader_addline(buffer, ") : (%s * inversesqrt(tmp0.x)));\n", src_param.param_str);
> +    }
> +    else if (mask_size > 1)
This seems like a separate change. I'm also not sure about the FLT_MAX literal. I'd expect that you could achieve the same test results by simply multiplying the .w component by the rsq of tmp0.x. (Under d3d9's "zero wins" rules at least; there would be a potential NaN under IEEE rules.)