cmp instruction in GLSL - problems
This patch is causing trouble for me. The current GLSL cmp implementation is wrong - here's 4 reasons why, only the first of which is mentioned in the source: - it ignores destination write mask - it ignores source swizzle - it ignores other source modifiers. - it works incorrectly src0 = 0 So, I replaced it with something that's supposed to fix all the issues above - based on per-component lessThan operation followed by a mix(), which blends the two things we're selecting between, but the blend factor is always 0 or 1 exactly, so it's always a select operation, not a blend. Patch improved: Hair, TransparentShadowMapping Patch broke: 3Dc, PhongLighting, ASCII, SelfShadowBump I still think it's correct - tested behavior separately, and it seems to work... but it breaks other demos. Any suggestions? All the demos above can be found on http://www.humus.ca/index.php?page=3D. They require UsingGLSL to be enabled in the registry, and a GLSL-capable card. ========= Note that all of the above demos used replicate swizzle in at least one argument, which the previous implementation did not take into account. A lot of them did this: cmp r1.a, r1.a, 0, 1 mad r0, r0, r1.a, something_else So, that looks like z-buffer test. Note that with the previous implementation, the comparison line would ignore the write mask and write to all 4 channels, which should cause a result that's too bright in all channels. I think that's exactly what happened in a number of these demos, PhongLighting in particular. If I use the new code, and simply swap the two arguments in the cmp test, the demo will work for the most part, a lot more is shown than black screen, and the color balance looks much closer to what's shown in the picture online. ...but.. I shouldn't have to swap the two arguments in cmp, they are correct as is, imho, and the previous implementation worked by accident. --- dlls/wined3d/glsl_shader.c | 11 +++++------ 1 files changed, 5 insertions(+), 6 deletions(-) diff --git a/dlls/wined3d/glsl_shader.c b/dlls/wined3d/glsl_shader.c index 223a8e0..23f944c 100644 --- a/dlls/wined3d/glsl_shader.c +++ b/dlls/wined3d/glsl_shader.c @@ -987,20 +987,19 @@ void shader_glsl_compare(SHADER_OPCODE_A /** Process CMP instruction in GLSL (dst = src0.x > 0.0 ? src1.x : src2.x), per channel */ void shader_glsl_cmp(SHADER_OPCODE_ARG* arg) { + char tmpLine[256]; char dst_str[100], src0_str[100], src1_str[100], src2_str[100]; char dst_reg[50], src0_reg[50], src1_reg[50], src2_reg[50]; char dst_mask[6], src0_mask[6], src1_mask[6], src2_mask[6]; - + shader_glsl_add_param(arg, arg->dst, 0, FALSE, dst_reg, dst_mask, dst_str); shader_glsl_add_param(arg, arg->src[0], arg->src_addr[0], TRUE, src0_reg, src0_mask, src0_str); shader_glsl_add_param(arg, arg->src[1], arg->src_addr[1], TRUE, src1_reg, src1_mask, src1_str); shader_glsl_add_param(arg, arg->src[2], arg->src_addr[2], TRUE, src2_reg, src2_mask, src2_str); - /* FIXME: This isn't correct - doesn't take the dst's swizzle into account. */ - shader_addline(arg->buffer, "%s.x = (%s.x > 0.0) ? %s.x : %s.x;\n", dst_reg, src0_reg, src1_reg, src2_reg); - shader_addline(arg->buffer, "%s.y = (%s.y > 0.0) ? %s.y : %s.y;\n", dst_reg, src0_reg, src1_reg, src2_reg); - shader_addline(arg->buffer, "%s.z = (%s.z > 0.0) ? %s.z : %s.z;\n", dst_reg, src0_reg, src1_reg, src2_reg); - shader_addline(arg->buffer, "%s.w = (%s.w > 0.0) ? %s.w : %s.w;\n", dst_reg, src0_reg, src1_reg, src2_reg); + shader_glsl_add_dst(arg->dst, dst_reg, dst_mask, tmpLine); + shader_addline(arg->buffer, "%smix(vec4(%s), vec4(%s), vec4(lessThan(vec4(%s), vec4(0.0)))))%s;\n", + tmpLine, src1_str, src2_str, src0_str, dst_mask); } /** Process the CND opcode in GLSL (dst = (src0 < 0.5) ? src1 : src2) */ -- 1.4.0
- shader_addline(arg->buffer, "%s.w = (%s.w > 0.0) ? %s.w : %s.w;\n", dst_reg, src0_reg, src1_reg, src2_reg); + shader_glsl_add_dst(arg->dst, dst_reg, dst_mask, tmpLine); + shader_addline(arg->buffer, "%smix(vec4(%s), vec4(%s), vec4(lessThan(vec4(%s), vec4(0.0)))))%s;\n", + tmpLine, src1_str, src2_str, src0_str, dst_mask); } To give an idea as to what this is doing, see the following translation, taken from the Hair demo:
Original Shader: cmp r1.rg, r0.r, abs(v0).brba, abs(v0).rbba New code translates to: R1.xy = vec4(mix(vec4(abs(IN0.zxzw)), vec4(abs(IN0.xzzw)), vec4(lessThan(vec4(R0.x), vec4(0.0))))).xy;
participants (1)
-
Ivan Gyurdiev