Hi,
On 09/11/21 23:21, Zebediah Figura wrote:
Overall this patch looks a lot better than I was afraid of. It's a lot of code that's intimidating to review, but once you ignore the rbtree boilerplate it's simple enough and seems about in line with what I expect. There's quite a few things that I think can be simplified, but the basic structure seems sound, so nice work.
Great, that's good! Thanks for your review, I think most if not all of the points you wrote should be easy to fix.
The CF part is probably going to be a lot trickier, but fortunately I think that this patch alone is the one that really matters. Frankly, if we were to take this patch, and then a second patch that does copy-prop on interior CF blocks with a fresh copy_propagation_state(), we might even cover enough that it's not even worrying about CF...
Mmh, note that you have to handle control flow in some way or another, you cannot just ignore it. This patch doesn't know about control flow, but the only way it can do it is by bailing out at the first control flow instruction in the program. It's not enough to ignore the control flow block and resume processing after it, because the inner block might invalidate some of your knowledge about the program state (in 3/5 patch this is done by copy_propagation_invalidate_from_block).
+/* struct copy_propagation_state represents the accumulated knowledge
- of the copy propagation pass while it scans through the code. Field
- "variables" is a tree whose elements have type struct
- copy_propagation_varible, and represent each of the variables the
- pass has already encountered (except those with special semantics,
- which are ignored). For each variable, the array "values" (whose
- length is the register size of the variable) represent which node
- and which index inside that node (i.e., which of the at most four
- entries of a vector) provided that value last time. Field "node"
- can be NULL, meaning that the pass was not able to statically
- determine the node.
- */
This comment feels a bit too low-level? I dunno, comments like this are hard to review, but my inclination is to describe what you're doing at a high level, along the lines of "we track the last known value of each component of each variable", and let the code itself explain the details.
Personally I would have loved to meet comments like mine in other areas of Wine: knowing what data read or written by an algorithm are meant to represent makes it much easier to understand why the algorithm does what it does. I agree that this is not the most complicated case around, but I thought this would be helpful for someone not already knowing my code. OTOH I understand that also a high-level description is useful: would you consider it satisfying to have both?
+static int copy_propagation_variable_compare(const void *key, const struct rb_entry *entry) +{ + struct copy_propagation_variable *variable = RB_ENTRY_VALUE(entry, struct copy_propagation_variable, entry); + uintptr_t key_int = (uintptr_t)key, entry_int = (uintptr_t)variable->var;
+ if (key_int < entry_int) + return -1; + else if (key_int > entry_int) + return 1; + else + return 0;
Could we just modify the rbtree implementation to use uintptr_t instead, and then do a direct subtraction?
I am not sure of what you mean: the pointer in struct copy_propagation_variable is used as a pointer, it would not be very logical to store it as an integer.
Also, do you get the right think if you subtract two unsigned integers? For one thing the result is unsigned, but even if you cast it to signed you don't get an order, do you?
Even if you could, you would get a shorter code, but I am not sure it would be a more legible one.
+static struct copy_propagation_variable *copy_propagation_get_variable(struct hlsl_ctx *ctx, + struct copy_propagation_state *state, struct hlsl_ir_var *var) +{ + struct rb_entry *entry = rb_get(&state->variables, var); + struct copy_propagation_variable *variable; + int res;
+ if (entry) + return RB_ENTRY_VALUE(entry, struct copy_propagation_variable, entry);
+ variable = hlsl_alloc(ctx, sizeof(*variable)); + if (!variable) + return NULL;
+ variable->var = var; + variable->values = hlsl_alloc(ctx, sizeof(*variable->values) * var->data_type->reg_size); + if (!variable->values) + { + vkd3d_free(variable); + return NULL; + }
+ res = rb_put(&state->variables, var, &variable->entry); + assert(!res);
Although this is a bit awkward because most of the function is only relevant for stores, not loads, and I think it would actually be better to reflect that in the code, one way or another (so we can do less effort).
Do less effort at what? It's true that when loading I could return NULL when the variable has never been seen before (meaning that we don't know anything about that variable, so the load cannot be removed) instead of creating an "empty" variable placeholder and return that one, but I don't really see an advantage doing that. It would seem more effort rather than less to me, because there are more possible branches to check.
(BTW, that would also mean that the variable is read before being written, which is something that shouldn't normally happen, but I don't think it's sensible to count on that)
+ if (TRACE_ON()) + { + char buf[32]; + if (!node) + sprintf(buf, "(nil)");
Can that happen?
Not yet, but it will as soon as loops and conditionals enter the scene, because they can cause the last store to a variable to become unknown (at compilation time).
Now, I know that usually we don't like to have dead code around. I think this case could be accepted, since this code is just a line long and becomes alive at 3/5. Also, I find it easier to review this piece of code at once, instead of first reviewing a part, checking that the missing case will never happen and then review the missing case and checking that it fits well in the already-reviewed part.
But if you really don't like it, I can defer the NULL case to 3/5.
+ else if (node->index) + sprintf(buf, "@%u", node->index); + else + sprintf(buf, "%p", node);
Can that happen?
Definitely. Arguably, it might be the common case, given that the earlier optimization passes (including this pass itself, given that it might be executed more than once) might synthesize a lot of nodes, which won't have an index until compute_liveness is executed.
+ TRACE("variable %s[%d] is written by %p[%d]\n", variable->var->name, offset + index, buf, index);
This trace doesn't really match the usual format, but more saliently, the hardcoded buffer is ugly. Assuming the other two cases really can happen, I'd rather see individual traces spelled out.
Same thing below.
Ok. As usual, I found find that uglier, but I can live with that.
BTW, having the traces is not a condicio sine qua non for me. They were useful for me to develop the patch, and I left them because I figured that they might also be useful in the future, but we can happily get rid of them.
+ struct hlsl_deref *src = &load->src; + struct hlsl_ir_var *var = src->var; + struct hlsl_ir_swizzle *swizzle; + DWORD s;
+ if (var->is_input_semantic || var->is_output_semantic || var->is_uniform) + return false;
For input semantics and uniforms: yeah, but do we need to? The important question is "can we reconstruct a store for this variable", and this should already be false.
For output semantics, we should never get here in the first place.
I think you're right. I had put this check initially because I was concentrated on other aspects of the algorithm, but it's true that it is overly cautious.
+ if (type->type != HLSL_CLASS_SCALAR && type->type != HLSL_CLASS_VECTOR) + return false;
+ offset = hlsl_offset_from_deref(src);
The problem with hlsl_offset_from_deref(), and the reason I probably should have fought against 62b25bc52b in its current form, is that it not only doesn't deal with non-constant offsets, but doesn't really give the caller a way to bail either. We should probably be returning bool from it, and then aborting here.
I agree. Actually, the first version of my patch set had a variant of hlsl_offset_from_deref that acknowledged the possibility that it could not be possible to statically determine the offset of a deref. This condition is difficult to handle at code generation time, but it is not particularly problematic here: at load time just ignore the load, and at store time invalidate the whole variable.
I can totally fix hlsl_offset_from_deref so that this happens.
(speaking of constant expressions in the code, I am a bit bothered that evaluate_array_dimension is not able to figure out the value of constant expressions that are not immediate constants, and unfortunately fixing that is not as easy as running a fold_constants pass; I don't in mind a clean solution for that that doesn't require re-implementing constant folding there; that's another matter, of course)
+ variable = copy_propagation_get_variable(ctx, state, var); + if (!variable) + return false;
+ new_node = copy_propagation_reconstruct_node(variable, offset, type->dimx, indices);
+ if (TRACE_ON()) + { + char buf[32]; + if (!new_node) + sprintf(buf, "(nil)");
Is this useful to trace?
As before, it is a matter of tastes, I believe. I can drop it, just as I can drop the traces above.
+ else if (new_node->index) + sprintf(buf, "@%u", new_node->index); + else + sprintf(buf, "%p", new_node);
Can this happen?
Yes, just as above.
+ variable = copy_propagation_get_variable(ctx, state, var); + if (!variable) + return false;
+ copy_propagation_set_value(variable, hlsl_offset_from_deref(lhs), store->writemask, store->rhs.node);
+ return false; +}
Shouldn't this function just return void?
It's a matter of philosophy. If you ask me, the switch in copy_propagation_recursive should be rather oblivious of what the various helpers actually do, it shouldn't have an insight that copy_propagation_store never returns true. For its point of view, any helper can potentially modify the code. Though I can change to void if you prefer.
+static bool copy_propagation_recursive(struct hlsl_ctx *ctx, struct hlsl_block *block, + struct copy_propagation_state *state)
So, no comment for a function called "recursive" which is not (yet) actually recursive? :-P
@@ -1354,7 +1601,12 @@ int hlsl_emit_dxbc(struct hlsl_ctx *ctx, struct hlsl_ir_function_decl *entry_fun progress |= transform_ir(ctx, split_struct_copies, body, NULL); } while (progress); - while (transform_ir(ctx, fold_constants, body, NULL)); + do + { + progress = transform_ir(ctx, fold_constants, body, NULL); + progress |= copy_propagation_pass(ctx, body);
This probably isn't worth examining, but I'm curious why constant folding is useful after copy-prop.
Copy propagation might enable some additional steps of constant folding, if a constant is first stored, than loaded and then an operation is carried on it. OTOH, constant folding can enable some additional step of copy propagation if it makes an offset statically visible.
Thanks, Giovanni.