2017-08-10 12:02 GMT+02:00 Paul Gofman gofmanp@gmail.com:
Signed-off-by: Paul Gofman gofmanp@gmail.com
The patch is fine overall, I only have a few questions / nitpicks. I'll skip the part about the delay with the review because it's old news by now... :/
dlls/d3dx9_36/d3dx9_private.h | 3 +- dlls/d3dx9_36/preshader.c | 312 ++++++++++++++++++------------------------ 2 files changed, 135 insertions(+), 180 deletions(-)
diff --git a/dlls/d3dx9_36/d3dx9_private.h b/dlls/d3dx9_36/d3dx9_private.h index 2727463a18..5ac5e63aeb 100644 --- a/dlls/d3dx9_36/d3dx9_private.h +++ b/dlls/d3dx9_36/d3dx9_private.h @@ -230,7 +230,7 @@ enum pres_reg_tables struct d3dx_const_param_eval_output { struct d3dx_parameter *param;
- unsigned int table;
- enum pres_reg_tables table;
The change is good but (unrelated to this patch) the enum type should really use the singular form.
static void dump_bytecode(void *data, unsigned int size) @@ -674,6 +614,40 @@ static HRESULT append_const_set(struct d3dx_const_tab *const_tab, struct d3dx_co return D3D_OK; }
+static void append_pres_const_set(struct d3dx_const_tab *const_tab, struct d3dx_preshader *pres)
Nitpick: this function appends all the struct d3dx_const_param_eval_output from the preshader so calling it "append_pres_const_sets" (or even something more explicit to that effect, e.g. clarifying that you're appending entries about the preshader outputs, which are shader inputs) would be preferable.
+{
- unsigned int i;
- struct d3dx_const_param_eval_output const_set;
- memset(&const_set, 0, sizeof(const_set));
You could initialize const_set in the definition instead, like:
struct d3dx_const_param_eval_output const_set = {NULL};
which looks slightly better to me. It doesn't matter much of course.
@@ -971,10 +946,30 @@ static HRESULT get_constants_desc(unsigned int *byte_code, struct d3dx_const_tab if (FAILED(hr = init_set_constants_param(out, ctab, hc, inputs_param[index]))) goto cleanup; }
if (pres)
append_pres_const_set(out, pres);
if (out->const_set_count) { struct d3dx_const_param_eval_output *new_alloc;
qsort(out->const_set, out->const_set_count, sizeof(*out->const_set), compare_const_set);
for (i = 0; i < out->const_set_count - 1; ++i)
{
if (out->const_set[i].constant_class == D3DXPC_FORCE_DWORD
&& out->const_set[i + 1].constant_class == D3DXPC_FORCE_DWORD
&& out->const_set[i].table == out->const_set[i + 1].table
&& out->const_set[i].register_index + out->const_set[i].register_count
>= out->const_set[i + 1].register_index)
{
out->const_set[i].register_count = out->const_set[i + 1].register_index
+ out->const_set[i + 1].register_count - out->const_set[i].register_index;
I think this might lose account of some registers. E.g. consider the case where const_set[i] has register_index == 0 and register_count == 4 while const_set[i + 1] has register_index == 1 and register_count == 1. This would seem to lose track of the last two components.
I know that preshaders found in the wild seem not to overwrite already written output registers but I wouldn't feel safe with (silently) depending on it.
memmove(&out->const_set[i + 1], &out->const_set[i + 2], sizeof(out->const_set[i])
* (out->const_set_count - i - 2));
--out->const_set_count;
--i;
Mostly matter of personal preference but rather than decrementing the for counter in the if when coalescing I'd write this as a while and increment "i" only in the (currently absent) else case.
@@ -1529,22 +1534,56 @@ static void set_constants(struct d3dx_regstore *rs, struct d3dx_const_tab *const data += param->rows * param->columns; } start_offset = get_offset_reg(table, const_set->register_index);
if (table_info[table].type == param_type)
regstore_set_modified(rs, table, start_offset,
get_offset_reg(table, const_set->register_count) * const_set->element_count);
else
}if (table_info[table].type != param_type) regstore_set_data(rs, table, start_offset, (unsigned int *)rs->tables[table] + start_offset, get_offset_reg(table, const_set->register_count) * const_set->element_count, param_type);
- const_tab->update_version = new_update_version;
One blank line too many.
-}
if (!update_device)
return D3D_OK;
for (const_idx = 0; const_idx < const_tab->const_set_count; ++const_idx)
{
struct d3dx_const_param_eval_output *const_set = &const_tab->const_set[const_idx];
if (device_update_all || (const_set->param
? is_param_dirty(const_set->param, update_version) : pres_dirty))
{
enum pres_reg_tables table = const_set->table;
if (table == device_table && device_start + device_count == const_set->register_index)
Again nitpicking: are there better names for those variables? Maybe use the "current_" prefix instead? It's pretty clear what's going on either way, so up to you.
On 08/28/2017 09:00 PM, Matteo Bruni wrote:
@@ -971,10 +946,30 @@ static HRESULT get_constants_desc(unsigned int *byte_code, struct d3dx_const_tab if (FAILED(hr = init_set_constants_param(out, ctab, hc, inputs_param[index]))) goto cleanup; }
if (pres)
append_pres_const_set(out, pres); if (out->const_set_count) { struct d3dx_const_param_eval_output *new_alloc;
qsort(out->const_set, out->const_set_count, sizeof(*out->const_set), compare_const_set);
for (i = 0; i < out->const_set_count - 1; ++i)
{
if (out->const_set[i].constant_class == D3DXPC_FORCE_DWORD
&& out->const_set[i + 1].constant_class == D3DXPC_FORCE_DWORD
&& out->const_set[i].table == out->const_set[i + 1].table
&& out->const_set[i].register_index + out->const_set[i].register_count
>= out->const_set[i + 1].register_index)
{
out->const_set[i].register_count = out->const_set[i + 1].register_index
+ out->const_set[i + 1].register_count - out->const_set[i].register_index;
I think this might lose account of some registers. E.g. consider the case where const_set[i] has register_index == 0 and register_count == 4 while const_set[i + 1] has register_index == 1 and register_count ==
- This would seem to lose track of the last two components.
I know that preshaders found in the wild seem not to overwrite already written output registers but I wouldn't feel safe with (silently) depending on it.
This 'if' selects just preshader output constants. The entries for const_set handled (merged) in this 'if' are those provided preshader instructions which output anything to shader constants. They can potentially overlap (that's why '>=' in the last 'if' check). I am not sure what can be wrong with that? We track preshader output as a single "commit", the same does native implementation according to our tests.
2017-08-28 23:32 GMT+02:00 Paul Gofman gofmanp@gmail.com:
On 08/28/2017 09:00 PM, Matteo Bruni wrote:
@@ -971,10 +946,30 @@ static HRESULT get_constants_desc(unsigned int *byte_code, struct d3dx_const_tab if (FAILED(hr = init_set_constants_param(out, ctab, hc, inputs_param[index]))) goto cleanup; }
if (pres)
append_pres_const_set(out, pres); if (out->const_set_count) { struct d3dx_const_param_eval_output *new_alloc;
qsort(out->const_set, out->const_set_count,
sizeof(*out->const_set), compare_const_set);
for (i = 0; i < out->const_set_count - 1; ++i)
{
if (out->const_set[i].constant_class == D3DXPC_FORCE_DWORD
&& out->const_set[i + 1].constant_class ==
D3DXPC_FORCE_DWORD
&& out->const_set[i].table == out->const_set[i +
1].table
&& out->const_set[i].register_index +
out->const_set[i].register_count
>= out->const_set[i + 1].register_index)
{
out->const_set[i].register_count = out->const_set[i +
1].register_index
+ out->const_set[i + 1].register_count -
out->const_set[i].register_index;
I think this might lose account of some registers. E.g. consider the case where const_set[i] has register_index == 0 and register_count == 4 while const_set[i + 1] has register_index == 1 and register_count ==
- This would seem to lose track of the last two components.
I know that preshaders found in the wild seem not to overwrite already written output registers but I wouldn't feel safe with (silently) depending on it.
This 'if' selects just preshader output constants. The entries for const_set handled (merged) in this 'if' are those provided preshader instructions which output anything to shader constants. They can potentially overlap (that's why '>=' in the last 'if' check). I am not sure what can be wrong with that? We track preshader output as a single "commit", the same does native implementation according to our tests.
I don't have any issues with the if condition. It's the const_set[i].register_count assignment which looks troublesome since it doesn't consider the old count. In the (artificial, even forced if you want) example I gave const_set[i].register_count would go from 4 to 2 which doesn't seem right.
On 08/29/2017 01:51 AM, Matteo Bruni wrote:
2017-08-28 23:32 GMT+02:00 Paul Gofman gofmanp@gmail.com:
On 08/28/2017 09:00 PM, Matteo Bruni wrote:
@@ -971,10 +946,30 @@ static HRESULT get_constants_desc(unsigned int *byte_code, struct d3dx_const_tab if (FAILED(hr = init_set_constants_param(out, ctab, hc, inputs_param[index]))) goto cleanup; }
if (pres)
append_pres_const_set(out, pres); if (out->const_set_count) { struct d3dx_const_param_eval_output *new_alloc;
qsort(out->const_set, out->const_set_count,
sizeof(*out->const_set), compare_const_set);
for (i = 0; i < out->const_set_count - 1; ++i)
{
if (out->const_set[i].constant_class == D3DXPC_FORCE_DWORD
&& out->const_set[i + 1].constant_class ==
D3DXPC_FORCE_DWORD
&& out->const_set[i].table == out->const_set[i +
1].table
&& out->const_set[i].register_index +
out->const_set[i].register_count
>= out->const_set[i + 1].register_index)
{
out->const_set[i].register_count = out->const_set[i +
1].register_index
+ out->const_set[i + 1].register_count -
out->const_set[i].register_index;
I think this might lose account of some registers. E.g. consider the case where const_set[i] has register_index == 0 and register_count == 4 while const_set[i + 1] has register_index == 1 and register_count ==
- This would seem to lose track of the last two components.
I know that preshaders found in the wild seem not to overwrite already written output registers but I wouldn't feel safe with (silently) depending on it.
This 'if' selects just preshader output constants. The entries for const_set handled (merged) in this 'if' are those provided preshader instructions which output anything to shader constants. They can potentially overlap (that's why '>=' in the last 'if' check). I am not sure what can be wrong with that? We track preshader output as a single "commit", the same does native implementation according to our tests.
I don't have any issues with the if condition. It's the const_set[i].register_count assignment which looks troublesome since it doesn't consider the old count. In the (artificial, even forced if you want) example I gave const_set[i].register_count would go from 4 to 2 which doesn't seem right.
Here is how this register_count is currently evaluated:
---- snip ----
const_set.register_count = max(get_reg_offset(reg->table, pres_op_info[ins->op].func_all_comps ? 1 : ins->component_count), 1);
---- snip ----
This expression is written for some generic case of component_count and components per register, but unless I am missing something this expression should not evaluate to anything but 1 for the real cases. We do not check that component count in instruction does not exceed component count per register during preshader parse though (just checking if it is between 1 and 4).
So maybe I will add an assert for the overlapping registers in this 'if' under discussion just in case?
2017-08-29 1:16 GMT+02:00 Paul Gofman gofmanp@gmail.com:
On 08/29/2017 01:51 AM, Matteo Bruni wrote:
2017-08-28 23:32 GMT+02:00 Paul Gofman gofmanp@gmail.com:
On 08/28/2017 09:00 PM, Matteo Bruni wrote:
@@ -971,10 +946,30 @@ static HRESULT get_constants_desc(unsigned int *byte_code, struct d3dx_const_tab if (FAILED(hr = init_set_constants_param(out, ctab, hc, inputs_param[index]))) goto cleanup; }
if (pres)
append_pres_const_set(out, pres); if (out->const_set_count) { struct d3dx_const_param_eval_output *new_alloc;
qsort(out->const_set, out->const_set_count,
sizeof(*out->const_set), compare_const_set);
for (i = 0; i < out->const_set_count - 1; ++i)
{
if (out->const_set[i].constant_class == D3DXPC_FORCE_DWORD
&& out->const_set[i + 1].constant_class ==
D3DXPC_FORCE_DWORD
&& out->const_set[i].table == out->const_set[i +
1].table
&& out->const_set[i].register_index +
out->const_set[i].register_count
>= out->const_set[i + 1].register_index)
{
out->const_set[i].register_count = out->const_set[i +
1].register_index
+ out->const_set[i + 1].register_count -
out->const_set[i].register_index;
I think this might lose account of some registers. E.g. consider the case where const_set[i] has register_index == 0 and register_count == 4 while const_set[i + 1] has register_index == 1 and register_count ==
- This would seem to lose track of the last two components.
I know that preshaders found in the wild seem not to overwrite already written output registers but I wouldn't feel safe with (silently) depending on it.
This 'if' selects just preshader output constants. The entries for const_set handled (merged) in this 'if' are those provided preshader instructions which output anything to shader constants. They can potentially overlap (that's why '>=' in the last 'if' check). I am not sure what can be wrong with that? We track preshader output as a single "commit", the same does native implementation according to our tests.
I don't have any issues with the if condition. It's the const_set[i].register_count assignment which looks troublesome since it doesn't consider the old count. In the (artificial, even forced if you want) example I gave const_set[i].register_count would go from 4 to 2 which doesn't seem right.
Here is how this register_count is currently evaluated:
---- snip ----
const_set.register_count = max(get_reg_offset(reg->table, pres_op_info[ins->op].func_all_comps ? 1 :
ins->component_count), 1);
---- snip ----
This expression is written for some generic case of component_count and
components per register, but unless I am missing something this expression should not evaluate to anything but 1 for the real cases. We do not check that component count in instruction does not exceed component count per register during preshader parse though (just checking if it is between 1 and 4).
Right, I think checking that during parsing is a good idea regardless of everything else.
I'd argue that either the expression above is too generic (allowing / handling register_count > 1) or the other register_count expression (the merge one I brought up earlier) is too naive, not expecting the same. As it is it just doesn't look right. If you ensure during parsing that, for each preshader instruction, 0 < component_count <= components_per_register then you can just always set the initial register_count to 1. Actually you could avoid setting it at all at that point (more on that below).
So maybe I will add an assert for the overlapping registers in this
'if' under discussion just in case?
That may also be a good idea. Otherwise, if the "+ out->const_set[i + 1].register_count" part is replaced with a "+ 1", the overlap case would be handled trivially. That would make it possible to avoid the previous register_count initialization entirely, assuming register_count is set to 1 during merging in the NOT merging case.
Both options are okay with me. Also, as usual, let me know if I'm still missing something.
On 08/29/2017 05:35 PM, Matteo Bruni wrote:
Right, I think checking that during parsing is a good idea regardless of everything else.
I'd argue that either the expression above is too generic (allowing / handling register_count > 1) or the other register_count expression (the merge one I brought up earlier) is too naive, not expecting the same. As it is it just doesn't look right.
I updated the expression in 'merge' phase by now to do the check, more exactly, added a check for the case under discussion inside the 'if'.
To block the situation when register components exceed register size, I actually should check not just component count but offset also. Perhaps this should also be done not just for output but for input registers only. This can't be done cleanly for input registers though for all the cases, as there might be a relative offset which can still effectively make a multicomponent operation cross register boundary. Not sure if we should add runtime checks for that. Native implementation seems to allow all this and does not refuse to parse preshaders which crossses a register boundary in instruction even without relative addressing. It does allow some things we already deny now, so I think I should improve the checks for components during parsing and block them early when possible, and after that consider preshader instruction output to affect just one register always in this part.
If you ensure during parsing that, for each preshader instruction, 0 < component_count <= components_per_register then you can just always set the initial register_count to 1. Actually you could avoid setting it at all at that point (more on that below).
So maybe I will add an assert for the overlapping registers in this
'if' under discussion just in case?
That may also be a good idea. Otherwise, if the "+ out->const_set[i + 1].register_count" part is replaced with a "+ 1", the overlap case would be handled trivially. That would make it possible to avoid the previous register_count initialization entirely, assuming register_count is set to 1 during merging in the NOT merging case.
Yes, sure.
Both options are okay with me. Also, as usual, let me know if I'm still missing something.
I added a check in this patch by now (as all the other code actually allows for multicomponent operation to cross register boundary by now, and thus register_count can potentially exceed 1 here).
2017-08-29 17:08 GMT+02:00 Paul Gofman gofmanp@gmail.com:
On 08/29/2017 05:35 PM, Matteo Bruni wrote:
Right, I think checking that during parsing is a good idea regardless of everything else.
I'd argue that either the expression above is too generic (allowing / handling register_count > 1) or the other register_count expression (the merge one I brought up earlier) is too naive, not expecting the same. As it is it just doesn't look right.
I updated the expression in 'merge' phase by now to do the check, more exactly, added a check for the case under discussion inside the 'if'.
To block the situation when register components exceed register size, I
actually should check not just component count but offset also. Perhaps this should also be done not just for output but for input registers only. This can't be done cleanly for input registers though for all the cases, as there might be a relative offset which can still effectively make a multicomponent operation cross register boundary. Not sure if we should add runtime checks for that. Native implementation seems to allow all this and does not refuse to parse preshaders which crossses a register boundary in instruction even without relative addressing. It does allow some things we already deny now, so I think I should improve the checks for components during parsing and block them early when possible, and after that consider preshader instruction output to affect just one register always in this part.
Native may allow that but there is some good chance that it computes incorrect outputs / sets incorrect device states in those cases (i.e. I imagine instructions straddling multiple registers to be unexpected for native too). It's probably not relevant in practice. Anyway, sure, I'm okay with adding those checks in our implementation if you want (mostly to print some WARNs if the unexpected happens) but simply assuming / enforcing a single output register per instruction, along the lines of my last email, seems enough to me.