2017-03-24 23:12 GMT+01:00 Paul Gofman gofmanp@gmail.com:
@@ -697,13 +739,13 @@ static HRESULT parse_preshader(struct d3dx_preshader *pres, unsigned int *ptr, u for (i = 0; i < pres->ins_count; ++i) { for (j = 0; j < pres_op_info[pres->ins[i].op].input_count; ++j)
update_table_size(pres->regs.table_sizes, pres->ins[i].inputs[j].table,
get_reg_offset(pres->ins[i].inputs[j].table,
pres->ins[i].inputs[j].offset + pres->ins[i].component_count - 1));
update_table_size(pres->regs.table_sizes, pres->ins[i].inputs[j].reg.table,
get_reg_offset(pres->ins[i].inputs[j].reg.table,
pres->ins[i].inputs[j].reg.offset + pres->ins[i].component_count - 1));
update_table_size(pres->regs.table_sizes, pres->ins[i].output.table,
get_reg_offset(pres->ins[i].output.table,
pres->ins[i].output.offset + pres->ins[i].component_count - 1));
update_table_size(pres->regs.table_sizes, pres->ins[i].output.reg.table,
get_reg_offset(pres->ins[i].output.reg.table,
pres->ins[i].output.reg.offset + pres->ins[i].component_count - 1));
Do these update_table_size() always do the right thing with relative addressing? If I'm not mistaken, they should be fine because we only accept relative addressing for CONST and IMMED and the former is taken care by update_table_sizes_consts() just below while the latter's size is set statically just above this chunk. I think that means though that we don't need to call update_table_size() on the input arguments.
If that sounds right to you, you could drop that "for(j)" altogether or, probably better, replace the update_table_size() with a check that the argument is not out of bounds (probably returning failure if the check doesn't pass). That should be a separate patch.
@@ -1088,18 +1130,65 @@ static HRESULT init_set_constants(struct d3dx_const_tab *const_tab, ID3DXConstan return ret; }
+static double exec_get_reg_value(struct d3dx_regstore *rs, enum pres_reg_tables table, unsigned int offset) +{
- if (!regstore_is_val_set_reg(rs, table, offset / table_info[table].reg_component_count))
WARN("Using uninitialized input, table %u, offset %u.\n", table, offset);
- return regstore_get_double(rs, table, offset);
+}
static double exec_get_arg(struct d3dx_regstore *rs, const struct d3dx_pres_operand *opr, unsigned int comp) {
- if (!regstore_is_val_set_reg(rs, opr->table, (opr->offset + comp) / table_info[opr->table].reg_component_count))
WARN("Using uninitialized input, table %u, offset %u.\n", opr->table, opr->offset + comp);
- unsigned int offset, base_index, reg_index, table;
- table = opr->reg.table;
- if (opr->index_reg.table == PRES_REGTAB_COUNT)
base_index = 0;
- else
base_index = lrint(exec_get_reg_value(rs, opr->index_reg.table, opr->index_reg.offset));
- /* '4' is used instead of reg_component_count, as immediate constants (which have
* reg_component_count of 1) are still indexed as 4 values according to the tests.
*/
- offset = base_index * 4 + opr->reg.offset + comp;
Actually, is there any reason why reg_component_count for PRES_REGTAB_IMMED isn't 4? Maybe change it to 4 and in parse_preshader() check that const_count is a multiple of 4 (which is something we're depending on anyway). That would be a separate patch too.
Clue me if I'm missing anything.
- reg_index = offset / table_info[table].reg_component_count;
- if (reg_index >= rs->table_sizes[table])
- {
unsigned int wrap_size;
if (table == PRES_REGTAB_CONST)
{
/* As it can be guessed from tests (by iterating base register value and
* observing the period of result values repeat), offset into floating
* constant table are wrapped to the nearest power of 2 and not to the
* actual table size.
*/
for (wrap_size = 1; wrap_size < rs->table_sizes[table]; wrap_size <<= 1)
;
Please close the comment on the last text line (i.e. "*/" shouldn't be on its own line) and maybe remove the part in parentheses which is pretty obvious. I like the comment otherwise.
Another thing on this, maybe add a WARN() on wrapped accesses to show what offset (or value) is returned in that case? Ignore it if you don't think that's useful.
On 03/29/2017 02:37 AM, Matteo Bruni wrote:
2017-03-24 23:12 GMT+01:00 Paul Gofman gofmanp@gmail.com:
@@ -697,13 +739,13 @@ static HRESULT parse_preshader(struct d3dx_preshader *pres, unsigned int *ptr, u for (i = 0; i < pres->ins_count; ++i) { for (j = 0; j < pres_op_info[pres->ins[i].op].input_count; ++j)
update_table_size(pres->regs.table_sizes, pres->ins[i].inputs[j].table,
get_reg_offset(pres->ins[i].inputs[j].table,
pres->ins[i].inputs[j].offset + pres->ins[i].component_count - 1));
update_table_size(pres->regs.table_sizes, pres->ins[i].inputs[j].reg.table,
get_reg_offset(pres->ins[i].inputs[j].reg.table,
pres->ins[i].inputs[j].reg.offset + pres->ins[i].component_count - 1));
update_table_size(pres->regs.table_sizes, pres->ins[i].output.table,
get_reg_offset(pres->ins[i].output.table,
pres->ins[i].output.offset + pres->ins[i].component_count - 1));
update_table_size(pres->regs.table_sizes, pres->ins[i].output.reg.table,
get_reg_offset(pres->ins[i].output.reg.table,
pres->ins[i].output.reg.offset + pres->ins[i].component_count - 1));
Do these update_table_size() always do the right thing with relative addressing? If I'm not mistaken, they should be fine because we only accept relative addressing for CONST and IMMED and the former is taken care by update_table_sizes_consts() just below while the latter's size is set statically just above this chunk. I think that means though that we don't need to call update_table_size() on the input arguments.
If that sounds right to you, you could drop that "for(j)" altogether or, probably better, replace the update_table_size() with a check that the argument is not out of bounds (probably returning failure if the check doesn't pass). That should be a separate patch.
Oh, its a bug actually now, we should not mind absolute offset here if relative addressing is on, we know if it is out of bound or not when we add index register value only, which can be negative. Yes, it looks the only tables where we need to do this update table size are temporary register table (for which we don't have any size known in advance) and output tables. I would check that and if I am not missing something now would leave just update table size where required. Otherwise, if to leave the way it is done now, if relative addressing is used update_table_size should be called on index register table/offset instead of main register. Regarding absolute offset check, if the absolute offset (without relative addressing) is out of bound fxc does not allow to compile that. If to test what happens to native implementation in this case I need to update the blob manually. So do you suggest to check if the absolute offset to input table is in bound and print a FIXME/fail preshader creation in this case?
@@ -1088,18 +1130,65 @@ static HRESULT init_set_constants(struct d3dx_const_tab *const_tab, ID3DXConstan return ret; }
+static double exec_get_reg_value(struct d3dx_regstore *rs, enum pres_reg_tables table, unsigned int offset) +{
- if (!regstore_is_val_set_reg(rs, table, offset / table_info[table].reg_component_count))
WARN("Using uninitialized input, table %u, offset %u.\n", table, offset);
- return regstore_get_double(rs, table, offset);
+}
- static double exec_get_arg(struct d3dx_regstore *rs, const struct d3dx_pres_operand *opr, unsigned int comp) {
- if (!regstore_is_val_set_reg(rs, opr->table, (opr->offset + comp) / table_info[opr->table].reg_component_count))
WARN("Using uninitialized input, table %u, offset %u.\n", opr->table, opr->offset + comp);
- unsigned int offset, base_index, reg_index, table;
- table = opr->reg.table;
- if (opr->index_reg.table == PRES_REGTAB_COUNT)
base_index = 0;
- else
base_index = lrint(exec_get_reg_value(rs, opr->index_reg.table, opr->index_reg.offset));
- /* '4' is used instead of reg_component_count, as immediate constants (which have
* reg_component_count of 1) are still indexed as 4 values according to the tests.
*/
- offset = base_index * 4 + opr->reg.offset + comp;
Actually, is there any reason why reg_component_count for PRES_REGTAB_IMMED isn't 4? Maybe change it to 4 and in parse_preshader() check that const_count is a multiple of 4 (which is something we're depending on anyway). That would be a separate patch too.
Clue me if I'm missing anything.
Immediate const count is not necessarily the multiple of 4 (it is not the case in the present tests). If we set PRES_REGTAB_IMMED count to 4 it breaks new out of bounds index tests (due to incorrect size to wrap to). Besides, immediate constants do not treated as 4-value registers anywhere else in preshader code, other than for relative addressing. fxc compiler can "pack" constants one after another for there "normal" usage (as it was in all our previous tests). So I don't see why we would change reg_component_count to 4 for them. But when indexing immed constants all the cases I could construct so far index them as 4-value packs, even if they are used to index vec3 and just 3 values are used from there. I think it is unlikely could be done some other way (if they are already undexed in 4-packs in this case), as otherwise each use of index to immediate constant would require its own aligment of those constants.
- reg_index = offset / table_info[table].reg_component_count;
- if (reg_index >= rs->table_sizes[table])
- {
unsigned int wrap_size;
if (table == PRES_REGTAB_CONST)
{
/* As it can be guessed from tests (by iterating base register value and
* observing the period of result values repeat), offset into floating
* constant table are wrapped to the nearest power of 2 and not to the
* actual table size.
*/
for (wrap_size = 1; wrap_size < rs->table_sizes[table]; wrap_size <<= 1)
;
Please close the comment on the last text line (i.e. "*/" shouldn't be on its own line) and maybe remove the part in parentheses which is pretty obvious. I like the comment otherwise.
Another thing on this, maybe add a WARN() on wrapped accesses to show what offset (or value) is returned in that case? Ignore it if you don't think that's useful.
Yes, I also think WARN won't hurt in this case.
I haven't looked at the v3 patches yet, but in the meantime:
2017-03-29 2:09 GMT+02:00 Paul Gofman gofmanp@gmail.com:
Oh, its a bug actually now, we should not mind absolute offset here if
relative addressing is on, we know if it is out of bound or not when we add index register value only, which can be negative. Yes, it looks the only tables where we need to do this update table size are temporary register table (for which we don't have any size known in advance) and output tables. I would check that and if I am not missing something now would leave just update table size where required.
Yeah and temporaries are supposed to be written before being read so in theory you can entirely skip the update_table_size() calls for source arguments.
Otherwise, if to leave the way it is done now, if relative addressing is
used update_table_size should be called on index register table/offset instead of main register. Regarding absolute offset check, if the absolute offset (without relative addressing) is out of bound fxc does not allow to compile that. If to test what happens to native implementation in this case I need to update the blob manually. So do you suggest to check if the absolute offset to input table is in bound and print a FIXME/fail preshader creation in this case?
Failing with a WARN is probably the best option. I don't think we necessarily need a Wine test for that, once you tested what happens with native. If you want you can add a test with a small effect manually hacked to contain an invalid argument.
Immediate const count is not necessarily the multiple of 4 (it is not the case in the present tests). If we set PRES_REGTAB_IMMED count to 4 it breaks new out of bounds index tests (due to incorrect size to wrap to).
Hmm, I've never seen any effect with a number of immediates not multiple of 4. Can you point me to one?
On 03/29/2017 09:17 PM, Matteo Bruni wrote:
I haven't looked at the v3 patches yet, but in the meantime:
2017-03-29 2:09 GMT+02:00 Paul Gofman gofmanp@gmail.com:
Oh, its a bug actually now, we should not mind absolute offset here if
relative addressing is on, we know if it is out of bound or not when we add index register value only, which can be negative. Yes, it looks the only tables where we need to do this update table size are temporary register table (for which we don't have any size known in advance) and output tables. I would check that and if I am not missing something now would leave just update table size where required.
Yeah and temporaries are supposed to be written before being read so in theory you can entirely skip the update_table_size() calls for source arguments.
But update_table_sizes is required to calculate memory allocation size for table, what is the difference if temporary registers are written first in this regard? In v3 I u sed correct offset for updating table sizes when relative addressing is used in operand. I didn't change table size update though to update temporary register table size only, as doing so currently breaks one test. It is test_effect_preshader_op(), which writes instructions to the effect blob and unintentionally corrupts the data past end of the preshader code being updated. This effectively results in destroying 'CTAB' for the next preshader in blob, so that preshader is parsed ok but it does not have input constant definition anymore, thus no table sizes set from input registers definition. Still such an effect and preshader is created ok by native implementation, and also by our current implementation. So So I thought it is better to keep the current logic of table size updates based on actual register access in preshader for all tables, as according to the test native implementation does not fail preshader creation if it access the constants outside defined input range.
Sorry, regarding my 1st question, I realized that of course first write make sense, as the size will be updated through output register. But still I would suggest to keep the currently implemented scheme to better much native approach (which is tolerant to missing constants definition).
On 03/29/2017 09:23 PM, Paul Gofman wrote:
On 03/29/2017 09:17 PM, Matteo Bruni wrote:
I haven't looked at the v3 patches yet, but in the meantime:
2017-03-29 2:09 GMT+02:00 Paul Gofman gofmanp@gmail.com:
Oh, its a bug actually now, we should not mind absolute offset
here if relative addressing is on, we know if it is out of bound or not when we add index register value only, which can be negative. Yes, it looks the only tables where we need to do this update table size are temporary register table (for which we don't have any size known in advance) and output tables. I would check that and if I am not missing something now would leave just update table size where required.
Yeah and temporaries are supposed to be written before being read so in theory you can entirely skip the update_table_size() calls for source arguments.
But update_table_sizes is required to calculate memory allocation
size for table, what is the difference if temporary registers are written first in this regard? In v3 I u sed correct offset for updating table sizes when relative addressing is used in operand. I didn't change table size update though to update temporary register table size only, as doing so currently breaks one test. It is test_effect_preshader_op(), which writes instructions to the effect blob and unintentionally corrupts the data past end of the preshader code being updated. This effectively results in destroying 'CTAB' for the next preshader in blob, so that preshader is parsed ok but it does not have input constant definition anymore, thus no table sizes set from input registers definition. Still such an effect and preshader is created ok by native implementation, and also by our current implementation. So So I thought it is better to keep the current logic of table size updates based on actual register access in preshader for all tables, as according to the test native implementation does not fail preshader creation if it access the constants outside defined input range.
2017-03-29 20:17 GMT+02:00 Matteo Bruni matteo.mystral@gmail.com:
Yeah and temporaries are supposed to be written before being read so in theory you can entirely skip the update_table_size() calls for source arguments.
I haven't looked in detail but this still works for me (i.e. commenting out those calls doesn't break the tests here, on 32 bits at least). How does the accidental CTAB overwriting you mention happen exactly?
Hmm, I've never seen any effect with a number of immediates not multiple of 4. Can you point me to one?
Do you have one of those?
For reference, I'm attaching a patch with some hacks regarding these two points above and various debug stuff.
On 04/03/2017 06:35 PM, Matteo Bruni wrote:
2017-03-29 20:17 GMT+02:00 Matteo Bruni matteo.mystral@gmail.com:
Yeah and temporaries are supposed to be written before being read so in theory you can entirely skip the update_table_size() calls for source arguments.
I haven't looked in detail but this still works for me (i.e. commenting out those calls doesn't break the tests here, on 32 bits at least). How does the accidental CTAB overwriting you mention happen exactly?
Yes, if to just comment out it won't break any test, it will break that one though if you replace table update with input operand offset range check with failure return (after moving table size update from input parameters above the loop of course, otherwise it will break many tests). I actually spotted the accidental CTAB overwrite before that by watching warn/trace output of tests and investigating why there are 'uninitialized input' warnings on preshader operations testing, and had in mind to fix this test at some point. Overwriting happens in test_preshader_op() function for 2 argumemts op case. The function thinks there are 4 per-component operations (which is the case in test blob for 1 argument test preshader but actually not the case for 2 argument preshader blob), and puts opcode there, while the last opcode fall after the actual end of preshader code to the CTAB of the next preshader. I agree that this part with table size update based on input operands actual offset looks a bit ugly and can be most likely made better. I just suspect that the better solution which won't break anything that can work now same as native needs a bit more investigation of how native code handles out of bound access in this case. Just dropping it without failing will lead to crash or undefined behaviour if index register's offset is out of bound now (which should not normally happen in my current understanding though without broken bytecode, as well as the need to update table size based on input operands as well). This of course can be mitigated by also checking index register offsets at runtime (now the check done for value register offsets only which can now include index register value), but I am not sure that moving this extra check to execution time is much nicer. So maybe this improvement can wait until we get all the remaining major pieces in place (that pieces are, in my understanding, are state manager support and effect pools/shared parameters, which make d3dx9 effect framework usable with the majority of applications)? Or a simple solution is to fix the test and add the checks instead of table update, ignoring compatibility with native behaviour for such cases, which we consider possible on broken bytecode only, at least based on what I've tested so far.
Hmm, I've never seen any effect with a number of immediates not multiple of 4. Can you point me to one?
Do you have one of those?
I somehow was sure that there are such cases in our present tests, though could not find them now. I pretty sure I saw the values of '6' as immediate constant table size, maybe they were present in some previous versions of blobs when there was less immediate constants used in relative addressing test, or during my extensive testing of out of bounds indexes. Anyway, I found such a case (18 immediate constants) in one of real apps I am also testing this on, I am attaching a text file with that preshader dump (it also has an immediate constants count display which I temporary added to the preshader dump code, the same way you do in your patch, without the values dump though).
2017-04-03 19:50 GMT+02:00 Paul Gofman gofmanp@gmail.com:
On 04/03/2017 06:35 PM, Matteo Bruni wrote:
2017-03-29 20:17 GMT+02:00 Matteo Bruni matteo.mystral@gmail.com:
Yeah and temporaries are supposed to be written before being read so in theory you can entirely skip the update_table_size() calls for source arguments.
I haven't looked in detail but this still works for me (i.e. commenting out those calls doesn't break the tests here, on 32 bits at least). How does the accidental CTAB overwriting you mention happen exactly?
Yes, if to just comment out it won't break any test, it will break that
one though if you replace table update with input operand offset range check with failure return (after moving table size update from input parameters above the loop of course, otherwise it will break many tests). I actually spotted the accidental CTAB overwrite before that by watching warn/trace output of tests and investigating why there are 'uninitialized input' warnings on preshader operations testing, and had in mind to fix this test at some point. Overwriting happens in test_preshader_op() function for 2 argumemts op case. The function thinks there are 4 per-component operations (which is the case in test blob for 1 argument test preshader but actually not the case for 2 argument preshader blob), and puts opcode there, while the last opcode fall after the actual end of preshader code to the CTAB of the next preshader. I agree that this part with table size update based on input operands actual offset looks a bit ugly and can be most likely made better. I just suspect that the better solution which won't break anything that can work now same as native needs a bit more investigation of how native code handles out of bound access in this case. Just dropping it without failing will lead to crash or undefined behaviour if index register's offset is out of bound now (which should not normally happen in my current understanding though without broken bytecode, as well as the need to update table size based on input operands as well). This of course can be mitigated by also checking index register offsets at runtime (now the check done for value register offsets only which can now include index register value), but I am not sure that moving this extra check to execution time is much nicer. So maybe this improvement can wait until we get all the remaining major pieces in place (that pieces are, in my understanding, are state manager support and effect pools/shared parameters, which make d3dx9 effect framework usable with the majority of applications)? Or a simple solution is to fix the test and add the checks instead of table update, ignoring compatibility with native behaviour for such cases, which we consider possible on broken bytecode only, at least based on what I've tested so far.
I see. It's okay to handle this as native and be robust to inconsistent data in constant tables.
Hmm, I've never seen any effect with a number of immediates not multiple of 4. Can you point me to one?
Do you have one of those?
I somehow was sure that there are such cases in our present tests,
though could not find them now. I pretty sure I saw the values of '6' as immediate constant table size, maybe they were present in some previous versions of blobs when there was less immediate constants used in relative addressing test, or during my extensive testing of out of bounds indexes. Anyway, I found such a case (18 immediate constants) in one of real apps I am also testing this on, I am attaching a text file with that preshader dump (it also has an immediate constants count display which I temporary added to the preshader dump code, the same way you do in your patch, without the values dump though).
It looks to me as the CLIT section in the bytecode you attached contains 16 as the constants count. Not sure why the print below doesn't match that.
Either way, it doesn't matter much. Even if immediate counts non-multiple-of-4 can actually exist we could (and I argue, should) bump that up to the next multiple, zeroing the extra components.
On 04/09/2017 08:40 PM, Matteo Bruni wrote:
It looks to me as the CLIT section in the bytecode you attached contains 16 as the constants count. Not sure why the print below doesn't match that.
Either way, it doesn't matter much. Even if immediate counts non-multiple-of-4 can actually exist we could (and I argue, should) bump that up to the next multiple, zeroing the extra components.
Oh, it must be due to the code for updating constant table size under discussion does not respect scalar op flag, and 18 constants here reveals that (as the constant used in the instruction causing actual size update is using just 1 constant for all 3 components, not 3). It is not good by itself, in particular, it can break index wrap logic. For the same index wrap logic we can't just bump to next multiple of 4, if the cases where the count is not the multiple actually exist (I am not sure anymore that they do). So I will fix that update for scalar op case at some point, do some more testing and either send the fix or change the constant components count to 4, with a fail on preshader creation if it is not the case.
2017-04-10 12:13 GMT+02:00 Paul Gofman gofmanp@gmail.com:
On 04/09/2017 08:40 PM, Matteo Bruni wrote:
It looks to me as the CLIT section in the bytecode you attached contains 16 as the constants count. Not sure why the print below doesn't match that.
Either way, it doesn't matter much. Even if immediate counts non-multiple-of-4 can actually exist we could (and I argue, should) bump that up to the next multiple, zeroing the extra components.
Oh, it must be due to the code for updating constant table size under
discussion does not respect scalar op flag, and 18 constants here reveals that (as the constant used in the instruction causing actual size update is using just 1 constant for all 3 components, not 3). It is not good by itself, in particular, it can break index wrap logic. For the same index wrap logic we can't just bump to next multiple of 4, if the cases where the count is not the multiple actually exist (I am not sure anymore that they do). So I will fix that update for scalar op case at some point, do some more testing and either send the fix or change the constant components count to 4, with a fail on preshader creation if it is not the case.
Ah, yes, that makes sense. Agreed on the follow up change.