2017-03-23 16:06 GMT+01:00 Paul Gofman gofmanp@gmail.com:
Signed-off-by: Paul Gofman gofmanp@gmail.com
dlls/d3dx9_36/preshader.c | 103 ++++++++++++++++++++++++++++++++++++------- dlls/d3dx9_36/tests/effect.c | 38 ++++++++-------- 2 files changed, 106 insertions(+), 35 deletions(-)
diff --git a/dlls/d3dx9_36/preshader.c b/dlls/d3dx9_36/preshader.c index 9aedc99..2974ffd 100644 --- a/dlls/d3dx9_36/preshader.c +++ b/dlls/d3dx9_36/preshader.c @@ -191,6 +191,8 @@ struct d3dx_pres_operand /* offset is component index, not register index, e. g. offset for component c3.y is 13 (3 * 4 + 1) */ unsigned int offset;
- enum pres_reg_tables base_reg_table;
- unsigned int base_reg_offset;
};
It would be nicer to create a struct for storing together table and offset and to pass it around to the various functions taking both arguments, like parse_pres_reg(). It should probably be a separate patch before this one.
ptr = parse_pres_reg(ptr + 1, &opr->base_reg_table, &opr->base_reg_offset);
- }
- else
- {
opr->base_reg_table = PRES_REGTAB_COUNT;
}++ptr;
- opr->table = reg_table[*ptr++];
- opr->offset = *ptr++;
- ptr = parse_pres_reg(ptr, &opr->table, &opr->offset);
This would crash if the previous parse_pres_reg() call returned NULL.
- size = rs->table_sizes[table] * table_info[table].reg_component_count;
- if (offset >= size)
- {
if ((offset & 255) < size)
offset &= 255;
else
offset = offset % size;
- }
Is this just to try and avoid the modulo operation in some cases or is there a deeper reason? If the former, you could extend that to all the power-of-two sizes by first checking if that's the case (!(size & size - 1)) and, if so, and-ing offset with the mask (size - 1). That's assuming the compiler doesn't do the same optimization already.
Let me know if there is some other reason instead.
static double exec_get_arg(struct d3dx_regstore *rs, const struct d3dx_pres_operand *opr, unsigned int comp) {
- if (!regstore_is_val_set_reg(rs, opr->table, (opr->offset + comp) / table_info[opr->table].reg_component_count))
WARN("Using uninitialized input, table %u, offset %u.\n", opr->table, opr->offset + comp);
- unsigned int base_offset, index;
- if (opr->base_reg_table == PRES_REGTAB_COUNT)
index = 0;
- else
index = (int)exec_get_reg_value(rs, opr->base_reg_table, opr->base_reg_offset);
I think it's better with lrint() in place of the cast.
- if (index >= rs->table_sizes[opr->table] && opr->table == PRES_REGTAB_CONST)
- {
unsigned int size_pow2;
for (size_pow2 = 1; size_pow2 < rs->table_sizes[opr->table]; size_pow2 <<= 1)
;
index &= (size_pow2 - 1);
if (index >= rs->table_sizes[opr->table])
return 0.0;
- }
I don't think we necessarily have to follow native's madness on buffer overflows to the letter. Since you did, that's certainly fine though.
On 03/24/2017 01:57 AM, Matteo Bruni wrote:
2017-03-23 16:06 GMT+01:00 Paul Gofman gofmanp@gmail.com:
Signed-off-by: Paul Gofman gofmanp@gmail.com
dlls/d3dx9_36/preshader.c | 103 ++++++++++++++++++++++++++++++++++++------- dlls/d3dx9_36/tests/effect.c | 38 ++++++++-------- 2 files changed, 106 insertions(+), 35 deletions(-)
diff --git a/dlls/d3dx9_36/preshader.c b/dlls/d3dx9_36/preshader.c index 9aedc99..2974ffd 100644 --- a/dlls/d3dx9_36/preshader.c +++ b/dlls/d3dx9_36/preshader.c @@ -191,6 +191,8 @@ struct d3dx_pres_operand /* offset is component index, not register index, e. g. offset for component c3.y is 13 (3 * 4 + 1) */ unsigned int offset;
- enum pres_reg_tables base_reg_table;
- unsigned int base_reg_offset; };
It would be nicer to create a struct for storing together table and offset and to pass it around to the various functions taking both arguments, like parse_pres_reg(). It should probably be a separate patch before this one.
Actually I can't guess the case when base_reg_table would be not a floating constant preshader input. Maybe I just remove this variable and check with FIXME during parsing?
- size = rs->table_sizes[table] * table_info[table].reg_component_count;
- if (offset >= size)
- {
if ((offset & 255) < size)
offset &= 255;
else
offset = offset % size;
- }
Is this just to try and avoid the modulo operation in some cases or is there a deeper reason? If the former, you could extend that to all the power-of-two sizes by first checking if that's the case (!(size & size - 1)) and, if so, and-ing offset with the mask (size - 1). That's assuming the compiler doesn't do the same optimization already.
Let me know if there is some other reason instead.
Just doing (offset % size) will lead to different results and will break the tests. The logic in this out of range index checking is a bit mind breaking.
static double exec_get_arg(struct d3dx_regstore *rs, const struct d3dx_pres_operand *opr, unsigned int comp) {
- if (!regstore_is_val_set_reg(rs, opr->table, (opr->offset + comp) / table_info[opr->table].reg_component_count))
WARN("Using uninitialized input, table %u, offset %u.\n", opr->table, opr->offset + comp);
- unsigned int base_offset, index;
- if (opr->base_reg_table == PRES_REGTAB_COUNT)
index = 0;
- else
index = (int)exec_get_reg_value(rs, opr->base_reg_table, opr->base_reg_offset);
I think it's better with lrint() in place of the cast.
I just want to note that whenever the input is from floating values FX compiler generates preshader instructions which do the rounding, and skips that if the base register comes from integers. Do you think we need to do extra rounding in this case?
- if (index >= rs->table_sizes[opr->table] && opr->table == PRES_REGTAB_CONST)
- {
unsigned int size_pow2;
for (size_pow2 = 1; size_pow2 < rs->table_sizes[opr->table]; size_pow2 <<= 1)
;
index &= (size_pow2 - 1);
if (index >= rs->table_sizes[opr->table])
return 0.0;
- }
I don't think we necessarily have to follow native's madness on buffer overflows to the letter. Since you did, that's certainly fine though.
I think I finally guessed the principle how native d3dx clamps indexes, at least finally I got stable match for the tests involving any index numbers. I thought it could make some sense as we sometimes see apps depending on out of bounds access behaviour, especially if native implementation can return zeroes (either stable or by chance like here).
As I can guess the logic behind it, it has some allocation size on the floating constant table which is the power of 2, and it first ultimately puts the index to this allocation range with modulo operation. Then, if the index is bigger than the actual number of constants, it returns zero. This logic is a bit different for indexing immediate constants though (which is used when indirect addressing selects something from vector or matrix), which never returns ultimate zeroes, probably because the memory allocation for immediate constants is always exact and is not done in chunks. I have no clear idea though why (offset & 255) < size is a special case, can only guess than maybe it is related somehow to casting offset to byte at some point.
2017-03-24 0:39 GMT+01:00 Paul Gofman gofmanp@gmail.com:
On 03/24/2017 01:57 AM, Matteo Bruni wrote:
2017-03-23 16:06 GMT+01:00 Paul Gofman gofmanp@gmail.com:
Signed-off-by: Paul Gofman gofmanp@gmail.com
dlls/d3dx9_36/preshader.c | 103 ++++++++++++++++++++++++++++++++++++------- dlls/d3dx9_36/tests/effect.c | 38 ++++++++-------- 2 files changed, 106 insertions(+), 35 deletions(-)
diff --git a/dlls/d3dx9_36/preshader.c b/dlls/d3dx9_36/preshader.c index 9aedc99..2974ffd 100644 --- a/dlls/d3dx9_36/preshader.c +++ b/dlls/d3dx9_36/preshader.c @@ -191,6 +191,8 @@ struct d3dx_pres_operand /* offset is component index, not register index, e. g. offset for component c3.y is 13 (3 * 4 + 1) */ unsigned int offset;
- enum pres_reg_tables base_reg_table;
- unsigned int base_reg_offset; };
It would be nicer to create a struct for storing together table and offset and to pass it around to the various functions taking both arguments, like parse_pres_reg(). It should probably be a separate patch before this one.
Actually I can't guess the case when base_reg_table would be not a floating constant preshader input. Maybe I just remove this variable and check with FIXME during parsing?
Sure, that's fine. Still the struct seems useful for parameter passing, if not right there in struct d3dx_pres_operand.
- size = rs->table_sizes[table] *
table_info[table].reg_component_count;
- if (offset >= size)
- {
if ((offset & 255) < size)
offset &= 255;
else
offset = offset % size;
- }
Is this just to try and avoid the modulo operation in some cases or is there a deeper reason? If the former, you could extend that to all the power-of-two sizes by first checking if that's the case (!(size & size - 1)) and, if so, and-ing offset with the mask (size - 1). That's assuming the compiler doesn't do the same optimization already.
Let me know if there is some other reason instead.
Just doing (offset % size) will lead to different results and will break the tests. The logic in this out of range index checking is a bit mind breaking.
I guess there was a deeper reason indeed ;)
Is that visible in the tests included in this series? Either way, maybe add a comment so that it's obvious why that thing is there.
static double exec_get_arg(struct d3dx_regstore *rs, const struct d3dx_pres_operand *opr, unsigned int comp) {
- if (!regstore_is_val_set_reg(rs, opr->table, (opr->offset + comp) /
table_info[opr->table].reg_component_count))
WARN("Using uninitialized input, table %u, offset %u.\n",
opr->table, opr->offset + comp);
- unsigned int base_offset, index;
- if (opr->base_reg_table == PRES_REGTAB_COUNT)
index = 0;
- else
index = (int)exec_get_reg_value(rs, opr->base_reg_table,
opr->base_reg_offset);
I think it's better with lrint() in place of the cast.
I just want to note that whenever the input is from floating values FX compiler generates preshader instructions which do the rounding, and skips that if the base register comes from integers. Do you think we need to do extra rounding in this case?
Ah, you're right, I somehow overlooked that. Actually I think doublechecking (e.g. with a FIXME) that the source is really integer would be useful. If it isn't too much of a mess it would be nice to avoid the roundtrip to doubles for the index entirely.
- if (index >= rs->table_sizes[opr->table] && opr->table ==
PRES_REGTAB_CONST)
- {
unsigned int size_pow2;
for (size_pow2 = 1; size_pow2 < rs->table_sizes[opr->table];
size_pow2 <<= 1)
;
index &= (size_pow2 - 1);
if (index >= rs->table_sizes[opr->table])
return 0.0;
- }
I don't think we necessarily have to follow native's madness on buffer overflows to the letter. Since you did, that's certainly fine though.
I think I finally guessed the principle how native d3dx clamps indexes,
at least finally I got stable match for the tests involving any index numbers. I thought it could make some sense as we sometimes see apps depending on out of bounds access behaviour, especially if native implementation can return zeroes (either stable or by chance like here).
Yeah, sure, it's just that maybe getting all the finer details right would turn out unnecessary in practice. Props to you for doing all the testing and getting things correct to this degree :)
As I can guess the logic behind it, it has some allocation size on the
floating constant table which is the power of 2, and it first ultimately puts the index to this allocation range with modulo operation. Then, if the index is bigger than the actual number of constants, it returns zero. This logic is a bit different for indexing immediate constants though (which is used when indirect addressing selects something from vector or matrix), which never returns ultimate zeroes, probably because the memory allocation for immediate constants is always exact and is not done in chunks.
That seems like a reasonable deduction. We could do the same in our implementation and avoid the modulo non-power-of-two sizes but that can certainly be something left for future improvements (if ever, there is something to be said about being moderate with memory consumption).
I have no clear idea though why (offset & 255) < size is a special case, can only guess than maybe it is related somehow to casting offset to byte at some point.
Yeah, that is also plausible, FWIW.
On 03/24/2017 03:31 AM, Matteo Bruni wrote:
2017-03-24 0:39 GMT+01:00 Paul Gofman gofmanp@gmail.com:
On 03/24/2017 01:57 AM, Matteo Bruni wrote:
2017-03-23 16:06 GMT+01:00 Paul Gofman gofmanp@gmail.com:
Signed-off-by: Paul Gofman gofmanp@gmail.com
dlls/d3dx9_36/preshader.c | 103 ++++++++++++++++++++++++++++++++++++------- dlls/d3dx9_36/tests/effect.c | 38 ++++++++-------- 2 files changed, 106 insertions(+), 35 deletions(-)
diff --git a/dlls/d3dx9_36/preshader.c b/dlls/d3dx9_36/preshader.c index 9aedc99..2974ffd 100644 --- a/dlls/d3dx9_36/preshader.c +++ b/dlls/d3dx9_36/preshader.c @@ -191,6 +191,8 @@ struct d3dx_pres_operand /* offset is component index, not register index, e. g. offset for component c3.y is 13 (3 * 4 + 1) */ unsigned int offset;
- enum pres_reg_tables base_reg_table;
- unsigned int base_reg_offset; };
It would be nicer to create a struct for storing together table and offset and to pass it around to the various functions taking both arguments, like parse_pres_reg(). It should probably be a separate patch before this one.
Actually I can't guess the case when base_reg_table would be not a floating constant preshader input. Maybe I just remove this variable and check with FIXME during parsing?
Sure, that's fine. Still the struct seems useful for parameter passing, if not right there in struct d3dx_pres_operand.
- size = rs->table_sizes[table] *
table_info[table].reg_component_count;
- if (offset >= size)
- {
if ((offset & 255) < size)
offset &= 255;
else
offset = offset % size;
- }
Is this just to try and avoid the modulo operation in some cases or is there a deeper reason? If the former, you could extend that to all the power-of-two sizes by first checking if that's the case (!(size & size - 1)) and, if so, and-ing offset with the mask (size - 1). That's assuming the compiler doesn't do the same optimization already.
Let me know if there is some other reason instead.
Just doing (offset % size) will lead to different results and will break the tests. The logic in this out of range index checking is a bit mind breaking.
I guess there was a deeper reason indeed ;)
Is that visible in the tests included in this series? Either way, maybe add a comment so that it's obvious why that thing is there.
Oh, I was sure that yes, but just checked once again and it worked without that first check. It could happen that either (most likely) I need it on some other testing I made in process, or potentially could happen that it was "helping" before I figured out the correct way for floating constant table indexing. I will check that and either remove this check or add another test (probably as a separate patch, as it would likely require effect blob update).
On 03/24/2017 03:31 AM, Matteo Bruni wrote:
2017-03-24 0:39 GMT+01:00 Paul Gofman gofmanp@gmail.com:
On 03/24/2017 01:57 AM, Matteo Bruni wrote:
2017-03-23 16:06 GMT+01:00 Paul Gofman gofmanp@gmail.com:
Signed-off-by: Paul Gofman gofmanp@gmail.com
dlls/d3dx9_36/preshader.c | 103 ++++++++++++++++++++++++++++++++++++------- dlls/d3dx9_36/tests/effect.c | 38 ++++++++-------- 2 files changed, 106 insertions(+), 35 deletions(-)
diff --git a/dlls/d3dx9_36/preshader.c b/dlls/d3dx9_36/preshader.c index 9aedc99..2974ffd 100644 --- a/dlls/d3dx9_36/preshader.c +++ b/dlls/d3dx9_36/preshader.c @@ -191,6 +191,8 @@ struct d3dx_pres_operand /* offset is component index, not register index, e. g. offset for component c3.y is 13 (3 * 4 + 1) */ unsigned int offset;
- enum pres_reg_tables base_reg_table;
- unsigned int base_reg_offset; };
It would be nicer to create a struct for storing together table and offset and to pass it around to the various functions taking both arguments, like parse_pres_reg(). It should probably be a separate patch before this one.
Actually I can't guess the case when base_reg_table would be not a floating constant preshader input. Maybe I just remove this variable and check with FIXME during parsing?
Sure, that's fine. Still the struct seems useful for parameter passing, if not right there in struct d3dx_pres_operand.
Sorry, I messed up things actually, base register actually can come from temporary (register) table also, which happens in the test. So I will introduce extra structure for registers.
static double exec_get_arg(struct d3dx_regstore *rs, const struct d3dx_pres_operand *opr, unsigned int comp) {
- if (!regstore_is_val_set_reg(rs, opr->table, (opr->offset + comp) /
table_info[opr->table].reg_component_count))
WARN("Using uninitialized input, table %u, offset %u.\n",
opr->table, opr->offset + comp);
- unsigned int base_offset, index;
- if (opr->base_reg_table == PRES_REGTAB_COUNT)
index = 0;
- else
index = (int)exec_get_reg_value(rs, opr->base_reg_table,
opr->base_reg_offset);
I think it's better with lrint() in place of the cast.
I just want to note that whenever the input is from floating values FX compiler generates preshader instructions which do the rounding, and skips that if the base register comes from integers. Do you think we need to do extra rounding in this case?
Ah, you're right, I somehow overlooked that. Actually I think doublechecking (e.g. with a FIXME) that the source is really integer would be useful. If it isn't too much of a mess it would be nice to avoid the roundtrip to doubles for the index entirely.
Source of the base register value seems not so straightforward to check. The input table always has float format. To do such check, we need to find which parameter the register is initialized from, we don't have such a link now. Besides, using float register as index seems totally ok in effect, compiler generates rounding code for this case. Moreover, base register used in instruction may come from temporary ("r") registers and be a result of some operations, like rounding or multiplication when indexing anything with stride other than 4xfloat. Or could it be you had in mind not checking the source of register value, but rather that the value is actually an integer (fractional part is zero)?
2017-03-24 2:16 GMT+01:00 Paul Gofman gofmanp@gmail.com:
On 03/24/2017 03:31 AM, Matteo Bruni wrote:
2017-03-24 0:39 GMT+01:00 Paul Gofman gofmanp@gmail.com:
On 03/24/2017 01:57 AM, Matteo Bruni wrote:
2017-03-23 16:06 GMT+01:00 Paul Gofman gofmanp@gmail.com:
Signed-off-by: Paul Gofman gofmanp@gmail.com
dlls/d3dx9_36/preshader.c | 103 ++++++++++++++++++++++++++++++++++++------- dlls/d3dx9_36/tests/effect.c | 38 ++++++++-------- 2 files changed, 106 insertions(+), 35 deletions(-)
diff --git a/dlls/d3dx9_36/preshader.c b/dlls/d3dx9_36/preshader.c index 9aedc99..2974ffd 100644 --- a/dlls/d3dx9_36/preshader.c +++ b/dlls/d3dx9_36/preshader.c @@ -191,6 +191,8 @@ struct d3dx_pres_operand /* offset is component index, not register index, e. g. offset for component c3.y is 13 (3 * 4 + 1) */ unsigned int offset;
- enum pres_reg_tables base_reg_table;
- unsigned int base_reg_offset; };
It would be nicer to create a struct for storing together table and offset and to pass it around to the various functions taking both arguments, like parse_pres_reg(). It should probably be a separate patch before this one.
Actually I can't guess the case when base_reg_table would be not a floating constant preshader input. Maybe I just remove this variable and check with FIXME during parsing?
Sure, that's fine. Still the struct seems useful for parameter passing, if not right there in struct d3dx_pres_operand.
Sorry, I messed up things actually, base register actually can come from temporary (register) table also, which happens in the test. So I will introduce extra structure for registers.
So I had a look at my stuff. As you know it has got very little and specific testing and is most likely missing some cases but I see that: - for relative addressing I accept any register type as base but expect only temporary registers as index (i.e. I look for "flags" == 1 and then for the "base" fields only when parsing TEMP registers) - for CONST registers the "flags" != 0 field has a different meaning. In that case it is an offset in the IMMED table and there is no additional "base" register type + index in the effect bytecode, so it's indexed like imm["flags" + cidx].
Unfortunately I don't think I have any effect with the latter feature around. Supporting that should probably be a separate patch but try to see if you can replicate that.
static double exec_get_arg(struct d3dx_regstore *rs, const struct d3dx_pres_operand *opr, unsigned int comp) {
- if (!regstore_is_val_set_reg(rs, opr->table, (opr->offset + comp)
/ table_info[opr->table].reg_component_count))
WARN("Using uninitialized input, table %u, offset %u.\n",
opr->table, opr->offset + comp);
- unsigned int base_offset, index;
- if (opr->base_reg_table == PRES_REGTAB_COUNT)
index = 0;
- else
index = (int)exec_get_reg_value(rs, opr->base_reg_table,
opr->base_reg_offset);
I think it's better with lrint() in place of the cast.
I just want to note that whenever the input is from floating values FX compiler generates preshader instructions which do the rounding, and skips that if the base register comes from integers. Do you think we need to do extra rounding in this case?
Ah, you're right, I somehow overlooked that. Actually I think doublechecking (e.g. with a FIXME) that the source is really integer would be useful. If it isn't too much of a mess it would be nice to avoid the roundtrip to doubles for the index entirely.
Source of the base register value seems not so straightforward to check.
The input table always has float format. To do such check, we need to find which parameter the register is initialized from, we don't have such a link now.
Right, that's hard to do in the current architecture. Also I confused this with the IMMED relative addressing I mentioned above, for those my code expects the input CONST register to be mapped to an INT parameter. That's only relevant in my architecture since I don't fill all the input registers before executing the preshader but go fetch them "on-the-fly" for each instruction. That is also clearly worse than your solution, of course. Long story short, just ignore this.
Besides, using float register as index seems totally ok in effect, compiler generates rounding code for this case. Moreover, base register used in instruction may come from temporary ("r") registers and be a result of some operations, like rounding or multiplication when indexing anything with stride other than 4xfloat.
Right as well. I don't know, lrintf() seems like a good idea still.
Or could it be you had in mind not checking the source of register
value, but rather that the value is actually an integer (fractional part is zero)?
That might be a bit fragile (because of potential numerical accuracy troubles), although as you noticed the compiler seems to go to great lengths to make sure the value is exactly an integer. Maybe doing that kind of check only in a if (WARN_ON(d3dx)) might make sense as it would be somewhat surprising and it might reveal something else going on but it doesn't seem worth it to me in general.
On 03/24/2017 05:10 PM, Matteo Bruni wrote:
2017-03-24 2:16 GMT+01:00 Paul Gofman gofmanp@gmail.com:
Sorry, I messed up things actually, base register actually can come from temporary (register) table also, which happens in the test. So I will introduce extra structure for registers.
So I had a look at my stuff. As you know it has got very little and specific testing and is most likely missing some cases but I see that:
- for relative addressing I accept any register type as base but
expect only temporary registers as index (i.e. I look for "flags" == 1 and then for the "base" fields only when parsing TEMP registers)
- for CONST registers the "flags" != 0 field has a different meaning.
In that case it is an offset in the IMMED table and there is no additional "base" register type + index in the effect bytecode, so it's indexed like imm["flags" + cidx].
What is seen from my tests for relative addressing (all these cases below are present in the test currently there in the last patch I sent, though I will add another test in the updated patch which covers more cases, and also has a non-zero absolute offset, which helped me to discovered and fix the bug): 1. Temporary register or constant float (e. g. c0) register can be used as base register in relative addressing. Temporary register is used by compiler when it needs to do some precompute for the index value, like rounding, multiplying by stride for structure or any other arithmetic with indexes, otherwise cX registers are used directly.
2. If the value being indexed is 4x vectors aligned, the main register being indexed is cX, it is directly indexed by base register;
3. If the value we are indexing into is some vector, or matrix, or anything not 4x aligned, compiler generates a code doing the indexing through a 'dot' instructions with base register indexing immediate constants to dot with.
This is basically present in the test I sent (doing a trace with +d3dx will show preshaders with these relative addressing types, the string to easily find the code in the output is 'imm['). in the next update I will include another test which is a bit more complicated in terms of this 'dot' arithmetic.
So unless I am missing something I don't see yet how flags have a different meaning for const or immed registers, it looks very straightforward and matches the test.