2017-05-24 11:46 GMT+02:00 Paul Gofman gofmanp@gmail.com:
static unsigned int get_reg_offset(unsigned int table, unsigned int offset) {
- return offset / table_info[table].reg_component_count;
- return table == PRES_REGTAB_OBCONST ? offset : offset >> 2;
+}
+static unsigned int get_offset_reg(unsigned int table, unsigned int reg_idx) +{
- return table == PRES_REGTAB_OBCONST ? reg_idx : reg_idx << 2;
}
I suppose using an explicit division and multiplication here wouldn't change the generated code, in which case I'd slightly prefer that. It doesn't matter right now, just mentioning it for potentially similar cases in the future.
Also this patch probably should have been split in two, with the first patch introducing get_offset_reg() with no functional changes and the second hardcoding the table check / dropping the table lookup. Not worth a rework at this point I guess.
@@ -1033,9 +1035,9 @@ static void set_constants(struct d3dx_regstore *rs, struct d3dx_const_tab *const unsigned int offset;
offset = start_offset + i * major_stride + j;
if (offset / table_info[table].reg_component_count >= rs->table_sizes[table])
if (get_reg_offset(table, offset) >= rs->table_sizes[table]) {
if (table_info[table].reg_component_count != 1)
if (table != PRES_REGTAB_OBCONST) FIXME("Output offset exceeds table size, name %s, component %u.\n", debugstr_a(param->name), i); break;
Not new and probably I just forgot the details, but, why would the register component count matter for the FIXME?
offset = reg_index * table_info[table].reg_component_count
+ offset % table_info[table].reg_component_count;
offset = get_offset_reg(table, reg_index) + offset % get_offset_reg(table, 1);
That "get_offset_reg(table, 1)" comes up a number of times. Maybe worth making an extra get_reg_components(table) (or similar) helper? Simply for even greater clarity.
On 05/29/2017 09:19 PM, Matteo Bruni wrote:
2017-05-24 11:46 GMT+02:00 Paul Gofman gofmanp@gmail.com:
static unsigned int get_reg_offset(unsigned int table, unsigned int offset) {
- return offset / table_info[table].reg_component_count;
- return table == PRES_REGTAB_OBCONST ? offset : offset >> 2;
+}
+static unsigned int get_offset_reg(unsigned int table, unsigned int reg_idx) +{
- return table == PRES_REGTAB_OBCONST ? reg_idx : reg_idx << 2; }
I suppose using an explicit division and multiplication here wouldn't change the generated code, in which case I'd slightly prefer that. It doesn't matter right now, just mentioning it for potentially similar cases in the future.
Yes, sure, if it would be constant literal for multiplication / division here.
@@ -1033,9 +1035,9 @@ static void set_constants(struct d3dx_regstore *rs, struct d3dx_const_tab *const unsigned int offset;
offset = start_offset + i * major_stride + j;
if (offset / table_info[table].reg_component_count >= rs->table_sizes[table])
if (get_reg_offset(table, offset) >= rs->table_sizes[table]) {
if (table_info[table].reg_component_count != 1)
if (table != PRES_REGTAB_OBCONST) FIXME("Output offset exceeds table size, name %s, component %u.\n", debugstr_a(param->name), i); break;
Not new and probably I just forgot the details, but, why would the register component count matter for the FIXME?
The length of the data to be copied to registers can be limited by RegisterCount in (pre)shader constant description. But when register is 4-value, the major count for output loop should already mind that (since we always have row count and column count <= 4). The only legitimate case we may need to break from the inner loop is boolean constant matrix setting. So actually fixme condition should never happen, though break from the inner loop on boolean constant setting may be required. I am changing this place in the next patches (I didn't send yet) when optimizing this place. I was actually going to add a bit more tests for this case along with the forthcoming changes, and I suspect a bug here (I actually need to effectively compare with constant register count and not with the table size). We have a test for full update with boolean matrices (which will not trigger the error here) but commit of individual matrices test probably will.
offset = reg_index * table_info[table].reg_component_count
+ offset % table_info[table].reg_component_count;
offset = get_offset_reg(table, reg_index) + offset % get_offset_reg(table, 1);
That "get_offset_reg(table, 1)" comes up a number of times. Maybe worth making an extra get_reg_components(table) (or similar) helper? Simply for even greater clarity.
Yes, sure.
2017-05-29 20:32 GMT+02:00 Paul Gofman gofmanp@gmail.com:
On 05/29/2017 09:19 PM, Matteo Bruni wrote:
2017-05-24 11:46 GMT+02:00 Paul Gofman gofmanp@gmail.com:
static unsigned int get_reg_offset(unsigned int table, unsigned int offset) {
- return offset / table_info[table].reg_component_count;
- return table == PRES_REGTAB_OBCONST ? offset : offset >> 2;
+}
+static unsigned int get_offset_reg(unsigned int table, unsigned int reg_idx) +{
- return table == PRES_REGTAB_OBCONST ? reg_idx : reg_idx << 2; }
I suppose using an explicit division and multiplication here wouldn't change the generated code, in which case I'd slightly prefer that. It doesn't matter right now, just mentioning it for potentially similar cases in the future.
Yes, sure, if it would be constant literal for multiplication / division here.
Yeah, it should be a trivial optimization for the compiler and a tiny bit clearer for the reader.
Not a big deal of course.
@@ -1033,9 +1035,9 @@ static void set_constants(struct d3dx_regstore *rs, struct d3dx_const_tab *const unsigned int offset;
offset = start_offset + i * major_stride + j;
if (offset / table_info[table].reg_component_count >=
rs->table_sizes[table])
if (get_reg_offset(table, offset) >=
rs->table_sizes[table]) {
if (table_info[table].reg_component_count != 1)
if (table != PRES_REGTAB_OBCONST) FIXME("Output offset exceeds table size, name
%s, component %u.\n", debugstr_a(param->name), i); break;
Not new and probably I just forgot the details, but, why would the register component count matter for the FIXME?
The length of the data to be copied to registers can be limited by RegisterCount in (pre)shader constant description. But when register is 4-value, the major count for output loop should already mind that (since we always have row count and column count <= 4). The only legitimate case we may need to break from the inner loop is boolean constant matrix setting. So actually fixme condition should never happen, though break from the inner loop on boolean constant setting may be required. I am changing this place in the next patches (I didn't send yet) when optimizing this place. I was actually going to add a bit more tests for this case along with the forthcoming changes, and I suspect a bug here (I actually need to effectively compare with constant register count and not with the table size). We have a test for full update with boolean matrices (which will not trigger the error here) but commit of individual matrices test probably will.
Right, that makes sense, including the likely bug you mention.
In theory it should be possible to do the FIXME check at init time. Actually, the whole constant upload info could be precomputed. Just putting it out there...
On 05/30/2017 01:59 AM, Matteo Bruni wrote:
@@ -1033,9 +1035,9 @@ static void set_constants(struct d3dx_regstore *rs, struct d3dx_const_tab *const unsigned int offset;
offset = start_offset + i * major_stride + j;
if (offset / table_info[table].reg_component_count >=
rs->table_sizes[table])
if (get_reg_offset(table, offset) >=
rs->table_sizes[table]) {
if (table_info[table].reg_component_count != 1)
if (table != PRES_REGTAB_OBCONST) FIXME("Output offset exceeds table size, name
%s, component %u.\n", debugstr_a(param->name), i); break;
Not new and probably I just forgot the details, but, why would the register component count matter for the FIXME?
The length of the data to be copied to registers can be limited by RegisterCount in (pre)shader constant description. But when register is 4-value, the major count for output loop should already mind that (since we always have row count and column count <= 4). The only legitimate case we may need to break from the inner loop is boolean constant matrix setting. So actually fixme condition should never happen, though break from the inner loop on boolean constant setting may be required. I am changing this place in the next patches (I didn't send yet) when optimizing this place. I was actually going to add a bit more tests for this case along with the forthcoming changes, and I suspect a bug here (I actually need to effectively compare with constant register count and not with the table size). We have a test for full update with boolean matrices (which will not trigger the error here) but commit of individual matrices test probably will.
Right, that makes sense, including the likely bug you mention.
In theory it should be possible to do the FIXME check at init time. Actually, the whole constant upload info could be precomputed. Just putting it out there...
My plan was actually to get rid of this FIXME at all (it is actually more of a debug assertion rather than true FIXME which seems redundant now), and besides introduce more code paths for matrix settings, like e. g. the cases when output stride is the same as input. I strictly want to avoid any 'if' in the inner loops at least. Ideally I would maybe like to end up with strides values specialized loops for possible matrix sizes, but not sure yet if it worth it. Regarding precomputing matrix strides / counts, I tried to store them all in const_set structure (after the optimizations I mentioned), but did not see any improvement. It takes quite a lot of fields and probably getting them on the fly takes not more time than adding much more data to const_set.
2017-05-30 9:51 GMT+02:00 Paul Gofman gofmanp@gmail.com:
On 05/30/2017 01:59 AM, Matteo Bruni wrote:
@@ -1033,9 +1035,9 @@ static void set_constants(struct d3dx_regstore *rs, struct d3dx_const_tab *const unsigned int offset;
offset = start_offset + i * major_stride + j;
if (offset / table_info[table].reg_component_count >=
rs->table_sizes[table])
if (get_reg_offset(table, offset) >=
rs->table_sizes[table]) {
if (table_info[table].reg_component_count != 1)
if (table != PRES_REGTAB_OBCONST) FIXME("Output offset exceeds table size,
name %s, component %u.\n", debugstr_a(param->name), i); break;
Not new and probably I just forgot the details, but, why would the register component count matter for the FIXME?
The length of the data to be copied to registers can be limited by RegisterCount in (pre)shader constant description. But when register is 4-value, the major count for output loop should already mind that (since we always have row count and column count <= 4). The only legitimate case we may need to break from the inner loop is boolean constant matrix setting. So actually fixme condition should never happen, though break from the inner loop on boolean constant setting may be required. I am changing this place in the next patches (I didn't send yet) when optimizing this place. I was actually going to add a bit more tests for this case along with the forthcoming changes, and I suspect a bug here (I actually need to effectively compare with constant register count and not with the table size). We have a test for full update with boolean matrices (which will not trigger the error here) but commit of individual matrices test probably will.
Right, that makes sense, including the likely bug you mention.
In theory it should be possible to do the FIXME check at init time. Actually, the whole constant upload info could be precomputed. Just putting it out there...
My plan was actually to get rid of this FIXME at all (it is actually more of a debug assertion rather than true FIXME which seems redundant now), and besides introduce more code paths for matrix settings, like e. g. the cases when output stride is the same as input. I strictly want to avoid any 'if' in the inner loops at least. Ideally I would maybe like to end up with strides values specialized loops for possible matrix sizes, but not sure yet if it worth it. Regarding precomputing matrix strides / counts, I tried to store them all in const_set structure (after the optimizations I mentioned), but did not see any improvement. It takes quite a lot of fields and probably getting them on the fly takes not more time than adding much more data to const_set.
Makes sense.
On 05/30/2017 01:59 AM, Matteo Bruni wrote:
@@ -1033,9 +1035,9 @@ static void set_constants(struct d3dx_regstore *rs, struct d3dx_const_tab *const unsigned int offset;
offset = start_offset + i * major_stride + j;
if (offset / table_info[table].reg_component_count >=
rs->table_sizes[table])
if (get_reg_offset(table, offset) >=
rs->table_sizes[table]) {
if (table_info[table].reg_component_count != 1)
if (table != PRES_REGTAB_OBCONST) FIXME("Output offset exceeds table size, name
%s, component %u.\n", debugstr_a(param->name), i); break;
Not new and probably I just forgot the details, but, why would the register component count matter for the FIXME?
The length of the data to be copied to registers can be limited by RegisterCount in (pre)shader constant description. But when register is 4-value, the major count for output loop should already mind that (since we always have row count and column count <= 4). The only legitimate case we may need to break from the inner loop is boolean constant matrix setting. So actually fixme condition should never happen, though break from the inner loop on boolean constant setting may be required. I am changing this place in the next patches (I didn't send yet) when optimizing this place. I was actually going to add a bit more tests for this case along with the forthcoming changes, and I suspect a bug here (I actually need to effectively compare with constant register count and not with the table size). We have a test for full update with boolean matrices (which will not trigger the error here) but commit of individual matrices test probably will.
Right, that makes sense, including the likely bug you mention.
In theory it should be possible to do the FIXME check at init time. Actually, the whole constant upload info could be precomputed. Just putting it out there...
My plan was actually to get rid of this FIXME at all (it is actually more of a debug assertion rather than true FIXME which seems redundant now), and besides introduce more code paths for matrix settings, like e. g. the cases when output stride is the same as input. I strictly want to avoid any 'if' in the inner loops at least. Ideally I would maybe like to end up with strides values specialized loops for possible matrix sizes, but not sure yet if it worth it. Regarding precomputing matrix strides / counts, I tried to store them all in const_set structure (after the optimizations I mentioned), but did not see any improvement. It takes quite a lot of fields and probably getting them on the fly takes not more time than adding much more data to const_set.