2017-05-24 11:46 GMT+02:00 Paul Gofman gofmanp@gmail.com:
if (!(const_set->table == current_table && current_start_offset == start_offset
&& const_set->direct_copy == first_const->direct_copy
&& current_data == const_set->param->data
&& (const_set->direct_copy || (first_const->param->type == const_set->param->type
&& first_const->param->class == const_set->param->class
&& first_const->param->columns == const_set->param->columns
&& first_const->param->rows == const_set->param->rows
&& first_const->register_count == const_set->register_count
&& (i == const_tab->const_set_count - 1
|| first_const->param->element_count == const_set->param->element_count)))))
{
TRACE("direct_copy %u, i %u, index %u, param %s.%s, current_data %p, const_set->param->data %p, "\
"current_start_offset %u, start_offset %u, const_set->table %u, current_table %u.\n",
const_set->direct_copy, i, index, debugstr_a(param->name),
debugstr_a(const_set->param->name), current_data, const_set->param->data,
current_start_offset, start_offset, const_set->table, current_table);
break;
}
This looks like a debug trace.
TRACE("Merging %u child parameters for %s, not merging %u, direct_copy %u.\n", i - index,
debugstr_a(param->name), const_tab->const_set_count - i, first_const->direct_copy);
I guess it's intentional that you're printing this even when not merging anything, in that case the TRACE sounds a bit awkward though. Maybe have a separate TRACE for that case?
BTW, please format the BOOL with %#x.
if (i > index + 1)
{
first_const->element_count = element_count;
if (first_const->direct_copy)
{
first_const->element_count = 1;
if (index == start_index
&& !(param->type == D3DXPT_VOID && param->class == D3DXPC_STRUCT))
{
if (param_type_to_table_type(param->type) == PRES_VT_MAX)
return D3DERR_INVALIDCALL;
first_const->param = param;
}
first_const->register_count = get_reg_offset(current_table, current_start_offset)
- first_const->register_index;
}
memmove(&const_tab->const_set[index + 1], &const_tab->const_set[i],
sizeof(*const_tab->const_set) * (const_tab->const_set_count - i));
const_tab->const_set_count -= i - index - 1;
}
Just for my own curiosity, do you think it's significant to merge non-direct_copy entries? Not that it hurts anything to have it.
On 05/29/2017 09:24 PM, Matteo Bruni wrote:
2017-05-24 11:46 GMT+02:00 Paul Gofman gofmanp@gmail.com:
if (!(const_set->table == current_table && current_start_offset == start_offset
&& const_set->direct_copy == first_const->direct_copy
&& current_data == const_set->param->data
&& (const_set->direct_copy || (first_const->param->type == const_set->param->type
&& first_const->param->class == const_set->param->class
&& first_const->param->columns == const_set->param->columns
&& first_const->param->rows == const_set->param->rows
&& first_const->register_count == const_set->register_count
&& (i == const_tab->const_set_count - 1
|| first_const->param->element_count == const_set->param->element_count)))))
{
TRACE("direct_copy %u, i %u, index %u, param %s.%s, current_data %p, const_set->param->data %p, "\
"current_start_offset %u, start_offset %u, const_set->table %u, current_table %u.\n",
const_set->direct_copy, i, index, debugstr_a(param->name),
debugstr_a(const_set->param->name), current_data, const_set->param->data,
current_start_offset, start_offset, const_set->table, current_table);
break;
}
This looks like a debug trace.
TRACE("Merging %u child parameters for %s, not merging %u, direct_copy %u.\n", i - index,
debugstr_a(param->name), const_tab->const_set_count - i, first_const->direct_copy);
I guess it's intentional that you're printing this even when not merging anything, in that case the TRACE sounds a bit awkward though. Maybe have a separate TRACE for that case?
Both traces are to have a possibility to check from trace log what is merged and not "merged" and why, for performance analysis. If you think it is more of debug I can remove it, though since it is in initialization only I was thinking it does not add too much noise.
BTW, please format the BOOL with %#x.
if (i > index + 1)
{
first_const->element_count = element_count;
if (first_const->direct_copy)
{
first_const->element_count = 1;
if (index == start_index
&& !(param->type == D3DXPT_VOID && param->class == D3DXPC_STRUCT))
{
if (param_type_to_table_type(param->type) == PRES_VT_MAX)
return D3DERR_INVALIDCALL;
first_const->param = param;
}
first_const->register_count = get_reg_offset(current_table, current_start_offset)
- first_const->register_index;
}
memmove(&const_tab->const_set[index + 1], &const_tab->const_set[i],
sizeof(*const_tab->const_set) * (const_tab->const_set_count - i));
const_tab->const_set_count -= i - index - 1;
}
Just for my own curiosity, do you think it's significant to merge non-direct_copy entries? Not that it hurts anything to have it.
Yes, it is crucial together with the next patches I did not send yet. Even in this patchset removing extra const_set entries is beneficial: it removes a lot of dirty checks when skipping const_set's from the same parameter on CommitChanges, and also initializing the data for inner loops less times. In the next patches which is not there yet I:
- Factor out type conversion which can be used for an array of values, removing switching for each value being set;
- Introduce a separate path for setting scalar and vectors (which is simpler than general matrix case an quite often may be grouped effectively even when it is not 'direct_copy');
- Optimize matrix settings loops.
Copying ~100 element array with transpose is quite a common case. Merging them in a one const_set and doing that things further is a huge optimization actually.
2017-05-29 20:44 GMT+02:00 Paul Gofman gofmanp@gmail.com:
On 05/29/2017 09:24 PM, Matteo Bruni wrote:
2017-05-24 11:46 GMT+02:00 Paul Gofman gofmanp@gmail.com:
if (!(const_set->table == current_table &&
current_start_offset == start_offset
&& const_set->direct_copy ==
first_const->direct_copy
&& current_data == const_set->param->data
&& (const_set->direct_copy ||
(first_const->param->type == const_set->param->type
&& first_const->param->class ==
const_set->param->class
&& first_const->param->columns ==
const_set->param->columns
&& first_const->param->rows ==
const_set->param->rows
&& first_const->register_count ==
const_set->register_count
&& (i == const_tab->const_set_count - 1
|| first_const->param->element_count ==
const_set->param->element_count)))))
{
TRACE("direct_copy %u, i %u, index %u, param %s.%s,
current_data %p, const_set->param->data %p, "\
"current_start_offset %u, start_offset %u,
const_set->table %u, current_table %u.\n",
const_set->direct_copy, i, index,
debugstr_a(param->name),
debugstr_a(const_set->param->name),
current_data, const_set->param->data,
current_start_offset, start_offset,
const_set->table, current_table);
break;
}
This looks like a debug trace.
TRACE("Merging %u child parameters for %s, not merging %u,
direct_copy %u.\n", i - index,
debugstr_a(param->name), const_tab->const_set_count - i,
first_const->direct_copy);
I guess it's intentional that you're printing this even when not merging anything, in that case the TRACE sounds a bit awkward though. Maybe have a separate TRACE for that case?
Both traces are to have a possibility to check from trace log what is merged and not "merged" and why, for performance analysis. If you think it is more of debug I can remove it, though since it is in initialization only I was thinking it does not add too much noise.
The first trace feels pretty noisy to me for upstream Wine, do you think it would be useful in a trace attached on a generic bug to figure out potential performance improvements?
The "Merging" one is fine, I'm just a tiny bit annoyed by the "Merged 0 child" case.
Just for my own curiosity, do you think it's significant to merge non-direct_copy entries? Not that it hurts anything to have it.
Yes, it is crucial together with the next patches I did not send yet. Even in this patchset removing extra const_set entries is beneficial: it removes a lot of dirty checks when skipping const_set's from the same parameter on CommitChanges, and also initializing the data for inner loops less times. In the next patches which is not there yet I:
- Factor out type conversion which can be used for an array of values,
removing switching for each value being set;
- Introduce a separate path for setting scalar and vectors (which is simpler
than general matrix case an quite often may be grouped effectively even when it is not 'direct_copy');
Optimize matrix settings loops.
Copying ~100 element array with transpose is quite a common case.
Merging them in a one const_set and doing that things further is a huge optimization actually.
Cool. Optimizing the transposed matrix case is probably worthy by itself.
On 05/30/2017 02:03 AM, Matteo Bruni wrote:
2017-05-29 20:44 GMT+02:00 Paul Gofman gofmanp@gmail.com:
On 05/29/2017 09:24 PM, Matteo Bruni wrote:
2017-05-24 11:46 GMT+02:00 Paul Gofman gofmanp@gmail.com:
if (!(const_set->table == current_table &&
current_start_offset == start_offset
&& const_set->direct_copy ==
first_const->direct_copy
&& current_data == const_set->param->data
&& (const_set->direct_copy ||
(first_const->param->type == const_set->param->type
&& first_const->param->class ==
const_set->param->class
&& first_const->param->columns ==
const_set->param->columns
&& first_const->param->rows ==
const_set->param->rows
&& first_const->register_count ==
const_set->register_count
&& (i == const_tab->const_set_count - 1
|| first_const->param->element_count ==
const_set->param->element_count)))))
{
TRACE("direct_copy %u, i %u, index %u, param %s.%s,
current_data %p, const_set->param->data %p, "\
"current_start_offset %u, start_offset %u,
const_set->table %u, current_table %u.\n",
const_set->direct_copy, i, index,
debugstr_a(param->name),
debugstr_a(const_set->param->name),
current_data, const_set->param->data,
current_start_offset, start_offset,
const_set->table, current_table);
break;
}
This looks like a debug trace.
TRACE("Merging %u child parameters for %s, not merging %u,
direct_copy %u.\n", i - index,
debugstr_a(param->name), const_tab->const_set_count - i,
first_const->direct_copy);
I guess it's intentional that you're printing this even when not merging anything, in that case the TRACE sounds a bit awkward though. Maybe have a separate TRACE for that case?
Both traces are to have a possibility to check from trace log what is merged and not "merged" and why, for performance analysis. If you think it is more of debug I can remove it, though since it is in initialization only I was thinking it does not add too much noise.
The first trace feels pretty noisy to me for upstream Wine, do you think it would be useful in a trace attached on a generic bug to figure out potential performance improvements?
I will remove it. Anyway, I can recover the same information from the other present TRACE data, just harder. I don't think this particular message alone worth putting as a "performance debugging" patch. If to do some performance analysis remotely, I would start from time measurements dumping in FIXMEs, though I don't have any more or less universal patch for that now apart from timing BeginPass / CommitChanges. I did inner functions timing at some points but adding that for all possible functions of interest together is not feasible as it does biases the outer functions measurements. If to think of a general patch it probably should have switches of what functions to profile, or just rotate that switches on the fly.
The "Merging" one is fine, I'm just a tiny bit annoyed by the "Merged 0 child" case.
BTW merging "0" technically never happens, merging "1" means that it actually merging nothing. I will add another TRACE message for this case. Ironically I had that before some last edits, but considered a single unified TRACE message nicer than more code branching and different TRACEs.
2017-05-30 10:01 GMT+02:00 Paul Gofman gofmanp@gmail.com:
On 05/30/2017 02:03 AM, Matteo Bruni wrote:
2017-05-29 20:44 GMT+02:00 Paul Gofman gofmanp@gmail.com:
On 05/29/2017 09:24 PM, Matteo Bruni wrote:
2017-05-24 11:46 GMT+02:00 Paul Gofman gofmanp@gmail.com:
if (!(const_set->table == current_table &&
current_start_offset == start_offset
&& const_set->direct_copy ==
first_const->direct_copy
&& current_data == const_set->param->data
&& (const_set->direct_copy ||
(first_const->param->type == const_set->param->type
&& first_const->param->class ==
const_set->param->class
&& first_const->param->columns ==
const_set->param->columns
&& first_const->param->rows ==
const_set->param->rows
&& first_const->register_count ==
const_set->register_count
&& (i == const_tab->const_set_count - 1
|| first_const->param->element_count ==
const_set->param->element_count)))))
{
TRACE("direct_copy %u, i %u, index %u, param %s.%s,
current_data %p, const_set->param->data %p, "\
"current_start_offset %u, start_offset %u,
const_set->table %u, current_table %u.\n",
const_set->direct_copy, i, index,
debugstr_a(param->name),
debugstr_a(const_set->param->name),
current_data, const_set->param->data,
current_start_offset, start_offset,
const_set->table, current_table);
break;
}
This looks like a debug trace.
TRACE("Merging %u child parameters for %s, not merging %u,
direct_copy %u.\n", i - index,
debugstr_a(param->name), const_tab->const_set_count -
i, first_const->direct_copy);
I guess it's intentional that you're printing this even when not merging anything, in that case the TRACE sounds a bit awkward though. Maybe have a separate TRACE for that case?
Both traces are to have a possibility to check from trace log what is merged and not "merged" and why, for performance analysis. If you think it is more of debug I can remove it, though since it is in initialization only I was thinking it does not add too much noise.
The first trace feels pretty noisy to me for upstream Wine, do you think it would be useful in a trace attached on a generic bug to figure out potential performance improvements?
I will remove it. Anyway, I can recover the same information from the other present TRACE data, just harder. I don't think this particular message alone worth putting as a "performance debugging" patch. If to do some performance analysis remotely, I would start from time measurements dumping in FIXMEs, though I don't have any more or less universal patch for that now apart from timing BeginPass / CommitChanges. I did inner functions timing at some points but adding that for all possible functions of interest together is not feasible as it does biases the outer functions measurements. If to think of a general patch it probably should have switches of what functions to profile, or just rotate that switches on the fly.
Yeah, my point (which admittedly didn't quite come out clearly in my previous email) is that there is no easy and generic patch which would help with remote performance debugging. There is always going to be some back and forth with debug patches specific to the case at hand and generally it's going to be a PITA to work on...