2017-05-12 14:24 GMT+02:00 Paul Gofman gofmanp@gmail.com:
Looping through shared parameters in set_dirty() which is removed by this patch has a major performance impact on parameter update for applications creating a lot of effects and sharing parameters between them, which is a common case. Using versioned scheme also allows for some further optimization.
Signed-off-by: Paul Gofman gofmanp@gmail.com
I don't have serious issues with this patch but I also have a number of questions so I don't feel like signing it off either. Sorry :/
@@ -163,6 +163,7 @@ struct d3dx_const_tab unsigned int const_set_size; struct d3dx_const_param_eval_output *const_set; const enum pres_reg_tables *regset2table;
- ULONG64 update_version;
};
Is it necessary? Could you do without it, using the pass version instead?
struct d3dx_shared_data;
@@ -213,6 +214,8 @@ struct d3dx_parameter UINT bytes; DWORD runtime_flags; DWORD object_id;
- ULONG64 update_version;
- ULONG64 *update_version_counter;
I don't like the "update_version_counter" name that much but I don't have particularly good suggestions. pass_update_version? current_update_version? parent_update_version? new_update_version?
+static inline ULONG64 get_update_version_counter(ULONG64 *update_version_counter) +{
- return ++*update_version_counter;
+}
Maybe it's just me but it's a bit surprising to see this get_x() function having side effects. Not sure how to call it. next_update_version()? inc_update_version_counter()?
+static inline BOOL is_param_dirty(struct d3dx_parameter *param, ULONG64 update_version) +{
- struct d3dx_shared_data *shared_data;
- if ((shared_data = param->top_level_param->u.shared_data))
return update_version < shared_data->update_version;
- else
return update_version < param->top_level_param->update_version;
+}
Did you try if a unified update_version pointer looks any nicer? Especially considering that you have a more or less matching *update_version_counter at the moment.
HRESULT d3dx_evaluate_parameter(struct d3dx_param_eval *peval,
const struct d3dx_parameter *param, void *param_value, BOOL update_all) DECLSPEC_HIDDEN;
const struct d3dx_parameter *param, void *param_value) DECLSPEC_HIDDEN;
I guess this has to do with the update_version in struct d3dx_const_tab. I might be missing something, in that case please clue me.
+static ULONG64 get_effect_update_version_counter(struct d3dx9_base_effect *base) +{
- return get_update_version_counter(get_update_version_counter_ptr(base));
}
Same naming caveat here.
-static void clear_dirty_params(struct d3dx9_base_effect *base) +static void set_dirty(struct d3dx_parameter *param) {
- unsigned int i;
- struct d3dx_shared_data *shared_data;
- struct d3dx_parameter *top_param = param->top_level_param;
- ULONG64 new_update_version = get_update_version_counter(top_param->update_version_counter);
- for (i = 0; i < base->parameter_count; ++i)
base->parameters[i].runtime_flags &= ~PARAMETER_FLAG_DIRTY;
- if ((shared_data = top_param->u.shared_data))
shared_data->update_version = new_update_version;
- else
top_param->update_version = new_update_version;
}
Another option would be NOT to increment the effect / pool counter on each set_dirty() but instead only increment it in BeginPass() and CommitChanges(), after you're done with the update (or well, it's fine at almost any point given that we don't have to care about races). It seems a bit better to me, unless I'm overlooking something and it won't work.
static HRESULT set_string(char **param_data, const char *string) @@ -3078,6 +3074,7 @@ static HRESULT d3dx9_apply_pass_states(struct ID3DXEffectImpl *effect, struct d3 unsigned int i; HRESULT ret; HRESULT hr;
ULONG64 new_update_version = get_effect_update_version_counter(&effect->base_effect);
TRACE("effect %p, pass %p, state_count %u.\n", effect, pass, pass->state_count);
@@ -3109,7 +3106,7 @@ static HRESULT d3dx9_apply_pass_states(struct ID3DXEffectImpl *effect, struct d3 } effect->material_updated = FALSE;
- clear_dirty_params(&effect->base_effect);
- pass->update_version = new_update_version; return ret;
}
Right, here is where I'd update the counter.
On 05/15/2017 09:19 PM, Matteo Bruni wrote:
2017-05-12 14:24 GMT+02:00 Paul Gofman gofmanp@gmail.com:
@@ -163,6 +163,7 @@ struct d3dx_const_tab unsigned int const_set_size; struct d3dx_const_param_eval_output *const_set; const enum pres_reg_tables *regset2table;
- ULONG64 update_version; };
Is it necessary? Could you do without it, using the pass version instead?
I think I can, but the reasons I did this were: 1. The logic in patch 5 (removing redundant constant table updates & preshader recomputes) won't work without it. Imagine an array selector with shader 1 selected, when after BeginPass() or Commit() it will update the version of pass. If another shader is selected later it might not get updated for the some parameter changes, so I will have to ultimately recompute all preshaders and reset constant table parameters on update_all, which I can avoid using this const_tab own version. Even if I manage to workaround this case with array selector somehow, redundant shader constant updates and preshaders recomputes can happen if different passes or techniques use the same shader (as preshader functions will be called with some old update_version, older than the last one for which this was actually called already). So I guess going without would mean abandoning the idea for patch 5, or somehow greatly complicating it.
2. I thought encapsulating constant updates arguably makes (for everything but a very specific case of emulating native behaviour on array selector index) it more structural and easier verifiable. It is just a single value for the whole parameter table, so I didn't think it should add an overhead compared to passing the parameter.
3. Adding 'update_version' parameter to (almost) every preshader.c function call looked more ugly to me.
struct d3dx_shared_data;
@@ -213,6 +214,8 @@ struct d3dx_parameter UINT bytes; DWORD runtime_flags; DWORD object_id;
- ULONG64 update_version;
- ULONG64 *update_version_counter;
I don't like the "update_version_counter" name that much but I don't have particularly good suggestions. pass_update_version? current_update_version? parent_update_version? new_update_version?
It is not a pass update version in my understanding, it relates to parameter or constant table update version as well. So maybe current_update_version? Or just version_counter, which looks consistent to how the same thing is often named in DBs?
+static inline ULONG64 get_update_version_counter(ULONG64 *update_version_counter) +{
- return ++*update_version_counter;
+}
Maybe it's just me but it's a bit surprising to see this get_x() function having side effects. Not sure how to call it. next_update_version()? inc_update_version_counter()?
I will rename to next_update_version().
+static inline BOOL is_param_dirty(struct d3dx_parameter *param, ULONG64 update_version) +{
- struct d3dx_shared_data *shared_data;
- if ((shared_data = param->top_level_param->u.shared_data))
return update_version < shared_data->update_version;
- else
return update_version < param->top_level_param->update_version;
+}
Did you try if a unified update_version pointer looks any nicer? Especially considering that you have a more or less matching *update_version_counter at the moment.
Do you mean a pointer in the update_version in the struct d3dx_parameter which will point to either shared data or the update version in the same parameter? I would like to avoid this if possible as it makes extra indirection for non-shared parameters. I don't have exact performance difference exactly for such a change though, but it looks like removing some indirections here and there (after a more global optimizations of set_constants function) gives some moderate improvement.
In the further optimization which I am testing I go bit more towards less indirection, e. g., in function is_const_tab_input_dirty() I use the fact that actually the parameters there are always top level.
HRESULT d3dx_evaluate_parameter(struct d3dx_param_eval *peval,
const struct d3dx_parameter *param, void *param_value, BOOL update_all) DECLSPEC_HIDDEN;
const struct d3dx_parameter *param, void *param_value) DECLSPEC_HIDDEN;
I guess this has to do with the update_version in struct d3dx_const_tab. I might be missing something, in that case please clue me.
I don't need update_all parameter anymore here (regardless of storing update_version), as I don't have to ever recompute preshader for parameter if it is not required by parameter version change. If update_version is stored d3dx_const_tab, I don't have to pass an update_version here, apart from some other benefits I see in that as I outlined above. Otherwise I need to pass a version as a parameter to this function.
+static ULONG64 get_effect_update_version_counter(struct d3dx9_base_effect *base) +{
- return get_update_version_counter(get_update_version_counter_ptr(base)); }
Same naming caveat here.
next_effect_update_version()?
-static void clear_dirty_params(struct d3dx9_base_effect *base) +static void set_dirty(struct d3dx_parameter *param) {
- unsigned int i;
- struct d3dx_shared_data *shared_data;
- struct d3dx_parameter *top_param = param->top_level_param;
- ULONG64 new_update_version = get_update_version_counter(top_param->update_version_counter);
- for (i = 0; i < base->parameter_count; ++i)
base->parameters[i].runtime_flags &= ~PARAMETER_FLAG_DIRTY;
- if ((shared_data = top_param->u.shared_data))
shared_data->update_version = new_update_version;
- else
}top_param->update_version = new_update_version;
Another option would be NOT to increment the effect / pool counter on each set_dirty() but instead only increment it in BeginPass() and CommitChanges(), after you're done with the update (or well, it's fine at almost any point given that we don't have to care about races). It seems a bit better to me, unless I'm overlooking something and it won't work.
After this patch, set_dirty() is already completely out of any performance hot spots, as far as I could measure so far (though there are the other places like mainly set_constants that I was going to address too).
Not incrementing the version does not improve things much in terms of racing in my understanding. Since we don't have to support it within pool it should be ok either way. If we had to, we would also need to care for some way for atomic & fenced read even when not incrementing, and probably parallel BeginPass with Set... would make non-incrementing logic require additional sync.
Not incrementing the version on update likely won't break anything, if to change the initialization a bit (increment version on effect creation) and carefully watch comparison condition for a sort of "off by one" errors. At least now I cannot figure out what can go wrong, though I need to test it more and to recheck the code paths thoroughly to be sure if it is ok. It looked natural to me to increase the version on parameter update. Don't you think that verifying the overall logic if we don't increment becomes a bit mind breaking? Incrementing the version every time makes the 'updated' check straightforward for any case, regardless of how parameters update sequence interfere with updates 'consuming' sequence. Otherwise we need to check all the possible use cases of "dirty" parameters to be sure.
Or if you suspect it may have some indirect performance impact (which I am keen to test) maybe I could test it as a separate change after some other (more severe) optimization, when I potentially can see this effect more clear?
2017-05-15 22:24 GMT+02:00 Paul Gofman gofmanp@gmail.com:
On 05/15/2017 09:19 PM, Matteo Bruni wrote:
2017-05-12 14:24 GMT+02:00 Paul Gofman gofmanp@gmail.com:
@@ -163,6 +163,7 @@ struct d3dx_const_tab unsigned int const_set_size; struct d3dx_const_param_eval_output *const_set; const enum pres_reg_tables *regset2table;
- ULONG64 update_version; };
Is it necessary? Could you do without it, using the pass version instead?
I think I can, but the reasons I did this were:
- The logic in patch 5 (removing redundant constant table updates &
preshader recomputes) won't work without it. Imagine an array selector with shader 1 selected, when after BeginPass() or Commit() it will update the version of pass. If another shader is selected later it might not get updated for the some parameter changes, so I will have to ultimately recompute all preshaders and reset constant table parameters on update_all, which I can avoid using this const_tab own version. Even if I manage to workaround this case with array selector somehow, redundant shader constant updates and preshaders recomputes can happen if different passes or techniques use the same shader
That seems like a good reason. Do you think it's possible to write a test for it (i.e. does this kind of behavior have some visible effect in black-box testing?)
struct d3dx_shared_data;
@@ -213,6 +214,8 @@ struct d3dx_parameter UINT bytes; DWORD runtime_flags; DWORD object_id;
- ULONG64 update_version;
- ULONG64 *update_version_counter;
I don't like the "update_version_counter" name that much but I don't have particularly good suggestions. pass_update_version? current_update_version? parent_update_version? new_update_version?
It is not a pass update version in my understanding, it relates to parameter or constant table update version as well. So maybe current_update_version? Or just version_counter, which looks consistent to how the same thing is often named in DBs?
Yeah, I like version_counter actually. Pick whatever you prefer.
Do you mean a pointer in the update_version in the struct d3dx_parameter which will point to either shared data or the update version in the same parameter? I would like to avoid this if possible as it makes extra indirection for non-shared parameters. I don't have exact performance difference exactly for such a change though, but it looks like removing some indirections here and there (after a more global optimizations of set_constants function) gives some moderate improvement.
It would look a bit nicer though. Anyway, I don't really mind.
In the further optimization which I am testing I go bit more towards
less indirection, e. g., in function is_const_tab_input_dirty() I use the fact that actually the parameters there are always top level.
Well, that's a simplification (which sounds good BTW), not really a choice between two similar alternative options.
+static ULONG64 get_effect_update_version_counter(struct d3dx9_base_effect *base) +{
- return
get_update_version_counter(get_update_version_counter_ptr(base)); }
Same naming caveat here.
next_effect_update_version()?
Yeah, or something like that.
-static void clear_dirty_params(struct d3dx9_base_effect *base) +static void set_dirty(struct d3dx_parameter *param) {
- unsigned int i;
- struct d3dx_shared_data *shared_data;
- struct d3dx_parameter *top_param = param->top_level_param;
- ULONG64 new_update_version =
get_update_version_counter(top_param->update_version_counter);
- for (i = 0; i < base->parameter_count; ++i)
base->parameters[i].runtime_flags &= ~PARAMETER_FLAG_DIRTY;
- if ((shared_data = top_param->u.shared_data))
shared_data->update_version = new_update_version;
- else
}top_param->update_version = new_update_version;
Another option would be NOT to increment the effect / pool counter on each set_dirty() but instead only increment it in BeginPass() and CommitChanges(), after you're done with the update (or well, it's fine at almost any point given that we don't have to care about races). It seems a bit better to me, unless I'm overlooking something and it won't work.
After this patch, set_dirty() is already completely out of any
performance hot spots, as far as I could measure so far (though there are the other places like mainly set_constants that I was going to address too).
Not incrementing the version does not improve things much in terms of
racing in my understanding. Since we don't have to support it within pool it should be ok either way. If we had to, we would also need to care for some way for atomic & fenced read even when not incrementing, and probably parallel BeginPass with Set... would make non-incrementing logic require additional sync.
No, I'm not suddenly starting to care about race conditions. :)
Not incrementing the version on update likely won't break anything, if
to change the initialization a bit (increment version on effect creation) and carefully watch comparison condition for a sort of "off by one" errors. At least now I cannot figure out what can go wrong, though I need to test it more and to recheck the code paths thoroughly to be sure if it is ok. It looked natural to me to increase the version on parameter update. Don't you think that verifying the overall logic if we don't increment becomes a bit mind breaking? Incrementing the version every time makes the 'updated' check straightforward for any case, regardless of how parameters update sequence interfere with updates 'consuming' sequence. Otherwise we need to check all the possible use cases of "dirty" parameters to be sure.
AFAICS it shouldn't make any difference in behavior, unless there is some bug lurking (and in that case exposing the bug would be a good thing IMO).
Or if you suspect it may have some indirect performance impact (which I
am keen to test) maybe I could test it as a separate change after some other (more severe) optimization, when I potentially can see this effect more clear?
Yeah, I don't see it having any measurable impact but it's still unnecessary (if trivial) work.
I don't mind looking into it later, I mentioned it mostly to doublecheck that we're on the same page.
On 05/16/2017 05:50 PM, Matteo Bruni wrote:
2017-05-15 22:24 GMT+02:00 Paul Gofman gofmanp@gmail.com:
On 05/15/2017 09:19 PM, Matteo Bruni wrote:
2017-05-12 14:24 GMT+02:00 Paul Gofman gofmanp@gmail.com:
@@ -163,6 +163,7 @@ struct d3dx_const_tab unsigned int const_set_size; struct d3dx_const_param_eval_output *const_set; const enum pres_reg_tables *regset2table;
- ULONG64 update_version; };
Is it necessary? Could you do without it, using the pass version instead?
I think I can, but the reasons I did this were:
- The logic in patch 5 (removing redundant constant table updates &
preshader recomputes) won't work without it. Imagine an array selector with shader 1 selected, when after BeginPass() or Commit() it will update the version of pass. If another shader is selected later it might not get updated for the some parameter changes, so I will have to ultimately recompute all preshaders and reset constant table parameters on update_all, which I can avoid using this const_tab own version. Even if I manage to workaround this case with array selector somehow, redundant shader constant updates and preshaders recomputes can happen if different passes or techniques use the same shader
That seems like a good reason. Do you think it's possible to write a test for it (i.e. does this kind of behavior have some visible effect in black-box testing?)
I don't see how the choice between the options here can be exhibited to black box behaviour, unless I figure out some way of "probing" it through race conditions behaviour or something like that which looks a bit insane to me and would require some guessing on native d3dx structures layout. Its all about saving preshader executes and table updates in some cases (which should be producing the same results anyway) and not passing through 'update_version' to all preshader functions.
Do you mean a pointer in the update_version in the struct d3dx_parameter which will point to either shared data or the update version in the same parameter? I would like to avoid this if possible as it makes extra indirection for non-shared parameters. I don't have exact performance difference exactly for such a change though, but it looks like removing some indirections here and there (after a more global optimizations of set_constants function) gives some moderate improvement.
It would look a bit nicer though. Anyway, I don't really mind.
I then suggest to maybe rethink it once again at later stage. I already have the patch which makes different structures for top level parameter and members which saves a few fields for members, I would feel less regret adding an extra pointer field to just top level parameter :) I am not sure yet if that patch is useful at all though as I could not yet reliably measure any performance improvement from it. Anyway, I am likely to experiment more with finer perfomance optimizations after the bigger ones, and at this point it may eventually become beneficial to yet change a few things around this.
Not incrementing the version on update likely won't break anything, if
to change the initialization a bit (increment version on effect creation) and carefully watch comparison condition for a sort of "off by one" errors. At least now I cannot figure out what can go wrong, though I need to test it more and to recheck the code paths thoroughly to be sure if it is ok. It looked natural to me to increase the version on parameter update. Don't you think that verifying the overall logic if we don't increment becomes a bit mind breaking? Incrementing the version every time makes the 'updated' check straightforward for any case, regardless of how parameters update sequence interfere with updates 'consuming' sequence. Otherwise we need to check all the possible use cases of "dirty" parameters to be sure.
AFAICS it shouldn't make any difference in behavior, unless there is some bug lurking (and in that case exposing the bug would be a good thing IMO).
Or if you suspect it may have some indirect performance impact (which I
am keen to test) maybe I could test it as a separate change after some other (more severe) optimization, when I potentially can see this effect more clear?
Yeah, I don't see it having any measurable impact but it's still unnecessary (if trivial) work.
I don't mind looking into it later, I mentioned it mostly to doublecheck that we're on the same page.
I totally agree, I don't see how it can break anything either, moreover, I already briefly tested such a change before replying yesterday (there are couple of more small things to do rather than just not to increment counter to make it work here, but still trivial). Just having a unique version for each parameter update and each "commit" feels like bit less fragile and easier to work with to me, with version equality between pass update_version and parameter update version being impossible and parameter update sequence counting going completely independent of how these updates are accepted by "readers". So unless you feel not comfortable with this increment I would keep it, otherwise can change it.