On 05/31/2016 12:24 AM, Matteo Bruni wrote:
+struct d3dx_const_copy_state
I'm not entirely happy with the name, maybe d3dx_const_param_eval_output or something in that fashion?
I will rename to d3dx_const_param_eval_output.
unsigned int out;
unsigned int *in = (unsigned int *)param->data + set_state.param_offset + j;
switch (table_info[set_state.table].type)
{
case PRES_VT_FLOAT: set_number(&out, D3DXPT_FLOAT, in, param->type); break;
case PRES_VT_INT: set_number(&out, D3DXPT_INT, in, param->type); break;
case PRES_VT_BOOL: set_number(&out, D3DXPT_BOOL, in, param->type); break;
default:
FIXME("Unexpected type %#x.\n", table_info[set_state.table].type);
break;
}
regstore_set_values(rs, set_state.table, &out, set_state.table_offset + j, 1);
}
}
set_state.param_offset += count;
set_state.table_offset += count;
break;
}
}
- }
+}
This seems overly complicated and fragile. Instead of this kind of "5 instructions bytecode" isn't it easier to just store and then lookup a struct with all the info you need?
Keeping a struct with offsets & count altogether is of course more straightforward, but I intentionally did it this way to save a lot of extra space and to introduce an easier coalescing. When there is a matrix transpose involved (which happens quite often in real apps as far as I have seen) the whole structure would be duplicated for every matrix element. In the current approach each element for transposed matrix has just one offset (destination offset changes sequentially and does not come to the array for each matrix element) and one count of 1. The other way to optimize this is to introduce a separate operation for transposed matrix copy, but will it really be simpler or nicer than it is now? Current approach is indeed a sort of encoding the data copy into a very simple bytecode. Keeping the whole struct and optimizing as a separate op for matrix transpose would be the same in principle, but I am afraid will result in a bit more code and a bit more mess as there will be more cases to handle. I thought the current approach is simpler as the data copy is encoded in a generic and simple way so on the actual copy I do not have to care for any specifics of initial parameter & constant layout, as well as in process of extending the array & coalescing.
I guess the end result would be something akin to storing an array of struct d3dx_const_copy_state in place of struct d3dx_const_param_set. I don't know how the code would exactly look like but I would start by moving building all the struct d3dx_const_copy_param to init time.
I am not sure I understand what you mean here, could you please clarify? In the variant currently suggested the whole building of structure responsible for parameter copy is already done solely at init time. Shader/preshader full constant structure is not stored anymore. There is still an array of parameters and constant description left which is currently used in effect.c code for setting sampler states. This could easily be optimized further by not storing the parameters other than samplers and storing just register index instead of the whole constant descs for them, but this looked as a separate step to me not fully related to this patch. Or probably I am misunderstanding your point here.
BTW the coalescing you currently do in add_const_set() could be potentially done as a separate pass after you generate all the "d3dx_const_copy_state-like" structures (but still at init time), which might be simpler.
It can be done as a separate pass, but why? In my understanding it will result just in bigger intermediate array and a few more lines of code. I will need to do all the same but not at once when getting an "copy request" but from scanning the pre-stored array. Or am I missing something?
Whitespace.
@@ -931,21 +1107,17 @@ static HRESULT set_constants_param(struct d3dx_regstore *rs, struct d3dx_const_t major_stride = max(minor, table_info[table].reg_component_count); n = min(major * major_stride, desc.RegisterCount * table_info[table].reg_component_count + major_stride - 1) / major_stride;
- for (i = 0; i < n; ++i) { for (j = 0; j < minor; ++j) {
unsigned int out;
unsigned int *in; unsigned int offset; offset = start_offset + i * major_stride + j;
if (offset / table_info[table].reg_component_count >= rs->table_sizes[table])
{
if (table_info[table].reg_component_count != 1)
FIXME("Output offset exceeds table size, name %s, component %u.\n", desc.Name, i);
if ((offset - start_offset) / table_info[table].reg_component_count >= desc.RegisterCount) break;
Maybe keeping a WARN or something makes sense here?
This break is just normal for boolean register set (it is covered by test case) and should not happen for vec4 register sets. This is because the inner loop can never be bound by desc.RegisterCount for vec4 registers (as nminor <= 4 and limiting just outer 'n' loop counter is sufficient), but can be bound for boolean single value registers.
I will get it back in the same way it was before (FIXME for vec4 register sets and nothing for boolean).