2016-05-31 11:48 GMT+02:00 Paul Gofman gofmanp@gmail.com:
On 05/31/2016 12:24 AM, Matteo Bruni wrote:
This seems overly complicated and fragile. Instead of this kind of "5 instructions bytecode" isn't it easier to just store and then lookup a struct with all the info you need?
Keeping a struct with offsets & count altogether is of course more straightforward, but I intentionally did it this way to save a lot of extra space and to introduce an easier coalescing. When there is a matrix transpose involved (which happens quite often in real apps as far as I have seen) the whole structure would be duplicated for every matrix element. In the current approach each element for transposed matrix has just one offset (destination offset changes sequentially and does not come to the array for each matrix element) and one count of 1. The other way to optimize this is to introduce a separate operation for transposed matrix copy, but will it really be simpler or nicer than it is now? Current approach is indeed a sort of encoding the data copy into a very simple bytecode. Keeping the whole struct and optimizing as a separate op for matrix transpose would be the same in principle, but I am afraid will result in a bit more code and a bit more mess as there will be more cases to handle. I thought the current approach is simpler as the data copy is encoded in a generic and simple way so on the actual copy I do not have to care for any specifics of initial parameter & constant layout, as well as in process of extending the array & coalescing.
I don't see why just moving the part of set_constants() which generates the set_state structures for each "copy" to init time would be more complicated than the current solution. It might take some more memory (not a whole lot more, AFAICS) but other than that it seems strictly nicer to me. Maybe it's a case of "beauty is in the eye of the beholder", I don't know... The issue with transposed matrices seems mostly orthogonal to the encoding of the constant uploads and I agree it's important. What about simply adding a transpose flag and the stride of the matrix and encode it all in one struct (or something like a COPY_STATE_TYPE_COPY_TRANSPOSED op for your approach)? It seems it should be good from a practical standpoint and it should reduce the number of operations (encoded either way) you have to store by a lot.
I guess the end result would be something akin to storing an array of struct d3dx_const_copy_state in place of struct d3dx_const_param_set. I don't know how the code would exactly look like but I would start by moving building all the struct d3dx_const_copy_param to init time.
I am not sure I understand what you mean here, could you please clarify? In the variant currently suggested the whole building of structure responsible for parameter copy is already done solely at init time. Shader/preshader full constant structure is not stored anymore. There is still an array of parameters and constant description left which is currently used in effect.c code for setting sampler states. This could easily be optimized further by not storing the parameters other than samplers and storing just register index instead of the whole constant descs for them, but this looked as a separate step to me not fully related to this patch. Or probably I am misunderstanding your point here.
Yeah, that can and should certainly be a separate patch. What I mean is storing the structs d3dx_const_copy_state you compute in set_constants() (as they are right before actually setting the values) in struct d3dx_const_tab in place of the structs d3dx_const_param_set you use to generate them. Does it make sense?
BTW the coalescing you currently do in add_const_set() could be potentially done as a separate pass after you generate all the "d3dx_const_copy_state-like" structures (but still at init time), which might be simpler.
It can be done as a separate pass, but why? In my understanding it will result just in bigger intermediate array and a few more lines of code. I will need to do all the same but not at once when getting an "copy request" but from scanning the pre-stored array. Or am I missing something?
No, that's right. It is just an alternative option which I mentioned because it might be simpler or more powerful (e.g. it makes possible to coalesce multiple copies which are not next to each other - it probably wouldn't help much at the moment since you can't coalesce accesses from multiple parameters) in some case. Feel free to take or ignore it.