On 2 February 2017 at 12:43, Józef Kucia jkucia@codeweavers.com wrote:
@@ -3520,19 +3546,19 @@ BOOL context_apply_draw_state(struct wined3d_context *context, wined3d_buffer_load_sysmem(state->index_buffer, context); }
- for (i = 0; i < context->numDirtyEntries; ++i)
- for (i = 0, dirty_idx = 0; i < context->numDirtyEntries; ++i) { DWORD rep = context->dirtyArray[i];
DWORD idx = rep / (sizeof(*context->isStateDirty) * CHAR_BIT);
BYTE shift = rep & ((sizeof(*context->isStateDirty) * CHAR_BIT) - 1);
context->isStateDirty[idx] &= ~(1u << shift);
state_table[rep].apply(context, state, rep);
if (!is_compute_state(rep))
apply_state(context, state_table, state, rep);
else
context->dirtyArray[dirty_idx++] = rep;
I'd rather not do this. How important is it to avoid applying compute dispatch states for draws and vice-versa?
As an aside, "dirtyArray" is mostly redundant, and possibly a little harmful.
On Thu, Feb 2, 2017 at 3:07 PM, Henri Verbeet hverbeet@gmail.com wrote:
On 2 February 2017 at 12:43, Józef Kucia jkucia@codeweavers.com wrote:
@@ -3520,19 +3546,19 @@ BOOL context_apply_draw_state(struct wined3d_context *context, wined3d_buffer_load_sysmem(state->index_buffer, context); }
- for (i = 0; i < context->numDirtyEntries; ++i)
- for (i = 0, dirty_idx = 0; i < context->numDirtyEntries; ++i) { DWORD rep = context->dirtyArray[i];
DWORD idx = rep / (sizeof(*context->isStateDirty) * CHAR_BIT);
BYTE shift = rep & ((sizeof(*context->isStateDirty) * CHAR_BIT) - 1);
context->isStateDirty[idx] &= ~(1u << shift);
state_table[rep].apply(context, state, rep);
if (!is_compute_state(rep))
apply_state(context, state_table, state, rep);
else
context->dirtyArray[dirty_idx++] = rep;
I'd rather not do this. How important is it to avoid applying compute dispatch states for draws and vice-versa?
I think it would complicate things a lot. Applying all states for dispatches would mean that we have to acquire, prepare and load all resources potentially used by draws, e.g. render targets and constant buffers. This would be required mainly because of the way how context_state_fb(), state_cb() and similar are implemented. It could be possibly mitigated to some extent by refactoring how the states are applied, e.g. using a similar mechanism to "context->update_shader_resource_bindings", but this approach doesn't appeal to me.
Applying just necessary states may also avoid unnecessary work, but this shouldn't matter much in practice.
On 2 February 2017 at 15:41, Józef Kucia joseph.kucia@gmail.com wrote:
I think it would complicate things a lot. Applying all states for dispatches would mean that we have to acquire, prepare and load all resources potentially used by draws, e.g. render targets and constant buffers. This would be required mainly because of the way how context_state_fb(), state_cb() and similar are implemented. It could be possibly mitigated to some extent by refactoring how the states are applied, e.g. using a similar mechanism to "context->update_shader_resource_bindings", but this approach doesn't appeal to me.
One approach would be to split things up when invalidating the states. In the longer term, I do think we want to think about reworking how states are applied, but it'll likely be a bit involved.
2017-02-02 19:10 GMT+01:00 Henri Verbeet hverbeet@gmail.com:
On 2 February 2017 at 15:41, Józef Kucia joseph.kucia@gmail.com wrote:
I think it would complicate things a lot. Applying all states for dispatches would mean that we have to acquire, prepare and load all resources potentially used by draws, e.g. render targets and constant buffers. This would be required mainly because of the way how context_state_fb(), state_cb() and similar are implemented. It could be possibly mitigated to some extent by refactoring how the states are applied, e.g. using a similar mechanism to "context->update_shader_resource_bindings", but this approach doesn't appeal to me.
One approach would be to split things up when invalidating the states. In the longer term, I do think we want to think about reworking how states are applied, but it'll likely be a bit involved.
Heh, I was writing a reply mostly along the same lines. It doesn't add much at this point but here it is anyway...
- for (i = 0, dirty_idx = 0; i < context->numDirtyEntries; ++i) { DWORD rep = context->dirtyArray[i];
DWORD idx = rep / (sizeof(*context->isStateDirty) * CHAR_BIT);
BYTE shift = rep & ((sizeof(*context->isStateDirty) * CHAR_BIT) - 1);
context->isStateDirty[idx] &= ~(1u << shift);
state_table[rep].apply(context, state, rep);
if (!is_compute_state(rep))
apply_state(context, state_table, state, rep);
else
context->dirtyArray[dirty_idx++] = rep;
...
Applying just necessary states may also avoid unnecessary work, but this shouldn't matter much in practice.
It shouldn't but it might, I'd not be surprised to see applications change a bunch of draw states before a compute dispatch and change them again before the eventual draw. We've seen similar things before.
Doing it like this does look a bit ugly though. What about changing the dirty flags side instead, so that compute shader states don't end up in isStateDirty / dirtyArray but in their own data structures?