On Tue, 23 Nov 2021 at 15:28, Jan Sikorski jsikorski@codeweavers.com wrote:
-static void wined3d_cs_exec_query_issue(struct wined3d_cs *cs, const void *data) +void wined3d_cs_run_priority_callback(struct wined3d_cs *cs, void (*callback)(void *object), void *object) {
- const struct wined3d_cs_query_issue *op = data;
- struct wined3d_query *query = op->query;
- BOOL poll;
- struct wined3d_cs_callback *op;
- poll = query->query_ops->query_issue(query, op->flags);
- op = wined3d_device_context_require_space(&cs->c, sizeof(*op), WINED3D_CS_QUEUE_MAP);
- op->opcode = WINED3D_CS_OP_CALLBACK;
- op->callback = callback;
- op->object = object;
- if (!cs->thread)
return;
- wined3d_device_context_submit(&cs->c, WINED3D_CS_QUEUE_MAP);
- wined3d_cs_finish(cs, WINED3D_CS_QUEUE_MAP);
+}
The wined3d_cs_finish() call is effectively going to make polling queries synchronous, at least as far as WINED3D_CS_QUEUE_MAP is concerned. That doesn't seem desirable.
If we can’t call directly into OpenGL on the application thread reasonably, I’m not sure how much better can this get. Just removing the wined3d_cs_finish() and getting the result asynchronously may actually make it worse, if a game knows the queries should be done by the time it is called, it will probably just busy wait for it by calling GetData() repeatedly. That said, I don’t think it is so bad, because for occlusion queries, which are the bulk I guess, we probably use the buffer method; and other users of WINED3D_CS_QUEUE_MAP are synchronous too, so it shouldn't clog unless there’s more threads involved. It ran well with the couple games I tested, so there’s that too, but supposedly we could hit a pessimistic case where it’s worse.. Do you have a better way in mind?
I guess what it comes down to is whether there's any reason for the wined3d_cs_run_priority_callback() scheme to exist. In the case of Vulkan queries, those would be polled from application threads after this series of patches, so the CS poll list would just be empty, and as far as I'm aware it doesn't add much overhead in that case. As far as I can tell, the wined3d_cs_run_priority_callback() scheme is not a requirement for polling Vulkan queries from application threads. Then, for the GL backend, when we have ARB_query_buffer_object, the OpenGL occlusion query scheme could be extended to other query types if needed. So that mostly leaves OpenGL without ARB_query_buffer_object where the CS poll list would need to be used. If the wined3d_cs_run_priority_callback() scheme is more efficient in those cases, or perhaps equally efficient but less complicated, etc., that's would of course be worth it. On the face of it though, that doesn't appear to be the case? (And the background here is that the CS poll list was introduced as an optimisation over a scheme not entirely unlike the one in this patch, although before the introduction of WINED3D_CS_QUEUE_MAP and ARB_query_buffer_object.)