On Fri Jul 21 14:11:14 2023 +0000, Giovanni Mascellani wrote:
I see, I modified get_register_type_info() and
get_register_type_info_from_handler() to have static lookup arrays, initialized on the first call. If you think this solutions is fine, I can send a similar solution for get_opcode_info() as another MR. Unfortunately that's slightly wrong: without appropriate synchronization two different threads might try to do that initialization step concurrently of step on each other. I believe it's very improbable that anything bad happens, honestly, but it's the kind of thing I'd try to avoid. C doesn't offer, AFAICT, a fully satisfying way to do what you want to do in a DRY and self-contained way (which is part of why I didn't raise this point in my review). The ways forward I can see are these:
- Do the same thing, but use proper locking and/or atomic operations to
ensure concurrent threads do the right thing. It can be tricky and it forces us to deal with threading API in a library that probably shouldn't. I'd avoid that.
- Do the same thing, but store the lookup tables in the parser context
instead of in a static variable. There's a bit of CPU cycle wasting, but it's a small thing, and concurrent threads work on independent tables and do not step on each other.
- Don't do anything fancy, just write both lookup tables in the code.
It's not DRY, so in principle I hate that, but given that the two tables are still rather small and C really tries to get in your way here, it still feels one of the best alternatives.
- Turn to the dark side and embrace the C way: the preprocessor! Create
a file full of `X(VKD3D_SM4_RT_TEMP, VKD3DSPR_TEMP)` lines and include each time you need it defining `X()` appropriately each time. Once you're done with that you start doing things like `#define E } else {` and you're halfway submitting vkd3d-shader to the IOCCC. It's a great career path, though, I think you can get consultancy work from Satan himself. Notice that with the third and fourth alternatives you can also write a `switch` statement instead of a lookup table, in theory giving your compiler even more optimization opportunities. Not sure the compiler is going to make good use of them, though. I'd go with the second or third alternative.
I see your point. AFAIK hlsl_sm4_write() always gets called by a single thread, so I guess that the problem may happen when calling the disassembler. I have to understand that code path better.
Anyways, notice that the initialization part of the functions just read from `register_type_table`, which is const, and write values to `lookup` without reading it, and the values are the same regardless of which thread is writing them. So, even if many threads were to pass the `!computed_lookup` check at the same time, they wouldn't be stepping on each other.
I dare to say that this is one of the weird cases where parallelization doesn't require synchronization structs. Now, the question is if it is possible that the compiler does some optimization that breaks this rather special situation.
I will explore the second and third alternatives though.