We've been adding increasingly complex intrinsics, and after looking at an implementation of inverse trigonometry intrinsics sent to me privately, I've decided that reviewing particularly complex intrinsics written in C is getting unwieldy. At the very least, it's difficult to double-check the math in that form.
So I decided to see how difficult it would be to define these functions in HLSL instead. The main reason we haven't done that historically is that most of these functions are not really expressible in HLSL, mostly because they take polymorphic type arguments. In this patch series I resolve that by generating type variants one at a time, when the function is invoked.
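To illustrate the idea of generating one type variant per invocation, here is a minimal sketch in C. It is not the code from this series: `struct variant_cache`, `get_variant()` and the `example_clamp01` HLSL function are hypothetical names made up for the example, and the real implementation works on the compiler's type objects rather than on type-name strings.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names throughout; none of these are vkd3d identifiers. */
#define MAX_VARIANTS 16

struct variant
{
    char type_name[16];  /* Argument type, e.g. "float3". */
    char source[256];    /* HLSL text generated for that type. */
};

struct variant_cache
{
    struct variant variants[MAX_VARIANTS];
    size_t count;
    unsigned int generated;  /* Number of variants actually generated. */
};

/* Return HLSL source for example_clamp01() specialised to type_name.
 * The text is generated on first use and reused afterwards.
 * Returns NULL if the cache is full or the type name is too long. */
static const char *get_variant(struct variant_cache *cache, const char *type_name)
{
    struct variant *v;
    size_t i;

    assert(cache && type_name);

    for (i = 0; i < cache->count; ++i)
    {
        if (!strcmp(cache->variants[i].type_name, type_name))
            return cache->variants[i].source;
    }

    if (cache->count == MAX_VARIANTS || strlen(type_name) >= sizeof(v->type_name))
        return NULL;

    v = &cache->variants[cache->count++];
    strcpy(v->type_name, type_name);
    snprintf(v->source, sizeof(v->source),
            "%s example_clamp01(%s x)\n{\n    return min(max(x, 0), 1);\n}\n",
            type_name, type_name);
    ++cache->generated;
    return v->source;
}
```

With this shape, a call site that passes a `float3` argument would request the `float3` variant, and only the types that actually appear in the shader are ever generated.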
Apart from that, the implementation is relatively simple. We reuse the same hlsl_ctx but a new flex scanner, and save and restore a couple of ctx members so that we're in global scope; then we just invoke the function with a HLSL_IR_CALL, the same way we do with user functions.
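The save-and-restore step can be sketched roughly as follows. This is only an illustration of the pattern: `struct ctx`, its members, and `parse_in_global_scope()` are hypothetical stand-ins, not the actual hlsl_ctx layout or the members this series touches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real hlsl_ctx has many more members. */
struct scope
{
    struct scope *upper;
    const char *name;
};

struct ctx
{
    struct scope *globals;    /* Outermost scope. */
    struct scope *cur_scope;  /* Scope the parser is currently in. */
    int in_function_body;     /* Hypothetical parser state flag. */
};

typedef int (*parse_fn)(struct ctx *ctx, void *arg);

/* Run parse() as if the parser were at global scope, then put the
 * caller's state back, whether or not parsing succeeded. */
static int parse_in_global_scope(struct ctx *ctx, parse_fn parse, void *arg)
{
    struct scope *saved_scope;
    int saved_flag, ret;

    assert(ctx && parse);

    saved_scope = ctx->cur_scope;
    saved_flag = ctx->in_function_body;

    ctx->cur_scope = ctx->globals;
    ctx->in_function_body = 0;

    ret = parse(ctx, arg);

    ctx->cur_scope = saved_scope;
    ctx->in_function_body = saved_flag;
    return ret;
}

/* Example callback: records which scope was current while "parsing". */
static int record_scope(struct ctx *ctx, void *arg)
{
    *(struct scope **)arg = ctx->cur_scope;
    return 0;
}
```

Restoring the members unconditionally after the callback returns means a failed parse of the intrinsic's HLSL source leaves the caller's state as it was.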
In theory, we could reuse this code for complex lowering intrinsics as well; float modulus and integer div/mod come to mind. What currently makes that awkward is that those lowering passes are done on the entry point after inlining calls, while invoking HLSL-defined functions requires another HLSL_IR_CALL. But this is a relatively simple problem to fix; we would probably either run lowering passes earlier, on all functions, or just rerun function inlining.