Would a SELECT not be more friendly to optimisations like known-bits analysis, or does that cause too much code to be emitted in the general case?
Not sure I understand you correctly. If you mean analyzing the 3 bits of the rounding mode bit-by-bit, I think it does not make sense here: it is unlikely that only some of the bits are known. The rounding mode is specified as a constant, obtained as the result of FLT_ROUNDS, or passed as a function argument; it is usually not produced by calculations. And yes, in the general case too much code is generated — at least 8 operators — while in this implementation five instructions are enough.
Indeed, the general case can be optimized for constant arguments.
If this gets used in a function that does floating point, what prevents the machine schedulers from reordering the FP instructions with this? Don't we need to model FRM as a register and make the FP instructions implicitly use it? You'd also need to be using the constrained FP intrinsics which aren't implemented for RISCV yet.
I changed the check patterns to those generated by update_llc_test_checks.py, as the test for FLT_ROUNDS_ already uses it.