Currently, the X86 floor and ceil intrinsics for vectors are implemented as target-specific intrinsics that use the generic rounding instruction of the corresponding vector extension (ROUND* or VRNDSCALE*). This patch replaces those cases with calls to the target-independent @llvm.floor.* and @llvm.ceil.* intrinsics. The resulting machine code does not change, because those intrinsics are lowered to the same instructions, but the floor and ceil cases become visible to generic optimizations.
Line 951 (on Diff #140745):
I'd prefer CGBuiltin to detect the specific immediates on the rndscale value. Primarily because we should be able to optimize _mm512_roundscale_pd when the ceil/floor immediate is used.
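To make the suggestion concrete, here is a minimal sketch of what "detecting the specific immediates" could look like. It is not code from the patch: `RoundKind` and `classifyRoundImm` are hypothetical names, and the sketch assumes the usual ROUND*/VRNDSCALE* immediate layout (bits [1:0] select the rounding mode, bit 2 defers to MXCSR.RC, bit 3 suppresses the precision exception, and bits [7:4] hold the VRNDSCALE scale). Accepting both values of bit 3 is also an assumption; whether that is acceptable for the real transform would need to be decided in review.

```cpp
#include <cstdint>

// Hypothetical helper for illustration only; not part of Clang or LLVM.
enum class RoundKind { Floor, Ceil, Other };

// Classify a ROUND*/VRNDSCALE* immediate as plain floor, plain ceil, or
// anything else (which would stay on the target-specific path).
constexpr RoundKind classifyRoundImm(std::uint8_t imm) {
  const unsigned mode = imm & 0x3;         // bits [1:0]: rounding mode
  const bool useMXCSR = (imm & 0x4) != 0;  // bit 2: take mode from MXCSR.RC
  const unsigned scale = imm >> 4;         // bits [7:4]: VRNDSCALE scale

  // A non-zero scale or a dynamic rounding mode is not a plain floor/ceil.
  if (useMXCSR || scale != 0)
    return RoundKind::Other;

  // Bit 3 (suppress precision exception) is ignored here by assumption.
  if (mode == 0x1)
    return RoundKind::Floor;  // round toward negative infinity
  if (mode == 0x2)
    return RoundKind::Ceil;   // round toward positive infinity
  return RoundKind::Other;    // nearest or truncate
}
```

With a check of this shape, a call such as _mm512_roundscale_pd with a floor or ceil immediate could be routed to the same generic intrinsic as the dedicated floor/ceil builtins, while every other immediate keeps the existing lowering.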
But it’s not really consistent because the mask is being removed early for the packed intrinsics, but late for the scalar intrinsics. Doesn’t it also introduce extra code for fast isel?
It's been a while since I looked at these. Last memory I have is for the conversion from x86 masked ops to the generic LLVM intrinsics, and we did that in InstCombineCalls. I don't know if there was any sound reasoning for that though. If it makes no functional difference, I'd continue with that structure just so we don't become scattered in the transform.