Currently, X86 floor and ceil intrinsics for vectors are implemented as target-specific intrinsics that use the generic rounding instruction of the corresponding vector processing feature (ROUND* or VRNDSCALE*). This patch replaces those specific cases with calls to target-independent @llvm.floor.* and @llvm.ceil.* intrinsics. This doesn't affect the resulting machine code, as those intrinsics are lowered to the same instructions, but exposes these specific rounding cases to generic optimizations.
This patch also has an LLVM part, D45203. An alternative InstCombine-based implementation is proposed in D48067.
I'd prefer CGBuiltin to detect the specific immediates on the rndscale value. Primarily because we should be able to optimize _mm512_roundscale_pd when the ceil/floor immediate is used.