Currently, the X86 vector floor and ceil intrinsics are implemented as target-specific intrinsics that use the generic rounding instruction of the corresponding vector extension (ROUND* or VRNDSCALE*). This patch replaces those specific cases with calls to the target-independent @llvm.floor.* and @llvm.ceil.* intrinsics. This doesn't change the resulting machine code, because those intrinsics are lowered to the same instructions, but it does expose these specific rounding cases to generic optimizations.
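For illustration, here is a minimal C sketch of the equivalence this relies on: the SSE4.1 floor and ceil macros produce the same values as the generic rounding intrinsic with the matching rounding-mode immediate. The helper names (floor4, ceil4, round_down4) are hypothetical and not part of the patch. The sketch assumes an x86 host with SSE4.1 and a GCC or Clang compiler that accepts the target attribute.

```c
#include <immintrin.h>

/* Hypothetical helper: floor four floats with the dedicated SSE4.1 macro. */
__attribute__((target("sse4.1")))
void floor4(const float in[4], float out[4]) {
    _mm_storeu_ps(out, _mm_floor_ps(_mm_loadu_ps(in)));
}

/* Hypothetical helper: ceil four floats with the dedicated SSE4.1 macro. */
__attribute__((target("sse4.1")))
void ceil4(const float in[4], float out[4]) {
    _mm_storeu_ps(out, _mm_ceil_ps(_mm_loadu_ps(in)));
}

/* Hypothetical helper: the same floor, spelled as the generic rounding
 * intrinsic with an explicit "round toward negative infinity" immediate. */
__attribute__((target("sse4.1")))
void round_down4(const float in[4], float out[4]) {
    __m128 v = _mm_loadu_ps(in);
    _mm_storeu_ps(out,
                  _mm_round_ps(v, _MM_FROUND_TO_NEG_INF | _MM_FROUND_RAISE_EXC));
}
```

Calling floor4 and round_down4 on the same input should yield identical lanes, which is why switching the floor and ceil cases to the generic intrinsics is expected to leave code generation unchanged.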
This patch also has an LLVM part, D45203. An alternative InstCombine-based implementation is proposed in D48067.
I'm not sure we should even try to emit a mask for the legacy scalar intrinsics. Does it get removed cleanly by the middle end or the backend?