Currently, X86 floor and ceil intrinsics for vectors are implemented as target-specific intrinsics that use the generic rounding instruction of the corresponding vector processing feature (ROUND* or VRNDSCALE*). This patch replaces those specific cases with calls to target-independent @llvm.floor.* and @llvm.ceil.* intrinsics. This doesn't affect the resulting machine code, as those intrinsics are lowered to the same instructions, but exposes these specific rounding cases to generic optimizations.
|582 ↗||(On Diff #150911)|
Why int and getSExtValue? Feels like it should be unsigned and getZExtValue.
|586 ↗||(On Diff #150911)|
Not sure if we should assume the rounding mode or SAE is a constant int. The clang frontend guarantees it, but handcrafted IR tests could break it.
|619 ↗||(On Diff #150911)|
Why would MaskTy vary here? It's fixed by the intrinsic isn't it?
|621 ↗||(On Diff #150911)|
There's an overload of CreateAnd that accepts a uint64_t for RHS, so you can probably just pass 1 here without creating the ConstantInt yourself.
|652 ↗||(On Diff #150911)|
For less than 8, you should cast i8 to v8i1 first then extract the subvector. We don't want i4 and i2 types floating around.
|2575 ↗||(On Diff #150911)|
Why are we calling simplifyX86Round nested under SimplifyDemandedVectorElts? Simplifying should be completely orthogonal. If a simplification happened and the round intrinsic still exists, InstCombine will revisit here and SimplifyDemandedVectorElts will return nullptr. No need to try to handle every case in one shot.
I think where we ultimately want to end up is to remove the masking from the packed intrinsics and replace the scalar intrinsics with versions that use f32/f64 as their types. The IR would then look similar to where we're trying to end up for things like sqrt. But instead of a target independent intrinsic we would have a target specific intrinsic. All the the masking and insert/extract would be completely separate from the operation itself. That would greatly simplify the InstCombine code here because you would just need to trade out the target specific intrinsic for the floor/ceil intrinsic without having to worry about anything else. Thoughts?
Changes made per comments. Note that zext IR instructions have been fully excluded from all patterns, which will require altering vec_floor.ll tests in D45203 if this revision is accepted.
|586 ↗||(On Diff #150911)|
Rounding and SAE are immediate. Not using constants here leads to isel failure. Not sure this code should accept such explicitly incorrect cases.
Yes an non-constant value will fail isel and print a readable error message. But if you assume constant in instcombine by just using “cast” and it’s not we’ll get a segmentation fault in release builds before we get to isel. This is a much worse experience for users. So I suggest just ignoring any intrinsics with non-constant here and let isel throw an error.