This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Replacing X86-specific rounding intrinsics with generic floor-ceil
ClosedPublic

Authored by mike.dvoretsky on Jun 12 2018, 2:59 AM.

Details

Summary

Currently, X86 floor and ceil intrinsics for vectors are implemented as target-specific intrinsics that use the generic rounding instruction of the corresponding vector processing feature (ROUND* or VRNDSCALE*). This patch replaces those specific cases with calls to target-independent @llvm.floor.* and @llvm.ceil.* intrinsics. This doesn't affect the resulting machine code, as those intrinsics are lowered to the same instructions, but exposes these specific rounding cases to generic optimizations.

This patch relies on D45203 to fold the resulting IR patterns and serves as an alternative to D45202.

Diff Detail

Event Timeline

mike.dvoretsky created this revision.Jun 12 2018, 2:59 AM
craig.topper added inline comments.Jun 12 2018, 10:27 AM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
582

Why int and getSExtValue? Feels like it should be unsigned and getZExtValue.

586

Not sure if we should assume the rounding mode or SAE is a constant int. The clang frontend guarantees it, but handcrafted IR tests could break it.

619

Why would MaskTy vary here? It's fixed by the intrinsic isn't it?

621

There's an overload of CreateAnd that accepts a uint64_t for RHS, so you can probably just pass 1 here without creating the ConstantInt yourself.

652

For less than 8, you should cast i8 to v8i1 first then extract the subvector. We don't want i4 and i2 types floating around.

2575

Why are we calling simplifyX86Round nested under SimplifyDemandedVectorElts? Simplifying should be completely orthogonal. If a simplification happened and the round intrinsic still exists, InstCombine will revisit here and SimplifyDemandedVectorElts will return nullptr. No need to try to handle every case in one shot.

I think where we ultimately want to end up is to remove the masking from the packed intrinsics and replace the scalar intrinsics with versions that use f32/f64 as their types. The IR would then look similar to where we're trying to end up for things like sqrt. But instead of a target independent intrinsic we would have a target specific intrinsic. All the the masking and insert/extract would be completely separate from the operation itself. That would greatly simplify the InstCombine code here because you would just need to trade out the target specific intrinsic for the floor/ceil intrinsic without having to worry about anything else. Thoughts?

mike.dvoretsky marked 5 inline comments as done.

Changes made per comments. Note that zext IR instructions have been fully excluded from all patterns, which will require altering vec_floor.ll tests in D45203 if this revision is accepted.

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
586

Rounding and SAE are immediate. Not using constants here leads to isel failure. Not sure this code should accept such explicitly incorrect cases.

lebedev.ri added inline comments.
llvm/test/Transforms/InstCombine/X86/x86-avx.ll
35

ceil, not deil

llvm/test/Transforms/InstCombine/X86/x86-sse41.ll
7

Can you regenerate this file right now in trunk (and commit), so this noise is gone?

Yes an non-constant value will fail isel and print a readable error message. But if you assume constant in instcombine by just using “cast” and it’s not we’ll get a segmentation fault in release builds before we get to isel. This is a much worse experience for users. So I suggest just ignoring any intrinsics with non-constant here and let isel throw an error.

mike.dvoretsky marked 3 inline comments as done.

Fixed the typo in the test name and added checks to make the transform stop if the rounding mode immediate and/or SAE are not constant.

This revision is now accepted and ready to land.Jun 14 2018, 12:31 PM
This revision was automatically updated to reflect the committed changes.