Page MenuHomePhabricator

[LoopVectorize] Fix assertion failure in Fcmp vectorization
ClosedPublic

Authored by anna on Aug 2 2017, 2:45 PM.

Details

Summary

When vectorizing fcmps we can trip on incorrect cast assertion when setting the
FastMathFlags after generating the vectorized FCmp.
This can happen if the FCmp can be folded to true or false directly. The fix
here is to set the FastMathFlag using the FastMathFlagBuilder *before* creating
the FCmp Instruction. This is what's done by other optimizations such as
InstCombine.
Added a test case which trips on cast assertion without this patch.

Diff Detail

Repository
rL LLVM

Event Timeline

anna created this revision.Aug 2 2017, 2:45 PM
fhahn added a subscriber: fhahn.Aug 2 2017, 3:27 PM
Ayal added inline comments.Aug 2 2017, 11:37 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
4867 ↗(On Diff #109436)

An alternative may be to do

if (isa<FCmpInst>(C))
  cast<FCmpInst>(C)->copyFastMathFlags(Cmp);

or

if (auto *FCmp = dyn_cast<FCmpInst>(C))
  FCmp->copyFastMathFlags(Cmp);

Wonder if the last, optional argument to CreateFCmp() may be used instead?

Is this issue relevant elsewhere, e.g., SLP vectorizer? (Maybe that last argument shouldn't be optional ;-)

anna added inline comments.Aug 4 2017, 6:13 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
4867 ↗(On Diff #109436)

the alternative way may also be used.

The FastMathFlagGuard seems to be the accepted practice though - in all of instcombine and LoopUtils transformations where the created instruction may not be a compare.

SLPVectorizer creates FCmp and uses propagateIRFlags from LoopUtils for this. So, SLPVectorizer does not have this bug. However, it's a full blown method that copies IR flags by checking the type of instruction, and we don't need it here in LoopVectorizer.

mssimpso edited edge metadata.Aug 4 2017, 10:45 AM

Hi Anna,

Thanks for taking a look at this. I had a few comments about the test. Otherwise, it makes sense to me.

test/Transforms/LoopVectorize/fcmp-vectorize.ll
1 ↗(On Diff #109436)

Since this is a target-independent test, it's probably a good idea to specify the vector width and interleave factor: -force-vector-width=4 -force-vector-interleave=1.

3 ↗(On Diff #109436)

The test doesn't require assertions, so you can remove this line.

14–16 ↗(On Diff #109436)

No need to check the original loop in this case.

Ayal added inline comments.Aug 4 2017, 2:27 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
4867 ↗(On Diff #109436)

this alternative way also appears in instcombine, in the form of:

auto *FPInst = dyn_cast<Instruction>(RI);
if (FPInst && isa<FPMathOperator>(FPInst))
  FPInst->copyFastMathFlags(BO);

and is also used by SLPVectorizer eventually via copyIRFlags()/andIRFlags().

If you prefer the FastMathFlagGuard way, suggest to also attach a preceding comment, e.g.:

// Propagate fast math flags.

BTW, curious how such a redundant comparison reached the vectorizer? Such things are better cleaned up earlier, to let the vectorizer and its cost model concentrate on 'real' instructions.

anna updated this revision to Diff 110199.Aug 8 2017, 7:44 AM
anna marked 2 inline comments as done.

Addressed review comments.

lib/Transforms/Vectorize/LoopVectorize.cpp
4867 ↗(On Diff #109436)

I've added the comment. Would be good to find out how the redundant comparison came in, will take a look.

I think that's an orthogonal performance problem, because there can always be other cases where we have such redundant comparisons flowing through the vectorizer. For example, it could also be a missed optimization or an in-tree/out-of-tree pass ordering issue.

Ayal edited edge metadata.Aug 8 2017, 8:36 AM

Looks good to me, please wait for @mssimpso to approve.

mssimpso accepted this revision.Aug 8 2017, 8:54 AM

LGTM as well.

This revision is now accepted and ready to land.Aug 8 2017, 8:54 AM
This revision was automatically updated to reflect the committed changes.