When vectorizing fcmps we can trip on incorrect cast assertion when setting the
FastMathFlags after generating the vectorized FCmp.
This can happen if the FCmp can be folded to true or false directly. The fix
here is to set the FastMathFlag using the FastMathFlagBuilder *before* creating
the FCmp Instruction. This is what's done by other optimizations such as
InstCombine.
Added a test case which trips on cast assertion without this patch.
Details
Diff Detail
- Build Status
Buildable 8936 Build 8936: arc lint + arc unit
Event Timeline
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
4867 | An alternative may be to do if (isa<FCmpInst>(C)) cast<FCmpInst>(C)->copyFastMathFlags(Cmp); or if (auto *FCmp = dyn_cast<FCmpInst>(C)) FCmp->copyFastMathFlags(Cmp); Wonder if the last, optional argument to CreateFCmp() may be used instead? Is this issue relevant elsewhere, e.g., SLP vectorizer? (Maybe that last argument shouldn't be optional ;-) |
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
4867 | the alternative way may also be used. The FastMathFlagGuard seems to be the accepted practice though - in all of instcombine and LoopUtils transformations where the created instruction may not be a compare. SLPVectorizer creates FCmp and uses propagateIRFlags from LoopUtils for this. So, SLPVectorizer does not have this bug. However, it's a full blown method that copies IR flags by checking the type of instruction, and we don't need it here in LoopVectorizer. |
Hi Anna,
Thanks for taking a look at this. I had a few comments about the test. Otherwise, it makes sense to me.
test/Transforms/LoopVectorize/fcmp-vectorize.ll | ||
---|---|---|
1 | Since this is a target-independent test, it's probably a good idea to specify the vector width and interleave factor: -force-vector-width=4 -force-vector-interleave=1. | |
3 | The test doesn't require assertions, so you can remove this line. | |
14–16 | No need to check the original loop in this case. |
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
4867 | this alternative way also appears in instcombine, in the form of: auto *FPInst = dyn_cast<Instruction>(RI); if (FPInst && isa<FPMathOperator>(FPInst)) FPInst->copyFastMathFlags(BO); and is also used by SLPVectorizer eventually via copyIRFlags()/andIRFlags(). If you prefer the FastMathFlagGuard way, suggest to also attach a preceding comment, e.g.: // Propagate fast math flags. BTW, curious how such a redundant comparison reached the vectorizer? Such things are better cleaned up earlier, to let the vectorizer and its cost model concentrate on 'real' instructions. |
Addressed review comments.
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
4867 | I've added the comment. Would be good to find out how the redundant comparison came in, will take a look. I think that's an orthogonal performance problem, because there can always be other cases where we have such redundant comparisons flowing through the vectorizer. For example, it could also be a missed optimization or an in-tree/out-of-tree pass ordering issue. |
An alternative may be to do
or
Wonder if the last, optional argument to CreateFCmp() may be used instead?
Is this issue relevant elsewhere, e.g., SLP vectorizer? (Maybe that last argument shouldn't be optional ;-)