This is an archive of the discontinued LLVM Phabricator instance.


Differential D36244

[LoopVectorize] Fix assertion failure in Fcmp vectorization
ClosedPublic

Authored by anna on Aug 2 2017, 2:45 PM.


Details

Reviewers

Ayal
mssimpso
mkuper
gilr

Commits

rG9b6e12f3dce3: [LoopVectorize] Fix assertion failure in Fcmp vectorization
rL310389: [LoopVectorize] Fix assertion failure in Fcmp vectorization

Summary

When vectorizing fcmps, we can trip an incorrect-cast assertion when setting the
FastMathFlags after generating the vectorized FCmp.
This happens when the FCmp can be folded directly to true or false, because the
builder then returns a constant rather than an FCmpInst. The fix
here is to set the FastMathFlags on the builder, under a FastMathFlagGuard, *before* creating
the FCmp instruction. This is what other optimizations such as
InstCombine do.
Added a test case that trips the cast assertion without this patch.
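
The failure mode and the fix can be modeled outside LLVM. The following is a minimal, self-contained C++ sketch, not LLVM code: `Value`, `ToyBuilder`, `ToyFlagGuard`, `castToFCmpInst` and `widenFCmp` are hypothetical stand-ins (loosely for `llvm::Value`, `IRBuilder<>`, `IRBuilder<>::FastMathFlagGuard`, `cast<FCmpInst>` and the FCmp case in LoopVectorize.cpp), and a single `bool` stands in for the fast-math flags. It assumes only what the summary states: the builder may fold the compare into a constant, and flags set on the builder beforehand are applied only when an instruction is actually created.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Toy stand-in for an IR value; NOT the real llvm::Value.
struct Value {
  enum Kind { ConstantKind, FCmpInstKind } kind;
  bool fast = false; // stand-in for fast-math flags, meaningful on instructions only
  explicit Value(Kind k) : kind(k) {}
};

// Mimics a checked downcast: asserts when V is not an fcmp instruction.
// This is the step the pre-patch code performed on a possibly folded result.
inline Value *castToFCmpInst(Value *V) {
  assert(V->kind == Value::FCmpInstKind && "argument of incompatible type");
  return V;
}

// Toy builder that folds constant operands, as assumed in the lead-in.
class ToyBuilder {
  std::vector<std::unique_ptr<Value>> owned; // keeps created values alive
  bool fastFlags = false;                    // flags applied to new instructions

public:
  bool getFastMathFlags() const { return fastFlags; }
  void setFastMathFlags(bool f) { fastFlags = f; }

  // Returns a constant when the operands are constant; otherwise returns an
  // instruction that picks up the builder's current flags.
  Value *createFCmp(bool operandsAreConstant) {
    Value::Kind k = operandsAreConstant ? Value::ConstantKind : Value::FCmpInstKind;
    owned.emplace_back(new Value(k));
    Value *v = owned.back().get();
    if (v->kind == Value::FCmpInstKind)
      v->fast = fastFlags;
    return v;
  }
};

// RAII guard: remembers the builder's flags and restores them on scope exit.
class ToyFlagGuard {
  ToyBuilder &builder;
  bool saved;

public:
  explicit ToyFlagGuard(ToyBuilder &b) : builder(b), saved(b.getFastMathFlags()) {}
  ~ToyFlagGuard() { builder.setFastMathFlags(saved); }
  ToyFlagGuard(const ToyFlagGuard &) = delete;
  ToyFlagGuard &operator=(const ToyFlagGuard &) = delete;
};

// Patched pattern: set the flags first, then create the compare. The result
// is never downcast, so a folded constant is handled without an assertion.
Value *widenFCmp(ToyBuilder &b, bool scalarIsFast, bool operandsAreConstant) {
  ToyFlagGuard guard(b);
  b.setFastMathFlags(scalarIsFast);
  return b.createFCmp(operandsAreConstant);
}
```

In this model, calling `widenFCmp` with constant operands returns a constant that is never passed to `castToFCmpInst`, while non-constant operands yield an instruction carrying the flag; in both cases the guard restores the builder's previous flags on return.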

Diff Detail

Build Status

Buildable 8936
Build 8936: arc lint + arc unit

Event Timeline

anna created this revision. Aug 2 2017, 2:45 PM

Herald added a subscriber: mzolotukhin. Aug 2 2017, 2:45 PM

fhahn added a subscriber: fhahn. Aug 2 2017, 3:27 PM

Ayal added inline comments. Aug 2 2017, 11:37 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
4867	An alternative may be to do `if (isa<FCmpInst>(C)) cast<FCmpInst>(C)->copyFastMathFlags(Cmp);` or `if (auto *FCmp = dyn_cast<FCmpInst>(C)) FCmp->copyFastMathFlags(Cmp);`. I wonder whether the last, optional argument to `CreateFCmp()` may be used instead? Is this issue relevant elsewhere, e.g., in the SLP vectorizer? (Maybe that last argument shouldn't be optional ;-)

anna added inline comments. Aug 4 2017, 6:13 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
4867	The alternative may also be used. `FastMathFlagGuard` seems to be the accepted practice though, in all of the InstCombine and LoopUtils transformations where the created instruction may not be a compare. `SLPVectorizer` creates the FCmp and uses `propagateIRFlags` from LoopUtils for this, so SLPVectorizer does not have this bug. However, that is a full-blown method that copies IR flags by checking the type of instruction, and we don't need it here in LoopVectorizer.

mssimpso added inline comments.

Hi Anna,

Thanks for taking a look at this. I had a few comments about the test. Otherwise, it makes sense to me.

test/Transforms/LoopVectorize/fcmp-vectorize.ll
1	Since this is a target-independent test, it's probably a good idea to specify the vector width and interleave factor: `-force-vector-width=4 -force-vector-interleave=1`.
3	The test doesn't require assertions, so you can remove this line.
14–16	No need to check the original loop in this case.

Ayal added inline comments. Aug 4 2017, 2:27 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
4867	This alternative also appears in InstCombine, in the form of `auto *FPInst = dyn_cast<Instruction>(RI); if (FPInst && isa<FPMathOperator>(FPInst)) FPInst->copyFastMathFlags(BO);`, and is also used by `SLPVectorizer` eventually via `copyIRFlags()`/`andIRFlags()`. If you prefer the `FastMathFlagGuard` way, I suggest also attaching a preceding comment, e.g. `// Propagate fast math flags.` BTW, I'm curious how such a redundant comparison reached the vectorizer. Such things are better cleaned up earlier, to let the vectorizer and its cost model concentrate on 'real' instructions.
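
For comparison, the checked-downcast alternative discussed in this thread can be sketched in the same toy style. This is a hedged, self-contained model and not LLVM code: `Node`, `ToyFolder`, `dynCastFCmp` and `widenAndCopyFlags` are hypothetical names, with `dynCastFCmp` standing in for `dyn_cast<FCmpInst>` and a single `bool` standing in for the fast-math flags. The point it illustrates is only that the flags are copied after creation, and only when the result is still an instruction.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Toy stand-in for an IR value; NOT the real llvm::Value.
struct Node {
  enum Kind { Constant, FCmpInst } kind;
  bool fast; // stand-in for fast-math flags
  Node(Kind k, bool f) : kind(k), fast(f) {}
};

// Mimics a checked dynamic downcast: returns nullptr instead of asserting
// when the node is not an fcmp instruction.
inline Node *dynCastFCmp(Node *n) {
  return n->kind == Node::FCmpInst ? n : nullptr;
}

// Toy creator that folds constant operands into a constant node.
class ToyFolder {
  std::vector<std::unique_ptr<Node>> owned; // keeps created nodes alive

public:
  Node *createFCmp(bool operandsAreConstant) {
    Node::Kind k = operandsAreConstant ? Node::Constant : Node::FCmpInst;
    owned.emplace_back(new Node(k, false));
    return owned.back().get();
  }
};

// Alternative pattern: create the compare first, then copy the scalar
// compare's flags only if the result is still an fcmp instruction.
Node *widenAndCopyFlags(ToyFolder &f, const Node &scalarCmp,
                        bool operandsAreConstant) {
  assert(scalarCmp.kind == Node::FCmpInst && "expected a scalar fcmp");
  Node *c = f.createFCmp(operandsAreConstant);
  if (Node *fcmp = dynCastFCmp(c))
    fcmp->fast = scalarCmp.fast;
  return c;
}
```

Both shapes avoid the unconditional downcast; this one needs an explicit type check at every call site that copies flags, whereas setting the flags on the builder up front needs none.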

Addressed review comments.

lib/Transforms/Vectorize/LoopVectorize.cpp
4867	I've added the comment. Would be good to find out how the redundant comparison came in, will take a look. I think that's an orthogonal performance problem, because there can always be other cases where we have such redundant comparisons flowing through the vectorizer. For example, it could also be a missed optimization or an in-tree/out-of-tree pass ordering issue.

Harbormaster completed remote builds in B9132: Diff 110199. Aug 8 2017, 7:46 AM

Looks good to me, please wait for @mssimpso to approve.

LGTM as well.

This revision is now accepted and ready to land. Aug 8 2017, 8:54 AM

Closed by commit rL310389: [LoopVectorize] Fix assertion failure in Fcmp vectorization (authored by annat). Aug 8 2017, 11:09 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/Transforms/Vectorize/LoopVectorize.cpp	3 lines
test/Transforms/LoopVectorize/fcmp-vectorize.ll	29 lines

Diff 109436

lib/Transforms/Vectorize/LoopVectorize.cpp

  case Instruction::FCmp: {
    bool FCmp = (I.getOpcode() == Instruction::FCmp);
    auto *Cmp = dyn_cast<CmpInst>(&I);
    setDebugLocFromInst(Builder, Cmp);
    for (unsigned Part = 0; Part < UF; ++Part) {
      Value *A = getOrCreateVectorValue(Cmp->getOperand(0), Part);
      Value *B = getOrCreateVectorValue(Cmp->getOperand(1), Part);
      Value *C = nullptr;
      if (FCmp) {
+       IRBuilder<>::FastMathFlagGuard FMFG(Builder);
+       Builder.setFastMathFlags(Cmp->getFastMathFlags());
        C = Builder.CreateFCmp(Cmp->getPredicate(), A, B);
-       cast<FCmpInst>(C)->copyFastMathFlags(Cmp);
      } else {
        C = Builder.CreateICmp(Cmp->getPredicate(), A, B);
      }
      VectorLoopValueMap.setVectorValue(&I, Part, C);
      addMetadata(C, &I);
    }

    break;

test/Transforms/LoopVectorize/fcmp-vectorize.ll

This file was added.

; RUN: opt -loop-vectorize -S %s | FileCheck %s

; REQUIRES: asserts
; Avoid crashing while trying to vectorize fcmp that can be folded to vector of
; i1 true.
define void @test1() {
; CHECK-LABEL: test1(
; CHECK-LABEL: vector.body:
; CHECK-NEXT: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
; CHECK-NEXT: %broadcast.splatinsert = insertelement <4 x i32> undef, i32 %index, i32 0
; CHECK: %induction = add <4 x i32> %broadcast.splat, <i32 0, i32 1, i32 2, i32 3>
; CHECK: %index.next = add i32 %index, 4

; CHECK-LABEL: loop:
; CHECK-NEXT: %iv = phi i32 [ %bc.resume.val, %scalar.ph ], [ %ivnext, %loop ]
; CHECK-NEXT: %fcmp = fcmp uno float 0.000000e+00, 0.000000e+00
entry:
  br label %loop

loop: ; preds = %loop, %entry
  %iv = phi i32 [ 0, %entry ], [ %ivnext, %loop ]
  %fcmp = fcmp uno float 0.000000e+00, 0.000000e+00
  %ivnext = add nsw i32 %iv, 1
  %cnd = icmp sgt i32 %iv, 142
  br i1 %cnd, label %exit, label %loop

exit: ; preds = %loop
  ret void
}