This is an archive of the discontinued LLVM Phabricator instance.

[SLP]Fix PR43799: Crash on different sizes of GEP indices.
Closed · Public

Authored by ABataev on Oct 30 2019, 10:09 AM.

Details

Reviewers

RKSimon
spatel

Commits

rGb80c41cd3c09: [SLP]Fix PR43799: Crash on different sizes of GEP indices.

Summary

If the GEP instructions are going to be vectorized, the indices in those
GEP instructions must be of the same type. Otherwise, the compiler may
crash when trying to build the vector constant.
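
As a rough illustration of the precondition described in the summary, the sketch below models each constant GEP index as a bit width plus a value and checks whether a bundle is uniform. `IndexConst` and `haveSameWidth` are hypothetical names made up for this example, not SLPVectorizer APIs; the real pass compares `Type *` pointers rather than raw widths.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a constant GEP index (not an LLVM type).
struct IndexConst {
  unsigned Bits;  // integer width, e.g. 32 for i32 or 64 for i64
  int64_t Value;  // the constant, held sign-extended in 64 bits
};

// Returns true when every index in the bundle has the same bit width,
// i.e. the constants could be placed in one vector without any casts.
bool haveSameWidth(const std::vector<IndexConst> &Bundle) {
  for (const IndexConst &C : Bundle)
    if (C.Bits != Bundle.front().Bits)
      return false;
  return true;
}
```

For the two GEPs in the PR43799 test (`i32 1` and `i64 1`), `haveSameWidth` would return false, which is the mixed-width situation the patch has to handle before a vector constant is built.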

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ABataev created this revision. Oct 30 2019, 10:09 AM

Herald added a project: Restricted Project. Oct 30 2019, 10:09 AM

Herald added a subscriber: hiraditya.

Harbormaster completed remote builds in B40286: Diff 227134. Oct 30 2019, 10:12 AM

spatel added inline comments. Oct 30 2019, 11:04 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4170–4171	I think it's already a fuzzer corner case that we have different types in the bug report example, but let's take that 1 step further: what if the index type in IR is larger than getIntPtrType()? We might illegally truncate a large constant value. Safer to just give up completely if we find mismatched GEP index types?
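
To make the truncation concern concrete, here is a small standalone sketch (plain C++, no LLVM APIs) that checks whether a signed value survives being narrowed to a given bit width. `fitsInSigned` is a hypothetical helper, and the 32-bit index width used in the note below is an assumption chosen only for illustration.

```cpp
#include <cstdint>

// Returns true if Value is representable as a signed integer of the given
// width. Bits is assumed to be in the range [1, 64].
bool fitsInSigned(int64_t Value, unsigned Bits) {
  if (Bits >= 64)
    return true; // every int64_t fits in 64 bits
  const int64_t Min = -(int64_t{1} << (Bits - 1));
  const int64_t Max = (int64_t{1} << (Bits - 1)) - 1;
  return Value >= Min && Value <= Max;
}
```

With a 32-bit index type, `fitsInSigned(1, 32)` holds, while a constant such as `int64_t{1} << 40` does not fit, so narrowing it would change the address the GEP computes.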

ABataev marked an inline comment as done. Oct 30 2019, 11:17 AM

ABataev added inline comments.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4170–4171	Maybe just give up if at least one GEP index type is larger than intptr?

ABataev marked an inline comment as not done. Oct 30 2019, 11:20 AM

spatel added inline comments. Oct 30 2019, 12:51 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4170–4171	That should work, but it seems like we are adding code complexity for no real-world reason. Unless I'm misunderstanding - do we have real code / tests where the GEP indexes are different sizes?

ABataev added inline comments. Oct 30 2019, 12:58 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4170–4171	It does not add a lot of complexity here, just a few additional lines of code.

xbolva00 added a subscriber: xbolva00. Oct 30 2019, 1:02 PM

xbolva00 added inline comments.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4170–4171	@spatel: A recent memcmp-related patch uses CreateGEP_64 (don't know the exact name atm; the index is i64). Could it be affected by your proposed stronger bail-out? We should be careful not to pessimize the code generated by our own passes.

Added a check for GEP indexes greater than pointer size.

Harbormaster completed remote builds in B40299: Diff 227167. Oct 30 2019, 1:16 PM

lebedev.ri added a subscriber: lebedev.ri. Oct 30 2019, 1:20 PM

lebedev.ri added inline comments.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4170–4171	https://llvm.org/docs/LangRef.html#langref-datalayout
> Data Layout
> p[n]:<size>:<abi>:<pref>:<idx>
> This specifies the size of a pointer and its <abi> and <pref>erred alignments for address space n. The fourth parameter <idx> is a size of index that used for address calculation. If not specified, the default index size is equal to the pointer size. All sizes are in bits. The address space, n, is optional, and if not specified, denotes the default address space 0. The value of n must be in the range [1,2^23).

spatel added inline comments. Oct 31 2019, 8:26 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4170–4171	For reference, I think the patch that @xbolva00 mentioned is: D69507 (...unfortunately, expandmemcmp is still a codegen pass - D60318 / rL371507 - so that particular case can't be in play yet for the question/concern raised in this patch) Given the LangRef specification, I'm surprised that we're encouraging hard-coding the index size via the Builder API rather than deriving it from the DataLayout. So it seems like we've made a mess, and now we want to deal with that mess here. I still don't see any real-world reason for this patch to try so hard, but if we insist on it, then we should be using: DL->getIndexTypeSizeInBits() ?

Address comments.
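
The index-width part of the bail-out condition in the final diff can be modeled outside LLVM roughly as follows. This is a sketch under assumptions: `shouldBailOut` is a hypothetical function, index types are reduced to their bit widths, and the separate non-constant-index test is omitted.

```cpp
#include <vector>

// Mirrors the shape of the width check: give up on the bundle when some index
// width differs from the first index's width AND is wider than the index size
// that the data layout reports for the pointer's address space.
bool shouldBailOut(const std::vector<unsigned> &IndexBits,
                   unsigned IndexSizeInBits) {
  if (IndexBits.empty())
    return false;
  const unsigned FirstBits = IndexBits.front();
  for (unsigned Bits : IndexBits)
    if (Bits != FirstBits && Bits > IndexSizeInBits)
      return true;
  return false;
}
```

For example, with a 64-bit index size a bundle of widths {32, 64} is kept (the constants can later be cast to the index type), while {32, 128} is rejected because the 128-bit index could not be narrowed safely.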

Harbormaster completed remote builds in B40367: Diff 227337. Oct 31 2019, 1:54 PM

LGTM

This revision is now accepted and ready to land. Nov 1 2019, 1:10 PM

Closed by commit rGb80c41cd3c09: [SLP]Fix PR43799: Crash on different sizes of GEP indices. (authored by ABataev). Nov 4 2019, 7:48 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp	24 lines
llvm/test/Transforms/SLPVectorizer/X86/crash_gep.ll	23 lines

Diff 227709

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

@@ case Instruction::GetElementPtr: {
           BS.cancelScheduling(VL, VL0);
           newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,
                        ReuseShuffleIndicies);
           return;
         }
       }

       // We don't combine GEPs with non-constant indexes.
+      Type *Ty1 = VL0->getOperand(1)->getType();
       for (Value *V : VL) {
         auto Op = cast<Instruction>(V)->getOperand(1);
-        if (!isa<ConstantInt>(Op)) {
+        if (!isa<ConstantInt>(Op) ||
+            (Op->getType() != Ty1 &&
+             Op->getType()->getScalarSizeInBits() >
+                 DL->getIndexSizeInBits(
+                     V->getType()->getPointerAddressSpace()))) {
           LLVM_DEBUG(dbgs()
                      << "SLP: not-vectorizable GEP (non-constant indexes).\n");
           BS.cancelScheduling(VL, VL0);
           newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,
                        ReuseShuffleIndicies);
           return;
         }
       }
@@ switch (ShuffleOrOp) {
     case Instruction::GetElementPtr: {
       setInsertPointAfterBundle(E);

       Value *Op0 = vectorizeTree(E->getOperand(0));

       std::vector<Value *> OpVecs;
       for (int j = 1, e = cast<GetElementPtrInst>(VL0)->getNumOperands(); j < e;
            ++j) {
-        Value *OpVec = vectorizeTree(E->getOperand(j));
+        ValueList &VL = E->getOperand(j);
+        // Need to cast all elements to the same type before vectorization to
+        // avoid crash.
+        Type *VL0Ty = VL0->getOperand(j)->getType();
+        Type *Ty = llvm::all_of(
+                       VL, [VL0Ty](Value *V) { return VL0Ty == V->getType(); })
+                       ? VL0Ty
+                       : DL->getIndexType(cast<GetElementPtrInst>(VL0)
+                                              ->getPointerOperandType()
+                                              ->getScalarType());
+        for (Value *&V : VL) {
+          auto *CI = cast<ConstantInt>(V);
+          V = ConstantExpr::getIntegerCast(CI, Ty,
+                                           CI->getValue().isSignBitSet());
+        }
+        Value *OpVec = vectorizeTree(VL);
         OpVecs.push_back(OpVec);
       }

       Value *V = Builder.CreateGEP(
           cast<GetElementPtrInst>(VL0)->getSourceElementType(), Op0, OpVecs);
       if (Instruction *I = dyn_cast<Instruction>(V))
         V = propagateMetadata(I, E->Scalars);

llvm/test/Transforms/SLPVectorizer/X86/crash_gep.ll

@@ entry:
   %add.ptr = getelementptr inbounds i64, i64* %0, i64 1
   %1 = ptrtoint i64* %add.ptr to i64
   %arrayidx = getelementptr inbounds i64, i64* %0, i64 2
   store i64 %1, i64* %arrayidx, align 8
   %2 = ptrtoint i64* %arrayidx to i64
   store i64 %2, i64* %add.ptr, align 8
   ret i32 undef
 }
+
+define void @PR43799() {
+; CHECK-LABEL: @PR43799(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br label [[BODY:%.*]]
+; CHECK:       body:
+; CHECK-NEXT:    br label [[BODY]]
+; CHECK:       epilog:
+; CHECK-NEXT:    ret void
+;
+entry:
+  br label %body
+
+body:
+  %p.1.i19 = phi i8* [ undef, %entry ], [ %incdec.ptr.i.7, %body ]
+  %lsr.iv17 = phi i8* [ undef, %entry ], [ %scevgep113.7, %body ]
+  %incdec.ptr.i.7 = getelementptr inbounds i8, i8* undef, i32 1
+  %scevgep113.7 = getelementptr i8, i8* undef, i64 1
+  br label %body
+
+epilog:
+  ret void
+}