This is an archive of the discontinued LLVM Phabricator instance.


Differential D82261

[ValueTracking, BasicAA] Don't simplify instructions
Closed, Public

Authored by nikic on Jun 20 2020, 8:41 AM.


Details

Reviewers

fhahn
asbirlea
efriedma
hfinkel

Commits

rG37d3030711cc: [ValueTracking, BasicAA] Don't simplify instructions

Summary

GetUnderlyingObject() (and by required symmetry DecomposeGEPExpression()) will call SimplifyInstruction on the passed value if other checks fail. This simplification is very expensive, but has little effect in practice. This patch removes the SimplifyInstruction call, and replaces it with a check for single-argument phis (which can occur in canonical IR in LCSSA form), which is the only useful simplification case I was able to identify.

Compile-time numbers: http://llvm-compile-time-tracker.com/compare.php?from=be93ba1fd608cf9bef0a414c3193dff398c80c44&to=38bd3bc987b0c0f2e715859c78cd705e05689534&stat=instructions
At O3 the geomean improvement is -1.7%. The largest improvement is SPASS with ThinLTO at -6%.

In test-suite, I see only two tests with a hash difference and no code size difference (PAQ8p, Ptrdist), which indicates that the simplification only ends up being useful very rarely. (I would have liked to figure out which simplification is responsible here, but wasn't able to spot it looking at transformation logs.)
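As a rough illustration of the single-argument phi check described in the summary, here is a toy model. It is not LLVM code: `Kind`, `Node`, and `underlyingObject` are hypothetical names invented for this sketch. The walk strips casts and GEPs, looks through a phi only when it has exactly one incoming value, and performs no general simplification.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for an IR value. Hypothetical; not an LLVM class.
enum class Kind { Alloca, Cast, GEP, Phi, Select };

struct Node {
  Kind kind;
  // For Cast and GEP, operand 0 is the pointer operand.
  // For Phi, the operands are the incoming values.
  std::vector<const Node *> operands;
};

// Walk towards the underlying object without any general simplification.
// The only phi that is looked through is one with a single incoming value,
// as an LCSSA-style phi would have in this toy model.
const Node *underlyingObject(const Node *v) {
  for (;;) {
    switch (v->kind) {
    case Kind::Cast:
    case Kind::GEP:
      v = v->operands[0];
      continue;
    case Kind::Phi:
      if (v->operands.size() == 1) {
        v = v->operands[0];
        continue;
      }
      return v; // Multi-input phi: stop here.
    default:
      return v; // Alloca, Select, ...: stop here.
    }
  }
}
```

In this model a select is never resolved to one of its operands, so a select node is reported as its own underlying object.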

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

nikic created this revision. · Jun 20 2020, 8:41 AM

Herald added a project: Restricted Project. · Jun 20 2020, 8:41 AM

Herald added subscribers: llvm-commits, kerbowa, dexonsmith and 3 others.

nikic marked an inline comment as done. · Jun 20 2020, 8:48 AM

nikic added inline comments.

llvm/test/CodeGen/AMDGPU/promote-alloca-to-lds-select.ll
Line 78: Let me explain what is going on here: We have selects on undefs involved here, which means that instruction simplification will simplify to the first select operand. If a real condition is used, then that of course doesn't happen. Because of that, this test is not testing anything useful right now. In fact, I think this indicates that the current usage of SimplifyInstruction inside GetUnderlyingObject may be subtly unsound. GetUnderlyingObject will declare that the first select operand is the underlying object, but other code is permitted to later simplify that select to the second operand instead. In this case optimizations may have been performed based on an incorrect assumption. (While undef can be true or false, it cannot be both at the same time.)
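The concern in the comment above can be sketched with a small toy model. This is an assumption-laden illustration, not LLVM code: `Cond`, `selectedOperand`, and `choicesAgree` are hypothetical names. It models a select whose condition may be undef, where an analysis and a later transform each resolve the undef independently.

```cpp
#include <cassert>

// Hypothetical three-state condition. Undef may be resolved to either value,
// but every user of the select must observe the same resolution.
enum class Cond { False, True, Undef };

// Index of the operand a select resolves to: 0 for the first operand,
// 1 for the second. For Undef the caller supplies the choice, modelling
// that different pieces of code may pick differently.
int selectedOperand(Cond c, int undefChoice) {
  if (c == Cond::True)
    return 0;
  if (c == Cond::False)
    return 1;
  return undefChoice;
}

// True when the operand assumed at analysis time matches the operand that a
// later transform actually folds the select to.
bool choicesAgree(Cond c, int analysisChoice, int transformChoice) {
  return selectedOperand(c, analysisChoice) ==
         selectedOperand(c, transformChoice);
}
```

With a real condition the two choices always agree. With an undef condition they agree only if both sides happen to pick the same operand, which is the situation the comment describes as potentially unsound.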

Harbormaster completed remote builds in B61140: Diff 272258. · Jun 20 2020, 10:02 AM

LGTM

efriedma added inline comments.

llvm/test/CodeGen/AMDGPU/promote-alloca-to-lds-select.ll
Line 78: Makes sense. This is called out in https://github.com/llvm/llvm-project/blob/0e3faab6f0fa00668f97747a6a4afa1bc5647ef9/llvm/include/llvm/Analysis/InstructionSimplify.h#L16 .

This revision is now accepted and ready to land. · Jun 20 2020, 1:26 PM

LGTM too.

Closed by commit rG37d3030711cc: [ValueTracking, BasicAA] Don't simplify instructions (authored by nikic). · Jun 21 2020, 7:56 AM

This revision was automatically updated to reflect the committed changes.

nikic mentioned this in D69914: [LVI] Normalize pointer behavior. · Jun 22 2020, 1:19 PM

Revision Contents

Path                                                        Size
llvm/lib/Analysis/BasicAliasAnalysis.cpp                    21 lines
llvm/lib/Analysis/ValueTracking.cpp                         19 lines
llvm/test/CodeGen/AMDGPU/promote-alloca-to-lds-select.ll    17 lines

Diff 272310

llvm/lib/Analysis/BasicAliasAnalysis.cpp

@@ ... @@ do {
     if (Op->getOpcode() == Instruction::BitCast ||
         Op->getOpcode() == Instruction::AddrSpaceCast) {
       V = Op->getOperand(0);
       continue;
     }
 
     const GEPOperator *GEPOp = dyn_cast<GEPOperator>(Op);
     if (!GEPOp) {
-      if (const auto *Call = dyn_cast<CallBase>(V)) {
+      if (const auto *PHI = dyn_cast<PHINode>(V)) {
+        // Look through single-arg phi nodes created by LCSSA.
+        if (PHI->getNumIncomingValues() == 1) {
+          V = PHI->getIncomingValue(0);
+          continue;
+        }
+      } else if (const auto *Call = dyn_cast<CallBase>(V)) {
         // CaptureTracking can know about special capturing properties of some
         // intrinsics like launder.invariant.group, that can't be expressed with
         // the attributes, but have properties like returning aliasing pointer.
         // Because some analysis may assume that nocaptured pointer is not
         // returned from some special intrinsic (because function would have to
         // be marked with returns attribute), it is crucial to use this function
         // because it should be in sync with CaptureTracking. Not using it may
         // cause weird miscompilations where 2 aliasing pointers are assumed to
         // noalias.
         if (auto *RP = getArgumentAliasingToReturnedPointer(Call, false)) {
           V = RP;
           continue;
         }
       }
 
-      // If it's not a GEP, hand it off to SimplifyInstruction to see if it
-      // can come up with something. This matches what GetUnderlyingObject does.
-      if (const Instruction *I = dyn_cast<Instruction>(V))
-        // TODO: Get a DominatorTree and AssumptionCache and use them here
-        // (these are both now available in this function, but this should be
-        // updated when GetUnderlyingObject is updated). TLI should be
-        // provided also.
-        if (const Value *Simplified =
-                SimplifyInstruction(const_cast<Instruction *>(I), DL)) {
-          V = Simplified;
-          continue;
-        }
-
       Decomposed.Base = V;
       return false;
     }
 
     // Don't attempt to analyze GEPs over unsized objects.
     if (!GEPOp->getSourceElementType()->isSized()) {
       Decomposed.Base = V;
       return false;

llvm/lib/Analysis/ValueTracking.cpp

@@ ... @@ if (GEPOperator *GEP = dyn_cast<GEPOperator>(V)) {
       V = GEP->getPointerOperand();
     } else if (Operator::getOpcode(V) == Instruction::BitCast ||
                Operator::getOpcode(V) == Instruction::AddrSpaceCast) {
       V = cast<Operator>(V)->getOperand(0);
     } else if (GlobalAlias *GA = dyn_cast<GlobalAlias>(V)) {
       if (GA->isInterposable())
         return V;
       V = GA->getAliasee();
-    } else if (isa<AllocaInst>(V)) {
-      // An alloca can't be further simplified.
-      return V;
     } else {
-      if (auto *Call = dyn_cast<CallBase>(V)) {
+      if (auto *PHI = dyn_cast<PHINode>(V)) {
+        // Look through single-arg phi nodes created by LCSSA.
+        if (PHI->getNumIncomingValues() == 1) {
+          V = PHI->getIncomingValue(0);
+          continue;
+        }
+      } else if (auto *Call = dyn_cast<CallBase>(V)) {
         // CaptureTracking can know about special capturing properties of some
         // intrinsics like launder.invariant.group, that can't be expressed with
         // the attributes, but have properties like returning aliasing pointer.
         // Because some analysis may assume that nocaptured pointer is not
         // returned from some special intrinsic (because function would have to
         // be marked with returns attribute), it is crucial to use this function
         // because it should be in sync with CaptureTracking. Not using it may
         // cause weird miscompilations where 2 aliasing pointers are assumed to
         // noalias.
         if (auto *RP = getArgumentAliasingToReturnedPointer(Call, false)) {
           V = RP;
           continue;
         }
       }
 
-      // See if InstructionSimplify knows any relevant tricks.
-      if (Instruction *I = dyn_cast<Instruction>(V))
-        // TODO: Acquire a DominatorTree and AssumptionCache and use them.
-        if (Value *Simplified = SimplifyInstruction(I, {DL, I})) {
-          V = Simplified;
-          continue;
-        }
-
       return V;
     }
     assert(V->getType()->isPointerTy() && "Unexpected operand type!");
   }
   return V;
 }
 
 void llvm::GetUnderlyingObjects(const Value *V,

llvm/test/CodeGen/AMDGPU/promote-alloca-to-lds-select.ll

@@ ... @@ define amdgpu_kernel void @lds_promote_alloca_select_two_derived_constant_pointers() #0 {
   %alloca = alloca [16 x i32], align 4, addrspace(5)
   %ptr0 = getelementptr inbounds [16 x i32], [16 x i32] addrspace(5)* %alloca, i32 0, i32 1
   %ptr1 = getelementptr inbounds [16 x i32], [16 x i32] addrspace(5)* %alloca, i32 0, i32 3
   %select = select i1 undef, i32 addrspace(5)* %ptr0, i32 addrspace(5)* %ptr1
   store i32 0, i32 addrspace(5)* %select, align 4
   ret void
 }
 
+; FIXME: Can be promoted, but we'd have to recursively show that the select
+; operands all point to the same alloca.
+
 ; CHECK-LABEL: @lds_promoted_alloca_select_input_select(
-; CHECK: getelementptr inbounds [256 x [16 x i32]], [256 x [16 x i32]] addrspace(3)* @lds_promoted_alloca_select_input_select.alloca, i32 0, i32 %{{[0-9]+}}
-; CHECK: %ptr0 = getelementptr inbounds [16 x i32], [16 x i32] addrspace(3)* %{{[0-9]+}}, i32 0, i32 %a
-; CHECK: %ptr1 = getelementptr inbounds [16 x i32], [16 x i32] addrspace(3)* %{{[0-9]+}}, i32 0, i32 %b
-; CHECK: %ptr2 = getelementptr inbounds [16 x i32], [16 x i32] addrspace(3)* %{{[0-9]+}}, i32 0, i32 %c
-; CHECK: %select0 = select i1 undef, i32 addrspace(3)* %ptr0, i32 addrspace(3)* %ptr1
-; CHECK: %select1 = select i1 undef, i32 addrspace(3)* %select0, i32 addrspace(3)* %ptr2
-; CHECK: store i32 0, i32 addrspace(3)* %select1, align 4
-define amdgpu_kernel void @lds_promoted_alloca_select_input_select(i32 %a, i32 %b, i32 %c) #0 {
+; CHECK: alloca
+define amdgpu_kernel void @lds_promoted_alloca_select_input_select(i32 %a, i32 %b, i32 %c, i1 %c1, i1 %c2) #0 {
   %alloca = alloca [16 x i32], align 4, addrspace(5)
   %ptr0 = getelementptr inbounds [16 x i32], [16 x i32] addrspace(5)* %alloca, i32 0, i32 %a
   %ptr1 = getelementptr inbounds [16 x i32], [16 x i32] addrspace(5)* %alloca, i32 0, i32 %b
   %ptr2 = getelementptr inbounds [16 x i32], [16 x i32] addrspace(5)* %alloca, i32 0, i32 %c
-  %select0 = select i1 undef, i32 addrspace(5)* %ptr0, i32 addrspace(5)* %ptr1
-  %select1 = select i1 undef, i32 addrspace(5)* %select0, i32 addrspace(5)* %ptr2
+  %select0 = select i1 %c1, i32 addrspace(5)* %ptr0, i32 addrspace(5)* %ptr1
+  %select1 = select i1 %c2, i32 addrspace(5)* %select0, i32 addrspace(5)* %ptr2
   store i32 0, i32 addrspace(5)* %select1, align 4
   ret void
 }
[Inline comment] nikic: Let me explain what is going on here: We have selects on undefs involved here, which means that…
[Inline comment] efriedma: Makes sense. This is called out in https://github.com/llvm/llvm…
 
 define amdgpu_kernel void @lds_promoted_alloca_select_input_phi(i32 %a, i32 %b, i32 %c) #0 {
 entry:
   %alloca = alloca [16 x i32], align 4, addrspace(5)
   %ptr0 = getelementptr inbounds [16 x i32], [16 x i32] addrspace(5)* %alloca, i32 0, i32 %a
   %ptr1 = getelementptr inbounds [16 x i32], [16 x i32] addrspace(5)* %alloca, i32 0, i32 %b
   store i32 0, i32 addrspace(5)* %ptr0
   br i1 undef, label %bb1, label %bb2