This is an archive of the discontinued LLVM Phabricator instance.

Differential D123669

[ArgPromotion] Use a Visited set to protect dead instruction collection
AbandonedPublic

Authored by psamolysov on Apr 13 2022, 4:23 AM.

Details

Reviewers

chandlerc
nikic
antoniofrighetto
beanz
silvas

Summary

Although it looks like SmallVector is enough for dead instruction
collection as well as for load instruction searching and no Visited set
is actually needed, a Visited set has been added for dead instruction
collection to make it safe and consistent with load instruction
searching in the 'findArgParts'.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
60,030 ms	x64 debian > libFuzzer.libFuzzer::fuzzer-leak.test
60,070 ms	x64 debian > libFuzzer.libFuzzer::large.test
60,020 ms	x64 debian > libFuzzer.libFuzzer::minimize_crash.test
60,040 ms	x64 debian > libFuzzer.libFuzzer::out-of-process-fuzz.test
60,030 ms	x64 debian > libFuzzer.libFuzzer::value-profile-load.test

Event Timeline

psamolysov created this revision. Apr 13 2022, 4:23 AM

Herald added a project: Restricted Project. Apr 13 2022, 4:23 AM

Herald added subscribers: ormris, dexonsmith, hiraditya.

psamolysov requested review of this revision. Apr 13 2022, 4:23 AM

Herald added a subscriber: llvm-commits. Apr 13 2022, 4:23 AM

Harbormaster completed remote builds in B159416: Diff 422465. Apr 13 2022, 6:00 AM

This doesn't make sense. The visited set must contain all visited instructions to prevent infinite loops. The worklist on the other hand only contains instructions yet to be visited. They are complement sets, you can't combine them into one, or at least not as implemented.

This revision now requires changes to proceed. Apr 13 2022, 7:07 AM

@nikic thank you very much for your comment. As I see in the removed code, the AppendUsers lambda first inserts every user of the value V into the Visited set and appends it to the Worklist vector if and only if the insertion actually took place. The SmallSetVector class implements exactly the same logic: https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ADT/SetVector.h#L141 (with one difference: SmallSetVector uses SmallDenseSet, not SmallPtrSet). The problem I see is that during the traversal we remove elements from the end of the vector, using it as a stack (https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/IPO/ArgumentPromotion.cpp#L573), and SmallSetVector removes the element from the set too, not only from the vector. So, in the new implementation, once a user has been inserted into the worklist and is then taken for processing, it is also removed from the set of visited elements and can therefore be added to the worklist again and again if there is a cycle between the users. Is this your concern?

In D123669#3448600, @psamolysov wrote:

@nikic thank you very much for your comment. As I see in the removed code, the AppendUsers lambda first inserts every user of the value V into the Visited set and appends it to the Worklist vector if and only if the insertion actually took place. The SmallSetVector class implements exactly the same logic: https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ADT/SetVector.h#L141 (with one difference: SmallSetVector uses SmallDenseSet, not SmallPtrSet). The problem I see is that during the traversal we remove elements from the end of the vector, using it as a stack (https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/IPO/ArgumentPromotion.cpp#L573), and SmallSetVector removes the element from the set too, not only from the vector. So, in the new implementation, once a user has been inserted into the worklist and is then taken for processing, it is also removed from the set of visited elements and can therefore be added to the worklist again and again if there is a cycle between the users. Is this your concern?

Yes, exactly.
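
The concern confirmed above can be reproduced outside LLVM with a minimal sketch in standard C++. MiniSetVector is a hypothetical stand-in for a set-backed worklist whose pop_back_val() also erases the element from its set; it is not the LLVM SmallSetVector class. The graph encoding (Users[N] lists the users of node N), the function names and the MaxSteps cap are likewise assumptions, introduced only so that the looping variant terminates.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a set-backed vector: popping an element also
// erases it from the set, so the container forgets it was ever inserted.
class MiniSetVector {
  std::vector<int> Vec;
  std::unordered_set<int> Set;

public:
  bool insert(int V) {
    if (!Set.insert(V).second)
      return false;
    Vec.push_back(V);
    return true;
  }
  int pop_back_val() {
    int V = Vec.back();
    Vec.pop_back();
    Set.erase(V);
    return V;
  }
  bool empty() const { return Vec.empty(); }
};

// Users[N] holds the indices of the nodes that use node N.
using Graph = std::vector<std::vector<int>>;

// Worklist and "visited" state share one container. Returns the number of
// nodes processed, giving up once MaxSteps is reached.
std::size_t walkWithSetVector(const Graph &Users, int Root,
                              std::size_t MaxSteps) {
  MiniSetVector Worklist;
  for (int U : Users[Root])
    Worklist.insert(U);
  std::size_t Steps = 0;
  while (!Worklist.empty() && Steps < MaxSteps) {
    int N = Worklist.pop_back_val(); // N is forgotten here
    ++Steps;
    for (int U : Users[N])
      Worklist.insert(U);
  }
  return Steps;
}

// Separate Visited set that is never shrunk: every node is processed once.
std::size_t walkWithVisited(const Graph &Users, int Root,
                            std::size_t MaxSteps) {
  std::vector<int> Worklist;
  std::unordered_set<int> Visited;
  auto AppendUsers = [&](int N) {
    for (int U : Users[N])
      if (Visited.insert(U).second)
        Worklist.push_back(U);
  };
  AppendUsers(Root);
  std::size_t Steps = 0;
  while (!Worklist.empty() && Steps < MaxSteps) {
    int N = Worklist.back();
    Worklist.pop_back();
    ++Steps;
    AppendUsers(N);
  }
  return Steps;
}
```

For a graph in which node 0 is used by node 1 and nodes 1 and 2 use each other, e.g. Graph G = {{1}, {2}, {1}}, walkWithSetVector(G, 0, 100) runs into the cap of 100 steps, while walkWithVisited(G, 0, 100) stops after 2.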

Colleagues, @nikic, excuse me if I am taking up your time. I've had another look at the loop that searches for load instructions (https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/IPO/ArgumentPromotion.cpp#L563-L597) and I cannot come up with an example in which there is a cycle between an instruction and its user. The same pass has another worklist loop, which finds the dead instructions: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/IPO/ArgumentPromotion.cpp#L392-L416, and it uses no Visited set at all (the code is very similar, but the loop on lines 563-597, which my patch changes, adds some condition checks when a GEP instruction is visited). I tried replacing the type of the Worklist variable in my patch with SmallVector instead of SmallSetVector, and all the insert_range operations with append_range, and no test in the llvm/test/Transforms/ArgumentPromotion folder failed. Either this shows that the Visited set is not needed, or we just have no test to get an infinite loop over the worklist in the case where a cycle between an instruction and its user is possible.

we just have no test to get an infinite loop over the worklist in the case where a cycle between an instruction and its user is possible.

Most likely this is the case.

A cycle between an instruction and its users is possible for PHI nodes, since only they may reference their own value, but the loop in the patch looks through no instructions other than GEP, bitcast and load.

For example for the following input:

define internal i32* @callee(i32* noundef %0, i32* noundef %1) #0 {
2:
  br label %3
3:
  %4 = phi i32* [ %4, %3 ], [ %1, %2 ]
  br label %3

  ret i32* %4
}

The pass just writes into the debug stream: ArgPromotion of i32* %1 failed: unknown user %4 = phi i32* [ %4, %3 ], [ %1, %2 ] and does not fall into an infinite loop.
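
The behaviour described above, where the walk gives up on the first user it does not recognise, can be sketched in standard C++ as follows. This is only a model under stated assumptions: the Kind enum, the Node struct and the function onlyLoadsReachable are hypothetical names, not part of ArgumentPromotion.cpp, and instructions are reduced to a kind plus a list of user indices.

```cpp
#include <vector>

// Hypothetical instruction kinds; Phi stands for any "unknown user".
enum class Kind { BitCast, GEP, Load, Phi };

struct Node {
  Kind K;
  std::vector<int> Users; // indices of the nodes that use this one
};

// Follows users through bitcast and GEP nodes only. Loads are accepted and
// not followed further; any other kind aborts the walk immediately, which is
// why a self-referencing phi cannot keep the loop running.
bool onlyLoadsReachable(const std::vector<Node> &Nodes,
                        const std::vector<int> &ArgUsers) {
  std::vector<int> Worklist(ArgUsers);
  while (!Worklist.empty()) {
    int N = Worklist.back();
    Worklist.pop_back();
    switch (Nodes[N].K) {
    case Kind::BitCast:
    case Kind::GEP:
      Worklist.insert(Worklist.end(), Nodes[N].Users.begin(),
                      Nodes[N].Users.end());
      break;
    case Kind::Load:
      break;
    case Kind::Phi:
      return false; // unknown user: promotion fails, no further traversal
    }
  }
  return true;
}
```

With a single phi node that lists itself as its own user, std::vector<Node>{{Kind::Phi, {0}}}, the call onlyLoadsReachable(Nodes, {0}) returns false on the first step. The sketch deliberately has no visited set, so it relies on bitcast and GEP nodes never forming a cycle among themselves.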

Also, there can be a situation when the graph looks like a diamond:

%0 = ...
%1 = ... %0 ...
%2 = ... %0 ...
%3 = ... %1 ... %2 ... ; might be handled twice
%4 = load ... %3 ...

In this case, if there were no Visited set, instruction %3 would be handled twice. But only GEP, bitcast and load instructions are handled, and none of them can have more than one pointer operand (I'm not sure about GEP, but as far as I can see even GEP has a single pointer operand), so the promotion just fails once on an unknown user:

%3 = bitcast i32* %0 to float*
%4 = bitcast i32* %1 to float*
%5 = icmp ugt float* %3, %4
%6 = select i1 %5, float* %3, float* %4
%7 = load float, float* %6, align 4, !tbaa !4
...

output:

ArgPromotion of i32* %0 failed: unknown user   %5 = icmp ugt float* %3, %4
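
To illustrate what "handled twice" means for the schematic diamond, here is a small sketch in standard C++ that counts how often a node is popped from the worklist with and without a visited set. The graph encoding (Users[N] lists the users of node N) and the name countVisits are assumptions made for illustration; the sketch models a generic def-use walk, not the actual pass, where the join instruction would be an unknown user and stop the walk.

```cpp
#include <unordered_set>
#include <vector>

// Users[N] holds the indices of the nodes that use node N.
using Graph = std::vector<std::vector<int>>;

// Walks all transitive users of Root and returns how many times Target is
// taken off the worklist. Without a visited set the graph must be acyclic,
// otherwise the loop would not terminate.
int countVisits(const Graph &Users, int Root, int Target, bool UseVisited) {
  std::vector<int> Worklist;
  std::unordered_set<int> Visited;
  auto AppendUsers = [&](int N) {
    for (int U : Users[N])
      if (!UseVisited || Visited.insert(U).second)
        Worklist.push_back(U);
  };
  AppendUsers(Root);
  int Count = 0;
  while (!Worklist.empty()) {
    int N = Worklist.back();
    Worklist.pop_back();
    if (N == Target)
      ++Count;
    AppendUsers(N);
  }
  return Count;
}
```

For the diamond Graph D = {{1, 2}, {3}, {3}, {4}, {}}, node 3 is reached through both node 1 and node 2: countVisits(D, 0, 3, false) yields 2 and countVisits(D, 0, 3, true) yields 1.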

So the optimization (removing the Visited set) seems to make sense here. What is your opinion?

The patch summary and the diff have been updated. The idea is to just use a SmallVector instance for the load instruction search, because the def-use sub-graph whose users are only GEP, bitcast and load instructions looks like a tree, not even a DAG.

Harbormaster completed remote builds in B160442: Diff 423883. Apr 20 2022, 7:13 AM

rriddle removed a reviewer: rriddle. Apr 20 2022, 9:43 AM

silvas resigned from this revision. Apr 20 2022, 4:34 PM

@psamolysov The main thing we're concerned about when we use these "Visited" sets is unreachable code, which can include self-referencing instructions. You can have something like %ptr = load ptr, ptr %ptr and then loop infinitely.

Now, I think you are correct that this can't happen in this case, because we are only looking through instructions that have a single non-constant operand. However, if this code is ever changed to also handle other instruction kinds, one would have to be careful to reintroduce the visited set. For example, the "full restrict" patches add an additional operand to load instructions, which could be used to form a cycle in unreachable code.

So, I think it's safe to drop this Visited set in this case, but I would usually consider it good practice to have one, in the interest of defensiveness. Though I don't care particularly strongly about this point.

@nikic

So, I think it's safe to drop this Visited set in this case, but I would usually consider it good practice to have one, in the interest of defensiveness.

I agree: the code may be changed in the future, and the change can be unsafe if the developer does not take into account why the Visited set is not used *now*. I see two ways: either add a comment explaining why storing the worklist in a SmallVector without a Visited set is safe in our case, or add a Visited set to the dead instruction collection loop in the doPromotion function too, for consistency. Which would be better from your point of view?

In D123669#3480070, @psamolysov wrote:

@nikic

So, I think it's safe to drop this Visited set in this case, but I would usually consider it good practice to have one, in the interest of defensiveness.

I agree: the code may be changed in the future, and the change can be unsafe if the developer does not take into account why the Visited set is not used *now*. I see two ways: either add a comment explaining why storing the worklist in a SmallVector without a Visited set is safe in our case, or add a Visited set to the dead instruction collection loop in the doPromotion function too, for consistency. Which would be better from your point of view?

Is there any other motivation than code simplification?

Better to stay safe IMHO.

Better to stay safe IMHO.

To stay safe, I propose the following change: add the Visited set to the dead instruction search in the doPromotion function too. This makes the code consistent: the two graph traversal loops will look alike.

Harbormaster completed remote builds in B161958: Diff 426013. Apr 29 2022, 4:57 AM

ormris removed a subscriber: ormris. May 16 2022, 11:05 AM

Due to https://reviews.llvm.org/D125485 this patch is no longer relevant.

Revision Contents

Path	Size
llvm/lib/Transforms/IPO/ArgumentPromotion.cpp	31 lines
llvm/test/Transforms/ArgumentPromotion/diamond-graph-no-promotion.ll	22 lines
llvm/test/Transforms/ArgumentPromotion/phi-loop-no-arg-promotion.ll	36 lines

Diff 426013

llvm/lib/Transforms/IPO/ArgumentPromotion.cpp

[144 lines not shown] static Value *createByteGEP(IRBuilderBase &IRB, const DataLayout &DL,

   if (OrigOffset != 0) {
     Ptr = IRB.CreateBitCast(Ptr, IRB.getInt8PtrTy(AddrSpace));
     Ptr = IRB.CreateGEP(IRB.getInt8Ty(), Ptr, IRB.getInt(OrigOffset));
   }
   return IRB.CreateBitCast(Ptr, ResElemTy->getPointerTo(AddrSpace));
 }

+static void appendUsers(SmallVectorImpl<Value *> &Destination,
+                        SmallPtrSetImpl<Value *> &Visited, Value *V) {
+  for (User *U : V->users())
+    if (Visited.insert(U).second)
+      Destination.push_back(U);
+}
+
 /// DoPromotion - This method actually performs the promotion of the specified
 /// arguments, and returns the new function. At this point, we know that it's
 /// safe to do so.
 static Function *doPromotion(
     Function *F,
     const DenseMap<Argument *, SmallVector<OffsetAndArgPart, 4>> &ArgsToPromote,
     SmallPtrSetImpl<Argument *> &ByValArgsToTransform,
     Optional<function_ref<void(CallBase &OldCS, CallBase &NewCS)>>

[229 lines not shown] for (const auto &Pair : ArgsToPromote.find(&Arg)->second) {

       NewArg.setName(Arg.getName() + "." + Twine(Pair.first) + ".val");
       OffsetToArg.insert({Pair.first, &NewArg});
     }

     // Otherwise, if we promoted this argument, then all users are load
     // instructions (with possible casts and GEPs in between).

     SmallVector<Value *, 16> Worklist;
+    SmallPtrSet<Value *, 16> Visited;
     SmallVector<Instruction *, 16> DeadInsts;
-    append_range(Worklist, Arg.users());
+    appendUsers(Worklist, Visited, &Arg);
     while (!Worklist.empty()) {
       Value *V = Worklist.pop_back_val();
       if (isa<BitCastInst>(V) || isa<GetElementPtrInst>(V)) {
         DeadInsts.push_back(cast<Instruction>(V));
-        append_range(Worklist, V->users());
+        appendUsers(Worklist, Visited, V);
         continue;
       }

       if (auto *LI = dyn_cast<LoadInst>(V)) {
         Value *Ptr = LI->getPointerOperand();
         APInt Offset(DL.getIndexTypeSizeInBits(Ptr->getType()), 0);
         Ptr =
             Ptr->stripAndAccumulateConstantOffsets(DL, Offset,

[150 lines not shown] if (!isGuaranteedToTransferExecutionToSuccessor(&I))

       break;
   }

   // Now look at all loads of the argument. Remember the load instructions
   // for the aliasing check below.
   SmallVector<Value *, 16> Worklist;
   SmallPtrSet<Value *, 16> Visited;
   SmallVector<LoadInst *, 16> Loads;
-  auto AppendUsers = [&](Value *V) {
-    for (User *U : V->users())
-      if (Visited.insert(U).second)
-        Worklist.push_back(U);
-  };
-  AppendUsers(Arg);
+  appendUsers(Worklist, Visited, Arg);
   while (!Worklist.empty()) {
     Value *V = Worklist.pop_back_val();
     if (isa<BitCastInst>(V)) {
-      AppendUsers(V);
+      appendUsers(Worklist, Visited, V);
       continue;
     }

     if (auto *GEP = dyn_cast<GetElementPtrInst>(V)) {
       if (!GEP->hasAllConstantIndices())
         return false;
-      AppendUsers(V);
+      appendUsers(Worklist, Visited, V);
       continue;
     }

     if (auto *LI = dyn_cast<LoadInst>(V)) {
       if (!HandleLoad(LI, /* GuaranteedToExecute */ false))
         return false;
       Loads.push_back(LI);
       continue;

[114 lines not shown] static bool canPaddingBeAccessed(Argument *Arg) {

   SmallPtrSet<Value *, 16> PtrValues;
   PtrValues.insert(Arg);

   // Track all of the stores.
   SmallVector<StoreInst *, 16> Stores;

   // Scan through the uses recursively to make sure the pointer is always used
   // sanely.
-  SmallVector<Value *, 16> WorkList(Arg->users());
-  while (!WorkList.empty()) {
-    Value *V = WorkList.pop_back_val();
+  SmallVector<Value *, 16> Worklist(Arg->users());
+  while (!Worklist.empty()) {
+    Value *V = Worklist.pop_back_val();
     if (isa<GetElementPtrInst>(V) || isa<PHINode>(V)) {
       if (PtrValues.insert(V).second)
-        llvm::append_range(WorkList, V->users());
+        append_range(Worklist, V->users());
     } else if (StoreInst *Store = dyn_cast<StoreInst>(V)) {
       Stores.push_back(Store);
     } else if (!isa<LoadInst>(V)) {
       return true;
     }
   }

   // Check to make sure the pointers aren't captured

[338 lines not shown]

llvm/test/Transforms/ArgumentPromotion/diamond-graph-no-promotion.ll

This file was added.

; REQUIRES: asserts

; RUN: opt -passes=argpromotion -debug-only=argpromotion -disable-output %s 2>&1 | FileCheck %s

define internal i32 @diamond_callee(i32* noundef readonly %0, i32* noundef readonly %1) {
; CHECK: ArgPromotion of i32* [[ARG0:%.*]] failed: unknown user {{%.+}} = icmp
; CHECK: ArgPromotion of i32* [[ARG1:%.*]] failed: unknown user {{%.+}} = icmp
  %3 = bitcast i32* %0 to float*
  %4 = bitcast i32* %1 to float*
  %5 = icmp ugt float* %3, %4
  %6 = select i1 %5, float* %3, float* %4
  %7 = load float, float* %6
  %8 = fptosi float %7 to i32
  ret i32 %8
}

define i32 @diamond_caller(i32* nocapture noundef readonly %0) {
  %2 = getelementptr inbounds i32, i32* %0, i64 0
  %3 = getelementptr inbounds i32, i32* %0, i64 1
  %4 = call i32 @diamond_callee(i32* noundef %2, i32* noundef %3)
  ret i32 %4
}

llvm/test/Transforms/ArgumentPromotion/phi-loop-no-arg-promotion.ll

This file was added.

; REQUIRES: asserts

; RUN: opt -passes=argpromotion -S -o - %s | FileCheck %s
; RUN: opt -passes=argpromotion -debug-only=argpromotion -disable-output %s 2>&1 | FileCheck %s --check-prefix=WARN

define internal i32* @callee(i32* noundef %0, i32* noundef %1) {
; CHECK-LABEL: define {{[^@]+}}@callee
; CHECK-SAME: (i32* noundef [[P_0_PTR:%.*]]) {
; CHECK-NEXT: br label %[[BGN:.*]]
; CHECK: [[BGN]]:
; CHECK-NEXT: [[PHI_VAL:%.*]] = phi i32* [ [[PHI_VAL]], %[[BGN]] ], [ [[P_0_PTR]], {{%.+}} ]
; CHECK-NEXT: br label %[[BGN]]
; CHECK: [[END:.*]]:
; CHECK-NEXT: ret i32* [[PHI_VAL]]

; WARN: ArgPromotion of i32* [[P_1_PTR:%.*]] failed: unknown user {{%.+}} = phi i32*
2:
  br label %3
3:
  %4 = phi i32* [ %4, %3 ], [ %1, %2 ]
  br label %3

  ret i32* %4
}

define i32* @caller(i32* noundef %0) {
; CHECK-LABEL: define {{[^@]+}}@caller
; CHECK-SAME: (i32* noundef [[ARG:%.*]]) {
; CHECK: [[P_1_PTR:%.*]] = getelementptr inbounds i32, i32* [[ARG]], i64 1
; CHECK-NEXT: [[RET:%.+]] = call i32* @callee(i32* noundef [[P_1_PTR]])
; CHECK-NEXT: ret i32* [[RET]]
  %2 = getelementptr inbounds i32, i32* %0, i64 0
  %3 = getelementptr inbounds i32, i32* %0, i64 1
  %4 = call i32* @callee(i32* noundef %2, i32* noundef %3)
  ret i32* %4
}