This is an archive of the discontinued LLVM Phabricator instance.

[ArgPromotion] Extend search for SafeToUnconditionallyLoad indices to the blocks that must be executed upon entry into the function.
Needs Review · Public

Authored by mgudim on Jan 7 2020, 6:29 PM.

Details

Reviewers

chandlerc
rnk
jdoerfert

Summary

Currently, isSafeToPromoteArgument scans only the function's entry block to find the indices of loads that are safe to move from the callee to the caller. This patch extends the search to every block that must be executed upon entry into the function.
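
To make the change concrete, here is a minimal sketch of the idea on a toy CFG. It is not the patch's code: ToyLoad, ToyBlock, ToyFunction, safeIndices, and makeCallee0Shape are hypothetical names, and the walk only follows unique successors, assuming every instruction in a block hands control to the next one. With EntryOnly set it mimics the current entry-block scan; without it, it also visits the blocks that are reached unconditionally.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-ins for LLVM IR; none of these types exist in LLVM.
using Indices = std::vector<long>;

struct ToyLoad {
  bool FromArg;   // true if the pointer is derived from the argument
  Indices GEPIdx; // constant GEP indices; empty means a load of the argument itself
};

struct ToyBlock {
  std::vector<ToyLoad> Loads;
  std::vector<int> Succs; // ids of successor blocks
};

using ToyFunction = std::vector<ToyBlock>; // block 0 is the entry

// Collect the index lists that are loaded from the argument on every
// execution, walking from the entry block along unique successors.
std::set<Indices> safeIndices(const ToyFunction &F, bool EntryOnly) {
  assert(!F.empty() && "a function needs an entry block");
  std::set<Indices> Safe;
  std::vector<bool> Seen(F.size(), false);
  int Cur = 0;
  while (!Seen[Cur]) {
    Seen[Cur] = true;
    for (const ToyLoad &L : F[Cur].Loads)
      if (L.FromArg)
        Safe.insert(L.GEPIdx);
    // Stop at a conditional branch, at a block without successors, or
    // immediately when only the entry block is to be scanned.
    if (EntryOnly || F[Cur].Succs.size() != 1)
      break;
    Cur = F[Cur].Succs[0];
  }
  return Safe;
}

// Shape of callee0 in control-flow3.ll: entry -> bb1 -> bb2, with the
// only load of the argument in bb2.
ToyFunction makeCallee0Shape() {
  ToyFunction F(3);
  F[0].Succs = {1};
  F[1].Succs = {2};
  F[2].Loads.push_back(ToyLoad{true, Indices{}});
  return F;
}
```

On this shape, safeIndices(makeCallee0Shape(), true) finds nothing, while the extended walk finds the load in bb2.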

Diff Detail

Event Timeline

mgudim created this revision. Jan 7 2020, 6:29 PM

Herald added a project: Restricted Project. Jan 7 2020, 6:29 PM

Herald added subscribers: llvm-commits, hiraditya.

Hi all, most likely this is not the final version of the patch, but I am posting this initial version to get some suggestions/feedback. Soon I will try to collect some statistics to see how much impact on performance this change has.

Using post-dominator information is not safe. Take this code:

static int foo(int c, int *a) {
  if (c) {
    while (1) {}
  } else {
    my_personal_exit();    // <- does not return
  }
  return *a; // <- this post-dominating block is never executed
}
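
A small sketch of why post-dominance alone is not enough, using a toy CFG rather than LLVM's data structures. Everything here (ToyBlock, ToyCFG, postDominates, mayExecute, makeFooCFG) is hypothetical, and the sketch assumes the compiler does not know that my_personal_exit never returns, so the CFG edge from the call to the return block is still present.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy CFG; all names are hypothetical and not part of LLVM.
struct ToyBlock {
  std::vector<int> Succs;        // CFG successors
  bool CallNeverReturns = false; // block contains a call that never returns
  bool IsExit = false;           // block ends in a return
};
using ToyCFG = std::vector<ToyBlock>;

// Depth-first search from From. Block Removed is treated as deleted. If
// RespectNoReturn is set, edges out of a block whose call never returns
// are not followed, which approximates what can happen at run time.
static void visit(const ToyCFG &G, int From, int Removed, bool RespectNoReturn,
                  std::vector<bool> &Seen) {
  if (From == Removed || Seen[From])
    return;
  Seen[From] = true;
  if (RespectNoReturn && G[From].CallNeverReturns)
    return;
  for (int S : G[From].Succs)
    visit(G, S, Removed, RespectNoReturn, Seen);
}

// Textbook definition: B post-dominates A if every CFG path from A to an
// exit contains B. This is vacuously true when no such path avoids B.
bool postDominates(const ToyCFG &G, int B, int A) {
  std::vector<bool> Seen(G.size(), false);
  visit(G, A, B, /*RespectNoReturn=*/false, Seen);
  for (std::size_t I = 0; I < G.size(); ++I)
    if (Seen[I] && G[I].IsExit)
      return false; // found a path to an exit that avoids B
  return true;
}

// Can block B be reached from Entry once non-returning calls are honoured?
bool mayExecute(const ToyCFG &G, int Entry, int B) {
  std::vector<bool> Seen(G.size(), false);
  visit(G, Entry, /*Removed=*/-1, /*RespectNoReturn=*/true, Seen);
  return Seen[B];
}

// CFG of foo: 0 = entry, 1 = endless loop, 2 = call, 3 = return *a.
ToyCFG makeFooCFG() {
  ToyCFG G(4);
  G[0].Succs = {1, 2};
  G[1].Succs = {1};
  G[2].Succs = {3};
  G[2].CallNeverReturns = true;
  G[3].IsExit = true;
  return G;
}
```

In this model the return block post-dominates the entry block, yet it can never execute, so hoisting its load into the caller would introduce a load that the original program never performs.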

What you can use instead is the must-be-executed context, which was introduced for this reason, e.g.:

MustBeExecutedContextExplorer Explorer(...);
auto EIt = Explorer.begin(&EntryBB.front()), EEnd = Explorer.end(&EntryBB.front());
do {
  Instruction *ExecutedI = EIt.getCurrentInst();
  // extract properties from ExecutedI.
} while (++EIt != EEnd);
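
For intuition, the kind of forward walk such an iteration performs can be approximated on a toy instruction list. This is only a sketch with hypothetical names (ToyInst, ToyBlock, ToyFunction, mustBeExecuted, makeChain); the real explorer can do more, since it is also handed loop and post-dominator information, whereas this toy just follows unique successors and stops at the first instruction that may not return.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy IR; not LLVM's classes.
struct ToyInst {
  std::string Name;
  bool MayNotReturn; // e.g. a call that may exit the program or loop forever
};

struct ToyBlock {
  std::vector<ToyInst> Insts;
  std::vector<int> Succs; // ids of successor blocks
};

using ToyFunction = std::vector<ToyBlock>; // block 0 is the entry

// Names of the instructions that execute whenever the function is entered.
std::vector<std::string> mustBeExecuted(const ToyFunction &F) {
  assert(!F.empty() && "a function needs an entry block");
  std::vector<std::string> Context;
  std::vector<bool> Seen(F.size(), false);
  int Cur = 0;
  while (!Seen[Cur]) {
    Seen[Cur] = true;
    for (const ToyInst &I : F[Cur].Insts) {
      Context.push_back(I.Name);
      if (I.MayNotReturn)
        return Context; // nothing after this instruction is guaranteed
    }
    if (F[Cur].Succs.size() != 1)
      break; // conditional branch or function exit: stop exploring
    Cur = F[Cur].Succs[0];
  }
  return Context;
}

// entry -> bb1 -> bb2, where bb2 branches to itself or to the exit block,
// similar in shape to callee1 in control-flow3.ll.
ToyFunction makeChain(bool CallBeforeLoad) {
  ToyFunction F(4);
  F[0].Succs = {1};
  F[1].Succs = {2};
  if (CallBeforeLoad)
    F[2].Insts.push_back(ToyInst{"call", true});
  F[2].Insts.push_back(ToyInst{"load", false});
  F[2].Succs = {2, 3};
  return F;
}
```

Without the call, the load in bb2 is part of the context. With a possibly non-returning call in front of it, the walk stops at the call and the load is not considered safe.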

FWIW, I'm strongly in favor of removing ArgumentPromotion as a pass. The first step towards this goal is D68852; I just haven't had the time to rebase and merge it.
I want this for various reasons, including the bugs in ArgumentPromotion, which are fixable of course (see Bugzilla PR42852, PR887, PR42683). Generally speaking, a lot of things happening in this pass are, or should be, part of other analyses/transformations.

If you are interested in making the pointer privatization in the Attributor stronger, I'd be happy to help with that.

@jdoerfert I see, thanks for the example - I did not think about that. I will try the "MustBeExecutedContextExplorer".

@jdoerfert I updated the patch as you suggested. Also, I added your example as a test.

This should fix PR887: https://bugs.llvm.org/show_bug.cgi?id=887
Please verify that and include the test case from there. We should mention it in the commit message and close the bug if it works.

This should also fix PR42039: https://bugs.llvm.org/show_bug.cgi?id=42039
Please verify that and include the test case from there. We should mention it in the commit message and close the bug if it works.

> In D72382#1824701, @mgudim wrote:
> @jdoerfert I updated the patch as you suggested. Also, I added your example as a test.

I don't see the example.

llvm/lib/Transforms/IPO/ArgumentPromotion.cpp, line 637:
I think this is correct, but it can be expensive. Since this runs by default in O3, you might want to verify the compile-time impact and potentially put the construction under a flag.
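
One way to read this suggestion, sketched with hypothetical types only: the getters in the diff allocate a fresh tree with new on every call, so a cache that builds each analysis at most once per function and skips the work entirely when a switch is off would bound the cost. ToyAnalysis, ToyGetter, ToyAnalysisCache, and countBuilds are made-up names; in LLVM the switch would typically be a cl::opt<bool>, which is not modelled here.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for an expensive per-function analysis.
struct ToyAnalysis {
  explicit ToyAnalysis(const std::string &Fn) : FnName(Fn) {}
  std::string FnName;
};

using ToyGetter = std::function<ToyAnalysis *(const std::string &)>;

// Owns the analyses it hands out, so nothing leaks and nothing is
// built twice for the same function.
class ToyAnalysisCache {
  std::map<std::string, std::unique_ptr<ToyAnalysis>> Cache;
  unsigned NumBuilt = 0;

public:
  // Returns a getter that yields nullptr when the feature is disabled.
  ToyGetter getter(bool Enabled) {
    return [this, Enabled](const std::string &Fn) -> ToyAnalysis * {
      if (!Enabled)
        return nullptr;
      std::unique_ptr<ToyAnalysis> &Slot = Cache[Fn];
      if (!Slot) {
        Slot.reset(new ToyAnalysis(Fn));
        ++NumBuilt;
      }
      return Slot.get();
    };
  }

  unsigned numBuilt() const { return NumBuilt; }
};

// Ask for the analysis of one function Calls times and report how many
// analyses were actually constructed.
unsigned countBuilds(bool Enabled, int Calls) {
  ToyAnalysisCache Cache;
  ToyGetter Get = Cache.getter(Enabled);
  for (int I = 0; I < Calls; ++I) {
    ToyAnalysis *A = Get("foo");
    assert((A != nullptr) == Enabled);
  }
  return Cache.numBuilt();
}
```

The getter must not outlive the cache it was created from, because the lambda captures this.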

updated the comment.

> In D72382#1824854, @jdoerfert wrote:
> This should fix PR887: https://bugs.llvm.org/show_bug.cgi?id=887
> Please verify that and include the test case from there. We should mention it in the commit message and close the bug if it works.

My patch does not make any difference. It looks like PR887 is related to pass ordering: the load can be hoisted up into the entry block, but hoisting only runs after argpromotion.

> This should also fix PR42039: https://bugs.llvm.org/show_bug.cgi?id=42039
> Please verify that and include the test case from there. We should mention it in the commit message and close the bug if it works.

With my patch, argpromotion does not happen, so I'll add this to the test cases and mention it in the commit message.

>> In D72382#1824701, @mgudim wrote:
>> @jdoerfert I updated the patch as you suggested. Also, I added your example as a test.
>
> I don't see the example.

It's at the very end of control-flow3.ll:

; CHECK-LABEL: define internal i32 @callee2(i32* %P) {
define internal i32 @callee2(i32* %P) {
entry:
  br label %bb1

bb1:
  %gep0 = getelementptr i32, i32* %P, i64 0
; CHECK: %X = load i32, i32* %gep0
  %X = load i32, i32* %gep0
  br label %bb1

bb2:
  %gep1 = getelementptr i32, i32* %P, i64 1
; CHECK: %Y = load i32, i32* %gep1
  %Y = load i32, i32* %gep1
  ret i32 %X
}

define i32 @caller2() {
  %A = alloca i32
  store i32 17, i32* %A
; CHECK: %X = call i32 @callee2(i32* %A)
  %X = call i32 @callee2(i32* %A)
  ret i32 %X
}

> In D72382#1830078, @mgudim wrote:
>> In D72382#1824854, @jdoerfert wrote:
>> This should fix PR887: https://bugs.llvm.org/show_bug.cgi?id=887
>> Please verify that and include the test case from there. We should mention it in the commit message and close the bug if it works.
>
> My patch does not make any difference. It looks like PR887 is related to pass ordering: the load can be hoisted up into the entry block, but hoisting only runs after argpromotion.

Right. Sorry. We are still missing DD65593. Anyway, all this logic should not be here in the first place, so never mind ;)

>> This should also fix PR42039: https://bugs.llvm.org/show_bug.cgi?id=42039
>> Please verify that and include the test case from there. We should mention it in the commit message and close the bug if it works.
>
> With my patch, argpromotion does not happen, so I'll add this to the test cases and mention it in the commit message.

Yes, thanks.

>>> In D72382#1824701, @mgudim wrote:
>>> @jdoerfert I updated the patch as you suggested. Also, I added your example as a test.
>>
>> I don't see the example.
>
> It's at the very end of control-flow3.ll:
>
> ; CHECK-LABEL: define internal i32 @callee2(i32* %P) {
> define internal i32 @callee2(i32* %P) {
> entry:
>   br label %bb1
> bb1:
>   %gep0 = getelementptr i32, i32* %P, i64 0
> ; CHECK: %X = load i32, i32* %gep0
>   %X = load i32, i32* %gep0
>   br label %bb1
> bb2:
>   %gep1 = getelementptr i32, i32* %P, i64 1
> ; CHECK: %Y = load i32, i32* %gep1
>   %Y = load i32, i32* %gep1
>   ret i32 %X
> }
>
> define i32 @caller2() {
>   %A = alloca i32
>   store i32 17, i32* %A
> ; CHECK: %X = call i32 @callee2(i32* %A)
>   %X = call i32 @callee2(i32* %A)
>   ret i32 %X
> }

Isn't that an endless loop without side effects? We should avoid tests like these. If you run it through the Attributor (https://godbolt.org/z/xTPcmT), you see why this is not too meaningful.

Revision Contents

Path                                                        Size
llvm/lib/Transforms/IPO/ArgumentPromotion.cpp               26 lines
llvm/test/Transforms/ArgumentPromotion/control-flow3.ll     81 lines

Diff 238560

llvm/lib/Transforms/IPO/ArgumentPromotion.cpp

(40 lines not shown)
 #include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/Analysis/AssumptionCache.h"
 #include "llvm/Analysis/BasicAliasAnalysis.h"
 #include "llvm/Analysis/CGSCCPassManager.h"
 #include "llvm/Analysis/CallGraph.h"
 #include "llvm/Analysis/CallGraphSCCPass.h"
 #include "llvm/Analysis/LazyCallGraph.h"
 #include "llvm/Analysis/Loads.h"
+#include "llvm/Analysis/LoopInfo.h"
 #include "llvm/Analysis/MemoryLocation.h"
+#include "llvm/Analysis/MustExecute.h"
+#include "llvm/Analysis/PostDominators.h"
 #include "llvm/Analysis/TargetLibraryInfo.h"
 #include "llvm/Analysis/TargetTransformInfo.h"
 #include "llvm/IR/Argument.h"
 #include "llvm/IR/Attributes.h"
 #include "llvm/IR/BasicBlock.h"
 #include "llvm/IR/CFG.h"
 #include "llvm/IR/CallSite.h"
 #include "llvm/IR/Constants.h"
(560 lines not shown; enclosing context: auto UpdateBaseTy = [&](Type *NewBaseTy) {)
     }

     return true;
   };

   // First, iterate the entry block and mark loads of (geps of) arguments as
   // safe.
   BasicBlock &EntryBlock = Arg->getParent()->front();
+  GetterTy<LoopInfo> LIGetter = [&](const Function &F) {
+    DominatorTree *DT = new DominatorTree(const_cast<Function &>(F));
+    LoopInfo *LI = new LoopInfo(*DT);
+    return LI;
+  };
+  GetterTy<PostDominatorTree> PDTGetter = [&](const Function &F) {
+    PostDominatorTree *PDT = new PostDominatorTree(const_cast<Function &>(F));
+    return PDT;
+  };
   (inline comment, jdoerfert: I think this is correct but it can be expensive. Since this runs by default in O3 you might want to verify the compile time impact and potentially put the construction under a flag.)
+  MustBeExecutedContextExplorer Explorer(true, LIGetter, PDTGetter);
   // Declare this here so we can reuse it
   IndicesVector Indices;
-  for (Instruction &I : EntryBlock)
-    if (LoadInst *LI = dyn_cast<LoadInst>(&I)) {
-      Value *V = LI->getPointerOperand();
-      if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(V)) {
+  for (
+    auto EIt = Explorer.begin(&EntryBlock.front()), EEnd = Explorer.end(&EntryBlock.front());
+    EIt != EEnd; ++EIt
+  )
+    if (const LoadInst *LI = dyn_cast<LoadInst>(EIt.getCurrentInst())) {
+      const Value *V = LI->getPointerOperand();
+      if (const GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(V)) {
         V = GEP->getPointerOperand();
         if (V == Arg) {
           // This load actually loads (part of) Arg? Check the indices then.
           Indices.reserve(GEP->getNumIndices());
-          for (User::op_iterator II = GEP->idx_begin(), IE = GEP->idx_end();
+          for (User::const_op_iterator II = GEP->idx_begin(), IE = GEP->idx_end();
                II != IE; ++II)
             if (ConstantInt *CI = dyn_cast<ConstantInt>(*II))
               Indices.push_back(CI->getSExtValue());
             else
               // We found a non-constant GEP index for this argument? Bail out
               // right away, can't promote this argument at all.
               return false;

(531 lines not shown)

llvm/test/Transforms/ArgumentPromotion/control-flow3.ll

This file was added.

; RUN: opt < %s -argpromotion -S | FileCheck %s
; RUN: opt < %s -passes=argpromotion -S | FileCheck %s

target datalayout = "E-p:64:64:64-a0:0:8-f32:32:32-f64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-v64:64:64-v128:128:128"

; CHECK-LABEL: define internal i32 @callee0(i32 %P.val)
define internal i32 @callee0(i32* %P) {
entry:
  br label %bb1

bb1:
  br label %bb2

bb2:
; CHECK-NOT: load i32, i32* %P
  %X = load i32, i32* %P
  ret i32 %X
}

; CHECK-LABEL: define i32 @caller0() {
define i32 @caller0() {
  %A = alloca i32
  store i32 17, i32* %A
; CHECK: %A.val = load i32, i32* %A
; CHECK: %X = call i32 @callee0(i32 %A.val)
  %X = call i32 @callee0(i32* %A)
  ret i32 %X
}

; CHECK-LABEL: define internal i32 @callee1(i1 %C, i32 %P.val)
define internal i32 @callee1(i1 %C, i32* %P) {
entry:
  br label %bb1

bb1:
  br label %bb2

bb2:
; CHECK-NOT: load i32, i32* %P
  %X = load i32, i32* %P
  br i1 %C, label %bb2, label %exit

exit:
  ret i32 %X
}

; CHECK-LABEL: define i32 @caller1() {
define i32 @caller1() {
  %A = alloca i32
  store i32 17, i32* %A
; CHECK: %A.val = load i32, i32* %A
; CHECK: %X = call i32 @callee1(i1 false, i32 %A.val)
  %X = call i32 @callee1(i1 false, i32* %A)
  ret i32 %X
}

; CHECK-LABEL: define internal i32 @callee2(i32* %P) {
define internal i32 @callee2(i32* %P) {
entry:
  br label %bb1

bb1:
  %gep0 = getelementptr i32, i32* %P, i64 0
; CHECK: %X = load i32, i32* %gep0
  %X = load i32, i32* %gep0
  br label %bb1

bb2:
  %gep1 = getelementptr i32, i32* %P, i64 1
; CHECK: %Y = load i32, i32* %gep1
  %Y = load i32, i32* %gep1
  ret i32 %X
}

define i32 @caller2() {
  %A = alloca i32
  store i32 17, i32* %A
; CHECK: %X = call i32 @callee2(i32* %A)
  %X = call i32 @callee2(i32* %A)
  ret i32 %X
}