This is an archive of the discontinued LLVM Phabricator instance.

Differential D61461

When removing inalloca, convert to static alloca
Abandoned Public

Authored by inglorion on May 2 2019, 1:49 PM.

Details

Reviewers

rnk
efriedma

Summary

Since r359743, we remove inalloca from functions that don't need
it. This change optimizes the affected allocas by turning them
into static allocas.

Diff Detail

Repository

rG LLVM Github Monorepo

Build Status

Buildable 31319
Build 31318: arc lint + arc unit

Event Timeline

inglorion created this revision. May 2 2019, 1:49 PM

Herald added a project: Restricted Project. May 2 2019, 1:49 PM

Herald added a subscriber: hiraditya.

Harbormaster completed remote builds in B31310: Diff 197857. May 2 2019, 1:50 PM

Nice!

llvm/lib/Transforms/IPO/GlobalOpt.cpp
2147	I would suggest adding `->stripPointerCasts()` to the argument here. In LTO scenarios with C++ templates, it's highly likely that two different TUs will end up computing different struct types that are structurally equivalent. When they are linked together, pointer bitcasts may be added to make the types work out.
2155	This will skip the lifetime insertion. Do you think we should do it anyway? I guess it could hypothetically matter if you have a massive single basic block function that does 20 inalloca calls, then stack usage could get out of hand. Maybe just skip the hoisting.
2159–2162	There's a shortcut for this: `IRBuilder<>::getInt64()` can make ConstantInts.
2168	It's unfortunately very likely that `getFirstInsertionPt` may return an end iterator if BB is a `catchswitch` BB. I would just continue the loop in these cases. A missing lifetime end should pessimize the code, not lead to miscompiles. You should be able to construct a test case for this by putting an inalloca call inside a try / catch.
llvm/test/Transforms/GlobalOpt/inalloca.ll
67	This is a bit artificial, since we don't support generating code for a function that uses landingpad instructions with `__CxxFrameHandler3` personalities. I think it's worth testing catchswitch anyway, but if you want to test landingpad too, add another test that uses `__gxx_personality_v0`.

efriedma added inline comments. May 2 2019, 3:15 PM

llvm/lib/Transforms/IPO/GlobalOpt.cpp
2155	Do we need to check whether the alloca is in the same basic block as the call? If there's control flow between the alloca and the call, the placement of the lifetime intrinsics might be wrong, or the alloca might still be used by some other inalloca call, or the alloca might not be deallocated along the path where the call doesn't execute. It's easy to construct a C++ testcase where an exception might be thrown between an alloca and the corresponding inalloca call. Not sure if it's possible for clang or LLVM optimizations to produce an alloca used by multiple different inalloca calls, depending on control flow.
2168	There might not be an insertion point in a block with a catchswitch.

updated with more sensible test case
improvements suggested by @rnk

Harbormaster completed remote builds in B31316: Diff 197884. May 2 2019, 3:52 PM

rnk added inline comments. May 2 2019, 3:57 PM

llvm/lib/Transforms/IPO/GlobalOpt.cpp
2155	I've been operating under the assumption that stack coloring can tolerate lifetime markers that don't form regions, so this is a best effort to not create excessively large stack frames for functions with long sequences of inalloca calls.

There are two side exit cases that are interesting:
- exceptional (common)
- normal (rare)

I think normal exits are only possible with gnu statement expressions, which allow you to set up some arguments to a call, and then `return`, `goto`, or `break` out of the call setup. Clang's generated IR will do the stackrestore on a normal side exit, I believe, using the normal cleanup mechanism. We could try to find the associated stacksave / stackrestore calls and use them to insert the lifetime markers, but that requires a lot of analysis. So, I think normal exits are uninteresting.

I had thought exceptional exits wouldn't be a problem, but now I'm worried about them. When the alloca is dynamic, I believe what happens is that the unwinder resets the stack pointer to what it was at the end of the prologue. This interacts badly with allocas, and MSVC does something to handle this, but I think we never implemented it. However, if we don't place our lifetime end markers along exceptional exits in this transform, I think there's the possibility that we won't do the stack coloring, and we could have runaway stack growth.

Hm. We could place lifetime end markers along every unwind edge reachable on the path from the alloca to the call... but we'd have to worry about those uncommon normal side exits, then.

inglorion added inline comments. May 2 2019, 4:01 PM

llvm/lib/Transforms/IPO/GlobalOpt.cpp
2155	This skips the hoisting (that happens in AI->moveBefore(...) later on). This is the case where the alloca is already in the entry block. In that case, I think there is no gain from moving it, and you risk breaking code (by moving it after its use).
2159–2162	Thanks!
2168	I updated the test case so that it hits this case and changed the code to:
- For non-terminator instructions, insert lifetime.end after.
- For terminators, look at each of the successors. The base case is to insert before the first non-phi instruction. But if that instruction is a landing pad, insert after it. If there is no instruction after the landing pad, the landing pad is also a terminator, and we recurse to its successors.
This should handle the catchswitch case by effectively inserting lifetime.end after the catchpads it dispatches to.
llvm/test/Transforms/GlobalOpt/inalloca.ll
67	I cobbled this together, but realized afterwards that it doesn't make much sense. Since this is 32-bit Windows, I rewrote it to use catchswitch instead.
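
The placement rule described for line 2168 can be sketched outside of LLVM. What follows is a minimal, self-contained C++ model rather than the patch itself: `Inst`, `Block`, `makeInst`, `makeTerminator`, and `insertLifetimeEndAfter` are hypothetical stand-ins for the IR classes, and the lifetime marker is just a placeholder instruction named `lifetime.end`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins for IR concepts (hypothetical, not the LLVM API).
struct Block;

struct Inst {
  std::string name;
  bool isPhi;
  bool isEHPad;                    // e.g. landingpad, catchpad, catchswitch
  bool isTerminator;
  std::vector<Block *> successors; // only meaningful for terminators
};

struct Block {
  std::vector<Inst> insts; // the last instruction is the terminator
};

Inst makeInst(const std::string &name, bool isEHPad = false) {
  return Inst{name, false, isEHPad, false, {}};
}

Inst makeTerminator(const std::string &name, std::vector<Block *> succs,
                    bool isEHPad = false) {
  return Inst{name, false, isEHPad, true, succs};
}

// Insert a "lifetime.end" placeholder after instruction `idx` of `bb`.
void insertLifetimeEndAfter(Block &bb, std::size_t idx) {
  const Inst marker = makeInst("lifetime.end");
  if (!bb.insts[idx].isTerminator) {
    // Non-terminator: insert directly after the instruction.
    bb.insts.insert(bb.insts.begin() + idx + 1, marker);
    return;
  }
  // Terminator: visit each successor. Copy the list so that inserting
  // into blocks cannot invalidate what we iterate over.
  const std::vector<Block *> succs = bb.insts[idx].successors;
  for (Block *succ : succs) {
    std::size_t first = 0;
    while (succ->insts[first].isPhi)
      ++first;
    if (!succ->insts[first].isEHPad) {
      // Base case: insert before the first non-phi instruction.
      succ->insts.insert(succ->insts.begin() + first, marker);
    } else if (first + 1 == succ->insts.size()) {
      // The EH pad is also the terminator (catchswitch): recurse.
      insertLifetimeEndAfter(*succ, first);
    } else {
      // An EH pad must stay first, so insert right after it.
      succ->insts.insert(succ->insts.begin() + first + 1, marker);
    }
  }
}
```

Applied to a CFG shaped like the `@successors` test (an invoke whose unwind edge goes to a catchswitch dispatching to one catchpad), this model places one marker at the top of the normal successor and one after the catchpad, and leaves the catchswitch block untouched.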

An example of what I'm talking about with multiple calls:

struct C { C() noexcept; C(const C&) noexcept; ~C() noexcept; };
void f(C) noexcept, f2(C) noexcept;
void g(bool b, bool b2) { if (b2) { b ? f(C()) : f2(C()); } }

With "clang -O2 --target=i686-pc-windows-msvc -emit-llvm", there's one alloca used by multiple calls; the alloca can't be hoisted until both inalloca attributes are stripped.

I think all my comments are addressed, but @efriedma made a good point about the lifetime marker placement along exceptional exits from inalloca construction. I think we probably shouldn't worry about it and just go with this, but I'm biased, since we don't use EH. =S

In D61461#1488801, @efriedma wrote:
An example of what I'm talking about with multiple calls:
struct C { C() noexcept; C(const C&) noexcept; ~C() noexcept; };
void f(C) noexcept, f2(C) noexcept;
void g(bool b, bool b2) { if (b2) { b ? f(C()) : f2(C()); } }
With "clang -O2 --target=i686-pc-windows-msvc -emit-llvm", there's one alloca used by multiple calls; the alloca can't be hoisted until both inalloca attributes are stripped.

Right, that test case. It was even in the design doc. If only we'd had token at the time. :(

So, we'd have to analyze the uses of the alloca to prove this transform is valid. But, this is kind of nasty since you can take an alloca, bitcast it, gep forwards, store that, reload it, gep back, and then use that as an argument to an inalloca call, and that should be valid IR.

It's actually important to optimize the case where the inalloca parameter escapes via a call and there is some control flow, since typically the reason we're emitting inalloca in the first place is because we have to call a nontrivial constructor. So, we'd want to be able to say that it's not valid to pass an inalloca pointer to a call, return it back, and then use it as an argument pack. I don't know if we can make that semantic change. We have things like the IR outliner that wouldn't know about that, although they wouldn't intentionally try to obscure other uses of a captured alloca.

I think Eli raises a good point, and I'm inclined to just say "if the alloca (before hoisting) is not in the same BB as the call, don't hoist it". That avoids changing the stack behavior of the code. It also means we miss out on an optimization, but it's an optimization that we're not performing today anyway.

don't hoist allocas if they are not in the same BB as the function that uses them

Harbormaster completed remote builds in B31319: Diff 197890. May 2 2019, 4:22 PM

For the multiple call case, we could maybe try to do some sort of control-flow based analysis, instead of trying to analyze uses of the pointer. Essentially, take advantage of the rule that "the argument allocation must have been the most recent stack allocation that is still live". But that's probably difficult to implement.

llvm/lib/Transforms/IPO/GlobalOpt.cpp
2155	The truly awful part about trying to emit lifetime markers for the exception side-exit case is that there isn't any good indication in the IR where, exactly, we would need to place the lifetime.end marker. The lifetime actually ends at some location in the middle of the exception handler, after the temporary's destructors run. Maybe clang should emit lifetime markers in that case.
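
To make the "most recent stack allocation that is still live" rule concrete, here is a rough C++ sketch that checks it over a single straight-line trace of events. It is an illustration only: the `Event` encoding and the function name are made up for this note, a real analysis would have to work over the control-flow graph rather than one path, and treating a call as consuming its allocation is a modeling assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical event encoding for one straight-line path through a function.
enum class Kind { Alloca, StackSave, StackRestore, InAllocaCall };

struct Event {
  Kind kind;
  int id; // alloca id for Alloca/InAllocaCall, save id for StackSave/StackRestore
};

// Returns true if every inalloca call on the path uses the newest allocation
// that is still live at that point.
bool inallocaCallsUseNewestLiveAlloca(const std::vector<Event> &trace) {
  std::vector<int> live;                 // live alloca ids, newest last
  std::map<int, std::size_t> savedDepth; // save id -> live.size() at the save
  for (const Event &e : trace) {
    switch (e.kind) {
    case Kind::Alloca:
      live.push_back(e.id);
      break;
    case Kind::StackSave:
      savedDepth[e.id] = live.size();
      break;
    case Kind::StackRestore: {
      auto it = savedDepth.find(e.id);
      if (it == savedDepth.end())
        return false; // restore without a matching save on this path
      if (it->second < live.size())
        live.resize(it->second); // allocations made after the save die
      break;
    }
    case Kind::InAllocaCall:
      if (live.empty() || live.back() != e.id)
        return false; // argument is not the newest live allocation
      live.pop_back(); // assumption: the call consumes its allocation
      break;
    }
  }
  return true;
}
```

Even in this simplified form the check has to track stacksave and stackrestore depth, since an intervening allocation only stops mattering once a restore has killed it.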

efriedma added inline comments. May 2 2019, 4:55 PM

llvm/lib/Transforms/IPO/GlobalOpt.cpp
2152	Maybe worth expanding a little on why control flow matters, based on the review discussion.
2167	Do we expect that clang will, at some point, start emitting the relevant lifetime markers itself? It currently doesn't, as far as I can tell, but it seems like it might be useful for other reasons.

In D61461#1488860, @efriedma wrote:

For the multiple call case, we could maybe try to do some sort of control-flow based analysis, instead of trying to analyze uses of the pointer. Essentially, take advantage of the rule that "the argument allocation must have been the most recent stack allocation that is still live". But that's probably difficult to implement.

The control flow based idea could be stated as, is there any other call to inalloca reachable from this allocation that is not killed by another stack allocation? But, I guess it would have to track stacksave + stackrestore levels, and those are hard to analyze too.

In person over here, @inglorion and I felt that maybe this would be the best:

- Teach clang to emit lifetime markers for inalloca packs (this should be easy)
- Teach instcombine or another general cleanup pass to turn an inalloca alloca static if there are no inalloca calls anywhere in the current function. Don't try to insert lifetime markers, just assume the frontend did it if it cares.

Doing the whole-function analysis ignoring uses is much simpler, and it catches cases where the inalloca call site gets inlined, but the inalloca alloca doesn't get SROA'd.

Teach instcombine or another general cleanup pass to turn an inalloca alloca static if there are no inalloca calls anywhere in the current function

Given an arbitrary alloca in a function with no inalloca calls, hoisting the alloca to the entry block requires proving that the allocation's lifetime doesn't overlap itself. For example, we have to make sure we don't miscompile `for (int i = 0; i < n; i++) foo(alloca(1));`. So you have to track stacksave + stackrestore levels anyway.

That said, a general alloca hoisting transform might be useful in other contexts, and analyzing the whole function at once is probably the most efficient way to handle it.
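
The loop example above can be made concrete with a toy model. The sketch below is not LLVM code; `ToyStack` and the helper names are hypothetical, and addresses are plain integers. It assumes `foo` may keep the pointers it receives, which is the case where hoisting would change behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy model of a downward-growing stack; addresses are plain integers.
struct ToyStack {
  std::size_t sp = 1000;
  std::size_t allocaBytes(std::size_t n) {
    sp -= n;
    return sp;
  }
};

// Dynamic alloca inside the loop: each iteration gets a fresh slot, and all
// of the slots stay live until the function returns.
std::vector<std::size_t> dynamicAllocaInLoop(int n) {
  ToyStack stack;
  std::vector<std::size_t> addrs;
  for (int i = 0; i < n; i++)
    addrs.push_back(stack.allocaBytes(1));
  return addrs;
}

// The same loop after hoisting the alloca to the entry block: every
// iteration now sees the same slot.
std::vector<std::size_t> hoistedAlloca(int n) {
  ToyStack stack;
  const std::size_t slot = stack.allocaBytes(1);
  std::vector<std::size_t> addrs;
  for (int i = 0; i < n; i++)
    addrs.push_back(slot);
  return addrs;
}

// Number of distinct addresses handed out.
std::size_t distinctCount(const std::vector<std::size_t> &addrs) {
  return std::set<std::size_t>(addrs.begin(), addrs.end()).size();
}
```

In the model, the dynamic version hands out one distinct address per iteration while the hoisted version hands out a single address, so the two are only interchangeable when no earlier iteration's allocation is still in use.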

Thanks for helping me think about this and alternative approaches. I'll be withdrawing this for now to get it out of the review queue. If I end up implementing a new approach, I'll put it up as a new diff (with a link to this one for the comments).

Revision Contents

Path                                          Size
llvm/lib/Transforms/IPO/GlobalOpt.cpp         79 lines
llvm/test/Transforms/GlobalOpt/inalloca.ll    105 lines

Diff 197890

llvm/lib/Transforms/IPO/GlobalOpt.cpp

[19 lines not shown]
 #include "llvm/ADT/Statistic.h"
 #include "llvm/ADT/Twine.h"
 #include "llvm/ADT/iterator_range.h"
 #include "llvm/Analysis/BlockFrequencyInfo.h"
 #include "llvm/Analysis/ConstantFolding.h"
 #include "llvm/Analysis/MemoryBuiltins.h"
 #include "llvm/Analysis/TargetLibraryInfo.h"
 #include "llvm/Analysis/TargetTransformInfo.h"
-#include "llvm/Transforms/Utils/Local.h"
 #include "llvm/BinaryFormat/Dwarf.h"
 #include "llvm/IR/Attributes.h"
 #include "llvm/IR/BasicBlock.h"
 #include "llvm/IR/CallSite.h"
 #include "llvm/IR/CallingConv.h"
 #include "llvm/IR/Constant.h"
 #include "llvm/IR/Constants.h"
 #include "llvm/IR/DataLayout.h"
 #include "llvm/IR/DebugInfoMetadata.h"
 #include "llvm/IR/DerivedTypes.h"
 #include "llvm/IR/Dominators.h"
 #include "llvm/IR/Function.h"
 #include "llvm/IR/GetElementPtrTypeIterator.h"
 #include "llvm/IR/GlobalAlias.h"
 #include "llvm/IR/GlobalValue.h"
 #include "llvm/IR/GlobalVariable.h"
+#include "llvm/IR/IRBuilder.h"
 #include "llvm/IR/InstrTypes.h"
 #include "llvm/IR/Instruction.h"
 #include "llvm/IR/Instructions.h"
 #include "llvm/IR/IntrinsicInst.h"
 #include "llvm/IR/Module.h"
 #include "llvm/IR/Operator.h"
 #include "llvm/IR/Type.h"
 #include "llvm/IR/Use.h"
 #include "llvm/IR/User.h"
 #include "llvm/IR/Value.h"
 #include "llvm/IR/ValueHandle.h"
 #include "llvm/Pass.h"
 #include "llvm/Support/AtomicOrdering.h"
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/ErrorHandling.h"
 #include "llvm/Support/MathExtras.h"
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/Transforms/IPO.h"
 #include "llvm/Transforms/Utils/CtorUtils.h"
 #include "llvm/Transforms/Utils/Evaluator.h"
 #include "llvm/Transforms/Utils/GlobalStatus.h"
+#include "llvm/Transforms/Utils/Local.h"
 #include <cassert>
 #include <cstdint>
 #include <utility>
 #include <vector>

 using namespace llvm;

 #define DEBUG_TYPE "globalopt"
[2,017 lines not shown]
 static AttributeList StripAttr(LLVMContext &C, AttributeList Attrs,
                                Attribute::AttrKind A) {
   unsigned AttrIndex;
   if (Attrs.hasAttrSomewhere(A, &AttrIndex))
     return Attrs.removeAttribute(C, AttrIndex, A);
   return Attrs;
 }

-static void RemoveAttribute(Function *F, Attribute::AttrKind A) {
-  F->setAttributes(StripAttr(F->getContext(), F->getAttributes(), A));
+static void RemoveNestAttribute(Function *F) {
+  F->setAttributes(
+      StripAttr(F->getContext(), F->getAttributes(), Attribute::Nest));
+  for (User *U : F->users()) {
+    if (isa<BlockAddress>(U))
+      continue;
+    CallSite CS(cast<Instruction>(U));
+    CS.setAttributes(
+        StripAttr(F->getContext(), CS.getAttributes(), Attribute::Nest));
+  }
+}
+
+static void InsertLifetimeEndAfter(Instruction *I, Value *P, ConstantInt *Size) {
+  if (I->isTerminator()) {
+    for (unsigned i = 0; i < I->getNumSuccessors(); ++i) {
+      BasicBlock *BB = I->getSuccessor(i);
+      Instruction *NonPhi = BB->getFirstNonPHI();
+      if (NonPhi->isEHPad()) {
+        BasicBlock::iterator IP = ++NonPhi->getIterator();
+        if (IP == BB->end())
+          InsertLifetimeEndAfter(NonPhi, P, Size);
+        else
+          IRBuilder<>(BB, IP).CreateLifetimeEnd(P, Size);
+      } else {
+        IRBuilder<>(NonPhi).CreateLifetimeEnd(P, Size);
+      }
+    }
+  } else {
+    IRBuilder<>(I->getParent(), ++I->getIterator())
+        .CreateLifetimeEnd(P, Size);
+  }
+}
+
+static void RemoveInAlloca(Function *F) {
   for (User *U : F->users()) {
     if (isa<BlockAddress>(U))
       continue;
     CallSite CS(cast<Instruction>(U));
-    CS.setAttributes(StripAttr(F->getContext(), CS.getAttributes(), A));
+    assert(CS.arg_size() > 0);
+    // The inalloca, if present, is on the last argument.
+    unsigned ArgNo = CS.arg_size() - 1;
+    if (!CS.paramHasAttr(ArgNo, Attribute::InAlloca))
+      continue;
+    CS.setAttributes(
+        StripAttr(F->getContext(), CS.getAttributes(), Attribute::InAlloca));
+    if (AllocaInst *AI = dyn_cast<AllocaInst>(CS.getArgument(ArgNo))) {
+      AI->setUsedWithInAlloca(false);
+      // Hoist allocas we just removed inalloca from, unless:
+      // 1. They are already in the entry block.
+      // 2. They have dynamic size.
+      // 3. There is control flow between the alloca and the call that uses it.
+      ConstantInt *Num = dyn_cast<ConstantInt>(AI->getArraySize());
+      if (!Num)
+        continue;
+      if (AI->getParent() != CS->getParent())
+        continue;
+      BasicBlock &Entry = AI->getFunction()->getEntryBlock();
+      if (AI->getParent() == &Entry)
+        continue;
+      IRBuilder<> IR(AI);
+      // Restrict the lifetime of the hoisted alloca to start where the
+      // original alloca was and end after the call that uses the alloca.
+      ConstantInt *Size = IR.getInt64(
+          Num->getZExtValue() *
+          F->getParent()->getDataLayout().getTypeAllocSize(AI->getAllocatedType()));
+      IR.CreateLifetimeStart(AI, Size);
+      InsertLifetimeEndAfter(CS.getInstruction(), AI, Size);
+      AI->moveBefore(Entry.getTerminator());
+    }
   }
+  F->setAttributes(
+      StripAttr(F->getContext(), F->getAttributes(), Attribute::InAlloca));
 }

 /// Return true if this is a calling convention that we'd like to change. The
 /// idea here is that we don't want to mess with the convention if the user
 /// explicitly requested something with performance implications like coldcc,
 /// GHC, or anyregcc.
 static bool hasChangeableCC(Function *F) {
   CallingConv::ID CC = F->getCallingConv();
[161 lines not shown; enclosing context: for (Module::iterator FI = M.begin(), E = M.end(); FI != E; ) {]
     Changed |= processGlobal(*F, TLI, LookupDomTree);

     if (!F->hasLocalLinkage())
       continue;

     // If we have an inalloca parameter that we can safely remove the
     // inalloca attribute from, do so. This unlocks optimizations that
     // wouldn't be safe in the presence of inalloca.
-    // FIXME: We should also hoist alloca affected by this to the entry
-    // block if possible.
     if (F->getAttributes().hasAttrSomewhere(Attribute::InAlloca) &&
         !F->hasAddressTaken()) {
-      RemoveAttribute(F, Attribute::InAlloca);
+      RemoveInAlloca(F);
       Changed = true;
     }

     if (hasChangeableCC(F) && !F->isVarArg() && !F->hasAddressTaken()) {
       NumInternalFunc++;
       TargetTransformInfo &TTI = GetTTI(*F);
       // Change the calling convention to coldcc if either stress testing is
       // enabled or the target would like to use coldcc on functions which are
[19 lines not shown; enclosing context: if (hasChangeableCC(F) && !F->isVarArg() &&]
       ++NumFastCallFns;
       Changed = true;
     }

     if (F->getAttributes().hasAttrSomewhere(Attribute::Nest) &&
         !F->hasAddressTaken()) {
       // The function is not used by a trampoline intrinsic, so it is safe
       // to remove the 'nest' attribute.
-      RemoveAttribute(F, Attribute::Nest);
+      RemoveNestAttribute(F);
       ++NumNestRemoved;
       Changed = true;
     }
   }
   return Changed;
 }

 static bool
[676 lines not shown]

llvm/test/Transforms/GlobalOpt/inalloca.ll

This file was added.

; Tests that globalopt can turn inallocas into static allocas.

; RUN: opt -S -globalopt %s | FileCheck %s
; RUN: opt -S -passes=globalopt %s | FileCheck %s

target datalayout = "e-m:x-p:32:32-i64:64-f80:32-n8:16:32-a:0:32-S32"
target triple = "i386-pc-windows-msvc19.11.0"

%struct.a = type { i1 }

; CHECK: define internal fastcc i1 @f(%struct.a* %a)
define internal i1 @f(%struct.a* inalloca %a) {
  %1 = getelementptr inbounds %struct.a, %struct.a* %a, i32 0, i32 0
  %2 = load i1, i1* %1
  ret i1 %2
}

; CHECK: @move
define i32 @move() {
; CHECK: %a = alloca %struct.a
  br label %again
; CHECK: again:
again:
; CHECK-NOT: alloca
; CHECK: %1 = bitcast %struct.a* %a to i8*
; CHECK: call void @llvm.lifetime.start.p0i8(i64 1, i8* %1)
  %a = alloca inalloca %struct.a
; CHECK: %t = call fastcc i1 @f(%struct.a* %a)
  %t = call i1 @f(%struct.a* inalloca %a)
; CHECK: %2 = bitcast %struct.a* %a to i8*
; CHECK: call void @llvm.lifetime.end.p0i8(i64 1, i8* %2)
  br i1 %t, label %again, label %done
done:
  ret i32 0
}

; Check that allocas already in the entry block stay before their uses.
; CHECK: @dontmove
define i32 @dontmove() {
; CHECK: %a = alloca %struct.a
  %a = alloca inalloca %struct.a
; CHECK: %t = call fastcc i1 @f(%struct.a* %a)
  %t = call i1 @f(%struct.a* inalloca %a)
  ret i32 0
}

; Check that we insert lifetime ends for all successors of an invoke.
; CHECK: @successors
define i32 @successors() nounwind personality i32 (...)* @__CxxFrameHandler3 {
; CHECK: %a = alloca %struct.a
  br label %again
; CHECK: again:
again:
; CHECK-NOT: alloca
; CHECK: %1 = bitcast %struct.a* %a to i8*
; CHECK: call void @llvm.lifetime.start.p0i8(i64 1, i8* %1)
; CHECK: %t = invoke fastcc i1 @f(%struct.a* %a)
  %a = alloca inalloca %struct.a
  %t = invoke i1 @f(%struct.a* inalloca %a)
          to label %cont unwind label %unwind
cont:
; CHECK: %2 = bitcast %struct.a* %a to i8*
; CHECK: call void @llvm.lifetime.end.p0i8(i64 1, i8* %2)
  br i1 %t, label %again, label %done
unwind:
  %cs = catchswitch within none [label %unwind.body] unwind to caller
unwind.body:
  %cp = catchpad within %cs [i8* null, i32 64, i8* null]
; CHECK: %3 = bitcast %struct.a* %a to i8*
; CHECK: call void @llvm.lifetime.end.p0i8(i64 1, i8* %3)
  catchret from %cp to label %retneg
retneg:
  ret i32 -1
done:
  ret i32 0
}

; Hoisting allocas from basic blocks other than the function that uses
; them require more sophisticated analysis. For now, just leave them
; alone.
define void @control_flow(i1 %x) nounwind personality i32 (...)* @__CxxFrameHandler3 {
  br i1 %x, label %done, label %doit

doit:
  %a = alloca inalloca %struct.a
  %t = invoke i1 @g() to label %cont unwind label %unwind

cont:
  %ptr = getelementptr %struct.a, %struct.a* %a, i32 0, i32 0
  store i1 %t, i1* %ptr
  call i1 @f(%struct.a* inalloca %a)
  br label %done

unwind:
  %cs = catchswitch within none [label %unwind.body] unwind to caller
unwind.body:
  %cp = catchpad within %cs [i8* null, i32 64, i8* null]
  catchret from %cp to label %done

done:
  ret void
}

declare i32 @__CxxFrameHandler3(...)
declare i1 @g()