This is an archive of the discontinued LLVM Phabricator instance.

Differential D76965

[FunctionAttrs][Mem2Reg] Handle Alloca passed as function call operand with function attributes
Needs Review · Public

Authored by ddcc on Mar 27 2020, 6:46 PM.

Details

Reviewers

jdoerfert

Summary

Allow mem2reg to replace an Alloca used as a nocapture and inaccessiblememonly/readnone function call operand with undef
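
A minimal C sketch of the qualifying condition described above, modelled on the `onlyUsedByOptimizableCall` helper in the diff. The struct, its field names, and `operand_qualifies` are hypothetical stand-ins for the LLVM attribute queries, and the three sample operands only mirror the declarations of `@foo1a`, `@foo2a`, and `@foo2b` in the test file.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the attributes visible at one call operand. */
struct call_operand_attrs {
  bool param_nocapture;        /* nocapture on the pointer argument */
  bool param_readnone;         /* readnone on the pointer argument */
  bool fn_readnone;            /* readnone on the callee */
  bool fn_inaccessiblememonly; /* inaccessiblememonly on the callee */
};

/* The operand qualifies only if it is not captured AND the callee cannot
   access the pointed-to memory through it. */
bool operand_qualifies(struct call_operand_attrs a) {
  return a.param_nocapture &&
         (a.fn_inaccessiblememonly || a.fn_readnone || a.param_readnone);
}

/* Sample operands mirroring declarations in fnattr.ll. */
static const struct call_operand_attrs foo1a = {true, true, false, false};
static const struct call_operand_attrs foo2a = {false, false, true, false};
static const struct call_operand_attrs foo2b = {true, false, false, false};
```

Under this model `foo1a` qualifies, while `foo2a` (may capture) and `foo2b` (may read) do not, which matches the "promotable" and "not promotable" comments in the test file.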

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	20 ms	MLIR.Dialect/Affine::Unknown Unit Message ("")

Event Timeline

ddcc created this revision. Mar 27 2020, 6:46 PM

Herald added a project: Restricted Project. Mar 27 2020, 6:46 PM

Herald added subscribers: jfb, hiraditya.

ddcc mentioned this in D76966: [GlobalOpt/GlobalStatus][Mem2Reg] Handle PtrToInt passed as function call operand. Mar 27 2020, 6:49 PM

Harbormaster failed remote builds in B50762: Diff 253274! Mar 27 2020, 7:17 PM

Allow mem2reg to replace an Alloca used as a nocapture and inaccessiblememonly/readnone function call operand with undef

This is not sound. Take the following example

int foo(int *a) {
  return (uintptr_t)a & 31;  // or "worse": if ((uintptr_t)a & 32) return 1; else return 0;
}
int is_stack_aligned() {
   int stack;
   return foo(&stack);
}

Replacing stack with undef will introduce poison or UB where there was none before.
It also breaks if you have something like:

int is_same(int *a, short *b) {
  return a == b;
}
int return_true() {
  int Stack;
  return is_same(&Stack, (short*)&Stack);
}
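
Both counterexamples can be written as a small compilable C sketch. It assumes `uintptr_t` casts where the snippets above apply integer operations to a pointer directly, and it renames `foo` to `low_address_bits`. `return_true_broken` is a hypothetical, hand-written stand-in for the unsound rewrite: since every use of `undef` may take a different value, it is modelled here by passing two unrelated addresses.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the low five bits of the address itself; never dereferences a. */
int low_address_bits(const int *a) {
  return (int)((uintptr_t)a & 31);
}

/* Compares two addresses without dereferencing either pointer. */
int is_same(const int *a, const short *b) {
  return (const void *)a == (const void *)b;
}

/* The result depends only on where the stack slot lives. */
int is_stack_aligned(void) {
  int stack;
  return low_address_bits(&stack);
}

/* Both arguments are the same address, so this returns 1. */
int return_true(void) {
  int stack;
  return is_same(&stack, (const short *)&stack);
}

/* Hypothetical model of the unsound rewrite: the two uses of the address
   no longer agree, so the comparison fails. */
int return_true_broken(void) {
  int first, second;
  return is_same(&first, (const short *)&second);
}
```

Neither callee reads or captures its argument, yet both observe the pointer value, which is why the attributes alone do not make the operand replaceable by an arbitrary value.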

That said, I think what we can do is "privatize" the pointer under more strict requirements than you have here. I was hoping to do that in AAPrivatizablePtrImpl at some point but I now think the requirements we have there are still a little too weak. I'll have to think about this one a bit. (Btw. you should really consider implementing some of these in the Attributor instead. It should be way more powerful there.)

jdoerfert requested changes to this revision. Mar 28 2020, 12:01 AM

This revision now requires changes to proceed. Mar 28 2020, 12:01 AM

Hmm, yeah, undef is probably too strong. Ok, how about I replace the argument with a fresh alloca? It should still permit load/store optimizations on the original alloca, while still providing some alloca that isn't accessed.
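
A rough C model of the fresh-alloca idea, loosely following test `@d` in fnattr.ll. This is a hand-written sketch, not output of the pass: `d_before` and `d_after` are hypothetical names, and returning -1 stands in for the trap block.

```c
#include <assert.h>

/* Compares two addresses without dereferencing either pointer. */
int is_same(const int *a, const short *b) {
  return (const void *)a == (const void *)b;
}

/* Before: one stack slot serves the store, the load, and the call. */
int d_before(void) {
  int var = 42;
  if (!is_same(&var, (const short *)&var))
    return -1; /* stands in for the trap block */
  return var;
}

/* After (model): the stored value is forwarded directly, and a fresh slot
   that is never read or written supplies the address for the call. */
int d_after(void) {
  int fresh; /* fresh slot, only its address is used */
  if (!is_same(&fresh, (const short *)&fresh))
    return -1;
  return 42; /* promoted value */
}
```

Because every redirected use refers to the same fresh slot, address comparisons between those uses still agree, which is the property the undef replacement lost.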

I'm not familiar with the attributor, is that the component that infers function attributes? Isn't that somewhat orthogonal since it tags function attributes for subsequent optimizations like this one?

Create fresh alloca instead of undefvalue

I'll check later but please add my examples as tests.

Harbormaster failed remote builds in B50834: Diff 253380! Mar 28 2020, 4:40 PM

Add more tests, fix bugs

Harbormaster failed remote builds in B50848: Diff 253397! Mar 28 2020, 11:04 PM

I'm unsure about the scope of this. It seems to match a particular pattern and it is unclear this is the right place to do so. Have you considered doing this as part of AAPrivatizablePtr (or a new AbstractAttribute) in the Attributor?

llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp
317	This comment will be outdated.
llvm/test/Transforms/Mem2Reg/fnattr.ll
2	If it is OK with you, please run `update_test_checks.py --function-signature --scrub-attributes` on this file to create the check lines. I personally find it way easier to read as almost complete check lines.
112	We should do something with the return value of `is_same`, either here or in `is_same`. Dead code for testing is not always future proof. Similarly in other test cases.

In D76965#1950042, @jdoerfert wrote:

I'm unsure about the scope of this. It seems to match a particular pattern and it is unclear this is the right place to do so. Have you considered doing this as part of AAPrivatizablePtr (or a new AbstractAttribute) in the Attributor?

No, I'm not familiar with the attributor, but I thought that was the component that infers function attributes? I admit I'm a bit hesitant to revise and reimplement this in a different framework. The original reason I ended up implementing this here is that in some benchmark code, I noticed the optimization sequence of global variable localization and memory to register conversion was being inhibited by these attributed functions, when compared against a baseline without attributed functions.

llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp
317	I'll elaborate more here, but this function must still remove all non load/store instructions using this alloca, because the rest of the memtoreg pass assumes it. The only change is that a promotable alloca passed as a non-capturing and non-read function call argument is changed to use a separate fresh alloca, instead of the original one.
llvm/test/Transforms/Mem2Reg/fnattr.ll
2	Ok, done.
112	Done.

Update tests with update_test_checks.py, fix bug

Harbormaster failed remote builds in B51006: Diff 253672! Mar 30 2020, 1:37 PM

Revision Contents

Path	Size
llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp	110 lines
llvm/test/Transforms/Mem2Reg/fnattr.ll	159 lines

Diff 253672

llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp

(19 lines not shown)
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/ADT/TinyPtrVector.h"		#include "llvm/ADT/TinyPtrVector.h"
#include "llvm/ADT/Twine.h"		#include "llvm/ADT/Twine.h"
#include "llvm/Analysis/AssumptionCache.h"		#include "llvm/Analysis/AssumptionCache.h"
#include "llvm/Analysis/InstructionSimplify.h"		#include "llvm/Analysis/InstructionSimplify.h"
#include "llvm/Analysis/IteratedDominanceFrontier.h"		#include "llvm/Analysis/IteratedDominanceFrontier.h"
#include "llvm/Transforms/Utils/Local.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CFG.h"		#include "llvm/IR/CFG.h"
#include "llvm/IR/Constant.h"		#include "llvm/IR/Constant.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/DIBuilder.h"		#include "llvm/IR/DIBuilder.h"
#include "llvm/IR/DerivedTypes.h"		#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Dominators.h"		#include "llvm/IR/Dominators.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/IR/InstrTypes.h"		#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Instruction.h"		#include "llvm/IR/Instruction.h"
#include "llvm/IR/Instructions.h"		#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"		#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Intrinsics.h"		#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/LLVMContext.h"		#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"		#include "llvm/IR/Module.h"
#include "llvm/IR/Type.h"		#include "llvm/IR/Type.h"
#include "llvm/IR/User.h"		#include "llvm/IR/User.h"
		#include "llvm/IR/ValueMap.h"
#include "llvm/Support/Casting.h"		#include "llvm/Support/Casting.h"
		#include "llvm/Transforms/Utils/Local.h"
#include "llvm/Transforms/Utils/PromoteMemToReg.h"		#include "llvm/Transforms/Utils/PromoteMemToReg.h"
#include <algorithm>		#include <algorithm>
#include <cassert>		#include <cassert>
#include <iterator>		#include <iterator>
#include <utility>		#include <utility>
#include <vector>		#include <vector>

using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "mem2reg"		#define DEBUG_TYPE "mem2reg"

STATISTIC(NumLocalPromoted, "Number of alloca's promoted within one block");		STATISTIC(NumLocalPromoted, "Number of alloca's promoted within one block");
STATISTIC(NumSingleStore, "Number of alloca's promoted with a single store");		STATISTIC(NumSingleStore, "Number of alloca's promoted with a single store");
STATISTIC(NumDeadAlloca, "Number of dead alloca's removed");		STATISTIC(NumDeadAlloca, "Number of dead alloca's removed");
STATISTIC(NumPHIInsert, "Number of PHI nodes inserted");		STATISTIC(NumPHIInsert, "Number of PHI nodes inserted");

		static bool onlyUsedByOptimizableCall(const Use &U) {
		const Instruction *I = dyn_cast<Instruction>(U.getUser());
		if (const CallBase *CB = dyn_cast_or_null<CallBase>(I)) {
		if (!CB->isArgOperand(&U))
		return false;
		unsigned ArgNo = CB->getArgOperandNo(&U);
		// Argument must not be captured or accessed
		if (CB->paramHasAttr(ArgNo, Attribute::NoCapture) &&
		(CB->hasFnAttr(Attribute::InaccessibleMemOnly) ||
		CB->hasFnAttr(Attribute::ReadNone) ||
		CB->paramHasAttr(ArgNo, Attribute::ReadNone)))
		return true;
		}

		return false;
		}

bool llvm::isAllocaPromotable(const AllocaInst *AI) {		bool llvm::isAllocaPromotable(const AllocaInst *AI) {
		bool usedByOtherInsts = !AI->getNumUses();
// FIXME: If the memory unit is of pointer or integer type, we can permit		// FIXME: If the memory unit is of pointer or integer type, we can permit
// assignments to subsections of the memory unit.		// assignments to subsections of the memory unit.
unsigned AS = AI->getType()->getAddressSpace();		unsigned AS = AI->getType()->getAddressSpace();

// Only allow direct and non-volatile loads and stores...		// Only allow direct and non-volatile loads and stores...
for (const User *U : AI->users()) {		for (const Use &UU : AI->uses()) {
		const User *U = UU.getUser();
if (const LoadInst *LI = dyn_cast<LoadInst>(U)) {		if (const LoadInst *LI = dyn_cast<LoadInst>(U)) {
// Note that atomic loads can be transformed; atomic semantics do		// Note that atomic loads can be transformed; atomic semantics do
// not have any meaning for a local alloca.		// not have any meaning for a local alloca.
if (LI->isVolatile())		if (LI->isVolatile())
return false;		return false;
} else if (const StoreInst *SI = dyn_cast<StoreInst>(U)) {		} else if (const StoreInst *SI = dyn_cast<StoreInst>(U)) {
if (SI->getOperand(0) == AI)		if (SI->getOperand(0) == AI)
return false; // Don't allow a store OF the AI, only INTO the AI.		return false; // Don't allow a store OF the AI, only INTO the AI.
// Note that atomic stores can be transformed; atomic semantics do		// Note that atomic stores can be transformed; atomic semantics do
// not have any meaning for a local alloca.		// not have any meaning for a local alloca.
if (SI->isVolatile())		if (SI->isVolatile())
return false;		return false;
} else if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(U)) {		} else if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(U)) {
if (!II->isLifetimeStartOrEnd())		if (!II->isLifetimeStartOrEnd())
return false;		return false;
		} else if (isa<CallBase>(U) && onlyUsedByOptimizableCall(UU)) {
		continue;
} else if (const BitCastInst *BCI = dyn_cast<BitCastInst>(U)) {		} else if (const BitCastInst *BCI = dyn_cast<BitCastInst>(U)) {
		bool onlyOptBCI = true;
		for (auto BI = BCI->use_begin(), BE = BCI->use_end();
		onlyOptBCI && BI != BE; ++BI)
		onlyOptBCI = onlyOptBCI && onlyUsedByOptimizableCall(*BI);
		if (onlyOptBCI)
		continue;
if (BCI->getType() != Type::getInt8PtrTy(U->getContext(), AS))		if (BCI->getType() != Type::getInt8PtrTy(U->getContext(), AS))
return false;		return false;
if (!onlyUsedByLifetimeMarkers(BCI))		if (!onlyUsedByLifetimeMarkers(BCI))
return false;		return false;
} else if (const GetElementPtrInst *GEPI = dyn_cast<GetElementPtrInst>(U)) {		} else if (const GetElementPtrInst *GEPI = dyn_cast<GetElementPtrInst>(U)) {
if (GEPI->getType() != Type::getInt8PtrTy(U->getContext(), AS))		if (GEPI->getType() != Type::getInt8PtrTy(U->getContext(), AS))
return false;		return false;
if (!GEPI->hasAllZeroIndices())		if (!GEPI->hasAllZeroIndices())
return false;		return false;
if (!onlyUsedByLifetimeMarkers(GEPI))		if (!onlyUsedByLifetimeMarkers(GEPI))
return false;		return false;
} else {		} else
return false;		return false;
}		usedByOtherInsts = true;
}		}

return true;		// Not promotable because it is a fresh alloca (only used by bitcasts and/or
		// as a non-accessed and non-capturing function call operand).
		return usedByOtherInsts;
}		}

namespace {		namespace {

struct AllocaInfo {		struct AllocaInfo {
SmallVector<BasicBlock *, 32> DefiningBlocks;		SmallVector<BasicBlock *, 32> DefiningBlocks;
SmallVector<BasicBlock *, 32> UsingBlocks;		SmallVector<BasicBlock *, 32> UsingBlocks;

(69 lines not shown)	class LargeBlockInfo {
/// For each instruction that we track, keep the index of the		/// For each instruction that we track, keep the index of the
/// instruction.		/// instruction.
///		///
/// The index starts out as the number of the instruction from the start of		/// The index starts out as the number of the instruction from the start of
/// the block.		/// the block.
DenseMap<const Instruction *, unsigned> InstNumbers;		DenseMap<const Instruction *, unsigned> InstNumbers;

public:		public:

/// This code only looks at accesses to allocas.		/// This code only looks at accesses to allocas.
static bool isInterestingInstruction(const Instruction *I) {		static bool isInterestingInstruction(const Instruction *I) {
return (isa<LoadInst>(I) && isa<AllocaInst>(I->getOperand(0))) ||		return (isa<LoadInst>(I) && isa<AllocaInst>(I->getOperand(0))) ||
(isa<StoreInst>(I) && isa<AllocaInst>(I->getOperand(1)));		(isa<StoreInst>(I) && isa<AllocaInst>(I->getOperand(1)));
}		}

/// Get or calculate the index of the specified instruction.		/// Get or calculate the index of the specified instruction.
unsigned getInstructionIndex(const Instruction *I) {		unsigned getInstructionIndex(const Instruction *I) {
(109 lines not shown)	static void addAssumeNonNull(AssumptionCache *AC, LoadInst *LI) {
ICmpInst *LoadNotNull = new ICmpInst(ICmpInst::ICMP_NE, LI,		ICmpInst *LoadNotNull = new ICmpInst(ICmpInst::ICMP_NE, LI,
Constant::getNullValue(LI->getType()));		Constant::getNullValue(LI->getType()));
LoadNotNull->insertAfter(LI);		LoadNotNull->insertAfter(LI);
CallInst *CI = CallInst::Create(AssumeIntrinsic, {LoadNotNull});		CallInst *CI = CallInst::Create(AssumeIntrinsic, {LoadNotNull});
CI->insertAfter(LoadNotNull);		CI->insertAfter(LoadNotNull);
AC->registerAssumption(CI);		AC->registerAssumption(CI);
}		}

static void removeLifetimeIntrinsicUsers(AllocaInst *AI) {		static void removeNonLoadStoreUsers(AllocaInst *AI) {
// Knowing that this alloca is promotable, we know that it's safe to kill all		SmallVector<Instruction *, 4> Roots;
// instructions except for load and store.		ValueMap<Instruction *, Instruction *> VisitedMap;
		AllocaInst *NewAlloca = nullptr;
for (auto UI = AI->user_begin(), UE = AI->user_end(); UI != UE;) {
		// Given that this alloca is promotable, remove all non-load/store
		// instructions on AI, which is assumed by the remainder of this pass. In the
		// event AI is used as an uncaptured and unread argument to a function call,
		// create a new fresh AllocaInst and redirect these uses to that variable.
		Roots.push_back(AI);
		while (Roots.size()) {
		Instruction *V = Roots.pop_back_val();
		auto it = VisitedMap.insert({V, nullptr}).first;
		for (auto UI = V->user_begin(), UE = V->user_end(); UI != UE;) {
Instruction *I = cast<Instruction>(*UI);		Instruction *I = cast<Instruction>(*UI);
++UI;		++UI;
if (isa<LoadInst>(I) || isa<StoreInst>(I))		if (isa<LoadInst>(I) || isa<StoreInst>(I))
continue;		continue;
		else if (isa<BitCastInst>(I) || isa<GetElementPtrInst>(I))
if (!I->getType()->isVoidTy()) {		Roots.push_back(I);
// The only users of this bitcast/GEP instruction are lifetime intrinsics.		else if (isa<CallBase>(I) && !isa<IntrinsicInst>(I)) {
// Follow the use/def chain to erase them now instead of leaving it for		// Introduce a fresh alloca, because it still needs some stack variable,
// dead code elimination later.		// but aliasing with this one will inhibit load/store optimizations.
for (auto UUI = I->user_begin(), UUE = I->user_end(); UUI != UUE;) {		if (!NewAlloca) {
Instruction *Inst = cast<Instruction>(*UUI);		NewAlloca = cast<AllocaInst>(AI->clone());
++UUI;		NewAlloca->insertBefore(AI);
Inst->eraseFromParent();		}
		// Replace the root with either the fresh alloca or a bitcast of it.
		if (!it->second)
		it->second = (NewAlloca->getType() == V->getType())
		? cast<Instruction>(NewAlloca)
		: new BitCastInst(NewAlloca, V->getType(), "", AI);
		} else
		I->eraseFromParent();
}		}
}		}
I->eraseFromParent();
		// Remove original roots except AI, and change uses to the fresh alloca.
		for (auto P : VisitedMap) {
		Instruction *R = P.first, *V = P.second;
		if (V) {
		// Only bitcasts and qualifying calls should refer to the fresh alloca.
		R->replaceUsesWithIf(V, [](Use &U) {
		Instruction *I = cast<Instruction>(U.getUser());
		return isa<BitCastInst>(I) || isa<CallBase>(I);
		});
		} else if (R != AI)
		R->dropAllReferences();
		if (R != AI && !R->getNumUses())
		R->eraseFromParent();
}		}
}		}

/// Rewrite as many loads as possible given a single store.		/// Rewrite as many loads as possible given a single store.
///		///
/// When there is only a single store, we can use the domtree to trivially		/// When there is only a single store, we can use the domtree to trivially
/// replace all of the dominated loads with the stored value. Do so, and return		/// replace all of the dominated loads with the stored value. Do so, and return
/// true if this has successfully promoted the alloca entirely. If this returns		/// true if this has successfully promoted the alloca entirely. If this returns
(195 lines not shown)	void PromoteMem2Reg::run() {

for (unsigned AllocaNum = 0; AllocaNum != Allocas.size(); ++AllocaNum) {		for (unsigned AllocaNum = 0; AllocaNum != Allocas.size(); ++AllocaNum) {
AllocaInst *AI = Allocas[AllocaNum];		AllocaInst *AI = Allocas[AllocaNum];

assert(isAllocaPromotable(AI) && "Cannot promote non-promotable alloca!");		assert(isAllocaPromotable(AI) && "Cannot promote non-promotable alloca!");
assert(AI->getParent()->getParent() == &F &&		assert(AI->getParent()->getParent() == &F &&
"All allocas should be in the same function, which is same as DF!");		"All allocas should be in the same function, which is same as DF!");

removeLifetimeIntrinsicUsers(AI);		removeNonLoadStoreUsers(AI);

if (AI->use_empty()) {		if (AI->use_empty()) {
// If there are no uses of the alloca, just delete it now.		// If there are no uses of the alloca, just delete it now.
AI->eraseFromParent();		AI->eraseFromParent();

// Remove the alloca from the Allocas list, since it has been processed		// Remove the alloca from the Allocas list, since it has been processed
RemoveFromAllocasList(AllocaNum);		RemoveFromAllocasList(AllocaNum);
++NumDeadAlloca;		++NumDeadAlloca;
(452 lines not shown)

llvm/test/Transforms/Mem2Reg/fnattr.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -S < %s -mem2reg | FileCheck %s

				declare void @foo1a(i8* readnone nocapture, i8) local_unnamed_addr
				declare void @foo1b(i8* nocapture, i8) local_unnamed_addr readnone
				declare void @foo1c(i8* nocapture, i8) local_unnamed_addr inaccessiblememonly

				; Doesn't read from pointer argument, promotable
				define i32 @a() norecurse {
				; CHECK-LABEL: define {{[^@]+}}@a()
				; CHECK-NEXT: [[TMP1:%.*]] = alloca i32
				; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[TMP1]] to i8*
				; CHECK-NEXT: call void @foo1a(i8* [[TMP2]], i8 0)
				; CHECK-NEXT: call void @foo1b(i8* [[TMP2]], i8 0)
				; CHECK-NEXT: call void @foo1c(i8* [[TMP2]], i8 0)
				; CHECK-NEXT: ret i32 42
				;
				%g1 = alloca i32
				store i32 42, i32 *%g1
				%p = bitcast i32* %g1 to i8*
				call void @foo1a(i8* %p, i8 0)
				call void @foo1b(i8* %p, i8 0)
				call void @foo1c(i8* %p, i8 0)
				%a = load i32, i32* %g1
				ret i32 %a
				}

				declare void @foo2a(i8*, i8) local_unnamed_addr readnone
				declare void @foo2b(i8* nocapture, i8) local_unnamed_addr

				; May-capture and may-read, not promotable
				define i32 @b() norecurse {
				; CHECK-LABEL: define {{[^@]+}}@b()
				; CHECK-NEXT: [[G2:%.*]] = alloca i32
				; CHECK-NEXT: store i32 42, i32* [[G2]]
				; CHECK-NEXT: [[P:%.*]] = bitcast i32* [[G2]] to i8*
				; CHECK-NEXT: call void @foo2a(i8* [[P]], i8 0)
				; CHECK-NEXT: call void @foo2b(i8* [[P]], i8 0)
				; CHECK-NEXT: [[B:%.*]] = load i32, i32* [[G2]]
				; CHECK-NEXT: ret i32 [[B]]
				;
				%g2 = alloca i32
				store i32 42, i32 *%g2
				%p = bitcast i32* %g2 to i8*
				call void @foo2a(i8* %p, i8 0)
				call void @foo2b(i8* %p, i8 0)
				%b = load i32, i32* %g2
				ret i32 %b
				}

				define i1 @is_aligned(i32* nocapture readnone) norecurse {
				; CHECK-LABEL: define {{[^@]+}}@is_aligned
				; CHECK-SAME: (i32* nocapture readnone [[TMP0:%.*]])
				; CHECK-NEXT: [[INT:%.*]] = ptrtoint i32* [[TMP0]] to i64
				; CHECK-NEXT: [[AND:%.*]] = and i64 [[INT]], 31
				; CHECK-NEXT: [[CMP:%.*]] = icmp ne i64 [[AND]], 0
				; CHECK-NEXT: ret i1 [[CMP]]
				;
				%int = ptrtoint i32* %0 to i64
				%and = and i64 %int, 31
				%cmp = icmp ne i64 %and, 0
				ret i1 %cmp
				}

				; No non-bitcasts/qualifying calls, not promotable
				define i1 @is_stack_aligned() norecurse {
				; CHECK-LABEL: define {{[^@]+}}@is_stack_aligned()
				; CHECK-NEXT: [[VAR:%.*]] = alloca i32, align 4
				; CHECK-NEXT: [[RET:%.*]] = call zeroext i1 @is_aligned(i32* nonnull [[VAR]])
				; CHECK-NEXT: ret i1 [[RET]]
				;
				%var = alloca i32, align 4
				%ret = call zeroext i1 @is_aligned(i32* nonnull %var)
				ret i1 %ret
				}

				define i1 @is_same(i32* nocapture readnone, i16* nocapture readnone) norecurse {
				; CHECK-LABEL: define {{[^@]+}}@is_same
				; CHECK-SAME: (i32* nocapture readnone [[TMP0:%.*]], i16* nocapture readnone [[TMP1:%.*]])
				; CHECK-NEXT: [[CAST:%.*]] = bitcast i16* [[TMP1]] to i32*
				; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32* [[CAST]], [[TMP0]]
				; CHECK-NEXT: ret i1 [[CMP]]
				;
				%cast = bitcast i16* %1 to i32*
				%cmp = icmp eq i32* %cast, %0
				ret i1 %cmp
				}

				; No non-bitcasts/qualifying calls, not promotable
				define i1 @return_true() norecurse {
				; CHECK-LABEL: define {{[^@]+}}@return_true()
				; CHECK-NEXT: [[VAR:%.*]] = alloca i32, align 4
				; CHECK-NEXT: [[CAST:%.*]] = bitcast i32* [[VAR]] to i16*
				; CHECK-NEXT: [[RET:%.*]] = call zeroext i1 @is_same(i32* nonnull [[VAR]], i16* nonnull [[CAST]])
				; CHECK-NEXT: ret i1 [[RET]]
				;
				%var = alloca i32, align 4
				%cast = bitcast i32* %var to i16*
				%ret = call zeroext i1 @is_same(i32* nonnull %var, i16* nonnull %cast)
				ret i1 %ret
				}

				declare void @llvm.trap()

				; Bitcast dominates loads/stores, not promotable
				define i32 @c() norecurse {
				; CHECK-LABEL: define {{[^@]+}}@c()
				; CHECK-NEXT: [[VAR:%.*]] = alloca i32, align 4
				; CHECK-NEXT: [[CAST:%.*]] = bitcast i32* [[VAR]] to i32*
				; CHECK-NEXT: store i32 42, i32* [[CAST]]
				; CHECK-NEXT: [[CAST2:%.*]] = bitcast i32* [[VAR]] to i16*
				; CHECK-NEXT: [[RET:%.*]] = call zeroext i1 @is_same(i32* nonnull [[VAR]], i16* nonnull [[CAST2]])
				; CHECK-NEXT: br i1 [[RET]], label [[CONT:%.*]], label [[TRAP:%.*]]
				; CHECK: trap:
				; CHECK-NEXT: call void @llvm.trap()
				; CHECK-NEXT: unreachable
				; CHECK: cont:
				; CHECK-NEXT: [[C:%.*]] = load i32, i32* [[CAST]]
				; CHECK-NEXT: ret i32 [[C]]
				;
				%var = alloca i32, align 4
				%cast = bitcast i32* %var to i32*
				store i32 42, i32* %cast
				%cast2 = bitcast i32* %var to i16*
				%ret = call zeroext i1 @is_same(i32* nonnull %var, i16* nonnull %cast2)
				br i1 %ret, label %cont, label %trap
				trap:
				call void @llvm.trap()
				unreachable
				cont:
				%c = load i32, i32* %cast
				ret i32 %c
				}

				; Bitcast only dominates qualifying calls, promotable
				define i32 @d() norecurse {
				; CHECK-LABEL: define {{[^@]+}}@d()
				; CHECK-NEXT: [[TMP1:%.*]] = alloca i32, align 4
				; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[TMP1]] to i16*
				; CHECK-NEXT: [[RET:%.*]] = call zeroext i1 @is_same(i32* nonnull [[TMP1]], i16* nonnull [[TMP2]])
				; CHECK-NEXT: br i1 [[RET]], label [[CONT:%.*]], label [[TRAP:%.*]]
				; CHECK: trap:
				; CHECK-NEXT: call void @llvm.trap()
				; CHECK-NEXT: unreachable
				; CHECK: cont:
				; CHECK-NEXT: ret i32 42
				;
				%var = alloca i32, align 4
				store i32 42, i32* %var
				%cast = bitcast i32* %var to i16*
				%ret = call zeroext i1 @is_same(i32* nonnull %var, i16* nonnull %cast)
				br i1 %ret, label %cont, label %trap
				trap:
				call void @llvm.trap()
				unreachable
				cont:
				%d = load i32, i32* %var
				ret i32 %d
				}