
Details

Reviewers

qcolombet
ab
mcrosier

Commits

rG5256fcada0ae: [CodeGenPrepare] Create more extloads and fewer ands
rL253722: [CodeGenPrepare] Create more extloads and fewer ands

Summary

Add "and" instructions immediately after loads that only have their low
bits used, assuming that the (and (load x) c) will be matched as an
extload and that the ands/truncs fed by the extload will be removed by isel.
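
To make "loads that only have their low bits used" concrete, here is a minimal standalone sketch of the bookkeeping, assuming a 32-bit loaded value. The `UseKind` and `Use` types and both function names are hypothetical stand-ins for illustration and are not LLVM APIs; the patch itself works on `APInt` values and real instruction users.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the three kinds of users the patch looks through.
enum class UseKind { And, Shl, Trunc };

struct Use {
  UseKind Kind;
  uint32_t Arg; // And: constant mask; Shl: shift amount; Trunc: result width
};

// Union of the bits of a 32-bit loaded value that the given users can observe.
uint32_t demandedBits(const std::vector<Use> &Uses) {
  uint32_t Demanded = 0;
  for (const Use &U : Uses) {
    switch (U.Kind) {
    case UseKind::And:
      // An "and" with a constant only observes the bits set in the mask.
      Demanded |= U.Arg;
      break;
    case UseKind::Shl: {
      // A left shift by N discards the top N bits of its input.
      uint32_t Amt = U.Arg > 31 ? 31 : U.Arg;
      Demanded |= 0xFFFFFFFFu >> Amt;
      break;
    }
    case UseKind::Trunc:
      // Truncating to W bits keeps only the low W bits.
      Demanded |= U.Arg >= 32 ? 0xFFFFFFFFu : ((1u << U.Arg) - 1);
      break;
    }
  }
  return Demanded;
}

// True if Mask is a contiguous run of ones starting at bit 0 (0xff, 0xffff, ...).
bool isLowBitMask(uint32_t Mask) {
  return Mask != 0 && (Mask & (Mask + 1)) == 0;
}
```

With users `and 0xffff`, `shl 16` and `trunc to i16`, every user demands only the low 16 bits, so the union is a low-bit mask and a zero-extending 16-bit load would be a candidate. A user such as `and 0xff00` does not produce a low-bit mask and would not qualify.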

Diff Detail

Repository: rL LLVM

Event Timeline

gberry updated this revision to Diff 39962. Nov 11 2015, 12:58 PM

gberry retitled this revision to [CodeGenPrepare] Create more extloads and fewer ands.

gberry updated this object.

gberry added reviewers: mcrosier, qcolombet, ab.

gberry added a subscriber: llvm-commits.

qcolombet added inline comments. Nov 11 2015, 5:03 PM

lib/CodeGen/CodeGenPrepare.cpp
4197 ↗	(On Diff #39962)	Why would you duplicate the 'and' instead of just moving it? Moreover, would it make sense to emit a narrower load followed by a zext instead?

New version addressing qcolombet's feedback.

Quentin,

Thanks for the quick review. I've added a new revision that addresses your question about removing the old 'and'. This can't be done in general, since it might have a narrower mask or reach the load through a phi, but for the safe cases I now go ahead and remove it.

As for your question about transforming the load to be narrower with a zext, my reservation about doing this here is that I can't check the TLI.shouldReduceLoadWidth() hook here, since it wants to see the load SDNode, so I might end up narrowing some special case loads that the target doesn't want narrowed. Does this seem reasonable to you?
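
For readers weighing the two forms discussed here, the following standalone sketch shows that masking a wide load and zero-extending a narrow load produce the same value. It is an illustration in plain C++ under stated assumptions, not how CodeGenPrepare or isel implement either form; both function names are hypothetical, and the target-specific concerns behind TLI.shouldReduceLoadWidth() are not modeled.

```cpp
#include <cstdint>
#include <cstring>

// The (and (load x), 0xffff) form: load 32 bits, keep the low 16.
uint32_t wideLoadThenMask(const unsigned char *P) {
  uint32_t V;
  std::memcpy(&V, P, sizeof V);
  return V & 0xFFFFu;
}

// The narrowed form: load only 16 bits and zero-extend.
// The byte offset of the low half depends on the host's endianness.
uint32_t narrowLoadThenZext(const unsigned char *P) {
  const uint32_t Probe = 1;
  unsigned char First;
  std::memcpy(&First, &Probe, 1);
  const bool LittleEndian = (First == 1);

  uint16_t H;
  std::memcpy(&H, P + (LittleEndian ? 0 : 2), sizeof H);
  return H; // implicit zero extension to 32 bits
}
```

The narrowed form changes the width, and on big-endian hosts the address, of the memory access, which is one reason a target may want a say in whether a given load is narrowed.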

junbuml added a subscriber: junbuml. Nov 13 2015, 2:23 PM

qcolombet added inline comments. Nov 17 2015, 1:42 PM

lib/CodeGen/CodeGenPrepare.cpp
4184 ↗	(On Diff #40083)	Could you put some quotes around “and” or something? I found it difficult to process as it is :).
4216 ↗	(On Diff #40083)	If I read the code correctly, this will happen only after a call to this optimization for each load. Although the example is interesting, you should call out that it needs both loads to be optimized to end up in this situation, otherwise the expectations while reading the code are misled.
4240 ↗	(On Diff #40083)	The dyn_cast is useless. A user of an instruction must be an instruction or we are doing something weird.
4242 ↗	(On Diff #40083)	Just test getParent == getParent. Indeed, if you are in the same block, by construction the node cannot be a phi, since phis are the first instructions in the block.
4268 ↗	(On Diff #40083)	You mean its users, right?
4274 ↗	(On Diff #40083)	As far as I can tell, you do not need to access anything specific to BinaryOperator. I then suggest to turn this if…else… if… sequence into a switch: switch (V->getOpcode()) { case And: … default: return false; }
4284 ↗	(On Diff #40083)	BinOp must be an instruction at this point, given it has been constructed with dyn_cast<BinaryOperator>. I.e., you don’t need to test that BinOp isa Instruction and you don’t need to cast BinOp to Instruction.
4308 ↗	(On Diff #40083)	Could you give more details here on the reason of the unlikeliness and what would happen if we do generate a i1 EXTLOAD?
4317 ↗	(On Diff #40083)	TruncTy would be a better name, wouldn’t it?
5053 ↗	(On Diff #40083)	The enableExtLdPromotion was aimed at a different optimization. I think it does not make much sense to reuse it here. Wouldn’t it make sense to just do it? That should always be a win, right?

One additional comment: I would like to see an IR-to-IR test case that tests the optimization in isolation.

Thanks,
-Quentin

Updated to address Quentin's comments.

gberry marked 8 inline comments as done. Nov 18 2015, 1:54 PM

gberry added inline comments.

lib/CodeGen/CodeGenPrepare.cpp
4242 ↗	(On Diff #40555)	I don't think your comment about the user not being a phi in the same block is correct. Consider a single block loop where a load inside the loop feeds a phi at the top of the loop. I've added a test case for this (see test/Transforms/CodeGenPrepare/AArch64/free-zext.ll test_free_zext3).
4308 ↗	(On Diff #40555)	I tried to add more of an explanation here, let me know if it makes sense, or if you think this is a shortcoming in the AArch64 back-end that needs to be addressed.
5061 ↗	(On Diff #40555)	I went ahead and removed this, though I'm not 100% happy with it. I would like to avoid doing this work for targets that don't support any extloads at all (since it is wasted computation), but I couldn't find a good way of checking for that, so I was using enableExtLdPromotion as a proxy.
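
The single-block loop case raised at line 4242 can be modeled in a few lines. This is a minimal sketch using a hypothetical `Inst` record rather than LLVM classes; it only illustrates why the "same block" test and the "not a phi" test are independent conditions.

```cpp
// Hypothetical, simplified instruction record; not an LLVM type.
struct Inst {
  int Block;  // id of the containing basic block
  bool IsPhi; // true for phi nodes
};

// Shape of the single-use bail-out in the patch: skip the load only when its
// sole user is a non-phi instruction in the same block.
bool skipSingleUseLoad(const Inst &Load, const Inst &User) {
  return User.Block == Load.Block && !User.IsPhi;
}

// What the check would become if "same block" were assumed to imply "not a phi".
bool skipIfSameBlockOnly(const Inst &Load, const Inst &User) {
  return User.Block == Load.Block;
}
```

In a single-block loop, the load and the phi it feeds through the back edge share a block, so the simplified check would skip a load that the full check still considers.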

qcolombet added inline comments. Nov 18 2015, 2:30 PM

lib/CodeGen/CodeGenPrepare.cpp
4242 ↗	(On Diff #40555)	Of course, you’re right! Please disregard that comment and thanks for adding a test case to cover that code.
4285 ↗	(On Diff #40555)	LLVM coding style is to go with early exits in those cases. I.e., if (<inverted cond>) return false; // Else, do the work.
4294 ↗	(On Diff #40555)	Ditto.
4308 ↗	(On Diff #40555)	That feels like a shortcoming in the AArch64 backend to me. I could live with it for now if you file a PR so that we remember to look into it.
5061 ↗	(On Diff #40555)	I see. In that case, we may consider adding a new target hook. What is the impact of this optimization on the compile time anyway?
test/CodeGen/AArch64/free-zext.ll
30 ↗	(On Diff #40555)	Could you be more explicit on what case of optimizeLoadExt you are checking? E.g., test… when the phi is in the same block as the load. This is usually useful when we have to update the tests. This applies to all the added test cases.
55 ↗	(On Diff #40555)	Make sure to file a PR when this lands.
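
The early-exit style requested at lines 4285 and 4294 can be illustrated with a small standalone sketch. The `Operand` struct and both function names are hypothetical and exist only for this example; the two functions behave identically and differ only in layout.

```cpp
#include <cstdint>

// Hypothetical operand record; not an LLVM type.
struct Operand {
  bool IsConstant;
  uint64_t Value;
};

// Nested style: the work sits inside the condition.
bool accumulateNested(const Operand &Op, uint64_t &Demanded) {
  if (Op.IsConstant) {
    Demanded |= Op.Value;
    return true;
  }
  return false;
}

// Early-exit style: reject the uninteresting case first, then do the work
// at the outer indentation level.
bool accumulateEarlyExit(const Operand &Op, uint64_t &Demanded) {
  if (!Op.IsConstant)
    return false;
  // Else, do the work.
  Demanded |= Op.Value;
  return true;
}
```

The benefit grows with the number of preconditions, since each one adds a level of nesting in the first style but only a flat check in the second.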

Update to address Quentin's comments (round 2)

Hi Quentin,

I believe I have addressed all of your concerns. One other change to note in the latest update is that I moved the CodeGenPrepare IR test up out of the AArch64 directory, since the optimization is now enabled for all targets.

I'll be sure to file PRs for the two cases mentioned above as well.

Let me know if you have any other concerns.

Thanks,
-Geoff

lib/CodeGen/CodeGenPrepare.cpp
4392 ↗	(On Diff #40675)	I will file a PR for the AArch64 i1 zextload case.
5150 ↗	(On Diff #40675)	I'm less concerned about this now after checking the compile time (CodeGenPrepare time within the noise for spec2006.gcc where this code is hit many times), and the fact that this feature is more common across targets than I initially thought.
test/CodeGen/AArch64/free-zext.ll
55 ↗	(On Diff #40675)	Will do.

Hi Geoff,

Thanks for your patience!

LGTM.

Cheers,
-Quentin

This revision is now accepted and ready to land. Nov 19 2015, 1:36 PM

Closed by commit rL253722: [CodeGenPrepare] Create more extloads and fewer ands (authored by gberry). Nov 20 2015, 2:37 PM

This revision was automatically updated to reflect the committed changes.

Diff 40833

llvm/trunk/lib/CodeGen/CodeGenPrepare.cpp

STATISTIC(NumCmpUses, "Number of uses of Cmp expressions replaced with uses of "		STATISTIC(NumCmpUses, "Number of uses of Cmp expressions replaced with uses of "
"sunken Cmps");		"sunken Cmps");
STATISTIC(NumCastUses, "Number of uses of Cast expressions replaced with uses "		STATISTIC(NumCastUses, "Number of uses of Cast expressions replaced with uses "
"of sunken Casts");		"of sunken Casts");
STATISTIC(NumMemoryInsts, "Number of memory instructions whose address "		STATISTIC(NumMemoryInsts, "Number of memory instructions whose address "
"computations were sunk");		"computations were sunk");
STATISTIC(NumExtsMoved, "Number of [s|z]ext instructions combined with loads");		STATISTIC(NumExtsMoved, "Number of [s|z]ext instructions combined with loads");
STATISTIC(NumExtUses, "Number of uses of [s|z]ext instructions optimized");		STATISTIC(NumExtUses, "Number of uses of [s|z]ext instructions optimized");
		STATISTIC(NumAndsAdded,
		"Number of and mask instructions added to form ext loads");
		STATISTIC(NumAndUses, "Number of uses of and mask instructions optimized");
STATISTIC(NumRetsDup, "Number of return instructions duplicated");		STATISTIC(NumRetsDup, "Number of return instructions duplicated");
STATISTIC(NumDbgValueMoved, "Number of debug value instructions moved");		STATISTIC(NumDbgValueMoved, "Number of debug value instructions moved");
STATISTIC(NumSelectsExpanded, "Number of selects turned into branches");		STATISTIC(NumSelectsExpanded, "Number of selects turned into branches");
STATISTIC(NumAndCmpsMoved, "Number of and/cmp's pushed into branches");		STATISTIC(NumAndCmpsMoved, "Number of and/cmp's pushed into branches");
STATISTIC(NumStoreExtractExposed, "Number of store(extractelement) exposed");		STATISTIC(NumStoreExtractExposed, "Number of store(extractelement) exposed");

static cl::opt<bool> DisableBranchOpts(		static cl::opt<bool> DisableBranchOpts(
"disable-cgp-branch-opts", cl::Hidden, cl::init(false),		"disable-cgp-branch-opts", cl::Hidden, cl::init(false),
private:
bool optimizeBlock(BasicBlock &BB, bool& ModifiedDT);		bool optimizeBlock(BasicBlock &BB, bool& ModifiedDT);
bool optimizeInst(Instruction *I, bool& ModifiedDT);		bool optimizeInst(Instruction *I, bool& ModifiedDT);
bool optimizeMemoryInst(Instruction *I, Value *Addr,		bool optimizeMemoryInst(Instruction *I, Value *Addr,
Type *AccessTy, unsigned AS);		Type *AccessTy, unsigned AS);
bool optimizeInlineAsmInst(CallInst *CS);		bool optimizeInlineAsmInst(CallInst *CS);
bool optimizeCallInst(CallInst *CI, bool& ModifiedDT);		bool optimizeCallInst(CallInst *CI, bool& ModifiedDT);
bool moveExtToFormExtLoad(Instruction *&I);		bool moveExtToFormExtLoad(Instruction *&I);
bool optimizeExtUses(Instruction *I);		bool optimizeExtUses(Instruction *I);
		bool optimizeLoadExt(LoadInst *I);
bool optimizeSelectInst(SelectInst *SI);		bool optimizeSelectInst(SelectInst *SI);
bool optimizeShuffleVectorInst(ShuffleVectorInst *SI);		bool optimizeShuffleVectorInst(ShuffleVectorInst *SI);
bool optimizeSwitchInst(SwitchInst *CI);		bool optimizeSwitchInst(SwitchInst *CI);
bool optimizeExtractElementInst(Instruction *Inst);		bool optimizeExtractElementInst(Instruction *Inst);
bool dupRetToEnableTailCallOpts(BasicBlock *BB);		bool dupRetToEnableTailCallOpts(BasicBlock *BB);
bool placeDbgValues(Function &F);		bool placeDbgValues(Function &F);
bool sinkAndCmp(Function &F);		bool sinkAndCmp(Function &F);
bool extLdPromotion(TypePromotionTransaction &TPT, LoadInst *&LI,		bool extLdPromotion(TypePromotionTransaction &TPT, LoadInst *&LI,
for (Use &U : Src->uses()) {
U = InsertedTrunc;		U = InsertedTrunc;
++NumExtUses;		++NumExtUses;
MadeChange = true;		MadeChange = true;
}		}

return MadeChange;		return MadeChange;
}		}

		// Find loads whose uses only use some of the loaded value's bits. Add an "and"
		// just after the load if the target can fold this into one extload instruction,
		// with the hope of eliminating some of the other later "and" instructions using
		// the loaded value. "and"s that are made trivially redundant by the insertion
		// of the new "and" are removed by this function, while others (e.g. those whose
		// path from the load goes through a phi) are left for isel to potentially
		// remove.
		//
		// For example:
		//
		// b0:
		//   x = load i32
		//   ...
		// b1:
		//   y = and x, 0xff
		//   z = use y
		//
		// becomes:
		//
		// b0:
		//   x = load i32
		//   x' = and x, 0xff
		//   ...
		// b1:
		//   z = use x'
		//
		// whereas:
		//
		// b0:
		//   x1 = load i32
		//   ...
		// b1:
		//   x2 = load i32
		//   ...
		// b2:
		//   x = phi x1, x2
		//   y = and x, 0xff
		//
		// becomes (after a call to optimizeLoadExt for each load):
		//
		// b0:
		//   x1 = load i32
		//   x1' = and x1, 0xff
		//   ...
		// b1:
		//   x2 = load i32
		//   x2' = and x2, 0xff
		//   ...
		// b2:
		//   x = phi x1', x2'
		//   y = and x, 0xff
		//

		bool CodeGenPrepare::optimizeLoadExt(LoadInst *Load) {

		  if (!Load->isSimple() ||
		      !(Load->getType()->isIntegerTy() || Load->getType()->isPointerTy()))
		    return false;

		  // Skip loads we've already transformed or have no reason to transform.
		  if (Load->hasOneUse()) {
		    User *LoadUser = *Load->user_begin();
		    if (cast<Instruction>(LoadUser)->getParent() == Load->getParent() &&
		        !dyn_cast<PHINode>(LoadUser))
		      return false;
		  }

		  // Look at all uses of Load, looking through phis, to determine how many bits
		  // of the loaded value are needed.
		  SmallVector<Instruction *, 8> WorkList;
		  SmallPtrSet<Instruction *, 16> Visited;
		  SmallVector<Instruction *, 8> AndsToMaybeRemove;
		  for (auto *U : Load->users())
		    WorkList.push_back(cast<Instruction>(U));

		  EVT LoadResultVT = TLI->getValueType(*DL, Load->getType());
		  unsigned BitWidth = LoadResultVT.getSizeInBits();
		  APInt DemandBits(BitWidth, 0);
		  APInt WidestAndBits(BitWidth, 0);

		  while (!WorkList.empty()) {
		    Instruction *I = WorkList.back();
		    WorkList.pop_back();

		    // Break use-def graph loops.
		    if (!Visited.insert(I).second)
		      continue;

		    // For a PHI node, push all of its users.
		    if (auto *Phi = dyn_cast<PHINode>(I)) {
		      for (auto *U : Phi->users())
		        WorkList.push_back(cast<Instruction>(U));
		      continue;
		    }

		    switch (I->getOpcode()) {
		    case llvm::Instruction::And: {
		      auto *AndC = dyn_cast<ConstantInt>(I->getOperand(1));
		      if (!AndC)
		        return false;
		      APInt AndBits = AndC->getValue();
		      DemandBits |= AndBits;
		      // Keep track of the widest and mask we see.
		      if (AndBits.ugt(WidestAndBits))
		        WidestAndBits = AndBits;
		      if (AndBits == WidestAndBits && I->getOperand(0) == Load)
		        AndsToMaybeRemove.push_back(I);
		      break;
		    }

		    case llvm::Instruction::Shl: {
		      auto *ShlC = dyn_cast<ConstantInt>(I->getOperand(1));
		      if (!ShlC)
		        return false;
		      uint64_t ShiftAmt = ShlC->getLimitedValue(BitWidth - 1);
		      auto ShlDemandBits = APInt::getAllOnesValue(BitWidth).lshr(ShiftAmt);
		      DemandBits |= ShlDemandBits;
		      break;
		    }

		    case llvm::Instruction::Trunc: {
		      EVT TruncVT = TLI->getValueType(*DL, I->getType());
		      unsigned TruncBitWidth = TruncVT.getSizeInBits();
		      auto TruncBits = APInt::getAllOnesValue(TruncBitWidth).zext(BitWidth);
		      DemandBits |= TruncBits;
		      break;
		    }

		    default:
		      return false;
		    }
		  }

		  uint32_t ActiveBits = DemandBits.getActiveBits();
		  // Avoid hoisting (and (load x) 1) since it is unlikely to be folded by the
		  // target even if isLoadExtLegal says an i1 EXTLOAD is valid. For example,
		  // for the AArch64 target isLoadExtLegal(ZEXTLOAD, i32, i1) returns true, but
		  // (and (load x) 1) is not matched as a single instruction, rather as a LDR
		  // followed by an AND.
		  // TODO: Look into removing this restriction by fixing backends to either
		  // return false for isLoadExtLegal for i1 or have them select this pattern to
		  // a single instruction.
		  //
		  // Also avoid hoisting if we didn't see any ands with the exact DemandBits
		  // mask, since these are the only ands that will be removed by isel.
		  if (ActiveBits <= 1 || !APIntOps::isMask(ActiveBits, DemandBits) ||
		      WidestAndBits != DemandBits)
		    return false;

		  LLVMContext &Ctx = Load->getType()->getContext();
		  Type *TruncTy = Type::getIntNTy(Ctx, ActiveBits);
		  EVT TruncVT = TLI->getValueType(*DL, TruncTy);

		  // Reject cases that won't be matched as extloads.
		  if (!LoadResultVT.bitsGT(TruncVT) || !TruncVT.isRound() ||
		      !TLI->isLoadExtLegal(ISD::ZEXTLOAD, LoadResultVT, TruncVT))
		    return false;

		  IRBuilder<> Builder(Load->getNextNode());
		  auto *NewAnd = dyn_cast<Instruction>(
		      Builder.CreateAnd(Load, ConstantInt::get(Ctx, DemandBits)));

		  // Replace all uses of load with new and (except for the use of load in the
		  // new and itself).
		  Load->replaceAllUsesWith(NewAnd);
		  NewAnd->setOperand(0, Load);

		  // Remove any and instructions that are now redundant.
		  for (auto *And : AndsToMaybeRemove)
		    // Check that the and mask is the same as the one we decided to put on the
		    // new and.
		    if (cast<ConstantInt>(And->getOperand(1))->getValue() == DemandBits) {
		      And->replaceAllUsesWith(NewAnd);
		      if (&*CurInstIterator == And)
		        CurInstIterator = std::next(And->getIterator());
		      And->eraseFromParent();
		      ++NumAndUses;
		    }

		  ++NumAndsAdded;
		  return true;
		}

/// Check if V (an operand of a select instruction) is an expensive instruction		/// Check if V (an operand of a select instruction) is an expensive instruction
/// that is only used once.		/// that is only used once.
static bool sinkSelectOperand(const TargetTransformInfo *TTI, Value *V) {		static bool sinkSelectOperand(const TargetTransformInfo *TTI, Value *V) {
auto *I = dyn_cast<Instruction>(V);		auto *I = dyn_cast<Instruction>(V);
// If it's safe to speculatively execute, then it should not have side		// If it's safe to speculatively execute, then it should not have side
// effects; therefore, it's safe to sink and possibly not execute.		// effects; therefore, it's safe to sink and possibly not execute.
return I && I->hasOneUse() && isSafeToSpeculativelyExecute(I) &&		return I && I->hasOneUse() && isSafeToSpeculativelyExecute(I) &&
TTI->getUserCost(I) >= TargetTransformInfo::TCC_Expensive;		TTI->getUserCost(I) >= TargetTransformInfo::TCC_Expensive;
bool CodeGenPrepare::optimizeInst(Instruction *I, bool& ModifiedDT) {

if (CmpInst *CI = dyn_cast<CmpInst>(I))		if (CmpInst *CI = dyn_cast<CmpInst>(I))
if (!TLI || !TLI->hasMultipleConditionRegisters())		if (!TLI || !TLI->hasMultipleConditionRegisters())
return OptimizeCmpExpression(CI);		return OptimizeCmpExpression(CI);

if (LoadInst *LI = dyn_cast<LoadInst>(I)) {		if (LoadInst *LI = dyn_cast<LoadInst>(I)) {
stripInvariantGroupMetadata(*LI);		stripInvariantGroupMetadata(*LI);
if (TLI) {		if (TLI) {
		bool Modified = optimizeLoadExt(LI);
unsigned AS = LI->getPointerAddressSpace();		unsigned AS = LI->getPointerAddressSpace();
return optimizeMemoryInst(I, I->getOperand(0), LI->getType(), AS);		Modified |= optimizeMemoryInst(I, I->getOperand(0), LI->getType(), AS);
		return Modified;
}		}
return false;		return false;
}		}

if (StoreInst *SI = dyn_cast<StoreInst>(I)) {		if (StoreInst *SI = dyn_cast<StoreInst>(I)) {
stripInvariantGroupMetadata(*SI);		stripInvariantGroupMetadata(*SI);
if (TLI) {		if (TLI) {
unsigned AS = SI->getPointerAddressSpace();		unsigned AS = SI->getPointerAddressSpace();

llvm/trunk/test/CodeGen/AArch64/free-zext.ll

	; CHECK: str x[[A]], [x2]			; CHECK: str x[[A]], [x2]
	%load = load i32, i32* %ptr, align 8			%load = load i32, i32* %ptr, align 8
	%load16 = and i32 %load, 65535			%load16 = and i32 %load, 65535
	%load64 = zext i32 %load16 to i64			%load64 = zext i32 %load16 to i64
	store i32 %load16, i32* %dst1, align 4			store i32 %load16, i32* %dst1, align 4
	store i64 %load64, i64* %dst2, align 8			store i64 %load64, i64* %dst2, align 8
	ret void			ret void
	}			}

				; Test for CodeGenPrepare::optimizeLoadExt(): simple case: two loads
				; feeding a phi that zext's each loaded value.
				define i32 @test_free_zext3(i32* %ptr, i32* %ptr2, i32* %dst, i32 %c) {
				; CHECK-LABEL: test_free_zext3:
				bb1:
				; CHECK: ldrh [[REG:w[0-9]+]]
				; CHECK-NOT: and {{w[0-9]+}}, [[REG]], #0xffff
				%tmp1 = load i32, i32* %ptr, align 4
				%cmp = icmp ne i32 %c, 0
				br i1 %cmp, label %bb2, label %bb3
				bb2:
				; CHECK: ldrh [[REG2:w[0-9]+]]
				; CHECK-NOT: and {{w[0-9]+}}, [[REG2]], #0xffff
				%tmp2 = load i32, i32* %ptr2, align 4
				br label %bb3
				bb3:
				%tmp3 = phi i32 [ %tmp1, %bb1 ], [ %tmp2, %bb2 ]
				; CHECK-NOT: and {{w[0-9]+}}, {{w[0-9]+}}, #0xffff
				%tmpand = and i32 %tmp3, 65535
				ret i32 %tmpand
				}

				; Test for CodeGenPrepare::optimizeLoadExt(): check case of zext-able
				; load feeding a phi in the same block.
				define void @test_free_zext4(i32* %ptr, i32* %ptr2, i32* %dst) {
				; CHECK-LABEL: test_free_zext4:
				; CHECK: ldrh [[REG:w[0-9]+]]
				; TODO: fix isel to remove final and XCHECK-NOT: and {{w[0-9]+}}, {{w[0-9]+}}, #0xffff
				; CHECK: ldrh [[REG:w[0-9]+]]
				bb1:
				%load1 = load i32, i32* %ptr, align 4
				br label %loop
				loop:
				%phi = phi i32 [ %load1, %bb1 ], [ %load2, %loop ]
				%and = and i32 %phi, 65535
				store i32 %and, i32* %dst, align 4
				%load2 = load i32, i32* %ptr2, align 4
				%cmp = icmp ne i32 %and, 0
				br i1 %cmp, label %loop, label %end
				end:
				ret void
				}

llvm/trunk/test/Transforms/CodeGenPrepare/free-zext.ll

				; RUN: opt -S -codegenprepare -mtriple=aarch64-linux %s | FileCheck %s

				; Test for CodeGenPrepare::optimizeLoadExt(): simple case: two loads
				; feeding a phi that zext's each loaded value.
				define i32 @test_free_zext(i32* %ptr, i32* %ptr2, i32 %c) {
				; CHECK-LABEL: @test_free_zext(
				bb1:
				; CHECK-LABEL: bb1:
				; CHECK: %[[T1:.*]] = load
				; CHECK: %[[A1:.*]] = and i32 %[[T1]], 65535
				%load1 = load i32, i32* %ptr, align 4
				%cmp = icmp ne i32 %c, 0
				br i1 %cmp, label %bb2, label %bb3
				bb2:
				; CHECK-LABEL: bb2:
				; CHECK: %[[T2:.*]] = load
				; CHECK: %[[A2:.*]] = and i32 %[[T2]], 65535
				%load2 = load i32, i32* %ptr2, align 4
				br label %bb3
				bb3:
				; CHECK-LABEL: bb3:
				; CHECK: phi i32 [ %[[A1]], %bb1 ], [ %[[A2]], %bb2 ]
				%phi = phi i32 [ %load1, %bb1 ], [ %load2, %bb2 ]
				%and = and i32 %phi, 65535
				ret i32 %and
				}

				; Test for CodeGenPrepare::optimizeLoadExt(): exercise all opcode
				; cases of active bit calculation.
				define i32 @test_free_zext2(i32* %ptr, i16* %dst16, i32* %dst32, i32 %c) {
				; CHECK-LABEL: @test_free_zext2(
				bb1:
				; CHECK-LABEL: bb1:
				; CHECK: %[[T1:.*]] = load
				; CHECK: %[[A1:.*]] = and i32 %[[T1]], 65535
				%load1 = load i32, i32* %ptr, align 4
				%cmp = icmp ne i32 %c, 0
				br i1 %cmp, label %bb2, label %bb4
				bb2:
				; CHECK-LABEL: bb2:
				%trunc = trunc i32 %load1 to i16
				store i16 %trunc, i16* %dst16, align 2
				br i1 %cmp, label %bb3, label %bb4
				bb3:
				; CHECK-LABEL: bb3:
				%shl = shl i32 %load1, 16
				store i32 %shl, i32* %dst32, align 4
				br label %bb4
				bb4:
				; CHECK-LABEL: bb4:
				; CHECK-NOT: and
				; CHECK: ret i32 %[[A1]]
				%and = and i32 %load1, 65535
				ret i32 %and
				}

				; Test for CodeGenPrepare::optimizeLoadExt(): check case of zext-able
				; load feeding a phi in the same block.
				define void @test_free_zext3(i32* %ptr, i32* %ptr2, i32* %dst, i64* %c) {
				; CHECK-LABEL: @test_free_zext3(
				bb1:
				; CHECK-LABEL: bb1:
				; CHECK: %[[T1:.*]] = load
				; CHECK: %[[A1:.*]] = and i32 %[[T1]], 65535
				%load1 = load i32, i32* %ptr, align 4
				br label %loop
				loop:
				; CHECK-LABEL: loop:
				; CHECK: phi i32 [ %[[A1]], %bb1 ], [ %[[A2]], %loop ]
				%phi = phi i32 [ %load1, %bb1 ], [ %load2, %loop ]
				%and = and i32 %phi, 65535
				store i32 %and, i32* %dst, align 4
				%idx = load volatile i64, i64* %c, align 4
				%addr = getelementptr inbounds i32, i32* %ptr2, i64 %idx
				; CHECK: %[[T2:.*]] = load i32
				; CHECK: %[[A2:.*]] = and i32 %[[T2]], 65535
				%load2 = load i32, i32* %addr, align 4
				%cmp = icmp ne i64 %idx, 0
				br i1 %cmp, label %loop, label %end
				end:
				ret void
				}

This is an archive of the discontinued LLVM Phabricator instance.

[CodeGenPrepare] Create more extloads and fewer ands
Closed, Public
