This is an archive of the discontinued LLVM Phabricator instance.


Differential D55604

[AggressiveInstCombine] convert rotate with guard branch into funnel shift (PR34924)
ClosedPublic

Authored by spatel on Dec 12 2018, 8:47 AM.

Details

Reviewers

efriedma
nikic
lebedev.ri
fabiang

Commits

rG200885e654fc: [AggressiveInstCombine] convert rotate with guard branch into funnel shift…
rL349396: [AggressiveInstCombine] convert rotate with guard branch into funnel shift…

Summary

Now that we have funnel shift intrinsics, it should be safe to convert this form of rotate to a funnel shift. In the worst case (a target that doesn't have rotate instructions), we will expand this into a branch-less sequence of ALU ops (neg/and/and/lshr/shl/or) in the backend, so it's still very likely to be a perf improvement over the original code.
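
For reference, the worst-case branch-less expansion mentioned above can be sketched in plain C++. This is only an illustration of the six-op shape (neg/and/and/lshr/shl/or); the function name `rotl32_branchless` is made up here and the exact sequence a given backend emits may differ.

```cpp
#include <cstdint>

// Illustrative only: a branch-free 32-bit rotate-left built from the six ALU
// ops named in the summary. Masking both shift amounts keeps every shift in
// the range [0, 31], so a rotate amount of 0 is well-defined without a guard.
inline uint32_t rotl32_branchless(uint32_t x, uint32_t n) {
  uint32_t left = n & 31u;            // and
  uint32_t right = (0u - n) & 31u;    // neg, and
  return (x << left) | (x >> right);  // shl, lshr, or
}
```

With n == 0 both masked amounts are 0, so the result is `x | x`, which is just `x`.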

The motivating source code pattern for this is shown in:
https://bugs.llvm.org/show_bug.cgi?id=34924
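
A minimal sketch of the kind of guarded source idiom this targets, assuming a 32-bit value and a rotate amount below 32. The helper name `rotl32_guarded` is hypothetical and is not copied from the bug report.

```cpp
#include <cstdint>

// Hypothetical example of the guarded rotate idiom. The branch skips the
// shifts when n == 0, because x >> (32 - 0) would be an oversized shift.
// Assumes n < 32.
inline uint32_t rotl32_guarded(uint32_t x, uint32_t n) {
  if (n == 0)
    return x;                          // guard block: pass the source through
  return (x << n) | (x >> (32u - n));  // rotate block: shl, sub, lshr, or
}
```

The early return corresponds to the guard branch in the IR, and the two return values become the two incoming values of the phi.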

Background:
I looked at several different options before deciding where to try this - instcombine, simplifycfg, CGP - because it doesn't fit cleanly anywhere AFAIK.

The backend (CGP, SDAG, GlobalISel?) is too late for what we're trying to accomplish. We want to have the IR converted before we reach things like vectorization because the simplified code can make a loop much simpler to transform.

Technically, this could be included in instcombine, but it's a large pattern match that includes control-flow, so it just felt wrong to stuff into there (although I have a draft of that patch). Similarly, this could be part of simplifycfg, but all of this pattern matching is a stretch.

So we're left with our relatively new dumping ground for homeless transforms: aggressive-instcombine. This only runs at -O3, but that seems like a reasonable limitation given that source code has many options to avoid this pattern (including the recently added clang intrinsics for rotates).

I'm including a PhaseOrdering test because we require the teamwork of 3 passes (aggressive-instcombine, instcombine, simplifycfg) to get this into the minimal IR form that we want.

And that showed a surprise - the new pass manager fails to reduce this. That's because the new PM includes this:

// Speculative execution if the target has divergent branches; otherwise nop.
FPM.addPass(SpeculativeExecutionPass());

...in the default simplification pipeline. That pass hoists all of the instructions into the entry block, which allows simplifycfg to turn this sequence into straight-line code with a select. That then allows instcombine to recognize this sequence as a rotate and remove the select. But instcombine is not yet canonicalizing the 6-op ALU sequence to funnel-shift. I'm planning to add that, but this really looks like a bug in the new pass manager because hoisting a sequence of ops isn't supposed to be happening for all targets - from SpeculativeExecution.h:

// This pass hoists instructions to enable speculative execution on
// targets where branches are expensive. This is aimed at GPUs.

Diff Detail

Repository: rL LLVM

Event Timeline

spatel created this revision. Dec 12 2018, 8:47 AM

Herald added a subscriber: mcrosier. Dec 12 2018, 8:47 AM

nikic added inline comments. Dec 12 2018, 10:43 AM

lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
95 ↗	(On Diff #177858)	Rather than matching the rotate pattern here, would it be possible to leave the conversion to fsh to InstCombine (as you say, this is planned anyway), and only do the removal of the zero check here? Or is general canonicalization in InstCombine still blocked on some optimization issues?
110 ↗	(On Diff #177858)	Do we possibly need to check that the rotate only has a single use, to avoid converting the phi into fsh while leaving behind the original rotate?
124 ↗	(On Diff #177858)	Is `icmp ne %x, 0` already canonicalized to `icmp eq %x, 0` beforehand?

spatel marked 3 inline comments as done. Dec 12 2018, 11:57 AM

spatel added inline comments.

lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
95 ↗	(On Diff #177858)	There's no clear limit on the power of aggressive-instcombine, but I'm assuming that we should follow the same rules as regular instcombine. I.e., you should be able to transfer anything in here to instcombine if you don't care about compile-time. So we defer CFG changes to other passes (simplifycfg in particular) and just focus on logical transforms. This one blurs the lines of course, but I don't think we should be trying to alter the branch rather than the phi here. The only remaining optimization issue that I'm aware of for funnel shift canonicalization is addressed by D55563.
110 ↗	(On Diff #177858)	Sure, I can add that check. Strictly speaking, we can never create more instructions than we started with here (we're only replacing the phi), so we don't need to do that.
124 ↗	(On Diff #177858)	Yes, instcombine prefers 'eq' to 'ne', so we're relying on that. It's possible that we don't get that transform though - fresh example of that: https://bugs.llvm.org/show_bug.cgi?id=39968 ...so like the 'sle' negative test that's already included here, we are not catching every possible pattern in this patch. But until there's evidence that we need it, I don't think it's worth chasing.

nikic added inline comments. Dec 12 2018, 12:15 PM

lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
95 ↗	(On Diff #177858)	I'm not suggesting to alter the branches here, but rather to split this transform into a two-step process. First InstCombine converts the `(x << y) | (x >> (bw - y))` into `fshl(x, x, y)` (which is legal independently of the null check for rotates, but not general funnel shifts) and then here we only handle the `y == 0 ? x : fshl(x, x, y)` pattern, rather than the shift-based variant.

However, after reading https://bugs.llvm.org/show_bug.cgi?id=34924, maybe this would not even be necessary? In one of the comments at the end you mention that if the rotate is represented as fsh, then simplifycfg already converts this pattern into a select. Together with the fsh+select simplification you already added in instsimplify, wouldn't that mean that a combination of instcombine+simplifycfg+instsimplify will already take care of the whole pattern (if we canonicalize the shl/lshr/or sequence to fsh in instcombine)?

So the process would go like this:

define i32 @rotl(i32 %a, i32 %b) {
entry:
  %cmp = icmp eq i32 %b, 0
  br i1 %cmp, label %end, label %rotbb

rotbb:
  %sub = sub i32 32, %b
  %shr = lshr i32 %a, %sub
  %shl = shl i32 %a, %b
  %or = or i32 %shr, %shl
  br label %end

end:
  %cond = phi i32 [ %or, %rotbb ], [ %a, %entry ]
  ret i32 %cond
}

// -instcombine
define i32 @rotl(i32 %a, i32 %b) {
entry:
  %cmp = icmp eq i32 %b, 0
  br i1 %cmp, label %end, label %rotbb

rotbb:
  %or = call i32 @llvm.fshl(i32 %a, i32 %a, i32 %b)
  br label %end

end:
  %cond = phi i32 [ %or, %rotbb ], [ %a, %entry ]
  ret i32 %cond
}

// -simplifycfg
define i32 @rotl(i32 %a, i32 %b) {
entry:
  %cmp = icmp eq i32 %b, 0
  %or = call i32 @llvm.fshl(i32 %a, i32 %a, i32 %b)
  %cond = select i1 %cmp, i32 %a, i32 %or
  ret i32 %cond
}

// -instsimplify
define i32 @rotl(i32 %a, i32 %b) {
entry:
  %or = call i32 @llvm.fshl(i32 %a, i32 %a, i32 %b)
  ret i32 %or
}
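
To make the last step of that proposal concrete, here is a small C++ model of a 32-bit funnel shift left, written from the documented semantics (concatenate the two inputs, shift left by the amount modulo the bit width, keep the high half). The names `fshl32` and `guarded_fshl32` are illustrative and are not LLVM APIs. It shows why the select on zero is redundant once the rotate is expressed as fshl: a shift amount of 0 already returns the first operand.

```cpp
#include <cstdint>

// Illustrative model of a 32-bit funnel shift left: form the 64-bit value
// a:b, shift it left by (c mod 32), and return the upper 32 bits.
inline uint32_t fshl32(uint32_t a, uint32_t b, uint32_t c) {
  uint64_t concat = (static_cast<uint64_t>(a) << 32) | b;
  return static_cast<uint32_t>((concat << (c % 32u)) >> 32);
}

// The select form from the comment above: y == 0 ? x : fshl(x, x, y).
// Because fshl32(x, x, 0) == x, this always equals fshl32(x, x, y).
inline uint32_t guarded_fshl32(uint32_t x, uint32_t y) {
  return y == 0 ? x : fshl32(x, x, y);
}
```

When both value operands are the same, the funnel shift is a rotate, which is the only case this patch handles.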

fabiang added inline comments. Dec 12 2018, 12:36 PM

lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
95 ↗	(On Diff #177858)	I agree that it feels a bit redundant to have the pattern replicated twice, once for branches here and once for selects in instsimplify (in what was D54552). If the contents of rotbb get reliably converted into funnel shifts by instcombine, and simplifycfg turns it into a select, the existing code seems sufficient. A counterargument may be that the more passes need to collaborate to handle this pattern as intended, the bigger the chances are of something going wrong in the middle of the process.

spatel marked an inline comment as done. Dec 12 2018, 1:27 PM

spatel added inline comments.

lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
95 ↗	(On Diff #177858)	Ok - this wasn't clear in the bug comments, so let me try to explain it better here. We can't turn this alone:

  %sub = sub i32 32, %b
  %shr = lshr i32 %a, %sub
  %shl = shl i32 %a, %b
  %or = or i32 %shr, %shl

...into fshl/fshr. The reason is that -- for a target that does not have a rotate instruction -- this will get expanded into a sequence with more ops and almost certainly worse perf. And there's no way for the backend to reverse that, so that's forbidden AFAIK. If the code already has some attempt to guard against UB (by masking/selecting/branching), then we can turn it into funnel shift because we can (almost) guarantee that our worst-case expansion is still equal or better than the original code.

spatel mentioned this in rL348980: [PhaseOrdering] add test for funnel shift (rotate); NFC. Dec 12 2018, 2:17 PM

Patch updated:

Add hasOneUse() checks and test, so we're not converting to funnel shift if there's potential for perf regression.
I went ahead and committed the PhaseOrdering test since it shows a bug independent of this patch (rL348980).

nikic added inline comments. Dec 12 2018, 2:30 PM

lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
95 ↗	(On Diff #177858)	Okay, that makes sense. For targets with rotate it's a trivial win, but for those without we can only perform the transform here because we're trading off a branch for two extra ands. In that case, I'm wondering if this transform should be restricted to power-of-two bitwidths. Otherwise this might end up introducing urems rather than simple masking (and looking at the current DAG code, we'd actually even add a select on zero, even though that doesn't seem necessary for rotates). Not that it really matters in the end, as I'm having a hard time imagining how one would end up with a non-power-of-two rotate.
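
The power-of-two concern can be illustrated with a small model, assuming a `width`-bit value stored in the low bits of a `uint32_t`. The function `rotr_any_width` is a made-up helper and not the DAG expansion itself. For a power-of-two width the amount can be reduced with a mask, but for a width like 12 it needs a remainder, and a zero amount has to be special-cased to avoid shifting by the full width.

```cpp
#include <cstdint>

// Hypothetical model of rotate-right on a `width`-bit value held in the low
// bits of a uint32_t. Assumes 1 <= width <= 32.
inline uint32_t rotr_any_width(uint32_t x, uint32_t n, uint32_t width) {
  uint32_t mask = width == 32u ? 0xFFFFFFFFu : ((1u << width) - 1u);
  x &= mask;
  uint32_t r = n % width;  // general widths need a remainder, not a mask
  if (r == 0)
    return x;              // the extra select on zero
  return ((x >> r) | (x << (width - r))) & mask;
}
```

For width 32, `n % 32` and `n & 31` agree for every n, which is why the power-of-two case reduces to simple masking.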

spatel marked an inline comment as done. Dec 12 2018, 3:13 PM

spatel added inline comments.

lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
95 ↗	(On Diff #177858)	Right - it's the targets that don't rotate that make this hard to handle for generic combining. If we had some kind of cost-model driven instcombine that ran before things like vectorization, we could be more flexible. Yes, we can restrict this to power-of-2 types. And yes, I don't know how we'd ever see that kind of op unless someone has a very strange target. :)

Patch updated:
Added check and test for non-power-of-2 type.

nikic added inline comments. Dec 16 2018, 12:47 PM

lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp

148 ↗	(On Diff #177959)

It looks like this is not the right way to replace a phi. For this test case:

define i32 @rotl(i32 %a, i32 %b) {
entry:
  %cmp = icmp eq i32 %b, 0
  br i1 %cmp, label %end, label %rotbb

rotbb:
  %sub = sub i32 32, %b
  %shr = lshr i32 %a, %sub
  %shl = shl i32 %a, %b
  %or = or i32 %shr, %shl
  br label %end

end:
  %cond = phi i32 [ %or, %rotbb ], [ %a, %entry ]
  %other = phi i32 [ 1, %rotbb ], [ 2, %entry ]
  %res = or i32 %cond, %other
  ret i32 %res
}

I get:

PHI nodes not grouped at top of basic block!
  %other = phi i32 [ 1, %rotbb ], [ 2, %entry ]
label %end
in function rotl
LLVM ERROR: Broken function found, compilation aborted!

Some possible alternatives from git grep "IRBuilder.*Phi":

lib/Transforms/Instrumentation/GCOVProfiling.cpp:          IRBuilder<> BuilderForPhi(&*BB.begin());
lib/Transforms/Scalar/IndVarSimplify.cpp:        IRBuilder<> Builder(&*WidePhi->getParent()->getFirstInsertionPt());
lib/Transforms/Utils/BypassSlowDivision.cpp:  IRBuilder<> Builder(PhiBB, PhiBB->begin());
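
The verifier failure above comes down to an ordering rule: every phi must sit at the top of its block, so a non-phi replacement has to be inserted after the whole phi group. The toy model below is not LLVM code; `ToyInst`, `firstNonPhiIndex` and `phisGroupedAtTop` are invented names that only mimic the idea of picking the first valid insertion point.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for an instruction: just a name and a phi/non-phi flag.
struct ToyInst {
  std::string name;
  bool isPhi;
};

// Index of the first non-phi instruction, i.e. the earliest position where a
// non-phi instruction may be inserted without splitting the phi group.
inline std::size_t firstNonPhiIndex(const std::vector<ToyInst> &block) {
  std::size_t i = 0;
  while (i < block.size() && block[i].isPhi)
    ++i;
  return i;
}

// Returns true if no phi appears after a non-phi instruction.
inline bool phisGroupedAtTop(const std::vector<ToyInst> &block) {
  bool seenNonPhi = false;
  for (const ToyInst &inst : block) {
    if (inst.isPhi && seenNonPhi)
      return false;
    if (!inst.isPhi)
      seenNonPhi = true;
  }
  return true;
}
```

Inserting the replacement at the position of the phi being replaced puts it ahead of `%other` in the test case, while inserting at `firstNonPhiIndex` keeps both phis grouped.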

spatel marked 2 inline comments as done. Dec 17 2018, 4:24 AM

spatel added inline comments.

lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
148 ↗	(On Diff #177959)	Yes, good catch. I'll add a regression test and fix that.

spatel mentioned this in rL349347: [AggressiveInstCombine] add test for rotate insertion point; NFC. Dec 17 2018, 4:39 AM

Patch updated:
Set builder insertion point to valid location for non-phi replacement instruction.

LGTM. As a future extension, it would be nice to also handle non-rotate funnel shifts.

One note: We're not guarding whether the RotBB is reachable *only* from GuardBB, so it may be that it can also be executed with a shift amount of zero. However, in this case the result will be undef due to the oversized shift and we can assume that it coincides with the value provided by this transformation, so there shouldn't be a problem.

This revision is now accepted and ready to land. Dec 17 2018, 5:28 AM

Closed by commit rL349396: [AggressiveInstCombine] convert rotate with guard branch into funnel shift… (authored by spatel). Dec 17 2018, 1:18 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp	97 lines
llvm/trunk/test/Transforms/AggressiveInstCombine/rotate.ll	318 lines
llvm/trunk/test/Transforms/PhaseOrdering/rotate.ll	13 lines

Diff 178520

llvm/trunk/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp

Show First 20 Lines • Show All 53 Lines • ▼ Show 20 Lines	public:
/// Run all expression pattern optimizations on the given /p F function.		/// Run all expression pattern optimizations on the given /p F function.
///		///
/// \param F function to optimize.		/// \param F function to optimize.
/// \returns true if the IR is changed.		/// \returns true if the IR is changed.
bool runOnFunction(Function &F) override;		bool runOnFunction(Function &F) override;
};		};
} // namespace		} // namespace

		/// Match a pattern for a bitwise rotate operation that partially guards
		/// against undefined behavior by branching around the rotation when the shift
		/// amount is 0.
		static bool foldGuardedRotateToFunnelShift(Instruction &I) {
		  if (I.getOpcode() != Instruction::PHI || I.getNumOperands() != 2)
		    return false;

		  // As with the one-use checks below, this is not strictly necessary, but we
		  // are being cautious to avoid potential perf regressions on targets that
		  // do not actually have a rotate instruction (where the funnel shift would be
		  // expanded back into math/shift/logic ops).
		  if (!isPowerOf2_32(I.getType()->getScalarSizeInBits()))
		    return false;

		  // Match V to funnel shift left/right and capture the source operand and
		  // shift amount in X and Y.
		  auto matchRotate = [](Value *V, Value *&X, Value *&Y) {
		    Value *L0, *L1, *R0, *R1;
		    unsigned Width = V->getType()->getScalarSizeInBits();
		    auto Sub = m_Sub(m_SpecificInt(Width), m_Value(R1));

		    // rotate_left(X, Y) == (X << Y) | (X >> (Width - Y))
		    auto RotL = m_OneUse(m_c_Or(m_Shl(m_Value(L0), m_Value(L1)),
		                                m_LShr(m_Value(R0), Sub)));
		    if (RotL.match(V) && L0 == R0 && L1 == R1) {
		      X = L0;
		      Y = L1;
		      return Intrinsic::fshl;
		    }

		    // rotate_right(X, Y) == (X >> Y) | (X << (Width - Y))
		    auto RotR = m_OneUse(m_c_Or(m_LShr(m_Value(L0), m_Value(L1)),
		                                m_Shl(m_Value(R0), Sub)));
		    if (RotR.match(V) && L0 == R0 && L1 == R1) {
		      X = L0;
		      Y = L1;
		      return Intrinsic::fshr;
		    }

		    return Intrinsic::not_intrinsic;
		  };

		  // One phi operand must be a rotate operation, and the other phi operand must
		  // be the source value of that rotate operation:
		  // phi [ rotate(RotSrc, RotAmt), RotBB ], [ RotSrc, GuardBB ]
		  PHINode &Phi = cast<PHINode>(I);
		  Value *P0 = Phi.getOperand(0), *P1 = Phi.getOperand(1);
		  Value *RotSrc, *RotAmt;
		  Intrinsic::ID IID = matchRotate(P0, RotSrc, RotAmt);
		  if (IID == Intrinsic::not_intrinsic || RotSrc != P1) {
		    IID = matchRotate(P1, RotSrc, RotAmt);
		    if (IID == Intrinsic::not_intrinsic || RotSrc != P0)
		      return false;
		    assert((IID == Intrinsic::fshl || IID == Intrinsic::fshr) &&
		           "Pattern must match funnel shift left or right");
		  }

		  // The incoming block with our source operand must be the "guard" block.
		  // That must contain a cmp+branch to avoid the rotate when the shift amount
		  // is equal to 0. The other incoming block is the block with the rotate.
		  BasicBlock *GuardBB = Phi.getIncomingBlock(RotSrc == P1);
		  BasicBlock *RotBB = Phi.getIncomingBlock(RotSrc != P1);
		  Instruction *TermI = GuardBB->getTerminator();
		  BasicBlock *TrueBB, *FalseBB;
		  ICmpInst::Predicate Pred;
		  if (!match(TermI, m_Br(m_ICmp(Pred, m_Specific(RotAmt), m_ZeroInt()),
		                         TrueBB, FalseBB)))
		    return false;

		  BasicBlock *PhiBB = Phi.getParent();
		  if (Pred != CmpInst::ICMP_EQ || TrueBB != PhiBB || FalseBB != RotBB)
		    return false;

		  // We matched a variation of this IR pattern:
		  // GuardBB:
		  //   %cmp = icmp eq i32 %RotAmt, 0
		  //   br i1 %cmp, label %PhiBB, label %RotBB
		  // RotBB:
		  //   %sub = sub i32 32, %RotAmt
		  //   %shr = lshr i32 %X, %sub
		  //   %shl = shl i32 %X, %RotAmt
		  //   %rot = or i32 %shr, %shl
		  //   br label %PhiBB
		  // PhiBB:
		  //   %cond = phi i32 [ %rot, %RotBB ], [ %X, %GuardBB ]
		  // -->
		  // llvm.fshl.i32(i32 %X, i32 %X, i32 %RotAmt)
		  IRBuilder<> Builder(PhiBB, PhiBB->getFirstInsertionPt());
		  Function *F = Intrinsic::getDeclaration(Phi.getModule(), IID, Phi.getType());
		  Phi.replaceAllUsesWith(Builder.CreateCall(F, {RotSrc, RotSrc, RotAmt}));
		  return true;
		}

/// This is used by foldAnyOrAllBitsSet() to capture a source value (Root) and		/// This is used by foldAnyOrAllBitsSet() to capture a source value (Root) and
/// the bit indexes (Mask) needed by a masked compare. If we're matching a chain		/// the bit indexes (Mask) needed by a masked compare. If we're matching a chain
/// of 'and' ops, then we also need to capture the fact that we saw an		/// of 'and' ops, then we also need to capture the fact that we saw an
/// "and X, 1", so that's an extra return value for that case.		/// "and X, 1", so that's an extra return value for that case.
struct MaskOps {		struct MaskOps {
Value *Root;		Value *Root;
APInt Mask;		APInt Mask;
bool MatchAndChain;		bool MatchAndChain;
▲ Show 20 Lines • Show All 99 Lines • ▼ Show 20 Lines	for (BasicBlock &BB : F) {
if (!DT.isReachableFromEntry(&BB))		if (!DT.isReachableFromEntry(&BB))
continue;		continue;
// Do not delete instructions under here and invalidate the iterator.		// Do not delete instructions under here and invalidate the iterator.
// Walk the block backwards for efficiency. We're matching a chain of		// Walk the block backwards for efficiency. We're matching a chain of
// use->defs, so we're more likely to succeed by starting from the bottom.		// use->defs, so we're more likely to succeed by starting from the bottom.
// Also, we want to avoid matching partial patterns.		// Also, we want to avoid matching partial patterns.
// TODO: It would be more efficient if we removed dead instructions		// TODO: It would be more efficient if we removed dead instructions
// iteratively in this loop rather than waiting until the end.		// iteratively in this loop rather than waiting until the end.
for (Instruction &I : make_range(BB.rbegin(), BB.rend()))		for (Instruction &I : make_range(BB.rbegin(), BB.rend())) {
MadeChange |= foldAnyOrAllBitsSet(I);		MadeChange |= foldAnyOrAllBitsSet(I);
		MadeChange |= foldGuardedRotateToFunnelShift(I);
		}
}		}

// We're done with transforms, so remove dead instructions.		// We're done with transforms, so remove dead instructions.
if (MadeChange)		if (MadeChange)
for (BasicBlock &BB : F)		for (BasicBlock &BB : F)
SimplifyInstructionsInBlock(&BB);		SimplifyInstructionsInBlock(&BB);

return MadeChange;		return MadeChange;
▲ Show 20 Lines • Show All 71 Lines • Show Last 20 Lines

llvm/trunk/test/Transforms/AggressiveInstCombine/rotate.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -aggressive-instcombine -S | FileCheck %s			; RUN: opt < %s -aggressive-instcombine -S | FileCheck %s

	; https://bugs.llvm.org/show_bug.cgi?id=34924			; https://bugs.llvm.org/show_bug.cgi?id=34924

	define i32 @rotl(i32 %a, i32 %b) {			define i32 @rotl(i32 %a, i32 %b) {
	; CHECK-LABEL: @rotl(			; CHECK-LABEL: @rotl(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[B:%.*]], 0			; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[B:%.*]], 0
	; CHECK-NEXT: br i1 [[CMP]], label [[END:%.*]], label [[ROTBB:%.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[END:%.*]], label [[ROTBB:%.*]]
	; CHECK: rotbb:			; CHECK: rotbb:
	; CHECK-NEXT: [[SUB:%.*]] = sub i32 32, [[B]]
	; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[A:%.*]], [[SUB]]
	; CHECK-NEXT: [[SHL:%.*]] = shl i32 [[A]], [[B]]
	; CHECK-NEXT: [[OR:%.*]] = or i32 [[SHR]], [[SHL]]
	; CHECK-NEXT: br label [[END]]			; CHECK-NEXT: br label [[END]]
	; CHECK: end:			; CHECK: end:
	; CHECK-NEXT: [[COND:%.*]] = phi i32 [ [[OR]], [[ROTBB]] ], [ [[A]], [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP0:%.*]] = call i32 @llvm.fshl.i32(i32 [[A:%.*]], i32 [[A]], i32 [[B]])
	; CHECK-NEXT: ret i32 [[COND]]			; CHECK-NEXT: ret i32 [[TMP0]]
	;			;
	entry:			entry:
	%cmp = icmp eq i32 %b, 0			%cmp = icmp eq i32 %b, 0
	br i1 %cmp, label %end, label %rotbb			br i1 %cmp, label %end, label %rotbb

	rotbb:			rotbb:
	%sub = sub i32 32, %b			%sub = sub i32 32, %b
	%shr = lshr i32 %a, %sub			%shr = lshr i32 %a, %sub
	%shl = shl i32 %a, %b			%shl = shl i32 %a, %b
	%or = or i32 %shr, %shl			%or = or i32 %shr, %shl
	br label %end			br label %end

	end:			end:
	%cond = phi i32 [ %or, %rotbb ], [ %a, %entry ]			%cond = phi i32 [ %or, %rotbb ], [ %a, %entry ]
	ret i32 %cond			ret i32 %cond
	}			}

	define i32 @rotl_commute_phi(i32 %a, i32 %b) {			define i32 @rotl_commute_phi(i32 %a, i32 %b) {
	; CHECK-LABEL: @rotl_commute_phi(			; CHECK-LABEL: @rotl_commute_phi(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[B:%.*]], 0			; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[B:%.*]], 0
	; CHECK-NEXT: br i1 [[CMP]], label [[END:%.*]], label [[ROTBB:%.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[END:%.*]], label [[ROTBB:%.*]]
	; CHECK: rotbb:			; CHECK: rotbb:
	; CHECK-NEXT: [[SUB:%.*]] = sub i32 32, [[B]]
	; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[A:%.*]], [[SUB]]
	; CHECK-NEXT: [[SHL:%.*]] = shl i32 [[A]], [[B]]
	; CHECK-NEXT: [[OR:%.*]] = or i32 [[SHR]], [[SHL]]
	; CHECK-NEXT: br label [[END]]			; CHECK-NEXT: br label [[END]]
	; CHECK: end:			; CHECK: end:
	; CHECK-NEXT: [[COND:%.*]] = phi i32 [ [[A]], [[ENTRY:%.*]] ], [ [[OR]], [[ROTBB]] ]			; CHECK-NEXT: [[TMP0:%.*]] = call i32 @llvm.fshl.i32(i32 [[A:%.*]], i32 [[A]], i32 [[B]])
	; CHECK-NEXT: ret i32 [[COND]]			; CHECK-NEXT: ret i32 [[TMP0]]
	;			;
	entry:			entry:
	%cmp = icmp eq i32 %b, 0			%cmp = icmp eq i32 %b, 0
	br i1 %cmp, label %end, label %rotbb			br i1 %cmp, label %end, label %rotbb

	rotbb:			rotbb:
	%sub = sub i32 32, %b			%sub = sub i32 32, %b
	%shr = lshr i32 %a, %sub			%shr = lshr i32 %a, %sub
	%shl = shl i32 %a, %b			%shl = shl i32 %a, %b
	%or = or i32 %shr, %shl			%or = or i32 %shr, %shl
	br label %end			br label %end

	end:			end:
	%cond = phi i32 [ %a, %entry ], [ %or, %rotbb ]			%cond = phi i32 [ %a, %entry ], [ %or, %rotbb ]
	ret i32 %cond			ret i32 %cond
	}			}

	define i32 @rotl_commute_or(i32 %a, i32 %b) {			define i32 @rotl_commute_or(i32 %a, i32 %b) {
	; CHECK-LABEL: @rotl_commute_or(			; CHECK-LABEL: @rotl_commute_or(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[B:%.*]], 0			; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[B:%.*]], 0
	; CHECK-NEXT: br i1 [[CMP]], label [[END:%.*]], label [[ROTBB:%.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[END:%.*]], label [[ROTBB:%.*]]
	; CHECK: rotbb:			; CHECK: rotbb:
	; CHECK-NEXT: [[SUB:%.*]] = sub i32 32, [[B]]
	; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[A:%.*]], [[SUB]]
	; CHECK-NEXT: [[SHL:%.*]] = shl i32 [[A]], [[B]]
	; CHECK-NEXT: [[OR:%.*]] = or i32 [[SHL]], [[SHR]]
	; CHECK-NEXT: br label [[END]]			; CHECK-NEXT: br label [[END]]
	; CHECK: end:			; CHECK: end:
	; CHECK-NEXT: [[COND:%.*]] = phi i32 [ [[A]], [[ENTRY:%.*]] ], [ [[OR]], [[ROTBB]] ]			; CHECK-NEXT: [[TMP0:%.*]] = call i32 @llvm.fshl.i32(i32 [[A:%.*]], i32 [[A]], i32 [[B]])
	; CHECK-NEXT: ret i32 [[COND]]			; CHECK-NEXT: ret i32 [[TMP0]]
	;			;
	entry:			entry:
	%cmp = icmp eq i32 %b, 0			%cmp = icmp eq i32 %b, 0
	br i1 %cmp, label %end, label %rotbb			br i1 %cmp, label %end, label %rotbb

	rotbb:			rotbb:
	%sub = sub i32 32, %b			%sub = sub i32 32, %b
	%shr = lshr i32 %a, %sub			%shr = lshr i32 %a, %sub
	Show All 9 Lines
	; Verify that the intrinsic is inserted into a valid position.			; Verify that the intrinsic is inserted into a valid position.

	define i32 @rotl_insert_valid_location(i32 %a, i32 %b) {			define i32 @rotl_insert_valid_location(i32 %a, i32 %b) {
	; CHECK-LABEL: @rotl_insert_valid_location(			; CHECK-LABEL: @rotl_insert_valid_location(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[B:%.*]], 0			; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[B:%.*]], 0
	; CHECK-NEXT: br i1 [[CMP]], label [[END:%.*]], label [[ROTBB:%.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[END:%.*]], label [[ROTBB:%.*]]
	; CHECK: rotbb:			; CHECK: rotbb:
	; CHECK-NEXT: [[SUB:%.*]] = sub i32 32, [[B]]
	; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[A:%.*]], [[SUB]]
	; CHECK-NEXT: [[SHL:%.*]] = shl i32 [[A]], [[B]]
	; CHECK-NEXT: [[OR:%.*]] = or i32 [[SHR]], [[SHL]]
	; CHECK-NEXT: br label [[END]]			; CHECK-NEXT: br label [[END]]
	; CHECK: end:			; CHECK: end:
	; CHECK-NEXT: [[COND:%.*]] = phi i32 [ [[OR]], [[ROTBB]] ], [ [[A]], [[ENTRY:%.*]] ]			; CHECK-NEXT: [[OTHER:%.*]] = phi i32 [ 1, [[ROTBB]] ], [ 2, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[OTHER:%.*]] = phi i32 [ 1, [[ROTBB]] ], [ 2, [[ENTRY]] ]			; CHECK-NEXT: [[TMP0:%.*]] = call i32 @llvm.fshl.i32(i32 [[A:%.*]], i32 [[A]], i32 [[B]])
	; CHECK-NEXT: [[RES:%.*]] = or i32 [[COND]], [[OTHER]]			; CHECK-NEXT: [[RES:%.*]] = or i32 [[TMP0]], [[OTHER]]
	; CHECK-NEXT: ret i32 [[RES]]			; CHECK-NEXT: ret i32 [[RES]]
	;			;
	entry:			entry:
	%cmp = icmp eq i32 %b, 0			%cmp = icmp eq i32 %b, 0
	br i1 %cmp, label %end, label %rotbb			br i1 %cmp, label %end, label %rotbb

	rotbb:			rotbb:
	%sub = sub i32 32, %b			%sub = sub i32 32, %b
	Show All 10 Lines
	}			}

	define i32 @rotr(i32 %a, i32 %b) {			define i32 @rotr(i32 %a, i32 %b) {
	; CHECK-LABEL: @rotr(			; CHECK-LABEL: @rotr(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[B:%.*]], 0			; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[B:%.*]], 0
	; CHECK-NEXT: br i1 [[CMP]], label [[END:%.*]], label [[ROTBB:%.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[END:%.*]], label [[ROTBB:%.*]]
	; CHECK: rotbb:			; CHECK: rotbb:
	; CHECK-NEXT: [[SUB:%.*]] = sub i32 32, [[B]]
	; CHECK-NEXT: [[SHL:%.*]] = shl i32 [[A:%.*]], [[SUB]]
	; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[A]], [[B]]
	; CHECK-NEXT: [[OR:%.*]] = or i32 [[SHR]], [[SHL]]
	; CHECK-NEXT: br label [[END]]			; CHECK-NEXT: br label [[END]]
	; CHECK: end:			; CHECK: end:
	; CHECK-NEXT: [[COND:%.*]] = phi i32 [ [[OR]], [[ROTBB]] ], [ [[A]], [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP0:%.*]] = call i32 @llvm.fshr.i32(i32 [[A:%.*]], i32 [[A]], i32 [[B]])
	; CHECK-NEXT: ret i32 [[COND]]			; CHECK-NEXT: ret i32 [[TMP0]]
	;			;
	entry:			entry:
	%cmp = icmp eq i32 %b, 0			%cmp = icmp eq i32 %b, 0
	br i1 %cmp, label %end, label %rotbb			br i1 %cmp, label %end, label %rotbb

	rotbb:			rotbb:
	%sub = sub i32 32, %b			%sub = sub i32 32, %b
	%shl = shl i32 %a, %sub			%shl = shl i32 %a, %sub
	%shr = lshr i32 %a, %b			%shr = lshr i32 %a, %b
	%or = or i32 %shr, %shl			%or = or i32 %shr, %shl
	br label %end			br label %end

	end:			end:
	%cond = phi i32 [ %or, %rotbb ], [ %a, %entry ]			%cond = phi i32 [ %or, %rotbb ], [ %a, %entry ]
	ret i32 %cond			ret i32 %cond
	}			}

	define i32 @rotr_commute_phi(i32 %a, i32 %b) {			define i32 @rotr_commute_phi(i32 %a, i32 %b) {
	; CHECK-LABEL: @rotr_commute_phi(			; CHECK-LABEL: @rotr_commute_phi(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[B:%.*]], 0			; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[B:%.*]], 0
	; CHECK-NEXT: br i1 [[CMP]], label [[END:%.*]], label [[ROTBB:%.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[END:%.*]], label [[ROTBB:%.*]]
	; CHECK: rotbb:			; CHECK: rotbb:
				; CHECK-NEXT: br label [[END]]
				; CHECK: end:
				; CHECK-NEXT: [[TMP0:%.*]] = call i32 @llvm.fshr.i32(i32 [[A:%.*]], i32 [[A]], i32 [[B]])
				; CHECK-NEXT: ret i32 [[TMP0]]
				;
				entry:
				%cmp = icmp eq i32 %b, 0
				br i1 %cmp, label %end, label %rotbb

				rotbb:
				%sub = sub i32 32, %b
				%shl = shl i32 %a, %sub
				%shr = lshr i32 %a, %b
				%or = or i32 %shr, %shl
				br label %end

				end:
				%cond = phi i32 [ %a, %entry ], [ %or, %rotbb ]
				ret i32 %cond
				}

				define i32 @rotr_commute_or(i32 %a, i32 %b) {
				; CHECK-LABEL: @rotr_commute_or(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[B:%.*]], 0
				; CHECK-NEXT: br i1 [[CMP]], label [[END:%.*]], label [[ROTBB:%.*]]
				; CHECK: rotbb:
				; CHECK-NEXT: br label [[END]]
				; CHECK: end:
				; CHECK-NEXT: [[TMP0:%.*]] = call i32 @llvm.fshr.i32(i32 [[A:%.*]], i32 [[A]], i32 [[B]])
				; CHECK-NEXT: ret i32 [[TMP0]]
				;
				entry:
				%cmp = icmp eq i32 %b, 0
				br i1 %cmp, label %end, label %rotbb

				rotbb:
				%sub = sub i32 32, %b
				%shl = shl i32 %a, %sub
				%shr = lshr i32 %a, %b
				%or = or i32 %shl, %shr
				br label %end

				end:
				%cond = phi i32 [ %a, %entry ], [ %or, %rotbb ]
				ret i32 %cond
				}

				; Negative test - non-power-of-2 might require urem expansion in the backend.

				define i12 @could_be_rotr_weird_type(i12 %a, i12 %b) {
				; CHECK-LABEL: @could_be_rotr_weird_type(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[CMP:%.*]] = icmp eq i12 [[B:%.*]], 0
				; CHECK-NEXT: br i1 [[CMP]], label [[END:%.*]], label [[ROTBB:%.*]]
				; CHECK: rotbb:
				; CHECK-NEXT: [[SUB:%.*]] = sub i12 12, [[B]]
				; CHECK-NEXT: [[SHL:%.*]] = shl i12 [[A:%.*]], [[SUB]]
				; CHECK-NEXT: [[SHR:%.*]] = lshr i12 [[A]], [[B]]
				; CHECK-NEXT: [[OR:%.*]] = or i12 [[SHL]], [[SHR]]
				; CHECK-NEXT: br label [[END]]
				; CHECK: end:
				; CHECK-NEXT: [[COND:%.*]] = phi i12 [ [[A]], [[ENTRY:%.*]] ], [ [[OR]], [[ROTBB]] ]
				; CHECK-NEXT: ret i12 [[COND]]
				;
				entry:
				%cmp = icmp eq i12 %b, 0
				br i1 %cmp, label %end, label %rotbb

				rotbb:
				%sub = sub i12 12, %b
				%shl = shl i12 %a, %sub
				%shr = lshr i12 %a, %b
				%or = or i12 %shl, %shr
				br label %end

				end:
				%cond = phi i12 [ %a, %entry ], [ %or, %rotbb ]
				ret i12 %cond
				}

				; Negative test - wrong phi ops.

				define i32 @not_rotr_1(i32 %a, i32 %b) {
				; CHECK-LABEL: @not_rotr_1(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[B:%.*]], 0
				; CHECK-NEXT: br i1 [[CMP]], label [[END:%.*]], label [[ROTBB:%.*]]
				; CHECK: rotbb:
				; CHECK-NEXT: [[SUB:%.*]] = sub i32 32, [[B]]
				; CHECK-NEXT: [[SHL:%.*]] = shl i32 [[A:%.*]], [[SUB]]
				; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[A]], [[B]]
				; CHECK-NEXT: [[OR:%.*]] = or i32 [[SHL]], [[SHR]]
				; CHECK-NEXT: br label [[END]]
				; CHECK: end:
				; CHECK-NEXT: [[COND:%.*]] = phi i32 [ [[B]], [[ENTRY:%.*]] ], [ [[OR]], [[ROTBB]] ]
				; CHECK-NEXT: ret i32 [[COND]]
				;
				entry:
				%cmp = icmp eq i32 %b, 0
				br i1 %cmp, label %end, label %rotbb

				rotbb:
				%sub = sub i32 32, %b
				%shl = shl i32 %a, %sub
				%shr = lshr i32 %a, %b
				%or = or i32 %shl, %shr
				br label %end

				end:
				%cond = phi i32 [ %b, %entry ], [ %or, %rotbb ]
				ret i32 %cond
				}

				; Negative test - too many phi ops.

				define i32 @not_rotr_2(i32 %a, i32 %b, i32 %c) {
				; CHECK-LABEL: @not_rotr_2(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[CMP:%.]] = icmp eq i32 [[B:%.]], 0
				; CHECK-NEXT: br i1 [[CMP]], label [[END:%.]], label [[ROTBB:%.]]
				; CHECK: rotbb:
	; CHECK-NEXT: [[SUB:%.*]] = sub i32 32, [[B]]			; CHECK-NEXT: [[SUB:%.*]] = sub i32 32, [[B]]
	; CHECK-NEXT: [[SHL:%.*]] = shl i32 [[A:%.*]], [[SUB]]			; CHECK-NEXT: [[SHL:%.*]] = shl i32 [[A:%.*]], [[SUB]]
	; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[A]], [[B]]			; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[A]], [[B]]
	; CHECK-NEXT: [[OR:%.*]] = or i32 [[SHR]], [[SHL]]			; CHECK-NEXT: [[OR:%.*]] = or i32 [[SHL]], [[SHR]]
				; CHECK-NEXT: [[CMP42:%.*]] = icmp ugt i32 [[OR]], 42
				; CHECK-NEXT: br i1 [[CMP42]], label [[END]], label [[BOGUS:%.*]]
				; CHECK: bogus:
				; CHECK-NEXT: br label [[END]]
				; CHECK: end:
				; CHECK-NEXT: [[COND:%.*]] = phi i32 [ [[A]], [[ENTRY:%.*]] ], [ [[OR]], [[ROTBB]] ], [ [[C:%.*]], [[BOGUS]] ]
				; CHECK-NEXT: ret i32 [[COND]]
				;
				entry:
				%cmp = icmp eq i32 %b, 0
				br i1 %cmp, label %end, label %rotbb

				rotbb:
				%sub = sub i32 32, %b
				%shl = shl i32 %a, %sub
				%shr = lshr i32 %a, %b
				%or = or i32 %shl, %shr
				%cmp42 = icmp ugt i32 %or, 42
				br i1 %cmp42, label %end, label %bogus

				bogus:
				br label %end

				end:
				%cond = phi i32 [ %a, %entry ], [ %or, %rotbb ], [ %c, %bogus ]
				ret i32 %cond
				}

				; Negative test - wrong cmp (but this should match?).

				define i32 @not_rotr_3(i32 %a, i32 %b) {
				; CHECK-LABEL: @not_rotr_3(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[CMP:%.*]] = icmp sle i32 [[B:%.*]], 0
				; CHECK-NEXT: br i1 [[CMP]], label [[END:%.*]], label [[ROTBB:%.*]]
				; CHECK: rotbb:
				; CHECK-NEXT: [[SUB:%.*]] = sub i32 32, [[B]]
				; CHECK-NEXT: [[SHL:%.*]] = shl i32 [[A:%.*]], [[SUB]]
				; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[A]], [[B]]
				; CHECK-NEXT: [[OR:%.*]] = or i32 [[SHL]], [[SHR]]
				; CHECK-NEXT: br label [[END]]
				; CHECK: end:
				; CHECK-NEXT: [[COND:%.*]] = phi i32 [ [[A]], [[ENTRY:%.*]] ], [ [[OR]], [[ROTBB]] ]
				; CHECK-NEXT: ret i32 [[COND]]
				;
				entry:
				%cmp = icmp sle i32 %b, 0
				br i1 %cmp, label %end, label %rotbb

				rotbb:
				%sub = sub i32 32, %b
				%shl = shl i32 %a, %sub
				%shr = lshr i32 %a, %b
				%or = or i32 %shl, %shr
				br label %end

				end:
				%cond = phi i32 [ %a, %entry ], [ %or, %rotbb ]
				ret i32 %cond
				}

				; Negative test - wrong shift.

				define i32 @not_rotr_4(i32 %a, i32 %b) {
				; CHECK-LABEL: @not_rotr_4(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[B:%.*]], 0
				; CHECK-NEXT: br i1 [[CMP]], label [[END:%.*]], label [[ROTBB:%.*]]
				; CHECK: rotbb:
				; CHECK-NEXT: [[SUB:%.*]] = sub i32 32, [[B]]
				; CHECK-NEXT: [[SHL:%.*]] = shl i32 [[A:%.*]], [[SUB]]
				; CHECK-NEXT: [[SHR:%.*]] = ashr i32 [[A]], [[B]]
				; CHECK-NEXT: [[OR:%.*]] = or i32 [[SHL]], [[SHR]]
	; CHECK-NEXT: br label [[END]]			; CHECK-NEXT: br label [[END]]
	; CHECK: end:			; CHECK: end:
	; CHECK-NEXT: [[COND:%.*]] = phi i32 [ [[A]], [[ENTRY:%.*]] ], [ [[OR]], [[ROTBB]] ]			; CHECK-NEXT: [[COND:%.*]] = phi i32 [ [[A]], [[ENTRY:%.*]] ], [ [[OR]], [[ROTBB]] ]
	; CHECK-NEXT: ret i32 [[COND]]			; CHECK-NEXT: ret i32 [[COND]]
	;			;
	entry:			entry:
	%cmp = icmp eq i32 %b, 0			%cmp = icmp eq i32 %b, 0
	br i1 %cmp, label %end, label %rotbb			br i1 %cmp, label %end, label %rotbb

	rotbb:			rotbb:
	%sub = sub i32 32, %b			%sub = sub i32 32, %b
	%shl = shl i32 %a, %sub			%shl = shl i32 %a, %sub
				%shr = ashr i32 %a, %b
				%or = or i32 %shl, %shr
				br label %end

				end:
				%cond = phi i32 [ %a, %entry ], [ %or, %rotbb ]
				ret i32 %cond
				}

				; Negative test - wrong shift.

				define i32 @not_rotr_5(i32 %a, i32 %b) {
				; CHECK-LABEL: @not_rotr_5(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[B:%.*]], 0
				; CHECK-NEXT: br i1 [[CMP]], label [[END:%.*]], label [[ROTBB:%.*]]
				; CHECK: rotbb:
				; CHECK-NEXT: [[SUB:%.*]] = sub i32 32, [[B]]
				; CHECK-NEXT: [[SHL:%.*]] = shl i32 [[B]], [[SUB]]
				; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[A:%.*]], [[B]]
				; CHECK-NEXT: [[OR:%.*]] = or i32 [[SHL]], [[SHR]]
				; CHECK-NEXT: br label [[END]]
				; CHECK: end:
				; CHECK-NEXT: [[COND:%.*]] = phi i32 [ [[A]], [[ENTRY:%.*]] ], [ [[OR]], [[ROTBB]] ]
				; CHECK-NEXT: ret i32 [[COND]]
				;
				entry:
				%cmp = icmp eq i32 %b, 0
				br i1 %cmp, label %end, label %rotbb

				rotbb:
				%sub = sub i32 32, %b
				%shl = shl i32 %b, %sub
	%shr = lshr i32 %a, %b			%shr = lshr i32 %a, %b
	%or = or i32 %shr, %shl			%or = or i32 %shl, %shr
	br label %end			br label %end

	end:			end:
	%cond = phi i32 [ %a, %entry ], [ %or, %rotbb ]			%cond = phi i32 [ %a, %entry ], [ %or, %rotbb ]
	ret i32 %cond			ret i32 %cond
	}			}

	define i32 @rotr_commute_or(i32 %a, i32 %b) {			; Negative test - wrong sub.
	; CHECK-LABEL: @rotr_commute_or(
				define i32 @not_rotr_6(i32 %a, i32 %b) {
				; CHECK-LABEL: @not_rotr_6(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[B:%.*]], 0
				; CHECK-NEXT: br i1 [[CMP]], label [[END:%.*]], label [[ROTBB:%.*]]
				; CHECK: rotbb:
				; CHECK-NEXT: [[SUB:%.*]] = sub i32 8, [[B]]
				; CHECK-NEXT: [[SHL:%.*]] = shl i32 [[A:%.*]], [[SUB]]
				; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[A]], [[B]]
				; CHECK-NEXT: [[OR:%.*]] = or i32 [[SHL]], [[SHR]]
				; CHECK-NEXT: br label [[END]]
				; CHECK: end:
				; CHECK-NEXT: [[COND:%.*]] = phi i32 [ [[A]], [[ENTRY:%.*]] ], [ [[OR]], [[ROTBB]] ]
				; CHECK-NEXT: ret i32 [[COND]]
				;
				entry:
				%cmp = icmp eq i32 %b, 0
				br i1 %cmp, label %end, label %rotbb

				rotbb:
				%sub = sub i32 8, %b
				%shl = shl i32 %a, %sub
				%shr = lshr i32 %a, %b
				%or = or i32 %shl, %shr
				br label %end

				end:
				%cond = phi i32 [ %a, %entry ], [ %or, %rotbb ]
				ret i32 %cond
				}

				; Negative test - extra use. Technically, we could transform this
				; because it doesn't increase the instruction count, but we're
				; being cautious not to cause a potential perf pessimization for
				; targets that do not have a rotate instruction.

				define i32 @could_be_rotr(i32 %a, i32 %b, i32* %p) {
				; CHECK-LABEL: @could_be_rotr(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[B:%.*]], 0			; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[B:%.*]], 0
	; CHECK-NEXT: br i1 [[CMP]], label [[END:%.*]], label [[ROTBB:%.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[END:%.*]], label [[ROTBB:%.*]]
	; CHECK: rotbb:			; CHECK: rotbb:
	; CHECK-NEXT: [[SUB:%.*]] = sub i32 32, [[B]]			; CHECK-NEXT: [[SUB:%.*]] = sub i32 32, [[B]]
	; CHECK-NEXT: [[SHL:%.*]] = shl i32 [[A:%.*]], [[SUB]]			; CHECK-NEXT: [[SHL:%.*]] = shl i32 [[A:%.*]], [[SUB]]
	; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[A]], [[B]]			; CHECK-NEXT: [[SHR:%.*]] = lshr i32 [[A]], [[B]]
	; CHECK-NEXT: [[OR:%.*]] = or i32 [[SHL]], [[SHR]]			; CHECK-NEXT: [[OR:%.*]] = or i32 [[SHL]], [[SHR]]
				; CHECK-NEXT: store i32 [[OR]], i32* [[P:%.*]]
	; CHECK-NEXT: br label [[END]]			; CHECK-NEXT: br label [[END]]
	; CHECK: end:			; CHECK: end:
	; CHECK-NEXT: [[COND:%.*]] = phi i32 [ [[A]], [[ENTRY:%.*]] ], [ [[OR]], [[ROTBB]] ]			; CHECK-NEXT: [[COND:%.*]] = phi i32 [ [[A]], [[ENTRY:%.*]] ], [ [[OR]], [[ROTBB]] ]
	; CHECK-NEXT: ret i32 [[COND]]			; CHECK-NEXT: ret i32 [[COND]]
	;			;
	entry:			entry:
	%cmp = icmp eq i32 %b, 0			%cmp = icmp eq i32 %b, 0
	br i1 %cmp, label %end, label %rotbb			br i1 %cmp, label %end, label %rotbb

	rotbb:			rotbb:
	%sub = sub i32 32, %b			%sub = sub i32 32, %b
	%shl = shl i32 %a, %sub			%shl = shl i32 %a, %sub
	%shr = lshr i32 %a, %b			%shr = lshr i32 %a, %b
	%or = or i32 %shl, %shr			%or = or i32 %shl, %shr
				store i32 %or, i32* %p
	br label %end			br label %end

	end:			end:
	%cond = phi i32 [ %a, %entry ], [ %or, %rotbb ]			%cond = phi i32 [ %a, %entry ], [ %or, %rotbb ]
	ret i32 %cond			ret i32 %cond
	}			}
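
As a reading aid for the negative tests above, the following C++ sketch models the guarded rotate-right pattern and three of the rejected variants (`@not_rotr_1`, `@not_rotr_4`, `@not_rotr_6`). The function names are hypothetical and are not part of the patch. The inputs in the usage note are chosen so that every shift amount stays in the defined range for a 32-bit value.

```cpp
#include <cstdint>

// Guarded rotate-right, as in the positive tests: the branch skips the
// shifts when b == 0, because shifting a 32-bit value by 32 is undefined.
// Only valid for 0 <= b < 32.
uint32_t guarded_rotr(uint32_t a, uint32_t b) {
  if (b == 0)
    return a;
  return (a << (32 - b)) | (a >> b);
}

// Model of @not_rotr_1 (wrong phi operand): the guard path yields b, not a.
uint32_t wrong_phi(uint32_t a, uint32_t b) {
  if (b == 0)
    return b;
  return (a << (32 - b)) | (a >> b);
}

// Arithmetic shift right written out on unsigned values so the sketch does
// not depend on how signed right shifts behave. Only valid for 0 < b < 32.
uint32_t ashr32(uint32_t a, uint32_t b) {
  uint32_t logical = a >> b;
  uint32_t sign_fill = (a & 0x80000000u) ? ~(0xFFFFFFFFu >> b) : 0u;
  return logical | sign_fill;
}

// Model of @not_rotr_4 (wrong shift): ashr replaces lshr.
uint32_t wrong_shift(uint32_t a, uint32_t b) {
  if (b == 0)
    return a;
  return (a << (32 - b)) | ashr32(a, b);
}

// Model of @not_rotr_6 (wrong sub): subtracts from 8 instead of the bit
// width. Only call with 0 <= b <= 8 to keep the left shift defined.
uint32_t wrong_sub(uint32_t a, uint32_t b) {
  if (b == 0)
    return a;
  return (a << (8 - b)) | (a >> b);
}
```

For example, `guarded_rotr(0xF0, 4)` gives `0xF` while `wrong_sub(0xF0, 4)` gives `0xF0F`, and `guarded_rotr(0x80000000, 4)` gives `0x08000000` while `wrong_shift(0x80000000, 4)` gives `0xF8000000`. Each variant disagrees with a true rotate for some input, which is why the tests expect the IR to be left unchanged.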

llvm/trunk/test/Transforms/PhaseOrdering/rotate.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -O3 -S < %s | FileCheck %s --check-prefixes=ANY,OLDPM			; RUN: opt -O3 -S < %s | FileCheck %s --check-prefixes=ANY,OLDPM
	; RUN: opt -passes='default<O3>' -S < %s | FileCheck %s --check-prefixes=ANY,NEWPM			; RUN: opt -passes='default<O3>' -S < %s | FileCheck %s --check-prefixes=ANY,NEWPM

	; This should become a single funnel shift through a combination			; This should become a single funnel shift through a combination
	; of aggressive-instcombine, simplifycfg, and instcombine.			; of aggressive-instcombine, simplifycfg, and instcombine.
	; https://bugs.llvm.org/show_bug.cgi?id=34924			; https://bugs.llvm.org/show_bug.cgi?id=34924

	define i32 @rotl(i32 %a, i32 %b) {			define i32 @rotl(i32 %a, i32 %b) {
	; OLDPM-LABEL: @rotl(			; OLDPM-LABEL: @rotl(
	; OLDPM-NEXT: entry:			; OLDPM-NEXT: entry:
	; OLDPM-NEXT: [[CMP:%.*]] = icmp eq i32 [[B:%.*]], 0			; OLDPM-NEXT: [[TMP0:%.*]] = tail call i32 @llvm.fshl.i32(i32 [[A:%.*]], i32 [[A]], i32 [[B:%.*]])
	; OLDPM-NEXT: br i1 [[CMP]], label [[END:%.*]], label [[ROTBB:%.*]]			; OLDPM-NEXT: ret i32 [[TMP0]]
	; OLDPM: rotbb:
	; OLDPM-NEXT: [[SUB:%.*]] = sub i32 32, [[B]]
	; OLDPM-NEXT: [[SHR:%.*]] = lshr i32 [[A:%.*]], [[SUB]]
	; OLDPM-NEXT: [[SHL:%.*]] = shl i32 [[A]], [[B]]
	; OLDPM-NEXT: [[OR:%.*]] = or i32 [[SHR]], [[SHL]]
	; OLDPM-NEXT: br label [[END]]
	; OLDPM: end:
	; OLDPM-NEXT: [[COND:%.*]] = phi i32 [ [[OR]], [[ROTBB]] ], [ [[A]], [[ENTRY:%.*]] ]
	; OLDPM-NEXT: ret i32 [[COND]]
	;			;
	; NEWPM-LABEL: @rotl(			; NEWPM-LABEL: @rotl(
	; NEWPM-NEXT: entry:			; NEWPM-NEXT: entry:
	; NEWPM-NEXT: [[TMP0:%.*]] = sub i32 0, [[B:%.*]]			; NEWPM-NEXT: [[TMP0:%.*]] = sub i32 0, [[B:%.*]]
	; NEWPM-NEXT: [[TMP1:%.*]] = and i32 [[B]], 31			; NEWPM-NEXT: [[TMP1:%.*]] = and i32 [[B]], 31
	; NEWPM-NEXT: [[TMP2:%.*]] = and i32 [[TMP0]], 31			; NEWPM-NEXT: [[TMP2:%.*]] = and i32 [[TMP0]], 31
	; NEWPM-NEXT: [[TMP3:%.*]] = lshr i32 [[A:%.*]], [[TMP2]]			; NEWPM-NEXT: [[TMP3:%.*]] = lshr i32 [[A:%.*]], [[TMP2]]
	; NEWPM-NEXT: [[TMP4:%.*]] = shl i32 [[A]], [[TMP1]]			; NEWPM-NEXT: [[TMP4:%.*]] = shl i32 [[A]], [[TMP1]]
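
The visible NEWPM check lines for `@rotl` compute two masked shift amounts and then shift in both directions. A minimal C++ sketch of that sequence follows, set against the guarded source pattern. The function names are hypothetical. The closing `or` does not appear in the lines shown here and is assumed from the summary's description of the expansion (neg/and/and/lshr/shl/or).

```cpp
#include <cstdint>

// Guarded rotate-left, the source pattern the test starts from.
// Only valid for 0 <= b < 32.
uint32_t guarded_rotl(uint32_t a, uint32_t b) {
  if (b == 0)
    return a;
  return (a >> (32 - b)) | (a << b);
}

// Branch-free form that follows the NEWPM check lines step by step.
// Masking both amounts with 31 keeps every shift in range, so no guard
// is needed even when b == 0.
uint32_t expanded_rotl(uint32_t a, uint32_t b) {
  uint32_t tmp0 = 0u - b;      // sub i32 0, %b
  uint32_t tmp1 = b & 31u;     // and i32 %b, 31
  uint32_t tmp2 = tmp0 & 31u;  // and i32 %tmp0, 31
  uint32_t tmp3 = a >> tmp2;   // lshr i32 %a, %tmp2
  uint32_t tmp4 = a << tmp1;   // shl i32 %a, %tmp1
  return tmp3 | tmp4;          // assumed final or (see lead-in)
}
```

Under these assumptions the two functions agree for every shift amount from 0 to 31. For instance, both return `0x00000003` for `a = 0x80000001`, `b = 1`, and both return `a` unchanged when `b` is 0.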


Diff 178520
