This is an archive of the discontinued LLVM Phabricator instance.


Differential D21291

[SimplifyCFG] Range reduce switches
Abandoned, Public

Authored by jmolloy on Jun 13 2016, 7:53 AM.


Details

Reviewers

hans
sanjoy
sbaranga
mcrosier

Summary

If a switch is sparse and all the cases (once sorted) are in arithmetic progression, we can extract the common factor out of the switch and create a dense switch. For example:

switch (i) {
case 5: ...
case 9: ...
case 13: ...
case 17: ...
}

can become:

if ( (i - 5) % 4 ) goto default;
switch ((i - 5) / 4) {
case 0: ...
case 1: ...
case 2: ...
case 3: ...
}

The division and remainder operations could be costly so we only do this if the factor is a power of two. Dense switches can be lowered significantly better than sparse switches and can even be transformed into lookup tables.
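To illustrate why restricting the factor to a power of two keeps the key computation cheap, here is a minimal C++ sketch. It is not code from the patch: it assumes 32-bit unsigned keys, and `reduceKey` and its parameter names are hypothetical. The remainder test becomes a mask and the division becomes a shift:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of the patch. The case values are assumed to
// be Base, Base + 2^Shift, Base + 2 * 2^Shift, and so on.
// Returns false when Key cannot match any case (the "goto default" path in
// the summary's example); otherwise stores the dense index in Index.
bool reduceKey(uint32_t Key, uint32_t Base, unsigned Shift, uint32_t &Index) {
  assert(Shift < 32 && "shift must be smaller than the bit width");
  uint32_t Offset = Key - Base;               // wraps around when Key < Base
  uint32_t Mask = (UINT32_C(1) << Shift) - 1; // low bits == remainder mod 2^Shift
  if ((Offset & Mask) != 0)
    return false;                             // not a multiple of the factor
  Index = Offset >> Shift;                    // same as Offset / 2^Shift
  return true;
}
```

With the summary's cases (Base 5, Shift 2), key 13 maps to index 2 and key 6 is rejected. The index can still exceed the number of cases, so the dense switch's default case is still needed.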

Diff Detail

Repository: rL LLVM

Event Timeline

jmolloy updated this revision to Diff 60524. Jun 13 2016, 7:53 AM

jmolloy retitled this revision to [SimplifyCFG] Range reduce switches.

jmolloy updated this object.

jmolloy added reviewers: mcrosier, sbaranga, sanjoy.

jmolloy set the repository for this revision to rL LLVM.

jmolloy added a subscriber: llvm-commits.

sbaranga added inline comments. Jun 13 2016, 8:51 AM

lib/Transforms/Utils/SimplifyCFG.cpp
4993	Could you also add a test for this case?
5009	If it doesn't have holes then the GCD that you are looking for should be Values[1].

Hi Silviu,

Thanks for the review!

James

lib/Transforms/Utils/SimplifyCFG.cpp
4993	This should be @test3
5009	Uhhhhh, yes. Yes it should. Thanks for spotting this! I've rewritten the code entirely.

sanjoy requested changes to this revision. Jun 13 2016, 9:49 AM

sanjoy edited edge metadata.

sanjoy added inline comments.

lib/Transforms/Utils/SimplifyCFG.cpp
4989	I'd prefer a `getZExtValue` or `getSExtValue` here. That way the code will assert if someone later accidentally removed the check on the bit width.
4992	How about `{-2, 0, 2, 4}`? That's sparse and can benefit from this pass, but here you'll conclude that it has no holes.
5006	Don't you need to check that `V` is divisible by `Divisor` here?
5007	Might be worth calling out the signedness of the division here explicitly. I'd mildly prefer bitwise ops instead of division.
5017	I'd be tempted to emit the bitwise ops here directly. That way you don't do extra work later and are less sensitive to pass ordering.

This revision now requires changes to proceed. Jun 13 2016, 9:49 AM

sanjoy added inline comments. Jun 13 2016, 9:51 AM

lib/Transforms/Utils/SimplifyCFG.cpp
4992	Bad example, I meant cases like `{-3, -1, 1, 3}`.
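The point about `{-3, -1, 1, 3}` can be made concrete with a small standalone C++ sketch. It is not taken from the patch; `rebaseSigned` and `rebaseUnsigned32` are hypothetical names, and a 32-bit switch condition is assumed. Sorting the cases as signed values and rebasing on the smallest gives a short range, while sorting the same bit patterns as unsigned leaves a huge gap:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: sort case values as signed integers and rebase them so
// the smallest case becomes 0. The subtraction is done on uint64_t so that it
// wraps instead of overflowing.
std::vector<uint64_t> rebaseSigned(std::vector<int64_t> Cases) {
  assert(!Cases.empty() && "need at least one case value");
  std::sort(Cases.begin(), Cases.end());
  uint64_t Base = static_cast<uint64_t>(Cases.front());
  std::vector<uint64_t> Offsets;
  for (int64_t C : Cases)
    Offsets.push_back(static_cast<uint64_t>(C) - Base);
  return Offsets;
}

// For comparison: treat the same values as unsigned 32-bit patterns first.
std::vector<uint64_t> rebaseUnsigned32(const std::vector<int64_t> &Cases) {
  assert(!Cases.empty() && "need at least one case value");
  std::vector<uint32_t> Bits;
  for (int64_t C : Cases)
    Bits.push_back(static_cast<uint32_t>(C)); // keep the low 32 bits
  std::sort(Bits.begin(), Bits.end());
  std::vector<uint64_t> Offsets;
  for (uint32_t B : Bits)
    Offsets.push_back(static_cast<uint64_t>(B - Bits.front()));
  return Offsets;
}
```

For `{-3, -1, 1, 3}` the signed view yields offsets 0, 2, 4, 6, which share a factor of 2. The unsigned view yields 0, 2, 0xFFFFFFFC, 0xFFFFFFFE, which no shift can make dense.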

Maybe it makes sense to perform this sort of transformation in SelectionDAGBuilder::visitSwitch instead? Most of the relevant logic is there.

This patch only handles situations where all the values form a linear series, but it seems important to also catch cases where *most* of the values form a linear series (for example, 97, 105, 109, 113, 117, 121).

I think your profitability model is a bit off: we never generate a jump table for less than four cases, so trying to improve the density of a nonexistent jump table seems like a bad idea. And for cases where the divisor is small, the tradeoff between extra jump table entries and an extra compare+branch isn't obvious.

flyingforyou added a subscriber: flyingforyou. Jun 13 2016, 3:11 PM

Hi Sanjoy and Eli,

Thanks for your reviews. I agree with all of your comments. This new version has a real density function and will perform the optimization if the switch is not dense to begin with and would be made dense. The density function is adapted from SelectionDAGBuilder, and I'm not too happy about replicating the heuristic here but I couldn't think of a better way.

Similarly the hardcoded bailout for < 4 cases - ideally I'd use TargetLowering here which has a hook, but we only have TTI. It might be worth adding a hook, but perhaps it's not so urgent so I've added a FIXME.

Eli, the reason I'm doing this in SimplifyCFG is because it's an enabler for switch->lookup table lowering (which is also in SimplifyCFG).

The testing has also been improved to cover more negative and unsigned large values and wraparound cases.

Cheers,

James
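As a worked example of the density test described in this update, the sketch below applies a 40% threshold (the figure used in the patch) to plain vectors. `looksDense` is a hypothetical name, and the code is an illustration rather than the patch itself:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sketch of a 40% density test over sorted case values. The input must be
// sorted in ascending (signed) order.
bool looksDense(const std::vector<int64_t> &Sorted) {
  assert(!Sorted.empty() && "need at least one case value");
  // Unsigned arithmetic so the difference wraps instead of overflowing.
  uint64_t Range = static_cast<uint64_t>(Sorted.back()) -
                   static_cast<uint64_t>(Sorted.front()) + 1;
  uint64_t NumCases = Sorted.size();
  const uint64_t MinDensity = 40; // percent
  return NumCases * 100 >= Range * MinDensity;
}
```

The example 97, 105, 109, 113, 117, 121 spans 25 values for 6 cases (24%), so it counts as sparse. After subtracting 97 and dividing by 4 the cases are 0, 2, 3, 4, 5, 6, which span 7 values (about 86%) and count as dense.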

eli.friedman added inline comments. Jun 15 2016, 1:59 PM

lib/Transforms/Utils/SimplifyCFG.cpp
5022	Nit: please make the implicit casting here explicit.
5031	If the GCD is some odd number N, you can multiply by the inverse of N and switch on that. Not sure if that's actually useful in practice.
5056	If we were doing this in SelectionDAG, we would be able to combine this branch with the jump table's bounds check: "ROTATE(X - Base, Shift) > Limit".
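The odd-GCD idea works because every odd number has a multiplicative inverse modulo 2^n, so multiplying by that inverse divides exact multiples and scatters everything else. A minimal sketch, assuming 64-bit keys: Newton's iteration is simply this example's choice for computing the inverse, not something the thread prescribes, and `inverseMod2_64` and `keyForOddFactor` are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>

// Multiplicative inverse of an odd N modulo 2^64 via Newton's iteration.
uint64_t inverseMod2_64(uint64_t N) {
  assert((N & 1) && "only odd numbers are invertible modulo 2^64");
  uint64_t X = N;             // N * N == 1 (mod 8), so X is correct to 3 bits
  for (int I = 0; I < 5; ++I) // each step doubles the number of correct bits
    X *= 2 - N * X;
  return X;
}

// Hypothetical key function: maps Base + I * N to I. Keys that are not of
// that form land on some other 64-bit value, because the map is a bijection.
uint64_t keyForOddFactor(uint64_t Key, uint64_t Base, uint64_t N) {
  return (Key - Base) * inverseMod2_64(N);
}
```

For cases 97, 102, 107, 112 (factor 5) the keys map to 0, 1, 2, 3. Since multiplication by an invertible constant is a bijection, no other key can map into that range.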

Haven't done a thorough review yet, but some minor nitpicky comments inline.

lib/Transforms/Utils/SimplifyCFG.cpp
5029	This gcd computation looks like extra work -- why not:
	uint64_t PotentialGCD = Values[1];
	if (!isPowerOf2(PotentialGCD)) return false;
	if (!all_of(Values, [&](uint64_t V) { return (V & (PotentialGCD - 1)) == 0; })) return false;
5072	This linear search (via `replaceUsesOfWith`) seems wasteful. Why not change `getCaseValue` to return a `Use &` and do `C.getCaseValue().set(New)` or something similar?

jmolloy added inline comments. Jun 15 2016, 2:31 PM

lib/Transforms/Utils/SimplifyCFG.cpp
5056	Hmm, you're close to convincing me. I'm loath to give up the lookup table optimization though. How gross (/ acceptable) would it be to implement this in SelectionDAG and then also teach lookup table lowering in SimplifyCFG this trick too?

eli.friedman added inline comments. Jun 15 2016, 4:39 PM

lib/Transforms/Utils/SimplifyCFG.cpp
5029	That isn't equivalent for something like "0, 8, 12, 16, 20".
5056	It wouldn't be too terrible as long as the actual algorithm for finding a reducible switch is factored out into a utility, I think.

sanjoy added inline comments. Jun 15 2016, 4:48 PM

lib/Transforms/Utils/SimplifyCFG.cpp
5029	Yeah, you're right. I was still thinking in terms of the older "100% dense" scheme.
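The counterexample can be checked directly. In the sketch below (hypothetical helper names, plain C++ rather than LLVM's `GreatestCommonDivisor64`), the offsets 0, 8, 12, 16, 20 have a GCD of 4, but testing divisibility by the first nonzero step, 8, rejects them:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Euclid's algorithm; gcd64(0, x) == x.
uint64_t gcd64(uint64_t A, uint64_t B) {
  while (B != 0) {
    uint64_t T = A % B;
    A = B;
    B = T;
  }
  return A;
}

// GCD over all rebased case values.
uint64_t commonFactor(const std::vector<uint64_t> &Offsets) {
  uint64_t G = 0;
  for (uint64_t V : Offsets)
    G = gcd64(G, V);
  return G;
}

// The shortcut under discussion: assume Offsets[1] is the factor and check
// that it is a power of two dividing every offset.
bool firstStepDividesAll(const std::vector<uint64_t> &Offsets) {
  assert(Offsets.size() >= 2 && "need at least two offsets");
  uint64_t Step = Offsets[1];
  if (Step == 0 || (Step & (Step - 1)) != 0)
    return false; // not a power of two
  for (uint64_t V : Offsets)
    if ((V & (Step - 1)) != 0)
      return false;
  return true;
}
```

The two agree when the first step equals the factor, as in 0, 4, 8, 12, and differ once the first gap is a larger multiple of it.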

Hi Eli,

That rotate trick is, quite simply, brilliant.

New version uses SelectionDAG - I'll factor some of the machinery out at a later date when I implement this for lookup table formation too.

All comments should now be addressed.

Cheers,

James
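A minimal sketch of the rotate trick, assuming a 32-bit condition; `rotr32` and `denseKey` are hypothetical names, not functions from the patch. Rotating `Key - Base` right by `Shift` moves any nonzero low bits to the top of the word, so a key that is not a multiple of the factor becomes a very large value and fails the same bounds check that rejects out-of-range keys:

```cpp
#include <cassert>
#include <cstdint>

// Rotate right for 32-bit values; Shift must be in [0, 32).
uint32_t rotr32(uint32_t X, unsigned Shift) {
  assert(Shift < 32 && "rotate amount out of range");
  if (Shift == 0)
    return X; // avoid an undefined shift by 32 below
  return (X >> Shift) | (X << (32 - Shift));
}

// Hypothetical key function: a single comparison "denseKey(...) < NumCases"
// then covers both the divisibility check and the range check.
uint32_t denseKey(uint32_t Key, uint32_t Base, unsigned Shift) {
  return rotr32(Key - Base, Shift);
}
```

With Base 97 and Shift 2, the keys 97, 101, 105, 109 map to 0, 1, 2, 3. Key 98 maps to 0x40000000 and key 93 to 0x3FFFFFFF, both far above the last case.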

eli.friedman added inline comments. Jun 19 2016, 2:58 PM

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8332 ↗	(On Diff #60973)	Do we need to merge clusters in some cases, so we don't end up with adjacent clusters with the same destination? (This doesn't affect building a jump table, but it affects isDense.) Hmm... actually, more generally, whether we transform a switch condition is to some extent independent of whether we form a jump table; for example, if we end up with one cluster here, you don't need a jump table at all. Maybe we can leave that for a followup, though.
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
267 ↗	(On Diff #60973)	It's unintuitive to hide this in the constructor... it seems like a good idea to push the Shift == 0 special-case out.

hans added a subscriber: hans. Jun 20 2016, 9:56 AM

hans added inline comments.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
1971 ↗	(On Diff #60973)	I was going to comment that the code in the patch summary could use rotation instead, but it seems you're already on it :-)
8434 ↗	(On Diff #60973)	This code doesn't take into account that a range of cases could be transformed to become dense, so I'm not sure your patch will actually find any more jump table lowerings that don't cover the whole switch?

hans added inline comments. Jun 20 2016, 10:30 AM

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8434 ↗	(On Diff #60973)	I guess what I was saying with this comment is that since you're not going to get the benefit of transforming parts of the switch, is there a point in doing this in the DAG instead of in the IR? I think the rotation trick should work in the IR too.

eli.friedman added inline comments. Jun 20 2016, 10:50 AM

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8434 ↗	(On Diff #60973)	Err, I guess the rotation trick works in IR? You can let switch lowering do the compare+branch. Not sure what I was thinking before. That said, we don't really want to do switch lowering in IR. I mean, in theory, we can lower switches in IR, but we want to do all of switch lowering at the same time. Otherwise, you end up with weird guessing games to predict what SelectionDAG will actually do.

Hi guys,

Thanks for the comments. I was away on vacation for 2 weeks which is why this has been sitting around.

Hans, if you're OK with it being done in DAG for the reasons mentioned, is this good to commit?

Cheers,

James

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8434 ↗	(On Diff #60973)	I agree with Eli here. It'd be nice to densify individual switch case ranges, and perhaps we could do that in the future? The main advantage I got from moving from IR to SDAG for this is the heuristics don't need second-guessing, which was a major improvement. Doing this in IR would give one large advantage - it'd allow the switch table lookup lowering to work too. I intend to implement this directly in the switch table lookup lowering code though, which should be fairly simple.

Hi Hans, Sanjoy,

Ping! :) Are you now happy with this?

Cheers,

James

Ping!

The code seems fine to me. I'm OK with this landing.

I still wonder if we shouldn't just do this in SimplifyCFG though. How big is the pay-off here? I've seen switches with e.g. a common factor of 10 in the cases, but are powers of 2 common enough to justify the complexity, especially if we start looking at densifying only parts of a switch?

I just imagine that doing this in SimplifyCFG instead would be so much simpler: we'd look for a common power-of-two factor and slap a rotate-right operation in front of the switch. This could potentially allow more lookup tables, jump tables, bit-test lowerings, new adjacent cases, who knows -- one could argue that densifying the switch is just general goodness, and if it doesn't pay off it's just an extra rotate which is cheap.
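One possible way to look for a common power-of-two factor is to OR the rebased case values together and count the trailing zeros. This is sketched as an assumption, not as what the patch does: the patch computes a GCD and then requires it to be a power of two. `commonPow2Shift` is a hypothetical name:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: the largest Shift such that 2^Shift divides every
// offset. ORing the offsets preserves each offset's lowest set bit, so the
// trailing zero count of the OR is the shared power-of-two exponent.
unsigned commonPow2Shift(const std::vector<uint64_t> &Offsets) {
  uint64_t Bits = 0;
  for (uint64_t V : Offsets)
    Bits |= V;
  assert(Bits != 0 && "all offsets are zero; no meaningful factor");
  unsigned Shift = 0;
  while ((Bits & 1) == 0) {
    Bits >>= 1;
    ++Shift;
  }
  return Shift;
}
```

The two approaches differ for offsets such as 0, 12, 24: the GCD is 12, which is not a power of two, whereas this sketch reports a shift of 2. Whether shifting by 2 is then profitable is a separate density question.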

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8389 ↗	(On Diff #60973)	This comment refers to the algorithm that finds dense partitions in Clusters, which is O(n^2). Now that there's more stuff happening below, maybe update it to "don't try any harder at -O0", or you could just move your code before it, since I don't think it would be a problem running it at -O0.

Hi Hans,

I do agree with you. I think that the major reason I moved from SimplifyCFG to SDAGBuilder was that, because we were inserting new CFG edges, we didn't want to do this unless we could be *sure* that a dense switch lowering was used.

However now we've used the rotate trick, the inserted code is so tiny that we don't have to be so conservative. So I think it's OK to replicate the density heuristic in SimplifyCFG and that's what I've done.

I hope this is now more acceptable to you? :)

Cheers,

James

Thanks, I like this :-) Just some nits.

lib/Transforms/Utils/SimplifyCFG.cpp
5044	Aren't Values sorted (from line 5081), so this should be unnecessary?
5067	Maybe we should check DL.fitsInLegalInteger() instead of hard-coding 64 here. Oh I see, 64 is because we use [u]int64_t below. It might still be a good idea to check fitsInLegalInteger() so we're not creating a non-legal rotate operation.
5088	why isn't Base int64_t already? Also, could the subtraction overflow?
5106	Any bijective transformation of the case values will work. The problem is just deciding when they're profitable, I suppose. But I'm all for more creative switch lowerings :-)

Thanks Hans!

New version - hopefully this one should be ready to go?

Cheers,

James

lib/Transforms/Utils/SimplifyCFG.cpp
5088	I don't think overflow is possible. The subtraction moves values towards zero; the only way I can consider it overflowing is if the difference between the smallest value and the largest value is INT64_MAX or greater which is impossible.
5106	:)

lgtm

This was committed with Hans' LGTM.

Revision Contents

Path	Size
lib/Transforms/Utils/SimplifyCFG.cpp	106 lines
test/Transforms/SimplifyCFG/rangereduce.ll	195 lines

Diff 66226

lib/Transforms/Utils/SimplifyCFG.cpp


	  if (!DefaultIsReachable || GeneratingCoveredLookupTable) {
	    // We cached PHINodes in PHIs, to avoid accessing deleted PHINodes later,
	    // do not delete PHINodes here.
	    SI->getDefaultDest()->removePredecessor(SI->getParent(),
	                                            /*DontDeleteUselessPHIs=*/true);
	  }

	  bool ReturnedEarly = false;
	  for (size_t I = 0, E = PHIs.size(); I != E; ++I) {
	    PHINode *PHI = PHIs[I];
	    const ResultListTy &ResultList = ResultLists[PHI];

	    // If using a bitmask, use any value to fill the lookup table holes.
	    Constant *DV = NeedMask ? ResultLists[PHI][0].second : DefaultResults[PHI];
	    SwitchLookupTable Table(Mod, TableSize, MinCaseVal, ResultList, DV, DL);

	    Value *Result = Table.BuildLookup(TableIndex, Builder);

	    // If the result is used to return immediately from the function, we want to
	    // do that right here.
	    if (PHI->hasOneUse() && isa<ReturnInst>(*PHI->user_begin()) &&
	        PHI->user_back() == CommonDest->getFirstNonPHIOrDbg()) {
	      Builder.CreateRet(Result);
	      ReturnedEarly = true;
	      break;
	    }

	    // Do a small peephole optimization: re-use the switch table compare if
	    // possible.
	    if (!TableHasHoles && HasDefaultResults && RangeCheckBranch) {
	      BasicBlock *PhiBlock = PHI->getParent();
	      // Search for compare instructions which use the phi.
	      for (auto *User : PHI->users()) {
	        reuseTableCompare(User, PhiBlock, RangeCheckBranch, DV, ResultList);
	      }
	    }

	    PHI->addIncoming(Result, LookupBB);
	  }

	  if (!ReturnedEarly)
	    Builder.CreateBr(CommonDest);

	  // Remove the switch.
	  for (unsigned i = 0, e = SI->getNumSuccessors(); i < e; ++i) {
	    BasicBlock *Succ = SI->getSuccessor(i);

	    if (Succ == SI->getDefaultDest())
	      continue;
	    Succ->removePredecessor(SI->getParent());
	  }
	  SI->eraseFromParent();

	  ++NumLookupTables;
	  if (NeedMask)
	    ++NumLookupTablesHoles;
	  return true;
	}

				static bool isSwitchDense(ArrayRef<int64_t> Values) {
				  // See also SelectionDAGBuilder::isDense(), which this function was based on.
				  uint64_t Diff = (uint64_t)Values.back() - (uint64_t)Values.front();
				  uint64_t Range = Diff + 1;
				  uint64_t NumCases = Values.size();
				  // 40% is the default density for building a jump table in optsize/minsize mode.
				  uint64_t MinDensity = 40;

				  return NumCases * 100 >= Range * MinDensity;
				}

				// Try and transform a switch that has "holes" in it to a contiguous sequence
				// of cases.
				//
				// A switch such as: switch(i) {case 5: case 9: case 13: case 17:} can be
				// range-reduced to: switch ((i-5) / 4) {case 0: case 1: case 2: case 3:}.
				//
				// This converts a sparse switch into a dense switch which allows better
				// lowering and could also allow transforming into a lookup table.
				static bool ReduceSwitchRange(SwitchInst *SI, IRBuilder<> &Builder,
				                              const DataLayout &DL,
				                              const TargetTransformInfo &TTI) {
				  auto *CondTy = cast<IntegerType>(SI->getCondition()->getType());
				  if (CondTy->getIntegerBitWidth() > 64 ||
				      !DL.fitsInLegalInteger(CondTy->getIntegerBitWidth()))
				    return false;
				  // Only bother with this optimization if there are more than 3 switch cases;
				  // SDAG will only bother creating jump tables for 4 or more cases.
				  if (SI->getNumCases() < 4)
				    return false;

				  // This transform is agnostic to the signedness of the input or case values. We
				  // can treat the case values as signed or unsigned. We can optimize more common
				  // cases such as a sequence crossing zero {-4,0,4,8} if we interpret case values
				  // as signed.
				  SmallVector<int64_t,4> Values;
				  for (auto &C : SI->cases())
				    Values.push_back(C.getCaseValue()->getValue().getSExtValue());
				  std::sort(Values.begin(), Values.end());

				  // If the switch is already dense, there's nothing useful to do here.
				  if (isSwitchDense(Values))
				    return false;

				  // First, transform the values such that they start at zero and ascend.
				  int64_t Base = Values[0];
				  for (auto &V : Values)
				    V -= Base;

				  // Now we have signed numbers that have been shifted so that, given enough
				  // precision, there are no negative values. Since the rest of the transform
				  // is bitwise only, we switch now to an unsigned representation.
				  uint64_t GCD = 0;
				  for (auto &V : Values)
				    GCD = llvm::GreatestCommonDivisor64(GCD, (uint64_t)V);

				  // This transform can be done speculatively because it is so cheap - it results
				  // in a single rotate operation being inserted. This can only happen if the
				  // factor extracted is a power of 2.
				  // FIXME: If the GCD is an odd number we can multiply by the multiplicative
				  // inverse of GCD and then perform this transform.
				  // FIXME: It's possible that optimizing a switch on powers of two might also
				  // be beneficial - flag values are often powers of two and we could use a CLZ
				  // as the key function.
				  if (GCD <= 1 || !llvm::isPowerOf2_64(GCD))
				    // No common divisor found or too expensive to compute key function.
				    return false;

				  unsigned Shift = llvm::Log2_64(GCD);
				  for (auto &V : Values)
				    V = (int64_t)((uint64_t)V >> Shift);

				  if (!isSwitchDense(Values))
				    // Transform didn't create a dense switch.
				    return false;

				  // The obvious transform is to shift the switch condition right and emit a
				  // check that the condition actually cleanly divided by GCD, i.e.
				  // C & (1 << Shift - 1) == 0
				  // inserting a new CFG edge to handle the case where it didn't divide cleanly.
				  //
				  // A cheaper way of doing this is a simple ROTR(C, Shift). This performs the
				  // shift and puts the shifted-off bits in the uppermost bits. If any of these
				  // are nonzero then the switch condition will be very large and will hit the
				  // default case.

				  auto *Ty = cast<IntegerType>(SI->getCondition()->getType());
				  Builder.SetInsertPoint(SI);
				  auto *ShiftC = ConstantInt::get(Ty, Shift);
				  auto *Sub = Builder.CreateSub(SI->getCondition(), ConstantInt::get(Ty, Base));
				  auto *Rot = Builder.CreateOr(Builder.CreateLShr(Sub, ShiftC),
				                               Builder.CreateShl(Sub, Ty->getBitWidth() - Shift));
				  SI->replaceUsesOfWith(SI->getCondition(), Rot);

				  for (auto &C : SI->cases()) {
				    auto *Orig = C.getCaseValue();
				    auto Sub = Orig->getValue() - APInt(Ty->getBitWidth(), Base);
				    SI->replaceUsesOfWith(Orig,
				                          ConstantInt::get(Ty, Sub.lshr(ShiftC->getValue())));
				  }
				  return true;
				}

	bool SimplifyCFGOpt::SimplifySwitch(SwitchInst *SI, IRBuilder<> &Builder) {
	  BasicBlock *BB = SI->getParent();

	  if (isValueEqualityComparison(SI)) {
	    // If we only have one predecessor, and if it is a branch on this value,
	    // see if that predecessor totally determines the outcome of this switch.
	    if (BasicBlock *OnlyPred = BB->getSinglePredecessor())
	      if (SimplifyEqualityComparisonWithOnlyPredecessor(SI, OnlyPred, Builder))
	(27 lines not shown)
	    return SimplifyCFG(BB, TTI, BonusInstThreshold, AC) | true;

	  if (ForwardSwitchConditionToPHI(SI))
	    return SimplifyCFG(BB, TTI, BonusInstThreshold, AC) | true;

	  if (SwitchToLookupTable(SI, Builder, DL, TTI))
	    return SimplifyCFG(BB, TTI, BonusInstThreshold, AC) | true;

				  if (ReduceSwitchRange(SI, Builder, DL, TTI))
				    return SimplifyCFG(BB, TTI, BonusInstThreshold, AC) | true;

	  return false;
	}

	bool SimplifyCFGOpt::SimplifyIndirectBr(IndirectBrInst *IBI) {
	  BasicBlock *BB = IBI->getParent();
	  bool Changed = false;

	  // Eliminate redundant destinations.

test/Transforms/SimplifyCFG/rangereduce.ll

This file was added.

				; RUN: opt < %s -simplifycfg -S | FileCheck %s

				target datalayout = "e-n32"

				; CHECK-LABEL: @test1
				; CHECK: %1 = sub i32 %a, 97
				; CHECK: %2 = lshr i32 %1, 2
				; CHECK: %3 = shl i32 %1, 30
				; CHECK: %4 = or i32 %2, %3
				; CHECK: switch i32 %4, label %def [
				; CHECK: i32 0, label %one
				; CHECK: i32 1, label %two
				; CHECK: i32 2, label %three
				; CHECK: ]
				define i32 @test1(i32 %a) {
				switch i32 %a, label %def [
				i32 97, label %one
				i32 101, label %two
				i32 105, label %three
				i32 109, label %three
				]

				def:
				ret i32 8867

				one:
				ret i32 11984
				two:
				ret i32 1143
				three:
				ret i32 99783
				}

				; Optimization shouldn't trigger; bitwidth > 64
				; CHECK-LABEL: @test2
				; CHECK: switch i128 %a, label %def
				define i128 @test2(i128 %a) {
				switch i128 %a, label %def [
				i128 97, label %one
				i128 101, label %two
				i128 105, label %three
				i128 109, label %three
				]

				def:
				ret i128 8867

				one:
				ret i128 11984
				two:
				ret i128 1143
				three:
				ret i128 99783
				}


				; Optimization shouldn't trigger; no holes present
				; CHECK-LABEL: @test3
				; CHECK: switch i32 %a, label %def
				define i32 @test3(i32 %a) {
				switch i32 %a, label %def [
				i32 97, label %one
				i32 98, label %two
				i32 99, label %three
				]

				def:
				ret i32 8867

				one:
				ret i32 11984
				two:
				ret i32 1143
				three:
				ret i32 99783
				}

				; Optimization shouldn't trigger; not an arithmetic progression
				; CHECK-LABEL: @test4
				; CHECK: switch i32 %a, label %def
				define i32 @test4(i32 %a) {
				switch i32 %a, label %def [
				i32 97, label %one
				i32 102, label %two
				i32 105, label %three
				i32 109, label %three
				]

				def:
				ret i32 8867

				one:
				ret i32 11984
				two:
				ret i32 1143
				three:
				ret i32 99783
				}

				; Optimization shouldn't trigger; not a power of two
				; CHECK-LABEL: @test5
				; CHECK: switch i32 %a, label %def
				define i32 @test5(i32 %a) {
				switch i32 %a, label %def [
				i32 97, label %one
				i32 102, label %two
				i32 107, label %three
				i32 112, label %three
				]

				def:
				ret i32 8867

				one:
				ret i32 11984
				two:
				ret i32 1143
				three:
				ret i32 99783
				}

				; CHECK-LABEL: @test6
				; CHECK: %1 = sub i32 %a, -109
				; CHECK: %2 = lshr i32 %1, 2
				; CHECK: %3 = shl i32 %1, 30
				; CHECK: %4 = or i32 %2, %3
				; CHECK: switch i32 %4, label %def [
				define i32 @test6(i32 %a) optsize {
				switch i32 %a, label %def [
				i32 -97, label %one
				i32 -101, label %two
				i32 -105, label %three
				i32 -109, label %three
				]

				def:
				ret i32 8867

				one:
				ret i32 11984
				two:
				ret i32 1143
				three:
				ret i32 99783
				}

				; CHECK-LABEL: @test7
				; CHECK: %1 = sub i8 %a, -36
				; CHECK: %2 = lshr i8 %1, 2
				; CHECK: %3 = shl i8 %1, 6
				; CHECK: %4 = or i8 %2, %3
				; CHECK: switch.tableidx = {{.*}} %4
				define i8 @test7(i8 %a) optsize {
				switch i8 %a, label %def [
				i8 220, label %one
				i8 224, label %two
				i8 228, label %three
				i8 232, label %three
				]

				def:
				ret i8 8867

				one:
				ret i8 11984
				two:
				ret i8 1143
				three:
				ret i8 99783
				}

				; CHECK-LABEL: @test8
				; CHECK: %1 = sub i32 %a, 97
				; CHECK: %2 = lshr i32 %1, 2
				; CHECK: %3 = shl i32 %1, 30
				; CHECK: %4 = or i32 %2, %3
				; CHECK: switch i32 %4, label %def [
				define i32 @test8(i32 %a) optsize {
				switch i32 %a, label %def [
				i32 97, label %one
				i32 101, label %two
				i32 105, label %three
				i32 113, label %three
				]

				def:
				ret i32 8867

				one:
				ret i32 11984
				two:
				ret i32 1143
				three:
				ret i32 99783
				}
				No newline at end of file
