
[SimplifyCFG] Use lookup tables when they are more space efficient or a huge speed win.
Needs Revision · Public

Authored by shawnl on Apr 22 2019, 2:55 PM.



I am trying to move towards much more space-efficient switch statements, using popcnt, as described in PR39013. This is patch 6 towards that goal, and a continuation of D60673.

There are quite a few cases where the lookup table was smaller and yet it was still not used. The calculation also did not consider how large the cases were. These numbers will change in a later patch (when sparse maps make switch much more compact), so we shouldn't argue too much about this cut-off.
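The popcnt-based sparse map idea referenced here (and in PR39013) can be sketched roughly as follows. This is a hedged illustration, not code from this patch; `sparseLookup` and its parameters are hypothetical names:

```cpp
#include <cstdint>

// Sketch of the popcnt idea: a bitmap records which case values exist, and
// the popcount of the bits *below* the queried value gives the slot in a
// dense result table, so sparse switches need no near-empty lookup table.
int sparseLookup(uint64_t Bitmap, const int *Dense, unsigned Idx, int Default) {
  uint64_t Bit = 1ULL << Idx;
  if (!(Bitmap & Bit))
    return Default;                    // value has no case: take the default
  uint64_t Below = Bitmap & (Bit - 1); // bits for smaller case values
  return Dense[__builtin_popcountll(Below)];
}
```

With cases {1, 3, 5} the dense table holds just three entries, yet any value in 0..63 can be dispatched with one bitmap test plus a popcount.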

When only an i8 is being switched over, a complete table is not large, and avoiding the branch of a regular lookup table is a significant speed win (as can be seen in the tests).
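The "covered table" point can be illustrated with a small sketch (hypothetical helpers, not the patch's actual code): when the condition is an i8 and the table has one entry per possible value, the usual bounds check before the load can be dropped, because every uint8_t value is a valid index.

```cpp
#include <array>
#include <cstdint>

// Covered table: 256 entries, one per possible i8 value, so the lookup is
// branch-free.
int lookupCovered(uint8_t C, const std::array<int, 256> &Table) {
  return Table[C]; // no range check needed: the table covers all of i8
}

// A regular (partial) lookup table still needs the guard branch.
int lookupPartial(uint64_t C, const int *Table, uint64_t TableSize,
                  int Default) {
  return C < TableSize ? Table[C] : Default;
}
```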


Event Timeline

shawnl created this revision.Apr 22 2019, 2:55 PM
shawnl updated this revision to Diff 196141.Apr 22 2019, 3:13 PM
shawnl edited the summary of this revision. (Show Details)

add more test results

shawnl updated this revision to Diff 196440.Apr 24 2019, 6:38 AM

also remove Sub (turns out Sub does not get normalized to Add like I thought it would).

lebedev.ri set the repository for this revision to rL LLVM.
lebedev.ri removed a subscriber: lebedev.ri.
shawnl updated this revision to Diff 196637.Apr 25 2019, 8:03 AM

fix i8 use of covered table to work when the return is not i8, do not use -O2 in tests. Update tests.

jmolloy requested changes to this revision.Apr 25 2019, 9:00 AM

Please reupload with full context as described in the developer's guide.

These numbers will change in a later patch (when sparse maps make switch much more compact), so we shouldn't argue too much about this cut-off.

Are you referring to your changes in builtins/? If so, that's surely part of compiler-rt, and you can't expect targets to have that available at runtime.


( 3 * 8 )

... but why? This isn't obvious from the code so needs comments. Note that the original code randomly dividing by 10 was just as bad, but you've clearly got a good reason for this change so should document it ;)



Bold claim :)


Nit: please use the term "Largest". They're synonymous and I know this is nitpicking, but it's the more generally used term.


This is only used in one place; please fold into its use.


Please describe this heuristic in more detail. Why is it always good?

Take a look at the line 5162 in the diffbase for a good example of a heuristic description. It captures what the criterion is, and a rationale for it.


These magic numbers have no place in SimplifyCFG. If you need to be this accurate, add a TTI hook.


But why? You've replaced 40% with 33% and you mention 64-bit integers but the following heuristic doesn't use 64-bit integers anywhere.


Please upload the code for review as it will be committed.


This assumes whatever constant add/sub/xor was planted still exists. There's no guarantee of that; for example if V is a constant, IRBuilder would have constant folded immediately.

In general, unstitching code like this is inherently problematic. Note that this isn't the same as planting an inverting code sequence - that's fine. We can use the optimizer to clean the duplicated stuff up and everyone is happy. That is much preferable to ham-handedly unstitching stuff and looking for patterns of code you've just emitted. It also adds an implicit contract between different parts of SimplifyCFG that I guarantee someone will miss when they update this code :)


This comment doesn't parse for me.


This code structure is a little hard to understand. Instead of trying to tack onto the end of the if statement to reuse the heuristic, can you extract the heuristic into a bool and reuse it explicitly here? Then the reader doesn't need to think about context.


It's not clear to me why this matters from this context. Perhaps if you wrote something like:

// Call ReduceSwitchRange *after* SwitchToLookupTable as SwitchToLookupTable calls this internally.

This revision now requires changes to proceed.Apr 25 2019, 9:00 AM
shawnl marked 2 inline comments as done.Apr 25 2019, 11:51 AM

Thanks for the review!

I've split the other patch into 5 distinct patches, which I will submit once I run the tests.

There is one part of this review that I need some clarification on before I can rev this patch. (see below comment)


The Xor wasn't added by this code. The problem is that this pass gets run multiple times, sometimes without the table generation (because it can make the code analyzable). There is nowhere else this optimization can go, because it is specific to switch statements, where the operands can be re-ordered arbitrarily. Also, there is no assumption that these instructions are there: it simply checks whether they are there and, if so, removes them and returns true so that the other optimizations run before we continue.

So I feel this is a planted inverting code sequence: this is not "unstitching" (and the code does not generate xors but does remove them): it is an optimization specific to covered tables.

jmolloy added inline comments.Apr 25 2019, 12:05 PM

My worry about this code is that it doesn't assert the preconditions or postconditions. AFAICS, it detects any Add, Sub or Xor and does some transform on them. Why is this correct in all cases? What if one of these instructions were missed? What if there is a sequence of these instructions, is it correct to just adjust a subset? What if they were reordered or constant folded?

The code doesn't explain exactly what it's checking for, who generates it and why it's correct to remove (and what happens if you have a false negative or false positive).

Note I'm not saying it's wrong, or perhaps it's the only way to do this, but with the lack of comment it remains a code smell at the moment and it's difficult for me to suggest an alternative.

nikic added inline comments.Apr 25 2019, 12:17 PM

I think the idea here is that given two bijective (total) functions f(x) and lookup(y), then lookup(f(x)) is also a bijective function that can be implemented as a new lookup table lookup_f(x). In this case add, sub and xor with a constant are the f(x)s.

It seems like this could be an independent and more general optimization though, that works on any table lookup.
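The composition nikic describes can be sketched concretely (a hypothetical helper, not LLVM code): for an invertible f such as f(x) = x ^ K, lookup(f(x)) can be folded into a single pre-permuted table LookupF[x] = Lookup[f(x)], so no arithmetic survives at runtime.

```cpp
#include <array>
#include <cstdint>

// Fold f(x) = x ^ K into the table itself: LookupF[x] = Lookup[x ^ K].
// The same construction works for add/sub with a constant, since those are
// also bijections on the i8 domain.
std::array<int, 256> composeXor(const std::array<int, 256> &Lookup, uint8_t K) {
  std::array<int, 256> LookupF{};
  for (unsigned X = 0; X < 256; ++X)
    LookupF[X] = Lookup[static_cast<uint8_t>(X ^ K)];
  return LookupF;
}
```

After this rewrite, `Lookup[C ^ K]` becomes a plain `LookupF[C]`, which is what makes removing the xor safe: the permuted table is equivalent by construction.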

hans added a comment.Apr 26 2019, 1:58 AM

I'm trying to follow along here, but there's so much churn I'm not sure what I'm supposed to be reviewing?

Are you asking for review on this patch? Is it ready for review? Or should I look at the other five patches you alluded to, and if so should this be marked abandoned?

shawnl planned changes to this revision.Apr 26 2019, 12:12 PM

This patch got a pretty comprehensive review and I will submit a new version for review. The other patches are also live.

shawnl updated this revision to Diff 197604.May 1 2019, 11:36 AM
shawnl edited the summary of this revision. (Show Details)
shawnl marked an inline comment as done.
shawnl added inline comments.

I am working on doing this as a general optimization of GEPs.

shawnl edited the summary of this revision. (Show Details)May 1 2019, 11:37 AM
shawnl updated this revision to Diff 197625.May 1 2019, 1:17 PM


jmolloy accepted this revision.May 2 2019, 1:11 AM

This seems reasonable to me.

This revision is now accepted and ready to land.May 2 2019, 1:11 AM
jmolloy requested changes to this revision.May 2 2019, 1:12 AM

Undo approval; was looking at an incorrect diff.

This revision now requires changes to proceed.May 2 2019, 1:12 AM
shawnl requested review of this revision.May 2 2019, 5:04 PM
jmolloy requested changes to this revision.May 3 2019, 7:47 AM

Looking much better. I think the TTI hook could be described better.


What is the TableSize, and what is the CaseSize? In particular why would TableSize ever be different from NumCases?

This revision now requires changes to proceed.May 3 2019, 7:47 AM
jmolloy requested changes to this revision.May 9 2019, 6:11 AM
jmolloy added inline comments.

To be honest, I'd really just express this as:

// Return true if the given SwitchInst should be converted into a lookup table. The size of the lookup table is \c TableSize and the number of covered cases is \c NumCases (meaning the number of table entries that hit the default case is \c TableSize - \c NumCases).
bool shouldBuildLookupTable(SwitchInst *SI, uint64_t TableSize, unsigned NumCases);

Sorry for the churn here, I know I didn't mention this in your previous review. But passing the SwitchInst itself and letting the target fish out OptSize/SI->getType()->getIntegerBitWidth() seems cleaner.

Also, never use size_t or uint32_t here. TableSize's maximum extent isn't host dependent, it's target dependent. Use uint64_t to guarantee 64-bit coverage on all hosts. Similarly NumCases isn't limited to 32-bits, so use unsigned because that's what we use everywhere.

This revision now requires changes to proceed.May 9 2019, 6:11 AM
shawnl marked an inline comment as done.May 9 2019, 9:36 AM
shawnl added inline comments.

good catch on the size_t!

Similarly NumCases isn't limited to 32-bits, so use unsigned because that's what we use everywhere.

I was going to limit it to 32-bits in the next patch. Is that a problem?

jmolloy added inline comments.May 10 2019, 5:52 AM

I don't see the rationale for limiting it to 32-bits, so let's argue that in your next patch :)

In the meantime let's keep it to uint64_t or unsigned in this patch.

shawnl marked an inline comment as done.Sun, May 19, 5:15 AM
shawnl added inline comments.

Along with the comment on the multiplication overflow above: this is basic grade-school multiplication. 1/3 ≈ 33%; 64 / 8 == 8.

The heuristic has threatened to derail this patch set with bikeshedding, and has made me really frustrated.

Communication is a key part of doing this work, but when you worry that multiplication will not be understood by the reviewer it really throws a wrench in the process.

jmolloy added inline comments.Sun, May 19, 7:10 AM

Hi Shawn,

I apologise that my review has made you frustrated. If you wish to find a different reviewer, that's fine by me.

I have approximate knowledge of many things. Luckily multiplication is one of them. Clairvoyance is not, and the point of a code review is to ensure that submitted code can be comprehended by anyone, not just its author.

Communication is important as you say, and it's possible that I've been poor in my own communication. My intent with the "But why?" comments wasn't to say "I cannot understand this at all", but rather "It took me longer than it should have to understand this." The latter does not mean the code is wrong; it means it requires either a comment or refactoring to make the intended logic (not the eventual mechanics) clear.

My view on code in heuristics is that it should be crystal clear what the intent is without any mental gymnastics.

If we take the code as you've rewritten it:

// The table density should be at least 1/3rd (33%) for 64-bit integers.
// This is a guess. FIXME: Find the best cut-off.
return (uint64_t)NumCases * 3 * (64 / 8) >= TableSize * CaseSize;

There are a few things the reader needs to do here. They must realise that the comment describes density with a number (33%), but the code calculates the inverse (denominator on the LHS, numerator on the RHS). It's still not clear to me what the "for 64-bit integers" means in the comment or precisely how it impacts the heuristic function.

Heuristics are always arbitrary, often wrong. They are the trickiest pieces of code to comprehend the intent of because of this, so I pay them more attention.

Also, you changed from 40% to 33% alongside this. Was there a reasoning behind this? People who notice the effects of this change may look back at this review to find a rationale.

Again, apologies if you find this pedantic. I always aim to be anything but pedantic in my reviews.
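A self-describing form of the kind jmolloy asks for might look like the sketch below. This is purely illustrative: the 33% figure is the patch's placeholder guess, and the function name is hypothetical, not a proposal for the final code.

```cpp
#include <cstdint>

// Density check written so the percentage reads directly off the code,
// with the overflow guard kept next to the multiplication it protects.
bool denseEnough(uint64_t NumCases, uint64_t TableSize) {
  // NumCases <= TableSize, so this also guards NumCases * 100 below.
  if (TableSize == 0 || TableSize > UINT64_MAX / 100)
    return false;
  uint64_t DensityPercent = NumCases * 100 / TableSize;
  return DensityPercent >= 33; // FIXME: placeholder cut-off
}
```

Here the comparison direction matches the comment (density on the left, threshold on the right), so the reader never has to mentally invert a cross-multiplied inequality.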

hans added inline comments.Wed, May 22, 1:42 AM

To give some background, the 40% density was originally chosen to match the density used to form switch jump tables in SelectionDAGBuilder. That density is still used for -Os builds, see OptsizeJumpTableDensity in TargetLoweringBase.cpp.

Changing the threshold here may well be a good idea, especially since it has dropped to 10% for non-optsize jump tables (JumpTableDensity in TargetLoweringBase.cpp), but such a change needs at least a comment with the motivation.

Perhaps it could be synced with the jump table density? Or perhaps it could be left alone for now -- there's already a lot going on in this patch.

Currently you're returning false for optsize functions. If a switch is >40% dense it means we will instead build a jump table, which is likely to be larger so this would be both a size and performance regression.

Regarding the "randomly dividing by 10" before, that's my fault too.

The code currently looks like:

// The table density should be at least 40%. This is the same criterion as for
// jump tables, see SelectionDAGBuilder::handleJTSwitchCase.
// FIXME: Find the best cut-off.
return SI->getNumCases() * 10 >= TableSize * 4;

If I were to write that today I might have spelled it out with *100 and *40 instead to make it really clear that 40 is a percentage, but I optimized prematurely.

The "TableSize >= UINT64_MAX / 10" check above is to protect against overflow, as the comment says, but as the code evolved and that line moved further away from the density computation, it became harder to make the connection.
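Putting the guard and the density computation back side by side, the original 40% criterion might read as follows (a sketch with a hypothetical name, using only the quantities from the quoted code):

```cpp
#include <cstdint>

// Original 40% density criterion with the overflow guard co-located.
bool shouldBuildTable40(uint64_t NumCases, uint64_t TableSize) {
  // NumCases <= TableSize, so this guards both products below.
  if (TableSize >= UINT64_MAX / 10)
    return false;
  // Density >= 40%: NumCases / TableSize >= 4/10, cross-multiplied.
  return NumCases * 10 >= TableSize * 4;
}
```

Keeping the guard adjacent makes the connection Hans mentions explicit, so later refactors are less likely to separate the two again.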

I mentioned before that I have trouble following along with this patch series. This one has a title of "Use lookup tables when they are more space efficient or a huge speed win" which is pretty vague, and the inline description says things like "trying to move towards much more space-efficient switch statements, using popcnt" and "These numbers will change in a later patch, (when sparse maps make switch much more compact) so we shouldn't argue too much about this cut-off".

This makes the patch very hard to review. At least for me, it's not clear what you're trying to do.

Making the switch-to-lookup table transformation better is excellent, but it needs to be done by well-argued changes in clear and focused patches.

Thanks for the context, Hans.

This review has been fairly critical, so I wanted to suggest ways to make this patch land easier.

  1. Split it up into the NFC moving of ReduceSwitchRange to reduce code churn in this patch and make it more accessible to review. This NFC patch would land almost instantly without contention.
  2. A general rule of thumb is to split up refactoring, new functionality, and heuristic changes into different patches. In particular the former two are easier to review and approve than the latter, so if the latter is totally isolated it's much less likely to derail your other changes.
  3. I've previously explained how to make the heuristic self describing. As Hans mentioned, premature optimizations like folding constant divisions/multiplications in the code can obscure the intended calculation.

This way, the refactor and new functionality could land without much contention and the heuristic can be (a) picked over easier, (b) isolated for benchmarking and (c) isolated for rollback.