This is an archive of the discontinued LLVM Phabricator instance.

[SimplifyCFG] Use lookup tables when they are more space efficient or a huge speed win.
AbandonedPublic

Authored by shawnl on Apr 22 2019, 2:55 PM.

Details

Summary

I am trying to move towards much more space-efficient switch statements, using popcnt, as described in PR39013. This is patch 6 toward that goal, and a continuation of D60673.

There are quite a few cases where the lookup table was smaller and yet still not used. Also, the calculation did not take into account how large the cases were. These numbers will change in a later patch (when sparse maps make switches much more compact), so we shouldn't argue too much about this cut-off.

When only an i8 is being switched over, a complete table is not large, and avoiding the branch of a regular lookup table is a significant speed win (as can be seen in the tests).
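
As a rough illustration of why (a hypothetical, standalone C++ sketch - not code from the patch, and the function names are made up):

#include <array>
#include <cstdint>

// A regular (partial) lookup table needs a range check that branches to the
// default result.
int lookupPartial(uint8_t X, const int *Table, unsigned TableSize, int Default) {
  if (X >= TableSize)
    return Default; // this branch is what a covered table avoids
  return Table[X];
}

// A covered table over the whole i8 domain: all 256 possible values have an
// entry (default results are baked into the table), so the load is
// unconditional.
int lookupCovered(uint8_t X, const std::array<int, 256> &Table) {
  return Table[X];
}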

Diff Detail

Event Timeline

shawnl created this revision.Apr 22 2019, 2:55 PM
shawnl updated this revision to Diff 196141.Apr 22 2019, 3:13 PM
shawnl edited the summary of this revision. (Show Details)

add more test results

shawnl updated this revision to Diff 196440.Apr 24 2019, 6:38 AM

also remove Sub (turns out Sub does not get normalized to Add like I thought it would).

lebedev.ri set the repository for this revision to rL LLVM.
lebedev.ri removed a subscriber: lebedev.ri.
shawnl updated this revision to Diff 196637.Apr 25 2019, 8:03 AM

fix i8 use of covered table to work when the return is not i8, do not use -O2 in tests. Update tests.

jmolloy requested changes to this revision.Apr 25 2019, 9:00 AM

Please reupload with full context as described in the developer's guide.

These numbers will change in a later patch (when sparse maps make switches much more compact), so we shouldn't argue too much about this cut-off.

Are you referring to your changes in builtins/? If so, that's surely part of compiler-rt, and you can't expect targets to have that available at runtime.

lib/Transforms/Utils/SimplifyCFG.cpp
5131

( 3 * 8 )

... but why? This isn't obvious from the code, so it needs a comment. Note that the original code's random division by 10 was just as bad, but you've clearly got a good reason for this change, so please document it ;)

5135

will

Bold claim :)

5142

Nit: please use the term "Largest". They're synonymous and I know this is nitpicking, but it's the more generally used term.

5145

This is only used in one place; please fold into its use.

5173

Please describe this heuristic in more detail. Why is it always good?

Take a look at the line 5162 in the diffbase for a good example of a heuristic description. It captures what the criterion is, and a rationale for it.

5174

These magic numbers have no place in SimplifyCFG. If you need to be this accurate, add a TTI hook.

5183

But why? You've replaced 40% with 33% and you mention 64-bit integers but the following heuristic doesn't use 64-bit integers anywhere.

5272

Please upload the code for review as it will be committed.

5386

This assumes whatever constant add/sub/xor was planted still exists. There's no guarantee of that; for example, if V is a constant, IRBuilder would have constant-folded it immediately.

In general, unstitching code like this is inherently problematic. Note that this isn't the same as planting an inverting code sequence - that's fine. We can use the optimizer to clean the duplicated stuff up and everyone is happy. That is much preferable to ham-handedly unstitching stuff and looking for patterns of code you've just emitted. It also adds an implicit contract between different parts of SimplifyCFG that I guarantee someone will miss when they update this code :)

5400

This comment doesn't parse for me.

5401

This code structure is a little hard to understand. Instead of trying to tack onto the end of the if statement to reuse the heuristic, can you extract the heuristic into a bool and reuse it explicitly here? Then the reader doesn't need to think about context.

5713

It's not clear to me why this matters from this context. Perhaps if you wrote something like:

// Call ReduceSwitchRange *after* SwitchToLookupTable as SwitchToLookupTable calls this internally.

This revision now requires changes to proceed.Apr 25 2019, 9:00 AM
shawnl marked 2 inline comments as done.Apr 25 2019, 11:51 AM

Thanks for the review!

I've split the other patch into 5 distinct patches, which I will submit once I run the tests.

There is one part of this review that I need some clarification on before I can rev this patch. (see below comment)

lib/Transforms/Utils/SimplifyCFG.cpp
5386

The Xor wasn't added by this stuff. The problem is that this pass gets run multiple times, sometimes without the table generation (because it can make the code analyzable). There is nowhere else this optimization can go, because it is an optimization specific to switch statements, where the operands can be re-ordered arbitrarily. Also, there is no assumption that these things are there. It simply sees if they are there and, if so, removes them and then returns true so that the other optimizations run before we continue.

So I feel this counts as a planted inverting code sequence rather than "unstitching" (the code does not generate xors, but it does remove them): it is an optimization specific to covered tables.

jmolloy added inline comments.Apr 25 2019, 12:05 PM
lib/Transforms/Utils/SimplifyCFG.cpp
5386

My worry about this code is that it doesn't assert the preconditions or postconditions. AFAICS, it detects any Add, Sub or Xor and does some transform on them. Why is this correct in all cases? What if one of these instructions were missed? What if there is a sequence of these instructions, is it correct to just adjust a subset? What if they were reordered or constant folded?

The code doesn't explain exactly what it's checking for, who generates it and why it's correct to remove (and what happens if you have a false negative or false positive).

Note I'm not saying it's wrong - perhaps this is the only way to do it - but with the lack of comments it remains a code smell at the moment, and it's difficult for me to suggest an alternative.

nikic added inline comments.Apr 25 2019, 12:17 PM
lib/Transforms/Utils/SimplifyCFG.cpp
5386

I think the idea here is that given two bijective (total) functions f(x) and lookup(y), then lookup(f(x)) is also a bijective function that can be implemented as a new lookup table lookup_f(x). In this case add, sub and xor with a constant are the f(x)s.

It seems like this could be an independent and more general optimization though, that works on any table lookup.
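
As a concrete sketch of that composition (standalone, hypothetical C++ - not the LLVM implementation): for f(x) = x ^ C over an i8 index, the constant can be folded away by permuting the table once.

#include <array>
#include <cstdint>

// Build lookup_f such that lookup_f[x] == Table[x ^ C] for every i8 x.
// f(x) = x ^ C is bijective on the i8 domain, so every entry is covered
// exactly once and the runtime xor disappears.
std::array<int, 256> foldXorIntoTable(const std::array<int, 256> &Table,
                                      uint8_t C) {
  std::array<int, 256> Permuted{};
  for (unsigned I = 0; I < 256; ++I)
    Permuted[I] = Table[static_cast<uint8_t>(I) ^ C];
  return Permuted;
}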

hans added a comment.Apr 26 2019, 1:58 AM

I'm trying to follow along here, but there's so much churn I'm not sure what I'm supposed to be reviewing?

Are you asking for review on this patch? Is it ready for review? Or should I look at the other five patches you alluded to, and if so should this be marked abandoned?

shawnl planned changes to this revision.Apr 26 2019, 12:12 PM

This patch got a pretty comprehensive review and I will submit a new version for review. The other patches are also live.

shawnl updated this revision to Diff 197604.May 1 2019, 11:36 AM
shawnl edited the summary of this revision. (Show Details)
shawnl marked an inline comment as done.
shawnl added inline comments.
lib/Transforms/Utils/SimplifyCFG.cpp
5386

I am working on doing this as a general optimization of GEPs.

shawnl edited the summary of this revision. (Show Details)May 1 2019, 11:37 AM
shawnl updated this revision to Diff 197625.May 1 2019, 1:17 PM

git-clang-format

jmolloy accepted this revision.May 2 2019, 1:11 AM

This seems reasonable to me.

This revision is now accepted and ready to land.May 2 2019, 1:11 AM
jmolloy requested changes to this revision.May 2 2019, 1:12 AM

Undo approval; was looking at an incorrect diff.

This revision now requires changes to proceed.May 2 2019, 1:12 AM
shawnl requested review of this revision.May 2 2019, 5:04 PM
jmolloy requested changes to this revision.May 3 2019, 7:47 AM

Looking much better. I think the TTI hook could be described better.

include/llvm/Analysis/TargetTransformInfo.h
581 ↗(On Diff #197625)

What is the TableSize, and what is the CaseSize? In particular why would TableSize ever be different from NumCases?

This revision now requires changes to proceed.May 3 2019, 7:47 AM
jmolloy requested changes to this revision.May 9 2019, 6:11 AM
jmolloy added inline comments.
include/llvm/Analysis/TargetTransformInfo.h
582 ↗(On Diff #198138)

To be honest, I'd really just express this as:

// Return true if the given SwitchInst should be converted into a lookup table. The size of the lookup table is \c TableSize and the number of covered cases is \c NumCases (meaning the number of table entries that hit the default case is \c TableSize - \c NumCases).
bool shouldBuildLookupTable(SwitchInst *SI, uint64_t TableSize, unsigned NumCases);

Sorry for the churn here, I know I didn't mention this in your previous review. But passing the SwitchInst itself and letting the target fish out OptSize/SI->getType()->getIntegerBitWidth() seems cleaner.

Also, never use size_t or uint32_t here. TableSize's maximum extent isn't host dependent, it's target dependent. Use uint64_t to guarantee 64-bit coverage on all hosts. Similarly NumCases isn't limited to 32-bits, so use unsigned because that's what we use everywhere.
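
For what it's worth, a minimal compilable sketch of that hook shape (the surrounding struct is a placeholder for illustration only; in-tree this would be a TargetTransformInfo method):

#include <cstdint>

class SwitchInst; // the real class lives in llvm/IR/Instructions.h

struct LookupTableHooks {
  // Return true if \p SI should be converted into a lookup table with
  // \p TableSize entries, of which \p NumCases come from explicit cases
  // (the remaining TableSize - NumCases entries yield the default result).
  virtual bool shouldBuildLookupTable(SwitchInst *SI, uint64_t TableSize,
                                      unsigned NumCases) = 0;
  virtual ~LookupTableHooks() = default;
};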

This revision now requires changes to proceed.May 9 2019, 6:11 AM
shawnl marked an inline comment as done.May 9 2019, 9:36 AM
shawnl added inline comments.
include/llvm/Analysis/TargetTransformInfo.h
582 ↗(On Diff #198138)

good catch on the size_t!

Similarly NumCases isn't limited to 32-bits, so use unsigned because that's what we use everywhere.

I was going to limit it to 32-bits in the next patch. Is that a problem?

jmolloy added inline comments.May 10 2019, 5:52 AM
include/llvm/Analysis/TargetTransformInfo.h
582 ↗(On Diff #198138)

I don't see the rationale for limiting it to 32-bits, so let's argue that in your next patch :)

In the meantime let's keep it to uint64_t or unsigned in this patch.

shawnl marked an inline comment as done.May 19 2019, 5:15 AM
shawnl added inline comments.
lib/Transforms/Utils/SimplifyCFG.cpp
5183

Along with the comment on the multiplication overflow above: this is basic grade-school multiplication. 1/3 == 33%; 64 / 8 == 8.

The heuristic has threatened to derail this patch set with bike-shedding, and has made me really frustrated.

Communication is a key part of doing this work, but when you worry that multiplication will not be understood by the reviewer, it really throws a wrench in the process.

jmolloy added inline comments.May 19 2019, 7:10 AM
lib/Transforms/Utils/SimplifyCFG.cpp
5183

Hi Shawn,

I apologise that my review has made you frustrated. If you wish to find a different reviewer, that's fine by me.

I have approximate knowledge of many things. Luckily multiplication is one of them. Clairvoyance is not, and the point of a code review is to ensure that submitted code can be comprehended by anyone, not just its author.

Communication is important, as you say, and it's possible that I've been poor in my own communication. My intent with the "But why?" comments wasn't to say "I cannot understand this at all", but rather "It took me longer than it should have to understand this". The latter does not mean the code is wrong; it means it either requires a comment or refactoring to make the intended logic (not the eventual mechanics) clear.

My view on code in heuristics is that it should be crystal clear what the intent is without any mental gymnastics.

If we take the code as you've rewritten it:

// The table density should be at least 1/3rd (33%) for 64-bit integers.
// This is a guess. FIXME: Find the best cut-off.
return (uint64_t)NumCases * 3 * (64 / 8) >= TableSize * CaseSize;

There are a few things the reader needs to do here. They must realise that the comment describes density with a number (33%), but the code calculates the inverse (denominator on the LHS, numerator on the RHS). It's still not clear to me what the "for 64-bit integers" means in the comment or precisely how it impacts the heuristic function.

Heuristics are always arbitrary, often wrong. They are the trickiest pieces of code to comprehend the intent of because of this, so I pay them more attention.

Also, you changed from 40% to 33% alongside this. Was there a reason behind this? People who notice the effects of this change may look back at this review to find a rationale.

Again, apologies if you find this pedantic. I always aim to be anything but pedantic in my reviews.

hans added inline comments.May 22 2019, 1:42 AM
lib/Transforms/Utils/SimplifyCFG.cpp
5183

To give some background, the 40% density was originally chosen to match the density used to form switch jump tables in SelectionDAGBuilder. That density is still used for -Os builds, see OptsizeJumpTableDensity in TargetLoweringBase.cpp.

Changing the threshold here may well be a good idea, especially since it has dropped to 10% for non-optsize jump tables (JumpTableDensity in TargetLoweringBase.cpp), but such a change needs at least a comment with the motivation.

Perhaps it could be synced with the jump table density? Or perhaps it could be left alone for now -- there's already a lot going on in this patch.

Currently you're returning false for optsize functions. If a switch is >40% dense, we will instead build a jump table, which is likely to be larger, so this would be both a size and performance regression.

Regarding the "randomly dividing by 10" before, that's my fault too.

The code currently looks like:

// The table density should be at least 40%. This is the same criterion as for
// jump tables, see SelectionDAGBuilder::handleJTSwitchCase.
// FIXME: Find the best cut-off.
return SI->getNumCases() * 10 >= TableSize * 4;

If I were to write that today I might have spelled it out with *100 and *40 instead to make it really clear that 40 is a percentage, but I optimized prematurely.

The "TableSize >= UINT64_MAX / 10" check above is to protect against overflow, as the comment says, but as the code evolved and that line moved further away from the density computation, it became harder to make the connection.

I mentioned before that I have trouble following along with this patch series. This one has a title of "Use lookup tables when they are more space efficient or a huge speed win" which is pretty vague, and the inline description says things like "trying to move towards much more space-efficient switch statements, using popcnt" and "These numbers will change in a later patch, (when sparse maps make switch much more compact) so we shouldn't argue too much about this cut-off".

This makes the patch very hard to review. At least for me, it's not clear what you're trying to do.

Making the switch-to-lookup table transformation better is excellent, but it needs to be done by well-argued changes in clear and focused patches.

Thanks for the context, Hans.

This review has been fairly critical, so I wanted to suggest some ways to make this patch land more easily.

  1. Split out the NFC move of ReduceSwitchRange to reduce code churn in this patch and make it more accessible to review. That NFC patch would land almost instantly without contention.
  2. A general rule of thumb is to split up refactoring, new functionality, and heuristic changes into different patches. In particular the former two are easier to review and approve than the latter, so if the latter is totally isolated it's much less likely to derail your other changes.
  3. I've previously explained how to make the heuristic self-describing. As Hans mentioned, premature optimizations like folding constant divisions/multiplications in the code can obscure the intended calculation.

This way, the refactor and new functionality could land without much contention, and the heuristic can be (a) picked over more easily, (b) isolated for benchmarking, and (c) isolated for rollback.

Cheers,

James

shawnl abandoned this revision.Jun 25 2020, 2:02 AM