This is an archive of the discontinued LLVM Phabricator instance.


Differential D60295

[CodeGen] Replace -max-jump-table-size with -max-jump-table-targets
Closed · Public

Authored by evandro on Apr 4 2019, 5:08 PM.

Details

Reviewers

hans
ayonam
junbuml
craig.topper
RKSimon
efriedma

Commits

rG3bd8ba156b52: [CodeGen] Replace -max-jump-table-size with -max-jump-table-targets
rL372893: [CodeGen] Replace -max-jump-table-size with -max-jump-table-targets

Summary

Modern processors predict the targets of an indirect branch regardless of the size of any jump table used to glean its target address. Moreover, branch predictors typically use resources limited by the number of actual targets that occur at run time.

This patch changes the semantics of the option -max-jump-table-size to limit the number of different targets instead of the number of entries in a jump table. Thus, it is now renamed to -max-jump-table-targets.

Before, when -max-jump-table-size was specified, a cluster's jump table could use the same target repeatedly, yet every entry was counted, so the resulting tables typically had the same number of entries. With this patch, when -max-jump-table-targets is specified, only unique targets count towards the limit. Tables may therefore have different lengths but the same number of unique targets, except for the last table, which contains the balance of the targets.
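As a rough illustration of the new semantics, the check can be sketched outside of LLVM as follows. This is only a sketch modelled on the patched isSuitableForJumpTable() shown in the diff further down; the JumpTableLimits struct and the function name are hypothetical stand-ins for the target hooks, not LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the target hooks; not part of LLVM.
struct JumpTableLimits {
  uint64_t MaxTargets;  // plays the role of -max-jump-table-targets
  unsigned MinDensity;  // minimum table density, in percent
};

// Same shape as the patched check: the number of unique targets must fit the
// limit (unless optimizing for size), and the cases must fill enough of the
// value range to meet the minimum density.
bool isSuitableForJumpTableSketch(const JumpTableLimits &Limits,
                                  bool OptForSize, uint64_t NumCases,
                                  uint64_t NumTargets, uint64_t Range) {
  assert(Range >= NumCases && "range cannot be smaller than the case count");
  return (OptForSize || NumTargets <= Limits.MaxTargets) &&
         NumCases * 100 >= Range * Limits.MinDensity;
}
```

Under these assumptions, a table with 6 cases spread over a range of 12 values is accepted with a limit of 4 targets as long as it reaches no more than 4 unique targets, regardless of how many entries it has.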

Diff Detail

Repository: rL LLVM

Event Timeline

evandro created this revision. Apr 4 2019, 5:08 PM

Herald added a project: Restricted Project. Apr 4 2019, 5:08 PM

Herald added subscribers: llvm-commits, jsji, jocewei and 24 others.

I'm still collecting data on other architectures, but I've observed improvements between 3 and 15% in some SPEC benchmarks, such as 253.perlbmk and 400.perlbench, and proprietary ones on AArch64.

This patch changes the semantics of the options min-jump-table-entries and max-jump-table-size to use the number of different targets instead of the number of entries in a jump table. Thus, they are now renamed to min-jump-table-cases and max-jump-table-cases, respectively.

Hmm, I'll have to think about this one.

I'm not sure it makes sense to change -min-jump-table-entries. I think that one is really supposed to decide when a jump table is better than separate branches, and I'm not sure it should change because of this.

-max-jump-table-size was introduced exactly for this purpose (limiting the load on the branch predictor), so making that based on the number of targets sounds reasonable (I'd suggest calling it max-jump-table-targets instead of -cases though).

I also don't see exactly how the semantics of max-jump-table-size are changed with this patch? What am I missing? Can you upload it again with more context maybe?

In D60295#1456226, @hans wrote:

I also don't see exactly how the semantics of max-jump-table-size are changed with this patch? What am I missing? Can you upload it again with more context maybe?

Look at llvm/include/llvm/CodeGen/TargetLowering.h below.

What does this do for codesize?

aheejin added inline comments. Apr 8 2019, 5:30 AM

llvm/test/CodeGen/WebAssembly/cfg-stackify.ll
1 ↗	(On Diff #193810)	Why does this test need this option?

In D60295#1456414, @evandro wrote:

In D60295#1456226, @hans wrote:

I also don't see exactly how the semantics of max-jump-table-size are changed with this patch? What am I missing? Can you upload it again with more context maybe?

Look at llvm/include/llvm/CodeGen/TargetLowering.h below.

Thanks! I had forgotten how this works :-)

I still think looking at the number of cases isn't that much better than looking at the size of the range though. As you said, the point is to limit the load on the branch target predictor, and IIUC that's limited on the number of *different branch targets*, which is really orthogonal to the number of cases. I realize that we don't have that information as readily available, but do you agree that limiting the jump table to a certain number of different targets would be a better approach?

llvm/include/llvm/CodeGen/TargetLowering.h
962 ↗	(On Diff #193810)	What's the `(NumCases < Range)` part for? Based on the description, I'd expect this to just check "NumCases <= MaxJumpTableCases"

In D60295#1458162, @dmgreen wrote:

What does this do for codesize?

Since there aren't that many jump tables, the increase in code size is negligible.

evandro marked 2 inline comments as done. Apr 19 2019, 10:33 AM

evandro added inline comments.

llvm/test/CodeGen/WebAssembly/cfg-stackify.ll
1 ↗	(On Diff #193810)	Because before there was no jump table and now there is, since the number of targets is small enough. Otherwise, this test would have to be modified for a reason unrelated to the test.

In D60295#1462585, @hans wrote:

I still think looking at the number of cases isn't that much better than looking at the size of the range though. As you said, the point is to limit the load on the branch target predictor, and IIUC that's limited on the number of *different branch targets*, which is really orthogonal to the number of cases. I realize that we don't have that information as readily available, but do you agree that limiting the jump table to a certain number of different targets would be a better approach?

For each case there is a target in the table, that may potentially be reached or not at run time, but would prevent stressing the predictor. So, cases and targets are not orthogonal, but the same.

llvm/include/llvm/CodeGen/TargetLowering.h
962 ↗	(On Diff #193810)	It's how I infer that there may be a default case. Or am I missing a better way to do so?

Since there aren't that many jump tables, the increase in code size is negligible.

For a point of reference on the codesize tests I ran, the increases from this patch are larger than the decreases from D59936 (which was the last decent codesize change I saw). Codesize changes might seem small at times, but they often come small change at a time. Plus it depends how you measure them.

Also, many cpus don't really work the way you claim. Some don't even have branch predictors, or the time would not be dominated by the branch mispredict. With those it's more about the difference between jump table setup code and the equivalent series of branches.

In D60295#1472786, @evandro wrote:

In D60295#1462585, @hans wrote:

I still think looking at the number of cases isn't that much better than looking at the size of the range though. As you said, the point is to limit the load on the branch target predictor, and IIUC that's limited on the number of *different branch targets*, which is really orthogonal to the number of cases. I realize that we don't have that information as readily available, but do you agree that limiting the jump table to a certain number of different targets would be a better approach?

For each case there is a target in the table, that may potentially be reached or not at run time, but would prevent stressing the predictor. So, cases and targets are not orthogonal, but the same.

But if that's so, shouldn't the "default" cases count too?

I don't know how CPUs do this internally, but in your change description you wrote: "Rather, branch predictors typically use resources limited by the number of actual targets that occur at run time."

I interpreted this as the number of actual target addresses, which is less than or equal to the number of cases. For example in

switch (x) {
case 1: case 8: case 12: foo(); break;
case 3: case 7: case 9: bar(); break;
default: baz();
}

there are only 3 branch targets, but 6 cases, and an indirect jump through a jump table could take 12 different values.

The question is which of those dimensions that the branch target predictor is limited by.

Does the CPU keep track of the branch target for each of the 12 possible in-range values? Or does it have some kind of reverse map that tracks which values map to which of the 3 targets? I don't have the expertise to answer this, but so far I'm not convinced that limiting the number of cases is better than limiting the size of the range.
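The three dimensions in the switch example above (cases, range, and unique targets) can be worked out with a small standalone sketch. The Cluster and Stats types below are simplified, hypothetical stand-ins rather than the LLVM types, and the rule for counting the default target follows the getJumpTableStats() code in the final diff.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Simplified, hypothetical case cluster: an inclusive range of case values
// [Low, High] that all branch to the same target.
struct Cluster {
  int64_t Low, High;
  std::string Target;
};

struct Stats {
  uint64_t Range, Cases, Targets;
};

// Clusters are assumed to be non-empty, sorted by value, and non-overlapping.
Stats computeStats(const std::vector<Cluster> &Clusters,
                   bool HasReachableDefault) {
  assert(!Clusters.empty());
  std::set<std::string> Unique;
  Stats S{0, 0, 0};
  for (const Cluster &C : Clusters) {
    S.Cases += static_cast<uint64_t>(C.High - C.Low) + 1;
    Unique.insert(C.Target);
  }
  S.Range =
      static_cast<uint64_t>(Clusters.back().High - Clusters.front().Low) + 1;
  // Holes in the range jump to the default block, which then counts as one
  // more target if it is reachable.
  S.Targets =
      Unique.size() + ((HasReachableDefault && S.Range > S.Cases) ? 1 : 0);
  return S;
}

// The switch from the comment above: cases 1, 8, 12 call foo();
// cases 3, 7, 9 call bar(); everything else calls baz().
std::vector<Cluster> exampleSwitch() {
  return {{1, 1, "foo"}, {3, 3, "bar"}, {7, 7, "bar"},
          {8, 8, "foo"}, {9, 9, "bar"}, {12, 12, "foo"}};
}
```

For this switch the sketch yields 6 cases, a range of 12 values, and 3 targets when the default is reachable, which matches the numbers quoted in the comment.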

evandro marked an inline comment as done. Apr 25 2019, 6:21 AM

@hans, yes, default, whether explicit or implicit, counts too, for the resources used inside the target only care about different addresses. In your example, a typical branch predictor will see up to 3 target addresses. Of course, branch predictors are trained by branches that are actually taken, so only target addresses that are executed count. Which opens the window for future work involving FDO or just the static probabilities given by the default heuristics.

I'll study your example further.

evandro updated this revision to Diff 206941. Jun 27 2019, 2:41 PM

evandro retitled this revision from [SelectionDAG] Change the jump table size unit from entry to target to [CodeGen] Change the jump table size unit from entry to target.

evandro edited the summary of this revision.

Ping! 🔔

Herald added a subscriber: wuzish. Jul 11 2019, 8:31 AM

Please upload with the full context.

xbolva00 added a reviewer: efriedma. Jul 11 2019, 8:40 AM

diff with context?

evandro updated this revision to Diff 209282. Jul 11 2019, 12:01 PM

🔔 ¡Ping! 🔔

Hi Evandro,

Very sorry for the slow reply here, but it needed some thinking.

From the change description:

[CodeGen] Change the jump table size unit from entry to target

It's not so clear what a "jump table size unit" is. I think maybe "Replace -max-jump-table-size with -max-jump-table-targets" or similar would be more clear.

llvm/include/llvm/CodeGen/SwitchLoweringUtils.h
232 ↗	(On Diff #209282)	Maybe add "unique" in here to make it clearer.
llvm/lib/CodeGen/SwitchLoweringUtils.cpp
51 ↗	(On Diff #209282)	I would probably just have gone with "Targets.insert(Clusters[i].MBB)" since C isn't getting used anywhere else.
96 ↗	(On Diff #209282)	MaxTargets isn't a great variable name for something that's "Number of targets, including the default if it's reachable". Maybe getJumpTableNumTargets() could take the default case into account so it doesn't need to be handled separately?
156 ↗	(On Diff #209282)	Hmm, this is problematic. getJumpTableNumTargets has linear time complexity in the number of clusters, so I think this is essentially adding another factor O(n) to the overall time complexity, which is unfortunate. One way to solve this within the original O(n^2) complexity of the current code, would be to build an auxiliary data structure in the outer for-loop, e.g. TotalNumTargets[], where TotalNumTargets[x] is the total number of targets in clusters i..x (i being defined in the outer loop). That would be built in O(n) time, and the inner loop would then query TotalNumTargets[j] (in constant time).
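The auxiliary structure suggested in the last comment can be sketched as follows. This is a minimal illustration under stated assumptions, not the code that was committed: TargetOf is a hypothetical array giving an integer id for the target block of each cluster, and a hash set is assumed to give expected constant-time insertion.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// For a fixed start index I (the outer loop variable), build
// TotalNumTargets[X] = number of unique targets among clusters I..X.
// Entries below I are unused and left at zero. One pass over the clusters,
// so the inner loop can later look up TotalNumTargets[J] in constant time.
std::vector<std::size_t>
totalNumTargetsFrom(const std::vector<int> &TargetOf, std::size_t I) {
  assert(I < TargetOf.size() && "start index out of range");
  std::vector<std::size_t> TotalNumTargets(TargetOf.size(), 0);
  std::unordered_set<int> Seen;
  for (std::size_t X = I; X < TargetOf.size(); ++X) {
    Seen.insert(TargetOf[X]);
    TotalNumTargets[X] = Seen.size();
  }
  return TotalNumTargets;
}
```

Rebuilding this array once per outer iteration costs O(n) each time, so the partitioning loop as a whole stays at O(n^2) instead of recounting targets for every (i, j) pair.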

evandro marked 4 inline comments as done. Aug 6 2019, 5:21 PM

evandro added inline comments.

llvm/lib/CodeGen/SwitchLoweringUtils.cpp
156 ↗	(On Diff #209282)	If I understood you correctly, pre calculating the number of targets in clusters i..x, where x will be j, is the same as performing this calculation inside the inner loop.

evandro updated this revision to Diff 213770. Aug 6 2019, 6:07 PM

evandro retitled this revision from [CodeGen] Change the jump table size unit from entry to target to [CodeGen] Replace -max-jump-table-size with -max-jump-table-targets.

evandro edited the summary of this revision. Aug 6 2019, 6:37 PM

Ping❗ 🔔🔔🔔

Very sorry again for the slow reply. I think this patch still suffers from the O(n^3) time complexity.

llvm/lib/CodeGen/SwitchLoweringUtils.cpp
156 ↗	(On Diff #209282)	No, the current code does O(n) work in the inner loop, making the overall time complexity O(n^3). Pre-computing the number of targets in clusters i..x in the outer loop would preserve the overall time complexity of O(n^2).

evandro marked 2 inline comments as done. Aug 15 2019, 6:06 PM

evandro added inline comments.

llvm/lib/CodeGen/SwitchLoweringUtils.cpp
156 ↗	(On Diff #209282)	Let me see if I understood what you mean in a patch...

evandro updated this revision to Diff 215516. Aug 15 2019, 6:07 PM

evandro marked an inline comment as done.

evandro marked an inline comment as done. Aug 16 2019, 11:24 AM

🔔🔔 ‼️Ping‼️ 🔔🔔

In D60295#1641609, @evandro wrote:

🔔🔔 ‼️Ping‼️ 🔔🔔

Sorry, I've been busy with LLVM 9 release and other work. Hopefully I can get to this sometime next week; I haven't forgotten about it.

I think this keeps the algorithm still within O(n^2) so that's good.

But I am worried that this adds more complexity to something that is already very complex. Can you share numbers to show that this is worth it?

I left some comments for how to maybe make this cleaner, and I think that's the main improvement that's needed: it needs to be easier to read the code, understand what's happening and see that it's correct.

I also worry that this makes the code slower, even though the new functionality is not used by most targets. Even though it's still O(n^2) time complexity, we're doing more work now. Can you measure and see how this affects compile time of a large switch? (Maybe try the program from https://bugs.llvm.org/show_bug.cgi?id=23490)

llvm/lib/CodeGen/SwitchLoweringUtils.cpp
142 ↗	(On Diff #215516)	Maybe PartitionStats is a better name, since it's really about numbers, not any other kinds of traits.
147 ↗	(On Diff #215516)	I guess you mean PartitionTrait, not PartitionTarget. Also, the indexing of this seems very confusing, and that seems to be because elements are inserted with push_back(). Couldn't PartitionTrait[x] be the traits for Clusters[i..x], i.e. a straight-forward indexing? Also, since it's a vector, I don't think the name should be in singular.
166 ↗	(On Diff #215516)	Using the same variable, defined on line 158, both for initializing the array, and then for lookups later is confusing. I think it would be easier to read if the code in this block were more like: PartitionTraits[j].Range = ... PartitionTraits[j].Cases = ... PartitionTraits[j].Targets = ...
179 ↗	(On Diff #215516)	Here the indexing gets confusing (also we probably don't want to copy the struct, but use a const-ref).

RKSimon mentioned this in rL371415: Fix typo in comment noticed in D60295. NFCI.. Sep 9 2019, 9:06 AM

RKSimon mentioned this in rG9ede7c039563: Fix typo in comment noticed in D60295. NFCI..

evandro marked 2 inline comments as done. Sep 12 2019, 10:58 AM

evandro marked 2 inline comments as done. Sep 12 2019, 12:53 PM

Please, stand by for numbers.

Running llc -O1 3 times on the byte code from a.i in the "reduced some more" archive:

Before: average of 38.383s ± 0.064s
After: average of 38.440s ± 0.049s

Or an increase of 0.15% in the run time.
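The 0.15% figure follows from the two averages above. A tiny helper (the function name is hypothetical, for illustration only) makes the arithmetic explicit:

```cpp
#include <cassert>

// Relative change from Before to After, expressed in percent.
double percentIncrease(double Before, double After) {
  assert(Before > 0.0 && "baseline must be positive");
  return (After - Before) / Before * 100.0;
}
```

With Before = 38.383 and After = 38.440, the difference is 0.057s, or roughly 0.15% of the baseline. That difference is also smaller than the ± 0.064s spread reported for the "before" runs.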

lgtm with comments

llvm/include/llvm/CodeGen/SwitchLoweringUtils.h
236 ↗	(On Diff #219982)	Since it has access to Clusters and First and Last, passing in Cases and Range seems redundant. It seems the function should be able to figure those things out itself.
llvm/lib/CodeGen/SwitchLoweringUtils.cpp
139 ↗	(On Diff #219982)	The comment needs updating for the PartitionTrait rename.
147 ↗	(On Diff #219982)	s/traits/stats/ And thanks for updating the indexing, this is easier to follow.

This revision is now accepted and ready to land. Sep 17 2019, 6:38 AM

Thank you.

llvm/include/llvm/CodeGen/SwitchLoweringUtils.h
236 ↗	(On Diff #219982)	That is true, but, whenever `getJumpTableNumTargets()` is called, those values are calculated and then used again. So, at least for this use case, it seems to be more efficient to let `getJumpTableNumTargets()` calculate and return them through references.

evandro marked an inline comment as done. Sep 17 2019, 8:31 AM

evandro added inline comments.

llvm/include/llvm/CodeGen/SwitchLoweringUtils.h
236 ↗	(On Diff #219982)	I take it back. `Cases` needs the `TotalCases` array to be calculated. So, no, with the current arguments, `getJumpTableNumCases()` can only figure `Range` out.

Updated the patch, including the suggested refactoring.

hans added inline comments. Sep 18 2019, 12:18 AM

llvm/lib/CodeGen/SwitchLoweringUtils.cpp
49 ↗	(On Diff #220526)	I think it would be simpler if this just returned a PartitionStats object, instead of "returning" via a reference parameter.
54 ↗	(On Diff #220526)	The point of the AllCases vector was to avoid having to iterate from First to Last each time to compute the number of cases. Now that we're iterating from First to Last anyway to count the number of targets, there's no point to this optimization really, and we might as well count the number of cases while counting the number of targets. That would make the code simpler.

evandro marked 4 inline comments as done. Sep 18 2019, 8:14 AM

evandro added inline comments.

llvm/lib/CodeGen/SwitchLoweringUtils.cpp
54 ↗	(On Diff #220526)	Of course!

evandro updated this revision to Diff 220679. Sep 18 2019, 8:23 AM

evandro marked an inline comment as done.

hans added inline comments. Sep 20 2019, 12:06 AM

llvm/lib/CodeGen/SwitchLoweringUtils.cpp
55 ↗	(On Diff #220679)	There's still no need for the vector. The number of cases from First to Last is the sum of (Clusters[i].High - Clusters[i].Low) for each i between First and Last. The sum can be computed directly since we're running the for-loop anyway. There's no need to use the vector.
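The direct summation described in this comment could look roughly like the sketch below. CaseRange is a hypothetical simplified cluster with plain integer bounds. Note that the committed code adds 1 per cluster, as in `(Hi - Lo).getLimitedValue() + 1`, because the bounds are inclusive; the sketch does the same.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simplified cluster: an inclusive range of case values.
struct CaseRange {
  int64_t Low, High;
};

// Number of cases covered by clusters First..Last, computed in the same loop
// that would also visit each cluster to collect its target, so no separate
// prefix-sum vector is needed.
uint64_t countCases(const std::vector<CaseRange> &Clusters, std::size_t First,
                    std::size_t Last) {
  assert(First <= Last && Last < Clusters.size());
  uint64_t Cases = 0;
  for (std::size_t I = First; I <= Last; ++I) {
    assert(Clusters[I].High >= Clusters[I].Low);
    Cases += static_cast<uint64_t>(Clusters[I].High - Clusters[I].Low) + 1;
  }
  return Cases;
}
```

For clusters [1, 3], [5, 5] and [10, 12], the whole range contains 3 + 1 + 3 = 7 cases, and the last two clusters alone contain 4.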

evandro marked 2 inline comments as done. Sep 20 2019, 11:16 AM

evandro added inline comments.

llvm/lib/CodeGen/SwitchLoweringUtils.cpp
55 ↗	(On Diff #220679)	I see, inlining the functions makes this clear.

evandro updated this revision to Diff 221092. Sep 20 2019, 12:27 PM

evandro marked an inline comment as done.

evandro updated this revision to Diff 221095. Sep 20 2019, 12:46 PM

hans added inline comments. Sep 23 2019, 1:51 AM

llvm/lib/CodeGen/SwitchLoweringUtils.cpp
33 ↗	(On Diff #221095)	I'd suggest not declaring these until they're used.
36 ↗	(On Diff #221095)	I don't know what the "accumulated" refers to here. I don't think there's any need for a comment here actually.
39 ↗	(On Diff #221095)	There's nothing accumulated here. I'd suggest just dropping the comment.
41 ↗	(On Diff #221095)	It would be better to just declare the variables here: APInt Hi = ... APInt Lo = ...
49 ↗	(On Diff #221095)	And declare these variables here (no need to reuse the same variables as in the loop): APInt Hi = ... APInt Lo = ...

evandro updated this revision to Diff 221373. Sep 23 2019, 11:30 AM

evandro marked 5 inline comments as done.

I think you uploaded a new patch but the code that I commented on still looks the same?

🤦🏻‍♂️

evandro updated this revision to Diff 221564. Sep 24 2019, 11:07 AM

Thanks! Looks good to me.

Thank you.

Closed by commit rL372893: [CodeGen] Replace -max-jump-table-size with -max-jump-table-targets (authored by evandro). Sep 25 2019, 9:09 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h	2 lines
llvm/trunk/include/llvm/CodeGen/SwitchLoweringUtils.h	8 lines
llvm/trunk/include/llvm/CodeGen/TargetLowering.h	28 lines
llvm/trunk/lib/CodeGen/SwitchLoweringUtils.cpp	94 lines
llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp	18 lines
llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp	7 lines
llvm/trunk/lib/Target/AArch64/AArch64Subtarget.h	4 lines
llvm/trunk/lib/Target/AArch64/AArch64Subtarget.cpp	4 lines
llvm/trunk/test/CodeGen/AArch64/max-jump-table.ll	46 lines

Diff 221787

llvm/trunk/include/llvm/CodeGen/BasicTTIImpl.h

[... 367 lines not shown ...]	unsigned getEstimatedNumberOfCaseClusters(const SwitchInst &SI,
// Check if suitable for a jump table.		// Check if suitable for a jump table.
if (IsJTAllowed) {		if (IsJTAllowed) {
if (N < 2 || N < TLI->getMinimumJumpTableEntries())		if (N < 2 || N < TLI->getMinimumJumpTableEntries())
return N;		return N;
uint64_t Range =		uint64_t Range =
(MaxCaseVal - MinCaseVal)		(MaxCaseVal - MinCaseVal)
.getLimitedValue(std::numeric_limits<uint64_t>::max() - 1) + 1;		.getLimitedValue(std::numeric_limits<uint64_t>::max() - 1) + 1;
// Check whether a range of clusters is dense enough for a jump table		// Check whether a range of clusters is dense enough for a jump table
if (TLI->isSuitableForJumpTable(&SI, N, Range)) {		if (TLI->isSuitableForJumpTable(&SI, N, 0, Range)) {
JumpTableSize = Range;		JumpTableSize = Range;
return 1;		return 1;
}		}
}		}
return N;		return N;
}		}

bool shouldBuildLookupTables() {		bool shouldBuildLookupTables() {
[... 1,330 lines not shown ...]

llvm/trunk/include/llvm/CodeGen/SwitchLoweringUtils.h

[... 215 lines not shown ...]	struct BitTestBlock {
BitTestBlock(APInt F, APInt R, const Value *SV, unsigned Rg, MVT RgVT, bool E,		BitTestBlock(APInt F, APInt R, const Value *SV, unsigned Rg, MVT RgVT, bool E,
bool CR, MachineBasicBlock *P, MachineBasicBlock *D,		bool CR, MachineBasicBlock *P, MachineBasicBlock *D,
BitTestInfo C, BranchProbability Pr)		BitTestInfo C, BranchProbability Pr)
: First(std::move(F)), Range(std::move(R)), SValue(SV), Reg(Rg),		: First(std::move(F)), Range(std::move(R)), SValue(SV), Reg(Rg),
RegVT(RgVT), Emitted(E), ContiguousRange(CR), Parent(P), Default(D),		RegVT(RgVT), Emitted(E), ContiguousRange(CR), Parent(P), Default(D),
Cases(std::move(C)), Prob(Pr) {}		Cases(std::move(C)), Prob(Pr) {}
};		};

/// Return the range of values within a range.
uint64_t getJumpTableRange(const CaseClusterVector &Clusters, unsigned First,
unsigned Last);

/// Return the number of cases within a range.
uint64_t getJumpTableNumCases(const SmallVectorImpl<unsigned> &TotalCases,
unsigned First, unsigned Last);

struct SwitchWorkListItem {		struct SwitchWorkListItem {
MachineBasicBlock *MBB;		MachineBasicBlock *MBB;
CaseClusterIt FirstCluster;		CaseClusterIt FirstCluster;
CaseClusterIt LastCluster;		CaseClusterIt LastCluster;
const ConstantInt *GE;		const ConstantInt *GE;
const ConstantInt *LT;		const ConstantInt *LT;
BranchProbability DefaultProb;		BranchProbability DefaultProb;
};		};
[... 58 lines not shown ...]

llvm/trunk/include/llvm/CodeGen/TargetLowering.h

[... 1,016 lines not shown ...]	bool rangeFitsInWord(const APInt &Low, const APInt &High,
const DataLayout &DL) const {		const DataLayout &DL) const {
// FIXME: Using the pointer type doesn't seem ideal.		// FIXME: Using the pointer type doesn't seem ideal.
uint64_t BW = DL.getIndexSizeInBits(0u);		uint64_t BW = DL.getIndexSizeInBits(0u);
uint64_t Range = (High - Low).getLimitedValue(UINT64_MAX - 1) + 1;		uint64_t Range = (High - Low).getLimitedValue(UINT64_MAX - 1) + 1;
return Range <= BW;		return Range <= BW;
}		}

/// Return true if lowering to a jump table is suitable for a set of case		/// Return true if lowering to a jump table is suitable for a set of case
/// clusters which may contain \p NumCases cases, \p Range range of values.		/// clusters which may contain \p NumCases cases, \p Range range of values,
virtual bool isSuitableForJumpTable(const SwitchInst *SI, uint64_t NumCases,		/// \p NumTargets targets.
		virtual bool isSuitableForJumpTable(const SwitchInst *SI,
		uint64_t NumCases, uint64_t NumTargets,
uint64_t Range) const {		uint64_t Range) const {
// FIXME: This function check the maximum table size and density, but the		// FIXME: This function check the maximum table size and density, but the
// minimum size is not checked. It would be nice if the minimum size is		// minimum size is not checked. It would be nice if the minimum size is
// also combined within this function. Currently, the minimum size check is		// also combined within this function. Currently, the minimum size check is
// performed in findJumpTable() in SelectionDAGBuiler and		// performed in findJumpTable() in SelectionDAGBuiler and
// getEstimatedNumberOfCaseClusters() in BasicTTIImpl.		// getEstimatedNumberOfCaseClusters() in BasicTTIImpl.
const bool OptForSize = SI->getParent()->getParent()->hasOptSize();		const bool OptForSize = SI->getParent()->getParent()->hasOptSize();
const unsigned MinDensity = getMinimumJumpTableDensity(OptForSize);		const unsigned MinDensity = getMinimumJumpTableDensity(OptForSize);
const unsigned MaxJumpTableSize = getMaximumJumpTableSize();		const unsigned MaxJumpTableTargets = getMaximumJumpTableTargets();

// Check whether the number of cases is small enough and		// Check whether the number of targets is small enough and
// the range is dense enough for a jump table.		// the range is dense enough for a jump table.
if ((OptForSize || Range <= MaxJumpTableSize) &&		if ((OptForSize || NumTargets <= MaxJumpTableTargets) &&
(NumCases * 100 >= Range * MinDensity)) {		NumCases * 100 >= Range * MinDensity)
return true;		return true;
}
return false;		return false;
}		}

/// Return true if lowering to a bit test is suitable for a set of case		/// Return true if lowering to a bit test is suitable for a set of case
/// clusters which contains \p NumDests unique destinations, \p Low and		/// clusters which contains \p NumDests unique destinations, \p Low and
/// \p High as its lowest and highest case values, and expects \p NumCmps		/// \p High as its lowest and highest case values, and expects \p NumCmps
/// case value comparisons. Check if the number of destinations, comparison		/// case value comparisons. Check if the number of destinations, comparison
/// metric, and range are all suitable.		/// metric, and range are all suitable.
[... 486 lines not shown ...]
}		}

/// Return lower limit for number of blocks in a jump table.		/// Return lower limit for number of blocks in a jump table.
virtual unsigned getMinimumJumpTableEntries() const;		virtual unsigned getMinimumJumpTableEntries() const;

/// Return lower limit of the density in a jump table.		/// Return lower limit of the density in a jump table.
unsigned getMinimumJumpTableDensity(bool OptForSize) const;		unsigned getMinimumJumpTableDensity(bool OptForSize) const;

/// Return upper limit for number of entries in a jump table.		/// Return upper limit for number of targets in a jump table.
/// Zero if no limit.		unsigned getMaximumJumpTableTargets() const;
unsigned getMaximumJumpTableSize() const;

virtual bool isJumpTableRelative() const {		virtual bool isJumpTableRelative() const {
return TM.isPositionIndependent();		return TM.isPositionIndependent();
}		}

/// If a physical register, this specifies the register that		/// If a physical register, this specifies the register that
/// llvm.savestack/llvm.restorestack should save and restore.		/// llvm.savestack/llvm.restorestack should save and restore.
unsigned getStackPointerRegisterToSaveRestore() const {		unsigned getStackPointerRegisterToSaveRestore() const {
[... 390 lines not shown ...]	protected:
/// llvm.longjmp or the version without _. Defaults to false.		/// llvm.longjmp or the version without _. Defaults to false.
void setUseUnderscoreLongJmp(bool Val) {		void setUseUnderscoreLongJmp(bool Val) {
UseUnderscoreLongJmp = Val;		UseUnderscoreLongJmp = Val;
}		}

/// Indicate the minimum number of blocks to generate jump tables.		/// Indicate the minimum number of blocks to generate jump tables.
void setMinimumJumpTableEntries(unsigned Val);		void setMinimumJumpTableEntries(unsigned Val);

/// Indicate the maximum number of entries in jump tables.		/// Indicate the maximum number of targets in jump tables.
/// Set to zero to generate unlimited jump tables.		void setMaximumJumpTableTargets(unsigned);
void setMaximumJumpTableSize(unsigned);

/// If set to a physical register, this specifies the register that		/// If set to a physical register, this specifies the register that
/// llvm.savestack/llvm.restorestack should save and restore.		/// llvm.savestack/llvm.restorestack should save and restore.
void setStackPointerRegisterToSaveRestore(unsigned R) {		void setStackPointerRegisterToSaveRestore(unsigned R) {
StackPointerRegisterToSaveRestore = R;		StackPointerRegisterToSaveRestore = R;
}		}

/// Tells the code generator that the target has multiple (allocatable)		/// Tells the code generator that the target has multiple (allocatable)
[... 2,273 lines not shown ...]

llvm/trunk/lib/CodeGen/SwitchLoweringUtils.cpp

//===- SwitchLoweringUtils.cpp - Switch Lowering --------------------------===//		//===- SwitchLoweringUtils.cpp - Switch Lowering --------------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file contains switch inst lowering optimizations and utilities for		// This file contains switch inst lowering optimizations and utilities for
// codegen, so that it can be used for both SelectionDAG and GlobalISel.		// codegen, so that it can be used for both SelectionDAG and GlobalISel.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

		#include "llvm/ADT/SmallSet.h"
#include "llvm/CodeGen/MachineJumpTableInfo.h"		#include "llvm/CodeGen/MachineJumpTableInfo.h"
#include "llvm/CodeGen/SwitchLoweringUtils.h"		#include "llvm/CodeGen/SwitchLoweringUtils.h"

using namespace llvm;		using namespace llvm;
using namespace SwitchCG;		using namespace SwitchCG;

uint64_t SwitchCG::getJumpTableRange(const CaseClusterVector &Clusters,		// Collection of partition stats, made up of, for a given cluster,
unsigned First, unsigned Last) {		// the range of the cases, their number and the number of unique targets.
assert(Last >= First);		struct PartitionStats {
const APInt &LowCase = Clusters[First].Low->getValue();		uint64_t Range, Cases, Targets;
const APInt &HighCase = Clusters[Last].High->getValue();		};
assert(LowCase.getBitWidth() == HighCase.getBitWidth());
		static PartitionStats getJumpTableStats(const CaseClusterVector &Clusters,
// FIXME: A range of consecutive cases has 100% density, but only requires one		unsigned First, unsigned Last,
// comparison to lower. We should discriminate against such consecutive ranges		bool HasReachableDefault) {
// in jump tables.		assert(Last >= First && "Invalid order of clusters");
return (HighCase - LowCase).getLimitedValue((UINT64_MAX - 1) / 100) + 1;
		SmallSet<const MachineBasicBlock *, 8> Targets;
		PartitionStats Stats;

		Stats.Cases = 0;
		for (unsigned i = First; i <= Last; ++i) {
		const APInt &Hi = Clusters[i].High->getValue(),
		&Lo = Clusters[i].Low->getValue();
		Stats.Cases += (Hi - Lo).getLimitedValue() + 1;

		Targets.insert(Clusters[i].MBB);
}		}
		assert(Stats.Cases < UINT64_MAX / 100 && "Too many cases");

uint64_t		const APInt &Hi = Clusters[Last].High->getValue(),
SwitchCG::getJumpTableNumCases(const SmallVectorImpl<unsigned> &TotalCases,		&Lo = Clusters[First].Low->getValue();
unsigned First, unsigned Last) {		assert(Hi.getBitWidth() == Lo.getBitWidth());
assert(Last >= First);		Stats.Range = (Hi - Lo).getLimitedValue((UINT64_MAX - 1) / 100) + 1;
assert(TotalCases[Last] >= TotalCases[First]);		assert(Stats.Range >= Stats.Cases && "Invalid range or number of cases");
uint64_t NumCases =
TotalCases[Last] - (First == 0 ? 0 : TotalCases[First - 1]);		Stats.Targets =
return NumCases;		Targets.size() + (HasReachableDefault && Stats.Range > Stats.Cases);

		return Stats;
}		}

void SwitchCG::SwitchLowering::findJumpTables(CaseClusterVector &Clusters,		void SwitchCG::SwitchLowering::findJumpTables(CaseClusterVector &Clusters,
const SwitchInst *SI,		const SwitchInst *SI,
MachineBasicBlock *DefaultMBB) {		MachineBasicBlock *DefaultMBB) {
#ifndef NDEBUG		#ifndef NDEBUG
// Clusters must be non-empty, sorted, and only contain Range clusters.		// Clusters must be non-empty, sorted, and only contain Range clusters.
assert(!Clusters.empty());		assert(!Clusters.empty());
[...]	#endif
const unsigned MinJumpTableEntries = TLI->getMinimumJumpTableEntries();		const unsigned MinJumpTableEntries = TLI->getMinimumJumpTableEntries();
const unsigned SmallNumberOfEntries = MinJumpTableEntries / 2;		const unsigned SmallNumberOfEntries = MinJumpTableEntries / 2;

// Bail if not enough cases.		// Bail if not enough cases.
const int64_t N = Clusters.size();		const int64_t N = Clusters.size();
if (N < 2 || N < MinJumpTableEntries)		if (N < 2 || N < MinJumpTableEntries)
return;		return;

// Accumulated number of cases in each cluster and those prior to it.		const bool HasReachableDefault =
SmallVector<unsigned, 8> TotalCases(N);		!isa<UnreachableInst>(DefaultMBB->getBasicBlock()->getFirstNonPHIOrDbg());
for (unsigned i = 0; i < N; ++i) {		PartitionStats Stats =
const APInt &Hi = Clusters[i].High->getValue();		getJumpTableStats(Clusters, 0, N - 1, HasReachableDefault);
const APInt &Lo = Clusters[i].Low->getValue();
TotalCases[i] = (Hi - Lo).getLimitedValue() + 1;
if (i != 0)
TotalCases[i] += TotalCases[i - 1];
}

uint64_t Range = getJumpTableRange(Clusters,0, N - 1);
uint64_t NumCases = getJumpTableNumCases(TotalCases, 0, N - 1);
assert(NumCases < UINT64_MAX / 100);
assert(Range >= NumCases);

// Cheap case: the whole range may be suitable for jump table.		// Cheap case: the whole range may be suitable for jump table.
if (TLI->isSuitableForJumpTable(SI, NumCases, Range)) {		if (TLI->isSuitableForJumpTable(SI, Stats.Cases, Stats.Targets, Stats.Range)) {
CaseCluster JTCluster;		CaseCluster JTCluster;
if (buildJumpTable(Clusters, 0, N - 1, SI, DefaultMBB, JTCluster)) {		if (buildJumpTable(Clusters, 0, N - 1, SI, DefaultMBB, JTCluster)) {
Clusters[0] = JTCluster;		Clusters[0] = JTCluster;
Clusters.resize(1);		Clusters.resize(1);
return;		return;
}		}
}		}

// The algorithm below is not suitable for -O0.		// The algorithm below is not suitable for -O0.
if (TM->getOptLevel() == CodeGenOpt::None)		if (TM->getOptLevel() == CodeGenOpt::None)
return;		return;

// Split Clusters into minimum number of dense partitions. The algorithm uses		// Split Clusters into minimum number of dense partitions. The algorithm uses
// the same idea as Kannan & Proebsting "Correction to 'Producing Good Code		// the same idea as Kannan & Proebsting "Correction to 'Producing Good Code
// for the Case Statement'" (1994), but builds the MinPartitions array in		// for the Case Statement'" (1994), but builds the MinPartitions array in
// reverse order to make it easier to reconstruct the partitions in ascending		// reverse order to make it easier to reconstruct the partitions in ascending
// order. In the choice between two optimal partitionings, it picks the one		// order. In the choice between two optimal partitionings, it picks the one
// which yields more jump tables.		// which yields more jump tables.

// MinPartitions[i] is the minimum nbr of partitions of Clusters[i..N-1].		// MinPartitions[i] is the minimum nbr of partitions of Clusters[i..N-1].
SmallVector<unsigned, 8> MinPartitions(N);		SmallVector<unsigned, 8> MinPartitions(N);
// LastElement[i] is the last element of the partition starting at i.		// LastElement[i] is the last element of the partition starting at i.
SmallVector<unsigned, 8> LastElement(N);		SmallVector<unsigned, 8> LastElement(N);
// PartitionsScore[i] is used to break ties when choosing between two
// partitionings resulting in the same number of partitions.
SmallVector<unsigned, 8> PartitionsScore(N);
// For PartitionsScore, a small number of comparisons is considered as good as		// For PartitionsScore, a small number of comparisons is considered as good as
// a jump table and a single comparison is considered better than a jump		// a jump table and a single comparison is considered better than a jump
// table.		// table.
enum PartitionScores : unsigned {		enum PartitionScores : unsigned {
NoTable = 0,		NoTable = 0,
Table = 1,		Table = 1,
FewCases = 1,		FewCases = 1,
SingleCase = 2		SingleCase = 2
};		};
		// PartitionsScore[i] is used to break ties when choosing between two
		// partitionings resulting in the same number of partitions.
		SmallVector<unsigned, 8> PartitionsScore(N);
		// PartitionsStats[j] is the stats for the partition Clusters[i..j].
		SmallVector<PartitionStats, 8> PartitionsStats(N);

// Base case: There is only one way to partition Clusters[N-1].		// Base case: There is only one way to partition Clusters[N-1].
MinPartitions[N - 1] = 1;		MinPartitions[N - 1] = 1;
LastElement[N - 1] = N - 1;		LastElement[N - 1] = N - 1;
PartitionsScore[N - 1] = PartitionScores::SingleCase;		PartitionsScore[N - 1] = PartitionScores::SingleCase;

// Note: loop indexes are signed to avoid underflow.		// Note: loop indexes are signed to avoid underflow.
for (int64_t i = N - 2; i >= 0; i--) {		for (int64_t i = N - 2; i >= 0; i--) {
// Find optimal partitioning of Clusters[i..N-1].		// Find optimal partitioning of Clusters[i..N-1].
// Baseline: Put Clusters[i] into a partition on its own.		// Baseline: Put Clusters[i] into a partition on its own.
MinPartitions[i] = MinPartitions[i + 1] + 1;		MinPartitions[i] = MinPartitions[i + 1] + 1;
LastElement[i] = i;		LastElement[i] = i;
PartitionsScore[i] = PartitionsScore[i + 1] + PartitionScores::SingleCase;		PartitionsScore[i] = PartitionsScore[i + 1] + PartitionScores::SingleCase;
		for (int64_t j = i + 1; j < N; j++)
		PartitionsStats[j] =
		getJumpTableStats(Clusters, i, j, HasReachableDefault);

// Search for a solution that results in fewer partitions.		// Search for a solution that results in fewer partitions.
for (int64_t j = N - 1; j > i; j--) {		for (int64_t j = N - 1; j > i; j--) {
// Try building a partition from Clusters[i..j].		// Try building a partition from Clusters[i..j].
Range = getJumpTableRange(Clusters, i, j);		if (TLI->isSuitableForJumpTable(SI, PartitionsStats[j].Cases,
NumCases = getJumpTableNumCases(TotalCases, i, j);		PartitionsStats[j].Targets,
assert(NumCases < UINT64_MAX / 100);		PartitionsStats[j].Range)) {
assert(Range >= NumCases);

if (TLI->isSuitableForJumpTable(SI, NumCases, Range)) {
unsigned NumPartitions = 1 + (j == N - 1 ? 0 : MinPartitions[j + 1]);		unsigned NumPartitions = 1 + (j == N - 1 ? 0 : MinPartitions[j + 1]);
unsigned Score = j == N - 1 ? 0 : PartitionsScore[j + 1];		unsigned Score = j == N - 1 ? 0 : PartitionsScore[j + 1];
int64_t NumEntries = j - i + 1;		int64_t NumEntries = j - i + 1;

if (NumEntries == 1)		if (NumEntries == 1)
Score += PartitionScores::SingleCase;		Score += PartitionScores::SingleCase;
else if (NumEntries <= SmallNumberOfEntries)		else if (NumEntries <= SmallNumberOfEntries)
Score += PartitionScores::FewCases;		Score += PartitionScores::FewCases;
[...]
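
The per-partition statistics gathered by getJumpTableStats in the new column can be tried out in isolation. The following is only a sketch under stated assumptions: Cluster, computeStats and the integer target ids are hypothetical stand-ins for CaseCluster, getJumpTableStats and MachineBasicBlock pointers, and the APInt saturation done by getLimitedValue is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for a case cluster: an inclusive range of case
// values that all branch to the same target block (identified by an int).
struct Cluster {
  int64_t Low;
  int64_t High;
  int Target;
};

// The three quantities gathered for a candidate partition.
struct PartitionStats {
  uint64_t Range;   // Span of case values covered, holes included.
  uint64_t Cases;   // Number of case values actually present.
  uint64_t Targets; // Number of distinct destinations of the table.
};

// Computes the stats for Clusters[First..Last], which are assumed to be
// sorted by case value and non-overlapping.
PartitionStats computeStats(const std::vector<Cluster> &Clusters,
                            unsigned First, unsigned Last,
                            bool HasReachableDefault) {
  assert(Last >= First && "Invalid order of clusters");
  std::set<int> Targets;
  PartitionStats Stats{0, 0, 0};
  for (unsigned I = First; I <= Last; ++I) {
    Stats.Cases +=
        static_cast<uint64_t>(Clusters[I].High - Clusters[I].Low) + 1;
    Targets.insert(Clusters[I].Target);
  }
  Stats.Range =
      static_cast<uint64_t>(Clusters[Last].High - Clusters[First].Low) + 1;
  // Holes in the range are filled with the default block, which then counts
  // as one more target, provided that it is reachable.
  const bool HasHoles = Stats.Range > Stats.Cases;
  Stats.Targets = Targets.size() + ((HasReachableDefault && HasHoles) ? 1 : 0);
  return Stats;
}
```

For six single-value clusters at 1, 2, 3, 4, 14 and 15 with distinct targets and a reachable default, this yields a range of 15, 6 cases and 7 targets. The first four clusters alone yield 4, 4 and 4, since a partition without holes never branches to the default block.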

llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp

[...]	static cl::opt<bool> JumpIsExpensiveOverride(
"jump-is-expensive", cl::init(false),		"jump-is-expensive", cl::init(false),
cl::desc("Do not create extra branches to split comparison logic."),		cl::desc("Do not create extra branches to split comparison logic."),
cl::Hidden);		cl::Hidden);

static cl::opt<unsigned> MinimumJumpTableEntries		static cl::opt<unsigned> MinimumJumpTableEntries
("min-jump-table-entries", cl::init(4), cl::Hidden,		("min-jump-table-entries", cl::init(4), cl::Hidden,
cl::desc("Set minimum number of entries to use a jump table."));		cl::desc("Set minimum number of entries to use a jump table."));

static cl::opt<unsigned> MaximumJumpTableSize		static cl::opt<unsigned> MaximumJumpTableTargets
("max-jump-table-size", cl::init(UINT_MAX), cl::Hidden,		("max-jump-table-targets", cl::init(UINT_MAX), cl::Hidden,
cl::desc("Set maximum size of jump tables."));		cl::desc("Set maximum number of targets to use in a jump table."));

/// Minimum jump table density for normal functions.		/// Minimum jump table density for normal functions.
static cl::opt<unsigned>		static cl::opt<unsigned>
JumpTableDensity("jump-table-density", cl::init(10), cl::Hidden,		JumpTableDensity("jump-table-density", cl::init(10), cl::Hidden,
cl::desc("Minimum density for building a jump table in "		cl::desc("Minimum density for building a jump table in "
"a normal function"));		"a normal function"));

/// Minimum jump table density for -Os or -Oz functions.		/// Minimum jump table density for -Os or -Oz functions.
[...]
unsigned TargetLoweringBase::getMinimumJumpTableEntries() const {		unsigned TargetLoweringBase::getMinimumJumpTableEntries() const {
return MinimumJumpTableEntries;		return MinimumJumpTableEntries;
}		}

void TargetLoweringBase::setMinimumJumpTableEntries(unsigned Val) {		void TargetLoweringBase::setMinimumJumpTableEntries(unsigned Val) {
MinimumJumpTableEntries = Val;		MinimumJumpTableEntries = Val;
}		}

unsigned TargetLoweringBase::getMinimumJumpTableDensity(bool OptForSize) const {		unsigned TargetLoweringBase::getMaximumJumpTableTargets() const {
return OptForSize ? OptsizeJumpTableDensity : JumpTableDensity;		return MaximumJumpTableTargets;
}		}

unsigned TargetLoweringBase::getMaximumJumpTableSize() const {		void TargetLoweringBase::setMaximumJumpTableTargets(unsigned Val) {
return MaximumJumpTableSize;		MaximumJumpTableTargets = Val;
}		}

void TargetLoweringBase::setMaximumJumpTableSize(unsigned Val) {		unsigned TargetLoweringBase::getMinimumJumpTableDensity(bool OptForSize) const {
MaximumJumpTableSize = Val;		return OptForSize ? OptsizeJumpTableDensity : JumpTableDensity;
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Reciprocal Estimates		// Reciprocal Estimates
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

/// Get the reciprocal estimate attribute string for a function that will		/// Get the reciprocal estimate attribute string for a function that will
/// override the target defaults.		/// override the target defaults.
[...]
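
The call sites in SwitchLoweringUtils.cpp pass the number of cases, the number of targets and the range to isSuitableForJumpTable, whose body lives in TargetLowering.h and is not reproduced in this section. The sketch below is therefore an assumption about how a density threshold and a target limit could be combined. The free function isSuitableForJumpTableSketch and its parameters are hypothetical, and the real predicate may take further conditions into account, such as optimizing for size.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical free-function version of the suitability check.
// MinDensity is a percentage (assumed to be at most 100) and MaxTargets caps
// the number of distinct destinations of the table.
bool isSuitableForJumpTableSketch(uint64_t Cases, uint64_t Targets,
                                  uint64_t Range, unsigned MinDensity,
                                  unsigned MaxTargets) {
  assert(Range >= Cases && "Invalid range or number of cases");
  assert(Cases < UINT64_MAX / 100 && "Too many cases");
  assert(Range <= UINT64_MAX / 100 && "Range too large");
  // Limit what the branch predictor has to track at run time.
  const bool FewEnoughTargets = Targets <= MaxTargets;
  // Cases / Range >= MinDensity / 100, kept in integer arithmetic.
  const bool DenseEnough = Cases * 100 >= Range * MinDensity;
  return FewEnoughTargets && DenseEnough;
}
```

With 6 cases, 7 targets, a range of 15 and a minimum density of 40, the sketch accepts the table under a limit of 8 targets and rejects it under a limit of 4.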

llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp

[...]	AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,

// Set required alignment.		// Set required alignment.
setMinFunctionAlignment(llvm::Align(4));		setMinFunctionAlignment(llvm::Align(4));
// Set preferred alignments.		// Set preferred alignments.
setPrefLoopAlignment(llvm::Align(1ULL << STI.getPrefLoopLogAlignment()));		setPrefLoopAlignment(llvm::Align(1ULL << STI.getPrefLoopLogAlignment()));
setPrefFunctionAlignment(		setPrefFunctionAlignment(
llvm::Align(1ULL << STI.getPrefFunctionLogAlignment()));		llvm::Align(1ULL << STI.getPrefFunctionLogAlignment()));

// Only change the limit for entries in a jump table if specified by		// Only change the limit for targets in a jump table if specified by
// the sub target, but not at the command line.		// the sub target, but not at the command line.
unsigned MaxJT = STI.getMaximumJumpTableSize();		if (getMaximumJumpTableTargets() == UINT_MAX)
if (MaxJT && getMaximumJumpTableSize() == UINT_MAX)		setMaximumJumpTableTargets(STI.getMaximumJumpTableTargets());
setMaximumJumpTableSize(MaxJT);

setHasExtractBitsInsn(true);		setHasExtractBitsInsn(true);

setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::Other, Custom);		setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::Other, Custom);

if (Subtarget->hasNEON()) {		if (Subtarget->hasNEON()) {
// FIXME: v1f64 shouldn't be legal if we can avoid it, because it leads to		// FIXME: v1f64 shouldn't be legal if we can avoid it, because it leads to
// silliness like this:		// silliness like this:
[...]
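
The precedence between the command line option and the subtarget default in the constructor above relies on UINT_MAX serving both as "no limit" and as "not specified". A minimal sketch of that rule follows; the helper name effectiveMaxJumpTableTargets is hypothetical and not part of the patch.

```cpp
#include <climits>

// Hypothetical helper: returns the limit that ends up in effect.
// A command line value of UINT_MAX is treated as "not specified", so the
// subtarget value is used instead. Any other value wins over the subtarget.
unsigned effectiveMaxJumpTableTargets(unsigned CommandLine,
                                      unsigned Subtarget) {
  return CommandLine == UINT_MAX ? Subtarget : CommandLine;
}
```

One consequence of the sentinel is that an explicit UINT_MAX on the command line cannot be told apart from the option being absent, so it would not lift a limit set by the subtarget.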

llvm/trunk/lib/Target/AArch64/AArch64Subtarget.h

[...]	protected:
uint8_t MaxInterleaveFactor = 2;		uint8_t MaxInterleaveFactor = 2;
uint8_t VectorInsertExtractBaseCost = 3;		uint8_t VectorInsertExtractBaseCost = 3;
uint16_t CacheLineSize = 0;		uint16_t CacheLineSize = 0;
uint16_t PrefetchDistance = 0;		uint16_t PrefetchDistance = 0;
uint16_t MinPrefetchStride = 1;		uint16_t MinPrefetchStride = 1;
unsigned MaxPrefetchIterationsAhead = UINT_MAX;		unsigned MaxPrefetchIterationsAhead = UINT_MAX;
unsigned PrefFunctionLogAlignment = 0;		unsigned PrefFunctionLogAlignment = 0;
unsigned PrefLoopLogAlignment = 0;		unsigned PrefLoopLogAlignment = 0;
unsigned MaxJumpTableSize = 0;		unsigned MaxJumpTableTargets = UINT_MAX;
unsigned WideningBaseCost = 0;		unsigned WideningBaseCost = 0;

// ReserveXRegister[i] - X#i is not available as a general purpose register.		// ReserveXRegister[i] - X#i is not available as a general purpose register.
BitVector ReserveXRegister;		BitVector ReserveXRegister;

// CustomCallUsedXRegister[i] - X#i call saved.		// CustomCallUsedXRegister[i] - X#i call saved.
BitVector CustomCallSavedXRegs;		BitVector CustomCallSavedXRegs;

[...]	public:
unsigned getMaxPrefetchIterationsAhead() const {		unsigned getMaxPrefetchIterationsAhead() const {
return MaxPrefetchIterationsAhead;		return MaxPrefetchIterationsAhead;
}		}
unsigned getPrefFunctionLogAlignment() const {		unsigned getPrefFunctionLogAlignment() const {
return PrefFunctionLogAlignment;		return PrefFunctionLogAlignment;
}		}
unsigned getPrefLoopLogAlignment() const { return PrefLoopLogAlignment; }		unsigned getPrefLoopLogAlignment() const { return PrefLoopLogAlignment; }

unsigned getMaximumJumpTableSize() const { return MaxJumpTableSize; }		unsigned getMaximumJumpTableTargets() const { return MaxJumpTableTargets; }

unsigned getWideningBaseCost() const { return WideningBaseCost; }		unsigned getWideningBaseCost() const { return WideningBaseCost; }

/// CPU has TBI (top byte of addresses is ignored during HW address		/// CPU has TBI (top byte of addresses is ignored during HW address
/// translation) and OS enables it.		/// translation) and OS enables it.
bool supportsAddressTopByteIgnored() const;		bool supportsAddressTopByteIgnored() const;

bool hasPerfMon() const { return HasPerfMon; }		bool hasPerfMon() const { return HasPerfMon; }
[...]

llvm/trunk/lib/Target/AArch64/AArch64Subtarget.cpp

[...]	void AArch64Subtarget::initializeProperties() {
case Cyclone:		case Cyclone:
CacheLineSize = 64;		CacheLineSize = 64;
PrefetchDistance = 280;		PrefetchDistance = 280;
MinPrefetchStride = 2048;		MinPrefetchStride = 2048;
MaxPrefetchIterationsAhead = 3;		MaxPrefetchIterationsAhead = 3;
break;		break;
case ExynosM1:		case ExynosM1:
MaxInterleaveFactor = 4;		MaxInterleaveFactor = 4;
MaxJumpTableSize = 8;		MaxJumpTableTargets = 8;
PrefFunctionLogAlignment = 4;		PrefFunctionLogAlignment = 4;
PrefLoopLogAlignment = 3;		PrefLoopLogAlignment = 3;
break;		break;
case ExynosM3:		case ExynosM3:
MaxInterleaveFactor = 4;		MaxInterleaveFactor = 4;
MaxJumpTableSize = 20;		MaxJumpTableTargets = 20;
PrefFunctionLogAlignment = 5;		PrefFunctionLogAlignment = 5;
PrefLoopLogAlignment = 4;		PrefLoopLogAlignment = 4;
break;		break;
case Falkor:		case Falkor:
MaxInterleaveFactor = 4;		MaxInterleaveFactor = 4;
// FIXME: remove this to enable 64-bit SLP if performance looks good.		// FIXME: remove this to enable 64-bit SLP if performance looks good.
MinVectorRegisterBitWidth = 128;		MinVectorRegisterBitWidth = 128;
CacheLineSize = 128;		CacheLineSize = 128;
[...]

llvm/trunk/test/CodeGen/AArch64/max-jump-table.ll

; RUN: llc %s -O2 -print-machineinstrs -mtriple=aarch64-linux-gnu -jump-table-density=40 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK0 < %t		; RUN: llc %s -O2 -print-machineinstrs -mtriple=aarch64-linux-gnu -jump-table-density=40 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK0 < %t
; RUN: llc %s -O2 -print-machineinstrs -mtriple=aarch64-linux-gnu -jump-table-density=40 -max-jump-table-size=4 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK4 < %t		; RUN: llc %s -O2 -print-machineinstrs -mtriple=aarch64-linux-gnu -jump-table-density=40 -max-jump-table-targets=4 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK4 < %t
; RUN: llc %s -O2 -print-machineinstrs -mtriple=aarch64-linux-gnu -jump-table-density=40 -max-jump-table-size=8 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK8 < %t		; RUN: llc %s -O2 -print-machineinstrs -mtriple=aarch64-linux-gnu -jump-table-density=40 -max-jump-table-targets=8 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK8 < %t
; RUN: llc %s -O2 -print-machineinstrs -mtriple=aarch64-linux-gnu -jump-table-density=40 -max-jump-table-size=16 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK16 < %t		; RUN: llc %s -O2 -print-machineinstrs -mtriple=aarch64-linux-gnu -jump-table-density=40 -max-jump-table-targets=16 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECK16 < %t
; RUN: llc %s -O2 -print-machineinstrs -mtriple=aarch64-linux-gnu -jump-table-density=40 -mcpu=exynos-m1 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECKM1 < %t		; RUN: llc %s -O2 -print-machineinstrs -mtriple=aarch64-linux-gnu -jump-table-density=40 -mcpu=exynos-m1 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECKM1 < %t
; RUN: llc %s -O2 -print-machineinstrs -mtriple=aarch64-linux-gnu -jump-table-density=40 -mcpu=exynos-m3 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECKM3 < %t		; RUN: llc %s -O2 -print-machineinstrs -mtriple=aarch64-linux-gnu -jump-table-density=40 -mcpu=exynos-m3 -o /dev/null 2> %t; FileCheck %s --check-prefixes=CHECK,CHECKM3 < %t

declare void @ext(i32, i32)		declare void @ext(i32, i32)

define i32 @jt1(i32 %a, i32 %b) {		define i32 @jt1(i32 %a, i32 %b) {
entry:		entry:
switch i32 %a, label %return [		switch i32 %a, label %return [
i32 1, label %bb1		i32 1, label %bb1
i32 2, label %bb2		i32 2, label %bb2
[...]	switch i32 %x, label %return [
i32 15, label %bb6		i32 15, label %bb6
]		]
; CHECK-LABEL: function jt2:		; CHECK-LABEL: function jt2:
; CHECK-NEXT: Jump Tables:		; CHECK-NEXT: Jump Tables:
; CHECK0-NEXT: %jump-table.0: %bb.1 %bb.2 %bb.3 %bb.4 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.5 %bb.6{{$}}		; CHECK0-NEXT: %jump-table.0: %bb.1 %bb.2 %bb.3 %bb.4 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.5 %bb.6{{$}}
; CHECK0-NOT: %jump-table.1:		; CHECK0-NOT: %jump-table.1:
; CHECK4-NEXT: %jump-table.0: %bb.1 %bb.2 %bb.3 %bb.4{{$}}		; CHECK4-NEXT: %jump-table.0: %bb.1 %bb.2 %bb.3 %bb.4{{$}}
; CHECK4-NOT: %jump-table.1:		; CHECK4-NOT: %jump-table.1:
; CHECK8-NEXT: %jump-table.0: %bb.1 %bb.2 %bb.3 %bb.4{{$}}		; CHECK8-NEXT: %jump-table.0: %bb.1 %bb.2 %bb.3 %bb.4 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.5 %bb.6{{$}}
; CHECK8-NOT: %jump-table.1:		; CHECK8-NOT: %jump-table.1:
; CHECK16-NEXT: %jump-table.0: %bb.1 %bb.2 %bb.3 %bb.4 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.5 %bb.6{{$}}		; CHECK16-NEXT: %jump-table.0: %bb.1 %bb.2 %bb.3 %bb.4 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.5 %bb.6{{$}}
; CHECK16-NOT: %jump-table.1:		; CHECK16-NOT: %jump-table.1:
; CHECKM1-NEXT: %jump-table.0: %bb.1 %bb.2 %bb.3 %bb.4{{$}}		; CHECKM1-NEXT: %jump-table.0: %bb.1 %bb.2 %bb.3 %bb.4 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.5 %bb.6{{$}}
; CHECKM1-NOT: %jump-table.1:		; CHECKM1-NOT: %jump-table.1:
; CHECKM3-NEXT: %jump-table.0: %bb.1 %bb.2 %bb.3 %bb.4 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.5 %bb.6{{$}}		; CHECKM3-NEXT: %jump-table.0: %bb.1 %bb.2 %bb.3 %bb.4 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.7 %bb.5 %bb.6{{$}}
; CHECKM3-NOT: %jump-table.1:		; CHECKM3-NOT: %jump-table.1:
; CHECK-DAG: End machine code for function jt2.		; CHECK-DAG: End machine code for function jt2.

bb1: tail call void @ext(i32 6, i32 1) br label %return		bb1: tail call void @ext(i32 6, i32 1) br label %return
bb2: tail call void @ext(i32 5, i32 2) br label %return		bb2: tail call void @ext(i32 5, i32 2) br label %return
bb3: tail call void @ext(i32 4, i32 3) br label %return		bb3: tail call void @ext(i32 4, i32 3) br label %return
bb4: tail call void @ext(i32 3, i32 4) br label %return		bb4: tail call void @ext(i32 3, i32 4) br label %return
bb5: tail call void @ext(i32 2, i32 5) br label %return		bb5: tail call void @ext(i32 2, i32 5) br label %return
bb6: tail call void @ext(i32 1, i32 6) br label %return		bb6: tail call void @ext(i32 1, i32 6) br label %return

return: ret void		return: ret void
}		}

define void @jt3(i32 %x) {		define void @jt3(i32 %x) {
entry:		entry:
switch i32 %x, label %return [		switch i32 %x, label %return [
i32 1, label %bb1		i32 1, label %bb1
i32 2, label %bb2		i32 2, label %bb2
[...]	entry:
]		]
; CHECK-LABEL: function jt3:		; CHECK-LABEL: function jt3:
; CHECK-NEXT: Jump Tables:		; CHECK-NEXT: Jump Tables:
; CHECK0-NEXT: %jump-table.0: %bb.1 %bb.2 %bb.3 %bb.4 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.5 %bb.6 %bb.7 %bb.8 %bb.13 %bb.9 %bb.10 %bb.13 %bb.11 %bb.12		; CHECK0-NEXT: %jump-table.0: %bb.1 %bb.2 %bb.3 %bb.4 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.5 %bb.6 %bb.7 %bb.8 %bb.13 %bb.9 %bb.10 %bb.13 %bb.11 %bb.12
; CHECK0-NOT: %jump-table.1:		; CHECK0-NOT: %jump-table.1:
; CHECK4-NEXT: %jump-table.0: %bb.1 %bb.2 %bb.3 %bb.4		; CHECK4-NEXT: %jump-table.0: %bb.1 %bb.2 %bb.3 %bb.4
; CHECK4-NEXT: %jump-table.1: %bb.5 %bb.6 %bb.7 %bb.8		; CHECK4-NEXT: %jump-table.1: %bb.5 %bb.6 %bb.7 %bb.8
; CHECK4-NOT: %jump-table.2:		; CHECK4-NOT: %jump-table.2:
; CHECK8-NEXT: %jump-table.0: %bb.1 %bb.2 %bb.3 %bb.4		; CHECK8-NEXT: %jump-table.0: %bb.1 %bb.2 %bb.3 %bb.4 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.5 %bb.6 %bb.7
; CHECK8-NEXT: %jump-table.1: %bb.5 %bb.6 %bb.7 %bb.8 %bb.13 %bb.9 %bb.10		; CHECK8-NEXT: %jump-table.1: %bb.8 %bb.13 %bb.9 %bb.10 %bb.13 %bb.11 %bb.12
; CHECK8-NOT: %jump-table.2:		; CHECK8-NOT: %jump-table.2:
; CHECK16-NEXT: %jump-table.0: %bb.1 %bb.2 %bb.3 %bb.4 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.5 %bb.6 %bb.7		; CHECK16-NEXT: %jump-table.0: %bb.1 %bb.2 %bb.3 %bb.4 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.5 %bb.6 %bb.7 %bb.8 %bb.13 %bb.9 %bb.10 %bb.13 %bb.11 %bb.12
; CHECK16-NEXT: %jump-table.1: %bb.8 %bb.13 %bb.9 %bb.10 %bb.13 %bb.11 %bb.12		; CHECK16-NOT: %jump-table.1:
; CHECK16-NOT: %jump-table.2:		; CHECKM1-NEXT: %jump-table.0: %bb.1 %bb.2 %bb.3 %bb.4 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.5 %bb.6 %bb.7
; CHECKM1-NEXT: %jump-table.0: %bb.1 %bb.2 %bb.3 %bb.4		; CHECKM1-NEXT: %jump-table.1: %bb.8 %bb.13 %bb.9 %bb.10 %bb.13 %bb.11 %bb.12
; CHECKM1-NEXT: %jump-table.1: %bb.5 %bb.6 %bb.7 %bb.8 %bb.13 %bb.9 %bb.10
; CHECKM1-NOT: %jump-table.2:		; CHECKM1-NOT: %jump-table.2:
; CHECKM3-NEXT: %jump-table.0: %bb.1 %bb.2 %bb.3 %bb.4 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.5 %bb.6 %bb.7 %bb.8 %bb.13 %bb.9 %bb.10		; CHECKM3-NEXT: %jump-table.0: %bb.1 %bb.2 %bb.3 %bb.4 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.5 %bb.6 %bb.7 %bb.8 %bb.13 %bb.9 %bb.10
; CHECKM3-NOT: %jump-table.1:		; CHECKM3-NOT: %jump-table.1:
; CHECK-DAG: End machine code for function jt3.		; CHECK-DAG: End machine code for function jt3.

bb1: tail call void @ext(i32 1, i32 12) br label %return		bb1: tail call void @ext(i32 1, i32 12) br label %return
bb2: tail call void @ext(i32 2, i32 11) br label %return		bb2: tail call void @ext(i32 2, i32 11) br label %return
bb3: tail call void @ext(i32 3, i32 10) br label %return		bb3: tail call void @ext(i32 3, i32 10) br label %return
[...]	switch i32 %x, label %default [
i32 23, label %bb12		i32 23, label %bb12
]		]
; CHECK-LABEL: function jt4:		; CHECK-LABEL: function jt4:
; CHECK-NEXT: Jump Tables:		; CHECK-NEXT: Jump Tables:
; CHECK0-NEXT: %jump-table.0: %bb.1 %bb.2 %bb.3 %bb.4 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.5 %bb.6 %bb.7 %bb.8 %bb.13 %bb.9 %bb.10 %bb.13 %bb.11 %bb.12		; CHECK0-NEXT: %jump-table.0: %bb.1 %bb.2 %bb.3 %bb.4 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.5 %bb.6 %bb.7 %bb.8 %bb.13 %bb.9 %bb.10 %bb.13 %bb.11 %bb.12
; CHECK0-NOT: %jump-table.1:		; CHECK0-NOT: %jump-table.1:
; CHECK4-NEXT: %jump-table.0: %bb.1 %bb.2 %bb.3 %bb.4		; CHECK4-NEXT: %jump-table.0: %bb.1 %bb.2 %bb.3 %bb.4
; CHECK4-NEXT: %jump-table.1: %bb.5 %bb.6 %bb.7 %bb.8		; CHECK4-NEXT: %jump-table.1: %bb.5 %bb.6 %bb.7 %bb.8
; CHECK4-NOT: %jump-table.2:		; CHECK4-NEXT: %jump-table.2: %bb.9 %bb.10 %bb.13 %bb.11 %bb.12
; CHECK8-NEXT: %jump-table.0: %bb.1 %bb.2 %bb.3 %bb.4		; CHECK4-NOT: %jump-table.3:
; CHECK8-NEXT: %jump-table.1: %bb.5 %bb.6 %bb.7 %bb.8 %bb.13 %bb.9 %bb.10		; CHECK8-NEXT: %jump-table.0: %bb.1 %bb.2 %bb.3 %bb.4 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.5 %bb.6 %bb.7 %bb.8
		; CHECK8-NEXT: %jump-table.1: %bb.9 %bb.10 %bb.13 %bb.11 %bb.12
; CHECK8-NOT: %jump-table.2:		; CHECK8-NOT: %jump-table.2:
; CHECK16-NEXT: %jump-table.0: %bb.1 %bb.2 %bb.3 %bb.4 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.5 %bb.6 %bb.7		; CHECK16-NEXT: %jump-table.0: %bb.1 %bb.2 %bb.3 %bb.4 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.5 %bb.6 %bb.7 %bb.8 %bb.13 %bb.9 %bb.10 %bb.13 %bb.11 %bb.12
; CHECK16-NEXT: %jump-table.1: %bb.8 %bb.13 %bb.9 %bb.10 %bb.13 %bb.11 %bb.12		; CHECK16-NOT: %jump-table.1:
; CHECK16-NOT: %jump-table.2:		; CHECKM1-NEXT: %jump-table.0: %bb.1 %bb.2 %bb.3 %bb.4 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.5 %bb.6 %bb.7 %bb.8
; CHECKM1-NEXT: %jump-table.0: %bb.1 %bb.2 %bb.3 %bb.4		; CHECKM1-NEXT: %jump-table.1: %bb.9 %bb.10 %bb.13 %bb.11 %bb.12
; CHECKM1-NEXT: %jump-table.1: %bb.5 %bb.6 %bb.7 %bb.8 %bb.13 %bb.9 %bb.10
; CHECKM1-NOT: %jump-table.2:		; CHECKM1-NOT: %jump-table.2:
; CHECKM3-NEXT: %jump-table.0: %bb.1 %bb.2 %bb.3 %bb.4 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.5 %bb.6 %bb.7 %bb.8 %bb.13 %bb.9 %bb.10		; CHECKM3-NEXT: %jump-table.0: %bb.1 %bb.2 %bb.3 %bb.4 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.13 %bb.5 %bb.6 %bb.7 %bb.8 %bb.13 %bb.9 %bb.10
; CHECKM3-NOT: %jump-table.1:		; CHECKM3-NOT: %jump-table.1:
; CHECK-DAG: End machine code for function jt4.		; CHECK-DAG: End machine code for function jt4.

bb1: tail call void @ext(i32 1, i32 12) br label %return		bb1: tail call void @ext(i32 1, i32 12) br label %return
bb2: tail call void @ext(i32 2, i32 11) br label %return		bb2: tail call void @ext(i32 2, i32 11) br label %return
bb3: tail call void @ext(i32 3, i32 10) br label %return		bb3: tail call void @ext(i32 3, i32 10) br label %return
[...]
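
The jump table lines in the CHECK directives list one block per table entry, with holes between case values filled by the default block. The sketch below shows how such a listing and its number of distinct targets could be derived. The function names are hypothetical, block numbers stand in for %bb.N, and the case values used in the note after the code are inferred from the CHECK0 line of jt2, since the switch body is partly hidden here.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Expands (case value, block number) pairs into jump table entries, filling
// every missing value between the first and last case with the default
// block. The pairs are assumed to be sorted with unique case values.
std::vector<int> buildTableEntries(
    const std::vector<std::pair<int64_t, int>> &Cases, int DefaultBlock) {
  std::vector<int> Entries;
  if (Cases.empty())
    return Entries;
  int64_t Next = Cases.front().first;
  for (const auto &Case : Cases) {
    for (; Next < Case.first; ++Next)
      Entries.push_back(DefaultBlock); // Hole: falls through to the default.
    Entries.push_back(Case.second);
    ++Next;
  }
  return Entries;
}

// Number of distinct blocks that the table can branch to.
std::size_t countDistinctTargets(const std::vector<int> &Entries) {
  return std::set<int>(Entries.begin(), Entries.end()).size();
}
```

For cases 1, 2, 3, 4, 14 and 15 branching to blocks 1 through 6 with default block 7, the table has 15 entries but only 7 distinct targets. That fits within a limit of 8 targets, which is consistent with the new CHECK8 expectation for jt2 keeping the whole table.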

Diff 221787