This is an archive of the discontinued LLVM Phabricator instance.

[DAG] Extract switch lowering as a separate object NFC
Needs Review · Public

Authored by junbuml on Mar 17 2017, 7:48 AM.


Details

Reviewers

hans
chandlerc
mcrosier

Summary

This refactors the switch lowering to extract the forming of case clusters into a class
separate from SelectionDAGBuilder, decoupling forming case clusters from building the DAG.
Based on this refactoring, I will expose a TTI hook which allows the inliner to use the same
logic when deciding the inline cost for switch instructions.

Diff Detail

Event Timeline

junbuml created this revision. Mar 17 2017, 7:48 AM

Herald added subscribers: mgorny, mcrosier. Mar 17 2017, 7:48 AM

junbuml retitled this revision from [DAG] Extract switch lowering as a separate object NFC to [DAG] Extract switch lowering as a separate object NFC. Mar 17 2017, 10:05 AM

junbuml added reviewers: chandlerc, mcrosier, hans.

junbuml added a subscriber: llvm-commits.

junbuml mentioned this in D29870: [InlineCost] Increase the cost of Switch. Mar 17 2017, 10:15 AM

junbuml mentioned this in D31085: [InlineCost] Increase the cost of Switch. Mar 17 2017, 10:20 AM

Hi Jun,

I think extracting the logic for clustering cases into jump tables, bit tests, etc. into a separate class might be a good idea, but I don't think we should extract the actual lowering parts. If the purpose of this is to expose a hook to figure out how a switch table will be lowered, the actual lowering doesn't need to happen here. The class that deals with clustering cases should not need to know about SDAGBuilder.

I'm concerned that this hook might turn out to be very expensive though.

In D31080#704197, @hans wrote:

Hi Jun,

I think extracting the logic for clustering cases into jump tables, bit tests, etc. into a separate class might be a good idea, but I don't think we should extract the actual lowering parts. If the purpose of this is to expose a hook to figure out how a switch table will be lowered, the actual lowering doesn't need to happen here. The class that deals with clustering cases should not need to know about SDAGBuilder.

Agreed.

I'm concerned that this hook might turn out to be very expensive though.

What primarily concerns you about the cost?

If it is the cost of actually computing the clustering, the thing the inliner wants could be a rough approximation. Put differently, we don't need to actually *form* the clustering, just have an estimate of how many branches will end up being used (as opposed to jump tables).

In D31080#704210, @chandlerc wrote:

I'm concerned that this hook might turn out to be very expensive though.

What primarily concerns you about the cost?

If it is the cost of actually computing the clustering, the thing the inliner wants could be a rough approximation. Put differently, we don't need to actually *form* the clustering, just have an estimate of how many branches will end up being used (as opposed to jump tables).

Yeah, it's the cost of computing the clustering. I thought most hooks were supposed to be roughly O(1), but this code is actually doing work: O(n^2) in the worst case, albeit with low overhead.

It might be possible to come up with some rough estimate though.
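One way to picture such a rough estimate is a single pass that counts how many contiguous, same-destination ranges the cases collapse into. That count bounds the number of compare-and-branch clusters when no jump table or bit test is formed. The sketch below is standalone C++ and not code from the patch: `SimpleCase` and `countCaseRanges` are hypothetical names, and destinations are reduced to plain integer IDs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one switch case: a case value and a destination ID.
struct SimpleCase {
  int64_t Value;
  unsigned Dest;
};

// Counts the ranges left after sorting the cases by value and merging
// neighbours that are numerically adjacent and share a destination.
// Assumes case values are distinct. Cost: one sort plus one linear pass.
inline std::size_t countCaseRanges(std::vector<SimpleCase> Cases) {
  if (Cases.empty())
    return 0;
  std::sort(Cases.begin(), Cases.end(),
            [](const SimpleCase &A, const SimpleCase &B) {
              return A.Value < B.Value;
            });
  std::size_t Ranges = 1;
  for (std::size_t I = 1; I < Cases.size(); ++I) {
    // Unsigned subtraction avoids signed overflow for far-apart values.
    bool Adjacent = static_cast<uint64_t>(Cases[I].Value) -
                        static_cast<uint64_t>(Cases[I - 1].Value) == 1;
    bool SameDest = Cases[I].Dest == Cases[I - 1].Dest;
    if (!(Adjacent && SameDest))
      ++Ranges; // a new range starts here
  }
  return Ranges;
}
```

For cases 1, 2, 3 going to one block, 10 to a second, and 11 back to the first, this reports three ranges. A real estimate would also have to account for jump tables and bit tests, which this sketch deliberately ignores.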

Another thought is that at some point it might be worth thinking about moving switch lowering to the IR level, to avoid the whole problem of computing the inlining cost, but that's a big project.

Thanks Hans and Chandler for the reviews. My original intention for this was to have a single piece of logic for calculating case clusters so that we don't need to maintain several different versions depending on where it's used.

From that perspective, I extracted the case-clustering logic into SwitchLoweringCaseClusterBuilder, abstracted behind two virtual methods (buildJumpTable and buildBitTests). The base implementations in SwitchLoweringCaseClusterBuilder, which will be used in the TTI hook, don't build anything related to the DAG builder, while those in SwitchLoweringCaseClusterBuilderForDAG do the extra work for the DAG builder, as they will be invoked from SelectionDAGBuilder. However, I agree that SwitchLoweringCaseClusterBuilder should not need to know about SDAGBuilder.

I think having a single abstraction for calculating case clusters might be good for maintenance and accuracy as long as the calculation cost is reasonable. So, would it make sense to define more virtual methods to cut out expensive calculations for the non-DAG-builder case?
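To make the shape of this virtual-hook design concrete, here is a heavily simplified sketch. It is not the patch's code: `ToyCluster`, `ToyClusterBuilder`, and `ToyClusterBuilderForLowering` are hypothetical names, and the "decision logic" is a placeholder that always picks a single jump table.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified cluster: an inclusive index range into the cases.
struct ToyCluster {
  unsigned First, Last;
};

// Base builder: owns the shared decision logic and is usable on its own for
// analysis, because its build hook does nothing.
class ToyClusterBuilder {
public:
  virtual ~ToyClusterBuilder() = default;

  // Placeholder decision logic: treat all cases as one jump table cluster.
  void findJumpTables(unsigned NumCases, std::vector<ToyCluster> &Out) {
    if (NumCases == 0)
      return;
    ToyCluster C{0, NumCases - 1};
    buildJumpTable(C); // hook: no-op here, real work in the subclass
    Out.push_back(C);
  }

protected:
  // Analysis-only default: nothing is materialized.
  virtual void buildJumpTable(const ToyCluster &) {}
};

// Lowering builder: same decisions, but the hook also "emits" a table.
class ToyClusterBuilderForLowering : public ToyClusterBuilder {
public:
  unsigned TablesEmitted = 0;

protected:
  void buildJumpTable(const ToyCluster &) override { ++TablesEmitted; }
};
```

Both builders produce the same clusters; only the subclass has side effects. The cost of this arrangement is that the clustering code still calls into lowering code through the hook.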

In D31080#705381, @junbuml wrote:

Thanks Hans and Chandler for the reviews. My original intention for this was to have a single piece of logic for calculating case clusters so that we don't need to maintain several different versions depending on where it's used.

From that perspective, I extracted the case-clustering logic into SwitchLoweringCaseClusterBuilder, abstracted behind two virtual methods (buildJumpTable and buildBitTests). The base implementations in SwitchLoweringCaseClusterBuilder, which will be used in the TTI hook, don't build anything related to the DAG builder, while those in SwitchLoweringCaseClusterBuilderForDAG do the extra work for the DAG builder, as they will be invoked from SelectionDAGBuilder. However, I agree that SwitchLoweringCaseClusterBuilder should not need to know about SDAGBuilder.

I think having a single abstraction for calculating case clusters might be good for maintenance and accuracy as long as the calculation cost is reasonable. So, would it make sense to define more virtual methods to cut out expensive calculations for the non-DAG-builder case?

Rather than splitting those out as virtual calls, I think the nicest design would be if the case clustering logic could work as an analysis: you feed it an array of cases, from which it computes some kind of result which indicates what cases go in jump tables, which are bit tests, etc. The SDAGBuilder would then consume that result to actually build the jump tables etc.

Rather than splitting those out as virtual calls, I think the nicest design would be if the case clustering logic could work as an analysis: you feed it an array of cases, from which it computes some kind of result which indicates what cases go in jump tables, which are bit tests, etc. The SDAGBuilder would then consume that result to actually build the jump tables etc.

Could you give me a little more detail about what you mean? To me it seems almost the same as what the current implementation is doing. If I understand correctly, the current implementation of switch lowering uses CaseClusterVector (a vector of CaseCluster) to store which cases go to a JT and which cases go to a BTest. In visitSwitch(), we first build the CaseClusterVector without any actual lowering. After that, SDAGBuilder uses the CaseClusterVector to actually lower to a JT or BTest. Did you mean for us to use another array instead of CaseClusterVector?

In D31080#706390, @junbuml wrote:

Rather than splitting those out as virtual calls, I think the nicest design would be if the case clustering logic could work as an analysis: you feed it an array of cases, from which it computes some kind of result which indicates what cases go in jump tables, which are bit tests, etc. The SDAGBuilder would then consume that result to actually build the jump tables etc.

Could you give me a little more detail about what you mean? To me it seems almost the same as what the current implementation is doing. If I understand correctly, the current implementation of switch lowering uses CaseClusterVector (a vector of CaseCluster) to store which cases go to a JT and which cases go to a BTest. In visitSwitch(), we first build the CaseClusterVector without any actual lowering. After that, SDAGBuilder uses the CaseClusterVector to actually lower to a JT or BTest. Did you mean for us to use another array instead of CaseClusterVector?

The current implementation does try to split the analysis and lowering, but it doesn't succeed completely.

For example, findJumpTables() calls buildJumpTable() which creates a new MachineBasicBlock, creates a JumpTableHeader that it adds to the MachineFunction, and so on. In other words, it performs some lowering.

Your patch works around this by providing virtual methods for buildJumpTable(): one that doesn't actually build a jump table, and the other that does, which means it has to depend on SelectionDAGBuilder.

I'm saying it would be nice if the analysis and lowering could be separated further, so that the analysis does not have to know about SelectionDAGBuilder at all, and SelectionDAGBuilder would do the lowering entirely based on the results of the analysis.

The current implementation does try to split the analysis and lowering, but it doesn't succeed completely.

For example, findJumpTables() calls buildJumpTable() which creates a new MachineBasicBlock, creates a JumpTableHeader that it adds to the MachineFunction, and so on. In other words, it performs some lowering.

Your patch works around this by providing virtual methods for buildJumpTable(): one that doesn't actually build a jump table, and the other that does, which means it has to depend on SelectionDAGBuilder.

Yes, I agree that the analysis shouldn't have to know about SelectionDAGBuilder.

I'm saying it would be nice if the analysis and lowering could be separated further, so that the analysis does not have to know about SelectionDAGBuilder at all, and SelectionDAGBuilder would do the lowering entirely based on the results of the analysis.

If the main concern is the visibility of SelectionDAGBuilder in the base class for the analysis (SwitchLoweringCaseClusterBuilder in my current patch), don't you think that further refactoring to remove SelectionDAGBuilder from SwitchLoweringCaseClusterBuilder, by moving SwitchLoweringCaseClusterBuilderForDAG into SelectionDAGBuilder, could be a reasonable design?

In D31080#706552, @junbuml wrote:

The current implementation does try to split the analysis and lowering, but it doesn't succeed completely.

For example, findJumpTables() calls buildJumpTable() which creates a new MachineBasicBlock, creates a JumpTableHeader that it adds to the MachineFunction, and so on. In other words, it performs some lowering.

Your patch works around this by providing virtual methods for buildJumpTable(): one that doesn't actually build a jump table, and the other that does, which means it has to depend on SelectionDAGBuilder.

Yes, I agree that the analysis shouldn't have to know about SelectionDAGBuilder.

I'm saying it would be nice if the analysis and lowering could be separated further, so that the analysis does not have to know about SelectionDAGBuilder at all, and SelectionDAGBuilder would do the lowering entirely based on the results of the analysis.

If the main concern is the visibility of SelectionDAGBuilder in the base class for the analysis (SwitchLoweringCaseClusterBuilder in my current patch), don't you think that further refactoring to remove SelectionDAGBuilder from SwitchLoweringCaseClusterBuilder, by moving SwitchLoweringCaseClusterBuilderForDAG into SelectionDAGBuilder, could be a reasonable design?

Even after moving the subclass into SelectionDAGBuilder, it still doesn't seem like a very nice design; the analysis isn't just an analysis if it's calling back into lowering code. Also, SelectionDAGBuilder's details are currently leaking into the CaseCluster struct through the JT/BTCasesIndex member.

I think it would be better if the analysis could work independently: taking as input a set of cases and returning a vector of clustered cases (each cluster might simply consist of a kind and iterators/indexes into the original set of cases).
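A minimal sketch of what such an independent analysis could look like follows. All names (`ToyKind`, `ToyClusterResult`, `analyzeCases`) are hypothetical, and the grouping policy (runs of consecutive values of at least `MinTableSize` become a jump table) is an invented placeholder, not LLVM's heuristic. The point is only the interface: cases in, a vector of kind plus index range out, with no reference to any lowering type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class ToyKind { Range, JumpTable };

// One cluster in the result: a kind plus an inclusive index range into the
// caller's original (sorted) case array.
struct ToyClusterResult {
  ToyKind Kind;
  std::size_t First;
  std::size_t Last;
};

// Input: distinct case values sorted in ascending order.
// Placeholder policy: each run of consecutive values forms one cluster; runs
// with at least MinTableSize entries are marked as jump tables.
inline std::vector<ToyClusterResult>
analyzeCases(const std::vector<int64_t> &Sorted, std::size_t MinTableSize) {
  std::vector<ToyClusterResult> Result;
  std::size_t I = 0;
  while (I < Sorted.size()) {
    std::size_t J = I;
    // Extend the run while the next value is exactly one larger.
    while (J + 1 < Sorted.size() &&
           static_cast<uint64_t>(Sorted[J + 1]) -
                   static_cast<uint64_t>(Sorted[J]) == 1)
      ++J;
    ToyKind K =
        (J - I + 1 >= MinTableSize) ? ToyKind::JumpTable : ToyKind::Range;
    Result.push_back({K, I, J});
    I = J + 1;
  }
  return Result;
}
```

A consumer such as a DAG builder would then walk the result and build whatever machine-level structures each kind requires, looking the cases up by index.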

Following Hans' comment, I decoupled cluster calculation from lowering and introduced a new structure, CaseCluster, which is used only for cluster calculation. MachineCaseCluster will be filled from CaseCluster before lowering. Please take a look and let me know if you have any comments.

junbuml updated this revision to Diff 93287. Mar 28 2017, 1:13 PM

In D31080#712466, @junbuml wrote:

Following Hans' comment, I decoupled cluster calculation from lowering and introduced a new structure, CaseCluster, which is used only for cluster calculation.

Thanks! Things are getting better. I've made some comments, but haven't looked carefully at the changes to SelectionDAGBuilder.h/cpp yet.

MachineCaseCluster will be filled from CaseCluster before lowering.

This seems a little unfortunate, because it seems a MachineCaseCluster basically includes the same info as a CaseCluster with some more fields.

Would it be possible instead to do the lowering using just the CaseCluster objects, and maybe storing any auxiliary data on the side?
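As an illustration of "auxiliary data on the side", one option is a vector that runs parallel to the clusters and is indexed by cluster position. This is a sketch under assumptions, not a proposal for the actual types: `ToyAnalysisCluster`, `ToyLoweringInfo`, and `ToyLoweringState` are hypothetical, and probabilities are reduced to a `double`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical analysis result: knows nothing about lowering.
struct ToyAnalysisCluster {
  int Kind;
  std::size_t First, Last;
};

// Hypothetical lowering-only data kept outside the analysis cluster.
struct ToyLoweringInfo {
  unsigned TableIndex = 0; // e.g. index of an emitted jump table
  double Prob = 0.0;       // e.g. branch probability
};

// Side table: Info[i] describes Clusters[i]. The clusters stay untouched.
struct ToyLoweringState {
  const std::vector<ToyAnalysisCluster> &Clusters;
  std::vector<ToyLoweringInfo> Info;

  explicit ToyLoweringState(const std::vector<ToyAnalysisCluster> &C)
      : Clusters(C), Info(C.size()) {}

  // Bounds-checked access to the lowering data for one cluster.
  ToyLoweringInfo &infoFor(std::size_t ClusterIdx) {
    return Info.at(ClusterIdx);
  }
};
```

Lookup is a plain index, so the access cost is close to that of a struct member, at the price of keeping the two vectors in sync if clusters are ever reordered.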

Before we get any further, I also would like to ask if you have done any measurements of compile-time with this set of patches. As I said before, I think this might be quite an expensive hook to call for the inline cost analysis, and it would be nice to see some numbers. If it turns out that it is expensive, perhaps we could come up with some better inline cost heuristic, perhaps something based on the density of the switch.
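A density-based heuristic of this kind can be sketched as a constant-time check on the case count and the value range. The function below is an assumption-laden illustration: `isDenseEnough`, its percentage threshold parameter, and the exact comparison are not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if NumCases distinct cases spanning [Low, High] fill at least
// MinDensityPercent percent of that range. Hypothetical helper: the
// threshold is supplied by the caller and is not a value from LLVM.
inline bool isDenseEnough(int64_t Low, int64_t High, uint64_t NumCases,
                          uint64_t MinDensityPercent) {
  assert(Low <= High && "empty range");
  assert(MinDensityPercent >= 1 && MinDensityPercent <= 100);
  // Unsigned subtraction is well defined even across the full int64 range.
  uint64_t Span = static_cast<uint64_t>(High) - static_cast<uint64_t>(Low);
  // Enormous ranges are treated as sparse. This assumes a realistic number
  // of cases and also keeps the multiplications below from overflowing.
  if (Span >= UINT64_MAX / 100)
    return false;
  uint64_t Range = Span + 1;
  assert(NumCases <= Range && "more distinct cases than values in range");
  return NumCases * 100 >= Range * MinDensityPercent;
}
```

With a 40 percent threshold, four cases spread over the values 0 to 9 count as dense, while the same four cases spread over 0 to 99 do not. An inline cost estimate could charge a roughly fixed cost in the dense case and a per-case cost otherwise.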

include/llvm/CodeGen/SwitchCaseCluster.h
Line 41 (on Diff #93287): This should probably be a SmallVector since it will often contain only one element. I'm also not sure if having a typedef for it is really helpful. There needs to be a comment explaining that the vector holds case indexes from the switch.
Line 69 (on Diff #93287): I'm not sure this assert is worth it.
Line 89 (on Diff #93287): But how is C.Cases set?
Line 104 (on Diff #93287): Since TargetLowering.h is included, I guess this isn't needed.
Line 131 (on Diff #93287): What does return a default block mean? The default basic block of the switch? Why return it? Spelling, there's an i missing in initial.
Line 142 (on Diff #93287): Spelling: missing a in unreachable.
Line 150 (on Diff #93287): What is "it" here? The comment on the next line looks like leftover from earlier code? Same thing on the next method.
lib/CodeGen/SelectionDAG/CMakeLists.txt
Line 26: No need to remove this blank line, I think.

Thanks Hans for the review.

Would it be possible instead to do the lowering using just the CaseCluster objects, and maybe storing any auxiliary data on the side?

My initial thought was to have only Kind and CaseVector in CaseCluster as we can extract Low and High from the CaseVector. However, this may increase runtime cost as we often access the Low and High, so I cached them in the struct itself.

MachineCaseCluster has two more fields: 1) BranchProbability and 2) the union for the destination block or the case index in JT/BT. We could introduce an auxiliary structure to map clusters to this information, but I think this may increase code complexity and the cost of accessing it through the auxiliary map. Instead of auxiliary data for BranchProbability and the union, what about holding a pointer to the CaseCluster in MachineCaseCluster, like this:

struct MachineCaseCluster {
  CaseCluster *Cluster;
  union {
    MachineBasicBlock *MBB;
    unsigned JTCasesIndex;
    unsigned BTCasesIndex;
  };
  BranchProbability Prob;
};

Before we get any further, I also would like to ask if you have done any measurements of compile-time with this set of patches. As I said before, I think this might be quite an expensive hook to call for the inline cost analysis, and it would be nice to see some numbers. If it turns out that it is expensive, perhaps we could come up with some better inline cost heuristic, perhaps something based on the density of the switch.

Sure, compile-time measurements should be reported with the patches. Regarding the cost of the hook, I would like to discuss it in D31085, where the hook is introduced. I will copy your comment to D31085.

Added a pointer to CaseCluster as a member of MachineCaseCluster. Did a little more refactoring to share the same code with the inline cost heuristic (D31782).

mcrosier resigned from this revision. Jul 26 2017, 6:09 AM

Revision Contents

Path	Size
include/llvm/CodeGen/SwitchLoweringCaseCluster.h	284 lines
lib/CodeGen/SelectionDAG/CMakeLists.txt	2 lines
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h	189 lines
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp	612 lines
lib/CodeGen/SelectionDAG/SwitchLoweringCaseCluster.cpp	651 lines

Diff 92145

include/llvm/CodeGen/SwitchLoweringCaseCluster.h

This file was added.

				//===-- SwitchLoweringCaseCluster.h - Form case clusters from SwitchInst --===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This implements routines for forming case clusters.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_CODEGEN_SWITCHLOWERINGCASECLUSTER_H
				#define LLVM_CODEGEN_SWITCHLOWERINGCASECLUSTER_H

				#include "llvm/CodeGen/FunctionLoweringInfo.h"
				#include "llvm/CodeGen/MachineJumpTableInfo.h"
				#include <vector>

				namespace llvm {

				class SelectionDAGBuilder;

				enum CaseClusterKind {
				/// A cluster of adjacent case labels with the same destination, or just one
				/// case.
				CC_Range,
				/// A cluster of cases suitable for jump table lowering.
				CC_JumpTable,
				/// A cluster of cases suitable for bit test lowering.
				CC_BitTests
				};

				/// A cluster of case labels.
				struct CaseCluster {
				CaseClusterKind Kind;
				const ConstantInt *Low, *High;
				const BasicBlock *BB;
				union {
				MachineBasicBlock *MBB;
				unsigned JTCasesIndex;
				unsigned BTCasesIndex;
				};
				BranchProbability Prob;

				static CaseCluster range(const ConstantInt *Low, const ConstantInt *High,
				const BasicBlock *BB, BranchProbability Prob) {
				CaseCluster C;
				C.Kind = CC_Range;
				C.Low = Low;
				C.High = High;
				C.BB = BB;
				C.Prob = Prob;
				return C;
				}

				static CaseCluster jumpTable(const ConstantInt *Low, const ConstantInt *High,
				unsigned JTCasesIndex, BranchProbability Prob) {
				CaseCluster C;
				C.Kind = CC_JumpTable;
				C.Low = Low;
				C.High = High;
				C.JTCasesIndex = JTCasesIndex;
				C.Prob = Prob;
				return C;
				}

				static CaseCluster bitTests(const ConstantInt *Low, const ConstantInt *High,
				unsigned BTCasesIndex, BranchProbability Prob) {
				CaseCluster C;
				C.Kind = CC_BitTests;
				C.Low = Low;
				C.High = High;
				C.BTCasesIndex = BTCasesIndex;
				C.Prob = Prob;
				return C;
				}
				};

				typedef std::vector<CaseCluster> CaseClusterVector;
				typedef CaseClusterVector::iterator CaseClusterIt;

				struct CaseBits {
				uint64_t Mask;
				MachineBasicBlock *BB;
				unsigned Bits;
				BranchProbability ExtraProb;

				CaseBits(uint64_t mask, MachineBasicBlock *bb, unsigned bits,
				BranchProbability Prob)
				: Mask(mask), BB(bb), Bits(bits), ExtraProb(Prob) {}

				CaseBits() : Mask(0), BB(nullptr), Bits(0) {}
				};

				typedef std::vector<CaseBits> CaseBitsVector;

				struct JumpTableCase {
				JumpTableCase(unsigned R, unsigned J, MachineBasicBlock *M,
				MachineBasicBlock *D)
				: Reg(R), JTI(J), MBB(M), Default(D) {

				assert(MBB && "how MBB is null in JumpTable");
				}

				/// Reg - the virtual register containing the index of the jump table entry
				/// to jump to.
				unsigned Reg;
				/// JTI - the JumpTableIndex for this jump table in the function.
				unsigned JTI;
				/// MBB - the MBB into which to emit the code for the indirect jump.
				MachineBasicBlock *MBB;
				/// Default - the MBB of the default bb, which is a successor of the range
				/// check MBB. This is when updating PHI nodes in successors.
				MachineBasicBlock *Default;
				};
				struct JumpTableHeader {
				JumpTableHeader(APInt F, APInt L, const Value *SV, MachineBasicBlock *H,
				bool E = false)
				: First(std::move(F)), Last(std::move(L)), SValue(SV), HeaderBB(H),
				Emitted(E) {}
				APInt First;
				APInt Last;
				const Value *SValue;
				MachineBasicBlock *HeaderBB;
				bool Emitted;
				};
				typedef std::pair<JumpTableHeader, JumpTableCase> JumpTableBlock;

				struct BitTestCase {
				BitTestCase(uint64_t M, MachineBasicBlock *T, MachineBasicBlock *Tr,
				BranchProbability Prob)
				: Mask(M), ThisBB(T), TargetBB(Tr), ExtraProb(Prob) {}
				uint64_t Mask;
				MachineBasicBlock *ThisBB;
				MachineBasicBlock *TargetBB;
				BranchProbability ExtraProb;
				};

				typedef SmallVector<BitTestCase, 3> BitTestInfo;

				struct BitTestBlock {
				BitTestBlock(APInt F, APInt R, const Value *SV, unsigned Rg, MVT RgVT, bool E,
				bool CR, MachineBasicBlock *P, MachineBasicBlock *D,
				BitTestInfo C, BranchProbability Pr)
				: First(std::move(F)), Range(std::move(R)), SValue(SV), Reg(Rg),
				RegVT(RgVT), Emitted(E), ContiguousRange(CR), Parent(P), Default(D),
				Cases(std::move(C)), Prob(Pr) {}
				APInt First;
				APInt Range;
				const Value *SValue;
				unsigned Reg;
				MVT RegVT;
				bool Emitted;
				bool ContiguousRange;
				MachineBasicBlock *Parent;
				MachineBasicBlock *Default;
				BitTestInfo Cases;
				BranchProbability Prob;
				BranchProbability DefaultProb;
				};

				class SwitchLoweringCaseClusterBuilder {
				public:
				const DataLayout &DL;
				const TargetLowering &TLI;
				const CodeGenOpt::Level OptLevel;

				SwitchLoweringCaseClusterBuilder(const DataLayout &DL,
				const TargetLowering &TLI,
				const CodeGenOpt::Level OptLevel)
				: DL(DL), TLI(TLI), OptLevel(OptLevel) {}

				virtual ~SwitchLoweringCaseClusterBuilder() {}

				/// Find clusters of cases suitable for jump table lowering.
				void findJumpTables(CaseClusterVector &Clusters, const SwitchInst *SI,
				const BasicBlock *DefaultBB);

				/// Find clusters of cases suitable for bit test lowering.
				void findBitTestClusters(CaseClusterVector &Clusters, const SwitchInst *SI);

				/// Extract cases from the switch and build inital form of case clusters.
				void formInitalCaseCluser(const SwitchInst &SI, CaseClusterVector &Clusters,
				BranchProbabilityInfo *BPI);

				// Replace an unreachable default with the most popular destination.
				const BasicBlock *replaceUnrechableDefault(const SwitchInst &SI,
				CaseClusterVector &Clusters);

				/// Check whether the range [Low,High] fits in a machine word.
				bool rangeFitsInWord(const APInt &Low, const APInt &High);

				private:
				/// Check whether these clusters are suitable for lowering with bit tests
				/// based on the number of destinations, comparison metric, and range.
				bool isSuitableForBitTests(unsigned NumDests, unsigned NumCmps,
				const APInt &Low, const APInt &High);

				/// Return true if it can build a bit test cluster from Clusters[First..Last].
				bool canBuildJumpTable(const CaseClusterVector &Clusters, unsigned First,
				unsigned Last, const SwitchInst *SI,
				CaseCluster &JTCluster);

				/// Returns true if it can build a jump table cluster from
				/// Clusters[First..Last].
				bool canBuildBitTest(CaseClusterVector &Clusters, unsigned First,
				unsigned Last, const SwitchInst *SI,
				CaseCluster &BTCluster);

				/// Check whether a range of clusters is dense enough for a jump table.
				bool isDense(const CaseClusterVector &Clusters,
				const SmallVectorImpl<unsigned> &TotalCases, unsigned First,
				unsigned Last, unsigned MinDensity) const;

				/// Sort Clusters and merge adjacent cases.
				void sortAndRangeify(CaseClusterVector &Clusters);

				/// Build a jump table cluster from Clusters[First..Last].
				virtual void buildJumpTable(const CaseClusterVector &Clusters, unsigned First,
				unsigned Last, const SwitchInst *SI,
				const BasicBlock *DefaultBB,
				CaseCluster &JTCluster);

				/// Build a bit test cluster from Clusters[First..Last].
				virtual void buildBitTests(CaseClusterVector &Clusters, unsigned First,
				unsigned Last, const SwitchInst *SI,
				CaseCluster &BTCluster);
				};

				class SwitchLoweringCaseClusterBuilderForDAG
				: public SwitchLoweringCaseClusterBuilder {
				FunctionLoweringInfo &FuncInfo;
				SelectionDAGBuilder *SDB;

				public:
				SwitchLoweringCaseClusterBuilderForDAG(const DataLayout &DL,
				const TargetLowering &TLI,
				const CodeGenOpt::Level OptLevel,
				FunctionLoweringInfo &FuncInfo,
				SelectionDAGBuilder *SDB)
				: SwitchLoweringCaseClusterBuilder(DL, TLI, OptLevel), FuncInfo(FuncInfo),
				SDB(SDB) {}

				~SwitchLoweringCaseClusterBuilderForDAG() {}

				private:
				void buildJumpTable(const CaseClusterVector &Clusters, unsigned First,
				unsigned Last, const SwitchInst *SI,
				const BasicBlock *DefaultBB,
				CaseCluster &JTCluster) override;

				void buildBitTests(CaseClusterVector &Clusters, unsigned First, unsigned Last,
				const SwitchInst *SI, CaseCluster &BTCluster) override;
				};

				class SwitchLoweringCaseCluster {
				SwitchLoweringCaseClusterBuilder *ClusterBuilder;

				public:
				SwitchLoweringCaseCluster(const DataLayout &DL, const TargetLowering &TLI,
				const CodeGenOpt::Level OptLevel,
				FunctionLoweringInfo &FuncInfo,
				SelectionDAGBuilder *SDB) {
				ClusterBuilder = new SwitchLoweringCaseClusterBuilderForDAG(
				DL, TLI, OptLevel, FuncInfo, SDB);
				}

				SwitchLoweringCaseCluster(const DataLayout &DL, const TargetLowering &TLI,
				const CodeGenOpt::Level OptLevel) {
				ClusterBuilder = new SwitchLoweringCaseClusterBuilder(DL, TLI, OptLevel);
				}

				~SwitchLoweringCaseCluster() { delete ClusterBuilder; }

				const BasicBlock *findCaseClusters(const SwitchInst &SI,
				CaseClusterVector &Clusters,
				BranchProbabilityInfo *BPI);
				};

				} // end namespace llvm

				#endif

lib/CodeGen/SelectionDAG/CMakeLists.txt

[16 lines not shown]	add_llvm_library(LLVMSelectionDAG
ScheduleDAGVLIW.cpp		ScheduleDAGVLIW.cpp
SelectionDAGBuilder.cpp		SelectionDAGBuilder.cpp
SelectionDAG.cpp		SelectionDAG.cpp
SelectionDAGDumper.cpp		SelectionDAGDumper.cpp
SelectionDAGISel.cpp		SelectionDAGISel.cpp
SelectionDAGPrinter.cpp		SelectionDAGPrinter.cpp
SelectionDAGTargetInfo.cpp		SelectionDAGTargetInfo.cpp
StatepointLowering.cpp		StatepointLowering.cpp
		SwitchLoweringCaseCluster.cpp
TargetLowering.cpp		TargetLowering.cpp

hans: No need to remove this blank line, I think.
DEPENDS		DEPENDS
intrinsics_gen		intrinsics_gen
)		)

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h

[14 lines not shown]
#define LLVM_LIB_CODEGEN_SELECTIONDAG_SELECTIONDAGBUILDER_H		#define LLVM_LIB_CODEGEN_SELECTIONDAG_SELECTIONDAGBUILDER_H

#include "StatepointLowering.h"		#include "StatepointLowering.h"
#include "llvm/ADT/APInt.h"		#include "llvm/ADT/APInt.h"
#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
#include "llvm/Analysis/AliasAnalysis.h"		#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/CodeGen/SelectionDAG.h"		#include "llvm/CodeGen/SelectionDAG.h"
#include "llvm/CodeGen/SelectionDAGNodes.h"		#include "llvm/CodeGen/SelectionDAGNodes.h"
		#include "llvm/CodeGen/SwitchLoweringCaseCluster.h"
#include "llvm/IR/CallSite.h"		#include "llvm/IR/CallSite.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/Statepoint.h"		#include "llvm/IR/Statepoint.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"
#include "llvm/Target/TargetLowering.h"		#include "llvm/Target/TargetLowering.h"
#include <utility>		#include <utility>
#include <vector>		#include <vector>

[99 lines not shown]	private:
/// up and the emit a single tokenfactor for them just before terminator		/// up and the emit a single tokenfactor for them just before terminator
/// instructions.		/// instructions.
SmallVector<SDValue, 8> PendingExports;		SmallVector<SDValue, 8> PendingExports;

/// SDNodeOrder - A unique monotonically increasing number used to order the		/// SDNodeOrder - A unique monotonically increasing number used to order the
/// SDNodes we create.		/// SDNodes we create.
unsigned SDNodeOrder;		unsigned SDNodeOrder;

enum CaseClusterKind {
/// A cluster of adjacent case labels with the same destination, or just one
/// case.
CC_Range,
/// A cluster of cases suitable for jump table lowering.
CC_JumpTable,
/// A cluster of cases suitable for bit test lowering.
CC_BitTests
};

/// A cluster of case labels.
struct CaseCluster {
CaseClusterKind Kind;
const ConstantInt *Low, *High;
union {
MachineBasicBlock *MBB;
unsigned JTCasesIndex;
unsigned BTCasesIndex;
};
BranchProbability Prob;

static CaseCluster range(const ConstantInt *Low, const ConstantInt *High,
MachineBasicBlock *MBB, BranchProbability Prob) {
CaseCluster C;
C.Kind = CC_Range;
C.Low = Low;
C.High = High;
C.MBB = MBB;
C.Prob = Prob;
return C;
}

static CaseCluster jumpTable(const ConstantInt *Low,
const ConstantInt *High, unsigned JTCasesIndex,
BranchProbability Prob) {
CaseCluster C;
C.Kind = CC_JumpTable;
C.Low = Low;
C.High = High;
C.JTCasesIndex = JTCasesIndex;
C.Prob = Prob;
return C;
}

static CaseCluster bitTests(const ConstantInt *Low, const ConstantInt *High,
unsigned BTCasesIndex, BranchProbability Prob) {
CaseCluster C;
C.Kind = CC_BitTests;
C.Low = Low;
C.High = High;
C.BTCasesIndex = BTCasesIndex;
C.Prob = Prob;
return C;
}
};

typedef std::vector<CaseCluster> CaseClusterVector;
typedef CaseClusterVector::iterator CaseClusterIt;

struct CaseBits {
uint64_t Mask;
MachineBasicBlock* BB;
unsigned Bits;
BranchProbability ExtraProb;

CaseBits(uint64_t mask, MachineBasicBlock* bb, unsigned bits,
BranchProbability Prob):
Mask(mask), BB(bb), Bits(bits), ExtraProb(Prob) { }

CaseBits() : Mask(0), BB(nullptr), Bits(0) {}
};

typedef std::vector<CaseBits> CaseBitsVector;

/// Sort Clusters and merge adjacent cases.
void sortAndRangeify(CaseClusterVector &Clusters);

/// CaseBlock - This structure is used to communicate between		/// CaseBlock - This structure is used to communicate between
/// SelectionDAGBuilder and SDISel for the code generation of additional basic		/// SelectionDAGBuilder and SDISel for the code generation of additional basic
/// blocks needed by multi-case switch statements.		/// blocks needed by multi-case switch statements.
struct CaseBlock {		struct CaseBlock {
CaseBlock(ISD::CondCode cc, const Value *cmplhs, const Value *cmprhs,		CaseBlock(ISD::CondCode cc, const Value *cmplhs, const Value *cmprhs,
const Value *cmpmiddle, MachineBasicBlock *truebb,		const Value *cmpmiddle, MachineBasicBlock *truebb,
MachineBasicBlock *falsebb, MachineBasicBlock *me,		MachineBasicBlock *falsebb, MachineBasicBlock *me,
BranchProbability trueprob = BranchProbability::getUnknown(),		BranchProbability trueprob = BranchProbability::getUnknown(),
[15 lines not shown]	struct CaseBlock {

// ThisBB - the block into which to emit the code for the setcc and branches		// ThisBB - the block into which to emit the code for the setcc and branches
MachineBasicBlock *ThisBB;		MachineBasicBlock *ThisBB;

// TrueProb/FalseProb - branch weights.		// TrueProb/FalseProb - branch weights.
BranchProbability TrueProb, FalseProb;		BranchProbability TrueProb, FalseProb;
};		};

struct JumpTable {
JumpTable(unsigned R, unsigned J, MachineBasicBlock *M,
MachineBasicBlock *D): Reg(R), JTI(J), MBB(M), Default(D) {}

/// Reg - the virtual register containing the index of the jump table entry
//. to jump to.
unsigned Reg;
/// JTI - the JumpTableIndex for this jump table in the function.
unsigned JTI;
/// MBB - the MBB into which to emit the code for the indirect jump.
MachineBasicBlock *MBB;
/// Default - the MBB of the default bb, which is a successor of the range
/// check MBB. This is when updating PHI nodes in successors.
MachineBasicBlock *Default;
};
struct JumpTableHeader {
JumpTableHeader(APInt F, APInt L, const Value *SV, MachineBasicBlock *H,
bool E = false)
: First(std::move(F)), Last(std::move(L)), SValue(SV), HeaderBB(H),
Emitted(E) {}
APInt First;
APInt Last;
const Value *SValue;
MachineBasicBlock *HeaderBB;
bool Emitted;
};
typedef std::pair<JumpTableHeader, JumpTable> JumpTableBlock;

struct BitTestCase {
BitTestCase(uint64_t M, MachineBasicBlock* T, MachineBasicBlock* Tr,
BranchProbability Prob):
Mask(M), ThisBB(T), TargetBB(Tr), ExtraProb(Prob) { }
uint64_t Mask;
MachineBasicBlock *ThisBB;
MachineBasicBlock *TargetBB;
BranchProbability ExtraProb;
};

typedef SmallVector<BitTestCase, 3> BitTestInfo;

struct BitTestBlock {
BitTestBlock(APInt F, APInt R, const Value *SV, unsigned Rg, MVT RgVT,
bool E, bool CR, MachineBasicBlock *P, MachineBasicBlock *D,
BitTestInfo C, BranchProbability Pr)
: First(std::move(F)), Range(std::move(R)), SValue(SV), Reg(Rg),
RegVT(RgVT), Emitted(E), ContiguousRange(CR), Parent(P), Default(D),
Cases(std::move(C)), Prob(Pr) {}
APInt First;
APInt Range;
const Value *SValue;
unsigned Reg;
MVT RegVT;
bool Emitted;
bool ContiguousRange;
MachineBasicBlock *Parent;
MachineBasicBlock *Default;
BitTestInfo Cases;
BranchProbability Prob;
BranchProbability DefaultProb;
};

/// Check whether a range of clusters is dense enough for a jump table.
bool isDense(const CaseClusterVector &Clusters,
const SmallVectorImpl<unsigned> &TotalCases,
unsigned First, unsigned Last, unsigned MinDensity) const;

/// Build a jump table cluster from Clusters[First..Last]. Returns false if it
/// decides it's not a good idea.
bool buildJumpTable(const CaseClusterVector &Clusters, unsigned First,
unsigned Last, const SwitchInst *SI,
MachineBasicBlock *DefaultMBB, CaseCluster &JTCluster);

/// Find clusters of cases suitable for jump table lowering.
void findJumpTables(CaseClusterVector &Clusters, const SwitchInst *SI,
MachineBasicBlock *DefaultMBB);

/// Check whether the range [Low,High] fits in a machine word.
bool rangeFitsInWord(const APInt &Low, const APInt &High);

/// Check whether these clusters are suitable for lowering with bit tests based
/// on the number of destinations, comparison metric, and range.
bool isSuitableForBitTests(unsigned NumDests, unsigned NumCmps,
const APInt &Low, const APInt &High);

/// Build a bit test cluster from Clusters[First..Last]. Returns false if it
/// decides it's not a good idea.
bool buildBitTests(CaseClusterVector &Clusters, unsigned First, unsigned Last,
const SwitchInst *SI, CaseCluster &BTCluster);

/// Find clusters of cases suitable for bit test lowering.
void findBitTestClusters(CaseClusterVector &Clusters, const SwitchInst *SI);

struct SwitchWorkListItem {		struct SwitchWorkListItem {
MachineBasicBlock *MBB;		MachineBasicBlock *MBB;
CaseClusterIt FirstCluster;		CaseClusterIt FirstCluster;
CaseClusterIt LastCluster;		CaseClusterIt LastCluster;
const ConstantInt *GE;		const ConstantInt *GE;
const ConstantInt *LT;		const ConstantInt *LT;
BranchProbability DefaultProb;		BranchProbability DefaultProb;
};		};
[... 255 unchanged lines not shown; context: public: ...]
/// HasTailCall - This is set to true if a call in the current		/// HasTailCall - This is set to true if a call in the current
/// block has been translated as a tail call. In this case,		/// block has been translated as a tail call. In this case,
/// no subsequent DAG nodes should be created.		/// no subsequent DAG nodes should be created.
///		///
bool HasTailCall;		bool HasTailCall;

LLVMContext *Context;		LLVMContext *Context;

		/// Helper object to form case clusters for SwitchInst.
		SwitchLoweringCaseCluster *CaseClusters;

SelectionDAGBuilder(SelectionDAG &dag, FunctionLoweringInfo &funcinfo,		SelectionDAGBuilder(SelectionDAG &dag, FunctionLoweringInfo &funcinfo,
CodeGenOpt::Level ol)		CodeGenOpt::Level ol)
: CurInst(nullptr), SDNodeOrder(LowestSDNodeOrder), TM(dag.getTarget()),		: CurInst(nullptr), SDNodeOrder(LowestSDNodeOrder), TM(dag.getTarget()),
DAG(dag), FuncInfo(funcinfo),		DAG(dag), FuncInfo(funcinfo), HasTailCall(false),
HasTailCall(false) {		CaseClusters(nullptr) {}

		~SelectionDAGBuilder() {
		if (CaseClusters)
		delete CaseClusters;
}		}

void init(GCFunctionInfo *gfi, AliasAnalysis &aa,		void init(GCFunctionInfo *gfi, AliasAnalysis &aa,
const TargetLibraryInfo *li);		const TargetLibraryInfo *li);

/// Clear out the current SelectionDAG and the associated state and prepare		/// Clear out the current SelectionDAG and the associated state and prepare
/// this SelectionDAGBuilder object to be used for a new block. This doesn't		/// this SelectionDAGBuilder object to be used for a new block. This doesn't
/// clear out information about additional blocks that are needed to complete		/// clear out information about additional blocks that are needed to complete
[... 165 unchanged lines not shown; context: private: ...]
void visitCleanupRet(const CleanupReturnInst &I);		void visitCleanupRet(const CleanupReturnInst &I);
void visitCatchSwitch(const CatchSwitchInst &I);		void visitCatchSwitch(const CatchSwitchInst &I);
void visitCatchRet(const CatchReturnInst &I);		void visitCatchRet(const CatchReturnInst &I);
void visitCatchPad(const CatchPadInst &I);		void visitCatchPad(const CatchPadInst &I);
void visitCleanupPad(const CleanupPadInst &CPI);		void visitCleanupPad(const CleanupPadInst &CPI);

BranchProbability getEdgeProbability(const MachineBasicBlock *Src,		BranchProbability getEdgeProbability(const MachineBasicBlock *Src,
const MachineBasicBlock *Dst) const;		const MachineBasicBlock *Dst) const;
		public:
void addSuccessorWithProb(		void addSuccessorWithProb(
MachineBasicBlock *Src, MachineBasicBlock *Dst,		MachineBasicBlock *Src, MachineBasicBlock *Dst,
BranchProbability Prob = BranchProbability::getUnknown());		BranchProbability Prob = BranchProbability::getUnknown());

public:
void visitSwitchCase(CaseBlock &CB,		void visitSwitchCase(CaseBlock &CB,
MachineBasicBlock *SwitchBB);		MachineBasicBlock *SwitchBB);
void visitSPDescriptorParent(StackProtectorDescriptor &SPD,		void visitSPDescriptorParent(StackProtectorDescriptor &SPD,
MachineBasicBlock *ParentBB);		MachineBasicBlock *ParentBB);
void visitSPDescriptorFailure(StackProtectorDescriptor &SPD);		void visitSPDescriptorFailure(StackProtectorDescriptor &SPD);
void visitBitTestHeader(BitTestBlock &B, MachineBasicBlock *SwitchBB);		void visitBitTestHeader(BitTestBlock &B, MachineBasicBlock *SwitchBB);
void visitBitTestCase(BitTestBlock &BB,		void visitBitTestCase(BitTestBlock &BB,
MachineBasicBlock* NextMBB,		MachineBasicBlock* NextMBB,
BranchProbability BranchProbToNext,		BranchProbability BranchProbToNext,
unsigned Reg,		unsigned Reg,
BitTestCase &B,		BitTestCase &B,
MachineBasicBlock *SwitchBB);		MachineBasicBlock *SwitchBB);
void visitJumpTable(JumpTable &JT);		void visitJumpTable(JumpTableCase &JT);
void visitJumpTableHeader(JumpTable &JT, JumpTableHeader &JTH,		void visitJumpTableHeader(JumpTableCase &JT, JumpTableHeader &JTH,
MachineBasicBlock *SwitchBB);		MachineBasicBlock *SwitchBB);

private:		private:
// These all get lowered before this pass.		// These all get lowered before this pass.
void visitInvoke(const InvokeInst &I);		void visitInvoke(const InvokeInst &I);
void visitResume(const ResumeInst &I);		void visitResume(const ResumeInst &I);

void visitBinary(const User &I, unsigned OpCode);		void visitBinary(const User &I, unsigned OpCode);
[... 196 unchanged lines not shown ...]

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

[... 82 unchanged lines not shown; context: LimitFPPrecision("limit-float-precision", ...]
"for some float libcalls"),		"for some float libcalls"),
cl::location(LimitFloatPrecision),		cl::location(LimitFloatPrecision),
cl::init(0));		cl::init(0));

static cl::opt<bool>		static cl::opt<bool>
EnableFMFInDAG("enable-fmf-dag", cl::init(true), cl::Hidden,		EnableFMFInDAG("enable-fmf-dag", cl::init(true), cl::Hidden,
cl::desc("Enable fast-math-flags for DAG nodes"));		cl::desc("Enable fast-math-flags for DAG nodes"));

/// Minimum jump table density for normal functions.
static cl::opt<unsigned>
JumpTableDensity("jump-table-density", cl::init(10), cl::Hidden,
cl::desc("Minimum density for building a jump table in "
"a normal function"));

/// Minimum jump table density for -Os or -Oz functions.
static cl::opt<unsigned>
OptsizeJumpTableDensity("optsize-jump-table-density", cl::init(40), cl::Hidden,
cl::desc("Minimum density for building a jump table in "
"an optsize function"));


// Limit the width of DAG chains. This is important in general to prevent		// Limit the width of DAG chains. This is important in general to prevent
// DAG-based analysis from blowing up. For example, alias analysis and		// DAG-based analysis from blowing up. For example, alias analysis and
// load clustering may not complete in reasonable time. It is difficult to		// load clustering may not complete in reasonable time. It is difficult to
// recognize and avoid this situation within each individual analysis, and		// recognize and avoid this situation within each individual analysis, and
// future analyses are likely to have the same behavior. Limiting DAG width is		// future analyses are likely to have the same behavior. Limiting DAG width is
// the safe approach and will be especially important with global DAGs.		// the safe approach and will be especially important with global DAGs.
//		//
// MaxParallelChains default is arbitrarily high to avoid affecting		// MaxParallelChains default is arbitrarily high to avoid affecting
[... 1,789 unchanged lines not shown; context: void SelectionDAGBuilder::visitSwitchCase(CaseBlock &CB, ...]
// the branch condition.		// the branch condition.
BrCond = DAG.getNode(ISD::BR, dl, MVT::Other, BrCond,		BrCond = DAG.getNode(ISD::BR, dl, MVT::Other, BrCond,
DAG.getBasicBlock(CB.FalseBB));		DAG.getBasicBlock(CB.FalseBB));

DAG.setRoot(BrCond);		DAG.setRoot(BrCond);
}		}

/// visitJumpTable - Emit JumpTable node in the current MBB		/// visitJumpTable - Emit JumpTable node in the current MBB
void SelectionDAGBuilder::visitJumpTable(JumpTable &JT) {		void SelectionDAGBuilder::visitJumpTable(JumpTableCase &JT) {
// Emit the code for the jump table		// Emit the code for the jump table
assert(JT.Reg != -1U && "Should lower JT Header first!");		assert(JT.Reg != -1U && "Should lower JT Header first!");
EVT PTy = DAG.getTargetLoweringInfo().getPointerTy(DAG.getDataLayout());		EVT PTy = DAG.getTargetLoweringInfo().getPointerTy(DAG.getDataLayout());
SDValue Index = DAG.getCopyFromReg(getControlRoot(), getCurSDLoc(),		SDValue Index = DAG.getCopyFromReg(getControlRoot(), getCurSDLoc(),
JT.Reg, PTy);		JT.Reg, PTy);
SDValue Table = DAG.getJumpTable(JT.JTI, PTy);		SDValue Table = DAG.getJumpTable(JT.JTI, PTy);
SDValue BrJumpTable = DAG.getNode(ISD::BR_JT, getCurSDLoc(),		SDValue BrJumpTable = DAG.getNode(ISD::BR_JT, getCurSDLoc(),
MVT::Other, Index.getValue(1),		MVT::Other, Index.getValue(1),
Table, Index);		Table, Index);
DAG.setRoot(BrJumpTable);		DAG.setRoot(BrJumpTable);
}		}

/// visitJumpTableHeader - This function emits necessary code to produce index		/// visitJumpTableHeader - This function emits necessary code to produce index
/// in the JumpTable from switch case.		/// in the JumpTable from switch case.
void SelectionDAGBuilder::visitJumpTableHeader(JumpTable &JT,		void SelectionDAGBuilder::visitJumpTableHeader(JumpTableCase &JT,
JumpTableHeader &JTH,		JumpTableHeader &JTH,
MachineBasicBlock *SwitchBB) {		MachineBasicBlock *SwitchBB) {
SDLoc dl = getCurSDLoc();		SDLoc dl = getCurSDLoc();

// Subtract the lowest switch case value from the value being switched on and		// Subtract the lowest switch case value from the value being switched on and
// conditional branch to default mbb if the result is greater than the		// conditional branch to default mbb if the result is greater than the
// difference between smallest and largest cases.		// difference between smallest and largest cases.
SDValue SwitchOp = getValue(JTH.SValue);		SDValue SwitchOp = getValue(JTH.SValue);
[... 407 unchanged lines not shown; context: Ops[1] = DAG.getZExtOrTrunc( ...]
dl, ValueVTs[1]);		dl, ValueVTs[1]);

// Merge into one.		// Merge into one.
SDValue Res = DAG.getNode(ISD::MERGE_VALUES, dl,		SDValue Res = DAG.getNode(ISD::MERGE_VALUES, dl,
DAG.getVTList(ValueVTs), Ops);		DAG.getVTList(ValueVTs), Ops);
setValue(&LP, Res);		setValue(&LP, Res);
}		}

void SelectionDAGBuilder::sortAndRangeify(CaseClusterVector &Clusters) {
#ifndef NDEBUG
for (const CaseCluster &CC : Clusters)
assert(CC.Low == CC.High && "Input clusters must be single-case");
#endif

std::sort(Clusters.begin(), Clusters.end(),
[](const CaseCluster &a, const CaseCluster &b) {
return a.Low->getValue().slt(b.Low->getValue());
});

// Merge adjacent clusters with the same destination.
const unsigned N = Clusters.size();
unsigned DstIndex = 0;
for (unsigned SrcIndex = 0; SrcIndex < N; ++SrcIndex) {
CaseCluster &CC = Clusters[SrcIndex];
const ConstantInt *CaseVal = CC.Low;
MachineBasicBlock *Succ = CC.MBB;

if (DstIndex != 0 && Clusters[DstIndex - 1].MBB == Succ &&
(CaseVal->getValue() - Clusters[DstIndex - 1].High->getValue()) == 1) {
// If this case has the same successor and is a neighbour, merge it into
// the previous cluster.
Clusters[DstIndex - 1].High = CaseVal;
Clusters[DstIndex - 1].Prob += CC.Prob;
} else {
std::memmove(&Clusters[DstIndex++], &Clusters[SrcIndex],
sizeof(Clusters[SrcIndex]));
}
}
Clusters.resize(DstIndex);
}
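
The sort-and-merge step performed by sortAndRangeify can be illustrated outside of LLVM. The following is only a minimal sketch under simplifying assumptions: `SimpleCluster` and `sortAndMerge` are hypothetical names that do not appear in the patch, case values are plain `int64_t` rather than `ConstantInt`, destinations are integer ids rather than `MachineBasicBlock` pointers, and branch probabilities are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for CaseCluster: an inclusive [Low, High] range of
// case values that all branch to the same destination id.
struct SimpleCluster {
  int64_t Low;
  int64_t High;
  int Dest;
};

// Sort clusters by their low case value, then merge each cluster into its
// predecessor when it has the same destination and starts exactly one past
// the predecessor's high value.
std::vector<SimpleCluster> sortAndMerge(std::vector<SimpleCluster> Clusters) {
  std::sort(Clusters.begin(), Clusters.end(),
            [](const SimpleCluster &A, const SimpleCluster &B) {
              return A.Low < B.Low;
            });

  std::vector<SimpleCluster> Out;
  for (const SimpleCluster &C : Clusters) {
    if (!Out.empty() && Out.back().Dest == C.Dest &&
        C.Low - Out.back().High == 1)
      Out.back().High = C.High; // Same successor and a neighbour: extend.
    else
      Out.push_back(C);         // Otherwise start a new cluster.
  }
  return Out;
}
```

For example, the single-value cases 3, 1, 2 and 5 going to destination 1 and case 6 going to destination 2 would collapse into three clusters: [1,3], [5,5] and [6,6]. Unlike the code in the diff, this sketch builds a new vector instead of compacting in place.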

void SelectionDAGBuilder::UpdateSplitBlock(MachineBasicBlock *First,		void SelectionDAGBuilder::UpdateSplitBlock(MachineBasicBlock *First,
MachineBasicBlock *Last) {		MachineBasicBlock *Last) {
// Update JTCases.		// Update JTCases.
for (unsigned i = 0, e = JTCases.size(); i != e; ++i)		for (unsigned i = 0, e = JTCases.size(); i != e; ++i)
if (JTCases[i].first.HeaderBB == First)		if (JTCases[i].first.HeaderBB == First)
JTCases[i].first.HeaderBB = Last;		JTCases[i].first.HeaderBB = Last;

// Update BitTestCases.		// Update BitTestCases.
[... 6,188 unchanged lines not shown ...]
void SelectionDAGBuilder::updateDAGForMaybeTailCall(SDValue MaybeTC) {		void SelectionDAGBuilder::updateDAGForMaybeTailCall(SDValue MaybeTC) {
// If the node is null, we do have a tail call.		// If the node is null, we do have a tail call.
if (MaybeTC.getNode() != nullptr)		if (MaybeTC.getNode() != nullptr)
DAG.setRoot(MaybeTC);		DAG.setRoot(MaybeTC);
else		else
HasTailCall = true;		HasTailCall = true;
}		}

bool SelectionDAGBuilder::isDense(const CaseClusterVector &Clusters,
const SmallVectorImpl<unsigned> &TotalCases,
unsigned First, unsigned Last,
unsigned Density) const {
assert(Last >= First);
assert(TotalCases[Last] >= TotalCases[First]);

const APInt &LowCase = Clusters[First].Low->getValue();
const APInt &HighCase = Clusters[Last].High->getValue();
assert(LowCase.getBitWidth() == HighCase.getBitWidth());

// FIXME: A range of consecutive cases has 100% density, but only requires one
// comparison to lower. We should discriminate against such consecutive ranges
// in jump tables.

uint64_t Diff = (HighCase - LowCase).getLimitedValue((UINT64_MAX - 1) / 100);
uint64_t Range = Diff + 1;

uint64_t NumCases =
TotalCases[Last] - (First == 0 ? 0 : TotalCases[First - 1]);

assert(NumCases < UINT64_MAX / 100);
assert(Range >= NumCases);

return NumCases * 100 >= Range * Density;
}
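
The density test at the end of isDense boils down to an integer percentage comparison. Below is a rough, self-contained sketch of that comparison; `isDenseEnough` is a hypothetical helper that is not part of the patch, and it assumes the case values are small enough that `High - Low` does not overflow (the real code uses `APInt` and clamps with `getLimitedValue`).

```cpp
#include <cassert>
#include <cstdint>

// Returns true when NumCases case values spread over the inclusive range
// [Low, High] occupy at least MinDensityPercent percent of the slots a jump
// table for that range would need.
bool isDenseEnough(int64_t Low, int64_t High, uint64_t NumCases,
                   unsigned MinDensityPercent) {
  assert(High >= Low && "range must be non-empty");
  uint64_t Range = static_cast<uint64_t>(High - Low) + 1;
  assert(Range >= NumCases && "cannot have more cases than slots");
  // Multiply instead of dividing so no fractional density is needed.
  return NumCases * 100 >= Range * MinDensityPercent;
}
```

With the defaults visible in the diff (10 for normal functions, 40 for optsize functions), four cases in the range 0..9 would just meet the optsize threshold, while three cases would not.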

static inline bool areJTsAllowed(const TargetLowering &TLI,
const SwitchInst *SI) {
const Function *Fn = SI->getParent()->getParent();
if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")
return false;

return TLI.isOperationLegalOrCustom(ISD::BR_JT, MVT::Other) ||
TLI.isOperationLegalOrCustom(ISD::BRIND, MVT::Other);
}

bool SelectionDAGBuilder::buildJumpTable(const CaseClusterVector &Clusters,
unsigned First, unsigned Last,
const SwitchInst *SI,
MachineBasicBlock *DefaultMBB,
CaseCluster &JTCluster) {
assert(First <= Last);

auto Prob = BranchProbability::getZero();
unsigned NumCmps = 0;
std::vector<MachineBasicBlock*> Table;
DenseMap<MachineBasicBlock*, BranchProbability> JTProbs;

// Initialize probabilities in JTProbs.
for (unsigned I = First; I <= Last; ++I)
JTProbs[Clusters[I].MBB] = BranchProbability::getZero();

for (unsigned I = First; I <= Last; ++I) {
assert(Clusters[I].Kind == CC_Range);
Prob += Clusters[I].Prob;
const APInt &Low = Clusters[I].Low->getValue();
const APInt &High = Clusters[I].High->getValue();
NumCmps += (Low == High) ? 1 : 2;
if (I != First) {
// Fill the gap between this and the previous cluster.
const APInt &PreviousHigh = Clusters[I - 1].High->getValue();
assert(PreviousHigh.slt(Low));
uint64_t Gap = (Low - PreviousHigh).getLimitedValue() - 1;
for (uint64_t J = 0; J < Gap; J++)
Table.push_back(DefaultMBB);
}
uint64_t ClusterSize = (High - Low).getLimitedValue() + 1;
for (uint64_t J = 0; J < ClusterSize; ++J)
Table.push_back(Clusters[I].MBB);
JTProbs[Clusters[I].MBB] += Clusters[I].Prob;
}

unsigned NumDests = JTProbs.size();
if (isSuitableForBitTests(NumDests, NumCmps,
Clusters[First].Low->getValue(),
Clusters[Last].High->getValue())) {
// Clusters[First..Last] should be lowered as bit tests instead.
return false;
}

// Create the MBB that will load from and jump through the table.
// Note: We create it here, but it's not inserted into the function yet.
MachineFunction *CurMF = FuncInfo.MF;
MachineBasicBlock *JumpTableMBB =
CurMF->CreateMachineBasicBlock(SI->getParent());

// Add successors. Note: use table order for determinism.
SmallPtrSet<MachineBasicBlock *, 8> Done;
for (MachineBasicBlock *Succ : Table) {
if (Done.count(Succ))
continue;
addSuccessorWithProb(JumpTableMBB, Succ, JTProbs[Succ]);
Done.insert(Succ);
}
JumpTableMBB->normalizeSuccProbs();

const TargetLowering &TLI = DAG.getTargetLoweringInfo();
unsigned JTI = CurMF->getOrCreateJumpTableInfo(TLI.getJumpTableEncoding())
->createJumpTableIndex(Table);

// Set up the jump table info.
JumpTable JT(-1U, JTI, JumpTableMBB, nullptr);
JumpTableHeader JTH(Clusters[First].Low->getValue(),
Clusters[Last].High->getValue(), SI->getCondition(),
nullptr, false);
JTCases.emplace_back(std::move(JTH), std::move(JT));

JTCluster = CaseCluster::jumpTable(Clusters[First].Low, Clusters[Last].High,
JTCases.size() - 1, Prob);
return true;
}

void SelectionDAGBuilder::findJumpTables(CaseClusterVector &Clusters,
const SwitchInst *SI,
MachineBasicBlock *DefaultMBB) {
#ifndef NDEBUG
// Clusters must be non-empty, sorted, and only contain Range clusters.
assert(!Clusters.empty());
for (CaseCluster &C : Clusters)
assert(C.Kind == CC_Range);
for (unsigned i = 1, e = Clusters.size(); i < e; ++i)
assert(Clusters[i - 1].High->getValue().slt(Clusters[i].Low->getValue()));
#endif

const TargetLowering &TLI = DAG.getTargetLoweringInfo();
if (!areJTsAllowed(TLI, SI))
return;

const bool OptForSize = DefaultMBB->getParent()->getFunction()->optForSize();

const int64_t N = Clusters.size();
const unsigned MinJumpTableEntries = TLI.getMinimumJumpTableEntries();
const unsigned SmallNumberOfEntries = MinJumpTableEntries / 2;
const unsigned MaxJumpTableSize =
OptForSize || TLI.getMaximumJumpTableSize() == 0
? UINT_MAX : TLI.getMaximumJumpTableSize();

if (N < 2 || N < MinJumpTableEntries)
return;

// TotalCases[i]: Total nbr of cases in Clusters[0..i].
SmallVector<unsigned, 8> TotalCases(N);
for (unsigned i = 0; i < N; ++i) {
const APInt &Hi = Clusters[i].High->getValue();
const APInt &Lo = Clusters[i].Low->getValue();
TotalCases[i] = (Hi - Lo).getLimitedValue() + 1;
if (i != 0)
TotalCases[i] += TotalCases[i - 1];
}

const unsigned MinDensity =
OptForSize ? OptsizeJumpTableDensity : JumpTableDensity;

// Cheap case: the whole range may be suitable for jump table.
unsigned JumpTableSize = (Clusters[N - 1].High->getValue() -
Clusters[0].Low->getValue())
.getLimitedValue(UINT_MAX - 1) + 1;
if (JumpTableSize <= MaxJumpTableSize &&
isDense(Clusters, TotalCases, 0, N - 1, MinDensity)) {
CaseCluster JTCluster;
if (buildJumpTable(Clusters, 0, N - 1, SI, DefaultMBB, JTCluster)) {
Clusters[0] = JTCluster;
Clusters.resize(1);
return;
}
}

// The algorithm below is not suitable for -O0.
if (TM.getOptLevel() == CodeGenOpt::None)
return;

// Split Clusters into minimum number of dense partitions. The algorithm uses
// the same idea as Kannan & Proebsting "Correction to 'Producing Good Code
// for the Case Statement'" (1994), but builds the MinPartitions array in
// reverse order to make it easier to reconstruct the partitions in ascending
// order. In the choice between two optimal partitionings, it picks the one
// which yields more jump tables.

// MinPartitions[i] is the minimum nbr of partitions of Clusters[i..N-1].
SmallVector<unsigned, 8> MinPartitions(N);
// LastElement[i] is the last element of the partition starting at i.
SmallVector<unsigned, 8> LastElement(N);
// PartitionsScore[i] is used to break ties when choosing between two
// partitionings resulting in the same number of partitions.
SmallVector<unsigned, 8> PartitionsScore(N);
// For PartitionsScore, a small number of comparisons is considered as good as
// a jump table and a single comparison is considered better than a jump
// table.
enum PartitionScores : unsigned {
NoTable = 0,
Table = 1,
FewCases = 1,
SingleCase = 2
};

// Base case: There is only one way to partition Clusters[N-1].
MinPartitions[N - 1] = 1;
LastElement[N - 1] = N - 1;
PartitionsScore[N - 1] = PartitionScores::SingleCase;

// Note: loop indexes are signed to avoid underflow.
for (int64_t i = N - 2; i >= 0; i--) {
// Find optimal partitioning of Clusters[i..N-1].
// Baseline: Put Clusters[i] into a partition on its own.
MinPartitions[i] = MinPartitions[i + 1] + 1;
LastElement[i] = i;
PartitionsScore[i] = PartitionsScore[i + 1] + PartitionScores::SingleCase;

// Search for a solution that results in fewer partitions.
for (int64_t j = N - 1; j > i; j--) {
// Try building a partition from Clusters[i..j].
JumpTableSize = (Clusters[j].High->getValue() -
Clusters[i].Low->getValue())
.getLimitedValue(UINT_MAX - 1) + 1;
if (JumpTableSize <= MaxJumpTableSize &&
isDense(Clusters, TotalCases, i, j, MinDensity)) {
unsigned NumPartitions = 1 + (j == N - 1 ? 0 : MinPartitions[j + 1]);
unsigned Score = j == N - 1 ? 0 : PartitionsScore[j + 1];
int64_t NumEntries = j - i + 1;

if (NumEntries == 1)
Score += PartitionScores::SingleCase;
else if (NumEntries <= SmallNumberOfEntries)
Score += PartitionScores::FewCases;
else if (NumEntries >= MinJumpTableEntries)
Score += PartitionScores::Table;

// If this leads to fewer partitions, or to the same number of
// partitions with better score, it is a better partitioning.
if (NumPartitions < MinPartitions[i] ||
(NumPartitions == MinPartitions[i] && Score > PartitionsScore[i])) {
MinPartitions[i] = NumPartitions;
LastElement[i] = j;
PartitionsScore[i] = Score;
}
}
}
}

// Iterate over the partitions, replacing some with jump tables in-place.
unsigned DstIndex = 0;
for (unsigned First = 0, Last; First < N; First = Last + 1) {
Last = LastElement[First];
assert(Last >= First);
assert(DstIndex <= First);
unsigned NumClusters = Last - First + 1;

CaseCluster JTCluster;
if (NumClusters >= MinJumpTableEntries &&
buildJumpTable(Clusters, First, Last, SI, DefaultMBB, JTCluster)) {
Clusters[DstIndex++] = JTCluster;
} else {
for (unsigned I = First; I <= Last; ++I)
std::memmove(&Clusters[DstIndex++], &Clusters[I], sizeof(Clusters[I]));
}
}
Clusters.resize(DstIndex);
}
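
The reverse dynamic-programming pass in findJumpTables may be easier to follow in isolation. The sketch below is a simplified illustration, not the code under review: `Range`, `dense` and `minDensePartitions` are hypothetical names, case values are plain `int64_t`, and the tie-breaking `PartitionsScore` and the `MaxJumpTableSize` limit are both omitted. It only computes `LastElement`, the array used afterwards to walk the partitions in ascending order.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a Range cluster: inclusive [Low, High].
struct Range {
  int64_t Low;
  int64_t High;
};

// Density check for Clusters[First..Last], using prefix sums of case counts.
static bool dense(const std::vector<Range> &C,
                  const std::vector<uint64_t> &Total, size_t First,
                  size_t Last, unsigned MinDensity) {
  uint64_t Span = static_cast<uint64_t>(C[Last].High - C[First].Low) + 1;
  uint64_t NumCases = Total[Last] - (First == 0 ? 0 : Total[First - 1]);
  return NumCases * 100 >= Span * MinDensity;
}

// Returns LastElement, where LastElement[i] is the index of the last cluster
// in the partition that starts at i, for a partitioning of C[i..N-1] into the
// minimum number of dense partitions. C must be sorted and non-overlapping.
std::vector<size_t> minDensePartitions(const std::vector<Range> &C,
                                       unsigned MinDensity) {
  const size_t N = C.size();
  std::vector<size_t> LastElement(N);
  if (N == 0)
    return LastElement;

  // Total[i]: number of case values covered by C[0..i].
  std::vector<uint64_t> Total(N);
  for (size_t i = 0; i < N; ++i) {
    Total[i] = static_cast<uint64_t>(C[i].High - C[i].Low) + 1;
    if (i != 0)
      Total[i] += Total[i - 1];
  }

  // MinPartitions[i]: minimum number of partitions of C[i..N-1].
  std::vector<unsigned> MinPartitions(N);
  MinPartitions[N - 1] = 1;
  LastElement[N - 1] = N - 1;

  // Visit i = N-2 down to 0; the post-decrement avoids unsigned underflow.
  for (size_t i = N - 1; i-- > 0;) {
    // Baseline: C[i] forms a partition on its own.
    MinPartitions[i] = MinPartitions[i + 1] + 1;
    LastElement[i] = i;
    // Try every candidate partition C[i..j], widest first.
    for (size_t j = N - 1; j > i; --j) {
      if (!dense(C, Total, i, j, MinDensity))
        continue;
      unsigned Num = 1 + (j == N - 1 ? 0 : MinPartitions[j + 1]);
      if (Num < MinPartitions[i]) {
        MinPartitions[i] = Num;
        LastElement[i] = j;
      }
    }
  }
  return LastElement;
}
```

Starting from index 0 and repeatedly jumping to `LastElement[First] + 1` yields the partitions in ascending order, which is the reason the comment in the diff gives for building the arrays in reverse.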

bool SelectionDAGBuilder::rangeFitsInWord(const APInt &Low, const APInt &High) {
// FIXME: Using the pointer type doesn't seem ideal.
uint64_t BW = DAG.getDataLayout().getPointerSizeInBits();
uint64_t Range = (High - Low).getLimitedValue(UINT64_MAX - 1) + 1;
return Range <= BW;
}

bool SelectionDAGBuilder::isSuitableForBitTests(unsigned NumDests,
unsigned NumCmps,
const APInt &Low,
const APInt &High) {
// FIXME: I don't think NumCmps is the correct metric: a single case and a
// range of cases both require only one branch to lower. Just looking at the
// number of clusters and destinations should be enough to decide whether to
// build bit tests.

// To lower a range with bit tests, the range must fit the bitwidth of a
// machine word.
if (!rangeFitsInWord(Low, High))
return false;

// Decide whether it's profitable to lower this range with bit tests. Each
// destination requires a bit test and branch, and there is an overall range
// check branch. For a small number of clusters, separate comparisons might be
// cheaper, and for many destinations, splitting the range might be better.
return (NumDests == 1 && NumCmps >= 3) ||
(NumDests == 2 && NumCmps >= 5) ||
(NumDests == 3 && NumCmps >= 6);
}

bool SelectionDAGBuilder::buildBitTests(CaseClusterVector &Clusters,
unsigned First, unsigned Last,
const SwitchInst *SI,
CaseCluster &BTCluster) {
assert(First <= Last);
if (First == Last)
return false;

BitVector Dests(FuncInfo.MF->getNumBlockIDs());
unsigned NumCmps = 0;
for (int64_t I = First; I <= Last; ++I) {
assert(Clusters[I].Kind == CC_Range);
Dests.set(Clusters[I].MBB->getNumber());
NumCmps += (Clusters[I].Low == Clusters[I].High) ? 1 : 2;
}
unsigned NumDests = Dests.count();

APInt Low = Clusters[First].Low->getValue();
APInt High = Clusters[Last].High->getValue();
assert(Low.slt(High));

if (!isSuitableForBitTests(NumDests, NumCmps, Low, High))
return false;

APInt LowBound;
APInt CmpRange;

const int BitWidth = DAG.getTargetLoweringInfo()
.getPointerTy(DAG.getDataLayout())
.getSizeInBits();
assert(rangeFitsInWord(Low, High) && "Case range must fit in bit mask!");

// Check if the clusters cover a contiguous range such that no value in the
// range will jump to the default statement.
bool ContiguousRange = true;
for (int64_t I = First + 1; I <= Last; ++I) {
if (Clusters[I].Low->getValue() != Clusters[I - 1].High->getValue() + 1) {
ContiguousRange = false;
break;
}
}

if (Low.isStrictlyPositive() && High.slt(BitWidth)) {
// Optimize the case where all the case values fit in a word without having
// to subtract minValue. In this case, we can optimize away the subtraction.
LowBound = APInt::getNullValue(Low.getBitWidth());
CmpRange = High;
ContiguousRange = false;
} else {
LowBound = Low;
CmpRange = High - Low;
}

CaseBitsVector CBV;
auto TotalProb = BranchProbability::getZero();
for (unsigned i = First; i <= Last; ++i) {
// Find the CaseBits for this destination.
unsigned j;
for (j = 0; j < CBV.size(); ++j)
if (CBV[j].BB == Clusters[i].MBB)
break;
if (j == CBV.size())
CBV.push_back(
CaseBits(0, Clusters[i].MBB, 0, BranchProbability::getZero()));
CaseBits *CB = &CBV[j];

// Update Mask, Bits and ExtraProb.
uint64_t Lo = (Clusters[i].Low->getValue() - LowBound).getZExtValue();
uint64_t Hi = (Clusters[i].High->getValue() - LowBound).getZExtValue();
assert(Hi >= Lo && Hi < 64 && "Invalid bit case!");
CB->Mask |= (-1ULL >> (63 - (Hi - Lo))) << Lo;
CB->Bits += Hi - Lo + 1;
CB->ExtraProb += Clusters[i].Prob;
TotalProb += Clusters[i].Prob;
}

BitTestInfo BTI;
std::sort(CBV.begin(), CBV.end(), [](const CaseBits &a, const CaseBits &b) {
// Sort by probability first, number of bits second.
if (a.ExtraProb != b.ExtraProb)
return a.ExtraProb > b.ExtraProb;
return a.Bits > b.Bits;
});

for (auto &CB : CBV) {
MachineBasicBlock *BitTestBB =
FuncInfo.MF->CreateMachineBasicBlock(SI->getParent());
BTI.push_back(BitTestCase(CB.Mask, BitTestBB, CB.BB, CB.ExtraProb));
}
BitTestCases.emplace_back(std::move(LowBound), std::move(CmpRange),
SI->getCondition(), -1U, MVT::Other, false,
ContiguousRange, nullptr, nullptr, std::move(BTI),
TotalProb);

BTCluster = CaseCluster::bitTests(Clusters[First].Low, Clusters[Last].High,
BitTestCases.size() - 1, TotalProb);
return true;
}
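
The mask arithmetic in buildBitTests can be tried out on its own. The following is a small sketch under stated assumptions: `rangeMask` and `hitsMask` are hypothetical helpers that do not exist in the patch, a 64-bit word is assumed, and `hitsMask` only models in plain C++ the effect of a range check followed by a bit test, not the DAG nodes that are actually emitted.

```cpp
#include <cassert>
#include <cstdint>

// Mask with bits Lo..Hi (inclusive) set. Lo and Hi are offsets of a case
// range from the lower bound of the whole bit-test cluster.
uint64_t rangeMask(unsigned Lo, unsigned Hi) {
  assert(Lo <= Hi && Hi < 64 && "range must fit in a 64-bit word");
  // Build (Hi - Lo + 1) low one-bits, then shift them up to start at Lo.
  return (~0ULL >> (63 - (Hi - Lo))) << Lo;
}

// True when Value, rebased by LowBound, selects a set bit in Mask. Values
// below LowBound wrap to a large offset and are rejected by the range check.
bool hitsMask(uint64_t Mask, int64_t Value, int64_t LowBound) {
  uint64_t Offset = static_cast<uint64_t>(Value - LowBound);
  return Offset < 64 && ((Mask >> Offset) & 1);
}
```

For instance, with a lower bound of 10, the case ranges 11..13 and 15..15 sharing one destination would be accumulated as `rangeMask(1, 3) | rangeMask(5, 5)`, and a switch value of 15 would hit that mask while 14 would not.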

void SelectionDAGBuilder::findBitTestClusters(CaseClusterVector &Clusters,
const SwitchInst *SI) {
// Partition Clusters into as few subsets as possible, where each subset has a
// range that fits in a machine word and has <= 3 unique destinations.

#ifndef NDEBUG
// Clusters must be sorted and contain Range or JumpTable clusters.
assert(!Clusters.empty());
assert(Clusters[0].Kind == CC_Range || Clusters[0].Kind == CC_JumpTable);
for (const CaseCluster &C : Clusters)
assert(C.Kind == CC_Range || C.Kind == CC_JumpTable);
for (unsigned i = 1; i < Clusters.size(); ++i)
assert(Clusters[i-1].High->getValue().slt(Clusters[i].Low->getValue()));
#endif

// The algorithm below is not suitable for -O0.
if (TM.getOptLevel() == CodeGenOpt::None)
return;

// If target does not have legal shift left, do not emit bit tests at all.
const TargetLowering &TLI = DAG.getTargetLoweringInfo();
EVT PTy = TLI.getPointerTy(DAG.getDataLayout());
if (!TLI.isOperationLegal(ISD::SHL, PTy))
return;

int BitWidth = PTy.getSizeInBits();
const int64_t N = Clusters.size();

// MinPartitions[i] is the minimum nbr of partitions of Clusters[i..N-1].
SmallVector<unsigned, 8> MinPartitions(N);
// LastElement[i] is the last element of the partition starting at i.
SmallVector<unsigned, 8> LastElement(N);

// FIXME: This might not be the best algorithm for finding bit test clusters.

// Base case: There is only one way to partition Clusters[N-1].
MinPartitions[N - 1] = 1;
LastElement[N - 1] = N - 1;

// Note: loop indexes are signed to avoid underflow.
for (int64_t i = N - 2; i >= 0; --i) {
// Find optimal partitioning of Clusters[i..N-1].
// Baseline: Put Clusters[i] into a partition on its own.
MinPartitions[i] = MinPartitions[i + 1] + 1;
LastElement[i] = i;

// Search for a solution that results in fewer partitions.
// Note: the search is limited by BitWidth, reducing time complexity.
for (int64_t j = std::min(N - 1, i + BitWidth - 1); j > i; --j) {
// Try building a partition from Clusters[i..j].

// Check the range.
if (!rangeFitsInWord(Clusters[i].Low->getValue(),
Clusters[j].High->getValue()))
continue;

// Check nbr of destinations and cluster types.
// FIXME: This works, but doesn't seem very efficient.
bool RangesOnly = true;
BitVector Dests(FuncInfo.MF->getNumBlockIDs());
for (int64_t k = i; k <= j; k++) {
if (Clusters[k].Kind != CC_Range) {
RangesOnly = false;
break;
}
Dests.set(Clusters[k].MBB->getNumber());
}
if (!RangesOnly || Dests.count() > 3)
break;

// Check if it's a better partition.
unsigned NumPartitions = 1 + (j == N - 1 ? 0 : MinPartitions[j + 1]);
if (NumPartitions < MinPartitions[i]) {
// Found a better partition.
MinPartitions[i] = NumPartitions;
LastElement[i] = j;
}
}
}

// Iterate over the partitions, replacing with bit-test clusters in-place.
unsigned DstIndex = 0;
for (unsigned First = 0, Last; First < N; First = Last + 1) {
Last = LastElement[First];
assert(First <= Last);
assert(DstIndex <= First);

CaseCluster BitTestCluster;
if (buildBitTests(Clusters, First, Last, SI, BitTestCluster)) {
Clusters[DstIndex++] = BitTestCluster;
} else {
size_t NumClusters = Last - First + 1;
std::memmove(&Clusters[DstIndex], &Clusters[First],
sizeof(Clusters[0]) * NumClusters);
DstIndex += NumClusters;
}
}
Clusters.resize(DstIndex);
}

void SelectionDAGBuilder::lowerWorkItem(SwitchWorkListItem W, Value *Cond,		void SelectionDAGBuilder::lowerWorkItem(SwitchWorkListItem W, Value *Cond,
MachineBasicBlock *SwitchMBB,		MachineBasicBlock *SwitchMBB,
MachineBasicBlock *DefaultMBB) {		MachineBasicBlock *DefaultMBB) {
MachineFunction *CurMF = FuncInfo.MF;		MachineFunction *CurMF = FuncInfo.MF;
MachineBasicBlock *NextMBB = nullptr;		MachineBasicBlock *NextMBB = nullptr;
MachineFunction::iterator BBI(W.MBB);		MachineFunction::iterator BBI(W.MBB);
if (++BBI != FuncInfo.MF->end())		if (++BBI != FuncInfo.MF->end())
NextMBB = &*BBI;		NextMBB = &*BBI;
[... 96 unchanged lines not shown; context: if (I == W.LastCluster) { ...]
ExportFromCurrentBlock(Cond);		ExportFromCurrentBlock(Cond);
}		}
UnhandledProbs -= I->Prob;		UnhandledProbs -= I->Prob;

switch (I->Kind) {		switch (I->Kind) {
case CC_JumpTable: {		case CC_JumpTable: {
// FIXME: Optimize away range check based on pivot comparisons.		// FIXME: Optimize away range check based on pivot comparisons.
JumpTableHeader *JTH = &JTCases[I->JTCasesIndex].first;		JumpTableHeader *JTH = &JTCases[I->JTCasesIndex].first;
JumpTable *JT = &JTCases[I->JTCasesIndex].second;		JumpTableCase *JT = &JTCases[I->JTCasesIndex].second;

// The jump block hasn't been inserted yet; insert it here.		// The jump block hasn't been inserted yet; insert it here.
MachineBasicBlock *JumpMBB = JT->MBB;		MachineBasicBlock *JumpMBB = JT->MBB;
CurMF->insert(BBI, JumpMBB);		CurMF->insert(BBI, JumpMBB);

auto JumpProb = I->Prob;		auto JumpProb = I->Prob;
auto FallthroughProb = UnhandledProbs;		auto FallthroughProb = UnhandledProbs;

[230 lines not shown]
void SelectionDAGBuilder::splitWorkItem(SwitchWorkList &WorkList,

if (W.MBB == SwitchMBB)		if (W.MBB == SwitchMBB)
visitSwitchCase(CB, SwitchMBB);		visitSwitchCase(CB, SwitchMBB);
else		else
SwitchCases.push_back(CB);		SwitchCases.push_back(CB);
}		}

void SelectionDAGBuilder::visitSwitch(const SwitchInst &SI) {		void SelectionDAGBuilder::visitSwitch(const SwitchInst &SI) {
// Extract cases from the switch.
BranchProbabilityInfo *BPI = FuncInfo.BPI;		BranchProbabilityInfo *BPI = FuncInfo.BPI;
CaseClusterVector Clusters;		CaseClusterVector Clusters;
Clusters.reserve(SI.getNumCases());
for (auto I : SI.cases()) {
MachineBasicBlock *Succ = FuncInfo.MBBMap[I.getCaseSuccessor()];
const ConstantInt *CaseVal = I.getCaseValue();
BranchProbability Prob =
BPI ? BPI->getEdgeProbability(SI.getParent(), I.getSuccessorIndex())
: BranchProbability(1, SI.getNumCases() + 1);
Clusters.push_back(CaseCluster::range(CaseVal, CaseVal, Succ, Prob));
}

MachineBasicBlock *DefaultMBB = FuncInfo.MBBMap[SI.getDefaultDest()];

// Cluster adjacent cases with the same destination. We do this at all
// optimization levels because it's cheap to do and will make codegen faster
// if there are many clusters.
sortAndRangeify(Clusters);

if (TM.getOptLevel() != CodeGenOpt::None) {		if (!CaseClusters)
// Replace an unreachable default with the most popular destination.		CaseClusters = new SwitchLoweringCaseCluster(
// FIXME: Exploit unreachable default more aggressively.		DAG.getDataLayout(), DAG.getTargetLoweringInfo(), TM.getOptLevel(),
bool UnreachableDefault =		FuncInfo, this);
isa<UnreachableInst>(SI.getDefaultDest()->getFirstNonPHIOrDbg());		const BasicBlock *DefaultBB =
if (UnreachableDefault && !Clusters.empty()) {		CaseClusters->findCaseClusters(SI, Clusters, BPI);
DenseMap<const BasicBlock *, unsigned> Popularity;
unsigned MaxPop = 0;
const BasicBlock *MaxBB = nullptr;
for (auto I : SI.cases()) {
const BasicBlock *BB = I.getCaseSuccessor();
if (++Popularity[BB] > MaxPop) {
MaxPop = Popularity[BB];
MaxBB = BB;
}
}
// Set new default.
assert(MaxPop > 0 && MaxBB);
DefaultMBB = FuncInfo.MBBMap[MaxBB];

// Remove cases that were pointing to the destination that is now the
// default.
CaseClusterVector New;
New.reserve(Clusters.size());
for (CaseCluster &CC : Clusters) {
if (CC.MBB != DefaultMBB)
New.push_back(CC);
}
Clusters = std::move(New);
}
}

// If there is only the default destination, jump there directly.		// If there is only the default destination, jump there directly.
MachineBasicBlock *SwitchMBB = FuncInfo.MBB;		MachineBasicBlock *SwitchMBB = FuncInfo.MBB;
		MachineBasicBlock *DefaultMBB = FuncInfo.MBBMap[DefaultBB];
if (Clusters.empty()) {		if (Clusters.empty()) {
SwitchMBB->addSuccessor(DefaultMBB);		SwitchMBB->addSuccessor(DefaultMBB);
if (DefaultMBB != NextBlock(SwitchMBB)) {		if (DefaultMBB != NextBlock(SwitchMBB)) {
DAG.setRoot(DAG.getNode(ISD::BR, getCurSDLoc(), MVT::Other,		DAG.setRoot(DAG.getNode(ISD::BR, getCurSDLoc(), MVT::Other,
getControlRoot(), DAG.getBasicBlock(DefaultMBB)));		getControlRoot(), DAG.getBasicBlock(DefaultMBB)));
}		}
return;		return;
}		}

findJumpTables(Clusters, &SI, DefaultMBB);
findBitTestClusters(Clusters, &SI);

DEBUG({		DEBUG({
dbgs() << "Case clusters: ";		dbgs() << "Case clusters: ";
for (const CaseCluster &C : Clusters) {		for (const CaseCluster &C : Clusters) {
if (C.Kind == CC_JumpTable) dbgs() << "JT:";		if (C.Kind == CC_JumpTable)
if (C.Kind == CC_BitTests) dbgs() << "BT:";		dbgs() << "JT:";
		if (C.Kind == CC_BitTests)
		dbgs() << "BT:";

C.Low->getValue().print(dbgs(), true);		C.Low->getValue().print(dbgs(), true);
if (C.Low != C.High) {		if (C.Low != C.High) {
dbgs() << '-';		dbgs() << '-';
C.High->getValue().print(dbgs(), true);		C.High->getValue().print(dbgs(), true);
}		}
dbgs() << ' ';		dbgs() << ' ';
}		}
dbgs() << '\n';		dbgs() << '\n';
});		});

		for (CaseCluster &C : Clusters) {
		if (C.Kind == CC_Range) {
		C.MBB = FuncInfo.MBBMap[C.BB];
		assert(C.MBB && "No matching MachineBasicBlock was found");
		}
		}

assert(!Clusters.empty());		assert(!Clusters.empty());
SwitchWorkList WorkList;		SwitchWorkList WorkList;
CaseClusterIt First = Clusters.begin();		CaseClusterIt First = Clusters.begin();
CaseClusterIt Last = Clusters.end() - 1;		CaseClusterIt Last = Clusters.end() - 1;
auto DefaultProb = getEdgeProbability(SwitchMBB, DefaultMBB);		auto DefaultProb = getEdgeProbability(SwitchMBB, DefaultMBB);
WorkList.push_back({SwitchMBB, First, Last, nullptr, nullptr, DefaultProb});		WorkList.push_back({SwitchMBB, First, Last, nullptr, nullptr, DefaultProb});

while (!WorkList.empty()) {		while (!WorkList.empty()) {
[14 lines not shown]
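
Reviewer's note: the unreachable-default handling that this diff moves out of visitSwitch can be illustrated in isolation. The following is a minimal sketch using only the standard library, not the code from the patch. `SimpleCase` and `promoteMostPopularDest` are hypothetical names, destinations are plain strings rather than basic blocks, and popularity is counted over the same list that gets filtered (the patch counts over `SI.cases()` and filters the already merged clusters).

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a case cluster: one case value and the label of
// the block it jumps to.
struct SimpleCase {
  long Value;
  std::string Dest;
};

// Picks the destination targeted by the most cases, removes every case that
// jumps there, and returns that destination as the new default. On a tie, the
// destination that reaches the maximum count first wins.
std::string promoteMostPopularDest(std::vector<SimpleCase> &Cases) {
  assert(!Cases.empty() && "need at least one case");

  std::map<std::string, unsigned> Popularity;
  unsigned MaxPop = 0;
  std::string MaxDest;
  for (const SimpleCase &C : Cases) {
    unsigned Pop = ++Popularity[C.Dest];
    if (Pop > MaxPop) {
      MaxPop = Pop;
      MaxDest = C.Dest;
    }
  }

  // Drop the cases that now fall through to the new default.
  std::vector<SimpleCase> Kept;
  Kept.reserve(Cases.size());
  for (const SimpleCase &C : Cases)
    if (C.Dest != MaxDest)
      Kept.push_back(C);
  Cases = std::move(Kept);
  return MaxDest;
}
```

With cases {1→A, 2→B, 3→A, 7→C}, the sketch returns A as the new default and leaves only the cases for B and C to be lowered explicitly.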

lib/CodeGen/SelectionDAG/SwitchLoweringCaseCluster.cpp

This file was added.

				//===-- SwitchLoweringCaseCluster.cpp -----------------------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This implements routines for forming case clusters for SwitchInst.
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/CodeGen/SwitchLoweringCaseCluster.h"
				#include "SelectionDAGBuilder.h"
				#include "llvm/Analysis/BranchProbabilityInfo.h"

				#include <algorithm>
				using namespace llvm;

				/// Minimum jump table density for normal functions.
				static cl::opt<unsigned>
				JumpTableDensity("jump-table-density", cl::init(10), cl::Hidden,
				cl::desc("Minimum density for building a jump table in "
				"a normal function"));

				/// Minimum jump table density for -Os or -Oz functions.
				static cl::opt<unsigned> OptsizeJumpTableDensity(
				"optsize-jump-table-density", cl::init(40), cl::Hidden,
				cl::desc("Minimum density for building a jump table in "
				"an optsize function"));

				bool SwitchLoweringCaseClusterBuilder::isDense(
				const CaseClusterVector &Clusters,
				const SmallVectorImpl<unsigned> &TotalCases, unsigned First, unsigned Last,
				unsigned Density) const {
				assert(Last >= First);
				assert(TotalCases[Last] >= TotalCases[First]);

				const APInt &LowCase = Clusters[First].Low->getValue();
				const APInt &HighCase = Clusters[Last].High->getValue();
				assert(LowCase.getBitWidth() == HighCase.getBitWidth());

				// FIXME: A range of consecutive cases has 100% density, but only requires one
				// comparison to lower. We should discriminate against such consecutive ranges
				// in jump tables.

				uint64_t Diff = (HighCase - LowCase).getLimitedValue((UINT64_MAX - 1) / 100);
				uint64_t Range = Diff + 1;

				uint64_t NumCases =
				TotalCases[Last] - (First == 0 ? 0 : TotalCases[First - 1]);

				assert(NumCases < UINT64_MAX / 100);
				assert(Range >= NumCases);

				return NumCases * 100 >= Range * Density;
				}

				static inline bool areJTsAllowed(const TargetLowering &TLI,
				const SwitchInst *SI) {
				const Function *Fn = SI->getParent()->getParent();
				if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")
				return false;

				return TLI.isOperationLegalOrCustom(ISD::BR_JT, MVT::Other) ||
				TLI.isOperationLegalOrCustom(ISD::BRIND, MVT::Other);
				}

				bool SwitchLoweringCaseClusterBuilder::canBuildBitTest(
				CaseClusterVector &Clusters, unsigned First, unsigned Last,
				const SwitchInst *SI, CaseCluster &BTCluster) {
				assert(First <= Last);
				if (First == Last)
				return false;

				SmallPtrSet<const BasicBlock *, 4> Dests;
				unsigned NumCmps = 0;
				for (int64_t I = First; I <= Last; ++I) {
				assert(Clusters[I].Kind == CC_Range);
				Dests.insert(Clusters[I].BB);
				NumCmps += (Clusters[I].Low == Clusters[I].High) ? 1 : 2;
				}
				unsigned NumDests = Dests.size();

				APInt Low = Clusters[First].Low->getValue();
				APInt High = Clusters[Last].High->getValue();
				assert(Low.slt(High));

				return isSuitableForBitTests(NumDests, NumCmps, Low, High);
				}

				void SwitchLoweringCaseClusterBuilder::sortAndRangeify(
				CaseClusterVector &Clusters) {
				#ifndef NDEBUG
				for (const CaseCluster &CC : Clusters)
				assert(CC.Low == CC.High && "Input clusters must be single-case");
				#endif

				std::sort(Clusters.begin(), Clusters.end(),
				[](const CaseCluster &a, const CaseCluster &b) {
				return a.Low->getValue().slt(b.Low->getValue());
				});

				// Merge adjacent clusters with the same destination.
				const unsigned N = Clusters.size();
				unsigned DstIndex = 0;
				for (unsigned SrcIndex = 0; SrcIndex < N; ++SrcIndex) {
				CaseCluster &CC = Clusters[SrcIndex];
				const ConstantInt *CaseVal = CC.Low;
				// MachineBasicBlock *Succ = CC.MBB;
				const BasicBlock *Succ = CC.BB;

				// if (DstIndex != 0 && Clusters[DstIndex - 1].MBB == Succ &&
				if (DstIndex != 0 && Clusters[DstIndex - 1].BB == Succ &&
				(CaseVal->getValue() - Clusters[DstIndex - 1].High->getValue()) == 1) {
				// If this case has the same successor and is a neighbour, merge it into
				// the previous cluster.
				Clusters[DstIndex - 1].High = CaseVal;
				Clusters[DstIndex - 1].Prob += CC.Prob;
				} else {
				std::memmove(&Clusters[DstIndex++], &Clusters[SrcIndex],
				sizeof(Clusters[SrcIndex]));
				}
				}
				Clusters.resize(DstIndex);
				}

				const BasicBlock *SwitchLoweringCaseClusterBuilder::replaceUnrechableDefault(
				const SwitchInst &SI, CaseClusterVector &Clusters) {
				const BasicBlock *DefaultBB = SI.getDefaultDest();
				if (OptLevel != CodeGenOpt::None) {
				// FIXME: Exploit unreachable default more aggressively.
				bool UnreachableDefault =
				isa<UnreachableInst>(SI.getDefaultDest()->getFirstNonPHIOrDbg());
				if (UnreachableDefault && !Clusters.empty()) {
				DenseMap<const BasicBlock *, unsigned> Popularity;
				unsigned MaxPop = 0;
				const BasicBlock *MaxBB = nullptr;
				for (auto I : SI.cases()) {
				const BasicBlock *BB = I.getCaseSuccessor();
				if (++Popularity[BB] > MaxPop) {
				MaxPop = Popularity[BB];
				MaxBB = BB;
				}
				}
				// Set new default.
				assert(MaxPop > 0 && MaxBB);
				DefaultBB = MaxBB;

				// Remove cases that were pointing to the destination that is now the
				// default.
				CaseClusterVector New;
				New.reserve(Clusters.size());
				for (CaseCluster &CC : Clusters) {
				if (CC.BB != DefaultBB)
				New.push_back(CC);
				}
				Clusters = std::move(New);
				}
				}
				return DefaultBB;
				}

				void SwitchLoweringCaseClusterBuilder::formInitalCaseCluser(
				const SwitchInst &SI, CaseClusterVector &Clusters,
				BranchProbabilityInfo *BPI) {
				Clusters.reserve(SI.getNumCases());
				for (auto I : SI.cases()) {
				const BasicBlock *Succ = I.getCaseSuccessor();
				const ConstantInt *CaseVal = I.getCaseValue();
				BranchProbability Prob =
				BPI ? BPI->getEdgeProbability(SI.getParent(), I.getSuccessorIndex())
				: BranchProbability(1, SI.getNumCases() + 1);
				Clusters.push_back(CaseCluster::range(CaseVal, CaseVal, Succ, Prob));
				}
				// Cluster adjacent cases with the same destination. We do this at all
				// optimization levels because it's cheap to do and will make codegen faster
				// if there are many clusters.
				sortAndRangeify(Clusters);
				}

				bool SwitchLoweringCaseClusterBuilder::rangeFitsInWord(const APInt &Low,
				const APInt &High) {
				// FIXME: Using the pointer type doesn't seem ideal.
				uint64_t BW = DL.getPointerSizeInBits();
				uint64_t Range = (High - Low).getLimitedValue(UINT64_MAX - 1) + 1;
				return Range <= BW;
				}

				bool SwitchLoweringCaseClusterBuilder::isSuitableForBitTests(
				unsigned NumDests, unsigned NumCmps, const APInt &Low, const APInt &High) {
				// FIXME: I don't think NumCmps is the correct metric: a single case and a
				// range of cases both require only one branch to lower. Just looking at the
				// number of clusters and destinations should be enough to decide whether to
				// build bit tests.

				// To lower a range with bit tests, the range must fit the bitwidth of a
				// machine word.
				if (!rangeFitsInWord(Low, High))
				return false;

				// Decide whether it's profitable to lower this range with bit tests. Each
				// destination requires a bit test and branch, and there is an overall range
				// check branch. For a small number of clusters, separate comparisons might be
				// cheaper, and for many destinations, splitting the range might be better.
				return (NumDests == 1 && NumCmps >= 3) || (NumDests == 2 && NumCmps >= 5) ||
				(NumDests == 3 && NumCmps >= 6);
				}

				bool SwitchLoweringCaseClusterBuilder::canBuildJumpTable(
				const CaseClusterVector &Clusters, unsigned First, unsigned Last,
				const SwitchInst *SI, CaseCluster &JTCluster) {
				assert(First <= Last);
				unsigned NumCmps = 0;
				SmallPtrSet<const BasicBlock *, 4> JTProbs;
				for (unsigned I = First; I <= Last; ++I) {
				assert(Clusters[I].Kind == CC_Range);
				const APInt &Low = Clusters[I].Low->getValue();
				const APInt &High = Clusters[I].High->getValue();
				NumCmps += (Low == High) ? 1 : 2;
				JTProbs.insert(Clusters[I].BB);
				}
				unsigned NumDests = JTProbs.size();
				return !(isSuitableForBitTests(NumDests, NumCmps,
				Clusters[First].Low->getValue(),
				Clusters[Last].High->getValue()));
				}

				void SwitchLoweringCaseClusterBuilder::findJumpTables(
				CaseClusterVector &Clusters, const SwitchInst *SI,
				const BasicBlock *DefaultBB) {
				#ifndef NDEBUG
				// Clusters must be non-empty, sorted, and only contain Range clusters.
				assert(!Clusters.empty());
				for (CaseCluster &C : Clusters)
				assert(C.Kind == CC_Range);
				for (unsigned i = 1, e = Clusters.size(); i < e; ++i)
				assert(Clusters[i - 1].High->getValue().slt(Clusters[i].Low->getValue()));
				#endif

				if (!areJTsAllowed(TLI, SI))
				return;

				const bool OptForSize = DefaultBB->getParent()->optForSize();

				const int64_t N = Clusters.size();
				const unsigned MinJumpTableEntries = TLI.getMinimumJumpTableEntries();
				const unsigned SmallNumberOfEntries = MinJumpTableEntries / 2;
				const unsigned MaxJumpTableSize =
				OptForSize || TLI.getMaximumJumpTableSize() == 0
				? UINT_MAX
				: TLI.getMaximumJumpTableSize();

				if (N < 2 || N < MinJumpTableEntries)
				return;

				// TotalCases[i]: Total nbr of cases in Clusters[0..i].
				SmallVector<unsigned, 8> TotalCases(N);
				for (unsigned i = 0; i < N; ++i) {
				const APInt &Hi = Clusters[i].High->getValue();
				const APInt &Lo = Clusters[i].Low->getValue();
				TotalCases[i] = (Hi - Lo).getLimitedValue() + 1;
				if (i != 0)
				TotalCases[i] += TotalCases[i - 1];
				}

				const unsigned MinDensity =
				OptForSize ? OptsizeJumpTableDensity : JumpTableDensity;

				// Cheap case: the whole range may be suitable for jump table.
				unsigned JumpTableSize =
				(Clusters[N - 1].High->getValue() - Clusters[0].Low->getValue())
				.getLimitedValue(UINT_MAX - 1) + 1;
				if (JumpTableSize <= MaxJumpTableSize &&
				isDense(Clusters, TotalCases, 0, N - 1, MinDensity)) {
				CaseCluster JTCluster;
				if (canBuildJumpTable(Clusters, 0, N - 1, SI, JTCluster)) {
				buildJumpTable(Clusters, 0, N - 1, SI, DefaultBB, JTCluster);
				Clusters[0] = JTCluster;
				Clusters.resize(1);
				return;
				}
				}

				// The algorithm below is not suitable for -O0.
				if (OptLevel == CodeGenOpt::None)
				return;

				// Split Clusters into minimum number of dense partitions. The algorithm uses
				// the same idea as Kannan & Proebsting "Correction to 'Producing Good Code
				// for the Case Statement'" (1994), but builds the MinPartitions array in
				// reverse order to make it easier to reconstruct the partitions in ascending
				// order. In the choice between two optimal partitionings, it picks the one
				// which yields more jump tables.

				// MinPartitions[i] is the minimum nbr of partitions of Clusters[i..N-1].
				SmallVector<unsigned, 8> MinPartitions(N);
				// LastElement[i] is the last element of the partition starting at i.
				SmallVector<unsigned, 8> LastElement(N);
				// PartitionsScore[i] is used to break ties when choosing between two
				// partitionings resulting in the same number of partitions.
				SmallVector<unsigned, 8> PartitionsScore(N);
				// For PartitionsScore, a small number of comparisons is considered as good as
				// a jump table and a single comparison is considered better than a jump
				// table.
				enum PartitionScores : unsigned {
				NoTable = 0,
				Table = 1,
				FewCases = 1,
				SingleCase = 2
				};

				// Base case: There is only one way to partition Clusters[N-1].
				MinPartitions[N - 1] = 1;
				LastElement[N - 1] = N - 1;
				PartitionsScore[N - 1] = PartitionScores::SingleCase;

				// Note: loop indexes are signed to avoid underflow.
				for (int64_t i = N - 2; i >= 0; i--) {
				// Find optimal partitioning of Clusters[i..N-1].
				// Baseline: Put Clusters[i] into a partition on its own.
				MinPartitions[i] = MinPartitions[i + 1] + 1;
				LastElement[i] = i;
				PartitionsScore[i] = PartitionsScore[i + 1] + PartitionScores::SingleCase;

				// Search for a solution that results in fewer partitions.
				for (int64_t j = N - 1; j > i; j--) {
				// Try building a partition from Clusters[i..j].
				JumpTableSize =
				(Clusters[j].High->getValue() - Clusters[i].Low->getValue())
				.getLimitedValue(UINT_MAX - 1) +
				1;
				if (JumpTableSize <= MaxJumpTableSize &&
				isDense(Clusters, TotalCases, i, j, MinDensity)) {
				unsigned NumPartitions = 1 + (j == N - 1 ? 0 : MinPartitions[j + 1]);
				unsigned Score = j == N - 1 ? 0 : PartitionsScore[j + 1];
				int64_t NumEntries = j - i + 1;

				if (NumEntries == 1)
				Score += PartitionScores::SingleCase;
				else if (NumEntries <= SmallNumberOfEntries)
				Score += PartitionScores::FewCases;
				else if (NumEntries >= MinJumpTableEntries)
				Score += PartitionScores::Table;

				// If this leads to fewer partitions, or to the same number of
				// partitions with better score, it is a better partitioning.
				if (NumPartitions < MinPartitions[i] ||
				(NumPartitions == MinPartitions[i] && Score > PartitionsScore[i])) {
				MinPartitions[i] = NumPartitions;
				LastElement[i] = j;
				PartitionsScore[i] = Score;
				}
				}
				}
				}

				// Iterate over the partitions, replacing some with jump tables in-place.
				unsigned DstIndex = 0;
				for (unsigned First = 0, Last; First < N; First = Last + 1) {
				Last = LastElement[First];
				assert(Last >= First);
				assert(DstIndex <= First);
				unsigned NumClusters = Last - First + 1;

				CaseCluster JTCluster;
				if (NumClusters >= MinJumpTableEntries &&
				canBuildJumpTable(Clusters, First, Last, SI, JTCluster)) {
				buildJumpTable(Clusters, First, Last, SI, DefaultBB, JTCluster);
				Clusters[DstIndex++] = JTCluster;
				} else {
				for (unsigned I = First; I <= Last; ++I)
				std::memmove(&Clusters[DstIndex++], &Clusters[I], sizeof(Clusters[I]));
				}
				}
				Clusters.resize(DstIndex);
				}

				void SwitchLoweringCaseClusterBuilder::findBitTestClusters(
				CaseClusterVector &Clusters, const SwitchInst *SI) {
				// Partition Clusters into as few subsets as possible, where each subset has a
				// range that fits in a machine word and has <= 3 unique destinations.

				#ifndef NDEBUG
				// Clusters must be sorted and contain Range or JumpTable clusters.
				assert(!Clusters.empty());
				assert(Clusters[0].Kind == CC_Range || Clusters[0].Kind == CC_JumpTable);
				for (const CaseCluster &C : Clusters)
				assert(C.Kind == CC_Range || C.Kind == CC_JumpTable);
				for (unsigned i = 1; i < Clusters.size(); ++i)
				assert(Clusters[i - 1].High->getValue().slt(Clusters[i].Low->getValue()));
				#endif

				// The algorithm below is not suitable for -O0.
				if (OptLevel == CodeGenOpt::None)
				return;

				// If target does not have legal shift left, do not emit bit tests at all.
				EVT PTy = TLI.getPointerTy(DL);
				if (!TLI.isOperationLegal(ISD::SHL, PTy))
				return;

				int BitWidth = PTy.getSizeInBits();
				const int64_t N = Clusters.size();

				// MinPartitions[i] is the minimum nbr of partitions of Clusters[i..N-1].
				SmallVector<unsigned, 8> MinPartitions(N);
				// LastElement[i] is the last element of the partition starting at i.
				SmallVector<unsigned, 8> LastElement(N);

				// FIXME: This might not be the best algorithm for finding bit test clusters.

				// Base case: There is only one way to partition Clusters[N-1].
				MinPartitions[N - 1] = 1;
				LastElement[N - 1] = N - 1;

				// Note: loop indexes are signed to avoid underflow.
				for (int64_t i = N - 2; i >= 0; --i) {
				// Find optimal partitioning of Clusters[i..N-1].
				// Baseline: Put Clusters[i] into a partition on its own.
				MinPartitions[i] = MinPartitions[i + 1] + 1;
				LastElement[i] = i;

				// Search for a solution that results in fewer partitions.
				// Note: the search is limited by BitWidth, reducing time complexity.
				for (int64_t j = std::min(N - 1, i + BitWidth - 1); j > i; --j) {
				// Try building a partition from Clusters[i..j].

				// Check the range.
				if (!rangeFitsInWord(Clusters[i].Low->getValue(),
				Clusters[j].High->getValue()))
				continue;

				// Check nbr of destinations and cluster types.
				// FIXME: This works, but doesn't seem very efficient.
				bool RangesOnly = true;
				SmallPtrSet<const BasicBlock *, 8> Dests;
				for (int64_t k = i; k <= j; k++) {
				if (Clusters[k].Kind != CC_Range) {
				RangesOnly = false;
				break;
				}
				Dests.insert(Clusters[k].BB);
				}
				if (!RangesOnly || Dests.size() > 3)
				break;

				// Check if it's a better partition.
				unsigned NumPartitions = 1 + (j == N - 1 ? 0 : MinPartitions[j + 1]);
				if (NumPartitions < MinPartitions[i]) {
				// Found a better partition.
				MinPartitions[i] = NumPartitions;
				LastElement[i] = j;
				}
				}
				}

				// Iterate over the partitions, replacing with bit-test clusters in-place.
				unsigned DstIndex = 0;
				for (unsigned First = 0, Last; First < N; First = Last + 1) {
				Last = LastElement[First];
				assert(First <= Last);
				assert(DstIndex <= First);

				CaseCluster BitTestCluster;
				if (canBuildBitTest(Clusters, First, Last, SI, BitTestCluster)) {
				buildBitTests(Clusters, First, Last, SI, BitTestCluster);
				Clusters[DstIndex++] = BitTestCluster;
				} else {
				size_t NumClusters = Last - First + 1;
				std::memmove(&Clusters[DstIndex], &Clusters[First],
				sizeof(Clusters[0]) * NumClusters);
				DstIndex += NumClusters;
				}
				}
				Clusters.resize(DstIndex);
				}

				void SwitchLoweringCaseClusterBuilder::buildJumpTable(
				const CaseClusterVector &Clusters, unsigned First, unsigned Last,
				const SwitchInst *SI, const BasicBlock *DefaultBB, CaseCluster &JTCluster) {
				assert(First <= Last);
				JTCluster = CaseCluster::jumpTable(Clusters[First].Low, Clusters[Last].High,
				0, BranchProbability::getZero());
				}

				void SwitchLoweringCaseClusterBuilder::buildBitTests(
				CaseClusterVector &Clusters, unsigned First, unsigned Last,
				const SwitchInst *SI, CaseCluster &BTCluster) {
				BTCluster = CaseCluster::bitTests(Clusters[First].Low, Clusters[Last].High, 0,
				BranchProbability::getZero());
				}

				void SwitchLoweringCaseClusterBuilderForDAG::buildJumpTable(
				const CaseClusterVector &Clusters, unsigned First, unsigned Last,
				const SwitchInst *SI, const BasicBlock *DefaultBB, CaseCluster &JTCluster) {
				assert(First <= Last);

				MachineBasicBlock *DefaultMBB = FuncInfo.MBBMap[DefaultBB];

				auto Prob = BranchProbability::getZero();
				std::vector<MachineBasicBlock *> Table;
				DenseMap<MachineBasicBlock *, BranchProbability> JTProbs;

				// Initialize probabilities in JTProbs.
				for (unsigned I = First; I <= Last; ++I)
				JTProbs[FuncInfo.MBBMap[Clusters[I].BB]] = BranchProbability::getZero();

				for (unsigned I = First; I <= Last; ++I) {
				assert(Clusters[I].Kind == CC_Range);
				Prob += Clusters[I].Prob;
				const APInt &Low = Clusters[I].Low->getValue();
				const APInt &High = Clusters[I].High->getValue();
				if (I != First) {
				// Fill the gap between this and the previous cluster.
				const APInt &PreviousHigh = Clusters[I - 1].High->getValue();
				assert(PreviousHigh.slt(Low));
				uint64_t Gap = (Low - PreviousHigh).getLimitedValue() - 1;
				for (uint64_t J = 0; J < Gap; J++)
				Table.push_back(DefaultMBB);
				}
				uint64_t ClusterSize = (High - Low).getLimitedValue() + 1;
				for (uint64_t J = 0; J < ClusterSize; ++J)
				Table.push_back(FuncInfo.MBBMap[Clusters[I].BB]);
				JTProbs[FuncInfo.MBBMap[Clusters[I].BB]] += Clusters[I].Prob;
				}

				// Create the MBB that will load from and jump through the table.
				// Note: We create it here, but it's not inserted into the function yet.
				MachineFunction *CurMF = FuncInfo.MF;
				MachineBasicBlock *JumpTableMBB =
				CurMF->CreateMachineBasicBlock(SI->getParent());

				// Add successors. Note: use table order for determinism.
				SmallPtrSet<MachineBasicBlock *, 8> Done;
				for (MachineBasicBlock *Succ : Table) {
				if (Done.count(Succ))
				continue;
				SDB->addSuccessorWithProb(JumpTableMBB, Succ, JTProbs[Succ]);
				Done.insert(Succ);
				}
				JumpTableMBB->normalizeSuccProbs();

				unsigned JTI = CurMF->getOrCreateJumpTableInfo(TLI.getJumpTableEncoding())
				->createJumpTableIndex(Table);

				// Set up the jump table info.
				JumpTableCase JT(-1U, JTI, JumpTableMBB, nullptr);
				JumpTableHeader JTH(Clusters[First].Low->getValue(),
				Clusters[Last].High->getValue(), SI->getCondition(),
				nullptr, false);
				SDB->JTCases.emplace_back(std::move(JTH), std::move(JT));

				JTCluster = CaseCluster::jumpTable(Clusters[First].Low, Clusters[Last].High,
				SDB->JTCases.size() - 1, Prob);
				}

				void SwitchLoweringCaseClusterBuilderForDAG::buildBitTests(
				CaseClusterVector &Clusters, unsigned First, unsigned Last,
				const SwitchInst *SI, CaseCluster &BTCluster) {
				APInt Low = Clusters[First].Low->getValue();
				APInt High = Clusters[Last].High->getValue();
				assert(Low.slt(High));

				APInt LowBound;
				APInt CmpRange;

				const int BitWidth = TLI.getPointerTy(DL).getSizeInBits();

				assert(rangeFitsInWord(Low, High) && "Case range must fit in bit mask!");

				// Check if the clusters cover a contiguous range such that no value in the
				// range will jump to the default statement.
				bool ContiguousRange = true;
				for (int64_t I = First + 1; I <= Last; ++I) {
				if (Clusters[I].Low->getValue() != Clusters[I - 1].High->getValue() + 1) {
				ContiguousRange = false;
				break;
				}
				}

				if (Low.isStrictlyPositive() && High.slt(BitWidth)) {
				// Optimize the case where all the case values fit in a word without having
				// to subtract minValue. In this case, we can optimize away the subtraction.
				LowBound = APInt::getNullValue(Low.getBitWidth());
				CmpRange = High;
				ContiguousRange = false;
				} else {
				LowBound = Low;
				CmpRange = High - Low;
				}

				CaseBitsVector CBV;
				auto TotalProb = BranchProbability::getZero();
				for (unsigned i = First; i <= Last; ++i) {
				// Find the CaseBits for this destination.
				unsigned j;
				for (j = 0; j < CBV.size(); ++j)
				if (CBV[j].BB == FuncInfo.MBBMap[Clusters[i].BB])
				break;
				if (j == CBV.size())
				CBV.push_back(CaseBits(0, FuncInfo.MBBMap[Clusters[i].BB], 0,
				BranchProbability::getZero()));
				CaseBits *CB = &CBV[j];

				// Update Mask, Bits and ExtraProb.
				uint64_t Lo = (Clusters[i].Low->getValue() - LowBound).getZExtValue();
				uint64_t Hi = (Clusters[i].High->getValue() - LowBound).getZExtValue();
				assert(Hi >= Lo && Hi < 64 && "Invalid bit case!");
				CB->Mask |= (-1ULL >> (63 - (Hi - Lo))) << Lo;
				CB->Bits += Hi - Lo + 1;
				CB->ExtraProb += Clusters[i].Prob;
				TotalProb += Clusters[i].Prob;
				}

				BitTestInfo BTI;
				std::sort(CBV.begin(), CBV.end(), [](const CaseBits &a, const CaseBits &b) {
				// Sort by probability first, number of bits second.
				if (a.ExtraProb != b.ExtraProb)
				return a.ExtraProb > b.ExtraProb;
				return a.Bits > b.Bits;
				});

				for (auto &CB : CBV) {
				MachineBasicBlock *BitTestBB =
				FuncInfo.MF->CreateMachineBasicBlock(SI->getParent());
				BTI.push_back(BitTestCase(CB.Mask, BitTestBB, CB.BB, CB.ExtraProb));
				}
				SDB->BitTestCases.emplace_back(std::move(LowBound), std::move(CmpRange),
				SI->getCondition(), -1U, MVT::Other, false,
				ContiguousRange, nullptr, nullptr,
				std::move(BTI), TotalProb);

				BTCluster = CaseCluster::bitTests(Clusters[First].Low, Clusters[Last].High,
				SDB->BitTestCases.size() - 1, TotalProb);
				}

				const BasicBlock *
				SwitchLoweringCaseCluster::findCaseClusters(const SwitchInst &SI,
				CaseClusterVector &Clusters,
				BranchProbabilityInfo *BPI) {
				ClusterBuilder->formInitalCaseCluser(SI, Clusters, BPI);
				const BasicBlock *DefaultBB =
				ClusterBuilder->replaceUnrechableDefault(SI, Clusters);
				if (!Clusters.empty()) {
				ClusterBuilder->findJumpTables(Clusters, &SI, DefaultBB);
				ClusterBuilder->findBitTestClusters(Clusters, &SI);
				}
				return DefaultBB;
				}
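
Reviewer's note: the density test that `isDense` applies before forming a jump table may be easier to follow outside the LLVM types. The sketch below is an illustration under simplifying assumptions, not the patch's implementation. `Range`, `prefixCaseCounts` and `isDenseSketch` are hypothetical names, case values are plain `int64_t` instead of `APInt`, and the saturation that the patch gets from `getLimitedValue` is omitted, so inputs are assumed small enough not to overflow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a CC_Range cluster: an inclusive range of case
// values that share a destination.
struct Range {
  int64_t Low;
  int64_t High;
};

// Total[i] is the number of case values covered by Ranges[0..i].
std::vector<uint64_t> prefixCaseCounts(const std::vector<Range> &Ranges) {
  std::vector<uint64_t> Total(Ranges.size());
  uint64_t Sum = 0;
  for (std::size_t I = 0; I < Ranges.size(); ++I) {
    Sum += static_cast<uint64_t>(Ranges[I].High - Ranges[I].Low) + 1;
    Total[I] = Sum;
  }
  return Total;
}

// Returns true if the cases in Ranges[First..Last] fill at least
// MinDensityPercent percent of the value span they cover.
bool isDenseSketch(const std::vector<Range> &Ranges,
                   const std::vector<uint64_t> &Total, std::size_t First,
                   std::size_t Last, unsigned MinDensityPercent) {
  assert(First <= Last && Last < Ranges.size());
  uint64_t Span =
      static_cast<uint64_t>(Ranges[Last].High - Ranges[First].Low) + 1;
  uint64_t NumCases = Total[Last] - (First == 0 ? 0 : Total[First - 1]);
  // Compare NumCases / Span against MinDensityPercent / 100 without dividing.
  return NumCases * 100 >= Span * MinDensityPercent;
}
```

For the ranges [0,3], [10,10], [100,100], the first two cover 5 values over a span of 11 and pass a 40 percent threshold, while all three cover 6 values over a span of 101 and fail a 10 percent threshold. The prefix sums are what let the partitioning loop query any sub-range in constant time.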