This is an archive of the discontinued LLVM Phabricator instance.

Conversion of a switch table to a bitmap is not profitable for -Os and -Oz compilation
Needs Review · Public

Authored by ramred01 on Apr 11 2019, 2:15 PM.


Details

Reviewers

hans
lattner

Summary

When compiling for code size, converting a switch table into a bitmap is not profitable. The bitmap requires materializing a 32-bit or 64-bit constant in a register, which usually takes 2-3 instructions, followed by either a left shift and a right shift _or_ a single bit-extract instruction to get the desired result. Indexing into a jump table, by contrast, costs at most 2 instructions plus the size of the table itself.

The bitmap is profitable when compiling for speed since it removes an indexed load, which is costly. But when compiling for size, it simply bloats the code without adding any real value for embedded targets.

The issue is pronounced when the switch result is a sub-word type (e.g., short or char). With word types, unless a double word type exists on the machine, this issue is not seen.

The following code demonstrates this behaviour:

short test(unsigned a) {
  short t;
  switch (a) {
  case 0:
    t = 500;
    break;
  case 1:
    t = 200;
    break;
  case 2:
    t = 17000;
    break;
  default:
    t = 0;
    break;
  }
  return (t);
}

On AArch64, the compiler uses 3 instructions to materialize 6 bytes of data and then performs a logical shift left followed by a logical shift right to get the result of the switch. Simply storing the data in a read-only table and indexing into it would instead generate two instructions to load the table address, followed by one instruction to index into the table and load the result.
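To make the two strategies concrete, here is a minimal C++ sketch of what each lookup computes for the example above. It is only an illustration at the source level, not the IR or machine code that LLVM emits; the names `kResults`, `kBitmap`, `lookupViaBitmap`, `lookupViaTable`, and `checkLookupsAgree` are hypothetical. The sketch uses a right shift plus truncation, whereas the AArch64 sequence described above uses a shift-left/shift-right pair.

```cpp
#include <cassert>
#include <cstdint>

// Results of the switch in the example, indexed by case value 0..2.
constexpr uint16_t kResults[3] = {500, 200, 17000};

// Bitmap form: the three 16-bit results packed into one 48-bit (6-byte)
// constant, with case 0 in the lowest bits.
constexpr uint64_t kBitmap = (uint64_t{kResults[2]} << 32) |
                             (uint64_t{kResults[1]} << 16) |
                             uint64_t{kResults[0]};

// Bitmap lookup: shift the packed constant right by index * 16 bits and
// truncate to the 16-bit element type.
inline int16_t lookupViaBitmap(unsigned a) {
  if (a >= 3)
    return 0; // default case
  return static_cast<int16_t>(static_cast<uint16_t>(kBitmap >> (a * 16)));
}

// Table lookup: index into a read-only array.
inline int16_t lookupViaTable(unsigned a) {
  if (a >= 3)
    return 0; // default case
  return static_cast<int16_t>(kResults[a]);
}

// Both forms must agree for every input, including out-of-range ones.
inline void checkLookupsAgree() {
  for (unsigned a = 0; a < 8; ++a)
    assert(lookupViaBitmap(a) == lookupViaTable(a));
}
```

Both functions return the same values; the difference the summary is concerned with is only how many instructions and how much data each form needs once compiled for size.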

We fix the issue by adding a new codegen option, -fno-switch-bitmap (and its converse, -fswitch-bitmap), and making generation of the bitmap conditional on it. We then modify the Clang front end to pass this option when either the user passes it explicitly or the user is compiling for AArch64 with -Os or -Oz. For now we restrict this default to AArch64, since we do not know which other architectures would benefit. Other architectures can make it the default for -Os and -Oz once the benefit is verified.
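The defaulting rule described here can be sketched as a small decision function. This is not the Clang patch (which lives in a separate revision); `SizeLevel`, `UserChoice`, and `shouldDisableSwitchBitmap` are hypothetical names, and the assumption that an explicit -fswitch-bitmap overrides the AArch64 size default is mine, not something the summary states.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the driver's view of the command line.
enum class SizeLevel { None, Os, Oz };
enum class UserChoice { Unspecified, ForceBitmap, ForceNoBitmap };

// Returns true when the bitmap conversion should be turned off.
inline bool shouldDisableSwitchBitmap(const std::string &arch, SizeLevel size,
                                      UserChoice choice) {
  // An explicit flag wins (assumption: -fswitch-bitmap overrides the default).
  if (choice == UserChoice::ForceNoBitmap)
    return true;
  if (choice == UserChoice::ForceBitmap)
    return false;
  // Otherwise disable the bitmap only for AArch64 at -Os or -Oz.
  return arch == "aarch64" && size != SizeLevel::None;
}
```

For instance, `shouldDisableSwitchBitmap("aarch64", SizeLevel::Os, UserChoice::Unspecified)` yields true, while the same call with `"x86_64"` yields false.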

The clang patch is given in a separate revision.

Diff Detail

Event Timeline

ramred01 created this revision. Apr 11 2019, 2:15 PM

Herald added subscribers: kristof.beyls, javed.absar, mehdi_amini. Apr 11 2019, 2:15 PM

ramred01 mentioned this in D60586: [Clang] Conversion of a switch table to a bitmap is not profitable for -Os and -Oz compilation. Apr 11 2019, 2:21 PM

How many such switches are seen in e.g. LLVM test suite? Could you please share the size reduction statistics?

vsk added a subscriber: vsk. Apr 15 2019, 3:24 PM

First I was confused because your message mentions jump tables, but this is only about the switch-to-lookup table transformation in SimplifyCFG. So there are no jump tables involved, this is just about whether packing the lookup table into a "bitmap" scalar is a good idea or not.

Adding a -fno-switch-bitmap option doesn't seem like a great idea to me. Would it be possible to tweak the heuristic for when to emit the "bitmap lookup table" in some way instead? Currently we always do it if the table fits in a register. Maybe it should be more conservative, at least when optimizing for size, maybe depending on target?
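As an illustration of the kind of heuristic tweak suggested in this comment, here is a self-contained sketch. It does not use LLVM's API: `fitsInRegister` only mirrors the "table fits in a register" rule described above, and `shouldUseBitmap` with its `cheapConstantBits` threshold is a purely hypothetical size-aware variant, not something proposed in the review.

```cpp
#include <cassert>
#include <cstdint>

// Current rule as described in the comment: the table is packed into a
// bitmap whenever all of its bits fit in the widest legal integer register.
inline bool fitsInRegister(uint64_t tableSize, unsigned elemBits,
                           unsigned maxLegalIntBits) {
  assert(elemBits > 0 && "element type must have a non-zero width");
  if (tableSize > UINT64_MAX / elemBits)
    return false; // the bit count would overflow, so it cannot fit
  return tableSize * elemBits <= maxLegalIntBits;
}

// Hypothetical size-aware variant: when optimizing for size, only use the
// bitmap if the packed constant is narrow enough to be cheap to materialize
// (cheapConstantBits is an assumed, target-specific threshold).
inline bool shouldUseBitmap(uint64_t tableSize, unsigned elemBits,
                            unsigned maxLegalIntBits, bool optForSize,
                            unsigned cheapConstantBits) {
  if (!fitsInRegister(tableSize, elemBits, maxLegalIntBits))
    return false;
  if (!optForSize)
    return true;
  return tableSize * elemBits <= cheapConstantBits;
}
```

With a 64-bit register, the three 16-bit results from the summary's example (48 bits) fit and would become a bitmap under the current rule, while three 32-bit results (96 bits) would not, which matches the observation that the issue shows up mainly for sub-word result types.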

Revision Contents

Path                                                Size
include/llvm/Target/TargetOptions.h                 8 lines
include/llvm/Transforms/IPO/PassManagerBuilder.h    3 lines
include/llvm/Transforms/Scalar.h                    7 lines
include/llvm/Transforms/Utils/Local.h               1 line
lib/Target/AArch64/AArch64TargetMachine.cpp         3 lines
lib/Target/ARM/ARMTargetMachine.cpp                 2 lines
lib/Transforms/IPO/PassManagerBuilder.cpp           3 lines
lib/Transforms/Scalar/SimplifyCFGPass.cpp           11 lines
lib/Transforms/Utils/SimplifyCFG.cpp                52 lines
test/Transforms/Util/no_switch_bitmap.ll            38 lines

Diff 194747

include/llvm/Target/TargetOptions.h

[112 lines hidden] TargetOptions()
         GuaranteedTailCallOpt(false), StackSymbolOrdering(true),
         EnableFastISel(false), EnableGlobalISel(false), UseInitArray(false),
         DisableIntegratedAS(false), RelaxELFRelocations(false),
         FunctionSections(false), DataSections(false),
         UniqueSectionNames(true), TrapUnreachable(false),
         NoTrapAfterNoreturn(false), EmulatedTLS(false),
         ExplicitEmulatedTLS(false), EnableIPRA(false),
         EmitStackSizeSection(false), EnableMachineOutliner(false),
-        SupportsDefaultOutlining(false), EmitAddrsig(false) {}
+        SupportsDefaultOutlining(false), EmitAddrsig(false),
+        EmitSwitchBitmap(false) {}

   /// PrintMachineCode - This flag is enabled when the -print-machineinstrs
   /// option is specified on the command line, and should enable debugging
   /// output from the code generator.
   unsigned PrintMachineCode : 1;

   /// DisableFramePointerElim - This returns true if frame pointer elimination
   /// optimization should be disabled for the given machine function.
   bool DisableFramePointerElim(const MachineFunction &MF) const;

   /// UnsafeFPMath - This flag is enabled when the
   /// -enable-unsafe-fp-math flag is specified on the command line. When
   /// this flag is off (the default), the code generator is not allowed to
   /// produce results that are "less precise" than IEEE allows. This includes
[111 lines hidden] public:
   unsigned EnableMachineOutliner : 1;

   /// Set if the target supports default outlining behaviour.
   unsigned SupportsDefaultOutlining : 1;

   /// Emit address-significance table.
   unsigned EmitAddrsig : 1;

+  // Try to emit a Bitmap instead of a Switch Table.
+  unsigned EmitSwitchBitmap : 1;
+
   /// FloatABIType - This setting is set by -float-abi=xxx option is specfied
   /// on the command line. This setting may either be Default, Soft, or Hard.
   /// Default selects the target's default behavior. Soft selects the ABI for
   /// software floating point, but does not indicate that FP hardware may not
   /// be used. Such a combination is unfortunately popular (e.g.
   /// arm-apple-darwin). Hard presumes that the normal FP ABI is used.
   FloatABI::ABIType FloatABIType = FloatABI::Default;

[42 lines hidden]

include/llvm/Transforms/IPO/PassManagerBuilder.h

[160 lines hidden] public:
   bool EnablePGOInstrGen;
   /// Profile data file name that the instrumentation will be written to.
   std::string PGOInstrGen;
   /// Path of the profile data file.
   std::string PGOInstrUse;
   /// Path of the sample Profile data file.
   std::string PGOSampleUse;

+  /// Don't convert a Switch Table into a Bitmap
+  bool NoSwitchBitmap;
+
 private:
   /// ExtensionList - This is list of all of the extensions that are registered.
   std::vector<std::pair<ExtensionPointTy, ExtensionFn>> Extensions;

 public:
   PassManagerBuilder();
   ~PassManagerBuilder();
   /// Adds an extension that will be used by all PassManagerBuilder instances.
[40 lines hidden]

include/llvm/Transforms/Scalar.h

[244 lines hidden]
 //
 FunctionPass *createJumpThreadingPass(int Threshold = -1);

 //===----------------------------------------------------------------------===//
 //
 // CFGSimplification - Merge basic blocks, eliminate unreachable blocks,
 // simplify terminator instructions, convert switches to lookup tables, etc.
 //
-FunctionPass *createCFGSimplificationPass(
-    unsigned Threshold = 1, bool ForwardSwitchCond = false,
-    bool ConvertSwitch = false, bool KeepLoops = true, bool SinkCommon = false,
+FunctionPass *createCFGSimplificationPass(unsigned Threshold = 1,
+    bool ForwardSwitchCond = false, bool ConvertSwitch = false,
+    bool KeepLoops = true, bool SinkCommon = false,
+    bool noSwitchBitmap = false,
     std::function<bool(const Function &)> Ftor = nullptr);

 //===----------------------------------------------------------------------===//
 //
 // FlattenCFG - flatten CFG, reduce number of conditional branches by using
 // parallel-and and parallel-or mode, etc...
 //
 FunctionPass *createFlattenCFGPass();
[240 lines hidden]

include/llvm/Transforms/Utils/Local.h

[61 lines hidden]
 /// For example, canonical form that includes switches and branches may later be
 /// replaced by lookup tables and selects.
 struct SimplifyCFGOptions {
   int BonusInstThreshold;
   bool ForwardSwitchCondToPhi;
   bool ConvertSwitchToLookupTable;
   bool NeedCanonicalLoop;
   bool SinkCommonInsts;
+  bool NoSwitchBitmap;
   AssumptionCache *AC;

   SimplifyCFGOptions(unsigned BonusThreshold = 1,
                      bool ForwardSwitchCond = false,
                      bool SwitchToLookup = false, bool CanonicalLoops = true,
                      bool SinkCommon = false,
                      AssumptionCache *AssumpCache = nullptr)
       : BonusInstThreshold(BonusThreshold),
[426 lines hidden]

lib/Target/AArch64/AArch64TargetMachine.cpp

[399 lines hidden] void AArch64PassConfig::addIRPasses() {
   // Always expand atomic operations, we don't deal with atomicrmw or cmpxchg
   // ourselves.
   addPass(createAtomicExpandPass());

   // Cmpxchg instructions are often used with a subsequent comparison to
   // determine whether it succeeded. We can exploit existing control-flow in
   // ldrex/strex loops to simplify this, but it needs tidying up.
   if (TM->getOptLevel() != CodeGenOpt::None && EnableAtomicTidy)
-    addPass(createCFGSimplificationPass(1, true, true, false, true));
+    addPass(createCFGSimplificationPass(1, true, true, false, true,
+                                        TM->Options.EmitSwitchBitmap));

   // Run LoopDataPrefetch
   //
   // Run this before LSR to remove the multiplies involved in computing the
   // pointer values N iterations ahead.
   if (TM->getOptLevel() != CodeGenOpt::None) {
     if (EnableLoopDataPrefetch)
       addPass(createLoopDataPrefetchPass());
[175 lines hidden]

lib/Target/ARM/ARMTargetMachine.cpp

[379 lines hidden] void ARMPassConfig::addIRPasses() {
   else
     addPass(createAtomicExpandPass());

   // Cmpxchg instructions are often used with a subsequent comparison to
   // determine whether it succeeded. We can exploit existing control-flow in
   // ldrex/strex loops to simplify this, but it needs tidying up.
   if (TM->getOptLevel() != CodeGenOpt::None && EnableAtomicTidy)
     addPass(createCFGSimplificationPass(
-        1, false, false, true, true, [this](const Function &F) {
+        1, false, false, true, true, false, [this](const Function &F) {
           const auto &ST = this->TM->getSubtarget<ARMSubtarget>(F);
           return ST.hasAnyDataBarrier() && !ST.isThumb1Only();
         }));

   TargetPassConfig::addIRPasses();

   // Match interleaved memory accesses to ldN/stN intrinsics.
   if (TM->getOptLevel() != CodeGenOpt::None)
[112 lines hidden]

lib/Transforms/IPO/PassManagerBuilder.cpp

[675 lines hidden] if (OptLevel > 1 && ExtraVectorizerPasses) {
     addInstructionCombiningPass(MPM);
   }

   // Cleanup after loop vectorization, etc. Simplification passes like CVP and
   // GVN, loop transforms, and others have already run, so it's now better to
   // convert to more optimized IR using more aggressive simplify CFG options.
   // The extra sinking transform can create larger basic blocks, so do this
   // before SLP vectorization.
-  MPM.add(createCFGSimplificationPass(1, true, true, false, true));
+  MPM.add(createCFGSimplificationPass(1, true, true, false, true,
+                                      NoSwitchBitmap));

   if (RunSLPAfterLoopVectorization && SLPVectorize) {
     MPM.add(createSLPVectorizerPass()); // Vectorize parallel scalar chains.
     if (OptLevel > 1 && ExtraVectorizerPasses) {
       MPM.add(createEarlyCSEPass());
     }
   }

[396 lines hidden]

lib/Transforms/Scalar/SimplifyCFGPass.cpp

[58 lines hidden]
 static cl::opt<bool> UserForwardSwitchCond(
     "forward-switch-cond", cl::Hidden, cl::init(false),
     cl::desc("Forward switch condition to phi ops (default = false)"));

 static cl::opt<bool> UserSinkCommonInsts(
     "sink-common-insts", cl::Hidden, cl::init(false),
     cl::desc("Sink common instructions (default = false)"));


 STATISTIC(NumSimpl, "Number of blocks simplified");

 /// If we have more than one empty (other than phi node) return blocks,
 /// merge them together to promote recursive block merging.
 static bool mergeEmptyReturnBlocks(Function &F) {
   bool Changed = false;

   BasicBlock *RetBlock = nullptr;
[153 lines hidden]
 struct CFGSimplifyPass : public FunctionPass {
   static char ID;
   SimplifyCFGOptions Options;
   std::function<bool(const Function &)> PredicateFtor;

   CFGSimplifyPass(unsigned Threshold = 1, bool ForwardSwitchCond = false,
                   bool ConvertSwitch = false, bool KeepLoops = true,
                   bool SinkCommon = false,
+                  bool noSwitchBitmap = false,
                   std::function<bool(const Function &)> Ftor = nullptr)
       : FunctionPass(ID), PredicateFtor(std::move(Ftor)) {

     initializeCFGSimplifyPassPass(*PassRegistry::getPassRegistry());

     // Check for command-line overrides of options for debug/customization.
     Options.BonusInstThreshold = UserBonusInstThreshold.getNumOccurrences()
                                      ? UserBonusInstThreshold
                                      : Threshold;

     Options.ForwardSwitchCondToPhi = UserForwardSwitchCond.getNumOccurrences()
                                          ? UserForwardSwitchCond
                                          : ForwardSwitchCond;

     Options.ConvertSwitchToLookupTable = UserSwitchToLookup.getNumOccurrences()
                                              ? UserSwitchToLookup
                                              : ConvertSwitch;

     Options.NeedCanonicalLoop =
         UserKeepLoops.getNumOccurrences() ? UserKeepLoops : KeepLoops;

     Options.SinkCommonInsts = UserSinkCommonInsts.getNumOccurrences()
                                   ? UserSinkCommonInsts
                                   : SinkCommon;
+
+    Options.NoSwitchBitmap = noSwitchBitmap;
   }

   bool runOnFunction(Function &F) override {
     if (skipFunction(F) || (PredicateFtor && !PredicateFtor(F)))
       return false;

     Options.AC = &getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);
     auto &TTI = getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);
[14 lines hidden]
 INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)
 INITIALIZE_PASS_END(CFGSimplifyPass, "simplifycfg", "Simplify the CFG", false,
                     false)

 // Public interface to the CFGSimplification pass
 FunctionPass *
 llvm::createCFGSimplificationPass(unsigned Threshold, bool ForwardSwitchCond,
                                   bool ConvertSwitch, bool KeepLoops,
-                                  bool SinkCommon,
+                                  bool SinkCommon, bool noSwitchBitmap,
                                   std::function<bool(const Function &)> Ftor) {
   return new CFGSimplifyPass(Threshold, ForwardSwitchCond, ConvertSwitch,
-                             KeepLoops, SinkCommon, std::move(Ftor));
+                             KeepLoops, SinkCommon, noSwitchBitmap,
+                             std::move(Ftor));
 }

lib/Transforms/Utils/SimplifyCFG.cpp

[4,869 lines hidden]
 /// This class represents a lookup table that can be used to replace a switch.
 class SwitchLookupTable {
 public:
   /// Create a lookup table to use as a switch replacement with the contents
   /// of Values, using DefaultValue to fill any holes in the table.
   SwitchLookupTable(
       Module &M, uint64_t TableSize, ConstantInt *Offset,
       const SmallVectorImpl<std::pair<ConstantInt *, Constant *>> &Values,
-      Constant *DefaultValue, const DataLayout &DL, const StringRef &FuncName);
+      Constant *DefaultValue, const DataLayout &DL, const StringRef &FuncName,
+      SimplifyCFGOptions &Options);

   /// Build instructions with Builder to retrieve the value at
   /// the position given by Index in the lookup table.
   Value *BuildLookup(Value *Index, IRBuilder<> &Builder);

   /// Return true if a table with TableSize elements of
   /// type ElementType would fit in a target-legal register.
   static bool WouldFitInRegister(const DataLayout &DL, uint64_t TableSize,
[37 lines hidden] private:
   GlobalVariable *Array = nullptr;
 };

 } // end anonymous namespace

 SwitchLookupTable::SwitchLookupTable(
     Module &M, uint64_t TableSize, ConstantInt *Offset,
     const SmallVectorImpl<std::pair<ConstantInt *, Constant *>> &Values,
-    Constant *DefaultValue, const DataLayout &DL, const StringRef &FuncName) {
+    Constant *DefaultValue, const DataLayout &DL, const StringRef &FuncName,
+    SimplifyCFGOptions &Options) {
   assert(Values.size() && "Can't build lookup table without values!");
   assert(TableSize >= Values.size() && "Can't fit values in table!");

   // If all values in the table are equal, this is that value.
   SingleValue = Values.begin()->second;

   Type *ValueType = Values.begin()->second->getType();

   // Build up the table contents.
   SmallVector<Constant *, 64> TableContents(TableSize);
   for (size_t I = 0, E = Values.size(); I != E; ++I) {
     ConstantInt *CaseVal = Values[I].first;
     Constant *CaseRes = Values[I].second;
     assert(CaseRes->getType() == ValueType);
[58 lines hidden] if (LinearMappingPossible) {
       LinearOffset = cast<ConstantInt>(TableContents[0]);
       LinearMultiplier = ConstantInt::get(M.getContext(), DistToPrev);
       Kind = LinearMapKind;
       ++NumLinearMaps;
       return;
     }
   }

+  // if fno-switch-bitmap flag is on, will skip building of a bitmap in place
+  // of a switch table.
+  if (!Options.NoSwitchBitmap) {
   // If the type is integer and the table fits in a register, build a bitmap.
   if (WouldFitInRegister(DL, TableSize, ValueType)) {
     IntegerType *IT = cast<IntegerType>(ValueType);
     APInt TableInt(TableSize * IT->getBitWidth(), 0);
     for (uint64_t I = TableSize; I > 0; --I) {
       TableInt <<= IT->getBitWidth();
       // Insert values into the bitmap. Undef values are set to zero.
       if (!isa<UndefValue>(TableContents[I - 1])) {
         ConstantInt *Val = cast<ConstantInt>(TableContents[I - 1]);
         TableInt |= Val->getValue().zext(TableInt.getBitWidth());
       }
     }
     BitMap = ConstantInt::get(M.getContext(), TableInt);
     BitMapElementTy = IT;
     Kind = BitMapKind;
     ++NumBitMaps;
     return;
   }
+  }
   // Store the table in an array.
   ArrayType *ArrayTy = ArrayType::get(ValueType, TableSize);
   Constant *Initializer = ConstantArray::get(ArrayTy, TableContents);

   Array = new GlobalVariable(M, ArrayTy, /*constant=*/true,
                              GlobalVariable::PrivateLinkage, Initializer,
                              "switch.table." + FuncName);
   Array->setUnnamedAddr(GlobalValue::UnnamedAddr::Global);
[199 lines hidden] static void reuseTableCompare(
   }
 }

 /// If the switch is only used to initialize one or more phi nodes in a common
 /// successor block with different constant values, replace the switch with
 /// lookup tables.
 static bool SwitchToLookupTable(SwitchInst *SI, IRBuilder<> &Builder,
                                 const DataLayout &DL,
-                                const TargetTransformInfo &TTI) {
+                                const TargetTransformInfo &TTI,
+                                SimplifyCFGOptions Options) {
   assert(SI->getNumCases() > 1 && "Degenerate switch?");

   Function *Fn = SI->getParent()->getParent();
   // Only build lookup table when we have a target that supports it or the
   // attribute is not set.
   if (!TTI.shouldBuildLookupTables() ||
       (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true"))
     return false;
[176 lines hidden] static bool SwitchToLookupTable(SwitchInst *SI, IRBuilder<> &Builder,
   bool ReturnedEarly = false;
   for (PHINode *PHI : PHIs) {
     const ResultListTy &ResultList = ResultLists[PHI];

     // If using a bitmask, use any value to fill the lookup table holes.
     Constant *DV = NeedMask ? ResultLists[PHI][0].second : DefaultResults[PHI];
     StringRef FuncName = Fn->getName();
     SwitchLookupTable Table(Mod, TableSize, MinCaseVal, ResultList, DV, DL,
-                            FuncName);
+                            FuncName, Options);

     Value *Result = Table.BuildLookup(TableIndex, Builder);

     // If the result is used to return immediately from the function, we want to
     // do that right here.
     if (PHI->hasOneUse() && isa<ReturnInst>(*PHI->user_begin()) &&
         PHI->user_back() == CommonDest->getFirstNonPHIOrDbg()) {
       Builder.CreateRet(Result);
[174 lines hidden] if (Options.ForwardSwitchCondToPhi && ForwardSwitchConditionToPHI(SI))
     return requestResimplify();

   // The conversion from switch to lookup tables results in difficult-to-analyze
   // code and makes pruning branches much harder. This is a problem if the
   // switch expression itself can still be restricted as a result of inlining or
   // CVP. Therefore, only apply this transformation during late stages of the
   // optimisation pipeline.
   if (Options.ConvertSwitchToLookupTable &&
-      SwitchToLookupTable(SI, Builder, DL, TTI))
+      SwitchToLookupTable(SI, Builder, DL, TTI, Options))
     return requestResimplify();

   if (ReduceSwitchRange(SI, Builder, DL, TTI))
     return requestResimplify();

   return false;
 }

[453 lines hidden]

test/Transforms/Util/no_switch_bitmap.ll

This file was added.

				;RUN: llc %s -o - -verify-machineinstrs | FileCheck %s

				target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
				target triple = "aarch64-arm-none-eabi"

				@switch.table.test = private unnamed_addr constant [3 x i16] [i16 500, i16 200, i16 17000], align 4

				; Function Attrs: minsize norecurse nounwind optsize readnone
				;CHECK-LABEL: @test
				;CHECK: cmp
				;CHECK: b
				;CHECK: adrp
				;CHECK: add
				;CHECK: ldrh
				;CHECK: ret
				;CHECK: mov
				;CHECK: ret
				define dso_local i16 @test(i32) local_unnamed_addr #0 {
				%2 = icmp ult i32 %0, 3
				br i1 %2, label %3, label %7

				; <label>:3: ; preds = %1
				%4 = sext i32 %0 to i64
				%5 = getelementptr inbounds [3 x i16], [3 x i16]* @switch.table.test, i64 0, i64 %4
				%6 = load i16, i16* %5, align 2
				ret i16 %6

				; <label>:7: ; preds = %1
				ret i16 0
				}

				attributes #0 = { minsize norecurse nounwind optsize readnone "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="cortex-a53" "target-features"="+aes,+crc,+crypto,+fp-armv8,+neon,+sha2" "unsafe-fp-math"="false" "use-soft-float"="false" }

				!llvm.module.flags = !{!0}
				!llvm.ident = !{!1}

				!0 = !{i32 1, !"wchar_size", i32 4}
				!1 = !{!"clang version 9.0.0 (https://git.llvm.org/git/clang.git/ 112a7a3c34df3dc506b52cd89e155287cdbcea55) (https://git.llvm.org/git/llvm.git/ feeb3cebe7a586a8e1b12735024389dd8d3bab53)"}
