This is an archive of the discontinued LLVM Phabricator instance.

AArch64: compress jump tables to minimum size needed to reach destinations
ClosedPublic

Authored by t.p.northover on Apr 26 2017, 3:17 PM.

Details

Summary

This patch adds a pass after branch relaxation to check whether the span of each jump table is small enough that its offsets can be encoded in 8 or 16 bits, and emits the appropriate sequences if so.
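As a rough illustration of the size decision (a Python sketch, not the patch's actual C++; the scaling of entries by 4 is an assumption based on AArch64's 4-byte instruction alignment):

```python
def jump_table_entry_size(max_span_bytes):
    """Pick the smallest entry width whose range covers the table's span.

    Hypothetical model of the pass: offsets are measured from a base label
    and stored in instruction-word units (bytes / 4), since AArch64
    instructions are 4-byte aligned.
    """
    span_words = max_span_bytes // 4
    if span_words < (1 << 8):
        return 1   # byte entries
    if span_words < (1 << 16):
        return 2   # half-word entries
    return 4       # fall back to 32-bit entries
```

For example, a switch whose furthest destination is 900 bytes from the base still gets 1-byte entries (900 // 4 = 225 < 256), which is why so few tables end up needing the 32-bit form.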

This improves binary size quite significantly for some tests (it turns out not a single table in the test-suite needs 32 bits of range, and two thirds need only 8 bits).

I'm a little divided on when to enable this and whether to continue to support 64-bit absolute entries (the code sequence is slightly simpler). In the end I decided to go for simplicity on the grounds that switches probably aren't performance critical and will probably be dominated by mispredicts anyway. I'd be open to gating it on MinSize or something though.

What do you think?

Diff Detail

Repository
rL LLVM

Event Timeline

t.p.northover created this revision.Apr 26 2017, 3:17 PM

If I'm following correctly, for PIC small-code-model code (which is what mostly matters these days), the comparison is this:

  1. On trunk we take 4 instructions to compute the destination for the jump: two to load the address of the jump table, one to load from the jump table, one to add the jump-table-relative offset to the address of the jump table.
  2. With this technique, we take 5 instructions: two to load the address of the jump table, one to load from the jump table, one to load the current PC, and one to add the PC-relative offset to the PC.

Is that right?

If you're looking to save size, we could put the jump table into the text segment just after the branch, and compute the jump destination in three instructions: one to load the address of the table, one to load the offset from the table, and one to add the offset to the address of the table. Is there some reason we shouldn't do that?

eastig added a subscriber: eastig.Apr 27 2017, 4:37 AM
joerg added a subscriber: joerg.Apr 27 2017, 5:51 AM

My primary concern here is that it can make it impossible to hand-patch the resulting assembly. I'm not sure if anyone is doing that, but if I have learned one lesson from pkgsrc development and the set of Linux support patches, it is to expect "yes, someone is really doing that" as the most likely answer. As such, I would hope for some global option to disable the pass.

About putting the jump table into a separate section: that's kind of what GCC does. I don't think a jump table necessarily has only one user, but placing it inside the function still looks helpful. That wouldn't conflict with -ffunction-sections or the like, either.

On 5/8/2017 9:36 AM, Tim Northover wrote:

On 26 April 2017 at 16:33, Eli Friedman via Phabricator
<reviews@reviews.llvm.org> wrote:

If you're looking to save size, we could put the jump table into the text segment just after the branch, and compute the jump destination in three instructions: one to load the address of the table, one to load the offset from the table, and one to add the offset to the address of the table. Is there some reason we shouldn't do that?

To get that first load down to 1 instruction we'd need to guarantee
the jump table was within 128MB of the code, and I think we
technically support larger functions. So we'd have to add something
like ARM's full Constant Island pass instead of the newly generic
branch relaxation we currently do. I don't think that's worth the
maintenance burden; it's a constant source of bugs on ARM.

For the second offset addition, again you're compromising distance.
The jump table itself then has to be within 256 or 65536 bytes of
every basic block used. I suspect that even *with* complex placement
that would be tough to arrange.
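To make Tim's constraint concrete, here is a small Python sketch (names hypothetical) of the reachability check an inline-table scheme would need, treating each entry as an unsigned table-relative byte offset:

```python
def all_reachable(table_addr, dest_addrs, entry_bytes):
    # With table-relative entries, every destination must lie within
    # 2**(8*entry_bytes) bytes of the table itself: 256 bytes for byte
    # entries, 65536 for half-word entries.
    limit = 1 << (8 * entry_bytes)
    return all(0 <= d - table_addr < limit for d in dest_addrs)
```

With byte entries, a destination 512 bytes past the table is already out of reach, which is why even clever table placement struggles.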

We manage for Thumb (see, for example, test/CodeGen/Thumb2/thumb2-tbb.ll). That said, I hadn't really considered the lack of existing infrastructure to implement this sort of thing on AArch64; you're right, it probably isn't worthwhile.

jfb added a subscriber: jfb.Sep 18 2018, 10:41 AM

Thread necromancy! I realised that I got distracted and never finished this discussion.

So I've ported the diff to current LLVM. Generally I still think it's the right approach.

jfb added inline comments.Sep 21 2018, 9:40 AM
llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
466

Range-based for loop here.

489

Make it a shr here, since that's what you'll want to emit anyways?

llvm/lib/Target/AArch64/AArch64CompressJumpTables.cpp
116

Are there ever cases where min is large, but the range [min, max) would fit within 8 or 16 bits? I'm wondering if you want another optimization where you pack the jump offsets, and always add min (here you're using a base of 0).

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
108

lol this comment

llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
79

drop this

t.p.northover marked an inline comment as done.Sep 24 2018, 4:35 AM
t.p.northover added inline comments.
llvm/lib/Target/AArch64/AArch64CompressJumpTables.cpp
116

Probably. A slight wrinkle is that if the PC-label isn't local it has a limit of +/-1MB, but I suspect hitting that is significantly less likely than hitting the limit that would be lifted. I'll update the patch with that approach.

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
108

I have a horrible suspicion it'll have been one of mine.

Tidy-ups suggested by JF, and (also JF's idea) a switch to an offset from the lowest-addressed basic block instead of the actual branch, to increase the number of candidates.

vsk added a subscriber: vsk.Oct 8 2018, 10:54 AM

Can you please provide some figures on the code size saved and on the performance impact of this change?

Thank you.

As you'd expect, size is pretty good. Over the test-suite (including externals) nothing regresses. The total benefit over all the code is 0.5%, with notable highlights of 25% in 401.bzip2, 7% in 403.gcc, and 4% in 177.mesa.

Time is obviously fuzzier, but I think it's in the noise. Overall there was a 0.5% regression by geomean, but when I limited it to tests that had actually changed, that dropped to 0.05%. Other slices I took to try to make the numbers more meaningful actually turned it into an improvement (including the total runtime!).

As you'd expect, size is pretty good. Over the test-suite (including externals) nothing regresses. The total benefit over all the code is 0.5%, with notable highlights of 25% in 401.bzip2, 7% in 403.gcc, and 4% in 177.mesa.

Quite expected, but good to know the actual numbers. :)

Time is obviously fuzzier, but I think it's in the noise. Overall there was a 0.5% regression by geomean, but when I limited it to tests that had actually changed, that dropped to 0.05%. Other slices I took to try to make the numbers more meaningful actually turned it into an improvement (including the total runtime!).

Sounds like noise to me.

This is good data. However, I'd like to evaluate this patch a bit further on Exynos, if I may.

jfb added a comment.Oct 9 2018, 11:22 AM

This is good data. However, I'd like to evaluate this patch a bit further on Exynos, if I may.

Do you have specific worries? Or at least a timeline? Seems the patch can be reviewed without waiting for your exploration.

In D32564#1259155, @jfb wrote:

Do you have specific worries? Or at least a timeline? Seems the patch can be reviewed without waiting for your exploration.

Yes, I do. Exynos limits the size of jump tables, resulting in a daisy chain of smaller jump tables. This patch changes the jump table code, so I'd like to evaluate the performance impact, unless this pass is gated by -Os.

llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
519

It seems to me that this instruction could be moved after the load below to promote macro fusion.

Yes, I do. Exynos limits the size of jump tables, resulting in a daisy chain of smaller jump tables. This patch changes the jump table code, so I'd like to evaluate the performance impact, unless this pass is gated by -Os.

Is that in trunk? I can't see any obvious candidates with a git grep -i exynos. If not, it's probably not relevant to this review.

t.p.northover added inline comments.Oct 9 2018, 3:23 PM
llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
519

The comment above explains why this has to be first.

Is that in trunk? I can't see any obvious candidates with a git grep -i exynos. If not, it's probably not relevant to this review.

Yes, see ExynosM{1,3} under AArch64Subtarget::initializeProperties().

jfb added a comment.Oct 9 2018, 4:33 PM
In D32564#1259155, @jfb wrote:

Do you have specific worries? Or at least a timeline? Seems the patch can be reviewed without waiting for your exploration.

Yes, I do. Exynos limits the size of jump tables, resulting in a daisy chain of smaller jump tables. This patch changes the jump table code, so I'd like to evaluate the performance impact, unless this pass is gated by -Os.

Do you have an ETA?

If that ETA is too far out, then Tim, do you think doing -Os first (with a flag to force it, say -O2 -mcompress-jump-tables) is OK, with other optimization settings enabled later (when the Exynos results come back)?

Could also be an architectural flag, given that this is particular behaviour of Exynos. Then a simple splitsJumpTables() or whatever check would be enough.

Rebase to master

In D32564#1259703, @jfb wrote:

Do you have an ETA?

Later today. Thank you.

I'm getting mixed results on Exynos, with significant improvements and regressions. I'd like to run more tests, but, at the moment, I'd rather this feature be gated, either as @jfb or @rengolin suggested.

Fair enough, I've disabled it for Exynos processors except under MinSize conditions. OK to commit?

rengolin added inline comments.Oct 12 2018, 1:34 AM
llvm/lib/Target/AArch64/AArch64CompressJumpTables.cpp
146

Can we not add more CPU name comparisons? It shouldn't be too hard to create a feature and associate it with these two cores. Later on, Samsung or any other vendor would be able to fine-tune it for other cores.

Rather than a feature in AArch.td, I'd prefer a line in AArch64Subtarget::initializeProperties().

Rather than a feature in AArch.td, I'd prefer a line in AArch64Subtarget::initializeProperties().

I think I disagree. Putting it in AArch64Subtarget.cpp means that someone has to recompile clang to test this out on their CPU. I'd reserve initializeProperties for unfortunate situations where you can't easily set something on the command-line or in AArch64.td (i.e. non-bool options so far).

Putting it in AArch64Subtarget.cpp also means you can't even in principle set it on a per-function basis without changing your CPU (which is icky).

Isn't this change always enabled for -Os? So it should be easy to test it or to enable down to a single function, shouldn't it?

Isn't this change always enabled for -Os? So it should be easy to test it or to enable down to a single function, shouldn't it?

-Oz, but even if it were -Os, that perturbs things in other ways that are probably not desirable for anyone who cared enough to annotate a function for it. But that was mostly a throwaway comment; I think the attitude that .cpp initialization is an unfortunate necessity is more compelling.

evandro accepted this revision.Oct 24 2018, 11:59 AM
This revision is now accepted and ready to land.Oct 24 2018, 11:59 AM
t.p.northover closed this revision.Oct 24 2018, 1:41 PM

Thanks. Committed as r345188.