This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Enable the select optimize pass for AArch64
Closed · Public

Authored by dmgreen on Nov 30 2022, 1:42 AM.

Details

Summary

I have been running some experiments with the select optimization pass
for ARM AArch64 cores. It tries to solve a problem that is difficult for
the compiler to get right. The criteria for when a csel is better or
worse than a branch depend heavily on whether the branch is well
predicted and on the amount of ILP in the loop (as well as other factors
such as the core in question and the relative performance of its branch
predictor). The pass seems to do a decent job though: the inner-loop
heuristics are well implemented and, in general, do a better job than I
had expected, even without PGO information.
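
To make the trade-off concrete, here is a small, hypothetical C++ example
(not taken from the patch or its tests). The ternary below is normally
lowered to a csel on AArch64, and the select-optimize pass decides whether
turning it back into a branch is likely to pay off, based on how predictable
the condition is and how much ILP the loop otherwise has. (In a real -O3 run
other passes, e.g. the vectorizer, may claim this loop first; this is purely
an illustration of the csel-vs-branch trade-off.)

  // Hypothetical example, not from the patch: a select feeding a
  // loop-carried dependency.
  long thresholdSum(const int *a, int n) {
    long sum = 0;
    for (int i = 0; i < n; ++i) {
      // Typically lowered to a compare + csel on AArch64. If the condition
      // is well predicted, a branch can be faster because the core can
      // speculate past it; if it is poorly predicted, the csel tends to win.
      int v = a[i] > 128 ? a[i] : 0;
      sum += v;
    }
    return sum;
  }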

I've been doing quite a bit of benchmarking. The headline numbers are
these for SPEC2017 on a Neoverse N1:

500.perlbench_r   -0.12%
502.gcc_r         0.02%
505.mcf_r         6.02%
520.omnetpp_r     0.32%
523.xalancbmk_r   0.20%
525.x264_r        0.02%
531.deepsjeng_r   0.00%
541.leela_r       -0.09%
548.exchange2_r   0.00%
557.xz_r          -0.20%

Running benchmarks with a combination of the llvm-test-suite plus
several versions of SPEC gave between a 0.2% and 0.4% geomean
improvement, depending on the core/run. The instruction count went down
by 0.1% too, which is a good sign, but the results can be a little noisy.
Some issues from other benchmarks I had run were improved in
rGca78b5601466f8515f5f958ef8e63d787d9d812e. In summary, well-predicted
branches will see an improvement, badly predicted branches may get
worse, and on average performance seems to be a little better overall.

This patch enables the pass for AArch64 under -O3 for cores that will
benefit from it, i.e. not for in-order cores, which do not fit the
"Assume infinite resources that allow to fully exploit the available
instruction-level parallelism" cost model. It uses a subtarget feature
to specify when the pass is enabled, and I have enabled it under
cpu=generic, as the performance increases for out-of-order cores seem
larger than any decreases for in-order cores.
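
For readers unfamiliar with how such gating is usually wired up, the rough
shape is something like the sketch below. The placement and the names of the
guard and feature are my assumptions for illustration rather than a copy of
this patch; createSelectOptimizePass() is, to my knowledge, the existing
factory function for the pass.

  // Sketch only: the guard name and its wiring to the new subtarget feature
  // are assumptions, not copied from the patch.
  void AArch64PassConfig::addIRPasses() {
    // ... existing IR-level passes ...
    if (getOptLevel() == CodeGenOpt::Aggressive &&
        EnableSelectOptimize) // assumed: backed by the subtarget feature
      addPass(createSelectOptimizePass());
    // ... remaining passes ...
  }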

Diff Detail

Event Timeline

dmgreen created this revision. Nov 30 2022, 1:42 AM
Herald added a project: Restricted Project. · View Herald Transcript · Nov 30 2022, 1:42 AM
dmgreen requested review of this revision. Nov 30 2022, 1:42 AM

Nice one Dave!
I am going to take this patch and run some benchmarking on a different AArch64 system. I will report back soon.

SjoerdMeijer added a comment (edited). Nov 30 2022, 2:08 AM

In the meantime, how about compile times? Did you measure that by any chance?

Nice one Dave!
I am going to take this patch and run some benchmarking on a different AArch64 system. I will report back soon.

Sounds good, let us know how they look.

In the meantime, how about compile times? Did you measure that by any chance?

Hmm. I have not, but this is only a scan over the instructions in each function, not anything that isn't done many times in the compiler already. I can check that that is true, though, and that there isn't anything costly hiding in it.

In the meantime, how about compile times? Did you measure that by any chance?

Hmm. I have not, but this is only a scan over the instructions in each function, not anything that isn't done many times in the compiler already. I can check that that is true, though, and that there isn't anything costly hiding in it.

Thanks, and yes, that's what I was expecting, i.e. that this is a simple scan and thus does not contribute much. I thought it would be good to check, to avoid being surprised for a reason we had not thought about.

The compile time was a little higher than I expected, at around a 0.25% increase. I was expecting it to be closer to 0%, but some of the tests in CTMark are around 0.5% (with others at 0%). Considering this is only enabled at -O3, that is probably an acceptable amount.

Thanks Dave for this extensive evaluation and reporting!

The performance improvements even for non-PGO builds are somewhat expected, given that the AArch64 backend currently has almost no logic to make this decision (unlike x86) and just aggressively prefers predication.
So, even without profile information, the loop-level heuristics, albeit conservative, allow some obvious cases to be converted to branches.

Note that internally at Google we have already enabled the select-optimize pass for all instrPGO-optimized builds, including AArch64. The performance improvements for AArch64 appear to be even more significant than on x86.
For non-PGO builds, we have also seen significant improvements for some microbenchmarks on AArch64, but I have not had the time to investigate further. So, your efforts are more than welcome.

Regarding compilation time, the impact should be small. For non-PGO builds, we essentially only have the loop-level heuristic, which does two passes over all the instructions in each loop and, for each instruction, iterates over its operands. So the cost per loop is roughly 2*N*K, where N is the number of instructions in the loop and K is the operand count of each instruction; this is essentially O(N), given that the operand count is a small bounded number.
In practice, the constant factors might be noticeable for some programs with big loops.
Enabling it for -O3 only sounds reasonable. Note that you do not want to enable it for size-optimizing builds (although checks within the pass already prevent it from running in those cases).
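
For intuition about why that stays cheap, here is a small self-contained
sketch (not the pass's actual code) of the shape of the work: two linear
sweeps over a loop body, the second of which also touches each instruction's
operands, for about 2*N*K steps in total.

  // Self-contained illustration of the O(N*K) loop scan; the types and the
  // per-instruction/per-operand "work" are placeholders, not the real pass.
  #include <vector>

  struct Instr {
    std::vector<int> Operands; // stand-in for the instruction's operands
    int Latency = 1;
  };

  int estimateLoopCost(const std::vector<Instr> &LoopBody) {
    int Cost = 0;
    // Sweep 1: per-instruction work (N steps).
    for (const Instr &I : LoopBody)
      Cost += I.Latency;
    // Sweep 2: per-operand work, e.g. following dependence chains (N*K steps).
    for (const Instr &I : LoopBody)
      for (int Op : I.Operands)
        Cost += (Op != 0); // placeholder per-operand work
    return Cost; // ~2*N*K total: linear in loop size for bounded K
  }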

My first SPEC performance results are in line with yours. The gain for MCF is bigger, but the gap is also bigger than on the N1, so the trend is the same. I would like to do a bit more testing though and will report back tomorrow.

Thanks for checking the compile times. Yeah, perhaps surprisingly, a bit higher than expected, but agreed that it looks reasonable for -O3.

And the confirmation from Google is nice. I think it shows we are on the right track with this.

SjoerdMeijer accepted this revision. Dec 1 2022, 9:43 AM

I have done a bit more testing, and results are okay. This looks good to me.

One nit: the tests are a bit big. At least some comments explaining what's going on would be nice.

This revision is now accepted and ready to land. Dec 1 2022, 9:43 AM
This revision was landed with ongoing or failed builds. Dec 3 2022, 8:09 AM
This revision was automatically updated to reflect the committed changes.

I am wondering if we could enable it for X86 too. Any known blockers?