This is an archive of the discontinued LLVM Phabricator instance.

Differential D34220

[AArch64] Prefer B.cond to CBZ/CBNZ/TBZ/TBNZ when NZCV flags can be set for "free"
ClosedPublic

Authored by mcrosier on Jun 14 2017, 1:46 PM.

Details

Reviewers

MatzeB
gberry
t.p.northover
rengolin
efriedma
• rafael
qcolombet
evandro
joelkevinjones

Summary

This patch contains a pass that transforms CBZ/CBNZ/TBZ/TBNZ instructions into a conditional branch (B.cond), when the NZCV flags can be set for "free". This is preferred on targets that have more flexibility when scheduling B.cond instructions as compared to CBZ/CBNZ/TBZ/TBNZ (assuming all other variables are equal). This can also reduce register pressure.

A few examples:

add w8, w0, w1  -> cmn w0, w1             ; CMN is an alias of ADDS.
cbz w8, .LBB0_2 -> b.eq .LBB0_2

add w8, w0, w1  -> adds w8, w0, w1        ; w8 has multiple uses.
cbz w8, .LBB1_2 -> b.eq .LBB1_2

sub w8, w0, w1       -> subs w8, w0, w1   ; w8 has multiple uses.
tbz w8, #31, .LBB6_2 -> b.ge .LBB6_2
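
To illustrate why the first two rewrites preserve behavior, here is a minimal, self-contained sketch that models a 32-bit ADDS and checks that CBZ on the result and B.EQ on the flags agree. The NZCV struct and the adds32, cbzTaken, beqTaken, and cbzMatchesBeq helpers are hypothetical names used only for illustration; none of them are part of the patch or of LLVM.

```cpp
#include <cstdint>

// Hypothetical model of the AArch64 condition flags; not an LLVM type.
struct NZCV {
  bool N, Z, C, V;
};

struct AddsResult {
  uint32_t Value;
  NZCV Flags;
};

// Models a 32-bit ADDS: returns the sum together with the flags it sets.
AddsResult adds32(uint32_t A, uint32_t B) {
  uint32_t R = A + B;
  NZCV F;
  F.N = (R >> 31) != 0;                    // sign bit of the result
  F.Z = (R == 0);                          // result is zero
  F.C = R < A;                             // unsigned carry out
  F.V = ((~(A ^ B) & (A ^ R)) >> 31) != 0; // signed overflow
  return {R, F};
}

// CBZ branches when the tested register is zero.
bool cbzTaken(uint32_t Reg) { return Reg == 0; }

// B.EQ branches when the Z flag is set.
bool beqTaken(const NZCV &F) { return F.Z; }

// True when "add + cbz" and "adds + b.eq" take the same branch decision.
bool cbzMatchesBeq(uint32_t A, uint32_t B) {
  AddsResult R = adds32(A, B);
  return cbzTaken(R.Value) == beqTaken(R.Flags);
}
```

Because Z is defined as "result equals zero", the two decisions agree for every input. When the sum has no other user, the destination can be the zero register, which is what the CMN alias in the first example expresses.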

The pass is run after the Machine Combiner because I saw a few cases where converting from 'ADD' to 'ADDS' prevented fusion. I also noticed the AArch64 Conditional Compares pass (i.e., CCMP formation) doesn't handle 'ANDS' and 'BICS', but since this pass is later the formation of CCMP instructions isn't negatively impacted.

No correctness issues across SPEC2000, SPEC2006 or the llvm-test-suite. When running on Falkor, this improves SPEC2006/libquantum by 8.5%. I also saw a few other small improvements in SPEC200[0|6] of ~1-2%. Otherwise, mostly small improvements within noise and no regressions above noise. I saw a few minor improvements on Kryo and didn't test any other subtargets as none are readily available. FWIW, I'm okay with predicating this with a Feature flag, if preferred.

PTAL,
Chad

Diff Detail

Event Timeline

mcrosier created this revision. Jun 14 2017, 1:46 PM

Herald added subscribers: kristof.beyls, javed.absar, mgorny, aemerson. Jun 14 2017, 1:46 PM

RE: the feature flag statement. Unless we know that this is a win on all supported sub-targets (micro-architectures), then it seems like a feature flag would be the way to go, right?

gberry added inline comments. Jun 15 2017, 11:06 AM

lib/Target/AArch64/AArch64CondBrTuning.cpp
77	You should be able to at least do AU.setPreservesCFG() here, otherwise there's no need to override this to just call the parent version.
82	The check for MO.isReg() seems unnecessary. MO.getReg() will assert if it isn't a reg, which it always should be. You could simplify this a bit to be: if (!isvirt(MO.getReg()) return null; return getUniqueVRegDef(MO.getReg())
191	You should probably do this after the opcode check below to save compile time.

In D34220#781363, @meadori wrote:

RE: the feature flag statement. Unless we know that this is a win on all supported sub-targets (micro-architectures), then it seems like a feature flag would be the way to go, right?

Yes, I think this is reasonable. I left it off initially assuming it would be easier for other sub-target owners to test, but if the consensus is this should be sub-target specific, I'll add the feature flag accordingly.

The transformation makes sense to me.
Does this need to be a separate pass? Glancing at the code, it seems that performBRCONDCombine() in AArch64ISelLowering.cpp is the only place creating CBZ instructions (except for FastISel). So maybe the adjustment can be performed there?
The code has a lot of opcode "tables" (in the form of switch/case). It's a judgement call each time, but generally I think it looks better if we have most opcode based tables in AArch64InstrInfo. At least the part mapping "S" opcodes to the non-"S" opcodes looks like a good candidate.

lib/Target/AArch64/AArch64CondBrTuning.cpp
11–26	You should use `/// \file` at the beginning and continue with `///` so that doxygen can pick it up.
335	Writing `MachineBasicBlock &` instead of `auto &` would be friendlier to the reader.

Address Geoff's feedback.

In D34220#781506, @MatzeB wrote:

The transformation makes sense to me.

Great! :D

Does this need to be a separate pass? Glancing at the code, it seems that performBRCONDCombine() in AArch64ISelLowering.cpp is the only place creating CBZ instructions (except for FastISel). So maybe the adjustment can be performed there?

Initially, I attempted this during ISel but ran into several problems. First, I found myself duplicating a lot of code from AArch64ISelDAGToDAG.cpp to avoid this transformation when the 'AND' could be folded into a bitfield insert/extract operation (e.g., BFM). It also leads to the aforementioned problems (i.e., the ConditionalCompares pass would need to be able to support ANDS and BICS, and we might miss MADD fusion opportunities because we can't fuse MUL+ADDS).

The code has a lot of opcode "tables" (in the form of switch/case). It's a judgement call each time, but generally I think it looks better if we have most opcode based tables in AArch64InstrInfo. At least the part mapping "S" opcodes to the non-"S" opcodes looks like a good candidate.

I'm happy to move the large opcode table to switch between the non-flag-setting to flag-setting opcodes. No problem.

mcrosier marked 3 inline comments as done. Jun 15 2017, 12:14 PM

mcrosier added inline comments.

lib/Target/AArch64/AArch64CondBrTuning.cpp
77	I also noticed MachineTraceMetrics was being clobbered, so I've marked that as preserved as well. Please let me know if that's not correct.

gberry added inline comments. Jun 15 2017, 12:18 PM

lib/Target/AArch64/AArch64CondBrTuning.cpp
77	I don't think that is correct since you are potentially changing opcodes and latencies.

Address Matthias' and Geoff's feedback.

lib/Target/AArch64/AArch64CondBrTuning.cpp
77	Oh, right. I guess we'll just have to recompute in that case.

Remove an accidental change to AArch64ISelLowering.cpp from a different WIP.

I'm fine with coding style and transformation.

I'm always a bit sad to see whole new passes for simple/local patterns. There are already other passes, like MachineCombiner, MachinePeephole, and DeadMachineInstructionElim, that also just go through all instructions looking for a pattern and don't need any analysis up front. Unfortunately, none of them is extensible and they sit at slightly different positions in the pipeline. I guess we do not need to solve/improve this situation with this patch. Though I reserve the right to ask for a revert and replanning if the bots measure slowdowns in CTMark after committing this.

So LGTM.

This revision is now accepted and ready to land. Jun 15 2017, 2:32 PM

In D34220#781653, @MatzeB wrote:

I'm fine with coding style and transformation.

I'm always a bit sad to see whole new passes for simple/local patterns. There are already other passes, like MachineCombiner, MachinePeephole, and DeadMachineInstructionElim, that also just go through all instructions looking for a pattern and don't need any analysis up front. Unfortunately, none of them is extensible and they sit at slightly different positions in the pipeline. I guess we do not need to solve/improve this situation with this patch. Though I reserve the right to ask for a revert and replanning if the bots measure slowdowns in CTMark after committing this.

So LGTM.

Thanks, Matthias.

I went ahead and looked at each of the sub-target machine descriptions to determine if this patch is good, bad, or neutral. In short, this transformation is either neutral or positive for all sub-targets.

The default WriteRes mappings are neutral when considering the non-flag-setting vs. flag-setting instructions. Falkor and Kryo are the only targets that don't use the default WriteRes values, but converting from non-flag-setting to flag-setting is still neutral for these targets.

These targets appear to be neutral: A53, A57, and Cyclone (per Matthias), ThunderX, ThunderX2T99.
These targets appear to benefit from this transformation: Falkor, Kryo, M1.

For M1, TBZ/TBNZ consume M1WriteC1 and M1WriteA2 for 2 cycles as compared to Bcc which consumes M1WriteB1 for 1 cycle. That seems to be a clear win. Both CBZ/CBNZ and Bcc execute in a single cycle, but consume different resources (M1WriteC1 vs. M1WriteB1, respectively). This isn't clearly a win or loss, but overall this transform appears to be general goodness.

At this point I'm thinking this should be enabled by default because it appears either neutral or good for all sub-targets. FWIW, this is also the default behavior for GCC. However, I'll add a few more sub-target owners to see if they have any comments.

I concur with the analysis for ThunderX2T99.

In D34220#788263, @joelkevinjones wrote:

I concur with the analysis for ThunderX2T99.

Thanks, Joel!

apinski-cavium added a subscriber: apinski-cavium. Jun 23 2017, 9:21 PM

Note this should be an improvement for ThunderX (though maybe it is not modeled correctly). Not ThunderX2T99 though (only because fusion happens even for add/cbz cases too).

In D34220#789743, @apinski-cavium wrote:

Note this should be an improvement for ThunderX (though maybe it is not modeled correctly). Not ThunderX2T99 though (only because fusion happens even for add/cbz cases too).

Glad to hear it, Andrew!!

This was committed in r306144.

labrinea added a child revision: D34743: [AArch64] AArch64CondBrTuningPass generates wrong branch instructions. Jun 28 2017, 1:46 AM
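
As background for reading the TBZ/TBNZ rewrite in this diff, the sketch below models a 32-bit SUBS and compares a sign-bit test against two condition codes. It relies only on the architectural definitions (GE is N == V, PL is N == 0) and is an assumption-laden illustration; it is not taken from D34743, whose contents are not reproduced here. The NZCV struct and all helper names (subs32, tbzSignBitTaken, bgeTaken, bplTaken) are hypothetical.

```cpp
#include <cstdint>

// Hypothetical model of the AArch64 condition flags; not an LLVM type.
struct NZCV {
  bool N, Z, C, V;
};

struct SubsResult {
  uint32_t Value;
  NZCV Flags;
};

// Models a 32-bit SUBS: returns the difference and the flags it sets.
SubsResult subs32(uint32_t A, uint32_t B) {
  uint32_t R = A - B;
  NZCV F;
  F.N = (R >> 31) != 0;                   // sign bit of the result
  F.Z = (R == 0);                         // result is zero
  F.C = A >= B;                           // set when there is no borrow
  F.V = (((A ^ B) & (A ^ R)) >> 31) != 0; // signed overflow
  return {R, F};
}

// TBZ Reg, #31 branches when the sign bit of Reg is clear.
bool tbzSignBitTaken(uint32_t Reg) { return (Reg >> 31) == 0; }

// B.GE branches when N == V.
bool bgeTaken(const NZCV &F) { return F.N == F.V; }

// B.PL branches when N is clear.
bool bplTaken(const NZCV &F) { return !F.N; }
```

With operands 5 and 3 all three predicates agree. With operands 0x80000000 and 1 the subtraction overflows, so V is set while N is clear: the sign-bit test and PL are both taken, but GE is not.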

Revision Contents

Path

Size

lib/

Target/

AArch64/

AArch64.h

2 lines

AArch64CondBrTuning.cpp

336 lines

AArch64InstrInfo.h

38 lines

AArch64InstrInfo.cpp

6 lines

AArch64TargetMachine.cpp

7 lines

CMakeLists.txt

1 line

test/

CodeGen/

AArch64/

arm64-early-ifcvt.ll

2 lines

arm64-shrink-wrapping.ll

24 lines

cond-br-tuning.ll

169 lines

misched-fusion.ll

4 lines

stack-guard-remat-bitcast.ll

4 lines

tbz-tbnz.ll

2 lines

thread-pointer.ll

4 lines

Diff 102710

lib/Target/AArch64/AArch64.h

	class AArch64Subtarget;			class AArch64Subtarget;
	class AArch64TargetMachine;			class AArch64TargetMachine;
	class FunctionPass;			class FunctionPass;
	class InstructionSelector;			class InstructionSelector;
	class MachineFunctionPass;			class MachineFunctionPass;

	FunctionPass *createAArch64DeadRegisterDefinitions();			FunctionPass *createAArch64DeadRegisterDefinitions();
	FunctionPass *createAArch64RedundantCopyEliminationPass();			FunctionPass *createAArch64RedundantCopyEliminationPass();
				FunctionPass *createAArch64CondBrTuning();
	FunctionPass *createAArch64ConditionalCompares();			FunctionPass *createAArch64ConditionalCompares();
	FunctionPass *createAArch64AdvSIMDScalar();			FunctionPass *createAArch64AdvSIMDScalar();
	FunctionPass *createAArch64ISelDag(AArch64TargetMachine &TM,			FunctionPass *createAArch64ISelDag(AArch64TargetMachine &TM,
	CodeGenOpt::Level OptLevel);			CodeGenOpt::Level OptLevel);
	FunctionPass *createAArch64StorePairSuppressPass();			FunctionPass *createAArch64StorePairSuppressPass();
	FunctionPass *createAArch64ExpandPseudoPass();			FunctionPass *createAArch64ExpandPseudoPass();
	FunctionPass *createAArch64LoadStoreOptimizationPass();			FunctionPass *createAArch64LoadStoreOptimizationPass();
	FunctionPass *createAArch64VectorByElementOptPass();			FunctionPass *createAArch64VectorByElementOptPass();
	ModulePass *createAArch64PromoteConstantPass();			ModulePass *createAArch64PromoteConstantPass();
	FunctionPass *createAArch64ConditionOptimizerPass();			FunctionPass *createAArch64ConditionOptimizerPass();
	FunctionPass *createAArch64A57FPLoadBalancing();			FunctionPass *createAArch64A57FPLoadBalancing();
	FunctionPass *createAArch64A53Fix835769();			FunctionPass *createAArch64A53Fix835769();

	FunctionPass *createAArch64CleanupLocalDynamicTLSPass();			FunctionPass *createAArch64CleanupLocalDynamicTLSPass();

	FunctionPass *createAArch64CollectLOHPass();			FunctionPass *createAArch64CollectLOHPass();
	InstructionSelector *			InstructionSelector *
	createAArch64InstructionSelector(const AArch64TargetMachine &,			createAArch64InstructionSelector(const AArch64TargetMachine &,
	AArch64Subtarget &, AArch64RegisterBankInfo &);			AArch64Subtarget &, AArch64RegisterBankInfo &);

	void initializeAArch64A53Fix835769Pass(PassRegistry&);			void initializeAArch64A53Fix835769Pass(PassRegistry&);
	void initializeAArch64A57FPLoadBalancingPass(PassRegistry&);			void initializeAArch64A57FPLoadBalancingPass(PassRegistry&);
	void initializeAArch64AdvSIMDScalarPass(PassRegistry&);			void initializeAArch64AdvSIMDScalarPass(PassRegistry&);
	void initializeAArch64CollectLOHPass(PassRegistry&);			void initializeAArch64CollectLOHPass(PassRegistry&);
				void initializeAArch64CondBrTuningPass(PassRegistry &);
	void initializeAArch64ConditionalComparesPass(PassRegistry&);			void initializeAArch64ConditionalComparesPass(PassRegistry&);
	void initializeAArch64ConditionOptimizerPass(PassRegistry&);			void initializeAArch64ConditionOptimizerPass(PassRegistry&);
	void initializeAArch64DeadRegisterDefinitionsPass(PassRegistry&);			void initializeAArch64DeadRegisterDefinitionsPass(PassRegistry&);
	void initializeAArch64ExpandPseudoPass(PassRegistry&);			void initializeAArch64ExpandPseudoPass(PassRegistry&);
	void initializeAArch64LoadStoreOptPass(PassRegistry&);			void initializeAArch64LoadStoreOptPass(PassRegistry&);
	void initializeAArch64VectorByElementOptPass(PassRegistry&);			void initializeAArch64VectorByElementOptPass(PassRegistry&);
	void initializeAArch64PromoteConstantPass(PassRegistry&);			void initializeAArch64PromoteConstantPass(PassRegistry&);
	void initializeAArch64RedundantCopyEliminationPass(PassRegistry&);			void initializeAArch64RedundantCopyEliminationPass(PassRegistry&);
	void initializeAArch64StorePairSuppressPass(PassRegistry&);			void initializeAArch64StorePairSuppressPass(PassRegistry&);
	void initializeLDTLSCleanupPass(PassRegistry&);			void initializeLDTLSCleanupPass(PassRegistry&);
	} // end namespace llvm			} // end namespace llvm

	#endif			#endif

lib/Target/AArch64/AArch64CondBrTuning.cpp

This file was added.

				//===-- AArch64CondBrTuning.cpp --- Conditional branch tuning for AArch64 -===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				/// \file
				/// This file contains a pass that transforms CBZ/CBNZ/TBZ/TBNZ instructions
				/// into a conditional branch (B.cond), when the NZCV flags can be set for
				/// "free". This is preferred on targets that have more flexibility when
				/// scheduling B.cond instructions as compared to CBZ/CBNZ/TBZ/TBNZ (assuming
				/// all other variables are equal). This can also reduce register pressure.
				///
				/// A few examples:
				///
				/// 1) add w8, w0, w1 -> cmn w0, w1 ; CMN is an alias of ADDS.
				/// cbz w8, .LBB0_2 -> b.eq .LBB0_2
				///
				/// 2) add w8, w0, w1 -> adds w8, w0, w1 ; w8 has multiple uses.
				/// cbz w8, .LBB1_2 -> b.eq .LBB1_2
				///
				/// 3) sub w8, w0, w1 -> subs w8, w0, w1 ; w8 has multiple uses.
				/// tbz w8, #31, .LBB6_2 -> b.ge .LBB6_2
				///
				//===----------------------------------------------------------------------===//

				#include "AArch64.h"
				#include "AArch64Subtarget.h"
				#include "llvm/CodeGen/MachineFunction.h"
				#include "llvm/CodeGen/MachineFunctionPass.h"
				#include "llvm/CodeGen/MachineInstrBuilder.h"
				#include "llvm/CodeGen/MachineRegisterInfo.h"
				#include "llvm/CodeGen/MachineTraceMetrics.h"
				#include "llvm/CodeGen/Passes.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Support/raw_ostream.h"
				#include "llvm/Target/TargetInstrInfo.h"
				#include "llvm/Target/TargetRegisterInfo.h"
				#include "llvm/Target/TargetSubtargetInfo.h"

				using namespace llvm;

				#define DEBUG_TYPE "aarch64-cond-br-tuning"
				#define AARCH64_CONDBR_TUNING_NAME "AArch64 Conditional Branch Tuning"

				namespace {
				class AArch64CondBrTuning : public MachineFunctionPass {
				const AArch64InstrInfo *TII;
				const TargetRegisterInfo *TRI;

				MachineRegisterInfo *MRI;

				public:
				static char ID;
				AArch64CondBrTuning() : MachineFunctionPass(ID) {
				initializeAArch64CondBrTuningPass(*PassRegistry::getPassRegistry());
				}
				void getAnalysisUsage(AnalysisUsage &AU) const override;
				bool runOnMachineFunction(MachineFunction &MF) override;
				StringRef getPassName() const override { return AARCH64_CONDBR_TUNING_NAME; }

				private:
				MachineInstr *getOperandDef(const MachineOperand &MO);
				MachineInstr *convertToFlagSetting(MachineInstr &MI, bool IsFlagSetting);
				MachineInstr *convertToCondBr(MachineInstr &MI);
				bool tryToTuneBranch(MachineInstr &MI, MachineInstr &DefMI);
				};
				} // end anonymous namespace

				char AArch64CondBrTuning::ID = 0;

				INITIALIZE_PASS(AArch64CondBrTuning, "aarch64-cond-br-tuning",
				AARCH64_CONDBR_TUNING_NAME, false, false)

				void AArch64CondBrTuning::getAnalysisUsage(AnalysisUsage &AU) const {
				AU.setPreservesCFG();
				MachineFunctionPass::getAnalysisUsage(AU);
				}

				MachineInstr *AArch64CondBrTuning::getOperandDef(const MachineOperand &MO) {
				if (!TargetRegisterInfo::isVirtualRegister(MO.getReg()))
				return nullptr;
				return MRI->getUniqueVRegDef(MO.getReg());
				}

				MachineInstr *AArch64CondBrTuning::convertToFlagSetting(MachineInstr &MI,
				bool IsFlagSetting) {
				// If this is already the flag setting version of the instruction (e.g., SUBS)
				// just make sure the implicit-def of NZCV isn't marked dead.
				if (IsFlagSetting) {
				for (unsigned I = MI.getNumExplicitOperands(), E = MI.getNumOperands();
				I != E; ++I) {
				MachineOperand &MO = MI.getOperand(I);
				if (MO.isReg() && MO.isDead() && MO.getReg() == AArch64::NZCV)
				MO.setIsDead(false);
				}
				return &MI;
				}
				bool Is64Bit;
				unsigned NewOpc = TII->convertToFlagSettingOpc(MI.getOpcode(), Is64Bit);
				unsigned NewDestReg = MI.getOperand(0).getReg();
				if (MRI->hasOneNonDBGUse(MI.getOperand(0).getReg()))
				NewDestReg = Is64Bit ? AArch64::XZR : AArch64::WZR;

				MachineInstrBuilder MIB = BuildMI(*MI.getParent(), MI, MI.getDebugLoc(),
				TII->get(NewOpc), NewDestReg);
				for (unsigned I = 1, E = MI.getNumOperands(); I != E; ++I)
				MIB.add(MI.getOperand(I));

				return MIB;
				}

				MachineInstr *AArch64CondBrTuning::convertToCondBr(MachineInstr &MI) {
				AArch64CC::CondCode CC;
				MachineBasicBlock *TargetMBB = TII->getBranchDestBlock(MI);
				switch (MI.getOpcode()) {
				default:
				llvm_unreachable("Unexpected opcode!");

				case AArch64::CBZW:
				case AArch64::CBZX:
				CC = AArch64CC::EQ;
				break;
				case AArch64::CBNZW:
				case AArch64::CBNZX:
				CC = AArch64CC::NE;
				break;
				case AArch64::TBZW:
				case AArch64::TBZX:
				CC = AArch64CC::GE;
				break;
				case AArch64::TBNZW:
				case AArch64::TBNZX:
				CC = AArch64CC::LT;
				break;
				}
				return BuildMI(*MI.getParent(), MI, MI.getDebugLoc(), TII->get(AArch64::Bcc))
				.addImm(CC)
				.addMBB(TargetMBB);
				}

				bool AArch64CondBrTuning::tryToTuneBranch(MachineInstr &MI,
				MachineInstr &DefMI) {
				// We don't want NZCV bits live across blocks.
				if (MI.getParent() != DefMI.getParent())
				return false;

				bool IsFlagSetting = true;
				unsigned MIOpc = MI.getOpcode();
				MachineInstr *NewCmp = nullptr, *NewBr = nullptr;
				switch (DefMI.getOpcode()) {
				default:
				return false;
				case AArch64::ADDWri:
				case AArch64::ADDWrr:
				case AArch64::ADDWrs:
				case AArch64::ADDWrx:
				case AArch64::ANDWri:
				case AArch64::ANDWrr:
				case AArch64::ANDWrs:
				case AArch64::BICWrr:
				case AArch64::BICWrs:
				case AArch64::SUBWri:
				case AArch64::SUBWrr:
				case AArch64::SUBWrs:
				case AArch64::SUBWrx:
				IsFlagSetting = false;
				case AArch64::ADDSWri:
				case AArch64::ADDSWrr:
				case AArch64::ADDSWrs:
				case AArch64::ADDSWrx:
				case AArch64::ANDSWri:
				case AArch64::ANDSWrr:
				case AArch64::ANDSWrs:
				case AArch64::BICSWrr:
				case AArch64::BICSWrs:
				case AArch64::SUBSWri:
				case AArch64::SUBSWrr:
				case AArch64::SUBSWrs:
				case AArch64::SUBSWrx:
				switch (MIOpc) {
				default:
				llvm_unreachable("Unexpected opcode!");

				case AArch64::CBZW:
				case AArch64::CBNZW:
				case AArch64::TBZW:
				case AArch64::TBNZW:
				// Check to see if the TBZ/TBNZ is checking the sign bit.
				if ((MIOpc == AArch64::TBZW || MIOpc == AArch64::TBNZW) &&
				MI.getOperand(1).getImm() != 31)
				return false;

				// There must not be any instruction between DefMI and MI that clobbers or
				// reads NZCV.
				MachineBasicBlock::iterator I(DefMI), E(MI);
				for (I = std::next(I); I != E; ++I) {
				if (I->modifiesRegister(AArch64::NZCV, TRI) ||
				I->readsRegister(AArch64::NZCV, TRI))
				return false;
				}
				DEBUG(dbgs() << " Replacing instructions:\n ");
				DEBUG(DefMI.print(dbgs()));
				DEBUG(dbgs() << " ");
				DEBUG(MI.print(dbgs()));

				NewCmp = convertToFlagSetting(DefMI, IsFlagSetting);
				NewBr = convertToCondBr(MI);
				break;
				}
				break;

				case AArch64::ADDXri:
				case AArch64::ADDXrr:
				case AArch64::ADDXrs:
				case AArch64::ADDXrx:
				case AArch64::ANDXri:
				case AArch64::ANDXrr:
				case AArch64::ANDXrs:
				case AArch64::BICXrr:
				case AArch64::BICXrs:
				case AArch64::SUBXri:
				case AArch64::SUBXrr:
				case AArch64::SUBXrs:
				case AArch64::SUBXrx:
				IsFlagSetting = false;
				case AArch64::ADDSXri:
				case AArch64::ADDSXrr:
				case AArch64::ADDSXrs:
				case AArch64::ADDSXrx:
				case AArch64::ANDSXri:
				case AArch64::ANDSXrr:
				case AArch64::ANDSXrs:
				case AArch64::BICSXrr:
				case AArch64::BICSXrs:
				case AArch64::SUBSXri:
				case AArch64::SUBSXrr:
				case AArch64::SUBSXrs:
				case AArch64::SUBSXrx:
				switch (MIOpc) {
				default:
				llvm_unreachable("Unexpected opcode!");

				case AArch64::CBZX:
				case AArch64::CBNZX:
				case AArch64::TBZX:
				case AArch64::TBNZX: {
				// Check to see if the TBZ/TBNZ is checking the sign bit.
				if ((MIOpc == AArch64::TBZX || MIOpc == AArch64::TBNZX) &&
				MI.getOperand(1).getImm() != 63)
				return false;
				// There must not be any instruction between DefMI and MI that clobbers or
				// reads NZCV.
				MachineBasicBlock::iterator I(DefMI), E(MI);
				for (I = std::next(I); I != E; ++I) {
				if (I->modifiesRegister(AArch64::NZCV, TRI) ||
				I->readsRegister(AArch64::NZCV, TRI))
				return false;
				}
				DEBUG(dbgs() << " Replacing instructions:\n ");
				DEBUG(DefMI.print(dbgs()));
				DEBUG(dbgs() << " ");
				DEBUG(MI.print(dbgs()));

				NewCmp = convertToFlagSetting(DefMI, IsFlagSetting);
				NewBr = convertToCondBr(MI);
				break;
				}
				}
				break;
				}
				assert(NewCmp && NewBr && "Expected new instructions.");

				DEBUG(dbgs() << " with instruction:\n ");
				DEBUG(NewCmp->print(dbgs()));
				DEBUG(dbgs() << " ");
				DEBUG(NewBr->print(dbgs()));

				// If this was a flag setting version of the instruction, we use the original
				// instruction by just clearing the dead marker on the implicit-def of NZCV.
				// Therefore, we should not erase this instruction.
				if (!IsFlagSetting)
				DefMI.eraseFromParent();
				MI.eraseFromParent();
				return true;
				}

				bool AArch64CondBrTuning::runOnMachineFunction(MachineFunction &MF) {
				if (skipFunction(*MF.getFunction()))
				return false;

				DEBUG(dbgs() << "******** AArch64 Conditional Branch Tuning ********\n"
				<< "********** Function: " << MF.getName() << '\n');

				TII = static_cast<const AArch64InstrInfo *>(MF.getSubtarget().getInstrInfo());
				TRI = MF.getSubtarget().getRegisterInfo();
				MRI = &MF.getRegInfo();

				bool Changed = false;
				for (MachineBasicBlock &MBB : MF) {
				bool LocalChange = false;
				for (MachineBasicBlock::iterator I = MBB.getFirstTerminator(),
				E = MBB.end();
				I != E; ++I) {
				MachineInstr &MI = *I;
				switch (MI.getOpcode()) {
				default:
				break;
				case AArch64::CBZW:
				case AArch64::CBZX:
				case AArch64::CBNZW:
				case AArch64::CBNZX:
				case AArch64::TBZW:
				case AArch64::TBZX:
				case AArch64::TBNZW:
				case AArch64::TBNZX:
				MachineInstr *DefMI = getOperandDef(MI.getOperand(0));
				LocalChange = (DefMI && tryToTuneBranch(MI, *DefMI));
				break;
				}
				// If the optimization was successful, we can't optimize any other
				// branches because doing so would clobber the NZCV flags.
				if (LocalChange) {
				Changed = true;
				break;
				}
				}
				}
				return Changed;
				}

				FunctionPass *llvm::createAArch64CondBrTuning() {
				return new AArch64CondBrTuning();
				}
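
The NZCV guard in tryToTuneBranch can be modeled in isolation. The sketch below is a simplified illustration rather than the pass itself: ToyInstr and flagsSurviveBetween are hypothetical names, and the real code queries MachineInstr::modifiesRegister and MachineInstr::readsRegister instead of boolean fields.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a machine instruction; it records only whether
// the instruction reads or writes NZCV.
struct ToyInstr {
  bool ReadsNZCV;
  bool WritesNZCV;
};

// Returns true when no instruction strictly between Block[DefIdx] and
// Block[BranchIdx] reads or writes NZCV. Only then can the defining
// instruction be switched to its flag-setting form without changing what an
// intervening instruction observes, and without having the new flags
// overwritten before the branch.
bool flagsSurviveBetween(const std::vector<ToyInstr> &Block,
                         std::size_t DefIdx, std::size_t BranchIdx) {
  for (std::size_t I = DefIdx + 1; I < BranchIdx; ++I)
    if (Block[I].ReadsNZCV || Block[I].WritesNZCV)
      return false;
  return true;
}
```

A block whose definition and branch are adjacent passes trivially, since the loop body never runs. The pass additionally requires both instructions to be in the same basic block and stops after the first successful rewrite in a block.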

lib/Target/AArch64/AArch64InstrInfo.h

static bool isPairableLdStInst(const MachineInstr &MI) {
...
case AArch64::LDURQi:		case AArch64::LDURQi:
case AArch64::LDURWi:		case AArch64::LDURWi:
case AArch64::LDURXi:		case AArch64::LDURXi:
case AArch64::LDURSWi:		case AArch64::LDURSWi:
return true;		return true;
}		}
}		}

		/// \brief Return the opcode that set flags when possible. The caller is
		/// responsible for ensuring the opc has a flag setting equivalent.
		static unsigned convertToFlagSettingOpc(unsigned Opc, bool &Is64Bit) {
		switch (Opc) {
		default:
		llvm_unreachable("Opcode has no flag setting equivalent!");
		// 32-bit cases:
		case AArch64::ADDWri: Is64Bit = false; return AArch64::ADDSWri;
		case AArch64::ADDWrr: Is64Bit = false; return AArch64::ADDSWrr;
		case AArch64::ADDWrs: Is64Bit = false; return AArch64::ADDSWrs;
		case AArch64::ADDWrx: Is64Bit = false; return AArch64::ADDSWrx;
		case AArch64::ANDWri: Is64Bit = false; return AArch64::ANDSWri;
		case AArch64::ANDWrr: Is64Bit = false; return AArch64::ANDSWrr;
		case AArch64::ANDWrs: Is64Bit = false; return AArch64::ANDSWrs;
		case AArch64::BICWrr: Is64Bit = false; return AArch64::BICSWrr;
		case AArch64::BICWrs: Is64Bit = false; return AArch64::BICSWrs;
		case AArch64::SUBWri: Is64Bit = false; return AArch64::SUBSWri;
		case AArch64::SUBWrr: Is64Bit = false; return AArch64::SUBSWrr;
		case AArch64::SUBWrs: Is64Bit = false; return AArch64::SUBSWrs;
		case AArch64::SUBWrx: Is64Bit = false; return AArch64::SUBSWrx;
		// 64-bit cases:
		case AArch64::ADDXri: Is64Bit = true; return AArch64::ADDSXri;
		case AArch64::ADDXrr: Is64Bit = true; return AArch64::ADDSXrr;
		case AArch64::ADDXrs: Is64Bit = true; return AArch64::ADDSXrs;
		case AArch64::ADDXrx: Is64Bit = true; return AArch64::ADDSXrx;
		case AArch64::ANDXri: Is64Bit = true; return AArch64::ANDSXri;
		case AArch64::ANDXrr: Is64Bit = true; return AArch64::ANDSXrr;
		case AArch64::ANDXrs: Is64Bit = true; return AArch64::ANDSXrs;
		case AArch64::BICXrr: Is64Bit = true; return AArch64::BICSXrr;
		case AArch64::BICXrs: Is64Bit = true; return AArch64::BICSXrs;
		case AArch64::SUBXri: Is64Bit = true; return AArch64::SUBSXri;
		case AArch64::SUBXrr: Is64Bit = true; return AArch64::SUBSXrr;
		case AArch64::SUBXrs: Is64Bit = true; return AArch64::SUBSXrs;
		case AArch64::SUBXrx: Is64Bit = true; return AArch64::SUBSXrx;
		}
		}


/// Return true if this is a load/store that can be potentially paired/merged.		/// Return true if this is a load/store that can be potentially paired/merged.
bool isCandidateToMergeOrPair(MachineInstr &MI) const;		bool isCandidateToMergeOrPair(MachineInstr &MI) const;

/// Hint that pairing the given load or store is unprofitable.		/// Hint that pairing the given load or store is unprofitable.
void suppressLdStPair(MachineInstr &MI) const;		void suppressLdStPair(MachineInstr &MI) const;

bool getMemOpBaseRegImmOfs(MachineInstr &LdSt, unsigned &BaseReg,		bool getMemOpBaseRegImmOfs(MachineInstr &LdSt, unsigned &BaseReg,
int64_t &Offset,		int64_t &Offset,
...

lib/Target/AArch64/AArch64InstrInfo.cpp

static bool UpdateOperandRegClass(MachineInstr &Instr) {
...
}		}

return true;		return true;
}		}

/// \brief Return the opcode that does not set flags when possible - otherwise		/// \brief Return the opcode that does not set flags when possible - otherwise
/// return the original opcode. The caller is responsible to do the actual		/// return the original opcode. The caller is responsible to do the actual
/// substitution and legality checking.		/// substitution and legality checking.
static unsigned convertFlagSettingOpcode(const MachineInstr &MI) {		static unsigned convertToNonFlagSettingOpc(const MachineInstr &MI) {
// Don't convert all compare instructions, because for some the zero register		// Don't convert all compare instructions, because for some the zero register
// encoding becomes the sp register.		// encoding becomes the sp register.
bool MIDefinesZeroReg = false;		bool MIDefinesZeroReg = false;
if (MI.definesRegister(AArch64::WZR) || MI.definesRegister(AArch64::XZR))		if (MI.definesRegister(AArch64::WZR) || MI.definesRegister(AArch64::XZR))
MIDefinesZeroReg = true;		MIDefinesZeroReg = true;

switch (MI.getOpcode()) {		switch (MI.getOpcode()) {
default:		default:
...
bool AArch64InstrInfo::optimizeCompareInstr(
...
int DeadNZCVIdx = CmpInstr.findRegisterDefOperandIdx(AArch64::NZCV, true);		int DeadNZCVIdx = CmpInstr.findRegisterDefOperandIdx(AArch64::NZCV, true);
if (DeadNZCVIdx != -1) {		if (DeadNZCVIdx != -1) {
if (CmpInstr.definesRegister(AArch64::WZR) ||		if (CmpInstr.definesRegister(AArch64::WZR) ||
CmpInstr.definesRegister(AArch64::XZR)) {		CmpInstr.definesRegister(AArch64::XZR)) {
CmpInstr.eraseFromParent();		CmpInstr.eraseFromParent();
return true;		return true;
}		}
unsigned Opc = CmpInstr.getOpcode();		unsigned Opc = CmpInstr.getOpcode();
unsigned NewOpc = convertFlagSettingOpcode(CmpInstr);		unsigned NewOpc = convertToNonFlagSettingOpc(CmpInstr);
if (NewOpc == Opc)		if (NewOpc == Opc)
return false;		return false;
const MCInstrDesc &MCID = get(NewOpc);		const MCInstrDesc &MCID = get(NewOpc);
CmpInstr.setDesc(MCID);		CmpInstr.setDesc(MCID);
CmpInstr.RemoveOperand(DeadNZCVIdx);		CmpInstr.RemoveOperand(DeadNZCVIdx);
bool succeeded = UpdateOperandRegClass(CmpInstr);		bool succeeded = UpdateOperandRegClass(CmpInstr);
(void)succeeded;		(void)succeeded;
assert(succeeded && "Some operands reg class are incompatible!");		assert(succeeded && "Some operands reg class are incompatible!");
[... 2,156 lines not shown; the next hunk is inside: static bool getMaddPatterns(MachineInstr &Root, ...]

if (!isCombineInstrCandidate(Opc))		if (!isCombineInstrCandidate(Opc))
return false;		return false;
if (isCombineInstrSettingFlag(Opc)) {		if (isCombineInstrSettingFlag(Opc)) {
int Cmp_NZCV = Root.findRegisterDefOperandIdx(AArch64::NZCV, true);		int Cmp_NZCV = Root.findRegisterDefOperandIdx(AArch64::NZCV, true);
// When NZCV is live bail out.		// When NZCV is live bail out.
if (Cmp_NZCV == -1)		if (Cmp_NZCV == -1)
return false;		return false;
unsigned NewOpc = convertFlagSettingOpcode(Root);		unsigned NewOpc = convertToNonFlagSettingOpc(Root);
// When opcode can't change bail out.		// When opcode can't change bail out.
// CHECKME: do we miss any cases for opcode conversion?		// CHECKME: do we miss any cases for opcode conversion?
if (NewOpc == Opc)		if (NewOpc == Opc)
return false;		return false;
Opc = NewOpc;		Opc = NewOpc;
}		}

switch (Opc) {		switch (Opc) {
[... 1,277 lines not shown ...]
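
The zero-register caveat in the comment on `convertToNonFlagSettingOpc` can be illustrated with a reduced model. This is only a sketch, not the function from the patch (whose switch body is not shown on this page): the `Opc` enum and `toNonFlagSetting` are hypothetical names that cover just four 32-bit ADD/SUB forms. It relies on the AArch64 encoding rule that destination register number 31 means WZR for ADDS/SUBS and for the shifted-register forms of ADD/SUB, but WSP for the immediate forms of ADD/SUB.

```cpp
#include <cassert>

// Hypothetical opcode names modeled on LLVM's naming scheme
// (S = sets flags, W = 32-bit, rr = register operand, ri = immediate operand).
enum class Opc { ADDSWrr, ADDWrr, ADDSWri, ADDWri,
                 SUBSWrr, SUBWrr, SUBSWri, SUBWri };

// Return the non-flag-setting twin of O when the swap is safe, else O itself.
// DefinesZeroReg: the instruction writes WZR (i.e. it is a compare alias).
Opc toNonFlagSetting(Opc O, bool DefinesZeroReg) {
  switch (O) {
  case Opc::ADDSWrr: return Opc::ADDWrr;
  case Opc::SUBSWrr: return Opc::SUBWrr;
  // Immediate forms: destination 31 would be reinterpreted as WSP, so keep
  // the flag-setting opcode when the destination is the zero register.
  case Opc::ADDSWri: return DefinesZeroReg ? Opc::ADDSWri : Opc::ADDWri;
  case Opc::SUBSWri: return DefinesZeroReg ? Opc::SUBSWri : Opc::SUBWri;
  default: return O; // already non-flag-setting: nothing to convert
  }
}
```

Callers in the hunks above compare the returned opcode with the original one and bail out when they are equal, which is why the model returns its input unchanged in the unsafe cases.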

lib/Target/AArch64/AArch64TargetMachine.cpp

	[... 41 lines not shown ...]
	#include <string>			#include <string>

	using namespace llvm;			using namespace llvm;

	static cl::opt<bool> EnableCCMP("aarch64-enable-ccmp",			static cl::opt<bool> EnableCCMP("aarch64-enable-ccmp",
	cl::desc("Enable the CCMP formation pass"),			cl::desc("Enable the CCMP formation pass"),
	cl::init(true), cl::Hidden);			cl::init(true), cl::Hidden);

				static cl::opt<bool>
				EnableCondBrTuning("aarch64-enable-cond-br-tune",
				cl::desc("Enable the conditional branch tuning pass"),
				cl::init(true), cl::Hidden);

	static cl::opt<bool> EnableMCR("aarch64-enable-mcr",			static cl::opt<bool> EnableMCR("aarch64-enable-mcr",
	cl::desc("Enable the machine combiner pass"),			cl::desc("Enable the machine combiner pass"),
	cl::init(true), cl::Hidden);			cl::init(true), cl::Hidden);

	static cl::opt<bool> EnableStPairSuppress("aarch64-enable-stp-suppress",			static cl::opt<bool> EnableStPairSuppress("aarch64-enable-stp-suppress",
	cl::desc("Suppress STP for AArch64"),			cl::desc("Suppress STP for AArch64"),
	cl::init(true), cl::Hidden);			cl::init(true), cl::Hidden);

	[... 366 lines not shown ...]

	bool AArch64PassConfig::addILPOpts() {			bool AArch64PassConfig::addILPOpts() {
	if (EnableCondOpt)			if (EnableCondOpt)
	addPass(createAArch64ConditionOptimizerPass());			addPass(createAArch64ConditionOptimizerPass());
	if (EnableCCMP)			if (EnableCCMP)
	addPass(createAArch64ConditionalCompares());			addPass(createAArch64ConditionalCompares());
	if (EnableMCR)			if (EnableMCR)
	addPass(&MachineCombinerID);			addPass(&MachineCombinerID);
				if (EnableCondBrTuning)
				addPass(createAArch64CondBrTuning());
	if (EnableEarlyIfConversion)			if (EnableEarlyIfConversion)
	addPass(&EarlyIfConverterID);			addPass(&EarlyIfConverterID);
	if (EnableStPairSuppress)			if (EnableStPairSuppress)
	addPass(createAArch64StorePairSuppressPass());			addPass(createAArch64StorePairSuppressPass());
	addPass(createAArch64VectorByElementOptPass());			addPass(createAArch64VectorByElementOptPass());
	return true;			return true;
	}			}
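
The placement of the new pass in `addILPOpts()` (directly after the Machine Combiner, before early if-conversion, and gated by `-aarch64-enable-cond-br-tune`) can be mirrored in a small standalone sketch. Everything here is illustrative: `ILPFlags` and `ilpPassOrder` are hypothetical names, the string labels are not real pass names, and defaulting every flag to true is an assumption, since only some of the `cl::init` values are visible in this diff.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the cl::opt<bool> flags read by addILPOpts().
// Assumption: all flags default to true.
struct ILPFlags {
  bool CondOpt = true, CCMP = true, MCR = true, CondBrTuning = true;
  bool EarlyIfConversion = true, StPairSuppress = true;
};

// Returns illustrative labels in the order addILPOpts() adds the passes.
std::vector<std::string> ilpPassOrder(const ILPFlags &F) {
  std::vector<std::string> P;
  if (F.CondOpt) P.push_back("condition-optimizer");
  if (F.CCMP) P.push_back("conditional-compares");
  if (F.MCR) P.push_back("machine-combiner");
  if (F.CondBrTuning) P.push_back("cond-br-tuning"); // new in this patch
  if (F.EarlyIfConversion) P.push_back("early-if-converter");
  if (F.StPairSuppress) P.push_back("store-pair-suppress");
  P.push_back("vector-by-element-opt"); // added unconditionally
  return P;
}
```

Setting `CondBrTuning` to false in this model corresponds to passing `-aarch64-enable-cond-br-tune=false` to llc, as the updated RUN line in tbz-tbnz.ll does.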

	[... 44 lines not shown ...]

lib/Target/AArch64/CMakeLists.txt

	[... 37 lines not shown ...]


	add_llvm_target(AArch64CodeGen			add_llvm_target(AArch64CodeGen
	AArch64A57FPLoadBalancing.cpp			AArch64A57FPLoadBalancing.cpp
	AArch64AdvSIMDScalarPass.cpp			AArch64AdvSIMDScalarPass.cpp
	AArch64AsmPrinter.cpp			AArch64AsmPrinter.cpp
	AArch64CleanupLocalDynamicTLSPass.cpp			AArch64CleanupLocalDynamicTLSPass.cpp
	AArch64CollectLOH.cpp			AArch64CollectLOH.cpp
				AArch64CondBrTuning.cpp
	AArch64ConditionalCompares.cpp			AArch64ConditionalCompares.cpp
	AArch64DeadRegisterDefinitionsPass.cpp			AArch64DeadRegisterDefinitionsPass.cpp
	AArch64ExpandPseudoInsts.cpp			AArch64ExpandPseudoInsts.cpp
	AArch64FastISel.cpp			AArch64FastISel.cpp
	AArch64A53Fix835769.cpp			AArch64A53Fix835769.cpp
	AArch64FrameLowering.cpp			AArch64FrameLowering.cpp
	AArch64ConditionOptimizer.cpp			AArch64ConditionOptimizer.cpp
	AArch64RedundantCopyElimination.cpp			AArch64RedundantCopyElimination.cpp
	[... 28 lines not shown ...]

test/CodeGen/AArch64/arm64-early-ifcvt.ll

	[... 21 lines not shown ...]
	if.else:			if.else:
	%cmp1 = icmp slt i32 %0, %min.0			%cmp1 = icmp slt i32 %0, %min.0
	%.min.0 = select i1 %cmp1, i32 %0, i32 %min.0			%.min.0 = select i1 %cmp1, i32 %0, i32 %min.0
	br label %do.cond			br label %do.cond

	do.cond:			do.cond:
	%max.1 = phi i32 [ %0, %do.body ], [ %max.0, %if.else ]			%max.1 = phi i32 [ %0, %do.body ], [ %max.0, %if.else ]
	%min.1 = phi i32 [ %min.0, %do.body ], [ %.min.0, %if.else ]			%min.1 = phi i32 [ %min.0, %do.body ], [ %.min.0, %if.else ]
	; CHECK: cbnz			; CHECK: b.ne
	%dec = add i32 %n.addr.0, -1			%dec = add i32 %n.addr.0, -1
	%tobool = icmp eq i32 %dec, 0			%tobool = icmp eq i32 %dec, 0
	br i1 %tobool, label %do.end, label %do.body			br i1 %tobool, label %do.end, label %do.body

	do.end:			do.end:
	%sub = sub nsw i32 %max.1, %min.1			%sub = sub nsw i32 %max.1, %min.1
	ret i32 %sub			ret i32 %sub
	}			}
	[... 385 lines not shown ...]

test/CodeGen/AArch64/arm64-shrink-wrapping.ll

	[... 72 lines not shown ...]
	; DISABLE: cbz w0, [[ELSE_LABEL:LBB[0-9_]+]]			; DISABLE: cbz w0, [[ELSE_LABEL:LBB[0-9_]+]]
	;			;
	; CHECK: mov [[SUM:w[0-9]+]], wzr			; CHECK: mov [[SUM:w[0-9]+]], wzr
	; CHECK-NEXT: mov [[IV:w[0-9]+]], #10			; CHECK-NEXT: mov [[IV:w[0-9]+]], #10
	;			;
	; Next BB.			; Next BB.
	; CHECK: [[LOOP:LBB[0-9_]+]]: ; %for.body			; CHECK: [[LOOP:LBB[0-9_]+]]: ; %for.body
	; CHECK: bl _something			; CHECK: bl _something
	; CHECK-NEXT: sub [[IV]], [[IV]], #1			; CHECK-NEXT: subs [[IV]], [[IV]], #1
	; CHECK-NEXT: add [[SUM]], w0, [[SUM]]			; CHECK-NEXT: add [[SUM]], w0, [[SUM]]
	; CHECK-NEXT: cbnz [[IV]], [[LOOP]]			; CHECK-NEXT: b.ne [[LOOP]]
	;			;
	; Next BB.			; Next BB.
	; Copy SUM into the returned register + << 3.			; Copy SUM into the returned register + << 3.
	; CHECK: lsl w0, [[SUM]], #3			; CHECK: lsl w0, [[SUM]], #3
	;			;
	; Jump to epilogue.			; Jump to epilogue.
	; DISABLE: b [[EPILOG_BB:LBB[0-9_]+]]			; DISABLE: b [[EPILOG_BB:LBB[0-9_]+]]
	;			;
	[... 47 lines not shown ...]
	; CHECK: stp [[CSR1:x[0-9]+]], [[CSR2:x[0-9]+]], [sp, #-32]!			; CHECK: stp [[CSR1:x[0-9]+]], [[CSR2:x[0-9]+]], [sp, #-32]!
	; CHECK-NEXT: stp [[CSR3:x[0-9]+]], [[CSR4:x[0-9]+]], [sp, #16]			; CHECK-NEXT: stp [[CSR3:x[0-9]+]], [[CSR4:x[0-9]+]], [sp, #16]
	; CHECK-NEXT: add [[NEW_SP:x[0-9]+]], sp, #16			; CHECK-NEXT: add [[NEW_SP:x[0-9]+]], sp, #16
	; CHECK: mov [[SUM:w[0-9]+]], wzr			; CHECK: mov [[SUM:w[0-9]+]], wzr
	; CHECK-NEXT: mov [[IV:w[0-9]+]], #10			; CHECK-NEXT: mov [[IV:w[0-9]+]], #10
	; Next BB.			; Next BB.
	; CHECK: [[LOOP_LABEL:LBB[0-9_]+]]: ; %for.body			; CHECK: [[LOOP_LABEL:LBB[0-9_]+]]: ; %for.body
	; CHECK: bl _something			; CHECK: bl _something
	; CHECK-NEXT: sub [[IV]], [[IV]], #1			; CHECK-NEXT: subs [[IV]], [[IV]], #1
	; CHECK-NEXT: add [[SUM]], w0, [[SUM]]			; CHECK-NEXT: add [[SUM]], w0, [[SUM]]
	; CHECK-NEXT: cbnz [[IV]], [[LOOP_LABEL]]			; CHECK-NEXT: b.ne [[LOOP_LABEL]]
	; Next BB.			; Next BB.
	; CHECK: ; %for.end			; CHECK: ; %for.end
	; CHECK: mov w0, [[SUM]]			; CHECK: mov w0, [[SUM]]
	; CHECK-NEXT: ldp [[CSR3]], [[CSR4]], [sp, #16]			; CHECK-NEXT: ldp [[CSR3]], [[CSR4]], [sp, #16]
	; CHECK-NEXT: ldp [[CSR1]], [[CSR2]], [sp], #32			; CHECK-NEXT: ldp [[CSR1]], [[CSR2]], [sp], #32
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	define i32 @freqSaveAndRestoreOutsideLoop2(i32 %cond) {			define i32 @freqSaveAndRestoreOutsideLoop2(i32 %cond) {
	entry:			entry:
	[... 25 lines not shown ...]
	;			;
	; DISABLE: cbz w0, [[ELSE_LABEL:LBB[0-9_]+]]			; DISABLE: cbz w0, [[ELSE_LABEL:LBB[0-9_]+]]
	;			;
	; CHECK: mov [[SUM:w[0-9]+]], wzr			; CHECK: mov [[SUM:w[0-9]+]], wzr
	; CHECK-NEXT: mov [[IV:w[0-9]+]], #10			; CHECK-NEXT: mov [[IV:w[0-9]+]], #10
	;			;
	; CHECK: [[LOOP_LABEL:LBB[0-9_]+]]: ; %for.body			; CHECK: [[LOOP_LABEL:LBB[0-9_]+]]: ; %for.body
	; CHECK: bl _something			; CHECK: bl _something
	; CHECK-NEXT: sub [[IV]], [[IV]], #1			; CHECK-NEXT: subs [[IV]], [[IV]], #1
	; CHECK-NEXT: add [[SUM]], w0, [[SUM]]			; CHECK-NEXT: add [[SUM]], w0, [[SUM]]
	; CHECK-NEXT: cbnz [[IV]], [[LOOP_LABEL]]			; CHECK-NEXT: b.ne [[LOOP_LABEL]]
	; Next BB.			; Next BB.
	; CHECK: bl _somethingElse			; CHECK: bl _somethingElse
	; CHECK-NEXT: lsl w0, [[SUM]], #3			; CHECK-NEXT: lsl w0, [[SUM]], #3
	;			;
	; Jump to epilogue.			; Jump to epilogue.
	; DISABLE: b [[EPILOG_BB:LBB[0-9_]+]]			; DISABLE: b [[EPILOG_BB:LBB[0-9_]+]]
	;			;
	; DISABLE: [[ELSE_LABEL]]: ; %if.else			; DISABLE: [[ELSE_LABEL]]: ; %if.else
	[... 52 lines not shown ...]
	; DISABLE: cbz w0, [[ELSE_LABEL:LBB[0-9_]+]]			; DISABLE: cbz w0, [[ELSE_LABEL:LBB[0-9_]+]]
	;			;
	; CHECK: bl _somethingElse			; CHECK: bl _somethingElse
	; CHECK-NEXT: mov [[SUM:w[0-9]+]], wzr			; CHECK-NEXT: mov [[SUM:w[0-9]+]], wzr
	; CHECK-NEXT: mov [[IV:w[0-9]+]], #10			; CHECK-NEXT: mov [[IV:w[0-9]+]], #10
	;			;
	; CHECK: [[LOOP_LABEL:LBB[0-9_]+]]: ; %for.body			; CHECK: [[LOOP_LABEL:LBB[0-9_]+]]: ; %for.body
	; CHECK: bl _something			; CHECK: bl _something
	; CHECK-NEXT: sub [[IV]], [[IV]], #1			; CHECK-NEXT: subs [[IV]], [[IV]], #1
	; CHECK-NEXT: add [[SUM]], w0, [[SUM]]			; CHECK-NEXT: add [[SUM]], w0, [[SUM]]
	; CHECK-NEXT: cbnz [[IV]], [[LOOP_LABEL]]			; CHECK-NEXT: b.ne [[LOOP_LABEL]]
	; Next BB.			; Next BB.
	; CHECK: lsl w0, [[SUM]], #3			; CHECK: lsl w0, [[SUM]], #3
	;			;
	; Jump to epilogue.			; Jump to epilogue.
	; DISABLE: b [[EPILOG_BB:LBB[0-9_]+]]			; DISABLE: b [[EPILOG_BB:LBB[0-9_]+]]
	;			;
	; DISABLE: [[ELSE_LABEL]]: ; %if.else			; DISABLE: [[ELSE_LABEL]]: ; %if.else
	; Shift second argument by one and store into returned register.			; Shift second argument by one and store into returned register.
	[... 65 lines not shown ...]
	; CHECK-NEXT: mov [[SUM:w0]], wzr			; CHECK-NEXT: mov [[SUM:w0]], wzr
	; CHECK-NEXT: b.lt [[IFEND_LABEL:LBB[0-9_]+]]			; CHECK-NEXT: b.lt [[IFEND_LABEL:LBB[0-9_]+]]
	;			;
	; CHECK: [[LOOP_LABEL:LBB[0-9_]+]]: ; %for.body			; CHECK: [[LOOP_LABEL:LBB[0-9_]+]]: ; %for.body
	; CHECK: ldr [[VA_ADDR:x[0-9]+]], [sp, #8]			; CHECK: ldr [[VA_ADDR:x[0-9]+]], [sp, #8]
	; CHECK-NEXT: add [[NEXT_VA_ADDR:x[0-9]+]], [[VA_ADDR]], #8			; CHECK-NEXT: add [[NEXT_VA_ADDR:x[0-9]+]], [[VA_ADDR]], #8
	; CHECK-NEXT: str [[NEXT_VA_ADDR]], [sp, #8]			; CHECK-NEXT: str [[NEXT_VA_ADDR]], [sp, #8]
	; CHECK-NEXT: ldr [[VA_VAL:w[0-9]+]], {{\[}}[[VA_ADDR]]]			; CHECK-NEXT: ldr [[VA_VAL:w[0-9]+]], {{\[}}[[VA_ADDR]]]
	; CHECK-NEXT: sub w1, w1, #1			; CHECK-NEXT: subs w1, w1, #1
	; CHECK-NEXT: add [[SUM]], [[SUM]], [[VA_VAL]]			; CHECK-NEXT: add [[SUM]], [[SUM]], [[VA_VAL]]
	; CHECK-NEXT: cbnz w1, [[LOOP_LABEL]]			; CHECK-NEXT: b.ne [[LOOP_LABEL]]
	; CHECK-NEXT: [[IFEND_LABEL]]:			; CHECK-NEXT: [[IFEND_LABEL]]:
	; Epilogue code.			; Epilogue code.
	; CHECK: add sp, sp, #16			; CHECK: add sp, sp, #16
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	;			;
	; CHECK: [[ELSE_LABEL]]: ; %if.else			; CHECK: [[ELSE_LABEL]]: ; %if.else
	; CHECK-NEXT: lsl w0, w1, #1			; CHECK-NEXT: lsl w0, w1, #1
	; DISABLE-NEXT: add sp, sp, #16			; DISABLE-NEXT: add sp, sp, #16
	[... 47 lines not shown ...]
	; CHECK: stp [[CSR1:x[0-9]+]], [[CSR2:x19]], [sp, #-16]!			; CHECK: stp [[CSR1:x[0-9]+]], [[CSR2:x19]], [sp, #-16]!
	;			;
	; DISABLE: cbz w0, [[ELSE_LABEL:LBB[0-9_]+]]			; DISABLE: cbz w0, [[ELSE_LABEL:LBB[0-9_]+]]
	;			;
	; CHECK: mov [[IV:w[0-9]+]], #10			; CHECK: mov [[IV:w[0-9]+]], #10
	;			;
	; CHECK: [[LOOP_LABEL:LBB[0-9_]+]]: ; %for.body			; CHECK: [[LOOP_LABEL:LBB[0-9_]+]]: ; %for.body
	; Inline asm statement.			; Inline asm statement.
	; CHECK: sub [[IV]], [[IV]], #1			; CHECK: subs [[IV]], [[IV]], #1
	; CHECK: add x19, x19, #1			; CHECK: add x19, x19, #1
	; CHECK: cbnz [[IV]], [[LOOP_LABEL]]			; CHECK: b.ne [[LOOP_LABEL]]
	; Next BB.			; Next BB.
	; CHECK: mov w0, wzr			; CHECK: mov w0, wzr
	; Epilogue code.			; Epilogue code.
	; CHECK-NEXT: ldp [[CSR1]], [[CSR2]], [sp], #16			; CHECK-NEXT: ldp [[CSR1]], [[CSR2]], [sp], #16
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	; Next BB.			; Next BB.
	; CHECK: [[ELSE_LABEL]]: ; %if.else			; CHECK: [[ELSE_LABEL]]: ; %if.else
	; CHECK-NEXT: lsl w0, w1, #1			; CHECK-NEXT: lsl w0, w1, #1
	[... 294 lines not shown ...]

test/CodeGen/AArch64/cond-br-tuning.ll

This file was added.

				; RUN: llc < %s -O3 -mtriple=aarch64-eabi -verify-machineinstrs | FileCheck %s

				target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
				target triple = "aarch64-linaro-linux-gnueabi"

				; CMN is an alias of ADDS.
				; CHECK-LABEL: test_add_cbz:
				; CHECK: cmn w0, w1
				; CHECK: b.eq
				; CHECK: ret
				define void @test_add_cbz(i32 %a, i32 %b, i32* %ptr) {
				%c = add nsw i32 %a, %b
				%d = icmp ne i32 %c, 0
				br i1 %d, label %L1, label %L2
				L1:
				store i32 0, i32* %ptr, align 4
				ret void
				L2:
				store i32 1, i32* %ptr, align 4
				ret void
				}

				; CHECK-LABEL: test_add_cbz_multiple_use:
				; CHECK: adds
				; CHECK: b.eq
				; CHECK: ret
				define void @test_add_cbz_multiple_use(i32 %a, i32 %b, i32* %ptr) {
				%c = add nsw i32 %a, %b
				%d = icmp ne i32 %c, 0
				br i1 %d, label %L1, label %L2
				L1:
				store i32 0, i32* %ptr, align 4
				ret void
				L2:
				store i32 %c, i32* %ptr, align 4
				ret void
				}

				; CHECK-LABEL: test_add_cbz_64:
				; CHECK: cmn x0, x1
				; CHECK: b.eq
				define void @test_add_cbz_64(i64 %a, i64 %b, i64* %ptr) {
				%c = add nsw i64 %a, %b
				%d = icmp ne i64 %c, 0
				br i1 %d, label %L1, label %L2
				L1:
				store i64 0, i64* %ptr, align 4
				ret void
				L2:
				store i64 1, i64* %ptr, align 4
				ret void
				}

				; CHECK-LABEL: test_and_cbz:
				; CHECK: tst w0, #0x6
				; CHECK: b.eq
				define void @test_and_cbz(i32 %a, i32* %ptr) {
				%c = and i32 %a, 6
				%d = icmp ne i32 %c, 0
				br i1 %d, label %L1, label %L2
				L1:
				store i32 0, i32* %ptr, align 4
				ret void
				L2:
				store i32 1, i32* %ptr, align 4
				ret void
				}

				; CHECK-LABEL: test_bic_cbnz:
				; CHECK: bics wzr, w1, w0
				; CHECK: b.ne
				define void @test_bic_cbnz(i32 %a, i32 %b, i32* %ptr) {
				%c = and i32 %a, %b
				%d = icmp eq i32 %c, %b
				br i1 %d, label %L1, label %L2
				L1:
				store i32 0, i32* %ptr, align 4
				ret void
				L2:
				store i32 1, i32* %ptr, align 4
				ret void
				}

				; CHECK-LABEL: test_add_tbz:
				; CHECK: adds
				; CHECK: b.ge
				; CHECK: ret
				define void @test_add_tbz(i32 %a, i32 %b, i32* %ptr) {
				entry:
				%add = add nsw i32 %a, %b
				%cmp36 = icmp sge i32 %add, 0
				br i1 %cmp36, label %L2, label %L1
				L1:
				store i32 %add, i32* %ptr, align 8
				br label %L2
				L2:
				ret void
				}

				; CHECK-LABEL: test_subs_tbz:
				; CHECK: subs
				; CHECK: b.ge
				; CHECK: ret
				define void @test_subs_tbz(i32 %a, i32 %b, i32* %ptr) {
				entry:
				%sub = sub nsw i32 %a, %b
				%cmp36 = icmp sge i32 %sub, 0
				br i1 %cmp36, label %L2, label %L1
				L1:
				store i32 %sub, i32* %ptr, align 8
				br label %L2
				L2:
				ret void
				}

				; CHECK-LABEL: test_add_tbnz
				; CHECK: adds
				; CHECK: b.lt
				; CHECK: ret
				define void @test_add_tbnz(i32 %a, i32 %b, i32* %ptr) {
				entry:
				%add = add nsw i32 %a, %b
				%cmp36 = icmp slt i32 %add, 0
				br i1 %cmp36, label %L2, label %L1
				L1:
				store i32 %add, i32* %ptr, align 8
				br label %L2
				L2:
				ret void
				}

				; CHECK-LABEL: test_subs_tbnz
				; CHECK: subs
				; CHECK: b.lt
				; CHECK: ret
				define void @test_subs_tbnz(i32 %a, i32 %b, i32* %ptr) {
				entry:
				%sub = sub nsw i32 %a, %b
				%cmp36 = icmp slt i32 %sub, 0
				br i1 %cmp36, label %L2, label %L1
				L1:
				store i32 %sub, i32* %ptr, align 8
				br label %L2
				L2:
				ret void
				}

				declare void @foo()
				declare void @bar(i32)

				; Don't transform since the call will clobber the NZCV bits.
				; CHECK-LABEL: test_call_clobber:
				; CHECK: and w[[DST:[0-9]+]], w1, #0x6
				; CHECK: bl bar
				; CHECK: cbnz w[[DST]]
				define void @test_call_clobber(i32 %unused, i32 %a) {
				entry:
				%c = and i32 %a, 6
				call void @bar(i32 %c)
				%tobool = icmp eq i32 %c, 0
				br i1 %tobool, label %if.end, label %if.then

				if.then:
				tail call void @foo()
				unreachable

				if.end:
				ret void
				}
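
The CBZ/CBNZ cases exercised by cond-br-tuning.ll, including the negative `test_call_clobber` case, can be summarised with a small standalone model. This is a sketch under stated assumptions, not the code in AArch64CondBrTuning.cpp: `Inst`, `Kind`, `clobbersNZCV` and `tuneCondBr` are hypothetical names, the model handles only a CBZ/CBNZ whose operand is defined by an ADD/SUB/AND in the same block, and it treats calls and flag-setting instructions as the only NZCV clobbers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified instruction model (not LLVM's MachineInstr).
enum class Kind { Add, Sub, And, Call, Cbz, Cbnz, BCond, Other };

struct Inst {
  Kind K;
  int Def;                 // register written, or -1
  int Use;                 // register tested by CBZ/CBNZ, or -1
  bool SetsFlags = false;  // true once rewritten to ADDS/SUBS/ANDS
  std::string Cond;        // "eq" or "ne" once rewritten to B.cond
  Inst(Kind K, int Def, int Use) : K(K), Def(Def), Use(Use) {}
};

// Assumption: calls and flag-setting instructions clobber NZCV.
static bool clobbersNZCV(const Inst &I) {
  return I.K == Kind::Call || I.SetsFlags;
}

// Try to turn the CBZ/CBNZ that ends Block into a B.cond by making the
// instruction that defines the tested register set the flags instead.
// Returns true if the block was changed.
bool tuneCondBr(std::vector<Inst> &Block) {
  if (Block.empty())
    return false;
  Inst &Br = Block.back();
  if (Br.K != Kind::Cbz && Br.K != Kind::Cbnz)
    return false;
  assert(Br.Use >= 0 && "compare-and-branch must test a register");

  // Walk backwards from the branch towards the defining instruction.
  for (std::size_t Idx = Block.size() - 1; Idx-- > 0;) {
    Inst &I = Block[Idx];
    if (I.Def == Br.Use) {
      if (I.K != Kind::Add && I.K != Kind::Sub && I.K != Kind::And)
        return false;
      I.SetsFlags = true;                         // ADD -> ADDS, etc.
      Br.Cond = (Br.K == Kind::Cbz) ? "eq" : "ne"; // CBZ -> B.EQ, CBNZ -> B.NE
      Br.K = Kind::BCond;
      return true;
    }
    // Something between the definition and the branch overwrites NZCV.
    if (clobbersNZCV(I))
      return false;
  }
  return false;
}
```

For a block modeling `add w8, w0, w1; cbz w8, ...` the function rewrites the pair to the flag-setting add plus `b.eq`; for `and; bl bar; cbnz` it leaves the block untouched, matching the CHECK lines of `test_call_clobber`.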

test/CodeGen/AArch64/misched-fusion.ll

	; RUN: llc -o - %s -mattr=+arith-cbz-fusion | FileCheck %s			; RUN: llc -o - %s -mattr=+arith-cbz-fusion | FileCheck %s
	; RUN: llc -o - %s -mcpu=cyclone | FileCheck %s			; RUN: llc -o - %s -mcpu=cyclone | FileCheck %s

	target triple = "aarch64-unknown"			target triple = "aarch64-unknown"

	declare void @foobar(i32 %v0, i32 %v1)			declare void @foobar(i32 %v0, i32 %v1)

	; Make sure sub is scheduled in front of cbnz			; Make sure sub is scheduled in front of cbnz
	; CHECK-LABEL: test_sub_cbz:			; CHECK-LABEL: test_sub_cbz:
	; CHECK: sub w[[SUBRES:[0-9]+]], w0, #13			; CHECK: subs w[[SUBRES:[0-9]+]], w0, #13
	; CHECK-NEXT: cbnz w[[SUBRES]], {{.?LBB[0-9_]+}}			; CHECK: b.ne {{.?LBB[0-9_]+}}
	define void @test_sub_cbz(i32 %a0, i32 %a1) {			define void @test_sub_cbz(i32 %a0, i32 %a1) {
	entry:			entry:
	; except for the fusion opportunity the sub/add should be equal so the			; except for the fusion opportunity the sub/add should be equal so the
	; scheduler would leave them in source order if it weren't for the scheduling			; scheduler would leave them in source order if it weren't for the scheduling
	%v0 = sub i32 %a0, 13			%v0 = sub i32 %a0, 13
	%cond = icmp eq i32 %v0, 0			%cond = icmp eq i32 %v0, 0
	%v1 = add i32 %a1, 7			%v1 = add i32 %a1, 7
	br i1 %cond, label %if, label %exit			br i1 %cond, label %if, label %exit
	[... 9 lines not shown ...]

test/CodeGen/AArch64/stack-guard-remat-bitcast.ll

	; RUN: llc < %s -mtriple=arm64-apple-ios -relocation-model=pic -disable-fp-elim | FileCheck %s			; RUN: llc < %s -mtriple=arm64-apple-ios -relocation-model=pic -disable-fp-elim | FileCheck %s

	@__stack_chk_guard = external global i64*			@__stack_chk_guard = external global i64*

	; PR20558			; PR20558

	; CHECK: adrp [[R0:x[0-9]+]], ___stack_chk_guard@GOTPAGE			; CHECK: adrp [[R0:x[0-9]+]], ___stack_chk_guard@GOTPAGE
	; CHECK: ldr [[R1:x[0-9]+]], {{\[}}[[R0]], ___stack_chk_guard@GOTPAGEOFF{{\]}}			; CHECK: ldr [[R1:x[0-9]+]], {{\[}}[[R0]], ___stack_chk_guard@GOTPAGEOFF{{\]}}
	; Load the stack guard for the second time, just in case the previous value gets spilled.			; Load the stack guard for the second time, just in case the previous value gets spilled.
	; CHECK: adrp [[GUARD_PAGE:x[0-9]+]], ___stack_chk_guard@GOTPAGE			; CHECK: adrp [[GUARD_PAGE:x[0-9]+]], ___stack_chk_guard@GOTPAGE
	; CHECK: ldr [[R2:x[0-9]+]], {{\[}}[[R1]]{{\]}}			; CHECK: ldr [[R2:x[0-9]+]], {{\[}}[[R1]]{{\]}}
	; CHECK: stur [[R2]], {{\[}}x29, [[SLOT0:[0-9#\-]+]]{{\]}}			; CHECK: stur [[R2]], {{\[}}x29, [[SLOT0:[0-9#\-]+]]{{\]}}
	; CHECK: ldur [[R3:x[0-9]+]], {{\[}}x29, [[SLOT0]]{{\]}}			; CHECK: ldur [[R3:x[0-9]+]], {{\[}}x29, [[SLOT0]]{{\]}}
	; CHECK: ldr [[GUARD_ADDR:x[0-9]+]], {{\[}}[[GUARD_PAGE]], ___stack_chk_guard@GOTPAGEOFF{{\]}}			; CHECK: ldr [[GUARD_ADDR:x[0-9]+]], {{\[}}[[GUARD_PAGE]], ___stack_chk_guard@GOTPAGEOFF{{\]}}
	; CHECK: ldr [[GUARD:x[0-9]+]], {{\[}}[[GUARD_ADDR]]{{\]}}			; CHECK: ldr [[GUARD:x[0-9]+]], {{\[}}[[GUARD_ADDR]]{{\]}}
	; CHECK: sub [[R4:x[0-9]+]], [[GUARD]], [[R3]]			; CHECK: cmp [[GUARD]], [[R3]]
	; CHECK: cbnz [[R4]], LBB			; CHECK: b.ne LBB

	define i32 @test_stack_guard_remat2() {			define i32 @test_stack_guard_remat2() {
	entry:			entry:
	%StackGuardSlot = alloca i8*			%StackGuardSlot = alloca i8*
	%StackGuard = load i8*, i8** bitcast (i64** @__stack_chk_guard to i8**)			%StackGuard = load i8*, i8** bitcast (i64** @__stack_chk_guard to i8**)
	call void @llvm.stackprotector(i8* %StackGuard, i8** %StackGuardSlot)			call void @llvm.stackprotector(i8* %StackGuard, i8** %StackGuardSlot)
	%container = alloca [32 x i8], align 1			%container = alloca [32 x i8], align 1
	call void @llvm.stackprotectorcheck(i8** bitcast (i64** @__stack_chk_guard to i8**))			call void @llvm.stackprotectorcheck(i8** bitcast (i64** @__stack_chk_guard to i8**))
	ret i32 -1			ret i32 -1
	}			}

	declare void @llvm.stackprotector(i8*, i8**)			declare void @llvm.stackprotector(i8*, i8**)
	declare void @llvm.stackprotectorcheck(i8**)			declare void @llvm.stackprotectorcheck(i8**)

test/CodeGen/AArch64/tbz-tbnz.ll

	; RUN: llc < %s -O1 -mtriple=aarch64-eabi | FileCheck %s			; RUN: llc < %s -O1 -mtriple=aarch64-eabi -aarch64-enable-cond-br-tune=false | FileCheck %s

	declare void @t()			declare void @t()

	define void @test1(i32 %a) {			define void @test1(i32 %a) {
	; CHECK-LABEL: @test1			; CHECK-LABEL: @test1
	entry:			entry:
	%sub = add nsw i32 %a, -12			%sub = add nsw i32 %a, -12
	%cmp = icmp slt i32 %sub, 0			%cmp = icmp slt i32 %sub, 0
	[... 352 lines not shown ...]

test/CodeGen/AArch64/thread-pointer.ll

	; RUN: llc -mtriple=aarch64-linux-gnu -verify-machineinstrs -o - %s | FileCheck %s			; RUN: llc -mtriple=aarch64-linux-gnu -verify-machineinstrs -o - %s | FileCheck %s

	@x = thread_local local_unnamed_addr global i32 0, align 4			@x = thread_local local_unnamed_addr global i32 0, align 4
	@y = thread_local local_unnamed_addr global i32 0, align 4			@y = thread_local local_unnamed_addr global i32 0, align 4

	; Machine LICM should hoist the mrs into the loop preheader.			; Machine LICM should hoist the mrs into the loop preheader.
	; CHECK-LABEL: @test1			; CHECK-LABEL: @test1
	; CHECK: BB#1:			; CHECK: BB#1:
	; CHECK: mrs x[[BASE:[0-9]+]], TPIDR_EL0			; CHECK: mrs x[[BASE:[0-9]+]], TPIDR_EL0
	; CHECK: add x[[REG1:[0-9]+]], x[[BASE]], :tprel_hi12:x			; CHECK: add x[[REG1:[0-9]+]], x[[BASE]], :tprel_hi12:x
	; CHECK: add x[[REG2:[0-9]+]], x[[REG1]], :tprel_lo12_nc:x			; CHECK: add x[[REG2:[0-9]+]], x[[REG1]], :tprel_lo12_nc:x
	;			;
	; CHECK: .LBB0_2:			; CHECK: .LBB0_2:
	; CHECK: ldr w0, [x[[REG2]]]			; CHECK: ldr w0, [x[[REG2]]]
	; CHECK: bl bar			; CHECK: bl bar
	; CHECK: sub w[[REG3:[0-9]+]], w{{[0-9]+}}, #1			; CHECK: subs w[[REG3:[0-9]+]], w{{[0-9]+}}, #1
	; CHECK: cbnz w[[REG3]], .LBB0_2			; CHECK: b.ne .LBB0_2

	define void @test1(i32 %n) local_unnamed_addr {			define void @test1(i32 %n) local_unnamed_addr {
	entry:			entry:
	%cmp3 = icmp sgt i32 %n, 0			%cmp3 = icmp sgt i32 %n, 0
	br i1 %cmp3, label %bb1, label %bb2			br i1 %cmp3, label %bb1, label %bb2

	bb1:			bb1:
	br label %for.body			br label %for.body
	[... 35 lines not shown ...]

Revision Contents

Diff 102710