This is an archive of the discontinued LLVM Phabricator instance.

Differential D118663

[AArch64] Adds SUBS and ADDS instructions to the MIPeepholeOpt.
ClosedPublic

Authored by red1bluelost on Jan 31 2022, 3:43 PM.

Details

Reviewers

dmgreen
benshi001
jaykang10

Commits

rGc69af70f02f2: [AArch64] Adds SUBS and ADDS instructions to the MIPeepholeOpt.
rGaf45d0fd94b2: [AArch64] Adds SUBS and ADDS instructions to the MIPeepholeOpt.

Summary

Implements ADDS/SUBS 24-bit immediate optimization using the
MIPeepholeOpt pass. This follows the pattern:

Optimize ([adds|subs] r, imm) -> ([ADDS|SUBS] ([ADD|SUB] r, #imm0, lsl #12), #imm1),
if imm == (imm0<<12)+imm1, and both imm0 and imm1 are non-zero 12-bit unsigned
integers.

Optimize ([adds|subs] r, imm) -> ([SUBS|ADDS] ([SUB|ADD] r, #imm0, lsl #12), #imm1),
if imm == -(imm0<<12)-imm1, and both imm0 and imm1 are non-zero 12-bit unsigned
integers.
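
The immediate condition above can be sketched as a small standalone function. This is only an illustration: `SplitImm` and `splitImm24` are hypothetical names, not the helper used by the pass, and the pass may apply further profitability checks that this sketch omits.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical result type: the two 12-bit halves of a 24-bit immediate.
struct SplitImm {
  uint32_t Imm0; // used as "#imm0, lsl #12"
  uint32_t Imm1; // used as "#imm1"
};

// Returns the halves when Imm == (Imm0 << 12) + Imm1 and both halves are
// non-zero 12-bit unsigned values; returns std::nullopt otherwise.
std::optional<SplitImm> splitImm24(uint64_t Imm) {
  if ((Imm & ~0xffffffULL) != 0)
    return std::nullopt; // wider than 24 bits
  uint32_t Imm0 = static_cast<uint32_t>((Imm >> 12) & 0xfff);
  uint32_t Imm1 = static_cast<uint32_t>(Imm & 0xfff);
  if (Imm0 == 0 || Imm1 == 0)
    return std::nullopt; // a single add/sub immediate already suffices
  return SplitImm{Imm0, Imm1};
}
```

For the negated pattern, the same check would be applied to -imm and the ADD/SUB opcodes swapped.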

The return type of SplitAndOpcFunc had to change to an opcode pair so that
the first add/sub is the regular instruction and the second is the flag-setting
instruction. This required updating the code in the AND case.
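
A minimal standalone sketch of the opcode-pair idea, assuming hypothetical stand-in opcodes (`Opc`) and a hypothetical selector `pickOpcodes`; it does not use LLVM types. The first element of the pair is the plain instruction and the second is the flag-setting one.

```cpp
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical stand-ins for the real target opcodes.
enum Opc : unsigned { ADD, SUB, ADDS, SUBS };
using OpcodePair = std::pair<unsigned, unsigned>;

// True when V is a positive 24-bit value whose two 12-bit halves are non-zero.
static bool fitsTwoPartImm(int64_t V) {
  return V > 0 && (V & ~0xffffffLL) == 0 && (V & 0xfff000) != 0 &&
         (V & 0xfff) != 0;
}

// Picks {plain opcode, flag-setting opcode} for an ADDS or SUBS of Imm.
// A negative immediate flips the operation, as in the second pattern above.
std::optional<OpcodePair> pickOpcodes(int64_t Imm, bool IsAdds) {
  if (Imm == INT64_MIN)
    return std::nullopt; // negating it would overflow
  if (fitsTwoPartImm(Imm))
    return IsAdds ? OpcodePair{ADD, ADDS} : OpcodePair{SUB, SUBS};
  if (fitsTwoPartImm(-Imm))
    return IsAdds ? OpcodePair{SUB, SUBS} : OpcodePair{ADD, ADDS};
  return std::nullopt;
}
```

In this model the AND case would simply return the same opcode twice, since neither of its two instructions sets flags.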

Testing:

I ran a two stage bootstrap with this code.
Using the second stage compiler, I verified that the negation of an ADDS to SUBS
or vice versa is a valid optimization. Example V == -0x111111.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

red1bluelost created this revision. Jan 31 2022, 3:43 PM

Herald added subscribers: hiraditya, kristof.beyls. Jan 31 2022, 3:43 PM

red1bluelost requested review of this revision. Jan 31 2022, 3:43 PM

Herald added a project: Restricted Project. Jan 31 2022, 3:43 PM

Harbormaster completed remote builds in B146791: Diff 404757. Jan 31 2022, 6:25 PM

Nice patch, but I think we might need to check the condition that is used too. If we split a single SUBS into two instructions, the second of which sets the flags, then of NZCV, N (negative) and Z (zero) will still be valid because they produce the same results as before, but C (carry) and V (overflow) might differ from the original, given an input that overflows or carries on the first SUB but not on the second SUBS.

So I think all the tests here are valid, because they all use eq or ne conditions, but other conditions might not be.
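
The concern above can be illustrated with a small standalone model of 32-bit SUBS flag computation. This is a sketch, not code from the patch: `Flags`, `SubsResult`, `subs32`, and `splitSubs32` are hypothetical names, and the flag rules are the usual AArch64 ones (C means "no borrow", V means signed overflow).

```cpp
#include <cstdint>

struct Flags {
  bool N, Z, C, V;
};

struct SubsResult {
  uint32_t Value;
  Flags F;
};

// Models a 32-bit SUBS: computes A - B and the resulting NZCV flags.
SubsResult subs32(uint32_t A, uint32_t B) {
  uint32_t R = A - B;
  Flags F;
  F.N = (R >> 31) != 0;
  F.Z = R == 0;
  F.C = A >= B;                           // carry set means no borrow
  F.V = (((A ^ B) & (A ^ R)) >> 31) != 0; // signed overflow
  return {R, F};
}

// Models the split form: a plain SUB of (Imm0 << 12), then SUBS of Imm1.
SubsResult splitSubs32(uint32_t A, uint32_t Imm0, uint32_t Imm1) {
  uint32_t Tmp = A - (Imm0 << 12); // first instruction sets no flags
  return subs32(Tmp, Imm1);
}
```

With A = 0x1000 and the immediate 0x111111, both forms produce the same value and the same N and Z, but the single SUBS borrows (C clear) while the second instruction of the split form does not (C set).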

llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
418–419	Can we give these better register class names? A quick comment looks like it would be useful too, explaining the instruction we are adding and which registers they will use. // NewTmpReg = Opcode.first SrcReg // NewDstReg = Opcode.second NewTmpReg
434–435	This is technically assuming the two instructions have the same RegClass for Operand 1? I think that will always be the case, so maybe that's OK to keep as-is.
llvm/test/CodeGen/AArch64/arm64-instruction-mix-remarks.ll
14	Should the ldr still be present? And the orr was already present before? It looks like the test was already missing some input, but we might as well keep the ldr around.

benshi001 added inline comments. Feb 3 2022, 4:13 AM

llvm/test/CodeGen/AArch64/addsub.ll
410	Do we need to pre-commit these tests, and check the difference between before/after?

Ensures that optimizations only occur on EQ and NE.

Updates the peephole optimization to check subsequent instructions to ensure that
the condition is only used for EQ and NE cases. This is done after checking the
immediate value to avoid running unnecessarily.

Updates comments and naming in the code.

Adds tests that verify various instructions that would use the condition code.
Also a test for when the condition code is used in multiple instructions.

red1bluelost marked 3 inline comments as done. Feb 3 2022, 3:07 PM

red1bluelost added inline comments.

llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
379	I scan the instructions which read NZCV after the Adds/Subs and check that the condition code is just EQ or NE. These are the instructions that I was able to generate test cases for. Let me know if you think that there are other instructions that I should add.
434–435	I added it just to be safe.

Harbormaster completed remote builds in B147512: Diff 405806. Feb 3 2022, 4:42 PM

Someone once wrote examineCFlagsUse for doing something very similar in the optimizeCompareInstr (because these class as compare instructions). Can some of that be re-used here?

The alternative might be to scan backward through instructions in the main loop, keeping a track of which parts of NZCV are "alive" as we go, so we always have that info available.

llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
349	Will this always be AArch64::NZCV?
384	We can stop scanning as soon as the NZCV operand is marked as Dead too, I think.
387	On rare occasion the NZCV might be live-out of the block (which is probably enough reason to just return false, as opposed to trying to scan further blocks).

Uses examineCFlagsUse to check NZCV usages.

I was able to use the helper function but I had to slightly modify it to make
it general to any use of NZCV.
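
As a rough standalone model of that check (not the LLVM helper itself): each condition code reads a fixed subset of NZCV, the subsets of all later users are OR-ed together, and the split is rejected if C or V is read. `Cond`, `UsedFlags`, `flagsRead`, and `splitKeepsFlagsValid` are hypothetical names for this sketch.

```cpp
#include <initializer_list>

// AArch64 condition codes (AL/NV omitted for brevity).
enum class Cond { EQ, NE, HS, LO, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE };

struct UsedFlags {
  bool N = false, Z = false, C = false, V = false;
};

// Which of N, Z, C, V a given condition code reads.
UsedFlags flagsRead(Cond CC) {
  UsedFlags F;
  switch (CC) {
  case Cond::EQ: case Cond::NE: F.Z = true; break;
  case Cond::HS: case Cond::LO: F.C = true; break;
  case Cond::MI: case Cond::PL: F.N = true; break;
  case Cond::VS: case Cond::VC: F.V = true; break;
  case Cond::HI: case Cond::LS: F.Z = F.C = true; break;
  case Cond::GE: case Cond::LT: F.N = F.V = true; break;
  case Cond::GT: case Cond::LE: F.N = F.Z = F.V = true; break;
  }
  return F;
}

// True when no later user of the flags reads C or V, so the split form
// (whose N and Z match the original) stays valid.
bool splitKeepsFlagsValid(std::initializer_list<Cond> Users) {
  UsedFlags All;
  for (Cond CC : Users) {
    UsedFlags F = flagsRead(CC);
    All.N |= F.N;
    All.Z |= F.Z;
    All.C |= F.C;
    All.V |= F.V;
  }
  return !All.C && !All.V;
}
```

Note that a check of this shape also admits conditions that read only N (MI and PL), not just EQ and NE.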

In D118663#3300642, @dmgreen wrote:

Someone once wrote examineCFlagsUse for doing something very similar in the optimizeCompareInstr (because these class as compare instructions). Can some of that be re-used here?

The alternative might be to scan backward through instructions in the main loop, keeping a track of which parts of NZCV are "alive" as we go, so we always have that info available.

Thanks for the suggestion! I was able to use examineCFlagsUse. I had to give it external linkage so that I could access it in MIPeepholeOpt. I also made it general to any NZCV case which required slight changes to the preexisting call sites.

llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
349	`examineCFlagsUse` makes this no longer relevant.
384	`examineCFlagsUse` makes this no longer relevant.
387	`examineCFlagsUse` takes care of this.

red1bluelost marked 3 inline comments as done. Feb 10 2022, 2:54 PM

Harbormaster completed remote builds in B148853: Diff 407691. Feb 10 2022, 3:46 PM

Thanks for the updates. Nice work.

This LGTM as far as I can see.

This revision is now accepted and ready to land. Feb 11 2022, 11:50 AM

In D118663#3315078, @dmgreen wrote:

Thanks for the updates. Nice work.

This LGTM as far as I can see.

Thanks! Could you commit this on my behalf when you have a chance? (Micah Weston - micahsweston@gmail.com)

This revision was landed with ongoing or failed builds. Feb 11 2022, 7:14 PM

Closed by commit rGaf45d0fd94b2: [AArch64] Adds SUBS and ADDS instructions to the MIPeepholeOpt. (authored by red1bluelost, committed by benshi001).

This revision was automatically updated to reflect the committed changes.

benshi001 added a commit: rGaf45d0fd94b2: [AArch64] Adds SUBS and ADDS instructions to the MIPeepholeOpt..

nathanchance added a reverting change: rG22eb1dae3fb2: Revert "[AArch64] Adds SUBS and ADDS instructions to the MIPeepholeOpt.". Feb 13 2022, 9:40 AM

I have reverted this in 22eb1dae3fb20ca8ada865de1d95baab0e08a060, as it causes assertion failures when building the Linux kernel, which has caused our CI to go red:

https://github.com/ClangBuiltLinux/continuous-integration2/actions/runs/1836780929
https://builds.tuxbuild.com/253UOEmSLgpZWbJd3IJsK96YDq5/build.log

A reduced reproducer from cvise:

$ cat neighbour.i
neigh_periodic_work_tbl_1() {
  if ((long)neigh_periodic_work_tbl_1 + 300 * 250 < 0)
    for (;;)
      ;
}

$ clang -O2 --target=aarch64-linux-gnu -c -o /dev/null neighbour.i
...
clang: /home/nathan/cbl/src/llvm-project/llvm/include/llvm/CodeGen/Register.h:78: static unsigned int llvm::Register::virtReg2Index(llvm::Register): Assertion `isVirtualRegister(Reg) && "Not a virtual register"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: clang -O2 --target=aarch64-linux-gnu -c -o /dev/null neighbour.i
1.      <eof> parser at end of file
2.      Code generation
3.      Running pass 'Function Pass Manager' on module 'neighbour.i'.
4.      Running pass 'AArch64 MI Peephole Optimization pass' on function '@neigh_periodic_work_tbl_1'
...

If there is any other information that I can provide or patches I can test, please let me know.

In D118663#3317794, @nathanchance wrote:

I have reverted this in 22eb1dae3fb20ca8ada865de1d95baab0e08a060, as it causes assertion failures when building the Linux kernel, which has caused our CI to go red:

...

If there is any other information that I can provide or patches I can test, please let me know.

Thank you for reverting! I will look into this.

This revision is now accepted and ready to land. Feb 13 2022, 9:51 AM

Fixes issue found in ClangBuiltLinux.

It turns out that DstReg can be a physical register, which causes
MRI->getRegClass(DstReg) to hit an assertion failure. This was verified with
a new test case based on the ClangBuiltLinux IR output.

As a solution, DstReg only constrains NewDstReg if DstReg is virtual.
The cases where it isn't virtual are those where the output goes to XZR or WZR.

Harbormaster completed remote builds in B149298: Diff 408294. Feb 13 2022, 2:30 PM

Thank you for the quick fix! Unfortunately, I still see an assertion failure with this patch while building at least allnoconfig and allmodconfig + ThinLTO kernels, albeit a different one. See below and let me know if there are any problems with reproducing:

$ cat random.i
jiffies, primary_crng, input_pool;
_extract_crng_crng() {
  if ((long)_extract_crng_crng < 0 ||
      (long)(_extract_crng_crng + 300 * 250 - jiffies) < 0)
    crng_reseed(primary_crng ? &input_pool : 0);
}

$ clang -O2 --target=aarch64-linux-gnu -c -o /dev/null random.i
...
clang: /home/nathan/cbl/src/llvm-project/llvm/lib/CodeGen/LiveVariables.cpp:111: void llvm::LiveVariables::MarkVirtRegAliveInBlock(llvm::LiveVariables::VarInfo &, llvm::MachineBasicBlock *, llvm::MachineBasicBlock *, SmallVectorImpl<llvm::MachineBasicBlock *> &): Assertion `MBB != &MF->front() && "Can't find reaching def for virtreg"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: clang -O2 --target=aarch64-linux-gnu -c -o /dev/null random.i
1.      <eof> parser at end of file
2.      Code generation
3.      Running pass 'Function Pass Manager' on module 'random.i'.
4.      Running pass 'Live Variable Analysis' on function '@_extract_crng_crng'
...

Fixes cross-basic-block issue found in ClangBuiltLinux.

The issue was that the physical register was being replaced, which breaks coherency
across basic blocks. The scenario was added as a test case. Before, the XZR register
was replaced everywhere with a new virtual register, corrupting the MIR.

In the case where DstReg is physical, we should reuse it rather than making a new
virtual register. This fix ensures that a physical DstReg is preserved while
a virtual DstReg is replaced with a new virtual register.
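
The rule described above can be sketched in a toy model. Everything here is an assumption made for illustration: the top-bit tag for virtual registers, `VRegPool`, and `chooseNewDstReg` are hypothetical and stand in for the real register-info API.

```cpp
#include <cstdint>

// Toy encoding chosen for this sketch: virtual registers have the top bit set.
constexpr uint32_t kVirtualTag = 1u << 31;

bool isVirtualReg(uint32_t Reg) { return (Reg & kVirtualTag) != 0; }

// Hypothetical allocator handing out fresh virtual register numbers.
struct VRegPool {
  uint32_t Next = 0;
  uint32_t create() { return kVirtualTag | Next++; }
};

// A physical destination (e.g. a zero register) is reused as-is; a virtual
// destination is replaced by a freshly created virtual register.
uint32_t chooseNewDstReg(uint32_t DstReg, VRegPool &Pool) {
  return isVirtualReg(DstReg) ? Pool.create() : DstReg;
}
```

Reusing the physical register keeps any uses in other basic blocks consistent, which is the problem the second reproducer exposed.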

Thank you for testing and finding the errors!

Harbormaster completed remote builds in B149564: Diff 408675. Feb 14 2022, 6:31 PM

The latest revision passes all of my AArch64 Linux kernel builds with assertions enabled and boots on bare metal, thank you for fixing all the issues!

In D118663#3323737, @nathanchance wrote:

The latest revision passes all of my AArch64 Linux kernel builds with assertions enabled and boots on bare metal, thank you for fixing all the issues!

That's great to hear! Thank you for testing it on bare metal!

If the changes look good to the reviewers, then it just needs to be recommitted on my behalf (Micah Weston - micahsweston@gmail).

Sorry - I nearly missed this. Yeah it sounds OK to me. It makes sense we would need to treat physical registers differently. It's a little surprising that didn't come up from the existing tests.

LGTM. I'll commit this now.

dmgreen accepted this revision. Feb 19 2022, 7:23 AM

This revision was landed with ongoing or failed builds. Feb 19 2022, 7:36 AM

Closed by commit rGc69af70f02f2: [AArch64] Adds SUBS and ADDS instructions to the MIPeepholeOpt. (authored by red1bluelost, committed by dmgreen).

This revision was automatically updated to reflect the committed changes.

dmgreen added a commit: rGc69af70f02f2: [AArch64] Adds SUBS and ADDS instructions to the MIPeepholeOpt..

Revision Contents

Path                                                         Size
llvm/lib/Target/AArch64/AArch64InstrInfo.h                   27 lines
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp                 42 lines
llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp             150 lines
llvm/test/CodeGen/AArch64/addsub.ll                          290 lines
llvm/test/CodeGen/AArch64/arm64-instruction-mix-remarks.ll   15 lines

Diff 410087

llvm/lib/Target/AArch64/AArch64InstrInfo.h

Show First 20 Lines • Show All 356 Lines • ▼ Show 20 Lines	private:

/// Remove a ptest of a predicate-generating operation that already sets, or		/// Remove a ptest of a predicate-generating operation that already sets, or
/// can be made to set, the condition codes in an identical manner		/// can be made to set, the condition codes in an identical manner
bool optimizePTestInstr(MachineInstr *PTest, unsigned MaskReg,		bool optimizePTestInstr(MachineInstr *PTest, unsigned MaskReg,
unsigned PredReg,		unsigned PredReg,
const MachineRegisterInfo *MRI) const;		const MachineRegisterInfo *MRI) const;
};		};

		struct UsedNZCV {
		bool N = false;
		bool Z = false;
		bool C = false;
		bool V = false;

		UsedNZCV() = default;

		UsedNZCV &operator|=(const UsedNZCV &UsedFlags) {
		this->N |= UsedFlags.N;
		this->Z |= UsedFlags.Z;
		this->C |= UsedFlags.C;
		this->V |= UsedFlags.V;
		return *this;
		}
		};

		/// \returns Conditions flags used after \p CmpInstr in its MachineBB if NZCV
		/// flags are not alive in successors of the same \p CmpInstr and \p MI parent.
		/// \returns None otherwise.
		///
		/// Collect instructions using that flags in \p CCUseInstrs if provided.
		Optional<UsedNZCV>
		examineCFlagsUse(MachineInstr &MI, MachineInstr &CmpInstr,
		const TargetRegisterInfo &TRI,
		SmallVectorImpl<MachineInstr *> *CCUseInstrs = nullptr);

/// Return true if there is an instruction /after/ \p DefMI and before \p UseMI		/// Return true if there is an instruction /after/ \p DefMI and before \p UseMI
/// which either reads or clobbers NZCV.		/// which either reads or clobbers NZCV.
bool isNZCVTouchedInInstructionRange(const MachineInstr &DefMI,		bool isNZCVTouchedInInstructionRange(const MachineInstr &DefMI,
const MachineInstr &UseMI,		const MachineInstr &UseMI,
const TargetRegisterInfo *TRI);		const TargetRegisterInfo *TRI);

/// emitFrameOffset - Emit instructions as needed to set DestReg to SrcReg		/// emitFrameOffset - Emit instructions as needed to set DestReg to SrcReg
/// plus Offset. This is intended to be used from within the prolog/epilog		/// plus Offset. This is intended to be used from within the prolog/epilog

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp

Show First 20 Lines • Show All 1,541 Lines • ▼ Show 20 Lines	findCondCodeUseOperandIdxForBranchOrSelect(const MachineInstr &Instr) {
case AArch64::FCSELDrrr: {		case AArch64::FCSELDrrr: {
int Idx = Instr.findRegisterUseOperandIdx(AArch64::NZCV);		int Idx = Instr.findRegisterUseOperandIdx(AArch64::NZCV);
assert(Idx >= 1);		assert(Idx >= 1);
return Idx - 1;		return Idx - 1;
}		}
}		}
}		}

namespace {

struct UsedNZCV {
bool N = false;
bool Z = false;
bool C = false;
bool V = false;

UsedNZCV() = default;

UsedNZCV &operator|=(const UsedNZCV &UsedFlags) {
this->N |= UsedFlags.N;
this->Z |= UsedFlags.Z;
this->C |= UsedFlags.C;
this->V |= UsedFlags.V;
return *this;
}
};

} // end anonymous namespace

/// Find a condition code used by the instruction.		/// Find a condition code used by the instruction.
/// Returns AArch64CC::Invalid if either the instruction does not use condition		/// Returns AArch64CC::Invalid if either the instruction does not use condition
/// codes or we don't optimize CmpInstr in the presence of such instructions.		/// codes or we don't optimize CmpInstr in the presence of such instructions.
static AArch64CC::CondCode findCondCodeUsedByInstr(const MachineInstr &Instr) {		static AArch64CC::CondCode findCondCodeUsedByInstr(const MachineInstr &Instr) {
int CCIdx = findCondCodeUseOperandIdxForBranchOrSelect(Instr);		int CCIdx = findCondCodeUseOperandIdxForBranchOrSelect(Instr);
return CCIdx >= 0 ? static_cast<AArch64CC::CondCode>(		return CCIdx >= 0 ? static_cast<AArch64CC::CondCode>(
Instr.getOperand(CCIdx).getImm())		Instr.getOperand(CCIdx).getImm())
: AArch64CC::Invalid;		: AArch64CC::Invalid;
Show All 38 Lines	static UsedNZCV getUsedNZCV(AArch64CC::CondCode CC) {
case AArch64CC::LT: // N and V differ		case AArch64CC::LT: // N and V differ
UsedFlags.N = true;		UsedFlags.N = true;
UsedFlags.V = true;		UsedFlags.V = true;
break;		break;
}		}
return UsedFlags;		return UsedFlags;
}		}

/// \returns Conditions flags used after \p CmpInstr in its MachineBB if they		/// \returns Conditions flags used after \p CmpInstr in its MachineBB if NZCV
/// are not containing C or V flags and NZCV flags are not alive in successors		/// flags are not alive in successors of the same \p CmpInstr and \p MI parent.
/// of the same \p CmpInstr and \p MI parent. \returns None otherwise.		/// \returns None otherwise.
///		///
/// Collect instructions using that flags in \p CCUseInstrs if provided.		/// Collect instructions using that flags in \p CCUseInstrs if provided.
static Optional<UsedNZCV>		Optional<UsedNZCV>
examineCFlagsUse(MachineInstr &MI, MachineInstr &CmpInstr,		llvm::examineCFlagsUse(MachineInstr &MI, MachineInstr &CmpInstr,
const TargetRegisterInfo &TRI,		const TargetRegisterInfo &TRI,
SmallVectorImpl<MachineInstr *> *CCUseInstrs = nullptr) {		SmallVectorImpl<MachineInstr *> *CCUseInstrs) {
MachineBasicBlock *CmpParent = CmpInstr.getParent();		MachineBasicBlock *CmpParent = CmpInstr.getParent();
if (MI.getParent() != CmpParent)		if (MI.getParent() != CmpParent)
return None;		return None;

if (areCFlagsAliveInSuccessors(CmpParent))		if (areCFlagsAliveInSuccessors(CmpParent))
return None;		return None;

UsedNZCV NZCVUsedAfterCmp;		UsedNZCV NZCVUsedAfterCmp;
for (MachineInstr &Instr : instructionsWithoutDebug(		for (MachineInstr &Instr : instructionsWithoutDebug(
std::next(CmpInstr.getIterator()), CmpParent->instr_end())) {		std::next(CmpInstr.getIterator()), CmpParent->instr_end())) {
if (Instr.readsRegister(AArch64::NZCV, &TRI)) {		if (Instr.readsRegister(AArch64::NZCV, &TRI)) {
AArch64CC::CondCode CC = findCondCodeUsedByInstr(Instr);		AArch64CC::CondCode CC = findCondCodeUsedByInstr(Instr);
if (CC == AArch64CC::Invalid) // Unsupported conditional instruction		if (CC == AArch64CC::Invalid) // Unsupported conditional instruction
return None;		return None;
NZCVUsedAfterCmp |= getUsedNZCV(CC);		NZCVUsedAfterCmp |= getUsedNZCV(CC);
if (CCUseInstrs)		if (CCUseInstrs)
CCUseInstrs->push_back(&Instr);		CCUseInstrs->push_back(&Instr);
}		}
if (Instr.modifiesRegister(AArch64::NZCV, &TRI))		if (Instr.modifiesRegister(AArch64::NZCV, &TRI))
break;		break;
}		}
if (NZCVUsedAfterCmp.C || NZCVUsedAfterCmp.V)
return None;
return NZCVUsedAfterCmp;		return NZCVUsedAfterCmp;
}		}

static bool isADDSRegImm(unsigned Opcode) {		static bool isADDSRegImm(unsigned Opcode) {
return Opcode == AArch64::ADDSWri || Opcode == AArch64::ADDSXri;		return Opcode == AArch64::ADDSWri || Opcode == AArch64::ADDSXri;
}		}

static bool isSUBSRegImm(unsigned Opcode) {		static bool isSUBSRegImm(unsigned Opcode) {
Show All 14 Lines
static bool canInstrSubstituteCmpInstr(MachineInstr &MI, MachineInstr &CmpInstr,		static bool canInstrSubstituteCmpInstr(MachineInstr &MI, MachineInstr &CmpInstr,
const TargetRegisterInfo &TRI) {		const TargetRegisterInfo &TRI) {
assert(sForm(MI) != AArch64::INSTRUCTION_LIST_END);		assert(sForm(MI) != AArch64::INSTRUCTION_LIST_END);

const unsigned CmpOpcode = CmpInstr.getOpcode();		const unsigned CmpOpcode = CmpInstr.getOpcode();
if (!isADDSRegImm(CmpOpcode) && !isSUBSRegImm(CmpOpcode))		if (!isADDSRegImm(CmpOpcode) && !isSUBSRegImm(CmpOpcode))
return false;		return false;

if (!examineCFlagsUse(MI, CmpInstr, TRI))		Optional<UsedNZCV> NZVCUsed = examineCFlagsUse(MI, CmpInstr, TRI);
		if (!NZVCUsed || NZVCUsed->C || NZVCUsed->V)
return false;		return false;

AccessKind AccessToCheck = AK_Write;		AccessKind AccessToCheck = AK_Write;
if (sForm(MI) != MI.getOpcode())		if (sForm(MI) != MI.getOpcode())
AccessToCheck = AK_All;		AccessToCheck = AK_All;
return !areCFlagsAccessedBetweenInstrs(&MI, &CmpInstr, &TRI, AccessToCheck);		return !areCFlagsAccessedBetweenInstrs(&MI, &CmpInstr, &TRI, AccessToCheck);
}		}

▲ Show 20 Lines • Show All 72 Lines • ▼ Show 20 Lines	static bool canCmpInstrBeRemoved(MachineInstr &MI, MachineInstr &CmpInstr,
UsedNZCV MIUsedNZCV = getUsedNZCV(MICC);		UsedNZCV MIUsedNZCV = getUsedNZCV(MICC);
if (MIUsedNZCV.C || MIUsedNZCV.V)		if (MIUsedNZCV.C || MIUsedNZCV.V)
return false;		return false;

Optional<UsedNZCV> NZCVUsedAfterCmp =		Optional<UsedNZCV> NZCVUsedAfterCmp =
examineCFlagsUse(MI, CmpInstr, TRI, &CCUseInstrs);		examineCFlagsUse(MI, CmpInstr, TRI, &CCUseInstrs);
// Condition flags are not used in CmpInstr basic block successors and only		// Condition flags are not used in CmpInstr basic block successors and only
// Z or N flags allowed to be used after CmpInstr within its basic block		// Z or N flags allowed to be used after CmpInstr within its basic block
if (!NZCVUsedAfterCmp)		if (!NZCVUsedAfterCmp || NZCVUsedAfterCmp->C || NZCVUsedAfterCmp->V)
return false;		return false;
// Z or N flag used after CmpInstr must correspond to the flag used in MI		// Z or N flag used after CmpInstr must correspond to the flag used in MI
if ((MIUsedNZCV.Z && NZCVUsedAfterCmp->N) ||		if ((MIUsedNZCV.Z && NZCVUsedAfterCmp->N) ||
(MIUsedNZCV.N && NZCVUsedAfterCmp->Z))		(MIUsedNZCV.N && NZCVUsedAfterCmp->Z))
return false;		return false;
// If CmpInstr is comparison to zero MI conditions are limited to eq, ne		// If CmpInstr is comparison to zero MI conditions are limited to eq, ne
if (MIUsedNZCV.N && !CmpValue)		if (MIUsedNZCV.N && !CmpValue)
return false;		return false;

llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp

Show First 20 Lines • Show All 54 Lines • ▼ Show 20 Lines	AArch64MIPeepholeOpt() : MachineFunctionPass(ID) {
initializeAArch64MIPeepholeOptPass(*PassRegistry::getPassRegistry());		initializeAArch64MIPeepholeOptPass(*PassRegistry::getPassRegistry());
}		}

const AArch64InstrInfo *TII;		const AArch64InstrInfo *TII;
const AArch64RegisterInfo *TRI;		const AArch64RegisterInfo *TRI;
MachineLoopInfo *MLI;		MachineLoopInfo *MLI;
MachineRegisterInfo *MRI;		MachineRegisterInfo *MRI;

		using OpcodePair = std::pair<unsigned, unsigned>;
template <typename T>		template <typename T>
using SplitAndOpcFunc =		using SplitAndOpcFunc =
std::function<Optional<unsigned>(T, unsigned, T &, T &)>;		std::function<Optional<OpcodePair>(T, unsigned, T &, T &)>;
using BuildMIFunc =		using BuildMIFunc =
std::function<void(MachineInstr &, unsigned, unsigned, unsigned, Register,		std::function<void(MachineInstr &, OpcodePair, unsigned, unsigned,
Register, Register)>;		Register, Register, Register)>;

/// For instructions where an immediate operand could be split into two		/// For instructions where an immediate operand could be split into two
/// separate immediate instructions, use the splitTwoPartImm two handle the		/// separate immediate instructions, use the splitTwoPartImm two handle the
/// optimization.		/// optimization.
///		///
/// To implement, the following function types must be passed to		/// To implement, the following function types must be passed to
/// splitTwoPartImm. A SplitAndOpcFunc must be implemented that determines if		/// splitTwoPartImm. A SplitAndOpcFunc must be implemented that determines if
/// splitting the immediate is valid and returns the associated new opcode. A		/// splitting the immediate is valid and returns the associated new opcode. A
Show All 11 Lines	struct AArch64MIPeepholeOpt : public MachineFunctionPass {

bool checkMovImmInstr(MachineInstr &MI, MachineInstr *&MovMI,		bool checkMovImmInstr(MachineInstr &MI, MachineInstr *&MovMI,
MachineInstr *&SubregToRegMI);		MachineInstr *&SubregToRegMI);

template <typename T>		template <typename T>
bool visitADDSUB(unsigned PosOpc, unsigned NegOpc, MachineInstr &MI,		bool visitADDSUB(unsigned PosOpc, unsigned NegOpc, MachineInstr &MI,
SmallSetVector<MachineInstr *, 8> &ToBeRemoved);		SmallSetVector<MachineInstr *, 8> &ToBeRemoved);
template <typename T>		template <typename T>
		bool visitADDSSUBS(OpcodePair PosOpcs, OpcodePair NegOpcs, MachineInstr &MI,
		SmallSetVector<MachineInstr *, 8> &ToBeRemoved);

		template <typename T>
bool visitAND(unsigned Opc, MachineInstr &MI,		bool visitAND(unsigned Opc, MachineInstr &MI,
SmallSetVector<MachineInstr *, 8> &ToBeRemoved);		SmallSetVector<MachineInstr *, 8> &ToBeRemoved);
bool visitORR(MachineInstr &MI,		bool visitORR(MachineInstr &MI,
SmallSetVector<MachineInstr *, 8> &ToBeRemoved);		SmallSetVector<MachineInstr *, 8> &ToBeRemoved);
bool runOnMachineFunction(MachineFunction &MF) override;		bool runOnMachineFunction(MachineFunction &MF) override;

StringRef getPassName() const override {		StringRef getPassName() const override {
return "AArch64 MI Peephole Optimization pass";		return "AArch64 MI Peephole Optimization pass";
▲ Show 20 Lines • Show All 62 Lines • ▼ Show 20 Lines	bool AArch64MIPeepholeOpt::visitAND(
//		//
// The mov pseudo instruction could be expanded to multiple mov instructions		// The mov pseudo instruction could be expanded to multiple mov instructions
// later. Let's try to split the constant operand of mov instruction into two		// later. Let's try to split the constant operand of mov instruction into two
// bitmask immediates. It makes only two AND instructions intead of multiple		// bitmask immediates. It makes only two AND instructions intead of multiple
// mov + and instructions.		// mov + and instructions.

return splitTwoPartImm<T>(		return splitTwoPartImm<T>(
MI, ToBeRemoved,		MI, ToBeRemoved,
[Opc](T Imm, unsigned RegSize, T &Imm0, T &Imm1) -> Optional<unsigned> {		[Opc](T Imm, unsigned RegSize, T &Imm0, T &Imm1) -> Optional<OpcodePair> {
if (splitBitmaskImm(Imm, RegSize, Imm0, Imm1))		if (splitBitmaskImm(Imm, RegSize, Imm0, Imm1))
return Opc;		return std::make_pair(Opc, Opc);
return None;		return None;
},		},
[&TII = TII](MachineInstr &MI, unsigned Opcode, unsigned Imm0,		[&TII = TII](MachineInstr &MI, OpcodePair Opcode, unsigned Imm0,
unsigned Imm1, Register SrcReg, Register NewTmpReg,		unsigned Imm1, Register SrcReg, Register NewTmpReg,
Register NewDstReg) {		Register NewDstReg) {
DebugLoc DL = MI.getDebugLoc();		DebugLoc DL = MI.getDebugLoc();
MachineBasicBlock *MBB = MI.getParent();		MachineBasicBlock *MBB = MI.getParent();
BuildMI(*MBB, MI, DL, TII->get(Opcode), NewTmpReg)		BuildMI(*MBB, MI, DL, TII->get(Opcode.first), NewTmpReg)
.addReg(SrcReg)		.addReg(SrcReg)
.addImm(Imm0);		.addImm(Imm0);
BuildMI(*MBB, MI, DL, TII->get(Opcode), NewDstReg)		BuildMI(*MBB, MI, DL, TII->get(Opcode.second), NewDstReg)
.addReg(NewTmpReg)		.addReg(NewTmpReg)
.addImm(Imm1);		.addImm(Imm1);
});		});
}		}

bool AArch64MIPeepholeOpt::visitORR(		bool AArch64MIPeepholeOpt::visitORR(
MachineInstr &MI, SmallSetVector<MachineInstr *, 8> &ToBeRemoved) {		MachineInstr &MI, SmallSetVector<MachineInstr *, 8> &ToBeRemoved) {
// Check this ORR comes from below zero-extend pattern.		// Check this ORR comes from below zero-extend pattern.
▲ Show 20 Lines • Show All 72 Lines • ▼ Show 20 Lines	bool AArch64MIPeepholeOpt::visitADDSUB(
// The mov pseudo instruction could be expanded to multiple mov instructions		// The mov pseudo instruction could be expanded to multiple mov instructions
// later. Let's try to split the constant operand of mov instruction into two		// later. Let's try to split the constant operand of mov instruction into two
// legal add/sub immediates. It makes only two ADD/SUB instructions intead of		// legal add/sub immediates. It makes only two ADD/SUB instructions intead of
// multiple `mov` + `and/sub` instructions.		// multiple `mov` + `and/sub` instructions.

return splitTwoPartImm<T>(		return splitTwoPartImm<T>(
MI, ToBeRemoved,		MI, ToBeRemoved,
[PosOpc, NegOpc](T Imm, unsigned RegSize, T &Imm0,		[PosOpc, NegOpc](T Imm, unsigned RegSize, T &Imm0,
T &Imm1) -> Optional<unsigned> {		T &Imm1) -> Optional<OpcodePair> {
if (splitAddSubImm(Imm, RegSize, Imm0, Imm1))		if (splitAddSubImm(Imm, RegSize, Imm0, Imm1))
return PosOpc;		return std::make_pair(PosOpc, PosOpc);
if (splitAddSubImm(-Imm, RegSize, Imm0, Imm1))		if (splitAddSubImm(-Imm, RegSize, Imm0, Imm1))
return NegOpc;		return std::make_pair(NegOpc, NegOpc);
		return None;
		},
		[&TII = TII](MachineInstr &MI, OpcodePair Opcode, unsigned Imm0,
		unsigned Imm1, Register SrcReg, Register NewTmpReg,
		Register NewDstReg) {
		DebugLoc DL = MI.getDebugLoc();
		MachineBasicBlock *MBB = MI.getParent();
		BuildMI(*MBB, MI, DL, TII->get(Opcode.first), NewTmpReg)
		.addReg(SrcReg)
		.addImm(Imm0)
		.addImm(12);
		BuildMI(*MBB, MI, DL, TII->get(Opcode.second), NewDstReg)
		.addReg(NewTmpReg)
		.addImm(Imm1)
		.addImm(0);
		});
		}

		template <typename T>
		bool AArch64MIPeepholeOpt::visitADDSSUBS(
		OpcodePair PosOpcs, OpcodePair NegOpcs, MachineInstr &MI,
		SmallSetVector<MachineInstr *, 8> &ToBeRemoved) {
		// Try the same transformation as ADDSUB but with additional requirement
		// that the condition code usages are only for Equal and Not Equal
		return splitTwoPartImm<T>(
		MI, ToBeRemoved,
		[PosOpcs, NegOpcs, &MI, &TRI = TRI, &MRI = MRI](
		T Imm, unsigned RegSize, T &Imm0, T &Imm1) -> Optional<OpcodePair> {
		OpcodePair OP;
		if (splitAddSubImm(Imm, RegSize, Imm0, Imm1))
		OP = PosOpcs;
		else if (splitAddSubImm(-Imm, RegSize, Imm0, Imm1))
		OP = NegOpcs;
		else
		return None;
		// Check conditional uses last since it is expensive for scanning
		// proceeding instructions
		MachineInstr &SrcMI = *MRI->getUniqueVRegDef(MI.getOperand(1).getReg());
		Optional<UsedNZCV> NZCVUsed = examineCFlagsUse(SrcMI, MI, *TRI);
		if (!NZCVUsed || NZCVUsed->C || NZCVUsed->V)
return None;		return None;
		return OP;
},		},
[&TII = TII](MachineInstr &MI, unsigned Opcode, unsigned Imm0,		[&TII = TII](MachineInstr &MI, OpcodePair Opcode, unsigned Imm0,
unsigned Imm1, Register SrcReg, Register NewTmpReg,		unsigned Imm1, Register SrcReg, Register NewTmpReg,
Register NewDstReg) {		Register NewDstReg) {
DebugLoc DL = MI.getDebugLoc();		DebugLoc DL = MI.getDebugLoc();
MachineBasicBlock *MBB = MI.getParent();		MachineBasicBlock *MBB = MI.getParent();
BuildMI(*MBB, MI, DL, TII->get(Opcode), NewTmpReg)		BuildMI(*MBB, MI, DL, TII->get(Opcode.first), NewTmpReg)
.addReg(SrcReg)		.addReg(SrcReg)
.addImm(Imm0)		.addImm(Imm0)
.addImm(12);		.addImm(12);
BuildMI(*MBB, MI, DL, TII->get(Opcode), NewDstReg)		BuildMI(*MBB, MI, DL, TII->get(Opcode.second), NewDstReg)
.addReg(NewTmpReg)		.addReg(NewTmpReg)
.addImm(Imm1)		.addImm(Imm1)
.addImm(0);		.addImm(0);
});		});
}		}

// Checks if the corresponding MOV immediate instruction is applicable for		// Checks if the corresponding MOV immediate instruction is applicable for
// this peephole optimization.		// this peephole optimization.
bool AArch64MIPeepholeOpt::checkMovImmInstr(MachineInstr &MI,		bool AArch64MIPeepholeOpt::checkMovImmInstr(MachineInstr &MI,
MachineInstr *&MovMI,		MachineInstr *&MovMI,
MachineInstr *&SubregToRegMI) {		MachineInstr *&SubregToRegMI) {
		dmgreen: Will this always be AArch64::NZCV?
		red1bluelost: `examineCFlagsUse` makes this no longer relevant.
// Check whether current MBB is in loop and the AND is loop invariant.		// Check whether current MBB is in loop and the AND is loop invariant.
MachineBasicBlock *MBB = MI.getParent();		MachineBasicBlock *MBB = MI.getParent();
MachineLoop *L = MLI->getLoopFor(MBB);		MachineLoop *L = MLI->getLoopFor(MBB);
if (L && !L->isLoopInvariant(MI))		if (L && !L->isLoopInvariant(MI))
return false;		return false;

// Check whether current MI's operand is MOV with immediate.		// Check whether current MI's operand is MOV with immediate.
MovMI = MRI->getUniqueVRegDef(MI.getOperand(2).getReg());		MovMI = MRI->getUniqueVRegDef(MI.getOperand(2).getReg());
(13 lines not shown)	if (MovMI->getOpcode() != AArch64::MOVi32imm &&
MovMI->getOpcode() != AArch64::MOVi64imm)		MovMI->getOpcode() != AArch64::MOVi64imm)
return false;		return false;

// If the MOV has multiple uses, do not split the immediate because it causes		// If the MOV has multiple uses, do not split the immediate because it causes
// more instructions.		// more instructions.
if (!MRI->hasOneUse(MovMI->getOperand(0).getReg()))		if (!MRI->hasOneUse(MovMI->getOperand(0).getReg()))
return false;		return false;
if (SubregToRegMI && !MRI->hasOneUse(SubregToRegMI->getOperand(0).getReg()))		if (SubregToRegMI && !MRI->hasOneUse(SubregToRegMI->getOperand(0).getReg()))
return false;		return false;
		red1bluelost: I scan the instructions which read NZCV after the ADDS/SUBS and check that the condition code is just EQ or NE. These are the instructions that I was able to generate test cases for. Let me know if you think that there are other instructions that I should add.

// It is OK to perform this peephole optimization.		// It is OK to perform this peephole optimization.
return true;		return true;
}		}

		dmgreen: We can stop scanning as soon as the NZCV operand is marked as Dead too, I think.
		red1bluelost: `examineCFlagsUse` makes this no longer relevant.
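
To illustrate why the check in the lambda rejects users of the C and V flags, the following sketch models a 32-bit SUBS and the split SUB + SUBS sequence. `NZCV`, `subsFlags`, and `splitSubsFlags` are hypothetical names used for this illustration only. Both sequences compute the same final value, so N and Z agree, but carry and overflow are derived from different operands and can differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag record for this sketch.
struct NZCV {
  bool N, Z, C, V;
};

// Flags of a 32-bit SUBS computing A - B. For subtraction, C is set when no
// borrow occurs and V is set on signed overflow.
inline NZCV subsFlags(uint32_t A, uint32_t B) {
  uint32_t R = A - B;
  bool N = (R >> 31) != 0;
  bool Z = R == 0;
  bool C = A >= B;
  bool V = (((A ^ B) & (A ^ R)) >> 31) != 0;
  return NZCV{N, Z, C, V};
}

// Flags of the split sequence:
//   SUB  Tmp, X, #Imm0, lsl #12   (does not set flags)
//   SUBS _,   Tmp, #Imm1          (sets flags)
inline NZCV splitSubsFlags(uint32_t X, uint32_t Imm0, uint32_t Imm1) {
  uint32_t Tmp = X - (Imm0 << 12);
  return subsFlags(Tmp, Imm1);
}
```

For example, with X = 0 and the immediate 1118481, the direct compare clears C because a borrow occurs, while the final SUBS of the split sequence sets it. N and Z are identical in both cases.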
template <typename T>		template <typename T>
bool AArch64MIPeepholeOpt::splitTwoPartImm(		bool AArch64MIPeepholeOpt::splitTwoPartImm(
MachineInstr &MI, SmallSetVector<MachineInstr *, 8> &ToBeRemoved,		MachineInstr &MI, SmallSetVector<MachineInstr *, 8> &ToBeRemoved,
		dmgreen: On rare occasions the NZCV might be live-out of the block (which is probably enough reason to just return false, as opposed to trying to scan further blocks).
		red1bluelost: `examineCFlagsUse` takes care of this.
SplitAndOpcFunc<T> SplitAndOpc, BuildMIFunc BuildInstr) {		SplitAndOpcFunc<T> SplitAndOpc, BuildMIFunc BuildInstr) {
unsigned RegSize = sizeof(T) * 8;		unsigned RegSize = sizeof(T) * 8;
assert((RegSize == 32 || RegSize == 64) &&		assert((RegSize == 32 || RegSize == 64) &&
"Invalid RegSize for legal immediate peephole optimization");		"Invalid RegSize for legal immediate peephole optimization");

// Perform several essential checks against current MI.		// Perform several essential checks against current MI.
MachineInstr *MovMI, *SubregToRegMI;		MachineInstr *MovMI, *SubregToRegMI;
if (!checkMovImmInstr(MI, MovMI, SubregToRegMI))		if (!checkMovImmInstr(MI, MovMI, SubregToRegMI))
return false;		return false;

// Split the immediate to Imm0 and Imm1, and calculate the Opcode.		// Split the immediate to Imm0 and Imm1, and calculate the Opcode.
T Imm = static_cast<T>(MovMI->getOperand(1).getImm()), Imm0, Imm1;		T Imm = static_cast<T>(MovMI->getOperand(1).getImm()), Imm0, Imm1;
// For the 32 bit form of instruction, the upper 32 bits of the destination		// For the 32 bit form of instruction, the upper 32 bits of the destination
// register are set to zero. If there is SUBREG_TO_REG, set the upper 32 bits		// register are set to zero. If there is SUBREG_TO_REG, set the upper 32 bits
// of Imm to zero. This is essential if the Immediate value was a negative		// of Imm to zero. This is essential if the Immediate value was a negative
// number since it was sign extended when we assign to the 64-bit Imm.		// number since it was sign extended when we assign to the 64-bit Imm.
if (SubregToRegMI)		if (SubregToRegMI)
Imm &= 0xFFFFFFFF;		Imm &= 0xFFFFFFFF;
unsigned Opcode;		OpcodePair Opcode;
if (auto R = SplitAndOpc(Imm, RegSize, Imm0, Imm1))		if (auto R = SplitAndOpc(Imm, RegSize, Imm0, Imm1))
Opcode = R.getValue();		Opcode = R.getValue();
else		else
return false;		return false;

// Create new ADD/SUB MIs.		// Create new MIs using the first and second opcodes. Opcodes might differ for
		// flag setting operations that should only set flags on second instruction.
		// NewTmpReg = Opcode.first SrcReg Imm0
		// NewDstReg = Opcode.second NewTmpReg Imm1

		// Determine register classes for destinations and register operands
MachineFunction *MF = MI.getMF();		MachineFunction *MF = MI.getMF();
const TargetRegisterClass *RC =		const TargetRegisterClass *FirstInstrDstRC =
		dmgreen: Can we give these better register class names? A quick comment looks like it would be useful too, explaining the instructions we are adding and which registers they will use: `// NewTmpReg = Opcode.first SrcReg` and `// NewDstReg = Opcode.second NewTmpReg`.
TII->getRegClass(TII->get(Opcode), 0, TRI, *MF);		TII->getRegClass(TII->get(Opcode.first), 0, TRI, *MF);
const TargetRegisterClass *ORC =		const TargetRegisterClass *FirstInstrOperandRC =
TII->getRegClass(TII->get(Opcode), 1, TRI, *MF);		TII->getRegClass(TII->get(Opcode.first), 1, TRI, *MF);
		const TargetRegisterClass *SecondInstrDstRC =
		(Opcode.first == Opcode.second)
		? FirstInstrDstRC
		: TII->getRegClass(TII->get(Opcode.second), 0, TRI, *MF);
		const TargetRegisterClass *SecondInstrOperandRC =
		(Opcode.first == Opcode.second)
		? FirstInstrOperandRC
		: TII->getRegClass(TII->get(Opcode.second), 1, TRI, *MF);

		// Get old registers destinations and new register destinations
Register DstReg = MI.getOperand(0).getReg();		Register DstReg = MI.getOperand(0).getReg();
Register SrcReg = MI.getOperand(1).getReg();		Register SrcReg = MI.getOperand(1).getReg();
Register NewTmpReg = MRI->createVirtualRegister(RC);		Register NewTmpReg = MRI->createVirtualRegister(FirstInstrDstRC);
		dmgreen: This is technically assuming the two instructions have the same RegClass for Operand 1? I think that will always be the case, so maybe that's OK to keep as-is.
		red1bluelost: I added it just to be safe.
Register NewDstReg = MRI->createVirtualRegister(RC);		// In the situation that DstReg is not Virtual (likely WZR or XZR), we want to
		// reuse that same destination register.
MRI->constrainRegClass(SrcReg, RC);		Register NewDstReg = DstReg.isVirtual()
MRI->constrainRegClass(NewTmpReg, ORC);		? MRI->createVirtualRegister(SecondInstrDstRC)
		: DstReg;

		// Constrain registers based on their new uses
		MRI->constrainRegClass(SrcReg, FirstInstrOperandRC);
		MRI->constrainRegClass(NewTmpReg, SecondInstrOperandRC);
		if (DstReg != NewDstReg)
MRI->constrainRegClass(NewDstReg, MRI->getRegClass(DstReg));		MRI->constrainRegClass(NewDstReg, MRI->getRegClass(DstReg));

		// Call the delegating operation to build the instruction
BuildInstr(MI, Opcode, Imm0, Imm1, SrcReg, NewTmpReg, NewDstReg);		BuildInstr(MI, Opcode, Imm0, Imm1, SrcReg, NewTmpReg, NewDstReg);

MRI->replaceRegWith(DstReg, NewDstReg);
// replaceRegWith changes MI's definition register. Keep it for SSA form until		// replaceRegWith changes MI's definition register. Keep it for SSA form until
// deleting MI.		// deleting MI. Only if we made a new destination register.
		if (DstReg != NewDstReg) {
		MRI->replaceRegWith(DstReg, NewDstReg);
MI.getOperand(0).setReg(DstReg);		MI.getOperand(0).setReg(DstReg);
		}

// Record the MIs need to be removed.		// Record the MIs need to be removed.
ToBeRemoved.insert(&MI);		ToBeRemoved.insert(&MI);
if (SubregToRegMI)		if (SubregToRegMI)
ToBeRemoved.insert(SubregToRegMI);		ToBeRemoved.insert(SubregToRegMI);
ToBeRemoved.insert(MovMI);		ToBeRemoved.insert(MovMI);

return true;		return true;
(39 lines not shown)	for (MachineInstr &MI : MBB) {
case AArch64::ADDXrr:		case AArch64::ADDXrr:
Changed = visitADDSUB<uint64_t>(AArch64::ADDXri, AArch64::SUBXri, MI,		Changed = visitADDSUB<uint64_t>(AArch64::ADDXri, AArch64::SUBXri, MI,
ToBeRemoved);		ToBeRemoved);
break;		break;
case AArch64::SUBXrr:		case AArch64::SUBXrr:
Changed = visitADDSUB<uint64_t>(AArch64::SUBXri, AArch64::ADDXri, MI,		Changed = visitADDSUB<uint64_t>(AArch64::SUBXri, AArch64::ADDXri, MI,
ToBeRemoved);		ToBeRemoved);
break;		break;
		case AArch64::ADDSWrr:
		Changed = visitADDSSUBS<uint32_t>({AArch64::ADDWri, AArch64::ADDSWri},
		{AArch64::SUBWri, AArch64::SUBSWri},
		MI, ToBeRemoved);
		break;
		case AArch64::SUBSWrr:
		Changed = visitADDSSUBS<uint32_t>({AArch64::SUBWri, AArch64::SUBSWri},
		{AArch64::ADDWri, AArch64::ADDSWri},
		MI, ToBeRemoved);
		break;
		case AArch64::ADDSXrr:
		Changed = visitADDSSUBS<uint64_t>({AArch64::ADDXri, AArch64::ADDSXri},
		{AArch64::SUBXri, AArch64::SUBSXri},
		MI, ToBeRemoved);
		break;
		case AArch64::SUBSXrr:
		Changed = visitADDSSUBS<uint64_t>({AArch64::SUBXri, AArch64::SUBSXri},
		{AArch64::ADDXri, AArch64::ADDSXri},
		MI, ToBeRemoved);
		break;
}		}
}		}
}		}

for (MachineInstr *MI : ToBeRemoved)		for (MachineInstr *MI : ToBeRemoved)
MI->eraseFromParent();		MI->eraseFromParent();

return Changed;		return Changed;
}		}

FunctionPass *llvm::createAArch64MIPeepholeOptPass() {		FunctionPass *llvm::createAArch64MIPeepholeOptPass() {
return new AArch64MIPeepholeOpt();		return new AArch64MIPeepholeOpt();
}		}

llvm/test/CodeGen/AArch64/addsub.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-linux-gnu | FileCheck %s			; RUN: llc -verify-machineinstrs < %s -mtriple=aarch64-linux-gnu -verify-machineinstrs | FileCheck %s

	; Note that this should be refactored (for efficiency if nothing else)			; Note that this should be refactored (for efficiency if nothing else)
	; when the PCS is implemented so we don't have to worry about the			; when the PCS is implemented so we don't have to worry about the
	; loads and stores.			; loads and stores.

	@var_i32 = global i32 42			@var_i32 = global i32 42
	@var2_i32 = global i32 43			@var2_i32 = global i32 43
	@var_i64 = global i64 0			@var_i64 = global i64 0
	(390 lines not shown)
	; CHECK-NEXT: mov w8, #48576			; CHECK-NEXT: mov w8, #48576
	; CHECK-NEXT: movk w8, #65520, lsl #16			; CHECK-NEXT: movk w8, #65520, lsl #16
	; CHECK-NEXT: add x0, x0, x8			; CHECK-NEXT: add x0, x0, x8
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%b = add i64 %a, 4293967296			%b = add i64 %a, 4293967296
	ret i64 %b			ret i64 %b
	}			}

	; TODO: adds/subs			; ADDS and SUBS Optimizations
				; Checks with all types first, then checks that only EQ and NE optimize
				benshi001: Do we need to pre-commit these tests, and check the difference between before/after?
				define i1 @eq_i(i32 %0) {
				; CHECK-LABEL: eq_i:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sub w8, w0, #273, lsl #12 // =1118208
				; CHECK-NEXT: cmp w8, #273
				; CHECK-NEXT: cset w0, eq
				; CHECK-NEXT: ret
				%2 = icmp eq i32 %0, 1118481
				ret i1 %2
				}

				define i1 @eq_l(i64 %0) {
				; CHECK-LABEL: eq_l:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sub x8, x0, #273, lsl #12 // =1118208
				; CHECK-NEXT: cmp x8, #273
				; CHECK-NEXT: cset w0, eq
				; CHECK-NEXT: ret
				%2 = icmp eq i64 %0, 1118481
				ret i1 %2
				}

				define i1 @ne_i(i32 %0) {
				; CHECK-LABEL: ne_i:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sub w8, w0, #273, lsl #12 // =1118208
				; CHECK-NEXT: cmp w8, #273
				; CHECK-NEXT: cset w0, ne
				; CHECK-NEXT: ret
				%2 = icmp ne i32 %0, 1118481
				ret i1 %2
				}

				define i1 @ne_l(i64 %0) {
				; CHECK-LABEL: ne_l:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sub x8, x0, #273, lsl #12 // =1118208
				; CHECK-NEXT: cmp x8, #273
				; CHECK-NEXT: cset w0, ne
				; CHECK-NEXT: ret
				%2 = icmp ne i64 %0, 1118481
				ret i1 %2
				}

				define i1 @eq_in(i32 %0) {
				; CHECK-LABEL: eq_in:
				; CHECK: // %bb.0:
				; CHECK-NEXT: add w8, w0, #273, lsl #12 // =1118208
				; CHECK-NEXT: cmn w8, #273
				; CHECK-NEXT: cset w0, eq
				; CHECK-NEXT: ret
				%2 = icmp eq i32 %0, -1118481
				ret i1 %2
				}

				define i1 @eq_ln(i64 %0) {
				; CHECK-LABEL: eq_ln:
				; CHECK: // %bb.0:
				; CHECK-NEXT: add x8, x0, #273, lsl #12 // =1118208
				; CHECK-NEXT: cmn x8, #273
				; CHECK-NEXT: cset w0, eq
				; CHECK-NEXT: ret
				%2 = icmp eq i64 %0, -1118481
				ret i1 %2
				}

				define i1 @ne_in(i32 %0) {
				; CHECK-LABEL: ne_in:
				; CHECK: // %bb.0:
				; CHECK-NEXT: add w8, w0, #273, lsl #12 // =1118208
				; CHECK-NEXT: cmn w8, #273
				; CHECK-NEXT: cset w0, ne
				; CHECK-NEXT: ret
				%2 = icmp ne i32 %0, -1118481
				ret i1 %2
				}

				define i1 @ne_ln(i64 %0) {
				; CHECK-LABEL: ne_ln:
				; CHECK: // %bb.0:
				; CHECK-NEXT: add x8, x0, #273, lsl #12 // =1118208
				; CHECK-NEXT: cmn x8, #273
				; CHECK-NEXT: cset w0, ne
				; CHECK-NEXT: ret
				%2 = icmp ne i64 %0, -1118481
				ret i1 %2
				}

				define i1 @reject_eq(i32 %0) {
				; CHECK-LABEL: reject_eq:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov w8, #51712
				; CHECK-NEXT: movk w8, #15258, lsl #16
				; CHECK-NEXT: cmp w0, w8
				; CHECK-NEXT: cset w0, eq
				; CHECK-NEXT: ret
				%2 = icmp eq i32 %0, 1000000000
				ret i1 %2
				}

				define i1 @reject_non_eqne_csinc(i32 %0) {
				; CHECK-LABEL: reject_non_eqne_csinc:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov w8, #4369
				; CHECK-NEXT: movk w8, #17, lsl #16
				; CHECK-NEXT: cmp w0, w8
				; CHECK-NEXT: cset w0, lo
				; CHECK-NEXT: ret
				%2 = icmp ult i32 %0, 1118481
				ret i1 %2
				}

				define i32 @accept_csel(i32 %0) {
				; CHECK-LABEL: accept_csel:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sub w9, w0, #273, lsl #12 // =1118208
				; CHECK-NEXT: mov w8, #17
				; CHECK-NEXT: cmp w9, #273
				; CHECK-NEXT: mov w9, #11
				; CHECK-NEXT: csel w0, w9, w8, eq
				; CHECK-NEXT: ret
				%2 = icmp eq i32 %0, 1118481
				%3 = select i1 %2, i32 11, i32 17
				ret i32 %3
				}

				define i32 @reject_non_eqne_csel(i32 %0) {
				; CHECK-LABEL: reject_non_eqne_csel:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov w8, #4369
				; CHECK-NEXT: mov w9, #11
				; CHECK-NEXT: movk w8, #17, lsl #16
				; CHECK-NEXT: cmp w0, w8
				; CHECK-NEXT: mov w8, #17
				; CHECK-NEXT: csel w0, w9, w8, lo
				; CHECK-NEXT: ret
				%2 = icmp ult i32 %0, 1118481
				%3 = select i1 %2, i32 11, i32 17
				ret i32 %3
				}

				declare void @fooy()

				define void @accept_branch(i32 %0) {
				; CHECK-LABEL: accept_branch:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sub w8, w0, #291, lsl #12 // =1191936
				; CHECK-NEXT: cmp w8, #1110
				; CHECK-NEXT: b.eq .LBB32_2
				; CHECK-NEXT: // %bb.1:
				; CHECK-NEXT: ret
				; CHECK-NEXT: .LBB32_2:
				; CHECK-NEXT: b fooy
				%2 = icmp ne i32 %0, 1193046
				br i1 %2, label %4, label %3
				3: ; preds = %1
				tail call void @fooy()
				br label %4
				4: ; preds = %3, %1
				ret void
				}

				define void @reject_non_eqne_branch(i32 %0) {
				; CHECK-LABEL: reject_non_eqne_branch:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov w8, #13398
				; CHECK-NEXT: movk w8, #18, lsl #16
				; CHECK-NEXT: cmp w0, w8
				; CHECK-NEXT: b.le .LBB33_2
				; CHECK-NEXT: // %bb.1:
				; CHECK-NEXT: ret
				; CHECK-NEXT: .LBB33_2:
				; CHECK-NEXT: b fooy
				%2 = icmp sgt i32 %0, 1193046
				br i1 %2, label %4, label %3
				3: ; preds = %1
				tail call void @fooy()
				br label %4
				4: ; preds = %3, %1
				ret void
				}

				define i32 @reject_multiple_usages(i32 %0) {
				; CHECK-LABEL: reject_multiple_usages:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov w8, #4369
				; CHECK-NEXT: mov w9, #3
				; CHECK-NEXT: movk w8, #17, lsl #16
				; CHECK-NEXT: mov w10, #17
				; CHECK-NEXT: cmp w0, w8
				; CHECK-NEXT: mov w8, #9
				; CHECK-NEXT: mov w11, #12
				; CHECK-NEXT: csel w8, w8, w9, eq
				; CHECK-NEXT: csel w9, w11, w10, hi
				; CHECK-NEXT: add w8, w8, w9
				; CHECK-NEXT: mov w9, #53312
				; CHECK-NEXT: movk w9, #2, lsl #16
				; CHECK-NEXT: cmp w0, w9
				; CHECK-NEXT: mov w9, #26304
				; CHECK-NEXT: movk w9, #1433, lsl #16
				; CHECK-NEXT: csel w0, w8, w9, hi
				; CHECK-NEXT: ret
				%2 = icmp eq i32 %0, 1118481
				%3 = icmp ugt i32 %0, 1118481
				%4 = select i1 %2, i32 9, i32 3
				%5 = select i1 %3, i32 12, i32 17
				%6 = add i32 %4, %5
				%7 = icmp ugt i32 %0, 184384
				%8 = select i1 %7, i32 %6, i32 93939392
				ret i32 %8
				}

				; Unique case found in ClangBuiltLinux where the DstReg is not Virtual and
				; caused an assertion failure
				define dso_local i32 @neigh_periodic_work_tbl_1() {
				; CHECK-LABEL: neigh_periodic_work_tbl_1:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: adrp x8, neigh_periodic_work_tbl_1
				; CHECK-NEXT: add x8, x8, :lo12:neigh_periodic_work_tbl_1
				; CHECK-NEXT: add x8, x8, #18, lsl #12 // =73728
				; CHECK-NEXT: cmn x8, #1272
				; CHECK-NEXT: b.pl .LBB35_2
				; CHECK-NEXT: .LBB35_1: // %for.cond
				; CHECK-NEXT: // =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: b .LBB35_1
				; CHECK-NEXT: .LBB35_2: // %if.end
				; CHECK-NEXT: ret
				entry:
				br i1 icmp slt (i64 add (i64 ptrtoint (i32 ()* @neigh_periodic_work_tbl_1 to i64), i64 75000), i64 0), label %for.cond, label %if.end
				for.cond: ; preds = %entry, %for.cond
				br label %for.cond
				if.end: ; preds = %entry
				ret i32 undef
				}

				@jiffies = dso_local local_unnamed_addr global i32 0, align 4
				@primary_crng = dso_local local_unnamed_addr global i32 0, align 4
				@input_pool = dso_local global i32 0, align 4
				declare dso_local i32 @crng_reseed(...) local_unnamed_addr
				; Function Attrs: nounwind uwtable
				define dso_local i32 @_extract_crng_crng() {
				; CHECK-LABEL: _extract_crng_crng:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: str x30, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: .cfi_offset w30, -16
				; CHECK-NEXT: adrp x8, _extract_crng_crng
				; CHECK-NEXT: add x8, x8, :lo12:_extract_crng_crng
				; CHECK-NEXT: tbnz x8, #63, .LBB36_2
				; CHECK-NEXT: // %bb.1: // %lor.lhs.false
				; CHECK-NEXT: adrp x9, jiffies
				; CHECK-NEXT: ldrsw x9, [x9, :lo12:jiffies]
				; CHECK-NEXT: sub x8, x8, x9
				; CHECK-NEXT: add x8, x8, #18, lsl #12 // =73728
				; CHECK-NEXT: cmn x8, #1272
				; CHECK-NEXT: b.pl .LBB36_3
				; CHECK-NEXT: .LBB36_2: // %if.then
				; CHECK-NEXT: adrp x8, primary_crng
				; CHECK-NEXT: adrp x9, input_pool
				; CHECK-NEXT: add x9, x9, :lo12:input_pool
				; CHECK-NEXT: ldr w8, [x8, :lo12:primary_crng]
				; CHECK-NEXT: cmp w8, #0
				; CHECK-NEXT: csel x0, xzr, x9, eq
				; CHECK-NEXT: bl crng_reseed
				; CHECK-NEXT: .LBB36_3: // %if.end
				; CHECK-NEXT: ldr x30, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				entry:
				br i1 icmp slt (i32 ()* @_extract_crng_crng, i32 ()* null), label %if.then, label %lor.lhs.false
				lor.lhs.false: ; preds = %entry
				%0 = load i32, i32* @jiffies, align 4
				%idx.ext = sext i32 %0 to i64
				%idx.neg = sub nsw i64 0, %idx.ext
				%add.ptr = getelementptr i8, i8* getelementptr (i8, i8* bitcast (i32 ()* @_extract_crng_crng to i8*), i64 75000), i64 %idx.neg
				%cmp = icmp slt i8* %add.ptr, null
				br i1 %cmp, label %if.then, label %if.end
				if.then: ; preds = %lor.lhs.false, %entry
				%1 = load i32, i32* @primary_crng, align 4
				%tobool.not = icmp eq i32 %1, 0
				%cond = select i1 %tobool.not, i32* null, i32* @input_pool
				%call = tail call i32 bitcast (i32 (...)* @crng_reseed to i32 (i32))(i32* noundef %cond)
				br label %if.end
				if.end: ; preds = %if.then, %lor.lhs.false
				ret i32 undef
				}

llvm/test/CodeGen/AArch64/arm64-instruction-mix-remarks.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=arm64-apple-ios7.0 | FileCheck %s			; RUN: llc < %s -mtriple=arm64-apple-ios7.0 | FileCheck %s

	; RUN: llc -mtriple=arm64-apple-ios7.0 -pass-remarks-output=%t -pass-remarks=asm-printer -o - %s			; RUN: llc -mtriple=arm64-apple-ios7.0 -pass-remarks-output=%t -pass-remarks=asm-printer -o - %s
	; RUN: FileCheck --input-file=%t --check-prefix=YAML %s			; RUN: FileCheck --input-file=%t --check-prefix=YAML %s

	; YAML: Name: InstructionMix			; YAML: Name: InstructionMix
	; YAML-NEXT: DebugLoc: { File: arm64-instruction-mix-remarks.ll, Line: 10, Column: 10 }			; YAML-NEXT: DebugLoc: { File: arm64-instruction-mix-remarks.ll, Line: 10, Column: 10 }
	; YAML-NEXT: Function: foo			; YAML-NEXT: Function: foo
	; YAML-NEXT: Args:			; YAML-NEXT: Args:
	; YAML: - BasicBlock: entry			; YAML: - BasicBlock: entry
	; YAML: - INST_add: '2'			; YAML: - INST_add: '2'
	; YAML: - INST_b.: '1'			; YAML: - INST_b.: '1'
	; YAML: - INST_ldr: '1'			; YAML: - INST_ldr: '1'
	dmgreen: Should the ldr still be present? And the orr was already present before? It looks like the test was already missing some input, but we might as well keep the ldr around.
	; YAML: - INST_movk: '1'			; YAML: - INST_orr: '1'
	; YAML: - INST_movz: '1'			; YAML: - INST_sub: '1'
	; YAML: - INST_subs: '1'			; YAML: - INST_subs: '1'

	; YAML: Name: InstructionMix			; YAML: Name: InstructionMix
	; YAML-NEXT: DebugLoc: { File: arm64-instruction-mix-remarks.ll, Line: 30, Column: 30 }			; YAML-NEXT: DebugLoc: { File: arm64-instruction-mix-remarks.ll, Line: 30, Column: 30 }
	; YAML-NEXT: Function: foo			; YAML-NEXT: Function: foo
	; YAML-NEXT: Args:			; YAML-NEXT: Args:
	; YAML: - BasicBlock: else			; YAML: - BasicBlock: else
	; YAML: - INST_madd: '2'			; YAML: - INST_madd: '2'
	; YAML: - INST_movz: '1'			; YAML: - INST_movz: '1'
	; YAML: - INST_str: '1'			; YAML: - INST_str: '1'
	define i32 @foo(i32* %ptr, i32 %x, i64 %y) !dbg !3 {			define i32 @foo(i32* %ptr, i32 %x, i64 %y) !dbg !3 {
	; CHECK-LABEL: foo:			; CHECK-LABEL: foo:
	; CHECK: ; %bb.0: ; %entry			; CHECK: ; %bb.0: ; %entry
	; CHECK-NEXT: ldr w10, [x0]			; CHECK-NEXT: ldr w9, [x0]
	; CHECK-NEXT: mov x8, x0			; CHECK-NEXT: mov x8, x0
	; CHECK-NEXT: mov w9, #16959			; CHECK-NEXT: add w0, w9, w1
	; CHECK-NEXT: movk w9, #15, lsl #16			; CHECK-NEXT: add x9, x0, x2
	; CHECK-NEXT: add w0, w10, w1			; CHECK-NEXT: sub x9, x9, #244, lsl #12 ; =999424
	; CHECK-NEXT: add x10, x0, x2			; CHECK-NEXT: cmp x9, #575
	; CHECK-NEXT: cmp x10, x9
	; CHECK-NEXT: b.eq LBB0_2			; CHECK-NEXT: b.eq LBB0_2
	; CHECK-NEXT: ; %bb.1: ; %else			; CHECK-NEXT: ; %bb.1: ; %else
	; CHECK-NEXT: mul w9, w0, w1			; CHECK-NEXT: mul w9, w0, w1
	; CHECK-NEXT: mov w10, #10			; CHECK-NEXT: mov w10, #10
	; CHECK-NEXT: mul w0, w9, w1			; CHECK-NEXT: mul w0, w9, w1
	; CHECK-NEXT: str w10, [x8]			; CHECK-NEXT: str w10, [x8]
	; CHECK-NEXT: LBB0_2: ; %common.ret			; CHECK-NEXT: LBB0_2: ; %common.ret
	; CHECK-NEXT: ; kill: def $w0 killed $w0 killed $x0			; CHECK-NEXT: ; kill: def $w0 killed $w0 killed $x0
	(28 lines not shown)