This is an archive of the discontinued LLVM Phabricator instance.


Differential D120428

[AArch64] Optimize safe integer division
Abandoned · Public

Authored by Kmeakin on Feb 23 2022, 12:13 PM.

Details

Reviewers

dmgreen

Summary

Remove redundant checks against 0 when performing "safe integer division" (i.e. `y == 0 ? 0 : x / y`). UDIV/SDIV return 0 when the divisor is 0, so the CMP+CSEL instructions are unnecessary.

Before:

udiv w0, w0, w1
cmp w1, #0
csel w0, wzr, w0, eq

After:

udiv w0, w0, w1

This patch does not optimize cases where the UDIV/SDIV instruction is guarded by a conditional branch, such as

cbz w1, zero
udiv w0, w0, w1
ret
zero:
mov w0, wzr
ret

This case could be covered in a future patch.
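
For readers less familiar with the idiom, the following is a minimal source-level sketch of the guarded division described in the summary. The function names `checked_udiv`, `checked_sdiv` and `checked_div_examples` are hypothetical and do not appear in the patch; the sketch only illustrates what `y == 0 ? 0 : x / y` computes, not how any particular compiler lowers it.

```cpp
#include <cassert>
#include <cstdint>

// Source-level form of the guarded ("safe") division the patch targets.
// Without the guard, dividing by zero is undefined behaviour in C++.
std::uint32_t checked_udiv(std::uint32_t x, std::uint32_t y) {
  return y == 0 ? 0 : x / y;
}

// Signed variant. INT32_MIN / -1 is still undefined behaviour in C++ and is
// deliberately not handled here; callers must avoid it.
std::int32_t checked_sdiv(std::int32_t x, std::int32_t y) {
  return y == 0 ? 0 : x / y;
}

// Hypothetical usage helper: a zero divisor yields 0, anything else divides.
void checked_div_examples() {
  assert(checked_udiv(10u, 0u) == 0u);
  assert(checked_udiv(10u, 3u) == 3u);
  assert(checked_sdiv(-7, 0) == 0);
  assert(checked_sdiv(-7, 2) == -3); // C++ division truncates toward zero
}
```

Because the summary states that UDIV/SDIV already produce 0 for a zero divisor, the guard in these functions is the part the patch tries to eliminate after if-conversion.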

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Kmeakin created this revision. · Feb 23 2022, 12:13 PM

Herald added a subscriber: hiraditya. · Feb 23 2022, 12:13 PM

Kmeakin requested review of this revision. · Feb 23 2022, 12:13 PM

Herald added a project: Restricted Project. · Feb 23 2022, 12:14 PM

Herald added a subscriber: llvm-commits.

Kmeakin retitled this revision from "Remove redundant CSELs when performing safe integer division." to "[AArch64] Optimize safe integer division". · Feb 23 2022, 12:19 PM

Kmeakin edited the summary of this revision.

Herald added a subscriber: kristof.beyls. · Feb 23 2022, 12:19 PM

Kmeakin edited the summary of this revision. · Feb 23 2022, 12:25 PM

Harbormaster completed remote builds in B151121: Diff 410906. · Feb 23 2022, 1:29 PM

Is this a part of AArch64MIPeepholeOpt because it relies upon DIV being ifcvt'd? These peephole optimisations seem notoriously difficult to get right, but it makes sense that you would have to do it so late. We'll just need to be sure it's tested well.

llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
319	This needn't add brackets around single statements according to the llvm code style
325	Isn't FlagsReg AArch64::NZCV? How does that work with getUniqueVRegDef?
473	I don't tend to find these very useful. Does it need to be added?
llvm/test/CodeGen/AArch64/checked-int-div.ll
2	neoversen1 isn't a valid cpu. Does this not work in other cases due to the costs of div being too high to ifcvt?

Kmeakin added a reviewer: dmgreen. · Feb 24 2022, 2:19 AM

In D120428#3342432, @dmgreen wrote:

Is this a part of AArch64MIPeepholeOpt because it relies upon DIV being ifcvt'd?

Yes, exactly

llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
473	Not strictly necessary, I only added it because I noticed other passes also printed the same header.
llvm/test/CodeGen/AArch64/checked-int-div.ll
2	`llc -mcpu=help --mtriple=aarch64--` lists `neoversen1` as an option. Any out of order CPU should do (the cost model defaults to in order, which results in div not being ifcvted)

Kmeakin added inline comments. · Feb 24 2022, 2:29 AM

llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
325	`FlagsReg` is indeed `NZCV`, but it is updated by flag-setting instructions (eg `SUBS`), so `getUniqueVRegDef` will return the instruction that sets the flags.

dmgreen added inline comments. · Feb 24 2022, 9:55 AM

llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
325	Sure, but it's a physical register, not a vreg. Which was why I was surprised it worked. I was surprised it didn't assert that the register was virtual. Does it work if there are multiple instructions defining nzcv in the function?
llvm/test/CodeGen/AArch64/checked-int-div.ll
2	I think it would be "neoverse-n1", https://godbolt.org/z/5KEGbcfdo.

Kmeakin added inline comments. · Mar 1 2022, 4:46 AM

llvm/test/CodeGen/AArch64/checked-int-div.ll
2	Oh, I see now that `neoversen1` is a "cpu feature" and `neoverse-n1` is the CPU. Interestingly, specifying `-mcpu=neoverse-n1` results in the if conversion not firing, but `-mcpu=neoversen1` (or any other unrecognised CPU) does result in if conversion.

Kmeakin added inline comments. · Mar 17 2022, 5:46 AM

llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
325	No, it does not work if there are multiple definitions of NZCV. As far as I can tell, it is not possible to get the correct defining instruction for NZCV if it is defined by multiple instructions.

Herald added a project: Restricted Project. · Mar 17 2022, 5:46 AM

dmgreen added inline comments. · Mar 17 2022, 6:22 AM

llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
325	I don't think any other parts of this pass look for the incoming NZCV at the moment (as opposed to the output uses). Sometimes these passes just need to search backwards for the first def of NZCV they find.
llvm/test/CodeGen/AArch64/checked-int-div.ll
2	A div usually takes quite a long time on most cpus, compared to other instructions. So unless you know that the condition is almost always true (because you almost never divide by 0 for example), the branching version might be considered better in general. I'm not sure if there's some way to bias ifcvt to do the transform for these cases anyway? Maybe something with block probabilities or possibly special casing it in isProfitableToIfCvt? It looks like it might just hit other heuristics at the moment though.
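
The backwards search for the nearest NZCV definition suggested in the inline comment on line 325 can be sketched as follows. This is a self-contained toy model under stated assumptions: `ToyInst` and `findReachingFlagsDef` are hypothetical names, the block is a plain vector rather than a `MachineBasicBlock`, and the real pass would have to query actual `MachineInstr` operands for NZCV definitions, which is not modelled here.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for a machine instruction; this is not LLVM's MachineInstr.
struct ToyInst {
  const char *Name;
  bool DefinesFlags; // true for flag-setting instructions such as SUBS
};

// Walk backwards from the instruction at index UseIdx and return the index of
// the closest earlier instruction in the same block that defines the flags.
// Returns -1 if no such instruction exists (or UseIdx is out of range).
int findReachingFlagsDef(const std::vector<ToyInst> &Block,
                         std::size_t UseIdx) {
  if (UseIdx > Block.size())
    return -1;
  for (std::size_t I = UseIdx; I-- > 0;) {
    if (Block[I].DefinesFlags)
      return static_cast<int>(I);
  }
  return -1;
}
```

Unlike a lookup that assumes a single definition per register, this scan still finds the relevant flag-setting instruction when several instructions in the function define the flags, provided the definition is in the same block as the use. Handling definitions in predecessor blocks is outside the scope of this sketch.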

Abandoning this patch: after some reflection, I have decided that in its current incarnation it would be a very niche optimisation that would only fire under very niche conditions (platforms where the if-conversion fires and the NZCV def is not used anywhere else). It may be worth another attempt later down the line, but performed at a different stage in the optimization pipeline (perhaps during lowering from LLVM-IR to SelectionDAG?)

Kmeakin abandoned this revision. · Apr 19 2022, 8:06 AM

Revision Contents

Path                                                Size
llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp    79 lines
llvm/test/CodeGen/AArch64/checked-int-div.ll        72 lines

Diff 410906

llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp

[... 25 lines not shown ...]
 // 4. Remove redundant ORRWrs which is generated by zero-extend.
 //
 // %3:gpr32 = ORRWrs $wzr, %2, 0
 // %4:gpr64 = SUBREG_TO_REG 0, %3, %subreg.sub_32
 //
 // If AArch64's 32-bit form of instruction defines the source operand of
 // ORRWrs, we can remove the ORRWrs because the upper 32 bits of the source
 // operand are set to zero.
+// 5. Remove redundant CSELs when performing safe integer division.
+//
+// (CSEL 0 ({S|U}DIV x y) EQ (CMP y 0)) => ({S|U}DIV x y)
+//
+// The CSEL is redundant, because {S|U}DIV returns 0 when divisor is 0
 //
 //===----------------------------------------------------------------------===//

 #include "AArch64ExpandImm.h"
 #include "AArch64InstrInfo.h"
 #include "MCTargetDesc/AArch64AddressingModes.h"
 #include "llvm/ADT/Optional.h"
 #include "llvm/ADT/SetVector.h"
[... 50 lines not shown ...] struct AArch64MIPeepholeOpt : public MachineFunctionPass {
   template <typename T>
   bool visitADDSUB(unsigned PosOpc, unsigned NegOpc, MachineInstr &MI,
                    SmallSetVector<MachineInstr *, 8> &ToBeRemoved);
   template <typename T>
   bool visitAND(unsigned Opc, MachineInstr &MI,
                 SmallSetVector<MachineInstr *, 8> &ToBeRemoved);
   bool visitORR(MachineInstr &MI,
                 SmallSetVector<MachineInstr *, 8> &ToBeRemoved);
+  bool visitCSEL(MachineInstr &MI,
+                 SmallSetVector<MachineInstr *, 8> &ToBeRemoved);
   bool runOnMachineFunction(MachineFunction &MF) override;

   StringRef getPassName() const override {
     return "AArch64 MI Peephole Optimization pass";
   }

   void getAnalysisUsage(AnalysisUsage &AU) const override {
     AU.setPreservesCFG();
[... 183 lines not shown ...] return splitTwoPartImm<T>(
             .addImm(12);
         BuildMI(*MBB, MI, DL, TII->get(Opcode), NewDstReg)
             .addReg(NewTmpReg)
             .addImm(Imm1)
             .addImm(0);
       });
 }

+static bool isDiv(unsigned Opcode) {
+  return Opcode == AArch64::UDIVWr || Opcode == AArch64::UDIVXr ||
+         Opcode == AArch64::SDIVWr || Opcode == AArch64::SDIVXr;
+}
+
+bool AArch64MIPeepholeOpt::visitCSEL(
+    MachineInstr &MI, SmallSetVector<MachineInstr *, 8> &ToBeRemoved) {
+  auto RetReg = MI.getOperand(0).getReg();
+  auto LhsReg = MI.getOperand(1).getReg();
+  auto RhsReg = MI.getOperand(2).getReg();
+  auto CondCode = MI.getOperand(3).getImm();
+  auto FlagsReg = MI.getOperand(4).getReg();
+
+  if (CondCode != AArch64CC::EQ || FlagsReg != AArch64::NZCV) {
+    return false;
+  }
+
+  auto *LhsDef = MRI->getUniqueVRegDef(LhsReg);
+  auto *RhsDef = MRI->getUniqueVRegDef(RhsReg);
+  auto *FlagsDef = MRI->getUniqueVRegDef(FlagsReg);
+
+  if (!LhsDef || !RhsDef || !FlagsDef) {
+    return false;
+  }
+
+  // Is one of the two sides of the csel a div?
+  MachineInstr *DivDef, *DefaultDef;
+  if (isDiv(LhsDef->getOpcode())) {
+    DivDef = LhsDef;
+    DefaultDef = RhsDef;
+  } else if (isDiv(RhsDef->getOpcode())) {
+    DivDef = RhsDef;
+    DefaultDef = LhsDef;
+  } else {
+    return false;
+  }
+
+  auto QuotientReg = DivDef->getOperand(0).getReg();
+  auto DivisorReg = DivDef->getOperand(2).getReg();
+
+  // Is the default value zero?
+  if (DefaultDef->getOpcode() != AArch64::COPY ||
+      (DefaultDef->getOperand(1).getReg() != AArch64::WZR &&
+       DefaultDef->getOperand(1).getReg() != AArch64::XZR)) {
+    return false;
+  }
+
+  // Is the divisor being compared against zero?
+  if ((FlagsDef->getOpcode() != AArch64::SUBSWri &&
+       FlagsDef->getOpcode() != AArch64::SUBSXri) ||
+      FlagsDef->getOperand(1).getReg() != DivisorReg ||
+      FlagsDef->getOperand(2).getImm() != 0) {
+    return false;
+  }
+
+  MRI->replaceRegWith(RetReg, QuotientReg);
+  ToBeRemoved.insert(&MI);
+  ToBeRemoved.insert(DefaultDef);
+  if (MRI->hasOneUse(FlagsReg)) {
+    ToBeRemoved.insert(FlagsDef);
+  }
+
+  return true;
+}

 // Checks if the corresponding MOV immediate instruction is applicable for
 // this peephole optimization.
 bool AArch64MIPeepholeOpt::checkMovImmInstr(MachineInstr &MI,
                                             MachineInstr *&MovMI,
                                             MachineInstr *&SubregToRegMI) {
   // Check whether current MBB is in loop and the AND is loop invariant.
   MachineBasicBlock *MBB = MI.getParent();
   MachineLoop *L = MLI->getLoopFor(MBB);
[... 86 lines not shown ...] bool AArch64MIPeepholeOpt::splitTwoPartImm(

   return true;
 }

 bool AArch64MIPeepholeOpt::runOnMachineFunction(MachineFunction &MF) {
   if (skipFunction(MF.getFunction()))
     return false;

+  LLVM_DEBUG(dbgs() << "******** AArch64 Peephole Optimizer ********\n"
+                    << "********** Function: " << MF.getName() << '\n');
+
   TII = static_cast<const AArch64InstrInfo *>(MF.getSubtarget().getInstrInfo());
   TRI = static_cast<const AArch64RegisterInfo *>(
       MF.getSubtarget().getRegisterInfo());
   MLI = &getAnalysis<MachineLoopInfo>();
   MRI = &MF.getRegInfo();

   assert(MRI->isSSA() && "Expected to be run on SSA form!");

[... 9 lines not shown ...] for (MachineInstr &MI : MBB) {
         Changed = visitAND<uint32_t>(AArch64::ANDWri, MI, ToBeRemoved);
         break;
       case AArch64::ANDXrr:
         Changed = visitAND<uint64_t>(AArch64::ANDXri, MI, ToBeRemoved);
         break;
       case AArch64::ORRWrs:
         Changed = visitORR(MI, ToBeRemoved);
         break;
+      case AArch64::CSELWr:
+      case AArch64::CSELXr:
+        Changed = visitCSEL(MI, ToBeRemoved);
+        break;
       case AArch64::ADDWrr:
         Changed = visitADDSUB<uint32_t>(AArch64::ADDWri, AArch64::SUBWri, MI,
                                         ToBeRemoved);
         break;
       case AArch64::SUBWrr:
         Changed = visitADDSUB<uint32_t>(AArch64::SUBWri, AArch64::ADDWri, MI,
                                         ToBeRemoved);
         break;
[... 21 lines not shown ...]
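
The matching conditions in `visitCSEL` can be illustrated with a small, self-contained model. Everything in it (`ToyOp`, `ToyCond`, `ToyDef`, `ToyCsel`, `cselIsRedundant`) is hypothetical and stands in for the LLVM machine-IR types; it is a sketch of the pattern in the file header comment, `(CSEL 0 ({S|U}DIV x y) EQ (CMP y 0))`, not the patch's implementation. One deliberate difference: the sketch only accepts the operand order given in that comment (zero first, division second). With an `EQ` condition the reversed order would select the quotient exactly when the divisor is zero, which is a different expression.

```cpp
#include <cstdint>

enum class ToyOp { UDiv, SDiv, CopyZero, SubsImm, Other };
enum class ToyCond { EQ, NE };

// Toy value definition; Reg and Imm are only meaningful for some opcodes.
struct ToyDef {
  ToyOp Op;
  int Reg;          // UDiv/SDiv: divisor register; SubsImm: compared register
  std::int64_t Imm; // SubsImm: immediate operand
};

struct ToyCsel {
  const ToyDef *IfTrue;   // selected when Cond holds (first CSEL source)
  const ToyDef *IfFalse;  // selected otherwise (second CSEL source)
  ToyCond Cond;
  const ToyDef *FlagsDef; // instruction that set the flags read by the CSEL
};

static bool isToyDiv(ToyOp Op) {
  return Op == ToyOp::UDiv || Op == ToyOp::SDiv;
}

// Returns true if the select computes `y == 0 ? 0 : x / y`, i.e. the select
// could be replaced by the division result alone.
bool cselIsRedundant(const ToyCsel &C) {
  if (!C.IfTrue || !C.IfFalse || !C.FlagsDef)
    return false;
  if (C.Cond != ToyCond::EQ)
    return false;
  // Zero must be the value chosen when the divisor equals zero.
  if (C.IfTrue->Op != ToyOp::CopyZero || !isToyDiv(C.IfFalse->Op))
    return false;
  // The flags must come from comparing the divisor itself against 0.
  return C.FlagsDef->Op == ToyOp::SubsImm &&
         C.FlagsDef->Reg == C.IfFalse->Reg && C.FlagsDef->Imm == 0;
}
```

The model leaves out everything the review discussion identifies as hard in practice, in particular locating the correct NZCV definition and checking that the flags have no other users before deleting the compare.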

llvm/test/CodeGen/AArch64/checked-int-div.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=aarch64-gnu-linux -mcpu=neoversen1 -o - | FileCheck %s

				; Ensure that `y == 0 ? 0 : x / y` is optimised to a single UDIV/SDIV - UDIV/SDIV return 0 when divisor is 0

				define i32 @u32_checked_div(i32 %0, i32 %1) {
				; CHECK-LABEL: u32_checked_div:
				; CHECK: // %bb.0:
				; CHECK-NEXT: udiv w0, w0, w1
				; CHECK-NEXT: ret
				%3 = icmp eq i32 %1, 0
				br i1 %3, label %6, label %4

				4:
				%5 = udiv i32 %0, %1
				br label %6

				6:
				%7 = phi i32 [ %5, %4 ], [ 0, %2 ]
				ret i32 %7
				}

				define i64 @u64_checked_div(i64 %0, i64 %1) {
				; CHECK-LABEL: u64_checked_div:
				; CHECK: // %bb.0:
				; CHECK-NEXT: udiv x0, x0, x1
				; CHECK-NEXT: ret
				%3 = icmp eq i64 %1, 0
				br i1 %3, label %6, label %4

				4:
				%5 = udiv i64 %0, %1
				br label %6

				6:
				%7 = phi i64 [ %5, %4 ], [ 0, %2 ]
				ret i64 %7
				}

				define i32 @i32_checked_div(i32 %0, i32 %1) {
				; CHECK-LABEL: i32_checked_div:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sdiv w0, w0, w1
				; CHECK-NEXT: ret
				%3 = icmp eq i32 %1, 0
				br i1 %3, label %6, label %4

				4:
				%5 = sdiv i32 %0, %1
				br label %6

				6:
				%7 = phi i32 [ %5, %4 ], [ 0, %2 ]
				ret i32 %7
				}

				define i64 @i64_checked_div(i64 %0, i64 %1) {
				; CHECK-LABEL: i64_checked_div:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sdiv x0, x0, x1
				; CHECK-NEXT: ret
				%3 = icmp eq i64 %1, 0
				br i1 %3, label %6, label %4

				4:
				%5 = sdiv i64 %0, %1
				br label %6

				6:
				%7 = phi i64 [ %5, %4 ], [ 0, %2 ]
				ret i64 %7
				}