This is an archive of the discontinued LLVM Phabricator instance.

Differential D96388

[AArch64][GlobalISel] Fold G_ADD into the cset for G_ICMP
ClosedPublic

Authored by paquette on Feb 9 2021, 5:14 PM.

Details

Reviewers

aemerson

Commits

rG9283058abbec: [AArch64][GlobalISel] Fold G_ADD into the cset for G_ICMP

Summary

When we have a G_ADD which is fed by a G_ICMP on one side, we can fold it into the cset for the G_ICMP.

e.g. Given

%cmp = G_ICMP ... %x, %y
%add = G_ADD %cmp, %z

We would normally emit a cmp, cset, and add.

However, %add is either %z or %z + 1. So, we can just use %z as the source of the cset rather than wzr, saving an instruction.
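The equivalence can be checked with a small standalone model. This is only a sketch, not code from the patch: `csinc`, `unfolded`, and `folded` are hypothetical helper names, the model assumes 32-bit wrapping arithmetic, and it hard-codes the `eq` predicate for illustration.

```cpp
#include <cstdint>

// Model of AArch64 CSINC: yields a when cond holds, otherwise b + 1.
constexpr uint32_t csinc(uint32_t a, uint32_t b, bool cond) {
  return cond ? a : b + 1;
}

// Unfolded sequence: cmp, then cset (csinc wzr, wzr, inverted cond), then add.
constexpr uint32_t unfolded(uint32_t x, uint32_t y, uint32_t z) {
  const bool inv = !(x == y);              // inverted "eq" predicate
  const uint32_t cset = csinc(0, 0, inv);  // 1 when x == y, else 0
  return cset + z;
}

// Folded sequence: cmp, then a single csinc with z as both sources.
constexpr uint32_t folded(uint32_t x, uint32_t y, uint32_t z) {
  const bool inv = !(x == y);
  return csinc(z, z, inv);                 // z + 1 when x == y, else z
}

// Both sequences agree whether or not the compare holds.
static_assert(unfolded(1, 1, 5) == folded(1, 1, 5), "compare true");
static_assert(unfolded(1, 2, 5) == folded(1, 2, 5), "compare false");
```

Because CSINC increments when its condition is false, the model passes the inverted predicate in both sequences, which mirrors the "Invert it" comment in emitCSetForICMP.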

This would probably be cleaner in AArch64PostLegalizerLowering, but we'd need to change the way we represent G_ICMP to do that, I think. For now, it's easiest to implement in selection.

This is a 0.1% code size improvement on CTMark/pairlocalalign at -Os.

Example: https://godbolt.org/z/7KdrP8

Event Timeline

paquette created this revision. Feb 9 2021, 5:14 PM

Herald added subscribers: danielkiss, hiraditya, kristof.beyls, rovka. Feb 9 2021, 5:14 PM

paquette requested review of this revision. Feb 9 2021, 5:14 PM

Herald added a project: Restricted Project. Feb 9 2021, 5:14 PM

aemerson accepted this revision. Feb 10 2021, 10:32 AM

aemerson added inline comments.

llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
Line 2181: What if the type is s64? emitCSetForICMP seems to only work for s32.

This revision is now accepted and ready to land. Feb 10 2021, 10:32 AM

aemerson requested changes to this revision. Feb 10 2021, 10:33 AM

This revision now requires changes to proceed. Feb 10 2021, 10:33 AM

paquette added inline comments. Feb 10 2021, 10:42 AM

llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
Line 2181: I don't think that should happen, because G_ICMP can only output a s32. So, any G_ADD fed by it should take in + produce a s32. That being said, it's probably better to early exit if we don't have a s32 anyway. (And maybe emitCSetForICMP should be changed to a general purpose emitCSet function.)

Check the G_ADD type before performing the optimization, and add a vector test.

aemerson accepted this revision. Feb 10 2021, 1:12 PM

This revision is now accepted and ready to land. Feb 10 2021, 1:12 PM

Closed by commit rG9283058abbec: [AArch64][GlobalISel] Fold G_ADD into the cset for G_ICMP (authored by paquette). Feb 10 2021, 1:28 PM

This revision was automatically updated to reflect the committed changes.

paquette added a commit: rG9283058abbec: [AArch64][GlobalISel] Fold G_ADD into the cset for G_ICMP.

Revision Contents

Path                                                            Size
llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp    42 lines
llvm/test/CodeGen/AArch64/GlobalISel/select-cmp.mir             58 lines

Diff 322548

llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp

[246 lines hidden; context: private:]
   /// Helper function for selecting G_FCONSTANT. If the G_FCONSTANT can be
   /// materialized using a FMOV instruction, then update MI and return it.
   /// Otherwise, do nothing and return a nullptr.
   MachineInstr *emitFMovForFConstant(MachineInstr &MI,
                                      MachineRegisterInfo &MRI) const;

   /// Emit a CSet for an integer compare.
   ///
-  /// \p DefReg is expected to be a 32-bit scalar register.
+  /// \p DefReg and \p SrcReg are expected to be 32-bit scalar registers.
   MachineInstr *emitCSetForICMP(Register DefReg, unsigned Pred,
-                                MachineIRBuilder &MIRBuilder) const;
+                                MachineIRBuilder &MIRBuilder,
+                                Register SrcReg = AArch64::WZR) const;
   /// Emit a CSet for a FP compare.
   ///
   /// \p Dst is expected to be a 32-bit scalar register.
   MachineInstr *emitCSetForFCmp(Register Dst, CmpInst::Predicate Pred,
                                 MachineIRBuilder &MIRBuilder) const;

   /// Emit the overflow op for \p Opcode.
   ///
[1,882 lines hidden; context: if (Ty.getSizeInBits() == 64) {]
       I.getOperand(1).ChangeToRegister(AArch64::WZR, false);
       RBI.constrainGenericRegister(DefReg, AArch64::GPR32RegClass, MRI);
     } else
       return false;

     I.setDesc(TII.get(TargetOpcode::COPY));
     return true;
   }

+  case TargetOpcode::G_ADD: {
+    // Check if this is being fed by a G_ICMP on either side.
+    //
+    // (cmp pred, x, y) + z
+    //
+    // In the above case, when the cmp is true, we increment z by 1. So, we can
+    // fold the add into the cset for the cmp by using cinc.
+    //
+    // FIXME: This would probably be a lot nicer in PostLegalizerLowering.
+    Register X = I.getOperand(1).getReg();
+    Register CmpReg = I.getOperand(2).getReg();
+    MachineInstr *Cmp = getOpcodeDef(TargetOpcode::G_ICMP, CmpReg, MRI);
+    if (!Cmp) {
+      std::swap(X, CmpReg);
+      Cmp = getOpcodeDef(TargetOpcode::G_ICMP, CmpReg, MRI);
+      if (!Cmp)
+        return false;
+    }
+    MachineIRBuilder MIRBuilder(I);
+    auto Pred =
+        static_cast<CmpInst::Predicate>(Cmp->getOperand(1).getPredicate());
+    emitIntegerCompare(Cmp->getOperand(2), Cmp->getOperand(3),
+                       Cmp->getOperand(1), MIRBuilder);
+    emitCSetForICMP(I.getOperand(0).getReg(), Pred, MIRBuilder, X);
+    I.eraseFromParent();
+    return true;
+  }
   default:
     return false;
   }
 }

 bool AArch64InstructionSelector::select(MachineInstr &I) {
   assert(I.getParent() && "Instruction should be in a basic block!");
   assert(I.getParent()->getParent() && "Instruction should be in a function!");
[2,196 lines hidden; context: MachineInstr *AArch64InstructionSelector::emitFMovForFConstant(]
   unsigned MovOpc = DefSize == 32 ? AArch64::FMOVSi : AArch64::FMOVDi;
   I.setDesc(TII.get(MovOpc));
   constrainSelectedInstRegOperands(I, TII, TRI, RBI);
   return &I;
 }

 MachineInstr *
 AArch64InstructionSelector::emitCSetForICMP(Register DefReg, unsigned Pred,
-                                            MachineIRBuilder &MIRBuilder) const {
+                                            MachineIRBuilder &MIRBuilder,
+                                            Register SrcReg) const {
   // CSINC increments the result when the predicate is false. Invert it.
   const AArch64CC::CondCode InvCC = changeICMPPredToAArch64CC(
       CmpInst::getInversePredicate((CmpInst::Predicate)Pred));
-  auto I =
-      MIRBuilder
-          .buildInstr(AArch64::CSINCWr, {DefReg}, {Register(AArch64::WZR), Register(AArch64::WZR)})
+  auto I = MIRBuilder.buildInstr(AArch64::CSINCWr, {DefReg}, {SrcReg, SrcReg})
                .addImm(InvCC);
   constrainSelectedInstRegOperands(*I, TII, TRI, RBI);
   return &*I;
 }

 std::pair<MachineInstr *, AArch64CC::CondCode>
 AArch64InstructionSelector::emitOverflowOp(unsigned Opcode, Register Dst,
                                            MachineOperand &LHS,
                                            MachineOperand &RHS,
[1,654 lines hidden]

llvm/test/CodeGen/AArch64/GlobalISel/select-cmp.mir

[264 lines hidden; context: bb.0:]
     %ext:gpr(s64) = G_ZEXT %reg0(s32)
     %cst:gpr(s64) = G_CONSTANT i64 5
     %shift:gpr(s64) = G_SHL %ext, %cst(s64)
     %cmp:gpr(s32) = G_ICMP intpred(ugt), %reg1(s64), %shift
     $w0 = COPY %cmp(s32)
     RET_ReallyLR implicit $w0

 ...
+---
+name: cmp_add_rhs
+legalized: true
+regBankSelected: true
+tracksRegLiveness: true
+machineFunctionInfo: {}
+body: |
+  bb.0:
+    liveins: $w0, $w1, $w2
+
+    ; The CSINC should use the add's RHS.
+
+    ; CHECK-LABEL: name: cmp_add_rhs
+    ; CHECK: liveins: $w0, $w1, $w2
+    ; CHECK: %cmp_lhs:gpr32 = COPY $w0
+    ; CHECK: %cmp_rhs:gpr32 = COPY $w1
+    ; CHECK: %add_rhs:gpr32 = COPY $w2
+    ; CHECK: [[SUBSWrr:%[0-9]+]]:gpr32 = SUBSWrr %cmp_lhs, %cmp_rhs, implicit-def $nzcv
+    ; CHECK: %add:gpr32 = CSINCWr %add_rhs, %add_rhs, 1, implicit $nzcv
+    ; CHECK: $w0 = COPY %add
+    ; CHECK: RET_ReallyLR implicit $w0
+    %cmp_lhs:gpr(s32) = COPY $w0
+    %cmp_rhs:gpr(s32) = COPY $w1
+    %add_rhs:gpr(s32) = COPY $w2
+    %cmp:gpr(s32) = G_ICMP intpred(eq), %cmp_lhs(s32), %cmp_rhs
+    %add:gpr(s32) = G_ADD %cmp, %add_rhs
+    $w0 = COPY %add(s32)
+    RET_ReallyLR implicit $w0
+
+...
+---
+name: cmp_add_lhs
+legalized: true
+regBankSelected: true
+tracksRegLiveness: true
+machineFunctionInfo: {}
+body: |
+  bb.0:
+    liveins: $w0, $w1, $w2
+
+    ; The CSINC should use the add's LHS.
+
+    ; CHECK-LABEL: name: cmp_add_lhs
+    ; CHECK: liveins: $w0, $w1, $w2
+    ; CHECK: %cmp_lhs:gpr32 = COPY $w0
+    ; CHECK: %cmp_rhs:gpr32 = COPY $w1
+    ; CHECK: %add_lhs:gpr32 = COPY $w2
+    ; CHECK: [[SUBSWrr:%[0-9]+]]:gpr32 = SUBSWrr %cmp_lhs, %cmp_rhs, implicit-def $nzcv
+    ; CHECK: %add:gpr32 = CSINCWr %add_lhs, %add_lhs, 1, implicit $nzcv
+    ; CHECK: $w0 = COPY %add
+    ; CHECK: RET_ReallyLR implicit $w0
+    %cmp_lhs:gpr(s32) = COPY $w0
+    %cmp_rhs:gpr(s32) = COPY $w1
+    %add_lhs:gpr(s32) = COPY $w2
+    %cmp:gpr(s32) = G_ICMP intpred(eq), %cmp_lhs(s32), %cmp_rhs
+    %add:gpr(s32) = G_ADD %add_lhs, %cmp
+    $w0 = COPY %add(s32)
+    RET_ReallyLR implicit $w0