Simplify the original optimization from custom DAG selection code to a TD pattern.
Diff Detail
- Repository: rG LLVM Github Monorepo
Event Timeline
The original optimization is written as custom DAG selection code, which is hard to extend. Expressing it as a TD pattern makes it more flexible.
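For context, here is a minimal sketch of what the TD form of this optimization can look like. The AddiPair/AddiPairImmA/AddiPairImmB names come from the review comments below; the split-in-half choice matches the 1500/1499 output in the test examples, but the exact definitions are illustrative, not a verbatim copy of the patch.

```
// Sketch only: match an add whose immediate is just outside the simm12
// range of a single ADDI, but expressible as the sum of two simm12 values.
def AddiPair : PatLeaf<(imm), [{
  // Bail out if the immediate has other users; materializing it once
  // via lui/addi may then be cheaper (see the discussion below).
  if (!N->hasOneUse())
    return false;
  int64_t Imm = N->getSExtValue();
  return (-4096 <= Imm && Imm <= -2049) || (2048 <= Imm && Imm <= 4094);
}]>;

// One half of the immediate, rounded toward zero (e.g. 2999 -> 1499).
def AddiPairImmA : SDNodeXForm<imm, [{
  int64_t Imm = N->getSExtValue();
  return CurDAG->getTargetConstant(Imm / 2, SDLoc(N), N->getValueType(0));
}]>;

// The other half, carrying the remainder (e.g. 2999 -> 1500).
def AddiPairImmB : SDNodeXForm<imm, [{
  int64_t Imm = N->getSExtValue();
  return CurDAG->getTargetConstant(Imm - Imm / 2, SDLoc(N),
                                   N->getValueType(0));
}]>;

// (add rs1, imm) ==> (ADDI (ADDI rs1, immB), immA)
def : Pat<(add GPR:$rs1, (AddiPair:$rs2)),
          (ADDI (ADDI GPR:$rs1, (AddiPairImmB AddiPair:$rs2)),
                (AddiPairImmA AddiPair:$rs2))>;
```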
llvm/lib/Target/RISCV/RISCVInstrInfo.td:1298
Do we need to make sure sext_inreg is the only user of the add? Otherwise we'll emit two ADDIWs and two ADDIs.
llvm/lib/Target/RISCV/RISCVInstrInfo.td:1298
I think if we emitted ADDI followed by ADDIW for the sign_ext case, the first ADDI would CSE with the first ADDI from a non-sext case if there were multiple uses. Then we wouldn't need to check for multiple uses.
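(A hypothetical illustration of that CSE, using the 2999 = 1500 + 1499 split from the examples below; the register names are made up:)

```
addi  a2, a0, 1500   # first half, shared by both users via CSE
addi  a3, a2, 1499   # plain 64-bit result for the non-sext use
addiw a4, a2, 1499   # sign-extended result; only the final add needs W
```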
llvm/lib/Target/RISCV/RISCVInstrInfo.td:1298
I do not quite understand your concern. Why do you think "Otherwise we'll emit two ADDIWs and two ADDIs"? That is impossible. I did not find any other cases/IR patterns for this optimization.

llvm/lib/Target/RISCV/RISCVInstrInfo.td:1298
In the current patch, there is no possibility that ADDIs and ADDIWs are emitted mixed together. I can remove the one-use check, but I am concerned that if the immediate composed by a lui/addi pair has further uses, my transform leads to less efficient code.
llvm/lib/Target/RISCV/RISCVInstrInfo.td:1298
Example test:

```
define signext i32 @add32_sext_accept(i32 signext %a, i32* %b) nounwind {
  %1 = add i32 %a, 2999
  store i32 %1, i32* %b
  ret i32 %1
}
```

produces

```
addi  a2, a0, 1500
addi  a2, a2, 1499
addiw a0, a0, 1500
addiw a0, a0, 1499
sw    a2, 0(a1)
ret
```

For that example we could use the addiw result for the sw, but that's a bit hard to fix at the moment. Here's another example:

```
define i64 @add32_sext_accept(i64 %a, i64* %b) nounwind {
  %1 = add i64 %a, 2999
  store i64 %1, i64* %b
  %2 = shl i64 %1, 32
  %3 = ashr i64 %2, 32
  ret i64 %3
}
```

produces

```
addi  a2, a0, 1500
addi  a2, a2, 1499
addiw a0, a0, 1500
addiw a0, a0, 1499
sd    a2, 0(a1)
ret
```
llvm/lib/Target/RISCV/RISCVInstrInfo.td:1298
I see, your concern does matter. Currently I cannot figure out an easy way to cover all the special cases, so I will remove the ADDIW rule.
I have removed the ADDIW parts; this patch is now simply a refactoring of the original optimization, changing it from DAG selection code to a TD pattern. No extra tests are needed, since llvm/test/CodeGen/RISCV/add-imm.ll already covers all cases.
LGTM
I think to address my concern you can just change the (sext_inreg (add GPR:$rs1, (AddiPair GPR:$rs2)), i32) pattern to produce (ADDIW (ADDI GPR:$rs1, (AddiPairImmB GPR:$rs2))). We only need the W on the outer ADDI to sign extend. That should reduce my test examples to 1 ADDIW and 2 ADDIs, which is the same number of instructions you get now with sext.w+addi+addi in the worst case. And when there aren't additional uses, you get down to just 2 instructions.
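A rough sketch of what this first option could look like in TD form; the output-operand spelling here is my guess, not a quote from the review:

```
// Option 1 (sketch): only the outer instruction needs the W form, so the
// inner ADDI stays eligible for CSE with the ADDIs of non-sext users.
def : Pat<(sext_inreg (add GPR:$rs1, (AddiPair:$rs2)), i32),
          (ADDIW (ADDI GPR:$rs1, (AddiPairImmB AddiPair:$rs2)),
                 (AddiPairImmA AddiPair:$rs2))>;
```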
The other option is to change the pattern to (sext_inreg (add_oneuse GPR:$rs1, (AddiPair GPR:$rs2)), i32) by using a PatFrag to check that the add only has one use, the sext_inreg. That would keep my example tests as sext.w+addi+addi as the pattern wouldn't match, but allow 2 instructions when the one-use check passes.
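And a sketch of the second option; the add_oneuse PatFrag below follows the usual one-use PatFrag idiom, though the exact in-tree spelling may differ:

```
// Option 2 (sketch): restrict the pattern to adds whose only user is the
// sext_inreg, so multi-use adds keep the sext.w+addi+addi lowering.
def add_oneuse : PatFrag<(ops node:$A, node:$B), (add node:$A, node:$B), [{
  return N->hasOneUse();
}]>;

def : Pat<(sext_inreg (add_oneuse GPR:$rs1, (AddiPair:$rs2)), i32),
          (ADDIW (ADDIW GPR:$rs1, (AddiPairImmB AddiPair:$rs2)),
                 (AddiPairImmA AddiPair:$rs2))>;
```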
The first approach allows 2 of the addis in my test cases to execute in parallel on a superscalar core. The second approach with add_oneuse serializes the sext.w after the addis have completed.