This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] DAG combine (sra (shl X, 32), 32 - C) -> (shl (sext_inreg X, i32), C).
ClosedPublic

Authored by craig.topper on Jun 29 2022, 11:29 AM.

Details

Reviewers

asb
reames
luismarques
frasercrmck

Commits

rG9ace5af0495c: [RISCV] DAG combine (sra (shl X, 32), 32 - C) -> (shl (sext_inreg X, i32), C).

Summary

The sext_inreg can often be folded into an earlier instruction by
using a W instruction. The sext_inreg also works better with our ABI.

This is one of the steps toward improving the generated code for this example: https://godbolt.org/z/hssn6sPco
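The equivalence behind the combine can be checked on plain 64-bit integers. The following is a minimal sketch, not the patch itself; the function names `shlThenSra`, `sextThenShl`, and `formsAgree` are hypothetical and only model the two DAG patterns named in the title.

```cpp
#include <cassert>
#include <cstdint>

// Models the original pattern: (sra (shl X, 32), 32 - C).
// The left shift is done on an unsigned value to avoid signed-overflow UB.
int64_t shlThenSra(int64_t X, unsigned C) {
  int64_t Shifted = static_cast<int64_t>(static_cast<uint64_t>(X) << 32);
  // Right shift of a signed value is arithmetic (guaranteed since C++20).
  return Shifted >> (32 - C);
}

// Models the combined pattern: (shl (sext_inreg X, i32), C).
int64_t sextThenShl(int64_t X, unsigned C) {
  // Sign-extend the low 32 bits of X, as sext_inreg X, i32 would.
  int64_t SExt = static_cast<int32_t>(static_cast<uint32_t>(X));
  return static_cast<int64_t>(static_cast<uint64_t>(SExt) << C);
}

// Returns true if both forms agree for every C in [1, 32], which
// corresponds to sra amounts in [0, 31] in the original pattern.
bool formsAgree(int64_t X) {
  for (unsigned C = 1; C <= 32; ++C)
    if (shlThenSra(X, C) != sextThenShl(X, C))
      return false;
  return true;
}
```

The two forms agree because shifting X left by 32 leaves only its low 32 bits in the upper half, so the arithmetic right shift by 32 - C reproduces the sign-extended low word scaled by 2^C.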

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

craig.topper created this revision. Jun 29 2022, 11:29 AM

Herald added a project: Restricted Project. Jun 29 2022, 11:29 AM

Herald added subscribers: sunshaoce, VincentWu, luke957 and 27 others.

craig.topper requested review of this revision. Jun 29 2022, 11:29 AM

Herald added a project: Restricted Project. Jun 29 2022, 11:29 AM

Herald added subscribers: pcwang-thead, eopXD, MaskRay.

Harbormaster completed remote builds in B172834: Diff 441103. Jun 29 2022, 11:29 AM

I think you need to push the commit with rv64i-shift-sext.ll?

Given this only deals with generic nodes, would this make sense to be a TLI-guided generic combine? (And maybe other targets could benefit from it, too, or already do this in target combines?)

craig.topper mentioned this in rG75095e628124: [RISCV] Pre-commit tests for D128843. NFC. Jun 29 2022, 12:12 PM

In D128843#3620025, @jrtc27 wrote:

Given this only deals with generic nodes, would this make sense to be a TLI-guided generic combine? (And maybe other targets could benefit from it, too, or already do this in target combines?)

Looks like X86 has something similar in combineShiftRightArithmetic, but they handle a more generic form.

// fold (ashr (shl, a, [56,48,32,24,16]), SarConst)                            
// into (shl, (sext (a), [56,48,32,24,16] - SarConst)) or                      
// into (lshr, (sext (a), SarConst - [56,48,32,24,16]))                        
// depending on sign of (SarConst - [56,48,32,24,16])
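The more generic form in the quoted comment can be modeled on 64-bit integers as well. This is a rough sketch under stated assumptions, not the X86 implementation: the names `sextInReg`, `originalForm`, and `foldedForm` are hypothetical, only the 64-bit shift amounts 56, 48, and 32 are covered, and the right-shift case uses an arithmetic shift because that is what keeps the two forms equal for negative inputs in this model.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low Bits bits of A (Bits is 8, 16, or 32 here).
int64_t sextInReg(int64_t A, unsigned Bits) {
  switch (Bits) {
  case 8:  return static_cast<int8_t>(static_cast<uint8_t>(A));
  case 16: return static_cast<int16_t>(static_cast<uint16_t>(A));
  case 32: return static_cast<int32_t>(static_cast<uint32_t>(A));
  default: return A; // other widths are left unchanged in this sketch
  }
}

// Models (ashr (shl A, ShlConst), SarConst) on a 64-bit value.
int64_t originalForm(int64_t A, unsigned ShlConst, unsigned SarConst) {
  int64_t Shifted = static_cast<int64_t>(static_cast<uint64_t>(A) << ShlConst);
  return Shifted >> SarConst;
}

// Models the folded form: sign-extend first, then shift left or right
// depending on the sign of (SarConst - ShlConst).
int64_t foldedForm(int64_t A, unsigned ShlConst, unsigned SarConst) {
  int64_t SExt = sextInReg(A, 64 - ShlConst);
  if (SarConst <= ShlConst)
    return static_cast<int64_t>(static_cast<uint64_t>(SExt)
                                << (ShlConst - SarConst));
  return SExt >> (SarConst - ShlConst); // arithmetic shift (assumption above)
}
```

The RISC-V combine in this revision corresponds to the single case ShlConst = 32 with SarConst below 32, which always takes the left-shift branch.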

For RISC-V we also support sext_inreg for i8 and i16 with Zbb. I'm not sure we want to convert to it naively since it won't always fold away and zext.h/zext.b aren't compressible. It probably makes sense to convert i8/i16 if the input is a load even without Zbb. But that probably needs to create a sextload instead of creating a sext_inreg.

I have another patch I'm working on that depends on this one. So I'd like to consider making this target independent as a future pass. I'll add a FIXME.

Add FIXME to make this generic.

I'm probably going to end up relaxing the one use check here to be that all users of the shl X, 32 are SRA instructions so that we can end up with a sext_inreg shared by multiple shls.

craig.topper edited the summary of this revision. Jun 29 2022, 1:06 PM

My plan had been to solve https://godbolt.org/z/hssn6sPco by adding a fold for (add (shl X, 32), C<<32) -> (shl (add X, C), 32) and then let this patch optimize the shift with sra that might be after it. I thought the (add (shl X, 32), C<<32) fold could stand on its own as well, but in testing I found that some values of C cause an infinite loop with isDesirableToCommuteWithShift.

The larger (sra (add (shl X, 32), C << 32), 32-C1) -> (shl (sext_inreg (add X, C), i32), C1) pattern is probably always profitable since the sext_inreg will always become an ADDW.
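The larger pattern can be sanity-checked the same way on plain integers. This is a minimal sketch with hypothetical function names (`addAfterShift`, `addBeforeShift`); it models the arithmetic only and says nothing about how the DAG combine is written.

```cpp
#include <cassert>
#include <cstdint>

// Left side: (sra (add (shl X, 32), C << 32), 32 - C1).
// Unsigned arithmetic is used so that wraparound is well defined.
int64_t addAfterShift(int64_t X, int64_t C, unsigned C1) {
  uint64_t Sum = (static_cast<uint64_t>(X) << 32) +
                 (static_cast<uint64_t>(C) << 32);
  return static_cast<int64_t>(Sum) >> (32 - C1);
}

// Right side: (shl (sext_inreg (add X, C), i32), C1).
// The 32-bit wrapping add followed by sign extension is what ADDW computes.
int64_t addBeforeShift(int64_t X, int64_t C, unsigned C1) {
  uint32_t Sum32 = static_cast<uint32_t>(X) + static_cast<uint32_t>(C);
  int64_t SExt = static_cast<int32_t>(Sum32);
  return static_cast<int64_t>(static_cast<uint64_t>(SExt) << C1);
}
```

Both sides agree even when the 32-bit add wraps, because adding in the upper half of a 64-bit register discards exactly the carries that a 32-bit add discards.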

craig.topper retitled this revision from [RISCV] DAG combine (sra (shl X, 32), 32 - C) -> (sra (sext_inreg X, i32), C). to [RISCV] DAG combine (sra (shl X, 32), 32 - C) -> (shl (sext_inreg X, i32), C). Jun 29 2022, 2:52 PM

craig.topper added inline comments.

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
8533	I typoed this comment

Fix mistake in comment

Harbormaster completed remote builds in B172876: Diff 441177. Jun 29 2022, 4:28 PM

craig.topper added a child revision: D128869: [RISCV] Fold (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), C). Jun 29 2022, 8:52 PM

LGTM, thanks!

This revision is now accepted and ready to land. Jun 30 2022, 7:21 AM

This revision was landed with ongoing or failed builds. Jun 30 2022, 9:02 AM

Closed by commit rG9ace5af0495c: [RISCV] DAG combine (sra (shl X, 32), 32 - C) -> (shl (sext_inreg X, i32), C). (authored by craig.topper).

This revision was automatically updated to reflect the committed changes.

craig.topper added a commit: rG9ace5af0495c: [RISCV] DAG combine (sra (shl X, 32), 32 - C) -> (shl (sext_inreg X, i32), C)..

Revision Contents

Path	Size
llvm/lib/Target/RISCV/RISCVISelLowering.cpp	32 lines
llvm/test/CodeGen/RISCV/rv64i-shift-sext.ll	19 lines

Diff 441423

llvm/lib/Target/RISCV/RISCVISelLowering.cpp

[...] RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,

setMinimumJumpTableEntries(5);		setMinimumJumpTableEntries(5);

// Jumps are expensive, compared to logic		// Jumps are expensive, compared to logic
setJumpIsExpensive();		setJumpIsExpensive();

setTargetDAGCombine({ISD::INTRINSIC_WO_CHAIN, ISD::ADD, ISD::SUB, ISD::AND,		setTargetDAGCombine({ISD::INTRINSIC_WO_CHAIN, ISD::ADD, ISD::SUB, ISD::AND,
ISD::OR, ISD::XOR});		ISD::OR, ISD::XOR});
		if (Subtarget.is64Bit())
		setTargetDAGCombine(ISD::SRA);

if (Subtarget.hasStdExtF())		if (Subtarget.hasStdExtF())
setTargetDAGCombine({ISD::FADD, ISD::FMAXNUM, ISD::FMINNUM});		setTargetDAGCombine({ISD::FADD, ISD::FMAXNUM, ISD::FMINNUM});

if (Subtarget.hasStdExtZbp())		if (Subtarget.hasStdExtZbp())
setTargetDAGCombine({ISD::ROTL, ISD::ROTR});		setTargetDAGCombine({ISD::ROTL, ISD::ROTR});

if (Subtarget.hasStdExtZbb())		if (Subtarget.hasStdExtZbb())
[...] if (NegAcc) {
case RISCVISD::VFNMADD_VL: Opcode = RISCVISD::VFNMSUB_VL; break;		case RISCVISD::VFNMADD_VL: Opcode = RISCVISD::VFNMSUB_VL; break;
case RISCVISD::VFNMSUB_VL: Opcode = RISCVISD::VFNMADD_VL; break;		case RISCVISD::VFNMSUB_VL: Opcode = RISCVISD::VFNMADD_VL; break;
}		}
// clang-format on		// clang-format on
}		}

return Opcode;		return Opcode;
}		}

		// Combine (sra (shl X, 32), 32 - C) -> (shl (sext_inreg X, i32), C)
		// FIXME: Should this be a generic combine? There's a similar combine on X86.
		static SDValue performSRACombine(SDNode *N, SelectionDAG &DAG,
		const RISCVSubtarget &Subtarget) {
		assert(N->getOpcode() == ISD::SRA && "Unexpected opcode");

		if (N->getValueType(0) != MVT::i64 || !Subtarget.is64Bit())
		return SDValue();

		auto *C = dyn_cast<ConstantSDNode>(N->getOperand(1));
		if (!C || C->getZExtValue() >= 32)
		return SDValue();

		SDValue N0 = N->getOperand(0);
		if (N0.getOpcode() != ISD::SHL || !N0.hasOneUse() ||
		!isa<ConstantSDNode>(N0.getOperand(1)) ||
		N0.getConstantOperandVal(1) != 32)
		return SDValue();

		SDLoc DL(N);
		SDValue SExt = DAG.getNode(ISD::SIGN_EXTEND_INREG, DL, MVT::i64,
		N0.getOperand(0), DAG.getValueType(MVT::i32));
		return DAG.getNode(ISD::SHL, DL, MVT::i64, SExt,
		DAG.getConstant(32 - C->getZExtValue(), DL, MVT::i64));
		}

SDValue RISCVTargetLowering::PerformDAGCombine(SDNode *N,		SDValue RISCVTargetLowering::PerformDAGCombine(SDNode *N,
DAGCombinerInfo &DCI) const {		DAGCombinerInfo &DCI) const {
SelectionDAG &DAG = DCI.DAG;		SelectionDAG &DAG = DCI.DAG;

// Helper to call SimplifyDemandedBits on an operand of N where only some low		// Helper to call SimplifyDemandedBits on an operand of N where only some low
// bits are demanded. N will be added to the Worklist if it was not deleted.		// bits are demanded. N will be added to the Worklist if it was not deleted.
// Caller should return SDValue(N, 0) if this returns true.		// Caller should return SDValue(N, 0) if this returns true.
auto SimplifyDemandedLowBitsHelper = [&](unsigned OpNo, unsigned LowBits) {		auto SimplifyDemandedLowBitsHelper = [&](unsigned OpNo, unsigned LowBits) {
[...] if (ShAmt.getOpcode() == RISCVISD::SPLAT_VECTOR_SPLIT_I64_VL) {
ShAmt = DAG.getNode(RISCVISD::VMV_V_X_VL, DL, VT, DAG.getUNDEF(VT),		ShAmt = DAG.getNode(RISCVISD::VMV_V_X_VL, DL, VT, DAG.getUNDEF(VT),
ShAmt.getOperand(1), VL);		ShAmt.getOperand(1), VL);
return DAG.getNode(N->getOpcode(), DL, VT, N->getOperand(0), ShAmt,		return DAG.getNode(N->getOpcode(), DL, VT, N->getOperand(0), ShAmt,
N->getOperand(2), N->getOperand(3));		N->getOperand(2), N->getOperand(3));
}		}
break;		break;
}		}
case ISD::SRA:		case ISD::SRA:
		if (SDValue V = performSRACombine(N, DAG, Subtarget))
		return V;
		LLVM_FALLTHROUGH;
case ISD::SRL:		case ISD::SRL:
case ISD::SHL: {		case ISD::SHL: {
SDValue ShAmt = N->getOperand(1);		SDValue ShAmt = N->getOperand(1);
if (ShAmt.getOpcode() == RISCVISD::SPLAT_VECTOR_SPLIT_I64_VL) {		if (ShAmt.getOpcode() == RISCVISD::SPLAT_VECTOR_SPLIT_I64_VL) {
// We don't need the upper 32 bits of a 64-bit element for a shift amount.		// We don't need the upper 32 bits of a 64-bit element for a shift amount.
SDLoc DL(N);		SDLoc DL(N);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
ShAmt = DAG.getNode(RISCVISD::VMV_V_X_VL, DL, VT, DAG.getUNDEF(VT),		ShAmt = DAG.getNode(RISCVISD::VMV_V_X_VL, DL, VT, DAG.getUNDEF(VT),

llvm/test/CodeGen/RISCV/rv64i-shift-sext.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=riscv64 -verify-machineinstrs < %s \			; RUN: llc -mtriple=riscv64 -verify-machineinstrs < %s \
	; RUN: | FileCheck %s -check-prefix=RV64I			; RUN: | FileCheck %s -check-prefix=RV64I

	; Test that we turn (sra (shl X, 32), 32-C) into (slli (sext.w X), C)			; Test that we turn (sra (shl X, 32), 32-C) into (slli (sext.w X), C)

	define i64 @test1(i64 %a) nounwind {			define i64 @test1(i64 %a) nounwind {
	; RV64I-LABEL: test1:			; RV64I-LABEL: test1:
	; RV64I: # %bb.0:			; RV64I: # %bb.0:
	; RV64I-NEXT: slli a0, a0, 32			; RV64I-NEXT: sext.w a0, a0
	; RV64I-NEXT: srai a0, a0, 30			; RV64I-NEXT: slli a0, a0, 2
	; RV64I-NEXT: ret			; RV64I-NEXT: ret
	%1 = shl i64 %a, 32			%1 = shl i64 %a, 32
	%2 = ashr i64 %1, 30			%2 = ashr i64 %1, 30
	ret i64 %2			ret i64 %2
	}			}

	define i64 @test2(i32 signext %a) nounwind {			define i64 @test2(i32 signext %a) nounwind {
	; RV64I-LABEL: test2:			; RV64I-LABEL: test2:
	; RV64I: # %bb.0:			; RV64I: # %bb.0:
	; RV64I-NEXT: slli a0, a0, 32			; RV64I-NEXT: slli a0, a0, 3
	; RV64I-NEXT: srai a0, a0, 29
	; RV64I-NEXT: ret			; RV64I-NEXT: ret
	%1 = zext i32 %a to i64			%1 = zext i32 %a to i64
	%2 = shl i64 %1, 32			%2 = shl i64 %1, 32
	%3 = ashr i64 %2, 29			%3 = ashr i64 %2, 29
	ret i64 %3			ret i64 %3
	}			}

	define i64 @test3(i32* %a) nounwind {			define i64 @test3(i32* %a) nounwind {
	; RV64I-LABEL: test3:			; RV64I-LABEL: test3:
	; RV64I: # %bb.0:			; RV64I: # %bb.0:
	; RV64I-NEXT: lw a0, 0(a0)			; RV64I-NEXT: lw a0, 0(a0)
	; RV64I-NEXT: slli a0, a0, 32			; RV64I-NEXT: slli a0, a0, 4
	; RV64I-NEXT: srai a0, a0, 28
	; RV64I-NEXT: ret			; RV64I-NEXT: ret
	%1 = load i32, i32* %a			%1 = load i32, i32* %a
	%2 = zext i32 %1 to i64			%2 = zext i32 %1 to i64
	%3 = shl i64 %2, 32			%3 = shl i64 %2, 32
	%4 = ashr i64 %3, 28			%4 = ashr i64 %3, 28
	ret i64 %4			ret i64 %4
	}			}

	define i64 @test4(i32 signext %a, i32 signext %b) nounwind {			define i64 @test4(i32 signext %a, i32 signext %b) nounwind {
	; RV64I-LABEL: test4:			; RV64I-LABEL: test4:
	; RV64I: # %bb.0:			; RV64I: # %bb.0:
	; RV64I-NEXT: addw a0, a0, a1			; RV64I-NEXT: addw a0, a0, a1
	; RV64I-NEXT: slli a0, a0, 32			; RV64I-NEXT: slli a0, a0, 30
	; RV64I-NEXT: srai a0, a0, 2
	; RV64I-NEXT: ret			; RV64I-NEXT: ret
	%1 = add i32 %a, %b			%1 = add i32 %a, %b
	%2 = zext i32 %1 to i64			%2 = zext i32 %1 to i64
	%3 = shl i64 %2, 32			%3 = shl i64 %2, 32
	%4 = ashr i64 %3, 2			%4 = ashr i64 %3, 2
	ret i64 %4			ret i64 %4
	}			}

	define i64 @test5(i32 signext %a, i32 signext %b) nounwind {			define i64 @test5(i32 signext %a, i32 signext %b) nounwind {
	; RV64I-LABEL: test5:			; RV64I-LABEL: test5:
	; RV64I: # %bb.0:			; RV64I: # %bb.0:
	; RV64I-NEXT: xor a0, a0, a1			; RV64I-NEXT: xor a0, a0, a1
	; RV64I-NEXT: slli a0, a0, 32			; RV64I-NEXT: slli a0, a0, 31
	; RV64I-NEXT: srai a0, a0, 1
	; RV64I-NEXT: ret			; RV64I-NEXT: ret
	%1 = xor i32 %a, %b			%1 = xor i32 %a, %b
	%2 = zext i32 %1 to i64			%2 = zext i32 %1 to i64
	%3 = shl i64 %2, 32			%3 = shl i64 %2, 32
	%4 = ashr i64 %3, 1			%4 = ashr i64 %3, 1
	ret i64 %4			ret i64 %4
	}			}

	define i64 @test6(i32 signext %a, i32 signext %b) nounwind {			define i64 @test6(i32 signext %a, i32 signext %b) nounwind {
	; RV64I-LABEL: test6:			; RV64I-LABEL: test6:
	; RV64I: # %bb.0:			; RV64I: # %bb.0:
	; RV64I-NEXT: sllw a0, a0, a1			; RV64I-NEXT: sllw a0, a0, a1
	; RV64I-NEXT: slli a0, a0, 32			; RV64I-NEXT: slli a0, a0, 16
	; RV64I-NEXT: srai a0, a0, 16
	; RV64I-NEXT: ret			; RV64I-NEXT: ret
	%1 = shl i32 %a, %b			%1 = shl i32 %a, %b
	%2 = zext i32 %1 to i64			%2 = zext i32 %1 to i64
	%3 = shl i64 %2, 32			%3 = shl i64 %2, 32
	%4 = ashr i64 %3, 16			%4 = ashr i64 %3, 16
	ret i64 %4			ret i64 %4
	}			}
