This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Add performMULcombine to perform strength-reduction
Needs ReviewPublic

Authored by philipp.tomsich on Feb 6 2023, 6:24 AM.

Details

Summary

The RISC-V backend thus far does not perform strength-reduction of
multiplications, which has led to a long (but not exhaustive) list of
3-instruction patterns being listed in the .td files to utilize the
shift-and-add instructions from Zba and XTHeadBa for strength-reduction.

This adds the logic to perform strength-reduction through the DAG
combine for ISD::MUL. Initially, we wire this up for XTheadBa only,
until this has had some time to settle and get real-world test
exposure.

The following strength-reduction strategies are currently supported:

  • XTheadBa
    • C = (n + 1) // th.addsl
    • C = (n + 1)k // th.addsl, slli
    • C = (n + 1)(m + 1) // th.addsl, th.addsl
    • C = (n + 1)(m + 1)k // th.addsl, th.addsl, slli
    • C = ((n + 1)m + 1) // th.addsl, th.addsl
    • C = ((n + 1)m + 1)k // th.addsl, th.addsl, slli
  • base ISA
    • C with 2 set bits // slli, slli, add
			       (possibly slli, th.addsl)

Even though the slli+slli+add sequence would be supported without
XTheadBa, it is currently gated behind XTheadBa to avoid having to
update a large number of test cases (i.e., anything that multiplies by
a constant with only 2 bits set) in this commit.

With the strength reduction now being performed in performMULCombine,
we drop the (now redundant) patterns from RISCVInstrInfoXTHead.td.

Depends on D143029

Diff Detail

Event Timeline

philipp.tomsich created this revision.Feb 6 2023, 6:24 AM
Herald added a project: Restricted Project.Feb 6 2023, 6:24 AM
philipp.tomsich requested review of this revision.Feb 6 2023, 6:24 AM
  • rerun clang-format
kito-cheng added inline comments.Feb 6 2023, 11:40 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
8574

Early exit if no Subtarget.hasVendorXTHeadBa()?

8613

Is it applicable to Zba?

craig.topper added inline comments.Feb 7 2023, 12:02 AM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
8579

Why allow for vectors?

8617

Divisible*

8645

It's harmless to create a nop ANY_EXTEND. getNode detects it.

8650

It's harmless to create a nop TRUNCATE.

8692

feasible*

craig.topper added inline comments.Feb 7 2023, 12:13 AM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
8645

Though I'm not sure why the VT would be different. The code doesn't look through any extends or truncates, so the types shouldn't be changing, right?

philipp.tomsich marked 4 inline comments as done.
  • don't try to handle vectors (yet)
  • unconditionally insert ANY_EXTEND and TRUNCATE and let later passes clean up
philipp.tomsich added inline comments.Feb 7 2023, 5:13 AM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
8574

The final optimization (turning this into slli + slli + add) is applicable to plain RV64I.
So that block (the one for "C has 2 bits set") will eventually move out of the if in a later commit.

Refer to the remark from the commit message:

Even though the slli+slli+add sequence would be supported without
XTheadBa, it is currently gated behind XTheadBa to avoid having to
update a large number of test cases (i.e., anything that multiplies by
a constant with only 2 bits set) in this commit.

For that reason, I'd like to keep the logic as is.

8613

Yes, it is applicable.
I'd like to keep this for a later commit (once we have had a chance to run a full QA pass with this enabled for Zba).

Ok to defer?

8645

th.addsl is defined for XLen only (unlike mul, which has a W-form).
When operating on MVT::i32, we any-extend the operand and then truncate the result (the truncate will be merged into a W-form add or shift on the final instruction).

If we don't any-extend, then compiling the following function for RV64

unsigned func32(unsigned a)
{
    return a * 200;
}

will keep the first operation separated out as a slliw + add:

func32:
slliw a1, a0, 2
add a0, a0, a1
th.addsl a0, a0, a0, 2
slliw a0, a0, 3
ret

This revision was not accepted when it landed; it landed in state Needs Review.Feb 7 2023, 10:57 PM
This revision was automatically updated to reflect the committed changes.
philipp.tomsich reopened this revision.Feb 7 2023, 11:03 PM

Reopening, as this was accidentally pushed (while using 'arc patch' on D143534) and has been reverted.

reames added a subscriber: reames.Feb 8 2023, 7:38 AM

At a high level, it feels wrong for this to be applied only to the t-head versions of the Zba instructions. If this can be profitably done for the standard extension, I'd encourage you to do so. If anything, starting with the standard extension specifically so we get the test coverage would seem like a better strategy.

Have you looked at what it would take to share code with another target here? I glanced at x86, and it seems like there's a bunch of overlap. Maybe we could introduce a set of generic DAG combines which are enabled based on a callback or configuration? Haven't given this much thought, so take this as a light suggestion only.

philipp.tomsich planned changes to this revision.Feb 8 2023, 7:44 AM

I'll take the two suggestions (from Kito and Philip) on getting Zba supported in this initial change as a hint that there's no point in delaying this.
We'll get a new version ready that adds the following:

  • enable the combine for Zba

I will keep the following for a separate change:

  • move the slli + slli + add case out of the guard (this will be 3 instructions, with two of them independent, giving a { slli, slli } -> { add } critical path on dual-issue cores)
  • adjust all affected test-cases (there will be a massive ripple effect)

If you want the slli + slli + add case moved out of the guard in this change as well, please let me know and we'll fold that in too.

This revision was not accepted when it landed; it landed in state Changes Planned.Feb 17 2023, 10:45 AM
This revision was automatically updated to reflect the committed changes.
philipp.tomsich reopened this revision.Feb 17 2023, 10:47 AM

Accidentally pushed (another 'arc patch' issue) and reverted.