Details

Reviewers

asb
luismarques
efriedma

Commits

rG9f155bc6e592: [RISCV] Prevent re-ordering some adds after shifts
rL363736: [RISCV] Prevent re-ordering some adds after shifts

Summary

DAGCombine will normally turn a (shl (add x, c1), c2) into (add (shl x, c2), c1 << c2), where c1 and c2 are constants. This can be prevented by a callback in TargetLowering.

On RISC-V, materialising the constant c1 << c2 can be more expensive than materialising c1, because materialising the former may take more instructions, and may use a register, where materialising the latter would not.
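To make the transform concrete, here is a minimal standalone sketch (plain C++, not LLVM code) of the two forms on 32-bit wrapping integers. The function names are hypothetical and exist only for illustration. Both forms compute the same value; what changes is the constant that has to be materialised.

```cpp
#include <cassert>
#include <cstdint>

// Original form: (shl (add x, c1), c2).
uint32_t addThenShift(uint32_t x, uint32_t c1, unsigned c2) {
  assert(c2 < 32 && "shift amount must be smaller than the bit width");
  return (x + c1) << c2;
}

// Combined form: (add (shl x, c2), c1 << c2).
// The added constant is now c1 << c2, which may no longer fit an immediate.
uint32_t shiftThenAdd(uint32_t x, uint32_t c1, unsigned c2) {
  assert(c2 < 32 && "shift amount must be smaller than the bit width");
  return (x << c2) + (c1 << c2);
}
```

With c1 = 1 and c2 = 24 (the add_small_const test in this revision), the first form adds 1, while the second has to add 1 << 24 = 16777216.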

This patch implements the hook in RISCVTargetLowering to prevent this transform, in the cases where:

c1 fits into the immediate field in an addi instruction.
c1 takes fewer instructions to materialise than c1 << c2.

In future, DAGCombine could do the check to see whether c1 fits into an add immediate, which might simplify the hooks of more targets than just RISC-V.
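The first condition in the summary can be sketched as follows. This is a simplified standalone model, assuming the 12-bit signed immediate range of addi; the function names are hypothetical, and the real hook works on SDNode operands rather than plain integers.

```cpp
#include <cassert>
#include <cstdint>

// addi takes a 12-bit signed immediate, i.e. a value in [-2048, 2047].
bool fitsAddImmediate(int64_t Imm) { return Imm >= -2048 && Imm <= 2047; }

// Hypothetical model of the earliest version of the hook: the re-ordering is
// considered desirable only when c1 cannot be folded into an addi anyway.
bool isDesirableToCommuteSimple(int64_t C1) { return !fitsAddImmediate(C1); }
```

Under this model the add of 1 in add_small_const stays before the shift, while the adds of 4095 and 32767 in the other two tests are allowed to move.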

Diff Detail

Repository

rG LLVM Github Monorepo

Build Status

Buildable 32879
Build 32878: arc lint + arc unit

Event Timeline

lenary created this revision. Jun 4 2019, 7:05 AM

Herald added a project: Restricted Project. Jun 4 2019, 7:05 AM

Herald added subscribers: llvm-commits, benna, psnobl and 19 others.

Harbormaster completed remote builds in B32876: Diff 202931. Jun 4 2019, 7:07 AM

Add comments about larger constants. These can be improved at a later date.

Harbormaster completed remote builds in B32879: Diff 202939. Jun 4 2019, 7:41 AM

Jim added a subscriber: Jim. Jun 4 2019, 7:32 PM

Generalise optimisation to check materialisation cost

Harbormaster completed remote builds in B32936: Diff 203171. Jun 5 2019, 8:58 AM

asb added inline comments. Jun 5 2019, 9:06 AM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
859	I was actually thinking it might be better to define getIntImmCost for RISC-V (which at least initially uses generateInstSeq, even if that might result in a little wasted work), then call that from here (that change might affect codegen in other areas, but should be an improvement). Arguably the introduction of getIntImmCost could make sense as a separate patch (that this one depends on), if a sensible standalone test case is straightforward.

Abstract away calculation of Materialisation Cost

Harbormaster completed remote builds in B32979: Diff 203312. Jun 6 2019, 2:35 AM

lewis-revill added a subscriber: lewis-revill. Jun 6 2019, 2:37 AM

lewis-revill added inline comments.

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
868	Isn't there an inaccuracy in this method of checking the materialization cost since `RISCVMatInt::generateInstSeq` always calculates the cost of materializing into a register? In this case we have instructions which might use immediates, but this will calculate the cost as being the same as a single-instruction materialization into a register followed by an instruction using a register. I.e.:
	addi rd, rs1, C
would appear to be the same cost as:
	lui rs2, (C >> 12)
	add rd, rs1, rs2

Remove out-of-date comments

Harbormaster completed remote builds in B32980: Diff 203314. Jun 6 2019, 2:43 AM

asb added inline comments. Jun 6 2019, 3:00 AM

llvm/lib/Target/RISCV/Utils/RISCVMatInt.cpp
78	(On Diff #203314)	Should add a comment to document what this does, and to document that it really does calculate the cost of materialising an integer (i.e. doesn't take into account whether there might be an opportunity for merging it into an addi). You should also document that it is invalid to call this for a Val which can't be represented with 32 bits when Is64Bit is false (that triggers an assert in generateInstSeq - I think it's probably still ok to treat that as an API misuse rather than adding more explicit error handling).

lenary edited the summary of this revision. Jun 6 2019, 3:35 AM

lenary retitled this revision from [RISCV] Prevent hoisting some adds after shifts to [RISCV] Prevent re-ordering some adds after shifts. Jun 6 2019, 3:48 AM

lenary marked 3 inline comments as done. Jun 6 2019, 4:06 AM

lenary added inline comments.

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
868	On line 860, we check that `C1` (which you're calling `C`) will fit into an add immediate. If it will, that counts as "free", and we definitely want to prevent the re-ordering, and we don't even check the materialisation cost. In fact, I'm going to update the patch to always allow the re-ordering if `C1 << C2` will fit into an add immediate, because then we also know that materialisation of that constant is "free", and so we should allow the re-ordering because it might help later dagcombines.
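The decision order described in this reply can be sketched as a standalone model. Everything here is an assumption made for illustration: the function names are hypothetical, and matCost32 is a simplified RV32-only cost model (an optional lui followed by an optional addi), not a call into RISCVMatInt. The final comparison follows the summary: the re-ordering is prevented when c1 takes fewer instructions to materialise than c1 << c2.

```cpp
#include <cassert>
#include <cstdint>

// addi accepts a 12-bit signed immediate.
bool fitsAddImm(int64_t Imm) { return Imm >= -2048 && Imm <= 2047; }

// Interpret a 32-bit pattern as a signed value.
int64_t signExtend32(uint32_t Bits) {
  return static_cast<int64_t>(Bits ^ 0x80000000u) - 0x80000000LL;
}

// Hypothetical RV32 cost model: one lui if the upper 20 bits are needed,
// plus one addi if the low 12 bits are non-zero (or nothing else was emitted).
int matCost32(int64_t Val) {
  assert(Val >= INT32_MIN && Val <= INT32_MAX && "32-bit values only");
  uint64_t U = static_cast<uint64_t>(Val);
  int64_t Lo12 = static_cast<int64_t>((U & 0xFFF) ^ 0x800) - 0x800;
  uint64_t Hi20 = ((U + 0x800) >> 12) & 0xFFFFF;
  int Cost = 0;
  if (Hi20 != 0)
    ++Cost; // lui
  if (Lo12 != 0 || Hi20 == 0)
    ++Cost; // addi
  return Cost;
}

// Returns true when moving the add after the shift is considered desirable.
bool isDesirableToCommute32(int32_t C1, unsigned C2) {
  assert(C2 < 32 && "shift amount must be smaller than the bit width");
  int64_t Shifted = signExtend32(static_cast<uint32_t>(C1) << C2);
  if (fitsAddImm(Shifted))
    return true; // c1 << c2 is "free": always allow the re-ordering.
  if (fitsAddImm(C1))
    return false; // Only c1 is "free": keep the add before the shift.
  // Neither fits: prevent only if c1 is strictly cheaper to materialise.
  return !(matCost32(C1) < matCost32(Shifted));
}
```

In this 32-bit model the final comparison does not change any outcome, because a constant that costs one instruction and does not fit an addi is a bare lui, and shifting it left keeps its low 12 bits clear. The comparison is expected to matter for wider constants, which this sketch does not cover.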

Allow Combine if C1 << C2 will fit into an immediate
Explain restrictions on RISCVMatInt::getIntMatCost

Harbormaster completed remote builds in B33052: Diff 203525. Jun 7 2019, 3:21 AM

lenary marked an inline comment as done. Jun 7 2019, 3:21 AM

I added some minor comments, along with a bigger suggestion. What do you think about adding a getIntMatCost taking APInt and IsRV64, which will split the immediate into XLEN-sized chunks (see comment for reference to similar code in AArch64)?

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
847	Actually, with the logic there now I guess we're checking that (add val, c1) is fewer instructions than (add, val', c1 << c2) which is a similar condition but not quite the same. i.e. both c1 and c1 << c2 might be materialisable with a single instruction, but if c1 << c2 is materialised using a single lui it's still unprofitable as we can't merge it into the add.
859	shift immediate -> add immediate
860	End comment with a full stop
869	I'm not totally liking calling getIntMatCost with something that isn't logically equivalent to IsRV64 (you could of course have MVT::i64 on RV32 pre-legalisation). The cost of materialising a 64-bit constant on RV64 also isn't necessarily the same as materialising an i64 split into two i32 on RV32. In an ideal world, we'd have a getIntMatCost taking an APInt+IsRV64, with similar logic to `int AArch64TTIImpl::getIntImmCost(const APInt &Imm, Type *Ty)` - i.e. splitting into 32-bit/64-bit chunks. Hard to imagine this being a big deal for this particular case, but it's good infrastructure to have.

asb mentioned this in D63007: [RISCV] Add RISCV-specific TargetTransformInfo. Jun 7 2019, 11:43 PM

Address review feedback

Introduce new getIntMatCost(const APInt &Val, bool IsRV64) API, which can materialise much wider constants than the previous method.
Clarify and check grammar in comments.

Harbormaster completed remote builds in B33479: Diff 205064. Jun 17 2019, 6:49 AM

lenary marked 4 inline comments as done. Jun 17 2019, 6:51 AM

In future, DAGCombine could do the check to see whether c1 fits into an add immediate, which might simplify the hooks of more targets than just RISC-V.

Why not do this the right way already?

@lebedev.ri @craig.topper

lenary mentioned this in D63433: [RISCV] Add RISCV-specific TargetTransformInfo. Jun 17 2019, 8:29 AM

Thanks for the update Sam, I think there might be a minor correctness issue with getIntMatCost - let me know what you think.

It would be great to add a simple sanity check for the chunking logic. Perhaps there's a test case involving -1 that produces a different answer with the current version of the patch versus a version updated to use the type size for the bit size.

llvm/lib/Target/RISCV/Utils/RISCVMatInt.cpp
78	(On Diff #205064)	Given you have a description in the header, I think this repeated description for the implementation is redundant (see https://llvm.org/docs/CodingStandards.html#doxygen-use-in-documentation-comments "Don’t duplicate the documentation comment in the header file and in the implementation file"). Though maybe I missed a de facto standard elsewhere in the codebase for duplicating a reduced version. I don't feel strongly about this, so do feel free to keep if you prefer.
81	(On Diff #205064)	I think this isn't going to give the desired result if Val is e.g. -1. In that case, getMinSignedBits is going to return 1, meaning the loop is only executed once regardless of the original type width. If the original type was e.g. an i64 on RV32, then two instructions are required to materialise the constant, yet the logic in this function will only return 1. I think the solution is to more closely mirror the AArch64TTIImpl::getIntImmCost function I mentioned before, and add a Type parameter (and maybe also adopt similar logic for sign-extending constants to be a multiple of the PlatRegSize).
llvm/lib/Target/RISCV/Utils/RISCVMatInt.h
41	(On Diff #205064)	Nit: Shouldn't this be Is64Bit to match the naming in the implementation?
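The -1 case raised in the comment on line 81 can be illustrated with a small standalone sketch. It does not use APInt; minSignedBits is a hypothetical helper that models what getMinSignedBits reports for a 64-bit value, and the two chunk-count helpers are likewise assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Smallest signed bit width that can hold Val (1 for both 0 and -1).
unsigned minSignedBits(int64_t Val) {
  unsigned Bits = 1;
  while (Bits < 64) {
    int64_t Min = -(int64_t(1) << (Bits - 1));
    int64_t Max = (int64_t(1) << (Bits - 1)) - 1;
    if (Val >= Min && Val <= Max)
      break;
    ++Bits;
  }
  return Bits;
}

// Number of register-sized chunks when the loop bound is the minimal width.
unsigned chunksFromMinBits(int64_t Val, unsigned RegSize) {
  return (minSignedBits(Val) + RegSize - 1) / RegSize;
}

// Number of register-sized chunks when the loop bound is the type width.
unsigned chunksFromTypeWidth(unsigned TypeBits, unsigned RegSize) {
  return (TypeBits + RegSize - 1) / RegSize;
}
```

For Val = -1 held in an i64 on a 32-bit target, the minimal-width bound visits one chunk, whereas the type-width bound visits two, matching the two registers that have to be filled.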

In D62857#1546071, @xbolva00 wrote:

In future, DAGCombine could do the check to see whether c1 fits into an add immediate, which might simplify the hooks of more targets than just RISC-V.

Why not do this the right way already?

@lebedev.ri @craig.topper

Now this patch has been extended to cover more cases (i.e. comparing materialisation cost rather than just identifying the isLegalAddImmediate case), it wouldn't have much effect on this backend. A patch that uses isLegalAddImmediate in the relevant DAGCombine might be a small benefit for targets that don't implement the isDesirableToCommuteWithShift hook, but I don't think it would affect the implementation here (it's not obvious to me the hook implementation should assume isLegalAddImmediate had already been checked). So if it does make sense to add, I think it's definitely a separate patch to this one.

asb added inline comments. Jun 18 2019, 1:10 AM

llvm/test/CodeGen/RISCV/add-before-shl.ll
14	The patch has since been updated to do a direct cost comparison, rather than just looking at the case where the constant fits into an immediate. These two paragraphs should be updated to reflect that.

lenary added a child revision: D63433: [RISCV] Add RISCV-specific TargetTransformInfo. Jun 18 2019, 1:48 AM

Address review feedback

Update getIntMatCost to take an integer size (in bits). This ensures we chunk the constant correctly for the legal types on the target, and account for the costs of all required chunks. I was unable to devise a simple test case for when this behaviour would not match the behaviour using getMinSignedBits, due to legalisation always splitting the wider type before the isDesirableToCommuteWithShift callback is called.

The chunking will automatically expand each chunk to be the platform register width, so we don't need to sign extend the constant to be a multiple of that width before we start chunking.
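The chunking described in this update can be sketched as a standalone RV32-only model. The names are hypothetical, the per-chunk cost reuses a simplified lui/addi model rather than RISCVMatInt::generateInstSeq, and Val is assumed to be passed already sign-extended to 64 bits. Each 32-bit chunk is sign-extended on its own, which is why no up-front sign extension to a multiple of the register width is needed.

```cpp
#include <cassert>
#include <cstdint>

// Interpret a 32-bit pattern as a signed value.
int64_t signExtend32(uint32_t Bits) {
  return static_cast<int64_t>(Bits ^ 0x80000000u) - 0x80000000LL;
}

// Hypothetical RV32 cost model: optional lui plus optional addi.
int matCost32(int64_t Val) {
  assert(Val >= INT32_MIN && Val <= INT32_MAX && "32-bit values only");
  uint64_t U = static_cast<uint64_t>(Val);
  int64_t Lo12 = static_cast<int64_t>((U & 0xFFF) ^ 0x800) - 0x800;
  uint64_t Hi20 = ((U + 0x800) >> 12) & 0xFFFFF;
  int Cost = 0;
  if (Hi20 != 0)
    ++Cost; // lui
  if (Lo12 != 0 || Hi20 == 0)
    ++Cost; // addi
  return Cost;
}

// Sum the cost of every 32-bit chunk covered by the type width SizeInBits.
int chunkedMatCostRV32(uint64_t Val, unsigned SizeInBits) {
  assert(SizeInBits >= 1 && SizeInBits <= 64 && "model covers up to 64 bits");
  int Cost = 0;
  for (unsigned Shift = 0; Shift < SizeInBits; Shift += 32) {
    uint32_t ChunkBits = static_cast<uint32_t>(Val >> Shift);
    Cost += matCost32(signExtend32(ChunkBits));
  }
  return Cost;
}
```

With the size taken from the type, -1 as an i64 costs two instructions in this model (one per 32-bit half), and -1 as an i32 costs one.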

Update and de-duplicate comments on tests and implementations
Update naming of IsRV64 parameter in RISCVMatInt.{h,cpp}

Harbormaster completed remote builds in B33558: Diff 205350. Jun 18 2019, 7:55 AM

lenary marked 4 inline comments as done. Jun 18 2019, 7:58 AM

lenary added inline comments.

llvm/lib/Target/RISCV/Utils/RISCVMatInt.cpp
81	(On Diff #205064)	I think I now cover chunking correctly. As in my message above, I don't need to sign extend Val before chunking, because each chunk is extended to be PlatRegSize within the loop.

LGTM, thanks!

This revision is now accepted and ready to land. Jun 18 2019, 8:19 AM

Closed by commit rL363736: [RISCV] Prevent re-ordering some adds after shifts (authored by lenary). Jun 18 2019, 1:35 PM

This revision was automatically updated to reflect the committed changes.

Diff 202939

llvm/lib/Target/RISCV/RISCVISelLowering.h

   ISD::NodeType getExtendForAtomicOps() const override {
     return ISD::SIGN_EXTEND;
   }

   bool shouldExpandShift(SelectionDAG &DAG, SDNode *N) const override {
     if (DAG.getMachineFunction().getFunction().hasMinSize())
       return false;
     return true;
   }
+  bool isDesirableToCommuteWithShift(const SDNode *N,
+                                     CombineLevel Level) const override;

 private:
   void analyzeInputArgs(MachineFunction &MF, CCState &CCInfo,
                         const SmallVectorImpl<ISD::InputArg> &Ins,
                         bool IsRet) const;
   void analyzeOutputArgs(MachineFunction &MF, CCState &CCInfo,
                          const SmallVectorImpl<ISD::OutputArg> &Outs,
                          bool IsRet, CallLoweringInfo *CLI) const;

llvm/lib/Target/RISCV/RISCVISelLowering.cpp

     return DCI.CombineTo(N,
         DAG.getNode(ISD::AND, DL, MVT::i64, NewFMV,
                     DAG.getConstant(~SignBit, DL, MVT::i64)));
   }
   }

   return SDValue();
 }

+bool RISCVTargetLowering::isDesirableToCommuteWithShift(
+    const SDNode *N, CombineLevel Level) const {
+  // The following folds are only desirable if constant `c1` cannot fit into an
+  // immediate:
+  //   (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2)
+  //   (shl (or x, c1), c2) -> (or (shl x, c2), c1 << c2)
+  SDValue N0 = N->getOperand(0);
+  if (N0.getOpcode() == ISD::ADD || N0.getOpcode() == ISD::OR) {
+    SDValue C1 = N0->getOperand(1);
+    if (auto *Const = dyn_cast<ConstantSDNode>(C1)) {
+      return !isLegalAddImmediate(Const->getSExtValue());
+    }
+  }
+  return true;
+}
+
 unsigned RISCVTargetLowering::ComputeNumSignBitsForTargetNode(
     SDValue Op, const APInt &DemandedElts, const SelectionDAG &DAG,
     unsigned Depth) const {
   switch (Op.getOpcode()) {
   default:
     break;
   case RISCVISD::SLLW:
   case RISCVISD::SRAW:
   case RISCVISD::SRLW:
   case RISCVISD::DIVW:
   case RISCVISD::DIVUW:
   case RISCVISD::REMUW:
     // TODO: As the result is sign-extended, this is conservatively correct. A
     // more precise answer could be calculated for SRAW depending on known
     // bits in the shift amount.
     return 33;
   }


llvm/test/CodeGen/RISCV/add-before-shl.ll

This file was added.

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=riscv32 -verify-machineinstrs < %s \
; RUN: | FileCheck -check-prefix=RV32I %s
; RUN: llc -mtriple=riscv64 -verify-machineinstrs < %s \
; RUN: | FileCheck -check-prefix=RV64I %s

; These test that constant adds are not moved after shifts by DAGCombine,
; if the constant can fit into an immediate.
;
; Materialising the large (shifted) constant produced for the new add
; uses an extra register, and takes several instructions. It is more
; efficient to perform the add before the shift if the constant to be
; added fits into an immediate.

define signext i32 @add_small_const(i32 signext %a) nounwind {
; RV32I-LABEL: add_small_const:
; RV32I: # %bb.0:
; RV32I-NEXT: addi a0, a0, 1
; RV32I-NEXT: slli a0, a0, 24
; RV32I-NEXT: srai a0, a0, 24
; RV32I-NEXT: ret
;
; RV64I-LABEL: add_small_const:
; RV64I: # %bb.0:
; RV64I-NEXT: addi a0, a0, 1
; RV64I-NEXT: slli a0, a0, 56
; RV64I-NEXT: srai a0, a0, 56
; RV64I-NEXT: ret
  %1 = add i32 %a, 1
  %2 = shl i32 %1, 24
  %3 = ashr i32 %2, 24
  ret i32 %3
}

; NOTE: This add constant does not fit into an add immediate, so we allow
; the transformation to fire. However, this introduces a second left shift,
; which we wouldn't need if we did the add before the shl.
define signext i32 @add_large_const(i32 signext %a) nounwind {
; RV32I-LABEL: add_large_const:
; RV32I: # %bb.0:
; RV32I-NEXT: slli a0, a0, 16
; RV32I-NEXT: lui a1, 65520
; RV32I-NEXT: add a0, a0, a1
; RV32I-NEXT: srai a0, a0, 16
; RV32I-NEXT: ret
;
; RV64I-LABEL: add_large_const:
; RV64I: # %bb.0:
; RV64I-NEXT: slli a0, a0, 48
; RV64I-NEXT: lui a1, 1
; RV64I-NEXT: addiw a1, a1, -1
; RV64I-NEXT: slli a1, a1, 48
; RV64I-NEXT: add a0, a0, a1
; RV64I-NEXT: srai a0, a0, 48
; RV64I-NEXT: ret
  %1 = add i32 %a, 4095
  %2 = shl i32 %1, 16
  %3 = ashr i32 %2, 16
  ret i32 %3
}

; NOTE: This add constant does not fit into an add immediate, so we allow
; the transformation to fire. However, this introduces a second left shift,
; which we wouldn't need if we did the add before the shl.
define signext i32 @add_huge_const(i32 signext %a) nounwind {
; RV32I-LABEL: add_huge_const:
; RV32I: # %bb.0:
; RV32I-NEXT: slli a0, a0, 16
; RV32I-NEXT: lui a1, 524272
; RV32I-NEXT: add a0, a0, a1
; RV32I-NEXT: srai a0, a0, 16
; RV32I-NEXT: ret
;
; RV64I-LABEL: add_huge_const:
; RV64I: # %bb.0:
; RV64I-NEXT: slli a0, a0, 48
; RV64I-NEXT: lui a1, 8
; RV64I-NEXT: addiw a1, a1, -1
; RV64I-NEXT: slli a1, a1, 48
; RV64I-NEXT: add a0, a0, a1
; RV64I-NEXT: srai a0, a0, 48
; RV64I-NEXT: ret
  %1 = add i32 %a, 32767
  %2 = shl i32 %1, 16
  %3 = ashr i32 %2, 16
  ret i32 %3
}

This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Prevent re-ordering some adds after shifts
ClosedPublic
