This is an archive of the discontinued LLVM Phabricator instance.

Differential D106237

[ISel] Port AArch64 HADD and RHADD to ISel
ClosedPublic

Authored by dmgreen on Jul 18 2021, 7:12 AM.


Details

Reviewers

RKSimon
spatel
efriedma
craig.topper
lebedev.ri

Commits

rG4072e362c030: [ISel] Port AArch64 HADD and RHADD to ISel

Summary

This ports the AArch64 combines for HADD and RHADD over to DAG combine, so that they can be used on more architectures (notably MVE in a followup patch). They are renamed to AVGFLOOR and AVGCEIL in the process, to avoid confusion with instructions such as X86 hadd. The code was also rewritten slightly to remove the AArch64 idiosyncrasies.

The general pattern for an AVGFLOORS is:

%xe = sext i8 %x to i32
%ye = sext i8 %y to i32
%a = add i32 %xe, %ye
%r = lshr i32 %a, 1
%t = trunc i32 %r to i8

An AVGFLOORU is the equivalent pattern with zext in place of sext. Because the result is truncated, lshr and ashr are interchangeable here: they differ only in the top bit, which is not demanded. An AVGCEIL additionally rounds up, so it includes an extra add of 1 before the shift.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

dmgreen created this revision.Jul 18 2021, 7:12 AM

Herald added subscribers: danielkiss, hiraditya, kristof.beyls. Jul 18 2021, 7:12 AM

dmgreen requested review of this revision.Jul 18 2021, 7:12 AM

Herald added a project: Restricted Project. Jul 18 2021, 7:12 AM

dmgreen mentioned this in D106238: [ARM] MVE hadd and rhadd.Jul 18 2021, 7:13 AM

dmgreen added a child revision: D106238: [ARM] MVE hadd and rhadd.

X86 might be able to reuse this for X86ISD::AVG

llvm/include/llvm/CodeGen/ISDOpcodes.h
620	having add? Maybe add a code snippet?
625	Maybe add a code snippet?

Harbormaster completed remote builds in B114733: Diff 359623.Jul 18 2021, 8:01 AM

efriedma added inline comments.Jul 18 2021, 11:44 AM

llvm/include/llvm/CodeGen/ISDOpcodes.h
620	There's really very little incentive to keep ISD opcode names short, particularly rarely used ones like this. I'd prefer to spell it out (particularly the "h"; for example, x86 has "haddps", where the "h" stands for "horizontal"). Maybe ADD_HALVE_UNSIGNED_ROUND_DOWN and ADD_HALVE_SIGNED_ROUND_UP?

@dmgreen - if you're still looking at this, the patch needs a rename of the node enum types and preferably we replace X86ISD::AVG to use this as well.

This revision now requires changes to proceed.Nov 3 2021, 8:27 AM

craig.topper added inline comments.Nov 5 2021, 12:21 PM

llvm/include/llvm/CodeGen/ISDOpcodes.h
620	Halide calls these halving_add and rounding_halving_add.
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
935 ↗	(On Diff #359623)	Is zext supposed to be sext?
945 ↗	(On Diff #359623)	Can this be DemandedBits.isPositive() or DemandedBits.isSignBitClear()? Counting bits seems like overkill.

Instead of making this part of SimplifyDemandedBits, could you emit (and (sext (hadds X, Y)), 0x7fffffff) for the (lshr (add (sext(X), sext(Y)), 1) case and let the AND be optimized by itself? Or would the transform not be profitable if it doesn't get removed?

I haven't had a chance to look at it recently - I've been busy with other things. I hope to get back to this at some point - it would be good to share it between backends.

llvm/include/llvm/CodeGen/ISDOpcodes.h
620	I'll be honest - I didn't much like the sound of ADD_HALVE_UNSIGNED_ROUND_DOWN. I'm not a fan of how long that name is. Arm uses the name HADD, with S/U or V before it. X86 uses AVG and RISCV uses AAVG I believe, if I was looking up the right instruction. I'm not sure if they have the same concepts of rounding RHADD though, which rounds up. I think gcc calls them AVG_FLOOR and AVG_CEIL. Any of those names or HALVING_ADD would be fine by me.

In D106237#3112542, @craig.topper wrote:

Instead of making this part of SimplifyDemandedBits, could you emit (and (sext (hadds X, Y)), 0x7fffffff) for the (lshr (add (sext(X), sext(Y)), 1) case and let the AND be optimized by itself? Or would the transform not be profitable if it doesn't get removed?

Yeah that might work. Like https://alive2.llvm.org/ce/z/9pwKEi. I'll have to check if it looks worse anywhere - I think it should be fine in general, possibly minus the cost of materializing the constant.

arcbbb added a subscriber: arcbbb.Nov 7 2021, 9:03 AM

This now:

Calls the nodes AVGFLOOR and AVGCEIL.
Ports X86ISD::AVG to use AVGCEILU.
Doesn't remove detectAVGPattern, as it can detect more patterns, such as illegal types. It adds some basic widening support to handle them.

Herald added subscribers: ecnelises, kerbowa, pengfei and 2 others. Dec 19 2021, 1:20 PM

lebedev.ri added a reviewer: lebedev.ri.Dec 19 2021, 1:28 PM

lebedev.ri added a subscriber: lebedev.ri.

lebedev.ri added inline comments.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
908 ↗	(On Diff #395353)	There's also the case where one of the operands is a constant, so you don't have an explicit +1
947–950 ↗	(On Diff #395353)	Since you have already avoided the other misdesign (by starting from the shift and not trunc), can we also avoid one here, by looking at the known sign/leading zero bits instead of an explicit extension?
960–961 ↗	(On Diff #395353)	Don't you need to check that the addition is no-wrap, i.e. don't you need to check how many bits were added during extension? Right now this seems like a miscompile, one that will be avoided if you use known bits instead of looking for extensions.

Harbormaster completed remote builds in B140023: Diff 395353.Dec 19 2021, 2:14 PM

Now using some known bits.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
908 ↗	(On Diff #395353)	For ARM/AArch64 that would be an hadd (avgfloor), IIUC. For X86 where avgceil is legal but avgfloor is not, we can probably do something about that if needed, or make it target specific. It's probably best not to try and address every issue in this single patch though.
947–950 ↗	(On Diff #395353)	Sure, sounds good. I've given it a go, but it might be a bit messy. We have to get a type to use for the legalization, which was previously just using the extend type. Let me know if you have any suggestions.
960–961 ↗	(On Diff #395353)	MVE already has some nowrap tablegen patterns, as they don't require any extends and can be recognised from an add+shift. I've not added anything with nowrap here (maybe we can extend this in a future patch, so long as this is correct so far). The other patterns mostly verify with a single bit of extension, so long as I have that right, and I've added checks where they seem to need 2.

Harbormaster completed remote builds in B140827: Diff 396444.Dec 28 2021, 4:49 PM

lebedev.ri added inline comments.Dec 29 2021, 1:54 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
987–988 ↗	(On Diff #396444)	Can we get to here where the operations aren't legal yet?
908 ↗	(On Diff #395353)	Sure, this can be done in a follow-up. What i'm saying is that if you want to catch more cases, you might want to try to deal with `add(ext, C)` as with `add(ext, add(C-1, 1))` (or vice versa). Probably this might be best done by adding a wrapper function, and adding a parameter to this function, `bool LookingForCeil`.

dmgreen added inline comments.Jan 3 2022, 7:10 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
987–988 ↗	(On Diff #396444)	Yes I think so (IIUYC). It'll be run as part of any of the DAG combines. There is very little legalization for AVG at the moment, so I'm not sure we can create a lot of illegal nodes. (But I'm not an expert of when it's best to create illegal target-independent nodes and when not)

Matt added a subscriber: Matt.Jan 7 2022, 7:33 AM

craig.topper added inline comments.Jan 7 2022, 1:15 PM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
18643	This APInt goes with the code after this new code. Move it down?
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
946 ↗	(On Diff #396444)	This needs to forward the Depth from the caller.
957 ↗	(On Diff #396444)	ComputeNumSignBits can call computeKnownBits internally and will count leading 0s as sign bits. Does the zero check need to have priority here?

In D106237#3114240, @dmgreen wrote:

In D106237#3112542, @craig.topper wrote:

Instead of making this part of SimplifyDemandedBits, could you emit (and (sext (hadds X, Y)), 0x7fffffff) for the (lshr (add (sext(X), sext(Y)), 1) case and let the AND be optimized by itself? Or would the transform not be profitable if it doesn't get removed?

Yeah that might work. Like https://alive2.llvm.org/ce/z/9pwKEi. I'll have to check if it looks worse anywhere - I think it should be fine in general, possibly minus the cost of materializing the constant.

Did you check on this?

RKSimon added inline comments.Jan 10 2022, 10:21 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
1351 ↗	(On Diff #396444)	These can be custom lowered on AVX1 targets by splitting, but I'm OK with just a TODO for now.
1654 ↗	(On Diff #396444)	These can be custom lowered on non-AVX512BW targets by splitting, but I'm OK with just a TODO for now.

dmgreen mentioned this in D117901: [DAG] Convert truncstore(extend(x)) back to store(x).Jan 21 2022, 9:16 AM

dmgreen updated this revision to Diff 402013.Jan 21 2022, 9:25 AM

dmgreen marked 14 inline comments as done.

dmgreen edited the summary of this revision.

Harbormaster completed remote builds in B144857: Diff 402013.Jan 21 2022, 9:26 AM

In D106237#3228430, @craig.topper wrote:

In D106237#3114240, @dmgreen wrote:

In D106237#3112542, @craig.topper wrote:

Instead of making this part of SimplifyDemandedBits, could you emit (and (sext (hadds X, Y)), 0x7fffffff) for the (lshr (add (sext(X), sext(Y)), 1) case and let the AND be optimized by itself? Or would the transform not be profitable if it doesn't get removed?

Yeah that might work. Like https://alive2.llvm.org/ce/z/9pwKEi. I'll have to check if it looks worse anywhere - I think it should be fine in general, possibly minus the cost of materializing the constant.

Did you check on this?

Yeah - AArch64 has a saddl instruction that can perform 'add(sext, sext)' as a single instruction. That with a shift will be better than a hadd;extend;bic, and it will look worse if the And is less efficient (possibly having to dup a constant).

With the trunc it will perform OK, I'm not sure what the likelihood of sext with lshr will be without it. With the trunc it will be one of the most common cases, given how the ashr is converted to a lshr.

Providing that the depth is limited, is there an issue with doing it as a demand bits combine?

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
957 ↗	(On Diff #396444)	I was trying to keep SRA producing HADDS, and SRL preferring HADDU. I think it should be using the known bits to preferably make the lowest bitwidth hadd.
llvm/lib/Target/X86/X86ISelLowering.cpp
1351 ↗	(On Diff #396444)	Thanks. I thought that the natural splitting of vectors would capture that already.

dmgreen mentioned this in rGb27e5459d51f: [DAG] Convert truncstore(extend(x)) back to store(x).Jan 22 2022, 5:20 AM

Providing that the depth is limited, is there an issue with doing it as a demand bits combine?

My main reason for questioning was because as far as I know this is a larger transform than anything else we do in SimplifyDemandedBits. So it seemed a bit of a new direction for SimplifyDemandedBits, but if we're not concerned about that then I don't object to this patch.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
953 ↗	(On Diff #402013)	I'm not sure, but would it be better to use countMaxActiveBits and ComputeMaxSignificantBits. And do all the comparisons against the scalar bit width instead of against 1 and 2.
1000 ↗	(On Diff #402013)	changeVectorElementType is a little broken. If the VT is an MVT, but the VT with the element type replaced isn't an MVT, the function will assert because it needs an LLVMContext to create an EVT which it can't get from the MVT. Maybe it's ok in this case as long as our MVTs always have all power of 2 element types smaller for each MVT. Like MVT::vXi32 would need MVT::vXi16 and MVT::vXi8 to exist.

Now uses EVT::getVectorVT

In D106237#3263993, @craig.topper wrote:

Providing that the depth is limited, is there an issue with doing it as a demand bits combine?

My main reason for questioning was because as far as I know this is a larger transform than anything else we do in SimplifyDemandedBits. So it seemed a bit of a new direction for SimplifyDemandedBits, but if we're not concerned about that then I don't object to this patch.

Yeah that's fair enough. I had considered it, and was hoping that the initial condition (ConstantSDNode *N1C = isConstOrConstSplat(Op.getOperand(1)); if(N1C && N1C->isOne())..) would rule most stuff out, limiting how expensive this is in practice. In my mind recursing into a SimplifyDemandedBits call for "N" steps is going to be more expensive than any one transform inside it, due to the number of nodes potentially visited. But they will be related, and I have no hard evidence about which part is potentially slower.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
953 ↗	(On Diff #402013)	In my mind, saying an add requires a single extra bit is simpler than dealing with MaxActiveBits and BitWidths. But if you think it's worth changing I'm happy to change it.

Harbormaster completed remote builds in B145218: Diff 402489.Jan 24 2022, 11:34 AM

I've lost track of this patch a little - would it make sense to split it into two parts - the first just replaces the target opcodes with the generic ISD variants, and the second then begins the process of moving the matching into generic DAG?

Yeah sounds good. This patch was certainly feeling too big

This is now a more straightforward porting of the code from AArch64 to DAGCombine. The X86 parts have been moved into a different patch, and this now starts at a truncate in the same way that the AArch64 code did.

dmgreen mentioned this in D119072: [DAGCombine] Move AVG combine to SimplifyDemandBits.Feb 6 2022, 1:34 AM

dmgreen added a child revision: D119072: [DAGCombine] Move AVG combine to SimplifyDemandBits.

dmgreen mentioned this in D119073: [X86] Replace X86ISD::AVG with generic ISD::AVGCEILU.Feb 6 2022, 1:40 AM

Please tidy up the clang-format warnings.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14357	update the urhadd references

Harbormaster completed remote builds in B147808: Diff 406179.Feb 6 2022, 3:08 AM

Attempt to clean up formatting, spelling and some node names.

Harbormaster completed remote builds in B147848: Diff 406287.Feb 6 2022, 3:05 PM

LGTM with a few minors

llvm/include/llvm/CodeGen/ISDOpcodes.h
622	Really pedantic, but none of these descriptions refer to these being averaging adds, yet a lot of the comments in other places just refer to them as averages.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12699	assert(N->getOpcode() == ISD::TRUNCATE && "TRUNCATE node expected");
12759	Is this assert necessary?
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14349	update these as well

This revision is now accepted and ready to land.Feb 7 2022, 12:11 AM

dmgreen added a child revision: D119559: [DAGCombine] Basic combines for AVG nodes..Feb 11 2022, 9:57 AM

This revision was landed with ongoing or failed builds.Feb 11 2022, 10:29 AM

Closed by commit rG4072e362c030: [ISel] Port AArch64 HADD and RHADD to ISel (authored by dmgreen).

This revision was automatically updated to reflect the committed changes.

dmgreen added a commit: rG4072e362c030: [ISel] Port AArch64 HADD and RHADD to ISel.

dmgreen mentioned this in rGf810b40c3b51: [X86] Replace X86ISD::AVG with generic ISD::AVGCEILU.Feb 11 2022, 10:57 AM

dmgreen mentioned this in rGea6ebbcfb39b: [ARM] MVE hadd and rhadd.Feb 14 2022, 3:55 AM

Revision Contents

Path	Size
llvm/include/llvm/CodeGen/ISDOpcodes.h	11 lines
llvm/include/llvm/CodeGen/TargetLowering.h	4 lines
llvm/include/llvm/Target/TargetSelectionDAG.td	4 lines
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp	83 lines
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp	4 lines
llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp	4 lines
llvm/lib/CodeGen/TargetLoweringBase.cpp	6 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.h	8 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	124 lines
llvm/lib/Target/AArch64/AArch64InstrInfo.td	13 lines

Diff 407945

llvm/include/llvm/CodeGen/ISDOpcodes.h

@@ 611 lines hidden @@ enum NodeType {
   STEP_VECTOR,
 
   /// MULHU/MULHS - Multiply high - Multiply two integers of type iN,
   /// producing an unsigned/signed value of type i[2*N], then return the top
   /// part.
   MULHU,
   MULHS,
 
+  /// AVGFLOORS/AVGFLOORU - Averaging add - Add two integers using an integer of
+  /// type i[N+1], halving the result by shifting it one bit right.
+  /// shr(add(ext(X), ext(Y)), 1)
+  AVGFLOORS,
+  AVGFLOORU,
+  /// AVGCEILS/AVGCEILU - Rounding averaging add - Add two integers using an
+  /// integer of type i[N+2], add 1 and halve the result by shifting it one bit
+  /// right. shr(add(ext(X), ext(Y), 1), 1)
+  AVGCEILS,
+  AVGCEILU,
+
   // ABDS/ABDU - Absolute difference - Return the absolute difference between
   // two numbers interpreted as signed/unsigned.
   // i.e trunc(abs(sext(Op0) - sext(Op1))) becomes abds(Op0, Op1)
   // or trunc(abs(zext(Op0) - zext(Op1))) becomes abdu(Op0, Op1)
   ABDS,
   ABDU,
 
   /// [US]{MIN/MAX} - Binary minimum or maximum of signed or unsigned
@@ 839 lines hidden @@

llvm/include/llvm/CodeGen/TargetLowering.h

@@ 2,509 lines hidden @@ virtual bool isCommutativeBinOp(unsigned Opcode) const {
     case ISD::SADDSAT:
     case ISD::UADDSAT:
     case ISD::FMINNUM:
     case ISD::FMAXNUM:
     case ISD::FMINNUM_IEEE:
     case ISD::FMAXNUM_IEEE:
     case ISD::FMINIMUM:
     case ISD::FMAXIMUM:
+    case ISD::AVGFLOORS:
+    case ISD::AVGFLOORU:
+    case ISD::AVGCEILS:
+    case ISD::AVGCEILU:
       return true;
     default: return false;
     }
   }
 
   /// Return true if the node is a math/logic binary operator.
   virtual bool isBinOp(unsigned Opcode) const {
     // A commutative binop must be a binop.
@@ 2,302 lines hidden @@

llvm/include/llvm/Target/TargetSelectionDAG.td

@@ 359 lines hidden @@
 
 def add : SDNode<"ISD::ADD" , SDTIntBinOp ,
     [SDNPCommutative, SDNPAssociative]>;
 def sub : SDNode<"ISD::SUB" , SDTIntBinOp>;
 def mul : SDNode<"ISD::MUL" , SDTIntBinOp,
     [SDNPCommutative, SDNPAssociative]>;
 def mulhs : SDNode<"ISD::MULHS" , SDTIntBinOp, [SDNPCommutative]>;
 def mulhu : SDNode<"ISD::MULHU" , SDTIntBinOp, [SDNPCommutative]>;
+def avgfloors : SDNode<"ISD::AVGFLOORS" , SDTIntBinOp, [SDNPCommutative]>;
+def avgflooru : SDNode<"ISD::AVGFLOORU" , SDTIntBinOp, [SDNPCommutative]>;
+def avgceils : SDNode<"ISD::AVGCEILS" , SDTIntBinOp, [SDNPCommutative]>;
+def avgceilu : SDNode<"ISD::AVGCEILU" , SDTIntBinOp, [SDNPCommutative]>;
 def abds : SDNode<"ISD::ABDS" , SDTIntBinOp, [SDNPCommutative]>;
 def abdu : SDNode<"ISD::ABDU" , SDTIntBinOp, [SDNPCommutative]>;
 def smullohi : SDNode<"ISD::SMUL_LOHI" , SDTIntBinHiLoOp, [SDNPCommutative]>;
 def umullohi : SDNode<"ISD::UMUL_LOHI" , SDTIntBinHiLoOp, [SDNPCommutative]>;
 def sdiv : SDNode<"ISD::SDIV" , SDTIntBinOp>;
 def udiv : SDNode<"ISD::UDIV" , SDTIntBinOp>;
 def srem : SDNode<"ISD::SREM" , SDTIntBinOp>;
 def urem : SDNode<"ISD::UREM" , SDTIntBinOp>;
@@ 1,293 lines hidden @@

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp


@@ 12,682 lines hidden @@ if (SDValue Res = tryToFoldExtendOfConstant(N, TLI, DAG, LegalTypes))
     return Res;
 
   if (SimplifyDemandedVectorElts(SDValue(N, 0)))
     return SDValue(N, 0);
 
   return SDValue();
 }
 
+// Attempt to form one of the avg patterns from:
+//   truncate(shr(add(zext(OpB), zext(OpA)), 1))
+// Creating avgflooru/avgfloors/avgceilu/avgceils, with the ceiling having an
+// extra rounding add:
+//   truncate(shr(add(zext(OpB), zext(OpA), 1), 1))
+// This starts at a truncate, meaning the shift will always be srl, as the top
+// bits are known to not be demanded.
+static SDValue performAvgCombine(SDNode *N, SelectionDAG &DAG) {
+  assert(N->getOpcode() == ISD::TRUNCATE && "TRUNCATE node expected");
+  EVT VT = N->getValueType(0);
+
+  SDValue Shift = N->getOperand(0);
+  if (Shift.getOpcode() != ISD::SRL)
+    return SDValue();
+
+  // Is the right shift using an immediate value of 1?
+  ConstantSDNode *N1C = isConstOrConstSplat(Shift.getOperand(1));
+  if (!N1C || !N1C->isOne())
+    return SDValue();
+
+  // We are looking for an avgfloor
+  //   add(ext, ext)
+  // or one of these as a avgceil
+  //   add(add(ext, ext), 1)
+  //   add(add(ext, 1), ext)
+  //   add(ext, add(ext, 1))
+  SDValue Add = Shift.getOperand(0);
+  if (Add.getOpcode() != ISD::ADD)
+    return SDValue();
+
+  SDValue ExtendOpA = Add.getOperand(0);
+  SDValue ExtendOpB = Add.getOperand(1);
+  auto MatchOperands = [&](SDValue Op1, SDValue Op2, SDValue Op3) {
+    ConstantSDNode *ConstOp;
+    if ((ConstOp = isConstOrConstSplat(Op1)) && ConstOp->isOne()) {
+      ExtendOpA = Op2;
+      ExtendOpB = Op3;
+      return true;
+    }
+    if ((ConstOp = isConstOrConstSplat(Op2)) && ConstOp->isOne()) {
+      ExtendOpA = Op1;
+      ExtendOpB = Op3;
+      return true;
+    }
+    if ((ConstOp = isConstOrConstSplat(Op3)) && ConstOp->isOne()) {
+      ExtendOpA = Op1;
+      ExtendOpB = Op2;
+      return true;
+    }
+    return false;
+  };
+  bool IsCeil = (ExtendOpA.getOpcode() == ISD::ADD &&
+                 MatchOperands(ExtendOpA.getOperand(0), ExtendOpA.getOperand(1),
+                               ExtendOpB)) ||
+                (ExtendOpB.getOpcode() == ISD::ADD &&
+                 MatchOperands(ExtendOpB.getOperand(0), ExtendOpB.getOperand(1),
+                               ExtendOpA));
+
+  unsigned ExtendOpAOpc = ExtendOpA.getOpcode();
+  unsigned ExtendOpBOpc = ExtendOpB.getOpcode();
+  if (!(ExtendOpAOpc == ExtendOpBOpc &&
+        (ExtendOpAOpc == ISD::ZERO_EXTEND || ExtendOpAOpc == ISD::SIGN_EXTEND)))
+    return SDValue();
+
+  // Is the result of the right shift being truncated to the same value type as
+  // the original operands, OpA and OpB?
+  SDValue OpA = ExtendOpA.getOperand(0);
+  SDValue OpB = ExtendOpB.getOperand(0);
+  EVT OpAVT = OpA.getValueType();
+  if (VT != OpAVT || OpAVT != OpB.getValueType())
+    return SDValue();
+
+  bool IsSignExtend = ExtendOpAOpc == ISD::SIGN_EXTEND;
+  unsigned AVGOpc = IsSignExtend ? (IsCeil ? ISD::AVGCEILS : ISD::AVGFLOORS)
+                                 : (IsCeil ? ISD::AVGCEILU : ISD::AVGFLOORU);
+  if (!DAG.getTargetLoweringInfo().isOperationLegalOrCustom(AVGOpc, VT))
+    return SDValue();
+
+  return DAG.getNode(AVGOpc, SDLoc(N), VT, OpA, OpB);
+}
+
 SDValue DAGCombiner::visitTRUNCATE(SDNode *N) {
   SDValue N0 = N->getOperand(0);
   EVT VT = N->getValueType(0);
   EVT SrcVT = N0.getValueType();
   bool isLE = DAG.getDataLayout().isLittleEndian();
 
   // noop truncate
   if (SrcVT == VT)
@@ 270 lines hidden @@ if (N00.getOpcode() == ISD::SIGN_EXTEND ||
                 VT.getVectorElementType())
         return DAG.getNode(ISD::EXTRACT_SUBVECTOR, SDLoc(N0->getOperand(0)), VT,
                            N00.getOperand(0), N0.getOperand(1));
     }
   }
 
   if (SDValue NewVSel = matchVSelectOpSizesWithSetCC(N))
     return NewVSel;
+  if (SDValue M = performAvgCombine(N, DAG))
+    return M;
 
   // Narrow a suitable binary operation with a non-opaque constant operand by
   // moving it ahead of the truncate. This is limited to pre-legalization
   // because targets may prefer a wider type during later combines and invert
   // this transform.
   switch (N0.getOpcode()) {
   case ISD::ADD:
   case ISD::SUB:
@@ 5,567 lines hidden @@ if (ST->isTruncatingStore() && ST->isUnindexed() &&
     if ((Value.getOpcode() == ISD::ZERO_EXTEND ||
          Value.getOpcode() == ISD::SIGN_EXTEND ||
          Value.getOpcode() == ISD::ANY_EXTEND) &&
         Value.getOperand(0).getValueType() == ST->getMemoryVT() &&
         TLI.isOperationLegalOrCustom(ISD::STORE, ST->getMemoryVT()))
       return DAG.getStore(Chain, SDLoc(N), Value.getOperand(0), Ptr,
                           ST->getMemOperand());
 
     APInt TruncDemandedBits =
         APInt::getLowBitsSet(Value.getScalarValueSizeInBits(),
                              ST->getMemoryVT().getScalarSizeInBits());
 
     // See if we can simplify the input to this truncstore with knowledge that
     // only the low bits are being used. For example:
     // "truncstore (or (shl x, 8), y), i8" -> "truncstore y, i8"
     AddToWorklist(Value.getNode());
     if (SDValue Shorter = DAG.GetDemandedBits(Value, TruncDemandedBits))
@@ 5,685 lines hidden @@

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp

@@ 3,281 lines hidden @@ #endif
   case ISD::UADDSAT:
   case ISD::SADDSAT:
   case ISD::USUBSAT:
   case ISD::SSUBSAT:
   case ISD::SSHLSAT:
   case ISD::USHLSAT:
   case ISD::ROTL:
   case ISD::ROTR:
+  case ISD::AVGFLOORS:
+  case ISD::AVGFLOORU:
+  case ISD::AVGCEILS:
+  case ISD::AVGCEILU:
   // Vector-predicated binary op widening. Note that -- unlike the
   // unpredicated versions -- we don't have to worry about trapping on
   // operations like UDIV, FADD, etc., as we pass on the original vector
   // length parameter. This means the widened elements containing garbage
   // aren't active.
   case ISD::VP_SDIV:
   case ISD::VP_UDIV:
   case ISD::VP_SREM:
@@ 2,882 lines hidden @@

llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp

Show First 20 Lines • Show All 225 Lines • ▼ Show 20 Lines	#endif
case ISD::STRICT_FLOG10: return "strict_flog10";		case ISD::STRICT_FLOG10: return "strict_flog10";

// Binary operators		// Binary operators
case ISD::ADD: return "add";		case ISD::ADD: return "add";
case ISD::SUB: return "sub";		case ISD::SUB: return "sub";
case ISD::MUL: return "mul";		case ISD::MUL: return "mul";
case ISD::MULHU: return "mulhu";		case ISD::MULHU: return "mulhu";
case ISD::MULHS: return "mulhs";		case ISD::MULHS: return "mulhs";
		case ISD::AVGFLOORU: return "avgflooru";
		case ISD::AVGFLOORS: return "avgfloors";
		case ISD::AVGCEILU: return "avgceilu";
		case ISD::AVGCEILS: return "avgceils";
case ISD::ABDS: return "abds";		case ISD::ABDS: return "abds";
case ISD::ABDU: return "abdu";		case ISD::ABDU: return "abdu";
case ISD::SDIV: return "sdiv";		case ISD::SDIV: return "sdiv";
case ISD::UDIV: return "udiv";		case ISD::UDIV: return "udiv";
case ISD::SREM: return "srem";		case ISD::SREM: return "srem";
case ISD::UREM: return "urem";		case ISD::UREM: return "urem";
case ISD::SMUL_LOHI: return "smul_lohi";		case ISD::SMUL_LOHI: return "smul_lohi";
case ISD::UMUL_LOHI: return "umul_lohi";		case ISD::UMUL_LOHI: return "umul_lohi";

llvm/lib/CodeGen/TargetLoweringBase.cpp

[unchanged lines elided] context: for (MVT VT : MVT::all_valuetypes()) {
setOperationAction(ISD::SSUBO_CARRY, VT, Expand);		setOperationAction(ISD::SSUBO_CARRY, VT, Expand);

// ADDC/ADDE/SUBC/SUBE default to expand.		// ADDC/ADDE/SUBC/SUBE default to expand.
setOperationAction(ISD::ADDC, VT, Expand);		setOperationAction(ISD::ADDC, VT, Expand);
setOperationAction(ISD::ADDE, VT, Expand);		setOperationAction(ISD::ADDE, VT, Expand);
setOperationAction(ISD::SUBC, VT, Expand);		setOperationAction(ISD::SUBC, VT, Expand);
setOperationAction(ISD::SUBE, VT, Expand);		setOperationAction(ISD::SUBE, VT, Expand);

		// Halving adds
		setOperationAction(ISD::AVGFLOORS, VT, Expand);
		setOperationAction(ISD::AVGFLOORU, VT, Expand);
		setOperationAction(ISD::AVGCEILS, VT, Expand);
		setOperationAction(ISD::AVGCEILU, VT, Expand);

// Absolute difference		// Absolute difference
setOperationAction(ISD::ABDS, VT, Expand);		setOperationAction(ISD::ABDS, VT, Expand);
setOperationAction(ISD::ABDU, VT, Expand);		setOperationAction(ISD::ABDU, VT, Expand);

// These default to Expand so they will be expanded to CTLZ/CTTZ by default.		// These default to Expand so they will be expanded to CTLZ/CTTZ by default.
setOperationAction(ISD::CTLZ_ZERO_UNDEF, VT, Expand);		setOperationAction(ISD::CTLZ_ZERO_UNDEF, VT, Expand);
setOperationAction(ISD::CTTZ_ZERO_UNDEF, VT, Expand);		setOperationAction(ISD::CTTZ_ZERO_UNDEF, VT, Expand);


llvm/lib/Target/AArch64/AArch64ISelLowering.h

[unchanged lines elided] context: enum NodeType : unsigned {
FCMLEz,		FCMLEz,
FCMLTz,		FCMLTz,

// Vector across-lanes addition		// Vector across-lanes addition
// Only the lower result lane is defined.		// Only the lower result lane is defined.
SADDV,		SADDV,
UADDV,		UADDV,

// Vector halving addition
SHADD,
UHADD,

// Vector rounding halving addition
SRHADD,
URHADD,

// Add Long Pairwise		// Add Long Pairwise
SADDLP,		SADDLP,
UADDLP,		UADDLP,

// udot/sdot instructions		// udot/sdot instructions
UDOT,		UDOT,
SDOT,		SDOT,


llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

[unchanged lines elided] context: #undef LCALLNAME5

setTargetDAGCombine(ISD::INTRINSIC_WO_CHAIN);		setTargetDAGCombine(ISD::INTRINSIC_WO_CHAIN);

setTargetDAGCombine(ISD::ANY_EXTEND);		setTargetDAGCombine(ISD::ANY_EXTEND);
setTargetDAGCombine(ISD::ZERO_EXTEND);		setTargetDAGCombine(ISD::ZERO_EXTEND);
setTargetDAGCombine(ISD::SIGN_EXTEND);		setTargetDAGCombine(ISD::SIGN_EXTEND);
setTargetDAGCombine(ISD::VECTOR_SPLICE);		setTargetDAGCombine(ISD::VECTOR_SPLICE);
setTargetDAGCombine(ISD::SIGN_EXTEND_INREG);		setTargetDAGCombine(ISD::SIGN_EXTEND_INREG);
setTargetDAGCombine(ISD::TRUNCATE);
setTargetDAGCombine(ISD::CONCAT_VECTORS);		setTargetDAGCombine(ISD::CONCAT_VECTORS);
setTargetDAGCombine(ISD::INSERT_SUBVECTOR);		setTargetDAGCombine(ISD::INSERT_SUBVECTOR);
setTargetDAGCombine(ISD::STORE);		setTargetDAGCombine(ISD::STORE);
if (Subtarget->supportsAddressTopByteIgnored())		if (Subtarget->supportsAddressTopByteIgnored())
setTargetDAGCombine(ISD::LOAD);		setTargetDAGCombine(ISD::LOAD);

setTargetDAGCombine(ISD::MUL);		setTargetDAGCombine(ISD::MUL);

[unchanged lines elided] context: for (MVT VT : { MVT::v8i8, MVT::v4i16, MVT::v2i32,
setOperationAction(ISD::SADDSAT, VT, Legal);		setOperationAction(ISD::SADDSAT, VT, Legal);
setOperationAction(ISD::UADDSAT, VT, Legal);		setOperationAction(ISD::UADDSAT, VT, Legal);
setOperationAction(ISD::SSUBSAT, VT, Legal);		setOperationAction(ISD::SSUBSAT, VT, Legal);
setOperationAction(ISD::USUBSAT, VT, Legal);		setOperationAction(ISD::USUBSAT, VT, Legal);
}		}

for (MVT VT : {MVT::v8i8, MVT::v4i16, MVT::v2i32, MVT::v16i8, MVT::v8i16,		for (MVT VT : {MVT::v8i8, MVT::v4i16, MVT::v2i32, MVT::v16i8, MVT::v8i16,
MVT::v4i32}) {		MVT::v4i32}) {
		setOperationAction(ISD::AVGFLOORS, VT, Legal);
		setOperationAction(ISD::AVGFLOORU, VT, Legal);
		setOperationAction(ISD::AVGCEILS, VT, Legal);
		setOperationAction(ISD::AVGCEILU, VT, Legal);
setOperationAction(ISD::ABDS, VT, Legal);		setOperationAction(ISD::ABDS, VT, Legal);
setOperationAction(ISD::ABDU, VT, Legal);		setOperationAction(ISD::ABDU, VT, Legal);
}		}

// Vector reductions		// Vector reductions
for (MVT VT : { MVT::v4f16, MVT::v2f32,		for (MVT VT : { MVT::v4f16, MVT::v2f32,
MVT::v8f16, MVT::v4f32, MVT::v2f64 }) {		MVT::v8f16, MVT::v4f32, MVT::v2f64 }) {
if (VT.getVectorElementType() != MVT::f16 \|\| Subtarget->hasFullFP16()) {		if (VT.getVectorElementType() != MVT::f16 \|\| Subtarget->hasFullFP16()) {
[unchanged lines elided] context: case AArch64ISD::FIRST_NUMBER:
MAKE_CASE(AArch64ISD::CMLTz)		MAKE_CASE(AArch64ISD::CMLTz)
MAKE_CASE(AArch64ISD::FCMEQz)		MAKE_CASE(AArch64ISD::FCMEQz)
MAKE_CASE(AArch64ISD::FCMGEz)		MAKE_CASE(AArch64ISD::FCMGEz)
MAKE_CASE(AArch64ISD::FCMGTz)		MAKE_CASE(AArch64ISD::FCMGTz)
MAKE_CASE(AArch64ISD::FCMLEz)		MAKE_CASE(AArch64ISD::FCMLEz)
MAKE_CASE(AArch64ISD::FCMLTz)		MAKE_CASE(AArch64ISD::FCMLTz)
MAKE_CASE(AArch64ISD::SADDV)		MAKE_CASE(AArch64ISD::SADDV)
MAKE_CASE(AArch64ISD::UADDV)		MAKE_CASE(AArch64ISD::UADDV)
MAKE_CASE(AArch64ISD::SRHADD)
MAKE_CASE(AArch64ISD::URHADD)
MAKE_CASE(AArch64ISD::SHADD)
MAKE_CASE(AArch64ISD::UHADD)
MAKE_CASE(AArch64ISD::SDOT)		MAKE_CASE(AArch64ISD::SDOT)
MAKE_CASE(AArch64ISD::UDOT)		MAKE_CASE(AArch64ISD::UDOT)
MAKE_CASE(AArch64ISD::SMINV)		MAKE_CASE(AArch64ISD::SMINV)
MAKE_CASE(AArch64ISD::UMINV)		MAKE_CASE(AArch64ISD::UMINV)
MAKE_CASE(AArch64ISD::SMAXV)		MAKE_CASE(AArch64ISD::SMAXV)
MAKE_CASE(AArch64ISD::UMAXV)		MAKE_CASE(AArch64ISD::UMAXV)
MAKE_CASE(AArch64ISD::SADDV_PRED)		MAKE_CASE(AArch64ISD::SADDV_PRED)
MAKE_CASE(AArch64ISD::UADDV_PRED)		MAKE_CASE(AArch64ISD::UADDV_PRED)
[unchanged lines elided] context: SDValue AArch64TargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
case Intrinsic::aarch64_neon_srhadd:		case Intrinsic::aarch64_neon_srhadd:
case Intrinsic::aarch64_neon_urhadd:		case Intrinsic::aarch64_neon_urhadd:
case Intrinsic::aarch64_neon_shadd:		case Intrinsic::aarch64_neon_shadd:
case Intrinsic::aarch64_neon_uhadd: {		case Intrinsic::aarch64_neon_uhadd: {
bool IsSignedAdd = (IntNo == Intrinsic::aarch64_neon_srhadd \|\|		bool IsSignedAdd = (IntNo == Intrinsic::aarch64_neon_srhadd \|\|
IntNo == Intrinsic::aarch64_neon_shadd);		IntNo == Intrinsic::aarch64_neon_shadd);
bool IsRoundingAdd = (IntNo == Intrinsic::aarch64_neon_srhadd \|\|		bool IsRoundingAdd = (IntNo == Intrinsic::aarch64_neon_srhadd \|\|
IntNo == Intrinsic::aarch64_neon_urhadd);		IntNo == Intrinsic::aarch64_neon_urhadd);
unsigned Opcode =		unsigned Opcode = IsSignedAdd
IsSignedAdd ? (IsRoundingAdd ? AArch64ISD::SRHADD : AArch64ISD::SHADD)		? (IsRoundingAdd ? ISD::AVGCEILS : ISD::AVGFLOORS)
: (IsRoundingAdd ? AArch64ISD::URHADD : AArch64ISD::UHADD);		: (IsRoundingAdd ? ISD::AVGCEILU : ISD::AVGFLOORU);
return DAG.getNode(Opcode, dl, Op.getValueType(), Op.getOperand(1),		return DAG.getNode(Opcode, dl, Op.getValueType(), Op.getOperand(1),
Op.getOperand(2));		Op.getOperand(2));
}		}
case Intrinsic::aarch64_neon_sabd:		case Intrinsic::aarch64_neon_sabd:
case Intrinsic::aarch64_neon_uabd: {		case Intrinsic::aarch64_neon_uabd: {
unsigned Opcode = IntNo == Intrinsic::aarch64_neon_uabd ? ISD::ABDU		unsigned Opcode = IntNo == Intrinsic::aarch64_neon_uabd ? ISD::ABDU
: ISD::ABDS;		: ISD::ABDS;
return DAG.getNode(Opcode, dl, Op.getValueType(), Op.getOperand(1),		return DAG.getNode(Opcode, dl, Op.getValueType(), Op.getOperand(1),
[unchanged lines elided] context: if ((NewOp = tryAdvSIMDModImm32(AArch64ISD::BICi, SDValue(N, 0), DAG,
(NewOp = tryAdvSIMDModImm16(AArch64ISD::BICi, SDValue(N, 0), DAG,		(NewOp = tryAdvSIMDModImm16(AArch64ISD::BICi, SDValue(N, 0), DAG,
UndefBits, &LHS)))		UndefBits, &LHS)))
return NewOp;		return NewOp;
}		}

return SDValue();		return SDValue();
}		}

// Attempt to form urhadd(OpA, OpB) from
// truncate(vlshr(sub(zext(OpB), xor(zext(OpA), Ones(ElemSizeInBits))), 1))
// or uhadd(OpA, OpB) from truncate(vlshr(add(zext(OpA), zext(OpB)), 1)).
// The original form of the first expression is
// truncate(srl(add(zext(OpB), add(zext(OpA), 1)), 1)) and the
// (OpA + OpB + 1) subexpression will have been changed to (OpB - (~OpA)).
// Before this function is called the srl will have been lowered to
// AArch64ISD::VLSHR.
// This pass can also recognize signed variants of the patterns that use sign
// extension instead of zero extension and form a srhadd(OpA, OpB) or a
// shadd(OpA, OpB) from them.
static SDValue
performVectorTruncateCombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI,
SelectionDAG &DAG) {
EVT VT = N->getValueType(0);

// Since we are looking for a right shift by a constant value of 1 and we are
// operating on types at least 16 bits in length (sign/zero extended OpA and
// OpB, which are at least 8 bits), it follows that the truncate will always
// discard the shifted-in bit and therefore the right shift will be logical
// regardless of the signedness of OpA and OpB.
SDValue Shift = N->getOperand(0);
if (Shift.getOpcode() != AArch64ISD::VLSHR)
return SDValue();

// Is the right shift using an immediate value of 1?
uint64_t ShiftAmount = Shift.getConstantOperandVal(1);
if (ShiftAmount != 1)
return SDValue();

SDValue ExtendOpA, ExtendOpB;
SDValue ShiftOp0 = Shift.getOperand(0);
unsigned ShiftOp0Opc = ShiftOp0.getOpcode();
if (ShiftOp0Opc == ISD::SUB) {

SDValue Xor = ShiftOp0.getOperand(1);
if (Xor.getOpcode() != ISD::XOR)
return SDValue();

// Is the XOR using a constant amount of all ones in the right hand side?
uint64_t C;
if (!isAllConstantBuildVector(Xor.getOperand(1), C))
return SDValue();

unsigned ElemSizeInBits = VT.getScalarSizeInBits();
APInt CAsAPInt(ElemSizeInBits, C);
if (CAsAPInt != APInt::getAllOnes(ElemSizeInBits))
return SDValue();

ExtendOpA = Xor.getOperand(0);
ExtendOpB = ShiftOp0.getOperand(0);
} else if (ShiftOp0Opc == ISD::ADD) {
ExtendOpA = ShiftOp0.getOperand(0);
ExtendOpB = ShiftOp0.getOperand(1);
} else
return SDValue();

unsigned ExtendOpAOpc = ExtendOpA.getOpcode();
unsigned ExtendOpBOpc = ExtendOpB.getOpcode();
if (!(ExtendOpAOpc == ExtendOpBOpc &&
(ExtendOpAOpc == ISD::ZERO_EXTEND \|\| ExtendOpAOpc == ISD::SIGN_EXTEND)))
return SDValue();

// Is the result of the right shift being truncated to the same value type as
// the original operands, OpA and OpB?
SDValue OpA = ExtendOpA.getOperand(0);
SDValue OpB = ExtendOpB.getOperand(0);
EVT OpAVT = OpA.getValueType();
assert(ExtendOpA.getValueType() == ExtendOpB.getValueType());
if (!(VT == OpAVT && OpAVT == OpB.getValueType()))
return SDValue();

SDLoc DL(N);
bool IsSignExtend = ExtendOpAOpc == ISD::SIGN_EXTEND;
bool IsRHADD = ShiftOp0Opc == ISD::SUB;
unsigned HADDOpc = IsSignExtend
? (IsRHADD ? AArch64ISD::SRHADD : AArch64ISD::SHADD)
: (IsRHADD ? AArch64ISD::URHADD : AArch64ISD::UHADD);
SDValue ResultHADD = DAG.getNode(HADDOpc, DL, VT, OpA, OpB);

return ResultHADD;
}
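
The comments in the removed function describe the arithmetic behind the match. The following scalar sketch is not part of the patch. It models one 8-bit lane, and every function name in it is hypothetical. Under those assumptions it checks that the logical shift gives the signed result once the value is truncated, and that (OpB - ~OpA) equals (OpA + OpB + 1). The final uint8_t to int8_t conversion is assumed to wrap, which is guaranteed from C++20 onwards and is what mainstream compilers do before that.

```cpp
#include <cstdint>

// Hypothetical scalar models of the four averaging nodes on one 8-bit lane.
// Operands are widened to 32 bits so the sum cannot wrap, shifted right by
// one with a logical shift, and truncated back to 8 bits.
inline uint8_t avgFloorU(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((uint32_t{a} + uint32_t{b}) >> 1);
}

inline uint8_t avgCeilU(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((uint32_t{a} + uint32_t{b} + 1u) >> 1);
}

// Sign-extend, then reinterpret as unsigned so that >> is a logical shift.
inline uint32_t sext(int8_t v) { return static_cast<uint32_t>(int32_t{v}); }

inline int8_t avgFloorS(int8_t a, int8_t b) {
  return static_cast<int8_t>(static_cast<uint8_t>((sext(a) + sext(b)) >> 1));
}

inline int8_t avgCeilS(int8_t a, int8_t b) {
  return static_cast<int8_t>(
      static_cast<uint8_t>((sext(a) + sext(b) + 1u) >> 1));
}

// The rewritten rounding form: b - ~a equals a + b + 1 in modular arithmetic.
inline uint8_t avgCeilUViaSub(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((uint32_t{b} - ~uint32_t{a}) >> 1);
}

// Reference floor(x / 2) for a signed value, without relying on >>.
inline int floorDiv2(int x) { return x >= 0 ? x / 2 : -((-x + 1) / 2); }

// Exhaustively check every pair of 8-bit inputs.
inline bool checkAllI8Pairs() {
  for (int a = -128; a <= 127; ++a) {
    for (int b = -128; b <= 127; ++b) {
      auto sa = static_cast<int8_t>(a), sb = static_cast<int8_t>(b);
      auto ua = static_cast<uint8_t>(a), ub = static_cast<uint8_t>(b);
      if (avgFloorS(sa, sb) != floorDiv2(a + b))
        return false;
      if (avgCeilS(sa, sb) != floorDiv2(a + b + 1))
        return false;
      if (avgCeilUViaSub(ua, ub) != avgCeilU(ua, ub))
        return false;
    }
  }
  return true;
}
```

Calling checkAllI8Pairs() covers all 65,536 input pairs, so the sketch confirms the identities only for 8-bit lanes. Wider lanes follow the same argument but are not tested here.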

static bool hasPairwiseAdd(unsigned Opcode, EVT VT, bool FullFP16) {		static bool hasPairwiseAdd(unsigned Opcode, EVT VT, bool FullFP16) {
switch (Opcode) {		switch (Opcode) {
case ISD::FADD:		case ISD::FADD:
return (FullFP16 && VT == MVT::f16) \|\| VT == MVT::f32 \|\| VT == MVT::f64;		return (FullFP16 && VT == MVT::f16) \|\| VT == MVT::f32 \|\| VT == MVT::f64;
case ISD::ADD:		case ISD::ADD:
return VT == MVT::i64;		return VT == MVT::i64;
default:		default:
return false;		return false;
[unchanged lines elided] context: if (N->getNumOperands() == 2 && N0Opc == ISD::TRUNCATE &&
}		}
}		}

// Wait 'til after everything is legalized to try this. That way we have		// Wait 'til after everything is legalized to try this. That way we have
// legal vector types and such.		// legal vector types and such.
if (DCI.isBeforeLegalizeOps())		if (DCI.isBeforeLegalizeOps())
return SDValue();		return SDValue();

// Optimise concat_vectors of two [us]rhadds or [us]hadds that use extracted		// Optimise concat_vectors of two [us]avgceils or [us]avgfloors that use
// subvectors from the same original vectors. Combine these into a single		// extracted subvectors from the same original vectors. Combine these into a
// [us]rhadd or [us]hadd that operates on the two original vectors. Example:		// single avg that operates on the two original vectors.
		RKSimon (inline comment, not done): update these as well
// (v16i8 (concat_vectors (v8i8 (urhadd (extract_subvector (v16i8 OpA, <0>),		// avgceil is the target independant name for rhadd, avgfloor is a hadd.
// extract_subvector (v16i8 OpB,		// Example:
// <0>))),		// (concat_vectors (v8i8 (avgceils (extract_subvector (v16i8 OpA, <0>),
// (v8i8 (urhadd (extract_subvector (v16i8 OpA, <8>),		// extract_subvector (v16i8 OpB, <0>))),
// extract_subvector (v16i8 OpB,		// (v8i8 (avgceils (extract_subvector (v16i8 OpA, <8>),
// <8>)))))		// extract_subvector (v16i8 OpB, <8>)))))
// ->		// ->
// (v16i8(urhadd(v16i8 OpA, v16i8 OpB)))		// (v16i8(avgceils(v16i8 OpA, v16i8 OpB)))
		RKSimon (inline comment, not done): update the urhadd references
if (N->getNumOperands() == 2 && N0Opc == N1Opc &&		if (N->getNumOperands() == 2 && N0Opc == N1Opc &&
(N0Opc == AArch64ISD::URHADD \|\| N0Opc == AArch64ISD::SRHADD \|\|		(N0Opc == ISD::AVGCEILU \|\| N0Opc == ISD::AVGCEILS \|\|
N0Opc == AArch64ISD::UHADD \|\| N0Opc == AArch64ISD::SHADD)) {		N0Opc == ISD::AVGFLOORU \|\| N0Opc == ISD::AVGFLOORS)) {
SDValue N00 = N0->getOperand(0);		SDValue N00 = N0->getOperand(0);
SDValue N01 = N0->getOperand(1);		SDValue N01 = N0->getOperand(1);
SDValue N10 = N1->getOperand(0);		SDValue N10 = N1->getOperand(0);
SDValue N11 = N1->getOperand(1);		SDValue N11 = N1->getOperand(1);

EVT N00VT = N00.getValueType();		EVT N00VT = N00.getValueType();
EVT N10VT = N10.getValueType();		EVT N10VT = N10.getValueType();
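
The concat_vectors combine described in the comment above relies on the averaging nodes being lane-wise, so averaging the two halves separately and concatenating gives the same result as averaging the full vectors. The sketch below illustrates that property on plain arrays. It is not the DAG combine itself, and Vec, avgCeilULanes, extractHalf, concat and concatCombineHolds are hypothetical names introduced only for this example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a vector of N unsigned 8-bit lanes.
template <std::size_t N> using Vec = std::array<uint8_t, N>;

// Lane-wise unsigned rounding average, computed in a wider type.
template <std::size_t N>
Vec<N> avgCeilULanes(const Vec<N> &a, const Vec<N> &b) {
  Vec<N> r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = static_cast<uint8_t>((uint32_t{a[i]} + uint32_t{b[i]} + 1u) >> 1);
  return r;
}

// Models extract_subvector of N / 2 lanes starting at lane `start`.
template <std::size_t N>
Vec<N / 2> extractHalf(const Vec<N> &v, std::size_t start) {
  Vec<N / 2> r{};
  for (std::size_t i = 0; i < N / 2; ++i)
    r[i] = v[start + i];
  return r;
}

// Models concat_vectors of two equally sized vectors.
template <std::size_t N>
Vec<2 * N> concat(const Vec<N> &lo, const Vec<N> &hi) {
  Vec<2 * N> r{};
  for (std::size_t i = 0; i < N; ++i) {
    r[i] = lo[i];
    r[N + i] = hi[i];
  }
  return r;
}

// True when averaging the halves and concatenating matches averaging the
// whole vectors, which is the rewrite the combine performs.
inline bool concatCombineHolds(const Vec<16> &a, const Vec<16> &b) {
  Vec<16> split =
      concat(avgCeilULanes(extractHalf(a, 0), extractHalf(b, 0)),
             avgCeilULanes(extractHalf(a, 8), extractHalf(b, 8)));
  return split == avgCeilULanes(a, b);
}
```

The same reasoning applies to the floor and signed variants, since each output lane depends only on the matching input lanes.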

[unchanged lines elided] context: SDValue AArch64TargetLowering::PerformDAGCombine(SDNode *N,
case ISD::INTRINSIC_WO_CHAIN:		case ISD::INTRINSIC_WO_CHAIN:
return performIntrinsicCombine(N, DCI, Subtarget);		return performIntrinsicCombine(N, DCI, Subtarget);
case ISD::ANY_EXTEND:		case ISD::ANY_EXTEND:
case ISD::ZERO_EXTEND:		case ISD::ZERO_EXTEND:
case ISD::SIGN_EXTEND:		case ISD::SIGN_EXTEND:
return performExtendCombine(N, DCI, DAG);		return performExtendCombine(N, DCI, DAG);
case ISD::SIGN_EXTEND_INREG:		case ISD::SIGN_EXTEND_INREG:
return performSignExtendInRegCombine(N, DCI, DAG);		return performSignExtendInRegCombine(N, DCI, DAG);
case ISD::TRUNCATE:
return performVectorTruncateCombine(N, DCI, DAG);
case ISD::CONCAT_VECTORS:		case ISD::CONCAT_VECTORS:
return performConcatVectorsCombine(N, DCI, DAG);		return performConcatVectorsCombine(N, DCI, DAG);
case ISD::INSERT_SUBVECTOR:		case ISD::INSERT_SUBVECTOR:
return performInsertSubvectorCombine(N, DCI, DAG);		return performInsertSubvectorCombine(N, DCI, DAG);
case ISD::SELECT:		case ISD::SELECT:
return performSelectCombine(N, DCI);		return performSelectCombine(N, DCI);
case ISD::VSELECT:		case ISD::VSELECT:
return performVSelectCombine(N, DCI.DAG);		return performVSelectCombine(N, DCI.DAG);

llvm/lib/Target/AArch64/AArch64InstrInfo.td

	[unchanged lines elided]

	def AArch64saddv : SDNode<"AArch64ISD::SADDV", SDT_AArch64UnaryVec>;			def AArch64saddv : SDNode<"AArch64ISD::SADDV", SDT_AArch64UnaryVec>;
	def AArch64uaddv : SDNode<"AArch64ISD::UADDV", SDT_AArch64UnaryVec>;			def AArch64uaddv : SDNode<"AArch64ISD::UADDV", SDT_AArch64UnaryVec>;
	def AArch64sminv : SDNode<"AArch64ISD::SMINV", SDT_AArch64UnaryVec>;			def AArch64sminv : SDNode<"AArch64ISD::SMINV", SDT_AArch64UnaryVec>;
	def AArch64uminv : SDNode<"AArch64ISD::UMINV", SDT_AArch64UnaryVec>;			def AArch64uminv : SDNode<"AArch64ISD::UMINV", SDT_AArch64UnaryVec>;
	def AArch64smaxv : SDNode<"AArch64ISD::SMAXV", SDT_AArch64UnaryVec>;			def AArch64smaxv : SDNode<"AArch64ISD::SMAXV", SDT_AArch64UnaryVec>;
	def AArch64umaxv : SDNode<"AArch64ISD::UMAXV", SDT_AArch64UnaryVec>;			def AArch64umaxv : SDNode<"AArch64ISD::UMAXV", SDT_AArch64UnaryVec>;

	def AArch64srhadd : SDNode<"AArch64ISD::SRHADD", SDT_AArch64binvec>;
	def AArch64urhadd : SDNode<"AArch64ISD::URHADD", SDT_AArch64binvec>;
	def AArch64shadd : SDNode<"AArch64ISD::SHADD", SDT_AArch64binvec>;
	def AArch64uhadd : SDNode<"AArch64ISD::UHADD", SDT_AArch64binvec>;

	def AArch64uabd : PatFrags<(ops node:$lhs, node:$rhs),			def AArch64uabd : PatFrags<(ops node:$lhs, node:$rhs),
	[(abdu node:$lhs, node:$rhs),			[(abdu node:$lhs, node:$rhs),
	(int_aarch64_neon_uabd node:$lhs, node:$rhs)]>;			(int_aarch64_neon_uabd node:$lhs, node:$rhs)]>;
	def AArch64sabd : PatFrags<(ops node:$lhs, node:$rhs),			def AArch64sabd : PatFrags<(ops node:$lhs, node:$rhs),
	[(abds node:$lhs, node:$rhs),			[(abds node:$lhs, node:$rhs),
	(int_aarch64_neon_sabd node:$lhs, node:$rhs)]>;			(int_aarch64_neon_sabd node:$lhs, node:$rhs)]>;

	def AArch64uaddlp_n : SDNode<"AArch64ISD::UADDLP", SDT_AArch64uaddlp>;			def AArch64uaddlp_n : SDNode<"AArch64ISD::UADDLP", SDT_AArch64uaddlp>;
	[unchanged lines elided]
	defm MLA : SIMDThreeSameVectorBHSTied<0, 0b10010, "mla", null_frag>;			defm MLA : SIMDThreeSameVectorBHSTied<0, 0b10010, "mla", null_frag>;
	defm MLS : SIMDThreeSameVectorBHSTied<1, 0b10010, "mls", null_frag>;			defm MLS : SIMDThreeSameVectorBHSTied<1, 0b10010, "mls", null_frag>;

	defm MUL : SIMDThreeSameVectorBHS<0, 0b10011, "mul", mul>;			defm MUL : SIMDThreeSameVectorBHS<0, 0b10011, "mul", mul>;
	defm PMUL : SIMDThreeSameVectorB<1, 0b10011, "pmul", int_aarch64_neon_pmul>;			defm PMUL : SIMDThreeSameVectorB<1, 0b10011, "pmul", int_aarch64_neon_pmul>;
	defm SABA : SIMDThreeSameVectorBHSTied<0, 0b01111, "saba",			defm SABA : SIMDThreeSameVectorBHSTied<0, 0b01111, "saba",
	TriOpFrag<(add node:$LHS, (AArch64sabd node:$MHS, node:$RHS))> >;			TriOpFrag<(add node:$LHS, (AArch64sabd node:$MHS, node:$RHS))> >;
	defm SABD : SIMDThreeSameVectorBHS<0,0b01110,"sabd", AArch64sabd>;			defm SABD : SIMDThreeSameVectorBHS<0,0b01110,"sabd", AArch64sabd>;
	defm SHADD : SIMDThreeSameVectorBHS<0,0b00000,"shadd", AArch64shadd>;			defm SHADD : SIMDThreeSameVectorBHS<0,0b00000,"shadd", avgfloors>;
	defm SHSUB : SIMDThreeSameVectorBHS<0,0b00100,"shsub", int_aarch64_neon_shsub>;			defm SHSUB : SIMDThreeSameVectorBHS<0,0b00100,"shsub", int_aarch64_neon_shsub>;
	defm SMAXP : SIMDThreeSameVectorBHS<0,0b10100,"smaxp", int_aarch64_neon_smaxp>;			defm SMAXP : SIMDThreeSameVectorBHS<0,0b10100,"smaxp", int_aarch64_neon_smaxp>;
	defm SMAX : SIMDThreeSameVectorBHS<0,0b01100,"smax", smax>;			defm SMAX : SIMDThreeSameVectorBHS<0,0b01100,"smax", smax>;
	defm SMINP : SIMDThreeSameVectorBHS<0,0b10101,"sminp", int_aarch64_neon_sminp>;			defm SMINP : SIMDThreeSameVectorBHS<0,0b10101,"sminp", int_aarch64_neon_sminp>;
	defm SMIN : SIMDThreeSameVectorBHS<0,0b01101,"smin", smin>;			defm SMIN : SIMDThreeSameVectorBHS<0,0b01101,"smin", smin>;
	defm SQADD : SIMDThreeSameVector<0,0b00001,"sqadd", int_aarch64_neon_sqadd>;			defm SQADD : SIMDThreeSameVector<0,0b00001,"sqadd", int_aarch64_neon_sqadd>;
	defm SQDMULH : SIMDThreeSameVectorHS<0,0b10110,"sqdmulh",int_aarch64_neon_sqdmulh>;			defm SQDMULH : SIMDThreeSameVectorHS<0,0b10110,"sqdmulh",int_aarch64_neon_sqdmulh>;
	defm SQRDMULH : SIMDThreeSameVectorHS<1,0b10110,"sqrdmulh",int_aarch64_neon_sqrdmulh>;			defm SQRDMULH : SIMDThreeSameVectorHS<1,0b10110,"sqrdmulh",int_aarch64_neon_sqrdmulh>;
	defm SQRSHL : SIMDThreeSameVector<0,0b01011,"sqrshl", int_aarch64_neon_sqrshl>;			defm SQRSHL : SIMDThreeSameVector<0,0b01011,"sqrshl", int_aarch64_neon_sqrshl>;
	defm SQSHL : SIMDThreeSameVector<0,0b01001,"sqshl", int_aarch64_neon_sqshl>;			defm SQSHL : SIMDThreeSameVector<0,0b01001,"sqshl", int_aarch64_neon_sqshl>;
	defm SQSUB : SIMDThreeSameVector<0,0b00101,"sqsub", int_aarch64_neon_sqsub>;			defm SQSUB : SIMDThreeSameVector<0,0b00101,"sqsub", int_aarch64_neon_sqsub>;
	defm SRHADD : SIMDThreeSameVectorBHS<0,0b00010,"srhadd", AArch64srhadd>;			defm SRHADD : SIMDThreeSameVectorBHS<0,0b00010,"srhadd", avgceils>;
	defm SRSHL : SIMDThreeSameVector<0,0b01010,"srshl", int_aarch64_neon_srshl>;			defm SRSHL : SIMDThreeSameVector<0,0b01010,"srshl", int_aarch64_neon_srshl>;
	defm SSHL : SIMDThreeSameVector<0,0b01000,"sshl", int_aarch64_neon_sshl>;			defm SSHL : SIMDThreeSameVector<0,0b01000,"sshl", int_aarch64_neon_sshl>;
	defm SUB : SIMDThreeSameVector<1,0b10000,"sub", sub>;			defm SUB : SIMDThreeSameVector<1,0b10000,"sub", sub>;
	defm UABA : SIMDThreeSameVectorBHSTied<1, 0b01111, "uaba",			defm UABA : SIMDThreeSameVectorBHSTied<1, 0b01111, "uaba",
	TriOpFrag<(add node:$LHS, (AArch64uabd node:$MHS, node:$RHS))> >;			TriOpFrag<(add node:$LHS, (AArch64uabd node:$MHS, node:$RHS))> >;
	defm UABD : SIMDThreeSameVectorBHS<1,0b01110,"uabd", AArch64uabd>;			defm UABD : SIMDThreeSameVectorBHS<1,0b01110,"uabd", AArch64uabd>;
	defm UHADD : SIMDThreeSameVectorBHS<1,0b00000,"uhadd", AArch64uhadd>;			defm UHADD : SIMDThreeSameVectorBHS<1,0b00000,"uhadd", avgflooru>;
	defm UHSUB : SIMDThreeSameVectorBHS<1,0b00100,"uhsub", int_aarch64_neon_uhsub>;			defm UHSUB : SIMDThreeSameVectorBHS<1,0b00100,"uhsub", int_aarch64_neon_uhsub>;
	defm UMAXP : SIMDThreeSameVectorBHS<1,0b10100,"umaxp", int_aarch64_neon_umaxp>;			defm UMAXP : SIMDThreeSameVectorBHS<1,0b10100,"umaxp", int_aarch64_neon_umaxp>;
	defm UMAX : SIMDThreeSameVectorBHS<1,0b01100,"umax", umax>;			defm UMAX : SIMDThreeSameVectorBHS<1,0b01100,"umax", umax>;
	defm UMINP : SIMDThreeSameVectorBHS<1,0b10101,"uminp", int_aarch64_neon_uminp>;			defm UMINP : SIMDThreeSameVectorBHS<1,0b10101,"uminp", int_aarch64_neon_uminp>;
	defm UMIN : SIMDThreeSameVectorBHS<1,0b01101,"umin", umin>;			defm UMIN : SIMDThreeSameVectorBHS<1,0b01101,"umin", umin>;
	defm UQADD : SIMDThreeSameVector<1,0b00001,"uqadd", int_aarch64_neon_uqadd>;			defm UQADD : SIMDThreeSameVector<1,0b00001,"uqadd", int_aarch64_neon_uqadd>;
	defm UQRSHL : SIMDThreeSameVector<1,0b01011,"uqrshl", int_aarch64_neon_uqrshl>;			defm UQRSHL : SIMDThreeSameVector<1,0b01011,"uqrshl", int_aarch64_neon_uqrshl>;
	defm UQSHL : SIMDThreeSameVector<1,0b01001,"uqshl", int_aarch64_neon_uqshl>;			defm UQSHL : SIMDThreeSameVector<1,0b01001,"uqshl", int_aarch64_neon_uqshl>;
	defm UQSUB : SIMDThreeSameVector<1,0b00101,"uqsub", int_aarch64_neon_uqsub>;			defm UQSUB : SIMDThreeSameVector<1,0b00101,"uqsub", int_aarch64_neon_uqsub>;
	defm URHADD : SIMDThreeSameVectorBHS<1,0b00010,"urhadd", AArch64urhadd>;			defm URHADD : SIMDThreeSameVectorBHS<1,0b00010,"urhadd", avgceilu>;
	defm URSHL : SIMDThreeSameVector<1,0b01010,"urshl", int_aarch64_neon_urshl>;			defm URSHL : SIMDThreeSameVector<1,0b01010,"urshl", int_aarch64_neon_urshl>;
	defm USHL : SIMDThreeSameVector<1,0b01000,"ushl", int_aarch64_neon_ushl>;			defm USHL : SIMDThreeSameVector<1,0b01000,"ushl", int_aarch64_neon_ushl>;
	defm SQRDMLAH : SIMDThreeSameVectorSQRDMLxHTiedHS<1,0b10000,"sqrdmlah",			defm SQRDMLAH : SIMDThreeSameVectorSQRDMLxHTiedHS<1,0b10000,"sqrdmlah",
	int_aarch64_neon_sqrdmlah>;			int_aarch64_neon_sqrdmlah>;
	defm SQRDMLSH : SIMDThreeSameVectorSQRDMLxHTiedHS<1,0b10001,"sqrdmlsh",			defm SQRDMLSH : SIMDThreeSameVectorSQRDMLxHTiedHS<1,0b10001,"sqrdmlsh",
	int_aarch64_neon_sqrdmlsh>;			int_aarch64_neon_sqrdmlsh>;

	// Extra saturate patterns, other than the intrinsics matches above			// Extra saturate patterns, other than the intrinsics matches above

Revision Contents

Diff 407945