This is an archive of the discontinued LLVM Phabricator instance.

junparser retitled this revision from "[AArch64][SVE] Optimize index_vector with add" to "[AArch64][SVE] Combine add and index_vector". (Apr 8 2021, 7:22 AM)

junparser edited the summary of this revision.

sdesmalen added inline comments. (Apr 9 2021, 4:42 AM)

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
Line 13437 (on Diff #336099):

Hi @junparser, there are a few things to take into account here:

Folding the add into the index_vector may be more expensive in practice: if `index_vector(zero, step)` is created once, then subsequent `add`s may be cheaper than having multiple `index_vector` instructions which each need to calculate the expansion of the (very similar) series. It's probably best to check that the node (add) has only a single use.

The form where it uses an `add` may come in useful when selecting an addressing mode for the gather/scatter operations, in which case `add(nxv4i32 dup(X), nxv4i32 index_vector(zero, step))` can be implemented with a scalar+vector addressing mode such as `[x0, z0.s]`. You may want to add a check that the result is not used as the address in a load/store operation.

This can probably be implemented quite easily in TableGen with some patterns that try to do the fold, but where the PatFrag itself has a condition that it has only a single use and that this use is not the address in a MemSDNode. See e.g. AArch64mul_p_oneuse, which also checks for a single use before selecting an MLA/MLS.

Because of how we have currently implemented the legalization of gathers/scatters, I don't think you can currently test the condition that the node is not used in a MemSDNode, but I'm not sure if that matters much.
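The single-use guard described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `Node`, `Graph`, `Op` and `foldAddOfIndexVector` are hypothetical names invented for illustration and are not LLVM's SelectionDAG API. It folds `add(index_vector(0, step), dup(X))` into `index_vector(X, step)` only when the `index_vector` has exactly one user, and it only looks at one operand order.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-ins for DAG nodes (hypothetical; not LLVM types).
enum class Op { Const, Dup, IndexVector, Add };

struct Node {
  Op op = Op::Const;
  long value = 0;            // payload, only meaningful for Const
  std::vector<Node *> ops;   // operands
  int uses = 0;              // number of nodes using this node
};

struct Graph {
  std::vector<std::unique_ptr<Node>> nodes;

  // Creates a node and bumps the use count of each operand.
  Node *make(Op op, std::vector<Node *> ops = {}, long value = 0) {
    nodes.push_back(std::make_unique<Node>());
    Node *n = nodes.back().get();
    n->op = op;
    n->value = value;
    n->ops = std::move(ops);
    for (Node *o : n->ops)
      ++o->uses;
    return n;
  }
};

// add(index_vector(0, step), dup(X)) -> index_vector(X, step).
// Returns nullptr when the fold should not fire.
Node *foldAddOfIndexVector(Graph &g, Node *add) {
  if (add->op != Op::Add)
    return nullptr;
  assert(add->ops.size() == 2 && "add is binary in this sketch");
  Node *iv = add->ops[0];
  Node *dup = add->ops[1];
  if (iv->op != Op::IndexVector || dup->op != Op::Dup)
    return nullptr;
  Node *base = iv->ops[0];
  if (base->op != Op::Const || base->value != 0)
    return nullptr;          // only a zero start value can absorb the splat
  if (iv->uses != 1)
    return nullptr;          // shared series: keep one index_vector plus adds
  return g.make(Op::IndexVector, {dup->ops[0], iv->ops[1]});
}
```

When the series has a second user, the function declines to fold, which mirrors the reasoning that one shared `index_vector` plus cheap adds may beat several separate `index_vector` instructions.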

What I'd like to understand better is where the boundary lies between TableGen pattern matching and DAG combine; I have never figured this out. Taking this case as an example, we can implement the feature in either place, but I don't know how to handle commutativity in TableGen (maybe SDNPCommutative? I don't know), or how each choice affects the next step: combining load/store with index_vector.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
Line 13437 (on Diff #336099):

Hi @sdesmalen, it looks reasonable to me to check that the add has only one use. As for the addressing mode for gather/scatter, has it already been implemented? I haven't seen it in trunk. I ask because my next plan is to combine load/store with index_vector.

In D100107#2682168, @junparser wrote:

What I'd like to understand better is where the boundary lies between TableGen pattern matching and DAG combine; I have never figured this out. Taking this case as an example, we can implement the feature in either place, but I don't know how to handle commutativity in TableGen (maybe SDNPCommutative? I don't know), or how each choice affects the next step: combining load/store with index_vector.

There isn't a firm rule to follow re tablegen pattern vs manual dag combine, but I think the steer is that when the combine is needed after legalization and it's possible to specify as a pattern, a pattern is preferred. In this case, probably a pattern would be a clean way to implement it. I haven't used SDNPCommutative before, but that looks like it should do the trick.
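As a rough illustration of the commutativity question: a hand-written combine has to try both operand orders of the `add` itself, whereas a TableGen pattern on a node marked commutative gets the swapped form generated for it (as far as I know, the generic `add` node carries `SDNPCommutative`). The sketch below shows the manual version; `Kind`, `Operand` and `matchAddOperands` are hypothetical names for illustration, not LLVM APIs.

```cpp
#include <cassert>

// Hypothetical operand kinds for this sketch; not LLVM opcodes.
enum class Kind { IndexVector, Dup, Other };

struct Operand {
  Kind kind;
  int id; // arbitrary tag so callers can tell operands apart
};

// Binds (indexVec, dup) from the two operands of an add, in either order.
// Returns false and leaves the outputs untouched if neither order fits.
bool matchAddOperands(const Operand &lhs, const Operand &rhs,
                      Operand &indexVec, Operand &dup) {
  if (lhs.kind == Kind::IndexVector && rhs.kind == Kind::Dup) {
    indexVec = lhs;
    dup = rhs;
    return true;
  }
  if (rhs.kind == Kind::IndexVector && lhs.kind == Kind::Dup) {
    indexVec = rhs;
    dup = lhs;
    return true;
  }
  return false;
}
```

Both `add(index_vector, dup)` and `add(dup, index_vector)` bind the same way, which is the behaviour the `_commutative` test in this revision exercises.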

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
Line 13437 (on Diff #336099):

The gather/scatter nodes are not implemented as patterns at the moment, only as custom DAG combine code. So adding a check that it isn't used in an addressing mode would currently have no effect, but it probably would if we change our gather/scatter implementation to use common nodes instead of the intrinsics. This is something we want to do in the not-too-distant future. Alternatively, you can leave a FIXME.

> Since my next plan is to combine load/store with index_vector

Can you elaborate what you mean by that?

In D100107#2685779, @sdesmalen wrote:

There isn't a firm rule to follow re tablegen pattern vs manual dag combine, but I think the steer is that when the combine is needed after legalization and it's possible to specify as a pattern, a pattern is preferred. In this case, probably a pattern would be a clean way to implement it. I haven't used SDNPCommutative before, but that looks like it should do the trick.

Thanks for explaining this.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
Line 13437 (on Diff #336099):

We already have nodes like GLD1_MERGE_ZERO and combine functions like performGLD1Combine, so rather than implementing this as a TableGen pattern, I prefer to do it here and then do some more work with index_vector in performGLD1Combine; that is the plan. Also, what is the difference between the scalar+vector and vector+index addressing modes? We can also use the vector+index addressing mode after this combine.

Address comment.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
Line 13437 (on Diff #336099):

Hi @junparser, please can you provide an example for what you're ultimately trying to achieve.

As Sander points out, the decision as to where things should live is not clear cut, and I've nothing against doing things within DAGCombine if it helps reduce isel pattern explosion. That said, I am very keen to cut down the need for duplicated combines due to the lack of a canonicalised DAG.

With regards to the addressing modes, we should already have good support by looking for ADD-based arithmetic used to compute addresses. Unless there's a good reason, I'd rather not have to duplicate this logic to handle INDEX_VECTOR. Personally I see INDEX_VECTOR as a last-stage optimisation whereby everything else has had a chance to remove the ADD (or MUL if we're talking about a stride), and thus the last option is to merge stray ADDs into the INDEX_VECTOR.

For what it's worth, the target-specific INDEX_VECTOR existed before the common STEP_VECTOR nodes, so I think there's a good chance we'll put effort into ensuring any early uses of INDEX_VECTOR are first converted to STEP_VECTOR to maximise the effectiveness of future common STEP_VECTOR combines.
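The merging of stray ADDs (or a MUL for a stride) into INDEX_VECTOR rests on a short per-lane identity. The following is a worked sketch, assuming lane $i$ of `index_vector(b, s)` holds $b + i \cdot s$, that `dup(X)` holds $X$ in every lane, and that all arithmetic wraps modulo $2^n$ for $n$-bit elements; the symbols $b$, $s$, $X$, $M$ are notation introduced here, not names from the patch.

```latex
\begin{align*}
\operatorname{index\_vector}(b, s)_i &= b + i\,s, \qquad \operatorname{dup}(X)_i = X \\
\bigl(\operatorname{index\_vector}(0, s) + \operatorname{dup}(X)\bigr)_i
  &= i\,s + X = \operatorname{index\_vector}(X, s)_i \\
\bigl(\operatorname{index\_vector}(0, 1) \cdot \operatorname{dup}(M)\bigr)_i
  &= i\,M = \operatorname{index\_vector}(0, M)_i
\end{align*}
```

The second line is the add fold in this revision; the third corresponds to the existing `mul_stepvector` test, where multiplying the step vector by a splat of 2 yields `index z0.h, #0, #2`.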

junparser added inline comments. (Apr 14 2021, 5:39 AM)

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
Line 13437 (on Diff #336099):

> Hi @junparser, please can you provide an example for what you're ultimately trying to achieve.

Combine add + index_vector here, then choose between the scalar+vector and vector+index addressing modes in selectGatherScatterAddrMode, and convert it back to index_vector + gather.

> As Sander points out, the decision as to where things should live is not clear cut, and I've nothing against doing things within DAGCombine if it helps reduce isel pattern explosion. That said, I am very keen to cut down the need for duplicated combines due to the lack of a canonicalised DAG.
>
> With regards to the addressing modes, we should already have good support by looking for ADD-based arithmetic used to compute addresses. Unless there's a good reason, I'd rather not have to duplicate this logic to handle INDEX_VECTOR. Personally I see INDEX_VECTOR as a last-stage optimisation whereby everything else has had a chance to remove the ADD (or MUL if we're talking about a stride), and thus the last option is to merge stray ADDs into the INDEX_VECTOR.

Sounds reasonable to me.

> For what it's worth, the target-specific INDEX_VECTOR existed before the common STEP_VECTOR nodes, so I think there's a good chance we'll put effort into ensuring any early uses of INDEX_VECTOR are first converted to STEP_VECTOR to maximise the effectiveness of future common STEP_VECTOR combines.

I see we currently lower step_vector to index_vector. Do you mean that we should only generate index_vector during instruction selection and only use step_vector before that? Hmm, maybe that's a good idea.

Address comments: use TableGen pattern matching.

Thanks, that looks quite neat. Can you also add a few tests when there is >1 use of the stepvector (e.g. using the stepvector in two adds), so we can test the fold indeed doesn't happen?
One test for each of the instructions should be sufficient.

llvm/test/CodeGen/AArch64/sve-stepvector.ll
147	nit: add_stepvector_nxv8i8_2_commutative ?

Address comments.

Update testcase.

Matt added a subscriber: Matt. (Apr 17 2021, 8:41 AM)

@sdesmalen kindly ping~

paulwalker-arm accepted this revision. (Apr 19 2021, 3:18 AM)

This revision is now accepted and ready to land. (Apr 19 2021, 3:18 AM)

Other than my comment on two missing tests, the patch looks good to me.

llvm/test/CodeGen/AArch64/sve-stepvector.ll
134	nit: I think you're still missing a test for the scalar+scalar case and one for imm+scalar (for start/stride respectively)

Closed by commit rG5c6ac3b4a25e: [AArch64][SVE] Combine add and index_vector (authored by junparser). (Apr 19 2021, 8:39 PM)

This revision was automatically updated to reflect the committed changes.

junparser added a commit: rG5c6ac3b4a25e: [AArch64][SVE] Combine add and index_vector.

junparser mentioned this in D100816: [AArch64][SVE] Lower index_vector to step_vector. (Apr 19 2021, 10:39 PM)

junparser mentioned this in rGb310dd15017f: [AArch64][SVE] Lower index_vector to step_vector. (Apr 30 2021, 4:14 AM)

Revision Contents

Path                                              Size
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td    12 lines
llvm/lib/Target/AArch64/SVEInstrFormats.td        49 lines
llvm/test/CodeGen/AArch64/sve-stepvector.ll       102 lines

Diff 338698

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td

[253 lines not shown]
 def SDT_AArch64PTest : SDTypeProfile<0, 2, [SDTCisVec<0>, SDTCisSameAs<0,1>]>;
 def AArch64ptest : SDNode<"AArch64ISD::PTEST", SDT_AArch64PTest>;

 def SDT_AArch64DUP_PRED : SDTypeProfile<1, 3, [SDTCisVec<0>, SDTCisSameAs<0, 3>, SDTCisVec<1>, SDTCVecEltisVT<1,i1>]>;
 def AArch64dup_mt : SDNode<"AArch64ISD::DUP_MERGE_PASSTHRU", SDT_AArch64DUP_PRED>;

 def SDT_IndexVector : SDTypeProfile<1, 2, [SDTCisVec<0>, SDTCisSameAs<1, 2>, SDTCisInt<2>]>;
 def index_vector : SDNode<"AArch64ISD::INDEX_VECTOR", SDT_IndexVector, []>;
+def index_vector_oneuse : PatFrag<(ops node:$base, node:$idx),
+                                  (index_vector node:$base, node:$idx), [{
+  return N->hasOneUse();
+}]>;

 def reinterpret_cast : SDNode<"AArch64ISD::REINTERPRET_CAST", SDTUnaryOp>;

 def AArch64mul_p_oneuse : PatFrag<(ops node:$pred, node:$src1, node:$src2),
                                   (AArch64mul_p node:$pred, node:$src1, node:$src2), [{
   return N->hasOneUse();
 }]>;

[1,087 lines not shown; hunk context: def : Pat<(nxv2f64 (AArch64dup (f64 fpimm:$val))),]

 defm SQINCP_ZP : sve_int_count_v<0b00000, "sqincp", int_aarch64_sve_sqincp>;
 defm UQINCP_ZP : sve_int_count_v<0b00100, "uqincp", int_aarch64_sve_uqincp>;
 defm SQDECP_ZP : sve_int_count_v<0b01000, "sqdecp", int_aarch64_sve_sqdecp>;
 defm UQDECP_ZP : sve_int_count_v<0b01100, "uqdecp", int_aarch64_sve_uqdecp>;
 defm INCP_ZP : sve_int_count_v<0b10000, "incp">;
 defm DECP_ZP : sve_int_count_v<0b10100, "decp">;

-defm INDEX_RR : sve_int_index_rr<"index", index_vector>;
-defm INDEX_IR : sve_int_index_ir<"index", index_vector>;
-defm INDEX_RI : sve_int_index_ri<"index", index_vector>;
-defm INDEX_II : sve_int_index_ii<"index", index_vector>;
+defm INDEX_RR : sve_int_index_rr<"index", index_vector, index_vector_oneuse>;
+defm INDEX_IR : sve_int_index_ir<"index", index_vector, index_vector_oneuse>;
+defm INDEX_RI : sve_int_index_ri<"index", index_vector, index_vector_oneuse>;
+defm INDEX_II : sve_int_index_ii<"index", index_vector, index_vector_oneuse>;

 // Unpredicated shifts
 defm ASR_ZZI : sve_int_bin_cons_shift_imm_right<0b00, "asr", AArch64asr_p>;
 defm LSR_ZZI : sve_int_bin_cons_shift_imm_right<0b01, "lsr", AArch64lsr_p>;
 defm LSL_ZZI : sve_int_bin_cons_shift_imm_left< 0b11, "lsl", AArch64lsl_p>;

 defm ASR_WIDE_ZZZ : sve_int_bin_cons_shift_wide<0b00, "asr">;
 defm LSR_WIDE_ZZZ : sve_int_bin_cons_shift_wide<0b01, "lsr">;
[1,411 lines not shown]

llvm/lib/Target/AArch64/SVEInstrFormats.td

[4,771 lines not shown; hunk context: : I<(outs zprty:$Zd), (ins imm_ty:$imm5, imm_ty:$imm5b),]
   let Inst{23-22} = sz8_64;
   let Inst{21} = 0b1;
   let Inst{20-16} = imm5b;
   let Inst{15-10} = 0b010000;
   let Inst{9-5} = imm5;
   let Inst{4-0} = Zd;
 }

-multiclass sve_int_index_ii<string asm, SDPatternOperator op> {
+multiclass sve_int_index_ii<string asm, SDPatternOperator op, SDPatternOperator oneuseop> {
   def _B : sve_int_index_ii<0b00, asm, ZPR8, simm5_8b>;
   def _H : sve_int_index_ii<0b01, asm, ZPR16, simm5_16b>;
   def _S : sve_int_index_ii<0b10, asm, ZPR32, simm5_32b>;
   def _D : sve_int_index_ii<0b11, asm, ZPR64, simm5_64b>;

   def : Pat<(nxv16i8 (op simm5_8b:$imm5, simm5_8b:$imm5b)),
             (!cast<Instruction>(NAME # "_B") simm5_8b:$imm5, simm5_8b:$imm5b)>;
   def : Pat<(nxv8i16 (op simm5_16b:$imm5, simm5_16b:$imm5b)),
             (!cast<Instruction>(NAME # "_H") simm5_16b:$imm5, simm5_16b:$imm5b)>;
   def : Pat<(nxv4i32 (op simm5_32b:$imm5, simm5_32b:$imm5b)),
             (!cast<Instruction>(NAME # "_S") simm5_32b:$imm5, simm5_32b:$imm5b)>;
   def : Pat<(nxv2i64 (op simm5_64b:$imm5, simm5_64b:$imm5b)),
             (!cast<Instruction>(NAME # "_D") simm5_64b:$imm5, simm5_64b:$imm5b)>;
+
+  // add(index_vector(zero, step), dup(X)) -> index_vector(X, step).
+  def : Pat<(add (nxv16i8 (oneuseop (i32 0), simm5_8b:$imm5b)), (nxv16i8 (AArch64dup(simm5_8b:$imm5)))),
+            (!cast<Instruction>(NAME # "_B") simm5_8b:$imm5, simm5_8b:$imm5b)>;
+  def : Pat<(add (nxv8i16 (oneuseop (i32 0), simm5_16b:$imm5b)), (nxv8i16 (AArch64dup(simm5_16b:$imm5)))),
+            (!cast<Instruction>(NAME # "_H") simm5_16b:$imm5, simm5_16b:$imm5b)>;
+  def : Pat<(add (nxv4i32 (oneuseop (i32 0), simm5_32b:$imm5b)), (nxv4i32 (AArch64dup(simm5_32b:$imm5)))),
+            (!cast<Instruction>(NAME # "_S") simm5_32b:$imm5, simm5_32b:$imm5b)>;
+  def : Pat<(add (nxv2i64 (oneuseop (i64 0), simm5_64b:$imm5b)), (nxv2i64 (AArch64dup(simm5_64b:$imm5)))),
+            (!cast<Instruction>(NAME # "_D") simm5_64b:$imm5, simm5_64b:$imm5b)>;
 }

 class sve_int_index_ir<bits<2> sz8_64, string asm, ZPRRegOp zprty,
                        RegisterClass srcRegType, Operand imm_ty>
 : I<(outs zprty:$Zd), (ins imm_ty:$imm5, srcRegType:$Rm),
   asm, "\t$Zd, $imm5, $Rm",
   "", []>, Sched<[]> {
   bits<5> Rm;
   bits<5> Zd;
   bits<5> imm5;
   let Inst{31-24} = 0b00000100;
   let Inst{23-22} = sz8_64;
   let Inst{21} = 0b1;
   let Inst{20-16} = Rm;
   let Inst{15-10} = 0b010010;
   let Inst{9-5} = imm5;
   let Inst{4-0} = Zd;
 }

-multiclass sve_int_index_ir<string asm, SDPatternOperator op> {
+multiclass sve_int_index_ir<string asm, SDPatternOperator op, SDPatternOperator oneuseop> {
   def _B : sve_int_index_ir<0b00, asm, ZPR8, GPR32, simm5_8b>;
   def _H : sve_int_index_ir<0b01, asm, ZPR16, GPR32, simm5_16b>;
   def _S : sve_int_index_ir<0b10, asm, ZPR32, GPR32, simm5_32b>;
   def _D : sve_int_index_ir<0b11, asm, ZPR64, GPR64, simm5_64b>;

   def : Pat<(nxv16i8 (op simm5_8b:$imm5, GPR32:$Rm)),
             (!cast<Instruction>(NAME # "_B") simm5_8b:$imm5, GPR32:$Rm)>;
   def : Pat<(nxv8i16 (op simm5_16b:$imm5, GPR32:$Rm)),
             (!cast<Instruction>(NAME # "_H") simm5_16b:$imm5, GPR32:$Rm)>;
   def : Pat<(nxv4i32 (op simm5_32b:$imm5, GPR32:$Rm)),
             (!cast<Instruction>(NAME # "_S") simm5_32b:$imm5, GPR32:$Rm)>;
   def : Pat<(nxv2i64 (op simm5_64b:$imm5, GPR64:$Rm)),
             (!cast<Instruction>(NAME # "_D") simm5_64b:$imm5, GPR64:$Rm)>;
+
+  // add(index_vector(zero, step), dup(X)) -> index_vector(X, step).
+  def : Pat<(add (nxv16i8 (oneuseop (i32 0), GPR32:$Rm)), (nxv16i8 (AArch64dup(simm5_8b:$imm5)))),
+            (!cast<Instruction>(NAME # "_B") simm5_8b:$imm5, GPR32:$Rm)>;
+  def : Pat<(add (nxv8i16 (oneuseop (i32 0), GPR32:$Rm)), (nxv8i16 (AArch64dup(simm5_16b:$imm5)))),
+            (!cast<Instruction>(NAME # "_H") simm5_16b:$imm5, GPR32:$Rm)>;
+  def : Pat<(add (nxv4i32 (oneuseop (i32 0), GPR32:$Rm)), (nxv4i32 (AArch64dup(simm5_32b:$imm5)))),
+            (!cast<Instruction>(NAME # "_S") simm5_32b:$imm5, GPR32:$Rm)>;
+  def : Pat<(add (nxv2i64 (oneuseop (i64 0), GPR64:$Rm)), (nxv2i64 (AArch64dup(simm5_64b:$imm5)))),
+            (!cast<Instruction>(NAME # "_D") simm5_64b:$imm5, GPR64:$Rm)>;
+
 }

 class sve_int_index_ri<bits<2> sz8_64, string asm, ZPRRegOp zprty,
                        RegisterClass srcRegType, Operand imm_ty>
 : I<(outs zprty:$Zd), (ins srcRegType:$Rn, imm_ty:$imm5),
   asm, "\t$Zd, $Rn, $imm5",
   "", []>, Sched<[]> {
   bits<5> Rn;
   bits<5> Zd;
   bits<5> imm5;
   let Inst{31-24} = 0b00000100;
   let Inst{23-22} = sz8_64;
   let Inst{21} = 0b1;
   let Inst{20-16} = imm5;
   let Inst{15-10} = 0b010001;
   let Inst{9-5} = Rn;
   let Inst{4-0} = Zd;
 }

-multiclass sve_int_index_ri<string asm, SDPatternOperator op> {
+multiclass sve_int_index_ri<string asm, SDPatternOperator op, SDPatternOperator oneuseop> {
   def _B : sve_int_index_ri<0b00, asm, ZPR8, GPR32, simm5_8b>;
   def _H : sve_int_index_ri<0b01, asm, ZPR16, GPR32, simm5_16b>;
   def _S : sve_int_index_ri<0b10, asm, ZPR32, GPR32, simm5_32b>;
   def _D : sve_int_index_ri<0b11, asm, ZPR64, GPR64, simm5_64b>;

   def : Pat<(nxv16i8 (op GPR32:$Rm, simm5_8b:$imm5)),
             (!cast<Instruction>(NAME # "_B") GPR32:$Rm, simm5_8b:$imm5)>;
   def : Pat<(nxv8i16 (op GPR32:$Rm, simm5_16b:$imm5)),
             (!cast<Instruction>(NAME # "_H") GPR32:$Rm, simm5_16b:$imm5)>;
   def : Pat<(nxv4i32 (op GPR32:$Rm, simm5_32b:$imm5)),
             (!cast<Instruction>(NAME # "_S") GPR32:$Rm, simm5_32b:$imm5)>;
   def : Pat<(nxv2i64 (op GPR64:$Rm, simm5_64b:$imm5)),
             (!cast<Instruction>(NAME # "_D") GPR64:$Rm, simm5_64b:$imm5)>;
+
+  // add(index_vector(zero, step), dup(X)) -> index_vector(X, step).
+  def : Pat<(add (nxv16i8 (oneuseop (i32 0), simm5_8b:$imm5)), (nxv16i8 (AArch64dup(i32 GPR32:$Rm)))),
+            (!cast<Instruction>(NAME # "_B") GPR32:$Rm, simm5_8b:$imm5)>;
+  def : Pat<(add (nxv8i16 (oneuseop (i32 0), simm5_16b:$imm5)), (nxv8i16 (AArch64dup(i32 GPR32:$Rm)))),
+            (!cast<Instruction>(NAME # "_H") GPR32:$Rm, simm5_16b:$imm5)>;
+  def : Pat<(add (nxv4i32 (oneuseop (i32 0), simm5_32b:$imm5)), (nxv4i32 (AArch64dup(i32 GPR32:$Rm)))),
+            (!cast<Instruction>(NAME # "_S") GPR32:$Rm, simm5_32b:$imm5)>;
+  def : Pat<(add (nxv2i64 (oneuseop (i64 0), simm5_64b:$imm5)), (nxv2i64 (AArch64dup(i64 GPR64:$Rm)))),
+            (!cast<Instruction>(NAME # "_D") GPR64:$Rm, simm5_64b:$imm5)>;
 }

 class sve_int_index_rr<bits<2> sz8_64, string asm, ZPRRegOp zprty,
                        RegisterClass srcRegType>
 : I<(outs zprty:$Zd), (ins srcRegType:$Rn, srcRegType:$Rm),
   asm, "\t$Zd, $Rn, $Rm",
   "", []>, Sched<[]> {
   bits<5> Zd;
   bits<5> Rm;
   bits<5> Rn;
   let Inst{31-24} = 0b00000100;
   let Inst{23-22} = sz8_64;
   let Inst{21} = 0b1;
   let Inst{20-16} = Rm;
   let Inst{15-10} = 0b010011;
   let Inst{9-5} = Rn;
   let Inst{4-0} = Zd;
 }

-multiclass sve_int_index_rr<string asm, SDPatternOperator op> {
+multiclass sve_int_index_rr<string asm, SDPatternOperator op, SDPatternOperator oneuseop> {
   def _B : sve_int_index_rr<0b00, asm, ZPR8, GPR32>;
   def _H : sve_int_index_rr<0b01, asm, ZPR16, GPR32>;
   def _S : sve_int_index_rr<0b10, asm, ZPR32, GPR32>;
   def _D : sve_int_index_rr<0b11, asm, ZPR64, GPR64>;

   def : SVE_2_Op_Pat<nxv16i8, op, i32, i32, !cast<Instruction>(NAME # _B)>;
   def : SVE_2_Op_Pat<nxv8i16, op, i32, i32, !cast<Instruction>(NAME # _H)>;
   def : SVE_2_Op_Pat<nxv4i32, op, i32, i32, !cast<Instruction>(NAME # _S)>;
   def : SVE_2_Op_Pat<nxv2i64, op, i64, i64, !cast<Instruction>(NAME # _D)>;
+
+  // add(index_vector(zero, step), dup(X)) -> index_vector(X, step).
+  def : Pat<(add (nxv16i8 (oneuseop (i32 0), GPR32:$Rm)), (nxv16i8 (AArch64dup(i32 GPR32:$Rn)))),
+            (!cast<Instruction>(NAME # "_B") GPR32:$Rn, GPR32:$Rm)>;
+  def : Pat<(add (nxv8i16 (oneuseop (i32 0), GPR32:$Rm)), (nxv8i16 (AArch64dup(i32 GPR32:$Rn)))),
+            (!cast<Instruction>(NAME # "_H") GPR32:$Rn, GPR32:$Rm)>;
+  def : Pat<(add (nxv4i32 (oneuseop (i32 0), GPR32:$Rm)), (nxv4i32 (AArch64dup(i32 GPR32:$Rn)))),
+            (!cast<Instruction>(NAME # "_S") GPR32:$Rn, GPR32:$Rm)>;
+  def : Pat<(add (nxv2i64 (oneuseop (i64 0), GPR64:$Rm)), (nxv2i64 (AArch64dup(i64 GPR64:$Rn)))),
+            (!cast<Instruction>(NAME # "_D") GPR64:$Rn, GPR64:$Rm)>;
 }

 //===----------------------------------------------------------------------===//
 // SVE Bitwise Shift - Predicated Group
 //===----------------------------------------------------------------------===//

 class sve_int_bin_pred_shift_imm<bits<4> tsz8_64, bits<4> opc, string asm,
                                  ZPRRegOp zprty, Operand immtype>
[3,152 lines not shown]
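The four multiclasses above differ only in whether the start and step operands are immediates or registers. As a rough sketch of how the form follows from the operands, assuming the SVE INDEX immediates are signed 5-bit values in the range -16 to 15: `SketchOperand`, `fitsSImm5` and `pickIndexForm` are hypothetical helpers for illustration, and the real selection is done by the TableGen patterns and their operand predicates, not by code like this.

```cpp
#include <cassert>
#include <string>

// Signed 5-bit immediate range assumed for INDEX start/step immediates.
bool fitsSImm5(long v) { return v >= -16 && v <= 15; }

// Hypothetical description of a start or step operand.
struct SketchOperand {
  bool isConstant; // true if the value is known at compile time
  long value;      // only meaningful when isConstant is true
};

// Picks the instruction family by which operands can be encoded as immediates.
// The first letter refers to the start value, the second to the step.
std::string pickIndexForm(SketchOperand start, SketchOperand step) {
  bool startImm = start.isConstant && fitsSImm5(start.value);
  bool stepImm = step.isConstant && fitsSImm5(step.value);
  if (startImm && stepImm)
    return "INDEX_II";
  if (startImm)
    return "INDEX_IR";
  if (stepImm)
    return "INDEX_RI";
  return "INDEX_RR";
}
```

For instance, a constant start of 2 with step 1 lands in the immediate+immediate family, matching `index z0.h, #2, #1` in the tests, while a start taken from a function argument with step 1 lands in the register+immediate family, matching `index z0.h, w0, #1`.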

llvm/test/CodeGen/AArch64/sve-stepvector.ll

[125 lines not shown]
 entry:
   %0 = call <vscale x 8 x i8> @llvm.experimental.stepvector.nxv8i8()
   %1 = add <vscale x 8 x i8> %p, %0
   %2 = call <vscale x 8 x i8> @llvm.experimental.stepvector.nxv8i8()
   %3 = add <vscale x 8 x i8> %1, %2
   ret <vscale x 8 x i8> %3
 }

+define <vscale x 8 x i8> @add_stepvector_nxv8i8_2() {
+; CHECK-LABEL: add_stepvector_nxv8i8_2:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: index z0.h, #2, #1
+; CHECK-NEXT: ret
+entry:
+  %0 = insertelement <vscale x 8 x i8> poison, i8 2, i32 0
+  %1 = shufflevector <vscale x 8 x i8> %0, <vscale x 8 x i8> poison, <vscale x 8 x i32> zeroinitializer
+  %2 = call <vscale x 8 x i8> @llvm.experimental.stepvector.nxv8i8()
+  %3 = add <vscale x 8 x i8> %2, %1
+  ret <vscale x 8 x i8> %3
+}
+
+define <vscale x 8 x i8> @add_stepvector_nxv8i8_2_commutative() {
+; CHECK-LABEL: add_stepvector_nxv8i8_2_commutative:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: index z0.h, #2, #1
+; CHECK-NEXT: ret
+entry:
+  %0 = insertelement <vscale x 8 x i8> poison, i8 2, i32 0
+  %1 = shufflevector <vscale x 8 x i8> %0, <vscale x 8 x i8> poison, <vscale x 8 x i32> zeroinitializer
+  %2 = call <vscale x 8 x i8> @llvm.experimental.stepvector.nxv8i8()
+  %3 = add <vscale x 8 x i8> %1, %2
+  ret <vscale x 8 x i8> %3
+}
+
+define <vscale x 8 x i16> @add_stepvector_nxv8i16_1(i16 %data) {
+; CHECK-LABEL: add_stepvector_nxv8i16_1:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: index z0.h, w0, #1
+; CHECK-NEXT: ret
+entry:
+  %0 = insertelement <vscale x 8 x i16> poison, i16 %data, i32 0
+  %1 = shufflevector <vscale x 8 x i16> %0, <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer
+  %2 = call <vscale x 8 x i16> @llvm.experimental.stepvector.nxv8i16()
+  %3 = add <vscale x 8 x i16> %2, %1
+  ret <vscale x 8 x i16> %3
+}
+
+define <vscale x 4 x i32> @add_stepvector_nxv4i32_1(i32 %data) {
+; CHECK-LABEL: add_stepvector_nxv4i32_1:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: index z0.s, w0, #1
+; CHECK-NEXT: ret
+entry:
+  %0 = insertelement <vscale x 4 x i32> poison, i32 %data, i32 0
+  %1 = shufflevector <vscale x 4 x i32> %0, <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
+  %2 = call <vscale x 4 x i32> @llvm.experimental.stepvector.nxv4i32()
+  %3 = add <vscale x 4 x i32> %2, %1
+  ret <vscale x 4 x i32> %3
+}
+
+define <vscale x 4 x i32> @multiple_use_stepvector_nxv4i32_1(i32 %data) {
+; CHECK-LABEL: multiple_use_stepvector_nxv4i32_1:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: mov z0.s, w0
+; CHECK-NEXT: index z1.s, w0, #1
+; CHECK-NEXT: ptrue p0.s
+; CHECK-NEXT: mul z0.s, p0/m, z0.s, z1.s
+; CHECK-NEXT: sub z0.s, z0.s, z1.s
+; CHECK-NEXT: ret
+entry:
+  %0 = insertelement <vscale x 4 x i32> poison, i32 %data, i32 0
+  %1 = shufflevector <vscale x 4 x i32> %0, <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
+  %2 = call <vscale x 4 x i32> @llvm.experimental.stepvector.nxv4i32()
+  %3 = add <vscale x 4 x i32> %2, %1
+  %4 = mul <vscale x 4 x i32> %1, %3
+  %5 = sub <vscale x 4 x i32> %4, %3
+  ret <vscale x 4 x i32> %5
+}
+
+define <vscale x 2 x i64> @add_stepvector_nxv2i64_1(i64 %data) {
+; CHECK-LABEL: add_stepvector_nxv2i64_1:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: index z0.d, x0, #1
+; CHECK-NEXT: ret
+entry:
+  %0 = insertelement <vscale x 2 x i64> poison, i64 %data, i32 0
+  %1 = shufflevector <vscale x 2 x i64> %0, <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
+  %2 = call <vscale x 2 x i64> @llvm.experimental.stepvector.nxv2i64()
+  %3 = add <vscale x 2 x i64> %1, %2
+  ret <vscale x 2 x i64> %3
+}
+
+define <vscale x 2 x i64> @multiple_use_stepvector_nxv2i64_1(i64 %data) {
+; CHECK-LABEL: multiple_use_stepvector_nxv2i64_1:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: mov z0.d, x0
+; CHECK-NEXT: index z1.d, #0, #1
+; CHECK-NEXT: add z0.d, z0.d, z1.d
+; CHECK-NEXT: ptrue p0.d
+; CHECK-NEXT: mul z0.d, p0/m, z0.d, z1.d
+; CHECK-NEXT: ret
+entry:
+  %0 = insertelement <vscale x 2 x i64> poison, i64 %data, i32 0
+  %1 = shufflevector <vscale x 2 x i64> %0, <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
+  %2 = call <vscale x 2 x i64> @llvm.experimental.stepvector.nxv2i64()
+  %3 = add <vscale x 2 x i64> %1, %2
+  %4 = mul <vscale x 2 x i64> %3, %2
+  ret <vscale x 2 x i64> %4
+}
+
 define <vscale x 8 x i8> @mul_stepvector_nxv8i8() {
 ; CHECK-LABEL: mul_stepvector_nxv8i8:
 ; CHECK: // %bb.0: // %entry
 ; CHECK-NEXT: index z0.h, #0, #2
 ; CHECK-NEXT: ret
 entry:
   %0 = insertelement <vscale x 8 x i8> poison, i8 2, i32 0
   %1 = shufflevector <vscale x 8 x i8> %0, <vscale x 8 x i8> poison, <vscale x 8 x i32> zeroinitializer
[29 lines not shown]

[AArch64][SVE] Combine add and index_vector (Closed, Public)
