This is an archive of the discontinued LLVM Phabricator instance.

llvm/include/llvm/IR/IntrinsicsAArch64.td
2050	What does `_x` mean here?
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
1222 ↗	(On Diff #245676)	`DL` ;-)
3592 ↗	(On Diff #245676)	`NumVecs` seems to always be 2 in this patch. Will we need this to work for other values in the future too? [Nit] `2` is a bit of a magic number here. What about `2` -> `/*NumVecs=*/2`?
llvm/test/CodeGen/AArch64/sve2-intrinsics-bit-permutation.ll
2	AFAIK, `-asm-verbose=0` is not currently needed here (and you don't use it in the other test). There are 2 options: (1) Leave `-asm-verbose=0` (guarantees that there are no comments in the assembly) and additionally decorate every function that you define with `nounwind` (guarantees that no CFI directives are added); this way you can safely replace every instance of `CHECK` with `CHECK-NEXT`. (2) Remove `-asm-verbose=0` and leave things as they are.

Removed NumVecs parameter from SelectTableSVE2 as the value is always the same (2)
Removed unnecessary -asm-verbose=0 from the RUN line of sve2-intrinsics-bit-permutation.ll

Thanks for reviewing this, @andwar!

llvm/include/llvm/IR/IntrinsicsAArch64.td
2050	_x indicates that this is an unpredicated intrinsic.
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
3592 ↗	(On Diff #245676)	I agree that it's not very clear what 2 is used for here. As NumVecs will always be the same value for the tbl2 intrinsic and SelectTableSVE2 is unlikely to be used for anything else, I've removed it from the list of parameters & added a comment there to explain the value used.
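
The `/*NumVecs=*/2` idiom discussed above labels a bare literal with the name of the parameter it binds to. A minimal sketch of the idea follows; `tupleRegs` and `tbl2Tuple` are hypothetical helpers invented for illustration and are not part of the patch or of LLVM.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not from the patch): names the Z registers of a
// register tuple that starts at FirstReg and spans NumVecs registers.
inline std::vector<std::string> tupleRegs(unsigned FirstReg, unsigned NumVecs) {
  assert(NumVecs > 0 && "a register tuple must be non-empty");
  std::vector<std::string> Regs;
  for (unsigned I = 0; I < NumVecs; ++I)
    Regs.push_back("z" + std::to_string((FirstReg + I) % 32));
  return Regs;
}

// The inline /*Name=*/ comments say what each literal means at the call
// site, so the reader does not have to look up the callee's signature.
inline std::vector<std::string> tbl2Tuple() {
  return tupleRegs(/*FirstReg=*/1, /*NumVecs=*/2);
}
```

Calling `tbl2Tuple()` in this sketch yields the two names `z1` and `z2`.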

sdesmalen added inline comments. Feb 21 2020, 9:34 AM

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
1226 ↗	(On Diff #245835)	nit: You can just as well inline this value now.
1229 ↗	(On Diff #245835)	nit: given that we know NumVecs == 2, you can write `SmallVector<SDValue, 2>`. nit: How about `N->ops().slice(1, 2)`? https://llvm.org/doxygen/classllvm_1_1ArrayRef.html#ace7bdce94e806bb8870626657630dab0
1235 ↗	(On Diff #245835)	nit: Maybe just: `ReplaceNode(N, CurDAG->getMachineNode(Opc, DL, VT, { RegSeq, N->getOperand(NumVecs + 1) }));`
llvm/test/CodeGen/AArch64/sve2-intrinsics-perm-tb.ll
10	We should test this with operands that are not already consecutive. `%a` and `%b` will come in as `z0` and `z1` by definition of the calling convention. By adding a `%dummy` in between `%a` and `%b`, you can check that a `mov` is inserted to ensure both registers are consecutive.
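
The `N->ops().slice(1, 2)` suggestion above relies on `ArrayRef::slice(N, M)` dropping the first N elements and keeping the next M. The sketch below mimics that behaviour with `std::vector` only; `sliceOps` and `tbl2TableOperands` are hypothetical stand-ins, and the operand layout (intrinsic ID first, then the two table vectors, then the index vector) is an assumption inferred from the comments in this review.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::ArrayRef<T>::slice(N, M): drop the first
// N elements and keep the next M. Illustration only, not LLVM code.
template <typename T>
std::vector<T> sliceOps(const std::vector<T> &Ops, std::size_t N,
                        std::size_t M) {
  assert(N + M <= Ops.size() && "slice out of range");
  auto First = Ops.begin() + static_cast<std::ptrdiff_t>(N);
  return std::vector<T>(First, First + static_cast<std::ptrdiff_t>(M));
}

// Assumed operand layout of the tbl2 intrinsic node:
//   operand 0: intrinsic ID, operands 1-2: table vectors, operand 3: indices.
inline std::vector<std::string> tbl2TableOperands() {
  const std::vector<std::string> Ops = {"intrinsic_id", "Zn0", "Zn1", "Zm"};
  return sliceOps(Ops, /*N=*/1, /*M=*/2);
}
```

Under that assumed layout the slice selects exactly the two table vectors, which is why no explicit loop over `NumVecs` would be needed.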

efriedma added inline comments. Feb 21 2020, 10:13 AM

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
1220 ↗	(On Diff #245835)	Is it possible to write this as a TableGen pattern? We manage for other variants of tbl (for example, https://github.com/llvm/llvm-project/blob/bc7b26c333f51b4b534abb81d597c0b86123718c/llvm/lib/Target/ARM/ARMInstrNEON.td#L7059 ).

Addressed review comments:

Removed SelectTableSVE2 from AArch64ISelDAGToDAG.cpp and added tablegen patterns for the tbl2 intrinsic
Updated tests to use operands that are not consecutive to ensure that the result is still two consecutive registers
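
The reason the updated tests expect a `mov` can be sketched as a simple check: a two-register `tbl` needs its table in registers `{ zN, zN+1 }`. With `%a` in `z0`, `%unused` in `z1` and `%b` in `z2`, the pair is not consecutive, so the tests expect `mov z1.d, z0.d` to form `{ z1, z2 }`. The helper below is hypothetical (it is not in the patch), and treating `z31` followed by `z0` as consecutive is an assumption of this sketch.

```cpp
#include <cassert>

// Number of SVE Z registers (z0..z31).
constexpr unsigned NumZRegs = 32;

// Returns true when Second immediately follows First, i.e. the two
// registers can be used directly as the { zN, zN+1 } table of a
// two-register TBL. Wrap-around from z31 to z0 is treated as
// consecutive (assumption of this sketch).
inline bool isConsecutivePair(unsigned First, unsigned Second) {
  assert(First < NumZRegs && Second < NumZRegs && "not a Z register");
  return Second == (First + 1) % NumZRegs;
}
```

In the test layout, `isConsecutivePair(0, 2)` is false for the incoming arguments, while `isConsecutivePair(1, 2)` is true once the first table vector has been copied into `z1`.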

LGTM

This revision is now accepted and ready to land. Feb 25 2020, 3:09 PM

Closed by commit rG9c859fc54d92: [AArch64][SVE] Add SVE2 intrinsics for bit permutation & table lookup (authored by kmclaughlin). Feb 26 2020, 3:30 AM

This revision was automatically updated to reflect the committed changes.

Thanks for reviewing this, @sdesmalen & @efriedma!

c-rhodes mentioned this in D75197: [AArch64][SVE] Add intrinsics for bitwise permute instructions. Feb 27 2020, 1:48 AM

Revision Contents

Path	Size
llvm/include/llvm/IR/IntrinsicsAArch64.td	21 lines
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td	10 lines
llvm/lib/Target/AArch64/SVEInstrFormats.td	55 lines
llvm/test/CodeGen/AArch64/sve2-intrinsics-bit-permutation.ll	124 lines
llvm/test/CodeGen/AArch64/sve2-intrinsics-perm-tb.ll	181 lines

Diff 246487

llvm/include/llvm/IR/IntrinsicsAArch64.td

[...]
: Intrinsic<[llvm_i1_ty],
[IntrNoMem]>;		[IntrNoMem]>;

class AdvSIMD_SVE_TBL_Intrinsic		class AdvSIMD_SVE_TBL_Intrinsic
: Intrinsic<[llvm_anyvector_ty],		: Intrinsic<[llvm_anyvector_ty],
[LLVMMatchType<0>,		[LLVMMatchType<0>,
LLVMVectorOfBitcastsToInt<0>],		LLVMVectorOfBitcastsToInt<0>],
[IntrNoMem]>;		[IntrNoMem]>;

		class AdvSIMD_SVE2_TBX_Intrinsic
		: Intrinsic<[llvm_anyvector_ty],
		[LLVMMatchType<0>,
		LLVMMatchType<0>,
		LLVMVectorOfBitcastsToInt<0>],
		[IntrNoMem]>;

class SVE2_1VectorArg_Long_Intrinsic		class SVE2_1VectorArg_Long_Intrinsic
: Intrinsic<[llvm_anyvector_ty],		: Intrinsic<[llvm_anyvector_ty],
[LLVMSubdivide2VectorType<0>,		[LLVMSubdivide2VectorType<0>,
llvm_i32_ty],		llvm_i32_ty],
[IntrNoMem, ImmArg<1>]>;		[IntrNoMem, ImmArg<1>]>;

class SVE2_2VectorArg_Long_Intrinsic		class SVE2_2VectorArg_Long_Intrinsic
: Intrinsic<[llvm_anyvector_ty],		: Intrinsic<[llvm_anyvector_ty],
[...]

def int_aarch64_sve_aesd : GCCBuiltin<"__builtin_sve_svaesd_u8">,		def int_aarch64_sve_aesd : GCCBuiltin<"__builtin_sve_svaesd_u8">,
Intrinsic<[llvm_nxv16i8_ty],		Intrinsic<[llvm_nxv16i8_ty],
[llvm_nxv16i8_ty, llvm_nxv16i8_ty],		[llvm_nxv16i8_ty, llvm_nxv16i8_ty],
[IntrNoMem]>;		[IntrNoMem]>;
def int_aarch64_sve_aesimc : GCCBuiltin<"__builtin_sve_svaesimc_u8">,		def int_aarch64_sve_aesimc : GCCBuiltin<"__builtin_sve_svaesimc_u8">,
Intrinsic<[llvm_nxv16i8_ty],		Intrinsic<[llvm_nxv16i8_ty],
[llvm_nxv16i8_ty],		[llvm_nxv16i8_ty],
[IntrNoMem]>;		[IntrNoMem]>;
		andwar: What does `_x` mean here?
		kmclaughlin (author): `_x` indicates that this is an unpredicated intrinsic.
def int_aarch64_sve_aese : GCCBuiltin<"__builtin_sve_svaese_u8">,		def int_aarch64_sve_aese : GCCBuiltin<"__builtin_sve_svaese_u8">,
Intrinsic<[llvm_nxv16i8_ty],		Intrinsic<[llvm_nxv16i8_ty],
[llvm_nxv16i8_ty, llvm_nxv16i8_ty],		[llvm_nxv16i8_ty, llvm_nxv16i8_ty],
[IntrNoMem]>;		[IntrNoMem]>;
def int_aarch64_sve_aesmc : GCCBuiltin<"__builtin_sve_svaesmc_u8">,		def int_aarch64_sve_aesmc : GCCBuiltin<"__builtin_sve_svaesmc_u8">,
Intrinsic<[llvm_nxv16i8_ty],		Intrinsic<[llvm_nxv16i8_ty],
[llvm_nxv16i8_ty],		[llvm_nxv16i8_ty],
[IntrNoMem]>;		[IntrNoMem]>;
def int_aarch64_sve_rax1 : GCCBuiltin<"__builtin_sve_svrax1_u64">,		def int_aarch64_sve_rax1 : GCCBuiltin<"__builtin_sve_svrax1_u64">,
Intrinsic<[llvm_nxv2i64_ty],		Intrinsic<[llvm_nxv2i64_ty],
[llvm_nxv2i64_ty, llvm_nxv2i64_ty],		[llvm_nxv2i64_ty, llvm_nxv2i64_ty],
[IntrNoMem]>;		[IntrNoMem]>;
def int_aarch64_sve_sm4e : GCCBuiltin<"__builtin_sve_svsm4e_u32">,		def int_aarch64_sve_sm4e : GCCBuiltin<"__builtin_sve_svsm4e_u32">,
Intrinsic<[llvm_nxv4i32_ty],		Intrinsic<[llvm_nxv4i32_ty],
[llvm_nxv4i32_ty, llvm_nxv4i32_ty],		[llvm_nxv4i32_ty, llvm_nxv4i32_ty],
[IntrNoMem]>;		[IntrNoMem]>;
def int_aarch64_sve_sm4ekey : GCCBuiltin<"__builtin_sve_svsm4ekey_u32">,		def int_aarch64_sve_sm4ekey : GCCBuiltin<"__builtin_sve_svsm4ekey_u32">,
Intrinsic<[llvm_nxv4i32_ty],		Intrinsic<[llvm_nxv4i32_ty],
[llvm_nxv4i32_ty, llvm_nxv4i32_ty],		[llvm_nxv4i32_ty, llvm_nxv4i32_ty],
[IntrNoMem]>;		[IntrNoMem]>;
		//
		// SVE2 - Extended table lookup/permute
		//

		def int_aarch64_sve_tbl2 : AdvSIMD_SVE2_TBX_Intrinsic;
		def int_aarch64_sve_tbx : AdvSIMD_SVE2_TBX_Intrinsic;

		//
		// SVE2 - Optional bit permutation
		//

		def int_aarch64_sve_bdep_x : AdvSIMD_2VectorArg_Intrinsic;
		def int_aarch64_sve_bext_x : AdvSIMD_2VectorArg_Intrinsic;
		def int_aarch64_sve_bgrp_x : AdvSIMD_2VectorArg_Intrinsic;

}		}

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td

[...]
let Predicates = [HasSVE2] in {
defm STNT1W_ZZR_S : sve2_mem_sstnt_vs<0b101, "stnt1w", Z_s, ZPR32>;		defm STNT1W_ZZR_S : sve2_mem_sstnt_vs<0b101, "stnt1w", Z_s, ZPR32>;

defm STNT1B_ZZR_D : sve2_mem_sstnt_vs<0b000, "stnt1b", Z_d, ZPR64>;		defm STNT1B_ZZR_D : sve2_mem_sstnt_vs<0b000, "stnt1b", Z_d, ZPR64>;
defm STNT1H_ZZR_D : sve2_mem_sstnt_vs<0b010, "stnt1h", Z_d, ZPR64>;		defm STNT1H_ZZR_D : sve2_mem_sstnt_vs<0b010, "stnt1h", Z_d, ZPR64>;
defm STNT1W_ZZR_D : sve2_mem_sstnt_vs<0b100, "stnt1w", Z_d, ZPR64>;		defm STNT1W_ZZR_D : sve2_mem_sstnt_vs<0b100, "stnt1w", Z_d, ZPR64>;
defm STNT1D_ZZR_D : sve2_mem_sstnt_vs<0b110, "stnt1d", Z_d, ZPR64>;		defm STNT1D_ZZR_D : sve2_mem_sstnt_vs<0b110, "stnt1d", Z_d, ZPR64>;

// SVE2 table lookup (three sources)		// SVE2 table lookup (three sources)
defm TBL_ZZZZ : sve2_int_perm_tbl<"tbl">;		defm TBL_ZZZZ : sve2_int_perm_tbl<"tbl", int_aarch64_sve_tbl2>;
defm TBX_ZZZ : sve2_int_perm_tbx<"tbx">;		defm TBX_ZZZ : sve2_int_perm_tbx<"tbx", int_aarch64_sve_tbx>;

// SVE2 integer compare scalar count and limit		// SVE2 integer compare scalar count and limit
defm WHILEGE_PWW : sve_int_while4_rr<0b000, "whilege", int_aarch64_sve_whilege>;		defm WHILEGE_PWW : sve_int_while4_rr<0b000, "whilege", int_aarch64_sve_whilege>;
defm WHILEGT_PWW : sve_int_while4_rr<0b001, "whilegt", int_aarch64_sve_whilegt>;		defm WHILEGT_PWW : sve_int_while4_rr<0b001, "whilegt", int_aarch64_sve_whilegt>;
defm WHILEHS_PWW : sve_int_while4_rr<0b100, "whilehs", int_aarch64_sve_whilehs>;		defm WHILEHS_PWW : sve_int_while4_rr<0b100, "whilehs", int_aarch64_sve_whilehs>;
defm WHILEHI_PWW : sve_int_while4_rr<0b101, "whilehi", int_aarch64_sve_whilehi>;		defm WHILEHI_PWW : sve_int_while4_rr<0b101, "whilehi", int_aarch64_sve_whilehi>;

defm WHILEGE_PXX : sve_int_while8_rr<0b000, "whilege", int_aarch64_sve_whilege>;		defm WHILEGE_PXX : sve_int_while8_rr<0b000, "whilege", int_aarch64_sve_whilege>;
[...]

let Predicates = [HasSVE2SHA3] in {		let Predicates = [HasSVE2SHA3] in {
// SVE2 crypto constructive binary operations		// SVE2 crypto constructive binary operations
defm RAX1_ZZZ_D : sve2_crypto_cons_bin_op<0b1, "rax1", ZPR64, int_aarch64_sve_rax1, nxv2i64>;		defm RAX1_ZZZ_D : sve2_crypto_cons_bin_op<0b1, "rax1", ZPR64, int_aarch64_sve_rax1, nxv2i64>;
}		}

let Predicates = [HasSVE2BitPerm] in {		let Predicates = [HasSVE2BitPerm] in {
// SVE2 bitwise permute		// SVE2 bitwise permute
defm BEXT_ZZZ : sve2_misc_bitwise<0b1100, "bext">;		defm BEXT_ZZZ : sve2_misc_bitwise<0b1100, "bext", int_aarch64_sve_bext_x>;
defm BDEP_ZZZ : sve2_misc_bitwise<0b1101, "bdep">;		defm BDEP_ZZZ : sve2_misc_bitwise<0b1101, "bdep", int_aarch64_sve_bdep_x>;
defm BGRP_ZZZ : sve2_misc_bitwise<0b1110, "bgrp">;		defm BGRP_ZZZ : sve2_misc_bitwise<0b1110, "bgrp", int_aarch64_sve_bgrp_x>;
}		}

llvm/lib/Target/AArch64/SVEInstrFormats.td

[...]
multiclass sve_int_perm_tbl<string asm, SDPatternOperator op> {
def : SVE_2_Op_Pat<nxv4i32, op, nxv4i32, nxv4i32, !cast<Instruction>(NAME # _S)>;		def : SVE_2_Op_Pat<nxv4i32, op, nxv4i32, nxv4i32, !cast<Instruction>(NAME # _S)>;
def : SVE_2_Op_Pat<nxv2i64, op, nxv2i64, nxv2i64, !cast<Instruction>(NAME # _D)>;		def : SVE_2_Op_Pat<nxv2i64, op, nxv2i64, nxv2i64, !cast<Instruction>(NAME # _D)>;

def : SVE_2_Op_Pat<nxv8f16, op, nxv8f16, nxv8i16, !cast<Instruction>(NAME # _H)>;		def : SVE_2_Op_Pat<nxv8f16, op, nxv8f16, nxv8i16, !cast<Instruction>(NAME # _H)>;
def : SVE_2_Op_Pat<nxv4f32, op, nxv4f32, nxv4i32, !cast<Instruction>(NAME # _S)>;		def : SVE_2_Op_Pat<nxv4f32, op, nxv4f32, nxv4i32, !cast<Instruction>(NAME # _S)>;
def : SVE_2_Op_Pat<nxv2f64, op, nxv2f64, nxv2i64, !cast<Instruction>(NAME # _D)>;		def : SVE_2_Op_Pat<nxv2f64, op, nxv2f64, nxv2i64, !cast<Instruction>(NAME # _D)>;
}		}

multiclass sve2_int_perm_tbl<string asm> {		multiclass sve2_int_perm_tbl<string asm, SDPatternOperator op> {
def _B : sve_int_perm_tbl<0b00, 0b01, asm, ZPR8, ZZ_b>;		def _B : sve_int_perm_tbl<0b00, 0b01, asm, ZPR8, ZZ_b>;
def _H : sve_int_perm_tbl<0b01, 0b01, asm, ZPR16, ZZ_h>;		def _H : sve_int_perm_tbl<0b01, 0b01, asm, ZPR16, ZZ_h>;
def _S : sve_int_perm_tbl<0b10, 0b01, asm, ZPR32, ZZ_s>;		def _S : sve_int_perm_tbl<0b10, 0b01, asm, ZPR32, ZZ_s>;
def _D : sve_int_perm_tbl<0b11, 0b01, asm, ZPR64, ZZ_d>;		def _D : sve_int_perm_tbl<0b11, 0b01, asm, ZPR64, ZZ_d>;

		def : Pat<(nxv16i8 (op nxv16i8:$Op1, nxv16i8:$Op2, nxv16i8:$Op3)),
		(nxv16i8 (!cast<Instruction>(NAME # _B) (REG_SEQUENCE ZPR2, nxv16i8:$Op1, zsub0,
		nxv16i8:$Op2, zsub1),
		nxv16i8:$Op3))>;

		def : Pat<(nxv8i16 (op nxv8i16:$Op1, nxv8i16:$Op2, nxv8i16:$Op3)),
		(nxv8i16 (!cast<Instruction>(NAME # _H) (REG_SEQUENCE ZPR2, nxv8i16:$Op1, zsub0,
		nxv8i16:$Op2, zsub1),
		nxv8i16:$Op3))>;

		def : Pat<(nxv4i32 (op nxv4i32:$Op1, nxv4i32:$Op2, nxv4i32:$Op3)),
		(nxv4i32 (!cast<Instruction>(NAME # _S) (REG_SEQUENCE ZPR2, nxv4i32:$Op1, zsub0,
		nxv4i32:$Op2, zsub1),
		nxv4i32:$Op3))>;

		def : Pat<(nxv2i64 (op nxv2i64:$Op1, nxv2i64:$Op2, nxv2i64:$Op3)),
		(nxv2i64 (!cast<Instruction>(NAME # _D) (REG_SEQUENCE ZPR2, nxv2i64:$Op1, zsub0,
		nxv2i64:$Op2, zsub1),
		nxv2i64:$Op3))>;

		def : Pat<(nxv8f16 (op nxv8f16:$Op1, nxv8f16:$Op2, nxv8i16:$Op3)),
		(nxv8f16 (!cast<Instruction>(NAME # _H) (REG_SEQUENCE ZPR2, nxv8f16:$Op1, zsub0,
		nxv8f16:$Op2, zsub1),
		nxv8i16:$Op3))>;

		def : Pat<(nxv4f32 (op nxv4f32:$Op1, nxv4f32:$Op2, nxv4i32:$Op3)),
		(nxv4f32 (!cast<Instruction>(NAME # _S) (REG_SEQUENCE ZPR2, nxv4f32:$Op1, zsub0,
		nxv4f32:$Op2, zsub1),
		nxv4i32:$Op3))>;

		def : Pat<(nxv2f64 (op nxv2f64:$Op1, nxv2f64:$Op2, nxv2i64:$Op3)),
		(nxv2f64 (!cast<Instruction>(NAME # _D) (REG_SEQUENCE ZPR2, nxv2f64:$Op1, zsub0,
		nxv2f64:$Op2, zsub1),
		nxv2i64:$Op3))>;
}		}

class sve2_int_perm_tbx<bits<2> sz8_64, string asm, ZPRRegOp zprty>		class sve2_int_perm_tbx<bits<2> sz8_64, string asm, ZPRRegOp zprty>
: I<(outs zprty:$Zd), (ins zprty:$_Zd, zprty:$Zn, zprty:$Zm),		: I<(outs zprty:$Zd), (ins zprty:$_Zd, zprty:$Zn, zprty:$Zm),
asm, "\t$Zd, $Zn, $Zm",		asm, "\t$Zd, $Zn, $Zm",
"",		"",
[]>, Sched<[]> {		[]>, Sched<[]> {
bits<5> Zd;		bits<5> Zd;
bits<5> Zm;		bits<5> Zm;
bits<5> Zn;		bits<5> Zn;
let Inst{31-24} = 0b00000101;		let Inst{31-24} = 0b00000101;
let Inst{23-22} = sz8_64;		let Inst{23-22} = sz8_64;
let Inst{21} = 0b1;		let Inst{21} = 0b1;
let Inst{20-16} = Zm;		let Inst{20-16} = Zm;
let Inst{15-10} = 0b001011;		let Inst{15-10} = 0b001011;
let Inst{9-5} = Zn;		let Inst{9-5} = Zn;
let Inst{4-0} = Zd;		let Inst{4-0} = Zd;

let Constraints = "$Zd = $_Zd";		let Constraints = "$Zd = $_Zd";
}		}

multiclass sve2_int_perm_tbx<string asm> {		multiclass sve2_int_perm_tbx<string asm, SDPatternOperator op> {
def _B : sve2_int_perm_tbx<0b00, asm, ZPR8>;		def _B : sve2_int_perm_tbx<0b00, asm, ZPR8>;
def _H : sve2_int_perm_tbx<0b01, asm, ZPR16>;		def _H : sve2_int_perm_tbx<0b01, asm, ZPR16>;
def _S : sve2_int_perm_tbx<0b10, asm, ZPR32>;		def _S : sve2_int_perm_tbx<0b10, asm, ZPR32>;
def _D : sve2_int_perm_tbx<0b11, asm, ZPR64>;		def _D : sve2_int_perm_tbx<0b11, asm, ZPR64>;

		def : SVE_3_Op_Pat<nxv16i8, op, nxv16i8, nxv16i8, nxv16i8, !cast<Instruction>(NAME # _B)>;
		def : SVE_3_Op_Pat<nxv8i16, op, nxv8i16, nxv8i16, nxv8i16, !cast<Instruction>(NAME # _H)>;
		def : SVE_3_Op_Pat<nxv4i32, op, nxv4i32, nxv4i32, nxv4i32, !cast<Instruction>(NAME # _S)>;
		def : SVE_3_Op_Pat<nxv2i64, op, nxv2i64, nxv2i64, nxv2i64, !cast<Instruction>(NAME # _D)>;

		def : SVE_3_Op_Pat<nxv8f16, op, nxv8f16, nxv8f16, nxv8i16, !cast<Instruction>(NAME # _H)>;
		def : SVE_3_Op_Pat<nxv4f32, op, nxv4f32, nxv4f32, nxv4i32, !cast<Instruction>(NAME # _S)>;
		def : SVE_3_Op_Pat<nxv2f64, op, nxv2f64, nxv2f64, nxv2i64, !cast<Instruction>(NAME # _D)>;
}		}

class sve_int_perm_reverse_z<bits<2> sz8_64, string asm, ZPRRegOp zprty>		class sve_int_perm_reverse_z<bits<2> sz8_64, string asm, ZPRRegOp zprty>
: I<(outs zprty:$Zd), (ins zprty:$Zn),		: I<(outs zprty:$Zd), (ins zprty:$Zn),
asm, "\t$Zd, $Zn",		asm, "\t$Zd, $Zn",
"",		"",
[]>, Sched<[]> {		[]>, Sched<[]> {
bits<5> Zd;		bits<5> Zd;
[...]
: I<(outs zprty1:$Zd), (ins zprty2:$Zn, zprty2:$Zm),
let Inst{21} = 0b0;		let Inst{21} = 0b0;
let Inst{20-16} = Zm;		let Inst{20-16} = Zm;
let Inst{15-14} = 0b10;		let Inst{15-14} = 0b10;
let Inst{13-10} = opc;		let Inst{13-10} = opc;
let Inst{9-5} = Zn;		let Inst{9-5} = Zn;
let Inst{4-0} = Zd;		let Inst{4-0} = Zd;
}		}

multiclass sve2_misc_bitwise<bits<4> opc, string asm> {		multiclass sve2_misc_bitwise<bits<4> opc, string asm, SDPatternOperator op> {
def _B : sve2_misc<0b00, opc, asm, ZPR8, ZPR8>;		def _B : sve2_misc<0b00, opc, asm, ZPR8, ZPR8>;
def _H : sve2_misc<0b01, opc, asm, ZPR16, ZPR16>;		def _H : sve2_misc<0b01, opc, asm, ZPR16, ZPR16>;
def _S : sve2_misc<0b10, opc, asm, ZPR32, ZPR32>;		def _S : sve2_misc<0b10, opc, asm, ZPR32, ZPR32>;
def _D : sve2_misc<0b11, opc, asm, ZPR64, ZPR64>;		def _D : sve2_misc<0b11, opc, asm, ZPR64, ZPR64>;

		def : SVE_2_Op_Pat<nxv16i8, op, nxv16i8, nxv16i8, !cast<Instruction>(NAME # _B)>;
		def : SVE_2_Op_Pat<nxv8i16, op, nxv8i16, nxv8i16, !cast<Instruction>(NAME # _H)>;
		def : SVE_2_Op_Pat<nxv4i32, op, nxv4i32, nxv4i32, !cast<Instruction>(NAME # _S)>;
		def : SVE_2_Op_Pat<nxv2i64, op, nxv2i64, nxv2i64, !cast<Instruction>(NAME # _D)>;
}		}

multiclass sve2_misc_int_addsub_long_interleaved<bits<2> opc, string asm,		multiclass sve2_misc_int_addsub_long_interleaved<bits<2> opc, string asm,
SDPatternOperator op> {		SDPatternOperator op> {
def _H : sve2_misc<0b01, { 0b00, opc }, asm, ZPR16, ZPR8>;		def _H : sve2_misc<0b01, { 0b00, opc }, asm, ZPR16, ZPR8>;
def _S : sve2_misc<0b10, { 0b00, opc }, asm, ZPR32, ZPR16>;		def _S : sve2_misc<0b10, { 0b00, opc }, asm, ZPR32, ZPR16>;
def _D : sve2_misc<0b11, { 0b00, opc }, asm, ZPR64, ZPR32>;		def _D : sve2_misc<0b11, { 0b00, opc }, asm, ZPR64, ZPR32>;

[...]

llvm/test/CodeGen/AArch64/sve2-intrinsics-bit-permutation.ll

This file was added.

				; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2,+sve2-bitperm < %s | FileCheck %s

				andwar: AFAIK, `-asm-verbose=0` is not currently needed here (and you don't use it in the other test). There are 2 options: (1) Leave `-asm-verbose=0` (guarantees that there are no comments in the assembly) and additionally decorate every function that you define with `nounwind` (guarantees that no CFI directives are added); this way you can safely replace every instance of `CHECK` with `CHECK-NEXT`. (2) Remove `-asm-verbose=0` and leave things as they are.
				;
				; BDEP
				;

				define <vscale x 16 x i8> @bdep_nxv16i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) {
				; CHECK-LABEL: bdep_nxv16i8:
				; CHECK: bdep z0.b, z0.b, z1.b
				; CHECK-NEXT: ret
				%out = call <vscale x 16 x i8> @llvm.aarch64.sve.bdep.x.nx16i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
				ret <vscale x 16 x i8> %out
				}

				define <vscale x 8 x i16> @bdep_nxv8i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) {
				; CHECK-LABEL: bdep_nxv8i16:
				; CHECK: bdep z0.h, z0.h, z1.h
				; CHECK-NEXT: ret
				%out = call <vscale x 8 x i16> @llvm.aarch64.sve.bdep.x.nx8i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
				ret <vscale x 8 x i16> %out
				}

				define <vscale x 4 x i32> @bdep_nxv4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) {
				; CHECK-LABEL: bdep_nxv4i32:
				; CHECK: bdep z0.s, z0.s, z1.s
				; CHECK-NEXT: ret
				%out = call <vscale x 4 x i32> @llvm.aarch64.sve.bdep.x.nx4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
				ret <vscale x 4 x i32> %out
				}

				define <vscale x 2 x i64> @bdep_nxv2i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) {
				; CHECK-LABEL: bdep_nxv2i64:
				; CHECK: bdep z0.d, z0.d, z1.d
				; CHECK-NEXT: ret
				%out = call <vscale x 2 x i64> @llvm.aarch64.sve.bdep.x.nx2i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %out
				}

				;
				; BEXT
				;

				define <vscale x 16 x i8> @bext_nxv16i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) {
				; CHECK-LABEL: bext_nxv16i8:
				; CHECK: bext z0.b, z0.b, z1.b
				; CHECK-NEXT: ret
				%out = call <vscale x 16 x i8> @llvm.aarch64.sve.bext.x.nx16i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
				ret <vscale x 16 x i8> %out
				}

				define <vscale x 8 x i16> @bext_nxv8i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) {
				; CHECK-LABEL: bext_nxv8i16:
				; CHECK: bext z0.h, z0.h, z1.h
				; CHECK-NEXT: ret
				%out = call <vscale x 8 x i16> @llvm.aarch64.sve.bext.x.nx8i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
				ret <vscale x 8 x i16> %out
				}

				define <vscale x 4 x i32> @bext_nxv4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) {
				; CHECK-LABEL: bext_nxv4i32:
				; CHECK: bext z0.s, z0.s, z1.s
				; CHECK-NEXT: ret
				%out = call <vscale x 4 x i32> @llvm.aarch64.sve.bext.x.nx4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
				ret <vscale x 4 x i32> %out
				}

				define <vscale x 2 x i64> @bext_nxv2i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) {
				; CHECK-LABEL: bext_nxv2i64:
				; CHECK: bext z0.d, z0.d, z1.d
				; CHECK-NEXT: ret
				%out = call <vscale x 2 x i64> @llvm.aarch64.sve.bext.x.nx2i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %out
				}

				;
				; BGRP
				;

				define <vscale x 16 x i8> @bgrp_nxv16i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b) {
				; CHECK-LABEL: bgrp_nxv16i8:
				; CHECK: bgrp z0.b, z0.b, z1.b
				; CHECK-NEXT: ret
				%out = call <vscale x 16 x i8> @llvm.aarch64.sve.bgrp.x.nx16i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
				ret <vscale x 16 x i8> %out
				}

				define <vscale x 8 x i16> @bgrp_nxv8i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b) {
				; CHECK-LABEL: bgrp_nxv8i16:
				; CHECK: bgrp z0.h, z0.h, z1.h
				; CHECK-NEXT: ret
				%out = call <vscale x 8 x i16> @llvm.aarch64.sve.bgrp.x.nx8i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
				ret <vscale x 8 x i16> %out
				}

				define <vscale x 4 x i32> @bgrp_nxv4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b) {
				; CHECK-LABEL: bgrp_nxv4i32:
				; CHECK: bgrp z0.s, z0.s, z1.s
				; CHECK-NEXT: ret
				%out = call <vscale x 4 x i32> @llvm.aarch64.sve.bgrp.x.nx4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
				ret <vscale x 4 x i32> %out
				}

				define <vscale x 2 x i64> @bgrp_nxv2i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b) {
				; CHECK-LABEL: bgrp_nxv2i64:
				; CHECK: bgrp z0.d, z0.d, z1.d
				; CHECK-NEXT: ret
				%out = call <vscale x 2 x i64> @llvm.aarch64.sve.bgrp.x.nx2i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				ret <vscale x 2 x i64> %out
				}

				declare <vscale x 16 x i8> @llvm.aarch64.sve.bdep.x.nx16i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
				declare <vscale x 8 x i16> @llvm.aarch64.sve.bdep.x.nx8i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
				declare <vscale x 4 x i32> @llvm.aarch64.sve.bdep.x.nx4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
				declare <vscale x 2 x i64> @llvm.aarch64.sve.bdep.x.nx2i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b)

				declare <vscale x 16 x i8> @llvm.aarch64.sve.bext.x.nx16i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
				declare <vscale x 8 x i16> @llvm.aarch64.sve.bext.x.nx8i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
				declare <vscale x 4 x i32> @llvm.aarch64.sve.bext.x.nx4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
				declare <vscale x 2 x i64> @llvm.aarch64.sve.bext.x.nx2i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b)

				declare <vscale x 16 x i8> @llvm.aarch64.sve.bgrp.x.nx16i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
				declare <vscale x 8 x i16> @llvm.aarch64.sve.bgrp.x.nx8i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
				declare <vscale x 4 x i32> @llvm.aarch64.sve.bgrp.x.nx4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
				declare <vscale x 2 x i64> @llvm.aarch64.sve.bgrp.x.nx2i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b)

llvm/test/CodeGen/AArch64/sve2-intrinsics-perm-tb.ll

This file was added.

				; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve2 < %s | FileCheck %s

				;
				; TBL2
				;

				define <vscale x 16 x i8> @tbl2_b(<vscale x 16 x i8> %a, <vscale x 16 x i8> %unused,
				<vscale x 16 x i8> %b, <vscale x 16 x i8> %c) {
				; CHECK-LABEL: tbl2_b:
				; CHECK: mov z1.d, z0.d
				sdesmalen: We should test this with operands that are not already consecutive. `%a` and `%b` will come in as `z0` and `z1` by definition of the calling convention. By adding a `%dummy` in between `%a` and `%b`, you can check that a `mov` is inserted to ensure both registers are consecutive.
				; CHECK-NEXT: tbl z0.b, { z1.b, z2.b }, z3.b
				; CHECK-NEXT: ret
				%out = call <vscale x 16 x i8> @llvm.aarch64.sve.tbl2.nxv16i8(<vscale x 16 x i8> %a,
				<vscale x 16 x i8> %b,
				<vscale x 16 x i8> %c)
				ret <vscale x 16 x i8> %out
				}

				define <vscale x 8 x i16> @tbl2_h(<vscale x 8 x i16> %a, <vscale x 16 x i8> %unused,
				<vscale x 8 x i16> %b, <vscale x 8 x i16> %c) {
				; CHECK-LABEL: tbl2_h:
				; CHECK: mov z1.d, z0.d
				; CHECK-NEXT: tbl z0.h, { z1.h, z2.h }, z3.h
				; CHECK-NEXT: ret
				%out = call <vscale x 8 x i16> @llvm.aarch64.sve.tbl2.nxv8i16(<vscale x 8 x i16> %a,
				<vscale x 8 x i16> %b,
				<vscale x 8 x i16> %c)
				ret <vscale x 8 x i16> %out
				}

				define <vscale x 4 x i32> @tbl2_s(<vscale x 4 x i32> %a, <vscale x 4 x i32> %unused,
				<vscale x 4 x i32> %b, <vscale x 4 x i32> %c) {
				; CHECK-LABEL: tbl2_s:
				; CHECK: mov z1.d, z0.d
				; CHECK-NEXT: tbl z0.s, { z1.s, z2.s }, z3.s
				; CHECK-NEXT: ret
				%out = call <vscale x 4 x i32> @llvm.aarch64.sve.tbl2.nxv4i32(<vscale x 4 x i32> %a,
				<vscale x 4 x i32> %b,
				<vscale x 4 x i32> %c)
				ret <vscale x 4 x i32> %out
				}

				define <vscale x 2 x i64> @tbl2_d(<vscale x 2 x i64> %a, <vscale x 2 x i64> %unused,
				<vscale x 2 x i64> %b, <vscale x 2 x i64> %c) {
				; CHECK-LABEL: tbl2_d:
				; CHECK: mov z1.d, z0.d
				; CHECK-NEXT: tbl z0.d, { z1.d, z2.d }, z3.d
				; CHECK-NEXT: ret
				%out = call <vscale x 2 x i64> @llvm.aarch64.sve.tbl2.nxv2i64(<vscale x 2 x i64> %a,
				<vscale x 2 x i64> %b,
				<vscale x 2 x i64> %c)
				ret <vscale x 2 x i64> %out
				}

				define <vscale x 8 x half> @tbl2_fh(<vscale x 8 x half> %a, <vscale x 8 x half> %unused,
				<vscale x 8 x half> %b, <vscale x 8 x i16> %c) {
				; CHECK-LABEL: tbl2_fh:
				; CHECK: mov z1.d, z0.d
				; CHECK-NEXT: tbl z0.h, { z1.h, z2.h }, z3.h
				; CHECK-NEXT: ret
				%out = call <vscale x 8 x half> @llvm.aarch64.sve.tbl2.nxv8f16(<vscale x 8 x half> %a,
				<vscale x 8 x half> %b,
				<vscale x 8 x i16> %c)
				ret <vscale x 8 x half> %out
				}

				define <vscale x 4 x float> @tbl2_fs(<vscale x 4 x float> %a, <vscale x 4 x float> %unused,
				<vscale x 4 x float> %b, <vscale x 4 x i32> %c) {
				; CHECK-LABEL: tbl2_fs:
				; CHECK: mov z1.d, z0.d
				; CHECK-NEXT: tbl z0.s, { z1.s, z2.s }, z3.s
				; CHECK-NEXT: ret
				%out = call <vscale x 4 x float> @llvm.aarch64.sve.tbl2.nxv4f32(<vscale x 4 x float> %a,
				<vscale x 4 x float> %b,
				<vscale x 4 x i32> %c)
				ret <vscale x 4 x float> %out
				}

				define <vscale x 2 x double> @tbl2_fd(<vscale x 2 x double> %a, <vscale x 2 x double> %unused,
				<vscale x 2 x double> %b, <vscale x 2 x i64> %c) {
				; CHECK-LABEL: tbl2_fd:
				; CHECK: mov z1.d, z0.d
				; CHECK-NEXT: tbl z0.d, { z1.d, z2.d }, z3.d
				; CHECK-NEXT: ret
				%out = call <vscale x 2 x double> @llvm.aarch64.sve.tbl2.nxv2f64(<vscale x 2 x double> %a,
				<vscale x 2 x double> %b,
				<vscale x 2 x i64> %c)
				ret <vscale x 2 x double> %out
				}

				;
				; TBX
				;

				define <vscale x 16 x i8> @tbx_b(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, <vscale x 16 x i8> %c) {
				; CHECK-LABEL: tbx_b:
				; CHECK: tbx z0.b, z1.b, z2.b
				; CHECK-NEXT: ret
				%out = call <vscale x 16 x i8> @llvm.aarch64.sve.tbx.nxv16i8(<vscale x 16 x i8> %a,
				<vscale x 16 x i8> %b,
				<vscale x 16 x i8> %c)
				ret <vscale x 16 x i8> %out
				}

				define <vscale x 8 x i16> @tbx_h(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b, <vscale x 8 x i16> %c) {
				; CHECK-LABEL: tbx_h:
				; CHECK: tbx z0.h, z1.h, z2.h
				; CHECK-NEXT: ret
				%out = call <vscale x 8 x i16> @llvm.aarch64.sve.tbx.nxv8i16(<vscale x 8 x i16> %a,
				<vscale x 8 x i16> %b,
				<vscale x 8 x i16> %c)
				ret <vscale x 8 x i16> %out
				}

				define <vscale x 8 x half> @ftbx_h(<vscale x 8 x half> %a, <vscale x 8 x half> %b, <vscale x 8 x i16> %c) {
				; CHECK-LABEL: ftbx_h:
				; CHECK: tbx z0.h, z1.h, z2.h
				; CHECK-NEXT: ret
				%out = call <vscale x 8 x half> @llvm.aarch64.sve.tbx.nxv8f16(<vscale x 8 x half> %a,
				<vscale x 8 x half> %b,
				<vscale x 8 x i16> %c)
				ret <vscale x 8 x half> %out
				}

				define <vscale x 4 x i32> @tbx_s(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, <vscale x 4 x i32> %c) {
				; CHECK-LABEL: tbx_s:
				; CHECK: tbx z0.s, z1.s, z2.s
				; CHECK-NEXT: ret
				%out = call <vscale x 4 x i32> @llvm.aarch64.sve.tbx.nxv4i32(<vscale x 4 x i32> %a,
				<vscale x 4 x i32> %b,
				<vscale x 4 x i32> %c)
				ret <vscale x 4 x i32> %out
				}

				define <vscale x 4 x float> @ftbx_s(<vscale x 4 x float> %a, <vscale x 4 x float> %b, <vscale x 4 x i32> %c) {
				; CHECK-LABEL: ftbx_s:
				; CHECK: tbx z0.s, z1.s, z2.s
				; CHECK-NEXT: ret
				%out = call <vscale x 4 x float> @llvm.aarch64.sve.tbx.nxv4f32(<vscale x 4 x float> %a,
				<vscale x 4 x float> %b,
				<vscale x 4 x i32> %c)
				ret <vscale x 4 x float> %out
				}

				define <vscale x 2 x i64> @tbx_d(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b, <vscale x 2 x i64> %c) {
				; CHECK-LABEL: tbx_d:
				; CHECK: tbx z0.d, z1.d, z2.d
				; CHECK-NEXT: ret
				%out = call <vscale x 2 x i64> @llvm.aarch64.sve.tbx.nxv2i64(<vscale x 2 x i64> %a,
				<vscale x 2 x i64> %b,
				<vscale x 2 x i64> %c)
				ret <vscale x 2 x i64> %out
				}

				define <vscale x 2 x double> @ftbx_d(<vscale x 2 x double> %a, <vscale x 2 x double> %b, <vscale x 2 x i64> %c) {
				; CHECK-LABEL: ftbx_d:
				; CHECK: tbx z0.d, z1.d, z2.d
				; CHECK-NEXT: ret
				%out = call <vscale x 2 x double> @llvm.aarch64.sve.tbx.nxv2f64(<vscale x 2 x double> %a,
				<vscale x 2 x double> %b,
				<vscale x 2 x i64> %c)
				ret <vscale x 2 x double> %out
				}

				declare <vscale x 16 x i8> @llvm.aarch64.sve.tbl2.nxv16i8(<vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>)
				declare <vscale x 8 x i16> @llvm.aarch64.sve.tbl2.nxv8i16(<vscale x 8 x i16>, <vscale x 8 x i16>, <vscale x 8 x i16>)
				declare <vscale x 4 x i32> @llvm.aarch64.sve.tbl2.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>)
				declare <vscale x 2 x i64> @llvm.aarch64.sve.tbl2.nxv2i64(<vscale x 2 x i64>, <vscale x 2 x i64>, <vscale x 2 x i64>)

				declare <vscale x 8 x half> @llvm.aarch64.sve.tbl2.nxv8f16(<vscale x 8 x half>, <vscale x 8 x half>, <vscale x 8 x i16>)
				declare <vscale x 4 x float> @llvm.aarch64.sve.tbl2.nxv4f32(<vscale x 4 x float>, <vscale x 4 x float>, <vscale x 4 x i32>)
				declare <vscale x 2 x double> @llvm.aarch64.sve.tbl2.nxv2f64(<vscale x 2 x double>, <vscale x 2 x double>, <vscale x 2 x i64>)

				declare <vscale x 16 x i8> @llvm.aarch64.sve.tbx.nxv16i8(<vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>)
				declare <vscale x 8 x i16> @llvm.aarch64.sve.tbx.nxv8i16(<vscale x 8 x i16>, <vscale x 8 x i16>, <vscale x 8 x i16>)
				declare <vscale x 4 x i32> @llvm.aarch64.sve.tbx.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>)
				declare <vscale x 2 x i64> @llvm.aarch64.sve.tbx.nxv2i64(<vscale x 2 x i64>, <vscale x 2 x i64>, <vscale x 2 x i64>)

				declare <vscale x 8 x half> @llvm.aarch64.sve.tbx.nxv8f16(<vscale x 8 x half>, <vscale x 8 x half>, <vscale x 8 x i16>)
				declare <vscale x 4 x float> @llvm.aarch64.sve.tbx.nxv4f32(<vscale x 4 x float>, <vscale x 4 x float>, <vscale x 4 x i32>)
				declare <vscale x 2 x double> @llvm.aarch64.sve.tbx.nxv2f64(<vscale x 2 x double>, <vscale x 2 x double>, <vscale x 2 x i64>)

[AArch64][SVE] Add SVE2 intrinsics for bit permutation & table lookup (Closed, Public)