This is an archive of the discontinued LLVM Phabricator instance.

lib/Target/AArch64/AArch64InstrFormats.td
8555 ↗	(On Diff #22387)	I think this should say "Doubling", right?
8607 ↗	(On Diff #22387)	We don't allow commented-out code, sorry. You should also be able to enable this pattern by using an SMOV instruction (you'd need a UMOV for the equivalent unsigned operation):
def : Pat<(... stays the same ...), (SMOVvi16to32 (!cast<Instruction>(NAME # v4i16_indexed) ..., VectorIndexH:0)>;
But the commented-out pattern seems weird: it uses SUBREG_TO_REG, and I don't understand why, and it matches sqdmull, not sqrdmulh like the surrounding code. So something's fishy here.
8657 ↗	(On Diff #22387)	This is really weird and sounds like a bug, although if the pattern matches I can't really argue with it, as it means the bug is somewhere else... I assume this pattern is explicitly tested?
8714 ↗	(On Diff #22387)	You should be able to implement this with SMOV/UMOV, as I mentioned above.
lib/Target/AArch64/AArch64InstrInfo.td
3086 ↗	(On Diff #22387)	Again, UMOV/SMOV to implement this.

This revision now requires changes to proceed. Mar 24 2015, 4:10 AM

vsukharev retitled this revision from [AArch64] Add v8.1a "Rounding Double Multiply Add/Subtract" extension to [AArch64] Add v8.1a "Rounding Doubling Multiply Add/Subtract" extension. Mar 26 2015, 11:43 AM

vsukharev updated this object.

vsukharev edited edge metadata.

vsukharev removed a parent revision: D8501: [AArch64] Add v8.1a atomic instructions.

vsukharev added inline comments. Mar 26 2015, 12:45 PM

lib/Target/AArch64/AArch64InstrFormats.td
8555 ↗	(On Diff #22387)	Sorry, will be changed in next revision.
8607 ↗	(On Diff #22387)	Oops, good catch. This is the correct pattern that is supposed to be here, but it cannot be compiled because of a problem with the first part (matching), not the second part (generating):
// FIXME: this cannot be processed by TableGen
// error: In SQRDMLAHanonymous_913: Type inference contradiction found,
// merging '{i32:i64:v8i8:v16i8:v4i16:v8i16:v2i32:v4i32:v1i64:v2i64}' into 'i16'
// error: In SQRDMLAHanonymous_913: Type inference contradiction found, merging 'i16' into 'f16'
//def : Pat<(i16 (Accum (i16 FPR16Op:$Rd),
// (i16 (vector_extract
// (v8i16 (insert_subvector
// (undef),
// (v4i16 (int_aarch64_neon_sqrdmulh
// (v4i16 V64:$Rn),
// (v4i16 (AArch64duplane16
// (v8i16 V128_lo:$Rm),
// VectorIndexH:$idx)))),
// (i32 0))),
// (i64 0))))),
// (EXTRACT_SUBREG
// (v4i16 (!cast<Instruction>(NAME # v4i16_indexed)
// (v4i16 (INSERT_SUBREG (v4i16 (IMPLICIT_DEF)),
// FPR16Op:$Rd,
// ssub)),
// V64:$Rn,
// V128_lo:$Rm,
// VectorIndexH:$idx)),
// ssub)>;
Test for it: test_sqrdmlsh_extracted_lane_s16 (see below)
8636 ↗	(On Diff #22387)	This is also a non-compilable pattern that is supposed to be here; it is tested by test_sqrdmlahq_extracted_lane_s16:
// FIXME: this cannot be processed by TableGen
// error: In SQRDMLAHanonymous_913: Type inference contradiction found,
// merging '{i32:i64:v8i8:v16i8:v4i16:v8i16:v2i32:v4i32:v1i64:v2i64}' into 'i16'
// error: In SQRDMLAHanonymous_913: Type inference contradiction found, merging 'i16' into 'f16'
//def : Pat<(i16 (Accum (i16 FPR16Op:$Rd),
// (i16 (vector_extract
// (v8i16 (int_aarch64_neon_sqrdmulh
// (v8i16 V128:$Rn),
// (v8i16 (AArch64duplane16
// (v8i16 V128_lo:$Rm),
// VectorIndexH:$idx)))),
// (i64 0))))),
// (EXTRACT_SUBREG
// (v8i16 (!cast<Instruction>(NAME # v8i16_indexed)
// (v8i16 (INSERT_SUBREG (v8i16 (IMPLICIT_DEF)),
// FPR16Op:$Rd,
// ssub)),
// V128:$Rn,
// V128_lo:$Rm,
// VectorIndexH:$idx)),
// ssub)>;
8657 ↗	(On Diff #22387)	A weird extra node, (v4i32 (insert_subvector (undef), (v2i32 ..., is inserted into this DAG because extract_subvector is illegal from v2i32; it is legal only from v4i32. That could be a bug at a higher design level; do you have any thoughts? Meanwhile, this pattern successfully matches the DAG that we have for the explicit test "test_sqrdmlah_extracted_lane_s32" (see comment below).
8714 ↗	(On Diff #22387)	As I commented above, it's a problem of another kind: namely, TableGen cannot generate a matcher for the snippet
Accum (i16 FPR16Op:$Rd), (i16 (int_aarch64_neon_sqrdmulh....
because of this error:
SQRDMLAHi16_indexed: (set FPR16Op:i16:$dst, (intrinsic_wo_chain:{i32:i64:v8i8:v16i8:v4i16:v8i16:v2i32:v4i32:v1i64:v2i64} 117:iPTR, FPR16Op:<empty>:$Rd, (intrinsic_wo_chain:i16 122:<empty>, FPR16Op:i16:$Rn, (vector_extract:i16 V128_lo:v8i16:$Rm, (imm:i64)<<P:Predicate_VectorIndexH>>:$idx))))
Included from /work/llvm-rw/lib/Target/AArch64/AArch64.td:58:
/work/llvm-rw/lib/Target/AArch64/AArch64InstrInfo.td:4357:1: error: In SQRDMLAHi16_indexed: Type inference contradiction found, merging 'f16' into 'i16'
defm SQRDMLAH : SIMDIndexedSQRDMLxHSDTied<1, 0b1101, "sqrdmlah",
^
Included from /work/llvm-rw/lib/Target/AArch64/AArch64.td:58:
Included from /work/llvm-rw/lib/Target/AArch64/AArch64InstrInfo.td:283:
/work/llvm-rw/lib/Target/AArch64/AArch64InstrFormats.td:8737:3: note: instantiated from multiclass
def i16_indexed : BaseSIMDIndexedTied<1, U, 1, 0b01, opc,
^
lib/Target/AArch64/AArch64InstrInfo.td
3086 ↗	(On Diff #22387)	(the same as discussion above)
test/CodeGen/AArch64/arm64-neon-v8.1a.ll
230 ↗	(On Diff #22387)	Test for the second non-compilable pattern, marked with // FIXME: this cannot be processed by TableGen
241 ↗	(On Diff #22387)	That's a test for the really weird pattern above, which matches the weird extra insert_subvector in the DAG
269 ↗	(On Diff #22387)	Test for the first non-compilable pattern, marked with // FIXME: this cannot be processed by TableGen

Hi Vladimir,

I've checked out your patch and fiddled around with it. It is possible, but ugly, to match your unmatchable pattern.

First, we need to properly legalize the intrinsic. It has type i16 (and takes i16 arguments). i16 is illegal, so it needs to be promoted, but the generic code can't promote it for you, so we need to do it ourselves. There are two ways to do this: either create a new AArch64ISD:: node for this operation or operate on ISD::INTRINSIC_WO_CHAIN nodes themselves. For simplicity I've done the latter.

To begin with, we need to tell the target-agnostic gubbins that we want to custom-lower intrinsic nodes:

// Somewhere near AArch64ISelLowering.cpp:120
setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i16, Custom);

Now we need to implement the custom lowering.

// In AArch64ISelLowering.cpp, function ReplaceNodeResults()
  case ISD::INTRINSIC_WO_CHAIN: {
    auto ID = getIntrinsicID(N);
    if ((ID == Intrinsic::aarch64_neon_sqrdmulh ||
         ID == Intrinsic::aarch64_neon_sqadd) &&
        N->getValueType(0) == MVT::i16) {
      // Promote to i32.
      SDLoc DL(N);

      auto Op0 = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::i32, N->getOperand(1));
      auto Op1 = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::i32, N->getOperand(2));

      auto NN = DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL, MVT::i32,
                            DAG.getConstant(ID, MVT::i32),
                            Op0, Op1);
      NN = DAG.getNode(ISD::TRUNCATE, DL, MVT::i16, NN);
      Results.push_back(NN);
    }
    return;
  }

With this change, we can get code that at least doesn't crash:

umov    w8, v0.h[3]
fmov    s0, w0
fmov    s1, w1
fmov    s2, w8
sqrdmlah        s0, s1, s2
fmov    w0, s0
ret

That uses the i32 variant of the sqrdmlah instruction. We need to do at least this much, I think, because we can't have intrinsics that just crash the compiler.

Now, matching the pattern. The pattern we need to match is basically the same as the i32_indexed version of the pattern, but with a v8i16 instead of v4i32 type:

def : Pat<(i32 (Accum (i32 FPR32Op:$Rd),
                 (i32 (int_aarch64_neon_sqrdmulh
                   (i32 FPR32Op:$Rn),
                   (i32 (vector_extract (v8i16 V128:$Rm),
                                        VectorIndexH:$idx)))))),

But the pattern to generate is even uglier still. This is the best I've got:

(COPY_TO_REGCLASS (f32 (INSERT_SUBREG (IMPLICIT_DEF),
               (!cast<Instruction>(NAME#"i16_indexed")
                 (EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS FPR32Op:$Rd, FPR32)), hsub),
                 (EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS FPR32Op:$Rn, FPR32)), hsub),
                 V128:$Rm, VectorIndexH:$idx),
               hsub)), FPR32)>;

All the operands are going to be i32 types, so we need to make sure they're in the FPR32 register bank before we try to take the "hsub" subregister from them; that's what the inner COPY_TO_REGCLASS nodes are for. We then end up with an f32 type, which has to be merged into the i32 that the pattern must return, so I've added another COPY_TO_REGCLASS so that the return value of the entire pattern is merely FPR32 (which both i32 and f32 can be allocated to).

This produces:

fmov    s1, w1
fmov    s2, w0
sqrdmlah        h2, h1, v0.h[3]
fmov    w0, s2
ret

Which is what we want. It can also produce chained sqrdmlah's, such as:

fmov    s1, w1
fmov    s2, w0
sqrdmlah        h2, h1, v0.h[3]
sqrdmlah        h2, h1, v0.h[2]
fmov    w0, s2
ret

So I think this is certainly a valid way of implementing those intrinsics.

I'm not sure of the best way forward here. @Tim, would you mind checking my tomfoolery above and seeing whether you agree? If so, implementing these is quite involved, so it would possibly be better done in a separate patch.

Cheers,

James

In AArch64ISelLowering.cpp, function ReplaceNodeResults(), we also need "sqsub".
I don't think the i32 sqrdmlah() will work right, even if we replace ISD::TRUNCATE with SQXTN. Is the following correct?

i16 sqrdmlah (0, 100, 1000) -> 0 sqadd (100 sqrdmulh 1000) -> 0 sqadd (high i16 half of 100000) -> 0 sqadd 1 -> 1 - we need to obtain that with workaround...
i32 sqrdmlah (0, 100, 1000) -> 0 sqadd (high i32 half of 100000) -> 0
Nope, the workaround does not seem to be right.
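
The mismatch described above can be checked with a small scalar model. The following is a minimal sketch, not LLVM code: sqrdmulh16, sqrdmulh32 and promotedSqrdmulh16 are hypothetical helper names, and the model assumes the usual definition of SQRDMULH, sat((2*a*b + 2^(esize-1)) >> esize). It also assumes the any-extended operands happen to be sign-extended, which is only one of the values ANY_EXTEND may produce. With doubling and rounding included, the 16-bit product for (100, 1000) comes out as 3 rather than the rough figure of 1 used above, but the conclusion is the same: the 16-bit operation yields a nonzero value, while the promoted 32-bit operation followed by a truncate yields 0.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of SQRDMULH on 16-bit elements (hypothetical helper, not an
// LLVM API): sat((2*a*b + 2^15) >> 16).
constexpr int16_t sqrdmulh16(int16_t a, int16_t b) {
  // The doubled product of two 16-bit values always fits in 64 bits.
  int64_t r = (2 * int64_t(a) * int64_t(b) + (int64_t(1) << 15)) >> 16;
  // Only INT16_MIN * INT16_MIN leaves the i16 range and needs saturation.
  return r > INT16_MAX ? INT16_MAX : int16_t(r);
}

// Same model on 32-bit elements: sat((2*a*b + 2^31) >> 32).
constexpr int32_t sqrdmulh32(int32_t a, int32_t b) {
  // INT32_MIN * INT32_MIN is the only saturating case; handling it first
  // also keeps the doubled product below from overflowing int64_t.
  if (a == INT32_MIN && b == INT32_MIN)
    return INT32_MAX;
  return int32_t((2 * int64_t(a) * int64_t(b) + (int64_t(1) << 31)) >> 32);
}

// Model of the promotion workaround: extend both operands to i32 (sign
// extension assumed here), run the i32 operation, truncate back to i16.
constexpr int16_t promotedSqrdmulh16(int16_t a, int16_t b) {
  return int16_t(sqrdmulh32(a, b));
}

// The native i16 operation keeps the high half of a 32-bit product...
static_assert(sqrdmulh16(100, 1000) == 3, "i16 result is nonzero");
// ...while the promoted one keeps the high half of a 64-bit product.
static_assert(promotedSqrdmulh16(100, 1000) == 0, "promoted result is zero");
```

The two static_asserts make the disagreement a compile-time fact: the high half that the i32 operation returns is taken 16 bits further up than the one the i16 operation needs, so truncating afterwards cannot recover it.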

commented out patterns are removed
commented out tests are rewritten from illegal IR to clang-style

Hi Vladimir,

Thanks for doing that. It looks a lot better without the nasty selection logic, and we can just fix up vector instructions to scalar ones in the AdvSIMD pass if needed, as Tim suggested.

LGTM.

Cheers,

James

This revision is now accepted and ready to land. Mar 31 2015, 5:43 AM

Closed by commit rL233693: [AArch64] Add v8.1a "Rounding Double Multiply Add/Subtract" extension (authored by vsukharev). Mar 31 2015, 6:18 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/lib/Target/AArch64/AArch64InstrFormats.td	199 lines
llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.td	22 lines
llvm/trunk/test/CodeGen/AArch64/arm64-neon-v8.1a.ll	456 lines
llvm/trunk/test/MC/AArch64/armv8.1a-rdma.s	154 lines
llvm/trunk/test/MC/Disassembler/AArch64/armv8.1a-rdma.txt	129 lines

Diff 22947

llvm/trunk/lib/Target/AArch64/AArch64InstrFormats.td


Show First 20 Lines • Show All 5,294 Lines • ▼ Show 20 Lines	class BaseSIMDThreeScalar<bit U, bits<2> size, bits<5> opcode,
let Inst{21} = 1;
let Inst{20-16} = Rm;
let Inst{15-11} = opcode;
let Inst{10} = 1;
let Inst{9-5} = Rn;
let Inst{4-0} = Rd;
}

		let mayStore = 0, mayLoad = 0, hasSideEffects = 0 in
		class BaseSIMDThreeScalarTied<bit U, bits<2> size, bit R, bits<5> opcode,
		dag oops, dag iops, string asm,
		list<dag> pattern>
		: I<oops, iops, asm, "\t$Rd, $Rn, $Rm", "$Rd = $dst", pattern>,
		Sched<[WriteV]> {
		bits<5> Rd;
		bits<5> Rn;
		bits<5> Rm;
		let Inst{31-30} = 0b01;
		let Inst{29} = U;
		let Inst{28-24} = 0b11110;
		let Inst{23-22} = size;
		let Inst{21} = R;
		let Inst{20-16} = Rm;
		let Inst{15-11} = opcode;
		let Inst{10} = 1;
		let Inst{9-5} = Rn;
		let Inst{4-0} = Rd;
		}

multiclass SIMDThreeScalarD<bit U, bits<5> opc, string asm,
SDPatternOperator OpNode> {
def v1i64 : BaseSIMDThreeScalar<U, 0b11, opc, FPR64, asm,
[(set (v1i64 FPR64:$Rd), (OpNode (v1i64 FPR64:$Rn), (v1i64 FPR64:$Rm)))]>;
}

multiclass SIMDThreeScalarBHSD<bit U, bits<5> opc, string asm,
SDPatternOperator OpNode> {
Show All 11 Lines

multiclass SIMDThreeScalarHS<bit U, bits<5> opc, string asm,
SDPatternOperator OpNode> {
def v1i32 : BaseSIMDThreeScalar<U, 0b10, opc, FPR32, asm,
[(set FPR32:$Rd, (OpNode FPR32:$Rn, FPR32:$Rm))]>;
def v1i16 : BaseSIMDThreeScalar<U, 0b01, opc, FPR16, asm, []>;
}

		multiclass SIMDThreeScalarHSTied<bit U, bit R, bits<5> opc, string asm,
		SDPatternOperator OpNode = null_frag> {
		def v1i32: BaseSIMDThreeScalarTied<U, 0b10, R, opc, (outs FPR32:$dst),
		(ins FPR32:$Rd, FPR32:$Rn, FPR32:$Rm),
		asm, []>;
		def v1i16: BaseSIMDThreeScalarTied<U, 0b01, R, opc, (outs FPR16:$dst),
		(ins FPR16:$Rd, FPR16:$Rn, FPR16:$Rm),
		asm, []>;
		}

multiclass SIMDThreeScalarSD<bit U, bit S, bits<5> opc, string asm,
SDPatternOperator OpNode = null_frag> {
let mayLoad = 0, mayStore = 0, hasSideEffects = 0 in {
def #NAME#64 : BaseSIMDThreeScalar<U, {S,1}, opc, FPR64, asm,
[(set (f64 FPR64:$Rd), (OpNode (f64 FPR64:$Rn), (f64 FPR64:$Rm)))]>;
def #NAME#32 : BaseSIMDThreeScalar<U, {S,0}, opc, FPR32, asm,
[(set FPR32:$Rd, (OpNode FPR32:$Rn, FPR32:$Rm))]>;
}
▲ Show 20 Lines • Show All 3,176 Lines • ▼ Show 20 Lines	multiclass SIMDLdSt4SingleAliases<string asm> {
defm : SIMDLdStSingleAliases<asm, "b", "i8", "Four", 4, VectorIndexB>;
defm : SIMDLdStSingleAliases<asm, "h", "i16", "Four", 8, VectorIndexH>;
defm : SIMDLdStSingleAliases<asm, "s", "i32", "Four", 16, VectorIndexS>;
defm : SIMDLdStSingleAliases<asm, "d", "i64", "Four", 32, VectorIndexD>;
}
} // end of 'let Predicates = [HasNEON]'

//----------------------------------------------------------------------------
		// AdvSIMD v8.1 Rounding Double Multiply Add/Subtract
		//----------------------------------------------------------------------------

		let Predicates = [HasNEON, HasV8_1a] in {

		class BaseSIMDThreeSameVectorTiedR0<bit Q, bit U, bits<2> size, bits<5> opcode,
		RegisterOperand regtype, string asm,
		string kind, list<dag> pattern>
		: BaseSIMDThreeSameVectorTied<Q, U, size, opcode, regtype, asm, kind,
		pattern> {
		let Inst{21}=0;
		}
		multiclass SIMDThreeSameVectorSQRDMLxHTiedHS<bit U, bits<5> opc, string asm,
		SDPatternOperator Accum> {
		def v4i16 : BaseSIMDThreeSameVectorTiedR0<0, U, 0b01, opc, V64, asm, ".4h",
		[(set (v4i16 V64:$dst),
		(Accum (v4i16 V64:$Rd),
		(v4i16 (int_aarch64_neon_sqrdmulh (v4i16 V64:$Rn),
		(v4i16 V64:$Rm)))))]>;
		def v8i16 : BaseSIMDThreeSameVectorTiedR0<1, U, 0b01, opc, V128, asm, ".8h",
		[(set (v8i16 V128:$dst),
		(Accum (v8i16 V128:$Rd),
		(v8i16 (int_aarch64_neon_sqrdmulh (v8i16 V128:$Rn),
		(v8i16 V128:$Rm)))))]>;
		def v2i32 : BaseSIMDThreeSameVectorTiedR0<0, U, 0b10, opc, V64, asm, ".2s",
		[(set (v2i32 V64:$dst),
		(Accum (v2i32 V64:$Rd),
		(v2i32 (int_aarch64_neon_sqrdmulh (v2i32 V64:$Rn),
		(v2i32 V64:$Rm)))))]>;
		def v4i32 : BaseSIMDThreeSameVectorTiedR0<1, U, 0b10, opc, V128, asm, ".4s",
		[(set (v4i32 V128:$dst),
		(Accum (v4i32 V128:$Rd),
		(v4i32 (int_aarch64_neon_sqrdmulh (v4i32 V128:$Rn),
		(v4i32 V128:$Rm)))))]>;
		}

		multiclass SIMDIndexedSQRDMLxHSDTied<bit U, bits<4> opc, string asm,
		SDPatternOperator Accum> {
		def v4i16_indexed : BaseSIMDIndexedTied<0, U, 0, 0b01, opc,
		V64, V64, V128_lo, VectorIndexH,
		asm, ".4h", ".4h", ".4h", ".h",
		[(set (v4i16 V64:$dst),
		(Accum (v4i16 V64:$Rd),
		(v4i16 (int_aarch64_neon_sqrdmulh
		(v4i16 V64:$Rn),
		(v4i16 (AArch64duplane16 (v8i16 V128_lo:$Rm),
		VectorIndexH:$idx))))))]> {
		bits<3> idx;
		let Inst{11} = idx{2};
		let Inst{21} = idx{1};
		let Inst{20} = idx{0};
		}

		def v8i16_indexed : BaseSIMDIndexedTied<1, U, 0, 0b01, opc,
		V128, V128, V128_lo, VectorIndexH,
		asm, ".8h", ".8h", ".8h", ".h",
		[(set (v8i16 V128:$dst),
		(Accum (v8i16 V128:$Rd),
		(v8i16 (int_aarch64_neon_sqrdmulh
		(v8i16 V128:$Rn),
		(v8i16 (AArch64duplane16 (v8i16 V128_lo:$Rm),
		VectorIndexH:$idx))))))]> {
		bits<3> idx;
		let Inst{11} = idx{2};
		let Inst{21} = idx{1};
		let Inst{20} = idx{0};
		}

		def v2i32_indexed : BaseSIMDIndexedTied<0, U, 0, 0b10, opc,
		V64, V64, V128, VectorIndexS,
		asm, ".2s", ".2s", ".2s", ".s",
		[(set (v2i32 V64:$dst),
		(Accum (v2i32 V64:$Rd),
		(v2i32 (int_aarch64_neon_sqrdmulh
		(v2i32 V64:$Rn),
		(v2i32 (AArch64duplane32 (v4i32 V128:$Rm),
		VectorIndexS:$idx))))))]> {
		bits<2> idx;
		let Inst{11} = idx{1};
		let Inst{21} = idx{0};
		}

		// FIXME: it would be nice to use the scalar (v1i32) instruction here, but
		// an intermediate EXTRACT_SUBREG would be untyped.
		// FIXME: direct EXTRACT_SUBREG from v2i32 to i32 is illegal, that's why we
		// got it lowered here as (i32 vector_extract (v4i32 insert_subvector(..)))
		def : Pat<(i32 (Accum (i32 FPR32Op:$Rd),
		(i32 (vector_extract
		(v4i32 (insert_subvector
		(undef),
		(v2i32 (int_aarch64_neon_sqrdmulh
		(v2i32 V64:$Rn),
		(v2i32 (AArch64duplane32
		(v4i32 V128:$Rm),
		VectorIndexS:$idx)))),
		(i32 0))),
		(i64 0))))),
		(EXTRACT_SUBREG
		(v2i32 (!cast<Instruction>(NAME # v2i32_indexed)
		(v2i32 (INSERT_SUBREG (v2i32 (IMPLICIT_DEF)),
		FPR32Op:$Rd,
		ssub)),
		V64:$Rn,
		V128:$Rm,
		VectorIndexS:$idx)),
		ssub)>;

		def v4i32_indexed : BaseSIMDIndexedTied<1, U, 0, 0b10, opc,
		V128, V128, V128, VectorIndexS,
		asm, ".4s", ".4s", ".4s", ".s",
		[(set (v4i32 V128:$dst),
		(Accum (v4i32 V128:$Rd),
		(v4i32 (int_aarch64_neon_sqrdmulh
		(v4i32 V128:$Rn),
		(v4i32 (AArch64duplane32 (v4i32 V128:$Rm),
		VectorIndexS:$idx))))))]> {
		bits<2> idx;
		let Inst{11} = idx{1};
		let Inst{21} = idx{0};
		}

		// FIXME: it would be nice to use the scalar (v1i32) instruction here, but
		// an intermediate EXTRACT_SUBREG would be untyped.
		def : Pat<(i32 (Accum (i32 FPR32Op:$Rd),
		(i32 (vector_extract
		(v4i32 (int_aarch64_neon_sqrdmulh
		(v4i32 V128:$Rn),
		(v4i32 (AArch64duplane32
		(v4i32 V128:$Rm),
		VectorIndexS:$idx)))),
		(i64 0))))),
		(EXTRACT_SUBREG
		(v4i32 (!cast<Instruction>(NAME # v4i32_indexed)
		(v4i32 (INSERT_SUBREG (v4i32 (IMPLICIT_DEF)),
		FPR32Op:$Rd,
		ssub)),
		V128:$Rn,
		V128:$Rm,
		VectorIndexS:$idx)),
		ssub)>;

		def i16_indexed : BaseSIMDIndexedTied<1, U, 1, 0b01, opc,
		FPR16Op, FPR16Op, V128_lo,
		VectorIndexH, asm, ".h", "", "", ".h",
		[]> {
		bits<3> idx;
		let Inst{11} = idx{2};
		let Inst{21} = idx{1};
		let Inst{20} = idx{0};
		}

		def i32_indexed : BaseSIMDIndexedTied<1, U, 1, 0b10, opc,
		FPR32Op, FPR32Op, V128, VectorIndexS,
		asm, ".s", "", "", ".s",
		[(set (i32 FPR32Op:$dst),
		(Accum (i32 FPR32Op:$Rd),
		(i32 (int_aarch64_neon_sqrdmulh
		(i32 FPR32Op:$Rn),
		(i32 (vector_extract (v4i32 V128:$Rm),
		VectorIndexS:$idx))))))]> {
		bits<2> idx;
		let Inst{11} = idx{1};
		let Inst{21} = idx{0};
		}
		}
		} // let Predicates = [HasNeon, HasV8_1a]

		//----------------------------------------------------------------------------
// Crypto extensions
//----------------------------------------------------------------------------

let Predicates = [HasCrypto] in {
let mayLoad = 0, mayStore = 0, hasSideEffects = 0 in
class AESBase<bits<4> opc, string asm, dag outs, dag ins, string cstr,
list<dag> pat>
: I<outs, ins, asm, "{\t$Rd.16b, $Rn.16b|.16b\t$Rd, $Rn}", cstr, pat>,
▲ Show 20 Lines • Show All 102 Lines • Show Last 20 Lines

llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.td


	Show First 20 Lines • Show All 2,772 Lines • ▼ Show 20 Lines
	defm UMIN : SIMDThreeSameVectorBHS<1,0b01101,"umin", int_aarch64_neon_umin>;
	defm UQADD : SIMDThreeSameVector<1,0b00001,"uqadd", int_aarch64_neon_uqadd>;
	defm UQRSHL : SIMDThreeSameVector<1,0b01011,"uqrshl", int_aarch64_neon_uqrshl>;
	defm UQSHL : SIMDThreeSameVector<1,0b01001,"uqshl", int_aarch64_neon_uqshl>;
	defm UQSUB : SIMDThreeSameVector<1,0b00101,"uqsub", int_aarch64_neon_uqsub>;
	defm URHADD : SIMDThreeSameVectorBHS<1,0b00010,"urhadd", int_aarch64_neon_urhadd>;
	defm URSHL : SIMDThreeSameVector<1,0b01010,"urshl", int_aarch64_neon_urshl>;
	defm USHL : SIMDThreeSameVector<1,0b01000,"ushl", int_aarch64_neon_ushl>;
				defm SQRDMLAH : SIMDThreeSameVectorSQRDMLxHTiedHS<1,0b10000,"sqrdmlah",
				int_aarch64_neon_sqadd>;
				defm SQRDMLSH : SIMDThreeSameVectorSQRDMLxHTiedHS<1,0b10001,"sqrdmlsh",
				int_aarch64_neon_sqsub>;

	defm AND : SIMDLogicalThreeVector<0, 0b00, "and", and>;
	defm BIC : SIMDLogicalThreeVector<0, 0b01, "bic",
	BinOpFrag<(and node:$LHS, (vnot node:$RHS))> >;
	defm BIF : SIMDLogicalThreeVector<1, 0b11, "bif">;
	defm BIT : SIMDLogicalThreeVectorTied<1, 0b10, "bit", AArch64bit>;
	defm BSL : SIMDLogicalThreeVectorTied<1, 0b01, "bsl",
	TriOpFrag<(or (and node:$LHS, node:$MHS), (and (vnot node:$LHS), node:$RHS))>>;
	▲ Show 20 Lines • Show All 200 Lines • ▼ Show 20 Lines
	defm SSHL : SIMDThreeScalarD< 0, 0b01000, "sshl", int_aarch64_neon_sshl>;
	defm SUB : SIMDThreeScalarD< 1, 0b10000, "sub", sub>;
	defm UQADD : SIMDThreeScalarBHSD<1, 0b00001, "uqadd", int_aarch64_neon_uqadd>;
	defm UQRSHL : SIMDThreeScalarBHSD<1, 0b01011, "uqrshl",int_aarch64_neon_uqrshl>;
	defm UQSHL : SIMDThreeScalarBHSD<1, 0b01001, "uqshl", int_aarch64_neon_uqshl>;
	defm UQSUB : SIMDThreeScalarBHSD<1, 0b00101, "uqsub", int_aarch64_neon_uqsub>;
	defm URSHL : SIMDThreeScalarD< 1, 0b01010, "urshl", int_aarch64_neon_urshl>;
	defm USHL : SIMDThreeScalarD< 1, 0b01000, "ushl", int_aarch64_neon_ushl>;
				let Predicates = [HasV8_1a] in {
				defm SQRDMLAH : SIMDThreeScalarHSTied<1, 0, 0b10000, "sqrdmlah">;
				defm SQRDMLSH : SIMDThreeScalarHSTied<1, 0, 0b10001, "sqrdmlsh">;
				def : Pat<(i32 (int_aarch64_neon_sqadd
				(i32 FPR32:$Rd),
				(i32 (int_aarch64_neon_sqrdmulh (i32 FPR32:$Rn),
				(i32 FPR32:$Rm))))),
				(SQRDMLAHv1i32 FPR32:$Rd, FPR32:$Rn, FPR32:$Rm)>;
				def : Pat<(i32 (int_aarch64_neon_sqsub
				(i32 FPR32:$Rd),
				(i32 (int_aarch64_neon_sqrdmulh (i32 FPR32:$Rn),
				(i32 FPR32:$Rm))))),
				(SQRDMLSHv1i32 FPR32:$Rd, FPR32:$Rn, FPR32:$Rm)>;
				}

	def : InstAlias<"cmls $dst, $src1, $src2",
	(CMHSv1i64 FPR64:$dst, FPR64:$src2, FPR64:$src1), 0>;
	def : InstAlias<"cmle $dst, $src1, $src2",
	(CMGEv1i64 FPR64:$dst, FPR64:$src2, FPR64:$src1), 0>;
	def : InstAlias<"cmlo $dst, $src1, $src2",
	(CMHIv1i64 FPR64:$dst, FPR64:$src2, FPR64:$src1), 0>;
	def : InstAlias<"cmlt $dst, $src1, $src2",
	▲ Show 20 Lines • Show All 1,314 Lines • ▼ Show 20 Lines
	defm SMLSL : SIMDVectorIndexedLongSDTied<0, 0b0110, "smlsl",
	TriOpFrag<(sub node:$LHS, (int_aarch64_neon_smull node:$MHS, node:$RHS))>>;
	defm SMULL : SIMDVectorIndexedLongSD<0, 0b1010, "smull",
	int_aarch64_neon_smull>;
	defm SQDMLAL : SIMDIndexedLongSQDMLXSDTied<0, 0b0011, "sqdmlal",
	int_aarch64_neon_sqadd>;
	defm SQDMLSL : SIMDIndexedLongSQDMLXSDTied<0, 0b0111, "sqdmlsl",
	int_aarch64_neon_sqsub>;
				defm SQRDMLAH : SIMDIndexedSQRDMLxHSDTied<1, 0b1101, "sqrdmlah",
				int_aarch64_neon_sqadd>;
				defm SQRDMLSH : SIMDIndexedSQRDMLxHSDTied<1, 0b1111, "sqrdmlsh",
				int_aarch64_neon_sqsub>;
	defm SQDMULL : SIMDIndexedLongSD<0, 0b1011, "sqdmull", int_aarch64_neon_sqdmull>;
	defm UMLAL : SIMDVectorIndexedLongSDTied<1, 0b0010, "umlal",
	TriOpFrag<(add node:$LHS, (int_aarch64_neon_umull node:$MHS, node:$RHS))>>;
	defm UMLSL : SIMDVectorIndexedLongSDTied<1, 0b0110, "umlsl",
	TriOpFrag<(sub node:$LHS, (int_aarch64_neon_umull node:$MHS, node:$RHS))>>;
	defm UMULL : SIMDVectorIndexedLongSD<1, 0b1010, "umull",
	int_aarch64_neon_umull>;

	▲ Show 20 Lines • Show All 1,363 Lines • Show Last 20 Lines

llvm/trunk/test/CodeGen/AArch64/arm64-neon-v8.1a.ll

				; RUN: llc < %s -verify-machineinstrs -march=arm64 | FileCheck %s --check-prefix=CHECK-V8a
				; RUN: llc < %s -verify-machineinstrs -march=arm64 -mattr=+v8.1a | FileCheck %s --check-prefix=CHECK-V81a
				; RUN: llc < %s -verify-machineinstrs -march=arm64 -mattr=+v8.1a -aarch64-neon-syntax=apple | FileCheck %s --check-prefix=CHECK-V81a-apple

				declare <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16>, <4 x i16>)
				declare <8 x i16> @llvm.aarch64.neon.sqrdmulh.v8i16(<8 x i16>, <8 x i16>)
				declare <2 x i32> @llvm.aarch64.neon.sqrdmulh.v2i32(<2 x i32>, <2 x i32>)
				declare <4 x i32> @llvm.aarch64.neon.sqrdmulh.v4i32(<4 x i32>, <4 x i32>)
				declare i32 @llvm.aarch64.neon.sqrdmulh.i32(i32, i32)
				declare i16 @llvm.aarch64.neon.sqrdmulh.i16(i16, i16)

				declare <4 x i16> @llvm.aarch64.neon.sqadd.v4i16(<4 x i16>, <4 x i16>)
				declare <8 x i16> @llvm.aarch64.neon.sqadd.v8i16(<8 x i16>, <8 x i16>)
				declare <2 x i32> @llvm.aarch64.neon.sqadd.v2i32(<2 x i32>, <2 x i32>)
				declare <4 x i32> @llvm.aarch64.neon.sqadd.v4i32(<4 x i32>, <4 x i32>)
				declare i32 @llvm.aarch64.neon.sqadd.i32(i32, i32)
				declare i16 @llvm.aarch64.neon.sqadd.i16(i16, i16)

				declare <4 x i16> @llvm.aarch64.neon.sqsub.v4i16(<4 x i16>, <4 x i16>)
				declare <8 x i16> @llvm.aarch64.neon.sqsub.v8i16(<8 x i16>, <8 x i16>)
				declare <2 x i32> @llvm.aarch64.neon.sqsub.v2i32(<2 x i32>, <2 x i32>)
				declare <4 x i32> @llvm.aarch64.neon.sqsub.v4i32(<4 x i32>, <4 x i32>)
				declare i32 @llvm.aarch64.neon.sqsub.i32(i32, i32)
				declare i16 @llvm.aarch64.neon.sqsub.i16(i16, i16)

				;-----------------------------------------------------------------------------
				; RDMA Vector
				; test for SIMDThreeSameVectorSQRDMLxHTiedHS

				define <4 x i16> @test_sqrdmlah_v4i16(<4 x i16> %acc, <4 x i16> %mhs, <4 x i16> %rhs) {
				; CHECK-LABEL: test_sqrdmlah_v4i16:
				%prod = call <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16> %mhs, <4 x i16> %rhs)
				%retval = call <4 x i16> @llvm.aarch64.neon.sqadd.v4i16(<4 x i16> %acc, <4 x i16> %prod)
				; CHECK-V8a: sqrdmulh v1.4h, v1.4h, v2.4h
				; CHECK-V81a: sqrdmlah v0.4h, v1.4h, v2.4h
				; CHECK-V81a-apple: sqrdmlah.4h v0, v1, v2
				ret <4 x i16> %retval
				}

				define <8 x i16> @test_sqrdmlah_v8i16(<8 x i16> %acc, <8 x i16> %mhs, <8 x i16> %rhs) {
				; CHECK-LABEL: test_sqrdmlah_v8i16:
				%prod = call <8 x i16> @llvm.aarch64.neon.sqrdmulh.v8i16(<8 x i16> %mhs, <8 x i16> %rhs)
				%retval = call <8 x i16> @llvm.aarch64.neon.sqadd.v8i16(<8 x i16> %acc, <8 x i16> %prod)
				; CHECK-V8a: sqrdmulh v1.8h, v1.8h, v2.8h
				; CHECK-V81a: sqrdmlah v0.8h, v1.8h, v2.8h
				; CHECK-V81a-apple: sqrdmlah.8h v0, v1, v2
				ret <8 x i16> %retval
				}

				define <2 x i32> @test_sqrdmlah_v2i32(<2 x i32> %acc, <2 x i32> %mhs, <2 x i32> %rhs) {
				; CHECK-LABEL: test_sqrdmlah_v2i32:
				%prod = call <2 x i32> @llvm.aarch64.neon.sqrdmulh.v2i32(<2 x i32> %mhs, <2 x i32> %rhs)
				%retval = call <2 x i32> @llvm.aarch64.neon.sqadd.v2i32(<2 x i32> %acc, <2 x i32> %prod)
				; CHECK-V8a: sqrdmulh v1.2s, v1.2s, v2.2s
				; CHECK-V81a: sqrdmlah v0.2s, v1.2s, v2.2s
				; CHECK-V81a-apple: sqrdmlah.2s v0, v1, v2
				ret <2 x i32> %retval
				}

				define <4 x i32> @test_sqrdmlah_v4i32(<4 x i32> %acc, <4 x i32> %mhs, <4 x i32> %rhs) {
				; CHECK-LABEL: test_sqrdmlah_v4i32:
				%prod = call <4 x i32> @llvm.aarch64.neon.sqrdmulh.v4i32(<4 x i32> %mhs, <4 x i32> %rhs)
				%retval = call <4 x i32> @llvm.aarch64.neon.sqadd.v4i32(<4 x i32> %acc, <4 x i32> %prod)
				; CHECK-V8a: sqrdmulh v1.4s, v1.4s, v2.4s
				; CHECK-V81a: sqrdmlah v0.4s, v1.4s, v2.4s
				; CHECK-V81a-apple: sqrdmlah.4s v0, v1, v2
				ret <4 x i32> %retval
				}

				define <4 x i16> @test_sqrdmlsh_v4i16(<4 x i16> %acc, <4 x i16> %mhs, <4 x i16> %rhs) {
				; CHECK-LABEL: test_sqrdmlsh_v4i16:
				%prod = call <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16> %mhs, <4 x i16> %rhs)
				%retval = call <4 x i16> @llvm.aarch64.neon.sqsub.v4i16(<4 x i16> %acc, <4 x i16> %prod)
				; CHECK-V8a: sqrdmulh v1.4h, v1.4h, v2.4h
				; CHECK-V81a: sqrdmlsh v0.4h, v1.4h, v2.4h
				; CHECK-V81a-apple: sqrdmlsh.4h v0, v1, v2
				ret <4 x i16> %retval
				}

				define <8 x i16> @test_sqrdmlsh_v8i16(<8 x i16> %acc, <8 x i16> %mhs, <8 x i16> %rhs) {
				; CHECK-LABEL: test_sqrdmlsh_v8i16:
				%prod = call <8 x i16> @llvm.aarch64.neon.sqrdmulh.v8i16(<8 x i16> %mhs, <8 x i16> %rhs)
				%retval = call <8 x i16> @llvm.aarch64.neon.sqsub.v8i16(<8 x i16> %acc, <8 x i16> %prod)
				; CHECK-V8a: sqrdmulh v1.8h, v1.8h, v2.8h
				; CHECK-V81a: sqrdmlsh v0.8h, v1.8h, v2.8h
				; CHECK-V81a-apple: sqrdmlsh.8h v0, v1, v2
				ret <8 x i16> %retval
				}

				define <2 x i32> @test_sqrdmlsh_v2i32(<2 x i32> %acc, <2 x i32> %mhs, <2 x i32> %rhs) {
				; CHECK-LABEL: test_sqrdmlsh_v2i32:
				%prod = call <2 x i32> @llvm.aarch64.neon.sqrdmulh.v2i32(<2 x i32> %mhs, <2 x i32> %rhs)
				%retval = call <2 x i32> @llvm.aarch64.neon.sqsub.v2i32(<2 x i32> %acc, <2 x i32> %prod)
				; CHECK-V8a: sqrdmulh v1.2s, v1.2s, v2.2s
				; CHECK-V81a: sqrdmlsh v0.2s, v1.2s, v2.2s
				; CHECK-V81a-apple: sqrdmlsh.2s v0, v1, v2
				ret <2 x i32> %retval
				}

				define <4 x i32> @test_sqrdmlsh_v4i32(<4 x i32> %acc, <4 x i32> %mhs, <4 x i32> %rhs) {
				; CHECK-LABEL: test_sqrdmlsh_v4i32:
				%prod = call <4 x i32> @llvm.aarch64.neon.sqrdmulh.v4i32(<4 x i32> %mhs, <4 x i32> %rhs)
				%retval = call <4 x i32> @llvm.aarch64.neon.sqsub.v4i32(<4 x i32> %acc, <4 x i32> %prod)
				; CHECK-V8a: sqrdmulh v1.4s, v1.4s, v2.4s
				; CHECK-V81a: sqrdmlsh v0.4s, v1.4s, v2.4s
				; CHECK-V81a-apple: sqrdmlsh.4s v0, v1, v2
				ret <4 x i32> %retval
				}

				;-----------------------------------------------------------------------------
				; RDMA Vector, by element
				; tests for vXiYY_indexed in SIMDIndexedSQRDMLxHSDTied

				define <4 x i16> @test_sqrdmlah_lane_s16(<4 x i16> %acc, <4 x i16> %x, <4 x i16> %v) {
				; CHECK-LABEL: test_sqrdmlah_lane_s16:
				entry:
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
				%prod = call <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16> %x, <4 x i16> %shuffle)
				%retval = call <4 x i16> @llvm.aarch64.neon.sqadd.v4i16(<4 x i16> %acc, <4 x i16> %prod)
				; CHECK-V8a: sqrdmulh v1.4h, v1.4h, v2.h[3]
				; CHECK-V81a: sqrdmlah v0.4h, v1.4h, v2.h[3]
				; CHECK-V81a-apple: sqrdmlah.4h v0, v1, v2[3]
				ret <4 x i16> %retval
				}

				define <8 x i16> @test_sqrdmlahq_lane_s16(<8 x i16> %acc, <8 x i16> %x, <8 x i16> %v) {
				; CHECK-LABEL: test_sqrdmlahq_lane_s16:
				entry:
				%shuffle = shufflevector <8 x i16> %v, <8 x i16> undef, <8 x i32> <i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2>
				%prod = call <8 x i16> @llvm.aarch64.neon.sqrdmulh.v8i16(<8 x i16> %x, <8 x i16> %shuffle)
				%retval = call <8 x i16> @llvm.aarch64.neon.sqadd.v8i16(<8 x i16> %acc, <8 x i16> %prod)
				; CHECK-V8a: sqrdmulh v1.8h, v1.8h, v2.h[2]
				; CHECK-V81a: sqrdmlah v0.8h, v1.8h, v2.h[2]
				; CHECK-V81a-apple: sqrdmlah.8h v0, v1, v2[2]
				ret <8 x i16> %retval
				}

				define <2 x i32> @test_sqrdmlah_lane_s32(<2 x i32> %acc, <2 x i32> %x, <2 x i32> %v) {
				; CHECK-LABEL: test_sqrdmlah_lane_s32:
				entry:
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
				%prod = call <2 x i32> @llvm.aarch64.neon.sqrdmulh.v2i32(<2 x i32> %x, <2 x i32> %shuffle)
				%retval = call <2 x i32> @llvm.aarch64.neon.sqadd.v2i32(<2 x i32> %acc, <2 x i32> %prod)
				; CHECK-V8a: sqrdmulh v1.2s, v1.2s, v2.s[1]
				; CHECK-V81a: sqrdmlah v0.2s, v1.2s, v2.s[1]
				; CHECK-V81a-apple: sqrdmlah.2s v0, v1, v2[1]
				ret <2 x i32> %retval
				}

				define <4 x i32> @test_sqrdmlahq_lane_s32(<4 x i32> %acc,<4 x i32> %x, <4 x i32> %v) {
				; CHECK-LABEL: test_sqrdmlahq_lane_s32:
				entry:
				%shuffle = shufflevector <4 x i32> %v, <4 x i32> undef, <4 x i32> zeroinitializer
				%prod = call <4 x i32> @llvm.aarch64.neon.sqrdmulh.v4i32(<4 x i32> %x, <4 x i32> %shuffle)
				%retval = call <4 x i32> @llvm.aarch64.neon.sqadd.v4i32(<4 x i32> %acc, <4 x i32> %prod)
				; CHECK-V8a: sqrdmulh v1.4s, v1.4s, v2.s[0]
				; CHECK-V81a: sqrdmlah v0.4s, v1.4s, v2.s[0]
				; CHECK-V81a-apple: sqrdmlah.4s v0, v1, v2[0]
				ret <4 x i32> %retval
				}

				define <4 x i16> @test_sqrdmlsh_lane_s16(<4 x i16> %acc, <4 x i16> %x, <4 x i16> %v) {
				; CHECK-LABEL: test_sqrdmlsh_lane_s16:
				entry:
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
				%prod = call <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16> %x, <4 x i16> %shuffle)
				%retval = call <4 x i16> @llvm.aarch64.neon.sqsub.v4i16(<4 x i16> %acc, <4 x i16> %prod)
				; CHECK-V8a: sqrdmulh v1.4h, v1.4h, v2.h[3]
				; CHECK-V81a: sqrdmlsh v0.4h, v1.4h, v2.h[3]
				; CHECK-V81a-apple: sqrdmlsh.4h v0, v1, v2[3]
				ret <4 x i16> %retval
				}

				define <8 x i16> @test_sqrdmlshq_lane_s16(<8 x i16> %acc, <8 x i16> %x, <8 x i16> %v) {
				; CHECK-LABEL: test_sqrdmlshq_lane_s16:
				entry:
				%shuffle = shufflevector <8 x i16> %v, <8 x i16> undef, <8 x i32> <i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2>
				%prod = call <8 x i16> @llvm.aarch64.neon.sqrdmulh.v8i16(<8 x i16> %x, <8 x i16> %shuffle)
				%retval = call <8 x i16> @llvm.aarch64.neon.sqsub.v8i16(<8 x i16> %acc, <8 x i16> %prod)
				; CHECK-V8a: sqrdmulh v1.8h, v1.8h, v2.h[2]
				; CHECK-V81a: sqrdmlsh v0.8h, v1.8h, v2.h[2]
				; CHECK-V81a-apple: sqrdmlsh.8h v0, v1, v2[2]
				ret <8 x i16> %retval
				}

				define <2 x i32> @test_sqrdmlsh_lane_s32(<2 x i32> %acc, <2 x i32> %x, <2 x i32> %v) {
				; CHECK-LABEL: test_sqrdmlsh_lane_s32:
				entry:
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
				%prod = call <2 x i32> @llvm.aarch64.neon.sqrdmulh.v2i32(<2 x i32> %x, <2 x i32> %shuffle)
				%retval = call <2 x i32> @llvm.aarch64.neon.sqsub.v2i32(<2 x i32> %acc, <2 x i32> %prod)
				; CHECK-V8a: sqrdmulh v1.2s, v1.2s, v2.s[1]
				; CHECK-V81a: sqrdmlsh v0.2s, v1.2s, v2.s[1]
				; CHECK-V81a-apple: sqrdmlsh.2s v0, v1, v2[1]
				ret <2 x i32> %retval
				}

				define <4 x i32> @test_sqrdmlshq_lane_s32(<4 x i32> %acc,<4 x i32> %x, <4 x i32> %v) {
				; CHECK-LABEL: test_sqrdmlshq_lane_s32:
				entry:
				%shuffle = shufflevector <4 x i32> %v, <4 x i32> undef, <4 x i32> zeroinitializer
				%prod = call <4 x i32> @llvm.aarch64.neon.sqrdmulh.v4i32(<4 x i32> %x, <4 x i32> %shuffle)
				%retval = call <4 x i32> @llvm.aarch64.neon.sqsub.v4i32(<4 x i32> %acc, <4 x i32> %prod)
				; CHECK-V8a: sqrdmulh v1.4s, v1.4s, v2.s[0]
				; CHECK-V81a: sqrdmlsh v0.4s, v1.4s, v2.s[0]
				; CHECK-V81a-apple: sqrdmlsh.4s v0, v1, v2[0]
				ret <4 x i32> %retval
				}

				;-----------------------------------------------------------------------------
				; RDMA Vector, by element, extracted
				; i16 tests are for vXi16_indexed in SIMDIndexedSQRDMLxHSDTied, with IR in ACLE style
				; i32 tests are for "def : Pat" in SIMDIndexedSQRDMLxHSDTied

				define i16 @test_sqrdmlah_extracted_lane_s16(i16 %acc,<4 x i16> %x, <4 x i16> %v) {
				; CHECK-LABEL: test_sqrdmlah_extracted_lane_s16:
				entry:
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 1,i32 1,i32 1,i32 1>
				%prod = call <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16> %x, <4 x i16> %shuffle)
				%acc_vec = insertelement <4 x i16> undef, i16 %acc, i64 0
				%retval_vec = call <4 x i16> @llvm.aarch64.neon.sqadd.v4i16(<4 x i16> %acc_vec, <4 x i16> %prod)
				%retval = extractelement <4 x i16> %retval_vec, i64 0
				; CHECK-V8a: sqrdmulh {{v[0-9]+}}.4h, v0.4h, v1.h[1]
				; CHECK-V81a: sqrdmlah {{v[2-9]+}}.4h, v0.4h, v1.h[1]
				; CHECK-V81a-apple: sqrdmlah.4h {{v[2-9]+}}, v0, v1[1]
				ret i16 %retval
				}

				define i16 @test_sqrdmlahq_extracted_lane_s16(i16 %acc,<8 x i16> %x, <8 x i16> %v) {
				; CHECK-LABEL: test_sqrdmlahq_extracted_lane_s16:
				entry:
				%shuffle = shufflevector <8 x i16> %v, <8 x i16> undef, <8 x i32> <i32 1,i32 1,i32 1,i32 1, i32 1,i32 1,i32 1,i32 1>
				%prod = call <8 x i16> @llvm.aarch64.neon.sqrdmulh.v8i16(<8 x i16> %x, <8 x i16> %shuffle)
				%acc_vec = insertelement <8 x i16> undef, i16 %acc, i64 0
				%retval_vec = call <8 x i16> @llvm.aarch64.neon.sqadd.v8i16(<8 x i16> %acc_vec, <8 x i16> %prod)
				%retval = extractelement <8 x i16> %retval_vec, i64 0
				; CHECK-V8a: sqrdmulh {{v[0-9]+}}.8h, v0.8h, v1.h[1]
				; CHECK-V81a: sqrdmlah {{v[2-9]+}}.8h, v0.8h, v1.h[1]
				; CHECK-V81a-apple: sqrdmlah.8h {{v[2-9]+}}, v0, v1[1]
				ret i16 %retval
				}

				define i32 @test_sqrdmlah_extracted_lane_s32(i32 %acc,<2 x i32> %x, <2 x i32> %v) {
				; CHECK-LABEL: test_sqrdmlah_extracted_lane_s32:
				entry:
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> zeroinitializer
				%prod = call <2 x i32> @llvm.aarch64.neon.sqrdmulh.v2i32(<2 x i32> %x, <2 x i32> %shuffle)
				%extract = extractelement <2 x i32> %prod, i64 0
				%retval = call i32 @llvm.aarch64.neon.sqadd.i32(i32 %acc, i32 %extract)
				; CHECK-V8a: sqrdmulh v0.2s, v0.2s, v1.s[0]
				; CHECK-V81a: sqrdmlah v2.2s, v0.2s, v1.s[0]
				; CHECK-V81a-apple: sqrdmlah.2s v2, v0, v1[0]
				ret i32 %retval
				}

				define i32 @test_sqrdmlahq_extracted_lane_s32(i32 %acc,<4 x i32> %x, <4 x i32> %v) {
				; CHECK-LABEL: test_sqrdmlahq_extracted_lane_s32:
				entry:
				%shuffle = shufflevector <4 x i32> %v, <4 x i32> undef, <4 x i32> zeroinitializer
				%prod = call <4 x i32> @llvm.aarch64.neon.sqrdmulh.v4i32(<4 x i32> %x, <4 x i32> %shuffle)
				%extract = extractelement <4 x i32> %prod, i64 0
				%retval = call i32 @llvm.aarch64.neon.sqadd.i32(i32 %acc, i32 %extract)
				; CHECK-V8a: sqrdmulh v0.4s, v0.4s, v1.s[0]
				; CHECK-V81a: sqrdmlah v2.4s, v0.4s, v1.s[0]
				; CHECK-V81a-apple: sqrdmlah.4s v2, v0, v1[0]
				ret i32 %retval
				}

				define i16 @test_sqrdmlsh_extracted_lane_s16(i16 %acc,<4 x i16> %x, <4 x i16> %v) {
				; CHECK-LABEL: test_sqrdmlsh_extracted_lane_s16:
				entry:
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 1,i32 1,i32 1,i32 1>
				%prod = call <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16> %x, <4 x i16> %shuffle)
				%acc_vec = insertelement <4 x i16> undef, i16 %acc, i64 0
				%retval_vec = call <4 x i16> @llvm.aarch64.neon.sqsub.v4i16(<4 x i16> %acc_vec, <4 x i16> %prod)
				%retval = extractelement <4 x i16> %retval_vec, i64 0
				; CHECK-V8a: sqrdmulh {{v[0-9]+}}.4h, v0.4h, v1.h[1]
				; CHECK-V81a: sqrdmlsh {{v[2-9]+}}.4h, v0.4h, v1.h[1]
				; CHECK-V81a-apple: sqrdmlsh.4h {{v[2-9]+}}, v0, v1[1]
				ret i16 %retval
				}

				define i16 @test_sqrdmlshq_extracted_lane_s16(i16 %acc,<8 x i16> %x, <8 x i16> %v) {
				; CHECK-LABEL: test_sqrdmlshq_extracted_lane_s16:
				entry:
				%shuffle = shufflevector <8 x i16> %v, <8 x i16> undef, <8 x i32> <i32 1,i32 1,i32 1,i32 1, i32 1,i32 1,i32 1,i32 1>
				%prod = call <8 x i16> @llvm.aarch64.neon.sqrdmulh.v8i16(<8 x i16> %x, <8 x i16> %shuffle)
				%acc_vec = insertelement <8 x i16> undef, i16 %acc, i64 0
				%retval_vec = call <8 x i16> @llvm.aarch64.neon.sqsub.v8i16(<8 x i16> %acc_vec, <8 x i16> %prod)
				%retval = extractelement <8 x i16> %retval_vec, i64 0
				; CHECK-V8a: sqrdmulh {{v[0-9]+}}.8h, v0.8h, v1.h[1]
				; CHECK-V81a: sqrdmlsh {{v[2-9]+}}.8h, v0.8h, v1.h[1]
				; CHECK-V81a-apple: sqrdmlsh.8h {{v[2-9]+}}, v0, v1[1]
				ret i16 %retval
				}

				define i32 @test_sqrdmlsh_extracted_lane_s32(i32 %acc,<2 x i32> %x, <2 x i32> %v) {
				; CHECK-LABEL: test_sqrdmlsh_extracted_lane_s32:
				entry:
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> zeroinitializer
				%prod = call <2 x i32> @llvm.aarch64.neon.sqrdmulh.v2i32(<2 x i32> %x, <2 x i32> %shuffle)
				%extract = extractelement <2 x i32> %prod, i64 0
				%retval = call i32 @llvm.aarch64.neon.sqsub.i32(i32 %acc, i32 %extract)
				; CHECK-V8a: sqrdmulh v0.2s, v0.2s, v1.s[0]
				; CHECK-V81a: sqrdmlsh v2.2s, v0.2s, v1.s[0]
				; CHECK-V81a-apple: sqrdmlsh.2s v2, v0, v1[0]
				ret i32 %retval
				}

				define i32 @test_sqrdmlshq_extracted_lane_s32(i32 %acc,<4 x i32> %x, <4 x i32> %v) {
				; CHECK-LABEL: test_sqrdmlshq_extracted_lane_s32:
				entry:
				%shuffle = shufflevector <4 x i32> %v, <4 x i32> undef, <4 x i32> zeroinitializer
				%prod = call <4 x i32> @llvm.aarch64.neon.sqrdmulh.v4i32(<4 x i32> %x, <4 x i32> %shuffle)
				%extract = extractelement <4 x i32> %prod, i64 0
				%retval = call i32 @llvm.aarch64.neon.sqsub.i32(i32 %acc, i32 %extract)
				; CHECK-V8a: sqrdmulh v0.4s, v0.4s, v1.s[0]
				; CHECK-V81a: sqrdmlsh v2.4s, v0.4s, v1.s[0]
				; CHECK-V81a-apple: sqrdmlsh.4s v2, v0, v1[0]
				ret i32 %retval
				}

				;-----------------------------------------------------------------------------
				; RDMA Scalar
				; test for "def : Pat" near SIMDThreeScalarHSTied in AArch64InstrInfo.td

				define i16 @test_sqrdmlah_v1i16(i16 %acc, i16 %x, i16 %y) {
				; CHECK-LABEL: test_sqrdmlah_v1i16:
				%x_vec = insertelement <4 x i16> undef, i16 %x, i64 0
				%y_vec = insertelement <4 x i16> undef, i16 %y, i64 0
				%prod_vec = call <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16> %x_vec, <4 x i16> %y_vec)
				%acc_vec = insertelement <4 x i16> undef, i16 %acc, i64 0
				%retval_vec = call <4 x i16> @llvm.aarch64.neon.sqadd.v4i16(<4 x i16> %acc_vec, <4 x i16> %prod_vec)
				%retval = extractelement <4 x i16> %retval_vec, i64 0
				; CHECK-V8a: sqrdmulh {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, {{v[0-9]+}}.4h
				; CHECK-V81a: sqrdmlah {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, {{v[0-9]+}}.4h
				; CHECK-V81a-apple: sqrdmlah.4h {{v[0-9]+}}, {{v[0-9]+}}, {{v[0-9]+}}
				ret i16 %retval
				}

				define i32 @test_sqrdmlah_v1i32(i32 %acc, i32 %x, i32 %y) {
				; CHECK-LABEL: test_sqrdmlah_v1i32:
				%x_vec = insertelement <4 x i32> undef, i32 %x, i64 0
				%y_vec = insertelement <4 x i32> undef, i32 %y, i64 0
				%prod_vec = call <4 x i32> @llvm.aarch64.neon.sqrdmulh.v4i32(<4 x i32> %x_vec, <4 x i32> %y_vec)
				%acc_vec = insertelement <4 x i32> undef, i32 %acc, i64 0
				%retval_vec = call <4 x i32> @llvm.aarch64.neon.sqadd.v4i32(<4 x i32> %acc_vec, <4 x i32> %prod_vec)
				%retval = extractelement <4 x i32> %retval_vec, i64 0
				; CHECK-V8a: sqrdmulh {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.4s
				; CHECK-V81a: sqrdmlah {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.4s
				; CHECK-V81a-apple: sqrdmlah.4s {{v[0-9]+}}, {{v[0-9]+}}, {{v[0-9]+}}
				ret i32 %retval
				}


				define i16 @test_sqrdmlsh_v1i16(i16 %acc, i16 %x, i16 %y) {
				; CHECK-LABEL: test_sqrdmlsh_v1i16:
				%x_vec = insertelement <4 x i16> undef, i16 %x, i64 0
				%y_vec = insertelement <4 x i16> undef, i16 %y, i64 0
				%prod_vec = call <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16> %x_vec, <4 x i16> %y_vec)
				%acc_vec = insertelement <4 x i16> undef, i16 %acc, i64 0
				%retval_vec = call <4 x i16> @llvm.aarch64.neon.sqsub.v4i16(<4 x i16> %acc_vec, <4 x i16> %prod_vec)
				%retval = extractelement <4 x i16> %retval_vec, i64 0
				; CHECK-V8a: sqrdmulh {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, {{v[0-9]+}}.4h
				; CHECK-V81a: sqrdmlsh {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, {{v[0-9]+}}.4h
				; CHECK-V81a-apple: sqrdmlsh.4h {{v[0-9]+}}, {{v[0-9]+}}, {{v[0-9]+}}
				ret i16 %retval
				}

				define i32 @test_sqrdmlsh_v1i32(i32 %acc, i32 %x, i32 %y) {
				; CHECK-LABEL: test_sqrdmlsh_v1i32:
				%x_vec = insertelement <4 x i32> undef, i32 %x, i64 0
				%y_vec = insertelement <4 x i32> undef, i32 %y, i64 0
				%prod_vec = call <4 x i32> @llvm.aarch64.neon.sqrdmulh.v4i32(<4 x i32> %x_vec, <4 x i32> %y_vec)
				%acc_vec = insertelement <4 x i32> undef, i32 %acc, i64 0
				%retval_vec = call <4 x i32> @llvm.aarch64.neon.sqsub.v4i32(<4 x i32> %acc_vec, <4 x i32> %prod_vec)
				%retval = extractelement <4 x i32> %retval_vec, i64 0
				; CHECK-V8a: sqrdmulh {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.4s
				; CHECK-V81a: sqrdmlsh {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.4s
				; CHECK-V81a-apple: sqrdmlsh.4s {{v[0-9]+}}, {{v[0-9]+}}, {{v[0-9]+}}
				ret i32 %retval
				}

				define i32 @test_sqrdmlah_i32(i32 %acc, i32 %mhs, i32 %rhs) {
				; CHECK-LABEL: test_sqrdmlah_i32:
				%prod = call i32 @llvm.aarch64.neon.sqrdmulh.i32(i32 %mhs, i32 %rhs)
				%retval = call i32 @llvm.aarch64.neon.sqadd.i32(i32 %acc, i32 %prod)
				; CHECK-V8a: sqrdmulh {{s[0-9]+}}, {{s[0-9]+}}, {{s[0-9]+}}
				; CHECK-V81a: sqrdmlah {{s[0-9]+}}, {{s[0-9]+}}, {{s[0-9]+}}
				; CHECK-V81a-apple: sqrdmlah {{s[0-9]+}}, {{s[0-9]+}}, {{s[0-9]+}}
				ret i32 %retval
				}

				define i32 @test_sqrdmlsh_i32(i32 %acc, i32 %mhs, i32 %rhs) {
				; CHECK-LABEL: test_sqrdmlsh_i32:
				%prod = call i32 @llvm.aarch64.neon.sqrdmulh.i32(i32 %mhs, i32 %rhs)
				%retval = call i32 @llvm.aarch64.neon.sqsub.i32(i32 %acc, i32 %prod)
				; CHECK-V8a: sqrdmulh {{s[0-9]+}}, {{s[0-9]+}}, {{s[0-9]+}}
				; CHECK-V81a: sqrdmlsh {{s[0-9]+}}, {{s[0-9]+}}, {{s[0-9]+}}
				; CHECK-V81a-apple: sqrdmlsh {{s[0-9]+}}, {{s[0-9]+}}, {{s[0-9]+}}
				ret i32 %retval
				}

				;-----------------------------------------------------------------------------
				; RDMA Scalar, by element
				; i16 cases are covered by the tests in the section above, with IR in ACLE style
				; i32 tests are for i32_indexed in SIMDIndexedSQRDMLxHSDTied

				define i16 @test_sqrdmlah_extract_i16(i16 %acc, i16 %x, <4 x i16> %y_vec) {
				; CHECK-LABEL: test_sqrdmlah_extract_i16:
				%shuffle = shufflevector <4 x i16> %y_vec, <4 x i16> undef, <4 x i32> <i32 1,i32 1,i32 1,i32 1>
				%x_vec = insertelement <4 x i16> undef, i16 %x, i64 0
				%prod = call <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16> %x_vec, <4 x i16> %shuffle)
				%acc_vec = insertelement <4 x i16> undef, i16 %acc, i64 0
				%retval_vec = call <4 x i16> @llvm.aarch64.neon.sqadd.v4i16(<4 x i16> %acc_vec, <4 x i16> %prod)
				%retval = extractelement <4 x i16> %retval_vec, i32 0
				; CHECK-V8a: sqrdmulh {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, v0.h[1]
				; CHECK-V81a: sqrdmlah {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, v0.h[1]
				; CHECK-V81a-apple: sqrdmlah.4h {{v[0-9]+}}, {{v[0-9]+}}, v0[1]
				ret i16 %retval
				}

				define i32 @test_sqrdmlah_extract_i32(i32 %acc, i32 %mhs, <4 x i32> %rhs) {
				; CHECK-LABEL: test_sqrdmlah_extract_i32:
				%extract = extractelement <4 x i32> %rhs, i32 3
				%prod = call i32 @llvm.aarch64.neon.sqrdmulh.i32(i32 %mhs, i32 %extract)
				%retval = call i32 @llvm.aarch64.neon.sqadd.i32(i32 %acc, i32 %prod)
				; CHECK-V8a: sqrdmulh {{s[0-9]+}}, {{s[0-9]+}}, v0.s[3]
				; CHECK-V81a: sqrdmlah {{s[0-9]+}}, {{s[0-9]+}}, v0.s[3]
				; CHECK-V81a-apple: sqrdmlah.s {{s[0-9]+}}, {{s[0-9]+}}, v0[3]
				ret i32 %retval
				}

				define i16 @test_sqrdmlshq_extract_i16(i16 %acc, i16 %x, <8 x i16> %y_vec) {
				; CHECK-LABEL: test_sqrdmlshq_extract_i16:
				%shuffle = shufflevector <8 x i16> %y_vec, <8 x i16> undef, <8 x i32> <i32 1,i32 1,i32 1,i32 1,i32 1,i32 1,i32 1,i32 1>
				%x_vec = insertelement <8 x i16> undef, i16 %x, i64 0
				%prod = call <8 x i16> @llvm.aarch64.neon.sqrdmulh.v8i16(<8 x i16> %x_vec, <8 x i16> %shuffle)
				%acc_vec = insertelement <8 x i16> undef, i16 %acc, i64 0
				%retval_vec = call <8 x i16> @llvm.aarch64.neon.sqsub.v8i16(<8 x i16> %acc_vec, <8 x i16> %prod)
				%retval = extractelement <8 x i16> %retval_vec, i32 0
				; CHECK-V8a: sqrdmulh {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, v0.h[1]
				; CHECK-V81a: sqrdmlsh {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, v0.h[1]
				; CHECK-V81a-apple: sqrdmlsh.8h {{v[0-9]+}}, {{v[0-9]+}}, v0[1]
				ret i16 %retval
				}

				define i32 @test_sqrdmlsh_extract_i32(i32 %acc, i32 %mhs, <4 x i32> %rhs) {
				; CHECK-LABEL: test_sqrdmlsh_extract_i32:
				%extract = extractelement <4 x i32> %rhs, i32 3
				%prod = call i32 @llvm.aarch64.neon.sqrdmulh.i32(i32 %mhs, i32 %extract)
				%retval = call i32 @llvm.aarch64.neon.sqsub.i32(i32 %acc, i32 %prod)
				; CHECK-V8a: sqrdmulh {{s[0-9]+}}, {{s[0-9]+}}, v0.s[3]
				; CHECK-V81a: sqrdmlsh {{s[0-9]+}}, {{s[0-9]+}}, v0.s[3]
				; CHECK-V81a-apple: sqrdmlsh.s {{s[0-9]+}}, {{s[0-9]+}}, v0[3]
				ret i32 %retval
				}
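
For readers who want to check expected values by hand, the IR sequence these tests feed to instruction selection (an `sqrdmulh` call followed by `sqadd` or `sqsub`) can be modelled in a few lines of Python. This is only a sketch of the two-step IR computation for 16-bit lanes, assuming the usual definition of the saturating rounding doubling multiply. The helper names are made up for illustration, and the sketch makes no claim about how the fused instruction behaves in every corner case.

```python
# Reference model of the two-step IR pattern used in the tests above,
# restricted to 16-bit lanes. All function names are illustrative only.

INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1

def sat16(value):
    """Clamp an integer to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, value))

def sqrdmulh16(a, b):
    """Double the product, add the rounding constant, keep the high half, saturate."""
    return sat16((2 * a * b + (1 << 15)) >> 16)

def sqadd16(a, b):
    """Saturating signed add."""
    return sat16(a + b)

def sqsub16(a, b):
    """Saturating signed subtract."""
    return sat16(a - b)

def mla_pattern(acc, mhs, rhs):
    """Models sqadd(acc, sqrdmulh(mhs, rhs)), the IR shape of the *_sqrdmlah_* tests."""
    return sqadd16(acc, sqrdmulh16(mhs, rhs))

def mls_pattern(acc, mhs, rhs):
    """Models sqsub(acc, sqrdmulh(mhs, rhs)), the IR shape of the *_sqrdmlsh_* tests."""
    return sqsub16(acc, sqrdmulh16(mhs, rhs))
```

Calling `mla_pattern(acc, mhs, rhs)` per lane gives the value the unfused v8.0a sequence would produce for that lane under these assumptions.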

llvm/trunk/test/MC/AArch64/armv8.1a-rdma.s

Property	Old Value	New Value
svn:eol-style	null	native
svn:keywords	null	Rev Date Author URL Id

				// RUN: not llvm-mc -triple aarch64-none-linux-gnu -mattr=+v8.1a -show-encoding < %s 2> %t | FileCheck %s
				// RUN: FileCheck --check-prefix=CHECK-ERROR < %t %s
				.text

				//AdvSIMD RDMA vector
				sqrdmlah v0.4h, v1.4h, v2.4h
				sqrdmlsh v0.4h, v1.4h, v2.4h
				sqrdmlah v0.2s, v1.2s, v2.2s
				sqrdmlsh v0.2s, v1.2s, v2.2s
				sqrdmlah v0.4s, v1.4s, v2.4s
				sqrdmlsh v0.4s, v1.4s, v2.4s
				sqrdmlah v0.8h, v1.8h, v2.8h
				sqrdmlsh v0.8h, v1.8h, v2.8h
				// CHECK: sqrdmlah v0.4h, v1.4h, v2.4h // encoding: [0x20,0x84,0x42,0x2e]
				// CHECK: sqrdmlsh v0.4h, v1.4h, v2.4h // encoding: [0x20,0x8c,0x42,0x2e]
				// CHECK: sqrdmlah v0.2s, v1.2s, v2.2s // encoding: [0x20,0x84,0x82,0x2e]
				// CHECK: sqrdmlsh v0.2s, v1.2s, v2.2s // encoding: [0x20,0x8c,0x82,0x2e]
				// CHECK: sqrdmlah v0.4s, v1.4s, v2.4s // encoding: [0x20,0x84,0x82,0x6e]
				// CHECK: sqrdmlsh v0.4s, v1.4s, v2.4s // encoding: [0x20,0x8c,0x82,0x6e]
				// CHECK: sqrdmlah v0.8h, v1.8h, v2.8h // encoding: [0x20,0x84,0x42,0x6e]
				// CHECK: sqrdmlsh v0.8h, v1.8h, v2.8h // encoding: [0x20,0x8c,0x42,0x6e]

				sqrdmlah v0.2h, v1.2h, v2.2h
				sqrdmlsh v0.2h, v1.2h, v2.2h
				sqrdmlah v0.8s, v1.8s, v2.8s
				sqrdmlsh v0.8s, v1.8s, v2.8s
				sqrdmlah v0.2s, v1.4h, v2.8h
				sqrdmlsh v0.4s, v1.8h, v2.2s
				// CHECK-ERROR: error: invalid vector kind qualifier
				// CHECK-ERROR: sqrdmlah v0.2h, v1.2h, v2.2h
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid vector kind qualifier
				// CHECK-ERROR: sqrdmlah v0.2h, v1.2h, v2.2h
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid vector kind qualifier
				// CHECK-ERROR: sqrdmlah v0.2h, v1.2h, v2.2h
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid operand for instruction
				// CHECK-ERROR: sqrdmlah v0.2h, v1.2h, v2.2h
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid vector kind qualifier
				// CHECK-ERROR: sqrdmlsh v0.2h, v1.2h, v2.2h
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid vector kind qualifier
				// CHECK-ERROR: sqrdmlsh v0.2h, v1.2h, v2.2h
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid vector kind qualifier
				// CHECK-ERROR: sqrdmlsh v0.2h, v1.2h, v2.2h
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid operand for instruction
				// CHECK-ERROR: sqrdmlsh v0.2h, v1.2h, v2.2h
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid vector kind qualifier
				// CHECK-ERROR: sqrdmlah v0.8s, v1.8s, v2.8s
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid vector kind qualifier
				// CHECK-ERROR: sqrdmlah v0.8s, v1.8s, v2.8s
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid vector kind qualifier
				// CHECK-ERROR: sqrdmlah v0.8s, v1.8s, v2.8s
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid operand for instruction
				// CHECK-ERROR: sqrdmlah v0.8s, v1.8s, v2.8s
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid vector kind qualifier
				// CHECK-ERROR: sqrdmlsh v0.8s, v1.8s, v2.8s
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid vector kind qualifier
				// CHECK-ERROR: sqrdmlsh v0.8s, v1.8s, v2.8s
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid vector kind qualifier
				// CHECK-ERROR: sqrdmlsh v0.8s, v1.8s, v2.8s
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid operand for instruction
				// CHECK-ERROR: sqrdmlsh v0.8s, v1.8s, v2.8s
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid operand for instruction
				// CHECK-ERROR: sqrdmlah v0.2s, v1.4h, v2.8h
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid operand for instruction
				// CHECK-ERROR: sqrdmlsh v0.4s, v1.8h, v2.2s
				// CHECK-ERROR: ^

				//AdvSIMD RDMA scalar
				sqrdmlah h0, h1, h2
				sqrdmlsh h0, h1, h2
				sqrdmlah s0, s1, s2
				sqrdmlsh s0, s1, s2
				// CHECK: sqrdmlah h0, h1, h2 // encoding: [0x20,0x84,0x42,0x7e]
				// CHECK: sqrdmlsh h0, h1, h2 // encoding: [0x20,0x8c,0x42,0x7e]
				// CHECK: sqrdmlah s0, s1, s2 // encoding: [0x20,0x84,0x82,0x7e]
				// CHECK: sqrdmlsh s0, s1, s2 // encoding: [0x20,0x8c,0x82,0x7e]

				//AdvSIMD RDMA vector by-element
				sqrdmlah v0.4h, v1.4h, v2.h[3]
				sqrdmlsh v0.4h, v1.4h, v2.h[3]
				sqrdmlah v0.2s, v1.2s, v2.s[1]
				sqrdmlsh v0.2s, v1.2s, v2.s[1]
				sqrdmlah v0.8h, v1.8h, v2.h[3]
				sqrdmlsh v0.8h, v1.8h, v2.h[3]
				sqrdmlah v0.4s, v1.4s, v2.s[3]
				sqrdmlsh v0.4s, v1.4s, v2.s[3]
				// CHECK: sqrdmlah v0.4h, v1.4h, v2.h[3] // encoding: [0x20,0xd0,0x72,0x2f]
				// CHECK: sqrdmlsh v0.4h, v1.4h, v2.h[3] // encoding: [0x20,0xf0,0x72,0x2f]
				// CHECK: sqrdmlah v0.2s, v1.2s, v2.s[1] // encoding: [0x20,0xd0,0xa2,0x2f]
				// CHECK: sqrdmlsh v0.2s, v1.2s, v2.s[1] // encoding: [0x20,0xf0,0xa2,0x2f]
				// CHECK: sqrdmlah v0.8h, v1.8h, v2.h[3] // encoding: [0x20,0xd0,0x72,0x6f]
				// CHECK: sqrdmlsh v0.8h, v1.8h, v2.h[3] // encoding: [0x20,0xf0,0x72,0x6f]
				// CHECK: sqrdmlah v0.4s, v1.4s, v2.s[3] // encoding: [0x20,0xd8,0xa2,0x6f]
				// CHECK: sqrdmlsh v0.4s, v1.4s, v2.s[3] // encoding: [0x20,0xf8,0xa2,0x6f]

				sqrdmlah v0.4s, v1.2s, v2.s[1]
				sqrdmlsh v0.2s, v1.2d, v2.s[1]
				sqrdmlah v0.8h, v1.8h, v2.s[3]
				sqrdmlsh v0.8h, v1.8h, v2.h[8]
				// CHECK-ERROR: error: invalid operand for instruction
				// CHECK-ERROR: sqrdmlah v0.4s, v1.2s, v2.s[1]
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid operand for instruction
				// CHECK-ERROR: sqrdmlsh v0.2s, v1.2d, v2.s[1]
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid operand for instruction
				// CHECK-ERROR: sqrdmlah v0.8h, v1.8h, v2.s[3]
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: vector lane must be an integer in range [0, 7].
				// CHECK-ERROR: sqrdmlsh v0.8h, v1.8h, v2.h[8]
				// CHECK-ERROR: ^

				//AdvSIMD RDMA scalar by-element
				sqrdmlah h0, h1, v2.h[3]
				sqrdmlsh h0, h1, v2.h[3]
				sqrdmlah s0, s1, v2.s[3]
				sqrdmlsh s0, s1, v2.s[3]
				// CHECK: sqrdmlah h0, h1, v2.h[3] // encoding: [0x20,0xd0,0x72,0x7f]
				// CHECK: sqrdmlsh h0, h1, v2.h[3] // encoding: [0x20,0xf0,0x72,0x7f]
				// CHECK: sqrdmlah s0, s1, v2.s[3] // encoding: [0x20,0xd8,0xa2,0x7f]
				// CHECK: sqrdmlsh s0, s1, v2.s[3] // encoding: [0x20,0xf8,0xa2,0x7f]

				sqrdmlah b0, h1, v2.h[3]
				sqrdmlah s0, d1, v2.s[3]
				sqrdmlsh h0, h1, v2.s[3]
				sqrdmlsh s0, s1, v2.s[4]
				// CHECK-ERROR: error: invalid operand for instruction
				// CHECK-ERROR: sqrdmlah b0, h1, v2.h[3]
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid operand for instruction
				// CHECK-ERROR: sqrdmlah s0, d1, v2.s[3]
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid operand for instruction
				// CHECK-ERROR: sqrdmlsh h0, h1, v2.s[3]
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: vector lane must be an integer in range [0, 3].
				// CHECK-ERROR: sqrdmlsh s0, s1, v2.s[4]
				// CHECK-ERROR: ^
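
The `encoding: [...]` bytes in the CHECK lines above are printed least-significant byte first. As a reading aid, the following Python sketch reassembles the 32-bit word and pulls out the register and size fields for the plain vector (three-same) forms. The field positions are the standard AdvSIMD ones as far as I know; the `subtract` key is just a label for bit 11, which is the only bit that differs between the `sqrdmlah` and `sqrdmlsh` three-same encodings listed in this test. The function and key names are hypothetical.

```python
# Split an encoding, as printed by llvm-mc -show-encoding, into fields.
# Applies to the three-same vector forms only; names are illustrative.

def decode_three_same(encoding_bytes):
    # The bytes are listed least-significant first.
    word = int.from_bytes(bytes(encoding_bytes), "little")
    return {
        "Rd": word & 0x1F,            # bits 4:0
        "Rn": (word >> 5) & 0x1F,     # bits 9:5
        "Rm": (word >> 16) & 0x1F,    # bits 20:16
        "size": (word >> 22) & 0x3,   # bits 23:22 (1 = h lanes, 2 = s lanes)
        "Q": (word >> 30) & 0x1,      # bit 30 (1 = 128-bit vector)
        "subtract": (word >> 11) & 0x1,  # bit 11 (0 = sqrdmlah, 1 = sqrdmlsh)
    }
```

For example, `decode_three_same([0x20, 0x84, 0x42, 0x2e])` corresponds to the first CHECK line, `sqrdmlah v0.4h, v1.4h, v2.4h`.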

llvm/trunk/test/MC/Disassembler/AArch64/armv8.1a-rdma.txt

Property	Old Value	New Value
svn:eol-style	null	native
svn:keywords	null	Rev Date Author URL Id

				# RUN: not llvm-mc -triple aarch64-none-linux-gnu -mattr=+v8.1a --disassemble < %s 2>&1 | FileCheck %s

				[0x20,0x84,0x02,0x2e] # sqrdmlah v0.8b, v1.8b, v2.8b
				[0x20,0x8c,0x02,0x2e] # sqrdmlsh v0.8b, v1.8b, v2.8b
				[0x20,0x84,0xc2,0x2e] # sqrdmlah v0.1d, v1.1d, v2.1d
				[0x20,0x8c,0xc2,0x2e] # sqrdmlsh v0.1d, v1.1d, v2.1d
				[0x20,0x84,0x02,0x6e] # sqrdmlah v0.16b, v1.16b, v2.16b
				[0x20,0x8c,0x02,0x6e] # sqrdmlsh v0.16b, v1.16b, v2.16b
				[0x20,0x84,0xc2,0x6e] # sqrdmlah v0.2d, v1.2d, v2.2d
				[0x20,0x8c,0xc2,0x6e] # sqrdmlsh v0.2d, v1.2d, v2.2d
				# CHECK: warning: invalid instruction encoding
				# CHECK: [0x20,0x84,0x02,0x2e]
				# CHECK: warning: invalid instruction encoding
				# CHECK: [0x20,0x8c,0x02,0x2e]
				# CHECK: warning: invalid instruction encoding
				# CHECK: [0x20,0x84,0xc2,0x2e]
				# CHECK: warning: invalid instruction encoding
				# CHECK: [0x20,0x8c,0xc2,0x2e]
				# CHECK: warning: invalid instruction encoding
				# CHECK: [0x20,0x84,0x02,0x6e]
				# CHECK: warning: invalid instruction encoding
				# CHECK: [0x20,0x8c,0x02,0x6e]
				# CHECK: warning: invalid instruction encoding
				# CHECK: [0x20,0x84,0xc2,0x6e]
				# CHECK: warning: invalid instruction encoding
				# CHECK: [0x20,0x8c,0xc2,0x6e]

				[0x20,0x84,0x02,0x7e] # sqrdmlah b0, b1, b2
				[0x20,0x8c,0x02,0x7e] # sqrdmlsh b0, b1, b2
				[0x20,0x84,0xc2,0x7e] # sqrdmlah d0, d1, d2
				[0x20,0x8c,0xc2,0x7e] # sqrdmlsh d0, d1, d2
				# CHECK: warning: invalid instruction encoding
				# CHECK: [0x20,0x84,0x02,0x7e]
				# CHECK: warning: invalid instruction encoding
				# CHECK: [0x20,0x8c,0x02,0x7e]
				# CHECK: warning: invalid instruction encoding
				# CHECK: [0x20,0x84,0xc2,0x7e]
				# CHECK: warning: invalid instruction encoding
				# CHECK: [0x20,0x8c,0xc2,0x7e]

				[0x20,0xd0,0x32,0x2f] # sqrdmlah v0.8b, v1.8b, v2.b[3]
				[0x20,0xf0,0x32,0x2f] # sqrdmlsh v0.8b, v1.8b, v2.b[3]
				[0x20,0xd0,0xe2,0x2f] # sqrdmlah v0.1d, v1.1d, v2.d[1]
				[0x20,0xf0,0xe2,0x2f] # sqrdmlsh v0.1d, v1.1d, v2.d[1]
				[0x20,0xd0,0x32,0x6f] # sqrdmlah v0.16b, v1.16b, v2.b[3]
				[0x20,0xf0,0x32,0x6f] # sqrdmlsh v0.16b, v1.16b, v2.b[3]
				[0x20,0xd8,0xe2,0x6f] # sqrdmlah v0.2d, v1.2d, v2.d[3]
				[0x20,0xf8,0xe2,0x6f] # sqrdmlsh v0.2d, v1.2d, v2.d[3]
				# CHECK: warning: invalid instruction encoding
				# CHECK: [0x20,0xd0,0x32,0x2f]
				# CHECK: warning: invalid instruction encoding
				# CHECK: [0x20,0xf0,0x32,0x2f]
				# CHECK: warning: invalid instruction encoding
				# CHECK: [0x20,0xd0,0xe2,0x2f]
				# CHECK: warning: invalid instruction encoding
				# CHECK: [0x20,0xf0,0xe2,0x2f]
				# CHECK: warning: invalid instruction encoding
				# CHECK: [0x20,0xd0,0x32,0x6f]
				# CHECK: warning: invalid instruction encoding
				# CHECK: [0x20,0xf0,0x32,0x6f]
				# CHECK: warning: invalid instruction encoding
				# CHECK: [0x20,0xd8,0xe2,0x6f]
				# CHECK: warning: invalid instruction encoding
				# CHECK: [0x20,0xf8,0xe2,0x6f]

				[0x20,0xd0,0x32,0x7f] # sqrdmlah b0, b1, v2.b[3]
				[0x20,0xf0,0x32,0x7f] # sqrdmlsh b0, b1, v2.b[3]
				[0x20,0xd8,0xe2,0x7f] # sqrdmlah d0, d1, v2.d[3]
				[0x20,0xf8,0xe2,0x7f] # sqrdmlsh d0, d1, v2.d[3]
				# CHECK: warning: invalid instruction encoding
				# CHECK: [0x20,0xd0,0x32,0x7f]
				# CHECK: warning: invalid instruction encoding
				# CHECK: [0x20,0xf0,0x32,0x7f]
				# CHECK: warning: invalid instruction encoding
				# CHECK: [0x20,0xd8,0xe2,0x7f]
				# CHECK: warning: invalid instruction encoding
				# CHECK: [0x20,0xf8,0xe2,0x7f]

				[0x20,0x84,0x42,0x2e]
				[0x20,0x8c,0x42,0x2e]
				[0x20,0x84,0x82,0x2e]
				[0x20,0x8c,0x82,0x2e]
				[0x20,0x84,0x42,0x6e]
				[0x20,0x8c,0x42,0x6e]
				[0x20,0x84,0x82,0x6e]
				[0x20,0x8c,0x82,0x6e]
				# CHECK: sqrdmlah v0.4h, v1.4h, v2.4h
				# CHECK: sqrdmlsh v0.4h, v1.4h, v2.4h
				# CHECK: sqrdmlah v0.2s, v1.2s, v2.2s
				# CHECK: sqrdmlsh v0.2s, v1.2s, v2.2s
				# CHECK: sqrdmlah v0.8h, v1.8h, v2.8h
				# CHECK: sqrdmlsh v0.8h, v1.8h, v2.8h
				# CHECK: sqrdmlah v0.4s, v1.4s, v2.4s
				# CHECK: sqrdmlsh v0.4s, v1.4s, v2.4s

				[0x20,0x84,0x42,0x7e]
				[0x20,0x8c,0x42,0x7e]
				[0x20,0x84,0x82,0x7e]
				[0x20,0x8c,0x82,0x7e]
				# CHECK: sqrdmlah h0, h1, h2
				# CHECK: sqrdmlsh h0, h1, h2
				# CHECK: sqrdmlah s0, s1, s2
				# CHECK: sqrdmlsh s0, s1, s2

				0x20,0xd0,0x72,0x2f
				0x20,0xf0,0x72,0x2f
				0x20,0xd0,0xa2,0x2f
				0x20,0xf0,0xa2,0x2f
				0x20,0xd0,0x72,0x6f
				0x20,0xf0,0x72,0x6f
				0x20,0xd8,0xa2,0x6f
				0x20,0xf8,0xa2,0x6f
				# CHECK: sqrdmlah v0.4h, v1.4h, v2.h[3]
				# CHECK: sqrdmlsh v0.4h, v1.4h, v2.h[3]
				# CHECK: sqrdmlah v0.2s, v1.2s, v2.s[1]
				# CHECK: sqrdmlsh v0.2s, v1.2s, v2.s[1]
				# CHECK: sqrdmlah v0.8h, v1.8h, v2.h[3]
				# CHECK: sqrdmlsh v0.8h, v1.8h, v2.h[3]
				# CHECK: sqrdmlah v0.4s, v1.4s, v2.s[3]
				# CHECK: sqrdmlsh v0.4s, v1.4s, v2.s[3]

				0x20,0xd0,0x72,0x7f
				0x20,0xf0,0x72,0x7f
				0x20,0xd8,0xa2,0x7f
				0x20,0xf8,0xa2,0x7f
				# CHECK: sqrdmlah h0, h1, v2.h[3]
				# CHECK: sqrdmlsh h0, h1, v2.h[3]
				# CHECK: sqrdmlah s0, s1, v2.s[3]
				# CHECK: sqrdmlsh s0, s1, v2.s[3]
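
For readers checking the three-register cases above by hand, here is a minimal Python sketch of why the `0x02`/`0xc2` bytes are rejected while `0x42`/`0x82` decode. It is not part of the patch. The function name `disassemble` is hypothetical, and the bit positions (size in bits 23:22, Q in bit 30, scalar selector in bit 28, opcode in bits 14:11) are my reading of the Armv8.1-A encoding tables, so treat them as assumptions. It only covers the non-indexed forms and assumes the input is already known to be one of these instructions.

```python
# Illustrative sketch only: decodes the non-indexed SQRDMLAH/SQRDMLSH words
# used in the test above. Field positions are assumptions taken from the
# Armv8.1-A encoding tables, not from the patch under review.

def disassemble(byte_list):
    """Return the assembly text for a three-register RDMA word, or None
    when the size field selects an element type the extension lacks."""
    # Test vectors list bytes in memory order, i.e. little-endian.
    word = int.from_bytes(bytes(byte_list), "little")

    size = (word >> 22) & 0b11
    # Only 16-bit (01) and 32-bit (10) elements exist; 00 (byte) and
    # 11 (doubleword) are the "invalid instruction encoding" cases.
    if size not in (0b01, 0b10):
        return None

    # Opcode 0000 is the accumulate form, 0001 the subtract form.
    mnemonic = "sqrdmlsh" if (word >> 11) & 0b1111 == 1 else "sqrdmlah"
    rd = word & 0x1F
    rn = (word >> 5) & 0x1F
    rm = (word >> 16) & 0x1F

    if (word >> 28) & 1:
        # Scalar form: register prefix follows the element size.
        reg = "hs"[size - 1]
        return f"{mnemonic} {reg}{rd}, {reg}{rn}, {reg}{rm}"

    # Vector form: Q picks the 64-bit or 128-bit arrangement.
    q = (word >> 30) & 1
    arrangement = {(1, 0): "4h", (1, 1): "8h", (2, 0): "2s", (2, 1): "4s"}[(size, q)]
    return f"{mnemonic} v{rd}.{arrangement}, v{rn}.{arrangement}, v{rm}.{arrangement}"
```

Feeding it `[0x20,0x84,0x42,0x2e]` should give the first expected CHECK line of the valid vector group, and any of the byte or doubleword variants should come back as `None`.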

[AArch64] Add v8.1a "Rounding Doubling Multiply Add/Subtract" extension
Closed, Public

Revision Contents

Diff 22947

llvm/trunk/lib/Target/AArch64/AArch64InstrFormats.td

llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.td

llvm/trunk/test/CodeGen/AArch64/arm64-neon-v8.1a.ll

llvm/trunk/test/MC/AArch64/armv8.1a-rdma.s

llvm/trunk/test/MC/Disassembler/AArch64/armv8.1a-rdma.txt
