This is an archive of the discontinued LLVM Phabricator instance.

I think this mostly looks fine (very nice first patch!). I've got a couple of questions though (don't take them as non-negotiable, if you think you have good reasons for those decisions).

Cheers.

Tim.

include/llvm/IR/IntrinsicsAArch64.td
660	I don't think these new intrinsics are needed. The instructions are effectively "(int_aarch64_neon_sqadd $acc, (int_aarch64_neon_sqrdmulh $LHS, $RHS))". The only reason we've fused operations like that in the past is when loop optimisations can interfere and make block-at-a-time selection produce worse code. Basically, this is when sext/zext happen before the key NEON op, and those nodes get hoisted out and become invisible to local selection (e.g. the int_aarch64_neon_smull intrinsics). That shouldn't be an issue here.
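As a rough illustration of the composition described in this comment, the C++ sketch below models one 16-bit lane of sqrdmulh followed by sqadd. The helper names are invented for illustration, and the rounding and saturation follow the usual description of these operations. It models only the two-intrinsic composition and makes no claim about how a fused instruction handles intermediate saturation.

```cpp
#include <cstdint>

// Clamp a wide intermediate value to the signed 16-bit range.
static int16_t saturate16(int64_t v) {
  if (v > INT16_MAX) return INT16_MAX;
  if (v < INT16_MIN) return INT16_MIN;
  return static_cast<int16_t>(v);
}

// One 16-bit lane of sqrdmulh: doubled product, rounded, high half kept.
// Assumes >> on a negative int64_t is an arithmetic shift (guaranteed since
// C++20, and what mainstream compilers do anyway).
int16_t sqrdmulh16(int16_t lhs, int16_t rhs) {
  int64_t doubled = 2 * static_cast<int64_t>(lhs) * static_cast<int64_t>(rhs);
  return saturate16((doubled + (int64_t{1} << 15)) >> 16);
}

// One 16-bit lane of sqadd: saturating signed addition.
int16_t sqadd16(int16_t a, int16_t b) {
  return saturate16(int64_t{a} + int64_t{b});
}

// The composition (sqadd $acc, (sqrdmulh $LHS, $RHS)) for a single lane.
int16_t sqadd_of_sqrdmulh16(int16_t acc, int16_t lhs, int16_t rhs) {
  return sqadd16(acc, sqrdmulh16(lhs, rhs));
}
```

For example, `sqadd_of_sqrdmulh16(100, 16384, 16384)` multiplies two Q15 values of 0.5, giving 8192, and then adds the accumulator.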
lib/Target/AArch64/AArch64.td
29–30	Just how granular is v8.1? Are these instructions optional, or are we expecting CPUs that support them but not v8.1 in general? If not, it'd be much better to keep things as hierarchical as possible and just add a HasV8_1 predicate, or perhaps FeatureFPARMv8_1, but again unless you expect a v8.1 core without a v8.1 FPU that could just be "Requires<HasV8_1, FeatureFPARMv8>".

Hi Tim,
thank you for your warm feedback.

I don't think these new intrinsics are needed. The instructions are effectively "(int_aarch64_neon_sqadd $acc, (int_aarch64_neon_sqrdmulh $LHS, $RHS))".

OK, but now I have run into a serious problem with the v1iNN types, which may well be a known issue. If so, could you please give me a hint?
For scalar operations, I have temporarily added

        lib/Target/AArch64/AArch64InstrFormats.td:
let mayStore = 0, mayLoad = 0, hasSideEffects = 0 in
class BaseSIMDThreeScalar_cstr<bit U, bits<2> size, bits<5> opcode,
                        dag oops, dag iops, string asm, string cstr, 
                        list<dag> pattern>
  : I<oops, iops, asm,
      "\t$Rd, $Rn, $Rm", cstr, pattern>,
    Sched<[WriteV]> {
  bits<5> Rd;
  bits<5> Rn;
  bits<5> Rm;
  let Inst{31-30} = 0b01;
  let Inst{29}    = U;
  let Inst{28-24} = 0b11110;
  let Inst{23-22} = size;
  let Inst{21}    = 1;
  let Inst{20-16} = Rm;
  let Inst{15-11} = opcode;
  let Inst{10}    = 1;
  let Inst{9-5}   = Rn;
  let Inst{4-0}   = Rd;
}
class BaseSIMDThreeScalarExtRDMA<bit U, bits<2> size, bits<5> opcode,
                        dag oops, dag iops, string asm,
                        list<dag> pattern>
  : BaseSIMDThreeScalar_cstr<U, size, opcode, oops, iops, 
			  asm, "$Rd = $dst", pattern> {
  let Inst{21} =0;
}
multiclass SIMDThreeScalarHSExtRDMA<bit U, bits<5> opc, string asm,
                               SDPatternOperator OpNode = null_frag> {
  def i32  : BaseSIMDThreeScalarExtRDMA<U, 0b10, opc, (outs FPR32:$dst), (ins FPR32:$Rd, FPR32:$Rn, FPR32:$Rm), asm, []>;
  def i16  : BaseSIMDThreeScalarExtRDMA<U, 0b01, opc, (outs FPR16:$dst), (ins FPR16:$Rd, FPR16:$Rn, FPR16:$Rm), asm, []>;
//  def v1i16  : BaseSIMDThreeScalarExtRDMA<U, 0b01, opc, (outs FPR16:$dst), (ins FPR16:$Rd, FPR16:$Rn, FPR16:$Rm), asm, []>;
}
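To make the bit layout in the classes above easier to check, here is a small C++ sketch that packs the same fields into a 32-bit word. The function name is hypothetical. The field positions are copied from BaseSIMDThreeScalar_cstr, with Inst{21} cleared as in BaseSIMDThreeScalarExtRDMA. It is only a packing exercise, not a statement about what the assembler emits.

```cpp
#include <cstdint>

// Pack the fields in the order the TableGen class assigns them:
// Inst{31-30}=0b01, Inst{29}=U, Inst{28-24}=0b11110, Inst{23-22}=size,
// Inst{21}=0 (RDMA override), Inst{20-16}=Rm, Inst{15-11}=opcode,
// Inst{10}=1, Inst{9-5}=Rn, Inst{4-0}=Rd.
constexpr uint32_t encodeScalarRDMA(uint32_t U, uint32_t size, uint32_t opcode,
                                    uint32_t Rd, uint32_t Rn, uint32_t Rm) {
  return (0b01u << 30) | ((U & 0x1u) << 29) | (0b11110u << 24) |
         ((size & 0x3u) << 22) | (0u << 21) | ((Rm & 0x1Fu) << 16) |
         ((opcode & 0x1Fu) << 11) | (1u << 10) | ((Rn & 0x1Fu) << 5) |
         (Rd & 0x1Fu);
}
```

With U=1, size=0b01, opcode=0b10000 and registers Rd=0, Rn=1, Rm=2, the fields pack to 0x7E428420.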


        lib/Target/AArch64/AArch64InstrInfo.td:
defm SQRDMLAH : SIMDThreeScalarHSExtRDMA<1, 0b10000, "sqrdmlah">;
def : Pat<(i16 (int_aarch64_neon_sqadd (i16 FPR16:$Rd),
                  (i16 (int_aarch64_neon_sqrdmulh (i16 FPR16:$Rn),
                                                    (i16 FPR16:$Rm))))),
          (SQRDMLAHi16 FPR16:$Rd, FPR16:$Rn, FPR16:$Rm)>;
//def : Pat<(v1i16 (int_aarch64_neon_sqadd (v1i16 FPR16:$Rd),
//                  (v1i16 (int_aarch64_neon_sqrdmulh (v1i16 FPR16:$Rn),
//                                                    (v1i16 FPR16:$Rm))))),
//          (SQRDMLAHv1i16 FPR16:$Rd, FPR16:$Rn, FPR16:$Rm)>;
def : Pat<(i32 (int_aarch64_neon_sqadd (i32 FPR32:$Rd),
                  (i32 (int_aarch64_neon_sqrdmulh (i32 FPR32:$Rn),
                                                    (i32 FPR32:$Rm))))),
          (SQRDMLAHi32 FPR32:$Rd, FPR32:$Rn, FPR32:$Rm)>;

SQRDMLAHi32 works fine, but for the i16 version I get a TableGen error:

anonymous_1513: 	(intrinsic_wo_chain:<empty> 117:<empty>, FPR16:i16:$Rd, (intrinsic_wo_chain:i16 124:<empty>, FPR16:i16:$Rn, FPR16:i16:$Rm))
Included from /work/llvm/lib/Target/AArch64/AArch64.td:58:
/work/llvm/lib/Target/AArch64/AArch64InstrInfo.td:2992:1: error: In anonymous_1513: Type inference contradiction found, merging '{i32:i64:v8i8:v16i8:v4i16:v8i16:v2i32:v4i32:v1i64:v2i64}' into 'i16'
def : Pat<(i16 (int_aarch64_neon_sqadd (i16 FPR16:$Rd),
^
anonymous_1513: 	(SQRDMLAHi16:f16 FPR16:f16:$Rd, FPR16:<empty>:$Rn, FPR16:f16:$Rm)
Included from /work/llvm/lib/Target/AArch64/AArch64.td:58:
/work/llvm/lib/Target/AArch64/AArch64InstrInfo.td:2992:1: error: In anonymous_1513: Type inference contradiction found, merging 'i16' into 'f16'
def : Pat<(i16 (int_aarch64_neon_sqadd (i16 FPR16:$Rd),
^
anonymous_1513: 	(intrinsic_wo_chain:<empty> 117:<empty>, FPR16:i16:$Rd, (intrinsic_wo_chain:i16 124:<empty>, FPR16:i16:$Rn, FPR16:i16:$Rm))

The same happens for v1i16:

anonymous_1513: 	(intrinsic_wo_chain:<empty> 117:<empty>, FPR16:v1i16:$Rd, (intrinsic_wo_chain:v1i16 124:<empty>, FPR16:v1i16:$Rn, FPR16:v1i16:$Rm))
Included from /work/llvm/lib/Target/AArch64/AArch64.td:58:
/work/llvm/lib/Target/AArch64/AArch64InstrInfo.td:2996:1: error: In anonymous_1513: Type inference contradiction found, merging '{i32:i64:v8i8:v16i8:v4i16:v8i16:v2i32:v4i32:v1i64:v2i64}' into 'v1i16'
def : Pat<(v1i16 (int_aarch64_neon_sqadd (v1i16 FPR16:$Rd),
^
anonymous_1513: 	(SQRDMLAHv1i16:f16 FPR16:f16:$Rd, FPR16:<empty>:$Rn, FPR16:f16:$Rm)
Included from /work/llvm/lib/Target/AArch64/AArch64.td:58:
/work/llvm/lib/Target/AArch64/AArch64InstrInfo.td:2996:1: error: In anonymous_1513: Type inference contradiction found, merging 'v1i16' into 'f16'
def : Pat<(v1i16 (int_aarch64_neon_sqadd (v1i16 FPR16:$Rd),
^
anonymous_1513: 	(intrinsic_wo_chain:<empty> 117:<empty>, FPR16:v1i16:$Rd, (intrinsic_wo_chain:v1i16 124:<empty>, FPR16:v1i16:$Rn, FPR16:v1i16:$Rm))
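One way to read the "merging ... into ..." messages above is as a failed intersection of candidate type sets. The C++ sketch below is a deliberately simplified model of that idea, not TableGen's actual inference algorithm. The function names are hypothetical, and the type list is copied from the error text.

```cpp
#include <set>
#include <string>

using TypeSet = std::set<std::string>;

// Keep only the requested types that the other side also allows.
// An empty result corresponds to a "type inference contradiction".
TypeSet mergeTypeSets(const TypeSet &inferred, const TypeSet &requested) {
  TypeSet merged;
  for (const std::string &type : requested)
    if (inferred.count(type) != 0)
      merged.insert(type);
  return merged;
}

// The candidate set printed in the error message for the intrinsic result.
TypeSet intrinsicResultTypes() {
  return {"i32",   "i64",   "v8i8",  "v16i8", "v4i16",
          "v8i16", "v2i32", "v4i32", "v1i64", "v2i64"};
}
```

In this model, requesting i16 or v1i16 against the intrinsic's set leaves nothing, and so does merging i16 into an FPR16 operand that only carries f16.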

This is not the first time I have seen LLVM have problems with i16; that is why, in my first patch, I skipped both the implementation and the tests for SQRDMLAHi16.
Previously I was able to implement at least SQRDMLAHv1i16, but now I cannot do it even for v1i16.
What would you suggest here?

Note: I first tried another implementation, like the following snippet for the vector variant. It failed due to incomplete type inference between the two intrinsics.

lib/Target/AArch64/AArch64InstrFormats.td:
multiclass SIMDThreeSameVectorExtRDMA<bit U, bits<5> opc, string asm,
                               SDPatternOperator OpNode> {
  def v4i16 : BaseSIMDThreeSameVectorExtRDMA<0, U, 0b01, opc, V64,
                                      asm, ".4h",
        [(set (v4i16 V64:$dst),
            (OpNode (v4i16 V64:$Rd), (v4i16 V64:$Rn), (v4i16 V64:$Rm)))]>;
}


lib/Target/AArch64/AArch64InstrInfo.td:
defm SQRDMLAH : SIMDThreeSameVectorExtRDMA<1,0b10000,"sqrdmlah",
         TriOpFrag<(int_aarch64_neon_sqadd node:$LHS,
     (int_aarch64_neon_sqrdmulh node:$MHS, node:$RHS))> >;

SQRDMLAHv4i16:  (set V64:v4i16:$dst, (intrinsic_wo_chain:v4i16 117:iPTR, V64:v4i16:$Rd, (intrinsic_wo_chain:{i32:i64:v8i8:v16i8:v4i16:v8i16:v2i32:v4i32:v1i64:v2i64} 124:iPTR, V64:v4i16:$Rn, V64:v4i16:$Rm)))
Included from /work/llvm/lib/Target/AArch64/AArch64.td:58:
/work/llvm/lib/Target/AArch64/AArch64InstrInfo.td:2730:1: error: In SQRDMLAHv4i16: Could not infer all types in pattern!

However, when I changed "sqadd" to "add", it worked. So I have abandoned this "TriOpFrag" approach.

it'd be much better to keep things as hierarchical as possible and just add a HasV8_1 predicate, or perhaps FeatureFPARMv8_1,

At the time these predicates were introduced downstream, we were not sure whether there might be partial v8.05 implementations. Nowadays we are almost sure there should not be, but not 100%. However, I agree with you; if some partial implementation unexpectedly appears in the future, we can split this HasV8_1 predicate then.
I'll change HasRDMA to HasV8_1 in the next revision.
Fortunately, no FeatureFPARMv8_1 will be required: FP has not been improved.
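As a sketch of what the hierarchical approach could look like on the C++ side, the struct below mirrors the flag-plus-accessor style of AArch64Subtarget. Everything in it is hypothetical: the struct, the HasV8_1 flag, and the canSelectRDMA helper are illustration only, and the assumption that the instructions need both v8.1 and NEON is taken from the FeatureNEON dependency in the original patch.

```cpp
// Hypothetical stand-in for the subtarget feature flags discussed here.
struct SubtargetSketch {
  bool HasFPARMv8 = false;
  bool HasNEON = false;
  bool HasV8_1 = false; // assumed replacement for the finer-grained HasRDMA

  bool hasFPARMv8() const { return HasFPARMv8; }
  bool hasNEON() const { return HasNEON; }
  bool hasV8_1() const { return HasV8_1; }
};

// One architecture-level predicate combined with an existing feature,
// instead of a dedicated per-extension flag.
bool canSelectRDMA(const SubtargetSketch &ST) {
  return ST.hasV8_1() && ST.hasNEON();
}
```

If a partial implementation ever appeared, the single HasV8_1 flag could be split again without touching callers of canSelectRDMA.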

emaste added a subscriber: emaste. Mar 3 2015, 1:13 PM

Hi Tim,
after deep research I've come up with using v1i8, v1i16, v1i32 as valid contents of FPR8, FPR16, FPR32.
Only this way could I fuse the sqadd and sqrdmulh intrinsics, with v1i16 as the operand, intermediate and result type.
In fact, even for the current NEON scalar instructions sqadd and sqrdmulh, v1i16 types should be used.
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0802a/SQRDMULH_advsimd_vec_scalar.html
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0802a/SQADD_advsimd_scalar.html
Instead, the f16 type held by FPR16 is used. That results in tricky/hacky DAG matching, like this:

f16 SQRDMULH(f16, f16);
v1i16 op1;
v1i16 op2;
v1i16 result = cast(v1i16, SQRDMULH(cast(f16, op1), cast(f16, op2)));

For a single instruction it works, but I couldn't find a way to fuse two instructions without using an explicit v1i16 type. It can't be done using an explicit f16 type.

Now locally I have the following

diff --git a/lib/Target/AArch64/AArch64RegisterInfo.td b/lib/Target/AArch64/AArch64RegisterInfo.td
index d5ff3f1..628e9c7 100644
--- a/lib/Target/AArch64/AArch64RegisterInfo.td
+++ b/lib/Target/AArch64/AArch64RegisterInfo.td
@@ -382,13 +382,13 @@ def Q30   : AArch64Reg<30, "q30", [D30], ["v30", ""]>, DwarfRegAlias<B30>;
 def Q31   : AArch64Reg<31, "q31", [D31], ["v31", ""]>, DwarfRegAlias<B31>;
 }
 
-def FPR8  : RegisterClass<"AArch64", [untyped], 8, (sequence "B%u", 0, 31)> {
+def FPR8  : RegisterClass<"AArch64", [untyped, v1i8], 8, (sequence "B%u", 0, 31)> {
   let Size = 8;
 }
-def FPR16 : RegisterClass<"AArch64", [f16], 16, (sequence "H%u", 0, 31)> {
+def FPR16 : RegisterClass<"AArch64", [f16, v1i16], 16, (sequence "H%u", 0, 31)> {
   let Size = 16;
 }
-def FPR32 : RegisterClass<"AArch64", [f32, i32], 32,(sequence "S%u", 0, 31)>;
+def FPR32 : RegisterClass<"AArch64", [f32, i32, v1i32], 32,(sequence "S%u", 0, 31)>;
 def FPR64 : RegisterClass<"AArch64", [f64, i64, v2f32, v1f64, v8i8, v4i16, v2i32,
                                     v1i64, v4f16],
                                     64, (sequence "D%u", 0, 31)>;

plus corresponding changes, polishing and explicit type qualifications added to AArch64InstrFormats.td and AArch64InstrInfo.td. Will submit this refactoring shortly.

I'm aware of the recent doubts about whether to accept these types as fully valid ( http://permalink.gmane.org/gmane.comp.compilers.llvm.cvs/175395 ). I guess the time has come for a correct implementation.

I would appreciate any objections, suggestions, or other approaches to the fusion.
Thanks, Vladimir

after deep research I've come up with using v1i8, v1i16, v1i32 as valid contents of FPR8, FPR16, FPR32.

The original AArch64 backend went down that path, and I think it was a
mistake. It ended up adding lots of complexity for not much gain.

I think the proposed plan for actually using these scalar instructions
is by enhancing the AArch64AdvSIMDScalarPass to optimise these cases
when they occur.

Cheers.

Tim.

Hmm... in that case, I'd like to postpone the implementation of this RDMA extension, since we are having some trouble with it.
Meanwhile I have implemented the other 4 extensions, and they are ready to upstream.
Shall we start work on those, and afterwards I'll return to rewrite RDMA?

I've just found that, for an unknown reason, one piece of feedback in this thread was sent by mail but is not present on this webpage.
Quoting it here:

> Hi Tim,
> thank you for your warm feedback.
>
> 1. **I don't think these new intrinsics are needed. The instructions 
> are effectively "(int_aarch64_neon_sqadd $acc, 
> (int_aarch64_neon_sqrdmulh $LHS, $RHS))".**
>
> Ok, but now I have a severe trouble that could be well-known for v1iNN types. If so, would you please give some hint?
> [...]
>   def : Pat<(i16 (int_aarch64_neon_sqadd (i16 FPR16:$Rd),
>                     (i16 (int_aarch64_neon_sqrdmulh (i16 FPR16:$Rn),
>                                                       (i16 FPR16:$Rm))))),

Most i16 intrinsics don't have patterns yet, because i16 isn't a legal
AArch64 type. Clang currently generates code like this (using vector
ops) to get those semantics:

define signext i16 @foo(i16 signext %l, i16 signext %r) #0 {
  %1 = insertelement <4 x i16> undef, i16 %l, i64 0
  %2 = insertelement <4 x i16> undef, i16 %r, i64 0
  %3 = tail call <4 x i16> @llvm.aarch64.neon.sqadd.v4i16(<4 x i16> %1, <4 x i16> %2) #2
  %4 = extractelement <4 x i16> %3, i64 0
  ret i16 %4
}
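The IR above wraps each scalar in lane 0 of a vector, applies the vector intrinsic, and reads lane 0 back. A minimal C++ model of that idea is sketched below. The type alias and function names are hypothetical, and the unused lanes are zero here purely for determinism, whereas they are undef in the IR.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using V4I16 = std::array<int16_t, 4>;

// Lane-wise saturating signed add, standing in for the v4i16 sqadd intrinsic.
V4I16 sqadd_v4i16(const V4I16 &a, const V4I16 &b) {
  V4I16 out{};
  for (std::size_t i = 0; i < out.size(); ++i) {
    int32_t sum = int32_t{a[i]} + int32_t{b[i]};
    if (sum > INT16_MAX) sum = INT16_MAX;
    if (sum < INT16_MIN) sum = INT16_MIN;
    out[i] = static_cast<int16_t>(sum);
  }
  return out;
}

// Scalar i16 semantics obtained through the vector operation.
int16_t scalar_sqadd_via_vector(int16_t l, int16_t r) {
  V4I16 lv{};  // insertelement <4 x i16> undef, i16 %l, i64 0
  lv[0] = l;
  V4I16 rv{};  // insertelement <4 x i16> undef, i16 %r, i64 0
  rv[0] = r;
  return sqadd_v4i16(lv, rv)[0];  // extractelement ..., i64 0
}
```

Only lane 0 contributes to the result, so the contents of the other lanes do not matter.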

> Note: I first tried another implementation, like the following snippet for the vector variant. It failed due to incomplete type inference between the two intrinsics.

That looks like an annoying shortcoming in TableGen's type inference; you often need to be more explicit than you'd hope when specifying the types of trees involving intrinsics.

Annoyingly, I think you sometimes need to create a new multiclass hierarchy to insert the needed types. We should probably fix that some time.

Cheers.

Tim.

Revision Contents

Path                                                    Size
include/llvm/IR/IntrinsicsAArch64.td                    7 lines
lib/Target/AArch64/AArch64.td                           3 lines
lib/Target/AArch64/AArch64InstrFormats.td               45 lines
lib/Target/AArch64/AArch64InstrInfo.td                  11 lines
lib/Target/AArch64/AArch64Subtarget.h                   2 lines
lib/Target/AArch64/AArch64Subtarget.cpp                 1 line
test/CodeGen/AArch64/arm64-neon-2velem-rdma.ll          91 lines
test/CodeGen/AArch64/arm64-neon-rdma-apple.ll           104 lines
test/CodeGen/AArch64/arm64-neon-rdma.ll                 68 lines
test/MC/AArch64/armv8-extension-rdma.s                  154 lines
test/MC/Disassembler/AArch64/armv8-extension-rdma.txt   53 lines

Diff 20989

include/llvm/IR/IntrinsicsAArch64.td

[...] def int_aarch64_crc32w : Intrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty],
 [IntrNoMem]>;
 def int_aarch64_crc32cw : Intrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty],
 [IntrNoMem]>;
 def int_aarch64_crc32x : Intrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i64_ty],
 [IntrNoMem]>;
 def int_aarch64_crc32cx : Intrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i64_ty],
 [IntrNoMem]>;
 }
+
+//===----------------------------------------------------------------------===//
+// Advanced SIMD from ARMv8 RDMA extension
+// Vector Signed Saturating Rounding Doubling Multiply Accumulate Returning High Half
+def int_aarch64_neon_sqrdmlah : AdvSIMD_2IntArg_Intrinsic;
    t.p.northover (inline comment): I don't think these new intrinsics are needed. The instructions are effectively "(int_aarch64_neon_sqadd $acc, (int_aarch64_neon_sqrdmulh $LHS, $RHS))". The only reason we've fused operations like that in the past is when loop optimisations can interfere and make block-at-a-time selection produce worse code. Basically, this is when sext/zext happen before the key NEON op, and those nodes get hoisted out and become invisible to local selection (e.g. the int_aarch64_neon_smull intrinsics). That shouldn't be an issue here.
+// Vector Signed Saturating Rounding Doubling Multiply Subtract Returning High Half
+def int_aarch64_neon_sqrdmlsh : AdvSIMD_2IntArg_Intrinsic;

lib/Target/AArch64/AArch64.td

[...]
 //

 def FeatureFPARMv8 : SubtargetFeature<"fp-armv8", "HasFPARMv8", "true",
 "Enable ARMv8 FP">;

 def FeatureNEON : SubtargetFeature<"neon", "HasNEON", "true",
 "Enable Advanced SIMD instructions", [FeatureFPARMv8]>;

+def FeatureRDMA: SubtargetFeature<"rdma","HasRDMA","true",
+"Enable Advanced SIMD instruction extensions",[FeatureNEON]>;
    t.p.northover (inline comment): Just how granular is v8.1? Are these instructions optional, or are we expecting CPUs that support them but not v8.1 in general? If not, it'd be much better to keep things as hierarchical as possible and just add a HasV8_1 predicate, or perhaps FeatureFPARMv8_1, but again unless you expect a v8.1 core without a v8.1 FPU that could just be "Requires<HasV8_1, FeatureFPARMv8>".
+
 def FeatureCrypto : SubtargetFeature<"crypto", "HasCrypto", "true",
 "Enable cryptographic instructions">;

 def FeatureCRC : SubtargetFeature<"crc", "HasCRC", "true",
 "Enable ARMv8 CRC-32 checksum instructions">;

 /// Cyclone has register move instructions which are "free".
 def FeatureZCRegMove : SubtargetFeature<"zcm", "HasZeroCycleRegMove", "true",
[...]

lib/Target/AArch64/AArch64InstrFormats.td

[...]
 def : TokenAlias<".4S", ".4s">;
 def : TokenAlias<".2D", ".2d">;
 def : TokenAlias<".1Q", ".1q">;
 def : TokenAlias<".B", ".b">;
 def : TokenAlias<".H", ".h">;
 def : TokenAlias<".S", ".s">;
 def : TokenAlias<".D", ".d">;
 def : TokenAlias<".Q", ".q">;
+
+
+
+//===----------------------------------------------------------------------===//
+// ARMv8 RDMA extension
+let Predicates = [HasRDMA] in {
+
+class BaseSIMDThreeSameVectorExtRDMA<bit Q, bit U, bits<2> size, bits<5> opcode,
+RegisterOperand regtype, string asm, string kind,
+list<dag> pattern>
+: BaseSIMDThreeSameVector<Q, U, size, opcode, regtype, asm, kind, pattern> {
+let Inst{21}=0;
+}
+multiclass SIMDThreeSameVectorExtRDMA<bit U, bits<5> opc, string asm,
+SDPatternOperator OpNode> {
+def v4i16 : BaseSIMDThreeSameVectorExtRDMA<0, U, 0b01, opc, V64,
+asm, ".4h",
+[(set (v4i16 V64:$Rd), (OpNode (v4i16 V64:$Rn), (v4i16 V64:$Rm)))]>;
+def v8i16 : BaseSIMDThreeSameVectorExtRDMA<1, U, 0b01, opc, V128,
+asm, ".8h",
+[(set (v8i16 V128:$Rd), (OpNode (v8i16 V128:$Rn), (v8i16 V128:$Rm)))]>;
+def v2i32 : BaseSIMDThreeSameVectorExtRDMA<0, U, 0b10, opc, V64,
+asm, ".2s",
+[(set (v2i32 V64:$Rd), (OpNode (v2i32 V64:$Rn), (v2i32 V64:$Rm)))]>;
+def v4i32 : BaseSIMDThreeSameVectorExtRDMA<1, U, 0b10, opc, V128,
+asm, ".4s",
+[(set (v4i32 V128:$Rd), (OpNode (v4i32 V128:$Rn), (v4i32 V128:$Rm)))]>;
+}
+class BaseSIMDThreeScalarExtRDMA<bit U, bits<2> size, bits<5> opcode,
+RegisterClass regtype, string asm,
+list<dag> pattern>
+: BaseSIMDThreeScalar<U, size, opcode, regtype, asm, pattern> {
+let Inst{21} =0;
+}
+
+multiclass SIMDThreeScalarHSExtRDMA<bit U, bits<5> opc, string asm,
+SDPatternOperator OpNode> {
+def v1i32 : BaseSIMDThreeScalarExtRDMA<U, 0b10, opc, FPR32, asm, []>;
+def v1i16 : BaseSIMDThreeScalarExtRDMA<U, 0b01, opc, FPR16, asm, []>;
+
+def : Pat<(i32 (OpNode (i32 FPR32:$Rn), (i32 FPR32:$Rm))),
+(!cast<Instruction>(NAME#"v1i32") FPR32:$Rn, FPR32:$Rm)>;
+}
+} // let Predicates = [HasRDMA]
+//===----- END ARMv8 RDMA extension ---------------------------------------===//

lib/Target/AArch64/AArch64InstrInfo.td

[...]
 def HasFPARMv8 : Predicate<"Subtarget->hasFPARMv8()">,
 AssemblerPredicate<"FeatureFPARMv8", "fp-armv8">;
 def HasNEON : Predicate<"Subtarget->hasNEON()">,
 AssemblerPredicate<"FeatureNEON", "neon">;
 def HasCrypto : Predicate<"Subtarget->hasCrypto()">,
 AssemblerPredicate<"FeatureCrypto", "crypto">;
 def HasCRC : Predicate<"Subtarget->hasCRC()">,
 AssemblerPredicate<"FeatureCRC", "crc">;
+def HasRDMA : Predicate<"Subtarget->hasRDMA()">,
+AssemblerPredicate<"FeatureRDMA", "rdma">;

 def IsLE : Predicate<"Subtarget->isLittleEndian()">;
 def IsBE : Predicate<"!Subtarget->isLittleEndian()">;
 def IsCyclone : Predicate<"Subtarget->isCyclone()">;

 //===----------------------------------------------------------------------===//
 // AArch64-specific DAG Nodes.
 //

[...]
 defm SHSUB : SIMDThreeSameVectorBHS<0,0b00100,"shsub", int_aarch64_neon_shsub>;
 defm SMAXP : SIMDThreeSameVectorBHS<0,0b10100,"smaxp", int_aarch64_neon_smaxp>;
 defm SMAX : SIMDThreeSameVectorBHS<0,0b01100,"smax", int_aarch64_neon_smax>;
 defm SMINP : SIMDThreeSameVectorBHS<0,0b10101,"sminp", int_aarch64_neon_sminp>;
 defm SMIN : SIMDThreeSameVectorBHS<0,0b01101,"smin", int_aarch64_neon_smin>;
 defm SQADD : SIMDThreeSameVector<0,0b00001,"sqadd", int_aarch64_neon_sqadd>;
 defm SQDMULH : SIMDThreeSameVectorHS<0,0b10110,"sqdmulh",int_aarch64_neon_sqdmulh>;
 defm SQRDMULH : SIMDThreeSameVectorHS<1,0b10110,"sqrdmulh",int_aarch64_neon_sqrdmulh>;
+defm SQRDMLAH : SIMDThreeSameVectorExtRDMA<1,0b10000,"sqrdmlah",int_aarch64_neon_sqrdmlah>;
+defm SQRDMLSH : SIMDThreeSameVectorExtRDMA<1,0b10001,"sqrdmlsh",int_aarch64_neon_sqrdmlsh>;
 defm SQRSHL : SIMDThreeSameVector<0,0b01011,"sqrshl", int_aarch64_neon_sqrshl>;
 defm SQSHL : SIMDThreeSameVector<0,0b01001,"sqshl", int_aarch64_neon_sqshl>;
 defm SQSUB : SIMDThreeSameVector<0,0b00101,"sqsub", int_aarch64_neon_sqsub>;
 defm SRHADD : SIMDThreeSameVectorBHS<0,0b00010,"srhadd",int_aarch64_neon_srhadd>;
 defm SRSHL : SIMDThreeSameVector<0,0b01010,"srshl", int_aarch64_neon_srshl>;
 defm SSHL : SIMDThreeSameVector<0,0b01000,"sshl", int_aarch64_neon_sshl>;
 defm SUB : SIMDThreeSameVector<1,0b10000,"sub", sub>;
 defm UABA : SIMDThreeSameVectorBHSTied<1, 0b01111, "uaba",
[...]
 defm FCMGE : SIMDThreeScalarFPCmp<1, 0, 0b11100, "fcmge", AArch64fcmge>;
 defm FCMGT : SIMDThreeScalarFPCmp<1, 1, 0b11100, "fcmgt", AArch64fcmgt>;
 defm FMULX : SIMDThreeScalarSD<0, 0, 0b11011, "fmulx", int_aarch64_neon_fmulx>;
 defm FRECPS : SIMDThreeScalarSD<0, 0, 0b11111, "frecps", int_aarch64_neon_frecps>;
 defm FRSQRTS : SIMDThreeScalarSD<0, 1, 0b11111, "frsqrts", int_aarch64_neon_frsqrts>;
 defm SQADD : SIMDThreeScalarBHSD<0, 0b00001, "sqadd", int_aarch64_neon_sqadd>;
 defm SQDMULH : SIMDThreeScalarHS< 0, 0b10110, "sqdmulh", int_aarch64_neon_sqdmulh>;
 defm SQRDMULH : SIMDThreeScalarHS< 1, 0b10110, "sqrdmulh", int_aarch64_neon_sqrdmulh>;
+defm SQRDMLAH : SIMDThreeScalarHSExtRDMA<1, 0b10000, "sqrdmlah",int_aarch64_neon_sqrdmlah>;
+defm SQRDMLSH : SIMDThreeScalarHSExtRDMA<1, 0b10001, "sqrdmlsh",int_aarch64_neon_sqrdmlsh>;
 defm SQRSHL : SIMDThreeScalarBHSD<0, 0b01011, "sqrshl",int_aarch64_neon_sqrshl>;
 defm SQSHL : SIMDThreeScalarBHSD<0, 0b01001, "sqshl", int_aarch64_neon_sqshl>;
 defm SQSUB : SIMDThreeScalarBHSD<0, 0b00101, "sqsub", int_aarch64_neon_sqsub>;
 defm SRSHL : SIMDThreeScalarD< 0, 0b01010, "srshl", int_aarch64_neon_srshl>;
 defm SSHL : SIMDThreeScalarD< 0, 0b01000, "sshl", int_aarch64_neon_sshl>;
 defm SUB : SIMDThreeScalarD< 1, 0b10000, "sub", sub>;
 defm UQADD : SIMDThreeScalarBHSD<1, 0b00001, "uqadd", int_aarch64_neon_uqadd>;
 defm UQRSHL : SIMDThreeScalarBHSD<1, 0b01011, "uqrshl",int_aarch64_neon_uqrshl>;
[...] def : Pat<(v4f32 (fmul V128:$Rn, (AArch64dup (f32 FPR32:$Rm)))),
 (i64 0))>;
 def : Pat<(v2f64 (fmul V128:$Rn, (AArch64dup (f64 FPR64:$Rm)))),
 (FMULv2i64_indexed V128:$Rn,
 (INSERT_SUBREG (v4i32 (IMPLICIT_DEF)), FPR64:$Rm, dsub),
 (i64 0))>;

 defm SQDMULH : SIMDIndexedHS<0, 0b1100, "sqdmulh", int_aarch64_neon_sqdmulh>;
 defm SQRDMULH : SIMDIndexedHS<0, 0b1101, "sqrdmulh", int_aarch64_neon_sqrdmulh>;
+let Predicates = [HasRDMA] in {
+defm SQRDMLAH : SIMDIndexedHS<1, 0b1101, "sqrdmlah", int_aarch64_neon_sqrdmlah>;
+defm SQRDMLSH : SIMDIndexedHS<1, 0b1111, "sqrdmlsh", int_aarch64_neon_sqrdmlsh>;
+}
 defm MLA : SIMDVectorIndexedHSTied<1, 0b0000, "mla",
 TriOpFrag<(add node:$LHS, (mul node:$MHS, node:$RHS))>>;
 defm MLS : SIMDVectorIndexedHSTied<1, 0b0100, "mls",
 TriOpFrag<(sub node:$LHS, (mul node:$MHS, node:$RHS))>>;
 defm MUL : SIMDVectorIndexedHS<0, 0b1000, "mul", mul>;
 defm SMLAL : SIMDVectorIndexedLongSDTied<0, 0b0010, "smlal",
 TriOpFrag<(add node:$LHS, (int_aarch64_neon_smull node:$MHS, node:$RHS))>>;
 defm SMLSL : SIMDVectorIndexedLongSDTied<0, 0b0110, "smlsl",
[...]

lib/Target/AArch64/AArch64Subtarget.h

[...] protected:

 /// ARMProcFamily - ARM processor family: Cortex-A53, Cortex-A57, and others.
 ARMProcFamilyEnum ARMProcFamily;

 bool HasFPARMv8;
 bool HasNEON;
 bool HasCrypto;
 bool HasCRC;
+bool HasRDMA;

 // HasZeroCycleRegMove - Has zero-cycle register mov instructions.
 bool HasZeroCycleRegMove;

 // HasZeroCycleZeroing - Has zero-cycle zeroing instructions.
 bool HasZeroCycleZeroing;

 bool IsLittle;
[...] public:
 bool hasZeroCycleRegMove() const { return HasZeroCycleRegMove; }

 bool hasZeroCycleZeroing() const { return HasZeroCycleZeroing; }

 bool hasFPARMv8() const { return HasFPARMv8; }
 bool hasNEON() const { return HasNEON; }
 bool hasCrypto() const { return HasCrypto; }
 bool hasCRC() const { return HasCRC; }
+bool hasRDMA() const { return HasRDMA; }

 bool isLittleEndian() const { return IsLittle; }

 bool isTargetDarwin() const { return TargetTriple.isOSDarwin(); }
 bool isTargetIOS() const { return TargetTriple.isiOS(); }
 bool isTargetLinux() const { return TargetTriple.isOSLinux(); }
 bool isTargetWindows() const { return TargetTriple.isOSWindows(); }

[...]

lib/Target/AArch64/AArch64Subtarget.cpp

	Show First 20 Lines • Show All 42 Lines • ▼ Show 20 Lines
  }

  AArch64Subtarget::AArch64Subtarget(const std::string &TT,
                                     const std::string &CPU,
                                     const std::string &FS,
                                     const TargetMachine &TM, bool LittleEndian)
      : AArch64GenSubtargetInfo(TT, CPU, FS), ARMProcFamily(Others),
        HasFPARMv8(false), HasNEON(false), HasCrypto(false), HasCRC(false),
+       HasRDMA(false),
        HasZeroCycleRegMove(false), HasZeroCycleZeroing(false),
        IsLittle(LittleEndian), CPUString(CPU), TargetTriple(TT), FrameLowering(),
        InstrInfo(initializeSubtargetDependencies(FS)),
        TSInfo(TM.getDataLayout()), TLInfo(TM, *this) {}

  /// ClassifyGlobalReference - Find the target operand flags that describe
  /// how a global value should be referenced for the current subtarget.
  unsigned char

test/CodeGen/AArch64/arm64-neon-2velem-rdma.ll

This file was added.

				; RUN: llc < %s -verify-machineinstrs -mtriple=arm64-none-linux-gnu -mattr=+rdma -fp-contract=fast | FileCheck %s

				declare <4 x i32> @llvm.aarch64.neon.sqrdmlah.v4i32(<4 x i32>, <4 x i32>)
				declare <2 x i32> @llvm.aarch64.neon.sqrdmlah.v2i32(<2 x i32>, <2 x i32>)
				declare <8 x i16> @llvm.aarch64.neon.sqrdmlah.v8i16(<8 x i16>, <8 x i16>)
				declare <4 x i16> @llvm.aarch64.neon.sqrdmlah.v4i16(<4 x i16>, <4 x i16>)

				define <4 x i16> @test_vqrdmlah_lane_s16(<4 x i16> %a, <4 x i16> %v) {
				; CHECK-LABEL: test_vqrdmlah_lane_s16:
				; CHECK: qrdmlah {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[3]
				; CHECK-NEXT: ret
				entry:
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
				%vqrdmlah2.i = tail call <4 x i16> @llvm.aarch64.neon.sqrdmlah.v4i16(<4 x i16> %a, <4 x i16> %shuffle)
				ret <4 x i16> %vqrdmlah2.i
				}

				define <8 x i16> @test_vqrdmlahq_lane_s16(<8 x i16> %a, <4 x i16> %v) {
				; CHECK-LABEL: test_vqrdmlahq_lane_s16:
				; CHECK: qrdmlah {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[3]
				; CHECK-NEXT: ret
				entry:
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <8 x i32> <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
				%vqrdmlah2.i = tail call <8 x i16> @llvm.aarch64.neon.sqrdmlah.v8i16(<8 x i16> %a, <8 x i16> %shuffle)
				ret <8 x i16> %vqrdmlah2.i
				}

				define <2 x i32> @test_vqrdmlah_lane_s32(<2 x i32> %a, <2 x i32> %v) {
				; CHECK-LABEL: test_vqrdmlah_lane_s32:
				; CHECK: qrdmlah {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[1]
				; CHECK-NEXT: ret
				entry:
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
				%vqrdmlah2.i = tail call <2 x i32> @llvm.aarch64.neon.sqrdmlah.v2i32(<2 x i32> %a, <2 x i32> %shuffle)
				ret <2 x i32> %vqrdmlah2.i
				}

				define <4 x i32> @test_vqrdmlahq_lane_s32(<4 x i32> %a, <2 x i32> %v) {
				; CHECK-LABEL: test_vqrdmlahq_lane_s32:
				; CHECK: qrdmlah {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]
				; CHECK-NEXT: ret
				entry:
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vqrdmlah2.i = tail call <4 x i32> @llvm.aarch64.neon.sqrdmlah.v4i32(<4 x i32> %a, <4 x i32> %shuffle)
				ret <4 x i32> %vqrdmlah2.i
				}

				declare <4 x i32> @llvm.aarch64.neon.sqrdmlsh.v4i32(<4 x i32>, <4 x i32>)
				declare <2 x i32> @llvm.aarch64.neon.sqrdmlsh.v2i32(<2 x i32>, <2 x i32>)
				declare <8 x i16> @llvm.aarch64.neon.sqrdmlsh.v8i16(<8 x i16>, <8 x i16>)
				declare <4 x i16> @llvm.aarch64.neon.sqrdmlsh.v4i16(<4 x i16>, <4 x i16>)

				define <4 x i16> @test_vqrdmlsh_lane_s16(<4 x i16> %a, <4 x i16> %v) {
				; CHECK-LABEL: test_vqrdmlsh_lane_s16:
				; CHECK: qrdmlsh {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[3]
				; CHECK-NEXT: ret
				entry:
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
				%vqrdmlsh2.i = tail call <4 x i16> @llvm.aarch64.neon.sqrdmlsh.v4i16(<4 x i16> %a, <4 x i16> %shuffle)
				ret <4 x i16> %vqrdmlsh2.i
				}

				define <8 x i16> @test_vqrdmlshq_lane_s16(<8 x i16> %a, <4 x i16> %v) {
				; CHECK-LABEL: test_vqrdmlshq_lane_s16:
				; CHECK: qrdmlsh {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, {{v[0-9]+}}.h[3]
				; CHECK-NEXT: ret
				entry:
				%shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <8 x i32> <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
				%vqrdmlsh2.i = tail call <8 x i16> @llvm.aarch64.neon.sqrdmlsh.v8i16(<8 x i16> %a, <8 x i16> %shuffle)
				ret <8 x i16> %vqrdmlsh2.i
				}

				define <2 x i32> @test_vqrdmlsh_lane_s32(<2 x i32> %a, <2 x i32> %v) {
				; CHECK-LABEL: test_vqrdmlsh_lane_s32:
				; CHECK: qrdmlsh {{v[0-9]+}}.2s, {{v[0-9]+}}.2s, {{v[0-9]+}}.s[1]
				; CHECK-NEXT: ret
				entry:
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
				%vqrdmlsh2.i = tail call <2 x i32> @llvm.aarch64.neon.sqrdmlsh.v2i32(<2 x i32> %a, <2 x i32> %shuffle)
				ret <2 x i32> %vqrdmlsh2.i
				}

				define <4 x i32> @test_vqrdmlshq_lane_s32(<4 x i32> %a, <2 x i32> %v) {
				; CHECK-LABEL: test_vqrdmlshq_lane_s32:
				; CHECK: qrdmlsh {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]
				; CHECK-NEXT: ret
				entry:
				%shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
				%vqrdmlsh2.i = tail call <4 x i32> @llvm.aarch64.neon.sqrdmlsh.v4i32(<4 x i32> %a, <4 x i32> %shuffle)
				ret <4 x i32> %vqrdmlsh2.i
				}

test/CodeGen/AArch64/arm64-neon-rdma-apple.ll

This file was added.

				; RUN: llc -asm-verbose=false < %s -march=arm64 -mattr=+rdma -aarch64-neon-syntax=apple | FileCheck %s


				define <4 x i16> @sqrdmlah_4h(<4 x i16>* %A, <4 x i16>* %B) nounwind {
				;CHECK-LABEL: sqrdmlah_4h:
				;CHECK: sqrdmlah.4h
				%tmp1 = load <4 x i16>* %A
				%tmp2 = load <4 x i16>* %B
				%tmp3 = call <4 x i16> @llvm.aarch64.neon.sqrdmlah.v4i16(<4 x i16> %tmp1, <4 x i16> %tmp2)
				ret <4 x i16> %tmp3
				}

				define <8 x i16> @sqrdmlah_8h(<8 x i16>* %A, <8 x i16>* %B) nounwind {
				;CHECK-LABEL: sqrdmlah_8h:
				;CHECK: sqrdmlah.8h
				%tmp1 = load <8 x i16>* %A
				%tmp2 = load <8 x i16>* %B
				%tmp3 = call <8 x i16> @llvm.aarch64.neon.sqrdmlah.v8i16(<8 x i16> %tmp1, <8 x i16> %tmp2)
				ret <8 x i16> %tmp3
				}

				define <2 x i32> @sqrdmlah_2s(<2 x i32>* %A, <2 x i32>* %B) nounwind {
				;CHECK-LABEL: sqrdmlah_2s:
				;CHECK: sqrdmlah.2s
				%tmp1 = load <2 x i32>* %A
				%tmp2 = load <2 x i32>* %B
				%tmp3 = call <2 x i32> @llvm.aarch64.neon.sqrdmlah.v2i32(<2 x i32> %tmp1, <2 x i32> %tmp2)
				ret <2 x i32> %tmp3
				}

				define <4 x i32> @sqrdmlah_4s(<4 x i32>* %A, <4 x i32>* %B) nounwind {
				;CHECK-LABEL: sqrdmlah_4s:
				;CHECK: sqrdmlah.4s
				%tmp1 = load <4 x i32>* %A
				%tmp2 = load <4 x i32>* %B
				%tmp3 = call <4 x i32> @llvm.aarch64.neon.sqrdmlah.v4i32(<4 x i32> %tmp1, <4 x i32> %tmp2)
				ret <4 x i32> %tmp3
				}

				define i32 @sqrdmlah_1s(i32* %A, i32* %B) nounwind {
				;CHECK-LABEL: sqrdmlah_1s:
				;CHECK: sqrdmlah s0, {{s[0-9]+}}, {{s[0-9]+}}
				%tmp1 = load i32* %A
				%tmp2 = load i32* %B
				%tmp3 = call i32 @llvm.aarch64.neon.sqrdmlah.i32(i32 %tmp1, i32 %tmp2)
				ret i32 %tmp3
				}

				declare <4 x i16> @llvm.aarch64.neon.sqrdmlah.v4i16(<4 x i16>, <4 x i16>) nounwind readnone
				declare <8 x i16> @llvm.aarch64.neon.sqrdmlah.v8i16(<8 x i16>, <8 x i16>) nounwind readnone
				declare <2 x i32> @llvm.aarch64.neon.sqrdmlah.v2i32(<2 x i32>, <2 x i32>) nounwind readnone
				declare <4 x i32> @llvm.aarch64.neon.sqrdmlah.v4i32(<4 x i32>, <4 x i32>) nounwind readnone
				declare i32 @llvm.aarch64.neon.sqrdmlah.i32(i32, i32)

				define <4 x i16> @sqrdmlsh_4h(<4 x i16>* %A, <4 x i16>* %B) nounwind {
				;CHECK-LABEL: sqrdmlsh_4h:
				;CHECK: sqrdmlsh.4h
				%tmp1 = load <4 x i16>* %A
				%tmp2 = load <4 x i16>* %B
				%tmp3 = call <4 x i16> @llvm.aarch64.neon.sqrdmlsh.v4i16(<4 x i16> %tmp1, <4 x i16> %tmp2)
				ret <4 x i16> %tmp3
				}

				define <8 x i16> @sqrdmlsh_8h(<8 x i16>* %A, <8 x i16>* %B) nounwind {
				;CHECK-LABEL: sqrdmlsh_8h:
				;CHECK: sqrdmlsh.8h
				%tmp1 = load <8 x i16>* %A
				%tmp2 = load <8 x i16>* %B
				%tmp3 = call <8 x i16> @llvm.aarch64.neon.sqrdmlsh.v8i16(<8 x i16> %tmp1, <8 x i16> %tmp2)
				ret <8 x i16> %tmp3
				}

				define <2 x i32> @sqrdmlsh_2s(<2 x i32>* %A, <2 x i32>* %B) nounwind {
				;CHECK-LABEL: sqrdmlsh_2s:
				;CHECK: sqrdmlsh.2s
				%tmp1 = load <2 x i32>* %A
				%tmp2 = load <2 x i32>* %B
				%tmp3 = call <2 x i32> @llvm.aarch64.neon.sqrdmlsh.v2i32(<2 x i32> %tmp1, <2 x i32> %tmp2)
				ret <2 x i32> %tmp3
				}

				define <4 x i32> @sqrdmlsh_4s(<4 x i32>* %A, <4 x i32>* %B) nounwind {
				;CHECK-LABEL: sqrdmlsh_4s:
				;CHECK: sqrdmlsh.4s
				%tmp1 = load <4 x i32>* %A
				%tmp2 = load <4 x i32>* %B
				%tmp3 = call <4 x i32> @llvm.aarch64.neon.sqrdmlsh.v4i32(<4 x i32> %tmp1, <4 x i32> %tmp2)
				ret <4 x i32> %tmp3
				}

				define i32 @sqrdmlsh_1s(i32* %A, i32* %B) nounwind {
				;CHECK-LABEL: sqrdmlsh_1s:
				;CHECK: sqrdmlsh s0, {{s[0-9]+}}, {{s[0-9]+}}
				%tmp1 = load i32* %A
				%tmp2 = load i32* %B
				%tmp3 = call i32 @llvm.aarch64.neon.sqrdmlsh.i32(i32 %tmp1, i32 %tmp2)
				ret i32 %tmp3
				}

				declare <4 x i16> @llvm.aarch64.neon.sqrdmlsh.v4i16(<4 x i16>, <4 x i16>) nounwind readnone
				declare <8 x i16> @llvm.aarch64.neon.sqrdmlsh.v8i16(<8 x i16>, <8 x i16>) nounwind readnone
				declare <2 x i32> @llvm.aarch64.neon.sqrdmlsh.v2i32(<2 x i32>, <2 x i32>) nounwind readnone
				declare <4 x i32> @llvm.aarch64.neon.sqrdmlsh.v4i32(<4 x i32>, <4 x i32>) nounwind readnone
				declare i32 @llvm.aarch64.neon.sqrdmlsh.i32(i32, i32) nounwind readnone

test/CodeGen/AArch64/arm64-neon-rdma.ll

This file was added.

				; RUN: llc < %s -verify-machineinstrs -mtriple=arm64-none-linux-gnu -mattr=+rdma | FileCheck %s
				; arm64 has its own copy of this because of the intrinsics

				declare <4 x i16> @llvm.aarch64.neon.sqrdmlah.v4i16(<4 x i16>, <4 x i16>)
				declare <8 x i16> @llvm.aarch64.neon.sqrdmlah.v8i16(<8 x i16>, <8 x i16>)
				declare <2 x i32> @llvm.aarch64.neon.sqrdmlah.v2i32(<2 x i32>, <2 x i32>)
				declare <4 x i32> @llvm.aarch64.neon.sqrdmlah.v4i32(<4 x i32>, <4 x i32>)

				define <4 x i16> @test_sqrdmlah_v4i16(<4 x i16> %lhs, <4 x i16> %rhs) {
				; CHECK-LABEL: test_sqrdmlah_v4i16:
				%prod = call <4 x i16> @llvm.aarch64.neon.sqrdmlah.v4i16(<4 x i16> %lhs, <4 x i16> %rhs)
				; CHECK: sqrdmlah v0.4h, v0.4h, v1.4h
				ret <4 x i16> %prod
				}

				define <8 x i16> @test_sqrdmlah_v8i16(<8 x i16> %lhs, <8 x i16> %rhs) {
				; CHECK-LABEL: test_sqrdmlah_v8i16:
				%prod = call <8 x i16> @llvm.aarch64.neon.sqrdmlah.v8i16(<8 x i16> %lhs, <8 x i16> %rhs)
				; CHECK: sqrdmlah v0.8h, v0.8h, v1.8h
				ret <8 x i16> %prod
				}

				define <2 x i32> @test_sqrdmlah_v2i32(<2 x i32> %lhs, <2 x i32> %rhs) {
				; CHECK-LABEL: test_sqrdmlah_v2i32:
				%prod = call <2 x i32> @llvm.aarch64.neon.sqrdmlah.v2i32(<2 x i32> %lhs, <2 x i32> %rhs)
				; CHECK: sqrdmlah v0.2s, v0.2s, v1.2s
				ret <2 x i32> %prod
				}

				define <4 x i32> @test_sqrdmlah_v4i32(<4 x i32> %lhs, <4 x i32> %rhs) {
				; CHECK-LABEL: test_sqrdmlah_v4i32:
				%prod = call <4 x i32> @llvm.aarch64.neon.sqrdmlah.v4i32(<4 x i32> %lhs, <4 x i32> %rhs)
				; CHECK: sqrdmlah v0.4s, v0.4s, v1.4s
				ret <4 x i32> %prod
				}

				declare <4 x i16> @llvm.aarch64.neon.sqrdmlsh.v4i16(<4 x i16>, <4 x i16>)
				declare <8 x i16> @llvm.aarch64.neon.sqrdmlsh.v8i16(<8 x i16>, <8 x i16>)
				declare <2 x i32> @llvm.aarch64.neon.sqrdmlsh.v2i32(<2 x i32>, <2 x i32>)
				declare <4 x i32> @llvm.aarch64.neon.sqrdmlsh.v4i32(<4 x i32>, <4 x i32>)

				define <4 x i16> @test_sqrdmlsh_v4i16(<4 x i16> %lhs, <4 x i16> %rhs) {
				; CHECK-LABEL: test_sqrdmlsh_v4i16:
				%prod = call <4 x i16> @llvm.aarch64.neon.sqrdmlsh.v4i16(<4 x i16> %lhs, <4 x i16> %rhs)
				; CHECK: sqrdmlsh v0.4h, v0.4h, v1.4h
				ret <4 x i16> %prod
				}

				define <8 x i16> @test_sqrdmlsh_v8i16(<8 x i16> %lhs, <8 x i16> %rhs) {
				; CHECK-LABEL: test_sqrdmlsh_v8i16:
				%prod = call <8 x i16> @llvm.aarch64.neon.sqrdmlsh.v8i16(<8 x i16> %lhs, <8 x i16> %rhs)
				; CHECK: sqrdmlsh v0.8h, v0.8h, v1.8h
				ret <8 x i16> %prod
				}

				define <2 x i32> @test_sqrdmlsh_v2i32(<2 x i32> %lhs, <2 x i32> %rhs) {
				; CHECK-LABEL: test_sqrdmlsh_v2i32:
				%prod = call <2 x i32> @llvm.aarch64.neon.sqrdmlsh.v2i32(<2 x i32> %lhs, <2 x i32> %rhs)
				; CHECK: sqrdmlsh v0.2s, v0.2s, v1.2s
				ret <2 x i32> %prod
				}

				define <4 x i32> @test_sqrdmlsh_v4i32(<4 x i32> %lhs, <4 x i32> %rhs) {
				; CHECK-LABEL: test_sqrdmlsh_v4i32:
				%prod = call <4 x i32> @llvm.aarch64.neon.sqrdmlsh.v4i32(<4 x i32> %lhs, <4 x i32> %rhs)
				; CHECK: sqrdmlsh v0.4s, v0.4s, v1.4s
				ret <4 x i32> %prod
				}

test/MC/AArch64/armv8-extension-rdma.s

This file was added.

				// RUN: not llvm-mc -triple aarch64-none-linux-gnu -mattr=+rdma -show-encoding < %s 2> %t | FileCheck %s
				// RUN: FileCheck --check-prefix=CHECK-ERROR < %t %s
				.text

				//AdvSIMD vector
				sqrdmlah v0.4h, v1.4h, v2.4h
				sqrdmlsh v0.4h, v1.4h, v2.4h
				sqrdmlah v0.2s, v1.2s, v2.2s
				sqrdmlsh v0.2s, v1.2s, v2.2s
				sqrdmlah v0.4s, v1.4s, v2.4s
				sqrdmlsh v0.4s, v1.4s, v2.4s
				sqrdmlah v0.8h, v1.8h, v2.8h
				sqrdmlsh v0.8h, v1.8h, v2.8h
				// CHECK: sqrdmlah v0.4h, v1.4h, v2.4h // encoding: [0x20,0x84,0x42,0x2e]
				// CHECK: sqrdmlsh v0.4h, v1.4h, v2.4h // encoding: [0x20,0x8c,0x42,0x2e]
				// CHECK: sqrdmlah v0.2s, v1.2s, v2.2s // encoding: [0x20,0x84,0x82,0x2e]
				// CHECK: sqrdmlsh v0.2s, v1.2s, v2.2s // encoding: [0x20,0x8c,0x82,0x2e]
				// CHECK: sqrdmlah v0.4s, v1.4s, v2.4s // encoding: [0x20,0x84,0x82,0x6e]
				// CHECK: sqrdmlsh v0.4s, v1.4s, v2.4s // encoding: [0x20,0x8c,0x82,0x6e]
				// CHECK: sqrdmlah v0.8h, v1.8h, v2.8h // encoding: [0x20,0x84,0x42,0x6e]
				// CHECK: sqrdmlsh v0.8h, v1.8h, v2.8h // encoding: [0x20,0x8c,0x42,0x6e]

				sqrdmlah v0.2h, v1.2h, v2.2h
				sqrdmlsh v0.2h, v1.2h, v2.2h
				sqrdmlah v0.8s, v1.8s, v2.8s
				sqrdmlsh v0.8s, v1.8s, v2.8s
				sqrdmlah v0.2s, v1.4h, v2.8h
				sqrdmlsh v0.4s, v1.8h, v2.2s
				// CHECK-ERROR: error: invalid vector kind qualifier
				// CHECK-ERROR: sqrdmlah v0.2h, v1.2h, v2.2h
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid vector kind qualifier
				// CHECK-ERROR: sqrdmlah v0.2h, v1.2h, v2.2h
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid vector kind qualifier
				// CHECK-ERROR: sqrdmlah v0.2h, v1.2h, v2.2h
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid operand for instruction
				// CHECK-ERROR: sqrdmlah v0.2h, v1.2h, v2.2h
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid vector kind qualifier
				// CHECK-ERROR: sqrdmlsh v0.2h, v1.2h, v2.2h
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid vector kind qualifier
				// CHECK-ERROR: sqrdmlsh v0.2h, v1.2h, v2.2h
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid vector kind qualifier
				// CHECK-ERROR: sqrdmlsh v0.2h, v1.2h, v2.2h
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid operand for instruction
				// CHECK-ERROR: sqrdmlsh v0.2h, v1.2h, v2.2h
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid vector kind qualifier
				// CHECK-ERROR: sqrdmlah v0.8s, v1.8s, v2.8s
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid vector kind qualifier
				// CHECK-ERROR: sqrdmlah v0.8s, v1.8s, v2.8s
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid vector kind qualifier
				// CHECK-ERROR: sqrdmlah v0.8s, v1.8s, v2.8s
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid operand for instruction
				// CHECK-ERROR: sqrdmlah v0.8s, v1.8s, v2.8s
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid vector kind qualifier
				// CHECK-ERROR: sqrdmlsh v0.8s, v1.8s, v2.8s
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid vector kind qualifier
				// CHECK-ERROR: sqrdmlsh v0.8s, v1.8s, v2.8s
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid vector kind qualifier
				// CHECK-ERROR: sqrdmlsh v0.8s, v1.8s, v2.8s
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid operand for instruction
				// CHECK-ERROR: sqrdmlsh v0.8s, v1.8s, v2.8s
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid operand for instruction
				// CHECK-ERROR: sqrdmlah v0.2s, v1.4h, v2.8h
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid operand for instruction
				// CHECK-ERROR: sqrdmlsh v0.4s, v1.8h, v2.2s
				// CHECK-ERROR: ^

				//AdvSIMD scalar
				sqrdmlah h0, h1, h2
				sqrdmlsh h0, h1, h2
				sqrdmlah s0, s1, s2
				sqrdmlsh s0, s1, s2
				// CHECK: sqrdmlah h0, h1, h2 // encoding: [0x20,0x84,0x42,0x7e]
				// CHECK: sqrdmlsh h0, h1, h2 // encoding: [0x20,0x8c,0x42,0x7e]
				// CHECK: sqrdmlah s0, s1, s2 // encoding: [0x20,0x84,0x82,0x7e]
				// CHECK: sqrdmlsh s0, s1, s2 // encoding: [0x20,0x8c,0x82,0x7e]

				//AdvSIMD vector by-element
				sqrdmlah v0.4h, v1.4h, v2.h[3]
				sqrdmlsh v0.4h, v1.4h, v2.h[3]
				sqrdmlah v0.2s, v1.2s, v2.s[1]
				sqrdmlsh v0.2s, v1.2s, v2.s[1]
				sqrdmlah v0.8h, v1.8h, v2.h[3]
				sqrdmlsh v0.8h, v1.8h, v2.h[3]
				sqrdmlah v0.4s, v1.4s, v2.s[3]
				sqrdmlsh v0.4s, v1.4s, v2.s[3]
				// CHECK: sqrdmlah v0.4h, v1.4h, v2.h[3] // encoding: [0x20,0xd0,0x72,0x2f]
				// CHECK: sqrdmlsh v0.4h, v1.4h, v2.h[3] // encoding: [0x20,0xf0,0x72,0x2f]
				// CHECK: sqrdmlah v0.2s, v1.2s, v2.s[1] // encoding: [0x20,0xd0,0xa2,0x2f]
				// CHECK: sqrdmlsh v0.2s, v1.2s, v2.s[1] // encoding: [0x20,0xf0,0xa2,0x2f]
				// CHECK: sqrdmlah v0.8h, v1.8h, v2.h[3] // encoding: [0x20,0xd0,0x72,0x6f]
				// CHECK: sqrdmlsh v0.8h, v1.8h, v2.h[3] // encoding: [0x20,0xf0,0x72,0x6f]
				// CHECK: sqrdmlah v0.4s, v1.4s, v2.s[3] // encoding: [0x20,0xd8,0xa2,0x6f]
				// CHECK: sqrdmlsh v0.4s, v1.4s, v2.s[3] // encoding: [0x20,0xf8,0xa2,0x6f]

				sqrdmlah v0.4s, v1.2s, v2.s[1]
				sqrdmlsh v0.2s, v1.2d, v2.s[1]
				sqrdmlah v0.8h, v1.8h, v2.s[3]
				sqrdmlsh v0.8h, v1.8h, v2.h[8]
				// CHECK-ERROR: error: invalid operand for instruction
				// CHECK-ERROR: sqrdmlah v0.4s, v1.2s, v2.s[1]
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid operand for instruction
				// CHECK-ERROR: sqrdmlsh v0.2s, v1.2d, v2.s[1]
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid operand for instruction
				// CHECK-ERROR: sqrdmlah v0.8h, v1.8h, v2.s[3]
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: vector lane must be an integer in range [0, 7].
				// CHECK-ERROR: sqrdmlsh v0.8h, v1.8h, v2.h[8]
				// CHECK-ERROR: ^

				//AdvSIMD scalar by-element
				sqrdmlah h0, h1, v2.h[3]
				sqrdmlsh h0, h1, v2.h[3]
				sqrdmlah s0, s1, v2.s[3]
				sqrdmlsh s0, s1, v2.s[3]
				// CHECK: sqrdmlah h0, h1, v2.h[3] // encoding: [0x20,0xd0,0x72,0x7f]
				// CHECK: sqrdmlsh h0, h1, v2.h[3] // encoding: [0x20,0xf0,0x72,0x7f]
				// CHECK: sqrdmlah s0, s1, v2.s[3] // encoding: [0x20,0xd8,0xa2,0x7f]
				// CHECK: sqrdmlsh s0, s1, v2.s[3] // encoding: [0x20,0xf8,0xa2,0x7f]

				sqrdmlah b0, h1, v2.h[3]
				sqrdmlah s0, d1, v2.s[3]
				sqrdmlsh h0, h1, v2.s[3]
				sqrdmlsh s0, s1, v2.s[4]
				// CHECK-ERROR: error: invalid operand for instruction
				// CHECK-ERROR: sqrdmlah b0, h1, v2.h[3]
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid operand for instruction
				// CHECK-ERROR: sqrdmlah s0, d1, v2.s[3]
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: invalid operand for instruction
				// CHECK-ERROR: sqrdmlsh h0, h1, v2.s[3]
				// CHECK-ERROR: ^
				// CHECK-ERROR: error: vector lane must be an integer in range [0, 3].
				// CHECK-ERROR: sqrdmlsh s0, s1, v2.s[4]
				// CHECK-ERROR: ^

test/MC/Disassembler/AArch64/armv8-extension-rdma.txt

This file was added.

				# RUN: llvm-mc -triple aarch64-none-linux-gnu -mattr=+rdma --disassemble < %s | FileCheck %s

				0x20,0x84,0x42,0x2e
				0x20,0x8c,0x42,0x2e
				0x20,0x84,0x82,0x2e
				0x20,0x8c,0x82,0x2e
				0x20,0x84,0x82,0x6e
				0x20,0x8c,0x82,0x6e
				0x20,0x84,0x42,0x6e
				0x20,0x8c,0x42,0x6e
				# CHECK: sqrdmlah v0.4h, v1.4h, v2.4h
				# CHECK: sqrdmlsh v0.4h, v1.4h, v2.4h
				# CHECK: sqrdmlah v0.2s, v1.2s, v2.2s
				# CHECK: sqrdmlsh v0.2s, v1.2s, v2.2s
				# CHECK: sqrdmlah v0.4s, v1.4s, v2.4s
				# CHECK: sqrdmlsh v0.4s, v1.4s, v2.4s
				# CHECK: sqrdmlah v0.8h, v1.8h, v2.8h
				# CHECK: sqrdmlsh v0.8h, v1.8h, v2.8h

				0x20,0x84,0x42,0x7e
				0x20,0x8c,0x42,0x7e
				0x20,0x84,0x82,0x7e
				0x20,0x8c,0x82,0x7e
				# CHECK: sqrdmlah h0, h1, h2
				# CHECK: sqrdmlsh h0, h1, h2
				# CHECK: sqrdmlah s0, s1, s2
				# CHECK: sqrdmlsh s0, s1, s2

				0x20,0xd0,0x72,0x2f
				0x20,0xf0,0x72,0x2f
				0x20,0xd0,0xa2,0x2f
				0x20,0xf0,0xa2,0x2f
				0x20,0xd0,0x72,0x6f
				0x20,0xf0,0x72,0x6f
				0x20,0xd8,0xa2,0x6f
				0x20,0xf8,0xa2,0x6f
				# CHECK: sqrdmlah v0.4h, v1.4h, v2.h[3]
				# CHECK: sqrdmlsh v0.4h, v1.4h, v2.h[3]
				# CHECK: sqrdmlah v0.2s, v1.2s, v2.s[1]
				# CHECK: sqrdmlsh v0.2s, v1.2s, v2.s[1]
				# CHECK: sqrdmlah v0.8h, v1.8h, v2.h[3]
				# CHECK: sqrdmlsh v0.8h, v1.8h, v2.h[3]
				# CHECK: sqrdmlah v0.4s, v1.4s, v2.s[3]
				# CHECK: sqrdmlsh v0.4s, v1.4s, v2.s[3]

				0x20,0xd0,0x72,0x7f
				0x20,0xf0,0x72,0x7f
				0x20,0xd8,0xa2,0x7f
				0x20,0xf8,0xa2,0x7f
				# CHECK: sqrdmlah h0, h1, v2.h[3]
				# CHECK: sqrdmlsh h0, h1, v2.h[3]
				# CHECK: sqrdmlah s0, s1, v2.s[3]
				# CHECK: sqrdmlsh s0, s1, v2.s[3]

[AArch64] Add v8.1a RDMA extension
Needs Review, Public

Diff 20989

include/llvm/IR/IntrinsicsAArch64.td

lib/Target/AArch64/AArch64.td

lib/Target/AArch64/AArch64InstrFormats.td

lib/Target/AArch64/AArch64InstrInfo.td

lib/Target/AArch64/AArch64Subtarget.h

lib/Target/AArch64/AArch64Subtarget.cpp

test/CodeGen/AArch64/arm64-neon-2velem-rdma.ll

test/CodeGen/AArch64/arm64-neon-rdma-apple.ll

test/CodeGen/AArch64/arm64-neon-rdma.ll

test/MC/AArch64/armv8-extension-rdma.s

test/MC/Disassembler/AArch64/armv8-extension-rdma.txt
