This is an archive of the discontinued LLVM Phabricator instance.

Differential D153207

[AArch64] Add patterns for scalar FMUL, FMULX
Closed, Public

Authored by overmighty on Jun 17 2023, 2:47 PM.

Details

Reviewers

t.p.northover
SjoerdMeijer
dmgreen
samtebbs

Commits

rGea045b99da8e: [AArch64] Add patterns for scalar FMUL, FMULX

Summary

Scalar FMUL and FMULX instructions perform at least as well as their indexed
(by-element) counterparts.

For example, the Arm Cortex-A55 Software Optimization Guide lists the following
instructions with a throughput of 2 IPC:

"FP multiply" FMUL
"ASIMD FP multiply" FMULX

whereas it lists the following with a throughput of 1 IPC:

"ASIMD FP multiply, by element" FMUL, FMULX

The Arm Cortex-A510 Software Optimization Guide, however, does not separately
list "by element" variants of the "ASIMD FP multiply" instructions, which are
listed with the same throughput as the non-ASIMD ones.

Fixes #60817.
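
As a minimal sketch of the case the summary describes (the function name is hypothetical, and the assembly in the comments is an assumption derived from the patterns in this revision rather than output copied from the tests), multiplying a scalar by lane 0 of a vector looks like this in IR:

```llvm
; Hypothetical example: scalar multiplied by lane 0 of a vector argument.
; %a arrives in s0 and %v in v1 under the AArch64 calling convention.
define float @fmul_scalar_by_lane0(float %a, <4 x float> %v) {
  %e = extractelement <4 x float> %v, i32 0
  %m = fmul float %a, %e
  ret float %m
}

; Indexed form (assumed previous selection):  fmul s0, s0, v1.s[0]
; Scalar form (assumed selection after this): fmul s0, s0, s1
```

Lane 0 of a vector register is the same storage as the scalar register of the same number, so the scalar form needs no extra move here.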

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

overmighty created this revision. Jun 17 2023, 2:47 PM

Herald added a project: Restricted Project. Jun 17 2023, 2:47 PM

Herald added subscribers: hiraditya, kristof.beyls.

overmighty requested review of this revision. Jun 17 2023, 2:47 PM

Herald added a subscriber: llvm-commits.

overmighty added inline comments. Jun 17 2023, 3:05 PM

llvm/lib/Target/AArch64/AArch64InstrInfo.td

6891

For example, this prevents the following regression:

define float @test_v3f32(<3 x float> %a) nounwind {
; CHECK-LABEL: test_v3f32:
; CHECK:       // %bb.0:
; CHECK-NEXT:    fmul s1, s0, v0.s[1]
; CHECK-NEXT:    fmul s0, s1, v0.s[2]
; CHECK-NEXT:    ret
  %b = call float @llvm.vector.reduce.fmul.f32.v3f32(float 1.0, <3 x float> %a)
  ret float %b
}

test_v3f32:                             // @test_v3f32
// %bb.0:
	mov	s1, v0.s[1]
	fmul	s1, s1, s0
	fmul	s0, s1, v0.s[2]
	ret

llvm/test/CodeGen/AArch64/complex-deinterleaving-f16-mul.ll

Ideally this would be `fmul h4, h0, h3`, but that is prevented by the patterns that avoid using scalar FMUL when it might introduce an extra DUP/mov.

Harbormaster completed remote builds in B239631: Diff 532426. Jun 17 2023, 3:24 PM

Updated a failing Clang test.

Herald added a subscriber: arphaman. Jun 17 2023, 4:56 PM

Harbormaster completed remote builds in B239636: Diff 532433. Jun 17 2023, 5:44 PM

Looks worthwhile to me so far but I think the test changes need some clean-up.

llvm/test/CodeGen/AArch64/fp16_intrinsic_lane.ll
490–500	Some of the test changes, such as this one being moved and renamed, make it hard to see what has changed. Could you try to reduce the number of such differences?

overmighty added inline comments. Jun 19 2023, 10:43 AM

llvm/test/CodeGen/AArch64/fp16_intrinsic_lane.ll
490–500	Sure, I was trying to maintain some kind of consistency. I have also spotted duplicate tests; I don't know whether it would be worth refactoring some of the tests in the future.

samtebbs added inline comments. Jun 20 2023, 1:47 AM

llvm/test/CodeGen/AArch64/fp16_intrinsic_lane.ll
490–500	Yeah refactoring in a future patch with no functional changes sounds good to me.

There is a lot going on in this patch. It might be good to focus on one of the instructions first, and to split the test changes out into a separate patch, showing just the differences here.

llvm/lib/Target/AArch64/AArch64InstrInfo.td
4504	Could these be in ThreeOperandFPData so that they reuse the patterns with the existing TriOpFrags?
6891	Would the same thing happen with other instructions too, if there were tests?

In D153207#4434191, @dmgreen wrote:

It might be good to focus on one of the instructions first.

Not sure what you mean by this. Should I update this revision to add patterns and tests for only one instruction, have it LGTM'd but not committed, then update it again for the next instruction?

If there were one patch that handled fmul and another that handled fma, then each would be simpler on its own and the details would be easier to work through.

Rebased new upstream commits, replaced test changes with minimal ones, fixed extra DUP/mov regression for scalar FMULX and added tests for it, removed new patterns for FMADD and FMSUB as well as the associated new tests.

overmighty added inline comments. Jun 21 2023, 10:37 AM

llvm/lib/Target/AArch64/AArch64InstrInfo.td
6891	Right, I fixed it for FMULX too with this updated diff. See the changes in AArch64InstrFormats.td and the new `test_fmulx_horizontal_f*` tests.
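
A minimal sketch of the "horizontal" FMULX case mentioned above, mirroring the `test_v3f32` FMUL example earlier in the thread (the function name is hypothetical and the assembly in the comments is an assumption, not copied from the `test_fmulx_horizontal_f*` tests):

```llvm
declare float @llvm.aarch64.neon.fmulx.f32(float, float)

; Hypothetical example: lane 0 multiplied by another lane of the same vector.
define float @fmulx_horizontal_sketch(<4 x float> %v) {
  %e0 = extractelement <4 x float> %v, i32 0
  %e1 = extractelement <4 x float> %v, i32 1
  %m = call float @llvm.aarch64.neon.fmulx.f32(float %e0, float %e1)
  ret float %m
}

; Assumed selection with the SIMDFPIndexed patterns:
;   fmulx s0, s0, v0.s[1]
; Assumed regression without them (extra lane move):
;   mov   s1, v0.s[1]
;   fmulx s0, s1, s0
```

Here the indexed instruction is the better choice, because only one operand is in lane 0 and the other would otherwise need a separate move.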

Harbormaster completed remote builds in B240294: Diff 533324. Jun 21 2023, 11:45 AM

overmighty added inline comments. Jun 22 2023, 2:55 AM

llvm/test/CodeGen/AArch64/arm64-neon-scalar-by-elem-mul.ll
259	Not sure if `test_fmulx_horizontal_f*` tests should be put in a separate file or not.

Thanks. That makes it clearer to read. Using the same patterns in SIMDFPIndexed is nice.

Can you make it so that t_vmulh_lane3_f16 is next to t_vmulh_lane_f16, and maybe rename t_vmulh_lane_f16 to t_vmulh_lane0_f16, so that they are still the same tests as before but with the new additions in place? The same goes for all the others.

llvm/lib/Target/AArch64/AArch64InstrFormats.td
8545	I think these should be HasNEON too.
llvm/lib/Target/AArch64/AArch64InstrInfo.td
5251	I don't think this needs NEON according to the reference I was looking at. Just FP. The patterns for FMUL are very similar to the patterns for FMULX16. Could they share the same multiclass for the patterns?

Rebased new upstream commits, added missing HasNEON predicate to new SIMDFPIndexed v1i32_indexed and v1i64_indexed patterns, replaced new scalar FMUL and FMULX patterns with new multiclass, re-refactored tests for consistency/grouping.

overmighty added inline comments. Jun 25 2023, 5:32 AM

llvm/lib/Target/AArch64/AArch64InstrFormats.td
8545	Right, I had it on the previous patterns for indexed FMUL, I somehow lost/forgot it while porting them to `SIMDFPIndexed`.
llvm/lib/Target/AArch64/AArch64InstrInfo.td
4432	I wasn't sure where to put this.
5251	Can you confirm that it doesn't need Neon? I couldn't find some sort of AArch64 equivalent to the Intel 64 manual which lists CPUID feature flags along instructions, but a few things suggest that even scalar FMULX needs Neon: The Arm Cortex/Neoverse Software Optimization Guides that I read listed FMULX under the "ASIMD FP multiply" instruction group (and under SVE "Floating point multiply" too) but not under regular "FP multiply" where FMUL appears. I'm not sure what it's worth but FMULX is defined here with an intrinsic that contains `neon` in its name as `OpNode`, the predicate is `HasNEONorSME`, and `SIMDFPThreeScalar` defaults to `HasNEON`. However, this page lists FMULX as an instruction that has been added as part of floating-point changes in AArch64 and not Neon changes: https://developer.arm.com/documentation/den0024/a/AArch64-Floating-point-and-NEON/New-features-for-NEON-and-Floating-point-in-AArch64?lang=en.

overmighty added inline comments. Jun 25 2023, 6:25 AM

llvm/lib/Target/AArch64/AArch64InstrInfo.td
4432	Also, I assume this multiclass needs a better name that mentions the number of operands for the instructions. Scalar FMUL invokes `TwoOperandFPData` but scalar FMULX invokes `SIMDFPThreeScalar`, I'm not sure if something like `TwoOperandFPDataFromLane0Patterns` would be a better name.

Harbormaster completed remote builds in B241025: Diff 534332. Jun 25 2023, 9:51 AM

Matt added a subscriber: Matt. Jun 26 2023, 9:39 AM

Thanks for the changes. With a few extra Nits and suggestions, this LGTM.

llvm/lib/Target/AArch64/AArch64InstrInfo.td
4432	Perhaps move it next to the defm for FPScalarFromIndexedLane0Patterns? Would you expect it to be used for anything other than mul/mulx? If not, perhaps include Mul in the name instead of Scalar.
5251	The definitive source will be the pseudocode in the Arm Architecture Reference Manual. For mulx it has: `if elements == 1 then CheckFPEnabled64(); else CheckFPAdvSIMDEnabled64();`. In this case, if the instruction definition already uses HasNEONorSME then it's fine to keep the same thing for the new patterns. We can fix the two in another patch at some point. It likely doesn't matter a lot at the moment; I don't think there is a way to generate mulx intrinsics without Neon being enabled. This could be moved down below the other instruction definitions.

This revision is now accepted and ready to land. Jun 28 2023, 5:02 AM

Rebased new upstream commits, renamed new multiclass from FPScalarFromIndexedLane0Patterns to FMULScalarFromIndexedLane0Patterns and moved it next to the first invocation for FMUL, moved second invocation for FMULX below the group of instruction definitions.
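
As a rough illustration of what the FMULX invocation of the multiclass is meant to cover (the function name is hypothetical and the assembly in the comment is an assumption based on the `"FMULX", "16", "32", "64"` instantiation, not verified output), the f64 case would look like:

```llvm
declare double @llvm.aarch64.neon.fmulx.f64(double, double)

; Hypothetical example: scalar FMULX with lane 0 of a vector as one operand.
; %a arrives in d0 and %v in v1 under the AArch64 calling convention.
define double @fmulx_scalar_by_lane0_f64(double %a, <2 x double> %v) {
  %e = extractelement <2 x double> %v, i32 0
  %m = call double @llvm.aarch64.neon.fmulx.f64(double %a, double %e)
  ret double %m
}

; Assumed selection (FMULX64, guarded by HasNEONorSME): fmulx d0, d0, d1
```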

Thanks for the suggestions. If you don't have more and this still looks good to you, please commit this as "OverMighty <its.overmighty@gmail.com>". Thank you. :)

llvm/lib/Target/AArch64/AArch64InstrInfo.td
4432	I don't expect it to be used for any instructions other than FMUL and FMULX without some changes. I'm not sure about removing `Scalar` from the name though, especially if `Indexed` stays.
5251	By the way, this pseudocode from the Arm Architecture Reference Manual differs from the one found in the following web pages, where only `CheckFPAdvSIMDEnabled64` is called: https://developer.arm.com/architectures/instruction-sets/intrinsics/vmulxs_f32 https://developer.arm.com/documentation/ddi0596/2021-03/SIMD-FP-Instructions/FMULX--Floating-point-Multiply-extended-?lang=en

Harbormaster completed remote builds in B242047: Diff 535730. Jun 29 2023, 5:49 AM

Thanks. Yeah will do.

llvm/lib/Target/AArch64/AArch64InstrInfo.td
5251	I see. That was a change made when SME was added to the architecture.

Closed by commit rGea045b99da8e: [AArch64] Add patterns for scalar FMUL, FMULX (authored by overmighty, committed by dmgreen). Jun 30 2023, 12:34 AM

This revision was automatically updated to reflect the committed changes.

dmgreen added a commit: rGea045b99da8e: [AArch64] Add patterns for scalar FMUL, FMULX.

overmighty mentioned this in D158008: [AArch64] Add patterns for FMADD, FMSUB. Aug 19 2023, 8:22 AM

Revision Contents

Path                                                               Size
llvm/lib/Target/AArch64/AArch64InstrFormats.td                     25 lines
llvm/lib/Target/AArch64/AArch64InstrInfo.td                        31 lines
llvm/test/CodeGen/AArch64/arm64-fma-combines.ll                    4 lines
llvm/test/CodeGen/AArch64/arm64-fml-combines.ll                    4 lines
llvm/test/CodeGen/AArch64/arm64-neon-2velem.ll                     43 lines
llvm/test/CodeGen/AArch64/arm64-neon-scalar-by-elem-mul.ll         112 lines
llvm/test/CodeGen/AArch64/complex-deinterleaving-f16-mul.ll        2 lines
llvm/test/CodeGen/AArch64/fp16_intrinsic_lane.ll                   79 lines
llvm/test/CodeGen/AArch64/vecreduce-fmul-legalization-strict.ll    6 lines

Diff 536132

llvm/lib/Target/AArch64/AArch64InstrFormats.td

[... 8,421 lines not shown ...]
 multiclass SIMDThreeSameVectorFMLIndex<bit U, bits<4> opc, string asm,
 SDPatternOperator OpNode> {
 def v4f16 : BaseSIMDThreeSameVectorFMLIndex<0, U, opc, asm, ".2s", ".2h", ".h",
 V64, v2f32, v4f16, OpNode>;
 def v8f16 : BaseSIMDThreeSameVectorFMLIndex<1, U, opc, asm, ".4s", ".4h", ".h",
 V128, v4f32, v8f16, OpNode>;
 }

-let mayRaiseFPException = 1, Uses = [FPCR] in
 multiclass SIMDFPIndexed<bit U, bits<4> opc, string asm,
 SDPatternOperator OpNode> {
+let mayRaiseFPException = 1, Uses = [FPCR] in {
 let Predicates = [HasNEON, HasFullFP16] in {
 def v4i16_indexed : BaseSIMDIndexed<0, U, 0, 0b00, opc,
 V64, V64,
 V128_lo, VectorIndexH,
 asm, ".4h", ".4h", ".4h", ".h",
 [(set (v4f16 V64:$Rd),
 (OpNode (v4f16 V64:$Rn),
 (v4f16 (AArch64duplane16 (v8f16 V128_lo:$Rm), VectorIndexH:$idx))))]> {
[... 86 lines not shown ...] def v1i64_indexed : BaseSIMDIndexed<1, U, 1, 0b11, opc,
 [(set (f64 FPR64Op:$Rd),
 (OpNode (f64 FPR64Op:$Rn),
 (f64 (vector_extract (v2f64 V128:$Rm),
 VectorIndexD:$idx))))]> {
 bits<1> idx;
 let Inst{11} = idx{0};
 let Inst{21} = 0;
 }
+} // mayRaiseFPException = 1, Uses = [FPCR]
+
+let Predicates = [HasNEON, HasFullFP16] in {
+def : Pat<(f16 (OpNode
+(f16 (vector_extract (v8f16 V128:$Rn), (i64 0))),
+(f16 (vector_extract (v8f16 V128:$Rm), VectorIndexH:$idx)))),
+(!cast<Instruction>(NAME # v1i16_indexed)
+(EXTRACT_SUBREG V128:$Rn, hsub), V128:$Rm, VectorIndexH:$idx)>;
+}
+
+let Predicates = [HasNEON] in {
+def : Pat<(f32 (OpNode
+(f32 (vector_extract (v4f32 V128:$Rn), (i64 0))),
+(f32 (vector_extract (v4f32 V128:$Rm), VectorIndexS:$idx)))),
+(!cast<Instruction>(NAME # v1i32_indexed)
+(EXTRACT_SUBREG V128:$Rn, ssub), V128:$Rm, VectorIndexS:$idx)>;
+
+def : Pat<(f64 (OpNode
+(f64 (vector_extract (v2f64 V128:$Rn), (i64 0))),
+(f64 (vector_extract (v2f64 V128:$Rm), VectorIndexD:$idx)))),
+(!cast<Instruction>(NAME # v1i64_indexed)
+(EXTRACT_SUBREG V128:$Rn, dsub), V128:$Rm, VectorIndexD:$idx)>;
+}
 }

 multiclass SIMDFPIndexedTiedPatterns<string INST, SDPatternOperator OpNode> {
 let Predicates = [HasNEON, HasFullFP16] in {
 // Patterns for f16: DUPLANE, DUP scalar and vector_extract.
 def : Pat<(v8f16 (OpNode (v8f16 V128:$Rd), (v8f16 V128:$Rn),
 (AArch64duplane16 (v8f16 V128_lo:$Rm),
 VectorIndexH:$idx))),
[... 3,594 lines not shown ...]

llvm/lib/Target/AArch64/AArch64InstrInfo.td

[... 4,423 lines not shown ...] def : Pat<(i64 (any_llrint f32:$Rn)),
 (FCVTZSUXSr (FRINTXSr f32:$Rn))>;
 def : Pat<(i64 (any_llrint f64:$Rn)),
 (FCVTZSUXDr (FRINTXDr f64:$Rn))>;

 //===----------------------------------------------------------------------===//
 // Floating point two operand instructions.
 //===----------------------------------------------------------------------===//

 defm FADD : TwoOperandFPData<0b0010, "fadd", any_fadd>;
 let SchedRW = [WriteFDiv] in {
 defm FDIV : TwoOperandFPData<0b0001, "fdiv", any_fdiv>;
 }
 defm FMAXNM : TwoOperandFPData<0b0110, "fmaxnm", any_fmaxnum>;
 defm FMAX : TwoOperandFPData<0b0100, "fmax", any_fmaximum>;
 defm FMINNM : TwoOperandFPData<0b0111, "fminnm", any_fminnum>;
 defm FMIN : TwoOperandFPData<0b0101, "fmin", any_fminimum>;
 let SchedRW = [WriteFMul] in {
 defm FMUL : TwoOperandFPData<0b0000, "fmul", any_fmul>;
 defm FNMUL : TwoOperandFPDataNeg<0b1000, "fnmul", any_fmul>;
 }
 defm FSUB : TwoOperandFPData<0b0011, "fsub", any_fsub>;

+multiclass FMULScalarFromIndexedLane0Patterns<string inst,
+string inst_f16_suffix,
+string inst_f32_suffix,
+string inst_f64_suffix,
+SDPatternOperator OpNode,
+list<Predicate> preds = []> {
+let Predicates = !listconcat(preds, [HasFullFP16]) in {
+def : Pat<(f16 (OpNode (f16 FPR16:$Rn),
+(f16 (vector_extract (v8f16 V128:$Rm), (i64 0))))),
+(!cast<Instruction>(inst # inst_f16_suffix)
+FPR16:$Rn, (EXTRACT_SUBREG V128:$Rm, hsub))>;
+}
+let Predicates = preds in {
+def : Pat<(f32 (OpNode (f32 FPR32:$Rn),
+(f32 (vector_extract (v4f32 V128:$Rm), (i64 0))))),
+(!cast<Instruction>(inst # inst_f32_suffix)
+FPR32:$Rn, (EXTRACT_SUBREG V128:$Rm, ssub))>;
+def : Pat<(f64 (OpNode (f64 FPR64:$Rn),
+(f64 (vector_extract (v2f64 V128:$Rm), (i64 0))))),
+(!cast<Instruction>(inst # inst_f64_suffix)
+FPR64:$Rn, (EXTRACT_SUBREG V128:$Rm, dsub))>;
+}
+}
+
+defm : FMULScalarFromIndexedLane0Patterns<"FMUL", "Hrr", "Srr", "Drr",
+any_fmul>;
+
 // Match reassociated forms of FNMUL.
 def : Pat<(fmul (fneg FPR16:$a), (f16 FPR16:$b)),
 (FNMULHrr FPR16:$a, FPR16:$b)>,
 Requires<[HasFullFP16]>;
 def : Pat<(fmul (fneg FPR32:$a), (f32 FPR32:$b)),
 (FNMULSrr FPR32:$a, FPR32:$b)>;
 def : Pat<(fmul (fneg FPR64:$a), (f64 FPR64:$b)),
 (FNMULDrr FPR64:$a, FPR64:$b)>;
[... 15 lines not shown ...]
 defm FMSUB : ThreeOperandFPData<0, 1, "fmsub",
 TriOpFrag<(any_fma node:$LHS, (fneg node:$MHS), node:$RHS)> >;
 defm FNMADD : ThreeOperandFPData<1, 0, "fnmadd",
 TriOpFrag<(fneg (any_fma node:$LHS, node:$MHS, node:$RHS))> >;
 defm FNMSUB : ThreeOperandFPData<1, 1, "fnmsub",
 TriOpFrag<(any_fma node:$LHS, node:$MHS, (fneg node:$RHS))> >;

 // The following def pats catch the case where the LHS of an FMA is negated.
 // The TriOpFrag above catches the case where the middle operand is negated.

 // N.b. FMSUB etc have the accumulator at the end of (outs), unlike
 // the NEON variant.

 // Here we handle first -(a + b*c) for FNMADD:

 let Predicates = [HasNEON, HasFullFP16] in
 def : Pat<(f16 (fma (fneg FPR16:$Rn), FPR16:$Rm, FPR16:$Ra)),
[... 730 lines not shown ...] defm FACGE : SIMDThreeScalarFPCmp<1, 0, 0b101, "facge",
 int_aarch64_neon_facge>;
 defm FACGT : SIMDThreeScalarFPCmp<1, 1, 0b101, "facgt",
 int_aarch64_neon_facgt>;
 defm FCMEQ : SIMDThreeScalarFPCmp<0, 0, 0b100, "fcmeq", AArch64fcmeq>;
 defm FCMGE : SIMDThreeScalarFPCmp<1, 0, 0b100, "fcmge", AArch64fcmge>;
 defm FCMGT : SIMDThreeScalarFPCmp<1, 1, 0b100, "fcmgt", AArch64fcmgt>;
 defm FMULX : SIMDFPThreeScalar<0, 0, 0b011, "fmulx", int_aarch64_neon_fmulx, HasNEONorSME>;
 defm FRECPS : SIMDFPThreeScalar<0, 0, 0b111, "frecps", int_aarch64_neon_frecps, HasNEONorSME>;
 defm FRSQRTS : SIMDFPThreeScalar<0, 1, 0b111, "frsqrts", int_aarch64_neon_frsqrts, HasNEONorSME>;
 defm SQADD : SIMDThreeScalarBHSD<0, 0b00001, "sqadd", int_aarch64_neon_sqadd>;
 defm SQDMULH : SIMDThreeScalarHS< 0, 0b10110, "sqdmulh", int_aarch64_neon_sqdmulh>;
 defm SQRDMULH : SIMDThreeScalarHS< 1, 0b10110, "sqrdmulh", int_aarch64_neon_sqrdmulh>;
 defm SQRSHL : SIMDThreeScalarBHSD<0, 0b01011, "sqrshl",int_aarch64_neon_sqrshl>;
 defm SQSHL : SIMDThreeScalarBHSD<0, 0b01001, "sqshl", int_aarch64_neon_sqshl>;
 defm SQSUB : SIMDThreeScalarBHSD<0, 0b00101, "sqsub", int_aarch64_neon_sqsub>;
 defm SRSHL : SIMDThreeScalarD< 0, 0b01010, "srshl", int_aarch64_neon_srshl>;
 defm SSHL : SIMDThreeScalarD< 0, 0b01000, "sshl", int_aarch64_neon_sshl>;
[... 10 lines not shown ...] let Predicates = [HasRDM] in {
 def : Pat<(i32 (int_aarch64_neon_sqrdmlah (i32 FPR32:$Rd), (i32 FPR32:$Rn),
 (i32 FPR32:$Rm))),
 (SQRDMLAHv1i32 FPR32:$Rd, FPR32:$Rn, FPR32:$Rm)>;
 def : Pat<(i32 (int_aarch64_neon_sqrdmlsh (i32 FPR32:$Rd), (i32 FPR32:$Rn),
 (i32 FPR32:$Rm))),
 (SQRDMLSHv1i32 FPR32:$Rd, FPR32:$Rn, FPR32:$Rm)>;
 }

+defm : FMULScalarFromIndexedLane0Patterns<"FMULX", "16", "32", "64",
+int_aarch64_neon_fmulx,
+[HasNEONorSME]>;
+
 def : InstAlias<"cmls $dst, $src1, $src2",
 (CMHSv1i64 FPR64:$dst, FPR64:$src2, FPR64:$src1), 0>;
 def : InstAlias<"cmle $dst, $src1, $src2",
 (CMGEv1i64 FPR64:$dst, FPR64:$src2, FPR64:$src1), 0>;
 def : InstAlias<"cmlo $dst, $src1, $src2",
 (CMHIv1i64 FPR64:$dst, FPR64:$src2, FPR64:$src1), 0>;
 def : InstAlias<"cmlt $dst, $src1, $src2",
 (CMGTv1i64 FPR64:$dst, FPR64:$src2, FPR64:$src1), 0>;
[... 1,593 lines not shown ...]
 defm : FMLSIndexedAfterNegPatterns<
 TriOpFrag<(any_fma node:$RHS, node:$MHS, node:$LHS)> >;
 defm : FMLSIndexedAfterNegPatterns<
 TriOpFrag<(any_fma node:$MHS, node:$RHS, node:$LHS)> >;

 defm FMULX : SIMDFPIndexed<1, 0b1001, "fmulx", int_aarch64_neon_fmulx>;
 defm FMUL : SIMDFPIndexed<0, 0b1001, "fmul", any_fmul>;

 def : Pat<(v2f32 (any_fmul V64:$Rn, (AArch64dup (f32 FPR32:$Rm)))),
 (FMULv2i32_indexed V64:$Rn,
 (INSERT_SUBREG (v4i32 (IMPLICIT_DEF)), FPR32:$Rm, ssub),
 (i64 0))>;
 def : Pat<(v4f32 (any_fmul V128:$Rn, (AArch64dup (f32 FPR32:$Rm)))),
 (FMULv4i32_indexed V128:$Rn,
 (INSERT_SUBREG (v4i32 (IMPLICIT_DEF)), FPR32:$Rm, ssub),
 (i64 0))>;
 def : Pat<(v2f64 (any_fmul V128:$Rn, (AArch64dup (f64 FPR64:$Rm)))),
[... 2,205 lines not shown ...]

llvm/test/CodeGen/AArch64/arm64-fma-combines.ll

Show All 11 Lines	entry:
%fmul = fmul fast double %tmp1, %tmp1		%fmul = fmul fast double %tmp1, %tmp1
%fmul2 = fmul fast double %tmp2, 0x3F94AFD6A052BF5B		%fmul2 = fmul fast double %tmp2, 0x3F94AFD6A052BF5B
%fadd = fadd fast double %fmul, %fmul2		%fadd = fadd fast double %fmul, %fmul2
br label %for.body		br label %for.body

; CHECK-LABEL: %for.body		; CHECK-LABEL: %for.body
; CHECK: fmla.2d {{v[0-9]+}}, {{v[0-9]+}}, {{v[0-9]+}}		; CHECK: fmla.2d {{v[0-9]+}}, {{v[0-9]+}}, {{v[0-9]+}}
; CHECK: fmla.2d {{v[0-9]+}}, {{v[0-9]+}}, {{v[0-9]+}}[0]		; CHECK: fmla.2d {{v[0-9]+}}, {{v[0-9]+}}, {{v[0-9]+}}[0]
; CHECK: fmla.d {{d[0-9]+}}, {{d[0-9]+}}, {{v[0-9]+}}[0]		; CHECK: fmadd {{d[0-9]+}}, {{d[0-9]+}}, {{d[0-9]+}}, {{d[0-9]+}}
for.body: ; preds = %for.body, %entry		for.body: ; preds = %for.body, %entry
%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%arrayidx3 = getelementptr inbounds double, ptr %src, i64 %indvars.iv.next		%arrayidx3 = getelementptr inbounds double, ptr %src, i64 %indvars.iv.next
%tmp3 = load double, ptr %arrayidx3, align 8		%tmp3 = load double, ptr %arrayidx3, align 8
%add = fadd fast double %tmp3, %tmp3		%add = fadd fast double %tmp3, %tmp3
%mul = fmul fast double %add, %fadd		%mul = fmul fast double %add, %fadd
%e1 = insertelement <2 x double> undef, double %add, i32 0		%e1 = insertelement <2 x double> undef, double %add, i32 0
(25 lines not shown)
entry:		entry:
%arrayidx1 = getelementptr inbounds float, ptr %src, i64 5		%arrayidx1 = getelementptr inbounds float, ptr %src, i64 5
%arrayidx2 = getelementptr inbounds float, ptr %src, i64 11		%arrayidx2 = getelementptr inbounds float, ptr %src, i64 11
br label %for.body		br label %for.body

; CHECK-LABEL: %for.body		; CHECK-LABEL: %for.body
; CHECK: fmla.2s {{v[0-9]+}}, {{v[0-9]+}}, {{v[0-9]+}}		; CHECK: fmla.2s {{v[0-9]+}}, {{v[0-9]+}}, {{v[0-9]+}}
; CHECK: fmla.2s {{v[0-9]+}}, {{v[0-9]+}}, {{v[0-9]+}}[0]		; CHECK: fmla.2s {{v[0-9]+}}, {{v[0-9]+}}, {{v[0-9]+}}[0]
; CHECK: fmla.s {{s[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}[0]		; CHECK: fmadd {{s[0-9]+}}, {{s[0-9]+}}, {{s[0-9]+}}, {{s[0-9]+}}
for.body: ; preds = %for.body, %entry		for.body: ; preds = %for.body, %entry
%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%arrayidx3 = getelementptr inbounds float, ptr %src, i64 %indvars.iv.next		%arrayidx3 = getelementptr inbounds float, ptr %src, i64 %indvars.iv.next
%tmp1 = load float, ptr %arrayidx3, align 8		%tmp1 = load float, ptr %arrayidx3, align 8
%add = fadd fast float %tmp1, %tmp1		%add = fadd fast float %tmp1, %tmp1
%mul = fmul fast float %add, %add		%mul = fmul fast float %add, %add
%e1 = insertelement <2 x float> undef, float %add, i32 0		%e1 = insertelement <2 x float> undef, float %add, i32 0
(189 lines not shown)

llvm/test/CodeGen/AArch64/arm64-fml-combines.ll

	; RUN: llc < %s -O3 -mtriple=arm64-apple-ios -enable-unsafe-fp-math -mattr=+fullfp16 | FileCheck %s			; RUN: llc < %s -O3 -mtriple=arm64-apple-ios -enable-unsafe-fp-math -mattr=+fullfp16 | FileCheck %s
	; RUN: llc < %s -O3 -mtriple=arm64-apple-ios -fp-contract=fast -mattr=+fullfp16 | FileCheck %s			; RUN: llc < %s -O3 -mtriple=arm64-apple-ios -fp-contract=fast -mattr=+fullfp16 | FileCheck %s

	define void @foo_2d(ptr %src) {			define void @foo_2d(ptr %src) {
	entry:			entry:
	%arrayidx1 = getelementptr inbounds double, ptr %src, i64 5			%arrayidx1 = getelementptr inbounds double, ptr %src, i64 5
	%arrayidx2 = getelementptr inbounds double, ptr %src, i64 11			%arrayidx2 = getelementptr inbounds double, ptr %src, i64 11
	br label %for.body			br label %for.body

	; CHECK-LABEL: %for.body			; CHECK-LABEL: %for.body
	; CHECK: fmls.2d {{v[0-9]+}}, {{v[0-9]+}}, {{v[0-9]+}}			; CHECK: fmls.2d {{v[0-9]+}}, {{v[0-9]+}}, {{v[0-9]+}}
	; CHECK: fmls.2d {{v[0-9]+}}, {{v[0-9]+}}, {{v[0-9]+}}[0]			; CHECK: fmls.2d {{v[0-9]+}}, {{v[0-9]+}}, {{v[0-9]+}}[0]
	; CHECK: fmls.d {{d[0-9]+}}, {{d[0-9]+}}, {{v[0-9]+}}[0]			; CHECK: fmsub {{d[0-9]+}}, {{d[0-9]+}}, {{d[0-9]+}}, {{d[0-9]+}}
	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%indvars.iv.next = sub nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = sub nuw nsw i64 %indvars.iv, 1
	%arrayidx3 = getelementptr inbounds double, ptr %src, i64 %indvars.iv.next			%arrayidx3 = getelementptr inbounds double, ptr %src, i64 %indvars.iv.next
	%tmp1 = load double, ptr %arrayidx3, align 8			%tmp1 = load double, ptr %arrayidx3, align 8
	%add = fadd fast double %tmp1, %tmp1			%add = fadd fast double %tmp1, %tmp1
	%mul = fmul fast double %add, %add			%mul = fmul fast double %add, %add
	%e1 = insertelement <2 x double> undef, double %add, i32 0			%e1 = insertelement <2 x double> undef, double %add, i32 0
	(25 lines not shown)
	entry:			entry:
	%arrayidx1 = getelementptr inbounds float, ptr %src, i64 5			%arrayidx1 = getelementptr inbounds float, ptr %src, i64 5
	%arrayidx2 = getelementptr inbounds float, ptr %src, i64 11			%arrayidx2 = getelementptr inbounds float, ptr %src, i64 11
	br label %for.body			br label %for.body

	; CHECK-LABEL: %for.body			; CHECK-LABEL: %for.body
	; CHECK: fmls.2s {{v[0-9]+}}, {{v[0-9]+}}, {{v[0-9]+}}			; CHECK: fmls.2s {{v[0-9]+}}, {{v[0-9]+}}, {{v[0-9]+}}
	; CHECK: fmls.2s {{v[0-9]+}}, {{v[0-9]+}}, {{v[0-9]+}}[0]			; CHECK: fmls.2s {{v[0-9]+}}, {{v[0-9]+}}, {{v[0-9]+}}[0]
	; CHECK: fmls.s {{s[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}[0]			; CHECK: fmsub {{s[0-9]+}}, {{s[0-9]+}}, {{s[0-9]+}}, {{s[0-9]+}}
	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%arrayidx3 = getelementptr inbounds float, ptr %src, i64 %indvars.iv.next			%arrayidx3 = getelementptr inbounds float, ptr %src, i64 %indvars.iv.next
	%tmp1 = load float, ptr %arrayidx3, align 8			%tmp1 = load float, ptr %arrayidx3, align 8
	%add = fadd fast float %tmp1, %tmp1			%add = fadd fast float %tmp1, %tmp1
	%mul = fmul fast float %add, %add			%mul = fmul fast float %add, %add
	%e1 = insertelement <2 x float> undef, float %add, i32 0			%e1 = insertelement <2 x float> undef, float %add, i32 0
	(121 lines not shown)

llvm/test/CodeGen/AArch64/arm64-neon-2velem.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s -verify-machineinstrs -mtriple=arm64-none-linux-gnu -mattr=+neon -fp-contract=fast | FileCheck %s --check-prefix=CHECK		; RUN: llc < %s -verify-machineinstrs -mtriple=arm64-none-linux-gnu -mattr=+neon -fp-contract=fast | FileCheck %s --check-prefix=CHECK
; RUN: llc < %s -verify-machineinstrs -mtriple=arm64-none-linux-gnu -mattr=+neon -fp-contract=fast -mcpu=exynos-m3 | FileCheck %s --check-prefix=CHECK		; RUN: llc < %s -verify-machineinstrs -mtriple=arm64-none-linux-gnu -mattr=+neon -fp-contract=fast -mcpu=exynos-m3 | FileCheck %s --check-prefix=CHECK

declare <2 x double> @llvm.aarch64.neon.fmulx.v2f64(<2 x double>, <2 x double>)		declare <2 x double> @llvm.aarch64.neon.fmulx.v2f64(<2 x double>, <2 x double>)

declare <4 x float> @llvm.aarch64.neon.fmulx.v4f32(<4 x float>, <4 x float>)		declare <4 x float> @llvm.aarch64.neon.fmulx.v4f32(<4 x float>, <4 x float>)

declare <2 x float> @llvm.aarch64.neon.fmulx.v2f32(<2 x float>, <2 x float>)		declare <2 x float> @llvm.aarch64.neon.fmulx.v2f32(<2 x float>, <2 x float>)

		declare double @llvm.aarch64.neon.fmulx.f64(double, double)

declare <4 x i32> @llvm.aarch64.neon.sqrdmulh.v4i32(<4 x i32>, <4 x i32>)		declare <4 x i32> @llvm.aarch64.neon.sqrdmulh.v4i32(<4 x i32>, <4 x i32>)
declare <4 x i32> @llvm.aarch64.neon.sqrdmulh.lane.v4i32.v2i32(<4 x i32>, <2 x i32>, i32)		declare <4 x i32> @llvm.aarch64.neon.sqrdmulh.lane.v4i32.v2i32(<4 x i32>, <2 x i32>, i32)
declare <4 x i32> @llvm.aarch64.neon.sqrdmulh.laneq.v4i32.v4i32(<4 x i32>, <4 x i32>, i32)		declare <4 x i32> @llvm.aarch64.neon.sqrdmulh.laneq.v4i32.v4i32(<4 x i32>, <4 x i32>, i32)

declare <2 x i32> @llvm.aarch64.neon.sqrdmulh.v2i32(<2 x i32>, <2 x i32>)		declare <2 x i32> @llvm.aarch64.neon.sqrdmulh.v2i32(<2 x i32>, <2 x i32>)
declare <2 x i32> @llvm.aarch64.neon.sqrdmulh.lane.v2i32.v2i32(<2 x i32>, <2 x i32>, i32)		declare <2 x i32> @llvm.aarch64.neon.sqrdmulh.lane.v2i32.v2i32(<2 x i32>, <2 x i32>, i32)
declare <2 x i32> @llvm.aarch64.neon.sqrdmulh.laneq.v2i32.v4i32(<2 x i32>, <4 x i32>, i32)		declare <2 x i32> @llvm.aarch64.neon.sqrdmulh.laneq.v2i32.v4i32(<2 x i32>, <4 x i32>, i32)

(2,042 lines not shown)
; CHECK-NEXT: fmulx v0.4s, v0.4s, v1.s[1]		; CHECK-NEXT: fmulx v0.4s, v0.4s, v1.s[1]
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:
%shuffle = shufflevector <2 x float> %v, <2 x float> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>		%shuffle = shufflevector <2 x float> %v, <2 x float> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
%vmulx2.i = tail call <4 x float> @llvm.aarch64.neon.fmulx.v4f32(<4 x float> %a, <4 x float> %shuffle)		%vmulx2.i = tail call <4 x float> @llvm.aarch64.neon.fmulx.v4f32(<4 x float> %a, <4 x float> %shuffle)
ret <4 x float> %vmulx2.i		ret <4 x float> %vmulx2.i
}		}

		define <1 x double> @test_vmulx_lane_f64(<1 x double> %a, <1 x double> %v) {
		; CHECK-LABEL: test_vmulx_lane_f64:
		; CHECK: // %bb.0: // %entry
		; CHECK-NEXT: fmulx d0, d0, d1
		; CHECK-NEXT: ret
		entry:
		%vget_lane = extractelement <1 x double> %a, i64 0
		%vget_lane3 = extractelement <1 x double> %v, i64 0
		%vmulxd_f64.i = tail call double @llvm.aarch64.neon.fmulx.f64(double %vget_lane, double %vget_lane3)
		%vset_lane = insertelement <1 x double> poison, double %vmulxd_f64.i, i64 0
		ret <1 x double> %vset_lane
		}

define <2 x double> @test_vmulxq_lane_f64(<2 x double> %a, <1 x double> %v) {		define <2 x double> @test_vmulxq_lane_f64(<2 x double> %a, <1 x double> %v) {
; CHECK-LABEL: test_vmulxq_lane_f64:		; CHECK-LABEL: test_vmulxq_lane_f64:
; CHECK: // %bb.0: // %entry		; CHECK: // %bb.0: // %entry
; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1		; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
; CHECK-NEXT: fmulx v0.2d, v0.2d, v1.d[0]		; CHECK-NEXT: fmulx v0.2d, v0.2d, v1.d[0]
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:
%shuffle = shufflevector <1 x double> %v, <1 x double> undef, <2 x i32> zeroinitializer		%shuffle = shufflevector <1 x double> %v, <1 x double> undef, <2 x i32> zeroinitializer
(18 lines not shown)
; CHECK-NEXT: fmulx v0.4s, v0.4s, v1.s[3]		; CHECK-NEXT: fmulx v0.4s, v0.4s, v1.s[3]
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:
%shuffle = shufflevector <4 x float> %v, <4 x float> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>		%shuffle = shufflevector <4 x float> %v, <4 x float> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
%vmulx2.i = tail call <4 x float> @llvm.aarch64.neon.fmulx.v4f32(<4 x float> %a, <4 x float> %shuffle)		%vmulx2.i = tail call <4 x float> @llvm.aarch64.neon.fmulx.v4f32(<4 x float> %a, <4 x float> %shuffle)
ret <4 x float> %vmulx2.i		ret <4 x float> %vmulx2.i
}		}

		define <1 x double> @test_vmulx_laneq_f64(<1 x double> %a, <2 x double> %v) {
		; CHECK-LABEL: test_vmulx_laneq_f64:
		; CHECK: // %bb.0: // %entry
		; CHECK-NEXT: fmulx d0, d0, v1.d[1]
		; CHECK-NEXT: ret
		entry:
		%vget_lane = extractelement <1 x double> %a, i64 0
		%vgetq_lane = extractelement <2 x double> %v, i64 1
		%vmulxd_f64.i = tail call double @llvm.aarch64.neon.fmulx.f64(double %vget_lane, double %vgetq_lane)
		%vset_lane = insertelement <1 x double> poison, double %vmulxd_f64.i, i64 0
		ret <1 x double> %vset_lane
		}

define <2 x double> @test_vmulxq_laneq_f64(<2 x double> %a, <2 x double> %v) {		define <2 x double> @test_vmulxq_laneq_f64(<2 x double> %a, <2 x double> %v) {
; CHECK-LABEL: test_vmulxq_laneq_f64:		; CHECK-LABEL: test_vmulxq_laneq_f64:
; CHECK: // %bb.0: // %entry		; CHECK: // %bb.0: // %entry
; CHECK-NEXT: fmulx v0.2d, v0.2d, v1.d[1]		; CHECK-NEXT: fmulx v0.2d, v0.2d, v1.d[1]
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:
%shuffle = shufflevector <2 x double> %v, <2 x double> undef, <2 x i32> <i32 1, i32 1>		%shuffle = shufflevector <2 x double> %v, <2 x double> undef, <2 x i32> <i32 1, i32 1>
%vmulx2.i = tail call <2 x double> @llvm.aarch64.neon.fmulx.v2f64(<2 x double> %a, <2 x double> %shuffle)		%vmulx2.i = tail call <2 x double> @llvm.aarch64.neon.fmulx.v2f64(<2 x double> %a, <2 x double> %shuffle)
(1,444 lines not shown)
entry:		entry:
%shuffle = shufflevector <4 x float> %v, <4 x float> undef, <2 x i32> zeroinitializer		%shuffle = shufflevector <4 x float> %v, <4 x float> undef, <2 x i32> zeroinitializer
%mul = fmul <2 x float> %shuffle, %a		%mul = fmul <2 x float> %shuffle, %a
ret <2 x float> %mul		ret <2 x float> %mul
}		}

define <1 x double> @test_vmul_laneq_f64_0(<1 x double> %a, <2 x double> %v) {		define <1 x double> @test_vmul_laneq_f64_0(<1 x double> %a, <2 x double> %v) {
; CHECK-LABEL: test_vmul_laneq_f64_0:		; CHECK-LABEL: test_vmul_laneq_f64_0:
; CHECK: // %bb.0: // %entry		; CHECK: // %bb.0: // %entry
; CHECK-NEXT: fmul d0, d0, v1.d[0]		; CHECK-NEXT: fmul d0, d0, d1
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:
%0 = bitcast <1 x double> %a to <8 x i8>		%0 = bitcast <1 x double> %a to <8 x i8>
%1 = bitcast <8 x i8> %0 to double		%1 = bitcast <8 x i8> %0 to double
%extract = extractelement <2 x double> %v, i32 0		%extract = extractelement <2 x double> %v, i32 0
%2 = fmul double %1, %extract		%2 = fmul double %1, %extract
%3 = insertelement <1 x double> undef, double %2, i32 0		%3 = insertelement <1 x double> undef, double %2, i32 0
ret <1 x double> %3		ret <1 x double> %3
(74 lines not shown)
; CHECK-NEXT: fmulx v0.4s, v0.4s, v1.s[0]		; CHECK-NEXT: fmulx v0.4s, v0.4s, v1.s[0]
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:
%shuffle = shufflevector <4 x float> %v, <4 x float> undef, <4 x i32> zeroinitializer		%shuffle = shufflevector <4 x float> %v, <4 x float> undef, <4 x i32> zeroinitializer
%vmulx2.i = tail call <4 x float> @llvm.aarch64.neon.fmulx.v4f32(<4 x float> %a, <4 x float> %shuffle)		%vmulx2.i = tail call <4 x float> @llvm.aarch64.neon.fmulx.v4f32(<4 x float> %a, <4 x float> %shuffle)
ret <4 x float> %vmulx2.i		ret <4 x float> %vmulx2.i
}		}

		define <1 x double> @test_vmulx_laneq_f64_0(<1 x double> %a, <2 x double> %v) {
		; CHECK-LABEL: test_vmulx_laneq_f64_0:
		; CHECK: // %bb.0: // %entry
		; CHECK-NEXT: fmulx d0, d0, d1
		; CHECK-NEXT: ret
		entry:
		%vget_lane = extractelement <1 x double> %a, i64 0
		%vgetq_lane = extractelement <2 x double> %v, i64 0
		%vmulxd_f64.i = tail call double @llvm.aarch64.neon.fmulx.f64(double %vget_lane, double %vgetq_lane)
		%vset_lane = insertelement <1 x double> poison, double %vmulxd_f64.i, i64 0
		ret <1 x double> %vset_lane
		}

define <2 x double> @test_vmulxq_laneq_f64_0(<2 x double> %a, <2 x double> %v) {		define <2 x double> @test_vmulxq_laneq_f64_0(<2 x double> %a, <2 x double> %v) {
; CHECK-LABEL: test_vmulxq_laneq_f64_0:		; CHECK-LABEL: test_vmulxq_laneq_f64_0:
; CHECK: // %bb.0: // %entry		; CHECK: // %bb.0: // %entry
; CHECK-NEXT: fmulx v0.2d, v0.2d, v1.d[0]		; CHECK-NEXT: fmulx v0.2d, v0.2d, v1.d[0]
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:
%shuffle = shufflevector <2 x double> %v, <2 x double> undef, <2 x i32> zeroinitializer		%shuffle = shufflevector <2 x double> %v, <2 x double> undef, <2 x i32> zeroinitializer
%vmulx2.i = tail call <2 x double> @llvm.aarch64.neon.fmulx.v2f64(<2 x double> %a, <2 x double> %shuffle)		%vmulx2.i = tail call <2 x double> @llvm.aarch64.neon.fmulx.v2f64(<2 x double> %a, <2 x double> %shuffle)
(56 lines not shown)

llvm/test/CodeGen/AArch64/arm64-neon-scalar-by-elem-mul.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2
	; RUN: llc < %s -verify-machineinstrs -mtriple=arm64-none-linux-gnu -mattr=+neon -fp-contract=fast | FileCheck %s			; RUN: llc < %s -verify-machineinstrs -mtriple=arm64-none-linux-gnu -mattr=+neon -fp-contract=fast | FileCheck %s

	define float @test_fmul_lane_ss2S(float %a, <2 x float> %v) {			define float @test_fmul_lane_ss2S_0(float %a, <2 x float> %v) {
	; CHECK-LABEL: test_fmul_lane_ss2S:			; CHECK-LABEL: test_fmul_lane_ss2S_0:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
				; CHECK-NEXT: fmul s0, s0, s1
				; CHECK-NEXT: ret
				%tmp1 = extractelement <2 x float> %v, i32 0
				%tmp2 = fmul float %a, %tmp1
				ret float %tmp2
				}

				define float @test_fmul_lane_ss2S_1(float %a, <2 x float> %v) {
				; CHECK-LABEL: test_fmul_lane_ss2S_1:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1			; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
	; CHECK-NEXT: fmul s0, s0, v1.s[1]			; CHECK-NEXT: fmul s0, s0, v1.s[1]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%tmp1 = extractelement <2 x float> %v, i32 1			%tmp1 = extractelement <2 x float> %v, i32 1
	%tmp2 = fmul float %a, %tmp1;			%tmp2 = fmul float %a, %tmp1;
	ret float %tmp2;			ret float %tmp2;
	}			}

	define float @test_fmul_lane_ss2S_swap(float %a, <2 x float> %v) {			define float @test_fmul_lane_ss2S_1_swap(float %a, <2 x float> %v) {
	; CHECK-LABEL: test_fmul_lane_ss2S_swap:			; CHECK-LABEL: test_fmul_lane_ss2S_1_swap:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1			; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
	; CHECK-NEXT: fmul s0, s0, v1.s[1]			; CHECK-NEXT: fmul s0, s0, v1.s[1]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%tmp1 = extractelement <2 x float> %v, i32 1			%tmp1 = extractelement <2 x float> %v, i32 1
	%tmp2 = fmul float %tmp1, %a;			%tmp2 = fmul float %tmp1, %a;
	ret float %tmp2;			ret float %tmp2;
	}			}

				define float @test_fmul_lane_ss4S_0(float %a, <4 x float> %v) {
				; CHECK-LABEL: test_fmul_lane_ss4S_0:
				; CHECK: // %bb.0:
				; CHECK-NEXT: fmul s0, s0, s1
				; CHECK-NEXT: ret
				%tmp1 = extractelement <4 x float> %v, i32 0
				%tmp2 = fmul float %a, %tmp1
				ret float %tmp2
				}

	define float @test_fmul_lane_ss4S(float %a, <4 x float> %v) {			define float @test_fmul_lane_ss4S_3(float %a, <4 x float> %v) {
	; CHECK-LABEL: test_fmul_lane_ss4S:			; CHECK-LABEL: test_fmul_lane_ss4S_3:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: fmul s0, s0, v1.s[3]			; CHECK-NEXT: fmul s0, s0, v1.s[3]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%tmp1 = extractelement <4 x float> %v, i32 3			%tmp1 = extractelement <4 x float> %v, i32 3
	%tmp2 = fmul float %a, %tmp1;			%tmp2 = fmul float %a, %tmp1;
	ret float %tmp2;			ret float %tmp2;
	}			}

	define float @test_fmul_lane_ss4S_swap(float %a, <4 x float> %v) {			define float @test_fmul_lane_ss4S_3_swap(float %a, <4 x float> %v) {
	; CHECK-LABEL: test_fmul_lane_ss4S_swap:			; CHECK-LABEL: test_fmul_lane_ss4S_3_swap:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: fmul s0, s0, v1.s[3]			; CHECK-NEXT: fmul s0, s0, v1.s[3]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%tmp1 = extractelement <4 x float> %v, i32 3			%tmp1 = extractelement <4 x float> %v, i32 3
	%tmp2 = fmul float %tmp1, %a;			%tmp2 = fmul float %tmp1, %a;
	ret float %tmp2;			ret float %tmp2;
	}			}


	define double @test_fmul_lane_ddD(double %a, <1 x double> %v) {			define double @test_fmul_lane_ddD(double %a, <1 x double> %v) {
	; CHECK-LABEL: test_fmul_lane_ddD:			; CHECK-LABEL: test_fmul_lane_ddD:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: fmul d0, d0, d1			; CHECK-NEXT: fmul d0, d0, d1
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%tmp1 = extractelement <1 x double> %v, i32 0			%tmp1 = extractelement <1 x double> %v, i32 0
	%tmp2 = fmul double %a, %tmp1;			%tmp2 = fmul double %a, %tmp1;
	ret double %tmp2;			ret double %tmp2;
	}			}


				define double @test_fmul_lane_dd2D_0(double %a, <2 x double> %v) {
				; CHECK-LABEL: test_fmul_lane_dd2D_0:
				; CHECK: // %bb.0:
				; CHECK-NEXT: fmul d0, d0, d1
				; CHECK-NEXT: ret
				%tmp1 = extractelement <2 x double> %v, i32 0
				%tmp2 = fmul double %a, %tmp1
				ret double %tmp2
				}

	define double @test_fmul_lane_dd2D(double %a, <2 x double> %v) {			define double @test_fmul_lane_dd2D_1(double %a, <2 x double> %v) {
	; CHECK-LABEL: test_fmul_lane_dd2D:			; CHECK-LABEL: test_fmul_lane_dd2D_1:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: fmul d0, d0, v1.d[1]			; CHECK-NEXT: fmul d0, d0, v1.d[1]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%tmp1 = extractelement <2 x double> %v, i32 1			%tmp1 = extractelement <2 x double> %v, i32 1
	%tmp2 = fmul double %a, %tmp1;			%tmp2 = fmul double %a, %tmp1;
	ret double %tmp2;			ret double %tmp2;
	}			}


	define double @test_fmul_lane_dd2D_swap(double %a, <2 x double> %v) {			define double @test_fmul_lane_dd2D_1_swap(double %a, <2 x double> %v) {
	; CHECK-LABEL: test_fmul_lane_dd2D_swap:			; CHECK-LABEL: test_fmul_lane_dd2D_1_swap:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: fmul d0, d0, v1.d[1]			; CHECK-NEXT: fmul d0, d0, v1.d[1]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%tmp1 = extractelement <2 x double> %v, i32 1			%tmp1 = extractelement <2 x double> %v, i32 1
	%tmp2 = fmul double %tmp1, %a;			%tmp2 = fmul double %tmp1, %a;
	ret double %tmp2;			ret double %tmp2;
	}			}

	declare float @llvm.aarch64.neon.fmulx.f32(float, float)			declare float @llvm.aarch64.neon.fmulx.f32(float, float)

	define float @test_fmulx_lane_f32(float %a, <2 x float> %v) {			define float @test_fmulx_lane_f32_0(float %a, <2 x float> %v) {
	; CHECK-LABEL: test_fmulx_lane_f32:			; CHECK-LABEL: test_fmulx_lane_f32_0:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
				; CHECK-NEXT: fmulx s0, s0, s1
				; CHECK-NEXT: ret
				%tmp1 = extractelement <2 x float> %v, i32 0
				%tmp2 = call float @llvm.aarch64.neon.fmulx.f32(float %a, float %tmp1)
				ret float %tmp2;
				}

				define float @test_fmulx_lane_f32_1(float %a, <2 x float> %v) {
				; CHECK-LABEL: test_fmulx_lane_f32_1:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1			; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
	; CHECK-NEXT: fmulx s0, s0, v1.s[1]			; CHECK-NEXT: fmulx s0, s0, v1.s[1]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%tmp1 = extractelement <2 x float> %v, i32 1			%tmp1 = extractelement <2 x float> %v, i32 1
	%tmp2 = call float @llvm.aarch64.neon.fmulx.f32(float %a, float %tmp1)			%tmp2 = call float @llvm.aarch64.neon.fmulx.f32(float %a, float %tmp1)
	ret float %tmp2;			ret float %tmp2;
	}			}

	define float @test_fmulx_laneq_f32(float %a, <4 x float> %v) {			define float @test_fmulx_laneq_f32_0(float %a, <4 x float> %v) {
	; CHECK-LABEL: test_fmulx_laneq_f32:			; CHECK-LABEL: test_fmulx_laneq_f32_0:
				; CHECK: // %bb.0:
				; CHECK-NEXT: fmulx s0, s0, s1
				; CHECK-NEXT: ret
				%tmp1 = extractelement <4 x float> %v, i32 0
				%tmp2 = call float @llvm.aarch64.neon.fmulx.f32(float %a, float %tmp1)
				ret float %tmp2;
				}

				define float @test_fmulx_laneq_f32_3(float %a, <4 x float> %v) {
				; CHECK-LABEL: test_fmulx_laneq_f32_3:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: fmulx s0, s0, v1.s[3]			; CHECK-NEXT: fmulx s0, s0, v1.s[3]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%tmp1 = extractelement <4 x float> %v, i32 3			%tmp1 = extractelement <4 x float> %v, i32 3
	%tmp2 = call float @llvm.aarch64.neon.fmulx.f32(float %a, float %tmp1)			%tmp2 = call float @llvm.aarch64.neon.fmulx.f32(float %a, float %tmp1)
	ret float %tmp2;			ret float %tmp2;
	}			}

	define float @test_fmulx_laneq_f32_swap(float %a, <4 x float> %v) {			define float @test_fmulx_laneq_f32_3_swap(float %a, <4 x float> %v) {
	; CHECK-LABEL: test_fmulx_laneq_f32_swap:			; CHECK-LABEL: test_fmulx_laneq_f32_3_swap:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: fmulx s0, s0, v1.s[3]			; CHECK-NEXT: fmulx s0, s0, v1.s[3]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%tmp1 = extractelement <4 x float> %v, i32 3			%tmp1 = extractelement <4 x float> %v, i32 3
	%tmp2 = call float @llvm.aarch64.neon.fmulx.f32(float %tmp1, float %a)			%tmp2 = call float @llvm.aarch64.neon.fmulx.f32(float %tmp1, float %a)
	ret float %tmp2;			ret float %tmp2;
	}			}

	declare double @llvm.aarch64.neon.fmulx.f64(double, double)			declare double @llvm.aarch64.neon.fmulx.f64(double, double)

	define double @test_fmulx_lane_f64(double %a, <1 x double> %v) {			define double @test_fmulx_lane_f64(double %a, <1 x double> %v) {
	; CHECK-LABEL: test_fmulx_lane_f64:			; CHECK-LABEL: test_fmulx_lane_f64:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: fmulx d0, d0, d1			; CHECK-NEXT: fmulx d0, d0, d1
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%tmp1 = extractelement <1 x double> %v, i32 0			%tmp1 = extractelement <1 x double> %v, i32 0
	%tmp2 = call double @llvm.aarch64.neon.fmulx.f64(double %a, double %tmp1)			%tmp2 = call double @llvm.aarch64.neon.fmulx.f64(double %a, double %tmp1)
	ret double %tmp2;			ret double %tmp2;
	}			}

	define double @test_fmulx_laneq_f64_0(double %a, <2 x double> %v) {			define double @test_fmulx_laneq_f64_0(double %a, <2 x double> %v) {
	; CHECK-LABEL: test_fmulx_laneq_f64_0:			; CHECK-LABEL: test_fmulx_laneq_f64_0:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: fmulx d0, d0, v1.d[0]			; CHECK-NEXT: fmulx d0, d0, d1
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%tmp1 = extractelement <2 x double> %v, i32 0			%tmp1 = extractelement <2 x double> %v, i32 0
	%tmp2 = call double @llvm.aarch64.neon.fmulx.f64(double %a, double %tmp1)			%tmp2 = call double @llvm.aarch64.neon.fmulx.f64(double %a, double %tmp1)
	ret double %tmp2;			ret double %tmp2;
	}			}


	define double @test_fmulx_laneq_f64_1(double %a, <2 x double> %v) {			define double @test_fmulx_laneq_f64_1(double %a, <2 x double> %v) {
	(11 lines not shown)
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: fmulx d0, d0, v1.d[1]			; CHECK-NEXT: fmulx d0, d0, v1.d[1]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%tmp1 = extractelement <2 x double> %v, i32 1			%tmp1 = extractelement <2 x double> %v, i32 1
	%tmp2 = call double @llvm.aarch64.neon.fmulx.f64(double %tmp1, double %a)			%tmp2 = call double @llvm.aarch64.neon.fmulx.f64(double %tmp1, double %a)
	ret double %tmp2;			ret double %tmp2;
	}			}

				define float @test_fmulx_horizontal_f32(<2 x float> %v) {
				; CHECK-LABEL: test_fmulx_horizontal_f32:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $q0
				; CHECK-NEXT: fmulx s0, s0, v0.s[1]
				; CHECK-NEXT: ret
				entry:
				%0 = extractelement <2 x float> %v, i32 0
				%1 = extractelement <2 x float> %v, i32 1
				%2 = call float @llvm.aarch64.neon.fmulx.f32(float %0, float %1)
				ret float %2
				}

				define double @test_fmulx_horizontal_f64(<2 x double> %v) {
				; CHECK-LABEL: test_fmulx_horizontal_f64:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fmulx d0, d0, v0.d[1]
				; CHECK-NEXT: ret
				entry:
				%0 = extractelement <2 x double> %v, i32 0
				%1 = extractelement <2 x double> %v, i32 1
				%2 = call double @llvm.aarch64.neon.fmulx.f64(double %0, double %1)
				ret double %2
				}
overmighty (Author): Not sure if `test_fmulx_horizontal_f*` tests should be put in a separate file or not.
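
The horizontal FMULX tests above multiply lane 0 of a vector by lane 1 of the same vector. A plain `fmul` analogue could be written the same way. The following is only a sketch and is not part of this patch: the function name `sketch_fmul_horizontal_f32` is hypothetical, and no CHECK lines are given because the generated assembly has not been verified here.

```llvm
; Hypothetical companion to test_fmulx_horizontal_f32, using fmul instead of
; the fmulx intrinsic. Both operands are extracted from the same vector.
define float @sketch_fmul_horizontal_f32(<2 x float> %v) {
entry:
  ; Lane 0 is already available as the low scalar subregister of %v.
  %lo = extractelement <2 x float> %v, i32 0
  ; Lane 1 would need either an indexed multiply or an extra lane move.
  %hi = extractelement <2 x float> %v, i32 1
  %mul = fmul float %lo, %hi
  ret float %mul
}
```

If added to a test file with the RUN line shown at the top of this file, the assertions could be regenerated with utils/update_llc_test_checks.py rather than written by hand.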

llvm/test/CodeGen/AArch64/complex-deinterleaving-f16-mul.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s --mattr=+complxnum,+neon,+fullfp16 -o - | FileCheck %s			; RUN: llc < %s --mattr=+complxnum,+neon,+fullfp16 -o - | FileCheck %s

	target triple = "aarch64"			target triple = "aarch64"

	; Expected to not transform			; Expected to not transform
	define <2 x half> @complex_mul_v2f16(<2 x half> %a, <2 x half> %b) {			define <2 x half> @complex_mul_v2f16(<2 x half> %a, <2 x half> %b) {
	; CHECK-LABEL: complex_mul_v2f16:			; CHECK-LABEL: complex_mul_v2f16:
	; CHECK: // %bb.0: // %entry			; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1			; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
	; CHECK-NEXT: // kill: def $d0 killed $d0 def $q0			; CHECK-NEXT: // kill: def $d0 killed $d0 def $q0
	; CHECK-NEXT: mov h3, v0.h[1]			; CHECK-NEXT: mov h3, v0.h[1]
	; CHECK-NEXT: mov h2, v1.h[1]			; CHECK-NEXT: mov h2, v1.h[1]
	; CHECK-NEXT: fmul h4, h2, v0.h[0]			; CHECK-NEXT: fmul h4, h0, v1.h[1]
	; CHECK-NEXT: fnmul h2, h3, h2			; CHECK-NEXT: fnmul h2, h3, h2
	; CHECK-NEXT: fmla h4, h3, v1.h[0]			; CHECK-NEXT: fmla h4, h3, v1.h[0]
	; CHECK-NEXT: fmla h2, h0, v1.h[0]			; CHECK-NEXT: fmla h2, h0, v1.h[0]
	; CHECK-NEXT: mov v2.h[1], v4.h[0]			; CHECK-NEXT: mov v2.h[1], v4.h[0]
	; CHECK-NEXT: fmov d0, d2			; CHECK-NEXT: fmov d0, d2
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
overmighty (Author): Ideally this would be `fmul h4, h0, h3`, but this is prevented by the patterns to avoid using scalar FMUL if it might introduce an extra DUP/`mov`.
	%a.real = shufflevector <2 x half> %a, <2 x half> poison, <1 x i32> <i32 0>			%a.real = shufflevector <2 x half> %a, <2 x half> poison, <1 x i32> <i32 0>
	%a.imag = shufflevector <2 x half> %a, <2 x half> poison, <1 x i32> <i32 1>			%a.imag = shufflevector <2 x half> %a, <2 x half> poison, <1 x i32> <i32 1>
	%b.real = shufflevector <2 x half> %b, <2 x half> poison, <1 x i32> <i32 0>			%b.real = shufflevector <2 x half> %b, <2 x half> poison, <1 x i32> <i32 0>
	%b.imag = shufflevector <2 x half> %b, <2 x half> poison, <1 x i32> <i32 1>			%b.imag = shufflevector <2 x half> %b, <2 x half> poison, <1 x i32> <i32 1>
	%0 = fmul fast <1 x half> %b.imag, %a.real			%0 = fmul fast <1 x half> %b.imag, %a.real
	%1 = fmul fast <1 x half> %b.real, %a.imag			%1 = fmul fast <1 x half> %b.real, %a.imag
	%2 = fadd fast <1 x half> %1, %0			%2 = fadd fast <1 x half> %1, %0
	%3 = fmul fast <1 x half> %b.real, %a.real			%3 = fmul fast <1 x half> %b.real, %a.real
	(117 lines not shown)

llvm/test/CodeGen/AArch64/fp16_intrinsic_lane.ll

	(222 lines not shown)
	; CHECK-NEXT: fmul v0.8h, v0.8h, v1.h[0]			; CHECK-NEXT: fmul v0.8h, v0.8h, v1.h[0]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%shuffle = shufflevector <8 x half> %b, <8 x half> undef, <8 x i32> zeroinitializer			%shuffle = shufflevector <8 x half> %b, <8 x half> undef, <8 x i32> zeroinitializer
	%mul = fmul <8 x half> %shuffle, %a			%mul = fmul <8 x half> %shuffle, %a
	ret <8 x half> %mul			ret <8 x half> %mul
	}			}

	define dso_local half @t_vmulh_lane_f16(half %a, <4 x half> %c, i32 %lane) {			define dso_local half @t_vmulh_lane0_f16(half %a, <4 x half> %c, i32 %lane) {
	; CHECK-LABEL: t_vmulh_lane_f16:			; CHECK-LABEL: t_vmulh_lane0_f16:
	; CHECK: // %bb.0: // %entry			; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1			; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
	; CHECK-NEXT: fmul h0, h0, v1.h[0]			; CHECK-NEXT: fmul h0, h0, h1
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%0 = extractelement <4 x half> %c, i32 0			%0 = extractelement <4 x half> %c, i32 0
	%1 = fmul half %0, %a			%1 = fmul half %0, %a
	ret half %1			ret half %1
	}			}

	define dso_local half @t_vmulh_laneq_f16(half %a, <8 x half> %c, i32 %lane) {			define dso_local half @t_vmulh_lane3_f16(half %a, <4 x half> %c, i32 %lane) {
	; CHECK-LABEL: t_vmulh_laneq_f16:			; CHECK-LABEL: t_vmulh_lane3_f16:
	; CHECK: // %bb.0: // %entry			; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: fmul h0, h0, v1.h[0]			; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
				; CHECK-NEXT: fmul h0, h0, v1.h[3]
				; CHECK-NEXT: ret
				entry:
				%0 = extractelement <4 x half> %c, i32 3
				%1 = fmul half %0, %a
				ret half %1
				}

				define dso_local half @t_vmulh_laneq0_f16(half %a, <8 x half> %c, i32 %lane) {
				; CHECK-LABEL: t_vmulh_laneq0_f16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fmul h0, h0, h1
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%0 = extractelement <8 x half> %c, i32 0			%0 = extractelement <8 x half> %c, i32 0
	%1 = fmul half %0, %a			%1 = fmul half %0, %a
	ret half %1			ret half %1
	}			}

				define dso_local half @t_vmulh_laneq7_f16(half %a, <8 x half> %c, i32 %lane) {
				; CHECK-LABEL: t_vmulh_laneq7_f16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fmul h0, h0, v1.h[7]
				; CHECK-NEXT: ret
				entry:
				%0 = extractelement <8 x half> %c, i32 7
				%1 = fmul half %0, %a
				ret half %1
				}

	define dso_local half @t_vmulx_f16(half %a, half %b) {			define dso_local half @t_vmulx_f16(half %a, half %b) {
	; CHECK-LABEL: t_vmulx_f16:			; CHECK-LABEL: t_vmulx_f16:
	; CHECK: // %bb.0: // %entry			; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: fmulx h0, h0, h1			; CHECK-NEXT: fmulx h0, h0, h1
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%fmulx.i = tail call half @llvm.aarch64.neon.fmulx.f16(half %a, half %b)			%fmulx.i = tail call half @llvm.aarch64.neon.fmulx.f16(half %a, half %b)
	ret half %fmulx.i			ret half %fmulx.i
	}			}

	define dso_local half @t_vmulxh_lane_f16(half %a, <4 x half> %b, i32 %lane) {			define dso_local half @t_vmulxh_lane0_f16(half %a, <4 x half> %b) {
	; CHECK-LABEL: t_vmulxh_lane_f16:			; CHECK-LABEL: t_vmulxh_lane0_f16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
				; CHECK-NEXT: fmulx h0, h0, h1
				; CHECK-NEXT: ret
				entry:
				%extract = extractelement <4 x half> %b, i32 0
				%fmulx.i = tail call half @llvm.aarch64.neon.fmulx.f16(half %a, half %extract)
				ret half %fmulx.i
				}

				define dso_local half @t_vmulxh_lane3_f16(half %a, <4 x half> %b, i32 %lane) {
				; CHECK-LABEL: t_vmulxh_lane3_f16:
	; CHECK: // %bb.0: // %entry			; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1			; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
	; CHECK-NEXT: fmulx h0, h0, v1.h[3]			; CHECK-NEXT: fmulx h0, h0, v1.h[3]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%extract = extractelement <4 x half> %b, i32 3			%extract = extractelement <4 x half> %b, i32 3
	%fmulx.i = tail call half @llvm.aarch64.neon.fmulx.f16(half %a, half %extract)			%fmulx.i = tail call half @llvm.aarch64.neon.fmulx.f16(half %a, half %extract)
	ret half %fmulx.i			ret half %fmulx.i
	[...]
	; CHECK-NEXT: fmulx v0.8h, v0.8h, v1.h[0]			; CHECK-NEXT: fmulx v0.8h, v0.8h, v1.h[0]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%shuffle = shufflevector <8 x half> %b, <8 x half> undef, <8 x i32> zeroinitializer			%shuffle = shufflevector <8 x half> %b, <8 x half> undef, <8 x i32> zeroinitializer
	%vmulx2.i = tail call <8 x half> @llvm.aarch64.neon.fmulx.v8f16(<8 x half> %a, <8 x half> %shuffle) #4			%vmulx2.i = tail call <8 x half> @llvm.aarch64.neon.fmulx.v8f16(<8 x half> %a, <8 x half> %shuffle) #4
	ret <8 x half> %vmulx2.i			ret <8 x half> %vmulx2.i
	}			}

	define dso_local half @t_vmulxh_laneq_f16(half %a, <8 x half> %b, i32 %lane) {			define dso_local half @t_vmulxh_laneq0_f16(half %a, <8 x half> %b) {
	; CHECK-LABEL: t_vmulxh_laneq_f16:			; CHECK-LABEL: t_vmulxh_laneq0_f16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: fmulx h0, h0, h1
				; CHECK-NEXT: ret
				entry:
				%extract = extractelement <8 x half> %b, i32 0
				%fmulx.i = tail call half @llvm.aarch64.neon.fmulx.f16(half %a, half %extract)
				ret half %fmulx.i
				}

				define dso_local half @t_vmulxh_laneq7_f16(half %a, <8 x half> %b, i32 %lane) {
				; CHECK-LABEL: t_vmulxh_laneq7_f16:
	; CHECK: // %bb.0: // %entry			; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: fmulx h0, h0, v1.h[7]			; CHECK-NEXT: fmulx h0, h0, v1.h[7]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%extract = extractelement <8 x half> %b, i32 7			%extract = extractelement <8 x half> %b, i32 7
	%fmulx.i = tail call half @llvm.aarch64.neon.fmulx.f16(half %a, half %extract)			%fmulx.i = tail call half @llvm.aarch64.neon.fmulx.f16(half %a, half %extract)
	ret half %fmulx.i			ret half %fmulx.i
	}			}
	[...]
	; CHECK-NEXT: fmla h0, h1, v2.h[3]			; CHECK-NEXT: fmla h0, h1, v2.h[3]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%0 = fadd <4 x half> %c, %d			%0 = fadd <4 x half> %c, %d
	%extract = extractelement <4 x half> %0, i32 3			%extract = extractelement <4 x half> %0, i32 3
	%1 = tail call half @llvm.fma.f16(half %b, half %extract, half %a)			%1 = tail call half @llvm.fma.f16(half %b, half %extract, half %a)
	ret half %1			ret half %1
	}			}

				define half @test_fmulx_horizontal_f16(<2 x half> %v) {
				; CHECK-LABEL: test_fmulx_horizontal_f16:
				; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $q0
				; CHECK-NEXT: fmulx h0, h0, v0.h[1]
				; CHECK-NEXT: ret
				entry:
				%0 = extractelement <2 x half> %v, i32 0
				%1 = extractelement <2 x half> %v, i32 1
				%2 = call half @llvm.aarch64.neon.fmulx.f16(half %0, half %1)
				ret half %2
				}
				samtebbs: Some of the test changes, such as this one being moved and renamed, make it hard to see what has changed. Could you try to reduce the number of such differences?
				overmighty (author): Sure, I was trying to maintain some kind of consistency. I have also spotted duplicate tests, I don't know if it would be worth it to refactor some of the tests in the future.
				samtebbs: Yeah refactoring in a future patch with no functional changes sounds good to me.

llvm/test/CodeGen/AArch64/vecreduce-fmul-legalization-strict.ll

	[...]
	}			}

	define float @test_v16f32(<16 x float> %a) nounwind {			define float @test_v16f32(<16 x float> %a) nounwind {
	; CHECK-LABEL: test_v16f32:			; CHECK-LABEL: test_v16f32:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: fmul s4, s0, v0.s[1]			; CHECK-NEXT: fmul s4, s0, v0.s[1]
	; CHECK-NEXT: fmul s4, s4, v0.s[2]			; CHECK-NEXT: fmul s4, s4, v0.s[2]
	; CHECK-NEXT: fmul s0, s4, v0.s[3]			; CHECK-NEXT: fmul s0, s4, v0.s[3]
	; CHECK-NEXT: fmul s0, s0, v1.s[0]			; CHECK-NEXT: fmul s0, s0, s1
	; CHECK-NEXT: fmul s0, s0, v1.s[1]			; CHECK-NEXT: fmul s0, s0, v1.s[1]
	; CHECK-NEXT: fmul s0, s0, v1.s[2]			; CHECK-NEXT: fmul s0, s0, v1.s[2]
	; CHECK-NEXT: fmul s0, s0, v1.s[3]			; CHECK-NEXT: fmul s0, s0, v1.s[3]
	; CHECK-NEXT: fmul s0, s0, v2.s[0]			; CHECK-NEXT: fmul s0, s0, s2
	; CHECK-NEXT: fmul s0, s0, v2.s[1]			; CHECK-NEXT: fmul s0, s0, v2.s[1]
	; CHECK-NEXT: fmul s0, s0, v2.s[2]			; CHECK-NEXT: fmul s0, s0, v2.s[2]
	; CHECK-NEXT: fmul s0, s0, v2.s[3]			; CHECK-NEXT: fmul s0, s0, v2.s[3]
	; CHECK-NEXT: fmul s0, s0, v3.s[0]			; CHECK-NEXT: fmul s0, s0, s3
	; CHECK-NEXT: fmul s0, s0, v3.s[1]			; CHECK-NEXT: fmul s0, s0, v3.s[1]
	; CHECK-NEXT: fmul s0, s0, v3.s[2]			; CHECK-NEXT: fmul s0, s0, v3.s[2]
	; CHECK-NEXT: fmul s0, s0, v3.s[3]			; CHECK-NEXT: fmul s0, s0, v3.s[3]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%b = call float @llvm.vector.reduce.fmul.f32.v16f32(float 1.0, <16 x float> %a)			%b = call float @llvm.vector.reduce.fmul.f32.v16f32(float 1.0, <16 x float> %a)
	ret float %b			ret float %b
	}			}
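
The updated CHECK lines in this file and in fp16_intrinsic_lane.ll follow one visible rule: a multiply by lane 0 of a vector register now uses the plain scalar register (`s1`, `h1`), while every other lane keeps the indexed "by element" form (`v1.s[1]`, `v1.h[3]`). The sketch below is a toy Python model of that operand choice only. It is not LLVM or TableGen code, and the helper names `fmul_operand` and `reduction_operands` are made up for illustration.

```python
# Toy model (not LLVM code) of the operand choice seen in the CHECK lines.
# Assumption: only the spelling of the second FMUL/FMULX operand is modelled;
# register allocation and destination registers are ignored.

def fmul_operand(vreg: int, lane: int, elt: str = "s") -> str:
    """Return the assembly spelling of the lane operand."""
    if lane == 0:
        # Lane 0 of vN is the low scalar subregister (hN, sN, dN),
        # so the scalar instruction form can be used.
        return f"{elt}{vreg}"
    # Any other lane still needs the indexed (by element) form.
    return f"v{vreg}.{elt}[{lane}]"


def reduction_operands(num_vregs: int, lanes_per_vreg: int, elt: str = "s"):
    """Lane operands for an in-order fmul reduction over several registers.

    The very first element (register 0, lane 0) is skipped because it
    serves as the initial accumulator rather than as a multiplier.
    """
    ops = []
    for vreg in range(num_vregs):
        for lane in range(lanes_per_vreg):
            if vreg == 0 and lane == 0:
                continue
            ops.append(fmul_operand(vreg, lane, elt))
    return ops


if __name__ == "__main__":
    # Four registers of four floats each, as in test_v16f32.
    for op in reduction_operands(4, 4):
        print(op)
```

Running it for four registers of four lanes lists fifteen operands, with the scalar spellings appearing exactly where the right-hand column of `test_v16f32` changed.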

Revision Contents

Diff 536132
