This is an archive of the discontinued LLVM Phabricator instance.

Differential D142998

[SVE][codegen] Add few more tests for MUL followed by ADD/SUB (NFC)
ClosedPublic

Authored by sushgokh on Jan 31 2023, 10:45 AM.

Details

Reviewers

SjoerdMeijer
david-arm
dmgreen
paulwalker-arm
c-rhodes
efriedma

Commits

rGee1299c6925b: [CodeGen][AArch64] Precommit additional tests for integer MLA/MAD/MLS/MSB (NFC)

Summary

These tests will form the base for an upcoming patch that generates pseudo instructions for MLA/MAD/MLS/MSB at ISel.

They also form the base for the attempt made in D142656.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

sushgokh created this revision. Jan 31 2023, 10:45 AM

Herald added a reviewer: efriedma. Jan 31 2023, 10:45 AM

Herald added a project: Restricted Project.

Herald added subscribers: psnobl, tschuett.

sushgokh requested review of this revision. Jan 31 2023, 10:45 AM

Herald added a project: Restricted Project. Jan 31 2023, 10:45 AM

Herald added a subscriber: llvm-commits.

sushgokh edited the summary of this revision. Jan 31 2023, 11:04 AM

Harbormaster completed remote builds in B211035: Diff 493671. Jan 31 2023, 12:07 PM

sushgokh mentioned this in D142656: [SVE][codegen] Add pattern for SVE multiply-add accumulate. Jan 31 2023, 12:16 PM

SjoerdMeijer added inline comments. Jan 31 2023, 12:19 PM

llvm/test/CodeGen/AArch64/sve-multiply-add-accumulate.ll
12 ↗	(On Diff #493671)	If I am not mistaken, the other patch will change this to:

; CHECK-LABEL: muladd_i64:
; CHECK: // %bb.0:
; CHECK-NEXT: ptrue p0.d
; CHECK-NEXT: mov z2.d, #1 // =0x1
; CHECK-NEXT: mad z0.d, p0/m, z1.d, z2.d
; CHECK-NEXT: ret

(Perhaps the immediate is different, but that looks irrelevant to me.) This new codegen looks better: it has one mov fewer. But that mov is only there to return the value. There is no real use of the value, which is what makes this move visible. Basically, what I am saying is that it is unclear what the benefit is. Either way, I will repeat my request from the other patch: let's separate things and let this be about instruction selection of 'mad'. This means that we only want to have 'mad' tests here, and to separate out the mla -> mad changes.
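The register roles behind this comment can be modelled lane by lane. The following is a minimal sketch, not code from the patch: it assumes all lanes are active, uses plain Python lists in place of vector registers, and the helper names (`wrap`, `mla`, `mad`, `ret_via_mla`, `ret_via_mad`) are hypothetical. MLA overwrites the accumulator, while MAD overwrites the first multiplicand, so when the addend lives in z2 and the result must be returned in z0, the MLA form needs an extra mov.

```python
def wrap(value, bits=8):
    """Truncate to an unsigned lane of the given width."""
    return value & ((1 << bits) - 1)


def mla(zda, zn, zm, bits=8):
    """Model of MLA Zda, Pg/M, Zn, Zm: Zda = Zda + Zn * Zm."""
    return [wrap(a + n * m, bits) for a, n, m in zip(zda, zn, zm)]


def mad(zdn, zm, za, bits=8):
    """Model of MAD Zdn, Pg/M, Zm, Za: Zdn = Za + Zdn * Zm."""
    return [wrap(a + n * m, bits) for n, m, a in zip(zdn, zm, za)]


def ret_via_mla(a, b, c):
    """Compute c + a * b with the result returned in z0, using MLA."""
    regs = {"z0": a, "z1": b, "z2": c}
    regs["z2"] = mla(regs["z2"], regs["z0"], regs["z1"])  # mla z2, p0/m, z0, z1
    regs["z0"] = list(regs["z2"])                         # mov z0.d, z2.d
    return regs["z0"]


def ret_via_mad(a, b, c):
    """Compute c + a * b with the result returned in z0, using MAD."""
    regs = {"z0": a, "z1": b, "z2": c}
    regs["z0"] = mad(regs["z0"], regs["z1"], regs["z2"])  # mad z0, p0/m, z1, z2
    return regs["z0"]


a, b, c = [1, 2, 3, 250], [4, 5, 6, 7], [10, 20, 30, 40]
print(ret_via_mla(a, b, c) == ret_via_mad(a, b, c))
```

Both sequences produce the same lanes; the only difference in this model is the extra register copy on the MLA path, which is the mov the comment refers to.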

sushgokh added inline comments. Jan 31 2023, 12:28 PM

llvm/test/CodeGen/AArch64/sve-multiply-add-accumulate.ll
12 ↗	(On Diff #493671)	I can just commit the tests for mad. But the changes for D142656 have an effect on tests that currently generate mla. If I commit tests for mla+mad (as I have done here), the comparison will be easier for D142656.

I think we need at least 3 patches:

1. Instruction selection of mad for pattern add ( mul, splat_vector(C)). That is D142656.
2. New tests for this mad pattern and isel. That is this patch, D142998.

So the above only deals with mad, and not with any mla -> mad changes. That's why I suggested to strip out any mla changes out of patches 1 and 2. If you want to make changes in this area, we will follow the same approach:

3. Create a patch to precommit new test, if applicable.
4. Create a patch that implements these mla -> mad changes.

What do you think, makes sense?

In D142998#4095821, @SjoerdMeijer wrote:

I think we need at least 3 patches:

1. Instruction selection of mad for pattern add ( mul, splat_vector(C)). That is D142656.
2. New tests for this mad pattern and isel. That is this patch, D142998.

So the above only deals with mad, and not with any mla -> mad changes. That's why I suggested to strip out any mla changes out of patches 1 and 2. If you want to make changes in this area, we will follow the same approach:

3. Create a patch to precommit new test, if applicable.
4. Create a patch that implements these mla -> mad changes.

What do you think, makes sense?

I tried separating (1,2) and (3,4) as you say above. However, mla->mad is a side effect of implementing (1), which makes separating (1) and (4) difficult unless we introduce some hacks to do so.

In D142998#4095916, @sushgokh wrote:

I tried separating (1,2) and (3,4) as you say above. However, mla->mad is a side effect of implementing (1), which makes separating (1) and (4) difficult unless we introduce some hacks to do so.

Ok, I will need to understand better why we can't separate things. But I will go back to the other patch for that.

Thanks for putting these tests in a precommit patch @sushgokh!

llvm/test/CodeGen/AArch64/sve-multiply-add-accumulate.ll
1 ↗	(On Diff #493671)	Perhaps it's better to just add these tests to the existing sve-int-mad-pred.ll file?
165 ↗	(On Diff #493671)	I'm not sure what this is doing that is different to `@muladd_i8_negativeAddend`? Are you just trying to test a case where the immediate is too big to fit into the 'add'?

sushgokh added inline comments. Feb 1 2023, 3:35 AM

llvm/test/CodeGen/AArch64/sve-multiply-add-accumulate.ll
1 ↗	(On Diff #493671)	I just thought it better to have a separate file because sve-int-mad-pred.ll only has intrinsics. If you prefer, I will add all the tests to that file.
165 ↗	(On Diff #493671)	In this test, unlike the other tests, the immediate is realised using 2 register moves.

Matt added a subscriber: Matt. Feb 7 2023, 8:33 PM

Adding a loop based test case to understand the effect of patch D142656 better

Harbormaster completed remote builds in B212574: Diff 495795. Feb 8 2023, 4:25 AM

sushgokh retitled this revision from [SVE][codegen] Add test case for a fused multiply-add (NFC) to [SVE][codegen] Add few more tests for MUL followed by ADD/SUB (NFC). Feb 23 2023, 2:23 AM

sushgokh edited the summary of this revision.

@paulwalker-arm adding test cases for:

Generating pseudo inst for MLA/MAD/MLS/MSB
Basic test cases on mul + add/sub as you had suggested in the other patch

Harbormaster completed remote builds in B215463: Diff 499778. Feb 23 2023, 3:42 AM

@paulwalker-arm Any comments/suggestions on this ?

In D142998#4170452, @sushgokh wrote:

@paulwalker-arm Any comments/suggestions on this ?

In truth I've ignored this patch whilst waiting on your other patch. My review comments on D142656 include the test file, which I'm recommending to follow the same structure as was used for the FMAs. Those being simple tests I'm happy for them to remain included with the patch that improves the isel. I think this patch can then be rebased on top of D142656 to clearly show the extra cases you care about.

In D142998#4170949, @paulwalker-arm wrote:

In D142998#4170452, @sushgokh wrote:

@paulwalker-arm Any comments/suggestions on this ?

In truth I've ignored this patch whilst waiting on your other patch. My review comments on D142656 include the test file, which I'm recommending to follow the same structure as was used for the FMAs. Those being simple tests I'm happy for them to remain included with the patch that improves the isel. I think this patch can then be rebased on top of D142656 to clearly show the extra cases you care about.

@paulwalker-arm As agreed in the last comment for D142656,

The motive of this patch is converting to pseudo instructions. I was assuming these are fairly simple tests with one of the operands as a shuffle. If you just expect one-liners, I think the tests are already present in this patch. Agreed?

So, to reiterate the line of action,

1. This patch will commit tests for converting to pseudo instructions
2. The next patch will actually convert to pseudo instructions
3. Patch for converting to FMA in D142656. I am not addressing D142656 immediately

Fair enough. Sorry for the misunderstanding.

llvm/test/CodeGen/AArch64/sve-generate-pseudo.ll
4 ↗	(On Diff #499778)	Please use `update_llc_test_checks.py` to autogenerate the `CHECK` lines. Or perhaps why not just place all the tests within sve-int-arith.ll? given they're related.
10 ↗	(On Diff #499778)	Please include equivalent tests for the other supported element types (i.e. i16, i32 and i64).

paulwalker-arm added inline comments. Mar 6 2023, 4:39 AM

llvm/test/CodeGen/AArch64/sve-generate-pseudo.ll
10 ↗	(On Diff #499778)	I forgot to mention. Please also add equivalent tests that are expected to result in FMAD/FMSB being emitted once the follow on patch lands.

sushgokh added inline comments. Mar 6 2023, 4:42 AM

llvm/test/CodeGen/AArch64/sve-generate-pseudo.ll
4 ↗	(On Diff #499778)	I am not autogenerating because we just want to check the pseudo instruction that's generated. I suppose we aren't interested in the other instructions. Will add tests for other types. I have not added them to sve-int-arith.ll for these reasons: (1) I want to avoid auto-generation, as said above; (2) the file name clearly states the purpose.
10 ↗	(On Diff #499778)	agreed. Thanks

sushgokh added inline comments. Mar 6 2023, 4:48 AM

llvm/test/CodeGen/AArch64/sve-generate-pseudo.ll
10 ↗	(On Diff #499778)	Current tests in sve-generate-pseudo.ll are for MAD/MSB once the follow on patch lands. i.e. currently, they are generating MLA/MLS but once the follow on patch lands, they will result in MAD/MSB being generated. So, thanks. Will add assembly instructions as well in the check lines to indicate actual instruction generated

paulwalker-arm added inline comments. Mar 6 2023, 5:08 AM

llvm/test/CodeGen/AArch64/sve-generate-pseudo.ll
4 ↗	(On Diff #499778)	The rationale here does not make sense to me, as the use of pseudo instructions is not relevant from a testing point of view. These tests are simply to show the expected output from a given blob of IR. That said, after looking at `sve-int-arith.ll` I can see now that these are just clones of existing tests (mla_i8 and mls_i8), so what I'm really asking is for this patch to extend those tests within `sve-int-arith.ll` to cover the other element types for all of mla, mad, mls, and msb.
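The coverage requested here (four instruction forms across four element types) can be sketched as a small generator of test bodies. This is an illustrative assumption rather than how the tests were actually written: the table and function names (`ELEMENT_TYPES`, `PATTERNS`, `make_test`, `make_all_tests`) are hypothetical, and the operand orders simply mirror the IR that appears in the diff further down. The CHECK lines are deliberately left out, since the review asks for them to be produced by `update_llc_test_checks.py`.

```python
# Element type -> lane count of the corresponding scalable vector type.
ELEMENT_TYPES = {"i8": 16, "i16": 8, "i32": 4, "i64": 2}

# Test-name prefix -> (IR opcode, multiplicands, operand added to / subtracted from).
PATTERNS = {
    "mla": ("add", ("%b", "%c"), "%a"),
    "mad": ("add", ("%a", "%b"), "%c"),
    "mls": ("sub", ("%b", "%c"), "%a"),
    "msb": ("sub", ("%a", "%b"), "%c"),
}


def make_test(prefix, elt):
    """Return the IR text of one mul + add/sub test function."""
    op, (lhs, rhs), acc = PATTERNS[prefix]
    vec = f"<vscale x {ELEMENT_TYPES[elt]} x {elt}>"
    return "\n".join([
        f"define {vec} @{prefix}_{elt}({vec} %a, {vec} %b, {vec} %c) {{",
        f"  %prod = mul {vec} {lhs}, {rhs}",
        f"  %res = {op} {vec} {acc}, %prod",
        f"  ret {vec} %res",
        "}",
    ])


def make_all_tests():
    """Return all sixteen test functions, separated by blank lines."""
    return "\n\n".join(make_test(p, e) for p in PATTERNS for e in ELEMENT_TYPES)


print(make_test("mla", "i64"))
```

Keeping the operand order in one table makes the difference between the forms explicit: whether the product is added to or subtracted from an operand that also feeds the multiply decides which instruction avoids a register copy.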

sushgokh updated this revision to Diff 502734. Mar 6 2023, 11:02 AM

paulwalker-arm added inline comments. Mar 6 2023, 11:09 AM

llvm/test/CodeGen/AArch64/sve-int-arith.ll
402–403	Does simplifying the mla tests to:

%prod = mul <vscale x 2 x i64> %b, %c
%res = add <vscale x 2 x i64> %a, %prod
ret <vscale x 2 x i64> %res

give the desired output?

A few recommended improvements but otherwise looks good.

llvm/test/CodeGen/AArch64/sve-int-arith.ll
342	Please name the tests after the expected resulting instruction, so `mad` in this case.
508–512	As above, I think this can be just:

%prod = mul <vscale x 8 x i16> %b, %c
%res = sub <vscale x 8 x i16> %a, %prod
ret <vscale x 8 x i16> %res

This revision is now accepted and ready to land. Mar 6 2023, 12:21 PM

Harbormaster completed remote builds in B217651: Diff 502734. Mar 6 2023, 3:31 PM

sushgokh added inline comments. Mar 6 2023, 11:15 PM

llvm/test/CodeGen/AArch64/sve-int-arith.ll
342	So, I have named it as per the current instruction. Once the pseudo instr generation patch lands, will name as per appropriate instruction. Sounds good?
402–403	Yes, it works. Thanks. Will update tests.

Closed by commit rGee1299c6925b: [CodeGen][AArch64] Precommit additional tests for integer MLA/MAD/MLS/MSB (NFC) (authored by sushgokh). Mar 7 2023, 12:29 AM

This revision was automatically updated to reflect the committed changes.

sushgokh added a commit: rGee1299c6925b: [CodeGen][AArch64] Precommit additional tests for integer MLA/MAD/MLS/MSB (NFC).

Revision Contents

Path: llvm/test/CodeGen/AArch64/sve-int-arith.ll
Size: 394 lines

Diff 502942

llvm/test/CodeGen/AArch64/sve-int-arith.ll

	; CHECK-LABEL: uqsub_i8:			; CHECK-LABEL: uqsub_i8:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: uqsub z0.b, z0.b, z1.b			; CHECK-NEXT: uqsub z0.b, z0.b, z1.b
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%res = call <vscale x 16 x i8> @llvm.usub.sat.nxv16i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b)			%res = call <vscale x 16 x i8> @llvm.usub.sat.nxv16i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
	ret <vscale x 16 x i8> %res			ret <vscale x 16 x i8> %res
	}			}

	define <vscale x 16 x i8> @mla_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, <vscale x 16 x i8> %c) {			; Next four cases should generate mad instruction once pseudo instructions are emitted for MLA/MAD
	; CHECK-LABEL: mla_i8:
				define <vscale x 16 x i8> @mad_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, <vscale x 16 x i8> %c) {
				; CHECK-LABEL: mad_i8:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ptrue p0.b			; CHECK-NEXT: ptrue p0.b
	; CHECK-NEXT: mla z2.b, p0/m, z0.b, z1.b			; CHECK-NEXT: mla z2.b, p0/m, z0.b, z1.b
	; CHECK-NEXT: mov z0.d, z2.d			; CHECK-NEXT: mov z0.d, z2.d
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%prod = mul <vscale x 16 x i8> %a, %b			%prod = mul <vscale x 16 x i8> %a, %b
	%res = add <vscale x 16 x i8> %c, %prod			%res = add <vscale x 16 x i8> %c, %prod
	ret <vscale x 16 x i8> %res			ret <vscale x 16 x i8> %res
	}			}

				define <vscale x 8 x i16> @mad_i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b, <vscale x 8 x i16> %c) {
				; CHECK-LABEL: mad_i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: mla z2.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: mov z0.d, z2.d
				; CHECK-NEXT: ret
				%prod = mul <vscale x 8 x i16> %a, %b
				%res = add <vscale x 8 x i16> %c, %prod
				ret <vscale x 8 x i16> %res
				}

				define <vscale x 4 x i32> @mad_i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, <vscale x 4 x i32> %c) {
				; CHECK-LABEL: mad_i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mla z2.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: mov z0.d, z2.d
				; CHECK-NEXT: ret
				%prod = mul <vscale x 4 x i32> %a, %b
				%res = add <vscale x 4 x i32> %c, %prod
				ret <vscale x 4 x i32> %res
				}

				define <vscale x 2 x i64> @mad_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b, <vscale x 2 x i64> %c) {
				; CHECK-LABEL: mad_i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mla z2.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: mov z0.d, z2.d
				; CHECK-NEXT: ret
				%prod = mul <vscale x 2 x i64> %a, %b
				%res = add <vscale x 2 x i64> %c, %prod
				ret <vscale x 2 x i64> %res
				}

				define <vscale x 16 x i8> @mla_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, <vscale x 16 x i8> %c) {
				; CHECK-LABEL: mla_i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.b
				; CHECK-NEXT: mla z0.b, p0/m, z1.b, z2.b
				; CHECK-NEXT: ret
				%prod = mul <vscale x 16 x i8> %b, %c
				%res = add <vscale x 16 x i8> %a, %prod
				ret <vscale x 16 x i8> %res
				}

				define <vscale x 8 x i16> @mla_i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b, <vscale x 8 x i16> %c) {
				; CHECK-LABEL: mla_i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: mla z0.h, p0/m, z1.h, z2.h
				; CHECK-NEXT: ret
				%prod = mul <vscale x 8 x i16> %b, %c
				%res = add <vscale x 8 x i16> %a, %prod
				ret <vscale x 8 x i16> %res
				}

				define <vscale x 4 x i32> @mla_i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, <vscale x 4 x i32> %c) {
				; CHECK-LABEL: mla_i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mla z0.s, p0/m, z1.s, z2.s
				; CHECK-NEXT: ret
				%prod = mul <vscale x 4 x i32> %b, %c
				%res = add <vscale x 4 x i32> %a, %prod
				ret <vscale x 4 x i32> %res
				}

				define <vscale x 2 x i64> @mla_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b, <vscale x 2 x i64> %c) {
				; CHECK-LABEL: mla_i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mla z0.d, p0/m, z1.d, z2.d
				; CHECK-NEXT: ret
				%prod = mul <vscale x 2 x i64> %b, %c
				%res = add <vscale x 2 x i64> %a, %prod
				ret <vscale x 2 x i64> %res
				}

	define <vscale x 16 x i8> @mla_i8_multiuse(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, <vscale x 16 x i8> %c, <vscale x 16 x i8>* %p) {			define <vscale x 16 x i8> @mla_i8_multiuse(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, <vscale x 16 x i8> %c, <vscale x 16 x i8>* %p) {
	; CHECK-LABEL: mla_i8_multiuse:			; CHECK-LABEL: mla_i8_multiuse:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ptrue p0.b			; CHECK-NEXT: ptrue p0.b
	; CHECK-NEXT: mul z1.b, p0/m, z1.b, z0.b			; CHECK-NEXT: mul z1.b, p0/m, z1.b, z0.b
	; CHECK-NEXT: add z0.b, z2.b, z1.b			; CHECK-NEXT: add z0.b, z2.b, z1.b
	; CHECK-NEXT: st1b { z1.b }, p0, [x0]			; CHECK-NEXT: st1b { z1.b }, p0, [x0]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%prod = mul <vscale x 16 x i8> %a, %b			%prod = mul <vscale x 16 x i8> %a, %b
	store <vscale x 16 x i8> %prod, <vscale x 16 x i8>* %p			store <vscale x 16 x i8> %prod, <vscale x 16 x i8>* %p
	%res = add <vscale x 16 x i8> %c, %prod			%res = add <vscale x 16 x i8> %c, %prod
	ret <vscale x 16 x i8> %res			ret <vscale x 16 x i8> %res
	}			}

	define <vscale x 16 x i8> @mls_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, <vscale x 16 x i8> %c) {			; Next four cases should generate msb instruction once psuedo instruction is emitted for MLS/MSB
	; CHECK-LABEL: mls_i8:
				define <vscale x 16 x i8> @msb_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, <vscale x 16 x i8> %c) {
				; CHECK-LABEL: msb_i8:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ptrue p0.b			; CHECK-NEXT: ptrue p0.b
	; CHECK-NEXT: mls z2.b, p0/m, z0.b, z1.b			; CHECK-NEXT: mls z2.b, p0/m, z0.b, z1.b
	; CHECK-NEXT: mov z0.d, z2.d			; CHECK-NEXT: mov z0.d, z2.d
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%prod = mul <vscale x 16 x i8> %a, %b			%prod = mul <vscale x 16 x i8> %a, %b
	%res = sub <vscale x 16 x i8> %c, %prod			%res = sub <vscale x 16 x i8> %c, %prod
	ret <vscale x 16 x i8> %res			ret <vscale x 16 x i8> %res
	}			}

				define <vscale x 8 x i16> @msb_i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b, <vscale x 8 x i16> %c) {
				; CHECK-LABEL: msb_i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: mls z2.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: mov z0.d, z2.d
				; CHECK-NEXT: ret
				%prod = mul <vscale x 8 x i16> %a, %b
				%res = sub <vscale x 8 x i16> %c, %prod
				ret <vscale x 8 x i16> %res
				}

				define <vscale x 4 x i32> @msb_i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, <vscale x 4 x i32> %c) {
				; CHECK-LABEL: msb_i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mls z2.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: mov z0.d, z2.d
				; CHECK-NEXT: ret
				%prod = mul <vscale x 4 x i32> %a, %b
				%res = sub <vscale x 4 x i32> %c, %prod
				ret <vscale x 4 x i32> %res
				}

				define <vscale x 2 x i64> @msb_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b, <vscale x 2 x i64> %c) {
				; CHECK-LABEL: msb_i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mls z2.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: mov z0.d, z2.d
				; CHECK-NEXT: ret
				%prod = mul <vscale x 2 x i64> %a, %b
				%res = sub <vscale x 2 x i64> %c, %prod
				ret <vscale x 2 x i64> %res
				}

				define <vscale x 16 x i8> @mls_i8(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b, <vscale x 16 x i8> %c) {
				; CHECK-LABEL: mls_i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.b
				; CHECK-NEXT: mls z0.b, p0/m, z1.b, z2.b
				; CHECK-NEXT: ret
				%prod = mul <vscale x 16 x i8> %b, %c
				%res = sub <vscale x 16 x i8> %a, %prod
				ret <vscale x 16 x i8> %res
				}

				define <vscale x 8 x i16> @mls_i16(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b, <vscale x 8 x i16> %c) {
				; CHECK-LABEL: mls_i16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: mls z0.h, p0/m, z1.h, z2.h
				; CHECK-NEXT: ret
				%prod = mul <vscale x 8 x i16> %b, %c
				%res = sub <vscale x 8 x i16> %a, %prod
				ret <vscale x 8 x i16> %res
				}

				define <vscale x 4 x i32> @mls_i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, <vscale x 4 x i32> %c) {
				; CHECK-LABEL: mls_i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mls z0.s, p0/m, z1.s, z2.s
				; CHECK-NEXT: ret
				%prod = mul <vscale x 4 x i32> %b, %c
				%res = sub <vscale x 4 x i32> %a, %prod
				ret <vscale x 4 x i32> %res
				}

				define <vscale x 2 x i64> @mls_i64(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b, <vscale x 2 x i64> %c) {
				; CHECK-LABEL: mls_i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mls z0.d, p0/m, z1.d, z2.d
				; CHECK-NEXT: ret
				%prod = mul <vscale x 2 x i64> %b, %c
				%res = sub <vscale x 2 x i64> %a, %prod
				ret <vscale x 2 x i64> %res
				}

				; Test cases below have one of the add/sub operands as constant splat

				define <vscale x 2 x i64> @muladd_i64_positiveAddend(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				; CHECK-LABEL: muladd_i64_positiveAddend:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mov z2.d, #0xffffffff
				; CHECK-NEXT: mla z2.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: mov z0.d, z2.d
				; CHECK-NEXT: ret
				{
				%1 = mul <vscale x 2 x i64> %a, %b
				%2 = add <vscale x 2 x i64> %1, shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 4294967295, i64 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
				ret <vscale x 2 x i64> %2
				}

				define <vscale x 2 x i64> @muladd_i64_negativeAddend(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				; CHECK-LABEL: muladd_i64_negativeAddend:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mov z2.d, #0xffffffff00000001
				; CHECK-NEXT: mla z2.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: mov z0.d, z2.d
				; CHECK-NEXT: ret
				{
				%1 = mul <vscale x 2 x i64> %a, %b
				%2 = add <vscale x 2 x i64> %1, shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 -4294967295, i64 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
				ret <vscale x 2 x i64> %2
				}


				define <vscale x 4 x i32> @muladd_i32_positiveAddend(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
				; CHECK-LABEL: muladd_i32_positiveAddend:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mov z2.s, #0x10000
				; CHECK-NEXT: mla z2.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: mov z0.d, z2.d
				; CHECK-NEXT: ret
				{
				%1 = mul <vscale x 4 x i32> %a, %b
				%2 = add <vscale x 4 x i32> %1, shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 65536, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
				ret <vscale x 4 x i32> %2
				}

				define <vscale x 4 x i32> @muladd_i32_negativeAddend(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
				; CHECK-LABEL: muladd_i32_negativeAddend:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mov z2.s, #0xffff0000
				; CHECK-NEXT: mla z2.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: mov z0.d, z2.d
				; CHECK-NEXT: ret
				{
				%1 = mul <vscale x 4 x i32> %a, %b
				%2 = add <vscale x 4 x i32> %1, shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 -65536, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
				ret <vscale x 4 x i32> %2
				}

				define <vscale x 8 x i16> @muladd_i16_positiveAddend(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
				; CHECK-LABEL: muladd_i16_positiveAddend:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: mul z0.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: add z0.h, z0.h, #255 // =0xff
				; CHECK-NEXT: ret
				{
				%1 = mul <vscale x 8 x i16> %a, %b
				%2 = add <vscale x 8 x i16> %1, shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 255, i16 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer)
				ret <vscale x 8 x i16> %2
				}

				define <vscale x 8 x i16> @muladd_i16_negativeAddend(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
				; CHECK-LABEL: muladd_i16_negativeAddend:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: mov z2.h, #-255 // =0xffffffffffffff01
				; CHECK-NEXT: mla z2.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: mov z0.d, z2.d
				; CHECK-NEXT: ret
				{
				%1 = mul <vscale x 8 x i16> %a, %b
				%2 = add <vscale x 8 x i16> %1, shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 -255, i16 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer)
				ret <vscale x 8 x i16> %2
				}

				define <vscale x 16 x i8> @muladd_i8_positiveAddend(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
				; CHECK-LABEL: muladd_i8_positiveAddend:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.b
				; CHECK-NEXT: mul z0.b, p0/m, z0.b, z1.b
				; CHECK-NEXT: add z0.b, z0.b, #15 // =0xf
				; CHECK-NEXT: ret
				{
				%1 = mul <vscale x 16 x i8> %a, %b
				%2 = add <vscale x 16 x i8> %1, shufflevector (<vscale x 16 x i8> insertelement (<vscale x 16 x i8> poison, i8 15, i8 0), <vscale x 16 x i8> poison, <vscale x 16 x i32> zeroinitializer)
				ret <vscale x 16 x i8> %2
				}

				define <vscale x 16 x i8> @muladd_i8_negativeAddend(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
				; CHECK-LABEL: muladd_i8_negativeAddend:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.b
				; CHECK-NEXT: mul z0.b, p0/m, z0.b, z1.b
				; CHECK-NEXT: add z0.b, z0.b, #241 // =0xf1
				; CHECK-NEXT: ret
				{
				%1 = mul <vscale x 16 x i8> %a, %b
				%2 = add <vscale x 16 x i8> %1, shufflevector (<vscale x 16 x i8> insertelement (<vscale x 16 x i8> poison, i8 -15, i8 0), <vscale x 16 x i8> poison, <vscale x 16 x i32> zeroinitializer)
				ret <vscale x 16 x i8> %2
				}

				define <vscale x 2 x i64> @mulsub_i64_positiveAddend(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				; CHECK-LABEL: mulsub_i64_positiveAddend:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mul z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: mov z1.d, #0xffffffff
				; CHECK-NEXT: sub z0.d, z0.d, z1.d
				; CHECK-NEXT: ret
				{
				%1 = mul <vscale x 2 x i64> %a, %b
				%2 = sub <vscale x 2 x i64> %1, shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 4294967295, i64 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
				ret <vscale x 2 x i64> %2
				}

				define <vscale x 2 x i64> @mulsub_i64_negativeAddend(<vscale x 2 x i64> %a, <vscale x 2 x i64> %b)
				; CHECK-LABEL: mulsub_i64_negativeAddend:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: mul z0.d, p0/m, z0.d, z1.d
				; CHECK-NEXT: mov z1.d, #0xffffffff00000001
				; CHECK-NEXT: sub z0.d, z0.d, z1.d
				; CHECK-NEXT: ret
				{
				%1 = mul <vscale x 2 x i64> %a, %b
				%2 = sub <vscale x 2 x i64> %1, shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 -4294967295, i64 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
				ret <vscale x 2 x i64> %2
				}


				define <vscale x 4 x i32> @mulsub_i32_positiveAddend(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
				; CHECK-LABEL: mulsub_i32_positiveAddend:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mul z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: mov z1.s, #0x10000
				; CHECK-NEXT: sub z0.s, z0.s, z1.s
				; CHECK-NEXT: ret
				{
				%1 = mul <vscale x 4 x i32> %a, %b
				%2 = sub <vscale x 4 x i32> %1, shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 65536, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
				ret <vscale x 4 x i32> %2
				}

				define <vscale x 4 x i32> @mulsub_i32_negativeAddend(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b)
				; CHECK-LABEL: mulsub_i32_negativeAddend:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: mul z0.s, p0/m, z0.s, z1.s
				; CHECK-NEXT: mov z1.s, #0xffff0000
				; CHECK-NEXT: sub z0.s, z0.s, z1.s
				; CHECK-NEXT: ret
				{
				%1 = mul <vscale x 4 x i32> %a, %b
				%2 = sub <vscale x 4 x i32> %1, shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 -65536, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
				ret <vscale x 4 x i32> %2
				}

				define <vscale x 8 x i16> @mulsub_i16_positiveAddend(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
				; CHECK-LABEL: mulsub_i16_positiveAddend:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: mul z0.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: sub z0.h, z0.h, #255 // =0xff
				; CHECK-NEXT: ret
				{
				%1 = mul <vscale x 8 x i16> %a, %b
				%2 = sub <vscale x 8 x i16> %1, shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 255, i16 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer)
				ret <vscale x 8 x i16> %2
				}

				define <vscale x 8 x i16> @mulsub_i16_negativeAddend(<vscale x 8 x i16> %a, <vscale x 8 x i16> %b)
				; CHECK-LABEL: mulsub_i16_negativeAddend:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.h
				; CHECK-NEXT: mul z0.h, p0/m, z0.h, z1.h
				; CHECK-NEXT: mov z1.h, #-255 // =0xffffffffffffff01
				; CHECK-NEXT: sub z0.h, z0.h, z1.h
				; CHECK-NEXT: ret
				{
				%1 = mul <vscale x 8 x i16> %a, %b
				%2 = sub <vscale x 8 x i16> %1, shufflevector (<vscale x 8 x i16> insertelement (<vscale x 8 x i16> poison, i16 -255, i16 0), <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer)
				ret <vscale x 8 x i16> %2
				}

				define <vscale x 16 x i8> @mulsub_i8_positiveAddend(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
				; CHECK-LABEL: mulsub_i8_positiveAddend:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.b
				; CHECK-NEXT: mul z0.b, p0/m, z0.b, z1.b
				; CHECK-NEXT: sub z0.b, z0.b, #15 // =0xf
				; CHECK-NEXT: ret
				{
				%1 = mul <vscale x 16 x i8> %a, %b
				%2 = sub <vscale x 16 x i8> %1, shufflevector (<vscale x 16 x i8> insertelement (<vscale x 16 x i8> poison, i8 15, i8 0), <vscale x 16 x i8> poison, <vscale x 16 x i32> zeroinitializer)
				ret <vscale x 16 x i8> %2
				}

				define <vscale x 16 x i8> @mulsub_i8_negativeAddend(<vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
				; CHECK-LABEL: mulsub_i8_negativeAddend:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.b
				; CHECK-NEXT: mul z0.b, p0/m, z0.b, z1.b
				; CHECK-NEXT: sub z0.b, z0.b, #241 // =0xf1
				; CHECK-NEXT: ret
				{
				%1 = mul <vscale x 16 x i8> %a, %b
				%2 = sub <vscale x 16 x i8> %1, shufflevector (<vscale x 16 x i8> insertelement (<vscale x 16 x i8> poison, i8 -15, i8 0), <vscale x 16 x i8> poison, <vscale x 16 x i32> zeroinitializer)
				ret <vscale x 16 x i8> %2
				}

	declare <vscale x 16 x i8> @llvm.sadd.sat.nxv16i8(<vscale x 16 x i8>, <vscale x 16 x i8>)			declare <vscale x 16 x i8> @llvm.sadd.sat.nxv16i8(<vscale x 16 x i8>, <vscale x 16 x i8>)
	declare <vscale x 8 x i16> @llvm.sadd.sat.nxv8i16(<vscale x 8 x i16>, <vscale x 8 x i16>)			declare <vscale x 8 x i16> @llvm.sadd.sat.nxv8i16(<vscale x 8 x i16>, <vscale x 8 x i16>)
	declare <vscale x 4 x i32> @llvm.sadd.sat.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i32>)			declare <vscale x 4 x i32> @llvm.sadd.sat.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i32>)
	declare <vscale x 2 x i64> @llvm.sadd.sat.nxv2i64(<vscale x 2 x i64>, <vscale x 2 x i64>)			declare <vscale x 2 x i64> @llvm.sadd.sat.nxv2i64(<vscale x 2 x i64>, <vscale x 2 x i64>)

	declare <vscale x 16 x i8> @llvm.ssub.sat.nxv16i8(<vscale x 16 x i8>, <vscale x 16 x i8>)			declare <vscale x 16 x i8> @llvm.ssub.sat.nxv16i8(<vscale x 16 x i8>, <vscale x 16 x i8>)
	declare <vscale x 8 x i16> @llvm.ssub.sat.nxv8i16(<vscale x 8 x i16>, <vscale x 8 x i16>)			declare <vscale x 8 x i16> @llvm.ssub.sat.nxv8i16(<vscale x 8 x i16>, <vscale x 8 x i16>)
	declare <vscale x 4 x i32> @llvm.ssub.sat.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i32>)			declare <vscale x 4 x i32> @llvm.ssub.sat.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i32>)
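
The immediates in the CHECK lines of the constant-splat tests above follow from two's-complement wrapping of the splat value to the lane width. The sketch below reproduces that arithmetic. The `fits_add_imm` helper is a simplified model and an assumption on my part (an unsigned 8-bit immediate, optionally shifted left by 8 for lanes wider than a byte), not code taken from LLVM; both function names are hypothetical.

```python
def as_unsigned(value, bits):
    """Two's-complement bit pattern of value in a lane of the given width."""
    return value & ((1 << bits) - 1)


def fits_add_imm(pattern, bits):
    """Simplified, assumed model of the SVE add/sub immediate range:
    an unsigned 8-bit value, optionally shifted left by 8 for non-byte lanes."""
    if pattern <= 0xFF:
        return True
    return bits > 8 and pattern % 256 == 0 and pattern <= 0xFF00


# (lane width, splat constant) pairs taken from the tests above.
SPLATS = [(8, 15), (8, -15), (16, 255), (16, -255),
          (32, 65536), (32, -65536), (64, 4294967295), (64, -4294967295)]

for bits, value in SPLATS:
    pattern = as_unsigned(value, bits)
    kind = "add/sub immediate" if fits_add_imm(pattern, bits) else "mov to a register"
    print(f"i{bits} splat {value}: {pattern:#x} -> {kind}")
```

Under this model, only the i8 splats and the positive i16 splat fit the immediate form, which matches the tests where `add`/`sub` take `#15`, `#241`, or `#255` directly, while the remaining constants are first materialised with a `mov`.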