llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13968–13969	This also allows the case where the total VecSize == 32 (e.g. `<4 x i8>`, which is currently not supported by Neon), or where the number of elements is not a power of 2 (e.g. `<6 x i8>`). Can you add a test for this case?
13969	The indentation seems weird, did you use clang-format?
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-ld2-alloca.ll
45	nit: can you add `nounwind` as one of the attributes, to avoid the .cfi directives in the output?

hassnaa-arm marked an inline comment as done. Nov 28 2022, 9:24 AM

hassnaa-arm added inline comments.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13968–13969	In the IR, I don't understand why it allocates 16 elements while it only loads 8. Is there a reason behind that?

Add nounwind to avoid generating .cfi directives.

Harbormaster completed remote builds in B199796: Diff 478282. Nov 28 2022, 10:50 AM

Add extra test cases

Harbormaster completed remote builds in B199975: Diff 478510. Nov 29 2022, 3:50 AM

sdesmalen added inline comments. Nov 30 2022, 1:18 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13968–13969	Based on the code you've added, I expected the tests you've added to use an ld2, do you know why it doesn't?

hassnaa-arm marked an inline comment as done. Nov 30 2022, 5:56 AM

Change the shuffle mask used in the test cases so that an interleaved load is used.

Harbormaster completed remote builds in B200257: Diff 478925. Nov 30 2022, 5:57 AM

sdesmalen added inline comments. Nov 30 2022, 6:09 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13970	Sorry, I just realise you'll also need to add a check that we can generate a predicate pattern for the number of elements (e.g. vl2, vl3, ...), because e.g. a <9 x i8> has no corresponding predicate pattern. You can use `Optional<unsigned> getSVEPredPatternFromNumElements()` for this (defined in Utils/AArch64BaseInfo.h). Can you also add a test for <9 x i8> ?

hassnaa-arm added inline comments. Nov 30 2022, 6:13 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13970	Sorry, I don't understand why I should check for e.g. a <9 x i8>. How is that related to the condition in the code? And what do you mean by "You can use Optional<unsigned> getSVEPredPatternFromNumElements() for this"? Do you mean that, in addition to adding the check, I should also make a code change?

sdesmalen added inline comments. Nov 30 2022, 6:20 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13970	What I meant was that if you use a shuffle mask like this:

    %load = load <6 x i8>, ptr %alloc
    %strided.vec = shufflevector <6 x i8> %load, <6 x i8> poison, <3 x i32> <i32 1, i32 3, i32 5>

you get the following LD2 instruction:

    ptrue p0.b, vl3
    ld2b { z0.b, z1.b }, p0/z, [x8]

which uses `vl3` for the predicate, meaning 3 lanes are enabled. But there is no `vl9`, so if you end up with NumElements == 9, you can't code-generate the interleaved access using LD2. To ask whether there is a ptrue predicate pattern for `NumElements`, you can use `getSVEPredPatternFromNumElements`. (By the way, I meant <9 x i8> as the result type of the shuffle.)
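The mask-to-predicate relationship in the comment above can be sketched in standalone C++. This is a minimal illustration, not LLVM's InterleavedAccess pass, and the helper names `matchDeinterleave2` and `activeLanesForLd2` are hypothetical. It checks whether a shuffle mask picks every second element starting at lane 0 or 1 (the factor-2 case that maps to LD2) and reports how many lanes the `ptrue ..., vlN` predicate would have to enable, which is the length of the shuffle result.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper (not an LLVM API): if Mask selects elements
// Start, Start + 2, Start + 4, ... with Start in {0, 1}, return Start.
// Otherwise return std::nullopt. Only the factor-2 (LD2) case is modelled.
std::optional<unsigned> matchDeinterleave2(const std::vector<int> &Mask) {
  if (Mask.empty())
    return std::nullopt;
  for (unsigned Start = 0; Start < 2; ++Start) {
    bool Matches = true;
    for (std::size_t I = 0; I < Mask.size(); ++I) {
      if (Mask[I] != static_cast<int>(Start + 2 * I)) {
        Matches = false;
        break;
      }
    }
    if (Matches)
      return Start;
  }
  return std::nullopt;
}

// Hypothetical helper: number of lanes the LD2 predicate must enable,
// i.e. one per element of each de-interleaved result (the mask length).
std::size_t activeLanesForLd2(const std::vector<int> &Mask) {
  return Mask.size();
}
```

For the review's mask `<1, 3, 5>`, `matchDeinterleave2` yields 1 (the odd field) and three lanes are needed, matching `ptrue p0.b, vl3`; a 9-element result would need a `vl9` pattern, which does not exist.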

hassnaa-arm added inline comments. Nov 30 2022, 6:31 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13970	You mean that if the result type of the shuffle is <9 x i8> there will be a problem, and you are asking me to add a test case for that and fix the problem, correct?

sdesmalen added inline comments. Nov 30 2022, 6:32 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13970	Correct.

hassnaa-arm added inline comments. Nov 30 2022, 6:44 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13970	But why did you choose vl9 specifically? What about other vl values, e.g. vl10?

sdesmalen added inline comments. Nov 30 2022, 6:50 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13970	Because the available predicate patterns are vl1, vl2, vl3, ..., vl8, vl16, ... So vl9 is the first non-power-of-2 vector length that can't be represented.
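A standalone sketch of the lookup described above, assuming it behaves as stated in the thread (vl1 to vl8, then only powers of two). The name `predPatternForNumElements` is hypothetical and this is not the LLVM function; the upper bound of vl256 and the numeric values (the PTRUE pattern encodings from the Arm architecture, where vl16 is 9, vl32 is 10, and so on) are background assumptions rather than details taken from the review.

```cpp
#include <cassert>
#include <optional>

// Hypothetical re-implementation modelled on the behaviour described in the
// review for getSVEPredPatternFromNumElements (Utils/AArch64BaseInfo.h).
// Returned values are the PTRUE pattern encodings: vl1..vl8 encode as 1..8,
// vl16 as 9, vl32 as 10, vl64 as 11, vl128 as 12, vl256 as 13.
std::optional<unsigned> predPatternForNumElements(unsigned NumElts) {
  if (NumElts >= 1 && NumElts <= 8)
    return NumElts; // vl1 .. vl8
  switch (NumElts) {
  case 16:
    return 9u; // vl16
  case 32:
    return 10u; // vl32
  case 64:
    return 11u; // vl64
  case 128:
    return 12u; // vl128
  case 256:
    return 13u; // vl256
  default:
    return std::nullopt; // e.g. 0, 9, 10, ..., 15: no vlN pattern
  }
}
```

With this table, 3 maps to a pattern while 9 through 15 do not, which is why `<9 x i8>` is the smallest shuffle result the check has to reject. Because the result is a `std::optional`, a `!predPatternForNumElements(N)` test (the same shape as the nit's `!getSVEPredPatternFromNumElements(NumElements)`) asks only whether a pattern exists.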

hassnaa-arm marked an inline comment as done. Nov 30 2022, 7:08 AM

Skip using the interleaved load for vl9, because the vl9 predicate pattern is not available.

Harbormaster completed remote builds in B200267: Diff 478939. Nov 30 2022, 7:08 AM

LGTM with nit addressed

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13962	nit: you can just do `!getSVEPredPatternFromNumElements(NumElements))`

This revision is now accepted and ready to land. Nov 30 2022, 7:11 AM

hassnaa-arm marked an inline comment as done. Nov 30 2022, 7:57 AM

Add check for hasSVE() to avoid affecting NEON interleaved-access tests
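Why gating the predicate-pattern check on SVE availability might matter can be illustrated with a simplified model. This is an assumption-labelled sketch, not the landed code: `TargetModel`, `hasVLPattern`, `passesElementCountChecks`, and `neonSizeOK` are hypothetical names. A shuffle result such as `<48 x i8>` (384 bits) satisfies the existing NEON size rule, since 384 is a multiple of 128, yet has no `vl48` pattern, so an unconditional pattern check would also reject it on NEON-only targets.

```cpp
#include <cassert>

// Hypothetical, simplified target description: only SVE availability matters
// for this model.
struct TargetModel {
  bool HasSVE;
};

// True if a "ptrue ..., vlN" pattern exists for N (vl1-vl8, vl16 ... vl256).
bool hasVLPattern(unsigned N) {
  return (N >= 1 && N <= 8) || N == 16 || N == 32 || N == 64 || N == 128 ||
         N == 256;
}

// Element-count early exits, assuming the pattern check only applies when
// SVE is available (as the "Add check for hasSVE()" update suggests).
bool passesElementCountChecks(const TargetModel &T, unsigned NumElements) {
  if (NumElements < 2)
    return false;
  if (T.HasSVE && !hasVLPattern(NumElements))
    return false;
  return true;
}

// NEON size rule from isLegalInterleavedAccessType: 64 bits or a multiple of
// 128 bits (larger types are split into several interleaved accesses).
bool neonSizeOK(unsigned VecSizeInBits) {
  return VecSizeInBits == 64 || VecSizeInBits % 128 == 0;
}
```

In this model the NEON-only target still accepts 48 elements, while an SVE target rejects both 48 and 9.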

Harbormaster completed remote builds in B200400: Diff 479131. Nov 30 2022, 6:09 PM

This revision was landed with ongoing or failed builds. Nov 30 2022, 6:31 PM

Closed by commit rG279c0a83aa22: [AArch64][SME]: Generate streaming-compatible code for ld2-alloca. (authored by Hassnaa Hamdi <hassnaa.hamdi@arm.com>).

This revision was automatically updated to reflect the committed changes.

Hassnaa Hamdi <hassnaa.hamdi@arm.com> mentioned this in rG45adca0f52af: [AArch64][SME]: Add precursory tests for D138791.

Hassnaa Hamdi <hassnaa.hamdi@arm.com> added a commit: rG279c0a83aa22: [AArch64][SME]: Generate streaming-compatible code for ld2-alloca..

Failed Tests (1):

LLVM :: CodeGen/AArch64/sve-streaming-mode-fixed-length-ld2-alloca.ll

Testing Time: 345.12s

Skipped          :    38
Unsupported      :  1357
Passed           : 88906
Expectedly Failed:   195
Failed           :     1

https://lab.llvm.org/buildbot/#/builders/85/builds/12665

david-arm added a reverting change: rG4a5ccf4e9342: Revert "[AArch64][SME]: Generate streaming-compatible code for ld2-alloca.". Dec 1 2022, 2:22 AM

david-arm mentioned this in rG06846596eb17: Revert "[AArch64][SME]: Add precursory tests for D138791". Dec 1 2022, 3:14 AM

uabelho added a subscriber: uabelho. Dec 1 2022, 3:37 AM

sdesmalen mentioned this in rGea11f4ff0ada: Reland "[AArch64][SME]: Add precursory tests for D138791". Dec 1 2022, 6:49 AM

sdesmalen mentioned this in rGd32c9e8384e9: Reland "[AArch64][SME]: Generate streaming-compatible code for ld2-alloca.".

Diff 478282

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp


 bool AArch64TargetLowering::isLegalInterleavedAccessType(
 ...
   unsigned VecSize = DL.getTypeSizeInBits(VecTy);
   unsigned ElSize = DL.getTypeSizeInBits(VecTy->getElementType());
   unsigned NumElements = cast<FixedVectorType>(VecTy)->getNumElements();

   UseScalable = false;

   // Ensure the number of vector elements is greater than 1.
   if (NumElements < 2)
     return false;

   // Ensure the element type is legal.
   if (ElSize != 8 && ElSize != 16 && ElSize != 32 && ElSize != 64)
     return false;

-  if (Subtarget->useSVEForFixedLengthVectors() &&
+  if (Subtarget->forceStreamingCompatibleSVE() ||
+      (Subtarget->useSVEForFixedLengthVectors() &&
       (VecSize % Subtarget->getMinSVEVectorSizeInBits() == 0 ||
        (VecSize < Subtarget->getMinSVEVectorSizeInBits() &&
-        isPowerOf2_32(NumElements) && VecSize > 128))) {
+        isPowerOf2_32(NumElements) && VecSize > 128)))) {
     UseScalable = true;
     return true;
   }

   // Ensure the total vector size is 64 or a multiple of 128. Types larger than
   // 128 will be split into multiple interleaved accesses.
   return VecSize == 64 || VecSize % 128 == 0;
 }
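The condition in Diff 478282 can be modelled outside LLVM to see which shapes take the scalable path. In this sketch `SubtargetModel`, its fields, and `isPowerOf2` are hypothetical stand-ins for the `Subtarget` queries and `isPowerOf2_32` in the diff, and `VecSize` is computed as `NumElements * ElSize` instead of through `DataLayout`. It reflects the reviewer's first comment: under forced streaming-compatible mode, `<4 x i8>` (32 bits) and `<6 x i8>` (48 bits) are accepted even though the NEON rule would reject them, and the test's `<4 x double>` shuffle result (256 bits) takes the SVE path, consistent with the new `ptrue p0.d, vl4` / `ld2d` CHECK lines.

```cpp
#include <cassert>

// Hypothetical stand-ins for the Subtarget queries used in the diff.
// MinSVEVectorSizeInBits must be non-zero whenever UseSVEForFixedLengthVectors
// is set, because the model takes VecSize modulo it in that case.
struct SubtargetModel {
  bool ForceStreamingCompatibleSVE; // forceStreamingCompatibleSVE()
  bool UseSVEForFixedLengthVectors; // useSVEForFixedLengthVectors()
  unsigned MinSVEVectorSizeInBits;  // getMinSVEVectorSizeInBits()
};

bool isPowerOf2(unsigned X) { return X != 0 && (X & (X - 1)) == 0; }

// Mirrors the function body shown in the diff: returns whether the access is
// legal and sets UseScalable when the SVE (LD2x) path would be taken.
bool isLegalInterleavedAccessModel(const SubtargetModel &ST,
                                   unsigned NumElements, unsigned ElSize,
                                   bool &UseScalable) {
  unsigned VecSize = NumElements * ElSize;
  UseScalable = false;
  if (NumElements < 2)
    return false;
  if (ElSize != 8 && ElSize != 16 && ElSize != 32 && ElSize != 64)
    return false;
  if (ST.ForceStreamingCompatibleSVE ||
      (ST.UseSVEForFixedLengthVectors &&
       (VecSize % ST.MinSVEVectorSizeInBits == 0 ||
        (VecSize < ST.MinSVEVectorSizeInBits && isPowerOf2(NumElements) &&
         VecSize > 128)))) {
    UseScalable = true;
    return true;
  }
  // NEON: 64 bits or a multiple of 128 bits.
  return VecSize == 64 || VecSize % 128 == 0;
}
```

Note that this version has no predicate-pattern check yet, so a 9-element result would also be accepted on the scalable path; that gap is what the later `getSVEPredPatternFromNumElements` discussion addresses.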

llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-ld2-alloca.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -force-streaming-compatible-sve < %s | FileCheck %s

 target triple = "aarch64-unknown-linux-gnu"

 declare void @def(ptr)

 define void @st1d_fixed(ptr %st_ptr) #0 {
 ; CHECK-LABEL: st1d_fixed:
 ; CHECK: // %bb.0:
-; CHECK-NEXT: sub sp, sp, #160
-; CHECK-NEXT: .cfi_def_cfa_offset 160
-; CHECK-NEXT: str x30, [sp, #128] // 8-byte Folded Spill
-; CHECK-NEXT: stp x20, x19, [sp, #144] // 16-byte Folded Spill
-; CHECK-NEXT: .cfi_offset w19, -8
-; CHECK-NEXT: .cfi_offset w20, -16
-; CHECK-NEXT: .cfi_offset w30, -32
+; CHECK-NEXT: str x29, [sp, #-32]! // 8-byte Folded Spill
+; CHECK-NEXT: stp x30, x19, [sp, #16] // 16-byte Folded Spill
+; CHECK-NEXT: addvl sp, sp, #-1
+; CHECK-NEXT: sub sp, sp, #128
 ; CHECK-NEXT: mov x19, x0
 ; CHECK-NEXT: mov x0, sp
-; CHECK-NEXT: mov x20, sp
 ; CHECK-NEXT: bl def
-; CHECK-NEXT: ld2 { v0.2d, v1.2d }, [x20], #32
-; CHECK-NEXT: ldr x30, [sp, #128] // 8-byte Folded Reload
-; CHECK-NEXT: ld2 { v2.2d, v3.2d }, [x20]
+; CHECK-NEXT: cntd x8
+; CHECK-NEXT: ptrue p0.d, vl4
+; CHECK-NEXT: sub x8, x8, #2
+; CHECK-NEXT: ld2d { z0.d, z1.d }, p0/z, [sp]
+; CHECK-NEXT: mov w9, #2
+; CHECK-NEXT: cmp x8, #2
+; CHECK-NEXT: csel x8, x8, x9, lo
+; CHECK-NEXT: add x10, sp, #128
+; CHECK-NEXT: lsl x8, x8, #3
+; CHECK-NEXT: ptrue p0.d
+; CHECK-NEXT: add x9, sp, #128
+; CHECK-NEXT: st1d { z0.d }, p0, [x10]
+; CHECK-NEXT: ldr q2, [x9, x8]
 ; CHECK-NEXT: stp q0, q2, [x19]
-; CHECK-NEXT: ldp x20, x19, [sp, #144] // 16-byte Folded Reload
-; CHECK-NEXT: add sp, sp, #160
+; CHECK-NEXT: addvl sp, sp, #1
+; CHECK-NEXT: add sp, sp, #128
+; CHECK-NEXT: ldp x30, x19, [sp, #16] // 16-byte Folded Reload
+; CHECK-NEXT: ldr x29, [sp], #32 // 8-byte Folded Reload
 ; CHECK-NEXT: ret
   %alloc = alloca [16 x double]
   call void @def(ptr %alloc)
   %load = load <8 x double>, ptr %alloc
   %strided.vec = shufflevector <8 x double> %load, <8 x double> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
   store <4 x double> %strided.vec, ptr %st_ptr
   ret void
 }

-attributes #0 = { "target-features"="+sve" }
+attributes #0 = { "target-features"="+sve" nounwind}

This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SME]: Generate streaming-compatible code for ld2-alloca.
Closed, Public
