This is an archive of the discontinued LLVM Phabricator instance.


Differential D141439

[AARCH64][SVE] Do not optimize vector conversions
Status: Closed, Public

Authored by bzinodev on Jan 10 2023, 2:53 PM.


Details

Reviewers

paulwalker-arm
cameron.mcinally
SjoerdMeijer
ramana
efriedma

Commits

rG68f45796edbd: [AARCH64][SVE] Do not optimize vector conversions

Summary

shuffle_vector instructions are serialized when targeting SVE fixed-length vectors; see https://reviews.llvm.org/D139111. This patch disables the optimizeExtendOrTruncateConversion peepholes that generate shuffle_vector instructions.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

bzinodev created this revision (Jan 10 2023, 2:53 PM).

Herald added a project: Restricted Project (Jan 10 2023, 2:53 PM).

Herald added subscribers: ctetreau, hiraditya, kristof.beyls.

bzinodev requested review of this revision (Jan 10 2023, 2:53 PM).

Herald added a project: Restricted Project (Jan 10 2023, 2:53 PM).

Herald added a subscriber: llvm-commits.

bzinodev added a reviewer: ramana (Jan 10 2023, 3:17 PM).

Matt added a subscriber: Matt (Jan 10 2023, 3:55 PM).

Harbormaster completed remote builds in B206943: Diff 488000 (Jan 10 2023, 8:30 PM).

Hi Zino,

Looks like the test case is failing? See https://reviews.llvm.org/harbormaster/unit/view/5711271/

I am wondering how big of a hammer this is. Are there no cases at all where doing this is beneficial?

Is the test case minimal? I think I see a loop, which we don't need? To see the codegen differences, it would be good if we could precommit a test, but then we want it to be as small as possible.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp, line 14181:
Nit: I would prefer a pointer to source code (e.g. a function name) rather than a hyperlink, or just omit it if it is obvious.

It looks like this patch is SVE-related, rather than SME. Can you change the title from [AARCH64][SME] to [AARCH64][SVE] please? Thanks!

bzinodev updated this revision to Diff 488038 (Jan 11 2023, 8:44 AM).

Harbormaster completed remote builds in B207109: Diff 488038 (Jan 11 2023, 10:54 AM).

bzinodev updated this revision to Diff 488316 (Jan 11 2023, 11:43 AM).

bzinodev retitled this revision from [AARCH64][SME] Do not optimize vector conversions to [AARCH64][SVE] Do not optimize vector conversions.

Herald added a reviewer: efriedma (Jan 11 2023, 11:43 AM).

Herald added subscribers: psnobl, tschuett.

Harbormaster completed remote builds in B207165: Diff 488316 (Jan 11 2023, 2:36 PM).

Thanks for reducing the test case. I think bailing out early here makes sense indeed, so LGTM.

This revision is now accepted and ready to land (Jan 12 2023, 3:11 AM).

paulwalker-arm added inline comments (Jan 13 2023, 5:21 AM).

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp, line 14178:
For clarification, is this a bad optimisation when SVE is available, or is the current code generation for SVE suboptimal?

bzinodev marked an inline comment as done (Jan 15 2023, 1:19 PM).

bzinodev added inline comments.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp, line 14178:
In a nutshell, I don't know. Is the zext peephole that relies on the tbl instruction always profitable, even when targeting NEON? I have limited access to Arm hardware, and this peephole performs slower on my small toy test. On SVE, with the peephole disabled, uunpklo is generated and it is fast enough. Please feel free to propose better code-generation options. Thanks.
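The sketch below shows the tbl-based zero-extension mentioned in the reply. Per the Summary, the peephole that this patch skips for SVE produces a shuffle_vector, which on NEON ends up as tbl. The sketch emulates the single-register TBL rule: an index of 16 or more yields a zero byte. It uses that rule to widen 16 x i8 to 16 x i32 with four lookups, one per 128-bit result register.

Several details are illustrative assumptions rather than code taken from LLVM:
- the index layout puts the source byte in the lowest byte of each 32-bit lane and index 255 everywhere else;
- lanes are reassembled in little-endian order;
- the names tbl1, zextViaTbl and checkZextViaTbl are hypothetical.

Four lookups per loaded vector would be consistent with the eight tbl instructions in the new test's NEON check lines.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Emulates a single-register AArch64 TBL: result byte i is table[idx[i]]
// when idx[i] < 16, and 0 otherwise (out-of-range indices produce zero).
std::array<uint8_t, 16> tbl1(const std::array<uint8_t, 16> &table,
                             const std::array<uint8_t, 16> &idx) {
  std::array<uint8_t, 16> out{};
  for (int i = 0; i < 16; ++i)
    out[i] = idx[i] < 16 ? table[idx[i]] : 0;
  return out;
}

// Hypothetical helper: zero-extends 16 x i8 to 16 x i32 with four TBL
// lookups. Within each 32-bit lane, byte 0 selects a source byte and
// bytes 1..3 use index 255, which TBL turns into zero.
std::array<uint32_t, 16> zextViaTbl(const std::array<uint8_t, 16> &src) {
  std::array<uint32_t, 16> result{};
  for (int part = 0; part < 4; ++part) {
    std::array<uint8_t, 16> idx;
    idx.fill(255);
    for (int lane = 0; lane < 4; ++lane)
      idx[lane * 4] = static_cast<uint8_t>(part * 4 + lane);
    std::array<uint8_t, 16> bytes = tbl1(src, idx);
    // Reassemble each 32-bit lane from its four bytes (little-endian).
    for (int lane = 0; lane < 4; ++lane) {
      uint32_t v = 0;
      for (int b = 3; b >= 0; --b)
        v = (v << 8) | bytes[lane * 4 + b];
      result[part * 4 + lane] = v;
    }
  }
  return result;
}

// Compares the TBL emulation against a plain zero-extension for all
// 256 byte values, 16 at a time.
void checkZextViaTbl() {
  for (int base = 0; base < 256; base += 16) {
    std::array<uint8_t, 16> src{};
    for (int i = 0; i < 16; ++i)
      src[i] = static_cast<uint8_t>(base + i);
    std::array<uint32_t, 16> wide = zextViaTbl(src);
    for (int i = 0; i < 16; ++i)
      assert(wide[i] == static_cast<uint32_t>(src[i]));
  }
}
```

The trade-off the reviewers discuss is visible here: each lookup needs its own constant index vector. The code comment in the diff names the cost of materializing those constants as the reason the transform is restricted to loop headers.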

I have double-checked with Zino that a significant performance uplift was observed for a benchmark app, measured on SVE hardware.
While Zino is in the process of getting LLVM commit rights, I am happy and confident to land this on his behalf, as the codegen improvements for the examples I have seen are obvious. I think we can iterate on this should there be other or better things to do in this area.

This revision was landed with ongoing or failed builds (Jan 19 2023, 8:50 AM).

Closed by commit rG68f45796edbd: [AARCH64][SVE] Do not optimize vector conversions (authored by bzinodev, committed by SjoerdMeijer).

This revision was automatically updated to reflect the committed changes.

SjoerdMeijer added a commit: rG68f45796edbd: [AARCH64][SVE] Do not optimize vector conversions.

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	5 lines
llvm/test/CodeGen/AArch64/sve-fixed-vector-zext.ll	59 lines

Diff 490541

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp


[... 14,169 unchanged lines not shown; hunk context: if (Results.size() == 1) { ...]
     FinalResult =
         Builder.CreateShuffleVector(Results[0], Results[1], FinalMask);
   }

   TI->replaceAllUsesWith(FinalResult);
   TI->eraseFromParent();
 }

 bool AArch64TargetLowering::optimizeExtendOrTruncateConversion(Instruction *I,
                                                                Loop *L) const {
+  // shuffle_vector instructions are serialized when targeting SVE,
+  // see LowerSPLAT_VECTOR. This peephole is not beneficial.
+  if (Subtarget->useSVEForFixedLengthVectors())
+    return false;
+
   // Try to optimize conversions using tbl. This requires materializing constant
   // index vectors, which can increase code size and add loads. Skip the
   // transform unless the conversion is in a loop block guaranteed to execute
   // and we are not optimizing for size.
   Function *F = I->getParent()->getParent();
   if (!L || L->getHeader() != I->getParent() || F->hasMinSize() ||
       F->hasOptSize())
     return false;
[... 9,778 unchanged lines not shown ...]
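The early-exit logic of the patched function can be modeled as a plain predicate. This is only a sketch. ConversionContext, passesEarlyExits and checkSveBailsOutFirst are hypothetical names, not LLVM APIs, and each field stands in for one call in the diff (Subtarget->useSVEForFixedLengthVectors(), L->getHeader(), F->hasMinSize(), F->hasOptSize()).

The real function goes on to check the source and destination vector types before rewriting anything, and that part is not modeled. The new test's RUN lines suggest the SVE flag is only set when at least 256-bit SVE vectors are requested:
- -aarch64-sve-vector-bits-min=256 is checked with the SVE256 prefix;
- -aarch64-sve-vector-bits-min=128 is checked with the NEON prefix.

The sketch takes the flag as an input rather than modeling that threshold.

```cpp
#include <cassert>

// Hypothetical stand-in for the facts optimizeExtendOrTruncateConversion
// queries before deciding to use tbl; not an LLVM type.
struct ConversionContext {
  bool useSVEForFixedLengthVectors; // Subtarget->useSVEForFixedLengthVectors()
  bool hasLoop;                     // L != nullptr
  bool inLoopHeader;                // L->getHeader() == I->getParent()
  bool hasMinSize;                  // F->hasMinSize()
  bool hasOptSize;                  // F->hasOptSize()
};

// Returns true if the conversion survives the early exits shown in the diff.
bool passesEarlyExits(const ConversionContext &C) {
  // Added by this patch: bail out first when fixed-length vectors use SVE.
  if (C.useSVEForFixedLengthVectors)
    return false;
  // Pre-existing guard: loop header only, and not when optimizing for size.
  if (!C.hasLoop || !C.inLoopHeader || C.hasMinSize || C.hasOptSize)
    return false;
  return true;
}

// The SVE check wins even for a loop header that would otherwise qualify.
void checkSveBailsOutFirst() {
  ConversionContext hotHeader{false, true, true, false, false};
  assert(passesEarlyExits(hotHeader));
  hotHeader.useSVEForFixedLengthVectors = true;
  assert(!passesEarlyExits(hotHeader));
}
```

The SVE check is placed ahead of the existing guard, so none of the tbl-specific conditions are consulted when SVE lowering is active.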

llvm/test/CodeGen/AArch64/sve-fixed-vector-zext.ll

This file was added.


; RUN: llc < %s -mtriple=aarch64-none-linux-gnu -mcpu=neoverse-v1 -O3 -opaque-pointers -aarch64-sve-vector-bits-min=256 -verify-machineinstrs | FileCheck %s --check-prefixes=SVE256
; RUN: llc < %s -mtriple=aarch64-none-linux-gnu -mcpu=neoverse-v1 -O3 -opaque-pointers -aarch64-sve-vector-bits-min=128 -verify-machineinstrs | FileCheck %s --check-prefixes=NEON
; RUN: llc < %s -mtriple=aarch64-none-linux-gnu -mcpu=neoverse-n1 -O3 -opaque-pointers -verify-machineinstrs | FileCheck %s --check-prefixes=NEON
; RUN: llc < %s -mtriple=aarch64-none-linux-gnu -mcpu=neoverse-v2 -O3 -opaque-pointers -verify-machineinstrs | FileCheck %s --check-prefixes=NEON

define internal i32 @test(ptr nocapture readonly %p1, i32 %i1, ptr nocapture readonly %p2, i32 %i2) {
; SVE256-LABEL: test:
; SVE256: ld1b { z0.h }, p0/z,
; SVE256: ld1b { z1.h }, p0/z,
; SVE256: sub z0.h, z0.h, z1.h
; SVE256-NEXT: sunpklo z1.s, z0.h
; SVE256-NEXT: ext z0.b, z0.b, z0.b, #16
; SVE256-NEXT: sunpklo z0.s, z0.h
; SVE256-NEXT: add z0.s, z1.s, z0.s
; SVE256-NEXT: uaddv d0, p1, z0.s

; NEON-LABEL: test:
; NEON: tbl
; NEON-NEXT: tbl
; NEON-NEXT: tbl
; NEON-NEXT: tbl
; NEON-NEXT: tbl
; NEON-NEXT: tbl
; NEON-NEXT: tbl
; NEON-NEXT: tbl
; NEON: addv

L.entry:
  br label %L1

L1:                                               ; preds = %L1, %L.entry
  %a = phi i32 [ 16, %L.entry ], [ %14, %L1 ]
  %b = phi i32 [ 0, %L.entry ], [ %13, %L1 ]
  %i = phi i32 [ 0, %L.entry ], [ %12, %L1 ]
  %0 = mul i32 %b, %i1
  %1 = sext i32 %0 to i64
  %2 = getelementptr i8, ptr %p1, i64 %1
  %3 = mul i32 %b, %i2
  %4 = sext i32 %3 to i64
  %5 = getelementptr i8, ptr %p2, i64 %4
  %6 = load <16 x i8>, ptr %2, align 1
  %7 = zext <16 x i8> %6 to <16 x i32>
  %8 = load <16 x i8>, ptr %5, align 1
  %9 = zext <16 x i8> %8 to <16 x i32>
  %10 = sub nsw <16 x i32> %7, %9
  %11 = tail call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %10)
  %12 = add i32 %11, %i
  %13 = add nuw nsw i32 %b, 1
  %14 = add nsw i32 %a, -1
  %.not = icmp eq i32 %14, 0
  br i1 %.not, label %L2, label %L1

L2:                                               ; preds = %L1
  ret i32 %12
}

declare i32 @llvm.vector.reduce.add.v16i32(<16 x i32>)
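The sketch below checks that the SVE256 sequence in the CHECK lines computes the same value as one iteration of @test. It emulates that sequence lane by lane and compares it against a scalar reference. The helper names are hypothetical, and the sketch assumes:
- a 256-bit SVE vector length, i.e. 16 halfword or 8 word lanes per register;
- every lane of p0 and p1 is active.

Both inputs are zero-extended bytes, so their difference always fits in 16 bits. That is why the sequence can subtract in .h lanes and only widen to .s with sunpklo afterwards.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Scalar reference for one loop iteration of @test: zero-extend both byte
// vectors to i32, subtract, and add all 16 lanes (llvm.vector.reduce.add).
int32_t referenceIteration(const std::array<uint8_t, 16> &a,
                           const std::array<uint8_t, 16> &b) {
  int32_t sum = 0;
  for (int i = 0; i < 16; ++i)
    sum += static_cast<int32_t>(a[i]) - static_cast<int32_t>(b[i]);
  return sum;
}

// Lane-level model of the SVE256 CHECK sequence (256-bit vectors assumed).
int32_t sve256Iteration(const std::array<uint8_t, 16> &a,
                        const std::array<uint8_t, 16> &b) {
  // ld1b { z.h }: each byte is zero-extended into a 16-bit lane.
  // sub z0.h, z0.h, z1.h: the difference (-255..255) fits in 16 bits.
  std::array<int16_t, 16> diff{};
  for (int i = 0; i < 16; ++i)
    diff[i] = static_cast<int16_t>(a[i] - b[i]);
  // sunpklo z1.s, z0.h: sign-extend the low eight halfwords.
  // ext z0.b, z0.b, z0.b, #16 + sunpklo: same for the high eight halfwords.
  std::array<int32_t, 8> lo{}, hi{};
  for (int i = 0; i < 8; ++i) {
    lo[i] = diff[i];
    hi[i] = diff[i + 8];
  }
  // add z0.s, z1.s, z0.s, then uaddv d0, p1, z0.s: unsigned sum of the
  // eight 32-bit lanes into 64 bits; the i32 result is the low 32 bits.
  uint64_t total = 0;
  for (int i = 0; i < 8; ++i)
    total += static_cast<uint32_t>(lo[i] + hi[i]);
  return static_cast<int32_t>(static_cast<uint32_t>(total));
}

// Compares the two models on a spread of byte patterns.
void checkSve256MatchesReference() {
  std::array<uint8_t, 16> a{}, b{};
  for (int seed = 0; seed < 256; ++seed) {
    for (int i = 0; i < 16; ++i) {
      a[i] = static_cast<uint8_t>(seed * 7 + i * 31);
      b[i] = static_cast<uint8_t>(seed * 13 + i * 5);
    }
    assert(sve256Iteration(a, b) == referenceIteration(a, b));
  }
}
```

Taking the low 32 bits of the 64-bit uaddv result is enough here. Each per-iteration sum lies between -4080 and 4080, so it survives the round trip through unsigned arithmetic.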