This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Enable scalable vectorization by default for RVV
AbandonedPublic

Authored by Miss_Grape on May 17 2022, 12:48 AM.

Details

Diff Detail

Event Timeline

Miss_Grape created this revision.May 17 2022, 12:48 AM
Herald added a project: Restricted Project. · View Herald Transcript · May 17 2022, 12:48 AM
Miss_Grape requested review of this revision.May 17 2022, 12:48 AM

Do you have any performance data?

reames added a subscriber: reames.May 17 2022, 7:10 AM

I doubt we want scalable vectorization enabled by default right now. As Craig said, we need performance data to justify, but the impression I've gotten asking around is that this is not yet ready for prime time.

More than that though, we probably never want scalable vectorization for all configurations. Having it enabled for default generic targets may make sense, but if we're targeting a particular CPU with full knowledge of vector lengths, classic fixed length is probably a better default.

pcwang-thead added a comment.EditedMay 17 2022, 8:31 PM

I did some research on this earlier; the quality of the vectorized code hasn't met our expectations yet.
At the moment this is really just SIMD-style vectorization for RVV.
Maybe we should wait for VP-based vectorization?

llvm/test/Transforms/LoopVectorize/RISCV/unroll-in-loop-vectorizer.ll
48

For example, I believe this store is unnecessary.

reames added inline comments.May 17 2022, 9:06 PM
llvm/test/Transforms/LoopVectorize/RISCV/unroll-in-loop-vectorizer.ll
6

As an aside, I took a look at the assembly for this example. The codegen for the vector loop ends up being less than great. For instance:

  • We have an extend trapped in the loop for some reason
  • We rerun vsetvli on every iteration (despite it producing a fixed result)
  • We have a rem in the vector preheader. That's rather expensive. (Well, actually, we end up with a *libcall* because the attributes don't include +m, but I ignored that.)
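In C terms, the preheader computation being criticized looks roughly like the following sketch (illustrative names of my own; `vf` stands for the runtime vector factor and is not an LLVM symbol):

```c
#include <stdint.h>

/* Sketch of the trip-count rounding the vectorizer emits in the vector
 * preheader. With a scalable vector factor `vf` known only at runtime,
 * rounding the trip count `n` down to a multiple of `vf` needs a urem,
 * which is expensive -- and becomes a libcall when the function's
 * attributes lack +m. */
static uint64_t vector_trip_count(uint64_t n, uint64_t vf) {
  uint64_t remainder = n % vf; /* the rem in the vector preheader */
  return n - remainder;        /* n.vec: elements the vector loop handles */
}
```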
reames added inline comments.May 17 2022, 9:10 PM
llvm/test/Transforms/LoopVectorize/RISCV/unroll-in-loop-vectorizer.ll
6

To be fair, the first two issues also exist with fixed length vectorization of the same example.

craig.topper added inline comments.May 17 2022, 10:07 PM
llvm/test/Transforms/LoopVectorize/RISCV/unroll-in-loop-vectorizer.ll
48

Why is it unnecessary? I think the loop is processed using two vector loads/stores per iteration due to RISCVSubtarget::getMaxInterleaveFactor().
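For illustration, an interleave count of 2 gives the vector body roughly the following shape (a scalar C sketch under the assumption of a fixed VF of 4; the names are mine, not the compiler's):

```c
#include <stddef.h>

#define VF 4 /* illustrative fixed stand-in for the vector factor */

/* Sketch: with an interleave (unroll) factor of 2, each vector-body
 * iteration covers 2*VF elements, so it issues two vector stores per
 * iteration -- which is why the second store is expected here. */
static void fill_interleaved(int *a, size_t n, int v) {
  size_t i = 0;
  for (; i + 2 * VF <= n; i += 2 * VF) {
    for (size_t j = 0; j < VF; ++j) a[i + j] = v;      /* first "store" */
    for (size_t j = 0; j < VF; ++j) a[i + VF + j] = v; /* second "store" */
  }
  for (; i < n; ++i) a[i] = v; /* remainder goes to the scalar loop */
}
```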

craig.topper added inline comments.May 17 2022, 10:11 PM
llvm/test/Transforms/LoopVectorize/RISCV/unroll-in-loop-vectorizer.ll
6

What kind of extend?

llvm/test/Transforms/LoopVectorize/RISCV/unroll-in-loop-vectorizer.ll
48

Oops, you're right.
I think the problem is that we are using RVV as SIMD now, so we will:

  • read vlenb to calculate vector length.
  • use vsetvli _, zero, ... to set vl/vtype and ignore returned vl. This is inserted by InsertVSETVLI pass.
  • handle tail in scalar loop (may have been improved in D121595?).
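The contrast between the SIMD-style lowering described in the list above and true VL-based stripmining can be sketched in scalar C (illustrative only; the inner loops stand in for single vector operations, and `vlmax` stands for the maximum vector length):

```c
#include <stddef.h>

/* SIMD style: fixed chunk size, leftover elements go to a scalar tail. */
static void add_simd_style(int *a, const int *b, size_t n, size_t vlmax) {
  size_t i = 0;
  for (; i + vlmax <= n; i += vlmax)   /* vector body, vl fixed at vlmax */
    for (size_t j = 0; j < vlmax; ++j) /* stands in for one vector op */
      a[i + j] += b[i + j];
  for (; i < n; ++i)                   /* scalar tail loop */
    a[i] += b[i];
}

/* VL stripmining: vsetvli returns vl = min(remaining, vlmax), no tail. */
static void add_vl_style(int *a, const int *b, size_t n, size_t vlmax) {
  for (size_t i = 0; i < n;) {
    size_t vl = n - i < vlmax ? n - i : vlmax; /* what vsetvli would return */
    for (size_t j = 0; j < vl; ++j)
      a[i + j] += b[i + j];
    i += vl;
  }
}
```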
reames added inline comments.May 18 2022, 7:42 AM
llvm/test/Transforms/LoopVectorize/RISCV/unroll-in-loop-vectorizer.ll
6

zext - it looks easily removable; I'm surprised to see it after O2. This shouldn't be a hard fix. It appears in both the vector and scalar loops.

Do you have any performance data?

I used the TSVC test suite, then ran it on Spike.

With the option -scalable-vectorization=on, the performance is better. But I'm not sure whether performance data from a Spike run can be used as a standard for measuring performance.

Do you have any performance data?

I used the TSVC test suite, then ran it on Spike.

With the option -scalable-vectorization=on, the performance is better. But I'm not sure whether performance data from a Spike run can be used as a standard for measuring performance.

Better than specifying -riscv-v-vector-bits-min to match the machine width?

craig.topper added inline comments.May 27 2022, 8:22 AM
llvm/test/Transforms/LoopVectorize/RISCV/unroll-in-loop-vectorizer.ll
6

Can you share what you're seeing? I see a zero extend in the preheader but not in the loops.

Do you have any performance data?

I used the TSVC test suite, then ran it on Spike.

With the option -scalable-vectorization=on, the performance is better. But I'm not sure whether performance data from a Spike run can be used as a standard for measuring performance.

Better than specifying -riscv-v-vector-bits-min to match the machine width?

@craig.topper I think this is somewhat the wrong question here. While I agree that fixed length should be our eventual default for known vector lengths, we currently don't enable any vectorization. If we can show either form of vectorization is generally profitable over the no-vectorization configuration, we should enable. We can then evaluate the other configuration against that new baseline.

@Miss_Grape I struggle to make out what that screenshot is conveying. Could you summarize please? Also, a text attachment is greatly preferred over images.

llvm/test/Transforms/LoopVectorize/RISCV/unroll-in-loop-vectorizer.ll
6

When running just loop-vectorize and then llc, I do not see the extend. When replacing -loop-vectorize with -O2 on the same input, I do.

./opt -mtriple=riscv64 -mattr=+v,+m test/Transforms/LoopVectorize/RISCV/unroll-in-loop-vectorizer.ll -loop-vectorize -scalable-vectorization=on -S -riscv-v-vector-bits-max=512 | ./llc -O3 
./opt -mtriple=riscv64 -mattr=+v,+m test/Transforms/LoopVectorize/RISCV/unroll-in-loop-vectorizer.ll -O2 -scalable-vectorization=on -S -riscv-v-vector-bits-max=512 | ./llc -O3

The key bit of IR after -O2 is:

%index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
%13 = zext i32 %index to i64
// ... 
%index.next = add nuw i32 %index, %12
%23 = icmp eq i32 %index.next, %n.vec
br i1 %23, label %middle.block, label %vector.body, !llvm.loop !0

We should be able to widen the %index IV in indvars without issue; I'm a bit surprised we don't.
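The underlying pattern is easy to reproduce in C: a 32-bit induction variable used to index memory on a 64-bit target needs a per-iteration zero-extend unless the IV is widened, which is what indvars would normally do. A sketch:

```c
#include <stdint.h>

/* With a 32-bit IV, addressing p[i] on a 64-bit target such as riscv64
 * requires a zext i32 -> i64 each iteration unless the IV is widened. */
static long sum_narrow_iv(const long *p, uint32_t n) {
  long s = 0;
  for (uint32_t i = 0; i < n; ++i) /* i gets zero-extended to index p */
    s += p[i];
  return s;
}

/* The widened form indvars should produce: no per-iteration extend. */
static long sum_wide_iv(const long *p, uint32_t n) {
  long s = 0;
  for (uint64_t i = 0; i < n; ++i)
    s += p[i];
  return s;
}
```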

Interestingly, if I add Zba, something in codegen manages to fold away the extend now. We didn't when I'd last looked.

./opt -mtriple=riscv64 -mattr=+v,+m,+zba test/Transforms/LoopVectorize/RISCV/unroll-in-loop-vectorizer.ll -O2 -scalable-vectorization=on -S -riscv-v-vector-bits-max=512 | ./llc -O3

To be clear, this is a minor codegen issue at best, and definitely nothing which should block this patch.

craig.topper added inline comments.May 27 2022, 11:30 AM
llvm/test/Transforms/LoopVectorize/RISCV/unroll-in-loop-vectorizer.ll
6

Does opt not automatically infer the DataLayout from the triple? Adding a proper DataLayout seems to make the zext go away.

Do you have a set of tests with -scalable-vectorization=on and without to compare the code quality?

Do you have any performance data?

I used the TSVC test suite, then ran it on Spike.

With the option -scalable-vectorization=on, the performance is better. But I'm not sure whether performance data from a Spike run can be used as a standard for measuring performance.

Better than specifying -riscv-v-vector-bits-min to match the machine width?

@craig.topper I think this is somewhat the wrong question here. While I agree that fixed length should be our eventual default for known vector lengths, we currently don't enable any vectorization. If we can show either form of vectorization is generally profitable over the no-vectorization configuration, we should enable. We can then evaluate the other configuration against that new baseline.

@Miss_Grape I struggle to make out what that screenshot is conveying. Could you summarize please? Also, a text attachment is greatly preferred over images.

  1. TSVC test suite cases.
  2. Options (by default, -scalable-vectorization is off):

1)clang --target=riscv64-unknown-elf --sysroot=$HOME/task/rvv/riscv64-unknown-elf --gcc-toolchain=$HOME/task/rvv --march=rv64gcv -O3 -mllvm -riscv-v-vector-bits-max=128 -mllvm -riscv-v-vector-bits-min=128 -mllvm -scalable-vectorization=on tsc.c dummy.c -lm -o xxx
2)clang --target=riscv64-unknown-elf --sysroot=$HOME/task/rvv/riscv64-unknown-elf --gcc-toolchain=$HOME/task/rvv -march=rv64gcv -O3 -mllvm -riscv-v-vector-bits-max=128 -mllvm -riscv-v-vector-bits-min=128 tsc.c dummy.c -lm -o xxx

  3. Download RISC-V's pk and run Spike:

/home/wengliqin/task/rvv/bin/spike pk xxx; then you get the results shown in the following figure

Do you have any performance data?

I used the TSVC test suite, then ran it on Spike.

With the option -scalable-vectorization=on, the performance is better. But I'm not sure whether performance data from a Spike run can be used as a standard for measuring performance.

Better than specifying -riscv-v-vector-bits-min to match the machine width?

@craig.topper I think this is somewhat the wrong question here. While I agree that fixed length should be our eventual default for known vector lengths, we currently don't enable any vectorization. If we can show either form of vectorization is generally profitable over the no-vectorization configuration, we should enable. We can then evaluate the other configuration against that new baseline.

@Miss_Grape I struggle to make out what that screenshot is conveying. Could you summarize please? Also, a text attachment is greatly preferred over images.

This is the result of a run on Spike:


Do you have any performance data?

I used the TSVC test suite, then ran it on Spike.

With the option -scalable-vectorization=on, the performance is better. But I'm not sure whether performance data from a Spike run can be used as a standard for measuring performance.

Better than specifying -riscv-v-vector-bits-min to match the machine width?

@craig.topper I think this is somewhat the wrong question here. While I agree that fixed length should be our eventual default for known vector lengths, we currently don't enable any vectorization. If we can show either form of vectorization is generally profitable over the no-vectorization configuration, we should enable. We can then evaluate the other configuration against that new baseline.

@Miss_Grape I struggle to make out what that screenshot is conveying. Could you summarize please? Also, a text attachment is greatly preferred over images.

  1. TSVC test suite cases.
  2. Options (by default, -scalable-vectorization is off):

1)clang --target=riscv64-unknown-elf --sysroot=$HOME/task/rvv/riscv64-unknown-elf --gcc-toolchain=$HOME/task/rvv --march=rv64gcv -O3 -mllvm -riscv-v-vector-bits-max=128 -mllvm -riscv-v-vector-bits-min=128 -mllvm -scalable-vectorization=on tsc.c dummy.c -lm -o xxx
2)clang --target=riscv64-unknown-elf --sysroot=$HOME/task/rvv/riscv64-unknown-elf --gcc-toolchain=$HOME/task/rvv -march=rv64gcv -O3 -mllvm -riscv-v-vector-bits-max=128 -mllvm -riscv-v-vector-bits-min=128 tsc.c dummy.c -lm -o xxx

  3. Download RISC-V's pk and run Spike:

/home/wengliqin/task/rvv/bin/spike pk xxx; then you get the results shown in the following figure

Hi, I tried your test suite, but there are a lot of warnings like:
warning: 'set1ds' accessing 128000 bytes in a region of size 85336 [-Wstringop-overflow=]

653 |                 set1ds(LEN/3,   &d[LEN/3],      zero,unit);

Do they influence the testing?

reames requested changes to this revision.Jun 2 2022, 12:28 PM

FYI, we seem to have some lurking functional issues with scalable vectorization. I tend to use SQLite as an easy test case for spotting assembly issues, and trying it with scalable vectorization (and without specifying a min vector length) promptly crashed.

clang -isystem /.../gnu-toolchain/sysroot/usr/include/ --target=riscv64 -Xclang -target-feature -Xclang +v,+f,+m,+c,+d,+zba sqlite-autoconf-3380500/sqlite3.c -c -O2 -S -mllvm -scalable-vectorization=on

The crash is:
Warning: Invalid size request on a scalable vector; Possible incorrect use of MVT::getVectorNumElements() for scalable vector. Scalable flag may be dropped, use MVT::getVectorElementCount() instead
clang: /home/preames/llvm-dev/llvm-project/llvm/include/llvm/ADT/Optional.h:195: T& llvm::optional_detail::OptionalStorage<T, true>::getValue() & [with T = long int]: Assertion `hasVal' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
With the last interesting frame being: llvm::CodeMetrics::analyzeBasicBlock

This revision now requires changes to proceed.Jun 2 2022, 12:28 PM

Downstream we have had to patch a lot of TTI functions to avoid crashes with scalable vectors, so I also think we're not ready to enable this by default. I've been meaning to find some solid test cases to show this off, but I agree with @reames.

Miss_Grape abandoned this revision.Jun 13 2022, 12:15 AM
reames added a comment.Jul 1 2022, 1:15 PM

FYI, I have a new version of this posted here: https://reviews.llvm.org/D129013

This follows a bunch of work to avoid crashes in cost modeling, or consumers involving Invalid costs, and a fair amount of cost model implementation work.