Details
- Reviewers
craig.topper benshi001 reames
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
I doubt we want scalable vectorization enabled by default right now. As Craig said, we need performance data to justify, but the impression I've gotten asking around is that this is not yet ready for prime time.
More than that though, we probably never want scalable vectorization for all configurations. Having it enabled for default generic targets may make sense, but if we're targeting a particular CPU with full knowledge of vector lengths, classic fixed length is probably a better default.
I have done some researches before, the quality of vectorized code haven't met our expectation currently.
Actually, it is just SIMD-style vectorization for RVV for the time being.
Maybe we should wait for VP-based vectorization?
llvm/test/Transforms/LoopVectorize/RISCV/unroll-in-loop-vectorizer.ll | ||
---|---|---|
48 | For example, I believe this store is unnecessary. |
llvm/test/Transforms/LoopVectorize/RISCV/unroll-in-loop-vectorizer.ll | ||
---|---|---|
6 | As an aside, I took a look at the assembly for this example. Th codegen for the vector loop ends up being less than great. For instance:
|
llvm/test/Transforms/LoopVectorize/RISCV/unroll-in-loop-vectorizer.ll | ||
---|---|---|
6 | To be fair, the first two issues also exist with fixed length vectorization of the same example. |
llvm/test/Transforms/LoopVectorize/RISCV/unroll-in-loop-vectorizer.ll | ||
---|---|---|
48 | Why is it unnecessary? I think the loop is processing using two vector loads/store due to RISCVSubtarget::getMaxInterleaveFactor() |
llvm/test/Transforms/LoopVectorize/RISCV/unroll-in-loop-vectorizer.ll | ||
---|---|---|
6 | What kind of extend? |
llvm/test/Transforms/LoopVectorize/RISCV/unroll-in-loop-vectorizer.ll | ||
---|---|---|
48 | Oops, you're right.
|
llvm/test/Transforms/LoopVectorize/RISCV/unroll-in-loop-vectorizer.ll | ||
---|---|---|
6 | zext - it looks easily removable, I'm surprised to see it after O2. This shouldn't be a hard fix. It appears in both vector and scalar loops. |
I use the TSCV test suit, then run on the spike,
when option: -scalable-vectorization=on, the performance more better. But I'm not sure if the performance data from the spike run can be used as a standard to measure performance
llvm/test/Transforms/LoopVectorize/RISCV/unroll-in-loop-vectorizer.ll | ||
---|---|---|
6 | Can you share what you're seeing? I see a zero extend in the preheader but not in the loops. |
@craig.topper I think this is somewhat the wrong question here. While I agree that fixed length should be our eventual default for known vector lengths, we currently don't enable any vectorization. If we can show either form of vectorization is generally profitable over the no-vectorization configuration, we should enable. We can then evaluate the other configuration against that new baseline.
@Miss_Grape I struggle to make out what that screenshot is conveying. Could you summarize please? Also, a text attachment is greatly preferred over images.
llvm/test/Transforms/LoopVectorize/RISCV/unroll-in-loop-vectorizer.ll | ||
---|---|---|
6 | When running just loop-vectorize and then llc, I do not see the extend. When replacing -loop-vectorize with -O2 on the same input, I do. ./opt -mtriple=riscv64 -mattr=+v,+m test/Transforms/LoopVectorize/RISCV/unroll-in-loop-vectorizer.ll -loop-vectorize -scalable-vectorization=on -S -riscv-v-vector-bits-max=512 | ./llc -O3 ./opt -mtriple=riscv64 -mattr=+v,+m test/Transforms/LoopVectorize/RISCV/unroll-in-loop-vectorizer.ll -O2 -scalable-vectorization=on -S -riscv-v-vector-bits-max=512 | ./llc -O3 The key bit of IR after -O2 is: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ] %13 = zext i32 %index to i64 // ... %index.next = add nuw i32 %index, %12 %23 = icmp eq i32 %index.next, %n.vec br i1 %23, label %middle.block, label %vector.body, !llvm.loop !0 We should be able to widen %index IV in indvars without issue, I'm a bit surprised we don't. Interestingly, if I add Zba, something in codegen manages to fold away the extend now. We didn't when I'd last looked. ./opt -mtriple=riscv64 -mattr=+v,+m,+zba test/Transforms/LoopVectorize/RISCV/unroll-in-loop-vectorizer.ll -O2 -scalable-vectorization=on -S -riscv-v-vector-bits-max=512 | ./llc -O3 To be clear, this is a minor codegen issue at best, and definitely nothing which should block this patch. |
llvm/test/Transforms/LoopVectorize/RISCV/unroll-in-loop-vectorizer.ll | ||
---|---|---|
6 | Does opt not automatically infer the DataLayout from the triple? Adding a proper DataLayout seems to make the zext go away. |
Do you have a set of tests with -scalable-vectorization=on and without to compare the code quality?
- , TSVC Test suit cases,
- Options:default -scalable-vectorization is off
1)clang --target=riscv64-unknown-elf --sysroot=$HOME/task/rvv/riscv64-unknown-elf --gcc-toolchain=$HOME/task/rvv --march=rv64gcv -O3 -mllvm -riscv-v-vector-bits-max=128 -mllvm -riscv-v-vector-bits-min=128 -mllvm -scalable-vectorization=on tsc.c dummy.c -lm -o xxx
2)clang --target=riscv64-unknown-elf --sysroot=$HOME/task/rvv/riscv64-unknown-elf --gcc-toolchain=$HOME/task/rvv -march=rv64gcv -O3 -mllvm -riscv-v-vector-bits-max=128 -mllvm -riscv-v-vector-bits-min=128 tsc.c dummy.c -lm -o xxx
- down load riscv's pk and run the spike:
/home/wengliqin/task/rvv/bin/spike pk xxx, then you can get the results shown in the following figure
Hi,I tried your testsuite. But there are so many warning:
warning: 'set1ds' accessing 128000 bytes in a region of size 85336 [-Wstringop-overflow=]
653 | set1ds(LEN/3, &d[LEN/3], zero,unit);
Does it influence the testing?
FYI, we seem to have some lurking functional issues with scalable vectorization. I tend to use sqllite as an easy test case for spotting assembly issues, and trying it with scalable vectorization (and without specifying a min vector length), promptly crashed.
clang -isystem /.../gnu-toolchain/sysroot/usr/include/ --target=riscv64 -Xclang -target-feature -Xclang +v,+f,+m,+c,+d,+zba sqlite-autoconf-3380500/sqlite3.c -c -O2 -S -mllvm -scalable-vectorization=on
The crash is:
Warning: Invalid size request on a scalable vector; Possible incorrect use of MVT::getVectorNumElements() for scalable vector. Scalable flag may be dropped, use MVT::getVectorElementCount() instead
clang: /home/preames/llvm-dev/llvm-project/llvm/include/llvm/ADT/Optional.h:195: T& llvm::optional_detail::OptionalStorage<T, true>::getValue() & [with T = long int]: Assertion `hasVal' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
With the last interesting frame being: llvm::CodeMetrics::analyzeBasicBlock
Downstream we have had to patch a lot of TTI functions to avoid crashing with scalable vectors. So I'm also thinking we're not ready for this to be enabled by default. I've been meaning to try and find some solid test cases to show this off but I agree with @reames .
FYI, I have a new version of this posted here: https://reviews.llvm.org/D129013
This follows a bunch of work to avoid crashes in cost modeling, or consumers involving Invalid costs, and a fair amount of cost model implementation work.
As an aside, I took a look at the assembly for this example. Th codegen for the vector loop ends up being less than great. For instance: