This is an archive of the discontinued LLVM Phabricator instance.

[RISCV][RVV] Fix vslide1up/down intrinsics overflow bug for SEW=64 on RV32
ClosedPublic

Authored by lhtin on Mar 3 2022, 5:44 AM.

Details

Summary

When left-shifting the incoming VL parameter, it may cause overflow
when the vl >= 0x80000000, resulting in the wrong VL length setting.
For example this code and the output:

#include <riscv_vector.h>

vint64m1_t
test (vint64m1_t a, int64_t b)
{
    return vslide1up_vx_i64m1(a, b, 0x80000000);
}
/* assembly output:
test: 
        vsetivli        zero, 0, e32, m1, ta, mu
        vslide1up.vx    v9, v8, a1
        vslide1up.vx    v8, v9, a0
        ret
*/

The bug was caused by this commit.

Diff Detail

Event Timeline

lhtin created this revision.Mar 3 2022, 5:44 AM
lhtin requested review of this revision.Mar 3 2022, 5:44 AM
lhtin edited the summary of this revision. (Show Details)Mar 3 2022, 5:44 AM

This isn't the only problem with this code. It's broken for CPUs that implement this sentence from the spec. "this permits an implementation to set vl = ceil(AVL / 2) for VLMAX < AVL < 2*VLMAX".

As a concrete example:
If SEW=64 VLMAX is 8 and AVL is 9. The implementation is allowed to return 5 for the VL.
If we multiply the 9 by 2 we create an AVL of 18 for the SEW=32. The implementation is would be allowed to return 9 for the VL. We need it to return 10 to be 2x5 to match the SEW=64 VL.

The only way I see out of this is to insert a SEW=64 vsetvli to explicitly create a VL less than or equal to the SEW=64 VLMAX, then multiply that by 2. This will fix both the overflow you identified and the case I just described.

lhtin added a comment.Mar 4 2022, 12:06 AM

This isn't the only problem with this code. It's broken for CPUs that implement this sentence from the spec. "this permits an implementation to set vl = ceil(AVL / 2) for VLMAX < AVL < 2*VLMAX".

As a concrete example:
If SEW=64 VLMAX is 8 and AVL is 9. The implementation is allowed to return 5 for the VL.
If we multiply the 9 by 2 we create an AVL of 18 for the SEW=32. The implementation is would be allowed to return 9 for the VL. We need it to return 10 to be 2x5 to match the SEW=64 VL.

The only way I see out of this is to insert a SEW=64 vsetvli to explicitly create a VL less than or equal to the SEW=64 VLMAX, then multiply that by 2. This will fix both the overflow you identified and the case I just described.

Thanks for your review. You are right. I will try to solve both problems based on your comments.

lhtin updated this revision to Diff 413221.Mar 5 2022, 9:47 AM

I fixed the extra bug found by the review and made some optimizations for the case where AVL is constant.
This is my first patch to the llvm project. Please feel free to point out any irregularities. Thanks.

lhtin updated this revision to Diff 413266.Mar 5 2022, 5:14 PM

updating based on the latest main branch

craig.topper added inline comments.Mar 5 2022, 8:47 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
4657

Where does 128 come from? The vector extension allows 32-bit for Zve32* and 64-bit for Zve64*

4658

I'm not sure I understand this math. What is 64?

KnownSize can be as small as 8 so this returns 0 in that case.

4662

Use cast instead of dyn_cast. dyn_cast returns null if the cast fails and that would need to be checked, but you already called isa earlier, so it can't fail.

4663

AVL is a 64-bit value on RV64 so you can't use unsigned here.

4666

Use * 2 instead of << 1.

lhtin updated this revision to Diff 413275.Mar 5 2022, 11:32 PM
lhtin marked 5 inline comments as done.

Updated based on reviews

lhtin updated this revision to Diff 413342.Mar 6 2022, 10:48 PM
lhtin marked an inline comment as not done.

add getMaxVLen and encodeSEW helper function

kito-cheng added inline comments.Mar 7 2022, 5:29 AM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
4670

Maybe you just using vlmax (vsetvli any_tmp_reg, x0, e32, *) here instead of 2 * MaxVLMAX? then you just need a vsetvli here.

kito-cheng added inline comments.Mar 7 2022, 5:38 AM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
4656

MinSize / RISCV::RVVBitsPerBlock; is equal to LMUL and you using this equation to prevent screw up for fractional LMUL, but it's kind of hard to understand at first impression, so I would suggest you can add an util function here and use LMUL explicitly like that:

RISCVII::VLMUL Lmul = RISCVTargetLowering::getLMUL(VT);
unsigned MaxVLMAX = computeVLMAX(VectorBitsMax, EltSize, Lmul);

And

unsigned MinVLMAX = computeVLMAX(VectorBitsMin, EltSize, Lmul);

it would improve readability

lhtin updated this revision to Diff 413454.Mar 7 2022, 7:10 AM
lhtin marked 2 inline comments as done.

Update as suggested by @kito-cheng

craig.topper added inline comments.Mar 7 2022, 10:05 AM
llvm/lib/Target/RISCV/MCTargetDesc/RISCVBaseInfo.h
381

Unspected -> Unexpected

llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vslide1-rv32.ll
2

There are no "fixed vectors" in this test. So I don't like this test name.

lhtin updated this revision to Diff 413761.Mar 8 2022, 4:15 AM
lhtin marked 2 inline comments as done.

update as suggested by @craig.topper

craig.topper added inline comments.Mar 9 2022, 1:57 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
4691

This AVL should be I32VL to match the I32 type. It was wrong in the old code. It should have been scaled by 2. Since its gets pattern matched away during isel it probably doesn't really matter, but could to be logically correct.

lhtin updated this revision to Diff 414366.Mar 10 2022, 6:13 AM
lhtin marked an inline comment as done.
lhtin retitled this revision from [RISCV] Fix vslide1up/down intrinsics overflow bug for SEW=64 on RV32 to [RISCV][RVV] Fix vslide1up/down intrinsics overflow bug for SEW=64 on RV32.

Update code based on review.

This revision is now accepted and ready to land.Mar 10 2022, 9:34 AM
lhtin removed a project: Restricted Project.Mar 10 2022, 4:56 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 10 2022, 4:56 PM
lhtin updated this revision to Diff 414907.Mar 12 2022, 10:42 PM

Rebase and thank @craig.topper and @kito-cheng for your comments.