This patch adds a reduced test case which identifies an illegal vsetvli
inserted by the compiler. The compiler emits a vsetvli which is intended
to preserve VL with the SEW/LMUL ratio e32/m1 when in fact the VL could
have been set by e64/m2 in a predecessor block.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
I haven't found the time to investigate the actual cause of this bug yet, so I thought I'd stick it up as a regression test so that people are aware of the issue; since it's something the simulator chokes on it may or may not actually produce a failure depending on the specific simulator/test case. I can maybe take a look later this week.
Why is preserving vl wrong here? Isn't the vl 4 regardless of which it came from?
I believe the pass is preserving it because it believes that AVL=4 will produce the same vl as long as sew/lmul is constant which it should be for sew=64/lmul=2 vs sew=32/lmul=1
You're right, I must have reduced the test case too far at one point and stopped thinking properly. I'll go back to the original failing case.
Sorry about that @craig.topper I was rushing it. The test case is now reduced a lot better. I believe it's an issue where the vmv.x.s "doesn't need" an AVL (it's RISCV::NoRegister) because the operation is VL-agnostic, as it were, so we fall into a trap where we insert a vsetvli zero,zero even though that's an illegal attempt to preserve VL.
That's very interesting and I think it has always been broken. I modified the test case to not depend on VLS support and it does this on 12.0.1 as well.
define i32 @foo(<vscale x 2 x i32> %a, <vscale x 4 x i64> %x, <vscale x 4 x i64>* %y) {
%index = add <vscale x 4 x i64> %x, %x
store <vscale x 4 x i64> %index, <vscale x 4 x i64>* %y
%elt = extractelement <vscale x 2 x i32> %a, i64 0
ret i32 %elt
}We don't have a VL to use here and I don't think we should make one up. I think the right fix is to add the VL argument to the intrinsic and the instruction.
| llvm/test/CodeGen/RISCV/rvv/vsetvli-regression.ll | ||
|---|---|---|
| 1 | Can we use define i32 @foo(<vscale x 2 x i32> %a, <vscale x 4 x i64> %x, <vscale x 4 x i64>* %y) {
%index = add <vscale x 4 x i64> %x, %x
store <vscale x 4 x i64> %index, <vscale x 4 x i64>* %y
%elt = extractelement <vscale x 2 x i32> %a, i64 0
ret i32 %elt
}to avoid the mix of fixed and scalable. | |
@craig.topper do you think we should merge this in or just abandon it since D106403 presumably covers the same case?
Can we use
define i32 @foo(<vscale x 2 x i32> %a, <vscale x 4 x i64> %x, <vscale x 4 x i64>* %y) { %index = add <vscale x 4 x i64> %x, %x store <vscale x 4 x i64> %index, <vscale x 4 x i64>* %y %elt = extractelement <vscale x 2 x i32> %a, i64 0 ret i32 %elt }to avoid the mix of fixed and scalable.