This patch adds a reduced test case which identifies an illegal vsetvli
inserted by the compiler. The compiler emits a vsetvli which is intended
to preserve VL with the SEW/LMUL ratio e32/m1 when in fact the VL could
have been set by e64/m2 in a predecessor block.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I haven't found the time to investigate the actual cause of this bug yet, so I thought I'd stick it up as a regression test so that people are aware of the issue; since it's something the simulator chokes on it may or may not actually produce a failure depending on the specific simulator/test case. I can maybe take a look later this week.
Why is preserving vl wrong here? Isn't the vl 4 regardless of which it came from?
I believe the pass is preserving it because it believes that AVL=4 will produce the same vl as long as sew/lmul is constant which it should be for sew=64/lmul=2 vs sew=32/lmul=1
You're right, I must have reduced the test case too far at one point and stopped thinking properly. I'll go back to the original failing case.
Sorry about that @craig.topper I was rushing it. The test case is now reduced a lot better. I believe it's an issue where the vmv.x.s "doesn't need" an AVL (it's RISCV::NoRegister) because the operation is VL-agnostic, as it were, so we fall into a trap where we insert a vsetvli zero,zero even though that's an illegal attempt to preserve VL.
That's very interesting and I think it has always been broken. I modified the test case to not depend on VLS support and it does this on 12.0.1 as well.
define i32 @foo(<vscale x 2 x i32> %a, <vscale x 4 x i64> %x, <vscale x 4 x i64>* %y) { %index = add <vscale x 4 x i64> %x, %x store <vscale x 4 x i64> %index, <vscale x 4 x i64>* %y %elt = extractelement <vscale x 2 x i32> %a, i64 0 ret i32 %elt }
We don't have a VL to use here and I don't think we should make one up. I think the right fix is to add the VL argument to the intrinsic and the instruction.
llvm/test/CodeGen/RISCV/rvv/vsetvli-regression.ll | ||
---|---|---|
2 | Can we use define i32 @foo(<vscale x 2 x i32> %a, <vscale x 4 x i64> %x, <vscale x 4 x i64>* %y) { %index = add <vscale x 4 x i64> %x, %x store <vscale x 4 x i64> %index, <vscale x 4 x i64>* %y %elt = extractelement <vscale x 2 x i32> %a, i64 0 ret i32 %elt } to avoid the mix of fixed and scalable. |
@craig.topper do you think we should merge this in or just abandon it since D106403 presumably covers the same case?
Can we use
to avoid the mix of fixed and scalable.