We try to fold RISCVISD::VMV_V_X_VL series node + scalar load -> vector load. But if scalar load is indexed load (load update form), it's not profitable to fold because load update node can't be removed after fold.
Details
Diff Detail
Event Timeline
It's actually less frequent because the standard extension does not have load update instruction. But XTHeadFMemIdx extension has load update instruction, so it appears in downstream.
Should we even do the transform for an indexed load? Won’t we lose the update part of the load update?
It's not about the replacement of load update node, it's about VMV_V_X_VL node replacement. It just uses the data part of load update result. As the chain replacement, it just upgrades the chain from load update node to new vector load node because the new vector load node inserted after load update node in sequence order.
It’s not legal to replace the chain result of the original node wasn’t deleted. We would need to insert a TokenFactor to collect the chains from the new and old load and replace all uses of the old load’s chain with the TokenFactor result. Otherwise the dependencies aren’t maintained.
It’s also not legal to duplicate a volatile load. I suspect we didn’t check that.
I think we should disable the load update case in isLegalToFold or isProfitableToFold.
It’s not legal to replace the chain result of the original node wasn’t deleted. We would need to insert a TokenFactor to collect the chains from the new and old load and replace all uses of the old load’s chain with the TokenFactor result.
OK, using TokenFactor is more formal, and I agree with it. But it's another issue to enhance the code, we still need think about load update load.
It’s also not legal to duplicate a volatile load. I suspect we didn’t check that.
I aggree.
I think we should disable the load update case in isLegalToFold or isProfitableToFold.
Why is load update not profitable or legal? I don't see any difference with normal load.
We only fold normal loads if it removes the scalar load. So that's different than the load update which won't be removed.
It's not legal if we don't add the TokenFactor and fix the volatile issue. Making it not legal/profitable avoids those issues.
Ok I looked closer at the code. Neither isProfitableToFold or isLegalToFold are implemented in this file. I think we could add an override of isProfitableToFold, but might not be worth it. We can reject load update this code near where we call isProfitableToFold/isLegalToFold.
Yes, and I also think it makes sense more because rejecting load update is not common logic to add to isProfitableToFold/isLegalToFold.
We only fold normal loads if it removes the scalar load. So that's different than the load update which won't be removed.
I got.
You mean the scalar load update node can't be folded so that there are still two load(one scalar and one vector) result. Instead, it should be one vector load.
HasVendorXTHeadFMemIdx is in upstream, so we should be able to write a test for this?
Theoretically, it can construct one case with XTHeadFMemIdx extension, but it only appears in downstream codebase. I am trying to reproduce it without downstream code.
llvm/test/CodeGen/RISCV/rvv/fold-scalar-load-crash.ll | ||
---|---|---|
11 | Why is there no instructions here? All of them are optimized out? |
llvm/test/CodeGen/RISCV/rvv/fold-scalar-load-crash.ll | ||
---|---|---|
11 | it's interesting. It's removed by dead-mi-elimination pass. But this is truly a case reduced from big original source to reproduce the compiling assert in DAG selection stage :) |
llvm/test/CodeGen/RISCV/rvv/fold-scalar-load-crash.ll | ||
---|---|---|
42 | How about: define i32 @test(i32 %size, ptr %add.ptr, i64 %const) { entry: br label %for.body for.body: ; preds = %for.body, %entry %add.ptr1 = getelementptr i8, ptr %add.ptr, i32 -1 %add.ptr2 = getelementptr i8, ptr %add.ptr1, i32 %size %0 = load i8, ptr %add.ptr1, align 1 %1 = load i8, ptr %add.ptr2, align 1 %2 = insertelement <8 x i8> poison, i8 %0, i64 0 %3 = insertelement <8 x i8> %2, i8 0, i64 %const %4 = insertelement <8 x i8> %3, i8 %1, i64 0 %5 = icmp ult <8 x i8> %4, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1> %6 = bitcast <8 x i1> %5 to i8 %7 = zext i8 %6 to i32 %cond = icmp eq i32 %7, 0 br i1 %cond, label %if.then381, label %for.body if.then381: ; preds = %for.body ret i32 0 } |
llvm/test/CodeGen/RISCV/rvv/fold-scalar-load-crash.ll | ||
---|---|---|
42 | Thank you for providing this case. It works. |
Why is there no instructions here? All of them are optimized out?