This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Don't fold RISCVISD::VMV_V_X_VL series node and scalar load to vector load when scalar load is update load
ClosedPublic

Authored by zixuan-wu on Jun 5 2023, 7:42 PM.

Details

Summary

We try to fold RISCVISD::VMV_V_X_VL series node + scalar load -> vector load. But if scalar load is indexed load (load update form), it's not profitable to fold because load update node can't be removed after fold.

Diff Detail

Event Timeline

zixuan-wu created this revision.Jun 5 2023, 7:42 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 5 2023, 7:42 PM
zixuan-wu requested review of this revision.Jun 5 2023, 7:42 PM
zixuan-wu edited the summary of this revision. (Show Details)Jun 5 2023, 7:43 PM
zixuan-wu edited the summary of this revision. (Show Details)

When does a load and update node occur on RISC-v?

When does a load and update node occur on RISC-v?

It's actually less frequent because the standard extension does not have load update instruction. But XTHeadFMemIdx extension has load update instruction, so it appears in downstream.

craig.topper added a comment.EditedJun 5 2023, 8:13 PM

When does a load and update node occur on RISC-v?

It's actually less frequent because the standard extension does not have load update instruction. But XTHeadFMemIdx extension has load update instruction, so it appears in downstream.

Should we even do the transform for an indexed load? Won’t we lose the update part of the load update?

When does a load and update node occur on RISC-v?

It's actually less frequent because the standard extension does not have load update instruction. But XTHeadFMemIdx extension has load update instruction, so it appears in downstream.

Should we even do the transform for an indexed load? Won’t we lose the update part of the load update?

It's not about the replacement of load update node, it's about VMV_V_X_VL node replacement. It just uses the data part of load update result. As the chain replacement, it just upgrades the chain from load update node to new vector load node because the new vector load node inserted after load update node in sequence order.

craig.topper added a comment.EditedJun 5 2023, 9:16 PM

When does a load and update node occur on RISC-v?

It's actually less frequent because the standard extension does not have load update instruction. But XTHeadFMemIdx extension has load update instruction, so it appears in downstream.

Should we even do the transform for an indexed load? Won’t we lose the update part of the load update?

It's not about the replacement of load update node, it's about VMV_V_X_VL node replacement. It just uses the data part of load update result. As the chain replacement, it just upgrades the chain from load update node to new vector load node because the new vector load node inserted after load update node in sequence order.

It’s not legal to replace the chain result of the original node wasn’t deleted. We would need to insert a TokenFactor to collect the chains from the new and old load and replace all uses of the old load’s chain with the TokenFactor result. Otherwise the dependencies aren’t maintained.

It’s also not legal to duplicate a volatile load. I suspect we didn’t check that.

I think we should disable the load update case in isLegalToFold or isProfitableToFold.

It’s not legal to replace the chain result of the original node wasn’t deleted. We would need to insert a TokenFactor to collect the chains from the new and old load and replace all uses of the old load’s chain with the TokenFactor result.

OK, using TokenFactor is more formal, and I agree with it. But it's another issue to enhance the code, we still need think about load update load.

It’s also not legal to duplicate a volatile load. I suspect we didn’t check that.

I aggree.

I think we should disable the load update case in isLegalToFold or isProfitableToFold.

Why is load update not profitable or legal? I don't see any difference with normal load.

It’s not legal to replace the chain result of the original node wasn’t deleted. We would need to insert a TokenFactor to collect the chains from the new and old load and replace all uses of the old load’s chain with the TokenFactor result.

OK, using TokenFactor is more formal, and I agree with it. But it's another issue to enhance the code, we still need think about load update load.

It’s also not legal to duplicate a volatile load. I suspect we didn’t check that.

I aggree.

I think we should disable the load update case in isLegalToFold or isProfitableToFold.

Why is load update not profitable or legal? I don't see any difference with normal load.

We only fold normal loads if it removes the scalar load. So that's different than the load update which won't be removed.

It's not legal if we don't add the TokenFactor and fix the volatile issue. Making it not legal/profitable avoids those issues.

Ok I looked closer at the code. Neither isProfitableToFold or isLegalToFold are implemented in this file. I think we could add an override of isProfitableToFold, but might not be worth it. We can reject load update this code near where we call isProfitableToFold/isLegalToFold.

Ok I looked closer at the code. Neither isProfitableToFold or isLegalToFold are implemented in this file. I think we could add an override of isProfitableToFold, but might not be worth it. We can reject load update this code near where we call isProfitableToFold/isLegalToFold.

Yes, and I also think it makes sense more because rejecting load update is not common logic to add to isProfitableToFold/isLegalToFold.

We only fold normal loads if it removes the scalar load. So that's different than the load update which won't be removed.

I got.
You mean the scalar load update node can't be folded so that there are still two load(one scalar and one vector) result. Instead, it should be one vector load.

We only fold normal loads if it removes the scalar load. So that's different than the load update which won't be removed.

I got.
You mean the scalar load update node can't be folded so that there are still two load(one scalar and one vector) result. Instead, it should be one vector load.

Correct.

HasVendorXTHeadFMemIdx is in upstream, so we should be able to write a test for this?

zixuan-wu updated this revision to Diff 528719.Jun 6 2023, 12:15 AM
zixuan-wu retitled this revision from [RISCV] Fix the num of chain SDNode introduced in 9e0f9f113248093e737c4cf5450f0a3c2bcd90ba to [RISCV] Don't fold RISCVISD::VMV_V_X_VL series node and scalar load to vector load when scalar load is update load.
zixuan-wu edited the summary of this revision. (Show Details)

HasVendorXTHeadFMemIdx is in upstream, so we should be able to write a test for this?

Theoretically, it can construct one case with XTHeadFMemIdx extension, but it only appears in downstream codebase. I am trying to reproduce it without downstream code.

zixuan-wu updated this revision to Diff 528738.Jun 6 2023, 1:15 AM
pcwang-thead added a comment.EditedJun 6 2023, 1:36 AM
This comment has been deleted.
llvm/test/CodeGen/RISCV/rvv/fold-scalar-load-crash.ll
12

Why is there no instructions here? All of them are optimized out?

zixuan-wu added inline comments.Jun 6 2023, 1:45 AM
llvm/test/CodeGen/RISCV/rvv/fold-scalar-load-crash.ll
12

it's interesting. It's removed by dead-mi-elimination pass. But this is truly a case reduced from big original source to reproduce the compiling assert in DAG selection stage :)

llvm/test/CodeGen/RISCV/rvv/fold-scalar-load-crash.ll
43

How about:

define i32 @test(i32 %size, ptr %add.ptr, i64 %const) {
entry:
  br label %for.body

for.body:                                         ; preds = %for.body, %entry
  %add.ptr1 = getelementptr i8, ptr %add.ptr, i32 -1
  %add.ptr2 = getelementptr i8, ptr %add.ptr1, i32 %size
  %0 = load i8, ptr %add.ptr1, align 1
  %1 = load i8, ptr %add.ptr2, align 1
  %2 = insertelement <8 x i8> poison, i8 %0, i64 0
  %3 = insertelement <8 x i8> %2, i8 0, i64 %const
  %4 = insertelement <8 x i8> %3, i8 %1, i64 0
  %5 = icmp ult <8 x i8> %4, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
  %6 = bitcast <8 x i1> %5 to i8
  %7 = zext i8 %6 to i32
  %cond = icmp eq i32 %7, 0
  br i1 %cond, label %if.then381, label %for.body

if.then381:                                       ; preds = %for.body
  ret i32 0
}
zixuan-wu added inline comments.Jun 6 2023, 2:18 AM
llvm/test/CodeGen/RISCV/rvv/fold-scalar-load-crash.ll
43

Thank you for providing this case. It works.

zixuan-wu updated this revision to Diff 528752.Jun 6 2023, 2:21 AM

ping. @craig.topper Do you have more comments?

This revision is now accepted and ready to land.Jul 10 2023, 11:38 PM
This revision was landed with ongoing or failed builds.Jul 11 2023, 12:58 AM
This revision was automatically updated to reflect the committed changes.