This is an archive of the discontinued LLVM Phabricator instance.

[SplitKit] Only copy live lanes
ClosedPublic

Authored by foad on Sep 16 2020, 5:07 AM.

Details

Summary

When splitting a live interval with subranges, only insert copies for
the lanes that are live at the point of the split. This avoids some
unnecessary copies and fixes a problem where copying dead lanes was
generating MIR that failed verification. The test case for this is
test/CodeGen/AMDGPU/splitkit-copy-live-lanes.mir.

Without this fix, some earlier live range splitting would create %430:

%430 [256r,848r:0)[848r,2584r:1) 0@256r 1@848r L0000000000000003 [848r,2584r:0) 0@848r L0000000000000030 [256r,2584r:0) 0@256r weight:1.480938e-03
...
256B undef %430.sub2:vreg_128 = V_LSHRREV_B32_e32 16, %20.sub1:vreg_128, implicit $exec
...
848B %430.sub0:vreg_128 = V_AND_B32_e32 %92:sreg_32, %20.sub1:vreg_128, implicit $exec
...
2584B %431:vreg_128 = COPY %430:vreg_128

Then RAGreedy::tryLocalSplit would split %430 into %432 and %433 just
before 848B giving:

%432 [256r,844r:0) 0@256r L0000000000000030 [256r,844r:0) 0@256r weight:3.066802e-03
%433 [844r,848r:0)[848r,2584r:1) 0@844r 1@848r L0000000000000030 [844r,2584r:0) 0@844r L0000000000000003 [844r,844d:0)[848r,2584r:1) 0@844r 1@848r weight:2.831776e-03
...
256B undef %432.sub2:vreg_128 = V_LSHRREV_B32_e32 16, %20.sub1:vreg_128, implicit $exec
...
844B undef %433.sub0:vreg_128 = COPY %432.sub0:vreg_128 {

internal %433.sub2:vreg_128 = COPY %432.sub2:vreg_128

848B }

%433.sub0:vreg_128 = V_AND_B32_e32 %92:sreg_32, %20.sub1:vreg_128, implicit $exec

...
2584B %431:vreg_128 = COPY %433:vreg_128

Note that the copy from %432 to %433 at 844B is a curious
bundle-without-a-BUNDLE-instruction that SplitKit creates deliberately,
and it includes a copy of .sub0 which is not live at this point, and
that causes it to fail verification:

    • Bad machine code: No live subrange at use ***
  • function: zextload_global_v64i16_to_v64i64
  • basic block: %bb.0 (0x7faed48) [0B;2848B)
  • instruction: 844B undef %433.sub0:vreg_128 = COPY %432.sub0:vreg_128
  • operand 1: %432.sub0:vreg_128
  • interval: %432 [256r,844r:0) 0@256r L0000000000000030 [256r,844r:0) 0@256r weight:3.066802e-03
  • at: 844B

Using real bundles with a BUNDLE instruction might also fix this
problem, but the current fix is less invasive and also avoids some
unnecessary copies.

https://bugs.llvm.org/show_bug.cgi?id=47492

Diff Detail

Unit TestsFailed

Event Timeline

foad created this revision.Sep 16 2020, 5:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 16 2020, 5:07 AM
foad requested review of this revision.Sep 16 2020, 5:07 AM
foad added a subscriber: rampitec.Sep 16 2020, 5:10 AM

@rampitec this patch causes CodeGen/AMDGPU/splitkit-copy-bundle.mir to fail because it no longer generates any bundles. Could you help find a way to update the test case? I have pushed this patch to github in case you want to pull from there: https://github.com/jayfoad/llvm-project/tree/only-copy-live-lanes

lkail added a subscriber: lkail.Sep 16 2020, 5:29 AM

I'm assuming this doesn't fix the problem of spilling unnecessary lanes?

llvm/test/CodeGen/AMDGPU/splitkit-copy-live-lanes.mir
6

Missing checks

78–80

Can you -run-pass=none to compact these registers

foad added a comment.Sep 16 2020, 6:24 AM

I'm assuming this doesn't fix the problem of spilling unnecessary lanes?

I'm not familiar with that problem but I doubt it. Though it does seem to have slightly reduced the amount of space that @test_limited_sgpr uses for spilling.

llvm/test/CodeGen/AMDGPU/splitkit-copy-live-lanes.mir
6

I only want to check that verification doesn't fail. I was tempted to leave out the | FileCheck %s altogether but I recall that some reviewers don't like that. What would you like me to check for?

arsenm added inline comments.Sep 16 2020, 6:31 AM
llvm/test/CodeGen/AMDGPU/splitkit-copy-live-lanes.mir
6

The spills and copies, or just generate the checks

foad updated this revision to Diff 292204.Sep 16 2020, 6:48 AM

Generate test checks.

foad marked 2 inline comments as done.Sep 16 2020, 6:48 AM

@rampitec this patch causes CodeGen/AMDGPU/splitkit-copy-bundle.mir to fail because it no longer generates any bundles. Could you help find a way to update the test case? I have pushed this patch to github in case you want to pull from there: https://github.com/jayfoad/llvm-project/tree/only-copy-live-lanes

There is very high chance this test in unfixable with your changes as these prevent the very situation where the KIILs were formed. I'd say you may just generate checks there. At the end this code snippet did not compile and asserted, so we need to make sure it compiles. Bundles themselves are just an implementation detail.

foad updated this revision to Diff 292310.Sep 16 2020, 12:31 PM

Generate test checks for splitkit-copy-bundle.mir.

foad added a comment.Sep 16 2020, 12:32 PM

@rampitec this patch causes CodeGen/AMDGPU/splitkit-copy-bundle.mir to fail because it no longer generates any bundles. Could you help find a way to update the test case? I have pushed this patch to github in case you want to pull from there: https://github.com/jayfoad/llvm-project/tree/only-copy-live-lanes

There is very high chance this test in unfixable with your changes as these prevent the very situation where the KIILs were formed. I'd say you may just generate checks there. At the end this code snippet did not compile and asserted, so we need to make sure it compiles. Bundles themselves are just an implementation detail.

How about this? I dropped the ASM RUN line because it's awkward to use update_llc_test_checks and update_mir_test_checks in the same file.

@rampitec this patch causes CodeGen/AMDGPU/splitkit-copy-bundle.mir to fail because it no longer generates any bundles. Could you help find a way to update the test case? I have pushed this patch to github in case you want to pull from there: https://github.com/jayfoad/llvm-project/tree/only-copy-live-lanes

There is very high chance this test in unfixable with your changes as these prevent the very situation where the KIILs were formed. I'd say you may just generate checks there. At the end this code snippet did not compile and asserted, so we need to make sure it compiles. Bundles themselves are just an implementation detail.

How about this? I dropped the ASM RUN line because it's awkward to use update_llc_test_checks and update_mir_test_checks in the same file.

Looks good to me.

This revision was not accepted when it landed; it landed in state Needs Review.Sep 17 2020, 1:26 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
foad added a subscriber: vpykhtin.Sep 25 2020, 3:25 AM
jahentao added a subscriber: jahentao.EditedJul 28 2021, 4:56 AM

How to solve this problem with "BUNDLE instruction" as mentioned? I have the same problem but cannot fix with this reviewed patch.

Using real bundles with a BUNDLE instruction might also fix this problem, but the current fix is less invasive and also avoids some unnecessary copies.

I'm new to it, but I want to have a try.

foad added a comment.Jul 28 2021, 7:09 AM

How to solve this problem with "BUNDLE instruction" as mentioned? I have the same problem but cannot fix with this reviewed patch.

Using real bundles with a BUNDLE instruction might also fix this problem, but the current fix is less invasive and also avoids some unnecessary copies.

I'm new to it, but I want to have a try.

I'm sorry, but I don't remember why I thought using real BUNDLE instructions might help.