Page MenuHomePhabricator

[SplitKit] Only copy live lanes
ClosedPublic

Authored by foad on Wed, Sep 16, 5:07 AM.

Details

Summary

When splitting a live interval with subranges, only insert copies for
the lanes that are live at the point of the split. This avoids some
unnecessary copies and fixes a problem where copying dead lanes was
generating MIR that failed verification. The test case for this is
test/CodeGen/AMDGPU/splitkit-copy-live-lanes.mir.

Without this fix, some earlier live range splitting would create %430:

%430 [256r,848r:0)[848r,2584r:1) 0@256r 1@848r L0000000000000003 [848r,2584r:0) 0@848r L0000000000000030 [256r,2584r:0) 0@256r weight:1.480938e-03
...
256B undef %430.sub2:vreg_128 = V_LSHRREV_B32_e32 16, %20.sub1:vreg_128, implicit $exec
...
848B %430.sub0:vreg_128 = V_AND_B32_e32 %92:sreg_32, %20.sub1:vreg_128, implicit $exec
...
2584B %431:vreg_128 = COPY %430:vreg_128

Then RAGreedy::tryLocalSplit would split %430 into %432 and %433 just
before 848B giving:

%432 [256r,844r:0) 0@256r L0000000000000030 [256r,844r:0) 0@256r weight:3.066802e-03
%433 [844r,848r:0)[848r,2584r:1) 0@844r 1@848r L0000000000000030 [844r,2584r:0) 0@844r L0000000000000003 [844r,844d:0)[848r,2584r:1) 0@844r 1@848r weight:2.831776e-03
...
256B undef %432.sub2:vreg_128 = V_LSHRREV_B32_e32 16, %20.sub1:vreg_128, implicit $exec
...
844B undef %433.sub0:vreg_128 = COPY %432.sub0:vreg_128 {

internal %433.sub2:vreg_128 = COPY %432.sub2:vreg_128

848B }

%433.sub0:vreg_128 = V_AND_B32_e32 %92:sreg_32, %20.sub1:vreg_128, implicit $exec

...
2584B %431:vreg_128 = COPY %433:vreg_128

Note that the copy from %432 to %433 at 844B is a curious
bundle-without-a-BUNDLE-instruction that SplitKit creates deliberately,
and it includes a copy of .sub0 which is not live at this point, and
that causes it to fail verification:

    • Bad machine code: No live subrange at use ***
  • function: zextload_global_v64i16_to_v64i64
  • basic block: %bb.0 (0x7faed48) [0B;2848B)
  • instruction: 844B undef %433.sub0:vreg_128 = COPY %432.sub0:vreg_128
  • operand 1: %432.sub0:vreg_128
  • interval: %432 [256r,844r:0) 0@256r L0000000000000030 [256r,844r:0) 0@256r weight:3.066802e-03
  • at: 844B

Using real bundles with a BUNDLE instruction might also fix this
problem, but the current fix is less invasive and also avoids some
unnecessary copies.

https://bugs.llvm.org/show_bug.cgi?id=47492

Diff Detail

Event Timeline

foad created this revision.Wed, Sep 16, 5:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptWed, Sep 16, 5:07 AM
foad requested review of this revision.Wed, Sep 16, 5:07 AM
foad added a subscriber: rampitec.Wed, Sep 16, 5:10 AM

@rampitec this patch causes CodeGen/AMDGPU/splitkit-copy-bundle.mir to fail because it no longer generates any bundles. Could you help find a way to update the test case? I have pushed this patch to github in case you want to pull from there: https://github.com/jayfoad/llvm-project/tree/only-copy-live-lanes

lkail added a subscriber: lkail.Wed, Sep 16, 5:29 AM

I'm assuming this doesn't fix the problem of spilling unnecessary lanes?

llvm/test/CodeGen/AMDGPU/splitkit-copy-live-lanes.mir
6

Missing checks

78–80

Can you -run-pass=none to compact these registers

foad added a comment.Wed, Sep 16, 6:24 AM

I'm assuming this doesn't fix the problem of spilling unnecessary lanes?

I'm not familiar with that problem but I doubt it. Though it does seem to have slightly reduced the amount of space that @test_limited_sgpr uses for spilling.

llvm/test/CodeGen/AMDGPU/splitkit-copy-live-lanes.mir
6

I only want to check that verification doesn't fail. I was tempted to leave out the | FileCheck %s altogether but I recall that some reviewers don't like that. What would you like me to check for?

arsenm added inline comments.Wed, Sep 16, 6:31 AM
llvm/test/CodeGen/AMDGPU/splitkit-copy-live-lanes.mir
6

The spills and copies, or just generate the checks

foad updated this revision to Diff 292204.Wed, Sep 16, 6:48 AM

Generate test checks.

foad marked 2 inline comments as done.Wed, Sep 16, 6:48 AM

@rampitec this patch causes CodeGen/AMDGPU/splitkit-copy-bundle.mir to fail because it no longer generates any bundles. Could you help find a way to update the test case? I have pushed this patch to github in case you want to pull from there: https://github.com/jayfoad/llvm-project/tree/only-copy-live-lanes

There is very high chance this test in unfixable with your changes as these prevent the very situation where the KIILs were formed. I'd say you may just generate checks there. At the end this code snippet did not compile and asserted, so we need to make sure it compiles. Bundles themselves are just an implementation detail.

foad updated this revision to Diff 292310.Wed, Sep 16, 12:31 PM

Generate test checks for splitkit-copy-bundle.mir.

foad added a comment.Wed, Sep 16, 12:32 PM

@rampitec this patch causes CodeGen/AMDGPU/splitkit-copy-bundle.mir to fail because it no longer generates any bundles. Could you help find a way to update the test case? I have pushed this patch to github in case you want to pull from there: https://github.com/jayfoad/llvm-project/tree/only-copy-live-lanes

There is very high chance this test in unfixable with your changes as these prevent the very situation where the KIILs were formed. I'd say you may just generate checks there. At the end this code snippet did not compile and asserted, so we need to make sure it compiles. Bundles themselves are just an implementation detail.

How about this? I dropped the ASM RUN line because it's awkward to use update_llc_test_checks and update_mir_test_checks in the same file.

@rampitec this patch causes CodeGen/AMDGPU/splitkit-copy-bundle.mir to fail because it no longer generates any bundles. Could you help find a way to update the test case? I have pushed this patch to github in case you want to pull from there: https://github.com/jayfoad/llvm-project/tree/only-copy-live-lanes

There is very high chance this test in unfixable with your changes as these prevent the very situation where the KIILs were formed. I'd say you may just generate checks there. At the end this code snippet did not compile and asserted, so we need to make sure it compiles. Bundles themselves are just an implementation detail.

How about this? I dropped the ASM RUN line because it's awkward to use update_llc_test_checks and update_mir_test_checks in the same file.

Looks good to me.

This revision was not accepted when it landed; it landed in state Needs Review.Thu, Sep 17, 1:26 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.