This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Allow rematerialization of virtual reg uses
AbandonedPublic

Authored by rampitec on Jul 19 2021, 2:10 PM.

Details

Reviewers
arsenm
Summary

Currently isReallyTriviallyReMaterializableGeneric() implementation
prevents rematerialization on any virtual register use on the grounds
that is not a trivial rematerialization and that we do not want to
extend liveranges.

It appears that LRE logic does not attempt to extend a liverange of
a source register for rematerialization so that is not an issue.
That is checked in the LiveRangeEdit::allUsesAvailableAt().

The only non-trivial aspect of it on top of what we are already
checking for VOP (e.g. implicit uses/defs) and which is already working
with our SIInstrInfo::isReallyTriviallyReMaterializable() is accounting
for tied-defs which normally represent a read-modify-write operation
and not rematerializable. Everything else is proven to work by the
existing VOP implementation. We already allow such uses and bypassing
the rest of generic checks for VOP, but cannot rematerialize SOP
instructions with register uses.

The test for a tied-def situation already exists in the remat-vop.mir,
test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve.

Applying the same logic to other targets shows that there are really
non-trivial situations with uses so the logic is not generally
acceptable for any target.

Diff Detail

Event Timeline

rampitec created this revision.Jul 19 2021, 2:10 PM
rampitec requested review of this revision.Jul 19 2021, 2:10 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 19 2021, 2:10 PM
Herald added a subscriber: wdng. · View Herald Transcript

I don't follow this. Why would we want to override the generic logic and extend live ranges? If the LRE checks that it doesn't extend the live range, why not just remove the generic check?

I don't follow this. Why would we want to override the generic logic and extend live ranges? If the LRE checks that it doesn't extend the live range, why not just remove the generic check?

It does not extend live ranges, I have added a test for it. The comment is likely just obsolete. But then if I simply remove the check from the generic implementation a lot of tests in other targets start to fail. These must have their own restrictions which do not apply to us. For instance our restriction will be to avoid tied-defs and would have to go into generic logic then.

I don't follow this. Why would we want to override the generic logic and extend live ranges? If the LRE checks that it doesn't extend the live range, why not just remove the generic check?

It does not extend live ranges, I have added a test for it. The comment is likely just obsolete. But then if I simply remove the check from the generic implementation a lot of tests in other targets start to fail. These must have their own restrictions which do not apply to us.

This could also just mean code improvements. You can't just assume these are incorrect changes without looking. I don't see much special going on in the other target implementations

I don't follow this. Why would we want to override the generic logic and extend live ranges? If the LRE checks that it doesn't extend the live range, why not just remove the generic check?

It does not extend live ranges, I have added a test for it. The comment is likely just obsolete. But then if I simply remove the check from the generic implementation a lot of tests in other targets start to fail. These must have their own restrictions which do not apply to us.

This could also just mean code improvements. You can't just assume these are incorrect changes without looking. I don't see much special going on in the other target implementations

Using vreg without a def is not an improvement, just as other sorts of failed verification.

I don't follow this. Why would we want to override the generic logic and extend live ranges? If the LRE checks that it doesn't extend the live range, why not just remove the generic check?

It does not extend live ranges, I have added a test for it. The comment is likely just obsolete. But then if I simply remove the check from the generic implementation a lot of tests in other targets start to fail. These must have their own restrictions which do not apply to us.

This could also just mean code improvements. You can't just assume these are incorrect changes without looking. I don't see much special going on in the other target implementations

Using vreg without a def is not an improvement, just as other sorts of failed verification.

This could also mean we just don't have a test that hits this. I think it would be worth looking into these failures to see if this really is conceptually correct

I don't follow this. Why would we want to override the generic logic and extend live ranges? If the LRE checks that it doesn't extend the live range, why not just remove the generic check?

It does not extend live ranges, I have added a test for it. The comment is likely just obsolete. But then if I simply remove the check from the generic implementation a lot of tests in other targets start to fail. These must have their own restrictions which do not apply to us.

This could also just mean code improvements. You can't just assume these are incorrect changes without looking. I don't see much special going on in the other target implementations

Using vreg without a def is not an improvement, just as other sorts of failed verification.

This could also mean we just don't have a test that hits this. I think it would be worth looking into these failures to see if this really is conceptually correct

I just have to note, we are doing this for ages. Just for select valu moves.

OK, here is the test which fails just the same way as aarch64 without any changes at all:

# RUN: llc -march=amdgcn -mcpu=gfx900 -o - -verify-coalescing -run-pass=simple-register-coalescing %s

---
name:            test
tracksRegLiveness: true
body:             |
  bb.0:
    liveins: $vgpr0

    %0:vgpr_32 = COPY $vgpr0
    %1:vgpr_32 = V_ADD_U32_e32 1, %0, implicit $exec
    %2:vgpr_32 = V_MOV_B32_e32 killed %1, implicit $exec
    $vgpr0 = COPY killed %2
    SI_RETURN_TO_EPILOG killed $vgpr0
...

We allow rematerialization of the V_MOV_B32 into the COPY, but %1 is killed at that slot and it ends up as:

0B      bb.0:
          liveins: $vgpr0
16B       KILL $vgpr0
64B       $vgpr0 = V_MOV_B32_e32 %1:vgpr_32, implicit $exec
80B       SI_RETURN_TO_EPILOG $vgpr0

Another form of the same test, without a killed use but with a later separate KILL:

# RUN: llc -march=amdgcn -mcpu=gfx900 -o - -verify-coalescing -run-pass=simple-register-coalescing %s

---
name:            test
tracksRegLiveness: true
body:             |
  bb.0:
    liveins: $vgpr0

    %0:vgpr_32 = COPY $vgpr0
    %1:vgpr_32 = V_ADD_U32_e32 1, %0, implicit $exec
    %2:vgpr_32 = V_MOV_B32_e32 %1, implicit $exec
    KILL %1
    $vgpr0 = COPY killed %2
    SI_RETURN_TO_EPILOG killed $vgpr0
...

Looks like LRE checks that all uses are available, but coalescer does not.

Another form of the same test, without a killed use but with a later separate KILL:

# RUN: llc -march=amdgcn -mcpu=gfx900 -o - -verify-coalescing -run-pass=simple-register-coalescing %s

---
name:            test
tracksRegLiveness: true
body:             |
  bb.0:
    liveins: $vgpr0

    %0:vgpr_32 = COPY $vgpr0
    %1:vgpr_32 = V_ADD_U32_e32 1, %0, implicit $exec
    %2:vgpr_32 = V_MOV_B32_e32 %1, implicit $exec
    KILL %1
    $vgpr0 = COPY killed %2
    SI_RETURN_TO_EPILOG killed $vgpr0
...

Looks like LRE checks that all uses are available, but coalescer does not.

Should be fixed by D106396.

Alternative D106408 does that for all targets.

rampitec abandoned this revision.Aug 16 2021, 12:07 PM