Page MenuHomePhabricator

AMDGPU: Always split si_end_cf blocks
Needs RevisionPublic

Authored by arsenm on Sep 11 2020, 12:35 PM.

Details

Summary

Ensure we only put the exec modification in a terminator instruction.

The trickiest part of this was dealing with SI_KILL_CLEANUP, which I
don't fully understand. It tries to preserve it as a terminator and
avoids removing it.

Diff Detail

Event Timeline

arsenm created this revision.Sep 11 2020, 12:35 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 11 2020, 12:35 PM
arsenm requested review of this revision.Sep 11 2020, 12:35 PM
critson accepted this revision.Sep 24 2020, 2:08 AM

LGTM.

SI_KILL_CLEANUP pseudos are inserted to mark points where control flow merges and hence the exec mask can be evaluated for early termination of a pixel shader.
These early terminations are added by SIInsertSkips which contains the logic for determining if it is safe to early terminate at a given point.

This revision is now accepted and ready to land.Sep 24 2020, 2:08 AM
nhaehnle requested changes to this revision.Sep 30 2020, 7:55 AM

It's unclear to me what this is trying to achieve. If it is to prevent

bb:
  <-- reload inserted here during live range splitting
  $exec = S_OR_B64 $exec, %other
  ... rest of code ...

... then this change only replaces it by:

bb:
  <-- reload inserted here during live range splitting
  $exec = S_OR_B64_term $exec, %other
  // fallthrough

bb.new:
  ... rest of code ...

The inserted reload code is as incorrect as it was before.

This revision now requires changes to proceed.Sep 30 2020, 7:55 AM

It's unclear to me what this is trying to achieve. If it is to prevent

bb:
  <-- reload inserted here during live range splitting
  $exec = S_OR_B64 $exec, %other
  ... rest of code ...

... then this change only replaces it by:

bb:
  <-- reload inserted here during live range splitting
  $exec = S_OR_B64_term $exec, %other
  // fallthrough

bb.new:
  ... rest of code ...

The inserted reload code is as incorrect as it was before.

I'm not trying to fully solve the live range splitting problem greedy regalloc hits. I'm trying to eliminate the isBasicBlockPrologue concept that fastregalloc trips over when inserting spills at the beginning of the block

I'm not trying to fully solve the live range splitting problem greedy regalloc hits. I'm trying to eliminate the isBasicBlockPrologue concept that fastregalloc trips over when inserting spills at the beginning of the block

What if the concept of a basic block prolog *is* the correct long term solution?

I'm not trying to fully solve the live range splitting problem greedy regalloc hits. I'm trying to eliminate the isBasicBlockPrologue concept that fastregalloc trips over when inserting spills at the beginning of the block

What if the concept of a basic block prolog *is* the correct long term solution?

I don't think it is a well formed concept. The operations the prolog inputs depend on transitively turn into prolog instructions, which generally breaks down.

It's unclear to me what this is trying to achieve. If it is to prevent

bb:
  <-- reload inserted here during live range splitting
  $exec = S_OR_B64 $exec, %other
  ... rest of code ...

... then this change only replaces it by:

bb:
  <-- reload inserted here during live range splitting
  $exec = S_OR_B64_term $exec, %other
  // fallthrough

bb.new:
  ... rest of code ...

The inserted reload code is as incorrect as it was before.

I'm not trying to fully solve the live range splitting problem greedy regalloc hits. I'm trying to eliminate the isBasicBlockPrologue concept that fastregalloc trips over when inserting spills at the beginning of the block

I still not not see how you can get away without isBasicBlockPrologue. I can see how splitting can help with it, but not without. You can split everything and have S_OR the only instruction, but that does not prevent RA from inserting a reload right before it into the same BB.