This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Speedup SIFormMemoryClauses live-in register set calculation
Needs RevisionPublic

Authored by vpykhtin on Oct 25 2022, 5:28 AM.

Details

Summary

This patch uses the same approach used in GCNScheduleDAGMILive::getBBLiveInMap,
getLiveRegMap has complexity O(NumVirtRegs * averageLiveRangeSegmentsPerReg * lg(NumBB))

Diff Detail

Event Timeline

vpykhtin created this revision.Oct 25 2022, 5:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 25 2022, 5:28 AM
vpykhtin requested review of this revision.Oct 25 2022, 5:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 25 2022, 5:28 AM
arsenm added inline comments.Oct 25 2022, 11:46 AM
llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp
280

You seem to be assuming a single clause per block. I'd expect to handle this a full clause in a time, within a single block.

vpykhtin added inline comments.Oct 25 2022, 11:53 AM
llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp
280

Not quite, I just compute the live-in set for the first clause per BB to reset the RPTracker, then it is advanced to the next clause.

This revision is now accepted and ready to land.Oct 25 2022, 11:59 AM
arsenm requested changes to this revision.Oct 25 2022, 1:22 PM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp
280

Can you do this per block, instead of calculating this for every block?

This revision now requires changes to proceed.Oct 25 2022, 1:22 PM
vpykhtin added inline comments.Oct 25 2022, 2:37 PM
llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp
280

This is the whole point of doing that all at once:

  1. Slot indexes of all clauses first instructions are collected and sorted.
  2. For every virtual register's LiveRange we have two sorted sequences: Segments and SlotIndexes. We need to determine

which of SlotIndexes fall into Segments of the virtual register - that would mean the register is live at those SlotIndexes.

  1. Since both sequences are sorted we progressively use two-way binary search: either SlotIndex that is contained by the Segment, or Segment containing the SlotIndex.

I now realize the complexity is not what I thought before, it should be (per register):

O( min ( NumSegments * lg(NumSlotIndexes), NumSlotIndexes * lg(NumSegments) )

See getLiveRegMap, LiveRange::findIndexesLiveAt.

vpykhtin added inline comments.Oct 25 2022, 2:48 PM
llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp
280

Sorry I mean first instruction of a first clause per BB, following clauses are processed using 'advance'

arsenm added inline comments.Nov 16 2022, 4:29 PM
llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp
286

Can you add a comment explaining this?

I'm still not following how you're doing it all at once, but subsequent clauses are handled later?

Is this still relevant?