This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Reduce the number of expensive calls in SIFormMemoryClause
ClosedPublic

Authored by cfang on Jan 22 2021, 4:21 PM.

Details

Summary

RPTracker::reset(MI) is a very expensive call when the number of virtual registers is huge.
We observed a long compilation time issue when RPT::reset() is called once for each cluster.

In this work, we call RPT.reset() only at the first seen cluster, and use advance() to get
the register pressure for the later clusters in the same basic block. This could effectively reduce the number
of the expensive calls and thus reduce the compile time.

Note: I am still seeing a couple LIT failures with the current state of this patch.

Diff Detail

Event Timeline

cfang created this revision.Jan 22 2021, 4:21 PM
cfang requested review of this revision.Jan 22 2021, 4:21 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 22 2021, 4:21 PM
Herald added a subscriber: wdng. · View Herald Transcript
rampitec added inline comments.Jan 22 2021, 4:48 PM
llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp
332

I am not sure, but likely RPT.getNext().isValid() has the same result as using !InitializedInBlock.

337

This second advance is not needed. reset() sets iterator before MI, not after.

340

LiveRegSet LiveRegs = ...
Make sure you are actually copying it, not taking a reference.

363

Move it down after "Ind->insertMachineInstrInMaps(*B);" and reset on B, not on MI. Technically MI's iterator is not valid for RPT purposes after bundling.

cfang added inline comments.Jan 22 2021, 6:17 PM
llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp
340

How to clone a copy? I can not find a suitable RPT function for that purpose.

363

Shouldn't we also restore before the continue when there is just one instruction?

cfang updated this revision to Diff 318810.Jan 23 2021, 10:41 PM

Make a few changes based on comments, Thanks!

  1. Use RPT.getNext().isValid() instead of InitializedInBlock;
  2. Make an actual copy of the Live Register Set in stead of the reference itself;
  3. Move the restore of state down after "Ind->insertMachineInstrInMaps(*B);" and reset on B, not on MI

PS. There are still LIT test failures complaining instructions not matching.

This looks good to me, be we need to understand what are these lit failures.

cfang updated this revision to Diff 319143.Jan 25 2021, 3:28 PM

Fix LIT failures.

rampitec accepted this revision.Jan 25 2021, 3:32 PM

LGTM. Thanks!

This revision is now accepted and ready to land.Jan 25 2021, 3:32 PM
MaskRay added a subscriber: MaskRay.EditedJan 26 2021, 9:14 AM

In your commit the message has just Reviewers:. The Reviewers: list does not necessarily mean all the people on the list have acknowledged the patch so Reviewers: is mostly useless. Many people agree that both Reviewed by: & Differential Revision: should be present.

arc amend can fetch the Phabricator summary and amend the local description.

You can install llvm/.git/hooks/pre-push to prevent accidental Summary:, Reviewers:, Subscribers: and Tags:.