Page MenuHomePhabricator

[AMDGPU] Add Optimize VGPR LiveRange Pass.
ClosedPublic

Authored by ruiling on May 10 2021, 8:17 PM.

Details

Summary

This pass aims to optimize VGPR live-range in a typical divergent if-else
control flow. For example:

def(a)
if(cond)

use(a)
... // A

else

use(a)

As AMDGPU access vgpr with respect to active-mask, we can mark a as
dead in region A. For details, please refer to the comments in
implementation file.

The pass is enabled by default, the frontend can disable it through
"-amdgpu-opt-vgpr-liverange=false".

Diff Detail

Event Timeline

ruiling created this revision.May 10 2021, 8:17 PM
ruiling requested review of this revision.May 10 2021, 8:17 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 10 2021, 8:17 PM

Is this relying on assumptions about how the placement of blocks after structurization? I think we need to augment the MIR to better track vector vs. scalar predecessors

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
168

Should default to on

llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
40

s/LLVM/register allocator

73

Why no initializer like the rest?

167

printName?

169

single quotes

204

Could also handle AGPR

277

Braces

302

printName

327

printName

460–461

Can move this before the initializations

llvm/test/CodeGen/AMDGPU/vgpr-liverange.ll
4

Could really use an additional end to end IR checks to make sure this actually gets the allocator to do what you want

Is this relying on assumptions about how the placement of blocks after structurization? I think we need to augment the MIR to better track vector vs. scalar predecessors

Thanks for the careful review! This pass assume the IR is structurized, mainly things around if-else-endif, the SI_ELSE should have a corresponding SI_IF and SI_ENDIF, and the sub-regions should be single-entry-single-exit, I don't think I have dependency on specific order of the blocks.

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
168

default on is ok to me, but I really hope we could do some testing at Compute and Mesa side to make sure we don't have obvious regression before landed.

llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
73

should be like others.

204

I guess AGPRs are used as physical registers? if yes, I think we cannot handle them here, as handling physical register needs lots more work regarding the live-or-not-checking and updating the LiveVariable information.

llvm/test/CodeGen/AMDGPU/vgpr-liverange.ll
4

good idea, will do that.

arsenm added inline comments.May 17 2021, 4:56 PM
llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
204

No, it's a class. They behave exactly like VGPRs

critson added inline comments.May 21 2021, 3:01 AM
llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
11

"optimize off unnecessary VGPR live range" -> "remove unnecessary VGPR live ranges"?

191

Any reason for explicit 8 here?
(I am also being reminded not to write explicit sizes in these.)

194

Is this Else->instrs() to skip phis for a reason? (Rather than *Else.)

197

I think you can just use MI.getNumOperands() directly in the loop condition and leave it up to the compiler whether to hoist it?
In fact you do that in the next similar loop.

200

What is the "MO.getReg() == 0" for? It does not seem correct.

223

Where does this loop check that the registers added to KillsInElse are actually from the else branch?

238

Should this test be before the live interval retrieval?

291

Can break out of loop here?
Also perhaps assert(ThenEntry) after loop?

348

As Matt noted on the earlier one of these probably should use braces here too.

372

while?

476

As earlier, do we need explicit 8s here?

foad added inline comments.May 21 2021, 3:34 AM
llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
197

Or for (auto &MO : MI.operands())?

335

Use llvm::is_contained (here and a few other places).

Thanks for all the careful comments, I will also address new comments from Carl and Jay in next version.

llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
191

No specific reason, will remove it.

194

I think they are just the same. instrs() means all the instructions I think.

197

sounds good.

223

I am assuming the predecessor that is not the Flow is from Else region. Maybe I need to add some assert for this.

238

Do you mean put this check before the line "LV->getVarInfo(Reg)"? If yes, I will fix it next version.

291

sure, will do it.

ruiling added inline comments.May 21 2021, 7:37 AM
llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
200

they are just for trivial invalid case, should be AMDGPU::NoRegister, why not correct?

ruiling updated this revision to Diff 347063.May 21 2021, 9:24 AM

address review comments

arsenm added inline comments.May 21 2021, 12:37 PM
llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
204

There's an isVectorRegister helper

ruiling updated this revision to Diff 347538.May 24 2021, 6:14 PM

use isVectorRegister()

ruiling edited the summary of this revision. (Show Details)May 24 2021, 6:19 PM
ruiling added inline comments.
llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
191

SmallSet still needs explicit size, so keep it while removing other explicit size for SmallVector.

ping, any further comments?

ping again. I would like to add more background about the patch, this change could improve the performance of some critical workloads over 8%. And this improvement is quite important for us. Would you like to accept this? @arsenm @critson

ruiling marked 23 inline comments as done.Jun 7 2021, 4:17 AM

Sorry a few more comments, mostly minor.

llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
120

Order of Required and Preserved could be the same?

153

I think this can just be "while (MBB) {"

159

This doesn't seem right.
You only collect an arbitrary number of blocks?
Should ElseBlocks not simply be growing to hold all relevant blocks?

198

Instead of "MO.getReg() == AMDGPU::NoRegister", I believe you can just write "!MO.getReg()".

231

As above.

llvm/test/CodeGen/AMDGPU/vgpr-liverange2.ll
1 ↗(On Diff #347538)

These files should have a better name.
Perhaps:
vgpr-liverange.ll -> vgpr-liverange-ir.ll
vgpr-liverange2.ll -> vgpr-liverange.ll

We definitely don't use numeric suffixes at present.

foad added inline comments.Jun 8 2021, 3:36 AM
llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
159

Cur tells you the first index in Blocks of a block that you have *not* already visited to scan its predecessors. Perhaps it would be clearer as:

SetVector<MachineBasicBlock *> Blocks(Endif);
for (unsigned Cur = 0; Cur < Blocks.size(); ++Cur) {
  auto *MBB = Blocks[Cur];
  for (auto *Pred : MBB->predecessors()) {
    if (Pred != Flow)
      Blocks.insert(Pred);
  }
}
175

for (auto &UseMI : MRI->use_nodbg_instructions(Reg))

ruiling added inline comments.Jun 8 2021, 6:27 PM
llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
159

Hi @critson, does @foad's comment help answer your concern? I don't quite understand your confusion. The idea here is to iteratively visit the predecessors starting from Endif and push them into the Blocks and stop if the predecessor is Flow. I admit the code here may not quite easy to understand, I struggled for a while when writing this piece of code. @foad's suggestion missed one subtle requirement: the Endif should not appear in the final result. and I don't want to remove the first element from the vector after that. I think that was the main reason I wrote it like this. Considering that for most cases we would not have too much blocks in an if-else structure(may be I am wrong?), I think a "find in vector" should not be a big issue.

ruiling updated this revision to Diff 350804.Jun 9 2021, 12:58 AM
ruiling edited the summary of this revision. (Show Details)

address review comments

ruiling marked 5 inline comments as done.Jun 9 2021, 1:06 AM
foad added inline comments.Jun 9 2021, 2:05 AM
llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
159

"for most cases we would not have too much blocks in an if-else structure": this makes me a bit nervous. People write big shaders, so some people probably write big shaders with an if-then-else wrapped around the whole thing.

critson added inline comments.Jun 9 2021, 2:21 AM
llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
159

OK, I understand what the loop does now. I clearly understood it first time, but did get confused on the second reading.
As Jay raises the is_contained in the loop may be executed a lot with a big if-else in a shader?
Maybe this should be a SmallSetVector?

ruiling updated this revision to Diff 350856.Jun 9 2021, 4:41 AM

Use SmallSetVector for ElseBlocks

ruiling marked 2 inline comments as done.Jun 9 2021, 4:44 AM
ruiling added inline comments.
llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
159

done

critson accepted this revision.Jun 9 2021, 6:59 PM

LGTM, with one nit

llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
350

Since Visited is a set type, I think this should be a direct test on the set, not is_contained.

This revision is now accepted and ready to land.Jun 9 2021, 6:59 PM
This revision was landed with ongoing or failed builds.Jun 21 2021, 12:27 AM
This revision was automatically updated to reflect the committed changes.
ruiling marked an inline comment as done.