Page MenuHomePhabricator

[AMDGPU] Add Lower Bound to PipelineSolver
AcceptedPublic

Authored by jrbyrnes on Sep 12 2022, 3:19 PM.

Details

Summary

This patch provides an algorithm to compute the lower bound on the cost of the SUs that have not yet been assigned to a SG. It finds the minimum cost SG for a given SU based on the current partial pipeline, but does not assign the SU to the SG. Thus, it will not create added dependencies / affect cost for the other remaining unassigned SUs. Then, at each branch in the search tree, we are able to use the lower bound on the cost of the remaining SUs for additional pruning.

The net benefit of this feature is difficult to conclude through analytical means, and should be done experimentally. While the LB does improve pruning, it increases the time complexity of processing a branch in the search tree. The cost computation is expensive, and the LB computes the cost for all unassigned SUs for every branch in the search tree.

Change-Id: I803200917c6d02f0ac8b916b61741b2b7d72d1e0

Diff Detail

Event Timeline

jrbyrnes created this revision.Sep 12 2022, 3:19 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 12 2022, 3:19 PM
jrbyrnes requested review of this revision.Sep 12 2022, 3:19 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 12 2022, 3:19 PM
jrbyrnes updated this revision to Diff 459575.Sep 12 2022, 3:24 PM

Fix incomplete comment

jrbyrnes updated this revision to Diff 459576.Sep 12 2022, 3:28 PM

Avoid a very small number of calls to calculateLowerBound

Thanks for implementing this!

llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp
576

Maybe move the vector out of the loops to reduce allocations, and clear() here instead?

590

In addEdges, we could abort once the added cost exceeds MinCostForSU (if MinCostForSU != -1). Not sure if that would be worth the effort.

jrbyrnes updated this revision to Diff 459920.Sep 13 2022, 4:43 PM
jrbyrnes marked 2 inline comments as done.

Address Review Comments

Thanks for implementing this!

Hey Jannik -- of course, thanks for the comments / good suggestions!

jsilvanus added inline comments.Sep 13 2022, 11:45 PM
llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp
576

Why not move out of the other two loops as well?
Maybe with a comment that this is for performance only, and only needed in the most inner loop.

jrbyrnes updated this revision to Diff 460106.Sep 14 2022, 8:30 AM
jrbyrnes marked an inline comment as done.

A few optimizations to improve general performance, and perf of LB.

arsenm added inline comments.Sep 15 2022, 6:30 AM
llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp
68

Typo calcalutes

566

I'd assume this can be a SmallVector

jrbyrnes updated this revision to Diff 460515.Sep 15 2022, 3:08 PM
jrbyrnes marked 2 inline comments as done.

Address review comments

arsenm accepted this revision.Tue, Nov 22, 6:48 AM

I know little about the scheduler but this seems reasonable

llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp
794

Single quotes

This revision is now accepted and ready to land.Tue, Nov 22, 6:48 AM