This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU]: Fix failing assertion in SIMachineScheduler
ClosedPublic

Authored by jsilvanus on Apr 20 2022, 11:35 AM.

Details

Summary

This fixes the assertion failure "Loop in the Block Graph!".

SIMachineScheduler groups instructions into blocks (also referred to
as coloring or groups) and then performs a two-level scheduling:
inter-block scheduling, and intra-block scheduling.

This approach requires that the dependency graph on the blocks which
is obtained by contracting the blocks in the original dependency graph
is acyclic. In other words: Whenever A and B end up in the same block,
all vertices on a path from A to B must be in the same block.

When compiling an example consisting of an export followed by
a buffer store, we see a dependency between these two. This dependency
may be false, but that is a different issue.
This dependency was not correctly accounted for by SiMachineScheduler.

A new test case si-scheduler-exports.ll demonstrating this is
also added in this commit.

The problematic part of SiMachineScheduler was a post-optimization of
the block assignment that tried to group all export instructions into
a separate export block for better execution performance. This routine
correctly checked that any paths from exports to exports did not
contain any non-exports, but not vice-versa: In case of an export with
a non-export successor dependency, that single export was moved
to a separate block, which could then be both a successor and a
predecessor block of a non-export block.

As fix, we now skip export grouping if there are exports with direct
non-export successor dependencies. This fixes the issue at hand,
but is slightly pessimistic:
We *could* group all exports into a separate block that have neither
direct nor indirect export successor dependencies.
We will review the potential performance impact and potentially
revisit with a more sophisticated implementation.

Note that just grouping all exports without direct non-export successor
dependencies could still lead to illegal blocks, since non-export A
could depend on export B that depends on export C. In that case,
export C has no non-export successor, but still may not be grouped
into an export block.

Change-Id: I18ab846940a2a3e224199e72339e6fd181b7e8de

Diff Detail

Event Timeline

jsilvanus created this revision.Apr 20 2022, 11:35 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 20 2022, 11:36 AM
jsilvanus requested review of this revision.Apr 20 2022, 11:36 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 20 2022, 11:36 AM
arsenm accepted this revision.Apr 20 2022, 1:39 PM

LGTM with nits

llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp
1128–1129

east const?

llvm/test/CodeGen/AMDGPU/si-scheduler-exports.ll
28

This is redundant with the -mattr on the run line

This revision is now accepted and ready to land.Apr 20 2022, 1:39 PM
jsilvanus updated this revision to Diff 424126.Apr 21 2022, 1:42 AM

Implement review feedback

  • use west const
  • drop redundant attribute in lit test
foad added inline comments.Apr 21 2022, 1:47 AM
llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp
1141

Can you confirm that these subgraph-based checks are no longer required? Is that because your new check is stronger?

jsilvanus marked 2 inline comments as done.Apr 21 2022, 1:49 AM
jsilvanus added inline comments.
llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp
1128–1129

Sorry, fixed. 10 years of east const are hard-wired into my brain I assume.
Good to know that clang-format does not pick this one up.

llvm/test/CodeGen/AMDGPU/si-scheduler-exports.ll
28

Thanks, fixed.

jsilvanus marked 2 inline comments as done.Apr 21 2022, 2:04 AM
jsilvanus added inline comments.
llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp
1141

Regarding the assertion check: The subgraph-based assertion is indeed dropped in the new version and no longer explicitly checked.
But it is implied by the topological order, and actually the new implementation no longer depends on the topological order to be correct, because we just check for the existence
of some edge from an export to a non-export.

Regarding the second subgraph-based check not guarded by NDEBUG: Yes and yes.

The old test determines the set of instructions reachable from DAG->SUnits[j] from which SU is reachable, using the GetSubGraph call,
and aborts if any of them is a non-export.
The new test checks if there is any non-export directly reachable from SU.

Because the new test goes the other direction, the test is incomparable for a particular SU object. But when considering the whole for loop,
the new test is stronger: If there is any pair of DAG->SUnits[j] and SU for which the old check aborts, then there is a path from an export to an export
that visits a non-export. On that path, there must be an edge from an export to a non-export, and for that edge the new test will abort.

foad accepted this revision.Apr 21 2022, 2:12 AM
foad added inline comments.
llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp
1141

Thanks for explaining!

I do not have commit access yet. Could someone else please commit this change?

Name: "Jannik Silvanus"
Email: "jannik.silvanus@amd.com"

jsilvanus closed this revision.Apr 26 2022, 4:08 AM

Closed by 6f0a24b04af6536956b5aa882cf21217eaea8d6a