This is an archive of the discontinued LLVM Phabricator instance.

[Scheduling] Create the missing dependency edges for store cluster
ClosedPublic

Authored by steven.zhang on Dec 31 2019, 1:57 AM.

Details

Summary

If it is load cluster, we don't need to create the dependency edges(SUb->reg) from SUb to SUa as they both depend on the base register "reg"

     +-------+
+---->  reg  |
|    +---+---+
|        ^
|        |
|        |
|        |
|    +---+---+
|    |  SUa  |  Load 0(reg)
|    +---+---+
|        ^
|        |
|        |
|    +---+---+
+----+  SUb  |  Load 4(reg)
     +-------+

But if it is store cluster, we need to create it as follow shows to avoid the instruction store depend on scheduled in-between SUb and SUa. Notice that, AMDGPU several cases break due to this change and I examine them one by one, no obvious issue found from scheduling aspect. However, need double confirm. And one case show some deg in the final code sequence, which seems to be an issue of the later pass I think.

     +-------+
+---->  reg  |
|    +---+---+
|        ^
|        |         Missing       +-------+
|        | +-------------------->+   y   |
|        | |                     +---+---+
|    +---+-+-+                       ^
|    |  SUa  |  Store x 0(reg)       |
|    +---+---+                       |
|        ^                           |
|        |  +------------------------+
|        |  |
|    +---+--++
+----+  SUb  |  Store y 4(reg)
     +-------+

Diff Detail

Event Timeline

steven.zhang created this revision.Dec 31 2019, 1:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 31 2019, 1:57 AM
steven.zhang edited the summary of this revision. (Show Details)Dec 31 2019, 1:58 AM
steven.zhang marked an inline comment as done.
steven.zhang added inline comments.
llvm/test/CodeGen/AMDGPU/byval-frame-setup.ll
339 ↗(On Diff #235692)

This case has some deg as it generates more instructions. But the output of the scheduler is as expected. An issue of the later pass ?

fhahn added inline comments.Dec 31 2019, 4:24 AM
llvm/test/CodeGen/AArch64/macro-fusion.ll
22 ↗(On Diff #235692)

This should probably go into a different file, as it’s load/store clustering and not macro fusion

Address comments about the test case.

fhahn added a comment.Jan 7 2020, 3:01 AM

This adds new dependencies, hence I think it would be good to gather code size/perf numbers with this change for some impacted targets (e.g. AArch64) to be reasonably sure that there are no unexpected knock-on effects.

llvm/lib/CodeGen/MachineScheduler.cpp
1602

I may be missing something, but IIRC stores could have other memory operations as successors, e.g. because to enforce an ordering between aliasing memory operations.

steven.zhang marked an inline comment as done.Jan 7 2020, 3:22 AM
steven.zhang added inline comments.
llvm/lib/CodeGen/MachineScheduler.cpp
1602

It they have memory dependency, they won't be put into the same group.

fhahn added inline comments.Jan 7 2020, 3:39 AM
llvm/lib/CodeGen/MachineScheduler.cpp
1602

Ah right. The wording in the comment comment seems a bit general though, maybe it would be possible to mention that this is a constraint for the clustering. It might be good to add it as assert as well :)

steven.zhang marked an inline comment as done.Jan 7 2020, 5:07 AM
steven.zhang added inline comments.
llvm/lib/CodeGen/MachineScheduler.cpp
1602

ok. I will update the comments and the assertion if it is easy ... Thank you for the comments!

This adds new dependencies, hence I think it would be good to gather code size/perf numbers with this change for some impacted targets (e.g. AArch64) to be reasonably sure that there are no unexpected knock-on effects.

I can do the code size part with spec 2017 for AArch64 and other affected targets, but I cannot evaluate the perf numbers as I didn't have the env. Can someone help me get that data ? Thank you in advance!

Address the comments. And all the case changes disappear with the patch https://reviews.llvm.org/D72706 landed. So, I didn't see the negative impact from this patch any more. And from the design, it is doing the right thing, without increasing the compiling time.

Unit tests: pass. 61882 tests passed, 0 failed and 782 were skipped.

clang-tidy: unknown.

clang-format: pass.

Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

steven.zhang added reviewers: rampitec, foad.

Rebase the patch and update the test change for AMDGPU.

Tests change looks neutral to me now, and the logic seems plausible. How comes that only AMDGPU tests affected?

Tests change looks neutral to me now, and the logic seems plausible. How comes that only AMDGPU tests affected?

For now, only AMDGPU and AArch64 supports the mem cluster in LLVM. AArch64 only supports the back-2-back load/store cluster while AMDGPU supports 6 load/store at most to be cluster. So, it is more easy to expose this change.

LGTM, but please wait for other responses.

Would it be possible to add a test case for AArch64?

Would it be possible to add a test case for AArch64?

That would not be easy, but I could cook a case for AArch64 to indicate that, the internal dependency graph has been changed.

Add one test for AArch64 and rebase the patch.

Rebase the patch.

foad added a comment.Jul 20 2020, 9:23 AM

Seems reasonable to me. It's just a heuristic and shouldn't affect correctness, right?

Seems reasonable to me. It's just a heuristic and shouldn't affect correctness, right?

Yes.

The AMDGPU test change likely means nothing, but it'd be good if someone who maintains or work that target would ok it. I suggest giving it another week or so. Otherwise, it LGTM.

arsenm accepted this revision.Jul 21 2020, 3:50 PM
This revision is now accepted and ready to land.Jul 21 2020, 3:50 PM

I see three extra AMDGPU tests changed when rebase the patch. @evandro Would you please help me verify if there is no deg for those changes ? Thank you.

LLVM :: CodeGen/AMDGPU/call-argument-types.ll
LLVM :: CodeGen/AMDGPU/callee-special-input-vgprs.ll
LLVM :: CodeGen/AMDGPU/stack-realign.ll

I'm not familiar with the AMDGPU target, but the changes in these new tests seem harmless, except in CodeGen/AMDGPU/stack-realign.ll, where an instruction disappeared. Again, I'd defer to someone who works in this backend, perhaps @arsenm, to chime in.

arsenm added inline comments.Jul 30 2020, 3:10 PM
llvm/test/CodeGen/AMDGPU/stack-realign.ll
164–171 ↗(On Diff #281808)

Looks like an irrelevant change; no problem. I don't see anything missing?

evandro accepted this revision.Jul 30 2020, 4:39 PM
evandro added inline comments.
llvm/test/CodeGen/AMDGPU/stack-realign.ll
164–171 ↗(On Diff #281808)

My bad.

Thank you for all the information!

This revision was landed with ongoing or failed builds.Aug 6 2020, 9:58 PM
This revision was automatically updated to reflect the committed changes.