This is an archive of the discontinued LLVM Phabricator instance.

[MachineScheduler] improve reuse of 'releaseNode'method
ClosedPublic

Authored by DoktorC on Jul 31 2019, 3:55 AM.

Details

Summary

The 'SchedBoundary::releaseNode' is merely invoked for releasing the Top/Bottom root nodes.
However, 'SchedBoundary::releasePending' uses its same logic to check if the Pending queue
has any releasable SUnit.
It is possible to slightly modify the body of the two, allowing re-use of the former ('releaseNode')
in the latter.

Diff Detail

Event Timeline

DoktorC created this revision.Jul 31 2019, 3:55 AM
fhahn added a subscriber: fhahn.
fhahn added inline comments.
llvm/lib/CodeGen/MachineScheduler.cpp
2111

Having the statement on the same line looks a bit unusual. Could you clang-format-diff it?

2361

The new code is is missing this condition and Available.size() may exceed ReadyListLimit I think.

DoktorC updated this revision to Diff 212576.EditedJul 31 2019, 7:22 AM

As asked, the if-bodies are clang-formatted; proper formatting is applied to 'HazardDetected'
condition and 'template' specification of 'releaseNode'.

Further, a typo is fixed (From 'IsPQueue' to 'InPQueue', in MachineScheduler.cpp)

DoktorC marked an inline comment as done.Aug 1 2019, 2:46 AM
DoktorC added inline comments.
llvm/lib/CodeGen/MachineScheduler.cpp
2361

Actually, 'HazardDetected' condition in 'releaseNode' covers this case.

fhahn added inline comments.Aug 12 2019, 2:33 AM
llvm/lib/CodeGen/MachineScheduler.cpp
2361

Right, my main concern was missing out on the early exit in the loop. I think it would still be worth keeping the early exits if Available.size() >= ReadyListLimit.

I *think* it would make sense to have the early exit at the beginning of the loop, as currently we set MinReadyCycle even if we do not add the SU to Available. IIRC, MinReadyCycle should indicate the Cycle of the soonest available instruction in Available, not pending.

DoktorC updated this revision to Diff 215738.Aug 17 2019, 6:10 AM
DoktorC marked an inline comment as done.

Insert Available.size() >= ReadyListLimit at beginning of SchedBoundary::releasePending
loop, allowing early-exit in case of full Available queue.

Further, fix update of MinReadyCycle: it was set to SU->ReadyCycle, even if the
scheduling unit was not moved to the Available queue.
(thank @fhahn)

DoktorC marked an inline comment as done.Aug 17 2019, 6:11 AM
fhahn accepted this revision.Aug 19 2019, 11:10 AM

LGTM, thanks!

llvm/lib/CodeGen/MachineScheduler.cpp
2101

nit: outermost brackets unnecessary.

2106

nit: IMO it would be slightly simpler to keep the check of Available.size() unconditional, as the runtime benefits should be very negligible

This revision is now accepted and ready to land.Aug 19 2019, 11:10 AM
DoktorC updated this revision to Diff 216419.Aug 21 2019, 9:14 AM

Removed useless round brackets and merged Available.size() >= ReadyListLimit with other
hazard conditions.

DoktorC marked 2 inline comments as done.Aug 21 2019, 9:16 AM

Fixes applied.
@fhahn, could I ask you to commit the patch? I do not have commit-access :(

Further, fix update of MinReadyCycle: it was set to SU->ReadyCycle, even if the
scheduling unit was not moved to the Available queue.
(thank @fhahn)

This change causes a crash, because MinReadyCycle is not properly initialised when going through pickOnlyChoice. I think we should address that separately and I'll commit the version with setting MinReadyCycle at the original position.

fhahn requested changes to this revision.Aug 21 2019, 11:00 AM

Even with moving setting the MinReadyCycle back, it fails the following tests. Could you take a look?

Failing Tests (3):

LLVM :: CodeGen/PowerPC/build-vector-tests.ll
LLVM :: CodeGen/PowerPC/vec_conv_fp32_to_i64_elts.ll
LLVM :: CodeGen/PowerPC/vec_conv_i16_to_fp64_elts.ll
This revision now requires changes to proceed.Aug 21 2019, 11:00 AM
DoktorC added a comment.EditedAug 21 2019, 3:07 PM

This change causes a crash, because MinReadyCycle is not properly initialised when going through pickOnlyChoice. I think we should address that separately and I'll commit the version with setting MinReadyCycle at the original position.

I beg your pardon, I ran only the unit-tests and forgot the regression ones.
Yes, I agree with you.

Even with moving setting the MinReadyCycle back, it fails the following tests. Could you take a look?

Failing Tests (3):

LLVM :: CodeGen/PowerPC/build-vector-tests.ll
LLVM :: CodeGen/PowerPC/vec_conv_fp32_to_i64_elts.ll
LLVM :: CodeGen/PowerPC/vec_conv_i16_to_fp64_elts.ll

Checked. By my side, regression tests (I didn't forget them this time) and unit tests are all passed.

UPDATE
My llvm's configuration targeted only some architectures (ARM, Mips and X86). Again, sorry.
I re-configured the project to target all the supported archs, executed the tests and confirm the above
three failing.

I'll try to figure out why they're not passing.

fhahn added a comment.Aug 23 2019, 8:29 AM

UPDATE
My llvm's configuration targeted only some architectures (ARM, Mips and X86). Again, sorry.
I re-configured the project to target all the supported archs, executed the tests and confirm the above
three failing.

I'll try to figure out why they're not passing.

Great, thanks!

BTW, it is better to respond with a new comment instead of updating existing ones, as it seems like Phabricator does not send emails for edited comments.

Hi @fhahn ,
sorry for the long silence period.

I tracked down the problem: it is how the Pending queue is explored with this patch (from the end).
Clearly, this affects which nodes move first from pending to available state; this is critical when
the Available queue reaches the maximum size.

I can re-write the queue exploration to match the pre-patch behaviour.
However, I'm wondering if it is a sign of "bad" design: shouldn't the pending queue explore the best
candidates to move in the available queue?

fhahn added a comment.Nov 22 2019, 8:10 AM

Hi @fhahn ,
sorry for the long silence period.

I tracked down the problem: it is how the Pending queue is explored with this patch (from the end).
Clearly, this affects which nodes move first from pending to available state; this is critical when
the Available queue reaches the maximum size.

I can re-write the queue exploration to match the pre-patch behaviour.
However, I'm wondering if it is a sign of "bad" design: shouldn't the pending queue explore the best
candidates to move in the available queue?

I think it would be best keep the old behavior in this patch and potentially fix the iteration order in a follow up. Changes to the scheduler can lead to subtle regressions, so it's best to keep the changes isolated.

DoktorC updated this revision to Diff 232511.EditedDec 6 2019, 1:45 AM
  • Revert Pending queue exploration to pre-patch behavior (from the beginning). It was source of failure for 3 tests:
LLVM :: CodeGen/PowerPC/build-vector-tests.ll
LLVM :: CodeGen/PowerPC/vec_conv_fp32_to_i64_elts.ll
LLVM :: CodeGen/PowerPC/vec_conv_i16_to_fp64_elts.ll
  • Fix missing MinReadyCycle assignment in releaseNode. It was source of MinReadyCycle uninitialized assertion error.
fhahn accepted this revision.Dec 30 2019, 11:19 AM

LGTM, thanks! And sorry for the delay! Please let me know if I should commit this change for you.

This revision is now accepted and ready to land.Dec 30 2019, 11:19 AM
DoktorC added a comment.EditedDec 31 2019, 2:52 AM

Thank you for the patience and helping me!
Still lacking of commit-access; hence, yes: you can commit for me as

  • Name, Surname: Lorenzo Casalino
  • Email: lorenzo.casalino93@gmail.com

Thank you again, and happy new year :) !

This revision was automatically updated to reflect the committed changes.
fhahn added a comment.Jan 1 2020, 12:36 PM

Thank you for the patience and helping me!
Still lacking of commit-access; hence, yes: you can commit for me as

  • Name, Surname: Lorenzo Casalino
  • Email: lorenzo.casalino93@gmail.com

Thank you again, and happy new year :) !

Happy new year!

I've committed it with you as the author, please let me know in case you get emails about build failures.

Hi,

bad luck, got a build failure.
Link to the report: http://lab [.] llvm [.] org:8011/builders/clang-ppc64be-linux-lnt/builds/33875

Build Reason: scheduler
Build Source Stamp: [branch master] f9f78cf6ac73d9148be9b626f418bf6770e512f6
Blamelist: Lorenzo Casalino <lorenzo.casalino93@gmail.com>

BUILD FAILED: failed test-suite
qiucf added a subscriber: qiucf.Jan 2 2020, 12:07 AM

Hi,

bad luck, got a build failure.
Link to the report: http://lab [.] llvm [.] org:8011/builders/clang-ppc64be-linux-lnt/builds/33875

Build Reason: scheduler
Build Source Stamp: [branch master] f9f78cf6ac73d9148be9b626f418bf6770e512f6
Blamelist: Lorenzo Casalino <lorenzo.casalino93@gmail.com>

BUILD FAILED: failed test-suite

I created a patch for a build failure on GCC: https://reviews.llvm.org/D72069. Would you like to help review it? Thanks.

Hi,

bad luck, got a build failure.
Link to the report: http://lab [.] llvm [.] org:8011/builders/clang-ppc64be-linux-lnt/builds/33875

Build Reason: scheduler
Build Source Stamp: [branch master] f9f78cf6ac73d9148be9b626f418bf6770e512f6
Blamelist: Lorenzo Casalino <lorenzo.casalino93@gmail.com>

BUILD FAILED: failed test-suite

I guess the failure is caused by PowerPC backend.

/opt/at12.0/lib/gcc/powerpc64le-linux-gnu/8.2.1/../../../../powerpc64le-linux-gnu/bin/ld: ../../lib/libLLVMPowerPCCodeGen.a(PPCMachineScheduler.cpp.o): in function `llvm::PostGenericScheduler::releaseTopNode(llvm::SUnit*)':
PPCMachineScheduler.cpp:(.text._ZN4llvm20PostGenericScheduler14releaseTopNodeEPNS_5SUnitE[_ZN4llvm20PostGenericScheduler14releaseTopNodeEPNS_5SUnitE]+0x2c): undefined reference to `void llvm::SchedBoundary::releaseNode<false>(llvm::SUnit*, unsigned int, unsigned int)'
/opt/at12.0/lib/gcc/powerpc64le-linux-gnu/8.2.1/../../../../powerpc64le-linux-gnu/bin/ld: ../../lib/libLLVMPowerPCCodeGen.a(PPCMachineScheduler.cpp.o): in function `llvm::GenericScheduler::releaseBottomNode(llvm::SUnit*)':
PPCMachineScheduler.cpp:(.text._ZN4llvm16GenericScheduler17releaseBottomNodeEPNS_5SUnitE[_ZN4llvm16GenericScheduler17releaseBottomNodeEPNS_5SUnitE]+0x34): undefined reference to `void llvm::SchedBoundary::releaseNode<false>(llvm::SUnit*, unsigned int, unsigned int)'
/opt/at12.0/lib/gcc/powerpc64le-linux-gnu/8.2.1/../../../../powerpc64le-linux-gnu/bin/ld: ../../lib/libLLVMPowerPCCodeGen.a(PPCMachineScheduler.cpp.o): in function `llvm::GenericScheduler::releaseTopNode(llvm::SUnit*)':
PPCMachineScheduler.cpp:(.text._ZN4llvm16GenericScheduler14releaseTopNodeEPNS_5SUnitE[_ZN4llvm16GenericScheduler14releaseTopNodeEPNS_5SUnitE]+0x34): undefined reference to `void llvm::SchedBoundary::releaseNode<false>(llvm::SUnit*, unsigned int, unsigned int)'

It is because you are splitting the template declaration and definition into header and cpp. Usually, you need to add the explicit template instantiation in the cpp. However, I am not sure if it is a good idea to use template ... BTW, it compiles clean if with clang.

It is because you are splitting the template declaration and definition into header and cpp. Usually, you need to add the explicit template instantiation in the cpp. However, I am not sure if it is a good idea to use template ... BTW, it compiles clean if with clang.

You are totally right about explicit instantiation.

Regarding the usage of a template argument, instead of a function parameter, lies in the fact that the scheduler is aware whether
a node is in the Pending queue or not, when calling releaseNode.
Indeed, such method is invoked in two occasions

  • a node is scheduled: successors/predecessors are released; thus, they're known to be not in the Pending queue.
  • moving nodes from pending to available state; the nodes are known to be in the Pending queue.

As a consequence, part of the operations perfomed by 'releaseNode' may be actively skipped, avoiding the
overheads of control instructions. (Yes, modern architectures are endowed with branch predictors; but I do not
think that such an argument saves us from correctly employ known invariants in the algorithms.)

fhahn added a comment.Jan 2 2020, 2:04 AM

It is because you are splitting the template declaration and definition into header and cpp. Usually, you need to add the explicit template instantiation in the cpp. However, I am not sure if it is a good idea to use template ... BTW, it compiles clean if with clang.

You are totally right about explicit instantiation.

Regarding the usage of a template argument, instead of a function parameter, lies in the fact that the scheduler is aware whether
a node is in the Pending queue or not, when calling releaseNode.
Indeed, such method is invoked in two occasions

  • a node is scheduled: successors/predecessors are released; thus, they're known to be not in the Pending queue.
  • moving nodes from pending to available state; the nodes are known to be in the Pending queue.

As a consequence, part of the operations perfomed by 'releaseNode' may be actively skipped, avoiding the
overheads of control instructions. (Yes, modern architectures are endowed with branch predictors; but I do not
think that such an argument saves us from correctly employ known invariants in the algorithms.)

FWIW I think we should trust the compiler to optimize this properly and passing it as parameter should be fine. Both versions should probably compile to the same code (although I don't have time to verify that right now)