This is an archive of the discontinued LLVM Phabricator instance.

Improve post-RA scheduling for SystemZ
ClosedPublic

Authored by jonpa on Jul 6 2017, 6:29 AM.

Details

Summary

The idea of this patch is to continue the scheduler state over a MBB boundary in the case where the successor block has only one predecessor. This means that the scheduler will continue in the successor block (after emitting any branch instructions) with e.g. maintained processor resource counters. Benchmarks seem to benefit from this.

In order to do this the single HazardRecognizer is replaced with a map so that each MBB gets its own HazardRecognizer. This represent the outgoing state of that MBB. Another block which is only reached from MBB can then later take that state before beginning its first region.

*All* instructions are emitted, so that the state of the scheduler is as good as possible at the entry of the region. This means branches and also singular instructions before regions / in small MBBs.

Special care is taken for blocks with a call (which is about the only scheduling boundary on SystemZ), so that only the instructions after the call are part of the final scheduler state.

The only common code change needed is to set the MachineLoopInfo* pointer (that analysis pass seems to have been required also for post-RA scheduling, but not used for some reason).

Diff Detail

Event Timeline

jonpa created this revision.Jul 6 2017, 6:29 AM
jonpa updated this revision to Diff 107514.Jul 20 2017, 7:57 AM

Updated per (off-line) review.

jonpa updated this revision to Diff 107649.Jul 21 2017, 3:32 AM

Patch updated.

jonpa updated this revision to Diff 107877.Jul 24 2017, 3:52 AM

updated per review

jonpa updated this revision to Diff 107878.Jul 24 2017, 3:57 AM

This is the "alternate take" of the last version. I am not sure which is best...

This version introductes of MachineSchedStrategy::leaveMBB() and an optional reversal of BB schedregions traversal which

  • eliminates PreviousMBB
  • simplifies transferStateFromPRed()
  • simplifies initialize() and setupHazardRecForScheduling() (HazardRec pointer could be removed, but kept to minimize diff for now)

This makes the idea of the order of processing scheduling regions in an MBB explicit, which is good since this patch is dependent on that order.
Overall, this simplifies the SystemZ implementation, but has some common-code changes so that the regions can be handled top-down per MBB. (MachineScheduler.cpp)

uweigand edited edge metadata.Jul 24 2017, 8:11 AM

It does indeed seem to be necessary to get an explicit guarantee from common code that scheduling happens in a particular order, otherwise our back-end code may silently break if common code logic happens to change in the future.

I would agree that getting common code to specifically call us in the top-down order we need to simplify cross-region tracking is also helpful. We should discuss this with the relevant common code scheduler developers -- can you check who has been working on this code lately and add them as reviewers?

jonpa added a comment.Jul 27 2017, 2:53 AM

(gentle) ping - this patch is waiting for review of its common-code parts (MachineScheduler), which should be NFC for other targets.

MatzeB edited edge metadata.Jul 27 2017, 3:26 PM

This looks good to me in principle, but there are some comments below:

include/llvm/Target/TargetSubtargetInfo.h
166–170 ↗(On Diff #107878)

My impression is that this decision is not based on the subtarget, but rather on what Scheduler is plugged in. Therefore we should better make this part of ScheduleDAGInstrs.

lib/CodeGen/MachineScheduler.cpp
408

Oh this was already required and not used by the PostMachineScheduler. Guess you got lucky and can just use it ;-)

441

///

451–489

Would it be possible to implement this more iterator style instead of synthesizing an std::vector (also std::vector is usually the wrong choice in llvm)? (I'm not talking full on STL iterator, just some object with a "get me the next region" function).

658–660

This should also call ScheduleDAGInstr::finishBlock().

jonpa updated this revision to Diff 108609.Jul 28 2017, 2:27 AM
jonpa marked 3 inline comments as done.

Thanks for review!

Patch updated.

jonpa added inline comments.Jul 28 2017, 2:29 AM
include/llvm/CodeGen/MachineScheduler.h
217 ↗(On Diff #108609)

This comment is duplicated in several places, but not sure if this is expected.

include/llvm/Target/TargetSubtargetInfo.h
166–170 ↗(On Diff #107878)

Aha. I changed it then so that the ScheduleDAGInstrs is called instead, which in turn calls the SchedStrategy, which makes sense as you pointed out. This is clearly better to me as well. Looks good?

lib/CodeGen/MachineScheduler.cpp
408

yes :-)

451–489

I made an attempt to wrap this in a class per your suggestion. Could not find a better vector class than std, however.. (?).

This is a bit more code compared to just calling std::reverse(). If we decide this is better, perhaps you know of an example in the LLVM code base that I could reuse if better?

658–660

yep

atrick added inline comments.Jul 28 2017, 10:52 AM
lib/CodeGen/MachineScheduler.cpp
441

Please explain what RegionBegin/End refer to here. Note that they need to be inclusive--they cannot refer to instructions outside of the identified scheduling region because those will be reordered before scheduling this region.

Also note somewhere that the scheduling pass cannot add instructions in other regions now without updating these boundaries.

jonpa updated this revision to Diff 108868.Jul 31 2017, 12:13 AM

Comments updated per review.

Also, the comment

// MBB::size() uses instr_iterator to count. Here we need a bundle to
// count as a single instruction.

moved into getSchedRegions().

A couple more comments on the SystemZ parts, see inline comments.

lib/Target/SystemZ/SystemZHazardRecognizer.cpp
412

I guess this is not only "counters". Maybe rename to "copyState" ?

460

Hmm. Shouldn't emitInstruction itself try to handle branches correctly? Maybe it only needs one extra bit of information passed in, whether to assume a branch is taken or not?

lib/Target/SystemZ/SystemZMachineScheduler.cpp
62

I'm wondering why you don't do the initial allocation of the hazard recognizer for the MBB (and taking over the predecessor state) right here. Currently, it seems that if an MBB is scheduled, this is done in "initialize", and if the MBB is not scheduled, it is done in "leaveMBB". If you'd always do it here, it seems that duplication could be removed.

lib/Target/SystemZ/SystemZMachineScheduler.h
104

This is dead now, right?

jonpa updated this revision to Diff 108932.Jul 31 2017, 8:24 AM
jonpa marked 4 inline comments as done.

SystemZ parts updated per review. See inline comments.

lib/Target/SystemZ/SystemZHazardRecognizer.cpp
412

ok

460

ok - updated so that

  • EmitInstruction() will recognize that any branch in slot 1 or 2 will end current group (there is at least one instruction already in group).
  • emitInstruction() takes a new parameter TakenBranch and makes sure that if true, current decoder group is empty after emission.
lib/Target/SystemZ/SystemZMachineScheduler.cpp
62

There is one subtle change in behavior with this patch, and that is the fact that initPolicy() (via enterRegion()) is never called on an empty MBB. That is why the DoneMBB argument is needed for leaveMBB().

I thought about adding an empty region for an empty MBB so that initPolicy() is always called for each MBB. That didn't quite work still, since the MBB cannot then be retrieved from Begin->getParent(), if Begin == End.

lib/Target/SystemZ/SystemZMachineScheduler.h
104

ah, yes.

uweigand added inline comments.Jul 31 2017, 8:43 AM
lib/Target/SystemZ/SystemZMachineScheduler.cpp
62

Maybe the common-code interface still isn't quite right then. Should we have an enterMBB() to properly pair with the leaveMBB(), maybe?

jonpa updated this revision to Diff 109051.Jul 31 2017, 11:33 PM

Updated per review.
startBlock() and enterMBB() implemented / added.

jonpa marked an inline comment as done.Jul 31 2017, 11:36 PM
jonpa added inline comments.
lib/Target/SystemZ/SystemZMachineScheduler.cpp
62

Yes, this seems better.

Thanks! The SystemZ part now looks very reasonable. Just a couple of final, really just cosmetic comments inline.

I think we still need final approval for the common-code changes, though.

lib/Target/SystemZ/SystemZHazardRecognizer.cpp
432

There doesn't appear to be anything HazardRecognizer-specific left in this routine. Maybe move to the SchedPolicy (and possibly merge with getNextMIToEmit into an "advanceTo" routine with just and "end" argument)?

441

Maybe also move to the SchedPolicy and merge into its sole user? Then we wouldn't have to pass the MBB around.

lib/Target/SystemZ/SystemZMachineScheduler.cpp
62

Agreed. Final question: can't the "advance" now be done in initPolicy, so we're finally rid of this function (and the CurrBegin global)?

62

Is it even ever possible now that LastEmittedMI->getParent() is *not* equal to MBB?

66

Maybe inline into its single user? Only if the result looks simpler ...

76

Maybe use HazardRec here? It is now guaranteed to be the same ...

109

HazardRec again?

jonpa updated this revision to Diff 109293.Aug 2 2017, 1:29 AM
jonpa marked 6 inline comments as done.

SystemZ parts updated per review.

lib/Target/SystemZ/SystemZHazardRecognizer.cpp
432

Yes :-)

441

Yes, looks better.

Moved the assert that makes sure that we are aware of all terminators into emitInstruction(). The purpose is currently to check that we don't have any terminators that are branches without the isBranch/isReturn flag. It seems that CondTrap is the only example, right now. Perhaps CondTrap could get the isBranch flag instead? Do we want this assert, or could we simply assume that all terminators are branches?

lib/Target/SystemZ/SystemZMachineScheduler.cpp
62

Yes, a terminator in the single predecessor will be emitted and recorded as LastEmittedMI. I tried first to guard the update of LastEmittedMI with this check, but then realized that the HazardRec does not have the MBB member.

62

Yes, indeed :-)

66

Yes, I guess that looks ok.

76

ok

109

done

OK, the SystemZ parts look good to me now. Thanks!

jonpa added a comment.Aug 9 2017, 3:05 AM

ping - waiting for review of the common-code parts.

Matthias: are you happy with the changes I did according to your suggestion to avoid std::reverse?
Andy: Do the comments now look ok?

jonpa added a comment.Aug 15 2017, 5:58 AM

ping!

This patch is now only waiting for review of the common-code parts.

MatzeB accepted this revision.Aug 15 2017, 7:30 AM

You may revert the region iterator changes if there is no way to avoid the vector (see below). Either way LGTM.

lib/CodeGen/MachineScheduler.cpp
451

Capitalize parameter names.

451–489

I was thinking about something where we do not store intermediate results in a vector but compute regions on the fly. If it turns out we cannot avoid a vector then your previous approach was fine.
Using something like SmallVector<16> should save some allocations when compiling small functions.

This revision is now accepted and ready to land.Aug 15 2017, 7:30 AM
atrick edited edge metadata.Aug 16 2017, 12:57 PM

Andy: Do the comments now look ok?

Yes, thanks.

jonpa updated this revision to Diff 111473.Aug 17 2017, 1:08 AM
jonpa marked 2 inline comments as done.

Updated with change back to SmallVector.

Thanks for review. Committing soon.

jonpa closed this revision.Aug 17 2017, 1:36 AM

llvm/trunk@311072

lib/CodeGen/MachineScheduler.cpp
451–489

Ah, I see. I am then going back to the previous version since it was simpler, and since I really don't want to get into rewriting the loop that extracts the scheduling regions unless really needed. I also changed to SmallVector.