This is an archive of the discontinued LLVM Phabricator instance.

[MachineScheduler] Handle regmasks and allow calls to be rescheduled.
Needs ReviewPublic

Authored by jonpa on Dec 19 2015, 5:34 AM.

Details

Reviewers
MatzeB
atrick
Summary

I have experimented with rescheduling calls in MachineScheduler, and have a
patch I would like to offer for comments and review. As far as I could
understand from the comment above isSchedBoundary(), this is something that
may be wanted generally.

Or is it otherwise a good enough option to use the pre-RA (isel) scheduler
to handle this?

This patch adds:

Handling of regmasks while building the dag, with a new ScheduleDAGInstrs
method 'addRegMaskDeps()'.

A new method 'clearCallsInDefsForReg()' has been added from code factored
out of addPhysRegDeps(), as it is also needed by addRegMaskDeps().

A new method SURegDefIsDead() that checks for a dead-def reg MO, and also
for a regmask, as opposed to calling registerDefIsDead(), which does not check
for the regmask.

A new SubtargetInfo virtual method 'MISchedRescheduleCalls()' and a new
CL option 'RescheduleCalls' to control the behaviour. Default is false, i.e.
no rescheduling of calls.

Diff Detail

Event Timeline

jonpa updated this revision to Diff 43302.Dec 19 2015, 5:34 AM
jonpa retitled this revision from to [MachineScheduler] Handle regmasks and allow calls to be rescheduled..
jonpa updated this object.
jonpa added a subscriber: llvm-commits.
hfinkel added a subscriber: hfinkel.Feb 2 2016, 5:36 PM

The obvious high-level question is: What do we gain by scheduling calls, and what's the right way to model them? Are you just trying to reduce register pressure in the caller, or are there other benefits as well?

(I suspect there are gains to be had by scheduling calls to reduce register pressure, regardless of anything else, but it is not immediately clear from your patch that this is what would happen).

jonpa added a comment.Feb 3 2016, 3:41 AM

I noticed the comment phrasing including "currently" about the scheduler not handling calls, so I thought there might be some general interest.

For SystemZ this may be of interest also post-ra for the purpose of decoding, but that is just experimentation so far.

More scheduling freedom pre-ra would also be possible, and perhaps this would be good in some cases.

Otherwise, I am not really sure at the moment how needed this is.

I noticed the comment phrasing including "currently" about the scheduler not handling calls, so I thought there might be some general interest.

For SystemZ this may be of interest also post-ra for the purpose of decoding, but that is just experimentation so far.

More scheduling freedom pre-ra would also be possible, and perhaps this would be good in some cases.

Otherwise, I am not really sure at the moment how needed this is.

I believe there is general interest. I'm interested, however, in specifically what motivated you to look at this. Can you explain how decoding is relevant here?

jonpa added a comment.Feb 3 2016, 9:21 AM

Considering that different instructions may have different grouping rules, meaning that in some cases some instructions can only go in certain slots and so on, a greater freedom for the scheduler should generally be good. I think I became interested in this while looking at some functions with long chains of calls. But, as I said, this is just experimentation so far.

atrick edited edge metadata.Feb 15 2016, 11:26 AM

High level, I have the same questions as Hal. But I'm not opposed to adding the support so that targets can evaluate the possibility of call scheduling.

Please test correctness at least on x86 by running the InstructionShuffler with RescheduleCalls.

lib/CodeGen/MachineScheduler.cpp
401–406

Don't we still want to call TII->isSchedulingBoundary for calls? The target may treat some calls differently. How about this:

Always honor the command line option if present (it's just for debugging/experimentation anyway):

if (RescheduleCalls.getNumOccurrences())
    return !RescheduleCalls;

Otherwise simply defer to the target:

return TII->isSchedulingBoundary(MI, MBB, *MF);

By default, have TargetInstrInfo::isSchedulingBoundary return true for isCall().

Make sure any in-tree targets that override this method return true for isCall().

Post on llvm-dev explaining that the scheduler now works across calls. If you have an out-of-tree target that overrides
TargetInstrInfo::isSchedulingBoundary, you now need to check isCall yourself.

lib/CodeGen/ScheduleDAGInstrs.cpp
324–329

I know this is copied code, but I don't understand why you break out of the loop here.

Also, I don't understand why you don't just clear all calls in your case, since you're about to add the most call to the Defs set.

344–345

I think this is a redundant check (as it is in addPhysRegDataDeps).

Please make sure we don't move reserved register definitions across calls. I'm not sure how those are reflected in the regmask.

jonpa updated this revision to Diff 53384.Apr 12 2016, 5:05 AM
jonpa edited edge metadata.

Changed the isSchedBoundary() function per request so that TII is used for calls handling.

Changes in clearCallsInDefsForReg() per suggestions to simply erase all previous calls before adding the current one.

New method addCallDeps(), to implement deps for reserved registers. I was not sure if all calls are guaranteed to have a regmask operand - in that case this method could have been eliminated and handled in addRegMaskDeps().

-misched=shuffle passes on X86.

Regarding the current discussion of compile time: this patch will make dags bigger, and potentially slower. Perhaps this should be checked?

With the test-suite on X86 (with default scheduler), I got:

Results

IMPROVED : 251
REGRESSED : 234

It would be nice to get this feature in. It's encouraging that misched=shuffle passes.

Matthias, could you review the register dependencies at call sites and comment on the regmask handling?

For this to be a real feature, we need test cases and a way for subtargets to opt-in.

MatzeB added inline comments.Apr 27 2016, 1:21 PM
lib/CodeGen/ScheduleDAGInstrs.cpp
324–329

+1

Also arguments to calls are not necessarily dead, they may reside in callee saved registers. Simply forgetting about all call-induced Defs seems invalid to me.

334–336

Have you actually tried without this code. If the call potentially reads/writes a reserved register then it must announce that with a use/def machine operand. If it does not this is a bug that should be fixed in the target IMO.

360–361

Don't repeat the function name in the doxygen comment.

366

Variable names should start with uppercase letter.

367

Avoid indentation with if (!MO.clobbersPhysReg(reg)) continue;

371

register masks already contain all register, it should not be necessary to iterate over the aliases.

378

Variable name should start with upper case letter. This call could be deferred until we know that DefSU != SU.

379

Isn't defOp equivalent to !SURegDefIsDead()?

380

Maybe introduce a variable unsigned AliasReg = *Alias; so we don't need to remind ourself with a comment that *Alias is a register...

386–387

see above, I would expect it to simply erase all previous Defs for this register reg.

609–610

\p SU and \p Reg is better doxygen style.

611

I'd tend to pass a const MachineInstr & instead of const SUnit* here (nullptr is invalid and MachineInstr instead of SUnit makes the function useful in more contexts).

612

No space before (.

613

upper case.

615

Please avoid auto in cases where the type is not obvious, we also tend to use the variable name MO for machine operands in most other loops.

618–623

Do we need to take register aliases into account here?

626

no braces.

1035–1036

You can add a continue; here

1053–1054

See my comment in addCallDeps(). I really think calls should not be special in any way regarding register dependencies...

qiucf added a subscriber: qiucf.Mar 30 2021, 11:14 PM