This is an archive of the discontinued LLVM Phabricator instance.

[ScheduleDAGRRList] Do not preschedule the node has ADJCALLSTACKDOWN parent
ClosedPublic

Authored by shiva0217 on Oct 21 2018, 10:55 PM.

Details

Summary

We should not pre-scheduled the node has ADJCALLSTACKDOWN parent, or else, when bottom-up scheduling, ADJCALLSTACKDOWN, and ADJCALLSTACKUP may hold CallResource too long and make other calls can't be scheduled. If there's no other available node to schedule, the scheduler will try to rename the register by creating the copy to avoid the conflict which will fail because CallResource is not a real physical register.

The issue is found by Tim https://github.com/avr-rust/rust/issues/111
and discuss in http://lists.llvm.org/pipermail/llvm-dev/2018-October/127083.html

Diff Detail

Repository
rL LLVM

Event Timeline

shiva0217 created this revision.Oct 21 2018, 10:55 PM
TimNN added a subscriber: TimNN.Oct 21 2018, 11:06 PM
bjope added a subscriber: bjope.Oct 22 2018, 12:02 AM
TimNN added inline comments.Oct 22 2018, 7:40 AM
lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp
2947

This finds only the first non-data predecessor. Could there be multiple that would need to be considered?

2957

Would we need to consider transitively reachable nodes? Or is checking the direct predecessors enough, as is done here?

shiva0217 retitled this revision from [ScheduleDAGRRList] Do not preschedule ADJCALLSTACKDOWN to [ScheduleDAGRRList] Do not preschedule the node has ADJCALLSTACKDOWN.Oct 22 2018, 8:14 PM
shiva0217 edited the summary of this revision. (Show Details)
shiva0217 added inline comments.
lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp
2947

Hi Tim, I think you are right. We should make sure find the first FrameSetup predecessor to avoid the FrameSetup not the first non-data predecessor.

2957

Checking the direct predecessors probably enough, because the order dependencies are setting for parameter setup instructions for the call. Most backend using order dependencies to group the parameter setting instructions between FrameSetup and call by LowerCall API.

shiva0217 updated this revision to Diff 170562.Oct 22 2018, 8:16 PM

Update patch to address Tim's comments.

What about MachineVerifier::verifyStackFrame()? That currently rejects any function with nested callframe setups. You will probably see your testcases fail when running with -verify-machineinstrs.

The code in PEI::replaceFrameIndices also seems sketchy to me as in case of nested callframes it would set InsideCallSequence to false after the inner ADJCALLSTACKUP (though at a first glance I wasn't sure yet why that variable exists at all, and why we would not track sp adjustments outside of callframe setups...)

What about MachineVerifier::verifyStackFrame()? That currently rejects any function with nested callframe setups. You will probably see your testcases fail when running with -verify-machineinstrs.

Hi @MatzeB, our testcases originally not nested, the error caused by the scheduler tries to interleaving callframe setups. The patch guides the scheduler not try to interleave the callframe setups. The nested callframe setups won't occur in our cases with the patch. So they could pass the MachineVerifier.

The code in PEI::replaceFrameIndices also seems sketchy to me as in case of nested callframes it would set InsideCallSequence to false after the inner ADJCALLSTACKUP (though at a first glance I wasn't sure yet why that variable exists at all, and why we would not track sp adjustments outside of callframe setups...)

It seems that this part of the code assume nested callframes shouldn't occur. I think the sp adjustments outside the ADJCALLSTACKDOWN/ADJCALLSTACKUP will track by if(TII.isFrameInstr(*I) SPAdj += TII.getSPAdjust(*I); Because prolog/epilog instructions should be setting with FrameSetup/FrameDestroy MIflags, so if(TII.isFrameInstr(*I) will return true.

shiva0217 retitled this revision from [ScheduleDAGRRList] Do not preschedule the node has ADJCALLSTACKDOWN to [ScheduleDAGRRList] Do not preschedule the node has ADJCALLSTACKDOWN parent.Oct 22 2018, 11:06 PM
shiva0217 edited the summary of this revision. (Show Details)
TimNN added a comment.Nov 1 2018, 11:35 PM

Friendly Ping! Is there anything I can do to help this patch along? It looks like no reviewers have been assigned yet, do you know of anyone who would be appropriate, @shiva0217?

The patch definitely fixes the original issue.

Hi @TimNN,

I look into the log history of ScheduleDAGRRList.cpp, it seems that @eli.friedman and @samparker have been fixed some issue relative to CallResource.
I'll add them as the reviewers. Hopefully, they could have time to review our patch and won't bother them too much.

The one patch I made in this area was corrected by Eli, so he's far more informed than I! The problem that time was also due to interleaving of down/up, and, as Matthias said in his email, I'm not sure why we'd need this ability. I'll take sometime today to look into the DAG builder to see if serialising these nodes isn't too much of a pain, because I'd hope that expressing this via dependencies would be better long-term.

Please include a testcase, if possible.

If I'm following correctly, the issue is PrescheduleNodesWithMultipleUses() is trying to preschedule a store that's part of a call sequence, and that introduces an extra edge that makes the DAG impossible to schedule? That makes sense, and the approach this patch takes seems correct.

I'm a little concerned this is is a fragile workaround; maybe instead of adding a heuristic here, we should pre-schedule call sequences so they can't overlap. (They normally can't anyway, except for calls to runtime functions inserted by SelectionDAG itself.)

Hi @samparker,
Thanks for your time to take a look at the code.

Hi @eli.friedman,
Thanks for the comments.
Yes, that's exactly how the issue happened.
It seems that PrescheduleNodesWithMultipleUses() try to pre-schedule the store which is one of the multiple users, because register pressure heuristic may increase the live range of other use.
But when the dependency is not the data dependence, the register pressure won't increase because there is no real register using between N and U in the following case.
We probably could avoid to pre-schedule the node when it's not data dependency. Does it sound reasonable or it could drop the performance in some cases?

    N
  / |
 /  |
U  store

I try to reproduce the fail case by in-tree targets, but it seems that it would depend on ISA and scheduling model and no in-tree targets could trigger the issue.

Hi @TimNN,
Is your test case be able to trigger the issue by in-tree targets?

I'm sorry that I'll take a few days off because my first child going to born, but I'll catch up as soon as I can.

I don't really want to merge this without any test coverage.

If you really can't figure out any way to trigger this DAG structure from IR for an in-tree target, maybe you could write a unittest which artificially constructs an appropriate DAG? We currently don't use unittests much for SelectionDAG/ScheduleDAG, but there are some existing tests in /unittest/CodeGen/AArch64SelectionDAGTest.cpp .

TimNN added a comment.Nov 21 2018, 9:31 PM

Sorry for taking so long to reply. The original example I shared (https://gist.github.com/TimNN/501819422631149c3ab2b8cc0b15c98d) reproduces the issue for the AVR target. I'm not sure about the terminology here, but I would consider AVR "in-tree", even though it is still experimental. I don't know if / how the issue can be reproduced on other targets.

Add the test case provided by Tim.

Hi @TimNN,
Thanks for the test case, that really help me a lot.

Hi, @efriedma
Thanks for your insight to provide an alternative way to create a DAG test case.
With the test case, are'we ready to go?

efriedma accepted this revision.Jan 9 2019, 5:58 PM

Sorry, I lost track of this. LGTM.

This revision is now accepted and ready to land.Jan 9 2019, 5:58 PM
shiva0217 closed this revision.Jan 18 2019, 12:55 AM

Hi Eli, Thanks for the review.