This is an archive of the discontinued LLVM Phabricator instance.

Do not select EhPad BB in MachineBlockPlacement when there is regular BB to schedule
ClosedPublic

Authored by deadalnix on Feb 25 2016, 4:50 PM.

Details

Summary

EHPad BB are not entered the classic way and therefor do not need to be placed after their predecessors. This patch make sure EHPad BB are not chosen amongst successors to form chains, and are selected as last resort when selecting the best candidate.

EHPad are scheduled in reverse probability order in order to have them flow into each others naturally.

Diff Detail

Event Timeline

deadalnix updated this revision to Diff 49122.Feb 25 2016, 4:50 PM
deadalnix retitled this revision from to Do not select EhPad BB in MachineBlockPlacement when there is regular BB to schedule.
deadalnix updated this object.
deadalnix added a subscriber: llvm-commits.

@rafael , This is screwing with test/CodeGen/PowerPC/pr25802.ll , I'm not sure what the problem is and what this test is checking. Can you have a look ?

deadalnix updated this revision to Diff 49246.Feb 26 2016, 2:33 PM

As per discussion with @rafael , deleting pr25802.ll

looks like you were involved in D17830 , so you may want to review that one.

davidxl added inline comments.Mar 2 2016, 10:42 PM
lib/CodeGen/MachineBlockPlacement.cpp
563

The new code really lowers the readability. Can you describe the algorithm here? Is it

  1. if the best non eh block exists, return it; otherwise
  2. return the least likely eh pad?

if that is the case, why not simply skip eh blocks in the original worklist loop. If the best bb is found return it, otherwise have another helper loop to find the least likely eh pad?

chandlerc added inline comments.Mar 2 2016, 10:50 PM
lib/CodeGen/MachineBlockPlacement.cpp
563

I agree that this makes it essentially unreadable. =/

I'd suggest instead to maintain two worklists, one of non-EH-pads, and another of EH-pads. Then you can write the direct loop over each, and you can even put the algorithm David suggested in a comment on the second loop. =]

Note that looping over the worklist twice could be quite expensive, which is why I'd suggest two lists.

deadalnix added inline comments.Mar 3 2016, 10:56 AM
lib/CodeGen/MachineBlockPlacement.cpp
563

@davisxl : Yes, that is exactly what this is doing. Going over the ehpad starting with the least likely makes them flow into each other more naturally. Consider you have something like:

hotehpad:
  br exit  // fallthrought
exit:
  ret
coldehpad:
  br exit

vs

coldehpad:
  br exit
hotehpad:
  br exit  // fallthrought
exit:
  ret

@chandlerc , ok for the 2 worklists.

deadalnix updated this revision to Diff 49769.Mar 3 2016, 1:16 PM

Use a separate worklist for ehpads

davidxl added inline comments.Mar 7 2016, 12:28 PM
lib/CodeGen/MachineBlockPlacement.cpp
426

Why is it always good to skip ehpad? With real profile data, can ehpad be 'hotter' than some of the cold blocks?

567

Your patch forces ehpad to be skipped when doing regular layout which is fine. However how is this following layout (enabled by this strategy)

entry, hot, then, cold, clp, hlp, lpret

better than:

entry, hot, then, cold, hlp, lpret, clp?

Of course, both are better than the following:

entry, hot, then, cold, hlp, clp, lpret
deadalnix added inline comments.Mar 7 2016, 2:10 PM
lib/CodeGen/MachineBlockPlacement.cpp
426

There is no point in eagerly scheduling ehpadsas they are never reached the "conventional" way. When you reach an EHPad, you've gone throught the personality function and a bunch of other runtime code, which makes it pretty much irrelevant where it is. Even if the EHPad ends up hotter than some other BB, the way you get to the EHPad makes it irrelevant if it is scheduled eagerly.

There is also considerations linked to D17555 , as to make sure all EHPads ends up on the same side of the split, which simplify dramatically LSDA generation.

567

It is very common for ehpad to flow into each others and I get a ton of backedge scheduling hotlp before coldlp, and potentially longer jumps overall. I'm not too opposed to go

entry, hot, then, cold, hlp, lpret, clp

But the other way around is more readable and easier to understand, and, as mentioned, it doesn't really matter where the landing pad starts so there is no real big drawback schedulign them the other way around.

Note that without that patch, you don't get

entry, hot, then, cold, hlp, clp, lpret

but

entry, hot, then, hlp, lpret, cold, clp

Which really doesn't make any sense as you end up having to jump over lp to reach cold, which is never a win, even if lp is hotter than cold.

davidxl edited edge metadata.Mar 9 2016, 11:35 AM

Sorry for the delay -- for some reason I did not receive the email notification at all..

lib/CodeGen/MachineBlockPlacement.cpp
426

This makes total sense.

569

Since there is no fundamental difference between
entry, hot, then, cold, hlp, lpret, clp vs
entry, hot, then, cold, clp, hlp, lpret

I think change at line 563 should be removed -- it is a distraction to the main code IMO. Some of the test changes can also be avoided.

deadalnix added inline comments.Mar 10 2016, 10:15 PM
lib/CodeGen/MachineBlockPlacement.cpp
569

It can be done if you feel strongly about it. But please consider the following. Quite often, ehpad code flow into each others. See pseudocode :

  invoke @foo to %then unwind label %outerlp
then:
  invoke @foo to %exit unwind label %innerlp
exit:
  ret void
innerlp:
  landingpad { i8*, i32 } cleanup
  br %innercleanup
innercleanup:
  ; Do inner cleanup
  br %outercleanup
outerlp:
  landingpad { i8*, i32 } cleanup
  br %outercleanup
outercleanup:
  ; Do outer cleanup
  resume

Note that this is typical RAII kind of code and/or nested try, or any combiantion of these. Long story short, this kind of pattern are very common.

Now, if you compute the probabilities, outerlp is consider more probable than innerlp, as the preceding invoke is itself more probable. That means that unwinding code ends up being scheduled completely backward every time. Here you'd get :

outerlp, outercleanup, innerlp, innercleanup // which branch to outercleanup

This is the most simple case, but it if very common to have more level of nesting than that in the unwind path and it gets worse. You end with a "corkscrew" like basic block scheduling which I don't think is good.

davidxl added inline comments.Mar 10 2016, 11:11 PM
lib/CodeGen/MachineBlockPlacement.cpp
569

This is a convincing example that shows benefit of using this layout for lp. I suggest adding a small test case to cover this (i.e, inner scope ehpad 'flows' into outer ehpads in the final layout).

By the way, I think the fillworklist change can be separated out as a NFC refactoring.

deadalnix updated this revision to Diff 50397.Mar 11 2016, 12:51 AM
deadalnix edited edge metadata.

Extracted NFC change into D18077
Added a test for the corkscrew ehpad scheduling mentionned in the comments

davidxl added inline comments.Mar 11 2016, 2:22 PM
lib/CodeGen/MachineBlockPlacement.cpp
638–639

suggest this change:

if (Chain.UnschedulePredecesors == 0) {

MBB = *Chain.begin();
auto &WorkList = (MBB->isEHPad() ? EHPadWorkList: BlockWorkList);
WorkList.push_back(MBB);

}

test/CodeGen/X86/block-placement.ll
1145

Where is innnercleanup?

1159

Can you shuffle the lp labels into different orders in the original source?

deadalnix added inline comments.Mar 11 2016, 4:50 PM
test/CodeGen/X86/block-placement.ll
1145

It end up being merged in innrerlp by some pass along the road.

1159

Sure, will do.

deadalnix updated this revision to Diff 50561.Mar 13 2016, 4:50 PM

Scramble lp blocks in the test

Please rebase the patch after the refactoring code is in.

test/CodeGen/X86/block-placement.ll
1137

Please use more plain word for the comments of this test case.

deadalnix updated this revision to Diff 50656.Mar 14 2016, 3:09 PM

Add explaination for the scheduling of EHPad being from the least probably one to the most probable one, including some ASCII art.

Rebase on top of D18077

davidxl added inline comments.Mar 21 2016, 2:29 PM
test/CodeGen/X86/seh-safe-div-win32.ll
68

Not sure if it matters in practice. It seems that the changes in layout behavior for catch handlers as demonstrated in this test case may not be the target problem this patch intends to solve (but changed due to a side effect). In this case, it seems more natural to have handler0 laid before handler1. Do you think this is an issue?

deadalnix added inline comments.Mar 21 2016, 2:50 PM
test/CodeGen/X86/seh-safe-div-win32.ll
68

I'm not sure it changes anything in practice here as the 2 landing pads do not flow into each others. The most important part seems to me that __try.cont is scheduled before the landingpads. That being said, maybe we want a bit more tweaking for windows style EH if it turns out that this is a problem.

davidxl added inline comments.Mar 21 2016, 4:48 PM
test/CodeGen/X86/seh-safe-div-win32.ll
68

The patch is probably good as it is now. Regarding this particular case, what kind of tweaking do you have in mind (just in case it is needed which I hope not)?

deadalnix added inline comments.Mar 21 2016, 5:49 PM
test/CodeGen/X86/seh-safe-div-win32.ll
68

I'm not sure at this stage. I'd like to establish this is actually a problem worth solving before speculating on a possible solution.

@davidxl is there a reason to not merge this ? If not I'd like to proceed.

I am ok with the patch -- but please wait a little longer in case other reviewers want to chime in.

deadalnix updated this revision to Diff 51827.Mar 28 2016, 12:52 PM

Rebase and resolve conflict. Also ping ?

deadalnix updated this revision to Diff 52883.Apr 6 2016, 9:44 PM

Rebase and resolve conflicts. If nobody has anythign against this, I'd like to merge it.

chandlerc accepted this revision.Apr 7 2016, 1:43 AM
chandlerc edited edge metadata.

I still have some unanswered questions here, but I don't think we need to solve them in this patch, it seems like a pretty strict improvement as-is. LGTM.

Also, thanks to David for doing the lion's share of the review here. =]

lib/CodeGen/MachineBlockPlacement.cpp
376

80-columns / clang-format

567–580

This goal makes perfect sense, but I'm not 100% convinced that the probabilities are a really reliable way of achieving it. I don't tihnk you need to solve all of this right away though, maybe just leave a FIXME that we should have a more principled way of ensuring nice layout of nested cleanups?

1151

80-columns / clang-format

This revision is now accepted and ready to land.Apr 7 2016, 1:43 AM
deadalnix updated this revision to Diff 52954.Apr 7 2016, 2:34 PM
deadalnix edited edge metadata.

Format and add comment about the cleanup layout.

This revision was automatically updated to reflect the committed changes.
lib/CodeGen/MachineBlockPlacement.cpp