This is an archive of the discontinued LLVM Phabricator instance.

[SimpleLoopUnswich] Fix a bug on ComputeUnswitchedCost with partial unswitch
ClosedPublic

Authored by jaykang10 on Jun 7 2021, 8:00 AM.

Details

Summary

There was a bug from cost calculation for partially invariant unswitch.

The costs of non-duplicated blocks are substracted from the total LoopCost, so anything that is duplicated should *not* be counted.

On omnetpp of SPEC2017, more functions are inlined with new pass manager's inliner against legacy pass manager one.

It causes bigger unswitch cost and I have found my update of the cost calculation is wrong from it.

A test case is updated because it has multiple unswitch candidates and the different candidate is picked up.

Diff Detail

Event Timeline

jaykang10 created this revision.Jun 7 2021, 8:00 AM
jaykang10 requested review of this revision.Jun 7 2021, 8:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 7 2021, 8:00 AM
jaykang10 updated this revision to Diff 350833.Jun 9 2021, 2:49 AM

Added a test case to explain what was wrong

@fhahn @jonpa I was able to see the speedup on omnetpp of SPEC2017 with this patch.

I have added a test case to explain what was wrong.

If you need something more, please let me know.

@fhahn @jonpa If possible, can we push this change please?

@fhahn @jonpa I am sorry for ping.

fhahn added a comment.Jun 14 2021, 3:33 AM

@fhahn @jonpa I am sorry for ping.

There's no need to be sorry for pinging, but I would suggest to stick to the the common practices for the LLVM project (https://llvm.org/docs/Contributing.html#how-to-submit-a-patch), even if it means waiting a bit longer:

The common courtesy ‘ping’ rate is once a week. Please remember that you are asking for valuable time from other professional developers.

@fhahn @jonpa I am sorry for ping.

There's no need to be sorry for pinging, but I would suggest to stick to the the common practices for the LLVM project (https://llvm.org/docs/Contributing.html#how-to-submit-a-patch), even if it means waiting a bit longer:

The common courtesy ‘ping’ rate is once a week. Please remember that you are asking for valuable time from other professional developers.

@fhahn Thanks for your kind reply. I will follow it.

@fhahn @jonpa Can we push this change please?

sanwou01 added inline comments.Jun 22 2021, 3:45 AM
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2874–2875

Isn't this always true for a partial unswitch? If there is a partial unswitch candidate, PartialIVCondBranch will be the terminator of the loop header. When ComputeUnswitchedCost is called for a partial unswitch TI is *also* the terminator of the loop header. Did I miss something?

FWIW I also think that the previous condition is always true: for a partial unswitch, there is always at least one instruction in InstToDuplicate. (again, the terminator of the loop header!)

I think I managed to figure out what's going on here in the end, but it would help to explain what the bug is in the commit message, something like:

The costs of non-duplicated blocks are substracted from the total LoopCost, so anything that is duplicated should *not* be counted.

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2874–2875

Agreed here!

jaykang10 added inline comments.Jun 22 2021, 4:32 AM
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2874–2875

Isn't this always true for a partial unswitch? If there is a partial unswitch candidate, PartialIVCondBranch will be the terminator of the loop header. When ComputeUnswitchedCost is called for a partial unswitch TI is *also* the terminator of the loop header. Did I miss something?

It is correct. The candidate of PartialIVCondBranch is the terminator of the loop header. However, there could be multiple unswitch candidates which are fully or partially loop invariant. In this case, we need to check whether the candidate is partially invariant or not. You can see the example on
partial_unswitch_exiting_block_with_multiple_unswitch_candidates test function.

FWIW I also think that the previous condition is always true: for a partial unswitch, there is always at least one instruction in InstToDuplicate. (again, the terminator of the loop header!)

As we can see on hasPartialIVCondition, we add instructions such as CMP, LOAD or GEP into InstToDuplicate for partially invariant cases. For fully invariant cases, logical OR, logical AND, True, False or loop invariant conditions are added.

As we can see on below comment, we need to check the cost of duplicated successors. The duplicated successors are different according to the fully or partially invariant instructions.

// If this is a partial unswitch candidate, then it must be a conditional
// branch with a condition of either `or`, `and`, their corresponding
// select forms or partially invariant instructions. In that case, one of
// the successors is necessarily duplicated, so don't even try to remove
// its cost.
jaykang10 added a comment.EditedJun 22 2021, 4:33 AM

I think I managed to figure out what's going on here in the end, but it would help to explain what the bug is in the commit message, something like:

The costs of non-duplicated blocks are substracted from the total LoopCost, so anything that is duplicated should *not* be counted.

Sorry for inconvenient, let me add a comment in commit message.

jaykang10 edited the summary of this revision. (Show Details)Jun 22 2021, 4:46 AM

Sorry for inconvenient, let me add a comment in commit message.

No worries! I feel like I learned quite a bit about how this code works now!

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2874–2875

With !FullUnswitch, we know that we are considering a partial unswitch candidate. Could there be multiple *partial* unswitch candidates? Currently, it looks like there can only be a single partial unswitch candidate per loop, but maybe that's not necessarily true?

jaykang10 added inline comments.Jun 22 2021, 5:48 AM
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2874–2875

Um... you are right. !FullUnswitch means we assume logical OR, logical AND and partially invariant instructions.
Let me remove the if (PartialIVCondBranch == &TI).

jaykang10 updated this revision to Diff 353630.Jun 22 2021, 6:34 AM

Following comments from @sanwou01, updated patch.

sanwou01 accepted this revision.Jun 22 2021, 7:04 AM

Thanks, LGTM!

This revision is now accepted and ready to land.Jun 22 2021, 7:04 AM

Thanks, LGTM!

I appreciate your review! After checking the performance number from omnetpp of SPEC2017 with last change, let me push this change.

fhahn accepted this revision.Jun 22 2021, 7:21 AM

LGTM, thanks!