This is an archive of the discontinued LLVM Phabricator instance.

[SimplifyCFG] Hoist common instructions on Switch.
ClosedPublic

Authored by DianQK on Jul 19 2023, 7:31 AM.

Details

Summary

Sink common instructions are not always performance friendly. We need to implement hoist common instructions on switch instruction to solve the following problem:

define i1 @foo(i64 %a, i64 %b, i64 %c, i64 %d) {
start:
  %test = icmp eq i64 %a, %d
  br i1 %test, label %switch_bb, label %exit

switch_bb:                                        ; preds = %start
  switch i64 %a, label %bb0 [
    i64 1, label %bb1
    i64 2, label %bb2
  ]

bb0:                                              ; preds = %switch_bb
  %0 = icmp eq i64 %b, %c
  br label %exit

bb1:                                              ; preds = %switch_bb
  %1 = icmp eq i64 %b, %c
  br label %exit

bb2:                                              ; preds = %switch_bb
  %2 = icmp eq i64 %b, %c
  br label %exit

exit:                                             ; preds = %bb2, %bb1, %bb0, %start
  %result = phi i1 [ false, %start ], [ %0, %bb0 ], [ %1, %bb1 ], [ %2, %bb2 ]
  ret i1 %result
}

The pre-commit test is D156617.

Diff Detail

Event Timeline

DianQK created this revision.Jul 19 2023, 7:31 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 19 2023, 7:31 AM
DianQK requested review of this revision.Jul 19 2023, 7:31 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 19 2023, 7:31 AM
DianQK updated this revision to Diff 542258.Jul 19 2023, 5:29 PM

Added isSafeToHoistInstr check, but still need to fix patchable-function-entry-bti.ll failure.

nikic added a comment.Jul 20 2023, 1:18 AM

Is it possible to instead generalize the HoistThenElseCodeToIf() transform to not require exactly two successors, so it can support switch as well?

Is it possible to instead generalize the HoistThenElseCodeToIf() transform to not require exactly two successors, so it can support switch as well?

I considered that possibility. I'm concerned that this might have an impact on compile time for the two successors.
But it's more a question of whether the reordering and debug instructions can be more profitable for the switch instruction.
Using the same analysis on multiple successors can become more complex. Too long compile time?
But adding some options to control these behaviors may be appropriate. So I'll give it a new try.

Also, I feel like checking if the debug command is the same is a bit much.

DianQK added a comment.EditedJul 20 2023, 6:16 AM

Maybe I need to know more about D84108.

DianQK updated this revision to Diff 543163.Jul 22 2023, 4:00 AM

Now we can work with the unreachable instruction.
I will try to merge a generic hoist common instructions and add some test cases.

DianQK updated this revision to Diff 543284.Jul 23 2023, 7:39 AM

Merge hoist common instructions, except that the termination instruction is not completed.

DianQK updated this revision to Diff 544763.Jul 27 2023, 7:15 AM

Add the hoistCommonCodeFromSuccessors method as a generic hoist common code implementation.
The existing test cases look like they all pass. More test cases for the switch statement may need to be added.
There may be some duplicate calculations that can be eliminated, such as shouldHoistCommonInstructions. But I think most of the code is done.

XChy added a subscriber: XChy.Jul 29 2023, 11:02 AM

Nice work to me! Just some nits here.
Can you create two patches to better show the difference before/after transform?
The first patch includes your unoptimized testcases without the fold. (Your first commit)
The second patch includes your optimized testcases and your code for the fold. (Your second commit, which is based on your first commit)
And then let the first patch be the parent revision of the second patch through "Edit Related Revisions".

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
1552

Omit the braces of single-statement if/for.

llvm/test/Transforms/SimplifyCFG/hoist-common-code.ll
97 ↗(On Diff #544763)

More positive/negative tests are needed.

DianQK added a reviewer: XChy.Jul 30 2023, 5:22 AM
DianQK updated this revision to Diff 545437.Jul 30 2023, 7:16 AM

Split test cases and code changes into two commits.

DianQK updated this revision to Diff 545439.Jul 30 2023, 7:21 AM

Remove redundant header files.

Nice work to me! Just some nits here.
Can you create two patches to better show the difference before/after transform?
The first patch includes your unoptimized testcases without the fold. (Your first commit)
The second patch includes your optimized testcases and your code for the fold. (Your second commit, which is based on your first commit)
And then let the first patch be the parent revision of the second patch through "Edit Related Revisions".

Thanks, I tried it out.
Too busy this weekend to update this patch. But I've split the patch into two. It should look better now.
I should think carefully about test cases.

DianQK edited the summary of this revision. (Show Details)Jul 30 2023, 7:28 AM
DianQK marked an inline comment as done.
DianQK updated this revision to Diff 549701.Aug 13 2023, 6:18 AM

Add more tests.

DianQK updated this revision to Diff 549788.Aug 13 2023, 10:24 PM

Fixes backend-optimization-failure.cpp.
We should examine all instructions during each iteration to confirm the presence of a terminating one.

DianQK updated this revision to Diff 550292.Aug 15 2023, 5:56 AM

Allows to hoist the side effect instruction.

XChy accepted this revision.Aug 28 2023, 10:39 PM

LGTM. May commit it if no changes are requested by others in one day.

This revision is now accepted and ready to land.Aug 28 2023, 10:39 PM
nikic requested changes to this revision.Aug 29 2023, 6:43 AM

This looks pretty good to me. Only significant concern I have is the unreachable handling, which doesn't look quite correct to me.

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
1528
1563

instuctions -> instructions

Also this sounds like it's a count of instructions, while this is really a SkipFlags bitmask.

1593–1599
1599

Unless I'm missing something, I don't think this code is correct in the case where instructions are skipped. E.g. if the block is something like call @not_willreturn; unreachable and we skip over the first instruction and then see the unreachable, it would not be correct to ignore this successor. Is there a test for this?

I think you need to do the removal of unreachable blocks before the main loop, so you only handle the case where there is only unreachable in the block, not any other instructions.

1599

Omit braces for single line body.

1617

Could use all_of here.

1619

I think this may misbehave in the following case: You have BB1 and BB2 with same debug info and BB3 with different. Then BB1 and BB3 will skip it, but BB2 won't.

So I think you want to first do a check whether all are identical, and then skip or not skip debug info.

1630

This == 0 on a statistic doesn't really make sense to me... This is a global variable, not just for this run of the function.

1657

Why does this use a set rather than a vector?

1689

Omit braces for one line if.

1698

SmallVector

1703

Can always use the second code path? Should also work for branches.

7136

Leftover?

This revision now requires changes to proceed.Aug 29 2023, 6:43 AM
DianQK updated this revision to Diff 555039.Aug 31 2023, 7:27 AM
DianQK marked 14 inline comments as done.

Thanks to @XChy and @nikic. I apologize for making so many code issues. (T.T)

DianQK added inline comments.Aug 31 2023, 7:32 AM
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
1599

I moved this before the main loop to keep it simple. simplifyUnreachable and EarlyCSE have finished for this.

DianQK updated this revision to Diff 555204.Aug 31 2023, 5:26 PM

Remove more braces.
Change remove_if to erase_if.

DianQK updated this revision to Diff 555207.EditedAug 31 2023, 5:36 PM

Due to my rebase issue, re-update.

Probably due to my rebase last night causing https://reviews.llvm.org/D155711?vs=555039&id=555207#change-46AmBA3PEORL to get weird.

The full diff looks fine.

ping @nikic
Will you continue to review here? Or should I migrate to GitHub?

nikic added a comment.Sep 19 2023, 2:33 AM

This looks fine to me, but I think this is currently missing tests for the "hoist not first instruction" case for switch? We should have at least basic tests that this is supported and side effects are checked.

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
1579

I'd move this variable below the following if check (to before for (;;)).

1588

"hoist the instruction at the rest" -> "hoist instructions from the rest"?

1597

Drop llvm:: prefix.

1598

You are using iterator_range(SuccIterBegin, SuccIters.end()) in a lot of places. I think it may make sense to create a variable for this?

There is also make_first_range() to iterate over .first.

DianQK updated this revision to Diff 557036.Sep 19 2023, 5:58 AM
DianQK marked 4 inline comments as done.

Update as suggested.

This looks fine to me, but I think this is currently missing tests for the "hoist not first instruction" case for switch? We should have at least basic tests that this is supported and side effects are checked.

Thanks. I added the switch variant to hoist-common-skip.ll.

nikic accepted this revision.Sep 19 2023, 6:02 AM

LGTM, thanks!

This revision is now accepted and ready to land.Sep 19 2023, 6:02 AM
This revision was landed with ongoing or failed builds.Sep 19 2023, 4:22 PM
This revision was automatically updated to reflect the committed changes.

Early heads-up we internally have identified a crash due to this change.

Early heads-up we internally have identified a crash due to this change.

I will try giving a reproduce later, but am reverting this now.

musttail call must precede a ret with an optional bitcast

Early heads-up we internally have identified a crash due to this change.

I will try giving a reproduce later, but am reverting this now.

musttail call must precede a ret with an optional bitcast
% cat x.ll
source_filename = "<stdin>"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define ptr @ham(ptr %arg, ptr %arg1, ptr %arg2, i64 %arg3, ptr %arg4, i64 %arg5) {
bb:
  %load = load i16, ptr %arg, align 2
  %and = and i16 %load, 1
  %icmp = icmp eq i16 %and, 0
  br i1 %icmp, label %bb7, label %bb6

bb6:                                              ; preds = %bb
  ret ptr null

bb7:                                              ; preds = %bb
  switch i16 %load, label %bb13 [
    i16 0, label %bb8
    i16 1, label %bb9
    i16 2, label %bb11
  ]

bb8:                                              ; preds = %bb7
  %call = musttail call ptr @zot(ptr %arg, ptr %arg1, ptr %arg2, i64 %arg3, ptr %arg4, i64 %arg5)
  ret ptr %call

bb9:                                              ; preds = %bb7
  %call10 = musttail call ptr @zot(ptr %arg, ptr %arg1, ptr %arg2, i64 %arg3, ptr %arg4, i64 %arg5)
  ret ptr %call10

bb11:                                             ; preds = %bb7
  %call12 = musttail call ptr @zot(ptr %arg, ptr %arg1, ptr %arg2, i64 %arg3, ptr %arg4, i64 %arg5)
  ret ptr %call12

bb13:                                             ; preds = %bb7
  unreachable
}

declare ptr @zot(ptr, ptr, ptr, i64, ptr, i64)
% opt -O2 -S < x.ll
musttail call must precede a ret with an optional bitcast
  %call = musttail call ptr @zot(ptr nonnull %arg, ptr %arg1, ptr %arg2, i64 %arg3, ptr %arg4, i64 %arg5)
LLVM ERROR: Broken module found, compilation aborted!
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
...

Early heads-up we internally have identified a crash due to this change.

I will try giving a reproduce later, but am reverting this now.

musttail call must precede a ret with an optional bitcast

Okay. Thank you for the reproduce.