This is an archive of the discontinued LLVM Phabricator instance.

[SimpleLoopUnswitch] Port partially invariant unswitch from LoopUnswitch to SimpleLoopUnswitch
ClosedPublic

Authored by jaykang10 on Mar 25 2021, 10:30 AM.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Herald added a project: Restricted Project. · View Herald TranscriptMar 25 2021, 10:30 AM
fhahn added a comment.Mar 26 2021, 1:43 AM

Thanks for putting up this patch! Would it be possible to move the detection code (hasPartialIVCondition) somewhere, so both version of unswitching can use the same code?

Thanks for putting up this patch! Would it be possible to move the detection code (hasPartialIVCondition) somewhere, so both version of unswitching can use the same code?

Yep, I also wanted to share it so I have tried to keep the existing implementation as much as possible... I am not sure which place is good for this... maybe, somewhere in lib/Transform/Utils?

jaykang10 updated this revision to Diff 333716.Mar 28 2021, 4:42 AM

Moved hasPartialIVCondition to LoopUtils.h following comment of @fhahn

jaykang10 updated this revision to Diff 333782.Mar 29 2021, 1:12 AM

Thank you for looking into this!
Unhelpful comment: have you considered splitting this into two patches, moving code from LoopUnswitch to a common place, and enhancing SimpleLoopUnswitch ?

Thank you for looking into this!
Unhelpful comment: have you considered splitting this into two patches, moving code from LoopUnswitch to a common place, and enhancing SimpleLoopUnswitch ?

You are right! Let me split this patch into two patches.

jaykang10 updated this revision to Diff 333792.Mar 29 2021, 2:27 AM

Following comment of @lebedev.ri, split previous patch into two patches. This one works on top of https://reviews.llvm.org/D99490

lebedev.ri added inline comments.Mar 29 2021, 2:28 AM
llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch.ll
1

Precommit this?

jaykang10 added inline comments.Mar 29 2021, 2:42 AM
llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch.ll
1

Sorry... This test file is a copy of test/Transforms/LoopUnswitch/partial-unswitch.ll using SimpleLoopUnswitch. The output of SimpleLoopUnswitch is slightly different with LoopUnswitch one so I have run the script to generate assertion. I have checked the each tests' output. If it is not good to use the script, I will add the assertions manually. Please let me know.

lebedev.ri added inline comments.Mar 29 2021, 2:47 AM
llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch.ll
1

No i mean, just commit this test file as-is, obviously after regenerating the check lines to pass on main.

jaykang10 added inline comments.Mar 29 2021, 2:51 AM
llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch.ll
1

Ah, sorry. let me commit this test separately after checking it again.

jaykang10 updated this revision to Diff 333804.Mar 29 2021, 3:21 AM

Following comment of @lebedev.ri, moved the test to separate patch.

jaykang10 updated this revision to Diff 334076.Mar 30 2021, 1:49 AM

Any comments please?

jaykang10 added a comment.EditedApr 6 2021, 4:28 AM

Any objection to push this change please? or can someone let me know what I have to do something more for this change please?

@fhahn Can you review this change when you have time please?

fhahn added a comment.Apr 9 2021, 2:08 AM

Does this patch give the expected speedup on omnetpp?

@fhahn Can you review this change when you have time please?

please keep the 'common courtesy ping rate' of 1 week in mind (https://llvm.org/docs/CodeReview.html#code-reviews-speed-and-reciprocity), also considering that doing a proper review can take a substantial amount of time for a reviewer and there can be a number of reasons for a delayed response (like holidays, other urgent work).

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
212

Invariants here is not really accurate I think. Those are the instructions we need to duplicate outside the loop, right?

219

nit: perhaps a bit simpler NewInst->insertBefore(BB->getTerminator())

2015

struct not needed.

2027–2029

I think we should probably update IVConditionInfo to provide a better way to check if there are partially invariant conditions, e.g. an isPartiallyInvariant accessor. WDYT?

2151

nit: In the partially invariant case, if UnswitchedSuccBB is an exit block, do not ....

2153

no need for llvm::, also can we just capture [&SuccBB]?

2366

We don't need to execute this loop at all for the PartiallyInvariant case, right?

2683

struct not needed.

2736

nit: no llvm:: should be needed.

2740

nit: no llvm:: should be needed

2868

Do we need to check here that we only do this for blocks where we partially unswitch on their conditions?

2947–2948

this change seems unrelated?

3151

can you add a comment here to explain the check?

Does this patch give the expected speedup on omnetpp?

Yep, It makes expected speedup on omnetpp.

@fhahn Can you review this change when you have time please?

please keep the 'common courtesy ping rate' of 1 week in mind (https://llvm.org/docs/CodeReview.html#code-reviews-speed-and-reciprocity), also considering that doing a proper review can take a substantial amount of time for a reviewer and there can be a number of reasons for a delayed response (like holidays, other urgent work).

I am sorry for inconvenient. I will follow the review rule.

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
212

Yep, I will change it to ToToDuplicate.

219

The BB is empty block so we can not use BB.getTerminator() here.

2015

Yep, I will remove it.

2027–2029

I agree with you. It is better to provide isPartiallyInvariant in IVConditionInfo. If possible, it could be good to create a separate patch for it after pushing this patch.

2151

Yep, I will update it.

2153

Yep, I will update it.

2366

Yep, you are right! I will update it.

2683

Yep, I will update it.

2736

Yep, I will update it.

2740

Yep, I will update it.

2868

We need to check the cost of the successors which are duplicated. The partially unswitched blocks on their conditions are only duplicated.

2947–2948

Ah, I ran clang-format. It is result of clang-format.

3151

Yep, I will add a comment.

jaykang10 updated this revision to Diff 336388.Apr 9 2021, 4:08 AM

Following comments of @fhahn, updated patch.

fhahn requested changes to this revision.Apr 15 2021, 8:13 AM

Thanks for the latest update! I gave the patch a try on SPEC2006 on X86 with -O3 -flto and it looks like there's a crash when building 445.gobmk.

This revision now requires changes to proceed.Apr 15 2021, 8:13 AM

Thanks for the latest update! I gave the patch a try on SPEC2006 on X86 with -O3 -flto and it looks like there's a crash when building 445.gobmk.

Ah, sorry, I could make a mistake while I rebase and update the change. Let me check.

jaykang10 updated this revision to Diff 338080.Apr 16 2021, 6:10 AM

Fixed a bug

fhahn added a comment.Apr 16 2021, 6:11 AM

Fixed a bug

Could you add a test case for the problem you fixed?

@fhahn There was a bug. I have fixed it. I have re-run spec2006 on x86 and it was fine.

Fixed a bug

Could you add a test case for the problem you fixed?

Yep, let me try to add it.

jaykang10 updated this revision to Diff 338138.Apr 16 2021, 8:50 AM

Added a test case for previous bug

jaykang10 added a comment.EditedApr 19 2021, 6:26 AM

@fhahn Is it ok to push this change? If you need something more, please let me know.

@fhahn Can we push this change please? If you need something more, please let me know.

@fhahn I think it is ready to push this change.

fhahn added a comment.Apr 27 2021, 3:22 AM

One final round of comments, after that I think this LG.

Could you also add a test for the threshold (like llvm/test/Transforms/LoopUnswitch/partial-unswitch-mssa-threshold.ll) and the interesting MemorySSA update cases from llvm/test/Transforms/LoopUnswitch/partial-unswitch-update-memoryssa.ll?

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
210

nit:

Copy a set of loop invariant values \p ToDuplicate and insert them at the end of \p BB....?

... and conditional branch on the copied condition.` We only branch on a single value.

2947–2948

ok, can you undo it? As it is not related to the change.

fhahn added inline comments.Apr 27 2021, 3:22 AM
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2027–2029

Sounds good, looking forward to a follow-up!

2053

message in assert needs updating?

2735–2749

enough to capture [this]?

2744

Why TinyPtrVector here? In most case we will have at least 2 instructions here anyways? Probably better to use a SmallVector?

2747

emplace_back?

One final round of comments, after that I think this LG.

Could you also add a test for the threshold (like llvm/test/Transforms/LoopUnswitch/partial-unswitch-mssa-threshold.ll) and the interesting MemorySSA update cases from llvm/test/Transforms/LoopUnswitch/partial-unswitch-update-memoryssa.ll?

Yep, let me try to add the tests.

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
210

Yep, I will update it.

2027–2029

Yep

2053

Yep, I will update it.

2735–2749

This code is inside static function rather than class's member function so we can not use [this]. I will update it with [&L].

2744

The SimpleUnswitchPass uses TinyPtrVector for UnswitchCandidates. In order to follow the interface, TinyPtrVector is being used here.

2747

TinyPtrVector does not have emplace_back.

2947–2948

Yep, I will undo it.

jaykang10 updated this revision to Diff 341113.Apr 28 2021, 2:48 AM

Following comments of @fhahn, updated code and added tests.

fhahn accepted this revision.Apr 29 2021, 1:32 PM

LGTM, thanks! A few small comments that can be addressed before committing without further review I think.

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
3151

nit: perhaps unswitched using a partially invariant condition, ...?

llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch.ll
1104

Could you add a brief comment and a more descriptive name for the test?

1143

It would be great if you could use descriptive labels in the test both for basic blocks and values, to make it easier to read in case people need to take a look in the future.

1150

please avoid using null as pointer, as this UB

This revision is now accepted and ready to land.Apr 29 2021, 1:32 PM

LGTM, thanks! A few small comments that can be addressed before committing without further review I think.

Thanks @fhahn! After updating code following your comments, I will push this change.

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
3151

Yep, I will update it.

llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch.ll
1104

Yep, I will add them.

1143

Yep, I will update it.

1150

Yep, I will update it.

jaykang10 updated this revision to Diff 341905.Apr 30 2021, 7:53 AM

Following comments of @fhahn, updated code and test

This revision was landed with ongoing or failed builds.Apr 30 2021, 7:58 AM
This revision was automatically updated to reflect the committed changes.
fhahn added a comment.May 11 2021, 8:28 AM

It looks like this patch causes SimpleLoopUnswitch to not terminate on some inputs, see https://bugs.llvm.org/show_bug.cgi?id=50302 for an example

@fhahn Thanks for letting me know. Let me have a look.

fhahn added a comment.May 12 2021, 9:18 AM

@fhahn Thanks for letting me know. Let me have a look.

Thanks. If it's not straight-forward to resolve, it would be best to revert the patch for now.

@fhahn Thanks for letting me know. Let me have a look.

Thanks. If it's not straight-forward to resolve, it would be best to revert the patch for now.

@fhahn Thanks for kind suggestion. I have figured out what causes the endless compiling. It looks we need to move partially invariant instructions rather than duplicating them. Once I resolve this issue, I need to check the score of benchmarks again. Therefore, I would post a patch for review tomorrow or the day after tomorrow.

@fhahn Thanks for letting me know. Let me have a look.

Thanks. If it's not straight-forward to resolve, it would be best to revert the patch for now.

@fhahn Thanks for kind suggestion. I have figured out what causes the endless compiling. It looks we need to move partially invariant instructions rather than duplicating them. Once I resolve this issue, I need to check the score of benchmarks again. Therefore, I would post a patch for review tomorrow or the day after tomorrow.

Great, thanks! It sounds like it would make sense to revert until this is resolved than, so there's no need to rush :)

@fhahn Thanks for letting me know. Let me have a look.

Thanks. If it's not straight-forward to resolve, it would be best to revert the patch for now.

@fhahn Thanks for kind suggestion. I have figured out what causes the endless compiling. It looks we need to move partially invariant instructions rather than duplicating them. Once I resolve this issue, I need to check the score of benchmarks again. Therefore, I would post a patch for review tomorrow or the day after tomorrow.

Great, thanks! It sounds like it would make sense to revert until this is resolved than, so there's no need to rush :)

Yep, I will revert it now. Additionally, I have seen SimpleLoopUnswitch with this patch does not affect performance number of omnetpp in SPEC2017. I need to figure out it too. Once I fixed these issues, I will let you know.

fhahn added a comment.May 14 2021, 2:41 AM

Yep, I will revert it now. Additionally, I have seen SimpleLoopUnswitch with this patch does not affect performance number of omnetpp in SPEC2017. I need to figure out it too. Once I fixed these issues, I will let you know.

Thanks!

jaykang10 updated this revision to Diff 345815.May 17 2021, 3:45 AM

The below bugs are fixed.
https://bugs.llvm.org/show_bug.cgi?id=50279
https://bugs.llvm.org/show_bug.cgi?id=50302

Even if this patch does not add the processed loop to loop pass manager again, the later loop transformations create loop with the header, which has partially invariant instructions, and add it to loop pass manager again. The loop pass manager runs SimpleLoopUnswitchPass with the loop again and it sometimes causes endless unswitch with some CFGs.

In order to avoid the endless unswitch, the new change removes the partially invariant instruction from header as below.

//    +--------------------+
//    |     preheader      |
//    |  %a = load %ptr    |
//    +--------------------+
//             |     /----------------\
//    +--------v----v------+          |
//    |      header        |---\      |
//    | %c = phi %a, %b    |   |      |
//    | %cond = cmp %c, .. |   |      |
//    | br %cond           |   |      |
//    +--------------------+   |      |
//             |               |      |
//    +--------v-----------+   |      |
//    |  store %ptr        |   |      |
//    +--------------------+   |      |
//             |               |      |
//    +--------v-----------<---/      |
//    |       latch        >----------/
//    |  %b = load %ptr    |
//    +--------------------+

@fhahn I have updated the diff which fixes the bugs. I have added phi node to header in order to remove the partially invariant instructions from header. If you feel something wrong from it, please let me know.

For the performance impact on the omnetpp of spec2017, it looks the different inlining is being happened on the benchmark between legacy and new pass manager and it causes fewer chances for unswitching loop. Maybe, I could try to figure out what it causes the different inlining between legacy and new pass manager later.

@fhahn Can we push this patch again please? I have checked the benchmarks and the scores are fine.

fhahn added a comment.May 21 2021, 3:39 AM

The below bugs are fixed.
https://bugs.llvm.org/show_bug.cgi?id=50279
https://bugs.llvm.org/show_bug.cgi?id=50302

Even if this patch does not add the processed loop to loop pass manager again, the later loop transformations create loop with the header, which has partially invariant instructions, and add it to loop pass manager again. The loop pass manager runs SimpleLoopUnswitchPass with the loop again and it sometimes causes endless unswitch with some CFGs.

In order to avoid the endless unswitch, the new change removes the partially invariant instruction from header as below.

//    +--------------------+
//    |     preheader      |
//    |  %a = load %ptr    |
//    +--------------------+
//             |     /----------------\
//    +--------v----v------+          |
//    |      header        |---\      |
//    | %c = phi %a, %b    |   |      |
//    | %cond = cmp %c, .. |   |      |
//    | br %cond           |   |      |
//    +--------------------+   |      |
//             |               |      |
//    +--------v-----------+   |      |
//    |  store %ptr        |   |      |
//    +--------------------+   |      |
//             |               |      |
//    +--------v-----------<---/      |
//    |       latch        >----------/
//    |  %b = load %ptr    |
//    +--------------------+

Hm, this seems quite fragile, as another pass may decide to move the load to the header again. In the legacy pass manager, the loop is annotated with metadata to avoid partially unswithcing the same loop multiple times (https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Scalar/LoopUnswitch.cpp#L849) Can we do the same here?

llvm/test/Transforms/SimpleLoopUnswitch/endless-unswitch.ll
1 ↗(On Diff #345815)

We should avoid adding tests using -O3 here. Can we reproduce the issue by running simple-loop-unswtich twice? Also it would be good to clean up the test a bit.

The below bugs are fixed.
https://bugs.llvm.org/show_bug.cgi?id=50279
https://bugs.llvm.org/show_bug.cgi?id=50302

Even if this patch does not add the processed loop to loop pass manager again, the later loop transformations create loop with the header, which has partially invariant instructions, and add it to loop pass manager again. The loop pass manager runs SimpleLoopUnswitchPass with the loop again and it sometimes causes endless unswitch with some CFGs.

In order to avoid the endless unswitch, the new change removes the partially invariant instruction from header as below.

//    +--------------------+
//    |     preheader      |
//    |  %a = load %ptr    |
//    +--------------------+
//             |     /----------------\
//    +--------v----v------+          |
//    |      header        |---\      |
//    | %c = phi %a, %b    |   |      |
//    | %cond = cmp %c, .. |   |      |
//    | br %cond           |   |      |
//    +--------------------+   |      |
//             |               |      |
//    +--------v-----------+   |      |
//    |  store %ptr        |   |      |
//    +--------------------+   |      |
//             |               |      |
//    +--------v-----------<---/      |
//    |       latch        >----------/
//    |  %b = load %ptr    |
//    +--------------------+

Hm, this seems quite fragile, as another pass may decide to move the load to the header again.

um... I thought the load would not be moved to header later because it is loop invariant and we usually try to move it to outside loop.

In the legacy pass manager, the loop is annotated with metadata to avoid partially unswithcing the same loop multiple times (https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Scalar/LoopUnswitch.cpp#L849) Can we do the same here?

Yep, I have seen the metadata. I was just not sure whether people will accept it or not. Let me try to add it.

llvm/test/Transforms/SimpleLoopUnswitch/endless-unswitch.ll
1 ↗(On Diff #345815)

um... in order to reproduce the endless unswitch, it needs other passes. Let me try to reduce the test.

jaykang10 updated this revision to Diff 347025.May 21 2021, 7:35 AM

Following comments of @fhahn, updated code.

  • Added metadata to avoid endless unswitch
  • Reduced test case

@fhahn I have followed the LoopUnSwitch pass's metadata and it looks ok with benchmark scores. If you feel something wrong from the change, please let me know.

@fhahn Can we push this change again please?

@fhahn Sorry for ping.

fhahn added a comment.Jun 1 2021, 3:32 AM

Thanks for the update! looks good, as this in line with the legacy loop-unswitch implementation. I think the last piece of work remaining is to update the tests to check that the metadata is emitted? Also, could you add a test with an partially unswitchable loop, that already has llvm.loop.unswitch.partial.disable and ensure it is not unswitched?

Thanks for comments @fhahn! Let me add tests for the metadata.

jaykang10 updated this revision to Diff 348957.Jun 1 2021, 6:15 AM

Following comment of @fhahn, added tests for metadata with llvm.loop.unswitch.partial.disable.

fhahn reopened this revision.Jun 2 2021, 2:21 AM
fhahn added inline comments.
llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch-check-metadata.ll
9 ↗(On Diff #348957)

Can this test just be part of llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch.ll?

llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch-generate-metadata.ll
8 ↗(On Diff #348957)

can you also include checks for the generated IR, ensuring that the whole metadata chain is correct (!llvm.loop attached to the right unswitched loop and contains the disable metadata? Also, can this just be part of llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch.ll? I think we can just extend the checks there, rather than adding a new file + function.

This revision is now accepted and ready to land.Jun 2 2021, 2:21 AM
fhahn accepted this revision.Jun 2 2021, 2:22 AM

LGTM thanks, with the additional suggestions for the tests.

LGTM thanks, with the additional suggestions for the tests.

Thanks @fhahn! After updating tests, I will push it.

llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch-check-metadata.ll
9 ↗(On Diff #348957)

Yep, I will add the metadata check in llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch.ll.

llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch-generate-metadata.ll
8 ↗(On Diff #348957)

Yep, I will add the metadata check in llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch.ll.

jaykang10 updated this revision to Diff 349216.Jun 2 2021, 3:25 AM

Following comments from @fhahn, updated tests.

This revision was landed with ongoing or failed builds.Jun 2 2021, 3:26 AM
This revision was automatically updated to reflect the committed changes.
jonpa added a comment.Jun 2 2021, 3:04 PM

Great to see progress on this :-)

I don't see the speedup on omnetpp that we saw before on SystemZ. Do I need to pass some option or adjust some target threshold value somewhere?

Great to see progress on this :-)

I don't see the speedup on omnetpp that we saw before on SystemZ. Do I need to pass some option or adjust some target threshold value somewhere?

@jonpa Thanks for checking the performance number on SystemZ.

I was able to see the speed up on omnetpp of SPEC2006 but there was no speedup on omentpp of SPEC2017. I guess you are looking SPEC2017 one.

It seemed there is different inlining between new pass manager and legacy pass manager and it causes to reduce partial loop unswitch opportunities on omnetpp of SPEC2017.

I am trying to get the speedup with the partial loop unswitch on omnetpp of SPEC2017 again. Once I fix it, I will let you know.

jonpa added a comment.Jun 3 2021, 9:02 AM

Great to see progress on this :-)

I don't see the speedup on omnetpp that we saw before on SystemZ. Do I need to pass some option or adjust some target threshold value somewhere?

@jonpa Thanks for checking the performance number on SystemZ.

I was able to see the speed up on omnetpp of SPEC2006 but there was no speedup on omentpp of SPEC2017. I guess you are looking SPEC2017 one.

yes

It seemed there is different inlining between new pass manager and legacy pass manager and it causes to reduce partial loop unswitch opportunities on omnetpp of SPEC2017.

I am trying to get the speedup with the partial loop unswitch on omnetpp of SPEC2017 again. Once I fix it, I will let you know.

awesome!

fhahn added a comment.Jun 7 2021, 6:03 AM

@jaykang10 can you verify the patch fixes the linked issue & close them?

@jaykang10 can you verify the patch fixes the linked issue & close them?

Thanks for reminding me! @fhahn

Yep, I have checked those bug are fixed. I will close them.

fhahn added a comment.Jul 26 2021, 2:38 AM

@jaykang10 it looks like this patch regressed codegen in some cases: https://bugs.llvm.org/show_bug.cgi?id=51139. It would be great if you could take a look.

@jaykang10 it looks like this patch regressed codegen in some cases: https://bugs.llvm.org/show_bug.cgi?id=51139. It would be great if you could take a look.

Ah, let me have a look.

@jaykang10 it looks like this patch regressed codegen in some cases: https://bugs.llvm.org/show_bug.cgi?id=51139. It would be great if you could take a look.

Ah, let me have a look.

Maybe revert while investigating?

@jaykang10 it looks like this patch regressed codegen in some cases: https://bugs.llvm.org/show_bug.cgi?id=51139. It would be great if you could take a look.

Ah, let me have a look.

Maybe revert while investigating?

As I mentioned on https://bugs.llvm.org/show_bug.cgi?id=51141, I am not sure the unswitch pass has to detect the cases which the loop load pre, sccp or other passes can optimize and the pass do not unswitch the loop...
I am creating a patch to fix https://bugs.llvm.org/show_bug.cgi?id=51141 first. The patch does not fix https://bugs.llvm.org/show_bug.cgi?id=51139. Once I create it, let me add you as reviewer.

@jaykang10 it looks like this patch regressed codegen in some cases: https://bugs.llvm.org/show_bug.cgi?id=51139. It would be great if you could take a look.

Ah, let me have a look.

Maybe revert while investigating?

As I mentioned on https://bugs.llvm.org/show_bug.cgi?id=51141, I am not sure the unswitch pass has to detect the cases which the loop load pre, sccp or other passes can optimize and the pass do not unswitch the loop...
I am creating a patch to fix https://bugs.llvm.org/show_bug.cgi?id=51141 first. The patch does not fix https://bugs.llvm.org/show_bug.cgi?id=51139. Once I create it, let me add you as reviewer.

For https://bugs.llvm.org/show_bug.cgi?id=51139, I have added below comment on it.

In this case, the inliner pass fails to inline the function g after unswitch because of the cost as below debug message.

NOT Inlining (cost=250, threshold=250), Call:   call void @g(i32 %2) #3

Originally, after inlining function g, the JumpThreading pass made the block with call @foo dead and SimplifyCFG pass deleted it.

If you add `always_inline` attribute to the function g's prototype as below, you can see the callfoo is gone.

void g(int h) __attribute__((always_inline));

In this case, as I mentioned previously, I am not sure the unswitch pass has to check the inline cost or something like that... If someone has idea about it, please let me know.

@jaykang10 it looks like this patch regressed codegen in some cases: https://bugs.llvm.org/show_bug.cgi?id=51139. It would be great if you could take a look.

Ah, let me have a look.

Maybe revert while investigating?

As I mentioned on https://bugs.llvm.org/show_bug.cgi?id=51141, I am not sure the unswitch pass has to detect the cases which the loop load pre, sccp or other passes can optimize and the pass do not unswitch the loop...
I am creating a patch to fix https://bugs.llvm.org/show_bug.cgi?id=51141 first. The patch does not fix https://bugs.llvm.org/show_bug.cgi?id=51139. Once I create it, let me add you as reviewer.

For https://bugs.llvm.org/show_bug.cgi?id=51139, I have added below comment on it.

In this case, the inliner pass fails to inline the function g after unswitch because of the cost as below debug message.

NOT Inlining (cost=250, threshold=250), Call:   call void @g(i32 %2) #3

Originally, after inlining function g, the JumpThreading pass made the block with call @foo dead and SimplifyCFG pass deleted it.

If you add `always_inline` attribute to the function g's prototype as below, you can see the callfoo is gone.

void g(int h) __attribute__((always_inline));

In this case, as I mentioned previously, I am not sure the unswitch pass has to check the inline cost or something like that... If someone has idea about it, please let me know.

The SimpleUnswitchPass checks the cost of duplicating the loop so we could adjust it. Let me try to adjust the cost calculation for partially invariant unswitch.

I was able to see the speed up on omnetpp of SPEC2006 but there was no speedup on omentpp of SPEC2017. I guess you are looking SPEC2017 one.

@jaykang10, can you share the key loops that contribute to the speedup you observed from partial unswitch in spec2k6 omnetpp by any chance? Thanks.

Herald added a project: Restricted Project. · View Herald TranscriptMay 18 2022, 9:14 PM