Page MenuHomePhabricator

Please use GitHub pull requests for new patches. Phabricator shutdown timeline

[SimpleLoopUnswitch] Port partially invariant unswitch from LoopUnswitch to SimpleLoopUnswitch
ClosedPublic

Authored by jaykang10 on Mar 25 2021, 10:30 AM.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
fhahn added inline comments.Apr 27 2021, 3:22 AM
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2027–2029

Sounds good, looking forward to a follow-up!

2053

message in assert needs updating?

2735

enough to capture [this]?

2744

Why TinyPtrVector here? In most case we will have at least 2 instructions here anyways? Probably better to use a SmallVector?

2747

emplace_back?

One final round of comments, after that I think this LG.

Could you also add a test for the threshold (like llvm/test/Transforms/LoopUnswitch/partial-unswitch-mssa-threshold.ll) and the interesting MemorySSA update cases from llvm/test/Transforms/LoopUnswitch/partial-unswitch-update-memoryssa.ll?

Yep, let me try to add the tests.

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
210

Yep, I will update it.

2027–2029

Yep

2053

Yep, I will update it.

2735

This code is inside static function rather than class's member function so we can not use [this]. I will update it with [&L].

2744

The SimpleUnswitchPass uses TinyPtrVector for UnswitchCandidates. In order to follow the interface, TinyPtrVector is being used here.

2747

TinyPtrVector does not have emplace_back.

2947–2948

Yep, I will undo it.

jaykang10 updated this revision to Diff 341113.Apr 28 2021, 2:48 AM

Following comments of @fhahn, updated code and added tests.

fhahn accepted this revision.Apr 29 2021, 1:32 PM

LGTM, thanks! A few small comments that can be addressed before committing without further review I think.

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
3151

nit: perhaps unswitched using a partially invariant condition, ...?

llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch.ll
1104

Could you add a brief comment and a more descriptive name for the test?

1143

It would be great if you could use descriptive labels in the test both for basic blocks and values, to make it easier to read in case people need to take a look in the future.

1150

please avoid using null as pointer, as this UB

This revision is now accepted and ready to land.Apr 29 2021, 1:32 PM

LGTM, thanks! A few small comments that can be addressed before committing without further review I think.

Thanks @fhahn! After updating code following your comments, I will push this change.

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
3151

Yep, I will update it.

llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch.ll
1104

Yep, I will add them.

1143

Yep, I will update it.

1150

Yep, I will update it.

jaykang10 updated this revision to Diff 341905.Apr 30 2021, 7:53 AM

Following comments of @fhahn, updated code and test

This revision was landed with ongoing or failed builds.Apr 30 2021, 7:58 AM
This revision was automatically updated to reflect the committed changes.
fhahn added a comment.May 11 2021, 8:28 AM

It looks like this patch causes SimpleLoopUnswitch to not terminate on some inputs, see https://bugs.llvm.org/show_bug.cgi?id=50302 for an example

@fhahn Thanks for letting me know. Let me have a look.

fhahn added a comment.May 12 2021, 9:18 AM

@fhahn Thanks for letting me know. Let me have a look.

Thanks. If it's not straight-forward to resolve, it would be best to revert the patch for now.

@fhahn Thanks for letting me know. Let me have a look.

Thanks. If it's not straight-forward to resolve, it would be best to revert the patch for now.

@fhahn Thanks for kind suggestion. I have figured out what causes the endless compiling. It looks we need to move partially invariant instructions rather than duplicating them. Once I resolve this issue, I need to check the score of benchmarks again. Therefore, I would post a patch for review tomorrow or the day after tomorrow.

@fhahn Thanks for letting me know. Let me have a look.

Thanks. If it's not straight-forward to resolve, it would be best to revert the patch for now.

@fhahn Thanks for kind suggestion. I have figured out what causes the endless compiling. It looks we need to move partially invariant instructions rather than duplicating them. Once I resolve this issue, I need to check the score of benchmarks again. Therefore, I would post a patch for review tomorrow or the day after tomorrow.

Great, thanks! It sounds like it would make sense to revert until this is resolved than, so there's no need to rush :)

@fhahn Thanks for letting me know. Let me have a look.

Thanks. If it's not straight-forward to resolve, it would be best to revert the patch for now.

@fhahn Thanks for kind suggestion. I have figured out what causes the endless compiling. It looks we need to move partially invariant instructions rather than duplicating them. Once I resolve this issue, I need to check the score of benchmarks again. Therefore, I would post a patch for review tomorrow or the day after tomorrow.

Great, thanks! It sounds like it would make sense to revert until this is resolved than, so there's no need to rush :)

Yep, I will revert it now. Additionally, I have seen SimpleLoopUnswitch with this patch does not affect performance number of omnetpp in SPEC2017. I need to figure out it too. Once I fixed these issues, I will let you know.

fhahn added a comment.May 14 2021, 2:41 AM

Yep, I will revert it now. Additionally, I have seen SimpleLoopUnswitch with this patch does not affect performance number of omnetpp in SPEC2017. I need to figure out it too. Once I fixed these issues, I will let you know.

Thanks!

jaykang10 updated this revision to Diff 345815.May 17 2021, 3:45 AM

The below bugs are fixed.
https://bugs.llvm.org/show_bug.cgi?id=50279
https://bugs.llvm.org/show_bug.cgi?id=50302

Even if this patch does not add the processed loop to loop pass manager again, the later loop transformations create loop with the header, which has partially invariant instructions, and add it to loop pass manager again. The loop pass manager runs SimpleLoopUnswitchPass with the loop again and it sometimes causes endless unswitch with some CFGs.

In order to avoid the endless unswitch, the new change removes the partially invariant instruction from header as below.

//    +--------------------+
//    |     preheader      |
//    |  %a = load %ptr    |
//    +--------------------+
//             |     /----------------\
//    +--------v----v------+          |
//    |      header        |---\      |
//    | %c = phi %a, %b    |   |      |
//    | %cond = cmp %c, .. |   |      |
//    | br %cond           |   |      |
//    +--------------------+   |      |
//             |               |      |
//    +--------v-----------+   |      |
//    |  store %ptr        |   |      |
//    +--------------------+   |      |
//             |               |      |
//    +--------v-----------<---/      |
//    |       latch        >----------/
//    |  %b = load %ptr    |
//    +--------------------+

@fhahn I have updated the diff which fixes the bugs. I have added phi node to header in order to remove the partially invariant instructions from header. If you feel something wrong from it, please let me know.

For the performance impact on the omnetpp of spec2017, it looks the different inlining is being happened on the benchmark between legacy and new pass manager and it causes fewer chances for unswitching loop. Maybe, I could try to figure out what it causes the different inlining between legacy and new pass manager later.

@fhahn Can we push this patch again please? I have checked the benchmarks and the scores are fine.

fhahn added a comment.May 21 2021, 3:39 AM

The below bugs are fixed.
https://bugs.llvm.org/show_bug.cgi?id=50279
https://bugs.llvm.org/show_bug.cgi?id=50302

Even if this patch does not add the processed loop to loop pass manager again, the later loop transformations create loop with the header, which has partially invariant instructions, and add it to loop pass manager again. The loop pass manager runs SimpleLoopUnswitchPass with the loop again and it sometimes causes endless unswitch with some CFGs.

In order to avoid the endless unswitch, the new change removes the partially invariant instruction from header as below.

//    +--------------------+
//    |     preheader      |
//    |  %a = load %ptr    |
//    +--------------------+
//             |     /----------------\
//    +--------v----v------+          |
//    |      header        |---\      |
//    | %c = phi %a, %b    |   |      |
//    | %cond = cmp %c, .. |   |      |
//    | br %cond           |   |      |
//    +--------------------+   |      |
//             |               |      |
//    +--------v-----------+   |      |
//    |  store %ptr        |   |      |
//    +--------------------+   |      |
//             |               |      |
//    +--------v-----------<---/      |
//    |       latch        >----------/
//    |  %b = load %ptr    |
//    +--------------------+

Hm, this seems quite fragile, as another pass may decide to move the load to the header again. In the legacy pass manager, the loop is annotated with metadata to avoid partially unswithcing the same loop multiple times (https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Scalar/LoopUnswitch.cpp#L849) Can we do the same here?

llvm/test/Transforms/SimpleLoopUnswitch/endless-unswitch.ll
1 ↗(On Diff #345815)

We should avoid adding tests using -O3 here. Can we reproduce the issue by running simple-loop-unswtich twice? Also it would be good to clean up the test a bit.

The below bugs are fixed.
https://bugs.llvm.org/show_bug.cgi?id=50279
https://bugs.llvm.org/show_bug.cgi?id=50302

Even if this patch does not add the processed loop to loop pass manager again, the later loop transformations create loop with the header, which has partially invariant instructions, and add it to loop pass manager again. The loop pass manager runs SimpleLoopUnswitchPass with the loop again and it sometimes causes endless unswitch with some CFGs.

In order to avoid the endless unswitch, the new change removes the partially invariant instruction from header as below.

//    +--------------------+
//    |     preheader      |
//    |  %a = load %ptr    |
//    +--------------------+
//             |     /----------------\
//    +--------v----v------+          |
//    |      header        |---\      |
//    | %c = phi %a, %b    |   |      |
//    | %cond = cmp %c, .. |   |      |
//    | br %cond           |   |      |
//    +--------------------+   |      |
//             |               |      |
//    +--------v-----------+   |      |
//    |  store %ptr        |   |      |
//    +--------------------+   |      |
//             |               |      |
//    +--------v-----------<---/      |
//    |       latch        >----------/
//    |  %b = load %ptr    |
//    +--------------------+

Hm, this seems quite fragile, as another pass may decide to move the load to the header again.

um... I thought the load would not be moved to header later because it is loop invariant and we usually try to move it to outside loop.

In the legacy pass manager, the loop is annotated with metadata to avoid partially unswithcing the same loop multiple times (https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Scalar/LoopUnswitch.cpp#L849) Can we do the same here?

Yep, I have seen the metadata. I was just not sure whether people will accept it or not. Let me try to add it.

llvm/test/Transforms/SimpleLoopUnswitch/endless-unswitch.ll
1 ↗(On Diff #345815)

um... in order to reproduce the endless unswitch, it needs other passes. Let me try to reduce the test.

jaykang10 updated this revision to Diff 347025.May 21 2021, 7:35 AM

Following comments of @fhahn, updated code.

  • Added metadata to avoid endless unswitch
  • Reduced test case

@fhahn I have followed the LoopUnSwitch pass's metadata and it looks ok with benchmark scores. If you feel something wrong from the change, please let me know.

@fhahn Can we push this change again please?

@fhahn Sorry for ping.

fhahn added a comment.Jun 1 2021, 3:32 AM

Thanks for the update! looks good, as this in line with the legacy loop-unswitch implementation. I think the last piece of work remaining is to update the tests to check that the metadata is emitted? Also, could you add a test with an partially unswitchable loop, that already has llvm.loop.unswitch.partial.disable and ensure it is not unswitched?

Thanks for comments @fhahn! Let me add tests for the metadata.

jaykang10 updated this revision to Diff 348957.Jun 1 2021, 6:15 AM

Following comment of @fhahn, added tests for metadata with llvm.loop.unswitch.partial.disable.

fhahn reopened this revision.Jun 2 2021, 2:21 AM
fhahn added inline comments.
llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch-check-metadata.ll
9 ↗(On Diff #348957)

Can this test just be part of llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch.ll?

llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch-generate-metadata.ll
8 ↗(On Diff #348957)

can you also include checks for the generated IR, ensuring that the whole metadata chain is correct (!llvm.loop attached to the right unswitched loop and contains the disable metadata? Also, can this just be part of llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch.ll? I think we can just extend the checks there, rather than adding a new file + function.

This revision is now accepted and ready to land.Jun 2 2021, 2:21 AM
fhahn accepted this revision.Jun 2 2021, 2:22 AM

LGTM thanks, with the additional suggestions for the tests.

LGTM thanks, with the additional suggestions for the tests.

Thanks @fhahn! After updating tests, I will push it.

llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch-check-metadata.ll
9 ↗(On Diff #348957)

Yep, I will add the metadata check in llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch.ll.

llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch-generate-metadata.ll
8 ↗(On Diff #348957)

Yep, I will add the metadata check in llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch.ll.

jaykang10 updated this revision to Diff 349216.Jun 2 2021, 3:25 AM

Following comments from @fhahn, updated tests.

This revision was landed with ongoing or failed builds.Jun 2 2021, 3:26 AM
This revision was automatically updated to reflect the committed changes.
jonpa added a comment.Jun 2 2021, 3:04 PM

Great to see progress on this :-)

I don't see the speedup on omnetpp that we saw before on SystemZ. Do I need to pass some option or adjust some target threshold value somewhere?

Great to see progress on this :-)

I don't see the speedup on omnetpp that we saw before on SystemZ. Do I need to pass some option or adjust some target threshold value somewhere?

@jonpa Thanks for checking the performance number on SystemZ.

I was able to see the speed up on omnetpp of SPEC2006 but there was no speedup on omentpp of SPEC2017. I guess you are looking SPEC2017 one.

It seemed there is different inlining between new pass manager and legacy pass manager and it causes to reduce partial loop unswitch opportunities on omnetpp of SPEC2017.

I am trying to get the speedup with the partial loop unswitch on omnetpp of SPEC2017 again. Once I fix it, I will let you know.

jonpa added a comment.Jun 3 2021, 9:02 AM

Great to see progress on this :-)

I don't see the speedup on omnetpp that we saw before on SystemZ. Do I need to pass some option or adjust some target threshold value somewhere?

@jonpa Thanks for checking the performance number on SystemZ.

I was able to see the speed up on omnetpp of SPEC2006 but there was no speedup on omentpp of SPEC2017. I guess you are looking SPEC2017 one.

yes

It seemed there is different inlining between new pass manager and legacy pass manager and it causes to reduce partial loop unswitch opportunities on omnetpp of SPEC2017.

I am trying to get the speedup with the partial loop unswitch on omnetpp of SPEC2017 again. Once I fix it, I will let you know.

awesome!

fhahn added a comment.Jun 7 2021, 6:03 AM

@jaykang10 can you verify the patch fixes the linked issue & close them?

@jaykang10 can you verify the patch fixes the linked issue & close them?

Thanks for reminding me! @fhahn

Yep, I have checked those bug are fixed. I will close them.

fhahn added a comment.Jul 26 2021, 2:38 AM

@jaykang10 it looks like this patch regressed codegen in some cases: https://bugs.llvm.org/show_bug.cgi?id=51139. It would be great if you could take a look.

@jaykang10 it looks like this patch regressed codegen in some cases: https://bugs.llvm.org/show_bug.cgi?id=51139. It would be great if you could take a look.

Ah, let me have a look.

@jaykang10 it looks like this patch regressed codegen in some cases: https://bugs.llvm.org/show_bug.cgi?id=51139. It would be great if you could take a look.

Ah, let me have a look.

Maybe revert while investigating?

@jaykang10 it looks like this patch regressed codegen in some cases: https://bugs.llvm.org/show_bug.cgi?id=51139. It would be great if you could take a look.

Ah, let me have a look.

Maybe revert while investigating?

As I mentioned on https://bugs.llvm.org/show_bug.cgi?id=51141, I am not sure the unswitch pass has to detect the cases which the loop load pre, sccp or other passes can optimize and the pass do not unswitch the loop...
I am creating a patch to fix https://bugs.llvm.org/show_bug.cgi?id=51141 first. The patch does not fix https://bugs.llvm.org/show_bug.cgi?id=51139. Once I create it, let me add you as reviewer.

@jaykang10 it looks like this patch regressed codegen in some cases: https://bugs.llvm.org/show_bug.cgi?id=51139. It would be great if you could take a look.

Ah, let me have a look.

Maybe revert while investigating?

As I mentioned on https://bugs.llvm.org/show_bug.cgi?id=51141, I am not sure the unswitch pass has to detect the cases which the loop load pre, sccp or other passes can optimize and the pass do not unswitch the loop...
I am creating a patch to fix https://bugs.llvm.org/show_bug.cgi?id=51141 first. The patch does not fix https://bugs.llvm.org/show_bug.cgi?id=51139. Once I create it, let me add you as reviewer.

For https://bugs.llvm.org/show_bug.cgi?id=51139, I have added below comment on it.

In this case, the inliner pass fails to inline the function g after unswitch because of the cost as below debug message.

NOT Inlining (cost=250, threshold=250), Call:   call void @g(i32 %2) #3

Originally, after inlining function g, the JumpThreading pass made the block with call @foo dead and SimplifyCFG pass deleted it.

If you add `always_inline` attribute to the function g's prototype as below, you can see the callfoo is gone.

void g(int h) __attribute__((always_inline));

In this case, as I mentioned previously, I am not sure the unswitch pass has to check the inline cost or something like that... If someone has idea about it, please let me know.

@jaykang10 it looks like this patch regressed codegen in some cases: https://bugs.llvm.org/show_bug.cgi?id=51139. It would be great if you could take a look.

Ah, let me have a look.

Maybe revert while investigating?

As I mentioned on https://bugs.llvm.org/show_bug.cgi?id=51141, I am not sure the unswitch pass has to detect the cases which the loop load pre, sccp or other passes can optimize and the pass do not unswitch the loop...
I am creating a patch to fix https://bugs.llvm.org/show_bug.cgi?id=51141 first. The patch does not fix https://bugs.llvm.org/show_bug.cgi?id=51139. Once I create it, let me add you as reviewer.

For https://bugs.llvm.org/show_bug.cgi?id=51139, I have added below comment on it.

In this case, the inliner pass fails to inline the function g after unswitch because of the cost as below debug message.

NOT Inlining (cost=250, threshold=250), Call:   call void @g(i32 %2) #3

Originally, after inlining function g, the JumpThreading pass made the block with call @foo dead and SimplifyCFG pass deleted it.

If you add `always_inline` attribute to the function g's prototype as below, you can see the callfoo is gone.

void g(int h) __attribute__((always_inline));

In this case, as I mentioned previously, I am not sure the unswitch pass has to check the inline cost or something like that... If someone has idea about it, please let me know.

The SimpleUnswitchPass checks the cost of duplicating the loop so we could adjust it. Let me try to adjust the cost calculation for partially invariant unswitch.

I was able to see the speed up on omnetpp of SPEC2006 but there was no speedup on omentpp of SPEC2017. I guess you are looking SPEC2017 one.

@jaykang10, can you share the key loops that contribute to the speedup you observed from partial unswitch in spec2k6 omnetpp by any chance? Thanks.

Herald added a project: Restricted Project. · View Herald TranscriptMay 18 2022, 9:14 PM