The partially invariant unswitch has been implemented on LoopUnswitch. https://reviews.llvm.org/D93764 We need to port the feature to SimpleLoopUnswitch for new pass manager. It is related to https://bugs.llvm.org/show_bug.cgi?id=49128
Details
Diff Detail
Unit Tests
Event Timeline
Thanks for putting up this patch! Would it be possible to move the detection code (hasPartialIVCondition) somewhere, so both version of unswitching can use the same code?
Yep, I also wanted to share it so I have tried to keep the existing implementation as much as possible... I am not sure which place is good for this... maybe, somewhere in lib/Transform/Utils?
Thank you for looking into this!
Unhelpful comment: have you considered splitting this into two patches, moving code from LoopUnswitch to a common place, and enhancing SimpleLoopUnswitch ?
Following comment of @lebedev.ri, split previous patch into two patches. This one works on top of https://reviews.llvm.org/D99490
llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch.ll | ||
---|---|---|
1 | Precommit this? |
llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch.ll | ||
---|---|---|
1 | Sorry... This test file is a copy of test/Transforms/LoopUnswitch/partial-unswitch.ll using SimpleLoopUnswitch. The output of SimpleLoopUnswitch is slightly different with LoopUnswitch one so I have run the script to generate assertion. I have checked the each tests' output. If it is not good to use the script, I will add the assertions manually. Please let me know. |
llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch.ll | ||
---|---|---|
1 | No i mean, just commit this test file as-is, obviously after regenerating the check lines to pass on main. |
llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch.ll | ||
---|---|---|
1 | Ah, sorry. let me commit this test separately after checking it again. |
Any objection to push this change please? or can someone let me know what I have to do something more for this change please?
Does this patch give the expected speedup on omnetpp?
please keep the 'common courtesy ping rate' of 1 week in mind (https://llvm.org/docs/CodeReview.html#code-reviews-speed-and-reciprocity), also considering that doing a proper review can take a substantial amount of time for a reviewer and there can be a number of reasons for a delayed response (like holidays, other urgent work).
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp | ||
---|---|---|
212 | Invariants here is not really accurate I think. Those are the instructions we need to duplicate outside the loop, right? | |
219 | nit: perhaps a bit simpler NewInst->insertBefore(BB->getTerminator()) | |
2016 | struct not needed. | |
2029–2031 | I think we should probably update IVConditionInfo to provide a better way to check if there are partially invariant conditions, e.g. an isPartiallyInvariant accessor. WDYT? | |
2149–2160 | nit: In the partially invariant case, if UnswitchedSuccBB is an exit block, do not .... | |
2151 | no need for llvm::, also can we just capture [&SuccBB]? | |
2375 | We don't need to execute this loop at all for the PartiallyInvariant case, right? | |
2692 | struct not needed. | |
2738 | nit: no llvm:: should be needed. | |
2742 | nit: no llvm:: should be needed | |
2871 | Do we need to check here that we only do this for blocks where we partially unswitch on their conditions? | |
2950–2951 | this change seems unrelated? | |
3154 | can you add a comment here to explain the check? |
Yep, It makes expected speedup on omnetpp.
please keep the 'common courtesy ping rate' of 1 week in mind (https://llvm.org/docs/CodeReview.html#code-reviews-speed-and-reciprocity), also considering that doing a proper review can take a substantial amount of time for a reviewer and there can be a number of reasons for a delayed response (like holidays, other urgent work).
I am sorry for inconvenient. I will follow the review rule.
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp | ||
---|---|---|
212 | Yep, I will change it to ToToDuplicate. | |
219 | The BB is empty block so we can not use BB.getTerminator() here. | |
2016 | Yep, I will remove it. | |
2029–2031 | I agree with you. It is better to provide isPartiallyInvariant in IVConditionInfo. If possible, it could be good to create a separate patch for it after pushing this patch. | |
2149–2160 | Yep, I will update it. | |
2151 | Yep, I will update it. | |
2375 | Yep, you are right! I will update it. | |
2692 | Yep, I will update it. | |
2738 | Yep, I will update it. | |
2742 | Yep, I will update it. | |
2871 | We need to check the cost of the successors which are duplicated. The partially unswitched blocks on their conditions are only duplicated. | |
2950–2951 | Ah, I ran clang-format. It is result of clang-format. | |
3154 | Yep, I will add a comment. |
Thanks for the latest update! I gave the patch a try on SPEC2006 on X86 with -O3 -flto and it looks like there's a crash when building 445.gobmk.
Ah, sorry, I could make a mistake while I rebase and update the change. Let me check.
@fhahn There was a bug. I have fixed it. I have re-run spec2006 on x86 and it was fine.
@fhahn Can we push this change please? If you need something more, please let me know.
One final round of comments, after that I think this LG.
Could you also add a test for the threshold (like llvm/test/Transforms/LoopUnswitch/partial-unswitch-mssa-threshold.ll) and the interesting MemorySSA update cases from llvm/test/Transforms/LoopUnswitch/partial-unswitch-update-memoryssa.ll?
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp | ||
---|---|---|
210 | nit: Copy a set of loop invariant values \p ToDuplicate and insert them at the end of \p BB....? ... and conditional branch on the copied condition.` We only branch on a single value. | |
2950–2951 | ok, can you undo it? As it is not related to the change. |
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp | ||
---|---|---|
2029–2031 | Sounds good, looking forward to a follow-up! | |
2050–2051 | message in assert needs updating? | |
2737–2752 | enough to capture [this]? | |
2746 | Why TinyPtrVector here? In most case we will have at least 2 instructions here anyways? Probably better to use a SmallVector? | |
2749 | emplace_back? |
Yep, let me try to add the tests.
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp | ||
---|---|---|
210 | Yep, I will update it. | |
2029–2031 | Yep | |
2050–2051 | Yep, I will update it. | |
2737–2752 | This code is inside static function rather than class's member function so we can not use [this]. I will update it with [&L]. | |
2746 | The SimpleUnswitchPass uses TinyPtrVector for UnswitchCandidates. In order to follow the interface, TinyPtrVector is being used here. | |
2749 | TinyPtrVector does not have emplace_back. | |
2950–2951 | Yep, I will undo it. |
LGTM, thanks! A few small comments that can be addressed before committing without further review I think.
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp | ||
---|---|---|
3154 | nit: perhaps unswitched using a partially invariant condition, ...? | |
llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch.ll | ||
1104 | Could you add a brief comment and a more descriptive name for the test? | |
1143 | It would be great if you could use descriptive labels in the test both for basic blocks and values, to make it easier to read in case people need to take a look in the future. | |
1150 | please avoid using null as pointer, as this UB |
Thanks @fhahn! After updating code following your comments, I will push this change.
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp | ||
---|---|---|
3154 | Yep, I will update it. | |
llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch.ll | ||
1104 | Yep, I will add them. | |
1143 | Yep, I will update it. | |
1150 | Yep, I will update it. |
It looks like this patch causes SimpleLoopUnswitch to not terminate on some inputs, see https://bugs.llvm.org/show_bug.cgi?id=50302 for an example
Thanks. If it's not straight-forward to resolve, it would be best to revert the patch for now.
@fhahn Thanks for kind suggestion. I have figured out what causes the endless compiling. It looks we need to move partially invariant instructions rather than duplicating them. Once I resolve this issue, I need to check the score of benchmarks again. Therefore, I would post a patch for review tomorrow or the day after tomorrow.
Great, thanks! It sounds like it would make sense to revert until this is resolved than, so there's no need to rush :)
Yep, I will revert it now. Additionally, I have seen SimpleLoopUnswitch with this patch does not affect performance number of omnetpp in SPEC2017. I need to figure out it too. Once I fixed these issues, I will let you know.
The below bugs are fixed.
https://bugs.llvm.org/show_bug.cgi?id=50279
https://bugs.llvm.org/show_bug.cgi?id=50302
Even if this patch does not add the processed loop to loop pass manager again, the later loop transformations create loop with the header, which has partially invariant instructions, and add it to loop pass manager again. The loop pass manager runs SimpleLoopUnswitchPass with the loop again and it sometimes causes endless unswitch with some CFGs.
In order to avoid the endless unswitch, the new change removes the partially invariant instruction from header as below.
// +--------------------+ // | preheader | // | %a = load %ptr | // +--------------------+ // | /----------------\ // +--------v----v------+ | // | header |---\ | // | %c = phi %a, %b | | | // | %cond = cmp %c, .. | | | // | br %cond | | | // +--------------------+ | | // | | | // +--------v-----------+ | | // | store %ptr | | | // +--------------------+ | | // | | | // +--------v-----------<---/ | // | latch >----------/ // | %b = load %ptr | // +--------------------+
@fhahn I have updated the diff which fixes the bugs. I have added phi node to header in order to remove the partially invariant instructions from header. If you feel something wrong from it, please let me know.
For the performance impact on the omnetpp of spec2017, it looks the different inlining is being happened on the benchmark between legacy and new pass manager and it causes fewer chances for unswitching loop. Maybe, I could try to figure out what it causes the different inlining between legacy and new pass manager later.
@fhahn Can we push this patch again please? I have checked the benchmarks and the scores are fine.
Hm, this seems quite fragile, as another pass may decide to move the load to the header again. In the legacy pass manager, the loop is annotated with metadata to avoid partially unswithcing the same loop multiple times (https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Scalar/LoopUnswitch.cpp#L849) Can we do the same here?
llvm/test/Transforms/SimpleLoopUnswitch/endless-unswitch.ll | ||
---|---|---|
1 ↗ | (On Diff #345815) | We should avoid adding tests using -O3 here. Can we reproduce the issue by running simple-loop-unswtich twice? Also it would be good to clean up the test a bit. |
um... I thought the load would not be moved to header later because it is loop invariant and we usually try to move it to outside loop.
In the legacy pass manager, the loop is annotated with metadata to avoid partially unswithcing the same loop multiple times (https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Scalar/LoopUnswitch.cpp#L849) Can we do the same here?
Yep, I have seen the metadata. I was just not sure whether people will accept it or not. Let me try to add it.
llvm/test/Transforms/SimpleLoopUnswitch/endless-unswitch.ll | ||
---|---|---|
1 ↗ | (On Diff #345815) | um... in order to reproduce the endless unswitch, it needs other passes. Let me try to reduce the test. |
Following comments of @fhahn, updated code.
- Added metadata to avoid endless unswitch
- Reduced test case
@fhahn I have followed the LoopUnSwitch pass's metadata and it looks ok with benchmark scores. If you feel something wrong from the change, please let me know.
Thanks for the update! looks good, as this in line with the legacy loop-unswitch implementation. I think the last piece of work remaining is to update the tests to check that the metadata is emitted? Also, could you add a test with an partially unswitchable loop, that already has llvm.loop.unswitch.partial.disable and ensure it is not unswitched?
Following comment of @fhahn, added tests for metadata with llvm.loop.unswitch.partial.disable.
llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch-check-metadata.ll | ||
---|---|---|
9 ↗ | (On Diff #348957) | Can this test just be part of llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch.ll? |
llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch-generate-metadata.ll | ||
8 ↗ | (On Diff #348957) | can you also include checks for the generated IR, ensuring that the whole metadata chain is correct (!llvm.loop attached to the right unswitched loop and contains the disable metadata? Also, can this just be part of llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch.ll? I think we can just extend the checks there, rather than adding a new file + function. |
Thanks @fhahn! After updating tests, I will push it.
llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch-check-metadata.ll | ||
---|---|---|
9 ↗ | (On Diff #348957) | Yep, I will add the metadata check in llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch.ll. |
llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch-generate-metadata.ll | ||
8 ↗ | (On Diff #348957) | Yep, I will add the metadata check in llvm/test/Transforms/SimpleLoopUnswitch/partial-unswitch.ll. |
Great to see progress on this :-)
I don't see the speedup on omnetpp that we saw before on SystemZ. Do I need to pass some option or adjust some target threshold value somewhere?
@jonpa Thanks for checking the performance number on SystemZ.
I was able to see the speed up on omnetpp of SPEC2006 but there was no speedup on omentpp of SPEC2017. I guess you are looking SPEC2017 one.
It seemed there is different inlining between new pass manager and legacy pass manager and it causes to reduce partial loop unswitch opportunities on omnetpp of SPEC2017.
I am trying to get the speedup with the partial loop unswitch on omnetpp of SPEC2017 again. Once I fix it, I will let you know.
yes
It seemed there is different inlining between new pass manager and legacy pass manager and it causes to reduce partial loop unswitch opportunities on omnetpp of SPEC2017.
I am trying to get the speedup with the partial loop unswitch on omnetpp of SPEC2017 again. Once I fix it, I will let you know.
awesome!
Thanks for reminding me! @fhahn
Yep, I have checked those bug are fixed. I will close them.
@jaykang10 it looks like this patch regressed codegen in some cases: https://bugs.llvm.org/show_bug.cgi?id=51139. It would be great if you could take a look.
As I mentioned on https://bugs.llvm.org/show_bug.cgi?id=51141, I am not sure the unswitch pass has to detect the cases which the loop load pre, sccp or other passes can optimize and the pass do not unswitch the loop...
I am creating a patch to fix https://bugs.llvm.org/show_bug.cgi?id=51141 first. The patch does not fix https://bugs.llvm.org/show_bug.cgi?id=51139. Once I create it, let me add you as reviewer.
For https://bugs.llvm.org/show_bug.cgi?id=51139, I have added below comment on it.
In this case, the inliner pass fails to inline the function g after unswitch because of the cost as below debug message. NOT Inlining (cost=250, threshold=250), Call: call void @g(i32 %2) #3 Originally, after inlining function g, the JumpThreading pass made the block with call @foo dead and SimplifyCFG pass deleted it. If you add `always_inline` attribute to the function g's prototype as below, you can see the callfoo is gone. void g(int h) __attribute__((always_inline));
In this case, as I mentioned previously, I am not sure the unswitch pass has to check the inline cost or something like that... If someone has idea about it, please let me know.
The SimpleUnswitchPass checks the cost of duplicating the loop so we could adjust it. Let me try to adjust the cost calculation for partially invariant unswitch.
@jaykang10, can you share the key loops that contribute to the speedup you observed from partial unswitch in spec2k6 omnetpp by any chance? Thanks.
nit:
Copy a set of loop invariant values \p ToDuplicate and insert them at the end of \p BB....?
... and conditional branch on the copied condition.` We only branch on a single value.