This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Fold icmp into phi beyond the same BB.
ClosedPublic

Authored by bipmis on Aug 11 2023, 10:44 AM.

Details

Summary

Observing cases where folding a icmp into phi can be beneficial. Particularly if the above fold results in the icmp argument of type BinaryOperator, it may enable more folding.
Need to understand if there is reason why the folding of icmp into phi happens within the same BB only.
Can this be done under additional checks.

Ran all tests. Looks Ok to me.

Diff Detail

Event Timeline

bipmis created this revision.Aug 11 2023, 10:44 AM
bipmis requested review of this revision.Aug 11 2023, 10:44 AM

Gentle Ping.

nikic added inline comments.Aug 17 2023, 5:12 AM
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
3895–3896

I don't think this special case make sense. Either we allow this, or we don't. There is no reason to believe that this is beneficial for just binary operators -- it just happens that this allows follow-on folds in your specific example.

I am open to dropping the check entirely, if we don't see any significant regressions from that.

bipmis updated this revision to Diff 552296.Aug 22 2023, 3:20 AM

Updating the patch to fold cmp's into phi beyond BB.
Tested and did not find any significant regressions. Can provide the test results if needed.

bipmis marked an inline comment as done.Aug 22 2023, 3:21 AM

My results agree - this seems to look OK perf wise.

nikic accepted this revision.Aug 22 2023, 12:29 PM

LGTM, let's see if anyone complains.

llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
3896–3897

You can just drop this comment -- not terribly helpful now that the check is gone. (We're folding not only beyond the same block.)

This revision is now accepted and ready to land.Aug 22 2023, 12:29 PM

You might want to double check whether the dfsan failure in pre-merge checks is legitimate or not.

bipmis updated this revision to Diff 552728.Aug 23 2023, 8:45 AM
bipmis added reviewers: browneee, MaskRay.

Updating patch to handle the dfsan compiler-rt test case.

dfsan failure in pre-merge checks is legitimate. Looks like the first branch is being optimized out based on the proposed change. assert added to avoid it.
Additionally dfsan_set_label(LabelIJ..) has been added. This handles the original and latest changes in the code correctly. I could be wrong. Would be good to have check if the changes looks correct.

This revision now requires review to proceed.Aug 23 2023, 8:45 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 23 2023, 8:45 AM
Herald added subscribers: Restricted Project, Enna1. · View Herald Transcript
browneee requested changes to this revision.Aug 23 2023, 8:59 AM
browneee added inline comments.
compiler-rt/test/dfsan/conditional_callbacks.c
100

Thanks nikic and bipmis for checking this change with me :)

The point of the test is that tainted_cond should have taint labels propagated from above. This change would invalidate this test, and break the behavior this is testing for.

I wonder, what is the IR for ((DataI * DataJ) != 1) before and after this change? I suspect this is where the taint label is being lost.

This revision now requires changes to proceed.Aug 23 2023, 8:59 AM
bipmis added inline comments.Aug 23 2023, 12:18 PM
compiler-rt/test/dfsan/conditional_callbacks.c
100

Ok Thanks for explaining it.
Let me discuss the IR for the original conditional_callbacks.c before and after the change.
I see the DataI and DataJ are obtained from argv as expected and same in both cases. The result calculation is completely avoided in the mod patch. The phi of the result values is also avoided. Instead it is converted to an i1 cmp based on the value check of DataJ==2.

The error I encounter with this is :
/home/bipmis01/llvm_30may/llvm-project/compiler-rt/test/dfsan/conditional_callbacks.c:79:12: error: CHECK: expected string not found in input
CHECK: Label 1 used as condition
<stdin>:1:1: note: scanning from here
conditional_callbacks.c.tmp: /home/bipmis01/llvm_30may/llvm-project/compiler-rt/test/dfsan/conditional_callbacks.c:35: void my_dfsan_conditional_callback(dfsan_label, dfsan_origin): Assertion `Label == LabelI' failed.//

browneee added inline comments.Aug 23 2023, 2:52 PM
compiler-rt/test/dfsan/conditional_callbacks.c
100

Thanks. Now that I see the error, my suspicion was wrong (or maybe that is a second problem after the error).

When dfsan is tracking tainted data, this feature allows users to find when the data is used to make a decision (ie, used in a control flow condition).

int DataI = (Argv[0][0] != 0) ? 1 : 0; // 69: no callback here, because condition is not tainted

if (DataI) { // 80: DataI was tainted by dfsan_set_label(LabelI, &DataI, sizeof(DataI));

Because DataI is tainted, my_dfsan_conditional_callback gets called.

static int Count = 0;
  switch (Count++) {

The Count in my_dfsan_conditional_callback is a hack to have different logic each time my_dfsan_conditional_callback.

The assert that is failing on 35, when Count==0. So the first call has a tainted condition (otherwise it wouldn't be called) but it is not tainted with LabelI.

The failure of Label != LabelI probably means that my_dfsan_conditional_callback is first called from another location.

if (DataI) { // 80

I think this code still happens (your diff; left 93 (IIUC, the left side of your diff is new?)), but later - so the calls to my_dfsan_conditional_callback would happen in a different order - and the assertions would fail.

I suggest temporarily remove the assertions in the switch in my_dfsan_conditional_callback, and see if it prints 3 things for the 3 // CHECK:. This would confirm things are still working, and the problem is that the test was not robust to code re-ordering.

bipmis added inline comments.Aug 24 2023, 4:56 AM
compiler-rt/test/dfsan/conditional_callbacks.c
100

I think the first condition has been optimised away.
./conditional_callbacks.c.tmp-orig FooBarBaz
Label 1 used as condition
Label 2 used as condition
Label 3 used as condition
Label 3 used as condition

./conditional_callbacks.c.tmp FooBarBaz
Label 2 used as condition
Label 3 used as condition

This can also be seen from the IR.
A modification to the test as below now works. Not right but just a proof of the calls taking place

switch (Count++) {
case 0:
  assert(Label == LabelJ);
  break;
case 1:
  assert(Label == LabelIJ);
  break;
default:
  break;
}

And remove the first check from source

// CHECK: Label 1 used as condition
browneee added inline comments.Aug 25 2023, 11:15 AM
compiler-rt/test/dfsan/conditional_callbacks.c
100

I think first condition being optimised away makes sense as an explanation, and matches that output.

Question is how to fix it?

Ideas
  1. Mark dfsan_set_label second argument as volatile? I think this would prevent the optimization from removing the if.
  2. Change the code in the test so the optimization won't break the test. e.g. maybe result = identity(42); where identity() is not inlined (I don't have a good sense for when optimizations will [not] occur).
  3. Run DFSan transform pass earlier, before these optimizations? (maybe not feasible; probably a fair bit of work to figure out)
Re 1):

I think this is a problem here only because the (Argv[0][0] != 0) ? 1 : 0; ternary, the dfsan_set_label, and the if are all so close together. If dfsan_set_label was much earlier, then I expect the first ternary would trigger the check and it would work out ok.

Change would be

And similar surrounding functions.

This may be the best option, if it works.

Re 2):

It wouldn't surprise me if there are already other cases where a use in a condition is optimized away, and we just don't have a test pointing to it. Would prefer to keep the if condition (modified to avoid optimization) rather than remove it.

I'm OK with this option.

Re 3):

Probably much more effort than the other two. May create other problems.

bipmis added inline comments.Sep 1 2023, 5:05 AM
compiler-rt/test/dfsan/conditional_callbacks.c
100

Thanks for the ideas.

  1. havent checked yet as it could involve changes to multiple files which uses dfsan_set_label. Also I am not fully sure if this would avoid the optimization from taking place.
  1. Other way to avoid the optimization from happening is to use the value of result. In the current case since only use of result is an assert, it optimizes way most of it(condition1 ,the entire switch and assert itself). The simplest change can be a print of result every time it is updated.

I checked this and it works fine.
This would avoid any new flow/code changes and still avoid the optimization from kicking in. Let me know if this sounds OK. I can update the patch for you to have a look.
Thanks.

browneee added inline comments.Sep 1 2023, 11:14 AM
compiler-rt/test/dfsan/conditional_callbacks.c
100

Printing results SGTM. A non-inlined function to add things to result would probably work too. Thanks!

bipmis updated this revision to Diff 555487.Sep 1 2023, 1:10 PM

Updated patch to use the result by prints. Avoids the optimization of conditions and passes the test.

bipmis marked an inline comment as done.Sep 1 2023, 1:12 PM
bipmis added inline comments.
compiler-rt/test/dfsan/conditional_callbacks.c
100

Thanks. Have updated the patch.
Agreed that can work too.

bipmis added a comment.Sep 5 2023, 1:37 AM

@browneee @nikic Please have a look and let me know if the test changes look OK. Thanks.

dmgreen accepted this revision.Sep 7 2023, 1:36 AM

If this works and passes the test then it sounds good to me. LGTM

bipmis retitled this revision from [WIP] [InstCombine] Fold icmp into phi beyond the same BB. to [InstCombine] Fold icmp into phi beyond the same BB..Sep 7 2023, 8:12 AM
This revision was not accepted when it landed; it landed in state Needs Review.Sep 7 2023, 8:55 AM
This revision was automatically updated to reflect the committed changes.