Page MenuHomePhabricator

[InstCombine] keep assumption before sinking calls
ClosedPublic

Authored by Tyker on Oct 27 2019, 9:05 AM.

Details

Summary

in the following C code the branch is not removed by clang in O3.

int f1(char* p) {
    int i1 = __builtin_strlen(p);
    if (!p)
        return -1;
    return i1;
}

The issue is that the call to strlen is sunk to the following block by instcombine. In its new place the call to strlen doesn't dominate the use in the icmp anymore so value tracking can't see that p cannot be null.
This patch resolves the issue by inserting an assumption at the place of the call before sinking a call when that call can be used to prove an argument to be nonnull.
This resolves this issue at O3.

Diff Detail

Event Timeline

Tyker created this revision.Oct 27 2019, 9:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 27 2019, 9:05 AM
Tyker retitled this revision from [InstCombine] to [InstCombine] keep assumption before skink calls.Oct 27 2019, 9:07 AM
Tyker retitled this revision from [InstCombine] keep assumption before skink calls to [InstCombine] keep assumption before skinking calls.
Tyker added a reviewer: xbolva00.
xbolva00 added a comment.EditedOct 27 2019, 10:32 AM

Yeah, this is case when sinking is not very profitable.

I wrote patch to disable sinking for these cases long time ago - https://reviews.llvm.org/D53098 - but I am not sure why I abandoned it - probably LLVM was poor to infer nonnull/deref that time. I improved it a bit this summer.

Some open questions:

This patch adds llvm.assume so effectively it increases use count of a value and some optimizations under with hasOneUse() will not trigger. I would prefer just to disable sinking in this case.

fhahn:

I got the first results back, for AArch64 cortex-a57, running the test-suite, spec2000 and > spec2k6. There is one regression (+10% runtime, +16% code size) in SPEC2006, and a handful of small improvements across suites (range -3% to -1%)

So after a first look, it doesn't look too bad. I'll take a look into the SPEC2k6 regression. > > But it would be good to get numbers on other platforms/benchmarks too.

BTW, I just made TryToSinkInstruction return false in all cases, to disable sinking.

(PR39119)

  • Should we disable sinking? Or just disable sinking of call instructions?
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3157

You need to check also llvm::NullPointerIsDefined(), I think

llvm/test/Transforms/InstCombine/Assume-Remplacing-Call.ll
5 ↗(On Diff #226573)

typo remplacement

majnemer retitled this revision from [InstCombine] keep assumption before skinking calls to [InstCombine] keep assumption before sinking calls.Oct 28 2019, 3:45 PM
jdoerfert added a comment.EditedOct 28 2019, 4:24 PM

TL;DR I would prefer a different solution but this seems fine for the short term. We could also get a similar short term solution up and running in the Attributor easily while avoiding some of the drawbacks I mention below, though that will have other consequences wrt. timing.


I personally would argue this is the wrong place to do this (in the middle-long term), assuming I grasp the whole scope of the change.

I'll outline what I think we should do (in the middle-long term) below. I also add inline comments for this approach if we want to stick with it for now (which I'm not actually opposing).

The problem here is more generic than nonnull in the sink transformation of instcombine.
The problem is that we would like to retain knowledge (of various kinds) while allowing transformations to do what they think is best.
Using llvm.assume is appealing for nonnull but why are we focusing on that one, e.g, why don't we save the dereferenceability information, alignment, ...?
Using llvm.assume is actually problematic for more complex cases, e.g., dereferenceable is hard to express.
So what I would like to see is a more generic framework to pin information/attributes to locations *only* if we, during a dedicated phase, decide that it might be profitable.
Above I already tried to make the point why we don't want to use llvm.assume, now we should not do it on a "per-transformation-basis" because it is not clear if this is the right thing to do (I mean retain the information explicitly):

  • What if in the tests above the argument %p of test_sink_remplacementX would be nonnull? (Right now we would introduce a useless assume.)
  • What if there are (very likely) no users of the information tracked with the llvm.assume?
  • What other passes need to be augmented or do we want to adopt a "case-by-case" approach?

To make my proposed scheme work we need to come up with a new "assumption encoding".
That said, there are various reasons to do so, e.g., __builtin_assume with potential side effects come to mind, so does #pragma omp assume (OpenMP TR8, to be released soon).
I have some ideas on how this could look like without disturbing transformations or accidentally exposing side-effects, but so far I haven't put my thoughts into a coherent write-up.
If there is interest, please contact me :)

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3155

[Wording]

// nonnull call site arguments.

While you only test caller arguments below, the code will actually add assumptions for any nonnull call site argument.

3163

The above checks are "too complicated". You don't want to look at call site and callee explictly but use helpers that look at both. In fact, CallBase::paramHasAttr is such a helper that looks at the call base and the callee if available. Also, there is no dereferenceable(0) which should simplify the last part.

Btw. the comment by @xbolva00 is correct, deref doesn't always imply nonnull.

3169

Nit: Use a CallBase instead of CallSite

llvm/test/Transforms/InstCombine/Assume-Remplacing-Call.ll
67 ↗(On Diff #226573)

I would prefer using using the update test checks script.

88 ↗(On Diff #226573)

these names do not help, make the nonnull explicit in the name please.

89 ↗(On Diff #226573)

A test is missing where the sunk function is not in the entry block and where the argument of the sunk function is not an argument of the caller because in the cases above we could just annotate the argument of the caller as nonnull. (no need for the assume).

Well, maybe better solution - if a callsite has any attribute, avoid sinking?

Tyker updated this revision to Diff 226996.Oct 29 2019, 6:08 PM
Tyker marked 8 inline comments as done.

fixed comments from @jdoerfert

for the choice between the assume or not sinking the call, both have there drawback. if no users from the call can benefit from the nonnull (or other attributes) grantee sinking would be the best choice. but in the given example not sinking is the best. i tried to get the best of both with the assume, but it does add a use which can prevent other optimization. i don't know which is the best short term solution.

i completely agree with your points on a good middle-long term solution. but a good solution can't be implemented quickly. when thinking about it there is many situation were we lose information by performing a transformation which can worsen other optimizations.

xbolva00 added inline comments.Oct 29 2019, 6:13 PM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3159

CS-->getFunction()

llvm/test/Transforms/InstCombine/Assume-Remplacing-Call.ll
5 ↗(On Diff #226573)

Also typo in “Assume-Remplacing-Call.ll” (and please use lowercase)

This looks much better, thx. I have one more comment below, otherwise it is fine with me to go in. @xbolva00 comments, objections?

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3159

We talk about a particular pointer that might have an address space so we need sth like:
!llvm::NullPointerIsDefined(CS->getParent()->getParent(), CS->getArgOp(Idx)->getType()->getPointerAddressSpace())

Tyker updated this revision to Diff 227190.Oct 30 2019, 3:24 PM
Tyker marked 2 inline comments as done.

fixed comments

Is Attributor scheduled in pipeline after the XYZ pass which removes branch due to llvm.assume?

Attributor should run after InstCombine too.

Is Attributor scheduled in pipeline after the XYZ pass which removes branch due to llvm.assume?

Attributor should run after InstCombine too.

For now, the Attributor runs once very early (see D69596). Later, we will run the Attributor alternatively/additionally as CG-SCC pass in the "inliner loop". That needs some preparation but it was design to be doable like this from the very beginning. I mean, you will be able to run the Attributor on any IR subset later but there is a little bit of glue code missing.

jdoerfert accepted this revision.Oct 30 2019, 4:01 PM

While I cannot predict the performance impact, I think it is correct and conceptually the right direction (I mean keeping information). LGTM.

This revision is now accepted and ready to land.Oct 30 2019, 4:01 PM
Tyker updated this revision to Diff 227199.Oct 30 2019, 4:06 PM
This revision was automatically updated to reflect the committed changes.
Tyker added a comment.Oct 30 2019, 4:21 PM

@jdoerfert has there been a RFC or a write-up on a more long-term solution to problem of transformation losing potentially useful information ?

I benchmarked spec2006int with -instcombine-code-sinking=false and I observed major regressions. So just disable code sinking is bad decision. What we can try:

  • disable call sinking
  • disable call sinking only if a callsite has any interesting attribute.

@jdoerfert has there been a RFC or a write-up on a more long-term solution to problem of transformation losing potentially useful information ?

The closest to a write-up was this thread. I have it on my TODO list, so ... ;)
If you want to help (write-up, problem description, possible solutions, ...), that would be much appreciated!

Tyker added a comment.Nov 1 2019, 9:23 AM

I am interested in helping with the implementation. but i don't think i can take architectural decisions.