This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Simplify calls with "returned" attribute
ClosedPublic

Authored by nikic on Mar 7 2020, 12:42 PM.

Details

Summary

If a call argument has the "returned" attribute, we can simplify the call to the value of that argument. The "-inst-simplify" pass already handled this for the constant integer argument case via known bits, which is invoked in SimplifyInstruction. However, non-constant (or non-int) arguments are not handled at all right now.

This addresses one of the regressions from D75801.

Diff Detail

Event Timeline

nikic created this revision.Mar 7 2020, 12:42 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 7 2020, 12:42 PM

I think this is fine assuming we have a must-tail test to verify we do not break things, e.g. like llvm/test/Transforms/Attributor/IPConstantProp/musttail-call.ll we copied from IPConstantProp.

nikic added a comment.Mar 8 2020, 1:10 AM

@jdoerfert Thanks for pointing that out! musttail handing is indeed broken. Also prior to this patch for the case of constant integer arguments.

I'm wondering what the right way to handle that is. It's safe to fold the musttail call if the call gets removed as well, but given the number of diverse consumers of instsimplify, I don't think we can guarantee that. As such, I think it's safest to just not simplify musttail calls at all (I assume people don't do things like marking intrinsics musttail?)

jdoerfert accepted this revision.Mar 8 2020, 10:33 AM

Must-tails was taken care of, this should be fine. LGTM.

This revision is now accepted and ready to land.Mar 8 2020, 10:33 AM
This revision was automatically updated to reflect the committed changes.

Hello, it seems that this patch is causing a compiler crash for us when building a 4.4 Linux kernel.
https://github.com/ClangBuiltLinux/linux/issues/930
It appears to cause a crash during inlining, though I just run opt -inline <module before inlining from -O2> I don't observe the crash. I assume there's some analysis pass I need to run first? opt -instsimplify -inline also didn't seem to work, though opt -O2 does. See link above for steps to reproduce and most concise reproducer I could come up with.

thakis added a subscriber: thakis.Mar 16 2020, 12:14 PM

I'm seeing a crash that bisected to this too: https://bugs.chromium.org/p/chromium/issues/detail?id=1062010#c4

I'll update that bug once creduce is done running, but I'll revert this for now to save others the work to bisect crashes caused by this.

I'm seeing a crash that bisected to this too: https://bugs.chromium.org/p/chromium/issues/detail?id=1062010#c4

I'll update that bug once creduce is done running, but I'll revert this for now to save others the work to bisect crashes caused by this.

@nickdesaulniers has a reproducer now: https://godbolt.org/z/TfTxjJ

I think the issue (somehow) is that the returned arg operand is a call again. Why that is a problem is unclear, some ordering in the inst-simplify worklist maybe.

I think we can revert this and fix it offline.

Apologies for the revert, but thank you @thakis for it. If you need help testing an updated V2, I'd be happy to help.

nikic reopened this revision.Mar 18 2020, 2:48 PM

Sorry for the delay here, and thanks for taking care of the revert and providing a test case!

Here's an IR test case under opt -inline -instcombine:

declare dso_local i64 @g()

; Function Attrs: norecurse nounwind readnone
define dso_local i64 @i(i64 returned %h) #0 {
entry:
  ret i64 %h
}

define dso_local i32 @j(i32 ()* nocapture %h) {
entry:
  %call = call i32 %h()
  ret i32 undef
}

define dso_local i32 @l() {
entry:
  %call = call i32 @j(i32 ()* @k)
  ret i32 undef
}

define internal i32 @k() {
entry:
  %call = call i64 @g()
  %call1 = call i64 @i(i64 returned %call)
  ret i32 undef
}

attributes #0 = { norecurse nounwind readnone }

I believe the problematic code is in [UpdateCallGraphAfterInlining](https://github.com/llvm/llvm-project/blob/acaf1442226ac1871a6df35fea032a2aef32567a/llvm/lib/Transforms/Utils/InlineFunction.cpp#L1179), which assumes that the instruction an inlined call maps to still refers to the same callee. After this change, this is no longer the case.

I don't see a great way to fix the call graph updating code. I think it might make more sense to move this simplification from InstSimplify into InstCombine to avoid the problem. This would also reduce the compile-time impact of this change, which I didn't really expect.

This revision is now accepted and ready to land.Mar 18 2020, 2:48 PM
nikic updated this revision to Diff 251486.Mar 19 2020, 3:02 PM
nikic retitled this revision from [InstSimplify] Simplify calls with "returned" attribute to [InstCombine] Simplify calls with "returned" attribute.

Perform the transform in InstCombine instead, to avoid issue with call graph updates during inlining.

Thanks for the modifications, I'm happy to give this a shot and report back, if you would kindly mind waiting to submit this on that report? I'll take care of this now.

nickdesaulniers accepted this revision.Mar 19 2020, 3:41 PM

From a testing perspective, LGTM. I was able to build+boot a 4.4 aarch64 Linux kernel built with clang with this patch applied! Thank you.

This revision was automatically updated to reflect the committed changes.
glider added a subscriber: glider.Mar 24 2020, 8:12 AM

Hi Nikita,

Unfortunately this patch broke the behavior of __attribute__((no_sanitize(...))) wrt inlining.
Previously, a function having such attribute could not be inlined into a function without it.
For example, for the following code:

__attribute__((no_sanitize("kernel-memory")))
static inline int KMSAN_INIT_INT(int value) {
 return value;
}

int bar(int value) {
  value = KMSAN_INIT_INT(value);
  return value;
}

Clang now inlines KMSAN_INIT_INT when compiling it with -fsanitize=kernel-memory.

Existing code annotations for sanitizers rely on this inlining to not happen.

nikic added a comment.Mar 24 2020, 2:39 PM

@glider Not quite sure what to do on that front. Inlining here is prevented due to an ABI mismatch, but lack of inlining does not imply that no other optimizations may take place. For example, SCCP will happily propagate a constant parameter across noinline function boundaries. I believe that the returned attribute is essentially in the same situation.

@jdoerfert Do you have any suggestions on how to handle this? It seems like something that can probably happen with the attributor as well, so maybe you already have a solution...

@glider Not quite sure what to do on that front. Inlining here is prevented due to an ABI mismatch,

Not sure I understand where the ABI mismatch is coming from.
Functions with e.g. sanitize_memory attribute are 100% ABI-compatible with functions without such an attribute.
The only distinction is that functions without that attribute will be instrumented differently by the MSan pass.
Since the pass has function-level granularity, not inlining the annotated function is the only possible way to make sure it is not instrumented.

but lack of inlining does not imply that no other optimizations may take place. For example, SCCP will happily propagate a constant parameter across noinline function boundaries. I believe that the returned attribute is essentially in the same situation.

So far we haven't seen cases in which this prevented the sanitizers from working normally. The example you mention doesn't seem to affect our use cases.

lebedev.ri added a comment.EditedMar 25 2020, 3:54 AM

I'm going to guess __attribute__((no_sanitize())) needs to also imply (read: implicitly add) noinline attribute,
but even that is not sufficient to block inter-procedural optimizations.
See also disscussion in https://reviews.llvm.org/D53431

All these semantic guarantees for sanitize_memory functions are rather underspecified in langref.
I'm not sure it is good idea to try to enforce them without codifying them first.

I'm going to guess __attribute__((no_sanitize())) needs to also imply (read: implicitly add) noinline attribute,
but even that is not sufficient to block inter-procedural optimizations.
See also disscussion in https://reviews.llvm.org/D53431

I don't think noinline works here:

__attribute__((__noinline__))
int KMSAN_INIT_INT(int value)
{
 return value;
}

int bar(int value) {
  value = KMSAN_INIT_INT(value);
  if (value < 0)
    value *= -1; 
  return value;
}

(see also https://godbolt.org/z/r5fKT2)

The resulting IR shows that KMSAN_INIT_INT has the noinline attribute, but it effectively ends up being inlined in this case:

; Function Attrs: noinline norecurse nounwind readnone uwtable
define dso_local i32 @KMSAN_INIT_INT(i32 returned %value) local_unnamed_addr #0 {
entry:
  ret i32 %value
}

; Function Attrs: norecurse nounwind readnone uwtable
define dso_local i32 @bar(i32 %value) local_unnamed_addr #1 {
entry:
  %cmp = icmp slt i32 %value, 0
  %mul = sub nsw i32 0, %value
  %spec.select = select i1 %cmp, i32 %mul, i32 %value
  ret i32 %spec.select
}

All these semantic guarantees for sanitize_memory functions are rather underspecified in langref.
I'm not sure it is good idea to try to enforce them without codifying them first.

You're right, we'll have to write them down. But in the case of noinline attribute langref is pretty clear on this subject: "This attribute indicates that the inliner should never inline this function in any situation."

Maybe we should add new GCC-like attribute “noipa” to disable these optimizations?

lebedev.ri added a comment.EditedMar 26 2020, 4:02 AM

I'm going to guess __attribute__((no_sanitize())) needs to also imply (read: implicitly add) noinline attribute,
but even that is not sufficient to block inter-procedural optimizations.
See also disscussion in https://reviews.llvm.org/D53431

I don't think noinline works here:
<...>

Yeah, like i said, noinline is insufficient here, and i don't actually
think we currently have the right tool to block any such transform:

See also disscussion in https://reviews.llvm.org/D53431

All these semantic guarantees for sanitize_memory functions are rather underspecified in langref.
I'm not sure it is good idea to try to enforce them without codifying them first.

You're right, we'll have to write them down. But in the case of noinline attribute langref is pretty clear on this subject: "This attribute indicates that the inliner should never inline this function in any situation."

To be noted, i'm not currently saying that the expected behavior is unreasonable,
i'm merely saying that the behavior was not actually guaranteed but accidental,
and likely partial.

Yeah, like i said, noinline is insufficient here, and i don't actually
think we currently have the right tool to block any such transform:

Do you think it makes sense to check for noinline before transforming the call into the argument, i.e.:

if (!Call.use_empty() && !Call.isMustTailCall()) {
  Function *Fn = dyn_cast<Function>(Callee);
  if (Fn && Fn->hasFnAttribute(Attribute::NoInline))
    if (Value *ReturnedArg = Call.getReturnedArgOperand())
      return replaceInstUsesWith(Call, ReturnedArg);
}

You're right, we'll have to write them down. But in the case of noinline attribute langref is pretty clear on this subject: "This attribute indicates that the inliner should never inline this function in any situation."

To be noted, i'm not currently saying that the expected behavior is unreasonable,
i'm merely saying that the behavior was not actually guaranteed but accidental,
and likely partial.

I agree. Let's concentrate on noinline then, as we can afford making our attributes imply noinline.

Yeah, like i said, noinline is insufficient here, and i don't actually
think we currently have the right tool to block any such transform:

Do you think it makes sense to check for noinline before transforming the call into the argument, i.e.:

if (!Call.use_empty() && !Call.isMustTailCall()) {
  Function *Fn = dyn_cast<Function>(Callee);
  if (Fn && Fn->hasFnAttribute(Attribute::NoInline))
    if (Value *ReturnedArg = Call.getReturnedArgOperand())
      return replaceInstUsesWith(Call, ReturnedArg);
}

I can only once again point at

The intent of noinline in LLVM's IR is to block inlining, not all interprocedural optimizations. So I don't think this is actually a bug and don't think this patch should be applied.

We have an attribute to block optimizations: optnone.

You're right, we'll have to write them down. But in the case of noinline attribute langref is pretty clear on this subject: "This attribute indicates that the inliner should never inline this function in any situation."

To be noted, i'm not currently saying that the expected behavior is unreasonable,
i'm merely saying that the behavior was not actually guaranteed but accidental,
and likely partial.

I agree. Let's concentrate on noinline then, as we can afford making our attributes imply noinline.

I'm guessing s/can/can't/ was meant.

Maybe we should add new GCC-like attribute “noipa” to disable these optimizations?

I would expect something to that effect would be needed
to actually provide the guarantees sanitizers are/were expecting.

I agree that the example code is relying on the attribute behavior that is underspecified. In this case, no_sanitize("memory") functions, when compiled with MSan, give special meaning to returning a value from a function. As a lot of other things that MSan does, this one is kind of best effort - consider if the function has passed the value by address:

int* KMSAN_DO_NOTHING(int *p) {
  return p;
}

This int will not get initialized by the no_sanitize attribute.

We could attempt to preserve this behavior by

  1. suppressing all IPA across sanitize/no_sanitize boundary, the same as we do with inlining. This could be hard to maintain.
  2. not inferring the "returned" attribute in no_sanitize(memory) functions. This makes sense, because the function does not simply return that argument.

Or fix this in the user code with optnone or explicit call to __msan_unpoison.

I would really prefer not to overload existing attributes but instead introduce a new one or reuse existing ones them with their "current meaning" (e.g., optnone). I'm also against "complex" attributes when they are comprised of orthogonal parts we can split off, e.g., something like nosanitize should not imply noipa. That said, I think having noipa seems reasonable.

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
4569

re: https://bugs.llvm.org/show_bug.cgi?id=46463

Do we need to check that the callee doesn't have the noinline function attribute?

lebedev.ri added inline comments.Jun 26 2020, 11:11 AM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
4569

I'm not sure i understand why presence of noinline should block this?
noipa on the other hand, sure.

jdoerfert added inline comments.Jun 26 2020, 11:32 AM
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
4569

I'm not sure i understand why presence of noinline should block this?
noipa on the other hand, sure.

+1 (a little more detail: https://reviews.llvm.org/D75815#1976892)

@nikic, @spatel do we also need a fix for register allocator? it seems like this optimization can lead to supoptimal codegen.

Check:
https://godbolt.org/z/jPT53e

@nikic, @spatel do we also need a fix for register allocator? it seems like this optimization can lead to supoptimal codegen.

Check:
https://godbolt.org/z/jPT53e

I don't know anything about that part of codegen, but yes it looks like a regression. Please file a bug, so we can track it.