This is an archive of the discontinued LLVM Phabricator instance.

[InlineFunction] update attributes during inlining
ClosedPublic

Authored by anna on Mar 13 2020, 9:19 AM.

Details

Summary

When we inline a callee into a callsite that has attributes on return, we need to add the attributes on calls that feed into the return value in the callee definition.

Currently, we do this for only for calls feeding into the return.

A following patch, will do this also for loads feeding into the return, by adding it as a metadata on the load (i.e. loaded value is nonnull).

The analysis is added to handle the simple cases.

Diff Detail

Event Timeline

anna created this revision.Mar 13 2020, 9:19 AM
reames requested changes to this revision.Mar 13 2020, 10:43 AM

This is not correct. You're modifying the callee function, not the copy of the callee inlined into the caller. The callee may continue to persist, and may have other callers for which this fact does not apply. You need to perform this transform on the inlined copy.

See AddAliasScopeMetadata for an example of the approach.

Also, I'd strongly prefer to see this phrased as merging all return attributes.

This revision now requires changes to proceed.Mar 13 2020, 10:43 AM
jdoerfert requested changes to this revision.Mar 13 2020, 12:33 PM
jdoerfert added a subscriber: jdoerfert.

In addition to @reames correctness concerns this is not valid for a different reason. You cannot backwards propagate information in the general case:

__attribute__((nonnull)) int *foo(int c) {
   int *A = unknown();
   int *B = unknown();
   do_sth_with_ptr_and_may_use_nonnull(A, B);
   return c ? A : B; // or if (C) return A; else return B;
}

Knowing foo returns a nonnull value cannot be used to annotate A or B.

I think this is generally problematic as it basically special cases a situation which we can handle in general just fine.
Let's assume we inline and keep the nonnull on the return values around, we run nonnull deduction on the caller and we get the nonnull where it is correct.
The benefit of doing it this way (D75825) is that it will also work with all other attributes as well without reinventing all the logic in the Attributor.

In addition to @reames correctness concerns this is not valid for a different reason. You cannot backwards propagate information in the general case:

__attribute__((nonnull)) int *foo(int c) {
   int *A = unknown();
   int *B = unknown();
   do_sth_with_ptr_and_may_use_nonnull(A, B);
   return c ? A : B; // or if (C) return A; else return B;
}

Knowing foo returns a nonnull value cannot be used to annotate A or B.

Good catch, I'd missed that.

I think this is generally problematic as it basically special cases a situation which we can handle in general just fine.
Let's assume we inline and keep the nonnull on the return values around, we run nonnull deduction on the caller and we get the nonnull where it is correct.
The benefit of doing it this way (D75825) is that it will also work with all other attributes as well without reinventing all the logic in the Attributor.

I disagree, mostly in the framing of this as an either/or. If we can cheaply match IR patterns during inlining and use attributes instead of assumes, we should. The other option requires IR churn for minimal value since we'll fold the inserted assumes back into attributes in the end anyway. We clearly should use assumes for the general case, but that doesn't mean we shouldn't specialize the common case.

anna updated this revision to Diff 250292.Mar 13 2020, 1:36 PM

addressed review comments. This handles all attributes on the call and only done after the cloning of the inlined body.
Tests added for attributes and making sure the original callee does not have the attributes added within its body.

anna added a comment.Mar 13 2020, 1:49 PM

In addition to @reames correctness concerns this is not valid for a different reason. You cannot backwards propagate information in the general case:

__attribute__((nonnull)) int *foo(int c) {
   int *A = unknown();
   int *B = unknown();
   do_sth_with_ptr_and_may_use_nonnull(A, B);
   return c ? A : B; // or if (C) return A; else return B;
}

Knowing foo returns a nonnull value cannot be used to annotate A or B.

Good catch, I'd missed that.

Just noticed this. I think we can special case this to: the return and def call are in the same basic block (it will take care of the if-else in the example) and has only one use which is the return. The latter takes care of avoiding incorrect semantics in that call do_sth_with_ptr_and_may_use_nonnull or any other use that depends on knowing which of the ptr is non-null.

I think this is generally problematic as it basically special cases a situation which we can handle in general just fine.
Let's assume we inline and keep the nonnull on the return values around, we run nonnull deduction on the caller and we get the nonnull where it is correct.
The benefit of doing it this way (D75825) is that it will also work with all other attributes as well without reinventing all the logic in the Attributor.

I disagree, mostly in the framing of this as an either/or. If we can cheaply match IR patterns during inlining and use attributes instead of assumes, we should. The other option requires IR churn for minimal value since we'll fold the inserted assumes back into attributes in the end anyway. We clearly should use assumes for the general case, but that doesn't mean we shouldn't specialize the common case.

The above restrictions (which makes it conservative, but we have cases like that) should allow us cleanly place all the return attributes on the calls within the body, I think.

anna retitled this revision from [InlineFunction] update nonnnull attribute during inlining to [InlineFunction] update attributes during inlining.Mar 13 2020, 1:56 PM
anna edited the summary of this revision. (Show Details)

In addition to @reames correctness concerns this is not valid for a different reason. You cannot backwards propagate information in the general case:

__attribute__((nonnull)) int *foo(int c) {
   int *A = unknown();
   int *B = unknown();
   do_sth_with_ptr_and_may_use_nonnull(A, B);
   return c ? A : B; // or if (C) return A; else return B;
}

Knowing foo returns a nonnull value cannot be used to annotate A or B.

Good catch, I'd missed that.

Just noticed this. I think we can special case this to: the return and def call are in the same basic block (it will take care of the if-else in the example) and has only one use which is the return. The latter takes care of avoiding incorrect semantics in that call do_sth_with_ptr_and_may_use_nonnull or any other use that depends on knowing which of the ptr is non-null.

None of this is sufficient. We are repeating a lot of deduction logic here now...

Take:

__attribute__((nonnull)) int *foo() {
    int *Base = unknown();
    do_sth_with_ptr_and_may_use_nonnull(Base);
    int *A = return_arg(/* returned */ Base);
    exit();
    return A;
}

Return and call are in the same block, only a single use of A exists, however, backwards propagating nonnull to A is wrong and will be problematic if it is used to optimize Base.

anna added a comment.Mar 13 2020, 2:33 PM

hmm. yes, I can see this getting complicated for design and correctness. Thanks for the example @jdoerfert.

Basically, the usecase I'm interested in is the following:

callee(i8* %arg) {
  %r = call i8* @foo(i8* %arg)
   ret i8* %r
}

And a follow-on example:

callee(i8** %arg) {
  %r = load i8*, i8** %arg <-- here we can add the !nonnull metadata if callsite for callee has nonnull return attr
   ret i8* %r
}

The caller is:

caller {
  ...
  call nonnull i8* @callee(i8** %arg) 
}

AFAICT, this should be fine because the only operations in callee context is the load and return. We are not backward propagating something incorrect into the callee context. W.r.t the caller context, what was true before inlining, remains true after inlining as well.

hmm. yes, I can see this getting complicated for design and correctness. Thanks for the example @jdoerfert.

These things are unfortunately never as easy as we want them to be, believe me I learned the hard way ;)

Basically, the usecase I'm interested in is the following:

callee(i8* %arg) {
  %r = call i8* @foo(i8* %arg)
   ret i8* %r
}

And a follow-on example:

callee(i8** %arg) {
  %r = load i8*, i8** %arg <-- here we can add the !nonnull metadata if callsite for callee has nonnull return attr
   ret i8* %r
}

The caller is:

caller {
  ...
  call nonnull i8* @callee(i8** %arg) 
}

AFAICT, this should be fine because the only operations in callee context is the load and return. We are not backward propagating something incorrect into the callee context. W.r.t the caller context, what was true before inlining, remains true after inlining as well.

  1. If callee is internal and all call site are non-null we can easily make sure the Attributor catches this (if it does not already).
  2. If callee is non-internal or not all call site are non-null we could introduce an internal copy of callee for all "good" call sites (the Attributor issue on this is here: https://github.com/llvm/llvm-project/issues/172)
  3. If your use case is 2) but very restricted, you can walk the successors of the call/load and make sure they are all "OK", e.g., use MustBeExecutedContextExplorer::findInContextOf.

Anna, I'd encourage you to go very narrow here. We can resolve the correlated throw case with the following: require operand of return to be call instruction which is less than small constant window of non-trapping instructions before the return. (i.e. start with previous node) If we allow bitcasts, that provides reasonable coverage. We can always fallback to assumes as noted.

We don't need to be hugely general analysis wise to be very useful. Calls in tail positions or loads in analogous are very common. We should handle that obvious case.

anna added a comment.Mar 13 2020, 6:01 PM

Anna, I'd encourage you to go very narrow here. We can resolve the correlated throw case with the following: require operand of return to be call instruction which is less than small constant window of non-trapping instructions before the return. (i.e. start with previous node) If we allow bitcasts, that provides reasonable coverage. We can always fallback to assumes as noted.

We don't need to be hugely general analysis wise to be very useful. Calls in tail positions or loads in analogous are very common. We should handle that obvious case.

Philip, I think we need to be even more conservative. I don't see how this will handle the parameter of the call instruction being incorrectly optimized away (second example with "returned" attribute on the parameter). IIUC, these are the two restrictions we need:

  1. no throwing instructions between the call (i.e. the operand of the return) and the returnInstruction - possible restrict to small constant window like you described.
  2. the arguments for the call should feed directly from the arguments in the callee (we can have bitcasts/geps here - i.e. use stripAndAccumulateConstantOffsets). This avoids possible incorrect propagation to the argument of the call
anna added a comment.Mar 16 2020, 12:39 PM

talked with Philip offline. We just need the check for non-trapping instructions between the return value (i.e. the call) and the return instruction. In other cases, it is either correct to propagate the nonnull-ness or it showcased a UB that already existed in the program.

anna planned changes to this revision.Mar 16 2020, 1:20 PM

will update patch and place for review.

anna updated this revision to Diff 250900.Mar 17 2020, 2:19 PM

addressed review comments. The analysis is conservative and test cases added show the negative cases where we should not backward propagate the attribute to the call (i.e. return value operand).

anna added a comment.Mar 17 2020, 4:17 PM

the test failures are related to attributes now being added to the calls in clang tests. All of these tests are using llvm intrinsics. Should we just disable this for intrinsic calls for now? Not sure how to update clang tests in the same patch as llvm project. Should be doable though.

the test failures are related to attributes now being added to the calls in clang tests. All of these tests are using llvm intrinsics. Should we just disable this for intrinsic calls for now? Not sure how to update clang tests in the same patch as llvm project. Should be doable though.

Er, git is monorepo. Those tests should be checked out alongside your LLVM copy. You just need to build clang. You can and should simply update the tests in the same patch.

anna added a comment.Mar 19 2020, 6:06 AM

the test failures are related to attributes now being added to the calls in clang tests. All of these tests are using llvm intrinsics. Should we just disable this for intrinsic calls for now? Not sure how to update clang tests in the same patch as llvm project. Should be doable though.

Er, git is monorepo. Those tests should be checked out alongside your LLVM copy. You just need to build clang. You can and should simply update the tests in the same patch.

ah yes, I'd build llvm within it's own subdir without enabling any other projects. Build worked. thanks.

anna updated this revision to Diff 251447.Mar 19 2020, 12:36 PM

fixed clang tests. rot-intrinsics.c testcase has 5 different RUNs with 3 prefixes. Depending on target-triple, the attribute is added to the caller, so I've disabled the optimization for that specific test with -update-return-attrs=false

Herald added a project: Restricted Project. · View Herald TranscriptMar 19 2020, 12:36 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
jdoerfert added inline comments.Mar 19 2020, 3:00 PM
llvm/lib/Transforms/Utils/InlineFunction.cpp
1171

mayThrow is not sufficient. As with my earlier example, a potential exit is sufficient to break this, thus you need willreturn as well.

anna marked an inline comment as done.Mar 20 2020, 4:50 AM
anna added inline comments.
llvm/lib/Transforms/Utils/InlineFunction.cpp
1171

What we need is just isGuaranteedToTransferExecutionToSuccessor. That handles mayThrow, exits/pthread_exit and willreturn.

Just to note, an unconditional exit in the callee itself is not an issue here. The problem is something like this:

;nothrow
foo(i8* %arg) {
if (%arg == null)
  exit;
ret %arg
}
callee() {
  %r = call i8* @bar
  %v = call i8* @foo(i8* %r)
   ret i8* %r
}
caller() {
  call nonnull i8* @callee
}

Here propagating nonnull to callsite bar is incorrect since if %r is null, the program exits.

anna updated this revision to Diff 251615.Mar 20 2020, 4:57 AM

use isGuaranteedToTransferExecutionToSuccessor instead of MayThrow

jdoerfert accepted this revision.Mar 21 2020, 12:30 PM

I'm unsure about the zeroext and signext on the call sites now but other than that I think this is good. wait for @reames OK though.

anna marked an inline comment as done.Mar 22 2020, 9:12 AM
anna added inline comments.
llvm/lib/Transforms/Utils/InlineFunction.cpp
1171

Noticed while adding couple more tests, there are 2 bugs here:

  1. the isGuaranteedToTransferExecutionToSuccessor check should be inverted
  2. make_range should be until the return instruction - so we do not want std::prev on the returnInstruction. what's needed is: make_range(RVal->getIterator(), RInst->getIterator())

This means that from the callsite until (and excluding) the return instruction should be guaranteed to transfer execution to successor - only then we can backward propagate the attribute to that callsite.

lebedev.ri added inline comments.
llvm/lib/Transforms/Utils/InlineFunction.cpp
1171

Are you aware of llvm::isValidAssumeForContext()?
All this (including pitfalls) sound awfully close to that function.

anna updated this revision to Diff 251901.Mar 22 2020, 11:07 AM

Noticed while adding couple more tests, there were 2 bugs:

1 the isGuaranteedToTransferExecutionToSuccessor check should be inverted

  1. make_range should be until the return instruction - so we do not want std::prev on the returnInstruction. what's needed is: make_range(RVal->getIterator(), RInst->getIterator())

This means that from the callsite until (and excluding) the return instruction should be guaranteed to transfer execution to successor - only then we can backward propagate the attribute to that callsite.
Updated patch and added test cases.

anna marked an inline comment as done.Mar 22 2020, 11:13 AM
anna added inline comments.
llvm/lib/Transforms/Utils/InlineFunction.cpp
1171

as stated in a previous comment (https://reviews.llvm.org/D76140#1922292), adding Assumes here for simple cases seems like an overkill. It has significant IR churn and it also adds a use for something which can be easily inferred.
Consider optimizations that depend on facts such as hasOneUse or a limited number of uses. We will now be inhibiting those optimizations.

lebedev.ri added inline comments.Mar 22 2020, 11:26 AM
llvm/lib/Transforms/Utils/InlineFunction.cpp
1171

While i venomously disagree with the avoidance of the usage of one versatile interface
and hope things will change once there's more progress on attributor & assume bundles,
in this case, as it can be seen even from the signature of the isValidAssumeForContext() function,
it implies/forces nothing about using assumes, but only performs a validity checking,
similar to the MayContainThrowingOrExitingCall()

https://github.com/llvm/llvm-project/blob/ca04d0c8fd269978be1c13fe1241172cdfe6a6ea/llvm/lib/Analysis/ValueTracking.cpp#L603

That being said, i haven't reviewed this code so maybe there's some differences here
that make that function unapplicable here.

anna marked an inline comment as done.Mar 23 2020, 6:37 AM
anna added inline comments.
llvm/lib/Transforms/Utils/InlineFunction.cpp
1171

isValidAssumeForContext(Inv, CxtI, DT) does not force anything about assumes, but AFAICT all code which uses this function either has some sort of guard in the caller that the instruction is an assume. Also, the comments in the code state that it is for an assume. In fact, I believe if we intend to use that function more widely for other purposes, we should rename the function before using it (just a thought), and currently we should assert that Inv is an assume. It captures the intent of the function.

That being said, I checked the code in isValidAssumeForContext and it does not fit the bill here for multiple reasons. We either do:

  1. isValidAssumeForContext(RVal /* Inv */, RInst /* CxtI */) which fails when we do not have DT and just return true when RVal comes before RInst - this is always the case, since RVal will come before RInst.
  2. isValidAssumeForContext(RInst /* Inv*/, RVal /* CxtI*/) and it fails at the !isEphemeralValueOf(Inv /* RI */, CxtI /* RV*/) check.

(By fail here, I mean, it does not have the same behaviour as MayContainThrowingOrExitingCall).

anna updated this revision to Diff 252868.Mar 26 2020, 9:02 AM

NFC w.r.t prev diff. Use VMap.lookup instead of a lambda which does the same.

reames accepted this revision.Mar 30 2020, 11:18 AM

LGTM, but with two specific required follow ups. If you're not comfortable committing to both, please don't land this one.

llvm/lib/Transforms/Utils/InlineFunction.cpp
92

I'd suggest a name change here. Maybe: "inliner-attribute-window"?

1158

Pull this out as a static helper instead of a lambda, add an assert internally that the two instructions are in the same block.

Why? Because I'm 80% sure the state capture on the lambda isn't needed, and having it as a separate function forces that discipline.

1174

Ok, after staring at it a bit, I've convinced myself the code here is correct, just needlessly conservative.

What you're doing is:
If the callees return instruction and returned call both map to the same instructions once inlined, determine whether there's a possible exit between the inlined copy.

What you could be doing:
If the callee returns a call, check if *in the callee* there's a possible exit between call and return, then apply attribute to cloned call.

The key difference is when the caller directly returns the result vs uses it locally. The result here is that your transform is much more narrow in applicability than it first appears.

llvm/test/Transforms/Inline/ret_attr_update.ll
113

There's a critical missing test case here:

  • Callee and caller have the same attributes w/different values (i.e. deref)

And thinking through the code, I think there might be a bug here. It's not a serious one, but the if the callee specifies a larger deref than the caller, it looks like the the smaller value is being written over the larger.

Actually, digging through the attribute code, I think I'm wrong about the bug. However, you should definitely write the test to confirm and document merging behaviour!

If it does turn out I'm correct, I'm fine with this being addressed in a follow up patch provided that the test is added in this one and isn't a functional issue.

This revision is now accepted and ready to land.Mar 30 2020, 11:18 AM
anna marked 3 inline comments as done.Mar 30 2020, 11:34 AM
anna added inline comments.
llvm/lib/Transforms/Utils/InlineFunction.cpp
1158

agreed. I'll do that in this change itself before landing. I am using this static helper in followon change D76792.

1174

yes, thanks for pointing it out. I realized it after our offline discussion :)
For now, I will add a FIXME testcase which showcases the difference in code and handle that testcase in a followon change.

llvm/test/Transforms/Inline/ret_attr_update.ll
113

will check this.

anna updated this revision to Diff 253704.Mar 30 2020, 2:38 PM

addressed review comments. Added two test cases: deref value being different, inlined callee body better optimized compared to callee.

anna marked 3 inline comments as done.Mar 30 2020, 2:49 PM
anna added inline comments.
llvm/lib/Transforms/Utils/InlineFunction.cpp
1174

The key difference is when the caller directly returns the result vs uses it locally. The result here is that your transform is much more narrow in applicability than it first appears.

I tried multiple test cases to showcase the difference between the two ideas above but failed. Specifically, simplifyInstruction used during inlining the callee is not too great at optimizing the body. For example, see added testcase test7.

I also tried the less restrictive version (check the safety of the optimization in the callee itself, and do the attribute update on the cloned instruction), but didn't see any testcases in clang that needed update. Of course, that doesn't mean anything :)

llvm/test/Transforms/Inline/ret_attr_update.ll
113

added test case and documented merge behaviour. No bug in code, since we use the already existing value on attribute.

This revision was automatically updated to reflect the committed changes.
anna added a comment.Mar 31 2020, 12:40 PM

I got a failure in one of the binaryFormats:
lib/BinaryFormat/CMakeFiles/LLVMBinaryFormat.dir/MsgPackReader.cpp

Attributes 'zeroext and signext' are incompatible!
  %rev.i.i.i.i.i.i.i.i = tail call signext zeroext i16 @llvm.bswap.i16(i16 %ret.0.copyload.i.i.i.i) #6
in function _ZN4llvm7msgpack6Reader7readIntIsEENS_8ExpectedIbEERNS0_6ObjectE
fatal error: error in backend: Broken function found, compilation aborted!

I believe this just showcases undefined behaviour since we were having a returnValue (i.e. call) with an incomptable attribute compared to the return attribute on the callsite.

anna reopened this revision.Mar 31 2020, 1:58 PM

I got a failure in one of the binaryFormats:
lib/BinaryFormat/CMakeFiles/LLVMBinaryFormat.dir/MsgPackReader.cpp

Attributes 'zeroext and signext' are incompatible!
  %rev.i.i.i.i.i.i.i.i = tail call signext zeroext i16 @llvm.bswap.i16(i16 %ret.0.copyload.i.i.i.i) #6
in function _ZN4llvm7msgpack6Reader7readIntIsEENS_8ExpectedIbEERNS0_6ObjectE
fatal error: error in backend: Broken function found, compilation aborted!

I believe this just showcases undefined behaviour since we were having a returnValue (i.e. call) with an incomptable attribute compared to the return attribute on the callsite.

The last statement is not true. Had a discussion offline with Philip and he pointed out that we missed the fact that attributes such as signext and zeroext are part of the *call* itself. We cannot propagate these attributes into the callee since such attributes are part of the ABI for the call it is attached to.
I'm reopening this review to fix this issue.

This revision is now accepted and ready to land.Mar 31 2020, 1:58 PM
anna planned changes to this revision.Mar 31 2020, 1:59 PM

see above comment.

anna marked an inline comment as done.Mar 31 2020, 2:08 PM
anna added inline comments.
llvm/lib/Transforms/Utils/InlineFunction.cpp
1174

Clarified this with Philip offline. The current patch is not restrictive. In fact, now that I think of it, sometimes, it may be better - simplifyInstruction can fold away instructions and reduce the "window size" between the RV and the ReturnInst.

anna updated this revision to Diff 254213.Apr 1 2020, 8:23 AM

whitelist valid return attributes and only add those. Added testcase for signext.

This revision is now accepted and ready to land.Apr 1 2020, 8:23 AM
anna requested review of this revision.Apr 1 2020, 8:25 AM

fixed buildbot failure. see above comments and added testcase test8.

anna updated this revision to Diff 254222.Apr 1 2020, 9:00 AM

fixed missing code left out during rebase.

reames accepted this revision.Apr 2 2020, 9:03 AM

LGTM again, with minor change.

p.s. Sorry for missing the functional issue the first time. All of the test changes should have made the issue obvious, but despite reading the LangRef description of signext, I somehow managed to miss the separation between ABI and optimization attributes.

llvm/lib/Transforms/Utils/InlineFunction.cpp
1176

I'm not sure that pulling out the helper for two cases actually helps readability. Can you drop this and just do the two cases directly please?

This revision is now accepted and ready to land.Apr 2 2020, 9:03 AM
anna added a comment.Apr 2 2020, 9:48 AM

LGTM again, with minor change.

will update it.

p.s. Sorry for missing the functional issue the first time. All of the test changes should have made the issue obvious, but despite reading the LangRef description of signext, I somehow managed to miss the separation between ABI and optimization attributes.

thanks for the review Philip and pointing out the problem. All of us had missed the functional issue the first time around.

This revision was automatically updated to reflect the committed changes.