This is an archive of the discontinued LLVM Phabricator instance.

Differential D76140

[InlineFunction] update attributes during inlining
ClosedPublic

Authored by anna on Mar 13 2020, 9:19 AM.

Details

Reviewers

reames
hfinkel
apilipenko
jdoerfert
aartbik

Commits

rGbf7a16a76871: [InlineFunction] Update valid return attributes at callsite within callee body
rG28518d9ae39f: [InlineFunction] Handle return attributes on call within inlined body

Summary

When we inline a callee at a call site that has return attributes, we need to add those attributes to the calls in the callee body that feed the return value.

Currently, we do this only for calls feeding into the return.

A follow-up patch will also do this for loads feeding into the return, by adding metadata on the load (i.e. the loaded value is nonnull).

The analysis added here handles only the simple cases.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

anna created this revision. Mar 13 2020, 9:19 AM

Herald added a project: Restricted Project. Mar 13 2020, 9:19 AM

Herald added subscribers: llvm-commits, dantrushin, hiraditya.

Harbormaster failed remote builds in B49152: Diff 250231! Mar 13 2020, 10:12 AM

This is not correct. You're modifying the callee function, not the copy of the callee inlined into the caller. The callee may continue to persist, and may have other callers for which this fact does not apply. You need to perform this transform on the inlined copy.

See AddAliasScopeMetadata for an example of the approach.

Also, I'd strongly prefer to see this phrased as merging all return attributes.
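To make the correctness concern concrete, here is a minimal toy sketch (not LLVM code). `ToyCall`, `ToyBody`, and `inlineWithRetAttrs` are hypothetical names invented for this illustration; the point is only that attributes are attached to the cloned body while the original callee stays untouched for its other callers.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for an IR call instruction: a callee name plus the set of
// return attributes attached to that call site. Not LLVM's CallBase.
struct ToyCall {
  std::string Callee;
  std::set<std::string> RetAttrs;
};

// Toy stand-in for a function body: just the calls it contains.
using ToyBody = std::vector<ToyCall>;

// Hypothetical helper: "inline" by copying the callee body, then attach the
// call-site return attributes to the call at index RetIdx in the COPY only.
ToyBody inlineWithRetAttrs(const ToyBody &CalleeBody, std::size_t RetIdx,
                           const std::set<std::string> &CallSiteRetAttrs) {
  assert(RetIdx < CalleeBody.size() && "returned call must be in the body");
  ToyBody Cloned = CalleeBody; // the original callee is never mutated
  Cloned[RetIdx].RetAttrs.insert(CallSiteRetAttrs.begin(),
                                 CallSiteRetAttrs.end());
  return Cloned;
}
```

Taking the callee body by const reference is what enforces the discipline here: the helper cannot modify the original even by accident.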

This revision now requires changes to proceed. Mar 13 2020, 10:43 AM

In addition to @reames correctness concerns this is not valid for a different reason. You cannot backwards propagate information in the general case:

__attribute__((nonnull)) int *foo(int c) {
   int *A = unknown();
   int *B = unknown();
   do_sth_with_ptr_and_may_use_nonnull(A, B);
   return c ? A : B; // or if (C) return A; else return B;
}

Knowing foo returns a nonnull value cannot be used to annotate A or B.

I think this is generally problematic as it basically special cases a situation which we can handle in general just fine.
Let's assume we inline and keep the nonnull on the return values around, we run nonnull deduction on the caller and we get the nonnull where it is correct.
The benefit of doing it this way (D75825) is that it will also work with all other attributes as well without reinventing all the logic in the Attributor.
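The `c ? A : B` objection can be checked with a small standalone sketch. The helpers `unknownA`, `unknownB`, `pick`, and `backwardPropagationWouldBeWrong` are made up for this illustration, and the assumption that B happens to be null is chosen only to exhibit the failure: the returned value is non-null, yet one of the inputs is not.

```cpp
#include <cassert>

// Two pointers of unknown provenance; in this sketch B happens to be null.
static int Storage = 7;
int *unknownA() { return &Storage; }
int *unknownB() { return nullptr; }

// Mirrors the shape of foo() in the review: the result is one of A or B.
int *pick(int c, int *A, int *B) { return c ? A : B; }

// Returns true when the fact "result is non-null" holds but the (invalid)
// backward-propagated fact "A and B are both non-null" does not.
bool backwardPropagationWouldBeWrong(int c) {
  int *A = unknownA();
  int *B = unknownB();
  int *R = pick(c, A, B);
  bool ResultNonNull = (R != nullptr);
  bool BothInputsNonNull = (A != nullptr) && (B != nullptr);
  return ResultNonNull && !BothInputsNonNull;
}
```

With a non-zero `c` the result is non-null while B is still null, so annotating both inputs from the return attribute would state something false.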

In D76140#1922162, @jdoerfert wrote:
In addition to @reames correctness concerns this is not valid for a different reason. You cannot backwards propagate information in the general case:
__attribute__((nonnull)) int *foo(int c) {
   int *A = unknown();
   int *B = unknown();
   do_sth_with_ptr_and_may_use_nonnull(A, B);
   return c ? A : B; // or if (C) return A; else return B;
}
Knowing foo returns a nonnull value cannot be used to annotate A or B.

Good catch, I'd missed that.

I think this is generally problematic as it basically special cases a situation which we can handle in general just fine.
Let's assume we inline and keep the nonnull on the return values around, we run nonnull deduction on the caller and we get the nonnull where it is correct.
The benefit of doing it this way (D75825) is that it will also work with all other attributes as well without reinventing all the logic in the Attributor.

I disagree, mostly in the framing of this as an either/or. If we can cheaply match IR patterns during inlining and use attributes instead of assumes, we should. The other option requires IR churn for minimal value since we'll fold the inserted assumes back into attributes in the end anyway. We clearly should use assumes for the general case, but that doesn't mean we shouldn't specialize the common case.

Addressed review comments. This now handles all attributes on the call, and is only done after the inlined body has been cloned.
Tests added for the attributes, and to make sure the original callee does not get the attributes added within its body.

In D76140#1922292, @reames wrote:
In D76140#1922162, @jdoerfert wrote:
In addition to @reames correctness concerns this is not valid for a different reason. You cannot backwards propagate information in the general case:
__attribute__((nonnull)) int *foo(int c) {
   int *A = unknown();
   int *B = unknown();
   do_sth_with_ptr_and_may_use_nonnull(A, B);
   return c ? A : B; // or if (C) return A; else return B;
}
Knowing foo returns a nonnull value cannot be used to annotate A or B.
Good catch, I'd missed that.

Just noticed this. I think we can special case this to: the return and def call are in the same basic block (it will take care of the if-else in the example) and has only one use which is the return. The latter takes care of avoiding incorrect semantics in that call do_sth_with_ptr_and_may_use_nonnull or any other use that depends on knowing which of the ptr is non-null.

I think this is generally problematic as it basically special cases a situation which we can handle in general just fine.
Let's assume we inline and keep the nonnull on the return values around, we run nonnull deduction on the caller and we get the nonnull where it is correct.
The benefit of doing it this way (D75825) is that it will also work with all other attributes as well without reinventing all the logic in the Attributor.

I disagree, mostly in the framing of this as an either/or. If we can cheaply match IR patterns during inlining and use attributes instead of assumes, we should. The other option requires IR churn for minimal value since we'll fold the inserted assumes back into attributes in the end anyway. We clearly should use assumes for the general case, but that doesn't mean we shouldn't specialize the common case.

The above restrictions (which makes it conservative, but we have cases like that) should allow us cleanly place all the return attributes on the calls within the body, I think.

anna retitled this revision from [InlineFunction] update nonnnull attribute during inlining to [InlineFunction] update attributes during inlining. Mar 13 2020, 1:56 PM

anna edited the summary of this revision.

In D76140#1922328, @anna wrote:
In D76140#1922292, @reames wrote:
In D76140#1922162, @jdoerfert wrote:
In addition to @reames correctness concerns this is not valid for a different reason. You cannot backwards propagate information in the general case:
__attribute__((nonnull)) int *foo(int c) {
   int *A = unknown();
   int *B = unknown();
   do_sth_with_ptr_and_may_use_nonnull(A, B);
   return c ? A : B; // or if (C) return A; else return B;
}
Knowing foo returns a nonnull value cannot be used to annotate A or B.
Good catch, I'd missed that.
Just noticed this. I think we can special case this to: the return and def call are in the same basic block (it will take care of the if-else in the example) and has only one use which is the return. The latter takes care of avoiding incorrect semantics in that call do_sth_with_ptr_and_may_use_nonnull or any other use that depends on knowing which of the ptr is non-null.

None of this is sufficient. We are repeating a lot of deduction logic here now...

Take:

__attribute__((nonnull)) int *foo() {
    int *Base = unknown();
    do_sth_with_ptr_and_may_use_nonnull(Base);
    int *A = return_arg(/* returned */ Base);
    exit();
    return A;
}

Return and call are in the same block, only a single use of A exists, however, backwards propagating nonnull to A is wrong and will be problematic if it is used to optimize Base.

hmm. yes, I can see this getting complicated for design and correctness. Thanks for the example @jdoerfert.

Basically, the usecase I'm interested in is the following:

callee(i8* %arg) {
  %r = call i8* @foo(i8* %arg)
   ret i8* %r
}

And a follow-on example:

callee(i8** %arg) {
  %r = load i8*, i8** %arg <-- here we can add the !nonnull metadata if callsite for callee has nonnull return attr
   ret i8* %r
}

The caller is:

caller {
  ...
  call nonnull i8* @callee(i8** %arg) 
}

AFAICT, this should be fine because the only operations in callee context is the load and return. We are not backward propagating something incorrect into the callee context. W.r.t the caller context, what was true before inlining, remains true after inlining as well.

Harbormaster failed remote builds in B49181: Diff 250292! Mar 13 2020, 2:34 PM

In D76140#1922417, @anna wrote:

hmm. yes, I can see this getting complicated for design and correctness. Thanks for the example @jdoerfert.

These things are unfortunately never as easy as we want them to be, believe me I learned the hard way ;)

Basically, the usecase I'm interested in is the following:
callee(i8* %arg) {
  %r = call i8* @foo(i8* %arg)
   ret i8* %r
}
And a follow-on example:
callee(i8** %arg) {
  %r = load i8*, i8** %arg <-- here we can add the !nonnull metadata if callsite for callee has nonnull return attr
   ret i8* %r
}
The caller is:
caller {
  ...
  call nonnull i8* @callee(i8** %arg) 
}
AFAICT, this should be fine because the only operations in callee context is the load and return. We are not backward propagating something incorrect into the callee context. W.r.t the caller context, what was true before inlining, remains true after inlining as well.

1) If callee is internal and all call sites are non-null we can easily make sure the Attributor catches this (if it does not already).
2) If callee is non-internal or not all call sites are non-null we could introduce an internal copy of callee for all "good" call sites (the Attributor issue on this is here: https://github.com/llvm/llvm-project/issues/172)
3) If your use case is 2) but very restricted, you can walk the successors of the call/load and make sure they are all "OK", e.g., use MustBeExecutedContextExplorer::findInContextOf.

Anna, I'd encourage you to go very narrow here. We can resolve the correlated throw case with the following: require operand of return to be call instruction which is less than small constant window of non-trapping instructions before the return. (i.e. start with previous node) If we allow bitcasts, that provides reasonable coverage. We can always fallback to assumes as noted.

We don't need to be hugely general analysis wise to be very useful. Calls in tail positions or loads in analogous positions are very common. We should handle that obvious case.

In D76140#1922555, @reames wrote:

Anna, I'd encourage you to go very narrow here. We can resolve the correlated throw case with the following: require operand of return to be call instruction which is less than small constant window of non-trapping instructions before the return. (i.e. start with previous node) If we allow bitcasts, that provides reasonable coverage. We can always fallback to assumes as noted.

We don't need to be hugely general analysis wise to be very useful. Calls in tail positions or loads in analogous positions are very common. We should handle that obvious case.

Philip, I think we need to be even more conservative. I don't see how this will handle the parameter of the call instruction being incorrectly optimized away (second example with "returned" attribute on the parameter). IIUC, these are the two restrictions we need:

1. no throwing instructions between the call (i.e. the operand of the return) and the return instruction - possibly restricted to a small constant window like you described.
2. the arguments for the call should feed directly from the arguments in the callee (we can have bitcasts/geps here - i.e. use stripAndAccumulateConstantOffsets). This avoids possible incorrect propagation to the argument of the call.

talked with Philip offline. We just need the check for non-trapping instructions between the return value (i.e. the call) and the return instruction. In other cases, it is either correct to propagate the nonnull-ness or it showcased a UB that already existed in the program.

will update patch and place for review.

addressed review comments. The analysis is conservative and test cases added show the negative cases where we should not backward propagate the attribute to the call (i.e. return value operand).

Harbormaster failed remote builds in B49495: Diff 250900! Mar 17 2020, 3:08 PM

the test failures are related to attributes now being added to the calls in clang tests. All of these tests are using llvm intrinsics. Should we just disable this for intrinsic calls for now? Not sure how to update clang tests in the same patch as llvm project. Should be doable though.

In D76140#1927909, @anna wrote:

the test failures are related to attributes now being added to the calls in clang tests. All of these tests are using llvm intrinsics. Should we just disable this for intrinsic calls for now? Not sure how to update clang tests in the same patch as llvm project. Should be doable though.

Er, git is monorepo. Those tests should be checked out alongside your LLVM copy. You just need to build clang. You can and should simply update the tests in the same patch.

In D76140#1930078, @reames wrote:

In D76140#1927909, @anna wrote:

the test failures are related to attributes now being added to the calls in clang tests. All of these tests are using llvm intrinsics. Should we just disable this for intrinsic calls for now? Not sure how to update clang tests in the same patch as llvm project. Should be doable though.

Er, git is monorepo. Those tests should be checked out alongside your LLVM copy. You just need to build clang. You can and should simply update the tests in the same patch.

ah yes, I'd built llvm within its own subdir without enabling any other projects. Build worked. thanks.

fixed clang tests. rot-intrinsics.c testcase has 5 different RUNs with 3 prefixes. Depending on target-triple, the attribute is added to the caller, so I've disabled the optimization for that specific test with -update-return-attrs=false

Herald added a project: Restricted Project. Mar 19 2020, 12:36 PM

Herald added a subscriber: cfe-commits.

Harbormaster failed remote builds in B49781: Diff 251447! Mar 19 2020, 1:09 PM

jdoerfert added inline comments. Mar 19 2020, 3:00 PM

llvm/lib/Transforms/Utils/InlineFunction.cpp
1172	`mayThrow` is not sufficient. As with my earlier example, a potential `exit` is sufficient to break this, thus you need `willreturn` as well.

anna marked an inline comment as done. Mar 20 2020, 4:50 AM

anna added inline comments.

llvm/lib/Transforms/Utils/InlineFunction.cpp
1172	What we need is just `isGuaranteedToTransferExecutionToSuccessor`. That handles `mayThrow`, exits/pthread_exit and willreturn. Just to note, an unconditional exit in the callee itself is not an issue here. The problem is something like this:
;nothrow
foo(i8* %arg) {
  if (%arg == null) exit;
  ret %arg
}
callee() {
  %r = call i8* @bar
  %v = call i8* @foo(i8* %r)
  ret i8* %r
}
caller() {
  call nonnull i8* @callee
}
Here propagating nonnull to callsite `bar` is incorrect since if %r is null, the program exits.

Herald added a reviewer: aartbik. Mar 20 2020, 4:50 AM

use isGuaranteedToTransferExecutionToSuccessor instead of MayThrow

Harbormaster failed remote builds in B49873: Diff 251615! Mar 20 2020, 5:55 AM

I'm unsure about the zeroext and signext on the call sites now but other than that I think this is good. wait for @reames OK though.

anna marked an inline comment as done. Mar 22 2020, 9:12 AM

anna added inline comments.

llvm/lib/Transforms/Utils/InlineFunction.cpp
1172	Noticed while adding couple more tests, there are 2 bugs here:
1. the `isGuaranteedToTransferExecutionToSuccessor` check should be inverted
2. make_range should be until the return instruction - so we do not want `std::prev` on the returnInstruction. what's needed is: `make_range(RVal->getIterator(), RInst->getIterator())`
This means that from the callsite until (and excluding) the return instruction should be guaranteed to transfer execution to successor - only then we can backward propagate the attribute to that callsite.

lebedev.ri added a subscriber: lebedev.ri. Mar 22 2020, 10:02 AM

lebedev.ri added inline comments.

llvm/lib/Transforms/Utils/InlineFunction.cpp
1172	Are you aware of `llvm::isValidAssumeForContext()`? All this (including pitfalls) sound awfully close to that function.

Noticed while adding couple more tests, there were 2 bugs:

1. the isGuaranteedToTransferExecutionToSuccessor check should be inverted

2. make_range should be until the return instruction - so we do not want std::prev on the returnInstruction. what's needed is: make_range(RVal->getIterator(), RInst->getIterator())

This means that from the callsite until (and excluding) the return instruction should be guaranteed to transfer execution to successor - only then we can backward propagate the attribute to that callsite.
Updated patch and added test cases.
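The corrected range can be sketched on a toy model as follows. This is an illustration under stated assumptions, not the patch itself: `ToyInst` and `mayPropagateToCall` are hypothetical names, and a single boolean stands in for whatever `isGuaranteedToTransferExecutionToSuccessor` would report for a real instruction. The scan covers the half-open range from the call up to, but excluding, the return.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy instruction: the only property modelled is whether execution is
// guaranteed to continue to the next instruction (no throw, no exit).
struct ToyInst {
  bool TransfersToSuccessor;
};

// Hypothetical check over a single basic block. CallIdx is the position of
// the call feeding the return, RetIdx the position of the return. Scans the
// half-open range [CallIdx, RetIdx): the call is included, the return is not.
bool mayPropagateToCall(const std::vector<ToyInst> &Block, std::size_t CallIdx,
                        std::size_t RetIdx) {
  assert(CallIdx < RetIdx && RetIdx < Block.size() && "call must precede ret");
  for (std::size_t I = CallIdx; I != RetIdx; ++I)
    if (!Block[I].TransfersToSuccessor)
      return false; // a possible exit or throw sits between call and return
  return true;
}
```

The attribute is propagated only when every instruction in the window passes, which is the non-inverted form of the check described above.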

anna marked an inline comment as done. Mar 22 2020, 11:13 AM

anna added inline comments.

llvm/lib/Transforms/Utils/InlineFunction.cpp
1172	as stated in a previous comment (https://reviews.llvm.org/D76140#1922292), adding `Assumes` here for simple cases seems like an overkill. It has significant IR churn and it also adds a use for something which can be easily inferred. Consider optimizations that depend on facts such as `hasOneUse` or a limited number of uses. We will now be inhibiting those optimizations.

lebedev.ri added inline comments. Mar 22 2020, 11:26 AM

llvm/lib/Transforms/Utils/InlineFunction.cpp
1172	While i venomously disagree with the avoidance of the usage of one versatile interface and hope things will change once there's more progress on attributor & assume bundles, in this case, as it can be seen even from the signature of the `isValidAssumeForContext()` function, it implies/forces nothing about using assumes, but only performs a validity checking, similar to the `MayContainThrowingOrExitingCall()` https://github.com/llvm/llvm-project/blob/ca04d0c8fd269978be1c13fe1241172cdfe6a6ea/llvm/lib/Analysis/ValueTracking.cpp#L603 That being said, i haven't reviewed this code so maybe there's some differences here that make that function unapplicable here.

Harbormaster failed remote builds in B50036: Diff 251901! Mar 22 2020, 11:45 AM

anna marked an inline comment as done. Mar 23 2020, 6:37 AM

anna added inline comments.

llvm/lib/Transforms/Utils/InlineFunction.cpp
1172	`isValidAssumeForContext(Inv, CxtI, DT)` does not force anything about assumes, but AFAICT all code which uses this function either has some sort of guard in the caller that the instruction is an assume. Also, the comments in the code state that it is for an assume. In fact, I believe if we intend to use that function more widely for other purposes, we should rename the function before using it (just a thought), and currently we should assert that `Inv` is an assume. It captures the intent of the function. That being said, I checked the code in `isValidAssumeForContext` and it does not fit the bill here for multiple reasons. We either do:
1. `isValidAssumeForContext(RVal /* Inv */, RInst /* CxtI */)` which fails when we do not have DT and just return true when RVal comes before RInst - this is always the case, since RVal will come before RInst.
2. `isValidAssumeForContext(RInst /* Inv */, RVal /* CxtI */)` and it fails at the `!isEphemeralValueOf(Inv /* RI */, CxtI /* RV */)` check.
(By fail here, I mean, it does not have the same behaviour as `MayContainThrowingOrExitingCall`).

ping.

anna mentioned this in D76792: [InlineFunction] Update metadata on loads that are return values. Mar 25 2020, 11:30 AM

anna added a child revision: D76792: [InlineFunction] Update metadata on loads that are return values.

NFC w.r.t prev diff. Use VMap.lookup instead of a lambda which does the same.

Harbormaster failed remote builds in B50553: Diff 252868! Mar 26 2020, 9:45 AM

LGTM, but with two specific required follow ups. If you're not comfortable committing to both, please don't land this one.

llvm/lib/Transforms/Utils/InlineFunction.cpp
93	I'd suggest a name change here. Maybe: "inliner-attribute-window"?
1159	Pull this out as a static helper instead of a lambda, add an assert internally that the two instructions are in the same block. Why? Because I'm 80% sure the state capture on the lambda isn't needed, and having it as a separate function forces that discipline.
1175	Ok, after staring at it a bit, I've convinced myself the code here is correct, just needlessly conservative. What you're doing is: If the callee's return instruction and returned call both map to the same instructions once inlined, determine whether there's a possible exit between the inlined copies. What you could be doing: If the callee returns a call, check if in the callee there's a possible exit between call and return, then apply attribute to cloned call. The key difference is when the caller directly returns the result vs uses it locally. The result here is that your transform is much more narrow in applicability than it first appears.
llvm/test/Transforms/Inline/ret_attr_update.ll
113	There's a critical missing test case here: callee and caller have the same attributes w/different values (i.e. deref). And thinking through the code, I think there might be a bug here. It's not a serious one, but if the callee specifies a larger deref than the caller, it looks like the smaller value is being written over the larger. Actually, digging through the attribute code, I think I'm wrong about the bug. However, you should definitely write the test to confirm and document merging behaviour! If it does turn out I'm correct, I'm fine with this being addressed in a follow up patch provided that the test is added in this one and isn't a functional issue.

This revision is now accepted and ready to land. Mar 30 2020, 11:18 AM

anna marked 3 inline comments as done. Mar 30 2020, 11:34 AM

anna added inline comments.

llvm/lib/Transforms/Utils/InlineFunction.cpp
1159	agreed. I'll do that in this change itself before landing. I am using this static helper in followon change D76792.
1175	yes, thanks for pointing it out. I realized it after our offline discussion :) For now, I will add a FIXME testcase which showcases the difference in code and handle that testcase in a followon change.
llvm/test/Transforms/Inline/ret_attr_update.ll
113	will check this.

addressed review comments. Added two test cases: deref value being different, inlined callee body better optimized compared to callee.

anna marked 3 inline comments as done. Mar 30 2020, 2:49 PM

anna added inline comments.

llvm/lib/Transforms/Utils/InlineFunction.cpp
1175	The key difference is when the caller directly returns the result vs uses it locally. The result here is that your transform is much more narrow in applicability than it first appears. I tried multiple test cases to showcase the difference between the two ideas above but failed. Specifically, `simplifyInstruction` used during inlining the callee is not too great at optimizing the body. For example, see added testcase `test7`. I also tried the less restrictive version (check the safety of the optimization in the callee itself, and do the attribute update on the cloned instruction), but didn't see any testcases in clang that needed update. Of course, that doesn't mean anything :)
llvm/test/Transforms/Inline/ret_attr_update.ll
113	added test case and documented merge behaviour. No bug in code, since we use the already existing value on attribute.
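The merge behaviour described in that last comment (the value already present on the attribute is kept) can be modelled with a small sketch. `ToyAttrs` and `mergeKeepingExisting` are hypothetical stand-ins, not LLVM's attribute classes, and the integer payload is an assumed representation of a `dereferenceable` byte count.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Toy attribute set: attribute name -> integer payload (for example the byte
// count of "dereferenceable"). Not LLVM's AttributeList.
using ToyAttrs = std::map<std::string, std::uint64_t>;

// Hypothetical merge: copy attributes from the call site onto the inner call,
// but never overwrite a value the inner call already carries.
ToyAttrs mergeKeepingExisting(ToyAttrs InnerCall, const ToyAttrs &CallSite) {
  for (const auto &KV : CallSite)
    InnerCall.insert(KV); // std::map::insert does nothing if the key exists
  return InnerCall;
}
```

Under this model a larger existing `dereferenceable` value on the inner call survives a merge with a smaller call-site value, which is the case the requested test was meant to document.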

Harbormaster failed remote builds in B51030: Diff 253704! Mar 30 2020, 3:51 PM

Closed by commit rG28518d9ae39f: [InlineFunction] Handle return attributes on call within inlined body (authored by anna). Mar 31 2020, 11:59 AM

This revision was automatically updated to reflect the committed changes.

I got a failure in one of the binaryFormats:
lib/BinaryFormat/CMakeFiles/LLVMBinaryFormat.dir/MsgPackReader.cpp

Attributes 'zeroext and signext' are incompatible!
  %rev.i.i.i.i.i.i.i.i = tail call signext zeroext i16 @llvm.bswap.i16(i16 %ret.0.copyload.i.i.i.i) #6
in function _ZN4llvm7msgpack6Reader7readIntIsEENS_8ExpectedIbEERNS0_6ObjectE
fatal error: error in backend: Broken function found, compilation aborted!

I believe this just showcases undefined behaviour since we were having a returnValue (i.e. call) with an incompatible attribute compared to the return attribute on the callsite.

In D76140#1953201, @anna wrote:
I got a failure in one of the binaryFormats:
lib/BinaryFormat/CMakeFiles/LLVMBinaryFormat.dir/MsgPackReader.cpp
Attributes 'zeroext and signext' are incompatible!
  %rev.i.i.i.i.i.i.i.i = tail call signext zeroext i16 @llvm.bswap.i16(i16 %ret.0.copyload.i.i.i.i) #6
in function _ZN4llvm7msgpack6Reader7readIntIsEENS_8ExpectedIbEERNS0_6ObjectE
fatal error: error in backend: Broken function found, compilation aborted!
I believe this just showcases undefined behaviour since we were having a returnValue (i.e. call) with an incompatible attribute compared to the return attribute on the callsite.

The last statement is not true. Had a discussion offline with Philip and he pointed out that we missed the fact that attributes such as signext and zeroext are part of the *call* itself. We cannot propagate these attributes into the callee since such attributes are part of the ABI for the call it is attached to.
I'm reopening this review to fix this issue.

This revision is now accepted and ready to land. Mar 31 2020, 1:58 PM

see above comment.

anna marked an inline comment as done. Mar 31 2020, 2:08 PM

anna added inline comments.

llvm/lib/Transforms/Utils/InlineFunction.cpp
1175	Clarified this with Philip offline. The current patch is not restrictive. In fact, now that I think of it, sometimes, it may be better - `simplifyInstruction` can fold away instructions and reduce the "window size" between the RV and the ReturnInst.

whitelist valid return attributes and only add those. Added testcase for signext.
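A minimal sketch of the allowlist idea, assuming a string-based toy representation. The thread only establishes that ABI attributes such as `signext` and `zeroext` must not be propagated; the specific members of the allowlist below (`nonnull`, `dereferenceable`) are an assumption for illustration, and `propagatableRetAttrs` and `filterRetAttrs` are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <string>

// Illustrative allowlist only: the thread names signext/zeroext as ABI
// attributes that must be skipped; the exact members here are an assumption.
const std::set<std::string> &propagatableRetAttrs() {
  static const std::set<std::string> Allowed = {"nonnull", "dereferenceable"};
  return Allowed;
}

// Hypothetical filter: keep only call-site return attributes on the allowlist.
std::set<std::string> filterRetAttrs(const std::set<std::string> &CallSite) {
  std::set<std::string> Valid;
  for (const std::string &A : CallSite)
    if (propagatableRetAttrs().count(A))
      Valid.insert(A);
  return Valid;
}
```

An allowlist fails safe: an attribute nobody has reasoned about is dropped rather than propagated, which avoids repeating the `signext zeroext` verifier failure shown earlier.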

This revision is now accepted and ready to land. Apr 1 2020, 8:23 AM

fixed buildbot failure. see above comments and added testcase test8.

Harbormaster failed remote builds in B51295: Diff 254213! Apr 1 2020, 8:47 AM

fixed missing code left out during rebase.

Harbormaster failed remote builds in B51299: Diff 254222! Apr 1 2020, 9:20 AM

LGTM again, with minor change.

p.s. Sorry for missing the functional issue the first time. All of the test changes should have made the issue obvious, but despite reading the LangRef description of signext, I somehow managed to miss the separation between ABI and optimization attributes.

llvm/lib/Transforms/Utils/InlineFunction.cpp
1177	I'm not sure that pulling out the helper for two cases actually helps readability. Can you drop this and just do the two cases directly please?

This revision is now accepted and ready to land. Apr 2 2020, 9:03 AM

In D76140#1957416, @reames wrote:

LGTM again, with minor change.

will update it.

p.s. Sorry for missing the functional issue the first time. All of the test changes should have made the issue obvious, but despite reading the LangRef description of signext, I somehow managed to miss the separation between ABI and optimization attributes.

thanks for the review Philip and pointing out the problem. All of us had missed the functional issue the first time around.

Closed by commit rGbf7a16a76871: [InlineFunction] Update valid return attributes at callsite within callee body (authored by anna). Apr 2 2020, 11:23 AM

This revision was automatically updated to reflect the committed changes.

anna mentioned this in rG1d0f75790491: [InlineFunction] Update metadata on loads that are return values. Apr 5 2020, 12:17 PM

Revision Contents

Path                                              Size
clang/test/CodeGen/builtins-systemz-zvector.c     18 lines
clang/test/CodeGen/builtins-systemz-zvector2.c    4 lines
clang/test/CodeGen/movbe-builtins.c               2 lines
clang/test/CodeGen/rot-intrinsics.c               12 lines
clang/test/CodeGen/waitpkg.c                      4 lines
llvm/lib/Transforms/Utils/InlineFunction.cpp      88 lines
llvm/test/Transforms/Inline/ret_attr_update.ll    75 lines

Diff 251615

clang/test/CodeGen/builtins-systemz-zvector.c

(3,659 preceding lines not shown)
  vuc = vec_sum_u128(vui, vui);
  // CHECK: call <16 x i8> @llvm.s390.vsumqf(<4 x i32> %{{.*}}, <4 x i32> %{{.*}})
  // CHECK-ASM: vsumqf
  vuc = vec_sum_u128(vul, vul);
  // CHECK: call <16 x i8> @llvm.s390.vsumqg(<2 x i64> %{{.*}}, <2 x i64> %{{.*}})
  // CHECK-ASM: vsumqg

  idx = vec_test_mask(vsc, vuc);
- // CHECK: call i32 @llvm.s390.vtm(<16 x i8> %{{.*}}, <16 x i8> %{{.*}})
+ // CHECK: call signext i32 @llvm.s390.vtm(<16 x i8> %{{.*}}, <16 x i8> %{{.*}})
  // CHECK-ASM: vtm
  idx = vec_test_mask(vuc, vuc);
- // CHECK: call i32 @llvm.s390.vtm(<16 x i8> %{{.*}}, <16 x i8> %{{.*}})
+ // CHECK: call signext i32 @llvm.s390.vtm(<16 x i8> %{{.*}}, <16 x i8> %{{.*}})
  // CHECK-ASM: vtm
  idx = vec_test_mask(vss, vus);
- // CHECK: call i32 @llvm.s390.vtm(<16 x i8> %{{.*}}, <16 x i8> %{{.*}})
+ // CHECK: call signext i32 @llvm.s390.vtm(<16 x i8> %{{.*}}, <16 x i8> %{{.*}})
  // CHECK-ASM: vtm
  idx = vec_test_mask(vus, vus);
- // CHECK: call i32 @llvm.s390.vtm(<16 x i8> %{{.*}}, <16 x i8> %{{.*}})
+ // CHECK: call signext i32 @llvm.s390.vtm(<16 x i8> %{{.*}}, <16 x i8> %{{.*}})
  // CHECK-ASM: vtm
  idx = vec_test_mask(vsi, vui);
- // CHECK: call i32 @llvm.s390.vtm(<16 x i8> %{{.*}}, <16 x i8> %{{.*}})
+ // CHECK: call signext i32 @llvm.s390.vtm(<16 x i8> %{{.*}}, <16 x i8> %{{.*}})
  // CHECK-ASM: vtm
  idx = vec_test_mask(vui, vui);
- // CHECK: call i32 @llvm.s390.vtm(<16 x i8> %{{.*}}, <16 x i8> %{{.*}})
+ // CHECK: call signext i32 @llvm.s390.vtm(<16 x i8> %{{.*}}, <16 x i8> %{{.*}})
  // CHECK-ASM: vtm
  idx = vec_test_mask(vsl, vul);
- // CHECK: call i32 @llvm.s390.vtm(<16 x i8> %{{.*}}, <16 x i8> %{{.*}})
+ // CHECK: call signext i32 @llvm.s390.vtm(<16 x i8> %{{.*}}, <16 x i8> %{{.*}})
  // CHECK-ASM: vtm
  idx = vec_test_mask(vul, vul);
- // CHECK: call i32 @llvm.s390.vtm(<16 x i8> %{{.*}}, <16 x i8> %{{.*}})
+ // CHECK: call signext i32 @llvm.s390.vtm(<16 x i8> %{{.*}}, <16 x i8> %{{.*}})
  // CHECK-ASM: vtm
  idx = vec_test_mask(vd, vul);
- // CHECK: call i32 @llvm.s390.vtm(<16 x i8> %{{.*}}, <16 x i8> %{{.*}})
+ // CHECK: call signext i32 @llvm.s390.vtm(<16 x i8> %{{.*}}, <16 x i8> %{{.*}})
  // CHECK-ASM: vtm
  }

  void test_string(void) {
  // CHECK-ASM-LABEL: test_string

  vsc = vec_cp_until_zero(vsc);
  // CHECK: call <16 x i8> @llvm.s390.vistrb(<16 x i8> %{{.*}})
(922 following lines not shown)

clang/test/CodeGen/builtins-systemz-zvector2.c

Show First 20 Lines • Show All 648 Lines • ▼ Show 20 Lines	void test_integer(void) {
vd = vec_srb(vd, vsl);		vd = vec_srb(vd, vsl);
// CHECK: call <16 x i8> @llvm.s390.vsrlb(<16 x i8> %{{.}}, <16 x i8> %{{.}})		// CHECK: call <16 x i8> @llvm.s390.vsrlb(<16 x i8> %{{.}}, <16 x i8> %{{.}})
// CHECK-ASM: vsrlb		// CHECK-ASM: vsrlb
vd = vec_srb(vd, vul);		vd = vec_srb(vd, vul);
// CHECK: call <16 x i8> @llvm.s390.vsrlb(<16 x i8> %{{.*}}, <16 x i8> %{{.*}})		// CHECK: call <16 x i8> @llvm.s390.vsrlb(<16 x i8> %{{.*}}, <16 x i8> %{{.*}})
// CHECK-ASM: vsrlb		// CHECK-ASM: vsrlb

idx = vec_test_mask(vf, vui);		idx = vec_test_mask(vf, vui);
// CHECK: call i32 @llvm.s390.vtm(<16 x i8> %{{.*}}, <16 x i8> %{{.*}})		// CHECK: call signext i32 @llvm.s390.vtm(<16 x i8> %{{.*}}, <16 x i8> %{{.*}})
// CHECK-ASM: vtm		// CHECK-ASM: vtm
idx = vec_test_mask(vd, vul);		idx = vec_test_mask(vd, vul);
// CHECK: call i32 @llvm.s390.vtm(<16 x i8> %{{.*}}, <16 x i8> %{{.*}})		// CHECK: call signext i32 @llvm.s390.vtm(<16 x i8> %{{.*}}, <16 x i8> %{{.*}})
// CHECK-ASM: vtm		// CHECK-ASM: vtm

vuc = vec_msum_u128(vul, vul, vuc, 0);		vuc = vec_msum_u128(vul, vul, vuc, 0);
// CHECK: call <16 x i8> @llvm.s390.vmslg(<2 x i64> %{{.*}}, <2 x i64> %{{.*}}, <16 x i8> %{{.*}}, i32 0)		// CHECK: call <16 x i8> @llvm.s390.vmslg(<2 x i64> %{{.*}}, <2 x i64> %{{.*}}, <16 x i8> %{{.*}}, i32 0)
// CHECK-ASM: vmslg		// CHECK-ASM: vmslg
vuc = vec_msum_u128(vul, vul, vuc, 4);		vuc = vec_msum_u128(vul, vul, vuc, 4);
// CHECK: call <16 x i8> @llvm.s390.vmslg(<2 x i64> %{{.*}}, <2 x i64> %{{.*}}, <16 x i8> %{{.*}}, i32 4)		// CHECK: call <16 x i8> @llvm.s390.vmslg(<2 x i64> %{{.*}}, <2 x i64> %{{.*}}, <16 x i8> %{{.*}}, i32 4)
// CHECK-ASM: vmslg		// CHECK-ASM: vmslg
...

clang/test/CodeGen/movbe-builtins.c

	// RUN: %clang_cc1 -ffreestanding %s -triple=x86_64-apple-darwin -target-feature +movbe -emit-llvm -o - \| FileCheck %s --check-prefixes=CHECK,CHECK-X64			// RUN: %clang_cc1 -ffreestanding %s -triple=x86_64-apple-darwin -target-feature +movbe -emit-llvm -o - \| FileCheck %s --check-prefixes=CHECK,CHECK-X64
	// RUN: %clang_cc1 -ffreestanding %s -triple=i686-apple-darwin -target-feature +movbe -emit-llvm -o - \| FileCheck %s			// RUN: %clang_cc1 -ffreestanding %s -triple=i686-apple-darwin -target-feature +movbe -emit-llvm -o - \| FileCheck %s


	#include <immintrin.h>			#include <immintrin.h>

	short test_loadbe_i16(const short *P) {			short test_loadbe_i16(const short *P) {
	// CHECK-LABEL: @test_loadbe_i16			// CHECK-LABEL: @test_loadbe_i16
	// CHECK: [[LOAD:%.*]] = load i16, i16* %{{.*}}, align 1			// CHECK: [[LOAD:%.*]] = load i16, i16* %{{.*}}, align 1
	// CHECK: call i16 @llvm.bswap.i16(i16 [[LOAD]])			// CHECK: call signext i16 @llvm.bswap.i16(i16 [[LOAD]])
	return _loadbe_i16(P);			return _loadbe_i16(P);
	}			}

	void test_storebe_i16(short *P, short D) {			void test_storebe_i16(short *P, short D) {
	// CHECK-LABEL: @test_storebe_i16			// CHECK-LABEL: @test_storebe_i16
	// CHECK: [[DATA:%.*]] = call i16 @llvm.bswap.i16(i16 %{{.*}})			// CHECK: [[DATA:%.*]] = call i16 @llvm.bswap.i16(i16 %{{.*}})
	// CHECK: store i16 [[DATA]], i16* %{{.*}}, align 1			// CHECK: store i16 [[DATA]], i16* %{{.*}}, align 1
	_storebe_i16(P, D);			_storebe_i16(P, D);
	...

clang/test/CodeGen/rot-intrinsics.c

	// RUN: %clang_cc1 -ffreestanding -triple i686--linux -emit-llvm %s -o - \| FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG			// RUN: %clang_cc1 -ffreestanding -triple i686--linux -emit-llvm -mllvm -update-return-attrs=false %s -o - \| FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG
	// RUN: %clang_cc1 -ffreestanding -triple x86_64--linux -emit-llvm %s -o - \| FileCheck %s --check-prefixes CHECK,CHECK-64BIT-LONG			// RUN: %clang_cc1 -ffreestanding -triple x86_64--linux -emit-llvm -mllvm -update-return-attrs=false %s -o - \| FileCheck %s --check-prefixes CHECK,CHECK-64BIT-LONG
	// RUN: %clang_cc1 -fms-extensions -fms-compatibility -ffreestanding %s -triple=i686-windows-msvc -target-feature +sse2 -emit-llvm -o - -Wall -Werror \| FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG			// RUN: %clang_cc1 -fms-extensions -fms-compatibility -ffreestanding %s -triple=i686-windows-msvc -target-feature +sse2 -emit-llvm -mllvm -update-return-attrs=false -o - -Wall -Werror \| FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG
	// RUN: %clang_cc1 -fms-extensions -fms-compatibility -ffreestanding %s -triple=x86_64-windows-msvc -target-feature +sse2 -emit-llvm -o - -Wall -Werror \| FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG			// RUN: %clang_cc1 -fms-extensions -fms-compatibility -ffreestanding %s -triple=x86_64-windows-msvc -target-feature +sse2 -emit-llvm -mllvm -update-return-attrs=false -o - -Wall -Werror \| FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG
	// RUN: %clang_cc1 -fms-extensions -fms-compatibility -fms-compatibility-version=17.00 -ffreestanding %s -triple=i686-windows-msvc -target-feature +sse2 -emit-llvm -o - -Wall -Werror \| FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG			// RUN: %clang_cc1 -fms-extensions -fms-compatibility -fms-compatibility-version=17.00 -ffreestanding %s -triple=i686-windows-msvc -target-feature +sse2 -emit-llvm -mllvm -update-return-attrs=false -o - -Wall -Werror \| FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG
	// RUN: %clang_cc1 -fms-extensions -fms-compatibility -fms-compatibility-version=17.00 -ffreestanding %s -triple=x86_64-windows-msvc -target-feature +sse2 -emit-llvm -o - -Wall -Werror \| FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG			// RUN: %clang_cc1 -fms-extensions -fms-compatibility -fms-compatibility-version=17.00 -ffreestanding %s -triple=x86_64-windows-msvc -target-feature +sse2 -emit-llvm -mllvm -update-return-attrs=false -o - -Wall -Werror \| FileCheck %s --check-prefixes CHECK,CHECK-32BIT-LONG

	#include <x86intrin.h>			#include <x86intrin.h>

	unsigned char test__rolb(unsigned char value, int shift) {			unsigned char test__rolb(unsigned char value, int shift) {
	// CHECK-LABEL: i8 @test__rolb			// CHECK-LABEL: i8 @test__rolb
	// CHECK: [[R:%.*]] = call i8 @llvm.fshl.i8(i8 [[X:%.*]], i8 [[X]], i8 [[Y:%.*]])			// CHECK: [[R:%.*]] = call i8 @llvm.fshl.i8(i8 [[X:%.*]], i8 [[X]], i8 [[Y:%.*]])
	// CHECK: ret i8 [[R]]			// CHECK: ret i8 [[R]]
	return __rolb(value, shift);			return __rolb(value, shift);
	...

clang/test/CodeGen/waitpkg.c

	// RUN: %clang_cc1 %s -ffreestanding -triple x86_64-unknown-unknown -emit-llvm -target-feature +waitpkg -Wall -pedantic -o - \| FileCheck %s			// RUN: %clang_cc1 %s -ffreestanding -triple x86_64-unknown-unknown -emit-llvm -target-feature +waitpkg -Wall -pedantic -o - \| FileCheck %s
	// RUN: %clang_cc1 %s -ffreestanding -triple i386-unknown-unknown -emit-llvm -target-feature +waitpkg -Wall -pedantic -o - \| FileCheck %s			// RUN: %clang_cc1 %s -ffreestanding -triple i386-unknown-unknown -emit-llvm -target-feature +waitpkg -Wall -pedantic -o - \| FileCheck %s

	#include <immintrin.h>			#include <immintrin.h>

	#include <stddef.h>			#include <stddef.h>
	#include <stdint.h>			#include <stdint.h>

	void test_umonitor(void *address) {			void test_umonitor(void *address) {
	//CHECK-LABEL: @test_umonitor			//CHECK-LABEL: @test_umonitor
	//CHECK: call void @llvm.x86.umonitor(i8* %{{.*}})			//CHECK: call void @llvm.x86.umonitor(i8* %{{.*}})
	return _umonitor(address);			return _umonitor(address);
	}			}

	uint8_t test_umwait(uint32_t control, uint64_t counter) {			uint8_t test_umwait(uint32_t control, uint64_t counter) {
	//CHECK-LABEL: @test_umwait			//CHECK-LABEL: @test_umwait
	//CHECK: call i8 @llvm.x86.umwait(i32 %{{.*}}, i32 %{{.*}}, i32 %{{.*}})			//CHECK: call zeroext i8 @llvm.x86.umwait(i32 %{{.*}}, i32 %{{.*}}, i32 %{{.*}})
	return _umwait(control, counter);			return _umwait(control, counter);
	}			}

	uint8_t test_tpause(uint32_t control, uint64_t counter) {			uint8_t test_tpause(uint32_t control, uint64_t counter) {
	//CHECK-LABEL: @test_tpause			//CHECK-LABEL: @test_tpause
	//CHECK: call i8 @llvm.x86.tpause(i32 %{{.*}}, i32 %{{.*}}, i32 %{{.*}})			//CHECK: call zeroext i8 @llvm.x86.tpause(i32 %{{.*}}, i32 %{{.*}}, i32 %{{.*}})
	return _tpause(control, counter);			return _tpause(control, counter);
	}			}

llvm/lib/Transforms/Utils/InlineFunction.cpp

...
using namespace llvm;		using namespace llvm;
using ProfileCount = Function::ProfileCount;		using ProfileCount = Function::ProfileCount;

static cl::opt<bool>		static cl::opt<bool>
EnableNoAliasConversion("enable-noalias-to-md-conversion", cl::init(true),		EnableNoAliasConversion("enable-noalias-to-md-conversion", cl::init(true),
cl::Hidden,		cl::Hidden,
cl::desc("Convert noalias attributes to metadata during inlining."));		cl::desc("Convert noalias attributes to metadata during inlining."));

		static cl::opt<bool> UpdateReturnAttributes(
		"update-return-attrs", cl::init(true), cl::Hidden,
		cl::desc("Update return attributes on calls within inlined body"));

static cl::opt<bool>		static cl::opt<bool>
PreserveAlignmentAssumptions("preserve-alignment-assumptions-during-inlining",		PreserveAlignmentAssumptions("preserve-alignment-assumptions-during-inlining",
cl::init(true), cl::Hidden,		cl::init(true), cl::Hidden,
cl::desc("Convert align attributes to assumptions during inlining."));		cl::desc("Convert align attributes to assumptions during inlining."));

		static cl::opt<unsigned> MaxInstCheckedForThrow(
		"max-inst-checked-for-throw-during-inlining", cl::Hidden,
		reames (inline comment, Not Done): I'd suggest a name change here. Maybe: "inliner-attribute-window"?
		cl::desc("the maximum number of instructions analyzed for may throw during "
		"attribute inference in inlined body"),
		cl::init(4));

llvm::InlineResult llvm::InlineFunction(CallBase *CB, InlineFunctionInfo &IFI,		llvm::InlineResult llvm::InlineFunction(CallBase *CB, InlineFunctionInfo &IFI,
AAResults *CalleeAAR,		AAResults *CalleeAAR,
bool InsertLifetime) {		bool InsertLifetime) {
return InlineFunction(CallSite(CB), IFI, CalleeAAR, InsertLifetime);		return InlineFunction(CallSite(CB), IFI, CalleeAAR, InsertLifetime);
}		}

namespace {		namespace {

...
if (const Instruction *I = dyn_cast<Instruction>(VMI->first)) {
NI->setMetadata(		NI->setMetadata(
LLVMContext::MD_alias_scope,		LLVMContext::MD_alias_scope,
MDNode::concatenate(NI->getMetadata(LLVMContext::MD_alias_scope),		MDNode::concatenate(NI->getMetadata(LLVMContext::MD_alias_scope),
MDNode::get(CalledFunc->getContext(), Scopes)));		MDNode::get(CalledFunc->getContext(), Scopes)));
}		}
}		}
}		}

		static void AddReturnAttributes(CallSite CS, ValueToValueMapTy &VMap) {
		if (!UpdateReturnAttributes)
		return;
		AttrBuilder AB(CS.getAttributes(), AttributeList::ReturnIndex);
		if (AB.empty())
		return;

		auto *CalledFunction = CS.getCalledFunction();
		auto &Context = CalledFunction->getContext();

		auto GetClonedValue = [&](Instruction *I) -> Value * {
		reames (inline comment, Not Done): Pull this out as a static helper instead of a lambda, and add an assert internally that the two instructions are in the same block. Why? Because I'm 80% sure the state capture on the lambda isn't needed, and having it as a separate function forces that discipline.
		anna (author, inline comment, Done): Agreed. I'll do that in this change itself before landing. I am using this static helper in the follow-on change D76792.
		ValueToValueMapTy::iterator VMI = VMap.find(I);
		if (VMI == VMap.end())
		return nullptr;
		return VMI->second;
		};

		auto MayContainThrowingOrExitingCall = [&](Instruction *RVal,
		Instruction *RInst) {
		unsigned NumInstChecked = 0;
		for (auto &I :
		make_range(RVal->getIterator(), std::prev(RInst->getIterator())))
		if (NumInstChecked++ > MaxInstCheckedForThrow \|\|
		isGuaranteedToTransferExecutionToSuccessor(&I))
		jdoerfert (inline comment, Not Done): `mayThrow` is not sufficient. As with my earlier example, a potential `exit` is sufficient to break this, thus you need `willreturn` as well.
		anna (author, inline comment, Done): What we need is just `isGuaranteedToTransferExecutionToSuccessor`. That handles `mayThrow`, exits/pthread_exit and willreturn. Just to note, an unconditional exit in the callee itself is not an issue here. The problem is something like this:
			; nothrow
			foo(i8* %arg) { if (%arg == null) exit; ret %arg }
			callee() { %r = call i8* @bar; %v = call i8* @foo(i8* %r); ret i8* %r }
			caller() { call nonnull i8* @callee }
		Here propagating nonnull to the callsite `bar` is incorrect, since if %r is null the program exits.
		anna (author, inline comment, Done): Noticed while adding a couple more tests that there are 2 bugs here:
			1. The `isGuaranteedToTransferExecutionToSuccessor` check should be inverted.
			2. make_range should run until the return instruction, so we do not want `std::prev` on the return instruction. What's needed is: `make_range(RVal->getIterator(), RInst->getIterator())`
		This means that every instruction from the callsite until (and excluding) the return instruction should be guaranteed to transfer execution to its successor. Only then can we backward propagate the attribute to that callsite.
		lebedev.ri (inline comment, Not Done): Are you aware of `llvm::isValidAssumeForContext()`? All this (including pitfalls) sounds awfully close to that function.
		anna (author, inline comment, Done): As stated in a previous comment (https://reviews.llvm.org/D76140#1922292), adding `Assumes` here for simple cases seems like overkill. It causes significant IR churn and it also adds a use for something which can be easily inferred. Consider optimizations that depend on facts such as `hasOneUse` or a limited number of uses. We will now be inhibiting those optimizations.
		lebedev.ri (inline comment, Not Done): While I venomously disagree with the avoidance of the usage of one versatile interface and hope things will change once there's more progress on attributor & assume bundles, in this case, as can be seen even from the signature of the `isValidAssumeForContext()` function, it implies/forces nothing about using assumes, but only performs a validity check, similar to `MayContainThrowingOrExitingCall()`: https://github.com/llvm/llvm-project/blob/ca04d0c8fd269978be1c13fe1241172cdfe6a6ea/llvm/lib/Analysis/ValueTracking.cpp#L603 That being said, I haven't reviewed this code, so maybe there are some differences here that make that function inapplicable.
		anna (author, inline comment, Done): `isValidAssumeForContext(Inv, CxtI, DT)` does not force anything about assumes, but AFAICT all code which uses this function has some sort of guard in the caller that the instruction is an assume. Also, the comments in the code state that it is for an assume. In fact, I believe if we intend to use that function more widely for other purposes, we should rename the function before using it (just a thought), and currently we should assert that `Inv` is an assume. It captures the intent of the function. That being said, I checked the code in `isValidAssumeForContext` and it does not fit the bill here for multiple reasons. We either do:
			1. `isValidAssumeForContext(RVal /* Inv */, RInst /* CxtI */)`, which fails when we do not have DT and just returns true when RVal comes before RInst. This is always the case, since RVal will come before RInst.
			2. `isValidAssumeForContext(RInst /* Inv */, RVal /* CxtI */)`, which fails at the `!isEphemeralValueOf(Inv /* RI */, CxtI /* RV */)` check.
		(By "fail" here, I mean it does not have the same behaviour as `MayContainThrowingOrExitingCall`.)
		return true;
		return false;
		};
		reames (inline comment, Not Done): Ok, after staring at it a bit, I've convinced myself the code here is correct, just needlessly conservative.
		What you're doing is: if the callee's return instruction and returned call both map to the same instructions once inlined, determine whether there's a possible exit between the inlined copies.
		What you could be doing: if the callee returns a call, check whether in the callee there's a possible exit between call and return, then apply the attribute to the cloned call.
		The key difference is when the caller directly returns the result vs uses it locally. The result here is that your transform is much more narrow in applicability than it first appears.
		anna (author, inline comment, Done): Yes, thanks for pointing it out. I realized it after our offline discussion :) For now, I will add a FIXME testcase which showcases the difference in code and handle that testcase in a follow-on change.
		anna (author, inline comment, Done):
		> The key difference is when the caller directly returns the result vs uses it locally. The result here is that your transform is much more narrow in applicability than it first appears.
		I tried multiple test cases to showcase the difference between the two ideas above but failed. Specifically, `simplifyInstruction`, used while inlining the callee, is not too great at optimizing the body. For example, see the added testcase `test7`. I also tried the less restrictive version (check the safety of the optimization in the callee itself, and do the attribute update on the cloned instruction), but didn't see any testcases in clang that needed an update. Of course, that doesn't mean anything :)
		anna (author, inline comment, Done): Clarified this with Philip offline. The current patch is not restrictive. In fact, now that I think of it, sometimes it may be better: `simplifyInstruction` can fold away instructions and reduce the "window size" between the RV and the ReturnInst.

		for (auto &BB : *CalledFunction) {
		reames (inline comment, Not Done): I'm not sure that pulling out the helper for two cases actually helps readability. Can you drop this and just do the two cases directly please?
		auto *RI = dyn_cast<ReturnInst>(BB.getTerminator());
		if (!RI \|\| !isa<CallBase>(RI->getOperand(0)))
		continue;
		// Sanity check that the cloned return instruction exists and is a return
		// instruction itself.
		auto *NewRI = dyn_cast_or_null<ReturnInst>(GetClonedValue(RI));
		if (!NewRI)
		continue;
		auto *RetVal = cast<CallBase>(RI->getOperand(0));
		// Sanity check that the cloned RetVal exists and is a call.
		// Simplification during inlining could have transformed the cloned
		// instruction.
		auto *NewRetVal = dyn_cast_or_null<CallBase>(GetClonedValue(RetVal));
		if (!NewRetVal)
		continue;
		// Backward propagation of attributes to the returned value may be incorrect
		// if it is control flow dependent.
		// Consider:
		// @callee {
		// %rv = call @foo()
		// %rv2 = call @bar()
		// if (%rv2 != null)
		// return %rv2
		// if (%rv == null)
		// exit()
		// return %rv
		// }
		// caller() {
		// %val = call nonnull @callee()
		// }
		// Here we cannot add the nonnull attribute on either foo or bar. So, we
		// limit the check to both NewRetVal and NewRI are in the same basic block
		// and there are no throwing/exiting instructions between these
		// instructions.
		if (NewRI->getParent() != NewRetVal->getParent() \|\|
		MayContainThrowingOrExitingCall(NewRetVal, NewRI))
		continue;
		// Add to the existing attributes.
		AttributeList AL = NewRetVal->getAttributes();
		AttributeList NewAL =
		AL.addAttributes(Context, AttributeList::ReturnIndex, AB);
		NewRetVal->setAttributes(NewAL);
		}
		}
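
The safety condition discussed in the inline comments (every instruction in the range from the call up to, but excluding, the return must be guaranteed to transfer execution to its successor, within a bounded window) can be modelled outside LLVM. The following is only a minimal sketch under stated assumptions, not the code that landed: `ToyInst` and `mayExitBeforeReturn` are hypothetical names, the boolean field stands in for `isGuaranteedToTransferExecutionToSuccessor`, and the default window of 4 mirrors the `cl::init(4)` value in the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an IR instruction. The single flag models the
// answer isGuaranteedToTransferExecutionToSuccessor would give for it.
struct ToyInst {
  bool TransfersToSuccessor;
};

// Returns true when back-propagating a return attribute is NOT known to be
// safe. Scans the half-open range [CallIdx, RetIdx) of one basic block, i.e.
// the range described in the review as
// make_range(RVal->getIterator(), RInst->getIterator()).
bool mayExitBeforeReturn(const std::vector<ToyInst> &Block,
                         std::size_t CallIdx, std::size_t RetIdx,
                         unsigned MaxChecked = 4) {
  // Both positions must lie in the same block, with the call first.
  assert(CallIdx <= RetIdx && RetIdx <= Block.size());
  unsigned NumChecked = 0;
  for (std::size_t I = CallIdx; I < RetIdx; ++I) {
    // Give up once the window is exceeded, or as soon as one instruction
    // might throw or exit (note the inverted check from the review).
    if (NumChecked++ > MaxChecked || !Block[I].TransfersToSuccessor)
      return true;
  }
  return false;
}
```

With this shape, a block whose call is followed by something like a guard (flag `false`) before the return is rejected, while a short straight-line block is accepted. Because the comparison is `NumChecked++ > MaxChecked`, a window of 4 admits up to five scanned instructions before giving up.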

/// If the inlined function has non-byval align arguments, then		/// If the inlined function has non-byval align arguments, then
/// add @llvm.assume-based alignment assumptions to preserve this information.		/// add @llvm.assume-based alignment assumptions to preserve this information.
static void AddAlignmentAssumptions(CallSite CS, InlineFunctionInfo &IFI) {		static void AddAlignmentAssumptions(CallSite CS, InlineFunctionInfo &IFI) {
if (!PreserveAlignmentAssumptions \|\| !IFI.GetAssumptionCache)		if (!PreserveAlignmentAssumptions \|\| !IFI.GetAssumptionCache)
return;		return;

AssumptionCache *AC = &(*IFI.GetAssumptionCache)(*CS.getCaller());		AssumptionCache *AC = &(*IFI.GetAssumptionCache)(*CS.getCaller());
auto &DL = CS.getCaller()->getParent()->getDataLayout();		auto &DL = CS.getCaller()->getParent()->getDataLayout();
...
fixupLineNumbers(Caller, FirstNewBlock, TheCall,
CalledFunc->getSubprogram() != nullptr);		CalledFunc->getSubprogram() != nullptr);

// Clone existing noalias metadata if necessary.		// Clone existing noalias metadata if necessary.
CloneAliasScopeMetadata(CS, VMap);		CloneAliasScopeMetadata(CS, VMap);

// Add noalias metadata if necessary.		// Add noalias metadata if necessary.
AddAliasScopeMetadata(CS, VMap, DL, CalleeAAR);		AddAliasScopeMetadata(CS, VMap, DL, CalleeAAR);

		// Clone return attributes on the callsite into the calls within the inlined
		// function which feed into its return value.
		AddReturnAttributes(CS, VMap);

// Propagate llvm.mem.parallel_loop_access if necessary.		// Propagate llvm.mem.parallel_loop_access if necessary.
PropagateParallelLoopAccessMetadata(CS, VMap);		PropagateParallelLoopAccessMetadata(CS, VMap);

// Register any cloned assumptions.		// Register any cloned assumptions.
if (IFI.GetAssumptionCache)		if (IFI.GetAssumptionCache)
for (BasicBlock &NewBlock :		for (BasicBlock &NewBlock :
make_range(FirstNewBlock->getIterator(), Caller->end()))		make_range(FirstNewBlock->getIterator(), Caller->end()))
for (Instruction &I : NewBlock) {		for (Instruction &I : NewBlock) {
...

llvm/test/Transforms/Inline/ret_attr_update.ll

This file was added.

				; RUN: opt < %s -inline-threshold=0 -always-inline -S \| FileCheck %s
				; RUN: opt < %s -passes=always-inline -S \| FileCheck %s

				declare i8* @foo(i8*)
				declare i8* @bar(i8*)

				define i8* @callee(i8 *%p) alwaysinline {
				; CHECK: @callee(
				; CHECK: call i8* @foo(i8* noalias %p)
				%r = call i8* @foo(i8* noalias %p)
				ret i8* %r
				}

				define i8* @caller(i8* %ptr, i64 %x) {
				; CHECK-LABEL: @caller
				; CHECK: call nonnull i8* @foo(i8* noalias
				%gep = getelementptr inbounds i8, i8* %ptr, i64 %x
				%p = call nonnull i8* @callee(i8* %gep)
				ret i8* %p
				}

				declare void @llvm.experimental.guard(i1,...)
				; Cannot add nonnull attribute to foo.
				define internal i8* @callee_with_throwable(i8* %p) alwaysinline {
				; CHECK-NOT: callee_with_throwable
				%r = call i8* @foo(i8* %p)
				%cond = icmp ne i8* %r, null
				call void (i1, ...) @llvm.experimental.guard(i1 %cond) [ "deopt"() ]
				ret i8* %r
				}

				; Here also we cannot add nonnull attribute to the call bar.
				define internal i8* @callee_with_explicit_control_flow(i8* %p) alwaysinline {
				; CHECK-NOT: callee_with_explicit_control_flow
				%r = call i8* @bar(i8* %p)
				%cond = icmp ne i8* %r, null
				br i1 %cond, label %ret, label %orig

				ret:
				ret i8* %r

				orig:
				ret i8* %p
				}

				define i8* @caller2(i8* %ptr, i64 %x, i1 %cond) {
				; CHECK-LABEL: @caller2
				; CHECK: call i8* @foo
				; CHECK: call i8* @bar
				%gep = getelementptr inbounds i8, i8* %ptr, i64 %x
				%p = call nonnull i8* @callee_with_throwable(i8* %gep)
				%q = call nonnull i8* @callee_with_explicit_control_flow(i8* %gep)
				br i1 %cond, label %pret, label %qret

				pret:
				ret i8* %p

				qret:
				ret i8* %q
				}

				define internal i8* @callee3(i8 *%p) alwaysinline {
				; CHECK-NOT: callee3
				%r = call noalias i8* @foo(i8* %p)
				ret i8* %r
				}

				; add the deref attribute to the existing attributes on foo.
				define i8* @caller3(i8* %ptr, i64 %x) {
				; CHECK-LABEL: caller3
				; CHECK: call noalias dereferenceable_or_null(12) i8* @foo
				%gep = getelementptr inbounds i8, i8* %ptr, i64 %x
				%p = call dereferenceable_or_null(12) i8* @callee3(i8* %gep)
				ret i8* %p
				}
				reames (inline comment, Done): There's a critical missing test case here:
				- Callee and caller have the same attributes w/different values (i.e. deref)
				And thinking through the code, I think there might be a bug here. It's not a serious one, but if the callee specifies a larger deref than the caller, it looks like the smaller value is being written over the larger.
				Actually, digging through the attribute code, I think I'm wrong about the bug. However, you should definitely write the test to confirm and document the merging behaviour! If it does turn out I'm correct, I'm fine with this being addressed in a follow-up patch, provided that the test is added in this one and it isn't a functional issue.
				anna (author, inline comment, Done): Will check this.
				anna (author, inline comment, Done): Added a test case and documented the merge behaviour. No bug in the code, since we use the already existing value on the attribute.
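
The merge behaviour described in the last reply can be illustrated with a small standalone model. This is a sketch under assumptions, not LLVM's `AttributeList` API: `ToyAttrs` and `mergeReturnAttrs` are hypothetical names, attributes are modelled as a name-to-integer map (0 for flag attributes), and the "existing value wins" rule is taken from the author's statement above rather than verified against the attribute code.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical attribute set: attribute name -> integer payload
// (0 for flag-style attributes such as noalias or nonnull).
using ToyAttrs = std::map<std::string, std::uint64_t>;

// Merge the call site's return attributes into those already present on the
// inner call. Assumption (from the author's reply in the review): when both
// carry the same attribute, the value already on the inner call is kept.
ToyAttrs mergeReturnAttrs(ToyAttrs Inner, const ToyAttrs &FromCallSite) {
  for (const auto &KV : FromCallSite)
    Inner.insert(KV); // std::map::insert leaves an existing key untouched
  return Inner;
}
```

Under this model, merging `{dereferenceable_or_null: 12}` into an inner call that already has `noalias` yields both attributes, which matches the shape of the `caller3` check line, and an inner `dereferenceable` of 16 is not overwritten by a call-site value of 8.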
