This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
clang/
-
docs/
-
ReleaseNotes.rst
-
include/clang/
-
clang/
-
Basic/
-
CodeGenOptions.def
-
Driver/
-
Options.td
-
lib/CodeGen/
-
CodeGen/
-
CGCall.h
4/5
CGCall.cpp
-
test/
-
CodeGen/
1/1
lifetime-call-temp.c
-
CodeGenCXX/
-
amdgcn-call-with-aggarg.cpp
-
stack-reuse-miscompile.cpp
-
CodeGenCoroutines/
-
pr59181.cpp

Differential D74094

Reapply: [IRGen] Emit lifetime intrinsics around temporary aggregate argument allocas
Needs ReviewPublic

Authored by nickdesaulniers on Feb 5 2020, 3:09 PM.

Download Raw Diff

Details

Reviewers

rjmccall
thegameg
rsmith
erik.pilkington

Commits

rGe698695fbbf6: Reapply: [IRGen] Emit lifetime intrinsics around temporary aggregate argument…
rGfafc6e4fdf36: [IRGen] Emit lifetime intrinsics around temporary aggregate argument allocas

Summary

This reverts commit e26c24b849211f35a988d001753e0cd15e4a9d7b.

These temporaries are only used in the callee, and their memory can be
reused after the call is complete.

rdar://58552124

Original Author: Erik Pilkington <erik.pilkington@gmail.com>

In order to help users transition to a more disciplined lifetime of
temporaries, the flag -Xclang -sloppy-temporary-lifetimes was added,
which can allow them to retain the previous behavior.

Link: https://github.com/llvm/llvm-project/issues/38157
Link: https://github.com/llvm/llvm-project/issues/41896
Link: https://github.com/llvm/llvm-project/issues/43598
Link: https://github.com/ClangBuiltLinux/linux/issues/39
Link: https://reviews.llvm.org/rGfafc6e4fdf3673dcf557d6c8ae0c0a4bb3184402

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

erik.pilkington created this revision.Feb 5 2020, 3:09 PM

Herald added subscribers: ributzka, dexonsmith, jkorous. · View Herald TranscriptFeb 5 2020, 3:09 PM

rjmccall added inline comments.Feb 6 2020, 11:12 AM

clang/lib/CodeGen/CGCall.cpp
3697	If the argument type has a C++ destructor, will we end its lifetime before we call destructors at the end of the full-expression?

erik.pilkington planned changes to this revision.Feb 6 2020, 12:03 PM

erik.pilkington marked an inline comment as done.

erik.pilkington added inline comments.

clang/lib/CodeGen/CGCall.cpp
3697	Yeah, this is broken :/

rjmccall added inline comments.Feb 6 2020, 12:31 PM

clang/lib/CodeGen/CGCall.cpp
3697	Well, we could at least do it for trivially-destructible types. That probably needs to include all kinds of non-trivial destructibility, not just C++, though.

Disable the optimization for non-trivially destructible types.

thegameg added inline comments.Feb 7 2020, 11:30 AM

clang/lib/CodeGen/CGCall.cpp
3687	Is there any other use of `EmitAnyExprToTemp` that can benefit from this?

erik.pilkington marked an inline comment as done.Feb 7 2020, 11:35 AM

erik.pilkington added inline comments.

clang/lib/CodeGen/CGCall.cpp
3687	Right now, the only other user of that function is `CodeGenFunction::EmitStmtExprLValue`, which doesn't seem all that interesting. There are a lot of places that do create temporary allocas that would be worth looking into in the future though.

LGTM

This revision is now accepted and ready to land.Feb 7 2020, 12:16 PM

Closed by commit rGfafc6e4fdf36: [IRGen] Emit lifetime intrinsics around temporary aggregate argument allocas (authored by erik.pilkington). · Explain WhyFeb 7 2020, 2:43 PM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. · View Herald TranscriptFeb 7 2020, 2:43 PM

ChuanqiXu added a subscriber: ChuanqiXu.Jan 12 2022, 9:01 PM

foad added a subscriber: foad.Aug 2 2023, 8:40 AM

Hi @erik.pilkington, I see this got reverted:

commit e26c24b849211f35a988d001753e0cd15e4a9d7b
Author: Erik Pilkington <erik.pilkington@gmail.com>
Date:   Wed Feb 12 12:02:58 2020 -0800

    Revert "[IRGen] Emit lifetime intrinsics around temporary aggregate argument allocas"
    
    This reverts commit fafc6e4fdf3673dcf557d6c8ae0c0a4bb3184402.
    
    Should fix ppc stage2 failure: http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/23546

Do you have any more info on the "ppc stage2 failure"? I'd like to pursue something like this patch to get more accurate lifetime markers for temporaries, so that LLVM stack slot coloring can do a better job, and we get smaller stack usage. This is prompted by https://github.com/llvm/llvm-project/issues/41896

Herald added a project: Restricted Project. · View Herald TranscriptAug 2 2023, 8:40 AM

nickdesaulniers added a subscriber: nathanchance.Aug 3 2023, 2:08 PM

nickdesaulniers added a subscriber: nickdesaulniers.

In D74094#4554327, @foad wrote:
Hi @erik.pilkington, I see this got reverted:
commit e26c24b849211f35a988d001753e0cd15e4a9d7b
Author: Erik Pilkington <erik.pilkington@gmail.com>
Date:   Wed Feb 12 12:02:58 2020 -0800

    Revert "[IRGen] Emit lifetime intrinsics around temporary aggregate argument allocas"
    
    This reverts commit fafc6e4fdf3673dcf557d6c8ae0c0a4bb3184402.
    
    Should fix ppc stage2 failure: http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/23546
Do you have any more info on the "ppc stage2 failure"? I'd like to pursue something like this patch to get more accurate lifetime markers for temporaries, so that LLVM stack slot coloring can do a better job, and we get smaller stack usage. This is prompted by https://github.com/llvm/llvm-project/issues/41896

I too am very interested, as this was flagged as a potential help for https://github.com/llvm/llvm-project/issues/41896.

paulkirth added a subscriber: paulkirth.Aug 3 2023, 2:52 PM

In D74094#4554327, @foad wrote:

Hi @erik.pilkington, I see this got reverted:

I'm not sure if @erik.pilkington is still watching Phabricator, but in any case I think (like me) he no longer has rdar access. Since this was a Linux PPC bot, it's possible Erik didn't get far investigating the failure. Three years later, the sands may have shifted substantially... maybe the best option to is rebase and investigate the new failures (if any).

@akyrtzi @arphaman @jroelofs, is one of you available to check rdar://58552124, in case it has extra context from the PPC bot failure?

In D74094#4559037, @dexonsmith wrote:

In D74094#4554327, @foad wrote:

Hi @erik.pilkington, I see this got reverted:

I'm not sure if @erik.pilkington is still watching Phabricator, but in any case I think (like me) he no longer has rdar access. Since this was a Linux PPC bot, it's possible Erik didn't get far investigating the failure. Three years later, the sands may have shifted substantially... maybe the best option to is rebase and investigate the new failures (if any).

@akyrtzi @arphaman @jroelofs, is one of you available to check rdar://58552124, in case it has extra context from the PPC bot failure?

I found rdar://59400672&60009515, which track the revert and reminder to investigate. Neither go into any more detail on how/why it broke that bot, unfortunately. I also found https://reviews.llvm.org/rGfafc6e4fdf3673dcf557d6c8ae0c0a4bb3184402 has a tiny bit more context than this review. Maybe @lei can share the reproducer mentioned in that thread? Otherwise, as Duncan said, I think your best bet is to try to re-land and see what breaks.

nickdesaulniers added a subscriber: ahatanak.Aug 4 2023, 9:49 AM

(I am working on rebasing this; tests need updating for opaque ptr)

nickdesaulniers reopened this revision.Aug 4 2023, 10:18 AM

This revision is now accepted and ready to land.Aug 4 2023, 10:18 AM

rebase on opaque ptrs

WIP, the following tests now fail and need to be updated:
Failed Tests (3):

Clang :: CodeGenCXX/amdgcn-call-with-aggarg.cpp
Clang :: CodeGenCoroutines/pr59181.cpp
Clang :: Driver/ps4-ps5-relax-relocations.c

Herald added a subscriber: jvesely. · View Herald TranscriptAug 4 2023, 10:18 AM

nickdesaulniers requested changes to this revision.Aug 4 2023, 10:19 AM

This revision now requires changes to proceed.Aug 4 2023, 10:19 AM

Harbormaster completed remote builds in B250372: Diff 547271.Aug 4 2023, 12:25 PM

ributzka removed a subscriber: ributzka.Aug 7 2023, 9:58 AM

nickdesaulniers commandeered this revision.Aug 7 2023, 10:07 AM

nickdesaulniers edited reviewers, added: erik.pilkington; removed: nickdesaulniers.

This revision is now accepted and ready to land.Aug 7 2023, 10:07 AM

nickdesaulniers planned changes to this revision.Aug 7 2023, 10:08 AM

fix remaining test failures

This revision is now accepted and ready to land.Aug 7 2023, 10:45 AM

remove instance of auto

Hi @lei can you please help us retest this version of this patch that previously broke your multistage build?
https://reviews.llvm.org/rGfafc6e4fdf3673dcf557d6c8ae0c0a4bb3184402#890444

remove another auto, sink ArgSlotAlloca to reduce scope

Harbormaster completed remote builds in B250845: Diff 547861.Aug 7 2023, 2:21 PM

In D74094#4566543, @nickdesaulniers wrote:

Hi @lei can you please help us retest this version of this patch that previously broke your multistage build?
https://reviews.llvm.org/rGfafc6e4fdf3673dcf557d6c8ae0c0a4bb3184402#890444

ping @lei

Sorry I missed this. Will kick off now and let you know the results soon.

@nickdesaulniers I have verified this patch on top of 40ee8abee77a2e8fb0089d4c7f5723b71f27d416 passes our multistage bot http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage

Thank-you!

Nice!

Closed by commit rGe698695fbbf6: Reapply: [IRGen] Emit lifetime intrinsics around temporary aggregate argument… (authored by erik.pilkington, committed by nickdesaulniers). · Explain WhyAug 16 2023, 3:28 PM

This revision was automatically updated to reflect the committed changes.

nickdesaulniers added a commit: rGe698695fbbf6: Reapply: [IRGen] Emit lifetime intrinsics around temporary aggregate argument….

Thanks @lei for the help testing. Re-applied.

alexfh added a reverting change: D159346: Revert "Reapply: [IRGen] Emit lifetime intrinsics around temporary aggregate argument allocas".Sep 1 2023, 2:09 AM

This commit caused invalid AddressSanitizer: stack-use-after-scope errors in our internal setup. Trying to create a standalone repro, but so far we think that the patch is incorrect. @vitalybuka says "The patch seems wrong, callee can return a reference to temp, so lifetime should be extended to full expression."

In D74094#4633785, @alexfh wrote:

This commit caused invalid AddressSanitizer: stack-use-after-scope errors in our internal setup. Trying to create a standalone repro, but so far we think that the patch is incorrect. @vitalybuka says "The patch seems wrong, callee can return a reference to temp, so lifetime should be extended to full expression."

A standalone reproducer is here: https://gcc.godbolt.org/z/Ed1s15Kv5. The test passes when compiled with clang before this patch, and generates a bogus address sanitizer error after this patch. This only happens with -march=haswell.

The affected code is:

#include <unsupported/Eigen/CXX11/Tensor>
#include <gtest/gtest.h>
#include <gmock/gmock.h>
#include <absl/types/span.h>

template <typename Tensor>
auto AsArray(const Tensor& t)
    -> absl::Span<const typename Tensor::Scalar> {
  return {t.data(), (size_t)t.size()};
}
using testing::ElementsAre;

int main() {
  using Tensor = Eigen::TensorFixedSize<float, Eigen::Sizes<2, 2>,
                                       Eigen::RowMajor, Eigen::DenseIndex>;
  Tensor a;
  a.setValues({{1, 2}, {3, 4}});
  auto round = [](Tensor m) {
    return (m + 0.5f).cast<int>().cast<float>();
  };

  const Tensor t3 = round(a.log().exp());
  EXPECT_THAT(AsArray(t3), ElementsAre(1, 2, 3, 4));
}

The problem reproduces with clang -std=c++17 -O3 -fsanitize=address -march=haswell.

I'm going to revert the commit.

alexfh added a reverting change: rGb7f491564484: Revert "Reapply: [IRGen] Emit lifetime intrinsics around temporary aggregate….Sep 1 2023, 3:53 AM

In D74094#4633785, @alexfh wrote:

This commit caused invalid AddressSanitizer: stack-use-after-scope errors in our internal setup. Trying to create a standalone repro, but so far we think that the patch is incorrect. @vitalybuka says "The patch seems wrong, callee can return a reference to temp, so lifetime should be extended to full expression."

I will lazily quote https://en.cppreference.com/w/cpp/language/lifetime, we can lookup the standard, all ask experts, if this is wrong:

All temporary objects are destroyed as the last step in evaluating the full-expression that (lexically) contains the point where they were created, and if multiple temporary objects were created, they are destroyed in the order opposite to the order of creation. This is true even if that evaluation ends in throwing an exception.

And full-expression here is:

const Tensor t3 = round(a.log().exp());

Parameter objects are not temporaries and have their own lifetime rules, so there's nothing wrong with this idea in principle. This seems to just be a bug, probably that we're doing a type check on E->getType() without considering whether E might be a gl-value. We definitely do not want to be emitting lifetime intrinsics for the referent of a reference argument just because the underlying type is an aggregate.

In D74094#4634966, @rjmccall wrote:

Parameter objects are not temporaries and have their own lifetime rules, so there's nothing wrong with this idea in principle. This seems to just be a bug, probably that we're doing a type check on E->getType() without considering whether E might be a gl-value. We definitely do not want to be emitting lifetime intrinsics for the referent of a reference argument just because the underlying type is an aggregate.

I think the issue is more so about returning a reference to one of the parameters; example:

struct foo { int x; };

// Returns a reference to the minimum foo.
foo &min(foo a, foo b);

void consume (foo&);

void bar () {
  foo a, b;
  consume(min(a, b));
}

With the current revision, we emit:

define dso_local void @_Z3barv() #0 {
entry:
  %a = alloca %struct.foo, align 4
  %b = alloca %struct.foo, align 4
  %agg.tmp = alloca %struct.foo, align 4
  %agg.tmp1 = alloca %struct.foo, align 4
  call void @llvm.lifetime.start.p0(i64 4, ptr %a) #4
  call void @llvm.lifetime.start.p0(i64 4, ptr %b) #4
  call void @llvm.lifetime.start.p0(i64 4, ptr %agg.tmp) #4
  call void @llvm.memcpy.p0.p0.i64(ptr align 4 %agg.tmp, ptr align 4 %a, i64 4, i1 false), !tbaa.struct !5
  call void @llvm.lifetime.start.p0(i64 4, ptr %agg.tmp1) #4
  call void @llvm.memcpy.p0.p0.i64(ptr align 4 %agg.tmp1, ptr align 4 %b, i64 4, i1 false), !tbaa.struct !5
  %coerce.dive = getelementptr inbounds %struct.foo, ptr %agg.tmp, i32 0, i32 0
  %0 = load i32, ptr %coerce.dive, align 4
  %coerce.dive2 = getelementptr inbounds %struct.foo, ptr %agg.tmp1, i32 0, i32 0
  %1 = load i32, ptr %coerce.dive2, align 4
  %call = call noundef nonnull align 4 dereferenceable(4) ptr @_Z3min3fooS_(i32 %0, i32 %1)
  call void @llvm.lifetime.end.p0(i64 4, ptr %agg.tmp) #4
  call void @llvm.lifetime.end.p0(i64 4, ptr %agg.tmp1) #4
  call void @_Z7consumeR3foo(ptr noundef nonnull align 4 dereferenceable(4) %call)
  call void @llvm.lifetime.end.p0(i64 4, ptr %b) #4
  call void @llvm.lifetime.end.p0(i64 4, ptr %a) #4
  ret void
}

which doesn't look correct (the lifetimes of %agg.tmp and %agg.tmp1 need to last past the call to consume, IIUC).

That makes me think we should probably do this lifetime annotation in the caller of CodeGenFunction::EmitCallArg since it will have information about the return type of the caller. If a function returns a reference to a type, we probably don't want to emit the lifetime markers for any parameter of that type. I wonder if strict aliasing needs to be considered, too?

(or am I chasing the wrong thing here?)

nickdesaulniers added inline comments.Sep 11 2023, 10:36 AM

clang/test/CodeGen/lifetime-call-temp.c
55	this check line looks broken; missing `:`

nickdesaulniers reopened this revision.Sep 11 2023, 10:50 AM

This revision is now accepted and ready to land.Sep 11 2023, 10:50 AM

nickdesaulniers planned changes to this revision.Sep 11 2023, 10:50 AM

skip optimization when returning a reference; regardless of type
fix test run line
use --check-prefixes=
reflow RUN lines

This revision is now accepted and ready to land.Sep 11 2023, 10:52 AM

Under [expr.call]p6 (https://eel.is/c++draft/expr.call#6), it is implementation-defined whether the lifetime of a parameter ends at the end of the containing full expression or at the end of the function. If the implementation chooses the latter, a reference to the parameter will dangle after the return. In your example, on your chosen target (probably x64-64?), we are actually forced by the psABI to end those lifetimes at the end of the function, because the contents of the parameters are passed in registers (rather than by implicit reference) and reconstituted in the stack frame of the callee. Therefore your min function has undefined behavior, and not just in a formal sense but actually in a very directly dangerous sense. That should not be surprising; intuitively, you should think of parameters as local variables, and therefore this code as returning a reference to a local variable, which is highly suspect even if it is sometimes — only on certain targets and for certain types — formally allowed.

That makes me think we should probably do this lifetime annotation in the caller of CodeGenFunction::EmitCallArg since it will have information about the return type of the caller. If a function returns a reference to a type, we probably don't want to emit the lifetime markers for any parameter of that type. I wonder if strict aliasing needs to be considered, too?

This is definitely barking up the wrong tree. The language specifies the lifetime of an object, and any access to the object during its lifetime is valid. If the parameter's lifetime *was* guaranteed to be the full expression here, we would be obliged to keep it alive for that full duration.

In particular, we cannot do anything with reasoning like "the return value cannot be a reference to the parameter". The right way to understand that is as an effort to invoke the as-if rule, arguing that it is impossible to reference the parameter after the call if it is not part of the return value, and therefore we can be more aggressive than the language would formally allow. But it is not true that the only way to reference a parameter after a call is if it is part of the return value. For example, the function could store a pointer to the parameter into a global variable, and as long as all accesses through that global happen before the parameter's lifetime ends, that's totally fine — it's obviously a very confusing and error-prone way to write code, but it's legal. (Also, even if the address of the parameter *could* only escape into the return value, the type analysis would have to be much more complicated than just whether it's a reference to the type of a parameter.)

In D74094#4643439, @rjmccall wrote:

Under [expr.call]p6 (https://eel.is/c++draft/expr.call#6), it is implementation-defined whether the lifetime of a parameter ends at the end of the containing full expression or at the end of the function. If the implementation chooses the latter, a reference to the parameter will dangle after the return. In your example, on your chosen target (probably x64-64?), we are actually forced by the psABI to end those lifetimes at the end of the function, because the contents of the parameters are passed in registers (rather than by implicit reference) and reconstituted in the stack frame of the callee. Therefore your min function has undefined behavior, and not just in a formal sense but actually in a very directly dangerous sense. That should not be surprising; intuitively, you should think of parameters as local variables, and therefore this code as returning a reference to a local variable, which is highly suspect even if it is sometimes — only on certain targets and for certain types — formally allowed.

That makes me think we should probably do this lifetime annotation in the caller of CodeGenFunction::EmitCallArg since it will have information about the return type of the caller. If a function returns a reference to a type, we probably don't want to emit the lifetime markers for any parameter of that type. I wonder if strict aliasing needs to be considered, too?

This is definitely barking up the wrong tree. The language specifies the lifetime of an object, and any access to the object during its lifetime is valid. If the parameter's lifetime *was* guaranteed to be the full expression here, we would be obliged to keep it alive for that full duration.

In particular, we cannot do anything with reasoning like "the return value cannot be a reference to the parameter". The right way to understand that is as an effort to invoke the as-if rule, arguing that it is impossible to reference the parameter after the call if it is not part of the return value, and therefore we can be more aggressive than the language would formally allow. But it is not true that the only way to reference a parameter after a call is if it is part of the return value. For example, the function could store a pointer to the parameter into a global variable, and as long as all accesses through that global happen before the parameter's lifetime ends, that's totally fine — it's obviously a very confusing and error-prone way to write code, but it's legal. (Also, even if the address of the parameter *could* only escape into the return value, the type analysis would have to be much more complicated than just whether it's a reference to the type of a parameter.)

Hmm...ok thanks for the feedback.

In that case then, is it your opinion that there was also a "very directly dangerous sense" of UB then in the test case for which the previous revision of this patch was reverted?

It's currently unclear to me how to go about relanding this; is there an explicit bug in e698695fbbf62e6676f8907665187f2d2c4d814b or not?

rjmccall requested changes to this revision.Sep 11 2023, 11:05 AM

akyrtzi removed a subscriber: akyrtzi.Sep 11 2023, 11:16 AM

Harbormaster completed remote builds in B256991: Diff 556461.Sep 11 2023, 11:30 AM

I can't easily tell you because the "standalone example" involves a million templates that I don't know anything about, and nobody seems to have done the analysis to identify the specific source of the miscompile.

What I can tell you is that there is an enormous semantic difference between passing a pr-value argument to a parameter that's passed by reference and to a parameter that's passed by value. In the former case, the reference is bound to a temporary with full-expression lifetime, and it is totally reasonable to return its address. In the latter case, the parameter object will often have a lifetime bound by the current function, and you cannot return its address without it being UB.

Looking closer, my previous guess that this was accidentally triggering for reference parameters seems to be wrong, because that case is filtered out earlier in the function. I'm not sure what the problem is exactly, then, unless the code indeed has UB.

In D74094#4643495, @rjmccall wrote:

I can't easily tell you because the "standalone example" involves a million templates that I don't know anything about, and nobody seems to have done the analysis to identify the specific source of the miscompile.

Sure; @alexfh @vitalybuka can you both please help produce a further reduced test case from https://reviews.llvm.org/D74094#4633799 ? Otherwise I'm beginning to think this patch may have been reverted prematurely; before an understood + reduced test case was reduced.

What I can tell you is that there is an enormous semantic difference between passing a pr-value argument to a parameter that's passed by reference and to a parameter that's passed by value. In the former case, the reference is bound to a temporary with full-expression lifetime, and it is totally reasonable to return its address. In the latter case, the parameter object will often have a lifetime bound by the current function, and you cannot return its address without it being UB.

I'm trying to come up with an example of "the former."

struct foo {
  int x;
};

// Returns a reference to the minimum foo.
foo &min(foo &a, foo &b);

void consume (foo&);

void bar () {
  consume(min(foo{0}, foo{1}));
}

doesn't compile:

foo.cpp:11:11: error: no matching function for call to 'min'
   11 |   consume(min(foo{0}, foo{1}));
      |           ^~~
foo.cpp:6:6: note: candidate function not viable: expects an lvalue for 1st argument
    6 | foo &min(foo &a, foo &b);
      |      ^   ~~~~~~

Have I misinterpreted what you meant by "passing a pr-value argument to a parameter that's passed by reference?"

Looking closer, my previous guess that this was accidentally triggering for reference parameters seems to be wrong, because that case is filtered out earlier in the function. I'm not sure what the problem is exactly, then, unless the code indeed has UB.

In which case, perhaps this was reverted prematurely.

In D74094#4643558, @nickdesaulniers wrote:

In D74094#4643495, @rjmccall wrote:

I can't easily tell you because the "standalone example" involves a million templates that I don't know anything about, and nobody seems to have done the analysis to identify the specific source of the miscompile.

Sure; @alexfh @vitalybuka can you both please help produce a further reduced test case from https://reviews.llvm.org/D74094#4633799 ? Otherwise I'm beginning to think this patch may have been reverted prematurely; before an understood + reduced test case was reduced.

Well, it's good to be conservative about adding new optimizations if they're causing apparent miscompiles. Even if the optimization is just exposing existing UB in some new one, that's something we should understand as a project. I have no problem with the eager revert.

It would be great if Alex and Vitaly could help track down the exact problem. However, if they can't, I think it's ultimately your responsibility as the person hoping to land the optimization to understand why the miscompile is not in fact your fault.

What I can tell you is that there is an enormous semantic difference between passing a pr-value argument to a parameter that's passed by reference and to a parameter that's passed by value. In the former case, the reference is bound to a temporary with full-expression lifetime, and it is totally reasonable to return its address. In the latter case, the parameter object will often have a lifetime bound by the current function, and you cannot return its address without it being UB.

I'm trying to come up with an example of "the former."
struct foo {
  int x;
};

// Returns a reference to the minimum foo.
foo &min(foo &a, foo &b);

void consume (foo&);

void bar () {
  consume(min(foo{0}, foo{1}));
}
doesn't compile:
foo.cpp:11:11: error: no matching function for call to 'min'
   11 |   consume(min(foo{0}, foo{1}));
      |           ^~~
foo.cpp:6:6: note: candidate function not viable: expects an lvalue for 1st argument
    6 | foo &min(foo &a, foo &b);
      |      ^   ~~~~~~
Have I misinterpreted what you meant by "passing a pr-value argument to a parameter that's passed by reference?"

Okay, to be frank, this is a bit of a red flag for me. Optimizations like this are ultimately rooted in language rules, and I'm not sure you can effectively do this work if you don't understand the basics of C++ parameter passing. In C++, you cannot bind a non-const l-value reference to a temporary, but you can bind either a const l-value reference or an x-value reference to one. So you could declare min as either const foo &min(const foo &a, const foo &b); or foo &&min(foo &&a, foo &&b);.

In D74094#4643577, @rjmccall wrote:

In D74094#4643558, @nickdesaulniers wrote:

In D74094#4643495, @rjmccall wrote:

I can't easily tell you because the "standalone example" involves a million templates that I don't know anything about, and nobody seems to have done the analysis to identify the specific source of the miscompile.

Sure; @alexfh @vitalybuka can you both please help produce a further reduced test case from https://reviews.llvm.org/D74094#4633799 ? Otherwise I'm beginning to think this patch may have been reverted prematurely; before an understood + reduced test case was reduced.

Well, it's good to be conservative about adding new optimizations if they're causing apparent miscompiles. Even if the optimization is just exposing existing UB in some new one, that's something we should understand as a project. I have no problem with the eager revert.

It would be great if Alex and Vitaly could help track down the exact problem. However, if they can't, I think it's ultimately your responsibility as the person hoping to land the optimization to understand why the miscompile is not in fact your fault.

I've cloned:

abseil @ f3eae68bd1d17df708f2b59a43ad7e837a616e6a
eigen @ 8f858a4ea8106b52f3cf475ffc5e0e9c6a91baa2
googletest @ eab0e7e289db13eabfc246809b0284dac02a369d

I had to build googletest. Then I built a.out via:

$ clang++ foo.cpp -I eigen/ -I googletest/googletest/include/ \
  -I googletest/googlemock/include/ -I abseil-cpp googletest/build/lib/libgtest.a \
  -std=c++17 -O3 -fsanitize=address -march=haswell

With this change re-applied (b7f4915644844fb9f32e8763922a070f5fe4fd29 reverted), a.out runs without issue. So yes, I need something other than a godbolt link which uses clang-trunk which isn't stable after a revert to understand what they are talking about.

What I can tell you is that there is an enormous semantic difference between passing a pr-value argument to a parameter that's passed by reference and to a parameter that's passed by value. In the former case, the reference is bound to a temporary with full-expression lifetime, and it is totally reasonable to return its address. In the latter case, the parameter object will often have a lifetime bound by the current function, and you cannot return its address without it being UB.

I'm trying to come up with an example of "the former."
struct foo {
  int x;
};

// Returns a reference to the minimum foo.
foo &min(foo &a, foo &b);

void consume (foo&);

void bar () {
  consume(min(foo{0}, foo{1}));
}
doesn't compile:
foo.cpp:11:11: error: no matching function for call to 'min'
   11 |   consume(min(foo{0}, foo{1}));
      |           ^~~
foo.cpp:6:6: note: candidate function not viable: expects an lvalue for 1st argument
    6 | foo &min(foo &a, foo &b);
      |      ^   ~~~~~~
Have I misinterpreted what you meant by "passing a pr-value argument to a parameter that's passed by reference?"
Okay, to be frank, this is a bit of a red flag for me. Optimizations like this are ultimately rooted in language rules, and I'm not sure you can effectively do this work if you don't understand the basics of C++ parameter passing.

Fuck me for trying to help, right? If x-values are part of the "basics" of parameter passing, I'm afraid to ask about the more complicated cases.

In C++, you cannot bind a non-const l-value reference to a temporary, but you can bind either a const l-value reference or an x-value reference to one. So you could declare min as either const foo &min(const foo &a, const foo &b); or foo &&min(foo &&a, foo &&b);.

Thanks, that's all I needed.

With b7f4915644844fb9f32e8763922a070f5fe4fd29 reverted, neither of those cases look fishy to me:
https://gist.github.com/nickdesaulniers/1c7b962222639e54c3339b63b572b166

Fuck me for trying to help, right? If x-values are part of the "basics" of parameter passing, I'm afraid to ask about the more complicated cases.

I can see how my response was somewhat aggressive, and I regret that and apologize. I imagine you're probably approaching this from the perspective of general optimization and are unhappy that you're getting bogged down in all this C++ minutiae. It is the nature of high-level optimization, though, that you often have to learn a lot of language details in order to get your job done. I'm happy to teach some of these concepts during review — I wouldn't expect more than, like, twenty people on the planet to know the differences between parameter and temporary lifetimes off-hand, and I understand that non-experts in C++ aren't going to know all these parameter initialization rules by heart. But I am surprised at some of the things we've had to cover, like that it's not generally okay for a function to return a reference to a by-value parameter. I am volunteering my time to try to help you with your goal of landing this patch, and it can be a little frustrating to spend that time on this stuff.

If you're able to build Alex's test case, I would suggest diffing the IR output both with and without your original patch. -O0 output is probably fine for this and will be a lot easier to track back to the IRGen code responsible for it. You may see something that immediately stands out as wrong, but at the very least, it'll tell us what's changed.

timcorringham added a subscriber: timcorringham.Sep 12 2023, 8:45 AM

In D74094#4643653, @nickdesaulniers wrote:

With this change re-applied (b7f4915644844fb9f32e8763922a070f5fe4fd29 reverted), a.out runs without issue. So yes, I need something other than a godbolt link which uses clang-trunk which isn't stable after a revert to understand what they are talking about.

Slight update here, I'm still working on this (slowly), but I now have a reproducer in hand.

// clang++ foo.cpp -I eigen -fsanitize=address  -std=c++17 -O3 -march=haswell

#include <unsupported/Eigen/CXX11/Tensor>

using Tensor = Eigen::TensorFixedSize<float, Eigen::Sizes<2, 2>, Eigen::RowMajor, Eigen::DenseIndex>;

auto round (Tensor m) {
    return (m + 0.5f).cast<int>().cast<float>();
}

int main() {
  Tensor a;
  a.setValues({{1, 2}, {3, 4}});

  const Tensor t3 = round(a.log().exp());
  return t3(0, 0);
}

with eigen @ 8f858a4ea810, and llvm @ 102715a60ee4 with b7f491564484 reverted (i.e. this commit re-applied a second time).

I don't yet fully comprehend yet what's going wrong, and probably need to familiarize myself with the language rules around auto's type deduction.

This can probably be reduced further as well; I'll try to preprocess it and run it through cvise to see if we can get something even more concise.

I'll also check the codegen differences from removing null -fsanitize= option (and perhaps -march=haswell, though I'd guess eigen is doing something with that deep inside).

In D74094#4645562, @nickdesaulniers wrote:

I don't yet fully comprehend yet what's going wrong, and probably need to familiarize myself with the language rules around auto's type deduction.

For reduction purposes, it might be useful to factor out the auto type deduction on the return. I think you can do that with the help of -Xclang -ast-dump. E.g., for this function:

auto f() { return 1; };

the AST dump tells me the return type is int:

|-FunctionDecl 0x1508e5698 <t.cpp:1:1, col:22> col:6 f 'int ()'
| `-CompoundStmt 0x1508e5928 <col:10, col:22>
|   `-ReturnStmt 0x1508e5918 <col:12, col:19>
|     `-IntegerLiteral 0x1508e5780 <col:19> 'int' 1

You could then replace the auto with the type given in the dump.

In D74094#4645998, @dexonsmith wrote:

In D74094#4645562, @nickdesaulniers wrote:

I don't yet fully comprehend yet what's going wrong, and probably need to familiarize myself with the language rules around auto's type deduction.

For reduction purposes, it might be useful to factor out the auto type deduction on the return. I think you can do that with the help of -Xclang -ast-dump.

Heh, yeah I tried that, and hit an assertion before we were able to dump the full AST.

| | |     | `-TypedefDecl 0x7f4a5cf03fb0 <line:72:3, line:73:106> col:106 referenced SegmentWrapper 'std::conditional_t<CanAlign, Ref<const Matrix<Scalar, Dynamic, 1, 0, blockSize, 1>, internal::evaluator<VectorTypeCopyClean>::Alignment>, typename VectorTypeCopyClean::ConstSegmentReturnType>':'clang-18: /android0/llvm-project/clang/lib/AST/NestedNameSpecifier.cpp:309: void clang::NestedNameSpecifier::print(raw_ostream &, const PrintingPolicy &, bool) const: Assertion `!isa<ElaboratedType>(T) && "Elaborated type in nested-name-specifier"' failed.

2.75 hours of cvise later: https://github.com/llvm/llvm-project/issues/43179#issuecomment-1719922669

Can probably either rebuild with assertions disabled, or put a clearly wrong type and have the compiler tell me what it was. Doing the latter:

const std::conditional_t<internal::is_same<float, CoeffReturnType>::value, TensorConversionOp<int, const TensorCwiseUnaryOp<bind2nd_op<scalar_sum_op<float, float>>, const TensorFixedSize<float, Sizes<2, 2>, 1, long>>>, TensorConversionOp<float, const TensorConversionOp<int, const TensorCwiseUnaryOp<bind2nd_op<scalar_sum_op<float, float>>, const TensorFixedSize<float, Sizes<2, 2>, 1, long>>>>>' (aka 'const Eigen::TensorConversionOp<float, const Eigen::TensorConversionOp<int, const Eigen::TensorCwiseUnaryOp<Eigen::internal::bind2nd_op<Eigen::internal::scalar_sum_op<float>>, const Eigen::TensorFixedSize<float, Eigen::Sizes<2, 2>, 1>>>>

In D74094#4646152, @nickdesaulniers wrote:
In D74094#4645998, @dexonsmith wrote:

In D74094#4645562, @nickdesaulniers wrote:

I don't yet fully comprehend yet what's going wrong, and probably need to familiarize myself with the language rules around auto's type deduction.

For reduction purposes, it might be useful to factor out the auto type deduction on the return. I think you can do that with the help of -Xclang -ast-dump.

Heh, yeah I tried that, and hit an assertion before we were able to dump the full AST.
| | |     | `-TypedefDecl 0x7f4a5cf03fb0 <line:72:3, line:73:106> col:106 referenced SegmentWrapper 'std::conditional_t<CanAlign, Ref<const Matrix<Scalar, Dynamic, 1, 0, blockSize, 1>, internal::evaluator<VectorTypeCopyClean>::Alignment>, typename VectorTypeCopyClean::ConstSegmentReturnType>':'clang-18: /android0/llvm-project/clang/lib/AST/NestedNameSpecifier.cpp:309: void clang::NestedNameSpecifier::print(raw_ostream &, const PrintingPolicy &, bool) const: Assertion `!isa<ElaboratedType>(T) && "Elaborated type in nested-name-specifier"' failed.

That can be worked around by adding -Xclang -ast-dump-filter=<function>, in this case -Xclang -ast-dump -Xclang -ast-dump-filter=round:
https://gist.github.com/nickdesaulniers/4ea4b46f112f1b3bc440020f3a2a01e1

No real comment on the issue itself but on the example as a former Eigen maintainer (sorry for the noise if that's all obvious for you):

auto round (Tensor m) {
    return (m + 0.5f).cast<int>().cast<float>();
}

does not return a Tensor but an expression encoding the computation which is evaluated during the assignment const Tensor t3 = round(a.log().exp()); using an overloaded operator=. With "evaluation" I mean evaluation the in the sense of performing the intended mathematical computation.

Many things can go wrong when using auto in such cases, of which two seem to be relevant here:

Eigen can (this is an implementation detail(!)) decide to evaluate subexpressions into temporaries. The returned expression would then reference the result of such an evaluation beyond its lifetime.
If 1. does not happen, the unevaluated expression would reference its arguments. The immediate 0.5f is copied since that's cheap, but the tensor would be referenced. Your example function round takes its parameter by-value and the returned expression would reference it. I'm unsure if the lifetime would be extended in this case (again, the exact details of C++ lifetime rules are not my area of expertise). I think if the lifetime would be extended, ASAN complaining after applying this patch is wrong, if the lifetime is not extended ASAN should complain and the example is wrong.

Btw, these issues are so common that Eigen documents the recommendation to avoid using auto with Eigen types all together: https://eigen.tuxfamily.org/dox/TopicPitfalls.html#title3

In D74094#4646297, @tellenbach wrote:
No real comment on the issue itself but on the example as a former Eigen maintainer (sorry for the noise if that's all obvious for you):
auto round (Tensor m) {
    return (m + 0.5f).cast<int>().cast<float>();
}
does not return a Tensor but an expression encoding the computation which is evaluated during the assignment const Tensor t3 = round(a.log().exp()); using an overloaded operator=. With "evaluation" I mean evaluation the in the sense of performing the intended mathematical computation.

Many things can go wrong when using auto in such cases, of which two seem to be relevant here:

Eigen can (this is an implementation detail(!)) decide to evaluate subexpressions into temporaries. The returned expression would then reference the result of such an evaluation beyond its lifetime.

If 1. does not happen, the unevaluated expression would reference its arguments. The immediate 0.5f is copied since that's cheap, but the tensor would be referenced. Your example function round takes its parameter by-value and the returned expression would reference it. I'm unsure if the lifetime would be extended in this case (again, the exact details of C++ lifetime rules are not my area of expertise). I think if the lifetime would be extended, ASAN complaining after applying this patch is wrong, if the lifetime is not extended ASAN should complain and the example is wrong.

Btw, these issues are so common that Eigen documents the recommendation to avoid using auto with Eigen types all together: https://eigen.tuxfamily.org/dox/TopicPitfalls.html#title3

Okay, thanks, I can see how that works, and I definitely would never had figured that out from just looking at this code. The formal lifetime of a parameter of type Tensor is definitely in the land of implementation-defined behavior, so this code seems to be at best non-portable.

Nick, maybe we can take a new look at this patch from that perspective. You've been trying to use very tight lifetime bounds for these objects that only cover the duration of call, which essentially means you're defining the lifetime of parameter objects to just be the call rather than the full-expression (at least, when the parameter type doesn't have a destructor). In the abstract, that's probably the right rule for Clang to adopt, because I think it matches programmer intuition (it's always wrong to return the address of a local, and that includes by-value parameters), and it means we'd have a consistent rule for all types and targets. It's also a fairly aggressive rule that's likely to uncover a fair amount of code like this that's assuming longer lifetimes. I think it's reasonable to say that these examples are all program bugs that should be fixed, but it might be better to pursue that as a *refinement* on top of an initially more conservative rule. I suspect that that conservative rule will also give us a large fraction of the optimization benefit that we can expect from this change, because at least parameter objects from calls in different full-expressions will be able to share memory. So we can start by giving these objects full-expression lifetime, chase down any program bugs that that uncovers (which will all *unquestionably* be program bugs under the standard), and then gradually work on landing the more aggressive rule (perhaps even for non-trivial types).

So we can start by giving these objects full-expression lifetime, chase down any program bugs that that uncovers (which will all *unquestionably* be program bugs under the standard), and then gradually work on landing the more aggressive rule (perhaps even for non-trivial types).

maybe provide a new flag to control this behaviour?

In D74094#4646975, @xbolva00 wrote:

So we can start by giving these objects full-expression lifetime, chase down any program bugs that that uncovers (which will all *unquestionably* be program bugs under the standard), and then gradually work on landing the more aggressive rule (perhaps even for non-trivial types).

maybe provide a new flag to control this behaviour?

You mean for when we start limiting lifetimes to the call rather than the full-expression? I don't know that we want such a flag in the long term, but having at least a -cc1 flag is usually helpful when we're bringing up new/stricter optimizations, yeah.

Yes, cc1 flag would be useful.

(Sorry for the wall of text)

In D74094#4646297, @tellenbach wrote:

No real comment on the issue itself but on the example as a former Eigen maintainer (sorry for the noise if that's all obvious for you):

Woah! Deus Ex Machina?

auto round (Tensor m) {
    return (m + 0.5f).cast<int>().cast<float>();
}
does not return a Tensor but an expression encoding the computation which is evaluated during the assignment const Tensor t3 = round(a.log().exp()); using an overloaded operator=. With "evaluation" I mean evaluation the in the sense of performing the intended mathematical computation.

Many things can go wrong when using auto in such cases, of which two seem to be relevant here:

Eigen can (this is an implementation detail(!)) decide to evaluate subexpressions into temporaries. The returned expression would then reference the result of such an evaluation beyond its lifetime.

If 1. does not happen, the unevaluated expression would reference its arguments. The immediate 0.5f is copied since that's cheap, but the tensor would be referenced. Your example function round takes its parameter by-value and the returned expression would reference it. I'm unsure if the lifetime would be extended in this case (again, the exact details of C++ lifetime rules are not my area of expertise). I think if the lifetime would be extended, ASAN complaining after applying this patch is wrong, if the lifetime is not extended ASAN should complain and the example is wrong.

Btw, these issues are so common that Eigen documents the recommendation to avoid using auto with Eigen types all together: https://eigen.tuxfamily.org/dox/TopicPitfalls.html#title3

Thanks for the info, https://eigen.tuxfamily.org/dox/TopicPitfalls.html#title3 is quite helpful. That helps explain why the return value was deduced to a const Eigen::TensorConversionOp<..... Sounds like this is some kind of lazy evaluation?

Of note, I modified the reduced test case to use a static function with auto return type. But the code as written originally used a lambda!

auto round = [](Tensor m) {
  return (m + 0.5f).cast<int>().cast<float>();
}

So the original author was well intentioned; generally you want to use auto rather than the full type of the lambda. But the author may not have been aware of the advice or hazards described by https://eigen.tuxfamily.org/dox/TopicPitfalls.html#title3. Our fix which we've committed is to use the -> for the return type of the lambda.

-auto round = [](Tensor m) {
+auto round = [](Tensor m) -> Tensor {
   return (m + 0.5f).cast<int>().cast<float>();
 }

Which I think is inline wrt. the advice in https://eigen.tuxfamily.org/dox/TopicPitfalls.html#title3. (i.e. evaluate the expression so that we don't have some sort of partially evaluated calculation containing references to temporaries). So I think the code in question (that we reverted this over) was just wrong.

One thing that's problematic is that neither clang or gcc do a great job with -Wreturn-stack-address (clang) or -Wreturn-local-addr (GCC): https://godbolt.org/z/6oq5nKnY7 Clang only spots obviously incorrect examples; GCC depends on optimization level.

I wonder if there's anything more we can do to help spot these at compile-time during semantic analysis. It seems for more complex types, it could even be very beneficial. The examples in eigen come to mind (could we catch those, for example?)

In D74094#4646954, @rjmccall wrote:

Okay, thanks, I can see how that works, and I definitely would never had figured that out from just looking at this code. The formal lifetime of a parameter of type Tensor is definitely in the land of implementation-defined behavior, so this code seems to be at best non-portable.

Right, so that was my first question here; sounds like the optimization is correct wrt. the example that broke. Though I still could not tell you precisely why. Perhaps, "round returns an object, but one that contains references to temporary variables`."

Nick, maybe we can take a new look at this patch from that perspective. You've been trying to use very tight lifetime bounds for these objects that only cover the duration of call, which essentially means you're defining the lifetime of parameter objects to just be the call rather than the full-expression (at least, when the parameter type doesn't have a destructor). In the abstract, that's probably the right rule for Clang to adopt, because I think it matches programmer intuition (it's always wrong to return the address of a local, and that includes by-value parameters), and it means we'd have a consistent rule for all types and targets. It's also a fairly aggressive rule that's likely to uncover a fair amount of code like this that's assuming longer lifetimes. I think it's reasonable to say that these examples are all program bugs that should be fixed, but it might be better to pursue that as a *refinement* on top of an initially more conservative rule. I suspect that that conservative rule will also give us a large fraction of the optimization benefit that we can expect from this change, because at least parameter objects from calls in different full-expressions will be able to share memory. So we can start by giving these objects full-expression lifetime, chase down any program bugs that that uncovers (which will all *unquestionably* be program bugs under the standard), and then gradually work on landing the more aggressive rule (perhaps even for non-trivial types).

Let me see if I can picture how the lifetimes are expected to work, for my own sake. Let's say we have functions foo, bar, and baz that all take a struct widget by value and return a struct widget. Then let's say we have code like:

struct widget { long x, y, z, w; };

widget foo(widget);
widget bar(widget);
widget baz(widget);

void quux () {
    widget w = foo(bar(baz({ .x = 42, })));
}

With b7f4915644844fb9f32e8763922a070f5fe4fd29 reverted, we'd get:

define dso_local void @_Z4quuxv() local_unnamed_addr #0 {
entry:
  %w = alloca %struct.widget, align 8
  %agg.tmp = alloca %struct.widget, align 8
  %agg.tmp1 = alloca %struct.widget, align 8
  %agg.tmp2 = alloca %struct.widget, align 8
  call void @llvm.lifetime.start.p0(i64 32, ptr nonnull %w) #4
  call void @llvm.lifetime.start.p0(i64 32, ptr nonnull %agg.tmp) #4
  call void @llvm.lifetime.start.p0(i64 32, ptr nonnull %agg.tmp1) #4
  call void @llvm.lifetime.start.p0(i64 32, ptr nonnull %agg.tmp2) #4
  call void @llvm.memset.p0.i64(ptr noundef nonnull align 8 dereferenceable(32) %agg.tmp2, i8 0, i64 32, i1 false)
  store i64 42, ptr %agg.tmp2, align 8, !tbaa !5
  call void @_Z3baz6widget(ptr nonnull sret(%struct.widget) align 8 %agg.tmp1, ptr noundef nonnull byval(%struct.widget) align 8 %agg.tmp2)
  call void @llvm.lifetime.end.p0(i64 32, ptr nonnull %agg.tmp2) #4
  call void @_Z3bar6widget(ptr nonnull sret(%struct.widget) align 8 %agg.tmp, ptr noundef nonnull byval(%struct.widget) align 8 %agg.tmp1)
  call void @llvm.lifetime.end.p0(i64 32, ptr nonnull %agg.tmp1) #4
  call void @_Z3foo6widget(ptr nonnull sret(%struct.widget) align 8 %w, ptr noundef nonnull byval(%struct.widget) align 8 %agg.tmp)
  call void @llvm.lifetime.end.p0(i64 32, ptr nonnull %agg.tmp) #4
  call void @llvm.lifetime.end.p0(i64 32, ptr nonnull %w) #4
  ret void
}

which is very aggressive; each temporary is marked dead for the next non-intrinsic call. This looks like it would be optimal. IIUC, @rjmccall 's proposal is instead to do something a la:

define dso_local void @_Z4quuxv() local_unnamed_addr #0 {
entry:
  %w = alloca %struct.widget, align 8
  %agg.tmp = alloca %struct.widget, align 8
  %agg.tmp1 = alloca %struct.widget, align 8
  %agg.tmp2 = alloca %struct.widget, align 8
  call void @llvm.lifetime.start.p0(i64 32, ptr nonnull %w) #4
  call void @llvm.lifetime.start.p0(i64 32, ptr nonnull %agg.tmp) #4
  call void @llvm.lifetime.start.p0(i64 32, ptr nonnull %agg.tmp1) #4
  call void @llvm.lifetime.start.p0(i64 32, ptr nonnull %agg.tmp2) #4
  call void @llvm.memset.p0.i64(ptr noundef nonnull align 8 dereferenceable(32) %agg.tmp2, i8 0, i64 32, i1 false)
  store i64 42, ptr %agg.tmp2, align 8, !tbaa !5
  call void @_Z3baz6widget(ptr nonnull sret(%struct.widget) align 8 %agg.tmp1, ptr noundef nonnull byval(%struct.widget) align 8 %agg.tmp2)
  call void @_Z3bar6widget(ptr nonnull sret(%struct.widget) align 8 %agg.tmp, ptr noundef nonnull byval(%struct.widget) align 8 %agg.tmp1)
  call void @_Z3foo6widget(ptr nonnull sret(%struct.widget) align 8 %w, ptr noundef nonnull byval(%struct.widget) align 8 %agg.tmp)
  call void @llvm.lifetime.end.p0(i64 32, ptr nonnull %agg.tmp2) #4
  call void @llvm.lifetime.end.p0(i64 32, ptr nonnull %agg.tmp1) #4
  call void @llvm.lifetime.end.p0(i64 32, ptr nonnull %agg.tmp) #4
  call void @llvm.lifetime.end.p0(i64 32, ptr nonnull %w) #4
  ret void
}

i.e. to have a less aggressive optimization where the lifetimes extend to the full expression, and the temporaries would not be able to share stack slots. That's better than the status quo (no lifetime markers at all) but still suboptimal, and I don't think it will solve the particular case I care about (a particular Linux kernel driver written in C which is passing many aggregates by value).

Do I want to create a flag for that? Not particularly; it does mean supporting BOTH codegen patterns; I'd rather have a flag to control the optimization or not do it at all rather than support 2 different optimizations. I'm also curious whether you'd expect the default value of the flag to be on (more aggressive) or not (less aggressive)? I don't want to have to add -Xclang (or -mllvm) flags to my build system to opt into secret optimizations. If folks find that this optimization breaks their incorrect code, they can use the already existing`-Xclang -disable-lifetime-markers` to opt out of optimizations (though more optimizations than just those introduced by this change). I like that pattern better, since using those flags is itself a red flag to be scrutinized during code review. Then again GCC does have -fconserve-stack that clang does not; we could put more aggressive stack saving optimizations under that, though IME in GCC that flag limits inline substitution to a magic per-target-architecture number of bytes. Or hide it under -O3 perhaps?

At the least, a release note to at least acknowledge that "you're not crazy; clang-18 is now more precise about minimizing the lifetime of temporaries (since the C++ spec says it's implementation defined) in order to reduce stack usage. ASAN is your friend. If the code works under -Xclang -disable-lifetime-markers then this is likely the culprit." would be an improvement if we were to reland this.

Personally, I feel that "we found a buggy example of code we broke, we fixed it. Let's reland it and tell people to be careful when upgrading." But I haven't been around long enough to know what's the precedent here for "aggressive optimizations." Do you feel strongly that we should not just reland this, @rjmccall (or anyone else)?

Converting the above widget example to C, I noticed that we're also missing the lifetime markers on compound literals!

struct widget { long x, y, z, w; };

struct widget foo(struct widget);
struct widget bar(struct widget);
struct widget baz(struct widget);

void quux () {
    struct widget w = foo(bar(baz((struct widget){ .x = 42, })));
}

define dso_local void @quux() local_unnamed_addr #0 {
entry:
  %w = alloca %struct.widget, align 8
  %agg.tmp = alloca %struct.widget, align 8
  %agg.tmp1 = alloca %struct.widget, align 8
  %.compoundliteral = alloca %struct.widget, align 8    ; <--- NO LIFETIME MARKERS :(
  call void @llvm.lifetime.start.p0(i64 32, ptr nonnull %w) #4
  call void @llvm.lifetime.start.p0(i64 32, ptr nonnull %agg.tmp) #4
  call void @llvm.lifetime.start.p0(i64 32, ptr nonnull %agg.tmp1) #4
  call void @llvm.memset.p0.i64(ptr noundef nonnull align 8 dereferenceable(32) %.compoundliteral, i8 0, i64 32, i1 false)
  store i64 42, ptr %.compoundliteral, align 8, !tbaa !5
  call void @baz(ptr nonnull sret(%struct.widget) align 8 %agg.tmp1, ptr noundef nonnull byval(%struct.widget) align 8 %.compoundliteral) #4
  call void @bar(ptr nonnull sret(%struct.widget) align 8 %agg.tmp, ptr noundef nonnull byval(%struct.widget) align 8 %agg.tmp1) #4
  call void @llvm.lifetime.end.p0(i64 32, ptr nonnull %agg.tmp1) #4
  call void @foo(ptr nonnull sret(%struct.widget) align 8 %w, ptr noundef nonnull byval(%struct.widget) align 8 %agg.tmp) #4
  call void @llvm.lifetime.end.p0(i64 32, ptr nonnull %agg.tmp) #4
  call void @llvm.lifetime.end.p0(i64 32, ptr nonnull %w) #4
  ret void
}

One thing I noticed, simply printing the expression when we were about to add the lifetime cleanup; I noticed some examples would trigger on CXXConstructExpr, CXXTemporaryObjectExpr, and CXXOperatorCallExpr. Those feel very C++ specific; I do wonder if we could do something differently for C, but I think we should not since as @rjmccall said "we'd have a consistent rule for all types and targets" which IMO is nicer+simpler.

I tried to write a verifier check to detect when a use was dominated by a call to the lifetime end intrinsic. I don't think that's what's going wrong here, but my initial hypothesis was that we were somehow getting this wrong in codegen, which would lead to bugs.

This failed on multiple examples in tree; llvm/test/CodeGen/AArch64/stack-tagging.ll has some examples that surprised me. Such as values with lifetimes that restart after being ended:

call void @llvm.lifetime.start.p0(i64 4, ptr nonnull %x)
call void @use8(ptr %x) #3
call void @llvm.lifetime.end.p0(i64 4, ptr nonnull %x)

call void @llvm.lifetime.start.p0(i64 4, ptr nonnull %x)
call void @use8(ptr %x) #3
call void @llvm.lifetime.end.p0(i64 4, ptr nonnull %x)

which makes it so that we can't simply use a single dominance check to check for uses after lifetime has ended (though I guess this is just a kind of bounds check; maybe if the use is then additionally dominated by a lifetime start and the start is dominated by the end then we're "resurrected."). Perhaps a separate discussion orthogonal to this bug, but after reading

another question unresolved in my mind is "why do we even need intrinsics for lifetime? why can't we use the initial use as the implicit lifetime start, and the final use as the implicit lifetime end?" I'm certain this is a naive question that probably depends on the semantics of a specific language frontend in particular (or even ABI; perhaps that's why the C++ spec mentions "implementation defined." I can't recall where I saw this, but for some reason I suspect the ABI used for Microsoft platforms differs here?). I also don't understand under what circumstances that something is marked dead can it later be "brought back to life/resurrected." Why is that a thing?

That's better than the status quo (no lifetime markers at all) but still suboptimal, and I don't think it will solve the particular case I care about (a particular Linux kernel driver written in C which is passing many aggregates by value).

Ah, all in the same full-expression? Then yeah, my idea wouldn't be good enough to optimize your situation.

Do I want to create a flag for that? Not particularly; it does mean supporting BOTH codegen patterns; I'd rather have a flag to control the optimization or not do it at all rather than support 2 different optimizations.

The received wisdom here is that it's very helpful to have a flag because it lets users test the effect of different changes independently. Users mostly don't live on trunk, and LLVM releases are roughly every six months, so when users get a new compiler it's going to have at least six months of accumulated changes in it. If a project breaks with a new compiler, it's really great to be able to isolate which change is causing the problem without having to rebuild the compiler.

The fact that we have dynamic checking of this from ASan does help a lot, though.

Personally, I feel that "we found a buggy example of code we broke, we fixed it. Let's reland it and tell people to be careful when upgrading." But I haven't been around long enough to know what's the precedent here for "aggressive optimizations." Do you feel strongly that we should not just reland this, @rjmccall (or anyone else)?

That several things broke immediately after commit does suggest that there are quite a bit more project bugs out there waiting to blow up. I think having a dedicated flag for just this optimization is probably a good idea in the short term.

One thing I noticed, simply printing the expression when we were about to add the lifetime cleanup; I noticed some examples would trigger on CXXConstructExpr, CXXTemporaryObjectExpr, and CXXOperatorCallExpr. Those feel very C++ specific; I do wonder if we could do something differently for C, but I think we should not since as @rjmccall said "we'd have a consistent rule for all types and targets" which IMO is nicer+simpler.

Well, if the language rules are different, they're different. (More on that in a second.) As far as I know, the lifetime of a parameter variable in C is never specified as outliving the call, and the entire concept of a full-expression is a C++ism. We could do something more aggressive in C.

Converting the above widget example to C, I noticed that we're also missing the lifetime markers on compound literals!

Yeah, so this is probably because the lifetime of a compound literal is somewhat surprising in C: it's actually the containing block scope. We could emit lifetime markers, but we'd have to make sure we emitted them in the right place. In pure C, I think this is only detectable if the user actually takes the address of the literal; unlike C++, C doesn't implicitly expose the address of temporaries or have destructors that allow their lifetime to be directly tested. But in ObjC, you could have a compound literal with weak/strong references in it, which arguably are required to be destroyed at the close of the enclosing block.

In D74094#4648319, @rjmccall wrote:

That's better than the status quo (no lifetime markers at all) but still suboptimal, and I don't think it will solve the particular case I care about (a particular Linux kernel driver written in C which is passing many aggregates by value).

Ah, all in the same full-expression? Then yeah, my idea wouldn't be good enough to optimize your situation.

Yeah, here's an example:
https://github.com/torvalds/linux/blob/2cf0f715623872823a72e451243bbf555d10d032/drivers/gpu/drm/amd/display/dc/dml/calcs/dce_calcs.c#L564-L570
These nested function calls take a struct by value and return the same type. The struct contains a single unsigned long long. When targeting 32b ARM, we're kind of hamstrung (I think) by https://github.com/ARM-software/abi-aa/blob/main/aapcs32/aapcs32.rst#result-return; clang will "unwrap" this aggregate for other targets, but not 32b ARM, I think because of that final bullet point about composite types. Though, reading the next section https://github.com/ARM-software/abi-aa/blob/main/aapcs32/aapcs32.rst#parameter-passing, I suspect clang may also be missing out the logic referring to "NCRN." Perhaps LLVM performs that later, but by not unwrapping the parameters, the omission of lifetime markers is painful.

I'm certain I could rewrite this code to not use a 1-element aggregate, but I think this is a common case of discrepancies where GCC does much better than clang in terms of stack usage. D74094 causes a 5x reduction in stack usage for this function in particular for 32b arm targets, so I'd rather target the root issue here which I think is clang.

Do I want to create a flag for that? Not particularly; it does mean supporting BOTH codegen patterns; I'd rather have a flag to control the optimization or not do it at all rather than support 2 different optimizations.

The received wisdom here is that it's very helpful to have a flag because it lets users test the effect of different changes independently. Users mostly don't live on trunk, and LLVM releases are roughly every six months, so when users get a new compiler it's going to have at least six months of accumulated changes in it. If a project breaks with a new compiler, it's really great to be able to isolate which change is causing the problem without having to rebuild the compiler.

The fact that we have dynamic checking of this from ASan does help a lot, though.

Ok, I will add that, and a release note about it and that ASan can help detect this as well.

One thing I noticed, simply printing the expression when we were about to add the lifetime cleanup; I noticed some examples would trigger on CXXConstructExpr, CXXTemporaryObjectExpr, and CXXOperatorCallExpr. Those feel very C++ specific; I do wonder if we could do something differently for C, but I think we should not since as @rjmccall said "we'd have a consistent rule for all types and targets" which IMO is nicer+simpler.

Well, if the language rules are different, they're different. (More on that in a second.) As far as I know, the lifetime of a parameter variable in C is never specified as outliving the call, and the entire concept of a full-expression is a C++ism. We could do something more aggressive in C.

I could only find language in the latest draft about lifetime of a parameter, nothing about lifetime of arguments. See footnote 206 in https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3096.pdf pdf page 178.

Converting the above widget example to C, I noticed that we're also missing the lifetime markers on compound literals!

Yeah, so this is probably because the lifetime of a compound literal is somewhat surprising in C: it's actually the containing block scope. We could emit lifetime markers, but we'd have to make sure we emitted them in the right place. In pure C, I think this is only detectable if the user actually takes the address of the literal; unlike C++, C doesn't implicitly expose the address of temporaries or have destructors that allow their lifetime to be directly tested. But in ObjC, you could have a compound literal with weak/strong references in it, which arguably are required to be destroyed at the close of the enclosing block.

Ack; will track that as a separate TODO.

undo ReturnsReference checks, add -Xclang -sloppy-temporary-lifetimes, add test for that, add release note

nickdesaulniers edited the summary of this revision. (Show Details)Sep 20 2023, 9:41 AM

Herald added a subscriber: wangpc. · View Herald TranscriptSep 20 2023, 9:41 AM

re-add sentence I accidentally dropped from the release note

git clang-format HEAD~

Harbormaster completed remote builds in B257460: Diff 557122.Sep 20 2023, 11:01 AM

Revision Contents

Path

Size

clang/

docs/

ReleaseNotes.rst

9 lines

include/

clang/

Basic/

CodeGenOptions.def

4 lines

Driver/

Options.td

3 lines

lib/

CodeGen/

CGCall.h

20 lines

CGCall.cpp

30 lines

test/

CodeGen/

lifetime-call-temp.c

93 lines

CodeGenCXX/

amdgcn-call-with-aggarg.cpp

19 lines

stack-reuse-miscompile.cpp

4 lines

CodeGenCoroutines/

pr59181.cpp

3 lines

Diff 557122

clang/docs/ReleaseNotes.rst

	Show All 34 Lines
	============================			============================
	These changes are ones which we think may surprise users when upgrading to			These changes are ones which we think may surprise users when upgrading to
	Clang \|release\| because of the opportunity they pose for disruption to existing			Clang \|release\| because of the opportunity they pose for disruption to existing
	code bases.			code bases.


	C/C++ Language Potentially Breaking Changes			C/C++ Language Potentially Breaking Changes
	-------------------------------------------			-------------------------------------------
				- Clang is now more precise with regards to the lifetime of temporary objects
				such as when aggregates are passed by value to a function, resulting in
				better sharing of stack slots and reduced stack usage. This change can lead
				to use-after-scope related issues in code that unintentionally relied on the
				previous behavior. If recompiling with ``-fsanitize=address`` shows a
				use-after-scope warning, then this is likely the case, and the report printed
				should be able to help users pinpoint where the use-after-scope is occurring.
				Users can use ``-Xclang -sloppy-temporary-lifetimes`` to retain the old
				behavior until they are able to find and resolve issues in their code.

	C++ Specific Potentially Breaking Changes			C++ Specific Potentially Breaking Changes
	-----------------------------------------			-----------------------------------------

	ABI Changes in This Version			ABI Changes in This Version
	---------------------------			---------------------------
	- Following the SystemV ABI for x86-64, ``__int128`` arguments will no longer			- Following the SystemV ABI for x86-64, ``__int128`` arguments will no longer
	be split between a register and a stack slot.			be split between a register and a stack slot.
	▲ Show 20 Lines • Show All 379 Lines • Show Last 20 Lines

clang/include/clang/Basic/CodeGenOptions.def

	Show First 20 Lines • Show All 508 Lines • ▼ Show 20 Lines
	/// Whether to zero out caller-used registers before returning.			/// Whether to zero out caller-used registers before returning.
	ENUM_CODEGENOPT(ZeroCallUsedRegs, llvm::ZeroCallUsedRegs::ZeroCallUsedRegsKind,			ENUM_CODEGENOPT(ZeroCallUsedRegs, llvm::ZeroCallUsedRegs::ZeroCallUsedRegsKind,
	5, llvm::ZeroCallUsedRegs::ZeroCallUsedRegsKind::Skip)			5, llvm::ZeroCallUsedRegs::ZeroCallUsedRegsKind::Skip)

	/// Modify C++ ABI to returning `this` pointer from constructors and			/// Modify C++ ABI to returning `this` pointer from constructors and
	/// non-deleting destructors. (No effect on Microsoft ABI.)			/// non-deleting destructors. (No effect on Microsoft ABI.)
	CODEGENOPT(CtorDtorReturnThis, 1, 0)			CODEGENOPT(CtorDtorReturnThis, 1, 0)

				/// Set via -Xclang -sloppy-temporary-lifetimes to disable emission of lifetime
				/// marker intrinsic calls.
				CODEGENOPT(NoLifetimeMarkersForTemporaries, 1, 0)

	#undef CODEGENOPT			#undef CODEGENOPT
	#undef ENUM_CODEGENOPT			#undef ENUM_CODEGENOPT
	#undef VALUE_CODEGENOPT			#undef VALUE_CODEGENOPT

clang/include/clang/Driver/Options.td

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 6,843 Lines • ▼ Show 20 Lines	HelpText<"Change the C++ ABI to returning `this` pointer from constructors "
"and non-deleting destructors. (No effect on Microsoft ABI)">,		"and non-deleting destructors. (No effect on Microsoft ABI)">,
MarshallingInfoFlag<CodeGenOpts<"CtorDtorReturnThis">>;		MarshallingInfoFlag<CodeGenOpts<"CtorDtorReturnThis">>;

def fexperimental_assignment_tracking_EQ : Joined<["-"], "fexperimental-assignment-tracking=">,		def fexperimental_assignment_tracking_EQ : Joined<["-"], "fexperimental-assignment-tracking=">,
Group<f_Group>, CodeGenOpts<"EnableAssignmentTracking">,		Group<f_Group>, CodeGenOpts<"EnableAssignmentTracking">,
NormalizedValuesScope<"CodeGenOptions::AssignmentTrackingOpts">,		NormalizedValuesScope<"CodeGenOptions::AssignmentTrackingOpts">,
Values<"disabled,enabled,forced">, NormalizedValues<["Disabled","Enabled","Forced"]>,		Values<"disabled,enabled,forced">, NormalizedValues<["Disabled","Enabled","Forced"]>,
MarshallingInfoEnum<CodeGenOpts<"AssignmentTrackingMode">, "Enabled">;		MarshallingInfoEnum<CodeGenOpts<"AssignmentTrackingMode">, "Enabled">;
		def sloppy_temporary_lifetimes : Flag<["-"], "sloppy-temporary-lifetimes">,
		HelpText<"Don't emit lifetime markers for temporary objects">,
		MarshallingInfoFlag<CodeGenOpts<"NoLifetimeMarkersForTemporaries">>;

} // let Visibility = [CC1Option]		} // let Visibility = [CC1Option]

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Dependency Output Options		// Dependency Output Options
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

let Visibility = [CC1Option] in {		let Visibility = [CC1Option] in {
▲ Show 20 Lines • Show All 1,401 Lines • Show Last 20 Lines

clang/lib/CodeGen/CGCall.h

Show First 20 Lines • Show All 272 Lines • ▼ Show 20 Lines	public:
struct CallArgCleanup {		struct CallArgCleanup {
EHScopeStack::stable_iterator Cleanup;		EHScopeStack::stable_iterator Cleanup;

/// The "is active" insertion point. This instruction is temporary and		/// The "is active" insertion point. This instruction is temporary and
/// will be removed after insertion.		/// will be removed after insertion.
llvm::Instruction *IsActiveIP;		llvm::Instruction *IsActiveIP;
};		};

		struct EndLifetimeInfo {
		llvm::Value *Addr;
		llvm::Value *Size;
		};

void add(RValue rvalue, QualType type) { push_back(CallArg(rvalue, type)); }		void add(RValue rvalue, QualType type) { push_back(CallArg(rvalue, type)); }

void addUncopiedAggregate(LValue LV, QualType type) {		void addUncopiedAggregate(LValue LV, QualType type) {
push_back(CallArg(LV, type));		push_back(CallArg(LV, type));
}		}

/// Add all the arguments from another CallArgList to this one. After doing		/// Add all the arguments from another CallArgList to this one. After doing
/// this, the old CallArgList retains its list of arguments, but must not		/// this, the old CallArgList retains its list of arguments, but must not
/// be used to emit a call.		/// be used to emit a call.
void addFrom(const CallArgList &other) {		void addFrom(const CallArgList &other) {
insert(end(), other.begin(), other.end());		insert(end(), other.begin(), other.end());
Writebacks.insert(Writebacks.end(), other.Writebacks.begin(),		Writebacks.insert(Writebacks.end(), other.Writebacks.begin(),
other.Writebacks.end());		other.Writebacks.end());
CleanupsToDeactivate.insert(CleanupsToDeactivate.end(),		CleanupsToDeactivate.insert(CleanupsToDeactivate.end(),
other.CleanupsToDeactivate.begin(),		other.CleanupsToDeactivate.begin(),
other.CleanupsToDeactivate.end());		other.CleanupsToDeactivate.end());
		LifetimeCleanups.insert(LifetimeCleanups.end(),
		other.LifetimeCleanups.begin(),
		other.LifetimeCleanups.end());
assert(!(StackBase && other.StackBase) && "can't merge stackbases");		assert(!(StackBase && other.StackBase) && "can't merge stackbases");
if (!StackBase)		if (!StackBase)
StackBase = other.StackBase;		StackBase = other.StackBase;
}		}

void addWriteback(LValue srcLV, Address temporary, llvm::Value *toUse) {		void addWriteback(LValue srcLV, Address temporary, llvm::Value *toUse) {
Writeback writeback = {srcLV, temporary, toUse};		Writeback writeback = {srcLV, temporary, toUse};
Writebacks.push_back(writeback);		Writebacks.push_back(writeback);
Show All 23 Lines	public:
void allocateArgumentMemory(CodeGenFunction &CGF);		void allocateArgumentMemory(CodeGenFunction &CGF);
llvm::Instruction *getStackBase() const { return StackBase; }		llvm::Instruction *getStackBase() const { return StackBase; }
void freeArgumentMemory(CodeGenFunction &CGF) const;		void freeArgumentMemory(CodeGenFunction &CGF) const;

/// Returns if we're using an inalloca struct to pass arguments in		/// Returns if we're using an inalloca struct to pass arguments in
/// memory.		/// memory.
bool isUsingInAlloca() const { return StackBase; }		bool isUsingInAlloca() const { return StackBase; }

		void addLifetimeCleanup(EndLifetimeInfo Info) {
		LifetimeCleanups.push_back(Info);
		}

		ArrayRef<EndLifetimeInfo> getLifetimeCleanups() const {
		return LifetimeCleanups;
		}

private:		private:
SmallVector<Writeback, 1> Writebacks;		SmallVector<Writeback, 1> Writebacks;

/// Deactivate these cleanups immediately before making the call. This		/// Deactivate these cleanups immediately before making the call. This
/// is used to cleanup objects that are owned by the callee once the call		/// is used to cleanup objects that are owned by the callee once the call
/// occurs.		/// occurs.
SmallVector<CallArgCleanup, 1> CleanupsToDeactivate;		SmallVector<CallArgCleanup, 1> CleanupsToDeactivate;

		/// Lifetime information needed to call llvm.lifetime.end for any temporary
		/// argument allocas.
		SmallVector<EndLifetimeInfo, 2> LifetimeCleanups;

/// The stacksave call. It dominates all of the argument evaluation.		/// The stacksave call. It dominates all of the argument evaluation.
llvm::CallInst *StackBase = nullptr;		llvm::CallInst *StackBase = nullptr;
};		};

/// FunctionArgList - Type for representing both the decl and type		/// FunctionArgList - Type for representing both the decl and type
/// of parameters to a function. The decl must be either a		/// of parameters to a function. The decl must be either a
/// ParmVarDecl or ImplicitParamDecl.		/// ParmVarDecl or ImplicitParamDecl.
class FunctionArgList : public SmallVector<const VarDecl *, 16> {};		class FunctionArgList : public SmallVector<const VarDecl *, 16> {};
▲ Show 20 Lines • Show All 82 Lines • Show Last 20 Lines

clang/lib/CodeGen/CGCall.cpp

Show All 34 Lines
#include "llvm/IR/AttributeMask.h"		#include "llvm/IR/AttributeMask.h"
#include "llvm/IR/Attributes.h"		#include "llvm/IR/Attributes.h"
#include "llvm/IR/CallingConv.h"		#include "llvm/IR/CallingConv.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
#include "llvm/IR/InlineAsm.h"		#include "llvm/IR/InlineAsm.h"
#include "llvm/IR/IntrinsicInst.h"		#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Intrinsics.h"		#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/Type.h"		#include "llvm/IR/Type.h"
		#include "llvm/Support/TypeSize.h"
#include "llvm/Transforms/Utils/Local.h"		#include "llvm/Transforms/Utils/Local.h"
#include <optional>		#include <optional>
using namespace clang;		using namespace clang;
using namespace CodeGen;		using namespace CodeGen;

/***/		/***/

unsigned CodeGenTypes::ClangCallConvToLLVMCallConv(CallingConv CC) {		unsigned CodeGenTypes::ClangCallConvToLLVMCallConv(CallingConv CC) {
▲ Show 20 Lines • Show All 3,628 Lines • ▼ Show 20 Lines	static void setUsedBits(CodeGenModule &CGM, QualType QTy, int Offset,
int Size = Context.getTypeSizeInChars(QTy).getQuantity();		int Size = Context.getTypeSizeInChars(QTy).getQuantity();
if (Size <= 0)		if (Size <= 0)
return;		return;

std::fill_n(Bits.begin() + Offset, Size,		std::fill_n(Bits.begin() + Offset, Size,
(uint64_t(1) << Context.getCharWidth()) - 1);		(uint64_t(1) << Context.getCharWidth()) - 1);
}		}

static uint64_t buildMultiCharMask(const SmallVectorImpl<uint64_t> &Bits,		static uint64_t buildMultiCharMask(const SmallVectorImpl<uint64_t> &Bits,
thegamegUnsubmitted Not Done Reply Inline Actions Is there any other use of `EmitAnyExprToTemp` that can benefit from this? thegameg: Is there any other use of `EmitAnyExprToTemp` that can benefit from this?
erik.pilkingtonUnsubmitted Done Reply Inline Actions Right now, the only other user of that function is `CodeGenFunction::EmitStmtExprLValue`, which doesn't seem all that interesting. There are a lot of places that do create temporary allocas that would be worth looking into in the future though. erik.pilkington: Right now, the only other user of that function is `CodeGenFunction::EmitStmtExprLValue`, which…
int Pos, int Size, int CharWidth,		int Pos, int Size, int CharWidth,
bool BigEndian) {		bool BigEndian) {
assert(Size > 0);		assert(Size > 0);
uint64_t Mask = 0;		uint64_t Mask = 0;
if (BigEndian) {		if (BigEndian) {
for (auto P = Bits.begin() + Pos, E = Bits.begin() + Pos + Size; P != E;		for (auto P = Bits.begin() + Pos, E = Bits.begin() + Pos + Size; P != E;
++P)		++P)
Mask = (Mask << CharWidth) \| *P;		Mask = (Mask << CharWidth) \| *P;
} else {		} else {
		rjmccallUnsubmitted Done Reply Inline Actions If the argument type has a C++ destructor, will we end its lifetime before we call destructors at the end of the full-expression? rjmccall: If the argument type has a C++ destructor, will we end its lifetime before we call destructors…
		erik.pilkingtonUnsubmitted Done Reply Inline Actions Yeah, this is broken :/ erik.pilkington: Yeah, this is broken :/
		rjmccallUnsubmitted Done Reply Inline Actions Well, we could at least do it for trivially-destructible types. That probably needs to include all kinds of non-trivial destructibility, not just C++, though. rjmccall: Well, we could at least do it for trivially-destructible types. That probably needs to include…
auto P = Bits.begin() + Pos + Size, End = Bits.begin() + Pos;		auto P = Bits.begin() + Pos + Size, End = Bits.begin() + Pos;
do		do
Mask = (Mask << CharWidth) \| *--P;		Mask = (Mask << CharWidth) \| *--P;
while (P != End);		while (P != End);
}		}
return Mask;		return Mask;
}		}

▲ Show 20 Lines • Show All 967 Lines • ▼ Show 20 Lines	void CodeGenFunction::EmitCallArg(CallArgList &args, const Expr *E,
if (HasAggregateEvalKind && isa<ImplicitCastExpr>(E) &&		if (HasAggregateEvalKind && isa<ImplicitCastExpr>(E) &&
cast<CastExpr>(E)->getCastKind() == CK_LValueToRValue) {		cast<CastExpr>(E)->getCastKind() == CK_LValueToRValue) {
LValue L = EmitLValue(cast<CastExpr>(E)->getSubExpr());		LValue L = EmitLValue(cast<CastExpr>(E)->getSubExpr());
assert(L.isSimple());		assert(L.isSimple());
args.addUncopiedAggregate(L, type);		args.addUncopiedAggregate(L, type);
return;		return;
}		}

args.add(EmitAnyExprToTemp(E), type);		AggValueSlot ArgSlot = AggValueSlot::ignored();
		// If the callee returns a reference, skip this stack saving optimization;
		// we don't want to prematurely end the lifetime of the temporary. It may be
		// possible to still perform this optimization if the return type is a
		// reference to a different type than the parameter.
		if (hasAggregateEvaluationKind(E->getType())) {
		Address ArgSlotAlloca = Address::invalid();
		ArgSlot = CreateAggTemp(E->getType(), "agg.tmp", &ArgSlotAlloca);

		// Emit a lifetime start/end for this temporary. If the type has a
		// destructor, then we need to keep it alive. FIXME: We should still be able
		// to end the lifetime after the destructor returns.
		if (!CGM.getCodeGenOpts().NoLifetimeMarkersForTemporaries &&
		!E->getType().isDestructedType()) {
		llvm::TypeSize size =
		CGM.getDataLayout().getTypeAllocSize(ConvertTypeForMem(E->getType()));
		if (llvm::Value *lifetimeSize =
		EmitLifetimeStart(size, ArgSlotAlloca.getPointer()))
		args.addLifetimeCleanup({ArgSlotAlloca.getPointer(), lifetimeSize});
		}
		}

		args.add(EmitAnyExpr(E, ArgSlot), type);
}		}

QualType CodeGenFunction::getVarArgType(const Expr *Arg) {		QualType CodeGenFunction::getVarArgType(const Expr *Arg) {
// System headers on Windows define NULL to 0 instead of 0LL on Win64. MSVC		// System headers on Windows define NULL to 0 instead of 0LL on Win64. MSVC
// implicitly widens null pointer constants that are arguments to varargs		// implicitly widens null pointer constants that are arguments to varargs
// functions to pointer-sized ints.		// functions to pointer-sized ints.
if (!getTarget().getTriple().isOSWindows())		if (!getTarget().getTriple().isOSWindows())
return Arg->getType();		return Arg->getType();
▲ Show 20 Lines • Show All 1,179 Lines • ▼ Show 20 Lines	if (Ret.isScalar() && TargetDecl) {
AllocAlignAttrEmitter.EmitAsAnAssumption(Loc, RetTy, Ret);		AllocAlignAttrEmitter.EmitAsAnAssumption(Loc, RetTy, Ret);
}		}

// Explicitly call CallLifetimeEnd::Emit just to re-use the code even though		// Explicitly call CallLifetimeEnd::Emit just to re-use the code even though
// we can't use the full cleanup mechanism.		// we can't use the full cleanup mechanism.
for (CallLifetimeEnd &LifetimeEnd : CallLifetimeEndAfterCall)		for (CallLifetimeEnd &LifetimeEnd : CallLifetimeEndAfterCall)
LifetimeEnd.Emit(this, /Flags=*/{});		LifetimeEnd.Emit(this, /Flags=*/{});

		if (!CGM.getCodeGenOpts().NoLifetimeMarkersForTemporaries)
		for (const CallArgList::EndLifetimeInfo &LT :
		CallArgs.getLifetimeCleanups())
		EmitLifetimeEnd(LT.Size, LT.Addr);

if (!ReturnValue.isExternallyDestructed() &&		if (!ReturnValue.isExternallyDestructed() &&
RetTy.isDestructedType() == QualType::DK_nontrivial_c_struct)		RetTy.isDestructedType() == QualType::DK_nontrivial_c_struct)
pushDestroy(QualType::DK_nontrivial_c_struct, Ret.getAggregateAddress(),		pushDestroy(QualType::DK_nontrivial_c_struct, Ret.getAggregateAddress(),
RetTy);		RetTy);

return Ret;		return Ret;
}		}

Show All 22 Lines

clang/test/CodeGen/lifetime-call-temp.c

This file was added.

				// RUN: %clang -cc1 -triple x86_64-apple-macos -O1 -disable-llvm-passes %s -S \
				// RUN: -emit-llvm -o - \| FileCheck %s --implicit-check-not=llvm.lifetime
				// RUN: %clang -cc1 -xc++ -std=c++17 -triple x86_64-apple-macos -O1 \
				// RUN: -disable-llvm-passes %s -S -emit-llvm -o - -Wno-return-type-c-linkage \| \
				// RUN: FileCheck %s --implicit-check-not=llvm.lifetime --check-prefixes=CHECK,CXX
				// RUN: %clang -cc1 -xobjective-c -triple x86_64-apple-macos -O1 \
				// RUN: -disable-llvm-passes %s -S -emit-llvm -o - \| \
				// RUN: FileCheck %s --implicit-check-not=llvm.lifetime --check-prefixes=CHECK,OBJC
				// RUN: %clang -cc1 -triple x86_64-apple-macos -O1 -disable-llvm-passes %s -S \
				// RUN: -emit-llvm -o - -sloppy-temporary-lifetimes \| \
				// RUN: FileCheck %s --implicit-check-not=llvm.lifetime --check-prefixes=SLOPPY

				typedef struct { int x[100]; } aggregate;

				#ifdef __cplusplus
				extern "C" {
				#endif

				void takes_aggregate(aggregate);
				aggregate gives_aggregate();

				// CHECK-LABEL: define void @t1
				void t1() {
				takes_aggregate(gives_aggregate());

				// CHECK: [[AGGTMP:%.*]] = alloca %struct.aggregate, align 8
				// CHECK: call void @llvm.lifetime.start.p0(i64 400, ptr [[AGGTMP]])
				// CHECK: call void{{.*}} @gives_aggregate(ptr sret(%struct.aggregate) align 4 [[AGGTMP]])
				// CHECK: call void @takes_aggregate(ptr noundef byval(%struct.aggregate) align 8 [[AGGTMP]])
				// CHECK: call void @llvm.lifetime.end.p0(i64 400, ptr [[AGGTMP]])

				// SLOPPY: [[AGGTMP:%.*]] = alloca %struct.aggregate, align 8
				// SLOPPY-NEXT: call void (ptr, ...) @gives_aggregate(ptr sret(%struct.aggregate) align 4 [[AGGTMP]])
				// SLOPPY-NEXT: call void @takes_aggregate(ptr noundef byval(%struct.aggregate) align 8 [[AGGTMP]])
				}

				// CHECK: declare {{.*}}llvm.lifetime.start
				// CHECK: declare {{.*}}llvm.lifetime.end

				#ifdef __cplusplus
				// CXX: define void @t2
				void t2() {
				struct S {
				S(aggregate) {}
				};
				S{gives_aggregate()};

				// CXX: [[AGG:%.*]] = alloca %struct.aggregate
				// CXX: call void @llvm.lifetime.start.p0(i64 400, ptr
				// CXX: call void @gives_aggregate(ptr sret(%struct.aggregate) align 4 [[AGG]])
				// CXX: call void @_ZZ2t2EN1SC1E9aggregate(ptr {{.}}, ptr {{.}} byval(%struct.aggregate) align 8 [[AGG]])
				// CXX: call void @llvm.lifetime.end.p0(i64 400, ptr
				}

				struct Dtor {
				nickdesaulniersAuthorUnsubmitted Done Reply Inline Actions this check line looks broken; missing `:` nickdesaulniers: this check line looks broken; missing `:`
				~Dtor();
				};

				void takes_dtor(Dtor);
				Dtor gives_dtor();

				// CXX: define void @t3
				void t3() {
				takes_dtor(gives_dtor());

				// CXX-NOT: @llvm.lifetime
				// CXX: ret void
				}

				#endif

				#ifdef __OBJC__

				@interface X
				-m:(aggregate)x;
				@end

				// OBJC: define void @t4
				void t4(X *x) {
				[x m: gives_aggregate()];

				// OBJC: [[AGG:%.*]] = alloca %struct.aggregate
				// OBJC: call void @llvm.lifetime.start.p0(i64 400, ptr
				// OBJC: call void{{.*}} @gives_aggregate(ptr sret(%struct.aggregate) align 4 [[AGGTMP]])
				// OBJC: call {{.*}}@objc_msgSend
				// OBJC: call void @llvm.lifetime.end.p0(i64 400, ptr
				}

				#endif

				#ifdef __cplusplus
				}
				#endif

clang/test/CodeGenCXX/amdgcn-call-with-aggarg.cpp

This file was added.

				// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -emit-llvm -O3 -disable-llvm-passes -o - %s \| FileCheck %s

				struct A {
				float x, y, z, w;
				};

				void foo(A a);

				// CHECK-LABEL: @_Z4testv
				// CHECK: [[A:%.]] = alloca [[STRUCT_A:%.]], align 4, addrspace(5)
				// CHECK-NEXT: [[AGG_TMP:%.*]] = alloca [[STRUCT_A]], align 4, addrspace(5)
				// CHECK-NEXT: [[A_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[A]] to ptr
				// CHECK-NEXT: [[AGG_TMP_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[AGG_TMP]] to ptr
				// CHECK-NEXT: call void @llvm.lifetime.start.p5(i64 16, ptr addrspace(5) [[A]]) #[[ATTR4:[0-9]+]]
				// CHECK-NEXT: call void @llvm.lifetime.start.p5(i64 16, ptr addrspace(5) [[AGG_TMP]]) #[[ATTR4]]
				void test() {
				A a;
				foo(a);
				}

clang/test/CodeGenCXX/stack-reuse-miscompile.cpp

	Show All 20 Lines
	{			{
	// It's essential that the lifetimes of all three T temporaries here are			// It's essential that the lifetimes of all three T temporaries here are
	// overlapping. They must all remain alive through the call to str().			// overlapping. They must all remain alive through the call to str().
	//			//
	// CHECK: [[T1:%.*]] = alloca %class.T, align 4			// CHECK: [[T1:%.*]] = alloca %class.T, align 4
	// CHECK: [[T2:%.*]] = alloca %class.T, align 4			// CHECK: [[T2:%.*]] = alloca %class.T, align 4
	// CHECK: [[T3:%.*]] = alloca %class.T, align 4			// CHECK: [[T3:%.*]] = alloca %class.T, align 4
	//			//
				// CHECK: [[AGG:%.*]] = alloca %class.S, align 4
				//
	// FIXME: We could defer starting the lifetime of the return object of concat			// FIXME: We could defer starting the lifetime of the return object of concat
	// until the call.			// until the call.
	// CHECK: call void @llvm.lifetime.start.p0(i64 16, ptr [[T1]])			// CHECK: call void @llvm.lifetime.start.p0(i64 16, ptr [[T1]])
	//			//
	// CHECK: call void @llvm.lifetime.start.p0(i64 16, ptr [[T2]])			// CHECK: call void @llvm.lifetime.start.p0(i64 16, ptr [[T2]])
	// CHECK: [[T4:%.]] = call noundef ptr @_ZN1TC1EPKc(ptr {{[^,]}} [[T2]], ptr noundef @.str)			// CHECK: [[T4:%.]] = call noundef ptr @_ZN1TC1EPKc(ptr {{[^,]}} [[T2]], ptr noundef @.str)
	//			//
	// CHECK: call void @llvm.lifetime.start.p0(i64 16, ptr [[T3]])			// CHECK: call void @llvm.lifetime.start.p0(i64 16, ptr [[T3]])
				// CHECK: call void @llvm.lifetime.start.p0(i64 8, ptr [[AGG]])
	// CHECK: [[T5:%.]] = call noundef ptr @_ZN1TC1E1S(ptr {{[^,]}} [[T3]], [2 x i32] %{{.*}})			// CHECK: [[T5:%.]] = call noundef ptr @_ZN1TC1E1S(ptr {{[^,]}} [[T3]], [2 x i32] %{{.*}})
				// CHECK: call void @llvm.lifetime.end.p0(i64 8, ptr [[AGG]]
	//			//
	// CHECK: call void @_ZNK1T6concatERKS_(ptr sret(%class.T) align 4 [[T1]], ptr {{[^,]*}} [[T2]], ptr noundef nonnull align 4 dereferenceable(16) [[T3]])			// CHECK: call void @_ZNK1T6concatERKS_(ptr sret(%class.T) align 4 [[T1]], ptr {{[^,]*}} [[T2]], ptr noundef nonnull align 4 dereferenceable(16) [[T3]])
	// CHECK: [[T6:%.]] = call noundef ptr @_ZNK1T3strEv(ptr {{[^,]}} [[T1]])			// CHECK: [[T6:%.]] = call noundef ptr @_ZNK1T3strEv(ptr {{[^,]}} [[T1]])
	//			//
	// CHECK: call void @llvm.lifetime.end.p0(			// CHECK: call void @llvm.lifetime.end.p0(
	// CHECK: call void @llvm.lifetime.end.p0(			// CHECK: call void @llvm.lifetime.end.p0(
	// CHECK: call void @llvm.lifetime.end.p0(			// CHECK: call void @llvm.lifetime.end.p0(
	// CHECK: ret ptr [[T6]]			// CHECK: ret ptr [[T6]]

	return T("[").concat(T(s)).str();			return T("[").concat(T(s)).str();
	}			}

	// CHECK: declare {{.*}}llvm.lifetime.start			// CHECK: declare {{.*}}llvm.lifetime.start
	// CHECK: declare {{.*}}llvm.lifetime.end			// CHECK: declare {{.*}}llvm.lifetime.end

clang/test/CodeGenCoroutines/pr59181.cpp

Show First 20 Lines • Show All 43 Lines • ▼ Show 20 Lines	Task bar(bool cond) {
co_return cond ? Task{ co_await Task{}}: Task{};		co_return cond ? Task{ co_await Task{}}: Task{};
}		}

void foo() {		void foo() {
bar(true);		bar(true);
}		}

// CHECK: cleanup.cont:{{.*}}		// CHECK: cleanup.cont:{{.*}}
		// CHECK-NEXT: call void @llvm.lifetime.start.p0(i64 4, ptr [[AGG:%agg.tmp[0-9]+]])
// CHECK-NEXT: load i8		// CHECK-NEXT: load i8
// CHECK-NEXT: trunc		// CHECK-NEXT: trunc
// CHECK-NEXT: store i1 false		// CHECK-NEXT: store i1 false
// CHECK-NEXT: call void @llvm.lifetime.start.p0(i64 8, ptr [[REF:%ref.tmp[0-9]+]])		// CHECK-NEXT: call void @llvm.lifetime.start.p0(i64 8, ptr [[REF:%ref.tmp[0-9]+]])

// CHECK: await.suspend:{{.*}}		// CHECK: await.suspend:{{.*}}
// CHECK-NOT: call void @llvm.lifetime.start.p0(i64 8, ptr [[REF]])		// CHECK-NOT: call void @llvm.lifetime.start.p0(i64 8, ptr [[REF]])
		// CHECK-NOT: call void @llvm.lifetime.start.p0(i64 8, ptr [[AGG]])
// CHECK: call void @_ZZN4Task12promise_type15await_transformES_EN10Suspension13await_suspendESt16coroutine_handleIvE		// CHECK: call void @_ZZN4Task12promise_type15await_transformES_EN10Suspension13await_suspendESt16coroutine_handleIvE
		// CHECK-NEXT: call void @llvm.lifetime.end.p0(i64 8, ptr [[AGG2:%agg.tmp[0-9]+]])
// CHECK-NEXT: call void @llvm.lifetime.end.p0(i64 8, ptr [[REF]])		// CHECK-NEXT: call void @llvm.lifetime.end.p0(i64 8, ptr [[REF]])