This is an archive of the discontinued LLVM Phabricator instance.

Avoid needlessly copying blocks that initialize or are assigned to local auto variables to the heap
ClosedPublic

Authored by ahatanak on Feb 21 2019, 8:42 AM.

Details

Summary

This patch avoids copying blocks that initialize or are assigned to local auto variables to the heap when the local auto variables do not have their addresses taken and are declared in the same scope as the block. We can possibly add back the optimization in the ARC optimizer that was reverted in r189869 (http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20130902/186509.html), but I suspect it would be much more complicated and fragile than doing it in the front-end.

I'm not 100% sure whether it's necessary to disable this optimization when the address of the local variable is taken. We do pass a block on the stack to a function when the block is directly passed instead of first being assigned to a local variable, but clang currently doesn't copy the passed block to the heap in the callee although the block can possibly escape if the address is taken. For example:

__strong id *g0, g1;

void foo0(BlockTy b) {
  g0 = (__strong id *)&b;
  g1 = *g0; // this is just a retain, not a block copy.
}

void foo1() {
  foo0(^{...}) // block is on the stack.
}

void foo2() {
  foo1();
  ((BlockTy)g1)(); // this can crash if the block is still on the stack.
}

rdar://problem/13289333

Diff Detail

Repository
rC Clang

Event Timeline

ahatanak created this revision.Feb 21 2019, 8:42 AM

The correctness condition here is solely that we cannot allow the block literal to go out of scope before the variable that it is assigned to. (Local block literals with captures have lifetimes like C compound literals: until the end of the enclosing block, rather than the end of the full-expression.) So doing this blindly for assignments is problematic because you'd need to reason about relative lifetimes, but doing it for initialization should always be fine. I don't see any reason why address-taken-ness would matter as long as that condition holds.

ahatanak updated this revision to Diff 187904.Feb 21 2019, 10:38 PM

Remove the code that is needed to check whether the address of a local variable is taken.

For assignment, the optimization isn't performed if the local variable isn't declared in the scope that introduced the block (see the code and comment in SemaExpr.cpp).

I see, alright.

lib/CodeGen/CGObjC.cpp
3234

Can this just be a case in ARCRetainExprEmitter? I think that should subsume both this and the logic in EmitARCStoreStrong.

lib/Sema/SemaDecl.cpp
11262

I won't insist that you look through arbitrary value-propagating expressions like commas and conditionals, but please do at least call IgnoreParens() here and in the assignment case.

lib/Sema/SemaExpr.cpp
12485

Please check for a block-expression RHS first, it is far more likely to short-circuit this check than anything else.

Also, I think the right place for this check is up with the calls to DiagnoseSelfAssignment and DiagnoseSelfMove.

ahatanak updated this revision to Diff 187998.Feb 22 2019, 3:05 PM
ahatanak marked 3 inline comments as done.

Address review comments. Add CodeGen test cases for parentheses expressions.

Okay, one last minor request, then LGTM.

lib/CodeGen/CGObjC.cpp
2953

Oh, I'd forgotten this wasn't a normal expression visitor. Well, okay, this isn't too bad.

lib/Sema/SemaExpr.cpp
12461

You should IgnoreParens on the LHS as well. In general, you should always IgnoreParens.

ahatanak updated this revision to Diff 188472.Feb 26 2019, 3:50 PM
ahatanak marked an inline comment as done.

Call IgnoreParens on the LHS too.

rjmccall accepted this revision.Feb 26 2019, 10:03 PM

Thanks, LGTM.

This revision is now accepted and ready to land.Feb 26 2019, 10:03 PM
ahatanak closed this revision.Feb 27 2019, 10:20 AM

Fixed in r355012.

thakis added a subscriber: thakis.Mar 13 2019, 1:12 PM

Hi ahatanak,

this causes a crash in chrome/ios. A reduced repro is at https://bugs.chromium.org/p/chromium/issues/detail?id=941680 . Is the code invalid, or is that a bug in the transform?

Thanks,
Nico

Seems like the chromium code is valid and shouldn't crash. John/Erik what do you think? The following code also crashes with this patch applied.

typedef void (^BlockTy)();

BlockTy sb;
__weak BlockTy wb;

void foo(id a) {
  auto b = ^{ NSLog(@"foo %@", a); };
  wb = b; // block isn't copied to the heap.
  sb = b; // block is copied to the heap.
}

int main() {
  auto x = [NSObject new];
  foo(x);
  sb();
  wb();
  return 0;
}

Seems like the chromium code is valid and shouldn't crash. John/Erik what do you think? The following code also crashes with this patch applied.

typedef void (^BlockTy)();

BlockTy sb;
__weak BlockTy wb;

void foo(id a) {
  auto b = ^{ NSLog(@"foo %@", a); };
  wb = b; // block isn't copied to the heap.
  sb = b; // block is copied to the heap.
}

int main() {
  auto x = [NSObject new];
  foo(x);
  sb();
  wb();
  return 0;
}

The assignment to wb seems like an escape of some sort. What happens for this similar code?

typedef void (^BlockTy)();

BlockTy sb;
__weak BlockTy wb;

void bar(id b) {
  wb = b;
  sb = b;
}

void foo(id a) {
  bar(^{ NSLog(@"foo %@", a); });
}

int main() {
  auto x = [NSObject new];
  foo(x);
  sb();
  wb();
  return 0;
}

Seems like the chromium code is valid and shouldn't crash. John/Erik what do you think? The following code also crashes with this patch applied.

typedef void (^BlockTy)();

BlockTy sb;
__weak BlockTy wb;

void foo(id a) {
  auto b = ^{ NSLog(@"foo %@", a); };
  wb = b; // block isn't copied to the heap.
  sb = b; // block is copied to the heap.
}

int main() {
  auto x = [NSObject new];
  foo(x);
  sb();
  wb();
  return 0;
}

The assignment to wb seems like an escape of some sort. What happens for this similar code?

typedef void (^BlockTy)();

BlockTy sb;
__weak BlockTy wb;

void bar(id b) {
  wb = b;
  sb = b;
}

void foo(id a) {
  bar(^{ NSLog(@"foo %@", a); });
}

int main() {
  auto x = [NSObject new];
  foo(x);
  sb();
  wb();
  return 0;
}

That code doesn't crash because the block is retained at the entry of bar and ARC optimizer doesn't remove the retain/release pairs in bar.

That code doesn't crash because the block is retained at the entry of bar and ARC optimizer doesn't remove the retain/release pairs in bar.

Oops, I meant:

typedef void (^BlockTy)();

BlockTy sb;
__weak BlockTy wb;

void bar(BlockTy b) {
  wb = b;
  sb = b;
}

void foo(id a) {
  bar(^{ NSLog(@"foo %@", a); });
}

int main() {
  auto x = [NSObject new];
  foo(x);
  sb();
  wb();
  return 0;
}

Is it the same? Does b get retained at the entry to bar()?

Sorry, I misread the code. If you change the parameter type of bar to BlockTy, the code crashes. If the type is id, it doesn't crash because IRGen copies the block to the heap in foo before passing it to bar.

Sorry, I misread the code. If you change the parameter type of bar to BlockTy, the code crashes. If the type is id, it doesn't crash because IRGen copies the block to the heap in foo before passing it to bar.

Okay, that's what I expected. (You didn't misread, I had just mistyped :/.)

How insane would it be to add a copy/dispose pair around assignments of blocks to weak references?

Do you mean copying the block to the heap before assigning it to wb and releasing it after the assignment inside bar? Wouldn't the block assigned to wb be deallocated after the release?

I remember this coming up 7-8 years ago. I intentionally decided against doing a copy/release when assigning to __weak because if the block wasn't already guaranteed to be copied then it was probably better to crash than to silently assign a value that's about to be deallocated. Note that copying the block we assign into wb doesn't change anything about the block stored in b.

I don't know why Chromium is building a weak reference to a block in the first place, but assuming they have a good reason to be doing it, they should fix their code to force a copy before forming a weak reference.

wuhao5 added a subscriber: wuhao5.Mar 14 2019, 10:21 AM

Hey guys, this is Hao, I am working with Chrome team to sort this issue out.

I remember this coming up 7-8 years ago. I intentionally decided against doing a copy/release when assigning to __weak because if the block wasn't already guaranteed to be copied then it was probably better to crash than to silently assign a value that's about to be deallocated. Note that copying the block we assign into wb doesn't change anything about the block stored in b.

I don't know why Chromium is building a weak reference to a block in the first place, but assuming they have a good reason to be doing it, they should fix their code to force a copy before forming a weak reference.

The culprit code is here

it's true that it can be copied before assigning to the weak var, and I understand the reasoning behind. however, my question is: just from the code itself, each variable has the proper scope and assignment, if the block copy happen automatically, just like what we should expect ARC would do, should it not mutate itself to something else. to be more precise, should the block assigned to the weak var be the same after the block is copied? (and in the code, the block should be moved to the heap after calling -addObject: a few line below.)

so in the end of day, as a user, should we expect the compiler would move the block from stack to heap in time and the variable we hold is consistent?

Hey guys, this is Hao, I am working with Chrome team to sort this issue out.

I remember this coming up 7-8 years ago. I intentionally decided against doing a copy/release when assigning to __weak because if the block wasn't already guaranteed to be copied then it was probably better to crash than to silently assign a value that's about to be deallocated. Note that copying the block we assign into wb doesn't change anything about the block stored in b.

I don't know why Chromium is building a weak reference to a block in the first place, but assuming they have a good reason to be doing it, they should fix their code to force a copy before forming a weak reference.

The culprit code is here

it's true that it can be copied before assigning to the weak var, and I understand the reasoning behind. however, my question is: just from the code itself, each variable has the proper scope and assignment, if the block copy happen automatically, just like what we should expect ARC would do, should it not mutate itself to something else. to be more precise, should the block assigned to the weak var be the same after the block is copied? (and in the code, the block should be moved to the heap after calling -addObject: a few line below.)

so in the end of day, as a user, should we expect the compiler would move the block from stack to heap in time and the variable we hold is consistent?

The specified user model of blocks is that they stay on the stack until they get copied, and there can be multiple such copies. ARC just automates that process. So the address of a block is not consistent before you've forced a copy.

I tend to agree that a better user model would have been for blocks to be allocated in one place, without any of this copying business, and for the compiler to make an intelligent decision about stack vs. heap based on how the block is used. That's the model we've used for closures in Swift. But that would've required the compiler to have a better ability to propagate information about how the block was used, which Clang isn't really set up for, and it would've required noescape annotations to be introduced and used reliably throughout the SDK, which seemed like a big request at the time. So it's not how it works.

There is no way in the existing ABI for copying a block to cause other references to the block to become references to the heap block. We do do that for __block variables, but not for block objects themselves.

There is no way in the existing ABI for copying a block to cause other references to the block to become references to the heap block. We do do that for __block variables, but not for block objects themselves.

Do you think thats worth doing? We could add a forwarding pointer to the end of a block literal on the stack (after all the captures) and flip an unused bit to track it, preserving ABI. Seems like now that we're delaying _Block_copyies this might be a bigger issue.

The specified user model of blocks is that they stay on the stack until they get copied, and there can be multiple such copies. ARC just automates that process. So the address of a block is not consistent before you've forced a copy.

I tend to agree that a better user model would have been for blocks to be allocated in one place, without any of this copying business, and for the compiler to make an intelligent decision about stack vs. heap based on how the block is used. That's the model we've used for closures in Swift. But that would've required the compiler to have a better ability to propagate information about how the block was used, which Clang isn't really set up for, and it would've required noescape annotations to be introduced and used reliably throughout the SDK, which seemed like a big request at the time. So it's not how it works.

There is no way in the existing ABI for copying a block to cause other references to the block to become references to the heap block. We do do that for __block variables, but not for block objects themselves.

I see - this makes sense. Right that I'd expect the compiler would know more about where the block is being used and make the variable consistent. my other worry is, although not realistically, that there can be other projects to use this weak/strong pointer trick to do a recursive block invocation. it becomes to me a bit counter-intuitive that I will need to know more about how block and where it should be copied, which currently we don't have to worry about it at all.

Right now we force an explicit copy before using it, but still like to request that this would be handled by Clang at some later point :)

The specified user model of blocks is that they stay on the stack until they get copied, and there can be multiple such copies. ARC just automates that process. So the address of a block is not consistent before you've forced a copy.

I tend to agree that a better user model would have been for blocks to be allocated in one place, without any of this copying business, and for the compiler to make an intelligent decision about stack vs. heap based on how the block is used. That's the model we've used for closures in Swift. But that would've required the compiler to have a better ability to propagate information about how the block was used, which Clang isn't really set up for, and it would've required noescape annotations to be introduced and used reliably throughout the SDK, which seemed like a big request at the time. So it's not how it works.

There is no way in the existing ABI for copying a block to cause other references to the block to become references to the heap block. We do do that for __block variables, but not for block objects themselves.

I see - this makes sense. Right that I'd expect the compiler would know more about where the block is being used and make the variable consistent. my other worry is, although not realistically, that there can be other projects to use this weak/strong pointer trick to do a recursive block invocation. it becomes to me a bit counter-intuitive that I will need to know more about how block and where it should be copied, which currently we don't have to worry about it at all.

Right now we force an explicit copy before using it, but still like to request that this would be handled by Clang at some later point :)

Can I ask why you want a weak reference to a block in the first place? It seems basically useless — blocks can certainly appear in reference cycles, but I don't know why you'd ever try to break that cycle with the block instead of somewhere else.

There is no way in the existing ABI for copying a block to cause other references to the block to become references to the heap block. We do do that for __block variables, but not for block objects themselves.

Do you think thats worth doing? We could add a forwarding pointer to the end of a block literal on the stack (after all the captures) and flip an unused bit to track it, preserving ABI. Seems like now that we're delaying _Block_copyies this might be a bigger issue.

We've always delayed _Block_copy in a bunch of places. Now we're just delaying it in a place that ARC used to be more conservative about.

I guess we could actually make forwarding work at some code-size cost (functions that emitted forwardable blocks would have to zero the forwarding slot and release it when destroying the stack copy). But it'd just silently do nothing without a runtime update, so it'd be somewhat treacherous, especially after a couple of releases: e.g. if we made the runtime change in macOS 20 Eugene O'Neill National Historic Site, and Chromium eventually only ran tests on macOS 20 and higher but still supported deploying to macOS 19 San Francisco Maritime National Historical Park, then Chromium might not catch that it was still necessary to include an explicit copy here.

wuhao5 added a comment.EditedMar 14 2019, 12:00 PM

Can I ask why you want a weak reference to a block in the first place? It seems basically useless — blocks can certainly appear in reference cycles, but I don't know why you'd ever try to break that cycle with the block instead of somewhere else.

The simplified version:

auto b = ^{
  if (check) {
    dispatch_after(queue, 1, b);
  } else {
   // done.
  }
};
dispatch_after(queue, 1, b);

There is no way in the existing ABI for copying a block to cause other references to the block to become references to the heap block. We do do that for __block variables, but not for block objects themselves.

Do you think thats worth doing? We could add a forwarding pointer to the end of a block literal on the stack (after all the captures) and flip an unused bit to track it, preserving ABI. Seems like now that we're delaying _Block_copyies this might be a bigger issue.

We've always delayed _Block_copy in a bunch of places. Now we're just delaying it in a place that ARC used to be more conservative about.

I guess we could actually make forwarding work at some code-size cost (functions that emitted forwardable blocks would have to zero the forwarding slot and release it when destroying the stack copy). But it'd just silently do nothing without a runtime update, so it'd be somewhat treacherous, especially after a couple of releases: e.g. if we made the runtime change in macOS 20 Eugene O'Neill National Historic Site, and Chromium eventually only ran tests on macOS 20 and higher but still supported deploying to macOS 19 San Francisco Maritime National Historical Park, then Chromium might not catch that it was still necessary to include an explicit copy here.

Lol, we're really running out of names, eh? We could still do a version of this without the O'Neill/Maritime problem, to optimize the performance on the following:

void f(int (^blk)()) {
        // implicitly copy the block somehow...
}

int main() {
        int cap;
        auto p = ^{ return cap; };
        for (int i = 0; i != N; ++i)
                f(p);
}

Where we used to do 1 _Block_copy, but now we do N. If we stored a strong reference to the heap block in the stack block, then made _Block_copy just hand back the first heap block it allocated, we could save that cost. That way we could still preserve the crash here, then maybe once macOS Maritime is at EOL we could adopt the forwarding behaviour. WDYT?

Can I ask why you want a weak reference to a block in the first place? It seems basically useless — blocks can certainly appear in reference cycles, but I don't know why you'd ever try to break that cycle with the block instead of somewhere else.

The simplified version:

auto b = ^{

if (check) {
  dispatch_after(queue, 1, b);
} else {
 // done.
}

};
dispatch_after(queue, 1, b);

Okay, so really just a block self-reference. We could really just add a feature for that that would avoid both the complexity and the expense of the self-capture dance.

Okay, so really just a block self-reference. We could really just add a feature for that that would avoid both the complexity and the expense of the self-capture dance.

Is there a plan to cover this case? or is it a legitimate use case that Clang should handle?

Okay, so really just a block self-reference. We could really just add a feature for that that would avoid both the complexity and the expense of the self-capture dance.

Is there a plan to cover this case? or is it a legitimate use case that Clang should handle?

You are currently relying on something that ARC doesn't guarantee, so the client code should be fixed to explicitly copy the block. I think we would be happy to consider a proposal in the long run to allow blocks to self-reference more easily, which will effectively bypass the problem.

Okay, so really just a block self-reference. We could really just add a feature for that that would avoid both the complexity and the expense of the self-capture dance.

Is there a plan to cover this case? or is it a legitimate use case that Clang should handle?

You are currently relying on something that ARC doesn't guarantee, so the client code should be fixed to explicitly copy the block. I think we would be happy to consider a proposal in the long run to allow blocks to self-reference more easily, which will effectively bypass the problem.

I am not sure if I follow here - is it not that the weak pointer holds a block that's in the stack but is supposed to be in the heap?

Okay, so really just a block self-reference. We could really just add a feature for that that would avoid both the complexity and the expense of the self-capture dance.

Is there a plan to cover this case? or is it a legitimate use case that Clang should handle?

You are currently relying on something that ARC doesn't guarantee, so the client code should be fixed to explicitly copy the block. I think we would be happy to consider a proposal in the long run to allow blocks to self-reference more easily, which will effectively bypass the problem.

I am not sure if I follow here - is it not that the weak pointer holds a block that's in the stack but is supposed to be in the heap?

In your code, ARC does not guarantee that the block pointer you're assigning to the weak reference will point to a heap copy of the block. ARC could force a copy of the block as part of the assignment, but it would be pointless because that copy would be immediately destroyed, leaving the weak reference holding nil, which is not what you want dynamically in your code. You need to force the block to be copied first.

It appears as though this commit was reverted in Apple's XCode Clang fork -- the behavior currently in XCode matches the behavior of upstream Clang prior to this patch. Presuming that's correct, I think we should revert this upstream as well. There doesn't seem to be a good reason to have the semantics of blocks differ between Apple's clang and upstream clang.

waltl added a subscriber: waltl.Aug 17 2021, 10:49 AM

I've gone ahead and created a revert review: https://reviews.llvm.org/D108243