This is based on my previous patch, https://reviews.llvm.org/D89903, but is an
attempt at a full fix rather than a minimal one, following rjmccall's
suggestion of initializing directly on the heap and emitting a trivial copy
helper.
This applies to situations where a __block variable's initializer references a
block that potentially captures the variable itself.
Clang special-cases these situations because the initializer might call
Block_copy and cause the variable to be moved to the heap while it's
still being initialized. For example, in this snippet:
int copyBlockAndReturnInt(void (^)(void)); __block int val = copyBlockAndReturnInt(^{ printf("%p\n", val); });
...if copyBlockAndReturnInt calls Block_copy, val will be moved to the
heap. When copyBlockAndReturnInt returns, the generated code needs to store
the value to the new location.
For scalar values like pointers, this is handled by deferring computation of
the variable's address until after the initializer has been evaluated (and just
before actually storing to that address).
But what about structs? This case would be hard:
struct Foo { int a, b; }; int copyBlockAndReturnInt(void (^)(void)); __block struct Foo foo = {42, copyBlockAndReturnInt(^{ printf("%p\n", &foo); })};
To support relocating this to the heap, Clang would have to recalculate
the location of foo before writing to each struct field, in case the
initializer of the current field moved the struct. Block_copy would
also end up copying a half-initialized struct, although that's no big
deal with C structs.
This C++ case, on the other hand, would be impossible:
struct Foo { void (^block_)(void); Foo(void (^block)(void)) { printf("I am at %p\n", this); block_ = Block_copy(block); printf("I am now at %p\n", this); } }; __block Foo foo(^{ printf("%p\n", &foo); });
To support relocating Foo to the heap, Clang would have to magically
move it mid-construction. Normally, Clang uses the move or copy
constructor when moving C++ classes to the heap, but C++ classes
certainly don't expect a move constructor to be called in the middle of
another constructor. memcpy might work in some cases, but it violates
C++ classes' expectation that object addresses will remain stable. Thus
there's no real way to make this work, and ideally it would result in a
compile error (or, alternately, a runtime crash upon calling
Block_copy).
So how does Clang currently approach this situation? Well, there's just a
comment:
case TEK_Aggregate: [...] // TODO: how can we delay here if D is captured by its initializer?
Clang never calls drillIntoBlockVariable, and ends up writing the
variable to the beginning of the byref struct (not the corresponding
field in it), which is never the correct location. This leaves both the
variable and the byref struct corrupt - even if the block is never
actually copied.
This bug dates back to at least 2012, as I reproduced it on LLVM 3.1.
This patch addresses the issue by allocating the variable directly on
the heap and initializing it there. This ensures it never needs to
move.
For now, to avoid the need for an ABI update, this is done in a hacky
way. First, a byref struct is allocated on the stack as usual, but the
object data itself is left uninitialized, and the copy helper is set to
a no-op. Then _Block_object_assign is called to migrate it to the
heap, followed immediately by _Block_object_dispose to remove the
unnecessary extra reference. Only then is the object initialized in its
new location.
The implicit heap allocation introduced here may be unnecessary if the
block is never actually copied, which could be a problem in code that's
highly performance sensitive or wants to be resilient to allocation
failure. To defend against this, this patch adds a warning,
-Wimplicit-block-var-alloc, to diagnose every time the implicit allocation
is triggered. Since most codebases do not care, the warning is disabled
by default and is not included in any standard warning groups.
This patch also makes the is-captured-by-init analysis simpler yet more
precise (while moving it into Sema so that it can power the new
warning). As far as I can tell, the previous implementation was
overcomplicated for no good reason. In ancient days (2011), the
recursive function isCapturedBy only worked on Exprs, and it recursed on
the Expr's children, which it assumed would also be Exprs. This
assumption turned out to be false for statement expressions. The simple
fix would have been to change isCapturedBy to accept and recurse on
arbitrary Stmts instead, since children() is defined for all Stmts, not
just Exprs. I'm not sure if there was some reason that wasn't possible
at the time, but what happened instead is that a special case was added
for statement expressions, which over a series of patches got more and
more convoluted. This patch deletes all of that in favor of just
recursing on Stmts.
This will read okay on the command line, but it'll be awkward in most IDEs that consume diagnostics. You should report the problem fairly completely in the main diagnostic and then just say something like "captured in block here" in the note.