Page MenuHomePhabricator

[CodeGen] Correct codegen for self-capturing __block var
Needs ReviewPublic

Authored by ille on Oct 29 2020, 3:13 PM.



This is based on my previous patch,, but is an
attempt at a full fix rather than a minimal one, following rjmccall's
suggestion of initializing directly on the heap and emitting a trivial copy

This applies to situations where a __block variable's initializer references a
block that potentially captures the variable itself.

Clang special-cases these situations because the initializer might call
Block_copy and cause the variable to be moved to the heap while it's
still being initialized. For example, in this snippet:

int copyBlockAndReturnInt(void (^)(void));

__block int val = copyBlockAndReturnInt(^{
    printf("%p\n", val);

...if copyBlockAndReturnInt calls Block_copy, val will be moved to the
heap. When copyBlockAndReturnInt returns, the generated code needs to store
the value to the new location.

For scalar values like pointers, this is handled by deferring computation of
the variable's address until after the initializer has been evaluated (and just
before actually storing to that address).

But what about structs? This case would be hard:

struct Foo { int a, b; };
int copyBlockAndReturnInt(void (^)(void));

__block struct Foo foo = {42, copyBlockAndReturnInt(^{
    printf("%p\n", &foo);

To support relocating this to the heap, Clang would have to recalculate
the location of foo before writing to each struct field, in case the
initializer of the current field moved the struct. Block_copy would
also end up copying a half-initialized struct, although that's no big
deal with C structs.

This C++ case, on the other hand, would be impossible:

struct Foo {
    void (^block_)(void);
    Foo(void (^block)(void)) {
        printf("I am at %p\n", this);
        block_ = Block_copy(block);
        printf("I am now at %p\n", this);

__block Foo foo(^{
    printf("%p\n", &foo);

To support relocating Foo to the heap, Clang would have to magically
move it mid-construction. Normally, Clang uses the move or copy
constructor when moving C++ classes to the heap, but C++ classes
certainly don't expect a move constructor to be called in the middle of
another constructor. memcpy might work in some cases, but it violates
C++ classes' expectation that object addresses will remain stable. Thus
there's no real way to make this work, and ideally it would result in a
compile error (or, alternately, a runtime crash upon calling

So how does Clang currently approach this situation? Well, there's just a

case TEK_Aggregate:
    // TODO: how can we delay here if D is captured by its initializer?

Clang never calls drillIntoBlockVariable, and ends up writing the
variable to the beginning of the byref struct (not the corresponding
field in it), which is never the correct location. This leaves both the
variable and the byref struct corrupt - even if the block is never
actually copied.

This bug dates back to at least 2012, as I reproduced it on LLVM 3.1.

This patch addresses the issue by allocating the variable directly on
the heap and initializing it there. This ensures it never needs to

For now, to avoid the need for an ABI update, this is done in a hacky
way. First, a byref struct is allocated on the stack as usual, but the
object data itself is left uninitialized, and the copy helper is set to
a no-op. Then _Block_object_assign is called to migrate it to the
heap, followed immediately by _Block_object_dispose to remove the
unnecessary extra reference. Only then is the object initialized in its
new location.

The implicit heap allocation introduced here may be unnecessary if the
block is never actually copied, which could be a problem in code that's
highly performance sensitive or wants to be resilient to allocation
failure. To defend against this, this patch adds a warning,
-Wimplicit-block-var-alloc, to diagnose every time the implicit allocation
is triggered. Since most codebases do not care, the warning is disabled
by default and is not included in any standard warning groups.

This patch also makes the is-captured-by-init analysis simpler yet more
precise (while moving it into Sema so that it can power the new
warning). As far as I can tell, the previous implementation was
overcomplicated for no good reason. In ancient days (2011), the
recursive function isCapturedBy only worked on Exprs, and it recursed on
the Expr's children, which it assumed would also be Exprs. This
assumption turned out to be false for statement expressions. The simple
fix would have been to change isCapturedBy to accept and recurse on
arbitrary Stmts instead, since children() is defined for all Stmts, not
just Exprs. I'm not sure if there was some reason that wasn't possible
at the time, but what happened instead is that a special case was added
for statement expressions, which over a series of patches got more and
more convoluted. This patch deletes all of that in favor of just
recursing on Stmts.

Diff Detail

Unit TestsFailed

390 mslinux > HWAddressSanitizer-x86_64.TestCases::sizes.cpp
Script: -- : 'RUN: at line 3'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang --driver-mode=g++ -m64 -gline-tables-only -fsanitize=hwaddress -fuse-ld=lld -mcmodel=large -mllvm -hwasan-globals -mllvm -hwasan-use-short-granules -mllvm -hwasan-instrument-landing-pads=0 -mllvm -hwasan-instrument-personality-functions /mnt/disks/ssd0/agent/llvm-project/compiler-rt/test/hwasan/TestCases/sizes.cpp -nostdlib++ -lstdc++ -o /mnt/disks/ssd0/agent/llvm-project/build/projects/compiler-rt/test/hwasan/X86_64/TestCases/Output/sizes.cpp.tmp

Event Timeline

ille created this revision.Oct 29 2020, 3:13 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 29 2020, 3:13 PM
ille requested review of this revision.Oct 29 2020, 3:13 PM
ille updated this revision to Diff 301966.Oct 30 2020, 11:24 AM

Satisfy clang-format bot, at the cost of formatting a few adjacent lines.

The other bot failures seem pretty clearly unrelated to this change.

This is great work, thank you.


This will read okay on the command line, but it'll be awkward in most IDEs that consume diagnostics. You should report the problem fairly completely in the main diagnostic and then just say something like "captured in block here" in the note.


You should check isEscapingByref() here rather than just hasAttr; if none of the capturing blocks are escaping, we don't need to worry about this. Or are you handling this implicitly in your pass during escape-marking? That seems subtle.


You should use ->is<RecordType>() instead of isa, since the latter doesn't handle e.g. typedefs.

The reference to TEK_Aggregate is inappropriate here; it's okay to just talk about "aggregates" under the idea that anything else could validly just be zero-initialized prior to the capture.


I think there's some tooling that will be slightly happier if you release the value in out instead of addr. I believe we do emit a different function call to enable that in unoptimized builds.

You should add comments describing what you're doing here. The explanation from the commit description's pretty good.


There is an EvaluatedExprVisitor class which might make this easier, and which will definitely eliminate some false positives.

ille added inline comments.Nov 10 2020, 9:49 AM

On second thought, the hasAttr<BlocksAttr>() shouldn't be there; it's unnecessary. (I based it on isEscapingByref, where it's also unnecessary - but it's needed in isNonEscapingByref, so I suppose isEscapingByref has it for symmetry.). I should replace it with just !isa<ParmVarDecl>(this) to ensure that NonParmVarDeclBits is valid. And the name isCapturedByOwnInit is also a misnomer; I should change it to something like isByRefCapturedByOwnInit.

I want the substantive conditions for isCapturedByOwnInit to be handled during escape marking, for two reasons. One, that's where the warning is emitted, and I want to ensure it doesn't ever disagree with codegen about whether init-on-heap needs to be applied. Two, it can't mean "is captured by own initializer regardless of __block", because that would imply unnecessarily running the capture check on all variables.


You mean isRecordType()? Good catch, I'll add a test for that.


Hmm… In an earlier revision by jfb to this same check, you were concerned about using RecursiveASTVisitor because 'RecursiveASTVisitor instantiations are huge'. EvaluatedExprVisitor inherits from a different class from RecursiveASTVisitor, but they look pretty similar, so doesn't the same concern about template bloat apply? Perhaps I'm misunderstanding that earlier comment.

rjmccall added inline comments.Nov 10 2020, 3:23 PM

Okay. In that case, this can just go in the header, and it's worth documenting in comments (on either the accessor or the underlying bit) that it's only set on __block variables.


StmtVisitor instantiations don't come with the same code-size impact as RecursiveASTVisitor, mostly because they primarily depend on the same recursive children() walk rather than the explicit repeated logic and careful tracking of the latter, but also because they don't make an effort to recurse into various syntactic but unevaluated expressions, and because the main "heavyweight" thing in their instantiation (the switch in Visit) is pretty easily optimized by the compiler if you actually end up only caring about particular cases. But also, EvaluatedExprVisitor is the right tool for this job because you only care about evaluated references to the variable.

@ille, are you still working on this patch?