Page MenuHomePhabricator

[CodeGen] Emit dynamic initializers for static TLS vars in outlined scopes
Needs ReviewPublic

Authored by Prince781 on Mon, Aug 12, 5:04 PM.

Details

Summary

For static TLS vars only visible inside a function, clang will only generate an initializer inside the function body where the variable was declared. However, it is possible for the variable to be indirectly referenced without ever calling the function it was declared in, if a scope referring to the variable gets outlined into a function that is executed on a new thread. Here are two examples that demonstrate this:

#include <thread>
#include <iostream>

struct Object {
    int i;
    Object() : i(3) {}
};

int main(void) {
    static thread_local Object o;

    std::cout << "[main] o.i = " << o.i << std::endl;
    std::thread t([] { std::cout << "[new thread] o.i = " << o.i << std::endl; });
    t.join();
}
#include <iostream>
#include <omp.h>

struct Object {
    int i;
    Object() : i(3) {}
};

int main(void) {
    static Object o;
    #pragma omp threadprivate(o)

    #pragma omp parallel
    #pragma omp critical
    std::cout << "[" << omp_get_thread_num() << "] o.i = " << o.i << std::endl;
}

In this patch, we generate an initializer in a function for every unique reference to a static TLS var that was declared in a different function.

Diff Detail

Event Timeline

Prince781 created this revision.Mon, Aug 12, 5:04 PM
Herald added a project: Restricted Project. · View Herald Transcript
Prince781 edited the summary of this revision. (Show Details)Mon, Aug 12, 5:07 PM

This might be a silly question, but what happens if the initializer for a thread-local variable refers to another thread-local variable? Do you need to initialize both variables? In what order?

Prince781 added a comment.EditedMon, Aug 12, 7:40 PM

This might be a silly question, but what happens if the initializer for a thread-local variable refers to another thread-local variable? Do you need to initialize both variables? In what order?

If variable A's initializer references variable B, then it will call B's initializer. So when we call A's initializer, B's initialization completes before A's.

If B is a static thread-local variable declared outside of the body of A's initializer, then a guarded initializer will be inserted the body of A's initializer.

If variable A's initializer references variable B, then it will call B's initializer.

I'm considering a testcase like this:

struct S { int x; };
void bar(S**);
void baz(void());
void f() { 
    thread_local S s = {1};
    thread_local S* p = &s;
    baz([]{bar(&p);});
}

The initializer for p normally just assumes s is initialized. I don't think this patch adds any code that would address that, although I could be missing something.


Thinking about it a bit more, I also have a general question: how is this supposed to work? What do other compilers do? Does the C++ standard say when the initializer is supposed to run? [stmt.dcl]p4 just says "Dynamic initialization [...] is performed the first time control passes through its declaration."

If variable A's initializer references variable B, then it will call B's initializer.

I don't think this patch adds any code that would address that, although I could be missing something.

No, you're absolutely right. I'll fix my patch later to address this oversight.

Thinking about it a bit more, I also have a general question: how is this supposed to work? What do other compilers do? Does the C++ standard say when the initializer is supposed to run? [stmt.dcl]p4 just says "Dynamic initialization [...] is performed the first time control passes through its declaration."

This I was not too sure about. gcc also does the same thing. I was not able to find anything addressing this situation directly. clang should either generate an error or do "the right thing." If the latter, then I take this from the standard to mean, "we can initialize at any time before the variable is referenced":

[ 3.7.2 Thread storage duration ] [basic.std.thread] A variable with thread storage duration shall be initialized before its first odr-use (3.2) and, if constructed, shall be destroyed on thread exit.

Prince781 updated this revision to Diff 215695.Fri, Aug 16, 3:39 PM

I've updated the patch to initialize, in the proper order, all foreign static TLS variables and the variables they depend on for initialization. I've also cleaned up the patch a bit.

mgrang added inline comments.Fri, Aug 16, 4:13 PM
clang/lib/CodeGen/CodeGenFunction.cpp
472

You can use the range-based version of llvm::sort here:

llvm::sort(OrderedVarInits);

Prince781 updated this revision to Diff 215709.Fri, Aug 16, 4:20 PM

Use range-based version of llvm::sort

Prince781 marked an inline comment as done.Fri, Aug 16, 4:20 PM

in the proper order

I would prefer lexical order, if possible. (At least, the order should be deterministic.)

clang should either generate an error or do "the right thing."

Agreed.

I think we should send a defect report to the C++ standards committee to clarify the ambiguity here.

Prince781 updated this revision to Diff 215912.Mon, Aug 19, 8:56 AM

I think this should order the initializers deterministically according to their var declaration order. Let me know if there's something I haven't considered.

I think we should send a defect report to the C++ standards committee to clarify the ambiguity here.

I followed the instructions on this page and sent it to std-discussion first.

efriedma added inline comments.Tue, Aug 20, 3:51 PM
clang/lib/CodeGen/CodeGenFunction.cpp
479

Is the call to a_deps.find() here actually necessary? It shouldn't be possible for an initializer to directly refer to a variable declared later.

"<" on SourceLocations isn't source order, in general; you need isBeforeInTranslationUnit. (This should be documented somewhere, but I'm not finding the documentation, unfortunately. Any suggestions for where it should be documented?)

Prince781 marked an inline comment as done.Tue, Aug 20, 4:09 PM
Prince781 added inline comments.
clang/lib/CodeGen/CodeGenFunction.cpp
479

It shouldn't be possible for an initializer to directly refer to a variable declared later.

That's true. I was using deps.find() to order the initialization of the variables. But since you mention isBeforeInTranslationUnit, I can use that instead. It appears to be documented here.

efriedma added inline comments.Tue, Aug 20, 4:52 PM
clang/lib/CodeGen/CodeGenFunction.cpp
479

The documentation question was more referring to the lack of documentation for operator< on SourceLocation.

Use SourceManager to order inits.

Prince781 marked 3 inline comments as done.Tue, Aug 20, 11:20 PM
Prince781 added inline comments.
clang/lib/CodeGen/CodeGenFunction.cpp
479

I see. I misinterpreted what you said. Well, then, I think that operator<(SourceLocation &, SourceLocation &) should have Doxygen comments and maybe a note saying, "this probably isn't what you want; see SourceManager::isBeforeInTranslationUnit()"

Prince781 marked an inline comment as done.Tue, Aug 20, 11:20 PM
Prince781 added a project: Restricted Project.Tue, Aug 20, 11:24 PM

Added a few more minor comments.

It looks like the consensus on std-discussion is actually that your testcase has undefined behavior? That seems like an awful conclusion, and the standard text doesn't really seem to support it. (I mean, I guess you could argue that " variable with thread storage duration shall be initialized before its first odr-use" means if it's not initialized, the behavior is undefined, but that's really confusing.) But given that, I think we should submit a core issue, and hold off on merging this until we hear back from the committee.

clang/lib/CodeGen/CodeGenFunction.cpp
306

If you don't care about the iteration order, using pop_back on a vector is faster.

317

Do you need to recurse here? It looks like the caller should handle that.

469

Indentation.

Prince781 marked an inline comment as done.Thu, Aug 22, 7:01 AM

But given that, I think we should submit a core issue, and hold off on merging this until we hear back from the committee.

I agree here. There does appear to be some previous discussion on this matter, but the spec itself still doesn't contain any language addressing this issue. I will submit a core issue.

clang/lib/CodeGen/CodeGenFunction.cpp
317

Oops, I think you might be right.

The more I think about this, the more I have doubts about whether this should be supported. For example, what happens in cases like this?:

#include <thread>
#include <iostream>

struct Object {
    int i;
    Object() : i(3) {}
    Object(int v) : i(3 + v) {}
};

int main(void) {
    int w = 4;
    static thread_local Object o(w);

    std::cout << "[main] o.i = " << o.i << std::endl;
    std::thread([] {
        std::cout << "[new thread] o.i = " << o.i << std::endl;
    }).join();
}

Should w be captured or not? Furthermore, if o referenced another block-scope thread-local that had an initializer referencing another local variable, that would have to be captured too. So I now think this should be an error.

Oh, I somehow forgot that was legal. :( That breaks this whole approach (well, maybe the lambda could capture "w", but that seems way too complicated). So we're left with a few possibilities:

  1. an error
  2. undefined behavior (probably we would want to warn in this case)
  3. capturing the address of the thread_local variable