Page MenuHomePhabricator

[Clang] Undef attribute for global variables
ClosedPublic

Authored by JonChesterfield on Feb 10 2020, 3:00 PM.

Details

Summary

[Clang] Attribute to allow defining undef global variables

Initializing global variables is very cheap on hosted implementations. The
C semantics of zero initializing globals work very well there. It is not
necessarily cheap on freestanding implementations. Where there is no loader
available, code must be emitted near the start point to write the appropriate
values into memory.

At present, external variables can be declared in C++ and definitions provided
in assembly (or IR) to achive this effect. This patch provides an attribute in
order to remove this reason for writing assembly for performance sensitive
freestanding implementations.

A close analogue in tree is LDS memory for amdgcn, where the kernel is
responsible for initializing the memory after it starts executing on the gpu.
Uninitalized variables in LDS are observably cheaper than zero initialized.

Patch is loosely based on the cuda shared and opencl __local variable
implementation which also produces undef global variables.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
aaron.ballman added inline comments.Mar 2 2020, 6:37 AM
clang/lib/Sema/SemaDeclAttr.cpp
7471

If you don't need any custom semantic checking, you can remove that function and instead call handleSimpleAttribute<LoaderUninitializedAttr>(S, D, AL);

clang/test/CodeGenCXX/attr-loader-uninitialized.cpp
15

What should happen with redeclarations? e.g., in C:

int a;

int foo() { return a; }

int a __attribute__((loader_uninitialized));

(This would be a useful test case to add.)

Also, I'd like to see a test case where the attributed global is an array.

clang/test/Sema/attr-loader-uninitialized.cpp
20

I'd also like to see a test demonstrating it doesn't accept any arguments.

JonChesterfield marked 2 inline comments as done.
  • Address review comments, more test coverage
JonChesterfield marked 2 inline comments as done.EditedMar 2 2020, 8:07 AM

Fixed the spelling/formatting, added more tests. The C++ case would be improved by warning on int x __attribute__((loader_uninitialised)) = 0 as there are two initializers.

The semantics for C are not what I hoped for where there are multiple definitions, one of which is via this attribute. Added a test for that. Recommendations for where to poke sema to raise an error on the second one are warmly invited. I'll try checkNewAttributesAfterDef, but will make a debug build first.

clang/lib/Sema/SemaDeclAttr.cpp
7471

I think this patch does need some custom semantic checking, I just haven't been able to work out how to implement it. Specifically, the attribute is an initializer, so

int foo __attribute__((loader_uninitialised)) = some_value;

should be a warning, as the = some_value is going to be ignored.

clang/test/CodeGenCXX/attr-loader-uninitialized.cpp
15

Ah, yes. I was thinking about tentative definitions before changing this test to C++. Fixed the name to be less misleading.

C++ just rejects it. Multiple definitions => error. Added test to sema.

C accepts it in either order. Which I believe it should. Either one is a tentative definition, and the other provides an actual definition (of undef), or the definition (of undef) is followed by a redeclaration.

This leaves the hole that while the following is rightly rejected in C for having multiple definitions:

int a __attr__(...);
int a = 10;

This is erroneously accepted, with the attribute ignored:

int a = 10;
int a __attr__(...);
rjmccall added inline comments.Mar 2 2020, 8:53 AM
clang/include/clang/Basic/AttrDocs.td
4367

How about:

The `loader_uninitialized` attribute can be placed on global variables to
indicate that the variable does not need to be zero initialized by the loader.
On most targets, zero-initialization does not incur any additional cost.
For example, most general purpose operating systems deliberately ensure
that all memory is properly initialized in order to avoid leaking privileged
information from the kernel or other programs. However, some targets
do not make this guarantee, and on these targets, avoiding an unnecessary
zero-initialization can have a significant impact on load times and/or code
size.

A declaration with this attribute is a non-tentative definition just as if it
provided an initializer. Variables with this attribute are considered to be
uninitialized in the same sense as a local variable, and the programs must
write to them before reading from them. If the variable's type is a C++ class
type with a non-trivial default constructor, or an array thereof, this attribute
only suppresses the static zero-initialization of the variable, not the dynamic
initialization provided by executing the default constructor.

clang/lib/Sema/SemaDeclAttr.cpp
7471

This should be an error, not a warning, unless there's a specific need to be lenient.

aaron.ballman added inline comments.Mar 2 2020, 10:26 AM
clang/lib/Sema/SemaDeclAttr.cpp
7471

Agreed that this should be an error rather than a warning.

Not 100% certain, but I suspect you'll need to add that checking to Sema::AddInitializerToDecl() because I'm guessing that the initializer has not been processed by the time the attributes are being applied to the variable declaration. You can check VarDecl::hasInit() within handleLoaderUninitializedAttr() to see if that specific declaration has an initializer, but I'm betting that gives you the wrong answer.

clang/test/CodeGenCXX/attr-loader-uninitialized.cpp
15

This is erroneously accepted, with the attribute ignored:

I think you will probably want to add another case to mergeDeclAttribute() following the similar patterns there so that you can reject when you need to merge declarations that should not be merged.

JonChesterfield marked 2 inline comments as done.
  • Reject initialisers, update doc to recommended string
Quuxplusone added inline comments.Mar 2 2020, 11:27 AM
clang/test/CodeGenCXX/attr-loader-uninitialized.cpp
24

This test case is identical to line 36 of clang/test/Sema/attr-loader-uninitialized.cpp, where you say you don't want it to compile at all.

I think you need a clearer idea of how this interacts with initializers. Is it merely supposed to eliminate the zero-initialization that happens before the user-specified construction/initialization, or is it supposed to compete with the user-specified construction/initialization?

That is, for

nontrivial unt [[clang::loader_uninitialized]];

is it merely supposed to call unt::unt() on a chunk of undef memory (instead of the usual chunk of zeroed memory), or is it supposed to skip the constructor entirely? And for

int x [[clang::loader_uninitialized]] = foo();

is it merely supposed to call foo() and assign the result to a chunk of undef memory (instead of the usual chunk of zeroed memory), or is it supposed to skip the initialization entirely?

JonChesterfield marked 4 inline comments as done.Mar 2 2020, 11:49 AM
JonChesterfield added inline comments.
clang/include/clang/Basic/AttrDocs.td
4367

That's a lot better! Thank you. Updated the patch to use your wording verbatim.

clang/lib/Sema/SemaDeclAttr.cpp
7471

Nice, Yes, that's much better - I should have searched for the opencl __local handling. As you suspected, hasInit() just returns true at that point.

clang/test/CodeGenCXX/attr-loader-uninitialized.cpp
24

I think you commented while the first working piece of sema landed. My thinking is relatively clear but my understanding of clang's semantic analysis is a work in progress!

Initializers (= foo()) are straightforward. Error on the basis that the attribute effectively means = undef, and one should not have two initializers. A test case is now added for that (and now passes).

The codegen I want for a default constructed global marked with this variable is:

  • global variable allocated, with undef as the original value
  • default constructor call synthesized
  • said default constructor set up for invocation from crt, before main, writing over the undef value

Where the default constructor can be optimized as usual, e.g. if it always writes a constant, we can init with that constant instead of the undef and elide the constructor.

I don't have that actually working yet - the constructor call is not being emitted, so we just have the undef global.

I think it's important to distinguish between the values of the bits when the program is loaded and whether constructor/destructors are called, as one could want any combination of the two.

rjmccall added inline comments.Mar 2 2020, 12:21 PM
clang/test/CodeGenCXX/attr-loader-uninitialized.cpp
24

I think Arthur is suggesting that it would be useful to allow the attribute to be used in conjunction with an initializer in C++, since if the initializer has to be run dynamically, we can still meaningfully suppress the static zero-initialization. That is, we've accepted that it's useful to do this when *default-initializing* a global, but it's actually useful when doing *any* kind of dynamic initialization.

Maybe we can leave it as an error in C++ when the initializer is a constant expression. Although that might be unnecessarily brittle if e.g. the constructor is constexpr in one library version but not another.

Quuxplusone added inline comments.Mar 2 2020, 12:27 PM
clang/test/CodeGenCXX/attr-loader-uninitialized.cpp
24

No, that's exctly what I mean. You seem to be holding two contradictory ideas simultaneously:

[[loader_uninitialized]] X x = X{};  // two initializers, therefore error

[[loader_uninitialized]] X x {}; // one initializer plus one constructor, therefore fine

In C++, these two declarations have identical semantics. It doesn't make sense to say that one of them "calls a constructor" and the other one "has an initializer." They're literally the same thing.

Similarly in both C99 and C++ with plain old ints:

[[loader_uninitialized]] int x = foo();

This means "call foo and dynamically initialize x with the result," just as surely as

[[loader_uninitialized]] X x = X();

means "call X::X and dynamically initialize x with the result." Having one rule for dynamic initializers of primitive types and a separate rule for dynamic initializers of class types doesn't work.

Furthermore, "dynamic initialization" can be promoted to compile-time:

[[loader_uninitialized]] int x = 42;
[[loader_uninitialized]] std::string_view x = "hello world";

It wouldn't make semantic sense to say that one of these has "two initializers" and the other has "one initializer," because both of the initializations end up happening at compile time and getting put into .data.

JonChesterfield marked 2 inline comments as done.Mar 2 2020, 1:04 PM
JonChesterfield added inline comments.
clang/test/CodeGenCXX/attr-loader-uninitialized.cpp
24

I'm under the impression that the constructs are:

X x = X{};  // default construct an X and then call the copy constructor at x
X x {}; // default construct an X at the location of x
X x; // default construct an X at the location of x

The C++ initialisation rules are very complicated so I won't be shocked to have got that wrong.

If your toolchain actually runs global constructors then there isn't much use to marking C++ objects with this attribute, unless said constructor does nothing and would thus leave the bytes still undef.

Accepting default constructors (or, perhaps only accepting trivial types?) would allow people to use this attribute with the approximation of C++ available on systems that don't actually run the global constructors.

Equally, if this seems too complicated, I'm happy to restrict this to C as that's still an improvement on the asm and/or linker scripts I have available today.

rjmccall added inline comments.Mar 2 2020, 2:12 PM
clang/test/CodeGenCXX/attr-loader-uninitialized.cpp
24

Similarly in both C99 and C++ with plain old ints:

C does not allow non-constant initialization of global variables.

Let me try to explain what we're thinking here. If the variable provides both an initializer and this attribute, and the compiler can successfully perform constant initialization (as it's required to do in C, and as it often does in C++), then the attribute has probably become meaningless because we're not going to suppress that constant initialization. The compiler strongly prefers to diagnose meaningless attributes in some way, because they often indirectly indicate a bug. For example, maybe the initializer isn't supposed to be there but it's hidden behind a huge mess of macros; or maybe somebody added a constant initializer because they changed the variable schema to make that possible, and so the programmer should now remove the attribute and the dynamic initialization code that went along with it. Compared to that, the C++ use case of "allow a dynamic initializer but suppress static zero-initialization" is pretty marginal; that doesn't mean we absolutely shouldn't support it, but it does suggest that we shouldn't prioritize it over other aspects of the feature, like catching that possible bug.

rjmccall added inline comments.Mar 2 2020, 2:20 PM
clang/test/CodeGenCXX/attr-loader-uninitialized.cpp
24

If your toolchain actually runs global constructors then there isn't much use to marking C++ objects with this attribute, unless said constructor does nothing and would thus leave the bytes still undef.

That's not totally true. C++ still requires globals to be zero-initialized before it runs dynamic initializers, and the dynamic initializer is very likely to make that zero-initialization totally redundant. Arthur is right that there's no inherent reason that that only applies to dynamic *default* initialization, though, as opposed to any non-constant initialization.

I'd prefer to avoid making this C-specific; Clang takes a lot of pride in trying to make feature cross-language as much as possible. We could just make the error about providing both the attribute and an initializer C-specific, though.

Quuxplusone added inline comments.Mar 2 2020, 3:15 PM
clang/test/CodeGenCXX/attr-loader-uninitialized.cpp
24

Ah, I was wrong when I said "in both C99 and C++" int x = foo(); would be supported. You're right, that's a C++ism. Even C11 doesn't seem to support dynamic initializers, not even for function-local static variables.

I suppose I should also mention that I totally don't understand the domain here. :) To me, the way you avoid storing an explicit initializer in the ELF file is you allocate the variable into .bss instead of .data; and the way you avoid paying for zero-initialization at runtime would be that you allocate the variable into a hand-crafted section instead of .bss (example from Keil docs). In ordinary ELF-land, there's no way to say "I want this variable in .data but not initialized," nor "I want this variable in .bss but not zero-initialized" — the file format literally can't represent those concepts.

I've continued thinking about / experimenting with this. Curiously - X x; and X x{}; are considered distinct by clang. The current patch will only accept the former. I'll add some tests for that.

I think there's a reasonable chance that the developers who want to elide the runtime cost of zero initialising will usually also want to avoid dynamic initialisation. That suggests we could accept only trivially default constructible classes, where the undef initializer is correct in that one cannot determine whether a no-op constructor actually ran or not. If we go with that, then the more complicated question of exactly how this should interact with user-controlled disabling of dynamic initialization can be postponed until that feature is introduced. This patch is then reasonably self contained.

This patch is strongly related to the linker script approach, or equivalent asm. It's target-dependent how the uninitialised data gets represented, e.g. x64 will put it in bss anyway. Opencl's __local is similarly uninitialised, and gets annotated with an address space that maps onto magic in llc and/or the loader.

JonChesterfield marked 2 inline comments as done.
  • Reduce scope to trivial default constructed types
  • error on extern variables, minor cleanup
JonChesterfield edited the summary of this revision. (Show Details)Mar 2 2020, 4:18 PM
  • Error on redeclaration with loader_uninit in C

I'm finally happy with the semantic checks here. Thanks for the guidance on where to look for the hooks.

  • attributed variable must be at global scope
  • all initializers are rejected
  • default constructors must be trivial (to reduce the scope of this patch)
  • extern variables rejected as they can't meaningfully have a definition
  • attribute on a declaration following a normal definition in C rejected

Patch is a bit bigger than I hoped for but quite self contained. Everything is guarded by a test on the attribute.

@Quuxplusone does restricting this to trivial default construction resolve your concerns?

JonChesterfield marked an inline comment as done.Mar 3 2020, 8:56 AM
aaron.ballman added inline comments.Mar 6 2020, 6:49 AM
clang/lib/Sema/SemaDecl.cpp
2718

Missing a full stop at the end of the sentence. The comment can probably be re-flowed with the preceding sentence too.

11930

Missing full stop.

12358

Should this either be calling VarDecl::hasExternalStorage() or looking at the linkage of the variable, rather than at the storage class written in the source?

clang/test/CodeGen/attr-loader-uninitialized.c
5

I thought err_loader_uninitialized_extern says that the variable cannot have external linkage?

clang/test/CodeGenCXX/attr-loader-uninitialized.cpp
5

I thought err_loader_uninitialized_extern says that the variable cannot have external linkage?

JonChesterfield marked 2 inline comments as done.
  • Review comments, add tests for private_extern

I thought err_loader_uninitialized_extern says that the variable cannot have external linkage?

Embarrassing! This was a badly written error message, now fixed to:
external declaration of variable cannot have the 'loader_uninitialized' attribute

The premise behind this feature is to be semantically identical to:
type foo = undef;

The initializer value is orthogonal to the variable linkage and visibility. If = 4 was ok, so should this attribute be.

What is not meaningful is to declare that a variable is defined elsewhere that is uninitialized.
That is, extern int x = 42; // warning: 'extern' variable has an initializer
Therefore [[loader_uninitialized]] int x = 42; // also bad

This patch makes the latter an error, on the basis that it's definitely a mistake to provide two initializers for one variable (one 42, one undef).

C++ thinks const int x; is an error and C thinks const int x; is fine. This patch remains consistent with that.

clang/lib/Sema/SemaDecl.cpp
12358

Interesting question, thank you. SC_Extern is right.

hasExternalStorage is true if extern or private_extern. I hadn't seen private_extern before, but it appears to be a way to model hidden visibility. It accepts a normal initializer, e.g.
__private_extern__ int private_extern_can_be_initialised = 10;
therefore should also work with undef.

Added a test for this (which will fail if SC_Extern is replaced with hasExternalStorage).

Replying to linkage out of line as it comes up on a few inline comments.

ping @aaron.ballman - does that look right to you?

aaron.ballman accepted this revision.Mar 12 2020, 2:07 PM

Aside from the diagnostic wording, I think this LG to me. However, I'd appreciate if @rjmccall would also sign off.

clang/include/clang/Basic/DiagnosticSemaKinds.td
5343

How would you feel about: "variable %0 cannot be declared both 'extern' and with the 'loader_uninitialized' attribute" (or something along those lines) to clarify "external declaration"?

This revision is now accepted and ready to land.Mar 12 2020, 2:07 PM
  • Amend diagnostic as suggested, clang-format new lines in SemaKinds.td
JonChesterfield marked an inline comment as done.
  • undo reformat of existing def

Aside from the diagnostic wording, I think this LG to me. However, I'd appreciate if @rjmccall would also sign off.

Thanks! @rjmccall?

clang/include/clang/Basic/DiagnosticSemaKinds.td
5343

Better, thanks. Also discovered clang-format has learned to do sensible things with tablegen so applied it to the new defs.

Harbormaster completed remote builds in B49170: Diff 250263.
  • Update C codegen test to match output of trunk

Thanks guys. Patched the C codegen test before landing - minor change on trunk in the last week, orthogonal to this.

Delighted to have one less reason to write assembly.

This revision was automatically updated to reflect the committed changes.

Tests made assumptions about alignment which are invalid on arm, failed a buildbot. The tests don't actually care about alignment, so fixed by dropping the , align # part of the patterns.

This breaks tests on Windows: http://45.33.8.238/win/10664/step_7.txt

Please take a look, and if it takes some time please revert while you investigate.

This breaks tests on Windows: http://45.33.8.238/win/10664/step_7.txt

Please take a look, and if it takes some time please revert while you investigate.

Thanks! It seems Windows inserts 'dso_local' into the middle of the generated IR.

I can't test on Windows so the two that failed CI are now marked as "UNSUPPORTED: system-windows".

Do you know a usual work around for variation in symbol visibility? I'm happy to copy it from another test but am wary of guessing what might work on Windows.

This breaks tests on Windows: http://45.33.8.238/win/10664/step_7.txt

Please take a look, and if it takes some time please revert while you investigate.

Thanks! It seems Windows inserts 'dso_local' into the middle of the generated IR.

I can't test on Windows so the two that failed CI are now marked as "UNSUPPORTED: system-windows".

Do you know a usual work around for variation in symbol visibility? I'm happy to copy it from another test but am wary of guessing what might work on Windows.

Our windows builds always just add dso_local, so we typically just do a wildcard there to handle those cases (or, separate check lines for windows).

Also, see this bug: This crashes immediately when used on a template instantiation: https://bugs.llvm.org/show_bug.cgi?id=47169

Also, see this bug: This crashes immediately when used on a template instantiation: https://bugs.llvm.org/show_bug.cgi?id=47169

Thanks for the bug report! Template instantiations are missing from the test cases, I will go debugging.

I did a little debugging, and the problem is the template itself isn't a complete type until you require it with RequireCompleteType. You have this same problem with incomplete types: https://godbolt.org/z/hvf3M4

I would suspect you either add a RequireCompleteType for your loader_uninitialized checks, or move the check somewhere where it is already complete.

I did a little debugging, and the problem is the template itself isn't a complete type

That's clear cut then, thanks. This patch was limited to trivially constructible types, and we don't know whether the type is trivially constructible if it's incomplete. Thus it does require complete types and we're missing a check in sema

Yep! Declaring a global variable that isn't 'extern' with an incomplete type is disallowed anyway, so if you call RequireCompleteType, you're likely just diagnosing that early.

You MIGHT have to do some work to allow:

struct S;
extern S foo __attribute__((loader_uninitialized));
JonChesterfield added a comment.EditedAug 14 2020, 10:50 AM

Yep! Declaring a global variable that isn't 'extern' with an incomplete type is disallowed anyway, so if you call RequireCompleteType, you're likely just diagnosing that early.

You MIGHT have to do some work to allow:

struct S;
extern S foo __attribute__((loader_uninitialized));

I'll add that to the tests. Looking OK so far, added a check to SemaDecl and the crash is gone. No fix needed for C. Will have a patch up soon

edit: extern + loader_uninitialized was rejected in the first patch, will check it still works like that

Fix proposed at D85990. Thanks for the detailed bug report!