Page MenuHomePhabricator

[Clang] Undef attribute for global variables

Authored by JonChesterfield on Feb 10 2020, 3:00 PM.



[Clang] Attribute to allow defining undef global variables

Initializing global variables is very cheap on hosted implementations. The
C semantics of zero initializing globals work very well there. It is not
necessarily cheap on freestanding implementations. Where there is no loader
available, code must be emitted near the start point to write the appropriate
values into memory.

At present, external variables can be declared in C++ and definitions provided
in assembly (or IR) to achive this effect. This patch provides an attribute in
order to remove this reason for writing assembly for performance sensitive
freestanding implementations.

A close analogue in tree is LDS memory for amdgcn, where the kernel is
responsible for initializing the memory after it starts executing on the gpu.
Uninitalized variables in LDS are observably cheaper than zero initialized.

Patch is loosely based on the cuda shared and opencl __local variable
implementation which also produces undef global variables.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Quuxplusone added inline comments.Feb 27 2020, 6:52 PM
40 ↗(On Diff #247156)

Can you explain a bit about how this interacts with C++ constructors? Will this object not have its constructor called at startup; or is it that the constructor will be called but the memory simply won't have been zeroed before calling the constructor?

For P1144 relocation (D50114, D61761, etc), Anton Zhilin and I have been discussing the possibility of a similar-sounding attribute that would skip the constructor of a local variable altogether, allowing someone to write e.g.

T relocate(T *source) {
    [[unconstructed]] T result;
    memcpy(result, source, sizeof(T));
    return result;

If your attribute does exactly that, then I'm interested. If your attribute doesn't do that, but is occupying real estate that implies that it does, then I'm also interested.

JonChesterfield marked an inline comment as done.Feb 28 2020, 2:11 AM
JonChesterfield added inline comments.
40 ↗(On Diff #247156)

This change is orthogonal I think. No change to object lifetime. Without the attribute, a global is zero initialized and then the constructor is called before main. With it, the global is undef before the constructor call.

A different attribute that suppresses the con(de)structor call entirely is also of interest to me. A freestanding implementation is likely to accept global constructors without warning, and emit code for them, but not actually run them at startup. That's not a great UX.

This will need to impact static analysis and the AST, I think. Local variables can't be redeclared, but global variables can, so disallowing initializers syntactically when the attribute is present doesn't necessarily stop other declarations from defining the variable. Also, you need to make the attribute mark something as a definition, the same way that e.g. the alias attribute does. Also, this probably ought to disable the default-initialization of non-trivial types in C++, in case that's not already done.

I would like this to be the case but am having a tough time understanding how sema works. VarDecl::hasInit() looked promising but doesn't appear to indicate whether there is a syntactic initialiser (e.g. = 10) present. I will continue to investigate. Codegen appears to be working fine.

Looks like you figured this out.

In D74361#1880904, @jfb wrote:

The current uninitialized attribute fits the model C and C++ follow for attributes: they have no semantic meaning for a valid program, in that a compiler can just ignore them and (barring undefined behavior) there's no observable effect to the program. This updated semantic changes the behavior of a valid C and C++ program, because the standards specify the value of uninitialized globals and TLS. I'd much rather have a separate attribute, say no_zero_init, to explicitly say what this does.

This proposed attribute can similarly be ignored without observable semantic effect. Instead of an IR undef variable, we would have an IR zeroinitialized variable, which is a legitimate implementation choice for undef. Adding the attribute to an existing program will, in general, change its meaning - but that's also true of other attributes.

I agree with this reasoning, but since you seem willing to change the attribute name, the point is now moot.


We try to always add documentation for any new attribute.

I'm not sure I like the new name; it doesn't read right to me. Maybe loader_uninitialized makes the intent clear enough?

Thinking more about it, I agree with you that this is orthogonal to C++ initialization. Users on targets like yours probably ought to be able to disable C++ initialization without having to disable zero-initialization, or vice-versa.

The above patch composes sensibly with openmp, e.g.

#include <omp.h>
#pragma omp declare target
int data __attribute__((no_zero_initializer));
#pragma omp allocate(data) allocator(omp_pteam_mem_alloc)
#pragma omp end declare target
@data = hidden addrspace(3) global i32 undef, align 4

I like loader_uninitialized. There's some prior art on using NOLOAD from a (gnu) linker script to achieve the same result, which is a similar name. I'll update the patch accordingly.

I found an arm toolchain which supports UNINIT in linker scripts. Asking around I've also heard that games dev is a potential user for this feature (perhaps analogous to mmap's MAP_UNINITIALIZED?) but haven't been able to confirm specifics.

I'll take a first shot at the documentation too.

davidb added a subscriber: davidb.Mar 2 2020, 5:58 AM
  • Rename attribute, propose some documentation
aaron.ballman added a subscriber: aaron.ballman.
aaron.ballman added inline comments.

initialised -> initialized


If you don't need any custom semantic checking, you can remove that function and instead call handleSimpleAttribute<LoaderUninitializedAttr>(S, D, AL);


What should happen with redeclarations? e.g., in C:

int a;

int foo() { return a; }

int a __attribute__((loader_uninitialized));

(This would be a useful test case to add.)

Also, I'd like to see a test case where the attributed global is an array.


I'd also like to see a test demonstrating it doesn't accept any arguments.

JonChesterfield marked 2 inline comments as done.
  • Address review comments, more test coverage
JonChesterfield marked 2 inline comments as done.EditedMar 2 2020, 8:07 AM

Fixed the spelling/formatting, added more tests. The C++ case would be improved by warning on int x __attribute__((loader_uninitialised)) = 0 as there are two initializers.

The semantics for C are not what I hoped for where there are multiple definitions, one of which is via this attribute. Added a test for that. Recommendations for where to poke sema to raise an error on the second one are warmly invited. I'll try checkNewAttributesAfterDef, but will make a debug build first.


I think this patch does need some custom semantic checking, I just haven't been able to work out how to implement it. Specifically, the attribute is an initializer, so

int foo __attribute__((loader_uninitialised)) = some_value;

should be a warning, as the = some_value is going to be ignored.


Ah, yes. I was thinking about tentative definitions before changing this test to C++. Fixed the name to be less misleading.

C++ just rejects it. Multiple definitions => error. Added test to sema.

C accepts it in either order. Which I believe it should. Either one is a tentative definition, and the other provides an actual definition (of undef), or the definition (of undef) is followed by a redeclaration.

This leaves the hole that while the following is rightly rejected in C for having multiple definitions:

int a __attr__(...);
int a = 10;

This is erroneously accepted, with the attribute ignored:

int a = 10;
int a __attr__(...);
rjmccall added inline comments.Mar 2 2020, 8:53 AM

How about:

The `loader_uninitialized` attribute can be placed on global variables to
indicate that the variable does not need to be zero initialized by the loader.
On most targets, zero-initialization does not incur any additional cost.
For example, most general purpose operating systems deliberately ensure
that all memory is properly initialized in order to avoid leaking privileged
information from the kernel or other programs. However, some targets
do not make this guarantee, and on these targets, avoiding an unnecessary
zero-initialization can have a significant impact on load times and/or code

A declaration with this attribute is a non-tentative definition just as if it
provided an initializer. Variables with this attribute are considered to be
uninitialized in the same sense as a local variable, and the programs must
write to them before reading from them. If the variable's type is a C++ class
type with a non-trivial default constructor, or an array thereof, this attribute
only suppresses the static zero-initialization of the variable, not the dynamic
initialization provided by executing the default constructor.


This should be an error, not a warning, unless there's a specific need to be lenient.

aaron.ballman added inline comments.Mar 2 2020, 10:26 AM

Agreed that this should be an error rather than a warning.

Not 100% certain, but I suspect you'll need to add that checking to Sema::AddInitializerToDecl() because I'm guessing that the initializer has not been processed by the time the attributes are being applied to the variable declaration. You can check VarDecl::hasInit() within handleLoaderUninitializedAttr() to see if that specific declaration has an initializer, but I'm betting that gives you the wrong answer.


This is erroneously accepted, with the attribute ignored:

I think you will probably want to add another case to mergeDeclAttribute() following the similar patterns there so that you can reject when you need to merge declarations that should not be merged.

JonChesterfield marked 2 inline comments as done.
  • Reject initialisers, update doc to recommended string
Quuxplusone added inline comments.Mar 2 2020, 11:27 AM

This test case is identical to line 36 of clang/test/Sema/attr-loader-uninitialized.cpp, where you say you don't want it to compile at all.

I think you need a clearer idea of how this interacts with initializers. Is it merely supposed to eliminate the zero-initialization that happens before the user-specified construction/initialization, or is it supposed to compete with the user-specified construction/initialization?

That is, for

nontrivial unt [[clang::loader_uninitialized]];

is it merely supposed to call unt::unt() on a chunk of undef memory (instead of the usual chunk of zeroed memory), or is it supposed to skip the constructor entirely? And for

int x [[clang::loader_uninitialized]] = foo();

is it merely supposed to call foo() and assign the result to a chunk of undef memory (instead of the usual chunk of zeroed memory), or is it supposed to skip the initialization entirely?

JonChesterfield marked 4 inline comments as done.Mar 2 2020, 11:49 AM
JonChesterfield added inline comments.

That's a lot better! Thank you. Updated the patch to use your wording verbatim.


Nice, Yes, that's much better - I should have searched for the opencl __local handling. As you suspected, hasInit() just returns true at that point.


I think you commented while the first working piece of sema landed. My thinking is relatively clear but my understanding of clang's semantic analysis is a work in progress!

Initializers (= foo()) are straightforward. Error on the basis that the attribute effectively means = undef, and one should not have two initializers. A test case is now added for that (and now passes).

The codegen I want for a default constructed global marked with this variable is:

  • global variable allocated, with undef as the original value
  • default constructor call synthesized
  • said default constructor set up for invocation from crt, before main, writing over the undef value

Where the default constructor can be optimized as usual, e.g. if it always writes a constant, we can init with that constant instead of the undef and elide the constructor.

I don't have that actually working yet - the constructor call is not being emitted, so we just have the undef global.

I think it's important to distinguish between the values of the bits when the program is loaded and whether constructor/destructors are called, as one could want any combination of the two.

rjmccall added inline comments.Mar 2 2020, 12:21 PM

I think Arthur is suggesting that it would be useful to allow the attribute to be used in conjunction with an initializer in C++, since if the initializer has to be run dynamically, we can still meaningfully suppress the static zero-initialization. That is, we've accepted that it's useful to do this when *default-initializing* a global, but it's actually useful when doing *any* kind of dynamic initialization.

Maybe we can leave it as an error in C++ when the initializer is a constant expression. Although that might be unnecessarily brittle if e.g. the constructor is constexpr in one library version but not another.

Quuxplusone added inline comments.Mar 2 2020, 12:27 PM

No, that's exctly what I mean. You seem to be holding two contradictory ideas simultaneously:

[[loader_uninitialized]] X x = X{};  // two initializers, therefore error

[[loader_uninitialized]] X x {}; // one initializer plus one constructor, therefore fine

In C++, these two declarations have identical semantics. It doesn't make sense to say that one of them "calls a constructor" and the other one "has an initializer." They're literally the same thing.

Similarly in both C99 and C++ with plain old ints:

[[loader_uninitialized]] int x = foo();

This means "call foo and dynamically initialize x with the result," just as surely as

[[loader_uninitialized]] X x = X();

means "call X::X and dynamically initialize x with the result." Having one rule for dynamic initializers of primitive types and a separate rule for dynamic initializers of class types doesn't work.

Furthermore, "dynamic initialization" can be promoted to compile-time:

[[loader_uninitialized]] int x = 42;
[[loader_uninitialized]] std::string_view x = "hello world";

It wouldn't make semantic sense to say that one of these has "two initializers" and the other has "one initializer," because both of the initializations end up happening at compile time and getting put into .data.

JonChesterfield marked 2 inline comments as done.Mar 2 2020, 1:04 PM
JonChesterfield added inline comments.

I'm under the impression that the constructs are:

X x = X{};  // default construct an X and then call the copy constructor at x
X x {}; // default construct an X at the location of x
X x; // default construct an X at the location of x

The C++ initialisation rules are very complicated so I won't be shocked to have got that wrong.

If your toolchain actually runs global constructors then there isn't much use to marking C++ objects with this attribute, unless said constructor does nothing and would thus leave the bytes still undef.

Accepting default constructors (or, perhaps only accepting trivial types?) would allow people to use this attribute with the approximation of C++ available on systems that don't actually run the global constructors.

Equally, if this seems too complicated, I'm happy to restrict this to C as that's still an improvement on the asm and/or linker scripts I have available today.

rjmccall added inline comments.Mar 2 2020, 2:12 PM

Similarly in both C99 and C++ with plain old ints:

C does not allow non-constant initialization of global variables.

Let me try to explain what we're thinking here. If the variable provides both an initializer and this attribute, and the compiler can successfully perform constant initialization (as it's required to do in C, and as it often does in C++), then the attribute has probably become meaningless because we're not going to suppress that constant initialization. The compiler strongly prefers to diagnose meaningless attributes in some way, because they often indirectly indicate a bug. For example, maybe the initializer isn't supposed to be there but it's hidden behind a huge mess of macros; or maybe somebody added a constant initializer because they changed the variable schema to make that possible, and so the programmer should now remove the attribute and the dynamic initialization code that went along with it. Compared to that, the C++ use case of "allow a dynamic initializer but suppress static zero-initialization" is pretty marginal; that doesn't mean we absolutely shouldn't support it, but it does suggest that we shouldn't prioritize it over other aspects of the feature, like catching that possible bug.

rjmccall added inline comments.Mar 2 2020, 2:20 PM

If your toolchain actually runs global constructors then there isn't much use to marking C++ objects with this attribute, unless said constructor does nothing and would thus leave the bytes still undef.

That's not totally true. C++ still requires globals to be zero-initialized before it runs dynamic initializers, and the dynamic initializer is very likely to make that zero-initialization totally redundant. Arthur is right that there's no inherent reason that that only applies to dynamic *default* initialization, though, as opposed to any non-constant initialization.

I'd prefer to avoid making this C-specific; Clang takes a lot of pride in trying to make feature cross-language as much as possible. We could just make the error about providing both the attribute and an initializer C-specific, though.

Quuxplusone added inline comments.Mar 2 2020, 3:15 PM

Ah, I was wrong when I said "in both C99 and C++" int x = foo(); would be supported. You're right, that's a C++ism. Even C11 doesn't seem to support dynamic initializers, not even for function-local static variables.

I suppose I should also mention that I totally don't understand the domain here. :) To me, the way you avoid storing an explicit initializer in the ELF file is you allocate the variable into .bss instead of .data; and the way you avoid paying for zero-initialization at runtime would be that you allocate the variable into a hand-crafted section instead of .bss (example from Keil docs). In ordinary ELF-land, there's no way to say "I want this variable in .data but not initialized," nor "I want this variable in .bss but not zero-initialized" — the file format literally can't represent those concepts.

I've continued thinking about / experimenting with this. Curiously - X x; and X x{}; are considered distinct by clang. The current patch will only accept the former. I'll add some tests for that.

I think there's a reasonable chance that the developers who want to elide the runtime cost of zero initialising will usually also want to avoid dynamic initialisation. That suggests we could accept only trivially default constructible classes, where the undef initializer is correct in that one cannot determine whether a no-op constructor actually ran or not. If we go with that, then the more complicated question of exactly how this should interact with user-controlled disabling of dynamic initialization can be postponed until that feature is introduced. This patch is then reasonably self contained.

This patch is strongly related to the linker script approach, or equivalent asm. It's target-dependent how the uninitialised data gets represented, e.g. x64 will put it in bss anyway. Opencl's __local is similarly uninitialised, and gets annotated with an address space that maps onto magic in llc and/or the loader.

JonChesterfield marked 2 inline comments as done.
  • Reduce scope to trivial default constructed types
  • error on extern variables, minor cleanup
JonChesterfield edited the summary of this revision. (Show Details)Mar 2 2020, 4:18 PM
  • Error on redeclaration with loader_uninit in C

I'm finally happy with the semantic checks here. Thanks for the guidance on where to look for the hooks.

  • attributed variable must be at global scope
  • all initializers are rejected
  • default constructors must be trivial (to reduce the scope of this patch)
  • extern variables rejected as they can't meaningfully have a definition
  • attribute on a declaration following a normal definition in C rejected

Patch is a bit bigger than I hoped for but quite self contained. Everything is guarded by a test on the attribute.

@Quuxplusone does restricting this to trivial default construction resolve your concerns?

JonChesterfield marked an inline comment as done.Mar 3 2020, 8:56 AM
aaron.ballman added inline comments.Mar 6 2020, 6:49 AM

Missing a full stop at the end of the sentence. The comment can probably be re-flowed with the preceding sentence too.


Missing full stop.


Should this either be calling VarDecl::hasExternalStorage() or looking at the linkage of the variable, rather than at the storage class written in the source?


I thought err_loader_uninitialized_extern says that the variable cannot have external linkage?


I thought err_loader_uninitialized_extern says that the variable cannot have external linkage?

JonChesterfield marked 2 inline comments as done.
  • Review comments, add tests for private_extern

I thought err_loader_uninitialized_extern says that the variable cannot have external linkage?

Embarrassing! This was a badly written error message, now fixed to:
external declaration of variable cannot have the 'loader_uninitialized' attribute

The premise behind this feature is to be semantically identical to:
type foo = undef;

The initializer value is orthogonal to the variable linkage and visibility. If = 4 was ok, so should this attribute be.

What is not meaningful is to declare that a variable is defined elsewhere that is uninitialized.
That is, extern int x = 42; // warning: 'extern' variable has an initializer
Therefore [[loader_uninitialized]] int x = 42; // also bad

This patch makes the latter an error, on the basis that it's definitely a mistake to provide two initializers for one variable (one 42, one undef).

C++ thinks const int x; is an error and C thinks const int x; is fine. This patch remains consistent with that.


Interesting question, thank you. SC_Extern is right.

hasExternalStorage is true if extern or private_extern. I hadn't seen private_extern before, but it appears to be a way to model hidden visibility. It accepts a normal initializer, e.g.
__private_extern__ int private_extern_can_be_initialised = 10;
therefore should also work with undef.

Added a test for this (which will fail if SC_Extern is replaced with hasExternalStorage).

Replying to linkage out of line as it comes up on a few inline comments.

ping @aaron.ballman - does that look right to you?

aaron.ballman accepted this revision.Thu, Mar 12, 2:07 PM

Aside from the diagnostic wording, I think this LG to me. However, I'd appreciate if @rjmccall would also sign off.


How would you feel about: "variable %0 cannot be declared both 'extern' and with the 'loader_uninitialized' attribute" (or something along those lines) to clarify "external declaration"?

This revision is now accepted and ready to land.Thu, Mar 12, 2:07 PM
  • Amend diagnostic as suggested, clang-format new lines in
JonChesterfield marked an inline comment as done.
  • undo reformat of existing def

Aside from the diagnostic wording, I think this LG to me. However, I'd appreciate if @rjmccall would also sign off.

Thanks! @rjmccall?


Better, thanks. Also discovered clang-format has learned to do sensible things with tablegen so applied it to the new defs.

Harbormaster completed remote builds in B49170: Diff 250263.
  • Update C codegen test to match output of trunk

Thanks guys. Patched the C codegen test before landing - minor change on trunk in the last week, orthogonal to this.

Delighted to have one less reason to write assembly.

This revision was automatically updated to reflect the committed changes.

Tests made assumptions about alignment which are invalid on arm, failed a buildbot. The tests don't actually care about alignment, so fixed by dropping the , align # part of the patterns.

This breaks tests on Windows:

Please take a look, and if it takes some time please revert while you investigate.

This breaks tests on Windows:

Please take a look, and if it takes some time please revert while you investigate.

Thanks! It seems Windows inserts 'dso_local' into the middle of the generated IR.

I can't test on Windows so the two that failed CI are now marked as "UNSUPPORTED: system-windows".

Do you know a usual work around for variation in symbol visibility? I'm happy to copy it from another test but am wary of guessing what might work on Windows.