This is an archive of the discontinued LLVM Phabricator instance.

[clang] Allow const variables with weak attribute to be overridden
ClosedPublic

Authored by wanders on May 24 2022, 1:36 PM.

Details

Summary

A variable with weak attribute signifies that it can be replaced with
a "strong" symbol link time. Therefore it must not emitted with
"weak_odr" linkage, as that allows the backend to use its value in
optimizations.

The frontend already considers weak const variables as
non-constant (note_constexpr_var_init_weak diagnostic) so this change
makes frontend and backend consistent.

This commit reverses the

f49573d1 weak globals that are const should get weak_odr linkage.

commit from 2009-08-05 which introduced this behavior. Unfortunately
that commit doesn't provide any details on why the change was made.

This was discussed in
https://discourse.llvm.org/t/weak-attribute-semantics-on-const-variables/62311

Diff Detail

Event Timeline

wanders created this revision.May 24 2022, 1:36 PM
Herald added a project: Restricted Project. · View Herald Transcript
wanders requested review of this revision.May 24 2022, 1:36 PM

I'm not a competent reviewer for the patch, but +1 on aligning with GCC behavior here. I don't recall the motivation for the original patch (apologies again for the terrible commit message). This approach makes a lot of sense to me.

wanders updated this revision to Diff 431933.May 25 2022, 3:59 AM

Diff updated to be git-clang-format clean and to (hopefully) accommodate for differences in test output on w64.

The changes so far look sensible, but I think we should add some more tests for a few situations. 1) Using a const weak symbol as a valid initializer should be diagnosed (with a warning? with an error?) so users are alerted to the behavioral quirks. 2) Using a const weak symbol in a constant expression context should probably be an error, right? e.g.,

const int hmm __attribute__((weak)) = 12;
int erp = hmm; // Error in C, dynamic init in C++?

int main() {
  int blerp = hmm; // OK in both languages
  constexpr int derp = hmm; // Error in C++
  int array[hmm]; // Error in both languages
}

Also, this definitely could use a release note so users know about the behavioral change.

Do you have any ideas on how much code this change is expected to break? (Is it sufficient enough that we want to have an explicit deprecation period of the existing behavior before we switch to the new behavior?)

clang/include/clang/Basic/Attr.td
2962

THANK YOU!

The changes so far look sensible, but I think we should add some more tests for a few situations. 1) Using a const weak symbol as a valid initializer should be diagnosed (with a warning? with an error?) so users are alerted to the behavioral quirks.

I don't feel this is really necessary.

  1. Using a const weak symbol in a constant expression context should probably be an error, right? e.g.,

That's already the case (except, using a VLA is not an error in either language mode). Clang previously _only_ treated weak symbols as compile-time-known in the backend; the frontend has treated it as an non-constant-evaluable variable for ages, and that behavior is unchanged by this patch.

Also, this definitely could use a release note so users know about the behavioral change.

Do you have any ideas on how much code this change is expected to break? (Is it sufficient enough that we want to have an explicit deprecation period of the existing behavior before we switch to the new behavior?)

Because it's only changing the optimizer behavior not the frontend constant evaluator, I think it's pretty safe (which is also why I'm OK with it).

clang/include/clang/Basic/AttrDocs.td
6527

Better said:
"""
If an external declaration is marked weak and that symbol does not
exist during linking (possibly dynamic) the address of the symbol
will evaluate to NULL.
"""

(function decls are just weird in that &f is treated as equivalent to f)

The changes so far look sensible, but I think we should add some more tests for a few situations. 1) Using a const weak symbol as a valid initializer should be diagnosed (with a warning? with an error?) so users are alerted to the behavioral quirks. 2) Using a const weak symbol in a constant expression context should probably be an error, right? e.g.,

I do think we have already diagnose/fail on all the relevant cases here. And have tests for them, like

extern const int weak_int __attribute__((weak));
const int weak_int = 42;
int weak_int_test = weak_int; // expected-error {{not a compile-time constant}}

in clang/test/Sema/const-eval.c

But I'll have another look to make sure we have proper coverage.

Also, this definitely could use a release note so users know about the behavioral change.

Happy to write one, but not sure what it should say.
The "only" change in behavior is that frontend no longer tells backend it is allowed to optimize based on initializer value. Frontend behavior is intended to be kept exactly the same (it already is restrictive enough).

Do you have any ideas on how much code this change is expected to break? (Is it sufficient enough that we want to have an explicit deprecation period of the existing behavior before we switch to the new behavior?)

I don't really know much code out there would be affected. As mentioned in the discourse thread https://discourse.llvm.org/t/weak-attribute-semantics-on-const-variables/62311/7 I did some grepping in open source projects I could find. This was biased towards looking at C code.
I could see two different uses of const+weak:

  1. For defining a constant in a header file (which makes that symbol end up in many object files, making it weak avoids multiple definition link errors)
  2. To define some default configuration/identificationstring which is overridden by a strong symbol

There might of course be other uses of weak that I'm not aware of or could find.

But these two cases I want to think are fine with the proposed change:

For 1. we wouldn't alter behavior of code just speed due to optimization being prevented. (I think these uses want to use "selectany" attribute instead)

For 2. this change should actually fix a bug where functions in the same translation unit that provided the default was using the default rather than overridden value.

Here are some examples I found:
In u-boot they do this https://source.denx.de/u-boot/u-boot/-/blob/master/common/hwconfig.c#L71 and I think (havn't verified) clang/llvm would optimize the strlen(cpu_hwconfig) on line 103 to 0. So here I believe this code currently has incorrect behavior when compiled with clang.

Arduino for esp does this constant in header https://github.com/esp8266/Arduino/blob/master/variants/generic/common.h#L79 so that would cause an extra load of the value from memory for everyone blinking the LED using the old deprecaded name for that pin.

Here is a c++ example https://github.com/ClickHouse/ClickHouse/blob/master/src/Common/Allocator.cpp#L11 which I think it might not allow overriding the variable with a strong definition as intended when compiling with clang.

The case that motivated me to look into this is like:

const char VERSION[] __attribute__ ((weak))= "devel";

int is_devel_version(void) {
    return !strcmp (VERSION, "devel");
}

and when "VERSION" gets "weak_odr" linkage "is_devel_version" gets optimized to "return 1". Even if VERSION gets replaced with a strong value later in link phase.

The changes so far look sensible, but I think we should add some more tests for a few situations. 1) Using a const weak symbol as a valid initializer should be diagnosed (with a warning? with an error?) so users are alerted to the behavioral quirks.

I don't feel this is really necessary.

Why?

My thinking is: it's invalid to initialize a file scope variable in C from something other than a constant expression. These are not constant expressions. Why should the initialization work silently and maybe do something wrong?

  1. Using a const weak symbol in a constant expression context should probably be an error, right? e.g.,

That's already the case (except, using a VLA is not an error in either language mode). Clang previously _only_ treated weak symbols as compile-time-known in the backend; the frontend has treated it as an non-constant-evaluable variable for ages, and that behavior is unchanged by this patch.

Ah, okay, then it sounds like we already have the behavior I was hoping for, great.

Also, this definitely could use a release note so users know about the behavioral change.

Do you have any ideas on how much code this change is expected to break? (Is it sufficient enough that we want to have an explicit deprecation period of the existing behavior before we switch to the new behavior?)

Because it's only changing the optimizer behavior not the frontend constant evaluator, I think it's pretty safe (which is also why I'm OK with it).

+1

The changes so far look sensible, but I think we should add some more tests for a few situations. 1) Using a const weak symbol as a valid initializer should be diagnosed (with a warning? with an error?) so users are alerted to the behavioral quirks. 2) Using a const weak symbol in a constant expression context should probably be an error, right? e.g.,

I do think we have already diagnose/fail on all the relevant cases here. And have tests for them, like

extern const int weak_int __attribute__((weak));
const int weak_int = 42;
int weak_int_test = weak_int; // expected-error {{not a compile-time constant}}

in clang/test/Sema/const-eval.c

But I'll have another look to make sure we have proper coverage.

Thanks for double-checking it -- if we've got reasonable coverage, then that's great (nothing else needs to happen).

Also, this definitely could use a release note so users know about the behavioral change.

Happy to write one, but not sure what it should say.
The "only" change in behavior is that frontend no longer tells backend it is allowed to optimize based on initializer value. Frontend behavior is intended to be kept exactly the same (it already is restrictive enough).

I thought this was more disruptive than it actually is, so I feel less strongly. But if you wanted to add one, I'd put it under the Attribute Changes in Clang heading and something along the lines of what you just wrote about the backend no longer being allowed to optimize in some circumstances.

Do you have any ideas on how much code this change is expected to break? (Is it sufficient enough that we want to have an explicit deprecation period of the existing behavior before we switch to the new behavior?)

I don't really know much code out there would be affected. As mentioned in the discourse thread https://discourse.llvm.org/t/weak-attribute-semantics-on-const-variables/62311/7 I did some grepping in open source projects I could find. This was biased towards looking at C code.
I could see two different uses of const+weak:

  1. For defining a constant in a header file (which makes that symbol end up in many object files, making it weak avoids multiple definition link errors)
  2. To define some default configuration/identificationstring which is overridden by a strong symbol

There might of course be other uses of weak that I'm not aware of or could find.

But these two cases I want to think are fine with the proposed change:

For 1. we wouldn't alter behavior of code just speed due to optimization being prevented. (I think these uses want to use "selectany" attribute instead)

For 2. this change should actually fix a bug where functions in the same translation unit that provided the default was using the default rather than overridden value.

Here are some examples I found:
In u-boot they do this https://source.denx.de/u-boot/u-boot/-/blob/master/common/hwconfig.c#L71 and I think (havn't verified) clang/llvm would optimize the strlen(cpu_hwconfig) on line 103 to 0. So here I believe this code currently has incorrect behavior when compiled with clang.

Arduino for esp does this constant in header https://github.com/esp8266/Arduino/blob/master/variants/generic/common.h#L79 so that would cause an extra load of the value from memory for everyone blinking the LED using the old deprecaded name for that pin.

Here is a c++ example https://github.com/ClickHouse/ClickHouse/blob/master/src/Common/Allocator.cpp#L11 which I think it might not allow overriding the variable with a strong definition as intended when compiling with clang.

The case that motivated me to look into this is like:

const char VERSION[] __attribute__ ((weak))= "devel";

int is_devel_version(void) {
    return !strcmp (VERSION, "devel");
}

and when "VERSION" gets "weak_odr" linkage "is_devel_version" gets optimized to "return 1". Even if VERSION gets replaced with a strong value later in link phase.

Excellent, thank you!

I am comfortable moving forward with this.

wanders updated this revision to Diff 432669.May 27 2022, 4:34 PM

Diff updated with release note + updated text in documentation as per jyknight's suggestion.

Tests for existing behavior added in separate patch: https://reviews.llvm.org/D126578

aaron.ballman accepted this revision.Jun 1 2022, 9:59 AM

LGTM! Please wait a day or two before landing in case @jyknight has additional comments, though. Thanks!

This revision is now accepted and ready to land.Jun 1 2022, 9:59 AM
This revision was automatically updated to reflect the committed changes.
wanders marked 2 inline comments as done.