This is an archive of the discontinued LLVM Phabricator instance.

[OpenCL] Mark kernel functions with default visibility
AbandonedPublic

Authored by scott.linder on Oct 11 2018, 11:12 AM.

Details

Summary

The rationale for this is that OpenCL provides programmatic access to these symbols, but not to e.g. non-kernel functions. These symbols should have default visibility even when e.g. -fvisibility hidden is present.

This is an alternative approach to achieving the same goals as https://reviews.llvm.org/D52891

Diff Detail

Event Timeline

scott.linder created this revision.Oct 11 2018, 11:12 AM
arsenm added inline comments.Oct 12 2018, 5:04 AM
lib/AST/Decl.cpp
565

isa

scott.linder marked an inline comment as done.

Address feedback

This approach is trying to make OpenCL kernel and variable exceptions to -fvisibility option. However it does not provide users with choices. What if a user really wants to change the visibility of kernels and variables by -fvisibility? I think is more like a hack compared to https://reviews.llvm.org/D52891

The rationale is that -fvisibility only affects the default, and already does not apply in many cases. For example, see the rest of the conditions above the fvisibility check in getLVForNamespaceScopeDecl: when Var->getStorageClass() == SC_Static the linkage is (usually) internal and the visibility is default. In cases where individual symbols need unique visibility an __attribute__ can be used.

I think one question is whether OpenCL language semantics allow us to make these visibility determinations; I am going off of the APIs available to access symbols.

The rationale is that -fvisibility only affects the default, and already does not apply in many cases. For example, see the rest of the conditions above the fvisibility check in getLVForNamespaceScopeDecl: when Var->getStorageClass() == SC_Static the linkage is (usually) internal and the visibility is default. In cases where individual symbols need unique visibility an __attribute__ can be used.

I think one question is whether OpenCL language semantics allow us to make these visibility determinations; I am going off of the APIs available to access symbols.

I feel confused. So -fvisibility only applies to situations where it does not cause conflicts with language rule enforced linkage/visibility?

rsmith added a subscriber: rsmith.Oct 12 2018, 12:24 PM

The rationale is that -fvisibility only affects the default, and already does not apply in many cases. For example, see the rest of the conditions above the fvisibility check in getLVForNamespaceScopeDecl: when Var->getStorageClass() == SC_Static the linkage is (usually) internal and the visibility is default.

Visibility is meaningless for an internal-linkage declaration.

I think one question is whether OpenCL language semantics allow us to make these visibility determinations; I am going off of the APIs available to access symbols.

-fvisibility is not governed by the OpenCL specification; it's a Clang / GCC extension, and we get to determine its semantics, which generally override the rules from the language model we're implementing.

I'm a little surprised that symbol visibility is even meaningful when compiling GPU code. Can you give some more background on that?

To the extent that visibility is meaningful, it does seem to make sense for kernel to imply default visibility in the same way an attribute would, since it's an unambiguous marker that the function is intended to be called from outside the DSO. It's less obvious to me that the same is true for global variables; is there no distinction in OpenCL between global variables that can be accessed by the host and global variables that are internal to the device-side computation?

That's my understanding, at least.

Take for example:

$ cat foo.c 
static void foo() { }
void bar() { }

void baz() { bar(); foo(); }
$ clang -fvisibility=protected -O0 -S -emit-llvm foo.c -o - 
...

; Function Attrs: noinline nounwind optnone uwtable
define protected void @bar() #0 {
entry:
  ret void
}

; Function Attrs: noinline nounwind optnone uwtable
define protected void @baz() #0 {
entry:
  call void @bar()
  call void @foo()
  ret void
}

; Function Attrs: noinline nounwind optnone uwtable
define internal void @foo() #0 {
entry:
  ret void
}

...

Here foo has default visibility, while the user asked for protected. I still do not completely understand the interaction between linkage and visibility, as Clang gives default visibility to any symbol with internal linkage.

The rationale is that -fvisibility only affects the default, and already does not apply in many cases. For example, see the rest of the conditions above the fvisibility check in getLVForNamespaceScopeDecl: when Var->getStorageClass() == SC_Static the linkage is (usually) internal and the visibility is default.

Visibility is meaningless for an internal-linkage declaration.

I think one question is whether OpenCL language semantics allow us to make these visibility determinations; I am going off of the APIs available to access symbols.

-fvisibility is not governed by the OpenCL specification; it's a Clang / GCC extension, and we get to determine its semantics, which generally override the rules from the language model we're implementing.

I'm a little surprised that symbol visibility is even meaningful when compiling GPU code. Can you give some more background on that?

To the extent that visibility is meaningful, it does seem to make sense for kernel to imply default visibility in the same way an attribute would, since it's an unambiguous marker that the function is intended to be called from outside the DSO. It's less obvious to me that the same is true for global variables; is there no distinction in OpenCL between global variables that can be accessed by the host and global variables that are internal to the device-side computation?

This makes more sense, thank you for the explanation. Ignore my last post, it makes sense that visibility is defaulted and ignored when a symbol is assigned internal linkage.

I am still not confident about globals; maybe @b-sumner has more insight? We have APIs for accessing global variable symbols form the host, but this may be AMD-specific, not general to OpenCL.

The rationale is that -fvisibility only affects the default, and already does not apply in many cases. For example, see the rest of the conditions above the fvisibility check in getLVForNamespaceScopeDecl: when Var->getStorageClass() == SC_Static the linkage is (usually) internal and the visibility is default.

Visibility is meaningless for an internal-linkage declaration.

I think one question is whether OpenCL language semantics allow us to make these visibility determinations; I am going off of the APIs available to access symbols.

-fvisibility is not governed by the OpenCL specification; it's a Clang / GCC extension, and we get to determine its semantics, which generally override the rules from the language model we're implementing.

I'm a little surprised that symbol visibility is even meaningful when compiling GPU code. Can you give some more background on that?

To the extent that visibility is meaningful, it does seem to make sense for kernel to imply default visibility in the same way an attribute would, since it's an unambiguous marker that the function is intended to be called from outside the DSO. It's less obvious to me that the same is true for global variables; is there no distinction in OpenCL between global variables that can be accessed by the host and global variables that are internal to the device-side computation?

This makes more sense, thank you for the explanation. Ignore my last post, it makes sense that visibility is defaulted and ignored when a symbol is assigned internal linkage.

I am still not confident about globals; maybe @b-sumner has more insight? We have APIs for accessing global variable symbols form the host, but this may be AMD-specific, not general to OpenCL.

One thing I can think of the global variables is their initialization. If their initialization must be done by the runtime, then they have to be visible to the runtime.

The rationale is that -fvisibility only affects the default, and already does not apply in many cases. For example, see the rest of the conditions above the fvisibility check in getLVForNamespaceScopeDecl: when Var->getStorageClass() == SC_Static the linkage is (usually) internal and the visibility is default.

Visibility is meaningless for an internal-linkage declaration.

I think one question is whether OpenCL language semantics allow us to make these visibility determinations; I am going off of the APIs available to access symbols.

-fvisibility is not governed by the OpenCL specification; it's a Clang / GCC extension, and we get to determine its semantics, which generally override the rules from the language model we're implementing.

I'm a little surprised that symbol visibility is even meaningful when compiling GPU code. Can you give some more background on that?

To the extent that visibility is meaningful, it does seem to make sense for kernel to imply default visibility in the same way an attribute would, since it's an unambiguous marker that the function is intended to be called from outside the DSO. It's less obvious to me that the same is true for global variables; is there no distinction in OpenCL between global variables that can be accessed by the host and global variables that are internal to the device-side computation?

This makes more sense, thank you for the explanation. Ignore my last post, it makes sense that visibility is defaulted and ignored when a symbol is assigned internal linkage.

I am still not confident about globals; maybe @b-sumner has more insight? We have APIs for accessing global variable symbols form the host, but this may be AMD-specific, not general to OpenCL.

One thing I can think of the global variables is their initialization. If their initialization must be done by the runtime, then they have to be visible to the runtime.

Tony mentioned that in one of the CL2.X standards there is support for dynamic initialization of global variables, but I don't have a standard reference for that.

Beyond constructors/destructors I believe an API which we implement through access to dynamic symbols for global variable is clGetProgramBuildInfo(..., CL_PROGRAM_BUILD_GLOBAL_VARIABLE_TOTAL_SIZE, ...) which is defined as "The total amount of storage, in bytes, used by program variables in the global address space."

As for kernels, an example of an API which we implement through access to kernel symbols is clGetProgramInfo(..., CL_PROGRAM_KERNEL_NAMES, ...)

Both of these are defined at https://www.khronos.org/registry/OpenCL/specs/2.2/html/OpenCL_API.html#_program_object_queries

Beyond constructors/destructors I believe an API which we implement through access to dynamic symbols for global variable is clGetProgramBuildInfo(..., CL_PROGRAM_BUILD_GLOBAL_VARIABLE_TOTAL_SIZE, ...) which is defined as "The total amount of storage, in bytes, used by program variables in the global address space."

Presumably this is looking at the size of a certain section / segment of the image. It's unclear to me why symbol visibility would affect this in any way.

As for kernels, an example of an API which we implement through access to kernel symbols is clGetProgramInfo(..., CL_PROGRAM_KERNEL_NAMES, ...)

Sure. kernel, like dllexport or visibility(default), seems to act as declaring a function to be part of the external interface of the code, and it makes sense for all of those to override a command-line -fvisibility flag.

But it's still unclear to me what effect someone would expect -fvisibility=hidden to have when compiling OpenCL code. What are these symbols being hidden-by-default *from*, if not the host-side OpenCL runtime? Is there ever actually have a device-side dynamic linker?

Don't mark namespace-scope global variable declarations in OpenCL with explicit default visibility

Beyond constructors/destructors I believe an API which we implement through access to dynamic symbols for global variable is clGetProgramBuildInfo(..., CL_PROGRAM_BUILD_GLOBAL_VARIABLE_TOTAL_SIZE, ...) which is defined as "The total amount of storage, in bytes, used by program variables in the global address space."

Presumably this is looking at the size of a certain section / segment of the image. It's unclear to me why symbol visibility would affect this in any way.

You're right, the OpenCL APIs do not seem to require access to these symbols, as inspecting !X segments is sufficient. I removed global variable declarations from the patch.

As for kernels, an example of an API which we implement through access to kernel symbols is clGetProgramInfo(..., CL_PROGRAM_KERNEL_NAMES, ...)

Sure. kernel, like dllexport or visibility(default), seems to act as declaring a function to be part of the external interface of the code, and it makes sense for all of those to override a command-line -fvisibility flag.

But it's still unclear to me what effect someone would expect -fvisibility=hidden to have when compiling OpenCL code. What are these symbols being hidden-by-default *from*, if not the host-side OpenCL runtime? Is there ever actually have a device-side dynamic linker?

Currently for AMDGPU it is an optimization, because if the symbol is non-preemptible we can emit a static PCREL relocation and avoid a load from the GOT. We do not currently support device-side dynamic linking for e.g. lazy binding, so hiding non-kernel symbols is OK. When we begin to support device-side linking we can reconsider the blanket directive. In general I think someone compiling OpenCL code with this option would not expect kernel symbols to be hidden, as you mention; as far as what they intend to hide the symbols from I think that depends on how the implementation handles linking.

t-tye accepted this revision.Nov 5 2018, 1:22 PM

LGTM

Summary needs updating as now only being done for kernels and not namespace scope variables.

This revision is now accepted and ready to land.Nov 5 2018, 1:22 PM
scott.linder retitled this revision from [OpenCL] Mark namespace scope variables and kernel functions with default visibility to [OpenCL] Mark kernel functions with default visibility.Nov 5 2018, 1:25 PM

I agree with Richard that I'm not sure what the point of supporting frontend visibility settings in OpenCL is. If you want the "everything is internal to the image" optimization, presumably you can just infer visibility on everything in a pass over the IR.

lib/AST/Decl.cpp
565

I think you should just check for the attribute here; it can't be present in non-OpenCL files, and it can't be present on non-functions, and it's probably cheaper to look for it than to check the language option anyway.

arsenm added a comment.Nov 5 2018, 4:56 PM

I agree with Richard that I'm not sure what the point of supporting frontend visibility settings in OpenCL is. If you want the "everything is internal to the image" optimization, presumably you can just infer visibility on everything in a pass over the IR.

Eventually we want to support linking object files, and setting the visibility in the IR won't be aware of any user specified visibility

I agree with Richard that I'm not sure what the point of supporting frontend visibility settings in OpenCL is. If you want the "everything is internal to the image" optimization, presumably you can just infer visibility on everything in a pass over the IR.

Eventually we want to support linking object files, and setting the visibility in the IR won't be aware of any user specified visibility

But do you want to support *dynamically* linking object files? Because that's what visibility is about.

But do you want to support *dynamically* linking object files? Because that's what visibility is about.

To be specific, if you don't have multiple levels of linking — doing a slower and relatively more expensive link to form a self-contained unit of code distribution, then doing a faster link to form a runnable program from multiple independently-distributed such units — then visibility isn't really doing anything for you.

t-tye added a comment.Nov 5 2018, 8:15 PM

But do you want to support *dynamically* linking object files? Because that's what visibility is about.

To be specific, if you don't have multiple levels of linking — doing a slower and relatively more expensive link to form a self-contained unit of code distribution, then doing a faster link to form a runnable program from multiple independently-distributed such units — then visibility isn't really doing anything for you.

Currently the AMDGPU code objects are fully linked dynamically loader shared libraries. They are not relocatable code objects. This allows fewer and cheaper relocations and a smaller (dynsym) symbol table for the runtime loader.

Okay, that's interesting. And that dynamic linking step includes fairly unrestricted linking of OpenCL code to other OpenCL code, rather than just e.g. loading a single block of OpenCL code that exports a small, fixed interface? If so, then I accept that you need symbol visibility.

I don't believe that is currently the case (the unrestricted linking of OCL code to OCL code via a dynamic linker), but we do have the notion of a static link step, followed by dynamic linking at runtime. The static link step is currently via IR, but we plan to support linking object files. Maybe I misunderstand the distinction between linkage and visibility, but it seems reasonable that a user would want to have e.g. a non-kernel function participate in static linking, but not be preemptible in the final shared object. The intention with this patch is to allow this with something like -fvisibility hidden without disrupting kernel symbols, which must appear in the dynsym for the reasons mentioned earlier in the thread.

I don't believe that is currently the case (the unrestricted linking of OCL code to OCL code via a dynamic linker), but we do have the notion of a static link step, followed by dynamic linking at runtime. The static link step is currently via IR, but we plan to support linking object files. Maybe I misunderstand the distinction between linkage and visibility, but it seems reasonable that a user would want to have e.g. a non-kernel function participate in static linking, but not be preemptible in the final shared object. The intention with this patch is to allow this with something like -fvisibility hidden without disrupting kernel symbols, which must appear in the dynsym for the reasons mentioned earlier in the thread.

Okay, this still doesn't need user-controlled symbol visibility. The basic question here is whether there is any need for anything other than kernel functions to have dynamic linkage. If not, then you really just need to mark everything as hidden except for kernels. You *could* do that marking in Sema, but frankly that seems like an implementation detail escaping into the AST, and it's just going to cause you unwanted pain and sub-optimality. (For example: -fvisibility=hidden doesn't actually change the visibility of external declarations for reasons that make a lot of sense for general-purpose environments but probably don't matter to you at all. If you're doing this to get better code-generation for references to external entities that are going to end up within the image, you actually need to add a visibility attribute to every single top-level declaration to get that effect. It's much easier to just do it in a pass on the module.)

I don't believe that is currently the case (the unrestricted linking of OCL code to OCL code via a dynamic linker), but we do have the notion of a static link step, followed by dynamic linking at runtime. The static link step is currently via IR, but we plan to support linking object files. Maybe I misunderstand the distinction between linkage and visibility, but it seems reasonable that a user would want to have e.g. a non-kernel function participate in static linking, but not be preemptible in the final shared object. The intention with this patch is to allow this with something like -fvisibility hidden without disrupting kernel symbols, which must appear in the dynsym for the reasons mentioned earlier in the thread.

Okay, this still doesn't need user-controlled symbol visibility. The basic question here is whether there is any need for anything other than kernel functions to have dynamic linkage. If not, then you really just need to mark everything as hidden except for kernels. You *could* do that marking in Sema, but frankly that seems like an implementation detail escaping into the AST, and it's just going to cause you unwanted pain and sub-optimality. (For example: -fvisibility=hidden doesn't actually change the visibility of external declarations for reasons that make a lot of sense for general-purpose environments but probably don't matter to you at all. If you're doing this to get better code-generation for references to external entities that are going to end up within the image, you actually need to add a visibility attribute to every single top-level declaration to get that effect. It's much easier to just do it in a pass on the module.)

If such a pass were to mark all non-kernel symbols as hidden, for example, how would it support languages other than OpenCL where the runtime constraints differ? AMDGPU supports other languages where the assumption that only kernel functions must be visible is not valid. Can a module pass override visibility without making assumptions about the source language? I may just be unfamiliar with Clang, and not know where this sort of pass would live.

I think a pass which only overrides kernel function visibility (i.e. by forcing it to be default) would always be correct, but touching other symbol visibility begins to make assumptions which are not necessarily valid for all languages.

I don't believe that is currently the case (the unrestricted linking of OCL code to OCL code via a dynamic linker), but we do have the notion of a static link step, followed by dynamic linking at runtime. The static link step is currently via IR, but we plan to support linking object files. Maybe I misunderstand the distinction between linkage and visibility, but it seems reasonable that a user would want to have e.g. a non-kernel function participate in static linking, but not be preemptible in the final shared object. The intention with this patch is to allow this with something like -fvisibility hidden without disrupting kernel symbols, which must appear in the dynsym for the reasons mentioned earlier in the thread.

Okay, this still doesn't need user-controlled symbol visibility. The basic question here is whether there is any need for anything other than kernel functions to have dynamic linkage. If not, then you really just need to mark everything as hidden except for kernels. You *could* do that marking in Sema, but frankly that seems like an implementation detail escaping into the AST, and it's just going to cause you unwanted pain and sub-optimality. (For example: -fvisibility=hidden doesn't actually change the visibility of external declarations for reasons that make a lot of sense for general-purpose environments but probably don't matter to you at all. If you're doing this to get better code-generation for references to external entities that are going to end up within the image, you actually need to add a visibility attribute to every single top-level declaration to get that effect. It's much easier to just do it in a pass on the module.)

If such a pass were to mark all non-kernel symbols as hidden, for example, how would it support languages other than OpenCL where the runtime constraints differ? AMDGPU supports other languages where the assumption that only kernel functions must be visible is not valid. Can a module pass override visibility without making assumptions about the source language? I may just be unfamiliar with Clang, and not know where this sort of pass would live.

I think a pass which only overrides kernel function visibility (i.e. by forcing it to be default) would always be correct, but touching other symbol visibility begins to make assumptions which are not necessarily valid for all languages.

You still have the same linkage model for those other languages, right? Ultimately there's something like a kernel function that has to be declared specially and which becomes the only public interface of the code?

You still have the same linkage model for those other languages, right? Ultimately there's something like a kernel function that has to be declared specially and which becomes the only public interface of the code?

I don't think this is true for all languages targeting AMDGPU. For example, HIP has APIs like hipMemcpyToSymbol which accept a string identifying a symbol for a variable in global/constant address space. @yaxunl is my understanding about the HIP runtime requiring dynamic symbols for global variables correct here?

You still have the same linkage model for those other languages, right? Ultimately there's something like a kernel function that has to be declared specially and which becomes the only public interface of the code?

I don't think this is true for all languages targeting AMDGPU. For example, HIP has APIs like hipMemcpyToSymbol which accept a string identifying a symbol for a variable in global/constant address space. @yaxunl is my understanding about the HIP runtime requiring dynamic symbols for global variables correct here?

Right. There is also hipGetSymbolAddress which returns address of a global variable in device memory given the name of the variable. Therefore we cannot hide the global variables.

You still have the same linkage model for those other languages, right? Ultimately there's something like a kernel function that has to be declared specially and which becomes the only public interface of the code?

I don't think this is true for all languages targeting AMDGPU. For example, HIP has APIs like hipMemcpyToSymbol which accept a string identifying a symbol for a variable in global/constant address space. @yaxunl is my understanding about the HIP runtime requiring dynamic symbols for global variables correct here?

Right. There is also hipGetSymbolAddress which returns address of a global variable in device memory given the name of the variable. Therefore we cannot hide the global variables.

Okay. So it's still the case that all symbols will be defined within the linkage unit; it's just that some things might need to get exposed outside of it.

LLVM does provide a dso_local attribute which you could use unconditionally on every symbol without actually changing visibility.

Okay. So it's still the case that all symbols will be defined within the linkage unit; it's just that some things might need to get exposed outside of it.

LLVM does provide a dso_local attribute which you could use unconditionally on every symbol without actually changing visibility.

It seems like dso_local is an IR concept, and is a guarantee of non-preemptability? Marking e.g. a global variable as dso_local in IR doesn't seem to affect what LLD infers about it;computeIsPreemptible in the ELF writer still believes it is preemptable, and so LLD complains when it attempts to generate a dynamic relocation in a read-only section (i.e. text). Is dso_local meant to convey anything to the linker?

If this is the intended behavior we could still tell LLD to assume all defined symbols are not preemptable with something like -Bsymbolic. If we support dynamic linking/preemption in the future we will have to revisit this, but at that time visibility will be meaningful.

I suppose one approach is then:

  • Remove the implicit -fvisibility=hidden in the AMDGPU Clang toolchain
  • Add a module pass to mark all global values with dso_local
  • Add an implicit -Bsymbolic to the linker commandline

I think -fvisibility=hidden isn't good enough because you want to infer hidden visibility even for external symbol references, and that's just not how global visibility works. But after this discussion, I'm prepared to accept that (1) we should have some sort of single-image compiler mode that implies that all symbols are defined within the image and (2) you can make your toolchain imply that together with -fvisibility=hidden and then have specific symbols opt in to non-hidden visibility if they need to be accessible to whatever loader / runtime you have.

I think -fvisibility=hidden isn't good enough because you want to infer hidden visibility even for external symbol references, and that's just not how global visibility works. But after this discussion, I'm prepared to accept that (1) we should have some sort of single-image compiler mode that implies that all symbols are defined within the image and (2) you can make your toolchain imply that together with -fvisibility=hidden and then have specific symbols opt in to non-hidden visibility if they need to be accessible to whatever loader / runtime you have.

It seems that explicit visibility attributes on external symbol references (e.g. __attribute__((visibility("hidden"))) extern int foo;) are respected in Clang, so I don't understand the rationale for global visibility controls not applying to them as well. Can you describe why this is the case?

I think -fvisibility=hidden isn't good enough because you want to infer hidden visibility even for external symbol references, and that's just not how global visibility works. But after this discussion, I'm prepared to accept that (1) we should have some sort of single-image compiler mode that implies that all symbols are defined within the image and (2) you can make your toolchain imply that together with -fvisibility=hidden and then have specific symbols opt in to non-hidden visibility if they need to be accessible to whatever loader / runtime you have.

It seems that explicit visibility attributes on external symbol references (e.g. __attribute__((visibility("hidden"))) extern int foo;) are respected in Clang, so I don't understand the rationale for global visibility controls not applying to them as well. Can you describe why this is the case?

Well, one answer is that that's the behavior that GCC defined when they added global visibility controls, and it would be unreasonable for Clang to deviate in such a fundamental way. However, we do deviate in some other ways in our interpretation of visibility attributes, so that's not a totally satisfactory answer.

The stronger answer is that GCC has a very good reason for this behavior. Global visibility controls apply to all code in the translation unit. While programmers often mentally distinguish between "project" headers (which they control) and "library" headers (which they do not), that is not a difference that is known to the compiler. Therefore, if global visibility controls applied to external symbols, the compiler would end up assuming that all external symbols not annotated as default are provided by the current image. That assumption would have been incompatible with essentially all existing headers for dynamic libraries on GCC's core targets, including system headers; in other words, it would have caused ubiquitous miscompiles.* It would be completely unreasonable to expect all those headers to be changed just for the benefit of a single new compiler feature. Therefore, global visibility controls do not apply to external symbols because doing so would make it impossible to actually enable global visibility controls.

  • GCC's rule helps a great deal even in C++, which is generally more sensitive to global visibility: you can typically use an unannotated C++ header under -fvisibility=hidden and have no problems unless you start using its types as exceptions (and GCC tries to work around that, too) or comparing the addresses of inline functions or template instantiations.

John.

I think -fvisibility=hidden isn't good enough because you want to infer hidden visibility even for external symbol references, and that's just not how global visibility works. But after this discussion, I'm prepared to accept that (1) we should have some sort of single-image compiler mode that implies that all symbols are defined within the image and (2) you can make your toolchain imply that together with -fvisibility=hidden and then have specific symbols opt in to non-hidden visibility if they need to be accessible to whatever loader / runtime you have.

It seems that explicit visibility attributes on external symbol references (e.g. __attribute__((visibility("hidden"))) extern int foo;) are respected in Clang, so I don't understand the rationale for global visibility controls not applying to them as well. Can you describe why this is the case?

Well, one answer is that that's the behavior that GCC defined when they added global visibility controls, and it would be unreasonable for Clang to deviate in such a fundamental way. However, we do deviate in some other ways in our interpretation of visibility attributes, so that's not a totally satisfactory answer.

The stronger answer is that GCC has a very good reason for this behavior. Global visibility controls apply to all code in the translation unit. While programmers often mentally distinguish between "project" headers (which they control) and "library" headers (which they do not), that is not a difference that is known to the compiler. Therefore, if global visibility controls applied to external symbols, the compiler would end up assuming that all external symbols not annotated as default are provided by the current image. That assumption would have been incompatible with essentially all existing headers for dynamic libraries on GCC's core targets, including system headers; in other words, it would have caused ubiquitous miscompiles.* It would be completely unreasonable to expect all those headers to be changed just for the benefit of a single new compiler feature. Therefore, global visibility controls do not apply to external symbols because doing so would make it impossible to actually enable global visibility controls.

  • GCC's rule helps a great deal even in C++, which is generally more sensitive to global visibility: you can typically use an unannotated C++ header under -fvisibility=hidden and have no problems unless you start using its types as exceptions (and GCC tries to work around that, too) or comparing the addresses of inline functions or template instantiations.

John.

Thank you for the context, that makes sense.

In terms of implementation of the above, currently at the AST level the LinkageComputer seems to assign global visibility to extern declarations, then during CodeGen this is simply not copied to the IR level unless the visibility was set explicitly. At first I was confused that the visibility of e.g. a clang::Decl and its llvm::GlobalValue could differ. Is there a reason why the implementation of LinkageComputer does not calculate the correct visibility here from the start? If the logic for extern declarations can be moved into the AST would it then be reasonable to add a distinct global visibility for extern declarations (something like -fextern-visibility=)? We could pair this with the existing global value visibility to effectively get the single-image compilation mode.

Mostly, we don't really want the abstract visibility calculation to change whenever we see a definition.

It sounds like -fextern-visiblity= isn't something we really want, then? I think it would have to be implemented in the abstract linkage computer.

Would a patch to add an option to apply the normal visibility calculations to extern decls be reasonable, then? This change could be done entirely in CodeGen.

It's an interesting idea at least, and it does seem like there ought to be some way to express it, at least internally. A -cc1 option is fine for your purposes, right?

I think we need the option to be in the driver, not just -cc1. We are moving towards the clang driver being the authority on how we do offline compilation for AMDGPU, and hiding this in cc1 doesn't help us.

I also don't think this is unreasonable to expose in general. As you note the behavior of -fvisibility wrt. declarations is just a practical measure, and even in the world of e.g. x86 it seems like there are cases where the option is useful.

I don't want this to be a driver option because I don't want to design a general-purpose feature for this right now, nor do I want to gradually accrete a general-purpose feature around a random collection of needs accumulated from other features. Let's just leave it as a -cc1 option. You can very easily make the driver recognize when it's performing device compilation and automatically pass the -cc1 flag down.

That sounds reasonable to me. I had already posted a patch with the Driver options, but I will update it to only include the -cc1 version.