This is an archive of the discontinued LLVM Phabricator instance.

Paths

clang/include/clang/AST/ASTContext.h
clang/include/clang/Basic/DiagnosticDriverKinds.td
clang/include/clang/Basic/LangOptions.h
clang/include/clang/Driver/Action.h
clang/include/clang/Driver/Compilation.h
clang/include/clang/Driver/Options.td
clang/lib/AST/ASTContext.cpp
clang/lib/CodeGen/CGCUDANV.cpp
clang/lib/CodeGen/CodeGenModule.h
clang/lib/CodeGen/CodeGenModule.cpp
clang/lib/Driver/Action.cpp
clang/lib/Driver/Driver.cpp
clang/lib/Driver/ToolChains/Clang.cpp
clang/lib/Frontend/CompilerInvocation.cpp
clang/test/CodeGenCUDA/static-device-var.cu
clang/test/Driver/hip-cuid.hip
clang/test/Frontend/hip-cuid.hip
clang/test/SemaCUDA/static-device-var.cu

Differential D80858

[CUDA][HIP] Support accessing static device variable in host code for -fno-gpu-rdc
Closed (Public)

Authored by yaxunl on May 29 2020, 9:44 PM.

Details

Reviewers

tra
rjmccall
hliao

Commits

rG45f2a56856e2: [CUDA][HIP] Support accessing static device variable in host code for -fno-gpu…

Summary

nvcc supports accessing file-scope static device variables in host code through host APIs
such as cudaMemcpyToSymbol.

CUDA/HIP let users access device variables in host code through shadow variables. In host compilation,
clang emits a shadow variable for each device variable and calls __*RegisterVariable to
register it in the init function. The address of the shadow variable and the device-side mangled
name of the device variable are passed to __*RegisterVariable. The runtime looks up the symbol
by name in the device binary (the so-called code object, which is actually in ELF format) to find the
address of the device variable.

The problem with static device variables is that they have internal linkage, so their
names may be changed by the linker if there are multiple symbols with the same name. They also
end up as local symbols in the ELF file, whereas the runtime only looks up global symbols.

To support accessing static device variables in host code in -fno-gpu-rdc mode, change the internal
linkage to external linkage. The name does not need to change, since there is only one TU in
-fno-gpu-rdc mode. The externalization is done only if the static device variable is referenced
by host code.
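
The rule described in the summary can be sketched as a small decision function. This is a minimal model under stated assumptions, not code from the patch: `DeviceVar`, `Linkage`, and `deviceLinkage` are hypothetical names, and the -fgpu-rdc case is left untouched because this revision only covers -fno-gpu-rdc.

```cpp
#include <cassert>
#include <string>

enum class Linkage { Internal, External };

// Hypothetical description of a device-side variable; not a clang type.
struct DeviceVar {
  std::string name;
  bool isFileScopeStatic;  // declared `static` at file scope
  bool referencedByHost;   // host code in the same TU refers to it
};

// Linkage the device compilation would assign under the rule above.
// `gpuRDC` models -fgpu-rdc (true) versus -fno-gpu-rdc (false).
Linkage deviceLinkage(const DeviceVar &v, bool gpuRDC) {
  assert(!v.name.empty() && "variable must be named");
  if (!v.isFileScopeStatic)
    return Linkage::External;  // modeled as an ordinary, already visible variable
  if (!gpuRDC && v.referencedByHost)
    return Linkage::External;  // externalized so the runtime can look it up
  return Linkage::Internal;    // unreferenced statics stay local symbols
}
```

In this model a static device variable that host code never touches keeps internal linkage, which matches the last sentence of the summary.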

Event Timeline

yaxunl created this revision. May 29 2020, 9:44 PM

The value is based on llvm::sys::Process::GetRandomNumber(). So unless one provides a build-system-derived uuid for every compilation unit, recompiling identical source will yield an observably different binary.

The distinction between 'unique' and 'random' is significant for anyone depending on repeatable binary output, so this patch should probably rename 'unique' to 'random' everywhere.

In D80858#2067537, @JonChesterfield wrote:

The value is based on llvm::sys::Process::GetRandomNumber(). So unless one provides a build-system-derived uuid for every compilation unit, recompiling identical source will yield an observably different binary.

The distinction between 'unique' and 'random' is significant for anyone depending on repeatable binary output, so this patch should probably rename 'unique' to 'random' everywhere.

+1. Reproducible builds are something we do pay attention to. We've had a similar issue with -fcuda-rdc compilation, and I think we ended up generating the ID by deriving it from the input file name. It works reasonably well in most situations, except when the same file is recompiled multiple times with different preprocessor macros set.

In addition to everyone else's concerns about generating this number randomly, it's also inherently less testable.

I'm sure there's something else you could use that's reasonably likely to be unique, like a hash of the input filename or output filename or a combination of the two.

clang/include/clang/Driver/Action.h
217	Allowing an arbitrary string might be more adaptable.
clang/lib/AST/ASTContext.cpp
10064	This needs to be a clearer statement of why this is necessary.
10068	Are you sure this doesn't apply to e.g. local statics? Can't you have kernel lambdas, or am I confusing HIP with another language?
clang/lib/CodeGen/CodeGenModule.cpp
1090	Please extract this "(device || constant) && file && static" predicate instead of repeating it in three different places.
clang/lib/Driver/Action.cpp
171	I'm sure GetRandomNumber can return 0, so this logic is faulty. This also seems like an unfortunate intrusion of HIP-specific semantics on the rest of the driver. Maybe HIP can generate the shared id when it's setting up and cloning the job.

yaxunl marked 10 inline comments as done. Jul 6 2020, 5:50 AM

yaxunl added inline comments.

clang/include/clang/Driver/Action.h
217	done
clang/lib/AST/ASTContext.cpp
10064	added comments to explain why
10068	function-scope static var in a device function is only visible to the device function. Host code cannot access it, therefore no need to externalize it.
clang/lib/CodeGen/CodeGenModule.cpp
1090	extracted as ASTContext::shouldExternalizeStaticVar
clang/lib/Driver/Action.cpp
171	Changed type of ID to string. Now empty ID means disabled. 0 is now allowed as a usual ID. Moved initialization of ID to HIP action builder.

Revised per John's comments.

Also added generation of the compilation unit ID by hashing the input file path and options. Added the option -fuse-cuid=random|hash|none to control how the compilation unit ID is generated. -cuid=X overrides -fuse-cuid. Enabled this feature for both CUDA and HIP.
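
The ID selection described in this update can be sketched as follows. This is only an illustration under stated assumptions: `CuidMode`, `makeCuid`, `fnv1a`, and `toHex` are hypothetical names, and FNV-1a is used here simply because it is deterministic across runs; the thread does not say which hash the patch uses.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <string>
#include <vector>

enum class CuidMode { None, Hash, Random };  // models -fuse-cuid=none|hash|random

// FNV-1a 64-bit hash; an assumption of this sketch, not the patch's hash.
std::uint64_t fnv1a(const std::string &s,
                    std::uint64_t h = 14695981039346656037ULL) {
  for (unsigned char c : s) {
    h ^= c;
    h *= 1099511628211ULL;
  }
  return h;
}

// Render a 64-bit value as 16 lowercase hex digits.
std::string toHex(std::uint64_t v) {
  static const char digits[] = "0123456789abcdef";
  std::string out(16, '0');
  for (int i = 15; i >= 0; --i, v >>= 4)
    out[i] = digits[v & 0xf];
  return out;
}

std::string makeCuid(CuidMode mode, const std::string &explicitCuid,
                     const std::string &inputPath,
                     const std::vector<std::string> &options) {
  if (!explicitCuid.empty())
    return explicitCuid;  // models -cuid=X overriding -fuse-cuid
  if (mode == CuidMode::Hash) {
    assert(!inputPath.empty() && "hash mode needs an input path");
    std::uint64_t h = fnv1a(inputPath + '\n');
    for (const std::string &opt : options)
      h = fnv1a(opt + '\n', h);  // fold every option into the running hash
    return toHex(h);
  }
  if (mode == CuidMode::Random) {
    std::random_device rd;
    std::uint64_t hi = rd();
    std::uint64_t lo = rd();
    return toHex((hi << 32) | lo);  // differs from run to run
  }
  return "";  // an empty ID means the feature is disabled
}
```

Because the options are folded into the hash, recompiling the same file with different preprocessor macros yields a different ID, while recompiling with identical inputs yields the same one.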

tra added inline comments. Jul 6 2020, 2:24 PM

clang/lib/CodeGen/CodeGenModule.cpp
6061	I suspect that will have interesting issues if CUID is an arbitrary user-supplied string. We may want to impose some sort of sanity check or filtering on the cuid value. Considering that it's a CC1 flag, it's not a critical problem, but some safeguards would be useful there, too. Should we limit allowed character set?
clang/test/Driver/hip-cuid.hip
36	Nit: `abcd` could potentially match the value generated by hash. I'd change it to contain characters other than hex.

JonChesterfield added inline comments. Jul 6 2020, 2:39 PM

clang/lib/AST/ASTContext.cpp
10068	This doesn't sound right. An inline function can return a pointer to a function scope static variable, e.g. to implement a singleton in a header file. I think host code can then access said variable.

rjmccall added inline comments. Jul 9 2020, 2:05 PM

clang/lib/AST/ASTContext.cpp
10068	Right, and IIRC you can declare host device functions as well, which ought to agree on the variable if they agree on globals.
clang/lib/Driver/Action.cpp
171	Thanks, that works.

yaxunl marked 9 inline comments as done. Jul 11 2020, 7:55 PM

yaxunl added inline comments.

clang/lib/AST/ASTContext.cpp
10068	As long as we are not accessing the static variable by symbol, we do not need to externalize it. If a device function returns a pointer to its static variable and somehow passes that pointer to host code, the host code can use it directly with hipMemcpy.
10068	If we have a static variable in a device function, it is visible only in that function and not to any host code. We only need to externalize it if it needs to be accessed `by symbol` in host code; that is impossible, so we do not need to externalize it. For static variables in a host device function, the host side and the device side should have different instances. The rationale is that a static variable is per function, whereas a host device function is actually two functions: a host instance and a device instance, which could be totally different through the use of conditional macros. Making the static variable in a host device function a single instance would require special handling in the runtime so that the same variable can be accessed on both the device side and the host side by common load/store instructions, but that is not the case. Therefore the device-side instance of a static variable in a host device function is still visible only to device code, not to host code. Since it cannot be accessed `by symbol` by host code, it does not need to be externalized.
clang/lib/CodeGen/CodeGenModule.cpp
6061	will only allow alphanumeric and underscore in CUID for simplicity.
clang/test/Driver/hip-cuid.hip
36	done

Herald added a subscriber: dang. Jul 11 2020, 7:55 PM

Only allow cuid to be alphanumeric and underscore.
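
A check of the kind described here could look like the sketch below. `isValidCuid` is a hypothetical helper name, and the ASCII-only interpretation of "alphanumeric" is an assumption of this sketch rather than something stated in the thread.

```cpp
#include <cassert>
#include <string>

// Accept only ASCII letters, digits, and underscore (assumed character set).
bool isValidCuid(const std::string &cuid) {
  for (char c : cuid) {
    bool ok = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
              (c >= '0' && c <= '9') || c == '_';
    if (!ok)
      return false;
  }
  return true;  // the empty string passes; elsewhere it means "disabled"
}
```

Rejecting everything else keeps a user-supplied ID safe to embed wherever the compiler later uses it, which was the concern raised on CodeGenModule.cpp.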

ping

I don't think that's the proper way to support file-scope static device variables. As we discuss APIs like cudaMemcpyToSymbol, note that it's a runtime API rather than a driver API. The latter needs to specify the module (or code object) id in addition to a symbol name and won't have the conflict issues. For the runtime API, all named device variables (static or not) are identified on the host side by their host stub variables. That stub variable is used to register the corresponding device variable, associated with a module id, to uniquely identify that variable across all TUs. As long as we look up device variables using their host stub variable pointers, they are already uniquely identified. The runtime implementation needs to find the module id and the variable symbol from the pointer of its host stub variable. It's not the compiler's job to fabricate names that are unique across TUs.

This revision now requires changes to proceed. Jul 23 2020, 7:09 AM

In D80858#2169295, @hliao wrote:

I don't think that's the proper way to support file-scope static device variables. As we discuss APIs like cudaMemcpyToSymbol, note that it's a runtime API rather than a driver API. The latter needs to specify the module (or code object) id in addition to a symbol name and won't have the conflict issues. For the runtime API, all named device variables (static or not) are identified on the host side by their host stub variables. That stub variable is used to register the corresponding device variable, associated with a module id, to uniquely identify that variable across all TUs. As long as we look up device variables using their host stub variable pointers, they are already uniquely identified. The runtime implementation needs to find the module id and the variable symbol from the pointer of its host stub variable. It's not the compiler's job to fabricate names that are unique across TUs.

The problem is that even though the static variable is registered through __hipRegisterVariable, the runtime still relies on looking up the symbol name to get the address of the device variable. That's why we need to externalize the static variable.

Another reason is that we need to support it in rdc mode, where different TU can have static var with the same name.

In D80858#2169399, @yaxunl wrote:

In D80858#2169295, @hliao wrote:

I don't think that's the proper way to support file-scope static device variables. As we discuss APIs like cudaMemcpyToSymbol, note that it's a runtime API rather than a driver API. The latter needs to specify the module (or code object) id in addition to a symbol name and won't have the conflict issues. For the runtime API, all named device variables (static or not) are identified on the host side by their host stub variables. That stub variable is used to register the corresponding device variable, associated with a module id, to uniquely identify that variable across all TUs. As long as we look up device variables using their host stub variable pointers, they are already uniquely identified. The runtime implementation needs to find the module id and the variable symbol from the pointer of its host stub variable. It's not the compiler's job to fabricate names that are unique across TUs.

The problem is that even though the static variable is registered through __hipRegisterVariable, the runtime still relies on looking up the symbol name to get the address of the device variable. That's why we need to externalize the static variable.

If so, the runtime should be fixed as the variable name. I remember that I fixed the global one so that each one is uniquely identified by module id plus the name. For runtime APIs, all host-side references to device variables should go through the host stub variables instead of their names. If the runtime or API doesn't follow that, we should fix them instead of asking the compiler to do the favor.

In D80858#2169534, @yaxunl wrote:

Another reason is that we need to support it in rdc mode, where different TU can have static var with the same name.

That's an issue of our current RDC support through LLVM IR instead of native code. The name conflicts are introduced as we link all TUs into a single module at IR level. The frontend should not be changed to support that.

In D80858#2170311, @hliao wrote:

The problem is that even though the static variable is registered through __hipRegisterVariable, the runtime still relies on looking up the symbol name to get the address of the device variable. That's why we need to externalize the static variable.

If so, the runtime should be fixed as the variable name. I remember that I fixed the global one so that each one is uniquely identified by module id plus the name. For runtime APIs, all host-side references to device variables should go through the host stub variables instead of their names. If the runtime or API doesn't follow that, we should fix them instead of asking the compiler to do the favor.

For runtime APIs, we do reference device variables through the host stub variable instead of its name. However, how does the runtime implement that internally?

In host compilation, clang emits a call to __hipRegisterVariable(shadow_var, device_var_name) in init functions.

The runtime builds a map from each shadow var to a device var name. Then the runtime looks up the device var name in the code object to get the real address of the device var.

Note: __hipRegisterVariable does not really associate a shadow var with the address of a device var, since in host compilation there is no way to know the address of a device var. It only associates a shadow var with the name of a device var.

So eventually, the runtime still needs to look up the device var symbol in code objects. Since the ROCm runtime does not look up local symbols, it cannot find the static device var in code objects unless we externalize it.
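
The lookup path described above can be modeled with a toy runtime. This is a sketch under stated assumptions, not the ROCm implementation: `ToyRuntime`, `CodeObject`, and `Symbol` are hypothetical, and the code object is reduced to a name-to-symbol map with a global/local flag.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Toy code object: symbol name -> device address and binding.
struct Symbol {
  std::uint64_t address;
  bool isGlobal;  // false models a local (internal linkage) ELF symbol
};
using CodeObject = std::map<std::string, Symbol>;

class ToyRuntime {
public:
  // Models __hipRegisterVariable(shadow_var, device_var_name): only the name
  // is recorded, because the device address is unknown at host compile time.
  void registerVariable(const void *shadowVar, std::string deviceName) {
    assert(shadowVar != nullptr);
    shadowToName_[shadowVar] = std::move(deviceName);
  }

  // Resolve a shadow variable by looking its registered name up among the
  // global symbols of the code object; local symbols are never found.
  std::optional<std::uint64_t> deviceAddress(const void *shadowVar,
                                             const CodeObject &obj) const {
    auto reg = shadowToName_.find(shadowVar);
    if (reg == shadowToName_.end())
      return std::nullopt;
    auto sym = obj.find(reg->second);
    if (sym == obj.end() || !sym->second.isGlobal)
      return std::nullopt;
    return sym->second.address;
  }

private:
  std::map<const void *, std::string> shadowToName_;
};
```

In this model a registered static variable resolves only after its symbol is marked global, which is the effect externalization is meant to have.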

In D80858#2170328, @hliao wrote:

In D80858#2169534, @yaxunl wrote:

Another reason is that we need to support it in rdc mode, where different TU can have static var with the same name.

That's an issue of our current RDC support through LLVM IR instead of native code. The name conflicts are introduced as we link all TUs into a single module at IR level. The frontend should not be changed to support that.

I am not sure if ISA level linking would help for this. My understanding is that the linker will rename the internal symbols having the same name from different objects. It does not matter if it is llvm-link or native linker. Once they are renamed we can no longer find them by name.

I don't see a reason why we cannot support accessing static device variable in FE. We have requests for this feature from different users.

Would it work if we generate globally unique visible aliases for the static vars and use the aliases' names to register device-side entities without changing their visibility?

In D80858#2170533, @yaxunl wrote:

In D80858#2170328, @hliao wrote:

In D80858#2169534, @yaxunl wrote:

Another reason is that we need to support it in rdc mode, where different TU can have static var with the same name.

That's an issue of our current RDC support through LLVM IR instead of native code. The name conflicts are introduced as we link all TUs into a single module at IR level. The frontend should not be changed to support that.

I am not sure if ISA level linking would help for this. My understanding is that the linker will rename the internal symbols having the same name from different objects. It does not matter if it is llvm-link or native linker. Once they are renamed we can no longer find them by name.

I don't see a reason why we cannot support accessing static device variable in FE. We have requests for this feature from different users.

The register API also has a module ID parameter. With that, the runtime should be able to establish the map from the pointer of that host stub variable to its device module and the name within that module. For the non-RDC case, that's already enough for the runtime to find the correct device variable.

But, as static internal linkage is defined, it makes that identifier visible only within a single TU and not across TUs. To be honest, I think we still need to define how that behaves in the context of CUDA or HIP, as it obviously breaks the definition, in the strict sense, by making that static visible between host and device TUs.
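
The module-keyed mapping proposed in this comment can be sketched like this. All names (`ModuleAwareRuntime`, `ModuleId`, `ModuleSymbols`) are hypothetical, and the sketch assumes the runtime can see every symbol of a module, including local ones, which is exactly what the earlier reply says the ROCm runtime does not do.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

using ModuleId = int;
using ModuleSymbols = std::map<std::string, std::uint64_t>;  // name -> address

class ModuleAwareRuntime {
public:
  void loadModule(ModuleId id, ModuleSymbols symbols) {
    modules_[id] = std::move(symbols);
  }

  // Record (module id, name) for a host stub variable pointer.
  void registerVariable(ModuleId id, const void *stub, std::string name) {
    assert(stub != nullptr);
    stubs_[stub] = {id, std::move(name)};
  }

  // Resolve the stub within its own module only, so equal names in
  // different modules never collide.
  std::optional<std::uint64_t> deviceAddress(const void *stub) const {
    auto s = stubs_.find(stub);
    if (s == stubs_.end())
      return std::nullopt;
    auto m = modules_.find(s->second.first);
    if (m == modules_.end())
      return std::nullopt;
    auto sym = m->second.find(s->second.second);
    if (sym == m->second.end())
      return std::nullopt;
    return sym->second;
  }

private:
  std::map<ModuleId, ModuleSymbols> modules_;
  std::map<const void *, std::pair<ModuleId, std::string>> stubs_;
};
```

Two modules can each define a variable named `counter` here and still resolve to different addresses, because the stub pointer selects the module before the name is looked up.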

In D80858#2170547, @tra wrote:

Would it work if we generate globally unique visible aliases for the static vars and use the aliases' names to register device-side entities without changing their visibility?

We still need to define how a static device variable should be visible on the host side. How that behaves in the context of relocatable code generation as well as anonymous namespaces. Also, in the context of CUDA runtime/driver API, if a device variable is addressable on the host side through the runtime API, it should be addressable through the driver API as well. However, the naming will be a big challenge.

In D80858#2170604, @hliao wrote:

In D80858#2170547, @tra wrote:

Would it work if we generate globally unique visible aliases for the static vars and use the aliases' names to register device-side entities without changing their visibility?

We still need to define how a static device variable should be visible on the host side.

As far as the host is concerned, if a 'static' variable is within the same TU it (or, rather, its shadow) should be accessible.

How that behaves in the context of relocatable code generation as well as anonymous namespaces.

AFAICT it would just work -- each TU only sees static things within that TU on both host and device sides. Visible module-unique aliases would allow the host-side binary to get the address of otherwise non-visible variables. I expect that will work with RDC -- because we didn't change the visibility of the symbols themselves, they behave according to the regular linking rules. The aliases are also globally unique, so they should present no issues either.

Also, in the context of CUDA runtime/driver API, if a device variable is addressable on the host side through the runtime API, it should be addressable through the driver API as well. However, the naming will be a big challenge.

I'm not convinced that I completely agree with this assertion. It's reasonable for visible symbols (though even then there's a question of how do you figure out a properly mangled name for that symbol), but expecting it to work for non-visible symbols would push it too far, IMO.
It will also have issues with RDC as the name resolution will become ambiguous. The compiler does have the necessary info to resolve that ambiguity, and so does the runtime support library, but the driver does not. We'd have to provide the driver with additional info to let it map host-side symbol info to its device counterpart. It's doable, but I don't think it's particularly useful in practice. If something needs to be accessed via the driver API, it would be reasonable to expect it to be a public symbol.

In D80858#2170668, @tra wrote:

In D80858#2170604, @hliao wrote:

In D80858#2170547, @tra wrote:

Would it work if we generate globally unique visible aliases for the static vars and use the aliases' names to register device-side entities without changing their visibility?

We still need to define how a static device variable should be visible on the host side.

As far as the host is concerned, if a 'static' variable is within the same TU it (or, rather, its shadow) should be accessible.

How that behaves in the context of relocatable code generation as well as anonymous namespaces.

AFAICT it would just work -- each TU only sees static things within that TU on both host and device sides. Visible module-unique aliases would allow the host-side binary to get the address of otherwise non-visible variables. I expect that will work with RDC -- because we didn't change the visibility of the symbols themselves, they behave according to the regular linking rules. The aliases are also globally unique, so they should present no issues either.

The problem is not whether we have solution to tell them but when we need to add that. Not all static device variables need to be visible to the host side. Externalizing them adds the overhead for the linker and may pose additional restrictions on aggressive optimizations. Do we have to support every ambiguous usage in the burden of the compiler change?

Also, in the context of CUDA runtime/driver API, if a device variable is addressable on the host side through the runtime API, it should be addressable through the driver API as well. However, the naming will be a big challenge.

I'm not convinced that I completely agree with this assertion. It's reasonable for visible symbols (though even then there's a question of how do you figure out a properly mangled name for that symbol), but expecting it to work for non-visible symbols would push it too far, IMO.

Won't this patch just make invisible variables "visible"?

It will also have issues with RDC as the name resolution will become ambiguous. The compiler does have the necessary info to resolve that ambiguity, and so does the runtime support library, but the driver does not. We'd have to provide the driver with additional info to let it map host-side symbol info to its device counterpart. It's doable, but I don't think it's particularly useful in practice. If something needs to be accessed via the driver API, it would be reasonable to expect it to be a public symbol.

In D80858#2170781, @hliao wrote:

The problem is not whether we have solution to tell them but when we need to add that. Not all static device variables need to be visible to the host side. Externalizing them adds the overhead for the linker and may pose additional restrictions on aggressive optimizations. Do we have to support every ambiguous usage in the burden of the compiler change?

It's a multi-part problem.
The key request here is, IIUIC, to make TU-local variables visible within the TU on both host and the device.
If and how it should be implemented is up for discussion.
For me the request is reasonable and it would be good to have it as it would make the code behave according to general expectations of how TU-local variables behave -- "if it's in the same source file, it should be accessible from the functions in that file".

I'm mostly discussing the 'how' part and we obviously don't have all the answers yet.

I agree, that adding the 'handles' for all non-visible variables is suboptimal and it may prove to be too big of a burden. That said, it does not look like an outright showstopper either. Ideally we only need those handles for the items that are being referred to by the host. Most likely most of them will not be. The problem is that it's the host-side compilation which knows which symbols will be needed, but it's the device-side compilation where we would have to generate aliases and, generally speaking, device-side compilation may not be able to ever tell which symbols will be needed by the host. I don't have a good solution for that. Ideally we need to have exact same AST on both sides and we can't guarantee that with the current implementation of HD functions and the use of preprocessor macros.

Also, in the context of CUDA runtime/driver API, if a device variable is addressable on the host side through the runtime API, it should be addressable through the driver API as well. However, the naming will be a big challenge.

I'm not convinced that I completely agree with this assertion. It's reasonable for visible symbols (though even then there's a question of how do you figure out a properly mangled name for that symbol), but expecting it to work for non-visible symbols would push it too far, IMO.

Won't this patch just make invisible variables "visible"?

Yes, but there will be no name conflicts, so for all practical purposes they don't change the end result of linking.

In D80858#2170821, @tra wrote:

In D80858#2170781, @hliao wrote:

The problem is not whether we have solution to tell them but when we need to add that. Not all static device variables need to be visible to the host side. Externalizing them adds the overhead for the linker and may pose additional restrictions on aggressive optimizations. Do we have to support every ambiguous usage in the burden of the compiler change?

It's a multi-part problem.
The key request here is, IIUIC, to make TU-local variables visible within the TU on both host and the device.
If and how it should be implemented is up for discussion.
For me the request is reasonable and it would be good to have it as it would make the code behave according to general expectations of how TU-local variables behave -- "if it's in the same source file, it should be accessible from the functions in that file".

We may try to solve the issue without RDC first, where we don't need to change that static variable name (if the runtime maintains the device binaries correctly). We only need to ensure the linker won't remove their symbols.

I'm mostly discussing the 'how' part and we obviously don't have all the answers yet.

I agree, that adding the 'handles' for all non-visible variables is suboptimal and it may prove to be too big of a burden. That said, it does not look like an outright showstopper either. Ideally we only need those handles for the items that are being referred to by the host. Most likely most of them will not be. The problem is that it's the host-side compilation which knows which symbols will be needed, but it's the device-side compilation where we would have to generate aliases and, generally speaking, device-side compilation may not be able to ever tell which symbols will be needed by the host. I don't have a good solution for that. Ideally we need to have exact same AST on both sides and we can't guarantee that with the current implementation of HD functions and the use of preprocessor macros.

Also, in the context of CUDA runtime/driver API, if a device variable is addressable on the host side through the runtime API, it should be addressable through the driver API as well. However, the naming will be a big challenge.

I'm not convinced that I completely agree with this assertion. It's reasonable for visible symbols (though even then there's a question of how do you figure out a properly mangled name for that symbol), but expecting it to work for non-visible symbols would push it too far, IMO.

Won't this patch just make invisible variables "visible"?

Yes, but there will be no name conflicts, so for all practical purposes they don't change the end result of linking.

In D80858#2171266, @hliao wrote:

We may try to solve the issue without RDC first, where we don't need to change that static variable name (if the runtime maintains the device binaries correctly). We only need to ensure the linker won't remove their symbols.

This is essentially externalizing it in the linker. However, it may not work.

Consider a static device var that is accessed in host code through hipMemcpyToSymbol or hipMemcpyFromSymbol. In device code, this static var may be initialized through host code, or its final value may be read by host code after kernel execution. The existence of these operations means that this static device variable effectively has external linkage instead of internal linkage, since it is accessed by external modules. Failing to reflect this in the IR will cause optimization passes to make incorrect assumptions about the variable and perform incorrect optimizations on it. E.g., the LLVM passes can assume that the value of this static var never changes, or that its final value will not be used, and may then simply remove it. Marking it as used will not solve the issue, since the LLVM passes may still assume its value never changes after initialization, whereas in reality it may be changed by hipMemcpyToSymbol before kernel execution.

It's a good point. Perhaps this is one of the cases where we should *not* follow nvcc.
We can't have our cake (preserve static behavior) and eat it (treat it as non-static in case something on the host side may decide to use an API which uses symbol names). Something's got to give. While we could make it work in some cases, I don't think we can make it work consistently.
I think it would be reasonable to restrict APIs that access symbols by name to be applicable to visible symbols only.

In D80858#2177159, @tra wrote:

It's a good point. Perhaps this is one of the cases where we should *not* follow nvcc.
We can't have our cake (preserve static behavior) and eat it (treat it as non-static in case something on the host side may decide to use an API which uses symbol names). Something's got to give. While we could make it work in some cases, I don't think we can make it work consistently.
I think it would be reasonable to restrict APIs that access symbols by name to be applicable to visible symbols only.

Agree, we only need to support accessing static device var by shadow var (runtime API), which is sufficient for most apps.

In D80858#2177104, @yaxunl wrote:

In D80858#2171266, @hliao wrote:

We may try to solve the issue without RDC first, where we don't need to change that static variable name (if the runtime maintains the device binaries correctly). We only need to ensure the linker won't remove their symbols.

This is essentially externalizing it in the linker. However, it may not work.

Consider a static device var that is accessed in host code through hipMemcpyToSymbol or hipMemcpyFromSymbol. In device code, this static var may be initialized through host code, or its final value may be read by host code after kernel execution. The existence of these operations means that this static device variable effectively has external linkage instead of internal linkage, since it is accessed by external modules. Failing to reflect this in the IR will cause optimization passes to make incorrect assumptions about the variable and perform incorrect optimizations on it. E.g., the LLVM passes can assume that the value of this static var never changes, or that its final value will not be used, and may then simply remove it. Marking it as used will not solve the issue, since the LLVM passes may still assume its value never changes after initialization, whereas in reality it may be changed by hipMemcpyToSymbol before kernel execution.

That's what I mean by "externalizing" that variable. In fact, nvcc does the same thing and generates a global symbol in PTX.

To get file-scope static device variable support, we have to address 2 separate issues:

1. Externalize variables to ensure the backend/linker preserves their symbols for the runtime to query. The major concern is that we'd better check whether a static device variable is referenced using host APIs. Otherwise, aggressive IPO optimizations may not be applicable. As those are file-scope static variables, roughly checking their references on the host side should be doable.
2. Rename device static variables. Such renaming is only necessary when RDC is enabled. Without RDC, there would be no conflicting names and renaming is unnecessary. For renaming, relying on a developer-supplied CUID is not reliable, and we should extend the name mangler to include the source name, a digest, or whatever else can be automatically managed by the compiler. If we support static variable access on the host side, we should not rely on a developer-supplied option to function correctly. Also, the mangled name should demangle into a meaningful name if possible. I remember GCC has special support for names from anonymous namespaces if they have conflicting names. We may follow a similar scheme.
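
The renaming step for the RDC case can be illustrated with a toy link step. The naming scheme `<name>__static__<tuId>` and the helpers `externalizedName` and `linkDeviceSymbols` are assumptions made for this sketch only; the thread does not fix a format.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical scheme: append a per-TU identifier to the variable name.
std::string externalizedName(const std::string &name, const std::string &tuId) {
  assert(!tuId.empty() && "a per-TU identifier is required");
  return name + "__static__" + tuId;
}

struct TU {
  std::string id;                             // unique per translation unit
  std::vector<std::string> staticDeviceVars;  // names as written in the source
};

// Toy link: merge the externalized names of all TUs into one symbol table
// that maps each externalized name to the TU that defines it.
std::map<std::string, std::string> linkDeviceSymbols(const std::vector<TU> &tus) {
  std::map<std::string, std::string> table;
  for (const TU &tu : tus)
    for (const std::string &var : tu.staticDeviceVars)
      table.emplace(externalizedName(var, tu.id), tu.id);
  return table;
}
```

With unique TU identifiers, two TUs that both declare a static `counter` produce two distinct entries after linking, so the runtime can still find each one by name.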

In D80858#2177159, @tra wrote:

It's a good point. Perhaps this is one of the cases where we should *not* follow nvcc.
We can't have our cake (preserve static behavior) and eat it (treat it as non-static in case something on the host side may decide to use an API which uses symbol names). Something's got to give. While we could make it work in some cases, I don't think we can make it work consistently.
I think it would be reasonable to restrict APIs that access symbols by name to be applicable to visible symbols only.

In fact, 'nvcc' does that by globalizing that static device variable.

jdoerfert added a subscriber: jdoerfert. Jul 28 2020, 6:58 AM

I think Sam's approach is reasonable.

The ability to supply CUID to sub-compilations is useful by itself and should probably be split into a separate patch as it's largely independent of the externalization of file-scope static vars.

Revised for the -fno-gpu-rdc case per Michael's comments.

yaxunl mentioned this in D85223: [CUDA][HIP] Support accessing static device variable in host code for -fgpu-rdc. Aug 4 2020, 9:56 AM

yaxunl added a child revision: D85223: [CUDA][HIP] Support accessing static device variable in host code for -fgpu-rdc.

What is expected to happen to device statics in anonymous name spaces? It may be worth adding them to the tests.

LGTM otherwise.

In D80858#2193970, @tra wrote:

What is expected to happen to device statics in anonymous name spaces? It may be worth adding them to the tests.

LGTM otherwise.

A static device var in an anonymous namespace will have external linkage if it is referenced by host code. I will add a test when committing.

This revision was not accepted when it landed; it landed in state Needs Review. Aug 5 2020, 4:58 AM

Closed by commit rG45f2a56856e2: [CUDA][HIP] Support accessing static device variable in host code for -fno-gpu… (authored by yaxunl).

This revision was automatically updated to reflect the committed changes.

yaxunl added a commit: rG45f2a56856e2: [CUDA][HIP] Support accessing static device variable in host code for -fno-gpu….

Herald added a project: Restricted Project. Aug 5 2020, 4:58 AM

Sam, just an FYI that the patch has a couple of unintended consequences.

We now end up with various things instantiated as device-side constant objects when they were not before, when we compile with -std=c++17 (especially with libc++):
https://godbolt.org/z/KbTM9M

That in turn sometimes pulls in other things that may not exist on the device. E.g. in one case we've ended up with a PTX error caused by an unresolved reference to strlen() (via some char_traits functions).
The other potential issue is that we increase the use of __constant__ and there's only 64K of it, so the additional use pushed total use over the limit in a few cases.

So far all failures can be attributed to questionable user code, but we may need to figure out how to avoid emitting unused __constant__ data.

In D80858#2207699, @tra wrote:

Sam, just an FYI that the patch has a couple of unintended consequences.

We now end up with various things instantiated as device-side constant objects when they were not before, when we compile with -std=c++17 (especially with libc++):
https://godbolt.org/z/KbTM9M

That in turn sometimes pulls in other things that may not exist on the device. E.g. in one case we've ended up with a PTX error caused by an unresolved reference to strlen() (via some char_traits functions).
The other potential issue is that we increase the use of __constant__ and there's only 64K of it, so the additional use pushed total use over the limit in a few cases.

So far all failures can be attributed to questionable user code, but we may need to figure out how to avoid emitting unused __constant__ data.

This is because we have some implicit constant device variables, e.g.

static _LIBCPP_CONSTEXPR const int __libcpp_polling_count = 64;

then they are referenced by some host functions, e.g.

template<class _Fn, class _BFn>
_LIBCPP_AVAILABILITY_SYNC _LIBCPP_INLINE_VISIBILITY
bool __libcpp_thread_poll_with_backoff(
  _Fn && __f, _BFn && __bf, chrono::nanoseconds __max_elapsed = chrono::nanoseconds::zero())
{
    auto const __start = chrono::high_resolution_clock::now();
    for(int __count = 0;;) {
      if(__f())
        return true; // _Fn completion means success
      if(__count < __libcpp_polling_count) {
        __count += 1;
        continue;
      }

When __libcpp_polling_count has internal linkage, it can be eliminated by an optimization pass. Once it is externalized, it can no longer be eliminated.

We can restrict externalization to constant variables with explicit 'constant' attributes only, which should fix this issue.

We can restrict externalization to constant variables with explicit 'constant' attributes only, which should fix this issue.

SGTM. If it does not have explicit device-side attribute, it's never going to need to cross host/device boundary. I guess this applies to vars with __device__ attribute, too.

In D80858#2207898, @tra wrote:

We can restrict externalization to constant variables with explicit 'constant' attributes only, which should fix this issue.

SGTM. If it does not have explicit device-side attribute, it's never going to need to cross host/device boundary. I guess this applies to vars with __device__ attribute, too.

I think we only add implicit __constant__ attributes for variables, not __device__ attributes.

The fix is here https://reviews.llvm.org/D85686

The fix is here https://reviews.llvm.org/D85686

Thank you.

Revision Contents

Path                                                  Size
clang/include/clang/AST/ASTContext.h                  3 lines
clang/include/clang/Basic/DiagnosticDriverKinds.td    2 lines
clang/include/clang/Basic/LangOptions.h               6 lines
clang/include/clang/Driver/Action.h                   8 lines
clang/include/clang/Driver/Compilation.h              2 lines
clang/include/clang/Driver/Options.td                 12 lines
clang/lib/AST/ASTContext.cpp                          21 lines
clang/lib/CodeGen/CGCUDANV.cpp                        9 lines
clang/lib/CodeGen/CodeGenModule.h                     4 lines
clang/lib/CodeGen/CodeGenModule.cpp                   8 lines
clang/lib/Driver/Action.cpp                           4 lines
clang/lib/Driver/Driver.cpp                           49 lines
clang/lib/Driver/ToolChains/Clang.cpp                 12 lines
clang/lib/Frontend/CompilerInvocation.cpp             15 lines
clang/test/CodeGenCUDA/static-device-var.cu           84 lines
clang/test/Driver/hip-cuid.hip                        130 lines
clang/test/Frontend/hip-cuid.hip                      6 lines
clang/test/SemaCUDA/static-device-var.cu              37 lines

Diff 277272

clang/include/clang/AST/ASTContext.h

Show First 20 Lines • Show All 3,012 Lines • ▼ Show 20 Lines	SectionInfo(DeclaratorDecl *Decl,
SectionFlags(SectionFlags) {}		SectionFlags(SectionFlags) {}
};		};

llvm::StringMap<SectionInfo> SectionInfos;		llvm::StringMap<SectionInfo> SectionInfos;

/// Return a new OMPTraitInfo object owned by this context.		/// Return a new OMPTraitInfo object owned by this context.
OMPTraitInfo &getNewOMPTraitInfo();		OMPTraitInfo &getNewOMPTraitInfo();

		/// Whether a C++ static variable should be externalized.
		bool shouldExternalizeStaticVar(const Decl *D) const;

private:		private:
/// All OMPTraitInfo objects live in this collection, one per		/// All OMPTraitInfo objects live in this collection, one per
/// `pragma omp [begin] declare variant` directive.		/// `pragma omp [begin] declare variant` directive.
SmallVector<std::unique_ptr<OMPTraitInfo>, 4> OMPTraitInfoVector;		SmallVector<std::unique_ptr<OMPTraitInfo>, 4> OMPTraitInfoVector;
};		};

/// Insertion operator for diagnostics.		/// Insertion operator for diagnostics.
const DiagnosticBuilder &operator<<(const DiagnosticBuilder &DB,		const DiagnosticBuilder &operator<<(const DiagnosticBuilder &DB,
▲ Show 20 Lines • Show All 114 Lines • Show Last 20 Lines

clang/include/clang/Basic/DiagnosticDriverKinds.td

Show First 20 Lines • Show All 202 Lines • ▼ Show 20 Lines	def warn_drv_yc_multiple_inputs_clang_cl : Warning<
"support for '/Yc' with more than one source file not implemented yet; flag ignored">,		"support for '/Yc' with more than one source file not implemented yet; flag ignored">,
InGroup<ClangClPch>;		InGroup<ClangClPch>;

def err_drv_dllexport_inlines_and_fallback : Error<		def err_drv_dllexport_inlines_and_fallback : Error<
"option '/Zc:dllexportInlines-' is ABI-changing and not compatible with '/fallback'">;		"option '/Zc:dllexportInlines-' is ABI-changing and not compatible with '/fallback'">;

def err_drv_invalid_value : Error<"invalid value '%1' in '%0'">;		def err_drv_invalid_value : Error<"invalid value '%1' in '%0'">;
def err_drv_invalid_int_value : Error<"invalid integral value '%1' in '%0'">;		def err_drv_invalid_int_value : Error<"invalid integral value '%1' in '%0'">;
		def err_drv_invalid_cuid : Error<"invalid value '%1' in '%0' (alphanumeric characters "
		" and underscore only)">;
def err_drv_invalid_remap_file : Error<		def err_drv_invalid_remap_file : Error<
"invalid option '%0' not of the form <from-file>;<to-file>">;		"invalid option '%0' not of the form <from-file>;<to-file>">;
def err_drv_invalid_gcc_output_type : Error<		def err_drv_invalid_gcc_output_type : Error<
"invalid output type '%0' for use with gcc tool">;		"invalid output type '%0' for use with gcc tool">;
def err_drv_cc_print_options_failure : Error<		def err_drv_cc_print_options_failure : Error<
"unable to open CC_PRINT_OPTIONS file: %0">;		"unable to open CC_PRINT_OPTIONS file: %0">;
def err_drv_lto_without_lld : Error<"LTO requires -fuse-ld=lld">;		def err_drv_lto_without_lld : Error<"LTO requires -fuse-ld=lld">;
def err_drv_preamble_format : Error<		def err_drv_preamble_format : Error<
▲ Show 20 Lines • Show All 296 Lines • Show Last 20 Lines
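The err_drv_invalid_cuid diagnostic added in this hunk restricts CUID values to alphanumeric characters and underscores. A minimal sketch of such a check in plain C++ might look as follows. isValidCUID is a hypothetical helper name, not a function from the patch, and accepting the empty string (which the review thread describes as "disabled") is an assumption of this sketch.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper, not part of the patch: returns true when every
// character of the CUID is alphanumeric or an underscore.
// Assumption: an empty CUID means "externalization disabled" and is accepted.
bool isValidCUID(const std::string &CUID) {
  for (unsigned char C : CUID)
    if (!std::isalnum(C) && C != '_')
      return false;
  return true;
}
```

Restricting the character set keeps the ID safe to embed in a symbol name, which is the concern raised in the inline comment on printPostfixForExternalizedStaticVar.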

clang/include/clang/Basic/LangOptions.h

Show First 20 Lines • Show All 287 Lines • ▼ Show 20 Lines	public:
/// Triples of the OpenMP targets that the host code codegen should		/// Triples of the OpenMP targets that the host code codegen should
/// take into account in order to generate accurate offloading descriptors.		/// take into account in order to generate accurate offloading descriptors.
std::vector<llvm::Triple> OMPTargetTriples;		std::vector<llvm::Triple> OMPTargetTriples;

/// Name of the IR file that contains the result of the OpenMP target		/// Name of the IR file that contains the result of the OpenMP target
/// host code generation.		/// host code generation.
std::string OMPHostIRFile;		std::string OMPHostIRFile;

		/// The user provided compilation unit ID, if non-empty. This is used to
		/// externalize static variables which is needed to support accessing static
		/// device variables in host code for single source offloading languages
		/// like CUDA/HIP.
		std::string CUID;

/// Indicates whether the front-end is explicitly told that the		/// Indicates whether the front-end is explicitly told that the
/// input is a header file (i.e. -x c-header).		/// input is a header file (i.e. -x c-header).
bool IsHeaderFile = false;		bool IsHeaderFile = false;

LangOptions();		LangOptions();

// Define accessors/mutators for language options of enumeration type.		// Define accessors/mutators for language options of enumeration type.
#define LANGOPT(Name, Bits, Default, Description)		#define LANGOPT(Name, Bits, Default, Description)
▲ Show 20 Lines • Show All 252 Lines • Show Last 20 Lines

clang/include/clang/Driver/Action.h

Show First 20 Lines • Show All 208 Lines • ▼ Show 20 Lines	public:
}		}
bool isOffloading(OffloadKind OKind) const {		bool isOffloading(OffloadKind OKind) const {
return isHostOffloading(OKind) || isDeviceOffloading(OKind);		return isHostOffloading(OKind) || isDeviceOffloading(OKind);
}		}
};		};

class InputAction : public Action {		class InputAction : public Action {
const llvm::opt::Arg &Input;		const llvm::opt::Arg &Input;
		std::string Id;
		rjmccall: Allowing an arbitrary string might be more adaptable.
		yaxunl: done
virtual void anchor();		virtual void anchor();

public:		public:
InputAction(const llvm::opt::Arg &Input, types::ID Type);		InputAction(const llvm::opt::Arg &Input, types::ID Type,
		StringRef Id = StringRef());

const llvm::opt::Arg &getInputArg() const { return Input; }		const llvm::opt::Arg &getInputArg() const { return Input; }

		void setId(StringRef _Id) { Id = _Id.str(); }
		StringRef getId() const { return Id; }

static bool classof(const Action *A) {		static bool classof(const Action *A) {
return A->getKind() == InputClass;		return A->getKind() == InputClass;
}		}
};		};

class BindArchAction : public Action {		class BindArchAction : public Action {
virtual void anchor();		virtual void anchor();

▲ Show 20 Lines • Show All 423 Lines • Show Last 20 Lines

clang/include/clang/Driver/Compilation.h

Show First 20 Lines • Show All 291 Lines • ▼ Show 20 Lines	public:
void initCompilationForDiagnostics();		void initCompilationForDiagnostics();

/// Return true if we're compiling for diagnostics.		/// Return true if we're compiling for diagnostics.
bool isForDiagnostics() const { return ForDiagnostics; }		bool isForDiagnostics() const { return ForDiagnostics; }

/// Return whether an error during the parsing of the input args.		/// Return whether an error during the parsing of the input args.
bool containsError() const { return ContainsError; }		bool containsError() const { return ContainsError; }

		void setContainsError() { ContainsError = true; }

/// Redirect - Redirect output of this compilation. Can only be done once.		/// Redirect - Redirect output of this compilation. Can only be done once.
///		///
/// \param Redirects - array of optional paths. The array should have a size		/// \param Redirects - array of optional paths. The array should have a size
/// of three. The inferior process's stdin(0), stdout(1), and stderr(2) will		/// of three. The inferior process's stdin(0), stdout(1), and stderr(2) will
/// be redirected to the corresponding paths, if provided (not llvm::None).		/// be redirected to the corresponding paths, if provided (not llvm::None).
void Redirect(ArrayRef<Optional<StringRef>> Redirects);		void Redirect(ArrayRef<Optional<StringRef>> Redirects);
};		};

} // namespace driver		} // namespace driver
} // namespace clang		} // namespace clang

#endif // LLVM_CLANG_DRIVER_COMPILATION_H		#endif // LLVM_CLANG_DRIVER_COMPILATION_H

clang/include/clang/Driver/Options.td

Show First 20 Lines • Show All 647 Lines • ▼ Show 20 Lines	def fhip_dump_offload_linker_script : Flag<["-"], "fhip-dump-offload-linker-script">,
Group<f_Group>, Flags<[NoArgumentUnused, HelpHidden]>;		Group<f_Group>, Flags<[NoArgumentUnused, HelpHidden]>;
defm hip_new_launch_api : OptInFFlag<"hip-new-launch-api",		defm hip_new_launch_api : OptInFFlag<"hip-new-launch-api",
"Use", "Don't use", " new kernel launching API for HIP">;		"Use", "Don't use", " new kernel launching API for HIP">;
defm gpu_allow_device_init : OptInFFlag<"gpu-allow-device-init",		defm gpu_allow_device_init : OptInFFlag<"gpu-allow-device-init",
"Allow", "Don't allow", " device side init function in HIP">;		"Allow", "Don't allow", " device side init function in HIP">;
def gpu_max_threads_per_block_EQ : Joined<["--"], "gpu-max-threads-per-block=">,		def gpu_max_threads_per_block_EQ : Joined<["--"], "gpu-max-threads-per-block=">,
Flags<[CC1Option]>,		Flags<[CC1Option]>,
HelpText<"Default max threads per block for kernel launch bounds for HIP">;		HelpText<"Default max threads per block for kernel launch bounds for HIP">;
		def cuid_EQ : Joined<["-"], "cuid=">, Flags<[CC1Option]>,
		HelpText<"An ID for compilation unit, which should be the same for the same "
		"compilation unit but different for different compilation units. "
		"It is used to externalize device-side static variables for single "
		"source offloading languages CUDA and HIP so that they can be "
		"accessed by the host code of the same compilation unit.">;
		def fuse_cuid_EQ : Joined<["-"], "fuse-cuid=">,
		HelpText<"Method to generate ID's for compilation units for single source "
		"offloading languages CUDA and HIP: 'hash' (ID's generated by hashing "
		"file path and command line options) | 'random' (ID's generated as "
		"random numbers) | 'none' (disabled). Default is 'hash'. This option "
		"will be overriden by option '-cuid=[ID]' if it is specified." >;
def libomptarget_nvptx_path_EQ : Joined<["--"], "libomptarget-nvptx-path=">, Group<i_Group>,		def libomptarget_nvptx_path_EQ : Joined<["--"], "libomptarget-nvptx-path=">, Group<i_Group>,
HelpText<"Path to libomptarget-nvptx libraries">;		HelpText<"Path to libomptarget-nvptx libraries">;
def dD : Flag<["-"], "dD">, Group<d_Group>, Flags<[CC1Option]>,		def dD : Flag<["-"], "dD">, Group<d_Group>, Flags<[CC1Option]>,
HelpText<"Print macro definitions in -E mode in addition to normal output">;		HelpText<"Print macro definitions in -E mode in addition to normal output">;
def dI : Flag<["-"], "dI">, Group<d_Group>, Flags<[CC1Option]>,		def dI : Flag<["-"], "dI">, Group<d_Group>, Flags<[CC1Option]>,
HelpText<"Print include directives in -E mode in addition to normal output">;		HelpText<"Print include directives in -E mode in addition to normal output">;
def dM : Flag<["-"], "dM">, Group<d_Group>, Flags<[CC1Option]>,		def dM : Flag<["-"], "dM">, Group<d_Group>, Flags<[CC1Option]>,
HelpText<"Print macro definitions in -E mode instead of normal output">;		HelpText<"Print macro definitions in -E mode instead of normal output">;
▲ Show 20 Lines • Show All 4,183 Lines • Show Last 20 Lines
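The help text for -fuse-cuid= describes the 'hash' method as hashing the file path and the command-line options. The following plain C++ sketch shows one way such an ID could be derived. It is not the driver's implementation: FNV-1a is used only because it fits in a few lines, and fnv1a and makeHashCUID are hypothetical names. The hash the driver actually uses is not visible in this excerpt.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// 64-bit FNV-1a. An assumption of this sketch, not the driver's hash.
std::uint64_t fnv1a(const std::string &Data) {
  std::uint64_t H = 14695981039346656037ULL; // FNV offset basis
  for (unsigned char C : Data) {
    H ^= C;
    H *= 1099511628211ULL; // FNV prime
  }
  return H;
}

// Hypothetical: derive an ID from the input path and the option list.
std::string makeHashCUID(const std::string &InputPath,
                         const std::vector<std::string> &Options) {
  std::string Key = InputPath;
  for (const std::string &Opt : Options) {
    Key += '\0'; // separator, so {"ab", "c"} and {"a", "bc"} hash differently
    Key += Opt;
  }
  char Buf[17];
  std::snprintf(Buf, sizeof(Buf), "%016llx",
                static_cast<unsigned long long>(fnv1a(Key)));
  return std::string(Buf); // 16 hex digits, valid under the CUID character rule
}
```

Because the ID depends only on the path and the options, the host and device sub-compilations of one input get the same ID, while different inputs are very likely to get different ones. That matches the property the -cuid= help text asks for.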

clang/lib/AST/ASTContext.cpp

	Show First 20 Lines • Show All 9,779 Lines • ▼ Show 20 Lines
	false);	false);
	assert(!RequiresICE && "Can't require complex ICE");	assert(!RequiresICE && "Can't require complex ICE");
	Type = Context.getComplexType(ElementType);	Type = Context.getComplexType(ElementType);
	break;	break;
	}	}
	case 'Y':	case 'Y':
	Type = Context.getPointerDiffType();	Type = Context.getPointerDiffType();
	break;	break;
	case 'P':	case 'P':
		rjmccall: This needs to be a clearer statement of why this is necessary.
		yaxunl: added comments to explain why
	Type = Context.getFILEType();	Type = Context.getFILEType();
	if (Type.isNull()) {	if (Type.isNull()) {
	Error = ASTContext::GE_Missing_stdio;	Error = ASTContext::GE_Missing_stdio;
	return {};	return {};
		rjmccall: Are you sure this doesn't apply to e.g. local statics? Can't you have kernel lambdas, or am I confusing HIP with another language?
		yaxunl: A function-scope static var in a device function is only visible to the device function. Host code cannot access it, therefore there is no need to externalize it.
		JonChesterfield: This doesn't sound right. An inline function can return a pointer to a function scope static variable, e.g. to implement a singleton in a header file. I think host code can then access said variable.
		yaxunl: As long as we are not accessing the static variable by symbol, we do not need to externalize it. If a device function returns a pointer to its static variable and somehow passes that pointer to host code, the host code can use it directly with hipMemcpy.
		rjmccall: Right, and IIRC you can declare __host__ __device__ functions as well, which ought to agree on the variable if they agree on globals.
		yaxunl: If we have a static variable in a device function, it is only visible in the function and not visible to any host code. We only need to externalize it if it needs to be accessed `by symbol` in the host code; however, that is impossible, therefore we do not need to externalize it. For static variables in a __host__ __device__ function, the static variables should be different instances on the host side and the device side. The rationale is that a static variable is per function, whereas a __host__ __device__ function is actually two functions: a host instance and a device instance, which could be totally different through the use of conditional macros. If the static variable in a __host__ __device__ function were required to be one instance, it would need special handling in the runtime so that the same variable could be accessed on both the device side and the host side by common load/store instructions, but that is not the case. Therefore the device-side instance of a static variable in a __host__ __device__ function is still only visible to device code, not to host code. Since it cannot be accessed `by symbol` by host code, it does not need to be externalized.
	}	}
	break;	break;
	case 'J':	case 'J':
	if (Signed)	if (Signed)
	Type = Context.getsigjmp_bufType();	Type = Context.getsigjmp_bufType();
	else	else
	Type = Context.getjmp_bufType();	Type = Context.getjmp_bufType();

	▲ Show 20 Lines • Show All 191 Lines • ▼ Show 20 Lines
	// See http://msdn.microsoft.com/en-us/library/xa0d9ste.aspx	// See http://msdn.microsoft.com/en-us/library/xa0d9ste.aspx
	// dllexport/dllimport on inline functions.	// dllexport/dllimport on inline functions.
	if (D->hasAttr<DLLImportAttr>()) {	if (D->hasAttr<DLLImportAttr>()) {
	if (L == GVA_DiscardableODR || L == GVA_StrongODR)	if (L == GVA_DiscardableODR || L == GVA_StrongODR)
	return GVA_AvailableExternally;	return GVA_AvailableExternally;
	} else if (D->hasAttr<DLLExportAttr>()) {	} else if (D->hasAttr<DLLExportAttr>()) {
	if (L == GVA_DiscardableODR)	if (L == GVA_DiscardableODR)
	return GVA_StrongODR;	return GVA_StrongODR;
	} else if (Context.getLangOpts().CUDA && Context.getLangOpts().CUDAIsDevice &&	} else if (Context.getLangOpts().CUDA && Context.getLangOpts().CUDAIsDevice) {
	D->hasAttr<CUDAGlobalAttr>()) {
	// Device-side functions with __global__ attribute must always be	// Device-side functions with __global__ attribute must always be
	// visible externally so they can be launched from host.	// visible externally so they can be launched from host.
	if (L == GVA_DiscardableODR || L == GVA_Internal)	if (D->hasAttr<CUDAGlobalAttr>() &&
		(L == GVA_DiscardableODR || L == GVA_Internal))
	return GVA_StrongODR;	return GVA_StrongODR;
		// Single source offloading languages like CUDA/HIP need to be able to
		// access static device variables from host code of the same compilation
		// unit. This is done by externalizing the static variable with a shared
		// name between the host and device compilation which is the same for the
		// same compilation unit whereas different among different compilation
		// units.
		if (Context.shouldExternalizeStaticVar(D))
		return GVA_StrongExternal;
	}	}
	return L;	return L;
	}	}

	/// Adjust the GVALinkage for a declaration based on what an external AST source	/// Adjust the GVALinkage for a declaration based on what an external AST source
	/// knows about whether there can be other definitions of this declaration.	/// knows about whether there can be other definitions of this declaration.
	static GVALinkage	static GVALinkage
	adjustGVALinkageForExternalDefinitionKind(const ASTContext &Ctx, const Decl *D,	adjustGVALinkageForExternalDefinitionKind(const ASTContext &Ctx, const Decl *D,
	▲ Show 20 Lines • Show All 869 Lines • ▼ Show 20 Lines

	const DiagnosticBuilder &	const DiagnosticBuilder &
	clang::operator<<(const DiagnosticBuilder &DB,	clang::operator<<(const DiagnosticBuilder &DB,
	const ASTContext::SectionInfo &Section) {	const ASTContext::SectionInfo &Section) {
	if (Section.Decl)	if (Section.Decl)
	return DB << Section.Decl;	return DB << Section.Decl;
	return DB << "a prior #pragma section";	return DB << "a prior #pragma section";
	}	}

		bool ASTContext::shouldExternalizeStaticVar(const Decl *D) const {
		return !getLangOpts().CUID.empty() &&
		(D->hasAttr<CUDADeviceAttr>() || D->hasAttr<CUDAConstantAttr>()) &&
		isa<VarDecl>(D) && cast<VarDecl>(D)->isFileVarDecl() &&
		cast<VarDecl>(D)->getStorageClass() == SC_Static;
		}
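To make the conditions in shouldExternalizeStaticVar easier to follow, here is a plain C++ model of the same predicate. It is a sketch only: VarInfo and shouldExternalize are hypothetical stand-ins for the clang Decl queries used in the diff, and the CUID is passed in explicitly instead of being read from LangOptions.

```cpp
#include <string>

// Hypothetical stand-in for the properties the predicate inspects.
struct VarInfo {
  bool HasDeviceAttr;   // variable carries __device__
  bool HasConstantAttr; // variable carries __constant__
  bool IsFileScope;     // declared at file scope, not inside a function
  bool IsStatic;        // storage class is static
};

// Mirrors the structure of the predicate in the diff: a non-empty CUID,
// a device-side attribute, file scope, and static storage are all required.
bool shouldExternalize(const VarInfo &V, const std::string &CUID) {
  return !CUID.empty() && (V.HasDeviceAttr || V.HasConstantAttr) &&
         V.IsFileScope && V.IsStatic;
}
```

In this model a function-scope static never qualifies, which matches the position the author takes in the inline discussion, and an empty CUID switches externalization off entirely.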

clang/lib/CodeGen/CGCUDANV.cpp

Show First 20 Lines • Show All 228 Lines • ▼ Show 20 Lines	std::string CGNVCUDARuntime::getDeviceSideName(const NamedDecl *ND) {
std::string DeviceSideName;		std::string DeviceSideName;
if (DeviceMC->shouldMangleDeclName(ND)) {		if (DeviceMC->shouldMangleDeclName(ND)) {
SmallString<256> Buffer;		SmallString<256> Buffer;
llvm::raw_svector_ostream Out(Buffer);		llvm::raw_svector_ostream Out(Buffer);
DeviceMC->mangleName(GD, Out);		DeviceMC->mangleName(GD, Out);
DeviceSideName = std::string(Out.str());		DeviceSideName = std::string(Out.str());
} else		} else
DeviceSideName = std::string(ND->getIdentifier()->getName());		DeviceSideName = std::string(ND->getIdentifier()->getName());

		// Make unique name for device side static file-scope variable for HIP.
		if (CGM.getContext().shouldExternalizeStaticVar(ND)) {
		SmallString<256> Buffer;
		llvm::raw_svector_ostream Out(Buffer);
		Out << DeviceSideName;
		CGM.printPostfixForExternalizedStaticVar(Out);
		DeviceSideName = std::string(Out.str());
		}
return DeviceSideName;		return DeviceSideName;
}		}

void CGNVCUDARuntime::emitDeviceStub(CodeGenFunction &CGF,		void CGNVCUDARuntime::emitDeviceStub(CodeGenFunction &CGF,
FunctionArgList &Args) {		FunctionArgList &Args) {
EmittedKernels.push_back({CGF.CurFn, CGF.CurFuncDecl});		EmittedKernels.push_back({CGF.CurFn, CGF.CurFuncDecl});
if (CudaFeatureEnabled(CGM.getTarget().getSDKVersion(),		if (CudaFeatureEnabled(CGM.getTarget().getSDKVersion(),
CudaFeature::CUDA_USES_NEW_LAUNCH) ||		CudaFeature::CUDA_USES_NEW_LAUNCH) ||
▲ Show 20 Lines • Show All 604 Lines • Show Last 20 Lines

clang/lib/CodeGen/CodeGenModule.h

Show First 20 Lines • Show All 1,400 Lines • ▼ Show 20 Lines	CharUnits getNaturalTypeAlignment(QualType T,
LValueBaseInfo *BaseInfo = nullptr,		LValueBaseInfo *BaseInfo = nullptr,
TBAAAccessInfo *TBAAInfo = nullptr,		TBAAAccessInfo *TBAAInfo = nullptr,
bool forPointeeType = false);		bool forPointeeType = false);
CharUnits getNaturalPointeeTypeAlignment(QualType T,		CharUnits getNaturalPointeeTypeAlignment(QualType T,
LValueBaseInfo *BaseInfo = nullptr,		LValueBaseInfo *BaseInfo = nullptr,
TBAAAccessInfo *TBAAInfo = nullptr);		TBAAAccessInfo *TBAAInfo = nullptr);
bool stopAutoInit();		bool stopAutoInit();

		/// Print the postfix for externalized static variable for single source
		/// offloading languages CUDA and HIP.
		void printPostfixForExternalizedStaticVar(llvm::raw_ostream &OS) const;

private:		private:
llvm::Constant *GetOrCreateLLVMFunction(		llvm::Constant *GetOrCreateLLVMFunction(
StringRef MangledName, llvm::Type *Ty, GlobalDecl D, bool ForVTable,		StringRef MangledName, llvm::Type *Ty, GlobalDecl D, bool ForVTable,
bool DontDefer = false, bool IsThunk = false,		bool DontDefer = false, bool IsThunk = false,
llvm::AttributeList ExtraAttrs = llvm::AttributeList(),		llvm::AttributeList ExtraAttrs = llvm::AttributeList(),
ForDefinition_t IsForDefinition = NotForDefinition);		ForDefinition_t IsForDefinition = NotForDefinition);

llvm::Constant *GetOrCreateMultiVersionResolver(GlobalDecl GD,		llvm::Constant *GetOrCreateMultiVersionResolver(GlobalDecl GD,
▲ Show 20 Lines • Show All 154 Lines • Show Last 20 Lines

clang/lib/CodeGen/CodeGenModule.cpp

Show First 20 Lines • Show All 1,077 Lines • ▼ Show 20 Lines	if (FD->isMultiVersion() && !OmitMultiVersionMangling) {
case MultiVersionKind::Target:		case MultiVersionKind::Target:
AppendTargetMangling(CGM, FD->getAttr<TargetAttr>(), Out);		AppendTargetMangling(CGM, FD->getAttr<TargetAttr>(), Out);
break;		break;
case MultiVersionKind::None:		case MultiVersionKind::None:
llvm_unreachable("None multiversion type isn't valid here");		llvm_unreachable("None multiversion type isn't valid here");
}		}
}		}

		// Make unique name for device side static file-scope variable for HIP.
		if (CGM.getContext().shouldExternalizeStaticVar(ND))
		CGM.printPostfixForExternalizedStaticVar(Out);
return std::string(Out.str());		return std::string(Out.str());
}		}
		rjmccall: Please extract this "(device || constant) && file && static" predicate instead of repeating it in three different places.
		yaxunl: extracted as ASTContext::shouldExternalizeStaticVar

void CodeGenModule::UpdateMultiVersionNames(GlobalDecl GD,		void CodeGenModule::UpdateMultiVersionNames(GlobalDecl GD,
const FunctionDecl *FD) {		const FunctionDecl *FD) {
if (!FD->isMultiVersion())		if (!FD->isMultiVersion())
return;		return;

// Get the name of what this would be without the 'target' attribute. This		// Get the name of what this would be without the 'target' attribute. This
// allows us to lookup the version that was emitted when this wasn't a		// allows us to lookup the version that was emitted when this wasn't a
▲ Show 20 Lines • Show All 4,951 Lines • ▼ Show 20 Lines	if (!NumAutoVarInit) {
LangOptions::TrivialAutoVarInitKind::Zero		LangOptions::TrivialAutoVarInitKind::Zero
? "zero"		? "zero"
: "pattern");		: "pattern");
}		}
++NumAutoVarInit;		++NumAutoVarInit;
}		}
return false;		return false;
}		}

		void CodeGenModule::printPostfixForExternalizedStaticVar(
		llvm::raw_ostream &OS) const {
		OS << ".static." << getLangOpts().CUID;
		tra: I suspect that will have interesting issues if CUID is an arbitrary user-supplied string. We may want to impose some sort of sanity check or filtering on the cuid value. Considering that it's a CC1 flag, it's not a critical problem, but some safeguards would be useful there, too. Should we limit the allowed character set?
		yaxunl: will only allow alphanumeric and underscore in CUID for simplicity.
		}
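The postfix function above appends ".static." followed by the CUID to the device-side name, and the same suffix is used when the host side computes the name it registers. A plain C++ sketch of the resulting naming scheme follows. externalizedName is a hypothetical helper that uses std::string in place of llvm::raw_ostream, and the mangled name in the usage note is illustrative only.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds the name shared by the host and device
// compilations. The ".static." separator follows the diff above; the rest is
// illustrative.
std::string externalizedName(const std::string &MangledName,
                             const std::string &CUID) {
  // In the patch an empty CUID means externalization is disabled, so this
  // helper is only meaningful for a non-empty ID.
  assert(!CUID.empty() && "an empty CUID means externalization is disabled");
  return MangledName + ".static." + CUID;
}
```

For example, externalizedName("_ZL1x", "abc123") yields "_ZL1x.static.abc123". Two compilation units with different CUIDs therefore produce different symbols for a static variable of the same name.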

clang/lib/Driver/Action.cpp

Show First 20 Lines • Show All 159 Lines • ▼ Show 20 Lines	case OFK_HIP:
// TODO: Add other programming models here.		// TODO: Add other programming models here.
}		}

llvm_unreachable("invalid offload kind");		llvm_unreachable("invalid offload kind");
}		}

void InputAction::anchor() {}		void InputAction::anchor() {}

InputAction::InputAction(const Arg &_Input, types::ID _Type)		InputAction::InputAction(const Arg &_Input, types::ID _Type, StringRef _Id)
: Action(InputClass, _Type), Input(_Input) {}		: Action(InputClass, _Type), Input(_Input), Id(_Id.str()) {}

void BindArchAction::anchor() {}		void BindArchAction::anchor() {}
		rjmccall: I'm sure GetRandomNumber can return 0, so this logic is faulty. This also seems like an unfortunate intrusion of HIP-specific semantics on the rest of the driver. Maybe HIP can generate the shared id when it's setting up and cloning the job.
		yaxunl: Changed type of ID to string. Now empty ID means disabled. 0 is now allowed as a usual ID. Moved initialization of ID to HIP action builder.
		rjmccall: Thanks, that works.

BindArchAction::BindArchAction(Action *Input, StringRef ArchName)		BindArchAction::BindArchAction(Action *Input, StringRef ArchName)
: Action(BindArchClass, Input), ArchName(ArchName) {}		: Action(BindArchClass, Input), ArchName(ArchName) {}

void OffloadAction::anchor() {}		void OffloadAction::anchor() {}

OffloadAction::OffloadAction(const HostDependence &HDep)		OffloadAction::OffloadAction(const HostDependence &HDep)
: Action(OffloadClass, HDep.getAction()), HostTC(HDep.getToolChain()) {		: Action(OffloadClass, HDep.getAction()), HostTC(HDep.getToolChain()) {
▲ Show 20 Lines • Show All 245 Lines • Show Last 20 Lines

clang/lib/Driver/Driver.cpp

Show First 20 Lines • Show All 67 Lines • ▼ Show 20 Lines
#include "llvm/Option/OptSpecifier.h"		#include "llvm/Option/OptSpecifier.h"
#include "llvm/Option/OptTable.h"		#include "llvm/Option/OptTable.h"
#include "llvm/Option/Option.h"		#include "llvm/Option/Option.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/FileSystem.h"		#include "llvm/Support/FileSystem.h"
#include "llvm/Support/FormatVariadic.h"		#include "llvm/Support/FormatVariadic.h"
#include "llvm/Support/Host.h"		#include "llvm/Support/Host.h"
		#include "llvm/Support/MD5.h"
#include "llvm/Support/Path.h"		#include "llvm/Support/Path.h"
#include "llvm/Support/PrettyStackTrace.h"		#include "llvm/Support/PrettyStackTrace.h"
#include "llvm/Support/Process.h"		#include "llvm/Support/Process.h"
#include "llvm/Support/Program.h"		#include "llvm/Support/Program.h"
#include "llvm/Support/StringSaver.h"		#include "llvm/Support/StringSaver.h"
#include "llvm/Support/TargetRegistry.h"		#include "llvm/Support/TargetRegistry.h"
#include "llvm/Support/VirtualFileSystem.h"		#include "llvm/Support/VirtualFileSystem.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
▲ Show 20 Lines • Show All 2,313 Lines • ▼ Show 20 Lines	protected:
bool IsActive = false;		bool IsActive = false;

/// Flag for -fgpu-rdc.		/// Flag for -fgpu-rdc.
bool Relocatable = false;		bool Relocatable = false;

/// Default GPU architecture if there's no one specified.		/// Default GPU architecture if there's no one specified.
CudaArch DefaultCudaArch = CudaArch::UNKNOWN;		CudaArch DefaultCudaArch = CudaArch::UNKNOWN;

		/// Method to generate compilation unit ID specified by option
		/// '-fuse-cuid='.
		enum UseCUIDKind { CUID_Hash, CUID_Random, CUID_None, CUID_Invalid };
		UseCUIDKind UseCUID = CUID_Hash;

		/// Compilation unit ID specified by option '-cuid='.
		StringRef FixedCUID;

public:		public:
CudaActionBuilderBase(Compilation &C, DerivedArgList &Args,		CudaActionBuilderBase(Compilation &C, DerivedArgList &Args,
const Driver::InputList &Inputs,		const Driver::InputList &Inputs,
Action::OffloadKind OFKind)		Action::OffloadKind OFKind)
: DeviceActionBuilder(C, Args, Inputs, OFKind) {}		: DeviceActionBuilder(C, Args, Inputs, OFKind) {}

ActionBuilderReturnCode addDeviceDepences(Action *HostAction) override {		ActionBuilderReturnCode addDeviceDepences(Action *HostAction) override {
// While generating code for CUDA, we only depend on the host input action		// While generating code for CUDA, we only depend on the host input action
Show All 19 Lines	ActionBuilderReturnCode addDeviceDepences(Action *HostAction) override {
IsActive = true;		IsActive = true;

if (CompileHostOnly)		if (CompileHostOnly)
return ABRT_Success;		return ABRT_Success;

// Replicate inputs for each GPU architecture.		// Replicate inputs for each GPU architecture.
auto Ty = IA->getType() == types::TY_HIP ? types::TY_HIP_DEVICE		auto Ty = IA->getType() == types::TY_HIP ? types::TY_HIP_DEVICE
: types::TY_CUDA_DEVICE;		: types::TY_CUDA_DEVICE;
		std::string CUID = FixedCUID.str();
		if (CUID.empty()) {
		if (UseCUID == CUID_Random)
		CUID = llvm::utohexstr(llvm::sys::Process::GetRandomNumber(),
		/*LowerCase=*/true);
		else if (UseCUID == CUID_Hash) {
		llvm::MD5 Hasher;
		llvm::MD5::MD5Result Hash;
		SmallString<256> RealPath;
		llvm::sys::fs::real_path(IA->getInputArg().getValue(), RealPath,
		/*expand_tilde=*/true);
		Hasher.update(RealPath);
		for (auto *A : Args) {
		if (A->getOption().matches(options::OPT_INPUT))
		continue;
		Hasher.update(A->getAsString(Args));
		}
		Hasher.final(Hash);
		CUID = llvm::utohexstr(Hash.low(), /*LowerCase=*/true);
		}
		}
		IA->setId(CUID);

for (unsigned I = 0, E = GpuArchList.size(); I != E; ++I) {		for (unsigned I = 0, E = GpuArchList.size(); I != E; ++I) {
CudaDeviceActions.push_back(		CudaDeviceActions.push_back(
C.MakeAction<InputAction>(IA->getInputArg(), Ty));		C.MakeAction<InputAction>(IA->getInputArg(), Ty, IA->getId()));
}		}

return ABRT_Success;		return ABRT_Success;
}		}

// If this is an unbundling action use it as is for each CUDA toolchain.		// If this is an unbundling action use it as is for each CUDA toolchain.
if (auto *UA = dyn_cast<OffloadUnbundlingJobAction>(HostAction)) {		if (auto *UA = dyn_cast<OffloadUnbundlingJobAction>(HostAction)) {

▲ Show 20 Lines • Show All 99 Lines • ▼ Show 20 Lines	bool initialize() override {
CompileHostOnly = PartialCompilationArg &&		CompileHostOnly = PartialCompilationArg &&
PartialCompilationArg->getOption().matches(		PartialCompilationArg->getOption().matches(
options::OPT_cuda_host_only);		options::OPT_cuda_host_only);
CompileDeviceOnly = PartialCompilationArg &&		CompileDeviceOnly = PartialCompilationArg &&
PartialCompilationArg->getOption().matches(		PartialCompilationArg->getOption().matches(
options::OPT_cuda_device_only);		options::OPT_cuda_device_only);
EmitLLVM = Args.getLastArg(options::OPT_emit_llvm);		EmitLLVM = Args.getLastArg(options::OPT_emit_llvm);
EmitAsm = Args.getLastArg(options::OPT_S);		EmitAsm = Args.getLastArg(options::OPT_S);
		FixedCUID = Args.getLastArgValue(options::OPT_cuid_EQ);
		if (Arg *A = Args.getLastArg(options::OPT_fuse_cuid_EQ)) {
		StringRef UseCUIDStr = A->getValue();
		UseCUID = llvm::StringSwitch<UseCUIDKind>(UseCUIDStr)
		.Case("hash", CUID_Hash)
		.Case("random", CUID_Random)
		.Case("none", CUID_None)
		.Default(CUID_Invalid);
		if (UseCUID == CUID_Invalid) {
		C.getDriver().Diag(diag::err_drv_invalid_value)
		<< A->getAsString(Args) << UseCUIDStr;
		C.setContainsError();
		return true;
		}
		}

// Collect all cuda_gpu_arch parameters, removing duplicates.		// Collect all cuda_gpu_arch parameters, removing duplicates.
std::set<CudaArch> GpuArchs;		std::set<CudaArch> GpuArchs;
bool Error = false;		bool Error = false;
for (Arg *A : Args) {		for (Arg *A : Args) {
if (!(A->getOption().matches(options::OPT_offload_arch_EQ) ||		if (!(A->getOption().matches(options::OPT_offload_arch_EQ) ||
A->getOption().matches(options::OPT_no_offload_arch_EQ)))		A->getOption().matches(options::OPT_no_offload_arch_EQ)))
continue;		continue;
▲ Show 20 Lines • Show All 2,637 Lines • Show Last 20 Lines
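
The selection logic in the Driver.cpp hunk above (a fixed `-cuid=` value wins, otherwise `-fuse-cuid=` picks hash, random, or none) can be sketched outside of LLVM roughly as follows. This is only an illustration under stated assumptions: FNV-1a is swapped in for the MD5 hash used in the patch, `std::random_device` replaces `llvm::sys::Process::GetRandomNumber`, and the names `parseUseCUID`, `makeCUID`, `fnv1a`, and `toHex` are hypothetical.

```cpp
#include <cstdint>
#include <random>
#include <string>
#include <vector>

enum class UseCUIDKind { Hash, Random, None, Invalid };

// Map the -fuse-cuid= value to a kind; anything unknown is Invalid.
UseCUIDKind parseUseCUID(const std::string &V) {
  if (V == "hash") return UseCUIDKind::Hash;
  if (V == "random") return UseCUIDKind::Random;
  if (V == "none") return UseCUIDKind::None;
  return UseCUIDKind::Invalid;
}

// Lowercase hex rendering of a 64-bit value.
std::string toHex(std::uint64_t N) {
  if (N == 0)
    return "0";
  std::string S;
  for (; N != 0; N >>= 4)
    S.insert(S.begin(), "0123456789abcdef"[N & 0xF]);
  return S;
}

// FNV-1a step; an assumption standing in for the patch's MD5 hasher.
std::uint64_t fnv1a(std::uint64_t H, const std::string &S) {
  for (unsigned char C : S) {
    H ^= C;
    H *= 1099511628211ULL;
  }
  return H;
}

// Fixed is the -cuid= value (may be empty); Args are the non-input options.
std::string makeCUID(const std::string &Fixed, UseCUIDKind Kind,
                     const std::string &RealPath,
                     const std::vector<std::string> &Args) {
  if (!Fixed.empty())
    return Fixed; // An explicit -cuid= overrides -fuse-cuid=.
  if (Kind == UseCUIDKind::Random) {
    std::random_device RD;
    return toHex((std::uint64_t(RD()) << 32) | RD());
  }
  if (Kind == UseCUIDKind::Hash) {
    std::uint64_t H = 14695981039346656037ULL; // FNV offset basis.
    H = fnv1a(H, RealPath);
    for (const std::string &A : Args)
      H = fnv1a(H, A);
    return toHex(H);
  }
  return ""; // None (or Invalid): no CUID is attached.
}
```

Hashing the resolved path together with the options is what lets the same file compiled with the same options get the same ID, while `-DX=1` and `-DX=2` builds of one file get different IDs, which is the behavior the driver test checks.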

clang/lib/Driver/ToolChains/Clang.cpp


	Show First 20 Lines • Show All 6,000 Lines • ▼ Show 20 Lines
	}			}

	if (IsCuda) {			if (IsCuda) {
	if (Args.hasFlag(options::OPT_fcuda_short_ptr,			if (Args.hasFlag(options::OPT_fcuda_short_ptr,
	options::OPT_fno_cuda_short_ptr, false))			options::OPT_fno_cuda_short_ptr, false))
	CmdArgs.push_back("-fcuda-short-ptr");			CmdArgs.push_back("-fcuda-short-ptr");
	}			}

				if (IsCuda || IsHIP) {
				// Determine the original source input.
				const Action *SourceAction = &JA;
				while (SourceAction->getKind() != Action::InputClass) {
				assert(!SourceAction->getInputs().empty() && "unexpected root action!");
				SourceAction = SourceAction->getInputs()[0];
				}
				auto CUID = cast<InputAction>(SourceAction)->getId();
				if (!CUID.empty())
				CmdArgs.push_back(Args.MakeArgString(Twine("-cuid=") + Twine(CUID)));
				}

	if (IsHIP)			if (IsHIP)
	CmdArgs.push_back("-fcuda-allow-variadic-functions");			CmdArgs.push_back("-fcuda-allow-variadic-functions");

	// OpenMP offloading device jobs take the argument -fopenmp-host-ir-file-path			// OpenMP offloading device jobs take the argument -fopenmp-host-ir-file-path
	// to specify the result of the compile phase on the host, so the meaningful			// to specify the result of the compile phase on the host, so the meaningful
	// device declarations can be identified. Also, -fopenmp-is-device is passed			// device declarations can be identified. Also, -fopenmp-is-device is passed
	// along to tell the frontend that it is generating code for a device, so that			// along to tell the frontend that it is generating code for a device, so that
	// only the relevant declarations are emitted.			// only the relevant declarations are emitted.
	▲ Show 20 Lines • Show All 1,148 Lines • Show Last 20 Lines
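
The Clang.cpp hunk above walks from the current job back to the original input action and forwards that action's ID as `-cuid=`. A simplified standalone sketch of the same walk follows; `ActionSketch`, `findSourceInput`, and `cuidArgs` are hypothetical names, and the real driver classes carry far more state than this.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, stripped-down action node.
struct ActionSketch {
  enum Kind { InputClass, JobClass } K;
  std::string Id;                            // Only meaningful for InputClass.
  std::vector<const ActionSketch *> Inputs;  // Empty for InputClass.
};

// Follow the first input of each action until an input action is reached.
const ActionSketch *findSourceInput(const ActionSketch *A) {
  while (A->K != ActionSketch::InputClass) {
    assert(!A->Inputs.empty() && "unexpected root action!");
    A = A->Inputs[0];
  }
  return A;
}

// Build the extra cc1 arguments for a job: "-cuid=<id>" when an ID is set.
std::vector<std::string> cuidArgs(const ActionSketch &Job) {
  std::vector<std::string> CmdArgs;
  const std::string &CUID = findSourceInput(&Job)->Id;
  if (!CUID.empty())
    CmdArgs.push_back("-cuid=" + CUID);
  return CmdArgs;
}
```

Because the host job and every per-architecture device job trace back to input actions that were created with the same ID, they all receive the same `-cuid=` value.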

clang/lib/Frontend/CompilerInvocation.cpp

Show First 20 Lines • Show All 2,623 Lines • ▼ Show 20 Lines	#include "clang/Basic/LangStandards.def"
if (Args.hasArg(OPT_fgpu_allow_device_init)) {		if (Args.hasArg(OPT_fgpu_allow_device_init)) {
if (Opts.HIP)		if (Opts.HIP)
Opts.GPUAllowDeviceInit = 1;		Opts.GPUAllowDeviceInit = 1;
else		else
Diags.Report(diag::warn_ignored_hip_only_option)		Diags.Report(diag::warn_ignored_hip_only_option)
<< Args.getLastArg(OPT_fgpu_allow_device_init)->getAsString(Args);		<< Args.getLastArg(OPT_fgpu_allow_device_init)->getAsString(Args);
}		}
Opts.HIPUseNewLaunchAPI = Args.hasArg(OPT_fhip_new_launch_api);		Opts.HIPUseNewLaunchAPI = Args.hasArg(OPT_fhip_new_launch_api);

		// Only alphanumeric and underscore is allowed in -cuid option.
		if (auto *A = Args.getLastArg(OPT_cuid_EQ)) {
		const char *V = A->getValue();
		bool IsValid = true;
		for (const char *P = V; *P; ++P) {
		if (!std::isalnum(*P) && *P != '_') {
		Diags.Report(diag::err_drv_invalid_cuid) << A->getAsString(Args) << V;
		IsValid = false;
		break;
		}
		}
		if (IsValid)
		Opts.CUID = std::string(V);
		}
if (Opts.HIP)		if (Opts.HIP)
Opts.GPUMaxThreadsPerBlock = getLastArgIntValue(		Opts.GPUMaxThreadsPerBlock = getLastArgIntValue(
Args, OPT_gpu_max_threads_per_block_EQ, Opts.GPUMaxThreadsPerBlock);		Args, OPT_gpu_max_threads_per_block_EQ, Opts.GPUMaxThreadsPerBlock);
else if (Args.hasArg(OPT_gpu_max_threads_per_block_EQ))		else if (Args.hasArg(OPT_gpu_max_threads_per_block_EQ))
Diags.Report(diag::warn_ignored_hip_only_option)		Diags.Report(diag::warn_ignored_hip_only_option)
<< Args.getLastArg(OPT_gpu_max_threads_per_block_EQ)->getAsString(Args);		<< Args.getLastArg(OPT_gpu_max_threads_per_block_EQ)->getAsString(Args);

if (Opts.ObjC) {		if (Opts.ObjC) {
▲ Show 20 Lines • Show All 1,368 Lines • Show Last 20 Lines
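
The character check added to CompilerInvocation.cpp above can be exercised in isolation with a small sketch. `isValidCUID` is a hypothetical helper name; the patch performs the check inline and reports a diagnostic instead of returning a flag.

```cpp
#include <cctype>
#include <string>

// Accept only alphanumeric characters and underscore, as the review agreed.
bool isValidCUID(const std::string &V) {
  for (unsigned char C : V) // unsigned char keeps std::isalnum well-defined.
    if (!std::isalnum(C) && C != '_')
      return false;
  return true;
}
```

Under this rule `abc_123` passes and `abc-123` is rejected, matching the two cases in the frontend test.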

clang/test/CodeGenCUDA/static-device-var.cu

This file was added.

				// REQUIRES: x86-registered-target
				// REQUIRES: amdgpu-registered-target

				// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device \
				// RUN: -emit-llvm -o - -x hip %s | FileCheck \
				// RUN: -check-prefixes=DEV,INT-DEV %s

				// RUN: %clang_cc1 -triple x86_64-gnu-linux \
				// RUN: -emit-llvm -o - -x hip %s | FileCheck \
				// RUN: -check-prefixes=HOST,INT-HOST %s

				// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -cuid=123abc \
				// RUN: -emit-llvm -o - -x hip %s | FileCheck \
				// RUN: -check-prefixes=DEV,EXT-DEV %s

				// RUN: %clang_cc1 -triple x86_64-gnu-linux -cuid=123abc \
				// RUN: -emit-llvm -o - -x hip %s | FileCheck \
				// RUN: -check-prefixes=HOST,EXT-HOST %s

				#include "Inputs/cuda.h"

				// Test function scope static device variable, which should not be externalized.
				// DEV-DAG: @_ZZ6kernelPiPPKiE1w = internal addrspace(4) constant i32 1

				// Test normal static device variables
				// INT-DEV-DAG: @_ZL1x = internal addrspace(1) global i32 0
				// INT-HOST-DAG: @_ZL1x = internal global i32 undef
				// INT-HOST-DAG: @[[DEVNAMEX:[0-9]+]] = {{.*}}c"_ZL1x\00"

				// Test externalized static device variables
				// EXT-DEV-DAG: @_ZL1x.static.123abc = addrspace(1) externally_initialized global i32 0
				// EXT-HOST-DAG: @_ZL1x.static.123abc = internal global i32 undef
				// EXT-HOST-DAG: @[[DEVNAMEX:[0-9]+]] = {{.*}}c"_ZL1x.static.123abc\00"

				static __device__ int x;

				// Test normal static device variables
				// INT-DEV-DAG: @_ZL1y = internal addrspace(4) global i32 0
				// INT-HOST-DAG: @_ZL1y = internal global i32 undef
				// INT-HOST-DAG: @[[DEVNAMEY:[0-9]+]] = {{.*}}c"_ZL1y\00"

				// Test externalized static device variables
				// EXT-DEV-DAG: @_ZL1y.static.123abc = addrspace(4) externally_initialized global i32 0
				// EXT-HOST-DAG: @_ZL1y.static.123abc = internal global i32 undef
				// EXT-HOST-DAG: @[[DEVNAMEY:[0-9]+]] = {{.*}}c"_ZL1y.static.123abc\00"

				static __constant__ int y;

				// Test static host variable, which should not be externalized nor registered.
				// HOST-DAG: @_ZL1z = internal global i32 0
				// DEV-NOT: @_ZL1z
				static int z;

				// Test static device variable in inline function, which should not be
				// externalized nor registered.
				// DEV-DAG: @_ZZ6devfunPPKiE1p = linkonce_odr addrspace(4) constant i32 2, comdat

				inline __device__ void devfun(const int ** b) {
				const static int p = 2;
				b[0] = &p;
				}

				__global__ void kernel(int *a, const int **b) {
				const static int w = 1;
				a[0] = x;
				a[1] = y;
				b[0] = &w;
				devfun(b);
				}

				int* getDeviceSymbol(int *x);

				void foo() {
				getDeviceSymbol(&x);
				getDeviceSymbol(&y);
				z = 123;
				}

				// INT-HOST: __hipRegisterVar({{.*}}@_ZL1x{{.*}}@[[DEVNAMEX]]
				// INT-HOST: __hipRegisterVar({{.*}}@_ZL1y{{.*}}@[[DEVNAMEY]]
				// EXT-HOST: __hipRegisterVar({{.*}}@_ZL1x.static.123abc{{.*}}@[[DEVNAMEX]]
				// EXT-HOST: __hipRegisterVar({{.*}}@_ZL1y.static.123abc{{.*}}@[[DEVNAMEY]]
				// HOST-NOT: __hipRegisterVar({{.*}}@_ZZ6kernelPiPPKiE1w
				// HOST-NOT: __hipRegisterVar({{.*}}@_ZZ6devfunPPKiE1p
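
The EXT-DEV and EXT-HOST checks above expect the mangled name to gain a `.static.<CUID>` suffix when a CUID is supplied, and the INT checks expect it unchanged otherwise. A minimal sketch of that naming rule, with `externalizedName` as a hypothetical helper that is not taken from the patch:

```cpp
#include <string>

// Append ".static.<CUID>" to the mangled name when a CUID is present;
// with an empty CUID the name is left as is.
std::string externalizedName(const std::string &Mangled,
                             const std::string &CUID) {
  if (CUID.empty())
    return Mangled;
  return Mangled + ".static." + CUID;
}
```

For the test's `static __device__ int x` compiled with `-cuid=123abc`, this yields `_ZL1x.static.123abc`, the name both the device symbol and the host-side registration string are checked against.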

clang/test/Driver/hip-cuid.hip

This file was added.

				// REQUIRES: clang-driver
				// REQUIRES: x86-registered-target
				// REQUIRES: amdgpu-registered-target

				// Check invalid -fuse-cuid= option.

				// RUN: not %clang -### -x hip \
				// RUN: -target x86_64-unknown-linux-gnu \
				// RUN: --offload-arch=gfx900 \
				// RUN: --offload-arch=gfx906 \
				// RUN: -c -nogpulib -fuse-cuid=invalid \
				// RUN: %S/Inputs/hip_multiple_inputs/a.cu \
				// RUN: %S/Inputs/hip_multiple_inputs/b.hip \
				// RUN: 2>&1 | FileCheck -check-prefixes=INVALID %s

				// INVALID: invalid value 'invalid' in '-fuse-cuid=invalid'

				// Check random CUID generator.

				// RUN: %clang -### -x hip \
				// RUN: -target x86_64-unknown-linux-gnu \
				// RUN: --offload-arch=gfx900 \
				// RUN: --offload-arch=gfx906 \
				// RUN: -c -nogpulib -fuse-cuid=random \
				// RUN: %S/Inputs/hip_multiple_inputs/a.cu \
				// RUN: %S/Inputs/hip_multiple_inputs/b.hip \
				// RUN: 2>&1 | FileCheck -check-prefixes=COMMON,HEX %s

				// Check fixed CUID.

				// RUN: %clang -### -x hip \
				// RUN: -target x86_64-unknown-linux-gnu \
				// RUN: --offload-arch=gfx900 \
				// RUN: --offload-arch=gfx906 \
				// RUN: -c -nogpulib -cuid=xyz_123 \
				// RUN: %S/Inputs/hip_multiple_inputs/a.cu \
				tra: Nit: `abcd` could potentially match the value generated by hash. I'd change it to contain characters other than hex.
				yaxunl (author): Done.
				// RUN: %S/Inputs/hip_multiple_inputs/b.hip \
				// RUN: 2>&1 | FileCheck -check-prefixes=COMMON,FIXED %s

				// Check fixed CUID override -fuse-cuid.

				// RUN: %clang -### -x hip \
				// RUN: -target x86_64-unknown-linux-gnu \
				// RUN: --offload-arch=gfx900 \
				// RUN: --offload-arch=gfx906 \
				// RUN: -c -nogpulib -fuse-cuid=random -cuid=xyz_123 \
				// RUN: %S/Inputs/hip_multiple_inputs/a.cu \
				// RUN: %S/Inputs/hip_multiple_inputs/b.hip \
				// RUN: 2>&1 | FileCheck -check-prefixes=COMMON,FIXED %s

				// Check hash CUID generator.

				// RUN: %clang -### -x hip \
				// RUN: -target x86_64-unknown-linux-gnu \
				// RUN: --offload-arch=gfx900 \
				// RUN: --offload-arch=gfx906 \
				// RUN: -c -nogpulib -fuse-cuid=hash \
				// RUN: %S/Inputs/hip_multiple_inputs/a.cu \
				// RUN: %S/Inputs/hip_multiple_inputs/b.hip \
				// RUN: 2>&1 | FileCheck -check-prefixes=COMMON,HEX %s

				// COMMON: "{{.*}}clang{{.*}}" "-cc1" "-triple" "amdgcn-amd-amdhsa"
				// COMMON-SAME: "-target-cpu" "gfx900"
				// HEX-SAME: "-cuid=[[CUID:[0-9a-f]+]]"
				// FIXED-SAME: "-cuid=[[CUID:xyz_123]]"
				// COMMON-SAME: "{{.*}}a.cu"

				// COMMON: "{{.*}}clang{{.*}}" "-cc1" "-triple" "amdgcn-amd-amdhsa"
				// COMMON-SAME: "-target-cpu" "gfx906"
				// COMMON-SAME: "-cuid=[[CUID]]"
				// COMMON-SAME: "{{.*}}a.cu"

				// COMMON: "{{.*}}clang{{.*}}" "-cc1" "-triple" "x86_64-unknown-linux-gnu"
				// COMMON-SAME: "-cuid=[[CUID]]"
				// COMMON-SAME: "{{.*}}a.cu"

				// COMMON: "{{.*}}clang{{.*}}" "-cc1" "-triple" "amdgcn-amd-amdhsa"
				// COMMON-SAME: "-target-cpu" "gfx900"
				// HEX-NOT: "-cuid=[[CUID]]"
				// HEX-SAME: "-cuid=[[CUID2:[0-9a-f]+]]"
				// FIXED-SAME: "-cuid=[[CUID2:xyz_123]]"
				// COMMON-SAME: "{{.*}}b.hip"

				// COMMON: "{{.*}}clang{{.*}}" "-cc1" "-triple" "amdgcn-amd-amdhsa"
				// COMMON-SAME: "-target-cpu" "gfx906"
				// HEX-NOT: "-cuid=[[CUID]]"
				// COMMON-SAME: "-cuid=[[CUID2]]"
				// COMMON-SAME: "{{.*}}b.hip"

				// COMMON: "{{.*}}clang{{.*}}" "-cc1" "-triple" "x86_64-unknown-linux-gnu"
				// HEX-NOT: "-cuid=[[CUID]]"
				// COMMON-SAME: "-cuid=[[CUID2]]"
				// COMMON-SAME: "{{.*}}b.hip"

				// Check CUID generated by hash.
				// The same CUID is generated for the same file with the same options.

				// RUN: rm -rf %t.out

				// RUN: %clang -### -x hip -target x86_64-unknown-linux-gnu \
				// RUN: --offload-arch=gfx906 -c -nogpulib -fuse-cuid=hash \
				// RUN: %S/Inputs/hip_multiple_inputs/a.cu >%t.out 2>&1

				// RUN: %clang -### -x hip -target x86_64-unknown-linux-gnu \
				// RUN: --offload-arch=gfx906 -c -nogpulib -fuse-cuid=hash \
				// RUN: %S/Inputs/hip_multiple_inputs/a.cu >>%t.out 2>&1

				// RUN: FileCheck %s -check-prefixes=HASH -input-file %t.out

				// HASH: "{{.*}}clang{{.*}}" {{.*}} "-target-cpu" "gfx906" {{.*}}"-cuid=[[CUID:[0-9a-f]+]]"
				// HASH: "{{.*}}clang{{.*}}" {{.*}} "-target-cpu" "gfx906" {{.*}}"-cuid=[[CUID]]"


				// Check CUID generated by hash.
				// Different CUID's are generated for the same file with different options.

				// RUN: rm -rf %t.out

				// RUN: %clang -### -x hip -target x86_64-unknown-linux-gnu -DX=1 \
				// RUN: --offload-arch=gfx906 -c -nogpulib -fuse-cuid=hash \
				// RUN: %S/Inputs/hip_multiple_inputs/a.cu >%t.out 2>&1

				// RUN: %clang -### -x hip -target x86_64-unknown-linux-gnu -DX=2 \
				// RUN: --offload-arch=gfx906 -c -nogpulib -fuse-cuid=hash \
				// RUN: %S/Inputs/../Inputs/hip_multiple_inputs/a.cu >>%t.out 2>&1

				// RUN: FileCheck %s -check-prefixes=HASH2 -input-file %t.out

				// HASH2: "{{.*}}clang{{.*}}" {{.*}} "-target-cpu" "gfx906" {{.*}}"-cuid=[[CUID:[0-9a-f]+]]"
				// HASH2-NOT: "{{.*}}clang{{.*}}" {{.*}} "-target-cpu" "gfx906" {{.*}}"-cuid=[[CUID]]"
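
The review nit on this test (a fixed CUID such as `abcd` could be mistaken for a generated hash) comes down to whether the fixed value matches the `[0-9a-f]+` pattern used by the HEX checks. A small sketch, assuming a hypothetical helper `looksLikeHexCUID`, makes the distinction concrete:

```cpp
#include <regex>
#include <string>

// True when the whole ID consists of lowercase hex digits, i.e. when it
// would also satisfy the [0-9a-f]+ pattern in the HEX check lines.
bool looksLikeHexCUID(const std::string &Id) {
  static const std::regex Hex("[0-9a-f]+");
  return std::regex_match(Id, Hex);
}
```

`abcd` satisfies the pattern, so it cannot distinguish a fixed ID from a hashed one, whereas `xyz_123` contains non-hex characters and does not match.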

clang/test/Frontend/hip-cuid.hip

This file was added.

				// RUN: not %clang_cc1 -cuid=abc-123 -offload-arch=gfx906 %s 2>&1 \
				// RUN: | FileCheck --check-prefix=INVALID %s

				// INVALID: invalid value 'abc-123' in '-cuid=abc-123' (alphanumeric characters and underscore only)

				// RUN: %clang_cc1 -cuid=abc_123 -offload-arch=gfx906 %s

clang/test/SemaCUDA/static-device-var.cu

This file was added.

				// REQUIRES: x86-registered-target
				// REQUIRES: amdgpu-registered-target

				// RUN: %clang_cc1 -triple nvptx -fcuda-is-device \
				// RUN: -emit-llvm -o - %s -fsyntax-only -verify

				// RUN: %clang_cc1 -triple x86_64-gnu-linux \
				// RUN: -emit-llvm -o - %s -fsyntax-only -verify

				#include "Inputs/cuda.h"

				__device__ void f1() {
				const static int b = 123;
				static int a;
				// expected-error@-1 {{within a __device__ function, only __shared__ variables or const variables without device memory qualifier may be marked 'static'}}
				}

				__global__ void k1() {
				const static int b = 123;
				static int a;
				// expected-error@-1 {{within a __global__ function, only __shared__ variables or const variables without device memory qualifier may be marked 'static'}}
				}

				static __device__ int x;
				static __constant__ int y;

				__global__ void kernel(int *a) {
				a[0] = x;
				a[1] = y;
				}

				int* getDeviceSymbol(int *x);

				void foo() {
				getDeviceSymbol(&x);
				getDeviceSymbol(&y);
				}

Diff 277272