This is to support cases where static globals are marked declare
target. By default these file static globals are not externally
visible but in order for OpenMP runtime to access these symbols,
this changes here makes them externally visisble unless they
have "hidden" visibility attribute.
Making them externally visible, however, leads to symbol conflict
when two files have variables with the same name. Thus, these
symbols needs to be mangled on the device side of the compilation.
In order to do so, the host side mangles the symbol names and
passes that metadata information to the device side. It also uses
these mangled names if offload entry table so that the OPenMP
runtime can find these symbols during registration.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Thanks for the patch. I still think this is a silly feature to support, but users will probably expect it. See comments.
clang/lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
10795–10820 | It might be easier to just mangle the original definition, that would reduce a lot of churn here adding origName everywhere. Any reason that's not desirable? | |
10802 | CGM.printPostfixForExternalizedDecl should ideally give the same output on the host and device, but it's somewhat limited since it just checks the file ID and environment, which is technically possible to change. The kernels use getTargetEntryUniqueInfo, which might make sense to re-use for this case. | |
clang/lib/CodeGen/TargetInfo.cpp | ||
7295 | Formatting looks weird, did you do git clang-format HEAD~1? | |
9431 | Just spitballing, is it possible to do this when we make the global instead? | |
clang/test/OpenMP/declare_target_visibility_codegen.cpp | ||
11–13 | If there are no updates between the host and device we can keep these static without emitting an offloading entry. |
clang/lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
10795–10820 | You are right, it'd have made the code cleaner but we didn't want to mangle the host side if we could avoid it. | |
10802 | That was the point I had raised in one of the Clang meeting but someone had mentioned that kernels names are created on the host side and the device side reads the information though the Host IR. Seems like kernels name could also run into mismatch issue for some corner cases then? | |
clang/lib/CodeGen/TargetInfo.cpp | ||
7295 | Looks like I didn't run git clang-format correctly, I'll fix it. Thanks | |
9431 | This is something I was wondering as well. In CodeGenModule::GetOrCreateLLVMGlobal, when it creates a new global variable, it always uses the llvm::GlobalValue::ExternalLinkage. Seems like this changes somewhere later to internal for static globals. Do you know where that would be? | |
clang/test/OpenMP/declare_target_visibility_codegen.cpp | ||
11–13 | That 's a good point. I'll fix that. |
clang/lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
3285–3289 | This comment needs to be adjusted accordingly | |
10795–10820 | Others may want to comment, but personally I'm not too worried about mangling a name that wouldn't have been placed in the symbol table to begin with and it would make the code a lot cleaner. | |
10802 | So the problem here is that the host and device need to agree on what the name is so that we can register the correct variable. The CUDA / HIP toolchains solved this by either performing a mangling that is stable between the host and device, or by having the driver generate a random hash that gets used on both. OpenMP instead solves this by writing the variable to the host IR first and then reading it on the device to see what the name needs to be. Since we have that dependency we can use any mangling we want, though it's still best for it to be somewhat stable unless we want tests to change every time we run them. It probably won't hurt anything to just use printPostfixForExternalizedDecl but it's not as strong of a mangling as what we can do with the OpenMP method since it needs to be common between the host and device. | |
clang/lib/CodeGen/TargetInfo.cpp | ||
9431 | I'm not exactly sure, I remember deleting some code in D117806 that did something like that, albeit incorrectly. But I'm not sure if you'd have the necessary information to check whether or not there are updates attached to it. We don't want to externalize things if we don't need to, otherwise we'd get a lot of our device runtime variables with external visibility that now can't be optimized out. |
clang/test/OpenMP/target_update_messages.cpp | ||
---|---|---|
17 | There is no test to show you can actually write the update now, is there? |
clang/test/OpenMP/target_update_messages.cpp | ||
---|---|---|
17 | We should probably take the deleted code above and put it in an OpenMP runtime test to make sure it actually works now. |
Adding a test and fixing
This adds a new runtime test and also address some comments.
clang/test/OpenMP/declare_target_only_one_side_compilation.cpp | ||
---|---|---|
61 | I wasn't expecting this to change. For some reason G2 gets the OMPDeclareTargetDeclAttr::DT_Any attribute instead of OMPDeclareTargetDeclAttr::DT_NoHost and because of that the visibility changes. @jdoerfert, is OMPDeclareTargetDeclAttr::DT_Any attribute expected here? | |
clang/test/OpenMP/declare_target_visibility_codegen.cpp | ||
11–13 | I thought about this more and I think the behavior for these declare target static globals should be the same as the other declare target. Checking for update is not enough because users could also map these variables. For update, it could be mapped with a pointer or the users could pass address of these variables to an external function. Please let me know what you think of these cases below: #pragma omp declare target static int x[10]; #pragma omp end declare target //case 1 #pragma omp target update to(x) //case 2 int* y = &x[2]; #pragma omp target update to(y[0]) //case 3 #pragma omp target map(always to:x) { x[0]= 111; } //case 4 #pragma omp target { foo(&x[3]); } | |
clang/test/OpenMP/target_update_messages.cpp | ||
17 | I have now added a test as suggested. |
clang/lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
10802 |
This has been changed as suggested. |
I still think we shouldn't bother making all the noise containing the original name. Just mangle it and treat it like every other declare target variable without introducing any extra complexity. These symbols never should've been emitted in the first place so I'm not concerned if someone cracks open a binary and sees some ugly names. CUDA and HIP just mangle the declaration directly as far as I'm aware.
clang/lib/CodeGen/TargetInfo.cpp | ||
---|---|---|
9431 | Were you able to find a place for this when we generate the variable? You should be able to do something similar to the patch above if it's a declare target static to force it to have external visibility, but as mentioned before I would prefer we only do this if necessary which might take some extra analysis. | |
clang/test/OpenMP/declare_target_visibility_codegen.cpp | ||
11–13 | We should still be able to do this if there are either no updates at all in the module, or if the declare type is nohost. Doing anything more complicated would require some optimizations between the host and device we can't do yet. I'm making this point because making these statics external is a performance regression so we should only do it when needed. To that end we may even want a flag that entirely disables this feature. |
If that's the preference I can make changes as suggested. You mentioned CUDA and HIP mangle the declaration directly. To me it looks like they mangle it on host and device separately. Is that not correct? If so, can you point me to the source you are referring to?
clang/lib/CodeGen/TargetInfo.cpp | ||
---|---|---|
9431 | If you are asking about the GV, it is created in 'CodeGenModule::GetOrCreateLLVMGlobal' with external linkage always. auto *GV = new llvm::GlobalVariable( getModule(), Ty, false, llvm::GlobalValue::ExternalLinkage, nullptr, MangledName, nullptr, llvm::GlobalVariable::NotThreadLocal, getContext().getTargetAddressSpace(DAddrSpace)); The linkage, however, changes in 'CodeGenModule::EmitGlobalVarDefinition' based on the information VarDecl llvm::GlobalValue::LinkageTypes Linkage = getLLVMLinkageVarDefinition(D, GV->isConstant()); Maybe you are suggesting changing the linkage information in 'VarDecl' itself? | |
clang/test/OpenMP/declare_target_visibility_codegen.cpp | ||
11–13 | I'll add a check to see if there are any updates in the module. |
You're right, they mangle them separately like in https://godbolt.org/z/r6hG4brqx, this is most likely because they already had separate "device side" names. For OpenMP we currently just use the same name for the variable on the host and device side like in https://godbolt.org/z/eaGo9qsW3 where we just use the same kernel names. Thinking again, I'm still wondering if there's any utility in keeping the names separate. Correct me if I'm wrong, but the host-side variable should be able to remain internal so this mangled device name shouldn't show up in the final executable. In that case the only benefit is slightly nicer IR, which I'm not super concerned with.
clang/lib/CodeGen/TargetInfo.cpp | ||
---|---|---|
9431 | Yes, the patch I linked previously did something like that where it set the LinkageValue based on some information. Although I'm not sure if it would be excessively difficult to try to prune definitions that don't need to be externalized. I haven't looked too deep into this, but I believe CUDA does this inside of adjustGVALinkageForAttributes, there we also check some variable called CUDADeviceVarODRUsedByHost that I'm assuming tracks if we need to bother externalizing this. |
The OpenMP kernel names you mentioned are also generated separately by the host and the device. Would you be okay generating declare target mangle names separately by host and device using the same utility function getTargetEntryUniqueInfo?
If you still think it should only be generated only once by the host, what is a good way of doing this since we can't modify the name in VarDecl?
clang/lib/CodeGen/TargetInfo.cpp | ||
---|---|---|
9431 | The exter |
clang/lib/CodeGen/TargetInfo.cpp | ||
---|---|---|
9431 | Thanks for the information, I'll take a look |
I thought we already emitted the mangled name at least on the device side. I was suggesting that we just use the same name on the host so we don't need to worry about a host-side and device-side name difference and we can get rid of the extra argument to all the offload entry functions.
Yes, that is correct. My question is, is it okay to mangle the host and the device side independently using getTargetEntryUniqueInfo? The reason I am asking is because you had expressed some concerns regarding mangling them separately. Or, maybe there is a way to mangle the original name before the host and device compilation split?
You'll need to mangle them separately for the device and host, the difference is that we want to use a function that shares the input to create the mangled name. As far as I know, this is done using a metadata node in the host bitcode. So as long as we share the same method that kernels use it should be fine.
This comment needs to be adjusted accordingly