This is an archive of the discontinued LLVM Phabricator instance.

[OPENMP] Make declare target static global externally visible
Needs ReviewPublic

Authored by ssquare08 on Jul 13 2022, 2:04 PM.

Details

Summary

This is to support cases where static globals are marked declare
target. By default these file static globals are not externally
visible but in order for OpenMP runtime to access these symbols,
this changes here makes them externally visisble unless they
have "hidden" visibility attribute.
Making them externally visible, however, leads to symbol conflict
when two files have variables with the same name. Thus, these
symbols needs to be mangled on the device side of the compilation.
In order to do so, the host side mangles the symbol names and
passes that metadata information to the device side. It also uses
these mangled names if offload entry table so that the OPenMP
runtime can find these symbols during registration.

Diff Detail

Event Timeline

ssquare08 created this revision.Jul 13 2022, 2:04 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 13 2022, 2:04 PM
ssquare08 requested review of this revision.Jul 13 2022, 2:04 PM
Herald added a project: Restricted Project. · View Herald Transcript

Thanks for the patch. I still think this is a silly feature to support, but users will probably expect it. See comments.

clang/lib/CodeGen/CGOpenMPRuntime.cpp
10795–10820

It might be easier to just mangle the original definition, that would reduce a lot of churn here adding origName everywhere. Any reason that's not desirable?

10802

CGM.printPostfixForExternalizedDecl should ideally give the same output on the host and device, but it's somewhat limited since it just checks the file ID and environment, which is technically possible to change. The kernels use getTargetEntryUniqueInfo, which might make sense to re-use for this case.

clang/lib/CodeGen/TargetInfo.cpp
7295

Formatting looks weird, did you do git clang-format HEAD~1?

9431

Just spitballing, is it possible to do this when we make the global instead?

clang/test/OpenMP/declare_target_visibility_codegen.cpp
11–13

If there are no updates between the host and device we can keep these static without emitting an offloading entry.

ssquare08 added inline comments.Jul 13 2022, 5:23 PM
clang/lib/CodeGen/CGOpenMPRuntime.cpp
10795–10820

You are right, it'd have made the code cleaner but we didn't want to mangle the host side if we could avoid it.

10802

That was the point I had raised in one of the Clang meeting but someone had mentioned that kernels names are created on the host side and the device side reads the information though the Host IR. Seems like kernels name could also run into mismatch issue for some corner cases then?

clang/lib/CodeGen/TargetInfo.cpp
7295

Looks like I didn't run git clang-format correctly, I'll fix it. Thanks

9431

This is something I was wondering as well. In CodeGenModule::GetOrCreateLLVMGlobal, when it creates a new global variable, it always uses the llvm::GlobalValue::ExternalLinkage. Seems like this changes somewhere later to internal for static globals. Do you know where that would be?

clang/test/OpenMP/declare_target_visibility_codegen.cpp
11–13

That 's a good point. I'll fix that.

jhuber6 added inline comments.Jul 13 2022, 5:53 PM
clang/lib/CodeGen/CGOpenMPRuntime.cpp
3285–3289

This comment needs to be adjusted accordingly

10795–10820

Others may want to comment, but personally I'm not too worried about mangling a name that wouldn't have been placed in the symbol table to begin with and it would make the code a lot cleaner.

10802

So the problem here is that the host and device need to agree on what the name is so that we can register the correct variable. The CUDA / HIP toolchains solved this by either performing a mangling that is stable between the host and device, or by having the driver generate a random hash that gets used on both. OpenMP instead solves this by writing the variable to the host IR first and then reading it on the device to see what the name needs to be. Since we have that dependency we can use any mangling we want, though it's still best for it to be somewhat stable unless we want tests to change every time we run them. It probably won't hurt anything to just use printPostfixForExternalizedDecl but it's not as strong of a mangling as what we can do with the OpenMP method since it needs to be common between the host and device.

clang/lib/CodeGen/TargetInfo.cpp
9431

I'm not exactly sure, I remember deleting some code in D117806 that did something like that, albeit incorrectly. But I'm not sure if you'd have the necessary information to check whether or not there are updates attached to it. We don't want to externalize things if we don't need to, otherwise we'd get a lot of our device runtime variables with external visibility that now can't be optimized out.

jdoerfert added inline comments.Jul 14 2022, 7:29 AM
clang/test/OpenMP/target_update_messages.cpp
17

There is no test to show you can actually write the update now, is there?

jhuber6 added inline comments.Jul 14 2022, 7:33 AM
clang/test/OpenMP/target_update_messages.cpp
17

We should probably take the deleted code above and put it in an OpenMP runtime test to make sure it actually works now.

ssquare08 updated this revision to Diff 447901.Jul 26 2022, 5:54 PM

Adding a test and fixing

This adds a new runtime test and also address some comments.

Herald added a project: Restricted Project. · View Herald TranscriptJul 26 2022, 5:54 PM
ssquare08 added inline comments.Jul 26 2022, 6:21 PM
clang/test/OpenMP/declare_target_only_one_side_compilation.cpp
61

I wasn't expecting this to change. For some reason G2 gets the OMPDeclareTargetDeclAttr::DT_Any attribute instead of OMPDeclareTargetDeclAttr::DT_NoHost and because of that the visibility changes. @jdoerfert, is OMPDeclareTargetDeclAttr::DT_Any attribute expected here?

clang/test/OpenMP/declare_target_visibility_codegen.cpp
11–13

I thought about this more and I think the behavior for these declare target static globals should be the same as the other declare target. Checking for update is not enough because users could also map these variables. For update, it could be mapped with a pointer or the users could pass address of these variables to an external function. Please let me know what you think of these cases below:

#pragma omp declare target
static int x[10];
#pragma omp end declare target

//case 1
#pragma omp target update to(x)

//case 2
int* y = &x[2];
#pragma omp target update to(y[0])

//case 3
#pragma omp target map(always to:x)
{
 x[0]= 111;
}

//case 4
#pragma omp target
{ 
  foo(&x[3]);
}
clang/test/OpenMP/target_update_messages.cpp
17

I have now added a test as suggested.

ssquare08 added inline comments.Jul 26 2022, 6:24 PM
clang/lib/CodeGen/CGOpenMPRuntime.cpp
10802

CGM.printPostfixForExternalizedDecl should ideally give the same output on the host and device, but it's somewhat limited since it just checks the file ID and environment, which is technically possible to change. The kernels use getTargetEntryUniqueInfo, which might make sense to re-use for this case.

This has been changed as suggested.

I still think we shouldn't bother making all the noise containing the original name. Just mangle it and treat it like every other declare target variable without introducing any extra complexity. These symbols never should've been emitted in the first place so I'm not concerned if someone cracks open a binary and sees some ugly names. CUDA and HIP just mangle the declaration directly as far as I'm aware.

clang/lib/CodeGen/TargetInfo.cpp
9431

Were you able to find a place for this when we generate the variable? You should be able to do something similar to the patch above if it's a declare target static to force it to have external visibility, but as mentioned before I would prefer we only do this if necessary which might take some extra analysis.

clang/test/OpenMP/declare_target_visibility_codegen.cpp
11–13

We should still be able to do this if there are either no updates at all in the module, or if the declare type is nohost. Doing anything more complicated would require some optimizations between the host and device we can't do yet. I'm making this point because making these statics external is a performance regression so we should only do it when needed. To that end we may even want a flag that entirely disables this feature.

I still think we shouldn't bother making all the noise containing the original name. Just mangle it and treat it like every other declare target variable without introducing any extra complexity. These symbols never should've been emitted in the first place so I'm not concerned if someone cracks open a binary and sees some ugly names. CUDA and HIP just mangle the declaration directly as far as I'm aware.

If that's the preference I can make changes as suggested. You mentioned CUDA and HIP mangle the declaration directly. To me it looks like they mangle it on host and device separately. Is that not correct? If so, can you point me to the source you are referring to?

ssquare08 added inline comments.Aug 8 2022, 3:15 PM
clang/lib/CodeGen/TargetInfo.cpp
9431

If you are asking about the GV, it is created in 'CodeGenModule::GetOrCreateLLVMGlobal' with external linkage always.

auto *GV = new llvm::GlobalVariable(
    getModule(), Ty, false, llvm::GlobalValue::ExternalLinkage, nullptr,
    MangledName, nullptr, llvm::GlobalVariable::NotThreadLocal,
    getContext().getTargetAddressSpace(DAddrSpace));

The linkage, however, changes in 'CodeGenModule::EmitGlobalVarDefinition' based on the information VarDecl

llvm::GlobalValue::LinkageTypes Linkage =
    getLLVMLinkageVarDefinition(D, GV->isConstant());

Maybe you are suggesting changing the linkage information in 'VarDecl' itself?

clang/test/OpenMP/declare_target_visibility_codegen.cpp
11–13

I'll add a check to see if there are any updates in the module.

If that's the preference I can make changes as suggested. You mentioned CUDA and HIP mangle the declaration directly. To me it looks like they mangle it on host and device separately. Is that not correct? If so, can you point me to the source you are referring to?

You're right, they mangle them separately like in https://godbolt.org/z/r6hG4brqx, this is most likely because they already had separate "device side" names. For OpenMP we currently just use the same name for the variable on the host and device side like in https://godbolt.org/z/eaGo9qsW3 where we just use the same kernel names. Thinking again, I'm still wondering if there's any utility in keeping the names separate. Correct me if I'm wrong, but the host-side variable should be able to remain internal so this mangled device name shouldn't show up in the final executable. In that case the only benefit is slightly nicer IR, which I'm not super concerned with.

clang/lib/CodeGen/TargetInfo.cpp
9431

Yes, the patch I linked previously did something like that where it set the LinkageValue based on some information. Although I'm not sure if it would be excessively difficult to try to prune definitions that don't need to be externalized. I haven't looked too deep into this, but I believe CUDA does this inside of adjustGVALinkageForAttributes, there we also check some variable called CUDADeviceVarODRUsedByHost that I'm assuming tracks if we need to bother externalizing this.

If that's the preference I can make changes as suggested. You mentioned CUDA and HIP mangle the declaration directly. To me it looks like they mangle it on host and device separately. Is that not correct? If so, can you point me to the source you are referring to?

You're right, they mangle them separately like in https://godbolt.org/z/r6hG4brqx, this is most likely because they already had separate "device side" names. For OpenMP we currently just use the same name for the variable on the host and device side like in https://godbolt.org/z/eaGo9qsW3 where we just use the same kernel names. Thinking again, I'm still wondering if there's any utility in keeping the names separate. Correct me if I'm wrong, but the host-side variable should be able to remain internal so this mangled device name shouldn't show up in the final executable. In that case the only benefit is slightly nicer IR, which I'm not super concerned with.

Yes, the host-side variable should be able to remain internal.

If that's the preference I can make changes as suggested. You mentioned CUDA and HIP mangle the declaration directly. To me it looks like they mangle it on host and device separately. Is that not correct? If so, can you point me to the source you are referring to?

You're right, they mangle them separately like in https://godbolt.org/z/r6hG4brqx, this is most likely because they already had separate "device side" names. For OpenMP we currently just use the same name for the variable on the host and device side like in https://godbolt.org/z/eaGo9qsW3 where we just use the same kernel names. Thinking again, I'm still wondering if there's any utility in keeping the names separate. Correct me if I'm wrong, but the host-side variable should be able to remain internal so this mangled device name shouldn't show up in the final executable. In that case the only benefit is slightly nicer IR, which I'm not super concerned with.

Yes, the host-side variable should be able to remain internal.

The OpenMP kernel names you mentioned are also generated separately by the host and the device. Would you be okay generating declare target mangle names separately by host and device using the same utility function getTargetEntryUniqueInfo?

If you still think it should only be generated only once by the host, what is a good way of doing this since we can't modify the name in VarDecl?

clang/lib/CodeGen/TargetInfo.cpp
9431

The exter

ssquare08 marked an inline comment as not done.Aug 11 2022, 1:03 PM
ssquare08 added inline comments.
clang/lib/CodeGen/TargetInfo.cpp
9431

Thanks for the information, I'll take a look

The OpenMP kernel names you mentioned are also generated separately by the host and the device. Would you be okay generating declare target mangle names separately by host and device using the same utility function getTargetEntryUniqueInfo?

If you still think it should only be generated only once by the host, what is a good way of doing this since we can't modify the name in VarDecl?

I thought we already emitted the mangled name at least on the device side. I was suggesting that we just use the same name on the host so we don't need to worry about a host-side and device-side name difference and we can get rid of the extra argument to all the offload entry functions.

The OpenMP kernel names you mentioned are also generated separately by the host and the device. Would you be okay generating declare target mangle names separately by host and device using the same utility function getTargetEntryUniqueInfo?

If you still think it should only be generated only once by the host, what is a good way of doing this since we can't modify the name in VarDecl?

I thought we already emitted the mangled name at least on the device side. I was suggesting that we just use the same name on the host so we don't need to worry about a host-side and device-side name difference and we can get rid of the extra argument to all the offload entry functions.

Yes, that is correct. My question is, is it okay to mangle the host and the device side independently using getTargetEntryUniqueInfo? The reason I am asking is because you had expressed some concerns regarding mangling them separately. Or, maybe there is a way to mangle the original name before the host and device compilation split?

Yes, that is correct. My question is, is it okay to mangle the host and the device side independently using getTargetEntryUniqueInfo? The reason I am asking is because you had expressed some concerns regarding mangling them separately. Or, maybe there is a way to mangle the original name before the host and device compilation split?

You'll need to mangle them separately for the device and host, the difference is that we want to use a function that shares the input to create the mangled name. As far as I know, this is done using a metadata node in the host bitcode. So as long as we share the same method that kernels use it should be fine.