This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP][AMDGCN] Generate global variables and attributes for AMDGCN
AbandonedPublic

Authored by saiislam on Aug 17 2020, 11:55 AM.

Details

Summary

Provide support for amdgcn specific global variables and attributes.
Generalize allocation of various common global variables and provide
their specialized implementations for nvptx and amdgcn.

Diff Detail

Event Timeline

saiislam created this revision.Aug 17 2020, 11:55 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 17 2020, 11:55 AM
saiislam requested review of this revision.Aug 17 2020, 11:55 AM
ABataev added inline comments.Aug 17 2020, 12:19 PM
clang/lib/CodeGen/CGOpenMPRuntime.h
502

Cab this type and corresponding functions be made AMDGCN-specific only?

clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp
119

Is this possible?

122

FlatAttrEmitted

132

CompileTimeThreadLimit

clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
1314–1315

Restore original formatting.

clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
249–277

Make them protected, not public if possible. Try the same for other new functions.

clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.h
41–66

No need to add virtual, override is enough

saiislam updated this revision to Diff 287513.Aug 24 2020, 3:31 PM
  1. Moved amdgcn specific functions to CGOpenMPAMDGCN.cpp
  2. Removed tautology condition
  3. Corrected case of local variables
  4. Restored original formatting
  5. Changed back declaration of emit kernel methods as private
  6. Added support of amdgcn specific PrePostActionTy implementation and its corresponding test cases
  7. Changed static line numbers in new test cases with regex
  8. Other small code corrections

Reformat the code

clang/lib/CodeGen/CGOpenMPRuntime.h
501

Remove unnecessary formatting changes.

2491–2495

Better to make it a protected member function if you really require it. Plus, this function is very small and, I think, you simply create your own copy in CGOpenMPRuntimeAMDGCN

2499

Same here, make it protected or just create a copy, if it is small.

clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.h
29–31

Add comments for all new members

77
  1. Do you really need to make this class public?
  2. final
saiislam updated this revision to Diff 288072.Aug 26 2020, 12:16 PM
  1. Reformarting
  2. Comments
  3. Reduced scope of specialized PrePostActionTy
saiislam marked an inline comment as done.Aug 26 2020, 12:27 PM
saiislam added inline comments.
clang/lib/CodeGen/CGOpenMPRuntime.h
2491–2495

Not making it protected because it is used by various static functions. And don't want to create an object pointer of subclass of CGOpenMPRuntime in CGOpenMPRuntime.

2499

It calls static functions which in turn call other static functions, so it won't make sense to create a copy of whole function chain in amdgcn.

ABataev added inline comments.Sep 15 2020, 7:44 AM
clang/lib/CodeGen/CGOpenMPRuntime.h
501

Still not removed

688

Restore original formatting

2494–2499

Better to encapsulate these functions into a new utility class and make them public static.

clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.h
30–63

Do you really need to expose all these new members as public?

31

Runtime does not support nested parallelism on GPU. Do you really need it?

93

It does not help to understand the functionality

clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
1573

It leads to a mem leak.

clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
37
  1. Make it private or protected

2.Add default initializer

221–253

Are all these required to be public?

402–404

Make it private or protected

saiislam updated this revision to Diff 298377.Oct 15 2020, 7:29 AM
  1. Removed unnecessary formatting of untouched code.
  2. Encapsulated addFieldToRecordDecl and createGlobalStruct methods in a class and made them static (triggered change at all calling sites).
  3. Marked most of the member methods of CGOpenMPRuntimeAMDGCN as private (forgot to do same change in nvptx)
  4. Fixed the memory leak
  5. Marked appropriate member variables as protected in CGOpenMPRuntimeGPU
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp
178

The nvptx emitSPMDKernelWrapper does nothing and the amdgcn one appends some metadata.

How about 'nvptx::generateMetadata(...)' that does nothing and 'amdgcn::generateMetadata(...)` that does this stuff, called from the end of emitSPMDKernel?

200

This metadata generation could be split out from the other changes.

clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.h
43

I'm not convinced by this abstraction. It looks like amdgcn and nvptx want almost exactly the same variable in each case. The difference appears to be that nvptx uses internal linkage and amdgcn uses weak + externally initialized, in which case we're better off with

bool nvptx::needsExternalInitialization() {return false;}
bool amdgpu::needsExternalInitialization() {return true;}

Or, if the inline ternary is unappealing, amdgcn::NewGlobalVariable(...) that passes the arguments to llvm::GlobalVariable while setting the two fields that differ between the two.

clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
204

Please put this back to the previous location so we can see whether it changed in the diff

saiislam marked 3 inline comments as done.Oct 15 2020, 12:13 PM
saiislam added inline comments.
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp
178

It will be then difficult to track what all things are being done differently in the two. So, the common code has been generalized and (no change in nvptx + some changes in amdgcn) has been used as specialization.

clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.h
43

I understand what you are suggesting. But, there are multiple such variables where linkage between nvptx and amdgcn are different. Also current style gives flexibility to a future implementation to define these variables in their own way.
What do you think?

clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
204

This movement changes them from private to protected.
I could have just added access specifiers and not move the definitions. It would have simplified the review, but it would have decreased the readability for future.

221–253

Yes, they are being called from outside class.

clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp
59

Perhaps (typed into browser):

llvm::GlobalVariable *CGOpenMPRuntimeNVPTX::createGlobal( CodeGenModule &CGM, llvm::ArrayType *Ty, StringRef Name) {
  return new llvm::GlobalVariable(
      CGM.getModule(), Ty, /*isConstant=*/false,
      llvm::GlobalVariable::CommonLinkage, llvm::Constant::getNullValue(Ty),
      Name,
      /*InsertBefore=*/nullptr, llvm::GlobalVariable::NotThreadLocal,
      CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared),
/*isExternallyInitialized*/ true);
}

llvm::GlobalVariable *CGOpenMPRuntimeAMDGCN::createGlobal( CodeGenModule &CGM, llvm::ArrayType *Ty, StringRef Name) {
  return new llvm::GlobalVariable(
      CGM.getModule(), Ty, /*isConstant=*/false,
      llvm::GlobalVariable::WeakAnyLinkage, llvm::Constant::getNullValue(Ty),
      Name,
      /*InsertBefore=*/nullptr, llvm::GlobalVariable::NotThreadLocal,
      CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared),
/*isExternallyInitialized*/ false);
}
saiislam updated this revision to Diff 307108.Nov 23 2020, 9:54 AM
  1. Simplifies overall patch after D90248.
  2. Removes MaxParallelLevel and thus target specific PrePostActionTy.
  3. Removes ExternallyInitialized qualifier from shared variables for AMDGCN.
clang/lib/CodeGen/CGOpenMPRuntime.cpp
1344

This appears to be the same as the free function we had before, except now all the call sites are prefixed CodegenUtil. Is there a functional change I'm missing?

The rest of this patch would be easier to read with this part split off.

clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp
77

This is a very verbose way to say that amdgcn calls emitmetatdata at the end of emitkernel and nvptx doesn't. Suggest unconditionally calling emitmetatdata, and having emitmetatdata be a no-op for nvptx.

89

I think there's a credible chance this is useful to nvptx, so doesn't have to be amdgcn specific

108

I think this is about computing a maximum workgroup size which the runtime uses to limit the number of threads it launches. If so, this is probably useful for nvptx and amdgcn. I'm having trouble working out what the conditions are though. Maybe it's based on an openmp clause?

150

I think I remember seeing a diff that makes this attribute unconditionally emitted by some other part of the toolchain. If so, it may no longer be required

169

HostServices is unused. Mode is redundant with exec_mode. wg_size is redundant with the other wg_size symbol added above. This kern_desc object should be deleted, not upstreamed.

saiislam marked 3 inline comments as done.Nov 26 2020, 4:27 AM
saiislam added inline comments.
clang/lib/CodeGen/CGOpenMPRuntime.cpp
1344

addFieldToRecordDecl and createGlobalStruct methods had file static scope. To make them callable from other files, from amdgcn specific file in this case, they were put in this utility class.

D92167 puts this change into a separate patch. Will update this patch once D92167 gets accepted.

clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp
77

Won't the no-op approach be less extensible? Current way, though verbose, leaves scope for attaching prefix/suffix code as and when required around emitkernel. While in case of no-op, every implementing arch might have to use the exact same pattern of methods with and without code.

89

You are right, it can be useful for nvptx as well. May be we can club its generalization with the nvptx's use-case when it arrives in the future?

108

Yes, the if block in 111-147 corresponds to "number of threads" for thread_limit and num_threads clauses in teams and parallel directives.

169

Ok, thanks. Will update in next revision.

JonChesterfield requested changes to this revision.Nov 26 2020, 8:42 AM

I don't believe the contents of this patch is necessary for codegen on amdgpu. One of the internal/weak distinctions works around a bug in the gfx800 toolchain, but we should root cause and fix that bug instead. The kern_desc object is redundant. I think amdgpu-flat-work-group-size is already emitted, but if not, we might want that.

The wg_size code is interesting but architecture independent, and it's probably more user friendly for nvptx and amdgcn to have the same handling of wg_size constraints.

This revision now requires changes to proceed.Nov 26 2020, 8:42 AM
saiislam abandoned this revision.Sep 21 2021, 7:24 AM
saiislam marked 3 inline comments as done.