Provide support for amdgcn specific global variables and attributes.
Generalize allocation of various common global variables and provide
their specialized implementations for nvptx and amdgcn.
Details
- Reviewers
ABataev jdoerfert JonChesterfield
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
clang/lib/CodeGen/CGOpenMPRuntime.h | ||
---|---|---|
502 | Cab this type and corresponding functions be made AMDGCN-specific only? | |
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp | ||
119 | Is this possible? | |
122 | FlatAttrEmitted | |
132 | CompileTimeThreadLimit | |
clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp | ||
1314–1315 | Restore original formatting. | |
clang/lib/CodeGen/CGOpenMPRuntimeGPU.h | ||
249–277 | Make them protected, not public if possible. Try the same for other new functions. | |
clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.h | ||
41–66 | No need to add virtual, override is enough |
- Moved amdgcn specific functions to CGOpenMPAMDGCN.cpp
- Removed tautology condition
- Corrected case of local variables
- Restored original formatting
- Changed back declaration of emit kernel methods as private
- Added support of amdgcn specific PrePostActionTy implementation and its corresponding test cases
- Changed static line numbers in new test cases with regex
- Other small code corrections
Reformat the code
clang/lib/CodeGen/CGOpenMPRuntime.h | ||
---|---|---|
501 | Remove unnecessary formatting changes. | |
2491–2495 | Better to make it a protected member function if you really require it. Plus, this function is very small and, I think, you simply create your own copy in CGOpenMPRuntimeAMDGCN | |
2499 | Same here, make it protected or just create a copy, if it is small. | |
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.h | ||
29–31 | Add comments for all new members | |
77 |
|
clang/lib/CodeGen/CGOpenMPRuntime.h | ||
---|---|---|
2491–2495 | Not making it protected because it is used by various static functions. And don't want to create an object pointer of subclass of CGOpenMPRuntime in CGOpenMPRuntime. | |
2499 | It calls static functions which in turn call other static functions, so it won't make sense to create a copy of whole function chain in amdgcn. |
clang/lib/CodeGen/CGOpenMPRuntime.h | ||
---|---|---|
501 | Still not removed | |
688 | Restore original formatting | |
2494–2499 | Better to encapsulate these functions into a new utility class and make them public static. | |
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.h | ||
30–63 | Do you really need to expose all these new members as public? | |
31 | Runtime does not support nested parallelism on GPU. Do you really need it? | |
93 | It does not help to understand the functionality | |
clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp | ||
1573 | It leads to a mem leak. | |
clang/lib/CodeGen/CGOpenMPRuntimeGPU.h | ||
37 |
2.Add default initializer | |
221–253 | Are all these required to be public? | |
402–404 | Make it private or protected |
- Removed unnecessary formatting of untouched code.
- Encapsulated addFieldToRecordDecl and createGlobalStruct methods in a class and made them static (triggered change at all calling sites).
- Marked most of the member methods of CGOpenMPRuntimeAMDGCN as private (forgot to do same change in nvptx)
- Fixed the memory leak
- Marked appropriate member variables as protected in CGOpenMPRuntimeGPU
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp | ||
---|---|---|
178 | The nvptx emitSPMDKernelWrapper does nothing and the amdgcn one appends some metadata. How about 'nvptx::generateMetadata(...)' that does nothing and 'amdgcn::generateMetadata(...)` that does this stuff, called from the end of emitSPMDKernel? | |
200 | This metadata generation could be split out from the other changes. | |
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.h | ||
43 | I'm not convinced by this abstraction. It looks like amdgcn and nvptx want almost exactly the same variable in each case. The difference appears to be that nvptx uses internal linkage and amdgcn uses weak + externally initialized, in which case we're better off with bool nvptx::needsExternalInitialization() {return false;} Or, if the inline ternary is unappealing, amdgcn::NewGlobalVariable(...) that passes the arguments to llvm::GlobalVariable while setting the two fields that differ between the two. | |
clang/lib/CodeGen/CGOpenMPRuntimeGPU.h | ||
204 | Please put this back to the previous location so we can see whether it changed in the diff |
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp | ||
---|---|---|
178 | It will be then difficult to track what all things are being done differently in the two. So, the common code has been generalized and (no change in nvptx + some changes in amdgcn) has been used as specialization. | |
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.h | ||
43 | I understand what you are suggesting. But, there are multiple such variables where linkage between nvptx and amdgcn are different. Also current style gives flexibility to a future implementation to define these variables in their own way. | |
clang/lib/CodeGen/CGOpenMPRuntimeGPU.h | ||
204 | This movement changes them from private to protected. | |
221–253 | Yes, they are being called from outside class. |
clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp | ||
---|---|---|
59 | Perhaps (typed into browser): llvm::GlobalVariable *CGOpenMPRuntimeNVPTX::createGlobal( CodeGenModule &CGM, llvm::ArrayType *Ty, StringRef Name) { return new llvm::GlobalVariable( CGM.getModule(), Ty, /*isConstant=*/false, llvm::GlobalVariable::CommonLinkage, llvm::Constant::getNullValue(Ty), Name, /*InsertBefore=*/nullptr, llvm::GlobalVariable::NotThreadLocal, CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared), /*isExternallyInitialized*/ true); } llvm::GlobalVariable *CGOpenMPRuntimeAMDGCN::createGlobal( CodeGenModule &CGM, llvm::ArrayType *Ty, StringRef Name) { return new llvm::GlobalVariable( CGM.getModule(), Ty, /*isConstant=*/false, llvm::GlobalVariable::WeakAnyLinkage, llvm::Constant::getNullValue(Ty), Name, /*InsertBefore=*/nullptr, llvm::GlobalVariable::NotThreadLocal, CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared), /*isExternallyInitialized*/ false); } |
- Simplifies overall patch after D90248.
- Removes MaxParallelLevel and thus target specific PrePostActionTy.
- Removes ExternallyInitialized qualifier from shared variables for AMDGCN.
clang/lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
1344 | This appears to be the same as the free function we had before, except now all the call sites are prefixed CodegenUtil. Is there a functional change I'm missing? The rest of this patch would be easier to read with this part split off. | |
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp | ||
77 | This is a very verbose way to say that amdgcn calls emitmetatdata at the end of emitkernel and nvptx doesn't. Suggest unconditionally calling emitmetatdata, and having emitmetatdata be a no-op for nvptx. | |
89 | I think there's a credible chance this is useful to nvptx, so doesn't have to be amdgcn specific | |
108 | I think this is about computing a maximum workgroup size which the runtime uses to limit the number of threads it launches. If so, this is probably useful for nvptx and amdgcn. I'm having trouble working out what the conditions are though. Maybe it's based on an openmp clause? | |
150 | I think I remember seeing a diff that makes this attribute unconditionally emitted by some other part of the toolchain. If so, it may no longer be required | |
169 | HostServices is unused. Mode is redundant with exec_mode. wg_size is redundant with the other wg_size symbol added above. This kern_desc object should be deleted, not upstreamed. |
clang/lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
1344 | ||
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp | ||
77 | Won't the no-op approach be less extensible? Current way, though verbose, leaves scope for attaching prefix/suffix code as and when required around emitkernel. While in case of no-op, every implementing arch might have to use the exact same pattern of methods with and without code. | |
89 | You are right, it can be useful for nvptx as well. May be we can club its generalization with the nvptx's use-case when it arrives in the future? | |
108 | Yes, the if block in 111-147 corresponds to "number of threads" for thread_limit and num_threads clauses in teams and parallel directives. | |
169 | Ok, thanks. Will update in next revision. |
I don't believe the contents of this patch is necessary for codegen on amdgpu. One of the internal/weak distinctions works around a bug in the gfx800 toolchain, but we should root cause and fix that bug instead. The kern_desc object is redundant. I think amdgpu-flat-work-group-size is already emitted, but if not, we might want that.
The wg_size code is interesting but architecture independent, and it's probably more user friendly for nvptx and amdgcn to have the same handling of wg_size constraints.
Remove unnecessary formatting changes.