This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

clang/lib/CodeGen/
  CGOpenMPRuntime.h
  CGOpenMPRuntime.cpp
  CGOpenMPRuntimeAMDGCN.h
  CGOpenMPRuntimeAMDGCN.cpp
  CGOpenMPRuntimeGPU.h
  CGOpenMPRuntimeGPU.cpp
  CGOpenMPRuntimeNVPTX.h
  CGOpenMPRuntimeNVPTX.cpp
clang/test/OpenMP/
  amdgcn_target_codegen.cpp

Differential D86097

[OpenMP][AMDGCN] Generate global variables and attributes for AMDGCN
Abandoned · Public

Authored by saiislam on Aug 17 2020, 11:55 AM.


Details

Reviewers

ABataev
jdoerfert
JonChesterfield

Summary

Provide support for amdgcn specific global variables and attributes.
Generalize allocation of various common global variables and provide
their specialized implementations for nvptx and amdgcn.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

saiislam created this revision. Aug 17 2020, 11:55 AM

Herald added a project: Restricted Project. Aug 17 2020, 11:55 AM

Herald added subscribers: cfe-commits, guansong, yaxunl and 2 others.

saiislam requested review of this revision. Aug 17 2020, 11:55 AM

Herald added a subscriber: sstefan1. Aug 17 2020, 11:55 AM

ABataev added inline comments. Aug 17 2020, 12:19 PM

clang/lib/CodeGen/CGOpenMPRuntime.h
499	Can this type and corresponding functions be made AMDGCN-specific only?
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp
116	Is this possible?
119	`FlatAttrEmitted`
129	`CompileTimeThreadLimit`
clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
1394–1395	Restore original formatting.
clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
245–273	Make them protected, not public if possible. Try the same for other new functions.
clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.h
40–65	No need to add `virtual`, `override` is enough

Harbormaster completed remote builds in B68651: Diff 286101. Aug 17 2020, 1:00 PM

Moved amdgcn specific functions to CGOpenMPRuntimeAMDGCN.cpp
Removed tautological condition
Corrected case of local variables
Restored original formatting
Changed the declarations of the emit kernel methods back to private
Added support for an amdgcn specific PrePostActionTy implementation and its corresponding test cases
Replaced hard-coded line numbers in the new test cases with regexes
Other small code corrections

Harbormaster completed remote builds in B69371: Diff 287513. Aug 24 2020, 4:06 PM

Reformat the code

clang/lib/CodeGen/CGOpenMPRuntime.h
498	Remove unnecessary formatting changes.
2479–2483	Better to make it a protected member function if you really require it. Plus, this function is very small and, I think, you can simply create your own copy in CGOpenMPRuntimeAMDGCN.
2487	Same here, make it protected or just create a copy, if it is small.
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.h
29–31	Add comments for all new members
117	Do you really need to make this class public? `final`

Reformatting
Added comments
Reduced scope of specialized PrePostActionTy

saiislam marked an inline comment as done. Aug 26 2020, 12:27 PM

saiislam added inline comments.

clang/lib/CodeGen/CGOpenMPRuntime.h
2479–2483	Not making it protected because it is used by various static functions, and I don't want to create an object pointer to a subclass of CGOpenMPRuntime in CGOpenMPRuntime.
2487	It calls static functions which in turn call other static functions, so it won't make sense to create a copy of whole function chain in amdgcn.

Harbormaster completed remote builds in B69656: Diff 288072. Aug 26 2020, 1:51 PM

Ping.

ABataev added inline comments. Sep 15 2020, 7:44 AM

clang/lib/CodeGen/CGOpenMPRuntime.h
498	Still not removed
684	Restore original formatting
2482–2487	Better to encapsulate these functions into a new utility class and make them public static.
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.h
30–63	Do you really need to expose all these new members as public?
31	Runtime does not support nested parallelism on GPU. Do you really need it?
98	It does not help to understand the functionality
clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
1957	It leads to a mem leak.
clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
37	1. Make it private or protected. 2. Add a default initializer.
217–249	Are all these required to be public?
415–417	Make it private or protected

Removed unnecessary formatting of untouched code.
Encapsulated addFieldToRecordDecl and createGlobalStruct methods in a class and made them static (this required changes at all call sites).
Marked most of the member methods of CGOpenMPRuntimeAMDGCN as private (forgot to make the same change in nvptx)
Fixed the memory leak
Marked appropriate member variables as protected in CGOpenMPRuntimeGPU

JonChesterfield added inline comments. Oct 15 2020, 8:56 AM

clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp
175	The nvptx emitSPMDKernelWrapper does nothing and the amdgcn one appends some metadata. How about `nvptx::generateMetadata(...)` that does nothing and `amdgcn::generateMetadata(...)` that does this stuff, called from the end of emitSPMDKernel?
197	This metadata generation could be split out from the other changes.
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.h
48	I'm not convinced by this abstraction. It looks like amdgcn and nvptx want almost exactly the same variable in each case. The difference appears to be that nvptx uses internal linkage and amdgcn uses weak + externally initialized, in which case we're better off with `bool nvptx::needsExternalInitialization() {return false;}` `bool amdgpu::needsExternalInitialization() {return true;}` Or, if the inline ternary is unappealing, amdgcn::NewGlobalVariable(...) that passes the arguments to llvm::GlobalVariable while setting the two fields that differ between the two.
clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
170	Please put this back to the previous location so we can see whether it changed in the diff
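A minimal, LLVM-free sketch of the alternative JonChesterfield describes in his comment on line 48: one shared factory builds the shared-memory global, and each target overrides only the hooks for the fields that differ (linkage and external initialization, per that comment). `GPURuntimeModel`, `GlobalSpec`, `Linkage` and the model subclasses are hypothetical stand-ins for `llvm::GlobalVariable` construction, not Clang's API.

```cpp
#include <cassert>
#include <string>

// Hypothetical, LLVM-free model: the properties a target chooses when it
// creates a shared-memory global. Linkage stands in for llvm::GlobalValue
// linkage types; none of these names are Clang or LLVM API.
enum class Linkage { Internal, WeakAny };

struct GlobalSpec {
  std::string Name;
  Linkage Link;
  bool ExternallyInitialized;
};

class GPURuntimeModel {
public:
  virtual ~GPURuntimeModel() = default;

  // One shared factory for every target; only the hooks below differ.
  GlobalSpec createSharedGlobal(const std::string &Name) const {
    assert(!Name.empty() && "shared globals need a name");
    return GlobalSpec{Name, sharedLinkage(), needsExternalInitialization()};
  }

protected:
  virtual Linkage sharedLinkage() const = 0;
  virtual bool needsExternalInitialization() const = 0;
};

// Values follow the review comment: nvptx uses internal linkage,
// amdgcn uses weak linkage plus external initialization.
class NVPTXModel final : public GPURuntimeModel {
protected:
  Linkage sharedLinkage() const override { return Linkage::Internal; }
  bool needsExternalInitialization() const override { return false; }
};

class AMDGCNModel final : public GPURuntimeModel {
protected:
  Linkage sharedLinkage() const override { return Linkage::WeakAny; }
  bool needsExternalInitialization() const override { return true; }
};
```

With this shape, each allocate*Global method collapses into one call to the shared factory, and a new target overrides two small predicates. The trade-off saiislam raises in his reply still applies: a variable that differs in some other property would need an additional hook.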

Harbormaster completed remote builds in B75182: Diff 298377. Oct 15 2020, 9:01 AM

saiislam marked 3 inline comments as done. Oct 15 2020, 12:13 PM

saiislam added inline comments.

clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp
175	It would then be difficult to track everything that is done differently in the two. So the common code has been generalized, with nvptx keeping its existing behavior and amdgcn adding its changes as a specialization.
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.h
48	I understand what you are suggesting. But there are multiple such variables whose linkage differs between nvptx and amdgcn. Also, the current style gives a future implementation the flexibility to define these variables in its own way. What do you think?
clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
170	This move changes them from private to protected. I could have just added access specifiers without moving the definitions. That would have simplified the review, but it would have reduced readability in the future.
217–249	Yes, they are being called from outside class.

JonChesterfield added inline comments. Oct 19 2020, 7:49 AM

clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp

Perhaps (typed into browser):

llvm::GlobalVariable *CGOpenMPRuntimeNVPTX::createGlobal(
    CodeGenModule &CGM, llvm::ArrayType *Ty, StringRef Name) {
  return new llvm::GlobalVariable(
      CGM.getModule(), Ty, /*isConstant=*/false,
      llvm::GlobalVariable::CommonLinkage, llvm::Constant::getNullValue(Ty),
      Name,
      /*InsertBefore=*/nullptr, llvm::GlobalVariable::NotThreadLocal,
      CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared),
      /*isExternallyInitialized=*/true);
}

llvm::GlobalVariable *CGOpenMPRuntimeAMDGCN::createGlobal(
    CodeGenModule &CGM, llvm::ArrayType *Ty, StringRef Name) {
  return new llvm::GlobalVariable(
      CGM.getModule(), Ty, /*isConstant=*/false,
      llvm::GlobalVariable::WeakAnyLinkage, llvm::Constant::getNullValue(Ty),
      Name,
      /*InsertBefore=*/nullptr, llvm::GlobalVariable::NotThreadLocal,
      CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared),
      /*isExternallyInitialized=*/false);
}

pdhaliwal added a subscriber: pdhaliwal. Nov 17 2020, 4:48 AM

Simplifies overall patch after D90248.
Removes MaxParallelLevel and thus target specific PrePostActionTy.
Removes ExternallyInitialized qualifier from shared variables for AMDGCN.

Harbormaster completed remote builds in B79814: Diff 307108. Nov 23 2020, 10:28 AM

JonChesterfield added inline comments. Nov 23 2020, 6:02 PM

clang/lib/CodeGen/CGOpenMPRuntime.cpp
1344	This appears to be the same as the free function we had before, except now all the call sites are prefixed CodegenUtil. Is there a functional change I'm missing? The rest of this patch would be easier to read with this part split off.
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp
74	This is a very verbose way to say that amdgcn calls emitmetadata at the end of emitkernel and nvptx doesn't. Suggest unconditionally calling emitmetadata, and having emitmetadata be a no-op for nvptx.
86	I think there's a credible chance this is useful to nvptx, so doesn't have to be amdgcn specific
105	I think this is about computing a maximum workgroup size which the runtime uses to limit the number of threads it launches. If so, this is probably useful for nvptx and amdgcn. I'm having trouble working out what the conditions are though. Maybe it's based on an openmp clause?
147	I think I remember seeing a diff that makes this attribute unconditionally emitted by some other part of the toolchain. If so, it may no longer be required
166	HostServices is unused. Mode is redundant with exec_mode. wg_size is redundant with the other wg_size symbol added above. This kern_desc object should be deleted, not upstreamed.
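The emitmetadata suggestion on line 74 is the template-method pattern: the shared kernel-emission path always ends by calling a virtual hook whose default does nothing. The sketch below models it without Clang types; `KernelEmitterModel`, `emitMetadata` and the string log are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the suggestion: the shared kernel-emission path always
// calls a metadata hook at the end, and targets that need no extra metadata
// keep the default no-op. Names are illustrative, not Clang's API.
class KernelEmitterModel {
public:
  virtual ~KernelEmitterModel() = default;

  void emitKernel(const std::string &KernelName) {
    assert(!KernelName.empty() && "kernels need a name");
    Log.push_back("body:" + KernelName); // stands in for the shared codegen
    emitMetadata(KernelName);            // unconditional hook
  }

  std::vector<std::string> Log;

protected:
  // Default: no target-specific metadata (the nvptx case in the review).
  virtual void emitMetadata(const std::string &) {}
};

class NVPTXEmitterModel final : public KernelEmitterModel {};

class AMDGCNEmitterModel final : public KernelEmitterModel {
protected:
  // amdgcn appends its metadata after the shared body.
  void emitMetadata(const std::string &KernelName) override {
    Log.push_back("metadata:" + KernelName);
  }
};
```

saiislam's concern about extensibility (room for code before as well as after emitkernel) could be met in the same style by a second hook called before the body, also defaulting to a no-op.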

saiislam mentioned this in D92167: [OpenMP][NFC] Encapsulate some CGOpenMPRuntime static methods in a utility class. Nov 26 2020, 3:24 AM

saiislam marked 3 inline comments as done. Nov 26 2020, 4:27 AM

saiislam added inline comments.

clang/lib/CodeGen/CGOpenMPRuntime.cpp
1344	addFieldToRecordDecl and createGlobalStruct methods had file static scope. To make them callable from other files, from amdgcn specific file in this case, they were put in this utility class. D92167 puts this change into a separate patch. Will update this patch once D92167 gets accepted.
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp
74	Won't the no-op approach be less extensible? Current way, though verbose, leaves scope for attaching prefix/suffix code as and when required around emitkernel. While in case of no-op, every implementing arch might have to use the exact same pattern of methods with and without code.
86	You are right, it can be useful for nvptx as well. Maybe we can combine its generalization with nvptx's use case when that arrives in the future?
105	Yes, the if block in 111-147 corresponds to "number of threads" for thread_limit and num_threads clauses in teams and parallel directives.
166	Ok, thanks. Will update in next revision.

I don't believe the contents of this patch are necessary for codegen on amdgpu. One of the internal/weak distinctions works around a bug in the gfx800 toolchain, but we should root cause and fix that bug instead. The kern_desc object is redundant. I think amdgpu-flat-work-group-size is already emitted, but if not, we might want that.

The wg_size code is interesting but architecture independent, and it's probably more user friendly for nvptx and amdgcn to have the same handling of wg_size constraints.
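As a sketch of what a target-independent wg_size helper might look like: per saiislam's reply on line 105, the size comes from the thread_limit and num_threads clauses, and the patch's header comment gives the range [256..1024]. The selection policy below (the smaller of the two clauses when both are present, otherwise whichever is present, otherwise a caller-supplied default, then clamp) is an assumption, not the patch's logic, and `computeFlatWorkGroupSize` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>

// Range stated in the patch's comment for setPropertyWorkGroupSize.
constexpr unsigned MinWGSize = 256;
constexpr unsigned MaxWGSize = 1024;

// Hypothetical helper: the policy for combining the clauses is an assumption.
unsigned computeFlatWorkGroupSize(std::optional<unsigned> ThreadLimit,
                                  std::optional<unsigned> NumThreads,
                                  unsigned DefaultSize) {
  assert(DefaultSize > 0 && "default work-group size must be positive");
  unsigned Size = DefaultSize;
  if (ThreadLimit && NumThreads)
    Size = std::min(*ThreadLimit, *NumThreads); // both clauses: tighter wins
  else if (ThreadLimit)
    Size = *ThreadLimit;
  else if (NumThreads)
    Size = *NumThreads;
  // Keep the result inside the supported range.
  return std::clamp(Size, MinWGSize, MaxWGSize);
}
```

Nothing in the helper is AMDGCN-specific, which matches the point that nvptx and amdgcn could share it. On AMDGCN the result would typically feed the `amdgpu-flat-work-group-size` function attribute, whose value is a "min,max" string.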

This revision now requires changes to proceed. Nov 26 2020, 8:42 AM

saiislam abandoned this revision. Sep 21 2021, 7:24 AM

saiislam marked 3 inline comments as done.

Revision Contents

Path                                           Size
clang/lib/CodeGen/CGOpenMPRuntime.h            14 lines
clang/lib/CodeGen/CGOpenMPRuntime.cpp          29 lines
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.h      76 lines
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp    175 lines
clang/lib/CodeGen/CGOpenMPRuntimeGPU.h         101 lines
clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp       84 lines
clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.h       48 lines
clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp     58 lines
clang/test/OpenMP/amdgcn_target_codegen.cpp    84 lines

Diff 288072

clang/lib/CodeGen/CGOpenMPRuntime.h

(489 unchanged lines not shown) context: private:
/// char *name; // Name of the function or global.		/// char *name; // Name of the function or global.
/// size_t size; // Size of the entry info (0 if it a function).		/// size_t size; // Size of the entry info (0 if it a function).
/// int32_t flags;		/// int32_t flags;
/// int32_t reserved;		/// int32_t reserved;
/// };		/// };
QualType TgtOffloadEntryQTy;		QualType TgtOffloadEntryQTy;
/// Entity that registers the offloading constants that were emitted so		/// Entity that registers the offloading constants that were emitted so
/// far.		/// far.

		ABataev: Remove unnecessary formatting changes.
		ABataev: Still not removed
class OffloadEntriesInfoManagerTy {		class OffloadEntriesInfoManagerTy {
		ABataev: Can this type and corresponding functions be made AMDGCN-specific only?
CodeGenModule &CGM;		CodeGenModule &CGM;

/// Number of entries registered so far.		/// Number of entries registered so far.
unsigned OffloadingEntriesNum = 0;		unsigned OffloadingEntriesNum = 0;

public:		public:
/// Base class of the entries info.		/// Base class of the entries info.
class OffloadEntryInfo {		class OffloadEntryInfo {
(169 unchanged lines not shown) context: bool hasDeviceGlobalVarEntryInfo(StringRef VarName) const {
return OffloadEntriesDeviceGlobalVar.count(VarName) > 0;		return OffloadEntriesDeviceGlobalVar.count(VarName) > 0;
}		}
/// Applies action \a Action on all registered entries.		/// Applies action \a Action on all registered entries.
typedef llvm::function_ref<void(StringRef,		typedef llvm::function_ref<void(StringRef,
const OffloadEntryInfoDeviceGlobalVar &)>		const OffloadEntryInfoDeviceGlobalVar &)>
OffloadDeviceGlobalVarEntryInfoActTy;		OffloadDeviceGlobalVarEntryInfoActTy;
void actOnDeviceGlobalVarEntriesInfo(		void actOnDeviceGlobalVarEntriesInfo(
const OffloadDeviceGlobalVarEntryInfoActTy &Action);		const OffloadDeviceGlobalVarEntryInfoActTy &Action);

ABataev: Restore original formatting
private:		private:
// Storage for target region entries kind. The storage is to be indexed by		// Storage for target region entries kind. The storage is to be indexed by
// file ID, device ID, parent function name and line number.		// file ID, device ID, parent function name and line number.
typedef llvm::DenseMap<unsigned, OffloadEntryInfoTargetRegion>		typedef llvm::DenseMap<unsigned, OffloadEntryInfoTargetRegion>
OffloadEntriesTargetRegionPerLine;		OffloadEntriesTargetRegionPerLine;
typedef llvm::StringMap<OffloadEntriesTargetRegionPerLine>		typedef llvm::StringMap<OffloadEntriesTargetRegionPerLine>
OffloadEntriesTargetRegionPerParentName;		OffloadEntriesTargetRegionPerParentName;
typedef llvm::DenseMap<unsigned, OffloadEntriesTargetRegionPerParentName>		typedef llvm::DenseMap<unsigned, OffloadEntriesTargetRegionPerParentName>
(1,778 unchanged lines not shown) context: public:

/// Gets the OpenMP-specific address of the local variable.		/// Gets the OpenMP-specific address of the local variable.
Address getAddressOfLocalVariable(CodeGenFunction &CGF,		Address getAddressOfLocalVariable(CodeGenFunction &CGF,
const VarDecl *VD) override {		const VarDecl *VD) override {
return Address::invalid();		return Address::invalid();
}		}
};		};

		/// Declaration of functions visible in clang::CodeGen namespace, to
		/// be used by target specific specializations of CGOpenMPRuntimeGPU.

		FieldDecl *addFieldToRecordDecl(ASTContext &C, DeclContext *DC,
		QualType FieldTy);
		ABataev: Better to make it a protected member function if you really require it. Plus, this function is very small and, I think, you can simply create your own copy in CGOpenMPRuntimeAMDGCN.
		saiislam: Not making it protected because it is used by various static functions, and I don't want to create an object pointer to a subclass of CGOpenMPRuntime in CGOpenMPRuntime.

		template <class... As>
		llvm::GlobalVariable *createGlobalStruct(CodeGenModule &CGM, QualType Ty,
		bool IsConstant,
		ABataev: Same here, make it protected or just create a copy, if it is small.
		saiislam: It calls static functions which in turn call other static functions, so it won't make sense to create a copy of whole function chain in amdgcn.
		ABataev: Better to encapsulate these functions into a new utility class and make them public static.
		ArrayRef<llvm::Constant *> Data,
		const Twine &Name, As &&... Args);

} // namespace CodeGen		} // namespace CodeGen
} // namespace clang		} // namespace clang

#endif		#endif

clang/lib/CodeGen/CGOpenMPRuntime.cpp


(1,042 unchanged lines not shown)

LValue CGOpenMPTaskOutlinedRegionInfo::getThreadIDVariableLValue(		LValue CGOpenMPTaskOutlinedRegionInfo::getThreadIDVariableLValue(
CodeGenFunction &CGF) {		CodeGenFunction &CGF) {
return CGF.MakeAddrLValue(CGF.GetAddrOfLocalVar(getThreadIDVariable()),		return CGF.MakeAddrLValue(CGF.GetAddrOfLocalVar(getThreadIDVariable()),
getThreadIDVariable()->getType(),		getThreadIDVariable()->getType(),
AlignmentSource::Decl);		AlignmentSource::Decl);
}		}

static FieldDecl *addFieldToRecordDecl(ASTContext &C, DeclContext *DC,
QualType FieldTy) {
auto *Field = FieldDecl::Create(
C, DC, SourceLocation(), SourceLocation(), /*Id=*/nullptr, FieldTy,
C.getTrivialTypeSourceInfo(FieldTy, SourceLocation()),
/*BW=*/nullptr, /*Mutable=*/false, /*InitStyle=*/ICIS_NoInit);
Field->setAccess(AS_public);
DC->addDecl(Field);
return Field;
}

CGOpenMPRuntime::CGOpenMPRuntime(CodeGenModule &CGM, StringRef FirstSeparator,		CGOpenMPRuntime::CGOpenMPRuntime(CodeGenModule &CGM, StringRef FirstSeparator,
StringRef Separator)		StringRef Separator)
: CGM(CGM), FirstSeparator(FirstSeparator), Separator(Separator),		: CGM(CGM), FirstSeparator(FirstSeparator), Separator(Separator),
OMPBuilder(CGM.getModule()), OffloadEntriesInfoManager(CGM) {		OMPBuilder(CGM.getModule()), OffloadEntriesInfoManager(CGM) {
KmpCriticalNameTy = llvm::ArrayType::get(CGM.Int32Ty, /*NumElements*/ 8);		KmpCriticalNameTy = llvm::ArrayType::get(CGM.Int32Ty, /*NumElements*/ 8);

// Initialize Types used in OpenMPIRBuilder from OMPKinds.def		// Initialize Types used in OpenMPIRBuilder from OMPKinds.def
OMPBuilder.initialize();		OMPBuilder.initialize();
(277 unchanged lines not shown) context: for (const FieldDecl *FD : RD->fields()) {
for (unsigned I = PrevIdx; I < Idx; ++I)		for (unsigned I = PrevIdx; I < Idx; ++I)
Fields.add(llvm::Constant::getNullValue(StructTy->getElementType(I)));		Fields.add(llvm::Constant::getNullValue(StructTy->getElementType(I)));
PrevIdx = Idx + 1;		PrevIdx = Idx + 1;
Fields.add(*DI);		Fields.add(*DI);
++DI;		++DI;
}		}
}		}

		FieldDecl *clang::CodeGen::addFieldToRecordDecl(ASTContext &C, DeclContext *DC,
		JonChesterfield: This appears to be the same as the free function we had before, except now all the call sites are prefixed CodegenUtil. Is there a functional change I'm missing? The rest of this patch would be easier to read with this part split off.
		saiislam: addFieldToRecordDecl and createGlobalStruct methods had file static scope. To make them callable from other files, from amdgcn specific file in this case, they were put in this utility class. D92167 puts this change into a separate patch. Will update this patch once D92167 gets accepted.
		QualType FieldTy) {
		auto *Field = FieldDecl::Create(
		C, DC, SourceLocation(), SourceLocation(), /*Id=*/nullptr, FieldTy,
		C.getTrivialTypeSourceInfo(FieldTy, SourceLocation()),
		/*BW=*/nullptr, /*Mutable=*/false, /*InitStyle=*/ICIS_NoInit);
		Field->setAccess(AS_public);
		DC->addDecl(Field);
		return Field;
		}

template <class... As>		template <class... As>
static llvm::GlobalVariable *		llvm::GlobalVariable *clang::CodeGen::createGlobalStruct(
createGlobalStruct(CodeGenModule &CGM, QualType Ty, bool IsConstant,		CodeGenModule &CGM, QualType Ty, bool IsConstant,
ArrayRef<llvm::Constant *> Data, const Twine &Name,		ArrayRef<llvm::Constant *> Data, const Twine &Name, As &&... Args) {
As &&... Args) {
const auto *RD = cast<RecordDecl>(Ty->getAsTagDecl());		const auto *RD = cast<RecordDecl>(Ty->getAsTagDecl());
const CGRecordLayout &RL = CGM.getTypes().getCGRecordLayout(RD);		const CGRecordLayout &RL = CGM.getTypes().getCGRecordLayout(RD);
ConstantInitBuilder CIBuilder(CGM);		ConstantInitBuilder CIBuilder(CGM);
ConstantStructBuilder Fields = CIBuilder.beginStruct(RL.getLLVMType());		ConstantStructBuilder Fields = CIBuilder.beginStruct(RL.getLLVMType());
buildStructValue(Fields, CGM, RD, RL, Data);		buildStructValue(Fields, CGM, RD, RL, Data);
return Fields.finishAndCreateGlobal(		return Fields.finishAndCreateGlobal(
Name, CGM.getContext().getAlignOfGlobalVarInChars(Ty), IsConstant,		Name, CGM.getContext().getAlignOfGlobalVarInChars(Ty), IsConstant,
std::forward<As>(Args)...);		std::forward<As>(Args)...);
(10,761 unchanged lines not shown)

clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.h

	(20 unchanged lines not shown)

	namespace clang {			namespace clang {
	namespace CodeGen {			namespace CodeGen {

	class CGOpenMPRuntimeAMDGCN final : public CGOpenMPRuntimeGPU {			class CGOpenMPRuntimeAMDGCN final : public CGOpenMPRuntimeGPU {

	public:			public:
	explicit CGOpenMPRuntimeAMDGCN(CodeGenModule &CGM);			explicit CGOpenMPRuntimeAMDGCN(CodeGenModule &CGM);

				/// Current nesting level of parallel region
				int ParallelLevel = 0;
				ABataev: Add comments for all new members
				ABataev: Runtime does not support nested parallelism on GPU. Do you really need it?

				/// Maximum nesting level of parallel region
				int MaxParallelLevel = 0;

				/// Struct to store kernel descriptors
				QualType TgtAttributeStructQTy;

	/// Get the GPU warp size.			/// Get the GPU warp size.
	llvm::Value *getGPUWarpSize(CodeGenFunction &CGF) override;			llvm::Value *getGPUWarpSize(CodeGenFunction &CGF) override;

	/// Get the id of the current thread on the GPU.			/// Get the id of the current thread on the GPU.
	llvm::Value *getGPUThreadID(CodeGenFunction &CGF) override;			llvm::Value *getGPUThreadID(CodeGenFunction &CGF) override;

	/// Get the maximum number of threads in a block of the GPU.			/// Get the maximum number of threads in a block of the GPU.
	llvm::Value *getGPUNumThreads(CodeGenFunction &CGF) override;			llvm::Value *getGPUNumThreads(CodeGenFunction &CGF) override;

				/// Allocate global variable for TransferMedium
				JonChesterfield: I'm not convinced by this abstraction. It looks like amdgcn and nvptx want almost exactly the same variable in each case. The difference appears to be that nvptx uses internal linkage and amdgcn uses weak + externally initialized, in which case we're better off with `bool nvptx::needsExternalInitialization() {return false;}` `bool amdgpu::needsExternalInitialization() {return true;}` Or, if the inline ternary is unappealing, amdgcn::NewGlobalVariable(...) that passes the arguments to llvm::GlobalVariable while setting the two fields that differ between the two.
				saiislam: I understand what you are suggesting. But there are multiple such variables whose linkage differs between nvptx and amdgcn. Also, the current style gives a future implementation the flexibility to define these variables in its own way. What do you think?
				llvm::GlobalVariable *
				allocateTransferMediumGlobal(CodeGenModule &CGM, llvm::ArrayType *Ty,
				StringRef TransferMediumName) override;

				/// Allocate global variable for SharedStaticRD
				llvm::GlobalVariable *
				allocateSharedStaticRDGlobal(CodeGenModule &CGM,
				llvm::Type *LLVMStaticTy) override;

				/// Get global variable KernelStaticGlobalized which is a shared pointer for
				/// the global memory in the global memory buffer used for the given kernel
				llvm::GlobalVariable *
				allocateKernelStaticGlobalized(CodeGenModule &CGM) override;

				/// Get target specific PrePostActionTy
				ABataev: Do you really need to expose all these new members as public?
				PrePostActionTy *getPrePostActionTy() override;

				/// Target independent wrapper over target specific emitSPMDKernel()
				void emitSPMDKernelWrapper(const OMPExecutableDirective &D,
				StringRef ParentName, llvm::Function *&OutlinedFn,
				llvm::Constant *&OutlinedFnID, bool IsOffloadEntry,
				const RegionCodeGenTy &CodeGen) override;

				/// Target independent wrapper over target specific emitNonSPMDKernel()
				void emitNonSPMDKernelWrapper(const OMPExecutableDirective &D,
				StringRef ParentName,
				llvm::Function *&OutlinedFn,
				llvm::Constant *&OutlinedFnID,
				bool IsOffloadEntry,
				const RegionCodeGenTy &CodeGen) override;

				/// Create a unique global variable to indicate the flat-work-group-size
				/// for this region. Values are [256..1024].
				static void setPropertyWorkGroupSize(CodeGenModule &CGM, StringRef Name,
				unsigned WGSize);

				/// Generate global variables _wg_size, kern_desc, __tgt_attribute_struct.
				/// Also generate appropriate value of attribute amdgpu-flat-work-group-size
				void generateMetaData(CodeGenModule &CGM, const OMPExecutableDirective &D,
				llvm::Function *&OutlinedFn, bool IsGeneric);

				/// Returns __tgt_attribute_struct type.
				QualType getTgtAttributeStructQTy();

				/// Emit structure descriptor for a kernel
				void emitStructureKernelDesc(CodeGenModule &CGM, StringRef Name,
				int16_t WG_Size, int8_t Mode,
				int8_t HostServices, int8_t MaxParallelLevel);

				/// AMDGCN specific PrePostActionTy implementation
				ABataev: It does not help to understand the functionality
				class AMDGCNPrePostActionTy final : public PrePostActionTy {
				int &ParallelLevel;
				int &MaxParallelLevel;

				public:
				AMDGCNPrePostActionTy(int &ParallelLevel, int &MaxParallelLevel)
				: ParallelLevel(ParallelLevel), MaxParallelLevel(MaxParallelLevel) {}
				void Enter(CodeGenFunction &CGF) override {
				// Count the number of nested parallels.
				if (ParallelLevel > MaxParallelLevel)
				MaxParallelLevel = ParallelLevel;
				ParallelLevel++;
				}
				void Exit(CodeGenFunction &CGF) override { ParallelLevel--; }
				};
	};			};

	} // namespace CodeGen			} // namespace CodeGen
	} // namespace clang			} // namespace clang
				ABataev: 1. Do you really need to make this class public? 2. `final`

	#endif // LLVM_CLANG_LIB_CODEGEN_CGOPENMPRUNTIMEAMDGCN_H			#endif // LLVM_CLANG_LIB_CODEGEN_CGOPENMPRUNTIMEAMDGCN_H
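The bookkeeping in `AMDGCNPrePostActionTy` is easy to misread, so here is a standalone model of it. The struct name is hypothetical; `enter` and `exit` mirror the `Enter` and `Exit` bodies in the header above. Because `Enter` compares before incrementing, `MaxParallelLevel` ends up counting nesting levels below the outermost parallel region: it stays 0 for a single, un-nested region. This is the counter ABataev questioned, since the GPU runtime does not support nested parallelism, and a later revision removed `MaxParallelLevel` together with the target-specific PrePostActionTy.

```cpp
#include <cassert>

// Standalone model of AMDGCNPrePostActionTy's counters (no Clang types).
// enter() runs when codegen enters a parallel region, exit() when it leaves.
struct ParallelDepthTracker {
  int ParallelLevel = 0;
  int MaxParallelLevel = 0;

  void enter() {
    // Same order as the patch: compare first, then increment.
    if (ParallelLevel > MaxParallelLevel)
      MaxParallelLevel = ParallelLevel;
    ++ParallelLevel;
  }

  void exit() {
    assert(ParallelLevel > 0 && "unbalanced exit");
    --ParallelLevel;
  }
};
```

For example, one region followed by a separate two-level nest leaves `MaxParallelLevel` at 1, not 2.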

clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp

//===-- CGOpenMPRuntimeAMDGCN.cpp - Interface to OpenMP AMDGCN Runtimes --===//		//===-- CGOpenMPRuntimeAMDGCN.cpp - Interface to OpenMP AMDGCN Runtimes --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This provides a class for OpenMP runtime code generation specialized to		// This provides a class for OpenMP runtime code generation specialized to
// AMDGCN targets from generalized CGOpenMPRuntimeGPU class.		// AMDGCN targets from generalized CGOpenMPRuntimeGPU class.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "CGOpenMPRuntimeAMDGCN.h"		#include "CGOpenMPRuntimeAMDGCN.h"
		#include "CGOpenMPRuntime.h"
#include "CGOpenMPRuntimeGPU.h"		#include "CGOpenMPRuntimeGPU.h"
#include "CodeGenFunction.h"		#include "CodeGenFunction.h"
#include "clang/AST/Attr.h"		#include "clang/AST/Attr.h"
#include "clang/AST/DeclOpenMP.h"		#include "clang/AST/DeclOpenMP.h"
#include "clang/AST/StmtOpenMP.h"		#include "clang/AST/StmtOpenMP.h"
#include "clang/AST/StmtVisitor.h"		#include "clang/AST/StmtVisitor.h"
#include "clang/Basic/Cuda.h"		#include "clang/Basic/Cuda.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"		#include "llvm/IR/IntrinsicsAMDGPU.h"

using namespace clang;		using namespace clang;
using namespace CodeGen;		using namespace CodeGen;
using namespace llvm::omp;		using namespace llvm::omp;

CGOpenMPRuntimeAMDGCN::CGOpenMPRuntimeAMDGCN(CodeGenModule &CGM)		CGOpenMPRuntimeAMDGCN::CGOpenMPRuntimeAMDGCN(CodeGenModule &CGM)
: CGOpenMPRuntimeGPU(CGM) {		: CGOpenMPRuntimeGPU(CGM) {
if (!CGM.getLangOpts().OpenMPIsDevice)		if (!CGM.getLangOpts().OpenMPIsDevice)
llvm_unreachable("OpenMP AMDGCN can only handle device code.");		llvm_unreachable("OpenMP AMDGCN can only handle device code.");
		StaticRDLinkage = llvm::GlobalValue::PrivateLinkage;
}		}

llvm::Value *CGOpenMPRuntimeAMDGCN::getGPUWarpSize(CodeGenFunction &CGF) {		llvm::Value *CGOpenMPRuntimeAMDGCN::getGPUWarpSize(CodeGenFunction &CGF) {
CGBuilderTy &Bld = CGF.Builder;		CGBuilderTy &Bld = CGF.Builder;
// return constant compile-time target-specific warp size		// return constant compile-time target-specific warp size
unsigned WarpSize = CGF.getTarget().getGridValue(llvm::omp::GV_Warp_Size);		unsigned WarpSize = CGF.getTarget().getGridValue(llvm::omp::GV_Warp_Size);
return Bld.getInt32(WarpSize);		return Bld.getInt32(WarpSize);
}		}
(13 unchanged lines not shown) context: llvm::Value *CGOpenMPRuntimeAMDGCN::getGPUNumThreads(CodeGenFunction &CGF) {
if (!F) {		if (!F) {
F = llvm::Function::Create(		F = llvm::Function::Create(
llvm::FunctionType::get(CGF.Int64Ty, {CGF.Int32Ty}, false),		llvm::FunctionType::get(CGF.Int64Ty, {CGF.Int32Ty}, false),
llvm::GlobalVariable::ExternalLinkage, LocSize, &CGF.CGM.getModule());		llvm::GlobalVariable::ExternalLinkage, LocSize, &CGF.CGM.getModule());
}		}
return Bld.CreateTrunc(		return Bld.CreateTrunc(
Bld.CreateCall(F, {Bld.getInt32(0)}, "nvptx_num_threads"), CGF.Int32Ty);		Bld.CreateCall(F, {Bld.getInt32(0)}, "nvptx_num_threads"), CGF.Int32Ty);
}		}

		llvm::GlobalVariable *CGOpenMPRuntimeAMDGCN::allocateTransferMediumGlobal(
		CodeGenModule &CGM, llvm::ArrayType *Ty, StringRef TransferMediumName) {
		return new llvm::GlobalVariable(
		CGM.getModule(), Ty, /*isConstant=*/false,
		llvm::GlobalVariable::WeakAnyLinkage, llvm::UndefValue::get(Ty),
		TransferMediumName,
		/*InsertBefore=*/nullptr, llvm::GlobalVariable::NotThreadLocal,
		CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared),
		/*isExternallyInitialized*/ true);
		}
		JonChesterfield [Done]: This is a very verbose way to say that amdgcn calls emitmetadata at the end of emitkernel and nvptx doesn't. Suggest unconditionally calling emitmetadata, and having emitmetadata be a no-op for nvptx.
		saiislam (author): Won't the no-op approach be less extensible? The current way, though verbose, leaves room to attach prefix/suffix code around emitkernel as and when required. With a no-op, every implementing arch might have to follow the exact same pattern of methods, with and without code.
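JonChesterfield's alternative can be sketched in a few lines. The following is a minimal, hypothetical model (KernelRecord, GPURuntimeSketch and the other names are illustrative, not clang classes): the shared emission path always calls a virtual metadata hook, the base version does nothing, and only an AMDGCN-like subclass overrides it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified model of the suggested design.
// None of these types exist in clang; they only mimic the control flow.
struct KernelRecord {
  std::string Name;
  std::vector<std::string> Attrs; // stand-in for attributes/metadata
};

class GPURuntimeSketch {
public:
  virtual ~GPURuntimeSketch() = default;

  // Shared kernel emission: the metadata hook is called unconditionally.
  KernelRecord emitKernel(const std::string &Name) {
    KernelRecord K{Name, {}};
    // ... target-independent code generation would happen here ...
    emitKernelMetadata(K);
    return K;
  }

protected:
  // Default hook adds nothing, which is what NVPTX would inherit.
  virtual void emitKernelMetadata(KernelRecord &) {}
};

class NVPTXSketch : public GPURuntimeSketch {};

class AMDGCNSketch : public GPURuntimeSketch {
protected:
  void emitKernelMetadata(KernelRecord &K) override {
    // Placeholder: the real code computes a value from clauses and grid values.
    K.Attrs.push_back("amdgpu-flat-work-group-size");
  }
};
```

In this shape NVPTX needs no code at all; the cost saiislam points to is that any additional prefix or suffix step around kernel emission would need its own hook in the base class.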

		llvm::GlobalVariable *
		CGOpenMPRuntimeAMDGCN::allocateSharedStaticRDGlobal(CodeGenModule &CGM,
		llvm::Type *LLVMStaticTy) {
		return new llvm::GlobalVariable(
		CGM.getModule(), LLVMStaticTy,
		/*isConstant=*/false, llvm::GlobalValue::WeakAnyLinkage,
		llvm::UndefValue::get(LLVMStaticTy), "_openmp_shared_static_glob_rd_$_",
		/*InsertBefore=*/nullptr, llvm::GlobalValue::NotThreadLocal,
		CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared),
		/*isExternallyInitialized*/ true);
		}
		JonChesterfield [Done]: I think there's a credible chance this is useful to nvptx, so it doesn't have to be amdgcn-specific.
		saiislam (author): You are right, it can be useful for nvptx as well. Maybe we can combine its generalization with nvptx's use case when that arrives in the future?

		llvm::GlobalVariable *
		CGOpenMPRuntimeAMDGCN::allocateKernelStaticGlobalized(CodeGenModule &CGM) {
		return new llvm::GlobalVariable(
		CGM.getModule(), CGM.VoidPtrTy, /*isConstant=*/false,
		llvm::GlobalValue::WeakAnyLinkage, llvm::UndefValue::get(CGM.VoidPtrTy),
		"_openmp_kernel_static_glob_rd$ptr", /*InsertBefore=*/nullptr,
		llvm::GlobalValue::NotThreadLocal,
		CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared),
		/*isExternallyInitialized*/ true);
		}

		void CGOpenMPRuntimeAMDGCN::setPropertyWorkGroupSize(CodeGenModule &CGM,
		StringRef Name,
		unsigned WGSize) {
		auto *GVMode = new llvm::GlobalVariable(
		CGM.getModule(), CGM.Int16Ty, /*isConstant=*/true,
		llvm::GlobalValue::WeakAnyLinkage,
		llvm::ConstantInt::get(CGM.Int16Ty, WGSize), Name + Twine("_wg_size"),
		JonChesterfield [Done]: I think this is about computing a maximum workgroup size which the runtime uses to limit the number of threads it launches. If so, this is probably useful for both nvptx and amdgcn. I'm having trouble working out what the conditions are, though. Maybe it's based on an OpenMP clause?
		saiislam (author): Yes, the if block in 111-147 corresponds to the "number of threads" given by the thread_limit and num_threads clauses on teams and parallel directives.
		/*InsertBefore=*/nullptr, llvm::GlobalVariable::NotThreadLocal,
		CGM.getContext().getTargetAddressSpace(LangAS::cuda_device),
		/*isExternallyInitialized*/ false);
		CGM.addCompilerUsedGlobal(GVMode);
		}

		void CGOpenMPRuntimeAMDGCN::generateMetaData(CodeGenModule &CGM,
		const OMPExecutableDirective &D,
		llvm::Function *&OutlinedFn,
		bool IsGeneric) {
		int FlatAttr = 0;
		ABataev [Not Done]: Is this possible?
		bool FlatAttrEmitted = false;
		unsigned DefaultWorkGroupSz =
		CGM.getTarget().getGridValue(llvm::omp::GVIDX::GV_Default_WG_Size);
		ABataev [Not Done]: `FlatAttrEmitted`

		if (isOpenMPTeamsDirective(D.getDirectiveKind()) ||
		isOpenMPParallelDirective(D.getDirectiveKind())) {
		const auto *ThreadLimitClause = D.getSingleClause<OMPThreadLimitClause>();
		const auto *NumThreadsClause = D.getSingleClause<OMPNumThreadsClause>();
		unsigned MaxWorkGroupSz =
		CGM.getTarget().getGridValue(llvm::omp::GVIDX::GV_Max_WG_Size);
		unsigned CompileTimeThreadLimit = 0;
		// Use a compile-time limit only when exactly one of thread_limit and num_threads is present.
		if (ThreadLimitClause && !NumThreadsClause) {
		ABataev [Not Done]: `CompileTimeThreadLimit`
		Expr *ThreadLimitExpr = ThreadLimitClause->getThreadLimit();
		clang::Expr::EvalResult Result;
		if (ThreadLimitExpr->EvaluateAsInt(Result, CGM.getContext()))
		CompileTimeThreadLimit = Result.Val.getInt().getExtValue();
		} else if (!ThreadLimitClause && NumThreadsClause) {
		Expr *NumThreadsExpr = NumThreadsClause->getNumThreads();
		clang::Expr::EvalResult Result;
		if (NumThreadsExpr->EvaluateAsInt(Result, CGM.getContext()))
		CompileTimeThreadLimit = Result.Val.getInt().getExtValue();
		}

		// Add kernel metadata if ThreadLimit Clause is compile time constant > 0
		if (CompileTimeThreadLimit > 0) {
		// Add the WarpSize to generic, to reflect what runtime dispatch does.
		if (IsGeneric)
		CompileTimeThreadLimit +=
		CGM.getTarget().getGridValue(llvm::omp::GVIDX::GV_Warp_Size);
		if (CompileTimeThreadLimit > MaxWorkGroupSz)
		JonChesterfield [Not Done]: I think I remember seeing a diff that makes some other part of the toolchain emit this attribute unconditionally. If so, it may no longer be required.
		CompileTimeThreadLimit = MaxWorkGroupSz;
		std::string AttrVal = llvm::utostr(CompileTimeThreadLimit);
		FlatAttr = CompileTimeThreadLimit;
		OutlinedFn->addFnAttr("amdgpu-flat-work-group-size",
		AttrVal + "," + AttrVal);
		setPropertyWorkGroupSize(CGM, OutlinedFn->getName(),
		CompileTimeThreadLimit);
		}
		FlatAttrEmitted = true;
		} // end of amdgcn teams or parallel directive

		// emit amdgpu-flat-work-group-size if not emitted already.
		if (!FlatAttrEmitted) {
		std::string FlatAttrVal = llvm::utostr(DefaultWorkGroupSz);
		OutlinedFn->addFnAttr("amdgpu-flat-work-group-size",
		FlatAttrVal + "," + FlatAttrVal);
		}
		// Emit a kernel descriptor for runtime.
		StringRef KernDescName = OutlinedFn->getName();
		JonChesterfield [Not Done]: HostServices is unused. Mode is redundant with exec_mode. wg_size is redundant with the other wg_size symbol added above. This kern_desc object should be deleted, not upstreamed.
		saiislam (author): OK, thanks. Will update in the next revision.
		CGOpenMPRuntimeAMDGCN::emitStructureKernelDesc(CGM, KernDescName, FlatAttr,
		IsGeneric,
		1, // Uses HostServices
		MaxParallelLevel);
		// Reset it to zero for any subsequent kernel
		MaxParallelLevel = 0;
		}
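The selection of the `amdgpu-flat-work-group-size` value in generateMetaData can be restated as a pure function, which makes the branches easier to check. This is a sketch, not the clang code: GridValues and flatWorkGroupSizeAttr are hypothetical names, and the grid values are passed in instead of being read from the target with getGridValue.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>

// Simplified model of how generateMetaData() picks the
// "amdgpu-flat-work-group-size" value. In clang the three grid values come
// from getGridValue(GV_Warp_Size / GV_Max_WG_Size / GV_Default_WG_Size).
struct GridValues {
  unsigned WarpSize;
  unsigned MaxWGSize;
  unsigned DefaultWGSize;
};

// IsTeamsOrParallel: the directive is a teams or parallel directive.
// ThreadLimit: the folded constant from thread_limit or num_threads, present
//   only when exactly one of the two clauses exists and it evaluates as an int.
// Returns the "min,max" attribute string, or std::nullopt when nothing is added.
std::optional<std::string>
flatWorkGroupSizeAttr(bool IsTeamsOrParallel,
                      std::optional<unsigned> ThreadLimit, bool IsGeneric,
                      const GridValues &GV) {
  auto Pair = [](unsigned V) {
    return std::to_string(V) + "," + std::to_string(V);
  };
  // Not teams/parallel: FlatAttrEmitted stays false, so the default is used.
  if (!IsTeamsOrParallel)
    return Pair(GV.DefaultWGSize);
  // Teams/parallel without a usable constant: FlatAttrEmitted is still set,
  // so no attribute is added at all.
  if (!ThreadLimit || *ThreadLimit == 0)
    return std::nullopt;
  unsigned Limit = *ThreadLimit;
  if (IsGeneric)
    Limit += GV.WarpSize; // generic mode: one extra warp for the master thread
  return Pair(std::min(Limit, GV.MaxWGSize));
}
```

For example, with assumed grid values of 64 (warp size), 1024 (maximum) and 256 (default), a generic-mode `target teams` region with `thread_limit(128)` would get "192,192", a directive that is neither teams nor parallel would get "256,256", and a teams or parallel directive whose limit does not fold to a constant would get no attribute, because FlatAttrEmitted is set either way.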

		void CGOpenMPRuntimeAMDGCN::emitSPMDKernelWrapper(
		JonChesterfield [Not Done]: The nvptx emitSPMDKernelWrapper does nothing and the amdgcn one appends some metadata. How about `nvptx::generateMetadata(...)` that does nothing and `amdgcn::generateMetadata(...)` that does this stuff, called from the end of emitSPMDKernel?
		saiislam (author): It would then be difficult to track everything that is done differently in the two. So the common code has been generalized, and (no change in nvptx + some changes in amdgcn) has been used as the specialization.
		const OMPExecutableDirective &D, StringRef ParentName,
		llvm::Function *&OutlinedFn, llvm::Constant *&OutlinedFnID,
		bool IsOffloadEntry, const RegionCodeGenTy &CodeGen) {
		emitSPMDKernel(D, ParentName, OutlinedFn, OutlinedFnID, IsOffloadEntry,
		CodeGen);
		generateMetaData(CGM, D, OutlinedFn, /*SPMD*/ false);
		}

		void CGOpenMPRuntimeAMDGCN::emitNonSPMDKernelWrapper(
		const OMPExecutableDirective &D, StringRef ParentName,
		llvm::Function *&OutlinedFn, llvm::Constant *&OutlinedFnID,
		bool IsOffloadEntry, const RegionCodeGenTy &CodeGen) {
		emitNonSPMDKernel(D, ParentName, OutlinedFn, OutlinedFnID, IsOffloadEntry,
		CodeGen);
		generateMetaData(CGM, D, OutlinedFn, /*Generic*/ true);
		}

		PrePostActionTy *CGOpenMPRuntimeAMDGCN::getPrePostActionTy() {
		return new AMDGCNPrePostActionTy(ParallelLevel, MaxParallelLevel);
		}
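getPrePostActionTy hands out an AMDGCNPrePostActionTy bound to ParallelLevel and MaxParallelLevel, but its body is not part of this diff. Assuming it raises the nesting level when a parallel region is entered, records the deepest level seen, and lowers the level on exit, the bookkeeping looks like the following sketch (ParallelLevelActionSketch is a hypothetical name).

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical sketch: the body of AMDGCNPrePostActionTy is not part of this
// diff. This assumes it raises a nesting counter on entry to a parallel
// region, records the deepest level seen, and lowers the counter on exit.
class ParallelLevelActionSketch {
  int &Level;    // plays the role of ParallelLevel
  int &MaxLevel; // plays the role of MaxParallelLevel

public:
  ParallelLevelActionSketch(int &Level, int &MaxLevel)
      : Level(Level), MaxLevel(MaxLevel) {}
  void Enter() {
    ++Level;
    MaxLevel = std::max(MaxLevel, Level);
  }
  void Exit() { --Level; }
};
```

Under that assumption, generateMetaData writes the recorded maximum into the kernel descriptor and then resets MaxParallelLevel to 0 for the next kernel. The sketch is a plain value type; the diff's getPrePostActionTy instead returns a heap-allocated object through a raw pointer, so some caller has to take ownership of it.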

		/// Emit structure descriptor for a kernel
		JonChesterfield [Not Done]: This metadata generation could be split out from the other changes.
		void CGOpenMPRuntimeAMDGCN::emitStructureKernelDesc(
		CodeGenModule &CGM, StringRef Name, int16_t WG_Size, int8_t Mode,
		int8_t HostServices, int8_t MaxParallelLevel) {

		// Field values of the kernel descriptor, in __tgt_attribute_struct order
		llvm::Constant *AttrData[] = {
		llvm::ConstantInt::get(CGM.Int16Ty, 2), // Version
		llvm::ConstantInt::get(CGM.Int16Ty, 9), // Size in bytes
		llvm::ConstantInt::get(CGM.Int16Ty, WG_Size),
		llvm::ConstantInt::get(CGM.Int8Ty, Mode), // 0 => SPMD, 1 => GENERIC
		llvm::ConstantInt::get(CGM.Int8Ty, HostServices), // 1 => use HostServices
		llvm::ConstantInt::get(CGM.Int8Ty, MaxParallelLevel)}; // number of nests

		llvm::GlobalVariable *AttrImages = createGlobalStruct(
		CGM, getTgtAttributeStructQTy(), isDefaultLocationConstant(), AttrData,
		Name + Twine("_kern_desc"), llvm::GlobalValue::WeakAnyLinkage);
		CGM.addCompilerUsedGlobal(AttrImages);
		}

		// Create Tgt Attribute Struct type.
		QualType CGOpenMPRuntimeAMDGCN::getTgtAttributeStructQTy() {
		ASTContext &C = CGM.getContext();
		QualType KmpInt8Ty = C.getIntTypeForBitwidth(/*Width=*/8, /*Signed=*/1);
		QualType KmpInt16Ty = C.getIntTypeForBitwidth(/*Width=*/16, /*Signed=*/1);
		if (TgtAttributeStructQTy.isNull()) {
		RecordDecl *RD = C.buildImplicitRecord("__tgt_attribute_struct");
		RD->startDefinition();
		clang::CodeGen::addFieldToRecordDecl(C, RD, KmpInt16Ty); // Version
		clang::CodeGen::addFieldToRecordDecl(C, RD,
		KmpInt16Ty); // Struct Size in bytes.
		clang::CodeGen::addFieldToRecordDecl(C, RD, KmpInt16Ty); // WG_size
		clang::CodeGen::addFieldToRecordDecl(C, RD, KmpInt8Ty); // Mode
		clang::CodeGen::addFieldToRecordDecl(C, RD, KmpInt8Ty); // HostServices
		clang::CodeGen::addFieldToRecordDecl(C, RD, KmpInt8Ty); // MaxParallelLevel
		RD->completeDefinition();
		TgtAttributeStructQTy = C.getRecordType(RD);
		}
		return TgtAttributeStructQTy;
		}
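For reference, the descriptor emitted by emitStructureKernelDesc has the layout below. The struct is an illustrative C++ mirror of the implicit `__tgt_attribute_struct` record, not a declaration taken from the runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative C++ mirror of the implicit record "__tgt_attribute_struct"
// built in getTgtAttributeStructQTy(); it is not taken from a runtime header.
struct TgtAttributeStructSketch {
  std::int16_t Version;         // 2 in this diff
  std::int16_t StructSize;      // 9 in this diff
  std::int16_t WGSize;          // FlatAttr: 0 unless a constant thread limit was found
  std::int8_t Mode;             // 0 => SPMD, 1 => generic
  std::int8_t HostServices;     // 1 => uses host services
  std::int8_t MaxParallelLevel; // deepest nesting of parallel regions
};

// The "size in bytes" value written by emitStructureKernelDesc() equals the
// sum of the field widths, not necessarily sizeof() of the laid-out record.
constexpr std::size_t FieldBytes =
    3 * sizeof(std::int16_t) + 3 * sizeof(std::int8_t);
static_assert(FieldBytes == 9, "payload of the six fields is 9 bytes");
```

The stored size of 9 is the sum of the field widths. On common ABIs, where int16_t is 2-byte aligned, a C++ struct with these fields is padded to 10 bytes, so the stored size and sizeof of an equivalent struct would typically differ by one byte of trailing padding.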

clang/lib/CodeGen/CGOpenMPRuntimeGPU.h

// … 27 lines not shown; context: public:
enum ExecutionMode {		enum ExecutionMode {
/// SPMD execution mode (all threads are worker threads).		/// SPMD execution mode (all threads are worker threads).
EM_SPMD,		EM_SPMD,
/// Non-SPMD execution mode (1 master thread, others are workers).		/// Non-SPMD execution mode (1 master thread, others are workers).
EM_NonSPMD,		EM_NonSPMD,
/// Unknown execution mode (orphaned directive).		/// Unknown execution mode (orphaned directive).
EM_Unknown,		EM_Unknown,
};		};
		/// Linkage type of StaticRD Global variable
		llvm::GlobalValue::LinkageTypes StaticRDLinkage;
		ABataev [Done]: 1. Make it private or protected. 2. Add a default initializer.

private:		private:
/// Parallel outlined function work for workers to execute.		/// Parallel outlined function work for workers to execute.
llvm::SmallVector<llvm::Function *, 16> Work;		llvm::SmallVector<llvm::Function *, 16> Work;

struct EntryFunctionState {		struct EntryFunctionState {
llvm::BasicBlock *ExitBB = nullptr;		llvm::BasicBlock *ExitBB = nullptr;
};		};

// … 50 lines not shown; context: private:
//		//

/// Creates offloading entry for the provided entry ID \a ID,		/// Creates offloading entry for the provided entry ID \a ID,
/// address \a Addr, size \a Size, and flags \a Flags.		/// address \a Addr, size \a Size, and flags \a Flags.
void createOffloadEntry(llvm::Constant *ID, llvm::Constant *Addr,		void createOffloadEntry(llvm::Constant *ID, llvm::Constant *Addr,
uint64_t Size, int32_t Flags,		uint64_t Size, int32_t Flags,
llvm::GlobalValue::LinkageTypes Linkage) override;		llvm::GlobalValue::LinkageTypes Linkage) override;

/// Emit outlined function specialized for the Fork-Join
/// programming model for applicable target directives on the NVPTX device.
/// \param D Directive to emit.
/// \param ParentName Name of the function that encloses the target region.
/// \param OutlinedFn Outlined function value to be defined by this call.
/// \param OutlinedFnID Outlined function ID value to be defined by this call.
/// \param IsOffloadEntry True if the outlined function is an offload entry.
/// An outlined function may not be an entry if, e.g. the if clause always
/// evaluates to false.
void emitNonSPMDKernel(const OMPExecutableDirective &D, StringRef ParentName,
llvm::Function *&OutlinedFn,
llvm::Constant *&OutlinedFnID, bool IsOffloadEntry,
const RegionCodeGenTy &CodeGen);

/// Emit outlined function specialized for the Single Program
/// Multiple Data programming model for applicable target directives on the
/// NVPTX device.
/// \param D Directive to emit.
/// \param ParentName Name of the function that encloses the target region.
/// \param OutlinedFn Outlined function value to be defined by this call.
/// \param OutlinedFnID Outlined function ID value to be defined by this call.
/// \param IsOffloadEntry True if the outlined function is an offload entry.
/// \param CodeGen Object containing the target statements.
/// An outlined function may not be an entry if, e.g. the if clause always
/// evaluates to false.
void emitSPMDKernel(const OMPExecutableDirective &D, StringRef ParentName,
llvm::Function *&OutlinedFn,
llvm::Constant *&OutlinedFnID, bool IsOffloadEntry,
const RegionCodeGenTy &CodeGen);

/// Emit outlined function for 'target' directive on the NVPTX		/// Emit outlined function for 'target' directive on the NVPTX
/// device.		/// device.
/// \param D Directive to emit.		/// \param D Directive to emit.
/// \param ParentName Name of the function that encloses the target region.		/// \param ParentName Name of the function that encloses the target region.
/// \param OutlinedFn Outlined function value to be defined by this call.		/// \param OutlinedFn Outlined function value to be defined by this call.
/// \param OutlinedFnID Outlined function ID value to be defined by this call.		/// \param OutlinedFnID Outlined function ID value to be defined by this call.
/// \param IsOffloadEntry True if the outlined function is an offload entry.		/// \param IsOffloadEntry True if the outlined function is an offload entry.
/// An outlined function may not be an entry if, e.g. the if clause always		/// An outlined function may not be an entry if, e.g. the if clause always
// … 49 lines not shown; context: protected:
bool isDefaultLocationConstant() const override { return true; }		bool isDefaultLocationConstant() const override { return true; }

/// Returns additional flags that can be stored in reserved_2 field of the		/// Returns additional flags that can be stored in reserved_2 field of the
/// default location.		/// default location.
/// For NVPTX target contains data about SPMD/Non-SPMD execution mode +		/// For NVPTX target contains data about SPMD/Non-SPMD execution mode +
/// Full/Lightweight runtime mode. Used for better optimization.		/// Full/Lightweight runtime mode. Used for better optimization.
unsigned getDefaultLocationReserved2Flags() const override;		unsigned getDefaultLocationReserved2Flags() const override;

		/// Emit outlined function specialized for the Fork-Join
		JonChesterfield [Not Done]: Please put this back in its previous location so we can see whether it changed in the diff.
		saiislam (author): This move changes them from private to protected. I could have just added access specifiers without moving the definitions. That would have simplified the review, but it would have reduced readability in the future.
		/// programming model for applicable target directives on the NVPTX device.
		/// \param D Directive to emit.
		/// \param ParentName Name of the function that encloses the target region.
		/// \param OutlinedFn Outlined function value to be defined by this call.
		/// \param OutlinedFnID Outlined function ID value to be defined by this call.
		/// \param IsOffloadEntry True if the outlined function is an offload entry.
		/// An outlined function may not be an entry if, e.g. the if clause always
		/// evaluates to false.
		void emitNonSPMDKernel(const OMPExecutableDirective &D, StringRef ParentName,
		llvm::Function *&OutlinedFn,
		llvm::Constant *&OutlinedFnID, bool IsOffloadEntry,
		const RegionCodeGenTy &CodeGen);

		/// Emit outlined function specialized for the Single Program
		/// Multiple Data programming model for applicable target directives on the
		/// NVPTX device.
		/// \param D Directive to emit.
		/// \param ParentName Name of the function that encloses the target region.
		/// \param OutlinedFn Outlined function value to be defined by this call.
		/// \param OutlinedFnID Outlined function ID value to be defined by this call.
		/// \param IsOffloadEntry True if the outlined function is an offload entry.
		/// \param CodeGen Object containing the target statements.
		/// An outlined function may not be an entry if, e.g. the if clause always
		/// evaluates to false.
		void emitSPMDKernel(const OMPExecutableDirective &D, StringRef ParentName,
		llvm::Function *&OutlinedFn,
		llvm::Constant *&OutlinedFnID, bool IsOffloadEntry,
		const RegionCodeGenTy &CodeGen);

public:		public:
explicit CGOpenMPRuntimeGPU(CodeGenModule &CGM);		explicit CGOpenMPRuntimeGPU(CodeGenModule &CGM);
void clear() override;		void clear() override;

/// Declare generalized virtual functions which need to be defined		/// Declare generalized virtual functions which need to be defined
/// by all specializations of OpenMPGPURuntime Targets like AMDGCN		/// by all specializations of OpenMPGPURuntime Targets like AMDGCN
/// and NVPTX.		/// and NVPTX.

/// Get the GPU warp size.		/// Get the GPU warp size.
virtual llvm::Value *getGPUWarpSize(CodeGenFunction &CGF) = 0;		virtual llvm::Value *getGPUWarpSize(CodeGenFunction &CGF) = 0;

/// Get the id of the current thread on the GPU.		/// Get the id of the current thread on the GPU.
virtual llvm::Value *getGPUThreadID(CodeGenFunction &CGF) = 0;		virtual llvm::Value *getGPUThreadID(CodeGenFunction &CGF) = 0;

/// Get the maximum number of threads in a block of the GPU.		/// Get the maximum number of threads in a block of the GPU.
virtual llvm::Value *getGPUNumThreads(CodeGenFunction &CGF) = 0;		virtual llvm::Value *getGPUNumThreads(CodeGenFunction &CGF) = 0;

		/// Allocate global variable for TransferMedium
		virtual llvm::GlobalVariable *
		allocateTransferMediumGlobal(CodeGenModule &CGM, llvm::ArrayType *Ty,
		StringRef TransferMediumName) = 0;

		/// Allocate global variable for SharedStaticRD
		virtual llvm::GlobalVariable *
		allocateSharedStaticRDGlobal(CodeGenModule &CGM,
		llvm::Type *LLVMStaticTy) = 0;

		/// Allocate global variable for KernelStaticGlobalized
		virtual llvm::GlobalVariable *
		allocateKernelStaticGlobalized(CodeGenModule &CGM) = 0;

		/// Get target specific PrePostAction
		virtual PrePostActionTy *getPrePostActionTy() = 0;

		/// Target independent wrapper over target specific emitSPMDKernel()
		virtual void emitSPMDKernelWrapper(const OMPExecutableDirective &D,
		StringRef ParentName,
		llvm::Function *&OutlinedFn,
		llvm::Constant *&OutlinedFnID,
		bool IsOffloadEntry,
		const RegionCodeGenTy &CodeGen) = 0;

		/// Target independent wrapper over target specific emitNonSPMDKernel()
		virtual void emitNonSPMDKernelWrapper(const OMPExecutableDirective &D,
		StringRef ParentName,
		llvm::Function *&OutlinedFn,
		llvm::Constant *&OutlinedFnID,
		bool IsOffloadEntry,
		const RegionCodeGenTy &CodeGen) = 0;

		ABataev [Not Done]: Are all these required to be public?
		saiislam (author): Yes, they are called from outside the class.
/// Emit call to void __kmpc_push_proc_bind(ident_t *loc, kmp_int32		/// Emit call to void __kmpc_push_proc_bind(ident_t *loc, kmp_int32
/// global_tid, int proc_bind) to generate code for 'proc_bind' clause.		/// global_tid, int proc_bind) to generate code for 'proc_bind' clause.
virtual void emitProcBindClause(CodeGenFunction &CGF,		virtual void emitProcBindClause(CodeGenFunction &CGF,
llvm::omp::ProcBindKind ProcBind,		llvm::omp::ProcBindKind ProcBind,
SourceLocation Loc) override;		SourceLocation Loc) override;

/// Emits call to void __kmpc_push_num_threads(ident_t *loc, kmp_int32		/// Emits call to void __kmpc_push_num_threads(ident_t *loc, kmp_int32
/// global_tid, kmp_int32 num_threads) to generate code for 'num_threads'		/// global_tid, kmp_int32 num_threads) to generate code for 'num_threads'
/// clause.		/// clause.
/// \param NumThreads An integer value of threads.		/// \param NumThreads An integer value of threads.
virtual void emitNumThreadsClause(CodeGenFunction &CGF,		virtual void emitNumThreadsClause(CodeGenFunction &CGF,
llvm::Value *NumThreads,		llvm::Value *NumThreads,
SourceLocation Loc) override;		SourceLocation Loc) override;

/// This function ought to emit, in the general case, a call to		/// This function ought to emit, in the general case, a call to
// the openmp runtime kmpc_push_num_teams. In NVPTX backend it is not needed		// the openmp runtime kmpc_push_num_teams. In NVPTX backend it is not needed
// as these numbers are obtained through the PTX grid and block configuration.		// as these numbers are obtained through the PTX grid and block configuration.
/// \param NumTeams An integer expression of teams.		/// \param NumTeams An integer expression of teams.
/// \param ThreadLimit An integer expression of threads.		/// \param ThreadLimit An integer expression of threads.
void emitNumTeamsClause(CodeGenFunction &CGF, const Expr *NumTeams,		void emitNumTeamsClause(CodeGenFunction &CGF, const Expr *NumTeams,
const Expr *ThreadLimit, SourceLocation Loc) override;		const Expr *ThreadLimit, SourceLocation Loc) override;

/// Emits inlined function for the specified OpenMP parallel		/// Emits inlined function for the specified OpenMP parallel
// directive.		// directive.
		ABataev [Not Done]: Make them protected, not public, if possible. Try the same for other new functions.
/// \a D. This outlined function has type void(*)(kmp_int32 ThreadID,		/// \a D. This outlined function has type void(*)(kmp_int32 ThreadID,
/// kmp_int32 BoundID, struct context_vars*).		/// kmp_int32 BoundID, struct context_vars*).
/// \param D OpenMP directive.		/// \param D OpenMP directive.
/// \param ThreadIDVar Variable for thread id in the current OpenMP region.		/// \param ThreadIDVar Variable for thread id in the current OpenMP region.
/// \param InnermostKind Kind of innermost directive (for simple directives it		/// \param InnermostKind Kind of innermost directive (for simple directives it
/// is a directive itself, for combined - its innermost directive).		/// is a directive itself, for combined - its innermost directive).
/// \param CodeGen Code generation sequence for the \a D directive.		/// \param CodeGen Code generation sequence for the \a D directive.
llvm::Function *		llvm::Function *
// … 125 lines not shown; context: public:
/// their declaration context.		/// their declaration context.
enum DataSharingMode {		enum DataSharingMode {
/// CUDA data sharing mode.		/// CUDA data sharing mode.
CUDA,		CUDA,
/// Generic data-sharing mode.		/// Generic data-sharing mode.
Generic,		Generic,
};		};

		/// true if we're definitely in the parallel region.
		bool IsInParallelRegion = false;

		ABataev [Not Done]: Make it private or protected.
/// Cleans up references to the objects in finished function.		/// Cleans up references to the objects in finished function.
///		///
void functionFinished(CodeGenFunction &CGF) override;		void functionFinished(CodeGenFunction &CGF) override;

/// Choose a default value for the dist_schedule clause.		/// Choose a default value for the dist_schedule clause.
void getDefaultDistScheduleAndChunk(CodeGenFunction &CGF,		void getDefaultDistScheduleAndChunk(CodeGenFunction &CGF,
const OMPLoopDirective &S, OpenMPDistScheduleClauseKind &ScheduleKind,		const OMPLoopDirective &S, OpenMPDistScheduleClauseKind &ScheduleKind,
llvm::Value *&Chunk) const override;		llvm::Value *&Chunk) const override;
// … 32 lines not shown; context: private:
bool RequiresFullRuntime = true;		bool RequiresFullRuntime = true;

/// true if we're emitting the code for the target region and next parallel		/// true if we're emitting the code for the target region and next parallel
/// region is L0 for sure.		/// region is L0 for sure.
bool IsInTargetMasterThreadRegion = false;		bool IsInTargetMasterThreadRegion = false;
/// true if currently emitting code for target/teams/distribute region, false		/// true if currently emitting code for target/teams/distribute region, false
/// - otherwise.		/// - otherwise.
bool IsInTTDRegion = false;		bool IsInTTDRegion = false;
/// true if we're definitely in the parallel region.
bool IsInParallelRegion = false;

/// Map between an outlined function and its wrapper.		/// Map between an outlined function and its wrapper.
llvm::DenseMap<llvm::Function *, llvm::Function *> WrapperFunctionsMap;		llvm::DenseMap<llvm::Function *, llvm::Function *> WrapperFunctionsMap;

/// Emit function which wraps the outline parallel region		/// Emit function which wraps the outline parallel region
/// and controls the parameters which are passed to this function.		/// and controls the parameters which are passed to this function.
/// The wrapper ensures that the outlined function is called		/// The wrapper ensures that the outlined function is called
/// with the correct arguments when data is shared.		/// with the correct arguments when data is shared.
// … 68 lines not shown

clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp

//===---- CGOpenMPRuntimeGPU.cpp - Interface to OpenMP GPU Runtimes ----===//		//===---- CGOpenMPRuntimeGPU.cpp - Interface to OpenMP GPU Runtimes ----===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This provides a generalized class for OpenMP runtime code generation		// This provides a generalized class for OpenMP runtime code generation
// specialized by GPU targets NVPTX and AMDGCN.		// specialized by GPU targets NVPTX and AMDGCN.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "CGOpenMPRuntimeGPU.h"		#include "CGOpenMPRuntimeGPU.h"
		#include "CGOpenMPRuntimeAMDGCN.h"
#include "CGOpenMPRuntimeNVPTX.h"		#include "CGOpenMPRuntimeNVPTX.h"
#include "CodeGenFunction.h"		#include "CodeGenFunction.h"
#include "clang/AST/Attr.h"		#include "clang/AST/Attr.h"
#include "clang/AST/DeclOpenMP.h"		#include "clang/AST/DeclOpenMP.h"
#include "clang/AST/StmtOpenMP.h"		#include "clang/AST/StmtOpenMP.h"
#include "clang/AST/StmtVisitor.h"		#include "clang/AST/StmtVisitor.h"
#include "clang/Basic/Cuda.h"		#include "clang/Basic/Cuda.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
// … 1,162 lines not shown; context: void Exit(CodeGenFunction &CGF) override
RT.emitNonSPMDEntryFooter(CGF, EST);		RT.emitNonSPMDEntryFooter(CGF, EST);
}		}
} Action(EST, WST);		} Action(EST, WST);
CodeGen.setAction(Action);		CodeGen.setAction(Action);
IsInTTDRegion = true;		IsInTTDRegion = true;
// Reserve place for the globalized memory.		// Reserve place for the globalized memory.
GlobalizedRecords.emplace_back();		GlobalizedRecords.emplace_back();
if (!KernelStaticGlobalized) {		if (!KernelStaticGlobalized) {
KernelStaticGlobalized = new llvm::GlobalVariable(		auto &RT = static_cast<CGOpenMPRuntimeGPU &>(CGM.getOpenMPRuntime());
CGM.getModule(), CGM.VoidPtrTy, /*isConstant=*/false,		KernelStaticGlobalized = RT.allocateKernelStaticGlobalized(CGM);
llvm::GlobalValue::InternalLinkage,
llvm::ConstantPointerNull::get(CGM.VoidPtrTy),
"_openmp_kernel_static_glob_rd$ptr", /*InsertBefore=*/nullptr,
llvm::GlobalValue::NotThreadLocal,
CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared));
}		}
emitTargetOutlinedFunctionHelper(D, ParentName, OutlinedFn, OutlinedFnID,		emitTargetOutlinedFunctionHelper(D, ParentName, OutlinedFn, OutlinedFnID,
IsOffloadEntry, CodeGen);		IsOffloadEntry, CodeGen);
IsInTTDRegion = false;		IsInTTDRegion = false;

// Now change the name of the worker function to correspond to this target		// Now change the name of the worker function to correspond to this target
// region's entry function.		// region's entry function.
WST.WorkerFn->setName(Twine(OutlinedFn->getName(), "_worker"));		WST.WorkerFn->setName(Twine(OutlinedFn->getName(), "_worker"));
// … 108 lines not shown; context: void Exit(CodeGenFunction &CGF) override
RT.emitSPMDEntryFooter(CGF, EST);		RT.emitSPMDEntryFooter(CGF, EST);
}		}
} Action(*this, EST, D);		} Action(*this, EST, D);
CodeGen.setAction(Action);		CodeGen.setAction(Action);
IsInTTDRegion = true;		IsInTTDRegion = true;
// Reserve place for the globalized memory.		// Reserve place for the globalized memory.
GlobalizedRecords.emplace_back();		GlobalizedRecords.emplace_back();
if (!KernelStaticGlobalized) {		if (!KernelStaticGlobalized) {
KernelStaticGlobalized = new llvm::GlobalVariable(		auto &RT = static_cast<CGOpenMPRuntimeGPU &>(CGM.getOpenMPRuntime());
CGM.getModule(), CGM.VoidPtrTy, /*isConstant=*/false,		KernelStaticGlobalized = RT.allocateKernelStaticGlobalized(CGM);
llvm::GlobalValue::InternalLinkage,
llvm::ConstantPointerNull::get(CGM.VoidPtrTy),
"_openmp_kernel_static_glob_rd$ptr", /*InsertBefore=*/nullptr,
llvm::GlobalValue::NotThreadLocal,
CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared));
}		}
emitTargetOutlinedFunctionHelper(D, ParentName, OutlinedFn, OutlinedFnID,		emitTargetOutlinedFunctionHelper(D, ParentName, OutlinedFn, OutlinedFnID,
IsOffloadEntry, CodeGen);		IsOffloadEntry, CodeGen);
IsInTTDRegion = false;		IsInTTDRegion = false;
}		}

void CGOpenMPRuntimeGPU::emitSPMDEntryHeader(		void CGOpenMPRuntimeGPU::emitSPMDEntryHeader(
CodeGenFunction &CGF, EntryFunctionState &EST,		CodeGenFunction &CGF, EntryFunctionState &EST,
[56 unchanged lines hidden]
// 'generic', the runtime reserves one warp for the master, otherwise, all		// 'generic', the runtime reserves one warp for the master, otherwise, all
// warps participate in parallel work.		// warps participate in parallel work.
static void setPropertyExecutionMode(CodeGenModule &CGM, StringRef Name,		static void setPropertyExecutionMode(CodeGenModule &CGM, StringRef Name,
bool Mode) {		bool Mode) {
auto *GVMode =		auto *GVMode =
new llvm::GlobalVariable(CGM.getModule(), CGM.Int8Ty, /*isConstant=*/true,		new llvm::GlobalVariable(CGM.getModule(), CGM.Int8Ty, /*isConstant=*/true,
llvm::GlobalValue::WeakAnyLinkage,		llvm::GlobalValue::WeakAnyLinkage,
llvm::ConstantInt::get(CGM.Int8Ty, Mode ? 0 : 1),		llvm::ConstantInt::get(CGM.Int8Ty, Mode ? 0 : 1),
Twine(Name, "_exec_mode"));		Twine(Name, "_exec_mode"));
CGM.addCompilerUsedGlobal(GVMode);		CGM.addCompilerUsedGlobal(GVMode);
		ABataev: Restore original formatting.
}		}

void CGOpenMPRuntimeGPU::emitWorkerFunction(WorkerFunctionState &WST) {		void CGOpenMPRuntimeGPU::emitWorkerFunction(WorkerFunctionState &WST) {
ASTContext &Ctx = CGM.getContext();		ASTContext &Ctx = CGM.getContext();

CodeGenFunction CGF(CGM, /*suppressNewContext=*/true);		CodeGenFunction CGF(CGM, /*suppressNewContext=*/true);
CGF.StartFunction(GlobalDecl(), Ctx.VoidTy, WST.WorkerFn, WST.CGFI, {},		CGF.StartFunction(GlobalDecl(), Ctx.VoidTy, WST.WorkerFn, WST.CGFI, {},
WST.Loc, WST.Loc);		WST.Loc, WST.Loc);
[470 unchanged lines hidden]	void CGOpenMPRuntimeGPU::emitTargetOutlinedFunction(
bool IsOffloadEntry, const RegionCodeGenTy &CodeGen) {		bool IsOffloadEntry, const RegionCodeGenTy &CodeGen) {
if (!IsOffloadEntry) // Nothing to do.		if (!IsOffloadEntry) // Nothing to do.
return;		return;

assert(!ParentName.empty() && "Invalid target region parent name!");		assert(!ParentName.empty() && "Invalid target region parent name!");

bool Mode = supportsSPMDExecutionMode(CGM.getContext(), D);		bool Mode = supportsSPMDExecutionMode(CGM.getContext(), D);
if (Mode)		if (Mode)
emitSPMDKernel(D, ParentName, OutlinedFn, OutlinedFnID, IsOffloadEntry,		emitSPMDKernelWrapper(D, ParentName, OutlinedFn, OutlinedFnID,
CodeGen);		IsOffloadEntry, CodeGen);
else		else
emitNonSPMDKernel(D, ParentName, OutlinedFn, OutlinedFnID, IsOffloadEntry,		emitNonSPMDKernelWrapper(D, ParentName, OutlinedFn, OutlinedFnID,
CodeGen);		IsOffloadEntry, CodeGen);

setPropertyExecutionMode(CGM, OutlinedFn->getName(), Mode);		setPropertyExecutionMode(CGM, OutlinedFn->getName(), Mode);
}		}

namespace {		namespace {
LLVM_ENABLE_BITMASK_ENUMS_IN_NAMESPACE();		LLVM_ENABLE_BITMASK_ENUMS_IN_NAMESPACE();
/// Enum for accessing the reserved_2 field of the ident_t struct.		/// Enum for accessing the reserved_2 field of the ident_t struct.
enum ModeFlagsTy : unsigned {		enum ModeFlagsTy : unsigned {
/// Bit set to 1 when in SPMD mode.		/// Bit set to 1 when in SPMD mode.
[53 unchanged lines hidden]	void CGOpenMPRuntimeGPU::emitNumTeamsClause(CodeGenFunction &CGF,
const Expr *NumTeams,		const Expr *NumTeams,
const Expr *ThreadLimit,		const Expr *ThreadLimit,
SourceLocation Loc) {}		SourceLocation Loc) {}

llvm::Function *CGOpenMPRuntimeGPU::emitParallelOutlinedFunction(		llvm::Function *CGOpenMPRuntimeGPU::emitParallelOutlinedFunction(
const OMPExecutableDirective &D, const VarDecl *ThreadIDVar,		const OMPExecutableDirective &D, const VarDecl *ThreadIDVar,
OpenMPDirectiveKind InnermostKind, const RegionCodeGenTy &CodeGen) {		OpenMPDirectiveKind InnermostKind, const RegionCodeGenTy &CodeGen) {
// Emit target region as a standalone region.		// Emit target region as a standalone region.
class NVPTXPrePostActionTy : public PrePostActionTy {		auto &RT = static_cast<CGOpenMPRuntimeGPU &>(CGM.getOpenMPRuntime());
bool &IsInParallelRegion;		PrePostActionTy *Action = RT.getPrePostActionTy();
		ABataev: It leads to a mem leak.
bool PrevIsInParallelRegion;		CodeGen.setAction(*Action);

public:
NVPTXPrePostActionTy(bool &IsInParallelRegion)
: IsInParallelRegion(IsInParallelRegion) {}
void Enter(CodeGenFunction &CGF) override {
PrevIsInParallelRegion = IsInParallelRegion;
IsInParallelRegion = true;
}
void Exit(CodeGenFunction &CGF) override {
IsInParallelRegion = PrevIsInParallelRegion;
}
} Action(IsInParallelRegion);
CodeGen.setAction(Action);
bool PrevIsInTTDRegion = IsInTTDRegion;		bool PrevIsInTTDRegion = IsInTTDRegion;
IsInTTDRegion = false;		IsInTTDRegion = false;
bool PrevIsInTargetMasterThreadRegion = IsInTargetMasterThreadRegion;		bool PrevIsInTargetMasterThreadRegion = IsInTargetMasterThreadRegion;
IsInTargetMasterThreadRegion = false;		IsInTargetMasterThreadRegion = false;
auto *OutlinedFun =		auto *OutlinedFun =
cast<llvm::Function>(CGOpenMPRuntime::emitParallelOutlinedFunction(		cast<llvm::Function>(CGOpenMPRuntime::emitParallelOutlinedFunction(
D, ThreadIDVar, InnermostKind, CodeGen));		D, ThreadIDVar, InnermostKind, CodeGen));
if (CGM.getLangOpts().Optimize) {		if (CGM.getLangOpts().Optimize) {
[1,236 unchanged lines hidden]	auto *Fn = llvm::Function::Create(CGM.getTypes().GetFunctionType(CGFI),
llvm::GlobalValue::InternalLinkage,		llvm::GlobalValue::InternalLinkage,
"_omp_reduction_inter_warp_copy_func", &M);		"_omp_reduction_inter_warp_copy_func", &M);
CGM.SetInternalFunctionAttributes(GlobalDecl(), Fn, CGFI);		CGM.SetInternalFunctionAttributes(GlobalDecl(), Fn, CGFI);
Fn->setDoesNotRecurse();		Fn->setDoesNotRecurse();
CodeGenFunction CGF(CGM);		CodeGenFunction CGF(CGM);
CGF.StartFunction(GlobalDecl(), C.VoidTy, Fn, CGFI, Args, Loc, Loc);		CGF.StartFunction(GlobalDecl(), C.VoidTy, Fn, CGFI, Args, Loc, Loc);

CGBuilderTy &Bld = CGF.Builder;		CGBuilderTy &Bld = CGF.Builder;
		auto &RT = static_cast<CGOpenMPRuntimeGPU &>(CGF.CGM.getOpenMPRuntime());

// This array is used as a medium to transfer, one reduce element at a time,		// This array is used as a medium to transfer, one reduce element at a time,
// the data from the first lane of every warp to lanes in the first warp		// the data from the first lane of every warp to lanes in the first warp
// in order to perform the final step of a reduction in a parallel region		// in order to perform the final step of a reduction in a parallel region
// (reduction across warps). The array is placed in NVPTX __shared__ memory		// (reduction across warps). The array is placed in NVPTX __shared__ memory
// for reduced latency, as well as to have a distinct copy for concurrently		// for reduced latency, as well as to have a distinct copy for concurrently
// executing target regions. The array is declared with common linkage so		// executing target regions. The array is declared with common linkage so
// as to be shared across compilation units.		// as to be shared across compilation units.
StringRef TransferMediumName =		StringRef TransferMediumName =
"__openmp_nvptx_data_transfer_temporary_storage";		"__openmp_nvptx_data_transfer_temporary_storage";
llvm::GlobalVariable *TransferMedium =		llvm::GlobalVariable *TransferMedium =
M.getGlobalVariable(TransferMediumName);		M.getGlobalVariable(TransferMediumName);
unsigned WarpSize = CGF.getTarget().getGridValue(llvm::omp::GV_Warp_Size);		unsigned WarpSize = CGF.getTarget().getGridValue(llvm::omp::GV_Warp_Size);
if (!TransferMedium) {		if (!TransferMedium) {
auto *Ty = llvm::ArrayType::get(CGM.Int32Ty, WarpSize);		auto *Ty = llvm::ArrayType::get(CGM.Int32Ty, WarpSize);
unsigned SharedAddressSpace = C.getTargetAddressSpace(LangAS::cuda_shared);		TransferMedium =
TransferMedium = new llvm::GlobalVariable(		RT.allocateTransferMediumGlobal(CGM, Ty, TransferMediumName);
M, Ty, /*isConstant=*/false, llvm::GlobalVariable::CommonLinkage,
llvm::Constant::getNullValue(Ty), TransferMediumName,
/*InsertBefore=*/nullptr, llvm::GlobalVariable::NotThreadLocal,
SharedAddressSpace);
CGM.addCompilerUsedGlobal(TransferMedium);		CGM.addCompilerUsedGlobal(TransferMedium);
}		}

auto &RT = static_cast<CGOpenMPRuntimeGPU &>(CGF.CGM.getOpenMPRuntime());
// Get the CUDA thread id of the current OpenMP thread on the GPU.		// Get the CUDA thread id of the current OpenMP thread on the GPU.
llvm::Value *ThreadID = RT.getGPUThreadID(CGF);		llvm::Value *ThreadID = RT.getGPUThreadID(CGF);
// nvptx_lane_id = nvptx_id % warpsize		// nvptx_lane_id = nvptx_id % warpsize
llvm::Value *LaneID = getNVPTXLaneID(CGF);		llvm::Value *LaneID = getNVPTXLaneID(CGF);
// nvptx_warp_id = nvptx_id / warpsize		// nvptx_warp_id = nvptx_id / warpsize
llvm::Value *WarpID = getNVPTXWarpID(CGF);		llvm::Value *WarpID = getNVPTXWarpID(CGF);

Address AddrReduceListArg = CGF.GetAddrOfLocalVar(&ReduceListArg);		Address AddrReduceListArg = CGF.GetAddrOfLocalVar(&ReduceListArg);
[1,832 unchanged lines hidden]	case CudaArch::UNKNOWN:
break;		break;
case CudaArch::LAST:		case CudaArch::LAST:
llvm_unreachable("Unexpected Cuda arch.");		llvm_unreachable("Unexpected Cuda arch.");
}		}
llvm_unreachable("Unexpected NVPTX target without ptx feature.");		llvm_unreachable("Unexpected NVPTX target without ptx feature.");
}		}

void CGOpenMPRuntimeGPU::clear() {		void CGOpenMPRuntimeGPU::clear() {
		auto &RT = static_cast<CGOpenMPRuntimeGPU &>(CGM.getOpenMPRuntime());
if (!GlobalizedRecords.empty() &&		if (!GlobalizedRecords.empty() &&
!CGM.getLangOpts().OpenMPCUDATargetParallel) {		!CGM.getLangOpts().OpenMPCUDATargetParallel) {
ASTContext &C = CGM.getContext();		ASTContext &C = CGM.getContext();
llvm::SmallVector<const GlobalPtrSizeRecsTy *, 4> GlobalRecs;		llvm::SmallVector<const GlobalPtrSizeRecsTy *, 4> GlobalRecs;
llvm::SmallVector<const GlobalPtrSizeRecsTy *, 4> SharedRecs;		llvm::SmallVector<const GlobalPtrSizeRecsTy *, 4> SharedRecs;
RecordDecl *StaticRD = C.buildImplicitRecord(		RecordDecl *StaticRD = C.buildImplicitRecord(
"_openmp_static_memory_type_$_", RecordDecl::TagKind::TTK_Union);		"_openmp_static_memory_type_$_", RecordDecl::TagKind::TTK_Union);
StaticRD->startDefinition();		StaticRD->startDefinition();
[32 unchanged lines hidden]	for (const GlobalPtrSizeRecsTy &Records : GlobalizedRecords) {
StaticRD->addDecl(Field);		StaticRD->addDecl(Field);
GlobalRecs.push_back(&Records);		GlobalRecs.push_back(&Records);
}		}
Records.RecSize->setInitializer(llvm::ConstantInt::get(CGM.SizeTy, Size));		Records.RecSize->setInitializer(llvm::ConstantInt::get(CGM.SizeTy, Size));
Records.UseSharedMemory->setInitializer(		Records.UseSharedMemory->setInitializer(
llvm::ConstantInt::get(CGM.Int16Ty, UseSharedMemory ? 1 : 0));		llvm::ConstantInt::get(CGM.Int16Ty, UseSharedMemory ? 1 : 0));
}		}
// Allocate SharedMemorySize buffer for the shared memory.		// Allocate SharedMemorySize buffer for the shared memory.
// FIXME: nvlink does not handle weak linkage correctly (object with the
// different size are reported as erroneous).
// Restore this code as soon as nvlink is fixed.
if (!SharedStaticRD->field_empty()) {		if (!SharedStaticRD->field_empty()) {
llvm::APInt ArySize(/*numBits=*/64, SharedMemorySize);		llvm::APInt ArySize(/*numBits=*/64, SharedMemorySize);
QualType SubTy = C.getConstantArrayType(		QualType SubTy = C.getConstantArrayType(
C.CharTy, ArySize, nullptr, ArrayType::Normal, /*IndexTypeQuals=*/0);		C.CharTy, ArySize, nullptr, ArrayType::Normal, /*IndexTypeQuals=*/0);
auto *Field = FieldDecl::Create(		auto *Field = FieldDecl::Create(
C, SharedStaticRD, SourceLocation(), SourceLocation(), nullptr, SubTy,		C, SharedStaticRD, SourceLocation(), SourceLocation(), nullptr, SubTy,
C.getTrivialTypeSourceInfo(SubTy, SourceLocation()),		C.getTrivialTypeSourceInfo(SubTy, SourceLocation()),
/*BW=*/nullptr, /*Mutable=*/false,		/*BW=*/nullptr, /*Mutable=*/false,
/*InitStyle=*/ICIS_NoInit);		/*InitStyle=*/ICIS_NoInit);
Field->setAccess(AS_public);		Field->setAccess(AS_public);
SharedStaticRD->addDecl(Field);		SharedStaticRD->addDecl(Field);
}		}
SharedStaticRD->completeDefinition();		SharedStaticRD->completeDefinition();
if (!SharedStaticRD->field_empty()) {		if (!SharedStaticRD->field_empty()) {
QualType StaticTy = C.getRecordType(SharedStaticRD);		QualType StaticTy = C.getRecordType(SharedStaticRD);
llvm::Type *LLVMStaticTy = CGM.getTypes().ConvertTypeForMem(StaticTy);		llvm::Type *LLVMStaticTy = CGM.getTypes().ConvertTypeForMem(StaticTy);
auto *GV = new llvm::GlobalVariable(		auto *GV = RT.allocateSharedStaticRDGlobal(CGM, LLVMStaticTy);
CGM.getModule(), LLVMStaticTy,
/*isConstant=*/false, llvm::GlobalValue::CommonLinkage,
llvm::Constant::getNullValue(LLVMStaticTy),
"_openmp_shared_static_glob_rd_$_", /*InsertBefore=*/nullptr,
llvm::GlobalValue::NotThreadLocal,
C.getTargetAddressSpace(LangAS::cuda_shared));
auto *Replacement = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(		auto *Replacement = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(
GV, CGM.VoidPtrTy);		GV, CGM.VoidPtrTy);
for (const GlobalPtrSizeRecsTy *Rec : SharedRecs) {		for (const GlobalPtrSizeRecsTy *Rec : SharedRecs) {
Rec->Buffer->replaceAllUsesWith(Replacement);		Rec->Buffer->replaceAllUsesWith(Replacement);
Rec->Buffer->eraseFromParent();		Rec->Buffer->eraseFromParent();
}		}
}		}
StaticRD->completeDefinition();		StaticRD->completeDefinition();
if (!StaticRD->field_empty()) {		if (!StaticRD->field_empty()) {
QualType StaticTy = C.getRecordType(StaticRD);		QualType StaticTy = C.getRecordType(StaticRD);
std::pair<unsigned, unsigned> SMsBlockPerSM = getSMsBlocksPerSM(CGM);		std::pair<unsigned, unsigned> SMsBlockPerSM = getSMsBlocksPerSM(CGM);
llvm::APInt Size1(32, SMsBlockPerSM.second);		llvm::APInt Size1(32, SMsBlockPerSM.second);
QualType Arr1Ty =		QualType Arr1Ty =
C.getConstantArrayType(StaticTy, Size1, nullptr, ArrayType::Normal,		C.getConstantArrayType(StaticTy, Size1, nullptr, ArrayType::Normal,
/*IndexTypeQuals=*/0);		/*IndexTypeQuals=*/0);
llvm::APInt Size2(32, SMsBlockPerSM.first);		llvm::APInt Size2(32, SMsBlockPerSM.first);
QualType Arr2Ty =		QualType Arr2Ty =
C.getConstantArrayType(Arr1Ty, Size2, nullptr, ArrayType::Normal,		C.getConstantArrayType(Arr1Ty, Size2, nullptr, ArrayType::Normal,
/*IndexTypeQuals=*/0);		/*IndexTypeQuals=*/0);
llvm::Type *LLVMArr2Ty = CGM.getTypes().ConvertTypeForMem(Arr2Ty);		llvm::Type *LLVMArr2Ty = CGM.getTypes().ConvertTypeForMem(Arr2Ty);
// FIXME: nvlink does not handle weak linkage correctly (object with the		auto *GV =
// different size are reported as erroneous).		new llvm::GlobalVariable(CGM.getModule(), LLVMArr2Ty,
// Restore CommonLinkage as soon as nvlink is fixed.		/*isConstant=*/false, RT.StaticRDLinkage,
auto *GV = new llvm::GlobalVariable(
CGM.getModule(), LLVMArr2Ty,
/*isConstant=*/false, llvm::GlobalValue::InternalLinkage,
llvm::Constant::getNullValue(LLVMArr2Ty),		llvm::Constant::getNullValue(LLVMArr2Ty),
"_openmp_static_glob_rd_$_");		"_openmp_static_glob_rd_$_");
auto *Replacement = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(		auto *Replacement = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(
GV, CGM.VoidPtrTy);		GV, CGM.VoidPtrTy);
for (const GlobalPtrSizeRecsTy *Rec : GlobalRecs) {		for (const GlobalPtrSizeRecsTy *Rec : GlobalRecs) {
Rec->Buffer->replaceAllUsesWith(Replacement);		Rec->Buffer->replaceAllUsesWith(Replacement);
Rec->Buffer->eraseFromParent();		Rec->Buffer->eraseFromParent();
}		}
}		}
}		}
[16 unchanged lines hidden]	if (!TeamsReductions.empty()) {
QualType StaticTy = C.getRecordType(StaticRD);		QualType StaticTy = C.getRecordType(StaticRD);
llvm::Type *LLVMReductionsBufferTy =		llvm::Type *LLVMReductionsBufferTy =
CGM.getTypes().ConvertTypeForMem(StaticTy);		CGM.getTypes().ConvertTypeForMem(StaticTy);
// FIXME: nvlink does not handle weak linkage correctly (object with the		// FIXME: nvlink does not handle weak linkage correctly (object with the
// different size are reported as erroneous).		// different size are reported as erroneous).
// Restore CommonLinkage as soon as nvlink is fixed.		// Restore CommonLinkage as soon as nvlink is fixed.
auto *GV = new llvm::GlobalVariable(		auto *GV = new llvm::GlobalVariable(
CGM.getModule(), LLVMReductionsBufferTy,		CGM.getModule(), LLVMReductionsBufferTy,
/*isConstant=*/false, llvm::GlobalValue::InternalLinkage,		/*isConstant=*/false, RT.StaticRDLinkage,
llvm::Constant::getNullValue(LLVMReductionsBufferTy),		llvm::Constant::getNullValue(LLVMReductionsBufferTy),
"_openmp_teams_reductions_buffer_$_");		"_openmp_teams_reductions_buffer_$_");
KernelTeamsReductionPtr->setInitializer(		KernelTeamsReductionPtr->setInitializer(
llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(GV,		llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(GV,
CGM.VoidPtrTy));		CGM.VoidPtrTy));
}		}
CGOpenMPRuntime::clear();		CGOpenMPRuntime::clear();
}		}
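The `clear()` changes above stop building `llvm::GlobalVariable` objects inline and instead ask the runtime object for a target-specific global (`allocateSharedStaticRDGlobal`) or a target-specific linkage (`RT.StaticRDLinkage`). The standalone sketch below illustrates that split with hypothetical stand-in types (`Linkage`, `GlobalDesc`, `GPURuntimeSketch` and its subclasses), not the real LLVM/clang classes. The NVPTX values mirror this diff; the base-class default linkage and the AMDGCN-style subclass are assumptions, the latter modeled loosely on the reviewer's `WeakAnyLinkage` suggestion in CGOpenMPRuntimeNVPTX.cpp.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for llvm::GlobalValue linkage kinds.
enum class Linkage { Internal, Common, WeakAny };

// Hypothetical description of a global the runtime wants to emit.
struct GlobalDesc {
  std::string Name;
  Linkage Link;
  bool InSharedAddressSpace;
};

// Target-independent runtime: shared code calls hooks instead of
// hard-coding target-specific global properties.
class GPURuntimeSketch {
public:
  virtual ~GPURuntimeSketch() = default;

  // Per-target hook for the shared-memory static record buffer.
  virtual GlobalDesc allocateSharedStaticRDGlobal() const = 0;

  // Target-independent caller, analogous to clear() using RT.StaticRDLinkage.
  GlobalDesc emitStaticRecordBuffer() const {
    return GlobalDesc{"_openmp_static_glob_rd_$_", StaticRDLinkage,
                      /*InSharedAddressSpace=*/false};
  }

protected:
  // Data-member alternative to a virtual hook (assumed default).
  Linkage StaticRDLinkage = Linkage::Common;
};

class NVPTXRuntimeSketch final : public GPURuntimeSketch {
public:
  // Mirrors the NVPTX constructor in this diff (nvlink workaround).
  NVPTXRuntimeSketch() { StaticRDLinkage = Linkage::Internal; }

  GlobalDesc allocateSharedStaticRDGlobal() const override {
    return GlobalDesc{"_openmp_shared_static_glob_rd_$_", Linkage::Common,
                      /*InSharedAddressSpace=*/true};
  }
};

// Hypothetical second target: keeps the default static linkage and uses
// weak linkage for the shared buffer (illustrative assumption only).
class AMDGCNRuntimeSketch final : public GPURuntimeSketch {
public:
  GlobalDesc allocateSharedStaticRDGlobal() const override {
    return GlobalDesc{"_openmp_shared_static_glob_rd_$_", Linkage::WeakAny,
                      /*InSharedAddressSpace=*/true};
  }
};
```

A data member such as `StaticRDLinkage` suits a property that is a single value per target, while a virtual hook suits cases where the whole construction differs; the diff uses both.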

clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.h

[29 unchanged lines hidden]	public:
/// Get the GPU warp size.		/// Get the GPU warp size.
llvm::Value *getGPUWarpSize(CodeGenFunction &CGF) override;		llvm::Value *getGPUWarpSize(CodeGenFunction &CGF) override;

/// Get the id of the current thread on the GPU.		/// Get the id of the current thread on the GPU.
llvm::Value *getGPUThreadID(CodeGenFunction &CGF) override;		llvm::Value *getGPUThreadID(CodeGenFunction &CGF) override;

/// Get the maximum number of threads in a block of the GPU.		/// Get the maximum number of threads in a block of the GPU.
llvm::Value *getGPUNumThreads(CodeGenFunction &CGF) override;		llvm::Value *getGPUNumThreads(CodeGenFunction &CGF) override;

		/// Allocate global variable for TransferMedium
		llvm::GlobalVariable *allocateTransferMediumGlobal(CodeGenModule &CGM,
		llvm::ArrayType *Ty,
		StringRef Name) override;

		/// Allocate global variable for SharedStaticRD
		llvm::GlobalVariable *
		allocateSharedStaticRDGlobal(CodeGenModule &CGM,
		llvm::Type *LLVMStaticTy) override;

		/// Allocate global variable for KernelStaticGlobalized
		llvm::GlobalVariable *
		allocateKernelStaticGlobalized(CodeGenModule &CGM) override;

		/// Get target specific PrePostAction
		PrePostActionTy *getPrePostActionTy() override;

		/// Target independent wrapper over target specific emitSPMDKernel()
		void emitSPMDKernelWrapper(const OMPExecutableDirective &D,
		StringRef ParentName, llvm::Function *&OutlinedFn,
		llvm::Constant *&OutlinedFnID, bool IsOffloadEntry,
		const RegionCodeGenTy &CodeGen) override;

		/// Target independent wrapper over target specific emitNonSPMDKernel()
		void emitNonSPMDKernelWrapper(const OMPExecutableDirective &D,
		StringRef ParentName,
		llvm::Function *&OutlinedFn,
		ABataev: No need to add `virtual`, `override` is enough
		llvm::Constant *&OutlinedFnID,
		bool IsOffloadEntry,
		const RegionCodeGenTy &CodeGen) override;

		/// NVPTX specific class for PrePostActionTy
		class NVPTXPrePostActionTy final : public PrePostActionTy {
		bool &IsInParallelRegion;
		bool PrevIsInParallelRegion;

		public:
		NVPTXPrePostActionTy(bool &IsInParallelRegion)
		: IsInParallelRegion(IsInParallelRegion) {}
		void Enter(CodeGenFunction &CGF) override {
		PrevIsInParallelRegion = IsInParallelRegion;
		IsInParallelRegion = true;
		}
		void Exit(CodeGenFunction &CGF) override {
		IsInParallelRegion = PrevIsInParallelRegion;
		}
		};
};		};

} // CodeGen namespace.		} // CodeGen namespace.
} // clang namespace.		} // clang namespace.

#endif // LLVM_CLANG_LIB_CODEGEN_CGOPENMPRUNTIMENVPTX_H		#endif // LLVM_CLANG_LIB_CODEGEN_CGOPENMPRUNTIMENVPTX_H
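On the review point that `override` alone is enough: a member function marked `override` must override a virtual function of a base class, so it is already virtual, and repeating `virtual` adds nothing; `override` additionally makes the compiler reject a signature mismatch. A minimal standalone illustration, with hypothetical class names standing in for `CGOpenMPRuntimeGPU` and `CGOpenMPRuntimeNVPTX`:

```cpp
#include <cassert>
#include <string>

// Hypothetical base standing in for CGOpenMPRuntimeGPU.
class RuntimeBase {
public:
  virtual ~RuntimeBase() = default;
  virtual std::string warpSizeSource() const { return "generic"; }
};

// Hypothetical derived class standing in for CGOpenMPRuntimeNVPTX.
// `override` alone makes this function virtual; adding `virtual` here
// as well would be redundant.
class NVPTXRuntime final : public RuntimeBase {
public:
  std::string warpSizeSource() const override {
    return "nvvm_read_ptx_sreg_warpsize";
  }
};

// Dispatches through a base-class reference, so the derived override runs.
std::string queryWarpSizeSource(const RuntimeBase &RT) {
  return RT.warpSizeSource();
}
```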

clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp

	[24 unchanged lines hidden]
	using namespace clang;			using namespace clang;
	using namespace CodeGen;			using namespace CodeGen;
	using namespace llvm::omp;			using namespace llvm::omp;

	CGOpenMPRuntimeNVPTX::CGOpenMPRuntimeNVPTX(CodeGenModule &CGM)			CGOpenMPRuntimeNVPTX::CGOpenMPRuntimeNVPTX(CodeGenModule &CGM)
	: CGOpenMPRuntimeGPU(CGM) {			: CGOpenMPRuntimeGPU(CGM) {
	if (!CGM.getLangOpts().OpenMPIsDevice)			if (!CGM.getLangOpts().OpenMPIsDevice)
	llvm_unreachable("OpenMP NVPTX can only handle device code.");			llvm_unreachable("OpenMP NVPTX can only handle device code.");

				// FIXME: nvlink does not handle weak linkage correctly (object with the
				// different size are reported as erroneous).
				// Restore CommonLinkage as soon as nvlink is fixed.
				StaticRDLinkage = llvm::GlobalValue::InternalLinkage;
	}			}

	llvm::Value *CGOpenMPRuntimeNVPTX::getGPUWarpSize(CodeGenFunction &CGF) {			llvm::Value *CGOpenMPRuntimeNVPTX::getGPUWarpSize(CodeGenFunction &CGF) {
	return CGF.EmitRuntimeCall(			return CGF.EmitRuntimeCall(
	llvm::Intrinsic::getDeclaration(			llvm::Intrinsic::getDeclaration(
	&CGF.CGM.getModule(), llvm::Intrinsic::nvvm_read_ptx_sreg_warpsize),			&CGF.CGM.getModule(), llvm::Intrinsic::nvvm_read_ptx_sreg_warpsize),
	"nvptx_warp_size");			"nvptx_warp_size");
	}			}

	llvm::Value *CGOpenMPRuntimeNVPTX::getGPUThreadID(CodeGenFunction &CGF) {			llvm::Value *CGOpenMPRuntimeNVPTX::getGPUThreadID(CodeGenFunction &CGF) {
	CGBuilderTy &Bld = CGF.Builder;			CGBuilderTy &Bld = CGF.Builder;
	llvm::Function *F;			llvm::Function *F;
	F = llvm::Intrinsic::getDeclaration(			F = llvm::Intrinsic::getDeclaration(
	&CGF.CGM.getModule(), llvm::Intrinsic::nvvm_read_ptx_sreg_tid_x);			&CGF.CGM.getModule(), llvm::Intrinsic::nvvm_read_ptx_sreg_tid_x);
	return Bld.CreateCall(F, llvm::None, "nvptx_tid");			return Bld.CreateCall(F, llvm::None, "nvptx_tid");
	}			}

	llvm::Value *CGOpenMPRuntimeNVPTX::getGPUNumThreads(CodeGenFunction &CGF) {			llvm::Value *CGOpenMPRuntimeNVPTX::getGPUNumThreads(CodeGenFunction &CGF) {
	CGBuilderTy &Bld = CGF.Builder;			CGBuilderTy &Bld = CGF.Builder;
	llvm::Function *F;			llvm::Function *F;
	F = llvm::Intrinsic::getDeclaration(			F = llvm::Intrinsic::getDeclaration(
	&CGF.CGM.getModule(), llvm::Intrinsic::nvvm_read_ptx_sreg_ntid_x);			&CGF.CGM.getModule(), llvm::Intrinsic::nvvm_read_ptx_sreg_ntid_x);
	return Bld.CreateCall(F, llvm::None, "nvptx_num_threads");			return Bld.CreateCall(F, llvm::None, "nvptx_num_threads");
	}			}

				llvm::GlobalVariable *CGOpenMPRuntimeNVPTX::allocateTransferMediumGlobal(
				JonChesterfield: Perhaps (typed into browser):
				```
				llvm::GlobalVariable *CGOpenMPRuntimeNVPTX::createGlobal(
				    CodeGenModule &CGM, llvm::ArrayType *Ty, StringRef Name) {
				  return new llvm::GlobalVariable(
				      CGM.getModule(), Ty, /*isConstant=*/false,
				      llvm::GlobalVariable::CommonLinkage, llvm::Constant::getNullValue(Ty),
				      Name, /*InsertBefore=*/nullptr, llvm::GlobalVariable::NotThreadLocal,
				      CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared),
				      /*isExternallyInitialized*/ true);
				}

				llvm::GlobalVariable *CGOpenMPRuntimeAMDGCN::createGlobal(
				    CodeGenModule &CGM, llvm::ArrayType *Ty, StringRef Name) {
				  return new llvm::GlobalVariable(
				      CGM.getModule(), Ty, /*isConstant=*/false,
				      llvm::GlobalVariable::WeakAnyLinkage, llvm::Constant::getNullValue(Ty),
				      Name, /*InsertBefore=*/nullptr, llvm::GlobalVariable::NotThreadLocal,
				      CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared),
				      /*isExternallyInitialized*/ false);
				}
				```
				CodeGenModule &CGM, llvm::ArrayType *Ty, StringRef TransferMediumName) {
				return new llvm::GlobalVariable(
				CGM.getModule(), Ty, /*isConstant=*/false,
				llvm::GlobalVariable::CommonLinkage, llvm::Constant::getNullValue(Ty),
				TransferMediumName,
				/*InsertBefore=*/nullptr, llvm::GlobalVariable::NotThreadLocal,
				CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared));
				}

				llvm::GlobalVariable *
				CGOpenMPRuntimeNVPTX::allocateSharedStaticRDGlobal(CodeGenModule &CGM,
				llvm::Type *LLVMStaticTy) {
				return new llvm::GlobalVariable(
				CGM.getModule(), LLVMStaticTy,
				/*isConstant=*/false, llvm::GlobalValue::CommonLinkage,
				llvm::Constant::getNullValue(LLVMStaticTy),
				"_openmp_shared_static_glob_rd_$_", /*InsertBefore=*/nullptr,
				llvm::GlobalValue::NotThreadLocal,
				CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared));
				}

				llvm::GlobalVariable *
				CGOpenMPRuntimeNVPTX::allocateKernelStaticGlobalized(CodeGenModule &CGM) {
				return new llvm::GlobalVariable(
				CGM.getModule(), CGM.VoidPtrTy, /*isConstant=*/false,
				llvm::GlobalValue::InternalLinkage,
				llvm::ConstantPointerNull::get(CGM.VoidPtrTy),
				"_openmp_kernel_static_glob_rd$ptr", /*InsertBefore=*/nullptr,
				llvm::GlobalValue::NotThreadLocal,
				CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared));
				}

				void CGOpenMPRuntimeNVPTX::emitSPMDKernelWrapper(
				const OMPExecutableDirective &D, StringRef ParentName,
				llvm::Function *&OutlinedFn, llvm::Constant *&OutlinedFnID,
				bool IsOffloadEntry, const RegionCodeGenTy &CodeGen) {
				emitSPMDKernel(D, ParentName, OutlinedFn, OutlinedFnID, IsOffloadEntry,
				CodeGen);
				}

				void CGOpenMPRuntimeNVPTX::emitNonSPMDKernelWrapper(
				const OMPExecutableDirective &D, StringRef ParentName,
				llvm::Function *&OutlinedFn, llvm::Constant *&OutlinedFnID,
				bool IsOffloadEntry, const RegionCodeGenTy &CodeGen) {
				emitNonSPMDKernel(D, ParentName, OutlinedFn, OutlinedFnID, IsOffloadEntry,
				CodeGen);
				}

				PrePostActionTy *CGOpenMPRuntimeNVPTX::getPrePostActionTy() {
				return new NVPTXPrePostActionTy(IsInParallelRegion);
				}
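The leak ABataev points out arises because `getPrePostActionTy()` returns a raw `new`-allocated action that `emitParallelOutlinedFunction` stores via `CodeGen.setAction(*Action)` and never deletes. One possible fix is to return an owning `std::unique_ptr` and keep it alive in the caller for the whole region. The sketch below is a standalone illustration with hypothetical stand-in types (`PrePostAction`, `ParallelRegionAction`, `makeParallelRegionAction`, `emitParallelRegion`), not clang's actual `PrePostActionTy` API; a static counter makes the absence of a leak observable.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for clang's PrePostActionTy.
class PrePostAction {
public:
  virtual ~PrePostAction() { --LiveCount; }
  virtual void enter() {}
  virtual void exit() {}
  static int LiveCount; // number of action objects currently alive

protected:
  PrePostAction() { ++LiveCount; }
};
int PrePostAction::LiveCount = 0;

// Mirrors NVPTXPrePostActionTy: sets the flag on entry, restores it on exit.
class ParallelRegionAction final : public PrePostAction {
  bool &IsInParallelRegion;
  bool PrevIsInParallelRegion = false;

public:
  explicit ParallelRegionAction(bool &Flag) : IsInParallelRegion(Flag) {}
  void enter() override {
    PrevIsInParallelRegion = IsInParallelRegion;
    IsInParallelRegion = true;
  }
  void exit() override { IsInParallelRegion = PrevIsInParallelRegion; }
};

// Factory returns an owning pointer instead of a raw `new`.
std::unique_ptr<PrePostAction> makeParallelRegionAction(bool &Flag) {
  return std::make_unique<ParallelRegionAction>(Flag);
}

// The caller owns the action while the region is emitted; it is destroyed
// automatically when Action goes out of scope. Returns the flag's value
// observed inside the region.
bool emitParallelRegion(bool &IsInParallelRegion) {
  std::unique_ptr<PrePostAction> Action =
      makeParallelRegionAction(IsInParallelRegion);
  Action->enter();
  bool SeenInside = IsInParallelRegion;
  Action->exit();
  return SeenInside;
}
```

The original in-function class avoided the problem by living on the stack; a factory returning `std::unique_ptr` keeps the per-target customization while restoring that automatic cleanup.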

clang/test/OpenMP/amdgcn_target_codegen.cpp

	// REQUIRES: amdgpu-registered-target			// REQUIRES: amdgpu-registered-target

	// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc			// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
	// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s			// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s
	// expected-no-diagnostics			// expected-no-diagnostics
	#ifndef HEADER			#ifndef HEADER
	#define HEADER			#define HEADER

	#define N 1000			#define N 1000

				// CHECK: @"_openmp_kernel_static_glob_rd$ptr" = weak addrspace(3) externally_initialized global i8* undef

				// CHECK: @__omp_offloading_{{.*}}test_amdgcn_target_tid_threadsv_l[[LINE1:.+]]_kern_desc = weak constant %struct.__tgt_attribute_struct { i16 2, i16 9, i16 0, i8 1, i8 1, i8 0 }, align 2
				// CHECK: @__omp_offloading_{{.*}}test_amdgcn_target_tid_threadsv_l[[LINE1]]_exec_mode = weak constant i8 1

				// CHECK: @__omp_offloading_{{.*}}test_amdgcn_target_tid_threads_simdv_l[[LINE2:.+]]_kern_desc = weak constant %struct.__tgt_attribute_struct { i16 2, i16 9, i16 0, i8 0, i8 1, i8 0 }, align 2
				// CHECK: @__omp_offloading_{{.*}}test_amdgcn_target_tid_threads_simdv_l[[LINE2]]_exec_mode = weak constant i8 0

				// CHECK: @__omp_offloading_{{.*}}test_amdgcn_target_max_parallel_levelv_l[[LINE3:.+]]_kern_desc = weak constant %struct.__tgt_attribute_struct { i16 2, i16 9, i16 0, i8 0, i8 1, i8 3 }, align 2
				// CHECK: @__omp_offloading_{{.*}}test_amdgcn_target_max_parallel_levelv_l[[LINE3]]_exec_mode = weak constant i8 0

				// CHECK: @__omp_offloading_{{.*}}test_amdgcn_target_attributes_spmdv_l[[LINE4:.+]]_wg_size = weak addrspace(1) constant i16 10
				// CHECK: @__omp_offloading_{{.*}}test_amdgcn_target_attributes_spmdv_l[[LINE4]]_kern_desc = weak constant %struct.__tgt_attribute_struct { i16 2, i16 9, i16 10, i8 0, i8 1, i8 0 }, align 2
				// CHECK: @__omp_offloading_{{.*}}test_amdgcn_target_attributes_spmdv_l[[LINE4]]_exec_mode = weak constant i8 0

				// CHECK: @__omp_offloading_{{.*}}test_amdgcn_target_attributes_non_spmdv_l[[LINE5:.+]]_wg_size = weak addrspace(1) constant i16 74
				// CHECK: @__omp_offloading_{{.*}}test_amdgcn_target_attributes_non_spmdv_l[[LINE5]]_kern_desc = weak constant %struct.__tgt_attribute_struct { i16 2, i16 9, i16 74, i8 1, i8 1, i8 0 }, align 2
				// CHECK: @__omp_offloading_{{.*}}test_amdgcn_target_attributes_non_spmdv_l[[LINE5]]_exec_mode = weak constant i8 1

				// CHECK: @__omp_offloading_{{.*}}test_amdgcn_target_attributes_max_work_group_sizev_l[[LINE6:.+]]_wg_size = weak addrspace(1) constant i16 1024
				// CHECK: @__omp_offloading_{{.*}}test_amdgcn_target_attributes_max_work_group_sizev_l[[LINE6]]_kern_desc = weak constant %struct.__tgt_attribute_struct { i16 2, i16 9, i16 1024, i8 1, i8 1, i8 0 }, align 2
				// CHECK: @__omp_offloading_{{.*}}test_amdgcn_target_attributes_max_work_group_sizev_l[[LINE6]]_exec_mode = weak constant i8 1

	int test_amdgcn_target_tid_threads() {			int test_amdgcn_target_tid_threads() {
	// CHECK-LABEL: define weak void @{{.*}}test_amdgcn_target_tid_threads			// CHECK-LABEL: define weak void @{{.*}}test_amdgcn_target_tid_threads

	int arr[N];			int arr[N];

	// CHECK: [[NUM_THREADS:%.+]] = call i64 @__ockl_get_local_size(i32 0)			// CHECK: [[NUM_THREADS:%.+]] = call i64 @__ockl_get_local_size(i32 0)
	// CHECK-NEXT: [[VAR:%.+]] = trunc i64 [[NUM_THREADS]] to i32			// CHECK-NEXT: [[VAR:%.+]] = trunc i64 [[NUM_THREADS]] to i32
	// CHECK-NEXT: sub nuw i32 [[VAR]], 64			// CHECK-NEXT: sub nuw i32 [[VAR]], 64
	Show All 16 Lines
	// CHECK-NEXT: call void @__kmpc_spmd_kernel_init(i32 [[VAR]], i16 0, i16 0)			// CHECK-NEXT: call void @__kmpc_spmd_kernel_init(i32 [[VAR]], i16 0, i16 0)
	#pragma omp target simd			#pragma omp target simd
	for (int i = 0; i < N; i++) {			for (int i = 0; i < N; i++) {
	arr[i] = 1;			arr[i] = 1;
	}			}
	return arr[0];			return arr[0];
	}			}

				int test_amdgcn_target_max_parallel_level() {
				// CHECK-LABEL: define weak void @{{.*}}test_amdgcn_target_max_parallel_level
				int arr[N];

				#pragma omp target parallel for
				for (int i = 0; i < N; i++)
				#pragma omp parallel for
				for (int j = 0; j < N; j++)
				#pragma omp parallel for
				for (int k = 0; k < N; k++)
				for (int l = 0; l < N; l++)
				#pragma omp parallel for
				for (int m = 0; m < N; m++)
				arr[m] = 0;

				return arr[0];
				}

				int test_amdgcn_target_attributes_spmd() {
				int arr[N];

				// CHECK: {{.*}}"amdgpu-flat-work-group-size"="10,10"
				#pragma omp target parallel num_threads(10)
				for (int i = 0; i < N; i++) {
				arr[i] = 1;
				}

				return arr[0];
				}

				int test_amdgcn_target_attributes_non_spmd() {
				int arr[N];

				// CHECK: {{.*}}"amdgpu-flat-work-group-size"="74,74"
				#pragma omp target teams thread_limit(10)
				for (int i = 0; i < N; i++) {
				arr[i] = 1;
				}

				return arr[0];
				}

				int test_amdgcn_target_attributes_max_work_group_size() {
				int arr[N];

				// CHECK: {{.*}}"amdgpu-flat-work-group-size"="1024,1024"
				#pragma omp target teams thread_limit(1500)
				for (int i = 0; i < N; i++) {
				arr[i] = 1;
				}

				return arr[0];
				}

	#endif			#endif

				// CHECK: !0 = !{i32 0, i32 [[ARG1:[0-9]+]], i32 [[ARG2:[0-9]+]], !"_Z37test_amdgcn_target_max_parallel_levelv", i32 [[LINE3]], i32 2}
				// CHECK: !1 = !{i32 0, i32 [[ARG1]], i32 [[ARG2]], !"_Z30test_amdgcn_target_tid_threadsv", i32 [[LINE1]], i32 0}
				// CHECK: !2 = !{i32 0, i32 [[ARG1]], i32 [[ARG2]], !"_Z35test_amdgcn_target_tid_threads_simdv", i32 [[LINE2]], i32 1}
				// CHECK: !3 = !{i32 0, i32 [[ARG1]], i32 [[ARG2]], !"_Z38test_amdgcn_target_attributes_non_spmdv", i32 [[LINE5]], i32 4}
				// CHECK: !4 = !{i32 0, i32 [[ARG1]], i32 [[ARG2]], !"_Z34test_amdgcn_target_attributes_spmdv", i32 [[LINE4]], i32 3}
				// CHECK: !5 = !{i32 0, i32 [[ARG1]], i32 [[ARG2]], !"_Z49test_amdgcn_target_attributes_max_work_group_sizev", i32 [[LINE6]], i32 5}
				No newline at end of file
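The CHECK lines above pin down the work-group sizes the AMDGCN path is expected to emit: `num_threads(10)` in SPMD mode gives `"10,10"`, `thread_limit(10)` in non-SPMD mode gives `"74,74"`, and `thread_limit(1500)` gives `"1024,1024"`. The arithmetic below reproduces those values under stated assumptions: a 64-lane wavefront reserved for the master in non-SPMD (generic) mode, as the `sub nuw i32 [[VAR]], 64` check suggests, and a 1024-thread cap, assumed to apply in both modes. The function names are hypothetical; the `_exec_mode` byte follows `Mode ? 0 : 1` in `setPropertyExecutionMode`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Assumed AMDGCN wavefront size and work-group limit (inferred from the
// CHECK lines, not read from a target description).
constexpr uint32_t kWavefrontSize = 64;
constexpr uint32_t kMaxWorkGroupSize = 1024;

// Hypothetical helper: flat work-group size for a requested thread count.
// Generic (non-SPMD) kernels add one wavefront for the master; the result
// is clamped to the assumed hardware limit.
uint32_t flatWorkGroupSize(uint32_t RequestedThreads, bool IsSPMD) {
  uint32_t Size =
      IsSPMD ? RequestedThreads : RequestedThreads + kWavefrontSize;
  return std::min(Size, kMaxWorkGroupSize);
}

// Value stored in the <kernel>_exec_mode global: 0 for SPMD, 1 otherwise.
uint8_t execModeByte(bool IsSPMD) { return IsSPMD ? 0 : 1; }

// Worker threads in generic mode: local size minus the master wavefront.
uint32_t genericWorkerThreads(uint32_t LocalSize) {
  return LocalSize - kWavefrontSize;
}
```

For example, `flatWorkGroupSize(10, false)` is 10 + 64 = 74, and `flatWorkGroupSize(1500, false)` is min(1564, 1024) = 1024, matching the `_wg_size` and `amdgpu-flat-work-group-size` checks.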


Revision Contents

Diff 288072

clang/lib/CodeGen/CGOpenMPRuntime.h

clang/lib/CodeGen/CGOpenMPRuntime.cpp

clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.h

clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp

clang/lib/CodeGen/CGOpenMPRuntimeGPU.h

clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp

clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.h

clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp

clang/test/OpenMP/amdgcn_target_codegen.cpp
