This is an archive of the discontinued LLVM Phabricator instance.

Differential D86097

[OpenMP][AMDGCN] Generate global variables and attributes for AMDGCN
AbandonedPublic

Authored by saiislam on Aug 17 2020, 11:55 AM.

Details

Reviewers

ABataev
jdoerfert
JonChesterfield

Summary

Provide support for amdgcn specific global variables and attributes.
Generalize allocation of various common global variables and provide
their specialized implementations for nvptx and amdgcn.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

saiislam created this revision. · Aug 17 2020, 11:55 AM

Herald added a project: Restricted Project. · Aug 17 2020, 11:55 AM

Herald added subscribers: cfe-commits, guansong, yaxunl and 2 others.

saiislam requested review of this revision. · Aug 17 2020, 11:55 AM

Herald added a subscriber: sstefan1. · Aug 17 2020, 11:55 AM

ABataev added inline comments. · Aug 17 2020, 12:19 PM

clang/lib/CodeGen/CGOpenMPRuntime.h
499	Can this type and corresponding functions be made AMDGCN-specific only?
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp
116	Is this possible?
119	`FlatAttrEmitted`
129	`CompileTimeThreadLimit`
clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
1390–1392	Restore original formatting.
clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
215–243	Make them protected, not public if possible. Try the same for other new functions.
clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.h
40–65	No need to add `virtual`, `override` is enough
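
A minimal sketch of the `virtual`/`override` point above. The class names are hypothetical stand-ins, not the actual Clang classes: a member declared with `override` is already virtual because it overrides a virtual base member, so repeating `virtual` adds nothing.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a generic GPU runtime base class.
struct GpuRuntimeBase {
  virtual ~GpuRuntimeBase() = default;
  virtual std::string targetName() const { return "gpu"; }
};

// Hypothetical stand-in for a target-specific subclass.
struct NvptxRuntime final : GpuRuntimeBase {
  // `override` alone is enough: the function is implicitly virtual, and the
  // compiler rejects the declaration if no matching virtual base member exists.
  std::string targetName() const override { return "nvptx"; }
};

// Dispatches through the base-class interface.
inline std::string describe(const GpuRuntimeBase &RT) {
  std::string Name = RT.targetName();
  assert(!Name.empty() && "every runtime must report a target name");
  return Name;
}
```

Calling `describe` on an `NvptxRuntime` object reaches the subclass implementation even though the subclass never spells `virtual`.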

Harbormaster completed remote builds in B68651: Diff 286101. · Aug 17 2020, 1:00 PM

Moved amdgcn specific functions to CGOpenMPAMDGCN.cpp
Removed tautology condition
Corrected case of local variables
Restored original formatting
Changed back declaration of emit kernel methods as private
Added support of amdgcn specific PrePostActionTy implementation and its corresponding test cases
Changed static line numbers in new test cases with regex
Other small code corrections

Harbormaster completed remote builds in B69371: Diff 287513. · Aug 24 2020, 4:06 PM

Reformat the code

clang/lib/CodeGen/CGOpenMPRuntime.h
498	Remove unnecessary formatting changes.
2489–2493	Better to make it a protected member function if you really require it. Plus, this function is very small and, I think, you can simply create your own copy in CGOpenMPRuntimeAMDGCN.
2497	Same here, make it protected or just create a copy, if it is small.
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.h
29–31	Add comments for all new members
85	1. Do you really need to make this class public? 2. `final`

Reformatting
Comments
Reduced scope of specialized PrePostActionTy

saiislam marked an inline comment as done. · Aug 26 2020, 12:27 PM

saiislam added inline comments.

clang/lib/CodeGen/CGOpenMPRuntime.h
2489–2493	Not making it protected because it is used by various static functions. Also, I don't want to create an object pointer of a subclass of CGOpenMPRuntime in CGOpenMPRuntime.
2497	It calls static functions which in turn call other static functions, so it won't make sense to create a copy of the whole function chain in amdgcn.

Harbormaster completed remote builds in B69656: Diff 288072. · Aug 26 2020, 1:51 PM

Ping.

ABataev added inline comments. · Sep 15 2020, 7:44 AM

clang/lib/CodeGen/CGOpenMPRuntime.h
498	Still not removed
684	Restore original formatting
2492–2497	Better to encapsulate these functions into a new utility class and make them public static.
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.h
34–67	Do you really need to expose all these new members as public?
35	Runtime does not support nested parallelism on GPU. Do you really need it?
93	It does not help to understand the functionality
clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
1972	It leads to a mem leak.
clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
37	1. Make it private or protected. 2. Add default initializer.
187–219	Are all these required to be public?
410–412	Make it private or protected

Removed unnecessary formatting of untouched code.
Encapsulated addFieldToRecordDecl and createGlobalStruct methods in a class and made them static (triggered change at all calling sites).
Marked most of the member methods of CGOpenMPRuntimeAMDGCN as private (forgot to do same change in nvptx)
Fixed the memory leak
Marked appropriate member variables as protected in CGOpenMPRuntimeGPU
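
The encapsulation step in the update above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: `RecordSketch`, `CodegenUtilSketch`, and `buildAttributeRecord` are hypothetical names, and the real helpers operate on Clang AST types rather than strings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified, hypothetical stand-in for clang's RecordDecl.
struct RecordSketch {
  std::vector<std::string> FieldTypes;
};

// Before: a `static` function at file scope has internal linkage and is
// callable only from the .cpp file that defines it.
// After: a public static member declared in a shared header can be called
// from any file that includes that header.
struct CodegenUtilSketch {
  static void addField(RecordSketch &RD, const std::string &Type) {
    assert(!Type.empty() && "field type must be named");
    RD.FieldTypes.push_back(Type);
  }
};

// Stands in for code living in a different file, e.g. a target-specific one.
inline RecordSketch buildAttributeRecord() {
  RecordSketch RD;
  CodegenUtilSketch::addField(RD, "int16_t");
  CodegenUtilSketch::addField(RD, "int8_t");
  return RD;
}
```

The cost noted later in the review is that every existing call site has to gain the class prefix, which is why splitting this change into its own patch makes the rest easier to read.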

JonChesterfield added inline comments. · Oct 15 2020, 8:56 AM

clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp
175	The nvptx emitSPMDKernelWrapper does nothing and the amdgcn one appends some metadata. How about `nvptx::generateMetadata(...)` that does nothing and `amdgcn::generateMetadata(...)` that does this stuff, called from the end of emitSPMDKernel?
197	This metadata generation could be split out from the other changes.
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.h
43	I'm not convinced by this abstraction. It looks like amdgcn and nvptx want almost exactly the same variable in each case. The difference appears to be that nvptx uses internal linkage and amdgcn uses weak + externally initialized, in which case we're better off with `bool nvptx::needsExternalInitialization() {return false;}` `bool amdgpu::needsExternalInitialization() {return true;}` Or, if the inline ternary is unappealing, amdgcn::NewGlobalVariable(...) that passes the arguments to llvm::GlobalVariable while setting the two fields that differ between the two.
clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
170	Please put this back to the previous location so we can see whether it changed in the diff
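
The alternative proposed for CGOpenMPRuntimeAMDGCN.h line 43 above could look roughly like this. It is a simplified model, not LLVM code: all type and function names are hypothetical, and the per-target values simply mirror the comment's reading that nvptx uses internal linkage while amdgcn uses weak linkage plus external initialization.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified model of the properties that differ per target.
enum class LinkageSketch { Internal, WeakAny };

struct GlobalSketch {
  std::string Name;
  LinkageSketch Linkage;
  bool ExternallyInitialized;
};

struct GpuRuntimeSketch {
  virtual ~GpuRuntimeSketch() = default;

  // The only two target-specific decisions.
  virtual LinkageSketch sharedLinkage() const = 0;
  virtual bool needsExternalInitialization() const = 0;

  // Shared creation logic lives once in the base class.
  GlobalSketch createSharedGlobal(const std::string &Name) const {
    assert(!Name.empty() && "global needs a name");
    return GlobalSketch{Name, sharedLinkage(), needsExternalInitialization()};
  }
};

struct NvptxSketch final : GpuRuntimeSketch {
  LinkageSketch sharedLinkage() const override { return LinkageSketch::Internal; }
  bool needsExternalInitialization() const override { return false; }
};

struct AmdgcnSketch final : GpuRuntimeSketch {
  LinkageSketch sharedLinkage() const override { return LinkageSketch::WeakAny; }
  bool needsExternalInitialization() const override { return true; }
};
```

Compared with one `allocate...Global` override per variable, this keeps the differences between targets visible in two small functions, at the price of less freedom for a future target that wants a structurally different global.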

Harbormaster completed remote builds in B75182: Diff 298377. · Oct 15 2020, 9:01 AM

saiislam marked 3 inline comments as done. · Oct 15 2020, 12:13 PM

saiislam added inline comments.

clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp
175	It would then be difficult to track everything that is done differently in the two. So the common code has been generalized, and the specialization consists of no change in nvptx plus some changes in amdgcn.
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.h
43	I understand what you are suggesting. But there are multiple such variables where the linkage differs between nvptx and amdgcn. Also, the current style gives a future implementation the flexibility to define these variables in its own way. What do you think?
clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
170	This movement changes them from private to protected. I could have just added access specifiers and not moved the definitions. That would have simplified the review, but it would have reduced future readability.
187–219	Yes, they are being called from outside class.

JonChesterfield added inline comments. · Oct 19 2020, 7:49 AM

clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp

Perhaps (typed into browser):

llvm::GlobalVariable *CGOpenMPRuntimeNVPTX::createGlobal(
    CodeGenModule &CGM, llvm::ArrayType *Ty, StringRef Name) {
  return new llvm::GlobalVariable(
      CGM.getModule(), Ty, /*isConstant=*/false,
      llvm::GlobalVariable::CommonLinkage, llvm::Constant::getNullValue(Ty),
      Name,
      /*InsertBefore=*/nullptr, llvm::GlobalVariable::NotThreadLocal,
      CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared),
      /*isExternallyInitialized=*/true);
}

llvm::GlobalVariable *CGOpenMPRuntimeAMDGCN::createGlobal(
    CodeGenModule &CGM, llvm::ArrayType *Ty, StringRef Name) {
  return new llvm::GlobalVariable(
      CGM.getModule(), Ty, /*isConstant=*/false,
      llvm::GlobalVariable::WeakAnyLinkage, llvm::Constant::getNullValue(Ty),
      Name,
      /*InsertBefore=*/nullptr, llvm::GlobalVariable::NotThreadLocal,
      CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared),
      /*isExternallyInitialized=*/false);
}

pdhaliwal added a subscriber: pdhaliwal. · Nov 17 2020, 4:48 AM

Simplifies overall patch after D90248.
Removes MaxParallelLevel and thus target specific PrePostActionTy.
Removes ExternallyInitialized qualifier from shared variables for AMDGCN.

Harbormaster completed remote builds in B79814: Diff 307108. · Nov 23 2020, 10:28 AM

JonChesterfield added inline comments. · Nov 23 2020, 6:02 PM

clang/lib/CodeGen/CGOpenMPRuntime.cpp
1355	This appears to be the same as the free function we had before, except now all the call sites are prefixed CodegenUtil. Is there a functional change I'm missing? The rest of this patch would be easier to read with this part split off.
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp
74	This is a very verbose way to say that amdgcn calls emitmetadata at the end of emitkernel and nvptx doesn't. Suggest unconditionally calling emitmetadata, and having emitmetadata be a no-op for nvptx.
86	I think there's a credible chance this is useful to nvptx, so doesn't have to be amdgcn specific
105	I think this is about computing a maximum workgroup size which the runtime uses to limit the number of threads it launches. If so, this is probably useful for nvptx and amdgcn. I'm having trouble working out what the conditions are though. Maybe it's based on an openmp clause?
147	I think I remember seeing a diff that makes this attribute unconditionally emitted by some other part of the toolchain. If so, it may no longer be required
166	HostServices is unused. Mode is redundant with exec_mode. wg_size is redundant with the other wg_size symbol added above. This kern_desc object should be deleted, not upstreamed.
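
The suggestion for line 74 above (call the metadata hook unconditionally and let it be a no-op for nvptx) can be sketched as follows. This is a simplified model with hypothetical names; the "emitted" output is just a list of strings, and the `_wg_size` suffix is used purely for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct KernelRuntimeSketch {
  virtual ~KernelRuntimeSketch() = default;

  // Common kernel emission; it always ends by calling the metadata hook.
  std::vector<std::string> emitKernel(const std::string &Name) {
    assert(!Name.empty() && "kernel needs a name");
    std::vector<std::string> Emitted{"kernel:" + Name};
    emitMetadata(Name, Emitted);
    return Emitted;
  }

protected:
  // Default hook does nothing, which is all nvptx needs in this model.
  virtual void emitMetadata(const std::string &, std::vector<std::string> &) {}
};

// Inherits the no-op hook unchanged.
struct NvptxKernelSketch final : KernelRuntimeSketch {};

struct AmdgcnKernelSketch final : KernelRuntimeSketch {
protected:
  // Appends one extra, target-specific symbol after the kernel itself.
  void emitMetadata(const std::string &Name,
                    std::vector<std::string> &Out) override {
    Out.push_back(Name + "_wg_size");
  }
};
```

With this shape the kernel emission path stays identical for both targets, and the only visible difference is the body of one hook.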

saiislam mentioned this in D92167: [OpenMP][NFC] Encapsulate some CGOpenMPRuntime static methods in a utility class. · Nov 26 2020, 3:24 AM

saiislam marked 3 inline comments as done. · Nov 26 2020, 4:27 AM

saiislam added inline comments.

clang/lib/CodeGen/CGOpenMPRuntime.cpp
1355	addFieldToRecordDecl and createGlobalStruct methods had file static scope. To make them callable from other files, from amdgcn specific file in this case, they were put in this utility class. D92167 puts this change into a separate patch. Will update this patch once D92167 gets accepted.
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp
74	Won't the no-op approach be less extensible? The current way, though verbose, leaves scope for attaching prefix/suffix code as and when required around emitkernel. With a no-op, every implementing arch might have to use the exact same pattern of methods with and without code.
86	You are right, it can be useful for nvptx as well. Maybe we can combine its generalization with the nvptx use case when it arrives in the future?
105	Yes, the if block in 111-147 corresponds to "number of threads" for thread_limit and num_threads clauses in teams and parallel directives.
166	Ok, thanks. Will update in next revision.
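
For reference on the kern_desc discussion (line 166 above), the field list built by emitStructureKernelDesc in Diff 286101 can be mirrored as a plain struct. This is only an illustrative sketch: the struct and function names are hypothetical, and the version value 2 and the size value 9 are taken from the constants in the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mirror of the six fields added to __tgt_attribute_struct in the diff.
struct KernDescSketch {
  std::int16_t Version;
  std::int16_t SizeInBytes;
  std::int16_t WGSize;
  std::int8_t Mode;             // 0 => SPMD, 1 => GENERIC
  std::int8_t HostServices;     // 1 => use HostServices
  std::int8_t MaxParallelLevel; // number of nests
};

// Sum of the field sizes, ignoring any padding the ABI may add.
constexpr std::size_t kernDescFieldBytes() {
  return 3 * sizeof(std::int16_t) + 3 * sizeof(std::int8_t);
}

// Fills in the fixed fields the way the diff's constants do.
inline KernDescSketch makeKernDesc(std::int16_t WGSize, std::int8_t Mode) {
  assert((Mode == 0 || Mode == 1) && "mode is SPMD (0) or GENERIC (1)");
  return KernDescSketch{
      2, static_cast<std::int16_t>(kernDescFieldBytes()), WGSize, Mode, 0, 0};
}
```

The field sizes add up to the 9 bytes recorded in the diff; a natural C++ layout on common ABIs would typically pad `sizeof(KernDescSketch)` to 10. As the review notes, Mode and WGSize duplicate information carried by other symbols.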

I don't believe the contents of this patch are necessary for codegen on amdgpu. One of the internal/weak distinctions works around a bug in the gfx800 toolchain, but we should root cause and fix that bug instead. The kern_desc object is redundant. I think amdgpu-flat-work-group-size is already emitted, but if not, we might want that.

The wg_size code is interesting but architecture independent, and it's probably more user friendly for nvptx and amdgcn to have the same handling of wg_size constraints.

This revision now requires changes to proceed. · Nov 26 2020, 8:42 AM

saiislam abandoned this revision. · Sep 21 2021, 7:24 AM

saiislam marked 3 inline comments as done.

Revision Contents

Path                                              Size
clang/lib/CodeGen/CGOpenMPRuntime.h               10 lines
clang/lib/CodeGen/CGOpenMPRuntime.cpp             41 lines
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.h         44 lines
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp       132 lines
clang/lib/CodeGen/CGOpenMPRuntimeGPU.h            91 lines
clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp          75 lines
clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.h          28 lines
clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp        54 lines
clang/test/OpenMP/amdgcn_target_codegen.cpp       52 lines

Diff 286101

clang/lib/CodeGen/CGOpenMPRuntime.h

[489 lines not shown]	private:
/// char *name; // Name of the function or global.		/// char *name; // Name of the function or global.
/// size_t size; // Size of the entry info (0 if it a function).		/// size_t size; // Size of the entry info (0 if it a function).
/// int32_t flags;		/// int32_t flags;
/// int32_t reserved;		/// int32_t reserved;
/// };		/// };
QualType TgtOffloadEntryQTy;		QualType TgtOffloadEntryQTy;
/// Entity that registers the offloading constants that were emitted so		/// Entity that registers the offloading constants that were emitted so
/// far.		/// far.

		ABataev (Not Done): Remove unnecessary formatting changes.
		ABataev (Not Done): Still not removed
		QualType TgtAttributeStructQTy;
		ABataev (Not Done): Can this type and corresponding functions be made AMDGCN-specific only?
class OffloadEntriesInfoManagerTy {		class OffloadEntriesInfoManagerTy {
CodeGenModule &CGM;		CodeGenModule &CGM;

/// Number of entries registered so far.		/// Number of entries registered so far.
unsigned OffloadingEntriesNum = 0;		unsigned OffloadingEntriesNum = 0;

public:		public:
/// Base class of the entries info.		/// Base class of the entries info.
[170 lines not shown]	bool hasDeviceGlobalVarEntryInfo(StringRef VarName) const {
return OffloadEntriesDeviceGlobalVar.count(VarName) > 0;		return OffloadEntriesDeviceGlobalVar.count(VarName) > 0;
}		}
/// Applies action \a Action on all registered entries.		/// Applies action \a Action on all registered entries.
typedef llvm::function_ref<void(StringRef,		typedef llvm::function_ref<void(StringRef,
const OffloadEntryInfoDeviceGlobalVar &)>		const OffloadEntryInfoDeviceGlobalVar &)>
OffloadDeviceGlobalVarEntryInfoActTy;		OffloadDeviceGlobalVarEntryInfoActTy;
void actOnDeviceGlobalVarEntriesInfo(		void actOnDeviceGlobalVarEntriesInfo(
const OffloadDeviceGlobalVarEntryInfoActTy &Action);		const OffloadDeviceGlobalVarEntryInfoActTy &Action);

ABataev (Not Done): Restore original formatting
private:		private:
// Storage for target region entries kind. The storage is to be indexed by		// Storage for target region entries kind. The storage is to be indexed by
// file ID, device ID, parent function name and line number.		// file ID, device ID, parent function name and line number.
typedef llvm::DenseMap<unsigned, OffloadEntryInfoTargetRegion>		typedef llvm::DenseMap<unsigned, OffloadEntryInfoTargetRegion>
OffloadEntriesTargetRegionPerLine;		OffloadEntriesTargetRegionPerLine;
typedef llvm::StringMap<OffloadEntriesTargetRegionPerLine>		typedef llvm::StringMap<OffloadEntriesTargetRegionPerLine>
OffloadEntriesTargetRegionPerParentName;		OffloadEntriesTargetRegionPerParentName;
typedef llvm::DenseMap<unsigned, OffloadEntriesTargetRegionPerParentName>		typedef llvm::DenseMap<unsigned, OffloadEntriesTargetRegionPerParentName>
[1,057 lines not shown]	public:

/// Emits call of the outlined function with the provided arguments,		/// Emits call of the outlined function with the provided arguments,
/// translating these arguments to correct target-specific arguments.		/// translating these arguments to correct target-specific arguments.
virtual void		virtual void
emitOutlinedFunctionCall(CodeGenFunction &CGF, SourceLocation Loc,		emitOutlinedFunctionCall(CodeGenFunction &CGF, SourceLocation Loc,
llvm::FunctionCallee OutlinedFn,		llvm::FunctionCallee OutlinedFn,
ArrayRef<llvm::Value *> Args = llvm::None) const;		ArrayRef<llvm::Value *> Args = llvm::None) const;

		/// Returns __tgt_attribute_struct type.
		QualType getTgtAttributeStructQTy();

		/// Emit structure descriptor for a kernel
		void emitStructureKernelDesc(CodeGenModule &CGM, StringRef Name,
		int16_t WG_Size, int8_t Mode,
		int8_t HostServices, int8_t MaxParallelLevel);

/// Emits OpenMP-specific function prolog.		/// Emits OpenMP-specific function prolog.
/// Required for device constructs.		/// Required for device constructs.
virtual void emitFunctionProlog(CodeGenFunction &CGF, const Decl *D);		virtual void emitFunctionProlog(CodeGenFunction &CGF, const Decl *D);

/// Gets the OpenMP-specific address of the local variable.		/// Gets the OpenMP-specific address of the local variable.
virtual Address getAddressOfLocalVariable(CodeGenFunction &CGF,		virtual Address getAddressOfLocalVariable(CodeGenFunction &CGF,
const VarDecl *VD);		const VarDecl *VD);

[705 lines not shown]	public:

/// Gets the OpenMP-specific address of the local variable.		/// Gets the OpenMP-specific address of the local variable.
Address getAddressOfLocalVariable(CodeGenFunction &CGF,		Address getAddressOfLocalVariable(CodeGenFunction &CGF,
const VarDecl *VD) override {		const VarDecl *VD) override {
return Address::invalid();		return Address::invalid();
}		}
};		};

} // namespace CodeGen		} // namespace CodeGen
} // namespace clang		} // namespace clang

#endif		#endif
		ABataev (Not Done): Same here, make it protected or just create a copy, if it is small.
		saiislam (Author, Done): It calls static functions which in turn call other static functions, so it won't make sense to create a copy of whole function chain in amdgcn.

clang/lib/CodeGen/CGOpenMPRuntime.cpp

[1,346 lines not shown]	for (const FieldDecl *FD : RD->fields()) {
for (unsigned I = PrevIdx; I < Idx; ++I)		for (unsigned I = PrevIdx; I < Idx; ++I)
Fields.add(llvm::Constant::getNullValue(StructTy->getElementType(I)));		Fields.add(llvm::Constant::getNullValue(StructTy->getElementType(I)));
PrevIdx = Idx + 1;		PrevIdx = Idx + 1;
Fields.add(*DI);		Fields.add(*DI);
++DI;		++DI;
}		}
}		}

template <class... As>		template <class... As>
		JonChesterfield (Not Done): This appears to be the same as the free function we had before, except now all the call sites are prefixed CodegenUtil. Is there a functional change I'm missing? The rest of this patch would be easier to read with this part split off.
		saiislam (Author, Done): addFieldToRecordDecl and createGlobalStruct methods had file static scope. To make them callable from other files, from amdgcn specific file in this case, they were put in this utility class. D92167 puts this change into a separate patch. Will update this patch once D92167 gets accepted.
static llvm::GlobalVariable *		static llvm::GlobalVariable *
createGlobalStruct(CodeGenModule &CGM, QualType Ty, bool IsConstant,		createGlobalStruct(CodeGenModule &CGM, QualType Ty, bool IsConstant,
ArrayRef<llvm::Constant *> Data, const Twine &Name,		ArrayRef<llvm::Constant *> Data, const Twine &Name,
As &&... Args) {		As &&... Args) {
const auto *RD = cast<RecordDecl>(Ty->getAsTagDecl());		const auto *RD = cast<RecordDecl>(Ty->getAsTagDecl());
const CGRecordLayout &RL = CGM.getTypes().getCGRecordLayout(RD);		const CGRecordLayout &RL = CGM.getTypes().getCGRecordLayout(RD);
ConstantInitBuilder CIBuilder(CGM);		ConstantInitBuilder CIBuilder(CGM);
ConstantStructBuilder Fields = CIBuilder.beginStruct(RL.getLLVMType());		ConstantStructBuilder Fields = CIBuilder.beginStruct(RL.getLLVMType());
[1,966 lines not shown]	if (!KmpRoutineEntryPtrTy) {
QualType KmpRoutineEntryTyArgs[] = {KmpInt32Ty, C.VoidPtrTy};		QualType KmpRoutineEntryTyArgs[] = {KmpInt32Ty, C.VoidPtrTy};
FunctionProtoType::ExtProtoInfo EPI;		FunctionProtoType::ExtProtoInfo EPI;
KmpRoutineEntryPtrQTy = C.getPointerType(		KmpRoutineEntryPtrQTy = C.getPointerType(
C.getFunctionType(KmpInt32Ty, KmpRoutineEntryTyArgs, EPI));		C.getFunctionType(KmpInt32Ty, KmpRoutineEntryTyArgs, EPI));
KmpRoutineEntryPtrTy = CGM.getTypes().ConvertType(KmpRoutineEntryPtrQTy);		KmpRoutineEntryPtrTy = CGM.getTypes().ConvertType(KmpRoutineEntryPtrQTy);
}		}
}		}

		/// Emit structure descriptor for a kernel
		void CGOpenMPRuntime::emitStructureKernelDesc(CodeGenModule &CGM,
		StringRef Name, int16_t WG_Size,
		int8_t Mode, int8_t HostServices,
		int8_t MaxParallelLevel) {

		// Create all device images
		llvm::Constant *AttrData[] = {
		llvm::ConstantInt::get(CGM.Int16Ty, 2), // Version
		llvm::ConstantInt::get(CGM.Int16Ty, 9), // Size in bytes
		llvm::ConstantInt::get(CGM.Int16Ty, WG_Size),
		llvm::ConstantInt::get(CGM.Int8Ty, Mode), // 0 => SPMD, 1 => GENERIC
		llvm::ConstantInt::get(CGM.Int8Ty, HostServices), // 1 => use HostServices
		llvm::ConstantInt::get(CGM.Int8Ty, MaxParallelLevel)}; // number of nests

		llvm::GlobalVariable *AttrImages = createGlobalStruct(
		CGM, getTgtAttributeStructQTy(), isDefaultLocationConstant(), AttrData,
		Name + Twine("_kern_desc"), llvm::GlobalValue::WeakAnyLinkage);
		CGM.addCompilerUsedGlobal(AttrImages);
		}

		// Create Tgt Attribute Struct type.
		QualType CGOpenMPRuntime::getTgtAttributeStructQTy() {
		ASTContext &C = CGM.getContext();
		QualType KmpInt8Ty = C.getIntTypeForBitwidth(/*Width=*/8, /*Signed=*/1);
		QualType KmpInt16Ty = C.getIntTypeForBitwidth(/*Width=*/16, /*Signed=*/1);
		if (TgtAttributeStructQTy.isNull()) {
		RecordDecl *RD = C.buildImplicitRecord("__tgt_attribute_struct");
		RD->startDefinition();
		addFieldToRecordDecl(C, RD, KmpInt16Ty); // Version
		addFieldToRecordDecl(C, RD, KmpInt16Ty); // Struct Size in bytes.
		addFieldToRecordDecl(C, RD, KmpInt16Ty); // WG_size
		addFieldToRecordDecl(C, RD, KmpInt8Ty); // Mode
		addFieldToRecordDecl(C, RD, KmpInt8Ty); // HostServices
		addFieldToRecordDecl(C, RD, KmpInt8Ty); // MaxParallelLevel
		RD->completeDefinition();
		TgtAttributeStructQTy = C.getRecordType(RD);
		}
		return TgtAttributeStructQTy;
		}

QualType CGOpenMPRuntime::getTgtOffloadEntryQTy() {		QualType CGOpenMPRuntime::getTgtOffloadEntryQTy() {
// Make sure the type of the entry is already created. This is the type we		// Make sure the type of the entry is already created. This is the type we
// have to create:		// have to create:
// struct __tgt_offload_entry{		// struct __tgt_offload_entry{
// void *addr; // Pointer to the offload entry info.		// void *addr; // Pointer to the offload entry info.
// // (function or global)		// // (function or global)
// char *name; // Name of the function or global.		// char *name; // Name of the function or global.
// size_t size; // Size of the entry info (0 if it a function).		// size_t size; // Size of the entry info (0 if it a function).
[8,783 lines not shown]

clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.h

	[19 lines not shown]
	#include "clang/AST/StmtOpenMP.h"			#include "clang/AST/StmtOpenMP.h"

	namespace clang {			namespace clang {
	namespace CodeGen {			namespace CodeGen {

	class CGOpenMPRuntimeAMDGCN final : public CGOpenMPRuntimeGPU {			class CGOpenMPRuntimeAMDGCN final : public CGOpenMPRuntimeGPU {

	public:			public:
				/// Nesting level of parallel region.
				int ParallelLevel = 0;
				int MaxParallelLevel = 0;

				ABataev (Done): Add comments for all new members
	explicit CGOpenMPRuntimeAMDGCN(CodeGenModule &CGM);			explicit CGOpenMPRuntimeAMDGCN(CodeGenModule &CGM);

	/// Get the GPU warp size.			/// Get the GPU warp size.
	llvm::Value *getGPUWarpSize(CodeGenFunction &CGF) override;			llvm::Value *getGPUWarpSize(CodeGenFunction &CGF) override;
				ABataev (Not Done): Runtime does not support nested parallelism on GPU. Do you really need it?

	/// Get the id of the current thread on the GPU.			/// Get the id of the current thread on the GPU.
	llvm::Value *getGPUThreadID(CodeGenFunction &CGF) override;			llvm::Value *getGPUThreadID(CodeGenFunction &CGF) override;

	/// Get the maximum number of threads in a block of the GPU.			/// Get the maximum number of threads in a block of the GPU.
	llvm::Value *getGPUNumThreads(CodeGenFunction &CGF) override;			llvm::Value *getGPUNumThreads(CodeGenFunction &CGF) override;

				/// Allocate global variable for TransferMedium
				JonChesterfield (Not Done): I'm not convinced by this abstraction. It looks like amdgcn and nvptx want almost exactly the same variable in each case. The difference appears to be that nvptx uses internal linkage and amdgcn uses weak + externally initialized, in which case we're better off with `bool nvptx::needsExternalInitialization() {return false;}` `bool amdgpu::needsExternalInitialization() {return true;}` Or, if the inline ternary is unappealing, amdgcn::NewGlobalVariable(...) that passes the arguments to llvm::GlobalVariable while setting the two fields that differ between the two.
				saiislam (Author, Done): I understand what you are suggesting. But, there are multiple such variables where linkage between nvptx and amdgcn are different. Also current style gives flexibility to a future implementation to define these variables in their own way. What do you think?
				virtual llvm::GlobalVariable *
				allocateTransferMediumGlobal(CodeGenModule &CGM, llvm::ArrayType *Ty,
				StringRef TransferMediumName) override;

				/// Allocate global variable for SharedStaticRD
				virtual llvm::GlobalVariable *
				allocateSharedStaticRDGlobal(CodeGenModule &CGM,
				llvm::Type *LLVMStaticTy) override;

				/// Get global variable KernelStaticGlobalized which is a shared pointer for
				/// the global memory in the global memory buffer used for the given kernel
				virtual llvm::GlobalVariable *
				allocateKernelStaticGlobalized(CodeGenModule &CGM) override;

				/// Emit target specific SPMD kernel
				virtual void emitSPMDKernelWrapper(const OMPExecutableDirective &D,
				StringRef ParentName,
				llvm::Function *&OutlinedFn,
				llvm::Constant *&OutlinedFnID,
				bool IsOffloadEntry,
				const RegionCodeGenTy &CodeGen) override;

				/// Emit target specific Non-SPMD kernel
				virtual void
				ABataev (Done): Do you really need to expose all these new members as public?
				emitNonSPMDKernelWrapper(const OMPExecutableDirective &D,
				StringRef ParentName, llvm::Function *&OutlinedFn,
				llvm::Constant *&OutlinedFnID, bool IsOffloadEntry,
				const RegionCodeGenTy &CodeGen) override;

				/// Create a unique global variable to indicate the flat-work-group-size
				/// for this region. Values are [256..1024].
				static void setPropertyWorkGroupSize(CodeGenModule &CGM, StringRef Name,
				unsigned WGSize);

				/// Generate global variables _wg_size, kern_desc, __tgt_attribute_struct.
				/// Also generate appropriate value of attribute amdgpu-flat-work-group-size
				void generateMetaData(CodeGenModule &CGM, const OMPExecutableDirective &D,
				llvm::Function *&OutlinedFn, bool IsGeneric);
	};			};

	} // namespace CodeGen			} // namespace CodeGen
	} // namespace clang			} // namespace clang
				ABataev (Not Done): 1. Do you really need to make this class public? 2. `final`

	#endif // LLVM_CLANG_LIB_CODEGEN_CGOPENMPRUNTIMEAMDGCN_H			#endif // LLVM_CLANG_LIB_CODEGEN_CGOPENMPRUNTIMEAMDGCN_H
				ABataev (Done): It does not help to understand the functionality

clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp

//===-- CGOpenMPRuntimeAMDGCN.cpp - Interface to OpenMP AMDGCN Runtimes --===//		//===-- CGOpenMPRuntimeAMDGCN.cpp - Interface to OpenMP AMDGCN Runtimes --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This provides a class for OpenMP runtime code generation specialized to		// This provides a class for OpenMP runtime code generation specialized to
// AMDGCN targets from generalized CGOpenMPRuntimeGPU class.		// AMDGCN targets from generalized CGOpenMPRuntimeGPU class.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "CGOpenMPRuntimeAMDGCN.h"		#include "CGOpenMPRuntimeAMDGCN.h"
		#include "CGOpenMPRuntime.h"
#include "CGOpenMPRuntimeGPU.h"		#include "CGOpenMPRuntimeGPU.h"
#include "CodeGenFunction.h"		#include "CodeGenFunction.h"
#include "clang/AST/Attr.h"		#include "clang/AST/Attr.h"
#include "clang/AST/DeclOpenMP.h"		#include "clang/AST/DeclOpenMP.h"
#include "clang/AST/StmtOpenMP.h"		#include "clang/AST/StmtOpenMP.h"
#include "clang/AST/StmtVisitor.h"		#include "clang/AST/StmtVisitor.h"
#include "clang/Basic/Cuda.h"		#include "clang/Basic/Cuda.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"		#include "llvm/IR/IntrinsicsAMDGPU.h"

using namespace clang;		using namespace clang;
using namespace CodeGen;		using namespace CodeGen;
using namespace llvm::omp;		using namespace llvm::omp;

CGOpenMPRuntimeAMDGCN::CGOpenMPRuntimeAMDGCN(CodeGenModule &CGM)		CGOpenMPRuntimeAMDGCN::CGOpenMPRuntimeAMDGCN(CodeGenModule &CGM)
: CGOpenMPRuntimeGPU(CGM) {		: CGOpenMPRuntimeGPU(CGM) {
if (!CGM.getLangOpts().OpenMPIsDevice)		if (!CGM.getLangOpts().OpenMPIsDevice)
llvm_unreachable("OpenMP AMDGCN can only handle device code.");		llvm_unreachable("OpenMP AMDGCN can only handle device code.");
		StaticRDLinkage = llvm::GlobalValue::PrivateLinkage;
}		}

llvm::Value *CGOpenMPRuntimeAMDGCN::getGPUWarpSize(CodeGenFunction &CGF) {		llvm::Value *CGOpenMPRuntimeAMDGCN::getGPUWarpSize(CodeGenFunction &CGF) {
CGBuilderTy &Bld = CGF.Builder;		CGBuilderTy &Bld = CGF.Builder;
// return constant compile-time target-specific warp size		// return constant compile-time target-specific warp size
unsigned WarpSize = CGF.getTarget().getGridValue(llvm::omp::GV_Warp_Size);		unsigned WarpSize = CGF.getTarget().getGridValue(llvm::omp::GV_Warp_Size);
return Bld.getInt32(WarpSize);		return Bld.getInt32(WarpSize);
}		}
[13 lines not shown]	llvm::Value *CGOpenMPRuntimeAMDGCN::getGPUNumThreads(CodeGenFunction &CGF) {
if (!F) {		if (!F) {
F = llvm::Function::Create(		F = llvm::Function::Create(
llvm::FunctionType::get(CGF.Int64Ty, {CGF.Int32Ty}, false),		llvm::FunctionType::get(CGF.Int64Ty, {CGF.Int32Ty}, false),
llvm::GlobalVariable::ExternalLinkage, LocSize, &CGF.CGM.getModule());		llvm::GlobalVariable::ExternalLinkage, LocSize, &CGF.CGM.getModule());
}		}
return Bld.CreateTrunc(		return Bld.CreateTrunc(
Bld.CreateCall(F, {Bld.getInt32(0)}, "nvptx_num_threads"), CGF.Int32Ty);		Bld.CreateCall(F, {Bld.getInt32(0)}, "nvptx_num_threads"), CGF.Int32Ty);
}		}

		llvm::GlobalVariable *CGOpenMPRuntimeAMDGCN::allocateTransferMediumGlobal(
		CodeGenModule &CGM, llvm::ArrayType *Ty, StringRef TransferMediumName) {
		return new llvm::GlobalVariable(
		CGM.getModule(), Ty, /*isConstant=*/false,
		llvm::GlobalVariable::WeakAnyLinkage, llvm::UndefValue::get(Ty),
		TransferMediumName,
		/*InsertBefore=*/nullptr, llvm::GlobalVariable::NotThreadLocal,
		CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared),
		/*isExternallyInitialized*/ true);
		}
		JonChesterfield (Done): This is a very verbose way to say that amdgcn calls emitmetadata at the end of emitkernel and nvptx doesn't. Suggest unconditionally calling emitmetadata, and having emitmetadata be a no-op for nvptx.
		saiislam (Author, Done): Won't the no-op approach be less extensible? The current way, though verbose, leaves scope for attaching prefix/suffix code as and when required around emitkernel. With the no-op approach, every implementing arch might have to use the exact same pattern of methods, with and without code.

		llvm::GlobalVariable *
		CGOpenMPRuntimeAMDGCN::allocateSharedStaticRDGlobal(CodeGenModule &CGM,
		llvm::Type *LLVMStaticTy) {
		return new llvm::GlobalVariable(
		CGM.getModule(), LLVMStaticTy,
		/*isConstant=*/false, llvm::GlobalValue::WeakAnyLinkage,
		llvm::UndefValue::get(LLVMStaticTy), "_openmp_shared_static_glob_rd_$_",
		/*InsertBefore=*/nullptr, llvm::GlobalValue::NotThreadLocal,
		CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared),
		/*isExternallyInitialized*/ true);
		}
		JonChesterfield (Done): I think there's a credible chance this is useful to nvptx, so it doesn't have to be amdgcn specific.
		saiislam (Author, Done): You are right, it can be useful for nvptx as well. Maybe we can combine its generalization with nvptx's use case when it arrives in the future?

		llvm::GlobalVariable *
		CGOpenMPRuntimeAMDGCN::allocateKernelStaticGlobalized(CodeGenModule &CGM) {
		return new llvm::GlobalVariable(
		CGM.getModule(), CGM.VoidPtrTy, /*isConstant=*/false,
		llvm::GlobalValue::WeakAnyLinkage, llvm::UndefValue::get(CGM.VoidPtrTy),
		"_openmp_kernel_static_glob_rd$ptr", /*InsertBefore=*/nullptr,
		llvm::GlobalValue::NotThreadLocal,
		CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared),
		/*isExternallyInitialized*/ true);
		}

		void CGOpenMPRuntimeAMDGCN::setPropertyWorkGroupSize(CodeGenModule &CGM,
		StringRef Name,
		unsigned WGSize) {
		auto *GVMode = new llvm::GlobalVariable(
		CGM.getModule(), CGM.Int16Ty, /*isConstant=*/true,
		llvm::GlobalValue::WeakAnyLinkage,
		llvm::ConstantInt::get(CGM.Int16Ty, WGSize), Name + Twine("_wg_size"),
		JonChesterfield (Done): I think this is about computing a maximum workgroup size which the runtime uses to limit the number of threads it launches. If so, this is probably useful for nvptx and amdgcn. I'm having trouble working out what the conditions are though. Maybe it's based on an openmp clause?
		saiislam (Author, Done): Yes, the if block in 111-147 corresponds to "number of threads" for thread_limit and num_threads clauses in teams and parallel directives.
		/*InsertBefore=*/nullptr, llvm::GlobalVariable::NotThreadLocal,
		CGM.getContext().getTargetAddressSpace(LangAS::cuda_device),
		/*isExternallyInitialized*/ false);
		CGM.addCompilerUsedGlobal(GVMode);
		}

		void CGOpenMPRuntimeAMDGCN::generateMetaData(CodeGenModule &CGM,
		const OMPExecutableDirective &D,
		llvm::Function *&OutlinedFn,
		bool IsGeneric) {
		if (!CGM.getTriple().isAMDGCN())
		ABataev (Not Done): Is this possible?
		return;
		int FlatAttr = 0;
		bool flatAttrEmitted = false;
		ABataev (Not Done): `FlatAttrEmitted`
		unsigned DefaultWorkGroupSz =
		CGM.getTarget().getGridValue(llvm::omp::GVIDX::GV_Default_WG_Size);

		if (isOpenMPTeamsDirective(D.getDirectiveKind()) ||
		isOpenMPParallelDirective(D.getDirectiveKind())) {
		const auto *ThreadLimitClause = D.getSingleClause<OMPThreadLimitClause>();
		const auto *NumThreadsClause = D.getSingleClause<OMPNumThreadsClause>();
		unsigned MaxWorkGroupSz =
		CGM.getTarget().getGridValue(llvm::omp::GVIDX::GV_Max_WG_Size);
		unsigned compileTimeThreadLimit = 0;
		ABataev (Not Done): `CompileTimeThreadLimit`
		// Only one of thread_limit or num_threads is used, cant do it for both
		if (ThreadLimitClause && !NumThreadsClause) {
		Expr *ThreadLimitExpr = ThreadLimitClause->getThreadLimit();
		clang::Expr::EvalResult Result;
		if (ThreadLimitExpr->EvaluateAsInt(Result, CGM.getContext()))
		compileTimeThreadLimit = Result.Val.getInt().getExtValue();
		} else if (!ThreadLimitClause && NumThreadsClause) {
		Expr *NumThreadsExpr = NumThreadsClause->getNumThreads();
		clang::Expr::EvalResult Result;
		if (NumThreadsExpr->EvaluateAsInt(Result, CGM.getContext()))
		compileTimeThreadLimit = Result.Val.getInt().getExtValue();
		}

		// Add kernel metadata if ThreadLimit Clause is compile time constant > 0
		if (compileTimeThreadLimit > 0) {
		// Add the WarpSize to generic, to reflect what runtime dispatch does.
		if (IsGeneric)
		compileTimeThreadLimit +=
		JonChesterfield (Not Done): I think I remember seeing a diff that makes this attribute unconditionally emitted by some other part of the toolchain. If so, it may no longer be required.
		CGM.getTarget().getGridValue(llvm::omp::GVIDX::GV_Warp_Size);
		if (compileTimeThreadLimit > MaxWorkGroupSz)
		compileTimeThreadLimit = MaxWorkGroupSz;
		std::string AttrVal = llvm::utostr(compileTimeThreadLimit);
		FlatAttr = compileTimeThreadLimit;
		OutlinedFn->addFnAttr("amdgpu-flat-work-group-size",
		AttrVal + "," + AttrVal);
		setPropertyWorkGroupSize(CGM, OutlinedFn->getName(),
		compileTimeThreadLimit);
		}
		flatAttrEmitted = true;
		} // end of amdgcn teams or parallel directive

		// emit amdgpu-flat-work-group-size if not emitted already.
		if (!flatAttrEmitted) {
		std::string FlatAttrVal = llvm::utostr(DefaultWorkGroupSz);
		OutlinedFn->addFnAttr("amdgpu-flat-work-group-size",
		FlatAttrVal + "," + FlatAttrVal);
		}
		JonChesterfield (Not Done): HostServices is unused. Mode is redundant with exec_mode. wg_size is redundant with the other wg_size symbol added above. This kern_desc object should be deleted, not upstreamed.
		saiislam (Author, Done): Ok, thanks. Will update in next revision.
		// Emit a kernel descriptor for runtime.
		StringRef KernDescName = OutlinedFn->getName();
		CGOpenMPRuntime::emitStructureKernelDesc(CGM, KernDescName, FlatAttr,
		IsGeneric,
		1, // Uses HostServices
		MaxParallelLevel);
		// Reset it to zero for any subsequent kernel
		MaxParallelLevel = 0;
		}
		JonChesterfield (Not Done): The nvptx emitSPMDKernelWrapper does nothing and the amdgcn one appends some metadata. How about `nvptx::generateMetadata(...)` that does nothing and `amdgcn::generateMetadata(...)` that does this stuff, called from the end of emitSPMDKernel?
		saiislam (Author, Done): It will then be difficult to track what is being done differently in the two. So the common code has been generalized, and (no change in nvptx + some changes in amdgcn) has been used as the specialization.
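The work-group size rule that generateMetaData applies can be hard to follow inside the diff, so here is a minimal standalone sketch of it. The `GridLimits` struct and `flatWorkGroupSizeAttr` function are hypothetical names invented for illustration; they are not part of clang. The sketch also simplifies one point: it falls back to the default size whenever no positive compile-time limit is known, whereas the patch only does so outside teams/parallel directives.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical stand-in for the values the patch reads through
// getGridValue(GV_Warp_Size / GV_Max_WG_Size / GV_Default_WG_Size).
struct GridLimits {
  unsigned WarpSize;
  unsigned MaxWorkGroupSize;
  unsigned DefaultWorkGroupSize;
};

// Builds the "min,max" string used for "amdgpu-flat-work-group-size".
// A limit of 0 means "no compile-time constant was found".
inline std::string flatWorkGroupSizeAttr(unsigned CompileTimeThreadLimit,
                                         bool IsGeneric,
                                         const GridLimits &Limits) {
  assert(Limits.MaxWorkGroupSize > 0 && "target must allow at least 1 thread");
  unsigned Size = Limits.DefaultWorkGroupSize;
  if (CompileTimeThreadLimit > 0) {
    Size = CompileTimeThreadLimit;
    // Generic mode reserves an extra warp, mirroring runtime dispatch.
    if (IsGeneric)
      Size += Limits.WarpSize;
    // Never exceed what the target supports.
    Size = std::min(Size, Limits.MaxWorkGroupSize);
  }
  const std::string Val = std::to_string(Size);
  return Val + "," + Val;
}
```

With illustrative limits of warp size 64, maximum 1024 and default 256 (assumed values, passed in by the caller), a `thread_limit(128)` kernel in generic mode would get `"192,192"`.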

		void CGOpenMPRuntimeAMDGCN::emitSPMDKernelWrapper(
		const OMPExecutableDirective &D, StringRef ParentName,
		llvm::Function *&OutlinedFn, llvm::Constant *&OutlinedFnID,
		bool IsOffloadEntry, const RegionCodeGenTy &CodeGen) {
		emitSPMDKernel(D, ParentName, OutlinedFn, OutlinedFnID, IsOffloadEntry,
		CodeGen);
		generateMetaData(CGM, D, OutlinedFn, /*SPMD*/ false);
		}

		void CGOpenMPRuntimeAMDGCN::emitNonSPMDKernelWrapper(
		const OMPExecutableDirective &D, StringRef ParentName,
		llvm::Function *&OutlinedFn, llvm::Constant *&OutlinedFnID,
		bool IsOffloadEntry, const RegionCodeGenTy &CodeGen) {
		emitNonSPMDKernel(D, ParentName, OutlinedFn, OutlinedFnID, IsOffloadEntry,
		CodeGen);
		generateMetaData(CGM, D, OutlinedFn, /*Generic*/ true);
		}
		JonChesterfield (Not Done): This metadata generation could be split out from the other changes.
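For comparison with the wrapper functions above, the alternative raised in review (an unconditionally called metadata hook that is a no-op on nvptx) might look roughly like the sketch below. All class and member names here are toy stand-ins made up for illustration, not the real clang classes, and the kernel is modelled as a plain struct.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of an outlined kernel: a name plus attached attributes.
struct Kernel {
  std::string Name;
  std::vector<std::string> Attrs;
};

// Hypothetical common GPU runtime: emits the kernel, then always calls
// the hook. Targets override the hook only if they have metadata to add.
class GPURuntimeSketch {
public:
  virtual ~GPURuntimeSketch() = default;

  void emitKernel(Kernel &K, bool IsGeneric) {
    assert(!K.Name.empty() && "kernel must have a name");
    K.Attrs.push_back(IsGeneric ? "generic" : "spmd");
    generateMetadata(K, IsGeneric); // unconditional call
  }

protected:
  // Default is a no-op, which is all nvptx would need.
  virtual void generateMetadata(Kernel &, bool) {}
};

class NVPTXRuntimeSketch final : public GPURuntimeSketch {};

class AMDGCNRuntimeSketch final : public GPURuntimeSketch {
protected:
  void generateMetadata(Kernel &K, bool) override {
    K.Attrs.push_back("amdgpu-flat-work-group-size");
  }
};
```

The trade-off discussed in the thread still applies: the hook keeps the call site identical for every target, while the wrapper approach leaves room for target-specific code both before and after kernel emission.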

clang/lib/CodeGen/CGOpenMPRuntimeGPU.h

(27 lines not shown)	public:
enum ExecutionMode {		enum ExecutionMode {
/// SPMD execution mode (all threads are worker threads).		/// SPMD execution mode (all threads are worker threads).
EM_SPMD,		EM_SPMD,
/// Non-SPMD execution mode (1 master thread, others are workers).		/// Non-SPMD execution mode (1 master thread, others are workers).
EM_NonSPMD,		EM_NonSPMD,
/// Unknown execution mode (orphaned directive).		/// Unknown execution mode (orphaned directive).
EM_Unknown,		EM_Unknown,
};		};
		/// Linkage type of StaticRD Global variable
		llvm::GlobalValue::LinkageTypes StaticRDLinkage;
		ABataev (Done): 1. Make it private or protected. 2. Add a default initializer.
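A small sketch of what the two requested changes amount to: the member moves out of the public section and gets an in-class default, so a target that does not set it still sees a defined value. The enum, class names and the choice of `Internal` as the default are assumptions for illustration only; the real member is an `llvm::GlobalValue::LinkageTypes`.

```cpp
// Toy replacement for llvm::GlobalValue::LinkageTypes.
enum class Linkage { Internal, Private };

class GPURuntimeBase {
public:
  virtual ~GPURuntimeBase() = default;
  // Read-only access for code outside the class hierarchy.
  Linkage staticRDLinkage() const { return StaticRDLinkage; }

protected:
  // Default member initializer (the default value is an assumption).
  Linkage StaticRDLinkage = Linkage::Internal;
};

class NVPTXRuntime : public GPURuntimeBase {};

class AMDGCNRuntime : public GPURuntimeBase {
public:
  // Mirrors the patch's constructor, which selects private linkage.
  AMDGCNRuntime() { StaticRDLinkage = Linkage::Private; }
};
```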

private:		private:
/// Parallel outlined function work for workers to execute.		/// Parallel outlined function work for workers to execute.
llvm::SmallVector<llvm::Function *, 16> Work;		llvm::SmallVector<llvm::Function *, 16> Work;

struct EntryFunctionState {		struct EntryFunctionState {
llvm::BasicBlock *ExitBB = nullptr;		llvm::BasicBlock *ExitBB = nullptr;
};		};

(50 lines not shown)	private:
//		//

/// Creates offloading entry for the provided entry ID \a ID,		/// Creates offloading entry for the provided entry ID \a ID,
/// address \a Addr, size \a Size, and flags \a Flags.		/// address \a Addr, size \a Size, and flags \a Flags.
void createOffloadEntry(llvm::Constant *ID, llvm::Constant *Addr,		void createOffloadEntry(llvm::Constant *ID, llvm::Constant *Addr,
uint64_t Size, int32_t Flags,		uint64_t Size, int32_t Flags,
llvm::GlobalValue::LinkageTypes Linkage) override;		llvm::GlobalValue::LinkageTypes Linkage) override;

/// Emit outlined function specialized for the Fork-Join
/// programming model for applicable target directives on the NVPTX device.
/// \param D Directive to emit.
/// \param ParentName Name of the function that encloses the target region.
/// \param OutlinedFn Outlined function value to be defined by this call.
/// \param OutlinedFnID Outlined function ID value to be defined by this call.
/// \param IsOffloadEntry True if the outlined function is an offload entry.
/// An outlined function may not be an entry if, e.g. the if clause always
/// evaluates to false.
void emitNonSPMDKernel(const OMPExecutableDirective &D, StringRef ParentName,
llvm::Function *&OutlinedFn,
llvm::Constant *&OutlinedFnID, bool IsOffloadEntry,
const RegionCodeGenTy &CodeGen);

/// Emit outlined function specialized for the Single Program
/// Multiple Data programming model for applicable target directives on the
/// NVPTX device.
/// \param D Directive to emit.
/// \param ParentName Name of the function that encloses the target region.
/// \param OutlinedFn Outlined function value to be defined by this call.
/// \param OutlinedFnID Outlined function ID value to be defined by this call.
/// \param IsOffloadEntry True if the outlined function is an offload entry.
/// \param CodeGen Object containing the target statements.
/// An outlined function may not be an entry if, e.g. the if clause always
/// evaluates to false.
void emitSPMDKernel(const OMPExecutableDirective &D, StringRef ParentName,
llvm::Function *&OutlinedFn,
llvm::Constant *&OutlinedFnID, bool IsOffloadEntry,
const RegionCodeGenTy &CodeGen);

/// Emit outlined function for 'target' directive on the NVPTX		/// Emit outlined function for 'target' directive on the NVPTX
/// device.		/// device.
/// \param D Directive to emit.		/// \param D Directive to emit.
/// \param ParentName Name of the function that encloses the target region.		/// \param ParentName Name of the function that encloses the target region.
/// \param OutlinedFn Outlined function value to be defined by this call.		/// \param OutlinedFn Outlined function value to be defined by this call.
/// \param OutlinedFnID Outlined function ID value to be defined by this call.		/// \param OutlinedFnID Outlined function ID value to be defined by this call.
/// \param IsOffloadEntry True if the outlined function is an offload entry.		/// \param IsOffloadEntry True if the outlined function is an offload entry.
/// An outlined function may not be an entry if, e.g. the if clause always		/// An outlined function may not be an entry if, e.g. the if clause always
(49 lines not shown)	protected:
bool isDefaultLocationConstant() const override { return true; }		bool isDefaultLocationConstant() const override { return true; }

/// Returns additional flags that can be stored in reserved_2 field of the		/// Returns additional flags that can be stored in reserved_2 field of the
/// default location.		/// default location.
/// For NVPTX target contains data about SPMD/Non-SPMD execution mode +		/// For NVPTX target contains data about SPMD/Non-SPMD execution mode +
/// Full/Lightweight runtime mode. Used for better optimization.		/// Full/Lightweight runtime mode. Used for better optimization.
unsigned getDefaultLocationReserved2Flags() const override;		unsigned getDefaultLocationReserved2Flags() const override;

public:		public:
		JonChesterfield (Not Done): Please put this back to the previous location so we can see whether it changed in the diff.
		saiislam (Author, Done): This movement changes them from private to protected. I could have just added access specifiers and not moved the definitions. It would have simplified the review, but it would have decreased readability in the future.
explicit CGOpenMPRuntimeGPU(CodeGenModule &CGM);		explicit CGOpenMPRuntimeGPU(CodeGenModule &CGM);
void clear() override;		void clear() override;

/// Declare generalized virtual functions which need to be defined		/// Declare generalized virtual functions which need to be defined
/// by all specializations of OpenMPGPURuntime Targets like AMDGCN		/// by all specializations of OpenMPGPURuntime Targets like AMDGCN
/// and NVPTX.		/// and NVPTX.

/// Get the GPU warp size.		/// Get the GPU warp size.
virtual llvm::Value *getGPUWarpSize(CodeGenFunction &CGF) = 0;		virtual llvm::Value *getGPUWarpSize(CodeGenFunction &CGF) = 0;

/// Get the id of the current thread on the GPU.		/// Get the id of the current thread on the GPU.
virtual llvm::Value *getGPUThreadID(CodeGenFunction &CGF) = 0;		virtual llvm::Value *getGPUThreadID(CodeGenFunction &CGF) = 0;

/// Get the maximum number of threads in a block of the GPU.		/// Get the maximum number of threads in a block of the GPU.
virtual llvm::Value *getGPUNumThreads(CodeGenFunction &CGF) = 0;		virtual llvm::Value *getGPUNumThreads(CodeGenFunction &CGF) = 0;

		/// Allocate global variable for TransferMedium
		virtual llvm::GlobalVariable *
		allocateTransferMediumGlobal(CodeGenModule &CGM, llvm::ArrayType *Ty,
		StringRef TransferMediumName) = 0;

		/// Allocate global variable for SharedStaticRD
		virtual llvm::GlobalVariable *
		allocateSharedStaticRDGlobal(CodeGenModule &CGM,
		llvm::Type *LLVMStaticTy) = 0;

		/// Allocate global variable for KernelStaticGlobalized
		virtual llvm::GlobalVariable *
		allocateKernelStaticGlobalized(CodeGenModule &CGM) = 0;

		virtual void emitSPMDKernelWrapper(const OMPExecutableDirective &D,
		StringRef ParentName,
		llvm::Function *&OutlinedFn,
		llvm::Constant *&OutlinedFnID,
		bool IsOffloadEntry,
		const RegionCodeGenTy &CodeGen) = 0;

		virtual void emitNonSPMDKernelWrapper(const OMPExecutableDirective &D,
		StringRef ParentName,
		llvm::Function *&OutlinedFn,
		llvm::Constant *&OutlinedFnID,
		bool IsOffloadEntry,
		const RegionCodeGenTy &CodeGen) = 0;

		/// Emit outlined function specialized for the Single Program
		/// Multiple Data programming model for applicable target directives on the
		/// NVPTX device.
		/// \param D Directive to emit.
		/// \param ParentName Name of the function that encloses the target region.
		ABataev (Not Done): Are all these required to be public?
		saiislam (Author, Done): Yes, they are being called from outside the class.
		/// \param OutlinedFn Outlined function value to be defined by this call.
		/// \param OutlinedFnID Outlined function ID value to be defined by this call.
		/// \param IsOffloadEntry True if the outlined function is an offload entry.
		/// \param CodeGen Object containing the target statements.
		/// An outlined function may not be an entry if, e.g. the if clause always
		/// evaluates to false.
		void emitSPMDKernel(const OMPExecutableDirective &D, StringRef ParentName,
		llvm::Function *&OutlinedFn,
		llvm::Constant *&OutlinedFnID, bool IsOffloadEntry,
		const RegionCodeGenTy &CodeGen);

		/// Emit outlined function specialized for the Fork-Join
		/// programming model for applicable target directives on the NVPTX device.
		/// \param D Directive to emit.
		/// \param ParentName Name of the function that encloses the target region.
		/// \param OutlinedFn Outlined function value to be defined by this call.
		/// \param OutlinedFnID Outlined function ID value to be defined by this call.
		/// \param IsOffloadEntry True if the outlined function is an offload entry.
		/// An outlined function may not be an entry if, e.g. the if clause always
		/// evaluates to false.
		void emitNonSPMDKernel(const OMPExecutableDirective &D, StringRef ParentName,
		llvm::Function *&OutlinedFn,
		llvm::Constant *&OutlinedFnID, bool IsOffloadEntry,
		const RegionCodeGenTy &CodeGen);
		ABataev (Not Done): Make them protected, not public if possible. Try the same for other new functions.

/// Emit call to void __kmpc_push_proc_bind(ident_t *loc, kmp_int32		/// Emit call to void __kmpc_push_proc_bind(ident_t *loc, kmp_int32
/// global_tid, int proc_bind) to generate code for 'proc_bind' clause.		/// global_tid, int proc_bind) to generate code for 'proc_bind' clause.
virtual void emitProcBindClause(CodeGenFunction &CGF,		virtual void emitProcBindClause(CodeGenFunction &CGF,
llvm::omp::ProcBindKind ProcBind,		llvm::omp::ProcBindKind ProcBind,
SourceLocation Loc) override;		SourceLocation Loc) override;

/// Emits call to void __kmpc_push_num_threads(ident_t *loc, kmp_int32		/// Emits call to void __kmpc_push_num_threads(ident_t *loc, kmp_int32
/// global_tid, kmp_int32 num_threads) to generate code for 'num_threads'		/// global_tid, kmp_int32 num_threads) to generate code for 'num_threads'
(149 lines not shown)	public:
/// their declaration context.		/// their declaration context.
enum DataSharingMode {		enum DataSharingMode {
/// CUDA data sharing mode.		/// CUDA data sharing mode.
CUDA,		CUDA,
/// Generic data-sharing mode.		/// Generic data-sharing mode.
Generic,		Generic,
};		};

/// Cleans up references to the objects in finished function.		/// Cleans up references to the objects in finished function.
///		///
void functionFinished(CodeGenFunction &CGF) override;		void functionFinished(CodeGenFunction &CGF) override;
		ABataev (Not Done): Make it private or protected.

/// Choose a default value for the dist_schedule clause.		/// Choose a default value for the dist_schedule clause.
void getDefaultDistScheduleAndChunk(CodeGenFunction &CGF,		void getDefaultDistScheduleAndChunk(CodeGenFunction &CGF,
const OMPLoopDirective &S, OpenMPDistScheduleClauseKind &ScheduleKind,		const OMPLoopDirective &S, OpenMPDistScheduleClauseKind &ScheduleKind,
llvm::Value *&Chunk) const override;		llvm::Value *&Chunk) const override;

/// Choose a default value for the schedule clause.		/// Choose a default value for the schedule clause.
void getDefaultScheduleAndChunk(CodeGenFunction &CGF,		void getDefaultScheduleAndChunk(CodeGenFunction &CGF,
(115 lines not shown)

clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp

//===---- CGOpenMPRuntimeGPU.cpp - Interface to OpenMP GPU Runtimes ----===//		//===---- CGOpenMPRuntimeGPU.cpp - Interface to OpenMP GPU Runtimes ----===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This provides a generalized class for OpenMP runtime code generation		// This provides a generalized class for OpenMP runtime code generation
// specialized by GPU targets NVPTX and AMDGCN.		// specialized by GPU targets NVPTX and AMDGCN.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "CGOpenMPRuntimeGPU.h"		#include "CGOpenMPRuntimeGPU.h"
		#include "CGOpenMPRuntimeAMDGCN.h"
#include "CGOpenMPRuntimeNVPTX.h"		#include "CGOpenMPRuntimeNVPTX.h"
#include "CodeGenFunction.h"		#include "CodeGenFunction.h"
#include "clang/AST/Attr.h"		#include "clang/AST/Attr.h"
#include "clang/AST/DeclOpenMP.h"		#include "clang/AST/DeclOpenMP.h"
#include "clang/AST/StmtOpenMP.h"		#include "clang/AST/StmtOpenMP.h"
#include "clang/AST/StmtVisitor.h"		#include "clang/AST/StmtVisitor.h"
#include "clang/Basic/Cuda.h"		#include "clang/Basic/Cuda.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
(1,162 lines not shown)	void Exit(CodeGenFunction &CGF) override {
RT.emitNonSPMDEntryFooter(CGF, EST);		RT.emitNonSPMDEntryFooter(CGF, EST);
}		}
} Action(EST, WST);		} Action(EST, WST);
CodeGen.setAction(Action);		CodeGen.setAction(Action);
IsInTTDRegion = true;		IsInTTDRegion = true;
// Reserve place for the globalized memory.		// Reserve place for the globalized memory.
GlobalizedRecords.emplace_back();		GlobalizedRecords.emplace_back();
if (!KernelStaticGlobalized) {		if (!KernelStaticGlobalized) {
KernelStaticGlobalized = new llvm::GlobalVariable(		auto &RT = static_cast<CGOpenMPRuntimeGPU &>(CGM.getOpenMPRuntime());
CGM.getModule(), CGM.VoidPtrTy, /*isConstant=*/false,		KernelStaticGlobalized = RT.allocateKernelStaticGlobalized(CGM);
llvm::GlobalValue::InternalLinkage,
llvm::ConstantPointerNull::get(CGM.VoidPtrTy),
"_openmp_kernel_static_glob_rd$ptr", /*InsertBefore=*/nullptr,
llvm::GlobalValue::NotThreadLocal,
CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared));
}		}
emitTargetOutlinedFunctionHelper(D, ParentName, OutlinedFn, OutlinedFnID,		emitTargetOutlinedFunctionHelper(D, ParentName, OutlinedFn, OutlinedFnID,
IsOffloadEntry, CodeGen);		IsOffloadEntry, CodeGen);
IsInTTDRegion = false;		IsInTTDRegion = false;

// Now change the name of the worker function to correspond to this target		// Now change the name of the worker function to correspond to this target
// region's entry function.		// region's entry function.
WST.WorkerFn->setName(Twine(OutlinedFn->getName(), "_worker"));		WST.WorkerFn->setName(Twine(OutlinedFn->getName(), "_worker"));
(108 lines not shown)	void Exit(CodeGenFunction &CGF) override {
RT.emitSPMDEntryFooter(CGF, EST);		RT.emitSPMDEntryFooter(CGF, EST);
}		}
} Action(*this, EST, D);		} Action(*this, EST, D);
CodeGen.setAction(Action);		CodeGen.setAction(Action);
IsInTTDRegion = true;		IsInTTDRegion = true;
// Reserve place for the globalized memory.		// Reserve place for the globalized memory.
GlobalizedRecords.emplace_back();		GlobalizedRecords.emplace_back();
if (!KernelStaticGlobalized) {		if (!KernelStaticGlobalized) {
KernelStaticGlobalized = new llvm::GlobalVariable(		auto &RT = static_cast<CGOpenMPRuntimeGPU &>(CGM.getOpenMPRuntime());
CGM.getModule(), CGM.VoidPtrTy, /*isConstant=*/false,		KernelStaticGlobalized = RT.allocateKernelStaticGlobalized(CGM);
llvm::GlobalValue::InternalLinkage,
llvm::ConstantPointerNull::get(CGM.VoidPtrTy),
"_openmp_kernel_static_glob_rd$ptr", /*InsertBefore=*/nullptr,
llvm::GlobalValue::NotThreadLocal,
CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared));
}		}
emitTargetOutlinedFunctionHelper(D, ParentName, OutlinedFn, OutlinedFnID,		emitTargetOutlinedFunctionHelper(D, ParentName, OutlinedFn, OutlinedFnID,
IsOffloadEntry, CodeGen);		IsOffloadEntry, CodeGen);
IsInTTDRegion = false;		IsInTTDRegion = false;
}		}

void CGOpenMPRuntimeGPU::emitSPMDEntryHeader(		void CGOpenMPRuntimeGPU::emitSPMDEntryHeader(
CodeGenFunction &CGF, EntryFunctionState &EST,		CodeGenFunction &CGF, EntryFunctionState &EST,
(52 lines not shown)
// Create a unique global variable to indicate the execution mode of this target		// Create a unique global variable to indicate the execution mode of this target
// region. The execution mode is either 'generic', or 'spmd' depending on the		// region. The execution mode is either 'generic', or 'spmd' depending on the
// target directive. This variable is picked up by the offload library to setup		// target directive. This variable is picked up by the offload library to setup
// the device appropriately before kernel launch. If the execution mode is		// the device appropriately before kernel launch. If the execution mode is
// 'generic', the runtime reserves one warp for the master, otherwise, all		// 'generic', the runtime reserves one warp for the master, otherwise, all
// warps participate in parallel work.		// warps participate in parallel work.
static void setPropertyExecutionMode(CodeGenModule &CGM, StringRef Name,		static void setPropertyExecutionMode(CodeGenModule &CGM, StringRef Name,
bool Mode) {		bool Mode) {
auto *GVMode =		auto *GVMode = new llvm::GlobalVariable(
new llvm::GlobalVariable(CGM.getModule(), CGM.Int8Ty, /*isConstant=*/true,		CGM.getModule(), CGM.Int8Ty,
llvm::GlobalValue::WeakAnyLinkage,		/*isConstant=*/true, llvm::GlobalValue::WeakAnyLinkage,
		ABataev (Not Done): Restore original formatting.
llvm::ConstantInt::get(CGM.Int8Ty, Mode ? 0 : 1),		llvm::ConstantInt::get(CGM.Int8Ty, Mode ? 0 : 1),
Twine(Name, "_exec_mode"));		Twine(Name, "_exec_mode"));
CGM.addCompilerUsedGlobal(GVMode);		CGM.addCompilerUsedGlobal(GVMode);
}		}
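To make the naming and encoding of the execution-mode global concrete, here is a minimal sketch that computes only the symbol name and its 8-bit value, without touching LLVM. The function `execModeSymbol` is a hypothetical helper written for this illustration; the `_exec_mode` suffix and the 0/1 encoding are taken from setPropertyExecutionMode above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Returns {symbol name, initializer value} for a kernel's execution-mode
// global: 0 when the kernel runs in SPMD mode, 1 when it runs in generic
// (non-SPMD) mode.
inline std::pair<std::string, std::int8_t>
execModeSymbol(const std::string &KernelName, bool IsSPMD) {
  assert(!KernelName.empty() && "kernel name must not be empty");
  return {KernelName + "_exec_mode",
          static_cast<std::int8_t>(IsSPMD ? 0 : 1)};
}
```

The offload library is described as looking this symbol up before launch, which is why the real global is also registered through addCompilerUsedGlobal so it is not dropped.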

void CGOpenMPRuntimeGPU::emitWorkerFunction(WorkerFunctionState &WST) {		void CGOpenMPRuntimeGPU::emitWorkerFunction(WorkerFunctionState &WST) {
ASTContext &Ctx = CGM.getContext();		ASTContext &Ctx = CGM.getContext();

CodeGenFunction CGF(CGM, /*suppressNewContext=*/true);		CodeGenFunction CGF(CGM, /*suppressNewContext=*/true);
CGF.StartFunction(GlobalDecl(), Ctx.VoidTy, WST.WorkerFn, WST.CGFI, {},		CGF.StartFunction(GlobalDecl(), Ctx.VoidTy, WST.WorkerFn, WST.CGFI, {},
(471 lines not shown)	void CGOpenMPRuntimeGPU::emitTargetOutlinedFunction(
bool IsOffloadEntry, const RegionCodeGenTy &CodeGen) {		bool IsOffloadEntry, const RegionCodeGenTy &CodeGen) {
if (!IsOffloadEntry) // Nothing to do.		if (!IsOffloadEntry) // Nothing to do.
return;		return;

assert(!ParentName.empty() && "Invalid target region parent name!");		assert(!ParentName.empty() && "Invalid target region parent name!");

bool Mode = supportsSPMDExecutionMode(CGM.getContext(), D);		bool Mode = supportsSPMDExecutionMode(CGM.getContext(), D);
if (Mode)		if (Mode)
emitSPMDKernel(D, ParentName, OutlinedFn, OutlinedFnID, IsOffloadEntry,		emitSPMDKernelWrapper(D, ParentName, OutlinedFn, OutlinedFnID,
CodeGen);		IsOffloadEntry, CodeGen);
else		else
emitNonSPMDKernel(D, ParentName, OutlinedFn, OutlinedFnID, IsOffloadEntry,		emitNonSPMDKernelWrapper(D, ParentName, OutlinedFn, OutlinedFnID,
CodeGen);		IsOffloadEntry, CodeGen);

setPropertyExecutionMode(CGM, OutlinedFn->getName(), Mode);		setPropertyExecutionMode(CGM, OutlinedFn->getName(), Mode);
}		}

namespace {		namespace {
LLVM_ENABLE_BITMASK_ENUMS_IN_NAMESPACE();		LLVM_ENABLE_BITMASK_ENUMS_IN_NAMESPACE();
/// Enum for accesseing the reserved_2 field of the ident_t struct.		/// Enum for accesseing the reserved_2 field of the ident_t struct.
enum ModeFlagsTy : unsigned {		enum ModeFlagsTy : unsigned {
/// Bit set to 1 when in SPMD mode.		/// Bit set to 1 when in SPMD mode.
(69 lines not shown)	void Enter(CodeGenFunction &CGF) override {
PrevIsInParallelRegion = IsInParallelRegion;		PrevIsInParallelRegion = IsInParallelRegion;
IsInParallelRegion = true;		IsInParallelRegion = true;
}		}
void Exit(CodeGenFunction &CGF) override {		void Exit(CodeGenFunction &CGF) override {
IsInParallelRegion = PrevIsInParallelRegion;		IsInParallelRegion = PrevIsInParallelRegion;
}		}
} Action(IsInParallelRegion);		} Action(IsInParallelRegion);
CodeGen.setAction(Action);		CodeGen.setAction(Action);
bool PrevIsInTTDRegion = IsInTTDRegion;		bool PrevIsInTTDRegion = IsInTTDRegion;
		ABataev (Not Done): It leads to a mem leak.
IsInTTDRegion = false;		IsInTTDRegion = false;
bool PrevIsInTargetMasterThreadRegion = IsInTargetMasterThreadRegion;		bool PrevIsInTargetMasterThreadRegion = IsInTargetMasterThreadRegion;
IsInTargetMasterThreadRegion = false;		IsInTargetMasterThreadRegion = false;
auto *OutlinedFun =		auto *OutlinedFun =
cast<llvm::Function>(CGOpenMPRuntime::emitParallelOutlinedFunction(		cast<llvm::Function>(CGOpenMPRuntime::emitParallelOutlinedFunction(
D, ThreadIDVar, InnermostKind, CodeGen));		D, ThreadIDVar, InnermostKind, CodeGen));
if (CGM.getLangOpts().Optimize) {		if (CGM.getLangOpts().Optimize) {
OutlinedFun->removeFnAttr(llvm::Attribute::NoInline);		OutlinedFun->removeFnAttr(llvm::Attribute::NoInline);
(1,235 lines not shown)	auto *Fn = llvm::Function::Create(CGM.getTypes().GetFunctionType(CGFI),
llvm::GlobalValue::InternalLinkage,		llvm::GlobalValue::InternalLinkage,
"_omp_reduction_inter_warp_copy_func", &M);		"_omp_reduction_inter_warp_copy_func", &M);
CGM.SetInternalFunctionAttributes(GlobalDecl(), Fn, CGFI);		CGM.SetInternalFunctionAttributes(GlobalDecl(), Fn, CGFI);
Fn->setDoesNotRecurse();		Fn->setDoesNotRecurse();
CodeGenFunction CGF(CGM);		CodeGenFunction CGF(CGM);
CGF.StartFunction(GlobalDecl(), C.VoidTy, Fn, CGFI, Args, Loc, Loc);		CGF.StartFunction(GlobalDecl(), C.VoidTy, Fn, CGFI, Args, Loc, Loc);

CGBuilderTy &Bld = CGF.Builder;		CGBuilderTy &Bld = CGF.Builder;
		auto &RT = static_cast<CGOpenMPRuntimeGPU &>(CGF.CGM.getOpenMPRuntime());

// This array is used as a medium to transfer, one reduce element at a time,		// This array is used as a medium to transfer, one reduce element at a time,
// the data from the first lane of every warp to lanes in the first warp		// the data from the first lane of every warp to lanes in the first warp
// in order to perform the final step of a reduction in a parallel region		// in order to perform the final step of a reduction in a parallel region
// (reduction across warps). The array is placed in NVPTX __shared__ memory		// (reduction across warps). The array is placed in NVPTX __shared__ memory
// for reduced latency, as well as to have a distinct copy for concurrently		// for reduced latency, as well as to have a distinct copy for concurrently
// executing target regions. The array is declared with common linkage so		// executing target regions. The array is declared with common linkage so
// as to be shared across compilation units.		// as to be shared across compilation units.
StringRef TransferMediumName =		StringRef TransferMediumName =
"__openmp_nvptx_data_transfer_temporary_storage";		"__openmp_nvptx_data_transfer_temporary_storage";
llvm::GlobalVariable *TransferMedium =		llvm::GlobalVariable *TransferMedium =
M.getGlobalVariable(TransferMediumName);		M.getGlobalVariable(TransferMediumName);
unsigned WarpSize = CGF.getTarget().getGridValue(llvm::omp::GV_Warp_Size);		unsigned WarpSize = CGF.getTarget().getGridValue(llvm::omp::GV_Warp_Size);
if (!TransferMedium) {		if (!TransferMedium) {
auto *Ty = llvm::ArrayType::get(CGM.Int32Ty, WarpSize);		auto *Ty = llvm::ArrayType::get(CGM.Int32Ty, WarpSize);
unsigned SharedAddressSpace = C.getTargetAddressSpace(LangAS::cuda_shared);		TransferMedium =
TransferMedium = new llvm::GlobalVariable(		RT.allocateTransferMediumGlobal(CGM, Ty, TransferMediumName);
M, Ty, /*isConstant=*/false, llvm::GlobalVariable::CommonLinkage,
llvm::Constant::getNullValue(Ty), TransferMediumName,
/*InsertBefore=*/nullptr, llvm::GlobalVariable::NotThreadLocal,
SharedAddressSpace);
CGM.addCompilerUsedGlobal(TransferMedium);		CGM.addCompilerUsedGlobal(TransferMedium);
}		}

auto &RT = static_cast<CGOpenMPRuntimeGPU &>(CGF.CGM.getOpenMPRuntime());
// Get the CUDA thread id of the current OpenMP thread on the GPU.		// Get the CUDA thread id of the current OpenMP thread on the GPU.
llvm::Value *ThreadID = RT.getGPUThreadID(CGF);		llvm::Value *ThreadID = RT.getGPUThreadID(CGF);
// nvptx_lane_id = nvptx_id % warpsize		// nvptx_lane_id = nvptx_id % warpsize
llvm::Value *LaneID = getNVPTXLaneID(CGF);		llvm::Value *LaneID = getNVPTXLaneID(CGF);
// nvptx_warp_id = nvptx_id / warpsize		// nvptx_warp_id = nvptx_id / warpsize
llvm::Value *WarpID = getNVPTXWarpID(CGF);		llvm::Value *WarpID = getNVPTXWarpID(CGF);

Address AddrReduceListArg = CGF.GetAddrOfLocalVar(&ReduceListArg);		Address AddrReduceListArg = CGF.GetAddrOfLocalVar(&ReduceListArg);
case CudaArch::UNKNOWN:		case CudaArch::UNKNOWN:
break;		break;
case CudaArch::LAST:		case CudaArch::LAST:
llvm_unreachable("Unexpected Cuda arch.");		llvm_unreachable("Unexpected Cuda arch.");
}		}
llvm_unreachable("Unexpected NVPTX target without ptx feature.");		llvm_unreachable("Unexpected NVPTX target without ptx feature.");
}		}

void CGOpenMPRuntimeGPU::clear() {		void CGOpenMPRuntimeGPU::clear() {
		auto &RT = static_cast<CGOpenMPRuntimeGPU &>(CGM.getOpenMPRuntime());
if (!GlobalizedRecords.empty() &&		if (!GlobalizedRecords.empty() &&
!CGM.getLangOpts().OpenMPCUDATargetParallel) {		!CGM.getLangOpts().OpenMPCUDATargetParallel) {
ASTContext &C = CGM.getContext();		ASTContext &C = CGM.getContext();
llvm::SmallVector<const GlobalPtrSizeRecsTy *, 4> GlobalRecs;		llvm::SmallVector<const GlobalPtrSizeRecsTy *, 4> GlobalRecs;
llvm::SmallVector<const GlobalPtrSizeRecsTy *, 4> SharedRecs;		llvm::SmallVector<const GlobalPtrSizeRecsTy *, 4> SharedRecs;
RecordDecl *StaticRD = C.buildImplicitRecord(		RecordDecl *StaticRD = C.buildImplicitRecord(
"_openmp_static_memory_type_$_", RecordDecl::TagKind::TTK_Union);		"_openmp_static_memory_type_$_", RecordDecl::TagKind::TTK_Union);
StaticRD->startDefinition();		StaticRD->startDefinition();
for (const GlobalPtrSizeRecsTy &Records : GlobalizedRecords) {		for (const GlobalPtrSizeRecsTy &Records : GlobalizedRecords) {
StaticRD->addDecl(Field);		StaticRD->addDecl(Field);
GlobalRecs.push_back(&Records);		GlobalRecs.push_back(&Records);
}		}
Records.RecSize->setInitializer(llvm::ConstantInt::get(CGM.SizeTy, Size));		Records.RecSize->setInitializer(llvm::ConstantInt::get(CGM.SizeTy, Size));
Records.UseSharedMemory->setInitializer(		Records.UseSharedMemory->setInitializer(
llvm::ConstantInt::get(CGM.Int16Ty, UseSharedMemory ? 1 : 0));		llvm::ConstantInt::get(CGM.Int16Ty, UseSharedMemory ? 1 : 0));
}		}
// Allocate SharedMemorySize buffer for the shared memory.		// Allocate SharedMemorySize buffer for the shared memory.
// FIXME: nvlink does not handle weak linkage correctly (object with the
// different size are reported as erroneous).
// Restore this code as sson as nvlink is fixed.
if (!SharedStaticRD->field_empty()) {		if (!SharedStaticRD->field_empty()) {
llvm::APInt ArySize(/*numBits=*/64, SharedMemorySize);		llvm::APInt ArySize(/*numBits=*/64, SharedMemorySize);
QualType SubTy = C.getConstantArrayType(		QualType SubTy = C.getConstantArrayType(
C.CharTy, ArySize, nullptr, ArrayType::Normal, /*IndexTypeQuals=*/0);		C.CharTy, ArySize, nullptr, ArrayType::Normal, /*IndexTypeQuals=*/0);
auto *Field = FieldDecl::Create(		auto *Field = FieldDecl::Create(
C, SharedStaticRD, SourceLocation(), SourceLocation(), nullptr, SubTy,		C, SharedStaticRD, SourceLocation(), SourceLocation(), nullptr, SubTy,
C.getTrivialTypeSourceInfo(SubTy, SourceLocation()),		C.getTrivialTypeSourceInfo(SubTy, SourceLocation()),
/*BW=*/nullptr, /*Mutable=*/false,		/*BW=*/nullptr, /*Mutable=*/false,
/*InitStyle=*/ICIS_NoInit);		/*InitStyle=*/ICIS_NoInit);
Field->setAccess(AS_public);		Field->setAccess(AS_public);
SharedStaticRD->addDecl(Field);		SharedStaticRD->addDecl(Field);
}		}
SharedStaticRD->completeDefinition();		SharedStaticRD->completeDefinition();
if (!SharedStaticRD->field_empty()) {		if (!SharedStaticRD->field_empty()) {
QualType StaticTy = C.getRecordType(SharedStaticRD);		QualType StaticTy = C.getRecordType(SharedStaticRD);
llvm::Type *LLVMStaticTy = CGM.getTypes().ConvertTypeForMem(StaticTy);		llvm::Type *LLVMStaticTy = CGM.getTypes().ConvertTypeForMem(StaticTy);
auto *GV = new llvm::GlobalVariable(		auto *GV = RT.allocateSharedStaticRDGlobal(CGM, LLVMStaticTy);
CGM.getModule(), LLVMStaticTy,
/*isConstant=*/false, llvm::GlobalValue::CommonLinkage,
llvm::Constant::getNullValue(LLVMStaticTy),
"_openmp_shared_static_glob_rd_$_", /*InsertBefore=*/nullptr,
llvm::GlobalValue::NotThreadLocal,
C.getTargetAddressSpace(LangAS::cuda_shared));
auto *Replacement = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(		auto *Replacement = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(
GV, CGM.VoidPtrTy);		GV, CGM.VoidPtrTy);
for (const GlobalPtrSizeRecsTy *Rec : SharedRecs) {		for (const GlobalPtrSizeRecsTy *Rec : SharedRecs) {
Rec->Buffer->replaceAllUsesWith(Replacement);		Rec->Buffer->replaceAllUsesWith(Replacement);
Rec->Buffer->eraseFromParent();		Rec->Buffer->eraseFromParent();
}		}
}		}
StaticRD->completeDefinition();		StaticRD->completeDefinition();
if (!StaticRD->field_empty()) {		if (!StaticRD->field_empty()) {
QualType StaticTy = C.getRecordType(StaticRD);		QualType StaticTy = C.getRecordType(StaticRD);
std::pair<unsigned, unsigned> SMsBlockPerSM = getSMsBlocksPerSM(CGM);		std::pair<unsigned, unsigned> SMsBlockPerSM = getSMsBlocksPerSM(CGM);
llvm::APInt Size1(32, SMsBlockPerSM.second);		llvm::APInt Size1(32, SMsBlockPerSM.second);
QualType Arr1Ty =		QualType Arr1Ty =
C.getConstantArrayType(StaticTy, Size1, nullptr, ArrayType::Normal,		C.getConstantArrayType(StaticTy, Size1, nullptr, ArrayType::Normal,
/*IndexTypeQuals=*/0);		/*IndexTypeQuals=*/0);
llvm::APInt Size2(32, SMsBlockPerSM.first);		llvm::APInt Size2(32, SMsBlockPerSM.first);
QualType Arr2Ty =		QualType Arr2Ty =
C.getConstantArrayType(Arr1Ty, Size2, nullptr, ArrayType::Normal,		C.getConstantArrayType(Arr1Ty, Size2, nullptr, ArrayType::Normal,
/*IndexTypeQuals=*/0);		/*IndexTypeQuals=*/0);
llvm::Type *LLVMArr2Ty = CGM.getTypes().ConvertTypeForMem(Arr2Ty);		llvm::Type *LLVMArr2Ty = CGM.getTypes().ConvertTypeForMem(Arr2Ty);
// FIXME: nvlink does not handle weak linkage correctly (object with the		auto *GV =
// different size are reported as erroneous).		new llvm::GlobalVariable(CGM.getModule(), LLVMArr2Ty,
// Restore CommonLinkage as soon as nvlink is fixed.		/*isConstant=*/false, RT.StaticRDLinkage,
auto *GV = new llvm::GlobalVariable(
CGM.getModule(), LLVMArr2Ty,
/*isConstant=*/false, llvm::GlobalValue::InternalLinkage,
llvm::Constant::getNullValue(LLVMArr2Ty),		llvm::Constant::getNullValue(LLVMArr2Ty),
"_openmp_static_glob_rd_$_");		"_openmp_static_glob_rd_$_");
auto *Replacement = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(		auto *Replacement = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(
GV, CGM.VoidPtrTy);		GV, CGM.VoidPtrTy);
for (const GlobalPtrSizeRecsTy *Rec : GlobalRecs) {		for (const GlobalPtrSizeRecsTy *Rec : GlobalRecs) {
Rec->Buffer->replaceAllUsesWith(Replacement);		Rec->Buffer->replaceAllUsesWith(Replacement);
Rec->Buffer->eraseFromParent();		Rec->Buffer->eraseFromParent();
}		}
}		}
}		}
if (!TeamsReductions.empty()) {		if (!TeamsReductions.empty()) {
QualType StaticTy = C.getRecordType(StaticRD);		QualType StaticTy = C.getRecordType(StaticRD);
llvm::Type *LLVMReductionsBufferTy =		llvm::Type *LLVMReductionsBufferTy =
CGM.getTypes().ConvertTypeForMem(StaticTy);		CGM.getTypes().ConvertTypeForMem(StaticTy);
// FIXME: nvlink does not handle weak linkage correctly (object with the		// FIXME: nvlink does not handle weak linkage correctly (object with the
// different size are reported as erroneous).		// different size are reported as erroneous).
// Restore CommonLinkage as soon as nvlink is fixed.		// Restore CommonLinkage as soon as nvlink is fixed.
auto *GV = new llvm::GlobalVariable(		auto *GV = new llvm::GlobalVariable(
CGM.getModule(), LLVMReductionsBufferTy,		CGM.getModule(), LLVMReductionsBufferTy,
/*isConstant=*/false, llvm::GlobalValue::InternalLinkage,		/*isConstant=*/false, RT.StaticRDLinkage,
llvm::Constant::getNullValue(LLVMReductionsBufferTy),		llvm::Constant::getNullValue(LLVMReductionsBufferTy),
"_openmp_teams_reductions_buffer_$_");		"_openmp_teams_reductions_buffer_$_");
KernelTeamsReductionPtr->setInitializer(		KernelTeamsReductionPtr->setInitializer(
llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(GV,		llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(GV,
CGM.VoidPtrTy));		CGM.VoidPtrTy));
}		}
CGOpenMPRuntime::clear();		CGOpenMPRuntime::clear();
}		}

clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.h

public:		public:
/// Get the GPU warp size.		/// Get the GPU warp size.
llvm::Value *getGPUWarpSize(CodeGenFunction &CGF) override;		llvm::Value *getGPUWarpSize(CodeGenFunction &CGF) override;

/// Get the id of the current thread on the GPU.		/// Get the id of the current thread on the GPU.
llvm::Value *getGPUThreadID(CodeGenFunction &CGF) override;		llvm::Value *getGPUThreadID(CodeGenFunction &CGF) override;

/// Get the maximum number of threads in a block of the GPU.		/// Get the maximum number of threads in a block of the GPU.
llvm::Value *getGPUNumThreads(CodeGenFunction &CGF) override;		llvm::Value *getGPUNumThreads(CodeGenFunction &CGF) override;

		/// Allocate global variable for TransferMedium
		virtual llvm::GlobalVariable *
		allocateTransferMediumGlobal(CodeGenModule &CGM, llvm::ArrayType *Ty,
		StringRef Name) override;

		/// Allocate global variable for SharedStaticRD
		virtual llvm::GlobalVariable *
		allocateSharedStaticRDGlobal(CodeGenModule &CGM,
		llvm::Type *LLVMStaticTy) override;

		/// Allocate global variable for KernelStaticGlobalized
		virtual llvm::GlobalVariable *
		allocateKernelStaticGlobalized(CodeGenModule &CGM) override;

		/// Emit target specific SPMD kernel
		virtual void emitSPMDKernelWrapper(const OMPExecutableDirective &D,
		StringRef ParentName,
		llvm::Function *&OutlinedFn,
		llvm::Constant *&OutlinedFnID,
		bool IsOffloadEntry,
		const RegionCodeGenTy &CodeGen) override;
		/// Emit target specific Non-SPMD kernel
		virtual void
		emitNonSPMDKernelWrapper(const OMPExecutableDirective &D,
		StringRef ParentName, llvm::Function *&OutlinedFn,
		llvm::Constant *&OutlinedFnID, bool IsOffloadEntry,
		const RegionCodeGenTy &CodeGen) override;
		ABataev: No need to add `virtual`, `override` is enough
};		};

} // CodeGen namespace.		} // CodeGen namespace.
} // clang namespace.		} // clang namespace.

#endif // LLVM_CLANG_LIB_CODEGEN_CGOPENMPRUNTIMENVPTX_H		#endif // LLVM_CLANG_LIB_CODEGEN_CGOPENMPRUNTIMENVPTX_H
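
The inline comment on `virtual` versus `override` in this header can be illustrated with a small standalone sketch. The class and method names below (`BaseSketch`, `DerivedSketch`, `linkageName`) are hypothetical stand-ins and are not part of the patch. The sketch only shows that `override` on its own already implies a virtual function and additionally asks the compiler to verify that a matching base-class virtual exists.

```cpp
#include <cassert>
#include <string>

// Hypothetical base class standing in for a GPU runtime interface.
struct BaseSketch {
  virtual ~BaseSketch() = default;
  virtual std::string linkageName() const = 0;
};

// Hypothetical derived class standing in for a target-specific runtime.
struct DerivedSketch : BaseSketch {
  // `override` without `virtual`: the function is still virtual because it
  // overrides a virtual base function, and the compiler rejects the
  // declaration if no base function with this exact signature exists.
  std::string linkageName() const override { return "common"; }

  // The following would not compile, because the base function is const:
  // std::string linkageName() override { return "common"; }
};
```

Calling `linkageName()` through a `BaseSketch &` bound to a `DerivedSketch` dispatches to the derived implementation, so repeating `virtual` on the derived declaration adds no information.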

clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp

	using namespace clang;			using namespace clang;
	using namespace CodeGen;			using namespace CodeGen;
	using namespace llvm::omp;			using namespace llvm::omp;

	CGOpenMPRuntimeNVPTX::CGOpenMPRuntimeNVPTX(CodeGenModule &CGM)			CGOpenMPRuntimeNVPTX::CGOpenMPRuntimeNVPTX(CodeGenModule &CGM)
	: CGOpenMPRuntimeGPU(CGM) {			: CGOpenMPRuntimeGPU(CGM) {
	if (!CGM.getLangOpts().OpenMPIsDevice)			if (!CGM.getLangOpts().OpenMPIsDevice)
	llvm_unreachable("OpenMP NVPTX can only handle device code.");			llvm_unreachable("OpenMP NVPTX can only handle device code.");

				// FIXME: nvlink does not handle weak linkage correctly (object with the
				// different size are reported as erroneous).
				// Restore CommonLinkage as soon as nvlink is fixed.
				StaticRDLinkage = llvm::GlobalValue::InternalLinkage;
	}			}

	llvm::Value *CGOpenMPRuntimeNVPTX::getGPUWarpSize(CodeGenFunction &CGF) {			llvm::Value *CGOpenMPRuntimeNVPTX::getGPUWarpSize(CodeGenFunction &CGF) {
	return CGF.EmitRuntimeCall(			return CGF.EmitRuntimeCall(
	llvm::Intrinsic::getDeclaration(			llvm::Intrinsic::getDeclaration(
	&CGF.CGM.getModule(), llvm::Intrinsic::nvvm_read_ptx_sreg_warpsize),			&CGF.CGM.getModule(), llvm::Intrinsic::nvvm_read_ptx_sreg_warpsize),
	"nvptx_warp_size");			"nvptx_warp_size");
	}			}

	llvm::Value *CGOpenMPRuntimeNVPTX::getGPUThreadID(CodeGenFunction &CGF) {			llvm::Value *CGOpenMPRuntimeNVPTX::getGPUThreadID(CodeGenFunction &CGF) {
	CGBuilderTy &Bld = CGF.Builder;			CGBuilderTy &Bld = CGF.Builder;
	llvm::Function *F;			llvm::Function *F;
	F = llvm::Intrinsic::getDeclaration(			F = llvm::Intrinsic::getDeclaration(
	&CGF.CGM.getModule(), llvm::Intrinsic::nvvm_read_ptx_sreg_tid_x);			&CGF.CGM.getModule(), llvm::Intrinsic::nvvm_read_ptx_sreg_tid_x);
	return Bld.CreateCall(F, llvm::None, "nvptx_tid");			return Bld.CreateCall(F, llvm::None, "nvptx_tid");
	}			}

	llvm::Value *CGOpenMPRuntimeNVPTX::getGPUNumThreads(CodeGenFunction &CGF) {			llvm::Value *CGOpenMPRuntimeNVPTX::getGPUNumThreads(CodeGenFunction &CGF) {
	CGBuilderTy &Bld = CGF.Builder;			CGBuilderTy &Bld = CGF.Builder;
	llvm::Function *F;			llvm::Function *F;
	F = llvm::Intrinsic::getDeclaration(			F = llvm::Intrinsic::getDeclaration(
	&CGF.CGM.getModule(), llvm::Intrinsic::nvvm_read_ptx_sreg_ntid_x);			&CGF.CGM.getModule(), llvm::Intrinsic::nvvm_read_ptx_sreg_ntid_x);
	return Bld.CreateCall(F, llvm::None, "nvptx_num_threads");			return Bld.CreateCall(F, llvm::None, "nvptx_num_threads");
	}			}

				llvm::GlobalVariable *CGOpenMPRuntimeNVPTX::allocateTransferMediumGlobal(
JonChesterfield: Perhaps (typed into browser):
```
llvm::GlobalVariable *CGOpenMPRuntimeNVPTX::createGlobal(
    CodeGenModule &CGM, llvm::ArrayType *Ty, StringRef Name) {
  return new llvm::GlobalVariable(
      CGM.getModule(), Ty, /*isConstant=*/false,
      llvm::GlobalVariable::CommonLinkage, llvm::Constant::getNullValue(Ty),
      Name, /*InsertBefore=*/nullptr, llvm::GlobalVariable::NotThreadLocal,
      CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared),
      /*isExternallyInitialized*/ true);
}

llvm::GlobalVariable *CGOpenMPRuntimeAMDGCN::createGlobal(
    CodeGenModule &CGM, llvm::ArrayType *Ty, StringRef Name) {
  return new llvm::GlobalVariable(
      CGM.getModule(), Ty, /*isConstant=*/false,
      llvm::GlobalVariable::WeakAnyLinkage, llvm::Constant::getNullValue(Ty),
      Name, /*InsertBefore=*/nullptr, llvm::GlobalVariable::NotThreadLocal,
      CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared),
      /*isExternallyInitialized*/ false);
}
```
				CodeGenModule &CGM, llvm::ArrayType *Ty, StringRef TransferMediumName) {
				return new llvm::GlobalVariable(
				CGM.getModule(), Ty, /*isConstant=*/false,
				llvm::GlobalVariable::CommonLinkage, llvm::Constant::getNullValue(Ty),
				TransferMediumName,
				/*InsertBefore=*/nullptr, llvm::GlobalVariable::NotThreadLocal,
				CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared));
				}

				llvm::GlobalVariable *
				CGOpenMPRuntimeNVPTX::allocateSharedStaticRDGlobal(CodeGenModule &CGM,
				llvm::Type *LLVMStaticTy) {
				return new llvm::GlobalVariable(
				CGM.getModule(), LLVMStaticTy,
				/*isConstant=*/false, llvm::GlobalValue::CommonLinkage,
				llvm::Constant::getNullValue(LLVMStaticTy),
				"_openmp_shared_static_glob_rd_$_", /*InsertBefore=*/nullptr,
				llvm::GlobalValue::NotThreadLocal,
				CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared));
				}

				llvm::GlobalVariable *
				CGOpenMPRuntimeNVPTX::allocateKernelStaticGlobalized(CodeGenModule &CGM) {
				return new llvm::GlobalVariable(
				CGM.getModule(), CGM.VoidPtrTy, /*isConstant=*/false,
				llvm::GlobalValue::InternalLinkage,
				llvm::ConstantPointerNull::get(CGM.VoidPtrTy),
				"_openmp_kernel_static_glob_rd$ptr", /*InsertBefore=*/nullptr,
				llvm::GlobalValue::NotThreadLocal,
				CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared));
				}

				void CGOpenMPRuntimeNVPTX::emitSPMDKernelWrapper(
				const OMPExecutableDirective &D, StringRef ParentName,
				llvm::Function *&OutlinedFn, llvm::Constant *&OutlinedFnID,
				bool IsOffloadEntry, const RegionCodeGenTy &CodeGen) {
				emitSPMDKernel(D, ParentName, OutlinedFn, OutlinedFnID, IsOffloadEntry,
				CodeGen);
				}

				void CGOpenMPRuntimeNVPTX::emitNonSPMDKernelWrapper(
				const OMPExecutableDirective &D, StringRef ParentName,
				llvm::Function *&OutlinedFn, llvm::Constant *&OutlinedFnID,
				bool IsOffloadEntry, const RegionCodeGenTy &CodeGen) {
				emitNonSPMDKernel(D, ParentName, OutlinedFn, OutlinedFnID, IsOffloadEntry,
				CodeGen);
				}
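
The shape of the refactoring in this file, where the shared GPU code calls a per-target allocation hook instead of constructing the global itself, can be sketched without any LLVM dependency. Everything below is a simplified illustration under stated assumptions: `Linkage`, `GlobalSketch`, and the `*RuntimeSketch` classes are hypothetical stand-ins, the NVPTX linkage mirrors the `CommonLinkage` used above, and the AMDGCN linkage is taken only from the reviewer's `WeakAnyLinkage` suggestion rather than from the AMDGCN implementation, which is not shown in this part of the diff.

```cpp
#include <cassert>
#include <string>

// Hypothetical simplification of llvm::GlobalValue linkage kinds.
enum class Linkage { Internal, Common, WeakAny };

// Hypothetical stand-in for an emitted global variable.
struct GlobalSketch {
  std::string Name;
  Linkage Link;
};

class GpuRuntimeSketch {
public:
  virtual ~GpuRuntimeSketch() = default;

  // Shared code path: picks the name, delegates allocation to the target.
  GlobalSketch emitSharedStaticRD() {
    return allocateSharedStaticRDGlobal("_openmp_shared_static_glob_rd_$_");
  }

protected:
  // Per-target hook, kept protected as the review requested.
  virtual GlobalSketch allocateSharedStaticRDGlobal(const std::string &Name) = 0;
};

class NvptxRuntimeSketch : public GpuRuntimeSketch {
protected:
  GlobalSketch allocateSharedStaticRDGlobal(const std::string &Name) override {
    return {Name, Linkage::Common};
  }
};

class AmdgcnRuntimeSketch : public GpuRuntimeSketch {
protected:
  // Assumption: weak linkage, following the reviewer's suggestion only.
  GlobalSketch allocateSharedStaticRDGlobal(const std::string &Name) override {
    return {Name, Linkage::WeakAny};
  }
};
```

With this layout the common code never names a target-specific linkage or address space, so adding another target means adding one subclass rather than branching inside the shared `clear()` logic.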

clang/test/OpenMP/amdgcn_target_codegen.cpp

	// REQUIRES: amdgpu-registered-target			// REQUIRES: amdgpu-registered-target

	// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc			// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
	// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s			// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s
	// expected-no-diagnostics			// expected-no-diagnostics
	#ifndef HEADER			#ifndef HEADER
	#define HEADER			#define HEADER

	#define N 1000			#define N 1000

				// CHECK: @"_openmp_kernel_static_glob_rd$ptr" = weak addrspace(3) externally_initialized global i8* undef

				// CHECK: @__omp_offloading_{{.*}}test_amdgcn_target_tid_threadsv_l36_kern_desc = weak constant %struct.__tgt_attribute_struct { i16 2, i16 9, i16 0, i8 1, i8 1, i8 0 }, align 2
				// CHECK : @__omp_offloading_{{.*}}test_amdgcn_target_tid_threadsv_l36_exec_mode = weak constant i8 1

				// CHECK : @__omp_offloading_{{.*}}test_amdgcn_target_tid_threads_simdv_l52_kern_desc = weak constant %struct.__tgt_attribute_struct { i16 2, i16 9, i16 0, i8 0, i8 1, i8 0 }, align 2
				// CHECK : @__omp_offloading_{{.*}}test_amdgcn_target_tid_threads_simdv_l52_exec_mode = weak constant i8 0

				// CHECK : @__omp_offloading_{{.*}}test_amdgcn_target_attributes_spmdv_l63_wg_size = weak addrspace(1) constant i16 10
				// CHECK : @__omp_offloading_{{.*}}test_amdgcn_target_attributes_spmdv_l63_kern_desc = weak constant %struct.__tgt_attribute_struct { i16 2, i16 9, i16 10, i8 0, i8 1, i8 0 }, align 2
				// CHECK : @__omp_offloading_{{.*}}test_amdgcn_target_attributes_spmdv_l63_exec_mode = weak constant i8 0

				// CHECK : @__omp_offloading_{{.*}}test_amdgcn_target_attributes_non_spmdv_l75_wg_size = weak addrspace(1) constant i16 74
				// CHECK : @__omp_offloading_{{.*}}test_amdgcn_target_attributes_non_spmdv_l75_kern_desc = weak constant %struct.__tgt_attribute_struct { i16 2, i16 9, i16 74, i8 1, i8 1, i8 0 }, align 2
				// CHECK : @__omp_offloading_{{.*}}test_amdgcn_target_attributes_non_spmdv_l75_exec_mode = weak constant i8 1

	int test_amdgcn_target_tid_threads() {			int test_amdgcn_target_tid_threads() {
	// CHECK-LABEL: define weak void @{{.*}}test_amdgcn_target_tid_threads			// CHECK-LABEL: define weak void @{{.*}}test_amdgcn_target_tid_threads

	int arr[N];			int arr[N];

	// CHECK: [[NUM_THREADS:%.+]] = call i64 @__ockl_get_local_size(i32 0)			// CHECK: [[NUM_THREADS:%.+]] = call i64 @__ockl_get_local_size(i32 0)
	// CHECK-NEXT: [[VAR:%.+]] = trunc i64 [[NUM_THREADS]] to i32			// CHECK-NEXT: [[VAR:%.+]] = trunc i64 [[NUM_THREADS]] to i32
	// CHECK-NEXT: sub nuw i32 [[VAR]], 64			// CHECK-NEXT: sub nuw i32 [[VAR]], 64
	// CHECK-NEXT: call void @__kmpc_spmd_kernel_init(i32 [[VAR]], i16 0, i16 0)			// CHECK-NEXT: call void @__kmpc_spmd_kernel_init(i32 [[VAR]], i16 0, i16 0)
	#pragma omp target simd			#pragma omp target simd
	for (int i = 0; i < N; i++) {			for (int i = 0; i < N; i++) {
	arr[i] = 1;			arr[i] = 1;
	}			}
	return arr[0];			return arr[0];
	}			}

				int test_amdgcn_target_attributes_spmd() {
				int arr[N];

				// CHECK: {{.*}}"amdgpu-flat-work-group-size"="10,10"
				#pragma omp target parallel num_threads(10)
				for (int i = 0; i < N; i++) {
				arr[i] = 1;
				}

				return arr[0];
				}

				int test_amdgcn_target_attributes_non_spmd() {
				int arr[N];

				// CHECK: {{.*}}"amdgpu-flat-work-group-size"="74,74"
				#pragma omp target teams thread_limit(10)
				for (int i = 0; i < N; i++) {
				arr[i] = 1;
				}

				return arr[0];
				}

				int test_amdgcn_target_attributes_max_work_group_size() {
				int arr[N];

				// CHECK: {{.*}}"amdgpu-flat-work-group-size"="1024,1024"
				#pragma omp target teams thread_limit(1500)
				for (int i = 0; i < N; i++) {
				arr[i] = 1;
				}

				return arr[0];
				}

	#endif			#endif
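
The three `amdgpu-flat-work-group-size` expectations in this test (`10,10` for the SPMD region, `74,74` for the non-SPMD region with `thread_limit(10)`, and `1024,1024` for `thread_limit(1500)`) are consistent with a simple rule. The sketch below is inferred from those CHECK lines only and is not the patch's implementation; the function name `flatWorkGroupSizeAttr` and both constants are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Assumed rule, reverse-engineered from the test expectations:
//  - SPMD kernels use the requested thread count as-is;
//  - non-SPMD kernels reserve one extra wavefront (64 threads);
//  - the result is clamped to a maximum work-group size of 1024.
std::string flatWorkGroupSizeAttr(unsigned RequestedThreads, bool IsSPMD) {
  constexpr unsigned WavefrontSize = 64;      // assumption
  constexpr unsigned MaxWorkGroupSize = 1024; // assumption
  unsigned Size = RequestedThreads + (IsSPMD ? 0u : WavefrontSize);
  Size = std::min(Size, MaxWorkGroupSize);
  // The attribute value is a "min,max" pair; the test expects both equal.
  return std::to_string(Size) + "," + std::to_string(Size);
}
```

Under these assumptions, `flatWorkGroupSizeAttr(10, true)` gives `10,10`, `flatWorkGroupSizeAttr(10, false)` gives `74,74`, and `flatWorkGroupSizeAttr(1500, false)` gives `1024,1024`, matching the attribute strings checked above.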
