This is an archive of the discontinued LLVM Phabricator instance.

Differential D86097

[OpenMP][AMDGCN] Generate global variables and attributes for AMDGCN
AbandonedPublic

Authored by saiislam on Aug 17 2020, 11:55 AM.

Details

Reviewers

ABataev
jdoerfert
JonChesterfield

Summary

Provide support for amdgcn specific global variables and attributes.
Generalize allocation of various common global variables and provide
their specialized implementations for nvptx and amdgcn.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	10 ms	linux > Extra Tools Unit Tests.clang-query/_/ClangQueryTests::QueryParserTest.Complete
	390 ms	linux > HWAddressSanitizer-x86_64.TestCases::sizes.cpp
	30 ms	windows > Extra Tools Unit Tests.clang-query/_/ClangQueryTests_exe::QueryParserTest.Complete

Event Timeline

saiislam created this revision. Aug 17 2020, 11:55 AM

Herald added a project: Restricted Project. Aug 17 2020, 11:55 AM

Herald added subscribers: cfe-commits, guansong, yaxunl and 2 others.

saiislam requested review of this revision. Aug 17 2020, 11:55 AM

Herald added a subscriber: sstefan1. Aug 17 2020, 11:55 AM

ABataev added inline comments. Aug 17 2020, 12:19 PM

clang/lib/CodeGen/CGOpenMPRuntime.h
502	Can this type and corresponding functions be made AMDGCN-specific only?
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp
119	Is this possible?
122	`FlatAttrEmitted`
132	`CompileTimeThreadLimit`
clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
1314–1315	Restore original formatting.
clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
249–277	Make them protected, not public if possible. Try the same for other new functions.
clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.h
41–66	No need to add `virtual`, `override` is enough
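
The point about `virtual` and `override` can be illustrated with a minimal, standalone sketch. The class and member names below are hypothetical stand-ins for CGOpenMPRuntimeGPU and CGOpenMPRuntimeNVPTX and are not taken from the patch.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for CGOpenMPRuntimeGPU; targetName() is an
// illustrative member, not one from the patch.
class RuntimeGPU {
public:
  virtual ~RuntimeGPU() = default;
  // Declared virtual once, in the base class.
  virtual std::string targetName() const { return "generic-gpu"; }
};

// Hypothetical stand-in for CGOpenMPRuntimeNVPTX.
class RuntimeNVPTX final : public RuntimeGPU {
public:
  // `override` alone is sufficient: the function is implicitly virtual
  // because it overrides a virtual base member, and the compiler rejects
  // the declaration if no matching virtual base function exists.
  std::string targetName() const override { return "nvptx"; }
};

// Calls through a base reference to show that dynamic dispatch still works.
inline std::string describe(const RuntimeGPU &RT) { return RT.targetName(); }
```

Repeating `virtual` on the derived declaration would compile as well, but it adds no checking beyond what `override` already provides.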

Harbormaster completed remote builds in B68651: Diff 286101. Aug 17 2020, 1:00 PM

Moved amdgcn specific functions to CGOpenMPAMDGCN.cpp
Removed tautology condition
Corrected case of local variables
Restored original formatting
Changed back declaration of emit kernel methods as private
Added support of amdgcn specific PrePostActionTy implementation and its corresponding test cases
Changed static line numbers in new test cases with regex
Other small code corrections

Harbormaster completed remote builds in B69371: Diff 287513. Aug 24 2020, 4:06 PM

Reformat the code

clang/lib/CodeGen/CGOpenMPRuntime.h
501	Remove unnecessary formatting changes.
2491–2495	Better to make it a protected member function if you really require it. Plus, this function is very small and, I think, you can simply create your own copy in CGOpenMPRuntimeAMDGCN.
2499	Same here, make it protected or just create a copy, if it is small.
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.h
29–31	Add comments for all new members
77	Do you really need to make this class public? `final`

Reformatting
Comments
Reduced scope of specialized PrePostActionTy

saiislam marked an inline comment as done. Aug 26 2020, 12:27 PM

saiislam added inline comments.

clang/lib/CodeGen/CGOpenMPRuntime.h
2491–2495	Not making it protected because it is used by various static functions. And don't want to create an object pointer of subclass of CGOpenMPRuntime in CGOpenMPRuntime.
2499	It calls static functions which in turn call other static functions, so it won't make sense to create a copy of whole function chain in amdgcn.

Harbormaster completed remote builds in B69656: Diff 288072. Aug 26 2020, 1:51 PM

Ping.

ABataev added inline comments. Sep 15 2020, 7:44 AM

clang/lib/CodeGen/CGOpenMPRuntime.h
501	Still not removed
688	Restore original formatting
2494–2499	Better to encapsulate these functions into a new utility class and make them public static.
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.h
30–63	Do you really need to expose all these new members as public?
31	Runtime does not support nested parallelism on GPU. Do you really need it?
93	It does not help to understand the functionality
clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
1573	It leads to a mem leak.
clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
37	1. Make it private or protected 2. Add default initializer
221–253	Are all these required to be public?
402–404	Make it private or protected

Removed unnecessary formatting of untouched code.
Encapsulated addFieldToRecordDecl and createGlobalStruct methods in a class and made them static (triggered change at all calling sites).
Marked most of the member methods of CGOpenMPRuntimeAMDGCN as private (forgot to do same change in nvptx)
Fixed the memory leak
Marked appropriate member variables as protected in CGOpenMPRuntimeGPU

JonChesterfield added inline comments. Oct 15 2020, 8:56 AM

clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp
178	The nvptx emitSPMDKernelWrapper does nothing and the amdgcn one appends some metadata. How about 'nvptx::generateMetadata(...)' that does nothing and 'amdgcn::generateMetadata(...)' that does this stuff, called from the end of emitSPMDKernel?
200	This metadata generation could be split out from the other changes.
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.h
43	I'm not convinced by this abstraction. It looks like amdgcn and nvptx want almost exactly the same variable in each case. The difference appears to be that nvptx uses internal linkage and amdgcn uses weak + externally initialized, in which case we're better off with `bool nvptx::needsExternalInitialization() {return false;}` `bool amdgpu::needsExternalInitialization() {return true;}` Or, if the inline ternary is unappealing, amdgcn::NewGlobalVariable(...) that passes the arguments to llvm::GlobalVariable while setting the two fields that differ between the two.
clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
204	Please put this back to the previous location so we can see whether it changed in the diff
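
The suggestion for CGOpenMPRuntimeAMDGCN.h line 43 can be sketched without any LLVM dependency. This is only an illustration under stated assumptions: `Linkage`, `GlobalDesc`, `createSharedGlobal` and `sharedLinkage` are hypothetical names, and the linkage values follow the reviewer's description (internal for nvptx, weak and externally initialized for amdgcn) rather than the actual patch.

```cpp
#include <cassert>
#include <string>

// Simplified stand-in for llvm::GlobalValue linkage kinds (assumption).
enum class Linkage { Internal, WeakAny };

// Minimal description of a global, standing in for llvm::GlobalVariable.
struct GlobalDesc {
  std::string Name;
  Linkage Link;
  bool ExternallyInitialized;
};

class RuntimeGPU {
public:
  virtual ~RuntimeGPU() = default;

  // Shared creation path: only the two target-dependent properties are
  // queried through hooks; everything else stays in common code.
  GlobalDesc createSharedGlobal(const std::string &Name) const {
    return GlobalDesc{Name, sharedLinkage(), needsExternalInitialization()};
  }

protected:
  virtual Linkage sharedLinkage() const = 0;
  virtual bool needsExternalInitialization() const = 0;
};

class RuntimeNVPTX final : public RuntimeGPU {
  Linkage sharedLinkage() const override { return Linkage::Internal; }
  bool needsExternalInitialization() const override { return false; }
};

class RuntimeAMDGCN final : public RuntimeGPU {
  Linkage sharedLinkage() const override { return Linkage::WeakAny; }
  bool needsExternalInitialization() const override { return true; }
};
```

With this shape, a future target would only supply the two small hooks instead of re-implementing each global's allocation, which is the trade-off the comment raises against per-variable virtual functions.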

Harbormaster completed remote builds in B75182: Diff 298377. Oct 15 2020, 9:01 AM

saiislam marked 3 inline comments as done. Oct 15 2020, 12:13 PM

saiislam added inline comments.

clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp
178	It will be then difficult to track what all things are being done differently in the two. So, the common code has been generalized and (no change in nvptx + some changes in amdgcn) has been used as specialization.
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.h
43	I understand what you are suggesting. But there are multiple such variables whose linkage differs between nvptx and amdgcn. Also, the current style gives a future implementation the flexibility to define these variables in its own way. What do you think?
clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
204	This movement changes them from private to protected. I could have just added access specifiers and not move the definitions. It would have simplified the review, but it would have decreased the readability for future.
221–253	Yes, they are being called from outside class.

JonChesterfield added inline comments. Oct 19 2020, 7:49 AM

clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp

Perhaps (typed into browser):

llvm::GlobalVariable *CGOpenMPRuntimeNVPTX::createGlobal(
    CodeGenModule &CGM, llvm::ArrayType *Ty, StringRef Name) {
  return new llvm::GlobalVariable(
      CGM.getModule(), Ty, /*isConstant=*/false,
      llvm::GlobalVariable::CommonLinkage, llvm::Constant::getNullValue(Ty),
      Name,
      /*InsertBefore=*/nullptr, llvm::GlobalVariable::NotThreadLocal,
      CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared),
      /*isExternallyInitialized=*/true);
}

llvm::GlobalVariable *CGOpenMPRuntimeAMDGCN::createGlobal(
    CodeGenModule &CGM, llvm::ArrayType *Ty, StringRef Name) {
  return new llvm::GlobalVariable(
      CGM.getModule(), Ty, /*isConstant=*/false,
      llvm::GlobalVariable::WeakAnyLinkage, llvm::Constant::getNullValue(Ty),
      Name,
      /*InsertBefore=*/nullptr, llvm::GlobalVariable::NotThreadLocal,
      CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared),
      /*isExternallyInitialized=*/false);
}

pdhaliwal added a subscriber: pdhaliwal. Nov 17 2020, 4:48 AM

Simplifies overall patch after D90248.
Removes MaxParallelLevel and thus target specific PrePostActionTy.
Removes ExternallyInitialized qualifier from shared variables for AMDGCN.

Harbormaster completed remote builds in B79814: Diff 307108. Nov 23 2020, 10:28 AM

JonChesterfield added inline comments. Nov 23 2020, 6:02 PM

clang/lib/CodeGen/CGOpenMPRuntime.cpp
1344	This appears to be the same as the free function we had before, except now all the call sites are prefixed CodeGenUtil. Is there a functional change I'm missing? The rest of this patch would be easier to read with this part split off.
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp
77	This is a very verbose way to say that amdgcn calls emitmetadata at the end of emitkernel and nvptx doesn't. Suggest unconditionally calling emitmetadata, and having emitmetadata be a no-op for nvptx.
89	I think there's a credible chance this is useful to nvptx, so doesn't have to be amdgcn specific
108	I think this is about computing a maximum workgroup size which the runtime uses to limit the number of threads it launches. If so, this is probably useful for nvptx and amdgcn. I'm having trouble working out what the conditions are though. Maybe it's based on an openmp clause?
150	I think I remember seeing a diff that makes this attribute unconditionally emitted by some other part of the toolchain. If so, it may no longer be required
169	HostServices is unused. Mode is redundant with exec_mode. wg_size is redundant with the other wg_size symbol added above. This kern_desc object should be deleted, not upstreamed.
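
The no-op approach suggested for CGOpenMPRuntimeAMDGCN.cpp line 77 might look roughly like the following standalone sketch. The classes model the runtimes with plain strings; `emitKernel` and `emitMetadata` are hypothetical names chosen to mirror the comment, not functions from the patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

class RuntimeGPU {
public:
  virtual ~RuntimeGPU() = default;

  // Common kernel emission. The metadata hook is called unconditionally
  // as the last step, so targets never wrap the whole function.
  std::vector<std::string> emitKernel(const std::string &Name) {
    std::vector<std::string> Out{"kernel:" + Name};
    emitMetadata(Name, Out);
    return Out;
  }

protected:
  // Default hook does nothing, which is all the nvptx stand-in needs.
  virtual void emitMetadata(const std::string &, std::vector<std::string> &) {}
};

// Inherits the no-op hook unchanged.
class RuntimeNVPTX final : public RuntimeGPU {};

class RuntimeAMDGCN final : public RuntimeGPU {
  // Appends one illustrative metadata record after the kernel body.
  void emitMetadata(const std::string &Name,
                    std::vector<std::string> &Out) override {
    Out.push_back("metadata:" + Name);
  }
};
```

The author's objection is that a wrapper leaves room for arbitrary prefix and suffix code; the hook version trades that flexibility for a single shared call sequence.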

saiislam mentioned this in D92167: [OpenMP][NFC] Encapsulate some CGOpenMPRuntime static methods in a utility class. Nov 26 2020, 3:24 AM

saiislam marked 3 inline comments as done. Nov 26 2020, 4:27 AM

saiislam added inline comments.

clang/lib/CodeGen/CGOpenMPRuntime.cpp
1344	addFieldToRecordDecl and createGlobalStruct methods had file static scope. To make them callable from other files, from amdgcn specific file in this case, they were put in this utility class. D92167 puts this change into a separate patch. Will update this patch once D92167 gets accepted.
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp
77	Won't the no-op approach be less extensible? Current way, though verbose, leaves scope for attaching prefix/suffix code as and when required around emitkernel. While in case of no-op, every implementing arch might have to use the exact same pattern of methods with and without code.
89	You are right, it can be useful for nvptx as well. Maybe we can combine its generalization with the nvptx use case when it arrives in the future?
108	Yes, the if block in 111-147 corresponds to "number of threads" for thread_limit and num_threads clauses in teams and parallel directives.
169	Ok, thanks. Will update in next revision.
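
The move from file-static helpers to a utility class, as described in the reply for CGOpenMPRuntime.cpp line 1344, can be sketched in isolation as follows. `Record`, `Global`, `addFieldToRecord` and `createGlobal` are hypothetical simplifications of the clang types and of addFieldToRecordDecl and createGlobalStruct; only the class name `CodeGenUtil` comes from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record type standing in for clang's RecordDecl.
struct Record {
  std::vector<std::string> Fields;
};

// Hypothetical global descriptor; Linkage stands in for the extra
// arguments that createGlobalStruct forwards.
struct Global {
  std::string Name;
  bool IsConstant;
  std::string Linkage;
};

// Helpers that were file-static become public static members, so other
// translation units can call them once this declaration lives in a header.
class CodeGenUtil {
public:
  // Appends a field type and returns the index of the new field.
  static std::size_t addFieldToRecord(Record &R, const std::string &Ty) {
    R.Fields.push_back(Ty);
    return R.Fields.size() - 1;
  }

  // Trailing arguments are perfectly forwarded, mirroring `As &&... Args`.
  template <class... As>
  static Global createGlobal(const std::string &Name, bool IsConstant,
                             As &&... Args) {
    return Global{Name, IsConstant, std::string(std::forward<As>(Args)...)};
  }
};
```

In this sketch the template is defined inside the class so that any file including the declaration can instantiate it. A member template that is only declared in a header and defined in a .cpp file would, in general, need explicit instantiations to be usable from other files.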

I don't believe the contents of this patch are necessary for codegen on amdgpu. One of the internal/weak distinctions works around a bug in the gfx800 toolchain, but we should root cause and fix that bug instead. The kern_desc object is redundant. I think amdgpu-flat-work-group-size is already emitted, but if not, we might want that.

The wg_size code is interesting but architecture independent, and it's probably more user friendly for nvptx and amdgcn to have the same handling of wg_size constraints.

This revision now requires changes to proceed. Nov 26 2020, 8:42 AM

saiislam abandoned this revision. Sep 21 2021, 7:24 AM

saiislam marked 3 inline comments as done.

Revision Contents

Path	Size
clang/lib/CodeGen/CGOpenMPRuntime.h	14 lines
clang/lib/CodeGen/CGOpenMPRuntime.cpp	120 lines
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.h	36 lines
clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp	147 lines
clang/lib/CodeGen/CGOpenMPRuntimeGPU.h	23 lines
clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp	17 lines
clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.h	15 lines
clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp	17 lines

Diff 307108

clang/lib/CodeGen/CGOpenMPRuntime.h

[492 lines not shown]	private:
/// char *name; // Name of the function or global.		/// char *name; // Name of the function or global.
/// size_t size; // Size of the entry info (0 if it a function).		/// size_t size; // Size of the entry info (0 if it a function).
/// int32_t flags;		/// int32_t flags;
/// int32_t reserved;		/// int32_t reserved;
/// };		/// };
QualType TgtOffloadEntryQTy;		QualType TgtOffloadEntryQTy;
/// Entity that registers the offloading constants that were emitted so		/// Entity that registers the offloading constants that were emitted so
/// far.		/// far.
class OffloadEntriesInfoManagerTy {		class OffloadEntriesInfoManagerTy {
CodeGenModule &CGM;		CodeGenModule &CGM;

/// Number of entries registered so far.		/// Number of entries registered so far.
unsigned OffloadingEntriesNum = 0;		unsigned OffloadingEntriesNum = 0;

public:		public:
/// Base class of the entries info.		/// Base class of the entries info.
class OffloadEntryInfo {		class OffloadEntryInfo {
public:		public:
[169 lines not shown]	bool hasDeviceGlobalVarEntryInfo(StringRef VarName) const {
return OffloadEntriesDeviceGlobalVar.count(VarName) > 0;		return OffloadEntriesDeviceGlobalVar.count(VarName) > 0;
}		}
/// Applies action \a Action on all registered entries.		/// Applies action \a Action on all registered entries.
typedef llvm::function_ref<void(StringRef,		typedef llvm::function_ref<void(StringRef,
const OffloadEntryInfoDeviceGlobalVar &)>		const OffloadEntryInfoDeviceGlobalVar &)>
OffloadDeviceGlobalVarEntryInfoActTy;		OffloadDeviceGlobalVarEntryInfoActTy;
void actOnDeviceGlobalVarEntriesInfo(		void actOnDeviceGlobalVarEntriesInfo(
const OffloadDeviceGlobalVarEntryInfoActTy &Action);		const OffloadDeviceGlobalVarEntryInfoActTy &Action);

private:		private:
// Storage for target region entries kind. The storage is to be indexed by		// Storage for target region entries kind. The storage is to be indexed by
// file ID, device ID, parent function name and line number.		// file ID, device ID, parent function name and line number.
typedef llvm::DenseMap<unsigned, OffloadEntryInfoTargetRegion>		typedef llvm::DenseMap<unsigned, OffloadEntryInfoTargetRegion>
OffloadEntriesTargetRegionPerLine;		OffloadEntriesTargetRegionPerLine;
typedef llvm::StringMap<OffloadEntriesTargetRegionPerLine>		typedef llvm::StringMap<OffloadEntriesTargetRegionPerLine>
OffloadEntriesTargetRegionPerParentName;		OffloadEntriesTargetRegionPerParentName;
typedef llvm::DenseMap<unsigned, OffloadEntriesTargetRegionPerParentName>		typedef llvm::DenseMap<unsigned, OffloadEntriesTargetRegionPerParentName>
[1,786 lines not shown]	public:

/// Gets the OpenMP-specific address of the local variable.		/// Gets the OpenMP-specific address of the local variable.
Address getAddressOfLocalVariable(CodeGenFunction &CGF,		Address getAddressOfLocalVariable(CodeGenFunction &CGF,
const VarDecl *VD) override {		const VarDecl *VD) override {
return Address::invalid();		return Address::invalid();
}		}
};		};

		/// To encapsulate helper methods to be used by target specific specializations
		/// of CGOpenMPRuntimeGPU.
		class CodeGenUtil {
		public:
		static FieldDecl *addFieldToRecordDecl(ASTContext &C, DeclContext *DC,
		QualType FieldTy);

		template <class... As>
		static llvm::GlobalVariable *
		createGlobalStruct(CodeGenModule &CGM, QualType Ty, bool IsConstant,
		ArrayRef<llvm::Constant *> Data, const Twine &Name,
		As &&... Args);
		};

} // namespace CodeGen		} // namespace CodeGen
} // namespace clang		} // namespace clang

#endif		#endif

clang/lib/CodeGen/CGOpenMPRuntime.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

[1,042 lines not shown]

LValue CGOpenMPTaskOutlinedRegionInfo::getThreadIDVariableLValue(		LValue CGOpenMPTaskOutlinedRegionInfo::getThreadIDVariableLValue(
CodeGenFunction &CGF) {		CodeGenFunction &CGF) {
return CGF.MakeAddrLValue(CGF.GetAddrOfLocalVar(getThreadIDVariable()),		return CGF.MakeAddrLValue(CGF.GetAddrOfLocalVar(getThreadIDVariable()),
getThreadIDVariable()->getType(),		getThreadIDVariable()->getType(),
AlignmentSource::Decl);		AlignmentSource::Decl);
}		}

static FieldDecl *addFieldToRecordDecl(ASTContext &C, DeclContext *DC,
QualType FieldTy) {
auto *Field = FieldDecl::Create(
C, DC, SourceLocation(), SourceLocation(), /*Id=*/nullptr, FieldTy,
C.getTrivialTypeSourceInfo(FieldTy, SourceLocation()),
/*BW=*/nullptr, /*Mutable=*/false, /*InitStyle=*/ICIS_NoInit);
Field->setAccess(AS_public);
DC->addDecl(Field);
return Field;
}

CGOpenMPRuntime::CGOpenMPRuntime(CodeGenModule &CGM, StringRef FirstSeparator,		CGOpenMPRuntime::CGOpenMPRuntime(CodeGenModule &CGM, StringRef FirstSeparator,
StringRef Separator)		StringRef Separator)
: CGM(CGM), FirstSeparator(FirstSeparator), Separator(Separator),		: CGM(CGM), FirstSeparator(FirstSeparator), Separator(Separator),
OMPBuilder(CGM.getModule()), OffloadEntriesInfoManager(CGM) {		OMPBuilder(CGM.getModule()), OffloadEntriesInfoManager(CGM) {
KmpCriticalNameTy = llvm::ArrayType::get(CGM.Int32Ty, /*NumElements*/ 8);		KmpCriticalNameTy = llvm::ArrayType::get(CGM.Int32Ty, /*NumElements*/ 8);

// Initialize Types used in OpenMPIRBuilder from OMPKinds.def		// Initialize Types used in OpenMPIRBuilder from OMPKinds.def
OMPBuilder.initialize();		OMPBuilder.initialize();
[277 lines not shown]	for (const FieldDecl *FD : RD->fields()) {
for (unsigned I = PrevIdx; I < Idx; ++I)		for (unsigned I = PrevIdx; I < Idx; ++I)
Fields.add(llvm::Constant::getNullValue(StructTy->getElementType(I)));		Fields.add(llvm::Constant::getNullValue(StructTy->getElementType(I)));
PrevIdx = Idx + 1;		PrevIdx = Idx + 1;
Fields.add(*DI);		Fields.add(*DI);
++DI;		++DI;
}		}
}		}

		FieldDecl *clang::CodeGen::CodeGenUtil::CodeGenUtil::addFieldToRecordDecl(
		ASTContext &C, DeclContext *DC, QualType FieldTy) {
		auto *Field = FieldDecl::Create(
		C, DC, SourceLocation(), SourceLocation(), /*Id=*/nullptr, FieldTy,
		C.getTrivialTypeSourceInfo(FieldTy, SourceLocation()),
		/*BW=*/nullptr, /*Mutable=*/false, /*InitStyle=*/ICIS_NoInit);
		Field->setAccess(AS_public);
		DC->addDecl(Field);
		return Field;
		}

template <class... As>		template <class... As>
static llvm::GlobalVariable *		llvm::GlobalVariable *clang::CodeGen::CodeGenUtil::createGlobalStruct(
createGlobalStruct(CodeGenModule &CGM, QualType Ty, bool IsConstant,		CodeGenModule &CGM, QualType Ty, bool IsConstant,
ArrayRef<llvm::Constant *> Data, const Twine &Name,		ArrayRef<llvm::Constant *> Data, const Twine &Name, As &&... Args) {
As &&... Args) {
const auto *RD = cast<RecordDecl>(Ty->getAsTagDecl());		const auto *RD = cast<RecordDecl>(Ty->getAsTagDecl());
const CGRecordLayout &RL = CGM.getTypes().getCGRecordLayout(RD);		const CGRecordLayout &RL = CGM.getTypes().getCGRecordLayout(RD);
ConstantInitBuilder CIBuilder(CGM);		ConstantInitBuilder CIBuilder(CGM);
ConstantStructBuilder Fields = CIBuilder.beginStruct(RL.getLLVMType());		ConstantStructBuilder Fields = CIBuilder.beginStruct(RL.getLLVMType());
buildStructValue(Fields, CGM, RD, RL, Data);		buildStructValue(Fields, CGM, RD, RL, Data);
return Fields.finishAndCreateGlobal(		return Fields.finishAndCreateGlobal(
Name, CGM.getContext().getAlignOfGlobalVarInChars(Ty), IsConstant,		Name, CGM.getContext().getAlignOfGlobalVarInChars(Ty), IsConstant,
std::forward<As>(Args)...);		std::forward<As>(Args)...);
[1,709 lines not shown]	void CGOpenMPRuntime::createOffloadEntry(

llvm::Constant *Data[] = {		llvm::Constant *Data[] = {
llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(ID, CGM.VoidPtrTy),		llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(ID, CGM.VoidPtrTy),
llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(Str, CGM.Int8PtrTy),		llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast(Str, CGM.Int8PtrTy),
llvm::ConstantInt::get(CGM.SizeTy, Size),		llvm::ConstantInt::get(CGM.SizeTy, Size),
llvm::ConstantInt::get(CGM.Int32Ty, Flags),		llvm::ConstantInt::get(CGM.Int32Ty, Flags),
llvm::ConstantInt::get(CGM.Int32Ty, 0)};		llvm::ConstantInt::get(CGM.Int32Ty, 0)};
std::string EntryName = getName({"omp_offloading", "entry", ""});		std::string EntryName = getName({"omp_offloading", "entry", ""});
llvm::GlobalVariable *Entry = createGlobalStruct(		llvm::GlobalVariable *Entry = CodeGenUtil::createGlobalStruct(
CGM, getTgtOffloadEntryQTy(), /*IsConstant=*/true, Data,		CGM, getTgtOffloadEntryQTy(), /*IsConstant=*/true, Data,
Twine(EntryName).concat(Name), llvm::GlobalValue::WeakAnyLinkage);		Twine(EntryName).concat(Name), llvm::GlobalValue::WeakAnyLinkage);

// The entry has to be created in the section the linker expects it to be.		// The entry has to be created in the section the linker expects it to be.
Entry->setSection("omp_offloading_entries");		Entry->setSection("omp_offloading_entries");
}		}

void CGOpenMPRuntime::createOffloadEntriesAndInfoMetadata() {		void CGOpenMPRuntime::createOffloadEntriesAndInfoMetadata() {
[261 lines not shown]	QualType CGOpenMPRuntime::getTgtOffloadEntryQTy() {
// size_t size; // Size of the entry info (0 if it a function).		// size_t size; // Size of the entry info (0 if it a function).
// int32_t flags; // Flags associated with the entry, e.g. 'link'.		// int32_t flags; // Flags associated with the entry, e.g. 'link'.
// int32_t reserved; // Reserved, to use by the runtime library.		// int32_t reserved; // Reserved, to use by the runtime library.
// };		// };
if (TgtOffloadEntryQTy.isNull()) {		if (TgtOffloadEntryQTy.isNull()) {
ASTContext &C = CGM.getContext();		ASTContext &C = CGM.getContext();
RecordDecl *RD = C.buildImplicitRecord("__tgt_offload_entry");		RecordDecl *RD = C.buildImplicitRecord("__tgt_offload_entry");
RD->startDefinition();		RD->startDefinition();
addFieldToRecordDecl(C, RD, C.VoidPtrTy);		CodeGenUtil::addFieldToRecordDecl(C, RD, C.VoidPtrTy);
addFieldToRecordDecl(C, RD, C.getPointerType(C.CharTy));		CodeGenUtil::addFieldToRecordDecl(C, RD, C.getPointerType(C.CharTy));
addFieldToRecordDecl(C, RD, C.getSizeType());		CodeGenUtil::addFieldToRecordDecl(C, RD, C.getSizeType());
addFieldToRecordDecl(		CodeGenUtil::addFieldToRecordDecl(
C, RD, C.getIntTypeForBitwidth(/*DestWidth=*/32, /*Signed=*/true));		C, RD, C.getIntTypeForBitwidth(/*DestWidth=*/32, /*Signed=*/true));
addFieldToRecordDecl(		CodeGenUtil::addFieldToRecordDecl(
C, RD, C.getIntTypeForBitwidth(/*DestWidth=*/32, /*Signed=*/true));		C, RD, C.getIntTypeForBitwidth(/*DestWidth=*/32, /*Signed=*/true));
RD->completeDefinition();		RD->completeDefinition();
RD->addAttr(PackedAttr::CreateImplicit(C));		RD->addAttr(PackedAttr::CreateImplicit(C));
TgtOffloadEntryQTy = C.getRecordType(RD);		TgtOffloadEntryQTy = C.getRecordType(RD);
}		}
return TgtOffloadEntryQTy;		return TgtOffloadEntryQTy;
}		}

[41 lines not shown]	for (const auto &Pair : Privates) {
// If the private variable is a local variable with lvalue ref type,		// If the private variable is a local variable with lvalue ref type,
// allocate the pointer instead of the pointee type.		// allocate the pointer instead of the pointee type.
if (Pair.second.isLocalPrivate()) {		if (Pair.second.isLocalPrivate()) {
if (VD->getType()->isLValueReferenceType())		if (VD->getType()->isLValueReferenceType())
Type = C.getPointerType(Type);		Type = C.getPointerType(Type);
if (isAllocatableDecl(VD))		if (isAllocatableDecl(VD))
Type = C.getPointerType(Type);		Type = C.getPointerType(Type);
}		}
FieldDecl *FD = addFieldToRecordDecl(C, RD, Type);		FieldDecl *FD = CodeGenUtil::addFieldToRecordDecl(C, RD, Type);
if (VD->hasAttrs()) {		if (VD->hasAttrs()) {
for (specific_attr_iterator<AlignedAttr> I(VD->getAttrs().begin()),		for (specific_attr_iterator<AlignedAttr> I(VD->getAttrs().begin()),
E(VD->getAttrs().end());		E(VD->getAttrs().end());
I != E; ++I)		I != E; ++I)
FD->addAttr(*I);		FD->addAttr(*I);
}		}
}		}
RD->completeDefinition();		RD->completeDefinition();
[17 lines not shown]	createKmpTaskTRecordDecl(CodeGenModule &CGM, OpenMPDirectiveKind Kind,
// kmp_uint64 lb;		// kmp_uint64 lb;
// kmp_uint64 ub;		// kmp_uint64 ub;
// kmp_int64 st;		// kmp_int64 st;
// kmp_int32 liter;		// kmp_int32 liter;
// void * reductions;		// void * reductions;
// };		// };
RecordDecl *UD = C.buildImplicitRecord("kmp_cmplrdata_t", TTK_Union);		RecordDecl *UD = C.buildImplicitRecord("kmp_cmplrdata_t", TTK_Union);
UD->startDefinition();		UD->startDefinition();
addFieldToRecordDecl(C, UD, KmpInt32Ty);		CodeGenUtil::addFieldToRecordDecl(C, UD, KmpInt32Ty);
addFieldToRecordDecl(C, UD, KmpRoutineEntryPointerQTy);		CodeGenUtil::addFieldToRecordDecl(C, UD, KmpRoutineEntryPointerQTy);
UD->completeDefinition();		UD->completeDefinition();
QualType KmpCmplrdataTy = C.getRecordType(UD);		QualType KmpCmplrdataTy = C.getRecordType(UD);
RecordDecl *RD = C.buildImplicitRecord("kmp_task_t");		RecordDecl *RD = C.buildImplicitRecord("kmp_task_t");
RD->startDefinition();		RD->startDefinition();
addFieldToRecordDecl(C, RD, C.VoidPtrTy);		CodeGenUtil::addFieldToRecordDecl(C, RD, C.VoidPtrTy);
addFieldToRecordDecl(C, RD, KmpRoutineEntryPointerQTy);		CodeGenUtil::addFieldToRecordDecl(C, RD, KmpRoutineEntryPointerQTy);
addFieldToRecordDecl(C, RD, KmpInt32Ty);		CodeGenUtil::addFieldToRecordDecl(C, RD, KmpInt32Ty);
addFieldToRecordDecl(C, RD, KmpCmplrdataTy);		CodeGenUtil::addFieldToRecordDecl(C, RD, KmpCmplrdataTy);
addFieldToRecordDecl(C, RD, KmpCmplrdataTy);		CodeGenUtil::addFieldToRecordDecl(C, RD, KmpCmplrdataTy);
if (isOpenMPTaskLoopDirective(Kind)) {		if (isOpenMPTaskLoopDirective(Kind)) {
QualType KmpUInt64Ty =		QualType KmpUInt64Ty =
CGM.getContext().getIntTypeForBitwidth(/*DestWidth=*/64, /*Signed=*/0);		CGM.getContext().getIntTypeForBitwidth(/*DestWidth=*/64, /*Signed=*/0);
QualType KmpInt64Ty =		QualType KmpInt64Ty =
CGM.getContext().getIntTypeForBitwidth(/*DestWidth=*/64, /*Signed=*/1);		CGM.getContext().getIntTypeForBitwidth(/*DestWidth=*/64, /*Signed=*/1);
addFieldToRecordDecl(C, RD, KmpUInt64Ty);		CodeGenUtil::addFieldToRecordDecl(C, RD, KmpUInt64Ty);
addFieldToRecordDecl(C, RD, KmpUInt64Ty);		CodeGenUtil::addFieldToRecordDecl(C, RD, KmpUInt64Ty);
addFieldToRecordDecl(C, RD, KmpInt64Ty);		CodeGenUtil::addFieldToRecordDecl(C, RD, KmpInt64Ty);
addFieldToRecordDecl(C, RD, KmpInt32Ty);		CodeGenUtil::addFieldToRecordDecl(C, RD, KmpInt32Ty);
addFieldToRecordDecl(C, RD, C.VoidPtrTy);		CodeGenUtil::addFieldToRecordDecl(C, RD, C.VoidPtrTy);
}		}
RD->completeDefinition();		RD->completeDefinition();
return RD;		return RD;
}		}

static RecordDecl *		static RecordDecl *
createKmpTaskTWithPrivatesRecordDecl(CodeGenModule &CGM, QualType KmpTaskTQTy,		createKmpTaskTWithPrivatesRecordDecl(CodeGenModule &CGM, QualType KmpTaskTQTy,
ArrayRef<PrivateDataTy> Privates) {		ArrayRef<PrivateDataTy> Privates) {
ASTContext &C = CGM.getContext();		ASTContext &C = CGM.getContext();
// Build struct kmp_task_t_with_privates {		// Build struct kmp_task_t_with_privates {
// kmp_task_t task_data;		// kmp_task_t task_data;
// .kmp_privates_t. privates;		// .kmp_privates_t. privates;
// };		// };
RecordDecl *RD = C.buildImplicitRecord("kmp_task_t_with_privates");		RecordDecl *RD = C.buildImplicitRecord("kmp_task_t_with_privates");
RD->startDefinition();		RD->startDefinition();
addFieldToRecordDecl(C, RD, KmpTaskTQTy);		CodeGenUtil::addFieldToRecordDecl(C, RD, KmpTaskTQTy);
if (const RecordDecl *PrivateRD = createPrivatesRecordDecl(CGM, Privates))		if (const RecordDecl *PrivateRD = createPrivatesRecordDecl(CGM, Privates))
addFieldToRecordDecl(C, RD, C.getRecordType(PrivateRD));		CodeGenUtil::addFieldToRecordDecl(C, RD, C.getRecordType(PrivateRD));
RD->completeDefinition();		RD->completeDefinition();
return RD;		return RD;
}		}

/// Emit a proxy function which accepts kmp_task_t as the second		/// Emit a proxy function which accepts kmp_task_t as the second
/// argument.		/// argument.
/// \code		/// \code
/// kmp_int32 .omp_task_entry.(kmp_int32 gtid, kmp_task_t *tt) {		/// kmp_int32 .omp_task_entry.(kmp_int32 gtid, kmp_task_t *tt) {
[616 lines not shown]

/// Builds kmp_depend_info, if it is not built yet, and builds flags type.		/// Builds kmp_depend_info, if it is not built yet, and builds flags type.
static void getKmpAffinityType(ASTContext &C, QualType &KmpTaskAffinityInfoTy) {		static void getKmpAffinityType(ASTContext &C, QualType &KmpTaskAffinityInfoTy) {
QualType FlagsTy = C.getIntTypeForBitwidth(32, /*Signed=*/false);		QualType FlagsTy = C.getIntTypeForBitwidth(32, /*Signed=*/false);
if (KmpTaskAffinityInfoTy.isNull()) {		if (KmpTaskAffinityInfoTy.isNull()) {
RecordDecl *KmpAffinityInfoRD =		RecordDecl *KmpAffinityInfoRD =
C.buildImplicitRecord("kmp_task_affinity_info_t");		C.buildImplicitRecord("kmp_task_affinity_info_t");
KmpAffinityInfoRD->startDefinition();		KmpAffinityInfoRD->startDefinition();
addFieldToRecordDecl(C, KmpAffinityInfoRD, C.getIntPtrType());		CodeGenUtil::addFieldToRecordDecl(C, KmpAffinityInfoRD, C.getIntPtrType());
addFieldToRecordDecl(C, KmpAffinityInfoRD, C.getSizeType());		CodeGenUtil::addFieldToRecordDecl(C, KmpAffinityInfoRD, C.getSizeType());
addFieldToRecordDecl(C, KmpAffinityInfoRD, FlagsTy);		CodeGenUtil::addFieldToRecordDecl(C, KmpAffinityInfoRD, FlagsTy);
KmpAffinityInfoRD->completeDefinition();		KmpAffinityInfoRD->completeDefinition();
KmpTaskAffinityInfoTy = C.getRecordType(KmpAffinityInfoRD);		KmpTaskAffinityInfoTy = C.getRecordType(KmpAffinityInfoRD);
}		}
}		}

CGOpenMPRuntime::TaskResultTy		CGOpenMPRuntime::TaskResultTy
CGOpenMPRuntime::emitTaskInit(CodeGenFunction &CGF, SourceLocation Loc,		CGOpenMPRuntime::emitTaskInit(CodeGenFunction &CGF, SourceLocation Loc,
const OMPExecutableDirective &D,		const OMPExecutableDirective &D,
[... 421 lines not shown ...]

/// Builds kmp_depend_info, if it is not built yet, and builds flags type.		/// Builds kmp_depend_info, if it is not built yet, and builds flags type.
static void getDependTypes(ASTContext &C, QualType &KmpDependInfoTy,		static void getDependTypes(ASTContext &C, QualType &KmpDependInfoTy,
QualType &FlagsTy) {		QualType &FlagsTy) {
FlagsTy = C.getIntTypeForBitwidth(C.getTypeSize(C.BoolTy), /*Signed=*/false);		FlagsTy = C.getIntTypeForBitwidth(C.getTypeSize(C.BoolTy), /*Signed=*/false);
if (KmpDependInfoTy.isNull()) {		if (KmpDependInfoTy.isNull()) {
RecordDecl *KmpDependInfoRD = C.buildImplicitRecord("kmp_depend_info");		RecordDecl *KmpDependInfoRD = C.buildImplicitRecord("kmp_depend_info");
KmpDependInfoRD->startDefinition();		KmpDependInfoRD->startDefinition();
addFieldToRecordDecl(C, KmpDependInfoRD, C.getIntPtrType());		CodeGenUtil::addFieldToRecordDecl(C, KmpDependInfoRD, C.getIntPtrType());
addFieldToRecordDecl(C, KmpDependInfoRD, C.getSizeType());		CodeGenUtil::addFieldToRecordDecl(C, KmpDependInfoRD, C.getSizeType());
addFieldToRecordDecl(C, KmpDependInfoRD, FlagsTy);		CodeGenUtil::addFieldToRecordDecl(C, KmpDependInfoRD, FlagsTy);
KmpDependInfoRD->completeDefinition();		KmpDependInfoRD->completeDefinition();
KmpDependInfoTy = C.getRecordType(KmpDependInfoRD);		KmpDependInfoTy = C.getRecordType(KmpDependInfoRD);
}		}
}		}

std::pair<llvm::Value *, LValue>		std::pair<llvm::Value *, LValue>
CGOpenMPRuntime::getDepobjElements(CodeGenFunction &CGF, LValue DepobjLVal,		CGOpenMPRuntime::getDepobjElements(CodeGenFunction &CGF, LValue DepobjLVal,
SourceLocation Loc) {		SourceLocation Loc) {
[... 1,430 lines not shown ...]	llvm::Value *CGOpenMPRuntime::emitTaskReductionInit(
// void *reduce_init; // data initialization routine		// void *reduce_init; // data initialization routine
// void *reduce_fini; // data finalization routine		// void *reduce_fini; // data finalization routine
// void *reduce_comb; // data combiner routine		// void *reduce_comb; // data combiner routine
// kmp_task_red_flags_t flags; // flags for additional info from compiler		// kmp_task_red_flags_t flags; // flags for additional info from compiler
// } kmp_taskred_input_t;		// } kmp_taskred_input_t;
ASTContext &C = CGM.getContext();		ASTContext &C = CGM.getContext();
RecordDecl *RD = C.buildImplicitRecord("kmp_taskred_input_t");		RecordDecl *RD = C.buildImplicitRecord("kmp_taskred_input_t");
RD->startDefinition();		RD->startDefinition();
const FieldDecl *SharedFD = addFieldToRecordDecl(C, RD, C.VoidPtrTy);		const FieldDecl *SharedFD =
const FieldDecl *OrigFD = addFieldToRecordDecl(C, RD, C.VoidPtrTy);		CodeGenUtil::addFieldToRecordDecl(C, RD, C.VoidPtrTy);
const FieldDecl *SizeFD = addFieldToRecordDecl(C, RD, C.getSizeType());		const FieldDecl *OrigFD =
const FieldDecl *InitFD = addFieldToRecordDecl(C, RD, C.VoidPtrTy);		CodeGenUtil::addFieldToRecordDecl(C, RD, C.VoidPtrTy);
const FieldDecl *FiniFD = addFieldToRecordDecl(C, RD, C.VoidPtrTy);		const FieldDecl *SizeFD =
const FieldDecl *CombFD = addFieldToRecordDecl(C, RD, C.VoidPtrTy);		CodeGenUtil::addFieldToRecordDecl(C, RD, C.getSizeType());
const FieldDecl *FlagsFD = addFieldToRecordDecl(		const FieldDecl *InitFD =
		CodeGenUtil::addFieldToRecordDecl(C, RD, C.VoidPtrTy);
		const FieldDecl *FiniFD =
		CodeGenUtil::addFieldToRecordDecl(C, RD, C.VoidPtrTy);
		const FieldDecl *CombFD =
		CodeGenUtil::addFieldToRecordDecl(C, RD, C.VoidPtrTy);
		const FieldDecl *FlagsFD = CodeGenUtil::addFieldToRecordDecl(
C, RD, C.getIntTypeForBitwidth(/*DestWidth=*/32, /*Signed=*/false));		C, RD, C.getIntTypeForBitwidth(/*DestWidth=*/32, /*Signed=*/false));
RD->completeDefinition();		RD->completeDefinition();
QualType RDType = C.getRecordType(RD);		QualType RDType = C.getRecordType(RD);
unsigned Size = Data.ReductionVars.size();		unsigned Size = Data.ReductionVars.size();
llvm::APInt ArraySize(/*numBits=*/64, Size);		llvm::APInt ArraySize(/*numBits=*/64, Size);
QualType ArrayRDType = C.getConstantArrayType(		QualType ArrayRDType = C.getConstantArrayType(
RDType, ArraySize, nullptr, ArrayType::Normal, /*IndexTypeQuals=*/0);		RDType, ArraySize, nullptr, ArrayType::Normal, /*IndexTypeQuals=*/0);
// kmp_task_red_input_t .rd_input.[Size];		// kmp_task_red_input_t .rd_input.[Size];
[... 3,018 lines not shown ...]	static void emitNonContiguousDescriptor(
// uint64_t count;		// uint64_t count;
// uint64_t stride		// uint64_t stride
// };		// };
ASTContext &C = CGF.getContext();		ASTContext &C = CGF.getContext();
QualType Int64Ty = C.getIntTypeForBitwidth(/*DestWidth=*/64, /*Signed=*/0);		QualType Int64Ty = C.getIntTypeForBitwidth(/*DestWidth=*/64, /*Signed=*/0);
RecordDecl *RD;		RecordDecl *RD;
RD = C.buildImplicitRecord("descriptor_dim");		RD = C.buildImplicitRecord("descriptor_dim");
RD->startDefinition();		RD->startDefinition();
addFieldToRecordDecl(C, RD, Int64Ty);		CodeGenUtil::addFieldToRecordDecl(C, RD, Int64Ty);
addFieldToRecordDecl(C, RD, Int64Ty);		CodeGenUtil::addFieldToRecordDecl(C, RD, Int64Ty);
addFieldToRecordDecl(C, RD, Int64Ty);		CodeGenUtil::addFieldToRecordDecl(C, RD, Int64Ty);
RD->completeDefinition();		RD->completeDefinition();
QualType DimTy = C.getRecordType(RD);		QualType DimTy = C.getRecordType(RD);

enum { OffsetFD = 0, CountFD, StrideFD };		enum { OffsetFD = 0, CountFD, StrideFD };
// We need two index variable here since the size of "Dims" is the same as the		// We need two index variable here since the size of "Dims" is the same as the
// size of Components, however, the size of offset, count, and stride is equal		// size of Components, however, the size of offset, count, and stride is equal
// to the size of base declaration that is non-contiguous.		// to the size of base declaration that is non-contiguous.
for (unsigned I = 0, L = 0, E = NonContigInfo.Dims.size(); I < E; ++I) {		for (unsigned I = 0, L = 0, E = NonContigInfo.Dims.size(); I < E; ++I) {
[... 2,592 lines not shown ...]	void CGOpenMPRuntime::emitDoacrossInit(CodeGenFunction &CGF,
if (KmpDimTy.isNull()) {		if (KmpDimTy.isNull()) {
// Build struct kmp_dim { // loop bounds info casted to kmp_int64		// Build struct kmp_dim { // loop bounds info casted to kmp_int64
// kmp_int64 lo; // lower		// kmp_int64 lo; // lower
// kmp_int64 up; // upper		// kmp_int64 up; // upper
// kmp_int64 st; // stride		// kmp_int64 st; // stride
// };		// };
RD = C.buildImplicitRecord("kmp_dim");		RD = C.buildImplicitRecord("kmp_dim");
RD->startDefinition();		RD->startDefinition();
addFieldToRecordDecl(C, RD, Int64Ty);		CodeGenUtil::addFieldToRecordDecl(C, RD, Int64Ty);
addFieldToRecordDecl(C, RD, Int64Ty);		CodeGenUtil::addFieldToRecordDecl(C, RD, Int64Ty);
addFieldToRecordDecl(C, RD, Int64Ty);		CodeGenUtil::addFieldToRecordDecl(C, RD, Int64Ty);
RD->completeDefinition();		RD->completeDefinition();
KmpDimTy = C.getRecordType(RD);		KmpDimTy = C.getRecordType(RD);
} else {		} else {
RD = cast<RecordDecl>(KmpDimTy->getAsTagDecl());		RD = cast<RecordDecl>(KmpDimTy->getAsTagDecl());
}		}
llvm::APInt Size(/*numBits=*/32, NumIterations.size());		llvm::APInt Size(/*numBits=*/32, NumIterations.size());
QualType ArrayTy =		QualType ArrayTy =
C.getConstantArrayType(KmpDimTy, Size, nullptr, ArrayType::Normal, 0);		C.getConstantArrayType(KmpDimTy, Size, nullptr, ArrayType::Normal, 0);
[... 436 lines not shown ...]	Address CGOpenMPRuntime::emitLastprivateConditionalInit(CodeGenFunction &CGF,
QualType NewType;		QualType NewType;
const FieldDecl *VDField;		const FieldDecl *VDField;
const FieldDecl *FiredField;		const FieldDecl *FiredField;
LValue BaseLVal;		LValue BaseLVal;
auto VI = I->getSecond().find(VD);		auto VI = I->getSecond().find(VD);
if (VI == I->getSecond().end()) {		if (VI == I->getSecond().end()) {
RecordDecl *RD = C.buildImplicitRecord("lasprivate.conditional");		RecordDecl *RD = C.buildImplicitRecord("lasprivate.conditional");
RD->startDefinition();		RD->startDefinition();
VDField = addFieldToRecordDecl(C, RD, VD->getType().getNonReferenceType());		VDField = CodeGenUtil::addFieldToRecordDecl(
FiredField = addFieldToRecordDecl(C, RD, C.CharTy);		C, RD, VD->getType().getNonReferenceType());
		FiredField = CodeGenUtil::addFieldToRecordDecl(C, RD, C.CharTy);
RD->completeDefinition();		RD->completeDefinition();
NewType = C.getRecordType(RD);		NewType = C.getRecordType(RD);
Address Addr = CGF.CreateMemTemp(NewType, C.getDeclAlign(VD), VD->getName());		Address Addr = CGF.CreateMemTemp(NewType, C.getDeclAlign(VD), VD->getName());
BaseLVal = CGF.MakeAddrLValue(Addr, NewType, AlignmentSource::Decl);		BaseLVal = CGF.MakeAddrLValue(Addr, NewType, AlignmentSource::Decl);
I->getSecond().try_emplace(VD, NewType, VDField, FiredField, BaseLVal);		I->getSecond().try_emplace(VD, NewType, VDField, FiredField, BaseLVal);
} else {		} else {
NewType = std::get<0>(VI->getSecond());		NewType = std::get<0>(VI->getSecond());
VDField = std::get<1>(VI->getSecond());		VDField = std::get<1>(VI->getSecond());
[... 580 lines not shown ...]

clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.h

	[... 20 lines not shown ...]

	namespace clang {			namespace clang {
	namespace CodeGen {			namespace CodeGen {

	class CGOpenMPRuntimeAMDGCN final : public CGOpenMPRuntimeGPU {			class CGOpenMPRuntimeAMDGCN final : public CGOpenMPRuntimeGPU {

	public:			public:
	explicit CGOpenMPRuntimeAMDGCN(CodeGenModule &CGM);			explicit CGOpenMPRuntimeAMDGCN(CodeGenModule &CGM);

				private:
				/// Struct to store kernel descriptors
				ABataev [Done]: Add comments for all new members.
				ABataev [Not Done]: Runtime does not support nested parallelism on GPU. Do you really need it?
				QualType TgtAttributeStructQTy;

	/// Get the GPU warp size.			/// Get the GPU warp size.
	llvm::Value *getGPUWarpSize(CodeGenFunction &CGF) override;			llvm::Value *getGPUWarpSize(CodeGenFunction &CGF) override;

	/// Get the id of the current thread on the GPU.			/// Get the id of the current thread on the GPU.
	llvm::Value *getGPUThreadID(CodeGenFunction &CGF) override;			llvm::Value *getGPUThreadID(CodeGenFunction &CGF) override;

	/// Get the maximum number of threads in a block of the GPU.			/// Get the maximum number of threads in a block of the GPU.
	llvm::Value *getGPUNumThreads(CodeGenFunction &CGF) override;			llvm::Value *getGPUNumThreads(CodeGenFunction &CGF) override;

				/// Target independent wrapper over target specific emitSPMDKernel()
				JonChesterfield [Not Done]: I'm not convinced by this abstraction. It looks like amdgcn and nvptx want almost exactly the same variable in each case. The difference appears to be that nvptx uses internal linkage and amdgcn uses weak + externally initialized, in which case we're better off with `bool nvptx::needsExternalInitialization() {return false;}` and `bool amdgpu::needsExternalInitialization() {return true;}`. Or, if the inline ternary is unappealing, an `amdgcn::NewGlobalVariable(...)` that passes the arguments to `llvm::GlobalVariable` while setting the two fields that differ between the two.
				saiislam (Author) [Done]: I understand what you are suggesting. But there are multiple such variables where the linkage differs between nvptx and amdgcn. Also, the current style gives a future implementation the flexibility to define these variables in its own way. What do you think?
				void emitSPMDKernelWrapper(const OMPExecutableDirective &D,
				StringRef ParentName, llvm::Function *&OutlinedFn,
				llvm::Constant *&OutlinedFnID, bool IsOffloadEntry,
				const RegionCodeGenTy &CodeGen) override;

				/// Target independent wrapper over target specific emitNonSPMDKernel()
				void emitNonSPMDKernelWrapper(const OMPExecutableDirective &D,
				StringRef ParentName,
				llvm::Function *&OutlinedFn,
				llvm::Constant *&OutlinedFnID,
				bool IsOffloadEntry,
				const RegionCodeGenTy &CodeGen) override;

				/// Create a unique global variable to indicate the flat-work-group-size
				/// for this region. Values are [256..1024].
				static void setPropertyWorkGroupSize(CodeGenModule &CGM, StringRef Name,
				unsigned WGSize);

				/// Generate global variables _wg_size, kern_desc, __tgt_attribute_struct.
				/// Also generate appropriate value of attribute amdgpu-flat-work-group-size
				ABataev [Done]: Do you really need to expose all these new members as public?
				void generateMetaData(CodeGenModule &CGM, const OMPExecutableDirective &D,
				llvm::Function *&OutlinedFn, bool IsGeneric);

				/// Returns __tgt_attribute_struct type.
				QualType getTgtAttributeStructQTy();

				/// Emit structure descriptor for a kernel
				void emitStructureKernelDesc(CodeGenModule &CGM, StringRef Name,
				int16_t WG_Size, int8_t Mode,
				int8_t HostServices);
	};			};

	} // namespace CodeGen			} // namespace CodeGen
	} // namespace clang			} // namespace clang
				ABataev [Not Done]: 1. Do you really need to make this class public? 2. `final`

	#endif // LLVM_CLANG_LIB_CODEGEN_CGOPENMPRUNTIMEAMDGCN_H			#endif // LLVM_CLANG_LIB_CODEGEN_CGOPENMPRUNTIMEAMDGCN_H
				ABataev [Done]: It does not help to understand the functionality.
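
JonChesterfield's inline comment on `emitSPMDKernelWrapper` proposes a single per-target predicate instead of a virtual wrapper per variable. A minimal standalone sketch of that idea follows. It is not LLVM code: `Linkage`, `GlobalSpec`, `GpuTarget`, `NvptxTarget`, `AmdgcnTarget`, and `makeGlobal` are hypothetical stand-ins for `llvm::GlobalValue::LinkageTypes` and `llvm::GlobalVariable`; only the name `needsExternalInitialization` and the internal versus weak + externally initialized split come from the review comment.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for llvm::GlobalValue::LinkageTypes.
enum class Linkage { Internal, WeakAny };

// Hypothetical stand-in for the properties of an llvm::GlobalVariable
// that differ between the two targets.
struct GlobalSpec {
  std::string Name;
  Linkage Link;
  bool ExternallyInitialized;
};

struct GpuTarget {
  virtual ~GpuTarget() = default;

  // The only target-specific hook (name taken from the review comment).
  virtual bool needsExternalInitialization() const = 0;

  // Shared code path: every common global is created here, and the two
  // fields that differ are derived from the hook.
  GlobalSpec makeGlobal(const std::string &Name) const {
    const bool Ext = needsExternalInitialization();
    return {Name, Ext ? Linkage::WeakAny : Linkage::Internal, Ext};
  }
};

struct NvptxTarget final : GpuTarget {
  bool needsExternalInitialization() const override { return false; }
};

struct AmdgcnTarget final : GpuTarget {
  bool needsExternalInitialization() const override { return true; }
};
```

With this shape, `AmdgcnTarget().makeGlobal("x")` yields weak linkage with the externally-initialized flag set, and `NvptxTarget().makeGlobal("x")` yields internal linkage. The author's objection in the reply (several variables differ in linkage, and future targets may want other choices) would map to adding more hooks, which is the trade-off the thread is debating.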

clang/lib/CodeGen/CGOpenMPRuntimeAMDGCN.cpp

//===-- CGOpenMPRuntimeAMDGCN.cpp - Interface to OpenMP AMDGCN Runtimes --===//		//===-- CGOpenMPRuntimeAMDGCN.cpp - Interface to OpenMP AMDGCN Runtimes --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This provides a class for OpenMP runtime code generation specialized to		// This provides a class for OpenMP runtime code generation specialized to
// AMDGCN targets from generalized CGOpenMPRuntimeGPU class.		// AMDGCN targets from generalized CGOpenMPRuntimeGPU class.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "CGOpenMPRuntimeAMDGCN.h"		#include "CGOpenMPRuntimeAMDGCN.h"
		#include "CGOpenMPRuntime.h"
#include "CGOpenMPRuntimeGPU.h"		#include "CGOpenMPRuntimeGPU.h"
#include "CodeGenFunction.h"		#include "CodeGenFunction.h"
#include "clang/AST/Attr.h"		#include "clang/AST/Attr.h"
#include "clang/AST/DeclOpenMP.h"		#include "clang/AST/DeclOpenMP.h"
#include "clang/AST/StmtOpenMP.h"		#include "clang/AST/StmtOpenMP.h"
#include "clang/AST/StmtVisitor.h"		#include "clang/AST/StmtVisitor.h"
#include "clang/Basic/Cuda.h"		#include "clang/Basic/Cuda.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"		#include "llvm/IR/IntrinsicsAMDGPU.h"

using namespace clang;		using namespace clang;
using namespace CodeGen;		using namespace CodeGen;
using namespace llvm::omp;		using namespace llvm::omp;

		//
		// Definitions of all virtual methods defined in CGOpenMPRuntimeGPU
		//
CGOpenMPRuntimeAMDGCN::CGOpenMPRuntimeAMDGCN(CodeGenModule &CGM)		CGOpenMPRuntimeAMDGCN::CGOpenMPRuntimeAMDGCN(CodeGenModule &CGM)
: CGOpenMPRuntimeGPU(CGM) {		: CGOpenMPRuntimeGPU(CGM) {
if (!CGM.getLangOpts().OpenMPIsDevice)		if (!CGM.getLangOpts().OpenMPIsDevice)
llvm_unreachable("OpenMP AMDGCN can only handle device code.");		llvm_unreachable("OpenMP AMDGCN can only handle device code.");
		KernelStaticGlobalizedLinkage = llvm::GlobalValue::WeakAnyLinkage;
}		}

llvm::Value *CGOpenMPRuntimeAMDGCN::getGPUWarpSize(CodeGenFunction &CGF) {		llvm::Value *CGOpenMPRuntimeAMDGCN::getGPUWarpSize(CodeGenFunction &CGF) {
CGBuilderTy &Bld = CGF.Builder;		CGBuilderTy &Bld = CGF.Builder;
// return constant compile-time target-specific warp size		// return constant compile-time target-specific warp size
unsigned WarpSize = CGF.getTarget().getGridValue(llvm::omp::GV_Warp_Size);		unsigned WarpSize = CGF.getTarget().getGridValue(llvm::omp::GV_Warp_Size);
return Bld.getInt32(WarpSize);		return Bld.getInt32(WarpSize);
}		}
[... 13 lines not shown ...]	llvm::Value *CGOpenMPRuntimeAMDGCN::getGPUNumThreads(CodeGenFunction &CGF) {
if (!F) {		if (!F) {
F = llvm::Function::Create(		F = llvm::Function::Create(
llvm::FunctionType::get(CGF.Int64Ty, {CGF.Int32Ty}, false),		llvm::FunctionType::get(CGF.Int64Ty, {CGF.Int32Ty}, false),
llvm::GlobalVariable::ExternalLinkage, LocSize, &CGF.CGM.getModule());		llvm::GlobalVariable::ExternalLinkage, LocSize, &CGF.CGM.getModule());
}		}
return Bld.CreateTrunc(		return Bld.CreateTrunc(
Bld.CreateCall(F, {Bld.getInt32(0)}, "nvptx_num_threads"), CGF.Int32Ty);		Bld.CreateCall(F, {Bld.getInt32(0)}, "nvptx_num_threads"), CGF.Int32Ty);
}		}

		void CGOpenMPRuntimeAMDGCN::emitSPMDKernelWrapper(
		const OMPExecutableDirective &D, StringRef ParentName,
		llvm::Function *&OutlinedFn, llvm::Constant *&OutlinedFnID,
		bool IsOffloadEntry, const RegionCodeGenTy &CodeGen) {
		emitSPMDKernel(D, ParentName, OutlinedFn, OutlinedFnID, IsOffloadEntry,
		CodeGen);
		generateMetaData(CGM, D, OutlinedFn, /*SPMD*/ false);
		}

		void CGOpenMPRuntimeAMDGCN::emitNonSPMDKernelWrapper(
		JonChesterfield [Done]: This is a very verbose way to say that amdgcn calls emitmetadata at the end of emitkernel and nvptx doesn't. Suggest unconditionally calling emitmetadata, and having emitmetadata be a no-op for nvptx.
		saiislam (Author) [Done]: Won't the no-op approach be less extensible? The current way, though verbose, leaves scope for attaching prefix/suffix code around emitkernel as and when required. With a no-op, every implementing arch might have to use the exact same pattern of methods, with and without code.
		const OMPExecutableDirective &D, StringRef ParentName,
		llvm::Function *&OutlinedFn, llvm::Constant *&OutlinedFnID,
		bool IsOffloadEntry, const RegionCodeGenTy &CodeGen) {
		emitNonSPMDKernel(D, ParentName, OutlinedFn, OutlinedFnID, IsOffloadEntry,
		CodeGen);
		generateMetaData(CGM, D, OutlinedFn, /*Generic*/ true);
		}
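
The alternative raised in the inline comment on `emitNonSPMDKernelWrapper` (call the metadata step unconditionally and let it do nothing on nvptx) can be sketched as follows. This is a standalone illustration under assumed names: `GpuRuntime`, `NvptxRuntime`, `AmdgcnRuntime`, `emitKernel`, and the `Log` vector are hypothetical and only mimic the call order, not the real `CGOpenMPRuntimeGPU` interface.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical base class standing in for the shared GPU runtime.
struct GpuRuntime {
  virtual ~GpuRuntime() = default;

  // Records what was emitted, so the call order can be inspected.
  std::vector<std::string> Log;

  // Shared kernel emission: the metadata hook is always invoked at the end,
  // so no per-target wrapper function is needed.
  void emitKernel(const std::string &Name) {
    Log.push_back("kernel " + Name);
    generateMetadata(Name);
  }

protected:
  // Default is a no-op; targets that need metadata override it.
  virtual void generateMetadata(const std::string &) {}
};

// nvptx inherits the no-op.
struct NvptxRuntime final : GpuRuntime {};

// amdgcn appends its metadata after the kernel.
struct AmdgcnRuntime final : GpuRuntime {
protected:
  void generateMetadata(const std::string &Name) override {
    Log.push_back("metadata " + Name);
  }
};
```

Calling `emitKernel("k")` on `AmdgcnRuntime` records the kernel followed by its metadata, while `NvptxRuntime` records only the kernel. The author's counterpoint is that wrappers leave room for arbitrary prefix/suffix code per target, which a single fixed hook does not.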

		//
		// Definitions of AMDGCN specific methods
		//
		void CGOpenMPRuntimeAMDGCN::setPropertyWorkGroupSize(CodeGenModule &CGM,
		JonChesterfield [Done]: I think there's a credible chance this is useful to nvptx, so it doesn't have to be amdgcn-specific.
		saiislam (Author) [Done]: You are right, it can be useful for nvptx as well. Maybe we can combine its generalization with the nvptx use case when it arrives in the future?
		StringRef Name,
		unsigned WGSize) {
		auto *GVMode = new llvm::GlobalVariable(
		CGM.getModule(), CGM.Int16Ty, /*isConstant=*/true,
		llvm::GlobalValue::WeakAnyLinkage,
		llvm::ConstantInt::get(CGM.Int16Ty, WGSize), Twine(Name, "_wg_size"),
		/*InsertBefore=*/nullptr, llvm::GlobalVariable::NotThreadLocal,
		CGM.getContext().getTargetAddressSpace(LangAS::cuda_device));

		CGM.addCompilerUsedGlobal(GVMode);
		}

		void CGOpenMPRuntimeAMDGCN::generateMetaData(CodeGenModule &CGM,
		const OMPExecutableDirective &D,
		llvm::Function *&OutlinedFn,
		bool IsGeneric) {
		int FlatAttr = 0;
		bool FlatAttrEmitted = false;
		unsigned DefaultWorkGroupSz =
		JonChesterfield [Done]: I think this is about computing a maximum workgroup size which the runtime uses to limit the number of threads it launches. If so, this is probably useful for nvptx and amdgcn. I'm having trouble working out what the conditions are, though. Maybe it's based on an OpenMP clause?
		saiislam (Author) [Done]: Yes, the if block in 111-147 corresponds to the "number of threads" for the thread_limit and num_threads clauses in teams and parallel directives.
		CGM.getTarget().getGridValue(llvm::omp::GVIDX::GV_Default_WG_Size);

		if (isOpenMPTeamsDirective(D.getDirectiveKind()) ||
		isOpenMPParallelDirective(D.getDirectiveKind())) {
		const auto *ThreadLimitClause = D.getSingleClause<OMPThreadLimitClause>();
		const auto *NumThreadsClause = D.getSingleClause<OMPNumThreadsClause>();
		unsigned MaxWorkGroupSz =
		CGM.getTarget().getGridValue(llvm::omp::GVIDX::GV_Max_WG_Size);
		unsigned CompileTimeThreadLimit = 0;
		// Only one of thread_limit or num_threads is used, cant do it for both
		if (ThreadLimitClause && !NumThreadsClause) {
		ABataev [Not Done]: Is this possible?
		Expr *ThreadLimitExpr = ThreadLimitClause->getThreadLimit();
		clang::Expr::EvalResult Result;
		if (ThreadLimitExpr->EvaluateAsInt(Result, CGM.getContext()))
		ABataev [Not Done]: `FlatAttrEmitted`
		CompileTimeThreadLimit = Result.Val.getInt().getExtValue();
		} else if (!ThreadLimitClause && NumThreadsClause) {
		Expr *NumThreadsExpr = NumThreadsClause->getNumThreads();
		clang::Expr::EvalResult Result;
		if (NumThreadsExpr->EvaluateAsInt(Result, CGM.getContext()))
		CompileTimeThreadLimit = Result.Val.getInt().getExtValue();
		}

		// Add kernel metadata if ThreadLimit Clause is compile time constant > 0
		if (CompileTimeThreadLimit > 0) {
		ABataev [Not Done]: `CompileTimeThreadLimit`
		// Add the WarpSize to generic, to reflect what runtime dispatch does.
		if (IsGeneric)
		CompileTimeThreadLimit +=
		CGM.getTarget().getGridValue(llvm::omp::GVIDX::GV_Warp_Size);
		if (CompileTimeThreadLimit > MaxWorkGroupSz)
		CompileTimeThreadLimit = MaxWorkGroupSz;
		std::string AttrVal = llvm::utostr(CompileTimeThreadLimit);
		FlatAttr = CompileTimeThreadLimit;
		OutlinedFn->addFnAttr("amdgpu-flat-work-group-size",
		AttrVal + "," + AttrVal);
		setPropertyWorkGroupSize(CGM, OutlinedFn->getName(),
		CompileTimeThreadLimit);
		}
		FlatAttrEmitted = true;
		} // end of amdgcn teams or parallel directive

		// emit amdgpu-flat-work-group-size if not emitted already.
		if (!FlatAttrEmitted) {
		JonChesterfield [Not Done]: I think I remember seeing a diff that makes this attribute unconditionally emitted by some other part of the toolchain. If so, it may no longer be required.
		std::string FlatAttrVal = llvm::utostr(DefaultWorkGroupSz);
		OutlinedFn->addFnAttr("amdgpu-flat-work-group-size",
		FlatAttrVal + "," + FlatAttrVal);
		}
		// Emit a kernel descriptor for runtime.
		StringRef KernDescName = OutlinedFn->getName();
		CGOpenMPRuntimeAMDGCN::emitStructureKernelDesc(
		CGM, KernDescName, FlatAttr, IsGeneric, 1 /* Uses HostServices */);
		}
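
The arithmetic that `generateMetaData` applies to a compile-time thread limit (add the warp size in generic mode, clamp to the maximum work-group size, then format the value as a "min,max" pair) can be isolated into a small sketch. The helper names `clampThreadLimit` and `flatWorkGroupSizeAttr` are hypothetical, and the warp size and maximum work-group size are plain parameters here because the diff obtains them from `getGridValue`; no concrete target values are assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Mirrors the adjustment in the diff: in generic mode the warp size is
// added to the limit, and the result is capped at the maximum work-group size.
unsigned clampThreadLimit(unsigned CompileTimeThreadLimit, bool IsGeneric,
                          unsigned WarpSize, unsigned MaxWorkGroupSz) {
  // The diff only takes this path for a constant limit greater than zero.
  assert(CompileTimeThreadLimit > 0);
  if (IsGeneric)
    CompileTimeThreadLimit += WarpSize;
  return std::min(CompileTimeThreadLimit, MaxWorkGroupSz);
}

// Builds the attribute value the diff passes to addFnAttr: the same number
// twice, separated by a comma.
std::string flatWorkGroupSizeAttr(unsigned Size) {
  const std::string S = std::to_string(Size);
  return S + "," + S;
}
```

For example, with an assumed warp size of 64 and an assumed maximum of 1024, a `thread_limit(128)` in generic mode gives 192 and the attribute string `192,192`; a limit of 1000 in generic mode is capped at 1024.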

		/// Emit structure descriptor for a kernel
		void CGOpenMPRuntimeAMDGCN::emitStructureKernelDesc(CodeGenModule &CGM,
		StringRef Name,
		int16_t WG_Size,
		int8_t Mode,
		int8_t HostServices) {

		// Create all device images
		llvm::Constant *AttrData[] = {
		JonChesterfield [Not Done]: HostServices is unused. Mode is redundant with exec_mode. wg_size is redundant with the other wg_size symbol added above. This kern_desc object should be deleted, not upstreamed.
		saiislam (Author) [Done]: Ok, thanks. Will update in next revision.
		llvm::ConstantInt::get(CGM.Int16Ty, 2), // Version
		llvm::ConstantInt::get(CGM.Int16Ty, 9), // Size in bytes
		llvm::ConstantInt::get(CGM.Int16Ty, WG_Size),
		llvm::ConstantInt::get(CGM.Int8Ty, Mode), // 0 => SPMD, 1 => GENERIC
		llvm::ConstantInt::get(CGM.Int8Ty, HostServices) // 1 => use HostServices
		};

		llvm::GlobalVariable *AttrImages =
		clang::CodeGen::CodeGenUtil::createGlobalStruct(
		JonChesterfield [Not Done]: The nvptx emitSPMDKernelWrapper does nothing and the amdgcn one appends some metadata. How about `nvptx::generateMetadata(...)` that does nothing and `amdgcn::generateMetadata(...)` that does this stuff, called from the end of emitSPMDKernel?
		saiislam (Author) [Done]: It will then be difficult to track everything that is done differently in the two. So, the common code has been generalized, and (no change in nvptx + some changes in amdgcn) has been used as the specialization.
		CGM, getTgtAttributeStructQTy(), isDefaultLocationConstant(),
		AttrData, Twine(Name, "_kern_desc"),
		llvm::GlobalValue::WeakAnyLinkage);

		CGM.addCompilerUsedGlobal(AttrImages);
		}

		// Create Tgt Attribute Struct type.
		QualType CGOpenMPRuntimeAMDGCN::getTgtAttributeStructQTy() {
		ASTContext &C = CGM.getContext();
		QualType KmpInt8Ty = C.getIntTypeForBitwidth(/*Width=*/8, /*Signed=*/1);
		QualType KmpInt16Ty = C.getIntTypeForBitwidth(/*Width=*/16, /*Signed=*/1);
		if (TgtAttributeStructQTy.isNull()) {
		RecordDecl *RD = C.buildImplicitRecord("__tgt_attribute_struct");
		RD->startDefinition();
		// Version
		clang::CodeGen::CodeGenUtil::addFieldToRecordDecl(C, RD, KmpInt16Ty);
		// Struct Size in bytes.
		clang::CodeGen::CodeGenUtil::addFieldToRecordDecl(C, RD, KmpInt16Ty);
		// WG_size
		clang::CodeGen::CodeGenUtil::addFieldToRecordDecl(C, RD, KmpInt16Ty);
		// Mode
		JonChesterfield [Not Done]: This metadata generation could be split out from the other changes.
		clang::CodeGen::CodeGenUtil::addFieldToRecordDecl(C, RD, KmpInt8Ty);
		// HostServices
		clang::CodeGen::CodeGenUtil::addFieldToRecordDecl(C, RD, KmpInt8Ty);
		RD->completeDefinition();
		TgtAttributeStructQTy = C.getRecordType(RD);
		}
		return TgtAttributeStructQTy;
		}
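
For readers who want to picture the record that `getTgtAttributeStructQTy` builds, the sketch below writes the same five fields as a plain C++ struct. The type name `TgtAttributeStruct`, the field names, and the helper `makeKernDesc` are hypothetical (the diff adds unnamed fields and comments them as Version, Struct Size, WG_size, Mode, HostServices). Under the usual layout rules these five fields occupy 8 bytes, whereas the diff stores the literal 9 in the size field; the source does not explain the difference, so the sketch copies the literal as-is.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical plain-struct view of the fields added in the diff.
struct TgtAttributeStruct {
  std::int16_t Version;      // diff stores 2
  std::int16_t Size;         // diff stores 9 ("Size in bytes")
  std::int16_t WGSize;       // flat work-group size, 0 if none was computed
  std::int8_t Mode;          // 0 => SPMD, 1 => GENERIC
  std::int8_t HostServices;  // 1 => use HostServices
};

// Three 16-bit fields plus two 8-bit fields need no padding on common ABIs.
static_assert(sizeof(TgtAttributeStruct) == 8,
              "expected 3 * 2 + 2 * 1 bytes with no padding");

// Fills the descriptor with the constants used in emitStructureKernelDesc.
inline TgtAttributeStruct makeKernDesc(std::int16_t WGSize, bool IsGeneric) {
  return {2, 9, WGSize, static_cast<std::int8_t>(IsGeneric ? 1 : 0), 1};
}
```

Note that the review thread above asks for this descriptor to be dropped rather than upstreamed, so the sketch documents the revision as posted, not a final design.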

clang/lib/CodeGen/CGOpenMPRuntimeGPU.h

[... 27 lines not shown ...]	public:
enum ExecutionMode {		enum ExecutionMode {
/// SPMD execution mode (all threads are worker threads).		/// SPMD execution mode (all threads are worker threads).
EM_SPMD,		EM_SPMD,
/// Non-SPMD execution mode (1 master thread, others are workers).		/// Non-SPMD execution mode (1 master thread, others are workers).
EM_NonSPMD,		EM_NonSPMD,
/// Unknown execution mode (orphaned directive).		/// Unknown execution mode (orphaned directive).
EM_Unknown,		EM_Unknown,
};		};

		protected:
		ABataev [Done]: 1. Make it private or protected. 2. Add default initializer.
		/// Linkage type of KernelStaticGlobalized variable
		llvm::GlobalValue::LinkageTypes KernelStaticGlobalizedLinkage;

private:		private:
/// Parallel outlined function work for workers to execute.		/// Parallel outlined function work for workers to execute.
llvm::SmallVector<llvm::Function *, 16> Work;		llvm::SmallVector<llvm::Function *, 16> Work;

struct EntryFunctionState {		struct EntryFunctionState {
llvm::BasicBlock *ExitBB = nullptr;		llvm::BasicBlock *ExitBB = nullptr;
};		};

[... 50 lines not shown ...]	private:
//		//

/// Creates offloading entry for the provided entry ID \a ID,		/// Creates offloading entry for the provided entry ID \a ID,
/// address \a Addr, size \a Size, and flags \a Flags.		/// address \a Addr, size \a Size, and flags \a Flags.
void createOffloadEntry(llvm::Constant *ID, llvm::Constant *Addr,		void createOffloadEntry(llvm::Constant *ID, llvm::Constant *Addr,
uint64_t Size, int32_t Flags,		uint64_t Size, int32_t Flags,
llvm::GlobalValue::LinkageTypes Linkage) override;		llvm::GlobalValue::LinkageTypes Linkage) override;

		protected:
/// Emit outlined function specialized for the Fork-Join		/// Emit outlined function specialized for the Fork-Join
/// programming model for applicable target directives on the NVPTX device.		/// programming model for applicable target directives on the NVPTX device.
/// \param D Directive to emit.		/// \param D Directive to emit.
/// \param ParentName Name of the function that encloses the target region.		/// \param ParentName Name of the function that encloses the target region.
/// \param OutlinedFn Outlined function value to be defined by this call.		/// \param OutlinedFn Outlined function value to be defined by this call.
/// \param OutlinedFnID Outlined function ID value to be defined by this call.		/// \param OutlinedFnID Outlined function ID value to be defined by this call.
/// \param IsOffloadEntry True if the outlined function is an offload entry.		/// \param IsOffloadEntry True if the outlined function is an offload entry.
/// An outlined function may not be an entry if, e.g. the if clause always		/// An outlined function may not be an entry if, e.g. the if clause always
[... 14 lines not shown ...]	protected:
/// \param CodeGen Object containing the target statements.		/// \param CodeGen Object containing the target statements.
/// An outlined function may not be an entry if, e.g. the if clause always		/// An outlined function may not be an entry if, e.g. the if clause always
/// evaluates to false.		/// evaluates to false.
void emitSPMDKernel(const OMPExecutableDirective &D, StringRef ParentName,		void emitSPMDKernel(const OMPExecutableDirective &D, StringRef ParentName,
llvm::Function *&OutlinedFn,		llvm::Function *&OutlinedFn,
llvm::Constant *&OutlinedFnID, bool IsOffloadEntry,		llvm::Constant *&OutlinedFnID, bool IsOffloadEntry,
const RegionCodeGenTy &CodeGen);		const RegionCodeGenTy &CodeGen);

		private:
/// Emit outlined function for 'target' directive on the NVPTX		/// Emit outlined function for 'target' directive on the NVPTX
/// device.		/// device.
/// \param D Directive to emit.		/// \param D Directive to emit.
/// \param ParentName Name of the function that encloses the target region.		/// \param ParentName Name of the function that encloses the target region.
/// \param OutlinedFn Outlined function value to be defined by this call.		/// \param OutlinedFn Outlined function value to be defined by this call.
/// \param OutlinedFnID Outlined function ID value to be defined by this call.		/// \param OutlinedFnID Outlined function ID value to be defined by this call.
/// \param IsOffloadEntry True if the outlined function is an offload entry.		/// \param IsOffloadEntry True if the outlined function is an offload entry.
/// An outlined function may not be an entry if, e.g. the if clause always		/// An outlined function may not be an entry if, e.g. the if clause always
▲ Show 20 Lines • Show All 49 Lines • ▼ Show 20 Lines	protected:
bool isDefaultLocationConstant() const override { return true; }		bool isDefaultLocationConstant() const override { return true; }

/// Returns additional flags that can be stored in reserved_2 field of the		/// Returns additional flags that can be stored in reserved_2 field of the
/// default location.		/// default location.
/// For NVPTX target contains data about SPMD/Non-SPMD execution mode +		/// For NVPTX target contains data about SPMD/Non-SPMD execution mode +
/// Full/Lightweight runtime mode. Used for better optimization.		/// Full/Lightweight runtime mode. Used for better optimization.
unsigned getDefaultLocationReserved2Flags() const override;		unsigned getDefaultLocationReserved2Flags() const override;

public:		public:
		JonChesterfield (not done): Please put this back in its previous location so we can see whether it changed in the diff.
		saiislam (author, done): This move changes them from private to protected. I could have just added access specifiers without moving the definitions. That would have simplified the review, but it would have hurt readability in the future.
explicit CGOpenMPRuntimeGPU(CodeGenModule &CGM);		explicit CGOpenMPRuntimeGPU(CodeGenModule &CGM);
void clear() override;		void clear() override;

/// Declare generalized virtual functions which need to be defined		/// Declare generalized virtual functions which need to be defined
/// by all specializations of OpenMPGPURuntime Targets like AMDGCN		/// by all specializations of OpenMPGPURuntime Targets like AMDGCN
/// and NVPTX.		/// and NVPTX.

/// Get the GPU warp size.		/// Get the GPU warp size.
virtual llvm::Value *getGPUWarpSize(CodeGenFunction &CGF) = 0;		virtual llvm::Value *getGPUWarpSize(CodeGenFunction &CGF) = 0;

/// Get the id of the current thread on the GPU.		/// Get the id of the current thread on the GPU.
virtual llvm::Value *getGPUThreadID(CodeGenFunction &CGF) = 0;		virtual llvm::Value *getGPUThreadID(CodeGenFunction &CGF) = 0;

/// Get the maximum number of threads in a block of the GPU.		/// Get the maximum number of threads in a block of the GPU.
virtual llvm::Value *getGPUNumThreads(CodeGenFunction &CGF) = 0;		virtual llvm::Value *getGPUNumThreads(CodeGenFunction &CGF) = 0;

		/// Target independent wrapper over target specific emitSPMDKernel()
		virtual void emitSPMDKernelWrapper(const OMPExecutableDirective &D,
		StringRef ParentName,
		llvm::Function *&OutlinedFn,
		llvm::Constant *&OutlinedFnID,
		bool IsOffloadEntry,
		const RegionCodeGenTy &CodeGen) = 0;

		/// Target independent wrapper over target specific emitNonSPMDKernel()
		virtual void emitNonSPMDKernelWrapper(const OMPExecutableDirective &D,
		StringRef ParentName,
		llvm::Function *&OutlinedFn,
		llvm::Constant *&OutlinedFnID,
		bool IsOffloadEntry,
		const RegionCodeGenTy &CodeGen) = 0;

/// Emit call to void __kmpc_push_proc_bind(ident_t *loc, kmp_int32		/// Emit call to void __kmpc_push_proc_bind(ident_t *loc, kmp_int32
/// global_tid, int proc_bind) to generate code for 'proc_bind' clause.		/// global_tid, int proc_bind) to generate code for 'proc_bind' clause.
virtual void emitProcBindClause(CodeGenFunction &CGF,		virtual void emitProcBindClause(CodeGenFunction &CGF,
llvm::omp::ProcBindKind ProcBind,		llvm::omp::ProcBindKind ProcBind,
SourceLocation Loc) override;		SourceLocation Loc) override;

/// Emits call to void __kmpc_push_num_threads(ident_t *loc, kmp_int32		/// Emits call to void __kmpc_push_num_threads(ident_t *loc, kmp_int32
/// global_tid, kmp_int32 num_threads) to generate code for 'num_threads'		/// global_tid, kmp_int32 num_threads) to generate code for 'num_threads'
/// clause.		/// clause.
/// \param NumThreads An integer value of threads.		/// \param NumThreads An integer value of threads.
virtual void emitNumThreadsClause(CodeGenFunction &CGF,		virtual void emitNumThreadsClause(CodeGenFunction &CGF,
llvm::Value *NumThreads,		llvm::Value *NumThreads,
SourceLocation Loc) override;		SourceLocation Loc) override;

/// This function ought to emit, in the general case, a call to		/// This function ought to emit, in the general case, a call to
// the openmp runtime kmpc_push_num_teams. In NVPTX backend it is not needed		// the openmp runtime kmpc_push_num_teams. In NVPTX backend it is not needed
// as these numbers are obtained through the PTX grid and block configuration.		// as these numbers are obtained through the PTX grid and block configuration.
		ABataev (not done): Are all of these required to be public?
		saiislam (author, done): Yes, they are called from outside the class.
/// \param NumTeams An integer expression of teams.		/// \param NumTeams An integer expression of teams.
/// \param ThreadLimit An integer expression of threads.		/// \param ThreadLimit An integer expression of threads.
void emitNumTeamsClause(CodeGenFunction &CGF, const Expr *NumTeams,		void emitNumTeamsClause(CodeGenFunction &CGF, const Expr *NumTeams,
const Expr *ThreadLimit, SourceLocation Loc) override;		const Expr *ThreadLimit, SourceLocation Loc) override;

/// Emits inlined function for the specified OpenMP parallel		/// Emits inlined function for the specified OpenMP parallel
// directive.		// directive.
/// \a D. This outlined function has type void(*)(kmp_int32 *ThreadID,		/// \a D. This outlined function has type void(*)(kmp_int32 *ThreadID,
/// kmp_int32 BoundID, struct context_vars*).		/// kmp_int32 BoundID, struct context_vars*).
/// \param D OpenMP directive.		/// \param D OpenMP directive.
/// \param ThreadIDVar Variable for thread id in the current OpenMP region.		/// \param ThreadIDVar Variable for thread id in the current OpenMP region.
/// \param InnermostKind Kind of innermost directive (for simple directives it		/// \param InnermostKind Kind of innermost directive (for simple directives it
/// is a directive itself, for combined - its innermost directive).		/// is a directive itself, for combined - its innermost directive).
/// \param CodeGen Code generation sequence for the \a D directive.		/// \param CodeGen Code generation sequence for the \a D directive.
llvm::Function *		llvm::Function *
emitParallelOutlinedFunction(const OMPExecutableDirective &D,		emitParallelOutlinedFunction(const OMPExecutableDirective &D,
const VarDecl *ThreadIDVar,		const VarDecl *ThreadIDVar,
OpenMPDirectiveKind InnermostKind,		OpenMPDirectiveKind InnermostKind,
const RegionCodeGenTy &CodeGen) override;		const RegionCodeGenTy &CodeGen) override;

/// Emits inlined function for the specified OpenMP teams		/// Emits inlined function for the specified OpenMP teams
// directive.		// directive.
/// \a D. This outlined function has type void(*)(kmp_int32 *ThreadID,		/// \a D. This outlined function has type void(*)(kmp_int32 *ThreadID,
/// kmp_int32 BoundID, struct context_vars*).		/// kmp_int32 BoundID, struct context_vars*).
		ABataev (not done): Make them protected, not public, if possible. Try the same for the other new functions.
/// \param D OpenMP directive.		/// \param D OpenMP directive.
/// \param ThreadIDVar Variable for thread id in the current OpenMP region.		/// \param ThreadIDVar Variable for thread id in the current OpenMP region.
/// \param InnermostKind Kind of innermost directive (for simple directives it		/// \param InnermostKind Kind of innermost directive (for simple directives it
/// is a directive itself, for combined - its innermost directive).		/// is a directive itself, for combined - its innermost directive).
/// \param CodeGen Code generation sequence for the \a D directive.		/// \param CodeGen Code generation sequence for the \a D directive.
llvm::Function *		llvm::Function *
emitTeamsOutlinedFunction(const OMPExecutableDirective &D,		emitTeamsOutlinedFunction(const OMPExecutableDirective &D,
const VarDecl *ThreadIDVar,		const VarDecl *ThreadIDVar,
▲ Show 20 Lines • Show All 108 Lines • ▼ Show 20 Lines	public:
/// their declaration context.		/// their declaration context.
enum DataSharingMode {		enum DataSharingMode {
/// CUDA data sharing mode.		/// CUDA data sharing mode.
CUDA,		CUDA,
/// Generic data-sharing mode.		/// Generic data-sharing mode.
Generic,		Generic,
};		};

/// Cleans up references to the objects in finished function.		/// Cleans up references to the objects in finished function.
///		///
void functionFinished(CodeGenFunction &CGF) override;		void functionFinished(CodeGenFunction &CGF) override;
		ABataev (not done): Make it private or protected.

/// Choose a default value for the dist_schedule clause.		/// Choose a default value for the dist_schedule clause.
void getDefaultDistScheduleAndChunk(CodeGenFunction &CGF,		void getDefaultDistScheduleAndChunk(CodeGenFunction &CGF,
const OMPLoopDirective &S, OpenMPDistScheduleClauseKind &ScheduleKind,		const OMPLoopDirective &S, OpenMPDistScheduleClauseKind &ScheduleKind,
llvm::Value *&Chunk) const override;		llvm::Value *&Chunk) const override;

/// Choose a default value for the schedule clause.		/// Choose a default value for the schedule clause.
void getDefaultScheduleAndChunk(CodeGenFunction &CGF,		void getDefaultScheduleAndChunk(CodeGenFunction &CGF,
▲ Show 20 Lines • Show All 115 Lines • Show Last 20 Lines
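
The private-to-protected move discussed in the inline comments above matters because the target-specific wrappers live in derived classes and must call the shared helpers. A minimal sketch of that arrangement follows, assuming hypothetical stand-in classes (`GpuRuntimeBase`, `NvptxRuntime`, `AmdgcnRuntime`) and string return values in place of the real Clang types. It is an illustration of the access pattern, not the patch's implementation.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for CGOpenMPRuntimeGPU; not the real Clang class.
class GpuRuntimeBase {
public:
  virtual ~GpuRuntimeBase() = default;

  // Pure virtual hook that every target must define
  // (mirrors emitSPMDKernelWrapper in the diff).
  virtual std::string emitSPMDKernelWrapper(const std::string &ParentName) = 0;

protected:
  // Shared helper. Being protected lets derived wrappers call it while
  // keeping it out of the public interface.
  std::string emitSPMDKernel(const std::string &ParentName) {
    return "spmd:" + ParentName;
  }
};

class NvptxRuntime final : public GpuRuntimeBase {
public:
  std::string emitSPMDKernelWrapper(const std::string &ParentName) override {
    // Plain forward, as in the NVPTX wrapper in this diff.
    return emitSPMDKernel(ParentName);
  }
};

class AmdgcnRuntime final : public GpuRuntimeBase {
public:
  std::string emitSPMDKernelWrapper(const std::string &ParentName) override {
    // Assumption for illustration: the AMDGCN target adds extra work
    // around the shared helper. The "+amdgcn" suffix is made up.
    return emitSPMDKernel(ParentName) + "+amdgcn";
  }
};
```

With a `private` helper in the base class, neither wrapper above would compile, which is the constraint the author's reply refers to.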

clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp

//===---- CGOpenMPRuntimeGPU.cpp - Interface to OpenMP GPU Runtimes ----===//		//===---- CGOpenMPRuntimeGPU.cpp - Interface to OpenMP GPU Runtimes ----===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This provides a generalized class for OpenMP runtime code generation		// This provides a generalized class for OpenMP runtime code generation
// specialized by GPU targets NVPTX and AMDGCN.		// specialized by GPU targets NVPTX and AMDGCN.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "CGOpenMPRuntimeGPU.h"		#include "CGOpenMPRuntimeGPU.h"
		#include "CGOpenMPRuntimeAMDGCN.h"
#include "CGOpenMPRuntimeNVPTX.h"		#include "CGOpenMPRuntimeNVPTX.h"
#include "CodeGenFunction.h"		#include "CodeGenFunction.h"
#include "clang/AST/Attr.h"		#include "clang/AST/Attr.h"
#include "clang/AST/DeclOpenMP.h"		#include "clang/AST/DeclOpenMP.h"
#include "clang/AST/StmtOpenMP.h"		#include "clang/AST/StmtOpenMP.h"
#include "clang/AST/StmtVisitor.h"		#include "clang/AST/StmtVisitor.h"
#include "clang/Basic/Cuda.h"		#include "clang/Basic/Cuda.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
▲ Show 20 Lines • Show All 1,071 Lines • ▼ Show 20 Lines	void Exit(CodeGenFunction &CGF) override {
RT.emitNonSPMDEntryFooter(CGF, EST);		RT.emitNonSPMDEntryFooter(CGF, EST);
}		}
} Action(EST, WST);		} Action(EST, WST);
CodeGen.setAction(Action);		CodeGen.setAction(Action);
IsInTTDRegion = true;		IsInTTDRegion = true;
// Reserve place for the globalized memory.		// Reserve place for the globalized memory.
GlobalizedRecords.emplace_back();		GlobalizedRecords.emplace_back();
if (!KernelStaticGlobalized) {		if (!KernelStaticGlobalized) {
		auto &RT = static_cast<CGOpenMPRuntimeGPU &>(CGM.getOpenMPRuntime());
KernelStaticGlobalized = new llvm::GlobalVariable(		KernelStaticGlobalized = new llvm::GlobalVariable(
CGM.getModule(), CGM.VoidPtrTy, /*isConstant=*/false,		CGM.getModule(), CGM.VoidPtrTy, /*isConstant=*/false,
llvm::GlobalValue::InternalLinkage,		RT.KernelStaticGlobalizedLinkage, llvm::UndefValue::get(CGM.VoidPtrTy),
llvm::UndefValue::get(CGM.VoidPtrTy),
"_openmp_kernel_static_glob_rd$ptr", /*InsertBefore=*/nullptr,		"_openmp_kernel_static_glob_rd$ptr", /*InsertBefore=*/nullptr,
llvm::GlobalValue::NotThreadLocal,		llvm::GlobalValue::NotThreadLocal,
CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared));		CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared));
}		}
emitTargetOutlinedFunctionHelper(D, ParentName, OutlinedFn, OutlinedFnID,		emitTargetOutlinedFunctionHelper(D, ParentName, OutlinedFn, OutlinedFnID,
IsOffloadEntry, CodeGen);		IsOffloadEntry, CodeGen);
IsInTTDRegion = false;		IsInTTDRegion = false;

▲ Show 20 Lines • Show All 112 Lines • ▼ Show 20 Lines	void Exit(CodeGenFunction &CGF) override {
RT.emitSPMDEntryFooter(CGF, EST);		RT.emitSPMDEntryFooter(CGF, EST);
}		}
} Action(*this, EST, D);		} Action(*this, EST, D);
CodeGen.setAction(Action);		CodeGen.setAction(Action);
IsInTTDRegion = true;		IsInTTDRegion = true;
// Reserve place for the globalized memory.		// Reserve place for the globalized memory.
GlobalizedRecords.emplace_back();		GlobalizedRecords.emplace_back();
if (!KernelStaticGlobalized) {		if (!KernelStaticGlobalized) {
		auto &RT = static_cast<CGOpenMPRuntimeGPU &>(CGM.getOpenMPRuntime());
KernelStaticGlobalized = new llvm::GlobalVariable(		KernelStaticGlobalized = new llvm::GlobalVariable(
CGM.getModule(), CGM.VoidPtrTy, /*isConstant=*/false,		CGM.getModule(), CGM.VoidPtrTy, /*isConstant=*/false,
llvm::GlobalValue::InternalLinkage,		RT.KernelStaticGlobalizedLinkage, llvm::UndefValue::get(CGM.VoidPtrTy),
llvm::UndefValue::get(CGM.VoidPtrTy),
"_openmp_kernel_static_glob_rd$ptr", /*InsertBefore=*/nullptr,		"_openmp_kernel_static_glob_rd$ptr", /*InsertBefore=*/nullptr,
llvm::GlobalValue::NotThreadLocal,		llvm::GlobalValue::NotThreadLocal,
CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared));		CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared));
}		}
emitTargetOutlinedFunctionHelper(D, ParentName, OutlinedFn, OutlinedFnID,		emitTargetOutlinedFunctionHelper(D, ParentName, OutlinedFn, OutlinedFnID,
IsOffloadEntry, CodeGen);		IsOffloadEntry, CodeGen);
IsInTTDRegion = false;		IsInTTDRegion = false;
}		}
▲ Show 20 Lines • Show All 59 Lines • ▼ Show 20 Lines
// 'generic', the runtime reserves one warp for the master, otherwise, all		// 'generic', the runtime reserves one warp for the master, otherwise, all
// warps participate in parallel work.		// warps participate in parallel work.
static void setPropertyExecutionMode(CodeGenModule &CGM, StringRef Name,		static void setPropertyExecutionMode(CodeGenModule &CGM, StringRef Name,
bool Mode) {		bool Mode) {
auto *GVMode =		auto *GVMode =
new llvm::GlobalVariable(CGM.getModule(), CGM.Int8Ty, /*isConstant=*/true,		new llvm::GlobalVariable(CGM.getModule(), CGM.Int8Ty, /*isConstant=*/true,
llvm::GlobalValue::WeakAnyLinkage,		llvm::GlobalValue::WeakAnyLinkage,
llvm::ConstantInt::get(CGM.Int8Ty, Mode ? 0 : 1),		llvm::ConstantInt::get(CGM.Int8Ty, Mode ? 0 : 1),
Twine(Name, "_exec_mode"));		Twine(Name, "_exec_mode"));
CGM.addCompilerUsedGlobal(GVMode);		CGM.addCompilerUsedGlobal(GVMode);
		ABataev (not done): Restore original formatting.
}		}

void CGOpenMPRuntimeGPU::emitWorkerFunction(WorkerFunctionState &WST) {		void CGOpenMPRuntimeGPU::emitWorkerFunction(WorkerFunctionState &WST) {
ASTContext &Ctx = CGM.getContext();		ASTContext &Ctx = CGM.getContext();

CodeGenFunction CGF(CGM, /*suppressNewContext=*/true);		CodeGenFunction CGF(CGM, /*suppressNewContext=*/true);
CGF.StartFunction(GlobalDecl(), Ctx.VoidTy, WST.WorkerFn, WST.CGFI, {},		CGF.StartFunction(GlobalDecl(), Ctx.VoidTy, WST.WorkerFn, WST.CGFI, {},
WST.Loc, WST.Loc);		WST.Loc, WST.Loc);
▲ Show 20 Lines • Show All 150 Lines • ▼ Show 20 Lines	void CGOpenMPRuntimeGPU::emitTargetOutlinedFunction(
bool IsOffloadEntry, const RegionCodeGenTy &CodeGen) {		bool IsOffloadEntry, const RegionCodeGenTy &CodeGen) {
if (!IsOffloadEntry) // Nothing to do.		if (!IsOffloadEntry) // Nothing to do.
return;		return;

assert(!ParentName.empty() && "Invalid target region parent name!");		assert(!ParentName.empty() && "Invalid target region parent name!");

bool Mode = supportsSPMDExecutionMode(CGM.getContext(), D);		bool Mode = supportsSPMDExecutionMode(CGM.getContext(), D);
if (Mode)		if (Mode)
emitSPMDKernel(D, ParentName, OutlinedFn, OutlinedFnID, IsOffloadEntry,		emitSPMDKernelWrapper(D, ParentName, OutlinedFn, OutlinedFnID,
CodeGen);		IsOffloadEntry, CodeGen);
else		else
emitNonSPMDKernel(D, ParentName, OutlinedFn, OutlinedFnID, IsOffloadEntry,		emitNonSPMDKernelWrapper(D, ParentName, OutlinedFn, OutlinedFnID,
CodeGen);		IsOffloadEntry, CodeGen);

setPropertyExecutionMode(CGM, OutlinedFn->getName(), Mode);		setPropertyExecutionMode(CGM, OutlinedFn->getName(), Mode);
}		}

namespace {		namespace {
LLVM_ENABLE_BITMASK_ENUMS_IN_NAMESPACE();		LLVM_ENABLE_BITMASK_ENUMS_IN_NAMESPACE();
/// Enum for accesseing the reserved_2 field of the ident_t struct.		/// Enum for accesseing the reserved_2 field of the ident_t struct.
enum ModeFlagsTy : unsigned {		enum ModeFlagsTy : unsigned {
▲ Show 20 Lines • Show All 70 Lines • ▼ Show 20 Lines	void Enter(CodeGenFunction &CGF) override {
PrevIsInParallelRegion = IsInParallelRegion;		PrevIsInParallelRegion = IsInParallelRegion;
IsInParallelRegion = true;		IsInParallelRegion = true;
}		}
void Exit(CodeGenFunction &CGF) override {		void Exit(CodeGenFunction &CGF) override {
IsInParallelRegion = PrevIsInParallelRegion;		IsInParallelRegion = PrevIsInParallelRegion;
}		}
} Action(IsInParallelRegion);		} Action(IsInParallelRegion);
CodeGen.setAction(Action);		CodeGen.setAction(Action);
bool PrevIsInTTDRegion = IsInTTDRegion;		bool PrevIsInTTDRegion = IsInTTDRegion;
		ABataev (not done): It leads to a memory leak.
IsInTTDRegion = false;		IsInTTDRegion = false;
bool PrevIsInTargetMasterThreadRegion = IsInTargetMasterThreadRegion;		bool PrevIsInTargetMasterThreadRegion = IsInTargetMasterThreadRegion;
IsInTargetMasterThreadRegion = false;		IsInTargetMasterThreadRegion = false;
auto *OutlinedFun =		auto *OutlinedFun =
cast<llvm::Function>(CGOpenMPRuntime::emitParallelOutlinedFunction(		cast<llvm::Function>(CGOpenMPRuntime::emitParallelOutlinedFunction(
D, ThreadIDVar, InnermostKind, CodeGen));		D, ThreadIDVar, InnermostKind, CodeGen));
if (CGM.getLangOpts().Optimize) {		if (CGM.getLangOpts().Optimize) {
OutlinedFun->removeFnAttr(llvm::Attribute::NoInline);		OutlinedFun->removeFnAttr(llvm::Attribute::NoInline);
▲ Show 20 Lines • Show All 3,291 Lines • Show Last 20 Lines
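
For readers following `setPropertyExecutionMode` in the hunks above: the global it creates is named after the outlined function with an `_exec_mode` suffix, and its `i8` initializer is `0` when the region supports SPMD execution and `1` otherwise. The sketch below restates only that naming and value rule in standalone C++. The function `execModeGlobal` and the kernel name used in the usage note are hypothetical, and nothing here touches LLVM.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Returns the (name, initializer) pair that the diff's
// setPropertyExecutionMode would give the execution-mode global:
// name = kernel name + "_exec_mode", value = 0 for SPMD, 1 otherwise.
std::pair<std::string, std::uint8_t>
execModeGlobal(const std::string &KernelName, bool IsSPMD) {
  return {KernelName + "_exec_mode",
          static_cast<std::uint8_t>(IsSPMD ? 0 : 1)};
}
```

For example, `execModeGlobal("kernel", true)` yields the name `kernel_exec_mode` with value `0`, and passing `false` yields the same name with value `1`.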

clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.h

	Show All 21 Lines
	namespace clang {			namespace clang {
	namespace CodeGen {			namespace CodeGen {

	class CGOpenMPRuntimeNVPTX final : public CGOpenMPRuntimeGPU {			class CGOpenMPRuntimeNVPTX final : public CGOpenMPRuntimeGPU {

	public:			public:
	explicit CGOpenMPRuntimeNVPTX(CodeGenModule &CGM);			explicit CGOpenMPRuntimeNVPTX(CodeGenModule &CGM);

				private:
	/// Get the GPU warp size.			/// Get the GPU warp size.
	llvm::Value *getGPUWarpSize(CodeGenFunction &CGF) override;			llvm::Value *getGPUWarpSize(CodeGenFunction &CGF) override;

	/// Get the id of the current thread on the GPU.			/// Get the id of the current thread on the GPU.
	llvm::Value *getGPUThreadID(CodeGenFunction &CGF) override;			llvm::Value *getGPUThreadID(CodeGenFunction &CGF) override;

	/// Get the maximum number of threads in a block of the GPU.			/// Get the maximum number of threads in a block of the GPU.
	llvm::Value *getGPUNumThreads(CodeGenFunction &CGF) override;			llvm::Value *getGPUNumThreads(CodeGenFunction &CGF) override;

				/// Target independent wrapper over target specific emitSPMDKernel()
				void emitSPMDKernelWrapper(const OMPExecutableDirective &D,
				StringRef ParentName, llvm::Function *&OutlinedFn,
				llvm::Constant *&OutlinedFnID, bool IsOffloadEntry,
				const RegionCodeGenTy &CodeGen) override;

				/// Target independent wrapper over target specific emitNonSPMDKernel()
				void emitNonSPMDKernelWrapper(const OMPExecutableDirective &D,
				StringRef ParentName,
				llvm::Function *&OutlinedFn,
				llvm::Constant *&OutlinedFnID,
				bool IsOffloadEntry,
				const RegionCodeGenTy &CodeGen) override;
	};			};

	} // CodeGen namespace.			} // CodeGen namespace.
	} // clang namespace.			} // clang namespace.

	#endif // LLVM_CLANG_LIB_CODEGEN_CGOPENMPRUNTIMENVPTX_H			#endif // LLVM_CLANG_LIB_CODEGEN_CGOPENMPRUNTIMENVPTX_H
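
The new side of this header places the overrides under `private:` even though the base class declares the same functions as public pure virtuals. In C++ that still works for callers that hold a base-class reference, because access is checked against the static type of the expression. A small sketch under stated assumptions: `GpuRuntime`, `NvptxLike`, and `warpSizeVia` are hypothetical names, and the value 32 is only an illustrative constant.

```cpp
#include <cassert>

// Hypothetical stand-in for the GPU runtime base class.
struct GpuRuntime {
  virtual ~GpuRuntime() = default;
  virtual int getGPUWarpSize() = 0; // public in the base
};

class NvptxLike final : public GpuRuntime {
private:
  // Private override: not callable through an NvptxLike object,
  // but still reachable through a GpuRuntime reference.
  int getGPUWarpSize() override { return 32; } // illustrative value
};

// Callers that only know the base type are unaffected by the
// derived class's access specifier.
int warpSizeVia(GpuRuntime &RT) { return RT.getGPUWarpSize(); }
```

Calling `getGPUWarpSize()` directly on an `NvptxLike` object would be rejected by the compiler, which is one way to steer all callers through the generic interface.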

clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp

	Show All 24 Lines
	using namespace clang;			using namespace clang;
	using namespace CodeGen;			using namespace CodeGen;
	using namespace llvm::omp;			using namespace llvm::omp;

	CGOpenMPRuntimeNVPTX::CGOpenMPRuntimeNVPTX(CodeGenModule &CGM)			CGOpenMPRuntimeNVPTX::CGOpenMPRuntimeNVPTX(CodeGenModule &CGM)
	: CGOpenMPRuntimeGPU(CGM) {			: CGOpenMPRuntimeGPU(CGM) {
	if (!CGM.getLangOpts().OpenMPIsDevice)			if (!CGM.getLangOpts().OpenMPIsDevice)
	llvm_unreachable("OpenMP NVPTX can only handle device code.");			llvm_unreachable("OpenMP NVPTX can only handle device code.");
				KernelStaticGlobalizedLinkage = llvm::GlobalValue::InternalLinkage;
	}			}

	llvm::Value *CGOpenMPRuntimeNVPTX::getGPUWarpSize(CodeGenFunction &CGF) {			llvm::Value *CGOpenMPRuntimeNVPTX::getGPUWarpSize(CodeGenFunction &CGF) {
	return CGF.EmitRuntimeCall(			return CGF.EmitRuntimeCall(
	llvm::Intrinsic::getDeclaration(			llvm::Intrinsic::getDeclaration(
	&CGF.CGM.getModule(), llvm::Intrinsic::nvvm_read_ptx_sreg_warpsize),			&CGF.CGM.getModule(), llvm::Intrinsic::nvvm_read_ptx_sreg_warpsize),
	"nvptx_warp_size");			"nvptx_warp_size");
	}			}

	llvm::Value *CGOpenMPRuntimeNVPTX::getGPUThreadID(CodeGenFunction &CGF) {			llvm::Value *CGOpenMPRuntimeNVPTX::getGPUThreadID(CodeGenFunction &CGF) {
	CGBuilderTy &Bld = CGF.Builder;			CGBuilderTy &Bld = CGF.Builder;
	llvm::Function *F;			llvm::Function *F;
	F = llvm::Intrinsic::getDeclaration(			F = llvm::Intrinsic::getDeclaration(
	&CGF.CGM.getModule(), llvm::Intrinsic::nvvm_read_ptx_sreg_tid_x);			&CGF.CGM.getModule(), llvm::Intrinsic::nvvm_read_ptx_sreg_tid_x);
	return Bld.CreateCall(F, llvm::None, "nvptx_tid");			return Bld.CreateCall(F, llvm::None, "nvptx_tid");
	}			}

	llvm::Value *CGOpenMPRuntimeNVPTX::getGPUNumThreads(CodeGenFunction &CGF) {			llvm::Value *CGOpenMPRuntimeNVPTX::getGPUNumThreads(CodeGenFunction &CGF) {
	CGBuilderTy &Bld = CGF.Builder;			CGBuilderTy &Bld = CGF.Builder;
	llvm::Function *F;			llvm::Function *F;
	F = llvm::Intrinsic::getDeclaration(			F = llvm::Intrinsic::getDeclaration(
	&CGF.CGM.getModule(), llvm::Intrinsic::nvvm_read_ptx_sreg_ntid_x);			&CGF.CGM.getModule(), llvm::Intrinsic::nvvm_read_ptx_sreg_ntid_x);
	return Bld.CreateCall(F, llvm::None, "nvptx_num_threads");			return Bld.CreateCall(F, llvm::None, "nvptx_num_threads");
	}			}

				void CGOpenMPRuntimeNVPTX::emitSPMDKernelWrapper(
				JonChesterfield (not done): Perhaps (typed into browser):

```
llvm::GlobalVariable *CGOpenMPRuntimeNVPTX::createGlobal(
    CodeGenModule &CGM, llvm::ArrayType *Ty, StringRef Name) {
  return new llvm::GlobalVariable(
      CGM.getModule(), Ty, /*isConstant=*/false,
      llvm::GlobalVariable::CommonLinkage, llvm::Constant::getNullValue(Ty),
      Name, /*InsertBefore=*/nullptr, llvm::GlobalVariable::NotThreadLocal,
      CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared),
      /*isExternallyInitialized*/ true);
}

llvm::GlobalVariable *CGOpenMPRuntimeAMDGCN::createGlobal(
    CodeGenModule &CGM, llvm::ArrayType *Ty, StringRef Name) {
  return new llvm::GlobalVariable(
      CGM.getModule(), Ty, /*isConstant=*/false,
      llvm::GlobalVariable::WeakAnyLinkage, llvm::Constant::getNullValue(Ty),
      Name, /*InsertBefore=*/nullptr, llvm::GlobalVariable::NotThreadLocal,
      CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared),
      /*isExternallyInitialized*/ false);
}
```
				const OMPExecutableDirective &D, StringRef ParentName,
				llvm::Function *&OutlinedFn, llvm::Constant *&OutlinedFnID,
				bool IsOffloadEntry, const RegionCodeGenTy &CodeGen) {
				emitSPMDKernel(D, ParentName, OutlinedFn, OutlinedFnID, IsOffloadEntry,
				CodeGen);
				}

				void CGOpenMPRuntimeNVPTX::emitNonSPMDKernelWrapper(
				const OMPExecutableDirective &D, StringRef ParentName,
				llvm::Function *&OutlinedFn, llvm::Constant *&OutlinedFnID,
				bool IsOffloadEntry, const RegionCodeGenTy &CodeGen) {
				emitNonSPMDKernel(D, ParentName, OutlinedFn, OutlinedFnID, IsOffloadEntry,
				CodeGen);
				}
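
The patch selects the linkage through a data member (`KernelStaticGlobalizedLinkage`) that the target constructor sets, whereas the inline comment above proposes a per-target `createGlobal` factory. The difference can be sketched without LLVM as follows. Everything here is a hypothetical stand-in: `Linkage` replaces `llvm::GlobalValue::LinkageTypes`, `GlobalDesc` replaces `llvm::GlobalVariable`, and the class names only echo the real ones. The linkage and `isExternallyInitialized` values are taken from the reviewer's snippet.

```cpp
#include <cassert>
#include <string>

// Stand-in for llvm::GlobalValue::LinkageTypes (subset only).
enum class Linkage { Internal, Common, WeakAny };

// Stand-in for the properties of a created llvm::GlobalVariable.
struct GlobalDesc {
  std::string Name;
  Linkage Link;
  bool ExternallyInitialized;
};

// Hypothetical base: each target decides how its globals are created.
class GpuRuntime {
public:
  virtual ~GpuRuntime() = default;
  virtual GlobalDesc createGlobal(const std::string &Name) const = 0;
};

class NvptxRuntime final : public GpuRuntime {
public:
  GlobalDesc createGlobal(const std::string &Name) const override {
    // Values as in the reviewer's NVPTX variant.
    return {Name, Linkage::Common, /*ExternallyInitialized=*/true};
  }
};

class AmdgcnRuntime final : public GpuRuntime {
public:
  GlobalDesc createGlobal(const std::string &Name) const override {
    // Values as in the reviewer's AMDGCN variant.
    return {Name, Linkage::WeakAny, /*ExternallyInitialized=*/false};
  }
};
```

A factory of this shape keeps every target-dependent property of the global in one override, while the data-member approach in the patch varies only the linkage and leaves the rest of the constructor call shared.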


Revision Contents

Diff 307108
