This is an archive of the discontinued LLVM Phabricator instance.

Differential D90670

Simplifying memory globalization from the front end to move optimizations to the middle end.
Abandoned · Public

Authored by josemonsalve2 on Nov 2 2020, 10:42 PM.

Details

Reviewers

jdoerfert

Summary

Memory globalization was fully implemented in the front end. There are three runtime
functions in Libomptarget:

__kmpc_data_sharing_push_stack
__kmpc_data_sharing_coalesced_push_stack
__kmpc_data_sharing_pop_stack

The front end performed an escape analysis and created a record declaration containing all
the escaping stack variables. Then, based on the context (isTTD and other parameters), it
would create a push for the size of the record, or for that size multiplied by the warp size
(to globalize for the whole warp).
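
As a rough illustration of the sizing rule described above, here is a minimal host-side C++ sketch. It is not the code from this revision: `GlobalizedRecord`, `pushSize`, the local `alignTo` helper, and the warp size of 32 are all assumptions chosen for the example.

```cpp
#include <cassert>
#include <cstddef>

// Assumed warp size for illustration only; the front end queried the
// target for this value rather than hard-coding it.
constexpr std::size_t kAssumedWarpSize = 32;

// Hypothetical record gathering two escaping locals, in the spirit of the
// record the old front-end scheme built.
struct GlobalizedRecord {
  int counter;
  double accumulator;
};

// Round Size up to the next multiple of Align (Align must be non-zero).
constexpr std::size_t alignTo(std::size_t Size, std::size_t Align) {
  return (Size + Align - 1) / Align * Align;
}

// Bytes requested from the runtime: one record per team, or one record
// per lane of a warp.
constexpr std::size_t pushSize(bool OnePerTeam) {
  const std::size_t RecordSize =
      alignTo(sizeof(GlobalizedRecord), alignof(GlobalizedRecord));
  return OnePerTeam ? RecordSize : RecordSize * kAssumedWarpSize;
}
```

Under these assumptions, `pushSize(false)` is exactly `kAssumedWarpSize` times `pushSize(true)`, which is the "multiplied by the warp" case from the summary.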

This PR removes the record creation and simplifies the front end so that it only emits a
simple runtime call, which is optimized later in the middle end. The middle end will be able
to determine which stack variables escape and which do not, as well as the appropriate
merging of different globalized variables.
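
A host-side mock can make the simplified scheme concrete: one push per escaping variable on entry, and one pop per variable in reverse order on exit. `MockSharingStack` and `globalizeTwoLocals` are hypothetical names, and the class only imitates the general shape of `__kmpc_data_sharing_push_stack` and `__kmpc_data_sharing_pop_stack`. It is not the runtime's implementation or its signature.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side mock of a data-sharing stack (not the device runtime).
class MockSharingStack {
  std::vector<unsigned char> Storage;   // backing memory for all frames
  std::vector<std::size_t> FrameStarts; // start offset of each live frame
  std::size_t Top = 0;                  // first free byte

public:
  explicit MockSharingStack(std::size_t Capacity) : Storage(Capacity) {}

  // Reserve Size bytes and return the start of the new frame.
  void *push(std::size_t Size) {
    assert(Top + Size <= Storage.size() && "mock stack overflow");
    FrameStarts.push_back(Top);
    void *Frame = Storage.data() + Top;
    Top += Size;
    return Frame;
  }

  // Release the most recent frame; frames must be popped in reverse order.
  void pop(void *Frame) {
    assert(!FrameStarts.empty() && "pop without matching push");
    assert(Frame == Storage.data() + FrameStarts.back() &&
           "frames must be popped in reverse push order");
    Top = FrameStarts.back();
    FrameStarts.pop_back();
  }

  std::size_t bytesInUse() const { return Top; }
};

// Roughly what the simplified front end would emit for two escaping
// locals: one push each on entry, matching pops in reverse on exit.
inline std::size_t globalizeTwoLocals(MockSharingStack &Stack) {
  void *A = Stack.push(sizeof(int));
  void *B = Stack.push(sizeof(double));
  const std::size_t Peak = Stack.bytesInUse();
  Stack.pop(B);
  Stack.pop(A);
  return Peak;
}
```

Because each variable gets its own call, a later pass can in principle remove or merge individual pushes once it proves a variable does not escape, which is the motivation stated in the summary.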

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	100 ms	x64 debian > Clang.OpenMP::nvptx_parallel_for_codegen.cpp
	150 ms	x64 debian > Clang.OpenMP::nvptx_target_codegen.cpp
	100 ms	x64 debian > Clang.OpenMP::nvptx_target_teams_distribute_codegen.cpp
	220 ms	x64 debian > Clang.OpenMP::nvptx_target_teams_distribute_parallel_for_codegen.cpp
	160 ms	x64 debian > Clang.OpenMP::nvptx_target_teams_distribute_parallel_for_simd_codegen.cpp
		17 tests failed in total.

Event Timeline

josemonsalve2 created this revision. Nov 2 2020, 10:42 PM

Herald added projects: Restricted Project, Restricted Project, Restricted Project. Nov 2 2020, 10:42 PM

Herald added subscribers: llvm-commits, openmp-commits, cfe-commits.

josemonsalve2 requested review of this revision. Nov 2 2020, 10:42 PM

Herald added a subscriber: sstefan1. Nov 2 2020, 10:42 PM

Harbormaster completed remote builds in B77362: Diff 302467. Nov 2 2020, 11:27 PM

Let's start by adding an updated test to this so we can see how the result looks.

Removing globalized record for parallel regions

When globalization occurs in parallel regions, a record was created that is no longer necessary.
This is expected to be done in the back end.

I'm working on the other tests right now.

Harbormaster completed remote builds in B81479: Diff 310242. Dec 8 2020, 10:02 AM

Modifying 3 more tests to reflect changes in this patch

Harbormaster completed remote builds in B83285: Diff 313352. Dec 22 2020, 9:31 AM

jhuber6 mentioned this in D97680: [OpenMP] Simplify GPU memory globalization. Mar 2 2021, 4:41 PM

This has been completed by @jhuber6 in D97680

Revision Contents

Path	Size
clang/lib/CodeGen/CGOpenMPRuntimeGPU.h	11 lines
clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp	442 lines
clang/test/OpenMP/declare_target_codegen_globalization.cpp	28 lines
clang/test/OpenMP/nvptx_data_sharing.cpp	36 lines
clang/test/OpenMP/nvptx_distribute_parallel_generic_mode_codegen.cpp	43 lines
clang/test/OpenMP/nvptx_parallel_codegen.cpp	25 lines
llvm/include/llvm/Frontend/OpenMP/OMPKinds.def	2 lines
openmp/libomptarget/deviceRTLs/common/src/data_sharing.cu	6 lines
openmp/libomptarget/deviceRTLs/interface.h	2 lines

Diff 313352

clang/lib/CodeGen/CGOpenMPRuntimeGPU.h

[433 lines hidden]	private:
/// and controls the parameters which are passed to this function.		/// and controls the parameters which are passed to this function.
/// The wrapper ensures that the outlined function is called		/// The wrapper ensures that the outlined function is called
/// with the correct arguments when data is shared.		/// with the correct arguments when data is shared.
llvm::Function *createParallelDataSharingWrapper(		llvm::Function *createParallelDataSharingWrapper(
llvm::Function *OutlinedParallelFn, const OMPExecutableDirective &D);		llvm::Function *OutlinedParallelFn, const OMPExecutableDirective &D);

/// The data for the single globalized variable.		/// The data for the single globalized variable.
struct MappedVarData {		struct MappedVarData {
/// Corresponding field in the global record.
const FieldDecl *FD = nullptr;
/// Corresponding address.		/// Corresponding address.
Address PrivateAddr = Address::invalid();		Address PrivateAddr = Address::invalid();
		llvm::Value *globalizedVal;
/// true, if only one element is required (for latprivates in SPMD mode),		/// true, if only one element is required (for latprivates in SPMD mode),
/// false, if need to create based on the warp-size.		/// false, if need to create based on the warp-size.
bool IsOnePerTeam = false;		bool IsOnePerTeam = false;
MappedVarData() = delete;		MappedVarData() = delete;
MappedVarData(const FieldDecl *FD, bool IsOnePerTeam = false)		MappedVarData(bool IsOnePerTeam = false) : IsOnePerTeam(IsOnePerTeam) {}
: FD(FD), IsOnePerTeam(IsOnePerTeam) {}
};		};
/// The map of local variables to their addresses in the global memory.		/// The map of local variables to their addresses in the global memory.
using DeclToAddrMapTy = llvm::MapVector<const Decl *, MappedVarData>;		using DeclToAddrMapTy = llvm::MapVector<const Decl *, MappedVarData>;
/// Set of the parameters passed by value escaping OpenMP context.		/// Set of the parameters passed by value escaping OpenMP context.
using EscapedParamsTy = llvm::SmallPtrSet<const Decl *, 4>;		using EscapedParamsTy = llvm::SmallPtrSet<const Decl *, 4>;
struct FunctionData {		struct FunctionData {
DeclToAddrMapTy LocalVarData;		DeclToAddrMapTy LocalVarData;
llvm::Optional<DeclToAddrMapTy> SecondaryLocalVarData = llvm::None;
EscapedParamsTy EscapedParameters;		EscapedParamsTy EscapedParameters;
llvm::SmallVector<const ValueDecl*, 4> EscapedVariableLengthDecls;		llvm::SmallVector<const ValueDecl*, 4> EscapedVariableLengthDecls;
llvm::SmallVector<llvm::Value *, 4> EscapedVariableLengthDeclsAddrs;		llvm::SmallVector<llvm::Value *, 4> EscapedVariableLengthDeclsAddrs;
const RecordDecl *GlobalRecord = nullptr;
llvm::Optional<const RecordDecl *> SecondaryGlobalRecord = llvm::None;
llvm::Value *GlobalRecordAddr = nullptr;
llvm::Value *IsInSPMDModeFlag = nullptr;		llvm::Value *IsInSPMDModeFlag = nullptr;
std::unique_ptr<CodeGenFunction::OMPMapVars> MappedParams;		std::unique_ptr<CodeGenFunction::OMPMapVars> MappedParams;
};		};
/// Maps the function to the list of the globalized variables with their		/// Maps the function to the list of the globalized variables with their
/// addresses.		/// addresses.
llvm::SmallDenseMap<llvm::Function *, FunctionData> FunctionGlobalizedDecls;		llvm::SmallDenseMap<llvm::Function *, FunctionData> FunctionGlobalizedDecls;
/// List of records for the globalized variables in target/teams/distribute		/// List of records for the globalized variables in target/teams/distribute
/// contexts. Inner records are going to be joined into the single record,		/// contexts. Inner records are going to be joined into the single record,
[12 lines hidden]	private:
llvm::GlobalVariable *KernelTeamsReductionPtr = nullptr;		llvm::GlobalVariable *KernelTeamsReductionPtr = nullptr;
/// List of the records with the list of fields for the reductions across the		/// List of the records with the list of fields for the reductions across the
/// teams. Used to build the intermediate buffer for the fast teams		/// teams. Used to build the intermediate buffer for the fast teams
/// reductions.		/// reductions.
/// All the records are gathered into a union `union.type` is created.		/// All the records are gathered into a union `union.type` is created.
llvm::SmallVector<const RecordDecl *, 4> TeamsReductions;		llvm::SmallVector<const RecordDecl *, 4> TeamsReductions;
/// Shared pointer for the global memory in the global memory buffer used for		/// Shared pointer for the global memory in the global memory buffer used for
/// the given kernel.		/// the given kernel.
llvm::GlobalVariable *KernelStaticGlobalized = nullptr;
/// Pair of the Non-SPMD team and all reductions variables in this team		/// Pair of the Non-SPMD team and all reductions variables in this team
/// region.		/// region.
std::pair<const Decl *, llvm::SmallVector<const ValueDecl *, 4>>		std::pair<const Decl *, llvm::SmallVector<const ValueDecl *, 4>>
TeamAndReductions;		TeamAndReductions;
};		};

} // CodeGen namespace.		} // CodeGen namespace.
} // clang namespace.		} // clang namespace.

#endif // LLVM_CLANG_LIB_CODEGEN_CGOPENMPRUNTIMEGPU_H		#endif // LLVM_CLANG_LIB_CODEGEN_CGOPENMPRUNTIMEGPU_H

clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp

[219 lines hidden]

/// Get the list of variables that can escape their declaration context.		/// Get the list of variables that can escape their declaration context.
class CheckVarsEscapingDeclContext final		class CheckVarsEscapingDeclContext final
: public ConstStmtVisitor<CheckVarsEscapingDeclContext> {		: public ConstStmtVisitor<CheckVarsEscapingDeclContext> {
CodeGenFunction &CGF;		CodeGenFunction &CGF;
llvm::SetVector<const ValueDecl *> EscapedDecls;		llvm::SetVector<const ValueDecl *> EscapedDecls;
llvm::SetVector<const ValueDecl *> EscapedVariableLengthDecls;		llvm::SetVector<const ValueDecl *> EscapedVariableLengthDecls;
llvm::SmallPtrSet<const Decl *, 4> EscapedParameters;		llvm::SmallPtrSet<const Decl *, 4> EscapedParameters;
RecordDecl *GlobalizedRD = nullptr;
llvm::SmallDenseMap<const ValueDecl *, const FieldDecl *> MappedDeclsFields;		llvm::SmallDenseMap<const ValueDecl *, const FieldDecl *> MappedDeclsFields;
bool AllEscaped = false;		bool AllEscaped = false;
bool IsForCombinedParallelRegion = false;		bool IsForCombinedParallelRegion = false;

void markAsEscaped(const ValueDecl *VD) {		void markAsEscaped(const ValueDecl *VD) {
// Do not globalize declare target variables.		// Do not globalize declare target variables.
if (!isa<VarDecl>(VD) ||		if (!isa<VarDecl>(VD) ||
OMPDeclareTargetDeclAttr::isDeclareTargetDeclaration(VD))		OMPDeclareTargetDeclAttr::isDeclareTargetDeclaration(VD))
[93 lines hidden]	for (const CapturedStmt::Capture &C : S->captures()) {
markAsEscaped(VD);		markAsEscaped(VD);
if (isa<OMPCapturedExprDecl>(VD))		if (isa<OMPCapturedExprDecl>(VD))
VisitValueDecl(VD);		VisitValueDecl(VD);
IsForCombinedParallelRegion = SavedIsForCombinedParallelRegion;		IsForCombinedParallelRegion = SavedIsForCombinedParallelRegion;
}		}
}		}
}		}

void buildRecordForGlobalizedVars(bool IsInTTDRegion) {
assert(!GlobalizedRD &&
"Record for globalized variables is built already.");
ArrayRef<const ValueDecl *> EscapedDeclsForParallel, EscapedDeclsForTeams;
unsigned WarpSize = CGF.getTarget().getGridValue(llvm::omp::GV_Warp_Size);
if (IsInTTDRegion)
EscapedDeclsForTeams = EscapedDecls.getArrayRef();
else
EscapedDeclsForParallel = EscapedDecls.getArrayRef();
GlobalizedRD = ::buildRecordForGlobalizedVars(
CGF.getContext(), EscapedDeclsForParallel, EscapedDeclsForTeams,
MappedDeclsFields, WarpSize);
}

public:		public:
CheckVarsEscapingDeclContext(CodeGenFunction &CGF,		CheckVarsEscapingDeclContext(CodeGenFunction &CGF,
ArrayRef<const ValueDecl *> TeamsReductions)		ArrayRef<const ValueDecl *> TeamsReductions)
: CGF(CGF), EscapedDecls(TeamsReductions.begin(), TeamsReductions.end()) {		: CGF(CGF), EscapedDecls(TeamsReductions.begin(), TeamsReductions.end()) {
}		}
virtual ~CheckVarsEscapingDeclContext() = default;		virtual ~CheckVarsEscapingDeclContext() = default;
void VisitDeclStmt(const DeclStmt *S) {		void VisitDeclStmt(const DeclStmt *S) {
[129 lines hidden]	public:
void VisitStmt(const Stmt *S) {		void VisitStmt(const Stmt *S) {
if (!S)		if (!S)
return;		return;
for (const Stmt *Child : S->children())		for (const Stmt *Child : S->children())
if (Child)		if (Child)
Visit(Child);		Visit(Child);
}		}

/// Returns the record that handles all the escaped local variables and used
/// instead of their original storage.
const RecordDecl *getGlobalizedRecord(bool IsInTTDRegion) {
if (!GlobalizedRD)
buildRecordForGlobalizedVars(IsInTTDRegion);
return GlobalizedRD;
}

/// Returns the field in the globalized record for the escaped variable.
const FieldDecl *getFieldForGlobalizedVar(const ValueDecl *VD) const {
assert(GlobalizedRD &&
"Record for globalized variables must be generated already.");
auto I = MappedDeclsFields.find(VD);
if (I == MappedDeclsFields.end())
return nullptr;
return I->getSecond();
}

/// Returns the list of the escaped local variables/parameters.		/// Returns the list of the escaped local variables/parameters.
ArrayRef<const ValueDecl *> getEscapedDecls() const {		ArrayRef<const ValueDecl *> getEscapedDecls() const {
return EscapedDecls.getArrayRef();		return EscapedDecls.getArrayRef();
}		}

/// Checks if the escaped local variable is actually a parameter passed by		/// Checks if the escaped local variable is actually a parameter passed by
/// value.		/// value.
const llvm::SmallPtrSetImpl<const Decl *> &getEscapedParameters() const {		const llvm::SmallPtrSetImpl<const Decl *> &getEscapedParameters() const {
[571 lines hidden]	void Exit(CodeGenFunction &CGF) override {
RT.clearLocThreadIdInsertPt(CGF);		RT.clearLocThreadIdInsertPt(CGF);
RT.emitNonSPMDEntryFooter(CGF, EST);		RT.emitNonSPMDEntryFooter(CGF, EST);
}		}
} Action(EST, WST);		} Action(EST, WST);
CodeGen.setAction(Action);		CodeGen.setAction(Action);
IsInTTDRegion = true;		IsInTTDRegion = true;
// Reserve place for the globalized memory.		// Reserve place for the globalized memory.
GlobalizedRecords.emplace_back();		GlobalizedRecords.emplace_back();
if (!KernelStaticGlobalized) {
KernelStaticGlobalized = new llvm::GlobalVariable(
CGM.getModule(), CGM.VoidPtrTy, /*isConstant=*/false,
llvm::GlobalValue::InternalLinkage,
llvm::ConstantPointerNull::get(CGM.VoidPtrTy),
"_openmp_kernel_static_glob_rd$ptr", /*InsertBefore=*/nullptr,
llvm::GlobalValue::NotThreadLocal,
CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared));
}
emitTargetOutlinedFunctionHelper(D, ParentName, OutlinedFn, OutlinedFnID,		emitTargetOutlinedFunctionHelper(D, ParentName, OutlinedFn, OutlinedFnID,
IsOffloadEntry, CodeGen);		IsOffloadEntry, CodeGen);
IsInTTDRegion = false;		IsInTTDRegion = false;

// Now change the name of the worker function to correspond to this target		// Now change the name of the worker function to correspond to this target
// region's entry function.		// region's entry function.
WST.WorkerFn->setName(Twine(OutlinedFn->getName(), "_worker"));		WST.WorkerFn->setName(Twine(OutlinedFn->getName(), "_worker"));

[107 lines hidden]	void Exit(CodeGenFunction &CGF) override {
RT.clearLocThreadIdInsertPt(CGF);		RT.clearLocThreadIdInsertPt(CGF);
RT.emitSPMDEntryFooter(CGF, EST);		RT.emitSPMDEntryFooter(CGF, EST);
}		}
} Action(*this, EST, D);		} Action(*this, EST, D);
CodeGen.setAction(Action);		CodeGen.setAction(Action);
IsInTTDRegion = true;		IsInTTDRegion = true;
// Reserve place for the globalized memory.		// Reserve place for the globalized memory.
GlobalizedRecords.emplace_back();		GlobalizedRecords.emplace_back();
if (!KernelStaticGlobalized) {
KernelStaticGlobalized = new llvm::GlobalVariable(
CGM.getModule(), CGM.VoidPtrTy, /*isConstant=*/false,
llvm::GlobalValue::InternalLinkage,
llvm::ConstantPointerNull::get(CGM.VoidPtrTy),
"_openmp_kernel_static_glob_rd$ptr", /*InsertBefore=*/nullptr,
llvm::GlobalValue::NotThreadLocal,
CGM.getContext().getTargetAddressSpace(LangAS::cuda_shared));
}
emitTargetOutlinedFunctionHelper(D, ParentName, OutlinedFn, OutlinedFnID,		emitTargetOutlinedFunctionHelper(D, ParentName, OutlinedFn, OutlinedFnID,
IsOffloadEntry, CodeGen);		IsOffloadEntry, CodeGen);
IsInTTDRegion = false;		IsInTTDRegion = false;
}		}

void CGOpenMPRuntimeGPU::emitSPMDEntryHeader(		void CGOpenMPRuntimeGPU::emitSPMDEntryHeader(
CodeGenFunction &CGF, EntryFunctionState &EST,		CodeGenFunction &CGF, EntryFunctionState &EST,
const OMPExecutableDirective &D) {		const OMPExecutableDirective &D) {
[421 lines hidden]	NVPTXPrePostActionTy(
&MappedDeclsFields)		&MappedDeclsFields)
: Loc(Loc), GlobalizedRD(GlobalizedRD),		: Loc(Loc), GlobalizedRD(GlobalizedRD),
MappedDeclsFields(MappedDeclsFields) {}		MappedDeclsFields(MappedDeclsFields) {}
void Enter(CodeGenFunction &CGF) override {		void Enter(CodeGenFunction &CGF) override {
auto &Rt =		auto &Rt =
static_cast<CGOpenMPRuntimeGPU &>(CGF.CGM.getOpenMPRuntime());		static_cast<CGOpenMPRuntimeGPU &>(CGF.CGM.getOpenMPRuntime());
if (GlobalizedRD) {		if (GlobalizedRD) {
auto I = Rt.FunctionGlobalizedDecls.try_emplace(CGF.CurFn).first;		auto I = Rt.FunctionGlobalizedDecls.try_emplace(CGF.CurFn).first;
I->getSecond().GlobalRecord = GlobalizedRD;
I->getSecond().MappedParams =		I->getSecond().MappedParams =
std::make_unique<CodeGenFunction::OMPMapVars>();		std::make_unique<CodeGenFunction::OMPMapVars>();
DeclToAddrMapTy &Data = I->getSecond().LocalVarData;		DeclToAddrMapTy &Data = I->getSecond().LocalVarData;
for (const auto &Pair : MappedDeclsFields) {		for (const auto &Pair : MappedDeclsFields) {
assert(Pair.getFirst()->isCanonicalDecl() &&		assert(Pair.getFirst()->isCanonicalDecl() &&
"Expected canonical declaration");		"Expected canonical declaration");
Data.insert(std::make_pair(Pair.getFirst(),		Data.insert(std::make_pair(Pair.getFirst(),
MappedVarData(Pair.getSecond(),		MappedVarData(/*IsOnePerTeam=*/true)));
/*IsOnePerTeam=*/true)));
}		}
}		}
Rt.emitGenericVarsProlog(CGF, Loc);		Rt.emitGenericVarsProlog(CGF, Loc);
}		}
void Exit(CodeGenFunction &CGF) override {		void Exit(CodeGenFunction &CGF) override {
static_cast<CGOpenMPRuntimeGPU &>(CGF.CGM.getOpenMPRuntime())		static_cast<CGOpenMPRuntimeGPU &>(CGF.CGM.getOpenMPRuntime())
.emitGenericVarsEpilog(CGF);		.emitGenericVarsEpilog(CGF);
}		}
[17 lines hidden]	if (getDataSharingMode(CGM) != CGOpenMPRuntimeGPU::Generic &&
getExecutionMode() != CGOpenMPRuntimeGPU::EM_SPMD)		getExecutionMode() != CGOpenMPRuntimeGPU::EM_SPMD)
return;		return;

CGBuilderTy &Bld = CGF.Builder;		CGBuilderTy &Bld = CGF.Builder;

const auto I = FunctionGlobalizedDecls.find(CGF.CurFn);		const auto I = FunctionGlobalizedDecls.find(CGF.CurFn);
if (I == FunctionGlobalizedDecls.end())		if (I == FunctionGlobalizedDecls.end())
return;		return;
if (const RecordDecl *GlobalizedVarsRecord = I->getSecond().GlobalRecord) {
QualType GlobalRecTy = CGM.getContext().getRecordType(GlobalizedVarsRecord);		// Variables are marked for globalization before, based on an
QualType SecGlobalRecTy;		// scape analysis.

// Recover pointer to this function's global record. The runtime will
// handle the specifics of the allocation of the memory.
// Use actual memory size of the record including the padding
// for alignment purposes.
unsigned Alignment =
CGM.getContext().getTypeAlignInChars(GlobalRecTy).getQuantity();
unsigned GlobalRecordSize =
CGM.getContext().getTypeSizeInChars(GlobalRecTy).getQuantity();
GlobalRecordSize = llvm::alignTo(GlobalRecordSize, Alignment);

llvm::PointerType *GlobalRecPtrTy =
CGF.ConvertTypeForMem(GlobalRecTy)->getPointerTo();
llvm::Value *GlobalRecCastAddr;
llvm::Value *IsTTD = nullptr;
if (!IsInTTDRegion &&
(WithSPMDCheck ||
getExecutionMode() == CGOpenMPRuntimeGPU::EM_Unknown)) {
llvm::BasicBlock *ExitBB = CGF.createBasicBlock(".exit");
llvm::BasicBlock *SPMDBB = CGF.createBasicBlock(".spmd");
llvm::BasicBlock *NonSPMDBB = CGF.createBasicBlock(".non-spmd");
if (I->getSecond().SecondaryGlobalRecord.hasValue()) {
llvm::Value *RTLoc = emitUpdateLocation(CGF, Loc);
llvm::Value *ThreadID = getThreadID(CGF, Loc);
llvm::Value *PL = CGF.EmitRuntimeCall(
OMPBuilder.getOrCreateRuntimeFunction(CGM.getModule(),
OMPRTL___kmpc_parallel_level),
{RTLoc, ThreadID});
IsTTD = Bld.CreateIsNull(PL);
}
llvm::Value *IsSPMD = Bld.CreateIsNotNull(
CGF.EmitNounwindRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
CGM.getModule(), OMPRTL___kmpc_is_spmd_exec_mode)));
Bld.CreateCondBr(IsSPMD, SPMDBB, NonSPMDBB);
// There is no need to emit line number for unconditional branch.
(void)ApplyDebugLocation::CreateEmpty(CGF);
CGF.EmitBlock(SPMDBB);
Address RecPtr = Address(llvm::ConstantPointerNull::get(GlobalRecPtrTy),
CharUnits::fromQuantity(Alignment));
CGF.EmitBranch(ExitBB);
// There is no need to emit line number for unconditional branch.
(void)ApplyDebugLocation::CreateEmpty(CGF);
CGF.EmitBlock(NonSPMDBB);
llvm::Value *Size = llvm::ConstantInt::get(CGM.SizeTy, GlobalRecordSize);
if (const RecordDecl *SecGlobalizedVarsRecord =
I->getSecond().SecondaryGlobalRecord.getValueOr(nullptr)) {
SecGlobalRecTy =
CGM.getContext().getRecordType(SecGlobalizedVarsRecord);

// Recover pointer to this function's global record. The runtime will
// handle the specifics of the allocation of the memory.
// Use actual memory size of the record including the padding
// for alignment purposes.
unsigned Alignment =
CGM.getContext().getTypeAlignInChars(SecGlobalRecTy).getQuantity();
unsigned GlobalRecordSize =
CGM.getContext().getTypeSizeInChars(SecGlobalRecTy).getQuantity();
GlobalRecordSize = llvm::alignTo(GlobalRecordSize, Alignment);
Size = Bld.CreateSelect(
IsTTD, llvm::ConstantInt::get(CGM.SizeTy, GlobalRecordSize), Size);
}
// TODO: allow the usage of shared memory to be controlled by
// the user, for now, default to global.
llvm::Value *GlobalRecordSizeArg[] = {
Size, CGF.Builder.getInt16(/*UseSharedMemory=*/0)};
llvm::Value *GlobalRecValue = CGF.EmitRuntimeCall(
OMPBuilder.getOrCreateRuntimeFunction(
CGM.getModule(), OMPRTL___kmpc_data_sharing_coalesced_push_stack),
GlobalRecordSizeArg);
GlobalRecCastAddr = Bld.CreatePointerBitCastOrAddrSpaceCast(
GlobalRecValue, GlobalRecPtrTy);
CGF.EmitBlock(ExitBB);
auto *Phi = Bld.CreatePHI(GlobalRecPtrTy,
/*NumReservedValues=*/2, "_select_stack");
Phi->addIncoming(RecPtr.getPointer(), SPMDBB);
Phi->addIncoming(GlobalRecCastAddr, NonSPMDBB);
GlobalRecCastAddr = Phi;
I->getSecond().GlobalRecordAddr = Phi;
I->getSecond().IsInSPMDModeFlag = IsSPMD;
} else if (!CGM.getLangOpts().OpenMPCUDATargetParallel && IsInTTDRegion) {
assert(GlobalizedRecords.back().Records.size() < 2 &&
"Expected less than 2 globalized records: one for target and one "
"for teams.");
unsigned Offset = 0;
for (const RecordDecl *RD : GlobalizedRecords.back().Records) {
QualType RDTy = CGM.getContext().getRecordType(RD);
unsigned Alignment =
CGM.getContext().getTypeAlignInChars(RDTy).getQuantity();
unsigned Size = CGM.getContext().getTypeSizeInChars(RDTy).getQuantity();
Offset =
llvm::alignTo(llvm::alignTo(Offset, Alignment) + Size, Alignment);
}
unsigned Alignment =
CGM.getContext().getTypeAlignInChars(GlobalRecTy).getQuantity();
Offset = llvm::alignTo(Offset, Alignment);
GlobalizedRecords.back().Records.push_back(GlobalizedVarsRecord);
++GlobalizedRecords.back().RegionCounter;
if (GlobalizedRecords.back().Records.size() == 1) {
assert(KernelStaticGlobalized &&
"Kernel static pointer must be initialized already.");
auto *UseSharedMemory = new llvm::GlobalVariable(
CGM.getModule(), CGM.Int16Ty, /*isConstant=*/true,
llvm::GlobalValue::InternalLinkage, nullptr,
"_openmp_static_kernel$is_shared");
UseSharedMemory->setUnnamedAddr(llvm::GlobalValue::UnnamedAddr::Global);
QualType Int16Ty = CGM.getContext().getIntTypeForBitwidth(
/*DestWidth=*/16, /*Signed=*/0);
llvm::Value *IsInSharedMemory = CGF.EmitLoadOfScalar(
Address(UseSharedMemory,
CGM.getContext().getTypeAlignInChars(Int16Ty)),
/*Volatile=*/false, Int16Ty, Loc);
auto *StaticGlobalized = new llvm::GlobalVariable(
CGM.getModule(), CGM.Int8Ty, /*isConstant=*/false,
llvm::GlobalValue::CommonLinkage, nullptr);
auto *RecSize = new llvm::GlobalVariable(
CGM.getModule(), CGM.SizeTy, /*isConstant=*/true,
llvm::GlobalValue::InternalLinkage, nullptr,
"_openmp_static_kernel$size");
RecSize->setUnnamedAddr(llvm::GlobalValue::UnnamedAddr::Global);
llvm::Value *Ld = CGF.EmitLoadOfScalar(
Address(RecSize, CGM.getSizeAlign()), /*Volatile=*/false,
CGM.getContext().getSizeType(), Loc);
llvm::Value *ResAddr = Bld.CreatePointerBitCastOrAddrSpaceCast(
KernelStaticGlobalized, CGM.VoidPtrPtrTy);
llvm::Value *GlobalRecordSizeArg[] = {
llvm::ConstantInt::get(
CGM.Int16Ty,
getExecutionMode() == CGOpenMPRuntimeGPU::EM_SPMD ? 1 : 0),
StaticGlobalized, Ld, IsInSharedMemory, ResAddr};
CGF.EmitRuntimeCall(
OMPBuilder.getOrCreateRuntimeFunction(
CGM.getModule(), OMPRTL___kmpc_get_team_static_memory),
GlobalRecordSizeArg);
GlobalizedRecords.back().Buffer = StaticGlobalized;
GlobalizedRecords.back().RecSize = RecSize;
GlobalizedRecords.back().UseSharedMemory = UseSharedMemory;
GlobalizedRecords.back().Loc = Loc;
}
assert(KernelStaticGlobalized && "Global address must be set already.");
Address FrameAddr = CGF.EmitLoadOfPointer(
Address(KernelStaticGlobalized, CGM.getPointerAlign()),
CGM.getContext()
.getPointerType(CGM.getContext().VoidPtrTy)
.castAs<PointerType>());
llvm::Value *GlobalRecValue =
Bld.CreateConstInBoundsGEP(FrameAddr, Offset).getPointer();
I->getSecond().GlobalRecordAddr = GlobalRecValue;
I->getSecond().IsInSPMDModeFlag = nullptr;
GlobalRecCastAddr = Bld.CreatePointerBitCastOrAddrSpaceCast(
GlobalRecValue, CGF.ConvertTypeForMem(GlobalRecTy)->getPointerTo());
} else {
// TODO: allow the usage of shared memory to be controlled by
// the user, for now, default to global.
bool UseSharedMemory =
IsInTTDRegion && GlobalRecordSize <= SharedMemorySize;
llvm::Value *GlobalRecordSizeArg[] = {
llvm::ConstantInt::get(CGM.SizeTy, GlobalRecordSize),
CGF.Builder.getInt16(UseSharedMemory ? 1 : 0)};
llvm::Value *GlobalRecValue = CGF.EmitRuntimeCall(
OMPBuilder.getOrCreateRuntimeFunction(
CGM.getModule(),
IsInTTDRegion ? OMPRTL___kmpc_data_sharing_push_stack
: OMPRTL___kmpc_data_sharing_coalesced_push_stack),
GlobalRecordSizeArg);
GlobalRecCastAddr = Bld.CreatePointerBitCastOrAddrSpaceCast(
GlobalRecValue, GlobalRecPtrTy);
I->getSecond().GlobalRecordAddr = GlobalRecValue;
I->getSecond().IsInSPMDModeFlag = nullptr;
}
LValue Base =
CGF.MakeNaturalAlignPointeeAddrLValue(GlobalRecCastAddr, GlobalRecTy);

// Emit the "global alloca" which is a GEP from the global declaration
// record using the pointer returned by the runtime.
LValue SecBase;
decltype(I->getSecond().LocalVarData)::const_iterator SecIt;
if (IsTTD) {
SecIt = I->getSecond().SecondaryLocalVarData->begin();
llvm::PointerType *SecGlobalRecPtrTy =
CGF.ConvertTypeForMem(SecGlobalRecTy)->getPointerTo();
SecBase = CGF.MakeNaturalAlignPointeeAddrLValue(
Bld.CreatePointerBitCastOrAddrSpaceCast(
I->getSecond().GlobalRecordAddr, SecGlobalRecPtrTy),
SecGlobalRecTy);
}
for (auto &Rec : I->getSecond().LocalVarData) {		for (auto &Rec : I->getSecond().LocalVarData) {
		const auto *VD = cast<VarDecl>(Rec.first);
		// If it is a parameter then load the value into the Globalized memory
bool EscapedParam = I->getSecond().EscapedParameters.count(Rec.first);		bool EscapedParam = I->getSecond().EscapedParameters.count(Rec.first);
llvm::Value *ParValue;		llvm::Value *ParValue;
		QualType VarTy = VD->getType();
if (EscapedParam) {		if (EscapedParam) {
const auto *VD = cast<VarDecl>(Rec.first);
LValue ParLVal =		LValue ParLVal =
CGF.MakeAddrLValue(CGF.GetAddrOfLocalVar(VD), VD->getType());		CGF.MakeAddrLValue(CGF.GetAddrOfLocalVar(VD), VD->getType());
ParValue = CGF.EmitLoadOfScalar(ParLVal, Loc);		ParValue = CGF.EmitLoadOfScalar(ParLVal, Loc);
}		}
LValue VarAddr = CGF.EmitLValueForField(Base, Rec.second.FD);		// Get the size needed in the stack. Logic of how much to allocate
// Emit VarAddr basing on lane-id if required.		// and which part to give to wich thread is inside the runtime function
QualType VarTy;		llvm::Value *Size = CGF.getTypeSize(VD->getType());
if (Rec.second.IsOnePerTeam) {		llvm::Value *VoidPtr = CGF.EmitRuntimeCall(
VarTy = Rec.second.FD->getType();		OMPBuilder.getOrCreateRuntimeFunction(
} else {		CGM.getModule(), OMPRTL___kmpc_data_sharing_push_stack),
llvm::Value *Ptr = CGF.Builder.CreateInBoundsGEP(		{Size});
VarAddr.getAddress(CGF).getPointer(),
{Bld.getInt32(0), getNVPTXLaneID(CGF)});		Rec.second.globalizedVal = VoidPtr;
VarTy =
Rec.second.FD->getType()->castAsArrayTypeUnsafe()->getElementType();		// Let's cast the void pointer and get the address of the globalized
VarAddr = CGF.MakeAddrLValue(		// variable
Address(Ptr, CGM.getContext().getDeclAlign(Rec.first)), VarTy,		llvm::PointerType *VarPtrTy = CGF.ConvertTypeForMem(VarTy)->getPointerTo();
AlignmentSource::Decl);		llvm::Value *castedVoidPtr = Bld.CreatePointerBitCastOrAddrSpaceCast(
}		VoidPtr, VarPtrTy, VD->getName() + "_on_stack");
Rec.second.PrivateAddr = VarAddr.getAddress(CGF);		LValue VarAddr = CGF.MakeNaturalAlignAddrLValue(castedVoidPtr, VarTy);
if (!IsInTTDRegion &&
(WithSPMDCheck ||
getExecutionMode() == CGOpenMPRuntimeGPU::EM_Unknown)) {
assert(I->getSecond().IsInSPMDModeFlag &&
"Expected unknown execution mode or required SPMD check.");
if (IsTTD) {
assert(SecIt->second.IsOnePerTeam &&
"Secondary glob data must be one per team.");
LValue SecVarAddr = CGF.EmitLValueForField(SecBase, SecIt->second.FD);
VarAddr.setAddress(
Address(Bld.CreateSelect(IsTTD, SecVarAddr.getPointer(CGF),
VarAddr.getPointer(CGF)),
VarAddr.getAlignment()));
Rec.second.PrivateAddr = VarAddr.getAddress(CGF);		Rec.second.PrivateAddr = VarAddr.getAddress(CGF);
}
Address GlobalPtr = Rec.second.PrivateAddr;		// If we are working with a parameter it is now time to get the actual value
Address LocalAddr = CGF.CreateMemTemp(VarTy, Rec.second.FD->getName());		// And assign it to the newly globalized location
Rec.second.PrivateAddr = Address(
Bld.CreateSelect(I->getSecond().IsInSPMDModeFlag,
LocalAddr.getPointer(), GlobalPtr.getPointer()),
LocalAddr.getAlignment());
}
if (EscapedParam) {		if (EscapedParam) {
const auto *VD = cast<VarDecl>(Rec.first);		const auto *VD = cast<VarDecl>(Rec.first);
CGF.EmitStoreOfScalar(ParValue, VarAddr);		CGF.EmitStoreOfScalar(ParValue, VarAddr);
I->getSecond().MappedParams->setVarAddr(CGF, VD,		I->getSecond().MappedParams->setVarAddr(CGF, VD, VarAddr.getAddress(CGF));
VarAddr.getAddress(CGF));
}		}
if (IsTTD)
++SecIt;
}		}
}		for (auto &VD : I->getSecond().EscapedVariableLengthDecls) {
for (const ValueDecl *VD : I->getSecond().EscapedVariableLengthDecls) {		// If it is a parameter then load the value into the Globalized memory
// Recover pointer to this function's global record. The runtime will		// QualType VarTy = VD->getType();
// handle the specifics of the allocation of the memory.		// Get the size needed in the stack. Logic of how much to allocate
// Use actual memory size of the record including the padding		// and which part to give to wich thread is inside the runtime function
// for alignment purposes.
CGBuilderTy &Bld = CGF.Builder;
llvm::Value *Size = CGF.getTypeSize(VD->getType());		llvm::Value *Size = CGF.getTypeSize(VD->getType());
CharUnits Align = CGM.getContext().getDeclAlign(VD);		CharUnits Align = CGM.getContext().getDeclAlign(VD);
Size = Bld.CreateNUWAdd(		Size = Bld.CreateNUWAdd(
Size, llvm::ConstantInt::get(CGF.SizeTy, Align.getQuantity() - 1));		Size, llvm::ConstantInt::get(CGF.SizeTy, Align.getQuantity() - 1));
llvm::Value *AlignVal =		llvm::Value *AlignVal =
llvm::ConstantInt::get(CGF.SizeTy, Align.getQuantity());		llvm::ConstantInt::get(CGF.SizeTy, Align.getQuantity());
Size = Bld.CreateUDiv(Size, AlignVal);		Size = Bld.CreateUDiv(Size, AlignVal);
Size = Bld.CreateNUWMul(Size, AlignVal);		Size = Bld.CreateNUWMul(Size, AlignVal);
// TODO: allow the usage of shared memory to be controlled by		llvm::Value *VoidPtr = CGF.EmitRuntimeCall(
// the user, for now, default to global.
llvm::Value *GlobalRecordSizeArg[] = {
Size, CGF.Builder.getInt16(/*UseSharedMemory=*/0)};
llvm::Value *GlobalRecValue = CGF.EmitRuntimeCall(
OMPBuilder.getOrCreateRuntimeFunction(		OMPBuilder.getOrCreateRuntimeFunction(
CGM.getModule(), OMPRTL___kmpc_data_sharing_coalesced_push_stack),		CGM.getModule(), OMPRTL___kmpc_data_sharing_push_stack),
GlobalRecordSizeArg);		{Size});
llvm::Value *GlobalRecCastAddr = Bld.CreatePointerBitCastOrAddrSpaceCast(
GlobalRecValue, CGF.ConvertTypeForMem(VD->getType())->getPointerTo());		I->getSecond().EscapedVariableLengthDeclsAddrs.emplace_back(VoidPtr);
LValue Base = CGF.MakeAddrLValue(GlobalRecCastAddr, VD->getType(),		LValue Base = CGF.MakeAddrLValue(VoidPtr, VD->getType(),
CGM.getContext().getDeclAlign(VD),		CGM.getContext().getDeclAlign(VD),
AlignmentSource::Decl);		AlignmentSource::Decl);
I->getSecond().MappedParams->setVarAddr(CGF, cast<VarDecl>(VD),		I->getSecond().MappedParams->setVarAddr(CGF, cast<VarDecl>(VD),
Base.getAddress(CGF));		Base.getAddress(CGF));
I->getSecond().EscapedVariableLengthDeclsAddrs.emplace_back(GlobalRecValue);
}		}
I->getSecond().MappedParams->apply(CGF);		I->getSecond().MappedParams->apply(CGF);
}		}

void CGOpenMPRuntimeGPU::emitGenericVarsEpilog(CodeGenFunction &CGF,		void CGOpenMPRuntimeGPU::emitGenericVarsEpilog(CodeGenFunction &CGF,
bool WithSPMDCheck) {		bool WithSPMDCheck) {
if (getDataSharingMode(CGM) != CGOpenMPRuntimeGPU::Generic &&		if (getDataSharingMode(CGM) != CGOpenMPRuntimeGPU::Generic &&
getExecutionMode() != CGOpenMPRuntimeGPU::EM_SPMD)		getExecutionMode() != CGOpenMPRuntimeGPU::EM_SPMD)
return;		return;

const auto I = FunctionGlobalizedDecls.find(CGF.CurFn);		const auto I = FunctionGlobalizedDecls.find(CGF.CurFn);
if (I != FunctionGlobalizedDecls.end()) {		if (I != FunctionGlobalizedDecls.end()) {
I->getSecond().MappedParams->restore(CGF);
if (!CGF.HaveInsertPoint())
return;
for (llvm::Value *Addr :		for (llvm::Value *Addr :
llvm::reverse(I->getSecond().EscapedVariableLengthDeclsAddrs)) {		llvm::reverse(I->getSecond().EscapedVariableLengthDeclsAddrs)) {
CGF.EmitRuntimeCall(		CGF.EmitRuntimeCall(
OMPBuilder.getOrCreateRuntimeFunction(		OMPBuilder.getOrCreateRuntimeFunction(
CGM.getModule(), OMPRTL___kmpc_data_sharing_pop_stack),		CGM.getModule(), OMPRTL___kmpc_data_sharing_pop_stack),
Addr);		Addr);
}		}
if (I->getSecond().GlobalRecordAddr) {		for (auto &Rec : llvm::reverse(I->getSecond().LocalVarData)) {
if (!IsInTTDRegion &&		I->getSecond().MappedParams->restore(CGF);
(WithSPMDCheck ||		// const auto *VD = cast<VarDecl>(Rec.first);
getExecutionMode() == CGOpenMPRuntimeGPU::EM_Unknown)) {
CGBuilderTy &Bld = CGF.Builder;		// Get the size needed in the stack. Logic of how much to allocate
llvm::BasicBlock *ExitBB = CGF.createBasicBlock(".exit");		// and which part to give to which thread is inside the runtime function
llvm::BasicBlock *NonSPMDBB = CGF.createBasicBlock(".non-spmd");		// llvm::Value *size = CGF.getTypeSize(VD->getType());
Bld.CreateCondBr(I->getSecond().IsInSPMDModeFlag, ExitBB, NonSPMDBB);
// There is no need to emit line number for unconditional branch.
(void)ApplyDebugLocation::CreateEmpty(CGF);
CGF.EmitBlock(NonSPMDBB);
CGF.EmitRuntimeCall(
OMPBuilder.getOrCreateRuntimeFunction(
CGM.getModule(), OMPRTL___kmpc_data_sharing_pop_stack),
CGF.EmitCastToVoidPtr(I->getSecond().GlobalRecordAddr));
CGF.EmitBlock(ExitBB);
} else if (!CGM.getLangOpts().OpenMPCUDATargetParallel && IsInTTDRegion) {
assert(GlobalizedRecords.back().RegionCounter > 0 &&
"region counter must be > 0.");
--GlobalizedRecords.back().RegionCounter;
// Emit the restore function only in the target region.
if (GlobalizedRecords.back().RegionCounter == 0) {
QualType Int16Ty = CGM.getContext().getIntTypeForBitwidth(
/*DestWidth=*/16, /*Signed=*/0);
llvm::Value *IsInSharedMemory = CGF.EmitLoadOfScalar(
Address(GlobalizedRecords.back().UseSharedMemory,
CGM.getContext().getTypeAlignInChars(Int16Ty)),
/*Volatile=*/false, Int16Ty, GlobalizedRecords.back().Loc);
llvm::Value *Args[] = {
llvm::ConstantInt::get(
CGM.Int16Ty,
getExecutionMode() == CGOpenMPRuntimeGPU::EM_SPMD ? 1 : 0),
IsInSharedMemory};
CGF.EmitRuntimeCall(
OMPBuilder.getOrCreateRuntimeFunction(
CGM.getModule(), OMPRTL___kmpc_restore_team_static_memory),
Args);
}
} else {
CGF.EmitRuntimeCall(		CGF.EmitRuntimeCall(
OMPBuilder.getOrCreateRuntimeFunction(		OMPBuilder.getOrCreateRuntimeFunction(
CGM.getModule(), OMPRTL___kmpc_data_sharing_pop_stack),		CGM.getModule(), OMPRTL___kmpc_data_sharing_pop_stack),
I->getSecond().GlobalRecordAddr);		{Rec.second.globalizedVal});
}
}		}
}		}
}		}

void CGOpenMPRuntimeGPU::emitTeamsCall(CodeGenFunction &CGF,		void CGOpenMPRuntimeGPU::emitTeamsCall(CodeGenFunction &CGF,
const OMPExecutableDirective &D,		const OMPExecutableDirective &D,
SourceLocation Loc,		SourceLocation Loc,
llvm::Function *OutlinedFn,		llvm::Function *OutlinedFn,
▲ Show 20 Lines • Show All 2,260 Lines • ▼ Show 20 Lines	if (const auto *FD = dyn_cast<FunctionDecl>(D)) {
if (NeedToDelayGlobalization &&		if (NeedToDelayGlobalization &&
getExecutionMode() == CGOpenMPRuntimeGPU::EM_SPMD)		getExecutionMode() == CGOpenMPRuntimeGPU::EM_SPMD)
return;		return;
}		}
if (!Body)		if (!Body)
return;		return;
CheckVarsEscapingDeclContext VarChecker(CGF, TeamAndReductions.second);		CheckVarsEscapingDeclContext VarChecker(CGF, TeamAndReductions.second);
VarChecker.Visit(Body);		VarChecker.Visit(Body);
const RecordDecl *GlobalizedVarsRecord =
VarChecker.getGlobalizedRecord(IsInTTDRegion);
TeamAndReductions.first = nullptr;		TeamAndReductions.first = nullptr;
TeamAndReductions.second.clear();		TeamAndReductions.second.clear();
ArrayRef<const ValueDecl *> EscapedVariableLengthDecls =		ArrayRef<const ValueDecl *> EscapedVariableLengthDecls =
VarChecker.getEscapedVariableLengthDecls();		VarChecker.getEscapedVariableLengthDecls();
if (!GlobalizedVarsRecord && EscapedVariableLengthDecls.empty())
return;
auto I = FunctionGlobalizedDecls.try_emplace(CGF.CurFn).first;		auto I = FunctionGlobalizedDecls.try_emplace(CGF.CurFn).first;
I->getSecond().MappedParams =		I->getSecond().MappedParams =
std::make_unique<CodeGenFunction::OMPMapVars>();		std::make_unique<CodeGenFunction::OMPMapVars>();
I->getSecond().GlobalRecord = GlobalizedVarsRecord;
I->getSecond().EscapedParameters.insert(		I->getSecond().EscapedParameters.insert(
VarChecker.getEscapedParameters().begin(),		VarChecker.getEscapedParameters().begin(),
VarChecker.getEscapedParameters().end());		VarChecker.getEscapedParameters().end());
I->getSecond().EscapedVariableLengthDecls.append(		I->getSecond().EscapedVariableLengthDecls.append(
EscapedVariableLengthDecls.begin(), EscapedVariableLengthDecls.end());		EscapedVariableLengthDecls.begin(), EscapedVariableLengthDecls.end());
DeclToAddrMapTy &Data = I->getSecond().LocalVarData;		DeclToAddrMapTy &Data = I->getSecond().LocalVarData;
for (const ValueDecl *VD : VarChecker.getEscapedDecls()) {		for (const ValueDecl *VD : VarChecker.getEscapedDecls()) {
assert(VD->isCanonicalDecl() && "Expected canonical declaration");		assert(VD->isCanonicalDecl() && "Expected canonical declaration");
const FieldDecl *FD = VarChecker.getFieldForGlobalizedVar(VD);		Data.insert(std::make_pair(VD, MappedVarData(IsInTTDRegion)));
Data.insert(std::make_pair(VD, MappedVarData(FD, IsInTTDRegion)));
}
if (!IsInTTDRegion && !NeedToDelayGlobalization && !IsInParallelRegion) {
CheckVarsEscapingDeclContext VarChecker(CGF, llvm::None);
VarChecker.Visit(Body);
I->getSecond().SecondaryGlobalRecord =
VarChecker.getGlobalizedRecord(/*IsInTTDRegion=*/true);
I->getSecond().SecondaryLocalVarData.emplace();
DeclToAddrMapTy &Data = I->getSecond().SecondaryLocalVarData.getValue();
for (const ValueDecl *VD : VarChecker.getEscapedDecls()) {
assert(VD->isCanonicalDecl() && "Expected canonical declaration");
const FieldDecl *FD = VarChecker.getFieldForGlobalizedVar(VD);
Data.insert(
std::make_pair(VD, MappedVarData(FD, /*IsInTTDRegion=*/true)));
}
}		}

if (!NeedToDelayGlobalization) {		if (!NeedToDelayGlobalization) {
emitGenericVarsProlog(CGF, D->getBeginLoc(), /*WithSPMDCheck=*/true);		emitGenericVarsProlog(CGF, D->getBeginLoc(), /*WithSPMDCheck=*/true);
struct GlobalizationScope final : EHScopeStack::Cleanup {		struct GlobalizationScope final : EHScopeStack::Cleanup {
GlobalizationScope() = default;		GlobalizationScope() = default;

void Emit(CodeGenFunction &CGF, Flags flags) override {		void Emit(CodeGenFunction &CGF, Flags flags) override {
static_cast<CGOpenMPRuntimeGPU &>(CGF.CGM.getOpenMPRuntime())		static_cast<CGOpenMPRuntimeGPU &>(CGF.CGM.getOpenMPRuntime())
.emitGenericVarsEpilog(CGF, /*WithSPMDCheck=*/true);		.emitGenericVarsEpilog(CGF, /*WithSPMDCheck=*/true);
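
For reference, the add / udiv / mul sequence that both sides of the prolog hunk above build around `CGF.getTypeSize` rounds the variable's size up to a multiple of its declared alignment before that size is handed to the push call. A minimal host-side sketch of the same arithmetic, assuming plain 64-bit integers; `roundUpToAlignment` is a hypothetical helper name used only for illustration and does not appear in the patch:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the patch): round `size` up to the next
// multiple of `align`, following the same three steps as the emitted IR:
//   1. add (align - 1)      -> Bld.CreateNUWAdd
//   2. divide by align      -> Bld.CreateUDiv
//   3. multiply by align    -> Bld.CreateNUWMul
std::uint64_t roundUpToAlignment(std::uint64_t size, std::uint64_t align) {
  assert(align > 0 && "alignment must be non-zero");
  std::uint64_t padded = size + (align - 1);
  std::uint64_t units = padded / align;
  return units * align;
}
```

Under these assumptions, a 4-byte `int` with 4-byte alignment stays at 4 bytes, while a 5-byte object with the same alignment would be rounded up to 8.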

clang/test/OpenMP/declare_target_codegen_globalization.cpp

Show All 18 Lines	#pragma omp target parallel map(from:a)
int b;		int b;
a = foo(b) + bar();		a = foo(b) + bar();
}		}
return a;		return a;
}		}

// parallel region		// parallel region
// CHECK: define {{.}}void @{{.}}(i32* noalias {{.}}, i32 noalias {{.}}, i32 nonnull align {{[0-9]+}} dereferenceable({{[0-9]+}}) %{{.*}})		// CHECK: define {{.}}void @{{.}}(i32* noalias {{.}}, i32 noalias {{.}}, i32 nonnull align {{[0-9]+}} dereferenceable({{[0-9]+}}) %{{.*}})
// CHECK-NOT: call i8* @__kmpc_data_sharing_coalesced_push_stack(		// CHECK-NOT: call i8* @__kmpc_data_sharing_push_stack(
// CHECK: [[B_ADDR:%.+]] = alloca i32,		// CHECK: [[B_ADDR:%.+]] = alloca i32,
// CHECK: call {{.}}[[FOO:@.foo.]](i32 nonnull align {{[0-9]+}} dereferenceable({{[0-9]+}}) [[B_ADDR]])		// CHECK: call {{.}}[[FOO:@.foo.]](i32 nonnull align {{[0-9]+}} dereferenceable({{[0-9]+}}) [[B_ADDR]])
// CHECK: call {{.}}[[BAR:@.bar.*]]()		// CHECK: call {{.}}[[BAR:@.bar.*]]()
// CHECK-NOT: call void @__kmpc_data_sharing_pop_stack(		// CHECK-NOT: call void @__kmpc_data_sharing_pop_stack(
// CHECK: ret void		// CHECK: ret void

// CHECK: define {{.}}[[FOO]](i32 nonnull align {{[0-9]+}} dereferenceable{{.*}})		// CHECK: define {{.}}[[FOO]](i32 nonnull align {{[0-9]+}} dereferenceable{{.*}})
// CHECK-NOT: @__kmpc_data_sharing_coalesced_push_stack		// CHECK-NOT: @__kmpc_data_sharing_push_stack

// CHECK: define {{.*}}[[BAR]]()		// CHECK: define {{.*}}[[BAR]]()
// CHECK: alloca i32,		// CHECK: [[SHARED_A:%.+]] = call i8* @__kmpc_data_sharing_push_stack(i64 4)
// CHECK: [[A_LOCAL_ADDR:%.+]] = alloca i32,		// CHECK: [[SHARED_A2:%.+]] = bitcast i8* [[SHARED_A]] to i32*
// CHECK: [[RES:%.+]] = call i8 @__kmpc_is_spmd_exec_mode()		// CHECK: call {{.}}[[FOO]](i32 nonnull align {{[0-9]+}} dereferenceable{{.*}} [[SHARED_A2]])
// CHECK: [[IS_SPMD:%.+]] = icmp ne i8 [[RES]], 0		// CHECK: call void @__kmpc_data_sharing_pop_stack(i8* [[SHARED_A]])
// CHECK: br i1 [[IS_SPMD]], label
// CHECK: br label
// CHECK: [[RES:%.+]] = call i8* @__kmpc_data_sharing_coalesced_push_stack(i64 128, i16 0)
// CHECK: [[GLOBALS:%.+]] = bitcast i8* [[RES]] to [[GLOBAL_ST:%.+]]*
// CHECK: br label
// CHECK: [[ITEMS:%.+]] = phi [[GLOBAL_ST]]* [ null, {{.+}} ], [ [[GLOBALS]], {{.+}} ]
// CHECK: [[A_ADDR:%.+]] = getelementptr inbounds [[GLOBAL_ST]], [[GLOBAL_ST]]* [[ITEMS]], i{{[0-9]+}} 0, i{{[0-9]+}} 0
// CHECK: [[TID:%.+]] = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
// CHECK: [[LID:%.+]] = and i32 [[TID]], 31
// CHECK: [[A_GLOBAL_ADDR:%.+]] = getelementptr inbounds [32 x i32], [32 x i32]* [[A_ADDR]], i32 0, i32 [[LID]]
// CHECK: [[A_ADDR:%.+]] = select i1 [[IS_SPMD]], i32* [[A_LOCAL_ADDR]], i32* [[A_GLOBAL_ADDR]]
// CHECK: call {{.}}[[FOO]](i32 nonnull align {{[0-9]+}} dereferenceable{{.*}} [[A_ADDR]])
// CHECK: br i1 [[IS_SPMD]], label
// CHECK: [[BC:%.+]] = bitcast [[GLOBAL_ST]]* [[ITEMS]] to i8*
// CHECK: call void @__kmpc_data_sharing_pop_stack(i8* [[BC]])
// CHECK: br label
// CHECK: ret i32		// CHECK: ret i32

clang/test/OpenMP/nvptx_data_sharing.cpp

	// Test device global memory data sharing codegen.			// Test device global memory data sharing codegen.
	///==========================================================================///			///==========================================================================///

	// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc			// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc
	// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix CK1 --check-prefix SEQ			// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix CK1
	// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns -fopenmp-cuda-parallel-target-regions | FileCheck %s --check-prefix CK1 --check-prefix PAR

	// expected-no-diagnostics			// expected-no-diagnostics

	#ifndef HEADER			#ifndef HEADER
	#define HEADER			#define HEADER

	void test_ds(){			void test_ds(){
	#pragma omp target			#pragma omp target
	{			{
	int a = 10;			int a = 10;
	#pragma omp parallel			#pragma omp parallel
	{			{
	a = 1000;			a = 1000;
	}			}
	int b = 100;			int b = 100;
	int c = 1000;			int c = 1000;
	#pragma omp parallel private(c)			#pragma omp parallel private(c)
	{			{
	int *c1 = &c;			int *c1 = &c;
	b = a + 10000;			b = a + 10000;
	}			}
	}			}
	}			}
	// SEQ: [[MEM_TY:%.+]] = type { [128 x i8] }
	// SEQ-DAG: [[SHARED_GLOBAL_RD:@.+]] = common addrspace(3) global [[MEM_TY]] zeroinitializer
	// SEQ-DAG: [[KERNEL_PTR:@.+]] = internal addrspace(3) global i8* null
	// SEQ-DAG: [[KERNEL_SIZE:@.+]] = internal unnamed_addr constant i64 8
	// SEQ-DAG: [[KERNEL_SHARED:@.+]] = internal unnamed_addr constant i16 1

	/// ========= In the worker function ========= ///			/// ========= In the worker function ========= ///
	// CK1: {{.}}define internal void @__omp_offloading{{.}}test_ds{{.*}}_worker()			// CK1: {{.}}define internal void @__omp_offloading{{.}}test_ds{{.*}}_worker()
	// CK1: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)			// CK1: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)
	// CK1-NOT: call void @__kmpc_data_sharing_init_stack			// CK1-NOT: call void @__kmpc_data_sharing_init_stack

	/// ========= In the kernel function ========= ///			/// ========= In the kernel function ========= ///

	// CK1: {{.}}define weak void @__omp_offloading{{.}}test_ds{{.*}}()			// CK1: {{.}}define weak void @__omp_offloading{{.}}test_ds{{.*}}()
	// CK1: [[SHAREDARGS1:%.+]] = alloca i8**			// CK1: [[SHAREDARGS1:%.+]] = alloca i8**
	// CK1: [[SHAREDARGS2:%.+]] = alloca i8**			// CK1: [[SHAREDARGS2:%.+]] = alloca i8**
	// CK1: call void @__kmpc_kernel_init			// CK1: call void @__kmpc_kernel_init
	// CK1: call void @__kmpc_data_sharing_init_stack			// CK1: call void @__kmpc_data_sharing_init_stack
	// SEQ: [[SHARED_MEM_FLAG:%.+]] = load i16, i16* [[KERNEL_SHARED]],			// CK1: [[GLOBALSTACK_A:%.+]] = call i8* @__kmpc_data_sharing_push_stack(i{{32|64}} 4)
	// SEQ: [[SIZE:%.+]] = load i64, i64* [[KERNEL_SIZE]],			// CK1: [[GLOBALSTACK_A2:%.+]] = bitcast i8* [[GLOBALSTACK_A]] to {{.*}}
	// SEQ: call void @__kmpc_get_team_static_memory(i16 0, i8* addrspacecast (i8 addrspace(3)* getelementptr inbounds ([[MEM_TY]], [[MEM_TY]] addrspace(3)* [[SHARED_GLOBAL_RD]], i32 0, i32 0, i32 0) to i8), i64 [[SIZE]], i16 [[SHARED_MEM_FLAG]], i8* addrspacecast (i8* addrspace(3)* [[KERNEL_PTR]] to i8**))			// CK1: [[GLOBALSTACK_B:%.+]] = call i8* @__kmpc_data_sharing_push_stack(i{{32|64}} 4)
	// SEQ: [[KERNEL_RD:%.+]] = load i8, i8 addrspace(3)* [[KERNEL_PTR]],			// CK1: [[GLOBALSTACK_B2:%.+]] = bitcast i8* [[GLOBALSTACK_B]] to {{.*}}
	// SEQ: [[GLOBALSTACK:%.+]] = getelementptr inbounds i8, i8* [[KERNEL_RD]], i64 0			// CK1: store i32 10, i32* [[GLOBALSTACK_A2]], align 4
	// PAR: [[GLOBALSTACK:%.+]] = call i8* @__kmpc_data_sharing_push_stack(i{{32|64}} 8, i16 1)
	// CK1: [[GLOBALSTACK2:%.+]] = bitcast i8* [[GLOBALSTACK]] to %struct._globalized_locals_ty*
	// CK1: [[A:%.+]] = getelementptr inbounds %struct._globalized_locals_ty, %struct._globalized_locals_ty* [[GLOBALSTACK2]], i32 0, i32 0
	// CK1: [[B:%.+]] = getelementptr inbounds %struct._globalized_locals_ty, %struct._globalized_locals_ty* [[GLOBALSTACK2]], i32 0, i32 1
	// CK1: store i32 10, i32* [[A]]
	// CK1: call void @__kmpc_kernel_prepare_parallel({{.*}})			// CK1: call void @__kmpc_kernel_prepare_parallel({{.*}})
	// CK1: call void @__kmpc_begin_sharing_variables(i8*** [[SHAREDARGS1]], i64 1)			// CK1: call void @__kmpc_begin_sharing_variables(i8*** [[SHAREDARGS1]], i64 1)
	// CK1: [[SHARGSTMP1:%.+]] = load i8, i8* [[SHAREDARGS1]]			// CK1: [[SHARGSTMP1:%.+]] = load i8, i8* [[SHAREDARGS1]]
	// CK1: [[SHARGSTMP2:%.+]] = getelementptr inbounds i8, i8* [[SHARGSTMP1]], i64 0			// CK1: [[SHARGSTMP2:%.+]] = getelementptr inbounds i8, i8* [[SHARGSTMP1]], i64 0
	// CK1: [[SHAREDVAR:%.+]] = bitcast i32* [[A]] to i8*			// CK1: [[SHAREDVAR:%.+]] = bitcast i32* [[GLOBALSTACK_A2]] to i8*
	// CK1: store i8* [[SHAREDVAR]], i8** [[SHARGSTMP2]]			// CK1: store i8* [[SHAREDVAR]], i8** [[SHARGSTMP2]]
	// CK1: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)			// CK1: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)
	// CK1: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)			// CK1: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)
	// CK1: call void @__kmpc_end_sharing_variables()			// CK1: call void @__kmpc_end_sharing_variables()
	// CK1: store i32 100, i32* [[B]]			// CK1: store i32 100, i32* [[GLOBALSTACK_B2]]
	// CK1: call void @__kmpc_kernel_prepare_parallel({{.*}})			// CK1: call void @__kmpc_kernel_prepare_parallel({{.*}})
	// CK1: call void @__kmpc_begin_sharing_variables(i8*** [[SHAREDARGS2]], i64 2)			// CK1: call void @__kmpc_begin_sharing_variables(i8*** [[SHAREDARGS2]], i64 2)
	// CK1: [[SHARGSTMP3:%.+]] = load i8, i8* [[SHAREDARGS2]]			// CK1: [[SHARGSTMP3:%.+]] = load i8, i8* [[SHAREDARGS2]]
	// CK1: [[SHARGSTMP4:%.+]] = getelementptr inbounds i8, i8* [[SHARGSTMP3]], i64 0			// CK1: [[SHARGSTMP4:%.+]] = getelementptr inbounds i8, i8* [[SHARGSTMP3]], i64 0
	// CK1: [[SHAREDVAR1:%.+]] = bitcast i32* [[B]] to i8*			// CK1: [[SHAREDVAR1:%.+]] = bitcast i32* [[GLOBALSTACK_B2]] to i8*
	// CK1: store i8* [[SHAREDVAR1]], i8** [[SHARGSTMP4]]			// CK1: store i8* [[SHAREDVAR1]], i8** [[SHARGSTMP4]]
	// CK1: [[SHARGSTMP12:%.+]] = getelementptr inbounds i8, i8* [[SHARGSTMP3]], i64 1			// CK1: [[SHARGSTMP12:%.+]] = getelementptr inbounds i8, i8* [[SHARGSTMP3]], i64 1
	// CK1: [[SHAREDVAR2:%.+]] = bitcast i32* [[A]] to i8*			// CK1: [[SHAREDVAR2:%.+]] = bitcast i32* [[GLOBALSTACK_A2]] to i8*
	// CK1: store i8* [[SHAREDVAR2]], i8** [[SHARGSTMP12]]			// CK1: store i8* [[SHAREDVAR2]], i8** [[SHARGSTMP12]]
	// CK1: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)			// CK1: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)
	// CK1: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)			// CK1: call void @__kmpc_barrier_simple_spmd(%struct.ident_t* null, i32 0)
	// CK1: call void @__kmpc_end_sharing_variables()			// CK1: call void @__kmpc_end_sharing_variables()
	// SEQ: [[SHARED_MEM_FLAG:%.+]] = load i16, i16* [[KERNEL_SHARED]],			// CK1: call void @__kmpc_data_sharing_pop_stack(i8* [[GLOBALSTACK_B]])
	// SEQ: call void @__kmpc_restore_team_static_memory(i16 0, i16 [[SHARED_MEM_FLAG]])			// CK1: call void @__kmpc_data_sharing_pop_stack(i8* [[GLOBALSTACK_A]])
	// PAR: call void @__kmpc_data_sharing_pop_stack(i8* [[GLOBALSTACK]])
	// CK1: call void @__kmpc_kernel_deinit(i16 1)			// CK1: call void @__kmpc_kernel_deinit(i16 1)

	/// ========= In the data sharing wrapper function ========= ///			/// ========= In the data sharing wrapper function ========= ///

	// CK1: {{.}}define internal void @__omp_outlined{{.}}wrapper({{.*}})			// CK1: {{.}}define internal void @__omp_outlined{{.}}wrapper({{.*}})
	// CK1: [[SHAREDARGS4:%.+]] = alloca i8**			// CK1: [[SHAREDARGS4:%.+]] = alloca i8**
	// CK1: call void @__kmpc_get_shared_variables(i8*** [[SHAREDARGS4]])			// CK1: call void @__kmpc_get_shared_variables(i8*** [[SHAREDARGS4]])
	// CK1: [[SHARGSTMP13:%.+]] = load i8, i8* [[SHAREDARGS4]]			// CK1: [[SHARGSTMP13:%.+]] = load i8, i8* [[SHAREDARGS4]]
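
The updated CK1 lines above expect one `__kmpc_data_sharing_push_stack` call per globalized variable and the matching `__kmpc_data_sharing_pop_stack` calls in reverse order (B is popped before A), which matches the `llvm::reverse` loops in the epilog. The sketch below illustrates that LIFO discipline with a host-side mock. `MockSharingStack` and `globalizeTwoInts` are hypothetical names introduced for illustration, and the mock makes no claim about how the device runtime actually manages its memory:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for the device data-sharing stack.
class MockSharingStack {
public:
  explicit MockSharingStack(std::size_t capacity)
      : storage(capacity), top(0) {}

  // Reserve `size` bytes and return their address (models push_stack).
  void *push(std::size_t size) {
    assert(top + size <= storage.size() && "mock stack overflow");
    frames.push_back(top);
    void *addr = storage.data() + top;
    top += size;
    return addr;
  }

  // Release the most recent allocation (models pop_stack).
  // Returns false when `addr` is not the most recent allocation.
  bool pop(void *addr) {
    if (frames.empty() || storage.data() + frames.back() != addr)
      return false;
    top = frames.back();
    frames.pop_back();
    return true;
  }

  // Number of bytes currently reserved.
  std::size_t used() const { return top; }

private:
  std::vector<unsigned char> storage; // backing memory
  std::vector<std::size_t> frames;    // start offset of each live allocation
  std::size_t top;                    // first free byte
};

// Mirrors the shape the test expects: push A, push B, ..., pop B, pop A.
bool globalizeTwoInts(MockSharingStack &stack) {
  void *a = stack.push(sizeof(int));
  void *b = stack.push(sizeof(int));
  bool ok = stack.pop(b);   // last pushed, first popped
  ok = stack.pop(a) && ok;
  return ok && stack.used() == 0;
}
```

In this mock, popping A while B is still live is rejected, which is why the epilog has to walk the recorded addresses in reverse.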

clang/test/OpenMP/nvptx_distribute_parallel_generic_mode_codegen.cpp

	// Test target codegen - host bc file has to be created first.			// Test target codegen - host bc file has to be created first.
	// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc			// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc
	// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -x c++ -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns \| FileCheck %s --check-prefix CHECK --check-prefix CHECK-64 --check-prefix SEQ			// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -x c++ -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns \| FileCheck %s --check-prefix CHECK
	// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -x c++ -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns -fopenmp-cuda-parallel-target-regions \| FileCheck %s --check-prefix CHECK --check-prefix CHECK-64 --check-prefix PAR
	// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -x c++ -triple i386-unknown-unknown -fopenmp-targets=nvptx-nvidia-cuda -emit-llvm-bc %s -o %t-x86-host.bc			// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -x c++ -triple i386-unknown-unknown -fopenmp-targets=nvptx-nvidia-cuda -emit-llvm-bc %s -o %t-x86-host.bc
	// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -x c++ -triple nvptx-unknown-unknown -fopenmp-targets=nvptx-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns \| FileCheck %s --check-prefix CHECK --check-prefix CHECK-32 --check-prefix SEQ			// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -x c++ -triple nvptx-unknown-unknown -fopenmp-targets=nvptx-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns \| FileCheck %s --check-prefix CHECK
	// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -fexceptions -fcxx-exceptions -x c++ -triple nvptx-unknown-unknown -fopenmp-targets=nvptx-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns \| FileCheck %s --check-prefix CHECK --check-prefix CHECK-32 --check-prefix SEQ			// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -fexceptions -fcxx-exceptions -x c++ -triple nvptx-unknown-unknown -fopenmp-targets=nvptx-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns \| FileCheck %s --check-prefix CHECK
	// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -x c++ -triple nvptx-unknown-unknown -fopenmp-targets=nvptx-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns -fopenmp-cuda-parallel-target-regions \| FileCheck %s --check-prefix CHECK --check-prefix CHECK-32 --check-prefix PAR
	// RUN: %clang_cc1 -verify -fopenmp -fopenmp-version=45 -fexceptions -fcxx-exceptions -x c++ -triple nvptx-unknown-unknown -fopenmp-targets=nvptx-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns -fopenmp-cuda-parallel-target-regions \| FileCheck %s --check-prefix CHECK --check-prefix CHECK-32 --check-prefix PAR

	// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc			// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc
	// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns \| FileCheck %s --check-prefix CHECK --check-prefix CHECK-64 --check-prefix SEQ			// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns \| FileCheck %s --check-prefix CHECK
	// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns -fopenmp-cuda-parallel-target-regions \| FileCheck %s --check-prefix CHECK --check-prefix CHECK-64 --check-prefix PAR
	// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple i386-unknown-unknown -fopenmp-targets=nvptx-nvidia-cuda -emit-llvm-bc %s -o %t-x86-host.bc			// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple i386-unknown-unknown -fopenmp-targets=nvptx-nvidia-cuda -emit-llvm-bc %s -o %t-x86-host.bc
	// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx-unknown-unknown -fopenmp-targets=nvptx-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns \| FileCheck %s --check-prefix CHECK --check-prefix CHECK-32 --check-prefix SEQ			// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx-unknown-unknown -fopenmp-targets=nvptx-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns \| FileCheck %s --check-prefix CHECK
	// RUN: %clang_cc1 -verify -fopenmp -fexceptions -fcxx-exceptions -x c++ -triple nvptx-unknown-unknown -fopenmp-targets=nvptx-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns \| FileCheck %s --check-prefix CHECK --check-prefix CHECK-32 --check-prefix SEQ			// RUN: %clang_cc1 -verify -fopenmp -fexceptions -fcxx-exceptions -x c++ -triple nvptx-unknown-unknown -fopenmp-targets=nvptx-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns \| FileCheck %s --check-prefix CHECK
	// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx-unknown-unknown -fopenmp-targets=nvptx-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns -fopenmp-cuda-parallel-target-regions \| FileCheck %s --check-prefix CHECK --check-prefix CHECK-32 --check-prefix PAR
	// RUN: %clang_cc1 -verify -fopenmp -fexceptions -fcxx-exceptions -x c++ -triple nvptx-unknown-unknown -fopenmp-targets=nvptx-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns -fopenmp-cuda-parallel-target-regions \| FileCheck %s --check-prefix CHECK --check-prefix CHECK-32 --check-prefix PAR

	// expected-no-diagnostics			// expected-no-diagnostics
	#ifndef HEADER			#ifndef HEADER
	#define HEADER			#define HEADER

	int a;			int a;

	int foo(int *a);			int foo(int *a);

	int main(int argc, char **argv) {			int main(int argc, char **argv) {
	int b[10], c[10], d[10];			int b[10], c[10], d[10];
	#pragma omp target teams map(tofrom:a)			#pragma omp target teams map(tofrom:a)
	#pragma omp distribute parallel for firstprivate(b) lastprivate(c) if(a)			#pragma omp distribute parallel for firstprivate(b) lastprivate(c) if(a)
	for (int i= 0; i < argc; ++i)			for (int i= 0; i < argc; ++i)
	a = foo(&i) + foo(&a) + foo(&b[i]) + foo(&c[i]) + foo(&d[i]);			a = foo(&i) + foo(&a) + foo(&b[i]) + foo(&c[i]) + foo(&d[i]);
	return 0;			return 0;
	}			}

	// SEQ: [[MEM_TY:%.+]] = type { [128 x i8] }			// CHECK: @__omp_offloading_{{.*}}_main_[[LINE:l.+]]_exec_mode = weak constant i8 0
	// SEQ-DAG: [[SHARED_GLOBAL_RD:@.+]] = common addrspace(3) global [[MEM_TY]] zeroinitializer			// CHECK-NOT: call void @__kmpc_data_sharing_init_stack
	// SEQ-DAG: [[KERNEL_PTR:@.+]] = internal addrspace(3) global i8* null			// CHECK-NOT: call void @__kmpc_data_sharing_push_stack
	// SEQ-DAG: [[KERNEL_SIZE:@.+]] = internal unnamed_addr constant i{{64|32}} 40
	// SEQ-DAG: [[KERNEL_SHARED:@.+]] = internal unnamed_addr constant i16 1
	// CHECK-DAG: @__omp_offloading_{{.*}}_main_[[LINE:l.+]]_exec_mode = weak constant i8 0

	// CHECK: define weak void @__omp_offloading_{{.}}_main_[[LINE]]([10 x i32] nonnull align 4 dereferenceable(40) %{{.+}}, [10 x i32]* nonnull align 4 dereferenceable(40) %{{.+}}, i32* nonnull align 4 dereferenceable(4) %{{.+}}, i{{64\|32}} %{{.+}}, [10 x i32]* nonnull align 4 dereferenceable(40) %{{.+}})			// CHECK: define weak void @__omp_offloading_{{.}}_main_[[LINE]]([10 x i32] nonnull align 4 dereferenceable(40) %{{.+}}, [10 x i32]* nonnull align 4 dereferenceable(40) %{{.+}}, i32* nonnull align 4 dereferenceable(4) %{{.+}}, i{{64\|32}} %{{.+}}, [10 x i32]* nonnull align 4 dereferenceable(40) %{{.+}})
	// SEQ: [[SHARED:%.+]] = load i16, i16* [[KERNEL_SHARED]],			// CHECK: [[CVOIDPTR:%.+]] = call i8* @__kmpc_data_sharing_push_stack(i{{32|64}} 40)
	// SEQ: [[SIZE:%.+]] = load i{{64|32}}, i{{64|32}}* [[KERNEL_SIZE]],			// CHECK: [[CSTACK:%.+]] = bitcast i8* [[CVOIDPTR]] to [10 x i{{32|64}}]*
	// SEQ: call void @__kmpc_get_team_static_memory(i16 1, i8* addrspacecast (i8 addrspace(3)* getelementptr inbounds ([[MEM_TY]], [[MEM_TY]] addrspace(3)* [[SHARED_GLOBAL_RD]], i32 0, i32 0, i32 0) to i8), i{{64\|32}} [[SIZE]], i16 [[SHARED]], i8* addrspacecast (i8* addrspace(3)* [[KERNEL_PTR]] to i8**))
	// SEQ: [[PTR:%.+]] = load i8, i8 addrspace(3)* [[KERNEL_PTR]],
	// SEQ: [[GEP:%.+]] = getelementptr inbounds i8, i8* [[PTR]], i{{64|32}} 0
	// PAR: [[GEP:%.+]] = call i8* @__kmpc_data_sharing_push_stack(i{{32|64}} 40, i16 1)
	// CHECK: [[STACK:%.+]] = bitcast i8* [[GEP]] to %struct._globalized_locals_ty*
	// CHECK: getelementptr inbounds %struct._globalized_locals_ty, %struct._globalized_locals_ty* [[STACK]], i{{32|64}} 0, i{{32|64}} 0
	// CHECK-NOT: getelementptr inbounds %struct._globalized_locals_ty, %struct._globalized_locals_ty* [[STACK]],
	// CHECK: call void @__kmpc_for_static_init_4(			// CHECK: call void @__kmpc_for_static_init_4(

	// CHECK: call void [[PARALLEL:@.+]](			// CHECK: call void [[PARALLEL:@.+]](

	// CHECK: call void @__kmpc_for_static_fini(%struct.ident_t* @			// CHECK: call void @__kmpc_for_static_fini(%struct.ident_t* @

	// SEQ: [[SHARED:%.+]] = load i16, i16* [[KERNEL_SHARED]],			// PAR: call void @__kmpc_data_sharing_pop_stack(i8* [[CVOIDPTR]])
	// SEQ: call void @__kmpc_restore_team_static_memory(i16 1, i16 [[SHARED]])
	// PAR: call void @__kmpc_data_sharing_pop_stack(i8* [[GEP]])

	// CHECK: define internal void [[PARALLEL]](			// CHECK: define internal void [[PARALLEL]](
	// CHECK-NOT: call i8* @__kmpc_data_sharing_push_stack(			// CHECK-NOT: call i8* @__kmpc_data_sharing_push_stack(

	// CHECK-NOT: call void @__kmpc_data_sharing_pop_stack(			// CHECK-NOT: call void @__kmpc_data_sharing_pop_stack(

	#endif			#endif

clang/test/OpenMP/nvptx_parallel_codegen.cpp

	// Test target codegen - host bc file has to be created first.			// Test target codegen - host bc file has to be created first.
	// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc			// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc
	// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix CHECK --check-prefix CHECK-64 --check-prefix SEQ
	// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns -fopenmp-cuda-parallel-target-regions | FileCheck %s --check-prefix CHECK --check-prefix CHECK-64 --check-prefix PAR			// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -disable-llvm-optzns -fopenmp-cuda-parallel-target-regions | FileCheck %s --check-prefix CHECK --check-prefix CHECK-64 --check-prefix PAR
	// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple i386-unknown-unknown -fopenmp-targets=nvptx-nvidia-cuda -emit-llvm-bc %s -o %t-x86-host.bc			// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple i386-unknown-unknown -fopenmp-targets=nvptx-nvidia-cuda -emit-llvm-bc %s -o %t-x86-host.bc
	// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx-unknown-unknown -fopenmp-targets=nvptx-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix CHECK --check-prefix CHECK-32 --check-prefix SEQ
	// RUN: %clang_cc1 -verify -fopenmp -fexceptions -fcxx-exceptions -x c++ -triple nvptx-unknown-unknown -fopenmp-targets=nvptx-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns | FileCheck %s --check-prefix CHECK --check-prefix CHECK-32 --check-prefix SEQ
	// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx-unknown-unknown -fopenmp-targets=nvptx-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns -fopenmp-cuda-parallel-target-regions | FileCheck %s --check-prefix CHECK --check-prefix CHECK-32 --check-prefix PAR			// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx-unknown-unknown -fopenmp-targets=nvptx-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns -fopenmp-cuda-parallel-target-regions | FileCheck %s --check-prefix CHECK --check-prefix CHECK-32 --check-prefix PAR
	// RUN: %clang_cc1 -verify -fopenmp -fexceptions -fcxx-exceptions -x c++ -triple nvptx-unknown-unknown -fopenmp-targets=nvptx-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns -fopenmp-cuda-parallel-target-regions | FileCheck %s --check-prefix CHECK --check-prefix CHECK-32 --check-prefix PAR			// RUN: %clang_cc1 -verify -fopenmp -fexceptions -fcxx-exceptions -x c++ -triple nvptx-unknown-unknown -fopenmp-targets=nvptx-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-x86-host.bc -o - -disable-llvm-optzns -fopenmp-cuda-parallel-target-regions | FileCheck %s --check-prefix CHECK --check-prefix CHECK-32 --check-prefix PAR
	// expected-no-diagnostics			// expected-no-diagnostics
	#ifndef HEADER			#ifndef HEADER
	#define HEADER			#define HEADER




	template<typename tx>			template<typename tx>
	tx ftemplate(int n) {			tx ftemplate(int n) {
	tx a = 0;			tx a = 0;
	short aa = 0;			short aa = 0;
	tx b[10];			tx b[10];

	#pragma omp target if(0)			#pragma omp target if(0)
	{			{
	[48 lines hidden]
	int bar(int n){			int bar(int n){
	int a = 0;			int a = 0;

	a += ftemplate<int>(n);			a += ftemplate<int>(n);

	return a;			return a;
	}			}

	// SEQ: [[MEM_TY:%.+]] = type { [128 x i8] }
	// SEQ-DAG: [[SHARED_GLOBAL_RD:@.+]] = common addrspace(3) global [[MEM_TY]] zeroinitializer
	// SEQ-DAG: [[KERNEL_PTR:@.+]] = internal addrspace(3) global i8* null
	// SEQ-DAG: [[KERNEL_SIZE:@.+]] = internal unnamed_addr constant i{{64|32}} 4
	// SEQ-DAG: [[KERNEL_SHARED:@.+]] = internal unnamed_addr constant i16 1

	// CHECK-NOT: define {{.*}}void {{@__omp_offloading_.+template.+l20}}_worker()			// CHECK-NOT: define {{.*}}void {{@__omp_offloading_.+template.+l20}}_worker()

	// CHECK-LABEL: define {{.*}}void {{@__omp_offloading_.+template.+l29}}_worker()			// CHECK-LABEL: define {{.*}}void {{@__omp_offloading_.+template.+l29}}_worker()
	// CHECK-DAG: [[OMP_EXEC_STATUS:%.+]] = alloca i8,			// CHECK-DAG: [[OMP_EXEC_STATUS:%.+]] = alloca i8,
	// CHECK-DAG: [[OMP_WORK_FN:%.+]] = alloca i8*,			// CHECK-DAG: [[OMP_WORK_FN:%.+]] = alloca i8*,
	// CHECK: store i8* null, i8** [[OMP_WORK_FN]],			// CHECK: store i8* null, i8** [[OMP_WORK_FN]],
	// CHECK: store i8 0, i8* [[OMP_EXEC_STATUS]],			// CHECK: store i8 0, i8* [[OMP_EXEC_STATUS]],
	[235 lines hidden]

	// CHECK: declare void @__kmpc_barrier(%struct.ident_t*, i32) #[[#CONVERGENT:]]			// CHECK: declare void @__kmpc_barrier(%struct.ident_t*, i32) #[[#CONVERGENT:]]

	// CHECK-LABEL: define {{.*}}void {{@__omp_offloading_.+template.+l58}}_worker()			// CHECK-LABEL: define {{.*}}void {{@__omp_offloading_.+template.+l58}}_worker()
	// CHECK-LABEL: define {{.*}}void {{@__omp_offloading_.+template.+l58}}(			// CHECK-LABEL: define {{.*}}void {{@__omp_offloading_.+template.+l58}}(
	// CHECK-32: [[A_ADDR:%.+]] = alloca i32,			// CHECK-32: [[A_ADDR:%.+]] = alloca i32,
	// CHECK-64: [[A_ADDR:%.+]] = alloca i64,			// CHECK-64: [[A_ADDR:%.+]] = alloca i64,
	// CHECK-64: [[CONV:%.+]] = bitcast i64* [[A_ADDR]] to i32*			// CHECK-64: [[CONV:%.+]] = bitcast i64* [[A_ADDR]] to i32*
	// SEQ: [[IS_SHARED:%.+]] = load i16, i16* [[KERNEL_SHARED]],
	// SEQ: [[SIZE:%.+]] = load i{{64|32}}, i{{64|32}}* [[KERNEL_SIZE]],
	// SEQ: call void @__kmpc_get_team_static_memory(i16 0, i8* addrspacecast (i8 addrspace(3)* getelementptr inbounds ([[MEM_TY]], [[MEM_TY]] addrspace(3)* [[SHARED_GLOBAL_RD]], i32 0, i32 0, i32 0) to i8*), i{{64|32}} [[SIZE]], i16 [[IS_SHARED]], i8** addrspacecast (i8* addrspace(3)* [[KERNEL_PTR]] to i8**))
	// SEQ: [[KERNEL_RD:%.+]] = load i8*, i8* addrspace(3)* [[KERNEL_PTR]],
	// SEQ: [[STACK:%.+]] = getelementptr inbounds i8, i8* [[KERNEL_RD]], i{{64|32}} 0
	// PAR: [[STACK:%.+]] = call i8* @__kmpc_data_sharing_push_stack(i{{32|64}} 4, i16 1)
	// CHECK: [[BC:%.+]] = bitcast i8* [[STACK]] to %struct._globalized_locals_ty*
	// CHECK-32: [[A:%.+]] = load i32, i32* [[A_ADDR]],			// CHECK-32: [[A:%.+]] = load i32, i32* [[A_ADDR]],
	// CHECK-64: [[A:%.+]] = load i32, i32* [[CONV]],			// CHECK-64: [[A:%.+]] = load i32, i32* [[CONV]],
	// CHECK: [[GLOBAL_A_ADDR:%.+]] = getelementptr inbounds %struct._globalized_locals_ty, %struct._globalized_locals_ty* [[BC]], i{{[0-9]+}} 0, i{{[0-9]+}} 0			// PAR: [[STACK:%.+]] = call i8* @__kmpc_data_sharing_push_stack(i{{32|64}} 4)
	// CHECK: store i32 [[A]], i32* [[GLOBAL_A_ADDR]],			// CHECK: [[A_ON_STACK:%.+]] = bitcast i8* [[STACK]] to i32*
	// SEQ: [[IS_SHARED:%.+]] = load i16, i16* [[KERNEL_SHARED]],			// CHECK: store i32 [[A]], i32* [[A_ON_STACK]],
	// SEQ: call void @__kmpc_restore_team_static_memory(i16 0, i16 [[IS_SHARED]])
	// PAR: call void @__kmpc_data_sharing_pop_stack(i8* [[STACK]])			// PAR: call void @__kmpc_data_sharing_pop_stack(i8* [[STACK]])

	// CHECK-LABEL: define internal void @{{.+}}(i32* noalias %{{.+}}, i32* noalias %{{.+}}, i32* nonnull align {{[0-9]+}} dereferenceable{{.*}})			// CHECK-LABEL: define internal void @{{.+}}(i32* noalias %{{.+}}, i32* noalias %{{.+}}, i32* nonnull align {{[0-9]+}} dereferenceable{{.*}})
	// CHECK: [[CC:%.+]] = alloca i32,			// CHECK: [[CC:%.+]] = alloca i32,
	// CHECK: [[MASK:%.+]] = call i32 @__kmpc_warp_active_thread_mask(){{$}}			// CHECK: [[MASK:%.+]] = call i32 @__kmpc_warp_active_thread_mask(){{$}}
	// CHECK: [[TID:%.+]] = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()			// CHECK: [[TID:%.+]] = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
	// CHECK: [[NUM_THREADS:%.+]] = call i32 @llvm.nvvm.read.ptx.sreg.ntid.x()			// CHECK: [[NUM_THREADS:%.+]] = call i32 @llvm.nvvm.read.ptx.sreg.ntid.x()
	// CHECK: store i32 0, i32* [[CC]],			// CHECK: store i32 0, i32* [[CC]],
	[28 lines hidden]
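
For readers less used to FileCheck, the right-hand PAR checks for the l58 kernel above (push 4 bytes, bitcast the result to i32*, store a, pop) correspond roughly to the call shape below. This is only a host-side sketch: the mock namespace, its fixed 256-byte buffer, and globalizeAndRead are hypothetical stand-ins, not the libomptarget implementation.

```cpp
#include <cstddef>
#include <new>

// Hypothetical host-side mock of a push/pop data-sharing stack.
// It only imitates the call shape; the real runtime manages device memory.
namespace mock {
alignas(8) static unsigned char Storage[256];
static std::size_t Top = 0;

// Reserve `Size` bytes (rounded up to 8) and return the start of the chunk.
inline void *push_stack(std::size_t Size) {
  std::size_t Aligned = (Size + 7) & ~static_cast<std::size_t>(7);
  if (Top + Aligned > sizeof(Storage))
    return nullptr; // the mock simply refuses when it runs out of space
  void *Chunk = Storage + Top;
  Top += Aligned;
  return Chunk;
}

// Release everything from `Chunk` upwards (strict LIFO use is assumed).
inline void pop_stack(void *Chunk) {
  Top = static_cast<std::size_t>(static_cast<unsigned char *>(Chunk) - Storage);
}
} // namespace mock

// Mirrors the checked sequence: push sizeof(int), use the memory as an int,
// store the value, read it back, pop.
inline int globalizeAndRead(int A) {
  void *Stack = mock::push_stack(sizeof(int));
  if (!Stack)
    return A;
  int *AOnStack = new (Stack) int(A);
  int Result = *AOnStack;
  mock::pop_stack(Stack);
  return Result;
}
```

Note that the push takes only a size here, matching the one-argument form on the right-hand side of the diff.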

llvm/include/llvm/Frontend/OpenMP/OMPKinds.def

[537 lines hidden]
__OMP_RTL(__kmpc_nvptx_teams_reduce_nowait_v2, false, Int32, IdentPtr, Int32,		__OMP_RTL(__kmpc_nvptx_teams_reduce_nowait_v2, false, Int32, IdentPtr, Int32,
VoidPtr, Int32, VoidPtr, ShuffleReducePtr, InterWarpCopyPtr,		VoidPtr, Int32, VoidPtr, ShuffleReducePtr, InterWarpCopyPtr,
GlobalListPtr, GlobalListPtr, GlobalListPtr, GlobalListPtr)		GlobalListPtr, GlobalListPtr, GlobalListPtr, GlobalListPtr)

__OMP_RTL(__kmpc_shuffle_int64, false, Int64, Int64, Int16, Int16)		__OMP_RTL(__kmpc_shuffle_int64, false, Int64, Int64, Int16, Int16)
__OMP_RTL(__kmpc_data_sharing_init_stack, false, Void, )		__OMP_RTL(__kmpc_data_sharing_init_stack, false, Void, )
__OMP_RTL(__kmpc_data_sharing_init_stack_spmd, false, Void, )		__OMP_RTL(__kmpc_data_sharing_init_stack_spmd, false, Void, )

__OMP_RTL(__kmpc_data_sharing_coalesced_push_stack, false, VoidPtr, SizeTy, Int16)		__OMP_RTL(__kmpc_data_sharing_coalesced_push_stack, false, VoidPtr, SizeTy, Int16)
__OMP_RTL(__kmpc_data_sharing_push_stack, false, VoidPtr, SizeTy, Int16)		__OMP_RTL(__kmpc_data_sharing_push_stack, false, VoidPtr, SizeTy)
__OMP_RTL(__kmpc_data_sharing_pop_stack, false, Void, VoidPtr)		__OMP_RTL(__kmpc_data_sharing_pop_stack, false, Void, VoidPtr)
__OMP_RTL(__kmpc_begin_sharing_variables, false, Void, VoidPtrPtrPtr, SizeTy)		__OMP_RTL(__kmpc_begin_sharing_variables, false, Void, VoidPtrPtrPtr, SizeTy)
__OMP_RTL(__kmpc_end_sharing_variables, false, Void, )		__OMP_RTL(__kmpc_end_sharing_variables, false, Void, )
__OMP_RTL(__kmpc_get_shared_variables, false, Void, VoidPtrPtrPtr)		__OMP_RTL(__kmpc_get_shared_variables, false, Void, VoidPtrPtrPtr)
__OMP_RTL(__kmpc_parallel_level, false, Int16, IdentPtr, Int32)		__OMP_RTL(__kmpc_parallel_level, false, Int16, IdentPtr, Int32)
__OMP_RTL(__kmpc_is_spmd_exec_mode, false, Int8, )		__OMP_RTL(__kmpc_is_spmd_exec_mode, false, Int8, )
__OMP_RTL(__kmpc_get_team_static_memory, false, Void, Int16, VoidPtr, SizeTy,		__OMP_RTL(__kmpc_get_team_static_memory, false, Void, Int16, VoidPtr, SizeTy,
Int16, VoidPtrPtr)		Int16, VoidPtrPtr)
[672 lines hidden]
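
As a rough illustration of why dropping the Int16 from the __kmpc_data_sharing_push_stack entry matters, the sketch below uses a reduced X-macro list in the same spirit as the entries above. Everything prefixed Demo or DEMO_ is hypothetical, and the entries record only a parameter count (taken from the right-hand column of the diff), whereas the real __OMP_RTL entries list the return type and each parameter type.

```cpp
#include <cstring>

// Hypothetical, heavily reduced stand-in for a runtime-function table entry.
struct DemoRtlEntry {
  const char *Name;
  int NumParams;
};

// X-macro list: the list is written once and each consumer supplies the
// macro that decides what to generate from every entry.
#define DEMO_RTL_LIST(DEMO_RTL)                                                \
  DEMO_RTL(__kmpc_data_sharing_coalesced_push_stack, 2)                        \
  DEMO_RTL(__kmpc_data_sharing_push_stack, 1)                                  \
  DEMO_RTL(__kmpc_data_sharing_pop_stack, 1)

// One possible expansion: a lookup table of names and parameter counts.
#define DEMO_RTL_AS_ENTRY(Name, NumParams) {#Name, NumParams},
static const DemoRtlEntry DemoEntries[] = {DEMO_RTL_LIST(DEMO_RTL_AS_ENTRY)};
#undef DEMO_RTL_AS_ENTRY

// Returns the recorded parameter count, or -1 if the name is not listed.
inline int demoNumParams(const char *Name) {
  for (const DemoRtlEntry &E : DemoEntries)
    if (std::strcmp(E.Name, Name) == 0)
      return E.NumParams;
  return -1;
}
```

Because every consumer is generated from the one list, a signature change in the list has to be matched by the runtime declaration and definition, which is why interface.h and data_sharing.cu change in the same revision.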

openmp/libomptarget/deviceRTLs/common/src/data_sharing.cu

[138 lines hidden]
EXTERN void *__kmpc_data_sharing_coalesced_push_stack(size_t DataSize,		EXTERN void *__kmpc_data_sharing_coalesced_push_stack(size_t DataSize,
int16_t UseSharedMemory) {		int16_t UseSharedMemory) {
return data_sharing_push_stack_common(DataSize);		return data_sharing_push_stack_common(DataSize);
}		}

// Called at the time of the kernel initialization. This is used to initilize		// Called at the time of the kernel initialization. This is used to initilize
// the list of references to shared variables and to pre-allocate global storage		// the list of references to shared variables and to pre-allocate global storage
// for holding the globalized variables.		// for holding the globalized variables.
//		//
// By default the globalized variables are stored in global memory. If the		EXTERN void *__kmpc_data_sharing_push_stack(size_t DataSize) {
// UseSharedMemory is set to true, the runtime will attempt to use shared memory
// as long as the size requested fits the pre-allocated size.
EXTERN void *__kmpc_data_sharing_push_stack(size_t DataSize,
int16_t UseSharedMemory) {
// Compute the total memory footprint of the requested data.		// Compute the total memory footprint of the requested data.
// The master thread requires a stack only for itself. A worker		// The master thread requires a stack only for itself. A worker
// thread (which at this point is a warp master) will require		// thread (which at this point is a warp master) will require
// space for the variables of each thread in the warp,		// space for the variables of each thread in the warp,
// i.e. one DataSize chunk per warp lane.		// i.e. one DataSize chunk per warp lane.
// TODO: change WARPSIZE to the number of active threads in the warp.		// TODO: change WARPSIZE to the number of active threads in the warp.
size_t PushSize = (isRuntimeUninitialized() || IsMasterThread(isSPMDMode()))		size_t PushSize = (isRuntimeUninitialized() || IsMasterThread(isSPMDMode()))
? DataSize		? DataSize
[118 lines hidden]
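
The footprint rule in the comment above can be sketched on the host as follows. The false branch of the ternary is hidden in this view, so the per-warp multiple here is taken from the comment ("one DataSize chunk per warp lane"), not from the hidden code. kDemoWarpSize and demoPushSize are hypothetical names, and 32 lanes is an assumed warp size.

```cpp
#include <cstddef>

// Hypothetical stand-in for the runtime's WARPSIZE constant; 32 is assumed.
constexpr std::size_t kDemoWarpSize = 32;

// Sketch of the rule described in the comment: the master thread (or a
// runtime that is not initialized) needs a single DataSize chunk, while a
// worker acting as warp master needs one chunk per lane of its warp.
constexpr std::size_t demoPushSize(std::size_t DataSize,
                                   bool RuntimeUninitialized,
                                   bool IsMasterThread) {
  return (RuntimeUninitialized || IsMasterThread) ? DataSize
                                                  : kDemoWarpSize * DataSize;
}

// A 4-byte variable, as in the i32 test case above.
static_assert(demoPushSize(4, false, true) == 4, "master: one chunk");
static_assert(demoPushSize(4, false, false) == 128, "warp master: 32 chunks");
```

Under these assumptions a worker-side push for a single int reserves 128 bytes, which is the over-allocation the TODO about active threads refers to.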

openmp/libomptarget/deviceRTLs/interface.h

	[41 lines hidden]
	typedef enum omp_proc_bind_t {			typedef enum omp_proc_bind_t {
	omp_proc_bind_false = 0,			omp_proc_bind_false = 0,
	omp_proc_bind_true = 1,			omp_proc_bind_true = 1,
	omp_proc_bind_master = 2,			omp_proc_bind_master = 2,
	omp_proc_bind_close = 3,			omp_proc_bind_close = 3,
	omp_proc_bind_spread = 4			omp_proc_bind_spread = 4
	} omp_proc_bind_t;			} omp_proc_bind_t;

	EXTERN double omp_get_wtick(void);			EXTERN double omp_get_wtick(void);
				Lint: Pre-merge checks: clang-tidy: error: unknown type name 'EXTERN' [clang-diagnostic-error]
	EXTERN double omp_get_wtime(void);			EXTERN double omp_get_wtime(void);

	EXTERN void omp_set_num_threads(int num);			EXTERN void omp_set_num_threads(int num);
	EXTERN int omp_get_num_threads(void);			EXTERN int omp_get_num_threads(void);
	EXTERN int omp_get_max_threads(void);			EXTERN int omp_get_max_threads(void);
	EXTERN int omp_get_thread_limit(void);			EXTERN int omp_get_thread_limit(void);
	EXTERN int omp_get_thread_num(void);			EXTERN int omp_get_thread_num(void);
	EXTERN int omp_get_num_procs(void);			EXTERN int omp_get_num_procs(void);
	EXTERN int omp_in_parallel(void);			EXTERN int omp_in_parallel(void);
	EXTERN int omp_in_final(void);			EXTERN int omp_in_final(void);
	EXTERN void omp_set_dynamic(int flag);			EXTERN void omp_set_dynamic(int flag);
	EXTERN int omp_get_dynamic(void);			EXTERN int omp_get_dynamic(void);
	EXTERN void omp_set_nested(int flag);			EXTERN void omp_set_nested(int flag);
	EXTERN int omp_get_nested(void);			EXTERN int omp_get_nested(void);
	EXTERN void omp_set_max_active_levels(int level);			EXTERN void omp_set_max_active_levels(int level);
	EXTERN int omp_get_max_active_levels(void);			EXTERN int omp_get_max_active_levels(void);
	EXTERN int omp_get_level(void);			EXTERN int omp_get_level(void);
	EXTERN int omp_get_active_level(void);			EXTERN int omp_get_active_level(void);
	EXTERN int omp_get_ancestor_thread_num(int level);			EXTERN int omp_get_ancestor_thread_num(int level);
	EXTERN int omp_get_team_size(int level);			EXTERN int omp_get_team_size(int level);

	EXTERN void omp_init_lock(omp_lock_t *lock);			EXTERN void omp_init_lock(omp_lock_t *lock);
	EXTERN void omp_init_nest_lock(omp_nest_lock_t *lock);			EXTERN void omp_init_nest_lock(omp_nest_lock_t *lock);
	EXTERN void omp_destroy_lock(omp_lock_t *lock);			EXTERN void omp_destroy_lock(omp_lock_t *lock);
	EXTERN void omp_destroy_nest_lock(omp_nest_lock_t *lock);			EXTERN void omp_destroy_nest_lock(omp_nest_lock_t *lock);
	EXTERN void omp_set_lock(omp_lock_t *lock);			EXTERN void omp_set_lock(omp_lock_t *lock);
	EXTERN void omp_set_nest_lock(omp_nest_lock_t *lock);			EXTERN void omp_set_nest_lock(omp_nest_lock_t *lock);
	[349 lines hidden]
	EXTERN void __kmpc_kernel_prepare_parallel(void *WorkFn);			EXTERN void __kmpc_kernel_prepare_parallel(void *WorkFn);
	EXTERN bool __kmpc_kernel_parallel(void **WorkFn);			EXTERN bool __kmpc_kernel_parallel(void **WorkFn);
	EXTERN void __kmpc_kernel_end_parallel();			EXTERN void __kmpc_kernel_end_parallel();

	EXTERN void __kmpc_data_sharing_init_stack();			EXTERN void __kmpc_data_sharing_init_stack();
	EXTERN void __kmpc_data_sharing_init_stack_spmd();			EXTERN void __kmpc_data_sharing_init_stack_spmd();
	EXTERN void *__kmpc_data_sharing_coalesced_push_stack(size_t size,			EXTERN void *__kmpc_data_sharing_coalesced_push_stack(size_t size,
	int16_t UseSharedMemory);			int16_t UseSharedMemory);
	EXTERN void *__kmpc_data_sharing_push_stack(size_t size, int16_t UseSharedMemory);			EXTERN void *__kmpc_data_sharing_push_stack(size_t size);
				Lint: Pre-merge checks: clang-tidy: warning: invalid case style for function '__kmpc_data_sharing_push_stack' [readability-identifier-naming]
	EXTERN void __kmpc_data_sharing_pop_stack(void *a);			EXTERN void __kmpc_data_sharing_pop_stack(void *a);
	EXTERN void __kmpc_begin_sharing_variables(void ***GlobalArgs, size_t nArgs);			EXTERN void __kmpc_begin_sharing_variables(void ***GlobalArgs, size_t nArgs);
	EXTERN void __kmpc_end_sharing_variables();			EXTERN void __kmpc_end_sharing_variables();
	EXTERN void __kmpc_get_shared_variables(void ***GlobalArgs);			EXTERN void __kmpc_get_shared_variables(void ***GlobalArgs);

	// The slot used for data sharing by the master and worker threads. We use a			// The slot used for data sharing by the master and worker threads. We use a
	// complete (default size version and an incomplete one so that we allow sizes			// complete (default size version and an incomplete one so that we allow sizes
	// greater than the default).			// greater than the default).
	[19 lines hidden]