This is an archive of the discontinued LLVM Phabricator instance.

Differential D52733

[OpenMP][NVPTX] Avoid data sharing if in parallel region
Abandoned · Public

Authored by Hahnfeld on Oct 1 2018, 10:10 AM.

Details

Reviewers

gtbercea
ABataev
jdoerfert

Summary

Previously, the generated code only checked whether the kernel is executing
in SPMD mode. However, a worker thread participating in a parallel
region will only ever execute nested parallel directives serialized and
doesn't need data sharing with other threads either.
Refactor the code from emitNonSPMDParallelCall into a helper function
to make clear that these two things use the same conditions.
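
The shared condition described in the summary can be illustrated with a small host-side model. This is only a sketch: `ThreadState`, `Generator`, `runWithInParallelCheck`, `nestedParallel`, and `globalizedStorage` are hypothetical names invented for illustration, and the two fields merely stand in for the results of `__kmpc_is_spmd_exec_mode` and `__kmpc_parallel_level`. It is not the Clang codegen helper from the diff.

```cpp
#include <functional>
#include <string>

// Hypothetical stand-ins for the two device runtime queries; the real
// __kmpc_* functions live in the device runtime and are not called here.
struct ThreadState {
  bool spmdMode;     // models __kmpc_is_spmd_exec_mode() != 0
  int parallelLevel; // models __kmpc_parallel_level(loc, gtid)
};

using Generator = std::function<std::string()>;

// Shared helper: take the "already in parallel" path if the kernel runs in
// SPMD mode or the thread is inside a parallel region, else the master path.
std::string runWithInParallelCheck(const ThreadState &ts,
                                   const Generator &inParallelGen,
                                   const Generator &masterGen) {
  if (ts.spmdMode || ts.parallelLevel != 0)
    return inParallelGen();
  return masterGen();
}

// Use 1: how a nested parallel directive is executed.
std::string nestedParallel(const ThreadState &ts) {
  return runWithInParallelCheck(
      ts, [] { return std::string("serialized"); },
      [] { return std::string("worker call"); });
}

// Use 2: where globalized variables are stored.
std::string globalizedStorage(const ThreadState &ts) {
  return runWithInParallelCheck(
      ts, [] { return std::string("local stack"); },
      [] { return std::string("data-sharing stack"); });
}
```

Both uses go through the same predicate, so a thread that serializes a nested parallel region is also exactly the thread that keeps its variables local in this model.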

Diff Detail

Event Timeline

Hahnfeld created this revision. Oct 1 2018, 10:10 AM

Herald added subscribers: cfe-commits, guansong, jholewinski. Oct 1 2018, 10:10 AM

Hahnfeld added a parent revision: D52732: [OpenMP][NVPTX] Simplify codegen for orphaned parallel, NFCI. Oct 1 2018, 10:10 AM

It might lead to increased register pressure, might it not? Currently, I'm trying to emit code that can be optimized out and thus may decrease the register pressure. That's why I tried to reduce the number of runtime checks.

In D52733#1251421, @ABataev wrote:

It might lead to increased register pressure, might it not? Currently, I'm trying to emit code that can be optimized out and thus may decrease the register pressure. That's why I tried to reduce the number of runtime checks.

You are right, it increases register usage, but I think it shouldn't: the generated code always checks __kmpc_is_spmd_exec_mode first. So if LLVM were able to optimize this out in SPMD mode, __kmpc_parallel_level should never be called.

I guess this doesn't work because it's illegal to hoist the load of execution_param across a barrier?
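
The ordering argument above (SPMD check first, parallel level only afterwards) can be sketched as a host-side model. `RuntimeModel` and `inParallel` are hypothetical names used for illustration only. The sketch shows the source-level control flow and says nothing about whether LLVM can actually fold the first check at compile time, which is the open question in this thread.

```cpp
// Hypothetical model of the two runtime queries. The counter records how
// often the parallel-level query is actually reached.
struct RuntimeModel {
  bool spmd;
  int level;
  int parallelLevelCalls;

  RuntimeModel(bool s, int l) : spmd(s), level(l), parallelLevelCalls(0) {}

  bool isSpmdExecMode() const { return spmd; }
  int parallelLevel() {
    ++parallelLevelCalls;
    return level;
  }
};

// Same order as the generated branches: SPMD first, then the parallel
// level in a separate step (the ".parcheck" block in the diff).
bool inParallel(RuntimeModel &rt) {
  if (rt.isSpmdExecMode())
    return true; // parallelLevel() is not reached on this path
  return rt.parallelLevel() != 0;
}
```

In this model an SPMD thread never queries the parallel level, while a non-SPMD thread queries it exactly once per check.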

In D52733#1252963, @Hahnfeld wrote:

You are right, it increases register usage, but I think it shouldn't: the generated code always checks __kmpc_is_spmd_exec_mode first. So if LLVM were able to optimize this out in SPMD mode, __kmpc_parallel_level should never be called.

I guess this doesn't work because it's illegal to hoist the load of execution_param across a barrier?

Even if we are able to reduce register usage for SPMD, it is still going to be high for non-SPMD constructs. The optimizer cannot tell at compile time whether the code is in a parallel region or not.

In D52733#1252966, @ABataev wrote:

Even if we are able to reduce register usage for SPMD, it is still going to be high for non-SPMD constructs. The optimizer cannot tell at compile time whether the code is in a parallel region or not.

Instead we avoid the runtime overhead of data sharing. Plus we'll be able to drop around two thirds of the static data allocation in libomptarget-nvptx, which leaves more room for the user's application... We'll see; for now I agree that the added registers for SPMD are unacceptable.
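
What avoiding data sharing means for the globalization prolog and epilog can be sketched as follows. This is a simplified host-side model under stated assumptions: `SharingStack` and `GlobalizedRecord` are hypothetical names, and the vector only stands in for the runtime's data-sharing stack behind `__kmpc_data_sharing_push_stack` and `__kmpc_data_sharing_pop_stack`. Unlike the diff, which re-evaluates the runtime check in the epilog, the sketch caches the decision in a member.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the runtime's data-sharing stack.
struct SharingStack {
  std::vector<std::size_t> frames;
  void push(std::size_t bytes) { frames.push_back(bytes); }
  void pop() { frames.pop_back(); }
};

// RAII model of the prolog/epilog pair: a thread that is already in a
// parallel region keeps its record local and never touches the shared stack.
class GlobalizedRecord {
public:
  GlobalizedRecord(SharingStack &stack, bool inParallel, std::size_t bytes)
      : stack_(stack), pushed_(!inParallel) {
    if (pushed_)
      stack_.push(bytes); // prolog: master path only
  }
  ~GlobalizedRecord() {
    if (pushed_)
      stack_.pop(); // epilog: pop only what the prolog pushed
  }
  GlobalizedRecord(const GlobalizedRecord &) = delete;
  GlobalizedRecord &operator=(const GlobalizedRecord &) = delete;

  bool usesSharedStack() const { return pushed_; }

private:
  SharingStack &stack_;
  bool pushed_;
};
```

The point of the model is the pairing: every push has a matching pop, and the in-parallel path performs neither.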

Hahnfeld abandoned this revision. Nov 16 2019, 9:44 AM

Herald added a reviewer: jdoerfert. Nov 16 2019, 9:44 AM

Herald added a project: Restricted Project.

Revision Contents

Path	Size
lib/CodeGen/CGOpenMPRuntimeNVPTX.h	16 lines
lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp	218 lines
test/OpenMP/declare_target_codegen_globalization.cpp	10 lines
test/OpenMP/nvptx_target_codegen.cpp	16 lines

Diff 167761

lib/CodeGen/CGOpenMPRuntimeNVPTX.h

[63 lines not shown]	private:

/// Helper for non-SPMD target entry function. Guide the master and		/// Helper for non-SPMD target entry function. Guide the master and
/// worker threads to their respective locations.		/// worker threads to their respective locations.
void emitNonSPMDEntryHeader(CodeGenFunction &CGF, EntryFunctionState &EST,		void emitNonSPMDEntryHeader(CodeGenFunction &CGF, EntryFunctionState &EST,
WorkerFunctionState &WST);		WorkerFunctionState &WST);

/// Signal termination of OMP execution for non-SPMD target entry		/// Signal termination of OMP execution for non-SPMD target entry
/// function.		/// function.
void emitNonSPMDEntryFooter(CodeGenFunction &CGF, EntryFunctionState &EST);		void emitNonSPMDEntryFooter(CodeGenFunction &CGF, EntryFunctionState &EST,
		WorkerFunctionState &WST);

		/// Helper to emit a runtime check whether the current thread is already
		/// participating in a parallel region.
		std::pair<llvm::BasicBlock *, llvm::BasicBlock *>
		emitInParallelRuntimeCheck(CodeGenFunction &CGF, SourceLocation Loc,
		const RegionCodeGenTy &InParallelGen,
		const RegionCodeGenTy &MasterGen);

/// Helper for generic variables globalization prolog.		/// Helper for generic variables globalization prolog.
void emitGenericVarsProlog(CodeGenFunction &CGF, SourceLocation Loc,		void emitGenericVarsProlog(CodeGenFunction &CGF, SourceLocation Loc,
bool WithSPMDCheck = false);		bool WithRuntimeCheck = false);

/// Helper for generic variables globalization epilog.		/// Helper for generic variables globalization epilog.
void emitGenericVarsEpilog(CodeGenFunction &CGF, bool WithSPMDCheck = false);		void emitGenericVarsEpilog(CodeGenFunction &CGF, SourceLocation Loc,
		bool WithRuntimeCheck = false);

/// Helper for SPMD mode target directive's entry function.		/// Helper for SPMD mode target directive's entry function.
void emitSPMDEntryHeader(CodeGenFunction &CGF, EntryFunctionState &EST,		void emitSPMDEntryHeader(CodeGenFunction &CGF, EntryFunctionState &EST,
const OMPExecutableDirective &D);		const OMPExecutableDirective &D);

/// Signal termination of SPMD mode execution.		/// Signal termination of SPMD mode execution.
void emitSPMDEntryFooter(CodeGenFunction &CGF, EntryFunctionState &EST);		void emitSPMDEntryFooter(CodeGenFunction &CGF, EntryFunctionState &EST);

[292 lines not shown]	private:
using EscapedParamsTy = llvm::SmallPtrSet<const Decl *, 4>;		using EscapedParamsTy = llvm::SmallPtrSet<const Decl *, 4>;
struct FunctionData {		struct FunctionData {
DeclToAddrMapTy LocalVarData;		DeclToAddrMapTy LocalVarData;
EscapedParamsTy EscapedParameters;		EscapedParamsTy EscapedParameters;
llvm::SmallVector<const ValueDecl*, 4> EscapedVariableLengthDecls;		llvm::SmallVector<const ValueDecl*, 4> EscapedVariableLengthDecls;
llvm::SmallVector<llvm::Value *, 4> EscapedVariableLengthDeclsAddrs;		llvm::SmallVector<llvm::Value *, 4> EscapedVariableLengthDeclsAddrs;
const RecordDecl *GlobalRecord = nullptr;		const RecordDecl *GlobalRecord = nullptr;
llvm::Value *GlobalRecordAddr = nullptr;		llvm::Value *GlobalRecordAddr = nullptr;
llvm::Value *IsInSPMDModeFlag = nullptr;
std::unique_ptr<CodeGenFunction::OMPMapVars> MappedParams;		std::unique_ptr<CodeGenFunction::OMPMapVars> MappedParams;
};		};
/// Maps the function to the list of the globalized variables with their		/// Maps the function to the list of the globalized variables with their
/// addresses.		/// addresses.
llvm::SmallDenseMap<llvm::Function *, FunctionData> FunctionGlobalizedDecls;		llvm::SmallDenseMap<llvm::Function *, FunctionData> FunctionGlobalizedDecls;
};		};

} // CodeGen namespace.		} // CodeGen namespace.
} // clang namespace.		} // clang namespace.

#endif // LLVM_CLANG_LIB_CODEGEN_CGOPENMPRUNTIMENVPTX_H		#endif // LLVM_CLANG_LIB_CODEGEN_CGOPENMPRUNTIMENVPTX_H

lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp

[1,082 lines not shown]	NVPTXPrePostActionTy(CGOpenMPRuntimeNVPTX::EntryFunctionState &EST,
CGOpenMPRuntimeNVPTX::WorkerFunctionState &WST)		CGOpenMPRuntimeNVPTX::WorkerFunctionState &WST)
: EST(EST), WST(WST) {}		: EST(EST), WST(WST) {}
void Enter(CodeGenFunction &CGF) override {		void Enter(CodeGenFunction &CGF) override {
static_cast<CGOpenMPRuntimeNVPTX &>(CGF.CGM.getOpenMPRuntime())		static_cast<CGOpenMPRuntimeNVPTX &>(CGF.CGM.getOpenMPRuntime())
.emitNonSPMDEntryHeader(CGF, EST, WST);		.emitNonSPMDEntryHeader(CGF, EST, WST);
}		}
void Exit(CodeGenFunction &CGF) override {		void Exit(CodeGenFunction &CGF) override {
static_cast<CGOpenMPRuntimeNVPTX &>(CGF.CGM.getOpenMPRuntime())		static_cast<CGOpenMPRuntimeNVPTX &>(CGF.CGM.getOpenMPRuntime())
.emitNonSPMDEntryFooter(CGF, EST);		.emitNonSPMDEntryFooter(CGF, EST, WST);
}		}
} Action(EST, WST);		} Action(EST, WST);
CodeGen.setAction(Action);		CodeGen.setAction(Action);
emitTargetOutlinedFunctionHelper(D, ParentName, OutlinedFn, OutlinedFnID,		emitTargetOutlinedFunctionHelper(D, ParentName, OutlinedFn, OutlinedFnID,
IsOffloadEntry, CodeGen);		IsOffloadEntry, CodeGen);

// Now change the name of the worker function to correspond to this target		// Now change the name of the worker function to correspond to this target
// region's entry function.		// region's entry function.
[42 lines not shown]	void CGOpenMPRuntimeNVPTX::emitNonSPMDEntryHeader(CodeGenFunction &CGF,
CGF.EmitRuntimeCall(		CGF.EmitRuntimeCall(
createNVPTXRuntimeFunction(		createNVPTXRuntimeFunction(
OMPRTL_NVPTX__kmpc_data_sharing_init_stack));		OMPRTL_NVPTX__kmpc_data_sharing_init_stack));

emitGenericVarsProlog(CGF, WST.Loc);		emitGenericVarsProlog(CGF, WST.Loc);
}		}

void CGOpenMPRuntimeNVPTX::emitNonSPMDEntryFooter(CodeGenFunction &CGF,		void CGOpenMPRuntimeNVPTX::emitNonSPMDEntryFooter(CodeGenFunction &CGF,
EntryFunctionState &EST) {		EntryFunctionState &EST,
		WorkerFunctionState &WST) {
IsInTargetMasterThreadRegion = false;		IsInTargetMasterThreadRegion = false;
if (!CGF.HaveInsertPoint())		if (!CGF.HaveInsertPoint())
return;		return;

emitGenericVarsEpilog(CGF);		emitGenericVarsEpilog(CGF, WST.Loc);

if (!EST.ExitBB)		if (!EST.ExitBB)
EST.ExitBB = CGF.createBasicBlock(".exit");		EST.ExitBB = CGF.createBasicBlock(".exit");

llvm::BasicBlock *TerminateBB = CGF.createBasicBlock(".termination.notifier");		llvm::BasicBlock *TerminateBB = CGF.createBasicBlock(".termination.notifier");
CGF.EmitBranch(TerminateBB);		CGF.EmitBranch(TerminateBB);

CGF.EmitBlock(TerminateBB);		CGF.EmitBlock(TerminateBB);
[707 lines not shown]	void Enter(CodeGenFunction &CGF) override {
Pair.getFirst(),		Pair.getFirst(),
std::make_pair(Pair.getSecond(), Address::invalid())));		std::make_pair(Pair.getSecond(), Address::invalid())));
}		}
}		}
Rt.emitGenericVarsProlog(CGF, Loc);		Rt.emitGenericVarsProlog(CGF, Loc);
}		}
void Exit(CodeGenFunction &CGF) override {		void Exit(CodeGenFunction &CGF) override {
static_cast<CGOpenMPRuntimeNVPTX &>(CGF.CGM.getOpenMPRuntime())		static_cast<CGOpenMPRuntimeNVPTX &>(CGF.CGM.getOpenMPRuntime())
.emitGenericVarsEpilog(CGF);		.emitGenericVarsEpilog(CGF, Loc);
}		}
} Action(Loc, GlobalizedRD, MappedDeclsFields);		} Action(Loc, GlobalizedRD, MappedDeclsFields);
CodeGen.setAction(Action);		CodeGen.setAction(Action);
llvm::Value *OutlinedFunVal = CGOpenMPRuntime::emitTeamsOutlinedFunction(		llvm::Value *OutlinedFunVal = CGOpenMPRuntime::emitTeamsOutlinedFunction(
D, ThreadIDVar, InnermostKind, CodeGen);		D, ThreadIDVar, InnermostKind, CodeGen);
llvm::Function *OutlinedFun = cast<llvm::Function>(OutlinedFunVal);		llvm::Function *OutlinedFun = cast<llvm::Function>(OutlinedFunVal);
OutlinedFun->removeFnAttr(llvm::Attribute::NoInline);		OutlinedFun->removeFnAttr(llvm::Attribute::NoInline);
OutlinedFun->removeFnAttr(llvm::Attribute::OptimizeNone);		OutlinedFun->removeFnAttr(llvm::Attribute::OptimizeNone);
OutlinedFun->addFnAttr(llvm::Attribute::AlwaysInline);		OutlinedFun->addFnAttr(llvm::Attribute::AlwaysInline);

return OutlinedFun;		return OutlinedFun;
}		}

		std::pair<llvm::BasicBlock *, llvm::BasicBlock *>
		CGOpenMPRuntimeNVPTX::emitInParallelRuntimeCheck(
		CodeGenFunction &CGF, SourceLocation Loc,
		const RegionCodeGenTy &InParallelGen, const RegionCodeGenTy &MasterGen) {
		// Check for SPMD mode and then parallelism:
		// if (__kmpc_is_spmd_exec_mode() || __kmpc_parallel_level(loc, gtid)) {
		// Already in parallel.
		// } else {
		// Code for master thread.
		// (Worker threads have forked off to the generated worker function.)
		// }
		CGBuilderTy &Bld = CGF.Builder;
		llvm::BasicBlock *ExitBB = CGF.createBasicBlock(".exit");
		llvm::BasicBlock *ParallelCheckBB = CGF.createBasicBlock(".parcheck");
		llvm::BasicBlock *InParallelBB = CGF.createBasicBlock(".in-parallel");
		llvm::BasicBlock *MasterBB = CGF.createBasicBlock(".master");
		llvm::Value *IsSPMD = Bld.CreateIsNotNull(CGF.EmitNounwindRuntimeCall(
		createNVPTXRuntimeFunction(OMPRTL_NVPTX__kmpc_is_spmd_exec_mode)));
		Bld.CreateCondBr(IsSPMD, InParallelBB, ParallelCheckBB);
		// There is no need to emit line number for unconditional branch.
		(void)ApplyDebugLocation::CreateEmpty(CGF);
		CGF.EmitBlock(ParallelCheckBB);
		llvm::Value *RTLoc = emitUpdateLocation(CGF, Loc);
		llvm::Value *ThreadID = getThreadID(CGF, Loc);
		llvm::Value *PL = CGF.EmitRuntimeCall(
		createNVPTXRuntimeFunction(OMPRTL_NVPTX__kmpc_parallel_level),
		{RTLoc, ThreadID});
		llvm::Value *Res = Bld.CreateIsNotNull(PL);
		Bld.CreateCondBr(Res, InParallelBB, MasterBB);
		CGF.EmitBlock(InParallelBB);
		InParallelGen(CGF);
		CGF.EmitBranch(ExitBB);
		// There is no need to emit line number for unconditional branch.
		(void)ApplyDebugLocation::CreateEmpty(CGF);
		CGF.EmitBlock(MasterBB);
		MasterGen(CGF);
		CGF.EmitBranch(ExitBB);
		// There is no need to emit line number for unconditional branch.
		(void)ApplyDebugLocation::CreateEmpty(CGF);
		// Emit the continuation block for code after the if.
		CGF.EmitBlock(ExitBB, /*IsFinished=*/true);

		return {InParallelBB, MasterBB};
		}

void CGOpenMPRuntimeNVPTX::emitGenericVarsProlog(CodeGenFunction &CGF,		void CGOpenMPRuntimeNVPTX::emitGenericVarsProlog(CodeGenFunction &CGF,
SourceLocation Loc,		SourceLocation Loc,
bool WithSPMDCheck) {		bool WithRuntimeCheck) {
if (getDataSharingMode(CGM) != CGOpenMPRuntimeNVPTX::Generic &&		if (getDataSharingMode(CGM) != CGOpenMPRuntimeNVPTX::Generic &&
getExecutionMode() != CGOpenMPRuntimeNVPTX::EM_SPMD)		getExecutionMode() != CGOpenMPRuntimeNVPTX::EM_SPMD)
return;		return;

CGBuilderTy &Bld = CGF.Builder;

const auto I = FunctionGlobalizedDecls.find(CGF.CurFn);		const auto I = FunctionGlobalizedDecls.find(CGF.CurFn);
if (I == FunctionGlobalizedDecls.end())		if (I == FunctionGlobalizedDecls.end())
return;		return;
if (const RecordDecl *GlobalizedVarsRecord = I->getSecond().GlobalRecord) {		if (const RecordDecl *GlobalizedVarsRecord = I->getSecond().GlobalRecord) {
QualType RecTy = CGM.getContext().getRecordType(GlobalizedVarsRecord);		QualType RecTy = CGM.getContext().getRecordType(GlobalizedVarsRecord);

// Recover pointer to this function's global record. The runtime will		// Recover pointer to this function's global record. The runtime will
// handle the specifics of the allocation of the memory.		// handle the specifics of the allocation of the memory.
// Use actual memory size of the record including the padding		// Use actual memory size of the record including the padding
// for alignment purposes.		// for alignment purposes.
		llvm::Value *GlobalRecValue, *GlobalRecCastAddr;
		auto &&DataSharingGen = [this, &RecTy, &GlobalRecValue, &GlobalRecCastAddr](
		CodeGenFunction &CGF, PrePostActionTy &) {
unsigned Alignment =		unsigned Alignment =
CGM.getContext().getTypeAlignInChars(RecTy).getQuantity();		CGM.getContext().getTypeAlignInChars(RecTy).getQuantity();
unsigned GlobalRecordSize =		unsigned GlobalRecordSize =
CGM.getContext().getTypeSizeInChars(RecTy).getQuantity();		CGM.getContext().getTypeSizeInChars(RecTy).getQuantity();
GlobalRecordSize = llvm::alignTo(GlobalRecordSize, Alignment);		GlobalRecordSize = llvm::alignTo(GlobalRecordSize, Alignment);

llvm::Value *GlobalRecCastAddr;
if (WithSPMDCheck ||
getExecutionMode() == CGOpenMPRuntimeNVPTX::EM_Unknown) {
llvm::BasicBlock *ExitBB = CGF.createBasicBlock(".exit");
llvm::BasicBlock *SPMDBB = CGF.createBasicBlock(".spmd");
llvm::BasicBlock *NonSPMDBB = CGF.createBasicBlock(".non-spmd");
llvm::Value *IsSPMD = Bld.CreateIsNotNull(CGF.EmitNounwindRuntimeCall(
createNVPTXRuntimeFunction(OMPRTL_NVPTX__kmpc_is_spmd_exec_mode)));
Bld.CreateCondBr(IsSPMD, SPMDBB, NonSPMDBB);
// There is no need to emit line number for unconditional branch.
(void)ApplyDebugLocation::CreateEmpty(CGF);
CGF.EmitBlock(SPMDBB);
Address RecPtr = CGF.CreateMemTemp(RecTy, "_local_stack");
CGF.EmitBranch(ExitBB);
// There is no need to emit line number for unconditional branch.
(void)ApplyDebugLocation::CreateEmpty(CGF);
CGF.EmitBlock(NonSPMDBB);
// TODO: allow the usage of shared memory to be controlled by		// TODO: allow the usage of shared memory to be controlled by
// the user, for now, default to global.		// the user, for now, default to global.
llvm::Value *GlobalRecordSizeArg[] = {		llvm::Value *GlobalRecordSizeArg[] = {
llvm::ConstantInt::get(CGM.SizeTy, GlobalRecordSize),		llvm::ConstantInt::get(CGM.SizeTy, GlobalRecordSize),
CGF.Builder.getInt16(/*UseSharedMemory=*/0)};		CGF.Builder.getInt16(/*UseSharedMemory=*/0)};
llvm::Value *GlobalRecValue =		GlobalRecValue =
CGF.EmitRuntimeCall(createNVPTXRuntimeFunction(		CGF.EmitRuntimeCall(createNVPTXRuntimeFunction(
OMPRTL_NVPTX__kmpc_data_sharing_push_stack),		OMPRTL_NVPTX__kmpc_data_sharing_push_stack),
GlobalRecordSizeArg);		GlobalRecordSizeArg);
GlobalRecCastAddr = Bld.CreatePointerBitCastOrAddrSpaceCast(		GlobalRecCastAddr = CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(
GlobalRecValue, CGF.ConvertTypeForMem(RecTy)->getPointerTo());		GlobalRecValue, CGF.ConvertTypeForMem(RecTy)->getPointerTo());
CGF.EmitBlock(ExitBB);		};
auto *Phi = Bld.CreatePHI(GlobalRecCastAddr->getType(),		if (WithRuntimeCheck ||
		getExecutionMode() == CGOpenMPRuntimeNVPTX::EM_Unknown) {
		Address RecPtr = Address::invalid();
		auto &&LocalGen = [&RecPtr, &RecTy](CodeGenFunction &CGF,
		PrePostActionTy &) {
		RecPtr = CGF.CreateMemTemp(RecTy, "_local_stack");
		};

		std::pair<llvm::BasicBlock *, llvm::BasicBlock *> Blocks =
		emitInParallelRuntimeCheck(CGF, Loc, LocalGen, DataSharingGen);

		auto *Phi =
		CGF.Builder.CreatePHI(GlobalRecCastAddr->getType(),
/*NumReservedValues=*/2, "_select_stack");		/*NumReservedValues=*/2, "_select_stack");
Phi->addIncoming(RecPtr.getPointer(), SPMDBB);		Phi->addIncoming(RecPtr.getPointer(), Blocks.first);
Phi->addIncoming(GlobalRecCastAddr, NonSPMDBB);		Phi->addIncoming(GlobalRecCastAddr, Blocks.second);

GlobalRecCastAddr = Phi;		GlobalRecCastAddr = Phi;
I->getSecond().GlobalRecordAddr = Phi;		I->getSecond().GlobalRecordAddr = Phi;
I->getSecond().IsInSPMDModeFlag = IsSPMD;
} else {		} else {
// TODO: allow the usage of shared memory to be controlled by		RegionCodeGenTy DataSharingRCG(DataSharingGen);
// the user, for now, default to global.		DataSharingRCG(CGF);
llvm::Value *GlobalRecordSizeArg[] = {
llvm::ConstantInt::get(CGM.SizeTy, GlobalRecordSize),
CGF.Builder.getInt16(/*UseSharedMemory=*/0)};
llvm::Value *GlobalRecValue =
CGF.EmitRuntimeCall(createNVPTXRuntimeFunction(
OMPRTL_NVPTX__kmpc_data_sharing_push_stack),
GlobalRecordSizeArg);
GlobalRecCastAddr = Bld.CreatePointerBitCastOrAddrSpaceCast(
GlobalRecValue, CGF.ConvertTypeForMem(RecTy)->getPointerTo());
I->getSecond().GlobalRecordAddr = GlobalRecValue;		I->getSecond().GlobalRecordAddr = GlobalRecValue;
I->getSecond().IsInSPMDModeFlag = nullptr;
}		}
LValue Base =		LValue Base =
CGF.MakeNaturalAlignPointeeAddrLValue(GlobalRecCastAddr, RecTy);		CGF.MakeNaturalAlignPointeeAddrLValue(GlobalRecCastAddr, RecTy);

// Emit the "global alloca" which is a GEP from the global declaration		// Emit the "global alloca" which is a GEP from the global declaration
// record using the pointer returned by the runtime.		// record using the pointer returned by the runtime.
for (auto &Rec : I->getSecond().LocalVarData) {		for (auto &Rec : I->getSecond().LocalVarData) {
bool EscapedParam = I->getSecond().EscapedParameters.count(Rec.first);		bool EscapedParam = I->getSecond().EscapedParameters.count(Rec.first);
[43 lines not shown]	for (const ValueDecl *VD : I->getSecond().EscapedVariableLengthDecls) {
I->getSecond().MappedParams->setVarAddr(CGF, cast<VarDecl>(VD),		I->getSecond().MappedParams->setVarAddr(CGF, cast<VarDecl>(VD),
Base.getAddress());		Base.getAddress());
I->getSecond().EscapedVariableLengthDeclsAddrs.emplace_back(GlobalRecValue);		I->getSecond().EscapedVariableLengthDeclsAddrs.emplace_back(GlobalRecValue);
}		}
I->getSecond().MappedParams->apply(CGF);		I->getSecond().MappedParams->apply(CGF);
}		}

void CGOpenMPRuntimeNVPTX::emitGenericVarsEpilog(CodeGenFunction &CGF,		void CGOpenMPRuntimeNVPTX::emitGenericVarsEpilog(CodeGenFunction &CGF,
bool WithSPMDCheck) {		SourceLocation Loc,
		bool WithRuntimeCheck) {
if (getDataSharingMode(CGM) != CGOpenMPRuntimeNVPTX::Generic &&		if (getDataSharingMode(CGM) != CGOpenMPRuntimeNVPTX::Generic &&
getExecutionMode() != CGOpenMPRuntimeNVPTX::EM_SPMD)		getExecutionMode() != CGOpenMPRuntimeNVPTX::EM_SPMD)
return;		return;

const auto I = FunctionGlobalizedDecls.find(CGF.CurFn);		const auto I = FunctionGlobalizedDecls.find(CGF.CurFn);
if (I != FunctionGlobalizedDecls.end()) {		if (I != FunctionGlobalizedDecls.end()) {
I->getSecond().MappedParams->restore(CGF);		I->getSecond().MappedParams->restore(CGF);
if (!CGF.HaveInsertPoint())		if (!CGF.HaveInsertPoint())
return;		return;
for (llvm::Value *Addr :		for (llvm::Value *Addr :
llvm::reverse(I->getSecond().EscapedVariableLengthDeclsAddrs)) {		llvm::reverse(I->getSecond().EscapedVariableLengthDeclsAddrs)) {
CGF.EmitRuntimeCall(		CGF.EmitRuntimeCall(
createNVPTXRuntimeFunction(OMPRTL_NVPTX__kmpc_data_sharing_pop_stack),		createNVPTXRuntimeFunction(OMPRTL_NVPTX__kmpc_data_sharing_pop_stack),
Addr);		Addr);
}		}
if (I->getSecond().GlobalRecordAddr) {		if (llvm::Value *GlobalRecordAddr = I->getSecond().GlobalRecordAddr) {
if (WithSPMDCheck ||		auto &&DataSharingGen = [this, &GlobalRecordAddr](CodeGenFunction &CGF,
getExecutionMode() == CGOpenMPRuntimeNVPTX::EM_Unknown) {		PrePostActionTy &) {
CGBuilderTy &Bld = CGF.Builder;
llvm::BasicBlock *ExitBB = CGF.createBasicBlock(".exit");
llvm::BasicBlock *NonSPMDBB = CGF.createBasicBlock(".non-spmd");
Bld.CreateCondBr(I->getSecond().IsInSPMDModeFlag, ExitBB, NonSPMDBB);
// There is no need to emit line number for unconditional branch.
(void)ApplyDebugLocation::CreateEmpty(CGF);
CGF.EmitBlock(NonSPMDBB);
CGF.EmitRuntimeCall(
createNVPTXRuntimeFunction(
OMPRTL_NVPTX__kmpc_data_sharing_pop_stack),
CGF.EmitCastToVoidPtr(I->getSecond().GlobalRecordAddr));
CGF.EmitBlock(ExitBB);
} else {
CGF.EmitRuntimeCall(createNVPTXRuntimeFunction(		CGF.EmitRuntimeCall(createNVPTXRuntimeFunction(
OMPRTL_NVPTX__kmpc_data_sharing_pop_stack),		OMPRTL_NVPTX__kmpc_data_sharing_pop_stack),
I->getSecond().GlobalRecordAddr);		CGF.EmitCastToVoidPtr(GlobalRecordAddr));
		};
		if (WithRuntimeCheck ||
		getExecutionMode() == CGOpenMPRuntimeNVPTX::EM_Unknown) {
		auto &&LocalGen = [](CodeGenFunction &CGF, PrePostActionTy &) {
		// Nothing to do.
		};
		emitInParallelRuntimeCheck(CGF, Loc, LocalGen, DataSharingGen);
		} else {
		RegionCodeGenTy DataSharingRCG(DataSharingGen);
		DataSharingRCG(CGF);
}		}
}		}
}		}
}		}

void CGOpenMPRuntimeNVPTX::emitTeamsCall(CodeGenFunction &CGF,		void CGOpenMPRuntimeNVPTX::emitTeamsCall(CodeGenFunction &CGF,
const OMPExecutableDirective &D,		const OMPExecutableDirective &D,
SourceLocation Loc,		SourceLocation Loc,
[134 lines not shown]	if (!CapturedVars.empty())
createNVPTXRuntimeFunction(OMPRTL_NVPTX__kmpc_end_sharing_variables));		createNVPTXRuntimeFunction(OMPRTL_NVPTX__kmpc_end_sharing_variables));

// Remember for post-processing in worker loop.		// Remember for post-processing in worker loop.
Work.emplace_back(WFn);		Work.emplace_back(WFn);
};		};

auto &&LNParallelGen = [this, Loc, &SeqGen, &L0ParallelGen](		auto &&LNParallelGen = [this, Loc, &SeqGen, &L0ParallelGen](
CodeGenFunction &CGF, PrePostActionTy &Action) {		CodeGenFunction &CGF, PrePostActionTy &Action) {
if (IsInParallelRegion) {		if (IsInParallelRegion)
SeqGen(CGF, Action);
} else if (IsInTargetMasterThreadRegion) {
L0ParallelGen(CGF, Action);
} else {
// Check for master and then parallelism:
// if (__kmpc_is_spmd_exec_mode() || __kmpc_parallel_level(loc, gtid)) {
// Serialized execution.
// } else {
// Worker call.
// }
CGBuilderTy &Bld = CGF.Builder;
llvm::BasicBlock *ExitBB = CGF.createBasicBlock(".exit");
llvm::BasicBlock *SeqBB = CGF.createBasicBlock(".sequential");
llvm::BasicBlock *ParallelCheckBB = CGF.createBasicBlock(".parcheck");
llvm::BasicBlock *MasterBB = CGF.createBasicBlock(".master");
llvm::Value *IsSPMD = Bld.CreateIsNotNull(CGF.EmitNounwindRuntimeCall(
createNVPTXRuntimeFunction(OMPRTL_NVPTX__kmpc_is_spmd_exec_mode)));
Bld.CreateCondBr(IsSPMD, SeqBB, ParallelCheckBB);
// There is no need to emit line number for unconditional branch.
(void)ApplyDebugLocation::CreateEmpty(CGF);
CGF.EmitBlock(ParallelCheckBB);
llvm::Value *RTLoc = emitUpdateLocation(CGF, Loc);
llvm::Value *ThreadID = getThreadID(CGF, Loc);
llvm::Value *PL = CGF.EmitRuntimeCall(
createNVPTXRuntimeFunction(OMPRTL_NVPTX__kmpc_parallel_level),
{RTLoc, ThreadID});
llvm::Value *Res = Bld.CreateIsNotNull(PL);
Bld.CreateCondBr(Res, SeqBB, MasterBB);
CGF.EmitBlock(SeqBB);
SeqGen(CGF, Action);		SeqGen(CGF, Action);
CGF.EmitBranch(ExitBB);		else if (IsInTargetMasterThreadRegion)
// There is no need to emit line number for unconditional branch.
(void)ApplyDebugLocation::CreateEmpty(CGF);
CGF.EmitBlock(MasterBB);
L0ParallelGen(CGF, Action);		L0ParallelGen(CGF, Action);
CGF.EmitBranch(ExitBB);		else
// There is no need to emit line number for unconditional branch.		emitInParallelRuntimeCheck(CGF, Loc, SeqGen, L0ParallelGen);
(void)ApplyDebugLocation::CreateEmpty(CGF);
// Emit the continuation block for code after the if.
CGF.EmitBlock(ExitBB, /*IsFinished=*/true);
}
};		};

if (IfCond) {		if (IfCond) {
emitOMPIfClause(CGF, IfCond, LNParallelGen, SeqGen);		emitOMPIfClause(CGF, IfCond, LNParallelGen, SeqGen);
} else {		} else {
CodeGenFunction::RunCleanupsScope Scope(CGF);		CodeGenFunction::RunCleanupsScope Scope(CGF);
RegionCodeGenTy ThenRCG(LNParallelGen);		RegionCodeGenTy ThenRCG(LNParallelGen);
ThenRCG(CGF);		ThenRCG(CGF);
[1,751 lines not shown]	I->getSecond().EscapedVariableLengthDecls.append(
EscapedVariableLengthDecls.begin(), EscapedVariableLengthDecls.end());		EscapedVariableLengthDecls.begin(), EscapedVariableLengthDecls.end());
DeclToAddrMapTy &Data = I->getSecond().LocalVarData;		DeclToAddrMapTy &Data = I->getSecond().LocalVarData;
for (const ValueDecl *VD : VarChecker.getEscapedDecls()) {		for (const ValueDecl *VD : VarChecker.getEscapedDecls()) {
assert(VD->isCanonicalDecl() && "Expected canonical declaration");		assert(VD->isCanonicalDecl() && "Expected canonical declaration");
const FieldDecl *FD = VarChecker.getFieldForGlobalizedVar(VD);		const FieldDecl *FD = VarChecker.getFieldForGlobalizedVar(VD);
Data.insert(std::make_pair(VD, std::make_pair(FD, Address::invalid())));		Data.insert(std::make_pair(VD, std::make_pair(FD, Address::invalid())));
}		}
if (!NeedToDelayGlobalization) {		if (!NeedToDelayGlobalization) {
emitGenericVarsProlog(CGF, D->getBeginLoc(), /*WithSPMDCheck=*/true);		emitGenericVarsProlog(CGF, D->getBeginLoc(), /*WithRuntimeCheck=*/true);
struct GlobalizationScope final : EHScopeStack::Cleanup {		struct GlobalizationScope final : EHScopeStack::Cleanup {
GlobalizationScope() = default;		SourceLocation Loc;
		GlobalizationScope(SourceLocation Loc) : Loc(Loc) {}

void Emit(CodeGenFunction &CGF, Flags flags) override {		void Emit(CodeGenFunction &CGF, Flags flags) override {
static_cast<CGOpenMPRuntimeNVPTX &>(CGF.CGM.getOpenMPRuntime())		static_cast<CGOpenMPRuntimeNVPTX &>(CGF.CGM.getOpenMPRuntime())
.emitGenericVarsEpilog(CGF, /*WithSPMDCheck=*/true);		.emitGenericVarsEpilog(CGF, Loc, /*WithRuntimeCheck=*/true);
}		}
};		};
CGF.EHStack.pushCleanup<GlobalizationScope>(NormalAndEHCleanup);		CGF.EHStack.pushCleanup<GlobalizationScope>(NormalAndEHCleanup,
		D->getBeginLoc());
}		}
}		}

Address CGOpenMPRuntimeNVPTX::getAddressOfLocalVariable(CodeGenFunction &CGF,		Address CGOpenMPRuntimeNVPTX::getAddressOfLocalVariable(CodeGenFunction &CGF,
const VarDecl *VD) {		const VarDecl *VD) {
if (getDataSharingMode(CGM) != CGOpenMPRuntimeNVPTX::Generic)		if (getDataSharingMode(CGM) != CGOpenMPRuntimeNVPTX::Generic)
return Address::invalid();		return Address::invalid();

[48 lines not shown]

test/OpenMP/declare_target_codegen_globalization.cpp

[29 lines not shown]
	// CHECK: call {{.*}}[[BAR:@.*bar.*]]()			// CHECK: call {{.*}}[[BAR:@.*bar.*]]()
	// CHECK-NOT: call void @__kmpc_data_sharing_pop_stack(			// CHECK-NOT: call void @__kmpc_data_sharing_pop_stack(
	// CHECK: ret void			// CHECK: ret void

	// CHECK: define {{.*}}[[FOO]](i32* dereferenceable{{.*}})			// CHECK: define {{.*}}[[FOO]](i32* dereferenceable{{.*}})
	// CHECK-NOT: @__kmpc_data_sharing_push_stack			// CHECK-NOT: @__kmpc_data_sharing_push_stack

	// CHECK: define {{.*}}[[BAR]]()			// CHECK: define {{.*}}[[BAR]]()
				// CHECK: [[GTID:%.+]] = call i32 @__kmpc_global_thread_num(%struct.ident_t*
	// CHECK: [[STACK:%.+]] = alloca [[GLOBAL_ST:%.+]],			// CHECK: [[STACK:%.+]] = alloca [[GLOBAL_ST:%.+]],
	// CHECK: [[RES:%.+]] = call i8 @__kmpc_is_spmd_exec_mode()			// CHECK: [[RES:%.+]] = call i8 @__kmpc_is_spmd_exec_mode()
	// CHECK: [[IS_SPMD:%.+]] = icmp ne i8 [[RES]], 0			// CHECK: [[IS_SPMD:%.+]] = icmp ne i8 [[RES]], 0
	// CHECK: br i1 [[IS_SPMD]], label			// CHECK: br i1 [[IS_SPMD]], label
				// CHECK: [[RES:%.+]] = call i16 @__kmpc_parallel_level(%struct.ident_t* @{{.+}}, i32 [[GTID]])
				// CHECK: icmp ne i16 [[RES]], 0
				// CHECK: br i1
	// CHECK: br label			// CHECK: br label
	// CHECK: [[RES:%.+]] = call i8* @__kmpc_data_sharing_push_stack(i64 4, i16 0)			// CHECK: [[RES:%.+]] = call i8* @__kmpc_data_sharing_push_stack(i64 4, i16 0)
	// CHECK: [[GLOBALS:%.+]] = bitcast i8* [[RES]] to [[GLOBAL_ST]]*			// CHECK: [[GLOBALS:%.+]] = bitcast i8* [[RES]] to [[GLOBAL_ST]]*
	// CHECK: br label			// CHECK: br label
	// CHECK: [[ITEMS:%.+]] = phi [[GLOBAL_ST]]* [ [[STACK]], {{.+}} ], [ [[GLOBALS]], {{.+}} ]			// CHECK: [[ITEMS:%.+]] = phi [[GLOBAL_ST]]* [ [[STACK]], {{.+}} ], [ [[GLOBALS]], {{.+}} ]
	// CHECK: [[A_ADDR:%.+]] = getelementptr inbounds [[GLOBAL_ST]], [[GLOBAL_ST]]* [[ITEMS]], i{{[0-9]+}} 0, i{{[0-9]+}} 0			// CHECK: [[A_ADDR:%.+]] = getelementptr inbounds [[GLOBAL_ST]], [[GLOBAL_ST]]* [[ITEMS]], i{{[0-9]+}} 0, i{{[0-9]+}} 0
	// CHECK: call {{.*}}[[FOO]](i32* dereferenceable{{.*}} [[A_ADDR]])			// CHECK: call {{.*}}[[FOO]](i32* dereferenceable{{.*}} [[A_ADDR]])
				// CHECK: [[RES:%.+]] = call i8 @__kmpc_is_spmd_exec_mode()
				// CHECK: [[IS_SPMD:%.+]] = icmp ne i8 [[RES]], 0
	// CHECK: br i1 [[IS_SPMD]], label			// CHECK: br i1 [[IS_SPMD]], label
				// CHECK: [[RES:%.+]] = call i16 @__kmpc_parallel_level(%struct.ident_t* @{{.+}}, i32 [[GTID]])
				// CHECK: icmp ne i16 [[RES]], 0
				// CHECK: br i1
				// CHECK: br label
	// CHECK: [[BC:%.+]] = bitcast [[GLOBAL_ST]]* [[ITEMS]] to i8*			// CHECK: [[BC:%.+]] = bitcast [[GLOBAL_ST]]* [[ITEMS]] to i8*
	// CHECK: call void @__kmpc_data_sharing_pop_stack(i8* [[BC]])			// CHECK: call void @__kmpc_data_sharing_pop_stack(i8* [[BC]])
	// CHECK: br label			// CHECK: br label
	// CHECK: ret i32			// CHECK: ret i32
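
For readers following the CHECK lines above, the branch structure they expect can be sketched in plain C++. This is a hedged illustration, not the code Clang emits: `Storage`, `chooseStorage` and `needsPopStack` are hypothetical names, and the two parameters stand in for the results of `__kmpc_is_spmd_exec_mode()` and `__kmpc_parallel_level()`. It assumes, per the revision summary, that a thread in SPMD mode or inside a parallel region keeps the variable in its local alloca and only the remaining case pushes onto the data sharing stack.

```cpp
#include <cstdint>

// Hypothetical names for illustration only; none of these exist in Clang
// or in the OpenMP device runtime.
enum class Storage { LocalAlloca, GlobalStack };

// Mirrors the checked branch order: the SPMD test comes first, and the
// parallel level is only consulted when the kernel is not in SPMD mode.
Storage chooseStorage(bool isSpmdMode, std::uint16_t parallelLevel) {
  if (isSpmdMode)
    return Storage::LocalAlloca;   // br i1 [[IS_SPMD]] skips the push
  if (parallelLevel != 0)
    return Storage::LocalAlloca;   // already inside a parallel region
  return Storage::GlobalStack;     // __kmpc_data_sharing_push_stack path
}

// The epilogue repeats the same two tests before calling
// __kmpc_data_sharing_pop_stack, so the pop happens only if a push did.
bool needsPopStack(bool isSpmdMode, std::uint16_t parallelLevel) {
  return chooseStorage(isSpmdMode, parallelLevel) == Storage::GlobalStack;
}
```

The phi node in the checks corresponds to the two outcomes of `chooseStorage`: `[[STACK]]` for the local alloca and `[[GLOBALS]]` for the pushed record.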

test/OpenMP/nvptx_target_codegen.cpp

[...]	#pragma omp parallel
// CHECK: call void @__kmpc_kernel_deinit(		// CHECK: call void @__kmpc_kernel_deinit(
// CHECK: call void @llvm.nvvm.barrier0()		// CHECK: call void @llvm.nvvm.barrier0()
// CHECK: br label {{%?}}[[EXIT]]		// CHECK: br label {{%?}}[[EXIT]]
//		//
// CHECK: [[EXIT]]		// CHECK: [[EXIT]]
// CHECK: ret void		// CHECK: ret void

// CHECK: define i32 [[BAZ]](i32 [[F:%.]], double dereferenceable{{.*}})		// CHECK: define i32 [[BAZ]](i32 [[F:%.]], double dereferenceable{{.*}})
		// CHECK: [[GTID:%.+]] = call i32 @__kmpc_global_thread_num(%struct.ident_t*
// CHECK: [[STACK:%.+]] = alloca [[GLOBAL_ST:%.+]],		// CHECK: [[STACK:%.+]] = alloca [[GLOBAL_ST:%.+]],
// CHECK: [[ZERO_ADDR:%.+]] = alloca i32,		// CHECK: [[ZERO_ADDR:%.+]] = alloca i32,
// CHECK: [[GTID:%.+]] = call i32 @__kmpc_global_thread_num(%struct.ident_t*
// CHECK: store i32 0, i32* [[ZERO_ADDR]]		// CHECK: store i32 0, i32* [[ZERO_ADDR]]
// CHECK: [[RES:%.+]] = call i8 @__kmpc_is_spmd_exec_mode()		// CHECK: [[RES:%.+]] = call i8 @__kmpc_is_spmd_exec_mode()
// CHECK: [[IS_SPMD:%.+]] = icmp ne i8 [[RES]], 0		// CHECK: [[IS_SPMD:%.+]] = icmp ne i8 [[RES]], 0
// CHECK: br i1 [[IS_SPMD]], label		// CHECK: br i1 [[IS_SPMD]], label

		// CHECK: [[RES:%.+]] = call i16 @__kmpc_parallel_level(%struct.ident_t* @{{.+}}, i32 [[GTID]])
		// CHECK: icmp ne i16 [[RES]], 0
		// CHECK: br i1

// CHECK: br label		// CHECK: br label
// CHECK: [[PTR:%.+]] = call i8* @__kmpc_data_sharing_push_stack(i{{64\|32}} 4, i16 0)		// CHECK: [[PTR:%.+]] = call i8* @__kmpc_data_sharing_push_stack(i{{64\|32}} 4, i16 0)
// CHECK: [[REC_ADDR:%.+]] = bitcast i8* [[PTR]] to [[GLOBAL_ST]]*		// CHECK: [[REC_ADDR:%.+]] = bitcast i8* [[PTR]] to [[GLOBAL_ST]]*
// CHECK: br label		// CHECK: br label
// CHECK: [[ITEMS:%.+]] = phi [[GLOBAL_ST]]* [ [[STACK]], {{.+}} ], [ [[REC_ADDR]], {{.+}} ]		// CHECK: [[ITEMS:%.+]] = phi [[GLOBAL_ST]]* [ [[STACK]], {{.+}} ], [ [[REC_ADDR]], {{.+}} ]
// CHECK: [[F_PTR:%.+]] = getelementptr inbounds [[GLOBAL_ST]], [[GLOBAL_ST]]* [[ITEMS]], i32 0, i32 0		// CHECK: [[F_PTR:%.+]] = getelementptr inbounds [[GLOBAL_ST]], [[GLOBAL_ST]]* [[ITEMS]], i32 0, i32 0
// CHECK: store i32 %{{.+}}, i32* [[F_PTR]],		// CHECK: store i32 %{{.+}}, i32* [[F_PTR]],

[...]	#pragma omp parallel
// CHECK: store i8* [[F_REF]], i8** [[REF]],		// CHECK: store i8* [[F_REF]], i8** [[REF]],
// CHECK: call void @llvm.nvvm.barrier0()		// CHECK: call void @llvm.nvvm.barrier0()
// CHECK: call void @llvm.nvvm.barrier0()		// CHECK: call void @llvm.nvvm.barrier0()
// CHECK: call void @__kmpc_end_sharing_variables()		// CHECK: call void @__kmpc_end_sharing_variables()
// CHECK: br label		// CHECK: br label

// CHECK: [[RES:%.+]] = load i32, i32* [[F_PTR]],		// CHECK: [[RES:%.+]] = load i32, i32* [[F_PTR]],
// CHECK: store i32 [[RES]], i32* [[RET:%.+]],		// CHECK: store i32 [[RES]], i32* [[RET:%.+]],
// CHECK: br i1 [[IS_SPMD]], label		// CHECK: [[RES:%.+]] = call i8 @__kmpc_is_spmd_exec_mode()
		// CHECK: icmp ne i8 [[RES]], 0
		// CHECK: br i1

		// CHECK: [[RES:%.+]] = call i16 @__kmpc_parallel_level(%struct.ident_t* @{{.+}}, i32 [[GTID]])
		// CHECK: icmp ne i16 [[RES]], 0
		// CHECK: br i1

// CHECK: [[BC:%.+]] = bitcast [[GLOBAL_ST]]* [[ITEMS]] to i8*		// CHECK: [[BC:%.+]] = bitcast [[GLOBAL_ST]]* [[ITEMS]] to i8*
// CHECK: call void @__kmpc_data_sharing_pop_stack(i8* [[BC]])		// CHECK: call void @__kmpc_data_sharing_pop_stack(i8* [[BC]])
// CHECK: br label		// CHECK: br label
// CHECK: [[RES:%.+]] = load i32, i32* [[RET]],		// CHECK: [[RES:%.+]] = load i32, i32* [[RET]],
// CHECK: ret i32 [[RES]]		// CHECK: ret i32 [[RES]]


// CHECK-LABEL: define {{.*}}void {{@__omp_offloading_.+template.+l311}}_worker()		// CHECK-LABEL: define {{.*}}void {{@__omp_offloading_.+template.+l311}}_worker()