This is an archive of the discontinued LLVM Phabricator instance.

Differential D120353

[OpenMP] Add option to make offloading mandatory
Closed, Public

Authored by jhuber6 on Feb 22 2022, 1:31 PM.

Details

Reviewers

jdoerfert
tianshilei1992
JonChesterfield
ABataev

Commits

rG2b97b16f294a: [OpenMP] Add option to make offloading mandatory

Summary

Currently when we generate OpenMP offloading code we always make
fallback code for the CPU. This is necessary for implementing features
like conditional offloading and ensuring that unhandled pragmas don't
result in missing symbols. However, this is problematic for a few cases.
For offloading tests we can silently fail to the host without realizing
that offloading failed. Additionally, this makes it impossible to
provide interoperability to other offloading schemes like HIP or CUDA
because those methods do not provide any such host fallback guarantee.
This patch adds the -fopenmp-offload-mandatory flag to prevent
generating the fallback symbol on the CPU and instead replaces the
function with a dummy global and the failed branch with 'unreachable'.
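
As a rough illustration of the behavior the summary describes, the toy model below contrasts the default (fall back to the host when offloading fails) with the mandatory mode (no host fallback exists). It is a sketch only: runTargetRegion and its parameters are hypothetical, it is not the code Clang generates, and an exception stands in for the 'unreachable' instruction so that the example stays well defined.

```cpp
#include <functional>
#include <stdexcept>

// Toy model of a target region launch; NOT Clang's generated code.
// offloadSucceeded stands in for the runtime reporting a successful launch.
int runTargetRegion(bool offloadSucceeded, bool offloadMandatory,
                    const std::function<int()> &deviceVersion,
                    const std::function<int()> &hostFallback) {
  if (offloadSucceeded)
    return deviceVersion();
  if (offloadMandatory) {
    // With the flag, the patch emits 'unreachable' on this path.
    // An exception is used here only to keep the sketch well defined.
    throw std::logic_error("offload failed but offloading is mandatory");
  }
  // Default behavior: run the host copy of the region.
  return hostFallback();
}
```

In the real patch the host copy is never generated in mandatory mode, so there is nothing to call on the failure path; the model keeps both callables only to show which branch is taken.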

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	740 ms	x64 debian > SanitizerCommon-tsan-x86_64-Linux.Linux::decorate_proc_maps.cpp

Event Timeline

jhuber6 created this revision. Feb 22 2022, 1:31 PM

Herald added subscribers: dexonsmith, dang, guansong, yaxunl. Feb 22 2022, 1:31 PM

jhuber6 requested review of this revision. Feb 22 2022, 1:31 PM

Herald added a project: Restricted Project. Feb 22 2022, 1:31 PM

Herald added subscribers: cfe-commits, sstefan1.

This is necessary for implementing features like conditional offloading and ensuring that unhandled pragmas don't result in missing symbols.

This behavior is part of the standard.

For offloading tests we can silently fail to the host without realizing that offloading failed.

It is controlled by the OMP_TARGET_OFFLOAD env variable, no? You can set this env var to mandatory to avoid this problem.

In D120353#3338589, @ABataev wrote:

This is necessary for implementing features like conditional offloading and ensuring that unhandled pragmas don't result in missing symbols.

This behavior is part of the standard.

I believe it's reasonable to have this as an option flag to defy the standard, we have other flags that do this already (e.g. -fopenmp-cuda-mode).

For offloading tests we can silently fail to the host without realizing that offloading failed.

It is controlled by the OMP_TARGET_OFFLOAD env variable, no? You can set this env var to mandatory to avoid this problem.

Yes, I don't think we set it in the tests right now for some reason. But the main reason I made this patch is for interoperability. Without this if you want to call a CUDA function from the OpenMP device you'd need a variant and a dummy implementation. If you don't write a dummy implementation you'll get a linker error, if you don't use a variant you'll override the CUDA version.

Harbormaster completed remote builds in B150931: Diff 410634. Feb 22 2022, 2:12 PM

In D120353#3338647, @jhuber6 wrote:

In D120353#3338589, @ABataev wrote:

This is necessary for implementing features like conditional offloading and ensuring that unhandled pragmas don't result in missing symbols.

This behavior is part of the standard.

I believe it's reasonable to have this as an option flag to defy the standard, we have other flags that do this already (e.g. -fopenmp-cuda-mode).

I mean it is not the implementation feature.

For offloading tests we can silently fail to the host without realizing that offloading failed.

It is controlled by the OMP_TARGET_OFFLOAD env variable, no? You can set this env var to mandatory to avoid this problem.

Yes, I don't think we set it in the tests right now for some reason.

I thought it is the default behavior. But we need to set it for offloading tests to be sure in their behavior.

But the main reason I made this patch is for interoperability. Without this if you want to call a CUDA function from the OpenMP device you'd need a variant and a dummy implementation. If you don't write a dummy implementation you'll get a linker error, if you don't use a variant you'll override the CUDA version.

Ah, ok, I see. How is it supposed to be used? In CUDA code or in plain C/C++ code?

In D120353#3338718, @ABataev wrote:

In D120353#3338647, @jhuber6 wrote:

But the main reason I made this patch is for interoperability. Without this if you want to call a CUDA function from the OpenMP device you'd need a variant and a dummy implementation. If you don't write a dummy implementation you'll get a linker error, if you don't use a variant you'll override the CUDA function.

Ah, ok, I see. How is it supposed to be used? In CUDA code or in plain C/C++ code?

I haven't finalized the implementation, but the basic support I've tested was calling a __device__ function compiled from another file with OpenMP, with this patch the source files would look like this for example. I think the inverse would also be possible given some code on the CUDA side. Calling CUDA kernels would take some extra work.

// CUDA source file
__device__ int cuda() { return 0; }

// OpenMP source file
int cuda(void);
#pragma omp declare target device_type(nohost) to(cuda)

int main() {
  int x = 1;
#pragma omp target map(from : x)
  x = cuda();

  return x;
}

In D120353#3338770, @jhuber6 wrote:

I haven't finalized the implementation, but the basic support I've tested was calling a __device__ function compiled from another file with OpenMP, with this patch the source files would look like this for example. I think the inverse would also be possible given some code on the CUDA side. Calling CUDA kernels would take some extra work.

__device__ int cuda() { return 0; }

int cuda(void);
#pragma omp declare target device_type(nohost) to(cuda)

int main() {
  int x = 1;
#pragma omp target map(from : x)
  x = cuda();

  return x;
}

What if we have #pragma omp target if (...) or #pragma omp target device(...)?

In D120353#3340533, @ABataev wrote:

What if we have #pragma omp target if (...) or #pragma omp target device(...)?

If we have #pragma omp target if (...) then that requires a host fallback and violates the assertion the user passed in, it will hit the unreachable and fail. If the user passed in #pragma omp target device(...) we will assume that a host implementation exists as well.
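
To make the if clause case concrete, here is a minimal sketch of a region that always takes the host path. The function name sum_with_if_false is hypothetical and not part of the patch. Built without the new flag, the false condition makes OpenMP run the region on the host, so a host version of the region must exist; per the reply above, that is exactly the path that has no fallback once -fopenmp-offload-mandatory is passed.

```cpp
// Hypothetical helper, not part of the patch.
// if (0) tells OpenMP to execute the target region on the host, so this
// region depends on the host fallback that the new flag removes.
int sum_with_if_false(const int *v, int n) {
  int sum = 0;
#pragma omp target if (0) map(to : v[0 : n]) map(tofrom : sum)
  for (int i = 0; i < n; ++i)
    sum += v[i];
  return sum;
}
```

Without OpenMP enabled the pragma is ignored and the loop runs as ordinary host code, which gives the same result as the if (0) path.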

In D120353#3340596, @jhuber6 wrote:

If we have #pragma omp target if (...) then that requires a host fallback and violates the assertion the user passed in, it will hit the unreachable and fail. If the user passed in #pragma omp target device(...) we will assume that a host implementation exists as well.

Do you have a check for the last case (with the device clause)?

In D120353#3340606, @ABataev wrote:

In D120353#3340596, @jhuber6 wrote:

If we have #pragma omp target if (...) then that requires a host fallback and violates the assertion the user passed in, it will hit the unreachable and fail. If the user passed in #pragma omp target device(...) we will assume that a host implementation exists as well.

Do you have a check for the last case (with the device clause)?

In the test file I had a global, but forgot to check the globals to show that @x exists on the device. I should also put an if (0) to show that we always hit unreachable in that case.

Adding test case to check if codegen for unreachables, and an extra function to show that it is not created for the host while the other is. Also added an error message when the user specified offloading is mandatory but couldn't be created due to if(0) or a lack of triples.

Could you add a test with the device clause too?

In D120353#3340744, @ABataev wrote:

Could you add a test with the device clause too?

Which clause exactly?

In D120353#3340749, @jhuber6 wrote:

In D120353#3340744, @ABataev wrote:

Could you add a test with the device clause too?

Which clause exactly?

device(device_id)

Adding test function with device clause

I assume it would be good to notify the user somehow about target regions, which may require execution on the host. Maybe add a note during the codegen phase?

In D120353#3340775, @ABataev wrote:

I assume it would be good to notify the user somehow about target regions, which may require execution on the host. Maybe add a note during the codegen phase?

Technically all of them may require execution on the host according to the documentation. We could emit a warning whenever we codegen a target region with an if clause, but I feel the user should have a good enough idea that the if clause won't work if they specifically turn on the flag that removes host execution.

This revision is now accepted and ready to land. Feb 23 2022, 9:36 AM

Thanks! Seems a good thing to add to the offloading test runner, preferably in a separate change to avoid reverting this in case of unforeseen problems

In D120353#3340863, @JonChesterfield wrote:

Thanks! Seems a good thing to add to the offloading test runner, preferably in a separate change to avoid reverting this in case of unforeseen problems

Could definitely do that, it doesn't seem like we test the if clause anywhere in the OpenMP tests so it shouldn't break anything.

Harbormaster completed remote builds in B151083: Diff 410851. Feb 23 2022, 10:21 AM

Guarding where we set attrs in the case that it's not a valid function now.

Harbormaster completed remote builds in B151129: Diff 410919. Feb 23 2022, 1:41 PM

This revision was landed with ongoing or failed builds. Feb 23 2022, 1:45 PM

Closed by commit rG2b97b16f294a: [OpenMP] Add option to make offloading mandatory (authored by jhuber6).

This revision was automatically updated to reflect the committed changes.

jhuber6 added a commit: rG2b97b16f294a: [OpenMP] Add option to make offloading mandatory.

Revision Contents

Path	Size
clang/include/clang/Basic/LangOptions.def	1 line
clang/include/clang/Driver/Options.td	4 lines
clang/lib/CodeGen/CGOpenMPRuntime.cpp	68 lines
clang/lib/CodeGen/CGStmtOpenMP.cpp	7 lines
clang/lib/Driver/ToolChains/Clang.cpp	2 lines
clang/lib/Sema/SemaOpenMP.cpp	2 lines
clang/test/OpenMP/target_offload_mandatory_codegen.cpp	90 lines

Diff 410851

clang/include/clang/Basic/LangOptions.def

@@ ... @@
 LANGOPT(OpenMPCUDABlocksPerSM , 32, 0, "Number of blocks per SM for CUDA devices.")
 LANGOPT(OpenMPCUDAReductionBufNum , 32, 1024, "Number of the reduction records in the intermediate reduction buffer used for the teams reductions.")
 LANGOPT(OpenMPTargetNewRuntime , 1, 0, "Use the new bitcode library for OpenMP offloading")
 LANGOPT(OpenMPTargetDebug , 32, 0, "Enable debugging in the OpenMP offloading device RTL")
 LANGOPT(OpenMPOptimisticCollapse , 1, 0, "Use at most 32 bits to represent the collapsed loop nest counter.")
 LANGOPT(OpenMPThreadSubscription , 1, 0, "Assume work-shared loops do not have more iterations than participating threads.")
 LANGOPT(OpenMPTeamSubscription , 1, 0, "Assume distributed loops do not have more iterations than participating teams.")
 LANGOPT(OpenMPNoThreadState , 1, 0, "Assume that no thread in a parallel region will modify an ICV.")
+LANGOPT(OpenMPOffloadMandatory , 1, 0, "Assert that offloading is mandatory and do not create a host fallback.")
 LANGOPT(RenderScript , 1, 0, "RenderScript")

 LANGOPT(CUDAIsDevice , 1, 0, "compiling for CUDA device")
 LANGOPT(CUDAAllowVariadicFunctions, 1, 0, "allowing variadic functions in CUDA device code")
 LANGOPT(CUDAHostDeviceConstexpr, 1, 1, "treating unattributed constexpr functions as __host__ __device__")
 LANGOPT(CUDADeviceApproxTranscendentals, 1, 0, "using approximate transcendental functions")
 LANGOPT(GPURelocatableDeviceCode, 1, 0, "generate relocatable device code")
 LANGOPT(GPUAllowDeviceInit, 1, 0, "allowing device side global init functions for HIP")

clang/include/clang/Driver/Options.td

@@ ... @@
 def fno_openmp_assume_teams_oversubscription : Flag<["-"], "fno-openmp-assume-teams-oversubscription">,
   Group<f_Group>, Flags<[CC1Option, NoArgumentUnused, HelpHidden]>;
 def fno_openmp_assume_threads_oversubscription : Flag<["-"], "fno-openmp-assume-threads-oversubscription">,
   Group<f_Group>, Flags<[CC1Option, NoArgumentUnused, HelpHidden]>;
 def fopenmp_assume_no_thread_state : Flag<["-"], "fopenmp-assume-no-thread-state">, Group<f_Group>,
   Flags<[CC1Option, NoArgumentUnused, HelpHidden]>,
   HelpText<"Assert no thread in a parallel region modifies an ICV">,
   MarshallingInfoFlag<LangOpts<"OpenMPNoThreadState">>;
+def fopenmp_offload_mandatory : Flag<["-"], "fopenmp-offload-mandatory">, Group<f_Group>,
+  Flags<[CC1Option, NoArgumentUnused]>,
+  HelpText<"Do not create a host fallback if offloading to the device fails.">,
+  MarshallingInfoFlag<LangOpts<"OpenMPOffloadMandatory">>;
 defm openmp_target_new_runtime: BoolFOption<"openmp-target-new-runtime",
   LangOpts<"OpenMPTargetNewRuntime">, DefaultTrue,
   PosFlag<SetTrue, [CC1Option], "Use the new bitcode library for OpenMP offloading">,
   NegFlag<SetFalse>>;
 defm openmp_optimistic_collapse : BoolFOption<"openmp-optimistic-collapse",
   LangOpts<"OpenMPOptimisticCollapse">, DefaultFalse,
   PosFlag<SetTrue, [CC1Option]>, NegFlag<SetFalse>, BothFlags<[NoArgumentUnused, HelpHidden]>>;
 def static_openmp: Flag<["-"], "static-openmp">,

clang/lib/CodeGen/CGOpenMPRuntime.cpp

@@ ... @@ void CGOpenMPRuntime::emitTargetOutlinedFunctionHelper(
   // information of the current target region. The name will be something like:
   //
   // __omp_offloading_DD_FFFF_PP_lBB
   //
   // where DD_FFFF is an ID unique to the file (device and file IDs), PP is the
   // mangled name of the function that encloses the target region and BB is the
   // line number of the target region.

+  const bool BuildOutlinedFn = CGM.getLangOpts().OpenMPIsDevice ||
+                               !CGM.getLangOpts().OpenMPOffloadMandatory;
   unsigned DeviceID;
   unsigned FileID;
   unsigned Line;
   getTargetEntryUniqueInfo(CGM.getContext(), D.getBeginLoc(), DeviceID, FileID,
                            Line);
   SmallString<64> EntryFnName;
   {
     llvm::raw_svector_ostream OS(EntryFnName);
     OS << "__omp_offloading" << llvm::format("_%x", DeviceID)
        << llvm::format("_%x_", FileID) << ParentName << "_l" << Line;
   }

   const CapturedStmt &CS = *D.getCapturedStmt(OMPD_target);

   CodeGenFunction CGF(CGM, true);
   CGOpenMPTargetRegionInfo CGInfo(CS, CodeGen, EntryFnName);
   CodeGenFunction::CGCapturedStmtRAII CapInfoRAII(CGF, &CGInfo);

+  if (BuildOutlinedFn)
     OutlinedFn = CGF.GenerateOpenMPCapturedStmtFunction(CS, D.getBeginLoc());

   // If this target outline function is not an offload entry, we don't need to
   // register it.
   if (!IsOffloadEntry)
     return;

   // The target region ID is used by the runtime library to identify the current
   // target region, so it only has to be unique and not necessarily point to
@@ ... @@ void CGOpenMPRuntime::emitTargetOutlinedFunctionHelper(
   } else {
     std::string Name = getName({EntryFnName, "region_id"});
     OutlinedFnID = new llvm::GlobalVariable(
         CGM.getModule(), CGM.Int8Ty, /*isConstant=*/true,
         llvm::GlobalValue::WeakAnyLinkage,
         llvm::Constant::getNullValue(CGM.Int8Ty), Name);
   }

+  // If we do not allow host fallback we still need a named address to use.
+  llvm::Constant *TargetRegionEntryAddr = OutlinedFn;
+  if (!BuildOutlinedFn) {
+    assert(!CGM.getModule().getGlobalVariable(EntryFnName, true) &&
+           "Named kernel already exists?");
+    TargetRegionEntryAddr = new llvm::GlobalVariable(
+        CGM.getModule(), CGM.Int8Ty, /*isConstant=*/true,
+        llvm::GlobalValue::InternalLinkage,
+        llvm::Constant::getNullValue(CGM.Int8Ty), EntryFnName);
+  }
+
   // Register the information for the entry associated with this target region.
   OffloadEntriesInfoManager.registerTargetRegionEntryInfo(
-      DeviceID, FileID, ParentName, Line, OutlinedFn, OutlinedFnID,
+      DeviceID, FileID, ParentName, Line, TargetRegionEntryAddr, OutlinedFnID,
       OffloadEntriesInfoManagerTy::OMPTargetRegionEntryTargetRegion);

   // Add NumTeams and ThreadLimit attributes to the outlined GPU function
   int32_t DefaultValTeams = -1;
   getNumTeamsExprForTargetDirective(CGF, D, DefaultValTeams);
   if (DefaultValTeams > 0) {
     OutlinedFn->addFnAttr("omp_target_num_teams",
                           std::to_string(DefaultValTeams));
   }
   int32_t DefaultValThreads = -1;
   getNumThreadsExprForTargetDirective(CGF, D, DefaultValThreads);
   if (DefaultValThreads > 0) {
     OutlinedFn->addFnAttr("omp_target_thread_limit",
                           std::to_string(DefaultValThreads));
   }

+  if (BuildOutlinedFn)
     CGM.getTargetCodeGenInfo().setTargetAttributes(nullptr, OutlinedFn, CGM);
 }

 /// Checks if the expression is constant or does not have non-trivial function
 /// calls.
 static bool isTrivial(ASTContext &Ctx, const Expr * E) {
   // We can skip constant expressions.
   // We can skip expressions with trivial calls or simple expressions.
   return (E->isEvaluatable(Ctx, Expr::SE_AllowUndefinedBehavior) ||
@@ ... @@ void CGOpenMPRuntime::emitTargetCall(
     llvm::Function *OutlinedFn, llvm::Value *OutlinedFnID, const Expr *IfCond,
     llvm::PointerIntPair<const Expr *, 2, OpenMPDeviceClauseModifier> Device,
     llvm::function_ref<llvm::Value *(CodeGenFunction &CGF,
                                      const OMPLoopDirective &D)>
         SizeEmitter) {
   if (!CGF.HaveInsertPoint())
     return;

-  assert(OutlinedFn && "Invalid outlined function!");
+  const bool OffloadingMandatory = !CGM.getLangOpts().OpenMPIsDevice &&
+                                   CGM.getLangOpts().OpenMPOffloadMandatory;
+
+  assert((OffloadingMandatory || OutlinedFn) && "Invalid outlined function!");

   const bool RequiresOuterTask = D.hasClausesOfKind<OMPDependClause>() ||
                                  D.hasClausesOfKind<OMPNowaitClause>();
   llvm::SmallVector<llvm::Value *, 16> CapturedVars;
   const CapturedStmt &CS = *D.getCapturedStmt(OMPD_target);
   auto &&ArgsCodegen = [&CS, &CapturedVars](CodeGenFunction &CGF,
                                             PrePostActionTy &) {
     CGF.GenerateOpenMPCapturedVars(CS, CapturedVars);
   };
   emitInlinedDirective(CGF, OMPD_unknown, ArgsCodegen);

   CodeGenFunction::OMPTargetDataInfo InputInfo;
   llvm::Value *MapTypesArray = nullptr;
   llvm::Value *MapNamesArray = nullptr;
-  // Fill up the pointer arrays and transfer execution to the device.
-  auto &&ThenGen = [this, Device, OutlinedFn, OutlinedFnID, &D, &InputInfo,
-                    &MapTypesArray, &MapNamesArray, &CS, RequiresOuterTask,
-                    &CapturedVars,
-                    SizeEmitter](CodeGenFunction &CGF, PrePostActionTy &) {
-    if (Device.getInt() == OMPC_DEVICE_ancestor) {
-      // Reverse offloading is not supported, so just execute on the host.
+  // Generate code for the host fallback function.
+  auto &&FallbackGen = [this, OutlinedFn, OutlinedFnID, &D, &CapturedVars,
+                        RequiresOuterTask, &CS,
+                        OffloadingMandatory](CodeGenFunction &CGF) {
+    if (OffloadingMandatory) {
+      CGF.Builder.CreateUnreachable();
+    } else {
       if (RequiresOuterTask) {
         CapturedVars.clear();
         CGF.GenerateOpenMPCapturedVars(CS, CapturedVars);
       }
       emitOutlinedFunctionCall(CGF, D.getBeginLoc(), OutlinedFn, CapturedVars);
+    }
+  };
+  // Fill up the pointer arrays and transfer execution to the device.
+  auto &&ThenGen = [this, Device, OutlinedFn, OutlinedFnID, &D, &InputInfo,
+                    &MapTypesArray, &MapNamesArray, &CS, RequiresOuterTask,
+                    &CapturedVars, SizeEmitter,
+                    FallbackGen](CodeGenFunction &CGF, PrePostActionTy &) {
+    if (Device.getInt() == OMPC_DEVICE_ancestor) {
+      // Reverse offloading is not supported, so just execute on the host.
+      FallbackGen(CGF);
       return;
     }

     // On top of the arrays that were filled up, the target offloading call
     // takes as arguments the device id as well as the host pointer. The host
     // pointer is used by the runtime library to identify the current target
     // region, so it only has to be unique and not necessarily point to
     // anything. It could be the pointer to the outlined function that
@@ ... @@ auto &&ThenGen = [this, Device, OutlinedFn, OutlinedFnID, &D, &InputInfo,
     llvm::BasicBlock *OffloadFailedBlock =
         CGF.createBasicBlock("omp_offload.failed");
     llvm::BasicBlock *OffloadContBlock =
         CGF.createBasicBlock("omp_offload.cont");
     llvm::Value *Failed = CGF.Builder.CreateIsNotNull(Return);
     CGF.Builder.CreateCondBr(Failed, OffloadFailedBlock, OffloadContBlock);

     CGF.EmitBlock(OffloadFailedBlock);
-    if (RequiresOuterTask) {
-      CapturedVars.clear();
-      CGF.GenerateOpenMPCapturedVars(CS, CapturedVars);
-    }
-    emitOutlinedFunctionCall(CGF, D.getBeginLoc(), OutlinedFn, CapturedVars);
+    FallbackGen(CGF);
     CGF.EmitBranch(OffloadContBlock);

     CGF.EmitBlock(OffloadContBlock, /*IsFinished=*/true);
   };

   // Notify that the host version must be executed.
-  auto &&ElseGen = [this, &D, OutlinedFn, &CS, &CapturedVars,
-                    RequiresOuterTask](CodeGenFunction &CGF,
-                                       PrePostActionTy &) {
-    if (RequiresOuterTask) {
-      CapturedVars.clear();
-      CGF.GenerateOpenMPCapturedVars(CS, CapturedVars);
-    }
-    emitOutlinedFunctionCall(CGF, D.getBeginLoc(), OutlinedFn, CapturedVars);
+  auto &&ElseGen = [this, &D, OutlinedFn, &CS, &CapturedVars, RequiresOuterTask,
+                    FallbackGen](CodeGenFunction &CGF, PrePostActionTy &) {
+    FallbackGen(CGF);
   };

   auto &&TargetThenGen = [this, &ThenGen, &D, &InputInfo, &MapTypesArray,
                           &MapNamesArray, &CapturedVars, RequiresOuterTask,
                           &CS](CodeGenFunction &CGF, PrePostActionTy &) {
     // Fill up the arrays with all the captured variables.
     MappableExprsHandler::MapCombinedInfoTy CombinedInfo;
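
The naming comment in emitTargetOutlinedFunctionHelper describes entry names of the form __omp_offloading_DD_FFFF_PP_lBB. As a rough standalone sketch of that scheme, the name could be assembled as below. The helper name offloadEntryName is hypothetical, and std::snprintf stands in for the llvm::format calls used in the diff.

```cpp
#include <cstdio>
#include <string>

// Standalone sketch of the kernel entry naming scheme; not Clang code.
// Device and file IDs are printed in hex, the line number in decimal,
// mirroring "_%x", "_%x_" and "_l" << Line in the diff.
std::string offloadEntryName(unsigned deviceID, unsigned fileID,
                             const std::string &parentName, unsigned line) {
  char ids[64];
  std::snprintf(ids, sizeof(ids), "_%x_%x_", deviceID, fileID);
  return "__omp_offloading" + std::string(ids) + parentName + "_l" +
         std::to_string(line);
}
```

For example, offloadEntryName(0x10, 0xabc, "main", 5) yields __omp_offloading_10_abc_main_l5. With the patch, this same name is reused for the dummy internal global when no host function is built, which is why the diff asserts that no global with that name already exists.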

clang/lib/CodeGen/CGStmtOpenMP.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

static void emitCommonOMPTargetDirective(CodeGenFunction &CGF,
if (IfCond) {		if (IfCond) {
bool Val;		bool Val;
if (CGF.ConstantFoldsToSimpleInteger(IfCond, Val) && !Val)		if (CGF.ConstantFoldsToSimpleInteger(IfCond, Val) && !Val)
IsOffloadEntry = false;		IsOffloadEntry = false;
}		}
if (CGM.getLangOpts().OMPTargetTriples.empty())		if (CGM.getLangOpts().OMPTargetTriples.empty())
IsOffloadEntry = false;		IsOffloadEntry = false;

		if (CGM.getLangOpts().OpenMPOffloadMandatory && !IsOffloadEntry) {
		unsigned DiagID = CGM.getDiags().getCustomDiagID(
		DiagnosticsEngine::Error,
		"No offloading entry generated while offloading is mandatory.");
		CGM.getDiags().Report(DiagID);
		}

assert(CGF.CurFuncDecl && "No parent declaration for target region!");		assert(CGF.CurFuncDecl && "No parent declaration for target region!");
StringRef ParentName;		StringRef ParentName;
// In case we have Ctors/Dtors we use the complete type variant to produce		// In case we have Ctors/Dtors we use the complete type variant to produce
// the mangling of the device outlined kernel.		// the mangling of the device outlined kernel.
if (const auto *D = dyn_cast<CXXConstructorDecl>(CGF.CurFuncDecl))		if (const auto *D = dyn_cast<CXXConstructorDecl>(CGF.CurFuncDecl))
ParentName = CGM.getMangledName(GlobalDecl(D, Ctor_Complete));		ParentName = CGM.getMangledName(GlobalDecl(D, Ctor_Complete));
else if (const auto *D = dyn_cast<CXXDestructorDecl>(CGF.CurFuncDecl))		else if (const auto *D = dyn_cast<CXXDestructorDecl>(CGF.CurFuncDecl))
ParentName = CGM.getMangledName(GlobalDecl(D, Dtor_Complete));		ParentName = CGM.getMangledName(GlobalDecl(D, Dtor_Complete));
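The check added in this hunk can be read as a small decision rule: a target region stops being an offload entry when its `if` clause folds to false or when no offload target triples are configured, and under the mandatory flag that situation becomes an error. A minimal standalone sketch of that rule follows; `TargetRegionInfo` and `checkMandatoryOffload` are hypothetical names, and starting `IsOffloadEntry` at `true` is an assumption, since the initialisation is outside the visible lines:

```cpp
#include <string>

// Hypothetical summary of the inputs the hunk inspects.
struct TargetRegionInfo {
  bool IfClauseFoldsToFalse; // 'if' clause is a constant false
  bool HasTargetTriples;     // at least one -fopenmp-targets triple is set
};

// Returns the diagnostic text, or an empty string if nothing is reported.
std::string checkMandatoryOffload(const TargetRegionInfo &R,
                                  bool OffloadMandatory) {
  bool IsOffloadEntry = true; // assumed initial value
  if (R.IfClauseFoldsToFalse)
    IsOffloadEntry = false;
  if (!R.HasTargetTriples)
    IsOffloadEntry = false;

  if (OffloadMandatory && !IsOffloadEntry)
    return "No offloading entry generated while offloading is mandatory.";
  return "";
}
```

For example, `checkMandatoryOffload({true, true}, true)` yields the diagnostic text, while the same region without the mandatory flag yields an empty string.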

clang/lib/Driver/ToolChains/Clang.cpp


case Driver::OMPRT_IOMP5:
/*Default=*/false))		/*Default=*/false))
CmdArgs.push_back("-fopenmp-assume-teams-oversubscription");		CmdArgs.push_back("-fopenmp-assume-teams-oversubscription");
if (Args.hasFlag(options::OPT_fopenmp_assume_threads_oversubscription,		if (Args.hasFlag(options::OPT_fopenmp_assume_threads_oversubscription,
options::OPT_fno_openmp_assume_threads_oversubscription,		options::OPT_fno_openmp_assume_threads_oversubscription,
/*Default=*/false))		/*Default=*/false))
CmdArgs.push_back("-fopenmp-assume-threads-oversubscription");		CmdArgs.push_back("-fopenmp-assume-threads-oversubscription");
if (Args.hasArg(options::OPT_fopenmp_assume_no_thread_state))		if (Args.hasArg(options::OPT_fopenmp_assume_no_thread_state))
CmdArgs.push_back("-fopenmp-assume-no-thread-state");		CmdArgs.push_back("-fopenmp-assume-no-thread-state");
		if (Args.hasArg(options::OPT_fopenmp_offload_mandatory))
		CmdArgs.push_back("-fopenmp-offload-mandatory");
break;		break;
default:		default:
// By default, if Clang doesn't know how to generate useful OpenMP code		// By default, if Clang doesn't know how to generate useful OpenMP code
// for a specific runtime library, we just don't pass the '-fopenmp' flag		// for a specific runtime library, we just don't pass the '-fopenmp' flag
// down to the actual compilation.		// down to the actual compilation.
// FIXME: It would be better to have a mode which only omits IR		// FIXME: It would be better to have a mode which only omits IR
// generation based on the OpenMP support so that we get consistent		// generation based on the OpenMP support so that we get consistent
// semantic analysis, etc.		// semantic analysis, etc.

clang/lib/Sema/SemaOpenMP.cpp


if (LangOpts.OpenMPIsDevice && DevTy &&
StringRef HostDevTy =		StringRef HostDevTy =
getOpenMPSimpleClauseTypeName(OMPC_device_type, OMPC_DEVICE_TYPE_host);		getOpenMPSimpleClauseTypeName(OMPC_device_type, OMPC_DEVICE_TYPE_host);
Diag(Loc, diag::err_omp_wrong_device_function_call) << HostDevTy << 0;		Diag(Loc, diag::err_omp_wrong_device_function_call) << HostDevTy << 0;
Diag(*OMPDeclareTargetDeclAttr::getLocation(FD),		Diag(*OMPDeclareTargetDeclAttr::getLocation(FD),
diag::note_omp_marked_device_type_here)		diag::note_omp_marked_device_type_here)
<< HostDevTy;		<< HostDevTy;
return;		return;
}		}
if (!LangOpts.OpenMPIsDevice && DevTy &&		if (!LangOpts.OpenMPIsDevice && !LangOpts.OpenMPOffloadMandatory && DevTy &&
*DevTy == OMPDeclareTargetDeclAttr::DT_NoHost) {		*DevTy == OMPDeclareTargetDeclAttr::DT_NoHost) {
// Diagnose nohost function called during host codegen.		// Diagnose nohost function called during host codegen.
StringRef NoHostDevTy = getOpenMPSimpleClauseTypeName(		StringRef NoHostDevTy = getOpenMPSimpleClauseTypeName(
OMPC_device_type, OMPC_DEVICE_TYPE_nohost);		OMPC_device_type, OMPC_DEVICE_TYPE_nohost);
Diag(Loc, diag::err_omp_wrong_device_function_call) << NoHostDevTy << 1;		Diag(Loc, diag::err_omp_wrong_device_function_call) << NoHostDevTy << 1;
Diag(*OMPDeclareTargetDeclAttr::getLocation(FD),		Diag(*OMPDeclareTargetDeclAttr::getLocation(FD),
diag::note_omp_marked_device_type_here)		diag::note_omp_marked_device_type_here)
<< NoHostDevTy;		<< NoHostDevTy;

clang/test/OpenMP/target_offload_mandatory_codegen.cpp

This file was added.

				// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature --include-generated-funcs --replace-value-regex "__omp_offloading_[0-9a-z]+_[0-9a-z]+"
				// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -fopenmp-offload-mandatory -emit-llvm %s -o - | FileCheck %s --check-prefix=MANDATORY
				// expected-no-diagnostics

				void foo() {}
				#pragma omp declare target(foo)

				void bar() {}
				#pragma omp declare target device_type(nohost) to(bar)

				void host() {
				#pragma omp target
				{ bar(); }
				}

				void host_if(bool cond) {
				#pragma omp target if(cond)
				{ bar(); }
				}

				void host_dev(int device) {
				#pragma omp target device(device)
				{ bar(); }
				}
				// MANDATORY-LABEL: define {{[^@]+}}@_Z3foov
				// MANDATORY-SAME: () #[[ATTR0:[0-9]+]] {
				// MANDATORY-NEXT: entry:
				// MANDATORY-NEXT: ret void
				//
				//
				// MANDATORY-LABEL: define {{[^@]+}}@_Z4hostv
				// MANDATORY-SAME: () #[[ATTR0]] {
				// MANDATORY-NEXT: entry:
				// MANDATORY-NEXT: [[TMP0:%.*]] = call i32 @__tgt_target_mapper(%struct.ident_t* @[[GLOB1:[0-9]+]], i64 -1, i8* @.{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z4hostv_l12.region_id, i32 0, i8** null, i8** null, i64* null, i64* null, i8** null, i8** null)
				// MANDATORY-NEXT: [[TMP1:%.*]] = icmp ne i32 [[TMP0]], 0
				// MANDATORY-NEXT: br i1 [[TMP1]], label [[OMP_OFFLOAD_FAILED:%.*]], label [[OMP_OFFLOAD_CONT:%.*]]
				// MANDATORY: omp_offload.failed:
				// MANDATORY-NEXT: unreachable
				// MANDATORY: omp_offload.cont:
				// MANDATORY-NEXT: ret void
				//
				//
				// MANDATORY-LABEL: define {{[^@]+}}@_Z7host_ifb
				// MANDATORY-SAME: (i1 noundef zeroext [[COND:%.*]]) #[[ATTR0]] {
				// MANDATORY-NEXT: entry:
				// MANDATORY-NEXT: [[COND_ADDR:%.*]] = alloca i8, align 1
				// MANDATORY-NEXT: [[FROMBOOL:%.*]] = zext i1 [[COND]] to i8
				// MANDATORY-NEXT: store i8 [[FROMBOOL]], i8* [[COND_ADDR]], align 1
				// MANDATORY-NEXT: [[TMP0:%.*]] = load i8, i8* [[COND_ADDR]], align 1
				// MANDATORY-NEXT: [[TOBOOL:%.*]] = trunc i8 [[TMP0]] to i1
				// MANDATORY-NEXT: br i1 [[TOBOOL]], label [[OMP_IF_THEN:%.*]], label [[OMP_IF_ELSE:%.*]]
				// MANDATORY: omp_if.then:
				// MANDATORY-NEXT: [[TMP1:%.*]] = call i32 @__tgt_target_mapper(%struct.ident_t* @[[GLOB1]], i64 -1, i8* @.{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z7host_ifb_l17.region_id, i32 0, i8** null, i8** null, i64* null, i64* null, i8** null, i8** null)
				// MANDATORY-NEXT: [[TMP2:%.*]] = icmp ne i32 [[TMP1]], 0
				// MANDATORY-NEXT: br i1 [[TMP2]], label [[OMP_OFFLOAD_FAILED:%.*]], label [[OMP_OFFLOAD_CONT:%.*]]
				// MANDATORY: omp_offload.failed:
				// MANDATORY-NEXT: unreachable
				// MANDATORY: omp_offload.cont:
				// MANDATORY-NEXT: br label [[OMP_IF_END:%.*]]
				// MANDATORY: omp_if.else:
				// MANDATORY-NEXT: unreachable
				// MANDATORY: omp_if.end:
				// MANDATORY-NEXT: ret void
				//
				//
				// MANDATORY-LABEL: define {{[^@]+}}@_Z8host_devi
				// MANDATORY-SAME: (i32 noundef signext [[DEVICE:%.*]]) #[[ATTR0]] {
				// MANDATORY-NEXT: entry:
				// MANDATORY-NEXT: [[DEVICE_ADDR:%.*]] = alloca i32, align 4
				// MANDATORY-NEXT: [[DOTCAPTURE_EXPR_:%.*]] = alloca i32, align 4
				// MANDATORY-NEXT: store i32 [[DEVICE]], i32* [[DEVICE_ADDR]], align 4
				// MANDATORY-NEXT: [[TMP0:%.*]] = load i32, i32* [[DEVICE_ADDR]], align 4
				// MANDATORY-NEXT: store i32 [[TMP0]], i32* [[DOTCAPTURE_EXPR_]], align 4
				// MANDATORY-NEXT: [[TMP1:%.*]] = load i32, i32* [[DOTCAPTURE_EXPR_]], align 4
				// MANDATORY-NEXT: [[TMP2:%.*]] = sext i32 [[TMP1]] to i64
				// MANDATORY-NEXT: [[TMP3:%.*]] = call i32 @__tgt_target_mapper(%struct.ident_t* @[[GLOB1]], i64 [[TMP2]], i8* @.{{__omp_offloading_[0-9a-z]+_[0-9a-z]+}}__Z8host_devi_l22.region_id, i32 0, i8** null, i8** null, i64* null, i64* null, i8** null, i8** null)
				// MANDATORY-NEXT: [[TMP4:%.*]] = icmp ne i32 [[TMP3]], 0
				// MANDATORY-NEXT: br i1 [[TMP4]], label [[OMP_OFFLOAD_FAILED:%.*]], label [[OMP_OFFLOAD_CONT:%.*]]
				// MANDATORY: omp_offload.failed:
				// MANDATORY-NEXT: unreachable
				// MANDATORY: omp_offload.cont:
				// MANDATORY-NEXT: ret void
				//
				//
				// MANDATORY-LABEL: define {{[^@]+}}@.omp_offloading.requires_reg
				// MANDATORY-SAME: () #[[ATTR3:[0-9]+]] {
				// MANDATORY-NEXT: entry:
				// MANDATORY-NEXT: call void @__tgt_register_requires(i64 1)
				// MANDATORY-NEXT: ret void
				//