This is an archive of the discontinued LLVM Phabricator instance.

[OPENMP] Codegen for teams directive for NVPTX
ClosedPublic

Authored by carlo.bertolli on Mar 8 2016, 10:15 AM.

Details

Reviewers

kkwli0
ABataev
• fraggamuffin

Summary

This patch implements the teams directive for the NVPTX backend. It differs from the host code generation path in two ways:

It does not call kmpc_fork_teams. All necessary teams and threads are started when the target region is entered, that is, when the CUDA kernel is launched, and their execution is coordinated through sequential and parallel regions within the target region.

It does not call kmpc_push_num_teams even if a num_teams or thread_limit clause is present. Setting the number of teams and the thread limit is handled by the NVPTX-related runtime.
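For readers less familiar with the construct, here is a minimal sketch of the source-level pattern this patch targets, modeled on the regression test added in this revision. The function names `zero_in_teams` and `demo` and the explicit `map` clause are illustrative assumptions, not part of the patch. When built without OpenMP support the pragmas are ignored and the body simply runs once on the host.

```cpp
#include <cassert>

// Hypothetical helper: every team writes 0 to a scalar mapped into the
// target region, mirroring the body used in nvptx_teams_codegen.cpp.
int zero_in_teams(int argc) {
  int a = 20; // requested number of teams (illustrative value)
  int b = 5;  // requested thread limit (illustrative value)
#pragma omp target map(tofrom : argc)
#pragma omp teams num_teams(a) thread_limit(b)
  {
    argc = 0;
  }
  return argc;
}

// Usage sketch: the region leaves 0 in the mapped scalar.
void demo() { assert(zero_in_teams(3) == 0); }
```

On the host path a region like this would be wrapped in runtime calls; on NVPTX, as described above, the teams already exist when the kernel starts, so the body can be emitted without them.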

Please note that I am now passing a Clang Expr * to emitPushNumTeams instead of the originally chosen llvm::Value * type. The reason for that is that I want to avoid emitting expressions for num_teams and thread_limit if they are not needed in the target region.
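The motivation for passing an unevaluated expression can be illustrated outside of Clang. The sketch below is an analogy only: `ClauseExpr`, `HostRuntime`, `DeviceRuntime`, and `countEvaluations` are hypothetical names, and a `std::function` stands in for a Clang `Expr *`. Evaluating the callable models emitting code for the clause expression.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for a clause expression: calling it models
// "emitting" the expression and yields its value.
using ClauseExpr = std::function<int()>;

struct HostRuntime {
  virtual ~HostRuntime() = default;
  int pushedNumTeams = -1;
  int pushedThreadLimit = -1;

  // Host-like path: evaluate each expression that is present, use 0 for a
  // missing one, then record the values as if passing them to the runtime.
  virtual void emitNumTeamsClause(const ClauseExpr *numTeams,
                                  const ClauseExpr *threadLimit) {
    pushedNumTeams = numTeams ? (*numTeams)() : 0;
    pushedThreadLimit = threadLimit ? (*threadLimit)() : 0;
  }
};

struct DeviceRuntime : HostRuntime {
  // Device-like path: the expressions are never evaluated.
  void emitNumTeamsClause(const ClauseExpr *, const ClauseExpr *) override {}
};

// Returns how many times the num_teams expression was evaluated.
int countEvaluations(HostRuntime &rt) {
  int evaluations = 0;
  ClauseExpr numTeams = [&evaluations] {
    ++evaluations;
    return 20;
  };
  rt.emitNumTeamsClause(&numTeams, nullptr);
  return evaluations;
}
```

Had the caller evaluated the expression first and passed only the result, the device-like override would have had no way to skip the evaluation.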

Diff Detail

Repository: rL LLVM

Event Timeline

carlo.bertolli updated this revision to Diff 50052. Mar 8 2016, 10:15 AM

carlo.bertolli retitled this revision from to [OPENMP] Codegen for teams directive for NVPTX.

carlo.bertolli updated this object.

carlo.bertolli added reviewers: ABataev, • fraggamuffin, kkwli0.

carlo.bertolli set the repository for this revision to rL LLVM.

carlo.bertolli added subscribers: sfantao, arpith-jacob, caomhin, cfe-commits.

Herald added a subscriber: jholewinski. Mar 8 2016, 10:15 AM

carlo.bertolli added a child revision: D17979: [OPENMP] Add regression test for codegen of distribute pragma for NVPTX. Mar 8 2016, 8:09 PM

ABataev added inline comments. Mar 8 2016, 11:47 PM

lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp
375–376	This will cause a crash for captured global variables. Emit as inlined directive

Addressed comment in new version of diff.

[OPENMP] This new version of the patch uses the inlining machinery of CGOpenMPRuntime.cpp instead of just dumping the teams body statement using EmitStmt.

Add tests with captured globals to check that this problem is resolved

Support for global variables in a target region requires the global to be placed in a pragma declare target region. Pragma declare target region is not currently available in Clang (no parsing, sema, or codegen). I will add a check for this in the regression test once the support becomes available. Thanks!

I can actually add a regression test that checks that the compiler does not break when using a global in a teams region, even without declare target, if that is what is required here.
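For context, the OpenMP-level pattern discussed in this thread (a global placed in a declare target region and then used inside a teams region) might look like the sketch below. This illustrates the specification-level usage only; as noted above, Clang did not support declare target at the time of this review. The names `scale`, `scaled`, and `demo` are hypothetical. Without OpenMP support the pragmas are ignored and the code runs on the host.

```cpp
#include <cassert>

// Hypothetical global made available on the device.
#pragma omp declare target
int scale = 3;
#pragma omp end declare target

// Reads the global inside a teams region and returns the product through
// a mapped local. A single team is requested to keep the write race-free.
int scaled(int x) {
  int result = 0;
#pragma omp target map(tofrom : result) firstprivate(x)
#pragma omp teams num_teams(1)
  {
    result = x * scale;
  }
  return result;
}

// Usage sketch.
void demo() { assert(scaled(4) == 12); }
```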

lib/CodeGen/CGOpenMPRuntimeNVPTX.h
145–172	Remove 'virtual' and add 'override' at the end of each function
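A small standalone illustration of the requested style follows; `Base`, `Derived`, and `callThroughBase` are hypothetical names, not classes from the patch. Marking the derived function `override` makes the compiler check that it really overrides a virtual function in the base class, and repeating `virtual` there is redundant.

```cpp
#include <cassert>

struct Base {
  virtual ~Base() = default;
  virtual int emit() const { return 0; }
};

struct Derived : Base {
  // No 'virtual' here: 'override' already implies it and additionally
  // fails to compile if the signature stops matching Base::emit.
  int emit() const override { return 1; }
};

// Dispatches through the base interface.
int callThroughBase(const Base &b) { return b.emit(); }

// Usage sketch.
void demo() {
  Derived d;
  assert(callThroughBase(d) == 1);
}
```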

This revision is now accepted and ready to land. Mar 10 2016, 8:19 PM

sfantao added a parent revision: D18110: [OpenMP] Fix SEMA bug in the capture of global variables in template functions. Mar 11 2016, 4:35 PM

mkuron added a subscriber: mkuron. Mar 20 2016, 9:15 AM

carlo.bertolli removed a parent revision: D18110: [OpenMP] Fix SEMA bug in the capture of global variables in template functions. Mar 23 2016, 7:54 AM

[OPENMP] Even though this patch was already accepted in its previous form, comments on depending patch D18286 (http://reviews.llvm.org/D18286) revealed that a new approach for this patch was necessary. Instead of committing something that I have to change anyway later on, I decided to provide a new version of this base patch. Please review it again and let me know about any comments you may have.

Committed revision 265304.

Revision Contents

Path                                    Size
lib/CodeGen/CGOpenMPRuntime.h           9 lines
lib/CodeGen/CGOpenMPRuntime.cpp         20 lines
lib/CodeGen/CGOpenMPRuntimeNVPTX.h      35 lines
lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp    44 lines
lib/CodeGen/CGStmtOpenMP.cpp            17 lines
test/OpenMP/nvptx_teams_codegen.cpp     136 lines

Diff 52416

lib/CodeGen/CGOpenMPRuntime.h

...
public:
virtual void emitTeamsCall(CodeGenFunction &CGF,		virtual void emitTeamsCall(CodeGenFunction &CGF,
const OMPExecutableDirective &D,		const OMPExecutableDirective &D,
SourceLocation Loc, llvm::Value *OutlinedFn,		SourceLocation Loc, llvm::Value *OutlinedFn,
ArrayRef<llvm::Value *> CapturedVars);		ArrayRef<llvm::Value *> CapturedVars);

/// \brief Emits call to void __kmpc_push_num_teams(ident_t *loc, kmp_int32		/// \brief Emits call to void __kmpc_push_num_teams(ident_t *loc, kmp_int32
/// global_tid, kmp_int32 num_teams, kmp_int32 thread_limit) to generate code		/// global_tid, kmp_int32 num_teams, kmp_int32 thread_limit) to generate code
/// for num_teams clause.		/// for num_teams clause.
/// \param NumTeams An integer value of teams.		/// \param NumTeams An integer expression of teams.
/// \param ThreadLimit An integer value of threads.		/// \param ThreadLimit An integer expression of threads.
virtual void emitNumTeamsClause(CodeGenFunction &CGF, llvm::Value *NumTeams,		virtual void emitNumTeamsClause(CodeGenFunction &CGF, const Expr *NumTeams,
llvm::Value *ThreadLimit, SourceLocation Loc);		const Expr *ThreadLimit, SourceLocation Loc);

};		};

} // namespace CodeGen		} // namespace CodeGen
} // namespace clang		} // namespace clang

#endif		#endif

lib/CodeGen/CGOpenMPRuntime.cpp

...
void CGOpenMPRuntime::emitTeamsCall(CodeGenFunction &CGF,
RealArgs.append(std::begin(Args), std::end(Args));		RealArgs.append(std::begin(Args), std::end(Args));
RealArgs.append(CapturedVars.begin(), CapturedVars.end());		RealArgs.append(CapturedVars.begin(), CapturedVars.end());

auto RTLFn = createRuntimeFunction(OMPRTL__kmpc_fork_teams);		auto RTLFn = createRuntimeFunction(OMPRTL__kmpc_fork_teams);
CGF.EmitRuntimeCall(RTLFn, RealArgs);		CGF.EmitRuntimeCall(RTLFn, RealArgs);
}		}

void CGOpenMPRuntime::emitNumTeamsClause(CodeGenFunction &CGF,		void CGOpenMPRuntime::emitNumTeamsClause(CodeGenFunction &CGF,
llvm::Value *NumTeams,		const Expr *NumTeams,
llvm::Value *ThreadLimit,		const Expr *ThreadLimit,
SourceLocation Loc) {		SourceLocation Loc) {
if (!CGF.HaveInsertPoint())		if (!CGF.HaveInsertPoint())
return;		return;

auto *RTLoc = emitUpdateLocation(CGF, Loc);		auto *RTLoc = emitUpdateLocation(CGF, Loc);

		llvm::Value *NumTeamsVal =
		(NumTeams)
		? CGF.Builder.CreateIntCast(CGF.EmitScalarExpr(NumTeams),
		CGF.CGM.Int32Ty, /* isSigned = */ true)
		: CGF.Builder.getInt32(0);

		llvm::Value *ThreadLimitVal =
		(ThreadLimit)
		? CGF.Builder.CreateIntCast(CGF.EmitScalarExpr(ThreadLimit),
		CGF.CGM.Int32Ty, /* isSigned = */ true)
		: CGF.Builder.getInt32(0);

// Build call __kmpc_push_num_teamss(&loc, global_tid, num_teams, thread_limit)		// Build call __kmpc_push_num_teamss(&loc, global_tid, num_teams, thread_limit)
llvm::Value *PushNumTeamsArgs[] = {		llvm::Value *PushNumTeamsArgs[] = {RTLoc, getThreadID(CGF, Loc), NumTeamsVal,
RTLoc, getThreadID(CGF, Loc), NumTeams, ThreadLimit};		ThreadLimitVal};
CGF.EmitRuntimeCall(createRuntimeFunction(OMPRTL__kmpc_push_num_teams),		CGF.EmitRuntimeCall(createRuntimeFunction(OMPRTL__kmpc_push_num_teams),
PushNumTeamsArgs);		PushNumTeamsArgs);
}		}

lib/CodeGen/CGOpenMPRuntimeNVPTX.h

...
void emitTargetOutlinedFunction(const OMPExecutableDirective &D,
StringRef ParentName,		StringRef ParentName,
llvm::Function *&OutlinedFn,		llvm::Function *&OutlinedFn,
llvm::Constant *&OutlinedFnID,		llvm::Constant *&OutlinedFnID,
bool IsOffloadEntry,		bool IsOffloadEntry,
const RegionCodeGenTy &CodeGen) override;		const RegionCodeGenTy &CodeGen) override;

public:		public:
explicit CGOpenMPRuntimeNVPTX(CodeGenModule &CGM);		explicit CGOpenMPRuntimeNVPTX(CodeGenModule &CGM);

		/// \brief This function ought to emit, in the general case, a call to
		// the openmp runtime kmpc_push_num_teams. In NVPTX backend it is not needed
		// as these numbers are obtained through the PTX grid and block configuration.
		/// \param NumTeams An integer expression of teams.
		/// \param ThreadLimit An integer expression of threads.
		void emitNumTeamsClause(CodeGenFunction &CGF, const Expr *NumTeams,
		const Expr *ThreadLimit, SourceLocation Loc) override;

		/// \brief Emits inlined function for the specified OpenMP parallel
		// directive but an inlined function for teams.
		/// \a D. This outlined function has type void()(kmp_int32 ThreadID,
		/// kmp_int32 BoundID, struct context_vars*).
		/// \param D OpenMP directive.
		/// \param ThreadIDVar Variable for thread id in the current OpenMP region.
		/// \param InnermostKind Kind of innermost directive (for simple directives it
		/// is a directive itself, for combined - its innermost directive).
		/// \param CodeGen Code generation sequence for the \a D directive.
		llvm::Value *
		emitParallelOrTeamsOutlinedFunction(const OMPExecutableDirective &D,
		const VarDecl *ThreadIDVar,
		OpenMPDirectiveKind InnermostKind,
		const RegionCodeGenTy &CodeGen) override;

		/// \brief Emits code for teams call of the \a OutlinedFn with
		/// variables captured in a record which address is stored in \a
		/// CapturedStruct.
		/// \param OutlinedFn Outlined function to be run by team masters. Type of
		/// this function is void()(kmp_int32 , kmp_int32, struct context_vars*).
		/// \param CapturedVars A pointer to the record with the references to
		/// variables used in \a OutlinedFn function.
		///
		void emitTeamsCall(CodeGenFunction &CGF, const OMPExecutableDirective &D,
		SourceLocation Loc, llvm::Value *OutlinedFn,
		ArrayRef<llvm::Value *> CapturedVars) override;
};		};

} // CodeGen namespace.		} // CodeGen namespace.
} // clang namespace.		} // clang namespace.

#endif // LLVM_CLANG_LIB_CODEGEN_CGOPENMPRUNTIMENVPTX_H		#endif // LLVM_CLANG_LIB_CODEGEN_CGOPENMPRUNTIMENVPTX_H

lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp

	//===---- CGOpenMPRuntimeNVPTX.cpp - Interface to OpenMP NVPTX Runtimes ---===//			//===---- CGOpenMPRuntimeNVPTX.cpp - Interface to OpenMP NVPTX Runtimes ---===//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// This provides a class for OpenMP runtime code generation specialized to NVPTX			// This provides a class for OpenMP runtime code generation specialized to NVPTX
	// targets.			// targets.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "CGOpenMPRuntimeNVPTX.h"			#include "CGOpenMPRuntimeNVPTX.h"
	#include "clang/AST/DeclOpenMP.h"			#include "clang/AST/DeclOpenMP.h"
				#include "CodeGenFunction.h"
				#include "clang/AST/StmtOpenMP.h"

	using namespace clang;			using namespace clang;
	using namespace CodeGen;			using namespace CodeGen;

	/// \brief Get the GPU warp size.			/// \brief Get the GPU warp size.
	llvm::Value *CGOpenMPRuntimeNVPTX::getNVPTXWarpSize(CodeGenFunction &CGF) {			llvm::Value *CGOpenMPRuntimeNVPTX::getNVPTXWarpSize(CodeGenFunction &CGF) {
	CGBuilderTy &Bld = CGF.Builder;			CGBuilderTy &Bld = CGF.Builder;
	return Bld.CreateCall(			return Bld.CreateCall(
	...
	CGOpenMPRuntimeNVPTX::CGOpenMPRuntimeNVPTX(CodeGenModule &CGM)			CGOpenMPRuntimeNVPTX::CGOpenMPRuntimeNVPTX(CodeGenModule &CGM)
	: CGOpenMPRuntime(CGM), ActiveWorkers(nullptr), WorkID(nullptr) {			: CGOpenMPRuntime(CGM), ActiveWorkers(nullptr), WorkID(nullptr) {
	if (!CGM.getLangOpts().OpenMPIsDevice)			if (!CGM.getLangOpts().OpenMPIsDevice)
	llvm_unreachable("OpenMP NVPTX can only handle device code.");			llvm_unreachable("OpenMP NVPTX can only handle device code.");

	// Called once per module during initialization.			// Called once per module during initialization.
	initializeEnvironment();			initializeEnvironment();
	}			}

				void CGOpenMPRuntimeNVPTX::emitNumTeamsClause(CodeGenFunction &CGF,
				const Expr *NumTeams,
				const Expr *ThreadLimit,
				SourceLocation Loc) {}

				llvm::Value *CGOpenMPRuntimeNVPTX::emitParallelOrTeamsOutlinedFunction(
				const OMPExecutableDirective &D, const VarDecl *ThreadIDVar,
				OpenMPDirectiveKind InnermostKind, const RegionCodeGenTy &CodeGen) {

				llvm::Function *OutlinedFun = nullptr;
				if (isa<OMPTeamsDirective>(D)) {
				llvm::Value *OutlinedFunVal =
				CGOpenMPRuntime::emitParallelOrTeamsOutlinedFunction(
				D, ThreadIDVar, InnermostKind, CodeGen);
				OutlinedFun = cast<llvm::Function>(OutlinedFunVal);
				OutlinedFun->addFnAttr(llvm::Attribute::AlwaysInline);
				} else
				llvm_unreachable("parallel directive is not yet supported for nvptx "
				"backend.");

				return OutlinedFun;
				}

				void CGOpenMPRuntimeNVPTX::emitTeamsCall(CodeGenFunction &CGF,
				const OMPExecutableDirective &D,
				SourceLocation Loc,
				llvm::Value *OutlinedFn,
				ArrayRef<llvm::Value *> CapturedVars) {
				if (!CGF.HaveInsertPoint())
				return;

				Address ZeroAddr =
				CGF.CreateTempAlloca(CGF.Int32Ty, CharUnits::fromQuantity(4),
				/*Name*/ ".zero.addr");
				CGF.InitTempAlloca(ZeroAddr, CGF.Builder.getInt32(/*C*/ 0));
				llvm::SmallVector<llvm::Value *, 16> OutlinedFnArgs;
				OutlinedFnArgs.push_back(ZeroAddr.getPointer());
				OutlinedFnArgs.push_back(ZeroAddr.getPointer());
				OutlinedFnArgs.append(CapturedVars.begin(), CapturedVars.end());
				CGF.EmitCallOrInvoke(OutlinedFn, OutlinedFnArgs);
				}
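
The shape of the call emitted by the NVPTX emitTeamsCall above can be sketched in plain C++. This is an analogy under stated assumptions, not the generated code: `outlinedTeamsBody` and `callTeamsDirectly` are hypothetical names, and an ordinary local variable plays the role of the `.zero.addr` temporary whose address is passed for both thread-ID parameters.

```cpp
#include <cassert>

// Hypothetical outlined teams body with the usual parameter shape:
// pointer to global thread ID, pointer to bound thread ID, then captures.
inline void outlinedTeamsBody(const int *globalTid, const int *boundTid,
                              int *argc) {
  (void)globalTid; // unused in this sketch
  (void)boundTid;  // unused in this sketch
  *argc = 0;
}

// Calls the outlined body directly instead of going through a fork call,
// passing the address of a zero-initialized temporary for both IDs.
int callTeamsDirectly(int argc) {
  int zero = 0; // stands in for ".zero.addr"
  outlinedTeamsBody(&zero, &zero, &argc);
  return argc;
}

// Usage sketch.
void demo() { assert(callTeamsDirectly(7) == 0); }
```

Because the call is direct and the outlined function is marked always-inline in the patch, the optimizer is expected to fold the body into the target region.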

lib/CodeGen/CGStmtOpenMP.cpp

...
static void emitCommonOMPTeamsDirective(CodeGenFunction &CGF,
auto OutlinedFn = CGF.CGM.getOpenMPRuntime().		auto OutlinedFn = CGF.CGM.getOpenMPRuntime().
emitParallelOrTeamsOutlinedFunction(S,		emitParallelOrTeamsOutlinedFunction(S,
*CS->getCapturedDecl()->param_begin(), InnermostKind, CodeGen);		*CS->getCapturedDecl()->param_begin(), InnermostKind, CodeGen);

const OMPTeamsDirective &TD = *dyn_cast<OMPTeamsDirective>(&S);		const OMPTeamsDirective &TD = *dyn_cast<OMPTeamsDirective>(&S);
const OMPNumTeamsClause *NT = TD.getSingleClause<OMPNumTeamsClause>();		const OMPNumTeamsClause *NT = TD.getSingleClause<OMPNumTeamsClause>();
const OMPThreadLimitClause *TL = TD.getSingleClause<OMPThreadLimitClause>();		const OMPThreadLimitClause *TL = TD.getSingleClause<OMPThreadLimitClause>();
if (NT || TL) {		if (NT || TL) {
llvm::Value *NumTeamsVal = (NT) ? CGF.Builder.CreateIntCast(		Expr *NumTeams = (NT) ? NT->getNumTeams() : nullptr;
CGF.EmitScalarExpr(NT->getNumTeams()), CGF.CGM.Int32Ty,		Expr *ThreadLimit = (TL) ? TL->getThreadLimit() : nullptr;
/* isSigned = */ true) :
CGF.Builder.getInt32(0);

llvm::Value *ThreadLimitVal = (TL) ? CGF.Builder.CreateIntCast(
CGF.EmitScalarExpr(TL->getThreadLimit()), CGF.CGM.Int32Ty,
/* isSigned = */ true) :
CGF.Builder.getInt32(0);

CGF.CGM.getOpenMPRuntime().emitNumTeamsClause(CGF, NumTeamsVal,		CGF.CGM.getOpenMPRuntime().emitNumTeamsClause(CGF, NumTeams, ThreadLimit,
ThreadLimitVal, S.getLocStart());		S.getLocStart());
}		}

OMPLexicalScope Scope(CGF, S);		OMPLexicalScope Scope(CGF, S);
llvm::SmallVector<llvm::Value *, 16> CapturedVars;		llvm::SmallVector<llvm::Value *, 16> CapturedVars;
CGF.GenerateOpenMPCapturedVars(*CS, CapturedVars);		CGF.GenerateOpenMPCapturedVars(*CS, CapturedVars);
CGF.CGM.getOpenMPRuntime().emitTeamsCall(CGF, S, S.getLocStart(), OutlinedFn,		CGF.CGM.getOpenMPRuntime().emitTeamsCall(CGF, S, S.getLocStart(), OutlinedFn,
CapturedVars);		CapturedVars);
}		}

test/OpenMP/nvptx_teams_codegen.cpp

This file was added.

				// Test target codegen - host bc file has to be created first.
				// RUN: %clang_cc1 -DCK1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fomptargets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc
				// RUN: %clang_cc1 -DCK1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -fomptargets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fomp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s --check-prefix CK1 --check-prefix CK1-64
				// RUN: %clang_cc1 -DCK1 -verify -fopenmp -x c++ -triple i386-unknown-unknown -fomptargets=nvptx-nvidia-cuda -emit-llvm-bc %s -o %t-x86-host.bc
				// RUN: %clang_cc1 -DCK1 -verify -fopenmp -x c++ -triple nvptx-unknown-unknown -fomptargets=nvptx-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fomp-host-ir-file-path %t-x86-host.bc -o - | FileCheck %s --check-prefix CK1 --check-prefix CK1-32
				// expected-no-diagnostics
				#ifndef HEADER
				#define HEADER

				#ifdef CK1

				template <typename T>
				int tmain(T argc) {
				#pragma omp target
				#pragma omp teams
				argc = 0;
				return 0;
				}


				int main (int argc, char **argv) {
				#pragma omp target
				#pragma omp teams
				{
				argc = 0;
				}
				return tmain(argv);
				}

				// only nvptx side: do not outline teams region and do not call fork_teams
				// CK1: define {{.*}}void @{{[^,]+}}(i{{[0-9]+}} [[ARGC:%.+]])
				// CK1: {{.+}} = alloca i{{[0-9]+}}*,
				// CK1: {{.+}} = alloca i{{[0-9]+}}*,
				// CK1: [[ARGCADDR_PTR:%.+]] = alloca i{{[0-9]+}}*,
				// CK1: [[ARGCADDR:%.+]] = alloca i{{[0-9]+}},
				// CK1: store {{.+}} 0, {{.+}},
				// CK1: store i{{[0-9]+}} [[ARGC]], i{{[0-9]+}}* [[ARGCADDR]],
				// CK1-64: [[CONV:%.+]] = bitcast i{{[0-9]+}}* [[ARGCADDR]] to i{{[0-9]+}}*
				// CK1-64: store i{{[0-9]+}}* [[CONV]], i{{[0-9]+}}** [[ARGCADDR_PTR]],
				// CK1-32: store i{{[0-9]+}}* [[ARGCADDR]], i{{[0-9]+}}** [[ARGCADDR_PTR]],
				// CK1: [[ARGCADDR_PTR_REF:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[ARGCADDR_PTR]],
				// CK1: store i{{[0-9]+}} 0, i{{[0-9]+}}* [[ARGCADDR_PTR_REF]],
				// CK1-NOT: call {{.*}}void (%ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_teams(
				// CK1: ret void
				// CK1-NEXT: }

				// target region in template
				// CK1: define {{.}}void @{{[^,]+}}(i{{.+}}**{{.+}} [[ARGC:%.+]])
				// CK1: [[ARGCADDR_PTR:%.+]] = alloca i{{.+}}***,
				// CK1: [[ARGCADDR:%.+]] = alloca i{{.+}}***,
				// CK1: store i{{.+}}* [[ARGC]], i{{.+}}** [[ARGCADDR]]
				// CK1: [[ARGCADDR_REF:%.+]] = load i{{.+}}*, i{{.+}}** [[ARGCADDR]],
				// CK1: store i8* [[ARGCADDR_REF]], i8** [[ARGCADDR_PTR]],
				// CK1: [[ARGCADDR_PTR_REF:%.+]] = load i{{.+}}*, i{{.+}}** [[ARGCADDR_PTR]],
				// CK1: store i{{[0-9]+}} null, i{{[0-9]+}}* [[ARGCADDR_PTR_REF]],
				// CK1-NOT: call {{.*}}void (%ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_teams(
				// CK1: ret void
				// CK1-NEXT: }


				#endif // CK1

				// Test target codegen - host bc file has to be created first.
				// RUN: %clang_cc1 -DCK2 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fomptargets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc
				// RUN: %clang_cc1 -DCK2 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -fomptargets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fomp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s --check-prefix CK2 --check-prefix CK2-64
				// RUN: %clang_cc1 -DCK2 -verify -fopenmp -x c++ -triple i386-unknown-unknown -fomptargets=nvptx-nvidia-cuda -emit-llvm-bc %s -o %t-x86-host.bc
				// RUN: %clang_cc1 -DCK2 -verify -fopenmp -x c++ -triple nvptx-unknown-unknown -fomptargets=nvptx-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fomp-host-ir-file-path %t-x86-host.bc -o - | FileCheck %s --check-prefix CK2 --check-prefix CK2-32
				// expected-no-diagnostics
				#ifdef CK2

				template <typename T>
				int tmain(T argc) {
				int a = 10;
				int b = 5;
				#pragma omp target
				#pragma omp teams num_teams(a) thread_limit(b)
				{
				argc = 0;
				}
				return 0;
				}

				int main (int argc, char **argv) {
				int a = 20;
				int b = 5;
				#pragma omp target
				#pragma omp teams num_teams(a) thread_limit(b)
				{
				argc = 0;
				}
				return tmain(argv);
				}

				// CK2: define {{.*}}void @{{[^,]+}}(i{{[0-9]+}} [[A_IN:%.+]], i{{[0-9]+}} [[B_IN:%.+]], i{{[0-9]+}} [[ARGC_IN:.+]])
				// CK2: {{.}} = alloca i{{[0-9]+}}*,
				// CK2: {{.}} = alloca i{{[0-9]+}}*,
				// CK2: [[ARGCADDR_PTR:%.+]] = alloca i{{[0-9]+}}*,
				// CK2: [[AADDR:%.+]] = alloca i{{[0-9]+}},
				// CK2: [[BADDR:%.+]] = alloca i{{[0-9]+}},
				// CK2: [[ARGCADDR:%.+]] = alloca i{{[0-9]+}},
				// CK2-NOT: {{%.+}} = call i32 @__kmpc_global_thread_num(
				// CK2: store i{{[0-9]+}} [[A_IN]], i{{[0-9]+}}* [[AADDR]],
				// CK2: store i{{[0-9]+}} [[B_IN]], i{{[0-9]+}}* [[BADDR]],
				// CK2: store i{{[0-9]+}} [[ARGC_IN]], i{{[0-9]+}}* [[ARGCADDR]],
				// CK2-64: [[ACONV:%.+]] = bitcast i64* [[AADDR]] to i32*
				// CK2-64: [[BCONV:%.+]] = bitcast i64* [[BADDR]] to i32*
				// CK2-64: [[CONV:%.+]] = bitcast i64* [[ARGCADDR]] to i32*
				// CK2-64: store i{{[0-9]+}}* [[CONV]], i{{[0-9]+}}** [[ARGCADDR_PTR]],
				// CK2-32: store i{{[0-9]+}}* [[ARGCADDR]], i{{[0-9]+}}** [[ARGCADDR_PTR]],
				// CK2: [[ARGCADDR_PTR_REF:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[ARGCADDR_PTR]],
				// CK2: store i{{[0-9]+}} 0, i{{[0-9]+}}* [[ARGCADDR_PTR_REF]],
				// CK2-NOT: {{.+}} = call i32 @__kmpc_push_num_teams(
				// CK2-NOT: call void (%ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_teams(
				// CK2: ret

				// CK2: define {{.}}void @{{[^,]+}}(i{{[0-9]+}}{{.+}} [[A_IN:%.+]], i{{[0-9]+}}{{.+}} [[BP:%.+]], i{{[0-9]+}}**{{.+}} [[ARGC:%.+]])
				// CK2: [[ARGCADDR_PTR:%.+]] = alloca i{{[0-9]+}}***,
				// CK2: [[AADDR:%.+]] = alloca i{{[0-9]+}}*,
				// CK2: [[BADDR:%.+]] = alloca i{{[0-9]+}}*,
				// CK2: [[ARGCADDR:%.+]] = alloca i{{[0-9]+}}***,
				// CK2-NOT: {{%.+}} = call i32 @__kmpc_global_thread_num(
				// CK2: store i{{[0-9]+}}* [[A_IN]], i{{[0-9]+}}** [[AADDR]],
				// CK2: store i{{[0-9]+}}* [[B_IN]], i{{[0-9]+}}** [[BADDR]],
				// CK2: store i{{[0-9]+}}* [[ARGC]], i{{[0-9]+}}** [[ARGCADDR]],
				// CK2: [[A_ADDR_VAL:%.+]] = load i32, i32* [[AADDR]]
				// CK2: [[B_ADDR_VAL:%.+]] = load i32, i32* [[BADDR]]
				// CK2: [[ARGC_ADDR_VAL:%.+]] = load i{{[0-9]+}}*, i{{[0-9]+}}** [[ARGCADDR]]
				// CK2: store i{{[0-9]+}}* [[ARGC_ADDR_VAL]], i{{[0-9]+}}** [[ARGCADDR_PTR]],
				// CK2: [[ARGCADDR_PTR_REF:%.+]] = load i{{[0-9]+}}*, i{{[0-9]+}}** [[ARGCADDR_PTR]],
				// CK2: store i{{[0-9]+}} null, i{{[0-9]+}}* [[ARGCADDR_PTR_REF]],
				// CK2-NOT: {{.+}} = call i32 @__kmpc_push_num_teams(
				// CK2-NOT: call void (%ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_teams(
				// CK2: ret void

				#endif // CK2
				#endif