This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
lib/CodeGen/
-
CodeGen/
1
CGOpenMPRuntimeNVPTX.cpp
-
test/OpenMP/
-
OpenMP/
-
nvptx_teams_firstprivate_codegen.cpp
-
nvptx_teams_private_codegen.cpp

Differential D18286

[OPENMP] private and firstprivate clauses of teams code generation for nvptx
AcceptedPublic

Authored by carlo.bertolli on Mar 18 2016, 3:37 PM.

Download Raw Diff

Details

Reviewers

kkwli0
ABataev
Hahnfeld

Summary

This patch adds initial support for codegen of private and firstprivate clauses of nvptx.
I simply use the same support in host codegeneration, even though optimizations are possible.
Optimizations may be done by having codegen first analyze data sharing attributes on target and teams (e.g. target firstprivate and teams private on the same variable can drop firstprivate on target). These optimizations will be subject of a successive patch.

Diff Detail

Repository: rL LLVM

Event Timeline

carlo.bertolli updated this revision to Diff 51080.Mar 18 2016, 3:37 PM

carlo.bertolli retitled this revision from to [OPENMP] private and firstprivate clauses of teams code generation for nvptx.

carlo.bertolli updated this object.

carlo.bertolli added reviewers: ABataev, kkwli0, Hahnfeld.

carlo.bertolli set the repository for this revision to rL LLVM.

carlo.bertolli added subscribers: sfantao, caomhin, arpith-jacob, cfe-commits.

Herald added a subscriber: jholewinski. · View Herald TranscriptMar 18 2016, 3:37 PM

mkuron added a subscriber: mkuron.Mar 20 2016, 9:15 AM

ABataev added inline comments.Mar 20 2016, 11:26 PM

lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp

53–57

After some investigation I found out that this the same code, that a;ready exists in CodeGenFunction::EmitOMPTeamsDirective().
I think you must remove emitTeamsCall() from CGOpenMPRuntimeNVPTX class and modify CodeGenFunction::EmitOMPTeamsDirective() like this:

void CodeGenFunction::EmitOMPTeamsDirective(const OMPTeamsDirective &S) {
  OMPLexicalScope(*this, S);
  // Emit parallel region as a standalone region.
  auto &&CodeGen = [&S](CodeGenFunction &CGF) {
    OMPPrivateScope PrivateScope(CGF);
    (void)CGF.EmitOMPFirstprivateClause(S, PrivateScope);
    CGF.EmitOMPPrivateClause(S, PrivateScope);
    (void)PrivateScope.Privatize();
    CGF.EmitStmt(cast<CapturedStmt>(S.getAssociatedStmt())->getCapturedStmt());
  };
  if (getLangOpts().OpenMPIsDevice)
    emitInlinedDirective(CGF, OMPD_teams, CodeGen);
  else
    emitCommonOMPTeamsDirective(*this, S, OMPD_teams, CodeGen);
}

Hi Alexey

Thanks for your comment. The suggested change will not work as I intended in my patch when using the host as a device. This happens when you select the following options:
-fomptargets=powerpc64le-ibm-linux-gnu -fopenmp-is-device

In this case we generate device code and the target is ppc64. In ppc64 we need to generate a call to kmpc_fork_teams. In your proposed change, we treat all devices in an undistinguished way and we do not generate a call to fork_teams.
There are various reasons why we should not do that, the most clear ones to me being:

When using the host as host or as target device we generate different codes. This would mess up with performance profiling.

On a host it is still important to have teams as that may be the place where coarse grain parallelism comes from.

If you still want no specialization in CGOpenMPRuntimeNVPTX, we will need to check if we are targeting a device and if that device is an nvptx one.

I know that the problem is that we have two CodeGen objects being created in different places if we target nvptx or host. However, by the way the interface is currently structured, I do not see any way out of this duplication.

Thanks!

To make clear my comment I an updating the patch following Alexey's comment and modifying it to make it work.
You can see that now we have the following check:
CGM.getTarget().getTriple().getArch() == llvm::Triple::nvptx ||
CGM.getTarget().getTriple().getArch() == llvm::Triple::nvptx64

Hahnfeld added inline comments.Mar 24 2016, 5:25 AM

lib/CodeGen/CGStmtOpenMP.cpp
3074–3078 ↗	(On Diff #51285)	This now duplicates logic from `CodeGenModule::createOpenMPRuntime` and has to be extended for every new triple in multiple places. I would like to propose a new function like `virtual bool CGOpenMPRuntime::shouldEmitInlinedTeamsDirective` which defaults to `false` and can be overriden in `CGOpenMPRuntimeNVPTX`. This way we can later on easily reuse the logic for new device types and target triples.

Hi Jonas

Thanks for your comment. That scheme would not work for #parallel on the gpu in case we do not want to inline the parallel region.
There are patches coming in which that is going to happen so I think that we really need to distinguish between different pragmas.

In D18286#382698, @carlo.bertolli wrote:

Hi Jonas

Thanks for your comment. That scheme would not work for #parallel on the gpu in case we do not want to inline the parallel region.
There are patches coming in which that is going to happen so I think that we really need to distinguish between different pragmas.

Is a #pragma omp parallel changing the way code is generated for #pragma omp teams? How would that look like in CodeGenFunction::EmitOMPTeamsDirective?

My point is that specialization should happen in CGOpenMPRuntimeNVPTX (by means of overriding) without duplicating code, as Alexey said.

Thinking more carefully, I believe you are right - no, it is not. I guess we would not call at all shouldEmitInlinedTeamsDirective for the case of paralle.

Thanks!

Thinking more about this patch I think it is a right solution to override void llvm::Value *emitParallelOrTeamsOutlinedFunction() and void emitTeamsCall() for CGOpenMPRuntimeNVPTX class:

llvm::Value *CGOpenMPRuntimeNVPTX::emitParallelOrTeamsOutlinedFunction() {
  llvm::Value *OutlinedFn = CGOpenMPRuntime::emitParallelOrTeamsOutlinedFunction();
  cast<llvm::Function>(OutlinedFn)->addFnAttr(llvm::Attribute::AlwaysInline);
  return OutlinedFn;
}

void CGOpenMPRuntimeNVPTX::emitTeamsCall() {
  if (!CGF.HaveInsertPoint())
    return;

  Address ZeroAddr =
        CGF.CreateTempAlloca(CGF.Int32Ty, CharUnits::fromQuantity(4),
                             /*Name*/ ".zero.addr");
  CGF.InitTempAlloca(ZeroAddr, CGF.Builder.getInt32(/*C*/ 0));
  llvm::SmallVector<llvm::Value *, 16> OutlinedFnArgs;
  OutlinedFnArgs.push_back(ZeroAddr.getPointer());
  OutlinedFnArgs.push_back(ZeroAddr.getPointer());
  OutlinedFnArgs.append(CapturedVars.begin(), CapturedVars.end());
  CGF.EmitCallOrInvoke(OutlinedFn, OutlinedFnArgs);
}

Also, please update patch to the latest revision

carlo.bertolli mentioned this in D17963: [OPENMP] Codegen for teams directive for NVPTX.Apr 1 2016, 1:28 PM

[OPENMP] Rebase patch on new support for codegen for teams in nvptx target. private and firstprivate clauses are now dealt with in the teams patch and this one only becomes regression tests.

This revision is now accepted and ready to land.Apr 4 2016, 10:03 PM

Hi Carlo,

I think these tests are not yet committed, right?

Thanks,
Jonas

Yes, the purpose of these patches was to check correctness of code gen for the related clauses when targeting nvptx exclusively.
Since I wrote these tests, the base support changed to generate same code on host and any device type, and it seemed to me that the need for a differentiated regression test was not there any more (happy to be told otherwise).

Of course, there are things that we would like to do a bit different on nvptx, especially for firstprivate, but I have not yet had time to work on this and to reflect this into this regression test. If I do, I would like to update this patch, but it seemed too early to just abandon this.

Thanks and please do let me know what is your viewpoint on this

Carlo

Hahnfeld resigned from this revision.Feb 17 2018, 2:01 AM

Herald added subscribers: llvm-commits, guansong. · View Herald TranscriptFeb 17 2018, 2:01 AM

Revision Contents

Path

Size

lib/

CodeGen/

CGOpenMPRuntimeNVPTX.cpp

5 lines

test/

OpenMP/

nvptx_teams_firstprivate_codegen.cpp

848 lines

nvptx_teams_private_codegen.cpp

539 lines

Diff 51080

lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp

	Show First 20 Lines • Show All 44 Lines • ▼ Show 20 Lines
	void CGOpenMPRuntimeNVPTX::emitTeamsCall(CodeGenFunction &CGF,			void CGOpenMPRuntimeNVPTX::emitTeamsCall(CodeGenFunction &CGF,
	const OMPExecutableDirective &D,			const OMPExecutableDirective &D,
	SourceLocation Loc,			SourceLocation Loc,
	llvm::Value *OutlinedFn,			llvm::Value *OutlinedFn,
	ArrayRef<llvm::Value *> CapturedVars) {			ArrayRef<llvm::Value *> CapturedVars) {

	// just emit the statements in the teams region inlined			// just emit the statements in the teams region inlined
	auto &&CodeGen = [&D](CodeGenFunction &CGF) {			auto &&CodeGen = [&D](CodeGenFunction &CGF) {
				CodeGenFunction::OMPPrivateScope PrivateScope(CGF);
				(void)CGF.EmitOMPFirstprivateClause(D, PrivateScope);
				CGF.EmitOMPPrivateClause(D, PrivateScope);
				(void)PrivateScope.Privatize();

				ABataevUnsubmitted Not Done Reply Inline Actions After some investigation I found out that this the same code, that a;ready exists in CodeGenFunction::EmitOMPTeamsDirective(). I think you must remove emitTeamsCall() from CGOpenMPRuntimeNVPTX class and modify CodeGenFunction::EmitOMPTeamsDirective() like this: void CodeGenFunction::EmitOMPTeamsDirective(const OMPTeamsDirective &S) { OMPLexicalScope(this, S); // Emit parallel region as a standalone region. auto &&CodeGen = [&S](CodeGenFunction &CGF) { OMPPrivateScope PrivateScope(CGF); (void)CGF.EmitOMPFirstprivateClause(S, PrivateScope); CGF.EmitOMPPrivateClause(S, PrivateScope); (void)PrivateScope.Privatize(); CGF.EmitStmt(cast<CapturedStmt>(S.getAssociatedStmt())->getCapturedStmt()); }; if (getLangOpts().OpenMPIsDevice) emitInlinedDirective(CGF, OMPD_teams, CodeGen); else emitCommonOMPTeamsDirective(this, S, OMPD_teams, CodeGen); } ABataev: After some investigation I found out that this the same code, that a;ready exists in…
	CGF.EmitStmt(cast<CapturedStmt>(D.getAssociatedStmt())->getCapturedStmt());			CGF.EmitStmt(cast<CapturedStmt>(D.getAssociatedStmt())->getCapturedStmt());
	};			};

	emitInlinedDirective(CGF, OMPD_teams, CodeGen);			emitInlinedDirective(CGF, OMPD_teams, CodeGen);
	}			}

test/OpenMP/nvptx_teams_firstprivate_codegen.cpp

This file was added.

				// RUN: %clang_cc1 -verify -fopenmp -x c++ -std=c++11 -triple powerpc64le-unknown-unknown -fomptargets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc
				// RUN: %clang_cc1 -verify -fopenmp -x c++ -std=c++11 -triple nvptx64-unknown-unknown -fomptargets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fomp-host-ir-file-path %t-ppc-host.bc -o - \| FileCheck %s --check-prefix TCHECK --check-prefix TCHECK-64
				// RUN: %clang_cc1 -verify -fopenmp -x c++ -std=c++11 -triple i386-unknown-unknown -fomptargets=nvptx-nvidia-cuda -emit-llvm-bc %s -o %t-x86-host.bc
				// RUN: %clang_cc1 -verify -fopenmp -x c++ -std=c++11 -triple nvptx-unknown-unknown -fomptargets=nvptx-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fomp-host-ir-file-path %t-x86-host.bc -o - \| FileCheck %s --check-prefix TCHECK --check-prefix TCHECK-32
				// expected-no-diagnostics
				#ifndef HEADER
				#define HEADER



				template<typename tx, typename ty>
				struct TT{
				tx X;
				ty Y;
				};

				// TCHECK: [[TT:%.+]] = type { i64, i8 }
				// TCHECK: [[S1:%.+]] = type { double }

				int foo(int n, double* ptr) {
				int a = 0;
				short aa = 0;
				float b[10];
				float bn[n];
				double c[5][10];
				double cn[5][n];
				TT<long long, char> d;

				#pragma omp target private(a)
				#pragma omp teams firstprivate(a)
				{
				}

				// TCHECK: define void @__omp_offloading_{{.+}}()
				// TCHECK: [[ATA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[ATE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[ATA_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[ATA]],
				// TCHECK: store i{{[0-9]+}} [[ATA_VAL]], i{{[0-9]+}}* [[ATE]],
				// TCHECK: ret void


				// a is implictly firstprivate - copy the value from parameter to private teams variable
				#pragma omp target
				#pragma omp teams firstprivate(a)
				{
				}

				// TCHECK: define void @__omp_offloading_{{.+}}(i{{[0-9]+}} [[A_IN:%.+]])
				// TCHECK: [[A_ADDR:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: store i{{[0-9]+}} [[A_IN]], i{{[0-9]+}}* [[A_ADDR]],
				// TCHECK-64: [[CONV:%.+]] = bitcast i{{[0-9]+}}* [[A_ADDR]] to i{{[0-9]+}}*
				// TCHECK-64: [[A_IN_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[CONV]],
				// TCHECK-32: [[A_IN_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[A_ADDR]],
				// TCHECK: store i{{[0-9]+}} [[A_IN_VAL]], i{{[0-9]+}}* [[A]],
				// TCHECK: ret void

				#pragma omp target firstprivate(a)
				#pragma omp teams firstprivate(a)
				{
				}

				// TCHECK: define void @__omp_offloading_{{.+}}(i{{[0-9]+}} [[A_IN:%.+]])
				// TCHECK: [[A_ADDR:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[ATA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[ATE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: store i{{[0-9]+}} [[A_IN]], i{{[0-9]+}}* [[A_ADDR]],
				// TCHECK-64: [[CONV:%.+]] = bitcast i{{[0-9]+}}* [[A_ADDR]] to i{{[0-9]+}}*
				// TCHECK-64: [[A_IN_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[CONV]],
				// TCHECK-32: [[A_IN_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[A_ADDR]],
				// TCHECK: store i{{[0-9]+}} [[A_IN_VAL]], i{{[0-9]+}}* [[ATA]],
				// TCHECK: [[A_IN_VAL2:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[ATA]],
				// TCHECK: store i{{[0-9]+}} [[A_IN_VAL2]], i{{[0-9]+}}* [[ATE]],
				// TCHECK: ret void

				#pragma omp target private(a)
				#pragma omp teams firstprivate(a)
				{
				a = 1;
				}

				// TCHECK: define void @__omp_offloading_{{.+}}()
				// TCHECK: [[ATA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[ATE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[ATA_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[ATA]],
				// TCHECK: store i{{[0-9]+}} [[ATA_VAL]], i{{[0-9]+}}* [[ATE]],
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[ATE]],
				// TCHECK: ret void

				#pragma omp target
				#pragma omp teams firstprivate(a)
				{
				a = 1;
				}

				// TCHECK: define void @__omp_offloading_{{.+}}(i{{[0-9]+}} [[A_IN:%.+]])
				// TCHECK: [[A_ADDR:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: store i{{[0-9]+}} [[A_IN]], i{{[0-9]+}}* [[A_ADDR]],
				// TCHECK-64: [[CONV:%.+]] = bitcast i{{[0-9]+}}* [[A_ADDR]] to i{{[0-9]+}}*
				// TCHECK-64: [[A_IN_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[CONV]],
				// TCHECK-32: [[A_IN_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[A_ADDR]],
				// TCHECK: store i{{[0-9]+}} [[A_IN_VAL]], i{{[0-9]+}}* [[A]],
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[A]],
				// TCHECK: ret void

				#pragma omp target firstprivate(a)
				#pragma omp teams firstprivate(a)
				{
				a = 1;
				}

				// TCHECK: define void @__omp_offloading_{{.+}}(i{{[0-9]+}} [[A_IN:%.+]])
				// TCHECK: [[A_ADDR:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[ATA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[ATE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: store i{{[0-9]+}} [[A_IN]], i{{[0-9]+}}* [[A_ADDR]],
				// TCHECK-64: [[CONV:%.+]] = bitcast i{{[0-9]+}}* [[A_ADDR]] to i{{[0-9]+}}*
				// TCHECK-64: [[A_IN_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[CONV]],
				// TCHECK-32: [[A_IN_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[A_ADDR]],
				// TCHECK: store i{{[0-9]+}} [[A_IN_VAL]], i{{[0-9]+}}* [[ATA]],
				// TCHECK: [[A_IN_VAL2:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[ATA]],
				// TCHECK: store i{{[0-9]+}} [[A_IN_VAL2]], i{{[0-9]+}}* [[ATE]],
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[ATE]],
				// TCHECK: ret void


				#pragma omp target private(a, aa)
				#pragma omp teams firstprivate(a,aa)
				{
				a = 1;
				aa = 1;
				}

				// TCHECK: define void @__omp_offloading_{{.+}}()
				// TCHECK: [[ATA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2TA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[ATE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2TE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[ATA_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[ATA]],
				// TCHECK: store i{{[0-9]+}} [[ATA_VAL]], i{{[0-9]+}}* [[ATE]],
				// TCHECK: [[A2TA_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[A2TA]],
				// TCHECK: store i{{[0-9]+}} [[A2TA_VAL]], i{{[0-9]+}}* [[A2TE]],
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[ATE]],
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[A2TE]],
				// TCHECK: ret void

				#pragma omp target
				#pragma omp teams firstprivate(a,aa)
				{
				a = 1;
				aa = 1;
				}

				// TCHECK: define void @__omp_offloading_{{.+}}(i{{[0-9]+}} [[A_IN:%.+]], i{{[0-9]+}} [[A2_IN:%.+]])
				// TCHECK: [[A_ADDR:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2_ADDR:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: store i{{[0-9]+}} [[A_IN]], i{{[0-9]+}}* [[A_ADDR]],
				// TCHECK: store i{{[0-9]+}} [[A2_IN]], i{{[0-9]+}}* [[A2_ADDR]],
				// TCHECK-64: [[CONV:%.+]] = bitcast i{{[0-9]+}}* [[A_ADDR]] to i{{[0-9]+}}*
				// TCHECK: [[CONV2:%.+]] = bitcast i{{[0-9]+}}* [[A2_ADDR]] to i{{[0-9]+}}*
				// TCHECK-64: [[A_IN_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[CONV]],
				// TCHECK-32: [[A_IN_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[A_ADDR]],
				// TCHECK: store i{{[0-9]+}} [[A_IN_VAL]], i{{[0-9]+}}* [[A]],
				// TCHECK: [[A2_IN_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[CONV2]],
				// TCHECK: store i{{[0-9]+}} [[A2_IN_VAL]], i{{[0-9]+}}* [[A2]],
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[A]],
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[A2]],
				// TCHECK: ret void

				#pragma omp target firstprivate(a, aa)
				#pragma omp teams firstprivate(a,aa)
				{
				a = 1;
				aa = 1;

				aa = a+1;
				}

				// check that we are not using the firstprivate parameter
				// TCHECK: define void @__omp_offloading_{{.+}}(i{{[0-9]+}} [[A_IN:%.+]], i{{[0-9]+}} [[A2_IN:%.+]])
				// TCHECK: [[A_ADDR:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2_ADDR:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[ATA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2TA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[ATE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2TE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: store i{{[0-9]+}} [[A_IN]], i{{[0-9]+}}* [[A_ADDR]],
				// TCHECK: store i{{[0-9]+}} [[A2_IN]], i{{[0-9]+}}* [[A2_ADDR]],
				// TCHECK-64: [[CONV:%.+]] = bitcast i{{[0-9]+}}* [[A_ADDR]] to i{{[0-9]+}}*
				// TCHECK: [[CONV2:%.+]] = bitcast i{{[0-9]+}}* [[A2_ADDR]] to i{{[0-9]+}}*
				// TCHECK-64: [[A_IN_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[CONV]],
				// TCHECK-32: [[A_IN_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[A_ADDR]],
				// TCHECK: store i{{[0-9]+}} [[A_IN_VAL]], i{{[0-9]+}}* [[ATA]],
				// TCHECK: [[A2_IN_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[CONV2]],
				// TCHECK: store i{{[0-9]+}} [[A2_IN_VAL]], i{{[0-9]+}}* [[A2TA]],
				// TCHECK: [[ATA_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[ATA]],
				// TCHECK: store i{{[0-9]+}} [[ATA_VAL]], i{{[0-9]+}}* [[ATE]],
				// TCHECK: [[A2TA_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[A2TA]],
				// TCHECK: store i{{[0-9]+}} [[A2TA_VAL]], i{{[0-9]+}}* [[A2TE]],

				// a = 1, aa = 1
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[ATE]],
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[A2TE]],

				// aa = a+1
				// TCHECK: [[ATE_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[ATE]],
				// TCHECK: [[ATE_INC:%.+]] = add{{.+}} i{{[0-9]+}} [[ATE_VAL]], 1
				// TCHECK: [[ATE_CONV:%.+]] = trunc {{.+}} to
				// TCHECK: store i{{[0-9]+}} [[ATE_CONV]], i{{[0-9]+}}* [[A2TE]],
				// TCHECK: ret void

				#pragma omp target private(a, b, c, d)
				#pragma omp teams firstprivate(a, b, c, d)
				{
				a = 1;
				b[2] = 1.0;
				c[1][2] = 1.0;
				d.X = 1;
				d.Y = 1;
				}
				// CARLO
				// TCHECK: define void @__omp_offloading_{{.+}}()
				// TCHECK: [[ATA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[BTA:%.+]] = alloca [10 x float],
				// TCHECK: [[CTA:%.+]] = alloca [5 x [10 x double]],
				// TCHECK: [[DTA:%.+]] = alloca [[TT]],
				// TCHECK: [[ATE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[BTE:%.+]] = alloca [10 x float],
				// TCHECK: [[CTE:%.+]] = alloca [5 x [10 x double]],
				// TCHECK: [[DTE:%.+]] = alloca [[TT]],
				// TCHECK: [[ATA_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[ATA]],
				// TCHECK: store i{{[0-9]+}} [[ATA_VAL]], i{{[0-9]+}}* [[ATE]],
				// TCHECK: [[BTE_CPY:%.+]] = bitcast [10 x float]* [[BTE]] to i8*
				// TCHECK: [[BTA_CPY:%.+]] = bitcast [10 x float]* [[BTA]] to i8*
				// TCHECK: call void @llvm.memcpy.{{.+}}(i8* [[BTE_CPY]], i8* [[BTA_CPY]],{{.+}})
				// TCHECK: [[CTE_CPY:%.+]] = bitcast [5 x [10 x double]]* [[CTE]] to i8*
				// TCHECK: [[CTA_CPY:%.+]] = bitcast [5 x [10 x double]]* [[CTA]] to i8*
				// TCHECK: call void @llvm.memcpy.{{.+}}(i8* [[CTE_CPY]], i8* [[CTA_CPY]],{{.+}})
				// TCHECK: [[DTE_CPY:%.+]] = bitcast [[TT]]* [[DTE]] to i8*
				// TCHECK: [[DTA_CPY:%.+]] = bitcast [[TT]]* [[DTA]] to i8*
				// TCHECK: call void @llvm.memcpy.{{.+}}(i8* [[DTE_CPY]], i8* [[DTA_CPY]],{{.+}})

				// TCHECK: store {{.+}} [[ATE]],
				// TCHECK: [[B_IDX:%.+]] = getelementptr{{.+}} [[BTE]], {{.+}},
				// TCHECK: store {{.+}} [[B_IDX]],
				// TCHECK: [[C_IDX1:%.+]] = getelementptr {{.+}} [[CTE]], {{.+}}
				// TCHECK: [[C_IDX2:%.+]] = getelementptr {{.+}} [[C_IDX1]], {{.+}}
				// TCHECK: store {{.+}} [[C_IDX2]],
				// TCHECK: [[D_X:%.+]] = getelementptr {{.+}} [[DTE]], {{.+}}
				// TCHECK: store {{.+}} [[D_X]],
				// TCHECK: [[D_Y:%.+]] = getelementptr {{.+}} [[DTE]], {{.+}}
				// TCHECK: store {{.+}} [[D_Y]],

				#pragma omp target
				#pragma omp teams firstprivate(a, b, c, d)
				{
				a = 1;
				b[2] = 1.0;
				c[1][2] = 1.0;
				d.X = 1;
				d.Y = 1;
				}

				// TCHECK: define void @__omp_offloading_{{.+}}(i{{[0-9]+}} [[A_IN:%.+]], [10 x float]{{.+}} [[B_IN:%.+]], [5 x [10 x double]]{{.+}} [[C_IN:%.+]], [[TT]]*{{.+}} [[D_IN:%.+]])
				// TCHECK: [[A_ADDR:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[B_ADDR:%.+]] = alloca [10 x float]*,
				// TCHECK: [[C_ADDR:%.+]] = alloca [5 x [10 x double]]*,
				// TCHECK: [[D_ADDR:%.+]] = alloca [[TT]]*,
				// TCHECK: [[A:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[B:%.+]] = alloca [10 x float],
				// TCHECK: [[C:%.+]] = alloca [5 x [10 x double]],
				// TCHECK: [[D:%.+]] = alloca [[TT]],
				// TCHECK: store i{{[0-9]+}} [[A_IN]], i{{[0-9]+}}* [[A_ADDR]],
				// TCHECK: store [10 x float]* [[B_IN]], [10 x float]** [[B_ADDR]],
				// TCHECK: store [5 x [10 x double]]* [[C_IN]], [5 x [10 x double]]** [[C_ADDR]],
				// TCHECK: store [[TT]]* [[D_IN]], [[TT]]** [[D_ADDR]],
				// TCHECK-64: [[CONV:%.+]] = bitcast i{{[0-9]+}}* [[A_ADDR]] to i{{[0-9]+}}*
				// TCHECK: [[B_ADDR_REF:%.+]] = load [10 x float], [10 x float]* [[B_ADDR]],
				// TCHECK: [[C_ADDR_REF:%.+]] = load [5 x [10 x double]], [5 x [10 x double]]* [[C_ADDR]],
				// TCHECK: [[D_ADDR_REF:%.+]] = load %struct.TT, %struct.TT* [[D_ADDR]],

				// TCHECK-64: [[A_IN_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[CONV]],
				// TCHECK-32: [[A_IN_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[A_ADDR]],
				// TCHECK: store i{{[0-9]+}} [[A_IN_VAL]], i{{[0-9]+}}* [[A]],
				// TCHECK: [[B_CPY:%.+]] = bitcast [10 x float]* [[B]] to i8*
				// TCHECK: [[B_IN_CPY:%.+]] = bitcast [10 x float]* [[B_ADDR_REF]] to i8*
				// TCHECK: call void @llvm.memcpy.{{.+}}(i8* [[B_CPY]], i8* [[B_IN_CPY]],{{.+}})
				// TCHECK: [[C_CPY:%.+]] = bitcast [5 x [10 x double]]* [[C]] to i8*
				// TCHECK: [[C_IN_CPY:%.+]] = bitcast [5 x [10 x double]]* [[C_ADDR_REF]] to i8*
				// TCHECK: call void @llvm.memcpy.{{.+}}(i8* [[C_CPY]], i8* [[C_IN_CPY]],{{.+}})
				// TCHECK: [[D_CPY:%.+]] = bitcast [[TT]]* [[D]] to i8*
				// TCHECK: [[D_IN_CPY:%.+]] = bitcast [[TT]]* [[D_ADDR_REF]] to i8*
				// TCHECK: call void @llvm.memcpy.{{.+}}(i8* [[D_CPY]], i8* [[D_IN_CPY]],{{.+}})

				// TCHECK: store {{.+}} [[A]],
				// TCHECK: [[B_IDX:%.+]] = getelementptr{{.+}} [[B]], {{.+}},
				// TCHECK: store {{.+}} [[B_IDX]],
				// TCHECK: [[C_IDX1:%.+]] = getelementptr {{.+}} [[C]], {{.+}}
				// TCHECK: [[C_IDX2:%.+]] = getelementptr {{.+}} [[C_IDX1]], {{.+}}
				// TCHECK: store {{.+}} [[C_IDX2]],
				// TCHECK: [[D_X:%.+]] = getelementptr {{.+}} [[D]], {{.+}}
				// TCHECK: store {{.+}} [[D_X]],
				// TCHECK: [[D_Y:%.+]] = getelementptr {{.+}} [[D]], {{.+}}
				// TCHECK: store {{.+}} [[D_Y]],

				#pragma omp target firstprivate(a, b, c, d)
				#pragma omp teams firstprivate(a, b, c, d)
				{
				a = 1;
				b[2] = 1.0;
				c[1][2] = 1.0;
				d.X = 1;
				d.Y = 1;
				}

				// TCHECK: define void @__omp_offloading_{{.+}}(i{{[0-9]+}} [[A_IN:%.+]], [10 x float]{{.+}} [[B_IN:%.+]], [5 x [10 x double]]{{.+}} [[C_IN:%.+]], [[TT]]*{{.+}} [[D_IN:%.+]])
				// TCHECK: [[A_ADDR:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[B_ADDR:%.+]] = alloca [10 x float]*,
				// TCHECK: [[C_ADDR:%.+]] = alloca [5 x [10 x double]]*,
				// TCHECK: [[D_ADDR:%.+]] = alloca [[TT]]*,
				// TCHECK: [[ATA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[BTA:%.+]] = alloca [10 x float],
				// TCHECK: [[CTA:%.+]] = alloca [5 x [10 x double]],
				// TCHECK: [[DTA:%.+]] = alloca [[TT]],
				// TCHECK: [[ATE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[BTE:%.+]] = alloca [10 x float],
				// TCHECK: [[CTE:%.+]] = alloca [5 x [10 x double]],
				// TCHECK: [[DTE:%.+]] = alloca [[TT]],

				// TCHECK: store i{{[0-9]+}} [[A_IN]], i{{[0-9]+}}* [[A_ADDR]],
				// TCHECK: store [10 x float]* [[B_IN]], [10 x float]** [[B_ADDR]],
				// TCHECK: store [5 x [10 x double]]* [[C_IN]], [5 x [10 x double]]** [[C_ADDR]],
				// TCHECK: store [[TT]]* [[D_IN]], [[TT]]** [[D_ADDR]],
				// TCHECK-64: [[CONV:%.+]] = bitcast i{{[0-9]+}}* [[A_ADDR]] to i{{[0-9]+}}*
				// TCHECK: [[B_ADDR_REF:%.+]] = load [10 x float], [10 x float]* [[B_ADDR]],
				// TCHECK: [[C_ADDR_REF:%.+]] = load [5 x [10 x double]], [5 x [10 x double]]* [[C_ADDR]],
				// TCHECK: [[D_ADDR_REF:%.+]] = load %struct.TT, %struct.TT* [[D_ADDR]],

				// TCHECK-64: [[A_IN_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[CONV]],
				// TCHECK-32: [[A_IN_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[A_ADDR]],
				// TCHECK: store i{{[0-9]+}} [[A_IN_VAL]], i{{[0-9]+}}* [[ATA]],
				// TCHECK: [[B_CPY:%.+]] = bitcast [10 x float]* [[BTA]] to i8*
				// TCHECK: [[B_IN_CPY:%.+]] = bitcast [10 x float]* [[B_ADDR_REF]] to i8*
				// TCHECK: call void @llvm.memcpy.{{.+}}(i8* [[B_CPY]], i8* [[B_IN_CPY]],{{.+}})
				// TCHECK: [[C_CPY:%.+]] = bitcast [5 x [10 x double]]* [[CTA]] to i8*
				// TCHECK: [[C_IN_CPY:%.+]] = bitcast [5 x [10 x double]]* [[C_ADDR_REF]] to i8*
				// TCHECK: call void @llvm.memcpy.{{.+}}(i8* [[C_CPY]], i8* [[C_IN_CPY]],{{.+}})
				// TCHECK: [[D_CPY:%.+]] = bitcast [[TT]]* [[DTA]] to i8*
				// TCHECK: [[D_IN_CPY:%.+]] = bitcast [[TT]]* [[D_ADDR_REF]] to i8*
				// TCHECK: call void @llvm.memcpy.{{.+}}(i8* [[D_CPY]], i8* [[D_IN_CPY]],{{.+}})

				// TCHECK: [[ATA_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[ATA]],
				// TCHECK: store i{{[0-9]+}} [[ATA_VAL]], i{{[0-9]+}}* [[ATE]],
				// TCHECK: [[BTE_CPY:%.+]] = bitcast [10 x float]* [[BTE]] to i8*
				// TCHECK: [[BTA_CPY:%.+]] = bitcast [10 x float]* [[BTA]] to i8*
				// TCHECK: call void @llvm.memcpy.{{.+}}(i8* [[BTE_CPY]], i8* [[BTA_CPY]],{{.+}})
				// TCHECK: [[CTE_CPY:%.+]] = bitcast [5 x [10 x double]]* [[CTE]] to i8*
				// TCHECK: [[CTA_CPY:%.+]] = bitcast [5 x [10 x double]]* [[CTA]] to i8*
				// TCHECK: call void @llvm.memcpy.{{.+}}(i8* [[CTE_CPY]], i8* [[CTA_CPY]],{{.+}})
				// TCHECK: [[DTE_CPY:%.+]] = bitcast [[TT]]* [[DTE]] to i8*
				// TCHECK: [[DTA_CPY:%.+]] = bitcast [[TT]]* [[DTA]] to i8*
				// TCHECK: call void @llvm.memcpy.{{.+}}(i8* [[DTE_CPY]], i8* [[DTA_CPY]],{{.+}})

				// TCHECK: store {{.+}} [[ATE]],
				// TCHECK: [[B_IDX:%.+]] = getelementptr{{.+}} [[BTE]], {{.+}},
				// TCHECK: store {{.+}} [[B_IDX]],
				// TCHECK: [[C_IDX1:%.+]] = getelementptr {{.+}} [[CTE]], {{.+}}
				// TCHECK: [[C_IDX2:%.+]] = getelementptr {{.+}} [[C_IDX1]], {{.+}}
				// TCHECK: store {{.+}} [[C_IDX2]],
				// TCHECK: [[D_X:%.+]] = getelementptr {{.+}} [[DTE]], {{.+}}
				// TCHECK: store {{.+}} [[D_X]],
				// TCHECK: [[D_Y:%.+]] = getelementptr {{.+}} [[DTE]], {{.+}}
				// TCHECK: store {{.+}} [[D_Y]],

				#pragma omp target
				#pragma omp teams firstprivate(ptr)
				{
				ptr[0]++;
				}

				// TCHECK: define void @__omp_offloading_{{.+}}(double* [[PTR_IN:%.+]])
				// TCHECK: [[PTR_ADDR:%.+]] = alloca double*,
				// TCHECK: [[PTR:%.+]] = alloca double*,
				// TCHECK: store double* [[PTR_IN]], double** [[PTR_ADDR]],
				// TCHECK: [[PTR_ADDR_REF:%.+]] = load double, double* [[PTR_ADDR]],
				// TCHECK: store double* [[PTR_ADDR_REF]], double** [[PTR]],
				// TCHECK: load double, double* [[PTR]],
				// TCHECK-NOT: [[PTR_ADDR]]

				#pragma omp target firstprivate(ptr)
				#pragma omp teams firstprivate(ptr)
				{
				ptr[0]++;
				}

				// TCHECK: define void @__omp_offloading_{{.+}}(double* [[PTR_IN:%.+]])
				// TCHECK: [[PTR_ADDR:%.+]] = alloca double*,
				// TCHECK: [[PTRTA:%.+]] = alloca double*,
				// TCHECK: [[PTRTE:%.+]] = alloca double*,
				// TCHECK: store double* [[PTR_IN]], double** [[PTR_ADDR]],
				// TCHECK: [[PTR_ADDR_REF:%.+]] = load double, double* [[PTR_ADDR]],
				// TCHECK: store double* [[PTR_ADDR_REF]], double** [[PTRTA]],
				// TCHECK: [[PTRTA_VAL:%.+]] = load double, double* [[PTRTA]],
				// TCHECK: store double* [[PTRTA_VAL]], double** [[PTRTE]],
				// TCHECK: load double, double* [[PTRTE]],
				// TCHECK-NOT: [[PTRTA]]
				// TCHECK-NOT: [[PTR_ADDR]]

				return a;
				}


				template<typename tx>
				tx ftemplate(int n) {
				tx a = 0;
				short aa = 0;
				tx b[10];

				#pragma omp target private(a,aa,b)
				#pragma omp teams firstprivate(a,aa,b)
				{
				a = 1;
				aa = 1;
				b[2] = 1;
				}

				#pragma omp target
				#pragma omp teams firstprivate(a,aa,b)
				{
				a = 1;
				aa = 1;
				b[2] = 1;
				}

				#pragma omp target firstprivate(a,aa,b)
				#pragma omp teams firstprivate(a,aa,b)
				{
				a = 1;
				aa = 1;
				b[2] = 1;
				}

				return a;
				}

				static
				int fstatic(int n) {
				int a = 0;
				short aa = 0;
				char aaa = 0;
				int b[10];

				#pragma omp target private(a,aa,aaa,b)
				#pragma omp teams firstprivate(a,aa,aaa,b)
				{
				a = 1;
				aa = 1;
				aaa = 1;
				b[2] = 1;
				}

				// TCHECK: define void @__omp_offloading_{{.+}}()
				// TCHECK: [[ATA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2TA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A3TA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[BTA:%.+]] = alloca [10 x i{{[0-9]+}}],
				// TCHECK: [[ATE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2TE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A3TE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[BTE:%.+]] = alloca [10 x i{{[0-9]+}}],
				// TCHECK: [[ATA_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[ATA]],
				// TCHECK: store i{{[0-9]+}} [[ATA_VAL]], i{{[0-9]+}}* [[ATE]],
				// TCHECK: [[A2TA_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[A2TA]],
				// TCHECK: store i{{[0-9]+}} [[A2TA_VAL]], i{{[0-9]+}}* [[A2TE]],
				// TCHECK: [[A3TA_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[A3TA]],
				// TCHECK: store i{{[0-9]+}} [[A3TA_VAL]], i{{[0-9]+}}* [[A3TE]],
				// TCHECK: [[BTE_CPY:%.+]] = bitcast [10 x i{{[0-9]+}}]* [[BTE]] to i8*
				// TCHECK: [[BTA_CPY:%.+]] = bitcast [10 x i{{[0-9]+}}]* [[BTA]] to i8*
				// TCHECK: call void @llvm.memcpy.{{.+}}(i8* [[BTE_CPY]], i8* [[BTA_CPY]],{{.+}})

				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[ATE]],
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[A2TE]],
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[A3TE]],
				// TCHECK: [[B_GEP:%.+]] = getelementptr inbounds [10 x i{{[0-9]+}}], [10 x i{{[0-9]+}}]* [[BTE]], i{{[0-9]+}} 0, i{{[0-9]+}} 2
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[B_GEP]],
				// TCHECK: ret void

				#pragma omp target
				#pragma omp teams firstprivate(a,aa,aaa,b)
				{
				a = 1;
				aa = 1;
				aaa = 1;
				b[2] = 1;
				}

				// TCHECK: define void @__omp_offloading_{{.+}}(i{{[0-9]+}} [[A_IN:%.+]], i{{[0-9]+}} [[A2_IN:%.+]], i{{[0-9]+}} [[A3_IN:%.+]], [10 x i{{[0-9]+}}]*{{.+}} [[B_IN:%.+]])
				// TCHECK: [[A_ADDR:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2_ADDR:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A3_ADDR:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[B_ADDR:%.+]] = alloca [10 x i{{[0-9]+}}]*,
				// TCHECK: [[A:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A3:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[B:%.+]] = alloca [10 x i{{[0-9]+}}],
				// TCHECK: store i{{[0-9]+}} [[A_IN]], i{{[0-9]+}}* [[A_ADDR]],
				// TCHECK: store i{{[0-9]+}} [[A2_IN]], i{{[0-9]+}}* [[A2_ADDR]],
				// TCHECK: store i{{[0-9]+}} [[A3_IN]], i{{[0-9]+}}* [[A3_ADDR]],
				// TCHECK: store [10 x i{{[0-9]+}}]* [[B_IN]], [10 x i{{[0-9]+}}]** [[B_ADDR]],
				// TCHECK-64: [[A_CONV:%.+]] = bitcast i{{[0-9]+}}* [[A_ADDR]] to i{{[0-9]+}}*
				// TCHECK: [[A2_CONV:%.+]] = bitcast i{{[0-9]+}}* [[A2_ADDR]] to i{{[0-9]+}}*
				// TCHECK: [[A3_CONV:%.+]] = bitcast i{{[0-9]+}}* [[A3_ADDR]] to i{{[0-9]+}}*
				// TCHECK: [[B_ADDR_REF:%.+]] = load [10 x i{{[0-9]+}}], [10 x i{{[0-9]+}}]* [[B_ADDR]],
				// TCHECK-64: [[A_IN_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[A_CONV]],
				// TCHECK-32: [[A_IN_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[A_ADDR]],
				// TCHECK: store i{{[0-9]+}} [[A_IN_VAL]], i{{[0-9]+}}* [[A]],
				// TCHECK: [[A2_IN_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[A2_CONV]],
				// TCHECK: store i{{[0-9]+}} [[A2_IN_VAL]], i{{[0-9]+}}* [[A2]],
				// TCHECK: [[A3_IN_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[A3_CONV]],
				// TCHECK: store i{{[0-9]+}} [[A3_IN_VAL]], i{{[0-9]+}}* [[A3]],
				// TCHECK: [[B_CPY:%.+]] = bitcast [10 x i{{[0-9]+}}]* [[B]] to i8*
				// TCHECK: [[B_IN_CPY:%.+]] = bitcast [10 x i{{[0-9]+}}]* [[B_ADDR_REF]] to i8*
				// TCHECK: call void @llvm.memcpy.{{.+}}(i8* [[B_CPY]], i8* [[B_IN_CPY]],{{.+}})

				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[A]],
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[A2]],
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[A3]],
				// TCHECK: [[B_GEP:%.+]] = getelementptr inbounds [10 x i{{[0-9]+}}], [10 x i{{[0-9]+}}]* [[B]], i{{[0-9]+}} 0, i{{[0-9]+}} 2
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[B_GEP]],
				// TCHECK: ret void

				#pragma omp target firstprivate(a,aa,aaa,b)
				#pragma omp teams firstprivate(a,aa,aaa,b)
				{
				a = 1;
				aa = 1;
				aaa = 1;
				b[2] = 1;
				}

				// TCHECK: define void @__omp_offloading_{{.+}}(i{{[0-9]+}} [[A_IN:%.+]], i{{[0-9]+}} [[A2_IN:%.+]], i{{[0-9]+}} [[A3_IN:%.+]], [10 x i{{[0-9]+}}]*{{.+}} [[B_IN:%.+]])
				// TCHECK: [[A_ADDR:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2_ADDR:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A3_ADDR:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[B_ADDR:%.+]] = alloca [10 x i{{[0-9]+}}]*,
				// TCHECK: [[ATA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2TA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A3TA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[BTA:%.+]] = alloca [10 x i{{[0-9]+}}],
				// TCHECK: [[ATE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2TE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A3TE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[BTE:%.+]] = alloca [10 x i{{[0-9]+}}],
				// TCHECK: store i{{[0-9]+}} [[A_IN]], i{{[0-9]+}}* [[A_ADDR]],
				// TCHECK: store i{{[0-9]+}} [[A2_IN]], i{{[0-9]+}}* [[A2_ADDR]],
				// TCHECK: store i{{[0-9]+}} [[A3_IN]], i{{[0-9]+}}* [[A3_ADDR]],
				// TCHECK: store [10 x i{{[0-9]+}}]* [[B_IN]], [10 x i{{[0-9]+}}]** [[B_ADDR]],

				// TCHECK-64: [[A_CONV:%.+]] = bitcast i{{[0-9]+}}* [[A_ADDR]] to i{{[0-9]+}}*
				// TCHECK: [[A2_CONV:%.+]] = bitcast i{{[0-9]+}}* [[A2_ADDR]] to i{{[0-9]+}}*
				// TCHECK: [[A3_CONV:%.+]] = bitcast i{{[0-9]+}}* [[A3_ADDR]] to i{{[0-9]+}}*
				// TCHECK: [[B_ADDR_REF:%.+]] = load [10 x i{{[0-9]+}}], [10 x i{{[0-9]+}}]* [[B_ADDR]],
				// TCHECK-64: [[A_IN_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[A_CONV]],
				// TCHECK-32: [[A_IN_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[A_ADDR]],
				// TCHECK: store i{{[0-9]+}} [[A_IN_VAL]], i{{[0-9]+}}* [[ATA]],
				// TCHECK: [[A2_IN_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[A2_CONV]],
				// TCHECK: store i{{[0-9]+}} [[A2_IN_VAL]], i{{[0-9]+}}* [[A2TA]],
				// TCHECK: [[A3_IN_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[A3_CONV]],
				// TCHECK: store i{{[0-9]+}} [[A3_IN_VAL]], i{{[0-9]+}}* [[A3TA]],
				// TCHECK: [[B_CPY:%.+]] = bitcast [10 x i{{[0-9]+}}]* [[BTA]] to i8*
				// TCHECK: [[B_IN_CPY:%.+]] = bitcast [10 x i{{[0-9]+}}]* [[B_ADDR_REF]] to i8*
				// TCHECK: call void @llvm.memcpy.{{.+}}(i8* [[B_CPY]], i8* [[B_IN_CPY]],{{.+}})

				// TCHECK: [[ATA_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[ATA]],
				// TCHECK: store i{{[0-9]+}} [[ATA_VAL]], i{{[0-9]+}}* [[ATE]],
				// TCHECK: [[A2TA_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[A2TA]],
				// TCHECK: store i{{[0-9]+}} [[A2TA_VAL]], i{{[0-9]+}}* [[A2TE]],
				// TCHECK: [[A3TA_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[A3TA]],
				// TCHECK: store i{{[0-9]+}} [[A3TA_VAL]], i{{[0-9]+}}* [[A3TE]],
				// TCHECK: [[BTE_CPY:%.+]] = bitcast [10 x i{{[0-9]+}}]* [[BTE]] to i8*
				// TCHECK: [[BTA_IN_CPY:%.+]] = bitcast [10 x i{{[0-9]+}}]* [[BTA]] to i8*
				// TCHECK: call void @llvm.memcpy.{{.+}}(i8* [[BTE_CPY]], i8* [[BTA_IN_CPY]],{{.+}})

				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[ATE]],
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[A2TE]],
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[A3TE]],
				// TCHECK: [[B_GEP:%.+]] = getelementptr inbounds [10 x i{{[0-9]+}}], [10 x i{{[0-9]+}}]* [[BTE]], i{{[0-9]+}} 0, i{{[0-9]+}} 2
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[B_GEP]],
				// TCHECK: ret void

				return a;
				}

				struct S1 {
				double a;

				int r1(int n){
				int b = n+1;
				short int c[2][5];

				#pragma omp target private(b,c)
				#pragma omp teams firstprivate(b,c)
				{
				this->a = (double)b + 1.5;
				c[1][1] = ++a;
				}

				// TCHECK: define void @__omp_offloading_{{.+}}([[S1]]* [[TH:%.+]])
				// TCHECK: [[TH_ADDR:%.+]] = alloca [[S1]]*,
				// TCHECK: [[BTA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[CTA:%.+]] = alloca [2 x [5 x i{{[0-9]+}}]],
				// TCHECK: [[BTE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[CTE:%.+]] = alloca [2 x [5 x i{{[0-9]+}}]],
				// TCHECK: store [[S1]]* [[TH]], [[S1]]** [[TH_ADDR]],
				// TCHECK: [[TH_ADDR_REF:%.+]] = load [[S1]], [[S1]]* [[TH_ADDR]],

				// TCHECK: [[BTA_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[BTA]],
				// TCHECK: store i{{[0-9]+}} [[BTA_VAL]], i{{[0-9]+}}* [[BTE]],
				// TCHECK: [[CTE_CPY:%.+]] = bitcast [2 x [5 x i{{[0-9]+}}]]* [[CTE]] to i8*
				// TCHECK: [[CTA_CPY:%.+]] = bitcast [2 x [5 x i{{[0-9]+}}]]* [[CTA]] to i8*
				// TCHECK: call void @llvm.memcpy.{{.+}}(i8* [[CTE_CPY]], i8* [[CTA_CPY]],{{.+}})

				// this->a = (double)b + 1.5;
				// TCHECK: [[B_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[BTE]],
				// TCHECK: [[B_CONV:%.+]] = sitofp i{{[0-9]+}} [[B_VAL]] to double
				// TCHECK: [[NEW_A_VAL:%.+]] = fadd double [[B_CONV]], 1.5{{.+}}+00
				// TCHECK: [[A_FIELD:%.+]] = getelementptr inbounds [[S1]], [[S1]]* [[TH_ADDR_REF]], i{{[0-9]+}} 0, i{{[0-9]+}} 0
				// TCHECK: store double [[NEW_A_VAL]], double* [[A_FIELD]],

				// c[1][1] = ++a;
				// TCHECK: [[A_FIELD4:%.+]] = getelementptr inbounds [[S1]], [[S1]]* [[TH_ADDR_REF]], i{{[0-9]+}} 0, i{{[0-9]+}} 0
				// TCHECK: [[A_FIELD4_VAL:%.+]] = load double, double* [[A_FIELD4]],
				// TCHECK: [[A_FIELD_INC:%.+]] = fadd double [[A_FIELD4_VAL]], 1.0{{.+}}+00
				// TCHECK: store double [[A_FIELD_INC]], double* [[A_FIELD4]],
				// TCHECK: [[A_FIELD_INC_CONV:%.+]] = fptosi double [[A_FIELD_INC]] to i{{[0-9]+}}
				// TCHECK: [[C_1_REF:%.+]] = getelementptr inbounds [2 x [5 x i{{[0-9]+}}]], [2 x [5 x i{{[0-9]+}}]]* [[CTE]], i{{[0-9]+}} 0, i{{[0-9]+}} 1
				// TCHECK: [[C_1_1_REF:%.+]] = getelementptr inbounds [5 x i{{[0-9]+}}], [5 x i{{[0-9]+}}]* [[C_1_REF]], i{{[0-9]+}} 0, i{{[0-9]+}} 1
				// TCHECK: store i{{[0-9]+}} [[A_FIELD_INC_CONV]], i{{[0-9]+}}* [[C_1_1_REF]],
				// TCHECK: ret void

				#pragma omp target
				#pragma omp teams firstprivate(b,c)
				{
				this->a = (double)b + 1.5;
				c[1][1] = ++a;
				}

				// TCHECK: define void @__omp_offloading_{{.+}}(i{{[0-9]+}} [[B_IN:%.+]], [2 x [5 x i{{[0-9]+}}]]{{.+}} [[C_IN:%.+]], [[S1]] [[TH:%.+]])
				// TCHECK: [[B_ADDR:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[C_ADDR:%.+]] = alloca [2 x [5 x i{{[0-9]+}}]]*,
				// TCHECK: [[TH_ADDR:%.+]] = alloca [[S1]]*,
				// TCHECK: [[B:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[C:%.+]] = alloca [2 x [5 x i{{[0-9]+}}]],
				// TCHECK: store i{{[0-9]+}} [[B_IN]], i{{[0-9]+}}* [[B_ADDR]],
				// TCHECK: store [2 x [5 x i{{[0-9]+}}]]* [[C_IN]], [2 x [5 x i{{[0-9]+}}]]** [[C_ADDR]],
				// TCHECK: store [[S1]]* [[TH]], [[S1]]** [[TH_ADDR]],
				// TCHECK-64: [[B_CONV:%.+]] = bitcast i{{[0-9]+}}* [[B_ADDR]] to i{{[0-9]+}}*
				// TCHECK: [[C_ADDR_REF:%.+]] = load [2 x [5 x i{{[0-9]+}}]], [2 x [5 x i{{[0-9]+}}]]* [[C_ADDR]],
				// TCHECK: [[TH_ADDR_REF:%.+]] = load [[S1]], [[S1]]* [[TH_ADDR]],
				// TCHECK-64: [[B_IN_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[B_CONV]],
				// TCHECK-32: [[B_IN_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[B_ADDR]],
				// TCHECK: store i{{[0-9]+}} [[B_IN_VAL]], i{{[0-9]+}}* [[B]],
				// TCHECK: [[C_CPY:%.+]] = bitcast [2 x [5 x i{{[0-9]+}}]]* [[C]] to i8*
				// TCHECK: [[C_IN_CPY:%.+]] = bitcast [2 x [5 x i{{[0-9]+}}]]* [[C_ADDR_REF]] to i8*
				// TCHECK: call void @llvm.memcpy.{{.+}}(i8* [[C_CPY]], i8* [[C_IN_CPY]],{{.+}})

				// this->a = (double)b + 1.5;
				// TCHECK: [[B_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[B]],
				// TCHECK: [[B_CONV:%.+]] = sitofp i{{[0-9]+}} [[B_VAL]] to double
				// TCHECK: [[NEW_A_VAL:%.+]] = fadd double [[B_CONV]], 1.5{{.+}}+00
				// TCHECK: [[A_FIELD:%.+]] = getelementptr inbounds [[S1]], [[S1]]* [[TH_ADDR_REF]], i{{[0-9]+}} 0, i{{[0-9]+}} 0
				// TCHECK: store double [[NEW_A_VAL]], double* [[A_FIELD]],

				// c[1][1] = ++a;
				// TCHECK: [[A_FIELD4:%.+]] = getelementptr inbounds [[S1]], [[S1]]* [[TH_ADDR_REF]], i{{[0-9]+}} 0, i{{[0-9]+}} 0
				// TCHECK: [[A_FIELD4_VAL:%.+]] = load double, double* [[A_FIELD4]],
				// TCHECK: [[A_FIELD_INC:%.+]] = fadd double [[A_FIELD4_VAL]], 1.0{{.+}}+00
				// TCHECK: store double [[A_FIELD_INC]], double* [[A_FIELD4]],
				// TCHECK: [[A_FIELD_INC_CONV:%.+]] = fptosi double [[A_FIELD_INC]] to i{{[0-9]+}}
				// TCHECK: [[C_1_REF:%.+]] = getelementptr inbounds [2 x [5 x i{{[0-9]+}}]], [2 x [5 x i{{[0-9]+}}]]* [[C]], i{{[0-9]+}} 0, i{{[0-9]+}} 1
				// TCHECK: [[C_1_1_REF:%.+]] = getelementptr inbounds [5 x i{{[0-9]+}}], [5 x i{{[0-9]+}}]* [[C_1_REF]], i{{[0-9]+}} 0, i{{[0-9]+}} 1
				// TCHECK: store i{{[0-9]+}} [[A_FIELD_INC_CONV]], i{{[0-9]+}}* [[C_1_1_REF]],
				// TCHECK: ret void

				#pragma omp target firstprivate(b,c)
				#pragma omp teams firstprivate(b,c)
				{
				this->a = (double)b + 1.5;
				c[1][1] = ++a;
				}

				return c[1][1] + (int)b;
				}

				// TCHECK: define void @__omp_offloading_{{.+}}(i{{[0-9]+}} [[B_IN:%.+]], [2 x [5 x i{{[0-9]+}}]]{{.+}} [[C_IN:%.+]], [[S1]] [[TH:%.+]])
				// TCHECK: [[B_ADDR:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[C_ADDR:%.+]] = alloca [2 x [5 x i{{[0-9]+}}]]*,
				// TCHECK: [[TH_ADDR:%.+]] = alloca [[S1]]*,
				// TCHECK: [[BTA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[CTA:%.+]] = alloca [2 x [5 x i{{[0-9]+}}]],
				// TCHECK: [[BTE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[CTE:%.+]] = alloca [2 x [5 x i{{[0-9]+}}]],

				// TCHECK: store i{{[0-9]+}} [[B_IN]], i{{[0-9]+}}* [[B_ADDR]],
				// TCHECK: store [2 x [5 x i{{[0-9]+}}]]* [[C_IN]], [2 x [5 x i{{[0-9]+}}]]** [[C_ADDR]],
				// TCHECK: store [[S1]]* [[TH]], [[S1]]** [[TH_ADDR]],
				// TCHECK-64: [[B_CONV:%.+]] = bitcast i{{[0-9]+}}* [[B_ADDR]] to i{{[0-9]+}}*
				// TCHECK: [[C_ADDR_REF:%.+]] = load [2 x [5 x i{{[0-9]+}}]], [2 x [5 x i{{[0-9]+}}]]* [[C_ADDR]],
				// TCHECK: [[TH_ADDR_REF:%.+]] = load [[S1]], [[S1]]* [[TH_ADDR]],
				// TCHECK-64: [[B_IN_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[B_CONV]],
				// TCHECK-32: [[B_IN_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[B_ADDR]],
				// TCHECK: store i{{[0-9]+}} [[B_IN_VAL]], i{{[0-9]+}}* [[BTA]],
				// TCHECK: [[C_CPY:%.+]] = bitcast [2 x [5 x i{{[0-9]+}}]]* [[CTA]] to i8*
				// TCHECK: [[C_IN_CPY:%.+]] = bitcast [2 x [5 x i{{[0-9]+}}]]* [[C_ADDR_REF]] to i8*
				// TCHECK: call void @llvm.memcpy.{{.+}}(i8* [[C_CPY]], i8* [[C_IN_CPY]],{{.+}})

				// TCHECK: [[B_IN_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[BTA]],
				// TCHECK: store i{{[0-9]+}} [[B_IN_VAL]], i{{[0-9]+}}* [[BTE]],
				// TCHECK: [[C_CPY:%.+]] = bitcast [2 x [5 x i{{[0-9]+}}]]* [[CTE]] to i8*
				// TCHECK: [[C_IN_CPY:%.+]] = bitcast [2 x [5 x i{{[0-9]+}}]]* [[CTA]] to i8*
				// TCHECK: call void @llvm.memcpy.{{.+}}(i8* [[C_CPY]], i8* [[C_IN_CPY]],{{.+}})

				// this->a = (double)b + 1.5;
				// TCHECK: [[B_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[BTE]],
				// TCHECK: [[B_CONV:%.+]] = sitofp i{{[0-9]+}} [[B_VAL]] to double
				// TCHECK: [[NEW_A_VAL:%.+]] = fadd double [[B_CONV]], 1.5{{.+}}+00
				// TCHECK: [[A_FIELD:%.+]] = getelementptr inbounds [[S1]], [[S1]]* [[TH_ADDR_REF]], i{{[0-9]+}} 0, i{{[0-9]+}} 0
				// TCHECK: store double [[NEW_A_VAL]], double* [[A_FIELD]],

				// c[1][1] = ++a;
				// TCHECK: [[A_FIELD4:%.+]] = getelementptr inbounds [[S1]], [[S1]]* [[TH_ADDR_REF]], i{{[0-9]+}} 0, i{{[0-9]+}} 0
				// TCHECK: [[A_FIELD4_VAL:%.+]] = load double, double* [[A_FIELD4]],
				// TCHECK: [[A_FIELD_INC:%.+]] = fadd double [[A_FIELD4_VAL]], 1.0{{.+}}+00
				// TCHECK: store double [[A_FIELD_INC]], double* [[A_FIELD4]],
				// TCHECK: [[A_FIELD_INC_CONV:%.+]] = fptosi double [[A_FIELD_INC]] to i{{[0-9]+}}
				// TCHECK: [[C_1_REF:%.+]] = getelementptr inbounds [2 x [5 x i{{[0-9]+}}]], [2 x [5 x i{{[0-9]+}}]]* [[CTE]], i{{[0-9]+}} 0, i{{[0-9]+}} 1
				// TCHECK: [[C_1_1_REF:%.+]] = getelementptr inbounds [5 x i{{[0-9]+}}], [5 x i{{[0-9]+}}]* [[C_1_REF]], i{{[0-9]+}} 0, i{{[0-9]+}} 1
				// TCHECK: store i{{[0-9]+}} [[A_FIELD_INC_CONV]], i{{[0-9]+}}* [[C_1_1_REF]],
				// TCHECK: ret void

				};


				int bar(int n, double* ptr){
				int a = 0;
				a += foo(n, ptr);
				S1 S;
				a += S.r1(n);
				a += fstatic(n);
				a += ftemplate<int>(n);

				return a;
				}

				// template
				// TCHECK: define void @__omp_offloading_{{.+}}()
				// TCHECK: [[ATA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2TA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[BTA:%.+]] = alloca [10 x i{{[0-9]+}}],
				// TCHECK: [[ATE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2TE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[BTE:%.+]] = alloca [10 x i{{[0-9]+}}],

				// TCHECK: [[ATA_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[ATA]],
				// TCHECK: store i{{[0-9]+}} [[ATA_VAL]], i{{[0-9]+}}* [[ATE]],
				// TCHECK: [[A2TA_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[A2TA]],
				// TCHECK: store i{{[0-9]+}} [[A2TA_VAL]], i{{[0-9]+}}* [[A2TE]],
				// TCHECK: [[BTE_CPY:%.+]] = bitcast [10 x i{{[0-9]+}}]* [[BTE]] to i8*
				// TCHECK: [[BTA_CPY:%.+]] = bitcast [10 x i{{[0-9]+}}]* [[BTA]] to i8*
				// TCHECK: call void @llvm.memcpy.{{.+}}(i8* [[BTE_CPY]], i8* [[BTA_CPY]],{{.+}})

				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[ATE]],
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[A2TE]],
				// TCHECK: [[B_GEP:%.+]] = getelementptr inbounds [10 x i{{[0-9]+}}], [10 x i{{[0-9]+}}]* [[BTE]], i{{[0-9]+}} 0, i{{[0-9]+}} 2
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[B_GEP]],
				// TCHECK: ret void


				// TCHECK: define void @__omp_offloading_{{.+}}(i{{[0-9]+}}{{.+}} [[A_IN:%.+]], i{{[0-9]+}}{{.+}} [[A2:%.+]], [10 x i{{[0-9]+}}]*{{.+}} [[B_IN:%.+]])
				// TCHECK: [[A_ADDR:%.+]] = alloca i{{[0-9]+}}*,
				// TCHECK: [[A2_ADDR:%.+]] = alloca i{{[0-9]+}}*,
				// TCHECK: [[B_ADDR:%.+]] = alloca [10 x i{{[0-9]+}}]*,
				// TCHECK: [[A:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[B:%.+]] = alloca [10 x i{{[0-9]+}}],
				// TCHECK: store i{{[0-9]+}}* [[A_IN]], i{{[0-9]+}}** [[A_ADDR]],
				// TCHECK: store i{{[0-9]+}}* [[A2_IN]], i{{[0-9]+}}** [[A2_ADDR]],
				// TCHECK: store [10 x i{{[0-9]+}}]* [[B_IN]], [10 x i{{[0-9]+}}]** [[B_ADDR]],
				// TCHECK: [[A_ADDR_REF:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[A_ADDR]],
				// TCHECK: [[A2_ADDR_REF:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[A2_ADDR]],
				// TCHECK: [[B_ADDR_REF:%.+]] = load [10 x i{{[0-9]+}}], [10 x i{{[0-9]+}}]* [[B_ADDR]],
				// TCHECK: [[A_IN_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[A_ADDR_REF]],
				// TCHECK: store i{{[0-9]+}} [[A_IN_VAL]], i{{[0-9]+}}* [[A]],
				// TCHECK: [[A2_IN_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[A2_ADDR_REF]],
				// TCHECK: store i{{[0-9]+}} [[A2_IN_VAL]], i{{[0-9]+}}* [[A2]],
				// TCHECK: [[B_CPY:%.+]] = bitcast [10 x i{{[0-9]+}}]* [[B]] to i8*
				// TCHECK: [[B_IN_CPY:%.+]] = bitcast [10 x i{{[0-9]+}}]* [[B_ADDR_REF]] to i8*
				// TCHECK: call void @llvm.memcpy.{{.+}}(i8* [[B_CPY]], i8* [[B_IN_CPY]],{{.+}})

				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[A]],
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[A2]],
				// TCHECK: [[B_GEP:%.+]] = getelementptr inbounds [10 x i{{[0-9]+}}], [10 x i{{[0-9]+}}]* [[B]], i{{[0-9]+}} 0, i{{[0-9]+}} 2
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[B_GEP]],
				// TCHECK: ret void

				// TCHECK: define void @__omp_offloading_{{.+}}(i{{[0-9]+}}{{.+}} [[A_IN:%.+]], i{{[0-9]+}}{{.+}} [[A2:%.+]], [10 x i{{[0-9]+}}]*{{.+}} [[B_IN:%.+]])
				// TCHECK: [[A_ADDR:%.+]] = alloca i{{[0-9]+}}*,
				// TCHECK: [[A2_ADDR:%.+]] = alloca i{{[0-9]+}}*,
				// TCHECK: [[B_ADDR:%.+]] = alloca [10 x i{{[0-9]+}}]*,
				// TCHECK: [[ATA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2TA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[BTA:%.+]] = alloca [10 x i{{[0-9]+}}],
				// TCHECK: [[ATE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2TE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[BTE:%.+]] = alloca [10 x i{{[0-9]+}}],
				// TCHECK: store i{{[0-9]+}}* [[A_IN]], i{{[0-9]+}}** [[A_ADDR]],
				// TCHECK: store i{{[0-9]+}}* [[A2_IN]], i{{[0-9]+}}** [[A2_ADDR]],
				// TCHECK: store [10 x i{{[0-9]+}}]* [[B_IN]], [10 x i{{[0-9]+}}]** [[B_ADDR]],
				// TCHECK: [[A_ADDR_REF:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[A_ADDR]],
				// TCHECK: [[A2_ADDR_REF:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[A2_ADDR]],
				// TCHECK: [[B_ADDR_REF:%.+]] = load [10 x i{{[0-9]+}}], [10 x i{{[0-9]+}}]* [[B_ADDR]],
				// TCHECK: [[A_IN_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[A_ADDR_REF]],
				// TCHECK: store i{{[0-9]+}} [[A_IN_VAL]], i{{[0-9]+}}* [[ATA]],
				// TCHECK: [[A2_IN_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[A2_ADDR_REF]],
				// TCHECK: store i{{[0-9]+}} [[A2_IN_VAL]], i{{[0-9]+}}* [[A2TA]],
				// TCHECK: [[B_CPY:%.+]] = bitcast [10 x i{{[0-9]+}}]* [[BTA]] to i8*
				// TCHECK: [[B_IN_CPY:%.+]] = bitcast [10 x i{{[0-9]+}}]* [[B_ADDR_REF]] to i8*
				// TCHECK: call void @llvm.memcpy.{{.+}}(i8* [[B_CPY]], i8* [[B_IN_CPY]],{{.+}})

				// TCHECK: [[A_IN_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[ATA]],
				// TCHECK: store i{{[0-9]+}} [[A_IN_VAL]], i{{[0-9]+}}* [[ATE]],
				// TCHECK: [[A2_IN_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[A2TA]],
				// TCHECK: store i{{[0-9]+}} [[A2_IN_VAL]], i{{[0-9]+}}* [[A2TE]],
				// TCHECK: [[B_CPY:%.+]] = bitcast [10 x i{{[0-9]+}}]* [[BTE]] to i8*
				// TCHECK: [[B_IN_CPY:%.+]] = bitcast [10 x i{{[0-9]+}}]* [[BTA]] to i8*
				// TCHECK: call void @llvm.memcpy.{{.+}}(i8* [[B_CPY]], i8* [[B_IN_CPY]],{{.+}})

				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[ATE]],
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[A2TE]],
				// TCHECK: [[B_GEP:%.+]] = getelementptr inbounds [10 x i{{[0-9]+}}], [10 x i{{[0-9]+}}]* [[BTE]], i{{[0-9]+}} 0, i{{[0-9]+}} 2
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[B_GEP]],
				// TCHECK: ret void

				#endif

test/OpenMP/nvptx_teams_private_codegen.cpp

This file was added.

				// RUN: %clang_cc1 -verify -fopenmp -x c++ -std=c++11 -triple powerpc64le-unknown-unknown -fomptargets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc
				// RUN: %clang_cc1 -verify -fopenmp -x c++ -std=c++11 -triple nvptx64-unknown-unknown -fomptargets=nvptx64-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fomp-host-ir-file-path %t-ppc-host.bc -o - \| FileCheck %s --check-prefix TCHECK --check-prefix TCHECK-64
				// RUN: %clang_cc1 -verify -fopenmp -x c++ -std=c++11 -triple i386-unknown-unknown -fomptargets=nvptx-nvidia-cuda -emit-llvm-bc %s -o %t-x86-host.bc
				// RUN: %clang_cc1 -verify -fopenmp -x c++ -std=c++11 -triple nvptx-unknown-unknown -fomptargets=nvptx-nvidia-cuda -emit-llvm %s -fopenmp-is-device -fomp-host-ir-file-path %t-x86-host.bc -o - \| FileCheck %s --check-prefix TCHECK --check-prefix TCHECK-32
				// expected-no-diagnostics
				#ifndef HEADER
				#define HEADER



				template<typename tx, typename ty>
				struct TT{
				tx X;
				ty Y;
				};

				// TCHECK: [[TT:%.+]] = type { i64, i8 }
				// TCHECK: [[S1:%.+]] = type { double }

				int foo(int n) {
				int a = 0;
				short aa = 0;
				float b[10];
				float bn[n];
				double c[5][10];
				double cn[5][n];
				TT<long long, char> d;

				#pragma omp target private(a)
				#pragma omp teams private(a)
				{
				}

				// generate both private variables
				// TCHECK: define void @__omp_offloading_{{.+}}()
				// TCHECK: [[A:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2:%.+]] = alloca i{{[0-9]+}},
				// TCHECK-NOT: store {{.+}}, {{.+}} [[A]],
				// TCHECK: ret void


				// a is implictly firstprivate
				#pragma omp target
				#pragma omp teams private(a)
				{
				}

				// TCHECK: define void @__omp_offloading_{{.+}}()
				// TCHECK: [[A:%.+]] = alloca i{{[0-9]+}},
				// TCHECK-NOT: store {{.+}}, {{.+}} [[A]],
				// TCHECK: ret void

				#pragma omp target firstprivate(a)
				#pragma omp teams private(a)
				{
				}

				// because of firstprivate, a.addr and a parameter are
				// created. This is fine, as long as we do not use a.addr
				// TCHECK: define void @__omp_offloading_{{.+}}({{.+}})
				// TCHECK: [[A_ADDR:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[ATA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[ATE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK-NOT: store {{.+}}, {{.+}} [[ATE]],
				// TCHECK: ret void

				#pragma omp target private(a)
				#pragma omp teams private(a)
				{
				a = 1;
				}

				// TCHECK: define void @__omp_offloading_{{.+}}()
				// TCHECK: [[ATA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[ATE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[ATE]],
				// TCHECK-NOT: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[ATA]],
				// TCHECK: ret void

				#pragma omp target
				#pragma omp teams private(a)
				{
				a = 1;
				}

				// TCHECK: define void @__omp_offloading_{{.+}}()
				// TCHECK: [[A:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[A]],
				// TCHECK: ret void

				#pragma omp target firstprivate(a)
				#pragma omp teams private(a)
				{
				a = 1;
				}

				// check that we store in a without looking at the parameter
				// TCHECK: define void @__omp_offloading_{{.+}}({{.+}})
				// TCHECK: [[A_ADDR:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[ATA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[ATE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[ATE]],
				// TCHECK: ret void

				#pragma omp target private(a, aa)
				#pragma omp teams private(a,aa)
				{
				a = 1;
				aa = 1;
				}

				// TCHECK: define void @__omp_offloading_{{.+}}()
				// TCHECK: [[ATA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2TA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[ATE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2TE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[ATE]],
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[A2TE]],
				// TCHECK: ret void

				#pragma omp target
				#pragma omp teams private(a,aa)
				{
				a = 1;
				aa = 1;
				}

				// TCHECK: define void @__omp_offloading_{{.+}}()
				// TCHECK: [[A:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[A]],
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[A2]],
				// TCHECK: ret void

				#pragma omp target firstprivate(a, aa)
				#pragma omp teams private(a,aa)
				{
				a = 1;
				aa = 1;

				aa = a+1;
				}

				// check that we are not using the firstprivate parameter
				// TCHECK: define void @__omp_offloading_{{.+}}({{.+}})
				// TCHECK: [[A_ADDR:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2_ADDR:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[ATA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2TA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[ATE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2TE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[ATE]],
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[A2TE]],
				// TCHECK: [[A_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[ATE]]
				// TCHECK: [[A_INC:%.+]] = add{{.+}} i{{[0-9]+}} [[A_VAL]], 1
				// TCHECK: [[CONV:%.+]] = trunc i{{[0-9]+}} [[A_INC]] to i{{[0-9]+}}
				// TCHECK: store i{{[0-9]+}} [[CONV]], i{{[0-9]+}}* [[A2TE]]
				// TCHECK: ret void

				#pragma omp target private(a, b,c, d)
				#pragma omp teams private(a, b, c, d)
				{
				a = 1;
				b[2] = 1.0;
				c[1][2] = 1.0;
				d.X = 1;
				d.Y = 1;
				}

				// TCHECK: define void @__omp_offloading_{{.+}}()
				// TCHECK: [[ATA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[BTA:%.+]] = alloca [10 x float],
				// TCHECK: [[CTA:%.+]] = alloca [5 x [10 x double]],
				// TCHECK: [[DTA:%.+]] = alloca [[TT]],
				// TCHECK: [[ATE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[BTE:%.+]] = alloca [10 x float],
				// TCHECK: [[CTE:%.+]] = alloca [5 x [10 x double]],
				// TCHECK: [[DTE:%.+]] = alloca [[TT]],
				// TCHECK: store {{.+}} [[ATE]],
				// TCHECK: [[B_IDX:%.+]] = getelementptr{{.+}} [[BTE]], {{.+}},
				// TCHECK: store {{.+}} [[B_IDX]],
				// TCHECK: [[C_IDX1:%.+]] = getelementptr {{.+}} [[CTE]], {{.+}}
				// TCHECK: [[C_IDX2:%.+]] = getelementptr {{.+}} [[C_IDX1]], {{.+}}
				// TCHECK: store {{.+}} [[C_IDX2]],
				// TCHECK: [[D_X:%.+]] = getelementptr {{.+}} [[DTE]], {{.+}}
				// TCHECK: store {{.+}} [[D_X]],
				// TCHECK: [[D_Y:%.+]] = getelementptr {{.+}} [[DTE]], {{.+}}
				// TCHECK: store {{.+}} [[D_Y]],

				#pragma omp target
				#pragma omp teams private(a, b, c, d)
				{
				a = 1;
				b[2] = 1.0;
				c[1][2] = 1.0;
				d.X = 1;
				d.Y = 1;
				}

				// TCHECK: define void @__omp_offloading_{{.+}}()
				// TCHECK: [[A:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[B:%.+]] = alloca [10 x float],
				// TCHECK: [[C:%.+]] = alloca [5 x [10 x double]],
				// TCHECK: [[D:%.+]] = alloca [[TT]],
				// TCHECK: store {{.+}} [[A]],
				// TCHECK: [[B_IDX:%.+]] = getelementptr{{.+}} [[B]], {{.+}},
				// TCHECK: store {{.+}} [[B_IDX]],
				// TCHECK: [[C_IDX1:%.+]] = getelementptr {{.+}} [[C]], {{.+}}
				// TCHECK: [[C_IDX2:%.+]] = getelementptr {{.+}} [[C_IDX1]], {{.+}}
				// TCHECK: store {{.+}} [[C_IDX2]],
				// TCHECK: [[D_X:%.+]] = getelementptr {{.+}} [[D]], {{.+}}
				// TCHECK: store {{.+}} [[D_X]],
				// TCHECK: [[D_Y:%.+]] = getelementptr {{.+}} [[D]], {{.+}}
				// TCHECK: store {{.+}} [[D_Y]],

				#pragma omp target firstprivate(a, b, c, d)
				#pragma omp teams private(a, b, c, d)
				{
				a = 1;
				b[2] = 1.0;
				c[1][2] = 1.0;
				d.X = 1;
				d.Y = 1;
				}

				// TCHECK: define void @__omp_offloading_{{.+}}({{.+}})
				// TCHECK: [[A_ADDR:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[B_ADDR:%.+]] = alloca [10 x float]*,
				// TCHECK: [[C_ADDR:%.+]] = alloca [5 x [10 x double]]*,
				// TCHECK: [[D_ADDR:%.+]] = alloca [[TT]]*,
				// TCHECK: [[ATA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[BTA:%.+]] = alloca [10 x float],
				// TCHECK: [[CTA:%.+]] = alloca [5 x [10 x double]],
				// TCHECK: [[DTA:%.+]] = alloca [[TT]],
				// TCHECK: [[ATE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[BTE:%.+]] = alloca [10 x float],
				// TCHECK: [[CTE:%.+]] = alloca [5 x [10 x double]],
				// TCHECK: [[DTE:%.+]] = alloca [[TT]],
				// TCHECK: store {{.+}} [[ATE]],
				// TCHECK: [[B_IDX:%.+]] = getelementptr{{.+}} [[BTE]], {{.+}},
				// TCHECK: store {{.+}} [[B_IDX]],
				// TCHECK: [[C_IDX1:%.+]] = getelementptr {{.+}} [[CTE]], {{.+}}
				// TCHECK: [[C_IDX2:%.+]] = getelementptr {{.+}} [[C_IDX1]], {{.+}}
				// TCHECK: store {{.+}} [[C_IDX2]],
				// TCHECK: [[D_X:%.+]] = getelementptr {{.+}} [[DTE]], {{.+}}
				// TCHECK: store {{.+}} [[D_X]],
				// TCHECK: [[D_Y:%.+]] = getelementptr {{.+}} [[DTE]], {{.+}}
				// TCHECK: store {{.+}} [[D_Y]],

				return a;
				}


				template<typename tx>
				tx ftemplate(int n) {
				tx a = 0;
				short aa = 0;
				tx b[10];

				#pragma omp target private(a,aa,b)
				#pragma omp teams private(a,aa,b)
				{
				a = 1;
				aa = 1;
				b[2] = 1;
				}

				#pragma omp target
				#pragma omp teams private(a,aa,b)
				{
				a = 1;
				aa = 1;
				b[2] = 1;
				}

				#pragma omp target firstprivate(a,aa,b)
				#pragma omp teams private(a,aa,b)
				{
				a = 1;
				aa = 1;
				b[2] = 1;
				}

				return a;
				}

				static
				int fstatic(int n) {
				int a = 0;
				short aa = 0;
				char aaa = 0;
				int b[10];

				#pragma omp target private(a,aa,aaa,b)
				#pragma omp teams private(a,aa,aaa,b)
				{
				a = 1;
				aa = 1;
				aaa = 1;
				b[2] = 1;
				}

				// TCHECK: define void @__omp_offloading_{{.+}}()
				// TCHECK: [[ATA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2TA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A3TA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[BTA:%.+]] = alloca [10 x i{{[0-9]+}}],
				// TCHECK: [[ATE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2TE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A3TE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[BTE:%.+]] = alloca [10 x i{{[0-9]+}}],
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[ATE]],
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[A2TE]],
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[A3TE]],
				// TCHECK: [[B_GEP:%.+]] = getelementptr inbounds [10 x i{{[0-9]+}}], [10 x i{{[0-9]+}}]* [[BTE]], i{{[0-9]+}} 0, i{{[0-9]+}} 2
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[B_GEP]],
				// TCHECK: ret void

				#pragma omp target
				#pragma omp teams private(a,aa,aaa,b)
				{
				a = 1;
				aa = 1;
				aaa = 1;
				b[2] = 1;
				}

				// TCHECK: define void @__omp_offloading_{{.+}}()
				// TCHECK: [[A:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A3:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[B:%.+]] = alloca [10 x i{{[0-9]+}}],
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[A]],
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[A2]],
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[A3]],
				// TCHECK: [[B_GEP:%.+]] = getelementptr inbounds [10 x i{{[0-9]+}}], [10 x i{{[0-9]+}}]* [[B]], i{{[0-9]+}} 0, i{{[0-9]+}} 2
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[B_GEP]],
				// TCHECK: ret void

				#pragma omp target firstprivate(a,aa,aaa,b)
				#pragma omp teams private(a,aa,aaa,b)
				{
				a = 1;
				aa = 1;
				aaa = 1;
				b[2] = 1;
				}

				// TCHECK: define void @__omp_offloading_{{.+}}({{.+}})
				// TCHECK: [[A_ADDR:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2_ADDR:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A3_ADDR:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[B_ADDR:%.+]] = alloca [10 x i{{[0-9]+}}]*,
				// TCHECK: [[ATA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2TA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A3TA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[BTA:%.+]] = alloca [10 x i{{[0-9]+}}],
				// TCHECK: [[ATE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2TE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A3TE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[BTE:%.+]] = alloca [10 x i{{[0-9]+}}],
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[ATE]],
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[A2TE]],
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[A3TE]],
				// TCHECK: [[B_GEP:%.+]] = getelementptr inbounds [10 x i{{[0-9]+}}], [10 x i{{[0-9]+}}]* [[BTE]], i{{[0-9]+}} 0, i{{[0-9]+}} 2
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[B_GEP]],
				// TCHECK: ret void

				return a;
				}

				struct S1 {
				double a;

				int r1(int n){
				int b = n+1;
				short int c[2][5];

				#pragma omp target private(b,c)
				#pragma omp teams private(b,c)
				{
				this->a = (double)b + 1.5;
				c[1][1] = ++a;
				}

				// TCHECK: define void @__omp_offloading_{{.+}}([[S1]]* [[TH:%.+]])
				// TCHECK: [[TH_ADDR:%.+]] = alloca [[S1]]*,
				// TCHECK: [[BTA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[CTA:%.+]] = alloca [2 x [5 x i{{[0-9]+}}]],
				// TCHECK: [[BTE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[CTE:%.+]] = alloca [2 x [5 x i{{[0-9]+}}]],
				// TCHECK: store [[S1]]* [[TH]], [[S1]]** [[TH_ADDR]],
				// TCHECK: [[TH_ADDR_REF:%.+]] = load [[S1]], [[S1]]* [[TH_ADDR]],

				// this->a = (double)b + 1.5;
				// TCHECK: [[B_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[BTE]],
				// TCHECK: [[B_CONV:%.+]] = sitofp i{{[0-9]+}} [[B_VAL]] to double
				// TCHECK: [[NEW_A_VAL:%.+]] = fadd double [[B_CONV]], 1.5{{.+}}+00
				// TCHECK: [[A_FIELD:%.+]] = getelementptr inbounds [[S1]], [[S1]]* [[TH_ADDR_REF]], i{{[0-9]+}} 0, i{{[0-9]+}} 0
				// TCHECK: store double [[NEW_A_VAL]], double* [[A_FIELD]],

				// c[1][1] = ++a;
				// TCHECK: [[A_FIELD4:%.+]] = getelementptr inbounds [[S1]], [[S1]]* [[TH_ADDR_REF]], i{{[0-9]+}} 0, i{{[0-9]+}} 0
				// TCHECK: [[A_FIELD4_VAL:%.+]] = load double, double* [[A_FIELD4]],
				// TCHECK: [[A_FIELD_INC:%.+]] = fadd double [[A_FIELD4_VAL]], 1.0{{.+}}+00
				// TCHECK: store double [[A_FIELD_INC]], double* [[A_FIELD4]],
				// TCHECK: [[A_FIELD_INC_CONV:%.+]] = fptosi double [[A_FIELD_INC]] to i{{[0-9]+}}
				// TCHECK: [[C_1_REF:%.+]] = getelementptr inbounds [2 x [5 x i{{[0-9]+}}]], [2 x [5 x i{{[0-9]+}}]]* [[CTE]], i{{[0-9]+}} 0, i{{[0-9]+}} 1
				// TCHECK: [[C_1_1_REF:%.+]] = getelementptr inbounds [5 x i{{[0-9]+}}], [5 x i{{[0-9]+}}]* [[C_1_REF]], i{{[0-9]+}} 0, i{{[0-9]+}} 1
				// TCHECK: store i{{[0-9]+}} [[A_FIELD_INC_CONV]], i{{[0-9]+}}* [[C_1_1_REF]],
				// TCHECK: ret void

				#pragma omp target
				#pragma omp teams private(b,c)
				{
				this->a = (double)b + 1.5;
				c[1][1] = ++a;
				}

				// TCHECK: define void @__omp_offloading_{{.+}}([[S1]]* [[TH:%.+]])
				// TCHECK: [[TH_ADDR:%.+]] = alloca [[S1]]*,
				// TCHECK: [[B:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[C:%.+]] = alloca [2 x [5 x i{{[0-9]+}}]],
				// TCHECK: store [[S1]]* [[TH]], [[S1]]** [[TH_ADDR]],
				// TCHECK: [[TH_ADDR_REF:%.+]] = load [[S1]], [[S1]]* [[TH_ADDR]],

				// this->a = (double)b + 1.5;
				// TCHECK: [[B_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[B]],
				// TCHECK: [[B_CONV:%.+]] = sitofp i{{[0-9]+}} [[B_VAL]] to double
				// TCHECK: [[NEW_A_VAL:%.+]] = fadd double [[B_CONV]], 1.5{{.+}}+00
				// TCHECK: [[A_FIELD:%.+]] = getelementptr inbounds [[S1]], [[S1]]* [[TH_ADDR_REF]], i{{[0-9]+}} 0, i{{[0-9]+}} 0
				// TCHECK: store double [[NEW_A_VAL]], double* [[A_FIELD]],

				// c[1][1] = ++a;
				// TCHECK: [[A_FIELD4:%.+]] = getelementptr inbounds [[S1]], [[S1]]* [[TH_ADDR_REF]], i{{[0-9]+}} 0, i{{[0-9]+}} 0
				// TCHECK: [[A_FIELD4_VAL:%.+]] = load double, double* [[A_FIELD4]],
				// TCHECK: [[A_FIELD_INC:%.+]] = fadd double [[A_FIELD4_VAL]], 1.0{{.+}}+00
				// TCHECK: store double [[A_FIELD_INC]], double* [[A_FIELD4]],
				// TCHECK: [[A_FIELD_INC_CONV:%.+]] = fptosi double [[A_FIELD_INC]] to i{{[0-9]+}}
				// TCHECK: [[C_1_REF:%.+]] = getelementptr inbounds [2 x [5 x i{{[0-9]+}}]], [2 x [5 x i{{[0-9]+}}]]* [[C]], i{{[0-9]+}} 0, i{{[0-9]+}} 1
				// TCHECK: [[C_1_1_REF:%.+]] = getelementptr inbounds [5 x i{{[0-9]+}}], [5 x i{{[0-9]+}}]* [[C_1_REF]], i{{[0-9]+}} 0, i{{[0-9]+}} 1
				// TCHECK: store i{{[0-9]+}} [[A_FIELD_INC_CONV]], i{{[0-9]+}}* [[C_1_1_REF]],
				// TCHECK: ret void

				#pragma omp target firstprivate(b,c)
				#pragma omp teams private(b,c)
				{
				this->a = (double)b + 1.5;
				c[1][1] = ++a;
				}

				return c[1][1] + (int)b;
				}

				// TCHECK: define void @__omp_offloading_{{.+}}([[S1]]* [[TH:%.+]], i{{[0-9]+}}{{.+}})
				// TCHECK: [[TH_ADDR:%.+]] = alloca [[S1]]*,
				// TCHECK: [[B_ADDR:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[C_ADDR:%.+]] = alloca [2 x [5 x i{{[0-9]+}}]]*,
				// TCHECK: [[BTA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[CTA:%.+]] = alloca [2 x [5 x i{{[0-9]+}}]],
				// TCHECK: [[BTE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[CTE:%.+]] = alloca [2 x [5 x i{{[0-9]+}}]],
				// TCHECK: store [[S1]]* [[TH]], [[S1]]** [[TH_ADDR]],
				// TCHECK: [[TH_ADDR_REF:%.+]] = load [[S1]], [[S1]]* [[TH_ADDR]],

				// this->a = (double)b + 1.5;
				// TCHECK: [[B_VAL:%.+]] = load i{{[0-9]+}}, i{{[0-9]+}}* [[BTE]],
				// TCHECK: [[B_CONV:%.+]] = sitofp i{{[0-9]+}} [[B_VAL]] to double
				// TCHECK: [[NEW_A_VAL:%.+]] = fadd double [[B_CONV]], 1.5{{.+}}+00
				// TCHECK: [[A_FIELD:%.+]] = getelementptr inbounds [[S1]], [[S1]]* [[TH_ADDR_REF]], i{{[0-9]+}} 0, i{{[0-9]+}} 0
				// TCHECK: store double [[NEW_A_VAL]], double* [[A_FIELD]],

				// c[1][1] = ++a;
				// TCHECK: [[A_FIELD4:%.+]] = getelementptr inbounds [[S1]], [[S1]]* [[TH_ADDR_REF]], i{{[0-9]+}} 0, i{{[0-9]+}} 0
				// TCHECK: [[A_FIELD4_VAL:%.+]] = load double, double* [[A_FIELD4]],
				// TCHECK: [[A_FIELD_INC:%.+]] = fadd double [[A_FIELD4_VAL]], 1.0{{.+}}+00
				// TCHECK: store double [[A_FIELD_INC]], double* [[A_FIELD4]],
				// TCHECK: [[A_FIELD_INC_CONV:%.+]] = fptosi double [[A_FIELD_INC]] to i{{[0-9]+}}
				// TCHECK: [[C_1_REF:%.+]] = getelementptr inbounds [2 x [5 x i{{[0-9]+}}]], [2 x [5 x i{{[0-9]+}}]]* [[CTE]], i{{[0-9]+}} 0, i{{[0-9]+}} 1
				// TCHECK: [[C_1_1_REF:%.+]] = getelementptr inbounds [5 x i{{[0-9]+}}], [5 x i{{[0-9]+}}]* [[C_1_REF]], i{{[0-9]+}} 0, i{{[0-9]+}} 1
				// TCHECK: store i{{[0-9]+}} [[A_FIELD_INC_CONV]], i{{[0-9]+}}* [[C_1_1_REF]],
				// TCHECK: ret void

				};


				int bar(int n){
				int a = 0;
				a += foo(n);
				S1 S;
				a += S.r1(n);
				a += fstatic(n);
				a += ftemplate<int>(n);

				return a;
				}

				// template
				// TCHECK: define void @__omp_offloading_{{.+}}()
				// TCHECK: [[ATA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2TA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[BTA:%.+]] = alloca [10 x i{{[0-9]+}}],
				// TCHECK: [[ATE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2TE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[BTE:%.+]] = alloca [10 x i{{[0-9]+}}],
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[ATE]],
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[A2TE]],
				// TCHECK: [[B_GEP:%.+]] = getelementptr inbounds [10 x i{{[0-9]+}}], [10 x i{{[0-9]+}}]* [[BTE]], i{{[0-9]+}} 0, i{{[0-9]+}} 2
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[B_GEP]],
				// TCHECK: ret void

				// TCHECK: define void @__omp_offloading_{{.+}}()
				// TCHECK: [[A:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[B:%.+]] = alloca [10 x i{{[0-9]+}}],
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[A]],
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[A2]],
				// TCHECK: [[B_GEP:%.+]] = getelementptr inbounds [10 x i{{[0-9]+}}], [10 x i{{[0-9]+}}]* [[B]], i{{[0-9]+}} 0, i{{[0-9]+}} 2
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[B_GEP]],
				// TCHECK: ret void

				// TCHECK: define void @__omp_offloading_{{.+}}({{.+}})
				// TCHECK: [[A_ADDR:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2_ADDR:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[B_ADDR:%.+]] = alloca [10 x i{{[0-9]+}}]*,
				// TCHECK: [[ATA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2TA:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[BTA:%.+]] = alloca [10 x i{{[0-9]+}}],
				// TCHECK: [[ATE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[A2TE:%.+]] = alloca i{{[0-9]+}},
				// TCHECK: [[BTE:%.+]] = alloca [10 x i{{[0-9]+}}],
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[ATE]],
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[A2TE]],
				// TCHECK: [[B_GEP:%.+]] = getelementptr inbounds [10 x i{{[0-9]+}}], [10 x i{{[0-9]+}}]* [[BTE]], i{{[0-9]+}} 0, i{{[0-9]+}} 2
				// TCHECK: store i{{[0-9]+}} 1, i{{[0-9]+}}* [[B_GEP]],
				// TCHECK: ret void

				#endif

This is an archive of the discontinued LLVM Phabricator instance.

[OPENMP] private and firstprivate clauses of teams code generation for nvptxAcceptedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 51080

lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp

test/OpenMP/nvptx_teams_firstprivate_codegen.cpp

test/OpenMP/nvptx_teams_private_codegen.cpp

[OPENMP] private and firstprivate clauses of teams code generation for nvptx
AcceptedPublic