This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
lib/CodeGen/
-
CodeGen/
6
CGOpenCLRuntime.cpp
-
test/CodeGenOpenCL/
-
CodeGenOpenCL/
1
cl20-device-side-enqueue.cl

Differential D23086

[OpenCL] Generate concrete struct type for ndrange_t
AbandonedPublic

Authored by ashi1 on Aug 2 2016, 2:10 PM.

Download Raw Diff

Details

Reviewers

majnemer
yaxunl
bader

Summary

ndrange_t needs to be emitted as a struct type since it has to be allocated on a stack as a local variable or function return.

Diff Detail

Repository: rL LLVM

Event Timeline

ashi1 updated this revision to Diff 66551.Aug 2 2016, 2:10 PM

ashi1 retitled this revision from to [OpenCL] Generate concrete struct type for ndrange_t.

ashi1 updated this object.

ashi1 added reviewers: bader, Anastasia, yaxunl.

ashi1 set the repository for this revision to rL LLVM.

ashi1 added a subscriber: cfe-commits.

ASTContext::getTypeInfoImpl in lib/AST/ASTContext.cpp needs to be updated for the real size of ndrange_t.

lib/CodeGen/CGOpenCLRuntime.cpp
43	struct name should be "struct.ndrange_t" to allow library code to access it.

bader added inline comments.Aug 3 2016, 12:26 AM

test/CodeGenOpenCL/cl20-device-side-enqueue.cl
12	Could you also add a regression test to validate that ndrange_t is defined in LLVM as struct type, please?

yaxunl added inline comments.Aug 3 2016, 7:43 AM

lib/CodeGen/CGOpenCLRuntime.cpp
43	Sorry, should be "struct.__ndrange_t" to avoid conflict with builtin type ndrange_t.

I don't think it works correctly yet.

Since ndrange_t is a copyable type i.e. we should be able to allocate the space for it at compile time and to copy it. See spec example s6.13.17.2:

ndrange_t ndrange = ndrange_1d(...);

This implies that compiler should:

Generate a proper copy of an object of ndrange_t. Currently this code:

ndrange_t n1; ndrange_t n2; n1 =n2;

is compiled to:

%ndrange_t = type { i32, [3 x i64], [3 x i64], [3 x i64] }
...
%n1 = alloca %ndrange_t
%n2 = alloca %ndrange_t
%0 = load %ndrange_t, %ndrange_t* %n2
store %ndrange_t %0, %ndrange_t* %n1

Which has a load and a store instruction of the whole struct size. This should ideally be handled with llvm.memcpy intrinsic like all other struct type objects.

Make sizeof(ndrange_t) return its actual size. This might be important to use this type correctly in the CL code.

The Spec doesn't list this type as an implementation defined behavior in s6.3.k

The behavior of applying the sizeof operator to the bool, image2d_t, image3d_t, image2d_array_t, image2d_depth_t, image2d_array_depth_t, image1d_t, image1d_buffer_t or image1d_array_t, sampler_t, clk_event_t, queue_t and event_t types is implementation-defined.

lib/CodeGen/CGOpenCLRuntime.cpp
43	Is there any conflict really? I think it should be Ok to keep "struct.ndrange_t", since we transform it to a struct and don't declare as struct.

yaxunl added inline comments.Aug 4 2016, 10:51 AM

lib/CodeGen/CGOpenCLRuntime.cpp
43	ndrange_t is defined as a builtin type in Clang, so library developer cannot implement it as a concrete type, but they can implement `__ndrange_t`. This is similar to the case of `as_type`.

Anastasia added inline comments.Aug 5 2016, 9:52 AM

lib/CodeGen/CGOpenCLRuntime.cpp
43	I am not sure I am following you, but I think this is different now from other OpenCL types because it's not an opaque type and this change gives it a concrete structure. Technically we can give it any name really struct.ndrange_t or `struct.__ndrange_t` would do because declaring a new type with ndrange_t identifier would be disallowed in OpenCL code and all `__` prefixed identifiers are reserved for internal toolchain use. But not sure why we would go for `struct.__ndrange_t`. We can leave this identifier to be still available for toolchain if needed.

This approach seems wrong to me.

Instead, why not just make ndrange_t a typedef of a real struct in Sema::Initialize?

lib/CodeGen/CGOpenCLRuntime.cpp
32–36	This formatting looks wrong.

This revision now requires changes to proceed.Aug 5 2016, 10:19 AM

In D23086#507055, @majnemer wrote:

This approach seems wrong to me.

Instead, why not just make ndrange_t a typedef of a real struct in Sema::Initialize?

ndrange_t is an OpenCL builtin type and the spec does not define its layout. There are sema checks which requires to check if a type is ndrange_t. How to do the check if it is defined as a typedef of a common struct type?

How about implement it as a child class of Type then typedef it?

In D23086#507055, @majnemer wrote:

This approach seems wrong to me.

Instead, why not just make ndrange_t a typedef of a real struct in Sema::Initialize?

I think we have an issue because in that case during the diagnostic of enqueue_kernel Builtin in SemaChecking.cpp we won't have a good way to identify this type apart from checking its name... That's why ideally we would like to leave it as a special Builtin type.

What kind of issues do you see with the current implementation?

Also do you suggest to create the struct in Sema::Initialize? Alternatively, we also have an OpenCL header file for such things.

In D23086#507098, @Anastasia wrote:

In D23086#507055, @majnemer wrote:

This approach seems wrong to me.

Instead, why not just make ndrange_t a typedef of a real struct in Sema::Initialize?

I think we have an issue because in that case during the diagnostic of enqueue_kernel Builtin in SemaChecking.cpp we won't have a good way to identify this type apart from checking its name...

That's OK, there are special types like this elsewhere (std::type_info from c++).

That's why ideally we would like to leave it as a special Builtin type.

What kind of issues do you see with the current implementation?

Also do you suggest to create the struct in Sema::Initialize? Alternatively, we also have an OpenCL header file for such things.

How about assuming ndrange_t is a struct type defined by user and identify it by struct type name in Clang? This gives user freedom of implementing it differently than SPIR. In opencl-c.h define it as a struct type as SPIR required.

In D23086#507203, @yaxunl wrote:

How about assuming ndrange_t is a struct type defined by user and identify it by struct type name in Clang? This gives user freedom of implementing it differently than SPIR. In opencl-c.h define it as a struct type as SPIR required.

That sounds fine to me.

In D23086#507203, @yaxunl wrote:

How about assuming ndrange_t is a struct type defined by user and identify it by struct type name in Clang? This gives user freedom of implementing it differently than SPIR. In opencl-c.h define it as a struct type as SPIR required.

Anastasia, are you OK with this approach? Thanks.

In D23086#507360, @yaxunl wrote:

In D23086#507203, @yaxunl wrote:

How about assuming ndrange_t is a struct type defined by user and identify it by struct type name in Clang? This gives user freedom of implementing it differently than SPIR. In opencl-c.h define it as a struct type as SPIR required.

Anastasia, are you OK with this approach? Thanks.

Sure. It seems there is no better solution to this.

Actually we need to assume ndrange_t is a user defined typedef instead of struct due to the way its used in the spec, e.g.

void f(ndrange_t x);

instead of

void f(struct ndrange_t x);

Then it may be a little bit tricky to decide whether a type is ndrange_t type. Any suggestions?

How about we decide if a type is ndrange_t type based on their canonical types. If the canonical type of type X is the same as the canonical type of ndrange_t type, then type X is treated as ndrange_t type. Is this reasonable?

In D23086#514279, @yaxunl wrote:

How about we decide if a type is ndrange_t type based on their canonical types. If the canonical type of type X is the same as the canonical type of ndrange_t type, then type X is treated as ndrange_t type. Is this reasonable?

I am not sure I understand entirely what you mean?

Following the earlier suggestion from David, I think we can just create a struct type internally and then typedef it to ndrange_t, we can use buildImplicitRecord and addImplicitTypedef methods I believe. The latter one has already been used for other OpenCL types.

We will have to switch to string comparisons to identify this type in SemaChecking.cpp and CGBuiltins.cpp for handling the enqueue_kernel call.

In D23086#515443, @Anastasia wrote:

In D23086#514279, @yaxunl wrote:

How about we decide if a type is ndrange_t type based on their canonical types. If the canonical type of type X is the same as the canonical type of ndrange_t type, then type X is treated as ndrange_t type. Is this reasonable?

I am not sure I understand entirely what you mean?

Following the earlier suggestion from David, I think we can just create a struct type internally and then typedef it to ndrange_t, we can use buildImplicitRecord and addImplicitTypedef methods I believe. The latter one has already been used for other OpenCL types.

We will have to switch to string comparisons to identify this type in SemaChecking.cpp and CGBuiltins.cpp for handling the enqueue_kernel call.

This was not the approach we agreed upon.

The approach we agreed upon was

In D23086#507215, @majnemer wrote:

In D23086#507203, @yaxunl wrote:

How about assuming ndrange_t is a struct type defined by user and identify it by struct type name in Clang? This gives user freedom of implementing it differently than SPIR. In opencl-c.h define it as a struct type as SPIR required.

That sounds fine to me.

The issue of your approach is that vendors lose the freedom to define ndrange_t the way they like.

In D23086#515468, @yaxunl wrote:

In D23086#515443, @Anastasia wrote:

In D23086#514279, @yaxunl wrote:

How about we decide if a type is ndrange_t type based on their canonical types. If the canonical type of type X is the same as the canonical type of ndrange_t type, then type X is treated as ndrange_t type. Is this reasonable?

I am not sure I understand entirely what you mean?

Following the earlier suggestion from David, I think we can just create a struct type internally and then typedef it to ndrange_t, we can use buildImplicitRecord and addImplicitTypedef methods I believe. The latter one has already been used for other OpenCL types.

We will have to switch to string comparisons to identify this type in SemaChecking.cpp and CGBuiltins.cpp for handling the enqueue_kernel call.

This was not the approach we agreed upon.

The approach we agreed upon was

In D23086#507215, @majnemer wrote:

In D23086#507203, @yaxunl wrote:

How about assuming ndrange_t is a struct type defined by user and identify it by struct type name in Clang? This gives user freedom of implementing it differently than SPIR. In opencl-c.h define it as a struct type as SPIR required.

That sounds fine to me.

The issue of your approach is that vendors lose the freedom to define ndrange_t the way they like.

Surely vendors can re-implement all OpenCL types with an implicit typedef. For example this would just work:

typedef int queue_t;
void bar(queue_t q);

I am afraid we will need to provide some implementation to ndrange_t in Clang itself, otherwise I don't see how it could work. Also it would be good to offer standard functionality without any extra includes just like it worked up to now for all other features.

In D23086#515506, @Anastasia wrote:

Surely vendors can re-implement all OpenCL types with an implicit typedef. For example this would just work:
typedef int queue_t;
void bar(queue_t q);
I am afraid we will need to provide some implementation to ndrange_t in Clang itself, otherwise I don't see how it could work. Also it would be good to offer standard functionality without any extra includes just like it worked up to now for all other features.

It will work if we have a way to identify whether a type is ndrange_t type. So far I have suggested to compare the canonical types. Basically when we see a declaration of typedef with name 'ndrange_t', we save its canonical type X to ASTContext. When we need to check if type Y is ndrange_t, we just need to check Y->getCanonicalType() == X.

On the other hand, pre-define ndrange_t in Clang does not solve the problem of how to decide if a type is ndrange_t, since user can define typedefs for ndrange_t, e.g.

typedef ndrange_t my_ndrange_t;

In this case, we still need to check the canonical type.

In D23086#515590, @yaxunl wrote:
In D23086#515506, @Anastasia wrote:
Surely vendors can re-implement all OpenCL types with an implicit typedef. For example this would just work:
typedef int queue_t;
void bar(queue_t q);
I am afraid we will need to provide some implementation to ndrange_t in Clang itself, otherwise I don't see how it could work. Also it would be good to offer standard functionality without any extra includes just like it worked up to now for all other features.
It will work if we have a way to identify whether a type is ndrange_t type. So far I have suggested to compare the canonical types. Basically when we see a declaration of typedef with name 'ndrange_t', we save its canonical type X to ASTContext. When we need to check if type Y is ndrange_t, we just need to check Y->getCanonicalType() == X.

On the other hand, pre-define ndrange_t in Clang does not solve the problem of how to decide if a type is ndrange_t, since user can define typedefs for ndrange_t, e.g.
typedef ndrange_t my_ndrange_t;
In this case, we still need to check the canonical type.

Why not to just identify the type by the name? It seems much easier and also gives flexibility to implement the type in different ways if needed. Considering that similar approach is already used for some C++ types, it should be fine.

In D23086#516365, @Anastasia wrote:

Why not to just identify the type by the name? It seems much easier and also gives flexibility to implement the type in different ways if needed. Considering that similar approach is already used for some C++ types, it should be fine.

This does not work. For example, user could define

typedef ndrange_t my_ndrange_t;

and pass a variable X of type my_ndrange_t to enqueue_kernel. If we identify ndrange_t by type name, X would be identified as not an ndrange_t type and an error will be emitted, which is wrong. Therefore we have to check the canonical type.

In D23086#516741, @yaxunl wrote:
In D23086#516365, @Anastasia wrote:

Why not to just identify the type by the name? It seems much easier and also gives flexibility to implement the type in different ways if needed. Considering that similar approach is already used for some C++ types, it should be fine.

This does not work. For example, user could define
typedef ndrange_t my_ndrange_t;
and pass a variable X of type my_ndrange_t to enqueue_kernel. If we identify ndrange_t by type name, X would be identified as not an ndrange_t type and an error will be emitted, which is wrong. Therefore we have to check the canonical type.

Right. I am wondering if in C++ use cases, that David have mentioned, they already handle this.

@majnemer, do you have any solution for this issue in your use cases? Checking for the name of the type seems to be unsafe in case of custom typedefs creating aliases of the same type with different names? Also forcing the structure in Clang is undesirable for us because vendors might have different (more suitable) implementations of the type.

Just to summarize, it seems there are the following options to proceed, each has some benefits and disadvantages:

We can check the canonical type. This gives us possibility to accept the type name aliases, but reduces flexibility to implement this type in a custom way.
Identify the type by its name. The issue here is that any type alias via typedef will be treated as a different to ndrange_t (event though it's locally not). One solution to this issue could be to check the typedef chain? At the end I am not sure how critical this is at all to be able to create a typedef of ndrange_t. This approach generally allows vendors to define the type in any suitable way.
We can also drop semantical checks for this type completely and let it to a vendor implementation to define and manipulate the type. This offers the most of flexibility but moves more responsibilities to vendor implementations too.

I am more in favor of either 1st or 2nd option, but the last one could work too.

Anastasia removed a reviewer: Anastasia.Aug 19 2016, 8:49 AM

Anastasia added subscribers: Anastasia, dmitry.

ashi1 abandoned this revision.Aug 24 2020, 12:34 PM

Revision Contents

Path

Size

lib/

CodeGen/

	CGOpenCLRuntime.cpp
	CGOpenCLRuntime.cpp (revision 277508)

21 lines

test/

CodeGenOpenCL/

	cl20-device-side-enqueue.cl
	cl20-device-side-enqueue.cl (revision 277508)

26 lines

Diff 66551

lib/CodeGen/CGOpenCLRuntime.cpp

Context not available.

	CGOpenCLRuntime::~CGOpenCLRuntime() {}	CGOpenCLRuntime::~CGOpenCLRuntime() {}

		//// \brief Defining ndrange_t as mentioned in the SPIR 2.0 spec.
		static llvm::StructType*
		getStructTyForNDRange(CodeGenModule &CGM) {

		llvm::SmallVector<llvm::Type*,4> EleTypes;
		llvm::Type* ArrEleType =
		llvm::ArrayType::get(llvm::IntegerType::
		get(CGM.getLLVMContext(),
		CGM.PointerWidthInBits), 3);
		majnemerUnsubmitted Not Done Reply Inline Actions This formatting looks wrong. majnemer: This formatting looks wrong.

		EleTypes.push_back(CGM.Int32Ty); // work_dim
		EleTypes.push_back(ArrEleType); // global_work_offset
		EleTypes.push_back(ArrEleType); // global_work_size
		EleTypes.push_back(ArrEleType); // local_work_size

		return llvm::StructType::create(EleTypes, "ndrange_t");
		yaxunlUnsubmitted Not Done Reply Inline Actions struct name should be "struct.ndrange_t" to allow library code to access it. yaxunl: struct name should be "struct.ndrange_t" to allow library code to access it.
		yaxunlUnsubmitted Not Done Reply Inline Actions Sorry, should be "struct.__ndrange_t" to avoid conflict with builtin type ndrange_t. yaxunl: Sorry, should be "struct.__ndrange_t" to avoid conflict with builtin type ndrange_t.
		AnastasiaUnsubmitted Not Done Reply Inline Actions Is there any conflict really? I think it should be Ok to keep "struct.ndrange_t", since we transform it to a struct and don't declare as struct. Anastasia: Is there any conflict really? I think it should be Ok to keep "struct.ndrange_t", since we…
		yaxunlUnsubmitted Not Done Reply Inline Actions ndrange_t is defined as a builtin type in Clang, so library developer cannot implement it as a concrete type, but they can implement `__ndrange_t`. This is similar to the case of `as_type`. yaxunl: ndrange_t is defined as a builtin type in Clang, so library developer cannot implement it as a…
		AnastasiaUnsubmitted Not Done Reply Inline Actions I am not sure I am following you, but I think this is different now from other OpenCL types because it's not an opaque type and this change gives it a concrete structure. Technically we can give it any name really struct.ndrange_t or `struct.__ndrange_t` would do because declaring a new type with ndrange_t identifier would be disallowed in OpenCL code and all `__` prefixed identifiers are reserved for internal toolchain use. But not sure why we would go for `struct.__ndrange_t`. We can leave this identifier to be still available for toolchain if needed. Anastasia: I am not sure I am following you, but I think this is different now from other OpenCL types…
		}

	void CGOpenCLRuntime::EmitWorkGroupLocalVarDecl(CodeGenFunction &CGF,	void CGOpenCLRuntime::EmitWorkGroupLocalVarDecl(CodeGenFunction &CGF,
	const VarDecl &D) {	const VarDecl &D) {
	return CGF.EmitStaticVarDecl(D, llvm::GlobalValue::InternalLinkage);	return CGF.EmitStaticVarDecl(D, llvm::GlobalValue::InternalLinkage);
Context not available.
	return llvm::PointerType::get(	return llvm::PointerType::get(
	llvm::StructType::create(Ctx, "opencl.queue_t"), 0);	llvm::StructType::create(Ctx, "opencl.queue_t"), 0);
	case BuiltinType::OCLNDRange:	case BuiltinType::OCLNDRange:
	return llvm::PointerType::get(	return getStructTyForNDRange(CGM);
	llvm::StructType::create(Ctx, "opencl.ndrange_t"), 0);
	case BuiltinType::OCLReserveID:	case BuiltinType::OCLReserveID:
	return llvm::PointerType::get(	return llvm::PointerType::get(
	llvm::StructType::create(Ctx, "opencl.reserve_id_t"), 0);	llvm::StructType::create(Ctx, "opencl.reserve_id_t"), 0);
Context not available.

test/CodeGenOpenCL/cl20-device-side-enqueue.cl

Context not available.
	queue_t default_queue;	queue_t default_queue;
	// CHECK: %flags = alloca i32	// CHECK: %flags = alloca i32
	unsigned flags = 0;	unsigned flags = 0;
	// CHECK: %ndrange = alloca %opencl.ndrange_t*	// CHECK: %ndrange = alloca %ndrange_t
		baderUnsubmitted Not Done Reply Inline Actions Could you also add a regression test to validate that ndrange_t is defined in LLVM as struct type, please? bader: Could you also add a regression test to validate that ndrange_t is defined in LLVM as struct…
	ndrange_t ndrange;	ndrange_t ndrange;
	// CHECK: %clk_event = alloca %opencl.clk_event_t*	// CHECK: %clk_event = alloca %opencl.clk_event_t*
	clk_event_t clk_event;	clk_event_t clk_event;
Context not available.

	// CHECK: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t, %opencl.queue_t* %default_queue	// CHECK: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t, %opencl.queue_t* %default_queue
	// CHECK: [[FLAGS:%[0-9]+]] = load i32, i32* %flags	// CHECK: [[FLAGS:%[0-9]+]] = load i32, i32* %flags
	// CHECK: [[NDR:%[0-9]+]] = load %opencl.ndrange_t, %opencl.ndrange_t* %ndrange	// CHECK: [[NDR:%[0-9]+]] = load %ndrange_t, %ndrange_t* %ndrange
	// CHECK: [[BL:%[0-9]+]] = bitcast <{ i8, i32, i32, i8, %struct.__block_descriptor, i32{{.}}, i32{{.}}, i32{{.}} }>* %block to void ()*	// CHECK: [[BL:%[0-9]+]] = bitcast <{ i8, i32, i32, i8, %struct.__block_descriptor, i32{{.}}, i32{{.}}, i32{{.}} }>* %block to void ()*
	// CHECK: [[BL_I8:%[0-9]+]] = bitcast void ()* [[BL]] to i8*	// CHECK: [[BL_I8:%[0-9]+]] = bitcast void ()* [[BL]] to i8*
	// CHECK: call i32 @__enqueue_kernel_basic(%opencl.queue_t* [[DEF_Q]], i32 [[FLAGS]], %opencl.ndrange_t* [[NDR]], i8* [[BL_I8]])	// CHECK: call i32 @__enqueue_kernel_basic(%opencl.queue_t* [[DEF_Q]], i32 [[FLAGS]], %ndrange_t [[NDR]], i8* [[BL_I8]])
	enqueue_kernel(default_queue, flags, ndrange,	enqueue_kernel(default_queue, flags, ndrange,
	^(void) {	^(void) {
	a[i] = b[i];	a[i] = b[i];
Context not available.

	// CHECK: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t, %opencl.queue_t* %default_queue	// CHECK: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t, %opencl.queue_t* %default_queue
	// CHECK: [[FLAGS:%[0-9]+]] = load i32, i32* %flags	// CHECK: [[FLAGS:%[0-9]+]] = load i32, i32* %flags
	// CHECK: [[NDR:%[0-9]+]] = load %opencl.ndrange_t, %opencl.ndrange_t* %ndrange	// CHECK: [[NDR:%[0-9]+]] = load %ndrange_t, %ndrange_t* %ndrange
	// CHECK: [[BL:%[0-9]+]] = bitcast <{ i8, i32, i32, i8, %struct.__block_descriptor, i32{{.}}, i32{{.}}, i32{{.}} }>* %block3 to void ()*	// CHECK: [[BL:%[0-9]+]] = bitcast <{ i8, i32, i32, i8, %struct.__block_descriptor, i32{{.}}, i32{{.}}, i32{{.}} }>* %block3 to void ()*
	// CHECK: [[BL_I8:%[0-9]+]] = bitcast void ()* [[BL]] to i8*	// CHECK: [[BL_I8:%[0-9]+]] = bitcast void ()* [[BL]] to i8*
	// CHECK: call i32 @__enqueue_kernel_basic_events(%opencl.queue_t* [[DEF_Q]], i32 [[FLAGS]], %opencl.ndrange_t* [[NDR]], i32 2, %opencl.clk_event_t %event_wait_list, %opencl.clk_event_t %clk_event, i8* [[BL_I8]])	// CHECK: call i32 @__enqueue_kernel_basic_events(%opencl.queue_t* [[DEF_Q]], i32 [[FLAGS]], %ndrange_t [[NDR]], i32 2, %opencl.clk_event_t %event_wait_list, %opencl.clk_event_t %clk_event, i8* [[BL_I8]])
	enqueue_kernel(default_queue, flags, ndrange, 2, &event_wait_list, &clk_event,	enqueue_kernel(default_queue, flags, ndrange, 2, &event_wait_list, &clk_event,
	^(void) {	^(void) {
	a[i] = b[i];	a[i] = b[i];
Context not available.

	// CHECK: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t, %opencl.queue_t* %default_queue	// CHECK: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t, %opencl.queue_t* %default_queue
	// CHECK: [[FLAGS:%[0-9]+]] = load i32, i32* %flags	// CHECK: [[FLAGS:%[0-9]+]] = load i32, i32* %flags
	// CHECK: [[NDR:%[0-9]+]] = load %opencl.ndrange_t, %opencl.ndrange_t* %ndrange	// CHECK: [[NDR:%[0-9]+]] = load %ndrange_t, %ndrange_t* %ndrange
	// CHECK: call i32 (%opencl.queue_t, i32, %opencl.ndrange_t, i8, i32, ...) @__enqueue_kernel_vaargs(%opencl.queue_t [[DEF_Q]], i32 [[FLAGS]], %opencl.ndrange_t* [[NDR]], i8* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor* }* @__block_literal_global{{(.[0-9]+)?}} to i8*), i32 1, i32 256)	// CHECK: call i32 (%opencl.queue_t, i32, %ndrange_t, i8, i32, ...) @__enqueue_kernel_vaargs(%opencl.queue_t* [[DEF_Q]], i32 [[FLAGS]], %ndrange_t [[NDR]], i8* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor* }* @__block_literal_global{{(.[0-9]+)?}} to i8*), i32 1, i32 256)
	enqueue_kernel(default_queue, flags, ndrange,	enqueue_kernel(default_queue, flags, ndrange,
	^(local void *p) {	^(local void *p) {
	return;	return;
Context not available.
	char c;	char c;
	// CHECK: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t, %opencl.queue_t* %default_queue	// CHECK: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t, %opencl.queue_t* %default_queue
	// CHECK: [[FLAGS:%[0-9]+]] = load i32, i32* %flags	// CHECK: [[FLAGS:%[0-9]+]] = load i32, i32* %flags
	// CHECK: [[NDR:%[0-9]+]] = load %opencl.ndrange_t, %opencl.ndrange_t* %ndrange	// CHECK: [[NDR:%[0-9]+]] = load %ndrange_t, %ndrange_t* %ndrange
	// CHECK: [[SIZE:%[0-9]+]] = zext i8 {{%[0-9]+}} to i32	// CHECK: [[SIZE:%[0-9]+]] = zext i8 {{%[0-9]+}} to i32
	// CHECK: call i32 (%opencl.queue_t, i32, %opencl.ndrange_t, i8, i32, ...) @__enqueue_kernel_vaargs(%opencl.queue_t [[DEF_Q]], i32 [[FLAGS]], %opencl.ndrange_t* [[NDR]], i8* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor* }* @__block_literal_global{{(.[0-9]+)?}} to i8*), i32 1, i32 [[SIZE]])	// CHECK: call i32 (%opencl.queue_t, i32, %ndrange_t, i8, i32, ...) @__enqueue_kernel_vaargs(%opencl.queue_t* [[DEF_Q]], i32 [[FLAGS]], %ndrange_t [[NDR]], i8* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor* }* @__block_literal_global{{(.[0-9]+)?}} to i8*), i32 1, i32 [[SIZE]])
	enqueue_kernel(default_queue, flags, ndrange,	enqueue_kernel(default_queue, flags, ndrange,
	^(local void *p) {	^(local void *p) {
	return;	return;
Context not available.

	// CHECK: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t, %opencl.queue_t* %default_queue	// CHECK: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t, %opencl.queue_t* %default_queue
	// CHECK: [[FLAGS:%[0-9]+]] = load i32, i32* %flags	// CHECK: [[FLAGS:%[0-9]+]] = load i32, i32* %flags
	// CHECK: [[NDR:%[0-9]+]] = load %opencl.ndrange_t, %opencl.ndrange_t* %ndrange	// CHECK: [[NDR:%[0-9]+]] = load %ndrange_t, %ndrange_t* %ndrange
	// CHECK: [[AD:%arraydecay[0-9]]] = getelementptr inbounds [1 x %opencl.clk_event_t], [1 x %opencl.clk_event_t] %event_wait_list2, i32 0, i32 0	// CHECK: [[AD:%arraydecay[0-9]]] = getelementptr inbounds [1 x %opencl.clk_event_t], [1 x %opencl.clk_event_t] %event_wait_list2, i32 0, i32 0
	// CHECK: call i32 (%opencl.queue_t, i32, %opencl.ndrange_t, i32, %opencl.clk_event_t, %opencl.clk_event_t, i8, i32, ...) @__enqueue_kernel_events_vaargs(%opencl.queue_t [[DEF_Q]], i32 [[FLAGS]], %opencl.ndrange_t* [[NDR]], i32 2, %opencl.clk_event_t [[AD]], %opencl.clk_event_t %clk_event, i8* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor* }* @__block_literal_global{{(.[0-9]+)?}} to i8*), i32 1, i32 256)	// CHECK: call i32 (%opencl.queue_t, i32, %ndrange_t, i32, %opencl.clk_event_t, %opencl.clk_event_t, i8, i32, ...) @__enqueue_kernel_events_vaargs(%opencl.queue_t* [[DEF_Q]], i32 [[FLAGS]], %ndrange_t [[NDR]], i32 2, %opencl.clk_event_t [[AD]], %opencl.clk_event_t %clk_event, i8* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor* }* @__block_literal_global{{(.[0-9]+)?}} to i8*), i32 1, i32 256)
	enqueue_kernel(default_queue, flags, ndrange, 2, event_wait_list2, &clk_event,	enqueue_kernel(default_queue, flags, ndrange, 2, event_wait_list2, &clk_event,
	^(local void *p) {	^(local void *p) {
	return;	return;
Context not available.

	// CHECK: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t, %opencl.queue_t* %default_queue	// CHECK: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t, %opencl.queue_t* %default_queue
	// CHECK: [[FLAGS:%[0-9]+]] = load i32, i32* %flags	// CHECK: [[FLAGS:%[0-9]+]] = load i32, i32* %flags
	// CHECK: [[NDR:%[0-9]+]] = load %opencl.ndrange_t, %opencl.ndrange_t* %ndrange	// CHECK: [[NDR:%[0-9]+]] = load %ndrange_t, %ndrange_t* %ndrange
	// CHECK: [[AD:%arraydecay[0-9]]] = getelementptr inbounds [1 x %opencl.clk_event_t], [1 x %opencl.clk_event_t] %event_wait_list2, i32 0, i32 0	// CHECK: [[AD:%arraydecay[0-9]]] = getelementptr inbounds [1 x %opencl.clk_event_t], [1 x %opencl.clk_event_t] %event_wait_list2, i32 0, i32 0
	// CHECK: [[SIZE:%[0-9]+]] = zext i8 {{%[0-9]+}} to i32	// CHECK: [[SIZE:%[0-9]+]] = zext i8 {{%[0-9]+}} to i32
	// CHECK: call i32 (%opencl.queue_t, i32, %opencl.ndrange_t, i32, %opencl.clk_event_t, %opencl.clk_event_t, i8, i32, ...) @__enqueue_kernel_events_vaargs(%opencl.queue_t [[DEF_Q]], i32 [[FLAGS]], %opencl.ndrange_t* [[NDR]], i32 2, %opencl.clk_event_t [[AD]], %opencl.clk_event_t %clk_event, i8* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor* }* @__block_literal_global{{(.[0-9]+)?}} to i8*), i32 1, i32 [[SIZE]])	// CHECK: call i32 (%opencl.queue_t, i32, %ndrange_t, i32, %opencl.clk_event_t, %opencl.clk_event_t, i8, i32, ...) @__enqueue_kernel_events_vaargs(%opencl.queue_t* [[DEF_Q]], i32 [[FLAGS]], %ndrange_t [[NDR]], i32 2, %opencl.clk_event_t [[AD]], %opencl.clk_event_t %clk_event, i8* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor* }* @__block_literal_global{{(.[0-9]+)?}} to i8*), i32 1, i32 [[SIZE]])
	enqueue_kernel(default_queue, flags, ndrange, 2, event_wait_list2, &clk_event,	enqueue_kernel(default_queue, flags, ndrange, 2, event_wait_list2, &clk_event,
	^(local void *p) {	^(local void *p) {
	return;	return;
Context not available.