This is an archive of the discontinued LLVM Phabricator instance.

[OpenCL] Do not use vararg in emitted functions for enqueue_kernel
ClosedPublic

Authored by yaxunl on Aug 14 2017, 6:39 AM.

Details

Reviewers

Anastasia
bader
b-sumner

Commits

rG29a5ee358e54: [OpenCL] Do not use vararg in emitted functions for enqueue_kernel
rC312441: [OpenCL] Do not use vararg in emitted functions for enqueue_kernel
rL312441: [OpenCL] Do not use vararg in emitted functions for enqueue_kernel

Summary

Not all targets support vararg (e.g. amdgpu). Instead of using vararg in the emitted functions for enqueue_kernel,
this patch creates a temporary array of size_t, stores the size arguments in the temporary array
and passes it to the emitted functions for enqueue_kernel.
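
As an illustration only, the change in calling convention described above can be sketched in plain C++. The names `sumSizesVariadic`, `sumSizesArray`, and `callWithThreeSizes` are hypothetical stand-ins, not the real `__enqueue_kernel_vaargs` runtime entry points, and the bodies merely sum the sizes so that the two conventions can be compared.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>

// Hypothetical stand-in for the old convention: the callee walks a va_list
// to read NumSizes trailing size arguments. This requires vararg support
// from the target.
std::size_t sumSizesVariadic(int NumSizes, ...) {
  va_list Ap;
  va_start(Ap, NumSizes);
  std::size_t Total = 0;
  for (int I = 0; I < NumSizes; ++I)
    Total += va_arg(Ap, std::size_t);
  va_end(Ap);
  return Total;
}

// Hypothetical stand-in for the new convention: the callee receives the
// count plus a pointer to a caller-owned array of size_t, so the function
// has a fixed signature and needs no vararg support.
std::size_t sumSizesArray(int NumSizes, const std::size_t *Sizes) {
  assert(NumSizes >= 0 && "negative size count");
  std::size_t Total = 0;
  for (int I = 0; I < NumSizes; ++I)
    Total += Sizes[I];
  return Total;
}

// Caller side of the new convention: the sizes are stored in a temporary
// array first, then the array is passed along with its length.
std::size_t callWithThreeSizes() {
  std::size_t Sizes[3] = {1, 2, 4};
  return sumSizesArray(3, Sizes);
}
```

Both conventions carry the same information; the array form only moves the sizes from the argument list into memory owned by the caller.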

Diff Detail

Repository: rL LLVM

Event Timeline

yaxunl created this revision. Aug 14 2017, 6:39 AM

Herald added a subscriber: tpr. Aug 14 2017, 6:39 AM

Anastasia added inline comments. Aug 15 2017, 10:23 AM

test/CodeGenOpenCL/cl20-device-side-enqueue.cl
64 ↗	(On Diff #110956)	This is no longer needed? Could we check the code for array generation too? Also, could we modify one test to pass more than one argument to the block? That case seems to be missing from the tests.

yaxunl marked an inline comment as done. Aug 15 2017, 8:36 PM

yaxunl added inline comments.

test/CodeGenOpenCL/cl20-device-side-enqueue.cl
64 ↗	(On Diff #110956)	will do.

Revised per Anastasia's comments.

Anastasia added inline comments. Aug 22 2017, 11:21 AM

test/CodeGenOpenCL/cl20-device-side-enqueue.cl
116 ↗	(On Diff #111302)	You are not checking the arrays in the other calls too?

yaxunl marked an inline comment as done. Aug 22 2017, 8:24 PM

yaxunl added inline comments.

test/CodeGenOpenCL/cl20-device-side-enqueue.cl
116 ↗	(On Diff #111302)	The logic is the same, and the same lambda is called to emit the IR. Is it necessary to repeat the check for all the cases?

Anastasia added inline comments. Aug 29 2017, 3:57 AM

test/CodeGenOpenCL/cl20-device-side-enqueue.cl
116 ↗	(On Diff #111302)	Ideally yes, we are doing this for other features too... there is only one element in other cases... should be easier.

update tests.

LGTM. Thanks!

This revision is now accepted and ready to land. Sep 1 2017, 9:36 AM

Closed by commit rL312441: [OpenCL] Do not use vararg in emitted functions for enqueue_kernel (authored by yaxunl). Sep 3 2017, 6:54 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
cfe/trunk/lib/CodeGen/CGBuiltin.cpp	59 lines
cfe/trunk/test/CodeGenOpenCL/cl20-device-side-enqueue.cl	90 lines

Diff 113694

cfe/trunk/lib/CodeGen/CGBuiltin.cpp

[2,595 lines not shown] if (NumArgs == 4) {
       auto RTCall =
           Builder.CreateCall(CGM.CreateRuntimeFunction(FTy, Name, ByValAttrSet),
                              {Queue, Flags, Range, Block});
       RTCall->setAttributes(ByValAttrSet);
       return RValue::get(RTCall);
     }
     assert(NumArgs >= 5 && "Invalid enqueue_kernel signature");

+    // Create a temporary array to hold the sizes of local pointer arguments
+    // for the block. \p First is the position of the first size argument.
+    auto CreateArrayForSizeVar = [=](unsigned First) {
+      auto *AT = llvm::ArrayType::get(SizeTy, NumArgs - First);
+      auto *Arr = Builder.CreateAlloca(AT);
+      llvm::Value *Ptr;
+      // Each of the following arguments specifies the size of the corresponding
+      // argument passed to the enqueued block.
+      auto *Zero = llvm::ConstantInt::get(IntTy, 0);
+      for (unsigned I = First; I < NumArgs; ++I) {
+        auto *Index = llvm::ConstantInt::get(IntTy, I - First);
+        auto *GEP = Builder.CreateGEP(Arr, {Zero, Index});
+        if (I == First)
+          Ptr = GEP;
+        auto *V =
+            Builder.CreateZExtOrTrunc(EmitScalarExpr(E->getArg(I)), SizeTy);
+        Builder.CreateAlignedStore(
+            V, GEP, CGM.getDataLayout().getPrefTypeAlignment(SizeTy));
+      }
+      return Ptr;
+    };
+
     // Could have events and/or vaargs.
     if (E->getArg(3)->getType()->isBlockPointerType()) {
       // No events passed, but has variadic arguments.
       Name = "__enqueue_kernel_vaargs";
-      llvm::Value *Block = Builder.CreatePointerCast(
-          EmitScalarExpr(E->getArg(3)), GenericVoidPtrTy);
+      auto *Block = Builder.CreatePointerCast(EmitScalarExpr(E->getArg(3)),
+                                              GenericVoidPtrTy);
+      auto *PtrToSizeArray = CreateArrayForSizeVar(4);

       // Create a vector of the arguments, as well as a constant value to
       // express to the runtime the number of variadic arguments.
-      std::vector<llvm::Value *> Args = {Queue, Flags, Range, Block,
-                                         ConstantInt::get(IntTy, NumArgs - 4)};
-      std::vector<llvm::Type *> ArgTys = {QueueTy, IntTy, RangeTy,
-                                          GenericVoidPtrTy, IntTy};
-
-      // Each of the following arguments specifies the size of the corresponding
-      // argument passed to the enqueued block.
-      for (unsigned I = 4/*Position of the first size arg*/; I < NumArgs; ++I)
-        Args.push_back(
-            Builder.CreateZExtOrTrunc(EmitScalarExpr(E->getArg(I)), SizeTy));
+      std::vector<llvm::Value *> Args = {Queue,
+                                         Flags,
+                                         Range,
+                                         Block,
+                                         ConstantInt::get(IntTy, NumArgs - 4),
+                                         PtrToSizeArray};
+      std::vector<llvm::Type *> ArgTys = {QueueTy, IntTy,
+                                          RangeTy, GenericVoidPtrTy,
+                                          IntTy, PtrToSizeArray->getType()};

       llvm::FunctionType *FTy = llvm::FunctionType::get(
-          Int32Ty, llvm::ArrayRef<llvm::Type *>(ArgTys), true);
+          Int32Ty, llvm::ArrayRef<llvm::Type *>(ArgTys), false);
       return RValue::get(
           Builder.CreateCall(CGM.CreateRuntimeFunction(FTy, Name),
                              llvm::ArrayRef<llvm::Value *>(Args)));
     }
     // Any calls now have event arguments passed.
     if (NumArgs >= 7) {
       llvm::Type *EventTy = ConvertType(getContext().OCLClkEventTy);
       llvm::Type *EventPtrTy = EventTy->getPointerTo(
[29 lines not shown] if (NumArgs >= 7) {
                                llvm::ArrayRef<llvm::Value *>(Args)));
       }
       // Has event info and variadics
       // Pass the number of variadics to the runtime function too.
       Args.push_back(ConstantInt::get(Int32Ty, NumArgs - 7));
       ArgTys.push_back(Int32Ty);
       Name = "__enqueue_kernel_events_vaargs";

-      // Each of the following arguments specifies the size of the corresponding
-      // argument passed to the enqueued block.
-      for (unsigned I = 7/*Position of the first size arg*/; I < NumArgs; ++I)
-        Args.push_back(
-            Builder.CreateZExtOrTrunc(EmitScalarExpr(E->getArg(I)), SizeTy));
+      auto *PtrToSizeArray = CreateArrayForSizeVar(7);
+      Args.push_back(PtrToSizeArray);
+      ArgTys.push_back(PtrToSizeArray->getType());

       llvm::FunctionType *FTy = llvm::FunctionType::get(
-          Int32Ty, llvm::ArrayRef<llvm::Type *>(ArgTys), true);
+          Int32Ty, llvm::ArrayRef<llvm::Type *>(ArgTys), false);
       return RValue::get(
           Builder.CreateCall(CGM.CreateRuntimeFunction(FTy, Name),
                              llvm::ArrayRef<llvm::Value *>(Args)));
     }
     LLVM_FALLTHROUGH;
   }
   // OpenCL v2.0 s6.13.17.6 - Kernel query functions need bitcast of block
   // parameter.
[6,812 lines not shown]
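
The indexing done by the `CreateArrayForSizeVar` lambda can be modeled in plain C++ without any LLVM API. The sketch below is an assumption-laden model, not code from the patch: `packSizeArgs` is a hypothetical name, the call arguments are represented as already-evaluated 64-bit values, and `SizeT` stands in for the target's `size_t`. It shows that the array has `NumArgs - First` elements and that argument `I` lands at index `I - First`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Plain C++ model of the packing step (hypothetical helper, not LLVM code).
// CallArgs models all evaluated enqueue_kernel arguments; First is the
// position of the first size argument (4 without events, 7 with events).
template <typename SizeT>
std::vector<SizeT> packSizeArgs(const std::vector<std::uint64_t> &CallArgs,
                                unsigned First) {
  assert(First < CallArgs.size() && "expected at least one size argument");
  // One slot per trailing size argument: NumArgs - First.
  std::vector<SizeT> Sizes(CallArgs.size() - First);
  for (unsigned I = First; I < CallArgs.size(); ++I) {
    // The conversion to an unsigned SizeT plays the role of ZExtOrTrunc.
    Sizes[I - First] = static_cast<SizeT>(CallArgs[I]);
  }
  return Sizes;
}
```

For example, `packSizeArgs<std::uint32_t>({0, 0, 0, 0, 1, 2, 4}, 4)` yields a three-element array holding 1, 2, and 4, which corresponds to the `[3 x i32]` case in the test file.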

cfe/trunk/test/CodeGenOpenCL/cl20-device-side-enqueue.cl

[43 lines not shown] kernel void device_side_enqueue(global int *a, global int *b, int i) {
   // COMMON: call i32 @__enqueue_kernel_basic_events(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* {{.}}, i32 2, %opencl.clk_event_t{{.}}* addrspace(4)* [[WAIT_EVNT]], %opencl.clk_event_t{{.}} addrspace(4)* [[EVNT]], i8 addrspace(4)* [[BL_I8]])
   enqueue_kernel(default_queue, flags, ndrange, 2, &event_wait_list, &clk_event,
                  ^(void) {
                    a[i] = b[i];
                  });

   // COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.}}, %opencl.queue_t{{.}}* %default_queue
   // COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags
-  // B32: call i32 (%opencl.queue_t{{.}}, i32, %struct.ndrange_t, i8 addrspace(4), i32, ...) @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{(.[0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i32 256)
-  // B64: call i32 (%opencl.queue_t{{.}}, i32, %struct.ndrange_t, i8 addrspace(4), i32, ...) @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{(.[0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i64 256)
+  // B32: %[[TMP:.*]] = alloca [1 x i32]
+  // B32: %[[TMP1:.*]] = getelementptr [1 x i32], [1 x i32]* %[[TMP]], i32 0, i32 0
+  // B32: store i32 256, i32* %[[TMP1]], align 4
+  // B32: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{(.[0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i32* %[[TMP1]])
+  // B64: %[[TMP:.*]] = alloca [1 x i64]
+  // B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0
+  // B64: store i64 256, i64* %[[TMP1]], align 8
+  // B64: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{(.[0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i64* %[[TMP1]])
   enqueue_kernel(default_queue, flags, ndrange,
                  ^(local void *p) {
                    return;
                  },
                  256);
   char c;
   // COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.}}, %opencl.queue_t{{.}}* %default_queue
   // COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags
-  // B32: [[SIZE:%[0-9]+]] = zext i8 {{%[0-9]+}} to i32
-  // B32: call i32 (%opencl.queue_t{{.}}, i32, %struct.ndrange_t, i8 addrspace(4), i32, ...) @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{(.[0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i32 [[SIZE]])
-  // B64: [[SIZE:%[0-9]+]] = zext i8 {{%[0-9]+}} to i64
-  // B64: call i32 (%opencl.queue_t{{.}}, i32, %struct.ndrange_t, i8 addrspace(4), i32, ...) @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{(.[0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i64 [[SIZE]])
+  // B32: %[[TMP:.*]] = alloca [1 x i32]
+  // B32: %[[TMP1:.*]] = getelementptr [1 x i32], [1 x i32]* %[[TMP]], i32 0, i32 0
+  // B32: store i32 %{{.*}}, i32* %[[TMP1]], align 4
+  // B32: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{(.[0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i32* %[[TMP1]])
+  // B64: %[[TMP:.*]] = alloca [1 x i64]
+  // B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0
+  // B64: store i64 %{{.*}}, i64* %[[TMP1]], align 8
+  // B64: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{(.[0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i64* %[[TMP1]])
   enqueue_kernel(default_queue, flags, ndrange,
                  ^(local void *p) {
                    return;
                  },
                  c);

   // COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.}}, %opencl.queue_t{{.}}* %default_queue
   // COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags
   // COMMON: [[AD:%arraydecay[0-9]]] = getelementptr inbounds [1 x %opencl.clk_event_t], [1 x %opencl.clk_event_t] %event_wait_list2, i32 0, i32 0
   // COMMON: [[WAIT_EVNT:%[0-9]+]] = addrspacecast %opencl.clk_event_t{{.}}* [[AD]] to %opencl.clk_event_t{{.}} addrspace(4)*
   // COMMON: [[EVNT:%[0-9]+]] = addrspacecast %opencl.clk_event_t{{.}}* %clk_event to %opencl.clk_event_t{{.}} addrspace(4)*
-  // B32: call i32 (%opencl.queue_t{{.}}, i32, %struct.ndrange_t, i32, %opencl.clk_event_t{{.}}* addrspace(4), %opencl.clk_event_t{{.}}* addrspace(4), i8 addrspace(4), i32, ...) @__enqueue_kernel_events_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* {{.}}, i32 2, %opencl.clk_event_t{{.}} [[WAIT_EVNT]], %opencl.clk_event_t{{.}} [[EVNT]], i8 addrspace(4) addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i32 256)
-  // B64: call i32 (%opencl.queue_t{{.}}, i32, %struct.ndrange_t, i32, %opencl.clk_event_t{{.}}* addrspace(4), %opencl.clk_event_t{{.}}* addrspace(4), i8 addrspace(4), i32, ...) @__enqueue_kernel_events_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* {{.}}, i32 2, %opencl.clk_event_t{{.}} [[WAIT_EVNT]], %opencl.clk_event_t{{.}} [[EVNT]], i8 addrspace(4) addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i64 256)
+  // B32: %[[TMP:.*]] = alloca [1 x i32]
+  // B32: %[[TMP1:.*]] = getelementptr [1 x i32], [1 x i32]* %[[TMP]], i32 0, i32 0
+  // B32: store i32 256, i32* %[[TMP1]], align 4
+  // B32: call i32 @__enqueue_kernel_events_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* {{.}}, i32 2, %opencl.clk_event_t{{.}} [[WAIT_EVNT]], %opencl.clk_event_t{{.}} [[EVNT]], i8 addrspace(4) addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i32* %[[TMP1]])
+  // B64: %[[TMP:.*]] = alloca [1 x i64]
+  // B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0
+  // B64: store i64 256, i64* %[[TMP1]], align 8
+  // B64: call i32 @__enqueue_kernel_events_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* {{.}}, i32 2, %opencl.clk_event_t{{.}} [[WAIT_EVNT]], %opencl.clk_event_t{{.}} [[EVNT]], i8 addrspace(4) addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i64* %[[TMP1]])
   enqueue_kernel(default_queue, flags, ndrange, 2, event_wait_list2, &clk_event,
                  ^(local void *p) {
                    return;
                  },
                  256);

   // COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.}}, %opencl.queue_t{{.}}* %default_queue
   // COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags
   // COMMON: [[AD:%arraydecay[0-9]]] = getelementptr inbounds [1 x %opencl.clk_event_t], [1 x %opencl.clk_event_t] %event_wait_list2, i32 0, i32 0
   // COMMON: [[WAIT_EVNT:%[0-9]+]] = addrspacecast %opencl.clk_event_t{{.}}* [[AD]] to %opencl.clk_event_t{{.}} addrspace(4)*
   // COMMON: [[EVNT:%[0-9]+]] = addrspacecast %opencl.clk_event_t{{.}}* %clk_event to %opencl.clk_event_t{{.}} addrspace(4)*
-  // B32: [[SIZE:%[0-9]+]] = zext i8 {{%[0-9]+}} to i32
-  // B32: call i32 (%opencl.queue_t{{.}}, i32, %struct.ndrange_t, i32, %opencl.clk_event_t{{.}}* addrspace(4), %opencl.clk_event_t{{.}}* addrspace(4), i8 addrspace(4), i32, ...) @__enqueue_kernel_events_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* {{.}}, i32 2, %opencl.clk_event_t{{.}}* addrspace(4)* [[WAIT_EVNT]], %opencl.clk_event_t{{.}} addrspace(4)* [[EVNT]], i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i32 [[SIZE]])
-  // B64: [[SIZE:%[0-9]+]] = zext i8 {{%[0-9]+}} to i64
-  // B64: call i32 (%opencl.queue_t{{.}}, i32, %struct.ndrange_t, i32, %opencl.clk_event_t{{.}}* addrspace(4), %opencl.clk_event_t{{.}}* addrspace(4), i8 addrspace(4), i32, ...) @__enqueue_kernel_events_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* {{.}}, i32 2, %opencl.clk_event_t{{.}}* addrspace(4)* [[WAIT_EVNT]], %opencl.clk_event_t{{.}} addrspace(4)* [[EVNT]], i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i64 [[SIZE]])
+  // B32: %[[TMP:.*]] = alloca [1 x i32]
+  // B32: %[[TMP1:.*]] = getelementptr [1 x i32], [1 x i32]* %[[TMP]], i32 0, i32 0
+  // B32: store i32 %{{.*}}, i32* %[[TMP1]], align 4
+  // B32: call i32 @__enqueue_kernel_events_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* {{.}}, i32 2, %opencl.clk_event_t{{.}}* addrspace(4)* [[WAIT_EVNT]], %opencl.clk_event_t{{.}} addrspace(4)* [[EVNT]], i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i32* %[[TMP1]])
+  // B64: %[[TMP:.*]] = alloca [1 x i64]
+  // B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0
+  // B64: store i64 %{{.*}}, i64* %[[TMP1]], align 8
+  // B64: call i32 @__enqueue_kernel_events_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* {{.}}, i32 2, %opencl.clk_event_t{{.}}* addrspace(4)* [[WAIT_EVNT]], %opencl.clk_event_t{{.}} addrspace(4)* [[EVNT]], i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i64* %[[TMP1]])
   enqueue_kernel(default_queue, flags, ndrange, 2, event_wait_list2, &clk_event,
                  ^(local void *p) {
                    return;
                  },
                  c);

   long l;
   // COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.}}, %opencl.queue_t{{.}}* %default_queue
   // COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags
-  // B32: [[SIZE:%[0-9]+]] = trunc i64 {{%[0-9]+}} to i32
-  // B32: call i32 (%opencl.queue_t{{.}}, i32, %struct.ndrange_t, i8 addrspace(4), i32, ...) @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{(.[0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i32 [[SIZE]])
-  // B64: [[SIZE:%[0-9]+]] = load i64, i64* %l
-  // B64: call i32 (%opencl.queue_t{{.}}, i32, %struct.ndrange_t, i8 addrspace(4), i32, ...) @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{(.[0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i64 [[SIZE]])
+  // B32: %[[TMP:.*]] = alloca [1 x i32]
+  // B32: %[[TMP1:.*]] = getelementptr [1 x i32], [1 x i32]* %[[TMP]], i32 0, i32 0
+  // B32: store i32 %{{.*}}, i32* %[[TMP1]], align 4
+  // B32: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{(.[0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i32* %[[TMP1]])
+  // B64: %[[TMP:.*]] = alloca [1 x i64]
+  // B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0
+  // B64: store i64 %{{.*}}, i64* %[[TMP1]], align 8
+  // B64: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{(.[0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i64* %[[TMP1]])
   enqueue_kernel(default_queue, flags, ndrange,
                  ^(local void *p) {
                    return;
                  },
                  l);

+  // COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.}}, %opencl.queue_t{{.}}* %default_queue
+  // COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags
+  // B32: %[[TMP:.*]] = alloca [3 x i32]
+  // B32: %[[TMP1:.*]] = getelementptr [3 x i32], [3 x i32]* %[[TMP]], i32 0, i32 0
+  // B32: store i32 1, i32* %[[TMP1]], align 4
+  // B32: %[[TMP2:.*]] = getelementptr [3 x i32], [3 x i32]* %[[TMP]], i32 0, i32 1
+  // B32: store i32 2, i32* %[[TMP2]], align 4
+  // B32: %[[TMP3:.*]] = getelementptr [3 x i32], [3 x i32]* %[[TMP]], i32 0, i32 2
+  // B32: store i32 4, i32* %[[TMP3]], align 4
+  // B32: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{(.[0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 3, i32* %[[TMP1]])
+  // B64: %[[TMP:.*]] = alloca [3 x i64]
+  // B64: %[[TMP1:.*]] = getelementptr [3 x i64], [3 x i64]* %[[TMP]], i32 0, i32 0
+  // B64: store i64 1, i64* %[[TMP1]], align 8
+  // B64: %[[TMP2:.*]] = getelementptr [3 x i64], [3 x i64]* %[[TMP]], i32 0, i32 1
+  // B64: store i64 2, i64* %[[TMP2]], align 8
+  // B64: %[[TMP3:.*]] = getelementptr [3 x i64], [3 x i64]* %[[TMP]], i32 0, i32 2
+  // B64: store i64 4, i64* %[[TMP3]], align 8
+  // B64: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{(.[0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 3, i64* %[[TMP1]])
+  enqueue_kernel(default_queue, flags, ndrange,
+                 ^(local void *p1, local void *p2, local void *p3) {
+                   return;
+                 },
+                 1, 2, 4);
+
   // COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t, %opencl.queue_t* %default_queue
   // COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags
-  // B32: call i32 (%opencl.queue_t{{.}}, i32, %struct.ndrange_t, i8 addrspace(4), i32, ...) @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{(.[0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i32 0)
-  // B64: call i32 (%opencl.queue_t{{.}}, i32, %struct.ndrange_t, i8 addrspace(4), i32, ...) @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{(.[0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i64 4294967296)
+  // B32: %[[TMP:.*]] = alloca [1 x i32]
+  // B32: %[[TMP1:.*]] = getelementptr [1 x i32], [1 x i32]* %[[TMP]], i32 0, i32 0
+  // B32: store i32 0, i32* %[[TMP1]], align 4
+  // B32: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{(.[0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i32* %[[TMP1]])
+  // B64: %[[TMP:.*]] = alloca [1 x i64]
+  // B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0
+  // B64: store i64 4294967296, i64* %[[TMP1]], align 8
+  // B64: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{(.[0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i64* %[[TMP1]])
   enqueue_kernel(default_queue, flags, ndrange,
                  ^(local void *p) {
                    return;
                  },
                  4294967296L);

   // The full type of these expressions are long (and repeated elsewhere), so we
   // capture it as part of the regex for convenience and clarity.
[24 lines not shown]
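
The constants expected by the tests above follow from zero-extending or truncating each size argument to the width of `size_t`. The sketch below reproduces that arithmetic in plain C++ under the assumption that fixed-width unsigned types model a 32-bit and a 64-bit `size_t`. The function names are hypothetical and are not part of the patch or of the test file.

```cpp
#include <cassert>
#include <cstdint>

// Model of a 32-bit size_t target (the B32 checks): a 64-bit argument is
// truncated, keeping only the low 32 bits.
std::uint32_t storedSize32(std::int64_t Arg) {
  return static_cast<std::uint32_t>(static_cast<std::uint64_t>(Arg));
}

// Model of a 64-bit size_t target (the B64 checks): the value is kept as is.
std::uint64_t storedSize64(std::int64_t Arg) {
  return static_cast<std::uint64_t>(Arg);
}

// Model of a char argument on a 32-bit size_t target: the 8-bit pattern is
// zero-extended, matching the zext that the old checks expected.
std::uint32_t storedSize32FromChar(char C) {
  return static_cast<std::uint8_t>(C);
}
```

With these helpers, `storedSize32(4294967296LL)` is 0 and `storedSize64(4294967296LL)` is 4294967296, which matches the `store i32 0` and `store i64 4294967296` checks for the `4294967296L` argument.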