This is an archive of the discontinued LLVM Phabricator instance.

Paths

cfe/trunk/lib/CodeGen/
  CGBlocks.cpp
  CGBuiltin.cpp
  CGOpenCLRuntime.h
  CGOpenCLRuntime.cpp
  CodeGenFunction.h
  CodeGenTypes.h
  TargetInfo.h
  TargetInfo.cpp
cfe/trunk/test/CodeGenOpenCL/
  amdgpu-enqueue-kernel.cl
  blocks.cl
  cl20-device-side-enqueue.cl

Differential D38134

[OpenCL] Emit enqueued block as kernel
Closed · Public

Authored by yaxunl on Sep 21 2017, 7:48 AM.

Details

Reviewers

Anastasia
bader
b-sumner

Commits

rGc2a87a05f1a6: [OpenCL] Emit enqueued block as kernel
rC315804: [OpenCL] Emit enqueued block as kernel
rL315804: [OpenCL] Emit enqueued block as kernel

Summary

In OpenCL the kernel function and non-kernel function has different calling conventions.
For certain targets they have different argument ABIs. Also kernels have special function
attributes and metadata for runtime to launch them.

The blocks passed to enqueue_kernel is supposed to be executed as kernels. As such,
the block invoke function should be emitted as kernel with proper calling convention and
argument ABI.

This patch emits enqueued block as kernel. If a block is both called directly and passed
to enqueue_kernel, separate functions will be generated.

Diff Detail

Repository: rL LLVM

Event Timeline

yaxunl created this revision.Sep 21 2017, 7:48 AM

Herald added a subscriber: nhaehnle. Sep 21 2017, 7:48 AM

Now if we have a block which is being called and enqueued at the same time, will we generate 2 functions for it? Could we add such a test case, by the way?

I feel it would be much simpler if we could always generate the kernel metadata for blocks. A lot of special-case code would be removed if we did this. OpenCL doesn't prevent kernel functions from being used as normal functions (6.7.1), so it should be a perfectly valid thing to do. Do you see any issues with that?

lib/CodeGen/CGBlocks.cpp
1255 ↗	(On Diff #116186)	Is there any test that covers this?
lib/CodeGen/CGOpenCLRuntime.cpp
113 ↗	(On Diff #116186)	I am not particularly in favour of duplicating CodeGen functionality, as it typically has many special cases that are hard to catch. Is this logic needed in order to tell the block literal that the block is enqueued?
lib/CodeGen/CodeGenFunction.cpp
535 ↗	(On Diff #116186)	I don't quite understand why we need to special-case this. As far as I understand, the block argument has type `generic void*`, but it's cast to a concrete block struct inside the block function. Do we gain anything from giving it a specific type here?

In D38134#877831, @Anastasia wrote:

Now if we have a block which is being called and enqueued at the same time, will we generate 2 functions for it? Could we add such test case btw?

Yes. It is covered by test/CodeGenOpenCL/cl20-device-side-enqueue.cl, lines 246, 250, and 256.

I feel it would be much simpler if we could always generate the kernel metadata for blocks. A lot of special-case code would be removed if we did this. OpenCL doesn't prevent kernel functions from being used as normal functions (6.7.1), so it should be a perfectly valid thing to do. Do you see any issues with that?

The special cases in the metadata generation code exist because the first argument of the LLVM block invoke function is not defined in the BlockDecl. Emitting metadata for all block invoke functions does not help here.

lib/CodeGen/CGBlocks.cpp
1255 ↗	(On Diff #116186)	Yes. This is covered by test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl where the block struct is passed directly to the kernel instead of by a pointer.
lib/CodeGen/CGOpenCLRuntime.cpp
113 ↗	(On Diff #116186)	This code is needed to emit separate functions for a block that is both called directly and enqueued as a kernel. Since the kernel needs the proper calling convention and ABI, it cannot be emitted as the same function used when the block is called directly. Because this code is OpenCL-specific, I found it cleaner to make it a member of CGOpenCLRuntime instead of fitting it into CGF.EmitBlockLiteral.
lib/CodeGen/CodeGenFunction.cpp
535 ↗	(On Diff #116186)	This argument is not part of the BlockDecl, which only has the arguments that appear in the source code. The first argument of the LLVM block invoke function is generated by codegen and has no counterpart in the AST, so it has to be handled as a special case.
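To illustrate the special case being discussed, here is a minimal C++ sketch (not Clang code; `SourceParam`, `KernelArgInfo`, and `buildBlockKernelArgInfo` are hypothetical names) of building per-argument kernel metadata when the first argument, the block literal, has no declaration in the AST. Its entry has to be synthesized, while the remaining entries come from the parameters the BlockDecl records.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a parameter recorded in a BlockDecl.
struct SourceParam {
  std::string typeName;  // e.g. "local void*"
};

// Hypothetical shape of one kernel-argument metadata entry.
struct KernelArgInfo {
  std::string typeName;
  bool synthesized;  // true when no AST declaration backs this argument
};

// Builds one metadata entry per argument of the emitted block kernel.
// Entry 0 describes the block literal argument, which codegen creates and the
// BlockDecl does not contain, so it is synthesized from the given type name.
std::vector<KernelArgInfo> buildBlockKernelArgInfo(
    const std::vector<SourceParam> &blockDeclParams,
    const std::string &blockLiteralTypeName) {
  std::vector<KernelArgInfo> info;
  info.reserve(blockDeclParams.size() + 1);
  info.push_back({blockLiteralTypeName, true});
  for (const SourceParam &p : blockDeclParams)
    info.push_back({p.typeName, false});
  // Every kernel argument, including the implicit first one, gets an entry.
  assert(info.size() == blockDeclParams.size() + 1);
  return info;
}
```

In this model the special case is confined to the single `push_back` for entry 0; everything after it is a plain walk over the declared parameters.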

I feel it would be much simpler if we could always generate the kernel metadata for blocks. A lot of special-case code would be removed if we did this. OpenCL doesn't prevent kernel functions from being used as normal functions (6.7.1), so it should be a perfectly valid thing to do. Do you see any issues with that?

The special cases in the metadata generation code exist because the first argument of the LLVM block invoke function is not defined in the BlockDecl. Emitting metadata for all block invoke functions does not help here.

To be more specific: I am just wondering what we need for blocks to be used as kernels in practice. Is it essentially the kernel calling convention and kernel metadata? The kernel argument metadata, however, can be omitted, because the argument types are fixed to local void* and the number of arguments is passed to the enqueue_kernel call, so it is known on the enqueueing side too. The block descriptor can be passed as a generic pointer (generic void*), since it is cast to the right struct type inside the block invoke function anyway. If we do this, we can avoid adding a lot of extra code, because blocks have reduced functionality compared to kernel functions. Also, OpenCL allows kernel functions to be called just like normal functions, so this way we can support the second use case for blocks too. What do you think?
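The argument-count reasoning above can be made concrete. The following C++ sketch is a simplified, hypothetical model (`EnqueueLowering` and `lowerEnqueueKernel` are illustrative names, not Clang APIs) of how the CGBuiltin.cpp hunk in this diff picks a runtime entry point from the number of `enqueue_kernel` arguments. Four fixed arguments precede the block's local sizes, so a call such as `enqueue_kernel(q, flags, ndr, block, sz0, sz1)` passes `NumArgs - 4 = 2` sizes. The case with both events and local sizes is in lines not shown in this excerpt, so the sketch leaves its runtime name empty and only applies the same arithmetic (seven fixed arguments).

```cpp
#include <cassert>
#include <string>

// Hypothetical summary of one lowering decision.
struct EnqueueLowering {
  std::string runtimeName;  // runtime entry point that would be called
  unsigned numLocalSizes;   // sizes passed for the block's local pointer params
};

// numArgs: number of arguments at the enqueue_kernel call site.
// arg3IsBlock: true when argument 3 is the block, i.e. no event arguments.
EnqueueLowering lowerEnqueueKernel(unsigned numArgs, bool arg3IsBlock) {
  assert(numArgs >= 4 && "Invalid enqueue_kernel signature");
  if (numArgs == 4)
    return {"__enqueue_kernel_basic", 0};
  if (arg3IsBlock)
    // Fixed: queue, flags, ndrange, block (indices 0-3); sizes start at 4.
    return {"__enqueue_kernel_vaargs", numArgs - 4};
  if (numArgs == 7)
    return {"__enqueue_kernel_basic_events", 0};
  // Events plus sizes: seven fixed arguments (queue, flags, ndrange,
  // num_events, event list, returned event, block). The runtime name for
  // this case is in lines elided from the excerpt, so it is left empty.
  return {"", numArgs - 7};
}
```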

lib/CodeGen/CGOpenCLRuntime.cpp
113 ↗	(On Diff #116186)	This part replaces the standard `EmitScalarExpr` call, which does several things before calling into block generation. That's why I am a bit worried about whether we are covering all the corner cases here. So if we transform all blocks into kernels unconditionally, we won't need this special handling? Do we now generate two versions of each block: one for enqueue and one for call?
lib/CodeGen/CodeGenFunction.cpp
535 ↗	(On Diff #116186)	Considering that an enqueued kernel always takes the same type of arguments (`local void*`) and the number of args is specified in `enqueue_kernel`, I was wondering whether we need to generate information on the kernel parameters at all. The enqueueing side will have the information provided in the `enqueue_kernel` code. As for the block itself, it can be passed as `generic void*` and then cast to the block struct type inside the block itself.
667 ↗	(On Diff #116186)	Why this change?
test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
3 ↗	(On Diff #116186)	Isn't this struct identical to the block literal struct?

yaxunl marked 10 inline comments as done. Sep 22 2017, 10:58 AM

yaxunl added inline comments.

lib/CodeGen/CGOpenCLRuntime.cpp
113 ↗	(On Diff #116186)	If we transform all blocks into kernels, we could simplify the logic and probably would not need this special handling. However, when the block is called directly, the first argument is a private pointer; when it is executed as a kernel, the first argument is a global pointer or a struct (for the amdgpu target). Therefore the emitted functions cannot be the same.
lib/CodeGen/CodeGenFunction.cpp
535 ↗	(On Diff #116186)	The amdgpu backend relies on kernel argument metadata to generate some metadata in the ELF that the runtime uses to launch the kernel. The backend expects kernel argument metadata on every kernel argument, so omitting it for the first argument would require special handling in the backend. I think it is better to let Clang generate kernel argument metadata for all kernel arguments.
667 ↗	(On Diff #116186)	CGM is no longer a function parameter since now this function requires a CGF parameter.
test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
3 ↗	(On Diff #116186)	The LLVM type of the first argument of the block invoke function is created directly, with its fields sorted and rearranged. There is no AST type corresponding to it, but function codegen requires an AST type for this argument. I feel it is unnecessary to create the exactly corresponding AST type, so for simplicity I just create an AST type with the same size and alignment as the LLVM type. In the function body, it is bitcast to the correct LLVM struct type to access the captured variables.
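A rough C++ analogue of this trick, assuming a block that captures a `double` and an `int` (the types and function names are hypothetical): the stand-in type carries no field names, only the size and alignment of the real capture layout, and the receiving function reinterprets the bytes as the real layout. Here `std::memcpy` plays roughly the role that the bitcast plays in the generated IR.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical capture layout; in Clang this exists only as an LLVM struct.
struct CapturedVars {
  double scale;
  int offset;
};

// Stand-in "AST type": same size and alignment, no field structure.
struct alignas(alignof(CapturedVars)) OpaqueBlockArg {
  unsigned char bytes[sizeof(CapturedVars)];
};

static_assert(sizeof(OpaqueBlockArg) == sizeof(CapturedVars), "size must match");
static_assert(alignof(OpaqueBlockArg) == alignof(CapturedVars),
              "alignment must match");

// Packs the real captures into the opaque by-value argument.
OpaqueBlockArg packCaptures(const CapturedVars &vars) {
  OpaqueBlockArg arg;
  std::memcpy(arg.bytes, &vars, sizeof vars);
  return arg;
}

// The "kernel" takes the opaque struct by value and recovers the real layout
// before touching the captured variables.
double useCaptures(OpaqueBlockArg arg, double x) {
  CapturedVars vars;
  std::memcpy(&vars, arg.bytes, sizeof vars);
  return vars.scale * x + vars.offset;
}
```

Because `sizeof` of any struct is a multiple of its alignment, a byte array of that size with the same `alignas` always reproduces both properties, which is all the by-value ABI needs.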

I think we should add a test case for when the same block is both called and enqueued.

lib/CodeGen/CGOpenCLRuntime.cpp
113 ↗	(On Diff #116186)	Would using generic address space for this first argument not work?
test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
3 ↗	(On Diff #116186)	So a `void ptr` won't be possible here? It is cast to the right struct inside the block anyway. Once again, a block is a special type of object whose semantics are known to the compiler and runtime, in contrast to kernels, which can be written with arguments of any type. I just don't like the idea of duplicating the block invoke function when it is both called and enqueued. Also, the logic in block code generation becomes difficult to understand. So I am wondering if we could instead create a separate kernel function (as a wrapper) for enqueue_kernel, which would call the block. What do you think? The kernel prototype would be fairly generic, as it would just contain a block call and pass the arguments into it. We wouldn't need to modify block generation at all then.

In D38134#880133, @Anastasia wrote:

I think we should add a test case for when the same block is both called and enqueued.

Will do.

test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl
3 ↗	(On Diff #116186)	Will emit a wrapper kernel which calls the block invoke function and keep the block invoke function unchanged.

Emit the enqueued block as a wrapper kernel that calls the block invoke function. Added a test for calling and enqueuing the same block.
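A minimal C++ model of the wrapper-kernel design, with hypothetical names and a simplified block-literal layout (`BlockLiteral`, `blockInvoke`, and `blockKernelWrapper` are illustrative, not Clang's ABI). The invoke function keeps its original signature, so direct calls are unchanged, and only the generated wrapper, which takes the literal as an opaque pointer, would get the kernel calling convention.

```cpp
#include <cassert>

// Hypothetical, simplified block literal: header fields, then one capture.
struct BlockLiteral {
  int size;
  int align;
  void (*invoke)(const BlockLiteral *self, void *localBuf);
  int captured;
};

// Block invoke function: the literal comes first, followed by the
// parameters written in the block source (here, one local buffer pointer).
void blockInvoke(const BlockLiteral *self, void *localBuf) {
  *static_cast<int *>(localBuf) = self->captured * 2;
}

// Wrapper "kernel" generated for the enqueue path. It receives the literal
// as an opaque pointer and calls the unchanged invoke function directly, so
// the directly called path never sees the kernel calling convention.
void blockKernelWrapper(void *block, void *localBuf) {
  assert(block != nullptr && localBuf != nullptr);
  blockInvoke(static_cast<const BlockLiteral *>(block), localBuf);
}
```

Both paths in this model execute the same invoke body; only the entry point differs.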

Anastasia added inline comments. Oct 6 2017, 4:39 AM

lib/CodeGen/CGOpenCLRuntime.cpp
144 ↗	(On Diff #117739)	Why do we need to replace original block calls with the kernels? I think in case of calling a block we could use the original block function and only for enqueue use the kernel that would call the block function inside. The pointer to the kernel wrapper could be passed as an additional parameter to `enqueue_kernel` calls. We won't need to iterate through all IR then.
lib/CodeGen/TargetInfo.cpp
8927 ↗	(On Diff #117739)	Could you add some comments please?
8949 ↗	(On Diff #117739)	I wonder if we should add the kernel metadata (without args), since it was used for a long time to indicate a kernel.
lib/CodeGen/TargetInfo.h
35 ↗	(On Diff #117739)	Do we need this?
test/CodeGenOpenCL/cl20-device-side-enqueue.cl
9 ↗	(On Diff #117739)	Can we check generated kernel function too?
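Passing the kernel pointer as an extra parameter is what the final CGBuiltin.cpp hunk does: `__enqueue_kernel_basic` now receives `(queue, flags, ndrange, kernel, block)`. The device runtime that implements it is not part of this diff, so the following C++ mock (`mockEnqueueKernelBasic`, `BlockKernelFn`, `CounterBlock`, and `counterKernel` are hypothetical) only illustrates the calling shape: the runtime receives the kernel entry and the block literal separately and invokes the kernel with the literal. Unlike the real call, which passes the kernel as a `generic void*`, the mock uses a typed function pointer to stay portable C++.

```cpp
#include <cassert>

// Hypothetical entry type for a block kernel: takes the block literal.
using BlockKernelFn = void (*)(void *blockLiteral);

// Mock with the post-patch argument order (queue, flags, ndrange, kernel,
// block). This stand-in ignores scheduling and runs the kernel once, inline.
int mockEnqueueKernelBasic(void *queue, int flags, const void *ndrange,
                           BlockKernelFn kernel, void *block) {
  (void)queue;
  (void)flags;
  (void)ndrange;
  if (kernel == nullptr || block == nullptr)
    return -1;  // hypothetical error code
  kernel(block);
  return 0;
}

// Hypothetical literal and wrapper kernel for a block capturing one int.
struct CounterBlock {
  int captured;
  int *out;
};

void counterKernel(void *block) {
  auto *b = static_cast<CounterBlock *>(block);
  *b->out += b->captured;
}
```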

yaxunl marked 6 inline comments as done. Oct 6 2017, 8:43 AM

yaxunl added inline comments.

lib/CodeGen/CGOpenCLRuntime.cpp
144 ↗	(On Diff #117739)	`CGF.EmitScalarExpr(Block)` returns the block literal structure which contains the size/align/invoke_function/captures. The block invoke function is stored to the struct by a `StoreInst`. To create the wrapper kernel, we need to get the block invoke function, therefore we have to iterate through IR. Since we need to find the store instruction any way, it is simpler to just replace the stored function with the kernel and pass the block literal struct, instead of passing the kernel separately.
lib/CodeGen/TargetInfo.cpp
8927 ↗	(On Diff #117739)	Will do.
8949 ↗	(On Diff #117739)	Even before this change, clang does not generate kernel metadata if there is no vec_type_hint, work_group_size_hint, or reqd_work_group_size. Remember that last time we changed these attributes to be represented as function metadata. Whether a function is a kernel can be determined by its calling convention.
lib/CodeGen/TargetInfo.h
35 ↗	(On Diff #117739)	Will remove it.
test/CodeGenOpenCL/cl20-device-side-enqueue.cl
9 ↗	(On Diff #117739)	Will do.

Revised per Anastasia's comments.

Anastasia added inline comments. Oct 10 2017, 9:42 AM

lib/CodeGen/CGOpenCLRuntime.cpp
144 ↗	(On Diff #117739)	So we can't get the invoke function directly from the block literal structure passed into the kernel wrapper, knowing its offset? Iterating through the IR adds extra time, and I am also not sure how reliable it is with respect to the different corner cases of IR.
lib/CodeGen/TargetInfo.cpp
8949 ↗	(On Diff #117739)	Ok, let's leave it for now. We can always add it in on request.
test/CodeGenOpenCL/cl20-device-side-enqueue.cl
297 ↗	(On Diff #118064)	Perhaps we could check the body of this one too since it has a different prototype.

yaxunl marked 4 inline comments as done. Oct 10 2017, 11:27 AM

yaxunl added inline comments.

lib/CodeGen/CGOpenCLRuntime.cpp
144 ↗	(On Diff #117739)	Unfortunately, the invoke function is not returned directly. Instead, it is buried in an LLVM value, and extracting it from that value means wading through a series of LLVM IR instructions. There is one way to get the invoke function directly without going through the IR, but it requires changing the block code generation functions slightly so that they return the block invoke function.
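For blocks without captures, the final CGBlocks.cpp hunk of this diff does rely on a fixed position: it reads the invoke function from element 2 of the global block literal's initializer. The C++ sketch below (hypothetical types; the layout of size, align, and invoke followed by captures is illustrative) shows the offset-based lookup Anastasia describes, where a caller that knows only the common header recovers the invoke pointer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical invoke signature: literal first, then one source-level int.
using InvokeFn = int (*)(const void *self, int arg);

// Common header of every block literal in this model. invoke is field
// index 2, matching getAggregateElement(2) in the CGBlocks.cpp hunk; the
// concrete C++ layout is illustrative only.
struct BlockHeader {
  int size;
  int align;
  InvokeFn invoke;
};

// A specific block: header first, then its captures.
struct AddBlock {
  BlockHeader header;
  int captured;
};
static_assert(offsetof(AddBlock, header) == 0, "header must come first");

int addInvoke(const void *self, int arg) {
  return static_cast<const AddBlock *>(self)->captured + arg;
}

// Recovers the invoke pointer knowing only the header layout, not which
// concrete block type the literal is.
InvokeFn getInvoke(const void *literal) {
  assert(literal != nullptr);
  return static_cast<const BlockHeader *>(literal)->invoke;
}
```

The catch described above is that, for captured blocks, codegen only hands back an opaque LLVM value, so the patch instead threads the invoke function out of the emission functions.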

Revised per Anastasia's comments. Get the block invoke function through an API instead of iterating through the IR. Pass the block kernel directly to the __enqueue_kernel_* functions.
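CGOpenCLRuntime.h in the final diff adds `EnqueuedBlockMap`, a `DenseMap` from block expression to `EnqueuedBlockInfo`. A plausible reading is that it memoizes the emitted kernel, so that a block literal enqueued or queried more than once yields one wrapper kernel. The C++ sketch below models that idea with `std::unordered_map`; `EnqueuedBlockCache`, `EmittedKernel`, and the `_kernel` naming are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins for a block expression and the kernel emitted for it.
struct BlockExpr {
  std::string name;
};
struct EmittedKernel {
  std::string symbol;
};

// Models the EnqueuedBlockMap idea: the first request for a block expression
// "emits" a kernel; later requests for the same expression reuse it.
class EnqueuedBlockCache {
public:
  const EmittedKernel &getOrEmit(const BlockExpr *e) {
    assert(e != nullptr);
    auto it = map_.find(e);
    if (it != map_.end())
      return it->second;
    ++emitCount_;
    // References to unordered_map elements survive rehashing.
    return map_.emplace(e, EmittedKernel{e->name + "_kernel"}).first->second;
  }
  unsigned emitCount() const { return emitCount_; }

private:
  std::unordered_map<const BlockExpr *, EmittedKernel> map_;
  unsigned emitCount_ = 0;
};
```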

I think it would be good to add a block test to CodeGenOpenCL where we would just call the block without any enqueue and check that the invoke function is generated but the kernel wrapper isn't.

lib/CodeGen/CGBuiltin.cpp
2846 ↗	(On Diff #118677)	The formatting seems inconsistent with the code above.
lib/CodeGen/CodeGenFunction.h
2921 ↗	(On Diff #118677)	Will it be nullptr when the block is not enqueued? Maybe it's worth explaining that in the comment.
lib/CodeGen/TargetInfo.h
290 ↗	(On Diff #118677)	Can we also explain the wrapper kernel here?
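The `InvokeF` parameter in the CGBlocks.cpp hunk follows the common optional out-parameter pattern: the invoke function is written only when the caller passes a non-null `llvm::Function **`. A minimal C++ sketch of that pattern, with hypothetical stand-in types (`Function`, `BlockValue`) and an assumed `nullptr` default:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for llvm::Function and the emitted block literal.
struct Function {
  std::string name;
};
struct BlockValue {
  const Function *invoke;
};

// Optional out-parameter: callers that only need the literal omit invokeOut
// (nullptr); the enqueue path passes an address to receive the invoke function.
BlockValue emitBlockLiteral(const Function &invokeFn,
                            const Function **invokeOut = nullptr) {
  if (invokeOut)
    *invokeOut = &invokeFn;
  return BlockValue{&invokeFn};
}
```

With this shape, ordinary block emission pays nothing extra, and only the enqueue path asks for the invoke function.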

Revised per Anastasia's comments.

In D38134#895848, @Anastasia wrote:

I think it would be good to add a block test to CodeGenOpenCL where we would just call the block without any enqueue and check that the invoke function is generated but the kernel wrapper isn't.

We have test/CodeGenOpenCL/blocks.cl, which only calls a block. I can add a check to make sure no kernels are generated.

lib/CodeGen/CGBuiltin.cpp
2846 ↗	(On Diff #118677)	Will fix.
lib/CodeGen/CodeGenFunction.h
2921 ↗	(On Diff #118677)	Will do.
lib/CodeGen/TargetInfo.h
290 ↗	(On Diff #118677)	Will do.

LGTM! Great work! Thanks!

This revision is now accepted and ready to land. Oct 13 2017, 8:54 AM

Closed by commit rL315804: [OpenCL] Emit enqueued block as kernel (authored by yaxunl). Oct 14 2017, 5:24 AM

This revision was automatically updated to reflect the committed changes.

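Revision Contents

Path	Size
cfe/trunk/lib/CodeGen/
  CGBlocks.cpp	28 lines
  CGBuiltin.cpp	88 lines
  CGOpenCLRuntime.h	14 lines
  CGOpenCLRuntime.cpp	36 lines
  CodeGenFunction.h	16 lines
  CodeGenTypes.h	5 lines
  TargetInfo.h	10 lines
  TargetInfo.cpp	111 lines
cfe/trunk/test/CodeGenOpenCL/
  amdgpu-enqueue-kernel.cl	36 lines
  blocks.cl	2 lines
  cl20-device-side-enqueue.cl	188 lines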

Diff 119017

cfe/trunk/lib/CodeGen/CGBlocks.cpp

[732 lines not shown; enclosing context: void CodeGenFunction::destroyBlockInfos(CGBlockInfo *head)]
do {		do {
CGBlockInfo *cur = head;		CGBlockInfo *cur = head;
head = cur->NextBlockInfo;		head = cur->NextBlockInfo;
delete cur;		delete cur;
} while (head != nullptr);		} while (head != nullptr);
}		}

/// Emit a block literal expression in the current function.		/// Emit a block literal expression in the current function.
llvm::Value *CodeGenFunction::EmitBlockLiteral(const BlockExpr *blockExpr) {		llvm::Value *CodeGenFunction::EmitBlockLiteral(const BlockExpr *blockExpr,
		llvm::Function **InvokeF) {
// If the block has no captures, we won't have a pre-computed		// If the block has no captures, we won't have a pre-computed
// layout for it.		// layout for it.
if (!blockExpr->getBlockDecl()->hasCaptures()) {		if (!blockExpr->getBlockDecl()->hasCaptures()) {
if (llvm::Constant *Block = CGM.getAddrOfGlobalBlockIfEmitted(blockExpr))		// The block literal is emitted as a global variable, and the block invoke
		// function has to be extracted from its initializer.
		if (llvm::Constant *Block = CGM.getAddrOfGlobalBlockIfEmitted(blockExpr)) {
		if (InvokeF) {
		auto *GV = cast<llvm::GlobalVariable>(
		cast<llvm::Constant>(Block)->stripPointerCasts());
		auto *BlockInit = cast<llvm::ConstantStruct>(GV->getInitializer());
		*InvokeF = cast<llvm::Function>(
		BlockInit->getAggregateElement(2)->stripPointerCasts());
		}
return Block;		return Block;
		}
CGBlockInfo blockInfo(blockExpr->getBlockDecl(), CurFn->getName());		CGBlockInfo blockInfo(blockExpr->getBlockDecl(), CurFn->getName());
computeBlockInfo(CGM, this, blockInfo);		computeBlockInfo(CGM, this, blockInfo);
blockInfo.BlockExpression = blockExpr;		blockInfo.BlockExpression = blockExpr;
return EmitBlockLiteral(blockInfo);		return EmitBlockLiteral(blockInfo, InvokeF);
}		}

// Find the block info for this block and take ownership of it.		// Find the block info for this block and take ownership of it.
std::unique_ptr<CGBlockInfo> blockInfo;		std::unique_ptr<CGBlockInfo> blockInfo;
blockInfo.reset(findAndRemoveBlockInfo(&FirstBlockInfo,		blockInfo.reset(findAndRemoveBlockInfo(&FirstBlockInfo,
blockExpr->getBlockDecl()));		blockExpr->getBlockDecl()));

blockInfo->BlockExpression = blockExpr;		blockInfo->BlockExpression = blockExpr;
return EmitBlockLiteral(*blockInfo);		return EmitBlockLiteral(*blockInfo, InvokeF);
}		}

llvm::Value *CodeGenFunction::EmitBlockLiteral(const CGBlockInfo &blockInfo) {		llvm::Value *CodeGenFunction::EmitBlockLiteral(const CGBlockInfo &blockInfo,
		llvm::Function **InvokeF) {
bool IsOpenCL = CGM.getContext().getLangOpts().OpenCL;		bool IsOpenCL = CGM.getContext().getLangOpts().OpenCL;
auto GenVoidPtrTy =		auto GenVoidPtrTy =
IsOpenCL ? CGM.getOpenCLRuntime().getGenericVoidPointerType() : VoidPtrTy;		IsOpenCL ? CGM.getOpenCLRuntime().getGenericVoidPointerType() : VoidPtrTy;
unsigned GenVoidPtrAddr = IsOpenCL ? LangAS::opencl_generic : LangAS::Default;		unsigned GenVoidPtrAddr = IsOpenCL ? LangAS::opencl_generic : LangAS::Default;
auto GenVoidPtrSize = CharUnits::fromQuantity(		auto GenVoidPtrSize = CharUnits::fromQuantity(
CGM.getTarget().getPointerWidth(GenVoidPtrAddr) / 8);		CGM.getTarget().getPointerWidth(GenVoidPtrAddr) / 8);
// Using the computed layout, generate the actual block function.		// Using the computed layout, generate the actual block function.
bool isLambdaConv = blockInfo.getBlockDecl()->isConversionFromLambda();		bool isLambdaConv = blockInfo.getBlockDecl()->isConversionFromLambda();
llvm::Constant *blockFn = CodeGenFunction(CGM, true).GenerateBlockFunction(		auto *InvokeFn = CodeGenFunction(CGM, true).GenerateBlockFunction(
CurGD, blockInfo, LocalDeclMap, isLambdaConv, blockInfo.CanBeGlobal);		CurGD, blockInfo, LocalDeclMap, isLambdaConv, blockInfo.CanBeGlobal);
blockFn = llvm::ConstantExpr::getPointerCast(blockFn, GenVoidPtrTy);		if (InvokeF)
		*InvokeF = InvokeFn;
		auto *blockFn = llvm::ConstantExpr::getPointerCast(InvokeFn, GenVoidPtrTy);

// If there is nothing to capture, we can emit this as a global block.		// If there is nothing to capture, we can emit this as a global block.
if (blockInfo.CanBeGlobal)		if (blockInfo.CanBeGlobal)
return CGM.getAddrOfGlobalBlockIfEmitted(blockInfo.BlockExpression);		return CGM.getAddrOfGlobalBlockIfEmitted(blockInfo.BlockExpression);

// Otherwise, we have to emit this as a local block.		// Otherwise, we have to emit this as a local block.

Address blockAddr = blockInfo.LocalAddress;		Address blockAddr = blockInfo.LocalAddress;
[1,819 lines not shown]

cfe/trunk/lib/CodeGen/CGBuiltin.cpp

[2,773 lines not shown; enclosing context: case Builtin::BIenqueue_kernel:]
LValue NDRangeL = EmitAggExprToLValue(E->getArg(2));		LValue NDRangeL = EmitAggExprToLValue(E->getArg(2));
llvm::Value *Range = NDRangeL.getAddress().getPointer();		llvm::Value *Range = NDRangeL.getAddress().getPointer();
llvm::Type *RangeTy = NDRangeL.getAddress().getType();		llvm::Type *RangeTy = NDRangeL.getAddress().getType();

if (NumArgs == 4) {		if (NumArgs == 4) {
// The most basic form of the call with parameters:		// The most basic form of the call with parameters:
// queue_t, kernel_enqueue_flags_t, ndrange_t, block(void)		// queue_t, kernel_enqueue_flags_t, ndrange_t, block(void)
Name = "__enqueue_kernel_basic";		Name = "__enqueue_kernel_basic";
llvm::Type *ArgTys[] = {QueueTy, Int32Ty, RangeTy, GenericVoidPtrTy};		llvm::Type *ArgTys[] = {QueueTy, Int32Ty, RangeTy, GenericVoidPtrTy,
		GenericVoidPtrTy};
llvm::FunctionType *FTy = llvm::FunctionType::get(		llvm::FunctionType *FTy = llvm::FunctionType::get(
Int32Ty, llvm::ArrayRef<llvm::Type *>(ArgTys, 4), false);		Int32Ty, llvm::ArrayRef<llvm::Type *>(ArgTys), false);

llvm::Value *Block = Builder.CreatePointerCast(		auto Info =
EmitScalarExpr(E->getArg(3)), GenericVoidPtrTy);		CGM.getOpenCLRuntime().emitOpenCLEnqueuedBlock(*this, E->getArg(3));
		llvm::Value *Kernel =
		Builder.CreatePointerCast(Info.Kernel, GenericVoidPtrTy);
		llvm::Value *Block =
		Builder.CreatePointerCast(Info.BlockArg, GenericVoidPtrTy);

AttrBuilder B;		AttrBuilder B;
B.addAttribute(Attribute::ByVal);		B.addAttribute(Attribute::ByVal);
llvm::AttributeList ByValAttrSet =		llvm::AttributeList ByValAttrSet =
llvm::AttributeList::get(CGM.getModule().getContext(), 3U, B);		llvm::AttributeList::get(CGM.getModule().getContext(), 3U, B);

auto RTCall =		auto RTCall =
Builder.CreateCall(CGM.CreateRuntimeFunction(FTy, Name, ByValAttrSet),		Builder.CreateCall(CGM.CreateRuntimeFunction(FTy, Name, ByValAttrSet),
{Queue, Flags, Range, Block});		{Queue, Flags, Range, Kernel, Block});
RTCall->setAttributes(ByValAttrSet);		RTCall->setAttributes(ByValAttrSet);
return RValue::get(RTCall);		return RValue::get(RTCall);
}		}
assert(NumArgs >= 5 && "Invalid enqueue_kernel signature");		assert(NumArgs >= 5 && "Invalid enqueue_kernel signature");

// Create a temporary array to hold the sizes of local pointer arguments		// Create a temporary array to hold the sizes of local pointer arguments
// for the block. \p First is the position of the first size argument.		// for the block. \p First is the position of the first size argument.
auto CreateArrayForSizeVar = [=](unsigned First) {		auto CreateArrayForSizeVar = [=](unsigned First) {
[15 lines not shown; enclosing context: auto CreateArrayForSizeVar = [=](unsigned First)]
}		}
return Ptr;		return Ptr;
};		};

// Could have events and/or vaargs.		// Could have events and/or vaargs.
if (E->getArg(3)->getType()->isBlockPointerType()) {		if (E->getArg(3)->getType()->isBlockPointerType()) {
// No events passed, but has variadic arguments.		// No events passed, but has variadic arguments.
Name = "__enqueue_kernel_vaargs";		Name = "__enqueue_kernel_vaargs";
auto *Block = Builder.CreatePointerCast(EmitScalarExpr(E->getArg(3)),		auto Info =
GenericVoidPtrTy);		CGM.getOpenCLRuntime().emitOpenCLEnqueuedBlock(*this, E->getArg(3));
		llvm::Value *Kernel =
		Builder.CreatePointerCast(Info.Kernel, GenericVoidPtrTy);
		auto *Block = Builder.CreatePointerCast(Info.BlockArg, GenericVoidPtrTy);
auto *PtrToSizeArray = CreateArrayForSizeVar(4);		auto *PtrToSizeArray = CreateArrayForSizeVar(4);

// Create a vector of the arguments, as well as a constant value to		// Create a vector of the arguments, as well as a constant value to
// express to the runtime the number of variadic arguments.		// express to the runtime the number of variadic arguments.
std::vector<llvm::Value *> Args = {Queue,		std::vector<llvm::Value *> Args = {
Flags,		Queue, Flags, Range,
Range,		Kernel, Block, ConstantInt::get(IntTy, NumArgs - 4),
Block,
ConstantInt::get(IntTy, NumArgs - 4),
PtrToSizeArray};		PtrToSizeArray};
std::vector<llvm::Type *> ArgTys = {QueueTy, IntTy,		std::vector<llvm::Type *> ArgTys = {
RangeTy, GenericVoidPtrTy,		QueueTy, IntTy, RangeTy,
IntTy, PtrToSizeArray->getType()};		GenericVoidPtrTy, GenericVoidPtrTy, IntTy,
		PtrToSizeArray->getType()};

llvm::FunctionType *FTy = llvm::FunctionType::get(		llvm::FunctionType *FTy = llvm::FunctionType::get(
Int32Ty, llvm::ArrayRef<llvm::Type *>(ArgTys), false);		Int32Ty, llvm::ArrayRef<llvm::Type *>(ArgTys), false);
return RValue::get(		return RValue::get(
Builder.CreateCall(CGM.CreateRuntimeFunction(FTy, Name),		Builder.CreateCall(CGM.CreateRuntimeFunction(FTy, Name),
llvm::ArrayRef<llvm::Value *>(Args)));		llvm::ArrayRef<llvm::Value *>(Args)));
}		}
// Any calls now have event arguments passed.		// Any calls now have event arguments passed.
if (NumArgs >= 7) {		if (NumArgs >= 7) {
llvm::Type *EventTy = ConvertType(getContext().OCLClkEventTy);		llvm::Type *EventTy = ConvertType(getContext().OCLClkEventTy);
llvm::Type *EventPtrTy = EventTy->getPointerTo(		llvm::Type *EventPtrTy = EventTy->getPointerTo(
CGM.getContext().getTargetAddressSpace(LangAS::opencl_generic));		CGM.getContext().getTargetAddressSpace(LangAS::opencl_generic));

llvm::Value *NumEvents =		llvm::Value *NumEvents =
Builder.CreateZExtOrTrunc(EmitScalarExpr(E->getArg(3)), Int32Ty);		Builder.CreateZExtOrTrunc(EmitScalarExpr(E->getArg(3)), Int32Ty);
llvm::Value *EventList =		llvm::Value *EventList =
E->getArg(4)->getType()->isArrayType()		E->getArg(4)->getType()->isArrayType()
? EmitArrayToPointerDecay(E->getArg(4)).getPointer()		? EmitArrayToPointerDecay(E->getArg(4)).getPointer()
: EmitScalarExpr(E->getArg(4));		: EmitScalarExpr(E->getArg(4));
llvm::Value *ClkEvent = EmitScalarExpr(E->getArg(5));		llvm::Value *ClkEvent = EmitScalarExpr(E->getArg(5));
// Convert to generic address space.		// Convert to generic address space.
EventList = Builder.CreatePointerCast(EventList, EventPtrTy);		EventList = Builder.CreatePointerCast(EventList, EventPtrTy);
ClkEvent = Builder.CreatePointerCast(ClkEvent, EventPtrTy);		ClkEvent = Builder.CreatePointerCast(ClkEvent, EventPtrTy);
llvm::Value *Block = Builder.CreatePointerCast(		auto Info =
EmitScalarExpr(E->getArg(6)), GenericVoidPtrTy);		CGM.getOpenCLRuntime().emitOpenCLEnqueuedBlock(*this, E->getArg(6));
		llvm::Value *Kernel =
		Builder.CreatePointerCast(Info.Kernel, GenericVoidPtrTy);
		llvm::Value *Block =
		Builder.CreatePointerCast(Info.BlockArg, GenericVoidPtrTy);

std::vector<llvm::Type *> ArgTys = {		std::vector<llvm::Type *> ArgTys = {
QueueTy, Int32Ty, RangeTy, Int32Ty,		QueueTy, Int32Ty, RangeTy, Int32Ty,
EventPtrTy, EventPtrTy, GenericVoidPtrTy};		EventPtrTy, EventPtrTy, GenericVoidPtrTy, GenericVoidPtrTy};

std::vector<llvm::Value *> Args = {Queue, Flags, Range, NumEvents,		std::vector<llvm::Value *> Args = {Queue, Flags, Range, NumEvents,
EventList, ClkEvent, Block};		EventList, ClkEvent, Kernel, Block};

if (NumArgs == 7) {		if (NumArgs == 7) {
// Has events but no variadics.		// Has events but no variadics.
Name = "__enqueue_kernel_basic_events";		Name = "__enqueue_kernel_basic_events";
llvm::FunctionType *FTy = llvm::FunctionType::get(		llvm::FunctionType *FTy = llvm::FunctionType::get(
Int32Ty, llvm::ArrayRef<llvm::Type *>(ArgTys), false);		Int32Ty, llvm::ArrayRef<llvm::Type *>(ArgTys), false);
return RValue::get(		return RValue::get(
Builder.CreateCall(CGM.CreateRuntimeFunction(FTy, Name),		Builder.CreateCall(CGM.CreateRuntimeFunction(FTy, Name),
[17 lines not shown; enclosing context: case Builtin::BIenqueue_kernel:]
}		}
LLVM_FALLTHROUGH;		LLVM_FALLTHROUGH;
}		}
// OpenCL v2.0 s6.13.17.6 - Kernel query functions need bitcast of block		// OpenCL v2.0 s6.13.17.6 - Kernel query functions need bitcast of block
// parameter.		// parameter.
case Builtin::BIget_kernel_work_group_size: {		case Builtin::BIget_kernel_work_group_size: {
llvm::Type *GenericVoidPtrTy = Builder.getInt8PtrTy(		llvm::Type *GenericVoidPtrTy = Builder.getInt8PtrTy(
getContext().getTargetAddressSpace(LangAS::opencl_generic));		getContext().getTargetAddressSpace(LangAS::opencl_generic));
Value *Arg = EmitScalarExpr(E->getArg(0));		auto Info =
Arg = Builder.CreatePointerCast(Arg, GenericVoidPtrTy);		CGM.getOpenCLRuntime().emitOpenCLEnqueuedBlock(*this, E->getArg(0));
		Value *Kernel = Builder.CreatePointerCast(Info.Kernel, GenericVoidPtrTy);
		Value *Arg = Builder.CreatePointerCast(Info.BlockArg, GenericVoidPtrTy);
return RValue::get(Builder.CreateCall(		return RValue::get(Builder.CreateCall(
CGM.CreateRuntimeFunction(		CGM.CreateRuntimeFunction(
llvm::FunctionType::get(IntTy, GenericVoidPtrTy, false),		llvm::FunctionType::get(IntTy, {GenericVoidPtrTy, GenericVoidPtrTy},
		false),
"__get_kernel_work_group_size_impl"),		"__get_kernel_work_group_size_impl"),
Arg));		{Kernel, Arg}));
}		}
case Builtin::BIget_kernel_preferred_work_group_size_multiple: {		case Builtin::BIget_kernel_preferred_work_group_size_multiple: {
llvm::Type *GenericVoidPtrTy = Builder.getInt8PtrTy(		llvm::Type *GenericVoidPtrTy = Builder.getInt8PtrTy(
getContext().getTargetAddressSpace(LangAS::opencl_generic));		getContext().getTargetAddressSpace(LangAS::opencl_generic));
Value *Arg = EmitScalarExpr(E->getArg(0));		auto Info =
Arg = Builder.CreatePointerCast(Arg, GenericVoidPtrTy);		CGM.getOpenCLRuntime().emitOpenCLEnqueuedBlock(*this, E->getArg(0));
		Value *Kernel = Builder.CreatePointerCast(Info.Kernel, GenericVoidPtrTy);
		Value *Arg = Builder.CreatePointerCast(Info.BlockArg, GenericVoidPtrTy);
return RValue::get(Builder.CreateCall(		return RValue::get(Builder.CreateCall(
CGM.CreateRuntimeFunction(		CGM.CreateRuntimeFunction(
llvm::FunctionType::get(IntTy, GenericVoidPtrTy, false),		llvm::FunctionType::get(IntTy, {GenericVoidPtrTy, GenericVoidPtrTy},
		false),
"__get_kernel_preferred_work_group_multiple_impl"),		"__get_kernel_preferred_work_group_multiple_impl"),
Arg));		{Kernel, Arg}));
}		}
case Builtin::BIget_kernel_max_sub_group_size_for_ndrange:		case Builtin::BIget_kernel_max_sub_group_size_for_ndrange:
case Builtin::BIget_kernel_sub_group_count_for_ndrange: {		case Builtin::BIget_kernel_sub_group_count_for_ndrange: {
llvm::Type *GenericVoidPtrTy = Builder.getInt8PtrTy(		llvm::Type *GenericVoidPtrTy = Builder.getInt8PtrTy(
getContext().getTargetAddressSpace(LangAS::opencl_generic));		getContext().getTargetAddressSpace(LangAS::opencl_generic));
LValue NDRangeL = EmitAggExprToLValue(E->getArg(0));		LValue NDRangeL = EmitAggExprToLValue(E->getArg(0));
llvm::Value *NDRange = NDRangeL.getAddress().getPointer();		llvm::Value *NDRange = NDRangeL.getAddress().getPointer();
Value *Block = EmitScalarExpr(E->getArg(1));		auto Info =
Block = Builder.CreatePointerCast(Block, GenericVoidPtrTy);		CGM.getOpenCLRuntime().emitOpenCLEnqueuedBlock(*this, E->getArg(1));
		Value *Kernel = Builder.CreatePointerCast(Info.Kernel, GenericVoidPtrTy);
		Value *Block = Builder.CreatePointerCast(Info.BlockArg, GenericVoidPtrTy);
const char *Name =		const char *Name =
BuiltinID == Builtin::BIget_kernel_max_sub_group_size_for_ndrange		BuiltinID == Builtin::BIget_kernel_max_sub_group_size_for_ndrange
? "__get_kernel_max_sub_group_size_for_ndrange_impl"		? "__get_kernel_max_sub_group_size_for_ndrange_impl"
: "__get_kernel_sub_group_count_for_ndrange_impl";		: "__get_kernel_sub_group_count_for_ndrange_impl";
return RValue::get(Builder.CreateCall(		return RValue::get(Builder.CreateCall(
CGM.CreateRuntimeFunction(		CGM.CreateRuntimeFunction(
llvm::FunctionType::get(		llvm::FunctionType::get(
IntTy, {NDRange->getType(), GenericVoidPtrTy}, false),		IntTy, {NDRange->getType(), GenericVoidPtrTy, GenericVoidPtrTy},
		false),
Name),		Name),
{NDRange, Block}));		{NDRange, Kernel, Block}));
}		}

case Builtin::BI__builtin_store_half:		case Builtin::BI__builtin_store_half:
case Builtin::BI__builtin_store_halff: {		case Builtin::BI__builtin_store_halff: {
Value *Val = EmitScalarExpr(E->getArg(0));		Value *Val = EmitScalarExpr(E->getArg(0));
Address Address = EmitPointerWithAlignment(E->getArg(1));		Address Address = EmitPointerWithAlignment(E->getArg(1));
Value *HalfVal = Builder.CreateFPTrunc(Val, Builder.getHalfTy());		Value *HalfVal = Builder.CreateFPTrunc(Val, Builder.getHalfTy());
return RValue::get(Builder.CreateStore(HalfVal, Address));		return RValue::get(Builder.CreateStore(HalfVal, Address));
[7,012 unchanged lines not shown]

cfe/trunk/lib/CodeGen/CGOpenCLRuntime.h

[11 unchanged lines not shown]
// runtime libraries.		// runtime libraries.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_CLANG_LIB_CODEGEN_CGOPENCLRUNTIME_H		#ifndef LLVM_CLANG_LIB_CODEGEN_CGOPENCLRUNTIME_H
#define LLVM_CLANG_LIB_CODEGEN_CGOPENCLRUNTIME_H		#define LLVM_CLANG_LIB_CODEGEN_CGOPENCLRUNTIME_H

#include "clang/AST/Type.h"		#include "clang/AST/Type.h"
		#include "llvm/ADT/DenseMap.h"
#include "llvm/IR/Type.h"		#include "llvm/IR/Type.h"
#include "llvm/IR/Value.h"		#include "llvm/IR/Value.h"

namespace clang {		namespace clang {

		class Expr;
class VarDecl;		class VarDecl;

namespace CodeGen {		namespace CodeGen {

class CodeGenFunction;		class CodeGenFunction;
class CodeGenModule;		class CodeGenModule;

class CGOpenCLRuntime {		class CGOpenCLRuntime {
protected:		protected:
CodeGenModule &CGM;		CodeGenModule &CGM;
llvm::Type *PipeTy;		llvm::Type *PipeTy;
llvm::PointerType *SamplerTy;		llvm::PointerType *SamplerTy;

		/// Structure for enqueued block information.
		struct EnqueuedBlockInfo {
		llvm::Function *Kernel; /// Enqueued block kernel.
		llvm::Value *BlockArg; /// The first argument to enqueued block kernel.
		};
		/// Maps block expression to block information.
		llvm::DenseMap<const Expr *, EnqueuedBlockInfo> EnqueuedBlockMap;

public:		public:
CGOpenCLRuntime(CodeGenModule &CGM) : CGM(CGM), PipeTy(nullptr),		CGOpenCLRuntime(CodeGenModule &CGM) : CGM(CGM), PipeTy(nullptr),
SamplerTy(nullptr) {}		SamplerTy(nullptr) {}
virtual ~CGOpenCLRuntime();		virtual ~CGOpenCLRuntime();

/// Emit the IR required for a work-group-local variable declaration, and add		/// Emit the IR required for a work-group-local variable declaration, and add
/// an entry to CGF's LocalDeclMap for D. The base class does this using		/// an entry to CGF's LocalDeclMap for D. The base class does this using
/// CodeGenFunction::EmitStaticVarDecl to emit an internal global for D.		/// CodeGenFunction::EmitStaticVarDecl to emit an internal global for D.
[11 unchanged lines not shown; context: public:]
virtual llvm::Value *getPipeElemSize(const Expr *PipeArg);		virtual llvm::Value *getPipeElemSize(const Expr *PipeArg);

// \brief Returns a value which indicates the alignment in bytes of the pipe		// \brief Returns a value which indicates the alignment in bytes of the pipe
// element.		// element.
virtual llvm::Value *getPipeElemAlign(const Expr *PipeArg);		virtual llvm::Value *getPipeElemAlign(const Expr *PipeArg);

/// \return __generic void* type.		/// \return __generic void* type.
llvm::PointerType *getGenericVoidPointerType();		llvm::PointerType *getGenericVoidPointerType();

		/// \return enqueued block information for enqueued block.
		EnqueuedBlockInfo emitOpenCLEnqueuedBlock(CodeGenFunction &CGF,
		const Expr *E);
};		};

}		}
}		}

#endif		#endif

cfe/trunk/lib/CodeGen/CGOpenCLRuntime.cpp

[10 unchanged lines not shown]
	// subclasses of this implement code generation for specific OpenCL			// subclasses of this implement code generation for specific OpenCL
	// runtime libraries.			// runtime libraries.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "CGOpenCLRuntime.h"			#include "CGOpenCLRuntime.h"
	#include "CodeGenFunction.h"			#include "CodeGenFunction.h"
	#include "TargetInfo.h"			#include "TargetInfo.h"
				#include "clang/CodeGen/ConstantInitBuilder.h"
	#include "llvm/IR/DerivedTypes.h"			#include "llvm/IR/DerivedTypes.h"
	#include "llvm/IR/GlobalValue.h"			#include "llvm/IR/GlobalValue.h"
	#include <assert.h>			#include <assert.h>

	using namespace clang;			using namespace clang;
	using namespace CodeGen;			using namespace CodeGen;

	CGOpenCLRuntime::~CGOpenCLRuntime() {}			CGOpenCLRuntime::~CGOpenCLRuntime() {}
[78 unchanged lines not shown]
	}			}

	llvm::PointerType *CGOpenCLRuntime::getGenericVoidPointerType() {			llvm::PointerType *CGOpenCLRuntime::getGenericVoidPointerType() {
	assert(CGM.getLangOpts().OpenCL);			assert(CGM.getLangOpts().OpenCL);
	return llvm::IntegerType::getInt8PtrTy(			return llvm::IntegerType::getInt8PtrTy(
	CGM.getLLVMContext(),			CGM.getLLVMContext(),
	CGM.getContext().getTargetAddressSpace(LangAS::opencl_generic));			CGM.getContext().getTargetAddressSpace(LangAS::opencl_generic));
	}			}

				CGOpenCLRuntime::EnqueuedBlockInfo
				CGOpenCLRuntime::emitOpenCLEnqueuedBlock(CodeGenFunction &CGF, const Expr *E) {
				// The block literal may be assigned to a const variable. Chase the
				// variable's initializer down to the block literal.
				if (auto DR = dyn_cast<DeclRefExpr>(E)) {
				E = cast<VarDecl>(DR->getDecl())->getInit();
				}
				if (auto Cast = dyn_cast<CastExpr>(E)) {
				E = Cast->getSubExpr();
				}
				auto *Block = cast<BlockExpr>(E);

				// The same block literal may be enqueued multiple times. Cache it if
				// possible.
				auto Loc = EnqueuedBlockMap.find(Block);
				if (Loc != EnqueuedBlockMap.end()) {
				return Loc->second;
				}

				// Emit block literal as a common block expression and get the block invoke
				// function.
				llvm::Function *Invoke;
				auto *V = CGF.EmitBlockLiteral(cast<BlockExpr>(Block), &Invoke);
				auto *F = CGF.getTargetHooks().createEnqueuedBlockKernel(
				CGF, Invoke, V->stripPointerCasts());

				// The common part of the post-processing of the kernel goes here.
				F->addFnAttr(llvm::Attribute::NoUnwind);
				F->setCallingConv(
				CGF.getTypes().ClangCallConvToLLVMCallConv(CallingConv::CC_OpenCLKernel));
				EnqueuedBlockInfo Info{F, V};
				EnqueuedBlockMap[Block] = Info;
				return Info;
				}
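
The resolve-then-cache logic of emitOpenCLEnqueuedBlock can be illustrated without Clang. What follows is a minimal sketch, assuming toy stand-in classes: Expr, BlockExpr, CastExpr, VarDecl, DeclRefExpr, EnqueuedBlockInfo and EnqueuedBlockCache here are hypothetical and only mimic the real names. A reference to a const block variable is chased to its initializer, one cast is stripped, and a kernel is "emitted" only the first time a given block literal is seen. The sketch uses dynamic_cast where Clang uses its own dyn_cast/cast machinery.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical, heavily simplified stand-ins for the Clang AST classes that
// emitOpenCLEnqueuedBlock inspects. They carry no real Clang semantics.
struct Expr {
  virtual ~Expr() = default;
};
struct BlockExpr : Expr {
  std::string InvokeName;  // name the block invoke function would receive
  explicit BlockExpr(std::string Name) : InvokeName(std::move(Name)) {}
};
struct CastExpr : Expr {
  const Expr *Sub;
  explicit CastExpr(const Expr *S) : Sub(S) {}
};
struct VarDecl {
  const Expr *Init;  // e.g. the initializer of `const bl_t block_G = ...`
};
struct DeclRefExpr : Expr {
  const VarDecl *D;
  explicit DeclRefExpr(const VarDecl *V) : D(V) {}
};

struct EnqueuedBlockInfo {
  std::string Kernel;  // name of the kernel wrapper that was "emitted"
};

class EnqueuedBlockCache {
  std::unordered_map<const BlockExpr *, EnqueuedBlockInfo> Map;

 public:
  int KernelsEmitted = 0;

  // Look through a variable reference to its initializer, then through one
  // cast, mirroring the two dyn_cast checks in the patch.
  static const BlockExpr *resolve(const Expr *E) {
    if (auto *DR = dynamic_cast<const DeclRefExpr *>(E))
      E = DR->D->Init;
    if (auto *C = dynamic_cast<const CastExpr *>(E))
      E = C->Sub;
    auto *B = dynamic_cast<const BlockExpr *>(E);
    assert(B && "expected a block literal");
    return B;
  }

  // Emit a kernel the first time a block literal is enqueued; later
  // enqueues of the same literal reuse the cached result.
  EnqueuedBlockInfo get(const Expr *E) {
    const BlockExpr *B = resolve(E);
    auto It = Map.find(B);
    if (It != Map.end())
      return It->second;
    ++KernelsEmitted;
    EnqueuedBlockInfo Info{B->InvokeName + "_kernel"};
    Map.emplace(B, Info);
    return Info;
  }
};
```

In this model, enqueuing the same const block variable through two separate references, or passing the literal itself, produces a single emitted kernel, which is the property the EnqueuedBlockMap cache in the diff is meant to provide.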

cfe/trunk/lib/CodeGen/CodeGenFunction.h

[1,578 unchanged lines not shown; context: public:]
void generateObjCSetterBody(const ObjCImplementationDecl *classImpl,		void generateObjCSetterBody(const ObjCImplementationDecl *classImpl,
const ObjCPropertyImplDecl *propImpl,		const ObjCPropertyImplDecl *propImpl,
llvm::Constant *AtomicHelperFn);		llvm::Constant *AtomicHelperFn);

//===--------------------------------------------------------------------===//		//===--------------------------------------------------------------------===//
// Block Bits		// Block Bits
//===--------------------------------------------------------------------===//		//===--------------------------------------------------------------------===//

llvm::Value *EmitBlockLiteral(const BlockExpr *);		/// Emit block literal.
		/// \return an LLVM value which is a pointer to a struct which contains
		/// information about the block, including the block invoke function, the
		/// captured variables, etc.
		/// \param InvokeF will contain the block invoke function if it is not
		/// nullptr.
		llvm::Value *EmitBlockLiteral(const BlockExpr *,
		llvm::Function **InvokeF = nullptr);
static void destroyBlockInfos(CGBlockInfo *info);		static void destroyBlockInfos(CGBlockInfo *info);

llvm::Function *GenerateBlockFunction(GlobalDecl GD,		llvm::Function *GenerateBlockFunction(GlobalDecl GD,
const CGBlockInfo &Info,		const CGBlockInfo &Info,
const DeclMapTy &ldm,		const DeclMapTy &ldm,
bool IsLambdaConversionToBlock,		bool IsLambdaConversionToBlock,
bool BuildGlobalBlock);		bool BuildGlobalBlock);

[1,313 unchanged lines not shown]
bool EmitOMPWorksharingLoop(const OMPLoopDirective &S, Expr *EUB,		bool EmitOMPWorksharingLoop(const OMPLoopDirective &S, Expr *EUB,
const CodeGenLoopBoundsTy &CodeGenLoopBounds,		const CodeGenLoopBoundsTy &CodeGenLoopBounds,
const CodeGenDispatchBoundsTy &CGDispatchBounds);		const CodeGenDispatchBoundsTy &CGDispatchBounds);

/// Emits the lvalue for the expression with possibly captured variable.		/// Emits the lvalue for the expression with possibly captured variable.
LValue EmitOMPSharedLValue(const Expr *E);		LValue EmitOMPSharedLValue(const Expr *E);

private:		private:
/// Helpers for blocks		/// Helpers for blocks. Returns invoke function by \p InvokeF if it is not
llvm::Value *EmitBlockLiteral(const CGBlockInfo &Info);		/// nullptr. It should be called without \p InvokeF if the caller does not
		/// need invoke function to be returned.
		llvm::Value *EmitBlockLiteral(const CGBlockInfo &Info,
		llvm::Function **InvokeF = nullptr);

/// Helpers for the OpenMP loop directives.		/// Helpers for the OpenMP loop directives.
void EmitOMPSimdInit(const OMPLoopDirective &D, bool IsMonotonic = false);		void EmitOMPSimdInit(const OMPLoopDirective &D, bool IsMonotonic = false);
void EmitOMPSimdFinal(		void EmitOMPSimdFinal(
const OMPLoopDirective &D,		const OMPLoopDirective &D,
const llvm::function_ref<llvm::Value *(CodeGenFunction &)> &CondGen);		const llvm::function_ref<llvm::Value *(CodeGenFunction &)> &CondGen);

void EmitOMPDistributeLoop(const OMPLoopDirective &S,		void EmitOMPDistributeLoop(const OMPLoopDirective &S,
[1,114 unchanged lines not shown]

cfe/trunk/lib/CodeGen/CodeGenTypes.h

[158 unchanged lines not shown; context: class CodeGenTypes]
SmallVector<const RecordDecl *, 8> DeferredRecords;		SmallVector<const RecordDecl *, 8> DeferredRecords;

/// This map keeps cache of llvm::Types and maps clang::Type to		/// This map keeps cache of llvm::Types and maps clang::Type to
/// corresponding llvm::Type.		/// corresponding llvm::Type.
llvm::DenseMap<const Type *, llvm::Type *> TypeCache;		llvm::DenseMap<const Type *, llvm::Type *> TypeCache;

llvm::SmallSet<const Type *, 8> RecordsWithOpaqueMemberPointers;		llvm::SmallSet<const Type *, 8> RecordsWithOpaqueMemberPointers;

unsigned ClangCallConvToLLVMCallConv(CallingConv CC);

public:		public:
CodeGenTypes(CodeGenModule &cgm);		CodeGenTypes(CodeGenModule &cgm);
~CodeGenTypes();		~CodeGenTypes();

const llvm::DataLayout &getDataLayout() const {		const llvm::DataLayout &getDataLayout() const {
return TheModule.getDataLayout();		return TheModule.getDataLayout();
}		}
ASTContext &getContext() const { return Context; }		ASTContext &getContext() const { return Context; }
const ABIInfo &getABIInfo() const { return TheABIInfo; }		const ABIInfo &getABIInfo() const { return TheABIInfo; }
const TargetInfo &getTarget() const { return Target; }		const TargetInfo &getTarget() const { return Target; }
CGCXXABI &getCXXABI() const { return TheCXXABI; }		CGCXXABI &getCXXABI() const { return TheCXXABI; }
llvm::LLVMContext &getLLVMContext() { return TheModule.getContext(); }		llvm::LLVMContext &getLLVMContext() { return TheModule.getContext(); }
const CodeGenOptions &getCodeGenOpts() const;		const CodeGenOptions &getCodeGenOpts() const;

		/// Convert clang calling convention to LLVM calling convention.
		unsigned ClangCallConvToLLVMCallConv(CallingConv CC);

/// ConvertType - Convert type T into a llvm::Type.		/// ConvertType - Convert type T into a llvm::Type.
llvm::Type *ConvertType(QualType T);		llvm::Type *ConvertType(QualType T);

/// \brief Converts the GlobalDecl into an llvm::Type. This should be used		/// \brief Converts the GlobalDecl into an llvm::Type. This should be used
/// when we know the target of the function we want to convert. This is		/// when we know the target of the function we want to convert. This is
/// because some functions (explicitly, those with pass_object_size		/// because some functions (explicitly, those with pass_object_size
/// parameters) may not have the same signature as their type portrays, and		/// parameters) may not have the same signature as their type portrays, and
/// can only be called directly.		/// can only be called directly.
[190 unchanged lines not shown]

cfe/trunk/lib/CodeGen/TargetInfo.h

[281 unchanged lines not shown; context: public:]
/// Get the custom field values for OpenCL blocks if all values are LLVM		/// Get the custom field values for OpenCL blocks if all values are LLVM
/// constants.		/// constants.
virtual llvm::SmallVector<llvm::Constant *, 1>		virtual llvm::SmallVector<llvm::Constant *, 1>
getCustomFieldValues(CodeGenModule &CGM, const CGBlockInfo &Info) = 0;		getCustomFieldValues(CodeGenModule &CGM, const CGBlockInfo &Info) = 0;
};		};
virtual TargetOpenCLBlockHelper *getTargetOpenCLBlockHelper() const {		virtual TargetOpenCLBlockHelper *getTargetOpenCLBlockHelper() const {
return nullptr;		return nullptr;
}		}

		/// Create an OpenCL kernel for an enqueued block. The kernel function is
		/// a wrapper for the block invoke function with target-specific calling
		/// convention and ABI as an OpenCL kernel. The wrapper function accepts
		/// block context and block arguments in target-specific way and calls
		/// the original block invoke function.
		virtual llvm::Function *
		createEnqueuedBlockKernel(CodeGenFunction &CGF,
		llvm::Function *BlockInvokeFunc,
		llvm::Value *BlockLiteral) const;
};		};

} // namespace CodeGen		} // namespace CodeGen
} // namespace clang		} // namespace clang

#endif // LLVM_CLANG_LIB_CODEGEN_TARGETINFO_H		#endif // LLVM_CLANG_LIB_CODEGEN_TARGETINFO_H
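
The split between the default createEnqueuedBlockKernel and the AMDGPU override is an ordinary virtual hook. The sketch below models only the resulting kernel signatures, assuming parameter types are represented as strings; KernelSig, TargetHooksModel and AMDGPUHooksModel are hypothetical names, not Clang classes. The default keeps the invoke function's parameter list and appends "_kernel" to the name, while the AMDGPU-style override replaces parameter 0 (the generic block pointer) with the block literal struct passed by value and keeps the remaining parameters.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of an emitted function: its name and the
// spelling of each parameter type. The real hook returns llvm::Function*.
struct KernelSig {
  std::string Name;
  std::vector<std::string> Params;
};

// Models the default TargetCodeGenInfo::createEnqueuedBlockKernel: the
// wrapper keeps the invoke function's parameters and appends "_kernel".
struct TargetHooksModel {
  virtual ~TargetHooksModel() = default;
  virtual KernelSig createEnqueuedBlockKernel(
      const KernelSig &Invoke, const std::string & /*LiteralTy*/) const {
    return {Invoke.Name + "_kernel", Invoke.Params};
  }
};

// Models the AMDGPU override: parameter 0 is replaced by the block literal
// struct type (passed by value); the other parameters are kept as-is.
struct AMDGPUHooksModel : TargetHooksModel {
  KernelSig createEnqueuedBlockKernel(
      const KernelSig &Invoke, const std::string &LiteralTy) const override {
    KernelSig K{Invoke.Name + "_kernel", Invoke.Params};
    if (!K.Params.empty())
      K.Params[0] = LiteralTy;
    return K;
  }
};
```

Calling through a reference to the base class selects the override at run time, the same way CGOpenCLRuntime reaches the target-specific version through getTargetHooks(). In both variants the wrapper ends up with exactly as many parameters as the invoke function.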

cfe/trunk/lib/CodeGen/TargetInfo.cpp


//===---- TargetInfo.cpp - Encapsulate target details ------------ C++ --===//		//===---- TargetInfo.cpp - Encapsulate target details ------------ C++ --===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// These classes wrap the information about a call or function		// These classes wrap the information about a call or function
// definition used to handle ABI compliancy.		// definition used to handle ABI compliancy.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "TargetInfo.h"		#include "TargetInfo.h"
#include "ABIInfo.h"		#include "ABIInfo.h"
		#include "CGBlocks.h"
#include "CGCXXABI.h"		#include "CGCXXABI.h"
#include "CGValue.h"		#include "CGValue.h"
#include "CodeGenFunction.h"		#include "CodeGenFunction.h"
#include "clang/AST/RecordLayout.h"		#include "clang/AST/RecordLayout.h"
#include "clang/CodeGen/CGFunctionInfo.h"		#include "clang/CodeGen/CGFunctionInfo.h"
#include "clang/CodeGen/SwiftCallingConv.h"		#include "clang/CodeGen/SwiftCallingConv.h"
#include "clang/Frontend/CodeGenOptions.h"		#include "clang/Frontend/CodeGenOptions.h"
#include "llvm/ADT/StringExtras.h"		#include "llvm/ADT/StringExtras.h"
[7,587 unchanged lines not shown; context: public:]
unsigned getASTAllocaAddressSpace() const override {		unsigned getASTAllocaAddressSpace() const override {
return LangAS::FirstTargetAddressSpace +		return LangAS::FirstTargetAddressSpace +
getABIInfo().getDataLayout().getAllocaAddrSpace();		getABIInfo().getDataLayout().getAllocaAddrSpace();
}		}
unsigned getGlobalVarAddressSpace(CodeGenModule &CGM,		unsigned getGlobalVarAddressSpace(CodeGenModule &CGM,
const VarDecl *D) const override;		const VarDecl *D) const override;
llvm::SyncScope::ID getLLVMSyncScopeID(SyncScope S,		llvm::SyncScope::ID getLLVMSyncScopeID(SyncScope S,
llvm::LLVMContext &C) const override;		llvm::LLVMContext &C) const override;
		llvm::Function *
		createEnqueuedBlockKernel(CodeGenFunction &CGF,
		llvm::Function *BlockInvokeFunc,
		llvm::Value *BlockLiteral) const override;
};		};
}		}

void AMDGPUTargetCodeGenInfo::setTargetAttributes(		void AMDGPUTargetCodeGenInfo::setTargetAttributes(
const Decl *D, llvm::GlobalValue *GV, CodeGen::CodeGenModule &M,		const Decl *D, llvm::GlobalValue *GV, CodeGen::CodeGenModule &M,
ForDefinition_t IsForDefinition) const {		ForDefinition_t IsForDefinition) const {
if (!IsForDefinition)		if (!IsForDefinition)
return;		return;
[1,284 unchanged lines not shown; context: case llvm::Triple::sparcv9:]
return SetCGInfo(new SparcV9TargetCodeGenInfo(Types));		return SetCGInfo(new SparcV9TargetCodeGenInfo(Types));
case llvm::Triple::xcore:		case llvm::Triple::xcore:
return SetCGInfo(new XCoreTargetCodeGenInfo(Types));		return SetCGInfo(new XCoreTargetCodeGenInfo(Types));
case llvm::Triple::spir:		case llvm::Triple::spir:
case llvm::Triple::spir64:		case llvm::Triple::spir64:
return SetCGInfo(new SPIRTargetCodeGenInfo(Types));		return SetCGInfo(new SPIRTargetCodeGenInfo(Types));
}		}
}		}

		/// Create an OpenCL kernel for an enqueued block.
		///
		/// The kernel has the same function type as the block invoke function. Its
		/// name is the name of the block invoke function postfixed with "_kernel".
		/// It simply calls the block invoke function then returns.
		llvm::Function *
		TargetCodeGenInfo::createEnqueuedBlockKernel(CodeGenFunction &CGF,
		llvm::Function *Invoke,
		llvm::Value *BlockLiteral) const {
		auto *InvokeFT = Invoke->getFunctionType();
		llvm::SmallVector<llvm::Type *, 2> ArgTys;
		for (auto &P : InvokeFT->params())
		ArgTys.push_back(P);
		auto &C = CGF.getLLVMContext();
		std::string Name = Invoke->getName().str() + "_kernel";
		auto *FT = llvm::FunctionType::get(llvm::Type::getVoidTy(C), ArgTys, false);
		auto *F = llvm::Function::Create(FT, llvm::GlobalValue::InternalLinkage, Name,
		&CGF.CGM.getModule());
		auto IP = CGF.Builder.saveIP();
		auto *BB = llvm::BasicBlock::Create(C, "entry", F);
		auto &Builder = CGF.Builder;
		Builder.SetInsertPoint(BB);
		llvm::SmallVector<llvm::Value *, 2> Args;
		for (auto &A : F->args())
		Args.push_back(&A);
		Builder.CreateCall(Invoke, Args);
		Builder.CreateRetVoid();
		Builder.restoreIP(IP);
		return F;
		}

		/// Create an OpenCL kernel for an enqueued block.
		///
		/// The type of the first argument (the block literal) is the struct type
		/// of the block literal instead of a pointer type. The first argument
		/// (block literal) is passed directly by value to the kernel. The kernel
		/// allocates the same type of struct on stack and stores the block literal
		/// to it and passes its pointer to the block invoke function. The kernel
		/// has "enqueued-block" function attribute and kernel argument metadata.
		llvm::Function *AMDGPUTargetCodeGenInfo::createEnqueuedBlockKernel(
		CodeGenFunction &CGF, llvm::Function *Invoke,
		llvm::Value *BlockLiteral) const {
		auto &Builder = CGF.Builder;
		auto &C = CGF.getLLVMContext();

		auto *BlockTy = BlockLiteral->getType()->getPointerElementType();
		auto *InvokeFT = Invoke->getFunctionType();
		llvm::SmallVector<llvm::Type *, 2> ArgTys;
		llvm::SmallVector<llvm::Metadata *, 8> AddressQuals;
		llvm::SmallVector<llvm::Metadata *, 8> AccessQuals;
		llvm::SmallVector<llvm::Metadata *, 8> ArgTypeNames;
		llvm::SmallVector<llvm::Metadata *, 8> ArgBaseTypeNames;
		llvm::SmallVector<llvm::Metadata *, 8> ArgTypeQuals;
		llvm::SmallVector<llvm::Metadata *, 8> ArgNames;

		ArgTys.push_back(BlockTy);
		ArgTypeNames.push_back(llvm::MDString::get(C, "__block_literal"));
		AddressQuals.push_back(llvm::ConstantAsMetadata::get(Builder.getInt32(0)));
		ArgBaseTypeNames.push_back(llvm::MDString::get(C, "__block_literal"));
		ArgTypeQuals.push_back(llvm::MDString::get(C, ""));
		AccessQuals.push_back(llvm::MDString::get(C, "none"));
		ArgNames.push_back(llvm::MDString::get(C, "block_literal"));
		for (unsigned I = 1, E = InvokeFT->getNumParams(); I < E; ++I) {
		ArgTys.push_back(InvokeFT->getParamType(I));
		ArgTypeNames.push_back(llvm::MDString::get(C, "void*"));
		AddressQuals.push_back(llvm::ConstantAsMetadata::get(Builder.getInt32(3)));
		AccessQuals.push_back(llvm::MDString::get(C, "none"));
		ArgBaseTypeNames.push_back(llvm::MDString::get(C, "void*"));
		ArgTypeQuals.push_back(llvm::MDString::get(C, ""));
		ArgNames.push_back(
		llvm::MDString::get(C, std::string("local_arg") + std::to_string(I)));
		}
		std::string Name = Invoke->getName().str() + "_kernel";
		auto *FT = llvm::FunctionType::get(llvm::Type::getVoidTy(C), ArgTys, false);
		auto *F = llvm::Function::Create(FT, llvm::GlobalValue::InternalLinkage, Name,
		&CGF.CGM.getModule());
		F->addFnAttr("enqueued-block");
		auto IP = CGF.Builder.saveIP();
		auto *BB = llvm::BasicBlock::Create(C, "entry", F);
		Builder.SetInsertPoint(BB);
		unsigned BlockAlign = CGF.CGM.getDataLayout().getPrefTypeAlignment(BlockTy);
		auto *BlockPtr = Builder.CreateAlloca(BlockTy, nullptr);
		BlockPtr->setAlignment(BlockAlign);
		Builder.CreateAlignedStore(F->arg_begin(), BlockPtr, BlockAlign);
		auto *Cast = Builder.CreatePointerCast(BlockPtr, InvokeFT->getParamType(0));
		llvm::SmallVector<llvm::Value *, 2> Args;
		Args.push_back(Cast);
		for (auto I = F->arg_begin() + 1, E = F->arg_end(); I != E; ++I)
		Args.push_back(I);
		Builder.CreateCall(Invoke, Args);
		Builder.CreateRetVoid();
		Builder.restoreIP(IP);

		F->setMetadata("kernel_arg_addr_space", llvm::MDNode::get(C, AddressQuals));
		F->setMetadata("kernel_arg_access_qual", llvm::MDNode::get(C, AccessQuals));
		F->setMetadata("kernel_arg_type", llvm::MDNode::get(C, ArgTypeNames));
		F->setMetadata("kernel_arg_base_type",
		llvm::MDNode::get(C, ArgBaseTypeNames));
		F->setMetadata("kernel_arg_type_qual", llvm::MDNode::get(C, ArgTypeQuals));
		if (CGF.CGM.getCodeGenOpts().EmitOpenCLArgMetadata)
		F->setMetadata("kernel_arg_name", llvm::MDNode::get(C, ArgNames));

		return F;
		}
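
At run time the two wrapper flavours described in the doc comments above differ only in how the block literal reaches the invoke function. The following host-side C++ sketch is an analogy, not generated code: it assumes a plain struct for the literal of `^(void){ a[0] = b; }` from amdgpu-enqueue-kernel.cl and ordinary pointers in place of address-space-qualified ones; BlockLiteral, blockInvoke, blockInvokeKernelDefault and blockInvokeKernelAMDGPU are hypothetical names. The default-style wrapper forwards the opaque pointer unchanged; the AMDGPU-style wrapper receives the literal by value, copies it into a local (the alloca/store pair in the CHECK lines) and passes the address of that copy.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side model of the literal for ^(void){ a[0] = b; }.
// Field order follows the CHECK lines (two header ints, invoke pointer,
// then captures); the packed IR layout is not reproduced.
struct BlockLiteral {
  std::int32_t Size;
  std::int32_t Align;
  void (*Invoke)(void *);  // stands in for the i8 addrspace(4)* invoke field
  char *A;                 // captured `global char *a`
  char B;                  // captured `char b`
};

// Models __test_block_invoke: it only ever receives an opaque pointer.
void blockInvoke(void *Descriptor) {
  auto *Lit = static_cast<BlockLiteral *>(Descriptor);
  Lit->A[0] = Lit->B;
}

// Default-style wrapper: same signature as the invoke function; it simply
// forwards its argument and returns.
void blockInvokeKernelDefault(void *Descriptor) { blockInvoke(Descriptor); }

// AMDGPU-style wrapper: the literal arrives by value, is copied into a local
// (the alloca + store in the IR), and a pointer to that copy is forwarded.
void blockInvokeKernelAMDGPU(BlockLiteral Literal) {
  BlockLiteral Local = Literal;
  blockInvoke(&Local);
}
```

In this model the invoke function never changes, so the target-specific part is confined to the wrapper; this matches how both createEnqueuedBlockKernel implementations in the diff end with a call to the original invoke function.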

cfe/trunk/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl

				// RUN: %clang_cc1 %s -cl-std=CL2.0 -O0 -emit-llvm -o - -triple amdgcn | FileCheck %s --check-prefix=CHECK

				typedef struct {int a;} ndrange_t;

				// CHECK-LABEL: define amdgpu_kernel void @test
				kernel void test(global char *a, char b, global long *c, long d) {
				queue_t default_queue;
				unsigned flags = 0;
				ndrange_t ndrange;

				enqueue_kernel(default_queue, flags, ndrange,
				^(void) {
				a[0] = b;
				});

				enqueue_kernel(default_queue, flags, ndrange,
				^(void) {
				a[0] = b;
				c[0] = d;
				});
				}

				// CHECK-LABEL: define internal amdgpu_kernel void @__test_block_invoke_kernel(<{ i32, i32, i8 addrspace(4)*, i8 addrspace(1)*, i8 }>)
				// CHECK-SAME: #[[ATTR:[0-9]+]] !kernel_arg_addr_space !{{.*}} !kernel_arg_access_qual !{{.*}} !kernel_arg_type !{{.*}} !kernel_arg_base_type !{{.*}} !kernel_arg_type_qual !{{.*}}
				// CHECK: entry:
				// CHECK: %1 = alloca <{ i32, i32, i8 addrspace(4)*, i8 addrspace(1)*, i8 }>, align 8
				// CHECK: store <{ i32, i32, i8 addrspace(4)*, i8 addrspace(1)*, i8 }> %0, <{ i32, i32, i8 addrspace(4)*, i8 addrspace(1)*, i8 }>* %1, align 8
				// CHECK: %2 = addrspacecast <{ i32, i32, i8 addrspace(4)*, i8 addrspace(1)*, i8 }>* %1 to i8 addrspace(4)*
				// CHECK: call void @__test_block_invoke(i8 addrspace(4)* %2)
				// CHECK: ret void
				// CHECK:}

				// CHECK-LABEL: define internal amdgpu_kernel void @__test_block_invoke_2_kernel(<{ i32, i32, i8 addrspace(4)*, i8 addrspace(1)*, i64 addrspace(1)*, i64, i8 }>)
				// CHECK-SAME: #[[ATTR]] !kernel_arg_addr_space !{{.*}} !kernel_arg_access_qual !{{.*}} !kernel_arg_type !{{.*}} !kernel_arg_base_type !{{.*}} !kernel_arg_type_qual !{{.*}}

				// CHECK: attributes #[[ATTR]] = { nounwind "enqueued-block" }

cfe/trunk/test/CodeGenOpenCL/blocks.cl

[44 unchanged lines not shown; context: void foo()]
};		};
block_B();		block_B();
}		}

// COMMON-LABEL: define internal {{.*}}i32 @__foo_block_invoke(i8 addrspace(4)* %.block_descriptor)		// COMMON-LABEL: define internal {{.*}}i32 @__foo_block_invoke(i8 addrspace(4)* %.block_descriptor)
// COMMON: %[[block:.*]] = bitcast i8 addrspace(4)* %.block_descriptor to <{ i32, i32, i8 addrspace(4)*, i32 }> addrspace(4)*		// COMMON: %[[block:.*]] = bitcast i8 addrspace(4)* %.block_descriptor to <{ i32, i32, i8 addrspace(4)*, i32 }> addrspace(4)*
// COMMON: %[[block_capture_addr:.*]] = getelementptr inbounds <{ i32, i32, i8 addrspace(4)*, i32 }>, <{ i32, i32, i8 addrspace(4)*, i32 }> addrspace(4)* %[[block]], i32 0, i32 3		// COMMON: %[[block_capture_addr:.*]] = getelementptr inbounds <{ i32, i32, i8 addrspace(4)*, i32 }>, <{ i32, i32, i8 addrspace(4)*, i32 }> addrspace(4)* %[[block]], i32 0, i32 3
// COMMON: %[[block_capture:.*]] = load i32, i32 addrspace(4)* %[[block_capture_addr]]		// COMMON: %[[block_capture:.*]] = load i32, i32 addrspace(4)* %[[block_capture_addr]]

		// COMMON-NOT: define{{.*}}@__foo_block_invoke_kernel

cfe/trunk/test/CodeGenOpenCL/cl20-device-side-enqueue.cl

	// RUN: %clang_cc1 %s -cl-std=CL2.0 -ffake-address-space-map -O0 -emit-llvm -o - -triple "spir-unknown-unknown" | FileCheck %s --check-prefix=COMMON --check-prefix=B32			// RUN: %clang_cc1 %s -cl-std=CL2.0 -ffake-address-space-map -O0 -emit-llvm -o - -triple "spir-unknown-unknown" | FileCheck %s --check-prefix=COMMON --check-prefix=B32
	// RUN: %clang_cc1 %s -cl-std=CL2.0 -ffake-address-space-map -O0 -emit-llvm -o - -triple "spir64-unknown-unknown" | FileCheck %s --check-prefix=COMMON --check-prefix=B64			// RUN: %clang_cc1 %s -cl-std=CL2.0 -ffake-address-space-map -O0 -emit-llvm -o - -triple "spir64-unknown-unknown" | FileCheck %s --check-prefix=COMMON --check-prefix=B64

	#pragma OPENCL EXTENSION cl_khr_subgroups : enable			#pragma OPENCL EXTENSION cl_khr_subgroups : enable

	typedef void (^bl_t)(local void *);			typedef void (^bl_t)(local void *);
	typedef struct {int a;} ndrange_t;			typedef struct {int a;} ndrange_t;

	// N.B. The check here only exists to set BL_GLOBAL			// COMMON: %struct.__opencl_block_literal_generic = type { i32, i32, i8 addrspace(4)* }
	// COMMON: @block_G = addrspace(1) constant void (i8 addrspace(3)*) addrspace(4)* addrspacecast (void (i8 addrspace(3)*) addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BL_GLOBAL:@__block_literal_global(\.[0-9]+)?]] to void (i8 addrspace(3)*) addrspace(1)*) to void (i8 addrspace(3)*) addrspace(4)*)
				// For a block global variable, first emit the block literal as a global variable, then emit the block variable itself.
				// COMMON: [[BL_GLOBAL:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* [[INV_G:@[^ ]+]] to i8*) to i8 addrspace(4)*) }
				// COMMON: @block_G = addrspace(1) constant void (i8 addrspace(3)*) addrspace(4)* addrspacecast (void (i8 addrspace(3)*) addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BL_GLOBAL]] to void (i8 addrspace(3)*) addrspace(1)*) to void (i8 addrspace(3)*) addrspace(4)*)

				// For anonymous blocks without captures, emit block literals as global variable.
				// COMMON: [[BLG1:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* {{@[^ ]+}} to i8*) to i8 addrspace(4)*) }
				// COMMON: [[BLG2:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* {{@[^ ]+}} to i8*) to i8 addrspace(4)*) }
				// COMMON: [[BLG3:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* {{@[^ ]+}} to i8*) to i8 addrspace(4)*) }
				// COMMON: [[BLG4:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* {{@[^ ]+}} to i8*) to i8 addrspace(4)*) }
				// COMMON: [[BLG5:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* {{@[^ ]+}} to i8*) to i8 addrspace(4)*) }
				// COMMON: [[BLG6:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*, i8 addrspace(3)*, i8 addrspace(3)*)* {{@[^ ]+}} to i8*) to i8 addrspace(4)*) }
				// COMMON: [[BLG7:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* {{@[^ ]+}} to i8*) to i8 addrspace(4)*) }
				// COMMON: [[BLG8:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*)* [[INVG8:@[^ ]+]] to i8*) to i8 addrspace(4)*) }
				// COMMON: [[BLG9:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* [[INVG9:@[^ ]+]] to i8*) to i8 addrspace(4)*) }
				// COMMON: [[BLG10:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*)* {{@[^ ]+}} to i8*) to i8 addrspace(4)*) }
				// COMMON: [[BLG11:@__block_literal_global[^ ]*]] = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 {{[0-9]+}}, i32 {{[0-9]+}}, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*)* {{@[^ ]+}} to i8*) to i8 addrspace(4)*) }

				// Emits block literal [[BL_GLOBAL]], invoke function [[INV_G]] and global block variable @block_G
				// COMMON: define internal spir_func void [[INV_G]](i8 addrspace(4)* %{{.*}}, i8 addrspace(3)* %{{.*}})
	const bl_t block_G = (bl_t) ^ (local void *a) {};			const bl_t block_G = (bl_t) ^ (local void *a) {};

				// COMMON-LABEL: define spir_kernel void @device_side_enqueue(i32 addrspace(1)* %{{.*}}, i32 addrspace(1)* %b, i32 %i)
	kernel void device_side_enqueue(global int *a, global int *b, int i) {			kernel void device_side_enqueue(global int *a, global int *b, int i) {
	// COMMON: %default_queue = alloca %opencl.queue_t*			// COMMON: %default_queue = alloca %opencl.queue_t*
	queue_t default_queue;			queue_t default_queue;
	// COMMON: %flags = alloca i32			// COMMON: %flags = alloca i32
	unsigned flags = 0;			unsigned flags = 0;
	// COMMON: %ndrange = alloca %struct.ndrange_t			// COMMON: %ndrange = alloca %struct.ndrange_t
	ndrange_t ndrange;			ndrange_t ndrange;
	// COMMON: %clk_event = alloca %opencl.clk_event_t*			// COMMON: %clk_event = alloca %opencl.clk_event_t*
	clk_event_t clk_event;			clk_event_t clk_event;
	// COMMON: %event_wait_list = alloca %opencl.clk_event_t*			// COMMON: %event_wait_list = alloca %opencl.clk_event_t*
	clk_event_t event_wait_list;			clk_event_t event_wait_list;
	// COMMON: %event_wait_list2 = alloca [1 x %opencl.clk_event_t*]			// COMMON: %event_wait_list2 = alloca [1 x %opencl.clk_event_t*]
	clk_event_t event_wait_list2[] = {clk_event};			clk_event_t event_wait_list2[] = {clk_event};

				// Emits block literal on stack and block kernel [[INVLK1]].
	// COMMON: [[NDR:%[a-z0-9]+]] = alloca %struct.ndrange_t, align 4			// COMMON: [[NDR:%[a-z0-9]+]] = alloca %struct.ndrange_t, align 4
	// COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.*}}, %opencl.queue_t{{.*}}* %default_queue			// COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.*}}, %opencl.queue_t{{.*}}* %default_queue
	// COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags			// COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags
				// COMMON: store i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*)* [[INVL1:@__device_side_enqueue_block_invoke[^ ]*]] to i8*) to i8 addrspace(4)*), i8 addrspace(4)** %block.invoke
	// B32: [[BL:%[0-9]+]] = bitcast <{ i32, i32, i8 addrspace(4)*, i32 addrspace(1)*, i32, i32 addrspace(1)* }>* %block to void ()*			// B32: [[BL:%[0-9]+]] = bitcast <{ i32, i32, i8 addrspace(4)*, i32 addrspace(1)*, i32, i32 addrspace(1)* }>* %block to void ()*
	// B64: [[BL:%[0-9]+]] = bitcast <{ i32, i32, i8 addrspace(4)*, i32 addrspace(1)*, i32 addrspace(1)*, i32 }>* %block to void ()*			// B64: [[BL:%[0-9]+]] = bitcast <{ i32, i32, i8 addrspace(4)*, i32 addrspace(1)*, i32 addrspace(1)*, i32 }>* %block to void ()*
	// COMMON: [[BL_I8:%[0-9]+]] = addrspacecast void ()* [[BL]] to i8 addrspace(4)*			// COMMON: [[BL_I8:%[0-9]+]] = addrspacecast void ()* [[BL]] to i8 addrspace(4)*
	// COMMON: call i32 @__enqueue_kernel_basic(%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* byval [[NDR]]{{([0-9]+)?}}, i8 addrspace(4)* [[BL_I8]])			// COMMON-LABEL: call i32 @__enqueue_kernel_basic(
				// COMMON-SAME: %opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* byval [[NDR]]{{([0-9]+)?}},
				// COMMON-SAME: i8 addrspace(4)* addrspacecast (i8* bitcast ({{.*}} [[INVLK1:[^ ]+_kernel]] to i8*) to i8 addrspace(4)*),
				// COMMON-SAME: i8 addrspace(4)* [[BL_I8]])
	enqueue_kernel(default_queue, flags, ndrange,			enqueue_kernel(default_queue, flags, ndrange,
	^(void) {			^(void) {
	a[i] = b[i];			a[i] = b[i];
	});			});

				// Emits block literal on stack and block kernel [[INVLK2]].
	// COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.*}}*, %opencl.queue_t{{.*}}** %default_queue			// COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.*}}*, %opencl.queue_t{{.*}}** %default_queue
	// COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags			// COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags
	// COMMON: [[WAIT_EVNT:%[0-9]+]] = addrspacecast %opencl.clk_event_t{{.*}}** %event_wait_list to %opencl.clk_event_t{{.*}}* addrspace(4)*			// COMMON: [[WAIT_EVNT:%[0-9]+]] = addrspacecast %opencl.clk_event_t{{.*}}** %event_wait_list to %opencl.clk_event_t{{.*}}* addrspace(4)*
	// COMMON: [[EVNT:%[0-9]+]] = addrspacecast %opencl.clk_event_t{{.*}}** %clk_event to %opencl.clk_event_t{{.*}}* addrspace(4)*			// COMMON: [[EVNT:%[0-9]+]] = addrspacecast %opencl.clk_event_t{{.*}}** %clk_event to %opencl.clk_event_t{{.*}}* addrspace(4)*
				// COMMON: store i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*)* [[INVL2:@__device_side_enqueue_block_invoke[^ ]*]] to i8*) to i8 addrspace(4)*), i8 addrspace(4)** %block.invoke
	// COMMON: [[BL:%[0-9]+]] = bitcast <{ i32, i32, i8 addrspace(4)*, i32{{.*}}, i32{{.*}}, i32{{.*}} }>* %block3 to void ()*			// COMMON: [[BL:%[0-9]+]] = bitcast <{ i32, i32, i8 addrspace(4)*, i32{{.*}}, i32{{.*}}, i32{{.*}} }>* %block3 to void ()*
	// COMMON: [[BL_I8:%[0-9]+]] = addrspacecast void ()* [[BL]] to i8 addrspace(4)*			// COMMON: [[BL_I8:%[0-9]+]] = addrspacecast void ()* [[BL]] to i8 addrspace(4)*
	// COMMON: call i32 @__enqueue_kernel_basic_events(%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* {{.*}}, i32 2, %opencl.clk_event_t{{.*}}* addrspace(4)* [[WAIT_EVNT]], %opencl.clk_event_t{{.*}}* addrspace(4)* [[EVNT]], i8 addrspace(4)* [[BL_I8]])			// COMMON-LABEL: call i32 @__enqueue_kernel_basic_events
				// COMMON-SAME: (%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* {{.*}}, i32 2, %opencl.clk_event_t{{.*}}* addrspace(4)* [[WAIT_EVNT]], %opencl.clk_event_t{{.*}}* addrspace(4)* [[EVNT]],
				// COMMON-SAME: i8 addrspace(4)* addrspacecast (i8* bitcast ({{.*}} [[INVLK2:[^ ]+_kernel]] to i8*) to i8 addrspace(4)*),
				// COMMON-SAME: i8 addrspace(4)* [[BL_I8]])

	enqueue_kernel(default_queue, flags, ndrange, 2, &event_wait_list, &clk_event,			enqueue_kernel(default_queue, flags, ndrange, 2, &event_wait_list, &clk_event,
	^(void) {			^(void) {
	a[i] = b[i];			a[i] = b[i];
	});			});

				// Emits global block literal [[BLG1]] and block kernel [[INVGK1]].
	// COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.*}}*, %opencl.queue_t{{.*}}** %default_queue			// COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.*}}*, %opencl.queue_t{{.*}}** %default_queue
	// COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags			// COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags
	// B32: %[[TMP:.*]] = alloca [1 x i32]			// B32: %[[TMP:.*]] = alloca [1 x i32]
	// B32: %[[TMP1:.*]] = getelementptr [1 x i32], [1 x i32]* %[[TMP]], i32 0, i32 0			// B32: %[[TMP1:.*]] = getelementptr [1 x i32], [1 x i32]* %[[TMP]], i32 0, i32 0
	// B32: store i32 256, i32* %[[TMP1]], align 4			// B32: store i32 256, i32* %[[TMP1]], align 4
	// B32: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{([0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)*) to i8 addrspace(4)*), i32 1, i32* %[[TMP1]])
	// B64: %[[TMP:.*]] = alloca [1 x i64]			// B64: %[[TMP:.*]] = alloca [1 x i64]
	// B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0			// B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0
	// B64: store i64 256, i64* %[[TMP1]], align 8			// B64: store i64 256, i64* %[[TMP1]], align 8
	// B64: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{([0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)*) to i8 addrspace(4)*), i32 1, i64* %[[TMP1]])			// COMMON-LABEL: call i32 @__enqueue_kernel_vaargs(
				// COMMON-SAME: %opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{([0-9]+)?}},
				// COMMON-SAME: i8 addrspace(4)* addrspacecast (i8* bitcast ({{.*}} [[INVGK1:[^ ]+_kernel]] to i8*) to i8 addrspace(4)*),
				// COMMON-SAME: i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BLG1]] to i8 addrspace(1)*) to i8 addrspace(4)*), i32 1,
				// B32-SAME: i32* %[[TMP1]])
				// B64-SAME: i64* %[[TMP1]])
	enqueue_kernel(default_queue, flags, ndrange,			enqueue_kernel(default_queue, flags, ndrange,
	^(local void *p) {			^(local void *p) {
	return;			return;
	},			},
	256);			256);
	char c;			char c;
				// Emits global block literal [[BLG2]] and block kernel [[INVGK2]].
	// COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.*}}*, %opencl.queue_t{{.*}}** %default_queue			// COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.*}}*, %opencl.queue_t{{.*}}** %default_queue
	// COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags			// COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags
	// B32: %[[TMP:.*]] = alloca [1 x i32]			// B32: %[[TMP:.*]] = alloca [1 x i32]
	// B32: %[[TMP1:.*]] = getelementptr [1 x i32], [1 x i32]* %[[TMP]], i32 0, i32 0			// B32: %[[TMP1:.*]] = getelementptr [1 x i32], [1 x i32]* %[[TMP]], i32 0, i32 0
	// B32: store i32 %{{.*}}, i32* %[[TMP1]], align 4			// B32: store i32 %{{.*}}, i32* %[[TMP1]], align 4
	// B32: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{([0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)*) to i8 addrspace(4)*), i32 1, i32* %[[TMP1]])
	// B64: %[[TMP:.*]] = alloca [1 x i64]			// B64: %[[TMP:.*]] = alloca [1 x i64]
	// B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0			// B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0
	// B64: store i64 %{{.*}}, i64* %[[TMP1]], align 8			// B64: store i64 %{{.*}}, i64* %[[TMP1]], align 8
	// B64: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{([0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)*) to i8 addrspace(4)*), i32 1, i64* %[[TMP1]])			// COMMON-LABEL: call i32 @__enqueue_kernel_vaargs(
				// COMMON-SAME: %opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{([0-9]+)?}},
				// COMMON-SAME: i8 addrspace(4)* addrspacecast (i8* bitcast ({{.*}} [[INVGK2:[^ ]+_kernel]] to i8*) to i8 addrspace(4)*),
				// COMMON-SAME: i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BLG2]] to i8 addrspace(1)*) to i8 addrspace(4)*), i32 1,
				// B32-SAME: i32* %[[TMP1]])
				// B64-SAME: i64* %[[TMP1]])
	enqueue_kernel(default_queue, flags, ndrange,			enqueue_kernel(default_queue, flags, ndrange,
	^(local void *p) {			^(local void *p) {
	return;			return;
	},			},
	c);			c);

				// Emits global block literal [[BLG3]] and block kernel [[INVGK3]].
	// COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.*}}*, %opencl.queue_t{{.*}}** %default_queue			// COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.*}}*, %opencl.queue_t{{.*}}** %default_queue
	// COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags			// COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags
	// COMMON: [[AD:%arraydecay[0-9]*]] = getelementptr inbounds [1 x %opencl.clk_event_t*], [1 x %opencl.clk_event_t*]* %event_wait_list2, i32 0, i32 0			// COMMON: [[AD:%arraydecay[0-9]*]] = getelementptr inbounds [1 x %opencl.clk_event_t*], [1 x %opencl.clk_event_t*]* %event_wait_list2, i32 0, i32 0
	// COMMON: [[WAIT_EVNT:%[0-9]+]] = addrspacecast %opencl.clk_event_t{{.*}}** [[AD]] to %opencl.clk_event_t{{.*}}* addrspace(4)*			// COMMON: [[WAIT_EVNT:%[0-9]+]] = addrspacecast %opencl.clk_event_t{{.*}}** [[AD]] to %opencl.clk_event_t{{.*}}* addrspace(4)*
	// COMMON: [[EVNT:%[0-9]+]] = addrspacecast %opencl.clk_event_t{{.*}}** %clk_event to %opencl.clk_event_t{{.*}}* addrspace(4)*			// COMMON: [[EVNT:%[0-9]+]] = addrspacecast %opencl.clk_event_t{{.*}}** %clk_event to %opencl.clk_event_t{{.*}}* addrspace(4)*
	// B32: %[[TMP:.*]] = alloca [1 x i32]			// B32: %[[TMP:.*]] = alloca [1 x i32]
	// B32: %[[TMP1:.*]] = getelementptr [1 x i32], [1 x i32]* %[[TMP]], i32 0, i32 0			// B32: %[[TMP1:.*]] = getelementptr [1 x i32], [1 x i32]* %[[TMP]], i32 0, i32 0
	// B32: store i32 256, i32* %[[TMP1]], align 4			// B32: store i32 256, i32* %[[TMP1]], align 4
	// B32: call i32 @__enqueue_kernel_events_vaargs(%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* {{.*}}, i32 2, %opencl.clk_event_t{{.*}} [[WAIT_EVNT]], %opencl.clk_event_t{{.*}} [[EVNT]], i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)*) to i8 addrspace(4)*), i32 1, i32* %[[TMP1]])
	// B64: %[[TMP:.*]] = alloca [1 x i64]			// B64: %[[TMP:.*]] = alloca [1 x i64]
	// B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0			// B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0
	// B64: store i64 256, i64* %[[TMP1]], align 8			// B64: store i64 256, i64* %[[TMP1]], align 8
	// B64: call i32 @__enqueue_kernel_events_vaargs(%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* {{.*}}, i32 2, %opencl.clk_event_t{{.*}} [[WAIT_EVNT]], %opencl.clk_event_t{{.*}} [[EVNT]], i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)*) to i8 addrspace(4)*), i32 1, i64* %[[TMP1]])			// COMMON-LABEL: call i32 @__enqueue_kernel_events_vaargs
				// COMMON-SAME: (%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* {{.*}}, i32 2, %opencl.clk_event_t{{.*}} [[WAIT_EVNT]], %opencl.clk_event_t{{.*}} [[EVNT]],
				// COMMON-SAME: i8 addrspace(4)* addrspacecast (i8* bitcast ({{.*}} [[INVGK3:[^ ]+_kernel]] to i8*) to i8 addrspace(4)*),
				// COMMON-SAME: i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BLG3]] to i8 addrspace(1)*) to i8 addrspace(4)*), i32 1,
				// B32-SAME: i32* %[[TMP1]])
				// B64-SAME: i64* %[[TMP1]])
	enqueue_kernel(default_queue, flags, ndrange, 2, event_wait_list2, &clk_event,			enqueue_kernel(default_queue, flags, ndrange, 2, event_wait_list2, &clk_event,
	^(local void *p) {			^(local void *p) {
	return;			return;
	},			},
	256);			256);

				// Emits global block literal [[BLG4]] and block kernel [[INVGK4]].
	// COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.*}}*, %opencl.queue_t{{.*}}** %default_queue			// COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.*}}*, %opencl.queue_t{{.*}}** %default_queue
	// COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags			// COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags
	// COMMON: [[AD:%arraydecay[0-9]*]] = getelementptr inbounds [1 x %opencl.clk_event_t*], [1 x %opencl.clk_event_t*]* %event_wait_list2, i32 0, i32 0			// COMMON: [[AD:%arraydecay[0-9]*]] = getelementptr inbounds [1 x %opencl.clk_event_t*], [1 x %opencl.clk_event_t*]* %event_wait_list2, i32 0, i32 0
	// COMMON: [[WAIT_EVNT:%[0-9]+]] = addrspacecast %opencl.clk_event_t{{.*}}** [[AD]] to %opencl.clk_event_t{{.*}}* addrspace(4)*			// COMMON: [[WAIT_EVNT:%[0-9]+]] = addrspacecast %opencl.clk_event_t{{.*}}** [[AD]] to %opencl.clk_event_t{{.*}}* addrspace(4)*
	// COMMON: [[EVNT:%[0-9]+]] = addrspacecast %opencl.clk_event_t{{.*}}** %clk_event to %opencl.clk_event_t{{.*}}* addrspace(4)*			// COMMON: [[EVNT:%[0-9]+]] = addrspacecast %opencl.clk_event_t{{.*}}** %clk_event to %opencl.clk_event_t{{.*}}* addrspace(4)*
	// B32: %[[TMP:.*]] = alloca [1 x i32]			// B32: %[[TMP:.*]] = alloca [1 x i32]
	// B32: %[[TMP1:.*]] = getelementptr [1 x i32], [1 x i32]* %[[TMP]], i32 0, i32 0			// B32: %[[TMP1:.*]] = getelementptr [1 x i32], [1 x i32]* %[[TMP]], i32 0, i32 0
	// B32: store i32 %{{.*}}, i32* %[[TMP1]], align 4			// B32: store i32 %{{.*}}, i32* %[[TMP1]], align 4
	// B32: call i32 @__enqueue_kernel_events_vaargs(%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* {{.*}}, i32 2, %opencl.clk_event_t{{.*}}* addrspace(4)* [[WAIT_EVNT]], %opencl.clk_event_t{{.*}}* addrspace(4)* [[EVNT]], i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)*) to i8 addrspace(4)*), i32 1, i32* %[[TMP1]])
	// B64: %[[TMP:.*]] = alloca [1 x i64]			// B64: %[[TMP:.*]] = alloca [1 x i64]
	// B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0			// B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0
	// B64: store i64 %{{.*}}, i64* %[[TMP1]], align 8			// B64: store i64 %{{.*}}, i64* %[[TMP1]], align 8
	// B64: call i32 @__enqueue_kernel_events_vaargs(%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* {{.*}}, i32 2, %opencl.clk_event_t{{.*}}* addrspace(4)* [[WAIT_EVNT]], %opencl.clk_event_t{{.*}}* addrspace(4)* [[EVNT]], i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)*) to i8 addrspace(4)*), i32 1, i64* %[[TMP1]])			// COMMON-LABEL: call i32 @__enqueue_kernel_events_vaargs
				// COMMON-SAME: (%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* {{.*}}, i32 2, %opencl.clk_event_t{{.*}}* addrspace(4)* [[WAIT_EVNT]], %opencl.clk_event_t{{.*}}* addrspace(4)* [[EVNT]],
				// COMMON-SAME: i8 addrspace(4)* addrspacecast (i8* bitcast ({{.*}} [[INVGK4:[^ ]+_kernel]] to i8*) to i8 addrspace(4)*),
				// COMMON-SAME: i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BLG4]] to i8 addrspace(1)*) to i8 addrspace(4)*), i32 1,
				// B32-SAME: i32* %[[TMP1]])
				// B64-SAME: i64* %[[TMP1]])
	enqueue_kernel(default_queue, flags, ndrange, 2, event_wait_list2, &clk_event,			enqueue_kernel(default_queue, flags, ndrange, 2, event_wait_list2, &clk_event,
	^(local void *p) {			^(local void *p) {
	return;			return;
	},			},
	c);			c);

	long l;			long l;
				// Emits global block literal [[BLG5]] and block kernel [[INVGK5]].
	// COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.*}}*, %opencl.queue_t{{.*}}** %default_queue			// COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.*}}*, %opencl.queue_t{{.*}}** %default_queue
	// COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags			// COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags
	// B32: %[[TMP:.*]] = alloca [1 x i32]			// B32: %[[TMP:.*]] = alloca [1 x i32]
	// B32: %[[TMP1:.*]] = getelementptr [1 x i32], [1 x i32]* %[[TMP]], i32 0, i32 0			// B32: %[[TMP1:.*]] = getelementptr [1 x i32], [1 x i32]* %[[TMP]], i32 0, i32 0
	// B32: store i32 %{{.*}}, i32* %[[TMP1]], align 4			// B32: store i32 %{{.*}}, i32* %[[TMP1]], align 4
	// B32: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{([0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)*) to i8 addrspace(4)*), i32 1, i32* %[[TMP1]])
	// B64: %[[TMP:.*]] = alloca [1 x i64]			// B64: %[[TMP:.*]] = alloca [1 x i64]
	// B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0			// B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0
	// B64: store i64 %{{.*}}, i64* %[[TMP1]], align 8			// B64: store i64 %{{.*}}, i64* %[[TMP1]], align 8
	// B64: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{([0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)*) to i8 addrspace(4)*), i32 1, i64* %[[TMP1]])			// COMMON-LABEL: call i32 @__enqueue_kernel_vaargs
				// COMMON-SAME: (%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{([0-9]+)?}},
				// COMMON-SAME: i8 addrspace(4)* addrspacecast (i8* bitcast ({{.*}} [[INVGK5:[^ ]+_kernel]] to i8*) to i8 addrspace(4)*),
				// COMMON-SAME: i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BLG5]] to i8 addrspace(1)*) to i8 addrspace(4)*), i32 1,
				// B32-SAME: i32* %[[TMP1]])
				// B64-SAME: i64* %[[TMP1]])
	enqueue_kernel(default_queue, flags, ndrange,			enqueue_kernel(default_queue, flags, ndrange,
	^(local void *p) {			^(local void *p) {
	return;			return;
	},			},
	l);			l);

				// Emits global block literal [[BLG6]] and block kernel [[INVGK6]].
	// COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.*}}*, %opencl.queue_t{{.*}}** %default_queue			// COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.*}}*, %opencl.queue_t{{.*}}** %default_queue
	// COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags			// COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags
	// B32: %[[TMP:.*]] = alloca [3 x i32]			// B32: %[[TMP:.*]] = alloca [3 x i32]
	// B32: %[[TMP1:.*]] = getelementptr [3 x i32], [3 x i32]* %[[TMP]], i32 0, i32 0			// B32: %[[TMP1:.*]] = getelementptr [3 x i32], [3 x i32]* %[[TMP]], i32 0, i32 0
	// B32: store i32 1, i32* %[[TMP1]], align 4			// B32: store i32 1, i32* %[[TMP1]], align 4
	// B32: %[[TMP2:.*]] = getelementptr [3 x i32], [3 x i32]* %[[TMP]], i32 0, i32 1			// B32: %[[TMP2:.*]] = getelementptr [3 x i32], [3 x i32]* %[[TMP]], i32 0, i32 1
	// B32: store i32 2, i32* %[[TMP2]], align 4			// B32: store i32 2, i32* %[[TMP2]], align 4
	// B32: %[[TMP3:.*]] = getelementptr [3 x i32], [3 x i32]* %[[TMP]], i32 0, i32 2			// B32: %[[TMP3:.*]] = getelementptr [3 x i32], [3 x i32]* %[[TMP]], i32 0, i32 2
	// B32: store i32 4, i32* %[[TMP3]], align 4			// B32: store i32 4, i32* %[[TMP3]], align 4
	// B32: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{([0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)*) to i8 addrspace(4)*), i32 3, i32* %[[TMP1]])
	// B64: %[[TMP:.*]] = alloca [3 x i64]			// B64: %[[TMP:.*]] = alloca [3 x i64]
	// B64: %[[TMP1:.*]] = getelementptr [3 x i64], [3 x i64]* %[[TMP]], i32 0, i32 0			// B64: %[[TMP1:.*]] = getelementptr [3 x i64], [3 x i64]* %[[TMP]], i32 0, i32 0
	// B64: store i64 1, i64* %[[TMP1]], align 8			// B64: store i64 1, i64* %[[TMP1]], align 8
	// B64: %[[TMP2:.*]] = getelementptr [3 x i64], [3 x i64]* %[[TMP]], i32 0, i32 1			// B64: %[[TMP2:.*]] = getelementptr [3 x i64], [3 x i64]* %[[TMP]], i32 0, i32 1
	// B64: store i64 2, i64* %[[TMP2]], align 8			// B64: store i64 2, i64* %[[TMP2]], align 8
	// B64: %[[TMP3:.*]] = getelementptr [3 x i64], [3 x i64]* %[[TMP]], i32 0, i32 2			// B64: %[[TMP3:.*]] = getelementptr [3 x i64], [3 x i64]* %[[TMP]], i32 0, i32 2
	// B64: store i64 4, i64* %[[TMP3]], align 8			// B64: store i64 4, i64* %[[TMP3]], align 8
	// B64: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{([0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)*) to i8 addrspace(4)*), i32 3, i64* %[[TMP1]])			// COMMON-LABEL: call i32 @__enqueue_kernel_vaargs
				// COMMON-SAME: (%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{([0-9]+)?}},
				// COMMON-SAME: i8 addrspace(4)* addrspacecast (i8* bitcast ({{.*}} [[INVGK6:[^ ]+_kernel]] to i8*) to i8 addrspace(4)*),
				// COMMON-SAME: i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BLG6]] to i8 addrspace(1)*) to i8 addrspace(4)*), i32 3,
				// B32-SAME: i32* %[[TMP1]])
				// B64-SAME: i64* %[[TMP1]])
	enqueue_kernel(default_queue, flags, ndrange,			enqueue_kernel(default_queue, flags, ndrange,
	^(local void *p1, local void *p2, local void *p3) {			^(local void *p1, local void *p2, local void *p3) {
	return;			return;
	},			},
	1, 2, 4);			1, 2, 4);

				// Emits global block literal [[BLG7]] and block kernel [[INVGK7]].
	// COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t*, %opencl.queue_t** %default_queue			// COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t*, %opencl.queue_t** %default_queue
	// COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags			// COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags
	// B32: %[[TMP:.*]] = alloca [1 x i32]			// B32: %[[TMP:.*]] = alloca [1 x i32]
	// B32: %[[TMP1:.*]] = getelementptr [1 x i32], [1 x i32]* %[[TMP]], i32 0, i32 0			// B32: %[[TMP1:.*]] = getelementptr [1 x i32], [1 x i32]* %[[TMP]], i32 0, i32 0
	// B32: store i32 0, i32* %[[TMP1]], align 4			// B32: store i32 0, i32* %[[TMP1]], align 4
	// B32: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{([0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)*) to i8 addrspace(4)*), i32 1, i32* %[[TMP1]])
	// B64: %[[TMP:.*]] = alloca [1 x i64]			// B64: %[[TMP:.*]] = alloca [1 x i64]
	// B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0			// B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0
	// B64: store i64 4294967296, i64* %[[TMP1]], align 8			// B64: store i64 4294967296, i64* %[[TMP1]], align 8
	// B64: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{([0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)*) to i8 addrspace(4)*), i32 1, i64* %[[TMP1]])			// COMMON-LABEL: call i32 @__enqueue_kernel_vaargs
				// COMMON-SAME: (%opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{([0-9]+)?}},
				// COMMON-SAME: i8 addrspace(4)* addrspacecast (i8* bitcast ({{.*}} [[INVGK7:[^ ]+_kernel]] to i8*) to i8 addrspace(4)*),
				// COMMON-SAME: i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BLG7]] to i8 addrspace(1)*) to i8 addrspace(4)*), i32 1,
				// B32-SAME: i32* %[[TMP1]])
				// B64-SAME: i64* %[[TMP1]])
	enqueue_kernel(default_queue, flags, ndrange,			enqueue_kernel(default_queue, flags, ndrange,
	^(local void *p) {			^(local void *p) {
	return;			return;
	},			},
	4294967296L);			4294967296L);

				// Emits global block literal [[BLG8]] and invoke function [[INVG8]].
	// The full type of these expressions is long (and repeated elsewhere), so we			// The full type of these expressions is long (and repeated elsewhere), so we
	// capture it as part of the regex for convenience and clarity.			// capture it as part of the regex for convenience and clarity.
	// COMMON: store void () addrspace(4)* addrspacecast (void () addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BL_A:@__block_literal_global(\.[0-9]+)?]] to void () addrspace(1)*) to void () addrspace(4)*), void () addrspace(4)** %block_A			// COMMON: store void () addrspace(4)* addrspacecast (void () addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BLG8]] to void () addrspace(1)*) to void () addrspace(4)*), void () addrspace(4)** %block_A
	void (^const block_A)(void) = ^{			void (^const block_A)(void) = ^{
	return;			return;
	};			};

	// COMMON: store void (i8 addrspace(3)*) addrspace(4)* addrspacecast (void (i8 addrspace(3)*) addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BL_B:@__block_literal_global(\.[0-9]+)?]] to void (i8 addrspace(3)*) addrspace(1)*) to void (i8 addrspace(3)*) addrspace(4)*), void (i8 addrspace(3)*) addrspace(4)** %block_B			// Emits global block literal [[BLG9]] and invoke function [[INVG9]].
				// COMMON: store void (i8 addrspace(3)*) addrspace(4)* addrspacecast (void (i8 addrspace(3)*) addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BLG9]] to void (i8 addrspace(3)*) addrspace(1)*) to void (i8 addrspace(3)*) addrspace(4)*), void (i8 addrspace(3)*) addrspace(4)** %block_B
	void (^const block_B)(local void *) = ^(local void *a) {			void (^const block_B)(local void *) = ^(local void *a) {
	return;			return;
	};			};

	// COMMON: call i32 @__get_kernel_work_group_size_impl(i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BL_A]] to i8 addrspace(1)*) to i8 addrspace(4)*))			// Uses global block literal [[BLG8]] and invoke function [[INVG8]].
				// COMMON: [[r1:%.*]] = load i8 addrspace(4)*, i8 addrspace(4)* addrspace(4)* getelementptr inbounds (%struct.__opencl_block_literal_generic, %struct.__opencl_block_literal_generic addrspace(4)* addrspacecast (%struct.__opencl_block_literal_generic addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BLG8]] to %struct.__opencl_block_literal_generic addrspace(1)*) to %struct.__opencl_block_literal_generic addrspace(4)*), i32 0, i32 2)
				// COMMON: [[r2:%.*]] = addrspacecast i8 addrspace(4)* [[r1]] to void (i8 addrspace(4)*)*
				// COMMON: call spir_func void [[r2]](i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BLG8]] to i8 addrspace(1)*) to i8 addrspace(4)*))
				block_A();

				// Emits global block literal [[BLG8]] and block kernel [[INVGK8]]. [[INVGK8]] calls [[INVG8]].
				// COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.*}}*, %opencl.queue_t{{.*}}** %default_queue
				// COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags
				// COMMON-LABEL: call i32 @__enqueue_kernel_basic(
				// COMMON-SAME: %opencl.queue_t{{.*}}* [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* byval [[NDR]]{{([0-9]+)?}},
				// COMMON-SAME: i8 addrspace(4)* addrspacecast (i8* bitcast ({{.*}} [[INVGK8:[^ ]+_kernel]] to i8*) to i8 addrspace(4)*),
				// COMMON-SAME: i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BLG8]] to i8 addrspace(1)*) to i8 addrspace(4)*))
				enqueue_kernel(default_queue, flags, ndrange, block_A);

				// Uses block kernel [[INVGK8]] and global block literal [[BLG8]].
				// COMMON: call i32 @__get_kernel_work_group_size_impl(
				// COMMON-SAME: i8 addrspace(4)* addrspacecast (i8* bitcast ({{.*}} [[INVGK8]] to i8*) to i8 addrspace(4)*),
				// COMMON-SAME: i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BLG8]] to i8 addrspace(1)*) to i8 addrspace(4)*))
	unsigned size = get_kernel_work_group_size(block_A);			unsigned size = get_kernel_work_group_size(block_A);
	// COMMON: call i32 @__get_kernel_work_group_size_impl(i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BL_B]] to i8 addrspace(1)*) to i8 addrspace(4)*))
				// Uses global block literal [[BLG8]] and invoke function [[INVG8]]. Make sure no redundant block literal and invoke functions are emitted.
				// COMMON: [[r1:%.*]] = load i8 addrspace(4)*, i8 addrspace(4)* addrspace(4)* getelementptr inbounds (%struct.__opencl_block_literal_generic, %struct.__opencl_block_literal_generic addrspace(4)* addrspacecast (%struct.__opencl_block_literal_generic addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BLG8]] to %struct.__opencl_block_literal_generic addrspace(1)*) to %struct.__opencl_block_literal_generic addrspace(4)*), i32 0, i32 2)
				// COMMON: [[r2:%.*]] = addrspacecast i8 addrspace(4)* [[r1]] to void (i8 addrspace(4)*)*
				// COMMON: call spir_func void [[r2]](i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BLG8]] to i8 addrspace(1)*) to i8 addrspace(4)*))
				block_A();

				// Emits global block literal [[BLG9]] and block kernel [[INVGK9]]. [[INVGK9]] calls [[INVG9]].
				// COMMON: call i32 @__get_kernel_work_group_size_impl(
				// COMMON-SAME: i8 addrspace(4)* addrspacecast (i8* bitcast ({{.*}} [[INVGK9:[^ ]+_kernel]] to i8*) to i8 addrspace(4)*),
				// COMMON-SAME: i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BLG9]] to i8 addrspace(1)*) to i8 addrspace(4)*))
	size = get_kernel_work_group_size(block_B);			size = get_kernel_work_group_size(block_B);
	// COMMON: call i32 @__get_kernel_preferred_work_group_multiple_impl(i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BL_A]] to i8 addrspace(1)*) to i8 addrspace(4)*))
				// Uses global block literal [[BLG8]] and block kernel [[INVGK8]]. Make sure no redundant block literal and invoke functions are emitted.
				// COMMON: call i32 @__get_kernel_preferred_work_group_multiple_impl(
				// COMMON-SAME: i8 addrspace(4)* addrspacecast (i8* bitcast ({{.*}} [[INVGK8]] to i8*) to i8 addrspace(4)*),
				// COMMON-SAME: i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BLG8]] to i8 addrspace(1)*) to i8 addrspace(4)*))
	size = get_kernel_preferred_work_group_size_multiple(block_A);			size = get_kernel_preferred_work_group_size_multiple(block_A);
	// COMMON: call i32 @__get_kernel_preferred_work_group_multiple_impl(i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BL_GLOBAL]] to i8 addrspace(1)*) to i8 addrspace(4)*))
				// Uses global block literal [[BL_GLOBAL]] and block kernel [[INV_G_K]]. [[INV_G_K]] calls [[INV_G]].
				// COMMON: call i32 @__get_kernel_preferred_work_group_multiple_impl(
				// COMMON-SAME: i8 addrspace(4)* addrspacecast (i8* bitcast ({{.*}} [[INV_G_K:[^ ]+_kernel]] to i8*) to i8 addrspace(4)*),
				// COMMON-SAME: i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BL_GLOBAL]] to i8 addrspace(1)*) to i8 addrspace(4)*))
	size = get_kernel_preferred_work_group_size_multiple(block_G);			size = get_kernel_preferred_work_group_size_multiple(block_G);

	// COMMON: call i32 @__get_kernel_max_sub_group_size_for_ndrange_impl(%struct.ndrange_t* {{.*}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* {{.*}} to i8 addrspace(1)*) to i8 addrspace(4)*))			// Emits global block literal [[BLG10]] and block kernel [[INVGK10]].
				// COMMON: call i32 @__get_kernel_max_sub_group_size_for_ndrange_impl(%struct.ndrange_t* {{[^,]+}},
				// COMMON-SAME: i8 addrspace(4)* addrspacecast (i8* bitcast ({{.*}} [[INVGK10:[^ ]+_kernel]] to i8*) to i8 addrspace(4)*),
				// COMMON-SAME: i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BLG10]] to i8 addrspace(1)*) to i8 addrspace(4)*))
	size = get_kernel_max_sub_group_size_for_ndrange(ndrange, ^(){});			size = get_kernel_max_sub_group_size_for_ndrange(ndrange, ^(){});
	// COMMON: call i32 @__get_kernel_sub_group_count_for_ndrange_impl(%struct.ndrange_t* {{.*}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* {{.*}} to i8 addrspace(1)*) to i8 addrspace(4)*))
				// Emits global block literal [[BLG11]] and block kernel [[INVGK11]].
				// COMMON: call i32 @__get_kernel_sub_group_count_for_ndrange_impl(%struct.ndrange_t* {{[^,]+}},
				// COMMON-SAME: i8 addrspace(4)* addrspacecast (i8* bitcast ({{.*}} [[INVGK11:[^ ]+_kernel]] to i8*) to i8 addrspace(4)*),
				// COMMON-SAME: i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BLG11]] to i8 addrspace(1)*) to i8 addrspace(4)*))
	size = get_kernel_sub_group_count_for_ndrange(ndrange, ^(){});			size = get_kernel_sub_group_count_for_ndrange(ndrange, ^(){});
	}			}

				// COMMON: define internal spir_kernel void [[INVLK1]](i8 addrspace(4)*) #{{[0-9]+}} {
				// COMMON: entry:
				// COMMON: call void @__device_side_enqueue_block_invoke(i8 addrspace(4)* %0)
				// COMMON: ret void
				// COMMON: }
				// COMMON: define internal spir_kernel void [[INVLK2]](i8 addrspace(4)*{{.*}})
				// COMMON: define internal spir_kernel void [[INVGK1]](i8 addrspace(4)*{{.*}}, i8 addrspace(3)*{{.*}})
				// COMMON: define internal spir_kernel void [[INVGK2]](i8 addrspace(4)*{{.*}}, i8 addrspace(3)*{{.*}})
				// COMMON: define internal spir_kernel void [[INVGK3]](i8 addrspace(4)*{{.*}}, i8 addrspace(3)*{{.*}})
				// COMMON: define internal spir_kernel void [[INVGK4]](i8 addrspace(4)*{{.*}}, i8 addrspace(3)*{{.*}})
				// COMMON: define internal spir_kernel void [[INVGK5]](i8 addrspace(4)*{{.*}}, i8 addrspace(3)*{{.*}})
				// COMMON: define internal spir_kernel void [[INVGK6]](i8 addrspace(4)*, i8 addrspace(3)*, i8 addrspace(3)*, i8 addrspace(3)*) #{{[0-9]+}} {
				// COMMON: entry:
				// COMMON: call void @__device_side_enqueue_block_invoke_8(i8 addrspace(4)* %0, i8 addrspace(3)* %1, i8 addrspace(3)* %2, i8 addrspace(3)* %3)
				// COMMON: ret void
				// COMMON: }
				// COMMON: define internal spir_kernel void [[INVGK7]](i8 addrspace(4)*{{.*}}, i8 addrspace(3)*{{.*}})
				// COMMON: define internal spir_func void [[INVG8]](i8 addrspace(4)*{{.*}})
				// COMMON: define internal spir_func void [[INVG9]](i8 addrspace(4)*{{.*}}, i8 addrspace(3)* %{{.*}})
				// COMMON: define internal spir_kernel void [[INVGK8]](i8 addrspace(4)*{{.*}})
				// COMMON: define internal spir_kernel void [[INVGK9]](i8 addrspace(4)*{{.*}}, i8 addrspace(3)*{{.*}})
				// COMMON: define internal spir_kernel void [[INV_G_K]](i8 addrspace(4)*{{.*}}, i8 addrspace(3)*{{.*}})
				// COMMON: define internal spir_kernel void [[INVGK10]](i8 addrspace(4)*{{.*}})
				// COMMON: define internal spir_kernel void [[INVGK11]](i8 addrspace(4)*{{.*}})
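The `define` checks above pin down the shape of the emitted code: each enqueued block gets an internal `spir_kernel` wrapper (matched by `[^ ]+_kernel`) whose body only calls the block's invoke function with the same arguments, while a block that is also called directly keeps its `spir_func` invoke function (`[[INVG8]]`, `[[INVG9]]`). The C++ below is a toy model of that bookkeeping, not clang's CodeGen API: `ToyModule`, `EmittedFunction`, `getOrEmitInvoke` and `getOrEmitEnqueuedKernel` are hypothetical names, and the `<invoke>_kernel` naming is assumed from the check patterns. It ignores argument ABI and kernel metadata and only sketches how caching per block avoids emitting redundant kernels when the same block is enqueued or queried more than once.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of one emitted function: its name, calling convention,
// parameter types, and the function its body forwards to (empty if none).
struct EmittedFunction {
  std::string name;
  std::string callingConv;          // "spir_func" or "spir_kernel"
  std::vector<std::string> params;  // LLVM-style parameter types
  std::string callee;               // forwarded-to function, if any
};

// Hypothetical toy module; not clang's CodeGenModule or CGOpenCLRuntime.
class ToyModule {
 public:
  // Emits the block invoke function once; used when the block is called directly.
  std::string getOrEmitInvoke(const std::string &invoke,
                              const std::vector<std::string> &params) {
    if (!functions_.count(invoke))
      functions_[invoke] = {invoke, "spir_func", params, ""};
    return invoke;
  }

  // Emits the kernel wrapper once per block; used when the block is enqueued
  // or passed to a kernel query builtin. The wrapper forwards to the invoke
  // function, so a block that is both called and enqueued ends up with two
  // separate functions, and repeated enqueues reuse the cached wrapper.
  std::string getOrEmitEnqueuedKernel(const std::string &invoke,
                                      const std::vector<std::string> &params) {
    getOrEmitInvoke(invoke, params);
    const std::string kernel = invoke + "_kernel";
    if (!functions_.count(kernel))
      functions_[kernel] = {kernel, "spir_kernel", params, invoke};
    return kernel;
  }

  std::size_t size() const { return functions_.size(); }

  const EmittedFunction &get(const std::string &name) const {
    auto it = functions_.find(name);
    assert(it != functions_.end() && "function was never emitted");
    return it->second;
  }

 private:
  std::map<std::string, EmittedFunction> functions_;
};
```

For example, calling `getOrEmitEnqueuedKernel("__device_side_enqueue_block_invoke", {"i8 addrspace(4)*"})` twice returns the same wrapper name and leaves exactly two functions in the model: the `spir_func` invoke function and its `spir_kernel` wrapper, mirroring the `[[INVLK1]]` check.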


Revision Contents

Diff 119017

cfe/trunk/lib/CodeGen/CGBlocks.cpp

cfe/trunk/lib/CodeGen/CGBuiltin.cpp

cfe/trunk/lib/CodeGen/CGOpenCLRuntime.h

cfe/trunk/lib/CodeGen/CGOpenCLRuntime.cpp

cfe/trunk/lib/CodeGen/CodeGenFunction.h

cfe/trunk/lib/CodeGen/CodeGenTypes.h

cfe/trunk/lib/CodeGen/TargetInfo.h

cfe/trunk/lib/CodeGen/TargetInfo.cpp

cfe/trunk/test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl

cfe/trunk/test/CodeGenOpenCL/blocks.cl

cfe/trunk/test/CodeGenOpenCL/cl20-device-side-enqueue.cl
