This is an archive of the discontinued LLVM Phabricator instance.

Paths

cfe/trunk/lib/CodeGen/CGBlocks.cpp
cfe/trunk/lib/CodeGen/CGOpenCLRuntime.h
cfe/trunk/lib/CodeGen/CGOpenCLRuntime.cpp
cfe/trunk/lib/CodeGen/TargetInfo.h
cfe/trunk/test/CodeGen/blocks-opencl.cl
cfe/trunk/test/CodeGenOpenCL/blocks.cl
cfe/trunk/test/CodeGenOpenCL/cl20-device-side-enqueue.cl

Differential D37822

[OpenCL] Clean up and add missing fields for block struct
ClosedPublic

Authored by yaxunl on Sep 13 2017, 12:43 PM.

Details

Reviewers

Anastasia
bader

Commits

rG10712d9203a3: [OpenCL] Clean up and add missing fields for block struct
rC314932: [OpenCL] Clean up and add missing fields for block struct
rL314932: [OpenCL] Clean up and add missing fields for block struct

Summary

Currently a block is translated to a structure equivalent to

struct Block {
  void *isa;
  int flags;
  int reserved;
  void *invoke;
  void *descriptor;
};

Except for invoke, which is the pointer to the block invoke function,
all other fields are useless for OpenCL. They clutter the IR and
also waste memory, since the block struct is passed to the block
invoke function as an argument.

On the other hand, the size and alignment of the block struct are
not stored in the struct, which makes it difficult to implement
__enqueue_kernel as a library function, since the library function
needs to know the size and alignment of the argument that needs
to be passed to the kernel.

This patch removes the useless fields from the block struct and adds
size and align fields. The equivalent block struct will become

struct Block {
  int size;
  int align;
  generic void *invoke;
  /* custom fields */
};

It also changes the pointer to the invoke function to be
a generic pointer since the address space of a function
may not be private on certain targets.
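
For illustration only, the proposed header can be mirrored in plain host C. This is a sketch under assumptions: the OpenCL generic address space qualifier is dropped, and the names block_header, header_is_packed and init_header are hypothetical, not part of the patch. It shows that on common targets the three header fields leave no padding, and that the total size is recorded in the first field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical plain-C mirror of the proposed block header.
   In OpenCL the invoke pointer would live in the generic address space. */
struct block_header {
  int size;     /* total size of the block literal: header plus what follows */
  int align;    /* required alignment of the block literal */
  void *invoke; /* pointer to the block invoke function */
  /* custom fields and captures would follow here */
};

/* Returns nonzero when the compiler inserted no padding in the header.
   Assumption: this holds on typical 32-bit and 64-bit targets. */
static int header_is_packed(void) {
  return offsetof(struct block_header, invoke) == 2 * sizeof(int) &&
         sizeof(struct block_header) == 2 * sizeof(int) + sizeof(void *);
}

/* Fill in the header for a block whose trailing data takes extra_size bytes. */
static void init_header(struct block_header *h, int extra_size, int align,
                        void *invoke) {
  assert(h != NULL && extra_size >= 0 && align > 0);
  h->size = (int)sizeof(struct block_header) + extra_size;
  h->align = align;
  h->invoke = invoke;
}
```

A caller would declare a struct block_header, call init_header on it, and then read size and align back without knowing anything about the captures.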

Diff Detail

Repository: rL LLVM

Event Timeline

yaxunl created this revision. Sep 13 2017, 12:43 PM

b-sumner added a subscriber: b-sumner. Sep 13 2017, 1:13 PM

Fix bug about calling blocks.

Could you please explain a bit more why the alignment has to be put explicitly in the struct? I am just not very convinced this is general enough.

lib/CodeGen/CGBlocks.cpp
314 ↗	(On Diff #115222)	Why removing this?
test/CodeGenOpenCL/blocks.cl
17 ↗	(On Diff #115222)	We don't need to check other fields too?

In D37822#872291, @Anastasia wrote:

Could you please explain a bit more why the alignment has to be put explicitly in the struct? I am just not very convinced this is general enough.

The captured variables are fields of the block literal struct. Because these fields have alignment requirements, the block literal struct
has an alignment requirement as well. The ISA of the block invoke function is generated assuming these alignments. If the block literal is
allocated at a memory address that does not satisfy the alignment requirement, the kernel behavior is undefined.

Generally, the __enqueue_kernel library function needs to prepare the kernel argument before launching the kernel. It usually does this by copying
the block literal to some buffer and then passing the address of the buffer to the kernel. The address of the buffer therefore has to satisfy the
alignment requirement.

If this block literal struct is not general enough, how about adding another field for the target reserved size, leaving the remaining space of the
header for target-specific use, and adding a target hook that lets the target fill the reserved space? E.g.

struct __opencl_block_literal {
  int total_size;
  int align;
  __generic void *invoke;
  int target_reserved_size; /* round up to 4 bytes */
  int target_reserved[];
  /* captures */
};
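
As a rough host-C sketch of how a consumer might skip the reserved area in such a layout: the struct name is changed to avoid a reserved identifier, the __generic qualifier is dropped, and two assumptions are made that the comment does not state, namely that target_reserved_size is measured in bytes and that captures start right after the reserved words. captures_offset is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical plain-C mirror of the proposed layout. */
struct opencl_block_literal {
  int total_size;
  int align;
  void *invoke;
  int target_reserved_size; /* assumed: bytes, rounded up to 4 */
  int target_reserved[];    /* flexible array member, target-specific */
  /* captures follow the reserved words */
};

/* Byte offset from the start of the literal to the first capture,
   ignoring any extra padding the captures themselves may need. */
static size_t captures_offset(const struct opencl_block_literal *b) {
  assert(b != NULL);
  assert(b->target_reserved_size >= 0 && b->target_reserved_size % 4 == 0);
  return offsetof(struct opencl_block_literal, target_reserved) +
         (size_t)b->target_reserved_size;
}
```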

lib/CodeGen/CGBlocks.cpp
314 ↗	(On Diff #115222)	It is not removed. It is moved to line 307.
test/CodeGenOpenCL/blocks.cl
17 ↗	(On Diff #115222)	will add checks.

I like the idea of the target reserved part actually. But not sure how it could be used without adding any target specific methods?

However, I am still not clear why the alignment of this struct has to be different from any other struct Clang produces. Normally the alignment of objects have to be known during IR generation to put them correctly in the attributes of generated alloca, store and loads. But as a field inside struct I don't know how it can be useful. I would imagine enqueue_kernel would just operate on the block as if it would be an arbitrary buffer of data. Also would size of the struct not account for any padding to make sure the alignment can be deduced based on it correctly?

lib/CodeGen/CGOpenCLRuntime.cpp
108 ↗	(On Diff #115222)	Should we put an assert of LangOpts.OpenCL?
test/CodeGen/blocks-opencl.cl
1 ↗	(On Diff #115222)	Btw, do you think we need this test any more? And if yes, could this be moved to CodeGenOpenCL?

In D37822#873876, @Anastasia wrote:
I like the idea of the target reserved part actually. But not sure how it could be used without adding any target specific methods?

If we decide to add target reserved fields, I can add target hooks to fill these fields. However I would suggest to leave this for future since I don't see there is need for other fields for now.

However, I am still not clear why the alignment of this struct has to be different from any other struct Clang produces. Normally the alignment of objects have to be known during IR generation to put them correctly in the attributes of generated alloca, store and loads. But as a field inside struct I don't know how it can be useful. I would imagine enqueue_kernel would just operate on the block as if it would be an arbitrary buffer of data. Also would size of the struct not account for any padding to make sure the alignment can be deduced based on it correctly?

enqueue_kernel needs to pass the block struct to the kernel. Let's assume it does this by copying the block struct to a buffer. If enqueue_kernel does not know the alignment of the struct, it can only put it at an arbitrary address in the buffer. Then the kernel has to copy the struct to aligned private memory and load the fields. However, if enqueue_kernel knows the alignment of the struct, it can put it at an address satisfying the alignment. Then the kernel can load the fields directly from the buffer, skipping the step of copying to aligned private memory. Therefore, the alignment of the block struct is usually useful information for enqueue_kernel. I think that's why in the SPIR-V spec OpEnqueueKernel requires an alignment operand for the block context.
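
The placement step described above can be sketched in host C. This is only an illustration of the arithmetic, not how any real __enqueue_kernel is implemented; align_up and place_block are hypothetical names, and the alignment is assumed to be a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round addr up to the next multiple of align (a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return (addr + align - 1) & ~(align - 1);
}

/* Copy a block of `size` bytes into `buf` at the first address that
   satisfies `align`. Returns the destination, or NULL if it does not fit. */
static void *place_block(void *buf, size_t buf_len, const void *block,
                         size_t size, size_t align) {
  uintptr_t start = (uintptr_t)buf;
  uintptr_t dst = align_up(start, (uintptr_t)align);
  size_t skipped = (size_t)(dst - start);
  if (skipped > buf_len || size > buf_len - skipped)
    return NULL;
  memcpy((void *)dst, block, size);
  return (void *)dst;
}
```

With the align field stored in the block header, a library routine has the value it needs for the align argument without any help from the compiler at the call site.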

lib/CodeGen/CGOpenCLRuntime.cpp
108 ↗	(On Diff #115222)	Will do.
test/CodeGen/blocks-opencl.cl
1 ↗	(On Diff #115222)	I think this one can be removed since what it tests is covered by CodeGenOpenCL/blocks.cl.

In D37822#877572, @yaxunl wrote:
If we decide to add target reserved fields, I can add target hooks to fill these fields. However I would suggest to leave this for future since I don't see there is need for other fields for now.

I could imagine it can be useful for some vendor implementations.

enqueue_kernel needs to pass the block struct to the kernel. Let's assume it does this by copying the block struct to a buffer. If enqueue_kernel does not know the alignment of the struct, it can only put it at an arbitrary address in the buffer. Then the kernel has to copy the struct to aligned private memory and load the fields. However, if enqueue_kernel knows the alignment of the struct, it can put it at an address satisfying the alignment. Then the kernel can load the fields directly from the buffer, skipping the step of copying to aligned private memory. Therefore, the alignment of the block struct is usually useful information for enqueue_kernel. I think that's why in the SPIR-V spec OpEnqueueKernel requires an alignment operand for the block context.

Ok, I just think in C if you use malloc to obtain a pointer to some memory location it doesn't take any alignment information. Then you can use the pointer to copy any data including the struct into the location its pointed to. And the pointer can be used later on correctly. I think the alignment is deduced in this case from the type or the size of an object. Do you know where the alignment information is used for SPIRV call? Also how is the block represented in SPIRV?

In D37822#877903, @Anastasia wrote:
Ok, I just think in C if you use malloc to obtain a pointer to some memory location it doesn't take any alignment information. Then you can use the pointer to copy any data including the struct into the location its pointed to. And the pointer can be used later on correctly. I think the alignment is deduced in this case from the type or the size of an object. Do you know where the alignment information is used for SPIRV call? Also how is the block represented in SPIRV?

Actually malloc alignment is not sufficient for many uses, such as CPU-supported vectors (e.g. AVX512) or memory passed to create a buffer with use-host-pointer. In such cases you need posix_memalign or some similar API. Having the alignment means it is available if needed. If an implementation doesn't need it, there is no harm, is there?
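
To make the point concrete, here is a small host-C sketch that allocates storage with an explicit alignment. It uses C11 aligned_alloc as a stand-in for the posix_memalign call mentioned above; alloc_for_block is a hypothetical helper, and the alignment is assumed to be a power of two that the platform allocator supports.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate storage for a block literal honoring its recorded alignment.
   C11 aligned_alloc expects the size to be a multiple of the alignment,
   so the size is rounded up first. The caller releases it with free(). */
static void *alloc_for_block(size_t size, size_t align) {
  assert(size != 0);
  assert(align != 0 && (align & (align - 1)) == 0);
  size_t rounded = (size + align - 1) & ~(align - 1);
  return aligned_alloc(align, rounded);
}
```

Plain malloc(size) only promises alignment suitable for the fundamental types, which is why a stricter requirement has to be carried separately, for example in the align field.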

Revised per Anastasia's comments.

In D37822#877903, @Anastasia wrote:
Ok, I just think in C if you use malloc to obtain a pointer to some memory location it doesn't take any alignment information. Then you can use the pointer to copy any data including the struct into the location its pointed to. And the pointer can be used later on correctly. I think the alignment is deduced in this case from the type or the size of an object. Do you know where the alignment information is used for SPIRV call? Also how is the block represented in SPIRV?

If you just use malloc and put your struct in it, there is no guarantee that your struct is aligned at the required alignment, then your kernel cannot load a field directly from that memory. For example, if your first field is an int and the instruction can only load an int from an addr aligned at 4, and your malloc'ed addr is aligned at 1, then you cannot load that int directly. Instead, you need to copy the 4 bytes to an addr aligned at 4, then use that instruction to load it. If you use posix_memalign to get an aligned buffer, then your kernel can generate more efficient code.

OpEnqueueKernel instruction in SPIR-V is for representing OpenCL enqueue_kernel. In SPIR-V block is represented by block invoke function. When enqueue_kernel is translated to OpEnqueueKernel, it is required to provide block invoke function, block context, the size and alignment of the block context.
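
The extra copy mentioned in the malloc example above looks roughly like this in host C. It is a sketch with a hypothetical helper name, load_int_unaligned; it shows the portable way to read an int from an address that may not be int-aligned, which is the work a kernel can skip when the buffer is known to be aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Read an int from an address that may not be int-aligned.
   Dereferencing a misaligned int pointer is undefined behavior in C,
   so the bytes are first copied into a properly aligned local. */
static int load_int_unaligned(const unsigned char *p) {
  int v;
  assert(p != NULL);
  memcpy(&v, p, sizeof v);
  return v;
}
```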

Add custom fields to block and target hooks to fill them.
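
A standalone C sketch of the header layout computation with custom fields, loosely following the OpenCL branch of initializeForBlockHeader in the final diff: start after 'int; int; pointer', take the maximum alignment over the custom fields, and require that each field already sits at an aligned offset. The type header_layout and the function layout_header are hypothetical, sizes are passed in as plain numbers instead of being queried from a target, and captures are not included.

```c
#include <assert.h>
#include <stddef.h>

struct header_layout {
  size_t size;  /* bytes occupied by the header and custom fields */
  size_t align; /* alignment required by the header and custom fields */
};

/* Lay out 'int size; int align; void *invoke;' followed by n custom fields.
   The struct is expected to be packed, so every custom field must start at
   an offset that is already a multiple of its alignment. */
static struct header_layout layout_header(size_t int_size, size_t ptr_size,
                                          size_t ptr_align,
                                          const size_t *field_sizes,
                                          const size_t *field_aligns,
                                          size_t n) {
  struct header_layout out;
  size_t offset = 2 * int_size + ptr_size;
  size_t align = ptr_align;
  for (size_t i = 0; i < n; ++i) {
    if (align < field_aligns[i])
      align = field_aligns[i];
    assert(offset % field_aligns[i] == 0); /* no implicit padding allowed */
    offset += field_sizes[i];
  }
  out.size = offset;
  out.align = align;
  return out;
}
```

For a target with 4-byte int and 8-byte pointers, a 16-byte field aligned to 16 followed by an 8-byte field aligned to 8 gives a 40-byte header with alignment 16.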

Anastasia added inline comments. Sep 25 2017, 9:28 AM

lib/CodeGen/CGBlocks.cpp
311 ↗	(On Diff #116363)	remove one "that".
312 ↗	(On Diff #116363)	I think the alignment might not be computed correctly now if there will be custom fields that might have a bigger size than a pointer? Also what happens if we have captures as well?
850 ↗	(On Diff #116363)	do we need to add numeration to each item name?
1250 ↗	(On Diff #116363)	If we reorder fields and put this on top we can merge the if statements above and below this point.

yaxunl marked 4 inline comments as done. Sep 27 2017, 1:41 PM

yaxunl added inline comments.

lib/CodeGen/CGBlocks.cpp
311 ↗	(On Diff #116363)	will do.
312 ↗	(On Diff #116363)	Will fix. The captures will be accounted for by computeBlockInfo and BlockSize and BlockAlign will be updated.
850 ↗	(On Diff #116363)	yes. will add it.
1250 ↗	(On Diff #116363)	By convention the size of the whole struct is the first field so that the library function reads the first integer and knows how many bytes to copy.
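
The size-first convention from the last reply can be sketched in host C as follows. copy_block is a hypothetical helper, not an actual library entry point; it assumes the first field is an int holding the total size in bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a block literal whose first field is its total size in bytes.
   Returns the number of bytes copied, or 0 if dst is too small or the
   recorded size is not positive. */
static size_t copy_block(void *dst, size_t dst_len, const void *block) {
  int size;
  assert(dst != NULL && block != NULL);
  memcpy(&size, block, sizeof size); /* read the leading size field */
  if (size <= 0 || (size_t)size > dst_len)
    return 0;
  memcpy(dst, block, (size_t)size);
  return (size_t)size;
}
```

Because the size sits at offset zero, the routine needs no knowledge of the custom fields or captures that follow.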

Rebased to ToT and revised by Anastasia's comments.

LGTM! Thanks!

test/CodeGenOpenCL/blocks.cl
30 ↗	(On Diff #116877)	It might be better to give those r0-r7 some names for readability if possible!

This revision is now accepted and ready to land. Oct 4 2017, 9:28 AM

yaxunl marked an inline comment as done. Oct 4 2017, 11:53 AM

yaxunl added inline comments.

test/CodeGenOpenCL/blocks.cl
30 ↗	(On Diff #116877)	Will fix it when committing.

Closed by commit rL314932: [OpenCL] Clean up and add missing fields for block struct (authored by yaxunl). Oct 4 2017, 1:34 PM

This revision was automatically updated to reflect the committed changes.

yaxunl marked an inline comment as done.

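Revision Contents

Path	Size
cfe/trunk/lib/CodeGen/CGBlocks.cpp	257 lines
cfe/trunk/lib/CodeGen/CGOpenCLRuntime.h	3 lines
cfe/trunk/lib/CodeGen/CGOpenCLRuntime.cpp	7 lines
cfe/trunk/lib/CodeGen/TargetInfo.h	22 lines
cfe/trunk/test/CodeGen/blocks-opencl.cl	17 lines
cfe/trunk/test/CodeGenOpenCL/blocks.cl	47 lines
cfe/trunk/test/CodeGenOpenCL/cl20-device-side-enqueue.cl	53 lines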

Diff 117732

cfe/trunk/lib/CodeGen/CGBlocks.cpp

 //===--- CGBlocks.cpp - Emit LLVM Code for declarations ---------*- C++ -*-===//
 //
 // The LLVM Compiler Infrastructure
 //
 // This file is distributed under the University of Illinois Open Source
 // License. See LICENSE.TXT for details.
 //
 //===----------------------------------------------------------------------===//
 //
 // This contains code to emit blocks.
 //
 //===----------------------------------------------------------------------===//

 #include "CGBlocks.h"
 #include "CGDebugInfo.h"
 #include "CGObjCRuntime.h"
+#include "CGOpenCLRuntime.h"
 #include "CodeGenFunction.h"
 #include "CodeGenModule.h"
 #include "ConstantEmitter.h"
-#include "clang/CodeGen/ConstantInitBuilder.h"
+#include "TargetInfo.h"
 #include "clang/AST/DeclObjC.h"
+#include "clang/CodeGen/ConstantInitBuilder.h"
 #include "llvm/ADT/SmallSet.h"
 #include "llvm/IR/CallSite.h"
 #include "llvm/IR/DataLayout.h"
 #include "llvm/IR/Module.h"
 #include <algorithm>
 #include <cstdio>

 using namespace clang;

[267 lines not shown]

 /// Get the low bit of a nonzero character count. This is the
 /// alignment of the nth byte if the 0th byte is universally aligned.
 static CharUnits getLowBit(CharUnits v) {
   return CharUnits::fromQuantity(v.getQuantity() & (~v.getQuantity() + 1));
 }

 static void initializeForBlockHeader(CodeGenModule &CGM, CGBlockInfo &info,
                              SmallVectorImpl<llvm::Type*> &elementTypes) {
+
+  assert(elementTypes.empty());
+  if (CGM.getLangOpts().OpenCL) {
+    // The header is basically 'struct { int; int; generic void *;
+    // custom_fields; }'. Assert that struct is packed.
+    auto GenPtrAlign = CharUnits::fromQuantity(
+        CGM.getTarget().getPointerAlign(LangAS::opencl_generic) / 8);
+    auto GenPtrSize = CharUnits::fromQuantity(
+        CGM.getTarget().getPointerWidth(LangAS::opencl_generic) / 8);
+    assert(CGM.getIntSize() <= GenPtrSize);
+    assert(CGM.getIntAlign() <= GenPtrAlign);
+    assert((2 * CGM.getIntSize()).isMultipleOf(GenPtrAlign));
+    elementTypes.push_back(CGM.IntTy); /* total size */
+    elementTypes.push_back(CGM.IntTy); /* align */
+    elementTypes.push_back(
+        CGM.getOpenCLRuntime()
+            .getGenericVoidPointerType()); /* invoke function */
+    unsigned Offset =
+        2 * CGM.getIntSize().getQuantity() + GenPtrSize.getQuantity();
+    unsigned BlockAlign = GenPtrAlign.getQuantity();
+    if (auto *Helper =
+            CGM.getTargetCodeGenInfo().getTargetOpenCLBlockHelper()) {
+      for (auto I : Helper->getCustomFieldTypes()) /* custom fields */ {
+        // TargetOpenCLBlockHelp needs to make sure the struct is packed.
+        // If necessary, add padding fields to the custom fields.
+        unsigned Align = CGM.getDataLayout().getABITypeAlignment(I);
+        if (BlockAlign < Align)
+          BlockAlign = Align;
+        assert(Offset % Align == 0);
+        Offset += CGM.getDataLayout().getTypeAllocSize(I);
+        elementTypes.push_back(I);
+      }
+    }
+    info.BlockAlign = CharUnits::fromQuantity(BlockAlign);
+    info.BlockSize = CharUnits::fromQuantity(Offset);
+  } else {
-  // The header is basically 'struct { void *; int; int; void *; void *; }'.
-  // Assert that that struct is packed.
-  assert(CGM.getIntSize() <= CGM.getPointerSize());
-  assert(CGM.getIntAlign() <= CGM.getPointerAlign());
-  assert((2 * CGM.getIntSize()).isMultipleOf(CGM.getPointerAlign()));
-
-  info.BlockAlign = CGM.getPointerAlign();
-  info.BlockSize = 3 * CGM.getPointerSize() + 2 * CGM.getIntSize();
-
-  assert(elementTypes.empty());
-  elementTypes.push_back(CGM.VoidPtrTy);
-  elementTypes.push_back(CGM.IntTy);
-  elementTypes.push_back(CGM.IntTy);
-  elementTypes.push_back(CGM.VoidPtrTy);
-  elementTypes.push_back(CGM.getBlockDescriptorType());
+    // The header is basically 'struct { void *; int; int; void *; void *; }'.
+    // Assert that that struct is packed.
+    assert(CGM.getIntSize() <= CGM.getPointerSize());
+    assert(CGM.getIntAlign() <= CGM.getPointerAlign());
+    assert((2 * CGM.getIntSize()).isMultipleOf(CGM.getPointerAlign()));
+
+    info.BlockAlign = CGM.getPointerAlign();
+    info.BlockSize = 3 * CGM.getPointerSize() + 2 * CGM.getIntSize();
+
+    elementTypes.push_back(CGM.VoidPtrTy);
+    elementTypes.push_back(CGM.IntTy);
+    elementTypes.push_back(CGM.IntTy);
+    elementTypes.push_back(CGM.VoidPtrTy);
+    elementTypes.push_back(CGM.getBlockDescriptorType());
+  }
 }

 static QualType getCaptureFieldType(const CodeGenFunction &CGF,
                                     const BlockDecl::Capture &CI) {
   const VarDecl *VD = CI.getVariable();

   // If the variable is captured by an enclosing block or lambda expression,
   // use the type of the capture field.
   if (CGF.BlockInfo && CI.isNested())
     return CGF.BlockInfo->getCapture(VD).fieldType();
   if (auto *FD = CGF.LambdaCaptureFields.lookup(VD))
     return FD->getType();
   return VD->getType();
 }

 /// Compute the layout of the given block. Attempts to lay the block
 /// out with minimal space requirements.
 static void computeBlockInfo(CodeGenModule &CGM, CodeGenFunction *CGF,
                              CGBlockInfo &info) {
   ASTContext &C = CGM.getContext();
   const BlockDecl *block = info.getBlockDecl();

   SmallVector<llvm::Type*, 8> elementTypes;
   initializeForBlockHeader(CGM, info, elementTypes);
-  if (!block->hasCaptures()) {
+  bool hasNonConstantCustomFields = false;
+  if (auto *OpenCLHelper =
+          CGM.getTargetCodeGenInfo().getTargetOpenCLBlockHelper())
+    hasNonConstantCustomFields =
+        !OpenCLHelper->areAllCustomFieldValuesConstant(info);
+  if (!block->hasCaptures() && !hasNonConstantCustomFields) {
     info.StructureType =
       llvm::StructType::get(CGM.getLLVMContext(), elementTypes, true);
     info.CanBeGlobal = true;
     return;
   }
   else if (C.getLangOpts().ObjC1 &&
            CGM.getLangOpts().getGC() == LangOptions::NonGC)
     info.HasCapturedVariableLayout = true;

[361 lines not shown; inside llvm::Value *CodeGenFunction::EmitBlockLiteral(const BlockExpr *blockExpr)]

   blockInfo.reset(findAndRemoveBlockInfo(&FirstBlockInfo,
                                          blockExpr->getBlockDecl()));

   blockInfo->BlockExpression = blockExpr;
   return EmitBlockLiteral(*blockInfo);
 }

 llvm::Value *CodeGenFunction::EmitBlockLiteral(const CGBlockInfo &blockInfo) {
+  bool IsOpenCL = CGM.getContext().getLangOpts().OpenCL;
+  auto GenVoidPtrTy =
+      IsOpenCL ? CGM.getOpenCLRuntime().getGenericVoidPointerType() : VoidPtrTy;
+  unsigned GenVoidPtrAddr = IsOpenCL ? LangAS::opencl_generic : LangAS::Default;
+  auto GenVoidPtrSize = CharUnits::fromQuantity(
+      CGM.getTarget().getPointerWidth(GenVoidPtrAddr) / 8);
   // Using the computed layout, generate the actual block function.
   bool isLambdaConv = blockInfo.getBlockDecl()->isConversionFromLambda();
-  llvm::Constant *blockFn
-    = CodeGenFunction(CGM, true).GenerateBlockFunction(CurGD, blockInfo,
-                                                       LocalDeclMap,
-                                                       isLambdaConv,
-                                                       blockInfo.CanBeGlobal);
-  blockFn = llvm::ConstantExpr::getBitCast(blockFn, VoidPtrTy);
+  llvm::Constant *blockFn = CodeGenFunction(CGM, true).GenerateBlockFunction(
+      CurGD, blockInfo, LocalDeclMap, isLambdaConv, blockInfo.CanBeGlobal);
+  blockFn = llvm::ConstantExpr::getPointerCast(blockFn, GenVoidPtrTy);

   // If there is nothing to capture, we can emit this as a global block.
   if (blockInfo.CanBeGlobal)
     return CGM.getAddrOfGlobalBlockIfEmitted(blockInfo.BlockExpression);

   // Otherwise, we have to emit this as a local block.

-  llvm::Constant *isa =
-      (!CGM.getContext().getLangOpts().OpenCL)
-          ? CGM.getNSConcreteStackBlock()
-          : CGM.getNullPointer(VoidPtrPtrTy,
-                               CGM.getContext().getPointerType(
-                                   QualType(CGM.getContext().VoidPtrTy)));
-  isa = llvm::ConstantExpr::getBitCast(isa, VoidPtrTy);
-
-  // Build the block descriptor.
-  llvm::Constant *descriptor = buildBlockDescriptor(CGM, blockInfo);
-
   Address blockAddr = blockInfo.LocalAddress;
   assert(blockAddr.isValid() && "block has no address!");

+  llvm::Constant *isa;
		llvm::Constant *descriptor;
		BlockFlags flags;
		if (!IsOpenCL) {
		isa = llvm::ConstantExpr::getBitCast(CGM.getNSConcreteStackBlock(),
		VoidPtrTy);

		// Build the block descriptor.
		descriptor = buildBlockDescriptor(CGM, blockInfo);

// Compute the initial on-stack block flags.		// Compute the initial on-stack block flags.
BlockFlags flags = BLOCK_HAS_SIGNATURE;		flags = BLOCK_HAS_SIGNATURE;
if (blockInfo.HasCapturedVariableLayout) flags |= BLOCK_HAS_EXTENDED_LAYOUT;		if (blockInfo.HasCapturedVariableLayout)
if (blockInfo.NeedsCopyDispose) flags |= BLOCK_HAS_COPY_DISPOSE;		flags |= BLOCK_HAS_EXTENDED_LAYOUT;
if (blockInfo.HasCXXObject) flags |= BLOCK_HAS_CXX_OBJ;		if (blockInfo.NeedsCopyDispose)
if (blockInfo.UsesStret) flags |= BLOCK_USE_STRET;		flags |= BLOCK_HAS_COPY_DISPOSE;
		if (blockInfo.HasCXXObject)
		flags |= BLOCK_HAS_CXX_OBJ;
		if (blockInfo.UsesStret)
		flags |= BLOCK_USE_STRET;
		}

auto projectField =		auto projectField =
[&](unsigned index, CharUnits offset, const Twine &name) -> Address {		[&](unsigned index, CharUnits offset, const Twine &name) -> Address {
return Builder.CreateStructGEP(blockAddr, index, offset, name);		return Builder.CreateStructGEP(blockAddr, index, offset, name);
};		};
auto storeField =		auto storeField =
[&](llvm::Value *value, unsigned index, CharUnits offset,		[&](llvm::Value *value, unsigned index, CharUnits offset,
const Twine &name) {		const Twine &name) {
Builder.CreateStore(value, projectField(index, offset, name));		Builder.CreateStore(value, projectField(index, offset, name));
};		};

// Initialize the block header.		// Initialize the block header.
{		{
// We assume all the header fields are densely packed.		// We assume all the header fields are densely packed.
unsigned index = 0;		unsigned index = 0;
CharUnits offset;		CharUnits offset;
auto addHeaderField =		auto addHeaderField =
[&](llvm::Value *value, CharUnits size, const Twine &name) {		[&](llvm::Value *value, CharUnits size, const Twine &name) {
storeField(value, index, offset, name);		storeField(value, index, offset, name);
offset += size;		offset += size;
index++;		index++;
};		};

		if (!IsOpenCL) {
addHeaderField(isa, getPointerSize(), "block.isa");		addHeaderField(isa, getPointerSize(), "block.isa");
addHeaderField(llvm::ConstantInt::get(IntTy, flags.getBitMask()),		addHeaderField(llvm::ConstantInt::get(IntTy, flags.getBitMask()),
getIntSize(), "block.flags");		getIntSize(), "block.flags");
addHeaderField(llvm::ConstantInt::get(IntTy, 0),		addHeaderField(llvm::ConstantInt::get(IntTy, 0), getIntSize(),
getIntSize(), "block.reserved");		"block.reserved");
addHeaderField(blockFn, getPointerSize(), "block.invoke");		} else {
		addHeaderField(
		llvm::ConstantInt::get(IntTy, blockInfo.BlockSize.getQuantity()),
		getIntSize(), "block.size");
		addHeaderField(
		llvm::ConstantInt::get(IntTy, blockInfo.BlockAlign.getQuantity()),
		getIntSize(), "block.align");
		}
		addHeaderField(blockFn, GenVoidPtrSize, "block.invoke");
		if (!IsOpenCL)
addHeaderField(descriptor, getPointerSize(), "block.descriptor");		addHeaderField(descriptor, getPointerSize(), "block.descriptor");
		else if (auto *Helper =
		CGM.getTargetCodeGenInfo().getTargetOpenCLBlockHelper()) {
		for (auto I : Helper->getCustomFieldValues(*this, blockInfo)) {
		addHeaderField(
		I.first,
		CharUnits::fromQuantity(
		CGM.getDataLayout().getTypeAllocSize(I.first->getType())),
		I.second);
		}
		}
}		}

// Finally, capture all the values into the block.		// Finally, capture all the values into the block.
const BlockDecl *blockDecl = blockInfo.getBlockDecl();		const BlockDecl *blockDecl = blockInfo.getBlockDecl();

// First, 'this'.		// First, 'this'.
if (blockDecl->capturesCXXThis()) {		if (blockDecl->capturesCXXThis()) {
Address addr = projectField(blockInfo.CXXThisIndex, blockInfo.CXXThisOffset,		Address addr = projectField(blockInfo.CXXThisIndex, blockInfo.CXXThisOffset,
[...]
}		}

llvm::Type *CodeGenModule::getGenericBlockLiteralType() {		llvm::Type *CodeGenModule::getGenericBlockLiteralType() {
if (GenericBlockLiteralType)		if (GenericBlockLiteralType)
return GenericBlockLiteralType;		return GenericBlockLiteralType;

llvm::Type *BlockDescPtrTy = getBlockDescriptorType();		llvm::Type *BlockDescPtrTy = getBlockDescriptorType();

		if (getLangOpts().OpenCL) {
		// struct __opencl_block_literal_generic {
		// int __size;
		// int __align;
		// __generic void *__invoke;
		// /* custom fields */
		// };
		SmallVector<llvm::Type *, 8> StructFields(
		{IntTy, IntTy, getOpenCLRuntime().getGenericVoidPointerType()});
		if (auto *Helper = getTargetCodeGenInfo().getTargetOpenCLBlockHelper()) {
		for (auto I : Helper->getCustomFieldTypes())
		StructFields.push_back(I);
		}
		GenericBlockLiteralType = llvm::StructType::create(
		StructFields, "struct.__opencl_block_literal_generic");
		} else {
// struct __block_literal_generic {		// struct __block_literal_generic {
// void *__isa;		// void *__isa;
// int __flags;		// int __flags;
// int __reserved;		// int __reserved;
// void (*__invoke)(void *);		// void (*__invoke)(void *);
// struct __block_descriptor *__descriptor;		// struct __block_descriptor *__descriptor;
// };		// };
GenericBlockLiteralType =		GenericBlockLiteralType =
llvm::StructType::create("struct.__block_literal_generic", VoidPtrTy,		llvm::StructType::create("struct.__block_literal_generic", VoidPtrTy,
IntTy, IntTy, VoidPtrTy, BlockDescPtrTy);		IntTy, IntTy, VoidPtrTy, BlockDescPtrTy);
		}

return GenericBlockLiteralType;		return GenericBlockLiteralType;
}		}

RValue CodeGenFunction::EmitBlockCallExpr(const CallExpr *E,		RValue CodeGenFunction::EmitBlockCallExpr(const CallExpr *E,
ReturnValueSlot ReturnValue) {		ReturnValueSlot ReturnValue) {
const BlockPointerType *BPT =		const BlockPointerType *BPT =
E->getCallee()->getType()->getAs<BlockPointerType>();		E->getCallee()->getType()->getAs<BlockPointerType>();

llvm::Value *BlockPtr = EmitScalarExpr(E->getCallee());		llvm::Value *BlockPtr = EmitScalarExpr(E->getCallee());

// Get a pointer to the generic block literal.		// Get a pointer to the generic block literal.
// For OpenCL we generate generic AS void ptr to be able to reuse the same		// For OpenCL we generate generic AS void ptr to be able to reuse the same
// block definition for blocks with captures generated as private AS local		// block definition for blocks with captures generated as private AS local
// variables and without captures generated as global AS program scope		// variables and without captures generated as global AS program scope
// variables.		// variables.
unsigned AddrSpace = 0;		unsigned AddrSpace = 0;
if (getLangOpts().OpenCL)		if (getLangOpts().OpenCL)
AddrSpace = getContext().getTargetAddressSpace(LangAS::opencl_generic);		AddrSpace = getContext().getTargetAddressSpace(LangAS::opencl_generic);

llvm::Type *BlockLiteralTy =		llvm::Type *BlockLiteralTy =
llvm::PointerType::get(CGM.getGenericBlockLiteralType(), AddrSpace);		llvm::PointerType::get(CGM.getGenericBlockLiteralType(), AddrSpace);

// Bitcast the callee to a block literal.		// Bitcast the callee to a block literal.
BlockPtr =		BlockPtr =
Builder.CreatePointerCast(BlockPtr, BlockLiteralTy, "block.literal");		Builder.CreatePointerCast(BlockPtr, BlockLiteralTy, "block.literal");

// Get the function pointer from the literal.		// Get the function pointer from the literal.
llvm::Value *FuncPtr =		llvm::Value *FuncPtr =
Builder.CreateStructGEP(CGM.getGenericBlockLiteralType(), BlockPtr, 3);		Builder.CreateStructGEP(CGM.getGenericBlockLiteralType(), BlockPtr,
		CGM.getLangOpts().OpenCL ? 2 : 3);

// Add the block literal.		// Add the block literal.
CallArgList Args;		CallArgList Args;

QualType VoidPtrQualTy = getContext().VoidPtrTy;		QualType VoidPtrQualTy = getContext().VoidPtrTy;
llvm::Type *GenericVoidPtrTy = VoidPtrTy;		llvm::Type *GenericVoidPtrTy = VoidPtrTy;
if (getLangOpts().OpenCL) {		if (getLangOpts().OpenCL) {
GenericVoidPtrTy = Builder.getInt8PtrTy(		GenericVoidPtrTy = CGM.getOpenCLRuntime().getGenericVoidPointerType();
getContext().getTargetAddressSpace(LangAS::opencl_generic));
VoidPtrQualTy =		VoidPtrQualTy =
getContext().getPointerType(getContext().getAddrSpaceQualType(		getContext().getPointerType(getContext().getAddrSpaceQualType(
getContext().VoidTy, LangAS::opencl_generic));		getContext().VoidTy, LangAS::opencl_generic));
}		}

BlockPtr = Builder.CreatePointerCast(BlockPtr, GenericVoidPtrTy);		BlockPtr = Builder.CreatePointerCast(BlockPtr, GenericVoidPtrTy);
Args.add(RValue::get(BlockPtr), VoidPtrQualTy);		Args.add(RValue::get(BlockPtr), VoidPtrQualTy);

QualType FnType = BPT->getPointeeType();		QualType FnType = BPT->getPointeeType();

// And the rest of the arguments.		// And the rest of the arguments.
EmitCallArgs(Args, FnType->getAs<FunctionProtoType>(), E->arguments());		EmitCallArgs(Args, FnType->getAs<FunctionProtoType>(), E->arguments());

// Load the function.		// Load the function.
llvm::Value *Func = Builder.CreateAlignedLoad(FuncPtr, getPointerAlign());		llvm::Value *Func = Builder.CreateAlignedLoad(FuncPtr, getPointerAlign());

const FunctionType *FuncTy = FnType->castAs<FunctionType>();		const FunctionType *FuncTy = FnType->castAs<FunctionType>();
const CGFunctionInfo &FnInfo =		const CGFunctionInfo &FnInfo =
CGM.getTypes().arrangeBlockFunctionCall(Args, FuncTy);		CGM.getTypes().arrangeBlockFunctionCall(Args, FuncTy);

// Cast the function pointer to the right type.		// Cast the function pointer to the right type.
llvm::Type *BlockFTy = CGM.getTypes().GetFunctionType(FnInfo);		llvm::Type *BlockFTy = CGM.getTypes().GetFunctionType(FnInfo);

llvm::Type *BlockFTyPtr = llvm::PointerType::getUnqual(BlockFTy);		llvm::Type *BlockFTyPtr = llvm::PointerType::getUnqual(BlockFTy);
Func = Builder.CreateBitCast(Func, BlockFTyPtr);		Func = Builder.CreatePointerCast(Func, BlockFTyPtr);

// Prepare the callee.		// Prepare the callee.
CGCallee Callee(CGCalleeInfo(), Func);		CGCallee Callee(CGCalleeInfo(), Func);

// And call the block.		// And call the block.
return EmitCall(FnInfo, Callee, ReturnValue, Args);		return EmitCall(FnInfo, Callee, ReturnValue, Args);
}		}

[...]
static llvm::Constant *buildGlobalBlock(CodeGenModule &CGM,
// if we've already emitted this block.		// if we've already emitted this block.
assert(!CGM.getAddrOfGlobalBlockIfEmitted(blockInfo.BlockExpression) &&		assert(!CGM.getAddrOfGlobalBlockIfEmitted(blockInfo.BlockExpression) &&
"Refusing to re-emit a global block.");		"Refusing to re-emit a global block.");

// Generate the constants for the block literal initializer.		// Generate the constants for the block literal initializer.
ConstantInitBuilder builder(CGM);		ConstantInitBuilder builder(CGM);
auto fields = builder.beginStruct();		auto fields = builder.beginStruct();

		bool IsOpenCL = CGM.getLangOpts().OpenCL;
		if (!IsOpenCL) {
// isa		// isa
fields.add((!CGM.getContext().getLangOpts().OpenCL)		fields.add(CGM.getNSConcreteGlobalBlock());
? CGM.getNSConcreteGlobalBlock()
: CGM.getNullPointer(CGM.VoidPtrPtrTy,
CGM.getContext().getPointerType(QualType(
CGM.getContext().VoidPtrTy))));

// __flags		// __flags
BlockFlags flags = BLOCK_IS_GLOBAL | BLOCK_HAS_SIGNATURE;		BlockFlags flags = BLOCK_IS_GLOBAL | BLOCK_HAS_SIGNATURE;
if (blockInfo.UsesStret) flags |= BLOCK_USE_STRET;		if (blockInfo.UsesStret)
		flags |= BLOCK_USE_STRET;

fields.addInt(CGM.IntTy, flags.getBitMask());		fields.addInt(CGM.IntTy, flags.getBitMask());

// Reserved		// Reserved
fields.addInt(CGM.IntTy, 0);		fields.addInt(CGM.IntTy, 0);
		} else {
		fields.addInt(CGM.IntTy, blockInfo.BlockSize.getQuantity());
		fields.addInt(CGM.IntTy, blockInfo.BlockAlign.getQuantity());
		}

// Function		// Function
fields.add(blockFn);		fields.add(blockFn);

		if (!IsOpenCL) {
// Descriptor		// Descriptor
fields.add(buildBlockDescriptor(CGM, blockInfo));		fields.add(buildBlockDescriptor(CGM, blockInfo));
		} else if (auto *Helper =
		CGM.getTargetCodeGenInfo().getTargetOpenCLBlockHelper()) {
		for (auto I : Helper->getCustomFieldValues(CGM, blockInfo)) {
		fields.add(I);
		}
		}

unsigned AddrSpace = 0;		unsigned AddrSpace = 0;
if (CGM.getContext().getLangOpts().OpenCL)		if (CGM.getContext().getLangOpts().OpenCL)
AddrSpace = CGM.getContext().getTargetAddressSpace(LangAS::opencl_global);		AddrSpace = CGM.getContext().getTargetAddressSpace(LangAS::opencl_global);

llvm::Constant *literal = fields.finishAndCreateGlobal(		llvm::Constant *literal = fields.finishAndCreateGlobal(
"__block_literal_global", blockInfo.BlockAlign,		"__block_literal_global", blockInfo.BlockAlign,
/*constant*/ true, llvm::GlobalVariable::InternalLinkage, AddrSpace);		/*constant*/ true, llvm::GlobalVariable::InternalLinkage, AddrSpace);
[...]
CodeGenFunction::GenerateBlockFunction(GlobalDecl GD,

llvm::FunctionType *fnLLVMType = CGM.getTypes().GetFunctionType(fnInfo);		llvm::FunctionType *fnLLVMType = CGM.getTypes().GetFunctionType(fnInfo);

StringRef name = CGM.getBlockMangledName(GD, blockDecl);		StringRef name = CGM.getBlockMangledName(GD, blockDecl);
llvm::Function *fn = llvm::Function::Create(		llvm::Function *fn = llvm::Function::Create(
fnLLVMType, llvm::GlobalValue::InternalLinkage, name, &CGM.getModule());		fnLLVMType, llvm::GlobalValue::InternalLinkage, name, &CGM.getModule());
CGM.SetInternalFunctionAttributes(blockDecl, fn, fnInfo);		CGM.SetInternalFunctionAttributes(blockDecl, fn, fnInfo);

if (BuildGlobalBlock)		if (BuildGlobalBlock) {
		auto GenVoidPtrTy = getContext().getLangOpts().OpenCL
		? CGM.getOpenCLRuntime().getGenericVoidPointerType()
		: VoidPtrTy;
buildGlobalBlock(CGM, blockInfo,		buildGlobalBlock(CGM, blockInfo,
llvm::ConstantExpr::getBitCast(fn, VoidPtrTy));		llvm::ConstantExpr::getPointerCast(fn, GenVoidPtrTy));
		}

// Begin generating the function.		// Begin generating the function.
StartFunction(blockDecl, fnType->getReturnType(), fn, fnInfo, args,		StartFunction(blockDecl, fnType->getReturnType(), fn, fnInfo, args,
blockDecl->getLocation(),		blockDecl->getLocation(),
blockInfo.getBlockExpr()->getBody()->getLocStart());		blockInfo.getBlockExpr()->getBody()->getLocStart());

// Okay. Undo some of what StartFunction did.		// Okay. Undo some of what StartFunction did.

[...]
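The header initialization in EmitBlockLiteral assumes densely packed fields: each addHeaderField call stores at the running offset and then advances the offset by the field size. The following is a minimal standalone sketch of that bookkeeping, not clang code. It assumes a 4-byte int and a generic pointer of either 4 or 8 bytes; the names packFields and openclBlockFieldSizes are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Result of packing fields back to back with no padding.
struct PackedLayoutSketch {
  std::vector<unsigned> offsets; // byte offset of each field
  unsigned size = 0;             // total packed size in bytes
};

// Mirrors the "store at offset, then offset += size" pattern of
// addHeaderField. Field sizes are given in bytes.
inline PackedLayoutSketch packFields(const std::vector<unsigned> &fieldSizes) {
  PackedLayoutSketch layout;
  for (unsigned fieldSize : fieldSizes) {
    assert(fieldSize > 0 && "every field occupies at least one byte");
    layout.offsets.push_back(layout.size);
    layout.size += fieldSize;
  }
  return layout;
}

// Field sizes of an OpenCL block: size (int), align (int), invoke (generic
// pointer), followed by captures. The 4-byte int is an assumption.
inline std::vector<unsigned>
openclBlockFieldSizes(unsigned genericPointerSize,
                      const std::vector<unsigned> &captureSizes) {
  std::vector<unsigned> sizes = {4, 4, genericPointerSize};
  sizes.insert(sizes.end(), captureSizes.begin(), captureSizes.end());
  return sizes;
}
```

Under these assumptions a capture-free block packs to 12 bytes with 4-byte pointers and 16 bytes with 8-byte pointers, and adding one int capture gives 16 and 20 bytes. Those are the values the SPIR and AMD check lines in blocks.cl expect for the size field.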

cfe/trunk/lib/CodeGen/CGOpenCLRuntime.h

[...]
public:

// \brief Returnes a value which indicates the size in bytes of the pipe		// \brief Returnes a value which indicates the size in bytes of the pipe
// element.		// element.
virtual llvm::Value *getPipeElemSize(const Expr *PipeArg);		virtual llvm::Value *getPipeElemSize(const Expr *PipeArg);

// \brief Returnes a value which indicates the alignment in bytes of the pipe		// \brief Returnes a value which indicates the alignment in bytes of the pipe
// element.		// element.
virtual llvm::Value *getPipeElemAlign(const Expr *PipeArg);		virtual llvm::Value *getPipeElemAlign(const Expr *PipeArg);

		/// \return __generic void* type.
		llvm::PointerType *getGenericVoidPointerType();
};		};

}		}
}		}

#endif		#endif

cfe/trunk/lib/CodeGen/CGOpenCLRuntime.cpp

[...]
llvm::Value *CGOpenCLRuntime::getPipeElemAlign(const Expr *PipeArg) {
const PipeType *PipeTy = PipeArg->getType()->getAs<PipeType>();		const PipeType *PipeTy = PipeArg->getType()->getAs<PipeType>();
// The type of the last (implicit) argument to be passed.		// The type of the last (implicit) argument to be passed.
llvm::Type *Int32Ty = llvm::IntegerType::getInt32Ty(CGM.getLLVMContext());		llvm::Type *Int32Ty = llvm::IntegerType::getInt32Ty(CGM.getLLVMContext());
unsigned TypeSize = CGM.getContext()		unsigned TypeSize = CGM.getContext()
.getTypeAlignInChars(PipeTy->getElementType())		.getTypeAlignInChars(PipeTy->getElementType())
.getQuantity();		.getQuantity();
return llvm::ConstantInt::get(Int32Ty, TypeSize, false);		return llvm::ConstantInt::get(Int32Ty, TypeSize, false);
}		}

		llvm::PointerType *CGOpenCLRuntime::getGenericVoidPointerType() {
		assert(CGM.getLangOpts().OpenCL);
		return llvm::IntegerType::getInt8PtrTy(
		CGM.getLLVMContext(),
		CGM.getContext().getTargetAddressSpace(LangAS::opencl_generic));
		}

cfe/trunk/lib/CodeGen/TargetInfo.h

[...]

namespace clang {		namespace clang {
class Decl;		class Decl;

namespace CodeGen {		namespace CodeGen {
class ABIInfo;		class ABIInfo;
class CallArgList;		class CallArgList;
class CodeGenFunction;		class CodeGenFunction;
		class CGBlockInfo;
class CGFunctionInfo;		class CGFunctionInfo;

/// TargetCodeGenInfo - This class organizes various target-specific		/// TargetCodeGenInfo - This class organizes various target-specific
/// codegeneration issues, like target-specific attributes, builtins and so		/// codegeneration issues, like target-specific attributes, builtins and so
/// on.		/// on.
class TargetCodeGenInfo {		class TargetCodeGenInfo {
ABIInfo *Info;		ABIInfo *Info;

[...]
public:
/// \param DestTy is the destination LLVM pointer type.		/// \param DestTy is the destination LLVM pointer type.
virtual llvm::Constant *		virtual llvm::Constant *
performAddrSpaceCast(CodeGenModule &CGM, llvm::Constant *V, unsigned SrcAddr,		performAddrSpaceCast(CodeGenModule &CGM, llvm::Constant *V, unsigned SrcAddr,
unsigned DestAddr, llvm::Type *DestTy) const;		unsigned DestAddr, llvm::Type *DestTy) const;

/// Get the syncscope used in LLVM IR.		/// Get the syncscope used in LLVM IR.
virtual llvm::SyncScope::ID getLLVMSyncScopeID(SyncScope S,		virtual llvm::SyncScope::ID getLLVMSyncScopeID(SyncScope S,
llvm::LLVMContext &C) const;		llvm::LLVMContext &C) const;

		/// Inteface class for filling custom fields of a block literal for OpenCL.
		class TargetOpenCLBlockHelper {
		public:
		typedef std::pair<llvm::Value *, StringRef> ValueTy;
		TargetOpenCLBlockHelper() {}
		virtual ~TargetOpenCLBlockHelper() {}
		/// Get the custom field types for OpenCL blocks.
		virtual llvm::SmallVector<llvm::Type *, 1> getCustomFieldTypes() = 0;
		/// Get the custom field values for OpenCL blocks.
		virtual llvm::SmallVector<ValueTy, 1>
		getCustomFieldValues(CodeGenFunction &CGF, const CGBlockInfo &Info) = 0;
		virtual bool areAllCustomFieldValuesConstant(const CGBlockInfo &Info) = 0;
		/// Get the custom field values for OpenCL blocks if all values are LLVM
		/// constants.
		virtual llvm::SmallVector<llvm::Constant *, 1>
		getCustomFieldValues(CodeGenModule &CGM, const CGBlockInfo &Info) = 0;
		};
		virtual TargetOpenCLBlockHelper *getTargetOpenCLBlockHelper() const {
		return nullptr;
		}
};		};

} // namespace CodeGen		} // namespace CodeGen
} // namespace clang		} // namespace clang

#endif // LLVM_CLANG_LIB_CODEGEN_TARGETINFO_H		#endif // LLVM_CLANG_LIB_CODEGEN_TARGETINFO_H
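To illustrate how the optional helper hook composes with the fixed header, here is a reduced standalone mock. It is an assumption-laden sketch, not the interface above: strings stand in for llvm::Type pointers, only the field-type query is modeled, and BlockHelperSketch, OneFieldHelperSketch, and genericBlockFieldTypes are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for llvm::Type *; a string keeps the sketch free of LLVM headers.
using FieldTypeSketch = std::string;

// Reduced, hypothetical mirror of TargetOpenCLBlockHelper (types only).
class BlockHelperSketch {
public:
  virtual ~BlockHelperSketch() = default;
  virtual std::vector<FieldTypeSketch> getCustomFieldTypes() const = 0;
};

// A made-up target that appends one 32-bit field to every block literal.
class OneFieldHelperSketch : public BlockHelperSketch {
public:
  std::vector<FieldTypeSketch> getCustomFieldTypes() const override {
    return {"i32"};
  }
};

// Follows the shape of the OpenCL branch of getGenericBlockLiteralType:
// the fixed header comes first, then whatever the target helper adds.
// A null helper, the default in the patch, contributes nothing.
inline std::vector<FieldTypeSketch>
genericBlockFieldTypes(const BlockHelperSketch *helper) {
  std::vector<FieldTypeSketch> fields = {"i32", "i32", "i8 addrspace(4)*"};
  if (helper)
    for (const FieldTypeSketch &type : helper->getCustomFieldTypes())
      fields.push_back(type);
  assert(fields.size() >= 3 && "size, align and invoke are always present");
  return fields;
}
```

Because the default hook returns nullptr, targets that do not override it keep the three-field header, so the index of the invoke pointer (2) stays the same whether or not custom fields follow.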

cfe/trunk/test/CodeGen/blocks-opencl.cl

	// RUN: %clang_cc1 -O0 %s -ffake-address-space-map -emit-llvm -o - -fblocks -triple x86_64-unknown-unknown | FileCheck %s
	// This used to crash due to trying to generate a bitcase from a cstring
	// in the constant address space to i8* in AS0.

	void dummy(float (^const op)(float)) {
	}

	// CHECK: i8 addrspace(2)* getelementptr inbounds ([9 x i8], [9 x i8] addrspace(2)* @.str, i32 0, i32 0)

	kernel void test_block()
	{
	float (^const X)(float) = ^(float x) {
	return x + 42.0f;
	};
	dummy(X);
	}

cfe/trunk/test/CodeGenOpenCL/blocks.cl

	// RUN: %clang_cc1 %s -cl-std=CL2.0 -emit-llvm -o - -O0 -triple spir-unknown-unknown | FileCheck -check-prefix=GENERIC -check-prefix=COMMON %s			// RUN: %clang_cc1 %s -cl-std=CL2.0 -emit-llvm -o - -O0 -triple spir-unknown-unknown | FileCheck -check-prefixes=COMMON,SPIR %s
	// RUN: %clang_cc1 %s -cl-std=CL2.0 -emit-llvm -o - -O0 -triple amdgcn-amd-amdhsa-opencl | FileCheck -check-prefix=AMD -check-prefix=COMMON %s			// RUN: %clang_cc1 %s -cl-std=CL2.0 -emit-llvm -o - -O0 -triple amdgcn-amd-amdhsa-opencl | FileCheck -check-prefixes=COMMON,AMD %s

	// Checking for null instead of @__NSConcreteGlobalBlock symbol			// COMMON: %struct.__opencl_block_literal_generic = type { i32, i32, i8 addrspace(4)* }
	// COMMON: @__block_literal_global = internal addrspace(1) constant { i8**, i32, i32, i8*, %struct.__block_descriptor addrspace(2)* } { i8** null			// SPIR: @__block_literal_global = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 12, i32 4, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* @block_A_block_invoke to i8*) to i8 addrspace(4)*) }
				// AMD: @__block_literal_global = internal addrspace(1) constant { i32, i32, i8 addrspace(4)* } { i32 16, i32 8, i8 addrspace(4)* addrspacecast (i8* bitcast (void (i8 addrspace(4)*, i8 addrspace(3)*)* @block_A_block_invoke to i8*) to i8 addrspace(4)*) }
				// COMMON-NOT: .str

				// COMMON-LABEL: define internal {{.*}}void @block_A_block_invoke(i8 addrspace(4)* %.block_descriptor, i8 addrspace(3)* %a)
	void (^block_A)(local void *) = ^(local void *a) {			void (^block_A)(local void *) = ^(local void *a) {
	return;			return;
	};			};

				// COMMON-LABEL: define {{.*}}void @foo()
	void foo(){			void foo(){
	int i;			int i;
	// Checking for null instead of @_NSConcreteStackBlock symbol			// COMMON-NOT: %block.isa
	// COMMON: store i8* null, i8** %block.isa			// COMMON-NOT: %block.flags
				// COMMON-NOT: %block.reserved
				// COMMON-NOT: %block.descriptor
				// COMMON: %[[block_size:.*]] = getelementptr inbounds <{ i32, i32, i8 addrspace(4)*, i32 }>, <{ i32, i32, i8 addrspace(4)*, i32 }>* %block, i32 0, i32 0
				// SPIR: store i32 16, i32* %[[block_size]]
				// AMD: store i32 20, i32* %[[block_size]]
				// COMMON: %[[block_align:.*]] = getelementptr inbounds <{ i32, i32, i8 addrspace(4)*, i32 }>, <{ i32, i32, i8 addrspace(4)*, i32 }>* %block, i32 0, i32 1
				// SPIR: store i32 4, i32* %[[block_align]]
				// AMD: store i32 8, i32* %[[block_align]]
				// COMMON: %[[block_invoke:.*]] = getelementptr inbounds <{ i32, i32, i8 addrspace(4)*, i32 }>, <{ i32, i32, i8 addrspace(4)*, i32 }>* %[[block:.*]], i32 0, i32 2
				// COMMON: store i8 addrspace(4)* addrspacecast (i8* bitcast (i32 (i8 addrspace(4)*)* @__foo_block_invoke to i8*) to i8 addrspace(4)*), i8 addrspace(4)** %[[block_invoke]]
				// COMMON: %[[block_captured:.*]] = getelementptr inbounds <{ i32, i32, i8 addrspace(4)*, i32 }>, <{ i32, i32, i8 addrspace(4)*, i32 }>* %[[block]], i32 0, i32 3
				// COMMON: %[[i_value:.*]] = load i32, i32* %i
				// COMMON: store i32 %[[i_value]], i32* %[[block_captured]],
				// COMMON: %[[blk_ptr:.*]] = bitcast <{ i32, i32, i8 addrspace(4)*, i32 }>* %[[block]] to i32 ()*
				// COMMON: %[[blk_gen_ptr:.*]] = addrspacecast i32 ()* %[[blk_ptr]] to i32 () addrspace(4)*
				// COMMON: store i32 () addrspace(4)* %[[blk_gen_ptr]], i32 () addrspace(4)** %[[block_B:.*]],
				// COMMON: %[[blk_gen_ptr:.*]] = load i32 () addrspace(4)*, i32 () addrspace(4)** %[[block_B]]
				// COMMON: %[[block_literal:.*]] = bitcast i32 () addrspace(4)* %[[blk_gen_ptr]] to %struct.__opencl_block_literal_generic addrspace(4)*
				// COMMON: %[[invoke_addr:.*]] = getelementptr inbounds %struct.__opencl_block_literal_generic, %struct.__opencl_block_literal_generic addrspace(4)* %[[block_literal]], i32 0, i32 2
				// COMMON: %[[blk_gen_ptr:.*]] = bitcast %struct.__opencl_block_literal_generic addrspace(4)* %[[block_literal]] to i8 addrspace(4)*
				// COMMON: %[[invoke_func_ptr:.*]] = load i8 addrspace(4)*, i8 addrspace(4)* addrspace(4)* %[[invoke_addr]]
				// COMMON: %[[invoke_func:.*]] = addrspacecast i8 addrspace(4)* %[[invoke_func_ptr]] to i32 (i8 addrspace(4)*)*
				// COMMON: call {{.*}}i32 %[[invoke_func]](i8 addrspace(4)* %[[blk_gen_ptr]])

	int (^ block_B)(void) = ^{			int (^ block_B)(void) = ^{
	return i;			return i;
	};			};
				block_B();
	}			}

				// COMMON-LABEL: define internal {{.*}}i32 @__foo_block_invoke(i8 addrspace(4)* %.block_descriptor)
				// COMMON: %[[block:.*]] = bitcast i8 addrspace(4)* %.block_descriptor to <{ i32, i32, i8 addrspace(4)*, i32 }> addrspace(4)*
				// COMMON: %[[block_capture_addr:.*]] = getelementptr inbounds <{ i32, i32, i8 addrspace(4)*, i32 }>, <{ i32, i32, i8 addrspace(4)*, i32 }> addrspace(4)* %[[block]], i32 0, i32 3
				// COMMON: %[[block_capture:.*]] = load i32, i32 addrspace(4)* %[[block_capture_addr]]
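The summary motivates the size and align fields by noting that a library implementation of __enqueue_kernel has to know how many bytes to pass to the kernel and how to align them. A host-side C++ sketch of that idea follows. It is only an illustration under stated assumptions: BlockHeaderSketch, BlockCopy, and copyBlockLiteral are hypothetical names, the invoke pointer is a plain void pointer here rather than a generic address space pointer, and a real device library would not allocate through std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Host-side stand-in for the OpenCL block header introduced by this patch.
struct BlockHeaderSketch {
  int size;     // total size of the block literal in bytes
  int align;    // required alignment of the block literal in bytes
  void *invoke; // pointer to the block invoke function
};

struct BlockCopy {
  std::vector<unsigned char> storage; // owns the bytes of the copy
  void *block = nullptr;              // aligned start of the copied literal
};

// Copies a block literal knowing nothing about its captures: the header
// alone says how many bytes to copy and which alignment to honor.
inline BlockCopy copyBlockLiteral(const void *block) {
  BlockHeaderSketch header;
  std::memcpy(&header, block, sizeof header);
  assert(header.size >= static_cast<int>(sizeof header));
  assert(header.align > 0 && (header.align & (header.align - 1)) == 0);

  const std::size_t size = static_cast<std::size_t>(header.size);
  const std::size_t align = static_cast<std::size_t>(header.align);

  BlockCopy copy;
  copy.storage.resize(size + align); // room for worst-case padding
  void *cursor = copy.storage.data();
  std::size_t space = copy.storage.size();
  copy.block = std::align(align, size, cursor, space);
  assert(copy.block != nullptr);
  std::memcpy(copy.block, block, size);
  return copy;
}
```

The caller never needs the concrete capture layout, which is the property the patch is after: the same routine handles a capture-free global block and a stack block with captured variables.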

cfe/trunk/test/CodeGenOpenCL/cl20-device-side-enqueue.cl

	// RUN: %clang_cc1 %s -cl-std=CL2.0 -ffake-address-space-map -O0 -emit-llvm -o - -triple "spir-unknown-unknown" | FileCheck %s --check-prefix=COMMON --check-prefix=B32			// RUN: %clang_cc1 %s -cl-std=CL2.0 -ffake-address-space-map -O0 -emit-llvm -o - -triple "spir-unknown-unknown" | FileCheck %s --check-prefix=COMMON --check-prefix=B32
	// RUN: %clang_cc1 %s -cl-std=CL2.0 -ffake-address-space-map -O0 -emit-llvm -o - -triple "spir64-unknown-unknown" | FileCheck %s --check-prefix=COMMON --check-prefix=B64			// RUN: %clang_cc1 %s -cl-std=CL2.0 -ffake-address-space-map -O0 -emit-llvm -o - -triple "spir64-unknown-unknown" | FileCheck %s --check-prefix=COMMON --check-prefix=B64

	#pragma OPENCL EXTENSION cl_khr_subgroups : enable			#pragma OPENCL EXTENSION cl_khr_subgroups : enable

	typedef void (^bl_t)(local void *);			typedef void (^bl_t)(local void *);
	typedef struct {int a;} ndrange_t;			typedef struct {int a;} ndrange_t;

	// N.B. The check here only exists to set BL_GLOBAL			// N.B. The check here only exists to set BL_GLOBAL
	// COMMON: @block_G = addrspace(1) constant void (i8 addrspace(3)) addrspace(4) addrspacecast (void (i8 addrspace(3)) addrspace(1) bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* [[BL_GLOBAL:@__block_literal_global(\.[0-9]+)?]] to void (i8 addrspace(3)) addrspace(1)) to void (i8 addrspace(3)) addrspace(4))			// COMMON: @block_G = addrspace(1) constant void (i8 addrspace(3)) addrspace(4) addrspacecast (void (i8 addrspace(3)) addrspace(1) bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BL_GLOBAL:@__block_literal_global(\.[0-9]+)?]] to void (i8 addrspace(3)) addrspace(1)) to void (i8 addrspace(3)) addrspace(4))
	const bl_t block_G = (bl_t) ^ (local void *a) {};			const bl_t block_G = (bl_t) ^ (local void *a) {};

	kernel void device_side_enqueue(global int *a, global int *b, int i) {			kernel void device_side_enqueue(global int *a, global int *b, int i) {
	// COMMON: %default_queue = alloca %opencl.queue_t*			// COMMON: %default_queue = alloca %opencl.queue_t*
	queue_t default_queue;			queue_t default_queue;
	// COMMON: %flags = alloca i32			// COMMON: %flags = alloca i32
	unsigned flags = 0;			unsigned flags = 0;
	// COMMON: %ndrange = alloca %struct.ndrange_t			// COMMON: %ndrange = alloca %struct.ndrange_t
	ndrange_t ndrange;			ndrange_t ndrange;
	// COMMON: %clk_event = alloca %opencl.clk_event_t*			// COMMON: %clk_event = alloca %opencl.clk_event_t*
	clk_event_t clk_event;			clk_event_t clk_event;
	// COMMON: %event_wait_list = alloca %opencl.clk_event_t*			// COMMON: %event_wait_list = alloca %opencl.clk_event_t*
	clk_event_t event_wait_list;			clk_event_t event_wait_list;
	// COMMON: %event_wait_list2 = alloca [1 x %opencl.clk_event_t*]			// COMMON: %event_wait_list2 = alloca [1 x %opencl.clk_event_t*]
	clk_event_t event_wait_list2[] = {clk_event};			clk_event_t event_wait_list2[] = {clk_event};

	// COMMON: [[NDR:%[a-z0-9]+]] = alloca %struct.ndrange_t, align 4			// COMMON: [[NDR:%[a-z0-9]+]] = alloca %struct.ndrange_t, align 4
	// COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.}}, %opencl.queue_t{{.}}* %default_queue			// COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.}}, %opencl.queue_t{{.}}* %default_queue
	// COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags			// COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags
	// COMMON: [[BL:%[0-9]+]] = bitcast <{ i8, i32, i32, i8, %struct.__block_descriptor addrspace(2), i32{{.}}, i32{{.}}, i32{{.}} }>* %block to void ()*			// B32: [[BL:%[0-9]+]] = bitcast <{ i32, i32, i8 addrspace(4), i32 addrspace(1), i32, i32 addrspace(1)* }>* %block to void ()*
				// B64: [[BL:%[0-9]+]] = bitcast <{ i32, i32, i8 addrspace(4), i32 addrspace(1), i32 addrspace(1), i32 }> %block to void ()*
	// COMMON: [[BL_I8:%[0-9]+]] = addrspacecast void ()* [[BL]] to i8 addrspace(4)*			// COMMON: [[BL_I8:%[0-9]+]] = addrspacecast void ()* [[BL]] to i8 addrspace(4)*
	// COMMON: call i32 @__enqueue_kernel_basic(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* byval [[NDR]]{{(.[0-9]+)?}}, i8 addrspace(4)* [[BL_I8]])			// COMMON: call i32 @__enqueue_kernel_basic(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* byval [[NDR]]{{([0-9]+)?}}, i8 addrspace(4)* [[BL_I8]])
	enqueue_kernel(default_queue, flags, ndrange,			enqueue_kernel(default_queue, flags, ndrange,
	^(void) {			^(void) {
	a[i] = b[i];			a[i] = b[i];
	});			});

	// COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.}}, %opencl.queue_t{{.}}* %default_queue			// COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.}}, %opencl.queue_t{{.}}* %default_queue
	// COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags			// COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags
	// COMMON: [[WAIT_EVNT:%[0-9]+]] = addrspacecast %opencl.clk_event_t{{.}}* %event_wait_list to %opencl.clk_event_t{{.}} addrspace(4)*			// COMMON: [[WAIT_EVNT:%[0-9]+]] = addrspacecast %opencl.clk_event_t{{.}}* %event_wait_list to %opencl.clk_event_t{{.}} addrspace(4)*
	// COMMON: [[EVNT:%[0-9]+]] = addrspacecast %opencl.clk_event_t{{.}}* %clk_event to %opencl.clk_event_t{{.}} addrspace(4)*			// COMMON: [[EVNT:%[0-9]+]] = addrspacecast %opencl.clk_event_t{{.}}* %clk_event to %opencl.clk_event_t{{.}} addrspace(4)*
	// COMMON: [[BL:%[0-9]+]] = bitcast <{ i8, i32, i32, i8, %struct.__block_descriptor addrspace(2), i32{{.}}, i32{{.}}, i32{{.}} }>* %block3 to void ()*			// COMMON: [[BL:%[0-9]+]] = bitcast <{ i32, i32, i8 addrspace(4), i32{{.}}, i32{{.}}, i32{{.}} }>* %block3 to void ()*
	// COMMON: [[BL_I8:%[0-9]+]] = addrspacecast void ()* [[BL]] to i8 addrspace(4)*			// COMMON: [[BL_I8:%[0-9]+]] = addrspacecast void ()* [[BL]] to i8 addrspace(4)*
	// COMMON: call i32 @__enqueue_kernel_basic_events(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* {{.}}, i32 2, %opencl.clk_event_t{{.}}* addrspace(4)* [[WAIT_EVNT]], %opencl.clk_event_t{{.}} addrspace(4)* [[EVNT]], i8 addrspace(4)* [[BL_I8]])			// COMMON: call i32 @__enqueue_kernel_basic_events(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* {{.}}, i32 2, %opencl.clk_event_t{{.}}* addrspace(4)* [[WAIT_EVNT]], %opencl.clk_event_t{{.}} addrspace(4)* [[EVNT]], i8 addrspace(4)* [[BL_I8]])
	enqueue_kernel(default_queue, flags, ndrange, 2, &event_wait_list, &clk_event,			enqueue_kernel(default_queue, flags, ndrange, 2, &event_wait_list, &clk_event,
	^(void) {			^(void) {
	a[i] = b[i];			a[i] = b[i];
	});			});

	// COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.*}}*, %opencl.queue_t{{.*}}** %default_queue			// COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.*}}*, %opencl.queue_t{{.*}}** %default_queue
	// COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags			// COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags
	// B32: %[[TMP:.*]] = alloca [1 x i32]			// B32: %[[TMP:.*]] = alloca [1 x i32]
	// B32: %[[TMP1:.*]] = getelementptr [1 x i32], [1 x i32]* %[[TMP]], i32 0, i32 0			// B32: %[[TMP1:.*]] = getelementptr [1 x i32], [1 x i32]* %[[TMP]], i32 0, i32 0
	// B32: store i32 256, i32* %[[TMP1]], align 4			// B32: store i32 256, i32* %[[TMP1]], align 4
	// B32: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{(.[0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i32* %[[TMP1]])			// B32: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{([0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i32* %[[TMP1]])
	// B64: %[[TMP:.*]] = alloca [1 x i64]			// B64: %[[TMP:.*]] = alloca [1 x i64]
	// B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0			// B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0
	// B64: store i64 256, i64* %[[TMP1]], align 8			// B64: store i64 256, i64* %[[TMP1]], align 8
	// B64: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{(.[0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i64* %[[TMP1]])			// B64: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{([0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i64* %[[TMP1]])
	enqueue_kernel(default_queue, flags, ndrange,			enqueue_kernel(default_queue, flags, ndrange,
	^(local void *p) {			^(local void *p) {
	return;			return;
	},			},
	256);			256);
	char c;			char c;
	// COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.*}}*, %opencl.queue_t{{.*}}** %default_queue			// COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.*}}*, %opencl.queue_t{{.*}}** %default_queue
	// COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags			// COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags
	// B32: %[[TMP:.*]] = alloca [1 x i32]			// B32: %[[TMP:.*]] = alloca [1 x i32]
	// B32: %[[TMP1:.*]] = getelementptr [1 x i32], [1 x i32]* %[[TMP]], i32 0, i32 0			// B32: %[[TMP1:.*]] = getelementptr [1 x i32], [1 x i32]* %[[TMP]], i32 0, i32 0
	// B32: store i32 %{{.*}}, i32* %[[TMP1]], align 4			// B32: store i32 %{{.*}}, i32* %[[TMP1]], align 4
	// B32: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{(.[0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i32* %[[TMP1]])			// B32: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{([0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i32* %[[TMP1]])
	// B64: %[[TMP:.*]] = alloca [1 x i64]			// B64: %[[TMP:.*]] = alloca [1 x i64]
	// B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0			// B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0
	// B64: store i64 %{{.*}}, i64* %[[TMP1]], align 8			// B64: store i64 %{{.*}}, i64* %[[TMP1]], align 8
	// B64: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{(.[0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i64* %[[TMP1]])			// B64: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{([0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i64* %[[TMP1]])
	enqueue_kernel(default_queue, flags, ndrange,			enqueue_kernel(default_queue, flags, ndrange,
	^(local void *p) {			^(local void *p) {
	return;			return;
	},			},
	c);			c);

	// COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.*}}*, %opencl.queue_t{{.*}}** %default_queue			// COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.*}}*, %opencl.queue_t{{.*}}** %default_queue
	// COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags			// COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags
	// COMMON: [[AD:%arraydecay[0-9]*]] = getelementptr inbounds [1 x %opencl.clk_event_t*], [1 x %opencl.clk_event_t*]* %event_wait_list2, i32 0, i32 0			// COMMON: [[AD:%arraydecay[0-9]*]] = getelementptr inbounds [1 x %opencl.clk_event_t*], [1 x %opencl.clk_event_t*]* %event_wait_list2, i32 0, i32 0
	// COMMON: [[WAIT_EVNT:%[0-9]+]] = addrspacecast %opencl.clk_event_t{{.*}}** [[AD]] to %opencl.clk_event_t{{.*}}* addrspace(4)*			// COMMON: [[WAIT_EVNT:%[0-9]+]] = addrspacecast %opencl.clk_event_t{{.*}}** [[AD]] to %opencl.clk_event_t{{.*}}* addrspace(4)*
	// COMMON: [[EVNT:%[0-9]+]] = addrspacecast %opencl.clk_event_t{{.*}}** %clk_event to %opencl.clk_event_t{{.*}}* addrspace(4)*			// COMMON: [[EVNT:%[0-9]+]] = addrspacecast %opencl.clk_event_t{{.*}}** %clk_event to %opencl.clk_event_t{{.*}}* addrspace(4)*
	// B32: %[[TMP:.*]] = alloca [1 x i32]			// B32: %[[TMP:.*]] = alloca [1 x i32]
	// B32: %[[TMP1:.*]] = getelementptr [1 x i32], [1 x i32]* %[[TMP]], i32 0, i32 0			// B32: %[[TMP1:.*]] = getelementptr [1 x i32], [1 x i32]* %[[TMP]], i32 0, i32 0
	// B32: store i32 256, i32* %[[TMP1]], align 4			// B32: store i32 256, i32* %[[TMP1]], align 4
	// B32: call i32 @__enqueue_kernel_events_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* {{.}}, i32 2, %opencl.clk_event_t{{.}} [[WAIT_EVNT]], %opencl.clk_event_t{{.}} [[EVNT]], i8 addrspace(4) addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i32* %[[TMP1]])			// B32: call i32 @__enqueue_kernel_events_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* {{.}}, i32 2, %opencl.clk_event_t{{.}} [[WAIT_EVNT]], %opencl.clk_event_t{{.}} [[EVNT]], i8 addrspace(4) addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i32* %[[TMP1]])
	// B64: %[[TMP:.*]] = alloca [1 x i64]			// B64: %[[TMP:.*]] = alloca [1 x i64]
	// B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0			// B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0
	// B64: store i64 256, i64* %[[TMP1]], align 8			// B64: store i64 256, i64* %[[TMP1]], align 8
	// B64: call i32 @__enqueue_kernel_events_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* {{.}}, i32 2, %opencl.clk_event_t{{.}} [[WAIT_EVNT]], %opencl.clk_event_t{{.}} [[EVNT]], i8 addrspace(4) addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i64* %[[TMP1]])			// B64: call i32 @__enqueue_kernel_events_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* {{.}}, i32 2, %opencl.clk_event_t{{.}} [[WAIT_EVNT]], %opencl.clk_event_t{{.}} [[EVNT]], i8 addrspace(4) addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i64* %[[TMP1]])
	enqueue_kernel(default_queue, flags, ndrange, 2, event_wait_list2, &clk_event,			enqueue_kernel(default_queue, flags, ndrange, 2, event_wait_list2, &clk_event,
	^(local void *p) {			^(local void *p) {
	return;			return;
	},			},
	256);			256);

	// COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.*}}*, %opencl.queue_t{{.*}}** %default_queue			// COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.*}}*, %opencl.queue_t{{.*}}** %default_queue
	// COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags			// COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags
	// COMMON: [[AD:%arraydecay[0-9]*]] = getelementptr inbounds [1 x %opencl.clk_event_t*], [1 x %opencl.clk_event_t*]* %event_wait_list2, i32 0, i32 0			// COMMON: [[AD:%arraydecay[0-9]*]] = getelementptr inbounds [1 x %opencl.clk_event_t*], [1 x %opencl.clk_event_t*]* %event_wait_list2, i32 0, i32 0
	// COMMON: [[WAIT_EVNT:%[0-9]+]] = addrspacecast %opencl.clk_event_t{{.*}}** [[AD]] to %opencl.clk_event_t{{.*}}* addrspace(4)*			// COMMON: [[WAIT_EVNT:%[0-9]+]] = addrspacecast %opencl.clk_event_t{{.*}}** [[AD]] to %opencl.clk_event_t{{.*}}* addrspace(4)*
	// COMMON: [[EVNT:%[0-9]+]] = addrspacecast %opencl.clk_event_t{{.*}}** %clk_event to %opencl.clk_event_t{{.*}}* addrspace(4)*			// COMMON: [[EVNT:%[0-9]+]] = addrspacecast %opencl.clk_event_t{{.*}}** %clk_event to %opencl.clk_event_t{{.*}}* addrspace(4)*
	// B32: %[[TMP:.*]] = alloca [1 x i32]			// B32: %[[TMP:.*]] = alloca [1 x i32]
	// B32: %[[TMP1:.*]] = getelementptr [1 x i32], [1 x i32]* %[[TMP]], i32 0, i32 0			// B32: %[[TMP1:.*]] = getelementptr [1 x i32], [1 x i32]* %[[TMP]], i32 0, i32 0
	// B32: store i32 %{{.*}}, i32* %[[TMP1]], align 4			// B32: store i32 %{{.*}}, i32* %[[TMP1]], align 4
	// B32: call i32 @__enqueue_kernel_events_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* {{.}}, i32 2, %opencl.clk_event_t{{.}}* addrspace(4)* [[WAIT_EVNT]], %opencl.clk_event_t{{.}} addrspace(4)* [[EVNT]], i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i32* %[[TMP1]])			// B32: call i32 @__enqueue_kernel_events_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* {{.}}, i32 2, %opencl.clk_event_t{{.}}* addrspace(4)* [[WAIT_EVNT]], %opencl.clk_event_t{{.}} addrspace(4)* [[EVNT]], i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i32* %[[TMP1]])
	// B64: %[[TMP:.*]] = alloca [1 x i64]			// B64: %[[TMP:.*]] = alloca [1 x i64]
	// B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0			// B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0
	// B64: store i64 %{{.*}}, i64* %[[TMP1]], align 8			// B64: store i64 %{{.*}}, i64* %[[TMP1]], align 8
	// B64: call i32 @__enqueue_kernel_events_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* {{.}}, i32 2, %opencl.clk_event_t{{.}}* addrspace(4)* [[WAIT_EVNT]], %opencl.clk_event_t{{.}} addrspace(4)* [[EVNT]], i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i64* %[[TMP1]])			// B64: call i32 @__enqueue_kernel_events_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* {{.}}, i32 2, %opencl.clk_event_t{{.}}* addrspace(4)* [[WAIT_EVNT]], %opencl.clk_event_t{{.}} addrspace(4)* [[EVNT]], i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i64* %[[TMP1]])
	enqueue_kernel(default_queue, flags, ndrange, 2, event_wait_list2, &clk_event,			enqueue_kernel(default_queue, flags, ndrange, 2, event_wait_list2, &clk_event,
	^(local void *p) {			^(local void *p) {
	return;			return;
	},			},
	c);			c);

	long l;			long l;
	// COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.*}}*, %opencl.queue_t{{.*}}** %default_queue			// COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.*}}*, %opencl.queue_t{{.*}}** %default_queue
	// COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags			// COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags
	// B32: %[[TMP:.*]] = alloca [1 x i32]			// B32: %[[TMP:.*]] = alloca [1 x i32]
	// B32: %[[TMP1:.*]] = getelementptr [1 x i32], [1 x i32]* %[[TMP]], i32 0, i32 0			// B32: %[[TMP1:.*]] = getelementptr [1 x i32], [1 x i32]* %[[TMP]], i32 0, i32 0
	// B32: store i32 %{{.*}}, i32* %[[TMP1]], align 4			// B32: store i32 %{{.*}}, i32* %[[TMP1]], align 4
	// B32: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{(.[0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i32* %[[TMP1]])			// B32: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{([0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i32* %[[TMP1]])
	// B64: %[[TMP:.*]] = alloca [1 x i64]			// B64: %[[TMP:.*]] = alloca [1 x i64]
	// B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0			// B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0
	// B64: store i64 %{{.*}}, i64* %[[TMP1]], align 8			// B64: store i64 %{{.*}}, i64* %[[TMP1]], align 8
	// B64: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{(.[0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i64* %[[TMP1]])			// B64: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{([0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i64* %[[TMP1]])
	enqueue_kernel(default_queue, flags, ndrange,			enqueue_kernel(default_queue, flags, ndrange,
	^(local void *p) {			^(local void *p) {
	return;			return;
	},			},
	l);			l);

	// COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.*}}*, %opencl.queue_t{{.*}}** %default_queue			// COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t{{.*}}*, %opencl.queue_t{{.*}}** %default_queue
	// COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags			// COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags
	// B32: %[[TMP:.*]] = alloca [3 x i32]			// B32: %[[TMP:.*]] = alloca [3 x i32]
	// B32: %[[TMP1:.*]] = getelementptr [3 x i32], [3 x i32]* %[[TMP]], i32 0, i32 0			// B32: %[[TMP1:.*]] = getelementptr [3 x i32], [3 x i32]* %[[TMP]], i32 0, i32 0
	// B32: store i32 1, i32* %[[TMP1]], align 4			// B32: store i32 1, i32* %[[TMP1]], align 4
	// B32: %[[TMP2:.*]] = getelementptr [3 x i32], [3 x i32]* %[[TMP]], i32 0, i32 1			// B32: %[[TMP2:.*]] = getelementptr [3 x i32], [3 x i32]* %[[TMP]], i32 0, i32 1
	// B32: store i32 2, i32* %[[TMP2]], align 4			// B32: store i32 2, i32* %[[TMP2]], align 4
	// B32: %[[TMP3:.*]] = getelementptr [3 x i32], [3 x i32]* %[[TMP]], i32 0, i32 2			// B32: %[[TMP3:.*]] = getelementptr [3 x i32], [3 x i32]* %[[TMP]], i32 0, i32 2
	// B32: store i32 4, i32* %[[TMP3]], align 4			// B32: store i32 4, i32* %[[TMP3]], align 4
	// B32: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{(.[0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 3, i32* %[[TMP1]])			// B32: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{([0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 3, i32* %[[TMP1]])
	// B64: %[[TMP:.*]] = alloca [3 x i64]			// B64: %[[TMP:.*]] = alloca [3 x i64]
	// B64: %[[TMP1:.*]] = getelementptr [3 x i64], [3 x i64]* %[[TMP]], i32 0, i32 0			// B64: %[[TMP1:.*]] = getelementptr [3 x i64], [3 x i64]* %[[TMP]], i32 0, i32 0
	// B64: store i64 1, i64* %[[TMP1]], align 8			// B64: store i64 1, i64* %[[TMP1]], align 8
	// B64: %[[TMP2:.*]] = getelementptr [3 x i64], [3 x i64]* %[[TMP]], i32 0, i32 1			// B64: %[[TMP2:.*]] = getelementptr [3 x i64], [3 x i64]* %[[TMP]], i32 0, i32 1
	// B64: store i64 2, i64* %[[TMP2]], align 8			// B64: store i64 2, i64* %[[TMP2]], align 8
	// B64: %[[TMP3:.*]] = getelementptr [3 x i64], [3 x i64]* %[[TMP]], i32 0, i32 2			// B64: %[[TMP3:.*]] = getelementptr [3 x i64], [3 x i64]* %[[TMP]], i32 0, i32 2
	// B64: store i64 4, i64* %[[TMP3]], align 8			// B64: store i64 4, i64* %[[TMP3]], align 8
	// B64: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{(.[0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 3, i64* %[[TMP1]])			// B64: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{([0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 3, i64* %[[TMP1]])
	enqueue_kernel(default_queue, flags, ndrange,			enqueue_kernel(default_queue, flags, ndrange,
	^(local void *p1, local void *p2, local void *p3) {			^(local void *p1, local void *p2, local void *p3) {
	return;			return;
	},			},
	1, 2, 4);			1, 2, 4);

	// COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t*, %opencl.queue_t** %default_queue			// COMMON: [[DEF_Q:%[0-9]+]] = load %opencl.queue_t*, %opencl.queue_t** %default_queue
	// COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags			// COMMON: [[FLAGS:%[0-9]+]] = load i32, i32* %flags
	// B32: %[[TMP:.*]] = alloca [1 x i32]			// B32: %[[TMP:.*]] = alloca [1 x i32]
	// B32: %[[TMP1:.*]] = getelementptr [1 x i32], [1 x i32]* %[[TMP]], i32 0, i32 0			// B32: %[[TMP1:.*]] = getelementptr [1 x i32], [1 x i32]* %[[TMP]], i32 0, i32 0
	// B32: store i32 0, i32* %[[TMP1]], align 4			// B32: store i32 0, i32* %[[TMP1]], align 4
	// B32: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{(.[0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i32* %[[TMP1]])			// B32: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{([0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i32* %[[TMP1]])
	// B64: %[[TMP:.*]] = alloca [1 x i64]			// B64: %[[TMP:.*]] = alloca [1 x i64]
	// B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0			// B64: %[[TMP1:.*]] = getelementptr [1 x i64], [1 x i64]* %[[TMP]], i32 0, i32 0
	// B64: store i64 4294967296, i64* %[[TMP1]], align 8			// B64: store i64 4294967296, i64* %[[TMP1]], align 8
	// B64: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{(.[0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i64* %[[TMP1]])			// B64: call i32 @__enqueue_kernel_vaargs(%opencl.queue_t{{.}} [[DEF_Q]], i32 [[FLAGS]], %struct.ndrange_t* [[NDR]]{{([0-9]+)?}}, i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* @__block_literal_global{{(.[0-9]+)?}} to i8 addrspace(1)) to i8 addrspace(4)), i32 1, i64* %[[TMP1]])
	enqueue_kernel(default_queue, flags, ndrange,			enqueue_kernel(default_queue, flags, ndrange,
	^(local void *p) {			^(local void *p) {
	return;			return;
	},			},
	4294967296L);			4294967296L);

	// The full type of these expressions are long (and repeated elsewhere), so we			// The full type of these expressions are long (and repeated elsewhere), so we
	// capture it as part of the regex for convenience and clarity.			// capture it as part of the regex for convenience and clarity.
	// COMMON: store void () addrspace(4)* addrspacecast (void () addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* [[BL_A:@__block_literal_global(\.[0-9]+)?]] to void () addrspace(1)) to void () addrspace(4)), void () addrspace(4)** %block_A			// COMMON: store void () addrspace(4)* addrspacecast (void () addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BL_A:@__block_literal_global(\.[0-9]+)?]] to void () addrspace(1)) to void () addrspace(4)), void () addrspace(4)** %block_A
	void (^const block_A)(void) = ^{			void (^const block_A)(void) = ^{
	return;			return;
	};			};

	// COMMON: store void (i8 addrspace(3)) addrspace(4) addrspacecast (void (i8 addrspace(3)) addrspace(1) bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* [[BL_B:@__block_literal_global(\.[0-9]+)?]] to void (i8 addrspace(3)) addrspace(1)) to void (i8 addrspace(3)) addrspace(4)), void (i8 addrspace(3)) addrspace(4)* %block_B			// COMMON: store void (i8 addrspace(3)) addrspace(4) addrspacecast (void (i8 addrspace(3)) addrspace(1) bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BL_B:@__block_literal_global(\.[0-9]+)?]] to void (i8 addrspace(3)) addrspace(1)) to void (i8 addrspace(3)) addrspace(4)), void (i8 addrspace(3)) addrspace(4)* %block_B
	void (^const block_B)(local void *) = ^(local void *a) {			void (^const block_B)(local void *) = ^(local void *a) {
	return;			return;
	};			};

	// COMMON: call i32 @__get_kernel_work_group_size_impl(i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* [[BL_A]] to i8 addrspace(1)) to i8 addrspace(4)))			// COMMON: call i32 @__get_kernel_work_group_size_impl(i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BL_A]] to i8 addrspace(1)) to i8 addrspace(4)))
	unsigned size = get_kernel_work_group_size(block_A);			unsigned size = get_kernel_work_group_size(block_A);
	// COMMON: call i32 @__get_kernel_work_group_size_impl(i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* [[BL_B]] to i8 addrspace(1)) to i8 addrspace(4)))			// COMMON: call i32 @__get_kernel_work_group_size_impl(i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BL_B]] to i8 addrspace(1)) to i8 addrspace(4)))
	size = get_kernel_work_group_size(block_B);			size = get_kernel_work_group_size(block_B);
	// COMMON: call i32 @__get_kernel_preferred_work_group_multiple_impl(i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* [[BL_A]] to i8 addrspace(1)) to i8 addrspace(4)))			// COMMON: call i32 @__get_kernel_preferred_work_group_multiple_impl(i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BL_A]] to i8 addrspace(1)) to i8 addrspace(4)))
	size = get_kernel_preferred_work_group_size_multiple(block_A);			size = get_kernel_preferred_work_group_size_multiple(block_A);
	// COMMON: call i32 @__get_kernel_preferred_work_group_multiple_impl(i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* [[BL_GLOBAL]] to i8 addrspace(1)) to i8 addrspace(4)))			// COMMON: call i32 @__get_kernel_preferred_work_group_multiple_impl(i8 addrspace(4)* addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* [[BL_GLOBAL]] to i8 addrspace(1)) to i8 addrspace(4)))
	size = get_kernel_preferred_work_group_size_multiple(block_G);			size = get_kernel_preferred_work_group_size_multiple(block_G);

	// COMMON: call i32 @__get_kernel_max_sub_group_size_for_ndrange_impl(%struct.ndrange_t* {{.}}, i8 addrspace(4) addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* {{.}} to i8 addrspace(1)) to i8 addrspace(4)*))			// COMMON: call i32 @__get_kernel_max_sub_group_size_for_ndrange_impl(%struct.ndrange_t* {{.}}, i8 addrspace(4) addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* {{.}} to i8 addrspace(1)) to i8 addrspace(4)*))
	size = get_kernel_max_sub_group_size_for_ndrange(ndrange, ^(){});			size = get_kernel_max_sub_group_size_for_ndrange(ndrange, ^(){});
	// COMMON: call i32 @__get_kernel_sub_group_count_for_ndrange_impl(%struct.ndrange_t* {{.}}, i8 addrspace(4) addrspacecast (i8 addrspace(1)* bitcast ({ i8*, i32, i32, i8, %struct.__block_descriptor addrspace(2)* } addrspace(1)* {{.}} to i8 addrspace(1)) to i8 addrspace(4)*))			// COMMON: call i32 @__get_kernel_sub_group_count_for_ndrange_impl(%struct.ndrange_t* {{.}}, i8 addrspace(4) addrspacecast (i8 addrspace(1)* bitcast ({ i32, i32, i8 addrspace(4)* } addrspace(1)* {{.}} to i8 addrspace(1)) to i8 addrspace(4)*))
	size = get_kernel_sub_group_count_for_ndrange(ndrange, ^(){});			size = get_kernel_sub_group_count_for_ndrange(ndrange, ^(){});
	}			}