This is an archive of the discontinued LLVM Phabricator instance.

Differential D36327

[OpenCL] Allow targets emit optimized pipe functions for power of 2 type sizes
Abandoned · Public

Authored by yaxunl on Aug 4 2017, 9:51 AM.

Details

Reviewers

b-sumner
Anastasia
bader
rsmith
rjmccall

Summary

Currently Clang emits a call to __read_pipe_2 or __read_pipe_4 for the OpenCL read_pipe builtin,
with type size and alignment arguments appended, where 2 or 4 indicates the original
number of arguments.

For certain targets (e.g. amdgpu), there are optimized versions of __read_pipe_2/__read_pipe_4
for the case where the type size and alignment are the same power of 2. It is desirable for Clang
to emit a different function for these cases.

This patch lets Clang emit __read_pipe_2_N for such cases, where N is the size in bytes of
the type (N = 1, 2, 4, 8, ..., 128), so that the target runtime can use an optimized version of
read_pipe.

The same applies to __read_pipe_4, __write_pipe_2 and __write_pipe_4.

This optimization is controlled by TargetCodeGenInfo::hasOptimizedOpenCLPipeBuiltin, which returns
false by default. Each target can override this function to turn on this optimization.
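
To make the naming rule concrete, here is a minimal standalone C++ sketch. It only mirrors the condition used in the posted diff (size equals alignment, size is a power of 2, and the target opts in). The helper names isPowerOf2 and pipeRuntimeName and the targetHasOptimized parameter are hypothetical and exist only for this illustration; they are not part of Clang.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: true when v is a nonzero power of two.
bool isPowerOf2(std::uint32_t v) { return v != 0 && (v & (v - 1)) == 0; }

// Hypothetical helper mirroring the condition in the patch: the "_N" suffix
// is appended only when the target opts in and size == alignment == 2^k.
std::string pipeRuntimeName(const std::string &base, std::uint32_t size,
                            std::uint32_t align, bool targetHasOptimized) {
  bool opt = targetHasOptimized && size == align && isPowerOf2(size);
  return opt ? base + "_" + std::to_string(size) : base;
}
```

Under these assumptions, a 4-byte int packet on an opted-in target maps to __read_pipe_2_4, while a 400-byte struct with 4-byte alignment keeps the generic __read_pipe_2 name.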

Event Timeline

yaxunl created this revision. Aug 4 2017, 9:51 AM

Herald added a subscriber: tpr. Aug 4 2017, 9:51 AM

yaxunl edited the summary of this revision. Aug 4 2017, 9:52 AM

Hi Sam,

What do you think about implementing this optimization in a target-specific optimization pass? Since size/alignment is passed as a function parameter in LLVM IR, the optimization can be done in target-specific components without adding additional conditions to the generic library.

Thanks,
Alexey

In D36327#833653, @bader wrote:

Hi Sam,

What do you think about implementing this optimization in a target-specific optimization pass? Since size/alignment is passed as a function parameter in LLVM IR, the optimization can be done in target-specific components without adding additional conditions to the generic library.

Thanks,
Alexey

Hi Alexey,

The optimization for power-of-2 type sizes is implemented as a library function. Our backend lacks the capability to link in library code at the ISA level, so linking of the optimized library function has to be done before any target-specific passes. It seems the only place to do this is Clang codegen, since Clang/LLVM does not support target-specific pre-linking passes.

In D36327#833891, @yaxunl wrote:

Hi Alexey,

The optimization for power-of-2 type sizes is implemented as a library function. Our backend lacks the capability to link in library code at the ISA level, so linking of the optimized library function has to be done before any target-specific passes. It seems the only place to do this is Clang codegen, since Clang/LLVM does not support target-specific pre-linking passes.

My general feeling is that it doesn't look like a generic enough change for the frontend. Even though it is implemented in a generic way, not every target will have special support for power-of-2 sizes, and even where such support exists, not every implementation would handle it as a library function. But I can see that perhaps LLVM is missing flexibility in the flow to accommodate these needs. Any chance we could try to extend the compilation flow such that this target-specific optimization could happen before the IR linking?

In D36327#834032, @Anastasia wrote:

My general feeling is that it doesn't look like a generic enough change for the frontend. Even though it is implemented in a generic way, not every target will have special support for power-of-2 sizes, and even where such support exists, not every implementation would handle it as a library function. But I can see that perhaps LLVM is missing flexibility in the flow to accommodate these needs. Any chance we could try to extend the compilation flow such that this target-specific optimization could happen before the IR linking?

It is trivial to implement the small number of specialized functions this patch adds in terms of the general one if desired, and the general one can continue to be handled as it had been.

We had actually proposed a patch (sorry I don't have the reference handy) to add a general mechanism for targets to introduce pre-link passes, but it was not accepted. We can try again, but I don't really expect more progress.
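
As an illustration of the wrapper point made in this comment, the following standalone C++ sketch expresses a size-specialized function as a thin call into a generic one. Everything here is a toy stand-in: ToyPipe, toy_read_pipe_2 and toy_read_pipe_2_4 are hypothetical names, and the single-slot buffer is not how any real pipe runtime works. Only the argument shape (pipe, pointer, size, alignment) follows the review summary.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Toy stand-in for a pipe: a single-slot buffer. Purely illustrative.
struct ToyPipe {
  unsigned char data[128];
  bool full;
};

// Hypothetical generic entry point with the (pipe, ptr, size, align) shape
// described in the summary. Returns 0 on success, -1 if the pipe is empty.
int toy_read_pipe_2(ToyPipe *p, void *dst, std::uint32_t size,
                    std::uint32_t align) {
  (void)align; // alignment is not needed in this toy model
  if (!p->full)
    return -1;
  std::memcpy(dst, p->data, size);
  p->full = false;
  return 0;
}

// Specialized 4-byte variant written as a thin wrapper over the generic one.
int toy_read_pipe_2_4(ToyPipe *p, std::int32_t *dst) {
  return toy_read_pipe_2(p, dst, 4, 4);
}
```

A target without a dedicated fast path could ship wrappers of this shape, so emitting the specialized names would stay harmless for it.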

In D36327#835153, @b-sumner wrote:

It is trivial to implement the small number of specialized functions this patch adds in terms of the general one if desired, and the general one can continue to be handled as it had been.

We had actually proposed a patch (sorry I don't have the reference handy) to add a general mechanism for targets to introduce pre-link passes, but it was not accepted. We can try again, but I don't really expect more progress.

It would be nice to understand why it was not accepted and whether we could try to argue for it using this case as an example. It seems like a useful feature for toolchains with IR linking.

In D36327#835634, @Anastasia wrote:

It would be nice to understand why it was not accepted and whether we could try to argue for it using this case as an example. It seems like a useful feature for toolchains with IR linking.

The original review is here:

https://reviews.llvm.org/D20682

To cite the reason why it was rejected:

"I fundamentally do not believe that the TargetMachine should be involved in fixing language semantics issues with a "pre linking" pass at the LLVM level. Why? Because there is nothing "target" about this.

There needs to be a fundamentally more principled way of handling this at the language and frontend level IMO."

So far, however, we have not been able to find "a fundamentally more principled way of handling this at the language and frontend level".

bader added a reviewer: rsmith. Aug 8 2017, 11:35 AM

@rsmith do you have an opinion on what would be the right place for the kind of optimization proposed here?
It looks like it can be implemented as a target-independent optimization that acts only for targets with specified properties; in this case the target must provide the required built-in functions.

John, do you have any comments? Thanks.

Could you just implement this in SimplifyLibCalls? I assume there's some way to fill in TargetLibraryInfo appropriately for a platform. Is that too late for your linking requirements?

In D36327#839809, @rjmccall wrote:

Could you just implement this in SimplifyLibCalls? I assume there's some way to fill in TargetLibraryInfo appropriately for a platform. Is that too late for your linking requirements?

Both the optimized and generic versions of the __read_pipe function contain calls to other library functions and are too complicated to be generated programmatically. The amdgpu target does not have the capability to link in library code after LLVM codegen. The linking has to be done before SimplifyLibCalls.

In D36327#840616, @yaxunl wrote:

Both the optimized and generic versions of the __read_pipe function contain calls to other library functions and are too complicated to be generated programmatically. The amdgpu target does not have the capability to link in library code after LLVM codegen. The linking has to be done before SimplifyLibCalls.

If I understand correctly, SimplifyLibCalls is an LLVM IR transformation, so it works before linking and LLVM codegen (e.g. InstCombine passes run this transformation). This pass does something similar to what you are trying to achieve for the __read_pipe built-ins: pow(2.0, x) -> llvm.exp2(x).

In D36327#840658, @bader wrote:

If I understand correctly, SimplifyLibCalls is an LLVM IR transformation, so it works before linking and LLVM codegen (e.g. InstCombine passes run this transformation). This pass does something similar to what you are trying to achieve for the __read_pipe built-ins: pow(2.0, x) -> llvm.exp2(x).

Thanks. I will take a look.

We implemented this optimization through a target-specific LLVM pass.
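
The pass itself is not shown in this review. As a loose sketch of the general idea discussed in the thread (rewriting a generic pipe call once its trailing size and alignment operands are known constants), here is a toy model in standalone C++. ToyCall and specializePipeCall are hypothetical names invented for this example. This is not LLVM API and makes no claim about how the actual target pass is written.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy model of a call to a generic pipe runtime function. trailingConsts
// holds the constant {size, alignment} operands appended by the frontend.
struct ToyCall {
  std::string callee;
  std::vector<std::uint32_t> trailingConsts;
};

// Rewrites e.g. __read_pipe_2(..., 4, 4) into __read_pipe_2_4(...) in place.
// Returns true if the call was specialized.
bool specializePipeCall(ToyCall &call) {
  if (call.trailingConsts.size() != 2)
    return false;
  std::uint32_t size = call.trailingConsts[0];
  std::uint32_t align = call.trailingConsts[1];
  bool pow2 = size != 0 && (size & (size - 1)) == 0;
  if (!pow2 || size != align)
    return false;
  call.callee += "_" + std::to_string(size);
  call.trailingConsts.clear(); // the specialized callee takes no size/align
  return true;
}
```

Doing the rewrite at this level keeps the frontend output generic, which is the direction the reviewers were asking for.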

Revision Contents

Path                                  Size
lib/CodeGen/CGBuiltin.cpp             77 lines
lib/CodeGen/TargetInfo.h              4 lines
lib/CodeGen/TargetInfo.cpp            1 line
test/CodeGenOpenCL/pipe_builtin.cl    89 lines

Diff 109764

lib/CodeGen/CGBuiltin.cpp

Context not available.
     CGOpenCLRuntime OpenCLRT(CGM);
     Value *PacketSize = OpenCLRT.getPipeElemSize(E->getArg(0));
     Value *PacketAlign = OpenCLRT.getPipeElemAlign(E->getArg(0));
+    unsigned Size = cast<llvm::ConstantInt>(PacketSize)->getZExtValue();
+    unsigned Align = cast<llvm::ConstantInt>(PacketAlign)->getZExtValue();
+    bool Opt = Size == Align && isPowerOf2_32(Size) &&
+               getTargetHooks().hasOptimizedOpenCLPipeBuiltin();

     // Type of the generic packet parameter.
     unsigned GenericAS =
         getContext().getTargetAddressSpace(LangAS::opencl_generic);
-    llvm::Type *I8PTy = llvm::PointerType::get(
-        llvm::Type::getInt8Ty(getLLVMContext()), GenericAS);
+    llvm::Type *PtrElemTy;
+    if (!Opt)
+      PtrElemTy = llvm::Type::getInt8Ty(getLLVMContext());
+    else if (Size <= 8)
+      PtrElemTy = llvm::Type::getIntNTy(getLLVMContext(), Size * 8);
+    else
+      PtrElemTy = llvm::VectorType::get(
+          llvm::Type::getInt64Ty(getLLVMContext()), Size / 8);
+    llvm::Type *PtrTy = llvm::PointerType::get(PtrElemTy, GenericAS);

     // Testing which overloaded version we should generate the call for.
     if (2U == E->getNumArgs()) {
-      const char *Name = (BuiltinID == Builtin::BIread_pipe) ? "__read_pipe_2"
+      std::string Name = (BuiltinID == Builtin::BIread_pipe) ? "__read_pipe_2"
                                                              : "__write_pipe_2";
+      llvm::SmallVector<llvm::Type *, 4> ArgTys;
+      ArgTys.push_back(Arg0->getType());
+      ArgTys.push_back(PtrTy);
+
+      if (Opt) {
+        Name = Name + "_" + std::to_string(Size);
+      } else {
+        ArgTys.push_back(Int32Ty);
+        ArgTys.push_back(Int32Ty);
+      }
+
       // Creating a generic function type to be able to call with any builtin or
       // user defined type.
-      llvm::Type *ArgTys[] = {Arg0->getType(), I8PTy, Int32Ty, Int32Ty};
       llvm::FunctionType *FTy = llvm::FunctionType::get(
           Int32Ty, llvm::ArrayRef<llvm::Type *>(ArgTys), false);
-      Value *BCast = Builder.CreatePointerCast(Arg1, I8PTy);
+      Value *BCast = Builder.CreatePointerCast(Arg1, PtrTy);
+
+      llvm::SmallVector<llvm::Value *, 4> Args;
+      Args.push_back(Arg0);
+      Args.push_back(BCast);
+      if (!Opt) {
+        Args.push_back(PacketSize);
+        Args.push_back(PacketAlign);
+      }
+
       return RValue::get(
-          Builder.CreateCall(CGM.CreateRuntimeFunction(FTy, Name),
-                             {Arg0, BCast, PacketSize, PacketAlign}));
+          Builder.CreateCall(CGM.CreateRuntimeFunction(FTy, Name), Args));
     } else {
       assert(4 == E->getNumArgs() &&
              "Illegal number of parameters to pipe function");
-      const char *Name = (BuiltinID == Builtin::BIread_pipe) ? "__read_pipe_4"
+      std::string Name = (BuiltinID == Builtin::BIread_pipe) ? "__read_pipe_4"
                                                              : "__write_pipe_4";
+      llvm::SmallVector<llvm::Type *, 6> ArgTys;
+      ArgTys.push_back(Arg0->getType());
+      ArgTys.push_back(Arg1->getType());
+      ArgTys.push_back(Int32Ty);
+      ArgTys.push_back(PtrTy);
+
+      if (Opt) {
+        Name = Name + "_" + std::to_string(Size);
+      } else {
+        ArgTys.push_back(Int32Ty);
+        ArgTys.push_back(Int32Ty);
+      }
+
-      llvm::Type *ArgTys[] = {Arg0->getType(), Arg1->getType(), Int32Ty, I8PTy,
-                              Int32Ty, Int32Ty};
       Value *Arg2 = EmitScalarExpr(E->getArg(2)),
             *Arg3 = EmitScalarExpr(E->getArg(3));
       llvm::FunctionType *FTy = llvm::FunctionType::get(
           Int32Ty, llvm::ArrayRef<llvm::Type *>(ArgTys), false);
-      Value *BCast = Builder.CreatePointerCast(Arg3, I8PTy);
+      Value *BCast = Builder.CreatePointerCast(Arg3, PtrTy);
       // We know the third argument is an integer type, but we may need to cast
       // it to i32.
       if (Arg2->getType() != Int32Ty)
         Arg2 = Builder.CreateZExtOrTrunc(Arg2, Int32Ty);
-      return RValue::get(Builder.CreateCall(
-          CGM.CreateRuntimeFunction(FTy, Name),
-          {Arg0, Arg1, Arg2, BCast, PacketSize, PacketAlign}));
+      llvm::SmallVector<llvm::Value *, 6> Args;
+      Args.push_back(Arg0);
+      Args.push_back(Arg1);
+      Args.push_back(Arg2);
+      Args.push_back(BCast);
+      if (!Opt) {
+        Args.push_back(PacketSize);
+        Args.push_back(PacketAlign);
+      }
+
+      return RValue::get(
+          Builder.CreateCall(CGM.CreateRuntimeFunction(FTy, Name), Args));
     }
   }
   // OpenCL v2.0 s6.13.16 ,s9.17.3.5 - Built-in pipe reserve read and write
Context not available.
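
The pointee type chosen in the CGBuiltin.cpp change (iN for sizes up to 8 bytes, a vector of i64 for larger sizes) can be summarised with a small standalone sketch. The function optimizedPointeeType is a hypothetical helper that only returns the textual LLVM IR type name. It assumes the size has already passed the power-of-2 check and does not use any LLVM API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: textual LLVM IR pointee type that the patch would
// select for an optimized pipe call with a power-of-two packet size.
std::string optimizedPointeeType(std::uint32_t sizeInBytes) {
  if (sizeInBytes <= 8)
    return "i" + std::to_string(sizeInBytes * 8);           // i8 .. i64
  return "<" + std::to_string(sizeInBytes / 8) + " x i64>"; // vector of i64
}
```

This lines up with the AMD check lines in test9 of pipe_builtin.cl, where a 16-byte long2 packet is expected to use <2 x i64> and a 128-byte long16 packet <16 x i64>.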

lib/CodeGen/TargetInfo.h

Context not available.
   virtual llvm::Constant *
   performAddrSpaceCast(CodeGenModule &CGM, llvm::Constant *V, unsigned SrcAddr,
                        unsigned DestAddr, llvm::Type *DestTy) const;
+
+  /// Whether the target support optimized read_pipe and write_pipe builtin
+  /// functions when type size and alignment is power of 2.
+  virtual bool hasOptimizedOpenCLPipeBuiltin() const { return false; }
 };

 } // namespace CodeGen
Context not available.

lib/CodeGen/TargetInfo.cpp

Context not available.
   }
   unsigned getGlobalVarAddressSpace(CodeGenModule &CGM,
                                     const VarDecl *D) const override;
+  bool hasOptimizedOpenCLPipeBuiltin() const override { return true; }
 };
 }

Context not available.

test/CodeGenOpenCL/pipe_builtin.cl

-// RUN: %clang_cc1 -emit-llvm -cl-ext=+cl_khr_subgroups -O0 -cl-std=CL2.0 -o - %s | FileCheck %s
+// RUN: %clang_cc1 -emit-llvm -cl-ext=+cl_khr_subgroups -O0 -cl-std=CL2.0 -o - %s | FileCheck -check-prefixes=CHECK,NAMD %s
+// RUN: %clang_cc1 -triple amdgcn---amdgizcl -emit-llvm -cl-ext=+cl_khr_subgroups -O0 -cl-std=CL2.0 -o - %s | FileCheck -check-prefixes=CHECK,AMD %s

 // CHECK: %opencl.pipe_t = type opaque
 // CHECK: %opencl.reserve_id_t = type opaque

 #pragma OPENCL EXTENSION cl_khr_subgroups : enable

+typedef struct {
+  int x[100];
+} S;
+
+typedef long long2 __attribute__((ext_vector_type(2)));
+typedef long long3 __attribute__((ext_vector_type(3)));
+typedef long long4 __attribute__((ext_vector_type(4)));
+typedef long long8 __attribute__((ext_vector_type(8)));
+typedef long long16 __attribute__((ext_vector_type(16)));
+
 void test1(read_only pipe int p, global int *ptr) {
-  // CHECK: call i32 @__read_pipe_2(%opencl.pipe_t* %{{.}}, i8 %{{.*}}, i32 4, i32 4)
+  // NAMD: call i32 @__read_pipe_2(%opencl.pipe_t* %{{.}}, i8 %{{.*}}, i32 4, i32 4)
+  // AMD: call i32 @__read_pipe_2_4(%opencl.pipe_t addrspace(1)* %{{.}}, i32 %{{.*}})
   read_pipe(p, ptr);
-  // CHECK: call %opencl.reserve_id_t* @__reserve_read_pipe(%opencl.pipe_t* %{{.}}, i32 {{.}}, i32 4, i32 4)
+  // CHECK: call %opencl.reserve_id_t* @__reserve_read_pipe(%opencl.pipe_t{{.}} %{{.}}, i32 {{.}}, i32 4, i32 4)
   reserve_id_t rid = reserve_read_pipe(p, 2);
-  // CHECK: call i32 @__read_pipe_4(%opencl.pipe_t* %{{.}}, %opencl.reserve_id_t %{{.}}, i32 {{.}}, i8* %{{.*}}, i32 4, i32 4)
+  // NAMD: call i32 @__read_pipe_4(%opencl.pipe_t* %{{.}}, %opencl.reserve_id_t %{{.}}, i32 {{.}}, i8* %{{.*}}, i32 4, i32 4)
+  // AMD: call i32 @__read_pipe_4_4(%opencl.pipe_t addrspace(1)* %{{.}}, %opencl.reserve_id_t %{{.}}, i32 {{.}}, i32* %{{.*}})
   read_pipe(p, rid, 2, ptr);
-  // CHECK: call void @__commit_read_pipe(%opencl.pipe_t* %{{.}}, %opencl.reserve_id_t %{{.*}}, i32 4, i32 4)
+  // CHECK: call void @__commit_read_pipe(%opencl.pipe_t{{.}} %{{.}}, %opencl.reserve_id_t{{.}}* %{{.*}}, i32 4, i32 4)
   commit_read_pipe(p, rid);
 }

 void test2(write_only pipe int p, global int *ptr) {
-  // CHECK: call i32 @__write_pipe_2(%opencl.pipe_t* %{{.}}, i8 %{{.*}}, i32 4, i32 4)
+  // NAMD: call i32 @__write_pipe_2(%opencl.pipe_t* %{{.}}, i8 %{{.*}}, i32 4, i32 4)
+  // AMD: call i32 @__write_pipe_2_4(%opencl.pipe_t addrspace(1)* %{{.}}, i32 %{{.*}})
   write_pipe(p, ptr);
-  // CHECK: call %opencl.reserve_id_t* @__reserve_write_pipe(%opencl.pipe_t* %{{.}}, i32 {{.}}, i32 4, i32 4)
+  // CHECK: call %opencl.reserve_id_t* @__reserve_write_pipe(%opencl.pipe_t{{.}} %{{.}}, i32 {{.}}, i32 4, i32 4)
   reserve_id_t rid = reserve_write_pipe(p, 2);
-  // CHECK: call i32 @__write_pipe_4(%opencl.pipe_t* %{{.}}, %opencl.reserve_id_t %{{.}}, i32 {{.}}, i8* %{{.*}}, i32 4, i32 4)
+  // NAMD: call i32 @__write_pipe_4(%opencl.pipe_t* %{{.}}, %opencl.reserve_id_t %{{.}}, i32 {{.}}, i8* %{{.*}}, i32 4, i32 4)
+  // AMD: call i32 @__write_pipe_4_4(%opencl.pipe_t addrspace(1)* %{{.}}, %opencl.reserve_id_t %{{.}}, i32 {{.}}, i32* %{{.*}})
   write_pipe(p, rid, 2, ptr);
-  // CHECK: call void @__commit_write_pipe(%opencl.pipe_t* %{{.}}, %opencl.reserve_id_t %{{.*}}, i32 4, i32 4)
+  // CHECK: call void @__commit_write_pipe(%opencl.pipe_t{{.}} %{{.}}, %opencl.reserve_id_t %{{.*}}, i32 4, i32 4)
   commit_write_pipe(p, rid);
 }

 void test3(read_only pipe int p, global int *ptr) {
-  // CHECK: call %opencl.reserve_id_t* @__work_group_reserve_read_pipe(%opencl.pipe_t* %{{.}}, i32 {{.}}, i32 4, i32 4)
+  // CHECK: call %opencl.reserve_id_t* @__work_group_reserve_read_pipe(%opencl.pipe_t{{.}} %{{.}}, i32 {{.}}, i32 4, i32 4)
   reserve_id_t rid = work_group_reserve_read_pipe(p, 2);
-  // CHECK: call void @__work_group_commit_read_pipe(%opencl.pipe_t* %{{.}}, %opencl.reserve_id_t %{{.*}}, i32 4, i32 4)
+  // CHECK: call void @__work_group_commit_read_pipe(%opencl.pipe_t{{.}} %{{.}}, %opencl.reserve_id_t{{.}}* %{{.*}}, i32 4, i32 4)
   work_group_commit_read_pipe(p, rid);
 }

 void test4(write_only pipe int p, global int *ptr) {
-  // CHECK: call %opencl.reserve_id_t* @__work_group_reserve_write_pipe(%opencl.pipe_t* %{{.}}, i32 {{.}}, i32 4, i32 4)
+  // CHECK: call %opencl.reserve_id_t* @__work_group_reserve_write_pipe(%opencl.pipe_t{{.}} %{{.}}, i32 {{.}}, i32 4, i32 4)
   reserve_id_t rid = work_group_reserve_write_pipe(p, 2);
-  // CHECK: call void @__work_group_commit_write_pipe(%opencl.pipe_t* %{{.}}, %opencl.reserve_id_t %{{.*}}, i32 4, i32 4)
+  // CHECK: call void @__work_group_commit_write_pipe(%opencl.pipe_t{{.}} %{{.}}, %opencl.reserve_id_t{{.}}* %{{.*}}, i32 4, i32 4)
   work_group_commit_write_pipe(p, rid);
 }

 void test5(read_only pipe int p, global int *ptr) {
-  // CHECK: call %opencl.reserve_id_t* @__sub_group_reserve_read_pipe(%opencl.pipe_t* %{{.}}, i32 {{.}}, i32 4, i32 4)
+  // CHECK: call %opencl.reserve_id_t* @__sub_group_reserve_read_pipe(%opencl.pipe_t{{.}} %{{.}}, i32 {{.}}, i32 4, i32 4)
   reserve_id_t rid = sub_group_reserve_read_pipe(p, 2);
-  // CHECK: call void @__sub_group_commit_read_pipe(%opencl.pipe_t* %{{.}}, %opencl.reserve_id_t %{{.*}}, i32 4, i32 4)
+  // CHECK: call void @__sub_group_commit_read_pipe(%opencl.pipe_t{{.}} %{{.}}, %opencl.reserve_id_t{{.}}* %{{.*}}, i32 4, i32 4)
   sub_group_commit_read_pipe(p, rid);
 }

 void test6(write_only pipe int p, global int *ptr) {
-  // CHECK: call %opencl.reserve_id_t* @__sub_group_reserve_write_pipe(%opencl.pipe_t* %{{.}}, i32 {{.}}, i32 4, i32 4)
+  // CHECK: call %opencl.reserve_id_t* @__sub_group_reserve_write_pipe(%opencl.pipe_t{{.}} %{{.}}, i32 {{.}}, i32 4, i32 4)
   reserve_id_t rid = sub_group_reserve_write_pipe(p, 2);
-  // CHECK: call void @__sub_group_commit_write_pipe(%opencl.pipe_t* %{{.}}, %opencl.reserve_id_t %{{.*}}, i32 4, i32 4)
+  // CHECK: call void @__sub_group_commit_write_pipe(%opencl.pipe_t{{.}} %{{.}}, %opencl.reserve_id_t{{.}}* %{{.*}}, i32 4, i32 4)
   sub_group_commit_write_pipe(p, rid);
 }

 void test7(write_only pipe int p, global int *ptr) {
-  // CHECK: call i32 @__get_pipe_num_packets(%opencl.pipe_t* %{{.*}}, i32 4, i32 4)
+  // CHECK: call i32 @__get_pipe_num_packets(%opencl.pipe_t{{.}} %{{.*}}, i32 4, i32 4)
   *ptr = get_pipe_num_packets(p);
-  // CHECK: call i32 @__get_pipe_max_packets(%opencl.pipe_t* %{{.*}}, i32 4, i32 4)
+  // CHECK: call i32 @__get_pipe_max_packets(%opencl.pipe_t{{.}} %{{.*}}, i32 4, i32 4)
   *ptr = get_pipe_max_packets(p);
 }

 void test8(read_only pipe int r, write_only pipe int w, global int *ptr) {
   // verify that return type is correctly casted to i1 value
-  // CHECK: %[[R:[0-9]+]] = call i32 @__read_pipe_2
+  // NAMD: %[[R:[0-9]+]] = call i32 @__read_pipe_2
+  // AMD: %[[R:[0-9]+]] = call i32 @__read_pipe_2_4
   // CHECK: icmp ne i32 %[[R]], 0
   if (read_pipe(r, ptr)) *ptr = -1;
-  // CHECK: %[[W:[0-9]+]] = call i32 @__write_pipe_2
+  // NAMD: %[[W:[0-9]+]] = call i32 @__write_pipe_2
+  // AMD: %[[W:[0-9]+]] = call i32 @__write_pipe_2_4
   // CHECK: icmp ne i32 %[[W]], 0
   if (write_pipe(w, ptr)) *ptr = -1;
   // CHECK: %[[N:[0-9]+]] = call i32 @__get_pipe_num_packets
Context not available.
   // CHECK: icmp ne i32 %[[M]], 0
   if (get_pipe_max_packets(w)) *ptr = -1;
 }
+
+// CHECK-LABEL: @test9
+void test9(read_only pipe char p1, global char *ptr1,
+           read_only pipe short p2, global short *ptr2,
+           read_only pipe int p4, global int *ptr4,
+           read_only pipe long p8, global long *ptr8,
+           read_only pipe long2 p16, global long2 *ptr16,
+           read_only pipe long4 p32, global long4 *ptr32,
+           read_only pipe long8 p64, global long8 *ptr64,
+           read_only pipe long16 p128, global long16 *ptr128,
+           read_only pipe S pu, global S *ptru) {
+  // AMD: call i32 @__read_pipe_2_1(%opencl.pipe_t addrspace(1)* {{.}}, i8 %{{.*}})
+  read_pipe(p1, ptr1);
+  // AMD: call i32 @__read_pipe_2_2(%opencl.pipe_t addrspace(1)* {{.}}, i16 %{{.*}})
+  read_pipe(p2, ptr2);
+  // AMD: call i32 @__read_pipe_2_4(%opencl.pipe_t addrspace(1)* {{.}}, i32 %{{.*}})
+  read_pipe(p4, ptr4);
+  // AMD: call i32 @__read_pipe_2_8(%opencl.pipe_t addrspace(1)* {{.}}, i64 %{{.*}})
+  read_pipe(p8, ptr8);
+  // AMD: call i32 @__read_pipe_2_16(%opencl.pipe_t addrspace(1)* %{{.}}, <2 x i64> %{{.*}})
+  read_pipe(p16, ptr16);
+  // AMD: call i32 @__read_pipe_2_32(%opencl.pipe_t addrspace(1)* %{{.}}, <4 x i64> %{{.*}})
+  read_pipe(p32, ptr32);
+  // AMD: call i32 @__read_pipe_2_64(%opencl.pipe_t addrspace(1)* %{{.}}, <8 x i64> %{{.*}})
+  read_pipe(p64, ptr64);
+  // AMD: call i32 @__read_pipe_2_128(%opencl.pipe_t addrspace(1)* %{{.}}, <16 x i64> %{{.*}})
+  read_pipe(p128, ptr128);
+  // AMD: call i32 @__read_pipe_2(%opencl.pipe_t addrspace(1)* %{{.}}, i8 %{{.*}}, i32 400, i32 4)
+  read_pipe(pu, ptru);
+}
Context not available.