This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add type mangling for {read, write, readfirst, perm}lane intrinsics
Needs Revision · Public

Authored by jrbyrnes on Apr 6 2023, 11:58 AM.

Details

Reviewers

rampitec
arsenm
nhaehnle

Summary

Add builtins which accept floats for these instructions. A user is requesting to have permlane builtins for floats without use of casts.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jrbyrnes created this revision. Apr 6 2023, 11:58 AM

Herald added a project: Restricted Project. Apr 6 2023, 11:58 AM

Herald added subscribers: kosarev, foad, kerbowa and 6 others.

jrbyrnes requested review of this revision. Apr 6 2023, 11:58 AM

Herald added projects: Restricted Project, Restricted Project. Apr 6 2023, 11:58 AM

Herald added subscribers: llvm-commits, cfe-commits, wdng.

Isn't it simpler to lower it to an existing int intrinsic and casts in clang?

In D147732#4249567, @rampitec wrote:

Isn't it simpler to lower it to an existing int intrinsic and casts in clang?

Thanks for your comment Stas!

I think it would be ideal if clang inserted pure bitcasts for floats instead of fptoui when passed as operands to these builtins. My concern is -- Do you think we need to preserve the implicit casting behavior for compatibility?

In D147732#4249584, @jrbyrnes wrote:

In D147732#4249567, @rampitec wrote:

Isn't it simpler to lower it to an existing int intrinsic and casts in clang?

Thanks for your comment Stas!

I think it would be ideal if clang inserted pure bitcasts for floats instead of fptoui when passed as operands to these builtins. My concern is -- Do you think we need to preserve the implicit casting behavior for compatibility?

You can manually lower builtin to a proper cast and intrinsic in the CGBuiltin.cpp.
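
To make the distinction in this exchange concrete, here is a minimal standalone sketch (not code from the patch) contrasting a value conversion, which is roughly what fptoui does, with a bit reinterpretation, which is what an IR bitcast does. The helper names `bitsOf` and `floatOf` are hypothetical, and the expected bit pattern assumes a 32-bit IEEE 754 `float`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Hypothetical helper: reinterpret the bits of a float as an unsigned
// integer, analogous to an IR `bitcast float to i32`.
static std::uint32_t bitsOf(float F) {
  std::uint32_t U;
  std::memcpy(&U, &F, sizeof U);
  return U;
}

// Hypothetical helper: the inverse reinterpretation (i32 bits to float).
static float floatOf(std::uint32_t U) {
  float F;
  std::memcpy(&F, &U, sizeof F);
  return F;
}

// Value conversion, analogous to fptoui: the fractional part is dropped,
// so the original float cannot be recovered afterwards.
static std::uint32_t valueOf(float F) {
  return static_cast<std::uint32_t>(F);
}
```

With these helpers, `valueOf(1.5f)` yields 1, while `bitsOf(1.5f)` yields 0x3FC00000 and `floatOf(bitsOf(1.5f))` returns 1.5f again. A lane permute only moves bits between lanes, so the bit-preserving form is the one that keeps a float payload intact.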

Harbormaster completed remote builds in B224079: Diff 511500. Apr 6 2023, 12:39 PM

There is a benefit to not having bitcast noise in the IR

llvm/include/llvm/IR/IntrinsicsAMDGPU.td
1964–1965	Should use type mangling for the existing intrinsics rather than introducing new typed copies

This revision now requires changes to proceed. Apr 7 2023, 3:43 PM

Use type mangling

Harbormaster completed remote builds in B225458: Diff 513386. Apr 13 2023, 5:52 PM

Changing the existing intrinsics to use type mangling could break clients like LLPC and Mesa. I've put up a patch for LLPC to protect it against this change: https://github.com/GPUOpen-Drivers/llpc/pull/2404

In D147732#4267553, @foad wrote:

Changing the existing intrinsics to use type mangling could break clients like LLPC and Mesa. I've put up a patch for LLPC to protect it against this change: https://github.com/GPUOpen-Drivers/llpc/pull/2404

It can be fixed with IR autoupgrade I suppose.

In D147732#4268661, @rampitec wrote:

In D147732#4267553, @foad wrote:

Changing the existing intrinsics to use type mangling could break clients like LLPC and Mesa. I've put up a patch for LLPC to protect it against this change: https://github.com/GPUOpen-Drivers/llpc/pull/2404

It can be fixed with IR autoupgrade I suppose.

No, I'm thinking of clients that use IRBuilder to create intrinsic calls programmatically.

Thanks @foad for pointing that out.

Inherit https://reviews.llvm.org/D86154 -- the main idea is: if we are going to break calls to CreateIntrinsic, we might as well break them all in a single commit.

I have also reworked ISel for permlane intrinsics s.t. we consistently (w.r.t implementation in D86154) rely on LateCodeGenPrepare for legalization, rather than select into a new MI.

Herald added a subscriber: nlopes. May 12 2023, 1:48 PM

jrbyrnes retitled this revision from [AMDGPU] Add f32 permlane{16, x16} builtin variants to [AMDGPU] Add type mangling for {read, write, readfirst, perm}lane intrinsics. May 12 2023, 1:49 PM

jrbyrnes added a reviewer: nhaehnle.

clang-format + newlines

Harbormaster completed remote builds in B231705: Diff 521793. May 12 2023, 7:33 PM

Please make sure we can consume old IR using unmangled llvm.amdgcn.permlanex16 by fixing auto upgrade. Otherwise, it may cause regressions for device libs or existing apps. Thanks.

Hi @yaxunl thanks for the comment.

I looked into AutoUpgrade, and it catches upgrades of this type (unmangled -> mangled) by default (via remangleIntrinsicFunction). I've included tests to demonstrate. Is this the desired effect, or are you asking to blacklist these intrinsics from being upgraded?

In D147732#4343146, @jrbyrnes wrote:

Hi @yaxunl thanks for the comment.

I looked into AutoUpgrade, and it catches upgrades of this type (unmangled -> mangled) by default (via remangleIntrinsicFunction). I've included tests to demonstrate. Is this the desired effect, or are you asking to blacklist these intrinsics from being upgraded?

That is the desired effect. Thanks.
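
As a rough illustration of the unmangled -> mangled upgrade discussed here, the sketch below models only the shape of the names, using the `.i32` and `.f32` suffixes that appear in the tests of this revision. Both functions are hypothetical stand-ins written for this example; they are not LLVM APIs, and the real remangling works on function types rather than strings.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not an LLVM API): form an overloaded intrinsic name
// by appending a type suffix such as "i32" or "f32" to the base name.
static std::string mangledName(const std::string &Base,
                               const std::string &TypeSuffix) {
  return TypeSuffix.empty() ? Base : Base + "." + TypeSuffix;
}

// Hypothetical model of the upgrade: a legacy call that uses the bare base
// name was necessarily the i32 form, so it maps to the ".i32" name. Names
// that already carry a suffix pass through unchanged.
static std::string upgradeLegacyName(const std::string &Name,
                                     const std::string &Base) {
  return Name == Base ? mangledName(Base, "i32") : Name;
}
```

Under this model, old IR that calls `llvm.amdgcn.permlanex16` ends up calling `llvm.amdgcn.permlanex16.i32`, which matches the updated CHECK lines in the clang tests.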

Harbormaster completed remote builds in B232062: Diff 522267. May 15 2023, 1:53 PM

D84639 is an old version of this. It has some additional tests not covered here, can you copy them?

Add tests from D84639 + actually support ptr type during legalization (i.e. AMDGPULateCodeGenPrepare)

Harbormaster completed remote builds in B238322: Diff 530688. Jun 12 2023, 10:59 PM

I think this may not hard break mesa. I believe mesa bypasses the intrinsic creation API, and just declares the string name of the intrinsic. The type name mangling suffix is technically irrelevant, and as long as you use a consistent type with a consistent suffix things should work out (and the null suffix also works). After committing, mesa should still move to adding the type suffix.

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.readlane.ll
22	This is a hint mesa won't break

nlopes added inline comments. Jun 20 2023, 7:17 AM

llvm/lib/Target/AMDGPU/AMDGPULateCodeGenPrepare.cpp
213	Please use poison wherever possible. In this case it seems it's just a placeholder, so it can be poison. We're trying to get rid of undef. Thanks!

arsenm added inline comments. Jun 20 2023, 7:21 AM

llvm/lib/Target/AMDGPU/AMDGPULateCodeGenPrepare.cpp
209	isIntegerTy(16). Also, just check the bitsize is 16. Might as well also handle bfloat
301–310	Just let pointer types pass through to codegen, we try really hard to never introduce ptrtoint/inttoptr
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.readlane.ll
5	Add bfloat and <2 x i16>, <2 x half>, <2 x bfloat> tests

arsenm added inline comments. Jun 20 2023, 7:24 AM

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.readlane.ll
5	Also p2, p3, p5, p6

Address comments + enable selection of ptr types

Harbormaster completed remote builds in B240117: Diff 533080. Jun 20 2023, 7:43 PM

arsenm mentioned this in D84639: AMDGPU: Add type mangling to llvm.amdgcn.readfirstlane. Jun 22 2023, 3:12 AM

In D147732#4434557, @arsenm wrote:

I think this may not hard break mesa. I believe mesa bypasses the intrinsic creation API, and just declares the string name of the intrinsic. The type name mangling suffix is technically irrelevant, and as long as you use a consistent type with a consistent suffix things should work out (and the null suffix also works). After committing, mesa should still move to adding the type suffix.

I can echo this sentiment.

The main issue arises when there are untyped calls to CreateIntrinsic, as the intrinsics are no longer defined with a type.

For {read, readfirst, write, perm}lanes, Mesa uses LLVMAddFunction and LLVMBuildCall2 APIs under its own ac_build_intrinsic -- these calls are all typed in the current implementation. Also, (as expected) the implementation inserts bitcasts to cast to Int32Ty before inserting these calls since only that version of the intrinsic currently exists. This also implies they won't have an issue with intrinsic / type declarations.

Unless I have missed something, I don't see why switching to type-mangling would cause an issue with Mesa's current implementation.

arsenm requested changes to this revision. Jul 6 2023, 12:25 PM

arsenm added inline comments.

llvm/lib/Target/AMDGPU/AMDGPULateCodeGenPrepare.cpp
187	You're not relying on this for correctness are you? This is an optimization pass, you can't lower here. You also shouldn't need to handle this in the IR, it should codegen normally

This revision now requires changes to proceed. Jul 6 2023, 12:25 PM

Herald added a subscriber: wangpc. Jul 6 2023, 12:25 PM

jrbyrnes added inline comments. Jul 6 2023, 2:25 PM

llvm/lib/Target/AMDGPU/AMDGPULateCodeGenPrepare.cpp
187	This is the legalization for non-32-bit types -- I don't exactly know why it wasn't handled via the normal codegen / selection process. @nhaehnle, I believe you tried this in https://reviews.llvm.org/D86154 -- do you happen to remember why we do legalization this way? If not, I'll rework the approach.

arsenm added inline comments. Jul 24 2023, 9:00 AM

llvm/lib/Target/AMDGPU/AMDGPULateCodeGenPrepare.cpp
187	CodeGenPrepare/LateCodeGenPrepare can't be used for lowering, they're optimization passes. Legalization needs to be handled in the codegen
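
For readers unfamiliar with what "legalization for non-32-bit types" involves, the following is a simplified host-side model, not the patch's implementation and not LLVM code. It assumes a 32-lane wave and a hypothetical `readlane32` primitive that only handles 32-bit values, then shows the two usual strategies: widen a 16-bit value to 32 bits and truncate the result, or split a 64-bit value into two 32-bit halves. Where this rewriting should live (an IR pass versus codegen legalization) is exactly the point under dispute in this thread; the sketch takes no position on that.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption for this sketch: a wave of 32 lanes.
constexpr std::size_t WaveSize = 32;

// Hypothetical 32-bit primitive: returns the value held by lane `Lane`.
static std::uint32_t readlane32(const std::array<std::uint32_t, WaveSize> &Wave,
                                std::size_t Lane) {
  return Wave.at(Lane);
}

// 16-bit case: zero-extend every lane to 32 bits, read, then truncate.
static std::uint16_t readlane16(const std::array<std::uint16_t, WaveSize> &Wave,
                                std::size_t Lane) {
  std::array<std::uint32_t, WaveSize> Wide{};
  for (std::size_t I = 0; I < WaveSize; ++I)
    Wide[I] = Wave[I];
  return static_cast<std::uint16_t>(readlane32(Wide, Lane));
}

// 64-bit case: split into low and high halves, read each, then recombine.
static std::uint64_t readlane64(const std::array<std::uint64_t, WaveSize> &Wave,
                                std::size_t Lane) {
  std::array<std::uint32_t, WaveSize> Lo{}, Hi{};
  for (std::size_t I = 0; I < WaveSize; ++I) {
    Lo[I] = static_cast<std::uint32_t>(Wave[I]);
    Hi[I] = static_cast<std::uint32_t>(Wave[I] >> 32);
  }
  return (static_cast<std::uint64_t>(readlane32(Hi, Lane)) << 32) |
         readlane32(Lo, Lane);
}
```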

arsenm mentioned this in D156301: [AMDGPU] Support FAdd/FSub global atomics in AMDGPUAtomicOptimizer. Jul 28 2023, 11:29 AM

arsenm mentioned this in D86154: AMDGPU: Add llvm.amdgcn.{read,readfirst,write}lane2 intrinsics with type overloads. Jul 28 2023, 11:45 AM

Revision Contents

Path	Size
clang/include/clang/Basic/BuiltinsAMDGPU.def	2 lines
clang/lib/CodeGen/CGBuiltin.cpp	29 lines
clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl	17 lines
clang/test/CodeGenOpenCL/builtins-amdgcn.cl	4 lines
clang/test/SemaOpenCL/builtins-amdgcn-error-gfx10-param.cl	10 lines
llvm/include/llvm/IR/IntrinsicsAMDGPU.td	24 lines
llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp	39 lines
llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp	4 lines
llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp	11 lines
llvm/lib/Target/AMDGPU/AMDGPULateCodeGenPrepare.cpp	167 lines
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp	6 lines
llvm/lib/Target/AMDGPU/SIInstructions.td	30 lines
llvm/lib/Target/AMDGPU/VOP3Instructions.td	17 lines
llvm/test/Analysis/UniformityAnalysis/AMDGPU/intrinsics.ll	6 lines
llvm/test/Assembler/autoupgrade-amdgpu-intrinsics.ll	68 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_optimizations_mul_one.ll	39 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-amdgcn.readfirstlane.mir	89 lines
llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll	2 lines
llvm/test/CodeGen/AMDGPU/global-atomic-scan.ll	439 lines
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.ll	949 lines
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.readfirstlane.ll	169 lines
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.readlane.ll	158 lines
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll	156 lines
llvm/test/CodeGen/AMDGPU/permlane-ptr.ll	296 lines
llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll	2590 lines
llvm/test/Verifier/AMDGPU/intrinsic-immarg.ll	9 lines

Diff 533080

clang/include/clang/Basic/BuiltinsAMDGPU.def

[... 249 lines not shown ...]
	TARGET_BUILTIN(__builtin_amdgcn_udot8, "UiUiUiUiIb", "nc", "dot7-insts")			TARGET_BUILTIN(__builtin_amdgcn_udot8, "UiUiUiUiIb", "nc", "dot7-insts")
	TARGET_BUILTIN(__builtin_amdgcn_sudot8, "iIbiIbiiIb", "nc", "dot8-insts")			TARGET_BUILTIN(__builtin_amdgcn_sudot8, "iIbiIbiiIb", "nc", "dot8-insts")

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// GFX10+ only builtins.			// GFX10+ only builtins.
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	TARGET_BUILTIN(__builtin_amdgcn_permlane16, "UiUiUiUiUiIbIb", "nc", "gfx10-insts")			TARGET_BUILTIN(__builtin_amdgcn_permlane16, "UiUiUiUiUiIbIb", "nc", "gfx10-insts")
	TARGET_BUILTIN(__builtin_amdgcn_permlanex16, "UiUiUiUiUiIbIb", "nc", "gfx10-insts")			TARGET_BUILTIN(__builtin_amdgcn_permlanex16, "UiUiUiUiUiIbIb", "nc", "gfx10-insts")
				TARGET_BUILTIN(__builtin_amdgcn_permlane16_f32, "fffUiUiIbIb", "nc", "gfx10-insts")
				TARGET_BUILTIN(__builtin_amdgcn_permlanex16_f32, "fffUiUiIbIb", "nc", "gfx10-insts")
	TARGET_BUILTIN(__builtin_amdgcn_mov_dpp8, "UiUiIUi", "nc", "gfx10-insts")			TARGET_BUILTIN(__builtin_amdgcn_mov_dpp8, "UiUiIUi", "nc", "gfx10-insts")

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Raytracing builtins.			// Raytracing builtins.
	// By default the 1st argument is i32 and the 4/5-th arguments are float4.			// By default the 1st argument is i32 and the 4/5-th arguments are float4.
	// Postfix l indicates the 1st argument is i64.			// Postfix l indicates the 1st argument is i64.
	// Postfix h indicates the 4/5-th arguments are half4.			// Postfix h indicates the 4/5-th arguments are half4.
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
[... 135 lines not shown ...]

clang/lib/CodeGen/CGBuiltin.cpp

[... 17,348 lines not shown; enclosing context: for (int i = 0, e = E->getNumArgs(); i != e; ++i) ...]
Args.push_back(EmitScalarExpr(E->getArg(i)));		Args.push_back(EmitScalarExpr(E->getArg(i)));

Function *F = CGM.getIntrinsic(BuiltinWMMAOp,		Function *F = CGM.getIntrinsic(BuiltinWMMAOp,
{Args[ArgForMatchingRetType]->getType()});		{Args[ArgForMatchingRetType]->getType()});

return Builder.CreateCall(F, Args);		return Builder.CreateCall(F, Args);
}		}

		case AMDGPU::BI__builtin_amdgcn_permlane16:
		case AMDGPU::BI__builtin_amdgcn_permlanex16:
		case AMDGPU::BI__builtin_amdgcn_permlane16_f32:
		case AMDGPU::BI__builtin_amdgcn_permlanex16_f32: {
		Intrinsic::ID Intrin;
		switch (BuiltinID) {
		case AMDGPU::BI__builtin_amdgcn_permlane16:
		case AMDGPU::BI__builtin_amdgcn_permlane16_f32:
		Intrin = Intrinsic::amdgcn_permlane16;
		break;
		case AMDGPU::BI__builtin_amdgcn_permlanex16:
		case AMDGPU::BI__builtin_amdgcn_permlanex16_f32:
		Intrin = Intrinsic::amdgcn_permlanex16;
		break;
		}
		llvm::Value *Src0 = EmitScalarExpr(E->getArg(0));
		llvm::Value *Src1 = EmitScalarExpr(E->getArg(1));
		llvm::Value *Src2 = EmitScalarExpr(E->getArg(2));
		llvm::Value *Src3 = EmitScalarExpr(E->getArg(3));
		llvm::Value *Src4 = EmitScalarExpr(E->getArg(4));
		llvm::Value *Src5 = EmitScalarExpr(E->getArg(5));

		llvm::Function *F = CGM.getIntrinsic(Intrin, Src1->getType());
		return Builder.CreateCall(F, {Src0, Src1, Src2, Src3, Src4, Src5});
		}
		case AMDGPU::BI__builtin_amdgcn_readfirstlane:
		return emitUnaryBuiltin(*this, E, Intrinsic::amdgcn_readfirstlane);
		case AMDGPU::BI__builtin_amdgcn_readlane:
		return emitBinaryBuiltin(*this, E, Intrinsic::amdgcn_readlane);
// amdgcn workitem		// amdgcn workitem
case AMDGPU::BI__builtin_amdgcn_workitem_id_x:		case AMDGPU::BI__builtin_amdgcn_workitem_id_x:
return emitRangedBuiltin(*this, Intrinsic::amdgcn_workitem_id_x, 0, 1024);		return emitRangedBuiltin(*this, Intrinsic::amdgcn_workitem_id_x, 0, 1024);
case AMDGPU::BI__builtin_amdgcn_workitem_id_y:		case AMDGPU::BI__builtin_amdgcn_workitem_id_y:
return emitRangedBuiltin(*this, Intrinsic::amdgcn_workitem_id_y, 0, 1024);		return emitRangedBuiltin(*this, Intrinsic::amdgcn_workitem_id_y, 0, 1024);
case AMDGPU::BI__builtin_amdgcn_workitem_id_z:		case AMDGPU::BI__builtin_amdgcn_workitem_id_z:
return emitRangedBuiltin(*this, Intrinsic::amdgcn_workitem_id_z, 0, 1024);		return emitRangedBuiltin(*this, Intrinsic::amdgcn_workitem_id_z, 0, 1024);

[... 2,790 lines not shown ...]

clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl

	// REQUIRES: amdgpu-registered-target			// REQUIRES: amdgpu-registered-target
	// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx1010 -S -emit-llvm -o - %s \| FileCheck %s			// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx1010 -S -emit-llvm -o - %s \| FileCheck %s
	// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx1011 -S -emit-llvm -o - %s \| FileCheck %s			// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx1011 -S -emit-llvm -o - %s \| FileCheck %s
	// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx1012 -S -emit-llvm -o - %s \| FileCheck %s			// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx1012 -S -emit-llvm -o - %s \| FileCheck %s

	typedef unsigned int uint;			typedef unsigned int uint;
	typedef unsigned long ulong;			typedef unsigned long ulong;

	// CHECK-LABEL: @test_permlane16(			// CHECK-LABEL: @test_permlane16(
	// CHECK: call i32 @llvm.amdgcn.permlane16(i32 %a, i32 %b, i32 %c, i32 %d, i1 false, i1 false)			// CHECK: call i32 @llvm.amdgcn.permlane16.i32(i32 %a, i32 %b, i32 %c, i32 %d, i1 false, i1 false)
	void test_permlane16(global uint* out, uint a, uint b, uint c, uint d) {			void test_permlane16(global uint* out, uint a, uint b, uint c, uint d) {
	*out = __builtin_amdgcn_permlane16(a, b, c, d, 0, 0);			*out = __builtin_amdgcn_permlane16(a, b, c, d, 0, 0);
	}			}

	// CHECK-LABEL: @test_permlanex16(			// CHECK-LABEL: @test_permlanex16(
	// CHECK: call i32 @llvm.amdgcn.permlanex16(i32 %a, i32 %b, i32 %c, i32 %d, i1 false, i1 false)			// CHECK: call i32 @llvm.amdgcn.permlanex16.i32(i32 %a, i32 %b, i32 %c, i32 %d, i1 false, i1 false)
	void test_permlanex16(global uint* out, uint a, uint b, uint c, uint d) {			void test_permlanex16(global uint* out, uint a, uint b, uint c, uint d) {
	*out = __builtin_amdgcn_permlanex16(a, b, c, d, 0, 0);			*out = __builtin_amdgcn_permlanex16(a, b, c, d, 0, 0);
	}			}

				// CHECK-LABEL: @test_permlane16_f32(
				// CHECK: call float @llvm.amdgcn.permlane16.f32(float %a, float %b, i32 %c, i32 %d, i1 false, i1 false)
				void test_permlane16_f32(global float* out, float a, float b, uint c, uint d) {
				*out = __builtin_amdgcn_permlane16_f32(a, b, c, d, 0, 0);
				}

				// CHECK-LABEL: @test_permlanex16_f32(
				// CHECK: call float @llvm.amdgcn.permlanex16.f32(float %a, float %b, i32 %c, i32 %d, i1 false, i1 false)
				void test_permlanex16_f32(global float* out, float a, float b, uint c, uint d) {
				*out = __builtin_amdgcn_permlanex16_f32(a, b, c, d, 0, 0);
				}


	// CHECK-LABEL: @test_mov_dpp8(			// CHECK-LABEL: @test_mov_dpp8(
	// CHECK: call i32 @llvm.amdgcn.mov.dpp8.i32(i32 %a, i32 1)			// CHECK: call i32 @llvm.amdgcn.mov.dpp8.i32(i32 %a, i32 1)
	void test_mov_dpp8(global uint* out, uint a) {			void test_mov_dpp8(global uint* out, uint a) {
	*out = __builtin_amdgcn_mov_dpp8(a, 1);			*out = __builtin_amdgcn_mov_dpp8(a, 1);
	}			}

	// CHECK-LABEL: @test_s_memtime			// CHECK-LABEL: @test_s_memtime
	// CHECK: call i64 @llvm.amdgcn.s.memtime()			// CHECK: call i64 @llvm.amdgcn.s.memtime()
[... 18 lines not shown ...]

clang/test/CodeGenOpenCL/builtins-amdgcn.cl

[... 286 lines not shown ...]
	// CHECK-LABEL: @test_ds_bpermute			// CHECK-LABEL: @test_ds_bpermute
	// CHECK: call i32 @llvm.amdgcn.ds.bpermute(i32 %a, i32 %b)			// CHECK: call i32 @llvm.amdgcn.ds.bpermute(i32 %a, i32 %b)
	void test_ds_bpermute(global int* out, int a, int b)			void test_ds_bpermute(global int* out, int a, int b)
	{			{
	*out = __builtin_amdgcn_ds_bpermute(a, b);			*out = __builtin_amdgcn_ds_bpermute(a, b);
	}			}

	// CHECK-LABEL: @test_readfirstlane			// CHECK-LABEL: @test_readfirstlane
	// CHECK: call i32 @llvm.amdgcn.readfirstlane(i32 %a)			// CHECK: call i32 @llvm.amdgcn.readfirstlane.i32(i32 %a)
	void test_readfirstlane(global int* out, int a)			void test_readfirstlane(global int* out, int a)
	{			{
	*out = __builtin_amdgcn_readfirstlane(a);			*out = __builtin_amdgcn_readfirstlane(a);
	}			}

	// CHECK-LABEL: @test_readlane			// CHECK-LABEL: @test_readlane
	// CHECK: call i32 @llvm.amdgcn.readlane(i32 %a, i32 %b)			// CHECK: call i32 @llvm.amdgcn.readlane.i32(i32 %a, i32 %b)
	void test_readlane(global int* out, int a, int b)			void test_readlane(global int* out, int a, int b)
	{			{
	*out = __builtin_amdgcn_readlane(a, b);			*out = __builtin_amdgcn_readlane(a, b);
	}			}

	// CHECK-LABEL: @test_fcmp_f32			// CHECK-LABEL: @test_fcmp_f32
	// CHECK: call i64 @llvm.amdgcn.fcmp.i64.f32(float %a, float %b, i32 5)			// CHECK: call i64 @llvm.amdgcn.fcmp.i64.f32(float %a, float %b, i32 5)
	void test_fcmp_f32(global ulong* out, float a, float b)			void test_fcmp_f32(global ulong* out, float a, float b)
[... 496 lines not shown ...]

clang/test/SemaOpenCL/builtins-amdgcn-error-gfx10-param.cl

	// RUN: %clang_cc1 -triple amdgcn-- -target-cpu gfx1010 -verify -S -o - %s			// RUN: %clang_cc1 -triple amdgcn-- -target-cpu gfx1010 -verify -S -o - %s

	typedef unsigned int uint;			typedef unsigned int uint;


	void test_permlane16(global uint* out, uint a, uint b, uint c, uint d, uint e) {			void test_permlane16(global uint* out, uint a, uint b, uint c, uint d, uint e) {
	*out = __builtin_amdgcn_permlane16(a, b, c, d, e, 1); // expected-error{{argument to '__builtin_amdgcn_permlane16' must be a constant integer}}			*out = __builtin_amdgcn_permlane16(a, b, c, d, e, 1); // expected-error{{argument to '__builtin_amdgcn_permlane16' must be a constant integer}}
	*out = __builtin_amdgcn_permlane16(a, b, c, d, 1, e); // expected-error{{argument to '__builtin_amdgcn_permlane16' must be a constant integer}}			*out = __builtin_amdgcn_permlane16(a, b, c, d, 1, e); // expected-error{{argument to '__builtin_amdgcn_permlane16' must be a constant integer}}
	}			}

	void test_permlanex16(global uint* out, uint a, uint b, uint c, uint d, uint e) {			void test_permlanex16(global uint* out, uint a, uint b, uint c, uint d, uint e) {
	*out = __builtin_amdgcn_permlanex16(a, b, c, d, e, 1); // expected-error{{argument to '__builtin_amdgcn_permlanex16' must be a constant integer}}			*out = __builtin_amdgcn_permlanex16(a, b, c, d, e, 1); // expected-error{{argument to '__builtin_amdgcn_permlanex16' must be a constant integer}}
	*out = __builtin_amdgcn_permlanex16(a, b, c, d, 1, e); // expected-error{{argument to '__builtin_amdgcn_permlanex16' must be a constant integer}}			*out = __builtin_amdgcn_permlanex16(a, b, c, d, 1, e); // expected-error{{argument to '__builtin_amdgcn_permlanex16' must be a constant integer}}
	}			}

				void test_permlane16_f32(global float* out, float a, float b, uint c, uint d, uint e) {
				*out = __builtin_amdgcn_permlane16_f32(a, b, c, d, e, 1); // expected-error{{argument to '__builtin_amdgcn_permlane16_f32' must be a constant integer}}
				*out = __builtin_amdgcn_permlane16_f32(a, b, c, d, 1, e); // expected-error{{argument to '__builtin_amdgcn_permlane16_f32' must be a constant integer}}
				}

				void test_permlanex16_f32(global float* out, float a, float b, uint c, uint d, uint e) {
				*out = __builtin_amdgcn_permlanex16_f32(a, b, c, d, e, 1); // expected-error{{argument to '__builtin_amdgcn_permlanex16_f32' must be a constant integer}}
				*out = __builtin_amdgcn_permlanex16_f32(a, b, c, d, 1, e); // expected-error{{argument to '__builtin_amdgcn_permlanex16_f32' must be a constant integer}}
				}

	void test_mov_dpp8(global uint* out, uint a, uint b) {			void test_mov_dpp8(global uint* out, uint a, uint b) {
	*out = __builtin_amdgcn_mov_dpp8(a, b); // expected-error{{argument to '__builtin_amdgcn_mov_dpp8' must be a constant integer}}			*out = __builtin_amdgcn_mov_dpp8(a, b); // expected-error{{argument to '__builtin_amdgcn_mov_dpp8' must be a constant integer}}
	}			}

llvm/include/llvm/IR/IntrinsicsAMDGPU.td

[... 1,661 lines not shown; enclosing context: def int_amdgcn_ballot : ...]
Intrinsic<[llvm_anyint_ty], [llvm_i1_ty],		Intrinsic<[llvm_anyint_ty], [llvm_i1_ty],
[IntrNoMem, IntrConvergent, IntrWillReturn, IntrNoCallback, IntrNoFree]>;		[IntrNoMem, IntrConvergent, IntrWillReturn, IntrNoCallback, IntrNoFree]>;

def int_amdgcn_inverse_ballot :		def int_amdgcn_inverse_ballot :
Intrinsic<[llvm_i1_ty], [llvm_anyint_ty],		Intrinsic<[llvm_i1_ty], [llvm_anyint_ty],
[IntrNoMem, IntrConvergent, IntrWillReturn, IntrNoCallback, IntrNoFree]>;		[IntrNoMem, IntrConvergent, IntrWillReturn, IntrNoCallback, IntrNoFree]>;

def int_amdgcn_readfirstlane :		def int_amdgcn_readfirstlane :
ClangBuiltin<"__builtin_amdgcn_readfirstlane">,		Intrinsic<[llvm_any_ty], [LLVMMatchType<0>],
Intrinsic<[llvm_i32_ty], [llvm_i32_ty],
[IntrNoMem, IntrConvergent, IntrWillReturn, IntrNoCallback, IntrNoFree]>;		[IntrNoMem, IntrConvergent, IntrWillReturn, IntrNoCallback, IntrNoFree]>;

// The lane argument must be uniform across the currently active threads of the		// The lane argument must be uniform across the currently active threads of the
// current wave. Otherwise, the result is undefined.		// current wave. Otherwise, the result is undefined.
def int_amdgcn_readlane :		def int_amdgcn_readlane :
ClangBuiltin<"__builtin_amdgcn_readlane">,		Intrinsic<[llvm_any_ty], [LLVMMatchType<0>, llvm_i32_ty],
Intrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty],
[IntrNoMem, IntrConvergent, IntrWillReturn, IntrNoCallback, IntrNoFree]>;		[IntrNoMem, IntrConvergent, IntrWillReturn, IntrNoCallback, IntrNoFree]>;

// The value to write and lane select arguments must be uniform across the		// The value to write and lane select arguments must be uniform across the
// currently active threads of the current wave. Otherwise, the result is		// currently active threads of the current wave. Otherwise, the result is
// undefined.		// undefined.
def int_amdgcn_writelane :		def int_amdgcn_writelane :
ClangBuiltin<"__builtin_amdgcn_writelane">,		ClangBuiltin<"__builtin_amdgcn_writelane">,
Intrinsic<[llvm_i32_ty], [		Intrinsic<[llvm_any_ty], [
llvm_i32_ty, // uniform value to write: returned by the selected lane		LLVMMatchType<0>, // uniform value to write: returned by the selected lane
llvm_i32_ty, // uniform lane select		llvm_i32_ty, // uniform lane select
llvm_i32_ty // returned by all lanes other than the selected one		LLVMMatchType<0> // returned by all lanes other than the selected one
],		],
[IntrNoMem, IntrConvergent, IntrWillReturn, IntrNoCallback, IntrNoFree]		[IntrNoMem, IntrConvergent, IntrWillReturn, IntrNoCallback, IntrNoFree]
>;		>;

def int_amdgcn_alignbyte : ClangBuiltin<"__builtin_amdgcn_alignbyte">,		def int_amdgcn_alignbyte : ClangBuiltin<"__builtin_amdgcn_alignbyte">,
DefaultAttrsIntrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],		DefaultAttrsIntrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
[IntrNoMem, IntrSpeculatable]		[IntrNoMem, IntrSpeculatable]
>;		>;
[... 238 lines not shown; enclosing context: class AMDGPUGlobalLoadLDS : Intrinsic < ...]
"", [SDNPMemOperand]>;		"", [SDNPMemOperand]>;
def int_amdgcn_global_load_lds : AMDGPUGlobalLoadLDS;		def int_amdgcn_global_load_lds : AMDGPUGlobalLoadLDS;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// GFX10 Intrinsics		// GFX10 Intrinsics
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

// llvm.amdgcn.permlane16 <old> <src0> <src1> <src2> <fi> <bound_control>		// llvm.amdgcn.permlane16 <old> <src0> <src1> <src2> <fi> <bound_control>
def int_amdgcn_permlane16 : ClangBuiltin<"__builtin_amdgcn_permlane16">,		def int_amdgcn_permlane16 :
Intrinsic<[llvm_i32_ty],		Intrinsic<[llvm_any_ty],
[llvm_i32_ty, llvm_i32_ty, llvm_i32_ty, llvm_i32_ty, llvm_i1_ty, llvm_i1_ty],		[LLVMMatchType<0>, LLVMMatchType<0>, llvm_i32_ty, llvm_i32_ty, llvm_i1_ty, llvm_i1_ty],
[IntrNoMem, IntrConvergent, IntrWillReturn,		[IntrNoMem, IntrConvergent, IntrWillReturn,
ImmArg<ArgIndex<4>>, ImmArg<ArgIndex<5>>, IntrNoCallback, IntrNoFree]>;		ImmArg<ArgIndex<4>>, ImmArg<ArgIndex<5>>, IntrNoCallback, IntrNoFree]>;

// llvm.amdgcn.permlanex16 <old> <src0> <src1> <src2> <fi> <bound_control>		// llvm.amdgcn.permlanex16 <old> <src0> <src1> <src2> <fi> <bound_control>
def int_amdgcn_permlanex16 : ClangBuiltin<"__builtin_amdgcn_permlanex16">,		def int_amdgcn_permlanex16 :
Intrinsic<[llvm_i32_ty],		Intrinsic<[llvm_any_ty],
[llvm_i32_ty, llvm_i32_ty, llvm_i32_ty, llvm_i32_ty, llvm_i1_ty, llvm_i1_ty],		[LLVMMatchType<0>, LLVMMatchType<0>, llvm_i32_ty, llvm_i32_ty, llvm_i1_ty, llvm_i1_ty],
[IntrNoMem, IntrConvergent, IntrWillReturn,		[IntrNoMem, IntrConvergent, IntrWillReturn,
ImmArg<ArgIndex<4>>, ImmArg<ArgIndex<5>>, IntrNoCallback, IntrNoFree]>;		ImmArg<ArgIndex<4>>, ImmArg<ArgIndex<5>>, IntrNoCallback, IntrNoFree]>;

// llvm.amdgcn.mov.dpp8.i32 <src> <sel>		// llvm.amdgcn.mov.dpp8.i32 <src> <sel>
// <sel> is a 32-bit constant whose high 8 bits must be zero which selects		// <sel> is a 32-bit constant whose high 8 bits must be zero which selects
// the lanes to read from.		// the lanes to read from.
def int_amdgcn_mov_dpp8 :		def int_amdgcn_mov_dpp8 :
Intrinsic<[llvm_anyint_ty],		Intrinsic<[llvm_anyint_ty],
[LLVMMatchType<0>, llvm_i32_ty],		[LLVMMatchType<0>, llvm_i32_ty],
[IntrNoMem, IntrConvergent, IntrWillReturn,		[IntrNoMem, IntrConvergent, IntrWillReturn,
ImmArg<ArgIndex<1>>, IntrNoCallback, IntrNoFree]>;		ImmArg<ArgIndex<1>>, IntrNoCallback, IntrNoFree]>;

def int_amdgcn_s_get_waveid_in_workgroup :		def int_amdgcn_s_get_waveid_in_workgroup :
ClangBuiltin<"__builtin_amdgcn_s_get_waveid_in_workgroup">,		ClangBuiltin<"__builtin_amdgcn_s_get_waveid_in_workgroup">,
		arsenm: Should use type mangling for the existing intrinsics rather than introducing new typed copies
Intrinsic<[llvm_i32_ty], [],		Intrinsic<[llvm_i32_ty], [],
[IntrNoMem, IntrHasSideEffects, IntrWillReturn, IntrNoCallback, IntrNoFree]>;		[IntrNoMem, IntrHasSideEffects, IntrWillReturn, IntrNoCallback, IntrNoFree]>;

class AMDGPUGlobalAtomicRtn<LLVMType vt> : Intrinsic <		class AMDGPUGlobalAtomicRtn<LLVMType vt> : Intrinsic <
[vt],		[vt],
[llvm_anyptr_ty, // vaddr		[llvm_anyptr_ty, // vaddr
vt], // vdata(VGPR)		vt], // vdata(VGPR)
[IntrArgMemOnly, IntrWillReturn, NoCapture<ArgIndex<0>>, IntrNoCallback, IntrNoFree], "",		[IntrArgMemOnly, IntrWillReturn, NoCapture<ArgIndex<0>>, IntrNoCallback, IntrNoFree], "",
[... 505 lines not shown ...]
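
The change from `llvm_i32_ty` to `llvm_any_ty` with `LLVMMatchType<0>` in the definitions above means the result and the data operands share one overloaded type, while the lane select stays a 32-bit integer. A rough analogy in C++ templates is sketched below. It is a host-side model written for this page, assuming a 32-lane wave, and it follows the operand comments in the .td file for writelane; it does not model inactive lanes or the uniformity requirements.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption for this sketch: a wave of 32 lanes, one value of type T each.
constexpr std::size_t kLanes = 32;
template <typename T> using Wave = std::array<T, kLanes>;

// Model of readlane: every lane receives the value held by lane `Lane`.
// T plays the role of llvm_any_ty; Lane stays a 32-bit integer.
template <typename T>
Wave<T> readlane(const Wave<T> &Src, std::uint32_t Lane) {
  Wave<T> Out;
  Out.fill(Src.at(Lane));
  return Out;
}

// Model of writelane: the selected lane returns Value, and every other lane
// returns its own entry of Old. Value and Old must have the same type, which
// mirrors the two LLVMMatchType<0> operands.
template <typename T>
Wave<T> writelane(T Value, std::uint32_t Lane, const Wave<T> &Old) {
  Wave<T> Out = Old;
  Out.at(Lane) = Value;
  return Out;
}
```

Instantiating these with `float` corresponds to the `.f32` overload and with `std::uint32_t` to the `.i32` overload; no bitcast is needed at the call site in either case.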

llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp

[... 338 lines not shown; enclosing context: V = buildNonAtomicBinOp( ...]
B.getInt32(0xf), B.getInt32(0xf), B.getFalse()}));		B.getInt32(0xf), B.getInt32(0xf), B.getFalse()}));
}		}

// Reduce within each pair of rows (i.e. 32 lanes).		// Reduce within each pair of rows (i.e. 32 lanes).
assert(ST->hasPermLaneX16());		assert(ST->hasPermLaneX16());
V = buildNonAtomicBinOp(		V = buildNonAtomicBinOp(
B, Op, V,		B, Op, V,
B.CreateIntrinsic(		B.CreateIntrinsic(
Intrinsic::amdgcn_permlanex16, {},		Intrinsic::amdgcn_permlanex16, {B.getInt32Ty()},
{V, V, B.getInt32(-1), B.getInt32(-1), B.getFalse(), B.getFalse()}));		{V, V, B.getInt32(-1), B.getInt32(-1), B.getFalse(), B.getFalse()}));

if (ST->isWave32())		if (ST->isWave32())
return V;		return V;

if (ST->hasPermLane64()) {		if (ST->hasPermLane64()) {
// Reduce across the upper and lower 32 lanes.		// Reduce across the upper and lower 32 lanes.
return buildNonAtomicBinOp(		return buildNonAtomicBinOp(
B, Op, V, B.CreateIntrinsic(Intrinsic::amdgcn_permlane64, {}, V));		B, Op, V, B.CreateIntrinsic(Intrinsic::amdgcn_permlane64, {}, V));
}		}

// Pick an arbitrary lane from 0..31 and an arbitrary lane from 32..63 and		// Pick an arbitrary lane from 0..31 and an arbitrary lane from 32..63 and
// combine them with a scalar operation.		// combine them with a scalar operation.
Function *ReadLane =		Function *ReadLane =
Intrinsic::getDeclaration(M, Intrinsic::amdgcn_readlane, {});		Intrinsic::getDeclaration(M, Intrinsic::amdgcn_readlane, {Ty});
Value *const Lane0 = B.CreateCall(ReadLane, {V, B.getInt32(0)});		Value *const Lane0 = B.CreateCall(ReadLane, {V, B.getInt32(0)});
Value *const Lane32 = B.CreateCall(ReadLane, {V, B.getInt32(32)});		Value *const Lane32 = B.CreateCall(ReadLane, {V, B.getInt32(32)});
return buildNonAtomicBinOp(B, Op, Lane0, Lane32);		return buildNonAtomicBinOp(B, Op, Lane0, Lane32);
}		}

// Use the builder to create an inclusive scan of V across the wavefront, with		// Use the builder to create an inclusive scan of V across the wavefront, with
// all lanes active.		// all lanes active.
Value *AMDGPUAtomicOptimizerImpl::buildScan(IRBuilder<> &B,		Value *AMDGPUAtomicOptimizerImpl::buildScan(IRBuilder<> &B,
Show All 26 Lines	Value *AMDGPUAtomicOptimizerImpl::buildScan(IRBuilder<> &B,
} else {		} else {
// On GFX10 all DPP operations are confined to a single row. To get cross-		// On GFX10 all DPP operations are confined to a single row. To get cross-
// row operations we have to use permlane or readlane.		// row operations we have to use permlane or readlane.

// Combine lane 15 into lanes 16..31 (and, for wave 64, lane 47 into lanes		// Combine lane 15 into lanes 16..31 (and, for wave 64, lane 47 into lanes
// 48..63).		// 48..63).
assert(ST->hasPermLaneX16());		assert(ST->hasPermLaneX16());
Value *const PermX = B.CreateIntrinsic(		Value *const PermX = B.CreateIntrinsic(
Intrinsic::amdgcn_permlanex16, {},		Intrinsic::amdgcn_permlanex16, {B.getInt32Ty()},
{V, V, B.getInt32(-1), B.getInt32(-1), B.getFalse(), B.getFalse()});		{V, V, B.getInt32(-1), B.getInt32(-1), B.getFalse(), B.getFalse()});
V = buildNonAtomicBinOp(		V = buildNonAtomicBinOp(
B, Op, V,		B, Op, V,
B.CreateCall(UpdateDPP,		B.CreateCall(UpdateDPP,
{Identity, PermX, B.getInt32(DPP::QUAD_PERM_ID),		{Identity, PermX, B.getInt32(DPP::QUAD_PERM_ID),
B.getInt32(0xa), B.getInt32(0xf), B.getFalse()}));		B.getInt32(0xa), B.getInt32(0xf), B.getFalse()}));
if (!ST->isWave32()) {		if (!ST->isWave32()) {
// Combine lane 31 into lanes 32..63.		// Combine lane 31 into lanes 32..63.
Value *const Lane31 = B.CreateIntrinsic(Intrinsic::amdgcn_readlane, {},		Value *const Lane31 = B.CreateIntrinsic(
{V, B.getInt32(31)});		Intrinsic::amdgcn_readlane, {V->getType()}, {V, B.getInt32(31)});
V = buildNonAtomicBinOp(		V = buildNonAtomicBinOp(
B, Op, V,		B, Op, V,
B.CreateCall(UpdateDPP,		B.CreateCall(UpdateDPP,
{Identity, Lane31, B.getInt32(DPP::QUAD_PERM_ID),		{Identity, Lane31, B.getInt32(DPP::QUAD_PERM_ID),
B.getInt32(0xc), B.getInt32(0xf), B.getFalse()}));		B.getInt32(0xc), B.getInt32(0xf), B.getFalse()}));
}		}
}		}
return V;		return V;
Show All 10 Lines	Value *AMDGPUAtomicOptimizerImpl::buildShiftRight(IRBuilder<> &B, Value *V,

if (ST->hasDPPWavefrontShifts()) {		if (ST->hasDPPWavefrontShifts()) {
// GFX9 has DPP wavefront shift operations.		// GFX9 has DPP wavefront shift operations.
V = B.CreateCall(UpdateDPP,		V = B.CreateCall(UpdateDPP,
{Identity, V, B.getInt32(DPP::WAVE_SHR1), B.getInt32(0xf),		{Identity, V, B.getInt32(DPP::WAVE_SHR1), B.getInt32(0xf),
B.getInt32(0xf), B.getFalse()});		B.getInt32(0xf), B.getFalse()});
} else {		} else {
Function *ReadLane =		Function *ReadLane =
Intrinsic::getDeclaration(M, Intrinsic::amdgcn_readlane, {});		Intrinsic::getDeclaration(M, Intrinsic::amdgcn_readlane, {Ty});
Function *WriteLane =		Function *WriteLane =
Intrinsic::getDeclaration(M, Intrinsic::amdgcn_writelane, {});		Intrinsic::getDeclaration(M, Intrinsic::amdgcn_writelane, {Ty});

// On GFX10 all DPP operations are confined to a single row. To get cross-		// On GFX10 all DPP operations are confined to a single row. To get cross-
// row operations we have to use permlane or readlane.		// row operations we have to use permlane or readlane.
Value *Old = V;		Value *Old = V;
V = B.CreateCall(UpdateDPP,		V = B.CreateCall(UpdateDPP,
{Identity, V, B.getInt32(DPP::ROW_SHR0 + 1),		{Identity, V, B.getInt32(DPP::ROW_SHR0 + 1),
B.getInt32(0xf), B.getInt32(0xf), B.getFalse()});		B.getInt32(0xf), B.getInt32(0xf), B.getFalse()});

▲ Show 20 Lines • Show All 134 Lines • ▼ Show 20 Lines	if (!NeedResult && ST->hasPermLaneX16()) {
if (NeedResult)		if (NeedResult)
ExclScan = buildShiftRight(B, NewV, Identity);		ExclScan = buildShiftRight(B, NewV, Identity);

// Read the value from the last lane, which has accumulated the values of		// Read the value from the last lane, which has accumulated the values of
// each active lane in the wavefront. This will be our new value which we		// each active lane in the wavefront. This will be our new value which we
// will provide to the atomic operation.		// will provide to the atomic operation.
Value *const LastLaneIdx = B.getInt32(ST->getWavefrontSize() - 1);		Value *const LastLaneIdx = B.getInt32(ST->getWavefrontSize() - 1);
assert(TyBitWidth == 32);		assert(TyBitWidth == 32);
NewV = B.CreateIntrinsic(Intrinsic::amdgcn_readlane, {},		NewV = B.CreateIntrinsic(Intrinsic::amdgcn_readlane, {Ty},
{NewV, LastLaneIdx});		{NewV, LastLaneIdx});
}		}

// Finally mark the readlanes in the WWM section.		// Finally mark the readlanes in the WWM section.
NewV = B.CreateIntrinsic(Intrinsic::amdgcn_strict_wwm, Ty, NewV);		NewV = B.CreateIntrinsic(Intrinsic::amdgcn_strict_wwm, Ty, NewV);
} else {		} else {
switch (Op) {		switch (Op) {
default:		default:
▲ Show 20 Lines • Show All 63 Lines • ▼ Show 20 Lines	if (NeedResult) {
// Create a PHI node to get our new atomic result into the exit block.		// Create a PHI node to get our new atomic result into the exit block.
PHINode *const PHI = B.CreatePHI(Ty, 2);		PHINode *const PHI = B.CreatePHI(Ty, 2);
PHI->addIncoming(PoisonValue::get(Ty), EntryBB);		PHI->addIncoming(PoisonValue::get(Ty), EntryBB);
PHI->addIncoming(NewI, SingleLaneTerminator->getParent());		PHI->addIncoming(NewI, SingleLaneTerminator->getParent());

// We need to broadcast the value who was the lowest active lane (the first		// We need to broadcast the value who was the lowest active lane (the first
// lane) to all other lanes in the wavefront. We use an intrinsic for this,		// lane) to all other lanes in the wavefront. We use an intrinsic for this,
// but have to handle 64-bit broadcasts with two calls to this intrinsic.		// but have to handle 64-bit broadcasts with two calls to this intrinsic.
Value *BroadcastI = nullptr;		Value *BroadcastI =
		B.CreateIntrinsic(Intrinsic::amdgcn_readfirstlane, {Ty}, {PHI});
if (TyBitWidth == 64) {
Value *const ExtractLo = B.CreateTrunc(PHI, B.getInt32Ty());
Value *const ExtractHi =
B.CreateTrunc(B.CreateLShr(PHI, 32), B.getInt32Ty());
CallInst *const ReadFirstLaneLo =
B.CreateIntrinsic(Intrinsic::amdgcn_readfirstlane, {}, ExtractLo);
CallInst *const ReadFirstLaneHi =
B.CreateIntrinsic(Intrinsic::amdgcn_readfirstlane, {}, ExtractHi);
Value *const PartialInsert = B.CreateInsertElement(
PoisonValue::get(VecTy), ReadFirstLaneLo, B.getInt32(0));
Value *const Insert =
B.CreateInsertElement(PartialInsert, ReadFirstLaneHi, B.getInt32(1));
BroadcastI = B.CreateBitCast(Insert, Ty);
} else if (TyBitWidth == 32) {

BroadcastI = B.CreateIntrinsic(Intrinsic::amdgcn_readfirstlane, {}, PHI);
} else {
llvm_unreachable("Unhandled atomic bit width");
}

// Now that we have the result of our single atomic operation, we need to		// Now that we have the result of our single atomic operation, we need to
// get our individual lane's slice into the result. We use the lane offset		// get our individual lane's slice into the result. We use the lane offset
// we previously calculated combined with the atomic result value we got		// we previously calculated combined with the atomic result value we got
// from the first lane, to get our lane's index into the atomic result.		// from the first lane, to get our lane's index into the atomic result.
Value *LaneOffset = nullptr;		Value *LaneOffset = nullptr;
if (ValDivergent) {		if (ValDivergent) {
LaneOffset =		LaneOffset =
▲ Show 20 Lines • Show All 52 Lines • Show Last 20 Lines

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp

Show First 20 Lines • Show All 227 Lines • ▼ Show 20 Lines	public:
bool visitBinaryOperator(BinaryOperator &I);		bool visitBinaryOperator(BinaryOperator &I);
bool visitLoadInst(LoadInst &I);		bool visitLoadInst(LoadInst &I);
bool visitICmpInst(ICmpInst &I);		bool visitICmpInst(ICmpInst &I);
bool visitSelectInst(SelectInst &I);		bool visitSelectInst(SelectInst &I);
bool visitPHINode(PHINode &I);		bool visitPHINode(PHINode &I);

bool visitIntrinsicInst(IntrinsicInst &I);		bool visitIntrinsicInst(IntrinsicInst &I);
bool visitBitreverseIntrinsicInst(IntrinsicInst &I);		bool visitBitreverseIntrinsicInst(IntrinsicInst &I);
		bool visitLaneIntrinsicInst(IntrinsicInst &I);
		Value *buildLegalLaneIntrinsic(IRBuilder<> &B, Intrinsic::ID IID,
		Value *Data0, Value *Lane = nullptr,
		Value *Data1 = nullptr);

bool doInitialization(Module &M) override;		bool doInitialization(Module &M) override;
bool runOnFunction(Function &F) override;		bool runOnFunction(Function &F) override;

StringRef getPassName() const override { return "AMDGPU IR optimizations"; }		StringRef getPassName() const override { return "AMDGPU IR optimizations"; }

void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.addRequired<AssumptionCacheTracker>();		AU.addRequired<AssumptionCacheTracker>();
▲ Show 20 Lines • Show All 1,404 Lines • Show Last 20 Lines

llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp

Show First 20 Lines • Show All 937 Lines • ▼ Show 20 Lines	if (match(Src,
return IC.replaceInstUsesWith(II, Src);		return IC.replaceInstUsesWith(II, Src);
}		}

if (IID == Intrinsic::amdgcn_readfirstlane) {		if (IID == Intrinsic::amdgcn_readfirstlane) {
// readfirstlane (readlane x, y) -> readlane x, y		// readfirstlane (readlane x, y) -> readlane x, y
if (match(Src, PatternMatch::m_Intrinsic<Intrinsic::amdgcn_readlane>())) {		if (match(Src, PatternMatch::m_Intrinsic<Intrinsic::amdgcn_readlane>())) {
return IC.replaceInstUsesWith(II, Src);		return IC.replaceInstUsesWith(II, Src);
}		}

		// readfirstlane (bitcast x) -> bitcast (readfirstlane x)
		Value *BitcastInput = nullptr;
		if (match(Src,
		PatternMatch::m_BitCast(PatternMatch::m_Value(BitcastInput)))) {
		CallInst *NewCall =
		IC.Builder.CreateIntrinsic(Intrinsic::amdgcn_readfirstlane,
		{BitcastInput->getType()}, BitcastInput);
		Value *NewCast = IC.Builder.CreateBitCast(NewCall, II.getType());
		return IC.replaceInstUsesWith(II, NewCast);
		}
} else {		} else {
// readlane (readlane x, y), y -> readlane x, y		// readlane (readlane x, y), y -> readlane x, y
if (match(Src, PatternMatch::m_Intrinsic<Intrinsic::amdgcn_readlane>(		if (match(Src, PatternMatch::m_Intrinsic<Intrinsic::amdgcn_readlane>(
PatternMatch::m_Value(),		PatternMatch::m_Value(),
PatternMatch::m_Specific(II.getArgOperand(1))))) {		PatternMatch::m_Specific(II.getArgOperand(1))))) {
return IC.replaceInstUsesWith(II, Src);		return IC.replaceInstUsesWith(II, Src);
}		}
}		}
▲ Show 20 Lines • Show All 278 Lines • Show Last 20 Lines
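
The new fold above (readfirstlane (bitcast x) -> bitcast (readfirstlane x)) relies on a lane read only moving bits, so a same-width bitcast can be applied before or after it. A minimal standalone sketch of that property follows. It is a host-side model, not LLVM or hardware code: the 4-lane wave, the `Exec` mask, and the helpers `floatBits`, `readFirstLane`, `castThenRead` and `readThenCast` are all hypothetical names introduced for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Bit-preserving float -> i32 reinterpretation (the model's "bitcast").
uint32_t floatBits(float F) {
  uint32_t U;
  std::memcpy(&U, &F, sizeof U);
  return U;
}

// Hypothetical model of readfirstlane: return the value held by the lowest
// lane whose bit is set in Exec.
template <typename T, std::size_t N>
T readFirstLane(const std::array<T, N> &Wave, uint32_t Exec) {
  for (std::size_t I = 0; I < N; ++I)
    if (Exec & (1u << I))
      return Wave[I];
  return Wave[0]; // model choice for an empty mask, not a hardware claim
}

// readfirstlane (bitcast x): cast every lane, then read.
uint32_t castThenRead(const std::array<float, 4> &Wave, uint32_t Exec) {
  std::array<uint32_t, 4> Bits{};
  for (std::size_t I = 0; I < Bits.size(); ++I)
    Bits[I] = floatBits(Wave[I]);
  return readFirstLane(Bits, Exec);
}

// bitcast (readfirstlane x): read the float, then cast the single result.
uint32_t readThenCast(const std::array<float, 4> &Wave, uint32_t Exec) {
  return floatBits(readFirstLane(Wave, Exec));
}
```

In this model both orderings return the same bits for any mask, which is the equivalence the combine assumes. The second form needs only one cast on the uniform result.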

llvm/lib/Target/AMDGPU/AMDGPULateCodeGenPrepare.cpp

Show All 12 Lines
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "AMDGPU.h"		#include "AMDGPU.h"
#include "llvm/Analysis/AssumptionCache.h"		#include "llvm/Analysis/AssumptionCache.h"
#include "llvm/Analysis/UniformityAnalysis.h"		#include "llvm/Analysis/UniformityAnalysis.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/IRBuilder.h"		#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InstVisitor.h"		#include "llvm/IR/InstVisitor.h"
		#include "llvm/IR/IntrinsicsAMDGPU.h"
#include "llvm/InitializePasses.h"		#include "llvm/InitializePasses.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/KnownBits.h"		#include "llvm/Support/KnownBits.h"
#include "llvm/Transforms/Utils/Local.h"		#include "llvm/Transforms/Utils/Local.h"

#define DEBUG_TYPE "amdgpu-late-codegenprepare"		#define DEBUG_TYPE "amdgpu-late-codegenprepare"

using namespace llvm;		using namespace llvm;
▲ Show 20 Lines • Show All 42 Lines • ▼ Show 20 Lines	public:
// Check if the specified value is at least DWORD aligned.		// Check if the specified value is at least DWORD aligned.
bool isDWORDAligned(const Value *V) const {		bool isDWORDAligned(const Value *V) const {
KnownBits Known = computeKnownBits(V, *DL, 0, AC);		KnownBits Known = computeKnownBits(V, *DL, 0, AC);
return Known.countMinTrailingZeros() >= 2;		return Known.countMinTrailingZeros() >= 2;
}		}

bool canWidenScalarExtLoad(LoadInst &LI) const;		bool canWidenScalarExtLoad(LoadInst &LI) const;
bool visitLoadInst(LoadInst &LI);		bool visitLoadInst(LoadInst &LI);
		bool visitIntrinsicInst(IntrinsicInst &I);
		bool visitLaneIntrinsicInst(IntrinsicInst &I);

		Value *buildLegalLaneIntrinsic(IRBuilder<> &B, Intrinsic::ID IID,
		Value *Data0, Value *Data1, Value *Lane0,
		Value *Lane1, Value *Mod0, Value *Mod1);
};		};

} // end anonymous namespace		} // end anonymous namespace

bool AMDGPULateCodeGenPrepare::doInitialization(Module &M) {		bool AMDGPULateCodeGenPrepare::doInitialization(Module &M) {
Mod = &M;		Mod = &M;
DL = &Mod->getDataLayout();		DL = &Mod->getDataLayout();
return false;		return false;
▲ Show 20 Lines • Show All 85 Lines • ▼ Show 20 Lines	bool AMDGPULateCodeGenPrepare::visitLoadInst(LoadInst &LI) {
auto *NewVal = IRB.CreateBitCast(		auto *NewVal = IRB.CreateBitCast(
IRB.CreateTrunc(IRB.CreateLShr(NewLd, ShAmt), IntNTy), LI.getType());		IRB.CreateTrunc(IRB.CreateLShr(NewLd, ShAmt), IntNTy), LI.getType());
LI.replaceAllUsesWith(NewVal);		LI.replaceAllUsesWith(NewVal);
RecursivelyDeleteTriviallyDeadInstructions(&LI);		RecursivelyDeleteTriviallyDeadInstructions(&LI);

return true;		return true;
}		}

		Value *AMDGPULateCodeGenPrepare::buildLegalLaneIntrinsic(
		arsenm (inline comment, not done): You're not relying on this for correctness, are you? This is an optimization pass; you can't lower here. You also shouldn't need to handle this in the IR; it should codegen normally.
		jrbyrnes (author, inline comment, done): This is the legalization for non-32-bit types. I don't know exactly why it wasn't handled via the normal codegen / selection process. @nhaehnle, I believe you tried this in https://reviews.llvm.org/D86154. Do you happen to remember why we do legalization this way? If not, I'll rework the approach.
		arsenm (inline comment, not done): CodeGenPrepare/LateCodeGenPrepare can't be used for lowering; they're optimization passes. Legalization needs to be handled in the codegen.
		IRBuilder<> &B, Intrinsic::ID IID, Value *Data0, Value *Data1, Value *Lane0,
		Value *Lane1, Value *Mod0, Value *Mod1) {
		Type *Ty = Data0->getType();
		bool IsPermLane = (IID == Intrinsic::amdgcn_permlane16 ||
		IID == Intrinsic::amdgcn_permlanex16);

		if (Ty->isIntegerTy(32) || Ty->isPointerTy()) {
		if (IsPermLane) {
		Value *Args[6] = {Data0, Data1, Lane0, Lane1, Mod0, Mod1};
		return B.CreateIntrinsic(IID, {Ty}, {Args});
		}

		// {write, read, readfirst}lane
		Value *Args[3] = {Data0, Lane0, Data1};
		unsigned NumArgs = Data1 != nullptr ? 3 : Lane0 != nullptr ? 2 : 1;
		return B.CreateIntrinsic(IID, {B.getInt32Ty()}, {Args, NumArgs});
		}

		if (auto *VecTy = dyn_cast<FixedVectorType>(Ty)) {
		Type *EltType = VecTy->getElementType();
		bool is16Bit = EltType->getPrimitiveSizeInBits() == TypeSize::Fixed(16);
		int EC = VecTy->getElementCount().getKnownMinValue();
		arsenm (inline comment, done): isIntegerTy(16). Also, just check the bitsize is 16. Might as well also handle bfloat.

		Value *Result = PoisonValue::get(Ty);
		for (int i = 0; i < EC; i += 1 + is16Bit) {
		Value *EltData0;
		nlopes (inline comment, done): Please use poison wherever possible. In this case it seems it's just a placeholder, so it can be poison. We're trying to get rid of undef. Thanks!
		Value *EltData1 = nullptr;

		if (is16Bit) {
		int Idxs[2] = {i, i + 1};
		EltData0 = B.CreateShuffleVector(Data0, PoisonValue::get(Ty), Idxs);
		EltData0 = B.CreateBitCast(EltData0, B.getInt32Ty());
		} else {
		EltData0 = B.CreateExtractElement(Data0, i);
		}

		if (Data1) {
		if (is16Bit) {
		int Idxs[2] = {i, i + 1};
		EltData1 = B.CreateShuffleVector(Data1, PoisonValue::get(Ty), Idxs);
		EltData1 = B.CreateBitCast(EltData1, B.getInt32Ty());
		} else {
		EltData1 = B.CreateExtractElement(Data1, i);
		}
		}

		Value *EltResult = buildLegalLaneIntrinsic(B, IID, EltData0, EltData1,
		Lane0, Lane1, Mod0, Mod1);

		if (is16Bit) {
		EltResult =
		B.CreateBitCast(EltResult, FixedVectorType::get(EltType, 2));
		for (int j = 0; j < 2; ++j) {
		if (i + j >= EC)
		break;
		Result = B.CreateInsertElement(
		Result, B.CreateExtractElement(EltResult, j), i + j);
		}
		} else {
		Result = B.CreateInsertElement(Result, EltResult, i);
		}
		}

		return Result;
		}

		unsigned BitWidth = DL->getTypeSizeInBits(Ty);
		Type *IntTy = Ty;

		if (!Ty->isIntegerTy()) {
		IntTy = IntegerType::get(Mod->getContext(), BitWidth);
		Data0 = B.CreateBitCast(Data0, IntTy);
		if (Data1)
		Data1 = B.CreateBitCast(Data1, IntTy);
		}

		if ((BitWidth % 32) != 0) {
		Type *ExtendedTy =
		IntegerType::get(Mod->getContext(), (BitWidth + 31) & ~31);
		Data0 = B.CreateZExt(Data0, ExtendedTy);
		if (Data1)
		Data1 = B.CreateZExt(Data1, ExtendedTy);
		}

		if (BitWidth > 32) {
		Type *VecTy = FixedVectorType::get(B.getInt32Ty(), (BitWidth + 31) / 32);
		Data0 = B.CreateBitCast(Data0, VecTy);
		if (Data1)
		Data1 = B.CreateBitCast(Data1, VecTy);
		}

		Value *Result =
		buildLegalLaneIntrinsic(B, IID, Data0, Data1, Lane0, Lane1, Mod0, Mod1);

		if ((BitWidth % 32) != 0) {
		if (BitWidth > 32) {
		Result = B.CreateBitCast(
		Result, IntegerType::get(Mod->getContext(), (BitWidth + 31) / 32));
		}
		Result =
		B.CreateTrunc(Result, IntegerType::get(Mod->getContext(), BitWidth));
		}

		return B.CreateBitCast(Result, Ty);
		}

		/// "Legalize" readfirstlane/readlane/writelane to single-dword intrinsics
		/// on i32.
		///
		/// Done during codegen prepare purely because this turned out to be simpler
		/// than doing it in this generality in SelectionDAG.
		bool AMDGPULateCodeGenPrepare::visitLaneIntrinsicInst(IntrinsicInst &I) {
		Type *Ty = I.getType();
		if (Ty->isIntegerTy(32) || Ty->isPointerTy())
		return false; // already legal

		Value *Data0 = I.getArgOperand(0);
		Value *Data1 = nullptr;
		Value *Lane0 = nullptr;
		Value *Lane1 = nullptr;
		Value *Mod0 = nullptr;
		Value *Mod1 = nullptr;

		arsenm (inline comment, done): Just let pointer types pass through to codegen; we try really hard to never introduce ptrtoint/inttoptr.
		if (I.getIntrinsicID() == Intrinsic::amdgcn_readlane) {
		Lane0 = I.getArgOperand(1);
		} else if (I.getIntrinsicID() == Intrinsic::amdgcn_writelane) {
		Lane0 = I.getArgOperand(1);
		Data1 = I.getArgOperand(2);
		} else if (I.getIntrinsicID() == Intrinsic::amdgcn_permlane16 ||
		I.getIntrinsicID() == Intrinsic::amdgcn_permlanex16) {
		Data1 = I.getArgOperand(1);
		Lane0 = I.getArgOperand(2);
		Lane1 = I.getArgOperand(3);
		Mod0 = I.getArgOperand(4);
		Mod1 = I.getArgOperand(5);
		}

		IRBuilder<> Builder(&I);
		Value *Legalized = buildLegalLaneIntrinsic(Builder, I.getIntrinsicID(), Data0,
		Data1, Lane0, Lane1, Mod0, Mod1);

		I.replaceAllUsesWith(Legalized);
		I.eraseFromParent();
		return true;
		}

		bool AMDGPULateCodeGenPrepare::visitIntrinsicInst(IntrinsicInst &I) {
		switch (I.getIntrinsicID()) {
		case Intrinsic::amdgcn_readfirstlane:
		case Intrinsic::amdgcn_readlane:
		case Intrinsic::amdgcn_writelane:
		case Intrinsic::amdgcn_permlane16:
		case Intrinsic::amdgcn_permlanex16:
		return visitLaneIntrinsicInst(I);
		default:
		return false;
		}
		}

INITIALIZE_PASS_BEGIN(AMDGPULateCodeGenPrepare, DEBUG_TYPE,		INITIALIZE_PASS_BEGIN(AMDGPULateCodeGenPrepare, DEBUG_TYPE,
"AMDGPU IR late optimizations", false, false)		"AMDGPU IR late optimizations", false, false)
INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)		INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)
INITIALIZE_PASS_DEPENDENCY(UniformityInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(UniformityInfoWrapperPass)
INITIALIZE_PASS_END(AMDGPULateCodeGenPrepare, DEBUG_TYPE,		INITIALIZE_PASS_END(AMDGPULateCodeGenPrepare, DEBUG_TYPE,
"AMDGPU IR late optimizations", false, false)		"AMDGPU IR late optimizations", false, false)

char AMDGPULateCodeGenPrepare::ID = 0;		char AMDGPULateCodeGenPrepare::ID = 0;

FunctionPass *llvm::createAMDGPULateCodeGenPreparePass() {		FunctionPass *llvm::createAMDGPULateCodeGenPreparePass() {
return new AMDGPULateCodeGenPrepare();		return new AMDGPULateCodeGenPrepare();
}		}
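
The legalization in buildLegalLaneIntrinsic above comes down to treating a wider value as a sequence of 32-bit dwords, running the 32-bit lane operation once per dword, and reassembling the result. Narrower values are zero-extended to 32 bits first and truncated afterwards. The sketch below models only that arithmetic on the host, independent of whether the splitting is done in an IR pass or during instruction selection, which is the open question in the review. The 32-lane `Wave32` array and the helpers `readLane32`, `readLane64`, `readLane16`, `roundUpTo32` and `numDwords` are hypothetical and not part of LLVM.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of a 32-lane wavefront: one 32-bit value per lane.
using Wave32 = std::array<uint32_t, 32>;

// Width rounding used when padding odd sizes to whole dwords.
constexpr unsigned roundUpTo32(unsigned Bits) { return (Bits + 31u) & ~31u; }
constexpr unsigned numDwords(unsigned Bits) { return (Bits + 31u) / 32u; }

// Stand-in for the legal 32-bit readlane: the value held by lane `Lane`.
// The modulo only keeps the model in bounds.
uint32_t readLane32(const Wave32 &W, unsigned Lane) { return W[Lane % 32]; }

// 64-bit read: split each lane's value into low and high dwords, issue one
// 32-bit read per dword, then recombine.
uint64_t readLane64(const std::array<uint64_t, 32> &W, unsigned Lane) {
  Wave32 Lo{}, Hi{};
  for (unsigned I = 0; I < 32; ++I) {
    Lo[I] = static_cast<uint32_t>(W[I]);
    Hi[I] = static_cast<uint32_t>(W[I] >> 32);
  }
  uint64_t L = readLane32(Lo, Lane);
  uint64_t H = readLane32(Hi, Lane);
  return (H << 32) | L;
}

// 16-bit read: zero-extend to a dword, read, truncate back.
uint16_t readLane16(const std::array<uint16_t, 32> &W, unsigned Lane) {
  Wave32 Ext{};
  for (unsigned I = 0; I < 32; ++I)
    Ext[I] = W[I]; // implicit zero-extension of an unsigned value
  return static_cast<uint16_t>(readLane32(Ext, Lane));
}
```

For example, a 48-bit value would be padded to `roundUpTo32(48)` = 64 bits and handled as `numDwords(48)` = 2 dword reads in this model.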

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

Show First 20 Lines • Show All 1,111 Lines • ▼ Show 20 Lines	if (EnableMaxIlpSchedStrategy)
return createGCNMaxILPMachineScheduler(C);		return createGCNMaxILPMachineScheduler(C);

return createGCNMaxOccupancyMachineScheduler(C);		return createGCNMaxOccupancyMachineScheduler(C);
}		}

bool GCNPassConfig::addPreISel() {		bool GCNPassConfig::addPreISel() {
AMDGPUPassConfig::addPreISel();		AMDGPUPassConfig::addPreISel();

if (TM->getOptLevel() > CodeGenOpt::None)
addPass(createAMDGPULateCodeGenPreparePass());

if (isPassEnabled(EnableAtomicOptimizations, CodeGenOpt::Less)) {		if (isPassEnabled(EnableAtomicOptimizations, CodeGenOpt::Less)) {
addPass(createAMDGPUAtomicOptimizerPass());		addPass(createAMDGPUAtomicOptimizerPass());
}		}

if (TM->getOptLevel() > CodeGenOpt::None)		if (TM->getOptLevel() > CodeGenOpt::None)
		addPass(createAMDGPULateCodeGenPreparePass());

		if (TM->getOptLevel() > CodeGenOpt::None)
addPass(createSinkingPass());		addPass(createSinkingPass());

// Merge divergent exit nodes. StructurizeCFG won't recognize the multi-exit		// Merge divergent exit nodes. StructurizeCFG won't recognize the multi-exit
// regions formed by them.		// regions formed by them.
addPass(&AMDGPUUnifyDivergentExitNodesID);		addPass(&AMDGPUUnifyDivergentExitNodesID);
if (!LateCFGStructurize) {		if (!LateCFGStructurize) {
if (EnableStructurizerWorkarounds) {		if (EnableStructurizerWorkarounds) {
addPass(createFixIrreduciblePass());		addPass(createFixIrreduciblePass());
▲ Show 20 Lines • Show All 479 Lines • Show Last 20 Lines

llvm/lib/Target/AMDGPU/SIInstructions.td

Show First 20 Lines • Show All 2,044 Lines • ▼ Show 20 Lines
// rounding down a bit to avoid unwanted overflow.		// rounding down a bit to avoid unwanted overflow.
def : GCNPat <		def : GCNPat <
(AMDGPUurecip i32:$src0),		(AMDGPUurecip i32:$src0),
(V_CVT_U32_F32_e32		(V_CVT_U32_F32_e32
(V_MUL_F32_e32 (i32 CONST.FP_4294966784),		(V_MUL_F32_e32 (i32 CONST.FP_4294966784),
(V_RCP_IFLAG_F32_e32 (V_CVT_F32_U32_e32 $src0))))		(V_RCP_IFLAG_F32_e32 (V_CVT_F32_U32_e32 $src0))))
>;		>;

		// 64-bit *lane Intrinsics
		def : AMDGPUPat <
		(i64 (int_amdgcn_writelane i64:$src0, i32:$lane, i64:$inp)),
		(REG_SEQUENCE VReg_64,
		(V_WRITELANE_B32 (i32 (EXTRACT_SUBREG VReg_64:$src0, sub0)),
		$lane,
		(i32 (EXTRACT_SUBREG VReg_64:$inp, sub0))), sub0,
		(V_WRITELANE_B32 (i32 (EXTRACT_SUBREG VReg_64:$src0, sub1)),
		$lane,
		(i32 (EXTRACT_SUBREG VReg_64:$inp, sub1))), sub1)
		>;

		def : AMDGPUPat <
		(i64 (int_amdgcn_readlane i64:$src0, i32:$lane)),
		(REG_SEQUENCE VReg_64,
		(V_READLANE_B32 (i32 (EXTRACT_SUBREG VReg_64:$src0, sub0)),
		$lane), sub0,
		(V_READLANE_B32 (i32 (EXTRACT_SUBREG VReg_64:$src0, sub1)),
		$lane), sub1)
		>;

		def : AMDGPUPat <
		(i64 (int_amdgcn_readfirstlane i64:$src0)),
		(REG_SEQUENCE VReg_64,
		(V_READFIRSTLANE_B32 (i32 (EXTRACT_SUBREG VReg_64:$src0, sub0))), sub0,
		(V_READFIRSTLANE_B32 (i32 (EXTRACT_SUBREG VReg_64:$src0, sub1))), sub1)
		>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// VOP3 Patterns		// VOP3 Patterns
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

def : IMad24Pat<V_MAD_I32_I24_e64, 1>;		def : IMad24Pat<V_MAD_I32_I24_e64, 1>;
def : UMad24Pat<V_MAD_U32_U24_e64, 1>;		def : UMad24Pat<V_MAD_U32_U24_e64, 1>;

// BFI patterns		// BFI patterns
▲ Show 20 Lines • Show All 1,079 Lines • ▼ Show 20 Lines	def : GCNPat<
let SubtargetPredicate = NotHasAddNoCarryInsts;		let SubtargetPredicate = NotHasAddNoCarryInsts;
}		}


// Avoid pointlessly materializing a constant in VGPR.		// Avoid pointlessly materializing a constant in VGPR.
// FIXME: Should also do this for readlane, but tablegen crashes on		// FIXME: Should also do this for readlane, but tablegen crashes on
// the ignored src1.		// the ignored src1.
def : GCNPat<		def : GCNPat<
(int_amdgcn_readfirstlane (i32 imm:$src)),		(i32 (int_amdgcn_readfirstlane (i32 imm:$src))),
(S_MOV_B32 SReg_32:$src)		(S_MOV_B32 SReg_32:$src)
>;		>;

multiclass BFMPatterns <ValueType vt, PatFrag SHL, PatFrag ADD, InstSI BFM> {		multiclass BFMPatterns <ValueType vt, PatFrag SHL, PatFrag ADD, InstSI BFM> {
def : GCNPat <		def : GCNPat <
(vt (SHL (vt (add (vt (shl 1, vt:$a)), -1)), vt:$b)),		(vt (SHL (vt (add (vt (shl 1, vt:$a)), -1)), vt:$b)),
(BFM $a, $b)		(BFM $a, $b)
>;		>;
▲ Show 20 Lines • Show All 501 Lines • Show Last 20 Lines

llvm/lib/Target/AMDGPU/VOP3Instructions.td

	Show First 20 Lines • Show All 675 Lines • ▼ Show 20 Lines
	def opsel_i1timm : SDNodeXForm<timm, [{			def opsel_i1timm : SDNodeXForm<timm, [{
	return CurDAG->getTargetConstant(N->getZExtValue() ? SISrcMods::OP_SEL_0 : 0, SDLoc(N), MVT::i32);			return CurDAG->getTargetConstant(N->getZExtValue() ? SISrcMods::OP_SEL_0 : 0, SDLoc(N), MVT::i32);
	}]>;			}]>;
	def gi_opsel_i1timm : GICustomOperandRenderer<"renderOpSelTImm">,			def gi_opsel_i1timm : GICustomOperandRenderer<"renderOpSelTImm">,
	GISDNodeXFormEquiv<opsel_i1timm>;			GISDNodeXFormEquiv<opsel_i1timm>;

	class PermlanePat<SDPatternOperator permlane,			class PermlanePat<SDPatternOperator permlane,
	Instruction inst> : GCNPat<			Instruction inst> : GCNPat<
	(permlane i32:$vdst_in, i32:$src0, i32:$src1, i32:$src2,			(i32 (permlane i32:$vdst_in, i32:$src0, i32:$src1, i32:$src2,
	timm:$fi, timm:$bc),			timm:$fi, timm:$bc)),
	(inst (opsel_i1timm $fi), VGPR_32:$src0, (opsel_i1timm $bc),			(inst (opsel_i1timm $fi), VGPR_32:$src0, (opsel_i1timm $bc),
	SCSrc_b32:$src1, 0, SCSrc_b32:$src2, VGPR_32:$vdst_in)			SCSrc_b32:$src1, 0, SCSrc_b32:$src2, VGPR_32:$vdst_in)
	>;			>;

				class Permlane64Pat<SDPatternOperator permlane,
				Instruction inst> : GCNPat<
				(i64 (permlane i64:$vdst_in, i64:$src0, i32:$src1, i32:$src2,
				timm:$fi, timm:$bc)),
				(REG_SEQUENCE VReg_64,
				(inst (opsel_i1timm $fi), (i32 (EXTRACT_SUBREG VReg_64:$src0, sub0)), (opsel_i1timm $bc),
				SCSrc_b32:$src1, 0, SCSrc_b32:$src2, (i32 (EXTRACT_SUBREG VReg_64:$vdst_in, sub0))), sub0,
				(inst (opsel_i1timm $fi), (i32 (EXTRACT_SUBREG VReg_64:$src0, sub1)), (opsel_i1timm $bc),
				SCSrc_b32:$src1, 0, SCSrc_b32:$src2, (i32 (EXTRACT_SUBREG VReg_64:$vdst_in, sub1))), sub1)
				>;

	let SubtargetPredicate = isGFX10Plus in {			let SubtargetPredicate = isGFX10Plus in {
	let isCommutable = 1, isReMaterializable = 1 in {			let isCommutable = 1, isReMaterializable = 1 in {
	defm V_XOR3_B32 : VOP3Inst <"v_xor3_b32", VOP3_Profile<VOP_I32_I32_I32_I32>>;			defm V_XOR3_B32 : VOP3Inst <"v_xor3_b32", VOP3_Profile<VOP_I32_I32_I32_I32>>;
	} // End isCommutable = 1, isReMaterializable = 1			} // End isCommutable = 1, isReMaterializable = 1
	def : ThreeOp_i32_Pats<xor, xor, V_XOR3_B32_e64>;			def : ThreeOp_i32_Pats<xor, xor, V_XOR3_B32_e64>;

	let Constraints = "$vdst = $vdst_in", DisableEncoding="$vdst_in" in {			let Constraints = "$vdst = $vdst_in", DisableEncoding="$vdst_in" in {
	defm V_PERMLANE16_B32 : VOP3Inst<"v_permlane16_b32", VOP3_PERMLANE_Profile>;			defm V_PERMLANE16_B32 : VOP3Inst<"v_permlane16_b32", VOP3_PERMLANE_Profile>;
	defm V_PERMLANEX16_B32 : VOP3Inst<"v_permlanex16_b32", VOP3_PERMLANE_Profile>;			defm V_PERMLANEX16_B32 : VOP3Inst<"v_permlanex16_b32", VOP3_PERMLANE_Profile>;
	} // End $vdst = $vdst_in, DisableEncoding $vdst_in			} // End $vdst = $vdst_in, DisableEncoding $vdst_in

	def : PermlanePat<int_amdgcn_permlane16, V_PERMLANE16_B32_e64>;			def : PermlanePat<int_amdgcn_permlane16, V_PERMLANE16_B32_e64>;
	def : PermlanePat<int_amdgcn_permlanex16, V_PERMLANEX16_B32_e64>;			def : PermlanePat<int_amdgcn_permlanex16, V_PERMLANEX16_B32_e64>;

				def : Permlane64Pat<int_amdgcn_permlane16, V_PERMLANE16_B32_e64>;
				def : Permlane64Pat<int_amdgcn_permlanex16, V_PERMLANEX16_B32_e64>;

	defm V_ADD_NC_U16 : VOP3Inst <"v_add_nc_u16", VOP3_Profile<VOP_I16_I16_I16, VOP3_OPSEL>, add>;			defm V_ADD_NC_U16 : VOP3Inst <"v_add_nc_u16", VOP3_Profile<VOP_I16_I16_I16, VOP3_OPSEL>, add>;
	defm V_SUB_NC_U16 : VOP3Inst <"v_sub_nc_u16", VOP3_Profile<VOP_I16_I16_I16, VOP3_OPSEL>, sub>;			defm V_SUB_NC_U16 : VOP3Inst <"v_sub_nc_u16", VOP3_Profile<VOP_I16_I16_I16, VOP3_OPSEL>, sub>;

	def : OpSelBinOpClampPat<uaddsat, V_ADD_NC_U16_e64>;			def : OpSelBinOpClampPat<uaddsat, V_ADD_NC_U16_e64>;
	def : OpSelBinOpClampPat<usubsat, V_SUB_NC_U16_e64>;			def : OpSelBinOpClampPat<usubsat, V_SUB_NC_U16_e64>;

	// Undo sub x, c -> add x, -c canonicalization since c is more likely			// Undo sub x, c -> add x, -c canonicalization since c is more likely
	// an inline immediate than -c.			// an inline immediate than -c.
	▲ Show 20 Lines • Show All 696 Lines • Show Last 20 Lines

llvm/test/Analysis/UniformityAnalysis/AMDGPU/intrinsics.ll

	; RUN: opt -mtriple amdgcn-- -passes='print<uniformity>' -disable-output %s 2>&1 | FileCheck %s			; RUN: opt -mtriple amdgcn-- -passes='print<uniformity>' -disable-output %s 2>&1 | FileCheck %s

	; CHECK: DIVERGENT: %swizzle = call i32 @llvm.amdgcn.ds.swizzle(i32 %src, i32 100) #0			; CHECK: DIVERGENT: %swizzle = call i32 @llvm.amdgcn.ds.swizzle(i32 %src, i32 100) #0
	define amdgpu_kernel void @ds_swizzle(ptr addrspace(1) %out, i32 %src) #0 {			define amdgpu_kernel void @ds_swizzle(ptr addrspace(1) %out, i32 %src) #0 {
	%swizzle = call i32 @llvm.amdgcn.ds.swizzle(i32 %src, i32 100) #0			%swizzle = call i32 @llvm.amdgcn.ds.swizzle(i32 %src, i32 100) #0
	store i32 %swizzle, ptr addrspace(1) %out, align 4			store i32 %swizzle, ptr addrspace(1) %out, align 4
	ret void			ret void
	}			}

	; CHECK: DIVERGENT: %v = call i32 @llvm.amdgcn.permlane16(i32 %src0, i32 %src0, i32 %src1, i32 %src2, i1 false, i1 false) #0			; CHECK: DIVERGENT: %v = call i32 @llvm.amdgcn.permlane16.i32(i32 %src0, i32 %src0, i32 %src1, i32 %src2, i1 false, i1 false) #0
	define amdgpu_kernel void @v_permlane16_b32(ptr addrspace(1) %out, i32 %src0, i32 %src1, i32 %src2) #0 {			define amdgpu_kernel void @v_permlane16_b32(ptr addrspace(1) %out, i32 %src0, i32 %src1, i32 %src2) #0 {
	%v = call i32 @llvm.amdgcn.permlane16(i32 %src0, i32 %src0, i32 %src1, i32 %src2, i1 false, i1 false) #0			%v = call i32 @llvm.amdgcn.permlane16(i32 %src0, i32 %src0, i32 %src1, i32 %src2, i1 false, i1 false) #0
	store i32 %v, ptr addrspace(1) %out			store i32 %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	; CHECK: DIVERGENT: %v = call i32 @llvm.amdgcn.permlanex16(i32 %src0, i32 %src0, i32 %src1, i32 %src2, i1 false, i1 false) #0			; CHECK: DIVERGENT: %v = call i32 @llvm.amdgcn.permlanex16.i32(i32 %src0, i32 %src0, i32 %src1, i32 %src2, i1 false, i1 false) #0
	define amdgpu_kernel void @v_permlanex16_b32(ptr addrspace(1) %out, i32 %src0, i32 %src1, i32 %src2) #0 {			define amdgpu_kernel void @v_permlanex16_b32(ptr addrspace(1) %out, i32 %src0, i32 %src1, i32 %src2) #0 {
	%v = call i32 @llvm.amdgcn.permlanex16(i32 %src0, i32 %src0, i32 %src1, i32 %src2, i1 false, i1 false) #0			%v = call i32 @llvm.amdgcn.permlanex16(i32 %src0, i32 %src0, i32 %src1, i32 %src2, i1 false, i1 false) #0
	store i32 %v, ptr addrspace(1) %out			store i32 %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	; CHECK: DIVERGENT: %tmp0 = call i32 @llvm.amdgcn.update.dpp.i32(i32 %in1, i32 %in2, i32 1, i32 1, i32 1, i1 false) #0			; CHECK: DIVERGENT: %tmp0 = call i32 @llvm.amdgcn.update.dpp.i32(i32 %in1, i32 %in2, i32 1, i32 1, i32 1, i1 false) #0
	define amdgpu_kernel void @update_dpp(ptr addrspace(1) %out, i32 %in1, i32 %in2) #0 {			define amdgpu_kernel void @update_dpp(ptr addrspace(1) %out, i32 %in1, i32 %in2) #0 {
	(11 lines not shown)

	; CHECK: DIVERGENT: %tmp0 = call i32 @llvm.amdgcn.mov.dpp8.i32(i32 %in, i32 1) #0			; CHECK: DIVERGENT: %tmp0 = call i32 @llvm.amdgcn.mov.dpp8.i32(i32 %in, i32 1) #0
	define amdgpu_kernel void @mov_dpp8(ptr addrspace(1) %out, i32 %in) #0 {			define amdgpu_kernel void @mov_dpp8(ptr addrspace(1) %out, i32 %in) #0 {
	%tmp0 = call i32 @llvm.amdgcn.mov.dpp8.i32(i32 %in, i32 1) #0			%tmp0 = call i32 @llvm.amdgcn.mov.dpp8.i32(i32 %in, i32 1) #0
	store i32 %tmp0, ptr addrspace(1) %out			store i32 %tmp0, ptr addrspace(1) %out
	ret void			ret void
	}			}

	; CHECK: DIVERGENT: %tmp0 = call i32 @llvm.amdgcn.writelane(i32 0, i32 1, i32 2)			; CHECK: DIVERGENT: %tmp0 = call i32 @llvm.amdgcn.writelane.i32(i32 0, i32 1, i32 2)
	define amdgpu_kernel void @writelane(ptr addrspace(1) %out) #0 {			define amdgpu_kernel void @writelane(ptr addrspace(1) %out) #0 {
	%tmp0 = call i32 @llvm.amdgcn.writelane(i32 0, i32 1, i32 2)			%tmp0 = call i32 @llvm.amdgcn.writelane(i32 0, i32 1, i32 2)
	store i32 %tmp0, ptr addrspace(1) %out			store i32 %tmp0, ptr addrspace(1) %out
	ret void			ret void
	}			}

	; CHECK: DIVERGENT: %tmp0 = call <8 x float> @llvm.amdgcn.wmma.f32.16x16x16.f16.v8f32(<16 x half> %A, <16 x half> %B, <8 x float> %C)			; CHECK: DIVERGENT: %tmp0 = call <8 x float> @llvm.amdgcn.wmma.f32.16x16x16.f16.v8f32(<16 x half> %A, <16 x half> %B, <8 x float> %C)
	define amdgpu_kernel void @wmma_f32_16x16x16_f16(<16 x half> %A, <16 x half> %B, <8 x float> %C, ptr addrspace(1) %out) {			define amdgpu_kernel void @wmma_f32_16x16x16_f16(<16 x half> %A, <16 x half> %B, <8 x float> %C, ptr addrspace(1) %out) {
	(60 lines not shown)

llvm/test/Assembler/autoupgrade-amdgpu-intrinsics.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
				; RUN: opt -S < %s | FileCheck %s

				declare i32 @llvm.amdgcn.permlane16(i32, i32, i32, i32, i1, i1)
				declare i32 @llvm.amdgcn.permlanex16(i32, i32, i32, i32, i1, i1)
				declare i32 @llvm.amdgcn.readlane(i32, i32)
				declare i32 @llvm.amdgcn.readfirstlane(i32)
				declare i32 @llvm.amdgcn.writelane(i32, i32, i32)

				define void @test_permlanex6(ptr addrspace(1) %out, i32 %src0, i32 %src1, i32 %src2) #1 {
				; CHECK-LABEL: define void @test_permlanex6
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[SRC0:%.*]], i32 [[SRC1:%.*]], i32 [[SRC2:%.*]]) {
				; CHECK-NEXT: [[V:%.*]] = call i32 @llvm.amdgcn.permlane16.i32(i32 [[SRC0]], i32 [[SRC0]], i32 [[SRC1]], i32 [[SRC2]], i1 true, i1 false)
				; CHECK-NEXT: store i32 [[V]], ptr addrspace(1) [[OUT]], align 4
				; CHECK-NEXT: ret void
				;
				%v = call i32 @llvm.amdgcn.permlane16(i32 %src0, i32 %src0, i32 %src1, i32 %src2, i1 true, i1 false)
				store i32 %v, ptr addrspace(1) %out, align 4
				ret void
				}

				define void @test_permlanex16(ptr addrspace(1) %out, i32 %src0, i32 %src1, i32 %src2) #1 {
				; CHECK-LABEL: define void @test_permlanex16
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[SRC0:%.*]], i32 [[SRC1:%.*]], i32 [[SRC2:%.*]]) {
				; CHECK-NEXT: [[V:%.*]] = call i32 @llvm.amdgcn.permlanex16.i32(i32 [[SRC0]], i32 [[SRC0]], i32 [[SRC1]], i32 [[SRC2]], i1 true, i1 false)
				; CHECK-NEXT: store i32 [[V]], ptr addrspace(1) [[OUT]], align 4
				; CHECK-NEXT: ret void
				;
				%v = call i32 @llvm.amdgcn.permlanex16(i32 %src0, i32 %src0, i32 %src1, i32 %src2, i1 true, i1 false)
				store i32 %v, ptr addrspace(1) %out, align 4
				ret void
				}

				define void @test_readlane(ptr addrspace(1) %out, i32 %src) {
				; CHECK-LABEL: define void @test_readlane
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[SRC:%.*]]) {
				; CHECK-NEXT: [[READLANE:%.*]] = call i32 @llvm.amdgcn.readlane.i32(i32 [[SRC]], i32 15)
				; CHECK-NEXT: store i32 [[READLANE]], ptr addrspace(1) [[OUT]], align 2
				; CHECK-NEXT: ret void
				;
				%readlane = call i32 @llvm.amdgcn.readlane(i32 %src, i32 15)
				store i32 %readlane, ptr addrspace(1) %out, align 2
				ret void
				}

				define void @test_readfirstlane(ptr addrspace(1) %out, i32 %src) {
				; CHECK-LABEL: define void @test_readfirstlane
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[SRC:%.*]]) {
				; CHECK-NEXT: [[READFIRSTLANE:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[SRC]])
				; CHECK-NEXT: store i32 [[READFIRSTLANE]], ptr addrspace(1) [[OUT]], align 2
				; CHECK-NEXT: ret void
				;
				%readfirstlane = call i32 @llvm.amdgcn.readfirstlane(i32 %src)
				store i32 %readfirstlane, ptr addrspace(1) %out, align 2
				ret void
				}

				define void @test_writelane(ptr addrspace(1) %out, i32 %src) {
				; CHECK-LABEL: define void @test_writelane
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[SRC:%.*]]) {
				; CHECK-NEXT: [[WRITELANE:%.*]] = call i32 @llvm.amdgcn.writelane.i32(i32 1234, i32 15, i32 [[SRC]])
				; CHECK-NEXT: store i32 [[WRITELANE]], ptr addrspace(1) [[OUT]], align 2
				; CHECK-NEXT: ret void
				;
				%writelane = call i32 @llvm.amdgcn.writelane(i32 1234, i32 15, i32 %src)
				store i32 %writelane, ptr addrspace(1) %out, align 2
				ret void
				}
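
For readers skimming the diff: the autoupgrade test above expects each of the five legacy, unmangled lane-intrinsic declarations to come out of `opt` with an `.i32` suffix. A minimal C++ sketch of that name mapping follows. The helper `upgradeLaneIntrinsicName` is hypothetical and is not taken from the patch, whose actual upgrade logic is not shown in this part of the diff.

```cpp
#include <array>
#include <string>
#include <string_view>

// Hypothetical helper, not part of the patch: returns the mangled name that
// the CHECK lines in the autoupgrade test expect for a legacy unmangled lane
// intrinsic, or the input unchanged if it is not one of the five legacy names.
std::string upgradeLaneIntrinsicName(std::string_view Name) {
  static constexpr std::array<std::string_view, 5> Legacy = {
      "llvm.amdgcn.permlane16", "llvm.amdgcn.permlanex16",
      "llvm.amdgcn.readlane", "llvm.amdgcn.readfirstlane",
      "llvm.amdgcn.writelane"};
  for (std::string_view Old : Legacy)
    if (Name == Old)
      return std::string(Name) + ".i32"; // legacy form was always i32
  return std::string(Name);              // already mangled or unrelated
}
```

Names that already carry a type suffix, and unrelated intrinsics such as `llvm.amdgcn.ds.swizzle`, pass through unchanged in this sketch.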

llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_optimizations_mul_one.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: opt -S -mtriple=amdgcn-- -amdgpu-atomic-optimizer -verify-machineinstrs %s | FileCheck -check-prefix=IR %s			; RUN: opt -S -mtriple=amdgcn-- -amdgpu-atomic-optimizer -verify-machineinstrs %s | FileCheck -check-prefix=IR %s
	; RUN: llc -global-isel -mtriple=amdgcn-- -amdgpu-atomic-optimizations -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s			; RUN: llc -global-isel -mtriple=amdgcn-- -amdgpu-atomic-optimizations -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

	declare i32 @llvm.amdgcn.struct.buffer.atomic.add.i32(i32, <4 x i32>, i32, i32, i32, i32 immarg)			declare i32 @llvm.amdgcn.struct.buffer.atomic.add.i32(i32, <4 x i32>, i32, i32, i32, i32 immarg)
	declare i32 @llvm.amdgcn.struct.buffer.atomic.sub.i32(i32, <4 x i32>, i32, i32, i32, i32 immarg)			declare i32 @llvm.amdgcn.struct.buffer.atomic.sub.i32(i32, <4 x i32>, i32, i32, i32, i32 immarg)
	declare i32 @llvm.amdgcn.struct.buffer.atomic.xor.i32(i32, <4 x i32>, i32, i32, i32, i32 immarg)			declare i32 @llvm.amdgcn.struct.buffer.atomic.xor.i32(i32, <4 x i32>, i32, i32, i32, i32 immarg)
	declare void @llvm.amdgcn.struct.buffer.store.format.v4i32(<4 x i32>, <4 x i32>, i32, i32, i32, i32 immarg)			declare void @llvm.amdgcn.struct.buffer.store.format.v4i32(<4 x i32>, <4 x i32>, i32, i32, i32, i32 immarg)

	define amdgpu_cs void @atomic_add(<4 x i32> inreg %arg) {			define amdgpu_cs void @atomic_add(<4 x i32> inreg %arg) {
	; IR-LABEL: @atomic_add(			; IR-LABEL: define amdgpu_cs void @atomic_add
				; IR-SAME: (<4 x i32> inreg [[ARG:%.*]]) {
	; IR-NEXT: .entry:			; IR-NEXT: .entry:
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])			; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])
	; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32			; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32
	; IR-NEXT: [[TMP8:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP8:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP8]], label [[TMP9:%.*]], label [[TMP11:%.*]]			; IR-NEXT: br i1 [[TMP8]], label [[TMP9:%.*]], label [[TMP11:%.*]]
	; IR: 9:			; IR: 9:
	; IR-NEXT: [[TMP10:%.*]] = call i32 @llvm.amdgcn.struct.buffer.atomic.add.i32(i32 [[TMP7]], <4 x i32> [[ARG:%.*]], i32 0, i32 0, i32 0, i32 0)			; IR-NEXT: [[TMP10:%.*]] = call i32 @llvm.amdgcn.struct.buffer.atomic.add.i32(i32 [[TMP7]], <4 x i32> [[ARG]], i32 0, i32 0, i32 0, i32 0)
	; IR-NEXT: br label [[TMP11]]			; IR-NEXT: br label [[TMP11]]
	; IR: 11:			; IR: 11:
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	; GCN-LABEL: atomic_add:			; GCN-LABEL: atomic_add:
	; GCN: ; %bb.0: ; %.entry			; GCN: ; %bb.0: ; %.entry
	; GCN-NEXT: s_mov_b64 s[4:5], exec			; GCN-NEXT: s_mov_b64 s[4:5], exec
	; GCN-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s4, 0			; GCN-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s4, 0
	(9 lines not shown)
	; GCN-NEXT: .LBB0_2:			; GCN-NEXT: .LBB0_2:
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	.entry:			.entry:
	call i32 @llvm.amdgcn.struct.buffer.atomic.add.i32(i32 1, <4 x i32> %arg, i32 0, i32 0, i32 0, i32 0)			call i32 @llvm.amdgcn.struct.buffer.atomic.add.i32(i32 1, <4 x i32> %arg, i32 0, i32 0, i32 0, i32 0)
	ret void			ret void
	}			}

	define amdgpu_cs void @atomic_add_and_format(<4 x i32> inreg %arg) {			define amdgpu_cs void @atomic_add_and_format(<4 x i32> inreg %arg) {
	; IR-LABEL: @atomic_add_and_format(			; IR-LABEL: define amdgpu_cs void @atomic_add_and_format
				; IR-SAME: (<4 x i32> inreg [[ARG:%.*]]) {
	; IR-NEXT: .entry:			; IR-NEXT: .entry:
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])			; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])
	; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32			; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32
	; IR-NEXT: [[TMP8:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP8:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP8]], label [[TMP9:%.*]], label [[TMP11:%.*]]			; IR-NEXT: br i1 [[TMP8]], label [[TMP9:%.*]], label [[TMP11:%.*]]
	; IR: 9:			; IR: 9:
	; IR-NEXT: [[TMP10:%.*]] = call i32 @llvm.amdgcn.struct.buffer.atomic.add.i32(i32 [[TMP7]], <4 x i32> [[ARG:%.*]], i32 0, i32 0, i32 0, i32 0)			; IR-NEXT: [[TMP10:%.*]] = call i32 @llvm.amdgcn.struct.buffer.atomic.add.i32(i32 [[TMP7]], <4 x i32> [[ARG]], i32 0, i32 0, i32 0, i32 0)
	; IR-NEXT: br label [[TMP11]]			; IR-NEXT: br label [[TMP11]]
	; IR: 11:			; IR: 11:
	; IR-NEXT: [[TMP12:%.*]] = phi i32 [ poison, [[DOTENTRY:%.*]] ], [ [[TMP10]], [[TMP9]] ]			; IR-NEXT: [[TMP12:%.*]] = phi i32 [ poison, [[DOTENTRY:%.*]] ], [ [[TMP10]], [[TMP9]] ]
	; IR-NEXT: [[TMP13:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[TMP12]])			; IR-NEXT: [[TMP13:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[TMP12]])
	; IR-NEXT: [[TMP14:%.*]] = add i32 [[TMP13]], [[TMP5]]			; IR-NEXT: [[TMP14:%.*]] = add i32 [[TMP13]], [[TMP5]]
	; IR-NEXT: call void @llvm.amdgcn.struct.buffer.store.format.v4i32(<4 x i32> [[ARG]], <4 x i32> [[ARG]], i32 [[TMP14]], i32 0, i32 0, i32 0)			; IR-NEXT: call void @llvm.amdgcn.struct.buffer.store.format.v4i32(<4 x i32> [[ARG]], <4 x i32> [[ARG]], i32 [[TMP14]], i32 0, i32 0, i32 0)
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	; GCN-LABEL: atomic_add_and_format:			; GCN-LABEL: atomic_add_and_format:
	; GCN: ; %bb.0: ; %.entry			; GCN: ; %bb.0: ; %.entry
	; GCN-NEXT: s_mov_b64 s[6:7], exec			; GCN-NEXT: s_mov_b64 s[6:7], exec
	; GCN-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s6, 0			; GCN-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s6, 0
	(21 lines not shown)
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	.entry:			.entry:
	%a = call i32 @llvm.amdgcn.struct.buffer.atomic.add.i32(i32 1, <4 x i32> %arg, i32 0, i32 0, i32 0, i32 0)			%a = call i32 @llvm.amdgcn.struct.buffer.atomic.add.i32(i32 1, <4 x i32> %arg, i32 0, i32 0, i32 0, i32 0)
	call void @llvm.amdgcn.struct.buffer.store.format.v4i32(<4 x i32> %arg, <4 x i32> %arg, i32 %a, i32 0, i32 0, i32 0)			call void @llvm.amdgcn.struct.buffer.store.format.v4i32(<4 x i32> %arg, <4 x i32> %arg, i32 %a, i32 0, i32 0, i32 0)
	ret void			ret void
	}			}

	define amdgpu_cs void @atomic_sub(<4 x i32> inreg %arg) {			define amdgpu_cs void @atomic_sub(<4 x i32> inreg %arg) {
	; IR-LABEL: @atomic_sub(			; IR-LABEL: define amdgpu_cs void @atomic_sub
				; IR-SAME: (<4 x i32> inreg [[ARG:%.*]]) {
	; IR-NEXT: .entry:			; IR-NEXT: .entry:
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])			; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])
	; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32			; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32
	; IR-NEXT: [[TMP8:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP8:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP8]], label [[TMP9:%.*]], label [[TMP11:%.*]]			; IR-NEXT: br i1 [[TMP8]], label [[TMP9:%.*]], label [[TMP11:%.*]]
	; IR: 9:			; IR: 9:
	; IR-NEXT: [[TMP10:%.*]] = call i32 @llvm.amdgcn.struct.buffer.atomic.sub.i32(i32 [[TMP7]], <4 x i32> [[ARG:%.*]], i32 0, i32 0, i32 0, i32 0)			; IR-NEXT: [[TMP10:%.*]] = call i32 @llvm.amdgcn.struct.buffer.atomic.sub.i32(i32 [[TMP7]], <4 x i32> [[ARG]], i32 0, i32 0, i32 0, i32 0)
	; IR-NEXT: br label [[TMP11]]			; IR-NEXT: br label [[TMP11]]
	; IR: 11:			; IR: 11:
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	; GCN-LABEL: atomic_sub:			; GCN-LABEL: atomic_sub:
	; GCN: ; %bb.0: ; %.entry			; GCN: ; %bb.0: ; %.entry
	; GCN-NEXT: s_mov_b64 s[4:5], exec			; GCN-NEXT: s_mov_b64 s[4:5], exec
	; GCN-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s4, 0			; GCN-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s4, 0
	(9 lines not shown)
	; GCN-NEXT: .LBB2_2:			; GCN-NEXT: .LBB2_2:
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	.entry:			.entry:
	call i32 @llvm.amdgcn.struct.buffer.atomic.sub.i32(i32 1, <4 x i32> %arg, i32 0, i32 0, i32 0, i32 0)			call i32 @llvm.amdgcn.struct.buffer.atomic.sub.i32(i32 1, <4 x i32> %arg, i32 0, i32 0, i32 0, i32 0)
	ret void			ret void
	}			}

	define amdgpu_cs void @atomic_sub_and_format(<4 x i32> inreg %arg) {			define amdgpu_cs void @atomic_sub_and_format(<4 x i32> inreg %arg) {
	; IR-LABEL: @atomic_sub_and_format(			; IR-LABEL: define amdgpu_cs void @atomic_sub_and_format
				; IR-SAME: (<4 x i32> inreg [[ARG:%.*]]) {
	; IR-NEXT: .entry:			; IR-NEXT: .entry:
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])			; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])
	; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32			; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32
	; IR-NEXT: [[TMP8:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP8:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP8]], label [[TMP9:%.*]], label [[TMP11:%.*]]			; IR-NEXT: br i1 [[TMP8]], label [[TMP9:%.*]], label [[TMP11:%.*]]
	; IR: 9:			; IR: 9:
	; IR-NEXT: [[TMP10:%.*]] = call i32 @llvm.amdgcn.struct.buffer.atomic.sub.i32(i32 [[TMP7]], <4 x i32> [[ARG:%.*]], i32 0, i32 0, i32 0, i32 0)			; IR-NEXT: [[TMP10:%.*]] = call i32 @llvm.amdgcn.struct.buffer.atomic.sub.i32(i32 [[TMP7]], <4 x i32> [[ARG]], i32 0, i32 0, i32 0, i32 0)
	; IR-NEXT: br label [[TMP11]]			; IR-NEXT: br label [[TMP11]]
	; IR: 11:			; IR: 11:
	; IR-NEXT: [[TMP12:%.*]] = phi i32 [ poison, [[DOTENTRY:%.*]] ], [ [[TMP10]], [[TMP9]] ]			; IR-NEXT: [[TMP12:%.*]] = phi i32 [ poison, [[DOTENTRY:%.*]] ], [ [[TMP10]], [[TMP9]] ]
	; IR-NEXT: [[TMP13:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[TMP12]])			; IR-NEXT: [[TMP13:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[TMP12]])
	; IR-NEXT: [[TMP14:%.*]] = sub i32 [[TMP13]], [[TMP5]]			; IR-NEXT: [[TMP14:%.*]] = sub i32 [[TMP13]], [[TMP5]]
	; IR-NEXT: call void @llvm.amdgcn.struct.buffer.store.format.v4i32(<4 x i32> [[ARG]], <4 x i32> [[ARG]], i32 [[TMP14]], i32 0, i32 0, i32 0)			; IR-NEXT: call void @llvm.amdgcn.struct.buffer.store.format.v4i32(<4 x i32> [[ARG]], <4 x i32> [[ARG]], i32 [[TMP14]], i32 0, i32 0, i32 0)
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	; GCN-LABEL: atomic_sub_and_format:			; GCN-LABEL: atomic_sub_and_format:
	; GCN: ; %bb.0: ; %.entry			; GCN: ; %bb.0: ; %.entry
	; GCN-NEXT: s_mov_b64 s[6:7], exec			; GCN-NEXT: s_mov_b64 s[6:7], exec
	; GCN-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s6, 0			; GCN-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s6, 0
	(21 lines not shown)
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	.entry:			.entry:
	%a = call i32 @llvm.amdgcn.struct.buffer.atomic.sub.i32(i32 1, <4 x i32> %arg, i32 0, i32 0, i32 0, i32 0)			%a = call i32 @llvm.amdgcn.struct.buffer.atomic.sub.i32(i32 1, <4 x i32> %arg, i32 0, i32 0, i32 0, i32 0)
	call void @llvm.amdgcn.struct.buffer.store.format.v4i32(<4 x i32> %arg, <4 x i32> %arg, i32 %a, i32 0, i32 0, i32 0)			call void @llvm.amdgcn.struct.buffer.store.format.v4i32(<4 x i32> %arg, <4 x i32> %arg, i32 %a, i32 0, i32 0, i32 0)
	ret void			ret void
	}			}

	define amdgpu_cs void @atomic_xor(<4 x i32> inreg %arg) {			define amdgpu_cs void @atomic_xor(<4 x i32> inreg %arg) {
	; IR-LABEL: @atomic_xor(			; IR-LABEL: define amdgpu_cs void @atomic_xor
				; IR-SAME: (<4 x i32> inreg [[ARG:%.*]]) {
	; IR-NEXT: .entry:			; IR-NEXT: .entry:
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])			; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])
	; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32			; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32
	; IR-NEXT: [[TMP8:%.*]] = and i32 [[TMP7]], 1			; IR-NEXT: [[TMP8:%.*]] = and i32 [[TMP7]], 1
	; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]			; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]
	; IR: 10:			; IR: 10:
	; IR-NEXT: [[TMP11:%.*]] = call i32 @llvm.amdgcn.struct.buffer.atomic.xor.i32(i32 [[TMP8]], <4 x i32> [[ARG:%.*]], i32 0, i32 0, i32 0, i32 0)			; IR-NEXT: [[TMP11:%.*]] = call i32 @llvm.amdgcn.struct.buffer.atomic.xor.i32(i32 [[TMP8]], <4 x i32> [[ARG]], i32 0, i32 0, i32 0, i32 0)
	; IR-NEXT: br label [[TMP12]]			; IR-NEXT: br label [[TMP12]]
	; IR: 12:			; IR: 12:
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	; GCN-LABEL: atomic_xor:			; GCN-LABEL: atomic_xor:
	; GCN: ; %bb.0: ; %.entry			; GCN: ; %bb.0: ; %.entry
	; GCN-NEXT: s_mov_b64 s[4:5], exec			; GCN-NEXT: s_mov_b64 s[4:5], exec
	; GCN-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s4, 0			; GCN-NEXT: v_mbcnt_lo_u32_b32_e64 v0, s4, 0
	(10 lines not shown)
	; GCN-NEXT: .LBB4_2:			; GCN-NEXT: .LBB4_2:
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm
	.entry:			.entry:
	call i32 @llvm.amdgcn.struct.buffer.atomic.xor.i32(i32 1, <4 x i32> %arg, i32 0, i32 0, i32 0, i32 0)			call i32 @llvm.amdgcn.struct.buffer.atomic.xor.i32(i32 1, <4 x i32> %arg, i32 0, i32 0, i32 0, i32 0)
	ret void			ret void
	}			}

	define amdgpu_cs void @atomic_xor_and_format(<4 x i32> inreg %arg) {			define amdgpu_cs void @atomic_xor_and_format(<4 x i32> inreg %arg) {
	; IR-LABEL: @atomic_xor_and_format(			; IR-LABEL: define amdgpu_cs void @atomic_xor_and_format
				; IR-SAME: (<4 x i32> inreg [[ARG:%.*]]) {
	; IR-NEXT: .entry:			; IR-NEXT: .entry:
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])			; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])
	; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32			; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32
	; IR-NEXT: [[TMP8:%.*]] = and i32 [[TMP7]], 1			; IR-NEXT: [[TMP8:%.*]] = and i32 [[TMP7]], 1
	; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]			; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]
	; IR: 10:			; IR: 10:
	; IR-NEXT: [[TMP11:%.*]] = call i32 @llvm.amdgcn.struct.buffer.atomic.xor.i32(i32 [[TMP8]], <4 x i32> [[ARG:%.*]], i32 0, i32 0, i32 0, i32 0)			; IR-NEXT: [[TMP11:%.*]] = call i32 @llvm.amdgcn.struct.buffer.atomic.xor.i32(i32 [[TMP8]], <4 x i32> [[ARG]], i32 0, i32 0, i32 0, i32 0)
	; IR-NEXT: br label [[TMP12]]			; IR-NEXT: br label [[TMP12]]
	; IR: 12:			; IR: 12:
	; IR-NEXT: [[TMP13:%.*]] = phi i32 [ poison, [[DOTENTRY:%.*]] ], [ [[TMP11]], [[TMP10]] ]			; IR-NEXT: [[TMP13:%.*]] = phi i32 [ poison, [[DOTENTRY:%.*]] ], [ [[TMP11]], [[TMP10]] ]
	; IR-NEXT: [[TMP14:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[TMP13]])			; IR-NEXT: [[TMP14:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[TMP13]])
	; IR-NEXT: [[TMP15:%.*]] = and i32 [[TMP5]], 1			; IR-NEXT: [[TMP15:%.*]] = and i32 [[TMP5]], 1
	; IR-NEXT: [[TMP16:%.*]] = xor i32 [[TMP14]], [[TMP15]]			; IR-NEXT: [[TMP16:%.*]] = xor i32 [[TMP14]], [[TMP15]]
	; IR-NEXT: call void @llvm.amdgcn.struct.buffer.store.format.v4i32(<4 x i32> [[ARG]], <4 x i32> [[ARG]], i32 [[TMP16]], i32 0, i32 0, i32 0)			; IR-NEXT: call void @llvm.amdgcn.struct.buffer.store.format.v4i32(<4 x i32> [[ARG]], <4 x i32> [[ARG]], i32 [[TMP16]], i32 0, i32 0, i32 0)
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	; GCN-LABEL: atomic_xor_and_format:			; GCN-LABEL: atomic_xor_and_format:
	; GCN: ; %bb.0: ; %.entry			; GCN: ; %bb.0: ; %.entry
	; GCN-NEXT: s_mov_b64 s[6:7], exec			; GCN-NEXT: s_mov_b64 s[6:7], exec
	(30 lines not shown)

llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-amdgcn.readfirstlane.mir

# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py		# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 2
# RUN: llc -march=amdgcn -mcpu=tahiti -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=2 -pass-remarks-missed='gisel*' %s -o - 2> %t | FileCheck -check-prefix=GCN %s		# RUN: llc -march=amdgcn -mcpu=tahiti -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=2 -pass-remarks-missed='gisel*' %s -o - 2> %t | FileCheck -check-prefix=GCN %s
# RUN: FileCheck -check-prefix=ERR %s < %t		# RUN: FileCheck -check-prefix=ERR %s < %t

# ERR: remark: <unknown>:0:0: cannot select: %1:sgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.readfirstlane), %0:sgpr(s32) (in function: readfirstlane_s)		# ERR: remark: <unknown>:0:0: cannot select: %1:sgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.readfirstlane), %0:sgpr(s32) (in function: readfirstlane_s32_s)

---		---
name: readfirstlane_v		name: readfirstlane_v
legalized: true		legalized: true
regBankSelected: true		regBankSelected: true
tracksRegLiveness: true		tracksRegLiveness: true

body: |		body: |
(24 lines not shown)	bb.0:
; GCN-NEXT: [[COPY:%[0-9]+]]:sreg_32 = COPY [[V_MOV_B32_e32_]]		; GCN-NEXT: [[COPY:%[0-9]+]]:sreg_32 = COPY [[V_MOV_B32_e32_]]
; GCN-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 [[COPY]]		; GCN-NEXT: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 [[COPY]]
; GCN-NEXT: S_ENDPGM 0, implicit [[S_MOV_B32_]]		; GCN-NEXT: S_ENDPGM 0, implicit [[S_MOV_B32_]]
%0:vgpr(s32) = G_CONSTANT i32 123		%0:vgpr(s32) = G_CONSTANT i32 123
%1:sgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.readfirstlane), %0		%1:sgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.readfirstlane), %0
S_ENDPGM 0, implicit %1		S_ENDPGM 0, implicit %1
...		...


		---
		name: readfirstlane_v2s16_v
		legalized: true
		regBankSelected: true
		tracksRegLiveness: true

		body: |
		bb.0:
		liveins: $vgpr0
		; GCN-LABEL: name: readfirstlane_v2s16_v
		; GCN: liveins: $vgpr0
		; GCN-NEXT: {{ $}}
		; GCN-NEXT: [[COPY:%[0-9]+]]:vgpr(<2 x s16>) = COPY $vgpr0
		; GCN-NEXT: [[INT:%[0-9]+]]:sgpr(<2 x s16>) = G_INTRINSIC intrinsic(@llvm.amdgcn.readfirstlane), [[COPY]](<2 x s16>)
		; GCN-NEXT: S_ENDPGM 0, implicit [[INT]](<2 x s16>)
		%0:vgpr(<2 x s16>) = COPY $vgpr0
		%1:sgpr(<2 x s16>) = G_INTRINSIC intrinsic(@llvm.amdgcn.readfirstlane), %0
		S_ENDPGM 0, implicit %1
		...

		---
		name: readfirstlane_p3_v
		legalized: true
		regBankSelected: true
		tracksRegLiveness: true

		body: |
		bb.0:
		liveins: $vgpr0
		; GCN-LABEL: name: readfirstlane_p3_v
		; GCN: liveins: $vgpr0
		; GCN-NEXT: {{ $}}
		; GCN-NEXT: [[COPY:%[0-9]+]]:vgpr(p3) = COPY $vgpr0
		; GCN-NEXT: [[INT:%[0-9]+]]:sgpr(p3) = G_INTRINSIC intrinsic(@llvm.amdgcn.readfirstlane), [[COPY]](p3)
		; GCN-NEXT: S_ENDPGM 0, implicit [[INT]](p3)
		%0:vgpr(p3) = COPY $vgpr0
		%1:sgpr(p3) = G_INTRINSIC intrinsic(@llvm.amdgcn.readfirstlane), %0
		S_ENDPGM 0, implicit %1
		...

		---
		name: readfirstlane_p5_v
		legalized: true
		regBankSelected: true
		tracksRegLiveness: true

		body: |
		bb.0:
		liveins: $vgpr0
		; GCN-LABEL: name: readfirstlane_p5_v
		; GCN: liveins: $vgpr0
		; GCN-NEXT: {{ $}}
		; GCN-NEXT: [[COPY:%[0-9]+]]:vgpr(p5) = COPY $vgpr0
		; GCN-NEXT: [[INT:%[0-9]+]]:sgpr(p5) = G_INTRINSIC intrinsic(@llvm.amdgcn.readfirstlane), [[COPY]](p5)
		; GCN-NEXT: S_ENDPGM 0, implicit [[INT]](p5)
		%0:vgpr(p5) = COPY $vgpr0
		%1:sgpr(p5) = G_INTRINSIC intrinsic(@llvm.amdgcn.readfirstlane), %0
		S_ENDPGM 0, implicit %1
		...

		---
		name: readfirstlane_p2_v
		legalized: true
		regBankSelected: true
		tracksRegLiveness: true

		body: |
		bb.0:
		liveins: $vgpr0
		; GCN-LABEL: name: readfirstlane_p2_v
		; GCN: liveins: $vgpr0
		; GCN-NEXT: {{ $}}
		; GCN-NEXT: [[COPY:%[0-9]+]]:vgpr(p2) = COPY $vgpr0
		; GCN-NEXT: [[INT:%[0-9]+]]:sgpr(p2) = G_INTRINSIC intrinsic(@llvm.amdgcn.readfirstlane), [[COPY]](p2)
		; GCN-NEXT: S_ENDPGM 0, implicit [[INT]](p2)
		%0:vgpr(p2) = COPY $vgpr0
		%1:sgpr(p2) = G_INTRINSIC intrinsic(@llvm.amdgcn.readfirstlane), %0
		S_ENDPGM 0, implicit %1
		...

# Make sure this fails to select		# Make sure this fails to select
---		---
name: readfirstlane_s		name: readfirstlane_s32_s
legalized: true		legalized: true
regBankSelected: true		regBankSelected: true
tracksRegLiveness: true		tracksRegLiveness: true

body: |		body: |
bb.0:		bb.0:
liveins: $sgpr0		liveins: $sgpr0
; GCN-LABEL: name: readfirstlane_s		; GCN-LABEL: name: readfirstlane_s32_s
; GCN: liveins: $sgpr0		; GCN: liveins: $sgpr0
; GCN-NEXT: {{ $}}		; GCN-NEXT: {{ $}}
; GCN-NEXT: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0		; GCN-NEXT: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
; GCN-NEXT: [[INT:%[0-9]+]]:sgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.readfirstlane), [[COPY]](s32)		; GCN-NEXT: [[INT:%[0-9]+]]:sgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.readfirstlane), [[COPY]](s32)
; GCN-NEXT: S_ENDPGM 0, implicit [[INT]](s32)		; GCN-NEXT: S_ENDPGM 0, implicit [[INT]](s32)
%0:sgpr(s32) = COPY $sgpr0		%0:sgpr(s32) = COPY $sgpr0
%1:sgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.readfirstlane), %0		%1:sgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.readfirstlane), %0
S_ENDPGM 0, implicit %1		S_ENDPGM 0, implicit %1
...		...

llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll


	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --force-update			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -march=amdgcn -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GFX7LESS %s			; RUN: llc -march=amdgcn -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GFX7LESS %s
	; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GFX8 %s			; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GFX8 %s
	; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GFX9 %s			; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GFX9 %s
	; RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=-wavefrontsize32,+wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GFX10,GFX1064 %s			; RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=-wavefrontsize32,+wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GFX10,GFX1064 %s
	; RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GFX10,GFX1032 %s			; RUN: llc -march=amdgcn -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GFX10,GFX1032 %s
	; RUN: llc -march=amdgcn -mcpu=gfx1100 -mattr=-wavefrontsize32,+wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GFX11,GFX1164 %s			; RUN: llc -march=amdgcn -mcpu=gfx1100 -mattr=-wavefrontsize32,+wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GFX11,GFX1164 %s
	; RUN: llc -march=amdgcn -mcpu=gfx1100 -mattr=+wavefrontsize32,-wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GFX11,GFX1132 %s			; RUN: llc -march=amdgcn -mcpu=gfx1100 -mattr=+wavefrontsize32,-wavefrontsize64 -mattr=-flat-for-global -amdgpu-atomic-optimizations=true -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GFX11,GFX1132 %s


llvm/test/CodeGen/AMDGPU/global-atomic-scan.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
	; RUN: opt -S -mtriple=amdgcn-- -passes=amdgpu-atomic-optimizer %s | FileCheck -check-prefix=IR %s			; RUN: opt -S -mtriple=amdgcn-- -passes=amdgpu-atomic-optimizer %s | FileCheck -check-prefix=IR %s

	define amdgpu_kernel void @atomic_add_i32_offset(ptr addrspace(1) %out, i32 %in) {			define amdgpu_kernel void @atomic_add_i32_offset(ptr addrspace(1) %out, i32 %in) {
	; IR-LABEL: @atomic_add_i32_offset(			; IR-LABEL: define amdgpu_kernel void @atomic_add_i32_offset
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[IN:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 4			; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 4
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])			; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])
	; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32			; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32
	; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN:%.*]], [[TMP7]]			; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN]], [[TMP7]]
	; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]			; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]
	; IR: 10:			; IR: 10:
	; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile add ptr addrspace(1) [[GEP]], i32 [[TMP8]] seq_cst, align 4			; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile add ptr addrspace(1) [[GEP]], i32 [[TMP8]] seq_cst, align 4
	; IR-NEXT: br label [[TMP12]]			; IR-NEXT: br label [[TMP12]]
	; IR: 12:			; IR: 12:
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%gep = getelementptr i32, ptr addrspace(1) %out, i64 4			%gep = getelementptr i32, ptr addrspace(1) %out, i64 4
	%val = atomicrmw volatile add ptr addrspace(1) %gep, i32 %in seq_cst			%val = atomicrmw volatile add ptr addrspace(1) %gep, i32 %in seq_cst
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_add_i32_max_neg_offset(ptr addrspace(1) %out, i32 %in) {			define amdgpu_kernel void @atomic_add_i32_max_neg_offset(ptr addrspace(1) %out, i32 %in) {
	; IR-LABEL: @atomic_add_i32_max_neg_offset(			; IR-LABEL: define amdgpu_kernel void @atomic_add_i32_max_neg_offset
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[IN:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 -1024			; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 -1024
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])			; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])
	; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32			; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32
	; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN:%.*]], [[TMP7]]			; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN]], [[TMP7]]
	; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]			; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]
	; IR: 10:			; IR: 10:
	; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile add ptr addrspace(1) [[GEP]], i32 [[TMP8]] seq_cst, align 4			; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile add ptr addrspace(1) [[GEP]], i32 [[TMP8]] seq_cst, align 4
	; IR-NEXT: br label [[TMP12]]			; IR-NEXT: br label [[TMP12]]
	; IR: 12:			; IR: 12:
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%gep = getelementptr i32, ptr addrspace(1) %out, i64 -1024			%gep = getelementptr i32, ptr addrspace(1) %out, i64 -1024
	%val = atomicrmw volatile add ptr addrspace(1) %gep, i32 %in seq_cst			%val = atomicrmw volatile add ptr addrspace(1) %gep, i32 %in seq_cst
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_add_i32_soffset(ptr addrspace(1) %out, i32 %in) {			define amdgpu_kernel void @atomic_add_i32_soffset(ptr addrspace(1) %out, i32 %in) {
	; IR-LABEL: @atomic_add_i32_soffset(			; IR-LABEL: define amdgpu_kernel void @atomic_add_i32_soffset
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[IN:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 9000			; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 9000
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])			; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])
	; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32			; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32
	; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN:%.*]], [[TMP7]]			; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN]], [[TMP7]]
	; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]			; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]
	; IR: 10:			; IR: 10:
	; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile add ptr addrspace(1) [[GEP]], i32 [[TMP8]] seq_cst, align 4			; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile add ptr addrspace(1) [[GEP]], i32 [[TMP8]] seq_cst, align 4
	; IR-NEXT: br label [[TMP12]]			; IR-NEXT: br label [[TMP12]]
	; IR: 12:			; IR: 12:
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%gep = getelementptr i32, ptr addrspace(1) %out, i64 9000			%gep = getelementptr i32, ptr addrspace(1) %out, i64 9000
	%val = atomicrmw volatile add ptr addrspace(1) %gep, i32 %in seq_cst			%val = atomicrmw volatile add ptr addrspace(1) %gep, i32 %in seq_cst
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_add_i32_huge_offset(ptr addrspace(1) %out, i32 %in) {			define amdgpu_kernel void @atomic_add_i32_huge_offset(ptr addrspace(1) %out, i32 %in) {
	; IR-LABEL: @atomic_add_i32_huge_offset(			; IR-LABEL: define amdgpu_kernel void @atomic_add_i32_huge_offset
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[IN:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 47224239175595			; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 47224239175595
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])			; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])
	; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32			; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32
	; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN:%.*]], [[TMP7]]			; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN]], [[TMP7]]
	; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]			; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]
	; IR: 10:			; IR: 10:
	; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile add ptr addrspace(1) [[GEP]], i32 [[TMP8]] seq_cst, align 4			; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile add ptr addrspace(1) [[GEP]], i32 [[TMP8]] seq_cst, align 4
	; IR-NEXT: br label [[TMP12]]			; IR-NEXT: br label [[TMP12]]
	; IR: 12:			; IR: 12:
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%gep = getelementptr i32, ptr addrspace(1) %out, i64 47224239175595			%gep = getelementptr i32, ptr addrspace(1) %out, i64 47224239175595

	%val = atomicrmw volatile add ptr addrspace(1) %gep, i32 %in seq_cst			%val = atomicrmw volatile add ptr addrspace(1) %gep, i32 %in seq_cst
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_add_i32_ret_offset(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in) {			define amdgpu_kernel void @atomic_add_i32_ret_offset(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in) {
	; IR-LABEL: @atomic_add_i32_ret_offset(			; IR-LABEL: define amdgpu_kernel void @atomic_add_i32_ret_offset
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], ptr addrspace(1) [[OUT2:%.*]], i32 [[IN:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 4			; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 4
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])			; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])
	; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32			; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32
	; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN:%.*]], [[TMP7]]			; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN]], [[TMP7]]
	; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]			; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]
	; IR: 10:			; IR: 10:
	; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile add ptr addrspace(1) [[GEP]], i32 [[TMP8]] seq_cst, align 4			; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile add ptr addrspace(1) [[GEP]], i32 [[TMP8]] seq_cst, align 4
	; IR-NEXT: br label [[TMP12]]			; IR-NEXT: br label [[TMP12]]
	; IR: 12:			; IR: 12:
	; IR-NEXT: [[TMP13:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP11]], [[TMP10]] ]			; IR-NEXT: [[TMP13:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP11]], [[TMP10]] ]
	; IR-NEXT: [[TMP14:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[TMP13]])			; IR-NEXT: [[TMP14:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[TMP13]])
	; IR-NEXT: [[TMP15:%.*]] = mul i32 [[IN]], [[TMP5]]			; IR-NEXT: [[TMP15:%.*]] = mul i32 [[IN]], [[TMP5]]
	; IR-NEXT: [[TMP16:%.*]] = add i32 [[TMP14]], [[TMP15]]			; IR-NEXT: [[TMP16:%.*]] = add i32 [[TMP14]], [[TMP15]]
	; IR-NEXT: store i32 [[TMP16]], ptr addrspace(1) [[OUT2:%.*]], align 4			; IR-NEXT: store i32 [[TMP16]], ptr addrspace(1) [[OUT2]], align 4
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%gep = getelementptr i32, ptr addrspace(1) %out, i64 4			%gep = getelementptr i32, ptr addrspace(1) %out, i64 4
	%val = atomicrmw volatile add ptr addrspace(1) %gep, i32 %in seq_cst			%val = atomicrmw volatile add ptr addrspace(1) %gep, i32 %in seq_cst
	store i32 %val, ptr addrspace(1) %out2			store i32 %val, ptr addrspace(1) %out2
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_add_i32_addr64_offset(ptr addrspace(1) %out, i32 %in, i64 %index) {			define amdgpu_kernel void @atomic_add_i32_addr64_offset(ptr addrspace(1) %out, i32 %in, i64 %index) {
	; IR-LABEL: @atomic_add_i32_addr64_offset(			; IR-LABEL: define amdgpu_kernel void @atomic_add_i32_addr64_offset
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[IN:%.*]], i64 [[INDEX:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 [[INDEX:%.*]]			; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 [[INDEX]]
	; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[PTR]], i64 4			; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[PTR]], i64 4
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])			; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])
	; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32			; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32
	; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN:%.*]], [[TMP7]]			; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN]], [[TMP7]]
	; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]			; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]
	; IR: 10:			; IR: 10:
	; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile add ptr addrspace(1) [[GEP]], i32 [[TMP8]] seq_cst, align 4			; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile add ptr addrspace(1) [[GEP]], i32 [[TMP8]] seq_cst, align 4
	; IR-NEXT: br label [[TMP12]]			; IR-NEXT: br label [[TMP12]]
	; IR: 12:			; IR: 12:
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index			%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index
	%gep = getelementptr i32, ptr addrspace(1) %ptr, i64 4			%gep = getelementptr i32, ptr addrspace(1) %ptr, i64 4
	%val = atomicrmw volatile add ptr addrspace(1) %gep, i32 %in seq_cst			%val = atomicrmw volatile add ptr addrspace(1) %gep, i32 %in seq_cst
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_add_i32_ret_addr64_offset(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in, i64 %index) {			define amdgpu_kernel void @atomic_add_i32_ret_addr64_offset(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in, i64 %index) {
	; IR-LABEL: @atomic_add_i32_ret_addr64_offset(			; IR-LABEL: define amdgpu_kernel void @atomic_add_i32_ret_addr64_offset
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], ptr addrspace(1) [[OUT2:%.*]], i32 [[IN:%.*]], i64 [[INDEX:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 [[INDEX:%.*]]			; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 [[INDEX]]
	; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[PTR]], i64 4			; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[PTR]], i64 4
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])			; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])
	; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32			; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32
	; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN:%.*]], [[TMP7]]			; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN]], [[TMP7]]
	; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]			; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]
	; IR: 10:			; IR: 10:
	; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile add ptr addrspace(1) [[GEP]], i32 [[TMP8]] seq_cst, align 4			; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile add ptr addrspace(1) [[GEP]], i32 [[TMP8]] seq_cst, align 4
	; IR-NEXT: br label [[TMP12]]			; IR-NEXT: br label [[TMP12]]
	; IR: 12:			; IR: 12:
	; IR-NEXT: [[TMP13:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP11]], [[TMP10]] ]			; IR-NEXT: [[TMP13:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP11]], [[TMP10]] ]
	; IR-NEXT: [[TMP14:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[TMP13]])			; IR-NEXT: [[TMP14:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[TMP13]])
	; IR-NEXT: [[TMP15:%.*]] = mul i32 [[IN]], [[TMP5]]			; IR-NEXT: [[TMP15:%.*]] = mul i32 [[IN]], [[TMP5]]
	; IR-NEXT: [[TMP16:%.*]] = add i32 [[TMP14]], [[TMP15]]			; IR-NEXT: [[TMP16:%.*]] = add i32 [[TMP14]], [[TMP15]]
	; IR-NEXT: store i32 [[TMP16]], ptr addrspace(1) [[OUT2:%.*]], align 4			; IR-NEXT: store i32 [[TMP16]], ptr addrspace(1) [[OUT2]], align 4
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index			%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index
	%gep = getelementptr i32, ptr addrspace(1) %ptr, i64 4			%gep = getelementptr i32, ptr addrspace(1) %ptr, i64 4
	%val = atomicrmw volatile add ptr addrspace(1) %gep, i32 %in seq_cst			%val = atomicrmw volatile add ptr addrspace(1) %gep, i32 %in seq_cst
	store i32 %val, ptr addrspace(1) %out2			store i32 %val, ptr addrspace(1) %out2
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_add_i32(ptr addrspace(1) %out, i32 %in) {			define amdgpu_kernel void @atomic_add_i32(ptr addrspace(1) %out, i32 %in) {
	; IR-LABEL: @atomic_add_i32(			; IR-LABEL: define amdgpu_kernel void @atomic_add_i32
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[IN:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])			; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])
	; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32			; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32
	; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN:%.*]], [[TMP7]]			; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN]], [[TMP7]]
	; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]			; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]
	; IR: 10:			; IR: 10:
	; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile add ptr addrspace(1) [[OUT:%.*]], i32 [[TMP8]] seq_cst, align 4			; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile add ptr addrspace(1) [[OUT]], i32 [[TMP8]] seq_cst, align 4
	; IR-NEXT: br label [[TMP12]]			; IR-NEXT: br label [[TMP12]]
	; IR: 12:			; IR: 12:
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%val = atomicrmw volatile add ptr addrspace(1) %out, i32 %in seq_cst			%val = atomicrmw volatile add ptr addrspace(1) %out, i32 %in seq_cst
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_add_i32_ret(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in) {			define amdgpu_kernel void @atomic_add_i32_ret(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in) {
	; IR-LABEL: @atomic_add_i32_ret(			; IR-LABEL: define amdgpu_kernel void @atomic_add_i32_ret
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], ptr addrspace(1) [[OUT2:%.*]], i32 [[IN:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])			; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])
	; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32			; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32
	; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN:%.*]], [[TMP7]]			; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN]], [[TMP7]]
	; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]			; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]
	; IR: 10:			; IR: 10:
	; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile add ptr addrspace(1) [[OUT:%.*]], i32 [[TMP8]] seq_cst, align 4			; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile add ptr addrspace(1) [[OUT]], i32 [[TMP8]] seq_cst, align 4
	; IR-NEXT: br label [[TMP12]]			; IR-NEXT: br label [[TMP12]]
	; IR: 12:			; IR: 12:
	; IR-NEXT: [[TMP13:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP11]], [[TMP10]] ]			; IR-NEXT: [[TMP13:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP11]], [[TMP10]] ]
	; IR-NEXT: [[TMP14:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[TMP13]])			; IR-NEXT: [[TMP14:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[TMP13]])
	; IR-NEXT: [[TMP15:%.*]] = mul i32 [[IN]], [[TMP5]]			; IR-NEXT: [[TMP15:%.*]] = mul i32 [[IN]], [[TMP5]]
	; IR-NEXT: [[TMP16:%.*]] = add i32 [[TMP14]], [[TMP15]]			; IR-NEXT: [[TMP16:%.*]] = add i32 [[TMP14]], [[TMP15]]
	; IR-NEXT: store i32 [[TMP16]], ptr addrspace(1) [[OUT2:%.*]], align 4			; IR-NEXT: store i32 [[TMP16]], ptr addrspace(1) [[OUT2]], align 4
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%val = atomicrmw volatile add ptr addrspace(1) %out, i32 %in seq_cst			%val = atomicrmw volatile add ptr addrspace(1) %out, i32 %in seq_cst
	store i32 %val, ptr addrspace(1) %out2			store i32 %val, ptr addrspace(1) %out2
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_add_i32_addr64(ptr addrspace(1) %out, i32 %in, i64 %index) {			define amdgpu_kernel void @atomic_add_i32_addr64(ptr addrspace(1) %out, i32 %in, i64 %index) {
	; IR-LABEL: @atomic_add_i32_addr64(			; IR-LABEL: define amdgpu_kernel void @atomic_add_i32_addr64
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[IN:%.*]], i64 [[INDEX:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 [[INDEX:%.*]]			; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 [[INDEX]]
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])			; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])
	; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32			; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32
	; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN:%.*]], [[TMP7]]			; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN]], [[TMP7]]
	; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]			; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]
	; IR: 10:			; IR: 10:
	; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile add ptr addrspace(1) [[PTR]], i32 [[TMP8]] seq_cst, align 4			; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile add ptr addrspace(1) [[PTR]], i32 [[TMP8]] seq_cst, align 4
	; IR-NEXT: br label [[TMP12]]			; IR-NEXT: br label [[TMP12]]
	; IR: 12:			; IR: 12:
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index			%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index
	%val = atomicrmw volatile add ptr addrspace(1) %ptr, i32 %in seq_cst			%val = atomicrmw volatile add ptr addrspace(1) %ptr, i32 %in seq_cst
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_add_i32_ret_addr64(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in, i64 %index) {			define amdgpu_kernel void @atomic_add_i32_ret_addr64(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in, i64 %index) {
	; IR-LABEL: @atomic_add_i32_ret_addr64(			; IR-LABEL: define amdgpu_kernel void @atomic_add_i32_ret_addr64
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], ptr addrspace(1) [[OUT2:%.*]], i32 [[IN:%.*]], i64 [[INDEX:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 [[INDEX:%.*]]			; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 [[INDEX]]
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])			; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])
	; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32			; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32
	; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN:%.*]], [[TMP7]]			; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN]], [[TMP7]]
	; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]			; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]
	; IR: 10:			; IR: 10:
	; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile add ptr addrspace(1) [[PTR]], i32 [[TMP8]] seq_cst, align 4			; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile add ptr addrspace(1) [[PTR]], i32 [[TMP8]] seq_cst, align 4
	; IR-NEXT: br label [[TMP12]]			; IR-NEXT: br label [[TMP12]]
	; IR: 12:			; IR: 12:
	; IR-NEXT: [[TMP13:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP11]], [[TMP10]] ]			; IR-NEXT: [[TMP13:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP11]], [[TMP10]] ]
	; IR-NEXT: [[TMP14:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[TMP13]])			; IR-NEXT: [[TMP14:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[TMP13]])
	; IR-NEXT: [[TMP15:%.*]] = mul i32 [[IN]], [[TMP5]]			; IR-NEXT: [[TMP15:%.*]] = mul i32 [[IN]], [[TMP5]]
	; IR-NEXT: [[TMP16:%.*]] = add i32 [[TMP14]], [[TMP15]]			; IR-NEXT: [[TMP16:%.*]] = add i32 [[TMP14]], [[TMP15]]
	; IR-NEXT: store i32 [[TMP16]], ptr addrspace(1) [[OUT2:%.*]], align 4			; IR-NEXT: store i32 [[TMP16]], ptr addrspace(1) [[OUT2]], align 4
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index			%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index
	%val = atomicrmw volatile add ptr addrspace(1) %ptr, i32 %in seq_cst			%val = atomicrmw volatile add ptr addrspace(1) %ptr, i32 %in seq_cst
	store i32 %val, ptr addrspace(1) %out2			store i32 %val, ptr addrspace(1) %out2
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_and_i32_offset(ptr addrspace(1) %out, i32 %in) {			define amdgpu_kernel void @atomic_and_i32_offset(ptr addrspace(1) %out, i32 %in) {
	; IR-LABEL: @atomic_and_i32_offset(			; IR-LABEL: define amdgpu_kernel void @atomic_and_i32_offset
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[IN:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 4			; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 4
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]			; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]
	; IR: 7:			; IR: 7:
	; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile and ptr addrspace(1) [[GEP]], i32 [[IN:%.*]] seq_cst, align 4			; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile and ptr addrspace(1) [[GEP]], i32 [[IN]] seq_cst, align 4
	; IR-NEXT: br label [[TMP9]]			; IR-NEXT: br label [[TMP9]]
	; IR: 9:			; IR: 9:
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%gep = getelementptr i32, ptr addrspace(1) %out, i64 4			%gep = getelementptr i32, ptr addrspace(1) %out, i64 4
	%val = atomicrmw volatile and ptr addrspace(1) %gep, i32 %in seq_cst			%val = atomicrmw volatile and ptr addrspace(1) %gep, i32 %in seq_cst
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_and_i32_ret_offset(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in) {			define amdgpu_kernel void @atomic_and_i32_ret_offset(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in) {
	; IR-LABEL: @atomic_and_i32_ret_offset(			; IR-LABEL: define amdgpu_kernel void @atomic_and_i32_ret_offset
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], ptr addrspace(1) [[OUT2:%.*]], i32 [[IN:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 4			; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 4
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]			; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]
	; IR: 7:			; IR: 7:
	; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile and ptr addrspace(1) [[GEP]], i32 [[IN:%.*]] seq_cst, align 4			; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile and ptr addrspace(1) [[GEP]], i32 [[IN]] seq_cst, align 4
	; IR-NEXT: br label [[TMP9]]			; IR-NEXT: br label [[TMP9]]
	; IR: 9:			; IR: 9:
	; IR-NEXT: [[TMP10:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP8]], [[TMP7]] ]			; IR-NEXT: [[TMP10:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP8]], [[TMP7]] ]
	; IR-NEXT: [[TMP11:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[TMP10]])			; IR-NEXT: [[TMP11:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[TMP10]])
	; IR-NEXT: [[TMP12:%.*]] = select i1 [[TMP6]], i32 -1, i32 [[IN]]			; IR-NEXT: [[TMP12:%.*]] = select i1 [[TMP6]], i32 -1, i32 [[IN]]
	; IR-NEXT: [[TMP13:%.*]] = and i32 [[TMP11]], [[TMP12]]			; IR-NEXT: [[TMP13:%.*]] = and i32 [[TMP11]], [[TMP12]]
	; IR-NEXT: store i32 [[TMP13]], ptr addrspace(1) [[OUT2:%.*]], align 4			; IR-NEXT: store i32 [[TMP13]], ptr addrspace(1) [[OUT2]], align 4
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%gep = getelementptr i32, ptr addrspace(1) %out, i64 4			%gep = getelementptr i32, ptr addrspace(1) %out, i64 4
	%val = atomicrmw volatile and ptr addrspace(1) %gep, i32 %in seq_cst			%val = atomicrmw volatile and ptr addrspace(1) %gep, i32 %in seq_cst
	store i32 %val, ptr addrspace(1) %out2			store i32 %val, ptr addrspace(1) %out2
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_and_i32_addr64_offset(ptr addrspace(1) %out, i32 %in, i64 %index) {			define amdgpu_kernel void @atomic_and_i32_addr64_offset(ptr addrspace(1) %out, i32 %in, i64 %index) {
	; IR-LABEL: @atomic_and_i32_addr64_offset(			; IR-LABEL: define amdgpu_kernel void @atomic_and_i32_addr64_offset
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[IN:%.*]], i64 [[INDEX:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 [[INDEX:%.*]]			; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 [[INDEX]]
	; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[PTR]], i64 4			; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[PTR]], i64 4
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]			; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]
	; IR: 7:			; IR: 7:
	; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile and ptr addrspace(1) [[GEP]], i32 [[IN:%.*]] seq_cst, align 4			; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile and ptr addrspace(1) [[GEP]], i32 [[IN]] seq_cst, align 4
	; IR-NEXT: br label [[TMP9]]			; IR-NEXT: br label [[TMP9]]
	; IR: 9:			; IR: 9:
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index			%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index
	%gep = getelementptr i32, ptr addrspace(1) %ptr, i64 4			%gep = getelementptr i32, ptr addrspace(1) %ptr, i64 4
	%val = atomicrmw volatile and ptr addrspace(1) %gep, i32 %in seq_cst			%val = atomicrmw volatile and ptr addrspace(1) %gep, i32 %in seq_cst
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_and_i32_ret_addr64_offset(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in, i64 %index) {			define amdgpu_kernel void @atomic_and_i32_ret_addr64_offset(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in, i64 %index) {
	; IR-LABEL: @atomic_and_i32_ret_addr64_offset(			; IR-LABEL: define amdgpu_kernel void @atomic_and_i32_ret_addr64_offset
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], ptr addrspace(1) [[OUT2:%.*]], i32 [[IN:%.*]], i64 [[INDEX:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 [[INDEX:%.*]]			; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 [[INDEX]]
	; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[PTR]], i64 4			; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[PTR]], i64 4
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]			; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]
	; IR: 7:			; IR: 7:
	; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile and ptr addrspace(1) [[GEP]], i32 [[IN:%.*]] seq_cst, align 4			; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile and ptr addrspace(1) [[GEP]], i32 [[IN]] seq_cst, align 4
	; IR-NEXT: br label [[TMP9]]			; IR-NEXT: br label [[TMP9]]
	; IR: 9:			; IR: 9:
	; IR-NEXT: [[TMP10:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP8]], [[TMP7]] ]			; IR-NEXT: [[TMP10:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP8]], [[TMP7]] ]
	; IR-NEXT: [[TMP11:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[TMP10]])			; IR-NEXT: [[TMP11:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[TMP10]])
	; IR-NEXT: [[TMP12:%.*]] = select i1 [[TMP6]], i32 -1, i32 [[IN]]			; IR-NEXT: [[TMP12:%.*]] = select i1 [[TMP6]], i32 -1, i32 [[IN]]
	; IR-NEXT: [[TMP13:%.*]] = and i32 [[TMP11]], [[TMP12]]			; IR-NEXT: [[TMP13:%.*]] = and i32 [[TMP11]], [[TMP12]]
	; IR-NEXT: store i32 [[TMP13]], ptr addrspace(1) [[OUT2:%.*]], align 4			; IR-NEXT: store i32 [[TMP13]], ptr addrspace(1) [[OUT2]], align 4
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index			%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index
	%gep = getelementptr i32, ptr addrspace(1) %ptr, i64 4			%gep = getelementptr i32, ptr addrspace(1) %ptr, i64 4
	%val = atomicrmw volatile and ptr addrspace(1) %gep, i32 %in seq_cst			%val = atomicrmw volatile and ptr addrspace(1) %gep, i32 %in seq_cst
	store i32 %val, ptr addrspace(1) %out2			store i32 %val, ptr addrspace(1) %out2
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_and_i32(ptr addrspace(1) %out, i32 %in) {			define amdgpu_kernel void @atomic_and_i32(ptr addrspace(1) %out, i32 %in) {
	; IR-LABEL: @atomic_and_i32(			; IR-LABEL: define amdgpu_kernel void @atomic_and_i32
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[IN:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]			; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]
	; IR: 7:			; IR: 7:
	; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile and ptr addrspace(1) [[OUT:%.*]], i32 [[IN:%.*]] seq_cst, align 4			; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile and ptr addrspace(1) [[OUT]], i32 [[IN]] seq_cst, align 4
	; IR-NEXT: br label [[TMP9]]			; IR-NEXT: br label [[TMP9]]
	; IR: 9:			; IR: 9:
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%val = atomicrmw volatile and ptr addrspace(1) %out, i32 %in seq_cst			%val = atomicrmw volatile and ptr addrspace(1) %out, i32 %in seq_cst
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_and_i32_ret(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in) {			define amdgpu_kernel void @atomic_and_i32_ret(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in) {
	; IR-LABEL: @atomic_and_i32_ret(			; IR-LABEL: define amdgpu_kernel void @atomic_and_i32_ret
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], ptr addrspace(1) [[OUT2:%.*]], i32 [[IN:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]			; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]
	; IR: 7:			; IR: 7:
	; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile and ptr addrspace(1) [[OUT:%.*]], i32 [[IN:%.*]] seq_cst, align 4			; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile and ptr addrspace(1) [[OUT]], i32 [[IN]] seq_cst, align 4
	; IR-NEXT: br label [[TMP9]]			; IR-NEXT: br label [[TMP9]]
	; IR: 9:			; IR: 9:
	; IR-NEXT: [[TMP10:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP8]], [[TMP7]] ]			; IR-NEXT: [[TMP10:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP8]], [[TMP7]] ]
	; IR-NEXT: [[TMP11:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[TMP10]])			; IR-NEXT: [[TMP11:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[TMP10]])
	; IR-NEXT: [[TMP12:%.*]] = select i1 [[TMP6]], i32 -1, i32 [[IN]]			; IR-NEXT: [[TMP12:%.*]] = select i1 [[TMP6]], i32 -1, i32 [[IN]]
	; IR-NEXT: [[TMP13:%.*]] = and i32 [[TMP11]], [[TMP12]]			; IR-NEXT: [[TMP13:%.*]] = and i32 [[TMP11]], [[TMP12]]
	; IR-NEXT: store i32 [[TMP13]], ptr addrspace(1) [[OUT2:%.*]], align 4			; IR-NEXT: store i32 [[TMP13]], ptr addrspace(1) [[OUT2]], align 4
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%val = atomicrmw volatile and ptr addrspace(1) %out, i32 %in seq_cst			%val = atomicrmw volatile and ptr addrspace(1) %out, i32 %in seq_cst
	store i32 %val, ptr addrspace(1) %out2			store i32 %val, ptr addrspace(1) %out2
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_and_i32_addr64(ptr addrspace(1) %out, i32 %in, i64 %index) {			define amdgpu_kernel void @atomic_and_i32_addr64(ptr addrspace(1) %out, i32 %in, i64 %index) {
	; IR-LABEL: @atomic_and_i32_addr64(			; IR-LABEL: define amdgpu_kernel void @atomic_and_i32_addr64
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[IN:%.*]], i64 [[INDEX:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 [[INDEX:%.*]]			; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 [[INDEX]]
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]			; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]
	; IR: 7:			; IR: 7:
	; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile and ptr addrspace(1) [[PTR]], i32 [[IN:%.*]] seq_cst, align 4			; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile and ptr addrspace(1) [[PTR]], i32 [[IN]] seq_cst, align 4
	; IR-NEXT: br label [[TMP9]]			; IR-NEXT: br label [[TMP9]]
	; IR: 9:			; IR: 9:
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index			%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index
	%val = atomicrmw volatile and ptr addrspace(1) %ptr, i32 %in seq_cst			%val = atomicrmw volatile and ptr addrspace(1) %ptr, i32 %in seq_cst
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_and_i32_ret_addr64(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in, i64 %index) {			define amdgpu_kernel void @atomic_and_i32_ret_addr64(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in, i64 %index) {
	; IR-LABEL: @atomic_and_i32_ret_addr64(			; IR-LABEL: define amdgpu_kernel void @atomic_and_i32_ret_addr64
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], ptr addrspace(1) [[OUT2:%.*]], i32 [[IN:%.*]], i64 [[INDEX:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 [[INDEX:%.*]]			; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 [[INDEX]]
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]			; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]
	; IR: 7:			; IR: 7:
	; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile and ptr addrspace(1) [[PTR]], i32 [[IN:%.*]] seq_cst, align 4			; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile and ptr addrspace(1) [[PTR]], i32 [[IN]] seq_cst, align 4
	; IR-NEXT: br label [[TMP9]]			; IR-NEXT: br label [[TMP9]]
	; IR: 9:			; IR: 9:
	; IR-NEXT: [[TMP10:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP8]], [[TMP7]] ]			; IR-NEXT: [[TMP10:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP8]], [[TMP7]] ]
	; IR-NEXT: [[TMP11:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[TMP10]])			; IR-NEXT: [[TMP11:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[TMP10]])
	; IR-NEXT: [[TMP12:%.*]] = select i1 [[TMP6]], i32 -1, i32 [[IN]]			; IR-NEXT: [[TMP12:%.*]] = select i1 [[TMP6]], i32 -1, i32 [[IN]]
	; IR-NEXT: [[TMP13:%.*]] = and i32 [[TMP11]], [[TMP12]]			; IR-NEXT: [[TMP13:%.*]] = and i32 [[TMP11]], [[TMP12]]
	; IR-NEXT: store i32 [[TMP13]], ptr addrspace(1) [[OUT2:%.*]], align 4			; IR-NEXT: store i32 [[TMP13]], ptr addrspace(1) [[OUT2]], align 4
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index			%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index
	%val = atomicrmw volatile and ptr addrspace(1) %ptr, i32 %in seq_cst			%val = atomicrmw volatile and ptr addrspace(1) %ptr, i32 %in seq_cst
	store i32 %val, ptr addrspace(1) %out2			store i32 %val, ptr addrspace(1) %out2
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_sub_i32_offset(ptr addrspace(1) %out, i32 %in) {			define amdgpu_kernel void @atomic_sub_i32_offset(ptr addrspace(1) %out, i32 %in) {
	; IR-LABEL: @atomic_sub_i32_offset(			; IR-LABEL: define amdgpu_kernel void @atomic_sub_i32_offset
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[IN:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 4			; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 4
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])			; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])
	; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32			; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32
	; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN:%.*]], [[TMP7]]			; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN]], [[TMP7]]
	; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]			; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]
	; IR: 10:			; IR: 10:
	; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile sub ptr addrspace(1) [[GEP]], i32 [[TMP8]] seq_cst, align 4			; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile sub ptr addrspace(1) [[GEP]], i32 [[TMP8]] seq_cst, align 4
	; IR-NEXT: br label [[TMP12]]			; IR-NEXT: br label [[TMP12]]
	; IR: 12:			; IR: 12:
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%gep = getelementptr i32, ptr addrspace(1) %out, i64 4			%gep = getelementptr i32, ptr addrspace(1) %out, i64 4
	%val = atomicrmw volatile sub ptr addrspace(1) %gep, i32 %in seq_cst			%val = atomicrmw volatile sub ptr addrspace(1) %gep, i32 %in seq_cst
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_sub_i32_ret_offset(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in) {			define amdgpu_kernel void @atomic_sub_i32_ret_offset(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in) {
	; IR-LABEL: @atomic_sub_i32_ret_offset(			; IR-LABEL: define amdgpu_kernel void @atomic_sub_i32_ret_offset
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], ptr addrspace(1) [[OUT2:%.*]], i32 [[IN:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 4			; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 4
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])			; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])
	; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32			; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32
	; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN:%.*]], [[TMP7]]			; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN]], [[TMP7]]
	; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]			; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]
	; IR: 10:			; IR: 10:
	; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile sub ptr addrspace(1) [[GEP]], i32 [[TMP8]] seq_cst, align 4			; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile sub ptr addrspace(1) [[GEP]], i32 [[TMP8]] seq_cst, align 4
	; IR-NEXT: br label [[TMP12]]			; IR-NEXT: br label [[TMP12]]
	; IR: 12:			; IR: 12:
	; IR-NEXT: [[TMP13:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP11]], [[TMP10]] ]			; IR-NEXT: [[TMP13:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP11]], [[TMP10]] ]
	; IR-NEXT: [[TMP14:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[TMP13]])			; IR-NEXT: [[TMP14:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[TMP13]])
	; IR-NEXT: [[TMP15:%.*]] = mul i32 [[IN]], [[TMP5]]			; IR-NEXT: [[TMP15:%.*]] = mul i32 [[IN]], [[TMP5]]
	; IR-NEXT: [[TMP16:%.*]] = sub i32 [[TMP14]], [[TMP15]]			; IR-NEXT: [[TMP16:%.*]] = sub i32 [[TMP14]], [[TMP15]]
	; IR-NEXT: store i32 [[TMP16]], ptr addrspace(1) [[OUT2:%.*]], align 4			; IR-NEXT: store i32 [[TMP16]], ptr addrspace(1) [[OUT2]], align 4
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%gep = getelementptr i32, ptr addrspace(1) %out, i64 4			%gep = getelementptr i32, ptr addrspace(1) %out, i64 4
	%val = atomicrmw volatile sub ptr addrspace(1) %gep, i32 %in seq_cst			%val = atomicrmw volatile sub ptr addrspace(1) %gep, i32 %in seq_cst
	store i32 %val, ptr addrspace(1) %out2			store i32 %val, ptr addrspace(1) %out2
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_sub_i32_addr64_offset(ptr addrspace(1) %out, i32 %in, i64 %index) {			define amdgpu_kernel void @atomic_sub_i32_addr64_offset(ptr addrspace(1) %out, i32 %in, i64 %index) {
	; IR-LABEL: @atomic_sub_i32_addr64_offset(			; IR-LABEL: define amdgpu_kernel void @atomic_sub_i32_addr64_offset
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[IN:%.*]], i64 [[INDEX:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 [[INDEX:%.*]]			; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 [[INDEX]]
	; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[PTR]], i64 4			; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[PTR]], i64 4
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])			; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])
	; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32			; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32
	; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN:%.*]], [[TMP7]]			; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN]], [[TMP7]]
	; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]			; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]
	; IR: 10:			; IR: 10:
	; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile sub ptr addrspace(1) [[GEP]], i32 [[TMP8]] seq_cst, align 4			; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile sub ptr addrspace(1) [[GEP]], i32 [[TMP8]] seq_cst, align 4
	; IR-NEXT: br label [[TMP12]]			; IR-NEXT: br label [[TMP12]]
	; IR: 12:			; IR: 12:
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index			%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index
	%gep = getelementptr i32, ptr addrspace(1) %ptr, i64 4			%gep = getelementptr i32, ptr addrspace(1) %ptr, i64 4
	%val = atomicrmw volatile sub ptr addrspace(1) %gep, i32 %in seq_cst			%val = atomicrmw volatile sub ptr addrspace(1) %gep, i32 %in seq_cst
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_sub_i32_ret_addr64_offset(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in, i64 %index) {			define amdgpu_kernel void @atomic_sub_i32_ret_addr64_offset(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in, i64 %index) {
	; IR-LABEL: @atomic_sub_i32_ret_addr64_offset(			; IR-LABEL: define amdgpu_kernel void @atomic_sub_i32_ret_addr64_offset
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], ptr addrspace(1) [[OUT2:%.*]], i32 [[IN:%.*]], i64 [[INDEX:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 [[INDEX:%.*]]			; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 [[INDEX]]
	; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[PTR]], i64 4			; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[PTR]], i64 4
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])			; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])
	; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32			; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32
	; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN:%.*]], [[TMP7]]			; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN]], [[TMP7]]
	; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]			; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]
	; IR: 10:			; IR: 10:
	; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile sub ptr addrspace(1) [[GEP]], i32 [[TMP8]] seq_cst, align 4			; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile sub ptr addrspace(1) [[GEP]], i32 [[TMP8]] seq_cst, align 4
	; IR-NEXT: br label [[TMP12]]			; IR-NEXT: br label [[TMP12]]
	; IR: 12:			; IR: 12:
	; IR-NEXT: [[TMP13:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP11]], [[TMP10]] ]			; IR-NEXT: [[TMP13:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP11]], [[TMP10]] ]
	; IR-NEXT: [[TMP14:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[TMP13]])			; IR-NEXT: [[TMP14:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[TMP13]])
	; IR-NEXT: [[TMP15:%.*]] = mul i32 [[IN]], [[TMP5]]			; IR-NEXT: [[TMP15:%.*]] = mul i32 [[IN]], [[TMP5]]
	; IR-NEXT: [[TMP16:%.*]] = sub i32 [[TMP14]], [[TMP15]]			; IR-NEXT: [[TMP16:%.*]] = sub i32 [[TMP14]], [[TMP15]]
	; IR-NEXT: store i32 [[TMP16]], ptr addrspace(1) [[OUT2:%.*]], align 4			; IR-NEXT: store i32 [[TMP16]], ptr addrspace(1) [[OUT2]], align 4
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index			%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index
	%gep = getelementptr i32, ptr addrspace(1) %ptr, i64 4			%gep = getelementptr i32, ptr addrspace(1) %ptr, i64 4
	%val = atomicrmw volatile sub ptr addrspace(1) %gep, i32 %in seq_cst			%val = atomicrmw volatile sub ptr addrspace(1) %gep, i32 %in seq_cst
	store i32 %val, ptr addrspace(1) %out2			store i32 %val, ptr addrspace(1) %out2
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_sub_i32(ptr addrspace(1) %out, i32 %in) {			define amdgpu_kernel void @atomic_sub_i32(ptr addrspace(1) %out, i32 %in) {
	; IR-LABEL: @atomic_sub_i32(			; IR-LABEL: define amdgpu_kernel void @atomic_sub_i32
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[IN:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])			; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])
	; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32			; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32
	; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN:%.*]], [[TMP7]]			; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN]], [[TMP7]]
	; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]			; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]
	; IR: 10:			; IR: 10:
	; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile sub ptr addrspace(1) [[OUT:%.*]], i32 [[TMP8]] seq_cst, align 4			; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile sub ptr addrspace(1) [[OUT]], i32 [[TMP8]] seq_cst, align 4
	; IR-NEXT: br label [[TMP12]]			; IR-NEXT: br label [[TMP12]]
	; IR: 12:			; IR: 12:
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%val = atomicrmw volatile sub ptr addrspace(1) %out, i32 %in seq_cst			%val = atomicrmw volatile sub ptr addrspace(1) %out, i32 %in seq_cst
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_sub_i32_ret(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in) {			define amdgpu_kernel void @atomic_sub_i32_ret(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in) {
	; IR-LABEL: @atomic_sub_i32_ret(			; IR-LABEL: define amdgpu_kernel void @atomic_sub_i32_ret
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], ptr addrspace(1) [[OUT2:%.*]], i32 [[IN:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])			; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])
	; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32			; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32
	; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN:%.*]], [[TMP7]]			; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN]], [[TMP7]]
	; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]			; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]
	; IR: 10:			; IR: 10:
	; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile sub ptr addrspace(1) [[OUT:%.*]], i32 [[TMP8]] seq_cst, align 4			; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile sub ptr addrspace(1) [[OUT]], i32 [[TMP8]] seq_cst, align 4
	; IR-NEXT: br label [[TMP12]]			; IR-NEXT: br label [[TMP12]]
	; IR: 12:			; IR: 12:
	; IR-NEXT: [[TMP13:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP11]], [[TMP10]] ]			; IR-NEXT: [[TMP13:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP11]], [[TMP10]] ]
	; IR-NEXT: [[TMP14:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[TMP13]])			; IR-NEXT: [[TMP14:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[TMP13]])
	; IR-NEXT: [[TMP15:%.*]] = mul i32 [[IN]], [[TMP5]]			; IR-NEXT: [[TMP15:%.*]] = mul i32 [[IN]], [[TMP5]]
	; IR-NEXT: [[TMP16:%.*]] = sub i32 [[TMP14]], [[TMP15]]			; IR-NEXT: [[TMP16:%.*]] = sub i32 [[TMP14]], [[TMP15]]
	; IR-NEXT: store i32 [[TMP16]], ptr addrspace(1) [[OUT2:%.*]], align 4			; IR-NEXT: store i32 [[TMP16]], ptr addrspace(1) [[OUT2]], align 4
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%val = atomicrmw volatile sub ptr addrspace(1) %out, i32 %in seq_cst			%val = atomicrmw volatile sub ptr addrspace(1) %out, i32 %in seq_cst
	store i32 %val, ptr addrspace(1) %out2			store i32 %val, ptr addrspace(1) %out2
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_sub_i32_addr64(ptr addrspace(1) %out, i32 %in, i64 %index) {			define amdgpu_kernel void @atomic_sub_i32_addr64(ptr addrspace(1) %out, i32 %in, i64 %index) {
	; IR-LABEL: @atomic_sub_i32_addr64(			; IR-LABEL: define amdgpu_kernel void @atomic_sub_i32_addr64
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[IN:%.*]], i64 [[INDEX:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 [[INDEX:%.*]]			; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 [[INDEX]]
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])			; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])
	; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32			; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32
	; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN:%.*]], [[TMP7]]			; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN]], [[TMP7]]
	; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]			; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]
	; IR: 10:			; IR: 10:
	; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile sub ptr addrspace(1) [[PTR]], i32 [[TMP8]] seq_cst, align 4			; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile sub ptr addrspace(1) [[PTR]], i32 [[TMP8]] seq_cst, align 4
	; IR-NEXT: br label [[TMP12]]			; IR-NEXT: br label [[TMP12]]
	; IR: 12:			; IR: 12:
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index			%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index
	%val = atomicrmw volatile sub ptr addrspace(1) %ptr, i32 %in seq_cst			%val = atomicrmw volatile sub ptr addrspace(1) %ptr, i32 %in seq_cst
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_sub_i32_ret_addr64(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in, i64 %index) {			define amdgpu_kernel void @atomic_sub_i32_ret_addr64(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in, i64 %index) {
	; IR-LABEL: @atomic_sub_i32_ret_addr64(			; IR-LABEL: define amdgpu_kernel void @atomic_sub_i32_ret_addr64
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], ptr addrspace(1) [[OUT2:%.*]], i32 [[IN:%.*]], i64 [[INDEX:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 [[INDEX:%.*]]			; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 [[INDEX]]
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])			; IR-NEXT: [[TMP6:%.*]] = call i64 @llvm.ctpop.i64(i64 [[TMP0]])
	; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32			; IR-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32
	; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN:%.*]], [[TMP7]]			; IR-NEXT: [[TMP8:%.*]] = mul i32 [[IN]], [[TMP7]]
	; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP9:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]			; IR-NEXT: br i1 [[TMP9]], label [[TMP10:%.*]], label [[TMP12:%.*]]
	; IR: 10:			; IR: 10:
	; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile sub ptr addrspace(1) [[PTR]], i32 [[TMP8]] seq_cst, align 4			; IR-NEXT: [[TMP11:%.*]] = atomicrmw volatile sub ptr addrspace(1) [[PTR]], i32 [[TMP8]] seq_cst, align 4
	; IR-NEXT: br label [[TMP12]]			; IR-NEXT: br label [[TMP12]]
	; IR: 12:			; IR: 12:
	; IR-NEXT: [[TMP13:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP11]], [[TMP10]] ]			; IR-NEXT: [[TMP13:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP11]], [[TMP10]] ]
	; IR-NEXT: [[TMP14:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[TMP13]])			; IR-NEXT: [[TMP14:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[TMP13]])
	; IR-NEXT: [[TMP15:%.*]] = mul i32 [[IN]], [[TMP5]]			; IR-NEXT: [[TMP15:%.*]] = mul i32 [[IN]], [[TMP5]]
	; IR-NEXT: [[TMP16:%.*]] = sub i32 [[TMP14]], [[TMP15]]			; IR-NEXT: [[TMP16:%.*]] = sub i32 [[TMP14]], [[TMP15]]
	; IR-NEXT: store i32 [[TMP16]], ptr addrspace(1) [[OUT2:%.*]], align 4			; IR-NEXT: store i32 [[TMP16]], ptr addrspace(1) [[OUT2]], align 4
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index			%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index
	%val = atomicrmw volatile sub ptr addrspace(1) %ptr, i32 %in seq_cst			%val = atomicrmw volatile sub ptr addrspace(1) %ptr, i32 %in seq_cst
	store i32 %val, ptr addrspace(1) %out2			store i32 %val, ptr addrspace(1) %out2
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_max_i32_offset(ptr addrspace(1) %out, i32 %in) {			define amdgpu_kernel void @atomic_max_i32_offset(ptr addrspace(1) %out, i32 %in) {
	; IR-LABEL: @atomic_max_i32_offset(			; IR-LABEL: define amdgpu_kernel void @atomic_max_i32_offset
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[IN:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 4			; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 4
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]			; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]
	; IR: 7:			; IR: 7:
	; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile max ptr addrspace(1) [[GEP]], i32 [[IN:%.*]] seq_cst, align 4			; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile max ptr addrspace(1) [[GEP]], i32 [[IN]] seq_cst, align 4
	; IR-NEXT: br label [[TMP9]]			; IR-NEXT: br label [[TMP9]]
	; IR: 9:			; IR: 9:
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%gep = getelementptr i32, ptr addrspace(1) %out, i64 4			%gep = getelementptr i32, ptr addrspace(1) %out, i64 4
	%val = atomicrmw volatile max ptr addrspace(1) %gep, i32 %in seq_cst			%val = atomicrmw volatile max ptr addrspace(1) %gep, i32 %in seq_cst
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_max_i32_ret_offset(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in) {			define amdgpu_kernel void @atomic_max_i32_ret_offset(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in) {
	; IR-LABEL: @atomic_max_i32_ret_offset(			; IR-LABEL: define amdgpu_kernel void @atomic_max_i32_ret_offset
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], ptr addrspace(1) [[OUT2:%.*]], i32 [[IN:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 4			; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 4
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]			; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]
	; IR: 7:			; IR: 7:
	; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile max ptr addrspace(1) [[GEP]], i32 [[IN:%.*]] syncscope("workgroup") seq_cst, align 4			; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile max ptr addrspace(1) [[GEP]], i32 [[IN]] syncscope("workgroup") seq_cst, align 4
	; IR-NEXT: br label [[TMP9]]			; IR-NEXT: br label [[TMP9]]
	; IR: 9:			; IR: 9:
	; IR-NEXT: [[TMP10:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP8]], [[TMP7]] ]			; IR-NEXT: [[TMP10:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP8]], [[TMP7]] ]
	; IR-NEXT: [[TMP11:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[TMP10]])			; IR-NEXT: [[TMP11:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[TMP10]])
	; IR-NEXT: [[TMP12:%.*]] = select i1 [[TMP6]], i32 -2147483648, i32 [[IN]]			; IR-NEXT: [[TMP12:%.*]] = select i1 [[TMP6]], i32 -2147483648, i32 [[IN]]
	; IR-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[TMP11]], [[TMP12]]			; IR-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[TMP11]], [[TMP12]]
	; IR-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32 [[TMP12]]			; IR-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32 [[TMP12]]
	; IR-NEXT: store i32 [[TMP14]], ptr addrspace(1) [[OUT2:%.*]], align 4			; IR-NEXT: store i32 [[TMP14]], ptr addrspace(1) [[OUT2]], align 4
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%gep = getelementptr i32, ptr addrspace(1) %out, i64 4			%gep = getelementptr i32, ptr addrspace(1) %out, i64 4
	%val = atomicrmw volatile max ptr addrspace(1) %gep, i32 %in syncscope("workgroup") seq_cst			%val = atomicrmw volatile max ptr addrspace(1) %gep, i32 %in syncscope("workgroup") seq_cst
	store i32 %val, ptr addrspace(1) %out2			store i32 %val, ptr addrspace(1) %out2
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_max_i32_addr64_offset(ptr addrspace(1) %out, i32 %in, i64 %index) {			define amdgpu_kernel void @atomic_max_i32_addr64_offset(ptr addrspace(1) %out, i32 %in, i64 %index) {
	; IR-LABEL: @atomic_max_i32_addr64_offset(			; IR-LABEL: define amdgpu_kernel void @atomic_max_i32_addr64_offset
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[IN:%.*]], i64 [[INDEX:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 [[INDEX:%.*]]			; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 [[INDEX]]
	; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[PTR]], i64 4			; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[PTR]], i64 4
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]			; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]
	; IR: 7:			; IR: 7:
	; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile max ptr addrspace(1) [[GEP]], i32 [[IN:%.*]] syncscope("workgroup") seq_cst, align 4			; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile max ptr addrspace(1) [[GEP]], i32 [[IN]] syncscope("workgroup") seq_cst, align 4
	; IR-NEXT: br label [[TMP9]]			; IR-NEXT: br label [[TMP9]]
	; IR: 9:			; IR: 9:
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index			%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index
	%gep = getelementptr i32, ptr addrspace(1) %ptr, i64 4			%gep = getelementptr i32, ptr addrspace(1) %ptr, i64 4
	%val = atomicrmw volatile max ptr addrspace(1) %gep, i32 %in syncscope("workgroup") seq_cst			%val = atomicrmw volatile max ptr addrspace(1) %gep, i32 %in syncscope("workgroup") seq_cst
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_max_i32_ret_addr64_offset(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in, i64 %index) {			define amdgpu_kernel void @atomic_max_i32_ret_addr64_offset(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in, i64 %index) {
	; IR-LABEL: @atomic_max_i32_ret_addr64_offset(			; IR-LABEL: define amdgpu_kernel void @atomic_max_i32_ret_addr64_offset
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], ptr addrspace(1) [[OUT2:%.*]], i32 [[IN:%.*]], i64 [[INDEX:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 [[INDEX:%.*]]			; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 [[INDEX]]
	; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[PTR]], i64 4			; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[PTR]], i64 4
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]			; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]
	; IR: 7:			; IR: 7:
	; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile max ptr addrspace(1) [[GEP]], i32 [[IN:%.*]] syncscope("workgroup") seq_cst, align 4			; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile max ptr addrspace(1) [[GEP]], i32 [[IN]] syncscope("workgroup") seq_cst, align 4
	; IR-NEXT: br label [[TMP9]]			; IR-NEXT: br label [[TMP9]]
	; IR: 9:			; IR: 9:
	; IR-NEXT: [[TMP10:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP8]], [[TMP7]] ]			; IR-NEXT: [[TMP10:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP8]], [[TMP7]] ]
	; IR-NEXT: [[TMP11:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[TMP10]])			; IR-NEXT: [[TMP11:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[TMP10]])
	; IR-NEXT: [[TMP12:%.*]] = select i1 [[TMP6]], i32 -2147483648, i32 [[IN]]			; IR-NEXT: [[TMP12:%.*]] = select i1 [[TMP6]], i32 -2147483648, i32 [[IN]]
	; IR-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[TMP11]], [[TMP12]]			; IR-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[TMP11]], [[TMP12]]
	; IR-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32 [[TMP12]]			; IR-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32 [[TMP12]]
	; IR-NEXT: store i32 [[TMP14]], ptr addrspace(1) [[OUT2:%.*]], align 4			; IR-NEXT: store i32 [[TMP14]], ptr addrspace(1) [[OUT2]], align 4
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index			%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index
	%gep = getelementptr i32, ptr addrspace(1) %ptr, i64 4			%gep = getelementptr i32, ptr addrspace(1) %ptr, i64 4
	%val = atomicrmw volatile max ptr addrspace(1) %gep, i32 %in syncscope("workgroup") seq_cst			%val = atomicrmw volatile max ptr addrspace(1) %gep, i32 %in syncscope("workgroup") seq_cst
	store i32 %val, ptr addrspace(1) %out2			store i32 %val, ptr addrspace(1) %out2
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_max_i32(ptr addrspace(1) %out, i32 %in) {			define amdgpu_kernel void @atomic_max_i32(ptr addrspace(1) %out, i32 %in) {
	; IR-LABEL: @atomic_max_i32(			; IR-LABEL: define amdgpu_kernel void @atomic_max_i32
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[IN:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]			; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]
	; IR: 7:			; IR: 7:
	; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile max ptr addrspace(1) [[OUT:%.*]], i32 [[IN:%.*]] syncscope("workgroup") seq_cst, align 4			; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile max ptr addrspace(1) [[OUT]], i32 [[IN]] syncscope("workgroup") seq_cst, align 4
	; IR-NEXT: br label [[TMP9]]			; IR-NEXT: br label [[TMP9]]
	; IR: 9:			; IR: 9:
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%val = atomicrmw volatile max ptr addrspace(1) %out, i32 %in syncscope("workgroup") seq_cst			%val = atomicrmw volatile max ptr addrspace(1) %out, i32 %in syncscope("workgroup") seq_cst
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_max_i32_ret(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in) {			define amdgpu_kernel void @atomic_max_i32_ret(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in) {
	; IR-LABEL: @atomic_max_i32_ret(			; IR-LABEL: define amdgpu_kernel void @atomic_max_i32_ret
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], ptr addrspace(1) [[OUT2:%.*]], i32 [[IN:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]			; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]
	; IR: 7:			; IR: 7:
	; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile max ptr addrspace(1) [[OUT:%.*]], i32 [[IN:%.*]] syncscope("workgroup") seq_cst, align 4			; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile max ptr addrspace(1) [[OUT]], i32 [[IN]] syncscope("workgroup") seq_cst, align 4
	; IR-NEXT: br label [[TMP9]]			; IR-NEXT: br label [[TMP9]]
	; IR: 9:			; IR: 9:
	; IR-NEXT: [[TMP10:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP8]], [[TMP7]] ]			; IR-NEXT: [[TMP10:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP8]], [[TMP7]] ]
	; IR-NEXT: [[TMP11:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[TMP10]])			; IR-NEXT: [[TMP11:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[TMP10]])
	; IR-NEXT: [[TMP12:%.*]] = select i1 [[TMP6]], i32 -2147483648, i32 [[IN]]			; IR-NEXT: [[TMP12:%.*]] = select i1 [[TMP6]], i32 -2147483648, i32 [[IN]]
	; IR-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[TMP11]], [[TMP12]]			; IR-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[TMP11]], [[TMP12]]
	; IR-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32 [[TMP12]]			; IR-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32 [[TMP12]]
	; IR-NEXT: store i32 [[TMP14]], ptr addrspace(1) [[OUT2:%.*]], align 4			; IR-NEXT: store i32 [[TMP14]], ptr addrspace(1) [[OUT2]], align 4
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%val = atomicrmw volatile max ptr addrspace(1) %out, i32 %in syncscope("workgroup") seq_cst			%val = atomicrmw volatile max ptr addrspace(1) %out, i32 %in syncscope("workgroup") seq_cst
	store i32 %val, ptr addrspace(1) %out2			store i32 %val, ptr addrspace(1) %out2
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_max_i32_addr64(ptr addrspace(1) %out, i32 %in, i64 %index) {			define amdgpu_kernel void @atomic_max_i32_addr64(ptr addrspace(1) %out, i32 %in, i64 %index) {
	; IR-LABEL: @atomic_max_i32_addr64(			; IR-LABEL: define amdgpu_kernel void @atomic_max_i32_addr64
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[IN:%.*]], i64 [[INDEX:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 [[INDEX:%.*]]			; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 [[INDEX]]
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]			; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]
	; IR: 7:			; IR: 7:
	; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile max ptr addrspace(1) [[PTR]], i32 [[IN:%.*]] syncscope("workgroup") seq_cst, align 4			; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile max ptr addrspace(1) [[PTR]], i32 [[IN]] syncscope("workgroup") seq_cst, align 4
	; IR-NEXT: br label [[TMP9]]			; IR-NEXT: br label [[TMP9]]
	; IR: 9:			; IR: 9:
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index			%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index
	%val = atomicrmw volatile max ptr addrspace(1) %ptr, i32 %in syncscope("workgroup") seq_cst			%val = atomicrmw volatile max ptr addrspace(1) %ptr, i32 %in syncscope("workgroup") seq_cst
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_max_i32_ret_addr64(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in, i64 %index) {			define amdgpu_kernel void @atomic_max_i32_ret_addr64(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in, i64 %index) {
	; IR-LABEL: @atomic_max_i32_ret_addr64(			; IR-LABEL: define amdgpu_kernel void @atomic_max_i32_ret_addr64
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], ptr addrspace(1) [[OUT2:%.*]], i32 [[IN:%.*]], i64 [[INDEX:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 [[INDEX:%.*]]			; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 [[INDEX]]
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]			; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]
	; IR: 7:			; IR: 7:
	; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile max ptr addrspace(1) [[PTR]], i32 [[IN:%.*]] syncscope("workgroup") seq_cst, align 4			; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile max ptr addrspace(1) [[PTR]], i32 [[IN]] syncscope("workgroup") seq_cst, align 4
	; IR-NEXT: br label [[TMP9]]			; IR-NEXT: br label [[TMP9]]
	; IR: 9:			; IR: 9:
	; IR-NEXT: [[TMP10:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP8]], [[TMP7]] ]			; IR-NEXT: [[TMP10:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP8]], [[TMP7]] ]
	; IR-NEXT: [[TMP11:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[TMP10]])			; IR-NEXT: [[TMP11:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[TMP10]])
	; IR-NEXT: [[TMP12:%.*]] = select i1 [[TMP6]], i32 -2147483648, i32 [[IN]]			; IR-NEXT: [[TMP12:%.*]] = select i1 [[TMP6]], i32 -2147483648, i32 [[IN]]
	; IR-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[TMP11]], [[TMP12]]			; IR-NEXT: [[TMP13:%.*]] = icmp sgt i32 [[TMP11]], [[TMP12]]
	; IR-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32 [[TMP12]]			; IR-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32 [[TMP12]]
	; IR-NEXT: store i32 [[TMP14]], ptr addrspace(1) [[OUT2:%.*]], align 4			; IR-NEXT: store i32 [[TMP14]], ptr addrspace(1) [[OUT2]], align 4
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index			%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index
	%val = atomicrmw volatile max ptr addrspace(1) %ptr, i32 %in syncscope("workgroup") seq_cst			%val = atomicrmw volatile max ptr addrspace(1) %ptr, i32 %in syncscope("workgroup") seq_cst
	store i32 %val, ptr addrspace(1) %out2			store i32 %val, ptr addrspace(1) %out2
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_umax_i32_offset(ptr addrspace(1) %out, i32 %in) {			define amdgpu_kernel void @atomic_umax_i32_offset(ptr addrspace(1) %out, i32 %in) {
	; IR-LABEL: @atomic_umax_i32_offset(			; IR-LABEL: define amdgpu_kernel void @atomic_umax_i32_offset
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[IN:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 4			; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 4
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]			; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]
	; IR: 7:			; IR: 7:
	; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile umax ptr addrspace(1) [[GEP]], i32 [[IN:%.*]] syncscope("workgroup") seq_cst, align 4			; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile umax ptr addrspace(1) [[GEP]], i32 [[IN]] syncscope("workgroup") seq_cst, align 4
	; IR-NEXT: br label [[TMP9]]			; IR-NEXT: br label [[TMP9]]
	; IR: 9:			; IR: 9:
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%gep = getelementptr i32, ptr addrspace(1) %out, i64 4			%gep = getelementptr i32, ptr addrspace(1) %out, i64 4
	%val = atomicrmw volatile umax ptr addrspace(1) %gep, i32 %in syncscope("workgroup") seq_cst			%val = atomicrmw volatile umax ptr addrspace(1) %gep, i32 %in syncscope("workgroup") seq_cst
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_umax_i32_ret_offset(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in) {			define amdgpu_kernel void @atomic_umax_i32_ret_offset(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in) {
	; IR-LABEL: @atomic_umax_i32_ret_offset(			; IR-LABEL: define amdgpu_kernel void @atomic_umax_i32_ret_offset
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], ptr addrspace(1) [[OUT2:%.*]], i32 [[IN:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 4			; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 4
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]			; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]
	; IR: 7:			; IR: 7:
	; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile umax ptr addrspace(1) [[GEP]], i32 [[IN:%.*]] syncscope("workgroup") seq_cst, align 4			; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile umax ptr addrspace(1) [[GEP]], i32 [[IN]] syncscope("workgroup") seq_cst, align 4
	; IR-NEXT: br label [[TMP9]]			; IR-NEXT: br label [[TMP9]]
	; IR: 9:			; IR: 9:
	; IR-NEXT: [[TMP10:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP8]], [[TMP7]] ]			; IR-NEXT: [[TMP10:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP8]], [[TMP7]] ]
	; IR-NEXT: [[TMP11:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[TMP10]])			; IR-NEXT: [[TMP11:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[TMP10]])
	; IR-NEXT: [[TMP12:%.*]] = select i1 [[TMP6]], i32 0, i32 [[IN]]			; IR-NEXT: [[TMP12:%.*]] = select i1 [[TMP6]], i32 0, i32 [[IN]]
	; IR-NEXT: [[TMP13:%.*]] = icmp ugt i32 [[TMP11]], [[TMP12]]			; IR-NEXT: [[TMP13:%.*]] = icmp ugt i32 [[TMP11]], [[TMP12]]
	; IR-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32 [[TMP12]]			; IR-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32 [[TMP12]]
	; IR-NEXT: store i32 [[TMP14]], ptr addrspace(1) [[OUT2:%.*]], align 4			; IR-NEXT: store i32 [[TMP14]], ptr addrspace(1) [[OUT2]], align 4
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%gep = getelementptr i32, ptr addrspace(1) %out, i64 4			%gep = getelementptr i32, ptr addrspace(1) %out, i64 4
	%val = atomicrmw volatile umax ptr addrspace(1) %gep, i32 %in syncscope("workgroup") seq_cst			%val = atomicrmw volatile umax ptr addrspace(1) %gep, i32 %in syncscope("workgroup") seq_cst
	store i32 %val, ptr addrspace(1) %out2			store i32 %val, ptr addrspace(1) %out2
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_umax_i32_addr64_offset(ptr addrspace(1) %out, i32 %in, i64 %index) {			define amdgpu_kernel void @atomic_umax_i32_addr64_offset(ptr addrspace(1) %out, i32 %in, i64 %index) {
	; IR-LABEL: @atomic_umax_i32_addr64_offset(			; IR-LABEL: define amdgpu_kernel void @atomic_umax_i32_addr64_offset
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[IN:%.*]], i64 [[INDEX:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 [[INDEX:%.*]]			; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 [[INDEX]]
	; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[PTR]], i64 4			; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[PTR]], i64 4
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]			; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]
	; IR: 7:			; IR: 7:
	; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile umax ptr addrspace(1) [[GEP]], i32 [[IN:%.*]] syncscope("workgroup") seq_cst, align 4			; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile umax ptr addrspace(1) [[GEP]], i32 [[IN]] syncscope("workgroup") seq_cst, align 4
	; IR-NEXT: br label [[TMP9]]			; IR-NEXT: br label [[TMP9]]
	; IR: 9:			; IR: 9:
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index			%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index
	%gep = getelementptr i32, ptr addrspace(1) %ptr, i64 4			%gep = getelementptr i32, ptr addrspace(1) %ptr, i64 4
	%val = atomicrmw volatile umax ptr addrspace(1) %gep, i32 %in syncscope("workgroup") seq_cst			%val = atomicrmw volatile umax ptr addrspace(1) %gep, i32 %in syncscope("workgroup") seq_cst
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_umax_i32_ret_addr64_offset(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in, i64 %index) {			define amdgpu_kernel void @atomic_umax_i32_ret_addr64_offset(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in, i64 %index) {
	; IR-LABEL: @atomic_umax_i32_ret_addr64_offset(			; IR-LABEL: define amdgpu_kernel void @atomic_umax_i32_ret_addr64_offset
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], ptr addrspace(1) [[OUT2:%.*]], i32 [[IN:%.*]], i64 [[INDEX:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 [[INDEX:%.*]]			; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 [[INDEX]]
	; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[PTR]], i64 4			; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[PTR]], i64 4
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]			; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]
	; IR: 7:			; IR: 7:
	; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile umax ptr addrspace(1) [[GEP]], i32 [[IN:%.*]] syncscope("workgroup") seq_cst, align 4			; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile umax ptr addrspace(1) [[GEP]], i32 [[IN]] syncscope("workgroup") seq_cst, align 4
	; IR-NEXT: br label [[TMP9]]			; IR-NEXT: br label [[TMP9]]
	; IR: 9:			; IR: 9:
	; IR-NEXT: [[TMP10:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP8]], [[TMP7]] ]			; IR-NEXT: [[TMP10:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP8]], [[TMP7]] ]
	; IR-NEXT: [[TMP11:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[TMP10]])			; IR-NEXT: [[TMP11:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[TMP10]])
	; IR-NEXT: [[TMP12:%.*]] = select i1 [[TMP6]], i32 0, i32 [[IN]]			; IR-NEXT: [[TMP12:%.*]] = select i1 [[TMP6]], i32 0, i32 [[IN]]
	; IR-NEXT: [[TMP13:%.*]] = icmp ugt i32 [[TMP11]], [[TMP12]]			; IR-NEXT: [[TMP13:%.*]] = icmp ugt i32 [[TMP11]], [[TMP12]]
	; IR-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32 [[TMP12]]			; IR-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32 [[TMP12]]
	; IR-NEXT: store i32 [[TMP14]], ptr addrspace(1) [[OUT2:%.*]], align 4			; IR-NEXT: store i32 [[TMP14]], ptr addrspace(1) [[OUT2]], align 4
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index			%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index
	%gep = getelementptr i32, ptr addrspace(1) %ptr, i64 4			%gep = getelementptr i32, ptr addrspace(1) %ptr, i64 4
	%val = atomicrmw volatile umax ptr addrspace(1) %gep, i32 %in syncscope("workgroup") seq_cst			%val = atomicrmw volatile umax ptr addrspace(1) %gep, i32 %in syncscope("workgroup") seq_cst
	store i32 %val, ptr addrspace(1) %out2			store i32 %val, ptr addrspace(1) %out2
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_umax_i32(ptr addrspace(1) %out, i32 %in) {			define amdgpu_kernel void @atomic_umax_i32(ptr addrspace(1) %out, i32 %in) {
	; IR-LABEL: @atomic_umax_i32(			; IR-LABEL: define amdgpu_kernel void @atomic_umax_i32
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[IN:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]			; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]
	; IR: 7:			; IR: 7:
	; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile umax ptr addrspace(1) [[OUT:%.*]], i32 [[IN:%.*]] syncscope("workgroup") seq_cst, align 4			; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile umax ptr addrspace(1) [[OUT]], i32 [[IN]] syncscope("workgroup") seq_cst, align 4
	; IR-NEXT: br label [[TMP9]]			; IR-NEXT: br label [[TMP9]]
	; IR: 9:			; IR: 9:
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%val = atomicrmw volatile umax ptr addrspace(1) %out, i32 %in syncscope("workgroup") seq_cst			%val = atomicrmw volatile umax ptr addrspace(1) %out, i32 %in syncscope("workgroup") seq_cst
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_umax_i32_ret(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in) {			define amdgpu_kernel void @atomic_umax_i32_ret(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in) {
	; IR-LABEL: @atomic_umax_i32_ret(			; IR-LABEL: define amdgpu_kernel void @atomic_umax_i32_ret
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], ptr addrspace(1) [[OUT2:%.*]], i32 [[IN:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]			; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]
	; IR: 7:			; IR: 7:
	; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile umax ptr addrspace(1) [[OUT:%.*]], i32 [[IN:%.*]] syncscope("workgroup") seq_cst, align 4			; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile umax ptr addrspace(1) [[OUT]], i32 [[IN]] syncscope("workgroup") seq_cst, align 4
	; IR-NEXT: br label [[TMP9]]			; IR-NEXT: br label [[TMP9]]
	; IR: 9:			; IR: 9:
	; IR-NEXT: [[TMP10:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP8]], [[TMP7]] ]			; IR-NEXT: [[TMP10:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP8]], [[TMP7]] ]
	; IR-NEXT: [[TMP11:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[TMP10]])			; IR-NEXT: [[TMP11:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[TMP10]])
	; IR-NEXT: [[TMP12:%.*]] = select i1 [[TMP6]], i32 0, i32 [[IN]]			; IR-NEXT: [[TMP12:%.*]] = select i1 [[TMP6]], i32 0, i32 [[IN]]
	; IR-NEXT: [[TMP13:%.*]] = icmp ugt i32 [[TMP11]], [[TMP12]]			; IR-NEXT: [[TMP13:%.*]] = icmp ugt i32 [[TMP11]], [[TMP12]]
	; IR-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32 [[TMP12]]			; IR-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32 [[TMP12]]
	; IR-NEXT: store i32 [[TMP14]], ptr addrspace(1) [[OUT2:%.*]], align 4			; IR-NEXT: store i32 [[TMP14]], ptr addrspace(1) [[OUT2]], align 4
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%val = atomicrmw volatile umax ptr addrspace(1) %out, i32 %in syncscope("workgroup") seq_cst			%val = atomicrmw volatile umax ptr addrspace(1) %out, i32 %in syncscope("workgroup") seq_cst
	store i32 %val, ptr addrspace(1) %out2			store i32 %val, ptr addrspace(1) %out2
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_umax_i32_addr64(ptr addrspace(1) %out, i32 %in, i64 %index) {			define amdgpu_kernel void @atomic_umax_i32_addr64(ptr addrspace(1) %out, i32 %in, i64 %index) {
	; IR-LABEL: @atomic_umax_i32_addr64(			; IR-LABEL: define amdgpu_kernel void @atomic_umax_i32_addr64
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[IN:%.*]], i64 [[INDEX:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 [[INDEX:%.*]]			; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 [[INDEX]]
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]			; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]
	; IR: 7:			; IR: 7:
	; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile umax ptr addrspace(1) [[PTR]], i32 [[IN:%.*]] syncscope("workgroup") seq_cst, align 4			; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile umax ptr addrspace(1) [[PTR]], i32 [[IN]] syncscope("workgroup") seq_cst, align 4
	; IR-NEXT: br label [[TMP9]]			; IR-NEXT: br label [[TMP9]]
	; IR: 9:			; IR: 9:
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index			%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index
	%val = atomicrmw volatile umax ptr addrspace(1) %ptr, i32 %in syncscope("workgroup") seq_cst			%val = atomicrmw volatile umax ptr addrspace(1) %ptr, i32 %in syncscope("workgroup") seq_cst
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_umax_i32_ret_addr64(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in, i64 %index) {			define amdgpu_kernel void @atomic_umax_i32_ret_addr64(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in, i64 %index) {
	; IR-LABEL: @atomic_umax_i32_ret_addr64(			; IR-LABEL: define amdgpu_kernel void @atomic_umax_i32_ret_addr64
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], ptr addrspace(1) [[OUT2:%.*]], i32 [[IN:%.*]], i64 [[INDEX:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 [[INDEX:%.*]]			; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 [[INDEX]]
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]			; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]
	; IR: 7:			; IR: 7:
	; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile umax ptr addrspace(1) [[PTR]], i32 [[IN:%.*]] syncscope("workgroup") seq_cst, align 4			; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile umax ptr addrspace(1) [[PTR]], i32 [[IN]] syncscope("workgroup") seq_cst, align 4
	; IR-NEXT: br label [[TMP9]]			; IR-NEXT: br label [[TMP9]]
	; IR: 9:			; IR: 9:
	; IR-NEXT: [[TMP10:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP8]], [[TMP7]] ]			; IR-NEXT: [[TMP10:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP8]], [[TMP7]] ]
	; IR-NEXT: [[TMP11:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[TMP10]])			; IR-NEXT: [[TMP11:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[TMP10]])
	; IR-NEXT: [[TMP12:%.*]] = select i1 [[TMP6]], i32 0, i32 [[IN]]			; IR-NEXT: [[TMP12:%.*]] = select i1 [[TMP6]], i32 0, i32 [[IN]]
	; IR-NEXT: [[TMP13:%.*]] = icmp ugt i32 [[TMP11]], [[TMP12]]			; IR-NEXT: [[TMP13:%.*]] = icmp ugt i32 [[TMP11]], [[TMP12]]
	; IR-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32 [[TMP12]]			; IR-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32 [[TMP12]]
	; IR-NEXT: store i32 [[TMP14]], ptr addrspace(1) [[OUT2:%.*]], align 4			; IR-NEXT: store i32 [[TMP14]], ptr addrspace(1) [[OUT2]], align 4
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index			%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index
	%val = atomicrmw volatile umax ptr addrspace(1) %ptr, i32 %in syncscope("workgroup") seq_cst			%val = atomicrmw volatile umax ptr addrspace(1) %ptr, i32 %in syncscope("workgroup") seq_cst
	store i32 %val, ptr addrspace(1) %out2			store i32 %val, ptr addrspace(1) %out2
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_min_i32_offset(ptr addrspace(1) %out, i32 %in) {			define amdgpu_kernel void @atomic_min_i32_offset(ptr addrspace(1) %out, i32 %in) {
	; IR-LABEL: @atomic_min_i32_offset(			; IR-LABEL: define amdgpu_kernel void @atomic_min_i32_offset
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[IN:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 4			; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 4
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]			; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]
	; IR: 7:			; IR: 7:
	; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile min ptr addrspace(1) [[GEP]], i32 [[IN:%.*]] syncscope("workgroup") seq_cst, align 4			; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile min ptr addrspace(1) [[GEP]], i32 [[IN]] syncscope("workgroup") seq_cst, align 4
	; IR-NEXT: br label [[TMP9]]			; IR-NEXT: br label [[TMP9]]
	; IR: 9:			; IR: 9:
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%gep = getelementptr i32, ptr addrspace(1) %out, i64 4			%gep = getelementptr i32, ptr addrspace(1) %out, i64 4
	%val = atomicrmw volatile min ptr addrspace(1) %gep, i32 %in syncscope("workgroup") seq_cst			%val = atomicrmw volatile min ptr addrspace(1) %gep, i32 %in syncscope("workgroup") seq_cst
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_min_i32_ret_offset(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in) {			define amdgpu_kernel void @atomic_min_i32_ret_offset(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in) {
	; IR-LABEL: @atomic_min_i32_ret_offset(			; IR-LABEL: define amdgpu_kernel void @atomic_min_i32_ret_offset
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], ptr addrspace(1) [[OUT2:%.*]], i32 [[IN:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 4			; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 4
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]			; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]
	; IR: 7:			; IR: 7:
	; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile min ptr addrspace(1) [[GEP]], i32 [[IN:%.*]] syncscope("workgroup") seq_cst, align 4			; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile min ptr addrspace(1) [[GEP]], i32 [[IN]] syncscope("workgroup") seq_cst, align 4
	; IR-NEXT: br label [[TMP9]]			; IR-NEXT: br label [[TMP9]]
	; IR: 9:			; IR: 9:
	; IR-NEXT: [[TMP10:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP8]], [[TMP7]] ]			; IR-NEXT: [[TMP10:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP8]], [[TMP7]] ]
	; IR-NEXT: [[TMP11:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[TMP10]])			; IR-NEXT: [[TMP11:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[TMP10]])
	; IR-NEXT: [[TMP12:%.*]] = select i1 [[TMP6]], i32 2147483647, i32 [[IN]]			; IR-NEXT: [[TMP12:%.*]] = select i1 [[TMP6]], i32 2147483647, i32 [[IN]]
	; IR-NEXT: [[TMP13:%.*]] = icmp slt i32 [[TMP11]], [[TMP12]]			; IR-NEXT: [[TMP13:%.*]] = icmp slt i32 [[TMP11]], [[TMP12]]
	; IR-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32 [[TMP12]]			; IR-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32 [[TMP12]]
	; IR-NEXT: store i32 [[TMP14]], ptr addrspace(1) [[OUT2:%.*]], align 4			; IR-NEXT: store i32 [[TMP14]], ptr addrspace(1) [[OUT2]], align 4
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%gep = getelementptr i32, ptr addrspace(1) %out, i64 4			%gep = getelementptr i32, ptr addrspace(1) %out, i64 4
	%val = atomicrmw volatile min ptr addrspace(1) %gep, i32 %in syncscope("workgroup") seq_cst			%val = atomicrmw volatile min ptr addrspace(1) %gep, i32 %in syncscope("workgroup") seq_cst
	store i32 %val, ptr addrspace(1) %out2			store i32 %val, ptr addrspace(1) %out2
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_min_i32_addr64_offset(ptr addrspace(1) %out, i32 %in, i64 %index) {			define amdgpu_kernel void @atomic_min_i32_addr64_offset(ptr addrspace(1) %out, i32 %in, i64 %index) {
	; IR-LABEL: @atomic_min_i32_addr64_offset(			; IR-LABEL: define amdgpu_kernel void @atomic_min_i32_addr64_offset
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[IN:%.*]], i64 [[INDEX:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 [[INDEX:%.*]]			; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 [[INDEX]]
	; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[PTR]], i64 4			; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[PTR]], i64 4
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]			; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]
	; IR: 7:			; IR: 7:
	; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile min ptr addrspace(1) [[GEP]], i32 [[IN:%.*]] syncscope("workgroup") seq_cst, align 4			; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile min ptr addrspace(1) [[GEP]], i32 [[IN]] syncscope("workgroup") seq_cst, align 4
	; IR-NEXT: br label [[TMP9]]			; IR-NEXT: br label [[TMP9]]
	; IR: 9:			; IR: 9:
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index			%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index
	%gep = getelementptr i32, ptr addrspace(1) %ptr, i64 4			%gep = getelementptr i32, ptr addrspace(1) %ptr, i64 4
	%val = atomicrmw volatile min ptr addrspace(1) %gep, i32 %in syncscope("workgroup") seq_cst			%val = atomicrmw volatile min ptr addrspace(1) %gep, i32 %in syncscope("workgroup") seq_cst
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_min_i32_ret_addr64_offset(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in, i64 %index) {			define amdgpu_kernel void @atomic_min_i32_ret_addr64_offset(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in, i64 %index) {
	; IR-LABEL: @atomic_min_i32_ret_addr64_offset(			; IR-LABEL: define amdgpu_kernel void @atomic_min_i32_ret_addr64_offset
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], ptr addrspace(1) [[OUT2:%.*]], i32 [[IN:%.*]], i64 [[INDEX:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 [[INDEX:%.*]]			; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 [[INDEX]]
	; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[PTR]], i64 4			; IR-NEXT: [[GEP:%.*]] = getelementptr i32, ptr addrspace(1) [[PTR]], i64 4
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]			; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]
	; IR: 7:			; IR: 7:
	; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile min ptr addrspace(1) [[GEP]], i32 [[IN:%.*]] syncscope("workgroup") seq_cst, align 4			; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile min ptr addrspace(1) [[GEP]], i32 [[IN]] syncscope("workgroup") seq_cst, align 4
	; IR-NEXT: br label [[TMP9]]			; IR-NEXT: br label [[TMP9]]
	; IR: 9:			; IR: 9:
	; IR-NEXT: [[TMP10:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP8]], [[TMP7]] ]			; IR-NEXT: [[TMP10:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP8]], [[TMP7]] ]
	; IR-NEXT: [[TMP11:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[TMP10]])			; IR-NEXT: [[TMP11:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[TMP10]])
	; IR-NEXT: [[TMP12:%.*]] = select i1 [[TMP6]], i32 2147483647, i32 [[IN]]			; IR-NEXT: [[TMP12:%.*]] = select i1 [[TMP6]], i32 2147483647, i32 [[IN]]
	; IR-NEXT: [[TMP13:%.*]] = icmp slt i32 [[TMP11]], [[TMP12]]			; IR-NEXT: [[TMP13:%.*]] = icmp slt i32 [[TMP11]], [[TMP12]]
	; IR-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32 [[TMP12]]			; IR-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32 [[TMP12]]
	; IR-NEXT: store i32 [[TMP14]], ptr addrspace(1) [[OUT2:%.*]], align 4			; IR-NEXT: store i32 [[TMP14]], ptr addrspace(1) [[OUT2]], align 4
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index			%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index
	%gep = getelementptr i32, ptr addrspace(1) %ptr, i64 4			%gep = getelementptr i32, ptr addrspace(1) %ptr, i64 4
	%val = atomicrmw volatile min ptr addrspace(1) %gep, i32 %in syncscope("workgroup") seq_cst			%val = atomicrmw volatile min ptr addrspace(1) %gep, i32 %in syncscope("workgroup") seq_cst
	store i32 %val, ptr addrspace(1) %out2			store i32 %val, ptr addrspace(1) %out2
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_min_i32(ptr addrspace(1) %out, i32 %in) {			define amdgpu_kernel void @atomic_min_i32(ptr addrspace(1) %out, i32 %in) {
	; IR-LABEL: @atomic_min_i32(			; IR-LABEL: define amdgpu_kernel void @atomic_min_i32
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[IN:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]			; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]
	; IR: 7:			; IR: 7:
	; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile min ptr addrspace(1) [[OUT:%.*]], i32 [[IN:%.*]] syncscope("workgroup") seq_cst, align 4			; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile min ptr addrspace(1) [[OUT]], i32 [[IN]] syncscope("workgroup") seq_cst, align 4
	; IR-NEXT: br label [[TMP9]]			; IR-NEXT: br label [[TMP9]]
	; IR: 9:			; IR: 9:
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%val = atomicrmw volatile min ptr addrspace(1) %out, i32 %in syncscope("workgroup") seq_cst			%val = atomicrmw volatile min ptr addrspace(1) %out, i32 %in syncscope("workgroup") seq_cst
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_min_i32_ret(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in) {			define amdgpu_kernel void @atomic_min_i32_ret(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in) {
	; IR-LABEL: @atomic_min_i32_ret(			; IR-LABEL: define amdgpu_kernel void @atomic_min_i32_ret
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], ptr addrspace(1) [[OUT2:%.*]], i32 [[IN:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]			; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]
	; IR: 7:			; IR: 7:
	; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile min ptr addrspace(1) [[OUT:%.*]], i32 [[IN:%.*]] syncscope("workgroup") seq_cst, align 4			; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile min ptr addrspace(1) [[OUT]], i32 [[IN]] syncscope("workgroup") seq_cst, align 4
	; IR-NEXT: br label [[TMP9]]			; IR-NEXT: br label [[TMP9]]
	; IR: 9:			; IR: 9:
	; IR-NEXT: [[TMP10:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP8]], [[TMP7]] ]			; IR-NEXT: [[TMP10:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP8]], [[TMP7]] ]
	; IR-NEXT: [[TMP11:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[TMP10]])			; IR-NEXT: [[TMP11:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[TMP10]])
	; IR-NEXT: [[TMP12:%.*]] = select i1 [[TMP6]], i32 2147483647, i32 [[IN]]			; IR-NEXT: [[TMP12:%.*]] = select i1 [[TMP6]], i32 2147483647, i32 [[IN]]
	; IR-NEXT: [[TMP13:%.*]] = icmp slt i32 [[TMP11]], [[TMP12]]			; IR-NEXT: [[TMP13:%.*]] = icmp slt i32 [[TMP11]], [[TMP12]]
	; IR-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32 [[TMP12]]			; IR-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32 [[TMP12]]
	; IR-NEXT: store i32 [[TMP14]], ptr addrspace(1) [[OUT2:%.*]], align 4			; IR-NEXT: store i32 [[TMP14]], ptr addrspace(1) [[OUT2]], align 4
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%val = atomicrmw volatile min ptr addrspace(1) %out, i32 %in syncscope("workgroup") seq_cst			%val = atomicrmw volatile min ptr addrspace(1) %out, i32 %in syncscope("workgroup") seq_cst
	store i32 %val, ptr addrspace(1) %out2			store i32 %val, ptr addrspace(1) %out2
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_min_i32_addr64(ptr addrspace(1) %out, i32 %in, i64 %index) {			define amdgpu_kernel void @atomic_min_i32_addr64(ptr addrspace(1) %out, i32 %in, i64 %index) {
	; IR-LABEL: @atomic_min_i32_addr64(			; IR-LABEL: define amdgpu_kernel void @atomic_min_i32_addr64
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[IN:%.*]], i64 [[INDEX:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 [[INDEX:%.*]]			; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 [[INDEX]]
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]			; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]
	; IR: 7:			; IR: 7:
	; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile min ptr addrspace(1) [[PTR]], i32 [[IN:%.*]] syncscope("workgroup") seq_cst, align 4			; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile min ptr addrspace(1) [[PTR]], i32 [[IN]] syncscope("workgroup") seq_cst, align 4
	; IR-NEXT: br label [[TMP9]]			; IR-NEXT: br label [[TMP9]]
	; IR: 9:			; IR: 9:
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index			%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index
	%val = atomicrmw volatile min ptr addrspace(1) %ptr, i32 %in syncscope("workgroup") seq_cst			%val = atomicrmw volatile min ptr addrspace(1) %ptr, i32 %in syncscope("workgroup") seq_cst
	ret void			ret void
	}			}

	define amdgpu_kernel void @atomic_min_i32_ret_addr64(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in, i64 %index) {			define amdgpu_kernel void @atomic_min_i32_ret_addr64(ptr addrspace(1) %out, ptr addrspace(1) %out2, i32 %in, i64 %index) {
	; IR-LABEL: @atomic_min_i32_ret_addr64(			; IR-LABEL: define amdgpu_kernel void @atomic_min_i32_ret_addr64
				; IR-SAME: (ptr addrspace(1) [[OUT:%.*]], ptr addrspace(1) [[OUT2:%.*]], i32 [[IN:%.*]], i64 [[INDEX:%.*]]) {
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT:%.*]], i64 [[INDEX:%.*]]			; IR-NEXT: [[PTR:%.*]] = getelementptr i32, ptr addrspace(1) [[OUT]], i64 [[INDEX]]
	; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)			; IR-NEXT: [[TMP0:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 true)
	; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>			; IR-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <2 x i32>
	; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0			; IR-NEXT: [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 0
	; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1			; IR-NEXT: [[TMP3:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
	; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)			; IR-NEXT: [[TMP4:%.*]] = call i32 @llvm.amdgcn.mbcnt.lo(i32 [[TMP2]], i32 0)
	; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])			; IR-NEXT: [[TMP5:%.*]] = call i32 @llvm.amdgcn.mbcnt.hi(i32 [[TMP3]], i32 [[TMP4]])
	; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0			; IR-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP5]], 0
	; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]			; IR-NEXT: br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP9:%.*]]
	; IR: 7:			; IR: 7:
	; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile min ptr addrspace(1) [[PTR]], i32 [[IN:%.*]] syncscope("workgroup") seq_cst, align 4			; IR-NEXT: [[TMP8:%.*]] = atomicrmw volatile min ptr addrspace(1) [[PTR]], i32 [[IN]] syncscope("workgroup") seq_cst, align 4
	; IR-NEXT: br label [[TMP9]]			; IR-NEXT: br label [[TMP9]]
	; IR: 9:			; IR: 9:
	; IR-NEXT: [[TMP10:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP8]], [[TMP7]] ]			; IR-NEXT: [[TMP10:%.*]] = phi i32 [ poison, [[ENTRY:%.*]] ], [ [[TMP8]], [[TMP7]] ]
	; IR-NEXT: [[TMP11:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[TMP10]])			; IR-NEXT: [[TMP11:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[TMP10]])
	; IR-NEXT: [[TMP12:%.*]] = select i1 [[TMP6]], i32 2147483647, i32 [[IN]]			; IR-NEXT: [[TMP12:%.*]] = select i1 [[TMP6]], i32 2147483647, i32 [[IN]]
	; IR-NEXT: [[TMP13:%.*]] = icmp slt i32 [[TMP11]], [[TMP12]]			; IR-NEXT: [[TMP13:%.*]] = icmp slt i32 [[TMP11]], [[TMP12]]
	; IR-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32 [[TMP12]]			; IR-NEXT: [[TMP14:%.*]] = select i1 [[TMP13]], i32 [[TMP11]], i32 [[TMP12]]
	; IR-NEXT: store i32 [[TMP14]], ptr addrspace(1) [[OUT2:%.*]], align 4			; IR-NEXT: store i32 [[TMP14]], ptr addrspace(1) [[OUT2]], align 4
	; IR-NEXT: ret void			; IR-NEXT: ret void
	;			;
	entry:			entry:
	%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index			%ptr = getelementptr i32, ptr addrspace(1) %out, i64 %index
	%val = atomicrmw volatile min ptr addrspace(1) %ptr, i32 %in syncscope("workgroup") seq_cst			%val = atomicrmw volatile min ptr addrspace(1) %ptr, i32 %in syncscope("workgroup") seq_cst
	store i32 %val, ptr addrspace(1) %out2			store i32 %val, ptr addrspace(1) %out2
	ret void			ret void
	}			}

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -global-isel=0 -amdgpu-load-store-vectorizer=0 -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10,GFX10-SDAG %s			; RUN: llc -global-isel=0 -amdgpu-load-store-vectorizer=0 -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10,GFX10-SDAG %s
	; RUN: llc -global-isel=1 -amdgpu-load-store-vectorizer=0 -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10,GFX10-GISEL %s			; RUN: llc -global-isel=1 -amdgpu-load-store-vectorizer=0 -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10,GFX10-GISEL %s
	; RUN: llc -global-isel=0 -amdgpu-load-store-vectorizer=0 -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX11,GFX11-SDAG %s			; RUN: llc -global-isel=0 -amdgpu-load-store-vectorizer=0 -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX11,GFX11-SDAG %s
	; RUN: llc -global-isel=1 -amdgpu-load-store-vectorizer=0 -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX11,GFX11-GISEL %s			; RUN: llc -global-isel=1 -amdgpu-load-store-vectorizer=0 -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX11,GFX11-GISEL %s

	declare i32 @llvm.amdgcn.permlane16(i32, i32, i32, i32, i1, i1)			declare i32 @llvm.amdgcn.permlane16(i32, i32, i32, i32, i1, i1)
	declare i32 @llvm.amdgcn.permlanex16(i32, i32, i32, i32, i1, i1)			declare i32 @llvm.amdgcn.permlanex16(i32, i32, i32, i32, i1, i1)


				declare i16 @llvm.amdgcn.permlane16.i16(i16, i16, i32, i32, i1, i1) #0
				declare half @llvm.amdgcn.permlane16.f16(half, half, i32, i32, i1, i1) #0
				declare float @llvm.amdgcn.permlane16.f32(float, float, i32, i32, i1, i1) #0
				declare <3 x i16> @llvm.amdgcn.permlane16.v3i16(<3 x i16>, <3 x i16>, i32, i32, i1, i1) #0
				declare <9 x float> @llvm.amdgcn.permlane16.v9f32(<9 x float>, <9 x float>, i32, i32, i1, i1) #0

				declare bfloat @llvm.amdgcn.permlane16.bfloat(bfloat, bfloat, i32, i32, i1, i1) #0
				declare <2 x bfloat> @llvm.amdgcn.permlane16.v2bf(<2 x bfloat>, <2 x bfloat>, i32, i32, i1, i1) #0
				declare <2 x i16> @llvm.amdgcn.permlane16.v2i16(<2 x i16>, <2 x i16>, i32, i32, i1, i1) #0
				declare <2 x half> @llvm.amdgcn.permlane16.v2f16(<2 x half>, <2 x half>, i32, i32, i1, i1) #0

				declare i16 @llvm.amdgcn.permlanex16.i16(i16, i16, i32, i32, i1, i1) #0
				declare half @llvm.amdgcn.permlanex16.f16(half, half, i32, i32, i1, i1) #0
				declare float @llvm.amdgcn.permlanex16.f32(float, float, i32, i32, i1, i1) #0
				declare <3 x i16> @llvm.amdgcn.permlanex16.v3i16(<3 x i16>, <3 x i16>, i32, i32, i1, i1) #0
				declare <9 x float> @llvm.amdgcn.permlanex16.v9f32(<9 x float>, <9 x float>, i32, i32, i1, i1) #0

				declare bfloat @llvm.amdgcn.permlanex16.bfloat(bfloat, bfloat, i32, i32, i1, i1) #0
				declare <2 x bfloat> @llvm.amdgcn.permlanex16.v2bf(<2 x bfloat>, <2 x bfloat>, i32, i32, i1, i1) #0
				declare <2 x i16> @llvm.amdgcn.permlanex16.v2i16(<2 x i16>, <2 x i16>, i32, i32, i1, i1) #0
				declare <2 x half> @llvm.amdgcn.permlanex16.v2f16(<2 x half>, <2 x half>, i32, i32, i1, i1) #0


	declare i32 @llvm.amdgcn.workitem.id.x()			declare i32 @llvm.amdgcn.workitem.id.x()
	declare i32 @llvm.amdgcn.workitem.id.y()			declare i32 @llvm.amdgcn.workitem.id.y()

	define amdgpu_kernel void @v_permlane16_b32_vss(ptr addrspace(1) %out, i32 %src0, i32 %src1, i32 %src2) {			define amdgpu_kernel void @v_permlane16_b32_vss(ptr addrspace(1) %out, i32 %src0, i32 %src1, i32 %src2) {
	; GFX10-LABEL: v_permlane16_b32_vss:			; GFX10-LABEL: v_permlane16_b32_vss:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_clause 0x1			; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24			; GFX10-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
	[...]			[...]
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	%tidx = call i32 @llvm.amdgcn.workitem.id.x()			%tidx = call i32 @llvm.amdgcn.workitem.id.x()
	%undef = freeze i32 poison			%undef = freeze i32 poison
	%v = call i32 @llvm.amdgcn.permlane16(i32 %undef, i32 %tidx, i32 %src1, i32 %src2, i1 true, i1 true)			%v = call i32 @llvm.amdgcn.permlane16(i32 %undef, i32 %tidx, i32 %src1, i32 %src2, i1 true, i1 true)
	store i32 %v, ptr addrspace(1) %out			store i32 %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

				define void @test_permlane_i16(ptr addrspace(1) %out, i16 %src0, i32 %src1, i32 %src2) #1 {
				; GFX10-LABEL: test_permlane_i16:
				; GFX10: ; %bb.0:
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: v_and_b32_e32 v2, 0xffff, v2
				; GFX10-NEXT: v_readfirstlane_b32 s4, v3
				; GFX10-NEXT: v_readfirstlane_b32 s5, v4
				; GFX10-NEXT: v_permlane16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-NEXT: global_store_short v[0:1], v2, off
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-LABEL: test_permlane_i16:
				; GFX11: ; %bb.0:
				; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: v_and_b32_e32 v2, 0xffff, v2
				; GFX11-NEXT: v_readfirstlane_b32 s0, v3
				; GFX11-NEXT: v_readfirstlane_b32 s1, v4
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; GFX11-NEXT: v_permlane16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-NEXT: global_store_b16 v[0:1], v2, off
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_setpc_b64 s[30:31]
				%v = call i16 @llvm.amdgcn.permlane16.i16(i16 %src0, i16 %src0, i32 %src1, i32 %src2, i1 true, i1 false)
				store i16 %v, ptr addrspace(1) %out, align 4
				ret void
				}

				define void @test_permlane_f16(ptr addrspace(1) %out, half %src0, i32 %src1, i32 %src2) #1 {
				; GFX10-LABEL: test_permlane_f16:
				; GFX10: ; %bb.0:
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: v_and_b32_e32 v2, 0xffff, v2
				; GFX10-NEXT: v_readfirstlane_b32 s4, v3
				; GFX10-NEXT: v_readfirstlane_b32 s5, v4
				; GFX10-NEXT: v_permlane16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-NEXT: global_store_short v[0:1], v2, off
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-LABEL: test_permlane_f16:
				; GFX11: ; %bb.0:
				; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: v_and_b32_e32 v2, 0xffff, v2
				; GFX11-NEXT: v_readfirstlane_b32 s0, v3
				; GFX11-NEXT: v_readfirstlane_b32 s1, v4
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; GFX11-NEXT: v_permlane16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-NEXT: global_store_b16 v[0:1], v2, off
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_setpc_b64 s[30:31]
				%v = call half @llvm.amdgcn.permlane16.f16(half %src0, half %src0, i32 %src1, i32 %src2, i1 true, i1 false)
				store half %v, ptr addrspace(1) %out, align 4
				ret void
				}

				define void @test_permlane_f32(ptr addrspace(1) %out, float %src0, i32 %src1, i32 %src2) #1 {
				; GFX10-LABEL: test_permlane_f32:
				; GFX10: ; %bb.0:
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: v_readfirstlane_b32 s4, v3
				; GFX10-NEXT: v_readfirstlane_b32 s5, v4
				; GFX10-NEXT: v_permlane16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-NEXT: global_store_dword v[0:1], v2, off
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-LABEL: test_permlane_f32:
				; GFX11: ; %bb.0:
				; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: v_readfirstlane_b32 s0, v3
				; GFX11-NEXT: v_readfirstlane_b32 s1, v4
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; GFX11-NEXT: v_permlane16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-NEXT: global_store_b32 v[0:1], v2, off
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_setpc_b64 s[30:31]
				%v = call float @llvm.amdgcn.permlane16.f32(float %src0, float %src0, i32 %src1, i32 %src2, i1 true, i1 false)
				store float %v, ptr addrspace(1) %out, align 4
				ret void
				}

				define void @test_permlane_bfloat(ptr addrspace(1) %out, bfloat %src0, i32 %src1, i32 %src2) #1 {
				; GFX10-SDAG-LABEL: test_permlane_bfloat:
				; GFX10-SDAG: ; %bb.0:
				; GFX10-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-SDAG-NEXT: v_lshrrev_b32_e32 v2, 16, v2
				; GFX10-SDAG-NEXT: v_readfirstlane_b32 s4, v3
				; GFX10-SDAG-NEXT: v_readfirstlane_b32 s5, v4
				; GFX10-SDAG-NEXT: v_permlane16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-SDAG-NEXT: global_store_short v[0:1], v2, off
				; GFX10-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-SDAG-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX10-GISEL-LABEL: test_permlane_bfloat:
				; GFX10-GISEL: ; %bb.0:
				; GFX10-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-GISEL-NEXT: v_and_b32_e32 v2, 0xffff, v2
				; GFX10-GISEL-NEXT: v_readfirstlane_b32 s4, v3
				; GFX10-GISEL-NEXT: v_readfirstlane_b32 s5, v4
				; GFX10-GISEL-NEXT: v_permlane16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-GISEL-NEXT: global_store_short v[0:1], v2, off
				; GFX10-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-GISEL-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-SDAG-LABEL: test_permlane_bfloat:
				; GFX11-SDAG: ; %bb.0:
				; GFX11-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-SDAG-NEXT: v_lshrrev_b32_e32 v2, 16, v2
				; GFX11-SDAG-NEXT: v_readfirstlane_b32 s0, v3
				; GFX11-SDAG-NEXT: v_readfirstlane_b32 s1, v4
				; GFX11-SDAG-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; GFX11-SDAG-NEXT: v_permlane16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-SDAG-NEXT: global_store_b16 v[0:1], v2, off
				; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-SDAG-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-GISEL-LABEL: test_permlane_bfloat:
				; GFX11-GISEL: ; %bb.0:
				; GFX11-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-GISEL-NEXT: v_and_b32_e32 v2, 0xffff, v2
				; GFX11-GISEL-NEXT: v_readfirstlane_b32 s0, v3
				; GFX11-GISEL-NEXT: v_readfirstlane_b32 s1, v4
				; GFX11-GISEL-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; GFX11-GISEL-NEXT: v_permlane16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-GISEL-NEXT: global_store_b16 v[0:1], v2, off
				; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-GISEL-NEXT: s_setpc_b64 s[30:31]
				%v = call bfloat @llvm.amdgcn.permlane16.bfloat(bfloat %src0, bfloat %src0, i32 %src1, i32 %src2, i1 true, i1 false)
				store bfloat %v, ptr addrspace(1) %out, align 4
				ret void
				}
				define void @test_permlane_v2bf(ptr addrspace(1) %out, <2 x bfloat> %src0, i32 %src1, i32 %src2) #1 {
				; GFX10-SDAG-LABEL: test_permlane_v2bf:
				; GFX10-SDAG: ; %bb.0:
				; GFX10-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-SDAG-NEXT: v_readfirstlane_b32 s4, v3
				; GFX10-SDAG-NEXT: v_readfirstlane_b32 s5, v4
				; GFX10-SDAG-NEXT: v_permlane16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-SDAG-NEXT: global_store_dword v[0:1], v2, off
				; GFX10-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-SDAG-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX10-GISEL-LABEL: test_permlane_v2bf:
				; GFX10-GISEL: ; %bb.0:
				; GFX10-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-GISEL-NEXT: v_bfi_b32 v2, 0xffff, v2, v2
				; GFX10-GISEL-NEXT: v_readfirstlane_b32 s4, v3
				; GFX10-GISEL-NEXT: v_readfirstlane_b32 s5, v4
				; GFX10-GISEL-NEXT: v_permlane16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-GISEL-NEXT: v_bfi_b32 v2, 0xffff, v2, v2
				; GFX10-GISEL-NEXT: global_store_dword v[0:1], v2, off
				; GFX10-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-GISEL-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-SDAG-LABEL: test_permlane_v2bf:
				; GFX11-SDAG: ; %bb.0:
				; GFX11-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-SDAG-NEXT: v_readfirstlane_b32 s0, v3
				; GFX11-SDAG-NEXT: v_readfirstlane_b32 s1, v4
				; GFX11-SDAG-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; GFX11-SDAG-NEXT: v_permlane16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-SDAG-NEXT: global_store_b32 v[0:1], v2, off
				; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-SDAG-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-GISEL-LABEL: test_permlane_v2bf:
				; GFX11-GISEL: ; %bb.0:
				; GFX11-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-GISEL-NEXT: v_bfi_b32 v2, 0xffff, v2, v2
				; GFX11-GISEL-NEXT: v_readfirstlane_b32 s0, v3
				; GFX11-GISEL-NEXT: v_readfirstlane_b32 s1, v4
				; GFX11-GISEL-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
				; GFX11-GISEL-NEXT: v_permlane16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-GISEL-NEXT: v_bfi_b32 v2, 0xffff, v2, v2
				; GFX11-GISEL-NEXT: global_store_b32 v[0:1], v2, off
				; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-GISEL-NEXT: s_setpc_b64 s[30:31]
				%v = call <2 x bfloat> @llvm.amdgcn.permlane16.v2bf(<2 x bfloat> %src0, <2 x bfloat> %src0, i32 %src1, i32 %src2, i1 true, i1 false)
				store <2 x bfloat> %v, ptr addrspace(1) %out, align 4
				ret void
				}
				define void @test_permlane_v2i16(ptr addrspace(1) %out, <2 x i16> %src0, i32 %src1, i32 %src2) #1 {
				; GFX10-SDAG-LABEL: test_permlane_v2i16:
				; GFX10-SDAG: ; %bb.0:
				; GFX10-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-SDAG-NEXT: v_readfirstlane_b32 s4, v3
				; GFX10-SDAG-NEXT: v_readfirstlane_b32 s5, v4
				; GFX10-SDAG-NEXT: v_permlane16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-SDAG-NEXT: global_store_dword v[0:1], v2, off
				; GFX10-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-SDAG-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX10-GISEL-LABEL: test_permlane_v2i16:
				; GFX10-GISEL: ; %bb.0:
				; GFX10-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-GISEL-NEXT: v_readfirstlane_b32 s4, v3
				; GFX10-GISEL-NEXT: v_readfirstlane_b32 s5, v4
				; GFX10-GISEL-NEXT: v_permlane16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-GISEL-NEXT: v_bfi_b32 v2, 0xffff, v2, v2
				; GFX10-GISEL-NEXT: global_store_dword v[0:1], v2, off
				; GFX10-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-GISEL-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-SDAG-LABEL: test_permlane_v2i16:
				; GFX11-SDAG: ; %bb.0:
				; GFX11-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-SDAG-NEXT: v_readfirstlane_b32 s0, v3
				; GFX11-SDAG-NEXT: v_readfirstlane_b32 s1, v4
				; GFX11-SDAG-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; GFX11-SDAG-NEXT: v_permlane16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-SDAG-NEXT: global_store_b32 v[0:1], v2, off
				; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-SDAG-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-GISEL-LABEL: test_permlane_v2i16:
				; GFX11-GISEL: ; %bb.0:
				; GFX11-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-GISEL-NEXT: v_readfirstlane_b32 s0, v3
				; GFX11-GISEL-NEXT: v_readfirstlane_b32 s1, v4
				; GFX11-GISEL-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
				; GFX11-GISEL-NEXT: v_permlane16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-GISEL-NEXT: v_bfi_b32 v2, 0xffff, v2, v2
				; GFX11-GISEL-NEXT: global_store_b32 v[0:1], v2, off
				; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-GISEL-NEXT: s_setpc_b64 s[30:31]
				%v = call <2 x i16> @llvm.amdgcn.permlane16.v2i16(<2 x i16> %src0, <2 x i16> %src0, i32 %src1, i32 %src2, i1 true, i1 false)
				store <2 x i16> %v, ptr addrspace(1) %out, align 4
				ret void
				}
				define void @test_permlane_v2f16(ptr addrspace(1) %out, <2 x half> %src0, i32 %src1, i32 %src2) #1 {
				; GFX10-SDAG-LABEL: test_permlane_v2f16:
				; GFX10-SDAG: ; %bb.0:
				; GFX10-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-SDAG-NEXT: v_readfirstlane_b32 s4, v3
				; GFX10-SDAG-NEXT: v_readfirstlane_b32 s5, v4
				; GFX10-SDAG-NEXT: v_permlane16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-SDAG-NEXT: global_store_dword v[0:1], v2, off
				; GFX10-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-SDAG-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX10-GISEL-LABEL: test_permlane_v2f16:
				; GFX10-GISEL: ; %bb.0:
				; GFX10-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-GISEL-NEXT: v_readfirstlane_b32 s4, v3
				; GFX10-GISEL-NEXT: v_readfirstlane_b32 s5, v4
				; GFX10-GISEL-NEXT: v_permlane16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-GISEL-NEXT: v_bfi_b32 v2, 0xffff, v2, v2
				; GFX10-GISEL-NEXT: global_store_dword v[0:1], v2, off
				; GFX10-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-GISEL-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-SDAG-LABEL: test_permlane_v2f16:
				; GFX11-SDAG: ; %bb.0:
				; GFX11-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-SDAG-NEXT: v_readfirstlane_b32 s0, v3
				; GFX11-SDAG-NEXT: v_readfirstlane_b32 s1, v4
				; GFX11-SDAG-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; GFX11-SDAG-NEXT: v_permlane16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-SDAG-NEXT: global_store_b32 v[0:1], v2, off
				; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-SDAG-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-GISEL-LABEL: test_permlane_v2f16:
				; GFX11-GISEL: ; %bb.0:
				; GFX11-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-GISEL-NEXT: v_readfirstlane_b32 s0, v3
				; GFX11-GISEL-NEXT: v_readfirstlane_b32 s1, v4
				; GFX11-GISEL-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
				; GFX11-GISEL-NEXT: v_permlane16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-GISEL-NEXT: v_bfi_b32 v2, 0xffff, v2, v2
				; GFX11-GISEL-NEXT: global_store_b32 v[0:1], v2, off
				; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-GISEL-NEXT: s_setpc_b64 s[30:31]
				%v = call <2 x half> @llvm.amdgcn.permlane16.v2f16(<2 x half> %src0, <2 x half> %src0, i32 %src1, i32 %src2, i1 true, i1 false)
				store <2 x half> %v, ptr addrspace(1) %out, align 4
				ret void
				}

				define void @test_permlane_v3i16(ptr addrspace(1) %out, <3 x i16> %src0, i32 %src1, i32 %src2) #1 {
				; GFX10-SDAG-LABEL: test_permlane_v3i16:
				; GFX10-SDAG: ; %bb.0:
				; GFX10-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-SDAG-NEXT: v_readfirstlane_b32 s4, v4
				; GFX10-SDAG-NEXT: v_readfirstlane_b32 s5, v5
				; GFX10-SDAG-NEXT: v_permlane16_b32 v3, v3, s4, s5 op_sel:[1,0]
				; GFX10-SDAG-NEXT: v_permlane16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-SDAG-NEXT: global_store_short v[0:1], v3, off offset:4
				; GFX10-SDAG-NEXT: global_store_dword v[0:1], v2, off
				; GFX10-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-SDAG-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX10-GISEL-LABEL: test_permlane_v3i16:
				; GFX10-GISEL: ; %bb.0:
				; GFX10-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-GISEL-NEXT: v_and_b32_e32 v3, 0xffff, v3
				; GFX10-GISEL-NEXT: v_readfirstlane_b32 s4, v4
				; GFX10-GISEL-NEXT: v_readfirstlane_b32 s5, v5
				; GFX10-GISEL-NEXT: v_lshl_or_b32 v3, s4, 16, v3
				; GFX10-GISEL-NEXT: v_permlane16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-GISEL-NEXT: v_permlane16_b32 v3, v3, s4, s5 op_sel:[1,0]
				; GFX10-GISEL-NEXT: global_store_short v[0:1], v2, off
				; GFX10-GISEL-NEXT: global_store_short_d16_hi v[0:1], v2, off offset:2
				; GFX10-GISEL-NEXT: global_store_short v[0:1], v3, off offset:4
				; GFX10-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-GISEL-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-SDAG-LABEL: test_permlane_v3i16:
				; GFX11-SDAG: ; %bb.0:
				; GFX11-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-SDAG-NEXT: v_readfirstlane_b32 s0, v4
				; GFX11-SDAG-NEXT: v_readfirstlane_b32 s1, v5
				; GFX11-SDAG-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; GFX11-SDAG-NEXT: v_permlane16_b32 v3, v3, s0, s1 op_sel:[1,0]
				; GFX11-SDAG-NEXT: v_permlane16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-SDAG-NEXT: s_clause 0x1
				; GFX11-SDAG-NEXT: global_store_b16 v[0:1], v3, off offset:4
				; GFX11-SDAG-NEXT: global_store_b32 v[0:1], v2, off
				; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-SDAG-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-GISEL-LABEL: test_permlane_v3i16:
				; GFX11-GISEL: ; %bb.0:
				; GFX11-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-GISEL-NEXT: v_and_b32_e32 v3, 0xffff, v3
				; GFX11-GISEL-NEXT: v_readfirstlane_b32 s0, v4
				; GFX11-GISEL-NEXT: v_readfirstlane_b32 s1, v5
				; GFX11-GISEL-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
				; GFX11-GISEL-NEXT: v_lshl_or_b32 v3, s0, 16, v3
				; GFX11-GISEL-NEXT: v_permlane16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-GISEL-NEXT: s_delay_alu instid0(VALU_DEP_2)
				; GFX11-GISEL-NEXT: v_permlane16_b32 v3, v3, s0, s1 op_sel:[1,0]
				; GFX11-GISEL-NEXT: s_clause 0x2
				; GFX11-GISEL-NEXT: global_store_b16 v[0:1], v2, off
				; GFX11-GISEL-NEXT: global_store_d16_hi_b16 v[0:1], v2, off offset:2
				; GFX11-GISEL-NEXT: global_store_b16 v[0:1], v3, off offset:4
				; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-GISEL-NEXT: s_setpc_b64 s[30:31]
				%v = call <3 x i16> @llvm.amdgcn.permlane16.v3i16(<3 x i16> %src0, <3 x i16> %src0, i32 %src1, i32 %src2, i1 true, i1 false)
				store <3 x i16> %v, ptr addrspace(1) %out, align 4
				ret void
				}

				define void @test_permlane_v9f32(ptr addrspace(1) %out, <9 x float> %src0, i32 %src1, i32 %src2) #1 {
				; GFX10-SDAG-LABEL: test_permlane_v9f32:
				; GFX10-SDAG: ; %bb.0:
				; GFX10-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-SDAG-NEXT: v_readfirstlane_b32 s4, v11
				; GFX10-SDAG-NEXT: v_readfirstlane_b32 s5, v12
				; GFX10-SDAG-NEXT: v_permlane16_b32 v10, v10, s4, s5 op_sel:[1,0]
				; GFX10-SDAG-NEXT: v_permlane16_b32 v6, v6, s4, s5 op_sel:[1,0]
				; GFX10-SDAG-NEXT: v_permlane16_b32 v7, v7, s4, s5 op_sel:[1,0]
				; GFX10-SDAG-NEXT: v_permlane16_b32 v8, v8, s4, s5 op_sel:[1,0]
				; GFX10-SDAG-NEXT: v_permlane16_b32 v9, v9, s4, s5 op_sel:[1,0]
				; GFX10-SDAG-NEXT: v_permlane16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-SDAG-NEXT: v_permlane16_b32 v3, v3, s4, s5 op_sel:[1,0]
				; GFX10-SDAG-NEXT: v_permlane16_b32 v4, v4, s4, s5 op_sel:[1,0]
				; GFX10-SDAG-NEXT: v_permlane16_b32 v5, v5, s4, s5 op_sel:[1,0]
				; GFX10-SDAG-NEXT: global_store_dword v[0:1], v10, off offset:32
				; GFX10-SDAG-NEXT: global_store_dwordx4 v[0:1], v[6:9], off offset:16
				; GFX10-SDAG-NEXT: global_store_dwordx4 v[0:1], v[2:5], off
				; GFX10-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-SDAG-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX10-GISEL-LABEL: test_permlane_v9f32:
				; GFX10-GISEL: ; %bb.0:
				; GFX10-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-GISEL-NEXT: v_readfirstlane_b32 s4, v11
				; GFX10-GISEL-NEXT: v_readfirstlane_b32 s5, v12
				; GFX10-GISEL-NEXT: v_permlane16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-GISEL-NEXT: v_permlane16_b32 v3, v3, s4, s5 op_sel:[1,0]
				; GFX10-GISEL-NEXT: v_permlane16_b32 v4, v4, s4, s5 op_sel:[1,0]
				; GFX10-GISEL-NEXT: v_permlane16_b32 v5, v5, s4, s5 op_sel:[1,0]
				; GFX10-GISEL-NEXT: v_permlane16_b32 v6, v6, s4, s5 op_sel:[1,0]
				; GFX10-GISEL-NEXT: v_permlane16_b32 v7, v7, s4, s5 op_sel:[1,0]
				; GFX10-GISEL-NEXT: v_permlane16_b32 v8, v8, s4, s5 op_sel:[1,0]
				; GFX10-GISEL-NEXT: v_permlane16_b32 v9, v9, s4, s5 op_sel:[1,0]
				; GFX10-GISEL-NEXT: v_permlane16_b32 v10, v10, s4, s5 op_sel:[1,0]
				; GFX10-GISEL-NEXT: global_store_dwordx4 v[0:1], v[2:5], off
				; GFX10-GISEL-NEXT: global_store_dwordx4 v[0:1], v[6:9], off offset:16
				; GFX10-GISEL-NEXT: global_store_dword v[0:1], v10, off offset:32
				; GFX10-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-GISEL-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-SDAG-LABEL: test_permlane_v9f32:
				; GFX11-SDAG: ; %bb.0:
				; GFX11-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-SDAG-NEXT: v_readfirstlane_b32 s0, v11
				; GFX11-SDAG-NEXT: v_readfirstlane_b32 s1, v12
				; GFX11-SDAG-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; GFX11-SDAG-NEXT: v_permlane16_b32 v10, v10, s0, s1 op_sel:[1,0]
				; GFX11-SDAG-NEXT: v_permlane16_b32 v6, v6, s0, s1 op_sel:[1,0]
				; GFX11-SDAG-NEXT: v_permlane16_b32 v7, v7, s0, s1 op_sel:[1,0]
				; GFX11-SDAG-NEXT: v_permlane16_b32 v8, v8, s0, s1 op_sel:[1,0]
				; GFX11-SDAG-NEXT: v_permlane16_b32 v9, v9, s0, s1 op_sel:[1,0]
				; GFX11-SDAG-NEXT: v_permlane16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-SDAG-NEXT: v_permlane16_b32 v3, v3, s0, s1 op_sel:[1,0]
				; GFX11-SDAG-NEXT: v_permlane16_b32 v4, v4, s0, s1 op_sel:[1,0]
				; GFX11-SDAG-NEXT: v_permlane16_b32 v5, v5, s0, s1 op_sel:[1,0]
				; GFX11-SDAG-NEXT: s_clause 0x2
				; GFX11-SDAG-NEXT: global_store_b32 v[0:1], v10, off offset:32
				; GFX11-SDAG-NEXT: global_store_b128 v[0:1], v[6:9], off offset:16
				; GFX11-SDAG-NEXT: global_store_b128 v[0:1], v[2:5], off
				; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-SDAG-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-GISEL-LABEL: test_permlane_v9f32:
				; GFX11-GISEL: ; %bb.0:
				; GFX11-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-GISEL-NEXT: v_readfirstlane_b32 s0, v11
				; GFX11-GISEL-NEXT: v_readfirstlane_b32 s1, v12
				; GFX11-GISEL-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; GFX11-GISEL-NEXT: v_permlane16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-GISEL-NEXT: v_permlane16_b32 v3, v3, s0, s1 op_sel:[1,0]
				; GFX11-GISEL-NEXT: v_permlane16_b32 v4, v4, s0, s1 op_sel:[1,0]
				; GFX11-GISEL-NEXT: v_permlane16_b32 v5, v5, s0, s1 op_sel:[1,0]
				; GFX11-GISEL-NEXT: v_permlane16_b32 v6, v6, s0, s1 op_sel:[1,0]
				; GFX11-GISEL-NEXT: v_permlane16_b32 v7, v7, s0, s1 op_sel:[1,0]
				; GFX11-GISEL-NEXT: v_permlane16_b32 v8, v8, s0, s1 op_sel:[1,0]
				; GFX11-GISEL-NEXT: v_permlane16_b32 v9, v9, s0, s1 op_sel:[1,0]
				; GFX11-GISEL-NEXT: v_permlane16_b32 v10, v10, s0, s1 op_sel:[1,0]
				; GFX11-GISEL-NEXT: s_clause 0x2
				; GFX11-GISEL-NEXT: global_store_b128 v[0:1], v[2:5], off
				; GFX11-GISEL-NEXT: global_store_b128 v[0:1], v[6:9], off offset:16
				; GFX11-GISEL-NEXT: global_store_b32 v[0:1], v10, off offset:32
				; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-GISEL-NEXT: s_setpc_b64 s[30:31]
				%v = call <9 x float> @llvm.amdgcn.permlane16.v9f32(<9 x float> %src0, <9 x float> %src0, i32 %src1, i32 %src2, i1 true, i1 false)
				store <9 x float> %v, ptr addrspace(1) %out, align 4
				ret void
				}

	define amdgpu_kernel void @v_permlanex16_b32_tid_tid(ptr addrspace(1) %out, i32 %src0, i32 %src1, i32 %src2) {
	; GFX10-LABEL: v_permlanex16_b32_tid_tid:
	; GFX10: ; %bb.0:
	; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x30
	; GFX10-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x24
	; GFX10-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
	; GFX11-NEXT: s_endpgm
	%tidx = call i32 @llvm.amdgcn.workitem.id.x()
	%undef = freeze i32 poison
	%v = call i32 @llvm.amdgcn.permlanex16(i32 %undef, i32 %tidx, i32 %src1, i32 %src2, i1 true, i1 true)
	store i32 %v, ptr addrspace(1) %out
	ret void
	}

				define void @test_permlanex16_i16(ptr addrspace(1) %out, i16 %src0, i32 %src1, i32 %src2) #1 {
				; GFX10-LABEL: test_permlanex16_i16:
				; GFX10: ; %bb.0:
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: v_and_b32_e32 v2, 0xffff, v2
				; GFX10-NEXT: v_readfirstlane_b32 s4, v3
				; GFX10-NEXT: v_readfirstlane_b32 s5, v4
				; GFX10-NEXT: v_permlanex16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-NEXT: global_store_short v[0:1], v2, off
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-LABEL: test_permlanex16_i16:
				; GFX11: ; %bb.0:
				; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: v_and_b32_e32 v2, 0xffff, v2
				; GFX11-NEXT: v_readfirstlane_b32 s0, v3
				; GFX11-NEXT: v_readfirstlane_b32 s1, v4
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; GFX11-NEXT: v_permlanex16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-NEXT: global_store_b16 v[0:1], v2, off
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_setpc_b64 s[30:31]
				%v = call i16 @llvm.amdgcn.permlanex16.i16(i16 %src0, i16 %src0, i32 %src1, i32 %src2, i1 true, i1 false)
				store i16 %v, ptr addrspace(1) %out, align 4
				ret void
				}

				define void @test_permlanex16_f16(ptr addrspace(1) %out, half %src0, i32 %src1, i32 %src2) #1 {
				; GFX10-LABEL: test_permlanex16_f16:
				; GFX10: ; %bb.0:
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: v_and_b32_e32 v2, 0xffff, v2
				; GFX10-NEXT: v_readfirstlane_b32 s4, v3
				; GFX10-NEXT: v_readfirstlane_b32 s5, v4
				; GFX10-NEXT: v_permlanex16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-NEXT: global_store_short v[0:1], v2, off
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-LABEL: test_permlanex16_f16:
				; GFX11: ; %bb.0:
				; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: v_and_b32_e32 v2, 0xffff, v2
				; GFX11-NEXT: v_readfirstlane_b32 s0, v3
				; GFX11-NEXT: v_readfirstlane_b32 s1, v4
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; GFX11-NEXT: v_permlanex16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-NEXT: global_store_b16 v[0:1], v2, off
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_setpc_b64 s[30:31]
				%v = call half @llvm.amdgcn.permlanex16.f16(half %src0, half %src0, i32 %src1, i32 %src2, i1 true, i1 false)
				store half %v, ptr addrspace(1) %out, align 4
				ret void
				}

				define void @test_permlanex16_f32(ptr addrspace(1) %out, float %src0, i32 %src1, i32 %src2) #1 {
				; GFX10-LABEL: test_permlanex16_f32:
				; GFX10: ; %bb.0:
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: v_readfirstlane_b32 s4, v3
				; GFX10-NEXT: v_readfirstlane_b32 s5, v4
				; GFX10-NEXT: v_permlanex16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-NEXT: global_store_dword v[0:1], v2, off
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-LABEL: test_permlanex16_f32:
				; GFX11: ; %bb.0:
				; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: v_readfirstlane_b32 s0, v3
				; GFX11-NEXT: v_readfirstlane_b32 s1, v4
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; GFX11-NEXT: v_permlanex16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-NEXT: global_store_b32 v[0:1], v2, off
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_setpc_b64 s[30:31]
				%v = call float @llvm.amdgcn.permlanex16.f32(float %src0, float %src0, i32 %src1, i32 %src2, i1 true, i1 false)
				store float %v, ptr addrspace(1) %out, align 4
				ret void
				}

				define void @test_permlanex16_bfloat(ptr addrspace(1) %out, bfloat %src0, i32 %src1, i32 %src2) #1 {
				; GFX10-SDAG-LABEL: test_permlanex16_bfloat:
				; GFX10-SDAG: ; %bb.0:
				; GFX10-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-SDAG-NEXT: v_lshrrev_b32_e32 v2, 16, v2
				; GFX10-SDAG-NEXT: v_readfirstlane_b32 s4, v3
				; GFX10-SDAG-NEXT: v_readfirstlane_b32 s5, v4
				; GFX10-SDAG-NEXT: v_permlanex16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-SDAG-NEXT: global_store_short v[0:1], v2, off
				; GFX10-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-SDAG-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX10-GISEL-LABEL: test_permlanex16_bfloat:
				; GFX10-GISEL: ; %bb.0:
				; GFX10-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-GISEL-NEXT: v_and_b32_e32 v2, 0xffff, v2
				; GFX10-GISEL-NEXT: v_readfirstlane_b32 s4, v3
				; GFX10-GISEL-NEXT: v_readfirstlane_b32 s5, v4
				; GFX10-GISEL-NEXT: v_permlanex16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-GISEL-NEXT: global_store_short v[0:1], v2, off
				; GFX10-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-GISEL-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-SDAG-LABEL: test_permlanex16_bfloat:
				; GFX11-SDAG: ; %bb.0:
				; GFX11-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-SDAG-NEXT: v_lshrrev_b32_e32 v2, 16, v2
				; GFX11-SDAG-NEXT: v_readfirstlane_b32 s0, v3
				; GFX11-SDAG-NEXT: v_readfirstlane_b32 s1, v4
				; GFX11-SDAG-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; GFX11-SDAG-NEXT: v_permlanex16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-SDAG-NEXT: global_store_b16 v[0:1], v2, off
				; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-SDAG-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-GISEL-LABEL: test_permlanex16_bfloat:
				; GFX11-GISEL: ; %bb.0:
				; GFX11-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-GISEL-NEXT: v_and_b32_e32 v2, 0xffff, v2
				; GFX11-GISEL-NEXT: v_readfirstlane_b32 s0, v3
				; GFX11-GISEL-NEXT: v_readfirstlane_b32 s1, v4
				; GFX11-GISEL-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; GFX11-GISEL-NEXT: v_permlanex16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-GISEL-NEXT: global_store_b16 v[0:1], v2, off
				; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-GISEL-NEXT: s_setpc_b64 s[30:31]
				%v = call bfloat @llvm.amdgcn.permlanex16.bfloat(bfloat %src0, bfloat %src0, i32 %src1, i32 %src2, i1 true, i1 false)
				store bfloat %v, ptr addrspace(1) %out, align 4
				ret void
				}
				define void @test_permlanex16_v2bf(ptr addrspace(1) %out, <2 x bfloat> %src0, i32 %src1, i32 %src2) #1 {
				; GFX10-SDAG-LABEL: test_permlanex16_v2bf:
				; GFX10-SDAG: ; %bb.0:
				; GFX10-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-SDAG-NEXT: v_readfirstlane_b32 s4, v3
				; GFX10-SDAG-NEXT: v_readfirstlane_b32 s5, v4
				; GFX10-SDAG-NEXT: v_permlanex16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-SDAG-NEXT: global_store_dword v[0:1], v2, off
				; GFX10-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-SDAG-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX10-GISEL-LABEL: test_permlanex16_v2bf:
				; GFX10-GISEL: ; %bb.0:
				; GFX10-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-GISEL-NEXT: v_bfi_b32 v2, 0xffff, v2, v2
				; GFX10-GISEL-NEXT: v_readfirstlane_b32 s4, v3
				; GFX10-GISEL-NEXT: v_readfirstlane_b32 s5, v4
				; GFX10-GISEL-NEXT: v_permlanex16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-GISEL-NEXT: v_bfi_b32 v2, 0xffff, v2, v2
				; GFX10-GISEL-NEXT: global_store_dword v[0:1], v2, off
				; GFX10-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-GISEL-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-SDAG-LABEL: test_permlanex16_v2bf:
				; GFX11-SDAG: ; %bb.0:
				; GFX11-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-SDAG-NEXT: v_readfirstlane_b32 s0, v3
				; GFX11-SDAG-NEXT: v_readfirstlane_b32 s1, v4
				; GFX11-SDAG-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; GFX11-SDAG-NEXT: v_permlanex16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-SDAG-NEXT: global_store_b32 v[0:1], v2, off
				; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-SDAG-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-GISEL-LABEL: test_permlanex16_v2bf:
				; GFX11-GISEL: ; %bb.0:
				; GFX11-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-GISEL-NEXT: v_bfi_b32 v2, 0xffff, v2, v2
				; GFX11-GISEL-NEXT: v_readfirstlane_b32 s0, v3
				; GFX11-GISEL-NEXT: v_readfirstlane_b32 s1, v4
				; GFX11-GISEL-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
				; GFX11-GISEL-NEXT: v_permlanex16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-GISEL-NEXT: v_bfi_b32 v2, 0xffff, v2, v2
				; GFX11-GISEL-NEXT: global_store_b32 v[0:1], v2, off
				; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-GISEL-NEXT: s_setpc_b64 s[30:31]
				%v = call <2 x bfloat> @llvm.amdgcn.permlanex16.v2bf(<2 x bfloat> %src0, <2 x bfloat> %src0, i32 %src1, i32 %src2, i1 true, i1 false)
				store <2 x bfloat> %v, ptr addrspace(1) %out, align 4
				ret void
				}
				define void @test_permlanex16_v2i16(ptr addrspace(1) %out, <2 x i16> %src0, i32 %src1, i32 %src2) #1 {
				; GFX10-SDAG-LABEL: test_permlanex16_v2i16:
				; GFX10-SDAG: ; %bb.0:
				; GFX10-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-SDAG-NEXT: v_readfirstlane_b32 s4, v3
				; GFX10-SDAG-NEXT: v_readfirstlane_b32 s5, v4
				; GFX10-SDAG-NEXT: v_permlanex16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-SDAG-NEXT: global_store_dword v[0:1], v2, off
				; GFX10-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-SDAG-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX10-GISEL-LABEL: test_permlanex16_v2i16:
				; GFX10-GISEL: ; %bb.0:
				; GFX10-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-GISEL-NEXT: v_readfirstlane_b32 s4, v3
				; GFX10-GISEL-NEXT: v_readfirstlane_b32 s5, v4
				; GFX10-GISEL-NEXT: v_permlanex16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-GISEL-NEXT: v_bfi_b32 v2, 0xffff, v2, v2
				; GFX10-GISEL-NEXT: global_store_dword v[0:1], v2, off
				; GFX10-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-GISEL-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-SDAG-LABEL: test_permlanex16_v2i16:
				; GFX11-SDAG: ; %bb.0:
				; GFX11-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-SDAG-NEXT: v_readfirstlane_b32 s0, v3
				; GFX11-SDAG-NEXT: v_readfirstlane_b32 s1, v4
				; GFX11-SDAG-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; GFX11-SDAG-NEXT: v_permlanex16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-SDAG-NEXT: global_store_b32 v[0:1], v2, off
				; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-SDAG-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-GISEL-LABEL: test_permlanex16_v2i16:
				; GFX11-GISEL: ; %bb.0:
				; GFX11-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-GISEL-NEXT: v_readfirstlane_b32 s0, v3
				; GFX11-GISEL-NEXT: v_readfirstlane_b32 s1, v4
				; GFX11-GISEL-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
				; GFX11-GISEL-NEXT: v_permlanex16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-GISEL-NEXT: v_bfi_b32 v2, 0xffff, v2, v2
				; GFX11-GISEL-NEXT: global_store_b32 v[0:1], v2, off
				; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-GISEL-NEXT: s_setpc_b64 s[30:31]
				%v = call <2 x i16> @llvm.amdgcn.permlanex16.v2i16(<2 x i16> %src0, <2 x i16> %src0, i32 %src1, i32 %src2, i1 true, i1 false)
				store <2 x i16> %v, ptr addrspace(1) %out, align 4
				ret void
				}
				define void @test_permlanex16_v2f16(ptr addrspace(1) %out, <2 x half> %src0, i32 %src1, i32 %src2) #1 {
				; GFX10-SDAG-LABEL: test_permlanex16_v2f16:
				; GFX10-SDAG: ; %bb.0:
				; GFX10-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-SDAG-NEXT: v_readfirstlane_b32 s4, v3
				; GFX10-SDAG-NEXT: v_readfirstlane_b32 s5, v4
				; GFX10-SDAG-NEXT: v_permlanex16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-SDAG-NEXT: global_store_dword v[0:1], v2, off
				; GFX10-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-SDAG-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX10-GISEL-LABEL: test_permlanex16_v2f16:
				; GFX10-GISEL: ; %bb.0:
				; GFX10-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-GISEL-NEXT: v_readfirstlane_b32 s4, v3
				; GFX10-GISEL-NEXT: v_readfirstlane_b32 s5, v4
				; GFX10-GISEL-NEXT: v_permlanex16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-GISEL-NEXT: v_bfi_b32 v2, 0xffff, v2, v2
				; GFX10-GISEL-NEXT: global_store_dword v[0:1], v2, off
				; GFX10-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-GISEL-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-SDAG-LABEL: test_permlanex16_v2f16:
				; GFX11-SDAG: ; %bb.0:
				; GFX11-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-SDAG-NEXT: v_readfirstlane_b32 s0, v3
				; GFX11-SDAG-NEXT: v_readfirstlane_b32 s1, v4
				; GFX11-SDAG-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; GFX11-SDAG-NEXT: v_permlanex16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-SDAG-NEXT: global_store_b32 v[0:1], v2, off
				; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-SDAG-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-GISEL-LABEL: test_permlanex16_v2f16:
				; GFX11-GISEL: ; %bb.0:
				; GFX11-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-GISEL-NEXT: v_readfirstlane_b32 s0, v3
				; GFX11-GISEL-NEXT: v_readfirstlane_b32 s1, v4
				; GFX11-GISEL-NEXT: s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
				; GFX11-GISEL-NEXT: v_permlanex16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-GISEL-NEXT: v_bfi_b32 v2, 0xffff, v2, v2
				; GFX11-GISEL-NEXT: global_store_b32 v[0:1], v2, off
				; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-GISEL-NEXT: s_setpc_b64 s[30:31]
				%v = call <2 x half> @llvm.amdgcn.permlanex16.v2f16(<2 x half> %src0, <2 x half> %src0, i32 %src1, i32 %src2, i1 true, i1 false)
				store <2 x half> %v, ptr addrspace(1) %out, align 4
				ret void
				}

				define void @test_permlanex16_v3i16(ptr addrspace(1) %out, <3 x i16> %src0, i32 %src1, i32 %src2) #1 {
				; GFX10-SDAG-LABEL: test_permlanex16_v3i16:
				; GFX10-SDAG: ; %bb.0:
				; GFX10-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-SDAG-NEXT: v_readfirstlane_b32 s4, v4
				; GFX10-SDAG-NEXT: v_readfirstlane_b32 s5, v5
				; GFX10-SDAG-NEXT: v_permlanex16_b32 v3, v3, s4, s5 op_sel:[1,0]
				; GFX10-SDAG-NEXT: v_permlanex16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-SDAG-NEXT: global_store_short v[0:1], v3, off offset:4
				; GFX10-SDAG-NEXT: global_store_dword v[0:1], v2, off
				; GFX10-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-SDAG-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX10-GISEL-LABEL: test_permlanex16_v3i16:
				; GFX10-GISEL: ; %bb.0:
				; GFX10-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-GISEL-NEXT: v_and_b32_e32 v3, 0xffff, v3
				; GFX10-GISEL-NEXT: v_readfirstlane_b32 s4, v4
				; GFX10-GISEL-NEXT: v_readfirstlane_b32 s5, v5
				; GFX10-GISEL-NEXT: v_lshl_or_b32 v3, s4, 16, v3
				; GFX10-GISEL-NEXT: v_permlanex16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-GISEL-NEXT: v_permlanex16_b32 v3, v3, s4, s5 op_sel:[1,0]
				; GFX10-GISEL-NEXT: global_store_short v[0:1], v2, off
				; GFX10-GISEL-NEXT: global_store_short_d16_hi v[0:1], v2, off offset:2
				; GFX10-GISEL-NEXT: global_store_short v[0:1], v3, off offset:4
				; GFX10-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-GISEL-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-SDAG-LABEL: test_permlanex16_v3i16:
				; GFX11-SDAG: ; %bb.0:
				; GFX11-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-SDAG-NEXT: v_readfirstlane_b32 s0, v4
				; GFX11-SDAG-NEXT: v_readfirstlane_b32 s1, v5
				; GFX11-SDAG-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; GFX11-SDAG-NEXT: v_permlanex16_b32 v3, v3, s0, s1 op_sel:[1,0]
				; GFX11-SDAG-NEXT: v_permlanex16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-SDAG-NEXT: s_clause 0x1
				; GFX11-SDAG-NEXT: global_store_b16 v[0:1], v3, off offset:4
				; GFX11-SDAG-NEXT: global_store_b32 v[0:1], v2, off
				; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-SDAG-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-GISEL-LABEL: test_permlanex16_v3i16:
				; GFX11-GISEL: ; %bb.0:
				; GFX11-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-GISEL-NEXT: v_and_b32_e32 v3, 0xffff, v3
				; GFX11-GISEL-NEXT: v_readfirstlane_b32 s0, v4
				; GFX11-GISEL-NEXT: v_readfirstlane_b32 s1, v5
				; GFX11-GISEL-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
				; GFX11-GISEL-NEXT: v_lshl_or_b32 v3, s0, 16, v3
				; GFX11-GISEL-NEXT: v_permlanex16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-GISEL-NEXT: s_delay_alu instid0(VALU_DEP_2)
				; GFX11-GISEL-NEXT: v_permlanex16_b32 v3, v3, s0, s1 op_sel:[1,0]
				; GFX11-GISEL-NEXT: s_clause 0x2
				; GFX11-GISEL-NEXT: global_store_b16 v[0:1], v2, off
				; GFX11-GISEL-NEXT: global_store_d16_hi_b16 v[0:1], v2, off offset:2
				; GFX11-GISEL-NEXT: global_store_b16 v[0:1], v3, off offset:4
				; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-GISEL-NEXT: s_setpc_b64 s[30:31]
				%v = call <3 x i16> @llvm.amdgcn.permlanex16.v3i16(<3 x i16> %src0, <3 x i16> %src0, i32 %src1, i32 %src2, i1 true, i1 false)
				store <3 x i16> %v, ptr addrspace(1) %out, align 4
				ret void
				}

				define void @test_permlanex16_v9f32(ptr addrspace(1) %out, <9 x float> %src0, i32 %src1, i32 %src2) #1 {
				; GFX10-SDAG-LABEL: test_permlanex16_v9f32:
				; GFX10-SDAG: ; %bb.0:
				; GFX10-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-SDAG-NEXT: v_readfirstlane_b32 s4, v11
				; GFX10-SDAG-NEXT: v_readfirstlane_b32 s5, v12
				; GFX10-SDAG-NEXT: v_permlanex16_b32 v10, v10, s4, s5 op_sel:[1,0]
				; GFX10-SDAG-NEXT: v_permlanex16_b32 v6, v6, s4, s5 op_sel:[1,0]
				; GFX10-SDAG-NEXT: v_permlanex16_b32 v7, v7, s4, s5 op_sel:[1,0]
				; GFX10-SDAG-NEXT: v_permlanex16_b32 v8, v8, s4, s5 op_sel:[1,0]
				; GFX10-SDAG-NEXT: v_permlanex16_b32 v9, v9, s4, s5 op_sel:[1,0]
				; GFX10-SDAG-NEXT: v_permlanex16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-SDAG-NEXT: v_permlanex16_b32 v3, v3, s4, s5 op_sel:[1,0]
				; GFX10-SDAG-NEXT: v_permlanex16_b32 v4, v4, s4, s5 op_sel:[1,0]
				; GFX10-SDAG-NEXT: v_permlanex16_b32 v5, v5, s4, s5 op_sel:[1,0]
				; GFX10-SDAG-NEXT: global_store_dword v[0:1], v10, off offset:32
				; GFX10-SDAG-NEXT: global_store_dwordx4 v[0:1], v[6:9], off offset:16
				; GFX10-SDAG-NEXT: global_store_dwordx4 v[0:1], v[2:5], off
				; GFX10-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-SDAG-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX10-GISEL-LABEL: test_permlanex16_v9f32:
				; GFX10-GISEL: ; %bb.0:
				; GFX10-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-GISEL-NEXT: v_readfirstlane_b32 s4, v11
				; GFX10-GISEL-NEXT: v_readfirstlane_b32 s5, v12
				; GFX10-GISEL-NEXT: v_permlanex16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-GISEL-NEXT: v_permlanex16_b32 v3, v3, s4, s5 op_sel:[1,0]
				; GFX10-GISEL-NEXT: v_permlanex16_b32 v4, v4, s4, s5 op_sel:[1,0]
				; GFX10-GISEL-NEXT: v_permlanex16_b32 v5, v5, s4, s5 op_sel:[1,0]
				; GFX10-GISEL-NEXT: v_permlanex16_b32 v6, v6, s4, s5 op_sel:[1,0]
				; GFX10-GISEL-NEXT: v_permlanex16_b32 v7, v7, s4, s5 op_sel:[1,0]
				; GFX10-GISEL-NEXT: v_permlanex16_b32 v8, v8, s4, s5 op_sel:[1,0]
				; GFX10-GISEL-NEXT: v_permlanex16_b32 v9, v9, s4, s5 op_sel:[1,0]
				; GFX10-GISEL-NEXT: v_permlanex16_b32 v10, v10, s4, s5 op_sel:[1,0]
				; GFX10-GISEL-NEXT: global_store_dwordx4 v[0:1], v[2:5], off
				; GFX10-GISEL-NEXT: global_store_dwordx4 v[0:1], v[6:9], off offset:16
				; GFX10-GISEL-NEXT: global_store_dword v[0:1], v10, off offset:32
				; GFX10-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-GISEL-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-SDAG-LABEL: test_permlanex16_v9f32:
				; GFX11-SDAG: ; %bb.0:
				; GFX11-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-SDAG-NEXT: v_readfirstlane_b32 s0, v11
				; GFX11-SDAG-NEXT: v_readfirstlane_b32 s1, v12
				; GFX11-SDAG-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; GFX11-SDAG-NEXT: v_permlanex16_b32 v10, v10, s0, s1 op_sel:[1,0]
				; GFX11-SDAG-NEXT: v_permlanex16_b32 v6, v6, s0, s1 op_sel:[1,0]
				; GFX11-SDAG-NEXT: v_permlanex16_b32 v7, v7, s0, s1 op_sel:[1,0]
				; GFX11-SDAG-NEXT: v_permlanex16_b32 v8, v8, s0, s1 op_sel:[1,0]
				; GFX11-SDAG-NEXT: v_permlanex16_b32 v9, v9, s0, s1 op_sel:[1,0]
				; GFX11-SDAG-NEXT: v_permlanex16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-SDAG-NEXT: v_permlanex16_b32 v3, v3, s0, s1 op_sel:[1,0]
				; GFX11-SDAG-NEXT: v_permlanex16_b32 v4, v4, s0, s1 op_sel:[1,0]
				; GFX11-SDAG-NEXT: v_permlanex16_b32 v5, v5, s0, s1 op_sel:[1,0]
				; GFX11-SDAG-NEXT: s_clause 0x2
				; GFX11-SDAG-NEXT: global_store_b32 v[0:1], v10, off offset:32
				; GFX11-SDAG-NEXT: global_store_b128 v[0:1], v[6:9], off offset:16
				; GFX11-SDAG-NEXT: global_store_b128 v[0:1], v[2:5], off
				; GFX11-SDAG-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-SDAG-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-GISEL-LABEL: test_permlanex16_v9f32:
				; GFX11-GISEL: ; %bb.0:
				; GFX11-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-GISEL-NEXT: v_readfirstlane_b32 s0, v11
				; GFX11-GISEL-NEXT: v_readfirstlane_b32 s1, v12
				; GFX11-GISEL-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; GFX11-GISEL-NEXT: v_permlanex16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-GISEL-NEXT: v_permlanex16_b32 v3, v3, s0, s1 op_sel:[1,0]
				; GFX11-GISEL-NEXT: v_permlanex16_b32 v4, v4, s0, s1 op_sel:[1,0]
				; GFX11-GISEL-NEXT: v_permlanex16_b32 v5, v5, s0, s1 op_sel:[1,0]
				; GFX11-GISEL-NEXT: v_permlanex16_b32 v6, v6, s0, s1 op_sel:[1,0]
				; GFX11-GISEL-NEXT: v_permlanex16_b32 v7, v7, s0, s1 op_sel:[1,0]
				; GFX11-GISEL-NEXT: v_permlanex16_b32 v8, v8, s0, s1 op_sel:[1,0]
				; GFX11-GISEL-NEXT: v_permlanex16_b32 v9, v9, s0, s1 op_sel:[1,0]
				; GFX11-GISEL-NEXT: v_permlanex16_b32 v10, v10, s0, s1 op_sel:[1,0]
				; GFX11-GISEL-NEXT: s_clause 0x2
				; GFX11-GISEL-NEXT: global_store_b128 v[0:1], v[2:5], off
				; GFX11-GISEL-NEXT: global_store_b128 v[0:1], v[6:9], off offset:16
				; GFX11-GISEL-NEXT: global_store_b32 v[0:1], v10, off offset:32
				; GFX11-GISEL-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-GISEL-NEXT: s_setpc_b64 s[30:31]
				%v = call <9 x float> @llvm.amdgcn.permlanex16.v9f32(<9 x float> %src0, <9 x float> %src0, i32 %src1, i32 %src2, i1 true, i1 false)
				store <9 x float> %v, ptr addrspace(1) %out, align 4
				ret void
				}

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.readfirstlane.ll

				; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=hawaii -verify-machineinstrs < %s | FileCheck -enable-var-scope %s
	; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=fiji -verify-machineinstrs < %s | FileCheck -enable-var-scope %s			; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=fiji -verify-machineinstrs < %s | FileCheck -enable-var-scope %s

	declare i32 @llvm.amdgcn.readfirstlane(i32) #0			declare i32 @llvm.amdgcn.readfirstlane.i32(i32) #0
				declare float @llvm.amdgcn.readfirstlane.f32(float) #0
				declare <2 x half> @llvm.amdgcn.readfirstlane.v2f16(<2 x half>) #0
				declare <2 x i16> @llvm.amdgcn.readfirstlane.v2i16(<2 x i16>) #0
				declare <2 x bfloat> @llvm.amdgcn.readfirstlane.v2bf(<2 x bfloat>) #0
				declare i16 @llvm.amdgcn.readfirstlane.i16(i16) #0
				declare half @llvm.amdgcn.readfirstlane.f16(half) #0
				declare bfloat @llvm.amdgcn.readfirstlane.bf(bfloat) #0
				declare ptr @llvm.amdgcn.readfirstlane.ptr(ptr) #0
				declare ptr addrspace(2) @llvm.amdgcn.readfirstlane.p2(ptr addrspace(2)) #0
				declare ptr addrspace(3) @llvm.amdgcn.readfirstlane.p3(ptr addrspace(3)) #0
				declare ptr addrspace(5) @llvm.amdgcn.readfirstlane.p5(ptr addrspace(5)) #0
				declare ptr addrspace(6) @llvm.amdgcn.readfirstlane.p6(ptr addrspace(6)) #0
				declare <3 x i16> @llvm.amdgcn.readfirstlane.v3i16(<3 x i16>) #0
				declare <9 x float> @llvm.amdgcn.readfirstlane.v9f32(<9 x float>) #0


	; CHECK-LABEL: {{^}}test_readfirstlane:			; CHECK-LABEL: {{^}}test_readfirstlane:
	; CHECK: v_readfirstlane_b32 s{{[0-9]+}}, v2			; CHECK: v_readfirstlane_b32 s{{[0-9]+}}, v2
	define void @test_readfirstlane(ptr addrspace(1) %out, i32 %src) #1 {			define void @test_readfirstlane(ptr addrspace(1) %out, i32 %src) #1 {
	%readfirstlane = call i32 @llvm.amdgcn.readfirstlane(i32 %src)			%readfirstlane = call i32 @llvm.amdgcn.readfirstlane.i32(i32 %src)
	store i32 %readfirstlane, ptr addrspace(1) %out, align 4			store i32 %readfirstlane, ptr addrspace(1) %out, align 4
	ret void			ret void
	}			}

	; CHECK-LABEL: {{^}}test_readfirstlane_imm:			; CHECK-LABEL: {{^}}test_readfirstlane_imm:
	; CHECK: s_mov_b32 [[SGPR_VAL:s[0-9]]], 32			; CHECK: s_mov_b32 [[SGPR_VAL:s[0-9]]], 32
	; CHECK-NOT: [[SGPR_VAL]]			; CHECK-NOT: [[SGPR_VAL]]
	; CHECK: ; use [[SGPR_VAL]]			; CHECK: ; use [[SGPR_VAL]]
	define amdgpu_kernel void @test_readfirstlane_imm(ptr addrspace(1) %out) #1 {			define amdgpu_kernel void @test_readfirstlane_imm(ptr addrspace(1) %out) #1 {
	%readfirstlane = call i32 @llvm.amdgcn.readfirstlane(i32 32)			%readfirstlane = call i32 @llvm.amdgcn.readfirstlane.i32(i32 32)
	call void asm sideeffect "; use $0", "s"(i32 %readfirstlane)			call void asm sideeffect "; use $0", "s"(i32 %readfirstlane)
	ret void			ret void
	}			}

	; CHECK-LABEL: {{^}}test_readfirstlane_imm_fold:			; CHECK-LABEL: {{^}}test_readfirstlane_imm_fold:
	; CHECK: v_mov_b32_e32 [[VVAL:v[0-9]]], 32			; CHECK: v_mov_b32_e32 [[VVAL:v[0-9]]], 32
	; CHECK-NOT: [[VVAL]]			; CHECK-NOT: [[VVAL]]
	; CHECK: flat_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[VVAL]]			; CHECK: flat_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[VVAL]]
	define amdgpu_kernel void @test_readfirstlane_imm_fold(ptr addrspace(1) %out) #1 {			define amdgpu_kernel void @test_readfirstlane_imm_fold(ptr addrspace(1) %out) #1 {
	%readfirstlane = call i32 @llvm.amdgcn.readfirstlane(i32 32)			%readfirstlane = call i32 @llvm.amdgcn.readfirstlane.i32(i32 32)
	store i32 %readfirstlane, ptr addrspace(1) %out, align 4			store i32 %readfirstlane, ptr addrspace(1) %out, align 4
	ret void			ret void
	}			}

	; CHECK-LABEL: {{^}}test_readfirstlane_m0:			; CHECK-LABEL: {{^}}test_readfirstlane_m0:
	; CHECK: s_mov_b32 m0, -1			; CHECK: s_mov_b32 m0, -1
	; CHECK: v_mov_b32_e32 [[VVAL:v[0-9]]], m0			; CHECK: v_mov_b32_e32 [[VVAL:v[0-9]]], m0
	; CHECK: flat_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[VVAL]]			; CHECK: flat_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[VVAL]]
	define amdgpu_kernel void @test_readfirstlane_m0(ptr addrspace(1) %out) #1 {			define amdgpu_kernel void @test_readfirstlane_m0(ptr addrspace(1) %out) #1 {
	%m0 = call i32 asm "s_mov_b32 m0, -1", "={m0}"()			%m0 = call i32 asm "s_mov_b32 m0, -1", "={m0}"()
	%readfirstlane = call i32 @llvm.amdgcn.readfirstlane(i32 %m0)			%readfirstlane = call i32 @llvm.amdgcn.readfirstlane.i32(i32 %m0)
	store i32 %readfirstlane, ptr addrspace(1) %out, align 4			store i32 %readfirstlane, ptr addrspace(1) %out, align 4
	ret void			ret void
	}			}

	; CHECK-LABEL: {{^}}test_readfirstlane_copy_from_sgpr:			; CHECK-LABEL: {{^}}test_readfirstlane_copy_from_sgpr:
	; CHECK: ;;#ASMSTART			; CHECK: ;;#ASMSTART
	; CHECK-NEXT: s_mov_b32 [[SGPR:s[0-9]+]]			; CHECK-NEXT: s_mov_b32 [[SGPR:s[0-9]+]]
	; CHECK: ;;#ASMEND			; CHECK: ;;#ASMEND
	; CHECK-NOT: [[SGPR]]			; CHECK-NOT: [[SGPR]]
	; CHECK-NOT: readfirstlane			; CHECK-NOT: readfirstlane
	; CHECK: v_mov_b32_e32 [[VCOPY:v[0-9]+]], [[SGPR]]			; CHECK: v_mov_b32_e32 [[VCOPY:v[0-9]+]], [[SGPR]]
	; CHECK: flat_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[VCOPY]]			; CHECK: flat_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[VCOPY]]
	define amdgpu_kernel void @test_readfirstlane_copy_from_sgpr(ptr addrspace(1) %out) #1 {			define amdgpu_kernel void @test_readfirstlane_copy_from_sgpr(ptr addrspace(1) %out) #1 {
	%sgpr = call i32 asm "s_mov_b32 $0, 0", "=s"()			%sgpr = call i32 asm "s_mov_b32 $0, 0", "=s"()
	%readfirstlane = call i32 @llvm.amdgcn.readfirstlane(i32 %sgpr)			%readfirstlane = call i32 @llvm.amdgcn.readfirstlane.i32(i32 %sgpr)
	store i32 %readfirstlane, ptr addrspace(1) %out, align 4			store i32 %readfirstlane, ptr addrspace(1) %out, align 4
	ret void			ret void
	}			}

	; Make sure this doesn't crash.			; Make sure this doesn't crash.
	; CHECK-LABEL: {{^}}test_readfirstlane_fi:			; CHECK-LABEL: {{^}}test_readfirstlane_fi:
	; CHECK: s_mov_b32 [[FIVAL:s[0-9]]], 4			; CHECK: s_mov_b32 [[FIVAL:s[0-9]]], 4
	define amdgpu_kernel void @test_readfirstlane_fi(ptr addrspace(1) %out) #1 {			define amdgpu_kernel void @test_readfirstlane_fi(ptr addrspace(1) %out) #1 {
	%alloca = alloca i32, addrspace(5)			%alloca = alloca i32, addrspace(5)
	%int = ptrtoint ptr addrspace(5) %alloca to i32			%int = ptrtoint ptr addrspace(5) %alloca to i32
	%readfirstlane = call i32 @llvm.amdgcn.readfirstlane(i32 %int)			%readfirstlane = call i32 @llvm.amdgcn.readfirstlane.i32(i32 %int)
	call void asm sideeffect "; use $0", "s"(i32 %readfirstlane)			call void asm sideeffect "; use $0", "s"(i32 %readfirstlane)
	ret void			ret void
	}			}

				; CHECK-LABEL: {{^}}test_readfirstlane_v2f16:
				; CHECK: v_readfirstlane_b32 s{{[0-9]+}}, v2
				; CHECK-NOT: v_readfirstlane_b32
				define void @test_readfirstlane_v2f16(ptr addrspace(1) %out, <2 x half> %src) #1 {
				%readfirstlane = call <2 x half> @llvm.amdgcn.readfirstlane.v2f16(<2 x half> %src)
				store <2 x half> %readfirstlane, ptr addrspace(1) %out, align 4
				ret void
				}

				; CHECK-LABEL: {{^}}test_readfirstlane_v2i16:
				; CHECK: v_readfirstlane_b32 s{{[0-9]+}}, v2
				; CHECK-NOT: v_readfirstlane_b32
				define void @test_readfirstlane_v2i16(ptr addrspace(1) %out, <2 x i16> %src) #1 {
				%readfirstlane = call <2 x i16> @llvm.amdgcn.readfirstlane.v2i16(<2 x i16> %src)
				store <2 x i16> %readfirstlane, ptr addrspace(1) %out, align 4
				ret void
				}

				; CHECK-LABEL: {{^}}test_readfirstlane_v2bf:
				; CHECK: v_readfirstlane_b32 s{{[0-9]+}}, v2
				; CHECK-NOT: v_readfirstlane_b32
				define void @test_readfirstlane_v2bf(ptr addrspace(1) %out, <2 x bfloat> %src) #1 {
				%readlane = call <2 x bfloat> @llvm.amdgcn.readfirstlane.v2bf(<2 x bfloat> %src)
				store <2 x bfloat> %readlane, ptr addrspace(1) %out, align 4
				ret void
				}

				; CHECK-LABEL: {{^}}test_readfirstlane_i16:
				; CHECK: v_readfirstlane_b32 s{{[0-9]+}}, v2
				; CHECK-NOT: v_readfirstlane_b32
				define void @test_readfirstlane_i16(ptr addrspace(1) %out, i16 %src) {
				%readfirstlane = call i16 @llvm.amdgcn.readfirstlane.i16(i16 %src)
				store i16 %readfirstlane, ptr addrspace(1) %out, align 2
				ret void
				}

				; CHECK-LABEL: {{^}}test_readfirstlane_f16:
				; CHECK: v_readfirstlane_b32 s{{[0-9]+}}, v2
				; CHECK-NOT: v_readfirstlane_b32
				define void @test_readfirstlane_f16(ptr addrspace(1) %out, half %src) {
				%readfirstlane = call half @llvm.amdgcn.readfirstlane.f16(half %src)
				store half %readfirstlane, ptr addrspace(1) %out, align 2
				ret void
				}

				; CHECK-LABEL: {{^}}test_readfirstlane_bfloat:
				; CHECK: v_readfirstlane_b32 s{{[0-9]+}}, v2
				; CHECK-NOT: v_readfirstlane_b32
				define void @test_readfirstlane_bfloat(ptr addrspace(1) %out, bfloat %src) {
				%readfirstlane = call bfloat @llvm.amdgcn.readfirstlane.bf(bfloat %src)
				store bfloat %readfirstlane, ptr addrspace(1) %out, align 2
				ret void
				}

				; CHECK-LABEL: {{^}}test_readfirstlane_ptr:
				; CHECK-DAG: v_readfirstlane_b32 s{{[0-9]+}},
				; CHECK-DAG: v_readfirstlane_b32 s{{[0-9]+}},
				; CHECK-NOT: v_readfirstlane_b32
				define void @test_readfirstlane_ptr(ptr addrspace(1) %out, ptr %src) {
				%readfirstlane = call ptr @llvm.amdgcn.readfirstlane.ptr(ptr %src)
				store ptr %readfirstlane, ptr addrspace(1) %out, align 2
				ret void
				}

				; CHECK-LABEL: {{^}}test_readfirstlane_p2:
				; CHECK-DAG: v_readfirstlane_b32 s{{[0-9]+}},
				; CHECK-NOT: v_readfirstlane_b32
				define void @test_readfirstlane_p2(ptr addrspace(1) %out, ptr addrspace(2) %src) {
				%readfirstlane = call ptr addrspace(2) @llvm.amdgcn.readfirstlane.p2(ptr addrspace(2) %src)
				store ptr addrspace(2) %readfirstlane, ptr addrspace(1) %out, align 2
				ret void
				}

				; CHECK-LABEL: {{^}}test_readfirstlane_p3:
				; CHECK-DAG: v_readfirstlane_b32 s{{[0-9]+}},
				; CHECK-NOT: v_readfirstlane_b32
				define void @test_readfirstlane_p3(ptr addrspace(1) %out, ptr addrspace(3) %src) {
				%readfirstlane = call ptr addrspace(3) @llvm.amdgcn.readfirstlane.p3(ptr addrspace(3) %src)
				store ptr addrspace(3) %readfirstlane, ptr addrspace(1) %out, align 2
				ret void
				}

				; CHECK-LABEL: {{^}}test_readfirstlane_p5:
				; CHECK-DAG: v_readfirstlane_b32 s{{[0-9]+}},
				; CHECK-NOT: v_readfirstlane_b32
				define void @test_readfirstlane_p5(ptr addrspace(1) %out, ptr addrspace(5) %src) {
				%readfirstlane = call ptr addrspace(5) @llvm.amdgcn.readfirstlane.p5(ptr addrspace(5) %src)
				store ptr addrspace(5) %readfirstlane, ptr addrspace(1) %out, align 2
				ret void
				}

				; CHECK-LABEL: {{^}}test_readfirstlane_p6:
				; CHECK-DAG: v_readfirstlane_b32 s{{[0-9]+}},
				; CHECK-NOT: v_readfirstlane_b32
				define void @test_readfirstlane_p6(ptr addrspace(1) %out, ptr addrspace(6) %src) {
				%readfirstlane = call ptr addrspace(6) @llvm.amdgcn.readfirstlane.p6(ptr addrspace(6) %src)
				store ptr addrspace(6) %readfirstlane, ptr addrspace(1) %out, align 2
				ret void
				}

				; CHECK-LABEL: {{^}}test_readfirstlane_f32:
				; CHECK-NOT: v_cvt_f32_i32_e32
				; CHECK: v_readfirstlane_b32 s{{[0-9]+}}, v2
				; CHECK-NOT: v_readfirstlane_b32
				define void @test_readfirstlane_f32(ptr addrspace(1) %out, float %src) #1 {
				%readfirstlane = call float @llvm.amdgcn.readfirstlane.f32(float %src)
				store float %readfirstlane, ptr addrspace(1) %out, align 4
				ret void
				}

				; CHECK-LABEL: {{^}}test_readfirstlane_v3i16:
				; CHECK-DAG: v_readfirstlane_b32 s{{[0-9]+}},
				; CHECK-DAG: v_readfirstlane_b32 s{{[0-9]+}},
				; CHECK-NOT: v_readfirstlane_b32
				define void @test_readfirstlane_v3i16(ptr addrspace(1) %out, <3 x i16> %src) {
				%readfirstlane = call <3 x i16> @llvm.amdgcn.readfirstlane.v3i16(<3 x i16> %src)
				store <3 x i16> %readfirstlane, ptr addrspace(1) %out, align 2
				ret void
				}

				; CHECK-LABEL: {{^}}test_readfirstlane_v9f32:
				; CHECK-DAG: v_readfirstlane_b32 s{{[0-9]+}}, v2
				; CHECK-DAG: v_readfirstlane_b32 s{{[0-9]+}}, v3
				; CHECK-DAG: v_readfirstlane_b32 s{{[0-9]+}}, v4
				; CHECK-DAG: v_readfirstlane_b32 s{{[0-9]+}}, v5
				; CHECK-DAG: v_readfirstlane_b32 s{{[0-9]+}}, v6
				; CHECK-DAG: v_readfirstlane_b32 s{{[0-9]+}}, v7
				; CHECK-DAG: v_readfirstlane_b32 s{{[0-9]+}}, v8
				; CHECK-DAG: v_readfirstlane_b32 s{{[0-9]+}}, v9
				; CHECK-DAG: v_readfirstlane_b32 s{{[0-9]+}}, v10
				; CHECK-NOT: v_readfirstlane_b32
				define void @test_readfirstlane_v9f32(ptr addrspace(1) %out, <9 x float> %src) {
				%readfirstlane = call <9 x float> @llvm.amdgcn.readfirstlane.v9f32(<9 x float> %src)
				store <9 x float> %readfirstlane, ptr addrspace(1) %out, align 2
				ret void
				}

	attributes #0 = { nounwind readnone convergent }			attributes #0 = { nounwind readnone convergent }
	attributes #1 = { nounwind }			attributes #1 = { nounwind }
				No newline at end of file

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.readlane.ll

	; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=fiji -verify-machineinstrs < %s | FileCheck -enable-var-scope %s			; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=fiji -verify-machineinstrs < %s | FileCheck -enable-var-scope %s

	declare i32 @llvm.amdgcn.readlane(i32, i32) #0			declare i32 @llvm.amdgcn.readlane(i32, i32) #0
				declare i16 @llvm.amdgcn.readlane.i16(i16, i32) #0
				declare <2 x i16> @llvm.amdgcn.readlane.v2i16(<2 x i16>, i32) #0
				arsenm: Add bfloat and <2 x i16>, <2 x half>, <2 x bfloat> tests
				arsenm: Also p2, p3, p5, p6
				declare <2 x half> @llvm.amdgcn.readlane.v2f16(<2 x half>, i32) #0
				declare <2 x bfloat> @llvm.amdgcn.readlane.v2bf(<2 x bfloat>, i32) #0
				declare half @llvm.amdgcn.readlane.f16(half, i32) #0
				declare bfloat @llvm.amdgcn.readlane.bf(bfloat, i32) #0
				declare float @llvm.amdgcn.readlane.f32(float, i32) #0
				declare ptr @llvm.amdgcn.readlane.ptr(ptr, i32) #0
				declare ptr addrspace(2) @llvm.amdgcn.readlane.p2(ptr addrspace(2), i32) #0
				declare ptr addrspace(3) @llvm.amdgcn.readlane.p3(ptr addrspace(3), i32) #0
				declare ptr addrspace(5) @llvm.amdgcn.readlane.p5(ptr addrspace(5), i32) #0
				declare ptr addrspace(6) @llvm.amdgcn.readlane.p6(ptr addrspace(6), i32) #0
				declare <3 x i16> @llvm.amdgcn.readlane.v3i16(<3 x i16>, i32) #0
				declare <9 x float> @llvm.amdgcn.readlane.v9f32(<9 x float>, i32) #0

	; CHECK-LABEL: {{^}}test_readlane_sreg_sreg:			; CHECK-LABEL: {{^}}test_readlane_sreg_sreg:
	; CHECK-NOT: v_readlane_b32			; CHECK-NOT: v_readlane_b32
	define amdgpu_kernel void @test_readlane_sreg_sreg(i32 %src0, i32 %src1) #1 {			define amdgpu_kernel void @test_readlane_sreg_sreg(i32 %src0, i32 %src1) #1 {
	%readlane = call i32 @llvm.amdgcn.readlane(i32 %src0, i32 %src1)			%readlane = call i32 @llvm.amdgcn.readlane(i32 %src0, i32 %src1)
				arsenm: This is a hint mesa won't break
	call void asm sideeffect "; use $0", "s"(i32 %readlane)			call void asm sideeffect "; use $0", "s"(i32 %readlane)
	ret void			ret void
	}			}

	; CHECK-LABEL: {{^}}test_readlane_vreg_sreg:			; CHECK-LABEL: {{^}}test_readlane_vreg_sreg:
	; CHECK: v_readlane_b32 s{{[0-9]+}}, v{{[0-9]+}}, s{{[0-9]+}}			; CHECK: v_readlane_b32 s{{[0-9]+}}, v{{[0-9]+}}, s{{[0-9]+}}
	define amdgpu_kernel void @test_readlane_vreg_sreg(i32 %src0, i32 %src1) #1 {			define amdgpu_kernel void @test_readlane_vreg_sreg(i32 %src0, i32 %src1) #1 {
	%vgpr = call i32 asm sideeffect "; def $0", "=v"()			%vgpr = call i32 asm sideeffect "; def $0", "=v"()
	(50 unchanged lines not shown)
	; CHECK-NEXT: s_mov_b32 [[SGPR:s[0-9]+]]			; CHECK-NEXT: s_mov_b32 [[SGPR:s[0-9]+]]
	; CHECK: ;;#ASMEND			; CHECK: ;;#ASMEND
	; CHECK-NOT: [[SGPR]]			; CHECK-NOT: [[SGPR]]
	; CHECK-NOT: readlane			; CHECK-NOT: readlane
	; CHECK: v_mov_b32_e32 [[VCOPY:v[0-9]+]], [[SGPR]]			; CHECK: v_mov_b32_e32 [[VCOPY:v[0-9]+]], [[SGPR]]
	; CHECK: flat_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[VCOPY]]			; CHECK: flat_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[VCOPY]]
	define amdgpu_kernel void @test_readlane_copy_from_sgpr(ptr addrspace(1) %out) #1 {			define amdgpu_kernel void @test_readlane_copy_from_sgpr(ptr addrspace(1) %out) #1 {
	%sgpr = call i32 asm "s_mov_b32 $0, 0", "=s"()			%sgpr = call i32 asm "s_mov_b32 $0, 0", "=s"()
	%readfirstlane = call i32 @llvm.amdgcn.readlane(i32 %sgpr, i32 7)			%readlane = call i32 @llvm.amdgcn.readlane(i32 %sgpr, i32 7)
	store i32 %readfirstlane, ptr addrspace(1) %out, align 4			store i32 %readlane, ptr addrspace(1) %out, align 4
				ret void
				}

				; CHECK-LABEL: {{^}}test_readlane_i16:
				; CHECK: v_readlane_b32 s{{[0-9]+}}, v2, 15
				; CHECK-NOT: v_readlane_b32
				define void @test_readlane_i16(ptr addrspace(1) %out, i16 %src) {
				%readlane = call i16 @llvm.amdgcn.readlane.i16(i16 %src, i32 15)
				store i16 %readlane, ptr addrspace(1) %out, align 2
				ret void
				}

				; CHECK-LABEL: {{^}}test_readlane_f16:
				; CHECK: v_readlane_b32 s{{[0-9]+}}, v2, 15
				; CHECK-NOT: v_readlane_b32
				define void @test_readlane_f16(ptr addrspace(1) %out, half %src) {
				%readlane = call half @llvm.amdgcn.readlane.f16(half %src, i32 15)
				store half %readlane, ptr addrspace(1) %out, align 2
				ret void
				}

				; CHECK-LABEL: {{^}}test_readlane_bfloat:
				; CHECK: v_readlane_b32 s{{[0-9]+}}, v2, 15
				; CHECK-NOT: v_readlane_b32
				define void @test_readlane_bfloat(ptr addrspace(1) %out, bfloat %src) {
				%readlane = call bfloat @llvm.amdgcn.readlane.bf(bfloat %src, i32 15)
				store bfloat %readlane, ptr addrspace(1) %out, align 2
				ret void
				}

				; CHECK-LABEL: {{^}}test_readlane_v2f16:
				; CHECK: v_readlane_b32 s{{[0-9]+}}, v2
				; CHECK-NOT: v_readlane_b32
				define void @test_readlane_v2f16(ptr addrspace(1) %out, <2 x half> %src) #1 {
				%readlane = call <2 x half> @llvm.amdgcn.readlane.v2f16(<2 x half> %src, i32 15)
				store <2 x half> %readlane, ptr addrspace(1) %out, align 4
				ret void
				}

				; CHECK-LABEL: {{^}}test_readlane_v2i16:
				; CHECK: v_readlane_b32 s{{[0-9]+}}, v2
				; CHECK-NOT: v_readlane_b32
				define void @test_readlane_v2i16(ptr addrspace(1) %out, <2 x i16> %src) #1 {
				%readlane = call <2 x i16> @llvm.amdgcn.readlane.v2i16(<2 x i16> %src, i32 15)
				store <2 x i16> %readlane, ptr addrspace(1) %out, align 4
				ret void
				}


				; CHECK-LABEL: {{^}}test_readlane_v2bf:
				; CHECK: v_readlane_b32 s{{[0-9]+}}, v2
				; CHECK-NOT: v_readlane_b32
				define void @test_readlane_v2bf(ptr addrspace(1) %out, <2 x bfloat> %src) #1 {
				%readlane = call <2 x bfloat> @llvm.amdgcn.readlane.v2bf(<2 x bfloat> %src, i32 15)
				store <2 x bfloat> %readlane, ptr addrspace(1) %out, align 4
				ret void
				}

				; CHECK-LABEL: {{^}}test_readlane_f32:
				; CHECK-NOT: v_cvt_f32_i32_e32
				; CHECK: v_readlane_b32 s{{[0-9]+}}, v2, 15
				; CHECK-NOT: v_readlane_b32
				define void @test_readlane_f32(ptr addrspace(1) %out, float %src) {
				%readlane = call float @llvm.amdgcn.readlane.f32(float %src, i32 15)
				store float %readlane, ptr addrspace(1) %out, align 2
				ret void
				}

				; CHECK-LABEL: {{^}}test_readlane_ptr:
				; CHECK-DAG: v_readlane_b32 s{{[0-9]+}},
				; CHECK-DAG: v_readlane_b32 s{{[0-9]+}},
				; CHECK-NOT: v_readlane_b32
				define void @test_readlane_ptr(ptr addrspace(1) %out, ptr %src) {
				%readlane = call ptr @llvm.amdgcn.readlane.ptr(ptr %src, i32 15)
				store ptr %readlane, ptr addrspace(1) %out, align 2
				ret void
				}

				; CHECK-LABEL: {{^}}test_readlane_p2:
				; CHECK-DAG: v_readlane_b32 s{{[0-9]+}},
				; CHECK-NOT: v_readlane_b32
				define void @test_readlane_p2(ptr addrspace(1) %out, ptr addrspace(2) %src) {
				%readlane = call ptr addrspace(2) @llvm.amdgcn.readlane.p2(ptr addrspace(2) %src, i32 15)
				store ptr addrspace(2) %readlane, ptr addrspace(1) %out, align 2
				ret void
				}

				; CHECK-LABEL: {{^}}test_readlane_p3:
				; CHECK-DAG: v_readlane_b32 s{{[0-9]+}},
				; CHECK-NOT: v_readlane_b32
				define void @test_readlane_p3(ptr addrspace(1) %out, ptr addrspace(3) %src) {
				%readlane = call ptr addrspace(3) @llvm.amdgcn.readlane.p3(ptr addrspace(3) %src, i32 15)
				store ptr addrspace(3) %readlane, ptr addrspace(1) %out, align 2
				ret void
				}

				; CHECK-LABEL: {{^}}test_readlane_p5:
				; CHECK-DAG: v_readlane_b32 s{{[0-9]+}},
				; CHECK-NOT: v_readlane_b32
				define void @test_readlane_p5(ptr addrspace(1) %out, ptr addrspace(5) %src) {
				%readlane = call ptr addrspace(5) @llvm.amdgcn.readlane.p5(ptr addrspace(5) %src, i32 15)
				store ptr addrspace(5) %readlane, ptr addrspace(1) %out, align 2
				ret void
				}

				; CHECK-LABEL: {{^}}test_readlane_p6:
				; CHECK-DAG: v_readlane_b32 s{{[0-9]+}},
				; CHECK-NOT: v_readlane_b32
				define void @test_readlane_p6(ptr addrspace(1) %out, ptr addrspace(6) %src) {
				%readlane = call ptr addrspace(6) @llvm.amdgcn.readlane.p6(ptr addrspace(6) %src, i32 15)
				store ptr addrspace(6) %readlane, ptr addrspace(1) %out, align 2
				ret void
				}

				; CHECK-LABEL: {{^}}test_readlane_v3i16:
				; CHECK-DAG: v_readlane_b32 s{{[0-9]+}},
				; CHECK-DAG: v_readlane_b32 s{{[0-9]+}},
				; CHECK-NOT: v_readlane_b32
				define void @test_readlane_v3i16(ptr addrspace(1) %out, <3 x i16> %src) {
				%readlane = call <3 x i16> @llvm.amdgcn.readlane.v3i16(<3 x i16> %src, i32 15)
				store <3 x i16> %readlane, ptr addrspace(1) %out, align 2
				ret void
				}

				; CHECK-LABEL: {{^}}test_readlane_v9f32:
				; CHECK-DAG: v_readlane_b32 s{{[0-9]+}}, v2, 15
				; CHECK-DAG: v_readlane_b32 s{{[0-9]+}}, v3, 15
				; CHECK-DAG: v_readlane_b32 s{{[0-9]+}}, v4, 15
				; CHECK-DAG: v_readlane_b32 s{{[0-9]+}}, v5, 15
				; CHECK-DAG: v_readlane_b32 s{{[0-9]+}}, v6, 15
				; CHECK-DAG: v_readlane_b32 s{{[0-9]+}}, v7, 15
				; CHECK-DAG: v_readlane_b32 s{{[0-9]+}}, v8, 15
				; CHECK-DAG: v_readlane_b32 s{{[0-9]+}}, v9, 15
				; CHECK-DAG: v_readlane_b32 s{{[0-9]+}}, v10, 15
				; CHECK-NOT: v_readlane_b32
				define void @test_readlane_v9f32(ptr addrspace(1) %out, <9 x float> %src) {
				%readlane = call <9 x float> @llvm.amdgcn.readlane.v9f32(<9 x float> %src, i32 15)
				store <9 x float> %readlane, ptr addrspace(1) %out, align 2
	ret void			ret void
	}			}

	declare i32 @llvm.amdgcn.workitem.id.x() #2			declare i32 @llvm.amdgcn.workitem.id.x() #2

	attributes #0 = { nounwind readnone convergent }			attributes #0 = { nounwind readnone convergent }
	attributes #1 = { nounwind }			attributes #1 = { nounwind }
	attributes #2 = { nounwind readnone }			attributes #2 = { nounwind readnone }
				No newline at end of file

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll

	; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=gfx700 -verify-machineinstrs < %s | FileCheck -check-prefixes=CHECK,CIGFX9 %s			; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=gfx700 -verify-machineinstrs < %s | FileCheck -check-prefixes=CHECK,CIGFX9 %s
	; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=gfx802 -verify-machineinstrs < %s | FileCheck -check-prefixes=CHECK,CIGFX9 %s			; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=gfx802 -verify-machineinstrs < %s | FileCheck -check-prefixes=CHECK,CIGFX9 %s
	; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefixes=CHECK,GFX10 %s			; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefixes=CHECK,GFX10 %s
	; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=gfx1100 -verify-machineinstrs -amdgpu-enable-vopd=0 < %s | FileCheck -check-prefixes=CHECK,GFX10 %s			; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=gfx1100 -verify-machineinstrs -amdgpu-enable-vopd=0 < %s | FileCheck -check-prefixes=CHECK,GFX10 %s

	declare i32 @llvm.amdgcn.writelane(i32, i32, i32) #0			declare i32 @llvm.amdgcn.writelane(i32, i32, i32) #0
				declare i16 @llvm.amdgcn.writelane.i16(i16, i32, i16) #0
				declare <2 x i16> @llvm.amdgcn.writelane.v2i16(<2 x i16>, i32, <2 x i16>) #0
				declare <2 x half> @llvm.amdgcn.writelane.v2f16(<2 x half>, i32, <2 x half>) #0
				declare <2 x bfloat> @llvm.amdgcn.writelane.v2bf(<2 x bfloat>, i32, <2 x bfloat>) #0
				declare half @llvm.amdgcn.writelane.f16(half, i32, half) #0
				declare bfloat @llvm.amdgcn.writelane.bf(bfloat, i32, bfloat) #0
				declare float @llvm.amdgcn.writelane.f32(float, i32, float) #0
				declare ptr @llvm.amdgcn.writelane.ptr(ptr, i32, ptr) #0
				declare ptr addrspace(2) @llvm.amdgcn.writelane.p2(ptr addrspace(2), i32, ptr addrspace(2)) #0
				declare ptr addrspace(3) @llvm.amdgcn.writelane.p3(ptr addrspace(3), i32, ptr addrspace(3)) #0
				declare ptr addrspace(5) @llvm.amdgcn.writelane.p5(ptr addrspace(5), i32, ptr addrspace(5)) #0
				declare ptr addrspace(6) @llvm.amdgcn.writelane.p6(ptr addrspace(6), i32, ptr addrspace(6)) #0
				declare <3 x i16> @llvm.amdgcn.writelane.v3i16(<3 x i16>, i32, <3 x i16>) #0
				declare <9 x float> @llvm.amdgcn.writelane.v9f32(<9 x float>, i32, <9 x float>) #0

	; CHECK-LABEL: {{^}}test_writelane_sreg:			; CHECK-LABEL: {{^}}test_writelane_sreg:
	; CIGFX9: v_writelane_b32 v{{[0-9]+}}, s{{[0-9]+}}, m0			; CIGFX9: v_writelane_b32 v{{[0-9]+}}, s{{[0-9]+}}, m0
	; GFX10: v_writelane_b32 v{{[0-9]+}}, s{{[0-9]+}}, s{{[0-9]+}}			; GFX10: v_writelane_b32 v{{[0-9]+}}, s{{[0-9]+}}, s{{[0-9]+}}
	define amdgpu_kernel void @test_writelane_sreg(ptr addrspace(1) %out, i32 %src0, i32 %src1) #1 {			define amdgpu_kernel void @test_writelane_sreg(ptr addrspace(1) %out, i32 %src0, i32 %src1) #1 {
	%oldval = load i32, ptr addrspace(1) %out			%oldval = load i32, ptr addrspace(1) %out
	%writelane = call i32 @llvm.amdgcn.writelane(i32 %src0, i32 %src1, i32 %oldval)			%writelane = call i32 @llvm.amdgcn.writelane(i32 %src0, i32 %src1, i32 %oldval)
	store i32 %writelane, ptr addrspace(1) %out, align 4			store i32 %writelane, ptr addrspace(1) %out, align 4
	(60 lines of context not shown)
	; CIGFX9: v_writelane_b32 [[OLDVAL]], s{{[0-9]+}}, m0			; CIGFX9: v_writelane_b32 [[OLDVAL]], s{{[0-9]+}}, m0
	; GFX10: v_writelane_b32 [[OLDVAL]], s{{[0-9]+}}, s{{[0-9]+}}			; GFX10: v_writelane_b32 [[OLDVAL]], s{{[0-9]+}}, s{{[0-9]+}}
	define amdgpu_kernel void @test_writelane_imm_oldval(ptr addrspace(1) %out, i32 %src0, i32 %src1) #1 {			define amdgpu_kernel void @test_writelane_imm_oldval(ptr addrspace(1) %out, i32 %src0, i32 %src1) #1 {
	%writelane = call i32 @llvm.amdgcn.writelane(i32 %src0, i32 %src1, i32 42)			%writelane = call i32 @llvm.amdgcn.writelane(i32 %src0, i32 %src1, i32 42)
	store i32 %writelane, ptr addrspace(1) %out, align 4			store i32 %writelane, ptr addrspace(1) %out, align 4
	ret void			ret void
	}			}

				; CHECK-LABEL: {{^}}test_writelane_i16:
				; CHECK: v_writelane_b32 v{{[0-9]+}},
				; CHECK-NOT: v_writelane_b32
				define void @test_writelane_i16(ptr addrspace(1) %out, i16 %src) {
				%writelane = call i16 @llvm.amdgcn.writelane.i16(i16 1234, i32 15, i16 %src)
				store i16 %writelane, ptr addrspace(1) %out, align 2
				ret void
				}

				; CHECK-LABEL: {{^}}test_writelane_f16:
				; CHECK: v_writelane_b32 v{{[0-9]+}},
				; CHECK-NOT: v_writelane_b32
				define void @test_writelane_f16(ptr addrspace(1) %out, half %src) {
				%writelane = call half @llvm.amdgcn.writelane.f16(half 1.0, i32 15, half %src)
				store half %writelane, ptr addrspace(1) %out, align 2
				ret void
				}

				; CHECK-LABEL: {{^}}test_writelane_bfloat:
				; CHECK: v_writelane_b32
				; CHECK-NOT: v_writelane_b32
				define void @test_writelane_bfloat(ptr addrspace(1) %out, bfloat %src, bfloat %in) {
				%writelane = call bfloat @llvm.amdgcn.writelane.bf(bfloat %src, i32 15, bfloat %in)
				store bfloat %writelane, ptr addrspace(1) %out, align 2
				ret void
				}

				; CHECK-LABEL: {{^}}test_writelane_v2f16:
				; CHECK: v_writelane_b32
				; CHECK-NOT: v_writelane_b32
				define void @test_writelane_v2f16(ptr addrspace(1) %out, <2 x half> %src, <2 x half> %in) #1 {
				%writelane = call <2 x half> @llvm.amdgcn.writelane.v2f16(<2 x half> %src, i32 15, <2 x half> %in)
				store <2 x half> %writelane, ptr addrspace(1) %out, align 4
				ret void
				}

				; CHECK-LABEL: {{^}}test_writelane_v2i16:
				; CHECK: v_writelane_b32
				; CHECK-NOT: v_writelane_b32
				define void @test_writelane_v2i16(ptr addrspace(1) %out, <2 x i16> %src, <2 x i16> %in) #1 {
				%writelane = call <2 x i16> @llvm.amdgcn.writelane.v2i16(<2 x i16> %src, i32 15, <2 x i16> %in)
				store <2 x i16> %writelane, ptr addrspace(1) %out, align 4
				ret void
				}

				; CHECK-LABEL: {{^}}test_writelane_v2bf:
				; CHECK: v_writelane_b32
				; CHECK-NOT: v_writelane_b32
				define void @test_writelane_v2bf(ptr addrspace(1) %out, <2 x bfloat> %src, <2 x bfloat> %in) {
				%writelane = call <2 x bfloat> @llvm.amdgcn.writelane.v2bf(<2 x bfloat> %src, i32 15, <2 x bfloat> %in)
				store <2 x bfloat> %writelane, ptr addrspace(1) %out, align 2
				ret void
				}


				; CHECK-LABEL: {{^}}test_writelane_f32:
				; CHECK-NOT: v_cvt_f32_i32_e32
				; CHECK: v_writelane_b32 v{{[0-9]+}},
				; CHECK-NOT: v_writelane_b32
				define void @test_writelane_f32(ptr addrspace(1) %out, float %src) #1 {
				%writelane = call float @llvm.amdgcn.writelane.f32(float 2.0, i32 15, float %src)
				store float %writelane, ptr addrspace(1) %out, align 4
				ret void
				}


				; CHECK-LABEL: {{^}}test_writelane_ptr:
				; CHECK-DAG: v_writelane_b32
				; CHECK-DAG: v_writelane_b32
				; CHECK-NOT: v_writelane_b32
				define void @test_writelane_ptr(ptr addrspace(1) %out, ptr %src, ptr %in) {
				%writelane = call ptr @llvm.amdgcn.writelane.ptr(ptr %src, i32 15, ptr %in)
				store ptr %writelane, ptr addrspace(1) %out, align 2
				ret void
				}

				; CHECK-LABEL: {{^}}test_writelane_p2:
				; CHECK-DAG: v_writelane_b32
				; CHECK-NOT: v_writelane_b32
				define void @test_writelane_p2(ptr addrspace(1) %out, ptr addrspace(2) %src, ptr addrspace(2) %in) {
				%writelane = call ptr addrspace(2) @llvm.amdgcn.writelane.p2(ptr addrspace(2) %src, i32 15, ptr addrspace(2) %in)
				store ptr addrspace(2) %writelane, ptr addrspace(1) %out, align 2
				ret void
				}

				; CHECK-LABEL: {{^}}test_writelane_p3:
				; CHECK-DAG: v_writelane_b32
				; CHECK-NOT: v_writelane_b32
				define void @test_writelane_p3(ptr addrspace(1) %out, ptr addrspace(3) %src, ptr addrspace(3) %in) {
				%writelane = call ptr addrspace(3) @llvm.amdgcn.writelane.p3(ptr addrspace(3) %src, i32 15, ptr addrspace(3) %in)
				store ptr addrspace(3) %writelane, ptr addrspace(1) %out, align 2
				ret void
				}

				; CHECK-LABEL: {{^}}test_writelane_p5:
				; CHECK-DAG: v_writelane_b32
				; CHECK-NOT: v_writelane_b32
				define void @test_writelane_p5(ptr addrspace(1) %out, ptr addrspace(5) %src, ptr addrspace(5) %in) {
				%writelane = call ptr addrspace(5) @llvm.amdgcn.writelane.p5(ptr addrspace(5) %src, i32 15, ptr addrspace(5) %in)
				store ptr addrspace(5) %writelane, ptr addrspace(1) %out, align 2
				ret void
				}

				; CHECK-LABEL: {{^}}test_writelane_p6:
				; CHECK-DAG: v_writelane_b32
				; CHECK-NOT: v_writelane_b32
				define void @test_writelane_p6(ptr addrspace(1) %out, ptr addrspace(6) %src, ptr addrspace(6) %in) {
				%writelane = call ptr addrspace(6) @llvm.amdgcn.writelane.p6(ptr addrspace(6) %src, i32 15, ptr addrspace(6) %in)
				store ptr addrspace(6) %writelane, ptr addrspace(1) %out, align 2
				ret void
				}

				; CHECK-LABEL: {{^}}test_writelane_v3i16:
				; CHECK-DAG: v_writelane_b32 v{{[0-9]+}},
				; CHECK-DAG: v_writelane_b32 v{{[0-9]+}},
				; CHECK-NOT: v_writelane_b32
				define void @test_writelane_v3i16(ptr addrspace(1) %out, <3 x i16> %src) {
				%writelane = call <3 x i16> @llvm.amdgcn.writelane.v3i16(<3 x i16> zeroinitializer, i32 15, <3 x i16> %src)
				store <3 x i16> %writelane, ptr addrspace(1) %out, align 2
				ret void
				}

				; CHECK-LABEL: {{^}}test_writelane_v9f32:
				; CHECK-DAG: v_writelane_b32 v{{[0-9]+}}, 0, 15
				; CHECK-DAG: v_writelane_b32 v{{[0-9]+}}, 0, 15
				; CHECK-DAG: v_writelane_b32 v{{[0-9]+}}, 0, 15
				; CHECK-DAG: v_writelane_b32 v{{[0-9]+}}, 0, 15
				; CHECK-DAG: v_writelane_b32 v{{[0-9]+}}, 0, 15
				; CHECK-DAG: v_writelane_b32 v{{[0-9]+}}, 0, 15
				; CHECK-DAG: v_writelane_b32 v{{[0-9]+}}, 0, 15
				; CHECK-DAG: v_writelane_b32 v{{[0-9]+}}, 0, 15
				; CHECK-DAG: v_writelane_b32 v{{[0-9]+}}, 0, 15
				; CHECK-NOT: v_writelane_b32
				define void @test_writelane_v9f32(ptr addrspace(1) %out, <9 x float> %src) {
				%writelane = call <9 x float> @llvm.amdgcn.writelane.v9f32(<9 x float> zeroinitializer, i32 15, <9 x float> %src)
				store <9 x float> %writelane, ptr addrspace(1) %out, align 2
				ret void
				}


	declare i32 @llvm.amdgcn.workitem.id.x() #2			declare i32 @llvm.amdgcn.workitem.id.x() #2

	attributes #0 = { nounwind readnone convergent }			attributes #0 = { nounwind readnone convergent }
	attributes #1 = { nounwind }			attributes #1 = { nounwind }
	attributes #2 = { nounwind readnone }			attributes #2 = { nounwind readnone }
				No newline at end of file

llvm/test/CodeGen/AMDGPU/permlane-ptr.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -global-isel=0 -amdgpu-load-store-vectorizer=0 -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10,GFX10-SDAG %s
				; RUN: llc -global-isel=0 -amdgpu-load-store-vectorizer=0 -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX11,GFX11-SDAG %s

				declare ptr @llvm.amdgcn.permlane16.ptr(ptr, ptr, i32, i32, i1, i1) #0
				declare ptr addrspace(2) @llvm.amdgcn.permlane16.p2(ptr addrspace(2), ptr addrspace(2), i32, i32, i1, i1) #0
				declare ptr addrspace(3) @llvm.amdgcn.permlane16.p3(ptr addrspace(3), ptr addrspace(3), i32, i32, i1, i1) #0
				declare ptr addrspace(5) @llvm.amdgcn.permlane16.p5(ptr addrspace(5), ptr addrspace(5), i32, i32, i1, i1) #0
				declare ptr addrspace(6) @llvm.amdgcn.permlane16.p6(ptr addrspace(6), ptr addrspace(6), i32, i32, i1, i1) #0
				declare ptr @llvm.amdgcn.permlanex16.ptr(ptr, ptr, i32, i32, i1, i1) #0
				declare ptr addrspace(2) @llvm.amdgcn.permlanex16.p2(ptr addrspace(2), ptr addrspace(2), i32, i32, i1, i1) #0
				declare ptr addrspace(3) @llvm.amdgcn.permlanex16.p3(ptr addrspace(3), ptr addrspace(3), i32, i32, i1, i1) #0
				declare ptr addrspace(5) @llvm.amdgcn.permlanex16.p5(ptr addrspace(5), ptr addrspace(5), i32, i32, i1, i1) #0
				declare ptr addrspace(6) @llvm.amdgcn.permlanex16.p6(ptr addrspace(6), ptr addrspace(6), i32, i32, i1, i1) #0

				define void @test_permlanex16_ptr(ptr addrspace(1) %out, ptr %src0, i32 %src1, i32 %src2) #1 {
				; GFX10-LABEL: test_permlanex16_ptr:
				; GFX10: ; %bb.0:
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: v_readfirstlane_b32 s4, v4
				; GFX10-NEXT: v_readfirstlane_b32 s5, v5
				; GFX10-NEXT: v_permlanex16_b32 v3, v3, s4, s5 op_sel:[1,0]
				; GFX10-NEXT: v_permlanex16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-NEXT: global_store_dwordx2 v[0:1], v[2:3], off
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-LABEL: test_permlanex16_ptr:
				; GFX11: ; %bb.0:
				; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: v_readfirstlane_b32 s0, v4
				; GFX11-NEXT: v_readfirstlane_b32 s1, v5
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; GFX11-NEXT: v_permlanex16_b32 v3, v3, s0, s1 op_sel:[1,0]
				; GFX11-NEXT: v_permlanex16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-NEXT: global_store_b64 v[0:1], v[2:3], off
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_setpc_b64 s[30:31]
				%v = call ptr @llvm.amdgcn.permlanex16.ptr(ptr %src0, ptr %src0, i32 %src1, i32 %src2, i1 true, i1 false)
				store ptr %v, ptr addrspace(1) %out, align 4
				ret void
				}

				define void @test_permlane_p2(ptr addrspace(1) %out, ptr addrspace(2) %src0, i32 %src1, i32 %src2) #1 {
				; GFX10-LABEL: test_permlane_p2:
				; GFX10: ; %bb.0:
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: v_readfirstlane_b32 s4, v3
				; GFX10-NEXT: v_readfirstlane_b32 s5, v4
				; GFX10-NEXT: v_permlane16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-NEXT: global_store_dword v[0:1], v2, off
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-LABEL: test_permlane_p2:
				; GFX11: ; %bb.0:
				; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: v_readfirstlane_b32 s0, v3
				; GFX11-NEXT: v_readfirstlane_b32 s1, v4
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; GFX11-NEXT: v_permlane16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-NEXT: global_store_b32 v[0:1], v2, off
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_setpc_b64 s[30:31]
				%v = call ptr addrspace(2) @llvm.amdgcn.permlane16.p2(ptr addrspace(2) %src0, ptr addrspace(2) %src0, i32 %src1, i32 %src2, i1 true, i1 false)
				store ptr addrspace(2) %v, ptr addrspace(1) %out, align 4
				ret void
				}
				define void @test_permlane_p3(ptr addrspace(1) %out, ptr addrspace(3) %src0, i32 %src1, i32 %src2) #1 {
				; GFX10-LABEL: test_permlane_p3:
				; GFX10: ; %bb.0:
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: v_readfirstlane_b32 s4, v3
				; GFX10-NEXT: v_readfirstlane_b32 s5, v4
				; GFX10-NEXT: v_permlane16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-NEXT: global_store_dword v[0:1], v2, off
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-LABEL: test_permlane_p3:
				; GFX11: ; %bb.0:
				; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: v_readfirstlane_b32 s0, v3
				; GFX11-NEXT: v_readfirstlane_b32 s1, v4
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; GFX11-NEXT: v_permlane16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-NEXT: global_store_b32 v[0:1], v2, off
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_setpc_b64 s[30:31]
				%v = call ptr addrspace(3) @llvm.amdgcn.permlane16.p3(ptr addrspace(3) %src0, ptr addrspace(3) %src0, i32 %src1, i32 %src2, i1 true, i1 false)
				store ptr addrspace(3) %v, ptr addrspace(1) %out, align 4
				ret void
				}
				define void @test_permlane_p5(ptr addrspace(1) %out, ptr addrspace(5) %src0, i32 %src1, i32 %src2) #1 {
				; GFX10-LABEL: test_permlane_p5:
				; GFX10: ; %bb.0:
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: v_readfirstlane_b32 s4, v3
				; GFX10-NEXT: v_readfirstlane_b32 s5, v4
				; GFX10-NEXT: v_permlane16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-NEXT: global_store_dword v[0:1], v2, off
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-LABEL: test_permlane_p5:
				; GFX11: ; %bb.0:
				; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: v_readfirstlane_b32 s0, v3
				; GFX11-NEXT: v_readfirstlane_b32 s1, v4
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; GFX11-NEXT: v_permlane16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-NEXT: global_store_b32 v[0:1], v2, off
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_setpc_b64 s[30:31]
				%v = call ptr addrspace(5) @llvm.amdgcn.permlane16.p5(ptr addrspace(5) %src0, ptr addrspace(5) %src0, i32 %src1, i32 %src2, i1 true, i1 false)
				store ptr addrspace(5) %v, ptr addrspace(1) %out, align 4
				ret void
				}
				define void @test_permlane_p6(ptr addrspace(1) %out, ptr addrspace(6) %src0, i32 %src1, i32 %src2) #1 {
				; GFX10-LABEL: test_permlane_p6:
				; GFX10: ; %bb.0:
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: v_readfirstlane_b32 s4, v3
				; GFX10-NEXT: v_readfirstlane_b32 s5, v4
				; GFX10-NEXT: v_permlane16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-NEXT: global_store_dword v[0:1], v2, off
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-LABEL: test_permlane_p6:
				; GFX11: ; %bb.0:
				; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: v_readfirstlane_b32 s0, v3
				; GFX11-NEXT: v_readfirstlane_b32 s1, v4
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; GFX11-NEXT: v_permlane16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-NEXT: global_store_b32 v[0:1], v2, off
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_setpc_b64 s[30:31]
				%v = call ptr addrspace(6) @llvm.amdgcn.permlane16.p6(ptr addrspace(6) %src0, ptr addrspace(6) %src0, i32 %src1, i32 %src2, i1 true, i1 false)
				store ptr addrspace(6) %v, ptr addrspace(1) %out, align 4
				ret void
				}


				define void @test_permlane_ptr(ptr addrspace(1) %out, ptr %src0, i32 %src1, i32 %src2) #1 {
				; GFX10-LABEL: test_permlane_ptr:
				; GFX10: ; %bb.0:
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: v_readfirstlane_b32 s4, v4
				; GFX10-NEXT: v_readfirstlane_b32 s5, v5
				; GFX10-NEXT: v_permlanex16_b32 v3, v3, s4, s5 op_sel:[1,0]
				; GFX10-NEXT: v_permlanex16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-NEXT: global_store_dwordx2 v[0:1], v[2:3], off
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-LABEL: test_permlane_ptr:
				; GFX11: ; %bb.0:
				; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: v_readfirstlane_b32 s0, v4
				; GFX11-NEXT: v_readfirstlane_b32 s1, v5
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; GFX11-NEXT: v_permlanex16_b32 v3, v3, s0, s1 op_sel:[1,0]
				; GFX11-NEXT: v_permlanex16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-NEXT: global_store_b64 v[0:1], v[2:3], off
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_setpc_b64 s[30:31]
				%v = call ptr @llvm.amdgcn.permlane16.ptr(ptr %src0, ptr %src0, i32 %src1, i32 %src2, i1 true, i1 false)
				store ptr %v, ptr addrspace(1) %out, align 4
				ret void
				}

				define void @test_permlanex16_p2(ptr addrspace(1) %out, ptr addrspace(2) %src0, i32 %src1, i32 %src2) #1 {
				; GFX10-LABEL: test_permlanex16_p2:
				; GFX10: ; %bb.0:
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: v_readfirstlane_b32 s4, v3
				; GFX10-NEXT: v_readfirstlane_b32 s5, v4
				; GFX10-NEXT: v_permlanex16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-NEXT: global_store_dword v[0:1], v2, off
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-LABEL: test_permlanex16_p2:
				; GFX11: ; %bb.0:
				; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: v_readfirstlane_b32 s0, v3
				; GFX11-NEXT: v_readfirstlane_b32 s1, v4
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; GFX11-NEXT: v_permlanex16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-NEXT: global_store_b32 v[0:1], v2, off
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_setpc_b64 s[30:31]
				%v = call ptr addrspace(2) @llvm.amdgcn.permlanex16.p2(ptr addrspace(2) %src0, ptr addrspace(2) %src0, i32 %src1, i32 %src2, i1 true, i1 false)
				store ptr addrspace(2) %v, ptr addrspace(1) %out, align 4
				ret void
				}
				define void @test_permlanex16_p3(ptr addrspace(1) %out, ptr addrspace(3) %src0, i32 %src1, i32 %src2) #1 {
				; GFX10-LABEL: test_permlanex16_p3:
				; GFX10: ; %bb.0:
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: v_readfirstlane_b32 s4, v3
				; GFX10-NEXT: v_readfirstlane_b32 s5, v4
				; GFX10-NEXT: v_permlanex16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-NEXT: global_store_dword v[0:1], v2, off
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-LABEL: test_permlanex16_p3:
				; GFX11: ; %bb.0:
				; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: v_readfirstlane_b32 s0, v3
				; GFX11-NEXT: v_readfirstlane_b32 s1, v4
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; GFX11-NEXT: v_permlanex16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-NEXT: global_store_b32 v[0:1], v2, off
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_setpc_b64 s[30:31]
				%v = call ptr addrspace(3) @llvm.amdgcn.permlanex16.p3(ptr addrspace(3) %src0, ptr addrspace(3) %src0, i32 %src1, i32 %src2, i1 true, i1 false)
				store ptr addrspace(3) %v, ptr addrspace(1) %out, align 4
				ret void
				}
				define void @test_permlanex16_p5(ptr addrspace(1) %out, ptr addrspace(5) %src0, i32 %src1, i32 %src2) #1 {
				; GFX10-LABEL: test_permlanex16_p5:
				; GFX10: ; %bb.0:
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: v_readfirstlane_b32 s4, v3
				; GFX10-NEXT: v_readfirstlane_b32 s5, v4
				; GFX10-NEXT: v_permlanex16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-NEXT: global_store_dword v[0:1], v2, off
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-LABEL: test_permlanex16_p5:
				; GFX11: ; %bb.0:
				; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: v_readfirstlane_b32 s0, v3
				; GFX11-NEXT: v_readfirstlane_b32 s1, v4
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; GFX11-NEXT: v_permlanex16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-NEXT: global_store_b32 v[0:1], v2, off
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_setpc_b64 s[30:31]
				%v = call ptr addrspace(5) @llvm.amdgcn.permlanex16.p5(ptr addrspace(5) %src0, ptr addrspace(5) %src0, i32 %src1, i32 %src2, i1 true, i1 false)
				store ptr addrspace(5) %v, ptr addrspace(1) %out, align 4
				ret void
				}
				define void @test_permlanex16_p6(ptr addrspace(1) %out, ptr addrspace(6) %src0, i32 %src1, i32 %src2) #1 {
				; GFX10-LABEL: test_permlanex16_p6:
				; GFX10: ; %bb.0:
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: v_readfirstlane_b32 s4, v3
				; GFX10-NEXT: v_readfirstlane_b32 s5, v4
				; GFX10-NEXT: v_permlanex16_b32 v2, v2, s4, s5 op_sel:[1,0]
				; GFX10-NEXT: global_store_dword v[0:1], v2, off
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX11-LABEL: test_permlanex16_p6:
				; GFX11: ; %bb.0:
				; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: v_readfirstlane_b32 s0, v3
				; GFX11-NEXT: v_readfirstlane_b32 s1, v4
				; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
				; GFX11-NEXT: v_permlanex16_b32 v2, v2, s0, s1 op_sel:[1,0]
				; GFX11-NEXT: global_store_b32 v[0:1], v2, off
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_setpc_b64 s[30:31]
				%v = call ptr addrspace(6) @llvm.amdgcn.permlanex16.p6(ptr addrspace(6) %src0, ptr addrspace(6) %src0, i32 %src1, i32 %src2, i1 true, i1 false)
				store ptr addrspace(6) %v, ptr addrspace(1) %out, align 4
				ret void
				}
				;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
				; GFX10-SDAG: {{.*}}
				; GFX11-SDAG: {{.*}}

llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
	; RUN: opt -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -passes=instcombine -S < %s | FileCheck %s			; RUN: opt -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -passes=instcombine -S < %s | FileCheck %s

	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	; llvm.amdgcn.rcp			; llvm.amdgcn.rcp
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	declare float @llvm.amdgcn.rcp.f32(float) nounwind readnone			declare float @llvm.amdgcn.rcp.f32(float) nounwind readnone
	declare double @llvm.amdgcn.rcp.f64(double) nounwind readnone			declare double @llvm.amdgcn.rcp.f64(double) nounwind readnone

	define float @test_constant_fold_rcp_f32_undef() nounwind {			define float @test_constant_fold_rcp_f32_undef() nounwind {
	; CHECK-LABEL: @test_constant_fold_rcp_f32_undef(			; CHECK-LABEL: define float @test_constant_fold_rcp_f32_undef
				; CHECK-SAME: () #[[ATTR1:[0-9]+]] {
	; CHECK-NEXT: ret float 0x7FF8000000000000			; CHECK-NEXT: ret float 0x7FF8000000000000
	;			;
	%val = call float @llvm.amdgcn.rcp.f32(float undef) nounwind readnone			%val = call float @llvm.amdgcn.rcp.f32(float undef) nounwind readnone
	ret float %val			ret float %val
	}			}

	define float @test_constant_fold_rcp_f32_1() nounwind {			define float @test_constant_fold_rcp_f32_1() nounwind {
	; CHECK-LABEL: @test_constant_fold_rcp_f32_1(			; CHECK-LABEL: define float @test_constant_fold_rcp_f32_1
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret float 1.000000e+00			; CHECK-NEXT: ret float 1.000000e+00
	;			;
	%val = call float @llvm.amdgcn.rcp.f32(float 1.0) nounwind readnone			%val = call float @llvm.amdgcn.rcp.f32(float 1.0) nounwind readnone
	ret float %val			ret float %val
	}			}

	define double @test_constant_fold_rcp_f64_1() nounwind {			define double @test_constant_fold_rcp_f64_1() nounwind {
	; CHECK-LABEL: @test_constant_fold_rcp_f64_1(			; CHECK-LABEL: define double @test_constant_fold_rcp_f64_1
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret double 1.000000e+00			; CHECK-NEXT: ret double 1.000000e+00
	;			;
	%val = call double @llvm.amdgcn.rcp.f64(double 1.0) nounwind readnone			%val = call double @llvm.amdgcn.rcp.f64(double 1.0) nounwind readnone
	ret double %val			ret double %val
	}			}

	define float @test_constant_fold_rcp_f32_half() nounwind {			define float @test_constant_fold_rcp_f32_half() nounwind {
	; CHECK-LABEL: @test_constant_fold_rcp_f32_half(			; CHECK-LABEL: define float @test_constant_fold_rcp_f32_half
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret float 2.000000e+00			; CHECK-NEXT: ret float 2.000000e+00
	;			;
	%val = call float @llvm.amdgcn.rcp.f32(float 0.5) nounwind readnone			%val = call float @llvm.amdgcn.rcp.f32(float 0.5) nounwind readnone
	ret float %val			ret float %val
	}			}

	define double @test_constant_fold_rcp_f64_half() nounwind {			define double @test_constant_fold_rcp_f64_half() nounwind {
	; CHECK-LABEL: @test_constant_fold_rcp_f64_half(			; CHECK-LABEL: define double @test_constant_fold_rcp_f64_half
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret double 2.000000e+00			; CHECK-NEXT: ret double 2.000000e+00
	;			;
	%val = call double @llvm.amdgcn.rcp.f64(double 0.5) nounwind readnone			%val = call double @llvm.amdgcn.rcp.f64(double 0.5) nounwind readnone
	ret double %val			ret double %val
	}			}

	define float @test_constant_fold_rcp_f32_43() nounwind {			define float @test_constant_fold_rcp_f32_43() nounwind {
	; CHECK-LABEL: @test_constant_fold_rcp_f32_43(			; CHECK-LABEL: define float @test_constant_fold_rcp_f32_43
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret float 0x3F97D05F40000000			; CHECK-NEXT: ret float 0x3F97D05F40000000
	;			;
	%val = call float @llvm.amdgcn.rcp.f32(float 4.300000e+01) nounwind readnone			%val = call float @llvm.amdgcn.rcp.f32(float 4.300000e+01) nounwind readnone
	ret float %val			ret float %val
	}			}

	define double @test_constant_fold_rcp_f64_43() nounwind {			define double @test_constant_fold_rcp_f64_43() nounwind {
	; CHECK-LABEL: @test_constant_fold_rcp_f64_43(			; CHECK-LABEL: define double @test_constant_fold_rcp_f64_43
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret double 0x3F97D05F417D05F4			; CHECK-NEXT: ret double 0x3F97D05F417D05F4
	;			;
	%val = call double @llvm.amdgcn.rcp.f64(double 4.300000e+01) nounwind readnone			%val = call double @llvm.amdgcn.rcp.f64(double 4.300000e+01) nounwind readnone
	ret double %val			ret double %val
	}			}

	define float @test_constant_fold_rcp_f32_43_strictfp() nounwind strictfp {			define float @test_constant_fold_rcp_f32_43_strictfp() nounwind strictfp {
	; CHECK-LABEL: @test_constant_fold_rcp_f32_43_strictfp(			; CHECK-LABEL: define float @test_constant_fold_rcp_f32_43_strictfp
	; CHECK-NEXT: [[VAL:%.*]] = call float @llvm.amdgcn.rcp.f32(float 4.300000e+01) #[[ATTR14:[0-9]+]]			; CHECK-SAME: () #[[ATTR2:[0-9]+]] {
				; CHECK-NEXT: [[VAL:%.*]] = call float @llvm.amdgcn.rcp.f32(float 4.300000e+01) #[[ATTR13:[0-9]+]]
	; CHECK-NEXT: ret float [[VAL]]			; CHECK-NEXT: ret float [[VAL]]
	;			;
	%val = call float @llvm.amdgcn.rcp.f32(float 4.300000e+01) strictfp nounwind readnone			%val = call float @llvm.amdgcn.rcp.f32(float 4.300000e+01) strictfp nounwind readnone
	ret float %val			ret float %val
	}			}

	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	; llvm.amdgcn.sqrt			; llvm.amdgcn.sqrt
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	declare half @llvm.amdgcn.sqrt.f16(half) nounwind readnone			declare half @llvm.amdgcn.sqrt.f16(half) nounwind readnone
	declare float @llvm.amdgcn.sqrt.f32(float) nounwind readnone			declare float @llvm.amdgcn.sqrt.f32(float) nounwind readnone
	declare double @llvm.amdgcn.sqrt.f64(double) nounwind readnone			declare double @llvm.amdgcn.sqrt.f64(double) nounwind readnone

	define half @test_constant_fold_sqrt_f16_undef() nounwind {			define half @test_constant_fold_sqrt_f16_undef() nounwind {
	; CHECK-LABEL: @test_constant_fold_sqrt_f16_undef(			; CHECK-LABEL: define half @test_constant_fold_sqrt_f16_undef
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret half 0xH7E00			; CHECK-NEXT: ret half 0xH7E00
	;			;
	%val = call half @llvm.amdgcn.sqrt.f16(half undef) nounwind readnone			%val = call half @llvm.amdgcn.sqrt.f16(half undef) nounwind readnone
	ret half %val			ret half %val
	}			}

	define float @test_constant_fold_sqrt_f32_undef() nounwind {			define float @test_constant_fold_sqrt_f32_undef() nounwind {
	; CHECK-LABEL: @test_constant_fold_sqrt_f32_undef(			; CHECK-LABEL: define float @test_constant_fold_sqrt_f32_undef
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret float 0x7FF8000000000000			; CHECK-NEXT: ret float 0x7FF8000000000000
	;			;
	%val = call float @llvm.amdgcn.sqrt.f32(float undef) nounwind readnone			%val = call float @llvm.amdgcn.sqrt.f32(float undef) nounwind readnone
	ret float %val			ret float %val
	}			}

	define double @test_constant_fold_sqrt_f64_undef() nounwind {			define double @test_constant_fold_sqrt_f64_undef() nounwind {
	; CHECK-LABEL: @test_constant_fold_sqrt_f64_undef(			; CHECK-LABEL: define double @test_constant_fold_sqrt_f64_undef
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret double 0x7FF8000000000000			; CHECK-NEXT: ret double 0x7FF8000000000000
	;			;
	%val = call double @llvm.amdgcn.sqrt.f64(double undef) nounwind readnone			%val = call double @llvm.amdgcn.sqrt.f64(double undef) nounwind readnone
	ret double %val			ret double %val
	}			}

	define half @test_constant_fold_sqrt_f16_0() nounwind {			define half @test_constant_fold_sqrt_f16_0() nounwind {
	; CHECK-LABEL: @test_constant_fold_sqrt_f16_0(			; CHECK-LABEL: define half @test_constant_fold_sqrt_f16_0
	; CHECK-NEXT: [[VAL:%.*]] = call half @llvm.amdgcn.sqrt.f16(half 0xH0000) #[[ATTR15:[0-9]+]]			; CHECK-SAME: () #[[ATTR1]] {
				; CHECK-NEXT: [[VAL:%.*]] = call half @llvm.amdgcn.sqrt.f16(half 0xH0000) #[[ATTR14:[0-9]+]]
	; CHECK-NEXT: ret half [[VAL]]			; CHECK-NEXT: ret half [[VAL]]
	;			;
	%val = call half @llvm.amdgcn.sqrt.f16(half 0.0) nounwind readnone			%val = call half @llvm.amdgcn.sqrt.f16(half 0.0) nounwind readnone
	ret half %val			ret half %val
	}			}

	define float @test_constant_fold_sqrt_f32_0() nounwind {			define float @test_constant_fold_sqrt_f32_0() nounwind {
	; CHECK-LABEL: @test_constant_fold_sqrt_f32_0(			; CHECK-LABEL: define float @test_constant_fold_sqrt_f32_0
	; CHECK-NEXT: [[VAL:%.*]] = call float @llvm.amdgcn.sqrt.f32(float 0.000000e+00) #[[ATTR15]]			; CHECK-SAME: () #[[ATTR1]] {
				; CHECK-NEXT: [[VAL:%.*]] = call float @llvm.amdgcn.sqrt.f32(float 0.000000e+00) #[[ATTR14]]
	; CHECK-NEXT: ret float [[VAL]]			; CHECK-NEXT: ret float [[VAL]]
	;			;
	%val = call float @llvm.amdgcn.sqrt.f32(float 0.0) nounwind readnone			%val = call float @llvm.amdgcn.sqrt.f32(float 0.0) nounwind readnone
	ret float %val			ret float %val
	}			}

	define double @test_constant_fold_sqrt_f64_0() nounwind {			define double @test_constant_fold_sqrt_f64_0() nounwind {
	; CHECK-LABEL: @test_constant_fold_sqrt_f64_0(			; CHECK-LABEL: define double @test_constant_fold_sqrt_f64_0
	; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.sqrt.f64(double 0.000000e+00) #[[ATTR15]]			; CHECK-SAME: () #[[ATTR1]] {
				; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.sqrt.f64(double 0.000000e+00) #[[ATTR14]]
	; CHECK-NEXT: ret double [[VAL]]			; CHECK-NEXT: ret double [[VAL]]
	;			;
	%val = call double @llvm.amdgcn.sqrt.f64(double 0.0) nounwind readnone			%val = call double @llvm.amdgcn.sqrt.f64(double 0.0) nounwind readnone
	ret double %val			ret double %val
	}			}

	define half @test_constant_fold_sqrt_f16_neg0() nounwind {			define half @test_constant_fold_sqrt_f16_neg0() nounwind {
	; CHECK-LABEL: @test_constant_fold_sqrt_f16_neg0(			; CHECK-LABEL: define half @test_constant_fold_sqrt_f16_neg0
	; CHECK-NEXT: [[VAL:%.*]] = call half @llvm.amdgcn.sqrt.f16(half 0xH8000) #[[ATTR15]]			; CHECK-SAME: () #[[ATTR1]] {
				; CHECK-NEXT: [[VAL:%.*]] = call half @llvm.amdgcn.sqrt.f16(half 0xH8000) #[[ATTR14]]
	; CHECK-NEXT: ret half [[VAL]]			; CHECK-NEXT: ret half [[VAL]]
	;			;
	%val = call half @llvm.amdgcn.sqrt.f16(half -0.0) nounwind readnone			%val = call half @llvm.amdgcn.sqrt.f16(half -0.0) nounwind readnone
	ret half %val			ret half %val
	}			}

	define float @test_constant_fold_sqrt_f32_neg0() nounwind {			define float @test_constant_fold_sqrt_f32_neg0() nounwind {
	; CHECK-LABEL: @test_constant_fold_sqrt_f32_neg0(			; CHECK-LABEL: define float @test_constant_fold_sqrt_f32_neg0
	; CHECK-NEXT: [[VAL:%.*]] = call float @llvm.amdgcn.sqrt.f32(float -0.000000e+00) #[[ATTR15]]			; CHECK-SAME: () #[[ATTR1]] {
				; CHECK-NEXT: [[VAL:%.*]] = call float @llvm.amdgcn.sqrt.f32(float -0.000000e+00) #[[ATTR14]]
	; CHECK-NEXT: ret float [[VAL]]			; CHECK-NEXT: ret float [[VAL]]
	;			;
	%val = call float @llvm.amdgcn.sqrt.f32(float -0.0) nounwind readnone			%val = call float @llvm.amdgcn.sqrt.f32(float -0.0) nounwind readnone
	ret float %val			ret float %val
	}			}

	define double @test_constant_fold_sqrt_f64_neg0() nounwind {			define double @test_constant_fold_sqrt_f64_neg0() nounwind {
	; CHECK-LABEL: @test_constant_fold_sqrt_f64_neg0(			; CHECK-LABEL: define double @test_constant_fold_sqrt_f64_neg0
	; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.sqrt.f64(double -0.000000e+00) #[[ATTR15]]			; CHECK-SAME: () #[[ATTR1]] {
				; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.sqrt.f64(double -0.000000e+00) #[[ATTR14]]
	; CHECK-NEXT: ret double [[VAL]]			; CHECK-NEXT: ret double [[VAL]]
	;			;
	%val = call double @llvm.amdgcn.sqrt.f64(double -0.0) nounwind readnone			%val = call double @llvm.amdgcn.sqrt.f64(double -0.0) nounwind readnone
	ret double %val			ret double %val
	}			}

	define double @test_constant_fold_sqrt_snan_f64() nounwind {			define double @test_constant_fold_sqrt_snan_f64() nounwind {
	; CHECK-LABEL: @test_constant_fold_sqrt_snan_f64(			; CHECK-LABEL: define double @test_constant_fold_sqrt_snan_f64
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.sqrt.f64(double 0x7FF0000000000001)			; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.sqrt.f64(double 0x7FF0000000000001)
	; CHECK-NEXT: ret double [[VAL]]			; CHECK-NEXT: ret double [[VAL]]
	;			;
	%val = call double @llvm.amdgcn.sqrt.f64(double 0x7FF0000000000001)			%val = call double @llvm.amdgcn.sqrt.f64(double 0x7FF0000000000001)
	ret double %val			ret double %val
	}			}

	define double @test_constant_fold_sqrt_qnan_f64() nounwind {			define double @test_constant_fold_sqrt_qnan_f64() nounwind {
	; CHECK-LABEL: @test_constant_fold_sqrt_qnan_f64(			; CHECK-LABEL: define double @test_constant_fold_sqrt_qnan_f64
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.sqrt.f64(double 0x7FF8000000000000)			; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.sqrt.f64(double 0x7FF8000000000000)
	; CHECK-NEXT: ret double [[VAL]]			; CHECK-NEXT: ret double [[VAL]]
	;			;
	%val = call double @llvm.amdgcn.sqrt.f64(double 0x7FF8000000000000)			%val = call double @llvm.amdgcn.sqrt.f64(double 0x7FF8000000000000)
	ret double %val			ret double %val
	}			}

	define double @test_constant_fold_sqrt_neg1() nounwind {			define double @test_constant_fold_sqrt_neg1() nounwind {
	; CHECK-LABEL: @test_constant_fold_sqrt_neg1(			; CHECK-LABEL: define double @test_constant_fold_sqrt_neg1
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.sqrt.f64(double -1.000000e+00)			; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.sqrt.f64(double -1.000000e+00)
	; CHECK-NEXT: ret double [[VAL]]			; CHECK-NEXT: ret double [[VAL]]
	;			;
	%val = call double @llvm.amdgcn.sqrt.f64(double -1.0)			%val = call double @llvm.amdgcn.sqrt.f64(double -1.0)
	ret double %val			ret double %val
	}			}

	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	; llvm.amdgcn.rsq			; llvm.amdgcn.rsq
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	declare float @llvm.amdgcn.rsq.f32(float) nounwind readnone			declare float @llvm.amdgcn.rsq.f32(float) nounwind readnone

	define float @test_constant_fold_rsq_f32_undef() nounwind {			define float @test_constant_fold_rsq_f32_undef() nounwind {
	; CHECK-LABEL: @test_constant_fold_rsq_f32_undef(			; CHECK-LABEL: define float @test_constant_fold_rsq_f32_undef
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret float 0x7FF8000000000000			; CHECK-NEXT: ret float 0x7FF8000000000000
	;			;
	%val = call float @llvm.amdgcn.rsq.f32(float undef) nounwind readnone			%val = call float @llvm.amdgcn.rsq.f32(float undef) nounwind readnone
	ret float %val			ret float %val
	}			}

	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	; llvm.amdgcn.frexp.mant			; llvm.amdgcn.frexp.mant
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	declare float @llvm.amdgcn.frexp.mant.f32(float) nounwind readnone			declare float @llvm.amdgcn.frexp.mant.f32(float) nounwind readnone
	declare double @llvm.amdgcn.frexp.mant.f64(double) nounwind readnone			declare double @llvm.amdgcn.frexp.mant.f64(double) nounwind readnone


	define float @test_constant_fold_frexp_mant_f32_undef() nounwind {			define float @test_constant_fold_frexp_mant_f32_undef() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_mant_f32_undef(			; CHECK-LABEL: define float @test_constant_fold_frexp_mant_f32_undef
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret float undef			; CHECK-NEXT: ret float undef
	;			;
	%val = call float @llvm.amdgcn.frexp.mant.f32(float undef)			%val = call float @llvm.amdgcn.frexp.mant.f32(float undef)
	ret float %val			ret float %val
	}			}

	define double @test_constant_fold_frexp_mant_f64_undef() nounwind {			define double @test_constant_fold_frexp_mant_f64_undef() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_mant_f64_undef(			; CHECK-LABEL: define double @test_constant_fold_frexp_mant_f64_undef
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret double undef			; CHECK-NEXT: ret double undef
	;			;
	%val = call double @llvm.amdgcn.frexp.mant.f64(double undef)			%val = call double @llvm.amdgcn.frexp.mant.f64(double undef)
	ret double %val			ret double %val
	}			}

	define float @test_constant_fold_frexp_mant_f32_0() nounwind {			define float @test_constant_fold_frexp_mant_f32_0() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_mant_f32_0(			; CHECK-LABEL: define float @test_constant_fold_frexp_mant_f32_0
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret float 0.000000e+00			; CHECK-NEXT: ret float 0.000000e+00
	;			;
	%val = call float @llvm.amdgcn.frexp.mant.f32(float 0.0)			%val = call float @llvm.amdgcn.frexp.mant.f32(float 0.0)
	ret float %val			ret float %val
	}			}

	define double @test_constant_fold_frexp_mant_f64_0() nounwind {			define double @test_constant_fold_frexp_mant_f64_0() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_mant_f64_0(			; CHECK-LABEL: define double @test_constant_fold_frexp_mant_f64_0
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret double 0.000000e+00			; CHECK-NEXT: ret double 0.000000e+00
	;			;
	%val = call double @llvm.amdgcn.frexp.mant.f64(double 0.0)			%val = call double @llvm.amdgcn.frexp.mant.f64(double 0.0)
	ret double %val			ret double %val
	}			}

	define float @test_constant_fold_frexp_mant_f32_n0() nounwind {			define float @test_constant_fold_frexp_mant_f32_n0() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_mant_f32_n0(			; CHECK-LABEL: define float @test_constant_fold_frexp_mant_f32_n0
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret float -0.000000e+00			; CHECK-NEXT: ret float -0.000000e+00
	;			;
	%val = call float @llvm.amdgcn.frexp.mant.f32(float -0.0)			%val = call float @llvm.amdgcn.frexp.mant.f32(float -0.0)
	ret float %val			ret float %val
	}			}

	define double @test_constant_fold_frexp_mant_f64_n0() nounwind {			define double @test_constant_fold_frexp_mant_f64_n0() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_mant_f64_n0(			; CHECK-LABEL: define double @test_constant_fold_frexp_mant_f64_n0
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret double -0.000000e+00			; CHECK-NEXT: ret double -0.000000e+00
	;			;
	%val = call double @llvm.amdgcn.frexp.mant.f64(double -0.0)			%val = call double @llvm.amdgcn.frexp.mant.f64(double -0.0)
	ret double %val			ret double %val
	}			}

	define float @test_constant_fold_frexp_mant_f32_1() nounwind {			define float @test_constant_fold_frexp_mant_f32_1() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_mant_f32_1(			; CHECK-LABEL: define float @test_constant_fold_frexp_mant_f32_1
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret float 5.000000e-01			; CHECK-NEXT: ret float 5.000000e-01
	;			;
	%val = call float @llvm.amdgcn.frexp.mant.f32(float 1.0)			%val = call float @llvm.amdgcn.frexp.mant.f32(float 1.0)
	ret float %val			ret float %val
	}			}

	define double @test_constant_fold_frexp_mant_f64_1() nounwind {			define double @test_constant_fold_frexp_mant_f64_1() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_mant_f64_1(			; CHECK-LABEL: define double @test_constant_fold_frexp_mant_f64_1
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret double 5.000000e-01			; CHECK-NEXT: ret double 5.000000e-01
	;			;
	%val = call double @llvm.amdgcn.frexp.mant.f64(double 1.0)			%val = call double @llvm.amdgcn.frexp.mant.f64(double 1.0)
	ret double %val			ret double %val
	}			}

	define float @test_constant_fold_frexp_mant_f32_n1() nounwind {			define float @test_constant_fold_frexp_mant_f32_n1() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_mant_f32_n1(			; CHECK-LABEL: define float @test_constant_fold_frexp_mant_f32_n1
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret float -5.000000e-01			; CHECK-NEXT: ret float -5.000000e-01
	;			;
	%val = call float @llvm.amdgcn.frexp.mant.f32(float -1.0)			%val = call float @llvm.amdgcn.frexp.mant.f32(float -1.0)
	ret float %val			ret float %val
	}			}

	define double @test_constant_fold_frexp_mant_f64_n1() nounwind {			define double @test_constant_fold_frexp_mant_f64_n1() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_mant_f64_n1(			; CHECK-LABEL: define double @test_constant_fold_frexp_mant_f64_n1
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret double -5.000000e-01			; CHECK-NEXT: ret double -5.000000e-01
	;			;
	%val = call double @llvm.amdgcn.frexp.mant.f64(double -1.0)			%val = call double @llvm.amdgcn.frexp.mant.f64(double -1.0)
	ret double %val			ret double %val
	}			}

	define float @test_constant_fold_frexp_mant_f32_nan() nounwind {			define float @test_constant_fold_frexp_mant_f32_nan() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_mant_f32_nan(			; CHECK-LABEL: define float @test_constant_fold_frexp_mant_f32_nan
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret float 0x7FF8000000000000			; CHECK-NEXT: ret float 0x7FF8000000000000
	;			;
	%val = call float @llvm.amdgcn.frexp.mant.f32(float 0x7FF8000000000000)			%val = call float @llvm.amdgcn.frexp.mant.f32(float 0x7FF8000000000000)
	ret float %val			ret float %val
	}			}

	define double @test_constant_fold_frexp_mant_f64_nan() nounwind {			define double @test_constant_fold_frexp_mant_f64_nan() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_mant_f64_nan(			; CHECK-LABEL: define double @test_constant_fold_frexp_mant_f64_nan
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret double 0x7FF8000000000000			; CHECK-NEXT: ret double 0x7FF8000000000000
	;			;
	%val = call double @llvm.amdgcn.frexp.mant.f64(double 0x7FF8000000000000)			%val = call double @llvm.amdgcn.frexp.mant.f64(double 0x7FF8000000000000)
	ret double %val			ret double %val
	}			}

	define float @test_constant_fold_frexp_mant_f32_inf() nounwind {			define float @test_constant_fold_frexp_mant_f32_inf() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_mant_f32_inf(			; CHECK-LABEL: define float @test_constant_fold_frexp_mant_f32_inf
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret float 0x7FF0000000000000			; CHECK-NEXT: ret float 0x7FF0000000000000
	;			;
	%val = call float @llvm.amdgcn.frexp.mant.f32(float 0x7FF0000000000000)			%val = call float @llvm.amdgcn.frexp.mant.f32(float 0x7FF0000000000000)
	ret float %val			ret float %val
	}			}

	define double @test_constant_fold_frexp_mant_f64_inf() nounwind {			define double @test_constant_fold_frexp_mant_f64_inf() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_mant_f64_inf(			; CHECK-LABEL: define double @test_constant_fold_frexp_mant_f64_inf
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret double 0x7FF0000000000000			; CHECK-NEXT: ret double 0x7FF0000000000000
	;			;
	%val = call double @llvm.amdgcn.frexp.mant.f64(double 0x7FF0000000000000)			%val = call double @llvm.amdgcn.frexp.mant.f64(double 0x7FF0000000000000)
	ret double %val			ret double %val
	}			}

	define float @test_constant_fold_frexp_mant_f32_ninf() nounwind {			define float @test_constant_fold_frexp_mant_f32_ninf() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_mant_f32_ninf(			; CHECK-LABEL: define float @test_constant_fold_frexp_mant_f32_ninf
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret float 0xFFF0000000000000			; CHECK-NEXT: ret float 0xFFF0000000000000
	;			;
	%val = call float @llvm.amdgcn.frexp.mant.f32(float 0xFFF0000000000000)			%val = call float @llvm.amdgcn.frexp.mant.f32(float 0xFFF0000000000000)
	ret float %val			ret float %val
	}			}

	define double @test_constant_fold_frexp_mant_f64_ninf() nounwind {			define double @test_constant_fold_frexp_mant_f64_ninf() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_mant_f64_ninf(			; CHECK-LABEL: define double @test_constant_fold_frexp_mant_f64_ninf
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret double 0xFFF0000000000000			; CHECK-NEXT: ret double 0xFFF0000000000000
	;			;
	%val = call double @llvm.amdgcn.frexp.mant.f64(double 0xFFF0000000000000)			%val = call double @llvm.amdgcn.frexp.mant.f64(double 0xFFF0000000000000)
	ret double %val			ret double %val
	}			}

	define float @test_constant_fold_frexp_mant_f32_max_num() nounwind {			define float @test_constant_fold_frexp_mant_f32_max_num() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_mant_f32_max_num(			; CHECK-LABEL: define float @test_constant_fold_frexp_mant_f32_max_num
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret float 0x3FEFFFFFE0000000			; CHECK-NEXT: ret float 0x3FEFFFFFE0000000
	;			;
	%val = call float @llvm.amdgcn.frexp.mant.f32(float 0x47EFFFFFE0000000)			%val = call float @llvm.amdgcn.frexp.mant.f32(float 0x47EFFFFFE0000000)
	ret float %val			ret float %val
	}			}

	define double @test_constant_fold_frexp_mant_f64_max_num() nounwind {			define double @test_constant_fold_frexp_mant_f64_max_num() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_mant_f64_max_num(			; CHECK-LABEL: define double @test_constant_fold_frexp_mant_f64_max_num
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret double 0x3FEFFFFFFFFFFFFF			; CHECK-NEXT: ret double 0x3FEFFFFFFFFFFFFF
	;			;
	%val = call double @llvm.amdgcn.frexp.mant.f64(double 0x7FEFFFFFFFFFFFFF)			%val = call double @llvm.amdgcn.frexp.mant.f64(double 0x7FEFFFFFFFFFFFFF)
	ret double %val			ret double %val
	}			}

	define float @test_constant_fold_frexp_mant_f32_min_num() nounwind {			define float @test_constant_fold_frexp_mant_f32_min_num() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_mant_f32_min_num(			; CHECK-LABEL: define float @test_constant_fold_frexp_mant_f32_min_num
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret float 5.000000e-01			; CHECK-NEXT: ret float 5.000000e-01
	;			;
	%val = call float @llvm.amdgcn.frexp.mant.f32(float 0x36A0000000000000)			%val = call float @llvm.amdgcn.frexp.mant.f32(float 0x36A0000000000000)
	ret float %val			ret float %val
	}			}

	define double @test_constant_fold_frexp_mant_f64_min_num() nounwind {			define double @test_constant_fold_frexp_mant_f64_min_num() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_mant_f64_min_num(			; CHECK-LABEL: define double @test_constant_fold_frexp_mant_f64_min_num
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret double 5.000000e-01			; CHECK-NEXT: ret double 5.000000e-01
	;			;
	%val = call double @llvm.amdgcn.frexp.mant.f64(double 4.940656e-324)			%val = call double @llvm.amdgcn.frexp.mant.f64(double 4.940656e-324)
	ret double %val			ret double %val
	}			}


	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	; llvm.amdgcn.frexp.exp			; llvm.amdgcn.frexp.exp
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	declare i32 @llvm.amdgcn.frexp.exp.f32(float) nounwind readnone			declare i32 @llvm.amdgcn.frexp.exp.f32(float) nounwind readnone
	declare i32 @llvm.amdgcn.frexp.exp.f64(double) nounwind readnone			declare i32 @llvm.amdgcn.frexp.exp.f64(double) nounwind readnone

	define i32 @test_constant_fold_frexp_exp_f32_undef() nounwind {			define i32 @test_constant_fold_frexp_exp_f32_undef() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_exp_f32_undef(			; CHECK-LABEL: define i32 @test_constant_fold_frexp_exp_f32_undef
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	%val = call i32 @llvm.amdgcn.frexp.exp.f32(float undef)			%val = call i32 @llvm.amdgcn.frexp.exp.f32(float undef)
	ret i32 %val			ret i32 %val
	}			}

	define i32 @test_constant_fold_frexp_exp_f64_undef() nounwind {			define i32 @test_constant_fold_frexp_exp_f64_undef() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_exp_f64_undef(			; CHECK-LABEL: define i32 @test_constant_fold_frexp_exp_f64_undef
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	%val = call i32 @llvm.amdgcn.frexp.exp.f64(double undef)			%val = call i32 @llvm.amdgcn.frexp.exp.f64(double undef)
	ret i32 %val			ret i32 %val
	}			}

	define i32 @test_constant_fold_frexp_exp_f32_0() nounwind {			define i32 @test_constant_fold_frexp_exp_f32_0() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_exp_f32_0(			; CHECK-LABEL: define i32 @test_constant_fold_frexp_exp_f32_0
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	%val = call i32 @llvm.amdgcn.frexp.exp.f32(float 0.0)			%val = call i32 @llvm.amdgcn.frexp.exp.f32(float 0.0)
	ret i32 %val			ret i32 %val
	}			}

	define i32 @test_constant_fold_frexp_exp_f64_0() nounwind {			define i32 @test_constant_fold_frexp_exp_f64_0() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_exp_f64_0(			; CHECK-LABEL: define i32 @test_constant_fold_frexp_exp_f64_0
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	%val = call i32 @llvm.amdgcn.frexp.exp.f64(double 0.0)			%val = call i32 @llvm.amdgcn.frexp.exp.f64(double 0.0)
	ret i32 %val			ret i32 %val
	}			}

	define i32 @test_constant_fold_frexp_exp_f32_n0() nounwind {			define i32 @test_constant_fold_frexp_exp_f32_n0() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_exp_f32_n0(			; CHECK-LABEL: define i32 @test_constant_fold_frexp_exp_f32_n0
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	%val = call i32 @llvm.amdgcn.frexp.exp.f32(float -0.0)			%val = call i32 @llvm.amdgcn.frexp.exp.f32(float -0.0)
	ret i32 %val			ret i32 %val
	}			}

	define i32 @test_constant_fold_frexp_exp_f64_n0() nounwind {			define i32 @test_constant_fold_frexp_exp_f64_n0() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_exp_f64_n0(			; CHECK-LABEL: define i32 @test_constant_fold_frexp_exp_f64_n0
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	%val = call i32 @llvm.amdgcn.frexp.exp.f64(double -0.0)			%val = call i32 @llvm.amdgcn.frexp.exp.f64(double -0.0)
	ret i32 %val			ret i32 %val
	}			}

	define i32 @test_constant_fold_frexp_exp_f32_1024() nounwind {			define i32 @test_constant_fold_frexp_exp_f32_1024() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_exp_f32_1024(			; CHECK-LABEL: define i32 @test_constant_fold_frexp_exp_f32_1024
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i32 11			; CHECK-NEXT: ret i32 11
	;			;
	%val = call i32 @llvm.amdgcn.frexp.exp.f32(float 1024.0)			%val = call i32 @llvm.amdgcn.frexp.exp.f32(float 1024.0)
	ret i32 %val			ret i32 %val
	}			}

	define i32 @test_constant_fold_frexp_exp_f64_1024() nounwind {			define i32 @test_constant_fold_frexp_exp_f64_1024() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_exp_f64_1024(			; CHECK-LABEL: define i32 @test_constant_fold_frexp_exp_f64_1024
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i32 11			; CHECK-NEXT: ret i32 11
	;			;
	%val = call i32 @llvm.amdgcn.frexp.exp.f64(double 1024.0)			%val = call i32 @llvm.amdgcn.frexp.exp.f64(double 1024.0)
	ret i32 %val			ret i32 %val
	}			}

	define i32 @test_constant_fold_frexp_exp_f32_n1024() nounwind {			define i32 @test_constant_fold_frexp_exp_f32_n1024() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_exp_f32_n1024(			; CHECK-LABEL: define i32 @test_constant_fold_frexp_exp_f32_n1024
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i32 11			; CHECK-NEXT: ret i32 11
	;			;
	%val = call i32 @llvm.amdgcn.frexp.exp.f32(float -1024.0)			%val = call i32 @llvm.amdgcn.frexp.exp.f32(float -1024.0)
	ret i32 %val			ret i32 %val
	}			}

	define i32 @test_constant_fold_frexp_exp_f64_n1024() nounwind {			define i32 @test_constant_fold_frexp_exp_f64_n1024() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_exp_f64_n1024(			; CHECK-LABEL: define i32 @test_constant_fold_frexp_exp_f64_n1024
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i32 11			; CHECK-NEXT: ret i32 11
	;			;
	%val = call i32 @llvm.amdgcn.frexp.exp.f64(double -1024.0)			%val = call i32 @llvm.amdgcn.frexp.exp.f64(double -1024.0)
	ret i32 %val			ret i32 %val
	}			}

	define i32 @test_constant_fold_frexp_exp_f32_1_1024() nounwind {			define i32 @test_constant_fold_frexp_exp_f32_1_1024() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_exp_f32_1_1024(			; CHECK-LABEL: define i32 @test_constant_fold_frexp_exp_f32_1_1024
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i32 -9			; CHECK-NEXT: ret i32 -9
	;			;
	%val = call i32 @llvm.amdgcn.frexp.exp.f32(float 0.0009765625)			%val = call i32 @llvm.amdgcn.frexp.exp.f32(float 0.0009765625)
	ret i32 %val			ret i32 %val
	}			}

	define i32 @test_constant_fold_frexp_exp_f64_1_1024() nounwind {			define i32 @test_constant_fold_frexp_exp_f64_1_1024() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_exp_f64_1_1024(			; CHECK-LABEL: define i32 @test_constant_fold_frexp_exp_f64_1_1024
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i32 -9			; CHECK-NEXT: ret i32 -9
	;			;
	%val = call i32 @llvm.amdgcn.frexp.exp.f64(double 0.0009765625)			%val = call i32 @llvm.amdgcn.frexp.exp.f64(double 0.0009765625)
	ret i32 %val			ret i32 %val
	}			}

	define i32 @test_constant_fold_frexp_exp_f32_nan() nounwind {			define i32 @test_constant_fold_frexp_exp_f32_nan() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_exp_f32_nan(			; CHECK-LABEL: define i32 @test_constant_fold_frexp_exp_f32_nan
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	%val = call i32 @llvm.amdgcn.frexp.exp.f32(float 0x7FF8000000000000)			%val = call i32 @llvm.amdgcn.frexp.exp.f32(float 0x7FF8000000000000)
	ret i32 %val			ret i32 %val
	}			}

	define i32 @test_constant_fold_frexp_exp_f64_nan() nounwind {			define i32 @test_constant_fold_frexp_exp_f64_nan() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_exp_f64_nan(			; CHECK-LABEL: define i32 @test_constant_fold_frexp_exp_f64_nan
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	%val = call i32 @llvm.amdgcn.frexp.exp.f64(double 0x7FF8000000000000)			%val = call i32 @llvm.amdgcn.frexp.exp.f64(double 0x7FF8000000000000)
	ret i32 %val			ret i32 %val
	}			}

	define i32 @test_constant_fold_frexp_exp_f32_inf() nounwind {			define i32 @test_constant_fold_frexp_exp_f32_inf() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_exp_f32_inf(			; CHECK-LABEL: define i32 @test_constant_fold_frexp_exp_f32_inf
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	%val = call i32 @llvm.amdgcn.frexp.exp.f32(float 0x7FF0000000000000)			%val = call i32 @llvm.amdgcn.frexp.exp.f32(float 0x7FF0000000000000)
	ret i32 %val			ret i32 %val
	}			}

	define i32 @test_constant_fold_frexp_exp_f64_inf() nounwind {			define i32 @test_constant_fold_frexp_exp_f64_inf() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_exp_f64_inf(			; CHECK-LABEL: define i32 @test_constant_fold_frexp_exp_f64_inf
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	%val = call i32 @llvm.amdgcn.frexp.exp.f64(double 0x7FF0000000000000)			%val = call i32 @llvm.amdgcn.frexp.exp.f64(double 0x7FF0000000000000)
	ret i32 %val			ret i32 %val
	}			}

	define i32 @test_constant_fold_frexp_exp_f32_ninf() nounwind {			define i32 @test_constant_fold_frexp_exp_f32_ninf() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_exp_f32_ninf(			; CHECK-LABEL: define i32 @test_constant_fold_frexp_exp_f32_ninf
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	%val = call i32 @llvm.amdgcn.frexp.exp.f32(float 0xFFF0000000000000)			%val = call i32 @llvm.amdgcn.frexp.exp.f32(float 0xFFF0000000000000)
	ret i32 %val			ret i32 %val
	}			}

	define i32 @test_constant_fold_frexp_exp_f64_ninf() nounwind {			define i32 @test_constant_fold_frexp_exp_f64_ninf() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_exp_f64_ninf(			; CHECK-LABEL: define i32 @test_constant_fold_frexp_exp_f64_ninf
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	%val = call i32 @llvm.amdgcn.frexp.exp.f64(double 0xFFF0000000000000)			%val = call i32 @llvm.amdgcn.frexp.exp.f64(double 0xFFF0000000000000)
	ret i32 %val			ret i32 %val
	}			}

	define i32 @test_constant_fold_frexp_exp_f32_max_num() nounwind {			define i32 @test_constant_fold_frexp_exp_f32_max_num() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_exp_f32_max_num(			; CHECK-LABEL: define i32 @test_constant_fold_frexp_exp_f32_max_num
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i32 128			; CHECK-NEXT: ret i32 128
	;			;
	%val = call i32 @llvm.amdgcn.frexp.exp.f32(float 0x47EFFFFFE0000000)			%val = call i32 @llvm.amdgcn.frexp.exp.f32(float 0x47EFFFFFE0000000)
	ret i32 %val			ret i32 %val
	}			}

	define i32 @test_constant_fold_frexp_exp_f64_max_num() nounwind {			define i32 @test_constant_fold_frexp_exp_f64_max_num() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_exp_f64_max_num(			; CHECK-LABEL: define i32 @test_constant_fold_frexp_exp_f64_max_num
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i32 1024			; CHECK-NEXT: ret i32 1024
	;			;
	%val = call i32 @llvm.amdgcn.frexp.exp.f64(double 0x7FEFFFFFFFFFFFFF)			%val = call i32 @llvm.amdgcn.frexp.exp.f64(double 0x7FEFFFFFFFFFFFFF)
	ret i32 %val			ret i32 %val
	}			}

	define i32 @test_constant_fold_frexp_exp_f32_min_num() nounwind {			define i32 @test_constant_fold_frexp_exp_f32_min_num() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_exp_f32_min_num(			; CHECK-LABEL: define i32 @test_constant_fold_frexp_exp_f32_min_num
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i32 -148			; CHECK-NEXT: ret i32 -148
	;			;
	%val = call i32 @llvm.amdgcn.frexp.exp.f32(float 0x36A0000000000000)			%val = call i32 @llvm.amdgcn.frexp.exp.f32(float 0x36A0000000000000)
	ret i32 %val			ret i32 %val
	}			}

	define i32 @test_constant_fold_frexp_exp_f64_min_num() nounwind {			define i32 @test_constant_fold_frexp_exp_f64_min_num() nounwind {
	; CHECK-LABEL: @test_constant_fold_frexp_exp_f64_min_num(			; CHECK-LABEL: define i32 @test_constant_fold_frexp_exp_f64_min_num
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i32 -1073			; CHECK-NEXT: ret i32 -1073
	;			;
	%val = call i32 @llvm.amdgcn.frexp.exp.f64(double 4.940656e-324)			%val = call i32 @llvm.amdgcn.frexp.exp.f64(double 4.940656e-324)
	ret i32 %val			ret i32 %val
	}			}

	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	; llvm.amdgcn.class			; llvm.amdgcn.class
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	declare i1 @llvm.amdgcn.class.f32(float, i32) nounwind readnone			declare i1 @llvm.amdgcn.class.f32(float, i32) nounwind readnone
	declare i1 @llvm.amdgcn.class.f64(double, i32) nounwind readnone			declare i1 @llvm.amdgcn.class.f64(double, i32) nounwind readnone

	define i1 @test_class_undef_mask_f32(float %x) nounwind {			define i1 @test_class_undef_mask_f32(float %x) nounwind {
	; CHECK-LABEL: @test_class_undef_mask_f32(			; CHECK-LABEL: define i1 @test_class_undef_mask_f32
				; CHECK-SAME: (float [[X:%.*]]) #[[ATTR1]] {
	; CHECK-NEXT: ret i1 false			; CHECK-NEXT: ret i1 false
	;			;
	%val = call i1 @llvm.amdgcn.class.f32(float %x, i32 undef)			%val = call i1 @llvm.amdgcn.class.f32(float %x, i32 undef)
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_class_over_max_mask_f32(float %x) nounwind {			define i1 @test_class_over_max_mask_f32(float %x) nounwind {
	; CHECK-LABEL: @test_class_over_max_mask_f32(			; CHECK-LABEL: define i1 @test_class_over_max_mask_f32
	; CHECK-NEXT: [[VAL:%.*]] = call i1 @llvm.amdgcn.class.f32(float [[X:%.*]], i32 1)			; CHECK-SAME: (float [[X:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[VAL:%.*]] = call i1 @llvm.amdgcn.class.f32(float [[X]], i32 1)
	; CHECK-NEXT: ret i1 [[VAL]]			; CHECK-NEXT: ret i1 [[VAL]]
	;			;
	%val = call i1 @llvm.amdgcn.class.f32(float %x, i32 1025)			%val = call i1 @llvm.amdgcn.class.f32(float %x, i32 1025)
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_class_no_mask_f32(float %x) nounwind {			define i1 @test_class_no_mask_f32(float %x) nounwind {
	; CHECK-LABEL: @test_class_no_mask_f32(			; CHECK-LABEL: define i1 @test_class_no_mask_f32
				; CHECK-SAME: (float [[X:%.*]]) #[[ATTR1]] {
	; CHECK-NEXT: ret i1 false			; CHECK-NEXT: ret i1 false
	;			;
	%val = call i1 @llvm.amdgcn.class.f32(float %x, i32 0)			%val = call i1 @llvm.amdgcn.class.f32(float %x, i32 0)
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_class_full_mask_f32(float %x) nounwind {			define i1 @test_class_full_mask_f32(float %x) nounwind {
	; CHECK-LABEL: @test_class_full_mask_f32(			; CHECK-LABEL: define i1 @test_class_full_mask_f32
				; CHECK-SAME: (float [[X:%.*]]) #[[ATTR1]] {
	; CHECK-NEXT: ret i1 true			; CHECK-NEXT: ret i1 true
	;			;
	%val = call i1 @llvm.amdgcn.class.f32(float %x, i32 1023)			%val = call i1 @llvm.amdgcn.class.f32(float %x, i32 1023)
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_class_undef_no_mask_f32() nounwind {			define i1 @test_class_undef_no_mask_f32() nounwind {
	; CHECK-LABEL: @test_class_undef_no_mask_f32(			; CHECK-LABEL: define i1 @test_class_undef_no_mask_f32
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i1 false			; CHECK-NEXT: ret i1 false
	;			;
	%val = call i1 @llvm.amdgcn.class.f32(float undef, i32 0)			%val = call i1 @llvm.amdgcn.class.f32(float undef, i32 0)
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_class_undef_full_mask_f32() nounwind {			define i1 @test_class_undef_full_mask_f32() nounwind {
	; CHECK-LABEL: @test_class_undef_full_mask_f32(			; CHECK-LABEL: define i1 @test_class_undef_full_mask_f32
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i1 true			; CHECK-NEXT: ret i1 true
	;			;
	%val = call i1 @llvm.amdgcn.class.f32(float undef, i32 1023)			%val = call i1 @llvm.amdgcn.class.f32(float undef, i32 1023)
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_class_undef_val_f32() nounwind {			define i1 @test_class_undef_val_f32() nounwind {
	; CHECK-LABEL: @test_class_undef_val_f32(			; CHECK-LABEL: define i1 @test_class_undef_val_f32
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i1 undef			; CHECK-NEXT: ret i1 undef
	;			;
	%val = call i1 @llvm.amdgcn.class.f32(float undef, i32 4)			%val = call i1 @llvm.amdgcn.class.f32(float undef, i32 4)
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_class_undef_undef_f32() nounwind {			define i1 @test_class_undef_undef_f32() nounwind {
	; CHECK-LABEL: @test_class_undef_undef_f32(			; CHECK-LABEL: define i1 @test_class_undef_undef_f32
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i1 undef			; CHECK-NEXT: ret i1 undef
	;			;
	%val = call i1 @llvm.amdgcn.class.f32(float undef, i32 undef)			%val = call i1 @llvm.amdgcn.class.f32(float undef, i32 undef)
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_class_var_mask_f32(float %x, i32 %mask) nounwind {			define i1 @test_class_var_mask_f32(float %x, i32 %mask) nounwind {
	; CHECK-LABEL: @test_class_var_mask_f32(			; CHECK-LABEL: define i1 @test_class_var_mask_f32
	; CHECK-NEXT: [[VAL:%.*]] = call i1 @llvm.amdgcn.class.f32(float [[X:%.*]], i32 [[MASK:%.*]])			; CHECK-SAME: (float [[X:%.*]], i32 [[MASK:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[VAL:%.*]] = call i1 @llvm.amdgcn.class.f32(float [[X]], i32 [[MASK]])
	; CHECK-NEXT: ret i1 [[VAL]]			; CHECK-NEXT: ret i1 [[VAL]]
	;			;
	%val = call i1 @llvm.amdgcn.class.f32(float %x, i32 %mask)			%val = call i1 @llvm.amdgcn.class.f32(float %x, i32 %mask)
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_class_isnan_f32(float %x) nounwind {			define i1 @test_class_isnan_f32(float %x) nounwind {
	; CHECK-LABEL: @test_class_isnan_f32(			; CHECK-LABEL: define i1 @test_class_isnan_f32
	; CHECK-NEXT: [[VAL:%.*]] = fcmp uno float [[X:%.*]], 0.000000e+00			; CHECK-SAME: (float [[X:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[VAL:%.*]] = fcmp uno float [[X]], 0.000000e+00
	; CHECK-NEXT: ret i1 [[VAL]]			; CHECK-NEXT: ret i1 [[VAL]]
	;			;
	%val = call i1 @llvm.amdgcn.class.f32(float %x, i32 3)			%val = call i1 @llvm.amdgcn.class.f32(float %x, i32 3)
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_class_isnan_f32_strict(float %x) nounwind {			define i1 @test_class_isnan_f32_strict(float %x) nounwind {
	; CHECK-LABEL: @test_class_isnan_f32_strict(			; CHECK-LABEL: define i1 @test_class_isnan_f32_strict
	; CHECK-NEXT: [[VAL:%.*]] = call i1 @llvm.amdgcn.class.f32(float [[X:%.*]], i32 3) #[[ATTR16:[0-9]+]]			; CHECK-SAME: (float [[X:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[VAL:%.*]] = call i1 @llvm.amdgcn.class.f32(float [[X]], i32 3) #[[ATTR15:[0-9]+]]
	; CHECK-NEXT: ret i1 [[VAL]]			; CHECK-NEXT: ret i1 [[VAL]]
	;			;
	%val = call i1 @llvm.amdgcn.class.f32(float %x, i32 3) strictfp			%val = call i1 @llvm.amdgcn.class.f32(float %x, i32 3) strictfp
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_class_is_p0_n0_f32(float %x) nounwind {			define i1 @test_class_is_p0_n0_f32(float %x) nounwind {
	; CHECK-LABEL: @test_class_is_p0_n0_f32(			; CHECK-LABEL: define i1 @test_class_is_p0_n0_f32
	; CHECK-NEXT: [[VAL:%.*]] = fcmp oeq float [[X:%.*]], 0.000000e+00			; CHECK-SAME: (float [[X:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[VAL:%.*]] = fcmp oeq float [[X]], 0.000000e+00
	; CHECK-NEXT: ret i1 [[VAL]]			; CHECK-NEXT: ret i1 [[VAL]]
	;			;
	%val = call i1 @llvm.amdgcn.class.f32(float %x, i32 96)			%val = call i1 @llvm.amdgcn.class.f32(float %x, i32 96)
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_class_is_p0_n0_f32_strict(float %x) nounwind {			define i1 @test_class_is_p0_n0_f32_strict(float %x) nounwind {
	; CHECK-LABEL: @test_class_is_p0_n0_f32_strict(			; CHECK-LABEL: define i1 @test_class_is_p0_n0_f32_strict
	; CHECK-NEXT: [[VAL:%.*]] = call i1 @llvm.amdgcn.class.f32(float [[X:%.*]], i32 96) #[[ATTR16]]			; CHECK-SAME: (float [[X:%.*]]) #[[ATTR1]] {
				; CHECK-NEXT: [[VAL:%.*]] = call i1 @llvm.amdgcn.class.f32(float [[X]], i32 96) #[[ATTR15]]
	; CHECK-NEXT: ret i1 [[VAL]]			; CHECK-NEXT: ret i1 [[VAL]]
	;			;
	%val = call i1 @llvm.amdgcn.class.f32(float %x, i32 96) strictfp			%val = call i1 @llvm.amdgcn.class.f32(float %x, i32 96) strictfp
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_constant_class_snan_test_snan_f64() nounwind {			define i1 @test_constant_class_snan_test_snan_f64() nounwind {
	; CHECK-LABEL: @test_constant_class_snan_test_snan_f64(			; CHECK-LABEL: define i1 @test_constant_class_snan_test_snan_f64
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i1 true			; CHECK-NEXT: ret i1 true
	;			;
	%val = call i1 @llvm.amdgcn.class.f64(double 0x7FF0000000000001, i32 1)			%val = call i1 @llvm.amdgcn.class.f64(double 0x7FF0000000000001, i32 1)
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_constant_class_qnan_test_qnan_f64() nounwind {			define i1 @test_constant_class_qnan_test_qnan_f64() nounwind {
	; CHECK-LABEL: @test_constant_class_qnan_test_qnan_f64(			; CHECK-LABEL: define i1 @test_constant_class_qnan_test_qnan_f64
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i1 true			; CHECK-NEXT: ret i1 true
	;			;
	%val = call i1 @llvm.amdgcn.class.f64(double 0x7FF8000000000000, i32 2)			%val = call i1 @llvm.amdgcn.class.f64(double 0x7FF8000000000000, i32 2)
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_constant_class_qnan_test_snan_f64() nounwind {			define i1 @test_constant_class_qnan_test_snan_f64() nounwind {
	; CHECK-LABEL: @test_constant_class_qnan_test_snan_f64(			; CHECK-LABEL: define i1 @test_constant_class_qnan_test_snan_f64
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i1 false			; CHECK-NEXT: ret i1 false
	;			;
	%val = call i1 @llvm.amdgcn.class.f64(double 0x7FF8000000000000, i32 1)			%val = call i1 @llvm.amdgcn.class.f64(double 0x7FF8000000000000, i32 1)
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_constant_class_ninf_test_ninf_f64() nounwind {			define i1 @test_constant_class_ninf_test_ninf_f64() nounwind {
	; CHECK-LABEL: @test_constant_class_ninf_test_ninf_f64(			; CHECK-LABEL: define i1 @test_constant_class_ninf_test_ninf_f64
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i1 true			; CHECK-NEXT: ret i1 true
	;			;
	%val = call i1 @llvm.amdgcn.class.f64(double 0xFFF0000000000000, i32 4)			%val = call i1 @llvm.amdgcn.class.f64(double 0xFFF0000000000000, i32 4)
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_constant_class_pinf_test_ninf_f64() nounwind {			define i1 @test_constant_class_pinf_test_ninf_f64() nounwind {
	; CHECK-LABEL: @test_constant_class_pinf_test_ninf_f64(			; CHECK-LABEL: define i1 @test_constant_class_pinf_test_ninf_f64
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i1 false			; CHECK-NEXT: ret i1 false
	;			;
	%val = call i1 @llvm.amdgcn.class.f64(double 0x7FF0000000000000, i32 4)			%val = call i1 @llvm.amdgcn.class.f64(double 0x7FF0000000000000, i32 4)
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_constant_class_qnan_test_ninf_f64() nounwind {			define i1 @test_constant_class_qnan_test_ninf_f64() nounwind {
	; CHECK-LABEL: @test_constant_class_qnan_test_ninf_f64(			; CHECK-LABEL: define i1 @test_constant_class_qnan_test_ninf_f64
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i1 false			; CHECK-NEXT: ret i1 false
	;			;
	%val = call i1 @llvm.amdgcn.class.f64(double 0x7FF8000000000000, i32 4)			%val = call i1 @llvm.amdgcn.class.f64(double 0x7FF8000000000000, i32 4)
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_constant_class_snan_test_ninf_f64() nounwind {			define i1 @test_constant_class_snan_test_ninf_f64() nounwind {
	; CHECK-LABEL: @test_constant_class_snan_test_ninf_f64(			; CHECK-LABEL: define i1 @test_constant_class_snan_test_ninf_f64
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i1 false			; CHECK-NEXT: ret i1 false
	;			;
	%val = call i1 @llvm.amdgcn.class.f64(double 0x7FF0000000000001, i32 4)			%val = call i1 @llvm.amdgcn.class.f64(double 0x7FF0000000000001, i32 4)
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_constant_class_nnormal_test_nnormal_f64() nounwind {			define i1 @test_constant_class_nnormal_test_nnormal_f64() nounwind {
	; CHECK-LABEL: @test_constant_class_nnormal_test_nnormal_f64(			; CHECK-LABEL: define i1 @test_constant_class_nnormal_test_nnormal_f64
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i1 true			; CHECK-NEXT: ret i1 true
	;			;
	%val = call i1 @llvm.amdgcn.class.f64(double -1.0, i32 8)			%val = call i1 @llvm.amdgcn.class.f64(double -1.0, i32 8)
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_constant_class_pnormal_test_nnormal_f64() nounwind {			define i1 @test_constant_class_pnormal_test_nnormal_f64() nounwind {
	; CHECK-LABEL: @test_constant_class_pnormal_test_nnormal_f64(			; CHECK-LABEL: define i1 @test_constant_class_pnormal_test_nnormal_f64
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i1 false			; CHECK-NEXT: ret i1 false
	;			;
	%val = call i1 @llvm.amdgcn.class.f64(double 1.0, i32 8)			%val = call i1 @llvm.amdgcn.class.f64(double 1.0, i32 8)
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_constant_class_nsubnormal_test_nsubnormal_f64() nounwind {			define i1 @test_constant_class_nsubnormal_test_nsubnormal_f64() nounwind {
	; CHECK-LABEL: @test_constant_class_nsubnormal_test_nsubnormal_f64(			; CHECK-LABEL: define i1 @test_constant_class_nsubnormal_test_nsubnormal_f64
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i1 true			; CHECK-NEXT: ret i1 true
	;			;
	%val = call i1 @llvm.amdgcn.class.f64(double 0x800fffffffffffff, i32 16)			%val = call i1 @llvm.amdgcn.class.f64(double 0x800fffffffffffff, i32 16)
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_constant_class_psubnormal_test_nsubnormal_f64() nounwind {			define i1 @test_constant_class_psubnormal_test_nsubnormal_f64() nounwind {
	; CHECK-LABEL: @test_constant_class_psubnormal_test_nsubnormal_f64(			; CHECK-LABEL: define i1 @test_constant_class_psubnormal_test_nsubnormal_f64
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i1 false			; CHECK-NEXT: ret i1 false
	;			;
	%val = call i1 @llvm.amdgcn.class.f64(double 0x000fffffffffffff, i32 16)			%val = call i1 @llvm.amdgcn.class.f64(double 0x000fffffffffffff, i32 16)
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_constant_class_nzero_test_nzero_f64() nounwind {			define i1 @test_constant_class_nzero_test_nzero_f64() nounwind {
	; CHECK-LABEL: @test_constant_class_nzero_test_nzero_f64(			; CHECK-LABEL: define i1 @test_constant_class_nzero_test_nzero_f64
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i1 true			; CHECK-NEXT: ret i1 true
	;			;
	%val = call i1 @llvm.amdgcn.class.f64(double -0.0, i32 32)			%val = call i1 @llvm.amdgcn.class.f64(double -0.0, i32 32)
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_constant_class_pzero_test_nzero_f64() nounwind {			define i1 @test_constant_class_pzero_test_nzero_f64() nounwind {
	; CHECK-LABEL: @test_constant_class_pzero_test_nzero_f64(			; CHECK-LABEL: define i1 @test_constant_class_pzero_test_nzero_f64
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i1 false			; CHECK-NEXT: ret i1 false
	;			;
	%val = call i1 @llvm.amdgcn.class.f64(double 0.0, i32 32)			%val = call i1 @llvm.amdgcn.class.f64(double 0.0, i32 32)
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_constant_class_pzero_test_pzero_f64() nounwind {			define i1 @test_constant_class_pzero_test_pzero_f64() nounwind {
	; CHECK-LABEL: @test_constant_class_pzero_test_pzero_f64(			; CHECK-LABEL: define i1 @test_constant_class_pzero_test_pzero_f64
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i1 true			; CHECK-NEXT: ret i1 true
	;			;
	%val = call i1 @llvm.amdgcn.class.f64(double 0.0, i32 64)			%val = call i1 @llvm.amdgcn.class.f64(double 0.0, i32 64)
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_constant_class_nzero_test_pzero_f64() nounwind {			define i1 @test_constant_class_nzero_test_pzero_f64() nounwind {
	; CHECK-LABEL: @test_constant_class_nzero_test_pzero_f64(			; CHECK-LABEL: define i1 @test_constant_class_nzero_test_pzero_f64
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i1 false			; CHECK-NEXT: ret i1 false
	;			;
	%val = call i1 @llvm.amdgcn.class.f64(double -0.0, i32 64)			%val = call i1 @llvm.amdgcn.class.f64(double -0.0, i32 64)
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_constant_class_psubnormal_test_psubnormal_f64() nounwind {			define i1 @test_constant_class_psubnormal_test_psubnormal_f64() nounwind {
	; CHECK-LABEL: @test_constant_class_psubnormal_test_psubnormal_f64(			; CHECK-LABEL: define i1 @test_constant_class_psubnormal_test_psubnormal_f64
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i1 true			; CHECK-NEXT: ret i1 true
	;			;
	%val = call i1 @llvm.amdgcn.class.f64(double 0x000fffffffffffff, i32 128)			%val = call i1 @llvm.amdgcn.class.f64(double 0x000fffffffffffff, i32 128)
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_constant_class_nsubnormal_test_psubnormal_f64() nounwind {			define i1 @test_constant_class_nsubnormal_test_psubnormal_f64() nounwind {
	; CHECK-LABEL: @test_constant_class_nsubnormal_test_psubnormal_f64(			; CHECK-LABEL: define i1 @test_constant_class_nsubnormal_test_psubnormal_f64
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i1 false			; CHECK-NEXT: ret i1 false
	;			;
	%val = call i1 @llvm.amdgcn.class.f64(double 0x800fffffffffffff, i32 128)			%val = call i1 @llvm.amdgcn.class.f64(double 0x800fffffffffffff, i32 128)
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_constant_class_pnormal_test_pnormal_f64() nounwind {			define i1 @test_constant_class_pnormal_test_pnormal_f64() nounwind {
	; CHECK-LABEL: @test_constant_class_pnormal_test_pnormal_f64(			; CHECK-LABEL: define i1 @test_constant_class_pnormal_test_pnormal_f64
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i1 true			; CHECK-NEXT: ret i1 true
	;			;
	%val = call i1 @llvm.amdgcn.class.f64(double 1.0, i32 256)			%val = call i1 @llvm.amdgcn.class.f64(double 1.0, i32 256)
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_constant_class_nnormal_test_pnormal_f64() nounwind {			define i1 @test_constant_class_nnormal_test_pnormal_f64() nounwind {
	; CHECK-LABEL: @test_constant_class_nnormal_test_pnormal_f64(			; CHECK-LABEL: define i1 @test_constant_class_nnormal_test_pnormal_f64
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i1 false			; CHECK-NEXT: ret i1 false
	;			;
	%val = call i1 @llvm.amdgcn.class.f64(double -1.0, i32 256)			%val = call i1 @llvm.amdgcn.class.f64(double -1.0, i32 256)
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_constant_class_pinf_test_pinf_f64() nounwind {			define i1 @test_constant_class_pinf_test_pinf_f64() nounwind {
	; CHECK-LABEL: @test_constant_class_pinf_test_pinf_f64(			; CHECK-LABEL: define i1 @test_constant_class_pinf_test_pinf_f64
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i1 true			; CHECK-NEXT: ret i1 true
	;			;
	%val = call i1 @llvm.amdgcn.class.f64(double 0x7FF0000000000000, i32 512)			%val = call i1 @llvm.amdgcn.class.f64(double 0x7FF0000000000000, i32 512)
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_constant_class_ninf_test_pinf_f64() nounwind {			define i1 @test_constant_class_ninf_test_pinf_f64() nounwind {
	; CHECK-LABEL: @test_constant_class_ninf_test_pinf_f64(			; CHECK-LABEL: define i1 @test_constant_class_ninf_test_pinf_f64
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i1 false			; CHECK-NEXT: ret i1 false
	;			;
	%val = call i1 @llvm.amdgcn.class.f64(double 0xFFF0000000000000, i32 512)			%val = call i1 @llvm.amdgcn.class.f64(double 0xFFF0000000000000, i32 512)
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_constant_class_qnan_test_pinf_f64() nounwind {			define i1 @test_constant_class_qnan_test_pinf_f64() nounwind {
	; CHECK-LABEL: @test_constant_class_qnan_test_pinf_f64(			; CHECK-LABEL: define i1 @test_constant_class_qnan_test_pinf_f64
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i1 false			; CHECK-NEXT: ret i1 false
	;			;
	%val = call i1 @llvm.amdgcn.class.f64(double 0x7FF8000000000000, i32 512)			%val = call i1 @llvm.amdgcn.class.f64(double 0x7FF8000000000000, i32 512)
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_constant_class_snan_test_pinf_f64() nounwind {			define i1 @test_constant_class_snan_test_pinf_f64() nounwind {
	; CHECK-LABEL: @test_constant_class_snan_test_pinf_f64(			; CHECK-LABEL: define i1 @test_constant_class_snan_test_pinf_f64
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i1 false			; CHECK-NEXT: ret i1 false
	;			;
	%val = call i1 @llvm.amdgcn.class.f64(double 0x7FF0000000000001, i32 512)			%val = call i1 @llvm.amdgcn.class.f64(double 0x7FF0000000000001, i32 512)
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_class_is_snan_nnan_src(float %x) {			define i1 @test_class_is_snan_nnan_src(float %x) {
	; CHECK-LABEL: @test_class_is_snan_nnan_src(			; CHECK-LABEL: define i1 @test_class_is_snan_nnan_src
				; CHECK-SAME: (float [[X:%.*]]) #[[ATTR3:[0-9]+]] {
	; CHECK-NEXT: ret i1 false			; CHECK-NEXT: ret i1 false
	;			;
	%nnan = fadd nnan float %x, 1.0			%nnan = fadd nnan float %x, 1.0
	%class = call i1 @llvm.amdgcn.class.f32(float %nnan, i32 1)			%class = call i1 @llvm.amdgcn.class.f32(float %nnan, i32 1)
	ret i1 %class			ret i1 %class
	}			}

	define i1 @test_class_is_qnan_nnan_src(float %x) {			define i1 @test_class_is_qnan_nnan_src(float %x) {
	; CHECK-LABEL: @test_class_is_qnan_nnan_src(			; CHECK-LABEL: define i1 @test_class_is_qnan_nnan_src
				; CHECK-SAME: (float [[X:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: ret i1 false			; CHECK-NEXT: ret i1 false
	;			;
	%nnan = fadd nnan float %x, 1.0			%nnan = fadd nnan float %x, 1.0
	%class = call i1 @llvm.amdgcn.class.f32(float %nnan, i32 2)			%class = call i1 @llvm.amdgcn.class.f32(float %nnan, i32 2)
	ret i1 %class			ret i1 %class
	}			}

	define i1 @test_class_is_nan_nnan_src(float %x) {			define i1 @test_class_is_nan_nnan_src(float %x) {
	; CHECK-LABEL: @test_class_is_nan_nnan_src(			; CHECK-LABEL: define i1 @test_class_is_nan_nnan_src
				; CHECK-SAME: (float [[X:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: ret i1 false			; CHECK-NEXT: ret i1 false
	;			;
	%nnan = fadd nnan float %x, 1.0			%nnan = fadd nnan float %x, 1.0
	%class = call i1 @llvm.amdgcn.class.f32(float %nnan, i32 3)			%class = call i1 @llvm.amdgcn.class.f32(float %nnan, i32 3)
	ret i1 %class			ret i1 %class
	}			}

	define i1 @test_class_is_nan_other_nnan_src(float %x) {			define i1 @test_class_is_nan_other_nnan_src(float %x) {
	; CHECK-LABEL: @test_class_is_nan_other_nnan_src(			; CHECK-LABEL: define i1 @test_class_is_nan_other_nnan_src
	; CHECK-NEXT: [[NNAN:%.*]] = fadd nnan float [[X:%.*]], 1.000000e+00			; CHECK-SAME: (float [[X:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[NNAN:%.*]] = fadd nnan float [[X]], 1.000000e+00
	; CHECK-NEXT: [[CLASS:%.*]] = call i1 @llvm.amdgcn.class.f32(float [[NNAN]], i32 264)			; CHECK-NEXT: [[CLASS:%.*]] = call i1 @llvm.amdgcn.class.f32(float [[NNAN]], i32 264)
	; CHECK-NEXT: ret i1 [[CLASS]]			; CHECK-NEXT: ret i1 [[CLASS]]
	;			;
	%nnan = fadd nnan float %x, 1.0			%nnan = fadd nnan float %x, 1.0
	%class = call i1 @llvm.amdgcn.class.f32(float %nnan, i32 267)			%class = call i1 @llvm.amdgcn.class.f32(float %nnan, i32 267)
	ret i1 %class			ret i1 %class
	}			}

	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	; llvm.amdgcn.cos			; llvm.amdgcn.cos
	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	declare float @llvm.amdgcn.cos.f32(float) nounwind readnone			declare float @llvm.amdgcn.cos.f32(float) nounwind readnone
	declare float @llvm.fabs.f32(float) nounwind readnone			declare float @llvm.fabs.f32(float) nounwind readnone

	define float @cos_fneg_f32(float %x) {			define float @cos_fneg_f32(float %x) {
	; CHECK-LABEL: @cos_fneg_f32(			; CHECK-LABEL: define float @cos_fneg_f32
	; CHECK-NEXT: [[COS:%.*]] = call float @llvm.amdgcn.cos.f32(float [[X:%.*]])			; CHECK-SAME: (float [[X:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[COS:%.*]] = call float @llvm.amdgcn.cos.f32(float [[X]])
	; CHECK-NEXT: ret float [[COS]]			; CHECK-NEXT: ret float [[COS]]
	;			;
	%x.fneg = fsub float -0.0, %x			%x.fneg = fsub float -0.0, %x
	%cos = call float @llvm.amdgcn.cos.f32(float %x.fneg)			%cos = call float @llvm.amdgcn.cos.f32(float %x.fneg)
	ret float %cos			ret float %cos
	}			}

	define float @cos_unary_fneg_f32(float %x) {			define float @cos_unary_fneg_f32(float %x) {
	; CHECK-LABEL: @cos_unary_fneg_f32(			; CHECK-LABEL: define float @cos_unary_fneg_f32
	; CHECK-NEXT: [[COS:%.*]] = call float @llvm.amdgcn.cos.f32(float [[X:%.*]])			; CHECK-SAME: (float [[X:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[COS:%.*]] = call float @llvm.amdgcn.cos.f32(float [[X]])
	; CHECK-NEXT: ret float [[COS]]			; CHECK-NEXT: ret float [[COS]]
	;			;
	%x.fneg = fneg float %x			%x.fneg = fneg float %x
	%cos = call float @llvm.amdgcn.cos.f32(float %x.fneg)			%cos = call float @llvm.amdgcn.cos.f32(float %x.fneg)
	ret float %cos			ret float %cos
	}			}

	define float @cos_fabs_f32(float %x) {			define float @cos_fabs_f32(float %x) {
	; CHECK-LABEL: @cos_fabs_f32(			; CHECK-LABEL: define float @cos_fabs_f32
	; CHECK-NEXT: [[COS:%.*]] = call float @llvm.amdgcn.cos.f32(float [[X:%.*]])			; CHECK-SAME: (float [[X:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[COS:%.*]] = call float @llvm.amdgcn.cos.f32(float [[X]])
	; CHECK-NEXT: ret float [[COS]]			; CHECK-NEXT: ret float [[COS]]
	;			;
	%x.fabs = call float @llvm.fabs.f32(float %x)			%x.fabs = call float @llvm.fabs.f32(float %x)
	%cos = call float @llvm.amdgcn.cos.f32(float %x.fabs)			%cos = call float @llvm.amdgcn.cos.f32(float %x.fabs)
	ret float %cos			ret float %cos
	}			}

	define float @cos_fabs_fneg_f32(float %x) {			define float @cos_fabs_fneg_f32(float %x) {
	; CHECK-LABEL: @cos_fabs_fneg_f32(			; CHECK-LABEL: define float @cos_fabs_fneg_f32
	; CHECK-NEXT: [[COS:%.*]] = call float @llvm.amdgcn.cos.f32(float [[X:%.*]])			; CHECK-SAME: (float [[X:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[COS:%.*]] = call float @llvm.amdgcn.cos.f32(float [[X]])
	; CHECK-NEXT: ret float [[COS]]			; CHECK-NEXT: ret float [[COS]]
	;			;
	%x.fabs = call float @llvm.fabs.f32(float %x)			%x.fabs = call float @llvm.fabs.f32(float %x)
	%x.fabs.fneg = fsub float -0.0, %x.fabs			%x.fabs.fneg = fsub float -0.0, %x.fabs
	%cos = call float @llvm.amdgcn.cos.f32(float %x.fabs.fneg)			%cos = call float @llvm.amdgcn.cos.f32(float %x.fabs.fneg)
	ret float %cos			ret float %cos
	}			}

	define float @cos_fabs_unary_fneg_f32(float %x) {			define float @cos_fabs_unary_fneg_f32(float %x) {
	; CHECK-LABEL: @cos_fabs_unary_fneg_f32(			; CHECK-LABEL: define float @cos_fabs_unary_fneg_f32
	; CHECK-NEXT: [[COS:%.*]] = call float @llvm.amdgcn.cos.f32(float [[X:%.*]])			; CHECK-SAME: (float [[X:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[COS:%.*]] = call float @llvm.amdgcn.cos.f32(float [[X]])
	; CHECK-NEXT: ret float [[COS]]			; CHECK-NEXT: ret float [[COS]]
	;			;
	%x.fabs = call float @llvm.fabs.f32(float %x)			%x.fabs = call float @llvm.fabs.f32(float %x)
	%x.fabs.fneg = fneg float %x.fabs			%x.fabs.fneg = fneg float %x.fabs
	%cos = call float @llvm.amdgcn.cos.f32(float %x.fabs.fneg)			%cos = call float @llvm.amdgcn.cos.f32(float %x.fabs.fneg)
	ret float %cos			ret float %cos
	}			}

	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	; llvm.amdgcn.cvt.pkrtz			; llvm.amdgcn.cvt.pkrtz
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	declare <2 x half> @llvm.amdgcn.cvt.pkrtz(float, float) nounwind readnone			declare <2 x half> @llvm.amdgcn.cvt.pkrtz(float, float) nounwind readnone

	define <2 x half> @vars_lhs_cvt_pkrtz(float %x, float %y) {			define <2 x half> @vars_lhs_cvt_pkrtz(float %x, float %y) {
	; CHECK-LABEL: @vars_lhs_cvt_pkrtz(			; CHECK-LABEL: define <2 x half> @vars_lhs_cvt_pkrtz
	; CHECK-NEXT: [[CVT:%.*]] = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float [[X:%.*]], float [[Y:%.*]])			; CHECK-SAME: (float [[X:%.*]], float [[Y:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[CVT:%.*]] = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float [[X]], float [[Y]])
	; CHECK-NEXT: ret <2 x half> [[CVT]]			; CHECK-NEXT: ret <2 x half> [[CVT]]
	;			;
	%cvt = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float %x, float %y)			%cvt = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float %x, float %y)
	ret <2 x half> %cvt			ret <2 x half> %cvt
	}			}

	define <2 x half> @constant_lhs_cvt_pkrtz(float %y) {			define <2 x half> @constant_lhs_cvt_pkrtz(float %y) {
	; CHECK-LABEL: @constant_lhs_cvt_pkrtz(			; CHECK-LABEL: define <2 x half> @constant_lhs_cvt_pkrtz
	; CHECK-NEXT: [[CVT:%.*]] = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float 0.000000e+00, float [[Y:%.*]])			; CHECK-SAME: (float [[Y:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[CVT:%.*]] = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float 0.000000e+00, float [[Y]])
	; CHECK-NEXT: ret <2 x half> [[CVT]]			; CHECK-NEXT: ret <2 x half> [[CVT]]
	;			;
	%cvt = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float 0.0, float %y)			%cvt = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float 0.0, float %y)
	ret <2 x half> %cvt			ret <2 x half> %cvt
	}			}

	define <2 x half> @constant_rhs_cvt_pkrtz(float %x) {			define <2 x half> @constant_rhs_cvt_pkrtz(float %x) {
	; CHECK-LABEL: @constant_rhs_cvt_pkrtz(			; CHECK-LABEL: define <2 x half> @constant_rhs_cvt_pkrtz
	; CHECK-NEXT: [[CVT:%.*]] = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float [[X:%.*]], float 0.000000e+00)			; CHECK-SAME: (float [[X:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[CVT:%.*]] = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float [[X]], float 0.000000e+00)
	; CHECK-NEXT: ret <2 x half> [[CVT]]			; CHECK-NEXT: ret <2 x half> [[CVT]]
	;			;
	%cvt = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float %x, float 0.0)			%cvt = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float %x, float 0.0)
	ret <2 x half> %cvt			ret <2 x half> %cvt
	}			}

	define <2 x half> @undef_lhs_cvt_pkrtz(float %y) {			define <2 x half> @undef_lhs_cvt_pkrtz(float %y) {
	; CHECK-LABEL: @undef_lhs_cvt_pkrtz(			; CHECK-LABEL: define <2 x half> @undef_lhs_cvt_pkrtz
	; CHECK-NEXT: [[CVT:%.*]] = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float undef, float [[Y:%.*]])			; CHECK-SAME: (float [[Y:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[CVT:%.*]] = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float undef, float [[Y]])
	; CHECK-NEXT: ret <2 x half> [[CVT]]			; CHECK-NEXT: ret <2 x half> [[CVT]]
	;			;
	%cvt = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float undef, float %y)			%cvt = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float undef, float %y)
	ret <2 x half> %cvt			ret <2 x half> %cvt
	}			}

	define <2 x half> @undef_rhs_cvt_pkrtz(float %x) {			define <2 x half> @undef_rhs_cvt_pkrtz(float %x) {
	; CHECK-LABEL: @undef_rhs_cvt_pkrtz(			; CHECK-LABEL: define <2 x half> @undef_rhs_cvt_pkrtz
	; CHECK-NEXT: [[CVT:%.*]] = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float [[X:%.*]], float undef)			; CHECK-SAME: (float [[X:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[CVT:%.*]] = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float [[X]], float undef)
	; CHECK-NEXT: ret <2 x half> [[CVT]]			; CHECK-NEXT: ret <2 x half> [[CVT]]
	;			;
	%cvt = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float %x, float undef)			%cvt = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float %x, float undef)
	ret <2 x half> %cvt			ret <2 x half> %cvt
	}			}

	define <2 x half> @undef_cvt_pkrtz() {			define <2 x half> @undef_cvt_pkrtz() {
	; CHECK-LABEL: @undef_cvt_pkrtz(			; CHECK-LABEL: define <2 x half> @undef_cvt_pkrtz
				; CHECK-SAME: () #[[ATTR3]] {
	; CHECK-NEXT: ret <2 x half> undef			; CHECK-NEXT: ret <2 x half> undef
	;			;
	%cvt = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float undef, float undef)			%cvt = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float undef, float undef)
	ret <2 x half> %cvt			ret <2 x half> %cvt
	}			}

	define <2 x half> @constant_splat0_cvt_pkrtz() {			define <2 x half> @constant_splat0_cvt_pkrtz() {
	; CHECK-LABEL: @constant_splat0_cvt_pkrtz(			; CHECK-LABEL: define <2 x half> @constant_splat0_cvt_pkrtz
				; CHECK-SAME: () #[[ATTR3]] {
	; CHECK-NEXT: ret <2 x half> zeroinitializer			; CHECK-NEXT: ret <2 x half> zeroinitializer
	;			;
	%cvt = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float 0.0, float 0.0)			%cvt = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float 0.0, float 0.0)
	ret <2 x half> %cvt			ret <2 x half> %cvt
	}			}

	define <2 x half> @constant_cvt_pkrtz() {			define <2 x half> @constant_cvt_pkrtz() {
	; CHECK-LABEL: @constant_cvt_pkrtz(			; CHECK-LABEL: define <2 x half> @constant_cvt_pkrtz
				; CHECK-SAME: () #[[ATTR3]] {
	; CHECK-NEXT: ret <2 x half> <half 0xH4000, half 0xH4400>			; CHECK-NEXT: ret <2 x half> <half 0xH4000, half 0xH4400>
	;			;
	%cvt = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float 2.0, float 4.0)			%cvt = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float 2.0, float 4.0)
	ret <2 x half> %cvt			ret <2 x half> %cvt
	}			}

	; Test constant values where rtz changes result			; Test constant values where rtz changes result
	define <2 x half> @constant_rtz_pkrtz() {			define <2 x half> @constant_rtz_pkrtz() {
	; CHECK-LABEL: @constant_rtz_pkrtz(			; CHECK-LABEL: define <2 x half> @constant_rtz_pkrtz
				; CHECK-SAME: () #[[ATTR3]] {
	; CHECK-NEXT: ret <2 x half> <half 0xH7BFF, half 0xH7BFF>			; CHECK-NEXT: ret <2 x half> <half 0xH7BFF, half 0xH7BFF>
	;			;
	%cvt = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float 65535.0, float 65535.0)			%cvt = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float 65535.0, float 65535.0)
	ret <2 x half> %cvt			ret <2 x half> %cvt
	}			}

	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	; llvm.amdgcn.cvt.pknorm.i16			; llvm.amdgcn.cvt.pknorm.i16
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	declare <2 x i16> @llvm.amdgcn.cvt.pknorm.i16(float, float) nounwind readnone			declare <2 x i16> @llvm.amdgcn.cvt.pknorm.i16(float, float) nounwind readnone

	define <2 x i16> @undef_lhs_cvt_pknorm_i16(float %y) {			define <2 x i16> @undef_lhs_cvt_pknorm_i16(float %y) {
	; CHECK-LABEL: @undef_lhs_cvt_pknorm_i16(			; CHECK-LABEL: define <2 x i16> @undef_lhs_cvt_pknorm_i16
	; CHECK-NEXT: [[CVT:%.*]] = call <2 x i16> @llvm.amdgcn.cvt.pknorm.i16(float undef, float [[Y:%.*]])			; CHECK-SAME: (float [[Y:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[CVT:%.*]] = call <2 x i16> @llvm.amdgcn.cvt.pknorm.i16(float undef, float [[Y]])
	; CHECK-NEXT: ret <2 x i16> [[CVT]]			; CHECK-NEXT: ret <2 x i16> [[CVT]]
	;			;
	%cvt = call <2 x i16> @llvm.amdgcn.cvt.pknorm.i16(float undef, float %y)			%cvt = call <2 x i16> @llvm.amdgcn.cvt.pknorm.i16(float undef, float %y)
	ret <2 x i16> %cvt			ret <2 x i16> %cvt
	}			}

	define <2 x i16> @undef_rhs_cvt_pknorm_i16(float %x) {			define <2 x i16> @undef_rhs_cvt_pknorm_i16(float %x) {
	; CHECK-LABEL: @undef_rhs_cvt_pknorm_i16(			; CHECK-LABEL: define <2 x i16> @undef_rhs_cvt_pknorm_i16
	; CHECK-NEXT: [[CVT:%.*]] = call <2 x i16> @llvm.amdgcn.cvt.pknorm.i16(float [[X:%.*]], float undef)			; CHECK-SAME: (float [[X:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[CVT:%.*]] = call <2 x i16> @llvm.amdgcn.cvt.pknorm.i16(float [[X]], float undef)
	; CHECK-NEXT: ret <2 x i16> [[CVT]]			; CHECK-NEXT: ret <2 x i16> [[CVT]]
	;			;
	%cvt = call <2 x i16> @llvm.amdgcn.cvt.pknorm.i16(float %x, float undef)			%cvt = call <2 x i16> @llvm.amdgcn.cvt.pknorm.i16(float %x, float undef)
	ret <2 x i16> %cvt			ret <2 x i16> %cvt
	}			}

	define <2 x i16> @undef_cvt_pknorm_i16() {			define <2 x i16> @undef_cvt_pknorm_i16() {
	; CHECK-LABEL: @undef_cvt_pknorm_i16(			; CHECK-LABEL: define <2 x i16> @undef_cvt_pknorm_i16
				; CHECK-SAME: () #[[ATTR3]] {
	; CHECK-NEXT: ret <2 x i16> undef			; CHECK-NEXT: ret <2 x i16> undef
	;			;
	%cvt = call <2 x i16> @llvm.amdgcn.cvt.pknorm.i16(float undef, float undef)			%cvt = call <2 x i16> @llvm.amdgcn.cvt.pknorm.i16(float undef, float undef)
	ret <2 x i16> %cvt			ret <2 x i16> %cvt
	}			}

	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	; llvm.amdgcn.cvt.pknorm.u16			; llvm.amdgcn.cvt.pknorm.u16
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	declare <2 x i16> @llvm.amdgcn.cvt.pknorm.u16(float, float) nounwind readnone			declare <2 x i16> @llvm.amdgcn.cvt.pknorm.u16(float, float) nounwind readnone

	define <2 x i16> @undef_lhs_cvt_pknorm_u16(float %y) {			define <2 x i16> @undef_lhs_cvt_pknorm_u16(float %y) {
	; CHECK-LABEL: @undef_lhs_cvt_pknorm_u16(			; CHECK-LABEL: define <2 x i16> @undef_lhs_cvt_pknorm_u16
	; CHECK-NEXT: [[CVT:%.*]] = call <2 x i16> @llvm.amdgcn.cvt.pknorm.u16(float undef, float [[Y:%.*]])			; CHECK-SAME: (float [[Y:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[CVT:%.*]] = call <2 x i16> @llvm.amdgcn.cvt.pknorm.u16(float undef, float [[Y]])
	; CHECK-NEXT: ret <2 x i16> [[CVT]]			; CHECK-NEXT: ret <2 x i16> [[CVT]]
	;			;
	%cvt = call <2 x i16> @llvm.amdgcn.cvt.pknorm.u16(float undef, float %y)			%cvt = call <2 x i16> @llvm.amdgcn.cvt.pknorm.u16(float undef, float %y)
	ret <2 x i16> %cvt			ret <2 x i16> %cvt
	}			}

	define <2 x i16> @undef_rhs_cvt_pknorm_u16(float %x) {			define <2 x i16> @undef_rhs_cvt_pknorm_u16(float %x) {
	; CHECK-LABEL: @undef_rhs_cvt_pknorm_u16(			; CHECK-LABEL: define <2 x i16> @undef_rhs_cvt_pknorm_u16
	; CHECK-NEXT: [[CVT:%.*]] = call <2 x i16> @llvm.amdgcn.cvt.pknorm.u16(float [[X:%.*]], float undef)			; CHECK-SAME: (float [[X:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[CVT:%.*]] = call <2 x i16> @llvm.amdgcn.cvt.pknorm.u16(float [[X]], float undef)
	; CHECK-NEXT: ret <2 x i16> [[CVT]]			; CHECK-NEXT: ret <2 x i16> [[CVT]]
	;			;
	%cvt = call <2 x i16> @llvm.amdgcn.cvt.pknorm.u16(float %x, float undef)			%cvt = call <2 x i16> @llvm.amdgcn.cvt.pknorm.u16(float %x, float undef)
	ret <2 x i16> %cvt			ret <2 x i16> %cvt
	}			}

	define <2 x i16> @undef_cvt_pknorm_u16() {			define <2 x i16> @undef_cvt_pknorm_u16() {
	; CHECK-LABEL: @undef_cvt_pknorm_u16(			; CHECK-LABEL: define <2 x i16> @undef_cvt_pknorm_u16
				; CHECK-SAME: () #[[ATTR3]] {
	; CHECK-NEXT: ret <2 x i16> undef			; CHECK-NEXT: ret <2 x i16> undef
	;			;
	%cvt = call <2 x i16> @llvm.amdgcn.cvt.pknorm.u16(float undef, float undef)			%cvt = call <2 x i16> @llvm.amdgcn.cvt.pknorm.u16(float undef, float undef)
	ret <2 x i16> %cvt			ret <2 x i16> %cvt
	}			}

	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	; llvm.amdgcn.cvt.pk.i16			; llvm.amdgcn.cvt.pk.i16
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	declare <2 x i16> @llvm.amdgcn.cvt.pk.i16(i32, i32) nounwind readnone			declare <2 x i16> @llvm.amdgcn.cvt.pk.i16(i32, i32) nounwind readnone

	define <2 x i16> @undef_lhs_cvt_pk_i16(i32 %y) {			define <2 x i16> @undef_lhs_cvt_pk_i16(i32 %y) {
	; CHECK-LABEL: @undef_lhs_cvt_pk_i16(			; CHECK-LABEL: define <2 x i16> @undef_lhs_cvt_pk_i16
	; CHECK-NEXT: [[CVT:%.*]] = call <2 x i16> @llvm.amdgcn.cvt.pk.i16(i32 undef, i32 [[Y:%.*]])			; CHECK-SAME: (i32 [[Y:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[CVT:%.*]] = call <2 x i16> @llvm.amdgcn.cvt.pk.i16(i32 undef, i32 [[Y]])
	; CHECK-NEXT: ret <2 x i16> [[CVT]]			; CHECK-NEXT: ret <2 x i16> [[CVT]]
	;			;
	%cvt = call <2 x i16> @llvm.amdgcn.cvt.pk.i16(i32 undef, i32 %y)			%cvt = call <2 x i16> @llvm.amdgcn.cvt.pk.i16(i32 undef, i32 %y)
	ret <2 x i16> %cvt			ret <2 x i16> %cvt
	}			}

	define <2 x i16> @undef_rhs_cvt_pk_i16(i32 %x) {			define <2 x i16> @undef_rhs_cvt_pk_i16(i32 %x) {
	; CHECK-LABEL: @undef_rhs_cvt_pk_i16(			; CHECK-LABEL: define <2 x i16> @undef_rhs_cvt_pk_i16
	; CHECK-NEXT: [[CVT:%.*]] = call <2 x i16> @llvm.amdgcn.cvt.pk.i16(i32 [[X:%.*]], i32 undef)			; CHECK-SAME: (i32 [[X:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[CVT:%.*]] = call <2 x i16> @llvm.amdgcn.cvt.pk.i16(i32 [[X]], i32 undef)
	; CHECK-NEXT: ret <2 x i16> [[CVT]]			; CHECK-NEXT: ret <2 x i16> [[CVT]]
	;			;
	%cvt = call <2 x i16> @llvm.amdgcn.cvt.pk.i16(i32 %x, i32 undef)			%cvt = call <2 x i16> @llvm.amdgcn.cvt.pk.i16(i32 %x, i32 undef)
	ret <2 x i16> %cvt			ret <2 x i16> %cvt
	}			}

	define <2 x i16> @undef_cvt_pk_i16() {			define <2 x i16> @undef_cvt_pk_i16() {
	; CHECK-LABEL: @undef_cvt_pk_i16(			; CHECK-LABEL: define <2 x i16> @undef_cvt_pk_i16
				; CHECK-SAME: () #[[ATTR3]] {
	; CHECK-NEXT: ret <2 x i16> undef			; CHECK-NEXT: ret <2 x i16> undef
	;			;
	%cvt = call <2 x i16> @llvm.amdgcn.cvt.pk.i16(i32 undef, i32 undef)			%cvt = call <2 x i16> @llvm.amdgcn.cvt.pk.i16(i32 undef, i32 undef)
	ret <2 x i16> %cvt			ret <2 x i16> %cvt
	}			}

	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	; llvm.amdgcn.cvt.pk.u16			; llvm.amdgcn.cvt.pk.u16
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	declare <2 x i16> @llvm.amdgcn.cvt.pk.u16(i32, i32) nounwind readnone			declare <2 x i16> @llvm.amdgcn.cvt.pk.u16(i32, i32) nounwind readnone

	define <2 x i16> @undef_lhs_cvt_pk_u16(i32 %y) {			define <2 x i16> @undef_lhs_cvt_pk_u16(i32 %y) {
	; CHECK-LABEL: @undef_lhs_cvt_pk_u16(			; CHECK-LABEL: define <2 x i16> @undef_lhs_cvt_pk_u16
	; CHECK-NEXT: [[CVT:%.*]] = call <2 x i16> @llvm.amdgcn.cvt.pk.u16(i32 undef, i32 [[Y:%.*]])			; CHECK-SAME: (i32 [[Y:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[CVT:%.*]] = call <2 x i16> @llvm.amdgcn.cvt.pk.u16(i32 undef, i32 [[Y]])
	; CHECK-NEXT: ret <2 x i16> [[CVT]]			; CHECK-NEXT: ret <2 x i16> [[CVT]]
	;			;
	%cvt = call <2 x i16> @llvm.amdgcn.cvt.pk.u16(i32 undef, i32 %y)			%cvt = call <2 x i16> @llvm.amdgcn.cvt.pk.u16(i32 undef, i32 %y)
	ret <2 x i16> %cvt			ret <2 x i16> %cvt
	}			}

	define <2 x i16> @undef_rhs_cvt_pk_u16(i32 %x) {			define <2 x i16> @undef_rhs_cvt_pk_u16(i32 %x) {
	; CHECK-LABEL: @undef_rhs_cvt_pk_u16(			; CHECK-LABEL: define <2 x i16> @undef_rhs_cvt_pk_u16
	; CHECK-NEXT: [[CVT:%.*]] = call <2 x i16> @llvm.amdgcn.cvt.pk.u16(i32 [[X:%.*]], i32 undef)			; CHECK-SAME: (i32 [[X:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[CVT:%.*]] = call <2 x i16> @llvm.amdgcn.cvt.pk.u16(i32 [[X]], i32 undef)
	; CHECK-NEXT: ret <2 x i16> [[CVT]]			; CHECK-NEXT: ret <2 x i16> [[CVT]]
	;			;
	%cvt = call <2 x i16> @llvm.amdgcn.cvt.pk.u16(i32 %x, i32 undef)			%cvt = call <2 x i16> @llvm.amdgcn.cvt.pk.u16(i32 %x, i32 undef)
	ret <2 x i16> %cvt			ret <2 x i16> %cvt
	}			}

	define <2 x i16> @undef_cvt_pk_u16() {			define <2 x i16> @undef_cvt_pk_u16() {
	; CHECK-LABEL: @undef_cvt_pk_u16(			; CHECK-LABEL: define <2 x i16> @undef_cvt_pk_u16
				; CHECK-SAME: () #[[ATTR3]] {
	; CHECK-NEXT: ret <2 x i16> undef			; CHECK-NEXT: ret <2 x i16> undef
	;			;
	%cvt = call <2 x i16> @llvm.amdgcn.cvt.pk.u16(i32 undef, i32 undef)			%cvt = call <2 x i16> @llvm.amdgcn.cvt.pk.u16(i32 undef, i32 undef)
	ret <2 x i16> %cvt			ret <2 x i16> %cvt
	}			}

	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	; llvm.amdgcn.ubfe			; llvm.amdgcn.ubfe
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	declare i32 @llvm.amdgcn.ubfe.i32(i32, i32, i32) nounwind readnone			declare i32 @llvm.amdgcn.ubfe.i32(i32, i32, i32) nounwind readnone
	declare i64 @llvm.amdgcn.ubfe.i64(i64, i32, i32) nounwind readnone			declare i64 @llvm.amdgcn.ubfe.i64(i64, i32, i32) nounwind readnone

	define i32 @ubfe_var_i32(i32 %src, i32 %offset, i32 %width) {			define i32 @ubfe_var_i32(i32 %src, i32 %offset, i32 %width) {
	; CHECK-LABEL: @ubfe_var_i32(			; CHECK-LABEL: define i32 @ubfe_var_i32
	; CHECK-NEXT: [[BFE:%.*]] = call i32 @llvm.amdgcn.ubfe.i32(i32 [[SRC:%.*]], i32 [[OFFSET:%.*]], i32 [[WIDTH:%.*]])			; CHECK-SAME: (i32 [[SRC:%.*]], i32 [[OFFSET:%.*]], i32 [[WIDTH:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[BFE:%.*]] = call i32 @llvm.amdgcn.ubfe.i32(i32 [[SRC]], i32 [[OFFSET]], i32 [[WIDTH]])
	; CHECK-NEXT: ret i32 [[BFE]]			; CHECK-NEXT: ret i32 [[BFE]]
	;			;
	%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 %offset, i32 %width)			%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 %offset, i32 %width)
	ret i32 %bfe			ret i32 %bfe
	}			}

	define i32 @ubfe_clear_high_bits_constant_offset_i32(i32 %src, i32 %width) {			define i32 @ubfe_clear_high_bits_constant_offset_i32(i32 %src, i32 %width) {
	; CHECK-LABEL: @ubfe_clear_high_bits_constant_offset_i32(			; CHECK-LABEL: define i32 @ubfe_clear_high_bits_constant_offset_i32
	; CHECK-NEXT: [[BFE:%.*]] = call i32 @llvm.amdgcn.ubfe.i32(i32 [[SRC:%.*]], i32 5, i32 [[WIDTH:%.*]])			; CHECK-SAME: (i32 [[SRC:%.*]], i32 [[WIDTH:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[BFE:%.*]] = call i32 @llvm.amdgcn.ubfe.i32(i32 [[SRC]], i32 5, i32 [[WIDTH]])
	; CHECK-NEXT: ret i32 [[BFE]]			; CHECK-NEXT: ret i32 [[BFE]]
	;			;
	%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 133, i32 %width)			%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 133, i32 %width)
	ret i32 %bfe			ret i32 %bfe
	}			}

	define i32 @ubfe_clear_high_bits_constant_width_i32(i32 %src, i32 %offset) {			define i32 @ubfe_clear_high_bits_constant_width_i32(i32 %src, i32 %offset) {
	; CHECK-LABEL: @ubfe_clear_high_bits_constant_width_i32(			; CHECK-LABEL: define i32 @ubfe_clear_high_bits_constant_width_i32
	; CHECK-NEXT: [[BFE:%.*]] = call i32 @llvm.amdgcn.ubfe.i32(i32 [[SRC:%.*]], i32 [[OFFSET:%.*]], i32 5)			; CHECK-SAME: (i32 [[SRC:%.*]], i32 [[OFFSET:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[BFE:%.*]] = call i32 @llvm.amdgcn.ubfe.i32(i32 [[SRC]], i32 [[OFFSET]], i32 5)
	; CHECK-NEXT: ret i32 [[BFE]]			; CHECK-NEXT: ret i32 [[BFE]]
	;			;
	%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 %offset, i32 133)			%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 %offset, i32 133)
	ret i32 %bfe			ret i32 %bfe
	}			}

	define i32 @ubfe_width_0(i32 %src, i32 %offset) {			define i32 @ubfe_width_0(i32 %src, i32 %offset) {
	; CHECK-LABEL: @ubfe_width_0(			; CHECK-LABEL: define i32 @ubfe_width_0
				; CHECK-SAME: (i32 [[SRC:%.*]], i32 [[OFFSET:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 %offset, i32 0)			%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 %offset, i32 0)
	ret i32 %bfe			ret i32 %bfe
	}			}

	define i32 @ubfe_width_31(i32 %src, i32 %offset) {			define i32 @ubfe_width_31(i32 %src, i32 %offset) {
	; CHECK-LABEL: @ubfe_width_31(			; CHECK-LABEL: define i32 @ubfe_width_31
	; CHECK-NEXT: [[BFE:%.*]] = call i32 @llvm.amdgcn.ubfe.i32(i32 [[SRC:%.*]], i32 [[OFFSET:%.*]], i32 31)			; CHECK-SAME: (i32 [[SRC:%.*]], i32 [[OFFSET:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[BFE:%.*]] = call i32 @llvm.amdgcn.ubfe.i32(i32 [[SRC]], i32 [[OFFSET]], i32 31)
	; CHECK-NEXT: ret i32 [[BFE]]			; CHECK-NEXT: ret i32 [[BFE]]
	;			;
	%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 %offset, i32 31)			%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 %offset, i32 31)
	ret i32 %bfe			ret i32 %bfe
	}			}

	define i32 @ubfe_width_32(i32 %src, i32 %offset) {			define i32 @ubfe_width_32(i32 %src, i32 %offset) {
	; CHECK-LABEL: @ubfe_width_32(			; CHECK-LABEL: define i32 @ubfe_width_32
				; CHECK-SAME: (i32 [[SRC:%.*]], i32 [[OFFSET:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 %offset, i32 32)			%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 %offset, i32 32)
	ret i32 %bfe			ret i32 %bfe
	}			}

	define i32 @ubfe_width_33(i32 %src, i32 %offset) {			define i32 @ubfe_width_33(i32 %src, i32 %offset) {
	; CHECK-LABEL: @ubfe_width_33(			; CHECK-LABEL: define i32 @ubfe_width_33
	; CHECK-NEXT: [[BFE:%.*]] = call i32 @llvm.amdgcn.ubfe.i32(i32 [[SRC:%.*]], i32 [[OFFSET:%.*]], i32 1)			; CHECK-SAME: (i32 [[SRC:%.*]], i32 [[OFFSET:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[BFE:%.*]] = call i32 @llvm.amdgcn.ubfe.i32(i32 [[SRC]], i32 [[OFFSET]], i32 1)
	; CHECK-NEXT: ret i32 [[BFE]]			; CHECK-NEXT: ret i32 [[BFE]]
	;			;
	%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 %offset, i32 33)			%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 %offset, i32 33)
	ret i32 %bfe			ret i32 %bfe
	}			}

	define i32 @ubfe_offset_33(i32 %src, i32 %width) {			define i32 @ubfe_offset_33(i32 %src, i32 %width) {
	; CHECK-LABEL: @ubfe_offset_33(			; CHECK-LABEL: define i32 @ubfe_offset_33
	; CHECK-NEXT: [[BFE:%.*]] = call i32 @llvm.amdgcn.ubfe.i32(i32 [[SRC:%.*]], i32 1, i32 [[WIDTH:%.*]])			; CHECK-SAME: (i32 [[SRC:%.*]], i32 [[WIDTH:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[BFE:%.*]] = call i32 @llvm.amdgcn.ubfe.i32(i32 [[SRC]], i32 1, i32 [[WIDTH]])
	; CHECK-NEXT: ret i32 [[BFE]]			; CHECK-NEXT: ret i32 [[BFE]]
	;			;
	%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 33, i32 %width)			%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 33, i32 %width)
	ret i32 %bfe			ret i32 %bfe
	}			}

	define i32 @ubfe_offset_0(i32 %src, i32 %width) {			define i32 @ubfe_offset_0(i32 %src, i32 %width) {
	; CHECK-LABEL: @ubfe_offset_0(			; CHECK-LABEL: define i32 @ubfe_offset_0
	; CHECK-NEXT: [[BFE:%.*]] = call i32 @llvm.amdgcn.ubfe.i32(i32 [[SRC:%.*]], i32 0, i32 [[WIDTH:%.*]])			; CHECK-SAME: (i32 [[SRC:%.*]], i32 [[WIDTH:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[BFE:%.*]] = call i32 @llvm.amdgcn.ubfe.i32(i32 [[SRC]], i32 0, i32 [[WIDTH]])
	; CHECK-NEXT: ret i32 [[BFE]]			; CHECK-NEXT: ret i32 [[BFE]]
	;			;
	%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 0, i32 %width)			%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 0, i32 %width)
	ret i32 %bfe			ret i32 %bfe
	}			}

	define i32 @ubfe_offset_32(i32 %src, i32 %width) {			define i32 @ubfe_offset_32(i32 %src, i32 %width) {
	; CHECK-LABEL: @ubfe_offset_32(			; CHECK-LABEL: define i32 @ubfe_offset_32
	; CHECK-NEXT: [[BFE:%.*]] = call i32 @llvm.amdgcn.ubfe.i32(i32 [[SRC:%.*]], i32 0, i32 [[WIDTH:%.*]])			; CHECK-SAME: (i32 [[SRC:%.*]], i32 [[WIDTH:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[BFE:%.*]] = call i32 @llvm.amdgcn.ubfe.i32(i32 [[SRC]], i32 0, i32 [[WIDTH]])
	; CHECK-NEXT: ret i32 [[BFE]]			; CHECK-NEXT: ret i32 [[BFE]]
	;			;
	%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 32, i32 %width)			%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 32, i32 %width)
	ret i32 %bfe			ret i32 %bfe
	}			}

	define i32 @ubfe_offset_31(i32 %src, i32 %width) {			define i32 @ubfe_offset_31(i32 %src, i32 %width) {
	; CHECK-LABEL: @ubfe_offset_31(			; CHECK-LABEL: define i32 @ubfe_offset_31
	; CHECK-NEXT: [[BFE:%.*]] = call i32 @llvm.amdgcn.ubfe.i32(i32 [[SRC:%.*]], i32 31, i32 [[WIDTH:%.*]])			; CHECK-SAME: (i32 [[SRC:%.*]], i32 [[WIDTH:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[BFE:%.*]] = call i32 @llvm.amdgcn.ubfe.i32(i32 [[SRC]], i32 31, i32 [[WIDTH]])
	; CHECK-NEXT: ret i32 [[BFE]]			; CHECK-NEXT: ret i32 [[BFE]]
	;			;
	%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 31, i32 %width)			%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 31, i32 %width)
	ret i32 %bfe			ret i32 %bfe
	}			}

	define i32 @ubfe_offset_0_width_0(i32 %src) {			define i32 @ubfe_offset_0_width_0(i32 %src) {
	; CHECK-LABEL: @ubfe_offset_0_width_0(			; CHECK-LABEL: define i32 @ubfe_offset_0_width_0
				; CHECK-SAME: (i32 [[SRC:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 0, i32 0)			%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 0, i32 0)
	ret i32 %bfe			ret i32 %bfe
	}			}

	define i32 @ubfe_offset_0_width_3(i32 %src) {			define i32 @ubfe_offset_0_width_3(i32 %src) {
	; CHECK-LABEL: @ubfe_offset_0_width_3(			; CHECK-LABEL: define i32 @ubfe_offset_0_width_3
	; CHECK-NEXT: [[TMP1:%.*]] = and i32 [[SRC:%.*]], 7			; CHECK-SAME: (i32 [[SRC:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: ret i32 [[TMP1]]			; CHECK-NEXT: [[BFE:%.*]] = and i32 [[SRC]], 7
				; CHECK-NEXT: ret i32 [[BFE]]
	;			;
	%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 0, i32 3)			%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 0, i32 3)
	ret i32 %bfe			ret i32 %bfe
	}			}

	define i32 @ubfe_offset_3_width_1(i32 %src) {			define i32 @ubfe_offset_3_width_1(i32 %src) {
	; CHECK-LABEL: @ubfe_offset_3_width_1(			; CHECK-LABEL: define i32 @ubfe_offset_3_width_1
	; CHECK-NEXT: [[TMP1:%.*]] = lshr i32 [[SRC:%.*]], 3			; CHECK-SAME: (i32 [[SRC:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[TMP1:%.*]] = lshr i32 [[SRC]], 3
	; CHECK-NEXT: [[BFE:%.*]] = and i32 [[TMP1]], 1			; CHECK-NEXT: [[BFE:%.*]] = and i32 [[TMP1]], 1
	; CHECK-NEXT: ret i32 [[BFE]]			; CHECK-NEXT: ret i32 [[BFE]]
	;			;
	%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 3, i32 1)			%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 3, i32 1)
	ret i32 %bfe			ret i32 %bfe
	}			}

	define i32 @ubfe_offset_3_width_4(i32 %src) {			define i32 @ubfe_offset_3_width_4(i32 %src) {
	; CHECK-LABEL: @ubfe_offset_3_width_4(			; CHECK-LABEL: define i32 @ubfe_offset_3_width_4
	; CHECK-NEXT: [[TMP1:%.*]] = lshr i32 [[SRC:%.*]], 3			; CHECK-SAME: (i32 [[SRC:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[TMP1:%.*]] = lshr i32 [[SRC]], 3
	; CHECK-NEXT: [[BFE:%.*]] = and i32 [[TMP1]], 15			; CHECK-NEXT: [[BFE:%.*]] = and i32 [[TMP1]], 15
	; CHECK-NEXT: ret i32 [[BFE]]			; CHECK-NEXT: ret i32 [[BFE]]
	;			;
	%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 3, i32 4)			%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 3, i32 4)
	ret i32 %bfe			ret i32 %bfe
	}			}

	define i32 @ubfe_0_0_0() {			define i32 @ubfe_0_0_0() {
	; CHECK-LABEL: @ubfe_0_0_0(			; CHECK-LABEL: define i32 @ubfe_0_0_0
				; CHECK-SAME: () #[[ATTR3]] {
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 0, i32 0, i32 0)			%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 0, i32 0, i32 0)
	ret i32 %bfe			ret i32 %bfe
	}			}

	define i32 @ubfe_neg1_5_7() {			define i32 @ubfe_neg1_5_7() {
	; CHECK-LABEL: @ubfe_neg1_5_7(			; CHECK-LABEL: define i32 @ubfe_neg1_5_7
				; CHECK-SAME: () #[[ATTR3]] {
	; CHECK-NEXT: ret i32 127			; CHECK-NEXT: ret i32 127
	;			;
	%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 -1, i32 5, i32 7)			%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 -1, i32 5, i32 7)
	ret i32 %bfe			ret i32 %bfe
	}			}

	define i32 @ubfe_undef_src_i32(i32 %offset, i32 %width) {			define i32 @ubfe_undef_src_i32(i32 %offset, i32 %width) {
	; CHECK-LABEL: @ubfe_undef_src_i32(			; CHECK-LABEL: define i32 @ubfe_undef_src_i32
				; CHECK-SAME: (i32 [[OFFSET:%.*]], i32 [[WIDTH:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 undef, i32 %offset, i32 %width)			%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 undef, i32 %offset, i32 %width)
	ret i32 %bfe			ret i32 %bfe
	}			}

	define i32 @ubfe_undef_offset_i32(i32 %src, i32 %width) {			define i32 @ubfe_undef_offset_i32(i32 %src, i32 %width) {
	; CHECK-LABEL: @ubfe_undef_offset_i32(			; CHECK-LABEL: define i32 @ubfe_undef_offset_i32
	; CHECK-NEXT: [[BFE:%.*]] = call i32 @llvm.amdgcn.ubfe.i32(i32 [[SRC:%.*]], i32 undef, i32 [[WIDTH:%.*]])			; CHECK-SAME: (i32 [[SRC:%.*]], i32 [[WIDTH:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[BFE:%.*]] = call i32 @llvm.amdgcn.ubfe.i32(i32 [[SRC]], i32 undef, i32 [[WIDTH]])
	; CHECK-NEXT: ret i32 [[BFE]]			; CHECK-NEXT: ret i32 [[BFE]]
	;			;
	%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 undef, i32 %width)			%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 undef, i32 %width)
	ret i32 %bfe			ret i32 %bfe
	}			}

	define i32 @ubfe_undef_width_i32(i32 %src, i32 %offset) {			define i32 @ubfe_undef_width_i32(i32 %src, i32 %offset) {
	; CHECK-LABEL: @ubfe_undef_width_i32(			; CHECK-LABEL: define i32 @ubfe_undef_width_i32
	; CHECK-NEXT: [[BFE:%.*]] = call i32 @llvm.amdgcn.ubfe.i32(i32 [[SRC:%.*]], i32 [[OFFSET:%.*]], i32 undef)			; CHECK-SAME: (i32 [[SRC:%.*]], i32 [[OFFSET:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[BFE:%.*]] = call i32 @llvm.amdgcn.ubfe.i32(i32 [[SRC]], i32 [[OFFSET]], i32 undef)
	; CHECK-NEXT: ret i32 [[BFE]]			; CHECK-NEXT: ret i32 [[BFE]]
	;			;
	%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 %offset, i32 undef)			%bfe = call i32 @llvm.amdgcn.ubfe.i32(i32 %src, i32 %offset, i32 undef)
	ret i32 %bfe			ret i32 %bfe
	}			}

	define i64 @ubfe_offset_33_width_4_i64(i64 %src) {			define i64 @ubfe_offset_33_width_4_i64(i64 %src) {
	; CHECK-LABEL: @ubfe_offset_33_width_4_i64(			; CHECK-LABEL: define i64 @ubfe_offset_33_width_4_i64
	; CHECK-NEXT: [[TMP1:%.*]] = lshr i64 [[SRC:%.*]], 33			; CHECK-SAME: (i64 [[SRC:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[TMP1:%.*]] = lshr i64 [[SRC]], 33
	; CHECK-NEXT: [[BFE:%.*]] = and i64 [[TMP1]], 15			; CHECK-NEXT: [[BFE:%.*]] = and i64 [[TMP1]], 15
	; CHECK-NEXT: ret i64 [[BFE]]			; CHECK-NEXT: ret i64 [[BFE]]
	;			;
	%bfe = call i64 @llvm.amdgcn.ubfe.i64(i64 %src, i32 33, i32 4)			%bfe = call i64 @llvm.amdgcn.ubfe.i64(i64 %src, i32 33, i32 4)
	ret i64 %bfe			ret i64 %bfe
	}			}

	define i64 @ubfe_offset_0_i64(i64 %src, i32 %width) {			define i64 @ubfe_offset_0_i64(i64 %src, i32 %width) {
	; CHECK-LABEL: @ubfe_offset_0_i64(			; CHECK-LABEL: define i64 @ubfe_offset_0_i64
	; CHECK-NEXT: [[BFE:%.*]] = call i64 @llvm.amdgcn.ubfe.i64(i64 [[SRC:%.*]], i32 0, i32 [[WIDTH:%.*]])			; CHECK-SAME: (i64 [[SRC:%.*]], i32 [[WIDTH:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[BFE:%.*]] = call i64 @llvm.amdgcn.ubfe.i64(i64 [[SRC]], i32 0, i32 [[WIDTH]])
	; CHECK-NEXT: ret i64 [[BFE]]			; CHECK-NEXT: ret i64 [[BFE]]
	;			;
	%bfe = call i64 @llvm.amdgcn.ubfe.i64(i64 %src, i32 0, i32 %width)			%bfe = call i64 @llvm.amdgcn.ubfe.i64(i64 %src, i32 0, i32 %width)
	ret i64 %bfe			ret i64 %bfe
	}			}

	define i64 @ubfe_offset_32_width_32_i64(i64 %src) {			define i64 @ubfe_offset_32_width_32_i64(i64 %src) {
	; CHECK-LABEL: @ubfe_offset_32_width_32_i64(			; CHECK-LABEL: define i64 @ubfe_offset_32_width_32_i64
	; CHECK-NEXT: [[BFE:%.*]] = lshr i64 [[SRC:%.*]], 32			; CHECK-SAME: (i64 [[SRC:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[BFE:%.*]] = lshr i64 [[SRC]], 32
	; CHECK-NEXT: ret i64 [[BFE]]			; CHECK-NEXT: ret i64 [[BFE]]
	;			;
	%bfe = call i64 @llvm.amdgcn.ubfe.i64(i64 %src, i32 32, i32 32)			%bfe = call i64 @llvm.amdgcn.ubfe.i64(i64 %src, i32 32, i32 32)
	ret i64 %bfe			ret i64 %bfe
	}			}

	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	; llvm.amdgcn.sbfe			; llvm.amdgcn.sbfe
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	declare i32 @llvm.amdgcn.sbfe.i32(i32, i32, i32) nounwind readnone			declare i32 @llvm.amdgcn.sbfe.i32(i32, i32, i32) nounwind readnone
	declare i64 @llvm.amdgcn.sbfe.i64(i64, i32, i32) nounwind readnone			declare i64 @llvm.amdgcn.sbfe.i64(i64, i32, i32) nounwind readnone

	define i32 @sbfe_offset_31(i32 %src, i32 %width) {			define i32 @sbfe_offset_31(i32 %src, i32 %width) {
	; CHECK-LABEL: @sbfe_offset_31(			; CHECK-LABEL: define i32 @sbfe_offset_31
	; CHECK-NEXT: [[BFE:%.*]] = call i32 @llvm.amdgcn.sbfe.i32(i32 [[SRC:%.*]], i32 31, i32 [[WIDTH:%.*]])			; CHECK-SAME: (i32 [[SRC:%.*]], i32 [[WIDTH:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[BFE:%.*]] = call i32 @llvm.amdgcn.sbfe.i32(i32 [[SRC]], i32 31, i32 [[WIDTH]])
	; CHECK-NEXT: ret i32 [[BFE]]			; CHECK-NEXT: ret i32 [[BFE]]
	;			;
	%bfe = call i32 @llvm.amdgcn.sbfe.i32(i32 %src, i32 31, i32 %width)			%bfe = call i32 @llvm.amdgcn.sbfe.i32(i32 %src, i32 31, i32 %width)
	ret i32 %bfe			ret i32 %bfe
	}			}

	define i32 @sbfe_neg1_5_7() {			define i32 @sbfe_neg1_5_7() {
	; CHECK-LABEL: @sbfe_neg1_5_7(			; CHECK-LABEL: define i32 @sbfe_neg1_5_7
				; CHECK-SAME: () #[[ATTR3]] {
	; CHECK-NEXT: ret i32 -1			; CHECK-NEXT: ret i32 -1
	;			;
	%bfe = call i32 @llvm.amdgcn.sbfe.i32(i32 -1, i32 5, i32 7)			%bfe = call i32 @llvm.amdgcn.sbfe.i32(i32 -1, i32 5, i32 7)
	ret i32 %bfe			ret i32 %bfe
	}			}

	define i64 @sbfe_offset_32_width_32_i64(i64 %src) {			define i64 @sbfe_offset_32_width_32_i64(i64 %src) {
	; CHECK-LABEL: @sbfe_offset_32_width_32_i64(			; CHECK-LABEL: define i64 @sbfe_offset_32_width_32_i64
	; CHECK-NEXT: [[BFE:%.*]] = ashr i64 [[SRC:%.*]], 32			; CHECK-SAME: (i64 [[SRC:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[BFE:%.*]] = ashr i64 [[SRC]], 32
	; CHECK-NEXT: ret i64 [[BFE]]			; CHECK-NEXT: ret i64 [[BFE]]
	;			;
	%bfe = call i64 @llvm.amdgcn.sbfe.i64(i64 %src, i32 32, i32 32)			%bfe = call i64 @llvm.amdgcn.sbfe.i64(i64 %src, i32 32, i32 32)
	ret i64 %bfe			ret i64 %bfe
	}			}

	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	; llvm.amdgcn.exp			; llvm.amdgcn.exp
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	declare void @llvm.amdgcn.exp.f32(i32 immarg, i32 immarg, float, float, float, float, i1 immarg, i1 immarg) nounwind inaccessiblememonly			declare void @llvm.amdgcn.exp.f32(i32 immarg, i32 immarg, float, float, float, float, i1 immarg, i1 immarg) nounwind inaccessiblememonly




	define void @exp_disabled_inputs_to_undef(float %x, float %y, float %z, float %w) {			define void @exp_disabled_inputs_to_undef(float %x, float %y, float %z, float %w) {
	; enable src0..src3 constants			; enable src0..src3 constants
	; CHECK-LABEL: @exp_disabled_inputs_to_undef(			; CHECK-LABEL: define void @exp_disabled_inputs_to_undef
				; CHECK-SAME: (float [[X:%.*]], float [[Y:%.*]], float [[Z:%.*]], float [[W:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: call void @llvm.amdgcn.exp.f32(i32 0, i32 1, float 1.000000e+00, float undef, float undef, float undef, i1 true, i1 false)			; CHECK-NEXT: call void @llvm.amdgcn.exp.f32(i32 0, i32 1, float 1.000000e+00, float undef, float undef, float undef, i1 true, i1 false)
	; CHECK-NEXT: call void @llvm.amdgcn.exp.f32(i32 0, i32 2, float undef, float 2.000000e+00, float undef, float undef, i1 true, i1 false)			; CHECK-NEXT: call void @llvm.amdgcn.exp.f32(i32 0, i32 2, float undef, float 2.000000e+00, float undef, float undef, i1 true, i1 false)
	; CHECK-NEXT: call void @llvm.amdgcn.exp.f32(i32 0, i32 4, float undef, float undef, float 5.000000e-01, float undef, i1 true, i1 false)			; CHECK-NEXT: call void @llvm.amdgcn.exp.f32(i32 0, i32 4, float undef, float undef, float 5.000000e-01, float undef, i1 true, i1 false)
	; CHECK-NEXT: call void @llvm.amdgcn.exp.f32(i32 0, i32 8, float undef, float undef, float undef, float 4.000000e+00, i1 true, i1 false)			; CHECK-NEXT: call void @llvm.amdgcn.exp.f32(i32 0, i32 8, float undef, float undef, float undef, float 4.000000e+00, i1 true, i1 false)
	; CHECK-NEXT: call void @llvm.amdgcn.exp.f32(i32 0, i32 1, float [[X:%.*]], float undef, float undef, float undef, i1 true, i1 false)			; CHECK-NEXT: call void @llvm.amdgcn.exp.f32(i32 0, i32 1, float [[X]], float undef, float undef, float undef, i1 true, i1 false)
	; CHECK-NEXT: call void @llvm.amdgcn.exp.f32(i32 0, i32 2, float undef, float [[Y:%.*]], float undef, float undef, i1 true, i1 false)			; CHECK-NEXT: call void @llvm.amdgcn.exp.f32(i32 0, i32 2, float undef, float [[Y]], float undef, float undef, i1 true, i1 false)
	; CHECK-NEXT: call void @llvm.amdgcn.exp.f32(i32 0, i32 4, float undef, float undef, float [[Z:%.*]], float undef, i1 true, i1 false)			; CHECK-NEXT: call void @llvm.amdgcn.exp.f32(i32 0, i32 4, float undef, float undef, float [[Z]], float undef, i1 true, i1 false)
	; CHECK-NEXT: call void @llvm.amdgcn.exp.f32(i32 0, i32 8, float undef, float undef, float undef, float [[W:%.*]], i1 true, i1 false)			; CHECK-NEXT: call void @llvm.amdgcn.exp.f32(i32 0, i32 8, float undef, float undef, float undef, float [[W]], i1 true, i1 false)
	; CHECK-NEXT: call void @llvm.amdgcn.exp.f32(i32 0, i32 0, float undef, float undef, float undef, float undef, i1 true, i1 false)			; CHECK-NEXT: call void @llvm.amdgcn.exp.f32(i32 0, i32 0, float undef, float undef, float undef, float undef, i1 true, i1 false)
	; CHECK-NEXT: call void @llvm.amdgcn.exp.f32(i32 0, i32 3, float 1.000000e+00, float 2.000000e+00, float undef, float undef, i1 true, i1 false)			; CHECK-NEXT: call void @llvm.amdgcn.exp.f32(i32 0, i32 3, float 1.000000e+00, float 2.000000e+00, float undef, float undef, i1 true, i1 false)
	; CHECK-NEXT: call void @llvm.amdgcn.exp.f32(i32 0, i32 5, float 1.000000e+00, float undef, float 5.000000e-01, float undef, i1 true, i1 false)			; CHECK-NEXT: call void @llvm.amdgcn.exp.f32(i32 0, i32 5, float 1.000000e+00, float undef, float 5.000000e-01, float undef, i1 true, i1 false)
	; CHECK-NEXT: call void @llvm.amdgcn.exp.f32(i32 0, i32 9, float 1.000000e+00, float undef, float undef, float 4.000000e+00, i1 false, i1 false)			; CHECK-NEXT: call void @llvm.amdgcn.exp.f32(i32 0, i32 9, float 1.000000e+00, float undef, float undef, float 4.000000e+00, i1 false, i1 false)
	; CHECK-NEXT: call void @llvm.amdgcn.exp.f32(i32 0, i32 15, float 1.000000e+00, float 2.000000e+00, float 5.000000e-01, float 4.000000e+00, i1 false, i1 false)			; CHECK-NEXT: call void @llvm.amdgcn.exp.f32(i32 0, i32 15, float 1.000000e+00, float 2.000000e+00, float 5.000000e-01, float 4.000000e+00, i1 false, i1 false)
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	call void @llvm.amdgcn.exp.f32(i32 0, i32 1, float 1.0, float 2.0, float 0.5, float 4.0, i1 true, i1 false)			call void @llvm.amdgcn.exp.f32(i32 0, i32 1, float 1.0, float 2.0, float 0.5, float 4.0, i1 true, i1 false)
	Show All 23 Lines
	; llvm.amdgcn.exp.compr			; llvm.amdgcn.exp.compr
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	declare void @llvm.amdgcn.exp.compr.v2f16(i32 immarg, i32 immarg, <2 x half>, <2 x half>, i1 immarg, i1 immarg) nounwind inaccessiblememonly			declare void @llvm.amdgcn.exp.compr.v2f16(i32 immarg, i32 immarg, <2 x half>, <2 x half>, i1 immarg, i1 immarg) nounwind inaccessiblememonly



	define void @exp_compr_disabled_inputs_to_undef(<2 x half> %xy, <2 x half> %zw) {			define void @exp_compr_disabled_inputs_to_undef(<2 x half> %xy, <2 x half> %zw) {
	; CHECK-LABEL: @exp_compr_disabled_inputs_to_undef(			; CHECK-LABEL: define void @exp_compr_disabled_inputs_to_undef
				; CHECK-SAME: (<2 x half> [[XY:%.*]], <2 x half> [[ZW:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: call void @llvm.amdgcn.exp.compr.v2f16(i32 0, i32 0, <2 x half> undef, <2 x half> undef, i1 true, i1 false)			; CHECK-NEXT: call void @llvm.amdgcn.exp.compr.v2f16(i32 0, i32 0, <2 x half> undef, <2 x half> undef, i1 true, i1 false)
	; CHECK-NEXT: call void @llvm.amdgcn.exp.compr.v2f16(i32 0, i32 1, <2 x half> <half 0xH3C00, half 0xH4000>, <2 x half> undef, i1 true, i1 false)			; CHECK-NEXT: call void @llvm.amdgcn.exp.compr.v2f16(i32 0, i32 1, <2 x half> <half 0xH3C00, half 0xH4000>, <2 x half> undef, i1 true, i1 false)
	; CHECK-NEXT: call void @llvm.amdgcn.exp.compr.v2f16(i32 0, i32 2, <2 x half> <half 0xH3C00, half 0xH4000>, <2 x half> undef, i1 true, i1 false)			; CHECK-NEXT: call void @llvm.amdgcn.exp.compr.v2f16(i32 0, i32 2, <2 x half> <half 0xH3C00, half 0xH4000>, <2 x half> undef, i1 true, i1 false)
	; CHECK-NEXT: call void @llvm.amdgcn.exp.compr.v2f16(i32 0, i32 3, <2 x half> <half 0xH3C00, half 0xH4000>, <2 x half> undef, i1 true, i1 false)			; CHECK-NEXT: call void @llvm.amdgcn.exp.compr.v2f16(i32 0, i32 3, <2 x half> <half 0xH3C00, half 0xH4000>, <2 x half> undef, i1 true, i1 false)
	; CHECK-NEXT: call void @llvm.amdgcn.exp.compr.v2f16(i32 0, i32 0, <2 x half> undef, <2 x half> undef, i1 true, i1 false)			; CHECK-NEXT: call void @llvm.amdgcn.exp.compr.v2f16(i32 0, i32 0, <2 x half> undef, <2 x half> undef, i1 true, i1 false)
	; CHECK-NEXT: call void @llvm.amdgcn.exp.compr.v2f16(i32 0, i32 1, <2 x half> [[XY:%.*]], <2 x half> undef, i1 true, i1 false)			; CHECK-NEXT: call void @llvm.amdgcn.exp.compr.v2f16(i32 0, i32 1, <2 x half> [[XY]], <2 x half> undef, i1 true, i1 false)
	; CHECK-NEXT: call void @llvm.amdgcn.exp.compr.v2f16(i32 0, i32 2, <2 x half> [[XY]], <2 x half> undef, i1 true, i1 false)			; CHECK-NEXT: call void @llvm.amdgcn.exp.compr.v2f16(i32 0, i32 2, <2 x half> [[XY]], <2 x half> undef, i1 true, i1 false)
	; CHECK-NEXT: call void @llvm.amdgcn.exp.compr.v2f16(i32 0, i32 3, <2 x half> [[XY]], <2 x half> undef, i1 true, i1 false)			; CHECK-NEXT: call void @llvm.amdgcn.exp.compr.v2f16(i32 0, i32 3, <2 x half> [[XY]], <2 x half> undef, i1 true, i1 false)
	; CHECK-NEXT: call void @llvm.amdgcn.exp.compr.v2f16(i32 0, i32 12, <2 x half> undef, <2 x half> [[ZW:%.*]], i1 true, i1 false)			; CHECK-NEXT: call void @llvm.amdgcn.exp.compr.v2f16(i32 0, i32 12, <2 x half> undef, <2 x half> [[ZW]], i1 true, i1 false)
	; CHECK-NEXT: call void @llvm.amdgcn.exp.compr.v2f16(i32 0, i32 15, <2 x half> [[XY]], <2 x half> [[ZW]], i1 true, i1 false)			; CHECK-NEXT: call void @llvm.amdgcn.exp.compr.v2f16(i32 0, i32 15, <2 x half> [[XY]], <2 x half> [[ZW]], i1 true, i1 false)
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	call void @llvm.amdgcn.exp.compr.v2f16(i32 0, i32 0, <2 x half> <half 1.0, half 2.0>, <2 x half> <half 0.5, half 4.0>, i1 true, i1 false)			call void @llvm.amdgcn.exp.compr.v2f16(i32 0, i32 0, <2 x half> <half 1.0, half 2.0>, <2 x half> <half 0.5, half 4.0>, i1 true, i1 false)
	call void @llvm.amdgcn.exp.compr.v2f16(i32 0, i32 1, <2 x half> <half 1.0, half 2.0>, <2 x half> <half 0.5, half 4.0>, i1 true, i1 false)			call void @llvm.amdgcn.exp.compr.v2f16(i32 0, i32 1, <2 x half> <half 1.0, half 2.0>, <2 x half> <half 0.5, half 4.0>, i1 true, i1 false)
	call void @llvm.amdgcn.exp.compr.v2f16(i32 0, i32 2, <2 x half> <half 1.0, half 2.0>, <2 x half> <half 0.5, half 4.0>, i1 true, i1 false)			call void @llvm.amdgcn.exp.compr.v2f16(i32 0, i32 2, <2 x half> <half 1.0, half 2.0>, <2 x half> <half 0.5, half 4.0>, i1 true, i1 false)
	call void @llvm.amdgcn.exp.compr.v2f16(i32 0, i32 3, <2 x half> <half 1.0, half 2.0>, <2 x half> <half 0.5, half 4.0>, i1 true, i1 false)			call void @llvm.amdgcn.exp.compr.v2f16(i32 0, i32 3, <2 x half> <half 1.0, half 2.0>, <2 x half> <half 0.5, half 4.0>, i1 true, i1 false)

	Show All 9 Lines

	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	; llvm.amdgcn.fmed3			; llvm.amdgcn.fmed3
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	declare float @llvm.amdgcn.fmed3.f32(float, float, float) nounwind readnone			declare float @llvm.amdgcn.fmed3.f32(float, float, float) nounwind readnone

	define float @fmed3_f32(float %x, float %y, float %z) {			define float @fmed3_f32(float %x, float %y, float %z) {
	; CHECK-LABEL: @fmed3_f32(			; CHECK-LABEL: define float @fmed3_f32
	; CHECK-NEXT: [[MED3:%.*]] = call float @llvm.amdgcn.fmed3.f32(float [[X:%.*]], float [[Y:%.*]], float [[Z:%.*]])			; CHECK-SAME: (float [[X:%.*]], float [[Y:%.*]], float [[Z:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[MED3:%.*]] = call float @llvm.amdgcn.fmed3.f32(float [[X]], float [[Y]], float [[Z]])
	; CHECK-NEXT: ret float [[MED3]]			; CHECK-NEXT: ret float [[MED3]]
	;			;
	%med3 = call float @llvm.amdgcn.fmed3.f32(float %x, float %y, float %z)			%med3 = call float @llvm.amdgcn.fmed3.f32(float %x, float %y, float %z)
	ret float %med3			ret float %med3
	}			}

	define float @fmed3_canonicalize_x_c0_c1_f32(float %x) {			define float @fmed3_canonicalize_x_c0_c1_f32(float %x) {
	; CHECK-LABEL: @fmed3_canonicalize_x_c0_c1_f32(			; CHECK-LABEL: define float @fmed3_canonicalize_x_c0_c1_f32
	; CHECK-NEXT: [[MED3:%.*]] = call float @llvm.amdgcn.fmed3.f32(float [[X:%.*]], float 0.000000e+00, float 1.000000e+00)			; CHECK-SAME: (float [[X:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[MED3:%.*]] = call float @llvm.amdgcn.fmed3.f32(float [[X]], float 0.000000e+00, float 1.000000e+00)
	; CHECK-NEXT: ret float [[MED3]]			; CHECK-NEXT: ret float [[MED3]]
	;			;
	%med3 = call float @llvm.amdgcn.fmed3.f32(float %x, float 0.0, float 1.0)			%med3 = call float @llvm.amdgcn.fmed3.f32(float %x, float 0.0, float 1.0)
	ret float %med3			ret float %med3
	}			}

	define float @fmed3_canonicalize_c0_x_c1_f32(float %x) {			define float @fmed3_canonicalize_c0_x_c1_f32(float %x) {
	; CHECK-LABEL: @fmed3_canonicalize_c0_x_c1_f32(			; CHECK-LABEL: define float @fmed3_canonicalize_c0_x_c1_f32
	; CHECK-NEXT: [[MED3:%.*]] = call float @llvm.amdgcn.fmed3.f32(float [[X:%.*]], float 0.000000e+00, float 1.000000e+00)			; CHECK-SAME: (float [[X:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[MED3:%.*]] = call float @llvm.amdgcn.fmed3.f32(float [[X]], float 0.000000e+00, float 1.000000e+00)
	; CHECK-NEXT: ret float [[MED3]]			; CHECK-NEXT: ret float [[MED3]]
	;			;
	%med3 = call float @llvm.amdgcn.fmed3.f32(float 0.0, float %x, float 1.0)			%med3 = call float @llvm.amdgcn.fmed3.f32(float 0.0, float %x, float 1.0)
	ret float %med3			ret float %med3
	}			}

	define float @fmed3_canonicalize_c0_c1_x_f32(float %x) {			define float @fmed3_canonicalize_c0_c1_x_f32(float %x) {
	; CHECK-LABEL: @fmed3_canonicalize_c0_c1_x_f32(			; CHECK-LABEL: define float @fmed3_canonicalize_c0_c1_x_f32
	; CHECK-NEXT: [[MED3:%.*]] = call float @llvm.amdgcn.fmed3.f32(float [[X:%.*]], float 0.000000e+00, float 1.000000e+00)			; CHECK-SAME: (float [[X:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[MED3:%.*]] = call float @llvm.amdgcn.fmed3.f32(float [[X]], float 0.000000e+00, float 1.000000e+00)
	; CHECK-NEXT: ret float [[MED3]]			; CHECK-NEXT: ret float [[MED3]]
	;			;
	%med3 = call float @llvm.amdgcn.fmed3.f32(float 0.0, float 1.0, float %x)			%med3 = call float @llvm.amdgcn.fmed3.f32(float 0.0, float 1.0, float %x)
	ret float %med3			ret float %med3
	}			}

	define float @fmed3_canonicalize_x_y_c_f32(float %x, float %y) {			define float @fmed3_canonicalize_x_y_c_f32(float %x, float %y) {
	; CHECK-LABEL: @fmed3_canonicalize_x_y_c_f32(			; CHECK-LABEL: define float @fmed3_canonicalize_x_y_c_f32
	; CHECK-NEXT: [[MED3:%.*]] = call float @llvm.amdgcn.fmed3.f32(float [[X:%.*]], float [[Y:%.*]], float 1.000000e+00)			; CHECK-SAME: (float [[X:%.*]], float [[Y:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[MED3:%.*]] = call float @llvm.amdgcn.fmed3.f32(float [[X]], float [[Y]], float 1.000000e+00)
	; CHECK-NEXT: ret float [[MED3]]			; CHECK-NEXT: ret float [[MED3]]
	;			;
	%med3 = call float @llvm.amdgcn.fmed3.f32(float %x, float %y, float 1.0)			%med3 = call float @llvm.amdgcn.fmed3.f32(float %x, float %y, float 1.0)
	ret float %med3			ret float %med3
	}			}

	define float @fmed3_canonicalize_x_c_y_f32(float %x, float %y) {			define float @fmed3_canonicalize_x_c_y_f32(float %x, float %y) {
	; CHECK-LABEL: @fmed3_canonicalize_x_c_y_f32(			; CHECK-LABEL: define float @fmed3_canonicalize_x_c_y_f32
	; CHECK-NEXT: [[MED3:%.*]] = call float @llvm.amdgcn.fmed3.f32(float [[X:%.*]], float [[Y:%.*]], float 1.000000e+00)			; CHECK-SAME: (float [[X:%.*]], float [[Y:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[MED3:%.*]] = call float @llvm.amdgcn.fmed3.f32(float [[X]], float [[Y]], float 1.000000e+00)
	; CHECK-NEXT: ret float [[MED3]]			; CHECK-NEXT: ret float [[MED3]]
	;			;
	%med3 = call float @llvm.amdgcn.fmed3.f32(float %x, float 1.0, float %y)			%med3 = call float @llvm.amdgcn.fmed3.f32(float %x, float 1.0, float %y)
	ret float %med3			ret float %med3
	}			}

	define float @fmed3_canonicalize_c_x_y_f32(float %x, float %y) {			define float @fmed3_canonicalize_c_x_y_f32(float %x, float %y) {
	; CHECK-LABEL: @fmed3_canonicalize_c_x_y_f32(			; CHECK-LABEL: define float @fmed3_canonicalize_c_x_y_f32
	; CHECK-NEXT: [[MED3:%.*]] = call float @llvm.amdgcn.fmed3.f32(float [[X:%.*]], float [[Y:%.*]], float 1.000000e+00)			; CHECK-SAME: (float [[X:%.*]], float [[Y:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[MED3:%.*]] = call float @llvm.amdgcn.fmed3.f32(float [[X]], float [[Y]], float 1.000000e+00)
	; CHECK-NEXT: ret float [[MED3]]			; CHECK-NEXT: ret float [[MED3]]
	;			;
	%med3 = call float @llvm.amdgcn.fmed3.f32(float 1.0, float %x, float %y)			%med3 = call float @llvm.amdgcn.fmed3.f32(float 1.0, float %x, float %y)
	ret float %med3			ret float %med3
	}			}

	define float @fmed3_undef_x_y_f32(float %x, float %y) {			define float @fmed3_undef_x_y_f32(float %x, float %y) {
	; CHECK-LABEL: @fmed3_undef_x_y_f32(			; CHECK-LABEL: define float @fmed3_undef_x_y_f32
	; CHECK-NEXT: [[MED3:%.*]] = call float @llvm.minnum.f32(float [[X:%.*]], float [[Y:%.*]])			; CHECK-SAME: (float [[X:%.*]], float [[Y:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[MED3:%.*]] = call float @llvm.minnum.f32(float [[X]], float [[Y]])
	; CHECK-NEXT: ret float [[MED3]]			; CHECK-NEXT: ret float [[MED3]]
	;			;
	%med3 = call float @llvm.amdgcn.fmed3.f32(float undef, float %x, float %y)			%med3 = call float @llvm.amdgcn.fmed3.f32(float undef, float %x, float %y)
	ret float %med3			ret float %med3
	}			}

	define float @fmed3_fmf_undef_x_y_f32(float %x, float %y) {			define float @fmed3_fmf_undef_x_y_f32(float %x, float %y) {
	; CHECK-LABEL: @fmed3_fmf_undef_x_y_f32(			; CHECK-LABEL: define float @fmed3_fmf_undef_x_y_f32
	; CHECK-NEXT: [[MED3:%.*]] = call nnan float @llvm.minnum.f32(float [[X:%.*]], float [[Y:%.*]])			; CHECK-SAME: (float [[X:%.*]], float [[Y:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[MED3:%.*]] = call nnan float @llvm.minnum.f32(float [[X]], float [[Y]])
	; CHECK-NEXT: ret float [[MED3]]			; CHECK-NEXT: ret float [[MED3]]
	;			;
	%med3 = call nnan float @llvm.amdgcn.fmed3.f32(float undef, float %x, float %y)			%med3 = call nnan float @llvm.amdgcn.fmed3.f32(float undef, float %x, float %y)
	ret float %med3			ret float %med3
	}			}

	define float @fmed3_x_undef_y_f32(float %x, float %y) {			define float @fmed3_x_undef_y_f32(float %x, float %y) {
	; CHECK-LABEL: @fmed3_x_undef_y_f32(			; CHECK-LABEL: define float @fmed3_x_undef_y_f32
	; CHECK-NEXT: [[MED3:%.*]] = call float @llvm.minnum.f32(float [[X:%.*]], float [[Y:%.*]])			; CHECK-SAME: (float [[X:%.*]], float [[Y:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[MED3:%.*]] = call float @llvm.minnum.f32(float [[X]], float [[Y]])
	; CHECK-NEXT: ret float [[MED3]]			; CHECK-NEXT: ret float [[MED3]]
	;			;
	%med3 = call float @llvm.amdgcn.fmed3.f32(float %x, float undef, float %y)			%med3 = call float @llvm.amdgcn.fmed3.f32(float %x, float undef, float %y)
	ret float %med3			ret float %med3
	}			}

	define float @fmed3_x_y_undef_f32(float %x, float %y) {			define float @fmed3_x_y_undef_f32(float %x, float %y) {
	; CHECK-LABEL: @fmed3_x_y_undef_f32(			; CHECK-LABEL: define float @fmed3_x_y_undef_f32
	; CHECK-NEXT: [[MED3:%.*]] = call float @llvm.maxnum.f32(float [[X:%.*]], float [[Y:%.*]])			; CHECK-SAME: (float [[X:%.*]], float [[Y:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[MED3:%.*]] = call float @llvm.maxnum.f32(float [[X]], float [[Y]])
	; CHECK-NEXT: ret float [[MED3]]			; CHECK-NEXT: ret float [[MED3]]
	;			;
	%med3 = call float @llvm.amdgcn.fmed3.f32(float %x, float %y, float undef)			%med3 = call float @llvm.amdgcn.fmed3.f32(float %x, float %y, float undef)
	ret float %med3			ret float %med3
	}			}

	define float @fmed3_qnan0_x_y_f32(float %x, float %y) {			define float @fmed3_qnan0_x_y_f32(float %x, float %y) {
	; CHECK-LABEL: @fmed3_qnan0_x_y_f32(			; CHECK-LABEL: define float @fmed3_qnan0_x_y_f32
	; CHECK-NEXT: [[MED3:%.*]] = call float @llvm.minnum.f32(float [[X:%.*]], float [[Y:%.*]])			; CHECK-SAME: (float [[X:%.*]], float [[Y:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[MED3:%.*]] = call float @llvm.minnum.f32(float [[X]], float [[Y]])
	; CHECK-NEXT: ret float [[MED3]]			; CHECK-NEXT: ret float [[MED3]]
	;			;
	%med3 = call float @llvm.amdgcn.fmed3.f32(float 0x7FF8000000000000, float %x, float %y)			%med3 = call float @llvm.amdgcn.fmed3.f32(float 0x7FF8000000000000, float %x, float %y)
	ret float %med3			ret float %med3
	}			}

	define float @fmed3_x_qnan0_y_f32(float %x, float %y) {			define float @fmed3_x_qnan0_y_f32(float %x, float %y) {
	; CHECK-LABEL: @fmed3_x_qnan0_y_f32(			; CHECK-LABEL: define float @fmed3_x_qnan0_y_f32
	; CHECK-NEXT: [[MED3:%.*]] = call float @llvm.minnum.f32(float [[X:%.*]], float [[Y:%.*]])			; CHECK-SAME: (float [[X:%.*]], float [[Y:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[MED3:%.*]] = call float @llvm.minnum.f32(float [[X]], float [[Y]])
	; CHECK-NEXT: ret float [[MED3]]			; CHECK-NEXT: ret float [[MED3]]
	;			;
	%med3 = call float @llvm.amdgcn.fmed3.f32(float %x, float 0x7FF8000000000000, float %y)			%med3 = call float @llvm.amdgcn.fmed3.f32(float %x, float 0x7FF8000000000000, float %y)
	ret float %med3			ret float %med3
	}			}

	define float @fmed3_x_y_qnan0_f32(float %x, float %y) {			define float @fmed3_x_y_qnan0_f32(float %x, float %y) {
	; CHECK-LABEL: @fmed3_x_y_qnan0_f32(			; CHECK-LABEL: define float @fmed3_x_y_qnan0_f32
	; CHECK-NEXT: [[MED3:%.*]] = call float @llvm.maxnum.f32(float [[X:%.*]], float [[Y:%.*]])			; CHECK-SAME: (float [[X:%.*]], float [[Y:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[MED3:%.*]] = call float @llvm.maxnum.f32(float [[X]], float [[Y]])
	; CHECK-NEXT: ret float [[MED3]]			; CHECK-NEXT: ret float [[MED3]]
	;			;
	%med3 = call float @llvm.amdgcn.fmed3.f32(float %x, float %y, float 0x7FF8000000000000)			%med3 = call float @llvm.amdgcn.fmed3.f32(float %x, float %y, float 0x7FF8000000000000)
	ret float %med3			ret float %med3
	}			}

	define float @fmed3_qnan1_x_y_f32(float %x, float %y) {			define float @fmed3_qnan1_x_y_f32(float %x, float %y) {
	; CHECK-LABEL: @fmed3_qnan1_x_y_f32(			; CHECK-LABEL: define float @fmed3_qnan1_x_y_f32
	; CHECK-NEXT: [[MED3:%.*]] = call float @llvm.minnum.f32(float [[X:%.*]], float [[Y:%.*]])			; CHECK-SAME: (float [[X:%.*]], float [[Y:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[MED3:%.*]] = call float @llvm.minnum.f32(float [[X]], float [[Y]])
	; CHECK-NEXT: ret float [[MED3]]			; CHECK-NEXT: ret float [[MED3]]
	;			;
	%med3 = call float @llvm.amdgcn.fmed3.f32(float 0x7FF8000100000000, float %x, float %y)			%med3 = call float @llvm.amdgcn.fmed3.f32(float 0x7FF8000100000000, float %x, float %y)
	ret float %med3			ret float %med3
	}			}

	; This can return any of the qnans.			; This can return any of the qnans.
	define float @fmed3_qnan0_qnan1_qnan2_f32(float %x, float %y) {			define float @fmed3_qnan0_qnan1_qnan2_f32(float %x, float %y) {
	; CHECK-LABEL: @fmed3_qnan0_qnan1_qnan2_f32(			; CHECK-LABEL: define float @fmed3_qnan0_qnan1_qnan2_f32
				; CHECK-SAME: (float [[X:%.*]], float [[Y:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: ret float 0x7FF8030000000000			; CHECK-NEXT: ret float 0x7FF8030000000000
	;			;
	%med3 = call float @llvm.amdgcn.fmed3.f32(float 0x7FF8000100000000, float 0x7FF8002000000000, float 0x7FF8030000000000)			%med3 = call float @llvm.amdgcn.fmed3.f32(float 0x7FF8000100000000, float 0x7FF8002000000000, float 0x7FF8030000000000)
	ret float %med3			ret float %med3
	}			}

	define float @fmed3_constant_src0_0_f32(float %x, float %y) {			define float @fmed3_constant_src0_0_f32(float %x, float %y) {
	; CHECK-LABEL: @fmed3_constant_src0_0_f32(			; CHECK-LABEL: define float @fmed3_constant_src0_0_f32
				; CHECK-SAME: (float [[X:%.*]], float [[Y:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: ret float 5.000000e-01			; CHECK-NEXT: ret float 5.000000e-01
	;			;
	%med3 = call float @llvm.amdgcn.fmed3.f32(float 0.5, float -1.0, float 4.0)			%med3 = call float @llvm.amdgcn.fmed3.f32(float 0.5, float -1.0, float 4.0)
	ret float %med3			ret float %med3
	}			}

	define float @fmed3_constant_src0_1_f32(float %x, float %y) {			define float @fmed3_constant_src0_1_f32(float %x, float %y) {
	; CHECK-LABEL: @fmed3_constant_src0_1_f32(			; CHECK-LABEL: define float @fmed3_constant_src0_1_f32
				; CHECK-SAME: (float [[X:%.*]], float [[Y:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: ret float 5.000000e-01			; CHECK-NEXT: ret float 5.000000e-01
	;			;
	%med3 = call float @llvm.amdgcn.fmed3.f32(float 0.5, float 4.0, float -1.0)			%med3 = call float @llvm.amdgcn.fmed3.f32(float 0.5, float 4.0, float -1.0)
	ret float %med3			ret float %med3
	}			}

	define float @fmed3_constant_src1_0_f32(float %x, float %y) {			define float @fmed3_constant_src1_0_f32(float %x, float %y) {
	; CHECK-LABEL: @fmed3_constant_src1_0_f32(			; CHECK-LABEL: define float @fmed3_constant_src1_0_f32
				; CHECK-SAME: (float [[X:%.*]], float [[Y:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: ret float 5.000000e-01			; CHECK-NEXT: ret float 5.000000e-01
	;			;
	%med3 = call float @llvm.amdgcn.fmed3.f32(float -1.0, float 0.5, float 4.0)			%med3 = call float @llvm.amdgcn.fmed3.f32(float -1.0, float 0.5, float 4.0)
	ret float %med3			ret float %med3
	}			}

	define float @fmed3_constant_src1_1_f32(float %x, float %y) {			define float @fmed3_constant_src1_1_f32(float %x, float %y) {
	; CHECK-LABEL: @fmed3_constant_src1_1_f32(			; CHECK-LABEL: define float @fmed3_constant_src1_1_f32
				; CHECK-SAME: (float [[X:%.*]], float [[Y:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: ret float 5.000000e-01			; CHECK-NEXT: ret float 5.000000e-01
	;			;
	%med3 = call float @llvm.amdgcn.fmed3.f32(float 4.0, float 0.5, float -1.0)			%med3 = call float @llvm.amdgcn.fmed3.f32(float 4.0, float 0.5, float -1.0)
	ret float %med3			ret float %med3
	}			}

	define float @fmed3_constant_src2_0_f32(float %x, float %y) {			define float @fmed3_constant_src2_0_f32(float %x, float %y) {
	; CHECK-LABEL: @fmed3_constant_src2_0_f32(			; CHECK-LABEL: define float @fmed3_constant_src2_0_f32
				; CHECK-SAME: (float [[X:%.*]], float [[Y:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: ret float 5.000000e-01			; CHECK-NEXT: ret float 5.000000e-01
	;			;
	%med3 = call float @llvm.amdgcn.fmed3.f32(float -1.0, float 4.0, float 0.5)			%med3 = call float @llvm.amdgcn.fmed3.f32(float -1.0, float 4.0, float 0.5)
	ret float %med3			ret float %med3
	}			}

	define float @fmed3_constant_src2_1_f32(float %x, float %y) {			define float @fmed3_constant_src2_1_f32(float %x, float %y) {
	; CHECK-LABEL: @fmed3_constant_src2_1_f32(			; CHECK-LABEL: define float @fmed3_constant_src2_1_f32
				; CHECK-SAME: (float [[X:%.*]], float [[Y:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: ret float 5.000000e-01			; CHECK-NEXT: ret float 5.000000e-01
	;			;
	%med3 = call float @llvm.amdgcn.fmed3.f32(float 4.0, float -1.0, float 0.5)			%med3 = call float @llvm.amdgcn.fmed3.f32(float 4.0, float -1.0, float 0.5)
	ret float %med3			ret float %med3
	}			}

	define float @fmed3_x_qnan0_qnan1_f32(float %x) {			define float @fmed3_x_qnan0_qnan1_f32(float %x) {
	; CHECK-LABEL: @fmed3_x_qnan0_qnan1_f32(			; CHECK-LABEL: define float @fmed3_x_qnan0_qnan1_f32
	; CHECK-NEXT: ret float [[X:%.*]]			; CHECK-SAME: (float [[X:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: ret float [[X]]
	;			;
	%med3 = call float @llvm.amdgcn.fmed3.f32(float %x, float 0x7FF8001000000000, float 0x7FF8002000000000)			%med3 = call float @llvm.amdgcn.fmed3.f32(float %x, float 0x7FF8001000000000, float 0x7FF8002000000000)
	ret float %med3			ret float %med3
	}			}

	define float @fmed3_qnan0_x_qnan1_f32(float %x) {			define float @fmed3_qnan0_x_qnan1_f32(float %x) {
	; CHECK-LABEL: @fmed3_qnan0_x_qnan1_f32(			; CHECK-LABEL: define float @fmed3_qnan0_x_qnan1_f32
	; CHECK-NEXT: ret float [[X:%.*]]			; CHECK-SAME: (float [[X:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: ret float [[X]]
	;			;
	%med3 = call float @llvm.amdgcn.fmed3.f32(float 0x7FF8001000000000, float %x, float 0x7FF8002000000000)			%med3 = call float @llvm.amdgcn.fmed3.f32(float 0x7FF8001000000000, float %x, float 0x7FF8002000000000)
	ret float %med3			ret float %med3
	}			}

	define float @fmed3_qnan0_qnan1_x_f32(float %x) {			define float @fmed3_qnan0_qnan1_x_f32(float %x) {
	; CHECK-LABEL: @fmed3_qnan0_qnan1_x_f32(			; CHECK-LABEL: define float @fmed3_qnan0_qnan1_x_f32
	; CHECK-NEXT: ret float [[X:%.*]]			; CHECK-SAME: (float [[X:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: ret float [[X]]
	;			;
	%med3 = call float @llvm.amdgcn.fmed3.f32(float 0x7FF8001000000000, float 0x7FF8002000000000, float %x)			%med3 = call float @llvm.amdgcn.fmed3.f32(float 0x7FF8001000000000, float 0x7FF8002000000000, float %x)
	ret float %med3			ret float %med3
	}			}

	define float @fmed3_nan_0_1_f32() {			define float @fmed3_nan_0_1_f32() {
	; CHECK-LABEL: @fmed3_nan_0_1_f32(			; CHECK-LABEL: define float @fmed3_nan_0_1_f32
				; CHECK-SAME: () #[[ATTR3]] {
	; CHECK-NEXT: ret float 0.000000e+00			; CHECK-NEXT: ret float 0.000000e+00
	;			;
	%med3 = call float @llvm.amdgcn.fmed3.f32(float 0x7FF8001000000000, float 0.0, float 1.0)			%med3 = call float @llvm.amdgcn.fmed3.f32(float 0x7FF8001000000000, float 0.0, float 1.0)
	ret float %med3			ret float %med3
	}			}

	define float @fmed3_0_nan_1_f32() {			define float @fmed3_0_nan_1_f32() {
	; CHECK-LABEL: @fmed3_0_nan_1_f32(			; CHECK-LABEL: define float @fmed3_0_nan_1_f32
				; CHECK-SAME: () #[[ATTR3]] {
	; CHECK-NEXT: ret float 0.000000e+00			; CHECK-NEXT: ret float 0.000000e+00
	;			;
	%med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float 0x7FF8001000000000, float 1.0)			%med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float 0x7FF8001000000000, float 1.0)
	ret float %med			ret float %med
	}			}

	define float @fmed3_0_1_nan_f32() {			define float @fmed3_0_1_nan_f32() {
	; CHECK-LABEL: @fmed3_0_1_nan_f32(			; CHECK-LABEL: define float @fmed3_0_1_nan_f32
				; CHECK-SAME: () #[[ATTR3]] {
	; CHECK-NEXT: ret float 1.000000e+00			; CHECK-NEXT: ret float 1.000000e+00
	;			;
	%med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float 1.0, float 0x7FF8001000000000)			%med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float 1.0, float 0x7FF8001000000000)
	ret float %med			ret float %med
	}			}

	define float @fmed3_undef_0_1_f32() {			define float @fmed3_undef_0_1_f32() {
	; CHECK-LABEL: @fmed3_undef_0_1_f32(			; CHECK-LABEL: define float @fmed3_undef_0_1_f32
				; CHECK-SAME: () #[[ATTR3]] {
	; CHECK-NEXT: ret float 0.000000e+00			; CHECK-NEXT: ret float 0.000000e+00
	;			;
	%med3 = call float @llvm.amdgcn.fmed3.f32(float undef, float 0.0, float 1.0)			%med3 = call float @llvm.amdgcn.fmed3.f32(float undef, float 0.0, float 1.0)
	ret float %med3			ret float %med3
	}			}

	define float @fmed3_0_undef_1_f32() {			define float @fmed3_0_undef_1_f32() {
	; CHECK-LABEL: @fmed3_0_undef_1_f32(			; CHECK-LABEL: define float @fmed3_0_undef_1_f32
				; CHECK-SAME: () #[[ATTR3]] {
	; CHECK-NEXT: ret float 0.000000e+00			; CHECK-NEXT: ret float 0.000000e+00
	;			;
	%med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float undef, float 1.0)			%med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float undef, float 1.0)
	ret float %med			ret float %med
	}			}

	define float @fmed3_0_1_undef_f32() {			define float @fmed3_0_1_undef_f32() {
	; CHECK-LABEL: @fmed3_0_1_undef_f32(			; CHECK-LABEL: define float @fmed3_0_1_undef_f32
				; CHECK-SAME: () #[[ATTR3]] {
	; CHECK-NEXT: ret float 1.000000e+00			; CHECK-NEXT: ret float 1.000000e+00
	;			;
	%med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float 1.0, float undef)			%med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float 1.0, float undef)
	ret float %med			ret float %med
	}			}

	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	; llvm.amdgcn.icmp			; llvm.amdgcn.icmp
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	declare i64 @llvm.amdgcn.icmp.i64.i32(i32, i32, i32 immarg) nounwind readnone convergent			declare i64 @llvm.amdgcn.icmp.i64.i32(i32, i32, i32 immarg) nounwind readnone convergent
	declare i64 @llvm.amdgcn.icmp.i64.i64(i64, i64, i32 immarg) nounwind readnone convergent			declare i64 @llvm.amdgcn.icmp.i64.i64(i64, i64, i32 immarg) nounwind readnone convergent
	declare i64 @llvm.amdgcn.icmp.i64.i1(i1, i1, i32 immarg) nounwind readnone convergent			declare i64 @llvm.amdgcn.icmp.i64.i1(i1, i1, i32 immarg) nounwind readnone convergent

	define i64 @invalid_icmp_code(i32 %a, i32 %b) {			define i64 @invalid_icmp_code(i32 %a, i32 %b) {
	; CHECK-LABEL: @invalid_icmp_code(			; CHECK-LABEL: define i64 @invalid_icmp_code
	; CHECK-NEXT: [[UNDER:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[A:%.*]], i32 [[B:%.*]], i32 31)			; CHECK-SAME: (i32 [[A:%.*]], i32 [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[UNDER:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[A]], i32 [[B]], i32 31)
	; CHECK-NEXT: [[OVER:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[A]], i32 [[B]], i32 42)			; CHECK-NEXT: [[OVER:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[A]], i32 [[B]], i32 42)
	; CHECK-NEXT: [[OR:%.*]] = or i64 [[UNDER]], [[OVER]]			; CHECK-NEXT: [[OR:%.*]] = or i64 [[UNDER]], [[OVER]]
	; CHECK-NEXT: ret i64 [[OR]]			; CHECK-NEXT: ret i64 [[OR]]
	;			;
	%under = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %a, i32 %b, i32 31)			%under = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %a, i32 %b, i32 31)
	%over = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %a, i32 %b, i32 42)			%over = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %a, i32 %b, i32 42)
	%or = or i64 %under, %over			%or = or i64 %under, %over
	ret i64 %or			ret i64 %or
	}			}

	define i64 @icmp_constant_inputs_false() {			define i64 @icmp_constant_inputs_false() {
	; CHECK-LABEL: @icmp_constant_inputs_false(			; CHECK-LABEL: define i64 @icmp_constant_inputs_false
				; CHECK-SAME: () #[[ATTR3]] {
	; CHECK-NEXT: ret i64 0			; CHECK-NEXT: ret i64 0
	;			;
	%result = call i64 @llvm.amdgcn.icmp.i64.i32(i32 9, i32 8, i32 32)			%result = call i64 @llvm.amdgcn.icmp.i64.i32(i32 9, i32 8, i32 32)
	ret i64 %result			ret i64 %result
	}			}

	define i64 @icmp_constant_inputs_true() {			define i64 @icmp_constant_inputs_true() {
	; CHECK-LABEL: @icmp_constant_inputs_true(			; CHECK-LABEL: define i64 @icmp_constant_inputs_true
	; CHECK-NEXT: [[RESULT:%.*]] = call i64 @llvm.read_register.i64(metadata [[META0:![0-9]+]]) #[[ATTR17:[0-9]+]]			; CHECK-SAME: () #[[ATTR3]] {
				; CHECK-NEXT: [[RESULT:%.*]] = call i64 @llvm.read_register.i64(metadata [[META0:![0-9]+]]) #[[ATTR16:[0-9]+]]
	; CHECK-NEXT: ret i64 [[RESULT]]			; CHECK-NEXT: ret i64 [[RESULT]]
	;			;
	%result = call i64 @llvm.amdgcn.icmp.i64.i32(i32 9, i32 8, i32 34)			%result = call i64 @llvm.amdgcn.icmp.i64.i32(i32 9, i32 8, i32 34)
	ret i64 %result			ret i64 %result
	}			}

	define i64 @icmp_constant_to_rhs_slt(i32 %x) {			define i64 @icmp_constant_to_rhs_slt(i32 %x) {
	; CHECK-LABEL: @icmp_constant_to_rhs_slt(			; CHECK-LABEL: define i64 @icmp_constant_to_rhs_slt
	; CHECK-NEXT: [[RESULT:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[X:%.*]], i32 9, i32 38)			; CHECK-SAME: (i32 [[X:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[RESULT:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[X]], i32 9, i32 38)
	; CHECK-NEXT: ret i64 [[RESULT]]			; CHECK-NEXT: ret i64 [[RESULT]]
	;			;
	%result = call i64 @llvm.amdgcn.icmp.i64.i32(i32 9, i32 %x, i32 40)			%result = call i64 @llvm.amdgcn.icmp.i64.i32(i32 9, i32 %x, i32 40)
	ret i64 %result			ret i64 %result
	}			}

	define i64 @fold_icmp_ne_0_zext_icmp_eq_i32(i32 %a, i32 %b) {			define i64 @fold_icmp_ne_0_zext_icmp_eq_i32(i32 %a, i32 %b) {
	; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_eq_i32(			; CHECK-LABEL: define i64 @fold_icmp_ne_0_zext_icmp_eq_i32
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[A:%.*]], i32 [[B:%.*]], i32 32)			; CHECK-SAME: (i32 [[A:%.*]], i32 [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[A]], i32 [[B]], i32 32)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp eq i32 %a, %b			%cmp = icmp eq i32 %a, %b
	%zext.cmp = zext i1 %cmp to i32			%zext.cmp = zext i1 %cmp to i32
	%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_ne_0_zext_icmp_ne_i32(i32 %a, i32 %b) {			define i64 @fold_icmp_ne_0_zext_icmp_ne_i32(i32 %a, i32 %b) {
	; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_ne_i32(			; CHECK-LABEL: define i64 @fold_icmp_ne_0_zext_icmp_ne_i32
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[A:%.*]], i32 [[B:%.*]], i32 33)			; CHECK-SAME: (i32 [[A:%.*]], i32 [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[A]], i32 [[B]], i32 33)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp ne i32 %a, %b			%cmp = icmp ne i32 %a, %b
	%zext.cmp = zext i1 %cmp to i32			%zext.cmp = zext i1 %cmp to i32
	%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_ne_0_zext_icmp_sle_i32(i32 %a, i32 %b) {			define i64 @fold_icmp_ne_0_zext_icmp_sle_i32(i32 %a, i32 %b) {
	; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_sle_i32(			; CHECK-LABEL: define i64 @fold_icmp_ne_0_zext_icmp_sle_i32
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[A:%.*]], i32 [[B:%.*]], i32 41)			; CHECK-SAME: (i32 [[A:%.*]], i32 [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[A]], i32 [[B]], i32 41)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp sle i32 %a, %b			%cmp = icmp sle i32 %a, %b
	%zext.cmp = zext i1 %cmp to i32			%zext.cmp = zext i1 %cmp to i32
	%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_ne_0_zext_icmp_ugt_i64(i64 %a, i64 %b) {			define i64 @fold_icmp_ne_0_zext_icmp_ugt_i64(i64 %a, i64 %b) {
	; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_ugt_i64(			; CHECK-LABEL: define i64 @fold_icmp_ne_0_zext_icmp_ugt_i64
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i64(i64 [[A:%.*]], i64 [[B:%.*]], i32 34)			; CHECK-SAME: (i64 [[A:%.*]], i64 [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i64(i64 [[A]], i64 [[B]], i32 34)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp ugt i64 %a, %b			%cmp = icmp ugt i64 %a, %b
	%zext.cmp = zext i1 %cmp to i32			%zext.cmp = zext i1 %cmp to i32
	%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_ne_0_zext_icmp_ult_swap_i64(i64 %a, i64 %b) {			define i64 @fold_icmp_ne_0_zext_icmp_ult_swap_i64(i64 %a, i64 %b) {
	; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_ult_swap_i64(			; CHECK-LABEL: define i64 @fold_icmp_ne_0_zext_icmp_ult_swap_i64
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i64(i64 [[A:%.*]], i64 [[B:%.*]], i32 34)			; CHECK-SAME: (i64 [[A:%.*]], i64 [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i64(i64 [[A]], i64 [[B]], i32 34)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp ugt i64 %a, %b			%cmp = icmp ugt i64 %a, %b
	%zext.cmp = zext i1 %cmp to i32			%zext.cmp = zext i1 %cmp to i32
	%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 0, i32 %zext.cmp, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 0, i32 %zext.cmp, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_ne_0_zext_fcmp_oeq_f32(float %a, float %b) {			define i64 @fold_icmp_ne_0_zext_fcmp_oeq_f32(float %a, float %b) {
	; CHECK-LABEL: @fold_icmp_ne_0_zext_fcmp_oeq_f32(			; CHECK-LABEL: define i64 @fold_icmp_ne_0_zext_fcmp_oeq_f32
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.fcmp.i64.f32(float [[A:%.*]], float [[B:%.*]], i32 1)			; CHECK-SAME: (float [[A:%.*]], float [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.fcmp.i64.f32(float [[A]], float [[B]], i32 1)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = fcmp oeq float %a, %b			%cmp = fcmp oeq float %a, %b
	%zext.cmp = zext i1 %cmp to i32			%zext.cmp = zext i1 %cmp to i32
	%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_ne_0_zext_fcmp_une_f32(float %a, float %b) {			define i64 @fold_icmp_ne_0_zext_fcmp_une_f32(float %a, float %b) {
	; CHECK-LABEL: @fold_icmp_ne_0_zext_fcmp_une_f32(			; CHECK-LABEL: define i64 @fold_icmp_ne_0_zext_fcmp_une_f32
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.fcmp.i64.f32(float [[A:%.*]], float [[B:%.*]], i32 14)			; CHECK-SAME: (float [[A:%.*]], float [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.fcmp.i64.f32(float [[A]], float [[B]], i32 14)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = fcmp une float %a, %b			%cmp = fcmp une float %a, %b
	%zext.cmp = zext i1 %cmp to i32			%zext.cmp = zext i1 %cmp to i32
	%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_ne_0_zext_fcmp_olt_f64(double %a, double %b) {			define i64 @fold_icmp_ne_0_zext_fcmp_olt_f64(double %a, double %b) {
	; CHECK-LABEL: @fold_icmp_ne_0_zext_fcmp_olt_f64(			; CHECK-LABEL: define i64 @fold_icmp_ne_0_zext_fcmp_olt_f64
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.fcmp.i64.f64(double [[A:%.*]], double [[B:%.*]], i32 4)			; CHECK-SAME: (double [[A:%.*]], double [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.fcmp.i64.f64(double [[A]], double [[B]], i32 4)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = fcmp olt double %a, %b			%cmp = fcmp olt double %a, %b
	%zext.cmp = zext i1 %cmp to i32			%zext.cmp = zext i1 %cmp to i32
	%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_sext_icmp_ne_0_i32(i32 %a, i32 %b) {			define i64 @fold_icmp_sext_icmp_ne_0_i32(i32 %a, i32 %b) {
	; CHECK-LABEL: @fold_icmp_sext_icmp_ne_0_i32(			; CHECK-LABEL: define i64 @fold_icmp_sext_icmp_ne_0_i32
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[A:%.*]], i32 [[B:%.*]], i32 32)			; CHECK-SAME: (i32 [[A:%.*]], i32 [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[A]], i32 [[B]], i32 32)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp eq i32 %a, %b			%cmp = icmp eq i32 %a, %b
	%sext.cmp = sext i1 %cmp to i32			%sext.cmp = sext i1 %cmp to i32
	%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %sext.cmp, i32 0, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %sext.cmp, i32 0, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_eq_0_zext_icmp_eq_i32(i32 %a, i32 %b) {			define i64 @fold_icmp_eq_0_zext_icmp_eq_i32(i32 %a, i32 %b) {
	; CHECK-LABEL: @fold_icmp_eq_0_zext_icmp_eq_i32(			; CHECK-LABEL: define i64 @fold_icmp_eq_0_zext_icmp_eq_i32
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[A:%.*]], i32 [[B:%.*]], i32 33)			; CHECK-SAME: (i32 [[A:%.*]], i32 [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[A]], i32 [[B]], i32 33)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp eq i32 %a, %b			%cmp = icmp eq i32 %a, %b
	%zext.cmp = zext i1 %cmp to i32			%zext.cmp = zext i1 %cmp to i32
	%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 32)			%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 32)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_eq_0_zext_icmp_slt_i32(i32 %a, i32 %b) {			define i64 @fold_icmp_eq_0_zext_icmp_slt_i32(i32 %a, i32 %b) {
	; CHECK-LABEL: @fold_icmp_eq_0_zext_icmp_slt_i32(			; CHECK-LABEL: define i64 @fold_icmp_eq_0_zext_icmp_slt_i32
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[A:%.*]], i32 [[B:%.*]], i32 39)			; CHECK-SAME: (i32 [[A:%.*]], i32 [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[A]], i32 [[B]], i32 39)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp slt i32 %a, %b			%cmp = icmp slt i32 %a, %b
	%zext.cmp = zext i1 %cmp to i32			%zext.cmp = zext i1 %cmp to i32
	%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 32)			%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 32)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_eq_0_zext_fcmp_oeq_f32(float %a, float %b) {			define i64 @fold_icmp_eq_0_zext_fcmp_oeq_f32(float %a, float %b) {
	; CHECK-LABEL: @fold_icmp_eq_0_zext_fcmp_oeq_f32(			; CHECK-LABEL: define i64 @fold_icmp_eq_0_zext_fcmp_oeq_f32
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.fcmp.i64.f32(float [[A:%.*]], float [[B:%.*]], i32 14)			; CHECK-SAME: (float [[A:%.*]], float [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.fcmp.i64.f32(float [[A]], float [[B]], i32 14)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = fcmp oeq float %a, %b			%cmp = fcmp oeq float %a, %b
	%zext.cmp = zext i1 %cmp to i32			%zext.cmp = zext i1 %cmp to i32
	%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 32)			%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 32)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_eq_0_zext_fcmp_ule_f32(float %a, float %b) {			define i64 @fold_icmp_eq_0_zext_fcmp_ule_f32(float %a, float %b) {
	; CHECK-LABEL: @fold_icmp_eq_0_zext_fcmp_ule_f32(			; CHECK-LABEL: define i64 @fold_icmp_eq_0_zext_fcmp_ule_f32
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.fcmp.i64.f32(float [[A:%.*]], float [[B:%.*]], i32 2)			; CHECK-SAME: (float [[A:%.*]], float [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.fcmp.i64.f32(float [[A]], float [[B]], i32 2)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = fcmp ule float %a, %b			%cmp = fcmp ule float %a, %b
	%zext.cmp = zext i1 %cmp to i32			%zext.cmp = zext i1 %cmp to i32
	%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 32)			%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 32)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_eq_0_zext_fcmp_ogt_f32(float %a, float %b) {			define i64 @fold_icmp_eq_0_zext_fcmp_ogt_f32(float %a, float %b) {
	; CHECK-LABEL: @fold_icmp_eq_0_zext_fcmp_ogt_f32(			; CHECK-LABEL: define i64 @fold_icmp_eq_0_zext_fcmp_ogt_f32
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.fcmp.i64.f32(float [[A:%.*]], float [[B:%.*]], i32 13)			; CHECK-SAME: (float [[A:%.*]], float [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.fcmp.i64.f32(float [[A]], float [[B]], i32 13)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = fcmp ogt float %a, %b			%cmp = fcmp ogt float %a, %b
	%zext.cmp = zext i1 %cmp to i32			%zext.cmp = zext i1 %cmp to i32
	%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 32)			%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 32)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_zext_icmp_eq_1_i32(i32 %a, i32 %b) {			define i64 @fold_icmp_zext_icmp_eq_1_i32(i32 %a, i32 %b) {
	; CHECK-LABEL: @fold_icmp_zext_icmp_eq_1_i32(			; CHECK-LABEL: define i64 @fold_icmp_zext_icmp_eq_1_i32
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[A:%.*]], i32 [[B:%.*]], i32 32)			; CHECK-SAME: (i32 [[A:%.*]], i32 [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[A]], i32 [[B]], i32 32)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp eq i32 %a, %b			%cmp = icmp eq i32 %a, %b
	%zext.cmp = zext i1 %cmp to i32			%zext.cmp = zext i1 %cmp to i32
	%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 1, i32 32)			%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 1, i32 32)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_zext_argi1_eq_1_i32(i1 %cond) {			define i64 @fold_icmp_zext_argi1_eq_1_i32(i1 %cond) {
	; CHECK-LABEL: @fold_icmp_zext_argi1_eq_1_i32(			; CHECK-LABEL: define i64 @fold_icmp_zext_argi1_eq_1_i32
	; CHECK-NEXT: [[ZEXT_COND:%.*]] = zext i1 [[COND:%.*]] to i32			; CHECK-SAME: (i1 [[COND:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[ZEXT_COND:%.*]] = zext i1 [[COND]] to i32
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[ZEXT_COND]], i32 0, i32 33)			; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[ZEXT_COND]], i32 0, i32 33)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%zext.cond = zext i1 %cond to i32			%zext.cond = zext i1 %cond to i32
	%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cond, i32 1, i32 32)			%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cond, i32 1, i32 32)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_zext_argi1_eq_neg1_i32(i1 %cond) {			define i64 @fold_icmp_zext_argi1_eq_neg1_i32(i1 %cond) {
	; CHECK-LABEL: @fold_icmp_zext_argi1_eq_neg1_i32(			; CHECK-LABEL: define i64 @fold_icmp_zext_argi1_eq_neg1_i32
	; CHECK-NEXT: [[ZEXT_COND:%.*]] = zext i1 [[COND:%.*]] to i32			; CHECK-SAME: (i1 [[COND:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[ZEXT_COND:%.*]] = zext i1 [[COND]] to i32
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[ZEXT_COND]], i32 -1, i32 32)			; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[ZEXT_COND]], i32 -1, i32 32)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%zext.cond = zext i1 %cond to i32			%zext.cond = zext i1 %cond to i32
	%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cond, i32 -1, i32 32)			%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cond, i32 -1, i32 32)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_sext_argi1_eq_1_i32(i1 %cond) {			define i64 @fold_icmp_sext_argi1_eq_1_i32(i1 %cond) {
	; CHECK-LABEL: @fold_icmp_sext_argi1_eq_1_i32(			; CHECK-LABEL: define i64 @fold_icmp_sext_argi1_eq_1_i32
	; CHECK-NEXT: [[SEXT_COND:%.*]] = sext i1 [[COND:%.*]] to i32			; CHECK-SAME: (i1 [[COND:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[SEXT_COND:%.*]] = sext i1 [[COND]] to i32
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[SEXT_COND]], i32 1, i32 32)			; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[SEXT_COND]], i32 1, i32 32)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%sext.cond = sext i1 %cond to i32			%sext.cond = sext i1 %cond to i32
	%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %sext.cond, i32 1, i32 32)			%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %sext.cond, i32 1, i32 32)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_sext_argi1_eq_neg1_i32(i1 %cond) {			define i64 @fold_icmp_sext_argi1_eq_neg1_i32(i1 %cond) {
	; CHECK-LABEL: @fold_icmp_sext_argi1_eq_neg1_i32(			; CHECK-LABEL: define i64 @fold_icmp_sext_argi1_eq_neg1_i32
	; CHECK-NEXT: [[SEXT_COND:%.*]] = sext i1 [[COND:%.*]] to i32			; CHECK-SAME: (i1 [[COND:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[SEXT_COND:%.*]] = sext i1 [[COND]] to i32
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[SEXT_COND]], i32 0, i32 33)			; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[SEXT_COND]], i32 0, i32 33)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%sext.cond = sext i1 %cond to i32			%sext.cond = sext i1 %cond to i32
	%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %sext.cond, i32 -1, i32 32)			%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %sext.cond, i32 -1, i32 32)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_sext_argi1_eq_neg1_i64(i1 %cond) {			define i64 @fold_icmp_sext_argi1_eq_neg1_i64(i1 %cond) {
	; CHECK-LABEL: @fold_icmp_sext_argi1_eq_neg1_i64(			; CHECK-LABEL: define i64 @fold_icmp_sext_argi1_eq_neg1_i64
	; CHECK-NEXT: [[SEXT_COND:%.*]] = sext i1 [[COND:%.*]] to i64			; CHECK-SAME: (i1 [[COND:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[SEXT_COND:%.*]] = sext i1 [[COND]] to i64
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i64(i64 [[SEXT_COND]], i64 0, i32 33)			; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i64(i64 [[SEXT_COND]], i64 0, i32 33)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%sext.cond = sext i1 %cond to i64			%sext.cond = sext i1 %cond to i64
	%mask = call i64 @llvm.amdgcn.icmp.i64.i64(i64 %sext.cond, i64 -1, i32 32)			%mask = call i64 @llvm.amdgcn.icmp.i64.i64(i64 %sext.cond, i64 -1, i32 32)
	ret i64 %mask			ret i64 %mask
	}			}

	; TODO: Should be able to fold to false			; TODO: Should be able to fold to false
	define i64 @fold_icmp_sext_icmp_eq_1_i32(i32 %a, i32 %b) {			define i64 @fold_icmp_sext_icmp_eq_1_i32(i32 %a, i32 %b) {
	; CHECK-LABEL: @fold_icmp_sext_icmp_eq_1_i32(			; CHECK-LABEL: define i64 @fold_icmp_sext_icmp_eq_1_i32
	; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[A:%.*]], [[B:%.*]]			; CHECK-SAME: (i32 [[A:%.*]], i32 [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[A]], [[B]]
	; CHECK-NEXT: [[SEXT_CMP:%.*]] = sext i1 [[CMP]] to i32			; CHECK-NEXT: [[SEXT_CMP:%.*]] = sext i1 [[CMP]] to i32
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[SEXT_CMP]], i32 1, i32 32)			; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[SEXT_CMP]], i32 1, i32 32)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp eq i32 %a, %b			%cmp = icmp eq i32 %a, %b
	%sext.cmp = sext i1 %cmp to i32			%sext.cmp = sext i1 %cmp to i32
	%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %sext.cmp, i32 1, i32 32)			%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %sext.cmp, i32 1, i32 32)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_sext_icmp_eq_neg1_i32(i32 %a, i32 %b) {			define i64 @fold_icmp_sext_icmp_eq_neg1_i32(i32 %a, i32 %b) {
	; CHECK-LABEL: @fold_icmp_sext_icmp_eq_neg1_i32(			; CHECK-LABEL: define i64 @fold_icmp_sext_icmp_eq_neg1_i32
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[A:%.*]], i32 [[B:%.*]], i32 32)			; CHECK-SAME: (i32 [[A:%.*]], i32 [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[A]], i32 [[B]], i32 32)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp eq i32 %a, %b			%cmp = icmp eq i32 %a, %b
	%sext.cmp = sext i1 %cmp to i32			%sext.cmp = sext i1 %cmp to i32
	%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %sext.cmp, i32 -1, i32 32)			%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %sext.cmp, i32 -1, i32 32)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_sext_icmp_sge_neg1_i32(i32 %a, i32 %b) {			define i64 @fold_icmp_sext_icmp_sge_neg1_i32(i32 %a, i32 %b) {
	; CHECK-LABEL: @fold_icmp_sext_icmp_sge_neg1_i32(			; CHECK-LABEL: define i64 @fold_icmp_sext_icmp_sge_neg1_i32
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[A:%.*]], i32 [[B:%.*]], i32 39)			; CHECK-SAME: (i32 [[A:%.*]], i32 [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[A]], i32 [[B]], i32 39)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp sge i32 %a, %b			%cmp = icmp sge i32 %a, %b
	%sext.cmp = sext i1 %cmp to i32			%sext.cmp = sext i1 %cmp to i32
	%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %sext.cmp, i32 -1, i32 32)			%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %sext.cmp, i32 -1, i32 32)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_not_icmp_ne_0_zext_icmp_sle_i32(i32 %a, i32 %b) {			define i64 @fold_not_icmp_ne_0_zext_icmp_sle_i32(i32 %a, i32 %b) {
	; CHECK-LABEL: @fold_not_icmp_ne_0_zext_icmp_sle_i32(			; CHECK-LABEL: define i64 @fold_not_icmp_ne_0_zext_icmp_sle_i32
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[A:%.*]], i32 [[B:%.*]], i32 38)			; CHECK-SAME: (i32 [[A:%.*]], i32 [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[A]], i32 [[B]], i32 38)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp sle i32 %a, %b			%cmp = icmp sle i32 %a, %b
	%not = xor i1 %cmp, true			%not = xor i1 %cmp, true
	%zext.cmp = zext i1 %not to i32			%zext.cmp = zext i1 %not to i32
	%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_ne_0_zext_icmp_eq_i4(i4 %a, i4 %b) {			define i64 @fold_icmp_ne_0_zext_icmp_eq_i4(i4 %a, i4 %b) {
	; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_eq_i4(			; CHECK-LABEL: define i64 @fold_icmp_ne_0_zext_icmp_eq_i4
	; CHECK-NEXT: [[TMP1:%.*]] = zext i4 [[A:%.*]] to i16			; CHECK-SAME: (i4 [[A:%.*]], i4 [[B:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: [[TMP2:%.*]] = zext i4 [[B:%.*]] to i16			; CHECK-NEXT: [[TMP1:%.*]] = zext i4 [[A]] to i16
				; CHECK-NEXT: [[TMP2:%.*]] = zext i4 [[B]] to i16
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i16(i16 [[TMP1]], i16 [[TMP2]], i32 32)			; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i16(i16 [[TMP1]], i16 [[TMP2]], i32 32)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp eq i4 %a, %b			%cmp = icmp eq i4 %a, %b
	%zext.cmp = zext i1 %cmp to i32			%zext.cmp = zext i1 %cmp to i32
	%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_ne_0_zext_icmp_eq_i8(i8 %a, i8 %b) {			define i64 @fold_icmp_ne_0_zext_icmp_eq_i8(i8 %a, i8 %b) {
	; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_eq_i8(			; CHECK-LABEL: define i64 @fold_icmp_ne_0_zext_icmp_eq_i8
	; CHECK-NEXT: [[TMP1:%.*]] = zext i8 [[A:%.*]] to i16			; CHECK-SAME: (i8 [[A:%.*]], i8 [[B:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: [[TMP2:%.*]] = zext i8 [[B:%.*]] to i16			; CHECK-NEXT: [[TMP1:%.*]] = zext i8 [[A]] to i16
				; CHECK-NEXT: [[TMP2:%.*]] = zext i8 [[B]] to i16
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i16(i16 [[TMP1]], i16 [[TMP2]], i32 32)			; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i16(i16 [[TMP1]], i16 [[TMP2]], i32 32)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp eq i8 %a, %b			%cmp = icmp eq i8 %a, %b
	%zext.cmp = zext i1 %cmp to i32			%zext.cmp = zext i1 %cmp to i32
	%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_ne_0_zext_icmp_eq_i16(i16 %a, i16 %b) {			define i64 @fold_icmp_ne_0_zext_icmp_eq_i16(i16 %a, i16 %b) {
	; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_eq_i16(			; CHECK-LABEL: define i64 @fold_icmp_ne_0_zext_icmp_eq_i16
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i16(i16 [[A:%.*]], i16 [[B:%.*]], i32 32)			; CHECK-SAME: (i16 [[A:%.*]], i16 [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i16(i16 [[A]], i16 [[B]], i32 32)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp eq i16 %a, %b			%cmp = icmp eq i16 %a, %b
	%zext.cmp = zext i1 %cmp to i32			%zext.cmp = zext i1 %cmp to i32
	%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_ne_0_zext_icmp_eq_i36(i36 %a, i36 %b) {			define i64 @fold_icmp_ne_0_zext_icmp_eq_i36(i36 %a, i36 %b) {
	; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_eq_i36(			; CHECK-LABEL: define i64 @fold_icmp_ne_0_zext_icmp_eq_i36
	; CHECK-NEXT: [[TMP1:%.*]] = zext i36 [[A:%.*]] to i64			; CHECK-SAME: (i36 [[A:%.*]], i36 [[B:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: [[TMP2:%.*]] = zext i36 [[B:%.*]] to i64			; CHECK-NEXT: [[TMP1:%.*]] = zext i36 [[A]] to i64
				; CHECK-NEXT: [[TMP2:%.*]] = zext i36 [[B]] to i64
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i64(i64 [[TMP1]], i64 [[TMP2]], i32 32)			; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i64(i64 [[TMP1]], i64 [[TMP2]], i32 32)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp eq i36 %a, %b			%cmp = icmp eq i36 %a, %b
	%zext.cmp = zext i1 %cmp to i32			%zext.cmp = zext i1 %cmp to i32
	%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_ne_0_zext_icmp_eq_i128(i128 %a, i128 %b) {			define i64 @fold_icmp_ne_0_zext_icmp_eq_i128(i128 %a, i128 %b) {
	; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_eq_i128(			; CHECK-LABEL: define i64 @fold_icmp_ne_0_zext_icmp_eq_i128
	; CHECK-NEXT: [[CMP:%.*]] = icmp eq i128 [[A:%.*]], [[B:%.*]]			; CHECK-SAME: (i128 [[A:%.*]], i128 [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[CMP:%.*]] = icmp eq i128 [[A]], [[B]]
	; CHECK-NEXT: [[ZEXT_CMP:%.*]] = zext i1 [[CMP]] to i32			; CHECK-NEXT: [[ZEXT_CMP:%.*]] = zext i1 [[CMP]] to i32
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[ZEXT_CMP]], i32 0, i32 33)			; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[ZEXT_CMP]], i32 0, i32 33)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp eq i128 %a, %b			%cmp = icmp eq i128 %a, %b
	%zext.cmp = zext i1 %cmp to i32			%zext.cmp = zext i1 %cmp to i32
	%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_ne_0_zext_fcmp_oeq_f16(half %a, half %b) {			define i64 @fold_icmp_ne_0_zext_fcmp_oeq_f16(half %a, half %b) {
	; CHECK-LABEL: @fold_icmp_ne_0_zext_fcmp_oeq_f16(			; CHECK-LABEL: define i64 @fold_icmp_ne_0_zext_fcmp_oeq_f16
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.fcmp.i64.f16(half [[A:%.*]], half [[B:%.*]], i32 1)			; CHECK-SAME: (half [[A:%.*]], half [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.fcmp.i64.f16(half [[A]], half [[B]], i32 1)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = fcmp oeq half %a, %b			%cmp = fcmp oeq half %a, %b
	%zext.cmp = zext i1 %cmp to i32			%zext.cmp = zext i1 %cmp to i32
	%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_ne_0_zext_fcmp_oeq_f128(fp128 %a, fp128 %b) {			define i64 @fold_icmp_ne_0_zext_fcmp_oeq_f128(fp128 %a, fp128 %b) {
	; CHECK-LABEL: @fold_icmp_ne_0_zext_fcmp_oeq_f128(			; CHECK-LABEL: define i64 @fold_icmp_ne_0_zext_fcmp_oeq_f128
	; CHECK-NEXT: [[CMP:%.*]] = fcmp oeq fp128 [[A:%.*]], [[B:%.*]]			; CHECK-SAME: (fp128 [[A:%.*]], fp128 [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[CMP:%.*]] = fcmp oeq fp128 [[A]], [[B]]
	; CHECK-NEXT: [[ZEXT_CMP:%.*]] = zext i1 [[CMP]] to i32			; CHECK-NEXT: [[ZEXT_CMP:%.*]] = zext i1 [[CMP]] to i32
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[ZEXT_CMP]], i32 0, i32 33)			; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i32(i32 [[ZEXT_CMP]], i32 0, i32 33)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = fcmp oeq fp128 %a, %b			%cmp = fcmp oeq fp128 %a, %b
	%zext.cmp = zext i1 %cmp to i32			%zext.cmp = zext i1 %cmp to i32
	%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_ne_0_zext_icmp_slt_i4(i4 %a, i4 %b) {			define i64 @fold_icmp_ne_0_zext_icmp_slt_i4(i4 %a, i4 %b) {
	; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_slt_i4(			; CHECK-LABEL: define i64 @fold_icmp_ne_0_zext_icmp_slt_i4
	; CHECK-NEXT: [[TMP1:%.*]] = sext i4 [[A:%.*]] to i16			; CHECK-SAME: (i4 [[A:%.*]], i4 [[B:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: [[TMP2:%.*]] = sext i4 [[B:%.*]] to i16			; CHECK-NEXT: [[TMP1:%.*]] = sext i4 [[A]] to i16
				; CHECK-NEXT: [[TMP2:%.*]] = sext i4 [[B]] to i16
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i16(i16 [[TMP1]], i16 [[TMP2]], i32 40)			; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i16(i16 [[TMP1]], i16 [[TMP2]], i32 40)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp slt i4 %a, %b			%cmp = icmp slt i4 %a, %b
	%zext.cmp = zext i1 %cmp to i32			%zext.cmp = zext i1 %cmp to i32
	%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_ne_0_zext_icmp_slt_i8(i8 %a, i8 %b) {			define i64 @fold_icmp_ne_0_zext_icmp_slt_i8(i8 %a, i8 %b) {
	; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_slt_i8(			; CHECK-LABEL: define i64 @fold_icmp_ne_0_zext_icmp_slt_i8
	; CHECK-NEXT: [[TMP1:%.*]] = sext i8 [[A:%.*]] to i16			; CHECK-SAME: (i8 [[A:%.*]], i8 [[B:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: [[TMP2:%.*]] = sext i8 [[B:%.*]] to i16			; CHECK-NEXT: [[TMP1:%.*]] = sext i8 [[A]] to i16
				; CHECK-NEXT: [[TMP2:%.*]] = sext i8 [[B]] to i16
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i16(i16 [[TMP1]], i16 [[TMP2]], i32 40)			; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i16(i16 [[TMP1]], i16 [[TMP2]], i32 40)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp slt i8 %a, %b			%cmp = icmp slt i8 %a, %b
	%zext.cmp = zext i1 %cmp to i32			%zext.cmp = zext i1 %cmp to i32
	%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_ne_0_zext_icmp_slt_i16(i16 %a, i16 %b) {			define i64 @fold_icmp_ne_0_zext_icmp_slt_i16(i16 %a, i16 %b) {
	; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_slt_i16(			; CHECK-LABEL: define i64 @fold_icmp_ne_0_zext_icmp_slt_i16
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i16(i16 [[A:%.*]], i16 [[B:%.*]], i32 40)			; CHECK-SAME: (i16 [[A:%.*]], i16 [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i16(i16 [[A]], i16 [[B]], i32 40)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp slt i16 %a, %b			%cmp = icmp slt i16 %a, %b
	%zext.cmp = zext i1 %cmp to i32			%zext.cmp = zext i1 %cmp to i32
	%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_ne_0_zext_icmp_ult_i4(i4 %a, i4 %b) {			define i64 @fold_icmp_ne_0_zext_icmp_ult_i4(i4 %a, i4 %b) {
	; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_ult_i4(			; CHECK-LABEL: define i64 @fold_icmp_ne_0_zext_icmp_ult_i4
	; CHECK-NEXT: [[TMP1:%.*]] = zext i4 [[A:%.*]] to i16			; CHECK-SAME: (i4 [[A:%.*]], i4 [[B:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: [[TMP2:%.*]] = zext i4 [[B:%.*]] to i16			; CHECK-NEXT: [[TMP1:%.*]] = zext i4 [[A]] to i16
				; CHECK-NEXT: [[TMP2:%.*]] = zext i4 [[B]] to i16
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i16(i16 [[TMP1]], i16 [[TMP2]], i32 36)			; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i16(i16 [[TMP1]], i16 [[TMP2]], i32 36)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp ult i4 %a, %b			%cmp = icmp ult i4 %a, %b
	%zext.cmp = zext i1 %cmp to i32			%zext.cmp = zext i1 %cmp to i32
	%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_ne_0_zext_icmp_ult_i8(i8 %a, i8 %b) {			define i64 @fold_icmp_ne_0_zext_icmp_ult_i8(i8 %a, i8 %b) {
	; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_ult_i8(			; CHECK-LABEL: define i64 @fold_icmp_ne_0_zext_icmp_ult_i8
	; CHECK-NEXT: [[TMP1:%.*]] = zext i8 [[A:%.*]] to i16			; CHECK-SAME: (i8 [[A:%.*]], i8 [[B:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: [[TMP2:%.*]] = zext i8 [[B:%.*]] to i16			; CHECK-NEXT: [[TMP1:%.*]] = zext i8 [[A]] to i16
				; CHECK-NEXT: [[TMP2:%.*]] = zext i8 [[B]] to i16
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i16(i16 [[TMP1]], i16 [[TMP2]], i32 36)			; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i16(i16 [[TMP1]], i16 [[TMP2]], i32 36)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp ult i8 %a, %b			%cmp = icmp ult i8 %a, %b
	%zext.cmp = zext i1 %cmp to i32			%zext.cmp = zext i1 %cmp to i32
	%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_ne_0_zext_icmp_ult_i16(i16 %a, i16 %b) {			define i64 @fold_icmp_ne_0_zext_icmp_ult_i16(i16 %a, i16 %b) {
	; CHECK-LABEL: @fold_icmp_ne_0_zext_icmp_ult_i16(			; CHECK-LABEL: define i64 @fold_icmp_ne_0_zext_icmp_ult_i16
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i16(i16 [[A:%.*]], i16 [[B:%.*]], i32 36)			; CHECK-SAME: (i16 [[A:%.*]], i16 [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i16(i16 [[A]], i16 [[B]], i32 36)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp ult i16 %a, %b			%cmp = icmp ult i16 %a, %b
	%zext.cmp = zext i1 %cmp to i32			%zext.cmp = zext i1 %cmp to i32
	%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i32(i32 %zext.cmp, i32 0, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	; 1-bit NE comparisons			; 1-bit NE comparisons

	define i64 @fold_icmp_i1_ne_0_icmp_eq_i1(i32 %a, i32 %b) {			define i64 @fold_icmp_i1_ne_0_icmp_eq_i1(i32 %a, i32 %b) {
	; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_eq_i1(			; CHECK-LABEL: define i64 @fold_icmp_i1_ne_0_icmp_eq_i1
	; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[A:%.*]], [[B:%.*]]			; CHECK-SAME: (i32 [[A:%.*]], i32 [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[A]], [[B]]
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)			; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp eq i32 %a, %b			%cmp = icmp eq i32 %a, %b
	%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_i1_ne_0_icmp_ne_i1(i32 %a, i32 %b) {			define i64 @fold_icmp_i1_ne_0_icmp_ne_i1(i32 %a, i32 %b) {
	; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_ne_i1(			; CHECK-LABEL: define i64 @fold_icmp_i1_ne_0_icmp_ne_i1
	; CHECK-NEXT: [[CMP:%.*]] = icmp ne i32 [[A:%.*]], [[B:%.*]]			; CHECK-SAME: (i32 [[A:%.*]], i32 [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[CMP:%.*]] = icmp ne i32 [[A]], [[B]]
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)			; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp ne i32 %a, %b			%cmp = icmp ne i32 %a, %b
	%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_i1_ne_0_icmp_sle_i1(i32 %a, i32 %b) {			define i64 @fold_icmp_i1_ne_0_icmp_sle_i1(i32 %a, i32 %b) {
	; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_sle_i1(			; CHECK-LABEL: define i64 @fold_icmp_i1_ne_0_icmp_sle_i1
	; CHECK-NEXT: [[CMP:%.*]] = icmp sle i32 [[A:%.*]], [[B:%.*]]			; CHECK-SAME: (i32 [[A:%.*]], i32 [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[CMP:%.*]] = icmp sle i32 [[A]], [[B]]
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)			; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp sle i32 %a, %b			%cmp = icmp sle i32 %a, %b
	%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_i1_ne_0_icmp_ugt_i64(i64 %a, i64 %b) {			define i64 @fold_icmp_i1_ne_0_icmp_ugt_i64(i64 %a, i64 %b) {
	; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_ugt_i64(			; CHECK-LABEL: define i64 @fold_icmp_i1_ne_0_icmp_ugt_i64
	; CHECK-NEXT: [[CMP:%.*]] = icmp ugt i64 [[A:%.*]], [[B:%.*]]			; CHECK-SAME: (i64 [[A:%.*]], i64 [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[CMP:%.*]] = icmp ugt i64 [[A]], [[B]]
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)			; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp ugt i64 %a, %b			%cmp = icmp ugt i64 %a, %b
	%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_i1_ne_0_icmp_ult_swap_i64(i64 %a, i64 %b) {			define i64 @fold_icmp_i1_ne_0_icmp_ult_swap_i64(i64 %a, i64 %b) {
	; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_ult_swap_i64(			; CHECK-LABEL: define i64 @fold_icmp_i1_ne_0_icmp_ult_swap_i64
	; CHECK-NEXT: [[CMP:%.*]] = icmp ugt i64 [[A:%.*]], [[B:%.*]]			; CHECK-SAME: (i64 [[A:%.*]], i64 [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[CMP:%.*]] = icmp ugt i64 [[A]], [[B]]
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)			; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp ugt i64 %a, %b			%cmp = icmp ugt i64 %a, %b
	%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 false, i1 %cmp, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 false, i1 %cmp, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_i1_ne_0_fcmp_oeq_f32(float %a, float %b) {			define i64 @fold_icmp_i1_ne_0_fcmp_oeq_f32(float %a, float %b) {
	; CHECK-LABEL: @fold_icmp_i1_ne_0_fcmp_oeq_f32(			; CHECK-LABEL: define i64 @fold_icmp_i1_ne_0_fcmp_oeq_f32
	; CHECK-NEXT: [[CMP:%.*]] = fcmp oeq float [[A:%.*]], [[B:%.*]]			; CHECK-SAME: (float [[A:%.*]], float [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[CMP:%.*]] = fcmp oeq float [[A]], [[B]]
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)			; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = fcmp oeq float %a, %b			%cmp = fcmp oeq float %a, %b
	%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_i1_ne_0_fcmp_une_f32(float %a, float %b) {			define i64 @fold_icmp_i1_ne_0_fcmp_une_f32(float %a, float %b) {
	; CHECK-LABEL: @fold_icmp_i1_ne_0_fcmp_une_f32(			; CHECK-LABEL: define i64 @fold_icmp_i1_ne_0_fcmp_une_f32
	; CHECK-NEXT: [[CMP:%.*]] = fcmp une float [[A:%.*]], [[B:%.*]]			; CHECK-SAME: (float [[A:%.*]], float [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[CMP:%.*]] = fcmp une float [[A]], [[B]]
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)			; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = fcmp une float %a, %b			%cmp = fcmp une float %a, %b
	%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_i1_ne_0_fcmp_olt_f64(double %a, double %b) {			define i64 @fold_icmp_i1_ne_0_fcmp_olt_f64(double %a, double %b) {
	; CHECK-LABEL: @fold_icmp_i1_ne_0_fcmp_olt_f64(			; CHECK-LABEL: define i64 @fold_icmp_i1_ne_0_fcmp_olt_f64
	; CHECK-NEXT: [[CMP:%.*]] = fcmp olt double [[A:%.*]], [[B:%.*]]			; CHECK-SAME: (double [[A:%.*]], double [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[CMP:%.*]] = fcmp olt double [[A]], [[B]]
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)			; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = fcmp olt double %a, %b			%cmp = fcmp olt double %a, %b
	%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_i1_ne_0_icmp_eq_i4(i4 %a, i4 %b) {			define i64 @fold_icmp_i1_ne_0_icmp_eq_i4(i4 %a, i4 %b) {
	; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_eq_i4(			; CHECK-LABEL: define i64 @fold_icmp_i1_ne_0_icmp_eq_i4
	; CHECK-NEXT: [[CMP:%.*]] = icmp eq i4 [[A:%.*]], [[B:%.*]]			; CHECK-SAME: (i4 [[A:%.*]], i4 [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[CMP:%.*]] = icmp eq i4 [[A]], [[B]]
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)			; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp eq i4 %a, %b			%cmp = icmp eq i4 %a, %b
	%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_i1_ne_0_icmp_eq_i8(i8 %a, i8 %b) {			define i64 @fold_icmp_i1_ne_0_icmp_eq_i8(i8 %a, i8 %b) {
	; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_eq_i8(			; CHECK-LABEL: define i64 @fold_icmp_i1_ne_0_icmp_eq_i8
	; CHECK-NEXT: [[CMP:%.*]] = icmp eq i8 [[A:%.*]], [[B:%.*]]			; CHECK-SAME: (i8 [[A:%.*]], i8 [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[CMP:%.*]] = icmp eq i8 [[A]], [[B]]
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)			; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp eq i8 %a, %b			%cmp = icmp eq i8 %a, %b
	%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_i1_ne_0_icmp_eq_i16(i16 %a, i16 %b) {			define i64 @fold_icmp_i1_ne_0_icmp_eq_i16(i16 %a, i16 %b) {
	; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_eq_i16(			; CHECK-LABEL: define i64 @fold_icmp_i1_ne_0_icmp_eq_i16
	; CHECK-NEXT: [[CMP:%.*]] = icmp eq i16 [[A:%.*]], [[B:%.*]]			; CHECK-SAME: (i16 [[A:%.*]], i16 [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[CMP:%.*]] = icmp eq i16 [[A]], [[B]]
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)			; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp eq i16 %a, %b			%cmp = icmp eq i16 %a, %b
	%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_i1_ne_0_icmp_eq_i36(i36 %a, i36 %b) {			define i64 @fold_icmp_i1_ne_0_icmp_eq_i36(i36 %a, i36 %b) {
	; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_eq_i36(			; CHECK-LABEL: define i64 @fold_icmp_i1_ne_0_icmp_eq_i36
	; CHECK-NEXT: [[CMP:%.*]] = icmp eq i36 [[A:%.*]], [[B:%.*]]			; CHECK-SAME: (i36 [[A:%.*]], i36 [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[CMP:%.*]] = icmp eq i36 [[A]], [[B]]
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)			; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp eq i36 %a, %b			%cmp = icmp eq i36 %a, %b
	%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_i1_ne_0_icmp_eq_i128(i128 %a, i128 %b) {			define i64 @fold_icmp_i1_ne_0_icmp_eq_i128(i128 %a, i128 %b) {
	; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_eq_i128(			; CHECK-LABEL: define i64 @fold_icmp_i1_ne_0_icmp_eq_i128
	; CHECK-NEXT: [[CMP:%.*]] = icmp eq i128 [[A:%.*]], [[B:%.*]]			; CHECK-SAME: (i128 [[A:%.*]], i128 [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[CMP:%.*]] = icmp eq i128 [[A]], [[B]]
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)			; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp eq i128 %a, %b			%cmp = icmp eq i128 %a, %b
	%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_i1_ne_0_fcmp_oeq_f16(half %a, half %b) {			define i64 @fold_icmp_i1_ne_0_fcmp_oeq_f16(half %a, half %b) {
	; CHECK-LABEL: @fold_icmp_i1_ne_0_fcmp_oeq_f16(			; CHECK-LABEL: define i64 @fold_icmp_i1_ne_0_fcmp_oeq_f16
	; CHECK-NEXT: [[CMP:%.*]] = fcmp oeq half [[A:%.*]], [[B:%.*]]			; CHECK-SAME: (half [[A:%.*]], half [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[CMP:%.*]] = fcmp oeq half [[A]], [[B]]
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)			; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = fcmp oeq half %a, %b			%cmp = fcmp oeq half %a, %b
	%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_i1_ne_0_fcmp_oeq_f128(fp128 %a, fp128 %b) {			define i64 @fold_icmp_i1_ne_0_fcmp_oeq_f128(fp128 %a, fp128 %b) {
	; CHECK-LABEL: @fold_icmp_i1_ne_0_fcmp_oeq_f128(			; CHECK-LABEL: define i64 @fold_icmp_i1_ne_0_fcmp_oeq_f128
	; CHECK-NEXT: [[CMP:%.*]] = fcmp oeq fp128 [[A:%.*]], [[B:%.*]]			; CHECK-SAME: (fp128 [[A:%.*]], fp128 [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[CMP:%.*]] = fcmp oeq fp128 [[A]], [[B]]
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)			; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = fcmp oeq fp128 %a, %b			%cmp = fcmp oeq fp128 %a, %b
	%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_i1_ne_0_icmp_slt_i4(i4 %a, i4 %b) {			define i64 @fold_icmp_i1_ne_0_icmp_slt_i4(i4 %a, i4 %b) {
	; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_slt_i4(			; CHECK-LABEL: define i64 @fold_icmp_i1_ne_0_icmp_slt_i4
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i4 [[A:%.*]], [[B:%.*]]			; CHECK-SAME: (i4 [[A:%.*]], i4 [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[CMP:%.*]] = icmp slt i4 [[A]], [[B]]
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)			; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp slt i4 %a, %b			%cmp = icmp slt i4 %a, %b
	%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_i1_ne_0_icmp_slt_i8(i8 %a, i8 %b) {			define i64 @fold_icmp_i1_ne_0_icmp_slt_i8(i8 %a, i8 %b) {
	; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_slt_i8(			; CHECK-LABEL: define i64 @fold_icmp_i1_ne_0_icmp_slt_i8
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i8 [[A:%.*]], [[B:%.*]]			; CHECK-SAME: (i8 [[A:%.*]], i8 [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[CMP:%.*]] = icmp slt i8 [[A]], [[B]]
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)			; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp slt i8 %a, %b			%cmp = icmp slt i8 %a, %b
	%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_i1_ne_0_icmp_slt_i16(i16 %a, i16 %b) {			define i64 @fold_icmp_i1_ne_0_icmp_slt_i16(i16 %a, i16 %b) {
	; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_slt_i16(			; CHECK-LABEL: define i64 @fold_icmp_i1_ne_0_icmp_slt_i16
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i16 [[A:%.*]], [[B:%.*]]			; CHECK-SAME: (i16 [[A:%.*]], i16 [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[CMP:%.*]] = icmp slt i16 [[A]], [[B]]
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)			; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp slt i16 %a, %b			%cmp = icmp slt i16 %a, %b
	%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_i1_ne_0_icmp_ult_i4(i4 %a, i4 %b) {			define i64 @fold_icmp_i1_ne_0_icmp_ult_i4(i4 %a, i4 %b) {
	; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_ult_i4(			; CHECK-LABEL: define i64 @fold_icmp_i1_ne_0_icmp_ult_i4
	; CHECK-NEXT: [[CMP:%.*]] = icmp ult i4 [[A:%.*]], [[B:%.*]]			; CHECK-SAME: (i4 [[A:%.*]], i4 [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[CMP:%.*]] = icmp ult i4 [[A]], [[B]]
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)			; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp ult i4 %a, %b			%cmp = icmp ult i4 %a, %b
	%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_i1_ne_0_icmp_ult_i8(i8 %a, i8 %b) {			define i64 @fold_icmp_i1_ne_0_icmp_ult_i8(i8 %a, i8 %b) {
	; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_ult_i8(			; CHECK-LABEL: define i64 @fold_icmp_i1_ne_0_icmp_ult_i8
	; CHECK-NEXT: [[CMP:%.*]] = icmp ult i8 [[A:%.*]], [[B:%.*]]			; CHECK-SAME: (i8 [[A:%.*]], i8 [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[CMP:%.*]] = icmp ult i8 [[A]], [[B]]
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)			; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp ult i8 %a, %b			%cmp = icmp ult i8 %a, %b
	%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	define i64 @fold_icmp_i1_ne_0_icmp_ult_i16(i16 %a, i16 %b) {			define i64 @fold_icmp_i1_ne_0_icmp_ult_i16(i16 %a, i16 %b) {
	; CHECK-LABEL: @fold_icmp_i1_ne_0_icmp_ult_i16(			; CHECK-LABEL: define i64 @fold_icmp_i1_ne_0_icmp_ult_i16
	; CHECK-NEXT: [[CMP:%.*]] = icmp ult i16 [[A:%.*]], [[B:%.*]]			; CHECK-SAME: (i16 [[A:%.*]], i16 [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[CMP:%.*]] = icmp ult i16 [[A]], [[B]]
	; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)			; CHECK-NEXT: [[MASK:%.*]] = call i64 @llvm.amdgcn.icmp.i64.i1(i1 [[CMP]], i1 false, i32 33)
	; CHECK-NEXT: ret i64 [[MASK]]			; CHECK-NEXT: ret i64 [[MASK]]
	;			;
	%cmp = icmp ult i16 %a, %b			%cmp = icmp ult i16 %a, %b
	%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)			%mask = call i64 @llvm.amdgcn.icmp.i64.i1(i1 %cmp, i1 false, i32 33)
	ret i64 %mask			ret i64 %mask
	}			}

	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	; llvm.amdgcn.fcmp			; llvm.amdgcn.fcmp
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	declare i64 @llvm.amdgcn.fcmp.i64.f32(float, float, i32 immarg) nounwind readnone convergent			declare i64 @llvm.amdgcn.fcmp.i64.f32(float, float, i32 immarg) nounwind readnone convergent

	define i64 @invalid_fcmp_code(float %a, float %b) {			define i64 @invalid_fcmp_code(float %a, float %b) {
	; CHECK-LABEL: @invalid_fcmp_code(			; CHECK-LABEL: define i64 @invalid_fcmp_code
	; CHECK-NEXT: [[UNDER:%.*]] = call i64 @llvm.amdgcn.fcmp.i64.f32(float [[A:%.*]], float [[B:%.*]], i32 -1)			; CHECK-SAME: (float [[A:%.*]], float [[B:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[UNDER:%.*]] = call i64 @llvm.amdgcn.fcmp.i64.f32(float [[A]], float [[B]], i32 -1)
	; CHECK-NEXT: [[OVER:%.*]] = call i64 @llvm.amdgcn.fcmp.i64.f32(float [[A]], float [[B]], i32 16)			; CHECK-NEXT: [[OVER:%.*]] = call i64 @llvm.amdgcn.fcmp.i64.f32(float [[A]], float [[B]], i32 16)
	; CHECK-NEXT: [[OR:%.*]] = or i64 [[UNDER]], [[OVER]]			; CHECK-NEXT: [[OR:%.*]] = or i64 [[UNDER]], [[OVER]]
	; CHECK-NEXT: ret i64 [[OR]]			; CHECK-NEXT: ret i64 [[OR]]
	;			;
	%under = call i64 @llvm.amdgcn.fcmp.i64.f32(float %a, float %b, i32 -1)			%under = call i64 @llvm.amdgcn.fcmp.i64.f32(float %a, float %b, i32 -1)
	%over = call i64 @llvm.amdgcn.fcmp.i64.f32(float %a, float %b, i32 16)			%over = call i64 @llvm.amdgcn.fcmp.i64.f32(float %a, float %b, i32 16)
	%or = or i64 %under, %over			%or = or i64 %under, %over
	ret i64 %or			ret i64 %or
	}			}

	define i64 @fcmp_constant_inputs_false() {			define i64 @fcmp_constant_inputs_false() {
	; CHECK-LABEL: @fcmp_constant_inputs_false(			; CHECK-LABEL: define i64 @fcmp_constant_inputs_false
				; CHECK-SAME: () #[[ATTR3]] {
	; CHECK-NEXT: ret i64 0			; CHECK-NEXT: ret i64 0
	;			;
	%result = call i64 @llvm.amdgcn.fcmp.i64.f32(float 2.0, float 4.0, i32 1)			%result = call i64 @llvm.amdgcn.fcmp.i64.f32(float 2.0, float 4.0, i32 1)
	ret i64 %result			ret i64 %result
	}			}

	define i64 @fcmp_constant_inputs_true() {			define i64 @fcmp_constant_inputs_true() {
	; CHECK-LABEL: @fcmp_constant_inputs_true(			; CHECK-LABEL: define i64 @fcmp_constant_inputs_true
	; CHECK-NEXT: [[RESULT:%.*]] = call i64 @llvm.read_register.i64(metadata [[META0]]) #[[ATTR17]]			; CHECK-SAME: () #[[ATTR3]] {
				; CHECK-NEXT: [[RESULT:%.*]] = call i64 @llvm.read_register.i64(metadata [[META0]]) #[[ATTR16]]
	; CHECK-NEXT: ret i64 [[RESULT]]			; CHECK-NEXT: ret i64 [[RESULT]]
	;			;
	%result = call i64 @llvm.amdgcn.fcmp.i64.f32(float 2.0, float 4.0, i32 4)			%result = call i64 @llvm.amdgcn.fcmp.i64.f32(float 2.0, float 4.0, i32 4)
	ret i64 %result			ret i64 %result
	}			}

	define i64 @fcmp_constant_to_rhs_olt(float %x) {			define i64 @fcmp_constant_to_rhs_olt(float %x) {
	; CHECK-LABEL: @fcmp_constant_to_rhs_olt(			; CHECK-LABEL: define i64 @fcmp_constant_to_rhs_olt
	; CHECK-NEXT: [[RESULT:%.*]] = call i64 @llvm.amdgcn.fcmp.i64.f32(float [[X:%.*]], float 4.000000e+00, i32 2)			; CHECK-SAME: (float [[X:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[RESULT:%.*]] = call i64 @llvm.amdgcn.fcmp.i64.f32(float [[X]], float 4.000000e+00, i32 2)
	; CHECK-NEXT: ret i64 [[RESULT]]			; CHECK-NEXT: ret i64 [[RESULT]]
	;			;
	%result = call i64 @llvm.amdgcn.fcmp.i64.f32(float 4.0, float %x, i32 4)			%result = call i64 @llvm.amdgcn.fcmp.i64.f32(float 4.0, float %x, i32 4)
	ret i64 %result			ret i64 %result
	}			}

	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	; llvm.amdgcn.ballot			; llvm.amdgcn.ballot
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	declare i64 @llvm.amdgcn.ballot.i64(i1) nounwind readnone convergent			declare i64 @llvm.amdgcn.ballot.i64(i1) nounwind readnone convergent
	declare i32 @llvm.amdgcn.ballot.i32(i1) nounwind readnone convergent			declare i32 @llvm.amdgcn.ballot.i32(i1) nounwind readnone convergent

	define i64 @ballot_nocombine_64(i1 %i) {			define i64 @ballot_nocombine_64(i1 %i) {
	; CHECK-LABEL: @ballot_nocombine_64(			; CHECK-LABEL: define i64 @ballot_nocombine_64
	; CHECK-NEXT: [[B:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 [[I:%.*]])			; CHECK-SAME: (i1 [[I:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[B:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 [[I]])
	; CHECK-NEXT: ret i64 [[B]]			; CHECK-NEXT: ret i64 [[B]]
	;			;
	%b = call i64 @llvm.amdgcn.ballot.i64(i1 %i)			%b = call i64 @llvm.amdgcn.ballot.i64(i1 %i)
	ret i64 %b			ret i64 %b
	}			}

	define i64 @ballot_zero_64() {			define i64 @ballot_zero_64() {
	; CHECK-LABEL: @ballot_zero_64(			; CHECK-LABEL: define i64 @ballot_zero_64
				; CHECK-SAME: () #[[ATTR3]] {
	; CHECK-NEXT: ret i64 0			; CHECK-NEXT: ret i64 0
	;			;
	%b = call i64 @llvm.amdgcn.ballot.i64(i1 0)			%b = call i64 @llvm.amdgcn.ballot.i64(i1 0)
	ret i64 %b			ret i64 %b
	}			}

	define i64 @ballot_one_64() {			define i64 @ballot_one_64() {
	; CHECK-LABEL: @ballot_one_64(			; CHECK-LABEL: define i64 @ballot_one_64
	; CHECK-NEXT: [[B:%.*]] = call i64 @llvm.read_register.i64(metadata [[META0]]) #[[ATTR17]]			; CHECK-SAME: () #[[ATTR3]] {
				; CHECK-NEXT: [[B:%.*]] = call i64 @llvm.read_register.i64(metadata [[META0]]) #[[ATTR16]]
	; CHECK-NEXT: ret i64 [[B]]			; CHECK-NEXT: ret i64 [[B]]
	;			;
	%b = call i64 @llvm.amdgcn.ballot.i64(i1 1)			%b = call i64 @llvm.amdgcn.ballot.i64(i1 1)
	ret i64 %b			ret i64 %b
	}			}

	define i32 @ballot_nocombine_32(i1 %i) {			define i32 @ballot_nocombine_32(i1 %i) {
	; CHECK-LABEL: @ballot_nocombine_32(			; CHECK-LABEL: define i32 @ballot_nocombine_32
	; CHECK-NEXT: [[B:%.*]] = call i32 @llvm.amdgcn.ballot.i32(i1 [[I:%.*]])			; CHECK-SAME: (i1 [[I:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[B:%.*]] = call i32 @llvm.amdgcn.ballot.i32(i1 [[I]])
	; CHECK-NEXT: ret i32 [[B]]			; CHECK-NEXT: ret i32 [[B]]
	;			;
	%b = call i32 @llvm.amdgcn.ballot.i32(i1 %i)			%b = call i32 @llvm.amdgcn.ballot.i32(i1 %i)
	ret i32 %b			ret i32 %b
	}			}

	define i32 @ballot_zero_32() {			define i32 @ballot_zero_32() {
	; CHECK-LABEL: @ballot_zero_32(			; CHECK-LABEL: define i32 @ballot_zero_32
				; CHECK-SAME: () #[[ATTR3]] {
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	%b = call i32 @llvm.amdgcn.ballot.i32(i1 0)			%b = call i32 @llvm.amdgcn.ballot.i32(i1 0)
	ret i32 %b			ret i32 %b
	}			}

	define i32 @ballot_one_32() {			define i32 @ballot_one_32() {
	; CHECK-LABEL: @ballot_one_32(			; CHECK-LABEL: define i32 @ballot_one_32
	; CHECK-NEXT: [[B:%.*]] = call i32 @llvm.read_register.i32(metadata [[META1:![0-9]+]]) #[[ATTR17]]			; CHECK-SAME: () #[[ATTR3]] {
				; CHECK-NEXT: [[B:%.*]] = call i32 @llvm.read_register.i32(metadata [[META1:![0-9]+]]) #[[ATTR16]]
	; CHECK-NEXT: ret i32 [[B]]			; CHECK-NEXT: ret i32 [[B]]
	;			;
	%b = call i32 @llvm.amdgcn.ballot.i32(i1 1)			%b = call i32 @llvm.amdgcn.ballot.i32(i1 1)
	ret i32 %b			ret i32 %b
	}			}

	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	; llvm.amdgcn.wqm.vote			; llvm.amdgcn.wqm.vote
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	declare i1 @llvm.amdgcn.wqm.vote(i1)			declare i1 @llvm.amdgcn.wqm.vote(i1)

	define float @wqm_vote_true() {			define float @wqm_vote_true() {
	; CHECK-LABEL: @wqm_vote_true(			; CHECK-LABEL: define float @wqm_vote_true
				; CHECK-SAME: () #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: ret float 1.000000e+00			; CHECK-NEXT: ret float 1.000000e+00
	;			;
	main_body:			main_body:
	%w = call i1 @llvm.amdgcn.wqm.vote(i1 true)			%w = call i1 @llvm.amdgcn.wqm.vote(i1 true)
	%r = select i1 %w, float 1.0, float 0.0			%r = select i1 %w, float 1.0, float 0.0
	ret float %r			ret float %r
	}			}

	define float @wqm_vote_false() {			define float @wqm_vote_false() {
	; CHECK-LABEL: @wqm_vote_false(			; CHECK-LABEL: define float @wqm_vote_false
				; CHECK-SAME: () #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: ret float 0.000000e+00			; CHECK-NEXT: ret float 0.000000e+00
	;			;
	main_body:			main_body:
	%w = call i1 @llvm.amdgcn.wqm.vote(i1 false)			%w = call i1 @llvm.amdgcn.wqm.vote(i1 false)
	%r = select i1 %w, float 1.0, float 0.0			%r = select i1 %w, float 1.0, float 0.0
	ret float %r			ret float %r
	}			}

	define float @wqm_vote_undef() {			define float @wqm_vote_undef() {
	; CHECK-LABEL: @wqm_vote_undef(			; CHECK-LABEL: define float @wqm_vote_undef
				; CHECK-SAME: () #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: ret float 0.000000e+00			; CHECK-NEXT: ret float 0.000000e+00
	;			;
	main_body:			main_body:
	%w = call i1 @llvm.amdgcn.wqm.vote(i1 undef)			%w = call i1 @llvm.amdgcn.wqm.vote(i1 undef)
	%r = select i1 %w, float 1.0, float 0.0			%r = select i1 %w, float 1.0, float 0.0
	ret float %r			ret float %r
	}			}

	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	; llvm.amdgcn.kill			; llvm.amdgcn.kill
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	declare void @llvm.amdgcn.kill(i1)			declare void @llvm.amdgcn.kill(i1)

	define void @kill_true() {			define void @kill_true() {
	; CHECK-LABEL: @kill_true(			; CHECK-LABEL: define void @kill_true
				; CHECK-SAME: () #[[ATTR3]] {
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	call void @llvm.amdgcn.kill(i1 true)			call void @llvm.amdgcn.kill(i1 true)
	ret void			ret void
	}			}

	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	; llvm.amdgcn.readfirstlane			; llvm.amdgcn.readfirstlane
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	declare i32 @llvm.amdgcn.readfirstlane(i32)			declare i32 @llvm.amdgcn.readfirstlane(i32)

	@gv = constant i32 0			@gv = constant i32 0

	define amdgpu_kernel void @readfirstlane_constant(i32 %arg) {			define amdgpu_kernel void @readfirstlane_constant(i32 %arg) {
	; CHECK-LABEL: @readfirstlane_constant(			; CHECK-LABEL: define amdgpu_kernel void @readfirstlane_constant
	; CHECK-NEXT: [[VAR:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[ARG:%.*]])			; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[VAR:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[ARG]])
	; CHECK-NEXT: store volatile i32 [[VAR]], ptr undef, align 4			; CHECK-NEXT: store volatile i32 [[VAR]], ptr undef, align 4
	; CHECK-NEXT: store volatile i32 0, ptr undef, align 4			; CHECK-NEXT: store volatile i32 0, ptr undef, align 4
	; CHECK-NEXT: store volatile i32 123, ptr undef, align 4			; CHECK-NEXT: store volatile i32 123, ptr undef, align 4
	; CHECK-NEXT: store volatile i32 ptrtoint (ptr @gv to i32), ptr undef, align 4			; CHECK-NEXT: store volatile i32 ptrtoint (ptr @gv to i32), ptr undef, align 4
	; CHECK-NEXT: store volatile i32 undef, ptr undef, align 4			; CHECK-NEXT: store volatile i32 undef, ptr undef, align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%var = call i32 @llvm.amdgcn.readfirstlane(i32 %arg)			%var = call i32 @llvm.amdgcn.readfirstlane(i32 %arg)
	%zero = call i32 @llvm.amdgcn.readfirstlane(i32 0)			%zero = call i32 @llvm.amdgcn.readfirstlane(i32 0)
	%imm = call i32 @llvm.amdgcn.readfirstlane(i32 123)			%imm = call i32 @llvm.amdgcn.readfirstlane(i32 123)
	%constexpr = call i32 @llvm.amdgcn.readfirstlane(i32 ptrtoint (ptr @gv to i32))			%constexpr = call i32 @llvm.amdgcn.readfirstlane(i32 ptrtoint (ptr @gv to i32))
	%undef = call i32 @llvm.amdgcn.readfirstlane(i32 undef)			%undef = call i32 @llvm.amdgcn.readfirstlane(i32 undef)
	store volatile i32 %var, ptr undef			store volatile i32 %var, ptr undef
	store volatile i32 %zero, ptr undef			store volatile i32 %zero, ptr undef
	store volatile i32 %imm, ptr undef			store volatile i32 %imm, ptr undef
	store volatile i32 %constexpr, ptr undef			store volatile i32 %constexpr, ptr undef
	store volatile i32 %undef, ptr undef			store volatile i32 %undef, ptr undef
	ret void			ret void
	}			}

	define i32 @readfirstlane_idempotent(i32 %arg) {			define i32 @readfirstlane_idempotent(i32 %arg) {
	; CHECK-LABEL: @readfirstlane_idempotent(			; CHECK-LABEL: define i32 @readfirstlane_idempotent
	; CHECK-NEXT: [[READ0:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[ARG:%.*]])			; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[READ0:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[ARG]])
	; CHECK-NEXT: ret i32 [[READ0]]			; CHECK-NEXT: ret i32 [[READ0]]
	;			;
	%read0 = call i32 @llvm.amdgcn.readfirstlane(i32 %arg)			%read0 = call i32 @llvm.amdgcn.readfirstlane(i32 %arg)
	%read1 = call i32 @llvm.amdgcn.readfirstlane(i32 %read0)			%read1 = call i32 @llvm.amdgcn.readfirstlane(i32 %read0)
	%read2 = call i32 @llvm.amdgcn.readfirstlane(i32 %read1)			%read2 = call i32 @llvm.amdgcn.readfirstlane(i32 %read1)
	ret i32 %read2			ret i32 %read2
	}			}

	define i32 @readfirstlane_readlane(i32 %arg) {			define i32 @readfirstlane_readlane(i32 %arg) {
	; CHECK-LABEL: @readfirstlane_readlane(			; CHECK-LABEL: define i32 @readfirstlane_readlane
	; CHECK-NEXT: [[READ0:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[ARG:%.*]])			; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[READ0:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[ARG]])
	; CHECK-NEXT: ret i32 [[READ0]]			; CHECK-NEXT: ret i32 [[READ0]]
	;			;
	%read0 = call i32 @llvm.amdgcn.readfirstlane(i32 %arg)			%read0 = call i32 @llvm.amdgcn.readfirstlane(i32 %arg)
	%read1 = call i32 @llvm.amdgcn.readlane(i32 %read0, i32 0)			%read1 = call i32 @llvm.amdgcn.readlane(i32 %read0, i32 0)
	ret i32 %read1			ret i32 %read1
	}			}

	define i32 @readfirstlane_readfirstlane_different_block(i32 %arg) {			define i32 @readfirstlane_readfirstlane_different_block(i32 %arg) {
	; CHECK-LABEL: @readfirstlane_readfirstlane_different_block(			; CHECK-LABEL: define i32 @readfirstlane_readfirstlane_different_block
				; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: bb0:			; CHECK-NEXT: bb0:
	; CHECK-NEXT: [[READ0:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[ARG:%.*]])			; CHECK-NEXT: [[READ0:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[ARG]])
	; CHECK-NEXT: br label [[BB1:%.*]]			; CHECK-NEXT: br label [[BB1:%.*]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: [[READ1:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[READ0]])			; CHECK-NEXT: [[READ1:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[READ0]])
	; CHECK-NEXT: ret i32 [[READ1]]			; CHECK-NEXT: ret i32 [[READ1]]
	;			;
	bb0:			bb0:
	%read0 = call i32 @llvm.amdgcn.readfirstlane(i32 %arg)			%read0 = call i32 @llvm.amdgcn.readfirstlane(i32 %arg)
	br label %bb1			br label %bb1

	bb1:			bb1:
	%read1 = call i32 @llvm.amdgcn.readfirstlane(i32 %read0)			%read1 = call i32 @llvm.amdgcn.readfirstlane(i32 %read0)
	ret i32 %read1			ret i32 %read1
	}			}

	define i32 @readfirstlane_readlane_different_block(i32 %arg) {			define i32 @readfirstlane_readlane_different_block(i32 %arg) {
	; CHECK-LABEL: @readfirstlane_readlane_different_block(			; CHECK-LABEL: define i32 @readfirstlane_readlane_different_block
				; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: bb0:			; CHECK-NEXT: bb0:
	; CHECK-NEXT: [[READ0:%.*]] = call i32 @llvm.amdgcn.readlane(i32 [[ARG:%.*]], i32 0)			; CHECK-NEXT: [[READ0:%.*]] = call i32 @llvm.amdgcn.readlane.i32(i32 [[ARG]], i32 0)
	; CHECK-NEXT: br label [[BB1:%.*]]			; CHECK-NEXT: br label [[BB1:%.*]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: [[READ1:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[READ0]])			; CHECK-NEXT: [[READ1:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[READ0]])
	; CHECK-NEXT: ret i32 [[READ1]]			; CHECK-NEXT: ret i32 [[READ1]]
	;			;
	bb0:			bb0:
	%read0 = call i32 @llvm.amdgcn.readlane(i32 %arg, i32 0)			%read0 = call i32 @llvm.amdgcn.readlane(i32 %arg, i32 0)
	br label %bb1			br label %bb1

	bb1:			bb1:
	%read1 = call i32 @llvm.amdgcn.readfirstlane(i32 %read0)			%read1 = call i32 @llvm.amdgcn.readfirstlane(i32 %read0)
	ret i32 %read1			ret i32 %read1
	}			}

				define i32 @readfirstlane_bitcast(float %arg) {
				; CHECK-LABEL: define i32 @readfirstlane_bitcast
				; CHECK-SAME: (float [[ARG:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[TMP1:%.*]] = call float @llvm.amdgcn.readfirstlane.f32(float [[ARG]])
				; CHECK-NEXT: [[READ:%.*]] = bitcast float [[TMP1]] to i32
				; CHECK-NEXT: ret i32 [[READ]]
				;
				%bitcast.arg = bitcast float %arg to i32
				%read = call i32 @llvm.amdgcn.readfirstlane(i32 %bitcast.arg)
				ret i32 %read
				}

				define float @bitcast_readfirstlane_bitcast(float %arg) {
				; CHECK-LABEL: define float @bitcast_readfirstlane_bitcast
				; CHECK-SAME: (float [[ARG:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[TMP1:%.*]] = call float @llvm.amdgcn.readfirstlane.f32(float [[ARG]])
				; CHECK-NEXT: ret float [[TMP1]]
				;
				%bitcast.arg = bitcast float %arg to i32
				%read = call i32 @llvm.amdgcn.readfirstlane(i32 %bitcast.arg)
				%cast.read = bitcast i32 %read to float
				ret float %cast.read
				}

				define i32 @readfirstlane_bitcast_multi_use(float %arg) {
				; CHECK-LABEL: define i32 @readfirstlane_bitcast_multi_use
				; CHECK-SAME: (float [[ARG:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: store float [[ARG]], ptr undef, align 4
				; CHECK-NEXT: [[TMP1:%.*]] = call float @llvm.amdgcn.readfirstlane.f32(float [[ARG]])
				; CHECK-NEXT: [[READ:%.*]] = bitcast float [[TMP1]] to i32
				; CHECK-NEXT: ret i32 [[READ]]
				;
				%bitcast.arg = bitcast float %arg to i32
				store i32 %bitcast.arg, ptr undef
				%read = call i32 @llvm.amdgcn.readfirstlane(i32 %bitcast.arg)
				ret i32 %read
				}

	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	; llvm.amdgcn.readlane			; llvm.amdgcn.readlane
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	declare i32 @llvm.amdgcn.readlane(i32, i32)			declare i32 @llvm.amdgcn.readlane(i32, i32)

	define amdgpu_kernel void @readlane_constant(i32 %arg, i32 %lane) {			define amdgpu_kernel void @readlane_constant(i32 %arg, i32 %lane) {
	; CHECK-LABEL: @readlane_constant(			; CHECK-LABEL: define amdgpu_kernel void @readlane_constant
	; CHECK-NEXT: [[VAR:%.*]] = call i32 @llvm.amdgcn.readlane(i32 [[ARG:%.*]], i32 7)			; CHECK-SAME: (i32 [[ARG:%.*]], i32 [[LANE:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[VAR:%.*]] = call i32 @llvm.amdgcn.readlane.i32(i32 [[ARG]], i32 7)
	; CHECK-NEXT: store volatile i32 [[VAR]], ptr undef, align 4			; CHECK-NEXT: store volatile i32 [[VAR]], ptr undef, align 4
	; CHECK-NEXT: store volatile i32 0, ptr undef, align 4			; CHECK-NEXT: store volatile i32 0, ptr undef, align 4
	; CHECK-NEXT: store volatile i32 123, ptr undef, align 4			; CHECK-NEXT: store volatile i32 123, ptr undef, align 4
	; CHECK-NEXT: store volatile i32 ptrtoint (ptr @gv to i32), ptr undef, align 4			; CHECK-NEXT: store volatile i32 ptrtoint (ptr @gv to i32), ptr undef, align 4
	; CHECK-NEXT: store volatile i32 undef, ptr undef, align 4			; CHECK-NEXT: store volatile i32 undef, ptr undef, align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%var = call i32 @llvm.amdgcn.readlane(i32 %arg, i32 7)			%var = call i32 @llvm.amdgcn.readlane(i32 %arg, i32 7)
	%zero = call i32 @llvm.amdgcn.readlane(i32 0, i32 %lane)			%zero = call i32 @llvm.amdgcn.readlane(i32 0, i32 %lane)
	%imm = call i32 @llvm.amdgcn.readlane(i32 123, i32 %lane)			%imm = call i32 @llvm.amdgcn.readlane(i32 123, i32 %lane)
	%constexpr = call i32 @llvm.amdgcn.readlane(i32 ptrtoint (ptr @gv to i32), i32 %lane)			%constexpr = call i32 @llvm.amdgcn.readlane(i32 ptrtoint (ptr @gv to i32), i32 %lane)
	%undef = call i32 @llvm.amdgcn.readlane(i32 undef, i32 %lane)			%undef = call i32 @llvm.amdgcn.readlane(i32 undef, i32 %lane)
	store volatile i32 %var, ptr undef			store volatile i32 %var, ptr undef
	store volatile i32 %zero, ptr undef			store volatile i32 %zero, ptr undef
	store volatile i32 %imm, ptr undef			store volatile i32 %imm, ptr undef
	store volatile i32 %constexpr, ptr undef			store volatile i32 %constexpr, ptr undef
	store volatile i32 %undef, ptr undef			store volatile i32 %undef, ptr undef
	ret void			ret void
	}			}

	define i32 @readlane_idempotent(i32 %arg, i32 %lane) {			define i32 @readlane_idempotent(i32 %arg, i32 %lane) {
	; CHECK-LABEL: @readlane_idempotent(			; CHECK-LABEL: define i32 @readlane_idempotent
	; CHECK-NEXT: [[READ0:%.*]] = call i32 @llvm.amdgcn.readlane(i32 [[ARG:%.*]], i32 [[LANE:%.*]])			; CHECK-SAME: (i32 [[ARG:%.*]], i32 [[LANE:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[READ0:%.*]] = call i32 @llvm.amdgcn.readlane.i32(i32 [[ARG]], i32 [[LANE]])
	; CHECK-NEXT: ret i32 [[READ0]]			; CHECK-NEXT: ret i32 [[READ0]]
	;			;
	%read0 = call i32 @llvm.amdgcn.readlane(i32 %arg, i32 %lane)			%read0 = call i32 @llvm.amdgcn.readlane(i32 %arg, i32 %lane)
	%read1 = call i32 @llvm.amdgcn.readlane(i32 %read0, i32 %lane)			%read1 = call i32 @llvm.amdgcn.readlane(i32 %read0, i32 %lane)
	ret i32 %read1			ret i32 %read1
	}			}

	define i32 @readlane_idempotent_different_lanes(i32 %arg, i32 %lane0, i32 %lane1) {			define i32 @readlane_idempotent_different_lanes(i32 %arg, i32 %lane0, i32 %lane1) {
	; CHECK-LABEL: @readlane_idempotent_different_lanes(			; CHECK-LABEL: define i32 @readlane_idempotent_different_lanes
	; CHECK-NEXT: [[READ0:%.*]] = call i32 @llvm.amdgcn.readlane(i32 [[ARG:%.*]], i32 [[LANE0:%.*]])			; CHECK-SAME: (i32 [[ARG:%.*]], i32 [[LANE0:%.*]], i32 [[LANE1:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: [[READ1:%.*]] = call i32 @llvm.amdgcn.readlane(i32 [[READ0]], i32 [[LANE1:%.*]])			; CHECK-NEXT: [[READ0:%.*]] = call i32 @llvm.amdgcn.readlane.i32(i32 [[ARG]], i32 [[LANE0]])
				; CHECK-NEXT: [[READ1:%.*]] = call i32 @llvm.amdgcn.readlane.i32(i32 [[READ0]], i32 [[LANE1]])
	; CHECK-NEXT: ret i32 [[READ1]]			; CHECK-NEXT: ret i32 [[READ1]]
	;			;
	%read0 = call i32 @llvm.amdgcn.readlane(i32 %arg, i32 %lane0)			%read0 = call i32 @llvm.amdgcn.readlane(i32 %arg, i32 %lane0)
	%read1 = call i32 @llvm.amdgcn.readlane(i32 %read0, i32 %lane1)			%read1 = call i32 @llvm.amdgcn.readlane(i32 %read0, i32 %lane1)
	ret i32 %read1			ret i32 %read1
	}			}

	define i32 @readlane_readfirstlane(i32 %arg) {			define i32 @readlane_readfirstlane(i32 %arg) {
	; CHECK-LABEL: @readlane_readfirstlane(			; CHECK-LABEL: define i32 @readlane_readfirstlane
	; CHECK-NEXT: [[READ0:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[ARG:%.*]])			; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[READ0:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[ARG]])
	; CHECK-NEXT: ret i32 [[READ0]]			; CHECK-NEXT: ret i32 [[READ0]]
	;			;
	%read0 = call i32 @llvm.amdgcn.readfirstlane(i32 %arg)			%read0 = call i32 @llvm.amdgcn.readfirstlane(i32 %arg)
	%read1 = call i32 @llvm.amdgcn.readlane(i32 %read0, i32 0)			%read1 = call i32 @llvm.amdgcn.readlane(i32 %read0, i32 0)
	ret i32 %read1			ret i32 %read1
	}			}

	define i32 @readlane_idempotent_different_block(i32 %arg, i32 %lane) {			define i32 @readlane_idempotent_different_block(i32 %arg, i32 %lane) {
	; CHECK-LABEL: @readlane_idempotent_different_block(			; CHECK-LABEL: define i32 @readlane_idempotent_different_block
				; CHECK-SAME: (i32 [[ARG:%.*]], i32 [[LANE:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: bb0:			; CHECK-NEXT: bb0:
	; CHECK-NEXT: [[READ0:%.*]] = call i32 @llvm.amdgcn.readlane(i32 [[ARG:%.*]], i32 [[LANE:%.*]])			; CHECK-NEXT: [[READ0:%.*]] = call i32 @llvm.amdgcn.readlane.i32(i32 [[ARG]], i32 [[LANE]])
	; CHECK-NEXT: br label [[BB1:%.*]]			; CHECK-NEXT: br label [[BB1:%.*]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: [[READ1:%.*]] = call i32 @llvm.amdgcn.readlane(i32 [[READ0]], i32 [[LANE]])			; CHECK-NEXT: [[READ1:%.*]] = call i32 @llvm.amdgcn.readlane.i32(i32 [[READ0]], i32 [[LANE]])
	; CHECK-NEXT: ret i32 [[READ1]]			; CHECK-NEXT: ret i32 [[READ1]]
	;			;
	bb0:			bb0:
	%read0 = call i32 @llvm.amdgcn.readlane(i32 %arg, i32 %lane)			%read0 = call i32 @llvm.amdgcn.readlane(i32 %arg, i32 %lane)
	br label %bb1			br label %bb1

	bb1:			bb1:
	%read1 = call i32 @llvm.amdgcn.readlane(i32 %read0, i32 %lane)			%read1 = call i32 @llvm.amdgcn.readlane(i32 %read0, i32 %lane)
	ret i32 %read1			ret i32 %read1
	}			}


	define i32 @readlane_readfirstlane_different_block(i32 %arg) {			define i32 @readlane_readfirstlane_different_block(i32 %arg) {
	; CHECK-LABEL: @readlane_readfirstlane_different_block(			; CHECK-LABEL: define i32 @readlane_readfirstlane_different_block
				; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: bb0:			; CHECK-NEXT: bb0:
	; CHECK-NEXT: [[READ0:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[ARG:%.*]])			; CHECK-NEXT: [[READ0:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[ARG]])
	; CHECK-NEXT: br label [[BB1:%.*]]			; CHECK-NEXT: br label [[BB1:%.*]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: [[READ1:%.*]] = call i32 @llvm.amdgcn.readlane(i32 [[READ0]], i32 0)			; CHECK-NEXT: [[READ1:%.*]] = call i32 @llvm.amdgcn.readlane.i32(i32 [[READ0]], i32 0)
	; CHECK-NEXT: ret i32 [[READ1]]			; CHECK-NEXT: ret i32 [[READ1]]
	;			;
	bb0:			bb0:
	%read0 = call i32 @llvm.amdgcn.readfirstlane(i32 %arg)			%read0 = call i32 @llvm.amdgcn.readfirstlane(i32 %arg)
	br label %bb1			br label %bb1

	bb1:			bb1:
	%read1 = call i32 @llvm.amdgcn.readlane(i32 %read0, i32 0)			%read1 = call i32 @llvm.amdgcn.readlane(i32 %read0, i32 0)
	ret i32 %read1			ret i32 %read1
	}			}

	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	; llvm.amdgcn.update.dpp.i32			; llvm.amdgcn.update.dpp.i32
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	declare i32 @llvm.amdgcn.update.dpp.i32(i32, i32, i32, i32, i32, i1)			declare i32 @llvm.amdgcn.update.dpp.i32(i32, i32, i32, i32, i32, i1)

	define amdgpu_kernel void @update_dpp_no_combine(ptr addrspace(1) %out, i32 %in1, i32 %in2) {			define amdgpu_kernel void @update_dpp_no_combine(ptr addrspace(1) %out, i32 %in1, i32 %in2) {
	; CHECK-LABEL: @update_dpp_no_combine(			; CHECK-LABEL: define amdgpu_kernel void @update_dpp_no_combine
	; CHECK-NEXT: [[TMP0:%.*]] = call i32 @llvm.amdgcn.update.dpp.i32(i32 [[IN1:%.*]], i32 [[IN2:%.*]], i32 1, i32 1, i32 1, i1 false)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[IN1:%.*]], i32 [[IN2:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store i32 [[TMP0]], ptr addrspace(1) [[OUT:%.*]], align 4			; CHECK-NEXT: [[VAL0:%.*]] = call i32 @llvm.amdgcn.update.dpp.i32(i32 [[IN1]], i32 [[IN2]], i32 1, i32 1, i32 1, i1 false)
				; CHECK-NEXT: store i32 [[VAL0]], ptr addrspace(1) [[OUT]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%tmp0 = call i32 @llvm.amdgcn.update.dpp.i32(i32 %in1, i32 %in2, i32 1, i32 1, i32 1, i1 0)			%val0 = call i32 @llvm.amdgcn.update.dpp.i32(i32 %in1, i32 %in2, i32 1, i32 1, i32 1, i1 0)
	store i32 %tmp0, ptr addrspace(1) %out			store i32 %val0, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @update_dpp_drop_old(ptr addrspace(1) %out, i32 %in1, i32 %in2) {			define amdgpu_kernel void @update_dpp_drop_old(ptr addrspace(1) %out, i32 %in1, i32 %in2) {
	; CHECK-LABEL: @update_dpp_drop_old(			; CHECK-LABEL: define amdgpu_kernel void @update_dpp_drop_old
	; CHECK-NEXT: [[TMP0:%.*]] = call i32 @llvm.amdgcn.update.dpp.i32(i32 undef, i32 [[IN2:%.*]], i32 3, i32 15, i32 15, i1 true)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[IN1:%.*]], i32 [[IN2:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store i32 [[TMP0]], ptr addrspace(1) [[OUT:%.*]], align 4			; CHECK-NEXT: [[VAL0:%.*]] = call i32 @llvm.amdgcn.update.dpp.i32(i32 undef, i32 [[IN2]], i32 3, i32 15, i32 15, i1 true)
				; CHECK-NEXT: store i32 [[VAL0]], ptr addrspace(1) [[OUT]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%tmp0 = call i32 @llvm.amdgcn.update.dpp.i32(i32 %in1, i32 %in2, i32 3, i32 15, i32 15, i1 1)			%val0 = call i32 @llvm.amdgcn.update.dpp.i32(i32 %in1, i32 %in2, i32 3, i32 15, i32 15, i1 1)
	store i32 %tmp0, ptr addrspace(1) %out			store i32 %val0, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @update_dpp_undef_old(ptr addrspace(1) %out, i32 %in1) {			define amdgpu_kernel void @update_dpp_undef_old(ptr addrspace(1) %out, i32 %in1) {
	; CHECK-LABEL: @update_dpp_undef_old(			; CHECK-LABEL: define amdgpu_kernel void @update_dpp_undef_old
	; CHECK-NEXT: [[TMP0:%.*]] = call i32 @llvm.amdgcn.update.dpp.i32(i32 undef, i32 [[IN1:%.*]], i32 4, i32 15, i32 15, i1 true)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[IN1:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store i32 [[TMP0]], ptr addrspace(1) [[OUT:%.*]], align 4			; CHECK-NEXT: [[VAL0:%.*]] = call i32 @llvm.amdgcn.update.dpp.i32(i32 undef, i32 [[IN1]], i32 4, i32 15, i32 15, i1 true)
				; CHECK-NEXT: store i32 [[VAL0]], ptr addrspace(1) [[OUT]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%tmp0 = call i32 @llvm.amdgcn.update.dpp.i32(i32 undef, i32 %in1, i32 4, i32 15, i32 15, i1 1)			%val0 = call i32 @llvm.amdgcn.update.dpp.i32(i32 undef, i32 %in1, i32 4, i32 15, i32 15, i1 1)
	store i32 %tmp0, ptr addrspace(1) %out			store i32 %val0, ptr addrspace(1) %out
	ret void			ret void
	}			}


	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	; llvm.amdgcn.permlane16			; llvm.amdgcn.permlane16
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	declare i32 @llvm.amdgcn.permlane16(i32, i32, i32, i32, i1 immarg, i1 immarg)			declare i32 @llvm.amdgcn.permlane16(i32, i32, i32, i32, i1 immarg, i1 immarg)

	define amdgpu_kernel void @permlane16(ptr addrspace(1) %out, i32 %src0, i32 %src1, i32 %src2) {			define amdgpu_kernel void @permlane16(ptr addrspace(1) %out, i32 %src0, i32 %src1, i32 %src2) {
	; CHECK-LABEL: @permlane16(			; CHECK-LABEL: define amdgpu_kernel void @permlane16
	; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.amdgcn.permlane16(i32 12345, i32 [[SRC0:%.*]], i32 [[SRC1:%.*]], i32 [[SRC2:%.*]], i1 false, i1 false)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[SRC0:%.*]], i32 [[SRC1:%.*]], i32 [[SRC2:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store i32 [[RES]], ptr addrspace(1) [[OUT:%.*]], align 4			; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.amdgcn.permlane16.i32(i32 12345, i32 [[SRC0]], i32 [[SRC1]], i32 [[SRC2]], i1 false, i1 false)
				; CHECK-NEXT: store i32 [[RES]], ptr addrspace(1) [[OUT]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%res = call i32 @llvm.amdgcn.permlane16(i32 12345, i32 %src0, i32 %src1, i32 %src2, i1 false, i1 false)			%res = call i32 @llvm.amdgcn.permlane16(i32 12345, i32 %src0, i32 %src1, i32 %src2, i1 false, i1 false)
	store i32 %res, ptr addrspace(1) %out			store i32 %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @permlane16_bound_ctrl(ptr addrspace(1) %out, i32 %src0, i32 %src1, i32 %src2) {			define amdgpu_kernel void @permlane16_bound_ctrl(ptr addrspace(1) %out, i32 %src0, i32 %src1, i32 %src2) {
	; CHECK-LABEL: @permlane16_bound_ctrl(			; CHECK-LABEL: define amdgpu_kernel void @permlane16_bound_ctrl
	; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.amdgcn.permlane16(i32 undef, i32 [[SRC0:%.*]], i32 [[SRC1:%.*]], i32 [[SRC2:%.*]], i1 false, i1 true)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[SRC0:%.*]], i32 [[SRC1:%.*]], i32 [[SRC2:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store i32 [[RES]], ptr addrspace(1) [[OUT:%.*]], align 4			; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.amdgcn.permlane16.i32(i32 undef, i32 [[SRC0]], i32 [[SRC1]], i32 [[SRC2]], i1 false, i1 true)
				; CHECK-NEXT: store i32 [[RES]], ptr addrspace(1) [[OUT]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%res = call i32 @llvm.amdgcn.permlane16(i32 12345, i32 %src0, i32 %src1, i32 %src2, i1 false, i1 true)			%res = call i32 @llvm.amdgcn.permlane16(i32 12345, i32 %src0, i32 %src1, i32 %src2, i1 false, i1 true)
	store i32 %res, ptr addrspace(1) %out			store i32 %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @permlane16_fetch_invalid_bound_ctrl(ptr addrspace(1) %out, i32 %src0, i32 %src1, i32 %src2) {			define amdgpu_kernel void @permlane16_fetch_invalid_bound_ctrl(ptr addrspace(1) %out, i32 %src0, i32 %src1, i32 %src2) {
	; CHECK-LABEL: @permlane16_fetch_invalid_bound_ctrl(			; CHECK-LABEL: define amdgpu_kernel void @permlane16_fetch_invalid_bound_ctrl
	; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.amdgcn.permlane16(i32 undef, i32 [[SRC0:%.*]], i32 [[SRC1:%.*]], i32 [[SRC2:%.*]], i1 true, i1 true)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[SRC0:%.*]], i32 [[SRC1:%.*]], i32 [[SRC2:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store i32 [[RES]], ptr addrspace(1) [[OUT:%.*]], align 4			; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.amdgcn.permlane16.i32(i32 undef, i32 [[SRC0]], i32 [[SRC1]], i32 [[SRC2]], i1 true, i1 true)
				; CHECK-NEXT: store i32 [[RES]], ptr addrspace(1) [[OUT]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%res = call i32 @llvm.amdgcn.permlane16(i32 12345, i32 %src0, i32 %src1, i32 %src2, i1 true, i1 true)			%res = call i32 @llvm.amdgcn.permlane16(i32 12345, i32 %src0, i32 %src1, i32 %src2, i1 true, i1 true)
	store i32 %res, ptr addrspace(1) %out			store i32 %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	; llvm.amdgcn.permlanex16			; llvm.amdgcn.permlanex16
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	declare i32 @llvm.amdgcn.permlanex16(i32, i32, i32, i32, i1 immarg, i1 immarg)			declare i32 @llvm.amdgcn.permlanex16(i32, i32, i32, i32, i1 immarg, i1 immarg)

	define amdgpu_kernel void @permlanex16(ptr addrspace(1) %out, i32 %src0, i32 %src1, i32 %src2) {			define amdgpu_kernel void @permlanex16(ptr addrspace(1) %out, i32 %src0, i32 %src1, i32 %src2) {
	; CHECK-LABEL: @permlanex16(			; CHECK-LABEL: define amdgpu_kernel void @permlanex16
	; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.amdgcn.permlanex16(i32 12345, i32 [[SRC0:%.*]], i32 [[SRC1:%.*]], i32 [[SRC2:%.*]], i1 false, i1 false)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[SRC0:%.*]], i32 [[SRC1:%.*]], i32 [[SRC2:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store i32 [[RES]], ptr addrspace(1) [[OUT:%.*]], align 4			; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.amdgcn.permlanex16.i32(i32 12345, i32 [[SRC0]], i32 [[SRC1]], i32 [[SRC2]], i1 false, i1 false)
				; CHECK-NEXT: store i32 [[RES]], ptr addrspace(1) [[OUT]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%res = call i32 @llvm.amdgcn.permlanex16(i32 12345, i32 %src0, i32 %src1, i32 %src2, i1 false, i1 false)			%res = call i32 @llvm.amdgcn.permlanex16(i32 12345, i32 %src0, i32 %src1, i32 %src2, i1 false, i1 false)
	store i32 %res, ptr addrspace(1) %out			store i32 %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @permlanex16_bound_ctrl(ptr addrspace(1) %out, i32 %src0, i32 %src1, i32 %src2) {			define amdgpu_kernel void @permlanex16_bound_ctrl(ptr addrspace(1) %out, i32 %src0, i32 %src1, i32 %src2) {
	; CHECK-LABEL: @permlanex16_bound_ctrl(			; CHECK-LABEL: define amdgpu_kernel void @permlanex16_bound_ctrl
	; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.amdgcn.permlanex16(i32 undef, i32 [[SRC0:%.*]], i32 [[SRC1:%.*]], i32 [[SRC2:%.*]], i1 false, i1 true)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[SRC0:%.*]], i32 [[SRC1:%.*]], i32 [[SRC2:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store i32 [[RES]], ptr addrspace(1) [[OUT:%.*]], align 4			; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.amdgcn.permlanex16.i32(i32 undef, i32 [[SRC0]], i32 [[SRC1]], i32 [[SRC2]], i1 false, i1 true)
				; CHECK-NEXT: store i32 [[RES]], ptr addrspace(1) [[OUT]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%res = call i32 @llvm.amdgcn.permlanex16(i32 12345, i32 %src0, i32 %src1, i32 %src2, i1 false, i1 true)			%res = call i32 @llvm.amdgcn.permlanex16(i32 12345, i32 %src0, i32 %src1, i32 %src2, i1 false, i1 true)
	store i32 %res, ptr addrspace(1) %out			store i32 %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @permlanex16_fetch_invalid_bound_ctrl(ptr addrspace(1) %out, i32 %src0, i32 %src1, i32 %src2) {			define amdgpu_kernel void @permlanex16_fetch_invalid_bound_ctrl(ptr addrspace(1) %out, i32 %src0, i32 %src1, i32 %src2) {
	; CHECK-LABEL: @permlanex16_fetch_invalid_bound_ctrl(			; CHECK-LABEL: define amdgpu_kernel void @permlanex16_fetch_invalid_bound_ctrl
	; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.amdgcn.permlanex16(i32 undef, i32 [[SRC0:%.*]], i32 [[SRC1:%.*]], i32 [[SRC2:%.*]], i1 true, i1 true)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], i32 [[SRC0:%.*]], i32 [[SRC1:%.*]], i32 [[SRC2:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store i32 [[RES]], ptr addrspace(1) [[OUT:%.*]], align 4			; CHECK-NEXT: [[RES:%.*]] = call i32 @llvm.amdgcn.permlanex16.i32(i32 undef, i32 [[SRC0]], i32 [[SRC1]], i32 [[SRC2]], i1 true, i1 true)
				; CHECK-NEXT: store i32 [[RES]], ptr addrspace(1) [[OUT]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%res = call i32 @llvm.amdgcn.permlanex16(i32 12345, i32 %src0, i32 %src1, i32 %src2, i1 true, i1 true)			%res = call i32 @llvm.amdgcn.permlanex16(i32 12345, i32 %src0, i32 %src1, i32 %src2, i1 true, i1 true)
	store i32 %res, ptr addrspace(1) %out			store i32 %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	declare <4 x float> @llvm.amdgcn.image.sample.lz.2d.v4f32.f32(i32, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.lz.2d.v4f32.f32(i32, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.sample.c.lz.1d.v4f32.f32(i32, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.c.lz.1d.v4f32.f32(i32, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.sample.c.lz.2d.v4f32.f32(i32, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.c.lz.2d.v4f32.f32(i32, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1

	declare float @llvm.amdgcn.image.sample.c.d.o.2darray.f32.f32.f32(i32, i32, float, float, float, float, float, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare float @llvm.amdgcn.image.sample.c.d.o.2darray.f32.f32.f32(i32, i32, float, float, float, float, float, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <2 x float> @llvm.amdgcn.image.sample.c.d.o.2darray.v2f32.f32.f32(i32, i32, float, float, float, float, float, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <2 x float> @llvm.amdgcn.image.sample.c.d.o.2darray.v2f32.f32.f32(i32, i32, float, float, float, float, float, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1

	define amdgpu_kernel void @image_sample_a16_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s) {			define amdgpu_kernel void @image_sample_a16_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s) {
	; CHECK-LABEL: @image_sample_a16_1d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_1d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f16(i32 15, half [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f16(i32 15, half [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %s32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %s32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t) {			define amdgpu_kernel void @image_sample_a16_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t) {
	; CHECK-LABEL: @image_sample_a16_2d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_2d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f16(i32 15, half [[S:%.*]], half [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[S:%.*]], half [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f16(i32 15, half [[S]], half [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%t32 = fpext half %t to float			%t32 = fpext half %t to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f32(i32 15, float %s32, float %t32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f32(i32 15, float %s32, float %t32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_3d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %r) {			define amdgpu_kernel void @image_sample_a16_3d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %r) {
	; CHECK-LABEL: @image_sample_a16_3d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_3d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.3d.v4f32.f16(i32 15, half [[S:%.*]], half [[T:%.*]], half [[R:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[S:%.*]], half [[T:%.*]], half [[R:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.3d.v4f32.f16(i32 15, half [[S]], half [[T]], half [[R]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%t32 = fpext half %t to float			%t32 = fpext half %t to float
	%r32 = fpext half %r to float			%r32 = fpext half %r to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.3d.v4f32.f32(i32 15, float %s32, float %t32, float %r32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.3d.v4f32.f32(i32 15, float %s32, float %t32, float %r32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_cube(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %face) {			define amdgpu_kernel void @image_sample_a16_cube(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %face) {
	;			;
	; CHECK-LABEL: @image_sample_a16_cube(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_cube
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.cube.v4f32.f16(i32 15, half [[S:%.*]], half [[T:%.*]], half [[FACE:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[S:%.*]], half [[T:%.*]], half [[FACE:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.cube.v4f32.f16(i32 15, half [[S]], half [[T]], half [[FACE]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%t32 = fpext half %t to float			%t32 = fpext half %t to float
	%face32 = fpext half %face to float			%face32 = fpext half %face to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.cube.v4f32.f32(i32 15, float %s32, float %t32, float %face32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.cube.v4f32.f32(i32 15, float %s32, float %t32, float %face32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_1darray(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %slice) {			define amdgpu_kernel void @image_sample_a16_1darray(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %slice) {
	; CHECK-LABEL: @image_sample_a16_1darray(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_1darray
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.1darray.v4f32.f16(i32 15, half [[S:%.*]], half [[SLICE:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[S:%.*]], half [[SLICE:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.1darray.v4f32.f16(i32 15, half [[S]], half [[SLICE]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%slice32 = fpext half %slice to float			%slice32 = fpext half %slice to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.1darray.v4f32.f32(i32 15, float %s32, float %slice32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.1darray.v4f32.f32(i32 15, float %s32, float %slice32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_2darray(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %slice) {			define amdgpu_kernel void @image_sample_a16_2darray(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %slice) {
	; CHECK-LABEL: @image_sample_a16_2darray(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_2darray
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.2darray.v4f32.f16(i32 15, half [[S:%.*]], half [[T:%.*]], half [[SLICE:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[S:%.*]], half [[T:%.*]], half [[SLICE:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.2darray.v4f32.f16(i32 15, half [[S]], half [[T]], half [[SLICE]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%t32 = fpext half %t to float			%t32 = fpext half %t to float
	%slice32 = fpext half %slice to float			%slice32 = fpext half %slice to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.2darray.v4f32.f32(i32 15, float %s32, float %t32, float %slice32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.2darray.v4f32.f32(i32 15, float %s32, float %t32, float %slice32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_c_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s) {			define amdgpu_kernel void @image_sample_a16_c_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s) {
	; CHECK-LABEL: @image_sample_a16_c_1d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_c_1d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.1d.v4f32.f16(i32 15, float [[ZCOMPARE:%.*]], half [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], half [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.1d.v4f32.f16(i32 15, float [[ZCOMPARE]], half [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.c.1d.v4f32.f32(i32 15, float %zcompare, float %s32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.c.1d.v4f32.f32(i32 15, float %zcompare, float %s32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_c_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %t) {			define amdgpu_kernel void @image_sample_a16_c_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %t) {
	; CHECK-LABEL: @image_sample_a16_c_2d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_c_2d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.2d.v4f32.f16(i32 15, float [[ZCOMPARE:%.*]], half [[S:%.*]], half [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], half [[S:%.*]], half [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.2d.v4f32.f16(i32 15, float [[ZCOMPARE]], half [[S]], half [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%t32 = fpext half %t to float			%t32 = fpext half %t to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.c.2d.v4f32.f32(i32 15, float %zcompare, float %s32, float %t32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.c.2d.v4f32.f32(i32 15, float %zcompare, float %s32, float %t32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_cl_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %clamp) {			define amdgpu_kernel void @image_sample_a16_cl_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %clamp) {
	; CHECK-LABEL: @image_sample_a16_cl_1d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_cl_1d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.cl.1d.v4f32.f16(i32 15, half [[S:%.*]], half [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[S:%.*]], half [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.cl.1d.v4f32.f16(i32 15, half [[S]], half [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%clamp32 = fpext half %clamp to float			%clamp32 = fpext half %clamp to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.cl.1d.v4f32.f32(i32 15, float %s32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.cl.1d.v4f32.f32(i32 15, float %s32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_cl_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %clamp) {			define amdgpu_kernel void @image_sample_a16_cl_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %clamp) {
	; CHECK-LABEL: @image_sample_a16_cl_2d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_cl_2d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.cl.2d.v4f32.f16(i32 15, half [[S:%.*]], half [[T:%.*]], half [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[S:%.*]], half [[T:%.*]], half [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.cl.2d.v4f32.f16(i32 15, half [[S]], half [[T]], half [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%t32 = fpext half %t to float			%t32 = fpext half %t to float
	%clamp32 = fpext half %clamp to float			%clamp32 = fpext half %clamp to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.cl.2d.v4f32.f32(i32 15, float %s32, float %t32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.cl.2d.v4f32.f32(i32 15, float %s32, float %t32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_c_cl_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %clamp) {			define amdgpu_kernel void @image_sample_a16_c_cl_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %clamp) {
	; CHECK-LABEL: @image_sample_a16_c_cl_1d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_c_cl_1d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.cl.1d.v4f32.f16(i32 15, float [[ZCOMPARE:%.*]], half [[S:%.*]], half [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], half [[S:%.*]], half [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.cl.1d.v4f32.f16(i32 15, float [[ZCOMPARE]], half [[S]], half [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%clamp32 = fpext half %clamp to float			%clamp32 = fpext half %clamp to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.c.cl.1d.v4f32.f32(i32 15, float %zcompare, float %s32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.c.cl.1d.v4f32.f32(i32 15, float %zcompare, float %s32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_c_cl_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %t, half %clamp) {			define amdgpu_kernel void @image_sample_a16_c_cl_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %t, half %clamp) {
	; CHECK-LABEL: @image_sample_a16_c_cl_2d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_c_cl_2d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.cl.2d.v4f32.f16(i32 15, float [[ZCOMPARE:%.*]], half [[S:%.*]], half [[T:%.*]], half [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], half [[S:%.*]], half [[T:%.*]], half [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.cl.2d.v4f32.f16(i32 15, float [[ZCOMPARE]], half [[S]], half [[T]], half [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%t32 = fpext half %t to float			%t32 = fpext half %t to float
	%clamp32 = fpext half %clamp to float			%clamp32 = fpext half %clamp to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.c.cl.2d.v4f32.f32(i32 15, float %zcompare, float %s32, float %t32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.c.cl.2d.v4f32.f32(i32 15, float %zcompare, float %s32, float %t32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_b16_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, half %s) {			define amdgpu_kernel void @image_sample_a16_b16_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, half %s) {
	; CHECK-LABEL: @image_sample_a16_b16_1d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_b16_1d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.b.1d.v4f32.f16.f16(i32 15, half [[BIAS:%.*]], half [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[BIAS:%.*]], half [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.b.1d.v4f32.f16.f16(i32 15, half [[BIAS]], half [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%bias32 = fpext half %bias to float			%bias32 = fpext half %bias to float
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.b.1d.v4f32.f32.f32(i32 15, float %bias32, float %s32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.b.1d.v4f32.f32.f32(i32 15, float %bias32, float %s32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_b32_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s) {			define amdgpu_kernel void @image_sample_a16_b32_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s) {
	; CHECK-LABEL: @image_sample_a16_b32_1d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_b32_1d
	; CHECK-NEXT: [[S32:%.*]] = fpext half [[S:%.*]] to float			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[BIAS:%.*]], half [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.b.1d.v4f32.f32.f32(i32 15, float [[BIAS:%.*]], float [[S32]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[S32:%.*]] = fpext half [[S]] to float
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.b.1d.v4f32.f32.f32(i32 15, float [[BIAS]], float [[S32]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.b.1d.v4f32.f32.f32(i32 15, float %bias, float %s32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.b.1d.v4f32.f32.f32(i32 15, float %bias, float %s32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_b16_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, half %s, half %t) {			define amdgpu_kernel void @image_sample_a16_b16_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, half %s, half %t) {
	; CHECK-LABEL: @image_sample_a16_b16_2d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_b16_2d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.b.2d.v4f32.f16.f16(i32 15, half [[BIAS:%.*]], half [[S:%.*]], half [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[BIAS:%.*]], half [[S:%.*]], half [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.b.2d.v4f32.f16.f16(i32 15, half [[BIAS]], half [[S]], half [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%bias32 = fpext half %bias to float			%bias32 = fpext half %bias to float
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%t32 = fpext half %t to float			%t32 = fpext half %t to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.b.2d.v4f32.f32.f32(i32 15, float %bias32, float %s32, float %t32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.b.2d.v4f32.f32.f32(i32 15, float %bias32, float %s32, float %t32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_b32_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %t) {			define amdgpu_kernel void @image_sample_a16_b32_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %t) {
	; CHECK-LABEL: @image_sample_a16_b32_2d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_b32_2d
	; CHECK-NEXT: [[S32:%.*]] = fpext half [[S:%.*]] to float			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[BIAS:%.*]], half [[S:%.*]], half [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: [[T32:%.*]] = fpext half [[T:%.*]] to float			; CHECK-NEXT: [[S32:%.*]] = fpext half [[S]] to float
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.b.2d.v4f32.f32.f32(i32 15, float [[BIAS:%.*]], float [[S32]], float [[T32]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[T32:%.*]] = fpext half [[T]] to float
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.b.2d.v4f32.f32.f32(i32 15, float [[BIAS]], float [[S32]], float [[T32]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%t32 = fpext half %t to float			%t32 = fpext half %t to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.b.2d.v4f32.f32.f32(i32 15, float %bias, float %s32, float %t32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.b.2d.v4f32.f32.f32(i32 15, float %bias, float %s32, float %t32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_c_b16_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, float %zcompare, half %s) {			define amdgpu_kernel void @image_sample_a16_c_b16_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, float %zcompare, half %s) {
	; CHECK-LABEL: @image_sample_a16_c_b16_1d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_c_b16_1d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.b.1d.v4f32.f16.f16(i32 15, half [[BIAS:%.*]], float [[ZCOMPARE:%.*]], half [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[BIAS:%.*]], float [[ZCOMPARE:%.*]], half [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.b.1d.v4f32.f16.f16(i32 15, half [[BIAS]], float [[ZCOMPARE]], half [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%bias32 = fpext half %bias to float			%bias32 = fpext half %bias to float
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.c.b.1d.v4f32.f32.f32(i32 15, float %bias32, float %zcompare, float %s32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.c.b.1d.v4f32.f32.f32(i32 15, float %bias32, float %zcompare, float %s32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_c_b32_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s) {			define amdgpu_kernel void @image_sample_a16_c_b32_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s) {
	; CHECK-LABEL: @image_sample_a16_c_b32_1d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_c_b32_1d
	; CHECK-NEXT: [[S32:%.*]] = fpext half [[S:%.*]] to float			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[BIAS:%.*]], float [[ZCOMPARE:%.*]], half [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.b.1d.v4f32.f32.f32(i32 15, float [[BIAS:%.*]], float [[ZCOMPARE:%.*]], float [[S32]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[S32:%.*]] = fpext half [[S]] to float
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.b.1d.v4f32.f32.f32(i32 15, float [[BIAS]], float [[ZCOMPARE]], float [[S32]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.c.b.1d.v4f32.f32.f32(i32 15, float %bias, float %zcompare, float %s32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.c.b.1d.v4f32.f32.f32(i32 15, float %bias, float %zcompare, float %s32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_c_b16_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, float %zcompare, half %s, half %t) {			define amdgpu_kernel void @image_sample_a16_c_b16_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, float %zcompare, half %s, half %t) {
	; CHECK-LABEL: @image_sample_a16_c_b16_2d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_c_b16_2d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.b.2d.v4f32.f16.f16(i32 15, half [[BIAS:%.*]], float [[ZCOMPARE:%.*]], half [[S:%.*]], half [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[BIAS:%.*]], float [[ZCOMPARE:%.*]], half [[S:%.*]], half [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.b.2d.v4f32.f16.f16(i32 15, half [[BIAS]], float [[ZCOMPARE]], half [[S]], half [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%bias32 = fpext half %bias to float			%bias32 = fpext half %bias to float
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%t32 = fpext half %t to float			%t32 = fpext half %t to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.c.b.2d.v4f32.f32.f32(i32 15, float %bias32, float %zcompare, float %s32, float %t32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.c.b.2d.v4f32.f32.f32(i32 15, float %bias32, float %zcompare, float %s32, float %t32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_c_b32_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %t) {			define amdgpu_kernel void @image_sample_a16_c_b32_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %t) {
	; CHECK-LABEL: @image_sample_a16_c_b32_2d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_c_b32_2d
	; CHECK-NEXT: [[S32:%.*]] = fpext half [[S:%.*]] to float			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[BIAS:%.*]], float [[ZCOMPARE:%.*]], half [[S:%.*]], half [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: [[T32:%.*]] = fpext half [[T:%.*]] to float			; CHECK-NEXT: [[S32:%.*]] = fpext half [[S]] to float
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.b.2d.v4f32.f32.f32(i32 15, float [[BIAS:%.*]], float [[ZCOMPARE:%.*]], float [[S32]], float [[T32]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[T32:%.*]] = fpext half [[T]] to float
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.b.2d.v4f32.f32.f32(i32 15, float [[BIAS]], float [[ZCOMPARE]], float [[S32]], float [[T32]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%t32 = fpext half %t to float			%t32 = fpext half %t to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.c.b.2d.v4f32.f32.f32(i32 15, float %bias, float %zcompare, float %s32, float %t32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.c.b.2d.v4f32.f32.f32(i32 15, float %bias, float %zcompare, float %s32, float %t32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_b16_cl_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, half %s, half %clamp) {			define amdgpu_kernel void @image_sample_a16_b16_cl_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, half %s, half %clamp) {
	; CHECK-LABEL: @image_sample_a16_b16_cl_1d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_b16_cl_1d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.b.cl.1d.v4f32.f16.f16(i32 15, half [[BIAS:%.*]], half [[S:%.*]], half [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[BIAS:%.*]], half [[S:%.*]], half [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.b.cl.1d.v4f32.f16.f16(i32 15, half [[BIAS]], half [[S]], half [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%bias32 = fpext half %bias to float			%bias32 = fpext half %bias to float
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%clamp32 = fpext half %clamp to float			%clamp32 = fpext half %clamp to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.b.cl.1d.v4f32.f32.f32(i32 15, float %bias32, float %s32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.b.cl.1d.v4f32.f32.f32(i32 15, float %bias32, float %s32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_b32_cl_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %clamp) {			define amdgpu_kernel void @image_sample_a16_b32_cl_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %clamp) {
	; CHECK-LABEL: @image_sample_a16_b32_cl_1d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_b32_cl_1d
	; CHECK-NEXT: [[S32:%.*]] = fpext half [[S:%.*]] to float			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[BIAS:%.*]], half [[S:%.*]], half [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: [[CLAMP32:%.*]] = fpext half [[CLAMP:%.*]] to float			; CHECK-NEXT: [[S32:%.*]] = fpext half [[S]] to float
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.b.cl.1d.v4f32.f32.f32(i32 15, float [[BIAS:%.*]], float [[S32]], float [[CLAMP32]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[CLAMP32:%.*]] = fpext half [[CLAMP]] to float
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.b.cl.1d.v4f32.f32.f32(i32 15, float [[BIAS]], float [[S32]], float [[CLAMP32]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%clamp32 = fpext half %clamp to float			%clamp32 = fpext half %clamp to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.b.cl.1d.v4f32.f32.f32(i32 15, float %bias, float %s32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.b.cl.1d.v4f32.f32.f32(i32 15, float %bias, float %s32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_b16_cl_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, half %s, half %t, half %clamp) {			define amdgpu_kernel void @image_sample_a16_b16_cl_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, half %s, half %t, half %clamp) {
	; CHECK-LABEL: @image_sample_a16_b16_cl_2d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_b16_cl_2d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.b.cl.2d.v4f32.f16.f16(i32 15, half [[BIAS:%.*]], half [[S:%.*]], half [[T:%.*]], half [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[BIAS:%.*]], half [[S:%.*]], half [[T:%.*]], half [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.b.cl.2d.v4f32.f16.f16(i32 15, half [[BIAS]], half [[S]], half [[T]], half [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%bias32 = fpext half %bias to float			%bias32 = fpext half %bias to float
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%t32 = fpext half %t to float			%t32 = fpext half %t to float
	%clamp32 = fpext half %clamp to float			%clamp32 = fpext half %clamp to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.b.cl.2d.v4f32.f32.f32(i32 15, float %bias32, float %s32, float %t32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.b.cl.2d.v4f32.f32.f32(i32 15, float %bias32, float %s32, float %t32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_b32_cl_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %t, half %clamp) {			define amdgpu_kernel void @image_sample_a16_b32_cl_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, half %s, half %t, half %clamp) {
	; CHECK-LABEL: @image_sample_a16_b32_cl_2d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_b32_cl_2d
	; CHECK-NEXT: [[S32:%.*]] = fpext half [[S:%.*]] to float			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[BIAS:%.*]], half [[S:%.*]], half [[T:%.*]], half [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: [[T32:%.*]] = fpext half [[T:%.*]] to float			; CHECK-NEXT: [[S32:%.*]] = fpext half [[S]] to float
	; CHECK-NEXT: [[CLAMP32:%.*]] = fpext half [[CLAMP:%.*]] to float			; CHECK-NEXT: [[T32:%.*]] = fpext half [[T]] to float
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.b.cl.2d.v4f32.f32.f32(i32 15, float [[BIAS:%.*]], float [[S32]], float [[T32]], float [[CLAMP32]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[CLAMP32:%.*]] = fpext half [[CLAMP]] to float
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.b.cl.2d.v4f32.f32.f32(i32 15, float [[BIAS]], float [[S32]], float [[T32]], float [[CLAMP32]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%t32 = fpext half %t to float			%t32 = fpext half %t to float
	%clamp32 = fpext half %clamp to float			%clamp32 = fpext half %clamp to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.b.cl.2d.v4f32.f32.f32(i32 15, float %bias, float %s32, float %t32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.b.cl.2d.v4f32.f32.f32(i32 15, float %bias, float %s32, float %t32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_c_b16_cl_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, float %zcompare, half %s, half %clamp) {			define amdgpu_kernel void @image_sample_a16_c_b16_cl_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, float %zcompare, half %s, half %clamp) {
	; CHECK-LABEL: @image_sample_a16_c_b16_cl_1d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_c_b16_cl_1d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.1d.v4f32.f16.f16(i32 15, half [[BIAS:%.*]], float [[ZCOMPARE:%.*]], half [[S:%.*]], half [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[BIAS:%.*]], float [[ZCOMPARE:%.*]], half [[S:%.*]], half [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.1d.v4f32.f16.f16(i32 15, half [[BIAS]], float [[ZCOMPARE]], half [[S]], half [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%bias32 = fpext half %bias to float			%bias32 = fpext half %bias to float
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%clamp32 = fpext half %clamp to float			%clamp32 = fpext half %clamp to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.1d.v4f32.f32.f32(i32 15, float %bias32, float %zcompare, float %s32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.1d.v4f32.f32.f32(i32 15, float %bias32, float %zcompare, float %s32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_c_b32_cl_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %clamp) {			define amdgpu_kernel void @image_sample_a16_c_b32_cl_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %clamp) {
	; CHECK-LABEL: @image_sample_a16_c_b32_cl_1d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_c_b32_cl_1d
	; CHECK-NEXT: [[S32:%.*]] = fpext half [[S:%.*]] to float			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[BIAS:%.*]], float [[ZCOMPARE:%.*]], half [[S:%.*]], half [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: [[CLAMP32:%.*]] = fpext half [[CLAMP:%.*]] to float			; CHECK-NEXT: [[S32:%.*]] = fpext half [[S]] to float
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.1d.v4f32.f32.f32(i32 15, float [[BIAS:%.*]], float [[ZCOMPARE:%.*]], float [[S32]], float [[CLAMP32]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[CLAMP32:%.*]] = fpext half [[CLAMP]] to float
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.1d.v4f32.f32.f32(i32 15, float [[BIAS]], float [[ZCOMPARE]], float [[S32]], float [[CLAMP32]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%clamp32 = fpext half %clamp to float			%clamp32 = fpext half %clamp to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.1d.v4f32.f32.f32(i32 15, float %bias, float %zcompare, float %s32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.1d.v4f32.f32.f32(i32 15, float %bias, float %zcompare, float %s32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_c_b16_cl_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, float %zcompare, half %s, half %t, half %clamp) {			define amdgpu_kernel void @image_sample_a16_c_b16_cl_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %bias, float %zcompare, half %s, half %t, half %clamp) {
	; CHECK-LABEL: @image_sample_a16_c_b16_cl_2d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_c_b16_cl_2d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.2d.v4f32.f16.f16(i32 15, half [[BIAS:%.*]], float [[ZCOMPARE:%.*]], half [[S:%.*]], half [[T:%.*]], half [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[BIAS:%.*]], float [[ZCOMPARE:%.*]], half [[S:%.*]], half [[T:%.*]], half [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.2d.v4f32.f16.f16(i32 15, half [[BIAS]], float [[ZCOMPARE]], half [[S]], half [[T]], half [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%bias32 = fpext half %bias to float			%bias32 = fpext half %bias to float
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%t32 = fpext half %t to float			%t32 = fpext half %t to float
	%clamp32 = fpext half %clamp to float			%clamp32 = fpext half %clamp to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.2d.v4f32.f32.f32(i32 15, float %bias32, float %zcompare, float %s32, float %t32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.2d.v4f32.f32.f32(i32 15, float %bias32, float %zcompare, float %s32, float %t32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_c_b32_cl_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %t, half %clamp) {			define amdgpu_kernel void @image_sample_a16_c_b32_cl_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, half %s, half %t, half %clamp) {
	; CHECK-LABEL: @image_sample_a16_c_b32_cl_2d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_c_b32_cl_2d
	; CHECK-NEXT: [[S32:%.*]] = fpext half [[S:%.*]] to float			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[BIAS:%.*]], float [[ZCOMPARE:%.*]], half [[S:%.*]], half [[T:%.*]], half [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: [[T32:%.*]] = fpext half [[T:%.*]] to float			; CHECK-NEXT: [[S32:%.*]] = fpext half [[S]] to float
	; CHECK-NEXT: [[CLAMP32:%.*]] = fpext half [[CLAMP:%.*]] to float			; CHECK-NEXT: [[T32:%.*]] = fpext half [[T]] to float
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.2d.v4f32.f32.f32(i32 15, float [[BIAS:%.*]], float [[ZCOMPARE:%.*]], float [[S32]], float [[T32]], float [[CLAMP32]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[CLAMP32:%.*]] = fpext half [[CLAMP]] to float
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.2d.v4f32.f32.f32(i32 15, float [[BIAS]], float [[ZCOMPARE]], float [[S32]], float [[T32]], float [[CLAMP32]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%t32 = fpext half %t to float			%t32 = fpext half %t to float
	%clamp32 = fpext half %clamp to float			%clamp32 = fpext half %clamp to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.2d.v4f32.f32.f32(i32 15, float %bias, float %zcompare, float %s32, float %t32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.2d.v4f32.f32.f32(i32 15, float %bias, float %zcompare, float %s32, float %t32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_d_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dsdv, half %s) {			define amdgpu_kernel void @image_sample_a16_d_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dsdv, half %s) {
	; CHECK-LABEL: @image_sample_a16_d_1d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_d_1d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.d.1d.v4f32.f16.f16(i32 15, half [[DSDH:%.*]], half [[DSDV:%.*]], half [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[DSDH:%.*]], half [[DSDV:%.*]], half [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.d.1d.v4f32.f16.f16(i32 15, half [[DSDH]], half [[DSDV]], half [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.d.1d.v4f32.f32.f32(i32 15, float %dsdh32, float %dsdv32, float %s32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.d.1d.v4f32.f32.f32(i32 15, float %dsdh32, float %dsdv32, float %s32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_d_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t) {			define amdgpu_kernel void @image_sample_a16_d_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t) {
	; CHECK-LABEL: @image_sample_a16_d_2d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_d_2d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.d.2d.v4f32.f16.f16(i32 15, half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], half [[S:%.*]], half [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], half [[S:%.*]], half [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.d.2d.v4f32.f16.f16(i32 15, half [[DSDH]], half [[DTDH]], half [[DSDV]], half [[DTDV]], half [[S]], half [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dtdh32 = fpext half %dtdh to float			%dtdh32 = fpext half %dtdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%dtdv32 = fpext half %dtdv to float			%dtdv32 = fpext half %dtdv to float
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%t32 = fpext half %t to float			%t32 = fpext half %t to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.d.2d.v4f32.f32.f32(i32 15, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s32, float %t32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.d.2d.v4f32.f32.f32(i32 15, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s32, float %t32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_d_3d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dtdh, half %drdh, half %dsdv, half %dtdv, half %drdv, half %s, half %t, half %r) {			define amdgpu_kernel void @image_sample_a16_d_3d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dtdh, half %drdh, half %dsdv, half %dtdv, half %drdv, half %s, half %t, half %r) {
	; CHECK-LABEL: @image_sample_a16_d_3d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_d_3d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.d.3d.v4f32.f16.f16(i32 15, half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DRDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], half [[DRDV:%.*]], half [[S:%.*]], half [[T:%.*]], half [[R:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DRDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], half [[DRDV:%.*]], half [[S:%.*]], half [[T:%.*]], half [[R:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.d.3d.v4f32.f16.f16(i32 15, half [[DSDH]], half [[DTDH]], half [[DRDH]], half [[DSDV]], half [[DTDV]], half [[DRDV]], half [[S]], half [[T]], half [[R]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dtdh32 = fpext half %dtdh to float			%dtdh32 = fpext half %dtdh to float
	%drdh32 = fpext half %drdh to float			%drdh32 = fpext half %drdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%dtdv32 = fpext half %dtdv to float			%dtdv32 = fpext half %dtdv to float
	%drdv32 = fpext half %drdv to float			%drdv32 = fpext half %drdv to float
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%t32 = fpext half %t to float			%t32 = fpext half %t to float
	%r32 = fpext half %r to float			%r32 = fpext half %r to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.d.3d.v4f32.f32.f32(i32 15, float %dsdh32, float %dtdh32, float %drdh32, float %dsdv32, float %dtdv32, float %drdv32, float %s32, float %t32, float %r32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.d.3d.v4f32.f32.f32(i32 15, float %dsdh32, float %dtdh32, float %drdh32, float %dsdv32, float %dtdv32, float %drdv32, float %s32, float %t32, float %r32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_c_d_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dsdv, half %s) {			define amdgpu_kernel void @image_sample_a16_c_d_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dsdv, half %s) {
	; CHECK-LABEL: @image_sample_a16_c_d_1d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_c_d_1d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.d.1d.v4f32.f16.f16(i32 15, float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DSDV:%.*]], half [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DSDV:%.*]], half [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.d.1d.v4f32.f16.f16(i32 15, float [[ZCOMPARE]], half [[DSDH]], half [[DSDV]], half [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.c.d.1d.v4f32.f32.f32(i32 15, float %zcompare, float %dsdh32, float %dsdv32, float %s32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.c.d.1d.v4f32.f32.f32(i32 15, float %zcompare, float %dsdh32, float %dsdv32, float %s32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_c_d_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t) {			define amdgpu_kernel void @image_sample_a16_c_d_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t) {
	; CHECK-LABEL: @image_sample_a16_c_d_2d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_c_d_2d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.d.2d.v4f32.f16.f16(i32 15, float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], half [[S:%.*]], half [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], half [[S:%.*]], half [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.d.2d.v4f32.f16.f16(i32 15, float [[ZCOMPARE]], half [[DSDH]], half [[DTDH]], half [[DSDV]], half [[DTDV]], half [[S]], half [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dtdh32 = fpext half %dtdh to float			%dtdh32 = fpext half %dtdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%dtdv32 = fpext half %dtdv to float			%dtdv32 = fpext half %dtdv to float
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%t32 = fpext half %t to float			%t32 = fpext half %t to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.c.d.2d.v4f32.f32.f32(i32 15, float %zcompare, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s32, float %t32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.c.d.2d.v4f32.f32.f32(i32 15, float %zcompare, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s32, float %t32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_d_cl_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dsdv, half %s, half %clamp) {			define amdgpu_kernel void @image_sample_a16_d_cl_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dsdv, half %s, half %clamp) {
	; CHECK-LABEL: @image_sample_a16_d_cl_1d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_d_cl_1d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.d.cl.1d.v4f32.f16.f16(i32 15, half [[DSDH:%.*]], half [[DSDV:%.*]], half [[S:%.*]], half [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[DSDH:%.*]], half [[DSDV:%.*]], half [[S:%.*]], half [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.d.cl.1d.v4f32.f16.f16(i32 15, half [[DSDH]], half [[DSDV]], half [[S]], half [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%clamp32 = fpext half %clamp to float			%clamp32 = fpext half %clamp to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.d.cl.1d.v4f32.f32.f32(i32 15, float %dsdh32, float %dsdv32, float %s32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.d.cl.1d.v4f32.f32.f32(i32 15, float %dsdh32, float %dsdv32, float %s32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_d_cl_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, half %clamp) {			define amdgpu_kernel void @image_sample_a16_d_cl_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, half %clamp) {
	; CHECK-LABEL: @image_sample_a16_d_cl_2d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_d_cl_2d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.d.cl.2d.v4f32.f16.f16(i32 15, half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], half [[S:%.*]], half [[T:%.*]], half [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], half [[S:%.*]], half [[T:%.*]], half [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.d.cl.2d.v4f32.f16.f16(i32 15, half [[DSDH]], half [[DTDH]], half [[DSDV]], half [[DTDV]], half [[S]], half [[T]], half [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dtdh32 = fpext half %dtdh to float			%dtdh32 = fpext half %dtdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%dtdv32 = fpext half %dtdv to float			%dtdv32 = fpext half %dtdv to float
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%t32 = fpext half %t to float			%t32 = fpext half %t to float
	%clamp32 = fpext half %clamp to float			%clamp32 = fpext half %clamp to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.d.cl.2d.v4f32.f32.f32(i32 15, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s32, float %t32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.d.cl.2d.v4f32.f32.f32(i32 15, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s32, float %t32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_c_d_cl_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dsdv, half %s, half %clamp) {			define amdgpu_kernel void @image_sample_a16_c_d_cl_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dsdv, half %s, half %clamp) {
	; CHECK-LABEL: @image_sample_a16_c_d_cl_1d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_c_d_cl_1d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.d.cl.1d.v4f32.f16.f16(i32 15, float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DSDV:%.*]], half [[S:%.*]], half [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DSDV:%.*]], half [[S:%.*]], half [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.d.cl.1d.v4f32.f16.f16(i32 15, float [[ZCOMPARE]], half [[DSDH]], half [[DSDV]], half [[S]], half [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%clamp32 = fpext half %clamp to float			%clamp32 = fpext half %clamp to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.c.d.cl.1d.v4f32.f32.f32(i32 15, float %zcompare, float %dsdh32, float %dsdv32, float %s32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.c.d.cl.1d.v4f32.f32.f32(i32 15, float %zcompare, float %dsdh32, float %dsdv32, float %s32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_c_d_cl_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, half %clamp) {			define amdgpu_kernel void @image_sample_a16_c_d_cl_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, half %clamp) {
	; CHECK-LABEL: @image_sample_a16_c_d_cl_2d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_c_d_cl_2d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.d.cl.2d.v4f32.f16.f16(i32 15, float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], half [[S:%.*]], half [[T:%.*]], half [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], half [[S:%.*]], half [[T:%.*]], half [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.d.cl.2d.v4f32.f16.f16(i32 15, float [[ZCOMPARE]], half [[DSDH]], half [[DTDH]], half [[DSDV]], half [[DTDV]], half [[S]], half [[T]], half [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dtdh32 = fpext half %dtdh to float			%dtdh32 = fpext half %dtdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%dtdv32 = fpext half %dtdv to float			%dtdv32 = fpext half %dtdv to float
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%t32 = fpext half %t to float			%t32 = fpext half %t to float
	%clamp32 = fpext half %clamp to float			%clamp32 = fpext half %clamp to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.c.d.cl.2d.v4f32.f32.f32(i32 15, float %zcompare, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s32, float %t32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.c.d.cl.2d.v4f32.f32.f32(i32 15, float %zcompare, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s32, float %t32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_cd_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dsdv, half %s) {			define amdgpu_kernel void @image_sample_a16_cd_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dsdv, half %s) {
	; CHECK-LABEL: @image_sample_a16_cd_1d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_cd_1d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.cd.1d.v4f32.f16.f16(i32 15, half [[DSDH:%.*]], half [[DSDV:%.*]], half [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[DSDH:%.*]], half [[DSDV:%.*]], half [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.cd.1d.v4f32.f16.f16(i32 15, half [[DSDH]], half [[DSDV]], half [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.cd.1d.v4f32.f32.f32(i32 15, float %dsdh32, float %dsdv32, float %s32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.cd.1d.v4f32.f32.f32(i32 15, float %dsdh32, float %dsdv32, float %s32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_cd_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t) {			define amdgpu_kernel void @image_sample_a16_cd_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t) {
	; CHECK-LABEL: @image_sample_a16_cd_2d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_cd_2d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.cd.2d.v4f32.f16.f16(i32 15, half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], half [[S:%.*]], half [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], half [[S:%.*]], half [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.cd.2d.v4f32.f16.f16(i32 15, half [[DSDH]], half [[DTDH]], half [[DSDV]], half [[DTDV]], half [[S]], half [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dtdh32 = fpext half %dtdh to float			%dtdh32 = fpext half %dtdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%dtdv32 = fpext half %dtdv to float			%dtdv32 = fpext half %dtdv to float
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%t32 = fpext half %t to float			%t32 = fpext half %t to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.cd.2d.v4f32.f32.f32(i32 15, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s32, float %t32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.cd.2d.v4f32.f32.f32(i32 15, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s32, float %t32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_c_cd_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dsdv, half %s) {			define amdgpu_kernel void @image_sample_a16_c_cd_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dsdv, half %s) {
	; CHECK-LABEL: @image_sample_a16_c_cd_1d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_c_cd_1d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.cd.1d.v4f32.f16.f16(i32 15, float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DSDV:%.*]], half [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DSDV:%.*]], half [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.cd.1d.v4f32.f16.f16(i32 15, float [[ZCOMPARE]], half [[DSDH]], half [[DSDV]], half [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.c.cd.1d.v4f32.f32.f32(i32 15, float %zcompare, float %dsdh32, float %dsdv32, float %s32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.c.cd.1d.v4f32.f32.f32(i32 15, float %zcompare, float %dsdh32, float %dsdv32, float %s32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_c_cd_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t) {			define amdgpu_kernel void @image_sample_a16_c_cd_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t) {
	; CHECK-LABEL: @image_sample_a16_c_cd_2d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_c_cd_2d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.cd.2d.v4f32.f16.f16(i32 15, float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], half [[S:%.*]], half [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], half [[S:%.*]], half [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.cd.2d.v4f32.f16.f16(i32 15, float [[ZCOMPARE]], half [[DSDH]], half [[DTDH]], half [[DSDV]], half [[DTDV]], half [[S]], half [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dtdh32 = fpext half %dtdh to float			%dtdh32 = fpext half %dtdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%dtdv32 = fpext half %dtdv to float			%dtdv32 = fpext half %dtdv to float
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%t32 = fpext half %t to float			%t32 = fpext half %t to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.c.cd.2d.v4f32.f32.f32(i32 15, float %zcompare, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s32, float %t32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.c.cd.2d.v4f32.f32.f32(i32 15, float %zcompare, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s32, float %t32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_cd_cl_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dsdv, half %s, half %clamp) {			define amdgpu_kernel void @image_sample_a16_cd_cl_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dsdv, half %s, half %clamp) {
	; CHECK-LABEL: @image_sample_a16_cd_cl_1d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_cd_cl_1d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.cd.cl.1d.v4f32.f16.f16(i32 15, half [[DSDH:%.*]], half [[DSDV:%.*]], half [[S:%.*]], half [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[DSDH:%.*]], half [[DSDV:%.*]], half [[S:%.*]], half [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.cd.cl.1d.v4f32.f16.f16(i32 15, half [[DSDH]], half [[DSDV]], half [[S]], half [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%clamp32 = fpext half %clamp to float			%clamp32 = fpext half %clamp to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.cd.cl.1d.v4f32.f32.f32(i32 15, float %dsdh32, float %dsdv32, float %s32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.cd.cl.1d.v4f32.f32.f32(i32 15, float %dsdh32, float %dsdv32, float %s32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_cd_cl_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, half %clamp) {			define amdgpu_kernel void @image_sample_a16_cd_cl_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, half %clamp) {
	; CHECK-LABEL: @image_sample_a16_cd_cl_2d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_cd_cl_2d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.cd.cl.2d.v4f32.f16.f16(i32 15, half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], half [[S:%.*]], half [[T:%.*]], half [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], half [[S:%.*]], half [[T:%.*]], half [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.cd.cl.2d.v4f32.f16.f16(i32 15, half [[DSDH]], half [[DTDH]], half [[DSDV]], half [[DTDV]], half [[S]], half [[T]], half [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dtdh32 = fpext half %dtdh to float			%dtdh32 = fpext half %dtdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%dtdv32 = fpext half %dtdv to float			%dtdv32 = fpext half %dtdv to float
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%t32 = fpext half %t to float			%t32 = fpext half %t to float
	%clamp32 = fpext half %clamp to float			%clamp32 = fpext half %clamp to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.cd.cl.2d.v4f32.f32.f32(i32 15, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s32, float %t32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.cd.cl.2d.v4f32.f32.f32(i32 15, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s32, float %t32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_c_cd_cl_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dsdv, half %s, half %clamp) {			define amdgpu_kernel void @image_sample_a16_c_cd_cl_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dsdv, half %s, half %clamp) {
	; CHECK-LABEL: @image_sample_a16_c_cd_cl_1d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_c_cd_cl_1d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.cd.cl.1d.v4f32.f16.f16(i32 15, float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DSDV:%.*]], half [[S:%.*]], half [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DSDV:%.*]], half [[S:%.*]], half [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.cd.cl.1d.v4f32.f16.f16(i32 15, float [[ZCOMPARE]], half [[DSDH]], half [[DSDV]], half [[S]], half [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%clamp32 = fpext half %clamp to float			%clamp32 = fpext half %clamp to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.c.cd.cl.1d.v4f32.f32.f32(i32 15, float %zcompare, float %dsdh32, float %dsdv32, float %s32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.c.cd.cl.1d.v4f32.f32.f32(i32 15, float %zcompare, float %dsdh32, float %dsdv32, float %s32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_c_cd_cl_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, half %clamp) {			define amdgpu_kernel void @image_sample_a16_c_cd_cl_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, half %clamp) {
	; CHECK-LABEL: @image_sample_a16_c_cd_cl_2d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_c_cd_cl_2d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.cd.cl.2d.v4f32.f16.f16(i32 15, float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], half [[S:%.*]], half [[T:%.*]], half [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], half [[S:%.*]], half [[T:%.*]], half [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.cd.cl.2d.v4f32.f16.f16(i32 15, float [[ZCOMPARE]], half [[DSDH]], half [[DTDH]], half [[DSDV]], half [[DTDV]], half [[S]], half [[T]], half [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dtdh32 = fpext half %dtdh to float			%dtdh32 = fpext half %dtdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%dtdv32 = fpext half %dtdv to float			%dtdv32 = fpext half %dtdv to float
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%t32 = fpext half %t to float			%t32 = fpext half %t to float
	%clamp32 = fpext half %clamp to float			%clamp32 = fpext half %clamp to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.c.cd.cl.2d.v4f32.f32.f32(i32 15, float %zcompare, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s32, float %t32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.c.cd.cl.2d.v4f32.f32.f32(i32 15, float %zcompare, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s32, float %t32, float %clamp32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_l_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %lod) {			define amdgpu_kernel void @image_sample_a16_l_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %lod) {
	; CHECK-LABEL: @image_sample_a16_l_1d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_l_1d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.l.1d.v4f32.f16(i32 15, half [[S:%.*]], half [[LOD:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[S:%.*]], half [[LOD:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.l.1d.v4f32.f16(i32 15, half [[S]], half [[LOD]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%lod32 = fpext half %lod to float			%lod32 = fpext half %lod to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.l.1d.v4f32.f32(i32 15, float %s32, float %lod32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.l.1d.v4f32.f32(i32 15, float %s32, float %lod32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_l_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %lod) {			define amdgpu_kernel void @image_sample_a16_l_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %lod) {
	; CHECK-LABEL: @image_sample_a16_l_2d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_l_2d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.l.2d.v4f32.f16(i32 15, half [[S:%.*]], half [[T:%.*]], half [[LOD:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[S:%.*]], half [[T:%.*]], half [[LOD:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.l.2d.v4f32.f16(i32 15, half [[S]], half [[T]], half [[LOD]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%t32 = fpext half %t to float			%t32 = fpext half %t to float
	%lod32 = fpext half %lod to float			%lod32 = fpext half %lod to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.l.2d.v4f32.f32(i32 15, float %s32, float %t32, float %lod32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.l.2d.v4f32.f32(i32 15, float %s32, float %t32, float %lod32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_c_l_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %lod) {			define amdgpu_kernel void @image_sample_a16_c_l_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %lod) {
	; CHECK-LABEL: @image_sample_a16_c_l_1d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_c_l_1d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.l.1d.v4f32.f16(i32 15, float [[ZCOMPARE:%.*]], half [[S:%.*]], half [[LOD:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], half [[S:%.*]], half [[LOD:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.l.1d.v4f32.f16(i32 15, float [[ZCOMPARE]], half [[S]], half [[LOD]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%lod32 = fpext half %lod to float			%lod32 = fpext half %lod to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.c.l.1d.v4f32.f32(i32 15, float %zcompare, float %s32, float %lod32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.c.l.1d.v4f32.f32(i32 15, float %zcompare, float %s32, float %lod32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_c_l_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %t, half %lod) {			define amdgpu_kernel void @image_sample_a16_c_l_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %t, half %lod) {
	; CHECK-LABEL: @image_sample_a16_c_l_2d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_c_l_2d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.l.2d.v4f32.f16(i32 15, float [[ZCOMPARE:%.*]], half [[S:%.*]], half [[T:%.*]], half [[LOD:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], half [[S:%.*]], half [[T:%.*]], half [[LOD:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.l.2d.v4f32.f16(i32 15, float [[ZCOMPARE]], half [[S]], half [[T]], half [[LOD]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%t32 = fpext half %t to float			%t32 = fpext half %t to float
	%lod32 = fpext half %lod to float			%lod32 = fpext half %lod to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.c.l.2d.v4f32.f32(i32 15, float %zcompare, float %s32, float %t32, float %lod32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.c.l.2d.v4f32.f32(i32 15, float %zcompare, float %s32, float %t32, float %lod32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_lz_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s) {			define amdgpu_kernel void @image_sample_a16_lz_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s) {
	; CHECK-LABEL: @image_sample_a16_lz_1d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_lz_1d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.lz.1d.v4f32.f16(i32 15, half [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.lz.1d.v4f32.f16(i32 15, half [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.lz.1d.v4f32.f32(i32 15, float %s32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.lz.1d.v4f32.f32(i32 15, float %s32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_lz_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t) {			define amdgpu_kernel void @image_sample_a16_lz_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t) {
	; CHECK-LABEL: @image_sample_a16_lz_2d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_lz_2d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.lz.2d.v4f32.f16(i32 15, half [[S:%.*]], half [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[S:%.*]], half [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.lz.2d.v4f32.f16(i32 15, half [[S]], half [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%t32 = fpext half %t to float			%t32 = fpext half %t to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.lz.2d.v4f32.f32(i32 15, float %s32, float %t32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.lz.2d.v4f32.f32(i32 15, float %s32, float %t32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_c_lz_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s) {			define amdgpu_kernel void @image_sample_a16_c_lz_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s) {
	; CHECK-LABEL: @image_sample_a16_c_lz_1d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_c_lz_1d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.lz.1d.v4f32.f16(i32 15, float [[ZCOMPARE:%.*]], half [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], half [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.lz.1d.v4f32.f16(i32 15, float [[ZCOMPARE]], half [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.c.lz.1d.v4f32.f32(i32 15, float %zcompare, float %s32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.c.lz.1d.v4f32.f32(i32 15, float %zcompare, float %s32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_c_lz_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %t) {			define amdgpu_kernel void @image_sample_a16_c_lz_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %s, half %t) {
	; CHECK-LABEL: @image_sample_a16_c_lz_2d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_c_lz_2d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.lz.2d.v4f32.f16(i32 15, float [[ZCOMPARE:%.*]], half [[S:%.*]], half [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], half [[S:%.*]], half [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.lz.2d.v4f32.f16(i32 15, float [[ZCOMPARE]], half [[S]], half [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%t32 = fpext half %t to float			%t32 = fpext half %t to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.c.lz.2d.v4f32.f32(i32 15, float %zcompare, float %s32, float %t32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.c.lz.2d.v4f32.f32(i32 15, float %zcompare, float %s32, float %t32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_c_d_o_2darray_V1(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, half %slice) {			define amdgpu_kernel void @image_sample_a16_c_d_o_2darray_V1(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, half %slice) {
	; CHECK-LABEL: @image_sample_a16_c_d_o_2darray_V1(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_c_d_o_2darray_V1
	; CHECK-NEXT: [[RES:%.*]] = call float @llvm.amdgcn.image.sample.c.d.o.2darray.f32.f16.f16(i32 4, i32 [[OFFSET:%.*]], float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], half [[S:%.*]], half [[T:%.*]], half [[SLICE:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], i32 [[OFFSET:%.*]], float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], half [[S:%.*]], half [[T:%.*]], half [[SLICE:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store float [[RES]], ptr addrspace(1) [[OUT:%.*]], align 4			; CHECK-NEXT: [[RES:%.*]] = call float @llvm.amdgcn.image.sample.c.d.o.2darray.f32.f16.f16(i32 4, i32 [[OFFSET]], float [[ZCOMPARE]], half [[DSDH]], half [[DTDH]], half [[DSDV]], half [[DTDV]], half [[S]], half [[T]], half [[SLICE]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store float [[RES]], ptr addrspace(1) [[OUT]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dtdh32 = fpext half %dtdh to float			%dtdh32 = fpext half %dtdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%dtdv32 = fpext half %dtdv to float			%dtdv32 = fpext half %dtdv to float
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%t32 = fpext half %t to float			%t32 = fpext half %t to float
	%slice32 = fpext half %slice to float			%slice32 = fpext half %slice to float
	%res = call float @llvm.amdgcn.image.sample.c.d.o.2darray.f32.f32.f32(i32 4, i32 %offset, float %zcompare, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s32, float %t32, float %slice32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call float @llvm.amdgcn.image.sample.c.d.o.2darray.f32.f32.f32(i32 4, i32 %offset, float %zcompare, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s32, float %t32, float %slice32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store float %res, ptr addrspace(1) %out			store float %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_c_d_o_2darray_V2(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, half %slice) {			define amdgpu_kernel void @image_sample_a16_c_d_o_2darray_V2(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %t, half %slice) {
	; CHECK-LABEL: @image_sample_a16_c_d_o_2darray_V2(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_c_d_o_2darray_V2
	; CHECK-NEXT: [[RES:%.*]] = call <2 x float> @llvm.amdgcn.image.sample.c.d.o.2darray.v2f32.f16.f16(i32 6, i32 [[OFFSET:%.*]], float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], half [[S:%.*]], half [[T:%.*]], half [[SLICE:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], i32 [[OFFSET:%.*]], float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], half [[S:%.*]], half [[T:%.*]], half [[SLICE:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <2 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 8			; CHECK-NEXT: [[RES:%.*]] = call <2 x float> @llvm.amdgcn.image.sample.c.d.o.2darray.v2f32.f16.f16(i32 6, i32 [[OFFSET]], float [[ZCOMPARE]], half [[DSDH]], half [[DTDH]], half [[DSDV]], half [[DTDV]], half [[S]], half [[T]], half [[SLICE]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <2 x float> [[RES]], ptr addrspace(1) [[OUT]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dtdh32 = fpext half %dtdh to float			%dtdh32 = fpext half %dtdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%dtdv32 = fpext half %dtdv to float			%dtdv32 = fpext half %dtdv to float
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%t32 = fpext half %t to float			%t32 = fpext half %t to float
	%slice32 = fpext half %slice to float			%slice32 = fpext half %slice to float
	%res = call <2 x float> @llvm.amdgcn.image.sample.c.d.o.2darray.v2f32.f32.f32(i32 6, i32 %offset, float %zcompare, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s32, float %t32, float %slice32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <2 x float> @llvm.amdgcn.image.sample.c.d.o.2darray.v2f32.f32.f32(i32 6, i32 %offset, float %zcompare, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s32, float %t32, float %slice32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <2 x float> %res, ptr addrspace(1) %out			store <2 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_c_d_o_2darray_const(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %slice) {			define amdgpu_kernel void @image_sample_a16_c_d_o_2darray_const(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %slice) {
	; CHECK-LABEL: @image_sample_a16_c_d_o_2darray_const(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_c_d_o_2darray_const
	; CHECK-NEXT: [[RES:%.*]] = call <2 x float> @llvm.amdgcn.image.sample.c.d.o.2darray.v2f32.f16.f16(i32 6, i32 [[OFFSET:%.*]], float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], half [[S:%.*]], half 0xH3400, half [[SLICE:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], i32 [[OFFSET:%.*]], float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], half [[S:%.*]], half [[SLICE:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <2 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 8			; CHECK-NEXT: [[RES:%.*]] = call <2 x float> @llvm.amdgcn.image.sample.c.d.o.2darray.v2f32.f16.f16(i32 6, i32 [[OFFSET]], float [[ZCOMPARE]], half [[DSDH]], half [[DTDH]], half [[DSDV]], half [[DTDV]], half [[S]], half 0xH3400, half [[SLICE]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <2 x float> [[RES]], ptr addrspace(1) [[OUT]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dtdh32 = fpext half %dtdh to float			%dtdh32 = fpext half %dtdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%dtdv32 = fpext half %dtdv to float			%dtdv32 = fpext half %dtdv to float
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%slice32 = fpext half %slice to float			%slice32 = fpext half %slice to float
	%res = call <2 x float> @llvm.amdgcn.image.sample.c.d.o.2darray.v2f32.f32.f32(i32 6, i32 %offset, float %zcompare, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s32, float 0.25, float %slice32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <2 x float> @llvm.amdgcn.image.sample.c.d.o.2darray.v2f32.f32.f32(i32 6, i32 %offset, float %zcompare, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s32, float 0.25, float %slice32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <2 x float> %res, ptr addrspace(1) %out			store <2 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_c_d_o_2darray_const_noopt(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %slice) {			define amdgpu_kernel void @image_sample_a16_c_d_o_2darray_const_noopt(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, half %s, half %slice) {
	; CHECK-LABEL: @image_sample_a16_c_d_o_2darray_const_noopt(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_c_d_o_2darray_const_noopt
	; CHECK-NEXT: [[S32:%.*]] = fpext half [[S:%.*]] to float			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], i32 [[OFFSET:%.*]], float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], half [[S:%.*]], half [[SLICE:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: [[SLICE32:%.*]] = fpext half [[SLICE:%.*]] to float			; CHECK-NEXT: [[S32:%.*]] = fpext half [[S]] to float
	; CHECK-NEXT: [[RES:%.*]] = call <2 x float> @llvm.amdgcn.image.sample.c.d.o.2darray.v2f32.f16.f32(i32 6, i32 [[OFFSET:%.*]], float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], float [[S32]], float 1.000000e+10, float [[SLICE32]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[SLICE32:%.*]] = fpext half [[SLICE]] to float
	; CHECK-NEXT: store <2 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 8			; CHECK-NEXT: [[RES:%.*]] = call <2 x float> @llvm.amdgcn.image.sample.c.d.o.2darray.v2f32.f16.f32(i32 6, i32 [[OFFSET]], float [[ZCOMPARE]], half [[DSDH]], half [[DTDH]], half [[DSDV]], half [[DTDV]], float [[S32]], float 1.000000e+10, float [[SLICE32]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <2 x float> [[RES]], ptr addrspace(1) [[OUT]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dtdh32 = fpext half %dtdh to float			%dtdh32 = fpext half %dtdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%dtdv32 = fpext half %dtdv to float			%dtdv32 = fpext half %dtdv to float
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%slice32 = fpext half %slice to float			%slice32 = fpext half %slice to float
	%res = call <2 x float> @llvm.amdgcn.image.sample.c.d.o.2darray.v2f32.f32.f32(i32 6, i32 %offset, float %zcompare, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s32, float 1.0e+10, float %slice32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <2 x float> @llvm.amdgcn.image.sample.c.d.o.2darray.v2f32.f32.f32(i32 6, i32 %offset, float %zcompare, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s32, float 1.0e+10, float %slice32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <2 x float> %res, ptr addrspace(1) %out			store <2 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_load_a16_mip_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, i16 %s) {			define amdgpu_kernel void @image_load_a16_mip_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, i16 %s) {
	; CHECK-LABEL: @image_load_a16_mip_1d(			; CHECK-LABEL: define amdgpu_kernel void @image_load_a16_mip_1d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32 15, i16 [[S:%.*]], <8 x i32> [[RSRC:%.*]], i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], i16 [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i16(i32 15, i16 [[S]], <8 x i32> [[RSRC]], i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = zext i16 %s to i32			%s32 = zext i16 %s to i32
	%res = call <4 x float> @llvm.amdgcn.image.load.mip.1d.v4f32.i32(i32 15, i32 %s32, i32 0, <8 x i32> %rsrc, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.load.mip.1d.v4f32.i32(i32 15, i32 %s32, i32 0, <8 x i32> %rsrc, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_load_a16_mip_1d_noopt(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, i16 %s) {			define amdgpu_kernel void @image_load_a16_mip_1d_noopt(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, i16 %s) {
	; CHECK-LABEL: @image_load_a16_mip_1d_noopt(			; CHECK-LABEL: define amdgpu_kernel void @image_load_a16_mip_1d_noopt
	; CHECK-NEXT: [[S32:%.*]] = sext i16 [[S:%.*]] to i32			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], i16 [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i32(i32 15, i32 [[S32]], <8 x i32> [[RSRC:%.*]], i32 0, i32 0)			; CHECK-NEXT: [[S32:%.*]] = sext i16 [[S]] to i32
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i32(i32 15, i32 [[S32]], <8 x i32> [[RSRC]], i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = sext i16 %s to i32			%s32 = sext i16 %s to i32
	%res = call <4 x float> @llvm.amdgcn.image.load.mip.1d.v4f32.i32(i32 15, i32 %s32, i32 0, <8 x i32> %rsrc, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.load.mip.1d.v4f32.i32(i32 15, i32 %s32, i32 0, <8 x i32> %rsrc, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_load_a16_mip_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, i16 %s, i16 %t) {			define amdgpu_kernel void @image_load_a16_mip_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, i16 %s, i16 %t) {
	; CHECK-LABEL: @image_load_a16_mip_2d(			; CHECK-LABEL: define amdgpu_kernel void @image_load_a16_mip_2d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.load.2d.v4f32.i16(i32 15, i16 [[S:%.*]], i16 [[T:%.*]], <8 x i32> [[RSRC:%.*]], i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], i16 [[S:%.*]], i16 [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.load.2d.v4f32.i16(i32 15, i16 [[S]], i16 [[T]], <8 x i32> [[RSRC]], i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = zext i16 %s to i32			%s32 = zext i16 %s to i32
	%t32 = zext i16 %t to i32			%t32 = zext i16 %t to i32
	%res = call <4 x float> @llvm.amdgcn.image.load.mip.2d.v4f32.i32(i32 15, i32 %s32, i32 %t32, i32 0, <8 x i32> %rsrc, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.load.mip.2d.v4f32.i32(i32 15, i32 %s32, i32 %t32, i32 0, <8 x i32> %rsrc, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_load_a16_mip_2d_const(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, i16 %s) {			define amdgpu_kernel void @image_load_a16_mip_2d_const(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, i16 %s) {
	; CHECK-LABEL: @image_load_a16_mip_2d_const(			; CHECK-LABEL: define amdgpu_kernel void @image_load_a16_mip_2d_const
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.load.2d.v4f32.i16(i32 15, i16 [[S:%.*]], i16 -1, <8 x i32> [[RSRC:%.*]], i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], i16 [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.load.2d.v4f32.i16(i32 15, i16 [[S]], i16 -1, <8 x i32> [[RSRC]], i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = zext i16 %s to i32			%s32 = zext i16 %s to i32
	%res = call <4 x float> @llvm.amdgcn.image.load.mip.2d.v4f32.i32(i32 15, i32 %s32, i32 65535, i32 0, <8 x i32> %rsrc, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.load.mip.2d.v4f32.i32(i32 15, i32 %s32, i32 65535, i32 0, <8 x i32> %rsrc, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_load_a16_mip_2d_const_noopt(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, i16 %s) {			define amdgpu_kernel void @image_load_a16_mip_2d_const_noopt(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, i16 %s) {
	; CHECK-LABEL: @image_load_a16_mip_2d_const_noopt(			; CHECK-LABEL: define amdgpu_kernel void @image_load_a16_mip_2d_const_noopt
	; CHECK-NEXT: [[S32:%.*]] = zext i16 [[S:%.*]] to i32			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], i16 [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.load.2d.v4f32.i32(i32 15, i32 [[S32]], i32 65536, <8 x i32> [[RSRC:%.*]], i32 0, i32 0)			; CHECK-NEXT: [[S32:%.*]] = zext i16 [[S]] to i32
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.load.2d.v4f32.i32(i32 15, i32 [[S32]], i32 65536, <8 x i32> [[RSRC]], i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = zext i16 %s to i32			%s32 = zext i16 %s to i32
	%res = call <4 x float> @llvm.amdgcn.image.load.mip.2d.v4f32.i32(i32 15, i32 %s32, i32 65536, i32 0, <8 x i32> %rsrc, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.load.mip.2d.v4f32.i32(i32 15, i32 %s32, i32 65536, i32 0, <8 x i32> %rsrc, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	; llvm.amdgcn.image.sample g16			; llvm.amdgcn.image.sample g16
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	define amdgpu_kernel void @image_sample_g16_d_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dsdv, float %s) {			define amdgpu_kernel void @image_sample_g16_d_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dsdv, float %s) {
	; CHECK-LABEL: @image_sample_g16_d_1d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_g16_d_1d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.d.1d.v4f32.f16.f32(i32 15, half [[DSDH:%.*]], half [[DSDV:%.*]], float [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[DSDH:%.*]], half [[DSDV:%.*]], float [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.d.1d.v4f32.f16.f32(i32 15, half [[DSDH]], half [[DSDV]], float [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.d.1d.v4f32.f32.f32(i32 15, float %dsdh32, float %dsdv32, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.d.1d.v4f32.f32.f32(i32 15, float %dsdh32, float %dsdv32, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_g16_d_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dtdh, half %dsdv, half %dtdv, float %s, float %t) {			define amdgpu_kernel void @image_sample_g16_d_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dtdh, half %dsdv, half %dtdv, float %s, float %t) {
	; CHECK-LABEL: @image_sample_g16_d_2d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_g16_d_2d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.d.2d.v4f32.f16.f32(i32 15, half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], float [[S:%.*]], float [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], float [[S:%.*]], float [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.d.2d.v4f32.f16.f32(i32 15, half [[DSDH]], half [[DTDH]], half [[DSDV]], half [[DTDV]], float [[S]], float [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dtdh32 = fpext half %dtdh to float			%dtdh32 = fpext half %dtdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%dtdv32 = fpext half %dtdv to float			%dtdv32 = fpext half %dtdv to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.d.2d.v4f32.f32.f32(i32 15, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.d.2d.v4f32.f32.f32(i32 15, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_g16_d_3d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dtdh, half %drdh, half %dsdv, half %dtdv, half %drdv, float %s, float %t, float %r) {			define amdgpu_kernel void @image_sample_g16_d_3d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dtdh, half %drdh, half %dsdv, half %dtdv, half %drdv, float %s, float %t, float %r) {
	; CHECK-LABEL: @image_sample_g16_d_3d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_g16_d_3d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.d.3d.v4f32.f16.f32(i32 15, half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DRDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], half [[DRDV:%.*]], float [[S:%.*]], float [[T:%.*]], float [[R:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DRDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], half [[DRDV:%.*]], float [[S:%.*]], float [[T:%.*]], float [[R:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.d.3d.v4f32.f16.f32(i32 15, half [[DSDH]], half [[DTDH]], half [[DRDH]], half [[DSDV]], half [[DTDV]], half [[DRDV]], float [[S]], float [[T]], float [[R]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dtdh32 = fpext half %dtdh to float			%dtdh32 = fpext half %dtdh to float
	%drdh32 = fpext half %drdh to float			%drdh32 = fpext half %drdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%dtdv32 = fpext half %dtdv to float			%dtdv32 = fpext half %dtdv to float
	%drdv32 = fpext half %drdv to float			%drdv32 = fpext half %drdv to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.d.3d.v4f32.f32.f32(i32 15, float %dsdh32, float %dtdh32, float %drdh32, float %dsdv32, float %dtdv32, float %drdv32, float %s, float %t, float %r, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.d.3d.v4f32.f32.f32(i32 15, float %dsdh32, float %dtdh32, float %drdh32, float %dsdv32, float %dtdv32, float %drdv32, float %s, float %t, float %r, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_g16_c_d_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dsdv, float %s) {			define amdgpu_kernel void @image_sample_g16_c_d_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dsdv, float %s) {
	; CHECK-LABEL: @image_sample_g16_c_d_1d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_g16_c_d_1d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.d.1d.v4f32.f16.f32(i32 15, float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DSDV:%.*]], float [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DSDV:%.*]], float [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.d.1d.v4f32.f16.f32(i32 15, float [[ZCOMPARE]], half [[DSDH]], half [[DSDV]], float [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.c.d.1d.v4f32.f32.f32(i32 15, float %zcompare, float %dsdh32, float %dsdv32, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.c.d.1d.v4f32.f32.f32(i32 15, float %zcompare, float %dsdh32, float %dsdv32, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_g16_c_d_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, float %s, float %t) {			define amdgpu_kernel void @image_sample_g16_c_d_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, float %s, float %t) {
	; CHECK-LABEL: @image_sample_g16_c_d_2d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_g16_c_d_2d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.d.2d.v4f32.f16.f32(i32 15, float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], float [[S:%.*]], float [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], float [[S:%.*]], float [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.d.2d.v4f32.f16.f32(i32 15, float [[ZCOMPARE]], half [[DSDH]], half [[DTDH]], half [[DSDV]], half [[DTDV]], float [[S]], float [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dtdh32 = fpext half %dtdh to float			%dtdh32 = fpext half %dtdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%dtdv32 = fpext half %dtdv to float			%dtdv32 = fpext half %dtdv to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.c.d.2d.v4f32.f32.f32(i32 15, float %zcompare, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.c.d.2d.v4f32.f32.f32(i32 15, float %zcompare, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_g16_d_cl_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dsdv, float %s, float %clamp) {			define amdgpu_kernel void @image_sample_g16_d_cl_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dsdv, float %s, float %clamp) {
	; CHECK-LABEL: @image_sample_g16_d_cl_1d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_g16_d_cl_1d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.d.cl.1d.v4f32.f16.f32(i32 15, half [[DSDH:%.*]], half [[DSDV:%.*]], float [[S:%.*]], float [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[DSDH:%.*]], half [[DSDV:%.*]], float [[S:%.*]], float [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.d.cl.1d.v4f32.f16.f32(i32 15, half [[DSDH]], half [[DSDV]], float [[S]], float [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.d.cl.1d.v4f32.f32.f32(i32 15, float %dsdh32, float %dsdv32, float %s, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.d.cl.1d.v4f32.f32.f32(i32 15, float %dsdh32, float %dsdv32, float %s, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_g16_d_cl_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dtdh, half %dsdv, half %dtdv, float %s, float %t, float %clamp) {			define amdgpu_kernel void @image_sample_g16_d_cl_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dtdh, half %dsdv, half %dtdv, float %s, float %t, float %clamp) {
	; CHECK-LABEL: @image_sample_g16_d_cl_2d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_g16_d_cl_2d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.d.cl.2d.v4f32.f16.f32(i32 15, half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], float [[S:%.*]], float [[T:%.*]], float [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], float [[S:%.*]], float [[T:%.*]], float [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.d.cl.2d.v4f32.f16.f32(i32 15, half [[DSDH]], half [[DTDH]], half [[DSDV]], half [[DTDV]], float [[S]], float [[T]], float [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dtdh32 = fpext half %dtdh to float			%dtdh32 = fpext half %dtdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%dtdv32 = fpext half %dtdv to float			%dtdv32 = fpext half %dtdv to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.d.cl.2d.v4f32.f32.f32(i32 15, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.d.cl.2d.v4f32.f32.f32(i32 15, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_g16_c_d_cl_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dsdv, float %s, float %clamp) {			define amdgpu_kernel void @image_sample_g16_c_d_cl_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dsdv, float %s, float %clamp) {
	; CHECK-LABEL: @image_sample_g16_c_d_cl_1d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_g16_c_d_cl_1d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.d.cl.1d.v4f32.f16.f32(i32 15, float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DSDV:%.*]], float [[S:%.*]], float [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DSDV:%.*]], float [[S:%.*]], float [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.d.cl.1d.v4f32.f16.f32(i32 15, float [[ZCOMPARE]], half [[DSDH]], half [[DSDV]], float [[S]], float [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.c.d.cl.1d.v4f32.f32.f32(i32 15, float %zcompare, float %dsdh32, float %dsdv32, float %s, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.c.d.cl.1d.v4f32.f32.f32(i32 15, float %zcompare, float %dsdh32, float %dsdv32, float %s, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_g16_c_d_cl_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, float %s, float %t, float %clamp) {			define amdgpu_kernel void @image_sample_g16_c_d_cl_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, float %s, float %t, float %clamp) {
	; CHECK-LABEL: @image_sample_g16_c_d_cl_2d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_g16_c_d_cl_2d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.d.cl.2d.v4f32.f16.f32(i32 15, float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], float [[S:%.*]], float [[T:%.*]], float [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], float [[S:%.*]], float [[T:%.*]], float [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.d.cl.2d.v4f32.f16.f32(i32 15, float [[ZCOMPARE]], half [[DSDH]], half [[DTDH]], half [[DSDV]], half [[DTDV]], float [[S]], float [[T]], float [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dtdh32 = fpext half %dtdh to float			%dtdh32 = fpext half %dtdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%dtdv32 = fpext half %dtdv to float			%dtdv32 = fpext half %dtdv to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.c.d.cl.2d.v4f32.f32.f32(i32 15, float %zcompare, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.c.d.cl.2d.v4f32.f32.f32(i32 15, float %zcompare, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_g16_cd_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dsdv, float %s) {			define amdgpu_kernel void @image_sample_g16_cd_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dsdv, float %s) {
	; CHECK-LABEL: @image_sample_g16_cd_1d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_g16_cd_1d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.cd.1d.v4f32.f16.f32(i32 15, half [[DSDH:%.*]], half [[DSDV:%.*]], float [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[DSDH:%.*]], half [[DSDV:%.*]], float [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.cd.1d.v4f32.f16.f32(i32 15, half [[DSDH]], half [[DSDV]], float [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.cd.1d.v4f32.f32.f32(i32 15, float %dsdh32, float %dsdv32, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.cd.1d.v4f32.f32.f32(i32 15, float %dsdh32, float %dsdv32, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_g16_cd_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dtdh, half %dsdv, half %dtdv, float %s, float %t) {			define amdgpu_kernel void @image_sample_g16_cd_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dtdh, half %dsdv, half %dtdv, float %s, float %t) {
	; CHECK-LABEL: @image_sample_g16_cd_2d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_g16_cd_2d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.cd.2d.v4f32.f16.f32(i32 15, half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], float [[S:%.*]], float [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], float [[S:%.*]], float [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.cd.2d.v4f32.f16.f32(i32 15, half [[DSDH]], half [[DTDH]], half [[DSDV]], half [[DTDV]], float [[S]], float [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dtdh32 = fpext half %dtdh to float			%dtdh32 = fpext half %dtdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%dtdv32 = fpext half %dtdv to float			%dtdv32 = fpext half %dtdv to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.cd.2d.v4f32.f32.f32(i32 15, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.cd.2d.v4f32.f32.f32(i32 15, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_g16_c_cd_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dsdv, float %s) {			define amdgpu_kernel void @image_sample_g16_c_cd_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dsdv, float %s) {
	; CHECK-LABEL: @image_sample_g16_c_cd_1d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_g16_c_cd_1d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.cd.1d.v4f32.f16.f32(i32 15, float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DSDV:%.*]], float [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DSDV:%.*]], float [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.cd.1d.v4f32.f16.f32(i32 15, float [[ZCOMPARE]], half [[DSDH]], half [[DSDV]], float [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.c.cd.1d.v4f32.f32.f32(i32 15, float %zcompare, float %dsdh32, float %dsdv32, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.c.cd.1d.v4f32.f32.f32(i32 15, float %zcompare, float %dsdh32, float %dsdv32, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_g16_c_cd_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, float %s, float %t) {			define amdgpu_kernel void @image_sample_g16_c_cd_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, float %s, float %t) {
	; CHECK-LABEL: @image_sample_g16_c_cd_2d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_g16_c_cd_2d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.cd.2d.v4f32.f16.f32(i32 15, float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], float [[S:%.*]], float [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], float [[S:%.*]], float [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.cd.2d.v4f32.f16.f32(i32 15, float [[ZCOMPARE]], half [[DSDH]], half [[DTDH]], half [[DSDV]], half [[DTDV]], float [[S]], float [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dtdh32 = fpext half %dtdh to float			%dtdh32 = fpext half %dtdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%dtdv32 = fpext half %dtdv to float			%dtdv32 = fpext half %dtdv to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.c.cd.2d.v4f32.f32.f32(i32 15, float %zcompare, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.c.cd.2d.v4f32.f32.f32(i32 15, float %zcompare, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_g16_cd_cl_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dsdv, float %s, float %clamp) {			define amdgpu_kernel void @image_sample_g16_cd_cl_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dsdv, float %s, float %clamp) {
	; CHECK-LABEL: @image_sample_g16_cd_cl_1d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_g16_cd_cl_1d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.cd.cl.1d.v4f32.f16.f32(i32 15, half [[DSDH:%.*]], half [[DSDV:%.*]], float [[S:%.*]], float [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[DSDH:%.*]], half [[DSDV:%.*]], float [[S:%.*]], float [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.cd.cl.1d.v4f32.f16.f32(i32 15, half [[DSDH]], half [[DSDV]], float [[S]], float [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.cd.cl.1d.v4f32.f32.f32(i32 15, float %dsdh32, float %dsdv32, float %s, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.cd.cl.1d.v4f32.f32.f32(i32 15, float %dsdh32, float %dsdv32, float %s, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_g16_cd_cl_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dtdh, half %dsdv, half %dtdv, float %s, float %t, float %clamp) {			define amdgpu_kernel void @image_sample_g16_cd_cl_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %dsdh, half %dtdh, half %dsdv, half %dtdv, float %s, float %t, float %clamp) {
	; CHECK-LABEL: @image_sample_g16_cd_cl_2d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_g16_cd_cl_2d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.cd.cl.2d.v4f32.f16.f32(i32 15, half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], float [[S:%.*]], float [[T:%.*]], float [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], float [[S:%.*]], float [[T:%.*]], float [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.cd.cl.2d.v4f32.f16.f32(i32 15, half [[DSDH]], half [[DTDH]], half [[DSDV]], half [[DTDV]], float [[S]], float [[T]], float [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dtdh32 = fpext half %dtdh to float			%dtdh32 = fpext half %dtdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%dtdv32 = fpext half %dtdv to float			%dtdv32 = fpext half %dtdv to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.cd.cl.2d.v4f32.f32.f32(i32 15, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.cd.cl.2d.v4f32.f32.f32(i32 15, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_g16_c_cd_cl_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dsdv, float %s, float %clamp) {			define amdgpu_kernel void @image_sample_g16_c_cd_cl_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dsdv, float %s, float %clamp) {
	; CHECK-LABEL: @image_sample_g16_c_cd_cl_1d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_g16_c_cd_cl_1d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.cd.cl.1d.v4f32.f16.f32(i32 15, float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DSDV:%.*]], float [[S:%.*]], float [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DSDV:%.*]], float [[S:%.*]], float [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.cd.cl.1d.v4f32.f16.f32(i32 15, float [[ZCOMPARE]], half [[DSDH]], half [[DSDV]], float [[S]], float [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.c.cd.cl.1d.v4f32.f32.f32(i32 15, float %zcompare, float %dsdh32, float %dsdv32, float %s, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.c.cd.cl.1d.v4f32.f32.f32(i32 15, float %zcompare, float %dsdh32, float %dsdv32, float %s, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_g16_c_cd_cl_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, float %s, float %t, float %clamp) {			define amdgpu_kernel void @image_sample_g16_c_cd_cl_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, float %s, float %t, float %clamp) {
	; CHECK-LABEL: @image_sample_g16_c_cd_cl_2d(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_g16_c_cd_cl_2d
	; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.cd.cl.2d.v4f32.f16.f32(i32 15, float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], float [[S:%.*]], float [[T:%.*]], float [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], float [[S:%.*]], float [[T:%.*]], float [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.cd.cl.2d.v4f32.f16.f32(i32 15, float [[ZCOMPARE]], half [[DSDH]], half [[DTDH]], half [[DSDV]], half [[DTDV]], float [[S]], float [[T]], float [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dtdh32 = fpext half %dtdh to float			%dtdh32 = fpext half %dtdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%dtdv32 = fpext half %dtdv to float			%dtdv32 = fpext half %dtdv to float
	%res = call <4 x float> @llvm.amdgcn.image.sample.c.cd.cl.2d.v4f32.f32.f32(i32 15, float %zcompare, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <4 x float> @llvm.amdgcn.image.sample.c.cd.cl.2d.v4f32.f32.f32(i32 15, float %zcompare, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_g16_c_d_o_2darray_V1(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, float %s, float %t, float %slice) {			define amdgpu_kernel void @image_sample_g16_c_d_o_2darray_V1(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, float %s, float %t, float %slice) {
	; CHECK-LABEL: @image_sample_g16_c_d_o_2darray_V1(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_g16_c_d_o_2darray_V1
	; CHECK-NEXT: [[RES:%.*]] = call float @llvm.amdgcn.image.sample.c.d.o.2darray.f32.f16.f32(i32 4, i32 [[OFFSET:%.*]], float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], float [[S:%.*]], float [[T:%.*]], float [[SLICE:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], i32 [[OFFSET:%.*]], float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], float [[S:%.*]], float [[T:%.*]], float [[SLICE:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store float [[RES]], ptr addrspace(1) [[OUT:%.*]], align 4			; CHECK-NEXT: [[RES:%.*]] = call float @llvm.amdgcn.image.sample.c.d.o.2darray.f32.f16.f32(i32 4, i32 [[OFFSET]], float [[ZCOMPARE]], half [[DSDH]], half [[DTDH]], half [[DSDV]], half [[DTDV]], float [[S]], float [[T]], float [[SLICE]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store float [[RES]], ptr addrspace(1) [[OUT]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dtdh32 = fpext half %dtdh to float			%dtdh32 = fpext half %dtdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%dtdv32 = fpext half %dtdv to float			%dtdv32 = fpext half %dtdv to float
	%res = call float @llvm.amdgcn.image.sample.c.d.o.2darray.f32.f32.f32(i32 4, i32 %offset, float %zcompare, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s, float %t, float %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call float @llvm.amdgcn.image.sample.c.d.o.2darray.f32.f32.f32(i32 4, i32 %offset, float %zcompare, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s, float %t, float %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store float %res, ptr addrspace(1) %out			store float %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_g16_c_d_o_2darray_V2(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, float %s, float %t, float %slice) {			define amdgpu_kernel void @image_sample_g16_c_d_o_2darray_V2(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, half %dsdh, half %dtdh, half %dsdv, half %dtdv, float %s, float %t, float %slice) {
	; CHECK-LABEL: @image_sample_g16_c_d_o_2darray_V2(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_g16_c_d_o_2darray_V2
	; CHECK-NEXT: [[RES:%.*]] = call <2 x float> @llvm.amdgcn.image.sample.c.d.o.2darray.v2f32.f16.f32(i32 6, i32 [[OFFSET:%.*]], float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], float [[S:%.*]], float [[T:%.*]], float [[SLICE:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], i32 [[OFFSET:%.*]], float [[ZCOMPARE:%.*]], half [[DSDH:%.*]], half [[DTDH:%.*]], half [[DSDV:%.*]], half [[DTDV:%.*]], float [[S:%.*]], float [[T:%.*]], float [[SLICE:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <2 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 8			; CHECK-NEXT: [[RES:%.*]] = call <2 x float> @llvm.amdgcn.image.sample.c.d.o.2darray.v2f32.f16.f32(i32 6, i32 [[OFFSET]], float [[ZCOMPARE]], half [[DSDH]], half [[DTDH]], half [[DSDV]], half [[DTDV]], float [[S]], float [[T]], float [[SLICE]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <2 x float> [[RES]], ptr addrspace(1) [[OUT]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%dsdh32 = fpext half %dsdh to float			%dsdh32 = fpext half %dsdh to float
	%dtdh32 = fpext half %dtdh to float			%dtdh32 = fpext half %dtdh to float
	%dsdv32 = fpext half %dsdv to float			%dsdv32 = fpext half %dsdv to float
	%dtdv32 = fpext half %dtdv to float			%dtdv32 = fpext half %dtdv to float
	%res = call <2 x float> @llvm.amdgcn.image.sample.c.d.o.2darray.v2f32.f32.f32(i32 6, i32 %offset, float %zcompare, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s, float %t, float %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call <2 x float> @llvm.amdgcn.image.sample.c.d.o.2darray.v2f32.f32.f32(i32 6, i32 %offset, float %zcompare, float %dsdh32, float %dtdh32, float %dsdv32, float %dtdv32, float %s, float %t, float %slice, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <2 x float> %res, ptr addrspace(1) %out			store <2 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	; llvm.amdgcn.image.sample a16 preserve fast-math flags			; llvm.amdgcn.image.sample a16 preserve fast-math flags
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	define amdgpu_kernel void @image_sample_a16_1d_nnan(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s) {			define amdgpu_kernel void @image_sample_a16_1d_nnan(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s) {
	; CHECK-LABEL: @image_sample_a16_1d_nnan(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_1d_nnan
	; CHECK-NEXT: [[RES:%.*]] = call nnan <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f16(i32 15, half [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call nnan <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f16(i32 15, half [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%res = call nnan <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %s32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call nnan <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %s32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_1d_nnan_ninf_nsz(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s) {			define amdgpu_kernel void @image_sample_a16_1d_nnan_ninf_nsz(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s) {
	; CHECK-LABEL: @image_sample_a16_1d_nnan_ninf_nsz(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_1d_nnan_ninf_nsz
	; CHECK-NEXT: [[RES:%.*]] = call nnan ninf nsz <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f16(i32 15, half [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call nnan ninf nsz <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f16(i32 15, half [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%res = call nnan ninf nsz <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %s32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call nnan ninf nsz <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %s32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_1d_fast(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s) {			define amdgpu_kernel void @image_sample_a16_1d_fast(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s) {
	; CHECK-LABEL: @image_sample_a16_1d_fast(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_1d_fast
	; CHECK-NEXT: [[RES:%.*]] = call fast <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f16(i32 15, half [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call fast <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f16(i32 15, half [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%res = call fast <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %s32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call fast <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float %s32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_2d_nnan(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t) {			define amdgpu_kernel void @image_sample_a16_2d_nnan(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t) {
	; CHECK-LABEL: @image_sample_a16_2d_nnan(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_2d_nnan
	; CHECK-NEXT: [[RES:%.*]] = call nnan <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f16(i32 15, half [[S:%.*]], half [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[S:%.*]], half [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call nnan <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f16(i32 15, half [[S]], half [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%t32 = fpext half %t to float			%t32 = fpext half %t to float
	%res = call nnan <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f32(i32 15, float %s32, float %t32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call nnan <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f32(i32 15, float %s32, float %t32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_3d_nnan(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %r) {			define amdgpu_kernel void @image_sample_a16_3d_nnan(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %r) {
	; CHECK-LABEL: @image_sample_a16_3d_nnan(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_3d_nnan
	; CHECK-NEXT: [[RES:%.*]] = call nnan <4 x float> @llvm.amdgcn.image.sample.3d.v4f32.f16(i32 15, half [[S:%.*]], half [[T:%.*]], half [[R:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[S:%.*]], half [[T:%.*]], half [[R:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call nnan <4 x float> @llvm.amdgcn.image.sample.3d.v4f32.f16(i32 15, half [[S]], half [[T]], half [[R]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%t32 = fpext half %t to float			%t32 = fpext half %t to float
	%r32 = fpext half %r to float			%r32 = fpext half %r to float
	%res = call nnan <4 x float> @llvm.amdgcn.image.sample.3d.v4f32.f32(i32 15, float %s32, float %t32, float %r32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call nnan <4 x float> @llvm.amdgcn.image.sample.3d.v4f32.f32(i32 15, float %s32, float %t32, float %r32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_cube_nnan(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %face) {			define amdgpu_kernel void @image_sample_a16_cube_nnan(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %face) {
	;			;
	; CHECK-LABEL: @image_sample_a16_cube_nnan(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_cube_nnan
	; CHECK-NEXT: [[RES:%.*]] = call nnan <4 x float> @llvm.amdgcn.image.sample.cube.v4f32.f16(i32 15, half [[S:%.*]], half [[T:%.*]], half [[FACE:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[S:%.*]], half [[T:%.*]], half [[FACE:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call nnan <4 x float> @llvm.amdgcn.image.sample.cube.v4f32.f16(i32 15, half [[S]], half [[T]], half [[FACE]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%t32 = fpext half %t to float			%t32 = fpext half %t to float
	%face32 = fpext half %face to float			%face32 = fpext half %face to float
	%res = call nnan <4 x float> @llvm.amdgcn.image.sample.cube.v4f32.f32(i32 15, float %s32, float %t32, float %face32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call nnan <4 x float> @llvm.amdgcn.image.sample.cube.v4f32.f32(i32 15, float %s32, float %t32, float %face32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_1darray_nnan(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %slice) {			define amdgpu_kernel void @image_sample_a16_1darray_nnan(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %slice) {
	; CHECK-LABEL: @image_sample_a16_1darray_nnan(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_1darray_nnan
	; CHECK-NEXT: [[RES:%.*]] = call nnan <4 x float> @llvm.amdgcn.image.sample.1darray.v4f32.f16(i32 15, half [[S:%.*]], half [[SLICE:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[S:%.*]], half [[SLICE:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call nnan <4 x float> @llvm.amdgcn.image.sample.1darray.v4f32.f16(i32 15, half [[S]], half [[SLICE]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%slice32 = fpext half %slice to float			%slice32 = fpext half %slice to float
	%res = call nnan <4 x float> @llvm.amdgcn.image.sample.1darray.v4f32.f32(i32 15, float %s32, float %slice32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call nnan <4 x float> @llvm.amdgcn.image.sample.1darray.v4f32.f32(i32 15, float %s32, float %slice32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @image_sample_a16_2darray_nnan(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %slice) {			define amdgpu_kernel void @image_sample_a16_2darray_nnan(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s, half %t, half %slice) {
	; CHECK-LABEL: @image_sample_a16_2darray_nnan(			; CHECK-LABEL: define amdgpu_kernel void @image_sample_a16_2darray_nnan
	; CHECK-NEXT: [[RES:%.*]] = call nnan <4 x float> @llvm.amdgcn.image.sample.2darray.v4f32.f16(i32 15, half [[S:%.*]], half [[T:%.*]], half [[SLICE:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[S:%.*]], half [[T:%.*]], half [[SLICE:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: [[RES:%.*]] = call nnan <4 x float> @llvm.amdgcn.image.sample.2darray.v4f32.f16(i32 15, half [[S]], half [[T]], half [[SLICE]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
				; CHECK-NEXT: store <4 x float> [[RES]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%t32 = fpext half %t to float			%t32 = fpext half %t to float
	%slice32 = fpext half %slice to float			%slice32 = fpext half %slice to float
	%res = call nnan <4 x float> @llvm.amdgcn.image.sample.2darray.v4f32.f32(i32 15, float %s32, float %t32, float %slice32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%res = call nnan <4 x float> @llvm.amdgcn.image.sample.2darray.v4f32.f32(i32 15, float %s32, float %t32, float %slice32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %res, ptr addrspace(1) %out			store <4 x float> %res, ptr addrspace(1) %out
	ret void			ret void
	; ... (10 unchanged lines not shown) ...			; ... (10 unchanged lines not shown) ...

	declare <4 x float> @llvm.amdgcn.image.gather4.l.2d.v4f32.f32(i32, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.gather4.l.2d.v4f32.f32(i32, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.gather4.c.l.2d.v4f32.f32(i32, float, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.gather4.c.l.2d.v4f32.f32(i32, float, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.gather4.l.o.2d.v4f32.f32(i32, i32, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.gather4.l.o.2d.v4f32.f32(i32, i32, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.gather4.c.l.o.2d.v4f32.f32(i32, i32, float, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.gather4.c.l.o.2d.v4f32.f32(i32, i32, float, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.gather4.c.l.o.2darray.v4f32.f32(i32, i32, float, float, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.gather4.c.l.o.2darray.v4f32.f32(i32, i32, float, float, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1

	define amdgpu_kernel void @sample_l_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %lod) {			define amdgpu_kernel void @sample_l_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %lod) {
	; CHECK-LABEL: @sample_l_1d(			; CHECK-LABEL: define amdgpu_kernel void @sample_l_1d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[S:%.*]], float [[LOD:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.lz.1d.v4f32.f32(i32 15, float [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.lz.1d.v4f32.f32(i32 15, float [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.l.1d.v4f32.f32(i32 15, float %s, float 0.0, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.l.1d.v4f32.f32(i32 15, float %s, float 0.0, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @sample_l_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t, float %lod) {			define amdgpu_kernel void @sample_l_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t, float %lod) {
	; CHECK-LABEL: @sample_l_2d(			; CHECK-LABEL: define amdgpu_kernel void @sample_l_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[S:%.*]], float [[T:%.*]], float [[LOD:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.lz.2d.v4f32.f32(i32 15, float [[S:%.*]], float [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.lz.2d.v4f32.f32(i32 15, float [[S]], float [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.l.2d.v4f32.f32(i32 15, float %s, float %t, float -0.0, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.l.2d.v4f32.f32(i32 15, float %s, float %t, float -0.0, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @sample_c_l_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %s, float %lod) {			define amdgpu_kernel void @sample_c_l_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %s, float %lod) {
	; CHECK-LABEL: @sample_c_l_1d(			; CHECK-LABEL: define amdgpu_kernel void @sample_c_l_1d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[LOD:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.lz.1d.v4f32.f32(i32 15, float [[ZCOMPARE:%.*]], float [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.lz.1d.v4f32.f32(i32 15, float [[ZCOMPARE]], float [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.l.1d.v4f32.f32(i32 15, float %zcompare, float %s, float -2.0, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.l.1d.v4f32.f32(i32 15, float %zcompare, float %s, float -2.0, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @sample_c_l_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %s, float %t, float %lod) {			define amdgpu_kernel void @sample_c_l_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %s, float %t, float %lod) {
	; CHECK-LABEL: @sample_c_l_2d(			; CHECK-LABEL: define amdgpu_kernel void @sample_c_l_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[T:%.*]], float [[LOD:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.lz.2d.v4f32.f32(i32 15, float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.lz.2d.v4f32.f32(i32 15, float [[ZCOMPARE]], float [[S]], float [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.l.2d.v4f32.f32(i32 15, float %zcompare, float %s, float %t, float 0.0, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.l.2d.v4f32.f32(i32 15, float %zcompare, float %s, float %t, float 0.0, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @sample_l_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %s, float %lod) {			define amdgpu_kernel void @sample_l_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %s, float %lod) {
	; CHECK-LABEL: @sample_l_o_1d(			; CHECK-LABEL: define amdgpu_kernel void @sample_l_o_1d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], i32 [[OFFSET:%.*]], float [[S:%.*]], float [[LOD:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.lz.o.1d.v4f32.f32(i32 15, i32 [[OFFSET:%.*]], float [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.lz.o.1d.v4f32.f32(i32 15, i32 [[OFFSET]], float [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.l.o.1d.v4f32.f32(i32 15, i32 %offset, float %s, float 0.0, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.l.o.1d.v4f32.f32(i32 15, i32 %offset, float %s, float 0.0, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @sample_l_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %s, float %t, float %lod) {			define amdgpu_kernel void @sample_l_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %s, float %t, float %lod) {
	; CHECK-LABEL: @sample_l_o_2d(			; CHECK-LABEL: define amdgpu_kernel void @sample_l_o_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], i32 [[OFFSET:%.*]], float [[S:%.*]], float [[T:%.*]], float [[LOD:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.lz.o.2d.v4f32.f32(i32 15, i32 [[OFFSET:%.*]], float [[S:%.*]], float [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.lz.o.2d.v4f32.f32(i32 15, i32 [[OFFSET]], float [[S]], float [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.l.o.2d.v4f32.f32(i32 15, i32 %offset, float %s, float %t, float 0.0, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.l.o.2d.v4f32.f32(i32 15, i32 %offset, float %s, float %t, float 0.0, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @sample_c_l_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, float %s, float %lod) {			define amdgpu_kernel void @sample_c_l_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, float %s, float %lod) {
	; CHECK-LABEL: @sample_c_l_o_1d(			; CHECK-LABEL: define amdgpu_kernel void @sample_c_l_o_1d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], i32 [[OFFSET:%.*]], float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[LOD:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.lz.o.1d.v4f32.f32(i32 15, i32 [[OFFSET:%.*]], float [[ZCOMPARE:%.*]], float [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.lz.o.1d.v4f32.f32(i32 15, i32 [[OFFSET]], float [[ZCOMPARE]], float [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.l.o.1d.v4f32.f32(i32 15, i32 %offset, float %zcompare, float %s, float 0.0, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.l.o.1d.v4f32.f32(i32 15, i32 %offset, float %zcompare, float %s, float 0.0, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @sample_c_l_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, float %s, float %t, float %lod) {			define amdgpu_kernel void @sample_c_l_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, float %s, float %t, float %lod) {
	; CHECK-LABEL: @sample_c_l_o_2d(			; CHECK-LABEL: define amdgpu_kernel void @sample_c_l_o_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], i32 [[OFFSET:%.*]], float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[T:%.*]], float [[LOD:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.lz.o.2d.v4f32.f32(i32 15, i32 [[OFFSET:%.*]], float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.lz.o.2d.v4f32.f32(i32 15, i32 [[OFFSET]], float [[ZCOMPARE]], float [[S]], float [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.l.o.2d.v4f32.f32(i32 15, i32 %offset, float %zcompare, float %s, float %t, float 0.0, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.l.o.2d.v4f32.f32(i32 15, i32 %offset, float %zcompare, float %s, float %t, float 0.0, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @gather4_l_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t, float %lod) {			define amdgpu_kernel void @gather4_l_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t, float %lod) {
	; CHECK-LABEL: @gather4_l_2d(			; CHECK-LABEL: define amdgpu_kernel void @gather4_l_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[S:%.*]], float [[T:%.*]], float [[LOD:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.gather4.lz.2d.v4f32.f32(i32 15, float [[S:%.*]], float [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.gather4.lz.2d.v4f32.f32(i32 15, float [[S]], float [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.l.2d.v4f32.f32(i32 15, float %s, float %t, float 0.0, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.l.2d.v4f32.f32(i32 15, float %s, float %t, float 0.0, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @gather4_c_l_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %s, float %t, float %lod) {			define amdgpu_kernel void @gather4_c_l_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %s, float %t, float %lod) {
	; CHECK-LABEL: @gather4_c_l_2d(			; CHECK-LABEL: define amdgpu_kernel void @gather4_c_l_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[T:%.*]], float [[LOD:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.gather4.c.lz.2d.v4f32.f32(i32 15, float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.gather4.c.lz.2d.v4f32.f32(i32 15, float [[ZCOMPARE]], float [[S]], float [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.c.l.2d.v4f32.f32(i32 15, float %zcompare, float %s, float %t, float 0.0, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.c.l.2d.v4f32.f32(i32 15, float %zcompare, float %s, float %t, float 0.0, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @gather4_l_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %s, float %t, float %lod) {			define amdgpu_kernel void @gather4_l_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %s, float %t, float %lod) {
	; CHECK-LABEL: @gather4_l_o_2d(			; CHECK-LABEL: define amdgpu_kernel void @gather4_l_o_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], i32 [[OFFSET:%.*]], float [[S:%.*]], float [[T:%.*]], float [[LOD:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.gather4.lz.o.2d.v4f32.f32(i32 15, i32 [[OFFSET:%.*]], float [[S:%.*]], float [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.gather4.lz.o.2d.v4f32.f32(i32 15, i32 [[OFFSET]], float [[S]], float [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.l.o.2d.v4f32.f32(i32 15, i32 %offset, float %s, float %t, float 0.0, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.l.o.2d.v4f32.f32(i32 15, i32 %offset, float %s, float %t, float 0.0, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @gather4_c_l_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, float %s, float %t, float %lod) {			define amdgpu_kernel void @gather4_c_l_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, float %s, float %t, float %lod) {
	; CHECK-LABEL: @gather4_c_l_o_2d(			; CHECK-LABEL: define amdgpu_kernel void @gather4_c_l_o_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], i32 [[OFFSET:%.*]], float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[T:%.*]], float [[LOD:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.gather4.c.lz.o.2d.v4f32.f32(i32 15, i32 [[OFFSET:%.*]], float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.gather4.c.lz.o.2d.v4f32.f32(i32 15, i32 [[OFFSET]], float [[ZCOMPARE]], float [[S]], float [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.c.l.o.2d.v4f32.f32(i32 15, i32 %offset, float %zcompare, float %s, float %t, float 0.0, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.c.l.o.2d.v4f32.f32(i32 15, i32 %offset, float %zcompare, float %s, float %t, float 0.0, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @gather4_c_l_o_2darray(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, float %s, float %t, float %slice, float %lod) {			define amdgpu_kernel void @gather4_c_l_o_2darray(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, float %s, float %t, float %slice, float %lod) {
	; CHECK-LABEL: @gather4_c_l_o_2darray(			; CHECK-LABEL: define amdgpu_kernel void @gather4_c_l_o_2darray
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], i32 [[OFFSET:%.*]], float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[T:%.*]], float [[SLICE:%.*]], float [[LOD:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.gather4.c.lz.o.2darray.v4f32.f32(i32 15, i32 [[OFFSET:%.*]], float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[T:%.*]], float [[SLICE:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.gather4.c.lz.o.2darray.v4f32.f32(i32 15, i32 [[OFFSET]], float [[ZCOMPARE]], float [[S]], float [[T]], float [[SLICE]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.c.l.o.2darray.v4f32.f32(i32 15, i32 %offset, float %zcompare, float %s, float %t, float %slice, float 0.0, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.c.l.o.2darray.v4f32.f32(i32 15, i32 %offset, float %zcompare, float %s, float %t, float %slice, float 0.0, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	; llvm.amdgcn.image.sample mipmap zero			; llvm.amdgcn.image.sample mipmap zero
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	define amdgpu_kernel void @load_mip_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, i32 %s) {			define amdgpu_kernel void @load_mip_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, i32 %s) {
	; CHECK-LABEL: @load_mip_1d(			; CHECK-LABEL: define amdgpu_kernel void @load_mip_1d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], i32 [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i32(i32 15, i32 [[S:%.*]], <8 x i32> [[RSRC:%.*]], i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i32(i32 15, i32 [[S]], <8 x i32> [[RSRC]], i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.load.mip.1d.v4f32.i32(i32 15, i32 %s, i32 0, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.mip.1d.v4f32.i32(i32 15, i32 %s, i32 0, <8 x i32> %rsrc, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @load_mip_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, i32 %s, i32 %t) {			define amdgpu_kernel void @load_mip_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, i32 %s, i32 %t) {
	; CHECK-LABEL: @load_mip_2d(			; CHECK-LABEL: define amdgpu_kernel void @load_mip_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], i32 [[S:%.*]], i32 [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.load.2d.v4f32.i32(i32 15, i32 [[S:%.*]], i32 [[T:%.*]], <8 x i32> [[RSRC:%.*]], i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.load.2d.v4f32.i32(i32 15, i32 [[S]], i32 [[T]], <8 x i32> [[RSRC]], i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.load.mip.2d.v4f32.i32(i32 15, i32 %s, i32 %t, i32 0, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.mip.2d.v4f32.i32(i32 15, i32 %s, i32 %t, i32 0, <8 x i32> %rsrc, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @load_mip_3d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, i32 %s, i32 %t, i32 %u) {			define amdgpu_kernel void @load_mip_3d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, i32 %s, i32 %t, i32 %u) {
	; CHECK-LABEL: @load_mip_3d(			; CHECK-LABEL: define amdgpu_kernel void @load_mip_3d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], i32 [[S:%.*]], i32 [[T:%.*]], i32 [[U:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.load.3d.v4f32.i32(i32 15, i32 [[S:%.*]], i32 [[T:%.*]], i32 [[U:%.*]], <8 x i32> [[RSRC:%.*]], i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.load.3d.v4f32.i32(i32 15, i32 [[S]], i32 [[T]], i32 [[U]], <8 x i32> [[RSRC]], i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.load.mip.3d.v4f32.i32(i32 15, i32 %s, i32 %t, i32 %u, i32 0, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.mip.3d.v4f32.i32(i32 15, i32 %s, i32 %t, i32 %u, i32 0, <8 x i32> %rsrc, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @load_mip_1darray(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, i32 %s, i32 %t) {			define amdgpu_kernel void @load_mip_1darray(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, i32 %s, i32 %t) {
	; CHECK-LABEL: @load_mip_1darray(			; CHECK-LABEL: define amdgpu_kernel void @load_mip_1darray
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], i32 [[S:%.*]], i32 [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.load.1darray.v4f32.i32(i32 15, i32 [[S:%.*]], i32 [[T:%.*]], <8 x i32> [[RSRC:%.*]], i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.load.1darray.v4f32.i32(i32 15, i32 [[S]], i32 [[T]], <8 x i32> [[RSRC]], i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.load.mip.1darray.v4f32.i32(i32 15, i32 %s, i32 %t, i32 0, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.mip.1darray.v4f32.i32(i32 15, i32 %s, i32 %t, i32 0, <8 x i32> %rsrc, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @load_mip_2darray(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, i32 %s, i32 %t, i32 %u) {			define amdgpu_kernel void @load_mip_2darray(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, i32 %s, i32 %t, i32 %u) {
	; CHECK-LABEL: @load_mip_2darray(			; CHECK-LABEL: define amdgpu_kernel void @load_mip_2darray
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], i32 [[S:%.*]], i32 [[T:%.*]], i32 [[U:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.load.2darray.v4f32.i32(i32 15, i32 [[S:%.*]], i32 [[T:%.*]], i32 [[U:%.*]], <8 x i32> [[RSRC:%.*]], i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.load.2darray.v4f32.i32(i32 15, i32 [[S]], i32 [[T]], i32 [[U]], <8 x i32> [[RSRC]], i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.load.mip.2darray.v4f32.i32(i32 15, i32 %s, i32 %t, i32 %u, i32 0, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.mip.2darray.v4f32.i32(i32 15, i32 %s, i32 %t, i32 %u, i32 0, <8 x i32> %rsrc, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @load_mip_cube(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, i32 %s, i32 %t, i32 %u) {			define amdgpu_kernel void @load_mip_cube(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, i32 %s, i32 %t, i32 %u) {
	; CHECK-LABEL: @load_mip_cube(			; CHECK-LABEL: define amdgpu_kernel void @load_mip_cube
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], i32 [[S:%.*]], i32 [[T:%.*]], i32 [[U:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.load.cube.v4f32.i32(i32 15, i32 [[S:%.*]], i32 [[T:%.*]], i32 [[U:%.*]], <8 x i32> [[RSRC:%.*]], i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.load.cube.v4f32.i32(i32 15, i32 [[S]], i32 [[T]], i32 [[U]], <8 x i32> [[RSRC]], i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.load.mip.cube.v4f32.i32(i32 15, i32 %s, i32 %t, i32 %u, i32 0, <8 x i32> %rsrc, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.load.mip.cube.v4f32.i32(i32 15, i32 %s, i32 %t, i32 %u, i32 0, <8 x i32> %rsrc, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}


	define amdgpu_kernel void @store_mip_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x float> %vdata, i32 %s) {			define amdgpu_kernel void @store_mip_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x float> %vdata, i32 %s) {
	; CHECK-LABEL: @store_mip_1d(			; CHECK-LABEL: define amdgpu_kernel void @store_mip_1d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x float> [[VDATA:%.*]], i32 [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: call void @llvm.amdgcn.image.store.1d.v4f32.i32(<4 x float> [[VDATA:%.*]], i32 15, i32 [[S:%.*]], <8 x i32> [[RSRC:%.*]], i32 0, i32 0)			; CHECK-NEXT: call void @llvm.amdgcn.image.store.1d.v4f32.i32(<4 x float> [[VDATA]], i32 15, i32 [[S]], <8 x i32> [[RSRC]], i32 0, i32 0)
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	call void @llvm.amdgcn.image.store.mip.1d.v4f32.i32(<4 x float> %vdata, i32 15, i32 %s, i32 0, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.mip.1d.v4f32.i32(<4 x float> %vdata, i32 15, i32 %s, i32 0, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	define amdgpu_kernel void @store_mip_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x float> %vdata, i32 %s, i32 %t) {			define amdgpu_kernel void @store_mip_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x float> %vdata, i32 %s, i32 %t) {
	; CHECK-LABEL: @store_mip_2d(			; CHECK-LABEL: define amdgpu_kernel void @store_mip_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x float> [[VDATA:%.*]], i32 [[S:%.*]], i32 [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: call void @llvm.amdgcn.image.store.2d.v4f32.i32(<4 x float> [[VDATA:%.*]], i32 15, i32 [[S:%.*]], i32 [[T:%.*]], <8 x i32> [[RSRC:%.*]], i32 0, i32 0)			; CHECK-NEXT: call void @llvm.amdgcn.image.store.2d.v4f32.i32(<4 x float> [[VDATA]], i32 15, i32 [[S]], i32 [[T]], <8 x i32> [[RSRC]], i32 0, i32 0)
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	call void @llvm.amdgcn.image.store.mip.2d.v4f32.i32(<4 x float> %vdata, i32 15, i32 %s, i32 %t, i32 0, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.mip.2d.v4f32.i32(<4 x float> %vdata, i32 15, i32 %s, i32 %t, i32 0, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	define amdgpu_kernel void @store_mip_3d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x float> %vdata, i32 %s, i32 %t, i32 %u) {			define amdgpu_kernel void @store_mip_3d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x float> %vdata, i32 %s, i32 %t, i32 %u) {
	; CHECK-LABEL: @store_mip_3d(			; CHECK-LABEL: define amdgpu_kernel void @store_mip_3d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x float> [[VDATA:%.*]], i32 [[S:%.*]], i32 [[T:%.*]], i32 [[U:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: call void @llvm.amdgcn.image.store.3d.v4f32.i32(<4 x float> [[VDATA:%.*]], i32 15, i32 [[S:%.*]], i32 [[T:%.*]], i32 [[U:%.*]], <8 x i32> [[RSRC:%.*]], i32 0, i32 0)			; CHECK-NEXT: call void @llvm.amdgcn.image.store.3d.v4f32.i32(<4 x float> [[VDATA]], i32 15, i32 [[S]], i32 [[T]], i32 [[U]], <8 x i32> [[RSRC]], i32 0, i32 0)
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	call void @llvm.amdgcn.image.store.mip.3d.v4f32.i32(<4 x float> %vdata, i32 15, i32 %s, i32 %t, i32 %u, i32 0, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.mip.3d.v4f32.i32(<4 x float> %vdata, i32 15, i32 %s, i32 %t, i32 %u, i32 0, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	define amdgpu_kernel void @store_mip_1darray(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x float> %vdata, i32 %s, i32 %t) {			define amdgpu_kernel void @store_mip_1darray(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x float> %vdata, i32 %s, i32 %t) {
	; CHECK-LABEL: @store_mip_1darray(			; CHECK-LABEL: define amdgpu_kernel void @store_mip_1darray
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x float> [[VDATA:%.*]], i32 [[S:%.*]], i32 [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: call void @llvm.amdgcn.image.store.1darray.v4f32.i32(<4 x float> [[VDATA:%.*]], i32 15, i32 [[S:%.*]], i32 [[T:%.*]], <8 x i32> [[RSRC:%.*]], i32 0, i32 0)			; CHECK-NEXT: call void @llvm.amdgcn.image.store.1darray.v4f32.i32(<4 x float> [[VDATA]], i32 15, i32 [[S]], i32 [[T]], <8 x i32> [[RSRC]], i32 0, i32 0)
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	call void @llvm.amdgcn.image.store.mip.1darray.v4f32.i32(<4 x float> %vdata, i32 15, i32 %s, i32 %t, i32 0, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.mip.1darray.v4f32.i32(<4 x float> %vdata, i32 15, i32 %s, i32 %t, i32 0, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	define amdgpu_kernel void @store_mip_2darray(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x float> %vdata, i32 %s, i32 %t, i32 %u) {			define amdgpu_kernel void @store_mip_2darray(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x float> %vdata, i32 %s, i32 %t, i32 %u) {
	; CHECK-LABEL: @store_mip_2darray(			; CHECK-LABEL: define amdgpu_kernel void @store_mip_2darray
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x float> [[VDATA:%.*]], i32 [[S:%.*]], i32 [[T:%.*]], i32 [[U:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: call void @llvm.amdgcn.image.store.2darray.v4f32.i32(<4 x float> [[VDATA:%.*]], i32 15, i32 [[S:%.*]], i32 [[T:%.*]], i32 [[U:%.*]], <8 x i32> [[RSRC:%.*]], i32 0, i32 0)			; CHECK-NEXT: call void @llvm.amdgcn.image.store.2darray.v4f32.i32(<4 x float> [[VDATA]], i32 15, i32 [[S]], i32 [[T]], i32 [[U]], <8 x i32> [[RSRC]], i32 0, i32 0)
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	call void @llvm.amdgcn.image.store.mip.2darray.v4f32.i32(<4 x float> %vdata, i32 15, i32 %s, i32 %t, i32 %u, i32 0, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.mip.2darray.v4f32.i32(<4 x float> %vdata, i32 15, i32 %s, i32 %t, i32 %u, i32 0, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	define amdgpu_kernel void @store_mip_cube(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x float> %vdata, i32 %s, i32 %t, i32 %u) {			define amdgpu_kernel void @store_mip_cube(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x float> %vdata, i32 %s, i32 %t, i32 %u) {
	; CHECK-LABEL: @store_mip_cube(			; CHECK-LABEL: define amdgpu_kernel void @store_mip_cube
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x float> [[VDATA:%.*]], i32 [[S:%.*]], i32 [[T:%.*]], i32 [[U:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: call void @llvm.amdgcn.image.store.cube.v4f32.i32(<4 x float> [[VDATA:%.*]], i32 15, i32 [[S:%.*]], i32 [[T:%.*]], i32 [[U:%.*]], <8 x i32> [[RSRC:%.*]], i32 0, i32 0)			; CHECK-NEXT: call void @llvm.amdgcn.image.store.cube.v4f32.i32(<4 x float> [[VDATA]], i32 15, i32 [[S]], i32 [[T]], i32 [[U]], <8 x i32> [[RSRC]], i32 0, i32 0)
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	call void @llvm.amdgcn.image.store.mip.cube.v4f32.i32(<4 x float> %vdata, i32 15, i32 %s, i32 %t, i32 %u, i32 0, <8 x i32> %rsrc, i32 0, i32 0)			call void @llvm.amdgcn.image.store.mip.cube.v4f32.i32(<4 x float> %vdata, i32 15, i32 %s, i32 %t, i32 %u, i32 0, <8 x i32> %rsrc, i32 0, i32 0)
	ret void			ret void
	}			}

	declare <4 x float> @llvm.amdgcn.image.load.mip.1d.v4f32.i32(i32, i32, i32, <8 x i32>, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.load.mip.1d.v4f32.i32(i32, i32, i32, <8 x i32>, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.sample.c.b.o.2d.v4f32.f16(i32, i32, half, float, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.sample.c.b.o.2d.v4f32.f16(i32, i32, half, float, half, half, <8 x i32>, <4 x i32>, i1, i32, i32) #1

	declare <4 x float> @llvm.amdgcn.image.gather4.b.2d.v4f32.f32(i32, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.gather4.b.2d.v4f32.f32(i32, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.gather4.c.b.2d.v4f32.f32(i32, float, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.gather4.c.b.2d.v4f32.f32(i32, float, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.gather4.b.o.2d.v4f32.f32(i32, i32, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.gather4.b.o.2d.v4f32.f32(i32, i32, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1
	declare <4 x float> @llvm.amdgcn.image.gather4.c.b.o.2d.v4f32.f32(i32, i32, float, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1			declare <4 x float> @llvm.amdgcn.image.gather4.c.b.o.2d.v4f32.f32(i32, i32, float, float, float, float, <8 x i32>, <4 x i32>, i1, i32, i32) #1

	define amdgpu_kernel void @sample_b_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s) {			define amdgpu_kernel void @sample_b_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s) {
	; CHECK-LABEL: @sample_b_1d(			; CHECK-LABEL: define amdgpu_kernel void @sample_b_1d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.b.1d.v4f32.f32(i32 15, float 0.0, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.b.1d.v4f32.f32(i32 15, float 0.0, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @sample_b_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t) {			define amdgpu_kernel void @sample_b_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t) {
	; CHECK-LABEL: @sample_b_2d(			; CHECK-LABEL: define amdgpu_kernel void @sample_b_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[S:%.*]], float [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f32(i32 15, float [[S:%.*]], float [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f32(i32 15, float [[S]], float [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.b.2d.v4f32.f32(i32 15, float -0.0, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.b.2d.v4f32.f32(i32 15, float -0.0, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @sample_c_b_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %s) {			define amdgpu_kernel void @sample_c_b_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %s) {
	; CHECK-LABEL: @sample_c_b_1d(			; CHECK-LABEL: define amdgpu_kernel void @sample_c_b_1d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], float [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.1d.v4f32.f32(i32 15, float [[ZCOMPARE:%.*]], float [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.1d.v4f32.f32(i32 15, float [[ZCOMPARE]], float [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.1d.v4f32.f32(i32 15, float -0.0, float %zcompare, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.1d.v4f32.f32(i32 15, float -0.0, float %zcompare, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @sample_c_b_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %s, float %t) {			define amdgpu_kernel void @sample_c_b_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %s, float %t) {
	; CHECK-LABEL: @sample_c_b_2d(			; CHECK-LABEL: define amdgpu_kernel void @sample_c_b_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.2d.v4f32.f32(i32 15, float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.2d.v4f32.f32(i32 15, float [[ZCOMPARE]], float [[S]], float [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.2d.v4f32.f32(i32 15, float 0.0, float %zcompare, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.2d.v4f32.f32(i32 15, float 0.0, float %zcompare, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @sample_b_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %s) {			define amdgpu_kernel void @sample_b_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %s) {
	; CHECK-LABEL: @sample_b_o_1d(			; CHECK-LABEL: define amdgpu_kernel void @sample_b_o_1d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], i32 [[OFFSET:%.*]], float [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.o.1d.v4f32.f32(i32 15, i32 [[OFFSET:%.*]], float [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.o.1d.v4f32.f32(i32 15, i32 [[OFFSET]], float [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.b.o.1d.v4f32.f32(i32 15, i32 %offset, float 0.0, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.b.o.1d.v4f32.f32(i32 15, i32 %offset, float 0.0, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @sample_b_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %s, float %t) {			define amdgpu_kernel void @sample_b_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %s, float %t) {
	; CHECK-LABEL: @sample_b_o_2d(			; CHECK-LABEL: define amdgpu_kernel void @sample_b_o_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], i32 [[OFFSET:%.*]], float [[S:%.*]], float [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.o.2d.v4f32.f32(i32 15, i32 [[OFFSET:%.*]], float [[S:%.*]], float [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.o.2d.v4f32.f32(i32 15, i32 [[OFFSET]], float [[S]], float [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.b.o.2d.v4f32.f32(i32 15, i32 %offset, float 0.0, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.b.o.2d.v4f32.f32(i32 15, i32 %offset, float 0.0, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @sample_c_b_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, float %s) {			define amdgpu_kernel void @sample_c_b_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, float %s) {
	; CHECK-LABEL: @sample_c_b_o_1d(			; CHECK-LABEL: define amdgpu_kernel void @sample_c_b_o_1d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], i32 [[OFFSET:%.*]], float [[ZCOMPARE:%.*]], float [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.o.1d.v4f32.f32(i32 15, i32 [[OFFSET:%.*]], float [[ZCOMPARE:%.*]], float [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.o.1d.v4f32.f32(i32 15, i32 [[OFFSET]], float [[ZCOMPARE]], float [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.o.1d.v4f32.f32(i32 15, i32 %offset, float 0.0, float %zcompare, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.o.1d.v4f32.f32(i32 15, i32 %offset, float 0.0, float %zcompare, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @sample_c_b_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, float %s, float %t) {			define amdgpu_kernel void @sample_c_b_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, float %s, float %t) {
	; CHECK-LABEL: @sample_c_b_o_2d(			; CHECK-LABEL: define amdgpu_kernel void @sample_c_b_o_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], i32 [[OFFSET:%.*]], float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.o.2d.v4f32.f32(i32 15, i32 [[OFFSET:%.*]], float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.o.2d.v4f32.f32(i32 15, i32 [[OFFSET]], float [[ZCOMPARE]], float [[S]], float [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.o.2d.v4f32.f32(i32 15, i32 %offset, float 0.0, float %zcompare, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.o.2d.v4f32.f32(i32 15, i32 %offset, float 0.0, float %zcompare, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @gather4_b_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t) {			define amdgpu_kernel void @gather4_b_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t) {
	; CHECK-LABEL: @gather4_b_2d(			; CHECK-LABEL: define amdgpu_kernel void @gather4_b_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[S:%.*]], float [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.gather4.2d.v4f32.f32(i32 15, float [[S:%.*]], float [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.gather4.2d.v4f32.f32(i32 15, float [[S]], float [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.b.2d.v4f32.f32(i32 15, float 0.0, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.b.2d.v4f32.f32(i32 15, float 0.0, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @gather4_c_b_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %s, float %t) {			define amdgpu_kernel void @gather4_c_b_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %s, float %t) {
	; CHECK-LABEL: @gather4_c_b_2d(			; CHECK-LABEL: define amdgpu_kernel void @gather4_c_b_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.gather4.c.2d.v4f32.f32(i32 15, float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.gather4.c.2d.v4f32.f32(i32 15, float [[ZCOMPARE]], float [[S]], float [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.2d.v4f32.f32(i32 15, float 0.0, float %zcompare,float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.2d.v4f32.f32(i32 15, float 0.0, float %zcompare,float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @gather4_b_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %s, float %t) {			define amdgpu_kernel void @gather4_b_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %s, float %t) {
	; CHECK-LABEL: @gather4_b_o_2d(			; CHECK-LABEL: define amdgpu_kernel void @gather4_b_o_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], i32 [[OFFSET:%.*]], float [[S:%.*]], float [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.gather4.o.2d.v4f32.f32(i32 15, i32 [[OFFSET:%.*]], float [[S:%.*]], float [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.gather4.o.2d.v4f32.f32(i32 15, i32 [[OFFSET]], float [[S]], float [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.b.o.2d.v4f32.f32(i32 15, i32 %offset, float 0.0, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.b.o.2d.v4f32.f32(i32 15, i32 %offset, float 0.0, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @gather4_c_b_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, float %s, float %t) {			define amdgpu_kernel void @gather4_c_b_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, float %s, float %t) {
	; CHECK-LABEL: @gather4_c_b_o_2d(			; CHECK-LABEL: define amdgpu_kernel void @gather4_c_b_o_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], i32 [[OFFSET:%.*]], float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.gather4.c.o.2d.v4f32.f32(i32 15, i32 [[OFFSET:%.*]], float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.gather4.c.o.2d.v4f32.f32(i32 15, i32 [[OFFSET]], float [[ZCOMPARE]], float [[S]], float [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.o.2d.v4f32.f32(i32 15, i32 %offset, float 0.0, float %zcompare,float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.gather4.c.b.o.2d.v4f32.f32(i32 15, i32 %offset, float 0.0, float %zcompare,float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @sample_c_b_o_a16_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, half %s, half %t) {			define amdgpu_kernel void @sample_c_b_o_a16_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, i32 %offset, float %zcompare, half %s, half %t) {
	; CHECK-LABEL: @sample_c_b_o_a16_2d(			; CHECK-LABEL: define amdgpu_kernel void @sample_c_b_o_a16_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], i32 [[OFFSET:%.*]], float [[ZCOMPARE:%.*]], half [[S:%.*]], half [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.o.2d.v4f32.f16(i32 15, i32 [[OFFSET:%.*]], float [[ZCOMPARE:%.*]], half [[S:%.*]], half [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.o.2d.v4f32.f16(i32 15, i32 [[OFFSET]], float [[ZCOMPARE]], half [[S]], half [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.o.2d.v4f32.f16(i32 15, i32 %offset, half 0.0, float %zcompare, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.o.2d.v4f32.f16(i32 15, i32 %offset, half 0.0, float %zcompare, half %s, half %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	; Check that bias is not optimized away if > 0			; Check that bias is not optimized away if > 0
	define amdgpu_kernel void @sample_b_1d_pos(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s) {			define amdgpu_kernel void @sample_b_1d_pos(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s) {
	; CHECK-LABEL: @sample_b_1d_pos(			; CHECK-LABEL: define amdgpu_kernel void @sample_b_1d_pos
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.b.1d.v4f32.f32.f32(i32 15, float 1.000000e+00, float [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.b.1d.v4f32.f32.f32(i32 15, float 1.000000e+00, float [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.b.1d.v4f32.f32(i32 15, float 1.0, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.b.1d.v4f32.f32(i32 15, float 1.0, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	; Check that bias is not optimized away if < 0			; Check that bias is not optimized away if < 0
	define amdgpu_kernel void @sample_b_1d_neg(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s) {			define amdgpu_kernel void @sample_b_1d_neg(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s) {
	; CHECK-LABEL: @sample_b_1d_neg(			; CHECK-LABEL: define amdgpu_kernel void @sample_b_1d_neg
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.b.1d.v4f32.f32.f32(i32 15, float -1.000000e+00, float [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.b.1d.v4f32.f32.f32(i32 15, float -1.000000e+00, float [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.b.1d.v4f32.f32(i32 15, float -1.0, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.b.1d.v4f32.f32(i32 15, float -1.0, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	; Zero bias + A16			; Zero bias + A16
	define amdgpu_kernel void @sample_b_1d_a16(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s) {			define amdgpu_kernel void @sample_b_1d_a16(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, half %s) {
	; CHECK-LABEL: @sample_b_1d_a16(			; CHECK-LABEL: define amdgpu_kernel void @sample_b_1d_a16
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], half [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f16(i32 15, half [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f16(i32 15, half [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%s32 = fpext half %s to float			%s32 = fpext half %s to float
	%v = call <4 x float> @llvm.amdgcn.image.sample.b.1d.v4f32.f32(i32 15, float -0.0, float %s32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.b.1d.v4f32.f32(i32 15, float -0.0, float %s32, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	; llvm.amdgcn.image.sample offset zero			; llvm.amdgcn.image.sample offset zero
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	define amdgpu_kernel void @offset_sample_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s) {			define amdgpu_kernel void @offset_sample_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s) {
	; CHECK-LABEL: @offset_sample_o_1d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_o_1d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.1d.v4f32.f32(i32 15, float [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.o.1d.v4f32.f32(i32 15, i32 0, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.o.1d.v4f32.f32(i32 15, i32 0, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t) {			define amdgpu_kernel void @offset_sample_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t) {
	; CHECK-LABEL: @offset_sample_o_2d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_o_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[S:%.*]], float [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f32(i32 15, float [[S:%.*]], float [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f32(i32 15, float [[S]], float [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.o.2d.v4f32.f32(i32 15, i32 0, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.o.2d.v4f32.f32(i32 15, i32 0, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_c_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %s) {			define amdgpu_kernel void @offset_sample_c_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %s) {
	; CHECK-LABEL: @offset_sample_c_o_1d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_c_o_1d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], float [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.1d.v4f32.f32(i32 15, float [[ZCOMPARE:%.*]], float [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.1d.v4f32.f32(i32 15, float [[ZCOMPARE]], float [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.o.1d.v4f32.f32(i32 15, i32 0, float %zcompare, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.o.1d.v4f32.f32(i32 15, i32 0, float %zcompare, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_c_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %s, float %t) {			define amdgpu_kernel void @offset_sample_c_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %s, float %t) {
	; CHECK-LABEL: @offset_sample_c_o_2d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_c_o_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.2d.v4f32.f32(i32 15, float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.2d.v4f32.f32(i32 15, float [[ZCOMPARE]], float [[S]], float [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.o.2d.v4f32.f32(i32 15, i32 0, float %zcompare, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.o.2d.v4f32.f32(i32 15, i32 0, float %zcompare, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_cl_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %clamp) {			define amdgpu_kernel void @offset_sample_cl_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %clamp) {
	; CHECK-LABEL: @offset_sample_cl_o_1d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_cl_o_1d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[S:%.*]], float [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.cl.1d.v4f32.f32(i32 15, float [[S:%.*]], float [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.cl.1d.v4f32.f32(i32 15, float [[S]], float [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.cl.o.1d.v4f32.f32(i32 15, i32 0, float %s, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.cl.o.1d.v4f32.f32(i32 15, i32 0, float %s, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_cl_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t, float %clamp) {			define amdgpu_kernel void @offset_sample_cl_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t, float %clamp) {
	; CHECK-LABEL: @offset_sample_cl_o_2d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_cl_o_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[S:%.*]], float [[T:%.*]], float [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.cl.2d.v4f32.f32(i32 15, float [[S:%.*]], float [[T:%.*]], float [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.cl.2d.v4f32.f32(i32 15, float [[S]], float [[T]], float [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.cl.o.2d.v4f32.f32(i32 15, i32 0, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.cl.o.2d.v4f32.f32(i32 15, i32 0, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_c_cl_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %s, float %clamp) {			define amdgpu_kernel void @offset_sample_c_cl_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %s, float %clamp) {
	; CHECK-LABEL: @offset_sample_c_cl_o_1d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_c_cl_o_1d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.cl.1d.v4f32.f32(i32 15, float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.cl.1d.v4f32.f32(i32 15, float [[ZCOMPARE]], float [[S]], float [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.cl.o.1d.v4f32.f32(i32 15, i32 0, float %zcompare, float %s, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.cl.o.1d.v4f32.f32(i32 15, i32 0, float %zcompare, float %s, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_c_cl_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %s, float %t, float %clamp) {			define amdgpu_kernel void @offset_sample_c_cl_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %s, float %t, float %clamp) {
	; CHECK-LABEL: @offset_sample_c_cl_o_2d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_c_cl_o_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[T:%.*]], float [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.cl.2d.v4f32.f32(i32 15, float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[T:%.*]], float [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.cl.2d.v4f32.f32(i32 15, float [[ZCOMPARE]], float [[S]], float [[T]], float [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.cl.o.2d.v4f32.f32(i32 15, i32 0, float %zcompare, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.cl.o.2d.v4f32.f32(i32 15, i32 0, float %zcompare, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_b_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %s) {			define amdgpu_kernel void @offset_sample_b_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %s) {
	; CHECK-LABEL: @offset_sample_b_o_1d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_b_o_1d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[BIAS:%.*]], float [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.b.1d.v4f32.f32.f32(i32 15, float [[BIAS:%.*]], float [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.b.1d.v4f32.f32.f32(i32 15, float [[BIAS]], float [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.b.o.1d.v4f32.f32.f32(i32 15, i32 0, float %bias, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.b.o.1d.v4f32.f32.f32(i32 15, i32 0, float %bias, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_b_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %s, float %t) {			define amdgpu_kernel void @offset_sample_b_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %s, float %t) {
	; CHECK-LABEL: @offset_sample_b_o_2d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_b_o_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[BIAS:%.*]], float [[S:%.*]], float [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.b.2d.v4f32.f32.f32(i32 15, float [[BIAS:%.*]], float [[S:%.*]], float [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.b.2d.v4f32.f32.f32(i32 15, float [[BIAS]], float [[S]], float [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.b.o.2d.v4f32.f32.f32(i32 15, i32 0, float %bias, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.b.o.2d.v4f32.f32.f32(i32 15, i32 0, float %bias, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_c_b_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, float %s) {			define amdgpu_kernel void @offset_sample_c_b_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, float %s) {
	; CHECK-LABEL: @offset_sample_c_b_o_1d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_c_b_o_1d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[BIAS:%.*]], float [[ZCOMPARE:%.*]], float [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.b.1d.v4f32.f32.f32(i32 15, float [[BIAS:%.*]], float [[ZCOMPARE:%.*]], float [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.b.1d.v4f32.f32.f32(i32 15, float [[BIAS]], float [[ZCOMPARE]], float [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.o.1d.v4f32.f32.f32(i32 15, i32 0, float %bias, float %zcompare, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.o.1d.v4f32.f32.f32(i32 15, i32 0, float %bias, float %zcompare, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_c_b_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, float %s, float %t) {			define amdgpu_kernel void @offset_sample_c_b_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, float %s, float %t) {
	; CHECK-LABEL: @offset_sample_c_b_o_2d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_c_b_o_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[BIAS:%.*]], float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.b.2d.v4f32.f32.f32(i32 15, float [[BIAS:%.*]], float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.b.2d.v4f32.f32.f32(i32 15, float [[BIAS]], float [[ZCOMPARE]], float [[S]], float [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.o.2d.v4f32.f32.f32(i32 15, i32 0, float %bias, float %zcompare, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.o.2d.v4f32.f32.f32(i32 15, i32 0, float %bias, float %zcompare, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_b_cl_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %s, float %clamp) {			define amdgpu_kernel void @offset_sample_b_cl_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %s, float %clamp) {
	; CHECK-LABEL: @offset_sample_b_cl_o_1d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_b_cl_o_1d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[BIAS:%.*]], float [[S:%.*]], float [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.b.cl.1d.v4f32.f32.f32(i32 15, float [[BIAS:%.*]], float [[S:%.*]], float [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.b.cl.1d.v4f32.f32.f32(i32 15, float [[BIAS]], float [[S]], float [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.b.cl.o.1d.v4f32.f32.f32(i32 15, i32 0, float %bias, float %s, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.b.cl.o.1d.v4f32.f32.f32(i32 15, i32 0, float %bias, float %s, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_b_cl_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %s, float %t, float %clamp) {			define amdgpu_kernel void @offset_sample_b_cl_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %s, float %t, float %clamp) {
	; CHECK-LABEL: @offset_sample_b_cl_o_2d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_b_cl_o_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[BIAS:%.*]], float [[S:%.*]], float [[T:%.*]], float [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.b.cl.2d.v4f32.f32.f32(i32 15, float [[BIAS:%.*]], float [[S:%.*]], float [[T:%.*]], float [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.b.cl.2d.v4f32.f32.f32(i32 15, float [[BIAS]], float [[S]], float [[T]], float [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.b.cl.o.2d.v4f32.f32.f32(i32 15, i32 0, float %bias, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.b.cl.o.2d.v4f32.f32.f32(i32 15, i32 0, float %bias, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_c_b_cl_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, float %s, float %clamp) {			define amdgpu_kernel void @offset_sample_c_b_cl_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, float %s, float %clamp) {
	; CHECK-LABEL: @offset_sample_c_b_cl_o_1d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_c_b_cl_o_1d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[BIAS:%.*]], float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.1d.v4f32.f32.f32(i32 15, float [[BIAS:%.*]], float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.1d.v4f32.f32.f32(i32 15, float [[BIAS]], float [[ZCOMPARE]], float [[S]], float [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.o.1d.v4f32.f32.f32(i32 15, i32 0, float %bias, float %zcompare, float %s, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.o.1d.v4f32.f32.f32(i32 15, i32 0, float %bias, float %zcompare, float %s, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_c_b_cl_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, float %s, float %t, float %clamp) {			define amdgpu_kernel void @offset_sample_c_b_cl_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %bias, float %zcompare, float %s, float %t, float %clamp) {
	; CHECK-LABEL: @offset_sample_c_b_cl_o_2d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_c_b_cl_o_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[BIAS:%.*]], float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[T:%.*]], float [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.2d.v4f32.f32.f32(i32 15, float [[BIAS:%.*]], float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[T:%.*]], float [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.2d.v4f32.f32.f32(i32 15, float [[BIAS]], float [[ZCOMPARE]], float [[S]], float [[T]], float [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.o.2d.v4f32.f32.f32(i32 15, i32 0, float %bias, float %zcompare, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.b.cl.o.2d.v4f32.f32.f32(i32 15, i32 0, float %bias, float %zcompare, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_d_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %dsdh, float %dsdv, float %s) {			define amdgpu_kernel void @offset_sample_d_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %dsdh, float %dsdv, float %s) {
	; CHECK-LABEL: @offset_sample_d_o_1d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_d_o_1d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[DSDH:%.*]], float [[DSDV:%.*]], float [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.d.1d.v4f32.f32.f32(i32 15, float [[DSDH:%.*]], float [[DSDV:%.*]], float [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.d.1d.v4f32.f32.f32(i32 15, float [[DSDH]], float [[DSDV]], float [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.d.o.1d.v4f32.f32.f32(i32 15, i32 0, float %dsdh, float %dsdv, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.d.o.1d.v4f32.f32.f32(i32 15, i32 0, float %dsdh, float %dsdv, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_d_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t) {			define amdgpu_kernel void @offset_sample_d_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t) {
	; CHECK-LABEL: @offset_sample_d_o_2d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_d_o_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[DSDH:%.*]], float [[DTDH:%.*]], float [[DSDV:%.*]], float [[DTDV:%.*]], float [[S:%.*]], float [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.d.2d.v4f32.f32.f32(i32 15, float [[DSDH:%.*]], float [[DTDH:%.*]], float [[DSDV:%.*]], float [[DTDV:%.*]], float [[S:%.*]], float [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.d.2d.v4f32.f32.f32(i32 15, float [[DSDH]], float [[DTDH]], float [[DSDV]], float [[DTDV]], float [[S]], float [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.d.o.2d.v4f32.f32.f32(i32 15, i32 0, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.d.o.2d.v4f32.f32.f32(i32 15, i32 0, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_c_d_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %dsdh, float %dsdv, float %s) {			define amdgpu_kernel void @offset_sample_c_d_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %dsdh, float %dsdv, float %s) {
	; CHECK-LABEL: @offset_sample_c_d_o_1d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_c_d_o_1d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], float [[DSDH:%.*]], float [[DSDV:%.*]], float [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.d.1d.v4f32.f32.f32(i32 15, float [[ZCOMPARE:%.*]], float [[DSDH:%.*]], float [[DSDV:%.*]], float [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.d.1d.v4f32.f32.f32(i32 15, float [[ZCOMPARE]], float [[DSDH]], float [[DSDV]], float [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.d.o.1d.v4f32.f32.f32(i32 15, i32 0, float %zcompare, float %dsdh, float %dsdv, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.d.o.1d.v4f32.f32.f32(i32 15, i32 0, float %zcompare, float %dsdh, float %dsdv, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_c_d_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t) {			define amdgpu_kernel void @offset_sample_c_d_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t) {
	; CHECK-LABEL: @offset_sample_c_d_o_2d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_c_d_o_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], float [[DSDH:%.*]], float [[DTDH:%.*]], float [[DSDV:%.*]], float [[DTDV:%.*]], float [[S:%.*]], float [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.d.2d.v4f32.f32.f32(i32 15, float [[ZCOMPARE:%.*]], float [[DSDH:%.*]], float [[DTDH:%.*]], float [[DSDV:%.*]], float [[DTDV:%.*]], float [[S:%.*]], float [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.d.2d.v4f32.f32.f32(i32 15, float [[ZCOMPARE]], float [[DSDH]], float [[DTDH]], float [[DSDV]], float [[DTDV]], float [[S]], float [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.d.o.2d.v4f32.f32.f32(i32 15, i32 0, float %zcompare, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.d.o.2d.v4f32.f32.f32(i32 15, i32 0, float %zcompare, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_d_cl_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %dsdh, float %dsdv, float %s, float %clamp) {			define amdgpu_kernel void @offset_sample_d_cl_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %dsdh, float %dsdv, float %s, float %clamp) {
	; CHECK-LABEL: @offset_sample_d_cl_o_1d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_d_cl_o_1d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[DSDH:%.*]], float [[DSDV:%.*]], float [[S:%.*]], float [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.d.cl.1d.v4f32.f32.f32(i32 15, float [[DSDH:%.*]], float [[DSDV:%.*]], float [[S:%.*]], float [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.d.cl.1d.v4f32.f32.f32(i32 15, float [[DSDH]], float [[DSDV]], float [[S]], float [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.d.cl.o.1d.v4f32.f32.f32(i32 15, i32 0, float %dsdh, float %dsdv, float %s, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.d.cl.o.1d.v4f32.f32.f32(i32 15, i32 0, float %dsdh, float %dsdv, float %s, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_d_cl_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t, float %clamp) {			define amdgpu_kernel void @offset_sample_d_cl_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t, float %clamp) {
	; CHECK-LABEL: @offset_sample_d_cl_o_2d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_d_cl_o_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[DSDH:%.*]], float [[DTDH:%.*]], float [[DSDV:%.*]], float [[DTDV:%.*]], float [[S:%.*]], float [[T:%.*]], float [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.d.cl.2d.v4f32.f32.f32(i32 15, float [[DSDH:%.*]], float [[DTDH:%.*]], float [[DSDV:%.*]], float [[DTDV:%.*]], float [[S:%.*]], float [[T:%.*]], float [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.d.cl.2d.v4f32.f32.f32(i32 15, float [[DSDH]], float [[DTDH]], float [[DSDV]], float [[DTDV]], float [[S]], float [[T]], float [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.d.cl.o.2d.v4f32.f32.f32(i32 15, i32 0, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.d.cl.o.2d.v4f32.f32.f32(i32 15, i32 0, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_c_d_cl_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %dsdh, float %dsdv, float %s, float %clamp) {			define amdgpu_kernel void @offset_sample_c_d_cl_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %dsdh, float %dsdv, float %s, float %clamp) {
	; CHECK-LABEL: @offset_sample_c_d_cl_o_1d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_c_d_cl_o_1d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], float [[DSDH:%.*]], float [[DSDV:%.*]], float [[S:%.*]], float [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.d.cl.1d.v4f32.f32.f32(i32 15, float [[ZCOMPARE:%.*]], float [[DSDH:%.*]], float [[DSDV:%.*]], float [[S:%.*]], float [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.d.cl.1d.v4f32.f32.f32(i32 15, float [[ZCOMPARE]], float [[DSDH]], float [[DSDV]], float [[S]], float [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.d.cl.o.1d.v4f32.f32.f32(i32 15, i32 0, float %zcompare, float %dsdh, float %dsdv, float %s, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.d.cl.o.1d.v4f32.f32.f32(i32 15, i32 0, float %zcompare, float %dsdh, float %dsdv, float %s, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_c_d_cl_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t, float %clamp) {			define amdgpu_kernel void @offset_sample_c_d_cl_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t, float %clamp) {
	; CHECK-LABEL: @offset_sample_c_d_cl_o_2d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_c_d_cl_o_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], float [[DSDH:%.*]], float [[DTDH:%.*]], float [[DSDV:%.*]], float [[DTDV:%.*]], float [[S:%.*]], float [[T:%.*]], float [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.d.cl.2d.v4f32.f32.f32(i32 15, float [[ZCOMPARE:%.*]], float [[DSDH:%.*]], float [[DTDH:%.*]], float [[DSDV:%.*]], float [[DTDV:%.*]], float [[S:%.*]], float [[T:%.*]], float [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.d.cl.2d.v4f32.f32.f32(i32 15, float [[ZCOMPARE]], float [[DSDH]], float [[DTDH]], float [[DSDV]], float [[DTDV]], float [[S]], float [[T]], float [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.d.cl.o.2d.v4f32.f32.f32(i32 15, i32 0, float %zcompare, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.d.cl.o.2d.v4f32.f32.f32(i32 15, i32 0, float %zcompare, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_cd_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %dsdh, float %dsdv, float %s) {			define amdgpu_kernel void @offset_sample_cd_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %dsdh, float %dsdv, float %s) {
	; CHECK-LABEL: @offset_sample_cd_o_1d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_cd_o_1d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[DSDH:%.*]], float [[DSDV:%.*]], float [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.cd.1d.v4f32.f32.f32(i32 15, float [[DSDH:%.*]], float [[DSDV:%.*]], float [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.cd.1d.v4f32.f32.f32(i32 15, float [[DSDH]], float [[DSDV]], float [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.cd.o.1d.v4f32.f32.f32(i32 15, i32 0, float %dsdh, float %dsdv, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.cd.o.1d.v4f32.f32.f32(i32 15, i32 0, float %dsdh, float %dsdv, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_cd_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t) {			define amdgpu_kernel void @offset_sample_cd_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t) {
	; CHECK-LABEL: @offset_sample_cd_o_2d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_cd_o_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[DSDH:%.*]], float [[DTDH:%.*]], float [[DSDV:%.*]], float [[DTDV:%.*]], float [[S:%.*]], float [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.cd.2d.v4f32.f32.f32(i32 15, float [[DSDH:%.*]], float [[DTDH:%.*]], float [[DSDV:%.*]], float [[DTDV:%.*]], float [[S:%.*]], float [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.cd.2d.v4f32.f32.f32(i32 15, float [[DSDH]], float [[DTDH]], float [[DSDV]], float [[DTDV]], float [[S]], float [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.cd.o.2d.v4f32.f32.f32(i32 15, i32 0, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.cd.o.2d.v4f32.f32.f32(i32 15, i32 0, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_c_cd_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %dsdh, float %dsdv, float %s) {			define amdgpu_kernel void @offset_sample_c_cd_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %dsdh, float %dsdv, float %s) {
	; CHECK-LABEL: @offset_sample_c_cd_o_1d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_c_cd_o_1d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], float [[DSDH:%.*]], float [[DSDV:%.*]], float [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.cd.1d.v4f32.f32.f32(i32 15, float [[ZCOMPARE:%.*]], float [[DSDH:%.*]], float [[DSDV:%.*]], float [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.cd.1d.v4f32.f32.f32(i32 15, float [[ZCOMPARE]], float [[DSDH]], float [[DSDV]], float [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.cd.o.1d.v4f32.f32.f32(i32 15, i32 0, float %zcompare, float %dsdh, float %dsdv, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.cd.o.1d.v4f32.f32.f32(i32 15, i32 0, float %zcompare, float %dsdh, float %dsdv, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_c_cd_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t) {			define amdgpu_kernel void @offset_sample_c_cd_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t) {
	; CHECK-LABEL: @offset_sample_c_cd_o_2d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_c_cd_o_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], float [[DSDH:%.*]], float [[DTDH:%.*]], float [[DSDV:%.*]], float [[DTDV:%.*]], float [[S:%.*]], float [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.cd.2d.v4f32.f32.f32(i32 15, float [[ZCOMPARE:%.*]], float [[DSDH:%.*]], float [[DTDH:%.*]], float [[DSDV:%.*]], float [[DTDV:%.*]], float [[S:%.*]], float [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.cd.2d.v4f32.f32.f32(i32 15, float [[ZCOMPARE]], float [[DSDH]], float [[DTDH]], float [[DSDV]], float [[DTDV]], float [[S]], float [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.cd.o.2d.v4f32.f32.f32(i32 15, i32 0, float %zcompare, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.cd.o.2d.v4f32.f32.f32(i32 15, i32 0, float %zcompare, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_cd_cl_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %dsdh, float %dsdv, float %s, float %clamp) {			define amdgpu_kernel void @offset_sample_cd_cl_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %dsdh, float %dsdv, float %s, float %clamp) {
	; CHECK-LABEL: @offset_sample_cd_cl_o_1d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_cd_cl_o_1d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[DSDH:%.*]], float [[DSDV:%.*]], float [[S:%.*]], float [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.cd.cl.1d.v4f32.f32.f32(i32 15, float [[DSDH:%.*]], float [[DSDV:%.*]], float [[S:%.*]], float [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.cd.cl.1d.v4f32.f32.f32(i32 15, float [[DSDH]], float [[DSDV]], float [[S]], float [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.cd.cl.o.1d.v4f32.f32.f32(i32 15, i32 0, float %dsdh, float %dsdv, float %s, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.cd.cl.o.1d.v4f32.f32.f32(i32 15, i32 0, float %dsdh, float %dsdv, float %s, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_cd_cl_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t, float %clamp) {			define amdgpu_kernel void @offset_sample_cd_cl_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t, float %clamp) {
	; CHECK-LABEL: @offset_sample_cd_cl_o_2d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_cd_cl_o_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[DSDH:%.*]], float [[DTDH:%.*]], float [[DSDV:%.*]], float [[DTDV:%.*]], float [[S:%.*]], float [[T:%.*]], float [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.cd.cl.2d.v4f32.f32.f32(i32 15, float [[DSDH:%.*]], float [[DTDH:%.*]], float [[DSDV:%.*]], float [[DTDV:%.*]], float [[S:%.*]], float [[T:%.*]], float [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.cd.cl.2d.v4f32.f32.f32(i32 15, float [[DSDH]], float [[DTDH]], float [[DSDV]], float [[DTDV]], float [[S]], float [[T]], float [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.cd.cl.o.2d.v4f32.f32.f32(i32 15, i32 0, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.cd.cl.o.2d.v4f32.f32.f32(i32 15, i32 0, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_c_cd_cl_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %dsdh, float %dsdv, float %s, float %clamp) {			define amdgpu_kernel void @offset_sample_c_cd_cl_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %dsdh, float %dsdv, float %s, float %clamp) {
	; CHECK-LABEL: @offset_sample_c_cd_cl_o_1d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_c_cd_cl_o_1d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], float [[DSDH:%.*]], float [[DSDV:%.*]], float [[S:%.*]], float [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.cd.cl.1d.v4f32.f32.f32(i32 15, float [[ZCOMPARE:%.*]], float [[DSDH:%.*]], float [[DSDV:%.*]], float [[S:%.*]], float [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.cd.cl.1d.v4f32.f32.f32(i32 15, float [[ZCOMPARE]], float [[DSDH]], float [[DSDV]], float [[S]], float [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.cd.cl.o.1d.v4f32.f32.f32(i32 15, i32 0, float %zcompare, float %dsdh, float %dsdv, float %s, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.cd.cl.o.1d.v4f32.f32.f32(i32 15, i32 0, float %zcompare, float %dsdh, float %dsdv, float %s, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_c_cd_cl_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t, float %clamp) {			define amdgpu_kernel void @offset_sample_c_cd_cl_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t, float %clamp) {
	; CHECK-LABEL: @offset_sample_c_cd_cl_o_2d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_c_cd_cl_o_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], float [[DSDH:%.*]], float [[DTDH:%.*]], float [[DSDV:%.*]], float [[DTDV:%.*]], float [[S:%.*]], float [[T:%.*]], float [[CLAMP:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.cd.cl.2d.v4f32.f32.f32(i32 15, float [[ZCOMPARE:%.*]], float [[DSDH:%.*]], float [[DTDH:%.*]], float [[DSDV:%.*]], float [[DTDV:%.*]], float [[S:%.*]], float [[T:%.*]], float [[CLAMP:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.cd.cl.2d.v4f32.f32.f32(i32 15, float [[ZCOMPARE]], float [[DSDH]], float [[DTDH]], float [[DSDV]], float [[DTDV]], float [[S]], float [[T]], float [[CLAMP]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.cd.cl.o.2d.v4f32.f32.f32(i32 15, i32 0, float %zcompare, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.cd.cl.o.2d.v4f32.f32.f32(i32 15, i32 0, float %zcompare, float %dsdh, float %dtdh, float %dsdv, float %dtdv, float %s, float %t, float %clamp, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_l_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %lod) {			define amdgpu_kernel void @offset_sample_l_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %lod) {
	; CHECK-LABEL: @offset_sample_l_o_1d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_l_o_1d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[S:%.*]], float [[LOD:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.l.1d.v4f32.f32(i32 15, float [[S:%.*]], float [[LOD:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.l.1d.v4f32.f32(i32 15, float [[S]], float [[LOD]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.l.o.1d.v4f32.f32(i32 15, i32 0, float %s, float %lod, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.l.o.1d.v4f32.f32(i32 15, i32 0, float %s, float %lod, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_l_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t, float %lod) {			define amdgpu_kernel void @offset_sample_l_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t, float %lod) {
	; CHECK-LABEL: @offset_sample_l_o_2d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_l_o_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[S:%.*]], float [[T:%.*]], float [[LOD:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.l.2d.v4f32.f32(i32 15, float [[S:%.*]], float [[T:%.*]], float [[LOD:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.l.2d.v4f32.f32(i32 15, float [[S]], float [[T]], float [[LOD]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.l.o.2d.v4f32.f32(i32 15, i32 0, float %s, float %t, float %lod, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.l.o.2d.v4f32.f32(i32 15, i32 0, float %s, float %t, float %lod, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_c_l_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %s, float %lod) {			define amdgpu_kernel void @offset_sample_c_l_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %s, float %lod) {
	; CHECK-LABEL: @offset_sample_c_l_o_1d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_c_l_o_1d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[LOD:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.l.1d.v4f32.f32(i32 15, float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[LOD:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.l.1d.v4f32.f32(i32 15, float [[ZCOMPARE]], float [[S]], float [[LOD]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.l.o.1d.v4f32.f32(i32 15, i32 0, float %zcompare, float %s, float %lod, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.l.o.1d.v4f32.f32(i32 15, i32 0, float %zcompare, float %s, float %lod, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_c_l_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %s, float %t, float %lod) {			define amdgpu_kernel void @offset_sample_c_l_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %s, float %t, float %lod) {
	; CHECK-LABEL: @offset_sample_c_l_o_2d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_c_l_o_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[T:%.*]], float [[LOD:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.l.2d.v4f32.f32(i32 15, float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[T:%.*]], float [[LOD:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.l.2d.v4f32.f32(i32 15, float [[ZCOMPARE]], float [[S]], float [[T]], float [[LOD]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.l.o.2d.v4f32.f32(i32 15, i32 0, float %zcompare, float %s, float %t, float %lod, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.l.o.2d.v4f32.f32(i32 15, i32 0, float %zcompare, float %s, float %t, float %lod, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_lz_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s) {			define amdgpu_kernel void @offset_sample_lz_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s) {
	; CHECK-LABEL: @offset_sample_lz_o_1d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_lz_o_1d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.lz.1d.v4f32.f32(i32 15, float [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.lz.1d.v4f32.f32(i32 15, float [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.lz.o.1d.v4f32.f32(i32 15, i32 0, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.lz.o.1d.v4f32.f32(i32 15, i32 0, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_lz_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t) {			define amdgpu_kernel void @offset_sample_lz_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %s, float %t) {
	; CHECK-LABEL: @offset_sample_lz_o_2d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_lz_o_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[S:%.*]], float [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.lz.2d.v4f32.f32(i32 15, float [[S:%.*]], float [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.lz.2d.v4f32.f32(i32 15, float [[S]], float [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.lz.o.2d.v4f32.f32(i32 15, i32 0, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.lz.o.2d.v4f32.f32(i32 15, i32 0, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_c_lz_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %s) {			define amdgpu_kernel void @offset_sample_c_lz_o_1d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %s) {
	; CHECK-LABEL: @offset_sample_c_lz_o_1d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_c_lz_o_1d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], float [[S:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.lz.1d.v4f32.f32(i32 15, float [[ZCOMPARE:%.*]], float [[S:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.lz.1d.v4f32.f32(i32 15, float [[ZCOMPARE]], float [[S]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.lz.o.1d.v4f32.f32(i32 15, i32 0, float %zcompare, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.lz.o.1d.v4f32.f32(i32 15, i32 0, float %zcompare, float %s, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	define amdgpu_kernel void @offset_sample_c_lz_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %s, float %t) {			define amdgpu_kernel void @offset_sample_c_lz_o_2d(ptr addrspace(1) %out, <8 x i32> inreg %rsrc, <4 x i32> inreg %samp, float %zcompare, float %s, float %t) {
	; CHECK-LABEL: @offset_sample_c_lz_o_2d(			; CHECK-LABEL: define amdgpu_kernel void @offset_sample_c_lz_o_2d
				; CHECK-SAME: (ptr addrspace(1) [[OUT:%.*]], <8 x i32> inreg [[RSRC:%.*]], <4 x i32> inreg [[SAMP:%.*]], float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[T:%.*]]) #[[ATTR3]] {
	; CHECK-NEXT: main_body:			; CHECK-NEXT: main_body:
	; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.lz.2d.v4f32.f32(i32 15, float [[ZCOMPARE:%.*]], float [[S:%.*]], float [[T:%.*]], <8 x i32> [[RSRC:%.*]], <4 x i32> [[SAMP:%.*]], i1 false, i32 0, i32 0)			; CHECK-NEXT: [[V:%.*]] = call <4 x float> @llvm.amdgcn.image.sample.c.lz.2d.v4f32.f32(i32 15, float [[ZCOMPARE]], float [[S]], float [[T]], <8 x i32> [[RSRC]], <4 x i32> [[SAMP]], i1 false, i32 0, i32 0)
	; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT:%.*]], align 16			; CHECK-NEXT: store <4 x float> [[V]], ptr addrspace(1) [[OUT]], align 16
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	main_body:			main_body:
	%v = call <4 x float> @llvm.amdgcn.image.sample.c.lz.o.2d.v4f32.f32(i32 15, i32 0, float %zcompare, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)			%v = call <4 x float> @llvm.amdgcn.image.sample.c.lz.o.2d.v4f32.f32(i32 15, i32 0, float %zcompare, float %s, float %t, <8 x i32> %rsrc, <4 x i32> %samp, i1 0, i32 0, i32 0)
	store <4 x float> %v, ptr addrspace(1) %out			store <4 x float> %v, ptr addrspace(1) %out
	ret void			ret void
	}			}

	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	; llvm.amdgcn.is.shared			; llvm.amdgcn.is.shared
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	declare i1 @llvm.amdgcn.is.shared(ptr) nounwind readnone			declare i1 @llvm.amdgcn.is.shared(ptr) nounwind readnone

	define i1 @test_is_shared_null() nounwind {			define i1 @test_is_shared_null() nounwind {
	; CHECK-LABEL: @test_is_shared_null(			; CHECK-LABEL: define i1 @test_is_shared_null
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i1 false			; CHECK-NEXT: ret i1 false
	;			;
	%val = call i1 @llvm.amdgcn.is.shared(ptr null)			%val = call i1 @llvm.amdgcn.is.shared(ptr null)
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_is_shared_undef() nounwind {			define i1 @test_is_shared_undef() nounwind {
	; CHECK-LABEL: @test_is_shared_undef(			; CHECK-LABEL: define i1 @test_is_shared_undef
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i1 undef			; CHECK-NEXT: ret i1 undef
	;			;
	%val = call i1 @llvm.amdgcn.is.shared(ptr undef)			%val = call i1 @llvm.amdgcn.is.shared(ptr undef)
	ret i1 %val			ret i1 %val
	}			}

	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	; llvm.amdgcn.is.private			; llvm.amdgcn.is.private
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	declare i1 @llvm.amdgcn.is.private(ptr) nounwind readnone			declare i1 @llvm.amdgcn.is.private(ptr) nounwind readnone

	define i1 @test_is_private_null() nounwind {			define i1 @test_is_private_null() nounwind {
	; CHECK-LABEL: @test_is_private_null(			; CHECK-LABEL: define i1 @test_is_private_null
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i1 false			; CHECK-NEXT: ret i1 false
	;			;
	%val = call i1 @llvm.amdgcn.is.private(ptr null)			%val = call i1 @llvm.amdgcn.is.private(ptr null)
	ret i1 %val			ret i1 %val
	}			}

	define i1 @test_is_private_undef() nounwind {			define i1 @test_is_private_undef() nounwind {
	; CHECK-LABEL: @test_is_private_undef(			; CHECK-LABEL: define i1 @test_is_private_undef
				; CHECK-SAME: () #[[ATTR1]] {
	; CHECK-NEXT: ret i1 undef			; CHECK-NEXT: ret i1 undef
	;			;
	%val = call i1 @llvm.amdgcn.is.private(ptr undef)			%val = call i1 @llvm.amdgcn.is.private(ptr undef)
	ret i1 %val			ret i1 %val
	}			}

	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	; llvm.amdgcn.trig.preop			; llvm.amdgcn.trig.preop
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	declare double @llvm.amdgcn.trig.preop.f64(double, i32)			declare double @llvm.amdgcn.trig.preop.f64(double, i32)
	declare float @llvm.amdgcn.trig.preop.f32(float, i32)			declare float @llvm.amdgcn.trig.preop.f32(float, i32)

	define double @trig_preop_constfold_variable_undef_arg(i32 %arg) {			define double @trig_preop_constfold_variable_undef_arg(i32 %arg) {
	; CHECK-LABEL: @trig_preop_constfold_variable_undef_arg(			; CHECK-LABEL: define double @trig_preop_constfold_variable_undef_arg
	; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double undef, i32 [[ARG:%.*]])			; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double undef, i32 [[ARG]])
	; CHECK-NEXT: ret double [[VAL]]			; CHECK-NEXT: ret double [[VAL]]
	;			;
	%val = call double @llvm.amdgcn.trig.preop.f64(double undef, i32 %arg)			%val = call double @llvm.amdgcn.trig.preop.f64(double undef, i32 %arg)
	ret double %val			ret double %val
	}			}

	define double @trig_preop_constfold_variable_poison_arg(i32 %arg) {			define double @trig_preop_constfold_variable_poison_arg(i32 %arg) {
	; CHECK-LABEL: @trig_preop_constfold_variable_poison_arg(			; CHECK-LABEL: define double @trig_preop_constfold_variable_poison_arg
	; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double poison, i32 [[ARG:%.*]])			; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double poison, i32 [[ARG]])
	; CHECK-NEXT: ret double [[VAL]]			; CHECK-NEXT: ret double [[VAL]]
	;			;
	%val = call double @llvm.amdgcn.trig.preop.f64(double poison, i32 %arg)			%val = call double @llvm.amdgcn.trig.preop.f64(double poison, i32 %arg)
	ret double %val			ret double %val
	}			}

	define double @trig_preop_constfold_variable_arg_undef(double %arg) {			define double @trig_preop_constfold_variable_arg_undef(double %arg) {
	; CHECK-LABEL: @trig_preop_constfold_variable_arg_undef(			; CHECK-LABEL: define double @trig_preop_constfold_variable_arg_undef
	; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double [[ARG:%.*]], i32 undef)			; CHECK-SAME: (double [[ARG:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double [[ARG]], i32 undef)
	; CHECK-NEXT: ret double [[VAL]]			; CHECK-NEXT: ret double [[VAL]]
	;			;
	%val = call double @llvm.amdgcn.trig.preop.f64(double %arg, i32 undef)			%val = call double @llvm.amdgcn.trig.preop.f64(double %arg, i32 undef)
	ret double %val			ret double %val
	}			}

	define double @trig_preop_constfold_variable_arg_poison(double %arg) {			define double @trig_preop_constfold_variable_arg_poison(double %arg) {
	; CHECK-LABEL: @trig_preop_constfold_variable_arg_poison(			; CHECK-LABEL: define double @trig_preop_constfold_variable_arg_poison
	; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double [[ARG:%.*]], i32 poison)			; CHECK-SAME: (double [[ARG:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double [[ARG]], i32 poison)
	; CHECK-NEXT: ret double [[VAL]]			; CHECK-NEXT: ret double [[VAL]]
	;			;
	%val = call double @llvm.amdgcn.trig.preop.f64(double %arg, i32 poison)			%val = call double @llvm.amdgcn.trig.preop.f64(double %arg, i32 poison)
	ret double %val			ret double %val
	}			}

	define double @trig_preop_constfold_variable_int(i32 %arg) {			define double @trig_preop_constfold_variable_int(i32 %arg) {
	; CHECK-LABEL: @trig_preop_constfold_variable_int(			; CHECK-LABEL: define double @trig_preop_constfold_variable_int
	; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double 3.454350e+02, i32 [[ARG:%.*]])			; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double 3.454350e+02, i32 [[ARG]])
	; CHECK-NEXT: ret double [[VAL]]			; CHECK-NEXT: ret double [[VAL]]
	;			;
	%val = call double @llvm.amdgcn.trig.preop.f64(double 3.454350e+02, i32 %arg)			%val = call double @llvm.amdgcn.trig.preop.f64(double 3.454350e+02, i32 %arg)
	ret double %val			ret double %val
	}			}

	define double @trig_preop_qnan(i32 %arg) {			define double @trig_preop_qnan(i32 %arg) {
	; CHECK-LABEL: @trig_preop_qnan(			; CHECK-LABEL: define double @trig_preop_qnan
	; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double 0x7FF8000000000000, i32 [[ARG:%.*]])			; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double 0x7FF8000000000000, i32 [[ARG]])
	; CHECK-NEXT: ret double [[VAL]]			; CHECK-NEXT: ret double [[VAL]]
	;			;
	%val = call double @llvm.amdgcn.trig.preop.f64(double 0x7FF8000000000000, i32 %arg)			%val = call double @llvm.amdgcn.trig.preop.f64(double 0x7FF8000000000000, i32 %arg)
	ret double %val			ret double %val
	}			}

	define double @trig_preop_snan(i32 %arg) {			define double @trig_preop_snan(i32 %arg) {
	; CHECK-LABEL: @trig_preop_snan(			; CHECK-LABEL: define double @trig_preop_snan
	; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double 0x7FF0000000000001, i32 [[ARG:%.*]])			; CHECK-SAME: (i32 [[ARG:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double 0x7FF0000000000001, i32 [[ARG]])
	; CHECK-NEXT: ret double [[VAL]]			; CHECK-NEXT: ret double [[VAL]]
	;			;
	%val = call double @llvm.amdgcn.trig.preop.f64(double 0x7FF0000000000001, i32 %arg)			%val = call double @llvm.amdgcn.trig.preop.f64(double 0x7FF0000000000001, i32 %arg)
	ret double %val			ret double %val
	}			}

	define double @trig_preop_inf_0() {			define double @trig_preop_inf_0() {
	; CHECK-LABEL: @trig_preop_inf_0(			; CHECK-LABEL: define double @trig_preop_inf_0
				; CHECK-SAME: () #[[ATTR3]] {
	; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double 0x7FF0000000000000, i32 0)			; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double 0x7FF0000000000000, i32 0)
	; CHECK-NEXT: ret double [[VAL]]			; CHECK-NEXT: ret double [[VAL]]
	;			;
	%val = call double @llvm.amdgcn.trig.preop.f64(double 0x7FF0000000000000, i32 0)			%val = call double @llvm.amdgcn.trig.preop.f64(double 0x7FF0000000000000, i32 0)
	ret double %val			ret double %val
	}			}

	define double @trig_preop_ninf_0() {			define double @trig_preop_ninf_0() {
	; CHECK-LABEL: @trig_preop_ninf_0(			; CHECK-LABEL: define double @trig_preop_ninf_0
				; CHECK-SAME: () #[[ATTR3]] {
	; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double 0xFFF0000000000000, i32 0)			; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double 0xFFF0000000000000, i32 0)
	; CHECK-NEXT: ret double [[VAL]]			; CHECK-NEXT: ret double [[VAL]]
	;			;
	%val = call double @llvm.amdgcn.trig.preop.f64(double 0xFFF0000000000000, i32 0)			%val = call double @llvm.amdgcn.trig.preop.f64(double 0xFFF0000000000000, i32 0)
	ret double %val			ret double %val
	}			}

	define double @trig_preop_variable_fp(double %arg) {			define double @trig_preop_variable_fp(double %arg) {
	; CHECK-LABEL: @trig_preop_variable_fp(			; CHECK-LABEL: define double @trig_preop_variable_fp
	; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double [[ARG:%.*]], i32 5)			; CHECK-SAME: (double [[ARG:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double [[ARG]], i32 5)
	; CHECK-NEXT: ret double [[VAL]]			; CHECK-NEXT: ret double [[VAL]]
	;			;
	%val = call double @llvm.amdgcn.trig.preop.f64(double %arg, i32 5)			%val = call double @llvm.amdgcn.trig.preop.f64(double %arg, i32 5)
	ret double %val			ret double %val
	}			}

	define double @trig_preop_variable_args(double %arg0, i32 %arg1) {			define double @trig_preop_variable_args(double %arg0, i32 %arg1) {
	; CHECK-LABEL: @trig_preop_variable_args(			; CHECK-LABEL: define double @trig_preop_variable_args
	; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double [[ARG0:%.*]], i32 [[ARG1:%.*]])			; CHECK-SAME: (double [[ARG0:%.*]], i32 [[ARG1:%.*]]) #[[ATTR3]] {
				; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double [[ARG0]], i32 [[ARG1]])
	; CHECK-NEXT: ret double [[VAL]]			; CHECK-NEXT: ret double [[VAL]]
	;			;
	%val = call double @llvm.amdgcn.trig.preop.f64(double %arg0, i32 %arg1)			%val = call double @llvm.amdgcn.trig.preop.f64(double %arg0, i32 %arg1)
	ret double %val			ret double %val
	}			}

	define double @trig_preop_constfold() {			define double @trig_preop_constfold() {
	; CHECK-LABEL: @trig_preop_constfold(			; CHECK-LABEL: define double @trig_preop_constfold
				; CHECK-SAME: () #[[ATTR3]] {
	; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double 3.454350e+02, i32 5)			; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double 3.454350e+02, i32 5)
	; CHECK-NEXT: ret double [[VAL]]			; CHECK-NEXT: ret double [[VAL]]
	;			;
	%val = call double @llvm.amdgcn.trig.preop.f64(double 3.454350e+02, i32 5)			%val = call double @llvm.amdgcn.trig.preop.f64(double 3.454350e+02, i32 5)
	ret double %val			ret double %val
	}			}

	define double @trig_preop_constfold_strictfp() {			define double @trig_preop_constfold_strictfp() {
	; CHECK-LABEL: @trig_preop_constfold_strictfp(			; CHECK-LABEL: define double @trig_preop_constfold_strictfp
	; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double 3.454350e+02, i32 5) #[[ATTR16]]			; CHECK-SAME: () #[[ATTR3]] {
				; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double 3.454350e+02, i32 5) #[[ATTR15]]
	; CHECK-NEXT: ret double [[VAL]]			; CHECK-NEXT: ret double [[VAL]]
	;			;
	%val = call double @llvm.amdgcn.trig.preop.f64(double 3.454350e+02, i32 5) strictfp			%val = call double @llvm.amdgcn.trig.preop.f64(double 3.454350e+02, i32 5) strictfp
	ret double %val			ret double %val
	}			}

	define double @trig_preop_constfold_0.0__0() {			define double @trig_preop_constfold_0.0__0() {
	; CHECK-LABEL: @trig_preop_constfold_0.0__0(			; CHECK-LABEL: define double @trig_preop_constfold_0.0__0
				; CHECK-SAME: () #[[ATTR3]] {
	; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double 0.000000e+00, i32 0)			; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double 0.000000e+00, i32 0)
	; CHECK-NEXT: ret double [[VAL]]			; CHECK-NEXT: ret double [[VAL]]
	;			;
	%val = call double @llvm.amdgcn.trig.preop.f64(double 0.0, i32 0)			%val = call double @llvm.amdgcn.trig.preop.f64(double 0.0, i32 0)
	ret double %val			ret double %val
	}			}

	define double @trig_preop_constfold_0.0__1() {			define double @trig_preop_constfold_0.0__1() {
	; CHECK-LABEL: @trig_preop_constfold_0.0__1(			; CHECK-LABEL: define double @trig_preop_constfold_0.0__1
				; CHECK-SAME: () #[[ATTR3]] {
	; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double 0.000000e+00, i32 1)			; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double 0.000000e+00, i32 1)
	; CHECK-NEXT: ret double [[VAL]]			; CHECK-NEXT: ret double [[VAL]]
	;			;
	%val = call double @llvm.amdgcn.trig.preop.f64(double 0.0, i32 1)			%val = call double @llvm.amdgcn.trig.preop.f64(double 0.0, i32 1)
	ret double %val			ret double %val
	}			}

	define double @trig_preop_constfold_0.0__neg1() {			define double @trig_preop_constfold_0.0__neg1() {
	; CHECK-LABEL: @trig_preop_constfold_0.0__neg1(			; CHECK-LABEL: define double @trig_preop_constfold_0.0__neg1
				; CHECK-SAME: () #[[ATTR3]] {
	; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double 0.000000e+00, i32 -1)			; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double 0.000000e+00, i32 -1)
	; CHECK-NEXT: ret double [[VAL]]			; CHECK-NEXT: ret double [[VAL]]
	;			;
	%val = call double @llvm.amdgcn.trig.preop.f64(double 0.0, i32 -1)			%val = call double @llvm.amdgcn.trig.preop.f64(double 0.0, i32 -1)
	ret double %val			ret double %val
	}			}

	define double @trig_preop_constfold_0.0__9999999() {			define double @trig_preop_constfold_0.0__9999999() {
	; CHECK-LABEL: @trig_preop_constfold_0.0__9999999(			; CHECK-LABEL: define double @trig_preop_constfold_0.0__9999999
				; CHECK-SAME: () #[[ATTR3]] {
	; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double 0.000000e+00, i32 9999999)			; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double 0.000000e+00, i32 9999999)
	; CHECK-NEXT: ret double [[VAL]]			; CHECK-NEXT: ret double [[VAL]]
	;			;
	%val = call double @llvm.amdgcn.trig.preop.f64(double 0.0, i32 9999999)			%val = call double @llvm.amdgcn.trig.preop.f64(double 0.0, i32 9999999)
	ret double %val			ret double %val
	}			}

	define double @trig_preop_constfold_0.0__neg999999() {			define double @trig_preop_constfold_0.0__neg999999() {
	; CHECK-LABEL: @trig_preop_constfold_0.0__neg999999(			; CHECK-LABEL: define double @trig_preop_constfold_0.0__neg999999
				; CHECK-SAME: () #[[ATTR3]] {
	; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double 0.000000e+00, i32 -999999)			; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double 0.000000e+00, i32 -999999)
	; CHECK-NEXT: ret double [[VAL]]			; CHECK-NEXT: ret double [[VAL]]
	;			;
	%val = call double @llvm.amdgcn.trig.preop.f64(double 0.0, i32 -999999)			%val = call double @llvm.amdgcn.trig.preop.f64(double 0.0, i32 -999999)
	ret double %val			ret double %val
	}			}

	define double @trig_preop_constfold_0x0020000000000000_0() {			define double @trig_preop_constfold_0x0020000000000000_0() {
	; CHECK-LABEL: @trig_preop_constfold_0x0020000000000000_0(			; CHECK-LABEL: define double @trig_preop_constfold_0x0020000000000000_0
				; CHECK-SAME: () #[[ATTR3]] {
	; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double 0x10000000000000, i32 0)			; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double 0x10000000000000, i32 0)
	; CHECK-NEXT: ret double [[VAL]]			; CHECK-NEXT: ret double [[VAL]]
	;			;
	%val = call double @llvm.amdgcn.trig.preop.f64(double 0x0010000000000000, i32 0)			%val = call double @llvm.amdgcn.trig.preop.f64(double 0x0010000000000000, i32 0)
	ret double %val			ret double %val
	}			}

	define double @trig_preop_constfold_0x001fffffffffffff_0() {			define double @trig_preop_constfold_0x001fffffffffffff_0() {
	; CHECK-LABEL: @trig_preop_constfold_0x001fffffffffffff_0(			; CHECK-LABEL: define double @trig_preop_constfold_0x001fffffffffffff_0
				; CHECK-SAME: () #[[ATTR3]] {
	; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double 0xFFFFFFFFFFFFF, i32 0)			; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double 0xFFFFFFFFFFFFF, i32 0)
	; CHECK-NEXT: ret double [[VAL]]			; CHECK-NEXT: ret double [[VAL]]
	;			;
	%val = call double @llvm.amdgcn.trig.preop.f64(double 0x000fffffffffffff, i32 0)			%val = call double @llvm.amdgcn.trig.preop.f64(double 0x000fffffffffffff, i32 0)
	ret double %val			ret double %val
	}			}

	define double @trig_preop_constfold_0x8020000000000000_0() {			define double @trig_preop_constfold_0x8020000000000000_0() {
	; CHECK-LABEL: @trig_preop_constfold_0x8020000000000000_0(			; CHECK-LABEL: define double @trig_preop_constfold_0x8020000000000000_0
				; CHECK-SAME: () #[[ATTR3]] {
	; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double 0x8020000000000000, i32 0)			; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double 0x8020000000000000, i32 0)
	; CHECK-NEXT: ret double [[VAL]]			; CHECK-NEXT: ret double [[VAL]]
	;			;
	%val = call double @llvm.amdgcn.trig.preop.f64(double 0x8020000000000000, i32 0)			%val = call double @llvm.amdgcn.trig.preop.f64(double 0x8020000000000000, i32 0)
	ret double %val			ret double %val
	}			}

	define double @trig_preop_constfold_0x801fffffffffffff_0() {			define double @trig_preop_constfold_0x801fffffffffffff_0() {
	; CHECK-LABEL: @trig_preop_constfold_0x801fffffffffffff_0(			; CHECK-LABEL: define double @trig_preop_constfold_0x801fffffffffffff_0
				; CHECK-SAME: () #[[ATTR3]] {
	; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double 0x801FFFFFFFFFFFFF, i32 0)			; CHECK-NEXT: [[VAL:%.*]] = call double @llvm.amdgcn.trig.preop.f64(double 0x801FFFFFFFFFFFFF, i32 0)
	; CHECK-NEXT: ret double [[VAL]]			; CHECK-NEXT: ret double [[VAL]]
	;			;
	%val = call double @llvm.amdgcn.trig.preop.f64(double 0x801fffffffffffff, i32 0)			%val = call double @llvm.amdgcn.trig.preop.f64(double 0x801fffffffffffff, i32 0)
	ret double %val			ret double %val
	}			}

llvm/test/Verifier/AMDGPU/intrinsic-immarg.ll

(549 lines not shown)
define i32 @test_udot4(i32 %arg0, i32 %arg1, i32 %arg2, i1 %arg3) {
%val = call i32 @llvm.amdgcn.udot4(i32 %arg0, i32 %arg1, i32 %arg2, i1 %arg3)		%val = call i32 @llvm.amdgcn.udot4(i32 %arg0, i32 %arg1, i32 %arg2, i1 %arg3)
ret i32 %val		ret i32 %val
}		}

declare i32 @llvm.amdgcn.permlane16(i32, i32, i32, i32, i1, i1)		declare i32 @llvm.amdgcn.permlane16(i32, i32, i32, i32, i1, i1)
define i32 @test_permlane16(ptr addrspace(1) %out, i32 %arg0, i32 %arg1, i32 %arg2, i1 %arg3, i1 %arg4) {		define i32 @test_permlane16(ptr addrspace(1) %out, i32 %arg0, i32 %arg1, i32 %arg2, i1 %arg3, i1 %arg4) {
; CHECK: immarg operand has non-immediate parameter		; CHECK: immarg operand has non-immediate parameter
; CHECK-NEXT: i1 %arg3		; CHECK-NEXT: i1 %arg3
; CHECK-NEXT: %v1 = call i32 @llvm.amdgcn.permlane16(i32 %arg0, i32 %arg0, i32 %arg1, i32 %arg2, i1 %arg3, i1 false)		; CHECK-NEXT: %v1 = call i32 @llvm.amdgcn.permlane16.i32(i32 %arg0, i32 %arg0, i32 %arg1, i32 %arg2, i1 %arg3, i1 false)
%v1 = call i32 @llvm.amdgcn.permlane16(i32 %arg0, i32 %arg0, i32 %arg1, i32 %arg2, i1 %arg3, i1 false)		%v1 = call i32 @llvm.amdgcn.permlane16(i32 %arg0, i32 %arg0, i32 %arg1, i32 %arg2, i1 %arg3, i1 false)

; CHECK: immarg operand has non-immediate parameter		; CHECK: immarg operand has non-immediate parameter
; CHECK-NEXT: i1 %arg4		; CHECK-NEXT: i1 %arg4
; CHECK-NEXT: call i32 @llvm.amdgcn.permlane16(i32 %v2, i32 %arg0, i32 %arg1, i32 %arg2, i1 false, i1 %arg4)		; CHECK-NEXT: call i32 @llvm.amdgcn.permlane16.i32(i32 %v2, i32 %arg0, i32 %arg1, i32 %arg2, i1 false, i1 %arg4)
%v2 = call i32 @llvm.amdgcn.permlane16(i32 %v2, i32 %arg0, i32 %arg1, i32 %arg2, i1 false, i1 %arg4)		%v2 = call i32 @llvm.amdgcn.permlane16(i32 %v2, i32 %arg0, i32 %arg1, i32 %arg2, i1 false, i1 %arg4)
ret i32 %v2		ret i32 %v2
}		}

declare i32 @llvm.amdgcn.permlanex16(i32, i32, i32, i32, i1, i1)		declare i32 @llvm.amdgcn.permlanex16(i32, i32, i32, i32, i1, i1)
define i32 @test_permlanex16(ptr addrspace(1) %out, i32 %arg0, i32 %arg1, i32 %arg2, i1 %arg3, i1 %arg4) {		define i32 @test_permlanex16(ptr addrspace(1) %out, i32 %arg0, i32 %arg1, i32 %arg2, i1 %arg3, i1 %arg4) {
; CHECK: immarg operand has non-immediate parameter		; CHECK: immarg operand has non-immediate parameter
; CHECK-NEXT: i1 %arg3		; CHECK-NEXT: i1 %arg3
; CHECK-NEXT: %v1 = call i32 @llvm.amdgcn.permlanex16(i32 %arg0, i32 %arg0, i32 %arg1, i32 %arg2, i1 %arg3, i1 false)		; CHECK-NEXT: %v1 = call i32 @llvm.amdgcn.permlanex16.i32(i32 %arg0, i32 %arg0, i32 %arg1, i32 %arg2, i1 %arg3, i1 false)
%v1 = call i32 @llvm.amdgcn.permlanex16(i32 %arg0, i32 %arg0, i32 %arg1, i32 %arg2, i1 %arg3, i1 false)		%v1 = call i32 @llvm.amdgcn.permlanex16(i32 %arg0, i32 %arg0, i32 %arg1, i32 %arg2, i1 %arg3, i1 false)

; CHECK: immarg operand has non-immediate parameter		; CHECK: immarg operand has non-immediate parameter
; CHECK-NEXT: i1 %arg4		; CHECK-NEXT: i1 %arg4
; CHECK-NEXT: call i32 @llvm.amdgcn.permlanex16(i32 %v2, i32 %arg0, i32 %arg1, i32 %arg2, i1 false, i1 %arg4)		; CHECK-NEXT: call i32 @llvm.amdgcn.permlanex16.i32(i32 %v2, i32 %arg0, i32 %arg1, i32 %arg2, i1 false, i1 %arg4)
%v2 = call i32 @llvm.amdgcn.permlanex16(i32 %v2, i32 %arg0, i32 %arg1, i32 %arg2, i1 false, i1 %arg4)		%v2 = call i32 @llvm.amdgcn.permlanex16(i32 %v2, i32 %arg0, i32 %arg1, i32 %arg2, i1 false, i1 %arg4)
ret i32 %v2		ret i32 %v2
}		}

declare float @llvm.amdgcn.interp.p1(float, i32, i32, i32)		declare float @llvm.amdgcn.interp.p1(float, i32, i32, i32)
define void @test_interp_p1(float %arg0, i32 %arg1, i32 %arg2, i32 %arg3) {		define void @test_interp_p1(float %arg0, i32 %arg1, i32 %arg2, i32 %arg3) {
; CHECK: immarg operand has non-immediate parameter		; CHECK: immarg operand has non-immediate parameter
; CHECK-NEXT: i32 %arg1		; CHECK-NEXT: i32 %arg1
(9 lines not shown)
ret void		ret void
}		}

declare float @llvm.amdgcn.interp.p2(float, float, i32, i32, i32)		declare float @llvm.amdgcn.interp.p2(float, float, i32, i32, i32)
define void @test_interp_p2(float %arg0, float %arg1, i32 %arg2, i32 %arg3, i32 %arg4) {		define void @test_interp_p2(float %arg0, float %arg1, i32 %arg2, i32 %arg3, i32 %arg4) {
; CHECK: immarg operand has non-immediate parameter		; CHECK: immarg operand has non-immediate parameter
; CHECK-NEXT: i32 %arg2		; CHECK-NEXT: i32 %arg2
; CHECK-NEXT: %val0 = call float @llvm.amdgcn.interp.p2(float %arg0, float %arg1, i32 %arg2, i32 0, i32 0)		; CHECK-NEXT: %val0 = call float @llvm.amdgcn.interp.p2(float %arg0, float %arg1, i32 %arg2, i32 0, i32 0)

%val0 = call float @llvm.amdgcn.interp.p2(float %arg0, float %arg1, i32 %arg2, i32 0, i32 0)		%val0 = call float @llvm.amdgcn.interp.p2(float %arg0, float %arg1, i32 %arg2, i32 0, i32 0)
store volatile float %val0, ptr addrspace(1) undef		store volatile float %val0, ptr addrspace(1) undef

; CHECK: immarg operand has non-immediate parameter		; CHECK: immarg operand has non-immediate parameter
; CHECK-NEXT: i32 %arg3		; CHECK-NEXT: i32 %arg3
; CHECK-NEXT: %val1 = call float @llvm.amdgcn.interp.p2(float %arg0, float %arg1, i32 0, i32 %arg3, i32 0)		; CHECK-NEXT: %val1 = call float @llvm.amdgcn.interp.p2(float %arg0, float %arg1, i32 0, i32 %arg3, i32 0)
%val1 = call float @llvm.amdgcn.interp.p2(float %arg0, float %arg1, i32 0, i32 %arg3, i32 0)		%val1 = call float @llvm.amdgcn.interp.p2(float %arg0, float %arg1, i32 0, i32 %arg3, i32 0)
store volatile float %val1, ptr addrspace(1) undef		store volatile float %val1, ptr addrspace(1) undef
(103 lines not shown)

Revision Contents

Diff 533080

clang/include/clang/Basic/BuiltinsAMDGPU.def

clang/lib/CodeGen/CGBuiltin.cpp

clang/test/CodeGenOpenCL/builtins-amdgcn-gfx10.cl

clang/test/CodeGenOpenCL/builtins-amdgcn.cl

clang/test/SemaOpenCL/builtins-amdgcn-error-gfx10-param.cl

llvm/include/llvm/IR/IntrinsicsAMDGPU.td

llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp

llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp

llvm/lib/Target/AMDGPU/AMDGPULateCodeGenPrepare.cpp

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

llvm/lib/Target/AMDGPU/SIInstructions.td

llvm/lib/Target/AMDGPU/VOP3Instructions.td

llvm/test/Analysis/UniformityAnalysis/AMDGPU/intrinsics.ll

llvm/test/Assembler/autoupgrade-amdgpu-intrinsics.ll

llvm/test/CodeGen/AMDGPU/GlobalISel/atomic_optimizations_mul_one.ll

llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-amdgcn.readfirstlane.mir

llvm/test/CodeGen/AMDGPU/atomic_optimizations_local_pointer.ll

llvm/test/CodeGen/AMDGPU/global-atomic-scan.ll

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.permlane.ll

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.readfirstlane.ll

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.readlane.ll

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll

llvm/test/CodeGen/AMDGPU/permlane-ptr.ll

llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll

llvm/test/Verifier/AMDGPU/intrinsic-immarg.ll
