This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Add llvm.amdgcn.{read,readfirst,write}lane2 intrinsics with type overloads
Abandoned, Public

Authored by nhaehnle on Aug 18 2020, 10:23 AM.

Details

Reviewers

foad
arsenm

Summary

These intrinsics should work at least with standard integer and floating
point sizes, pointers, and vectors of those.

This fixes selection for non-s32 types when readfirstlane is inserted
for SGPR return values.

Moving the atomic optimizer pass in the pass pipeline so that it can be
simplified and rely on the more general support of lane intrinsics.

API users should move to these new intrinsics so that we can remove the
old versions.
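
As a rough illustration of what the type-overloaded lane intrinsics compute, the following is a minimal host-side model, not LLVM code. The `Wave` struct and the function names are hypothetical; the semantics follow the comments in IntrinsicsAMDGPU.td (readfirstlane reads the lowest active lane, and writelane returns the new value on the selected lane and the old value elsewhere).

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model of one wave: a value of type T per lane plus an
// execution mask. This is an illustration only, not an LLVM or hardware API.
template <typename T, std::size_t N>
struct Wave {
  std::array<T, N> lanes{};
  std::array<bool, N> active{};
};

// readfirstlane: the value held by the lowest active lane.
template <typename T, std::size_t N>
T readFirstLane(const Wave<T, N> &W) {
  for (std::size_t I = 0; I < N; ++I)
    if (W.active[I])
      return W.lanes[I];
  assert(false && "no active lane");
  return T{};
}

// readlane: the value held by one uniformly selected lane.
template <typename T, std::size_t N>
T readLane(const Wave<T, N> &W, std::size_t Lane) {
  assert(Lane < N && "lane select out of range");
  return W.lanes[Lane];
}

// writelane: the selected lane returns NewValue, every other lane returns
// its entry of Old.
template <typename T, std::size_t N>
std::array<T, N> writeLane(const T &NewValue, std::size_t Lane,
                           const std::array<T, N> &Old) {
  assert(Lane < N && "lane select out of range");
  std::array<T, N> Result = Old;
  Result[Lane] = NewValue;
  return Result;
}
```

Because the model is templated on `T`, the same three operations apply to integers, floating-point values, or pointers, which is the generality the summary asks of the intrinsics.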

Change-Id: I1c5e7e7858890e1c30d3b46c8551e74ab7027552
Based-on: https://reviews.llvm.org/D84639

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	20 ms	linux > LLVM-Unit.IR/_/IRTests::IRBuilderTest.Intrinsics
	50 ms	linux > LLVM.Analysis/DivergenceAnalysis/AMDGPU::intrinsics.ll
	30 ms	windows > LLVM-Unit.IR/_/IRTests_exe::IRBuilderTest.Intrinsics
	130 ms	windows > LLVM.Analysis/DivergenceAnalysis/AMDGPU::intrinsics.ll

Event Timeline

nhaehnle created this revision. Aug 18 2020, 10:23 AM

Herald added projects: Restricted Project, Restricted Project. Aug 18 2020, 10:23 AM

Herald added subscribers: cfe-commits, kerbowa, jfb and 7 others.

nhaehnle requested review of this revision. Aug 18 2020, 10:23 AM

Herald added a subscriber: wdng. Aug 18 2020, 10:23 AM

nhaehnle added a reviewer: foad. Aug 18 2020, 10:24 AM

nhaehnle mentioned this in D84639: AMDGPU: Add type mangling to llvm.amdgcn.readfirstlane.

Do we really have to use worse names here? Keeping the name works even if suboptimal for the attributes

Note that part of my motivation here over D84639 is to support more general types on the lane intrinsics, since they also express some semantic content which would be interesting to be able to express e.g. on descriptors. I wasn't able to bend the SelectionDAG type legalization to my will, so that's why I instead "legalize" the intrinsics in the AMDGPUCodeGenPrepare pass.
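
The "legalization" described here amounts to reducing every value to 32-bit pieces, applying the single-dword operation to each piece, and reassembling the result. A minimal host-side sketch of that idea follows. The helper names (`roundUpToDwords`, `applyPerDword`, `applyToDouble`) and the `DwordOp` callback are hypothetical stand-ins for the per-dword intrinsic call; only the rounding expression is taken from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <functional>

// Round a bit width up to the next multiple of 32, using the same
// expression as the patch: (BitWidth + 31) & ~31.
constexpr unsigned roundUpToDwords(unsigned BitWidth) {
  return (BitWidth + 31u) & ~31u;
}

// Hypothetical stand-in for a lane operation that only exists for i32.
using DwordOp = std::function<std::uint32_t(std::uint32_t)>;

// Split a 64-bit value into two dwords, apply the 32-bit operation to each
// half, and recombine the results.
std::uint64_t applyPerDword(std::uint64_t Value, const DwordOp &Op) {
  const std::uint32_t Lo = static_cast<std::uint32_t>(Value);
  const std::uint32_t Hi = static_cast<std::uint32_t>(Value >> 32);
  const std::uint64_t NewLo = Op(Lo);
  const std::uint64_t NewHi = Op(Hi);
  return (NewHi << 32) | NewLo;
}

// Non-integer types are first reinterpreted as an integer of the same size,
// mirroring the bitcast step in the patch.
double applyToDouble(double Value, const DwordOp &Op) {
  static_assert(sizeof(double) == sizeof(std::uint64_t), "expects 64-bit double");
  std::uint64_t Bits;
  std::memcpy(&Bits, &Value, sizeof Bits);
  Bits = applyPerDword(Bits, Op);
  std::memcpy(&Value, &Bits, sizeof Value);
  return Value;
}
```

With an identity operation, `applyToDouble` returns its input unchanged, which is a quick way to check that the split and recombine steps are inverse to each other.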

In D86154#2224270, @nhaehnle wrote:

Note that part of my motivation here over D84639 is to support more general types on the lane intrinsics, since they also express some semantic content which would be interesting to be able to express e.g. on descriptors. I wasn't able to bend the SelectionDAG type legalization to my will, so that's why I instead "legalize" the intrinsics in the AMDGPUCodeGenPrepare pass.

Don't you just need to handle this in ReplaceNodeResults the same way?

b-sumner added a subscriber: b-sumner. Aug 18 2020, 11:03 AM

Harbormaster completed remote builds in B68773: Diff 286336. Aug 18 2020, 11:20 AM

nhaehnle added a parent revision: D86317: IRBuilder: add CreateIntrinsicByType method. Aug 20 2020, 1:43 PM

Don't duplicate the intrinsics. Rely on D86317 to reduce the pain of this
change caused to downstream users.

In D86154#2224272, @arsenm wrote:

In D86154#2224270, @nhaehnle wrote:

Note that part of my motivation here over D84639 is to support more general types on the lane intrinsics, since they also express some semantic content which would be interesting to be able to express e.g. on descriptors. I wasn't able to bend the SelectionDAG type legalization to my will, so that's why I instead "legalize" the intrinsics in the AMDGPUCodeGenPrepare pass.

Don't you just need to handle this in ReplaceNodeResults the same way?

ReplaceNodeResults expects the result type to be changed in semi-magical ways during vector type legalization, which is non-obvious since the method can be called from different places. I think it *could* be made to work with a lot of patience, but it's really a bad interface -- and besides, by doing it in IR we reduce code duplication between SelectionDAG and GlobalISel, which is an added benefit IMO.

Harbormaster completed remote builds in B69079: Diff 286898. Aug 20 2020, 3:06 PM

In D86154#2229292, @nhaehnle wrote:

In D86154#2224272, @arsenm wrote:

In D86154#2224270, @nhaehnle wrote:

Note that part of my motivation here over D84639 is to support more general types on the lane intrinsics, since they also express some semantic content which would be interesting to be able to express e.g. on descriptors. I wasn't able to bend the SelectionDAG type legalization to my will, so that's why I instead "legalize" the intrinsics in the AMDGPUCodeGenPrepare pass.

Don't you just need to handle this in ReplaceNodeResults the same way?

ReplaceNodeResults expects the result type to be changed in semi-magical ways during vector type legalization, which is non-obvious since the method can be called from different places. I think it *could* be made to work with a lot of patience, but it's really a bad interface -- and besides, by doing it in IR we reduce code duplication between SelectionDAG and GlobalISel, which is an added benefit IMO.

Well the globalisel handling should be much simpler. We have a lot of stuff that's randomly handled in the IR to work around the DAG which long term should be moved where it belongs in codegen

ReplaceNodeResults expects the result type to be changed in semi-magical ways during vector type legalization, which is non-obvious since the method can be called from different places. I think it *could* be made to work with a lot of patience, but it's really a bad interface -- and besides, by doing it in IR we reduce code duplication between SelectionDAG and GlobalISel, which is an added benefit IMO.

Well the globalisel handling should be much simpler. We have a lot of stuff that's randomly handled in the IR to work around the DAG which long term should be moved where it belongs in codegen

Is the GlobalISel handling simpler than the handling in IR? What would happen if we wanted to extend the handling to struct types?

jrbyrnes mentioned this in D147732: [AMDGPU] Add type mangling for {read, write, readfirst, perm}lane intrinsics. May 12 2023, 1:48 PM

Should be obsoleted by D147732

This revision now requires changes to proceed. Jul 28 2023, 11:45 AM

Herald added a project: Restricted Project. Jul 28 2023, 11:45 AM

Herald added subscribers: nlopes, StephenFan.

arsenm resigned from this revision. Jul 28 2023, 11:46 AM

This revision now requires review to proceed. Jul 28 2023, 11:46 AM

Herald added a subscriber: arsenm. Jul 28 2023, 11:46 AM

Indeed.

Revision Contents

Path

Size

clang/

lib/

CodeGen/

CGBuiltin.cpp

4 lines

test/

CodeGenOpenCL/

builtins-amdgcn.cl

6 lines

llvm/

include/

llvm/

IR/

IntrinsicsAMDGPU.td

17 lines

lib/

Target/

AMDGPU/

AMDGPUAtomicOptimizer.cpp

54 lines

AMDGPUCodeGenPrepare.cpp

140 lines

AMDGPUInstCombineIntrinsic.cpp

11 lines

AMDGPUTargetMachine.cpp

6 lines

SIInstructions.td

2 lines

test/

CodeGen/

AMDGPU/

GlobalISel/

inst-select-amdgcn.readfirstlane.mir

16 lines

llvm.amdgcn.readfirstlane.ll

108 lines

llvm.amdgcn.readlane.ll

59 lines

llvm.amdgcn.writelane.ll

59 lines

Transforms/

InstCombine/

AMDGPU/

amdgcn-intrinsics.ll

89 lines

Diff 286898

clang/lib/CodeGen/CGBuiltin.cpp

(14,867 lines not shown)	if (ProcessOrderScopeAMDGCN(EmitScalarExpr(E->getArg(2)),
bool Volatile =		bool Volatile =
PtrTy->castAs<PointerType>()->getPointeeType().isVolatileQualified();		PtrTy->castAs<PointerType>()->getPointeeType().isVolatileQualified();
Value *IsVolatile = Builder.getInt1(static_cast<bool>(Volatile));		Value *IsVolatile = Builder.getInt1(static_cast<bool>(Volatile));

return Builder.CreateCall(F, {Ptr, Val, MemOrder, MemScope, IsVolatile});		return Builder.CreateCall(F, {Ptr, Val, MemOrder, MemScope, IsVolatile});
}		}
LLVM_FALLTHROUGH;		LLVM_FALLTHROUGH;
}		}
		case AMDGPU::BI__builtin_amdgcn_readfirstlane:
		return emitUnaryBuiltin(*this, E, Intrinsic::amdgcn_readfirstlane);
		case AMDGPU::BI__builtin_amdgcn_readlane:
		return emitBinaryBuiltin(*this, E, Intrinsic::amdgcn_readlane);
default:		default:
return nullptr;		return nullptr;
}		}
}		}

/// Handle a SystemZ function in which the final argument is a pointer		/// Handle a SystemZ function in which the final argument is a pointer
/// to an int that receives the post-instruction CC value. At the LLVM level		/// to an int that receives the post-instruction CC value. At the LLVM level
/// this is represented as a function that returns a {result, cc} pair.		/// this is represented as a function that returns a {result, cc} pair.
(1,891 lines not shown)

clang/test/CodeGenOpenCL/builtins-amdgcn.cl

	(285 lines not shown)

	// CHECK-LABEL: @test_ds_bpermute			// CHECK-LABEL: @test_ds_bpermute
	// CHECK: call i32 @llvm.amdgcn.ds.bpermute(i32 %a, i32 %b)			// CHECK: call i32 @llvm.amdgcn.ds.bpermute(i32 %a, i32 %b)
	void test_ds_bpermute(global int* out, int a, int b)			void test_ds_bpermute(global int* out, int a, int b)
	{			{
	*out = __builtin_amdgcn_ds_bpermute(a, b);			*out = __builtin_amdgcn_ds_bpermute(a, b);
	}			}

	// CHECK-LABEL: @test_readfirstlane			// CHECK-LABEL: @test_readfirstlane(
	// CHECK: call i32 @llvm.amdgcn.readfirstlane(i32 %a)			// CHECK: call i32 @llvm.amdgcn.readfirstlane.i32(i32 %a)
	void test_readfirstlane(global int* out, int a)			void test_readfirstlane(global int* out, int a)
	{			{
	*out = __builtin_amdgcn_readfirstlane(a);			*out = __builtin_amdgcn_readfirstlane(a);
	}			}

	// CHECK-LABEL: @test_readlane			// CHECK-LABEL: @test_readlane
	// CHECK: call i32 @llvm.amdgcn.readlane(i32 %a, i32 %b)			// CHECK: call i32 @llvm.amdgcn.readlane.i32(i32 %a, i32 %b)
	void test_readlane(global int* out, int a, int b)			void test_readlane(global int* out, int a, int b)
	{			{
	*out = __builtin_amdgcn_readlane(a, b);			*out = __builtin_amdgcn_readlane(a, b);
	}			}

	// CHECK-LABEL: @test_fcmp_f32			// CHECK-LABEL: @test_fcmp_f32
	// CHECK: call i64 @llvm.amdgcn.fcmp.i64.f32(float %a, float %b, i32 5)			// CHECK: call i64 @llvm.amdgcn.fcmp.i64.f32(float %a, float %b, i32 5)
	void test_fcmp_f32(global ulong* out, float a, float b)			void test_fcmp_f32(global ulong* out, float a, float b)
	(434 lines not shown)

llvm/include/llvm/IR/IntrinsicsAMDGPU.td

(1,410 lines not shown)	Intrinsic<[llvm_anyint_ty], [llvm_anyfloat_ty, LLVMMatchType<1>, llvm_i32_ty],
[IntrNoMem, IntrConvergent, IntrWillReturn,		[IntrNoMem, IntrConvergent, IntrWillReturn,
ImmArg<ArgIndex<2>>]>;		ImmArg<ArgIndex<2>>]>;

def int_amdgcn_ballot :		def int_amdgcn_ballot :
Intrinsic<[llvm_anyint_ty], [llvm_i1_ty],		Intrinsic<[llvm_anyint_ty], [llvm_i1_ty],
[IntrNoMem, IntrConvergent, IntrWillReturn]>;		[IntrNoMem, IntrConvergent, IntrWillReturn]>;

def int_amdgcn_readfirstlane :		def int_amdgcn_readfirstlane :
GCCBuiltin<"__builtin_amdgcn_readfirstlane">,		Intrinsic<[llvm_any_ty], [LLVMMatchType<0>],
Intrinsic<[llvm_i32_ty], [llvm_i32_ty],
[IntrNoMem, IntrConvergent, IntrWillReturn]>;		[IntrNoMem, IntrConvergent, IntrWillReturn]>;

// The lane argument must be uniform across the currently active threads of the		// The lane argument must be uniform across the currently active threads of the
// current wave. Otherwise, the result is undefined.		// current wave. Otherwise, the result is undefined.
def int_amdgcn_readlane :		def int_amdgcn_readlane :
GCCBuiltin<"__builtin_amdgcn_readlane">,		Intrinsic<[llvm_any_ty],
Intrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty],		[LLVMMatchType<0>, // data input
		llvm_i32_ty], // uniform lane select
[IntrNoMem, IntrConvergent, IntrWillReturn]>;		[IntrNoMem, IntrConvergent, IntrWillReturn]>;

// The value to write and lane select arguments must be uniform across the		// The value to write and lane select arguments must be uniform across the
// currently active threads of the current wave. Otherwise, the result is		// currently active threads of the current wave. Otherwise, the result is
// undefined.		// undefined.
def int_amdgcn_writelane :		def int_amdgcn_writelane :
GCCBuiltin<"__builtin_amdgcn_writelane">,		Intrinsic<[llvm_any_ty], [
Intrinsic<[llvm_i32_ty], [		LLVMMatchType<0>, // uniform value to write: returned by the selected lane
llvm_i32_ty, // uniform value to write: returned by the selected lane
llvm_i32_ty, // uniform lane select		llvm_i32_ty, // uniform lane select
llvm_i32_ty // returned by all lanes other than the selected one		LLVMMatchType<0> // returned by all lanes other than the selected one
],		],
[IntrNoMem, IntrConvergent, IntrWillReturn]		[IntrNoMem, IntrConvergent, IntrWillReturn]
>;		>;

// FIXME: Deprecated. This is equivalent to llvm.fshr		// FIXME: Deprecated. This is equivalent to llvm.fshr
def int_amdgcn_alignbit : Intrinsic<[llvm_i32_ty],		def int_amdgcn_alignbit : Intrinsic<[llvm_i32_ty],
[llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],		[llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
[IntrNoMem, IntrSpeculatable, IntrWillReturn]		[IntrNoMem, IntrSpeculatable, IntrWillReturn]
(553 lines not shown)

llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp

(279 lines not shown)

// Use the builder to create an inclusive scan of V across the wavefront, with		// Use the builder to create an inclusive scan of V across the wavefront, with
// all lanes active.		// all lanes active.
Value *AMDGPUAtomicOptimizer::buildScan(IRBuilder<> &B, AtomicRMWInst::BinOp Op,		Value *AMDGPUAtomicOptimizer::buildScan(IRBuilder<> &B, AtomicRMWInst::BinOp Op,
Value *V, Value *const Identity) const {		Value *V, Value *const Identity) const {
Type *const Ty = V->getType();		Type *const Ty = V->getType();
Module *M = B.GetInsertBlock()->getModule();		Module *M = B.GetInsertBlock()->getModule();
Function *UpdateDPP =		Function *UpdateDPP =
Intrinsic::getDeclaration(M, Intrinsic::amdgcn_update_dpp, Ty);		Intrinsic::getDeclaration(M, Intrinsic::amdgcn_update_dpp, {Ty});
Function *PermLaneX16 =		Function *PermLaneX16 =
Intrinsic::getDeclaration(M, Intrinsic::amdgcn_permlanex16, {});		Intrinsic::getDeclaration(M, Intrinsic::amdgcn_permlanex16, {});
Function *ReadLane =		Function *ReadLane =
Intrinsic::getDeclaration(M, Intrinsic::amdgcn_readlane, {});		Intrinsic::getDeclaration(M, Intrinsic::amdgcn_readlane, {Ty});

for (unsigned Idx = 0; Idx < 4; Idx++) {		for (unsigned Idx = 0; Idx < 4; Idx++) {
V = buildNonAtomicBinOp(		V = buildNonAtomicBinOp(
B, Op, V,		B, Op, V,
B.CreateCall(UpdateDPP,		B.CreateCall(UpdateDPP,
{Identity, V, B.getInt32(DPP::ROW_SHR0 | 1 << Idx),		{Identity, V, B.getInt32(DPP::ROW_SHR0 | 1 << Idx),
B.getInt32(0xf), B.getInt32(0xf), B.getFalse()}));		B.getInt32(0xf), B.getInt32(0xf), B.getFalse()}));
}		}
(38 lines not shown)

// Use the builder to create a shift right of V across the wavefront, with all		// Use the builder to create a shift right of V across the wavefront, with all
// lanes active, to turn an inclusive scan into an exclusive scan.		// lanes active, to turn an inclusive scan into an exclusive scan.
Value *AMDGPUAtomicOptimizer::buildShiftRight(IRBuilder<> &B, Value *V,		Value *AMDGPUAtomicOptimizer::buildShiftRight(IRBuilder<> &B, Value *V,
Value *const Identity) const {		Value *const Identity) const {
Type *const Ty = V->getType();		Type *const Ty = V->getType();
Module *M = B.GetInsertBlock()->getModule();		Module *M = B.GetInsertBlock()->getModule();
Function *UpdateDPP =		Function *UpdateDPP =
Intrinsic::getDeclaration(M, Intrinsic::amdgcn_update_dpp, Ty);		Intrinsic::getDeclaration(M, Intrinsic::amdgcn_update_dpp, {Ty});
Function *ReadLane =		Function *ReadLane =
Intrinsic::getDeclaration(M, Intrinsic::amdgcn_readlane, {});		Intrinsic::getDeclaration(M, Intrinsic::amdgcn_readlane, {Ty});
Function *WriteLane =		Function *WriteLane =
Intrinsic::getDeclaration(M, Intrinsic::amdgcn_writelane, {});		Intrinsic::getDeclaration(M, Intrinsic::amdgcn_writelane, {Ty});

if (ST->hasDPPWavefrontShifts()) {		if (ST->hasDPPWavefrontShifts()) {
// GFX9 has DPP wavefront shift operations.		// GFX9 has DPP wavefront shift operations.
V = B.CreateCall(UpdateDPP,		V = B.CreateCall(UpdateDPP,
{Identity, V, B.getInt32(DPP::WAVE_SHR1), B.getInt32(0xf),		{Identity, V, B.getInt32(DPP::WAVE_SHR1), B.getInt32(0xf),
B.getInt32(0xf), B.getFalse()});		B.getInt32(0xf), B.getFalse()});
} else {		} else {
// On GFX10 all DPP operations are confined to a single row. To get cross-		// On GFX10 all DPP operations are confined to a single row. To get cross-
(125 lines not shown)	const AtomicRMWInst::BinOp ScanOp =
Op == AtomicRMWInst::Sub ? AtomicRMWInst::Add : Op;		Op == AtomicRMWInst::Sub ? AtomicRMWInst::Add : Op;
NewV = buildScan(B, ScanOp, NewV, Identity);		NewV = buildScan(B, ScanOp, NewV, Identity);
ExclScan = buildShiftRight(B, NewV, Identity);		ExclScan = buildShiftRight(B, NewV, Identity);

// Read the value from the last lane, which has accumlated the values of		// Read the value from the last lane, which has accumlated the values of
// each active lane in the wavefront. This will be our new value which we		// each active lane in the wavefront. This will be our new value which we
// will provide to the atomic operation.		// will provide to the atomic operation.
Value *const LastLaneIdx = B.getInt32(ST->getWavefrontSize() - 1);		Value *const LastLaneIdx = B.getInt32(ST->getWavefrontSize() - 1);
if (TyBitWidth == 64) {		NewV = B.CreateIntrinsic(Intrinsic::amdgcn_readlane, {Ty},
Value *const ExtractLo = B.CreateTrunc(NewV, B.getInt32Ty());
Value *const ExtractHi =
B.CreateTrunc(B.CreateLShr(NewV, 32), B.getInt32Ty());
CallInst *const ReadLaneLo = B.CreateIntrinsic(
Intrinsic::amdgcn_readlane, {}, {ExtractLo, LastLaneIdx});
CallInst *const ReadLaneHi = B.CreateIntrinsic(
Intrinsic::amdgcn_readlane, {}, {ExtractHi, LastLaneIdx});
Value *const PartialInsert = B.CreateInsertElement(
UndefValue::get(VecTy), ReadLaneLo, B.getInt32(0));
Value *const Insert =
B.CreateInsertElement(PartialInsert, ReadLaneHi, B.getInt32(1));
NewV = B.CreateBitCast(Insert, Ty);
} else if (TyBitWidth == 32) {
NewV = B.CreateIntrinsic(Intrinsic::amdgcn_readlane, {},
{NewV, LastLaneIdx});		{NewV, LastLaneIdx});
} else {
llvm_unreachable("Unhandled atomic bit width");
}

// Finally mark the readlanes in the WWM section.		// Finally mark the readlanes in the WWM section.
NewV = B.CreateIntrinsic(Intrinsic::amdgcn_wwm, Ty, NewV);		NewV = B.CreateIntrinsic(Intrinsic::amdgcn_wwm, Ty, NewV);
} else {		} else {
switch (Op) {		switch (Op) {
default:		default:
llvm_unreachable("Unhandled atomic op");		llvm_unreachable("Unhandled atomic op");

(62 lines not shown)	if (NeedResult) {
// Create a PHI node to get our new atomic result into the exit block.		// Create a PHI node to get our new atomic result into the exit block.
PHINode *const PHI = B.CreatePHI(Ty, 2);		PHINode *const PHI = B.CreatePHI(Ty, 2);
PHI->addIncoming(UndefValue::get(Ty), EntryBB);		PHI->addIncoming(UndefValue::get(Ty), EntryBB);
PHI->addIncoming(NewI, SingleLaneTerminator->getParent());		PHI->addIncoming(NewI, SingleLaneTerminator->getParent());

// We need to broadcast the value who was the lowest active lane (the first		// We need to broadcast the value who was the lowest active lane (the first
// lane) to all other lanes in the wavefront. We use an intrinsic for this,		// lane) to all other lanes in the wavefront. We use an intrinsic for this,
// but have to handle 64-bit broadcasts with two calls to this intrinsic.		// but have to handle 64-bit broadcasts with two calls to this intrinsic.
Value *BroadcastI = nullptr;		Value *BroadcastI =
		B.CreateIntrinsic(Intrinsic::amdgcn_readfirstlane, {Ty}, {PHI});
if (TyBitWidth == 64) {
Value *const ExtractLo = B.CreateTrunc(PHI, B.getInt32Ty());
Value *const ExtractHi =
B.CreateTrunc(B.CreateLShr(PHI, 32), B.getInt32Ty());
CallInst *const ReadFirstLaneLo =
B.CreateIntrinsic(Intrinsic::amdgcn_readfirstlane, {}, ExtractLo);
CallInst *const ReadFirstLaneHi =
B.CreateIntrinsic(Intrinsic::amdgcn_readfirstlane, {}, ExtractHi);
Value *const PartialInsert = B.CreateInsertElement(
UndefValue::get(VecTy), ReadFirstLaneLo, B.getInt32(0));
Value *const Insert =
B.CreateInsertElement(PartialInsert, ReadFirstLaneHi, B.getInt32(1));
BroadcastI = B.CreateBitCast(Insert, Ty);
} else if (TyBitWidth == 32) {

BroadcastI = B.CreateIntrinsic(Intrinsic::amdgcn_readfirstlane, {}, PHI);
} else {
llvm_unreachable("Unhandled atomic bit width");
}

// Now that we have the result of our single atomic operation, we need to		// Now that we have the result of our single atomic operation, we need to
// get our individual lane's slice into the result. We use the lane offset		// get our individual lane's slice into the result. We use the lane offset
// we previously calculated combined with the atomic result value we got		// we previously calculated combined with the atomic result value we got
// from the first lane, to get our lane's index into the atomic result.		// from the first lane, to get our lane's index into the atomic result.
Value *LaneOffset = nullptr;		Value *LaneOffset = nullptr;
if (ValDivergent) {		if (ValDivergent) {
LaneOffset = B.CreateIntrinsic(Intrinsic::amdgcn_wwm, Ty, ExclScan);		LaneOffset = B.CreateIntrinsic(Intrinsic::amdgcn_wwm, Ty, ExclScan);
(51 lines not shown)
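
The comments in the AMDGPUAtomicOptimizer.cpp hunk above describe three steps: an inclusive scan across the wavefront, a shift right by one lane that turns it into an exclusive scan, and a read of the last lane to obtain the combined value. The sketch below models those steps on a plain array for the add case only. The function names are hypothetical, every lane is assumed active, and nothing here uses DPP or the real intrinsics.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Inclusive scan (add): lane I ends up with the sum of lanes 0..I.
std::vector<std::uint32_t>
inclusiveScanAdd(const std::vector<std::uint32_t> &Lanes) {
  std::vector<std::uint32_t> Out(Lanes.size());
  std::uint32_t Acc = 0;
  for (std::size_t I = 0; I < Lanes.size(); ++I) {
    Acc += Lanes[I];
    Out[I] = Acc;
  }
  return Out;
}

// Shift every lane's value one lane to the right and fill lane 0 with the
// identity. Applied to an inclusive scan, this yields the exclusive scan.
std::vector<std::uint32_t>
shiftRightByOne(const std::vector<std::uint32_t> &Scan, std::uint32_t Identity) {
  std::vector<std::uint32_t> Out(Scan.size(), Identity);
  for (std::size_t I = 1; I < Scan.size(); ++I)
    Out[I] = Scan[I - 1];
  return Out;
}

// The last lane of the inclusive scan holds the combined value of all lanes,
// which is what the single atomic operation is given.
std::uint32_t readLastLane(const std::vector<std::uint32_t> &Scan) {
  assert(!Scan.empty() && "wave must have at least one lane");
  return Scan.back();
}
```

For lane values 1, 2, 3, 4 the inclusive scan is 1, 3, 6, 10, the exclusive scan is 0, 1, 3, 6, and the last lane holds 10.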

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp

(218 lines not shown)	public:
bool visitInstruction(Instruction &I) { return false; }		bool visitInstruction(Instruction &I) { return false; }
bool visitBinaryOperator(BinaryOperator &I);		bool visitBinaryOperator(BinaryOperator &I);
bool visitLoadInst(LoadInst &I);		bool visitLoadInst(LoadInst &I);
bool visitICmpInst(ICmpInst &I);		bool visitICmpInst(ICmpInst &I);
bool visitSelectInst(SelectInst &I);		bool visitSelectInst(SelectInst &I);

bool visitIntrinsicInst(IntrinsicInst &I);		bool visitIntrinsicInst(IntrinsicInst &I);
bool visitBitreverseIntrinsicInst(IntrinsicInst &I);		bool visitBitreverseIntrinsicInst(IntrinsicInst &I);
		bool visitLaneIntrinsicInst(IntrinsicInst &I);
		Value *buildLegalLaneIntrinsic(IRBuilder<> &B, Intrinsic::ID IID,
		Value *Data0, Value *Lane = nullptr,
		Value *Data1 = nullptr);

bool doInitialization(Module &M) override;		bool doInitialization(Module &M) override;
bool runOnFunction(Function &F) override;		bool runOnFunction(Function &F) override;

StringRef getPassName() const override { return "AMDGPU IR optimizations"; }		StringRef getPassName() const override { return "AMDGPU IR optimizations"; }

void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.addRequired<AssumptionCacheTracker>();		AU.addRequired<AssumptionCacheTracker>();
(1,104 lines not shown)	bool AMDGPUCodeGenPrepare::visitSelectInst(SelectInst &I) {

return Changed;		return Changed;
}		}

bool AMDGPUCodeGenPrepare::visitIntrinsicInst(IntrinsicInst &I) {		bool AMDGPUCodeGenPrepare::visitIntrinsicInst(IntrinsicInst &I) {
switch (I.getIntrinsicID()) {		switch (I.getIntrinsicID()) {
case Intrinsic::bitreverse:		case Intrinsic::bitreverse:
return visitBitreverseIntrinsicInst(I);		return visitBitreverseIntrinsicInst(I);
		case Intrinsic::amdgcn_readfirstlane:
		case Intrinsic::amdgcn_readlane:
		case Intrinsic::amdgcn_writelane:
		return visitLaneIntrinsicInst(I);
default:		default:
return false;		return false;
}		}
}		}

bool AMDGPUCodeGenPrepare::visitBitreverseIntrinsicInst(IntrinsicInst &I) {		bool AMDGPUCodeGenPrepare::visitBitreverseIntrinsicInst(IntrinsicInst &I) {
bool Changed = false;		bool Changed = false;

if (ST->has16BitInsts() && needsPromotionToI32(I.getType()) &&		if (ST->has16BitInsts() && needsPromotionToI32(I.getType()) &&
DA->isUniform(&I))		DA->isUniform(&I))
Changed |= promoteUniformBitreverseToI32(I);		Changed |= promoteUniformBitreverseToI32(I);

return Changed;		return Changed;
}		}

		Value *AMDGPUCodeGenPrepare::buildLegalLaneIntrinsic(IRBuilder<> &B,
		Intrinsic::ID IID,
		Value *Data0, Value *Lane,
		Value *Data1) {
		Type *Ty = Data0->getType();

		if (Ty == B.getInt32Ty()) {
		Value *Args[3] = {Data0, Lane, Data1};
		unsigned NumArgs = Data1 != nullptr ? 3 : Lane != nullptr ? 2 : 1;
		return B.CreateIntrinsic(IID, {B.getInt32Ty()}, {Args, NumArgs});
		}

		if (auto *VecTy = dyn_cast<FixedVectorType>(Ty)) {
		Type *EltType = VecTy->getElementType();
		bool is16Bit =
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'is16Bit' [readability-identifier-naming]
		(EltType->isIntegerTy() && EltType->getIntegerBitWidth() == 16) ||
		(EltType->isHalfTy());
		int EC = VecTy->getElementCount().Min;

		Value *Result = UndefValue::get(Ty);
		for (int i = 0; i < EC; i += 1 + is16Bit) {
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'i' [readability-identifier-naming]
		Value *EltData0;
		Value *EltData1 = nullptr;

		if (is16Bit) {
		int Idxs[2] = {i, i + 1};
		EltData0 = B.CreateShuffleVector(Data0, UndefValue::get(Ty), Idxs);
		EltData0 = B.CreateBitCast(EltData0, B.getInt32Ty());
		} else {
		EltData0 = B.CreateExtractElement(Data0, i);
		}

		if (Data1) {
		if (is16Bit) {
		int Idxs[2] = {i, i + 1};
		EltData1 = B.CreateShuffleVector(Data1, UndefValue::get(Ty), Idxs);
		EltData1 = B.CreateBitCast(EltData1, B.getInt32Ty());
		} else {
		EltData1 = B.CreateExtractElement(Data1, i);
		}
		}

		Value *EltResult =
		buildLegalLaneIntrinsic(B, IID, EltData0, Lane, EltData1);

		if (is16Bit) {
		EltResult =
		B.CreateBitCast(EltResult, FixedVectorType::get(EltType, 2));
		for (int j = 0; j < 2; ++j) {
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'j' [readability-identifier-naming]
		if (i + j >= EC)
		break;
		Result = B.CreateInsertElement(
		Result, B.CreateExtractElement(EltResult, j), i + j);
		}
		} else {
		Result = B.CreateInsertElement(Result, EltResult, i);
		}
		}

		return Result;
		}

		unsigned BitWidth = DL->getTypeSizeInBits(Ty);
		Type *IntTy = Ty;

		if (!Ty->isIntegerTy()) {
		IntTy = IntegerType::get(Mod->getContext(), BitWidth);
		Data0 = B.CreateBitOrPointerCast(Data0, IntTy);
		if (Data1)
		Data1 = B.CreateBitOrPointerCast(Data1, IntTy);
		}

		if ((BitWidth % 32) != 0) {
		Type *ExtendedTy =
		IntegerType::get(Mod->getContext(), (BitWidth + 31) & ~31);
		Data0 = B.CreateZExt(Data0, ExtendedTy);
		if (Data1)
		Data1 = B.CreateZExt(Data1, ExtendedTy);
		}

		if (BitWidth > 32) {
		Type *VecTy = FixedVectorType::get(B.getInt32Ty(), (BitWidth + 31) / 32);
		Data0 = B.CreateBitCast(Data0, VecTy);
		if (Data1)
		Data1 = B.CreateBitCast(Data1, VecTy);
		}

		Value *Result = buildLegalLaneIntrinsic(B, IID, Data0, Lane, Data1);

		if ((BitWidth % 32) != 0) {
		if (BitWidth > 32) {
		Result = B.CreateBitCast(
		Result, IntegerType::get(Mod->getContext(), (BitWidth + 31) / 32));
		}

		Result =
		B.CreateTrunc(Result, IntegerType::get(Mod->getContext(), BitWidth));
		}

		return B.CreateBitOrPointerCast(Result, Ty);
		}

		/// "Legalize" readfirstlane/readlane/writelane to single-dword intrinsics
		/// on i32.
		///
		/// Done during codegen prepare purely because this turned out to be simpler
		/// than doing it in this generality in SelectionDAG.
		bool AMDGPUCodeGenPrepare::visitLaneIntrinsicInst(IntrinsicInst &I) {
		Type *Ty = I.getType();
		if (Ty->isIntegerTy(32) && Ty->getIntegerBitWidth() == 32)
		return false; // already legal

		Value *Data0 = I.getArgOperand(0);
		Value *Lane = nullptr;
		Value *Data1 = nullptr;

		if (I.getIntrinsicID() == Intrinsic::amdgcn_readlane) {
		Lane = I.getArgOperand(1);
		} else if (I.getIntrinsicID() == Intrinsic::amdgcn_writelane) {
		Lane = I.getArgOperand(1);
		Data1 = I.getArgOperand(2);
		}

		IRBuilder<> Builder(&I);
		Value *Legalized =
		buildLegalLaneIntrinsic(Builder, I.getIntrinsicID(), Data0, Lane, Data1);

		I.replaceAllUsesWith(Legalized);
		I.eraseFromParent();
		return true;
		}

bool AMDGPUCodeGenPrepare::doInitialization(Module &M) {		bool AMDGPUCodeGenPrepare::doInitialization(Module &M) {
Mod = &M;		Mod = &M;
DL = &Mod->getDataLayout();		DL = &Mod->getDataLayout();
return false;		return false;
}		}

bool AMDGPUCodeGenPrepare::runOnFunction(Function &F) {		bool AMDGPUCodeGenPrepare::runOnFunction(Function &F) {
if (skipFunction(F))		if (skipFunction(F))
(58 lines not shown)

llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp

(649 lines not shown)	if (match(Src,
return IC.replaceInstUsesWith(II, Src);		return IC.replaceInstUsesWith(II, Src);
}		}

if (IID == Intrinsic::amdgcn_readfirstlane) {		if (IID == Intrinsic::amdgcn_readfirstlane) {
// readfirstlane (readlane x, y) -> readlane x, y		// readfirstlane (readlane x, y) -> readlane x, y
if (match(Src, PatternMatch::m_Intrinsic<Intrinsic::amdgcn_readlane>())) {		if (match(Src, PatternMatch::m_Intrinsic<Intrinsic::amdgcn_readlane>())) {
return IC.replaceInstUsesWith(II, Src);		return IC.replaceInstUsesWith(II, Src);
}		}

		// readfirstlane (bitcast x) -> bitcast (readfirstlane x)
		Value *BitcastInput = nullptr;
		if (match(Src,
		PatternMatch::m_BitCast(PatternMatch::m_Value(BitcastInput)))) {
		CallInst *NewCall =
		IC.Builder.CreateIntrinsic(Intrinsic::amdgcn_readfirstlane,
		{BitcastInput->getType()}, BitcastInput);
		Value *NewCast = IC.Builder.CreateBitCast(NewCall, II.getType());
		return IC.replaceInstUsesWith(II, NewCast);
		}
} else {		} else {
// readlane (readlane x, y), y -> readlane x, y		// readlane (readlane x, y), y -> readlane x, y
if (match(Src, PatternMatch::m_Intrinsic<Intrinsic::amdgcn_readlane>(		if (match(Src, PatternMatch::m_Intrinsic<Intrinsic::amdgcn_readlane>(
PatternMatch::m_Value(),		PatternMatch::m_Value(),
PatternMatch::m_Specific(II.getArgOperand(1))))) {		PatternMatch::m_Specific(II.getArgOperand(1))))) {
return IC.replaceInstUsesWith(II, Src);		return IC.replaceInstUsesWith(II, Src);
}		}
}		}
(230 lines not shown)

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

(709 lines not shown)	void AMDGPUPassConfig::addIRPasses() {
// bitcast calls.		// bitcast calls.
addPass(createAMDGPUFixFunctionBitcastsPass());		addPass(createAMDGPUFixFunctionBitcastsPass());

// A call to propagate attributes pass in the backend in case opt was not run.		// A call to propagate attributes pass in the backend in case opt was not run.
addPass(createAMDGPUPropagateAttributesEarlyPass(&TM));		addPass(createAMDGPUPropagateAttributesEarlyPass(&TM));

addPass(createAtomicExpandPass());		addPass(createAtomicExpandPass());

		if (EnableAtomicOptimizations)
		addPass(createAMDGPUAtomicOptimizerPass());

addPass(createAMDGPULowerIntrinsicsPass());		addPass(createAMDGPULowerIntrinsicsPass());

// Function calls are not supported, so make sure we inline everything.		// Function calls are not supported, so make sure we inline everything.
addPass(createAMDGPUAlwaysInlinePass());		addPass(createAMDGPUAlwaysInlinePass());
addPass(createAlwaysInlinerLegacyPass());		addPass(createAlwaysInlinerLegacyPass());
// We need to add the barrier noop pass, otherwise adding the function		// We need to add the barrier noop pass, otherwise adding the function
// inlining pass will cause all of the PassConfigs passes to be run		// inlining pass will cause all of the PassConfigs passes to be run
[140 lines not shown]	ScheduleDAGInstrs *GCNPassConfig::createMachineScheduler(
if (ST.enableSIScheduler())		if (ST.enableSIScheduler())
return createSIMachineScheduler(C);		return createSIMachineScheduler(C);
return createGCNMaxOccupancyMachineScheduler(C);		return createGCNMaxOccupancyMachineScheduler(C);
}		}

bool GCNPassConfig::addPreISel() {		bool GCNPassConfig::addPreISel() {
AMDGPUPassConfig::addPreISel();		AMDGPUPassConfig::addPreISel();

if (EnableAtomicOptimizations) {
addPass(createAMDGPUAtomicOptimizerPass());
}

// FIXME: We need to run a pass to propagate the attributes when calls are		// FIXME: We need to run a pass to propagate the attributes when calls are
// supported.		// supported.

// Merge divergent exit nodes. StructurizeCFG won't recognize the multi-exit		// Merge divergent exit nodes. StructurizeCFG won't recognize the multi-exit
// regions formed by them.		// regions formed by them.
addPass(&AMDGPUUnifyDivergentExitNodesID);		addPass(&AMDGPUUnifyDivergentExitNodesID);
if (!LateCFGStructurize) {		if (!LateCFGStructurize) {
if (EnableStructurizerWorkarounds) {		if (EnableStructurizerWorkarounds) {
[335 lines not shown]

llvm/lib/Target/AMDGPU/SIInstructions.td

[2,197 lines not shown]	def : GCNPat<
let SubtargetPredicate = NotHasAddNoCarryInsts;		let SubtargetPredicate = NotHasAddNoCarryInsts;
}		}


// Avoid pointlessly materializing a constant in VGPR.		// Avoid pointlessly materializing a constant in VGPR.
// FIXME: Should also do this for readlane, but tablegen crashes on		// FIXME: Should also do this for readlane, but tablegen crashes on
// the ignored src1.		// the ignored src1.
def : GCNPat<		def : GCNPat<
(int_amdgcn_readfirstlane (i32 imm:$src)),		(i32 (int_amdgcn_readfirstlane (i32 imm:$src))),
(S_MOV_B32 SReg_32:$src)		(S_MOV_B32 SReg_32:$src)
>;		>;

multiclass BFMPatterns <ValueType vt, InstSI BFM, InstSI MOV> {		multiclass BFMPatterns <ValueType vt, InstSI BFM, InstSI MOV> {
def : GCNPat <		def : GCNPat <
(vt (shl (vt (add (vt (shl 1, vt:$a)), -1)), vt:$b)),		(vt (shl (vt (add (vt (shl 1, vt:$a)), -1)), vt:$b)),
(BFM $a, $b)		(BFM $a, $b)
>;		>;
[266 lines not shown]

llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-amdgcn.readfirstlane.mir

	# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	# RUN: llc -march=amdgcn -mcpu=tahiti -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=2 -pass-remarks-missed='gisel*' %s -o - 2> %t | FileCheck -check-prefix=GCN %s			# RUN: llc -march=amdgcn -mcpu=tahiti -run-pass=instruction-select -verify-machineinstrs -global-isel-abort=2 -pass-remarks-missed='gisel.*' %s -o - 2> %t | FileCheck -check-prefix=GCN %s
	# RUN: FileCheck -check-prefix=ERR %s < %t			# RUN: FileCheck -check-prefix=ERR %s < %t

	# ERR: remark: <unknown>:0:0: cannot select: %1:sgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.readfirstlane), %0:sgpr(s32) (in function: readfirstlane_s)			# ERR: remark: <unknown>:0:0: cannot select: %1:sgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.readfirstlane), %0:sgpr(s32) (in function: readfirstlane_s32_s)

	---			---
	name: readfirstlane_v			name: readfirstlane_s32_v
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0			liveins: $vgpr0
	; GCN-LABEL: name: readfirstlane_v			; GCN-LABEL: name: readfirstlane_s32_v
	; GCN: liveins: $vgpr0			; GCN: liveins: $vgpr0
	; GCN: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0			; GCN: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
	; GCN: [[V_READFIRSTLANE_B32_:%[0-9]+]]:sreg_32 = V_READFIRSTLANE_B32 [[COPY]], implicit $exec			; GCN: [[V_READFIRSTLANE_B32_:%[0-9]+]]:sreg_32 = V_READFIRSTLANE_B32 [[COPY]], implicit $exec
	; GCN: S_ENDPGM 0, implicit [[V_READFIRSTLANE_B32_]]			; GCN: S_ENDPGM 0, implicit [[V_READFIRSTLANE_B32_]]
	%0:vgpr(s32) = COPY $vgpr0			%0:vgpr(s32) = COPY $vgpr0
	%1:sgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.readfirstlane), %0			%1:sgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.readfirstlane), %0
	S_ENDPGM 0, implicit %1			S_ENDPGM 0, implicit %1
	...			...

	---			---
	name: readfirstlane_v_imm			name: readfirstlane_v_s32_imm
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:

	; GCN-LABEL: name: readfirstlane_v_imm			; GCN-LABEL: name: readfirstlane_v_s32_imm
	; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 123, implicit $exec			; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 123, implicit $exec
	; GCN: [[COPY:%[0-9]+]]:sreg_32 = COPY [[V_MOV_B32_e32_]]			; GCN: [[COPY:%[0-9]+]]:sreg_32 = COPY [[V_MOV_B32_e32_]]
	; GCN: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 [[COPY]]			; GCN: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 [[COPY]]
	; GCN: S_ENDPGM 0, implicit [[S_MOV_B32_]]			; GCN: S_ENDPGM 0, implicit [[S_MOV_B32_]]
	%0:vgpr(s32) = G_CONSTANT i32 123			%0:vgpr(s32) = G_CONSTANT i32 123
	%1:sgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.readfirstlane), %0			%1:sgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.readfirstlane), %0
	S_ENDPGM 0, implicit %1			S_ENDPGM 0, implicit %1
	...			...

	# Make sure this fails to select			# Make sure this fails to select
	---			---
	name: readfirstlane_s			name: readfirstlane_s32_s
	legalized: true			legalized: true
	regBankSelected: true			regBankSelected: true
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:
	liveins: $sgpr0			liveins: $sgpr0
	; GCN-LABEL: name: readfirstlane_s			; GCN-LABEL: name: readfirstlane_s32_s
	; GCN: liveins: $sgpr0			; GCN: liveins: $sgpr0
	; GCN: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0			; GCN: [[COPY:%[0-9]+]]:sgpr(s32) = COPY $sgpr0
	; GCN: [[INT:%[0-9]+]]:sgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.readfirstlane), [[COPY]](s32)			; GCN: [[INT:%[0-9]+]]:sgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.readfirstlane), [[COPY]](s32)
	; GCN: S_ENDPGM 0, implicit [[INT]](s32)			; GCN: S_ENDPGM 0, implicit [[INT]](s32)
	%0:sgpr(s32) = COPY $sgpr0			%0:sgpr(s32) = COPY $sgpr0
	%1:sgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.readfirstlane), %0			%1:sgpr(s32) = G_INTRINSIC intrinsic(@llvm.amdgcn.readfirstlane), %0
	S_ENDPGM 0, implicit %1			S_ENDPGM 0, implicit %1
	...			...

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.readfirstlane.ll

				; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=hawaii -verify-machineinstrs < %s | FileCheck -enable-var-scope %s
	; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=fiji -verify-machineinstrs < %s | FileCheck -enable-var-scope %s			; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=fiji -verify-machineinstrs < %s | FileCheck -enable-var-scope %s

	declare i32 @llvm.amdgcn.readfirstlane(i32) #0			declare i32 @llvm.amdgcn.readfirstlane.i32(i32) #0
				declare float @llvm.amdgcn.readfirstlane.f32(float) #0
				declare <2 x half> @llvm.amdgcn.readfirstlane.v2f16(<2 x half>) #0
				declare <2 x i16> @llvm.amdgcn.readfirstlane.v2i16(<2 x i16>) #0
				declare i8 addrspace(3)* @llvm.amdgcn.readfirstlane.p3i8(i8 addrspace(3)*) #0
				declare i16 @llvm.amdgcn.readfirstlane.i16(i16) #0
				declare half @llvm.amdgcn.readfirstlane.f16(half) #0
				declare <3 x i16> @llvm.amdgcn.readfirstlane.v3i16(<3 x i16>) #0
				declare <9 x float> @llvm.amdgcn.readfirstlane.v9f32(<9 x float>) #0

	; CHECK-LABEL: {{^}}test_readfirstlane:			; CHECK-LABEL: {{^}}test_readfirstlane_i32:
	; CHECK: v_readfirstlane_b32 s{{[0-9]+}}, v2			; CHECK: v_readfirstlane_b32 s{{[0-9]+}}, v2
	define void @test_readfirstlane(i32 addrspace(1)* %out, i32 %src) #1 {			define void @test_readfirstlane_i32(i32 addrspace(1)* %out, i32 %src) #1 {
	%readfirstlane = call i32 @llvm.amdgcn.readfirstlane(i32 %src)			%readfirstlane = call i32 @llvm.amdgcn.readfirstlane.i32(i32 %src)
	store i32 %readfirstlane, i32 addrspace(1)* %out, align 4			store i32 %readfirstlane, i32 addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; CHECK-LABEL: {{^}}test_readfirstlane_imm:			; CHECK-LABEL: {{^}}test_readfirstlane_imm:
	; CHECK: s_mov_b32 [[SGPR_VAL:s[0-9]]], 32			; CHECK: s_mov_b32 [[SGPR_VAL:s[0-9]]], 32
	; CHECK-NOT: [[SGPR_VAL]]			; CHECK-NOT: [[SGPR_VAL]]
	; CHECK: ; use [[SGPR_VAL]]			; CHECK: ; use [[SGPR_VAL]]
	define amdgpu_kernel void @test_readfirstlane_imm(i32 addrspace(1)* %out) #1 {			define amdgpu_kernel void @test_readfirstlane_imm(i32 addrspace(1)* %out) #1 {
	%readfirstlane = call i32 @llvm.amdgcn.readfirstlane(i32 32)			%readfirstlane = call i32 @llvm.amdgcn.readfirstlane.i32(i32 32)
	call void asm sideeffect "; use $0", "s"(i32 %readfirstlane)			call void asm sideeffect "; use $0", "s"(i32 %readfirstlane)
	ret void			ret void
	}			}

	; CHECK-LABEL: {{^}}test_readfirstlane_imm_fold:			; CHECK-LABEL: {{^}}test_readfirstlane_imm_fold:
	; CHECK: v_mov_b32_e32 [[VVAL:v[0-9]]], 32			; CHECK: v_mov_b32_e32 [[VVAL:v[0-9]]], 32
	; CHECK-NOT: [[VVAL]]			; CHECK-NOT: [[VVAL]]
	; CHECK: flat_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[VVAL]]			; CHECK: flat_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[VVAL]]
	define amdgpu_kernel void @test_readfirstlane_imm_fold(i32 addrspace(1)* %out) #1 {			define amdgpu_kernel void @test_readfirstlane_imm_fold(i32 addrspace(1)* %out) #1 {
	%readfirstlane = call i32 @llvm.amdgcn.readfirstlane(i32 32)			%readfirstlane = call i32 @llvm.amdgcn.readfirstlane.i32(i32 32)
	store i32 %readfirstlane, i32 addrspace(1)* %out, align 4			store i32 %readfirstlane, i32 addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; CHECK-LABEL: {{^}}test_readfirstlane_m0:			; CHECK-LABEL: {{^}}test_readfirstlane_m0:
	; CHECK: s_mov_b32 m0, -1			; CHECK: s_mov_b32 m0, -1
	; CHECK: v_mov_b32_e32 [[VVAL:v[0-9]]], m0			; CHECK: v_mov_b32_e32 [[VVAL:v[0-9]]], m0
	; CHECK: flat_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[VVAL]]			; CHECK: flat_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[VVAL]]
	define amdgpu_kernel void @test_readfirstlane_m0(i32 addrspace(1)* %out) #1 {			define amdgpu_kernel void @test_readfirstlane_m0(i32 addrspace(1)* %out) #1 {
	%m0 = call i32 asm "s_mov_b32 m0, -1", "={m0}"()			%m0 = call i32 asm "s_mov_b32 m0, -1", "={m0}"()
	%readfirstlane = call i32 @llvm.amdgcn.readfirstlane(i32 %m0)			%readfirstlane = call i32 @llvm.amdgcn.readfirstlane.i32(i32 %m0)
	store i32 %readfirstlane, i32 addrspace(1)* %out, align 4			store i32 %readfirstlane, i32 addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; CHECK-LABEL: {{^}}test_readfirstlane_copy_from_sgpr:			; CHECK-LABEL: {{^}}test_readfirstlane_copy_from_sgpr:
	; CHECK: ;;#ASMSTART			; CHECK: ;;#ASMSTART
	; CHECK-NEXT: s_mov_b32 [[SGPR:s[0-9]+]]			; CHECK-NEXT: s_mov_b32 [[SGPR:s[0-9]+]]
	; CHECK: ;;#ASMEND			; CHECK: ;;#ASMEND
	; CHECK-NOT: [[SGPR]]			; CHECK-NOT: [[SGPR]]
	; CHECK-NOT: readfirstlane			; CHECK-NOT: readfirstlane
	; CHECK: v_mov_b32_e32 [[VCOPY:v[0-9]+]], [[SGPR]]			; CHECK: v_mov_b32_e32 [[VCOPY:v[0-9]+]], [[SGPR]]
	; CHECK: flat_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[VCOPY]]			; CHECK: flat_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[VCOPY]]
	define amdgpu_kernel void @test_readfirstlane_copy_from_sgpr(i32 addrspace(1)* %out) #1 {			define amdgpu_kernel void @test_readfirstlane_copy_from_sgpr(i32 addrspace(1)* %out) #1 {
	%sgpr = call i32 asm "s_mov_b32 $0, 0", "=s"()			%sgpr = call i32 asm "s_mov_b32 $0, 0", "=s"()
	%readfirstlane = call i32 @llvm.amdgcn.readfirstlane(i32 %sgpr)			%readfirstlane = call i32 @llvm.amdgcn.readfirstlane.i32(i32 %sgpr)
	store i32 %readfirstlane, i32 addrspace(1)* %out, align 4			store i32 %readfirstlane, i32 addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; Make sure this doesn't crash.			; Make sure this doesn't crash.
	; CHECK-LABEL: {{^}}test_readfirstlane_fi:			; CHECK-LABEL: {{^}}test_readfirstlane_fi:
	; CHECK: s_mov_b32 [[FIVAL:s[0-9]]], 4			; CHECK: s_mov_b32 [[FIVAL:s[0-9]]], 4
	define amdgpu_kernel void @test_readfirstlane_fi(i32 addrspace(1)* %out) #1 {			define amdgpu_kernel void @test_readfirstlane_fi(i32 addrspace(1)* %out) #1 {
	%alloca = alloca i32, addrspace(5)			%alloca = alloca i32, addrspace(5)
	%int = ptrtoint i32 addrspace(5)* %alloca to i32			%int = ptrtoint i32 addrspace(5)* %alloca to i32
	%readfirstlane = call i32 @llvm.amdgcn.readfirstlane(i32 %int)			%readfirstlane = call i32 @llvm.amdgcn.readfirstlane.i32(i32 %int)
	call void asm sideeffect "; use $0", "s"(i32 %readfirstlane)			call void asm sideeffect "; use $0", "s"(i32 %readfirstlane)
	ret void			ret void
	}			}

				; CHECK-LABEL: {{^}}test_readfirstlane_f32:
				; CHECK: v_readfirstlane_b32 s{{[0-9]+}}, v2
				; CHECK-NOT: v_readfirstlane_b32
				define void @test_readfirstlane_f32(float addrspace(1)* %out, float %src) #1 {
				%readfirstlane = call float @llvm.amdgcn.readfirstlane.f32(float %src)
				store float %readfirstlane, float addrspace(1)* %out, align 4
				ret void
				}

				; CHECK-LABEL: {{^}}test_readfirstlane_v2f16:
				; CHECK: v_readfirstlane_b32 s{{[0-9]+}}, v2
				; CHECK-NOT: v_readfirstlane_b32
				define void @test_readfirstlane_v2f16(<2 x half> addrspace(1)* %out, <2 x half> %src) #1 {
				%readfirstlane = call <2 x half> @llvm.amdgcn.readfirstlane.v2f16(<2 x half> %src)
				store <2 x half> %readfirstlane, <2 x half> addrspace(1)* %out, align 4
				ret void
				}

				; CHECK-LABEL: {{^}}test_readfirstlane_v2i16:
				; CHECK: v_readfirstlane_b32 s{{[0-9]+}}, v2
				; CHECK-NOT: v_readfirstlane_b32
				define void @test_readfirstlane_v2i16(<2 x i16> addrspace(1)* %out, <2 x i16> %src) #1 {
				%readfirstlane = call <2 x i16> @llvm.amdgcn.readfirstlane.v2i16(<2 x i16> %src)
				store <2 x i16> %readfirstlane, <2 x i16> addrspace(1)* %out, align 4
				ret void
				}

				; CHECK-LABEL: {{^}}test_readfirstlane_p3:
				; CHECK: v_readfirstlane_b32 s{{[0-9]+}}, v2
				; CHECK-NOT: v_readfirstlane_b32
				define void @test_readfirstlane_p3(i8 addrspace(3)* addrspace(1)* %out, i8 addrspace(3)* %src) #1 {
				%readfirstlane = call i8 addrspace(3)* @llvm.amdgcn.readfirstlane.p3i8(i8 addrspace(3)* %src)
				store i8 addrspace(3)* %readfirstlane, i8 addrspace(3)* addrspace(1)* %out, align 4
				ret void
				}

				; CHECK-LABEL: {{^}}test_readfirstlane_i16:
				; CHECK: v_readfirstlane_b32 s{{[0-9]+}}, v2
				; CHECK-NOT: v_readfirstlane_b32
				define void @test_readfirstlane_i16(i16 addrspace(1)* %out, i16 %src) {
				%readfirstlane = call i16 @llvm.amdgcn.readfirstlane.i16(i16 %src)
				store i16 %readfirstlane, i16 addrspace(1)* %out, align 2
				ret void
				}

				; CHECK-LABEL: {{^}}test_readfirstlane_f16:
				; CHECK: v_readfirstlane_b32 s{{[0-9]+}}, v2
				; CHECK-NOT: v_readfirstlane_b32
				define void @test_readfirstlane_f16(half addrspace(1)* %out, half %src) {
				%readfirstlane = call half @llvm.amdgcn.readfirstlane.f16(half %src)
				store half %readfirstlane, half addrspace(1)* %out, align 2
				ret void
				}

				; CHECK-LABEL: {{^}}test_readfirstlane_v3i16:
				; CHECK-DAG: v_readfirstlane_b32 s{{[0-9]+}},
				; CHECK-DAG: v_readfirstlane_b32 s{{[0-9]+}},
				; CHECK-NOT: v_readfirstlane_b32
				define void @test_readfirstlane_v3i16(<3 x i16> addrspace(1)* %out, <3 x i16> %src) {
				%readfirstlane = call <3 x i16> @llvm.amdgcn.readfirstlane.v3i16(<3 x i16> %src)
				store <3 x i16> %readfirstlane, <3 x i16> addrspace(1)* %out, align 2
				ret void
				}

				; CHECK-LABEL: {{^}}test_readfirstlane_v9f32:
				; CHECK-DAG: v_readfirstlane_b32 s{{[0-9]+}}, v2
				; CHECK-DAG: v_readfirstlane_b32 s{{[0-9]+}}, v3
				; CHECK-DAG: v_readfirstlane_b32 s{{[0-9]+}}, v4
				; CHECK-DAG: v_readfirstlane_b32 s{{[0-9]+}}, v5
				; CHECK-DAG: v_readfirstlane_b32 s{{[0-9]+}}, v6
				; CHECK-DAG: v_readfirstlane_b32 s{{[0-9]+}}, v7
				; CHECK-DAG: v_readfirstlane_b32 s{{[0-9]+}}, v8
				; CHECK-DAG: v_readfirstlane_b32 s{{[0-9]+}}, v9
				; CHECK-DAG: v_readfirstlane_b32 s{{[0-9]+}}, v10
				; CHECK-NOT: v_readfirstlane_b32
				define void @test_readfirstlane_v9f32(<9 x float> addrspace(1)* %out, <9 x float> %src) {
				%readfirstlane = call <9 x float> @llvm.amdgcn.readfirstlane.v9f32(<9 x float> %src)
				store <9 x float> %readfirstlane, <9 x float> addrspace(1)* %out, align 2
				ret void
				}

	attributes #0 = { nounwind readnone convergent }			attributes #0 = { nounwind readnone convergent }
	attributes #1 = { nounwind }			attributes #1 = { nounwind }

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.readlane.ll

	; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=fiji -verify-machineinstrs < %s | FileCheck -enable-var-scope %s			; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=fiji -verify-machineinstrs < %s | FileCheck -enable-var-scope %s

	declare i32 @llvm.amdgcn.readlane(i32, i32) #0			declare i32 @llvm.amdgcn.readlane(i32, i32) #0
				declare i8 addrspace(3)* @llvm.amdgcn.readlane.p3i8(i8 addrspace(3)*, i32) #0
				declare i16 @llvm.amdgcn.readlane.i16(i16, i32) #0
				declare half @llvm.amdgcn.readlane.f16(half, i32) #0
				declare <3 x i16> @llvm.amdgcn.readlane.v3i16(<3 x i16>, i32) #0
				declare <9 x float> @llvm.amdgcn.readlane.v9f32(<9 x float>, i32) #0

	; CHECK-LABEL: {{^}}test_readlane_sreg_sreg:			; CHECK-LABEL: {{^}}test_readlane_sreg_sreg:
	; CHECK-NOT: v_readlane_b32			; CHECK-NOT: v_readlane_b32
	define amdgpu_kernel void @test_readlane_sreg_sreg(i32 %src0, i32 %src1) #1 {			define amdgpu_kernel void @test_readlane_sreg_sreg(i32 %src0, i32 %src1) #1 {
	%readlane = call i32 @llvm.amdgcn.readlane(i32 %src0, i32 %src1)			%readlane = call i32 @llvm.amdgcn.readlane(i32 %src0, i32 %src1)
	call void asm sideeffect "; use $0", "s"(i32 %readlane)			call void asm sideeffect "; use $0", "s"(i32 %readlane)
	ret void			ret void
	}			}
	[60 lines not shown]
	; CHECK: flat_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[VCOPY]]			; CHECK: flat_store_dword v{{\[[0-9]+:[0-9]+\]}}, [[VCOPY]]
	define amdgpu_kernel void @test_readlane_copy_from_sgpr(i32 addrspace(1)* %out) #1 {			define amdgpu_kernel void @test_readlane_copy_from_sgpr(i32 addrspace(1)* %out) #1 {
	%sgpr = call i32 asm "s_mov_b32 $0, 0", "=s"()			%sgpr = call i32 asm "s_mov_b32 $0, 0", "=s"()
	%readfirstlane = call i32 @llvm.amdgcn.readlane(i32 %sgpr, i32 7)			%readfirstlane = call i32 @llvm.amdgcn.readlane(i32 %sgpr, i32 7)
	store i32 %readfirstlane, i32 addrspace(1)* %out, align 4			store i32 %readfirstlane, i32 addrspace(1)* %out, align 4
	ret void			ret void
	}			}

				; CHECK-LABEL: {{^}}test_readlane_p3:
				; CHECK: v_readlane_b32 s{{[0-9]+}}, v2, 15
				; CHECK-NOT: v_readlane_b32
				define void @test_readlane_p3(i8 addrspace(3)* addrspace(1)* %out, i8 addrspace(3)* %src) #1 {
				%readlane = call i8 addrspace(3)* @llvm.amdgcn.readlane.p3i8(i8 addrspace(3)* %src, i32 15)
				store i8 addrspace(3)* %readlane, i8 addrspace(3)* addrspace(1)* %out, align 4
				ret void
				}

				; CHECK-LABEL: {{^}}test_readlane_i16:
				; CHECK: v_readlane_b32 s{{[0-9]+}}, v2, 15
				; CHECK-NOT: v_readlane_b32
				define void @test_readlane_i16(i16 addrspace(1)* %out, i16 %src) {
				%readlane = call i16 @llvm.amdgcn.readlane.i16(i16 %src, i32 15)
				store i16 %readlane, i16 addrspace(1)* %out, align 2
				ret void
				}

				; CHECK-LABEL: {{^}}test_readlane_f16:
				; CHECK: v_readlane_b32 s{{[0-9]+}}, v2, 15
				; CHECK-NOT: v_readlane_b32
				define void @test_readlane_f16(half addrspace(1)* %out, half %src) {
				%readlane = call half @llvm.amdgcn.readlane.f16(half %src, i32 15)
				store half %readlane, half addrspace(1)* %out, align 2
				ret void
				}

				; CHECK-LABEL: {{^}}test_readlane_v3i16:
				; CHECK-DAG: v_readlane_b32 s{{[0-9]+}},
				; CHECK-DAG: v_readlane_b32 s{{[0-9]+}},
				; CHECK-NOT: v_readlane_b32
				define void @test_readlane_v3i16(<3 x i16> addrspace(1)* %out, <3 x i16> %src) {
				%readlane = call <3 x i16> @llvm.amdgcn.readlane.v3i16(<3 x i16> %src, i32 15)
				store <3 x i16> %readlane, <3 x i16> addrspace(1)* %out, align 2
				ret void
				}

				; CHECK-LABEL: {{^}}test_readlane_v9f32:
				; CHECK-DAG: v_readlane_b32 s{{[0-9]+}}, v2, 15
				; CHECK-DAG: v_readlane_b32 s{{[0-9]+}}, v3, 15
				; CHECK-DAG: v_readlane_b32 s{{[0-9]+}}, v4, 15
				; CHECK-DAG: v_readlane_b32 s{{[0-9]+}}, v5, 15
				; CHECK-DAG: v_readlane_b32 s{{[0-9]+}}, v6, 15
				; CHECK-DAG: v_readlane_b32 s{{[0-9]+}}, v7, 15
				; CHECK-DAG: v_readlane_b32 s{{[0-9]+}}, v8, 15
				; CHECK-DAG: v_readlane_b32 s{{[0-9]+}}, v9, 15
				; CHECK-DAG: v_readlane_b32 s{{[0-9]+}}, v10, 15
				; CHECK-NOT: v_readlane_b32
				define void @test_readlane_v9f32(<9 x float> addrspace(1)* %out, <9 x float> %src) {
				%readlane = call <9 x float> @llvm.amdgcn.readlane.v9f32(<9 x float> %src, i32 15)
				store <9 x float> %readlane, <9 x float> addrspace(1)* %out, align 2
				ret void
				}

	declare i32 @llvm.amdgcn.workitem.id.x() #2			declare i32 @llvm.amdgcn.workitem.id.x() #2

	attributes #0 = { nounwind readnone convergent }			attributes #0 = { nounwind readnone convergent }
	attributes #1 = { nounwind }			attributes #1 = { nounwind }
	attributes #2 = { nounwind readnone }			attributes #2 = { nounwind readnone }

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll

	; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=gfx700 -verify-machineinstrs < %s | FileCheck -check-prefixes=CHECK,CI,CIGFX9 %s			; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=gfx700 -verify-machineinstrs < %s | FileCheck -check-prefixes=CHECK,CI,CIGFX9 %s
	; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=gfx802 -verify-machineinstrs < %s | FileCheck -check-prefixes=CHECK,GFX9,CIGFX9 %s			; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=gfx802 -verify-machineinstrs < %s | FileCheck -check-prefixes=CHECK,GFX9,CIGFX9 %s
	; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefixes=CHECK,GFX10 %s			; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefixes=CHECK,GFX10 %s

	declare i32 @llvm.amdgcn.writelane(i32, i32, i32) #0			declare i32 @llvm.amdgcn.writelane(i32, i32, i32) #0
				declare i8 addrspace(3)* @llvm.amdgcn.writelane.p3i8(i8 addrspace(3)*, i32, i8 addrspace(3)*) #0
				declare i16 @llvm.amdgcn.writelane.i16(i16, i32, i16) #0
				declare half @llvm.amdgcn.writelane.f16(half, i32, half) #0
				declare <3 x i16> @llvm.amdgcn.writelane.v3i16(<3 x i16>, i32, <3 x i16>) #0
				declare <9 x float> @llvm.amdgcn.writelane.v9f32(<9 x float>, i32, <9 x float>) #0

	; CHECK-LABEL: {{^}}test_writelane_sreg:			; CHECK-LABEL: {{^}}test_writelane_sreg:
	; CIGFX9: v_writelane_b32 v{{[0-9]+}}, s{{[0-9]+}}, m0			; CIGFX9: v_writelane_b32 v{{[0-9]+}}, s{{[0-9]+}}, m0
	; GFX10: v_writelane_b32 v{{[0-9]+}}, s{{[0-9]+}}, s{{[0-9]+}}			; GFX10: v_writelane_b32 v{{[0-9]+}}, s{{[0-9]+}}, s{{[0-9]+}}
	define amdgpu_kernel void @test_writelane_sreg(i32 addrspace(1)* %out, i32 %src0, i32 %src1) #1 {			define amdgpu_kernel void @test_writelane_sreg(i32 addrspace(1)* %out, i32 %src0, i32 %src1) #1 {
	%oldval = load i32, i32 addrspace(1)* %out			%oldval = load i32, i32 addrspace(1)* %out
	%writelane = call i32 @llvm.amdgcn.writelane(i32 %src0, i32 %src1, i32 %oldval)			%writelane = call i32 @llvm.amdgcn.writelane(i32 %src0, i32 %src1, i32 %oldval)
	store i32 %writelane, i32 addrspace(1)* %out, align 4			store i32 %writelane, i32 addrspace(1)* %out, align 4
	[60 lines not shown]
	; CIGFX9: v_writelane_b32 [[OLDVAL]], s{{[0-9]+}}, m0			; CIGFX9: v_writelane_b32 [[OLDVAL]], s{{[0-9]+}}, m0
	; GFX10: v_writelane_b32 [[OLDVAL]], s{{[0-9]+}}, s{{[0-9]+}}			; GFX10: v_writelane_b32 [[OLDVAL]], s{{[0-9]+}}, s{{[0-9]+}}
	define amdgpu_kernel void @test_writelane_imm_oldval(i32 addrspace(1)* %out, i32 %src0, i32 %src1) #1 {			define amdgpu_kernel void @test_writelane_imm_oldval(i32 addrspace(1)* %out, i32 %src0, i32 %src1) #1 {
	%writelane = call i32 @llvm.amdgcn.writelane(i32 %src0, i32 %src1, i32 42)			%writelane = call i32 @llvm.amdgcn.writelane(i32 %src0, i32 %src1, i32 42)
	store i32 %writelane, i32 addrspace(1)* %out, align 4			store i32 %writelane, i32 addrspace(1)* %out, align 4
	ret void			ret void
	}			}

				; CHECK-LABEL: {{^}}test_writelane_p3:
				; CHECK: v_writelane_b32 v{{[0-9]+}}, 0, 15
				; CHECK-NOT: v_writelane_b32
				define void @test_writelane_p3(i8 addrspace(3)* addrspace(1)* %out, i8 addrspace(3)* %src) #1 {
				%writelane = call i8 addrspace(3)* @llvm.amdgcn.writelane.p3i8(i8 addrspace(3)* null, i32 15, i8 addrspace(3)* %src)
				store i8 addrspace(3)* %writelane, i8 addrspace(3)* addrspace(1)* %out, align 4
				ret void
				}

				; CHECK-LABEL: {{^}}test_writelane_i16:
				; CHECK: v_writelane_b32 v{{[0-9]+}},
				; CHECK-NOT: v_writelane_b32
				define void @test_writelane_i16(i16 addrspace(1)* %out, i16 %src) {
				%writelane = call i16 @llvm.amdgcn.writelane.i16(i16 1234, i32 15, i16 %src)
				store i16 %writelane, i16 addrspace(1)* %out, align 2
				ret void
				}

				; CHECK-LABEL: {{^}}test_writelane_f16:
				; CHECK: v_writelane_b32 v{{[0-9]+}},
				; CHECK-NOT: v_writelane_b32
				define void @test_writelane_f16(half addrspace(1)* %out, half %src) {
				%writelane = call half @llvm.amdgcn.writelane.f16(half 1.0, i32 15, half %src)
				store half %writelane, half addrspace(1)* %out, align 2
				ret void
				}

				; CHECK-LABEL: {{^}}test_writelane_v3i16:
				; CHECK-DAG: v_writelane_b32 v{{[0-9]+}},
				; CHECK-DAG: v_writelane_b32 v{{[0-9]+}},
				; CHECK-NOT: v_writelane_b32
				define void @test_writelane_v3i16(<3 x i16> addrspace(1)* %out, <3 x i16> %src) {
				%writelane = call <3 x i16> @llvm.amdgcn.writelane.v3i16(<3 x i16> zeroinitializer, i32 15, <3 x i16> %src)
				store <3 x i16> %writelane, <3 x i16> addrspace(1)* %out, align 2
				ret void
				}

				; CHECK-LABEL: {{^}}test_writelane_v9f32:
				; CHECK-DAG: v_writelane_b32 v{{[0-9]+}}, 0, 15
				; CHECK-DAG: v_writelane_b32 v{{[0-9]+}}, 0, 15
				; CHECK-DAG: v_writelane_b32 v{{[0-9]+}}, 0, 15
				; CHECK-DAG: v_writelane_b32 v{{[0-9]+}}, 0, 15
				; CHECK-DAG: v_writelane_b32 v{{[0-9]+}}, 0, 15
				; CHECK-DAG: v_writelane_b32 v{{[0-9]+}}, 0, 15
				; CHECK-DAG: v_writelane_b32 v{{[0-9]+}}, 0, 15
				; CHECK-DAG: v_writelane_b32 v{{[0-9]+}}, 0, 15
				; CHECK-DAG: v_writelane_b32 v{{[0-9]+}}, 0, 15
				; CHECK-NOT: v_writelane_b32
				define void @test_writelane_v9f32(<9 x float> addrspace(1)* %out, <9 x float> %src) {
				%writelane = call <9 x float> @llvm.amdgcn.writelane.v9f32(<9 x float> zeroinitializer, i32 15, <9 x float> %src)
				store <9 x float> %writelane, <9 x float> addrspace(1)* %out, align 2
				ret void
				}

	declare i32 @llvm.amdgcn.workitem.id.x() #2			declare i32 @llvm.amdgcn.workitem.id.x() #2

	attributes #0 = { nounwind readnone convergent }			attributes #0 = { nounwind readnone convergent }
	attributes #1 = { nounwind }			attributes #1 = { nounwind }
	attributes #2 = { nounwind readnone }			attributes #2 = { nounwind readnone }

llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll

	[2,501 lines not shown]
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	declare i32 @llvm.amdgcn.readfirstlane(i32)			declare i32 @llvm.amdgcn.readfirstlane(i32)

	@gv = constant i32 0			@gv = constant i32 0

	define amdgpu_kernel void @readfirstlane_constant(i32 %arg) {			define amdgpu_kernel void @readfirstlane_constant(i32 %arg) {
	; CHECK-LABEL: @readfirstlane_constant(			; CHECK-LABEL: @readfirstlane_constant(
	; CHECK-NEXT: [[VAR:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[ARG:%.*]])			; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[ARG:%.*]])
	; CHECK-NEXT: store volatile i32 [[VAR]], i32* undef, align 4			; CHECK-NEXT: store volatile i32 [[TMP1]], i32* undef, align 4
	; CHECK-NEXT: store volatile i32 0, i32* undef, align 4			; CHECK-NEXT: store volatile i32 0, i32* undef, align 4
	; CHECK-NEXT: store volatile i32 123, i32* undef, align 4			; CHECK-NEXT: store volatile i32 123, i32* undef, align 4
	; CHECK-NEXT: store volatile i32 ptrtoint (i32* @gv to i32), i32* undef, align 4			; CHECK-NEXT: store volatile i32 ptrtoint (i32* @gv to i32), i32* undef, align 4
	; CHECK-NEXT: store volatile i32 undef, i32* undef, align 4			; CHECK-NEXT: store volatile i32 undef, i32* undef, align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%var = call i32 @llvm.amdgcn.readfirstlane(i32 %arg)			%var = call i32 @llvm.amdgcn.readfirstlane(i32 %arg)
	%zero = call i32 @llvm.amdgcn.readfirstlane(i32 0)			%zero = call i32 @llvm.amdgcn.readfirstlane(i32 0)
	%imm = call i32 @llvm.amdgcn.readfirstlane(i32 123)			%imm = call i32 @llvm.amdgcn.readfirstlane(i32 123)
	%constexpr = call i32 @llvm.amdgcn.readfirstlane(i32 ptrtoint (i32* @gv to i32))			%constexpr = call i32 @llvm.amdgcn.readfirstlane(i32 ptrtoint (i32* @gv to i32))
	%undef = call i32 @llvm.amdgcn.readfirstlane(i32 undef)			%undef = call i32 @llvm.amdgcn.readfirstlane(i32 undef)
	store volatile i32 %var, i32* undef			store volatile i32 %var, i32* undef
	store volatile i32 %zero, i32* undef			store volatile i32 %zero, i32* undef
	store volatile i32 %imm, i32* undef			store volatile i32 %imm, i32* undef
	store volatile i32 %constexpr, i32* undef			store volatile i32 %constexpr, i32* undef
	store volatile i32 %undef, i32* undef			store volatile i32 %undef, i32* undef
	ret void			ret void
	}			}

	define i32 @readfirstlane_idempotent(i32 %arg) {			define i32 @readfirstlane_idempotent(i32 %arg) {
	; CHECK-LABEL: @readfirstlane_idempotent(			; CHECK-LABEL: @readfirstlane_idempotent(
	; CHECK-NEXT: [[READ0:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[ARG:%.*]])			; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[ARG:%.*]])
	; CHECK-NEXT: ret i32 [[READ0]]			; CHECK-NEXT: ret i32 [[TMP1]]
	;			;
	%read0 = call i32 @llvm.amdgcn.readfirstlane(i32 %arg)			%read0 = call i32 @llvm.amdgcn.readfirstlane(i32 %arg)
	%read1 = call i32 @llvm.amdgcn.readfirstlane(i32 %read0)			%read1 = call i32 @llvm.amdgcn.readfirstlane(i32 %read0)
	%read2 = call i32 @llvm.amdgcn.readfirstlane(i32 %read1)			%read2 = call i32 @llvm.amdgcn.readfirstlane(i32 %read1)
	ret i32 %read2			ret i32 %read2
	}			}

	define i32 @readfirstlane_readlane(i32 %arg) {			define i32 @readfirstlane_readlane(i32 %arg) {
	; CHECK-LABEL: @readfirstlane_readlane(			; CHECK-LABEL: @readfirstlane_readlane(
	; CHECK-NEXT: [[READ0:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[ARG:%.*]])			; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[ARG:%.*]])
	; CHECK-NEXT: ret i32 [[READ0]]			; CHECK-NEXT: ret i32 [[TMP1]]
	;			;
	%read0 = call i32 @llvm.amdgcn.readfirstlane(i32 %arg)			%read0 = call i32 @llvm.amdgcn.readfirstlane(i32 %arg)
	%read1 = call i32 @llvm.amdgcn.readlane(i32 %read0, i32 0)			%read1 = call i32 @llvm.amdgcn.readlane(i32 %read0, i32 0)
	ret i32 %read1			ret i32 %read1
	}			}

	define i32 @readfirstlane_readfirstlane_different_block(i32 %arg) {			define i32 @readfirstlane_readfirstlane_different_block(i32 %arg) {
	; CHECK-LABEL: @readfirstlane_readfirstlane_different_block(			; CHECK-LABEL: @readfirstlane_readfirstlane_different_block(
	; CHECK-NEXT: bb0:			; CHECK-NEXT: bb0:
	; CHECK-NEXT: [[READ0:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[ARG:%.*]])			; CHECK-NEXT: [[TMP0:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[ARG:%.*]])
	; CHECK-NEXT: br label [[BB1:%.*]]			; CHECK-NEXT: br label [[BB1:%.*]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: [[READ1:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[READ0]])			; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[TMP0]])
	; CHECK-NEXT: ret i32 [[READ1]]			; CHECK-NEXT: ret i32 [[TMP1]]
	;			;
	bb0:			bb0:
	%read0 = call i32 @llvm.amdgcn.readfirstlane(i32 %arg)			%read0 = call i32 @llvm.amdgcn.readfirstlane(i32 %arg)
	br label %bb1			br label %bb1

	bb1:			bb1:
	%read1 = call i32 @llvm.amdgcn.readfirstlane(i32 %read0)			%read1 = call i32 @llvm.amdgcn.readfirstlane(i32 %read0)
	ret i32 %read1			ret i32 %read1
	}			}

	define i32 @readfirstlane_readlane_different_block(i32 %arg) {			define i32 @readfirstlane_readlane_different_block(i32 %arg) {
	; CHECK-LABEL: @readfirstlane_readlane_different_block(			; CHECK-LABEL: @readfirstlane_readlane_different_block(
	; CHECK-NEXT: bb0:			; CHECK-NEXT: bb0:
	; CHECK-NEXT: [[READ0:%.*]] = call i32 @llvm.amdgcn.readlane(i32 [[ARG:%.*]], i32 0)			; CHECK-NEXT: [[TMP0:%.*]] = call i32 @llvm.amdgcn.readlane.i32(i32 [[ARG:%.*]], i32 0)
	; CHECK-NEXT: br label [[BB1:%.*]]			; CHECK-NEXT: br label [[BB1:%.*]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: [[READ1:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[READ0]])			; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[TMP0]])
	; CHECK-NEXT: ret i32 [[READ1]]			; CHECK-NEXT: ret i32 [[TMP1]]
	;			;
	bb0:			bb0:
	%read0 = call i32 @llvm.amdgcn.readlane(i32 %arg, i32 0)			%read0 = call i32 @llvm.amdgcn.readlane(i32 %arg, i32 0)
	br label %bb1			br label %bb1

	bb1:			bb1:
	%read1 = call i32 @llvm.amdgcn.readfirstlane(i32 %read0)			%read1 = call i32 @llvm.amdgcn.readfirstlane(i32 %read0)
	ret i32 %read1			ret i32 %read1
	}			}

				define i32 @readfirstlane_bitcast(float %arg) {
				; CHECK-LABEL: @readfirstlane_bitcast(
				; CHECK-NEXT: [[TMP1:%.*]] = call float @llvm.amdgcn.readfirstlane.f32(float [[ARG:%.*]])
				; CHECK-NEXT: [[TMP2:%.*]] = bitcast float [[TMP1]] to i32
				; CHECK-NEXT: ret i32 [[TMP2]]
				;
				%bitcast.arg = bitcast float %arg to i32
				%read = call i32 @llvm.amdgcn.readfirstlane(i32 %bitcast.arg)
				ret i32 %read
				}

				define float @bitcast_readfirstlane_bitcast(float %arg) {
				; CHECK-LABEL: @bitcast_readfirstlane_bitcast(
				; CHECK-NEXT: [[TMP1:%.*]] = call float @llvm.amdgcn.readfirstlane.f32(float [[ARG:%.*]])
				; CHECK-NEXT: ret float [[TMP1]]
				;
				%bitcast.arg = bitcast float %arg to i32
				%read = call i32 @llvm.amdgcn.readfirstlane(i32 %bitcast.arg)
				%cast.read = bitcast i32 %read to float
				ret float %cast.read
				}

				define i32 @readfirstlane_bitcast_multi_use(float %arg) {
				; CHECK-LABEL: @readfirstlane_bitcast_multi_use(
				; CHECK-NEXT: store float [[ARG:%.*]], float* undef, align 4
				; CHECK-NEXT: [[TMP1:%.*]] = call float @llvm.amdgcn.readfirstlane.f32(float [[ARG]])
				; CHECK-NEXT: [[TMP2:%.*]] = bitcast float [[TMP1]] to i32
				; CHECK-NEXT: ret i32 [[TMP2]]
				;
				%bitcast.arg = bitcast float %arg to i32
				store i32 %bitcast.arg, i32* undef
				%read = call i32 @llvm.amdgcn.readfirstlane(i32 %bitcast.arg)
				ret i32 %read
				}

	; --------------------------------------------------------------------			; --------------------------------------------------------------------
	; llvm.amdgcn.readlane			; llvm.amdgcn.readlane
	; --------------------------------------------------------------------			; --------------------------------------------------------------------

	declare i32 @llvm.amdgcn.readlane(i32, i32)			declare i32 @llvm.amdgcn.readlane(i32, i32)

	define amdgpu_kernel void @readlane_constant(i32 %arg, i32 %lane) {			define amdgpu_kernel void @readlane_constant(i32 %arg, i32 %lane) {
	; CHECK-LABEL: @readlane_constant(			; CHECK-LABEL: @readlane_constant(
	; CHECK-NEXT: [[VAR:%.*]] = call i32 @llvm.amdgcn.readlane(i32 [[ARG:%.*]], i32 7)			; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.amdgcn.readlane.i32(i32 [[ARG:%.*]], i32 7)
	; CHECK-NEXT: store volatile i32 [[VAR]], i32* undef, align 4			; CHECK-NEXT: store volatile i32 [[TMP1]], i32* undef, align 4
	; CHECK-NEXT: store volatile i32 0, i32* undef, align 4			; CHECK-NEXT: store volatile i32 0, i32* undef, align 4
	; CHECK-NEXT: store volatile i32 123, i32* undef, align 4			; CHECK-NEXT: store volatile i32 123, i32* undef, align 4
	; CHECK-NEXT: store volatile i32 ptrtoint (i32* @gv to i32), i32* undef, align 4			; CHECK-NEXT: store volatile i32 ptrtoint (i32* @gv to i32), i32* undef, align 4
	; CHECK-NEXT: store volatile i32 undef, i32* undef, align 4			; CHECK-NEXT: store volatile i32 undef, i32* undef, align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%var = call i32 @llvm.amdgcn.readlane(i32 %arg, i32 7)			%var = call i32 @llvm.amdgcn.readlane(i32 %arg, i32 7)
	%zero = call i32 @llvm.amdgcn.readlane(i32 0, i32 %lane)			%zero = call i32 @llvm.amdgcn.readlane(i32 0, i32 %lane)
	%imm = call i32 @llvm.amdgcn.readlane(i32 123, i32 %lane)			%imm = call i32 @llvm.amdgcn.readlane(i32 123, i32 %lane)
	%constexpr = call i32 @llvm.amdgcn.readlane(i32 ptrtoint (i32* @gv to i32), i32 %lane)			%constexpr = call i32 @llvm.amdgcn.readlane(i32 ptrtoint (i32* @gv to i32), i32 %lane)
	%undef = call i32 @llvm.amdgcn.readlane(i32 undef, i32 %lane)			%undef = call i32 @llvm.amdgcn.readlane(i32 undef, i32 %lane)
	store volatile i32 %var, i32* undef			store volatile i32 %var, i32* undef
	store volatile i32 %zero, i32* undef			store volatile i32 %zero, i32* undef
	store volatile i32 %imm, i32* undef			store volatile i32 %imm, i32* undef
	store volatile i32 %constexpr, i32* undef			store volatile i32 %constexpr, i32* undef
	store volatile i32 %undef, i32* undef			store volatile i32 %undef, i32* undef
	ret void			ret void
	}			}

	define i32 @readlane_idempotent(i32 %arg, i32 %lane) {			define i32 @readlane_idempotent(i32 %arg, i32 %lane) {
	; CHECK-LABEL: @readlane_idempotent(			; CHECK-LABEL: @readlane_idempotent(
	; CHECK-NEXT: [[READ0:%.*]] = call i32 @llvm.amdgcn.readlane(i32 [[ARG:%.*]], i32 [[LANE:%.*]])			; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.amdgcn.readlane.i32(i32 [[ARG:%.*]], i32 [[LANE:%.*]])
	; CHECK-NEXT: ret i32 [[READ0]]			; CHECK-NEXT: ret i32 [[TMP1]]
	;			;
	%read0 = call i32 @llvm.amdgcn.readlane(i32 %arg, i32 %lane)			%read0 = call i32 @llvm.amdgcn.readlane(i32 %arg, i32 %lane)
	%read1 = call i32 @llvm.amdgcn.readlane(i32 %read0, i32 %lane)			%read1 = call i32 @llvm.amdgcn.readlane(i32 %read0, i32 %lane)
	ret i32 %read1			ret i32 %read1
	}			}

	define i32 @readlane_idempotent_different_lanes(i32 %arg, i32 %lane0, i32 %lane1) {			define i32 @readlane_idempotent_different_lanes(i32 %arg, i32 %lane0, i32 %lane1) {
	; CHECK-LABEL: @readlane_idempotent_different_lanes(			; CHECK-LABEL: @readlane_idempotent_different_lanes(
	; CHECK-NEXT: [[READ0:%.*]] = call i32 @llvm.amdgcn.readlane(i32 [[ARG:%.*]], i32 [[LANE0:%.*]])			; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.amdgcn.readlane.i32(i32 [[ARG:%.*]], i32 [[LANE0:%.*]])
	; CHECK-NEXT: [[READ1:%.*]] = call i32 @llvm.amdgcn.readlane(i32 [[READ0]], i32 [[LANE1:%.*]])			; CHECK-NEXT: [[TMP2:%.*]] = call i32 @llvm.amdgcn.readlane.i32(i32 [[TMP1]], i32 [[LANE1:%.*]])
	; CHECK-NEXT: ret i32 [[READ1]]			; CHECK-NEXT: ret i32 [[TMP2]]
	;			;
	%read0 = call i32 @llvm.amdgcn.readlane(i32 %arg, i32 %lane0)			%read0 = call i32 @llvm.amdgcn.readlane(i32 %arg, i32 %lane0)
	%read1 = call i32 @llvm.amdgcn.readlane(i32 %read0, i32 %lane1)			%read1 = call i32 @llvm.amdgcn.readlane(i32 %read0, i32 %lane1)
	ret i32 %read1			ret i32 %read1
	}			}

	define i32 @readlane_readfirstlane(i32 %arg) {			define i32 @readlane_readfirstlane(i32 %arg) {
	; CHECK-LABEL: @readlane_readfirstlane(			; CHECK-LABEL: @readlane_readfirstlane(
	; CHECK-NEXT: [[READ0:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[ARG:%.*]])			; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[ARG:%.*]])
	; CHECK-NEXT: ret i32 [[READ0]]			; CHECK-NEXT: ret i32 [[TMP1]]
	;			;
	%read0 = call i32 @llvm.amdgcn.readfirstlane(i32 %arg)			%read0 = call i32 @llvm.amdgcn.readfirstlane(i32 %arg)
	%read1 = call i32 @llvm.amdgcn.readlane(i32 %read0, i32 0)			%read1 = call i32 @llvm.amdgcn.readlane(i32 %read0, i32 0)
	ret i32 %read1			ret i32 %read1
	}			}

	define i32 @readlane_idempotent_different_block(i32 %arg, i32 %lane) {			define i32 @readlane_idempotent_different_block(i32 %arg, i32 %lane) {
	; CHECK-LABEL: @readlane_idempotent_different_block(			; CHECK-LABEL: @readlane_idempotent_different_block(
	; CHECK-NEXT: bb0:			; CHECK-NEXT: bb0:
	; CHECK-NEXT: [[READ0:%.*]] = call i32 @llvm.amdgcn.readlane(i32 [[ARG:%.*]], i32 [[LANE:%.*]])			; CHECK-NEXT: [[TMP0:%.*]] = call i32 @llvm.amdgcn.readlane.i32(i32 [[ARG:%.*]], i32 [[LANE:%.*]])
	; CHECK-NEXT: br label [[BB1:%.*]]			; CHECK-NEXT: br label [[BB1:%.*]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: [[READ1:%.*]] = call i32 @llvm.amdgcn.readlane(i32 [[READ0]], i32 [[LANE]])			; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.amdgcn.readlane.i32(i32 [[TMP0]], i32 [[LANE]])
	; CHECK-NEXT: ret i32 [[READ1]]			; CHECK-NEXT: ret i32 [[TMP1]]
	;			;
	bb0:			bb0:
	%read0 = call i32 @llvm.amdgcn.readlane(i32 %arg, i32 %lane)			%read0 = call i32 @llvm.amdgcn.readlane(i32 %arg, i32 %lane)
	br label %bb1			br label %bb1

	bb1:			bb1:
	%read1 = call i32 @llvm.amdgcn.readlane(i32 %read0, i32 %lane)			%read1 = call i32 @llvm.amdgcn.readlane(i32 %read0, i32 %lane)
	ret i32 %read1			ret i32 %read1
	}			}


	define i32 @readlane_readfirstlane_different_block(i32 %arg) {			define i32 @readlane_readfirstlane_different_block(i32 %arg) {
	; CHECK-LABEL: @readlane_readfirstlane_different_block(			; CHECK-LABEL: @readlane_readfirstlane_different_block(
	; CHECK-NEXT: bb0:			; CHECK-NEXT: bb0:
	; CHECK-NEXT: [[READ0:%.*]] = call i32 @llvm.amdgcn.readfirstlane(i32 [[ARG:%.*]])			; CHECK-NEXT: [[TMP0:%.*]] = call i32 @llvm.amdgcn.readfirstlane.i32(i32 [[ARG:%.*]])
	; CHECK-NEXT: br label [[BB1:%.*]]			; CHECK-NEXT: br label [[BB1:%.*]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: [[READ1:%.*]] = call i32 @llvm.amdgcn.readlane(i32 [[READ0]], i32 0)			; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.amdgcn.readlane.i32(i32 [[TMP0]], i32 0)
	; CHECK-NEXT: ret i32 [[READ1]]			; CHECK-NEXT: ret i32 [[TMP1]]
	;			;
	bb0:			bb0:
	%read0 = call i32 @llvm.amdgcn.readfirstlane(i32 %arg)			%read0 = call i32 @llvm.amdgcn.readfirstlane(i32 %arg)
	br label %bb1			br label %bb1

	bb1:			bb1:
	%read1 = call i32 @llvm.amdgcn.readlane(i32 %read0, i32 0)			%read1 = call i32 @llvm.amdgcn.readlane(i32 %read0, i32 0)
	ret i32 %read1			ret i32 %read1

Revision Contents

Diff 286898

clang/lib/CodeGen/CGBuiltin.cpp

clang/test/CodeGenOpenCL/builtins-amdgcn.cl

llvm/include/llvm/IR/IntrinsicsAMDGPU.td

llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp

llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp

llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

llvm/lib/Target/AMDGPU/SIInstructions.td

llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-amdgcn.readfirstlane.mir

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.readfirstlane.ll

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.readlane.ll

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.writelane.ll

llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll