This is an archive of the discontinued LLVM Phabricator instance.

In D66197#1629436, @arsenm wrote:

In D66197#1629348, @b-sumner wrote:

Looks fine to me. Thanks!

I wonder if is.local should be is.shared because that's what getreg calls this

That would probably be better.

Rename

Looks fine to me.

LGTM

This revision is now accepted and ready to land. Aug 14 2019, 11:03 AM

Do we really need these to be "amdgpu" specific?

In D66197#1629784, @jdoerfert wrote:

Do we really need these to be "amdgpu" specific?

Are you envisioning these being used for OpenCL implementations? OpenCL doesn't exactly have these. It instead has to_<addrspacename> functions, which return NULL if the generic pointer isn't actually pointing at an object in <addrspacename>.
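
To make the relationship with the OpenCL functions concrete, here is a minimal sketch, in plain C++ rather than OpenCL C, of how a to_local-style conversion could be layered on top of an "is shared" predicate. The names toLocalSketch and IsSharedFn are hypothetical and do not come from the patch; the predicate is passed in as a parameter because the builtin added here only exists when compiling for an AMDGPU target.

```cpp
// Hypothetical predicate type: returns true if the flat pointer refers to
// shared (local) memory. On an AMDGPU target this role could be played by the
// builtin added in this patch; it is injected here so the sketch runs anywhere.
using IsSharedFn = bool (*)(const void *);

// Sketch of the to_local() behaviour described above: hand back the pointer
// unchanged if it is in the requested address space, otherwise return null.
inline const void *toLocalSketch(const void *Flat, IsSharedFn IsShared) {
  return IsShared(Flat) ? Flat : nullptr;
}
```

The predicate answers only yes or no, so the null-returning form has to be built on top of it; that is one reading of "OpenCL doesn't exactly have these".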

In D66197#1629784, @jdoerfert wrote:

Do we really need these to be "amdgpu" specific?

I don't know what a generic intrinsic that does this would look like. This is pretty specific to these two cases, and in general I don't think pointer address spaces can be identified

In D66197#1629810, @arsenm wrote:

I don't know what a generic intrinsic that does this would look like. This is pretty specific to these two cases, and in general I don't think pointer address spaces can be identified

I would have thought it could be pessimistically resolved in the IR or precise at runtime if the hardware supports that.
E.g.,

if (__builtin_is_private(ptr)) { ... } else { ... }

and we can fold

%cast = addrspacecast i8 addrspace(1)* %gptr to i8*
%is.private = call i1 @llvm.is.private(i8* %cast)

to %is.private = true
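
The "pessimistic in the IR, precise at runtime" idea can be sketched as a small three-way decision. This is an illustration only, not code from the patch: foldIsAddrSpace is a hypothetical helper, and the enumerators assume the AMDGPU backend's address-space numbering (flat 0, global 1, local 3, constant 4, private 5). Under that numbering a pointer cast from addrspace(1) is a global pointer, so a query for "private" would fold to false; the sketch makes no claim about how a generic intrinsic would number address spaces.

```cpp
#include <optional>

// Assumed AMDGPU address-space numbers (not defined by the generic IR).
enum AddrSpace : unsigned {
  Flat = 0,
  Global = 1,
  Local = 3,
  Constant = 4,
  Private = 5
};

// Hypothetical helper: try to answer "is this pointer in address space Query?"
// at compile time, given the address space the pointer was cast from.
// Returns std::nullopt when the source is still flat, in which case only a
// run-time test can answer.
inline std::optional<bool> foldIsAddrSpace(AddrSpace Source, AddrSpace Query) {
  if (Source == Flat)
    return std::nullopt;
  return Source == Query;
}
```

A caller would replace the intrinsic call with the constant when a value comes back, and leave the call in place for the backend otherwise.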

In D66197#1629809, @b-sumner wrote:

Are you envisioning these being used for OpenCL implementations? OpenCL doesn't exactly have these. It instead has to_<addrspacename> functions, which return NULL if the generic pointer isn't actually pointing at an object in <addrspacename>.

We (will) have various languages/targets that have corresponding address spaces, and we already reserve some numbers for specific address spaces (afaik). Why not make this generic functionality?

In D66197#1629861, @jdoerfert wrote:

We (will) have various languages/targets that have corresponding address spaces, and we already reserve some numbers for specific address spaces (afaik). Why not make this generic functionality?

The IR doesn't reserve any numbers for specific usage (except 0 has some special properties, which do not include being a flat/generic pointer as defined in OpenCL). It might make more sense to add a clang builtin for this, which could then be implemented with the target specific intrinsic. I don't want to add a generic target intrinsic while guessing at how this might work on other targets. Something truly generic, like llvm.is.address.space(ptr, address_space_id) I don't think really works generally enough to add. There isn't necessarily a 1:1 mapping between the language address space and IR address space. There could possibly be multiple IR address spaces to handle, and not all targets might be able to do this test at all

In D66197#1629864, @arsenm wrote:

The IR doesn't reserve any numbers for specific usage (except 0 has some special properties, which do not include being a flat/generic pointer as defined in OpenCL). It might make more sense to add a clang builtin for this, which could then be implemented with the target specific intrinsic. I don't want to add a generic target intrinsic while guessing at how this might work on other targets. Something truly generic, like llvm.is.address.space(ptr, address_space_id) I don't think really works generally enough to add. There isn't necessarily a 1:1 mapping between the language address space and IR address space. There could possibly be multiple IR address spaces to handle, and not all targets might be able to do this test at all

I see, still, we have llvm.nvvm.isspacep.const and friends already. Now we get llvm.amdgcn.is.private, which seems to be the same thing for amdgpu. As long as people only use this in the backend, great, but if we want middle-end passes that deal with address spaces and optimize accordingly, e.g., introduce data movement, we should have generic intrinsics or helper functions. I would prefer the former and I was curious if there is a problem with that. If the folding (I described earlier) is triple/target specific, sure, but if we do not have multiple llvm.XYZ.is.private we would simplify things.

In D66197#1629896, @jdoerfert wrote:

I see, still, we have llvm.nvvm.isspacep.const and friends already. Now we get llvm.amdgcn.is.private, which seems to be the same thing for amdgpu. As long as people only use this in the backend, great, but if we want middle-end passes that deal with address spaces and optimize accordingly, e.g., introduce data movement, we should have generic intrinsics or helper functions. I would prefer the former and I was curious if there is a problem with that. If the folding (I described earlier) is triple/target specific, sure, but if we do not have multiple llvm.XYZ.is.private we would simplify things.

I'm not sure what you mean by data movement. InferAddressSpaces handles this in this patch when the source is inferred. I don't think there are any other passes that would ever need to care about this.

In D66197#1629903, @arsenm wrote:

I'm not sure what you mean by data movement. InferAddressSpaces handles this in this patch when the source is inferred. I don't think there are any other passes that would ever need to care about this.

If I want to localize data in a loop inside a, let's say, OpenMP target offloading region, I could check if the pointer I have is already shared/private and if not move the data to a shared/private allocation. Maybe I misunderstand what this intrinsic is supposed to do though.

If you want to go ahead with this, we can add the generic stuff later.

r371009

Revision Contents

Path                                                                 Size
include/llvm/IR/IntrinsicsAMDGPU.td                                  12 lines
lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp                   3 lines
lib/Target/AMDGPU/AMDGPULegalizerInfo.h                              2 lines
lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp                            16 lines
lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp                      17 lines
lib/Target/AMDGPU/SIISelLowering.cpp                                 13 lines
test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.is.private.ll             103 lines
test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.is.shared.ll              103 lines
test/CodeGen/AMDGPU/annotate-kernel-features-hsa.ll                  19 lines
test/CodeGen/AMDGPU/llvm.amdgcn.is.private.ll                        50 lines
test/CodeGen/AMDGPU/llvm.amdgcn.is.shared.ll                         51 lines
test/Transforms/InferAddressSpaces/AMDGPU/address-space-id-funcs.ll  55 lines

Diff 215167

include/llvm/IR/IntrinsicsAMDGPU.td

 // program ever uses WQM, then the instruction and the first source will be
 // computed in WQM.
 def int_amdgcn_set_inactive :
   Intrinsic<[llvm_anyint_ty],
             [LLVMMatchType<0>, // value to be copied
              LLVMMatchType<0>], // value for the inactive lanes to take
             [IntrNoMem, IntrConvergent]>;

+// Return if the given flat pointer points to a local memory address.
+def int_amdgcn_is_shared : GCCBuiltin<"__builtin_amdgcn_is_shared">,
+  Intrinsic<[llvm_i1_ty], [llvm_ptr_ty],
+  [IntrNoMem, IntrSpeculatable, NoCapture<0>]
+>;
+
+// Return if the given flat pointer points to a private memory address.
+def int_amdgcn_is_private : GCCBuiltin<"__builtin_amdgcn_is_private">,
+  Intrinsic<[llvm_i1_ty], [llvm_ptr_ty],
+  [IntrNoMem, IntrSpeculatable, NoCapture<0>]
+>;
+
 //===----------------------------------------------------------------------===//
 // CI+ Intrinsics
 //===----------------------------------------------------------------------===//

 def int_amdgcn_s_dcache_inv_vol :
   GCCBuiltin<"__builtin_amdgcn_s_dcache_inv_vol">,
   Intrinsic<[], [], []>;

lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp

@@ ... @@ case Intrinsic::amdgcn_dispatch_ptr:
     return "amdgpu-dispatch-ptr";
   case Intrinsic::amdgcn_dispatch_id:
     return "amdgpu-dispatch-id";
   case Intrinsic::amdgcn_kernarg_segment_ptr:
     return "amdgpu-kernarg-segment-ptr";
   case Intrinsic::amdgcn_implicitarg_ptr:
     return "amdgpu-implicitarg-ptr";
   case Intrinsic::amdgcn_queue_ptr:
+  case Intrinsic::amdgcn_is_shared:
+  case Intrinsic::amdgcn_is_private:
+    // TODO: Does not require queue ptr on gfx9+
   case Intrinsic::trap:
   case Intrinsic::debugtrap:
     IsQueuePtr = true;
     return "amdgpu-queue-ptr";
   default:
     return "";
   }
 }

lib/Target/AMDGPU/AMDGPULegalizerInfo.h

@@ ... @@ bool legalizePreloadedArgIntrin(
     MachineInstr &MI, MachineRegisterInfo &MRI, MachineIRBuilder &B,
     AMDGPUFunctionArgInfo::PreloadedValue ArgType) const;

   bool legalizeFDIVFast(MachineInstr &MI, MachineRegisterInfo &MRI,
                         MachineIRBuilder &B) const;

   bool legalizeImplicitArgPtr(MachineInstr &MI, MachineRegisterInfo &MRI,
                               MachineIRBuilder &B) const;
+  bool legalizeIsAddrSpace(MachineInstr &MI, MachineRegisterInfo &MRI,
+                           MachineIRBuilder &B, unsigned AddrSpace) const;
   bool legalizeIntrinsic(MachineInstr &MI, MachineRegisterInfo &MRI,
                          MachineIRBuilder &MIRBuilder) const override;

 };
 } // End llvm namespace.
 #endif

lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp

@@ ... @@ bool AMDGPULegalizerInfo::legalizeImplicitArgPtr(MachineInstr &MI,
   if (!loadInputValue(KernargPtrReg, B, Arg))
     return false;

   B.buildGEP(DstReg, KernargPtrReg, B.buildConstant(IdxTy, Offset).getReg(0));
   MI.eraseFromParent();
   return true;
 }

+bool AMDGPULegalizerInfo::legalizeIsAddrSpace(MachineInstr &MI,
+                                              MachineRegisterInfo &MRI,
+                                              MachineIRBuilder &B,
+                                              unsigned AddrSpace) const {
+  B.setInstr(MI);
+  Register ApertureReg = getSegmentAperture(AddrSpace, MRI, B);
+  auto Hi32 = B.buildExtract(LLT::scalar(32), MI.getOperand(2).getReg(), 32);
+  B.buildICmp(ICmpInst::ICMP_EQ, MI.getOperand(0), Hi32, ApertureReg);
+  MI.eraseFromParent();
+  return true;
+}
+
 bool AMDGPULegalizerInfo::legalizeIntrinsic(MachineInstr &MI,
                                             MachineRegisterInfo &MRI,
                                             MachineIRBuilder &B) const {
   // Replace the use G_BRCOND with the exec manipulate and branch pseudos.
   switch (MI.getOperand(MI.getNumExplicitDefs()).getIntrinsicID()) {
   case Intrinsic::amdgcn_if: {
     if (MachineInstr *BrCond = verifyCFIntrinsic(MI, MRI)) {
       const SIRegisterInfo *TRI
@@ ... @@ bool AMDGPULegalizerInfo::legalizeIntrinsic(MachineInstr &MI,
   case Intrinsic::amdgcn_implicit_buffer_ptr:
     return legalizePreloadedArgIntrin(
       MI, MRI, B, AMDGPUFunctionArgInfo::IMPLICIT_BUFFER_PTR);
   case Intrinsic::amdgcn_dispatch_id:
     return legalizePreloadedArgIntrin(MI, MRI, B,
                                       AMDGPUFunctionArgInfo::DISPATCH_ID);
   case Intrinsic::amdgcn_fdiv_fast:
     return legalizeFDIVFast(MI, MRI, B);
+  case Intrinsic::amdgcn_is_shared:
+    return legalizeIsAddrSpace(MI, MRI, B, AMDGPUAS::LOCAL_ADDRESS);
+  case Intrinsic::amdgcn_is_private:
+    return legalizeIsAddrSpace(MI, MRI, B, AMDGPUAS::PRIVATE_ADDRESS);
   default:
     return true;
   }

   return true;
 }
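
The new legalizeIsAddrSpace lowering reduces the query to one comparison: extract the upper 32 bits of the 64-bit flat pointer and test them for equality against the segment's aperture value. A minimal host-side sketch of that arithmetic follows; isInSegment is a hypothetical name, and the aperture is taken as a plain parameter because how it is obtained (getSegmentAperture in the patch) is target-specific.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the lowering above: a 64-bit flat pointer is
// treated as belonging to a segment when its upper 32 bits equal that
// segment's aperture value.
inline bool isInSegment(std::uint64_t FlatPtr, std::uint32_t ApertureHi) {
  const std::uint32_t Hi32 = static_cast<std::uint32_t>(FlatPtr >> 32);
  return Hi32 == ApertureHi;
}
```

Only the high half of the pointer takes part in the test, so the offset within the segment never affects the result.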

lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp

 bool GCNTTIImpl::collectFlatAddressOperands(SmallVectorImpl<int> &OpIndexes,
                                             Intrinsic::ID IID) const {
   switch (IID) {
   case Intrinsic::amdgcn_atomic_inc:
   case Intrinsic::amdgcn_atomic_dec:
   case Intrinsic::amdgcn_ds_fadd:
   case Intrinsic::amdgcn_ds_fmin:
   case Intrinsic::amdgcn_ds_fmax:
+  case Intrinsic::amdgcn_is_shared:
+  case Intrinsic::amdgcn_is_private:
     OpIndexes.push_back(0);
     return true;
   default:
     return false;
   }
 }

 bool GCNTTIImpl::rewriteIntrinsicWithAddressSpace(
   IntrinsicInst *II, Value *OldV, Value *NewV) const {
-  switch (II->getIntrinsicID()) {
+  auto IntrID = II->getIntrinsicID();
+  switch (IntrID) {
   case Intrinsic::amdgcn_atomic_inc:
   case Intrinsic::amdgcn_atomic_dec:
   case Intrinsic::amdgcn_ds_fadd:
   case Intrinsic::amdgcn_ds_fmin:
   case Intrinsic::amdgcn_ds_fmax: {
     const ConstantInt *IsVolatile = cast<ConstantInt>(II->getArgOperand(4));
     if (!IsVolatile->isZero())
       return false;
     Module *M = II->getParent()->getParent()->getParent();
     Type *DestTy = II->getType();
     Type *SrcTy = NewV->getType();
     Function *NewDecl =
         Intrinsic::getDeclaration(M, II->getIntrinsicID(), {DestTy, SrcTy});
     II->setArgOperand(0, NewV);
     II->setCalledFunction(NewDecl);
     return true;
   }
+  case Intrinsic::amdgcn_is_shared:
+  case Intrinsic::amdgcn_is_private: {
+    unsigned TrueAS = IntrID == Intrinsic::amdgcn_is_shared ?
+      AMDGPUAS::LOCAL_ADDRESS : AMDGPUAS::PRIVATE_ADDRESS;
+    unsigned NewAS = NewV->getType()->getPointerAddressSpace();
+    LLVMContext &Ctx = NewV->getType()->getContext();
+    ConstantInt *NewVal = (TrueAS == NewAS) ?
+      ConstantInt::getTrue(Ctx) : ConstantInt::getFalse(Ctx);
+    II->replaceAllUsesWith(NewVal);
+    II->eraseFromParent();
+    return true;
+  }
   default:
     return false;
   }
 }

 unsigned GCNTTIImpl::getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index,
                                     Type *SubTp) {
   if (ST->hasVOP3PInsts()) {

lib/Target/AMDGPU/SIISelLowering.cpp

@@ ... @@ case Intrinsic::amdgcn_groupstaticsize: {
     const Module *M = MF.getFunction().getParent();
     const GlobalValue *GV =
         M->getNamedValue(Intrinsic::getName(Intrinsic::amdgcn_groupstaticsize));
     SDValue GA = DAG.getTargetGlobalAddress(GV, DL, MVT::i32, 0,
                                             SIInstrInfo::MO_ABS32_LO);
     return {DAG.getMachineNode(AMDGPU::S_MOV_B32, DL, MVT::i32, GA), 0};
   }
+  case Intrinsic::amdgcn_is_shared:
+  case Intrinsic::amdgcn_is_private: {
+    SDLoc SL(Op);
+    unsigned AS = (IntrinsicID == Intrinsic::amdgcn_is_shared) ?
+      AMDGPUAS::LOCAL_ADDRESS : AMDGPUAS::PRIVATE_ADDRESS;
+    SDValue Aperture = getSegmentAperture(AS, SL, DAG);
+    SDValue SrcVec = DAG.getNode(ISD::BITCAST, DL, MVT::v2i32,
+                                 Op.getOperand(1));
+
+    SDValue SrcHi = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, SL, MVT::i32, SrcVec,
+                                DAG.getConstant(1, SL, MVT::i32));
+    return DAG.getSetCC(SL, MVT::i1, SrcHi, Aperture, ISD::SETEQ);
+  }
   default:
     if (const AMDGPU::ImageDimIntrinsicInfo *ImageDimIntr =
             AMDGPU::getImageDimIntrinsicInfo(IntrinsicID))
       return lowerImage(Op, ImageDimIntr, DAG);

     return Op;
   }
 }

test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.is.private.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -global-isel -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=CI %s
				; RUN: llc -global-isel -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GFX9 %s

				; TODO: Merge with DAG test

				define amdgpu_kernel void @is_private_vgpr(i8* addrspace(1)* %ptr.ptr) {
				; CI-LABEL: is_private_vgpr:
				; CI: ; %bb.0:
				; CI-NEXT: v_ashrrev_i32_e32 v1, 31, v0
				; CI-NEXT: v_mul_lo_u32 v2, 0, v0
				; CI-NEXT: v_mul_lo_u32 v1, 8, v1
				; CI-NEXT: v_mul_lo_u32 v3, 8, v0
				; CI-NEXT: s_load_dwordx2 s[0:1], s[6:7], 0x0
				; CI-NEXT: v_mul_hi_u32 v0, 8, v0
				; CI-NEXT: v_add_i32_e32 v1, vcc, v2, v1
				; CI-NEXT: v_add_i32_e32 v1, vcc, v1, v0
				; CI-NEXT: s_waitcnt lgkmcnt(0)
				; CI-NEXT: v_add_i32_e32 v0, vcc, s0, v3
				; CI-NEXT: v_mov_b32_e32 v2, s1
				; CI-NEXT: v_addc_u32_e32 v1, vcc, v2, v1, vcc
				; CI-NEXT: flat_load_dwordx2 v[0:1], v[0:1]
				; CI-NEXT: s_load_dword s0, s[4:5], 0x11
				; CI-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
				; CI-NEXT: v_cmp_eq_u32_e32 vcc, s0, v1
				; CI-NEXT: v_cndmask_b32_e64 v0, 0, 1, vcc
				; CI-NEXT: flat_store_dword v[0:1], v0
				; CI-NEXT: s_endpgm
				;
				; GFX9-LABEL: is_private_vgpr:
				; GFX9: ; %bb.0:
				; GFX9-NEXT: v_ashrrev_i32_e32 v1, 31, v0
				; GFX9-NEXT: v_mul_lo_u32 v2, 0, v0
				; GFX9-NEXT: v_mul_lo_u32 v1, 8, v1
				; GFX9-NEXT: s_load_dwordx2 s[0:1], s[6:7], 0x0
				; GFX9-NEXT: v_mul_hi_u32 v3, 8, v0
				; GFX9-NEXT: v_mul_lo_u32 v0, 8, v0
				; GFX9-NEXT: v_add_u32_e32 v1, v2, v1
				; GFX9-NEXT: v_add_u32_e32 v1, v1, v3
				; GFX9-NEXT: s_waitcnt lgkmcnt(0)
				; GFX9-NEXT: v_mov_b32_e32 v2, s1
				; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s0, v0
				; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v2, v1, vcc
				; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
				; GFX9-NEXT: s_getreg_b32 s0, hwreg(HW_REG_SH_MEM_BASES, 0, 16)
				; GFX9-NEXT: s_lshl_b32 s0, s0, 16
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, s0, v1
				; GFX9-NEXT: v_cndmask_b32_e64 v0, 0, 1, vcc
				; GFX9-NEXT: global_store_dword v[0:1], v0, off
				; GFX9-NEXT: s_endpgm
				%id = call i32 @llvm.amdgcn.workitem.id.x()
				%gep = getelementptr inbounds i8*, i8* addrspace(1)* %ptr.ptr, i32 %id
				%ptr = load volatile i8*, i8* addrspace(1)* %gep
				%val = call i1 @llvm.amdgcn.is.private(i8* %ptr)
				%ext = zext i1 %val to i32
				store i32 %ext, i32 addrspace(1)* undef
				ret void
				}

				define amdgpu_kernel void @is_private_sgpr(i8* %ptr) {
				; CI-LABEL: is_private_sgpr:
				; CI: ; %bb.0:
				; CI-NEXT: s_load_dwordx2 s[0:1], s[6:7], 0x0
				; CI-NEXT: s_waitcnt lgkmcnt(0)
				; CI-NEXT: s_load_dword s0, s[4:5], 0x11
				; CI-NEXT: s_waitcnt lgkmcnt(0)
				; CI-NEXT: s_cmp_eq_u32 s1, s0
				; CI-NEXT: s_cbranch_scc0 BB1_2
				; CI-NEXT: ; %bb.1: ; %bb0
				; CI-NEXT: v_mov_b32_e32 v0, 0
				; CI-NEXT: flat_store_dword v[0:1], v0
				; CI-NEXT: BB1_2: ; %bb1
				; CI-NEXT: s_endpgm
				;
				; GFX9-LABEL: is_private_sgpr:
				; GFX9: ; %bb.0:
				; GFX9-NEXT: s_load_dwordx2 s[0:1], s[6:7], 0x0
				; GFX9-NEXT: s_waitcnt lgkmcnt(0)
				; GFX9-NEXT: s_getreg_b32 s0, hwreg(HW_REG_SH_MEM_BASES, 0, 16)
				; GFX9-NEXT: s_lshl_b32 s0, s0, 16
				; GFX9-NEXT: s_cmp_eq_u32 s1, s0
				; GFX9-NEXT: s_cbranch_scc0 BB1_2
				; GFX9-NEXT: ; %bb.1: ; %bb0
				; GFX9-NEXT: v_mov_b32_e32 v0, 0
				; GFX9-NEXT: global_store_dword v[0:1], v0, off
				; GFX9-NEXT: BB1_2: ; %bb1
				; GFX9-NEXT: s_endpgm
				%val = call i1 @llvm.amdgcn.is.private(i8* %ptr)
				br i1 %val, label %bb0, label %bb1

				bb0:
				store volatile i32 0, i32 addrspace(1)* undef
				br label %bb1

				bb1:
				ret void
				}

				declare i32 @llvm.amdgcn.workitem.id.x() #0
				declare i1 @llvm.amdgcn.is.private(i8* nocapture) #0

				attributes #0 = { nounwind readnone speculatable }

test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.is.shared.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -global-isel -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=CI %s
				; RUN: llc -global-isel -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GFX9 %s

				; TODO: Merge with DAG test

				define amdgpu_kernel void @is_local_vgpr(i8* addrspace(1)* %ptr.ptr) {
				; CI-LABEL: is_local_vgpr:
				; CI: ; %bb.0:
				; CI-NEXT: v_ashrrev_i32_e32 v1, 31, v0
				; CI-NEXT: v_mul_lo_u32 v2, 0, v0
				; CI-NEXT: v_mul_lo_u32 v1, 8, v1
				; CI-NEXT: v_mul_lo_u32 v3, 8, v0
				; CI-NEXT: s_load_dwordx2 s[0:1], s[6:7], 0x0
				; CI-NEXT: v_mul_hi_u32 v0, 8, v0
				; CI-NEXT: v_add_i32_e32 v1, vcc, v2, v1
				; CI-NEXT: v_add_i32_e32 v1, vcc, v1, v0
				; CI-NEXT: s_waitcnt lgkmcnt(0)
				; CI-NEXT: v_add_i32_e32 v0, vcc, s0, v3
				; CI-NEXT: v_mov_b32_e32 v2, s1
				; CI-NEXT: v_addc_u32_e32 v1, vcc, v2, v1, vcc
				; CI-NEXT: flat_load_dwordx2 v[0:1], v[0:1]
				; CI-NEXT: s_load_dword s0, s[4:5], 0x10
				; CI-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
				; CI-NEXT: v_cmp_eq_u32_e32 vcc, s0, v1
				; CI-NEXT: v_cndmask_b32_e64 v0, 0, 1, vcc
				; CI-NEXT: flat_store_dword v[0:1], v0
				; CI-NEXT: s_endpgm
				;
				; GFX9-LABEL: is_local_vgpr:
				; GFX9: ; %bb.0:
				; GFX9-NEXT: v_ashrrev_i32_e32 v1, 31, v0
				; GFX9-NEXT: v_mul_lo_u32 v2, 0, v0
				; GFX9-NEXT: v_mul_lo_u32 v1, 8, v1
				; GFX9-NEXT: s_load_dwordx2 s[0:1], s[6:7], 0x0
				; GFX9-NEXT: v_mul_hi_u32 v3, 8, v0
				; GFX9-NEXT: v_mul_lo_u32 v0, 8, v0
				; GFX9-NEXT: v_add_u32_e32 v1, v2, v1
				; GFX9-NEXT: v_add_u32_e32 v1, v1, v3
				; GFX9-NEXT: s_waitcnt lgkmcnt(0)
				; GFX9-NEXT: v_mov_b32_e32 v2, s1
				; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, s0, v0
				; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, v2, v1, vcc
				; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
				; GFX9-NEXT: s_getreg_b32 s0, hwreg(HW_REG_SH_MEM_BASES, 16, 16)
				; GFX9-NEXT: s_lshl_b32 s0, s0, 16
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: v_cmp_eq_u32_e32 vcc, s0, v1
				; GFX9-NEXT: v_cndmask_b32_e64 v0, 0, 1, vcc
				; GFX9-NEXT: global_store_dword v[0:1], v0, off
				; GFX9-NEXT: s_endpgm
				%id = call i32 @llvm.amdgcn.workitem.id.x()
				%gep = getelementptr inbounds i8*, i8* addrspace(1)* %ptr.ptr, i32 %id
				%ptr = load volatile i8*, i8* addrspace(1)* %gep
				%val = call i1 @llvm.amdgcn.is.shared(i8* %ptr)
				%ext = zext i1 %val to i32
				store i32 %ext, i32 addrspace(1)* undef
				ret void
				}

				define amdgpu_kernel void @is_local_sgpr(i8* %ptr) {
				; CI-LABEL: is_local_sgpr:
				; CI: ; %bb.0:
				; CI-NEXT: s_load_dwordx2 s[0:1], s[6:7], 0x0
				; CI-NEXT: s_waitcnt lgkmcnt(0)
				; CI-NEXT: s_load_dword s0, s[4:5], 0x10
				; CI-NEXT: s_waitcnt lgkmcnt(0)
				; CI-NEXT: s_cmp_eq_u32 s1, s0
				; CI-NEXT: s_cbranch_scc0 BB1_2
				; CI-NEXT: ; %bb.1: ; %bb0
				; CI-NEXT: v_mov_b32_e32 v0, 0
				; CI-NEXT: flat_store_dword v[0:1], v0
				; CI-NEXT: BB1_2: ; %bb1
				; CI-NEXT: s_endpgm
				;
				; GFX9-LABEL: is_local_sgpr:
				; GFX9: ; %bb.0:
				; GFX9-NEXT: s_load_dwordx2 s[0:1], s[6:7], 0x0
				; GFX9-NEXT: s_waitcnt lgkmcnt(0)
				; GFX9-NEXT: s_getreg_b32 s0, hwreg(HW_REG_SH_MEM_BASES, 16, 16)
				; GFX9-NEXT: s_lshl_b32 s0, s0, 16
				; GFX9-NEXT: s_cmp_eq_u32 s1, s0
				; GFX9-NEXT: s_cbranch_scc0 BB1_2
				; GFX9-NEXT: ; %bb.1: ; %bb0
				; GFX9-NEXT: v_mov_b32_e32 v0, 0
				; GFX9-NEXT: global_store_dword v[0:1], v0, off
				; GFX9-NEXT: BB1_2: ; %bb1
				; GFX9-NEXT: s_endpgm
				%val = call i1 @llvm.amdgcn.is.shared(i8* %ptr)
				br i1 %val, label %bb0, label %bb1

				bb0:
				store volatile i32 0, i32 addrspace(1)* undef
				br label %bb1

				bb1:
				ret void
				}

				declare i32 @llvm.amdgcn.workitem.id.x() #0
				declare i1 @llvm.amdgcn.is.shared(i8* nocapture) #0

				attributes #0 = { nounwind readnone speculatable }
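
The GFX9 CHECK lines in the two GlobalISel tests show how the aperture value is produced on that target: s_getreg_b32 reads a 16-bit field of HW_REG_SH_MEM_BASES (bit offset 0 for private, 16 for shared), and s_lshl_b32 shifts it left by 16 before the compare. The CI lines instead load a dword through s[4:5] at offset 0x10 or 0x11. The sketch below reproduces only the GFX9 arithmetic on the host; getRegField and apertureHiGfx9 are hypothetical names, and the register value is an ordinary parameter.

```cpp
#include <cstdint>

// Read Width bits starting at bit Offset, modelling what s_getreg_b32 does
// for hwreg(REG, Offset, Width).
inline std::uint32_t getRegField(std::uint32_t Reg, unsigned Offset,
                                 unsigned Width) {
  const std::uint32_t Mask =
      (Width >= 32) ? 0xFFFFFFFFu : ((1u << Width) - 1u);
  return (Reg >> Offset) & Mask;
}

// Field offsets follow the GFX9 CHECK lines above: the private aperture is
// read from bit 0 and the shared aperture from bit 16, each 16 bits wide,
// then shifted left by 16 to form the value compared against the pointer's
// high dword.
inline std::uint32_t apertureHiGfx9(std::uint32_t ShMemBases, bool Shared) {
  return getRegField(ShMemBases, Shared ? 16u : 0u, 16u) << 16;
}
```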

test/CodeGen/AMDGPU/annotate-kernel-features-hsa.ll

	; RUN: opt -mtriple=amdgcn-unknown-amdhsa -S -amdgpu-annotate-kernel-features < %s | FileCheck -check-prefix=HSA %s

	declare i32 @llvm.amdgcn.workgroup.id.x() #0
	declare i32 @llvm.amdgcn.workgroup.id.y() #0
	declare i32 @llvm.amdgcn.workgroup.id.z() #0

	declare i32 @llvm.amdgcn.workitem.id.x() #0
	declare i32 @llvm.amdgcn.workitem.id.y() #0
	declare i32 @llvm.amdgcn.workitem.id.z() #0

	declare i8 addrspace(4)* @llvm.amdgcn.dispatch.ptr() #0
	declare i8 addrspace(4)* @llvm.amdgcn.queue.ptr() #0
	declare i8 addrspace(4)* @llvm.amdgcn.kernarg.segment.ptr() #0

				declare i1 @llvm.amdgcn.is.local(i8* nocapture) #2
				declare i1 @llvm.amdgcn.is.private(i8* nocapture) #2

	; HSA: define amdgpu_kernel void @use_tgid_x(i32 addrspace(1)* %ptr) #1 {
	define amdgpu_kernel void @use_tgid_x(i32 addrspace(1)* %ptr) #1 {
	%val = call i32 @llvm.amdgcn.workgroup.id.x()
	store i32 %val, i32 addrspace(1)* %ptr
	ret void
	}

	; HSA: define amdgpu_kernel void @use_tgid_y(i32 addrspace(1)* %ptr) #2 {
	(203 unchanged lines not shown)

	; HSA: define amdgpu_kernel void @use_flat_to_constant_addrspacecast(i32* %ptr) #1 {
	define amdgpu_kernel void @use_flat_to_constant_addrspacecast(i32* %ptr) #1 {
	%ftos = addrspacecast i32* %ptr to i32 addrspace(4)*
	%ld = load volatile i32, i32 addrspace(4)* %ftos
	ret void
	}

				; HSA: define amdgpu_kernel void @use_is_local(i8* %ptr) #11 {
				define amdgpu_kernel void @use_is_local(i8* %ptr) #1 {
				%is.local = call i1 @llvm.amdgcn.is.local(i8* %ptr)
				%ext = zext i1 %is.local to i32
				store i32 %ext, i32 addrspace(1)* undef
				ret void
				}

				; HSA: define amdgpu_kernel void @use_is_private(i8* %ptr) #11 {
				define amdgpu_kernel void @use_is_private(i8* %ptr) #1 {
				%is.private = call i1 @llvm.amdgcn.is.private(i8* %ptr)
				%ext = zext i1 %is.private to i32
				store i32 %ext, i32 addrspace(1)* undef
				ret void
				}

	attributes #0 = { nounwind readnone speculatable }
	attributes #1 = { nounwind }

	; HSA: attributes #0 = { nounwind readnone speculatable }
	; HSA: attributes #1 = { nounwind }
	; HSA: attributes #2 = { nounwind "amdgpu-work-group-id-y" }
	; HSA: attributes #3 = { nounwind "amdgpu-work-group-id-z" }
	; HSA: attributes #4 = { nounwind "amdgpu-work-group-id-y" "amdgpu-work-group-id-z" }
	; HSA: attributes #5 = { nounwind "amdgpu-work-item-id-y" }
	; HSA: attributes #6 = { nounwind "amdgpu-work-item-id-z" }
	; HSA: attributes #7 = { nounwind "amdgpu-work-group-id-y" "amdgpu-work-item-id-y" }
	; HSA: attributes #8 = { nounwind "amdgpu-work-item-id-y" "amdgpu-work-item-id-z" }
	; HSA: attributes #9 = { nounwind "amdgpu-work-group-id-y" "amdgpu-work-group-id-z" "amdgpu-work-item-id-y" "amdgpu-work-item-id-z" }
	; HSA: attributes #10 = { nounwind "amdgpu-dispatch-ptr" }
	; HSA: attributes #11 = { nounwind "amdgpu-queue-ptr" }
	; HSA: attributes #12 = { nounwind "amdgpu-kernarg-segment-ptr" }

test/CodeGen/AMDGPU/llvm.amdgcn.is.private.ll

This file was added.

				; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,CI %s
				; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX9 %s

				; GCN-LABEL: {{^}}is_private_vgpr:
				; GCN-DAG: {{flat|global}}_load_dwordx2 v{{\[[0-9]+}}:[[PTR_HI:[0-9]+]]{{\]}}
				; CI-DAG: s_load_dword [[APERTURE:s[0-9]+]], s[4:5], 0x11
				; GFX9-DAG: s_getreg_b32 [[APERTURE:s[0-9]+]], hwreg(HW_REG_SH_MEM_BASES, 0, 16)
				; GFX9: s_lshl_b32 [[APERTURE]], [[APERTURE]], 16
				; GCN: v_cmp_eq_u32_e32 vcc, [[APERTURE]], v[[PTR_HI]]
				; GCN: v_cndmask_b32_e64 v{{[0-9]+}}, 0, 1, vcc
				define amdgpu_kernel void @is_private_vgpr(i8* addrspace(1)* %ptr.ptr) {
				%id = call i32 @llvm.amdgcn.workitem.id.x()
				%gep = getelementptr inbounds i8*, i8* addrspace(1)* %ptr.ptr, i32 %id
				%ptr = load volatile i8*, i8* addrspace(1)* %gep
				%val = call i1 @llvm.amdgcn.is.private(i8* %ptr)
				%ext = zext i1 %val to i32
				store i32 %ext, i32 addrspace(1)* undef
				ret void
				}

				; FIXME: setcc (zero_extend (setcc)), 1) not folded out, resulting in
				; select and vcc branch.

				; GCN-LABEL: {{^}}is_private_sgpr:
				; CI-DAG: s_load_dword [[APERTURE:s[0-9]+]], s[4:5], 0x11{{$}}
				; GFX9-DAG: s_getreg_b32 [[APERTURE:s[0-9]+]], hwreg(HW_REG_SH_MEM_BASES, 0, 16)

				; CI-DAG: s_load_dword [[PTR_HI:s[0-9]+]], s[6:7], 0x1{{$}}
				; GFX9-DAG: s_load_dword [[PTR_HI:s[0-9]+]], s[6:7], 0x4{{$}}
				; GFX9: s_lshl_b32 [[APERTURE]], [[APERTURE]], 16

				; GCN: v_mov_b32_e32 [[V_APERTURE:v[0-9]+]], [[APERTURE]]
				; GCN: v_cmp_eq_u32_e32 vcc, [[PTR_HI]], [[V_APERTURE]]
				; GCN: s_cbranch_vccnz
				define amdgpu_kernel void @is_private_sgpr(i8* %ptr) {
				%val = call i1 @llvm.amdgcn.is.private(i8* %ptr)
				br i1 %val, label %bb0, label %bb1

				bb0:
				store volatile i32 0, i32 addrspace(1)* undef
				br label %bb1

				bb1:
				ret void
				}

				declare i32 @llvm.amdgcn.workitem.id.x() #0
				declare i1 @llvm.amdgcn.is.private(i8* nocapture) #0

				attributes #0 = { nounwind readnone speculatable }

test/CodeGen/AMDGPU/llvm.amdgcn.is.shared.ll

This file was added.

				; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,CI %s
				; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX9 %s

				; GCN-LABEL: {{^}}is_local_vgpr:
				; GCN-DAG: {{flat|global}}_load_dwordx2 v{{\[[0-9]+}}:[[PTR_HI:[0-9]+]]{{\]}}
				; CI-DAG: s_load_dword [[APERTURE:s[0-9]+]], s[4:5], 0x10
				; GFX9-DAG: s_getreg_b32 [[APERTURE:s[0-9]+]], hwreg(HW_REG_SH_MEM_BASES, 16, 16)
				; GFX9: s_lshl_b32 [[APERTURE]], [[APERTURE]], 16

				; GCN: v_cmp_eq_u32_e32 vcc, [[APERTURE]], v[[PTR_HI]]
				; GCN: v_cndmask_b32_e64 v{{[0-9]+}}, 0, 1, vcc
				define amdgpu_kernel void @is_local_vgpr(i8* addrspace(1)* %ptr.ptr) {
				%id = call i32 @llvm.amdgcn.workitem.id.x()
				%gep = getelementptr inbounds i8*, i8* addrspace(1)* %ptr.ptr, i32 %id
				%ptr = load volatile i8*, i8* addrspace(1)* %gep
				%val = call i1 @llvm.amdgcn.is.shared(i8* %ptr)
				%ext = zext i1 %val to i32
				store i32 %ext, i32 addrspace(1)* undef
				ret void
				}

				; FIXME: setcc (zero_extend (setcc)), 1) not folded out, resulting in
				; select and vcc branch.

				; GCN-LABEL: {{^}}is_local_sgpr:
				; CI-DAG: s_load_dword [[APERTURE:s[0-9]+]], s[4:5], 0x10{{$}}
				; GFX9-DAG: s_getreg_b32 [[APERTURE:s[0-9]+]], hwreg(HW_REG_SH_MEM_BASES, 16, 16)
				; GFX9-DAG: s_lshl_b32 [[APERTURE]], [[APERTURE]], 16

				; CI-DAG: s_load_dword [[PTR_HI:s[0-9]+]], s[6:7], 0x1{{$}}
				; GFX9-DAG: s_load_dword [[PTR_HI:s[0-9]+]], s[6:7], 0x4{{$}}

				; GCN: v_mov_b32_e32 [[V_APERTURE:v[0-9]+]], [[APERTURE]]
				; GCN: v_cmp_eq_u32_e32 vcc, [[PTR_HI]], [[V_APERTURE]]
				; GCN: s_cbranch_vccnz
				define amdgpu_kernel void @is_local_sgpr(i8* %ptr) {
				%val = call i1 @llvm.amdgcn.is.shared(i8* %ptr)
				br i1 %val, label %bb0, label %bb1

				bb0:
				store volatile i32 0, i32 addrspace(1)* undef
				br label %bb1

				bb1:
				ret void
				}

				declare i32 @llvm.amdgcn.workitem.id.x() #0
				declare i1 @llvm.amdgcn.is.shared(i8* nocapture) #0

				attributes #0 = { nounwind readnone speculatable }
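
The GFX9 check lines in the two tests above follow one pattern: read a 16-bit field of SH_MEM_BASES with s_getreg_b32, shift it left by 16, and compare it with the high dword of the flat pointer. The following is a minimal C++ model of that arithmetic, not the backend's lowering code. The function names and the sample register value are hypothetical; only the field offsets (16 for shared, 0 for private), the field width, and the shift come from the checks. On CI the checks load the aperture with s_load_dword instead, but the final comparison has the same shape.

```cpp
#include <cstdint>

// Models s_getreg_b32 hwreg(HW_REG_SH_MEM_BASES, fieldOffset, 16) followed by
// s_lshl_b32 ..., 16: extract a 16-bit field, then move it to the top half.
constexpr uint32_t apertureHiFromShMemBases(uint32_t shMemBases,
                                            unsigned fieldOffset) {
  uint32_t field = (shMemBases >> fieldOffset) & 0xFFFFu;
  return field << 16;
}

// Models the s_cmp_eq_u32 / v_cmp_eq_u32 step: the flat pointer is treated as
// pointing into the segment when its high 32 bits equal the aperture value.
constexpr bool inAperture(uint64_t flatPtr, uint32_t apertureHi) {
  return static_cast<uint32_t>(flatPtr >> 32) == apertureHi;
}

// Hypothetical register contents, chosen only to make the fields visible.
constexpr uint32_t kSampleShMemBases = 0x12345678u;
static_assert(apertureHiFromShMemBases(kSampleShMemBases, 16) == 0x12340000u,
              "shared aperture uses the field at bit offset 16");
static_assert(apertureHiFromShMemBases(kSampleShMemBases, 0) == 0x56780000u,
              "private aperture uses the field at bit offset 0");
```

With the sample value, a flat pointer such as 0x1234000000000010 would be classified as shared and not as private, because only its high dword takes part in the comparison.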

test/Transforms/InferAddressSpaces/AMDGPU/address-space-id-funcs.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -infer-address-spaces -instsimplify %s | FileCheck %s

				define amdgpu_kernel void @is_local_true(i8 addrspace(3)* %lptr) {
				; CHECK-LABEL: @is_local_true(
				; CHECK-NEXT: store i32 1, i32 addrspace(1)* undef
				; CHECK-NEXT: ret void
				;
				%cast = addrspacecast i8 addrspace(3)* %lptr to i8*
				%is.local = call i1 @llvm.amdgcn.is.local(i8* %cast)
				%ext = zext i1 %is.local to i32
				store i32 %ext, i32 addrspace(1)* undef
				ret void
				}

				define amdgpu_kernel void @is_local_false(i8 addrspace(1)* %gptr) {
				; CHECK-LABEL: @is_local_false(
				; CHECK-NEXT: store i32 0, i32 addrspace(1)* undef
				; CHECK-NEXT: ret void
				;
				%cast = addrspacecast i8 addrspace(1)* %gptr to i8*
				%is.local = call i1 @llvm.amdgcn.is.local(i8* %cast)
				%ext = zext i1 %is.local to i32
				store i32 %ext, i32 addrspace(1)* undef
				ret void
				}

				define void @is_private_true(i8 addrspace(5)* %lptr) {
				; CHECK-LABEL: @is_private_true(
				; CHECK-NEXT: store i32 1, i32 addrspace(1)* undef
				; CHECK-NEXT: ret void
				;
				%cast = addrspacecast i8 addrspace(5)* %lptr to i8*
				%is.private = call i1 @llvm.amdgcn.is.private(i8* %cast)
				%ext = zext i1 %is.private to i32
				store i32 %ext, i32 addrspace(1)* undef
				ret void
				}

				define void @is_private_false(i8 addrspace(1)* %gptr) {
				; CHECK-LABEL: @is_private_false(
				; CHECK-NEXT: store i32 0, i32 addrspace(1)* undef
				; CHECK-NEXT: ret void
				;
				%cast = addrspacecast i8 addrspace(1)* %gptr to i8*
				%is.private = call i1 @llvm.amdgcn.is.private(i8* %cast)
				%ext = zext i1 %is.private to i32
				store i32 %ext, i32 addrspace(1)* undef
				ret void
				}

				declare i1 @llvm.amdgcn.is.local(i8* nocapture) #0
				declare i1 @llvm.amdgcn.is.private(i8* nocapture) #0

				attributes #0 = { nounwind readnone speculatable willreturn }
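
The four functions in this test expect the query to fold to a constant once the pointer's original address space is known: a cast from addrspace(3) makes the local query true, a cast from addrspace(5) makes the private query true, and a cast from addrspace(1) makes either query false. A rough C++ sketch of that decision follows. It is an illustration of the expected results, not the pass's implementation; `foldIsSegment` and the enum are hypothetical names, and returning no value for a flat source is an assumption about the case the test does not cover.

```cpp
#include <optional>

// AMDGPU address space numbers as they appear in the IR above.
enum AddrSpace : unsigned { Flat = 0, Global = 1, Local = 3, Private = 5 };

// Returns the constant the query would fold to when the source address space
// of the addrspacecast is known. A flat source carries no information, so the
// query is left unresolved (assumption: it would then be answered at runtime).
constexpr std::optional<bool> foldIsSegment(unsigned srcAS, unsigned queriedAS) {
  if (srcAS == Flat)
    return std::nullopt;
  return srcAS == queriedAS;
}
```

For example, `foldIsSegment(Local, Local)` corresponds to @is_local_true and `foldIsSegment(Global, Private)` to @is_private_false.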

AMDGPU: Add intrinsics for address space identification
Closed, Public

Revision Contents

Diff 215167

include/llvm/IR/IntrinsicsAMDGPU.td

lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp

lib/Target/AMDGPU/AMDGPULegalizerInfo.h

lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp

lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp

lib/Target/AMDGPU/SIISelLowering.cpp

test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.is.private.ll

test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.is.shared.ll

test/CodeGen/AMDGPU/annotate-kernel-features-hsa.ll

test/CodeGen/AMDGPU/llvm.amdgcn.is.private.ll

test/CodeGen/AMDGPU/llvm.amdgcn.is.shared.ll

test/Transforms/InferAddressSpaces/AMDGPU/address-space-id-funcs.ll
