In D20297#431268, @arsenm wrote:

We don't actually want these. We now have the kernarg.segment.ptr intrinsic, so the library should just read the offsets directly from there.

I assume it points to the beginning of the kernel args.
Is the idea to have another intrinsic to read from the end (workdim, global offset), or is it OK to have intrinsics for those?

In D20297#431298, @jvesely wrote:

I assume it points to the beginning of the kernel args.
Is the idea to have another intrinsic to read from the end (workdim, global offset), or is it OK to have intrinsics for those?

We should fix clover to not read from the end of the arguments (I thought this is what it already did)?

In D20297#431315, @arsenm wrote:

We should fix clover to not read from the end of the arguments (I thought this is what it already did)?

Some are placed before the kernel args (global size, local size, ngroups; this is done by the radeonsi/r600 Mesa drivers). Others are placed after the kernel arguments (work_dim, global offset; this is done by clover).
The intention was to move all implicit arguments after the explicit ones, so that new implicit arguments can be added without breaking the ABI (i.e. without moving the explicit arguments).
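A minimal C++ model of the mixed layout described above may help. It assumes (hypothetically) that the driver prepends nine 32-bit values (36 bytes), that each explicit argument is aligned to its own size measured from the start of the explicit area, and that the appended implicit area starts on a 4-byte boundary. `KernargLayout` and `layoutKernargs` are illustrative names, not Mesa or LLVM APIs; the alignment rules are simplifications.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of the kernarg segment discussed above:
// driver prefix + explicit args + clover's appended implicit args.
// These names are illustrative only; they are not Mesa or LLVM APIs.
struct KernargLayout {
  uint32_t ExplicitBegin;                 // first byte of the explicit args
  std::vector<uint32_t> ExplicitOffsets;  // absolute byte offset of each arg
  uint32_t ImplicitBegin;                 // first byte of the appended area
};

// Assumption: the driver prepends nine 32-bit values (ngroups, global size
// and local size, x/y/z each), i.e. 36 bytes.
constexpr uint32_t kPrependedBytes = 9 * 4;

// Round Value up to a multiple of Align (Align must be non-zero).
constexpr uint32_t alignTo(uint32_t Value, uint32_t Align) {
  return (Value + Align - 1) / Align * Align;
}

// ArgSizes holds the size in bytes of each explicit argument (non-zero).
// Assumption: each argument is aligned to its own size, measured from the
// start of the explicit area rather than from the start of the segment.
KernargLayout layoutKernargs(const std::vector<uint32_t> &ArgSizes) {
  KernargLayout L{kPrependedBytes, {}, 0};
  uint32_t Rel = 0;  // running offset relative to ExplicitBegin
  for (uint32_t Size : ArgSizes) {
    Rel = alignTo(Rel, Size);
    L.ExplicitOffsets.push_back(L.ExplicitBegin + Rel);
    Rel += Size;
  }
  // work_dim, global offset, ... are appended after the last explicit
  // argument; assume the appended area starts on a 4-byte boundary.
  L.ImplicitBegin = L.ExplicitBegin + alignTo(Rel, 4);
  return L;
}
```

For a kernel taking one 8-byte pointer, this model puts the argument at byte 36 and the implicit area at byte 44. That is 9 + 2 dwords, which agrees with the test added in this revision. The start of the implicit area moves whenever the explicit arguments change, which is why it has to be supplied by the compiler.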

In D20297#431388, @jvesely wrote:

Some are placed before the kernel args (global size, local size, ngroups; this is done by the radeonsi/r600 Mesa drivers). Others are placed after the kernel arguments (work_dim, global offset; this is done by clover).
The intention was to move all implicit arguments after the explicit ones, so that new implicit arguments can be added without breaking the ABI (i.e. without moving the explicit arguments).

I think clover should move towards matching the HSA ABI closer. Most of the implicit arguments would then be user SGPR inputs like HSA uses, and the number of implicit args would be reduced.

In D20297#431404, @arsenm wrote:

I think clover should move towards matching the HSA ABI closer. Most of the implicit arguments would then be user SGPR inputs like HSA uses, and the number of implicit args would be reduced.

New implicit arguments were appended so that newer Mesa can work with older LLVM (without #ifdef hell); any significant ABI change would break that (otherwise everything would be in one place and we would avoid this problem).
I'm not sure how much of a problem this is, or what other (possible) users of clover might want. Anyway, it's beyond the scope of what I'm trying to do, which is make get_global_offset() work. Would you be OK if I restricted the changes to r600?

Drop new amdgcn intrinsics; make the kernarg segment Mesa-compatible instead.

jvesely updated this object. May 20 2016, 10:01 AM

jvesely removed a child revision: D20298: AMDGPU/R600: Add get_global_offset_{x,y,z} intrinsic.

In D20297#431404, @arsenm wrote:

I think clover should move towards matching the HSA ABI closer. Most of the implicit arguments would then be user SGPR inputs like HSA uses, and the number of implicit args would be reduced.

I agree. I actually started working on this last week. I think all implicit args that aren't passed in SGPRs should be at the end of the kernarg segment.

lib/Target/AMDGPU/SIISelLowering.cpp, lines 1554–1559:
There should be a separate intrinsic which points to the start of the implicit args. As the ABI currently stands, the library could use kernarg_segment_ptr to load the work-group size information.

In D20297#435687, @tstellarAMD wrote:

I think clover should move towards matching the HSA ABI closer. Most of the implicit arguments would then be user SGPR inputs like HSA uses, and the number of implicit args would be reduced.

This is a bit confusing. If the implicit args should be in SGPRs, wouldn't we need one intrinsic per implicit arg? And how would the number of implicit args be reduced? The only redundant one is global size (libclc computes it as num_groups * local_size).
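The redundancy mentioned here can be shown with a small C++ sketch. When the per-dimension number of work-groups and the work-group size are both available, the global size follows by multiplication. This assumes uniform work-groups, which OpenCL 1.x requires because the global size must be divisible by the local size. `globalSize` is a hypothetical helper that models the computation described for libclc; it is not libclc's source.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: reconstruct the global size in each dimension from
// the per-dimension number of work-groups and the work-group (local) size.
// Assumes uniform work-groups, so the product is exact.
std::array<uint64_t, 3> globalSize(const std::array<uint32_t, 3> &NumGroups,
                                   const std::array<uint32_t, 3> &LocalSize) {
  std::array<uint64_t, 3> Global{};
  for (std::size_t Dim = 0; Dim < 3; ++Dim)
    Global[Dim] = uint64_t(NumGroups[Dim]) * LocalSize[Dim];
  return Global;
}
```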

I agree. I actually started working on this last week. I think all implicit args that aren't passed in SGPRs should be at the end of the kernarg segment.

My idea was to switch work_dim and the newly implemented global_offset, as those are already appended (the rest can be switched by patches to libclc and clover). However, doesn't this contradict Matt's suggestion to pass implicit arguments in SGPRs?

I think clover should move towards matching the HSA ABI closer. Most of the implicit arguments would then be user SGPR inputs like HSA uses, and the number of implicit args would be reduced.

This is a bit confusing. If the implicit args should be in SGPRs, wouldn't we need one intrinsic per implicit arg? And how would the number of implicit args be reduced? The only redundant one is global size (libclc computes it as num_groups * local_size).

I agree. I actually started working on this last week. I think all implicit args that aren't passed in SGPRs should be at the end of the kernarg segment.

My idea was to switch work_dim and the newly implemented global_offset, as those are already appended (the rest can be switched by patches to libclc and clover). However, doesn't this contradict Matt's suggestion to pass implicit arguments in SGPRs?

SGPR space is limited, so we won't be able to pass all implicit arguments this way; some will need to be added to the kernarg buffer. The values that should be passed in SGPRs are the ones that tend to be common across all runtimes, like work-group/work-item size, scratch buffer pointers, etc.

tstellarAMD added a comment. May 20 2016, 7:04 PM

This comment was removed by tstellarAMD.

We could do what we are planning for OpenCL and add a struct pointer for future expansion at the end of some decided N bytes to reserve.

If 256 bytes were reserved at the beginning, that would be more than enough for any implicit needs and would keep the user arg base alignment the same (not sure that really matters).

In D20297#436251, @tstellarAMD wrote:

SGPR space is limited, so we won't be able to pass all implicit arguments this way; some will need to be added to the kernarg buffer. The values that should be passed in SGPRs are the ones that tend to be common across all runtimes, like work-group/work-item size, scratch buffer pointers, etc.

OK, so to be specific about the information currently passed:
workdim, wg_size and num_group should eventually be passed in SGPRs and therefore should have their own intrinsics, correct?
Should those values also be duplicated in the kernarg segment?

The rest (global_size, global_offset) are loaded via the kernarg segment pointer.

Do you still want two pointers (beginning of kernarg and beginning of implicit args) for Mesa, or is it OK to have one if all the information is present at the appended location?

In D20297#436254, @arsenm wrote:

We could do what we are planning for OpenCL and add a struct pointer for future expansion at the end of some decided N bytes to reserve.

If 256 bytes were reserved at the beginning, that would be more than enough for any implicit needs and would keep the user arg base alignment the same (not sure that really matters).

Reserving 256 bytes at the beginning would waste half a KC block on r600 (and might run into limitations on other hardware), so clover is probably best off appending the information to minimize space (this should work until there are kernels with variable parameters). The radeonsi driver is free to change this for GCN hardware.
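To make the space trade-off concrete, here is a hypothetical C++ comparison of the two strategies discussed: reserving a fixed prefix for implicit data, or appending only the implicit values that are actually needed. The byte counts in the usage below are illustrative assumptions. The 16 appended bytes stand for a 32-bit work_dim plus three 32-bit global offsets. The KC-block cost on r600 is not modeled.

```cpp
#include <cstdint>

// Total kernarg bytes if a fixed-size area is reserved for implicit
// arguments before the explicit ones (the "reserve N bytes" proposal).
constexpr uint32_t sizeWithReservedPrefix(uint32_t ReservedBytes,
                                          uint32_t ExplicitBytes) {
  return ReservedBytes + ExplicitBytes;
}

// Total kernarg bytes if only the implicit values actually needed are
// appended after the explicit arguments (the clover approach). The
// appended area is assumed to start at a 4-byte boundary.
constexpr uint32_t sizeWithAppendedImplicit(uint32_t ExplicitBytes,
                                            uint32_t ImplicitBytes) {
  return (ExplicitBytes + 3) / 4 * 4 + ImplicitBytes;
}
```

With a single 8-byte pointer argument, the reserved-prefix layout needs 256 + 8 = 264 bytes. Under the assumption above, appending needs 8 + 16 = 24 bytes.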

In D20297#436378, @jvesely wrote:

OK, so to be specific about the information currently passed:
workdim, wg_size and num_group should eventually be passed in SGPRs and therefore should have their own intrinsics, correct?

If you look at enum PreloadedValue in SIRegisterInfo.h (this may have moved to SIMachineFunctionInfo.h by now), those are the values preloaded into SGPRs by the HSA runtime. If we are going to be changing the ABI for radeonsi clover, I would really like it to match what we use for HSA. For r600, it doesn't really matter to me what the ABI ends up being.

Should those values also be duplicated in the kernarg segment?

The rest (global_size, global_offset) are loaded via the kernarg segment pointer.

Do you still want two pointers (beginning of kernarg and beginning of implicit args) for Mesa, or is it OK to have one if all the information is present at the appended location?

I think there need to be both. The intrinsic names should correctly describe what they do, so I don't want kernarg.segment.ptr to mean different things on different architectures.

Create new intrinsic for implicit args

Why not the offset from the base pointer intrinsic rather than an intrinsic to the offset?

In D20297#441156, @arsenm wrote:

Why not the offset from the base pointer intrinsic rather than an intrinsic to the offset?

It doesn't really matter, but a pointer seemed more flexible. If you decide to move the implicit args elsewhere (somewhere that is not at an offset from the kernarg segment), you only need to reimplement the intrinsic, without updating the library.

In D20297#441156, @arsenm wrote:

Why not the offset from the base pointer intrinsic rather than an intrinsic to the offset?

Because if the implicit args are stored after the explicit kernel args, then the offset from kernarg.base.ptr to the start of the implicit args is not known until compile time.

This patch LGTM, but I think we should drop 'segment' from the intrinsic name. In HSA, 'segment' means address space, and we don't have a separate address space for implicit args.

drop segment from the intrinsic name

In D20297#442193, @tstellarAMD wrote:

Because if the implicit args are stored after the explicit kernel args, then the offset from kernarg.base.ptr to the start of the implicit args is not known until compile time.

Why is this an issue? This won't be known anyway.

In D20297#444575, @arsenm wrote:

Why is this an issue? This won't be known anyway

Just to be sure I understand correctly: your suggestion is to have "implicitarg.offset", so that libclc (or any other user) then uses "_builtin_kernarg_segment_ptr() + _builtin_implict_arg_offset()" to read the implicit arguments?
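The two designs under discussion can be modeled in plain C++. Here `kernargSegmentPtr()`, `implicitArgOffset()` and `implicitArgPtr()` are hypothetical stand-ins for the intrinsics. Only the pointer form was committed, as llvm.amdgcn.implicitarg.ptr. The backing buffer and the 44-byte offset are illustrative assumptions.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins for the intrinsics discussed in this thread; only
// the pointer form (llvm.amdgcn.implicitarg.ptr) was actually committed.

// Fake kernarg segment backing storage.
alignas(8) static uint8_t gKernarg[256];

// Byte offset of the first implicit argument. The compiler knows this from
// the kernel signature; 44 (36 prepended bytes + one 8-byte pointer) is an
// illustrative value.
static const uint32_t gImplicitOffset = 44;

// Models kernarg.segment.ptr: base of the kernarg segment.
const uint8_t *kernargSegmentPtr() { return gKernarg; }

// Offset-style design: the library adds this to kernargSegmentPtr() itself.
uint32_t implicitArgOffset() { return gImplicitOffset; }

// Pointer-style design: the compiler hands back the final address.
const uint8_t *implicitArgPtr() { return kernargSegmentPtr() + gImplicitOffset; }

// Library-side read of a 32-bit implicit value Off bytes past the start of
// the implicit args, written against the pointer-style design only.
uint32_t readImplicitU32(uint32_t Off) {
  uint32_t V;
  std::memcpy(&V, implicitArgPtr() + Off, sizeof(V));
  return V;
}
```

Both designs yield the same address. With the pointer form, relocating the implicit area (even out of the kernarg segment) only changes `implicitArgPtr()`, and callers such as `readImplicitU32` stay untouched. This is the flexibility argument made earlier in the thread.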

rebase after r272512

In D20297#449024, @jvesely wrote:

Just to be sure I understand correctly: your suggestion is to have "implicitarg.offset", so that libclc (or any other user) then uses "_builtin_kernarg_segment_ptr() + _builtin_implict_arg_offset()" to read the implicit arguments?

I have no preference either way on this. Since this patch has been outstanding for a while, I say just commit it. We can always add _builtin_implict_arg_offset() later if we decide it is a better solution.

This revision is now accepted and ready to land. Jun 17 2016, 8:20 PM

fix build without D20298 (GRID_OFFSET is still needed)

Closed by commit rL273317: AMDGPU: Add implicitarg.ptr intrinsic. (authored by jvesely). Jun 21 2016, 1:53 PM

This revision was automatically updated to reflect the committed changes.

jvesely mentioned this in D21622: AMDGPU/R600: Add implicitarg.ptr intrinsic. Jun 22 2016, 1:59 PM

Diff 61438

include/llvm/IR/IntrinsicsAMDGPU.td

@@ 328 unchanged lines not shown @@
 def int_amdgcn_queue_ptr :
   GCCBuiltin<"__builtin_amdgcn_queue_ptr">,
   Intrinsic<[LLVMQualPointerType<llvm_i8_ty, 2>], [], [IntrNoMem]>;
 
 def int_amdgcn_kernarg_segment_ptr :
   GCCBuiltin<"__builtin_amdgcn_kernarg_segment_ptr">,
   Intrinsic<[LLVMQualPointerType<llvm_i8_ty, 2>], [], [IntrNoMem]>;
 
+def int_amdgcn_implicitarg_ptr :
+  GCCBuiltin<"__builtin_amdgcn_implicitarg_ptr">,
+  Intrinsic<[LLVMQualPointerType<llvm_i8_ty, 2>], [], [IntrNoMem]>;
+
 // __builtin_amdgcn_interp_p1 <i>, <attr_chan>, <attr>, <m0>
 def int_amdgcn_interp_p1 :
   GCCBuiltin<"__builtin_amdgcn_interp_p1">,
   Intrinsic<[llvm_float_ty],
             [llvm_i32_ty, llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
             [IntrNoMem]>; // This intrinsic reads from lds, but the memory
                           // values are constant, so it behaves like IntrNoMem.
@@ 66 unchanged lines not shown @@

lib/Target/AMDGPU/AMDGPUISelLowering.h

@@ 195 unchanged lines not shown @@ public:
   /// MachineFunction.
   ///
   /// \returns a RegisterSDNode representing Reg.
   virtual SDValue CreateLiveInRegister(SelectionDAG &DAG,
                                        const TargetRegisterClass *RC,
                                        unsigned Reg, EVT VT) const;
 
   enum ImplicitParameter {
-    GRID_DIM,
-    GRID_OFFSET
+    FIRST_IMPLICIT,
+    GRID_DIM = FIRST_IMPLICIT,
+    GRID_OFFSET,
   };
 
   /// \brief Helper function that returns the byte offset of the given
   /// type of implicit parameter.
   uint32_t getImplicitParameterOffset(const AMDGPUMachineFunction *MFI,
                                       const ImplicitParameter Param) const;
 };
@@ 103 unchanged lines not shown @@
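The FIRST_IMPLICIT alias lets a caller ask for the start of the implicit area without naming a particular parameter. The body of getImplicitParameterOffset is not part of this diff, so the self-contained sketch below rests on assumptions. It assumes the implicit area begins where the explicit arguments end, recorded here in a hypothetical `ABIArgOffset` field. It also assumes GRID_DIM is a 4-byte value immediately followed by GRID_OFFSET. `MachineFunctionInfoModel` and `getImplicitParameterOffsetModel` are illustrative names, not LLVM code.

```cpp
#include <cstdint>
#include <stdexcept>

// Simplified stand-in for AMDGPUMachineFunction (hypothetical field).
struct MachineFunctionInfoModel {
  uint32_t ABIArgOffset;  // assumed: byte offset just past the explicit args
};

enum ImplicitParameter {
  FIRST_IMPLICIT,
  GRID_DIM = FIRST_IMPLICIT,  // alias: GRID_DIM occupies the first slot
  GRID_OFFSET,
};

uint32_t getImplicitParameterOffsetModel(const MachineFunctionInfoModel &MFI,
                                         ImplicitParameter Param) {
  switch (Param) {
  case GRID_DIM:  // same value as FIRST_IMPLICIT, so no separate case label
    return MFI.ABIArgOffset;
  case GRID_OFFSET:
    return MFI.ABIArgOffset + 4;  // assumes a 4-byte work_dim comes first
  }
  throw std::invalid_argument("unexpected implicit parameter");
}
```

Because GRID_DIM and FIRST_IMPLICIT share a value, a switch can list only one of them as a case label. Adding a new leading implicit parameter later would mean moving the alias, not changing callers that use FIRST_IMPLICIT.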

lib/Target/AMDGPU/SIISelLowering.h

@@ 15 unchanged lines not shown @@
 #define LLVM_LIB_TARGET_AMDGPU_SIISELLOWERING_H
 
 #include "AMDGPUISelLowering.h"
 #include "SIInstrInfo.h"
 
 namespace llvm {
 
 class SITargetLowering final : public AMDGPUTargetLowering {
-  SDValue LowerParameter(SelectionDAG &DAG, EVT VT, EVT MemVT, const SDLoc &DL,
+  SDValue LowerParameterPtr(SelectionDAG &DAG, const SDLoc &SL, SDValue Chain,
+                            unsigned Offset) const;
+  SDValue LowerParameter(SelectionDAG &DAG, EVT VT, EVT MemVT, const SDLoc &SL,
                          SDValue Chain, unsigned Offset, bool Signed) const;
   SDValue LowerGlobalAddress(AMDGPUMachineFunction *MFI, SDValue Op,
                              SelectionDAG &DAG) const override;
   SDValue lowerImplicitZextParam(SelectionDAG &DAG, SDValue Op,
                                  MVT VT, unsigned Offset) const;
 
   SDValue LowerINTRINSIC_WO_CHAIN(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerINTRINSIC_W_CHAIN(SDValue Op, SelectionDAG &DAG) const;
@@ 118 unchanged lines not shown @@

lib/Target/AMDGPU/SIISelLowering.cpp

@@ 528 unchanged lines not shown @@ bool SITargetLowering::isTypeDesirableForOp(unsigned Op, EVT VT) const {
   // SimplifySetCC uses this function to determine whether or not it should
   // create setcc with i1 operands. We don't have instructions for i1 setcc.
   if (VT == MVT::i1 && Op == ISD::SETCC)
     return false;
 
   return TargetLowering::isTypeDesirableForOp(Op, VT);
 }
 
-SDValue SITargetLowering::LowerParameter(SelectionDAG &DAG, EVT VT, EVT MemVT,
+SDValue SITargetLowering::LowerParameterPtr(SelectionDAG &DAG,
                                          const SDLoc &SL, SDValue Chain,
-                                         unsigned Offset, bool Signed) const {
+                                         unsigned Offset) const {
   const DataLayout &DL = DAG.getDataLayout();
   MachineFunction &MF = DAG.getMachineFunction();
   const SIRegisterInfo *TRI =
       static_cast<const SIRegisterInfo*>(Subtarget->getRegisterInfo());
   unsigned InputPtrReg = TRI->getPreloadedValue(MF, SIRegisterInfo::KERNARG_SEGMENT_PTR);
 
-  Type *Ty = VT.getTypeForEVT(DAG.getContext());
-
   MachineRegisterInfo &MRI = DAG.getMachineFunction().getRegInfo();
   MVT PtrVT = getPointerTy(DL, AMDGPUAS::CONSTANT_ADDRESS);
-  PointerType *PtrTy = PointerType::get(Ty, AMDGPUAS::CONSTANT_ADDRESS);
   SDValue BasePtr = DAG.getCopyFromReg(Chain, SL,
                                        MRI.getLiveInVirtReg(InputPtrReg), PtrVT);
-  SDValue Ptr = DAG.getNode(ISD::ADD, SL, PtrVT, BasePtr,
+  return DAG.getNode(ISD::ADD, SL, PtrVT, BasePtr,
                             DAG.getConstant(Offset, SL, PtrVT));
+}
+SDValue SITargetLowering::LowerParameter(SelectionDAG &DAG, EVT VT, EVT MemVT,
+                                         const SDLoc &SL, SDValue Chain,
+                                         unsigned Offset, bool Signed) const {
+  const DataLayout &DL = DAG.getDataLayout();
+  Type *Ty = VT.getTypeForEVT(DAG.getContext());
+  MVT PtrVT = getPointerTy(DL, AMDGPUAS::CONSTANT_ADDRESS);
+  PointerType *PtrTy = PointerType::get(Ty, AMDGPUAS::CONSTANT_ADDRESS);
   SDValue PtrOffset = DAG.getUNDEF(PtrVT);
   MachinePointerInfo PtrInfo(UndefValue::get(PtrTy));
 
   unsigned Align = DL.getABITypeAlignment(Ty);
 
   ISD::LoadExtType ExtTy = Signed ? ISD::SEXTLOAD : ISD::ZEXTLOAD;
   if (MemVT.isFloatingPoint())
     ExtTy = ISD::EXTLOAD;
 
+  SDValue Ptr = LowerParameterPtr(DAG, SL, Chain, Offset);
   return DAG.getLoad(ISD::UNINDEXED, ExtTy,
                      VT, SL, Chain, Ptr, PtrOffset, PtrInfo, MemVT,
                      false, // isVolatile
                      true, // isNonTemporal
                      true, // isInvariant
                      Align); // Alignment
 }
@@ 963 unchanged lines not shown @@ if (!Subtarget->isAmdHsaOS()) {
       return DAG.getUNDEF(VT);
     }
 
     auto Reg = IntrinsicID == Intrinsic::amdgcn_dispatch_ptr ?
       SIRegisterInfo::DISPATCH_PTR : SIRegisterInfo::QUEUE_PTR;
     return CreateLiveInRegister(DAG, &AMDGPU::SReg_64RegClass,
                                 TRI->getPreloadedValue(MF, Reg), VT);
   }
+  case Intrinsic::amdgcn_implicitarg_ptr: {
+    unsigned offset = getImplicitParameterOffset(MFI, FIRST_IMPLICIT);
+    return LowerParameterPtr(DAG, DL, DAG.getEntryNode(), offset);
+  }
   case Intrinsic::amdgcn_kernarg_segment_ptr: {
     unsigned Reg
       = TRI->getPreloadedValue(MF, SIRegisterInfo::KERNARG_SEGMENT_PTR);
     return CreateLiveInRegister(DAG, &AMDGPU::SReg_64RegClass, Reg, VT);
   }
   case Intrinsic::amdgcn_rcp:
     return DAG.getNode(AMDGPUISD::RCP, DL, VT, Op.getOperand(1));
   case Intrinsic::amdgcn_rsq:
   case AMDGPUIntrinsic::AMDGPU_rsq: // Legacy name
     return DAG.getNode(AMDGPUISD::RSQ, DL, VT, Op.getOperand(1));
   case Intrinsic::amdgcn_rsq_legacy: {
     if (Subtarget->getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS)
       return emitRemovedIntrinsicError(DAG, DL, VT);
 
     return DAG.getNode(AMDGPUISD::RSQ_LEGACY, DL, VT, Op.getOperand(1));
@@ 1,842 unchanged lines not shown @@
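The SIISelLowering.cpp changes separate computing a parameter's address from loading it. LowerParameterPtr produces the kernarg base register plus a constant offset, and LowerParameter does the load. The new amdgcn_implicitarg_ptr case can then reuse the address computation without a load. The sketch below mirrors that split with plain integers instead of SelectionDAG nodes. `KernargModel`, `lowerParameterPtr`, `lowerParameterU32` and `lowerImplicitArgPtr` are hypothetical models, not LLVM APIs.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Model of the kernarg segment: its base address plus its contents.
struct KernargModel {
  uint64_t Base;               // value of the preloaded KERNARG_SEGMENT_PTR
  std::vector<uint8_t> Bytes;  // segment contents starting at Base
};

// Analogue of LowerParameterPtr: base pointer + constant offset, no load.
uint64_t lowerParameterPtr(const KernargModel &K, uint32_t Offset) {
  return K.Base + Offset;
}

// Analogue of LowerParameter (32-bit values only): compute the address with
// lowerParameterPtr, then load from it. No bounds checking in this sketch.
uint32_t lowerParameterU32(const KernargModel &K, uint32_t Offset) {
  uint64_t Addr = lowerParameterPtr(K, Offset);
  uint32_t V = 0;
  std::memcpy(&V, K.Bytes.data() + (Addr - K.Base), sizeof(V));
  return V;
}

// Analogue of the amdgcn_implicitarg_ptr case: only the address of the
// first implicit parameter, reusing the same helper.
uint64_t lowerImplicitArgPtr(const KernargModel &K, uint32_t FirstImplicit) {
  return lowerParameterPtr(K, FirstImplicit);
}
```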

test/CodeGen/AMDGPU/llvm.amdgcn.kernarg.segment.ptr.ll

@@ 9 unchanged lines not shown @@ define void @test(i32 addrspace(1)* %out) #1 {
   %kernarg.segment.ptr = call noalias i8 addrspace(2)* @llvm.amdgcn.kernarg.segment.ptr()
   %header.ptr = bitcast i8 addrspace(2)* %kernarg.segment.ptr to i32 addrspace(2)*
   %gep = getelementptr i32, i32 addrspace(2)* %header.ptr, i64 10
   %value = load i32, i32 addrspace(2)* %gep
   store i32 %value, i32 addrspace(1)* %out
   ret void
 }
 
+; ALL-LABEL: {{^}}test_implicit:
+; 10 + 9 (36 prepended implicit bytes) + 2 (out pointer) = 21 = 0x15
+; MESA: s_load_dword s{{[0-9]+}}, s[0:1], 0x15
+define void @test_implicit(i32 addrspace(1)* %out) #1 {
+  %implicitarg.ptr = call noalias i8 addrspace(2)* @llvm.amdgcn.implicitarg.ptr()
+  %header.ptr = bitcast i8 addrspace(2)* %implicitarg.ptr to i32 addrspace(2)*
+  %gep = getelementptr i32, i32 addrspace(2)* %header.ptr, i64 10
+  %value = load i32, i32 addrspace(2)* %gep
+  store i32 %value, i32 addrspace(1)* %out
+  ret void
+}
+
 declare i8 addrspace(2)* @llvm.amdgcn.kernarg.segment.ptr() #0
+declare i8 addrspace(2)* @llvm.amdgcn.implicitarg.ptr() #0
 
 attributes #0 = { nounwind readnone }
 attributes #1 = { nounwind }
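The immediate in the MESA check line of test_implicit can be reproduced by hand. Counting in dwords, as the test's own comment does, the driver prepends 36 bytes (9 dwords). The single explicit argument %out is an 8-byte global pointer (2 dwords). The test then reads element 10 of an i32 array that starts at the implicit arguments. A compile-time C++ check of that sum:

```cpp
#include <cstdint>

constexpr uint32_t kDword = 4;
constexpr uint32_t kPrependedDwords = 36 / kDword;  // 9: driver-prepended values
constexpr uint32_t kOutPtrDwords = 8 / kDword;      // 2: i32 addrspace(1)* %out
constexpr uint32_t kGepIndex = 10;                  // getelementptr ..., i64 10

// Dword offset of the loaded value from the kernarg segment base, i.e. the
// s_load_dword immediate expected by the MESA check line.
constexpr uint32_t kLoadOffsetDwords =
    kPrependedDwords + kOutPtrDwords + kGepIndex;

static_assert(kLoadOffsetDwords == 21, "10 + 9 + 2");
static_assert(kLoadOffsetDwords == 0x15, "matches the MESA check line");
```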

This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/SI: Add implicitarg.ptr intrinsic.
Closed, Public
