This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Add 32-bit constant address space
ClosedPublic

Authored by mareko on Dec 31 2017, 8:09 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

mareko created this revision.Dec 31 2017, 8:09 PM

FYI, I'd like to get this into LLVM 6.0 if it's OK.

mareko updated this revision to Diff 128485.Jan 2 2018, 7:42 PM

Version 2: As discussed on IRC.

HSA is the only target that cares about unaligned loads. Unaligned loads are
broken on all other targets.

t-tye added inline comments.Jan 2 2018, 8:08 PM
lib/Target/AMDGPU/SIISelLowering.cpp
153 ↗(On Diff #128485)

Would it be better to add a query for whether unaligned accesses are supported? Currently it would only be true for AmdHsaOS, but it would allow this to be easily changed in the future.

arsenm added inline comments.Jan 2 2018, 8:30 PM
lib/Target/AMDGPU/SIISelLowering.cpp
153 ↗(On Diff #128485)

We already have that. The problem is working around a variety of issues derived from operation legality not being specified per address space.

mareko updated this revision to Diff 128561.Jan 3 2018, 1:47 PM

Version 3:

  • Much simpler.
  • Addrspacecast is never inserted.
  • Unaligned loads are unaffected (work fine).
  • Only scalar loads support 32-bit pointers; an address in a VGPR will fail to compile (see the sketch after this list).
  • D41715 (amdgpu.uniform on loads) is required to enforce scalar loads in some cases.
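
A minimal sketch of what that restriction means in IR (hypothetical function and names; typed-pointer syntax of this era, datalayout omitted): with the amdgpu.uniform annotation the address stays in SGPRs and the load selects to an s_load; without it, the address may end up in a VGPR and compilation fails instead of falling back to a VMEM load.

define amdgpu_ps float @load_const32(float addrspace(6)* inreg %ptr) {
  %gep = getelementptr float, float addrspace(6)* %ptr, i32 4
  %val = load float, float addrspace(6)* %gep, !amdgpu.uniform !0
  ret float %val
}

!0 = !{}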
mareko marked 2 inline comments as done.Jan 4 2018, 9:09 AM
arsenm requested changes to this revision.Jan 9 2018, 7:46 AM

This needs to update AMDGPUAliasAnalysis. It also needs more test coverage; I don't see this testing unaligned access or some of the other places it was added.

This part is problematic:
"Only scalar loads support 32-bit pointers. An address in a VGPR will fail to compile."
"D41715 (amdgpu.uniform on loads) is required for enforce scalar loads in some cases."

We can't rely on metadata to be able to compile.

lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1447 ↗(On Diff #128561)

I think this needs to be _XEXEC

This revision now requires changes to proceed.Jan 9 2018, 7:46 AM
mareko added a comment.Jan 9 2018, 4:38 PM

This needs to update AMDGPUAliasAnalysis. It also needs more test coverage; I don't see this testing unaligned access or some of the other places it was added.

I added CONSTANT_ADDRESS_32BIT to most places that used CONSTANT_ADDRESS. I don't know whether that's necessary or not.

This part is problematic:
"Only scalar loads support 32-bit pointers. An address in a VGPR will fail to compile."
"D41715 (amdgpu.uniform on loads) is required for enforce scalar loads in some cases."

We can't rely on metadata to be able to compile.

We need to rely on metadata. VMEM loads are simply unsupported because we don't need them with the 32-bit address space. Unaligned loads are also untested and unsupported because we don't need them either.

mareko added a comment.Jan 9 2018, 4:41 PM
This comment was removed by mareko.
mareko updated this revision to Diff 129282.Jan 10 2018, 8:50 AM

Only scalar loads support 32-bit pointers. An address in a VGPR will
fail to compile. That's OK because the results of loads will only be used
in places where VGPRs are forbidden.

Updated AMDGPUAliasAnalysis and used SReg_64_XEXEC.
The tests cover all use cases we need for Mesa.

I don't know whether handling CONSTANT_ADDRESS_32BIT everywhere
is necessary. Comments welcome.

mareko marked an inline comment as done.Jan 15 2018, 7:40 AM

This needs documentation in AMDGPUUsage.rst.

Relying on metadata for correctness is indeed not okay. We should either say that CONSTANT_ADDRESS_32BIT just assumes uniformity, and move the address to an SGPR (via v_readfirstlane) if required, *or* also support this with VMEM instructions.

As far as I understand, the point of this change is to use 32-bit pointers for descriptor tables. It doesn't seem too far-fetched that we'll eventually have to support extensions with divergent resource descriptors, so I vaguely prefer the second solution.

The other question is, why do we need a new address space at all? Can't we synthesize an appropriate pointer via inttoptr casts? I believe this is what SCPC is doing.

tpr added a comment.Jan 24 2018, 10:40 AM

Nicolai, I think you mean LLPC is synthesizing a pointer using inttoptr. That is true, but Marek pointed out to us that it is a bad thing, because it means alias analysis cannot do as good a job. So we're interested in using 32-bit pointers instead.

tpr added a comment.Jan 24 2018, 10:43 AM

For LLPC, we need to extend a 32-bit pointer to 64 bits either with a value supplied in an option/feature, or by using the high half of s_getpc. But we don't need to get that into this change; we can add it later, as long as this change is structured to allow it.

This needs documentation in AMDGPUUsage.rst.

Relying on metadata for correctness is indeed not okay. We should either say that CONSTANT_ADDRESS_32BIT just assumes uniformity, and move the address to an SGPR (via v_readfirstlane) if required, *or* also support this with VMEM instructions.

Here is why relying on metadata is OK.

The behavior of 64-bit pointers:

  • If the address is in VGPRs and amdgpu.uniform is not dropped, you'll get readfirstlane and correct behavior.
  • If the address is in VGPRs and amdgpu.uniform is dropped by a random pass, you'll get SMEM opcodes reading descriptors from VGPRs, so you'll get an invalid binary without an error and a GPU hang.

The behavior for 32-bit pointers:

  • If the address is in VGPRs and amdgpu.uniform is not dropped, you'll get readfirstlane and correct behavior.
  • If the address is in VGPRs and amdgpu.uniform is dropped by a random pass, you'll get a compile error.

Therefore, 32-bit pointers are a significant improvement in compiler behavior over 64-bit pointers. The current implementation covers everything Mesa will ever need. Support for 32-bit pointers in VMEM opcodes would be a bonus, but it would also be useless for Mesa.

As far as I understand, the point of this change is to use 32-bit pointers for descriptor tables. It doesn't seem too far-fetched that we'll eventually have to support extensions with divergent resource descriptors, so I vaguely prefer the second solution.

Game developers will be advised to use the readfirstlane intrinsic in a loop, as has happened in the past. As long as AMD doesn't support divergent resource descriptors in other drivers, we are fine.
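
For illustration, a hedged sketch of that pattern (hypothetical function; the control flow is illustrative only, and a production loop would also mask off lanes as they complete): readfirstlane picks the value of the first active lane, lanes whose value matches proceed, and the remaining lanes take another trip through the loop.

declare i32 @llvm.amdgcn.readfirstlane(i32)

define amdgpu_ps void @waterfall(i32 %divergent_index) {
entry:
  br label %loop
loop:
  ; pick one lane's value; matching lanes proceed, the rest loop again
  %uniform = call i32 @llvm.amdgcn.readfirstlane(i32 %divergent_index)
  %match = icmp eq i32 %uniform, %divergent_index
  br i1 %match, label %body, label %loop
body:
  ; ...use %uniform as a now-uniform descriptor index...
  ret void
}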

The other question is, why do we need a new address space at all? Can't we synthesize an appropriate pointer via inttoptr casts? I believe this is what SCPC is doing.

The short story is: We should never use inttoptr if InstCombine can't remove it. inttoptr is unoptimizable by LLVM.
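
For comparison, a hedged sketch of the two styles (hypothetical functions; assuming the post-remap numbering in which addrspace(4) is the 64-bit constant space). In the inttoptr form the pointer's provenance is opaque to alias analysis; in the addrspace(6) form it stays visible.

; inttoptr route, roughly what LLPC does today:
define amdgpu_ps float @via_inttoptr(i32 inreg %addr32) {
  %addr64 = zext i32 %addr32 to i64
  %ptr = inttoptr i64 %addr64 to float addrspace(4)*
  %val = load float, float addrspace(4)* %ptr
  ret float %val
}

; 32-bit constant address space route:
define amdgpu_ps float @via_addrspace6(float addrspace(6)* inreg %ptr) {
  %val = load float, float addrspace(6)* %ptr, !amdgpu.uniform !0
  ret float %val
}

!0 = !{}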

mareko updated this revision to Diff 131490.Jan 25 2018, 11:55 AM

Only scalar loads support 32-bit pointers. An address in a VGPR will
fail to compile. That's OK because the results of loads will only be used
in places where VGPRs are forbidden.

Updated AMDGPUAliasAnalysis and used SReg_64_XEXEC.
The tests cover all use cases we need for Mesa.

Updated documentation.

This needs documentation in AMDGPUUsage.rst.

Relying on metadata for correctness is indeed not okay. We should either say that CONSTANT_ADDRESS_32BIT just assumes uniformity, and move the address to an SGPR (via v_readfirstlane) if required, *or* also support this with VMEM instructions.

Here is why relying on metadata is OK.

The behavior of 64-bit pointers:

  • If the address is in VGPRs and amdgpu.uniform is not dropped, you'll get readfirstlane and correct behavior.
  • If the address is in VGPRs and amdgpu.uniform is dropped by a random pass, you'll get SMEM opcodes reading descriptors from VGPRs, so you'll get an invalid binary without an error and a GPU hang.

The behavior for 32-bit pointers:

  • If the address is in VGPRs and amdgpu.uniform is not dropped, you'll get readfirstlane and correct behavior.
  • If the address is in VGPRs and amdgpu.uniform is dropped by a random pass, you'll get a compile error.

This is exactly why it's not OK. If it's dropped, you get a compile error or a miscompile.

This is exactly why it's not OK. If it's dropped, you get a compile error or a miscompile.

I'd rather get a compile error than a miscompile. Only 64-bit pointers miscompile; this patch makes sure that 32-bit pointers do not. We've always relied on metadata for correctness. This patch doesn't change that, nor does it intend to.

I get it: some people want VMEM with 32-bit pointers. That is, however, not so easy for me. This patch implements the subset of 32-bit pointer functionality that Mesa will use, and only that. I've been using and testing it for ~26 days now.

Here is why relying on metadata is OK.

The behavior of 64-bit pointers:

  • If the address is in VGPRs and amdgpu.uniform is not dropped, you'll get readfirstlane and correct behavior.
  • If the address is in VGPRs and amdgpu.uniform is dropped by a random pass, you'll get SMEM opcodes reading descriptors from VGPRs, so you'll get an invalid binary without an error and a GPU hang.

Just to clarify, do you mean that the 64-bit pointer will be used in a VMEM load, which would (typically) load a descriptor into VGPRs and then that descriptor would be fed as VGPRs into a MUBUF or MIMG instruction? We should fix that and automatically introduce v_readfirstlanes. I actually thought I had fixed that bug at some point in the past, but maybe it has reappeared :/

I agree that unlike silent miscompilation, compile errors are something we could potentially live with today. But if moving to VMEM is too difficult, I think it would still be better to declare that CONSTANT_ADDRESS_32BIT can only be used with dynamically uniform addresses, and just introduce a v_readfirstlane regardless of whether the metadata is there.

The other question is, why do we need a new address space at all? Can't we synthesize an appropriate pointer via inttoptr casts? I believe this is what SCPC is doing.

The short story is: We should never use inttoptr if InstCombine can't remove it. inttoptr is unoptimizable by LLVM.

Hmm yeah, I vaguely remember this now, although I have to admit that I never properly understood the details. I suppose adding the address space is easier than fighting the fight at the generic LLVM level.

mareko updated this revision to Diff 132216.Jan 31 2018, 11:02 AM

32-bit loads are always considered uniform and so are always translated
to s_loads, with a possible v_readfirstlane.

If you don't want uniformity, expand the pointer into a descriptor
manually and use SI.load.const.
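
A hedged sketch of that escape hatch (hypothetical function; assuming the historical llvm.SI.load.const signature with a <16 x i8> resource descriptor; building the descriptor from the 32-bit address is elided):

declare float @llvm.SI.load.const(<16 x i8>, i32)

define amdgpu_ps float @nonuniform_load(<16 x i8> inreg %desc, i32 %offset) {
  ; %offset may be divergent; the descriptor is constructed manually
  %val = call float @llvm.SI.load.const(<16 x i8> %desc, i32 %offset)
  ret float %val
}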

This revision was not accepted when it landed; it landed in state Needs Review.Feb 7 2018, 8:05 AM
This revision was automatically updated to reflect the committed changes.
alex-t added a subscriber: alex-t.Feb 14 2018, 11:48 AM
alex-t added inline comments.
llvm/trunk/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
232

Is it still uniform even if it depends on divergent data?
Like this:

%tid = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr i32, i32 addrspace(6)* %base, i32 %tid
%val = load i32, i32 addrspace(6)* %gep

This is not correct. Moreover, it violates the Divergence Analysis results.

mareko added inline comments.Feb 14 2018, 3:18 PM
llvm/trunk/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
232

The address space implies uniformity and is geared towards shader resource descriptor loads, where uniformity is required. Non-uniform addresses result in v_readfirstlane. In the future, the implementation can be extended to support divergent data and VMEM loads/stores/atomics.

In fact, v_readfirstlane is inserted by ISel to glue a vector input to the unexpected scalar instruction.
This means that a compiler user writing valid IR will get unexpected behavior.
Is this documented somewhere?

My objections regarding the implementation are:
Bypassing the normal way of processing value divergence is misleading. I was very surprised to see the "amdgpu.uniform" metadata already set at the point (AMDGPUAnnotateUniformValues) where it is expected to be queried from the DA.
Moreover, it was set for a value that the DA reports as divergent!

The correct place to do this is the TargetTransformInfo::isAlwaysUniform hook, which I added specifically for handling target-specific features (like readfirstlane itself, by the way).
Using that hook lets the DA correctly process values that produce a uniform result regardless of their operands' divergence. The DA then computes the divergence for such exceptions in the normal way, and no hackery with metadata is needed.
We would just query the divergence in AMDGPUAnnotateUniformValues as we did before and set the metadata accordingly.

In fact, v_readfirstlane is inserted by ISel to glue a vector input to the unexpected scalar instruction.
This means that a compiler user writing valid IR will get unexpected behavior.
Is this documented somewhere?

My objections regarding the implementation are:
Bypassing the normal way of processing value divergence is misleading. I was very surprised to see the "amdgpu.uniform" metadata already set at the point (AMDGPUAnnotateUniformValues) where it is expected to be queried from the DA.
Moreover, it was set for a value that the DA reports as divergent!

The address space is meant to be used for loading shader resource descriptors. It's not a general-purpose address space.
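
For concreteness, a hedged sketch of that intended use (hypothetical function): loading a <4 x i32> buffer resource descriptor from a descriptor table addressed with a 32-bit pointer.

define amdgpu_ps <4 x i32> @load_desc(<4 x i32> addrspace(6)* inreg %table, i32 inreg %idx) {
  %gep = getelementptr <4 x i32>, <4 x i32> addrspace(6)* %table, i32 %idx
  %desc = load <4 x i32>, <4 x i32> addrspace(6)* %gep, !amdgpu.uniform !0
  ret <4 x i32> %desc
}

!0 = !{}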