This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/GlobalISel: First pass at attempting to legalize load/stores
ClosedPublic

Authored by arsenm on Jul 17 2019, 5:14 PM.

Details

Reviewers

tstellar
nhaehnle
kerbowa
rampitec
ronl

Summary

There's still a lot more to do, but this handles decomposing due to
alignment. I've gotten it to the point where nothing crashes or
infinite loops the legalizer.
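
As a rough illustration of what "decomposing due to alignment" means, the following standalone sketch splits a memory access into pieces no wider than its known alignment. It is not the patch's code: the helper name `splitByAlignment` and the plain-integer model are assumptions made here, whereas the real legalizer works on `LLT` types and `LegalityQuery` memory descriptors.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of LLVM): split an access of MemSizeInBits
// into pieces that are each at most AlignInBits wide, so every piece is
// naturally aligned. Both arguments are assumed to be powers of two >= 8.
std::vector<unsigned> splitByAlignment(unsigned MemSizeInBits,
                                       unsigned AlignInBits) {
  assert(MemSizeInBits >= 8 && AlignInBits >= 8);
  std::vector<unsigned> Pieces;
  if (AlignInBits >= MemSizeInBits) {
    // Already sufficiently aligned: a single access is enough.
    Pieces.push_back(MemSizeInBits);
    return Pieces;
  }
  // Otherwise emit one alignment-sized piece per step through the access.
  for (unsigned Offset = 0; Offset < MemSizeInBits; Offset += AlignInBits)
    Pieces.push_back(AlignInBits);
  return Pieces;
}
```

Under these assumptions, a 64-bit access that is only known to be 16-bit aligned becomes four 16-bit pieces, while a 32-bit access with 32-bit alignment is left as a single piece.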

Event Timeline

arsenm created this revision. Jul 17 2019, 5:14 PM

Herald added subscribers: Petar.Avramovic, t-tye, tpr and 6 others. Jul 17 2019, 5:14 PM

arsenm mentioned this in D65084: AMDGPU/GlobalISel: Remove manual store select code. Jul 22 2019, 7:24 AM

Just to clarify, how is selection of global_load_ubyte and friends going to work? I assume similar to today where the load returns an s32 value, but instruction selection does matching based on the MemOperand remembering the size?

Why are unaligned global loads split up on CI+? I see that you're trying to handle this in the code, but apparently it doesn't work correctly?

lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
549–550	Shouldn't the max size for global be 128? It only goes up to dwordx4.

In D64899#1597032, @nhaehnle wrote:

Just to clarify, how is selection of global_load_ubyte and friends going to work? I assume similar to today where the load returns an s32 value, but instruction selection does matching based on the MemOperand remembering the size?

Yes, it's based on the MMO size, as it has always worked.
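
A minimal sketch of the idea in this exchange, assuming a simplified model: the result register is always 32 bits wide, and only the memory operand's size decides which load is picked. The function `selectGlobalLoad` is hypothetical and returns mnemonics as strings; the actual selector matches on the `MachineMemOperand` attached to the instruction.

```cpp
#include <cassert>
#include <string>

// Hypothetical model: the loaded value is an s32; the memory size in bits
// (taken from the memory operand in the real selector) picks the
// zero-extending global load. Only three sizes are modelled here.
std::string selectGlobalLoad(unsigned ResultSizeInBits,
                             unsigned MemSizeInBits) {
  assert(ResultSizeInBits == 32 && "sketch only models s32 results");
  switch (MemSizeInBits) {
  case 8:
    return "global_load_ubyte";
  case 16:
    return "global_load_ushort";
  case 32:
    return "global_load_dword";
  default:
    return ""; // Not handled in this sketch.
  }
}
```

The point of the model is that two loads with the same s32 result type can still select different instructions, because the memory size travels separately from the register type.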

Why are unaligned global loads split up on CI+? I see that you're trying to handle this in the code, but apparently it doesn't work correctly?

These are using mesa run lines. We only assume unaligned access is enabled for amdhsa (although I think the kernel hardcodes this). Most of the challenge of this patch is managing the number of combinations for the tests, so I'll go through all of these again eventually. I was working on a program to generate all of these, but then got tired of it.

lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
549–550	It goes up to 512 for SMRD loads. Constant address space really doesn't exist. If the global load is uniform and constant, it can use an SMRD load. It will be split up during RegBankSelect
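
The per-address-space limits discussed here can be modelled in isolation. The sketch below copies the values from the `maxSizeForAddrSpace` lambda in this diff (32 for private, 64 for local, 512 for constant and global, 128 otherwise); the `AddrSpace` enum is a hypothetical stand-in for `AMDGPUAS`, and the rounding-up piece count is an assumption of this sketch, not something the patch computes in this form.

```cpp
#include <cassert>

// Hypothetical stand-in for the AMDGPUAS address-space constants.
enum class AddrSpace { Flat, Global, Local, Constant, Private };

// Same limits (in bits) as the lambda in the reviewed diff.
unsigned maxSizeForAddrSpace(AddrSpace AS) {
  switch (AS) {
  case AddrSpace::Private:
    return 32;
  case AddrSpace::Local:
    return 64;
  case AddrSpace::Constant:
  case AddrSpace::Global:
    return 512; // Wide enough for uniform constant loads selected as SMRD.
  default:
    return 128;
  }
}

// Assumption of this sketch: number of maximum-width accesses needed to
// cover MemSizeInBits, rounding up.
unsigned numPieces(AddrSpace AS, unsigned MemSizeInBits) {
  assert(MemSizeInBits > 0);
  const unsigned Max = maxSizeForAddrSpace(AS);
  return (MemSizeInBits + Max - 1) / Max;
}
```

With these limits a 128-bit private access needs four pieces, while a 512-bit global access is left whole at this stage and, per the reply above, is split later during RegBankSelect if it cannot use an SMRD load.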

Add comment, separate HSA run line to test unaligned loads. We should probably just assume unaligned is always on, since I think the kernel hardcodes this.

ping

Rebase and fix failures

ping

arsenm added reviewers: kerbowa, rampitec, ronl. Sep 9 2019, 1:16 PM

rampitec added inline comments. Sep 9 2019, 1:42 PM

lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
573	You can combine two conditions into a single if.
lib/Target/AMDGPU/SIISelLowering.cpp
1271–1290	"Align > 4"?
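
The first inline comment (combining two conditions into a single `if`) can be illustrated with a small standalone sketch. The `Subtarget` struct is a hypothetical stand-in for `GCNSubtarget`, and both functions model only the dwordx3 check from `needToSplitLoad`, not the alignment checks that follow it in the patch.

```cpp
#include <cassert>

// Hypothetical stand-in for the subtarget query used in the patch.
struct Subtarget {
  bool HasDwordx3LoadStores;
  bool hasDwordx3LoadStores() const { return HasDwordx3LoadStores; }
};

// Nested form, shaped like the code under review.
bool needsSplitNested(const Subtarget &ST, unsigned MemSizeInBits) {
  assert(MemSizeInBits % 8 == 0);
  unsigned NumRegs = MemSizeInBits / 32;
  if (NumRegs == 3) {
    if (!ST.hasDwordx3LoadStores())
      return true;
  }
  return false;
}

// Combined form, as suggested in the review comment.
bool needsSplitCombined(const Subtarget &ST, unsigned MemSizeInBits) {
  assert(MemSizeInBits % 8 == 0);
  unsigned NumRegs = MemSizeInBits / 32;
  return NumRegs == 3 && !ST.hasDwordx3LoadStores();
}
```

Both forms return the same result for every input; the combined one simply removes a level of nesting.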

Address comments and rebase tests.

LGTM

This revision is now accepted and ready to land. Sep 9 2019, 5:24 PM

r371533. Had to split some of the tests to avoid differences in release builds

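Revision Contents

Path	Size
lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp	328 lines
lib/Target/AMDGPU/SIISelLowering.h	5 lines
lib/Target/AMDGPU/SIISelLowering.cpp	38 lines
test/CodeGen/AMDGPU/GlobalISel/inst-select-load-private.mir	328 lines
test/CodeGen/AMDGPU/GlobalISel/legalize-load-constant.mir	12987 lines
test/CodeGen/AMDGPU/GlobalISel/legalize-load-flat.mir	11355 lines
test/CodeGen/AMDGPU/GlobalISel/legalize-load-global.mir	15547 lines
test/CodeGen/AMDGPU/GlobalISel/legalize-load-local.mir	10444 lines
test/CodeGen/AMDGPU/GlobalISel/legalize-load-private.mir	11099 lines
test/CodeGen/AMDGPU/GlobalISel/legalize-load.mir	
test/CodeGen/AMDGPU/GlobalISel/legalize-store.mir	54 lines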

Diff 210467

lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp

[62 lines not shown]
return [=](const LegalityQuery &Query) {
const LLT EltTy = Ty.getElementType();		const LLT EltTy = Ty.getElementType();
unsigned Size = Ty.getSizeInBits();		unsigned Size = Ty.getSizeInBits();
unsigned Pieces = (Size + 63) / 64;		unsigned Pieces = (Size + 63) / 64;
unsigned NewNumElts = (Ty.getNumElements() + 1) / Pieces;		unsigned NewNumElts = (Ty.getNumElements() + 1) / Pieces;
return std::make_pair(TypeIdx, LLT::scalarOrVector(NewNumElts, EltTy));		return std::make_pair(TypeIdx, LLT::scalarOrVector(NewNumElts, EltTy));
};		};
}		}

		// Increase the number of vector elements to reach the next multiple of 32-bit
		// type.
		static LegalizeMutation moreEltsToNext32Bit(unsigned TypeIdx) {
		return [=](const LegalityQuery &Query) {
		const LLT Ty = Query.Types[TypeIdx];

		const LLT EltTy = Ty.getElementType();
		const int Size = Ty.getSizeInBits();
		const int EltSize = EltTy.getSizeInBits();
		const int NextMul32 = (Size + 31) / 32;

		assert(EltSize < 32);

		const int NewNumElts = (32 * NextMul32 + EltSize - 1) / EltSize;
		return std::make_pair(TypeIdx, LLT::vector(NewNumElts, EltTy));
		};
		}
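
As a worked example of the arithmetic in `moreEltsToNext32Bit`, the sketch below repeats the same computation on plain integers. The function name `newNumElts` and its (element count, element size) signature are assumptions made here for illustration; the patch itself reads these values from an `LLT`.

```cpp
#include <cassert>

// Mirrors the arithmetic of moreEltsToNext32Bit using plain integers:
// round the total size up to a multiple of 32 bits, then compute how many
// elements of EltSize bits are needed to cover that size.
int newNumElts(int NumElts, int EltSize) {
  assert(EltSize > 0 && EltSize < 32);
  const int Size = NumElts * EltSize;
  const int NextMul32 = (Size + 31) / 32;
  return (32 * NextMul32 + EltSize - 1) / EltSize;
}
```

For instance, three 8-bit elements (24 bits) round up to 32 bits and become four elements, and five 16-bit elements (80 bits) round up to 96 bits and become six elements.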

		static LegalityPredicate vectorSmallerThan(unsigned TypeIdx, unsigned Size) {
		return [=](const LegalityQuery &Query) {
		const LLT QueryTy = Query.Types[TypeIdx];
		return QueryTy.isVector() && QueryTy.getSizeInBits() < Size;
		};
		}

static LegalityPredicate vectorWiderThan(unsigned TypeIdx, unsigned Size) {		static LegalityPredicate vectorWiderThan(unsigned TypeIdx, unsigned Size) {
return [=](const LegalityQuery &Query) {		return [=](const LegalityQuery &Query) {
const LLT QueryTy = Query.Types[TypeIdx];		const LLT QueryTy = Query.Types[TypeIdx];
return QueryTy.isVector() && QueryTy.getSizeInBits() > Size;		return QueryTy.isVector() && QueryTy.getSizeInBits() > Size;
};		};
}		}

static LegalityPredicate numElementsNotEven(unsigned TypeIdx) {		static LegalityPredicate numElementsNotEven(unsigned TypeIdx) {
[14 lines not shown]
if (Ty.isVector()) {
(EltSize == 16 && Ty.getNumElements() % 2 == 0) ||		(EltSize == 16 && Ty.getNumElements() % 2 == 0) ||
EltSize == 128 || EltSize == 256;		EltSize == 128 || EltSize == 256;
}		}

return Ty.getSizeInBits() % 32 == 0 && Ty.getSizeInBits() <= 512;		return Ty.getSizeInBits() % 32 == 0 && Ty.getSizeInBits() <= 512;
};		};
}		}

		static LegalityPredicate isWideScalarTruncStore(unsigned TypeIdx) {
		return [=](const LegalityQuery &Query) {
		const LLT Ty = Query.Types[TypeIdx];
		return !Ty.isVector() && Ty.getSizeInBits() > 32 &&
		Query.MMODescrs[0].SizeInBits < Ty.getSizeInBits();
		};
		}
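
The `isWideScalarTruncStore` predicate can be read as a three-part test. The sketch below restates it on plain values; the name `isWideScalarTruncStoreModel` and its parameters are hypothetical stand-ins for the `LLT` and memory-descriptor queries used in the patch.

```cpp
#include <cassert>

// Plain-value model of the predicate: a non-vector value wider than 32 bits
// whose memory size is smaller than the register type, i.e. a truncating
// store of a wide scalar.
bool isWideScalarTruncStoreModel(bool IsVector, unsigned TypeSizeInBits,
                                 unsigned MemSizeInBits) {
  assert(TypeSizeInBits > 0 && MemSizeInBits > 0);
  return !IsVector && TypeSizeInBits > 32 && MemSizeInBits < TypeSizeInBits;
}
```

In this model an s64 value stored as 16 bits matches, while an s32 stored as 8 bits or an s64 stored as 64 bits does not.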

AMDGPULegalizerInfo::AMDGPULegalizerInfo(const GCNSubtarget &ST_,		AMDGPULegalizerInfo::AMDGPULegalizerInfo(const GCNSubtarget &ST_,
const GCNTargetMachine &TM)		const GCNTargetMachine &TM)
: ST(ST_) {		: ST(ST_) {
using namespace TargetOpcode;		using namespace TargetOpcode;

auto GetAddrSpacePtr = [&TM](unsigned AS) {		auto GetAddrSpacePtr = [&TM](unsigned AS) {
return LLT::pointer(AS, TM.getPointerSizeInBits(AS));		return LLT::pointer(AS, TM.getPointerSizeInBits(AS));
};		};

const LLT S1 = LLT::scalar(1);		const LLT S1 = LLT::scalar(1);
const LLT S8 = LLT::scalar(8);		const LLT S8 = LLT::scalar(8);
const LLT S16 = LLT::scalar(16);		const LLT S16 = LLT::scalar(16);
const LLT S32 = LLT::scalar(32);		const LLT S32 = LLT::scalar(32);
const LLT S64 = LLT::scalar(64);		const LLT S64 = LLT::scalar(64);
		const LLT S96 = LLT::scalar(96);
const LLT S128 = LLT::scalar(128);		const LLT S128 = LLT::scalar(128);
const LLT S256 = LLT::scalar(256);		const LLT S256 = LLT::scalar(256);
const LLT S512 = LLT::scalar(512);		const LLT S512 = LLT::scalar(512);

const LLT V2S16 = LLT::vector(2, 16);		const LLT V2S16 = LLT::vector(2, 16);
const LLT V4S16 = LLT::vector(4, 16);		const LLT V4S16 = LLT::vector(4, 16);

const LLT V2S32 = LLT::vector(2, 32);		const LLT V2S32 = LLT::vector(2, 32);
[102 lines not shown]
getActionDefinitionsBuilder({G_UADDO, G_SADDO, G_USUBO, G_SSUBO,
.legalFor({{S32, S1}})		.legalFor({{S32, S1}})
.clampScalar(0, S32, S32);		.clampScalar(0, S32, S32);

getActionDefinitionsBuilder(G_BITCAST)		getActionDefinitionsBuilder(G_BITCAST)
.legalForCartesianProduct({S32, V2S16})		.legalForCartesianProduct({S32, V2S16})
.legalForCartesianProduct({S64, V2S32, V4S16})		.legalForCartesianProduct({S64, V2S32, V4S16})
.legalForCartesianProduct({V2S64, V4S32})		.legalForCartesianProduct({V2S64, V4S32})
// Don't worry about the size constraint.		// Don't worry about the size constraint.
.legalIf(all(isPointer(0), isPointer(1)));		.legalIf(all(isPointer(0), isPointer(1)))
		// FIXME: Testing hack
		.legalForCartesianProduct({S16, LLT::vector(2, 8), });

if (ST.has16BitInsts()) {		if (ST.has16BitInsts()) {
getActionDefinitionsBuilder(G_FCONSTANT)		getActionDefinitionsBuilder(G_FCONSTANT)
.legalFor({S32, S64, S16})		.legalFor({S32, S64, S16})
.clampScalar(0, S16, S64);		.clampScalar(0, S16, S64);
} else {		} else {
getActionDefinitionsBuilder(G_FCONSTANT)		getActionDefinitionsBuilder(G_FCONSTANT)
.legalFor({S32, S64})		.legalFor({S32, S64})
[90 lines not shown]
getActionDefinitionsBuilder(G_FSUB)
// Must use fadd + fneg		// Must use fadd + fneg
.lowerFor({S64, S16, V2S16})		.lowerFor({S64, S16, V2S16})
.scalarize(0)		.scalarize(0)
.clampScalar(0, S32, S64);		.clampScalar(0, S32, S64);

getActionDefinitionsBuilder({G_SEXT, G_ZEXT, G_ANYEXT})		getActionDefinitionsBuilder({G_SEXT, G_ZEXT, G_ANYEXT})
.legalFor({{S64, S32}, {S32, S16}, {S64, S16},		.legalFor({{S64, S32}, {S32, S16}, {S64, S16},
{S32, S1}, {S64, S1}, {S16, S1},		{S32, S1}, {S64, S1}, {S16, S1},
		{S96, S32},
// FIXME: Hack		// FIXME: Hack
{S64, LLT::scalar(33)},		{S64, LLT::scalar(33)},
{S32, S8}, {S128, S32}, {S128, S64}, {S32, LLT::scalar(24)}})		{S32, S8}, {S128, S32}, {S128, S64}, {S32, LLT::scalar(24)}})
.scalarize(0);		.scalarize(0);

getActionDefinitionsBuilder({G_SITOFP, G_UITOFP})		getActionDefinitionsBuilder({G_SITOFP, G_UITOFP})
.legalFor({{S32, S32}, {S64, S32}})		.legalFor({{S32, S32}, {S64, S32}})
.lowerFor({{S32, S64}})		.lowerFor({{S32, S64}})
[147 lines not shown]
if (ST.hasFlatAddressSpace()) {
getActionDefinitionsBuilder(G_ADDRSPACE_CAST)		getActionDefinitionsBuilder(G_ADDRSPACE_CAST)
.scalarize(0)		.scalarize(0)
.custom();		.custom();
}		}

// TODO: Should load to s16 be legal? Most loads extend to 32-bits, but we		// TODO: Should load to s16 be legal? Most loads extend to 32-bits, but we
// handle some operations by just promoting the register during		// handle some operations by just promoting the register during
// selection. There are also d16 loads on GFX9+ which preserve the high bits.		// selection. There are also d16 loads on GFX9+ which preserve the high bits.
getActionDefinitionsBuilder({G_LOAD, G_STORE})		auto maxSizeForAddrSpace = [](unsigned AS) -> unsigned {
.narrowScalarIf([](const LegalityQuery &Query) {		switch (AS) {
unsigned Size = Query.Types[0].getSizeInBits();		// FIXME: Private element size.
		case AMDGPUAS::PRIVATE_ADDRESS:
		return 32;
		// FIXME: Check subtarget
		case AMDGPUAS::LOCAL_ADDRESS:
		return 64;
		case AMDGPUAS::CONSTANT_ADDRESS:
		case AMDGPUAS::GLOBAL_ADDRESS:
		return 512;
		nhaehnle (inline comment): Shouldn't the max size for global be 128? It only goes up to dwordx4.
		arsenm (inline comment): It goes up to 512 for SMRD loads. Constant address space really doesn't exist. If the global load is uniform and constant, it can use an SMRD load. It will be split up during RegBankSelect
		default:
		return 128;
		}
		};

		const auto needToSplitLoad = [=](const LegalityQuery &Query) -> bool {
		const LLT DstTy = Query.Types[0];

		// Split vector extloads.
unsigned MemSize = Query.MMODescrs[0].SizeInBits;		unsigned MemSize = Query.MMODescrs[0].SizeInBits;
return (Size > 32 && MemSize < Size);		if (DstTy.isVector() && DstTy.getSizeInBits() > MemSize)
		return true;

		const LLT PtrTy = Query.Types[1];
		unsigned AS = PtrTy.getAddressSpace();
		if (MemSize > maxSizeForAddrSpace(AS))
		return true;

		// Catch weird sized loads that don't evenly divide into the access sizes
		// TODO: May be able to widen depending on alignment etc.
		unsigned NumRegs = MemSize / 32;
		if (NumRegs == 3) {
		if (!ST.hasDwordx3LoadStores())
		rampitec (inline comment): You can combine two conditions into a single if.
		return true;
		}

		unsigned Align = Query.MMODescrs[0].AlignInBits;
		if (Align < MemSize) {
		const SITargetLowering *TLI = ST.getTargetLowering();
		return !TLI->allowsMisalignedMemoryAccessesImpl(MemSize, AS, Align / 8);
		}

		return false;
		};

		unsigned GlobalAlign32 = ST.hasUnalignedBufferAccess() ? 0 : 32;
		unsigned GlobalAlign16 = ST.hasUnalignedBufferAccess() ? 0 : 16;
		unsigned GlobalAlign8 = ST.hasUnalignedBufferAccess() ? 0 : 8;

		// TODO: Refine based on subtargets which support unaligned access or 128-bit
		// LDS
		// TODO: Unsupported flat for SI.

		for (unsigned Op : {G_LOAD, G_STORE}) {
		const bool IsStore = Op == G_STORE;

		auto &Actions = getActionDefinitionsBuilder(Op);
		// Whitelist the common cases.
		// TODO: Pointer loads
		// TODO: Wide constant loads
		// TODO: Only CI+ has 3x loads
		// TODO: Loads to s16 on gfx9
		Actions.legalForTypesWithMemDesc({{S32, GlobalPtr, 32, GlobalAlign32},
		{V2S32, GlobalPtr, 64, GlobalAlign32},
		{V3S32, GlobalPtr, 96, GlobalAlign32},
		{S96, GlobalPtr, 96, GlobalAlign32},
		{V4S32, GlobalPtr, 128, GlobalAlign32},
		{S128, GlobalPtr, 128, GlobalAlign32},
		{S64, GlobalPtr, 64, GlobalAlign32},
		{V2S64, GlobalPtr, 128, GlobalAlign32},
		{V2S16, GlobalPtr, 32, GlobalAlign32},
		{S32, GlobalPtr, 8, GlobalAlign8},
		{S32, GlobalPtr, 16, GlobalAlign16},

		{S32, LocalPtr, 32, 32},
		{S64, LocalPtr, 64, 32},
		{V2S32, LocalPtr, 64, 32},
		{S32, LocalPtr, 8, 8},
		{S32, LocalPtr, 16, 16},
		{V2S16, LocalPtr, 32, 32},

		{S32, PrivatePtr, 32, 32},
		{S32, PrivatePtr, 8, 8},
		{S32, PrivatePtr, 16, 16},
		{V2S16, PrivatePtr, 32, 32},

		{S32, FlatPtr, 32, GlobalAlign32},
		{S32, FlatPtr, 16, GlobalAlign16},
		{S32, FlatPtr, 8, GlobalAlign8},
		{V2S16, FlatPtr, 32, GlobalAlign32},

		{S32, ConstantPtr, 32, GlobalAlign32},
		{V2S32, ConstantPtr, 64, GlobalAlign32},
		{V3S32, ConstantPtr, 96, GlobalAlign32},
		{V4S32, ConstantPtr, 128, GlobalAlign32},
		{S64, ConstantPtr, 64, GlobalAlign32},
		{S128, ConstantPtr, 128, GlobalAlign32},
		{V2S32, ConstantPtr, 32, GlobalAlign32}});
		Actions
		.narrowScalarIf(
		[=](const LegalityQuery &Query) -> bool {
		return !Query.Types[0].isVector() && needToSplitLoad(Query);
},		},
[](const LegalityQuery &Query) {		[=](const LegalityQuery &Query) -> std::pair<unsigned, LLT> {
return std::make_pair(0, LLT::scalar(32));		const LLT DstTy = Query.Types[0];
})		const LLT PtrTy = Query.Types[1];
.moreElementsIf(isSmallOddVector(0), oneMoreElement(0))
.fewerElementsIf([=](const LegalityQuery &Query) {		const unsigned DstSize = DstTy.getSizeInBits();
unsigned MemSize = Query.MMODescrs[0].SizeInBits;		unsigned MemSize = Query.MMODescrs[0].SizeInBits;
return (MemSize == 96) &&
Query.Types[0].isVector() &&		// Split extloads.
!ST.hasDwordx3LoadStores();		if (DstSize > MemSize)
		return std::make_pair(0, LLT::scalar(MemSize));

		if (DstSize > 32 && (DstSize % 32 != 0)) {
		// FIXME: Need a way to specify non-extload of larger size if
		// suitably aligned.
		return std::make_pair(0, LLT::scalar(32 * (DstSize / 32)));
		}

		unsigned MaxSize = maxSizeForAddrSpace(PtrTy.getAddressSpace());
		if (MemSize > MaxSize)
		return std::make_pair(0, LLT::scalar(MaxSize));

		unsigned Align = Query.MMODescrs[0].AlignInBits;
		return std::make_pair(0, LLT::scalar(Align));
		})
		.fewerElementsIf(
		[=](const LegalityQuery &Query) -> bool {
		return Query.Types[0].isVector() && needToSplitLoad(Query);
},		},
[=](const LegalityQuery &Query) {		[=](const LegalityQuery &Query) -> std::pair<unsigned, LLT> {
return std::make_pair(0, V2S32);		const LLT DstTy = Query.Types[0];
		const LLT PtrTy = Query.Types[1];

		LLT EltTy = DstTy.getElementType();
		unsigned MaxSize = maxSizeForAddrSpace(PtrTy.getAddressSpace());

		// Split if it's too large for the address space.
		if (Query.MMODescrs[0].SizeInBits > MaxSize) {
		unsigned NumElts = DstTy.getNumElements();
		unsigned NumPieces = Query.MMODescrs[0].SizeInBits / MaxSize;

		// FIXME: Refine when odd breakdowns handled
		// The scalars will need to be re-legalized.
		if (NumPieces == 1 || NumPieces >= NumElts ||
		NumElts % NumPieces != 0)
		return std::make_pair(0, EltTy);

		return std::make_pair(0,
		LLT::vector(NumElts / NumPieces, EltTy));
		}

		// Need to split because of alignment.
		unsigned Align = Query.MMODescrs[0].AlignInBits;
		unsigned EltSize = EltTy.getSizeInBits();
		if (EltSize > Align &&
		(EltSize / Align < DstTy.getNumElements())) {
		return std::make_pair(0, LLT::vector(EltSize / Align, EltTy));
		}

		// May need relegalization for the scalars.
		return std::make_pair(0, EltTy);
})		})
.legalIf([=](const LegalityQuery &Query) {		.minScalar(0, S32);
const LLT &Ty0 = Query.Types[0];
		if (IsStore)
		Actions.narrowScalarIf(isWideScalarTruncStore(0), changeTo(0, S32));

		// TODO: Need a bitcast lower option?
		Actions
		.legalIf([=](const LegalityQuery &Query) {
		const LLT Ty0 = Query.Types[0];
unsigned Size = Ty0.getSizeInBits();		unsigned Size = Ty0.getSizeInBits();
unsigned MemSize = Query.MMODescrs[0].SizeInBits;		unsigned MemSize = Query.MMODescrs[0].SizeInBits;
if (Size < 32 || (Size > 32 && MemSize < Size))		unsigned Align = Query.MMODescrs[0].AlignInBits;
return false;

if (Ty0.isVector() && Size != MemSize)		// No extending vector loads.
		if (Size > MemSize && Ty0.isVector())
return false;		return false;

// TODO: Decompose private loads into 4-byte components.		// FIXME: Widening store from alignment not valid.
// TODO: Illegal flat loads on SI		if (MemSize < Size)
		MemSize = std::max(MemSize, Align);

switch (MemSize) {		switch (MemSize) {
case 8:		case 8:
case 16:		case 16:
return Size == 32;		return Size == 32;
case 32:		case 32:
case 64:		case 64:
case 128:		case 128:
return true;		return true;

case 96:		case 96:
return ST.hasDwordx3LoadStores();		return ST.hasDwordx3LoadStores();

case 256:		case 256:
case 512:		case 512:
// TODO: Possibly support loads of i256 and i512 . This will require		// TODO: Possibly support loads of i256 and i512 . This will
// adding i256 and i512 types to MVT in order for to be able to use		// require adding i256 and i512 types to MVT in order for to be able
// TableGen.		// to use TableGen.
// TODO: Add support for other vector types, this will require		// TODO: Add support for other vector types, this will require
// defining more value mappings for the new types.		// defining more value mappings for the new types.
return Ty0.isVector() && (Ty0.getScalarType().getSizeInBits() == 32 ||		return Ty0.isVector() &&
		(Ty0.getScalarType().getSizeInBits() == 32 ||
Ty0.getScalarType().getSizeInBits() == 64);		Ty0.getScalarType().getSizeInBits() == 64);

default:		default:
return false;		return false;
}		}
})		})
.clampScalar(0, S32, S64);		.widenScalarToNextPow2(0)
		// TODO: v3s32->v4s32 with alignment
		.moreElementsIf(vectorSmallerThan(0, 32), moreEltsToNext32Bit(0));
		}

// FIXME: Handle alignment requirements.
auto &ExtLoads = getActionDefinitionsBuilder({G_SEXTLOAD, G_ZEXTLOAD})		auto &ExtLoads = getActionDefinitionsBuilder({G_SEXTLOAD, G_ZEXTLOAD})
.legalForTypesWithMemDesc({		.legalForTypesWithMemDesc({{S32, GlobalPtr, 8, 8},
{S32, GlobalPtr, 8, 8},		{S32, GlobalPtr, 16, 2 * 8},
{S32, GlobalPtr, 16, 8},
{S32, LocalPtr, 8, 8},		{S32, LocalPtr, 8, 8},
{S32, LocalPtr, 16, 8},		{S32, LocalPtr, 16, 16},
{S32, PrivatePtr, 8, 8},		{S32, PrivatePtr, 8, 8},
{S32, PrivatePtr, 16, 8}});		{S32, PrivatePtr, 16, 16}});
if (ST.hasFlatAddressSpace()) {		if (ST.hasFlatAddressSpace()) {
ExtLoads.legalForTypesWithMemDesc({{S32, FlatPtr, 8, 8},		ExtLoads.legalForTypesWithMemDesc(
{S32, FlatPtr, 16, 8}});		{{S32, FlatPtr, 8, 8}, {S32, FlatPtr, 16, 16}});
}		}

ExtLoads.clampScalar(0, S32, S32)		ExtLoads.clampScalar(0, S32, S32)
.widenScalarToNextPow2(0)		.widenScalarToNextPow2(0)
.unsupportedIfMemSizeNotPow2()		.unsupportedIfMemSizeNotPow2()
.lower();		.lower();

auto &Atomics = getActionDefinitionsBuilder(		auto &Atomics = getActionDefinitionsBuilder(
[812 lines not shown]

lib/Target/AMDGPU/SIISelLowering.h

[229 lines not shown]
public:
bool isLegalGlobalAddressingMode(const AddrMode &AM) const;		bool isLegalGlobalAddressingMode(const AddrMode &AM) const;
bool isLegalAddressingMode(const DataLayout &DL, const AddrMode &AM, Type *Ty,		bool isLegalAddressingMode(const DataLayout &DL, const AddrMode &AM, Type *Ty,
unsigned AS,		unsigned AS,
Instruction *I = nullptr) const override;		Instruction *I = nullptr) const override;

bool canMergeStoresTo(unsigned AS, EVT MemVT,		bool canMergeStoresTo(unsigned AS, EVT MemVT,
const SelectionDAG &DAG) const override;		const SelectionDAG &DAG) const override;

		bool allowsMisalignedMemoryAccessesImpl(
		unsigned Size, unsigned AS, unsigned Align,
		MachineMemOperand::Flags Flags = MachineMemOperand::MONone,
		bool *IsFast = nullptr) const;

bool allowsMisalignedMemoryAccesses(		bool allowsMisalignedMemoryAccesses(
EVT VT, unsigned AS, unsigned Align,		EVT VT, unsigned AS, unsigned Align,
MachineMemOperand::Flags Flags = MachineMemOperand::MONone,		MachineMemOperand::Flags Flags = MachineMemOperand::MONone,
bool *IsFast = nullptr) const override;		bool *IsFast = nullptr) const override;

EVT getOptimalMemOpType(uint64_t Size, unsigned DstAlign,		EVT getOptimalMemOpType(uint64_t Size, unsigned DstAlign,
unsigned SrcAlign, bool IsMemset,		unsigned SrcAlign, bool IsMemset,
bool ZeroMemset,		bool ZeroMemset,
[137 lines not shown]

lib/Target/AMDGPU/SIISelLowering.cpp

[1,209 lines not shown]
if (AS == AMDGPUAS::GLOBAL_ADDRESS || AS == AMDGPUAS::FLAT_ADDRESS) {
unsigned MaxPrivateBits = 8 * getSubtarget()->getMaxPrivateElementSize();		unsigned MaxPrivateBits = 8 * getSubtarget()->getMaxPrivateElementSize();
return (MemVT.getSizeInBits() <= MaxPrivateBits);		return (MemVT.getSizeInBits() <= MaxPrivateBits);
} else if (AS == AMDGPUAS::LOCAL_ADDRESS || AS == AMDGPUAS::REGION_ADDRESS) {		} else if (AS == AMDGPUAS::LOCAL_ADDRESS || AS == AMDGPUAS::REGION_ADDRESS) {
return (MemVT.getSizeInBits() <= 2 * 32);		return (MemVT.getSizeInBits() <= 2 * 32);
}		}
return true;		return true;
}		}

bool SITargetLowering::allowsMisalignedMemoryAccesses(		bool SITargetLowering::allowsMisalignedMemoryAccessesImpl(
EVT VT, unsigned AddrSpace, unsigned Align, MachineMemOperand::Flags Flags,		unsigned Size, unsigned AddrSpace, unsigned Align,
bool *IsFast) const {		MachineMemOperand::Flags Flags, bool *IsFast) const {
if (IsFast)		if (IsFast)
*IsFast = false;		*IsFast = false;

// TODO: I think v3i32 should allow unaligned accesses on CI with DS_READ_B96,
// which isn't a simple VT.
// Until MVT is extended to handle this, simply check for the size and
// rely on the condition below: allow accesses if the size is a multiple of 4.
if (VT == MVT::Other || (VT != MVT::Other && VT.getSizeInBits() > 1024 &&
VT.getStoreSize() > 16)) {
return false;
}

if (AddrSpace == AMDGPUAS::LOCAL_ADDRESS ||		if (AddrSpace == AMDGPUAS::LOCAL_ADDRESS ||
AddrSpace == AMDGPUAS::REGION_ADDRESS) {		AddrSpace == AMDGPUAS::REGION_ADDRESS) {
// ds_read/write_b64 require 8-byte alignment, but we can do a 4 byte		// ds_read/write_b64 require 8-byte alignment, but we can do a 4 byte
// aligned, 8 byte access in a single operation using ds_read2/write2_b32		// aligned, 8 byte access in a single operation using ds_read2/write2_b32
// with adjacent offsets.		// with adjacent offsets.
bool AlignedBy4 = (Align % 4 == 0);		bool AlignedBy4 = (Align % 4 == 0);
if (IsFast)		if (IsFast)
*IsFast = AlignedBy4;		*IsFast = AlignedBy4;
[22 lines not shown]
if (IsFast) {
AddrSpace == AMDGPUAS::CONSTANT_ADDRESS_32BIT) ?		AddrSpace == AMDGPUAS::CONSTANT_ADDRESS_32BIT) ?
(Align % 4 == 0) : true;		(Align % 4 == 0) : true;
}		}

return true;		return true;
}		}

// Smaller than dword value must be aligned.		// Smaller than dword value must be aligned.
if (VT.bitsLT(MVT::i32))		if (Size < 32)
return false;		return false;

// 8.1.6 - For Dword or larger reads or writes, the two LSBs of the		// 8.1.6 - For Dword or larger reads or writes, the two LSBs of the
// byte-address are ignored, thus forcing Dword alignment.		// byte-address are ignored, thus forcing Dword alignment.
// This applies to private, global, and constant memory.		// This applies to private, global, and constant memory.
if (IsFast)		if (IsFast)
*IsFast = true;		*IsFast = true;

return VT.bitsGT(MVT::i32) && Align % 4 == 0;		return Size >= 32 && Align % 4 == 0;
		}
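
One detail worth noting in the hunk above is that the final check changes from `VT.bitsGT(MVT::i32)` to `Size >= 32`, which behaves differently for accesses of exactly 32 bits. The sketch below contrasts the two tails on plain integers. It is an assumption-laden model: it covers only the last lines visible in this diff (the LDS and other earlier branches are elided on this page and not modelled), it treats `bitsLT`/`bitsGT` as plain size comparisons, and the names `oldTail` and `newTail` are hypothetical.

```cpp
#include <cassert>

// Model of the old tail: sub-dword values are rejected, and only accesses
// strictly wider than 32 bits with 4-byte alignment are accepted.
bool oldTail(unsigned SizeInBits, unsigned AlignInBytes) {
  assert(AlignInBytes > 0);
  if (SizeInBits < 32)
    return false;
  return SizeInBits > 32 && AlignInBytes % 4 == 0;
}

// Model of the new tail: the same, except that an access of exactly
// 32 bits is now accepted when it is 4-byte aligned.
bool newTail(unsigned SizeInBits, unsigned AlignInBytes) {
  assert(AlignInBytes > 0);
  if (SizeInBits < 32)
    return false;
  return SizeInBits >= 32 && AlignInBytes % 4 == 0;
}
```

Under this model the two versions agree everywhere except for a 4-byte-aligned access of exactly 32 bits, which the old tail rejects and the new one accepts.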

		bool SITargetLowering::allowsMisalignedMemoryAccesses(
		EVT VT, unsigned AddrSpace, unsigned Align, MachineMemOperand::Flags Flags,
		bool *IsFast) const {
		if (IsFast)
		*IsFast = false;

		// TODO: I think v3i32 should allow unaligned accesses on CI with DS_READ_B96,
		// which isn't a simple VT.
		// Until MVT is extended to handle this, simply check for the size and
		// rely on the condition below: allow accesses if the size is a multiple of 4.
		if (VT == MVT::Other || (VT != MVT::Other && VT.getSizeInBits() > 1024 &&
		VT.getStoreSize() > 16)) {
		return false;
		}

		return allowsMisalignedMemoryAccessesImpl(VT.getSizeInBits(), AddrSpace,
		Align, Flags, IsFast);
		rampitec (inline comment): "Align > 4"?
}		}

EVT SITargetLowering::getOptimalMemOpType(		EVT SITargetLowering::getOptimalMemOpType(
uint64_t Size, unsigned DstAlign, unsigned SrcAlign, bool IsMemset,		uint64_t Size, unsigned DstAlign, unsigned SrcAlign, bool IsMemset,
bool ZeroMemset, bool MemcpyStrSrc,		bool ZeroMemset, bool MemcpyStrSrc,
const AttributeList &FuncAttributes) const {		const AttributeList &FuncAttributes) const {
// FIXME: Should account for address space here.		// FIXME: Should account for address space here.

[9,462 lines not shown]

test/CodeGen/AMDGPU/GlobalISel/inst-select-load-private.mir

[91 lines not shown]
bb.0:
%0:vgpr(p5) = COPY $vgpr0		%0:vgpr(p5) = COPY $vgpr0
%1:vgpr(s32) = G_LOAD %0 :: (load 1, align 1, addrspace 5)		%1:vgpr(s32) = G_LOAD %0 :: (load 1, align 1, addrspace 5)
$vgpr0 = COPY %1		$vgpr0 = COPY %1

...		...

---		---

name: load_private_v2s32
legalized: true
regBankSelected: true
tracksRegLiveness: true
machineFunctionInfo:
scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
scratchWaveOffsetReg: $sgpr4
stackPtrOffsetReg: $sgpr32

body: |
bb.0:
liveins: $vgpr0

; GFX10: $vgpr0 = COPY [[GLOBAL_LOAD_DWORDX2_]]
; GFX6-LABEL: name: load_private_v2s32
; GFX6: liveins: $vgpr0
; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
; GFX6: [[BUFFER_LOAD_DWORDX2_OFFEN:%[0-9]+]]:vreg_64 = BUFFER_LOAD_DWORDX2_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, implicit $exec :: (load 8, addrspace 5)
; GFX6: $vgpr0_vgpr1 = COPY [[BUFFER_LOAD_DWORDX2_OFFEN]]
; GFX9-LABEL: name: load_private_v2s32
; GFX9: liveins: $vgpr0
; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
; GFX9: [[BUFFER_LOAD_DWORDX2_OFFEN:%[0-9]+]]:vreg_64 = BUFFER_LOAD_DWORDX2_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, implicit $exec :: (load 8, addrspace 5)
; GFX9: $vgpr0_vgpr1 = COPY [[BUFFER_LOAD_DWORDX2_OFFEN]]
%0:vgpr(p5) = COPY $vgpr0
%1:vgpr(<2 x s32>) = G_LOAD %0 :: (load 8, align 8, addrspace 5)
$vgpr0_vgpr1 = COPY %1

...

---

name: load_private_v4s32
legalized: true
regBankSelected: true
tracksRegLiveness: true
machineFunctionInfo:
scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
scratchWaveOffsetReg: $sgpr4
stackPtrOffsetReg: $sgpr32

body: |
bb.0:
liveins: $vgpr0

; GFX6-LABEL: name: load_private_v4s32
; GFX6: liveins: $vgpr0
; GFX6: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
; GFX6: [[BUFFER_LOAD_DWORDX4_OFFEN:%[0-9]+]]:vreg_128 = BUFFER_LOAD_DWORDX4_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, implicit $exec :: (load 16, align 4, addrspace 5)
; GFX6: $vgpr0_vgpr1_vgpr2_vgpr3 = COPY [[BUFFER_LOAD_DWORDX4_OFFEN]]
; GFX9-LABEL: name: load_private_v4s32
; GFX9: liveins: $vgpr0
; GFX9: [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr0
; GFX9: [[BUFFER_LOAD_DWORDX4_OFFEN:%[0-9]+]]:vreg_128 = BUFFER_LOAD_DWORDX4_OFFEN [[COPY]], $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr4, 0, 0, 0, 0, 0, implicit $exec :: (load 16, align 4, addrspace 5)
; GFX9: $vgpr0_vgpr1_vgpr2_vgpr3 = COPY [[BUFFER_LOAD_DWORDX4_OFFEN]]
%0:vgpr(p5) = COPY $vgpr0
%1:vgpr(<4 x s32>) = G_LOAD %0 :: (load 16, align 4, addrspace 5)
$vgpr0_vgpr1_vgpr2_vgpr3 = COPY %1

...

---

name: load_private_s64
legalized: true
regBankSelected: true
tracksRegLiveness: true
machineFunctionInfo:
scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
scratchWaveOffsetReg: $sgpr4
stackPtrOffsetReg: $sgpr32

body: |
bb.0:
liveins: $vgpr0

; GFX6-LABEL: name: load_private_s64
; GFX6: liveins: $vgpr0
; GFX6: [[COPY:%[0-9]+]]:vgpr(p5) = COPY $vgpr0
; GFX6: [[LOAD:%[0-9]+]]:vreg_64(s64) = G_LOAD [[COPY]](p5) :: (load 8, addrspace 5)
; GFX6: $vgpr0_vgpr1 = COPY [[LOAD]](s64)
; GFX9-LABEL: name: load_private_s64
; GFX9: liveins: $vgpr0
; GFX9: [[COPY:%[0-9]+]]:vgpr(p5) = COPY $vgpr0
; GFX9: [[LOAD:%[0-9]+]]:vreg_64(s64) = G_LOAD [[COPY]](p5) :: (load 8, addrspace 5)
; GFX9: $vgpr0_vgpr1 = COPY [[LOAD]](s64)
%0:vgpr(p5) = COPY $vgpr0
%1:vgpr(s64) = G_LOAD %0 :: (load 8, align 8, addrspace 5)
$vgpr0_vgpr1 = COPY %1

...

---

name: load_private_v2s64
legalized: true
regBankSelected: true
tracksRegLiveness: true
machineFunctionInfo:
scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
scratchWaveOffsetReg: $sgpr4
stackPtrOffsetReg: $sgpr32

body: |
bb.0:
liveins: $vgpr0

; GFX6-LABEL: name: load_private_v2s64
; GFX6: liveins: $vgpr0
; GFX6: [[COPY:%[0-9]+]]:vgpr(p5) = COPY $vgpr0
; GFX6: [[LOAD:%[0-9]+]]:vreg_128(<2 x s64>) = G_LOAD [[COPY]](p5) :: (load 16, align 4, addrspace 5)
; GFX6: $vgpr0_vgpr1_vgpr2_vgpr3 = COPY [[LOAD]](<2 x s64>)
; GFX9-LABEL: name: load_private_v2s64
; GFX9: liveins: $vgpr0
; GFX9: [[COPY:%[0-9]+]]:vgpr(p5) = COPY $vgpr0
; GFX9: [[LOAD:%[0-9]+]]:vreg_128(<2 x s64>) = G_LOAD [[COPY]](p5) :: (load 16, align 4, addrspace 5)
; GFX9: $vgpr0_vgpr1_vgpr2_vgpr3 = COPY [[LOAD]](<2 x s64>)
%0:vgpr(p5) = COPY $vgpr0
%1:vgpr(<2 x s64>) = G_LOAD %0 :: (load 16, align 4, addrspace 5)
$vgpr0_vgpr1_vgpr2_vgpr3 = COPY %1

...

---

name: load_private_v2p1
legalized: true
regBankSelected: true
tracksRegLiveness: true
machineFunctionInfo:
scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
scratchWaveOffsetReg: $sgpr4
stackPtrOffsetReg: $sgpr32

body: |
bb.0:
liveins: $vgpr0

; GFX6-LABEL: name: load_private_v2p1
; GFX6: liveins: $vgpr0
; GFX6: [[COPY:%[0-9]+]]:vgpr(p5) = COPY $vgpr0
; GFX6: [[LOAD:%[0-9]+]]:vreg_128(<2 x p1>) = G_LOAD [[COPY]](p5) :: (load 16, align 4, addrspace 5)
; GFX6: $vgpr0_vgpr1_vgpr2_vgpr3 = COPY [[LOAD]](<2 x p1>)
; GFX9-LABEL: name: load_private_v2p1
; GFX9: liveins: $vgpr0
; GFX9: [[COPY:%[0-9]+]]:vgpr(p5) = COPY $vgpr0
; GFX9: [[LOAD:%[0-9]+]]:vreg_128(<2 x p1>) = G_LOAD [[COPY]](p5) :: (load 16, align 4, addrspace 5)
; GFX9: $vgpr0_vgpr1_vgpr2_vgpr3 = COPY [[LOAD]](<2 x p1>)
%0:vgpr(p5) = COPY $vgpr0
%1:vgpr(<2 x p1>) = G_LOAD %0 :: (load 16, align 4, addrspace 5)
$vgpr0_vgpr1_vgpr2_vgpr3 = COPY %1

...

---

name: load_private_s128
legalized: true
regBankSelected: true
tracksRegLiveness: true
machineFunctionInfo:
scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
scratchWaveOffsetReg: $sgpr4
stackPtrOffsetReg: $sgpr32

body: |
bb.0:
liveins: $vgpr0

; GFX6-LABEL: name: load_private_s128
; GFX6: liveins: $vgpr0
; GFX6: [[COPY:%[0-9]+]]:vgpr(p5) = COPY $vgpr0
; GFX6: [[LOAD:%[0-9]+]]:vreg_128(s128) = G_LOAD [[COPY]](p5) :: (load 16, align 4, addrspace 5)
; GFX6: $vgpr0_vgpr1_vgpr2_vgpr3 = COPY [[LOAD]](s128)
; GFX9-LABEL: name: load_private_s128
; GFX9: liveins: $vgpr0
; GFX9: [[COPY:%[0-9]+]]:vgpr(p5) = COPY $vgpr0
; GFX9: [[LOAD:%[0-9]+]]:vreg_128(s128) = G_LOAD [[COPY]](p5) :: (load 16, align 4, addrspace 5)
; GFX9: $vgpr0_vgpr1_vgpr2_vgpr3 = COPY [[LOAD]](s128)
%0:vgpr(p5) = COPY $vgpr0
%1:vgpr(s128) = G_LOAD %0 :: (load 16, align 4, addrspace 5)
$vgpr0_vgpr1_vgpr2_vgpr3 = COPY %1

...

---

name: load_private_p3_from_4
legalized: true
regBankSelected: true
tracksRegLiveness: true

body: |
bb.0:
liveins: $vgpr0
[42 lines not shown]
%0:vgpr(p5) = COPY $vgpr0
%1:vgpr(p5) = G_LOAD %0 :: (load 4, align 4, addrspace 5)
$vgpr0 = COPY %1

...

---

name: load_private_p999_from_8
legalized: true
regBankSelected: true
tracksRegLiveness: true
machineFunctionInfo:
scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
scratchWaveOffsetReg: $sgpr4
stackPtrOffsetReg: $sgpr32

body: |
bb.0:
liveins: $vgpr0

; GFX6-LABEL: name: load_private_p999_from_8
; GFX6: liveins: $vgpr0
; GFX6: [[COPY:%[0-9]+]]:vgpr(p5) = COPY $vgpr0
; GFX6: [[LOAD:%[0-9]+]]:vreg_64(p999) = G_LOAD [[COPY]](p5) :: (load 8, addrspace 5)
; GFX6: $vgpr0_vgpr1 = COPY [[LOAD]](p999)
; GFX9-LABEL: name: load_private_p999_from_8
; GFX9: liveins: $vgpr0
; GFX9: [[COPY:%[0-9]+]]:vgpr(p5) = COPY $vgpr0
; GFX9: [[LOAD:%[0-9]+]]:vreg_64(p999) = G_LOAD [[COPY]](p5) :: (load 8, addrspace 5)
; GFX9: $vgpr0_vgpr1 = COPY [[LOAD]](p999)
%0:vgpr(p5) = COPY $vgpr0
%1:vgpr(p999) = G_LOAD %0 :: (load 8, align 8, addrspace 5)
$vgpr0_vgpr1 = COPY %1

...

---

name: load_private_v2p3
legalized: true
regBankSelected: true
tracksRegLiveness: true
machineFunctionInfo:
scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
scratchWaveOffsetReg: $sgpr4
stackPtrOffsetReg: $sgpr32

body: |
bb.0:
liveins: $vgpr0

; GFX6-LABEL: name: load_private_v2p3
; GFX6: liveins: $vgpr0
; GFX6: [[COPY:%[0-9]+]]:vgpr(p5) = COPY $vgpr0
; GFX6: [[LOAD:%[0-9]+]]:vreg_64(<2 x p3>) = G_LOAD [[COPY]](p5) :: (load 8, addrspace 5)
; GFX6: $vgpr0_vgpr1 = COPY [[LOAD]](<2 x p3>)
; GFX9-LABEL: name: load_private_v2p3
; GFX9: liveins: $vgpr0
; GFX9: [[COPY:%[0-9]+]]:vgpr(p5) = COPY $vgpr0
; GFX9: [[LOAD:%[0-9]+]]:vreg_64(<2 x p3>) = G_LOAD [[COPY]](p5) :: (load 8, addrspace 5)
; GFX9: $vgpr0_vgpr1 = COPY [[LOAD]](<2 x p3>)
%0:vgpr(p5) = COPY $vgpr0
%1:vgpr(<2 x p3>) = G_LOAD %0 :: (load 8, align 8, addrspace 5)
$vgpr0_vgpr1 = COPY %1

...

---

name: load_private_v2s16
legalized: true
regBankSelected: true
tracksRegLiveness: true
machineFunctionInfo:
scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
scratchWaveOffsetReg: $sgpr4
stackPtrOffsetReg: $sgpr32
[13 lines not shown]
; GFX9: [[LOAD:%[0-9]+]]:vgpr_32(<2 x s16>) = G_LOAD [[COPY]](p5) :: (load 4, addrspace 5)
; GFX9: $vgpr0 = COPY [[LOAD]](<2 x s16>)
%0:vgpr(p5) = COPY $vgpr0
%1:vgpr(<2 x s16>) = G_LOAD %0 :: (load 4, align 4, addrspace 5)
$vgpr0 = COPY %1

...

---

name: load_private_v4s16
legalized: true
regBankSelected: true
tracksRegLiveness: true

body: |
bb.0:
liveins: $vgpr0

; GFX6-LABEL: name: load_private_v4s16
; GFX6: liveins: $vgpr0
; GFX6: [[COPY:%[0-9]+]]:vgpr(p5) = COPY $vgpr0
; GFX6: [[LOAD:%[0-9]+]]:vreg_64(<4 x s16>) = G_LOAD [[COPY]](p5) :: (load 8, addrspace 5)
; GFX6: $vgpr0_vgpr1 = COPY [[LOAD]](<4 x s16>)
; GFX9-LABEL: name: load_private_v4s16
; GFX9: liveins: $vgpr0
; GFX9: [[COPY:%[0-9]+]]:vgpr(p5) = COPY $vgpr0
; GFX9: [[LOAD:%[0-9]+]]:vreg_64(<4 x s16>) = G_LOAD [[COPY]](p5) :: (load 8, addrspace 5)
; GFX9: $vgpr0_vgpr1 = COPY [[LOAD]](<4 x s16>)
%0:vgpr(p5) = COPY $vgpr0
%1:vgpr(<4 x s16>) = G_LOAD %0 :: (load 8, align 8, addrspace 5)
$vgpr0_vgpr1 = COPY %1

...

# ---

# name: load_private_v6s16
# legalized: true
# regBankSelected: true
# tracksRegLiveness: true
# machineFunctionInfo:
# scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
# scratchWaveOffsetReg: $sgpr4
# stackPtrOffsetReg: $sgpr32

# body: |
# bb.0:
# liveins: $vgpr0

# %0:vgpr(p5) = COPY $vgpr0
# %1:vgpr(<6 x s16>) = G_LOAD %0 :: (load 12, align 4, addrspace 5)
# $vgpr0_vgpr1_vgpr2 = COPY %1

# ...

---

name: load_private_v8s16
legalized: true
regBankSelected: true
tracksRegLiveness: true
machineFunctionInfo:
scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
scratchWaveOffsetReg: $sgpr4
stackPtrOffsetReg: $sgpr32

body: |
bb.0:
liveins: $vgpr0

; GFX6-LABEL: name: load_private_v8s16
; GFX6: liveins: $vgpr0
; GFX6: [[COPY:%[0-9]+]]:vgpr(p5) = COPY $vgpr0
; GFX6: [[LOAD:%[0-9]+]]:vreg_128(<8 x s16>) = G_LOAD [[COPY]](p5) :: (load 16, align 4, addrspace 5)
; GFX6: $vgpr0_vgpr1_vgpr2_vgpr3 = COPY [[LOAD]](<8 x s16>)
; GFX9-LABEL: name: load_private_v8s16
; GFX9: liveins: $vgpr0
; GFX9: [[COPY:%[0-9]+]]:vgpr(p5) = COPY $vgpr0
; GFX9: [[LOAD:%[0-9]+]]:vreg_128(<8 x s16>) = G_LOAD [[COPY]](p5) :: (load 16, align 4, addrspace 5)
; GFX9: $vgpr0_vgpr1_vgpr2_vgpr3 = COPY [[LOAD]](<8 x s16>)
%0:vgpr(p5) = COPY $vgpr0
%1:vgpr(<8 x s16>) = G_LOAD %0 :: (load 16, align 4, addrspace 5)
$vgpr0_vgpr1_vgpr2_vgpr3 = COPY %1

...

################################################################################
### Stress addressing modes
################################################################################

---

name: load_private_s32_from_1_gep_2047
legalized: true
[636 lines not shown]

test/CodeGen/AMDGPU/GlobalISel/legalize-load-constant.mir

This file was added.

This file has a very large number of changes (12,987 lines).

test/CodeGen/AMDGPU/GlobalISel/legalize-load-flat.mir

This file was added.

This file has a very large number of changes (11,355 lines).

test/CodeGen/AMDGPU/GlobalISel/legalize-load-global.mir

This file was added.

This file has a very large number of changes (15,547 lines).

test/CodeGen/AMDGPU/GlobalISel/legalize-load-local.mir

This file was added.

This file has a very large number of changes (10,444 lines).

test/CodeGen/AMDGPU/GlobalISel/legalize-load-private.mir

This file was added.

This file has a very large number of changes (11,099 lines).

test/CodeGen/AMDGPU/GlobalISel/legalize-load.mir

This file was deleted.

This file was completely deleted.

test/CodeGen/AMDGPU/GlobalISel/legalize-store.mir

	[137 lines not shown]
	name: test_store_global_v3s32			name: test_store_global_v3s32
	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0_vgpr1, $vgpr2_vgpr3_vgpr4			liveins: $vgpr0_vgpr1, $vgpr2_vgpr3_vgpr4

	; SI-LABEL: name: test_store_global_v3s32			; SI-LABEL: name: test_store_global_v3s32
	; SI: [[COPY:%[0-9]+]]:_(p1) = COPY $vgpr0_vgpr1			; SI: [[COPY:%[0-9]+]]:_(p1) = COPY $vgpr0_vgpr1
	; SI: [[COPY1:%[0-9]+]]:_(<3 x s32>) = COPY $vgpr2_vgpr3_vgpr4			; SI: [[COPY1:%[0-9]+]]:_(<3 x s32>) = COPY $vgpr2_vgpr3_vgpr4
	; SI: [[EXTRACT:%[0-9]+]]:_(<2 x s32>) = G_EXTRACT [[COPY1]](<3 x s32>), 0			; SI: G_STORE [[COPY1]](<3 x s32>), [[COPY]](p1) :: (store 12, align 4, addrspace 1)
	; SI: [[EXTRACT1:%[0-9]+]]:_(s32) = G_EXTRACT [[COPY1]](<3 x s32>), 64
	; SI: G_STORE [[EXTRACT]](<2 x s32>), [[COPY]](p1) :: (store 8, align 4, addrspace 1)
	; SI: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 8
	; SI: [[GEP:%[0-9]+]]:_(p1) = G_GEP [[COPY]], [[C]](s64)
	; SI: G_STORE [[EXTRACT1]](s32), [[GEP]](p1) :: (store 4, addrspace 1)
	; VI-LABEL: name: test_store_global_v3s32			; VI-LABEL: name: test_store_global_v3s32
	; VI: [[COPY:%[0-9]+]]:_(p1) = COPY $vgpr0_vgpr1			; VI: [[COPY:%[0-9]+]]:_(p1) = COPY $vgpr0_vgpr1
	; VI: [[COPY1:%[0-9]+]]:_(<3 x s32>) = COPY $vgpr2_vgpr3_vgpr4			; VI: [[COPY1:%[0-9]+]]:_(<3 x s32>) = COPY $vgpr2_vgpr3_vgpr4
	; VI: G_STORE [[COPY1]](<3 x s32>), [[COPY]](p1) :: (store 12, align 4, addrspace 1)			; VI: G_STORE [[COPY1]](<3 x s32>), [[COPY]](p1) :: (store 12, align 4, addrspace 1)
	%0:_(p1) = COPY $vgpr0_vgpr1			%0:_(p1) = COPY $vgpr0_vgpr1
	%1:_(<3 x s32>) = COPY $vgpr2_vgpr3_vgpr4			%1:_(<3 x s32>) = COPY $vgpr2_vgpr3_vgpr4
	G_STORE %1, %0 :: (store 12, align 4, addrspace 1)			G_STORE %1, %0 :: (store 12, align 4, addrspace 1)
	...			...
	[162 lines not shown]
	name: test_store_global_96			name: test_store_global_96
	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0_vgpr1_vgpr2, $vgpr3_vgpr4			liveins: $vgpr0_vgpr1_vgpr2, $vgpr3_vgpr4

	; SI-LABEL: name: test_store_global_96			; SI-LABEL: name: test_store_global_96
	; SI: [[COPY:%[0-9]+]]:_(s96) = COPY $vgpr0_vgpr1_vgpr2			; SI: [[COPY:%[0-9]+]]:_(s96) = COPY $vgpr0_vgpr1_vgpr2
	; SI: [[COPY1:%[0-9]+]]:_(p1) = COPY $vgpr3_vgpr4			; SI: [[COPY1:%[0-9]+]]:_(p1) = COPY $vgpr3_vgpr4
	; SI: [[EXTRACT:%[0-9]+]]:_(s64) = G_EXTRACT [[COPY]](s96), 0			; SI: G_STORE [[COPY]](s96), [[COPY1]](p1) :: (store 12, align 16, addrspace 1)
	; SI: [[EXTRACT1:%[0-9]+]]:_(s32) = G_EXTRACT [[COPY]](s96), 64
	; SI: G_STORE [[EXTRACT]](s64), [[COPY1]](p1) :: (store 8, align 16, addrspace 1)
	; SI: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 8
	; SI: [[GEP:%[0-9]+]]:_(p1) = G_GEP [[COPY1]], [[C]](s64)
	; SI: G_STORE [[EXTRACT1]](s32), [[GEP]](p1) :: (store 4, align 8, addrspace 1)
	; VI-LABEL: name: test_store_global_96			; VI-LABEL: name: test_store_global_96
	; VI: [[COPY:%[0-9]+]]:_(s96) = COPY $vgpr0_vgpr1_vgpr2			; VI: [[COPY:%[0-9]+]]:_(s96) = COPY $vgpr0_vgpr1_vgpr2
	; VI: [[COPY1:%[0-9]+]]:_(p1) = COPY $vgpr3_vgpr4			; VI: [[COPY1:%[0-9]+]]:_(p1) = COPY $vgpr3_vgpr4
	; VI: G_STORE [[COPY]](s96), [[COPY1]](p1) :: (store 12, align 16, addrspace 1)			; VI: G_STORE [[COPY]](s96), [[COPY1]](p1) :: (store 12, align 16, addrspace 1)
	%0:_(s96) = COPY $vgpr0_vgpr1_vgpr2			%0:_(s96) = COPY $vgpr0_vgpr1_vgpr2
	%1:_(p1) = COPY $vgpr3_vgpr4			%1:_(p1) = COPY $vgpr3_vgpr4

	G_STORE %0, %1 :: (store 12, addrspace 1, align 16)			G_STORE %0, %1 :: (store 12, addrspace 1, align 16)
	[43 lines not shown]
	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0_vgpr1, $vgpr2_vgpr3_vgpr4_vgpr5			liveins: $vgpr0_vgpr1, $vgpr2_vgpr3_vgpr4_vgpr5

	; SI-LABEL: name: test_store_global_v3s8_align4			; SI-LABEL: name: test_store_global_v3s8_align4
	; SI: [[COPY:%[0-9]+]]:_(p1) = COPY $vgpr0_vgpr1			; SI: [[COPY:%[0-9]+]]:_(p1) = COPY $vgpr0_vgpr1
	; SI: [[DEF:%[0-9]+]]:_(<3 x s8>) = G_IMPLICIT_DEF			; SI: [[DEF:%[0-9]+]]:_(<3 x s8>) = G_IMPLICIT_DEF
	; SI: [[DEF1:%[0-9]+]]:_(<4 x s8>) = G_IMPLICIT_DEF			; SI: [[DEF1:%[0-9]+]]:_(<4 x s8>) = G_IMPLICIT_DEF
	; SI: [[INSERT:%[0-9]+]]:_(<4 x s8>) = G_INSERT [[DEF1]], [[DEF]](<3 x s8>), 0			; SI: [[ANYEXT:%[0-9]+]]:_(<4 x s16>) = G_ANYEXT [[DEF1]](<4 x s8>)
	; SI: G_STORE [[INSERT]](<4 x s8>), [[COPY]](p1) :: (store 3, align 4, addrspace 1)			; SI: [[INSERT:%[0-9]+]]:_(<4 x s16>) = G_INSERT [[ANYEXT]], [[DEF]](<3 x s8>), 0
				; SI: [[TRUNC:%[0-9]+]]:_(<4 x s8>) = G_TRUNC [[INSERT]](<4 x s16>)
				; SI: [[UV:%[0-9]+]]:_(s8), [[UV1:%[0-9]+]]:_(s8), [[UV2:%[0-9]+]]:_(s8), [[UV3:%[0-9]+]]:_(s8) = G_UNMERGE_VALUES [[TRUNC]](<4 x s8>)
				; SI: [[ANYEXT1:%[0-9]+]]:_(s32) = G_ANYEXT [[UV]](s8)
				; SI: G_STORE [[ANYEXT1]](s32), [[COPY]](p1) :: (store 1, align 4, addrspace 1)
				; SI: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 1
				; SI: [[GEP:%[0-9]+]]:_(p1) = G_GEP [[COPY]], [[C]](s64)
				; SI: [[ANYEXT2:%[0-9]+]]:_(s32) = G_ANYEXT [[UV1]](s8)
				; SI: G_STORE [[ANYEXT2]](s32), [[GEP]](p1) :: (store 1, addrspace 1)
				; SI: [[C1:%[0-9]+]]:_(s64) = G_CONSTANT i64 2
				; SI: [[GEP1:%[0-9]+]]:_(p1) = G_GEP [[COPY]], [[C1]](s64)
				; SI: [[ANYEXT3:%[0-9]+]]:_(s32) = G_ANYEXT [[UV2]](s8)
				; SI: G_STORE [[ANYEXT3]](s32), [[GEP1]](p1) :: (store 1, align 2, addrspace 1)
				; SI: [[C2:%[0-9]+]]:_(s64) = G_CONSTANT i64 3
				; SI: [[GEP2:%[0-9]+]]:_(p1) = G_GEP [[COPY]], [[C2]](s64)
				; SI: [[ANYEXT4:%[0-9]+]]:_(s32) = G_ANYEXT [[UV3]](s8)
				; SI: G_STORE [[ANYEXT4]](s32), [[GEP2]](p1) :: (store 1, addrspace 1)
	; VI-LABEL: name: test_store_global_v3s8_align4			; VI-LABEL: name: test_store_global_v3s8_align4
	; VI: [[COPY:%[0-9]+]]:_(p1) = COPY $vgpr0_vgpr1			; VI: [[COPY:%[0-9]+]]:_(p1) = COPY $vgpr0_vgpr1
	; VI: [[DEF:%[0-9]+]]:_(<3 x s8>) = G_IMPLICIT_DEF			; VI: [[DEF:%[0-9]+]]:_(<3 x s8>) = G_IMPLICIT_DEF
	; VI: [[DEF1:%[0-9]+]]:_(<4 x s8>) = G_IMPLICIT_DEF			; VI: [[DEF1:%[0-9]+]]:_(<4 x s8>) = G_IMPLICIT_DEF
	; VI: [[INSERT:%[0-9]+]]:_(<4 x s8>) = G_INSERT [[DEF1]], [[DEF]](<3 x s8>), 0			; VI: [[ANYEXT:%[0-9]+]]:_(<4 x s16>) = G_ANYEXT [[DEF1]](<4 x s8>)
	; VI: G_STORE [[INSERT]](<4 x s8>), [[COPY]](p1) :: (store 3, align 4, addrspace 1)			; VI: [[INSERT:%[0-9]+]]:_(<4 x s16>) = G_INSERT [[ANYEXT]], [[DEF]](<3 x s8>), 0
				; VI: [[TRUNC:%[0-9]+]]:_(<4 x s8>) = G_TRUNC [[INSERT]](<4 x s16>)
				; VI: [[UV:%[0-9]+]]:_(s8), [[UV1:%[0-9]+]]:_(s8), [[UV2:%[0-9]+]]:_(s8), [[UV3:%[0-9]+]]:_(s8) = G_UNMERGE_VALUES [[TRUNC]](<4 x s8>)
				; VI: [[ANYEXT1:%[0-9]+]]:_(s32) = G_ANYEXT [[UV]](s8)
				; VI: G_STORE [[ANYEXT1]](s32), [[COPY]](p1) :: (store 1, align 4, addrspace 1)
				; VI: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 1
				; VI: [[GEP:%[0-9]+]]:_(p1) = G_GEP [[COPY]], [[C]](s64)
				; VI: [[ANYEXT2:%[0-9]+]]:_(s32) = G_ANYEXT [[UV1]](s8)
				; VI: G_STORE [[ANYEXT2]](s32), [[GEP]](p1) :: (store 1, addrspace 1)
				; VI: [[C1:%[0-9]+]]:_(s64) = G_CONSTANT i64 2
				; VI: [[GEP1:%[0-9]+]]:_(p1) = G_GEP [[COPY]], [[C1]](s64)
				; VI: [[ANYEXT3:%[0-9]+]]:_(s32) = G_ANYEXT [[UV2]](s8)
				; VI: G_STORE [[ANYEXT3]](s32), [[GEP1]](p1) :: (store 1, align 2, addrspace 1)
				; VI: [[C2:%[0-9]+]]:_(s64) = G_CONSTANT i64 3
				; VI: [[GEP2:%[0-9]+]]:_(p1) = G_GEP [[COPY]], [[C2]](s64)
				; VI: [[ANYEXT4:%[0-9]+]]:_(s32) = G_ANYEXT [[UV3]](s8)
				; VI: G_STORE [[ANYEXT4]](s32), [[GEP2]](p1) :: (store 1, addrspace 1)
	%0:_(p1) = COPY $vgpr0_vgpr1			%0:_(p1) = COPY $vgpr0_vgpr1
	%1:_(<3 x s8>) = G_IMPLICIT_DEF			%1:_(<3 x s8>) = G_IMPLICIT_DEF
	G_STORE %1, %0 :: (store 3, addrspace 1, align 4)			G_STORE %1, %0 :: (store 3, addrspace 1, align 4)

	...			...
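
The split stores in the CHECK lines above carry alignments that follow from the original access alignment and each piece's byte offset. The sketch below is not the legalizer's code. It is a minimal Python illustration with hypothetical helper names (`min_align`, `split_alignments`), assuming each piece's alignment is the largest power of two that divides both the base alignment and the piece's offset, which is the quantity LLVM's `MinAlign` helper computes. Under that assumption it reproduces the alignments printed for test_store_global_v3s8_align4 (4, 1, 2, 1; an alignment equal to the access size is omitted in the MIR memory operand) and the `align 8` on the second piece of the old SI split in test_store_global_96.

```python
# Illustrative only: helper names are hypothetical, not LLVM APIs.

def min_align(a: int, b: int) -> int:
    """Largest power of two dividing both a and b (at least one nonzero)."""
    v = a | b
    # v & -v isolates the lowest set bit of v.
    return v & -v


def split_alignments(base_align: int, piece_size: int, num_pieces: int):
    """Return (byte offset, alignment) for each equally sized piece of a
    memory access that starts at an address aligned to base_align."""
    return [
        (i * piece_size, min_align(base_align, i * piece_size))
        for i in range(num_pieces)
    ]


if __name__ == "__main__":
    # Four 1-byte pieces of an access with align 4 (the v3s8 case above).
    for offset, align in split_alignments(4, 1, 4):
        print(f"offset {offset}: align {align}")
```

Calling `split_alignments(16, 8, 2)` gives the offsets and alignments of an 8-byte piece followed by a piece at offset 8, matching the left-hand SI lines of test_store_global_96.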