Split UnalignedAccessMode into UnalignedDSAccess and UnalignedBufferAccess so
that only UnalignedBufferAccess can be enabled by default for AMDHSA
Inline comment on llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp, line 96:
HSA wants unaligned DS access as well. That is, only underaligned ds_read/write_b128 should not be produced, for performance reasons.
I thought these were still the same hardware control, so they aren't truly different features.
I see some comments from @rampitec and @arsenm, and @rampitec also made the comments below in the related internal JIRA ticket.
Comments from @rampitec in the JIRA ticket: "Disabling unaligned dword access is not acceptable for HSA. My proposal was simple: prefer ds_read2_b64 (and write) over b128 if alignment < 16. I never heard of b64 performance issues, though, so the rest is the same as now: create the widest load/store possible, with that one exception for b128."
From the JIRA comments, I also see that @foad agrees with the above. So it looks like we should avoid emitting ds_read_b128/ds_write_b128 when the alignment is < 16. I am not sure what you think here, but your comments would be very valuable for making further progress.
If we all agree with the above, then it looks like the file DSInstructions.td needs the changes below.
The code
let SubtargetPredicate = HasUnalignedDSAccess in {

foreach vt = VReg_96.RegTypes in {
defm : DSReadPat_mc <DS_READ_B96, vt, "load_local">;
}

foreach vt = VReg_128.RegTypes in {
defm : DSReadPat_mc <DS_READ_B128, vt, "load_local">;
}

} // End SubtargetPredicate = HasUnalignedDSAccess
should change to
let SubtargetPredicate = HasUnalignedDSAccess in {

foreach vt = VReg_96.RegTypes in {
defm : DSReadPat_mc <DS_READ_B96, vt, "load_local">;
}

foreach vt = VReg_128.RegTypes in {
defm : DSReadPat_mc <DS_READ_B128, vt, "load_align16_local">;
}

} // End SubtargetPredicate = HasUnalignedDSAccess
and the code
let SubtargetPredicate = HasUnalignedDSAccess in {

foreach vt = VReg_96.RegTypes in {
defm : DSWritePat_mc <DS_WRITE_B96, vt, "store_local">;
}

foreach vt = VReg_128.RegTypes in {
defm : DSWritePat_mc <DS_WRITE_B128, vt, "store_local">;
}

} // End SubtargetPredicate = HasUnalignedDSAccess
should change to
let SubtargetPredicate = HasUnalignedDSAccess in {

foreach vt = VReg_96.RegTypes in {
defm : DSWritePat_mc <DS_WRITE_B96, vt, "store_local">;
}

foreach vt = VReg_128.RegTypes in {
defm : DSWritePat_mc <DS_WRITE_B128, vt, "store_align16_local">;
}

} // End SubtargetPredicate = HasUnalignedDSAccess
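One caveat with the above: it relies on load_align16_local and store_align16_local being defined as alignment-restricted fragments. I have not checked whether they already exist in AMDGPUInstructions.td; if not, a minimal sketch of what they could look like, modelled on the existing load_align8_local (the exact field names here are my assumption, not verified against the tree):

def load_align16_local : PatFrag <(ops node:$ptr), (load_local node:$ptr)> {
  let IsLoad = 1;
  let IsNonExtLoad = 1;
  // Only match loads whose known alignment is at least 16 bytes.
  let MinAlignment = 16;
}

def store_align16_local : PatFrag <(ops node:$val, node:$ptr),
                                   (store_local node:$val, node:$ptr)> {
  let IsStore = 1;
  let IsTruncStore = 0;
  // Only match stores whose known alignment is at least 16 bytes.
  let MinAlignment = 16;
}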
I strongly disagree with splitting features in a way that does not correspond with the hardware controls to match a desired output.
We have since confirmed that ds_read_b64 has the same performance hit on memory not aligned to 64 bits, so 64-bit operations need an alignment check too.
But I see that the ISel pattern below for DS_READ_B64 does check the alignment requirement, though I am not sure whether it is overridden somewhere else.
DSInstructions.td:717-719
foreach vt = VReg_64.RegTypes in {
defm : DSReadPat_mc <DS_READ_B64, vt, "load_align8_local">;
}
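For context (roughly from memory, so the details may differ from the current tree), DSReadPat_mc only resolves that fragment name for the m0-init and gfx9 variants, so both variants end up using the alignment-restricted fragment:

multiclass DSReadPat_mc <DS_Pseudo inst, ValueType vt, string frag> {
  // GFX6-8: LDS access requires M0 to be initialized; use the "_m0" fragment.
  let OtherPredicates = [LDSRequiresM0Init] in {
    def : DSReadPat<inst, vt, !cast<PatFrag>(frag#"_m0")>;
  }
  // GFX9+: no M0 init required; select the "_gfx9" pseudo with the plain fragment.
  let OtherPredicates = [NotLDSRequiresM0Init] in {
    def : DSReadPat<!cast<DS_Pseudo>(!cast<string>(inst)#"_gfx9"), vt,
                    !cast<PatFrag>(frag)>;
  }
}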
Further, based on a closer look at the code, I think we should in general revert commit 0654ff703d4e99423133165db63083b831efb9b6 in order to avoid the above performance issues caused by unaligned access. But I am not aware of the consequences of reverting that patch.
@mbrkusanin / @foad What is your opinion on reverting the patch 0654ff703d4e99423133165db63083b831efb9b6?
0654ff703d4e99423133165db63083b831efb9b6 (aka D84403) had good effects like selecting ds_write_b96 when the alignment was 16. It would be a shame to lose that.