This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Separate R600 and GCN TableGen files
ClosedPublic

Authored by tstellar on May 2 2018, 1:17 PM.

Details

Reviewers

arsenm
nhaehnle
jvesely

Commits

rGc5a154db48c3: AMDGPU: Separate R600 and GCN TableGen files
rL335942: AMDGPU: Separate R600 and GCN TableGen files

Summary

We now have two sets of generated TableGen files, one for R600 and one
for GCN, so each sub-target now has its own tables of instructions,
registers, ISel patterns, etc. This should help reduce compile time
since each sub-target now only has to consider information that
is specific to itself. This will also help prevent the R600
sub-target from slowing down new features for GCN, like disassembler
support, GlobalISel, etc.

Diff Detail

Repository

rL LLVM

Build Status

Buildable 19809
Build 19809: arc lint + arc unit

Event Timeline

tstellar created this revision. May 2 2018, 1:17 PM

Herald added subscribers: javed.absar, t-tye, tpr and 6 others. May 2 2018, 1:17 PM

Harbormaster completed remote builds in B17626: Diff 144924. May 2 2018, 1:17 PM

kzhuravl added inline comments. May 2 2018, 1:23 PM

lib/Target/AMDGPU/Processors.td
14 ↗	(On Diff #144924)	Aren't GCN processors defined in GCNProcessors.td? I do not see it being removed or modified in this change.

tstellar added inline comments. May 2 2018, 1:26 PM

lib/Target/AMDGPU/Processors.td
14 ↗	(On Diff #144924)	This file isn't used at all, I think it was a rebase artifact. I can remove it.

arsenm added inline comments. May 3 2018, 4:28 AM

lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
141 ↗	(On Diff #144924)	Should this be a separate class as well?
lib/Target/AMDGPU/AMDGPUSubtarget.h
66	Is it possible to avoid making these virtual?

tstellar added inline comments. May 3 2018, 9:19 AM

lib/Target/AMDGPU/AMDGPUSubtarget.h
66	I will look through this again and see if I can eliminate some of these virtual functions, but to get rid of all of them we have a few options:
1. We could eliminate the AMDGPUCommonSubtarget super class and then, in code shared between r600 and amdgcn (which is mostly IR passes and a few remaining classes like AMDGPUTargetLowering, AMDGPUAsmPrinter, etc.), do something like:
    bool IsAmdHsaOS;
    if (Triple.getArch() == Triple::amdgcn)
      IsAmdHsaOS = static_cast<SISubtarget>(Subtarget).isAmdHsaOS();
    else
      IsAmdHsaOS = static_cast<R600Subtarget>(Subtarget).isAmdHsaOS();
2. Remove subtarget checks from shared classes by refactoring code into r600/gcn specific classes.
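
The first option above can be sketched in self-contained C++. Every type and function below (`Arch`, `SubtargetBase`, `SISubtargetSketch`, `R600SubtargetSketch`, `isAmdHsaOSFor`) is a stand-in invented for illustration, not the real LLVM class, and the R600 result is an arbitrary assumption. The sketch only shows that shared code can branch on the architecture and `static_cast` to the concrete subtarget, so `isAmdHsaOS()` does not need to be virtual.

```cpp
#include <cassert>

// Hypothetical stand-ins for the real LLVM types; names are illustrative only.
enum class Arch { r600, amdgcn };

struct SubtargetBase {
  Arch TargetArch;
  explicit SubtargetBase(Arch A) : TargetArch(A) {}
};

struct SISubtargetSketch : SubtargetBase {
  bool HsaOS;
  explicit SISubtargetSketch(bool Hsa)
      : SubtargetBase(Arch::amdgcn), HsaOS(Hsa) {}
  bool isAmdHsaOS() const { return HsaOS; } // non-virtual
};

struct R600SubtargetSketch : SubtargetBase {
  R600SubtargetSketch() : SubtargetBase(Arch::r600) {}
  // Assumption for this sketch: the R600 side always answers false.
  bool isAmdHsaOS() const { return false; } // non-virtual
};

// Shared code selects the concrete type from the architecture instead of
// making a virtual call through a common base class.
bool isAmdHsaOSFor(const SubtargetBase &ST) {
  if (ST.TargetArch == Arch::amdgcn)
    return static_cast<const SISubtargetSketch &>(ST).isAmdHsaOS();
  return static_cast<const R600SubtargetSketch &>(ST).isAmdHsaOS();
}
```

The trade-off is that every shared call site repeats the architecture check, which is presumably why the second option (moving such checks into r600/gcn specific classes) is listed as the alternative.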

Removed unused Processors.td file and made all AMDGPUCommonSubtarget
functions non-virtual.

arsenm added inline comments. May 23 2018, 11:57 PM

lib/Target/AMDGPU/AMDGPUInstrInfo.cpp
27	Commented out code
lib/Target/AMDGPU/AMDGPUInstructions.td
49	Should probably rename this at some point
lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
127–130 ↗	(On Diff #148332)	Why is there a difference here?
lib/Target/AMDGPU/AMDGPUSubtarget.h
207–208	Why isn't this SISubtarget/GCNSubtarget?
436–437	Why is this needed outside of GCN code?
lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
589	Can’t this be in a gcn class? I think all of this is for packed anyway
lib/Target/AMDGPU/R600ISelLowering.cpp
1583–1585	Probably should reject these, but that's a separate change
lib/Target/AMDGPU/R600IntrinsicInfo.cpp
23–24 ↗	(On Diff #148332)	We can probably drop the whole class. We don't have very many intrinsic definitions left in the backend, and this code never really worked well to begin with
lib/Target/AMDGPU/SIInsertWaitcnts.cpp
936–937	Fix formatting

tstellar marked 8 inline comments as done. May 31 2018, 11:55 AM

tstellar added inline comments.

lib/Target/AMDGPU/AMDGPUSubtarget.h
207–208	I was planning to rename this as a follow on patch to avoid creating even more churn in this patch.
436–437	It's not. I've dropped the R600 implementation of this.

Rebase patch after splitting some of the changes requested
into separate patches.

Rebase patch on latest ToT.

Harbormaster completed remote builds in B19312: Diff 151267. Jun 13 2018, 4:05 PM

I've tried the updated version of the patch, although it did not apply cleanly. It also causes GPU hangs on my turks in piglit tests.

In D46365#1132085, @jvesely wrote:

I've tried the updated version of the patch, although it did not apply cleanly. It also causes GPU hangs on my turks in piglit tests.

Do you have a test case?

I assume that there is no change in generated code intended for r600 (EG/CM).
These are the changes in piglit tests I noticed:

< 	MEM_RAT_CACHELESS STORE_RAW T0.X, T1.X, 1
---
> 	MEM_RAT_CACHELESS STORE_DWORD T0.X, T1.X

There are other changes with respect to register allocation and the packetizer, but this one looks the most suspicious. My turks is TS2, and STORE_DWORD is not defined in its ISA (STORE_RAW is the only allowed opcode for the CACHELESS target). According to the Cayman ISA, STORE_DWORD is opcode 20 (vs. opcode 2 for STORE_RAW), which is reserved on TS2. The instruction also lost the offset.

Now, there are tests for MEMRAT_CACHELESS stores, and they pass, so I guess there is another untested store path that got mixed up between TS2 and TS3.
I can paste the .ll file if you're interested.

A quick update: running llc manually on the kernel .ll (dumped using CLOVER_DEBUG=llvm) produces correct assembly. Running it in clover generates incorrect code (dumped using CLOVER_DEBUG=native) and hangs the GPU.

lib/Target/AMDGPU/AMDGPUFeatures.td
53	git complains about blank line at the end of file here
lib/Target/AMDGPU/R600ISelLowering.cpp
237	git complains about whitespace error in this location

I added the below snippet to check whether the caymanISA feature gets initialized correctly:

@@ -415,7 +417,10 @@ R600Subtarget::R600Subtarget(const Triple &TT, StringRef GPU, StringRef FS,
   TLInfo(TM, initializeSubtargetDependencies(TT, GPU, FS)),
   DX10Clamp(false),
   InstrItins(getInstrItineraryForCPU(GPU)),
-  AS (AMDGPU::getAMDGPUAS(TT)) { }
+  AS (AMDGPU::getAMDGPUAS(TT)) {
+  fprintf(stderr, "R600 FEATURE STRING: %s\n", FS.data());
+  fprintf(stderr, "R600 Has Cayman ISA: %s\n", CaymanISA ? "YES" : "NO");
+}

As expected, it intermittently printed:

'-fp32-denormals' is not a recognized feature for this target (ignoring feature)
'-fp32-denormals' is not a recognized feature for this target (ignoring feature)
'-fp32-denormals' is not a recognized feature for this target (ignoring feature)
R600 FEATURE STRING: -fp32-denormals
R600 Has Cayman ISA: YES

Running llc through valgrind produced a flood of 'Conditional jump or move depends on uninitialised value(s)' errors:
269 errors from 24 contexts. Initializing just CaymanISA in R600Subtarget gets rid of most of them.
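
The symptom is consistent with a feature flag that is only written when its feature appears in the feature string. Below is a minimal C++ sketch of that failure mode and the usual remedy. `SubtargetSketch` and `parseFeatures` are hypothetical stand-ins, not the actual `R600Subtarget` code, and the parsing is deliberately simplified. Giving each flag a default member initializer means an absent feature reads as `false` rather than as indeterminate memory.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical sketch; this is not the real R600Subtarget.
struct SubtargetSketch {
  // Default member initializers: without them, a flag whose feature is
  // absent from the feature string would be read while uninitialized.
  bool CaymanISA = false;
  bool CFALUBug = false;

  explicit SubtargetSketch(const std::string &FS) { parseFeatures(FS); }

  // Parse a comma-separated list such as "+caymanISA,-cfalubug".
  void parseFeatures(const std::string &FS) {
    std::istringstream In(FS);
    std::string Item;
    while (std::getline(In, Item, ',')) {
      if (Item.size() < 2)
        continue;
      const bool Enable = Item[0] == '+';
      const std::string Name = Item.substr(1);
      if (Name == "caymanISA")
        CaymanISA = Enable;
      else if (Name == "cfalubug")
        CFALUBug = Enable;
      // Unknown names (for example "fp32-denormals") are ignored here.
    }
  }
};
```

With the initializers in place, constructing the sketch from the feature string `-fp32-denormals` seen in the log leaves `CaymanISA` deterministically `false`.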

In D46365#1133213, @jvesely wrote:

Now, there are tests for MEMRAT_CACHELESS stores, and they pass, so I guess there is another untested store path that got mixed up between TS2 and TS3.
I can paste the .ll file if you're interested.

Yes, that would be helpful.

Rebase and fix some uninitialized variables in R600Subtarget.

In D46365#1140194, @jvesely wrote:

Running llc through valgrind produced a flood of 'Conditional jump or move depends on uninitialised value(s)' errors:
269 errors from 24 contexts. Initializing just CaymanISA in R600Subtarget gets rid of most of them.

These should be fixed now, can you re-test?

arsenm added inline comments. Jun 26 2018, 12:44 AM

lib/Target/AMDGPU/AMDGPUFeatures.td
2	Missing header comment
lib/Target/AMDGPU/AMDGPUSubtarget.h
475–476	Why are these leftover as virtual?
lib/Target/AMDGPU/R600.td
2	Missing header comment

In D46365#1141392, @tstellar wrote:

In D46365#1140194, @jvesely wrote:

Running llc through valgrind produced a flood of 'Conditional jump or move depends on uninitialised value(s)' errors:
269 errors from 24 contexts. Initializing just CaymanISA in R600Subtarget gets rid of most of them.

These should be fixed now, can you re-test?

Fails to build:
llvm-tblgen: Unknown command line argument '-gen-tgt-intrinsic'. Try: '../../../bin/llvm-tblgen -help'
llvm-tblgen: Did you mean '-gen-tgt-intrinsic-impl'?
make[2]: *** [lib/Target/AMDGPU/CMakeFiles/AMDGPUCommonTableGen.dir/build.make:1730: lib/Target/AMDGPU/R600GenIntrinsics.inc.tmp] Error 1

In D46365#1140270, @tstellar wrote:

In D46365#1133213, @jvesely wrote:

Now, there are tests for MEMRAT_CACHELESS stores, and they pass, so I guess there is another untested store path that got mixed up between TS2 and TS3.
I can paste the .ll file if you're interested.

Yes, that would be helpful.

https://people.freedesktop.org/~jvesely/llvm/

See test cases 46 and 48; the "n-" and "new-" prefixed versions are the result of the previous iteration of this patch.

In D46365#1146098, @jvesely wrote:

In D46365#1141392, @tstellar wrote:

In D46365#1140194, @jvesely wrote:

Running llc through valgrind produced a flood of 'Conditional jump or move depends on uninitialised value(s)' errors:
269 errors from 24 contexts. Initializing just CaymanISA in R600Subtarget gets rid of most of them.

These should be fixed now, can you re-test?

Fails to build:
llvm-tblgen: Unknown command line argument '-gen-tgt-intrinsic'. Try: '../../../bin/llvm-tblgen -help'
llvm-tblgen: Did you mean '-gen-tgt-intrinsic-impl'?
make[2]: *** [lib/Target/AMDGPU/CMakeFiles/AMDGPUCommonTableGen.dir/build.make:1730: lib/Target/AMDGPU/R600GenIntrinsics.inc.tmp] Error 1

After fixing the build file as tblgen suggested (and a few local fixes in my own patches), it builds OK and there are no piglit regressions on my turks.
I think this should land rather soon, with a bunch of cleanup follow ups. Having things (files, classes) that are prefixed R600, AMDGPU, AMDGPUCommon, GCN, AMDGCN, and SI is rather confusing.

Rebase and stop generating intrinsic info for R600, we don't need this.

Harbormaster completed remote builds in B19808: Diff 153250. Jun 27 2018, 8:53 PM

In D46365#1146123, @jvesely wrote:

In D46365#1146098, @jvesely wrote:

Fails to build:
llvm-tblgen: Unknown command line argument '-gen-tgt-intrinsic'. Try: '../../../bin/llvm-tblgen -help'
llvm-tblgen: Did you mean '-gen-tgt-intrinsic-impl'?
make[2]: *** [lib/Target/AMDGPU/CMakeFiles/AMDGPUCommonTableGen.dir/build.make:1730: lib/Target/AMDGPU/R600GenIntrinsics.inc.tmp] Error 1

After fixing the build file as tblgen suggested (and a few local fixes in my own patches), it builds OK and there are no piglit regressions on my turks.

IntrinsicInfo isn't needed any more, so I dropped this.

I think this should land rather soon, with a bunch of cleanup follow ups. Having things (files, classes) that are prefixed R600, AMDGPU, AMDGPUCommon, GCN, AMDGCN, and SI is rather confusing.

Ok, I can start working on this once this patch lands.

tstellar marked 3 inline comments as done. Jun 27 2018, 9:07 PM

Add missing headers to tablegen files and remove virtual functions
from AMDGPUSubtarget.

LGTM

lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
735–739	I would expect this to be a separate function, but not sure where this would go

This revision is now accepted and ready to land. Jun 28 2018, 12:00 AM

tstellar added inline comments. Jun 28 2018, 9:18 AM

lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
735–739	We can refactor AMDGPUMCInstLower.cpp so that this can be in its own function. I can work on this as one of the follow on clean ups.

Closed by commit rL335942: AMDGPU: Separate R600 and GCN TableGen files (authored by tstellar). Jun 28 2018, 4:52 PM

This revision was automatically updated to reflect the committed changes.

Hi! We encountered a UBSan runtime error after this was merged. I wrote a bug report about it: https://bugs.llvm.org/show_bug.cgi?id=38071.

Revision Contents

Path                              Size
lib/Target/AMDGPU/
AMDGPU.td                         148 lines
AMDGPUCallingConv.td              17 lines
AMDGPUFeatures.td                 60 lines
AMDGPUISelDAGToDAG.cpp            104 lines
AMDGPUISelLowering.h              7 lines
AMDGPUISelLowering.cpp            57 lines
AMDGPUInstrInfo.h                 22 lines
AMDGPUInstrInfo.cpp               102 lines
AMDGPUInstructions.td             88 lines
AMDGPUIntrinsics.td               2 lines
AMDGPULowerIntrinsics.cpp         3 lines
AMDGPUPromoteAlloca.cpp           15 lines
AMDGPURegisterInfo.td             1 line
AMDGPUSubtarget.h                 613 lines
AMDGPUSubtarget.cpp               116 lines
AMDGPUTargetMachine.h             14 lines
AMDGPUTargetTransformInfo.h       17 lines
AMDGPUTargetTransformInfo.cpp     2 lines
AMDILCFGStructurizer.cpp          92 lines
CMakeLists.txt                    11 lines
Disassembler/
AMDGPUDisassembler.cpp            1 line
EvergreenInstructions.td          7 lines
InstPrinter/
AMDGPUInstPrinter.h               11 lines
AMDGPUInstPrinter.cpp             95 lines
MCTargetDesc/
AMDGPUMCTargetDesc.h              16 lines
AMDGPUMCTargetDesc.cpp            25 lines
CMakeLists.txt                    1 line
R600MCCodeEmitter.cpp             35 lines
                                  27 lines
                                  3 lines
                                  59 lines
                                  2 lines
R600ClauseMergePass.cpp           26 lines
R600ControlFlowFinalizer.cpp      106 lines
R600EmitClauseMarkers.cpp         48 lines
R600ExpandSpecialInstrs.cpp       55 lines
                                  3 lines
                                  340 lines
                                  2 lines
                                  9 lines
                                  429 lines
                                  93 lines
R600MachineScheduler.cpp          62 lines
R600OptimizeVectorRegisters.cpp   14 lines
                                  45 lines
                                  56 lines
                                  7 lines
                                  58 lines
                                  2 lines
                                  2 lines
                                  4 lines
                                  3 lines
                                  68 lines
                                  7 lines
                                  2 lines
                                  16 lines
                                  105 lines
                                  5 lines
                                  3 lines
                                  1 line
                                  2 lines
Utils/
AMDGPUBaseInfo.h                  2 lines
AMDGPUBaseInfo.cpp                14 lines

Diff 153252

lib/Target/AMDGPU/AMDGPU.td

	//===-- AMDGPU.td - AMDGPU Tablegen files --------- tablegen --===//			//===-- AMDGPU.td - AMDGPU Tablegen files --------- tablegen --===//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===------------------------------------------------------------===//			//===------------------------------------------------------------===//

	include "llvm/TableGen/SearchableTable.td"			include "llvm/TableGen/SearchableTable.td"
	include "llvm/Target/Target.td"			include "llvm/Target/Target.td"
				include "AMDGPUFeatures.td"

	//===------------------------------------------------------------===//			//===------------------------------------------------------------===//
	// Subtarget Features (device properties)			// Subtarget Features (device properties)
	//===------------------------------------------------------------===//			//===------------------------------------------------------------===//

	def FeatureFP64 : SubtargetFeature<"fp64",
	"FP64",
	"true",
	"Enable double precision operations"
	>;

	def FeatureFMA : SubtargetFeature<"fmaf",
	"FMA",
	"true",
	"Enable single precision FMA (not as fast as mul+add, but fused)"
	>;

	def FeatureFastFMAF32 : SubtargetFeature<"fast-fmaf",			def FeatureFastFMAF32 : SubtargetFeature<"fast-fmaf",
	"FastFMAF32",			"FastFMAF32",
	"true",			"true",
	"Assuming f32 fma is at least as fast as mul + add"			"Assuming f32 fma is at least as fast as mul + add"
	>;			>;

	def FeatureMIMG_R128 : SubtargetFeature<"mimg-r128",			def FeatureMIMG_R128 : SubtargetFeature<"mimg-r128",
	"MIMG_R128",			"MIMG_R128",
	"true",			"true",
	"Support 128-bit texture resources"			"Support 128-bit texture resources"
	>;			>;

	def HalfRate64Ops : SubtargetFeature<"half-rate-64-ops",			def HalfRate64Ops : SubtargetFeature<"half-rate-64-ops",
	"HalfRate64Ops",			"HalfRate64Ops",
	"true",			"true",
	"Most fp64 instructions are half rate instead of quarter"			"Most fp64 instructions are half rate instead of quarter"
	>;			>;

	def FeatureR600ALUInst : SubtargetFeature<"R600ALUInst",
	"R600ALUInst",
	"false",
	"Older version of ALU instructions encoding"
	>;

	def FeatureVertexCache : SubtargetFeature<"HasVertexCache",
	"HasVertexCache",
	"true",
	"Specify use of dedicated vertex cache"
	>;

	def FeatureCaymanISA : SubtargetFeature<"caymanISA",
	"CaymanISA",
	"true",
	"Use Cayman ISA"
	>;

	def FeatureCFALUBug : SubtargetFeature<"cfalubug",
	"CFALUBug",
	"true",
	"GPU has CF_ALU bug"
	>;

	def FeatureFlatAddressSpace : SubtargetFeature<"flat-address-space",			def FeatureFlatAddressSpace : SubtargetFeature<"flat-address-space",
	"FlatAddressSpace",			"FlatAddressSpace",
	"true",			"true",
	"Support flat address space"			"Support flat address space"
	>;			>;

	def FeatureFlatInstOffsets : SubtargetFeature<"flat-inst-offsets",			def FeatureFlatInstOffsets : SubtargetFeature<"flat-inst-offsets",
	"FlatInstOffsets",			"FlatInstOffsets",
	... (69 lines not shown) ...
	>;			>;

	def FeatureSGPRInitBug : SubtargetFeature<"sgpr-init-bug",			def FeatureSGPRInitBug : SubtargetFeature<"sgpr-init-bug",
	"SGPRInitBug",			"SGPRInitBug",
	"true",			"true",
	"VI SGPR initialization bug requiring a fixed SGPR allocation size"			"VI SGPR initialization bug requiring a fixed SGPR allocation size"
	>;			>;

	class SubtargetFeatureFetchLimit <string Value> :
	SubtargetFeature <"fetch"#Value,
	"TexVTXClauseSize",
	Value,
	"Limit the maximum number of fetches in a clause to "#Value
	>;

	def FeatureFetchLimit8 : SubtargetFeatureFetchLimit <"8">;
	def FeatureFetchLimit16 : SubtargetFeatureFetchLimit <"16">;

	class SubtargetFeatureWavefrontSize <int Value> : SubtargetFeature<
	"wavefrontsize"#Value,
	"WavefrontSize",
	!cast<string>(Value),
	"The number of threads per wavefront"
	>;

	def FeatureWavefrontSize16 : SubtargetFeatureWavefrontSize<16>;
	def FeatureWavefrontSize32 : SubtargetFeatureWavefrontSize<32>;
	def FeatureWavefrontSize64 : SubtargetFeatureWavefrontSize<64>;

	class SubtargetFeatureLDSBankCount <int Value> : SubtargetFeature <			class SubtargetFeatureLDSBankCount <int Value> : SubtargetFeature <
	"ldsbankcount"#Value,			"ldsbankcount"#Value,
	"LDSBankCount",			"LDSBankCount",
	!cast<string>(Value),			!cast<string>(Value),
	"The number of LDS banks per compute unit."			"The number of LDS banks per compute unit."
	>;			>;

	def FeatureLDSBankCount16 : SubtargetFeatureLDSBankCount<16>;			def FeatureLDSBankCount16 : SubtargetFeatureLDSBankCount<16>;
	def FeatureLDSBankCount32 : SubtargetFeatureLDSBankCount<32>;			def FeatureLDSBankCount32 : SubtargetFeatureLDSBankCount<32>;

	class SubtargetFeatureLocalMemorySize <int Value> : SubtargetFeature<
	"localmemorysize"#Value,
	"LocalMemorySize",
	!cast<string>(Value),
	"The size of local memory in bytes"
	>;

	def FeatureGCN : SubtargetFeature<"gcn",
	"IsGCN",
	"true",
	"GCN or newer GPU"
	>;

	def FeatureGCN3Encoding : SubtargetFeature<"gcn3-encoding",			def FeatureGCN3Encoding : SubtargetFeature<"gcn3-encoding",
	"GCN3Encoding",			"GCN3Encoding",
	"true",			"true",
	"Encoding format for VI"			"Encoding format for VI"
	>;			>;

	def FeatureCIInsts : SubtargetFeature<"ci-insts",			def FeatureCIInsts : SubtargetFeature<"ci-insts",
	"CIInsts",			"CIInsts",
	... (156 lines not shown) ...

	def FeatureFP16Denormals : SubtargetFeature<"fp16-denormals",			def FeatureFP16Denormals : SubtargetFeature<"fp16-denormals",
	"FP64FP16Denormals",			"FP64FP16Denormals",
	"true",			"true",
	"Enable half precision denormal handling",			"Enable half precision denormal handling",
	[FeatureFP64FP16Denormals]			[FeatureFP64FP16Denormals]
	>;			>;

	def FeatureDX10Clamp : SubtargetFeature<"dx10-clamp",
	"DX10Clamp",
	"true",
	"clamp modifier clamps NaNs to 0.0"
	>;

	def FeatureFPExceptions : SubtargetFeature<"fp-exceptions",			def FeatureFPExceptions : SubtargetFeature<"fp-exceptions",
	"FPExceptions",			"FPExceptions",
	"true",			"true",
	"Enable floating point exceptions"			"Enable floating point exceptions"
	>;			>;

	class FeatureMaxPrivateElementSize<int size> : SubtargetFeature<			class FeatureMaxPrivateElementSize<int size> : SubtargetFeature<
	"max-private-element-size-"#size,			"max-private-element-size-"#size,
	... (26 lines not shown) ...
	>;			>;

	def FeatureDumpCodeLower : SubtargetFeature <"dumpcode",			def FeatureDumpCodeLower : SubtargetFeature <"dumpcode",
	"DumpCode",			"DumpCode",
	"true",			"true",
	"Dump MachineInstrs in the CodeEmitter"			"Dump MachineInstrs in the CodeEmitter"
	>;			>;

	def FeaturePromoteAlloca : SubtargetFeature <"promote-alloca",
	"EnablePromoteAlloca",
	"true",
	"Enable promote alloca pass"
	>;

	// XXX - This should probably be removed once enabled by default			// XXX - This should probably be removed once enabled by default
	def FeatureEnableLoadStoreOpt : SubtargetFeature <"load-store-opt",			def FeatureEnableLoadStoreOpt : SubtargetFeature <"load-store-opt",
	"EnableLoadStoreOpt",			"EnableLoadStoreOpt",
	"true",			"true",
	"Enable SI load/store optimizer pass"			"Enable SI load/store optimizer pass"
	>;			>;

	// Performance debugging feature. Allow using DS instruction immediate			// Performance debugging feature. Allow using DS instruction immediate
	... (47 lines not shown) ...
	>;			>;

	// Dummy feature used to disable assembler instructions.			// Dummy feature used to disable assembler instructions.
	def FeatureDisable : SubtargetFeature<"",			def FeatureDisable : SubtargetFeature<"",
	"FeatureDisable","true",			"FeatureDisable","true",
	"Dummy feature to disable assembler instructions"			"Dummy feature to disable assembler instructions"
	>;			>;

	class SubtargetFeatureGeneration <string Value,			def FeatureGCN : SubtargetFeature<"gcn",
	list<SubtargetFeature> Implies> :			"IsGCN",
	SubtargetFeature <Value, "Gen", "AMDGPUSubtarget::"#Value,			"true",
	Value#" GPU generation", Implies>;			"GCN or newer GPU"

	def FeatureLocalMemorySize0 : SubtargetFeatureLocalMemorySize<0>;
	def FeatureLocalMemorySize32768 : SubtargetFeatureLocalMemorySize<32768>;
	def FeatureLocalMemorySize65536 : SubtargetFeatureLocalMemorySize<65536>;

	def FeatureR600 : SubtargetFeatureGeneration<"R600",
	[FeatureR600ALUInst, FeatureFetchLimit8, FeatureLocalMemorySize0]
	>;

	def FeatureR700 : SubtargetFeatureGeneration<"R700",
	[FeatureFetchLimit16, FeatureLocalMemorySize0]
	>;

	def FeatureEvergreen : SubtargetFeatureGeneration<"EVERGREEN",
	[FeatureFetchLimit16, FeatureLocalMemorySize32768]
	>;			>;

	def FeatureNorthernIslands : SubtargetFeatureGeneration<"NORTHERN_ISLANDS",			class AMDGPUSubtargetFeatureGeneration <string Value,
	[FeatureFetchLimit16, FeatureWavefrontSize64,			list<SubtargetFeature> Implies> :
	FeatureLocalMemorySize32768]			SubtargetFeatureGeneration <Value, "AMDGPUSubtarget", Implies>;
	>;

	def FeatureSouthernIslands : SubtargetFeatureGeneration<"SOUTHERN_ISLANDS",			def FeatureSouthernIslands : AMDGPUSubtargetFeatureGeneration<"SOUTHERN_ISLANDS",
	[FeatureFP64, FeatureLocalMemorySize32768, FeatureMIMG_R128,			[FeatureFP64, FeatureLocalMemorySize32768, FeatureMIMG_R128,
	FeatureWavefrontSize64, FeatureGCN,			FeatureWavefrontSize64, FeatureGCN,
	FeatureLDSBankCount32, FeatureMovrel]			FeatureLDSBankCount32, FeatureMovrel]
	>;			>;

	def FeatureSeaIslands : SubtargetFeatureGeneration<"SEA_ISLANDS",			def FeatureSeaIslands : AMDGPUSubtargetFeatureGeneration<"SEA_ISLANDS",
	[FeatureFP64, FeatureLocalMemorySize65536, FeatureMIMG_R128,			[FeatureFP64, FeatureLocalMemorySize65536, FeatureMIMG_R128,
	FeatureWavefrontSize64, FeatureGCN, FeatureFlatAddressSpace,			FeatureWavefrontSize64, FeatureGCN, FeatureFlatAddressSpace,
	FeatureCIInsts, FeatureMovrel]			FeatureCIInsts, FeatureMovrel]
	>;			>;

	def FeatureVolcanicIslands : SubtargetFeatureGeneration<"VOLCANIC_ISLANDS",			def FeatureVolcanicIslands : AMDGPUSubtargetFeatureGeneration<"VOLCANIC_ISLANDS",
	[FeatureFP64, FeatureLocalMemorySize65536, FeatureMIMG_R128,			[FeatureFP64, FeatureLocalMemorySize65536, FeatureMIMG_R128,
	FeatureWavefrontSize64, FeatureFlatAddressSpace, FeatureGCN,			FeatureWavefrontSize64, FeatureFlatAddressSpace, FeatureGCN,
	FeatureGCN3Encoding, FeatureCIInsts, Feature16BitInsts,			FeatureGCN3Encoding, FeatureCIInsts, Feature16BitInsts,
	FeatureSMemRealTime, FeatureVGPRIndexMode, FeatureMovrel,			FeatureSMemRealTime, FeatureVGPRIndexMode, FeatureMovrel,
	FeatureScalarStores, FeatureInv2PiInlineImm,			FeatureScalarStores, FeatureInv2PiInlineImm,
	FeatureSDWA, FeatureSDWAOutModsVOPC, FeatureSDWAMac, FeatureDPP,			FeatureSDWA, FeatureSDWAOutModsVOPC, FeatureSDWAMac, FeatureDPP,
	FeatureIntClamp			FeatureIntClamp
	]			]
	>;			>;

	def FeatureGFX9 : SubtargetFeatureGeneration<"GFX9",			def FeatureGFX9 : AMDGPUSubtargetFeatureGeneration<"GFX9",
	[FeatureFP64, FeatureLocalMemorySize65536,			[FeatureFP64, FeatureLocalMemorySize65536,
	FeatureWavefrontSize64, FeatureFlatAddressSpace, FeatureGCN,			FeatureWavefrontSize64, FeatureFlatAddressSpace, FeatureGCN,
	FeatureGCN3Encoding, FeatureCIInsts, Feature16BitInsts,			FeatureGCN3Encoding, FeatureCIInsts, Feature16BitInsts,
	FeatureSMemRealTime, FeatureScalarStores, FeatureInv2PiInlineImm,			FeatureSMemRealTime, FeatureScalarStores, FeatureInv2PiInlineImm,
	FeatureApertureRegs, FeatureGFX9Insts, FeatureVOP3P, FeatureVGPRIndexMode,			FeatureApertureRegs, FeatureGFX9Insts, FeatureVOP3P, FeatureVGPRIndexMode,
	FeatureFastFMAF32, FeatureDPP, FeatureIntClamp,			FeatureFastFMAF32, FeatureDPP, FeatureIntClamp,
	FeatureSDWA, FeatureSDWAOmod, FeatureSDWAScalar, FeatureSDWASdst,			FeatureSDWA, FeatureSDWAOmod, FeatureSDWAScalar, FeatureSDWASdst,
	FeatureFlatInstOffsets, FeatureFlatGlobalInsts, FeatureFlatScratchInsts,			FeatureFlatInstOffsets, FeatureFlatGlobalInsts, FeatureFlatScratchInsts,
	... (186 lines not shown) ...
	// Dummy Instruction itineraries for pseudo instructions			// Dummy Instruction itineraries for pseudo instructions
	def ALU_NULL : FuncUnit;			def ALU_NULL : FuncUnit;
	def NullALU : InstrItinClass;			def NullALU : InstrItinClass;

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Predicate helper class			// Predicate helper class
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	def TruePredicate : Predicate<"true">;

	def isSICI : Predicate<			def isSICI : Predicate<
	"Subtarget->getGeneration() == AMDGPUSubtarget::SOUTHERN_ISLANDS \|\|"			"Subtarget->getGeneration() == AMDGPUSubtarget::SOUTHERN_ISLANDS \|\|"
	"Subtarget->getGeneration() == AMDGPUSubtarget::SEA_ISLANDS"			"Subtarget->getGeneration() == AMDGPUSubtarget::SEA_ISLANDS"
	>, AssemblerPredicate<"!FeatureGCN3Encoding">;			>, AssemblerPredicate<"!FeatureGCN3Encoding">;

	def isVI : Predicate <			def isVI : Predicate <
	"Subtarget->getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS">,			"Subtarget->getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS">,
	AssemblerPredicate<"FeatureGCN3Encoding">;			AssemblerPredicate<"FeatureGCN3Encoding">;
	... (75 lines not shown) ...

	def HasDLInsts : Predicate<"Subtarget->hasDLInsts()">,			def HasDLInsts : Predicate<"Subtarget->hasDLInsts()">,
	AssemblerPredicate<"FeatureDLInsts">;			AssemblerPredicate<"FeatureDLInsts">;


	def EnableLateCFGStructurize : Predicate<			def EnableLateCFGStructurize : Predicate<
	"EnableLateStructurizeCFG">;			"EnableLateStructurizeCFG">;

	// Exists to help track down where SubtargetPredicate isn't set rather
	// than letting tablegen crash with an unhelpful error.
	def InvalidPred : Predicate<"predicate not set on instruction or pattern">;

	class PredicateControl {
	Predicate SubtargetPredicate = InvalidPred;
	Predicate SIAssemblerPredicate = isSICI;
	Predicate VIAssemblerPredicate = isVI;
	list<Predicate> AssemblerPredicates = [];
	Predicate AssemblerPredicate = TruePredicate;
	list<Predicate> OtherPredicates = [];
	list<Predicate> Predicates = !listconcat([SubtargetPredicate,
	AssemblerPredicate],
	AssemblerPredicates,
	OtherPredicates);
	}

	class AMDGPUPat<dag pattern, dag result> : Pat<pattern, result>,
	PredicateControl;


	// Include AMDGPU TD files			// Include AMDGPU TD files
	include "R600Schedule.td"
	include "R600Processors.td"
	include "SISchedule.td"			include "SISchedule.td"
	include "GCNProcessors.td"			include "GCNProcessors.td"
	include "AMDGPUInstrInfo.td"			include "AMDGPUInstrInfo.td"
	include "AMDGPUIntrinsics.td"			include "AMDGPUIntrinsics.td"
				include "SIIntrinsics.td"
	include "AMDGPURegisterInfo.td"			include "AMDGPURegisterInfo.td"
	include "AMDGPURegisterBanks.td"			include "AMDGPURegisterBanks.td"
	include "AMDGPUInstructions.td"			include "AMDGPUInstructions.td"
				include "SIInstrInfo.td"
	include "AMDGPUCallingConv.td"			include "AMDGPUCallingConv.td"
	include "AMDGPUSearchableTables.td"			include "AMDGPUSearchableTables.td"
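
The `PredicateControl` class that appears only on the left-hand side of the AMDGPU.td diff above assembles its `Predicates` list with `!listconcat`. A minimal C++ sketch of that assembly follows. `PredicateControlSketch` is a hypothetical stand-in that models predicates as plain strings, and it covers only the four fields that feed the concatenation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in: each predicate is modelled as its record name.
struct PredicateControlSketch {
  std::string SubtargetPredicate = "InvalidPred";   // default in the class
  std::string AssemblerPredicate = "TruePredicate"; // default in the class
  std::vector<std::string> AssemblerPredicates;
  std::vector<std::string> OtherPredicates;

  // Mirrors !listconcat([SubtargetPredicate, AssemblerPredicate],
  //                     AssemblerPredicates, OtherPredicates).
  std::vector<std::string> predicates() const {
    std::vector<std::string> Out{SubtargetPredicate, AssemblerPredicate};
    Out.insert(Out.end(), AssemblerPredicates.begin(),
               AssemblerPredicates.end());
    Out.insert(Out.end(), OtherPredicates.begin(), OtherPredicates.end());
    return Out;
  }
};
```

Because `SubtargetPredicate` defaults to `InvalidPred`, a record that never sets it still yields a list whose first entry names the problem, which matches the comment in the listing about tracking down where `SubtargetPredicate` is not set.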

lib/Target/AMDGPU/AMDGPUCallingConv.td

... (79 lines not shown) ...
CCIfType<[f32, f16] , CCAssignToReg<[
VGPR96, VGPR97, VGPR98, VGPR99, VGPR100, VGPR101, VGPR102, VGPR103,		VGPR96, VGPR97, VGPR98, VGPR99, VGPR100, VGPR101, VGPR102, VGPR103,
VGPR104, VGPR105, VGPR106, VGPR107, VGPR108, VGPR109, VGPR110, VGPR111,		VGPR104, VGPR105, VGPR106, VGPR107, VGPR108, VGPR109, VGPR110, VGPR111,
VGPR112, VGPR113, VGPR114, VGPR115, VGPR116, VGPR117, VGPR118, VGPR119,		VGPR112, VGPR113, VGPR114, VGPR115, VGPR116, VGPR117, VGPR118, VGPR119,
VGPR120, VGPR121, VGPR122, VGPR123, VGPR124, VGPR125, VGPR126, VGPR127,		VGPR120, VGPR121, VGPR122, VGPR123, VGPR124, VGPR125, VGPR126, VGPR127,
VGPR128, VGPR129, VGPR130, VGPR131, VGPR132, VGPR133, VGPR134, VGPR135		VGPR128, VGPR129, VGPR130, VGPR131, VGPR132, VGPR133, VGPR134, VGPR135
]>>		]>>
]>;		]>;

// Calling convention for R600
def CC_R600 : CallingConv<[
CCIfInReg<CCIfType<[v4f32, v4i32] , CCAssignToReg<[
T0_XYZW, T1_XYZW, T2_XYZW, T3_XYZW, T4_XYZW, T5_XYZW, T6_XYZW, T7_XYZW,
T8_XYZW, T9_XYZW, T10_XYZW, T11_XYZW, T12_XYZW, T13_XYZW, T14_XYZW, T15_XYZW,
T16_XYZW, T17_XYZW, T18_XYZW, T19_XYZW, T20_XYZW, T21_XYZW, T22_XYZW,
T23_XYZW, T24_XYZW, T25_XYZW, T26_XYZW, T27_XYZW, T28_XYZW, T29_XYZW,
T30_XYZW, T31_XYZW, T32_XYZW
]>>>
]>;

// Calling convention for compute kernels		// Calling convention for compute kernels
def CC_AMDGPU_Kernel : CallingConv<[		def CC_AMDGPU_Kernel : CallingConv<[
CCCustom<"allocateKernArg">		CCCustom<"allocateKernArg">
]>;		]>;

def CSR_AMDGPU_VGPRs_24_255 : CalleeSavedRegs<		def CSR_AMDGPU_VGPRs_24_255 : CalleeSavedRegs<
(sequence "VGPR%u", 24, 255)		(sequence "VGPR%u", 24, 255)
>;		>;
... (53 lines not shown) ...
CCIf<"static_cast<const AMDGPUSubtarget&>"
CCDelegateTo<CC_AMDGPU_Kernel>>,		CCDelegateTo<CC_AMDGPU_Kernel>>,
CCIf<"static_cast<const AMDGPUSubtarget&>"		CCIf<"static_cast<const AMDGPUSubtarget&>"
"(State.getMachineFunction().getSubtarget()).getGeneration() >= "		"(State.getMachineFunction().getSubtarget()).getGeneration() >= "
"AMDGPUSubtarget::SOUTHERN_ISLANDS",		"AMDGPUSubtarget::SOUTHERN_ISLANDS",
CCDelegateTo<CC_SI>>,		CCDelegateTo<CC_SI>>,
CCIf<"static_cast<const AMDGPUSubtarget&>"		CCIf<"static_cast<const AMDGPUSubtarget&>"
"(State.getMachineFunction().getSubtarget()).getGeneration() >= "		"(State.getMachineFunction().getSubtarget()).getGeneration() >= "
"AMDGPUSubtarget::SOUTHERN_ISLANDS && State.getCallingConv() == CallingConv::C",		"AMDGPUSubtarget::SOUTHERN_ISLANDS && State.getCallingConv() == CallingConv::C",
CCDelegateTo<CC_AMDGPU_Func>>,		CCDelegateTo<CC_AMDGPU_Func>>
CCIf<"static_cast<const AMDGPUSubtarget&>"
"(State.getMachineFunction().getSubtarget()).getGeneration() < "
"AMDGPUSubtarget::SOUTHERN_ISLANDS",
CCDelegateTo<CC_R600>>
]>;		]>;

lib/Target/AMDGPU/AMDGPUFeatures.td

This file was added.

				//===-- AMDGPUFeatures.td - AMDGPU Feature Definitions ------ tablegen --===//
				//
				arsenm: Missing header comment
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//

				def FeatureFP64 : SubtargetFeature<"fp64",
				"FP64",
				"true",
				"Enable double precision operations"
				>;

				def FeatureFMA : SubtargetFeature<"fmaf",
				"FMA",
				"true",
				"Enable single precision FMA (not as fast as mul+add, but fused)"
				>;

				class SubtargetFeatureLocalMemorySize <int Value> : SubtargetFeature<
				"localmemorysize"#Value,
				"LocalMemorySize",
				!cast<string>(Value),
				"The size of local memory in bytes"
				>;

				def FeatureLocalMemorySize0 : SubtargetFeatureLocalMemorySize<0>;
				def FeatureLocalMemorySize32768 : SubtargetFeatureLocalMemorySize<32768>;
				def FeatureLocalMemorySize65536 : SubtargetFeatureLocalMemorySize<65536>;

				class SubtargetFeatureWavefrontSize <int Value> : SubtargetFeature<
				"wavefrontsize"#Value,
				"WavefrontSize",
				!cast<string>(Value),
				"The number of threads per wavefront"
				>;

				def FeatureWavefrontSize16 : SubtargetFeatureWavefrontSize<16>;
				def FeatureWavefrontSize32 : SubtargetFeatureWavefrontSize<32>;
				def FeatureWavefrontSize64 : SubtargetFeatureWavefrontSize<64>;

				class SubtargetFeatureGeneration <string Value, string Subtarget,
				list<SubtargetFeature> Implies> :
				SubtargetFeature <Value, "Gen", Subtarget#"::"#Value,
				Value#" GPU generation", Implies>;

				def FeatureDX10Clamp : SubtargetFeature<"dx10-clamp",
				"DX10Clamp",
				"true",
				"clamp modifier clamps NaNs to 0.0"
				>;
				jvesely: git complains about blank line at the end of file here

				def FeaturePromoteAlloca : SubtargetFeature <"promote-alloca",
				"EnablePromoteAlloca",
				"true",
				"Enable promote alloca pass"
				>;
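
For readers less familiar with how these records are consumed: each SubtargetFeature pairs a feature name with a subtarget attribute and the value that attribute takes when the feature is enabled. The following is a minimal C++ sketch of that mapping, not the code TableGen emits. `ToySubtarget` and `applyFeatures` are hypothetical names, and the "keep the largest enabled value" rule for integer attributes is an assumption about the generated parser rather than something shown in this diff.

```cpp
#include <cassert>
#include <initializer_list>
#include <set>
#include <string>

// Hypothetical stand-in for a subtarget class. The member names mirror the
// attribute strings in AMDGPUFeatures.td; nothing else here is LLVM API.
struct ToySubtarget {
  bool FP64 = false;
  bool FMA = false;
  int LocalMemorySize = 0;
  int WavefrontSize = 0;
};

// Set each attribute whose feature name appears in Enabled.
inline ToySubtarget applyFeatures(const std::set<std::string> &Enabled) {
  ToySubtarget ST;
  // Boolean features: the attribute is assigned "true" when enabled.
  if (Enabled.count("fp64"))
    ST.FP64 = true;
  if (Enabled.count("fmaf"))
    ST.FMA = true;
  // Integer features (assumed rule): keep the largest enabled value.
  for (int Size : {0, 32768, 65536})
    if (Enabled.count("localmemorysize" + std::to_string(Size)) &&
        ST.LocalMemorySize < Size)
      ST.LocalMemorySize = Size;
  for (int Size : {16, 32, 64})
    if (Enabled.count("wavefrontsize" + std::to_string(Size)) &&
        ST.WavefrontSize < Size)
      ST.WavefrontSize = Size;
  return ST;
}
```

Because the shared features live in their own file, both the R600 and GCN TableGen descriptions can include them without pulling in each other's instruction or register definitions.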

lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp

[... 98 lines not shown ...]
protected:		protected:
void SelectBuildVector(SDNode *N, unsigned RegClassID);		void SelectBuildVector(SDNode *N, unsigned RegClassID);

private:		private:
std::pair<SDValue, SDValue> foldFrameIndex(SDValue N) const;		std::pair<SDValue, SDValue> foldFrameIndex(SDValue N) const;
bool isNoNanSrc(SDValue N) const;		bool isNoNanSrc(SDValue N) const;
bool isInlineImmediate(const SDNode *N) const;		bool isInlineImmediate(const SDNode *N) const;

bool isConstantLoad(const MemSDNode *N, int cbID) const;
bool isUniformBr(const SDNode *N) const;		bool isUniformBr(const SDNode *N) const;

SDNode *glueCopyToM0(SDNode *N) const;		SDNode *glueCopyToM0(SDNode *N) const;

const TargetRegisterClass *getOperandRegClass(SDNode *N, unsigned OpNo) const;		const TargetRegisterClass *getOperandRegClass(SDNode *N, unsigned OpNo) const;
bool SelectGlobalValueConstantOffset(SDValue Addr, SDValue& IntPtr);
bool SelectGlobalValueVariableOffset(SDValue Addr, SDValue &BaseReg,
SDValue& Offset);
virtual bool SelectADDRVTX_READ(SDValue Addr, SDValue &Base, SDValue &Offset);		virtual bool SelectADDRVTX_READ(SDValue Addr, SDValue &Base, SDValue &Offset);
virtual bool SelectADDRIndirect(SDValue Addr, SDValue &Base, SDValue &Offset);		virtual bool SelectADDRIndirect(SDValue Addr, SDValue &Base, SDValue &Offset);
bool isDSOffsetLegal(const SDValue &Base, unsigned Offset,		bool isDSOffsetLegal(const SDValue &Base, unsigned Offset,
unsigned OffsetBits) const;		unsigned OffsetBits) const;
bool SelectDS1Addr1Offset(SDValue Ptr, SDValue &Base, SDValue &Offset) const;		bool SelectDS1Addr1Offset(SDValue Ptr, SDValue &Base, SDValue &Offset) const;
bool SelectDS64Bit4ByteAligned(SDValue Ptr, SDValue &Base, SDValue &Offset0,		bool SelectDS64Bit4ByteAligned(SDValue Ptr, SDValue &Base, SDValue &Offset0,
SDValue &Offset1) const;		SDValue &Offset1) const;
bool SelectMUBUF(SDValue Addr, SDValue &SRsrc, SDValue &VAddr,		bool SelectMUBUF(SDValue Addr, SDValue &SRsrc, SDValue &VAddr,
[... 98 lines not shown ...]	private:
void SelectATOMIC_CMP_SWAP(SDNode *N);		void SelectATOMIC_CMP_SWAP(SDNode *N);

protected:		protected:
// Include the pieces autogenerated from the target description.		// Include the pieces autogenerated from the target description.
#include "AMDGPUGenDAGISel.inc"		#include "AMDGPUGenDAGISel.inc"
};		};

class R600DAGToDAGISel : public AMDGPUDAGToDAGISel {		class R600DAGToDAGISel : public AMDGPUDAGToDAGISel {
		const R600Subtarget *Subtarget;
		AMDGPUAS AMDGPUASI;

		bool isConstantLoad(const MemSDNode *N, int cbID) const;
		bool SelectGlobalValueConstantOffset(SDValue Addr, SDValue& IntPtr);
		bool SelectGlobalValueVariableOffset(SDValue Addr, SDValue &BaseReg,
		SDValue& Offset);
public:		public:
explicit R600DAGToDAGISel(TargetMachine *TM, CodeGenOpt::Level OptLevel) :		explicit R600DAGToDAGISel(TargetMachine *TM, CodeGenOpt::Level OptLevel) :
AMDGPUDAGToDAGISel(TM, OptLevel) {}		AMDGPUDAGToDAGISel(TM, OptLevel) {
		AMDGPUASI = AMDGPU::getAMDGPUAS(*TM);
		}

void Select(SDNode *N) override;		void Select(SDNode *N) override;

bool SelectADDRIndirect(SDValue Addr, SDValue &Base,		bool SelectADDRIndirect(SDValue Addr, SDValue &Base,
SDValue &Offset) override;		SDValue &Offset) override;
bool SelectADDRVTX_READ(SDValue Addr, SDValue &Base,		bool SelectADDRVTX_READ(SDValue Addr, SDValue &Base,
SDValue &Offset) override;		SDValue &Offset) override;

		bool runOnMachineFunction(MachineFunction &MF) override;
		protected:
		// Include the pieces autogenerated from the target description.
		#include "R600GenDAGISel.inc"
};		};

} // end anonymous namespace		} // end anonymous namespace

INITIALIZE_PASS_BEGIN(AMDGPUDAGToDAGISel, "isel",		INITIALIZE_PASS_BEGIN(AMDGPUDAGToDAGISel, "isel",
"AMDGPU DAG->DAG Pattern Instruction Selection", false, false)		"AMDGPU DAG->DAG Pattern Instruction Selection", false, false)
INITIALIZE_PASS_DEPENDENCY(AMDGPUArgumentUsageInfo)		INITIALIZE_PASS_DEPENDENCY(AMDGPUArgumentUsageInfo)
INITIALIZE_PASS_DEPENDENCY(AMDGPUPerfHintAnalysis)		INITIALIZE_PASS_DEPENDENCY(AMDGPUPerfHintAnalysis)
[... 27 lines not shown ...]	bool AMDGPUDAGToDAGISel::isNoNanSrc(SDValue N) const {
// TODO: Move into isKnownNeverNaN		// TODO: Move into isKnownNeverNaN
if (N->getFlags().isDefined())		if (N->getFlags().isDefined())
return N->getFlags().hasNoNaNs();		return N->getFlags().hasNoNaNs();

return CurDAG->isKnownNeverNaN(N);		return CurDAG->isKnownNeverNaN(N);
}		}

bool AMDGPUDAGToDAGISel::isInlineImmediate(const SDNode *N) const {		bool AMDGPUDAGToDAGISel::isInlineImmediate(const SDNode *N) const {
const SIInstrInfo *TII		const SIInstrInfo *TII = Subtarget->getInstrInfo();
= static_cast<const SISubtarget *>(Subtarget)->getInstrInfo();

if (const ConstantSDNode *C = dyn_cast<ConstantSDNode>(N))		if (const ConstantSDNode *C = dyn_cast<ConstantSDNode>(N))
return TII->isInlineConstant(C->getAPIntValue());		return TII->isInlineConstant(C->getAPIntValue());

if (const ConstantFPSDNode *C = dyn_cast<ConstantFPSDNode>(N))		if (const ConstantFPSDNode *C = dyn_cast<ConstantFPSDNode>(N))
return TII->isInlineConstant(C->getValueAPF().bitcastToAPInt());		return TII->isInlineConstant(C->getValueAPF().bitcastToAPInt());

return false;		return false;
[... 339 lines not shown ...]	void AMDGPUDAGToDAGISel::Select(SDNode *N) {
case AMDGPUISD::ATOMIC_CMP_SWAP:		case AMDGPUISD::ATOMIC_CMP_SWAP:
SelectATOMIC_CMP_SWAP(N);		SelectATOMIC_CMP_SWAP(N);
return;		return;
}		}

SelectCode(N);		SelectCode(N);
}		}

bool AMDGPUDAGToDAGISel::isConstantLoad(const MemSDNode *N, int CbId) const {
if (!N->readMem())
return false;
if (CbId == -1)
return N->getAddressSpace() == AMDGPUASI.CONSTANT_ADDRESS ||
N->getAddressSpace() == AMDGPUASI.CONSTANT_ADDRESS_32BIT;

return N->getAddressSpace() == AMDGPUASI.CONSTANT_BUFFER_0 + CbId;
}

bool AMDGPUDAGToDAGISel::isUniformBr(const SDNode *N) const {		bool AMDGPUDAGToDAGISel::isUniformBr(const SDNode *N) const {
const BasicBlock *BB = FuncInfo->MBB->getBasicBlock();		const BasicBlock *BB = FuncInfo->MBB->getBasicBlock();
const Instruction *Term = BB->getTerminator();		const Instruction *Term = BB->getTerminator();
return Term->getMetadata("amdgpu.uniform") ||		return Term->getMetadata("amdgpu.uniform") ||
Term->getMetadata("structurizecfg.uniform");		Term->getMetadata("structurizecfg.uniform");
}		}

StringRef AMDGPUDAGToDAGISel::getPassName() const {		StringRef AMDGPUDAGToDAGISel::getPassName() const {
return "AMDGPU DAG->DAG Pattern Instruction Selection";		return "AMDGPU DAG->DAG Pattern Instruction Selection";
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Complex Patterns		// Complex Patterns
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

bool AMDGPUDAGToDAGISel::SelectGlobalValueConstantOffset(SDValue Addr,
SDValue& IntPtr) {
if (ConstantSDNode *Cst = dyn_cast<ConstantSDNode>(Addr)) {
IntPtr = CurDAG->getIntPtrConstant(Cst->getZExtValue() / 4, SDLoc(Addr),
true);
return true;
}
return false;
}

bool AMDGPUDAGToDAGISel::SelectGlobalValueVariableOffset(SDValue Addr,
SDValue& BaseReg, SDValue &Offset) {
if (!isa<ConstantSDNode>(Addr)) {
BaseReg = Addr;
Offset = CurDAG->getIntPtrConstant(0, SDLoc(Addr), true);
return true;
}
return false;
}

bool AMDGPUDAGToDAGISel::SelectADDRVTX_READ(SDValue Addr, SDValue &Base,		bool AMDGPUDAGToDAGISel::SelectADDRVTX_READ(SDValue Addr, SDValue &Base,
SDValue &Offset) {		SDValue &Offset) {
return false;		return false;
}		}

bool AMDGPUDAGToDAGISel::SelectADDRIndirect(SDValue Addr, SDValue &Base,		bool AMDGPUDAGToDAGISel::SelectADDRIndirect(SDValue Addr, SDValue &Base,
SDValue &Offset) {		SDValue &Offset) {
ConstantSDNode *C;		ConstantSDNode *C;
SDLoc DL(Addr);		SDLoc DL(Addr);

if ((C = dyn_cast<ConstantSDNode>(Addr))) {		if ((C = dyn_cast<ConstantSDNode>(Addr))) {
Base = CurDAG->getRegister(AMDGPU::INDIRECT_BASE_ADDR, MVT::i32);		Base = CurDAG->getRegister(R600::INDIRECT_BASE_ADDR, MVT::i32);
Offset = CurDAG->getTargetConstant(C->getZExtValue(), DL, MVT::i32);		Offset = CurDAG->getTargetConstant(C->getZExtValue(), DL, MVT::i32);
} else if ((Addr.getOpcode() == AMDGPUISD::DWORDADDR) &&		} else if ((Addr.getOpcode() == AMDGPUISD::DWORDADDR) &&
(C = dyn_cast<ConstantSDNode>(Addr.getOperand(0)))) {		(C = dyn_cast<ConstantSDNode>(Addr.getOperand(0)))) {
Base = CurDAG->getRegister(AMDGPU::INDIRECT_BASE_ADDR, MVT::i32);		Base = CurDAG->getRegister(R600::INDIRECT_BASE_ADDR, MVT::i32);
Offset = CurDAG->getTargetConstant(C->getZExtValue(), DL, MVT::i32);		Offset = CurDAG->getTargetConstant(C->getZExtValue(), DL, MVT::i32);
} else if ((Addr.getOpcode() == ISD::ADD || Addr.getOpcode() == ISD::OR) &&		} else if ((Addr.getOpcode() == ISD::ADD || Addr.getOpcode() == ISD::OR) &&
(C = dyn_cast<ConstantSDNode>(Addr.getOperand(1)))) {		(C = dyn_cast<ConstantSDNode>(Addr.getOperand(1)))) {
Base = Addr.getOperand(0);		Base = Addr.getOperand(0);
Offset = CurDAG->getTargetConstant(C->getZExtValue(), DL, MVT::i32);		Offset = CurDAG->getTargetConstant(C->getZExtValue(), DL, MVT::i32);
} else {		} else {
Base = Addr;		Base = Addr;
Offset = CurDAG->getTargetConstant(0, DL, MVT::i32);		Offset = CurDAG->getTargetConstant(0, DL, MVT::i32);
[... 1,446 lines not shown ...]	while (Position != CurDAG->allnodes_end()) {
ReplaceUses(Node, ResNode);		ReplaceUses(Node, ResNode);
IsModified = true;		IsModified = true;
}		}
}		}
CurDAG->RemoveDeadNodes();		CurDAG->RemoveDeadNodes();
} while (IsModified);		} while (IsModified);
}		}

		bool R600DAGToDAGISel::runOnMachineFunction(MachineFunction &MF) {
		Subtarget = &MF.getSubtarget<R600Subtarget>();
		return SelectionDAGISel::runOnMachineFunction(MF);
		}

		bool R600DAGToDAGISel::isConstantLoad(const MemSDNode *N, int CbId) const {
		if (!N->readMem())
		return false;
		if (CbId == -1)
		return N->getAddressSpace() == AMDGPUASI.CONSTANT_ADDRESS ||
		N->getAddressSpace() == AMDGPUASI.CONSTANT_ADDRESS_32BIT;

		return N->getAddressSpace() == AMDGPUASI.CONSTANT_BUFFER_0 + CbId;
		}

		bool R600DAGToDAGISel::SelectGlobalValueConstantOffset(SDValue Addr,
		SDValue& IntPtr) {
		if (ConstantSDNode *Cst = dyn_cast<ConstantSDNode>(Addr)) {
		IntPtr = CurDAG->getIntPtrConstant(Cst->getZExtValue() / 4, SDLoc(Addr),
		true);
		return true;
		}
		return false;
		}

		bool R600DAGToDAGISel::SelectGlobalValueVariableOffset(SDValue Addr,
		SDValue& BaseReg, SDValue &Offset) {
		if (!isa<ConstantSDNode>(Addr)) {
		BaseReg = Addr;
		Offset = CurDAG->getIntPtrConstant(0, SDLoc(Addr), true);
		return true;
		}
		return false;
		}

void R600DAGToDAGISel::Select(SDNode *N) {		void R600DAGToDAGISel::Select(SDNode *N) {
unsigned int Opc = N->getOpcode();		unsigned int Opc = N->getOpcode();
if (N->isMachineOpcode()) {		if (N->isMachineOpcode()) {
N->setNodeId(-1);		N->setNodeId(-1);
return; // Already selected.		return; // Already selected.
}		}

switch (Opc) {		switch (Opc) {
default: break;		default: break;
case AMDGPUISD::BUILD_VERTICAL_VECTOR:		case AMDGPUISD::BUILD_VERTICAL_VECTOR:
case ISD::SCALAR_TO_VECTOR:		case ISD::SCALAR_TO_VECTOR:
case ISD::BUILD_VECTOR: {		case ISD::BUILD_VECTOR: {
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
unsigned NumVectorElts = VT.getVectorNumElements();		unsigned NumVectorElts = VT.getVectorNumElements();
unsigned RegClassID;		unsigned RegClassID;
// BUILD_VECTOR was lowered into an IMPLICIT_DEF + 4 INSERT_SUBREG		// BUILD_VECTOR was lowered into an IMPLICIT_DEF + 4 INSERT_SUBREG
// that adds a 128 bits reg copy when going through TwoAddressInstructions		// that adds a 128 bits reg copy when going through TwoAddressInstructions
// pass. We want to avoid 128 bits copies as much as possible because they		// pass. We want to avoid 128 bits copies as much as possible because they
// can't be bundled by our scheduler.		// can't be bundled by our scheduler.
switch(NumVectorElts) {		switch(NumVectorElts) {
case 2: RegClassID = AMDGPU::R600_Reg64RegClassID; break;		case 2: RegClassID = R600::R600_Reg64RegClassID; break;
case 4:		case 4:
if (Opc == AMDGPUISD::BUILD_VERTICAL_VECTOR)		if (Opc == AMDGPUISD::BUILD_VERTICAL_VECTOR)
RegClassID = AMDGPU::R600_Reg128VerticalRegClassID;		RegClassID = R600::R600_Reg128VerticalRegClassID;
else		else
RegClassID = AMDGPU::R600_Reg128RegClassID;		RegClassID = R600::R600_Reg128RegClassID;
break;		break;
default: llvm_unreachable("Do not know how to lower this BUILD_VECTOR");		default: llvm_unreachable("Do not know how to lower this BUILD_VECTOR");
}		}
SelectBuildVector(N, RegClassID);		SelectBuildVector(N, RegClassID);
return;		return;
}		}
}		}

SelectCode(N);		SelectCode(N);
}		}

bool R600DAGToDAGISel::SelectADDRIndirect(SDValue Addr, SDValue &Base,		bool R600DAGToDAGISel::SelectADDRIndirect(SDValue Addr, SDValue &Base,
SDValue &Offset) {		SDValue &Offset) {
ConstantSDNode *C;		ConstantSDNode *C;
SDLoc DL(Addr);		SDLoc DL(Addr);

if ((C = dyn_cast<ConstantSDNode>(Addr))) {		if ((C = dyn_cast<ConstantSDNode>(Addr))) {
Base = CurDAG->getRegister(AMDGPU::INDIRECT_BASE_ADDR, MVT::i32);		Base = CurDAG->getRegister(R600::INDIRECT_BASE_ADDR, MVT::i32);
Offset = CurDAG->getTargetConstant(C->getZExtValue(), DL, MVT::i32);		Offset = CurDAG->getTargetConstant(C->getZExtValue(), DL, MVT::i32);
} else if ((Addr.getOpcode() == AMDGPUISD::DWORDADDR) &&		} else if ((Addr.getOpcode() == AMDGPUISD::DWORDADDR) &&
(C = dyn_cast<ConstantSDNode>(Addr.getOperand(0)))) {		(C = dyn_cast<ConstantSDNode>(Addr.getOperand(0)))) {
Base = CurDAG->getRegister(AMDGPU::INDIRECT_BASE_ADDR, MVT::i32);		Base = CurDAG->getRegister(R600::INDIRECT_BASE_ADDR, MVT::i32);
Offset = CurDAG->getTargetConstant(C->getZExtValue(), DL, MVT::i32);		Offset = CurDAG->getTargetConstant(C->getZExtValue(), DL, MVT::i32);
} else if ((Addr.getOpcode() == ISD::ADD || Addr.getOpcode() == ISD::OR) &&		} else if ((Addr.getOpcode() == ISD::ADD || Addr.getOpcode() == ISD::OR) &&
(C = dyn_cast<ConstantSDNode>(Addr.getOperand(1)))) {		(C = dyn_cast<ConstantSDNode>(Addr.getOperand(1)))) {
Base = Addr.getOperand(0);		Base = Addr.getOperand(0);
Offset = CurDAG->getTargetConstant(C->getZExtValue(), DL, MVT::i32);		Offset = CurDAG->getTargetConstant(C->getZExtValue(), DL, MVT::i32);
} else {		} else {
Base = Addr;		Base = Addr;
Offset = CurDAG->getTargetConstant(0, DL, MVT::i32);		Offset = CurDAG->getTargetConstant(0, DL, MVT::i32);
[... 14 lines not shown ...]	if (Addr.getOpcode() == ISD::ADD
Offset = CurDAG->getTargetConstant(IMMOffset->getZExtValue(), SDLoc(Addr),		Offset = CurDAG->getTargetConstant(IMMOffset->getZExtValue(), SDLoc(Addr),
MVT::i32);		MVT::i32);
return true;		return true;
// If the pointer address is constant, we can move it to the offset field.		// If the pointer address is constant, we can move it to the offset field.
} else if ((IMMOffset = dyn_cast<ConstantSDNode>(Addr))		} else if ((IMMOffset = dyn_cast<ConstantSDNode>(Addr))
&& isInt<16>(IMMOffset->getZExtValue())) {		&& isInt<16>(IMMOffset->getZExtValue())) {
Base = CurDAG->getCopyFromReg(CurDAG->getEntryNode(),		Base = CurDAG->getCopyFromReg(CurDAG->getEntryNode(),
SDLoc(CurDAG->getEntryNode()),		SDLoc(CurDAG->getEntryNode()),
AMDGPU::ZERO, MVT::i32);		R600::ZERO, MVT::i32);
Offset = CurDAG->getTargetConstant(IMMOffset->getZExtValue(), SDLoc(Addr),		Offset = CurDAG->getTargetConstant(IMMOffset->getZExtValue(), SDLoc(Addr),
MVT::i32);		MVT::i32);
return true;		return true;
}		}

// Default case, no offset		// Default case, no offset
Base = Addr;		Base = Addr;
Offset = CurDAG->getTargetConstant(0, SDLoc(Addr), MVT::i32);		Offset = CurDAG->getTargetConstant(0, SDLoc(Addr), MVT::i32);
return true;		return true;
}		}
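
The `isConstantLoad` helper that this diff moves from `AMDGPUDAGToDAGISel` to `R600DAGToDAGISel` is a pure address-space test, so it can be read in isolation. Below is a standalone sketch of the same decision logic, assuming a hypothetical `ToyAddressSpaces` struct in place of `AMDGPUAS`; the numeric values used with it are placeholders, not the real address-space numbers.

```cpp
#include <cassert>

// Placeholder address-space table. In the patch these values come from
// AMDGPU::getAMDGPUAS(TM); the struct and its contents here are hypothetical.
struct ToyAddressSpaces {
  unsigned CONSTANT_ADDRESS;
  unsigned CONSTANT_ADDRESS_32BIT;
  unsigned CONSTANT_BUFFER_0;
};

// Mirrors the control flow of isConstantLoad:
//  - nodes that do not read memory are never constant loads;
//  - CbId == -1 means "any constant address space";
//  - otherwise the load must target constant buffer number CbId.
inline bool isConstantLoad(bool ReadsMem, unsigned AddrSpace, int CbId,
                           const ToyAddressSpaces &AS) {
  if (!ReadsMem)
    return false;
  if (CbId == -1)
    return AddrSpace == AS.CONSTANT_ADDRESS ||
           AddrSpace == AS.CONSTANT_ADDRESS_32BIT;
  return AddrSpace == AS.CONSTANT_BUFFER_0 + CbId;
}
```

With a table of `{4, 6, 8}`, a load from address space 10 matches constant buffer 2 but is rejected when `CbId` is -1, since 10 is neither of the two generic constant address spaces.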

lib/Target/AMDGPU/AMDGPUISelLowering.h

[... 17 lines not shown ...]

#include "AMDGPU.h"		#include "AMDGPU.h"
#include "llvm/CodeGen/CallingConvLower.h"		#include "llvm/CodeGen/CallingConvLower.h"
#include "llvm/CodeGen/TargetLowering.h"		#include "llvm/CodeGen/TargetLowering.h"

namespace llvm {		namespace llvm {

class AMDGPUMachineFunction;		class AMDGPUMachineFunction;
class AMDGPUSubtarget;		class AMDGPUCommonSubtarget;
struct ArgDescriptor;		struct ArgDescriptor;

class AMDGPUTargetLowering : public TargetLowering {		class AMDGPUTargetLowering : public TargetLowering {
private:		private:
		const AMDGPUCommonSubtarget *Subtarget;

/// \returns AMDGPUISD::FFBH_U32 node if the incoming \p Op may have been		/// \returns AMDGPUISD::FFBH_U32 node if the incoming \p Op may have been
/// legalized from a smaller type VT. Need to match pre-legalized type because		/// legalized from a smaller type VT. Need to match pre-legalized type because
/// the generic legalization inserts the add/sub between the select and		/// the generic legalization inserts the add/sub between the select and
/// compare.		/// compare.
SDValue getFFBX_U32(SelectionDAG &DAG, SDValue Op, const SDLoc &DL, unsigned Opc) const;		SDValue getFFBX_U32(SelectionDAG &DAG, SDValue Op, const SDLoc &DL, unsigned Opc) const;

public:		public:
static unsigned numBitsUnsigned(SDValue Op, SelectionDAG &DAG);		static unsigned numBitsUnsigned(SDValue Op, SelectionDAG &DAG);
static unsigned numBitsSigned(SDValue Op, SelectionDAG &DAG);		static unsigned numBitsSigned(SDValue Op, SelectionDAG &DAG);

protected:		protected:
const AMDGPUSubtarget *Subtarget;
AMDGPUAS AMDGPUASI;		AMDGPUAS AMDGPUASI;

SDValue LowerEXTRACT_SUBVECTOR(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerEXTRACT_SUBVECTOR(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerCONCAT_VECTORS(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerCONCAT_VECTORS(SDValue Op, SelectionDAG &DAG) const;
/// Split a vector store into multiple scalar stores.		/// Split a vector store into multiple scalar stores.
/// \returns The resulting chain.		/// \returns The resulting chain.

SDValue LowerFREM(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFREM(SDValue Op, SelectionDAG &DAG) const;
[... 68 lines not shown ...]	protected:
SDValue LowerSDIVREM(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerSDIVREM(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerUDIVREM(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerUDIVREM(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerDIVREM24(SDValue Op, SelectionDAG &DAG, bool sign) const;		SDValue LowerDIVREM24(SDValue Op, SelectionDAG &DAG, bool sign) const;
void LowerUDIVREM64(SDValue Op, SelectionDAG &DAG,		void LowerUDIVREM64(SDValue Op, SelectionDAG &DAG,
SmallVectorImpl<SDValue> &Results) const;		SmallVectorImpl<SDValue> &Results) const;
void analyzeFormalArgumentsCompute(CCState &State,		void analyzeFormalArgumentsCompute(CCState &State,
const SmallVectorImpl<ISD::InputArg> &Ins) const;		const SmallVectorImpl<ISD::InputArg> &Ins) const;
public:		public:
AMDGPUTargetLowering(const TargetMachine &TM, const AMDGPUSubtarget &STI);		AMDGPUTargetLowering(const TargetMachine &TM, const AMDGPUCommonSubtarget &STI);

bool mayIgnoreSignedZero(SDValue Op) const {		bool mayIgnoreSignedZero(SDValue Op) const {
if (getTargetMachine().Options.NoSignedZerosFPMath)		if (getTargetMachine().Options.NoSignedZerosFPMath)
return true;		return true;

const auto Flags = Op.getNode()->getFlags();		const auto Flags = Op.getNode()->getFlags();
if (Flags.isDefined())		if (Flags.isDefined())
return Flags.hasNoSignedZeros();		return Flags.hasNoSignedZeros();
[... 365 lines not shown ...]

lib/Target/AMDGPU/AMDGPUISelLowering.cpp

[... 149 lines not shown ...]	unsigned AMDGPUTargetLowering::numBitsSigned(SDValue Op, SelectionDAG &DAG) {
EVT VT = Op.getValueType();		EVT VT = Op.getValueType();

// In order for this to be a signed 24-bit value, bit 23, must		// In order for this to be a signed 24-bit value, bit 23, must
// be a sign bit.		// be a sign bit.
return VT.getSizeInBits() - DAG.ComputeNumSignBits(Op);		return VT.getSizeInBits() - DAG.ComputeNumSignBits(Op);
}		}

AMDGPUTargetLowering::AMDGPUTargetLowering(const TargetMachine &TM,		AMDGPUTargetLowering::AMDGPUTargetLowering(const TargetMachine &TM,
const AMDGPUSubtarget &STI)		const AMDGPUCommonSubtarget &STI)
: TargetLowering(TM), Subtarget(&STI) {		: TargetLowering(TM), Subtarget(&STI) {
AMDGPUASI = AMDGPU::getAMDGPUAS(TM);		AMDGPUASI = AMDGPU::getAMDGPUAS(TM);
// Lower floating point store/load to integer store/load to reduce the number		// Lower floating point store/load to integer store/load to reduce the number
// of patterns in tablegen.		// of patterns in tablegen.
setOperationAction(ISD::LOAD, MVT::f32, Promote);		setOperationAction(ISD::LOAD, MVT::f32, Promote);
AddPromotedToType(ISD::LOAD, MVT::f32, MVT::i32);		AddPromotedToType(ISD::LOAD, MVT::f32, MVT::i32);

setOperationAction(ISD::LOAD, MVT::v2f32, Promote);		setOperationAction(ISD::LOAD, MVT::v2f32, Promote);
[... 158 lines not shown ...]	AMDGPUTargetLowering::AMDGPUTargetLowering(const TargetMachine &TM,
setOperationAction(ISD::FMAXNUM, MVT::f32, Legal);		setOperationAction(ISD::FMAXNUM, MVT::f32, Legal);

setOperationAction(ISD::FROUND, MVT::f32, Custom);		setOperationAction(ISD::FROUND, MVT::f32, Custom);
setOperationAction(ISD::FROUND, MVT::f64, Custom);		setOperationAction(ISD::FROUND, MVT::f64, Custom);

setOperationAction(ISD::FLOG, MVT::f32, Custom);		setOperationAction(ISD::FLOG, MVT::f32, Custom);
setOperationAction(ISD::FLOG10, MVT::f32, Custom);		setOperationAction(ISD::FLOG10, MVT::f32, Custom);

if (Subtarget->has16BitInsts()) {
setOperationAction(ISD::FLOG, MVT::f16, Custom);
setOperationAction(ISD::FLOG10, MVT::f16, Custom);
}

setOperationAction(ISD::FNEARBYINT, MVT::f32, Custom);		setOperationAction(ISD::FNEARBYINT, MVT::f32, Custom);
setOperationAction(ISD::FNEARBYINT, MVT::f64, Custom);		setOperationAction(ISD::FNEARBYINT, MVT::f64, Custom);

setOperationAction(ISD::FREM, MVT::f32, Custom);		setOperationAction(ISD::FREM, MVT::f32, Custom);
setOperationAction(ISD::FREM, MVT::f64, Custom);		setOperationAction(ISD::FREM, MVT::f64, Custom);

// v_mad_f32 does not support denormals according to some sources.
if (!Subtarget->hasFP32Denormals())
setOperationAction(ISD::FMAD, MVT::f32, Legal);

// Expand to fneg + fadd.		// Expand to fneg + fadd.
setOperationAction(ISD::FSUB, MVT::f64, Expand);		setOperationAction(ISD::FSUB, MVT::f64, Expand);

setOperationAction(ISD::CONCAT_VECTORS, MVT::v4i32, Custom);		setOperationAction(ISD::CONCAT_VECTORS, MVT::v4i32, Custom);
setOperationAction(ISD::CONCAT_VECTORS, MVT::v4f32, Custom);		setOperationAction(ISD::CONCAT_VECTORS, MVT::v4f32, Custom);
setOperationAction(ISD::CONCAT_VECTORS, MVT::v8i32, Custom);		setOperationAction(ISD::CONCAT_VECTORS, MVT::v8i32, Custom);
setOperationAction(ISD::CONCAT_VECTORS, MVT::v8f32, Custom);		setOperationAction(ISD::CONCAT_VECTORS, MVT::v8f32, Custom);
setOperationAction(ISD::EXTRACT_SUBVECTOR, MVT::v2f32, Custom);		setOperationAction(ISD::EXTRACT_SUBVECTOR, MVT::v2f32, Custom);
setOperationAction(ISD::EXTRACT_SUBVECTOR, MVT::v2i32, Custom);		setOperationAction(ISD::EXTRACT_SUBVECTOR, MVT::v2i32, Custom);
setOperationAction(ISD::EXTRACT_SUBVECTOR, MVT::v4f32, Custom);		setOperationAction(ISD::EXTRACT_SUBVECTOR, MVT::v4f32, Custom);
setOperationAction(ISD::EXTRACT_SUBVECTOR, MVT::v4i32, Custom);		setOperationAction(ISD::EXTRACT_SUBVECTOR, MVT::v4i32, Custom);
setOperationAction(ISD::EXTRACT_SUBVECTOR, MVT::v8f32, Custom);		setOperationAction(ISD::EXTRACT_SUBVECTOR, MVT::v8f32, Custom);
setOperationAction(ISD::EXTRACT_SUBVECTOR, MVT::v8i32, Custom);		setOperationAction(ISD::EXTRACT_SUBVECTOR, MVT::v8i32, Custom);

if (Subtarget->getGeneration() < AMDGPUSubtarget::SEA_ISLANDS) {
setOperationAction(ISD::FCEIL, MVT::f64, Custom);
setOperationAction(ISD::FTRUNC, MVT::f64, Custom);
setOperationAction(ISD::FRINT, MVT::f64, Custom);
setOperationAction(ISD::FFLOOR, MVT::f64, Custom);
}

if (!Subtarget->hasBFI()) {
// fcopysign can be done in a single instruction with BFI.
setOperationAction(ISD::FCOPYSIGN, MVT::f32, Expand);
setOperationAction(ISD::FCOPYSIGN, MVT::f64, Expand);
}

setOperationAction(ISD::FP16_TO_FP, MVT::f64, Expand);		setOperationAction(ISD::FP16_TO_FP, MVT::f64, Expand);
setOperationAction(ISD::FP_TO_FP16, MVT::f64, Custom);		setOperationAction(ISD::FP_TO_FP16, MVT::f64, Custom);
setOperationAction(ISD::FP_TO_FP16, MVT::f32, Custom);		setOperationAction(ISD::FP_TO_FP16, MVT::f32, Custom);

const MVT ScalarIntVTs[] = { MVT::i32, MVT::i64 };		const MVT ScalarIntVTs[] = { MVT::i32, MVT::i64 };
for (MVT VT : ScalarIntVTs) {		for (MVT VT : ScalarIntVTs) {
// These should use [SU]DIVREM, so set them to expand		// These should use [SU]DIVREM, so set them to expand
setOperationAction(ISD::SDIV, VT, Expand);		setOperationAction(ISD::SDIV, VT, Expand);
[... 15 lines not shown ...]	for (MVT VT : ScalarIntVTs) {

// AMDGPU uses ADDC/SUBC/ADDE/SUBE		// AMDGPU uses ADDC/SUBC/ADDE/SUBE
setOperationAction(ISD::ADDC, VT, Legal);		setOperationAction(ISD::ADDC, VT, Legal);
setOperationAction(ISD::SUBC, VT, Legal);		setOperationAction(ISD::SUBC, VT, Legal);
setOperationAction(ISD::ADDE, VT, Legal);		setOperationAction(ISD::ADDE, VT, Legal);
setOperationAction(ISD::SUBE, VT, Legal);		setOperationAction(ISD::SUBE, VT, Legal);
}		}

if (!Subtarget->hasBCNT(32))
setOperationAction(ISD::CTPOP, MVT::i32, Expand);

if (!Subtarget->hasBCNT(64))
setOperationAction(ISD::CTPOP, MVT::i64, Expand);

// The hardware supports 32-bit ROTR, but not ROTL.		// The hardware supports 32-bit ROTR, but not ROTL.
setOperationAction(ISD::ROTL, MVT::i32, Expand);		setOperationAction(ISD::ROTL, MVT::i32, Expand);
setOperationAction(ISD::ROTL, MVT::i64, Expand);		setOperationAction(ISD::ROTL, MVT::i64, Expand);
setOperationAction(ISD::ROTR, MVT::i64, Expand);		setOperationAction(ISD::ROTR, MVT::i64, Expand);

setOperationAction(ISD::MUL, MVT::i64, Expand);		setOperationAction(ISD::MUL, MVT::i64, Expand);
setOperationAction(ISD::MULHU, MVT::i64, Expand);		setOperationAction(ISD::MULHU, MVT::i64, Expand);
setOperationAction(ISD::MULHS, MVT::i64, Expand);		setOperationAction(ISD::MULHS, MVT::i64, Expand);
setOperationAction(ISD::UINT_TO_FP, MVT::i64, Custom);		setOperationAction(ISD::UINT_TO_FP, MVT::i64, Custom);
setOperationAction(ISD::SINT_TO_FP, MVT::i64, Custom);		setOperationAction(ISD::SINT_TO_FP, MVT::i64, Custom);
setOperationAction(ISD::FP_TO_SINT, MVT::i64, Custom);		setOperationAction(ISD::FP_TO_SINT, MVT::i64, Custom);
setOperationAction(ISD::FP_TO_UINT, MVT::i64, Custom);		setOperationAction(ISD::FP_TO_UINT, MVT::i64, Custom);
setOperationAction(ISD::SELECT_CC, MVT::i64, Expand);		setOperationAction(ISD::SELECT_CC, MVT::i64, Expand);

setOperationAction(ISD::SMIN, MVT::i32, Legal);		setOperationAction(ISD::SMIN, MVT::i32, Legal);
setOperationAction(ISD::UMIN, MVT::i32, Legal);		setOperationAction(ISD::UMIN, MVT::i32, Legal);
setOperationAction(ISD::SMAX, MVT::i32, Legal);		setOperationAction(ISD::SMAX, MVT::i32, Legal);
setOperationAction(ISD::UMAX, MVT::i32, Legal);		setOperationAction(ISD::UMAX, MVT::i32, Legal);

if (Subtarget->hasFFBH())
setOperationAction(ISD::CTLZ_ZERO_UNDEF, MVT::i32, Custom);

if (Subtarget->hasFFBL())
setOperationAction(ISD::CTTZ_ZERO_UNDEF, MVT::i32, Custom);

setOperationAction(ISD::CTTZ, MVT::i64, Custom);		setOperationAction(ISD::CTTZ, MVT::i64, Custom);
setOperationAction(ISD::CTTZ_ZERO_UNDEF, MVT::i64, Custom);		setOperationAction(ISD::CTTZ_ZERO_UNDEF, MVT::i64, Custom);
setOperationAction(ISD::CTLZ, MVT::i64, Custom);		setOperationAction(ISD::CTLZ, MVT::i64, Custom);
setOperationAction(ISD::CTLZ_ZERO_UNDEF, MVT::i64, Custom);		setOperationAction(ISD::CTLZ_ZERO_UNDEF, MVT::i64, Custom);

// We only really have 32-bit BFE instructions (and 16-bit on VI).
//
// On SI+ there are 64-bit BFEs, but they are scalar only and there isn't any
// effort to match them now. We want this to be false for i64 cases when the
// extraction isn't restricted to the upper or lower half. Ideally we would
// have some pass reduce 64-bit extracts to 32-bit if possible. Extracts that
// span the midpoint are probably relatively rare, so don't worry about them
// for now.
if (Subtarget->hasBFE())
setHasExtractBitsInsn(true);

static const MVT::SimpleValueType VectorIntTypes[] = {		static const MVT::SimpleValueType VectorIntTypes[] = {
MVT::v2i32, MVT::v4i32		MVT::v2i32, MVT::v4i32
};		};

for (MVT VT : VectorIntTypes) {		for (MVT VT : VectorIntTypes) {
// Expand the following operations for the current type by default.		// Expand the following operations for the current type by default.
setOperationAction(ISD::ADD, VT, Expand);		setOperationAction(ISD::ADD, VT, Expand);
setOperationAction(ISD::AND, VT, Expand);		setOperationAction(ISD::AND, VT, Expand);
[... 88 lines not shown ...]	AMDGPUTargetLowering::AMDGPUTargetLowering(const TargetMachine &TM,
// FIXME: This is only partially true. If we have to do vector compares, any		// FIXME: This is only partially true. If we have to do vector compares, any
// SGPR pair can be a condition register. If we have a uniform condition, we		// SGPR pair can be a condition register. If we have a uniform condition, we
// are better off doing SALU operations, where there is only one SCC. For now,		// are better off doing SALU operations, where there is only one SCC. For now,
// we don't have a way of knowing during instruction selection if a condition		// we don't have a way of knowing during instruction selection if a condition
// will be uniform and we always use vector compares. Assume we are using		// will be uniform and we always use vector compares. Assume we are using
// vector compares until that is fixed.		// vector compares until that is fixed.
setHasMultipleConditionRegisters(true);		setHasMultipleConditionRegisters(true);

// SI at least has hardware support for floating point exceptions, but no way
// of using or handling them is implemented. They are also optional in OpenCL
// (Section 7.3)
setHasFloatingPointExceptions(Subtarget->hasFPExceptions());

PredictableSelectIsExpensive = false;		PredictableSelectIsExpensive = false;

// We want to find all load dependencies for long chains of stores to enable		// We want to find all load dependencies for long chains of stores to enable
// merging into very wide vectors. The problem is with vectors with > 4		// merging into very wide vectors. The problem is with vectors with > 4
// elements. MergeConsecutiveStores will attempt to merge these because x8/x16		// elements. MergeConsecutiveStores will attempt to merge these because x8/x16
// vectors are a legal type, even though we have to split the loads		// vectors are a legal type, even though we have to split the loads
// usually. When we can more precisely specify load legality per address		// usually. When we can more precisely specify load legality per address
// space, we should be able to make FindBetterChain/MergeConsecutiveStores		// space, we should be able to make FindBetterChain/MergeConsecutiveStores
▲ Show 20 Lines • Show All 206 Lines • ▼ Show 20 Lines	case ISD::INTRINSIC_WO_CHAIN:
return true;		return true;
}		}
}		}
break;		break;
case ISD::LOAD:		case ISD::LOAD:
{		{
const LoadSDNode * L = dyn_cast<LoadSDNode>(N);		const LoadSDNode * L = dyn_cast<LoadSDNode>(N);
if (L->getMemOperand()->getAddrSpace()		if (L->getMemOperand()->getAddrSpace()
== Subtarget->getAMDGPUAS().CONSTANT_ADDRESS_32BIT)		== AMDGPUASI.CONSTANT_ADDRESS_32BIT)
return true;		return true;
return false;		return false;
}		}
break;		break;
}		}
}		}

//===---------------------------------------------------------------------===//		//===---------------------------------------------------------------------===//
▲ Show 20 Lines • Show All 3,488 Lines • ▼ Show 20 Lines	case AMDGPUISD::PERM: {
}		}
break;		break;
}		}
case ISD::INTRINSIC_WO_CHAIN: {		case ISD::INTRINSIC_WO_CHAIN: {
unsigned IID = cast<ConstantSDNode>(Op.getOperand(0))->getZExtValue();		unsigned IID = cast<ConstantSDNode>(Op.getOperand(0))->getZExtValue();
switch (IID) {		switch (IID) {
case Intrinsic::amdgcn_mbcnt_lo:		case Intrinsic::amdgcn_mbcnt_lo:
case Intrinsic::amdgcn_mbcnt_hi: {		case Intrinsic::amdgcn_mbcnt_hi: {
		const SISubtarget &ST =
		DAG.getMachineFunction().getSubtarget<SISubtarget>();
// These return at most the wavefront size - 1.		// These return at most the wavefront size - 1.
unsigned Size = Op.getValueType().getSizeInBits();		unsigned Size = Op.getValueType().getSizeInBits();
Known.Zero.setHighBits(Size - Subtarget->getWavefrontSizeLog2());		Known.Zero.setHighBits(Size - ST.getWavefrontSizeLog2());
break;		break;
}		}
default:		default:
break;		break;
}		}
}		}
}		}
}		}
Show All 34 Lines

lib/Target/AMDGPU/AMDGPUInstrInfo.h

	Show All 14 Lines

	#ifndef LLVM_LIB_TARGET_AMDGPU_AMDGPUINSTRINFO_H			#ifndef LLVM_LIB_TARGET_AMDGPU_AMDGPUINSTRINFO_H
	#define LLVM_LIB_TARGET_AMDGPU_AMDGPUINSTRINFO_H			#define LLVM_LIB_TARGET_AMDGPU_AMDGPUINSTRINFO_H

	#include "AMDGPU.h"			#include "AMDGPU.h"
	#include "Utils/AMDGPUBaseInfo.h"			#include "Utils/AMDGPUBaseInfo.h"
	#include "llvm/CodeGen/TargetInstrInfo.h"			#include "llvm/CodeGen/TargetInstrInfo.h"

	#define GET_INSTRINFO_HEADER
	#include "AMDGPUGenInstrInfo.inc"
	#undef GET_INSTRINFO_HEADER

	namespace llvm {			namespace llvm {

	class AMDGPUSubtarget;			class AMDGPUSubtarget;
	class MachineFunction;			class MachineFunction;
	class MachineInstr;			class MachineInstr;
	class MachineInstrBuilder;			class MachineInstrBuilder;

	class AMDGPUInstrInfo : public AMDGPUGenInstrInfo {			class AMDGPUInstrInfo {
	private:
	const AMDGPUSubtarget &ST;

	virtual void anchor();
	protected:
	AMDGPUAS AMDGPUASI;

	public:			public:
	explicit AMDGPUInstrInfo(const AMDGPUSubtarget &st);			explicit AMDGPUInstrInfo(const AMDGPUSubtarget &st);

	bool shouldScheduleLoadsNear(SDNode *Load1, SDNode *Load2,
	int64_t Offset1, int64_t Offset2,
	unsigned NumLoads) const override;

	/// Return a target-specific opcode if Opcode is a pseudo instruction.
	/// Return -1 if the target-specific opcode for the pseudo instruction does
	/// not exist. If Opcode is not a pseudo instruction, this is identity.
	int pseudoToMCOpcode(int Opcode) const;

	static bool isUniformMMO(const MachineMemOperand *MMO);			static bool isUniformMMO(const MachineMemOperand *MMO);
	};			};

	namespace AMDGPU {			namespace AMDGPU {

	struct RsrcIntrinsic {			struct RsrcIntrinsic {
	unsigned Intr;			unsigned Intr;
	uint8_t RsrcArg;			uint8_t RsrcArg;
	Show All 21 Lines

lib/Target/AMDGPU/AMDGPUInstrInfo.cpp

	//===-- AMDGPUInstrInfo.cpp - Base class for AMD GPU InstrInfo ------------===//			//===-- AMDGPUInstrInfo.cpp - Base class for AMD GPU InstrInfo ------------===//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	/// \file			/// \file
	/// Implementation of the TargetInstrInfo class that is common to all			/// \brief Implementation of the TargetInstrInfo class that is common to all
	/// AMD GPUs.			/// AMD GPUs.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "AMDGPUInstrInfo.h"			#include "AMDGPUInstrInfo.h"
	#include "AMDGPURegisterInfo.h"			#include "AMDGPURegisterInfo.h"
	#include "AMDGPUTargetMachine.h"			#include "AMDGPUTargetMachine.h"
	#include "MCTargetDesc/AMDGPUMCTargetDesc.h"			#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
	#include "llvm/CodeGen/MachineFrameInfo.h"			#include "llvm/CodeGen/MachineFrameInfo.h"
	#include "llvm/CodeGen/MachineInstrBuilder.h"			#include "llvm/CodeGen/MachineInstrBuilder.h"
	#include "llvm/CodeGen/MachineRegisterInfo.h"			#include "llvm/CodeGen/MachineRegisterInfo.h"

	using namespace llvm;			using namespace llvm;

	#define GET_INSTRINFO_CTOR_DTOR
	#include "AMDGPUGenInstrInfo.inc"

	namespace llvm {
	namespace AMDGPU {
	#define GET_D16ImageDimIntrinsics_IMPL
	#define GET_ImageDimIntrinsicTable_IMPL
	#define GET_RsrcIntrinsics_IMPL
	#include "AMDGPUGenSearchableTables.inc"
	}
	}

	// Pin the vtable to this file.			// Pin the vtable to this file.
	void AMDGPUInstrInfo::anchor() {}			//void AMDGPUInstrInfo::anchor() {}
				arsenm: Commented out code

	AMDGPUInstrInfo::AMDGPUInstrInfo(const AMDGPUSubtarget &ST)			AMDGPUInstrInfo::AMDGPUInstrInfo(const AMDGPUSubtarget &ST) { }
	: AMDGPUGenInstrInfo(AMDGPU::ADJCALLSTACKUP, AMDGPU::ADJCALLSTACKDOWN),
	ST(ST),
	AMDGPUASI(ST.getAMDGPUAS()) {}

	// FIXME: This behaves strangely. If, for example, you have 32 load + stores,
	// the first 16 loads will be interleaved with the stores, and the next 16 will
	// be clustered as expected. It should really split into 2 16 store batches.
	//
	// Loads are clustered until this returns false, rather than trying to schedule
	// groups of stores. This also means we have to deal with saying different
	// address space loads should be clustered, and ones which might cause bank
	// conflicts.
	//
	// This might be deprecated so it might not be worth that much effort to fix.
	bool AMDGPUInstrInfo::shouldScheduleLoadsNear(SDNode *Load0, SDNode *Load1,
	int64_t Offset0, int64_t Offset1,
	unsigned NumLoads) const {
	assert(Offset1 > Offset0 &&
	"Second offset should be larger than first offset!");
	// If we have less than 16 loads in a row, and the offsets are within 64
	// bytes, then schedule together.

	// A cacheline is 64 bytes (for global memory).
	return (NumLoads <= 16 && (Offset1 - Offset0) < 64);
	}

	// This must be kept in sync with the SIEncodingFamily class in SIInstrInfo.td
	enum SIEncodingFamily {
	SI = 0,
	VI = 1,
	SDWA = 2,
	SDWA9 = 3,
	GFX80 = 4,
	GFX9 = 5
	};

	static SIEncodingFamily subtargetEncodingFamily(const AMDGPUSubtarget &ST) {
	switch (ST.getGeneration()) {
	case AMDGPUSubtarget::SOUTHERN_ISLANDS:
	case AMDGPUSubtarget::SEA_ISLANDS:
	return SIEncodingFamily::SI;
	case AMDGPUSubtarget::VOLCANIC_ISLANDS:
	case AMDGPUSubtarget::GFX9:
	return SIEncodingFamily::VI;

	// FIXME: This should never be called for r600 GPUs.
	case AMDGPUSubtarget::R600:
	case AMDGPUSubtarget::R700:
	case AMDGPUSubtarget::EVERGREEN:
	case AMDGPUSubtarget::NORTHERN_ISLANDS:
	return SIEncodingFamily::SI;
	}

	llvm_unreachable("Unknown subtarget generation!");
	}

	int AMDGPUInstrInfo::pseudoToMCOpcode(int Opcode) const {
	SIEncodingFamily Gen = subtargetEncodingFamily(ST);

	if ((get(Opcode).TSFlags & SIInstrFlags::renamedInGFX9) != 0 &&
	ST.getGeneration() >= AMDGPUSubtarget::GFX9)
	Gen = SIEncodingFamily::GFX9;

	if (get(Opcode).TSFlags & SIInstrFlags::SDWA)
	Gen = ST.getGeneration() == AMDGPUSubtarget::GFX9 ? SIEncodingFamily::SDWA9
	: SIEncodingFamily::SDWA;
	// Adjust the encoding family to GFX80 for D16 buffer instructions when the
	// subtarget has UnpackedD16VMem feature.
	// TODO: remove this when we discard GFX80 encoding.
	if (ST.hasUnpackedD16VMem() && (get(Opcode).TSFlags & SIInstrFlags::D16Buf))
	Gen = SIEncodingFamily::GFX80;

	int MCOp = AMDGPU::getMCOpcode(Opcode, Gen);

	// -1 means that Opcode is already a native instruction.
	if (MCOp == -1)
	return Opcode;

	// (uint16_t)-1 means that Opcode is a pseudo instruction that has
	// no encoding in the given subtarget generation.
	if (MCOp == (uint16_t)-1)
	return -1;

	return MCOp;
	}

	// TODO: Should largely merge with AMDGPUTTIImpl::isSourceOfDivergence.			// TODO: Should largely merge with AMDGPUTTIImpl::isSourceOfDivergence.
	bool AMDGPUInstrInfo::isUniformMMO(const MachineMemOperand *MMO) {			bool AMDGPUInstrInfo::isUniformMMO(const MachineMemOperand *MMO) {
	const Value *Ptr = MMO->getValue();			const Value *Ptr = MMO->getValue();
	// UndefValue means this is a load of a kernel input. These are uniform.			// UndefValue means this is a load of a kernel input. These are uniform.
	// Sometimes LDS instructions have constant pointers.			// Sometimes LDS instructions have constant pointers.
	// If Ptr is null, then that means this mem operand contains a			// If Ptr is null, then that means this mem operand contains a
	// PseudoSourceValue like GOT.			// PseudoSourceValue like GOT.
	Show All 13 Lines

lib/Target/AMDGPU/AMDGPUInstructions.td

Show All 36 Lines
}		}

class AMDGPUShaderInst <dag outs, dag ins, string asm = "",		class AMDGPUShaderInst <dag outs, dag ins, string asm = "",
list<dag> pattern = []> : AMDGPUInst<outs, ins, asm, pattern> {		list<dag> pattern = []> : AMDGPUInst<outs, ins, asm, pattern> {

field bits<32> Inst = 0xffffffff;		field bits<32> Inst = 0xffffffff;
}		}

		//===---------------------------------------------------------------------===//
		// Return instruction
		//===---------------------------------------------------------------------===//

		class ILFormat<dag outs, dag ins, string asmstr, list<dag> pattern>
		arsenm: Should probably rename this at some point
		: Instruction {

		let Namespace = "AMDGPU";
		dag OutOperandList = outs;
		dag InOperandList = ins;
		let Pattern = pattern;
		let AsmString = !strconcat(asmstr, "\n");
		let isPseudo = 1;
		let Itinerary = NullALU;
		bit hasIEEEFlag = 0;
		bit hasZeroOpFlag = 0;
		let mayLoad = 0;
		let mayStore = 0;
		let hasSideEffects = 0;
		let isCodeGenOnly = 1;
		}

		def TruePredicate : Predicate<"true">;

		// Exists to help track down where SubtargetPredicate isn't set rather
		// than letting tablegen crash with an unhelpful error.
		def InvalidPred : Predicate<"predicate not set on instruction or pattern">;

		class PredicateControl {
		Predicate SubtargetPredicate = InvalidPred;
		list<Predicate> AssemblerPredicates = [];
		Predicate AssemblerPredicate = TruePredicate;
		list<Predicate> OtherPredicates = [];
		list<Predicate> Predicates = !listconcat([SubtargetPredicate,
		AssemblerPredicate],
		AssemblerPredicates,
		OtherPredicates);
		}
		class AMDGPUPat<dag pattern, dag result> : Pat<pattern, result>,
		PredicateControl;

def FP16Denormals : Predicate<"Subtarget->hasFP16Denormals()">;		def FP16Denormals : Predicate<"Subtarget->hasFP16Denormals()">;
def FP32Denormals : Predicate<"Subtarget->hasFP32Denormals()">;		def FP32Denormals : Predicate<"Subtarget->hasFP32Denormals()">;
def FP64Denormals : Predicate<"Subtarget->hasFP64Denormals()">;		def FP64Denormals : Predicate<"Subtarget->hasFP64Denormals()">;
def NoFP16Denormals : Predicate<"!Subtarget->hasFP16Denormals()">;		def NoFP16Denormals : Predicate<"!Subtarget->hasFP16Denormals()">;
def NoFP32Denormals : Predicate<"!Subtarget->hasFP32Denormals()">;		def NoFP32Denormals : Predicate<"!Subtarget->hasFP32Denormals()">;
def NoFP64Denormals : Predicate<"!Subtarget->hasFP64Denormals()">;		def NoFP64Denormals : Predicate<"!Subtarget->hasFP64Denormals()">;
def UnsafeFPMath : Predicate<"TM.Options.UnsafeFPMath">;		def UnsafeFPMath : Predicate<"TM.Options.UnsafeFPMath">;
def FMA : Predicate<"Subtarget->hasFMA()">;		def FMA : Predicate<"Subtarget->hasFMA()">;
Show All 36 Lines
// Custom Operands		// Custom Operands
//===--------------------------------------------------------------------===//		//===--------------------------------------------------------------------===//
def brtarget : Operand<OtherVT>;		def brtarget : Operand<OtherVT>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Misc. PatFrags		// Misc. PatFrags
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

class HasOneUseUnaryOp<SDPatternOperator op> : PatFrag<
(ops node:$src0),
(op $src0),
[{ return N->hasOneUse(); }]
>;

class HasOneUseBinOp<SDPatternOperator op> : PatFrag<		class HasOneUseBinOp<SDPatternOperator op> : PatFrag<
(ops node:$src0, node:$src1),		(ops node:$src0, node:$src1),
(op $src0, $src1),		(op $src0, $src1),
[{ return N->hasOneUse(); }]		[{ return N->hasOneUse(); }]
>;		>;

class HasOneUseTernaryOp<SDPatternOperator op> : PatFrag<		class HasOneUseTernaryOp<SDPatternOperator op> : PatFrag<
(ops node:$src0, node:$src1, node:$src2),		(ops node:$src0, node:$src1, node:$src2),
(op $src0, $src1, $src2),		(op $src0, $src1, $src2),
[{ return N->hasOneUse(); }]		[{ return N->hasOneUse(); }]
>;		>;

def trunc_oneuse : HasOneUseUnaryOp<trunc>;

let Properties = [SDNPCommutative, SDNPAssociative] in {		let Properties = [SDNPCommutative, SDNPAssociative] in {
def smax_oneuse : HasOneUseBinOp<smax>;		def smax_oneuse : HasOneUseBinOp<smax>;
def smin_oneuse : HasOneUseBinOp<smin>;		def smin_oneuse : HasOneUseBinOp<smin>;
def umax_oneuse : HasOneUseBinOp<umax>;		def umax_oneuse : HasOneUseBinOp<umax>;
def umin_oneuse : HasOneUseBinOp<umin>;		def umin_oneuse : HasOneUseBinOp<umin>;
def fminnum_oneuse : HasOneUseBinOp<fminnum>;		def fminnum_oneuse : HasOneUseBinOp<fminnum>;
def fmaxnum_oneuse : HasOneUseBinOp<fmaxnum>;		def fmaxnum_oneuse : HasOneUseBinOp<fmaxnum>;
def and_oneuse : HasOneUseBinOp<and>;		def and_oneuse : HasOneUseBinOp<and>;
▲ Show 20 Lines • Show All 110 Lines • ▼ Show 20 Lines	def COND_NE : PatLeaf <
[{return N->get() == ISD::SETNE || N->get() == ISD::SETUNE;}]		[{return N->get() == ISD::SETNE || N->get() == ISD::SETUNE;}]
>;		>;

def COND_NULL : PatLeaf <		def COND_NULL : PatLeaf <
(cond),		(cond),
[{(void)N; return false;}]		[{(void)N; return false;}]
>;		>;

		//===----------------------------------------------------------------------===//
		// PatLeafs for Texture Constants
		//===----------------------------------------------------------------------===//

		def TEX_ARRAY : PatLeaf<
		(imm),
		[{uint32_t TType = (uint32_t)N->getZExtValue();
		return TType == 9 || TType == 10 || TType == 16;
		}]
		>;

		def TEX_RECT : PatLeaf<
		(imm),
		[{uint32_t TType = (uint32_t)N->getZExtValue();
		return TType == 5;
		}]
		>;

		def TEX_SHADOW : PatLeaf<
		(imm),
		[{uint32_t TType = (uint32_t)N->getZExtValue();
		return (TType >= 6 && TType <= 8) || TType == 13;
		}]
		>;

		def TEX_SHADOW_ARRAY : PatLeaf<
		(imm),
		[{uint32_t TType = (uint32_t)N->getZExtValue();
		return TType == 11 || TType == 12 || TType == 17;
		}]
		>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Load/Store Pattern Fragments		// Load/Store Pattern Fragments
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

class Aligned8Bytes <dag ops, dag frag> : PatFrag <ops, frag, [{		class Aligned8Bytes <dag ops, dag frag> : PatFrag <ops, frag, [{
return cast<MemSDNode>(N)->getAlignment() % 8 == 0;		return cast<MemSDNode>(N)->getAlignment() % 8 == 0;
}]>;		}]>;
▲ Show 20 Lines • Show All 513 Lines • ▼ Show 20 Lines	class RcpPat<Instruction RcpInst, ValueType vt> : AMDGPUPat <
(fdiv FP_ONE, vt:$src),		(fdiv FP_ONE, vt:$src),
(RcpInst $src)		(RcpInst $src)
>;		>;

class RsqPat<Instruction RsqInst, ValueType vt> : AMDGPUPat <		class RsqPat<Instruction RsqInst, ValueType vt> : AMDGPUPat <
(AMDGPUrcp (fsqrt vt:$src)),		(AMDGPUrcp (fsqrt vt:$src)),
(RsqInst $src)		(RsqInst $src)
>;		>;

include "R600Instructions.td"
include "R700Instructions.td"
include "EvergreenInstructions.td"
include "CaymanInstructions.td"

include "SIInstrInfo.td"

lib/Target/AMDGPU/AMDGPUIntrinsics.td

	//===-- AMDGPUIntrinsics.td - Common intrinsics -- tablegen ------------===//			//===-- AMDGPUIntrinsics.td - Common intrinsics -- tablegen ------------===//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// This file defines intrinsics that are used by all hw codegen targets.			// This file defines intrinsics that are used by all hw codegen targets.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	let TargetPrefix = "AMDGPU", isTarget = 1 in {			let TargetPrefix = "AMDGPU", isTarget = 1 in {
	def int_AMDGPU_kill : Intrinsic<[], [llvm_float_ty], []>;			def int_AMDGPU_kill : Intrinsic<[], [llvm_float_ty], []>;
	}			}

	include "SIIntrinsics.td"

lib/Target/AMDGPU/AMDGPULowerIntrinsics.cpp

	Show First 20 Lines • Show All 111 Lines • ▼ Show 20 Lines
	}			}

	bool AMDGPULowerIntrinsics::makeLIDRangeMetadata(Function &F) const {			bool AMDGPULowerIntrinsics::makeLIDRangeMetadata(Function &F) const {
	auto *TPC = getAnalysisIfAvailable<TargetPassConfig>();			auto *TPC = getAnalysisIfAvailable<TargetPassConfig>();
	if (!TPC)			if (!TPC)
	return false;			return false;

	const TargetMachine &TM = TPC->getTM<TargetMachine>();			const TargetMachine &TM = TPC->getTM<TargetMachine>();
	const AMDGPUSubtarget &ST = TM.getSubtarget<AMDGPUSubtarget>(F);
	bool Changed = false;			bool Changed = false;

	for (auto *U : F.users()) {			for (auto *U : F.users()) {
	auto *CI = dyn_cast<CallInst>(U);			auto *CI = dyn_cast<CallInst>(U);
	if (!CI)			if (!CI)
	continue;			continue;

	Changed |= ST.makeLIDRangeMetadata(CI);			Changed |= AMDGPUCommonSubtarget::get(TM, F).makeLIDRangeMetadata(CI);
	}			}
	return Changed;			return Changed;
	}			}

	bool AMDGPULowerIntrinsics::runOnModule(Module &M) {			bool AMDGPULowerIntrinsics::runOnModule(Module &M) {
	bool Changed = false;			bool Changed = false;

	for (Function &F : M) {			for (Function &F : M) {
	Show All 34 Lines

lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp

Show First 20 Lines • Show All 146 Lines • ▼ Show 20 Lines	if (auto *TPC = getAnalysisIfAvailable<TargetPassConfig>())
TM = &TPC->getTM<TargetMachine>();		TM = &TPC->getTM<TargetMachine>();
else		else
return false;		return false;

const Triple &TT = TM->getTargetTriple();		const Triple &TT = TM->getTargetTriple();
IsAMDGCN = TT.getArch() == Triple::amdgcn;		IsAMDGCN = TT.getArch() == Triple::amdgcn;
IsAMDHSA = TT.getOS() == Triple::AMDHSA;		IsAMDHSA = TT.getOS() == Triple::AMDHSA;

const AMDGPUSubtarget &ST = TM->getSubtarget<AMDGPUSubtarget>(F);		const AMDGPUCommonSubtarget &ST = AMDGPUCommonSubtarget::get(*TM, F);
if (!ST.isPromoteAllocaEnabled())		if (!ST.isPromoteAllocaEnabled())
return false;		return false;

AS = AMDGPU::getAMDGPUAS(*F.getParent());		AS = AMDGPU::getAMDGPUAS(*F.getParent());

bool SufficientLDS = hasSufficientLocalMem(F);		bool SufficientLDS = hasSufficientLocalMem(F);
bool Changed = false;		bool Changed = false;
BasicBlock &EntryBB = *F.begin();		BasicBlock &EntryBB = *F.begin();
for (auto I = EntryBB.begin(), E = EntryBB.end(); I != E; ) {		for (auto I = EntryBB.begin(), E = EntryBB.end(); I != E; ) {
AllocaInst *AI = dyn_cast<AllocaInst>(I);		AllocaInst *AI = dyn_cast<AllocaInst>(I);

++I;		++I;
if (AI)		if (AI)
Changed |= handleAlloca(*AI, SufficientLDS);		Changed |= handleAlloca(*AI, SufficientLDS);
}		}

return Changed;		return Changed;
}		}

std::pair<Value *, Value *>		std::pair<Value *, Value *>
AMDGPUPromoteAlloca::getLocalSizeYZ(IRBuilder<> &Builder) {		AMDGPUPromoteAlloca::getLocalSizeYZ(IRBuilder<> &Builder) {
const AMDGPUSubtarget &ST = TM->getSubtarget<AMDGPUSubtarget>(		const Function &F = *Builder.GetInsertBlock()->getParent();
*Builder.GetInsertBlock()->getParent());		const AMDGPUCommonSubtarget &ST = AMDGPUCommonSubtarget::get(*TM, F);

if (!IsAMDHSA) {		if (!IsAMDHSA) {
Function *LocalSizeYFn		Function *LocalSizeYFn
= Intrinsic::getDeclaration(Mod, Intrinsic::r600_read_local_size_y);		= Intrinsic::getDeclaration(Mod, Intrinsic::r600_read_local_size_y);
Function *LocalSizeZFn		Function *LocalSizeZFn
= Intrinsic::getDeclaration(Mod, Intrinsic::r600_read_local_size_z);		= Intrinsic::getDeclaration(Mod, Intrinsic::r600_read_local_size_z);

CallInst *LocalSizeY = Builder.CreateCall(LocalSizeYFn, {});		CallInst *LocalSizeY = Builder.CreateCall(LocalSizeYFn, {});
▲ Show 20 Lines • Show All 69 Lines • ▼ Show 20 Lines	AMDGPUPromoteAlloca::getLocalSizeYZ(IRBuilder<> &Builder) {

// Extract y component. Upper half of LoadZU should be zero already.		// Extract y component. Upper half of LoadZU should be zero already.
Value *Y = Builder.CreateLShr(LoadXY, 16);		Value *Y = Builder.CreateLShr(LoadXY, 16);

return std::make_pair(Y, LoadZU);		return std::make_pair(Y, LoadZU);
}		}

Value *AMDGPUPromoteAlloca::getWorkitemID(IRBuilder<> &Builder, unsigned N) {		Value *AMDGPUPromoteAlloca::getWorkitemID(IRBuilder<> &Builder, unsigned N) {
const AMDGPUSubtarget &ST = TM->getSubtarget<AMDGPUSubtarget>(		const AMDGPUCommonSubtarget &ST =
*Builder.GetInsertBlock()->getParent());		AMDGPUCommonSubtarget::get(TM, Builder.GetInsertBlock()->getParent());
Intrinsic::ID IntrID = Intrinsic::ID::not_intrinsic;		Intrinsic::ID IntrID = Intrinsic::ID::not_intrinsic;

switch (N) {		switch (N) {
case 0:		case 0:
IntrID = IsAMDGCN ? Intrinsic::amdgcn_workitem_id_x		IntrID = IsAMDGCN ? Intrinsic::amdgcn_workitem_id_x
: Intrinsic::r600_read_tidig_x;		: Intrinsic::r600_read_tidig_x;
break;		break;
case 1:		case 1:
▲ Show 20 Lines • Show All 323 Lines • ▼ Show 20 Lines	bool AMDGPUPromoteAlloca::collectUsesWithPtrTypes(
}		}

return true;		return true;
}		}

bool AMDGPUPromoteAlloca::hasSufficientLocalMem(const Function &F) {		bool AMDGPUPromoteAlloca::hasSufficientLocalMem(const Function &F) {

FunctionType *FTy = F.getFunctionType();		FunctionType *FTy = F.getFunctionType();
const AMDGPUSubtarget &ST = TM->getSubtarget<AMDGPUSubtarget>(F);		const AMDGPUCommonSubtarget &ST = AMDGPUCommonSubtarget::get(*TM, F);

// If the function has any arguments in the local address space, then it's		// If the function has any arguments in the local address space, then it's
// possible these arguments require the entire local memory space, so		// possible these arguments require the entire local memory space, so
// we cannot use local memory in the pass.		// we cannot use local memory in the pass.
for (Type *ParamTy : FTy->params()) {		for (Type *ParamTy : FTy->params()) {
PointerType *PtrTy = dyn_cast<PointerType>(ParamTy);		PointerType *PtrTy = dyn_cast<PointerType>(ParamTy);
if (PtrTy && PtrTy->getAddressSpace() == AS.LOCAL_ADDRESS) {		if (PtrTy && PtrTy->getAddressSpace() == AS.LOCAL_ADDRESS) {
LocalMemLimit = 0;		LocalMemLimit = 0;
▲ Show 20 Lines • Show All 110 Lines • ▼ Show 20 Lines	LLVM_DEBUG(
<< " promote alloca to LDS not supported with calling convention.\n");		<< " promote alloca to LDS not supported with calling convention.\n");
return false;		return false;
}		}

// Not likely to have sufficient local memory for promotion.		// Not likely to have sufficient local memory for promotion.
if (!SufficientLDS)		if (!SufficientLDS)
return false;		return false;

const AMDGPUSubtarget &ST =		const AMDGPUCommonSubtarget &ST = AMDGPUCommonSubtarget::get(*TM, ContainingFunction);
TM->getSubtarget<AMDGPUSubtarget>(ContainingFunction);
unsigned WorkGroupSize = ST.getFlatWorkGroupSizes(ContainingFunction).second;		unsigned WorkGroupSize = ST.getFlatWorkGroupSizes(ContainingFunction).second;

const DataLayout &DL = Mod->getDataLayout();		const DataLayout &DL = Mod->getDataLayout();

unsigned Align = I.getAlignment();		unsigned Align = I.getAlignment();
if (Align == 0)		if (Align == 0)
Align = DL.getABITypeAlignment(I.getAllocatedType());		Align = DL.getABITypeAlignment(I.getAllocatedType());

▲ Show 20 Lines • Show All 174 Lines • Show Last 20 Lines

lib/Target/AMDGPU/AMDGPURegisterInfo.td

	Show All 13 Lines
	let Namespace = "AMDGPU" in {			let Namespace = "AMDGPU" in {

	foreach Index = 0-15 in {			foreach Index = 0-15 in {
	def sub#Index : SubRegIndex<32, !shl(Index, 5)>;			def sub#Index : SubRegIndex<32, !shl(Index, 5)>;
	}			}

	}			}

	include "R600RegisterInfo.td"
	include "SIRegisterInfo.td"			include "SIRegisterInfo.td"

lib/Target/AMDGPU/AMDGPUSubtarget.h

Show All 33 Lines
#include "llvm/Support/MathExtras.h"		#include "llvm/Support/MathExtras.h"
#include <cassert>		#include <cassert>
#include <cstdint>		#include <cstdint>
#include <memory>		#include <memory>
#include <utility>		#include <utility>

#define GET_SUBTARGETINFO_HEADER		#define GET_SUBTARGETINFO_HEADER
#include "AMDGPUGenSubtargetInfo.inc"		#include "AMDGPUGenSubtargetInfo.inc"
		#define GET_SUBTARGETINFO_HEADER
		#include "R600GenSubtargetInfo.inc"

namespace llvm {		namespace llvm {

class StringRef;		class StringRef;

class AMDGPUSubtarget : public AMDGPUGenSubtargetInfo {		class AMDGPUCommonSubtarget {
		private:
		Triple TargetTriple;

		protected:
		const FeatureBitset &SubtargetFeatureBits;
		bool Has16BitInsts;
		bool HasMadMixInsts;
		bool FP32Denormals;
		bool FPExceptions;
		bool HasSDWA;
		bool HasVOP3PInsts;
		bool HasMulI24;
		bool HasMulU24;
		bool HasFminFmaxLegacy;
		bool EnablePromoteAlloca;
		int LocalMemorySize;
		unsigned WavefrontSize;
		arsenm: Is it possible to avoid making these virtual?
		tstellar: I will look through this again and see if I can eliminate some of these virtual functions, but to get rid of all of them we have a few options: (1) We could eliminate the AMDGPUCommonSubtarget super class and then, in code shared between r600 and amdgcn (which is mostly IR passes and a few remaining classes like AMDGPUTargetLowering, AMDAsmPrinter, etc.), do something like: bool IsAmdHsaOS; if (Triple.getArch() == Triple::amdgcn) IsAmdHsaOS = static_cast<SISubtarget>(Subtarget).isAmdHsaOS(); else IsAmdHsaOS = static_cast<R600Subtarget>(Subtarget).isAmdHsaOS(); (2) Remove subtarget checks from shared classes by refactoring code into r600/gcn specific classes.
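		The first option in the reply above (dispatching on the architecture and casting to the concrete subtarget, rather than calling through a shared base class) might look roughly like the sketch below. Every type and function name ending in "Sketch" is a hypothetical stand-in invented for illustration; none of them are the real LLVM classes, and the real code would obtain the subtarget from the TargetMachine rather than from an untyped pointer.

```cpp
#include <cassert>

// Hypothetical stand-ins for Triple::ArchType and Triple::OSType.
enum class ArchSketch { r600, amdgcn };
enum class OSSketch { Unknown, AMDHSA };

// Two unrelated subtarget types with no common base class and no virtuals.
struct SISubtargetSketch {
  OSSketch OS;
  bool isAmdHsaOS() const { return OS == OSSketch::AMDHSA; }
};

struct R600SubtargetSketch {
  OSSketch OS;
  bool isAmdHsaOS() const { return OS == OSSketch::AMDHSA; }
};

// Shared code knows the architecture and casts to the matching concrete type.
// The caller must pass a pointer to the subtarget type that matches Arch.
bool isAmdHsaOSSketch(ArchSketch Arch, const void *Subtarget) {
  assert(Subtarget && "expected a subtarget");
  if (Arch == ArchSketch::amdgcn)
    return static_cast<const SISubtargetSketch *>(Subtarget)->isAmdHsaOS();
  return static_cast<const R600SubtargetSketch *>(Subtarget)->isAmdHsaOS();
}
```

		The trade-off is that each shared query repeats the architecture check at the call site, which is presumably why the second option (moving subtarget checks into r600/gcn specific classes) is also listed.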

		public:
		AMDGPUCommonSubtarget(const Triple &TT, const FeatureBitset &FeatureBits);

		static const AMDGPUCommonSubtarget &get(const MachineFunction &MF);
		static const AMDGPUCommonSubtarget &get(const TargetMachine &TM,
		const Function &F);

		/// \returns Default range flat work group size for a calling convention.
		std::pair<unsigned, unsigned> getDefaultFlatWorkGroupSize(CallingConv::ID CC) const;

		/// \returns Subtarget's default pair of minimum/maximum flat work group sizes
		/// for function \p F, or minimum/maximum flat work group sizes explicitly
		/// requested using "amdgpu-flat-work-group-size" attribute attached to
		/// function \p F.
		///
		/// \returns Subtarget's default values if explicitly requested values cannot
		/// be converted to integer, or violate subtarget's specifications.
		std::pair<unsigned, unsigned> getFlatWorkGroupSizes(const Function &F) const;

		/// \returns Subtarget's default pair of minimum/maximum number of waves per
		/// execution unit for function \p F, or minimum/maximum number of waves per
		/// execution unit explicitly requested using "amdgpu-waves-per-eu" attribute
		/// attached to function \p F.
		///
		/// \returns Subtarget's default values if explicitly requested values cannot
		/// be converted to integer, violate subtarget's specifications, or are not
		/// compatible with minimum/maximum number of waves limited by flat work group
		/// size, register usage, and/or lds usage.
		std::pair<unsigned, unsigned> getWavesPerEU(const Function &F) const;

		/// Return the amount of LDS that can be used that will not restrict the
		/// occupancy lower than WaveCount.
		unsigned getMaxLocalMemSizeWithWaveCount(unsigned WaveCount,
		const Function &) const;

		/// Inverse of getMaxLocalMemSizeWithWaveCount. Return the maximum wavecount if
		/// the given LDS memory size is the only constraint.
		unsigned getOccupancyWithLocalMemSize(uint32_t Bytes, const Function &) const;

		unsigned getOccupancyWithLocalMemSize(const MachineFunction &MF) const;

		bool isAmdHsaOS() const {
		return TargetTriple.getOS() == Triple::AMDHSA;
		}

		bool isAmdPalOS() const {
		return TargetTriple.getOS() == Triple::AMDPAL;
		}

		bool has16BitInsts() const {
		return Has16BitInsts;
		}

		bool hasMadMixInsts() const {
		return HasMadMixInsts;
		}

		bool hasFP32Denormals() const {
		return FP32Denormals;
		}

		bool hasFPExceptions() const {
		return FPExceptions;
		}

		bool hasSDWA() const {
		return HasSDWA;
		}

		bool hasVOP3PInsts() const {
		return HasVOP3PInsts;
		}

		bool hasMulI24() const {
		return HasMulI24;
		}

		bool hasMulU24() const {
		return HasMulU24;
		}

		bool hasFminFmaxLegacy() const {
		return HasFminFmaxLegacy;
		}

		bool isPromoteAllocaEnabled() const {
		return EnablePromoteAlloca;
		}

		unsigned getWavefrontSize() const {
		return WavefrontSize;
		}

		int getLocalMemorySize() const {
		return LocalMemorySize;
		}

		unsigned getAlignmentForImplicitArgPtr() const {
		return isAmdHsaOS() ? 8 : 4;
		}

		/// \returns Maximum number of work groups per compute unit supported by the
		/// subtarget and limited by given \p FlatWorkGroupSize.
		unsigned getMaxWorkGroupsPerCU(unsigned FlatWorkGroupSize) const {
		return AMDGPU::IsaInfo::getMaxWorkGroupsPerCU(SubtargetFeatureBits,
		FlatWorkGroupSize);
		}

		/// \returns Minimum flat work group size supported by the subtarget.
		unsigned getMinFlatWorkGroupSize() const {
		return AMDGPU::IsaInfo::getMinFlatWorkGroupSize(SubtargetFeatureBits);
		}

		/// \returns Maximum flat work group size supported by the subtarget.
		unsigned getMaxFlatWorkGroupSize() const {
		return AMDGPU::IsaInfo::getMaxFlatWorkGroupSize(SubtargetFeatureBits);
		}

		/// \returns Maximum number of waves per execution unit supported by the
		/// subtarget and limited by given \p FlatWorkGroupSize.
		unsigned getMaxWavesPerEU(unsigned FlatWorkGroupSize) const {
		return AMDGPU::IsaInfo::getMaxWavesPerEU(SubtargetFeatureBits,
		FlatWorkGroupSize);
		}

		/// \returns Minimum number of waves per execution unit supported by the
		/// subtarget.
		unsigned getMinWavesPerEU() const {
		return AMDGPU::IsaInfo::getMinWavesPerEU(SubtargetFeatureBits);
		}

		unsigned getMaxWavesPerEU() const { return 10; }

		/// Creates value range metadata on a workitemid.* intrinsic call or load.
		bool makeLIDRangeMetadata(Instruction *I) const;

		virtual ~AMDGPUCommonSubtarget() {}
		};

		class AMDGPUSubtarget : public AMDGPUGenSubtargetInfo,
		public AMDGPUCommonSubtarget {
		arsenm: Why isn't this SISubtarget/GCNSubtarget?
		tstellar: I was planning to rename this as a follow-on patch to avoid creating even more churn in this patch.
public:		public:
enum Generation {		enum Generation {
R600 = 0,		// Gap for R600 generations, so we can do comparisons between
R700,		// AMDGPUSubtarget and r600Subtarget.
EVERGREEN,		SOUTHERN_ISLANDS = 4,
NORTHERN_ISLANDS,		SEA_ISLANDS = 5,
SOUTHERN_ISLANDS,		VOLCANIC_ISLANDS = 6,
SEA_ISLANDS,		GFX9 = 7,
VOLCANIC_ISLANDS,
GFX9,
};		};

enum {		enum {
ISAVersion0_0_0,		ISAVersion0_0_0,
ISAVersion6_0_0,		ISAVersion6_0_0,
ISAVersion6_0_1,		ISAVersion6_0_1,
ISAVersion7_0_0,		ISAVersion7_0_0,
ISAVersion7_0_1,		ISAVersion7_0_1,
[25 lines not shown]	enum TrapID {
TrapIDDebugReservedFE = 0xfe,		TrapIDDebugReservedFE = 0xfe,
TrapIDDebugReservedFF = 0xff		TrapIDDebugReservedFF = 0xff
};		};

enum TrapRegValues {		enum TrapRegValues {
LLVMTrapHandlerRegValue = 1		LLVMTrapHandlerRegValue = 1
};		};

		private:
		SIFrameLowering FrameLowering;

		/// GlobalISel related APIs.
		std::unique_ptr<AMDGPUCallLowering> CallLoweringInfo;
		std::unique_ptr<InstructionSelector> InstSelector;
		std::unique_ptr<LegalizerInfo> Legalizer;
		std::unique_ptr<RegisterBankInfo> RegBankInfo;

protected:		protected:
// Basic subtarget description.		// Basic subtarget description.
Triple TargetTriple;		Triple TargetTriple;
Generation Gen;		unsigned Gen;
unsigned IsaVersion;		unsigned IsaVersion;
unsigned WavefrontSize;
int LocalMemorySize;
int LDSBankCount;		int LDSBankCount;
unsigned MaxPrivateElementSize;		unsigned MaxPrivateElementSize;

// Possibly statically set by tablegen, but may want to be overridden.		// Possibly statically set by tablegen, but may want to be overridden.
bool FastFMAF32;		bool FastFMAF32;
bool HalfRate64Ops;		bool HalfRate64Ops;

// Dynamically set bits that enable features.		// Dynamically set bits that enable features.
bool FP32Denormals;
bool FP64FP16Denormals;		bool FP64FP16Denormals;
bool FPExceptions;
bool DX10Clamp;		bool DX10Clamp;
bool FlatForGlobal;		bool FlatForGlobal;
bool AutoWaitcntBeforeBarrier;		bool AutoWaitcntBeforeBarrier;
bool CodeObjectV3;		bool CodeObjectV3;
bool UnalignedScratchAccess;		bool UnalignedScratchAccess;
bool UnalignedBufferAccess;		bool UnalignedBufferAccess;
bool HasApertureRegs;		bool HasApertureRegs;
bool EnableXNACK;		bool EnableXNACK;
bool TrapHandler;		bool TrapHandler;
bool DebuggerInsertNops;		bool DebuggerInsertNops;
bool DebuggerEmitPrologue;		bool DebuggerEmitPrologue;

// Used as options.		// Used as options.
bool EnableHugePrivateBuffer;		bool EnableHugePrivateBuffer;
bool EnableVGPRSpilling;		bool EnableVGPRSpilling;
bool EnablePromoteAlloca;
bool EnableLoadStoreOpt;		bool EnableLoadStoreOpt;
bool EnableUnsafeDSOffsetFolding;		bool EnableUnsafeDSOffsetFolding;
bool EnableSIScheduler;		bool EnableSIScheduler;
bool EnableDS128;		bool EnableDS128;
bool DumpCode;		bool DumpCode;

// Subtarget properties statically set by tablegen		// Subtarget properties statically set by tablegen
bool FP64;		bool FP64;
bool FMA;		bool FMA;
bool MIMG_R128;		bool MIMG_R128;
bool IsGCN;		bool IsGCN;
bool GCN3Encoding;		bool GCN3Encoding;
bool CIInsts;		bool CIInsts;
bool GFX9Insts;		bool GFX9Insts;
bool SGPRInitBug;		bool SGPRInitBug;
bool HasSMemRealTime;		bool HasSMemRealTime;
bool Has16BitInsts;
bool HasIntClamp;		bool HasIntClamp;
bool HasVOP3PInsts;
bool HasMadMixInsts;
bool HasFmaMixInsts;		bool HasFmaMixInsts;
bool HasMovrel;		bool HasMovrel;
bool HasVGPRIndexMode;		bool HasVGPRIndexMode;
bool HasScalarStores;		bool HasScalarStores;
bool HasScalarAtomics;		bool HasScalarAtomics;
bool HasInv2PiInlineImm;		bool HasInv2PiInlineImm;
bool HasSDWA;
bool HasSDWAOmod;		bool HasSDWAOmod;
bool HasSDWAScalar;		bool HasSDWAScalar;
bool HasSDWASdst;		bool HasSDWASdst;
bool HasSDWAMac;		bool HasSDWAMac;
bool HasSDWAOutModsVOPC;		bool HasSDWAOutModsVOPC;
bool HasDPP;		bool HasDPP;
bool HasDLInsts;		bool HasDLInsts;
bool D16PreservesUnusedBits;		bool D16PreservesUnusedBits;
bool FlatAddressSpace;		bool FlatAddressSpace;
bool FlatInstOffsets;		bool FlatInstOffsets;
bool FlatGlobalInsts;		bool FlatGlobalInsts;
bool FlatScratchInsts;		bool FlatScratchInsts;
bool AddNoCarryInsts;		bool AddNoCarryInsts;
bool HasUnpackedD16VMem;		bool HasUnpackedD16VMem;
bool R600ALUInst;		bool R600ALUInst;
bool CaymanISA;		bool CaymanISA;
bool CFALUBug;		bool CFALUBug;
bool HasVertexCache;		bool HasVertexCache;
short TexVTXClauseSize;		short TexVTXClauseSize;
bool ScalarizeGlobal;		bool ScalarizeGlobal;

// Dummy feature to use for assembler in tablegen.		// Dummy feature to use for assembler in tablegen.
bool FeatureDisable;		bool FeatureDisable;

InstrItineraryData InstrItins;
SelectionDAGTargetInfo TSInfo;		SelectionDAGTargetInfo TSInfo;
AMDGPUAS AS;		AMDGPUAS AS;

public:		public:
AMDGPUSubtarget(const Triple &TT, StringRef GPU, StringRef FS,		AMDGPUSubtarget(const Triple &TT, StringRef GPU, StringRef FS,
const TargetMachine &TM);		const TargetMachine &TM);
~AMDGPUSubtarget() override;		~AMDGPUSubtarget() override;

AMDGPUSubtarget &initializeSubtargetDependencies(const Triple &TT,		AMDGPUSubtarget &initializeSubtargetDependencies(const Triple &TT,
StringRef GPU, StringRef FS);		StringRef GPU, StringRef FS);

const AMDGPUInstrInfo *getInstrInfo() const override = 0;		virtual const SIInstrInfo *getInstrInfo() const override = 0;
const AMDGPUFrameLowering *getFrameLowering() const override = 0;
const AMDGPUTargetLowering *getTargetLowering() const override = 0;
const AMDGPURegisterInfo *getRegisterInfo() const override = 0;

const InstrItineraryData *getInstrItineraryData() const override {		const SIFrameLowering *getFrameLowering() const override {
return &InstrItins;		return &FrameLowering;
		}

		virtual const SITargetLowering *getTargetLowering() const override = 0;

		virtual const SIRegisterInfo *getRegisterInfo() const override = 0;

		const CallLowering *getCallLowering() const override {
		return CallLoweringInfo.get();
		}

		const InstructionSelector *getInstructionSelector() const override {
		return InstSelector.get();
		}

		const LegalizerInfo *getLegalizerInfo() const override {
		return Legalizer.get();
		}

		const RegisterBankInfo *getRegBankInfo() const override {
		return RegBankInfo.get();
}		}

// Nothing implemented, just prevent crashes on use.		// Nothing implemented, just prevent crashes on use.
const SelectionDAGTargetInfo *getSelectionDAGInfo() const override {		const SelectionDAGTargetInfo *getSelectionDAGInfo() const override {
return &TSInfo;		return &TSInfo;
}		}

void ParseSubtargetFeatures(StringRef CPU, StringRef FS);		void ParseSubtargetFeatures(StringRef CPU, StringRef FS);

bool isAmdHsaOS() const {
return TargetTriple.getOS() == Triple::AMDHSA;
}

bool isMesa3DOS() const {		bool isMesa3DOS() const {
return TargetTriple.getOS() == Triple::Mesa3D;		return TargetTriple.getOS() == Triple::Mesa3D;
}		}

bool isAmdPalOS() const {
return TargetTriple.getOS() == Triple::AMDPAL;
}

Generation getGeneration() const {		Generation getGeneration() const {
return Gen;		return (Generation)Gen;
}

unsigned getWavefrontSize() const {
return WavefrontSize;
}		}

unsigned getWavefrontSizeLog2() const {		unsigned getWavefrontSizeLog2() const {
return Log2_32(WavefrontSize);		return Log2_32(WavefrontSize);
}		}

int getLocalMemorySize() const {
return LocalMemorySize;
}

int getLDSBankCount() const {		int getLDSBankCount() const {
return LDSBankCount;		return LDSBankCount;
}		}

unsigned getMaxPrivateElementSize() const {		unsigned getMaxPrivateElementSize() const {
return MaxPrivateElementSize;		return MaxPrivateElementSize;
}		}

AMDGPUAS getAMDGPUAS() const {		AMDGPUAS getAMDGPUAS() const {
return AS;		return AS;
}		}

bool has16BitInsts() const {
return Has16BitInsts;
}

bool hasIntClamp() const {		bool hasIntClamp() const {
return HasIntClamp;		return HasIntClamp;
}		}

bool hasVOP3PInsts() const {
return HasVOP3PInsts;
}

bool hasFP64() const {		bool hasFP64() const {
return FP64;		return FP64;
}		}

bool hasMIMG_R128() const {		bool hasMIMG_R128() const {
return MIMG_R128;		return MIMG_R128;
}		}

		bool hasHWFP64() const {
		return FP64;
		}

bool hasFastFMAF32() const {		bool hasFastFMAF32() const {
return FastFMAF32;		return FastFMAF32;
}		}

bool hasHalfRate64Ops() const {		bool hasHalfRate64Ops() const {
return HalfRate64Ops;		return HalfRate64Ops;
}		}

bool hasAddr64() const {		bool hasAddr64() const {
return (getGeneration() < VOLCANIC_ISLANDS);		return (getGeneration() < AMDGPUSubtarget::VOLCANIC_ISLANDS);
}		}
		arsenm: Why is this needed outside of GCN code?
		tstellar: It's not. I've dropped the R600 implementation of this.

bool hasBFE() const {		bool hasBFE() const {
return (getGeneration() >= EVERGREEN);		return true;
}		}

bool hasBFI() const {		bool hasBFI() const {
return (getGeneration() >= EVERGREEN);		return true;
}		}

bool hasBFM() const {		bool hasBFM() const {
return hasBFE();		return hasBFE();
}		}

bool hasBCNT(unsigned Size) const {		bool hasBCNT(unsigned Size) const {
if (Size == 32)		return true;
return (getGeneration() >= EVERGREEN);

if (Size == 64)
return (getGeneration() >= SOUTHERN_ISLANDS);

return false;
}

bool hasMulU24() const {
return (getGeneration() >= EVERGREEN);
}

bool hasMulI24() const {
return (getGeneration() >= SOUTHERN_ISLANDS \|\|
hasCaymanISA());
}		}

bool hasFFBL() const {		bool hasFFBL() const {
return (getGeneration() >= EVERGREEN);		return true;
}		}

bool hasFFBH() const {		bool hasFFBH() const {
return (getGeneration() >= EVERGREEN);		return true;
}		}

bool hasMed3_16() const {		bool hasMed3_16() const {
return getGeneration() >= GFX9;		return getGeneration() >= AMDGPUSubtarget::GFX9;
}		}

bool hasMin3Max3_16() const {		bool hasMin3Max3_16() const {
return getGeneration() >= GFX9;		return getGeneration() >= AMDGPUSubtarget::GFX9;
}

bool hasMadMixInsts() const {
return HasMadMixInsts;
}		}

bool hasFmaMixInsts() const {		bool hasFmaMixInsts() const {
return HasFmaMixInsts;		return HasFmaMixInsts;
}		}

bool hasCARRY() const {		bool hasCARRY() const {
return (getGeneration() >= EVERGREEN);		return true;
		arsenm: Why are these leftover as virtual?
}

bool hasBORROW() const {
return (getGeneration() >= EVERGREEN);
}

bool hasCaymanISA() const {
return CaymanISA;
}		}

bool hasFMA() const {		bool hasFMA() const {
return FMA;		return FMA;
}		}

TrapHandlerAbi getTrapHandlerAbi() const {		TrapHandlerAbi getTrapHandlerAbi() const {
return isAmdHsaOS() ? TrapHandlerAbiHsa : TrapHandlerAbiNone;		return isAmdHsaOS() ? TrapHandlerAbiHsa : TrapHandlerAbiNone;
}		}

bool enableHugePrivateBuffer() const {		bool enableHugePrivateBuffer() const {
return EnableHugePrivateBuffer;		return EnableHugePrivateBuffer;
}		}

bool isPromoteAllocaEnabled() const {
return EnablePromoteAlloca;
}

bool unsafeDSOffsetFoldingEnabled() const {		bool unsafeDSOffsetFoldingEnabled() const {
return EnableUnsafeDSOffsetFolding;		return EnableUnsafeDSOffsetFolding;
}		}

bool dumpCode() const {		bool dumpCode() const {
return DumpCode;		return DumpCode;
}		}

/// Return the amount of LDS that can be used that will not restrict the		/// Return the amount of LDS that can be used that will not restrict the
/// occupancy lower than WaveCount.		/// occupancy lower than WaveCount.
unsigned getMaxLocalMemSizeWithWaveCount(unsigned WaveCount,		unsigned getMaxLocalMemSizeWithWaveCount(unsigned WaveCount,
const Function &) const;		const Function &) const;

/// Inverse of getMaxLocalMemWithWaveCount. Return the maximum wavecount if
/// the given LDS memory size is the only constraint.
unsigned getOccupancyWithLocalMemSize(uint32_t Bytes, const Function &) const;

unsigned getOccupancyWithLocalMemSize(const MachineFunction &MF) const;

bool hasFP16Denormals() const {		bool hasFP16Denormals() const {
return FP64FP16Denormals;		return FP64FP16Denormals;
}		}

bool hasFP32Denormals() const {
return FP32Denormals;
}

bool hasFP64Denormals() const {		bool hasFP64Denormals() const {
return FP64FP16Denormals;		return FP64FP16Denormals;
}		}

bool supportsMinMaxDenormModes() const {		bool supportsMinMaxDenormModes() const {
return getGeneration() >= AMDGPUSubtarget::GFX9;		return getGeneration() >= AMDGPUSubtarget::GFX9;
}		}

bool hasFPExceptions() const {
return FPExceptions;
}

bool enableDX10Clamp() const {		bool enableDX10Clamp() const {
return DX10Clamp;		return DX10Clamp;
}		}

bool enableIEEEBit(const MachineFunction &MF) const {		bool enableIEEEBit(const MachineFunction &MF) const {
return AMDGPU::isCompute(MF.getFunction().getCallingConv());		return AMDGPU::isCompute(MF.getFunction().getCallingConv());
}		}

[25 lines not shown]	bool hasUnalignedBufferAccess() const {
return UnalignedBufferAccess;		return UnalignedBufferAccess;
}		}

bool hasUnalignedScratchAccess() const {		bool hasUnalignedScratchAccess() const {
return UnalignedScratchAccess;		return UnalignedScratchAccess;
}		}

bool hasApertureRegs() const {		bool hasApertureRegs() const {
return HasApertureRegs;		return HasApertureRegs;
}		}

bool isTrapHandlerEnabled() const {		bool isTrapHandlerEnabled() const {
return TrapHandler;		return TrapHandler;
}		}

bool isXNACKEnabled() const {		bool isXNACKEnabled() const {
return EnableXNACK;		return EnableXNACK;
[49 lines not shown]	public:
bool isAmdCodeObjectV2(const Function &F) const {		bool isAmdCodeObjectV2(const Function &F) const {
return isAmdHsaOS() \|\| isMesaKernel(F);		return isAmdHsaOS() \|\| isMesaKernel(F);
}		}

bool hasMad64_32() const {		bool hasMad64_32() const {
return getGeneration() >= SEA_ISLANDS;		return getGeneration() >= SEA_ISLANDS;
}		}

bool hasFminFmaxLegacy() const {
return getGeneration() < AMDGPUSubtarget::VOLCANIC_ISLANDS;
}

bool hasSDWA() const {
return HasSDWA;
}

bool hasSDWAOmod() const {		bool hasSDWAOmod() const {
return HasSDWAOmod;		return HasSDWAOmod;
}		}

bool hasSDWAScalar() const {		bool hasSDWAScalar() const {
return HasSDWAScalar;		return HasSDWAScalar;
}		}

[22 lines not shown]	public:
}		}

/// Returns the offset in bytes from the start of the input buffer		/// Returns the offset in bytes from the start of the input buffer
/// of the first explicit kernel argument.		/// of the first explicit kernel argument.
unsigned getExplicitKernelArgOffset(const Function &F) const {		unsigned getExplicitKernelArgOffset(const Function &F) const {
return isAmdCodeObjectV2(F) ? 0 : 36;		return isAmdCodeObjectV2(F) ? 0 : 36;
}		}

unsigned getAlignmentForImplicitArgPtr() const {
return isAmdHsaOS() ? 8 : 4;
}

/// \returns Number of bytes of arguments that are passed to a shader or		/// \returns Number of bytes of arguments that are passed to a shader or
/// kernel in addition to the explicit ones declared for the function.		/// kernel in addition to the explicit ones declared for the function.
unsigned getImplicitArgNumBytes(const Function &F) const {		unsigned getImplicitArgNumBytes(const Function &F) const {
if (isMesaKernel(F))		if (isMesaKernel(F))
return 16;		return 16;
return AMDGPU::getIntegerAttribute(F, "amdgpu-implicitarg-num-bytes", 0);		return AMDGPU::getIntegerAttribute(F, "amdgpu-implicitarg-num-bytes", 0);
}		}

[12 lines not shown]	public:
bool enableMachineScheduler() const override {		bool enableMachineScheduler() const override {
return true;		return true;
}		}

bool enableSubRegLiveness() const override {		bool enableSubRegLiveness() const override {
return true;		return true;
}		}

void setScalarizeGlobalBehavior(bool b) { ScalarizeGlobal = b;}		void setScalarizeGlobalBehavior(bool b) { ScalarizeGlobal = b; }
bool getScalarizeGlobalBehavior() const { return ScalarizeGlobal;}		bool getScalarizeGlobalBehavior() const { return ScalarizeGlobal; }

/// \returns Number of execution units per compute unit supported by the		/// \returns Number of execution units per compute unit supported by the
/// subtarget.		/// subtarget.
unsigned getEUsPerCU() const {		unsigned getEUsPerCU() const {
return AMDGPU::IsaInfo::getEUsPerCU(getFeatureBits());		return AMDGPU::IsaInfo::getEUsPerCU(MCSubtargetInfo::getFeatureBits());
}

/// \returns Maximum number of work groups per compute unit supported by the
/// subtarget and limited by given \p FlatWorkGroupSize.
unsigned getMaxWorkGroupsPerCU(unsigned FlatWorkGroupSize) const {
return AMDGPU::IsaInfo::getMaxWorkGroupsPerCU(getFeatureBits(),
FlatWorkGroupSize);
}		}

/// \returns Maximum number of waves per compute unit supported by the		/// \returns Maximum number of waves per compute unit supported by the
/// subtarget without any kind of limitation.		/// subtarget without any kind of limitation.
unsigned getMaxWavesPerCU() const {		unsigned getMaxWavesPerCU() const {
return AMDGPU::IsaInfo::getMaxWavesPerCU(getFeatureBits());		return AMDGPU::IsaInfo::getMaxWavesPerCU(MCSubtargetInfo::getFeatureBits());
}		}

/// \returns Maximum number of waves per compute unit supported by the		/// \returns Maximum number of waves per compute unit supported by the
/// subtarget and limited by given \p FlatWorkGroupSize.		/// subtarget and limited by given \p FlatWorkGroupSize.
unsigned getMaxWavesPerCU(unsigned FlatWorkGroupSize) const {		unsigned getMaxWavesPerCU(unsigned FlatWorkGroupSize) const {
return AMDGPU::IsaInfo::getMaxWavesPerCU(getFeatureBits(),		return AMDGPU::IsaInfo::getMaxWavesPerCU(MCSubtargetInfo::getFeatureBits(),
FlatWorkGroupSize);		FlatWorkGroupSize);
}		}

/// \returns Minimum number of waves per execution unit supported by the
/// subtarget.
unsigned getMinWavesPerEU() const {
return AMDGPU::IsaInfo::getMinWavesPerEU(getFeatureBits());
}

/// \returns Maximum number of waves per execution unit supported by the		/// \returns Maximum number of waves per execution unit supported by the
/// subtarget without any kind of limitation.		/// subtarget without any kind of limitation.
unsigned getMaxWavesPerEU() const {		unsigned getMaxWavesPerEU() const {
return AMDGPU::IsaInfo::getMaxWavesPerEU(getFeatureBits());		return AMDGPU::IsaInfo::getMaxWavesPerEU();
}

/// \returns Maximum number of waves per execution unit supported by the
/// subtarget and limited by given \p FlatWorkGroupSize.
unsigned getMaxWavesPerEU(unsigned FlatWorkGroupSize) const {
return AMDGPU::IsaInfo::getMaxWavesPerEU(getFeatureBits(),
FlatWorkGroupSize);
}

/// \returns Minimum flat work group size supported by the subtarget.
unsigned getMinFlatWorkGroupSize() const {
return AMDGPU::IsaInfo::getMinFlatWorkGroupSize(getFeatureBits());
}

/// \returns Maximum flat work group size supported by the subtarget.
unsigned getMaxFlatWorkGroupSize() const {
return AMDGPU::IsaInfo::getMaxFlatWorkGroupSize(getFeatureBits());
}		}

/// \returns Number of waves per work group supported by the subtarget and		/// \returns Number of waves per work group supported by the subtarget and
/// limited by given \p FlatWorkGroupSize.		/// limited by given \p FlatWorkGroupSize.
unsigned getWavesPerWorkGroup(unsigned FlatWorkGroupSize) const {		unsigned getWavesPerWorkGroup(unsigned FlatWorkGroupSize) const {
return AMDGPU::IsaInfo::getWavesPerWorkGroup(getFeatureBits(),		return AMDGPU::IsaInfo::getWavesPerWorkGroup(
FlatWorkGroupSize);		MCSubtargetInfo::getFeatureBits(), FlatWorkGroupSize);
}

/// \returns Default range flat work group size for a calling convention.
std::pair<unsigned, unsigned> getDefaultFlatWorkGroupSize(CallingConv::ID CC) const;

/// \returns Subtarget's default pair of minimum/maximum flat work group sizes
/// for function \p F, or minimum/maximum flat work group sizes explicitly
/// requested using "amdgpu-flat-work-group-size" attribute attached to
/// function \p F.
///
/// \returns Subtarget's default values if explicitly requested values cannot
/// be converted to integer, or violate subtarget's specifications.
std::pair<unsigned, unsigned> getFlatWorkGroupSizes(const Function &F) const;

/// \returns Subtarget's default pair of minimum/maximum number of waves per
/// execution unit for function \p F, or minimum/maximum number of waves per
/// execution unit explicitly requested using "amdgpu-waves-per-eu" attribute
/// attached to function \p F.
///
/// \returns Subtarget's default values if explicitly requested values cannot
/// be converted to integer, violate subtarget's specifications, or are not
/// compatible with minimum/maximum number of waves limited by flat work group
/// size, register usage, and/or lds usage.
std::pair<unsigned, unsigned> getWavesPerEU(const Function &F) const;

/// Creates value range metadata on a workitemid.* intrinsic call or load.
bool makeLIDRangeMetadata(Instruction *I) const;
};

class R600Subtarget final : public AMDGPUSubtarget {
private:
R600InstrInfo InstrInfo;
R600FrameLowering FrameLowering;
R600TargetLowering TLInfo;

public:
R600Subtarget(const Triple &TT, StringRef CPU, StringRef FS,
const TargetMachine &TM);

const R600InstrInfo *getInstrInfo() const override {
return &InstrInfo;
}

const R600FrameLowering *getFrameLowering() const override {
return &FrameLowering;
}

const R600TargetLowering *getTargetLowering() const override {
return &TLInfo;
}

const R600RegisterInfo *getRegisterInfo() const override {
return &InstrInfo.getRegisterInfo();
}

bool hasCFAluBug() const {
return CFALUBug;
}

bool hasVertexCache() const {
return HasVertexCache;
}

short getTexVTXClauseSize() const {
return TexVTXClauseSize;
}		}
};		};

class SISubtarget final : public AMDGPUSubtarget {		class SISubtarget final : public AMDGPUSubtarget {
private:		private:
SIInstrInfo InstrInfo;		SIInstrInfo InstrInfo;
SIFrameLowering FrameLowering;		SIFrameLowering FrameLowering;
SITargetLowering TLInfo;		SITargetLowering TLInfo;
[34 lines not shown]	public:

const RegisterBankInfo *getRegBankInfo() const override {		const RegisterBankInfo *getRegBankInfo() const override {
return RegBankInfo.get();		return RegBankInfo.get();
}		}

const SIRegisterInfo *getRegisterInfo() const override {		const SIRegisterInfo *getRegisterInfo() const override {
return &InstrInfo.getRegisterInfo();		return &InstrInfo.getRegisterInfo();
}		}
		// static wrappers
		static bool hasHalfRate64Ops(const TargetSubtargetInfo &STI);

// XXX - Why is this here if it isn't in the default pass set?		// XXX - Why is this here if it isn't in the default pass set?
bool enableEarlyIfConversion() const override {		bool enableEarlyIfConversion() const override {
return true;		return true;
}		}

void overrideSchedPolicy(MachineSchedPolicy &Policy,		void overrideSchedPolicy(MachineSchedPolicy &Policy,
unsigned NumRegionInstrs) const override;		unsigned NumRegionInstrs) const override;

bool isVGPRSpillingEnabled(const Function& F) const;		bool isVGPRSpillingEnabled(const Function &F) const;

unsigned getMaxNumUserSGPRs() const {		unsigned getMaxNumUserSGPRs() const {
return 16;		return 16;
}		}

bool hasSMemRealTime() const {		bool hasSMemRealTime() const {
return HasSMemRealTime;		return HasSMemRealTime;
}		}
[68 lines not shown]	public:

bool hasReadM0SendMsgHazard() const {		bool hasReadM0SendMsgHazard() const {
return getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS;		return getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS;
}		}

unsigned getKernArgSegmentSize(const Function &F,		unsigned getKernArgSegmentSize(const Function &F,
unsigned ExplictArgBytes) const;		unsigned ExplictArgBytes) const;

/// Return the maximum number of waves per SIMD for kernels using \p SGPRs SGPRs		/// Return the maximum number of waves per SIMD for kernels using \p SGPRs
		/// SGPRs
unsigned getOccupancyWithNumSGPRs(unsigned SGPRs) const;		unsigned getOccupancyWithNumSGPRs(unsigned SGPRs) const;

/// Return the maximum number of waves per SIMD for kernels using \p VGPRs VGPRs		/// Return the maximum number of waves per SIMD for kernels using \p VGPRs
		/// VGPRs
unsigned getOccupancyWithNumVGPRs(unsigned VGPRs) const;		unsigned getOccupancyWithNumVGPRs(unsigned VGPRs) const;

/// \returns true if the flat_scratch register should be initialized with the		/// \returns true if the flat_scratch register should be initialized with the
/// pointer to the wave's scratch memory rather than a size and offset.		/// pointer to the wave's scratch memory rather than a size and offset.
bool flatScratchIsPointer() const {		bool flatScratchIsPointer() const {
return getGeneration() >= GFX9;		return getGeneration() >= AMDGPUSubtarget::GFX9;
}		}

/// \returns true if the machine has merged shaders in which s0-s7 are		/// \returns true if the machine has merged shaders in which s0-s7 are
/// reserved by the hardware and user SGPRs start at s8		/// reserved by the hardware and user SGPRs start at s8
bool hasMergedShaders() const {		bool hasMergedShaders() const {
return getGeneration() >= GFX9;		return getGeneration() >= GFX9;
}		}

/// \returns SGPR allocation granularity supported by the subtarget.		/// \returns SGPR allocation granularity supported by the subtarget.
unsigned getSGPRAllocGranule() const {		unsigned getSGPRAllocGranule() const {
return AMDGPU::IsaInfo::getSGPRAllocGranule(getFeatureBits());		return AMDGPU::IsaInfo::getSGPRAllocGranule(
		MCSubtargetInfo::getFeatureBits());
}		}

/// \returns SGPR encoding granularity supported by the subtarget.		/// \returns SGPR encoding granularity supported by the subtarget.
unsigned getSGPREncodingGranule() const {		unsigned getSGPREncodingGranule() const {
return AMDGPU::IsaInfo::getSGPREncodingGranule(getFeatureBits());		return AMDGPU::IsaInfo::getSGPREncodingGranule(
		MCSubtargetInfo::getFeatureBits());
}		}

/// \returns Total number of SGPRs supported by the subtarget.		/// \returns Total number of SGPRs supported by the subtarget.
unsigned getTotalNumSGPRs() const {		unsigned getTotalNumSGPRs() const {
return AMDGPU::IsaInfo::getTotalNumSGPRs(getFeatureBits());		return AMDGPU::IsaInfo::getTotalNumSGPRs(MCSubtargetInfo::getFeatureBits());
}		}

/// \returns Addressable number of SGPRs supported by the subtarget.		/// \returns Addressable number of SGPRs supported by the subtarget.
unsigned getAddressableNumSGPRs() const {		unsigned getAddressableNumSGPRs() const {
return AMDGPU::IsaInfo::getAddressableNumSGPRs(getFeatureBits());		return AMDGPU::IsaInfo::getAddressableNumSGPRs(
		MCSubtargetInfo::getFeatureBits());
}		}

/// \returns Minimum number of SGPRs that meets the given number of waves per		/// \returns Minimum number of SGPRs that meets the given number of waves per
/// execution unit requirement supported by the subtarget.		/// execution unit requirement supported by the subtarget.
unsigned getMinNumSGPRs(unsigned WavesPerEU) const {		unsigned getMinNumSGPRs(unsigned WavesPerEU) const {
return AMDGPU::IsaInfo::getMinNumSGPRs(getFeatureBits(), WavesPerEU);		return AMDGPU::IsaInfo::getMinNumSGPRs(MCSubtargetInfo::getFeatureBits(),
		WavesPerEU);
}		}

/// \returns Maximum number of SGPRs that meets the given number of waves per		/// \returns Maximum number of SGPRs that meets the given number of waves per
/// execution unit requirement supported by the subtarget.		/// execution unit requirement supported by the subtarget.
unsigned getMaxNumSGPRs(unsigned WavesPerEU, bool Addressable) const {		unsigned getMaxNumSGPRs(unsigned WavesPerEU, bool Addressable) const {
return AMDGPU::IsaInfo::getMaxNumSGPRs(getFeatureBits(), WavesPerEU,		return AMDGPU::IsaInfo::getMaxNumSGPRs(MCSubtargetInfo::getFeatureBits(),
Addressable);		WavesPerEU, Addressable);
}		}

/// \returns Reserved number of SGPRs for given function \p MF.		/// \returns Reserved number of SGPRs for given function \p MF.
unsigned getReservedNumSGPRs(const MachineFunction &MF) const;		unsigned getReservedNumSGPRs(const MachineFunction &MF) const;

/// \returns Maximum number of SGPRs that meets number of waves per execution		/// \returns Maximum number of SGPRs that meets number of waves per execution
/// unit requirement for function \p MF, or number of SGPRs explicitly		/// unit requirement for function \p MF, or number of SGPRs explicitly
/// requested using "amdgpu-num-sgpr" attribute attached to function \p MF.		/// requested using "amdgpu-num-sgpr" attribute attached to function \p MF.
///		///
/// \returns Value that meets number of waves per execution unit requirement		/// \returns Value that meets number of waves per execution unit requirement
/// if explicitly requested value cannot be converted to integer, violates		/// if explicitly requested value cannot be converted to integer, violates
/// subtarget's specifications, or does not meet number of waves per execution		/// subtarget's specifications, or does not meet number of waves per execution
/// unit requirement.		/// unit requirement.
unsigned getMaxNumSGPRs(const MachineFunction &MF) const;		unsigned getMaxNumSGPRs(const MachineFunction &MF) const;

/// \returns VGPR allocation granularity supported by the subtarget.		/// \returns VGPR allocation granularity supported by the subtarget.
unsigned getVGPRAllocGranule() const {		unsigned getVGPRAllocGranule() const {
return AMDGPU::IsaInfo::getVGPRAllocGranule(getFeatureBits());		return AMDGPU::IsaInfo::getVGPRAllocGranule(
		MCSubtargetInfo::getFeatureBits());
}		}

/// \returns VGPR encoding granularity supported by the subtarget.		/// \returns VGPR encoding granularity supported by the subtarget.
unsigned getVGPREncodingGranule() const {		unsigned getVGPREncodingGranule() const {
return AMDGPU::IsaInfo::getVGPREncodingGranule(getFeatureBits());		return AMDGPU::IsaInfo::getVGPREncodingGranule(
		MCSubtargetInfo::getFeatureBits());
}		}

/// \returns Total number of VGPRs supported by the subtarget.		/// \returns Total number of VGPRs supported by the subtarget.
unsigned getTotalNumVGPRs() const {		unsigned getTotalNumVGPRs() const {
return AMDGPU::IsaInfo::getTotalNumVGPRs(getFeatureBits());		return AMDGPU::IsaInfo::getTotalNumVGPRs(MCSubtargetInfo::getFeatureBits());
}		}

/// \returns Addressable number of VGPRs supported by the subtarget.		/// \returns Addressable number of VGPRs supported by the subtarget.
unsigned getAddressableNumVGPRs() const {		unsigned getAddressableNumVGPRs() const {
return AMDGPU::IsaInfo::getAddressableNumVGPRs(getFeatureBits());		return AMDGPU::IsaInfo::getAddressableNumVGPRs(
		MCSubtargetInfo::getFeatureBits());
}		}

/// \returns Minimum number of VGPRs that meets given number of waves per		/// \returns Minimum number of VGPRs that meets given number of waves per
/// execution unit requirement supported by the subtarget.		/// execution unit requirement supported by the subtarget.
unsigned getMinNumVGPRs(unsigned WavesPerEU) const {		unsigned getMinNumVGPRs(unsigned WavesPerEU) const {
return AMDGPU::IsaInfo::getMinNumVGPRs(getFeatureBits(), WavesPerEU);		return AMDGPU::IsaInfo::getMinNumVGPRs(MCSubtargetInfo::getFeatureBits(),
		WavesPerEU);
}		}

/// \returns Maximum number of VGPRs that meets given number of waves per		/// \returns Maximum number of VGPRs that meets given number of waves per
/// execution unit requirement supported by the subtarget.		/// execution unit requirement supported by the subtarget.
unsigned getMaxNumVGPRs(unsigned WavesPerEU) const {		unsigned getMaxNumVGPRs(unsigned WavesPerEU) const {
return AMDGPU::IsaInfo::getMaxNumVGPRs(getFeatureBits(), WavesPerEU);		return AMDGPU::IsaInfo::getMaxNumVGPRs(MCSubtargetInfo::getFeatureBits(),
		WavesPerEU);
}		}

/// \returns Maximum number of VGPRs that meets number of waves per execution		/// \returns Maximum number of VGPRs that meets number of waves per execution
/// unit requirement for function \p MF, or number of VGPRs explicitly		/// unit requirement for function \p MF, or number of VGPRs explicitly
/// requested using "amdgpu-num-vgpr" attribute attached to function \p MF.		/// requested using "amdgpu-num-vgpr" attribute attached to function \p MF.
///		///
/// \returns Value that meets number of waves per execution unit requirement		/// \returns Value that meets number of waves per execution unit requirement
/// if explicitly requested value cannot be converted to integer, violates		/// if explicitly requested value cannot be converted to integer, violates
/// subtarget's specifications, or does not meet number of waves per execution		/// subtarget's specifications, or does not meet number of waves per execution
/// unit requirement.		/// unit requirement.
unsigned getMaxNumVGPRs(const MachineFunction &MF) const;		unsigned getMaxNumVGPRs(const MachineFunction &MF) const;

void getPostRAMutations(		void getPostRAMutations(
std::vector<std::unique_ptr<ScheduleDAGMutation>> &Mutations)		std::vector<std::unique_ptr<ScheduleDAGMutation>> &Mutations)
const override;		const override;
};		};


		class R600Subtarget final : public R600GenSubtargetInfo,
		public AMDGPUCommonSubtarget {
		public:
		enum Generation { R600 = 0, R700 = 1, EVERGREEN = 2, NORTHERN_ISLANDS = 3 };

		private:
		R600InstrInfo InstrInfo;
		R600FrameLowering FrameLowering;
		bool FMA;
		bool CaymanISA;
		bool CFALUBug;
		bool DX10Clamp;
		bool HasVertexCache;
		bool R600ALUInst;
		bool FP64;
		short TexVTXClauseSize;
		Generation Gen;
		R600TargetLowering TLInfo;
		InstrItineraryData InstrItins;
		SelectionDAGTargetInfo TSInfo;
		AMDGPUAS AS;

		public:
		R600Subtarget(const Triple &TT, StringRef CPU, StringRef FS,
		const TargetMachine &TM);

		const R600InstrInfo *getInstrInfo() const override { return &InstrInfo; }

		const R600FrameLowering *getFrameLowering() const override {
		return &FrameLowering;
		}

		const R600TargetLowering *getTargetLowering() const override {
		return &TLInfo;
		}

		const R600RegisterInfo *getRegisterInfo() const override {
		return &InstrInfo.getRegisterInfo();
		}

		const InstrItineraryData *getInstrItineraryData() const override {
		return &InstrItins;
		}

		// Nothing implemented, just prevent crashes on use.
		const SelectionDAGTargetInfo *getSelectionDAGInfo() const override {
		return &TSInfo;
		}

		void ParseSubtargetFeatures(StringRef CPU, StringRef FS);

		Generation getGeneration() const {
		return Gen;
		}

		unsigned getStackAlignment() const {
		return 4;
		}

		R600Subtarget &initializeSubtargetDependencies(const Triple &TT,
		StringRef GPU, StringRef FS);

		bool hasBFE() const {
		return (getGeneration() >= EVERGREEN);
		}

		bool hasBFI() const {
		return (getGeneration() >= EVERGREEN);
		}

		bool hasBCNT(unsigned Size) const {
		if (Size == 32)
		return (getGeneration() >= EVERGREEN);

		return false;
		}

		bool hasBORROW() const {
		return (getGeneration() >= EVERGREEN);
		}

		bool hasCARRY() const {
		return (getGeneration() >= EVERGREEN);
		}

		bool hasCaymanISA() const {
		return CaymanISA;
		}

		bool hasFFBL() const {
		return (getGeneration() >= EVERGREEN);
		}

		bool hasFFBH() const {
		return (getGeneration() >= EVERGREEN);
		}

		bool hasFMA() const { return FMA; }

		unsigned getExplicitKernelArgOffset(const MachineFunction &MF) const {
		return 36;
		}

		bool hasCFAluBug() const { return CFALUBug; }

		bool hasVertexCache() const { return HasVertexCache; }

		short getTexVTXClauseSize() const { return TexVTXClauseSize; }

		AMDGPUAS getAMDGPUAS() const { return AS; }

		bool enableMachineScheduler() const override {
		return true;
		}

		bool enableSubRegLiveness() const override {
		return true;
		}
		};

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_LIB_TARGET_AMDGPU_AMDGPUSUBTARGET_H		#endif // LLVM_LIB_TARGET_AMDGPU_AMDGPUSUBTARGET_H

lib/Target/AMDGPU/AMDGPUSubtarget.cpp

[17 lines not shown]
#include "AMDGPUCallLowering.h"		#include "AMDGPUCallLowering.h"
#include "AMDGPUInstructionSelector.h"		#include "AMDGPUInstructionSelector.h"
#include "AMDGPULegalizerInfo.h"		#include "AMDGPULegalizerInfo.h"
#include "AMDGPURegisterBankInfo.h"		#include "AMDGPURegisterBankInfo.h"
#include "SIMachineFunctionInfo.h"		#include "SIMachineFunctionInfo.h"
#include "MCTargetDesc/AMDGPUMCTargetDesc.h"		#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
#include "llvm/ADT/SmallString.h"		#include "llvm/ADT/SmallString.h"
#include "llvm/CodeGen/MachineScheduler.h"		#include "llvm/CodeGen/MachineScheduler.h"
		#include "llvm/MC/MCSubtargetInfo.h"
#include "llvm/IR/MDBuilder.h"		#include "llvm/IR/MDBuilder.h"
#include "llvm/CodeGen/TargetFrameLowering.h"		#include "llvm/CodeGen/TargetFrameLowering.h"
#include <algorithm>		#include <algorithm>

using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "amdgpu-subtarget"		#define DEBUG_TYPE "amdgpu-subtarget"

#define GET_SUBTARGETINFO_TARGET_DESC		#define GET_SUBTARGETINFO_TARGET_DESC
#define GET_SUBTARGETINFO_CTOR		#define GET_SUBTARGETINFO_CTOR
#include "AMDGPUGenSubtargetInfo.inc"		#include "AMDGPUGenSubtargetInfo.inc"
		#define GET_SUBTARGETINFO_TARGET_DESC
		#define GET_SUBTARGETINFO_CTOR
		#include "R600GenSubtargetInfo.inc"

AMDGPUSubtarget::~AMDGPUSubtarget() = default;		AMDGPUSubtarget::~AMDGPUSubtarget() = default;

		R600Subtarget &
		R600Subtarget::initializeSubtargetDependencies(const Triple &TT,
		StringRef GPU, StringRef FS) {
		SmallString<256> FullFS("+promote-alloca,+dx10-clamp,");
		FullFS += FS;
		ParseSubtargetFeatures(GPU, FullFS);

		// FIXME: I don't think think Evergreen has any useful support for
		// denormals, but should be checked. Should we issue a warning somewhere
		// if someone tries to enable these?
		if (getGeneration() <= R600Subtarget::NORTHERN_ISLANDS) {
		FP32Denormals = false;
		}

		HasMulU24 = getGeneration() >= EVERGREEN;
		HasMulI24 = hasCaymanISA();

		return *this;
		}

AMDGPUSubtarget &		AMDGPUSubtarget &
AMDGPUSubtarget::initializeSubtargetDependencies(const Triple &TT,		AMDGPUSubtarget::initializeSubtargetDependencies(const Triple &TT,
StringRef GPU, StringRef FS) {		StringRef GPU, StringRef FS) {
// Determine default and user-specified characteristics		// Determine default and user-specified characteristics
// On SI+, we want FP64 denormals to be on by default. FP32 denormals can be		// On SI+, we want FP64 denormals to be on by default. FP32 denormals can be
// enabled, but some instructions do not respect them and they run at the		// enabled, but some instructions do not respect them and they run at the
// double precision rate, so don't enable by default.		// double precision rate, so don't enable by default.
//		//
[40 lines not shown]	if (TT.getArch() == Triple::amdgcn) {
if (LocalMemorySize == 0)		if (LocalMemorySize == 0)
LocalMemorySize = 32768;		LocalMemorySize = 32768;

// Do something sensible for unspecified target.		// Do something sensible for unspecified target.
if (!HasMovrel && !HasVGPRIndexMode)		if (!HasMovrel && !HasVGPRIndexMode)
HasMovrel = true;		HasMovrel = true;
}		}

		HasFminFmaxLegacy = getGeneration() < AMDGPUSubtarget::VOLCANIC_ISLANDS;

return *this;		return *this;
}		}

		AMDGPUCommonSubtarget::AMDGPUCommonSubtarget(const Triple &TT,
		const FeatureBitset &FeatureBits) :
		TargetTriple(TT),
		SubtargetFeatureBits(FeatureBits),
		Has16BitInsts(false),
		HasMadMixInsts(false),
		FP32Denormals(false),
		FPExceptions(false),
		HasSDWA(false),
		HasVOP3PInsts(false),
		HasMulI24(true),
		HasMulU24(true),
		HasFminFmaxLegacy(true),
		EnablePromoteAlloca(false),
		LocalMemorySize(0),
		WavefrontSize(0)
		{ }

AMDGPUSubtarget::AMDGPUSubtarget(const Triple &TT, StringRef GPU, StringRef FS,		AMDGPUSubtarget::AMDGPUSubtarget(const Triple &TT, StringRef GPU, StringRef FS,
const TargetMachine &TM)		const TargetMachine &TM) :
: AMDGPUGenSubtargetInfo(TT, GPU, FS),		AMDGPUGenSubtargetInfo(TT, GPU, FS),
		AMDGPUCommonSubtarget(TT, getFeatureBits()),
		FrameLowering(TargetFrameLowering::StackGrowsUp, getStackAlignment(), 0),
TargetTriple(TT),		TargetTriple(TT),
Gen(TT.getArch() == Triple::amdgcn ? SOUTHERN_ISLANDS : R600),		Gen(SOUTHERN_ISLANDS),
IsaVersion(ISAVersion0_0_0),		IsaVersion(ISAVersion0_0_0),
WavefrontSize(0),
LocalMemorySize(0),
LDSBankCount(0),		LDSBankCount(0),
MaxPrivateElementSize(0),		MaxPrivateElementSize(0),

FastFMAF32(false),		FastFMAF32(false),
HalfRate64Ops(false),		HalfRate64Ops(false),

FP32Denormals(false),
FP64FP16Denormals(false),		FP64FP16Denormals(false),
FPExceptions(false),
DX10Clamp(false),		DX10Clamp(false),
FlatForGlobal(false),		FlatForGlobal(false),
AutoWaitcntBeforeBarrier(false),		AutoWaitcntBeforeBarrier(false),
CodeObjectV3(false),		CodeObjectV3(false),
UnalignedScratchAccess(false),		UnalignedScratchAccess(false),
UnalignedBufferAccess(false),		UnalignedBufferAccess(false),

HasApertureRegs(false),		HasApertureRegs(false),
EnableXNACK(false),		EnableXNACK(false),
TrapHandler(false),		TrapHandler(false),
DebuggerInsertNops(false),		DebuggerInsertNops(false),
DebuggerEmitPrologue(false),		DebuggerEmitPrologue(false),

EnableHugePrivateBuffer(false),		EnableHugePrivateBuffer(false),
EnableVGPRSpilling(false),		EnableVGPRSpilling(false),
EnablePromoteAlloca(false),
EnableLoadStoreOpt(false),		EnableLoadStoreOpt(false),
EnableUnsafeDSOffsetFolding(false),		EnableUnsafeDSOffsetFolding(false),
EnableSIScheduler(false),		EnableSIScheduler(false),
EnableDS128(false),		EnableDS128(false),
DumpCode(false),		DumpCode(false),

FP64(false),		FP64(false),
FMA(false),
MIMG_R128(false),
IsGCN(false),
GCN3Encoding(false),		GCN3Encoding(false),
CIInsts(false),		CIInsts(false),
GFX9Insts(false),		GFX9Insts(false),
SGPRInitBug(false),		SGPRInitBug(false),
HasSMemRealTime(false),		HasSMemRealTime(false),
Has16BitInsts(false),
HasIntClamp(false),		HasIntClamp(false),
HasVOP3PInsts(false),
HasMadMixInsts(false),
HasFmaMixInsts(false),		HasFmaMixInsts(false),
HasMovrel(false),		HasMovrel(false),
HasVGPRIndexMode(false),		HasVGPRIndexMode(false),
HasScalarStores(false),		HasScalarStores(false),
HasScalarAtomics(false),		HasScalarAtomics(false),
HasInv2PiInlineImm(false),		HasInv2PiInlineImm(false),
HasSDWA(false),
HasSDWAOmod(false),		HasSDWAOmod(false),
HasSDWAScalar(false),		HasSDWAScalar(false),
HasSDWASdst(false),		HasSDWASdst(false),
HasSDWAMac(false),		HasSDWAMac(false),
HasSDWAOutModsVOPC(false),		HasSDWAOutModsVOPC(false),
HasDPP(false),		HasDPP(false),
HasDLInsts(false),		HasDLInsts(false),
D16PreservesUnusedBits(false),		D16PreservesUnusedBits(false),
FlatAddressSpace(false),		FlatAddressSpace(false),
FlatInstOffsets(false),		FlatInstOffsets(false),
FlatGlobalInsts(false),		FlatGlobalInsts(false),
FlatScratchInsts(false),		FlatScratchInsts(false),
AddNoCarryInsts(false),		AddNoCarryInsts(false),
HasUnpackedD16VMem(false),		HasUnpackedD16VMem(false),

R600ALUInst(false),
CaymanISA(false),
CFALUBug(false),
HasVertexCache(false),
TexVTXClauseSize(0),
ScalarizeGlobal(false),		ScalarizeGlobal(false),

FeatureDisable(false),		FeatureDisable(false) {
InstrItins(getInstrItineraryForCPU(GPU)) {
AS = AMDGPU::getAMDGPUAS(TT);		AS = AMDGPU::getAMDGPUAS(TT);
initializeSubtargetDependencies(TT, GPU, FS);		initializeSubtargetDependencies(TT, GPU, FS);
}		}

unsigned AMDGPUSubtarget::getMaxLocalMemSizeWithWaveCount(unsigned NWaves,		unsigned AMDGPUCommonSubtarget::getMaxLocalMemSizeWithWaveCount(unsigned NWaves,
const Function &F) const {		const Function &F) const {
if (NWaves == 1)		if (NWaves == 1)
return getLocalMemorySize();		return getLocalMemorySize();
unsigned WorkGroupSize = getFlatWorkGroupSizes(F).second;		unsigned WorkGroupSize = getFlatWorkGroupSizes(F).second;
unsigned WorkGroupsPerCu = getMaxWorkGroupsPerCU(WorkGroupSize);		unsigned WorkGroupsPerCu = getMaxWorkGroupsPerCU(WorkGroupSize);
unsigned MaxWaves = getMaxWavesPerEU();		unsigned MaxWaves = getMaxWavesPerEU();
return getLocalMemorySize() * MaxWaves / WorkGroupsPerCu / NWaves;		return getLocalMemorySize() * MaxWaves / WorkGroupsPerCu / NWaves;
}		}

unsigned AMDGPUSubtarget::getOccupancyWithLocalMemSize(uint32_t Bytes,		unsigned AMDGPUCommonSubtarget::getOccupancyWithLocalMemSize(uint32_t Bytes,
const Function &F) const {		const Function &F) const {
unsigned WorkGroupSize = getFlatWorkGroupSizes(F).second;		unsigned WorkGroupSize = getFlatWorkGroupSizes(F).second;
unsigned WorkGroupsPerCu = getMaxWorkGroupsPerCU(WorkGroupSize);		unsigned WorkGroupsPerCu = getMaxWorkGroupsPerCU(WorkGroupSize);
unsigned MaxWaves = getMaxWavesPerEU();		unsigned MaxWaves = getMaxWavesPerEU();
unsigned Limit = getLocalMemorySize() * MaxWaves / WorkGroupsPerCu;		unsigned Limit = getLocalMemorySize() * MaxWaves / WorkGroupsPerCu;
unsigned NumWaves = Limit / (Bytes ? Bytes : 1u);		unsigned NumWaves = Limit / (Bytes ? Bytes : 1u);
NumWaves = std::min(NumWaves, MaxWaves);		NumWaves = std::min(NumWaves, MaxWaves);
NumWaves = std::max(NumWaves, 1u);		NumWaves = std::max(NumWaves, 1u);
return NumWaves;		return NumWaves;
}		}
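As a rough illustration of the arithmetic in getOccupancyWithLocalMemSize above, here is a standalone C++ sketch. The free function `occupancyWithLocalMemSize` and its explicit parameters are hypothetical stand-ins for the subtarget queries used in the patch. The 32768-byte LDS size mentioned below matches the amdgcn default set earlier in this file, while the wave and work-group counts are assumed values chosen only for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical free-function version of the calculation; in the patch these
// inputs come from AMDGPUCommonSubtarget member functions.
unsigned occupancyWithLocalMemSize(uint32_t Bytes, unsigned LocalMemorySize,
                                   unsigned MaxWaves,
                                   unsigned WorkGroupsPerCu) {
  assert(WorkGroupsPerCu != 0 && "work groups per CU must be non-zero");
  // Scale the LDS size by max waves per EU over work groups per CU.
  unsigned Limit = LocalMemorySize * MaxWaves / WorkGroupsPerCu;
  // Treat a zero-byte request as one byte to avoid dividing by zero.
  unsigned NumWaves = Limit / (Bytes ? Bytes : 1u);
  // Clamp the result to the range [1, MaxWaves].
  NumWaves = std::min(NumWaves, MaxWaves);
  NumWaves = std::max(NumWaves, 1u);
  return NumWaves;
}
```

With the assumed inputs LocalMemorySize = 32768, MaxWaves = 10 and WorkGroupsPerCu = 40, the limit is 8192, so a request of 2048 bytes yields 4 waves, a request of 0 bytes clamps to 10, and a request of 65536 bytes clamps to 1.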

unsigned		unsigned
AMDGPUSubtarget::getOccupancyWithLocalMemSize(const MachineFunction &MF) const {		AMDGPUCommonSubtarget::getOccupancyWithLocalMemSize(const MachineFunction &MF) const {
const auto *MFI = MF.getInfo<SIMachineFunctionInfo>();		const auto *MFI = MF.getInfo<SIMachineFunctionInfo>();
return getOccupancyWithLocalMemSize(MFI->getLDSSize(), MF.getFunction());		return getOccupancyWithLocalMemSize(MFI->getLDSSize(), MF.getFunction());
}		}

std::pair<unsigned, unsigned>		std::pair<unsigned, unsigned>
AMDGPUSubtarget::getDefaultFlatWorkGroupSize(CallingConv::ID CC) const {		AMDGPUCommonSubtarget::getDefaultFlatWorkGroupSize(CallingConv::ID CC) const {
switch (CC) {		switch (CC) {
case CallingConv::AMDGPU_CS:		case CallingConv::AMDGPU_CS:
case CallingConv::AMDGPU_KERNEL:		case CallingConv::AMDGPU_KERNEL:
case CallingConv::SPIR_KERNEL:		case CallingConv::SPIR_KERNEL:
return std::make_pair(getWavefrontSize() * 2, getWavefrontSize() * 4);		return std::make_pair(getWavefrontSize() * 2, getWavefrontSize() * 4);
case CallingConv::AMDGPU_VS:		case CallingConv::AMDGPU_VS:
case CallingConv::AMDGPU_LS:		case CallingConv::AMDGPU_LS:
case CallingConv::AMDGPU_HS:		case CallingConv::AMDGPU_HS:
case CallingConv::AMDGPU_ES:		case CallingConv::AMDGPU_ES:
case CallingConv::AMDGPU_GS:		case CallingConv::AMDGPU_GS:
case CallingConv::AMDGPU_PS:		case CallingConv::AMDGPU_PS:
return std::make_pair(1, getWavefrontSize());		return std::make_pair(1, getWavefrontSize());
default:		default:
return std::make_pair(1, 16 * getWavefrontSize());		return std::make_pair(1, 16 * getWavefrontSize());
}		}
}		}

std::pair<unsigned, unsigned> AMDGPUSubtarget::getFlatWorkGroupSizes(		std::pair<unsigned, unsigned> AMDGPUCommonSubtarget::getFlatWorkGroupSizes(
const Function &F) const {		const Function &F) const {
// FIXME: 1024 if function.		// FIXME: 1024 if function.
// Default minimum/maximum flat work group sizes.		// Default minimum/maximum flat work group sizes.
std::pair<unsigned, unsigned> Default =		std::pair<unsigned, unsigned> Default =
getDefaultFlatWorkGroupSize(F.getCallingConv());		getDefaultFlatWorkGroupSize(F.getCallingConv());

// TODO: Do not process "amdgpu-max-work-group-size" attribute once mesa		// TODO: Do not process "amdgpu-max-work-group-size" attribute once mesa
// starts using "amdgpu-flat-work-group-size" attribute.		// starts using "amdgpu-flat-work-group-size" attribute.
[13 lines not shown]	std::pair<unsigned, unsigned> AMDGPUCommonSubtarget::getFlatWorkGroupSizes(
if (Requested.first < getMinFlatWorkGroupSize())		if (Requested.first < getMinFlatWorkGroupSize())
return Default;		return Default;
if (Requested.second > getMaxFlatWorkGroupSize())		if (Requested.second > getMaxFlatWorkGroupSize())
return Default;		return Default;

return Requested;		return Requested;
}		}

std::pair<unsigned, unsigned> AMDGPUSubtarget::getWavesPerEU(		std::pair<unsigned, unsigned> AMDGPUCommonSubtarget::getWavesPerEU(
const Function &F) const {		const Function &F) const {
// Default minimum/maximum number of waves per execution unit.		// Default minimum/maximum number of waves per execution unit.
std::pair<unsigned, unsigned> Default(1, getMaxWavesPerEU());		std::pair<unsigned, unsigned> Default(1, getMaxWavesPerEU());

// Default/requested minimum/maximum flat work group sizes.		// Default/requested minimum/maximum flat work group sizes.
std::pair<unsigned, unsigned> FlatWorkGroupSizes = getFlatWorkGroupSizes(F);		std::pair<unsigned, unsigned> FlatWorkGroupSizes = getFlatWorkGroupSizes(F);

// If minimum/maximum flat work group sizes were explicitly requested using		// If minimum/maximum flat work group sizes were explicitly requested using
[31 lines not shown]	std::pair<unsigned, unsigned> AMDGPUCommonSubtarget::getWavesPerEU(
// minimum/maximum flat work group sizes.		// minimum/maximum flat work group sizes.
if (RequestedFlatWorkGroupSize &&		if (RequestedFlatWorkGroupSize &&
Requested.first < MinImpliedByFlatWorkGroupSize)		Requested.first < MinImpliedByFlatWorkGroupSize)
return Default;		return Default;

return Requested;		return Requested;
}		}

bool AMDGPUSubtarget::makeLIDRangeMetadata(Instruction *I) const {		bool AMDGPUCommonSubtarget::makeLIDRangeMetadata(Instruction *I) const {
Function *Kernel = I->getParent()->getParent();		Function *Kernel = I->getParent()->getParent();
unsigned MinSize = 0;		unsigned MinSize = 0;
unsigned MaxSize = getFlatWorkGroupSizes(*Kernel).second;		unsigned MaxSize = getFlatWorkGroupSizes(*Kernel).second;
bool IdQuery = false;		bool IdQuery = false;

// If reqd_work_group_size is present it narrows value down.		// If reqd_work_group_size is present it narrows value down.
if (auto *CI = dyn_cast<CallInst>(I)) {		if (auto *CI = dyn_cast<CallInst>(I)) {
const Function *F = CI->getCalledFunction();		const Function *F = CI->getCalledFunction();
[47 lines not shown]	bool AMDGPUCommonSubtarget::makeLIDRangeMetadata(Instruction *I) const {
MDNode *MaxWorkGroupSizeRange = MDB.createRange(APInt(32, MinSize),		MDNode *MaxWorkGroupSizeRange = MDB.createRange(APInt(32, MinSize),
APInt(32, MaxSize));		APInt(32, MaxSize));
I->setMetadata(LLVMContext::MD_range, MaxWorkGroupSizeRange);		I->setMetadata(LLVMContext::MD_range, MaxWorkGroupSizeRange);
return true;		return true;
}		}

R600Subtarget::R600Subtarget(const Triple &TT, StringRef GPU, StringRef FS,		R600Subtarget::R600Subtarget(const Triple &TT, StringRef GPU, StringRef FS,
const TargetMachine &TM) :		const TargetMachine &TM) :
AMDGPUSubtarget(TT, GPU, FS, TM),		R600GenSubtargetInfo(TT, GPU, FS),
		AMDGPUCommonSubtarget(TT, getFeatureBits()),
InstrInfo(*this),		InstrInfo(*this),
FrameLowering(TargetFrameLowering::StackGrowsUp, getStackAlignment(), 0),		FrameLowering(TargetFrameLowering::StackGrowsUp, getStackAlignment(), 0),
TLInfo(TM, *this) {}		FMA(false),
		CaymanISA(false),
		CFALUBug(false),
		DX10Clamp(false),
		HasVertexCache(false),
		R600ALUInst(false),
		FP64(false),
		TexVTXClauseSize(0),
		Gen(R600),
		TLInfo(TM, initializeSubtargetDependencies(TT, GPU, FS)),
		InstrItins(getInstrItineraryForCPU(GPU)),
		AS (AMDGPU::getAMDGPUAS(TT)) { }

SISubtarget::SISubtarget(const Triple &TT, StringRef GPU, StringRef FS,		SISubtarget::SISubtarget(const Triple &TT, StringRef GPU, StringRef FS,
const GCNTargetMachine &TM)		const GCNTargetMachine &TM)
: AMDGPUSubtarget(TT, GPU, FS, TM), InstrInfo(*this),		: AMDGPUSubtarget(TT, GPU, FS, TM), InstrInfo(*this),
FrameLowering(TargetFrameLowering::StackGrowsUp, getStackAlignment(), 0),		FrameLowering(TargetFrameLowering::StackGrowsUp, getStackAlignment(), 0),
TLInfo(TM, *this) {		TLInfo(TM, *this) {
CallLoweringInfo.reset(new AMDGPUCallLowering(*getTargetLowering()));		CallLoweringInfo.reset(new AMDGPUCallLowering(*getTargetLowering()));
Legalizer.reset(new AMDGPULegalizerInfo(*this, TM));		Legalizer.reset(new AMDGPULegalizerInfo(*this, TM));
[227 lines not shown]	struct MemOpClusterMutation : ScheduleDAGMutation {
}		}
};		};
} // namespace		} // namespace

void SISubtarget::getPostRAMutations(		void SISubtarget::getPostRAMutations(
std::vector<std::unique_ptr<ScheduleDAGMutation>> &Mutations) const {		std::vector<std::unique_ptr<ScheduleDAGMutation>> &Mutations) const {
Mutations.push_back(llvm::make_unique<MemOpClusterMutation>(&InstrInfo));		Mutations.push_back(llvm::make_unique<MemOpClusterMutation>(&InstrInfo));
}		}

		const AMDGPUCommonSubtarget &AMDGPUCommonSubtarget::get(const MachineFunction &MF) {
		if (MF.getTarget().getTargetTriple().getArch() == Triple::amdgcn)
		return static_cast<const AMDGPUCommonSubtarget&>(MF.getSubtarget<AMDGPUSubtarget>());
		else
		return static_cast<const AMDGPUCommonSubtarget&>(MF.getSubtarget<R600Subtarget>());
		}

		const AMDGPUCommonSubtarget &AMDGPUCommonSubtarget::get(const TargetMachine &TM, const Function &F) {
		if (TM.getTargetTriple().getArch() == Triple::amdgcn)
		return static_cast<const AMDGPUCommonSubtarget&>(TM.getSubtarget<AMDGPUSubtarget>(F));
		else
		return static_cast<const AMDGPUCommonSubtarget&>(TM.getSubtarget<R600Subtarget>(F));
		}
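The two `AMDGPUCommonSubtarget::get` overloads above pick the concrete subtarget from the triple and then upcast it to the shared base. A minimal self-contained sketch of that pattern follows. Every type and name in it (`Arch`, `CommonSubtarget`, `GCNLikeSubtarget`, `R600LikeSubtarget`, `MockTarget`, `getCommon`) is a hypothetical stand-in and not part of LLVM; only the shape of the dispatch is meant to mirror the patch.

```cpp
#include <cassert>

// Hypothetical stand-in for the architecture field of the target triple.
enum class Arch { r600, amdgcn };

// Stand-in for AMDGPUCommonSubtarget: state shared by both sub-targets.
struct CommonSubtarget {
  int Tag;
  explicit CommonSubtarget(int T) : Tag(T) {}
};

// Stand-ins for the two separately generated TableGen base classes.
struct GenInfoA { int TableA = 0; };
struct GenInfoB { int TableB = 0; };

// Each concrete subtarget inherits its own generated base plus the common one.
struct GCNLikeSubtarget : GenInfoA, CommonSubtarget {
  GCNLikeSubtarget() : CommonSubtarget(1) {}
};
struct R600LikeSubtarget : GenInfoB, CommonSubtarget {
  R600LikeSubtarget() : CommonSubtarget(2) {}
};

// Stand-in for whatever owns the subtarget (MachineFunction/TargetMachine).
struct MockTarget {
  Arch TargetArch;
  GCNLikeSubtarget GCN;
  R600LikeSubtarget R600;
  explicit MockTarget(Arch A) : TargetArch(A) {}
};

// Select the concrete subtarget by architecture, then expose only the
// common interface, as the get() overloads in the patch do.
const CommonSubtarget &getCommon(const MockTarget &T) {
  if (T.TargetArch == Arch::amdgcn)
    return static_cast<const CommonSubtarget &>(T.GCN);
  return static_cast<const CommonSubtarget &>(T.R600);
}
```

Shared code can then call `getCommon(T)` without knowing which generated base class the concrete subtarget derives from.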

lib/Target/AMDGPU/AMDGPUTargetMachine.h

	[28 lines not shown]

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// AMDGPU Target Machine (R600+)			// AMDGPU Target Machine (R600+)
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	class AMDGPUTargetMachine : public LLVMTargetMachine {			class AMDGPUTargetMachine : public LLVMTargetMachine {
	protected:			protected:
	std::unique_ptr<TargetLoweringObjectFile> TLOF;			std::unique_ptr<TargetLoweringObjectFile> TLOF;
	AMDGPUIntrinsicInfo IntrinsicInfo;
	AMDGPUAS AS;			AMDGPUAS AS;

	StringRef getGPUName(const Function &F) const;			StringRef getGPUName(const Function &F) const;
	StringRef getFeatureString(const Function &F) const;			StringRef getFeatureString(const Function &F) const;

	public:			public:
	static bool EnableLateStructurizeCFG;			static bool EnableLateStructurizeCFG;

	AMDGPUTargetMachine(const Target &T, const Triple &TT, StringRef CPU,			AMDGPUTargetMachine(const Target &T, const Triple &TT, StringRef CPU,
	StringRef FS, TargetOptions Options,			StringRef FS, TargetOptions Options,
	Optional<Reloc::Model> RM, Optional<CodeModel::Model> CM,			Optional<Reloc::Model> RM, Optional<CodeModel::Model> CM,
	CodeGenOpt::Level OL);			CodeGenOpt::Level OL);
	~AMDGPUTargetMachine() override;			~AMDGPUTargetMachine() override;

	const AMDGPUSubtarget *getSubtargetImpl() const;			const TargetSubtargetInfo *getSubtargetImpl() const;
	const AMDGPUSubtarget *getSubtargetImpl(const Function &) const override = 0;			const TargetSubtargetInfo *getSubtargetImpl(const Function &) const override = 0;

	const AMDGPUIntrinsicInfo *getIntrinsicInfo() const override {
	return &IntrinsicInfo;
	}

	TargetLoweringObjectFile *getObjFileLowering() const override {			TargetLoweringObjectFile *getObjFileLowering() const override {
	return TLOF.get();			return TLOF.get();
	}			}
	AMDGPUAS getAMDGPUAS() const {			AMDGPUAS getAMDGPUAS() const {
	return AS;			return AS;
	}			}

	[32 lines not shown]
	};			};

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// GCN Target Machine (SI+)			// GCN Target Machine (SI+)
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	class GCNTargetMachine final : public AMDGPUTargetMachine {			class GCNTargetMachine final : public AMDGPUTargetMachine {
	private:			private:
				AMDGPUIntrinsicInfo IntrinsicInfo;
	mutable StringMap<std::unique_ptr<SISubtarget>> SubtargetMap;			mutable StringMap<std::unique_ptr<SISubtarget>> SubtargetMap;

	public:			public:
	GCNTargetMachine(const Target &T, const Triple &TT, StringRef CPU,			GCNTargetMachine(const Target &T, const Triple &TT, StringRef CPU,
	StringRef FS, TargetOptions Options,			StringRef FS, TargetOptions Options,
	Optional<Reloc::Model> RM, Optional<CodeModel::Model> CM,			Optional<Reloc::Model> RM, Optional<CodeModel::Model> CM,
	CodeGenOpt::Level OL, bool JIT);			CodeGenOpt::Level OL, bool JIT);

	TargetPassConfig *createPassConfig(PassManagerBase &PM) override;			TargetPassConfig *createPassConfig(PassManagerBase &PM) override;

	const SISubtarget *getSubtargetImpl(const Function &) const override;			const SISubtarget *getSubtargetImpl(const Function &) const override;

	TargetTransformInfo getTargetTransformInfo(const Function &F) override;			TargetTransformInfo getTargetTransformInfo(const Function &F) override;

				const AMDGPUIntrinsicInfo *getIntrinsicInfo() const override {
				return &IntrinsicInfo;
				}

	bool useIPRA() const override {			bool useIPRA() const override {
	return true;			return true;
	}			}
	};			};

	} // end namespace llvm			} // end namespace llvm

	#endif // LLVM_LIB_TARGET_AMDGPU_AMDGPUTARGETMACHINE_H			#endif // LLVM_LIB_TARGET_AMDGPU_AMDGPUTARGETMACHINE_H

lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h

[39 lines not shown]
class Value;		class Value;

class AMDGPUTTIImpl final : public BasicTTIImplBase<AMDGPUTTIImpl> {		class AMDGPUTTIImpl final : public BasicTTIImplBase<AMDGPUTTIImpl> {
using BaseT = BasicTTIImplBase<AMDGPUTTIImpl>;		using BaseT = BasicTTIImplBase<AMDGPUTTIImpl>;
using TTI = TargetTransformInfo;		using TTI = TargetTransformInfo;

friend BaseT;		friend BaseT;

const AMDGPUSubtarget *ST;		Triple TargetTriple;
const AMDGPUTargetLowering *TLI;

public:		public:
explicit AMDGPUTTIImpl(const AMDGPUTargetMachine *TM, const Function &F)		explicit AMDGPUTTIImpl(const AMDGPUTargetMachine *TM, const Function &F)
: BaseT(TM, F.getParent()->getDataLayout()),		: BaseT(TM, F.getParent()->getDataLayout()),
ST(TM->getSubtargetImpl(F)),		TargetTriple(TM->getTargetTriple()) {}
TLI(ST->getTargetLowering()) {}

const AMDGPUSubtarget *getST() const { return ST; }
const AMDGPUTargetLowering *getTLI() const { return TLI; }

void getUnrollingPreferences(Loop *L, ScalarEvolution &SE,		void getUnrollingPreferences(Loop *L, ScalarEvolution &SE,
TTI::UnrollingPreferences &UP);		TTI::UnrollingPreferences &UP);
};		};

class GCNTTIImpl final : public BasicTTIImplBase<GCNTTIImpl> {		class GCNTTIImpl final : public BasicTTIImplBase<GCNTTIImpl> {
using BaseT = BasicTTIImplBase<GCNTTIImpl>;		using BaseT = BasicTTIImplBase<GCNTTIImpl>;
using TTI = TargetTransformInfo;		using TTI = TargetTransformInfo;
[51 lines not shown]	class GCNTTIImpl final : public BasicTTIImplBase<GCNTTIImpl> {
inline int get64BitInstrCost() const {		inline int get64BitInstrCost() const {
return ST->hasHalfRate64Ops() ?		return ST->hasHalfRate64Ops() ?
getHalfRateInstrCost() : getQuarterRateInstrCost();		getHalfRateInstrCost() : getQuarterRateInstrCost();
}		}

public:		public:
explicit GCNTTIImpl(const AMDGPUTargetMachine *TM, const Function &F)		explicit GCNTTIImpl(const AMDGPUTargetMachine *TM, const Function &F)
: BaseT(TM, F.getParent()->getDataLayout()),		: BaseT(TM, F.getParent()->getDataLayout()),
ST(TM->getSubtargetImpl(F)),		ST(static_cast<const AMDGPUSubtarget*>(TM->getSubtargetImpl(F))),
TLI(ST->getTargetLowering()),		TLI(ST->getTargetLowering()),
CommonTTI(TM, F),		CommonTTI(TM, F),
IsGraphicsShader(AMDGPU::isShader(F.getCallingConv())) {}		IsGraphicsShader(AMDGPU::isShader(F.getCallingConv())) {}

bool hasBranchDivergence() { return true; }		bool hasBranchDivergence() { return true; }

void getUnrollingPreferences(Loop *L, ScalarEvolution &SE,		void getUnrollingPreferences(Loop *L, ScalarEvolution &SE,
TTI::UnrollingPreferences &UP);		TTI::UnrollingPreferences &UP);
[71 lines not shown]
};		};

class R600TTIImpl final : public BasicTTIImplBase<R600TTIImpl> {		class R600TTIImpl final : public BasicTTIImplBase<R600TTIImpl> {
using BaseT = BasicTTIImplBase<R600TTIImpl>;		using BaseT = BasicTTIImplBase<R600TTIImpl>;
using TTI = TargetTransformInfo;		using TTI = TargetTransformInfo;

friend BaseT;		friend BaseT;

const AMDGPUSubtarget *ST;		const R600Subtarget *ST;
const AMDGPUTargetLowering *TLI;		const AMDGPUTargetLowering *TLI;
AMDGPUTTIImpl CommonTTI;		AMDGPUTTIImpl CommonTTI;

public:		public:
explicit R600TTIImpl(const AMDGPUTargetMachine *TM, const Function &F)		explicit R600TTIImpl(const AMDGPUTargetMachine *TM, const Function &F)
: BaseT(TM, F.getParent()->getDataLayout()),		: BaseT(TM, F.getParent()->getDataLayout()),
ST(TM->getSubtargetImpl(F)),		ST(static_cast<const R600Subtarget*>(TM->getSubtargetImpl(F))),
TLI(ST->getTargetLowering()),		TLI(ST->getTargetLowering()),
CommonTTI(TM, F) {}		CommonTTI(TM, F) {}

const AMDGPUSubtarget *getST() const { return ST; }		const R600Subtarget *getST() const { return ST; }
const AMDGPUTargetLowering *getTLI() const { return TLI; }		const AMDGPUTargetLowering *getTLI() const { return TLI; }

void getUnrollingPreferences(Loop *L, ScalarEvolution &SE,		void getUnrollingPreferences(Loop *L, ScalarEvolution &SE,
TTI::UnrollingPreferences &UP);		TTI::UnrollingPreferences &UP);
unsigned getHardwareNumberOfRegisters(bool Vec) const;		unsigned getHardwareNumberOfRegisters(bool Vec) const;
unsigned getNumberOfRegisters(bool Vec) const;		unsigned getNumberOfRegisters(bool Vec) const;
unsigned getRegisterBitWidth(bool Vector) const;		unsigned getRegisterBitWidth(bool Vector) const;
unsigned getMinVectorRegisterBitWidth() const;		unsigned getMinVectorRegisterBitWidth() const;
[17 lines not shown]

lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp

[96 lines not shown]	void AMDGPUTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE,

// TODO: Do we want runtime unrolling?		// TODO: Do we want runtime unrolling?

// Maximum alloca size than can fit registers. Reserve 16 registers.		// Maximum alloca size than can fit registers. Reserve 16 registers.
const unsigned MaxAlloca = (256 - 16) * 4;		const unsigned MaxAlloca = (256 - 16) * 4;
unsigned ThresholdPrivate = UnrollThresholdPrivate;		unsigned ThresholdPrivate = UnrollThresholdPrivate;
unsigned ThresholdLocal = UnrollThresholdLocal;		unsigned ThresholdLocal = UnrollThresholdLocal;
unsigned MaxBoost = std::max(ThresholdPrivate, ThresholdLocal);		unsigned MaxBoost = std::max(ThresholdPrivate, ThresholdLocal);
AMDGPUAS ASST = ST->getAMDGPUAS();		const AMDGPUAS &ASST = AMDGPU::getAMDGPUAS(TargetTriple);
for (const BasicBlock *BB : L->getBlocks()) {		for (const BasicBlock *BB : L->getBlocks()) {
const DataLayout &DL = BB->getModule()->getDataLayout();		const DataLayout &DL = BB->getModule()->getDataLayout();
unsigned LocalGEPsSeen = 0;		unsigned LocalGEPsSeen = 0;

if (llvm::any_of(L->getSubLoops(), [BB](const Loop* SubLoop) {		if (llvm::any_of(L->getSubLoops(), [BB](const Loop* SubLoop) {
return SubLoop->contains(BB); }))		return SubLoop->contains(BB); }))
continue; // Block belongs to an inner loop.		continue; // Block belongs to an inner loop.

[467 lines not shown]	case Intrinsic::amdgcn_readlane:
return true;		return true;
}		}
}		}
return false;		return false;
}		}

unsigned GCNTTIImpl::getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index,		unsigned GCNTTIImpl::getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index,
Type *SubTp) {		Type *SubTp) {
if (ST->hasVOP3PInsts()) {		if (ST->hasVOP3PInsts()) {
		arsenm: Can’t this be in a gcn class? I think all of this is for packed anyway
VectorType *VT = cast<VectorType>(Tp);		VectorType *VT = cast<VectorType>(Tp);
if (VT->getNumElements() == 2 &&		if (VT->getNumElements() == 2 &&
DL.getTypeSizeInBits(VT->getElementType()) == 16) {		DL.getTypeSizeInBits(VT->getElementType()) == 16) {
// With op_sel VOP3P instructions freely can access the low half or high		// With op_sel VOP3P instructions freely can access the low half or high
// half of a register, so any swizzle is free.		// half of a register, so any swizzle is free.

switch (Kind) {		switch (Kind) {
case TTI::SK_Broadcast:		case TTI::SK_Broadcast:
▲ Show 20 Lines • Show All 135 Lines • Show Last 20 Lines

lib/Target/AMDGPU/AMDILCFGStructurizer.cpp

Show First 20 Lines • Show All 426 Lines • ▼ Show 20 Lines
}		}

void AMDGPUCFGStructurizer::reversePredicateSetter(		void AMDGPUCFGStructurizer::reversePredicateSetter(
MachineBasicBlock::iterator I, MachineBasicBlock &MBB) {		MachineBasicBlock::iterator I, MachineBasicBlock &MBB) {
assert(I.isValid() && "Expected valid iterator");		assert(I.isValid() && "Expected valid iterator");
for (;; --I) {		for (;; --I) {
if (I == MBB.end())		if (I == MBB.end())
continue;		continue;
if (I->getOpcode() == AMDGPU::PRED_X) {		if (I->getOpcode() == R600::PRED_X) {
switch (I->getOperand(2).getImm()) {		switch (I->getOperand(2).getImm()) {
case AMDGPU::PRED_SETE_INT:		case R600::PRED_SETE_INT:
I->getOperand(2).setImm(AMDGPU::PRED_SETNE_INT);		I->getOperand(2).setImm(R600::PRED_SETNE_INT);
return;		return;
case AMDGPU::PRED_SETNE_INT:		case R600::PRED_SETNE_INT:
I->getOperand(2).setImm(AMDGPU::PRED_SETE_INT);		I->getOperand(2).setImm(R600::PRED_SETE_INT);
return;		return;
case AMDGPU::PRED_SETE:		case R600::PRED_SETE:
I->getOperand(2).setImm(AMDGPU::PRED_SETNE);		I->getOperand(2).setImm(R600::PRED_SETNE);
return;		return;
case AMDGPU::PRED_SETNE:		case R600::PRED_SETNE:
I->getOperand(2).setImm(AMDGPU::PRED_SETE);		I->getOperand(2).setImm(R600::PRED_SETE);
return;		return;
default:		default:
llvm_unreachable("PRED_X Opcode invalid!");		llvm_unreachable("PRED_X Opcode invalid!");
}		}
}		}
}		}
}		}

▲ Show 20 Lines • Show All 52 Lines • ▼ Show 20 Lines	void AMDGPUCFGStructurizer::insertCondBranchBefore(
//insert before		//insert before
blk->insert(I, NewInstr);		blk->insert(I, NewInstr);
MachineInstrBuilder(*MF, NewInstr).addReg(RegNum, false);		MachineInstrBuilder(*MF, NewInstr).addReg(RegNum, false);
SHOWNEWINSTR(NewInstr);		SHOWNEWINSTR(NewInstr);
}		}

int AMDGPUCFGStructurizer::getBranchNzeroOpcode(int OldOpcode) {		int AMDGPUCFGStructurizer::getBranchNzeroOpcode(int OldOpcode) {
switch(OldOpcode) {		switch(OldOpcode) {
case AMDGPU::JUMP_COND:		case R600::JUMP_COND:
case AMDGPU::JUMP: return AMDGPU::IF_PREDICATE_SET;		case R600::JUMP: return R600::IF_PREDICATE_SET;
case AMDGPU::BRANCH_COND_i32:		case R600::BRANCH_COND_i32:
case AMDGPU::BRANCH_COND_f32: return AMDGPU::IF_LOGICALNZ_f32;		case R600::BRANCH_COND_f32: return R600::IF_LOGICALNZ_f32;
default: llvm_unreachable("internal error");		default: llvm_unreachable("internal error");
}		}
return -1;		return -1;
}		}

int AMDGPUCFGStructurizer::getBranchZeroOpcode(int OldOpcode) {		int AMDGPUCFGStructurizer::getBranchZeroOpcode(int OldOpcode) {
switch(OldOpcode) {		switch(OldOpcode) {
case AMDGPU::JUMP_COND:		case R600::JUMP_COND:
case AMDGPU::JUMP: return AMDGPU::IF_PREDICATE_SET;		case R600::JUMP: return R600::IF_PREDICATE_SET;
case AMDGPU::BRANCH_COND_i32:		case R600::BRANCH_COND_i32:
case AMDGPU::BRANCH_COND_f32: return AMDGPU::IF_LOGICALZ_f32;		case R600::BRANCH_COND_f32: return R600::IF_LOGICALZ_f32;
default: llvm_unreachable("internal error");		default: llvm_unreachable("internal error");
}		}
return -1;		return -1;
}		}

int AMDGPUCFGStructurizer::getContinueNzeroOpcode(int OldOpcode) {		int AMDGPUCFGStructurizer::getContinueNzeroOpcode(int OldOpcode) {
switch(OldOpcode) {		switch(OldOpcode) {
case AMDGPU::JUMP_COND:		case R600::JUMP_COND:
case AMDGPU::JUMP: return AMDGPU::CONTINUE_LOGICALNZ_i32;		case R600::JUMP: return R600::CONTINUE_LOGICALNZ_i32;
default: llvm_unreachable("internal error");		default: llvm_unreachable("internal error");
}		}
return -1;		return -1;
}		}

int AMDGPUCFGStructurizer::getContinueZeroOpcode(int OldOpcode) {		int AMDGPUCFGStructurizer::getContinueZeroOpcode(int OldOpcode) {
switch(OldOpcode) {		switch(OldOpcode) {
case AMDGPU::JUMP_COND:		case R600::JUMP_COND:
case AMDGPU::JUMP: return AMDGPU::CONTINUE_LOGICALZ_i32;		case R600::JUMP: return R600::CONTINUE_LOGICALZ_i32;
default: llvm_unreachable("internal error");		default: llvm_unreachable("internal error");
}		}
return -1;		return -1;
}		}

MachineBasicBlock *AMDGPUCFGStructurizer::getTrueBranch(MachineInstr *MI) {		MachineBasicBlock *AMDGPUCFGStructurizer::getTrueBranch(MachineInstr *MI) {
return MI->getOperand(0).getMBB();		return MI->getOperand(0).getMBB();
}		}
Show All 11 Lines	AMDGPUCFGStructurizer::getFalseBranch(MachineBasicBlock *MBB,
MachineBasicBlock::succ_iterator It = MBB->succ_begin();		MachineBasicBlock::succ_iterator It = MBB->succ_begin();
MachineBasicBlock::succ_iterator Next = It;		MachineBasicBlock::succ_iterator Next = It;
++Next;		++Next;
return (*It == TrueBranch) ? *Next : *It;		return (*It == TrueBranch) ? *Next : *It;
}		}

bool AMDGPUCFGStructurizer::isCondBranch(MachineInstr *MI) {		bool AMDGPUCFGStructurizer::isCondBranch(MachineInstr *MI) {
switch (MI->getOpcode()) {		switch (MI->getOpcode()) {
case AMDGPU::JUMP_COND:		case R600::JUMP_COND:
case AMDGPU::BRANCH_COND_i32:		case R600::BRANCH_COND_i32:
case AMDGPU::BRANCH_COND_f32: return true;		case R600::BRANCH_COND_f32: return true;
default:		default:
return false;		return false;
}		}
return false;		return false;
}		}

bool AMDGPUCFGStructurizer::isUncondBranch(MachineInstr *MI) {		bool AMDGPUCFGStructurizer::isUncondBranch(MachineInstr *MI) {
switch (MI->getOpcode()) {		switch (MI->getOpcode()) {
case AMDGPU::JUMP:		case R600::JUMP:
case AMDGPU::BRANCH:		case R600::BRANCH:
return true;		return true;
default:		default:
return false;		return false;
}		}
return false;		return false;
}		}

DebugLoc AMDGPUCFGStructurizer::getLastDebugLocInBB(MachineBasicBlock *MBB) {		DebugLoc AMDGPUCFGStructurizer::getLastDebugLocInBB(MachineBasicBlock *MBB) {
Show All 32 Lines	MachineInstr *AMDGPUCFGStructurizer::getLoopendBlockBranchInstr(
}		}
return nullptr;		return nullptr;
}		}

MachineInstr *AMDGPUCFGStructurizer::getReturnInstr(MachineBasicBlock *MBB) {		MachineInstr *AMDGPUCFGStructurizer::getReturnInstr(MachineBasicBlock *MBB) {
MachineBasicBlock::reverse_iterator It = MBB->rbegin();		MachineBasicBlock::reverse_iterator It = MBB->rbegin();
if (It != MBB->rend()) {		if (It != MBB->rend()) {
MachineInstr *instr = &(*It);		MachineInstr *instr = &(*It);
if (instr->getOpcode() == AMDGPU::RETURN)		if (instr->getOpcode() == R600::RETURN)
return instr;		return instr;
}		}
return nullptr;		return nullptr;
}		}

bool AMDGPUCFGStructurizer::isReturnBlock(MachineBasicBlock *MBB) {		bool AMDGPUCFGStructurizer::isReturnBlock(MachineBasicBlock *MBB) {
MachineInstr *MI = getReturnInstr(MBB);		MachineInstr *MI = getReturnInstr(MBB);
bool IsReturn = (MBB->succ_size() == 0);		bool IsReturn = (MBB->succ_size() == 0);
Show All 36 Lines	assert((!MBB->getParent()->getJumpTableInfo()
&& "found a jump table");		&& "found a jump table");

//collect continue right before endloop		//collect continue right before endloop
SmallVector<MachineInstr *, DEFAULT_VEC_SLOTS> ContInstr;		SmallVector<MachineInstr *, DEFAULT_VEC_SLOTS> ContInstr;
MachineBasicBlock::iterator Pre = MBB->begin();		MachineBasicBlock::iterator Pre = MBB->begin();
MachineBasicBlock::iterator E = MBB->end();		MachineBasicBlock::iterator E = MBB->end();
MachineBasicBlock::iterator It = Pre;		MachineBasicBlock::iterator It = Pre;
while (It != E) {		while (It != E) {
if (Pre->getOpcode() == AMDGPU::CONTINUE		if (Pre->getOpcode() == R600::CONTINUE
&& It->getOpcode() == AMDGPU::ENDLOOP)		&& It->getOpcode() == R600::ENDLOOP)
ContInstr.push_back(&*Pre);		ContInstr.push_back(&*Pre);
Pre = It;		Pre = It;
++It;		++It;
}		}

//delete continue right before endloop		//delete continue right before endloop
for (unsigned i = 0; i < ContInstr.size(); ++i)		for (unsigned i = 0; i < ContInstr.size(); ++i)
ContInstr[i]->eraseFromParent();		ContInstr[i]->eraseFromParent();
▲ Show 20 Lines • Show All 598 Lines • ▼ Show 20 Lines	if (!MigrateTrue || !MigrateFalse) {
// lot of instructions.		// lot of instructions.
return 0;		return 0;
}		}

int NumNewBlk = 0;		int NumNewBlk = 0;

bool LandBlkHasOtherPred = (LandBlk->pred_size() > 2);		bool LandBlkHasOtherPred = (LandBlk->pred_size() > 2);

//insert AMDGPU::ENDIF to avoid special case "input landBlk == NULL"		//insert R600::ENDIF to avoid special case "input landBlk == NULL"
MachineBasicBlock::iterator I = insertInstrBefore(LandBlk, AMDGPU::ENDIF);		MachineBasicBlock::iterator I = insertInstrBefore(LandBlk, R600::ENDIF);

if (LandBlkHasOtherPred) {		if (LandBlkHasOtherPred) {
report_fatal_error("Extra register needed to handle CFG");		report_fatal_error("Extra register needed to handle CFG");
unsigned CmpResReg =		unsigned CmpResReg =
HeadMBB->getParent()->getRegInfo().createVirtualRegister(I32RC);		HeadMBB->getParent()->getRegInfo().createVirtualRegister(I32RC);
report_fatal_error("Extra compare instruction needed to handle CFG");		report_fatal_error("Extra compare instruction needed to handle CFG");
insertCondBranchBefore(LandBlk, I, AMDGPU::IF_PREDICATE_SET,		insertCondBranchBefore(LandBlk, I, R600::IF_PREDICATE_SET,
CmpResReg, DebugLoc());		CmpResReg, DebugLoc());
}		}

// XXX: We are running this after RA, so creating virtual registers will		// XXX: We are running this after RA, so creating virtual registers will
// cause an assertion failure in the PostRA scheduling pass.		// cause an assertion failure in the PostRA scheduling pass.
unsigned InitReg =		unsigned InitReg =
HeadMBB->getParent()->getRegInfo().createVirtualRegister(I32RC);		HeadMBB->getParent()->getRegInfo().createVirtualRegister(I32RC);
insertCondBranchBefore(LandBlk, I, AMDGPU::IF_PREDICATE_SET, InitReg,		insertCondBranchBefore(LandBlk, I, R600::IF_PREDICATE_SET, InitReg,
DebugLoc());		DebugLoc());

if (MigrateTrue) {		if (MigrateTrue) {
migrateInstruction(TrueMBB, LandBlk, I);		migrateInstruction(TrueMBB, LandBlk, I);
// need to uncondionally insert the assignment to ensure a path from its		// need to uncondionally insert the assignment to ensure a path from its
// predecessor rather than headBlk has valid value in initReg if		// predecessor rather than headBlk has valid value in initReg if
// (initVal != 1).		// (initVal != 1).
report_fatal_error("Extra register needed to handle CFG");		report_fatal_error("Extra register needed to handle CFG");
}		}
insertInstrBefore(I, AMDGPU::ELSE);		insertInstrBefore(I, R600::ELSE);

if (MigrateFalse) {		if (MigrateFalse) {
migrateInstruction(FalseMBB, LandBlk, I);		migrateInstruction(FalseMBB, LandBlk, I);
// need to uncondionally insert the assignment to ensure a path from its		// need to uncondionally insert the assignment to ensure a path from its
// predecessor rather than headBlk has valid value in initReg if		// predecessor rather than headBlk has valid value in initReg if
// (initVal != 0)		// (initVal != 0)
report_fatal_error("Extra register needed to handle CFG");		report_fatal_error("Extra register needed to handle CFG");
}		}

if (LandBlkHasOtherPred) {		if (LandBlkHasOtherPred) {
// add endif		// add endif
insertInstrBefore(I, AMDGPU::ENDIF);		insertInstrBefore(I, R600::ENDIF);

// put initReg = 2 to other predecessors of landBlk		// put initReg = 2 to other predecessors of landBlk
for (MachineBasicBlock::pred_iterator PI = LandBlk->pred_begin(),		for (MachineBasicBlock::pred_iterator PI = LandBlk->pred_begin(),
PE = LandBlk->pred_end(); PI != PE; ++PI) {		PE = LandBlk->pred_end(); PI != PE; ++PI) {
MachineBasicBlock *MBB = *PI;		MachineBasicBlock *MBB = *PI;
if (MBB != TrueMBB && MBB != FalseMBB)		if (MBB != TrueMBB && MBB != FalseMBB)
report_fatal_error("Extra register needed to handle CFG");		report_fatal_error("Extra register needed to handle CFG");
}		}
▲ Show 20 Lines • Show All 56 Lines • ▼ Show 20 Lines	if (TrueMBB) {
MBB->removeSuccessor(TrueMBB, true);		MBB->removeSuccessor(TrueMBB, true);
if (LandMBB && TrueMBB->succ_size()!=0)		if (LandMBB && TrueMBB->succ_size()!=0)
TrueMBB->removeSuccessor(LandMBB, true);		TrueMBB->removeSuccessor(LandMBB, true);
retireBlock(TrueMBB);		retireBlock(TrueMBB);
MLI->removeBlock(TrueMBB);		MLI->removeBlock(TrueMBB);
}		}

if (FalseMBB) {		if (FalseMBB) {
insertInstrBefore(I, AMDGPU::ELSE);		insertInstrBefore(I, R600::ELSE);
MBB->splice(I, FalseMBB, FalseMBB->begin(),		MBB->splice(I, FalseMBB, FalseMBB->begin(),
FalseMBB->end());		FalseMBB->end());
MBB->removeSuccessor(FalseMBB, true);		MBB->removeSuccessor(FalseMBB, true);
if (LandMBB && FalseMBB->succ_size() != 0)		if (LandMBB && FalseMBB->succ_size() != 0)
FalseMBB->removeSuccessor(LandMBB, true);		FalseMBB->removeSuccessor(LandMBB, true);
retireBlock(FalseMBB);		retireBlock(FalseMBB);
MLI->removeBlock(FalseMBB);		MLI->removeBlock(FalseMBB);
}		}
insertInstrBefore(I, AMDGPU::ENDIF);		insertInstrBefore(I, R600::ENDIF);

BranchMI->eraseFromParent();		BranchMI->eraseFromParent();

if (LandMBB && TrueMBB && FalseMBB)		if (LandMBB && TrueMBB && FalseMBB)
MBB->addSuccessor(LandMBB);		MBB->addSuccessor(LandMBB);
}		}

void AMDGPUCFGStructurizer::mergeLooplandBlock(MachineBasicBlock *DstBlk,		void AMDGPUCFGStructurizer::mergeLooplandBlock(MachineBasicBlock *DstBlk,
MachineBasicBlock *LandMBB) {		MachineBasicBlock *LandMBB) {
LLVM_DEBUG(dbgs() << "loopPattern header = BB" << DstBlk->getNumber()		LLVM_DEBUG(dbgs() << "loopPattern header = BB" << DstBlk->getNumber()
<< " land = BB" << LandMBB->getNumber() << "\n";);		<< " land = BB" << LandMBB->getNumber() << "\n";);

insertInstrBefore(DstBlk, AMDGPU::WHILELOOP, DebugLoc());		insertInstrBefore(DstBlk, R600::WHILELOOP, DebugLoc());
insertInstrEnd(DstBlk, AMDGPU::ENDLOOP, DebugLoc());		insertInstrEnd(DstBlk, R600::ENDLOOP, DebugLoc());
DstBlk->replaceSuccessor(DstBlk, LandMBB);		DstBlk->replaceSuccessor(DstBlk, LandMBB);
}		}

void AMDGPUCFGStructurizer::mergeLoopbreakBlock(MachineBasicBlock *ExitingMBB,		void AMDGPUCFGStructurizer::mergeLoopbreakBlock(MachineBasicBlock *ExitingMBB,
MachineBasicBlock *LandMBB) {		MachineBasicBlock *LandMBB) {
LLVM_DEBUG(dbgs() << "loopbreakPattern exiting = BB"		LLVM_DEBUG(dbgs() << "loopbreakPattern exiting = BB"
<< ExitingMBB->getNumber() << " land = BB"		<< ExitingMBB->getNumber() << " land = BB"
<< LandMBB->getNumber() << "\n";);		<< LandMBB->getNumber() << "\n";);
MachineInstr *BranchMI = getLoopendBlockBranchInstr(ExitingMBB);		MachineInstr *BranchMI = getLoopendBlockBranchInstr(ExitingMBB);
assert(BranchMI && isCondBranch(BranchMI));		assert(BranchMI && isCondBranch(BranchMI));
DebugLoc DL = BranchMI->getDebugLoc();		DebugLoc DL = BranchMI->getDebugLoc();
MachineBasicBlock *TrueBranch = getTrueBranch(BranchMI);		MachineBasicBlock *TrueBranch = getTrueBranch(BranchMI);
MachineBasicBlock::iterator I = BranchMI;		MachineBasicBlock::iterator I = BranchMI;
if (TrueBranch != LandMBB)		if (TrueBranch != LandMBB)
reversePredicateSetter(I, *I->getParent());		reversePredicateSetter(I, *I->getParent());
insertCondBranchBefore(ExitingMBB, I, AMDGPU::IF_PREDICATE_SET, AMDGPU::PREDICATE_BIT, DL);		insertCondBranchBefore(ExitingMBB, I, R600::IF_PREDICATE_SET, R600::PREDICATE_BIT, DL);
insertInstrBefore(I, AMDGPU::BREAK);		insertInstrBefore(I, R600::BREAK);
insertInstrBefore(I, AMDGPU::ENDIF);		insertInstrBefore(I, R600::ENDIF);
//now branchInst can be erase safely		//now branchInst can be erase safely
BranchMI->eraseFromParent();		BranchMI->eraseFromParent();
//now take care of successors, retire blocks		//now take care of successors, retire blocks
ExitingMBB->removeSuccessor(LandMBB, true);		ExitingMBB->removeSuccessor(LandMBB, true);
}		}

void AMDGPUCFGStructurizer::settleLoopcontBlock(MachineBasicBlock *ContingMBB,		void AMDGPUCFGStructurizer::settleLoopcontBlock(MachineBasicBlock *ContingMBB,
MachineBasicBlock *ContMBB) {		MachineBasicBlock *ContMBB) {
Show All 12 Lines	if (MI) {
bool UseContinueLogical = ((&*ContingMBB->rbegin()) == MI);		bool UseContinueLogical = ((&*ContingMBB->rbegin()) == MI);

if (!UseContinueLogical) {		if (!UseContinueLogical) {
int BranchOpcode =		int BranchOpcode =
TrueBranch == ContMBB ? getBranchNzeroOpcode(OldOpcode) :		TrueBranch == ContMBB ? getBranchNzeroOpcode(OldOpcode) :
getBranchZeroOpcode(OldOpcode);		getBranchZeroOpcode(OldOpcode);
insertCondBranchBefore(I, BranchOpcode, DL);		insertCondBranchBefore(I, BranchOpcode, DL);
// insertEnd to ensure phi-moves, if exist, go before the continue-instr.		// insertEnd to ensure phi-moves, if exist, go before the continue-instr.
insertInstrEnd(ContingMBB, AMDGPU::CONTINUE, DL);		insertInstrEnd(ContingMBB, R600::CONTINUE, DL);
insertInstrEnd(ContingMBB, AMDGPU::ENDIF, DL);		insertInstrEnd(ContingMBB, R600::ENDIF, DL);
} else {		} else {
int BranchOpcode =		int BranchOpcode =
TrueBranch == ContMBB ? getContinueNzeroOpcode(OldOpcode) :		TrueBranch == ContMBB ? getContinueNzeroOpcode(OldOpcode) :
getContinueZeroOpcode(OldOpcode);		getContinueZeroOpcode(OldOpcode);
insertCondBranchBefore(I, BranchOpcode, DL);		insertCondBranchBefore(I, BranchOpcode, DL);
}		}

MI->eraseFromParent();		MI->eraseFromParent();
} else {		} else {
// if we've arrived here then we've already erased the branch instruction		// if we've arrived here then we've already erased the branch instruction
// travel back up the basic block to see the last reference of our debug		// travel back up the basic block to see the last reference of our debug
// location we've just inserted that reference here so it should be		// location we've just inserted that reference here so it should be
// representative insertEnd to ensure phi-moves, if exist, go before the		// representative insertEnd to ensure phi-moves, if exist, go before the
// continue-instr.		// continue-instr.
insertInstrEnd(ContingMBB, AMDGPU::CONTINUE,		insertInstrEnd(ContingMBB, R600::CONTINUE,
getLastDebugLocInBB(ContingMBB));		getLastDebugLocInBB(ContingMBB));
}		}
}		}

int AMDGPUCFGStructurizer::cloneOnSideEntryTo(MachineBasicBlock *PreMBB,		int AMDGPUCFGStructurizer::cloneOnSideEntryTo(MachineBasicBlock *PreMBB,
MachineBasicBlock *SrcMBB, MachineBasicBlock *DstMBB) {		MachineBasicBlock *SrcMBB, MachineBasicBlock *DstMBB) {
int Cloned = 0;		int Cloned = 0;
assert(PreMBB->isSuccessor(SrcMBB));		assert(PreMBB->isSuccessor(SrcMBB));
▲ Show 20 Lines • Show All 110 Lines • ▼ Show 20 Lines	void AMDGPUCFGStructurizer::removeRedundantConditionalBranch(
SHOWNEWBLK(MBB1, "Removing redundant successor");		SHOWNEWBLK(MBB1, "Removing redundant successor");
MBB->removeSuccessor(MBB1, true);		MBB->removeSuccessor(MBB1, true);
}		}

void AMDGPUCFGStructurizer::addDummyExitBlock(		void AMDGPUCFGStructurizer::addDummyExitBlock(
SmallVectorImpl<MachineBasicBlock*> &RetMBB) {		SmallVectorImpl<MachineBasicBlock*> &RetMBB) {
MachineBasicBlock *DummyExitBlk = FuncRep->CreateMachineBasicBlock();		MachineBasicBlock *DummyExitBlk = FuncRep->CreateMachineBasicBlock();
FuncRep->push_back(DummyExitBlk); //insert to function		FuncRep->push_back(DummyExitBlk); //insert to function
insertInstrEnd(DummyExitBlk, AMDGPU::RETURN);		insertInstrEnd(DummyExitBlk, R600::RETURN);

for (SmallVectorImpl<MachineBasicBlock *>::iterator It = RetMBB.begin(),		for (SmallVectorImpl<MachineBasicBlock *>::iterator It = RetMBB.begin(),
E = RetMBB.end(); It != E; ++It) {		E = RetMBB.end(); It != E; ++It) {
MachineBasicBlock *MBB = *It;		MachineBasicBlock *MBB = *It;
MachineInstr *MI = getReturnInstr(MBB);		MachineInstr *MI = getReturnInstr(MBB);
if (MI)		if (MI)
MI->eraseFromParent();		MI->eraseFromParent();
MBB->addSuccessor(DummyExitBlk);		MBB->addSuccessor(DummyExitBlk);
▲ Show 20 Lines • Show All 43 Lines • Show Last 20 Lines
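
Note on the hunks above: apart from the AMDGPU:: to R600:: namespace rename, the logic in this file is unchanged. As a reading aid, the opcode swap performed by reversePredicateSetter can be sketched in isolation. This is only a minimal sketch, not code from the patch: the PredOp enum and invertPredicate helper are hypothetical stand-ins, and the real opcode values come from the generated R600 instruction tables.

```cpp
#include <cassert>

// Hypothetical stand-ins for the R600 predicate opcodes named in the diff
// (PRED_SETE_INT, PRED_SETNE_INT, PRED_SETE, PRED_SETNE). The real values
// are emitted by TableGen and are not reproduced here.
enum class PredOp { SETE_INT, SETNE_INT, SETE, SETNE };

// Return the logically opposite comparison, mirroring the switch in
// reversePredicateSetter: "equal" becomes "not equal" and vice versa,
// separately for the integer and the non-integer variants.
inline PredOp invertPredicate(PredOp Op) {
  switch (Op) {
  case PredOp::SETE_INT:
    return PredOp::SETNE_INT;
  case PredOp::SETNE_INT:
    return PredOp::SETE_INT;
  case PredOp::SETE:
    return PredOp::SETNE;
  case PredOp::SETNE:
    return PredOp::SETE;
  }
  return Op; // Not reached for valid enumerators.
}

// Inverting a predicate twice should give back the original predicate.
inline bool roundTrips(PredOp Op) {
  return invertPredicate(invertPredicate(Op)) == Op;
}
```

In the actual pass the same mapping is applied in place to operand 2 of the PRED_X instruction, and any other opcode hits llvm_unreachable rather than being returned unchanged.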

lib/Target/AMDGPU/CMakeLists.txt

	set(LLVM_TARGET_DEFINITIONS AMDGPU.td)			set(LLVM_TARGET_DEFINITIONS AMDGPU.td)

	tablegen(LLVM AMDGPUGenAsmMatcher.inc -gen-asm-matcher)			tablegen(LLVM AMDGPUGenAsmMatcher.inc -gen-asm-matcher)
	tablegen(LLVM AMDGPUGenAsmWriter.inc -gen-asm-writer)			tablegen(LLVM AMDGPUGenAsmWriter.inc -gen-asm-writer)
	tablegen(LLVM AMDGPUGenCallingConv.inc -gen-callingconv)			tablegen(LLVM AMDGPUGenCallingConv.inc -gen-callingconv)
	tablegen(LLVM AMDGPUGenDAGISel.inc -gen-dag-isel)			tablegen(LLVM AMDGPUGenDAGISel.inc -gen-dag-isel)
	tablegen(LLVM AMDGPUGenDFAPacketizer.inc -gen-dfa-packetizer)
	tablegen(LLVM AMDGPUGenDisassemblerTables.inc -gen-disassembler)			tablegen(LLVM AMDGPUGenDisassemblerTables.inc -gen-disassembler)
	tablegen(LLVM AMDGPUGenInstrInfo.inc -gen-instr-info)			tablegen(LLVM AMDGPUGenInstrInfo.inc -gen-instr-info)
	tablegen(LLVM AMDGPUGenIntrinsicEnums.inc -gen-tgt-intrinsic-enums)			tablegen(LLVM AMDGPUGenIntrinsicEnums.inc -gen-tgt-intrinsic-enums)
	tablegen(LLVM AMDGPUGenIntrinsicImpl.inc -gen-tgt-intrinsic-impl)			tablegen(LLVM AMDGPUGenIntrinsicImpl.inc -gen-tgt-intrinsic-impl)
	tablegen(LLVM AMDGPUGenMCCodeEmitter.inc -gen-emitter)			tablegen(LLVM AMDGPUGenMCCodeEmitter.inc -gen-emitter)
	tablegen(LLVM AMDGPUGenMCPseudoLowering.inc -gen-pseudo-lowering)			tablegen(LLVM AMDGPUGenMCPseudoLowering.inc -gen-pseudo-lowering)
	tablegen(LLVM AMDGPUGenRegisterBank.inc -gen-register-bank)			tablegen(LLVM AMDGPUGenRegisterBank.inc -gen-register-bank)
	tablegen(LLVM AMDGPUGenRegisterInfo.inc -gen-register-info)			tablegen(LLVM AMDGPUGenRegisterInfo.inc -gen-register-info)
	tablegen(LLVM AMDGPUGenSearchableTables.inc -gen-searchable-tables)			tablegen(LLVM AMDGPUGenSearchableTables.inc -gen-searchable-tables)
	tablegen(LLVM AMDGPUGenSubtargetInfo.inc -gen-subtarget)			tablegen(LLVM AMDGPUGenSubtargetInfo.inc -gen-subtarget)

	set(LLVM_TARGET_DEFINITIONS AMDGPUGISel.td)			set(LLVM_TARGET_DEFINITIONS AMDGPUGISel.td)
	tablegen(LLVM AMDGPUGenGlobalISel.inc -gen-global-isel)			tablegen(LLVM AMDGPUGenGlobalISel.inc -gen-global-isel)

				set(LLVM_TARGET_DEFINITIONS R600.td)
				tablegen(LLVM R600GenAsmWriter.inc -gen-asm-writer)
				tablegen(LLVM R600GenCallingConv.inc -gen-callingconv)
				tablegen(LLVM R600GenDAGISel.inc -gen-dag-isel)
				tablegen(LLVM R600GenDFAPacketizer.inc -gen-dfa-packetizer)
				tablegen(LLVM R600GenInstrInfo.inc -gen-instr-info)
				tablegen(LLVM R600GenMCCodeEmitter.inc -gen-emitter)
				tablegen(LLVM R600GenRegisterInfo.inc -gen-register-info)
				tablegen(LLVM R600GenSubtargetInfo.inc -gen-subtarget)

	add_public_tablegen_target(AMDGPUCommonTableGen)			add_public_tablegen_target(AMDGPUCommonTableGen)

	add_llvm_target(AMDGPUCodeGen			add_llvm_target(AMDGPUCodeGen
	AMDGPUAliasAnalysis.cpp			AMDGPUAliasAnalysis.cpp
	AMDGPUAlwaysInlinePass.cpp			AMDGPUAlwaysInlinePass.cpp
	AMDGPUAnnotateKernelFeatures.cpp			AMDGPUAnnotateKernelFeatures.cpp
	AMDGPUAnnotateUniformValues.cpp			AMDGPUAnnotateUniformValues.cpp
	AMDGPUArgumentUsageInfo.cpp			AMDGPUArgumentUsageInfo.cpp
	▲ Show 20 Lines • Show All 87 Lines • Show Last 20 Lines

lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp

	Show All 14 Lines
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	// ToDo: What to do with instruction suffixes (v_mov_b32 vs v_mov_b32_e32)?			// ToDo: What to do with instruction suffixes (v_mov_b32 vs v_mov_b32_e32)?

	#include "Disassembler/AMDGPUDisassembler.h"			#include "Disassembler/AMDGPUDisassembler.h"
	#include "AMDGPU.h"			#include "AMDGPU.h"
	#include "AMDGPURegisterInfo.h"			#include "AMDGPURegisterInfo.h"
				#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
	#include "SIDefines.h"			#include "SIDefines.h"
	#include "MCTargetDesc/AMDGPUMCTargetDesc.h"			#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
	#include "Utils/AMDGPUBaseInfo.h"			#include "Utils/AMDGPUBaseInfo.h"
	#include "llvm-c/Disassembler.h"			#include "llvm-c/Disassembler.h"
	#include "llvm/ADT/APInt.h"			#include "llvm/ADT/APInt.h"
	#include "llvm/ADT/ArrayRef.h"			#include "llvm/ADT/ArrayRef.h"
	#include "llvm/ADT/Twine.h"			#include "llvm/ADT/Twine.h"
	#include "llvm/BinaryFormat/ELF.h"			#include "llvm/BinaryFormat/ELF.h"
	▲ Show 20 Lines • Show All 914 Lines • Show Last 20 Lines

lib/Target/AMDGPU/EvergreenInstructions.td

	//===-- EvergreenInstructions.td - EG Instruction defs ----- tablegen --===//			//===-- EvergreenInstructions.td - EG Instruction defs ----- tablegen --===//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// TableGen definitions for instructions which are:			// TableGen definitions for instructions which are:
	// - Available to Evergreen and newer VLIW4/VLIW5 GPUs			// - Available to Evergreen and newer VLIW4/VLIW5 GPUs
	// - Available only on Evergreen family GPUs.			// - Available only on Evergreen family GPUs.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	def isEG : Predicate<			def isEG : Predicate<
	"Subtarget->getGeneration() >= AMDGPUSubtarget::EVERGREEN && "			"Subtarget->getGeneration() >= R600Subtarget::EVERGREEN && "
	"Subtarget->getGeneration() <= AMDGPUSubtarget::NORTHERN_ISLANDS && "
	"!Subtarget->hasCaymanISA()"			"!Subtarget->hasCaymanISA()"
	>;			>;

	def isEGorCayman : Predicate<			def isEGorCayman : Predicate<
	"Subtarget->getGeneration() == AMDGPUSubtarget::EVERGREEN ||"			"Subtarget->getGeneration() == R600Subtarget::EVERGREEN ||"
	"Subtarget->getGeneration() == AMDGPUSubtarget::NORTHERN_ISLANDS"			"Subtarget->getGeneration() == R600Subtarget::NORTHERN_ISLANDS"
	>;			>;

	class EGPat<dag pattern, dag result> : AMDGPUPat<pattern, result> {			class EGPat<dag pattern, dag result> : AMDGPUPat<pattern, result> {
	let SubtargetPredicate = isEG;			let SubtargetPredicate = isEG;
	}			}

	class EGOrCaymanPat<dag pattern, dag result> : AMDGPUPat<pattern, result> {			class EGOrCaymanPat<dag pattern, dag result> : AMDGPUPat<pattern, result> {
	let SubtargetPredicate = isEGorCayman;			let SubtargetPredicate = isEGorCayman;
	▲ Show 20 Lines • Show All 741 Lines • Show Last 20 Lines

lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.h

Show First 20 Lines • Show All 212 Lines • ▼ Show 20 Lines	protected:
void printSwizzle(const MCInst *MI, unsigned OpNo, const MCSubtargetInfo &STI,		void printSwizzle(const MCInst *MI, unsigned OpNo, const MCSubtargetInfo &STI,
raw_ostream &O);		raw_ostream &O);
void printWaitFlag(const MCInst *MI, unsigned OpNo,		void printWaitFlag(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI, raw_ostream &O);		const MCSubtargetInfo &STI, raw_ostream &O);
void printHwreg(const MCInst *MI, unsigned OpNo, const MCSubtargetInfo &STI,		void printHwreg(const MCInst *MI, unsigned OpNo, const MCSubtargetInfo &STI,
raw_ostream &O);		raw_ostream &O);
};		};

// FIXME: R600 specific parts of AMDGPUInstrPrinter should be moved here, and		class R600InstPrinter : public MCInstPrinter {
// MCTargetDesc should be using R600InstPrinter for the R600 target.
class R600InstPrinter : public AMDGPUInstPrinter {
public:		public:
R600InstPrinter(const MCAsmInfo &MAI, const MCInstrInfo &MII,		R600InstPrinter(const MCAsmInfo &MAI, const MCInstrInfo &MII,
const MCRegisterInfo &MRI)		const MCRegisterInfo &MRI)
: AMDGPUInstPrinter(MAI, MII, MRI) {}		: MCInstPrinter(MAI, MII, MRI) {}

		void printInst(const MCInst *MI, raw_ostream &O, StringRef Annot,
		const MCSubtargetInfo &STI) override;
		void printInstruction(const MCInst *MI, raw_ostream &O);
		static const char *getRegisterName(unsigned RegNo);

void printAbs(const MCInst *MI, unsigned OpNo, raw_ostream &O);		void printAbs(const MCInst *MI, unsigned OpNo, raw_ostream &O);
void printBankSwizzle(const MCInst *MI, unsigned OpNo, raw_ostream &O);		void printBankSwizzle(const MCInst *MI, unsigned OpNo, raw_ostream &O);
void printClamp(const MCInst *MI, unsigned OpNo, raw_ostream &O);		void printClamp(const MCInst *MI, unsigned OpNo, raw_ostream &O);
void printCT(const MCInst *MI, unsigned OpNo, raw_ostream &O);		void printCT(const MCInst *MI, unsigned OpNo, raw_ostream &O);
void printKCache(const MCInst *MI, unsigned OpNo, raw_ostream &O);		void printKCache(const MCInst *MI, unsigned OpNo, raw_ostream &O);
void printLast(const MCInst *MI, unsigned OpNo, raw_ostream &O);		void printLast(const MCInst *MI, unsigned OpNo, raw_ostream &O);
void printLiteral(const MCInst *MI, unsigned OpNo, raw_ostream &O);		void printLiteral(const MCInst *MI, unsigned OpNo, raw_ostream &O);
Show All 14 Lines

lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.cpp

Show First 20 Lines • Show All 504 Lines • ▼ Show 20 Lines	else {
// operand. This is technically allowed for the encoding of s_mov_b64.		// operand. This is technically allowed for the encoding of s_mov_b64.
O << formatHex(static_cast<uint64_t>(Imm));		O << formatHex(static_cast<uint64_t>(Imm));
}		}
}		}

void AMDGPUInstPrinter::printOperand(const MCInst *MI, unsigned OpNo,		void AMDGPUInstPrinter::printOperand(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI,		const MCSubtargetInfo &STI,
raw_ostream &O) {		raw_ostream &O) {
if (!STI.getFeatureBits()[AMDGPU::FeatureGCN]) {
static_cast<R600InstPrinter*>(this)->printOperand(MI, OpNo, O);
return;
}

if (OpNo >= MI->getNumOperands()) {		if (OpNo >= MI->getNumOperands()) {
O << "/*Missing OP" << OpNo << "*/";		O << "/*Missing OP" << OpNo << "*/";
return;		return;
}		}

const MCOperand &Op = MI->getOperand(OpNo);		const MCOperand &Op = MI->getOperand(OpNo);
if (Op.isReg()) {		if (Op.isReg()) {
printRegOperand(Op.getReg(), O, MRI);		printRegOperand(Op.getReg(), O, MRI);
▲ Show 20 Lines • Show All 434 Lines • ▼ Show 20 Lines	void AMDGPUInstPrinter::printVGPRIndexMode(const MCInst *MI, unsigned OpNo,

if (Val & VGPRIndexMode::SRC2_ENABLE)		if (Val & VGPRIndexMode::SRC2_ENABLE)
O << " src2";		O << " src2";
}		}

void AMDGPUInstPrinter::printMemOperand(const MCInst *MI, unsigned OpNo,		void AMDGPUInstPrinter::printMemOperand(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI,		const MCSubtargetInfo &STI,
raw_ostream &O) {		raw_ostream &O) {
if (!STI.getFeatureBits()[AMDGPU::FeatureGCN]) {
static_cast<R600InstPrinter*>(this)->printMemOperand(MI, OpNo, O);
return;
}

printOperand(MI, OpNo, STI, O);		printOperand(MI, OpNo, STI, O);
O << ", ";		O << ", ";
printOperand(MI, OpNo + 1, STI, O);		printOperand(MI, OpNo + 1, STI, O);
}		}

void AMDGPUInstPrinter::printIfSet(const MCInst *MI, unsigned OpNo,		void AMDGPUInstPrinter::printIfSet(const MCInst *MI, unsigned OpNo,
raw_ostream &O, StringRef Asm,		raw_ostream &O, StringRef Asm,
StringRef Default) {		StringRef Default) {
Show All 9 Lines
void AMDGPUInstPrinter::printIfSet(const MCInst *MI, unsigned OpNo,		void AMDGPUInstPrinter::printIfSet(const MCInst *MI, unsigned OpNo,
raw_ostream &O, char Asm) {		raw_ostream &O, char Asm) {
const MCOperand &Op = MI->getOperand(OpNo);		const MCOperand &Op = MI->getOperand(OpNo);
assert(Op.isImm());		assert(Op.isImm());
if (Op.getImm() == 1)		if (Op.getImm() == 1)
O << Asm;		O << Asm;
}		}

void AMDGPUInstPrinter::printAbs(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI, raw_ostream &O) {
static_cast<R600InstPrinter*>(this)->printAbs(MI, OpNo, O);
}

void AMDGPUInstPrinter::printClamp(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI, raw_ostream &O) {
static_cast<R600InstPrinter*>(this)->printClamp(MI, OpNo, O);
}

void AMDGPUInstPrinter::printHigh(const MCInst *MI, unsigned OpNo,		void AMDGPUInstPrinter::printHigh(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI,		const MCSubtargetInfo &STI,
raw_ostream &O) {		raw_ostream &O) {
if (MI->getOperand(OpNo).getImm())		if (MI->getOperand(OpNo).getImm())
O << " high";		O << " high";
}		}

void AMDGPUInstPrinter::printClampSI(const MCInst *MI, unsigned OpNo,		void AMDGPUInstPrinter::printClampSI(const MCInst *MI, unsigned OpNo,
[10 lines not shown]	void AMDGPUInstPrinter::printOModSI(const MCInst *MI, unsigned OpNo,
if (Imm == SIOutMods::MUL2)		if (Imm == SIOutMods::MUL2)
O << " mul:2";		O << " mul:2";
else if (Imm == SIOutMods::MUL4)		else if (Imm == SIOutMods::MUL4)
O << " mul:4";		O << " mul:4";
else if (Imm == SIOutMods::DIV2)		else if (Imm == SIOutMods::DIV2)
O << " div:2";		O << " div:2";
}		}

void AMDGPUInstPrinter::printLiteral(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI,
raw_ostream &O) {
static_cast<R600InstPrinter*>(this)->printLiteral(MI, OpNo, O);
}

void AMDGPUInstPrinter::printLast(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI, raw_ostream &O) {
static_cast<R600InstPrinter*>(this)->printLast(MI, OpNo, O);
}

void AMDGPUInstPrinter::printNeg(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI, raw_ostream &O) {
static_cast<R600InstPrinter*>(this)->printNeg(MI, OpNo, O);
}

void AMDGPUInstPrinter::printOMOD(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI, raw_ostream &O) {
static_cast<R600InstPrinter*>(this)->printOMOD(MI, OpNo, O);
}

void AMDGPUInstPrinter::printRel(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI, raw_ostream &O) {
static_cast<R600InstPrinter*>(this)->printRel(MI, OpNo, O);
}

void AMDGPUInstPrinter::printUpdateExecMask(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI,
raw_ostream &O) {
static_cast<R600InstPrinter*>(this)->printUpdateExecMask(MI, OpNo, O);
}

void AMDGPUInstPrinter::printUpdatePred(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI,
raw_ostream &O) {
static_cast<R600InstPrinter*>(this)->printUpdatePred(MI, OpNo, O);
}

void AMDGPUInstPrinter::printWrite(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI, raw_ostream &O) {
static_cast<R600InstPrinter*>(this)->printWrite(MI, OpNo, O);
}

void AMDGPUInstPrinter::printBankSwizzle(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI,
raw_ostream &O) {
static_cast<R600InstPrinter*>(this)->printBankSwizzle(MI, OpNo, O);
}

void AMDGPUInstPrinter::printRSel(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI, raw_ostream &O) {
static_cast<R600InstPrinter*>(this)->printRSel(MI, OpNo, O);
}

void AMDGPUInstPrinter::printCT(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI, raw_ostream &O) {
static_cast<R600InstPrinter*>(this)->printCT(MI, OpNo, O);
}

void AMDGPUInstPrinter::printKCache(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI, raw_ostream &O) {
static_cast<R600InstPrinter*>(this)->printKCache(MI, OpNo, O);
}

void AMDGPUInstPrinter::printSendMsg(const MCInst *MI, unsigned OpNo,		void AMDGPUInstPrinter::printSendMsg(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI,		const MCSubtargetInfo &STI,
raw_ostream &O) {		raw_ostream &O) {
using namespace llvm::AMDGPU::SendMsg;		using namespace llvm::AMDGPU::SendMsg;

const unsigned SImm16 = MI->getOperand(OpNo).getImm();		const unsigned SImm16 = MI->getOperand(OpNo).getImm();
const unsigned Id = SImm16 & ID_MASK_;		const unsigned Id = SImm16 & ID_MASK_;
do {		do {
[188 lines not shown]	void AMDGPUInstPrinter::printHwreg(const MCInst *MI, unsigned OpNo,
if (Width != WIDTH_M1_DEFAULT_ + 1 || Offset != OFFSET_DEFAULT_) {		if (Width != WIDTH_M1_DEFAULT_ + 1 || Offset != OFFSET_DEFAULT_) {
O << ", " << Offset << ", " << Width;		O << ", " << Offset << ", " << Width;
}		}
O << ')';		O << ')';
}		}

#include "AMDGPUGenAsmWriter.inc"		#include "AMDGPUGenAsmWriter.inc"

		void R600InstPrinter::printInst(const MCInst *MI, raw_ostream &O,
		StringRef Annot, const MCSubtargetInfo &STI) {
		O.flush();
		printInstruction(MI, O);
		printAnnotation(O, Annot);
		}

void R600InstPrinter::printAbs(const MCInst *MI, unsigned OpNo,		void R600InstPrinter::printAbs(const MCInst *MI, unsigned OpNo,
raw_ostream &O) {		raw_ostream &O) {
AMDGPUInstPrinter::printIfSet(MI, OpNo, O, '|');		AMDGPUInstPrinter::printIfSet(MI, OpNo, O, '|');
}		}

void R600InstPrinter::printBankSwizzle(const MCInst *MI, unsigned OpNo,		void R600InstPrinter::printBankSwizzle(const MCInst *MI, unsigned OpNo,
raw_ostream &O) {		raw_ostream &O) {
int BankSwizzle = MI->getOperand(OpNo).getImm();		int BankSwizzle = MI->getOperand(OpNo).getImm();
[102 lines not shown]	if (OpNo >= MI->getNumOperands()) {
O << "/Missing OP" << OpNo << "/";		O << "/Missing OP" << OpNo << "/";
return;		return;
}		}

const MCOperand &Op = MI->getOperand(OpNo);		const MCOperand &Op = MI->getOperand(OpNo);
if (Op.isReg()) {		if (Op.isReg()) {
switch (Op.getReg()) {		switch (Op.getReg()) {
// This is the default predicate state, so we don't need to print it.		// This is the default predicate state, so we don't need to print it.
case AMDGPU::PRED_SEL_OFF:		case R600::PRED_SEL_OFF:
break;		break;

default:		default:
O << getRegisterName(Op.getReg());		O << getRegisterName(Op.getReg());
break;		break;
}		}
} else if (Op.isImm()) {		} else if (Op.isImm()) {
O << Op.getImm();		O << Op.getImm();
[59 lines not shown]

void R600InstPrinter::printWrite(const MCInst *MI, unsigned OpNo,		void R600InstPrinter::printWrite(const MCInst *MI, unsigned OpNo,
raw_ostream &O) {		raw_ostream &O) {
const MCOperand &Op = MI->getOperand(OpNo);		const MCOperand &Op = MI->getOperand(OpNo);
if (Op.getImm() == 0) {		if (Op.getImm() == 0) {
O << " (MASKED)";		O << " (MASKED)";
}		}
}		}

		#include "R600GenAsmWriter.inc"

lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.h

	[34 lines not shown]
	class raw_pwrite_stream;			class raw_pwrite_stream;

	Target &getTheAMDGPUTarget();			Target &getTheAMDGPUTarget();
	Target &getTheGCNTarget();			Target &getTheGCNTarget();

	MCCodeEmitter *createR600MCCodeEmitter(const MCInstrInfo &MCII,			MCCodeEmitter *createR600MCCodeEmitter(const MCInstrInfo &MCII,
	const MCRegisterInfo &MRI,			const MCRegisterInfo &MRI,
	MCContext &Ctx);			MCContext &Ctx);
				MCInstrInfo *createR600MCInstrInfo();

	MCCodeEmitter *createSIMCCodeEmitter(const MCInstrInfo &MCII,			MCCodeEmitter *createSIMCCodeEmitter(const MCInstrInfo &MCII,
	const MCRegisterInfo &MRI,			const MCRegisterInfo &MRI,
	MCContext &Ctx);			MCContext &Ctx);

	MCAsmBackend *createAMDGPUAsmBackend(const Target &T,			MCAsmBackend *createAMDGPUAsmBackend(const Target &T,
	const MCSubtargetInfo &STI,			const MCSubtargetInfo &STI,
	const MCRegisterInfo &MRI,			const MCRegisterInfo &MRI,
	const MCTargetOptions &Options);			const MCTargetOptions &Options);

	std::unique_ptr<MCObjectTargetWriter>			std::unique_ptr<MCObjectTargetWriter>
	createAMDGPUELFObjectWriter(bool Is64Bit, uint8_t OSABI,			createAMDGPUELFObjectWriter(bool Is64Bit, uint8_t OSABI,
	bool HasRelocationAddend);			bool HasRelocationAddend);
	} // End llvm namespace			} // End llvm namespace

	#define GET_REGINFO_ENUM			#define GET_REGINFO_ENUM
	#include "AMDGPUGenRegisterInfo.inc"			#include "AMDGPUGenRegisterInfo.inc"
	#undef GET_REGINFO_ENUM			#undef GET_REGINFO_ENUM

				#define GET_REGINFO_ENUM
				#include "R600GenRegisterInfo.inc"
				#undef GET_REGINFO_ENUM

	#define GET_INSTRINFO_ENUM			#define GET_INSTRINFO_ENUM
	#define GET_INSTRINFO_OPERAND_ENUM			#define GET_INSTRINFO_OPERAND_ENUM
	#define GET_INSTRINFO_SCHED_ENUM			#define GET_INSTRINFO_SCHED_ENUM
	#include "AMDGPUGenInstrInfo.inc"			#include "AMDGPUGenInstrInfo.inc"
	#undef GET_INSTRINFO_SCHED_ENUM			#undef GET_INSTRINFO_SCHED_ENUM
	#undef GET_INSTRINFO_OPERAND_ENUM			#undef GET_INSTRINFO_OPERAND_ENUM
	#undef GET_INSTRINFO_ENUM			#undef GET_INSTRINFO_ENUM

				#define GET_INSTRINFO_ENUM
				#define GET_INSTRINFO_OPERAND_ENUM
				#define GET_INSTRINFO_SCHED_ENUM
				#include "R600GenInstrInfo.inc"
				#undef GET_INSTRINFO_SCHED_ENUM
				#undef GET_INSTRINFO_OPERAND_ENUM
				#undef GET_INSTRINFO_ENUM

	#define GET_SUBTARGETINFO_ENUM			#define GET_SUBTARGETINFO_ENUM
	#include "AMDGPUGenSubtargetInfo.inc"			#include "AMDGPUGenSubtargetInfo.inc"
	#undef GET_SUBTARGETINFO_ENUM			#undef GET_SUBTARGETINFO_ENUM

				#define GET_SUBTARGETINFO_ENUM
				#include "R600GenSubtargetInfo.inc"
				#undef GET_SUBTARGETINFO_ENUM

	#endif			#endif

lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp

[32 lines not shown]
using namespace llvm;		using namespace llvm;

#define GET_INSTRINFO_MC_DESC		#define GET_INSTRINFO_MC_DESC
#include "AMDGPUGenInstrInfo.inc"		#include "AMDGPUGenInstrInfo.inc"

#define GET_SUBTARGETINFO_MC_DESC		#define GET_SUBTARGETINFO_MC_DESC
#include "AMDGPUGenSubtargetInfo.inc"		#include "AMDGPUGenSubtargetInfo.inc"

		#define NoSchedModel NoSchedModelR600
		#define GET_SUBTARGETINFO_MC_DESC
		#include "R600GenSubtargetInfo.inc"
		#undef NoSchedModelR600

#define GET_REGINFO_MC_DESC		#define GET_REGINFO_MC_DESC
#include "AMDGPUGenRegisterInfo.inc"		#include "AMDGPUGenRegisterInfo.inc"

		#define GET_REGINFO_MC_DESC
		#include "R600GenRegisterInfo.inc"

static MCInstrInfo *createAMDGPUMCInstrInfo() {		static MCInstrInfo *createAMDGPUMCInstrInfo() {
MCInstrInfo *X = new MCInstrInfo();		MCInstrInfo *X = new MCInstrInfo();
InitAMDGPUMCInstrInfo(X);		InitAMDGPUMCInstrInfo(X);
return X;		return X;
}		}

static MCRegisterInfo *createAMDGPUMCRegisterInfo(const Triple &TT) {		static MCRegisterInfo *createAMDGPUMCRegisterInfo(const Triple &TT) {
MCRegisterInfo *X = new MCRegisterInfo();		MCRegisterInfo *X = new MCRegisterInfo();
		if (TT.getArch() == Triple::r600)
		InitR600MCRegisterInfo(X, 0);
		else
InitAMDGPUMCRegisterInfo(X, 0);		InitAMDGPUMCRegisterInfo(X, 0);
return X;		return X;
}		}

static MCSubtargetInfo *		static MCSubtargetInfo *
createAMDGPUMCSubtargetInfo(const Triple &TT, StringRef CPU, StringRef FS) {		createAMDGPUMCSubtargetInfo(const Triple &TT, StringRef CPU, StringRef FS) {
		if (TT.getArch() == Triple::r600)
		return createR600MCSubtargetInfoImpl(TT, CPU, FS);
return createAMDGPUMCSubtargetInfoImpl(TT, CPU, FS);		return createAMDGPUMCSubtargetInfoImpl(TT, CPU, FS);
}		}

static MCInstPrinter *createAMDGPUMCInstPrinter(const Triple &T,		static MCInstPrinter *createAMDGPUMCInstPrinter(const Triple &T,
unsigned SyntaxVariant,		unsigned SyntaxVariant,
const MCAsmInfo &MAI,		const MCAsmInfo &MAI,
const MCInstrInfo &MII,		const MCInstrInfo &MII,
const MCRegisterInfo &MRI) {		const MCRegisterInfo &MRI) {
return T.getArch() == Triple::r600 ? new R600InstPrinter(MAI, MII, MRI) :		if (T.getArch() == Triple::r600)
new AMDGPUInstPrinter(MAI, MII, MRI);		return new R600InstPrinter(MAI, MII, MRI);
		else
		return new AMDGPUInstPrinter(MAI, MII, MRI);
}		}

static MCTargetStreamer *createAMDGPUAsmTargetStreamer(MCStreamer &S,		static MCTargetStreamer *createAMDGPUAsmTargetStreamer(MCStreamer &S,
formatted_raw_ostream &OS,		formatted_raw_ostream &OS,
MCInstPrinter *InstPrint,		MCInstPrinter *InstPrint,
bool isVerboseAsm) {		bool isVerboseAsm) {
return new AMDGPUTargetAsmStreamer(S, OS);		return new AMDGPUTargetAsmStreamer(S, OS);
}		}
[9 lines not shown]	static MCStreamer *createMCStreamer(const Triple &T, MCContext &Context,
std::unique_ptr<MCObjectWriter> &&OW,		std::unique_ptr<MCObjectWriter> &&OW,
std::unique_ptr<MCCodeEmitter> &&Emitter,		std::unique_ptr<MCCodeEmitter> &&Emitter,
bool RelaxAll) {		bool RelaxAll) {
return createAMDGPUELFStreamer(T, Context, std::move(MAB), std::move(OW),		return createAMDGPUELFStreamer(T, Context, std::move(MAB), std::move(OW),
std::move(Emitter), RelaxAll);		std::move(Emitter), RelaxAll);
}		}

extern "C" void LLVMInitializeAMDGPUTargetMC() {		extern "C" void LLVMInitializeAMDGPUTargetMC() {

		TargetRegistry::RegisterMCInstrInfo(getTheGCNTarget(), createAMDGPUMCInstrInfo);
		TargetRegistry::RegisterMCInstrInfo(getTheAMDGPUTarget(), createR600MCInstrInfo);
for (Target *T : {&getTheAMDGPUTarget(), &getTheGCNTarget()}) {		for (Target *T : {&getTheAMDGPUTarget(), &getTheGCNTarget()}) {
RegisterMCAsmInfo<AMDGPUMCAsmInfo> X(*T);		RegisterMCAsmInfo<AMDGPUMCAsmInfo> X(*T);

TargetRegistry::RegisterMCInstrInfo(*T, createAMDGPUMCInstrInfo);
TargetRegistry::RegisterMCRegInfo(*T, createAMDGPUMCRegisterInfo);		TargetRegistry::RegisterMCRegInfo(*T, createAMDGPUMCRegisterInfo);
TargetRegistry::RegisterMCSubtargetInfo(*T, createAMDGPUMCSubtargetInfo);		TargetRegistry::RegisterMCSubtargetInfo(*T, createAMDGPUMCSubtargetInfo);
TargetRegistry::RegisterMCInstPrinter(*T, createAMDGPUMCInstPrinter);		TargetRegistry::RegisterMCInstPrinter(*T, createAMDGPUMCInstPrinter);
TargetRegistry::RegisterMCAsmBackend(*T, createAMDGPUAsmBackend);		TargetRegistry::RegisterMCAsmBackend(*T, createAMDGPUAsmBackend);
TargetRegistry::RegisterELFStreamer(*T, createMCStreamer);		TargetRegistry::RegisterELFStreamer(*T, createMCStreamer);
}		}

// R600 specific registration		// R600 specific registration
[14 lines not shown]

lib/Target/AMDGPU/MCTargetDesc/CMakeLists.txt

	add_llvm_library(LLVMAMDGPUDesc			add_llvm_library(LLVMAMDGPUDesc
	AMDGPUAsmBackend.cpp			AMDGPUAsmBackend.cpp
	AMDGPUELFObjectWriter.cpp			AMDGPUELFObjectWriter.cpp
	AMDGPUELFStreamer.cpp			AMDGPUELFStreamer.cpp
	AMDGPUHSAMetadataStreamer.cpp			AMDGPUHSAMetadataStreamer.cpp
	AMDGPUMCAsmInfo.cpp			AMDGPUMCAsmInfo.cpp
	AMDGPUMCCodeEmitter.cpp			AMDGPUMCCodeEmitter.cpp
	AMDGPUMCTargetDesc.cpp			AMDGPUMCTargetDesc.cpp
	AMDGPUTargetStreamer.cpp			AMDGPUTargetStreamer.cpp
	R600MCCodeEmitter.cpp			R600MCCodeEmitter.cpp
				R600MCTargetDesc.cpp
	SIMCCodeEmitter.cpp			SIMCCodeEmitter.cpp
	)			)

lib/Target/AMDGPU/MCTargetDesc/R600MCCodeEmitter.cpp

[9 lines not shown]
/// \file		/// \file
///		///
/// The R600 code emitter produces machine code that can be executed		/// The R600 code emitter produces machine code that can be executed
/// directly on the GPU device.		/// directly on the GPU device.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "MCTargetDesc/AMDGPUFixupKinds.h"		#include "MCTargetDesc/AMDGPUFixupKinds.h"
#include "MCTargetDesc/AMDGPUMCCodeEmitter.h"
#include "MCTargetDesc/AMDGPUMCTargetDesc.h"		#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
#include "R600Defines.h"		#include "R600Defines.h"
#include "llvm/MC/MCCodeEmitter.h"		#include "llvm/MC/MCCodeEmitter.h"
#include "llvm/MC/MCContext.h"		#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCFixup.h"		#include "llvm/MC/MCFixup.h"
#include "llvm/MC/MCInst.h"		#include "llvm/MC/MCInst.h"
#include "llvm/MC/MCInstrDesc.h"		#include "llvm/MC/MCInstrDesc.h"
#include "llvm/MC/MCInstrInfo.h"		#include "llvm/MC/MCInstrInfo.h"
#include "llvm/MC/MCRegisterInfo.h"		#include "llvm/MC/MCRegisterInfo.h"
#include "llvm/MC/MCSubtargetInfo.h"		#include "llvm/MC/MCSubtargetInfo.h"
#include "llvm/Support/Endian.h"		#include "llvm/Support/Endian.h"
#include "llvm/Support/EndianStream.h"		#include "llvm/Support/EndianStream.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include <cassert>		#include <cassert>
#include <cstdint>		#include <cstdint>

using namespace llvm;		using namespace llvm;

namespace {		namespace {

class R600MCCodeEmitter : public AMDGPUMCCodeEmitter {		class R600MCCodeEmitter : public MCCodeEmitter {
const MCRegisterInfo &MRI;		const MCRegisterInfo &MRI;
		const MCInstrInfo &MCII;

public:		public:
R600MCCodeEmitter(const MCInstrInfo &mcii, const MCRegisterInfo &mri)		R600MCCodeEmitter(const MCInstrInfo &mcii, const MCRegisterInfo &mri)
: AMDGPUMCCodeEmitter(mcii), MRI(mri) {}		: MRI(mri), MCII(mcii) {}
R600MCCodeEmitter(const R600MCCodeEmitter &) = delete;		R600MCCodeEmitter(const R600MCCodeEmitter &) = delete;
R600MCCodeEmitter &operator=(const R600MCCodeEmitter &) = delete;		R600MCCodeEmitter &operator=(const R600MCCodeEmitter &) = delete;

/// Encode the instruction and write it to the OS.		/// Encode the instruction and write it to the OS.
void encodeInstruction(const MCInst &MI, raw_ostream &OS,		void encodeInstruction(const MCInst &MI, raw_ostream &OS,
SmallVectorImpl<MCFixup> &Fixups,		SmallVectorImpl<MCFixup> &Fixups,
const MCSubtargetInfo &STI) const override;		const MCSubtargetInfo &STI) const;

/// \returns the encoding for an MCOperand.		/// \returns the encoding for an MCOperand.
uint64_t getMachineOpValue(const MCInst &MI, const MCOperand &MO,		uint64_t getMachineOpValue(const MCInst &MI, const MCOperand &MO,
SmallVectorImpl<MCFixup> &Fixups,		SmallVectorImpl<MCFixup> &Fixups,
const MCSubtargetInfo &STI) const override;		const MCSubtargetInfo &STI) const;

private:		private:

void Emit(uint32_t value, raw_ostream &OS) const;		void Emit(uint32_t value, raw_ostream &OS) const;
void Emit(uint64_t value, raw_ostream &OS) const;		void Emit(uint64_t value, raw_ostream &OS) const;

unsigned getHWReg(unsigned regNo) const;		unsigned getHWReg(unsigned regNo) const;

		uint64_t getBinaryCodeForInstr(const MCInst &MI,
		SmallVectorImpl<MCFixup> &Fixups,
		const MCSubtargetInfo &STI) const;
		uint64_t computeAvailableFeatures(const FeatureBitset &FB) const;
		void verifyInstructionPredicates(const MCInst &MI,
		uint64_t AvailableFeatures) const;

};		};

} // end anonymous namespace		} // end anonymous namespace

enum RegElement {		enum RegElement {
ELEMENT_X = 0,		ELEMENT_X = 0,
ELEMENT_Y,		ELEMENT_Y,
ELEMENT_Z,		ELEMENT_Z,
[18 lines not shown]

void R600MCCodeEmitter::encodeInstruction(const MCInst &MI, raw_ostream &OS,		void R600MCCodeEmitter::encodeInstruction(const MCInst &MI, raw_ostream &OS,
SmallVectorImpl<MCFixup> &Fixups,		SmallVectorImpl<MCFixup> &Fixups,
const MCSubtargetInfo &STI) const {		const MCSubtargetInfo &STI) const {
verifyInstructionPredicates(MI,		verifyInstructionPredicates(MI,
computeAvailableFeatures(STI.getFeatureBits()));		computeAvailableFeatures(STI.getFeatureBits()));

const MCInstrDesc &Desc = MCII.get(MI.getOpcode());		const MCInstrDesc &Desc = MCII.get(MI.getOpcode());
if (MI.getOpcode() == AMDGPU::RETURN ||		if (MI.getOpcode() == R600::RETURN ||
MI.getOpcode() == AMDGPU::FETCH_CLAUSE ||		MI.getOpcode() == R600::FETCH_CLAUSE ||
MI.getOpcode() == AMDGPU::ALU_CLAUSE ||		MI.getOpcode() == R600::ALU_CLAUSE ||
MI.getOpcode() == AMDGPU::BUNDLE ||		MI.getOpcode() == R600::BUNDLE ||
MI.getOpcode() == AMDGPU::KILL) {		MI.getOpcode() == R600::KILL) {
return;		return;
} else if (IS_VTX(Desc)) {		} else if (IS_VTX(Desc)) {
uint64_t InstWord01 = getBinaryCodeForInstr(MI, Fixups, STI);		uint64_t InstWord01 = getBinaryCodeForInstr(MI, Fixups, STI);
uint32_t InstWord2 = MI.getOperand(2).getImm(); // Offset		uint32_t InstWord2 = MI.getOperand(2).getImm(); // Offset
if (!(STI.getFeatureBits()[AMDGPU::FeatureCaymanISA])) {		if (!(STI.getFeatureBits()[R600::FeatureCaymanISA])) {
InstWord2 |= 1 << 19; // Mega-Fetch bit		InstWord2 |= 1 << 19; // Mega-Fetch bit
}		}

Emit(InstWord01, OS);		Emit(InstWord01, OS);
Emit(InstWord2, OS);		Emit(InstWord2, OS);
Emit((uint32_t) 0, OS);		Emit((uint32_t) 0, OS);
} else if (IS_TEX(Desc)) {		} else if (IS_TEX(Desc)) {
int64_t Sampler = MI.getOperand(14).getImm();		int64_t Sampler = MI.getOperand(14).getImm();
[16 lines not shown]	Emit((uint32_t) 0, OS);
SrcSelect[ELEMENT_W] << 29 | Offsets[0] << 0 | Offsets[1] << 5 |		SrcSelect[ELEMENT_W] << 29 | Offsets[0] << 0 | Offsets[1] << 5 |
Offsets[2] << 10;		Offsets[2] << 10;

Emit(Word01, OS);		Emit(Word01, OS);
Emit(Word2, OS);		Emit(Word2, OS);
Emit((uint32_t) 0, OS);		Emit((uint32_t) 0, OS);
} else {		} else {
uint64_t Inst = getBinaryCodeForInstr(MI, Fixups, STI);		uint64_t Inst = getBinaryCodeForInstr(MI, Fixups, STI);
if ((STI.getFeatureBits()[AMDGPU::FeatureR600ALUInst]) &&		if ((STI.getFeatureBits()[R600::FeatureR600ALUInst]) &&
((Desc.TSFlags & R600_InstFlag::OP1) ||		((Desc.TSFlags & R600_InstFlag::OP1) ||
Desc.TSFlags & R600_InstFlag::OP2)) {		Desc.TSFlags & R600_InstFlag::OP2)) {
uint64_t ISAOpCode = Inst & (0x3FFULL << 39);		uint64_t ISAOpCode = Inst & (0x3FFULL << 39);
Inst &= ~(0x3FFULL << 39);		Inst &= ~(0x3FFULL << 39);
Inst |= ISAOpCode << 1;		Inst |= ISAOpCode << 1;
}		}
Emit(Inst, OS);		Emit(Inst, OS);
}		}
[33 lines not shown]	if (MO.isExpr()) {
return 0;		return 0;
}		}

assert(MO.isImm());		assert(MO.isImm());
return MO.getImm();		return MO.getImm();
}		}

#define ENABLE_INSTR_PREDICATE_VERIFIER		#define ENABLE_INSTR_PREDICATE_VERIFIER
#include "AMDGPUGenMCCodeEmitter.inc"		#include "R600GenMCCodeEmitter.inc"

lib/Target/AMDGPU/MCTargetDesc/R600MCTargetDesc.cpp

This file was added.

				//===-- R600MCTargetDesc.cpp - R600 Target Descriptions -------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				/// \file
				/// \brief This file provides R600 specific target descriptions.
				//
				//===----------------------------------------------------------------------===//

				#include "AMDGPUMCTargetDesc.h"
				#include "llvm/MC/MCInstrInfo.h"

				using namespace llvm;

				#define GET_INSTRINFO_MC_DESC
				#include "R600GenInstrInfo.inc"

				MCInstrInfo *llvm::createR600MCInstrInfo() {
				MCInstrInfo *X = new MCInstrInfo();
				InitR600MCInstrInfo(X);
				return X;
				}

lib/Target/AMDGPU/MCTargetDesc/SIMCCodeEmitter.cpp

[432 lines not shown]	if (Enc != ~0U && (Enc != 255 || Desc.getSize() == 4))
return Enc;		return Enc;

} else if (MO.isImm())		} else if (MO.isImm())
return MO.getImm();		return MO.getImm();

llvm_unreachable("Encoding of this operand type is not supported yet.");		llvm_unreachable("Encoding of this operand type is not supported yet.");
return 0;		return 0;
}		}

		#define ENABLE_INSTR_PREDICATE_VERIFIER
		#include "AMDGPUGenMCCodeEmitter.inc"

lib/Target/AMDGPU/R600.td

This file was added.

				//===-- R600.td - R600 Tablegen files ----------------------- tablegen --===//
				//
				[Inline comment, marked Done] arsenm: Missing header comment
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//

				include "llvm/Target/Target.td"

				def R600InstrInfo : InstrInfo {
				let guessInstructionProperties = 1;
				let noNamedPositionallyEncodedOperands = 1;
				}

				def R600 : Target {
				let InstructionSet = R600InstrInfo;
				let AllowRegisterRenaming = 1;
				}

				let Namespace = "R600" in {

				foreach Index = 0-15 in {
				def sub#Index : SubRegIndex<32, !shl(Index, 5)>;
				}

				include "R600RegisterInfo.td"

				}

				def NullALU : InstrItinClass;
				def ALU_NULL : FuncUnit;

				include "AMDGPUFeatures.td"
				include "R600Schedule.td"
				include "R600Processors.td"
				include "AMDGPUInstrInfo.td"
				include "AMDGPUInstructions.td"
				include "R600Instructions.td"
				include "R700Instructions.td"
				include "EvergreenInstructions.td"
				include "CaymanInstructions.td"

				// Calling convention for R600
				def CC_R600 : CallingConv<[
				CCIfInReg<CCIfType<[v4f32, v4i32] , CCAssignToReg<[
				T0_XYZW, T1_XYZW, T2_XYZW, T3_XYZW, T4_XYZW, T5_XYZW, T6_XYZW, T7_XYZW,
				T8_XYZW, T9_XYZW, T10_XYZW, T11_XYZW, T12_XYZW, T13_XYZW, T14_XYZW, T15_XYZW,
				T16_XYZW, T17_XYZW, T18_XYZW, T19_XYZW, T20_XYZW, T21_XYZW, T22_XYZW,
				T23_XYZW, T24_XYZW, T25_XYZW, T26_XYZW, T27_XYZW, T28_XYZW, T29_XYZW,
				T30_XYZW, T31_XYZW, T32_XYZW
				]>>>
				]>;

				// Calling convention for compute kernels
				def CC_R600_Kernel : CallingConv<[
				CCCustom<"allocateKernArg">
				]>;

lib/Target/AMDGPU/R600AsmPrinter.cpp

[45 lines not shown]	void R600AsmPrinter::EmitProgramInfoR600(const MachineFunction &MF) {
unsigned MaxGPR = 0;		unsigned MaxGPR = 0;
bool killPixel = false;		bool killPixel = false;
const R600Subtarget &STM = MF.getSubtarget<R600Subtarget>();		const R600Subtarget &STM = MF.getSubtarget<R600Subtarget>();
const R600RegisterInfo *RI = STM.getRegisterInfo();		const R600RegisterInfo *RI = STM.getRegisterInfo();
const R600MachineFunctionInfo *MFI = MF.getInfo<R600MachineFunctionInfo>();		const R600MachineFunctionInfo *MFI = MF.getInfo<R600MachineFunctionInfo>();

for (const MachineBasicBlock &MBB : MF) {		for (const MachineBasicBlock &MBB : MF) {
for (const MachineInstr &MI : MBB) {		for (const MachineInstr &MI : MBB) {
if (MI.getOpcode() == AMDGPU::KILLGT)		if (MI.getOpcode() == R600::KILLGT)
killPixel = true;		killPixel = true;
unsigned numOperands = MI.getNumOperands();		unsigned numOperands = MI.getNumOperands();
for (unsigned op_idx = 0; op_idx < numOperands; op_idx++) {		for (unsigned op_idx = 0; op_idx < numOperands; op_idx++) {
const MachineOperand &MO = MI.getOperand(op_idx);		const MachineOperand &MO = MI.getOperand(op_idx);
if (!MO.isReg())		if (!MO.isReg())
continue;		continue;
unsigned HWReg = RI->getHWRegIndex(MO.getReg());		unsigned HWReg = RI->getHWRegIndex(MO.getReg());

[71 lines not shown]

lib/Target/AMDGPU/R600ClauseMergePass.cpp

	[28 lines not shown]
	using namespace llvm;			using namespace llvm;

	#define DEBUG_TYPE "r600mergeclause"			#define DEBUG_TYPE "r600mergeclause"

	namespace {			namespace {

	static bool isCFAlu(const MachineInstr &MI) {			static bool isCFAlu(const MachineInstr &MI) {
	switch (MI.getOpcode()) {			switch (MI.getOpcode()) {
	case AMDGPU::CF_ALU:			case R600::CF_ALU:
	case AMDGPU::CF_ALU_PUSH_BEFORE:			case R600::CF_ALU_PUSH_BEFORE:
	return true;			return true;
	default:			default:
	return false;			return false;
	}			}
	}			}

	class R600ClauseMergePass : public MachineFunctionPass {			class R600ClauseMergePass : public MachineFunctionPass {

	[33 lines not shown]

	char R600ClauseMergePass::ID = 0;			char R600ClauseMergePass::ID = 0;

	char &llvm::R600ClauseMergePassID = R600ClauseMergePass::ID;			char &llvm::R600ClauseMergePassID = R600ClauseMergePass::ID;

	unsigned R600ClauseMergePass::getCFAluSize(const MachineInstr &MI) const {			unsigned R600ClauseMergePass::getCFAluSize(const MachineInstr &MI) const {
	assert(isCFAlu(MI));			assert(isCFAlu(MI));
	return MI			return MI
	.getOperand(TII->getOperandIdx(MI.getOpcode(), AMDGPU::OpName::COUNT))			.getOperand(TII->getOperandIdx(MI.getOpcode(), R600::OpName::COUNT))
	.getImm();			.getImm();
	}			}

	bool R600ClauseMergePass::isCFAluEnabled(const MachineInstr &MI) const {			bool R600ClauseMergePass::isCFAluEnabled(const MachineInstr &MI) const {
	assert(isCFAlu(MI));			assert(isCFAlu(MI));
	return MI			return MI
	.getOperand(TII->getOperandIdx(MI.getOpcode(), AMDGPU::OpName::Enabled))			.getOperand(TII->getOperandIdx(MI.getOpcode(), R600::OpName::Enabled))
	.getImm();			.getImm();
	}			}

	void R600ClauseMergePass::cleanPotentialDisabledCFAlu(			void R600ClauseMergePass::cleanPotentialDisabledCFAlu(
	MachineInstr &CFAlu) const {			MachineInstr &CFAlu) const {
	int CntIdx = TII->getOperandIdx(AMDGPU::CF_ALU, AMDGPU::OpName::COUNT);			int CntIdx = TII->getOperandIdx(R600::CF_ALU, R600::OpName::COUNT);
	MachineBasicBlock::iterator I = CFAlu, E = CFAlu.getParent()->end();			MachineBasicBlock::iterator I = CFAlu, E = CFAlu.getParent()->end();
	I++;			I++;
	do {			do {
	while (I != E && !isCFAlu(*I))			while (I != E && !isCFAlu(*I))
	I++;			I++;
	if (I == E)			if (I == E)
	return;			return;
	MachineInstr &MI = *I++;			MachineInstr &MI = *I++;
	if (isCFAluEnabled(MI))			if (isCFAluEnabled(MI))
	break;			break;
	CFAlu.getOperand(CntIdx).setImm(getCFAluSize(CFAlu) + getCFAluSize(MI));			CFAlu.getOperand(CntIdx).setImm(getCFAluSize(CFAlu) + getCFAluSize(MI));
	MI.eraseFromParent();			MI.eraseFromParent();
	} while (I != E);			} while (I != E);
	}			}

	bool R600ClauseMergePass::mergeIfPossible(MachineInstr &RootCFAlu,			bool R600ClauseMergePass::mergeIfPossible(MachineInstr &RootCFAlu,
	const MachineInstr &LatrCFAlu) const {			const MachineInstr &LatrCFAlu) const {
	assert(isCFAlu(RootCFAlu) && isCFAlu(LatrCFAlu));			assert(isCFAlu(RootCFAlu) && isCFAlu(LatrCFAlu));
	int CntIdx = TII->getOperandIdx(AMDGPU::CF_ALU, AMDGPU::OpName::COUNT);			int CntIdx = TII->getOperandIdx(R600::CF_ALU, R600::OpName::COUNT);
	unsigned RootInstCount = getCFAluSize(RootCFAlu),			unsigned RootInstCount = getCFAluSize(RootCFAlu),
	LaterInstCount = getCFAluSize(LatrCFAlu);			LaterInstCount = getCFAluSize(LatrCFAlu);
	unsigned CumuledInsts = RootInstCount + LaterInstCount;			unsigned CumuledInsts = RootInstCount + LaterInstCount;
	if (CumuledInsts >= TII->getMaxAlusPerClause()) {			if (CumuledInsts >= TII->getMaxAlusPerClause()) {
	LLVM_DEBUG(dbgs() << "Excess inst counts\n");			LLVM_DEBUG(dbgs() << "Excess inst counts\n");
	return false;			return false;
	}			}
	if (RootCFAlu.getOpcode() == AMDGPU::CF_ALU_PUSH_BEFORE)			if (RootCFAlu.getOpcode() == R600::CF_ALU_PUSH_BEFORE)
	return false;			return false;
	// Is KCache Bank 0 compatible ?			// Is KCache Bank 0 compatible ?
	int Mode0Idx =			int Mode0Idx =
	TII->getOperandIdx(AMDGPU::CF_ALU, AMDGPU::OpName::KCACHE_MODE0);			TII->getOperandIdx(R600::CF_ALU, R600::OpName::KCACHE_MODE0);
	int KBank0Idx =			int KBank0Idx =
	TII->getOperandIdx(AMDGPU::CF_ALU, AMDGPU::OpName::KCACHE_BANK0);			TII->getOperandIdx(R600::CF_ALU, R600::OpName::KCACHE_BANK0);
	int KBank0LineIdx =			int KBank0LineIdx =
	TII->getOperandIdx(AMDGPU::CF_ALU, AMDGPU::OpName::KCACHE_ADDR0);			TII->getOperandIdx(R600::CF_ALU, R600::OpName::KCACHE_ADDR0);
	if (LatrCFAlu.getOperand(Mode0Idx).getImm() &&			if (LatrCFAlu.getOperand(Mode0Idx).getImm() &&
	RootCFAlu.getOperand(Mode0Idx).getImm() &&			RootCFAlu.getOperand(Mode0Idx).getImm() &&
	(LatrCFAlu.getOperand(KBank0Idx).getImm() !=			(LatrCFAlu.getOperand(KBank0Idx).getImm() !=
	RootCFAlu.getOperand(KBank0Idx).getImm() ||			RootCFAlu.getOperand(KBank0Idx).getImm() ||
	LatrCFAlu.getOperand(KBank0LineIdx).getImm() !=			LatrCFAlu.getOperand(KBank0LineIdx).getImm() !=
	RootCFAlu.getOperand(KBank0LineIdx).getImm())) {			RootCFAlu.getOperand(KBank0LineIdx).getImm())) {
	LLVM_DEBUG(dbgs() << "Wrong KC0\n");			LLVM_DEBUG(dbgs() << "Wrong KC0\n");
	return false;			return false;
	}			}
	// Is KCache Bank 1 compatible ?			// Is KCache Bank 1 compatible ?
	int Mode1Idx =			int Mode1Idx =
	TII->getOperandIdx(AMDGPU::CF_ALU, AMDGPU::OpName::KCACHE_MODE1);			TII->getOperandIdx(R600::CF_ALU, R600::OpName::KCACHE_MODE1);
	int KBank1Idx =			int KBank1Idx =
	TII->getOperandIdx(AMDGPU::CF_ALU, AMDGPU::OpName::KCACHE_BANK1);			TII->getOperandIdx(R600::CF_ALU, R600::OpName::KCACHE_BANK1);
	int KBank1LineIdx =			int KBank1LineIdx =
	TII->getOperandIdx(AMDGPU::CF_ALU, AMDGPU::OpName::KCACHE_ADDR1);			TII->getOperandIdx(R600::CF_ALU, R600::OpName::KCACHE_ADDR1);
	if (LatrCFAlu.getOperand(Mode1Idx).getImm() &&			if (LatrCFAlu.getOperand(Mode1Idx).getImm() &&
	RootCFAlu.getOperand(Mode1Idx).getImm() &&			RootCFAlu.getOperand(Mode1Idx).getImm() &&
	(LatrCFAlu.getOperand(KBank1Idx).getImm() !=			(LatrCFAlu.getOperand(KBank1Idx).getImm() !=
	RootCFAlu.getOperand(KBank1Idx).getImm() ||			RootCFAlu.getOperand(KBank1Idx).getImm() ||
	LatrCFAlu.getOperand(KBank1LineIdx).getImm() !=			LatrCFAlu.getOperand(KBank1LineIdx).getImm() !=
	RootCFAlu.getOperand(KBank1LineIdx).getImm())) {			RootCFAlu.getOperand(KBank1LineIdx).getImm())) {
	LLVM_DEBUG(dbgs() << "Wrong KC0\n");			LLVM_DEBUG(dbgs() << "Wrong KC0\n");
	return false;			return false;
	[61 lines not shown]
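The two KCache checks above follow the same rule: a bank slot is a conflict only when both CF_ALU instructions use it (nonzero mode) and they disagree on the bank or the address. A minimal standalone sketch of that rule follows. The `KCacheBank` struct and `isBankCompatible` are hypothetical names introduced for illustration and are not part of the patch; the real code reads these values from the instruction operands via `getOperandIdx`.

```cpp
#include <cstdint>

// Hypothetical stand-in for the KCACHE_MODE / KCACHE_BANK / KCACHE_ADDR
// immediates of one bank slot on a CF_ALU instruction.
struct KCacheBank {
  int64_t Mode; // nonzero is treated as "slot in use", mirroring getImm() &&
  int64_t Bank;
  int64_t Addr;
};

// Two clauses may share a slot unless both use it and they disagree on
// which bank or which address is selected.
bool isBankCompatible(const KCacheBank &Root, const KCacheBank &Latr) {
  const bool BothInUse = Root.Mode != 0 && Latr.Mode != 0;
  const bool SameLine = Root.Bank == Latr.Bank && Root.Addr == Latr.Addr;
  return !BothInUse || SameLine;
}
```

Under this sketch, a merge would be allowed only if the predicate holds for both bank 0 and bank 1.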

lib/Target/AMDGPU/R600ControlFlowFinalizer.cpp

[88 lines not shown]	for (std::vector<CFStack::StackItem>::const_iterator I = BranchStack.begin(),
E = BranchStack.end(); I != E; ++I) {		E = BranchStack.end(); I != E; ++I) {
if (*I == Item)		if (*I == Item)
return true;		return true;
}		}
return false;		return false;
}		}

bool CFStack::requiresWorkAroundForInst(unsigned Opcode) {		bool CFStack::requiresWorkAroundForInst(unsigned Opcode) {
if (Opcode == AMDGPU::CF_ALU_PUSH_BEFORE && ST->hasCaymanISA() &&		if (Opcode == R600::CF_ALU_PUSH_BEFORE && ST->hasCaymanISA() &&
getLoopDepth() > 1)		getLoopDepth() > 1)
return true;		return true;

if (!ST->hasCFAluBug())		if (!ST->hasCFAluBug())
return false;		return false;

switch(Opcode) {		switch(Opcode) {
default: return false;		default: return false;
case AMDGPU::CF_ALU_PUSH_BEFORE:		case R600::CF_ALU_PUSH_BEFORE:
case AMDGPU::CF_ALU_ELSE_AFTER:		case R600::CF_ALU_ELSE_AFTER:
case AMDGPU::CF_ALU_BREAK:		case R600::CF_ALU_BREAK:
case AMDGPU::CF_ALU_CONTINUE:		case R600::CF_ALU_CONTINUE:
if (CurrentSubEntries == 0)		if (CurrentSubEntries == 0)
return false;		return false;
if (ST->getWavefrontSize() == 64) {		if (ST->getWavefrontSize() == 64) {
// We are being conservative here. We only require this work-around if		// We are being conservative here. We only require this work-around if
// CurrentSubEntries > 3 &&		// CurrentSubEntries > 3 &&
// (CurrentSubEntries % 4 == 3 || CurrentSubEntries % 4 == 0)		// (CurrentSubEntries % 4 == 3 || CurrentSubEntries % 4 == 0)
//		//
// We have to be conservative, because we don't know for certain that		// We have to be conservative, because we don't know for certain that
[45 lines not shown]	void CFStack::updateMaxStackSize() {
unsigned CurrentStackSize =		unsigned CurrentStackSize =
CurrentEntries + (alignTo(CurrentSubEntries, 4) / 4);		CurrentEntries + (alignTo(CurrentSubEntries, 4) / 4);
MaxStackSize = std::max(CurrentStackSize, MaxStackSize);		MaxStackSize = std::max(CurrentStackSize, MaxStackSize);
}		}

void CFStack::pushBranch(unsigned Opcode, bool isWQM) {		void CFStack::pushBranch(unsigned Opcode, bool isWQM) {
CFStack::StackItem Item = CFStack::ENTRY;		CFStack::StackItem Item = CFStack::ENTRY;
switch(Opcode) {		switch(Opcode) {
case AMDGPU::CF_PUSH_EG:		case R600::CF_PUSH_EG:
case AMDGPU::CF_ALU_PUSH_BEFORE:		case R600::CF_ALU_PUSH_BEFORE:
if (!isWQM) {		if (!isWQM) {
if (!ST->hasCaymanISA() &&		if (!ST->hasCaymanISA() &&
!branchStackContains(CFStack::FIRST_NON_WQM_PUSH))		!branchStackContains(CFStack::FIRST_NON_WQM_PUSH))
Item = CFStack::FIRST_NON_WQM_PUSH; // May not be required on Evergreen/NI		Item = CFStack::FIRST_NON_WQM_PUSH; // May not be required on Evergreen/NI
// See comment in		// See comment in
// CFStack::getSubEntrySize()		// CFStack::getSubEntrySize()
else if (CurrentEntries > 0 &&		else if (CurrentEntries > 0 &&
ST->getGeneration() > R600Subtarget::EVERGREEN &&		ST->getGeneration() > R600Subtarget::EVERGREEN &&
[54 lines not shown]	private:

const R600InstrInfo *TII = nullptr;		const R600InstrInfo *TII = nullptr;
const R600RegisterInfo *TRI = nullptr;		const R600RegisterInfo *TRI = nullptr;
unsigned MaxFetchInst;		unsigned MaxFetchInst;
const R600Subtarget *ST = nullptr;		const R600Subtarget *ST = nullptr;

bool IsTrivialInst(MachineInstr &MI) const {		bool IsTrivialInst(MachineInstr &MI) const {
switch (MI.getOpcode()) {		switch (MI.getOpcode()) {
case AMDGPU::KILL:		case R600::KILL:
case AMDGPU::RETURN:		case R600::RETURN:
return true;		return true;
default:		default:
return false;		return false;
}		}
}		}

const MCInstrDesc &getHWInstrDesc(ControlFlowInstruction CFI) const {		const MCInstrDesc &getHWInstrDesc(ControlFlowInstruction CFI) const {
unsigned Opcode = 0;		unsigned Opcode = 0;
bool isEg = (ST->getGeneration() >= R600Subtarget::EVERGREEN);		bool isEg = (ST->getGeneration() >= R600Subtarget::EVERGREEN);
switch (CFI) {		switch (CFI) {
case CF_TC:		case CF_TC:
Opcode = isEg ? AMDGPU::CF_TC_EG : AMDGPU::CF_TC_R600;		Opcode = isEg ? R600::CF_TC_EG : R600::CF_TC_R600;
break;		break;
case CF_VC:		case CF_VC:
Opcode = isEg ? AMDGPU::CF_VC_EG : AMDGPU::CF_VC_R600;		Opcode = isEg ? R600::CF_VC_EG : R600::CF_VC_R600;
break;		break;
case CF_CALL_FS:		case CF_CALL_FS:
Opcode = isEg ? AMDGPU::CF_CALL_FS_EG : AMDGPU::CF_CALL_FS_R600;		Opcode = isEg ? R600::CF_CALL_FS_EG : R600::CF_CALL_FS_R600;
break;		break;
case CF_WHILE_LOOP:		case CF_WHILE_LOOP:
Opcode = isEg ? AMDGPU::WHILE_LOOP_EG : AMDGPU::WHILE_LOOP_R600;		Opcode = isEg ? R600::WHILE_LOOP_EG : R600::WHILE_LOOP_R600;
break;		break;
case CF_END_LOOP:		case CF_END_LOOP:
Opcode = isEg ? AMDGPU::END_LOOP_EG : AMDGPU::END_LOOP_R600;		Opcode = isEg ? R600::END_LOOP_EG : R600::END_LOOP_R600;
break;		break;
case CF_LOOP_BREAK:		case CF_LOOP_BREAK:
Opcode = isEg ? AMDGPU::LOOP_BREAK_EG : AMDGPU::LOOP_BREAK_R600;		Opcode = isEg ? R600::LOOP_BREAK_EG : R600::LOOP_BREAK_R600;
break;		break;
case CF_LOOP_CONTINUE:		case CF_LOOP_CONTINUE:
Opcode = isEg ? AMDGPU::CF_CONTINUE_EG : AMDGPU::CF_CONTINUE_R600;		Opcode = isEg ? R600::CF_CONTINUE_EG : R600::CF_CONTINUE_R600;
break;		break;
case CF_JUMP:		case CF_JUMP:
Opcode = isEg ? AMDGPU::CF_JUMP_EG : AMDGPU::CF_JUMP_R600;		Opcode = isEg ? R600::CF_JUMP_EG : R600::CF_JUMP_R600;
break;		break;
case CF_ELSE:		case CF_ELSE:
Opcode = isEg ? AMDGPU::CF_ELSE_EG : AMDGPU::CF_ELSE_R600;		Opcode = isEg ? R600::CF_ELSE_EG : R600::CF_ELSE_R600;
break;		break;
case CF_POP:		case CF_POP:
Opcode = isEg ? AMDGPU::POP_EG : AMDGPU::POP_R600;		Opcode = isEg ? R600::POP_EG : R600::POP_R600;
break;		break;
case CF_END:		case CF_END:
if (ST->hasCaymanISA()) {		if (ST->hasCaymanISA()) {
Opcode = AMDGPU::CF_END_CM;		Opcode = R600::CF_END_CM;
break;		break;
}		}
Opcode = isEg ? AMDGPU::CF_END_EG : AMDGPU::CF_END_R600;		Opcode = isEg ? R600::CF_END_EG : R600::CF_END_R600;
break;		break;
}		}
assert (Opcode && "No opcode selected");		assert (Opcode && "No opcode selected");
return TII->get(Opcode);		return TII->get(Opcode);
}		}

bool isCompatibleWithClause(const MachineInstr &MI,		bool isCompatibleWithClause(const MachineInstr &MI,
std::set<unsigned> &DstRegs) const {		std::set<unsigned> &DstRegs) const {
unsigned DstMI, SrcMI;		unsigned DstMI, SrcMI;
for (MachineInstr::const_mop_iterator I = MI.operands_begin(),		for (MachineInstr::const_mop_iterator I = MI.operands_begin(),
E = MI.operands_end();		E = MI.operands_end();
I != E; ++I) {		I != E; ++I) {
const MachineOperand &MO = *I;		const MachineOperand &MO = *I;
if (!MO.isReg())		if (!MO.isReg())
continue;		continue;
if (MO.isDef()) {		if (MO.isDef()) {
unsigned Reg = MO.getReg();		unsigned Reg = MO.getReg();
if (AMDGPU::R600_Reg128RegClass.contains(Reg))		if (R600::R600_Reg128RegClass.contains(Reg))
DstMI = Reg;		DstMI = Reg;
else		else
DstMI = TRI->getMatchingSuperReg(Reg,		DstMI = TRI->getMatchingSuperReg(Reg,
AMDGPURegisterInfo::getSubRegFromChannel(TRI->getHWRegChan(Reg)),		AMDGPURegisterInfo::getSubRegFromChannel(TRI->getHWRegChan(Reg)),
&AMDGPU::R600_Reg128RegClass);		&R600::R600_Reg128RegClass);
}		}
if (MO.isUse()) {		if (MO.isUse()) {
unsigned Reg = MO.getReg();		unsigned Reg = MO.getReg();
if (AMDGPU::R600_Reg128RegClass.contains(Reg))		if (R600::R600_Reg128RegClass.contains(Reg))
SrcMI = Reg;		SrcMI = Reg;
else		else
SrcMI = TRI->getMatchingSuperReg(Reg,		SrcMI = TRI->getMatchingSuperReg(Reg,
AMDGPURegisterInfo::getSubRegFromChannel(TRI->getHWRegChan(Reg)),		AMDGPURegisterInfo::getSubRegFromChannel(TRI->getHWRegChan(Reg)),
&AMDGPU::R600_Reg128RegClass);		&R600::R600_Reg128RegClass);
}		}
}		}
if ((DstRegs.find(SrcMI) == DstRegs.end())) {		if ((DstRegs.find(SrcMI) == DstRegs.end())) {
DstRegs.insert(DstMI);		DstRegs.insert(DstMI);
return true;		return true;
} else		} else
return false;		return false;
}		}
[23 lines not shown]	MachineInstr *MIb = BuildMI(MBB, ClauseHead, MBB.findDebugLoc(ClauseHead),
getHWInstrDesc(IsTex?CF_TC:CF_VC))		getHWInstrDesc(IsTex?CF_TC:CF_VC))
.addImm(0) // ADDR		.addImm(0) // ADDR
.addImm(AluInstCount - 1); // COUNT		.addImm(AluInstCount - 1); // COUNT
return ClauseFile(MIb, std::move(ClauseContent));		return ClauseFile(MIb, std::move(ClauseContent));
}		}

void getLiteral(MachineInstr &MI, std::vector<MachineOperand *> &Lits) const {		void getLiteral(MachineInstr &MI, std::vector<MachineOperand *> &Lits) const {
static const unsigned LiteralRegs[] = {		static const unsigned LiteralRegs[] = {
AMDGPU::ALU_LITERAL_X,		R600::ALU_LITERAL_X,
AMDGPU::ALU_LITERAL_Y,		R600::ALU_LITERAL_Y,
AMDGPU::ALU_LITERAL_Z,		R600::ALU_LITERAL_Z,
AMDGPU::ALU_LITERAL_W		R600::ALU_LITERAL_W
};		};
const SmallVector<std::pair<MachineOperand *, int64_t>, 3> Srcs =		const SmallVector<std::pair<MachineOperand *, int64_t>, 3> Srcs =
TII->getSrcs(MI);		TII->getSrcs(MI);
for (const auto &Src:Srcs) {		for (const auto &Src:Srcs) {
if (Src.first->getReg() != AMDGPU::ALU_LITERAL_X)		if (Src.first->getReg() != R600::ALU_LITERAL_X)
continue;		continue;
int64_t Imm = Src.second;		int64_t Imm = Src.second;
std::vector<MachineOperand *>::iterator It =		std::vector<MachineOperand *>::iterator It =
llvm::find_if(Lits, [&](MachineOperand *val) {		llvm::find_if(Lits, [&](MachineOperand *val) {
return val->isImm() && (val->getImm() == Imm);		return val->isImm() && (val->getImm() == Imm);
});		});

// Get corresponding Operand		// Get corresponding Operand
MachineOperand &Operand = MI.getOperand(		MachineOperand &Operand = MI.getOperand(
TII->getOperandIdx(MI.getOpcode(), AMDGPU::OpName::literal));		TII->getOperandIdx(MI.getOpcode(), R600::OpName::literal));

if (It != Lits.end()) {		if (It != Lits.end()) {
// Reuse existing literal reg		// Reuse existing literal reg
unsigned Index = It - Lits.begin();		unsigned Index = It - Lits.begin();
Src.first->setReg(LiteralRegs[Index]);		Src.first->setReg(LiteralRegs[Index]);
} else {		} else {
// Allocate new literal reg		// Allocate new literal reg
assert(Lits.size() < 4 && "Too many literals in Instruction Group");		assert(Lits.size() < 4 && "Too many literals in Instruction Group");
Src.first->setReg(LiteralRegs[Lits.size()]);		Src.first->setReg(LiteralRegs[Lits.size()]);
Lits.push_back(&Operand);		Lits.push_back(&Operand);
}		}
}		}
}		}

MachineBasicBlock::iterator insertLiterals(		MachineBasicBlock::iterator insertLiterals(
MachineBasicBlock::iterator InsertPos,		MachineBasicBlock::iterator InsertPos,
const std::vector<unsigned> &Literals) const {		const std::vector<unsigned> &Literals) const {
MachineBasicBlock *MBB = InsertPos->getParent();		MachineBasicBlock *MBB = InsertPos->getParent();
for (unsigned i = 0, e = Literals.size(); i < e; i+=2) {		for (unsigned i = 0, e = Literals.size(); i < e; i+=2) {
unsigned LiteralPair0 = Literals[i];		unsigned LiteralPair0 = Literals[i];
unsigned LiteralPair1 = (i + 1 < e)?Literals[i + 1]:0;		unsigned LiteralPair1 = (i + 1 < e)?Literals[i + 1]:0;
InsertPos = BuildMI(MBB, InsertPos->getDebugLoc(),		InsertPos = BuildMI(MBB, InsertPos->getDebugLoc(),
TII->get(AMDGPU::LITERALS))		TII->get(R600::LITERALS))
.addImm(LiteralPair0)		.addImm(LiteralPair0)
.addImm(LiteralPair1);		.addImm(LiteralPair1);
}		}
return InsertPos;		return InsertPos;
}		}

ClauseFile		ClauseFile
MakeALUClause(MachineBasicBlock &MBB, MachineBasicBlock::iterator &I)		MakeALUClause(MachineBasicBlock &MBB, MachineBasicBlock::iterator &I)
[25 lines not shown]	for (MachineBasicBlock::instr_iterator E = MBB.instr_end(); I != E;) {
DeleteMI.eraseFromParent();		DeleteMI.eraseFromParent();
} else {		} else {
getLiteral(*I, Literals);		getLiteral(*I, Literals);
ClauseContent.push_back(&*I);		ClauseContent.push_back(&*I);
I++;		I++;
}		}
for (unsigned i = 0, e = Literals.size(); i < e; i += 2) {		for (unsigned i = 0, e = Literals.size(); i < e; i += 2) {
MachineInstrBuilder MILit = BuildMI(MBB, I, I->getDebugLoc(),		MachineInstrBuilder MILit = BuildMI(MBB, I, I->getDebugLoc(),
TII->get(AMDGPU::LITERALS));		TII->get(R600::LITERALS));
if (Literals[i]->isImm()) {		if (Literals[i]->isImm()) {
MILit.addImm(Literals[i]->getImm());		MILit.addImm(Literals[i]->getImm());
} else {		} else {
MILit.addGlobalAddress(Literals[i]->getGlobal(),		MILit.addGlobalAddress(Literals[i]->getGlobal(),
Literals[i]->getOffset());		Literals[i]->getOffset());
}		}
if (i + 1 < e) {		if (i + 1 < e) {
if (Literals[i + 1]->isImm()) {		if (Literals[i + 1]->isImm()) {
[12 lines not shown]	MakeALUClause(MachineBasicBlock &MBB, MachineBasicBlock::iterator &I)
return ClauseFile(&ClauseHead, std::move(ClauseContent));		return ClauseFile(&ClauseHead, std::move(ClauseContent));
}		}

void EmitFetchClause(MachineBasicBlock::iterator InsertPos,		void EmitFetchClause(MachineBasicBlock::iterator InsertPos,
const DebugLoc &DL, ClauseFile &Clause,		const DebugLoc &DL, ClauseFile &Clause,
unsigned &CfCount) {		unsigned &CfCount) {
CounterPropagateAddr(*Clause.first, CfCount);		CounterPropagateAddr(*Clause.first, CfCount);
MachineBasicBlock *BB = Clause.first->getParent();		MachineBasicBlock *BB = Clause.first->getParent();
BuildMI(BB, DL, TII->get(AMDGPU::FETCH_CLAUSE)).addImm(CfCount);		BuildMI(BB, DL, TII->get(R600::FETCH_CLAUSE)).addImm(CfCount);
for (unsigned i = 0, e = Clause.second.size(); i < e; ++i) {		for (unsigned i = 0, e = Clause.second.size(); i < e; ++i) {
BB->splice(InsertPos, BB, Clause.second[i]);		BB->splice(InsertPos, BB, Clause.second[i]);
}		}
CfCount += 2 * Clause.second.size();		CfCount += 2 * Clause.second.size();
}		}

void EmitALUClause(MachineBasicBlock::iterator InsertPos, const DebugLoc &DL,		void EmitALUClause(MachineBasicBlock::iterator InsertPos, const DebugLoc &DL,
ClauseFile &Clause, unsigned &CfCount) {		ClauseFile &Clause, unsigned &CfCount) {
Clause.first->getOperand(0).setImm(0);		Clause.first->getOperand(0).setImm(0);
CounterPropagateAddr(*Clause.first, CfCount);		CounterPropagateAddr(*Clause.first, CfCount);
MachineBasicBlock *BB = Clause.first->getParent();		MachineBasicBlock *BB = Clause.first->getParent();
BuildMI(BB, DL, TII->get(AMDGPU::ALU_CLAUSE)).addImm(CfCount);		BuildMI(BB, DL, TII->get(R600::ALU_CLAUSE)).addImm(CfCount);
for (unsigned i = 0, e = Clause.second.size(); i < e; ++i) {		for (unsigned i = 0, e = Clause.second.size(); i < e; ++i) {
BB->splice(InsertPos, BB, Clause.second[i]);		BB->splice(InsertPos, BB, Clause.second[i]);
}		}
CfCount += Clause.second.size();		CfCount += Clause.second.size();
}		}

void CounterPropagateAddr(MachineInstr &MI, unsigned Addr) const {		void CounterPropagateAddr(MachineInstr &MI, unsigned Addr) const {
MI.getOperand(0).setImm(Addr + MI.getOperand(0).getImm());		MI.getOperand(0).setImm(Addr + MI.getOperand(0).getImm());
[40 lines not shown]	for (MachineFunction::iterator MB = MF.begin(), ME = MF.end(); MB != ME;
LLVM_DEBUG(dbgs() << CfCount << ":"; I->dump(););		LLVM_DEBUG(dbgs() << CfCount << ":"; I->dump(););
FetchClauses.push_back(MakeFetchClause(MBB, I));		FetchClauses.push_back(MakeFetchClause(MBB, I));
CfCount++;		CfCount++;
LastAlu.back() = nullptr;		LastAlu.back() = nullptr;
continue;		continue;
}		}

MachineBasicBlock::iterator MI = I;		MachineBasicBlock::iterator MI = I;
if (MI->getOpcode() != AMDGPU::ENDIF)		if (MI->getOpcode() != R600::ENDIF)
LastAlu.back() = nullptr;		LastAlu.back() = nullptr;
if (MI->getOpcode() == AMDGPU::CF_ALU)		if (MI->getOpcode() == R600::CF_ALU)
LastAlu.back() = &*MI;		LastAlu.back() = &*MI;
I++;		I++;
bool RequiresWorkAround =		bool RequiresWorkAround =
CFStack.requiresWorkAroundForInst(MI->getOpcode());		CFStack.requiresWorkAroundForInst(MI->getOpcode());
switch (MI->getOpcode()) {		switch (MI->getOpcode()) {
case AMDGPU::CF_ALU_PUSH_BEFORE:		case R600::CF_ALU_PUSH_BEFORE:
if (RequiresWorkAround) {		if (RequiresWorkAround) {
LLVM_DEBUG(dbgs()		LLVM_DEBUG(dbgs()
<< "Applying bug work-around for ALU_PUSH_BEFORE\n");		<< "Applying bug work-around for ALU_PUSH_BEFORE\n");
BuildMI(MBB, MI, MBB.findDebugLoc(MI), TII->get(AMDGPU::CF_PUSH_EG))		BuildMI(MBB, MI, MBB.findDebugLoc(MI), TII->get(R600::CF_PUSH_EG))
.addImm(CfCount + 1)		.addImm(CfCount + 1)
.addImm(1);		.addImm(1);
MI->setDesc(TII->get(AMDGPU::CF_ALU));		MI->setDesc(TII->get(R600::CF_ALU));
CfCount++;		CfCount++;
CFStack.pushBranch(AMDGPU::CF_PUSH_EG);		CFStack.pushBranch(R600::CF_PUSH_EG);
} else		} else
CFStack.pushBranch(AMDGPU::CF_ALU_PUSH_BEFORE);		CFStack.pushBranch(R600::CF_ALU_PUSH_BEFORE);
LLVM_FALLTHROUGH;		LLVM_FALLTHROUGH;
case AMDGPU::CF_ALU:		case R600::CF_ALU:
I = MI;		I = MI;
AluClauses.push_back(MakeALUClause(MBB, I));		AluClauses.push_back(MakeALUClause(MBB, I));
LLVM_DEBUG(dbgs() << CfCount << ":"; MI->dump(););		LLVM_DEBUG(dbgs() << CfCount << ":"; MI->dump(););
CfCount++;		CfCount++;
break;		break;
case AMDGPU::WHILELOOP: {		case R600::WHILELOOP: {
CFStack.pushLoop();		CFStack.pushLoop();
MachineInstr *MIb = BuildMI(MBB, MI, MBB.findDebugLoc(MI),		MachineInstr *MIb = BuildMI(MBB, MI, MBB.findDebugLoc(MI),
getHWInstrDesc(CF_WHILE_LOOP))		getHWInstrDesc(CF_WHILE_LOOP))
.addImm(1);		.addImm(1);
std::pair<unsigned, std::set<MachineInstr *>> Pair(CfCount,		std::pair<unsigned, std::set<MachineInstr *>> Pair(CfCount,
std::set<MachineInstr *>());		std::set<MachineInstr *>());
Pair.second.insert(MIb);		Pair.second.insert(MIb);
LoopStack.push_back(std::move(Pair));		LoopStack.push_back(std::move(Pair));
MI->eraseFromParent();		MI->eraseFromParent();
CfCount++;		CfCount++;
break;		break;
}		}
case AMDGPU::ENDLOOP: {		case R600::ENDLOOP: {
CFStack.popLoop();		CFStack.popLoop();
std::pair<unsigned, std::set<MachineInstr *>> Pair =		std::pair<unsigned, std::set<MachineInstr *>> Pair =
std::move(LoopStack.back());		std::move(LoopStack.back());
LoopStack.pop_back();		LoopStack.pop_back();
CounterPropagateAddr(Pair.second, CfCount);		CounterPropagateAddr(Pair.second, CfCount);
BuildMI(MBB, MI, MBB.findDebugLoc(MI), getHWInstrDesc(CF_END_LOOP))		BuildMI(MBB, MI, MBB.findDebugLoc(MI), getHWInstrDesc(CF_END_LOOP))
.addImm(Pair.first + 1);		.addImm(Pair.first + 1);
MI->eraseFromParent();		MI->eraseFromParent();
CfCount++;		CfCount++;
break;		break;
}		}
case AMDGPU::IF_PREDICATE_SET: {		case R600::IF_PREDICATE_SET: {
LastAlu.push_back(nullptr);		LastAlu.push_back(nullptr);
MachineInstr *MIb = BuildMI(MBB, MI, MBB.findDebugLoc(MI),		MachineInstr *MIb = BuildMI(MBB, MI, MBB.findDebugLoc(MI),
getHWInstrDesc(CF_JUMP))		getHWInstrDesc(CF_JUMP))
.addImm(0)		.addImm(0)
.addImm(0);		.addImm(0);
IfThenElseStack.push_back(MIb);		IfThenElseStack.push_back(MIb);
LLVM_DEBUG(dbgs() << CfCount << ":"; MIb->dump(););		LLVM_DEBUG(dbgs() << CfCount << ":"; MIb->dump(););
MI->eraseFromParent();		MI->eraseFromParent();
CfCount++;		CfCount++;
break;		break;
}		}
case AMDGPU::ELSE: {		case R600::ELSE: {
MachineInstr * JumpInst = IfThenElseStack.back();		MachineInstr * JumpInst = IfThenElseStack.back();
IfThenElseStack.pop_back();		IfThenElseStack.pop_back();
CounterPropagateAddr(*JumpInst, CfCount);		CounterPropagateAddr(*JumpInst, CfCount);
MachineInstr *MIb = BuildMI(MBB, MI, MBB.findDebugLoc(MI),		MachineInstr *MIb = BuildMI(MBB, MI, MBB.findDebugLoc(MI),
getHWInstrDesc(CF_ELSE))		getHWInstrDesc(CF_ELSE))
.addImm(0)		.addImm(0)
.addImm(0);		.addImm(0);
LLVM_DEBUG(dbgs() << CfCount << ":"; MIb->dump(););		LLVM_DEBUG(dbgs() << CfCount << ":"; MIb->dump(););
IfThenElseStack.push_back(MIb);		IfThenElseStack.push_back(MIb);
MI->eraseFromParent();		MI->eraseFromParent();
CfCount++;		CfCount++;
break;		break;
}		}
case AMDGPU::ENDIF: {		case R600::ENDIF: {
CFStack.popBranch();		CFStack.popBranch();
if (LastAlu.back()) {		if (LastAlu.back()) {
ToPopAfter.push_back(LastAlu.back());		ToPopAfter.push_back(LastAlu.back());
} else {		} else {
MachineInstr *MIb = BuildMI(MBB, MI, MBB.findDebugLoc(MI),		MachineInstr *MIb = BuildMI(MBB, MI, MBB.findDebugLoc(MI),
getHWInstrDesc(CF_POP))		getHWInstrDesc(CF_POP))
.addImm(CfCount + 1)		.addImm(CfCount + 1)
.addImm(1);		.addImm(1);
(void)MIb;		(void)MIb;
LLVM_DEBUG(dbgs() << CfCount << ":"; MIb->dump(););		LLVM_DEBUG(dbgs() << CfCount << ":"; MIb->dump(););
CfCount++;		CfCount++;
}		}

MachineInstr *IfOrElseInst = IfThenElseStack.back();		MachineInstr *IfOrElseInst = IfThenElseStack.back();
IfThenElseStack.pop_back();		IfThenElseStack.pop_back();
CounterPropagateAddr(*IfOrElseInst, CfCount);		CounterPropagateAddr(*IfOrElseInst, CfCount);
IfOrElseInst->getOperand(1).setImm(1);		IfOrElseInst->getOperand(1).setImm(1);
LastAlu.pop_back();		LastAlu.pop_back();
MI->eraseFromParent();		MI->eraseFromParent();
break;		break;
}		}
case AMDGPU::BREAK: {		case R600::BREAK: {
CfCount ++;		CfCount ++;
MachineInstr *MIb = BuildMI(MBB, MI, MBB.findDebugLoc(MI),		MachineInstr *MIb = BuildMI(MBB, MI, MBB.findDebugLoc(MI),
getHWInstrDesc(CF_LOOP_BREAK))		getHWInstrDesc(CF_LOOP_BREAK))
.addImm(0);		.addImm(0);
LoopStack.back().second.insert(MIb);		LoopStack.back().second.insert(MIb);
MI->eraseFromParent();		MI->eraseFromParent();
break;		break;
}		}
case AMDGPU::CONTINUE: {		case R600::CONTINUE: {
MachineInstr *MIb = BuildMI(MBB, MI, MBB.findDebugLoc(MI),		MachineInstr *MIb = BuildMI(MBB, MI, MBB.findDebugLoc(MI),
getHWInstrDesc(CF_LOOP_CONTINUE))		getHWInstrDesc(CF_LOOP_CONTINUE))
.addImm(0);		.addImm(0);
LoopStack.back().second.insert(MIb);		LoopStack.back().second.insert(MIb);
MI->eraseFromParent();		MI->eraseFromParent();
CfCount++;		CfCount++;
break;		break;
}		}
case AMDGPU::RETURN: {		case R600::RETURN: {
DebugLoc DL = MBB.findDebugLoc(MI);		DebugLoc DL = MBB.findDebugLoc(MI);
BuildMI(MBB, MI, DL, getHWInstrDesc(CF_END));		BuildMI(MBB, MI, DL, getHWInstrDesc(CF_END));
CfCount++;		CfCount++;
if (CfCount % 2) {		if (CfCount % 2) {
BuildMI(MBB, I, DL, TII->get(AMDGPU::PAD));		BuildMI(MBB, I, DL, TII->get(R600::PAD));
CfCount++;		CfCount++;
}		}
MI->eraseFromParent();		MI->eraseFromParent();
for (unsigned i = 0, e = FetchClauses.size(); i < e; i++)		for (unsigned i = 0, e = FetchClauses.size(); i < e; i++)
EmitFetchClause(I, DL, FetchClauses[i], CfCount);		EmitFetchClause(I, DL, FetchClauses[i], CfCount);
for (unsigned i = 0, e = AluClauses.size(); i < e; i++)		for (unsigned i = 0, e = AluClauses.size(); i < e; i++)
EmitALUClause(I, DL, AluClauses[i], CfCount);		EmitALUClause(I, DL, AluClauses[i], CfCount);
break;		break;
}		}
default:		default:
if (TII->isExport(MI->getOpcode())) {		if (TII->isExport(MI->getOpcode())) {
LLVM_DEBUG(dbgs() << CfCount << ":"; MI->dump(););		LLVM_DEBUG(dbgs() << CfCount << ":"; MI->dump(););
CfCount++;		CfCount++;
}		}
break;		break;
}		}
}		}
for (unsigned i = 0, e = ToPopAfter.size(); i < e; ++i) {		for (unsigned i = 0, e = ToPopAfter.size(); i < e; ++i) {
MachineInstr *Alu = ToPopAfter[i];		MachineInstr *Alu = ToPopAfter[i];
BuildMI(MBB, Alu, MBB.findDebugLoc((MachineBasicBlock::iterator)Alu),		BuildMI(MBB, Alu, MBB.findDebugLoc((MachineBasicBlock::iterator)Alu),
TII->get(AMDGPU::CF_ALU_POP_AFTER))		TII->get(R600::CF_ALU_POP_AFTER))
.addImm(Alu->getOperand(0).getImm())		.addImm(Alu->getOperand(0).getImm())
.addImm(Alu->getOperand(1).getImm())		.addImm(Alu->getOperand(1).getImm())
.addImm(Alu->getOperand(2).getImm())		.addImm(Alu->getOperand(2).getImm())
.addImm(Alu->getOperand(3).getImm())		.addImm(Alu->getOperand(3).getImm())
.addImm(Alu->getOperand(4).getImm())		.addImm(Alu->getOperand(4).getImm())
.addImm(Alu->getOperand(5).getImm())		.addImm(Alu->getOperand(5).getImm())
.addImm(Alu->getOperand(6).getImm())		.addImm(Alu->getOperand(6).getImm())
.addImm(Alu->getOperand(7).getImm())		.addImm(Alu->getOperand(7).getImm())
[28 lines not shown]
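The literal handling in getLiteral() above amounts to a small slot allocator: each distinct immediate in an instruction group takes the next of four slots (ALU_LITERAL_X through ALU_LITERAL_W), and a repeated immediate reuses the slot it already holds. A simplified standalone sketch follows, assuming plain integer immediates only. `assignLiteralSlot` is a hypothetical helper for illustration; the patch itself works on `MachineOperand` pointers and also handles global addresses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Returns the slot index (0..3, standing in for ALU_LITERAL_X..W) that
// holds Imm, appending Imm to Lits when it is not already present.
unsigned assignLiteralSlot(std::vector<int64_t> &Lits, int64_t Imm) {
  auto It = std::find(Lits.begin(), Lits.end(), Imm);
  if (It != Lits.end())
    return static_cast<unsigned>(It - Lits.begin()); // reuse existing slot
  assert(Lits.size() < 4 && "Too many literals in Instruction Group");
  Lits.push_back(Imm); // allocate the next free slot
  return static_cast<unsigned>(Lits.size() - 1);
}
```

Reusing slots this way keeps the number of entries in `Lits` at or below the number of distinct immediates, which matters because the assert in the original code caps a group at four.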

lib/Target/AMDGPU/R600EmitClauseMarkers.cpp

[46 lines not shown]

class R600EmitClauseMarkers : public MachineFunctionPass {		class R600EmitClauseMarkers : public MachineFunctionPass {
private:		private:
const R600InstrInfo *TII = nullptr;		const R600InstrInfo *TII = nullptr;
int Address = 0;		int Address = 0;

unsigned OccupiedDwords(MachineInstr &MI) const {		unsigned OccupiedDwords(MachineInstr &MI) const {
switch (MI.getOpcode()) {		switch (MI.getOpcode()) {
case AMDGPU::INTERP_PAIR_XY:		case R600::INTERP_PAIR_XY:
case AMDGPU::INTERP_PAIR_ZW:		case R600::INTERP_PAIR_ZW:
case AMDGPU::INTERP_VEC_LOAD:		case R600::INTERP_VEC_LOAD:
case AMDGPU::DOT_4:		case R600::DOT_4:
return 4;		return 4;
case AMDGPU::KILL:		case R600::KILL:
return 0;		return 0;
default:		default:
break;		break;
}		}

// These will be expanded to two ALU instructions in the		// These will be expanded to two ALU instructions in the
// ExpandSpecialInstructions pass.		// ExpandSpecialInstructions pass.
if (TII->isLDSRetInstr(MI.getOpcode()))		if (TII->isLDSRetInstr(MI.getOpcode()))
return 2;		return 2;

if (TII->isVector(MI) || TII->isCubeOp(MI.getOpcode()) ||		if (TII->isVector(MI) || TII->isCubeOp(MI.getOpcode()) ||
TII->isReductionOp(MI.getOpcode()))		TII->isReductionOp(MI.getOpcode()))
return 4;		return 4;

unsigned NumLiteral = 0;		unsigned NumLiteral = 0;
for (MachineInstr::mop_iterator It = MI.operands_begin(),		for (MachineInstr::mop_iterator It = MI.operands_begin(),
E = MI.operands_end();		E = MI.operands_end();
It != E; ++It) {		It != E; ++It) {
MachineOperand &MO = *It;		MachineOperand &MO = *It;
if (MO.isReg() && MO.getReg() == AMDGPU::ALU_LITERAL_X)		if (MO.isReg() && MO.getReg() == R600::ALU_LITERAL_X)
++NumLiteral;		++NumLiteral;
}		}
return 1 + NumLiteral;		return 1 + NumLiteral;
}		}

bool isALU(const MachineInstr &MI) const {		bool isALU(const MachineInstr &MI) const {
if (TII->isALUInstr(MI.getOpcode()))		if (TII->isALUInstr(MI.getOpcode()))
return true;		return true;
if (TII->isVector(MI) || TII->isCubeOp(MI.getOpcode()))		if (TII->isVector(MI) || TII->isCubeOp(MI.getOpcode()))
return true;		return true;
switch (MI.getOpcode()) {		switch (MI.getOpcode()) {
case AMDGPU::PRED_X:		case R600::PRED_X:
case AMDGPU::INTERP_PAIR_XY:		case R600::INTERP_PAIR_XY:
case AMDGPU::INTERP_PAIR_ZW:		case R600::INTERP_PAIR_ZW:
case AMDGPU::INTERP_VEC_LOAD:		case R600::INTERP_VEC_LOAD:
case AMDGPU::COPY:		case R600::COPY:
case AMDGPU::DOT_4:		case R600::DOT_4:
return true;		return true;
default:		default:
return false;		return false;
}		}
}		}

bool IsTrivialInst(MachineInstr &MI) const {		bool IsTrivialInst(MachineInstr &MI) const {
switch (MI.getOpcode()) {		switch (MI.getOpcode()) {
case AMDGPU::KILL:		case R600::KILL:
case AMDGPU::RETURN:		case R600::RETURN:
case AMDGPU::IMPLICIT_DEF:		case R600::IMPLICIT_DEF:
return true;		return true;
default:		default:
return false;		return false;
}		}
}		}

std::pair<unsigned, unsigned> getAccessedBankLine(unsigned Sel) const {		std::pair<unsigned, unsigned> getAccessedBankLine(unsigned Sel) const {
// Sel is (512 + (kc_bank << 12) + ConstIndex) << 2		// Sel is (512 + (kc_bank << 12) + ConstIndex) << 2
[10 lines not shown]	private:
}		}

bool		bool
SubstituteKCacheBank(MachineInstr &MI,		SubstituteKCacheBank(MachineInstr &MI,
std::vector<std::pair<unsigned, unsigned>> &CachedConsts,		std::vector<std::pair<unsigned, unsigned>> &CachedConsts,
bool UpdateInstr = true) const {		bool UpdateInstr = true) const {
std::vector<std::pair<unsigned, unsigned>> UsedKCache;		std::vector<std::pair<unsigned, unsigned>> UsedKCache;

if (!TII->isALUInstr(MI.getOpcode()) && MI.getOpcode() != AMDGPU::DOT_4)		if (!TII->isALUInstr(MI.getOpcode()) && MI.getOpcode() != R600::DOT_4)
return true;		return true;

const SmallVectorImpl<std::pair<MachineOperand *, int64_t>> &Consts =		const SmallVectorImpl<std::pair<MachineOperand *, int64_t>> &Consts =
TII->getSrcs(MI);		TII->getSrcs(MI);
assert(		assert(
(TII->isALUInstr(MI.getOpcode()) || MI.getOpcode() == AMDGPU::DOT_4) &&		(TII->isALUInstr(MI.getOpcode()) || MI.getOpcode() == R600::DOT_4) &&
"Can't assign Const");		"Can't assign Const");
for (unsigned i = 0, n = Consts.size(); i < n; ++i) {		for (unsigned i = 0, n = Consts.size(); i < n; ++i) {
if (Consts[i].first->getReg() != AMDGPU::ALU_CONST)		if (Consts[i].first->getReg() != R600::ALU_CONST)
continue;		continue;
unsigned Sel = Consts[i].second;		unsigned Sel = Consts[i].second;
unsigned Chan = Sel & 3, Index = ((Sel >> 2) - 512) & 31;		unsigned Chan = Sel & 3, Index = ((Sel >> 2) - 512) & 31;
unsigned KCacheIndex = Index * 4 + Chan;		unsigned KCacheIndex = Index * 4 + Chan;
const std::pair<unsigned, unsigned> &BankLine = getAccessedBankLine(Sel);		const std::pair<unsigned, unsigned> &BankLine = getAccessedBankLine(Sel);
if (CachedConsts.empty()) {		if (CachedConsts.empty()) {
CachedConsts.push_back(BankLine);		CachedConsts.push_back(BankLine);
UsedKCache.push_back(std::pair<unsigned, unsigned>(0, KCacheIndex));		UsedKCache.push_back(std::pair<unsigned, unsigned>(0, KCacheIndex));
[14 lines not shown]	for (unsigned i = 0, n = Consts.size(); i < n; ++i) {
}		}
return false;		return false;
}		}

if (!UpdateInstr)		if (!UpdateInstr)
return true;		return true;

for (unsigned i = 0, j = 0, n = Consts.size(); i < n; ++i) {		for (unsigned i = 0, j = 0, n = Consts.size(); i < n; ++i) {
if (Consts[i].first->getReg() != AMDGPU::ALU_CONST)		if (Consts[i].first->getReg() != R600::ALU_CONST)
continue;		continue;
switch(UsedKCache[j].first) {		switch(UsedKCache[j].first) {
case 0:		case 0:
Consts[i].first->setReg(		Consts[i].first->setReg(
AMDGPU::R600_KC0RegClass.getRegister(UsedKCache[j].second));		R600::R600_KC0RegClass.getRegister(UsedKCache[j].second));
break;		break;
case 1:		case 1:
Consts[i].first->setReg(		Consts[i].first->setReg(
AMDGPU::R600_KC1RegClass.getRegister(UsedKCache[j].second));		R600::R600_KC1RegClass.getRegister(UsedKCache[j].second));
break;		break;
default:		default:
llvm_unreachable("Wrong Cache Line");		llvm_unreachable("Wrong Cache Line");
}		}
j++;		j++;
}		}
return true;		return true;
}		}
[55 lines not shown]	MakeALUClause(MachineBasicBlock &MBB, MachineBasicBlock::iterator I) {
unsigned AluInstCount = 0;		unsigned AluInstCount = 0;
for (MachineBasicBlock::iterator E = MBB.end(); I != E; ++I) {		for (MachineBasicBlock::iterator E = MBB.end(); I != E; ++I) {
if (IsTrivialInst(*I))		if (IsTrivialInst(*I))
continue;		continue;
if (!isALU(*I))		if (!isALU(*I))
break;		break;
if (AluInstCount > TII->getMaxAlusPerClause())		if (AluInstCount > TII->getMaxAlusPerClause())
break;		break;
if (I->getOpcode() == AMDGPU::PRED_X) {		if (I->getOpcode() == R600::PRED_X) {
// We put PRED_X in its own clause to ensure that ifcvt won't create		// We put PRED_X in its own clause to ensure that ifcvt won't create
// clauses with more than 128 insts.		// clauses with more than 128 insts.
// IfCvt is indeed checking that "then" and "else" branches of an if		// IfCvt is indeed checking that "then" and "else" branches of an if
// statement have less than ~60 insts thus converted clauses can't be		// statement have less than ~60 insts thus converted clauses can't be
// bigger than ~121 insts (predicate setter needs to be in the same		// bigger than ~121 insts (predicate setter needs to be in the same
// clause as predicated alus).		// clause as predicated alus).
if (AluInstCount > 0)		if (AluInstCount > 0)
break;		break;
[19 lines not shown]	for (MachineBasicBlock::iterator E = MBB.end(); I != E; ++I) {
if (!canClauseLocalKillFitInClause(AluInstCount, KCacheBanks, I, E))		if (!canClauseLocalKillFitInClause(AluInstCount, KCacheBanks, I, E))
break;		break;

if (!SubstituteKCacheBank(*I, KCacheBanks))		if (!SubstituteKCacheBank(*I, KCacheBanks))
break;		break;
AluInstCount += OccupiedDwords(*I);		AluInstCount += OccupiedDwords(*I);
}		}
unsigned Opcode = PushBeforeModifier ?		unsigned Opcode = PushBeforeModifier ?
AMDGPU::CF_ALU_PUSH_BEFORE : AMDGPU::CF_ALU;		R600::CF_ALU_PUSH_BEFORE : R600::CF_ALU;
BuildMI(MBB, ClauseHead, MBB.findDebugLoc(ClauseHead), TII->get(Opcode))		BuildMI(MBB, ClauseHead, MBB.findDebugLoc(ClauseHead), TII->get(Opcode))
// We don't use the ADDR field until R600ControlFlowFinalizer pass, where		// We don't use the ADDR field until R600ControlFlowFinalizer pass, where
// it is safe to assume it is 0. However if we always put 0 here, the ifcvt		// it is safe to assume it is 0. However if we always put 0 here, the ifcvt
// pass may assume that identical ALU clause starter at the beginning of a		// pass may assume that identical ALU clause starter at the beginning of a
// true and false branch can be factorized which is not the case.		// true and false branch can be factorized which is not the case.
.addImm(Address++) // ADDR		.addImm(Address++) // ADDR
.addImm(KCacheBanks.empty()?0:KCacheBanks[0].first) // KB0		.addImm(KCacheBanks.empty()?0:KCacheBanks[0].first) // KB0
.addImm((KCacheBanks.size() < 2)?0:KCacheBanks[1].first) // KB1		.addImm((KCacheBanks.size() < 2)?0:KCacheBanks[1].first) // KB1
Show All 16 Lines	public:
bool runOnMachineFunction(MachineFunction &MF) override {		bool runOnMachineFunction(MachineFunction &MF) override {
const R600Subtarget &ST = MF.getSubtarget<R600Subtarget>();		const R600Subtarget &ST = MF.getSubtarget<R600Subtarget>();
TII = ST.getInstrInfo();		TII = ST.getInstrInfo();

for (MachineFunction::iterator BB = MF.begin(), BB_E = MF.end();		for (MachineFunction::iterator BB = MF.begin(), BB_E = MF.end();
BB != BB_E; ++BB) {		BB != BB_E; ++BB) {
MachineBasicBlock &MBB = *BB;		MachineBasicBlock &MBB = *BB;
MachineBasicBlock::iterator I = MBB.begin();		MachineBasicBlock::iterator I = MBB.begin();
if (I != MBB.end() && I->getOpcode() == AMDGPU::CF_ALU)		if (I != MBB.end() && I->getOpcode() == R600::CF_ALU)
continue; // BB was already parsed		continue; // BB was already parsed
for (MachineBasicBlock::iterator E = MBB.end(); I != E;) {		for (MachineBasicBlock::iterator E = MBB.end(); I != E;) {
if (isALU(*I)) {		if (isALU(*I)) {
auto next = MakeALUClause(MBB, I);		auto next = MakeALUClause(MBB, I);
assert(next != I);		assert(next != I);
I = next;		I = next;
} else		} else
++I;		++I;
Show All 22 Lines
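
The renames in this file (`AMDGPU::PRED_X` to `R600::PRED_X`, `AMDGPU::CF_ALU` to `R600::CF_ALU`) follow from each sub-target getting its own generated tables. The sketch below is only an illustration of that idea: the two enums are hand-written stand-ins for the TableGen output, the enumerator values are invented, and `clauseOpcode` is a hypothetical helper that mirrors the opcode selection in `MakeALUClause`.

```cpp
#include <cassert>

// Hypothetical stand-ins for the two generated opcode tables. The real
// generated headers are much larger and the values here are invented.
namespace R600 {
enum Opcode : unsigned { CF_ALU = 1, CF_ALU_PUSH_BEFORE = 2, PRED_X = 3 };
} // namespace R600

namespace AMDGPU {
enum Opcode : unsigned { S_MOV_B32 = 1, V_MOV_B32_e32 = 2 };
} // namespace AMDGPU

// R600-only code names R600:: enumerators, so it does not depend on the
// GCN table at all. Hypothetical helper, not part of the patch.
inline unsigned clauseOpcode(bool PushBeforeModifier) {
  return PushBeforeModifier ? R600::CF_ALU_PUSH_BEFORE : R600::CF_ALU;
}
```

Because the two namespaces are independent, the same numeric value can appear in both tables without conflict, which is why R600 code must consistently use the `R600::` spelling after the split.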

lib/Target/AMDGPU/R600ExpandSpecialInstrs.cpp

Show First 20 Lines • Show All 90 Lines • ▼ Show 20 Lines	for (MachineFunction::iterator BB = MF.begin(), BB_E = MF.end();
MachineBasicBlock &MBB = *BB;		MachineBasicBlock &MBB = *BB;
MachineBasicBlock::iterator I = MBB.begin();		MachineBasicBlock::iterator I = MBB.begin();
while (I != MBB.end()) {		while (I != MBB.end()) {
MachineInstr &MI = *I;		MachineInstr &MI = *I;
I = std::next(I);		I = std::next(I);

// Expand LDS_*_RET instructions		// Expand LDS_*_RET instructions
if (TII->isLDSRetInstr(MI.getOpcode())) {		if (TII->isLDSRetInstr(MI.getOpcode())) {
int DstIdx = TII->getOperandIdx(MI.getOpcode(), AMDGPU::OpName::dst);		int DstIdx = TII->getOperandIdx(MI.getOpcode(), R600::OpName::dst);
assert(DstIdx != -1);		assert(DstIdx != -1);
MachineOperand &DstOp = MI.getOperand(DstIdx);		MachineOperand &DstOp = MI.getOperand(DstIdx);
MachineInstr *Mov = TII->buildMovInstr(&MBB, I,		MachineInstr *Mov = TII->buildMovInstr(&MBB, I,
DstOp.getReg(), AMDGPU::OQAP);		DstOp.getReg(), R600::OQAP);
DstOp.setReg(AMDGPU::OQAP);		DstOp.setReg(R600::OQAP);
int LDSPredSelIdx = TII->getOperandIdx(MI.getOpcode(),		int LDSPredSelIdx = TII->getOperandIdx(MI.getOpcode(),
AMDGPU::OpName::pred_sel);		R600::OpName::pred_sel);
int MovPredSelIdx = TII->getOperandIdx(Mov->getOpcode(),		int MovPredSelIdx = TII->getOperandIdx(Mov->getOpcode(),
AMDGPU::OpName::pred_sel);		R600::OpName::pred_sel);
// Copy the pred_sel bit		// Copy the pred_sel bit
Mov->getOperand(MovPredSelIdx).setReg(		Mov->getOperand(MovPredSelIdx).setReg(
MI.getOperand(LDSPredSelIdx).getReg());		MI.getOperand(LDSPredSelIdx).getReg());
}		}

switch (MI.getOpcode()) {		switch (MI.getOpcode()) {
default: break;		default: break;
// Expand PRED_X to one of the PRED_SET instructions.		// Expand PRED_X to one of the PRED_SET instructions.
case AMDGPU::PRED_X: {		case R600::PRED_X: {
uint64_t Flags = MI.getOperand(3).getImm();		uint64_t Flags = MI.getOperand(3).getImm();
// The native opcode used by PRED_X is stored as an immediate in the		// The native opcode used by PRED_X is stored as an immediate in the
// third operand.		// third operand.
MachineInstr *PredSet = TII->buildDefaultInstruction(MBB, I,		MachineInstr *PredSet = TII->buildDefaultInstruction(MBB, I,
MI.getOperand(2).getImm(), // opcode		MI.getOperand(2).getImm(), // opcode
MI.getOperand(0).getReg(), // dst		MI.getOperand(0).getReg(), // dst
MI.getOperand(1).getReg(), // src0		MI.getOperand(1).getReg(), // src0
AMDGPU::ZERO); // src1		R600::ZERO); // src1
TII->addFlag(*PredSet, 0, MO_FLAG_MASK);		TII->addFlag(*PredSet, 0, MO_FLAG_MASK);
if (Flags & MO_FLAG_PUSH) {		if (Flags & MO_FLAG_PUSH) {
TII->setImmOperand(*PredSet, AMDGPU::OpName::update_exec_mask, 1);		TII->setImmOperand(*PredSet, R600::OpName::update_exec_mask, 1);
} else {		} else {
TII->setImmOperand(*PredSet, AMDGPU::OpName::update_pred, 1);		TII->setImmOperand(*PredSet, R600::OpName::update_pred, 1);
}		}
MI.eraseFromParent();		MI.eraseFromParent();
continue;		continue;
}		}
case AMDGPU::DOT_4: {		case R600::DOT_4: {

const R600RegisterInfo &TRI = TII->getRegisterInfo();		const R600RegisterInfo &TRI = TII->getRegisterInfo();

unsigned DstReg = MI.getOperand(0).getReg();		unsigned DstReg = MI.getOperand(0).getReg();
unsigned DstBase = TRI.getEncodingValue(DstReg) & HW_REG_MASK;		unsigned DstBase = TRI.getEncodingValue(DstReg) & HW_REG_MASK;

for (unsigned Chan = 0; Chan < 4; ++Chan) {		for (unsigned Chan = 0; Chan < 4; ++Chan) {
bool Mask = (Chan != TRI.getHWRegChan(DstReg));		bool Mask = (Chan != TRI.getHWRegChan(DstReg));
unsigned SubDstReg =		unsigned SubDstReg =
AMDGPU::R600_TReg32RegClass.getRegister((DstBase * 4) + Chan);		R600::R600_TReg32RegClass.getRegister((DstBase * 4) + Chan);
MachineInstr *BMI =		MachineInstr *BMI =
TII->buildSlotOfVectorInstruction(MBB, &MI, Chan, SubDstReg);		TII->buildSlotOfVectorInstruction(MBB, &MI, Chan, SubDstReg);
if (Chan > 0) {		if (Chan > 0) {
BMI->bundleWithPred();		BMI->bundleWithPred();
}		}
if (Mask) {		if (Mask) {
TII->addFlag(*BMI, 0, MO_FLAG_MASK);		TII->addFlag(*BMI, 0, MO_FLAG_MASK);
}		}
if (Chan != 3)		if (Chan != 3)
TII->addFlag(*BMI, 0, MO_FLAG_NOT_LAST);		TII->addFlag(*BMI, 0, MO_FLAG_NOT_LAST);
unsigned Opcode = BMI->getOpcode();		unsigned Opcode = BMI->getOpcode();
// While not strictly necessary from hw point of view, we force		// While not strictly necessary from hw point of view, we force
// all src operands of a dot4 inst to belong to the same slot.		// all src operands of a dot4 inst to belong to the same slot.
unsigned Src0 = BMI->getOperand(		unsigned Src0 = BMI->getOperand(
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0))		TII->getOperandIdx(Opcode, R600::OpName::src0))
.getReg();		.getReg();
unsigned Src1 = BMI->getOperand(		unsigned Src1 = BMI->getOperand(
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1))		TII->getOperandIdx(Opcode, R600::OpName::src1))
.getReg();		.getReg();
(void) Src0;		(void) Src0;
(void) Src1;		(void) Src1;
if ((TRI.getEncodingValue(Src0) & 0xff) < 127 &&		if ((TRI.getEncodingValue(Src0) & 0xff) < 127 &&
(TRI.getEncodingValue(Src1) & 0xff) < 127)		(TRI.getEncodingValue(Src1) & 0xff) < 127)
assert(TRI.getHWRegChan(Src0) == TRI.getHWRegChan(Src1));		assert(TRI.getHWRegChan(Src0) == TRI.getHWRegChan(Src1));
}		}
MI.eraseFromParent();		MI.eraseFromParent();
Show All 30 Lines	while (I != MBB.end()) {
// T0_XYZW = CUBE T1_XYZW		// T0_XYZW = CUBE T1_XYZW
// becomes:		// becomes:
// TO_X = CUBE T1_Z, T1_Y		// TO_X = CUBE T1_Z, T1_Y
// T0_Y = CUBE T1_Z, T1_X		// T0_Y = CUBE T1_Z, T1_X
// T0_Z = CUBE T1_X, T1_Z		// T0_Z = CUBE T1_X, T1_Z
// T0_W = CUBE T1_Y, T1_Z		// T0_W = CUBE T1_Y, T1_Z
for (unsigned Chan = 0; Chan < 4; Chan++) {		for (unsigned Chan = 0; Chan < 4; Chan++) {
unsigned DstReg = MI.getOperand(		unsigned DstReg = MI.getOperand(
TII->getOperandIdx(MI, AMDGPU::OpName::dst)).getReg();		TII->getOperandIdx(MI, R600::OpName::dst)).getReg();
unsigned Src0 = MI.getOperand(		unsigned Src0 = MI.getOperand(
TII->getOperandIdx(MI, AMDGPU::OpName::src0)).getReg();		TII->getOperandIdx(MI, R600::OpName::src0)).getReg();
unsigned Src1 = 0;		unsigned Src1 = 0;

// Determine the correct source registers		// Determine the correct source registers
if (!IsCube) {		if (!IsCube) {
int Src1Idx = TII->getOperandIdx(MI, AMDGPU::OpName::src1);		int Src1Idx = TII->getOperandIdx(MI, R600::OpName::src1);
if (Src1Idx != -1) {		if (Src1Idx != -1) {
Src1 = MI.getOperand(Src1Idx).getReg();		Src1 = MI.getOperand(Src1Idx).getReg();
}		}
}		}
if (IsReduction) {		if (IsReduction) {
unsigned SubRegIndex = AMDGPURegisterInfo::getSubRegFromChannel(Chan);		unsigned SubRegIndex = AMDGPURegisterInfo::getSubRegFromChannel(Chan);
Src0 = TRI.getSubReg(Src0, SubRegIndex);		Src0 = TRI.getSubReg(Src0, SubRegIndex);
Src1 = TRI.getSubReg(Src1, SubRegIndex);		Src1 = TRI.getSubReg(Src1, SubRegIndex);
Show All 11 Lines	while (I != MBB.end()) {
if (IsCube) {		if (IsCube) {
unsigned SubRegIndex = AMDGPURegisterInfo::getSubRegFromChannel(Chan);		unsigned SubRegIndex = AMDGPURegisterInfo::getSubRegFromChannel(Chan);
DstReg = TRI.getSubReg(DstReg, SubRegIndex);		DstReg = TRI.getSubReg(DstReg, SubRegIndex);
} else {		} else {
// Mask the write if the original instruction does not write to		// Mask the write if the original instruction does not write to
// the current Channel.		// the current Channel.
Mask = (Chan != TRI.getHWRegChan(DstReg));		Mask = (Chan != TRI.getHWRegChan(DstReg));
unsigned DstBase = TRI.getEncodingValue(DstReg) & HW_REG_MASK;		unsigned DstBase = TRI.getEncodingValue(DstReg) & HW_REG_MASK;
DstReg = AMDGPU::R600_TReg32RegClass.getRegister((DstBase * 4) + Chan);		DstReg = R600::R600_TReg32RegClass.getRegister((DstBase * 4) + Chan);
}		}

// Set the IsLast bit		// Set the IsLast bit
NotLast = (Chan != 3 );		NotLast = (Chan != 3 );

// Add the new instruction		// Add the new instruction
unsigned Opcode = MI.getOpcode();		unsigned Opcode = MI.getOpcode();
switch (Opcode) {		switch (Opcode) {
case AMDGPU::CUBE_r600_pseudo:		case R600::CUBE_r600_pseudo:
Opcode = AMDGPU::CUBE_r600_real;		Opcode = R600::CUBE_r600_real;
break;		break;
case AMDGPU::CUBE_eg_pseudo:		case R600::CUBE_eg_pseudo:
Opcode = AMDGPU::CUBE_eg_real;		Opcode = R600::CUBE_eg_real;
break;		break;
default:		default:
break;		break;
}		}

MachineInstr *NewMI =		MachineInstr *NewMI =
TII->buildDefaultInstruction(MBB, I, Opcode, DstReg, Src0, Src1);		TII->buildDefaultInstruction(MBB, I, Opcode, DstReg, Src0, Src1);

if (Chan != 0)		if (Chan != 0)
NewMI->bundleWithPred();		NewMI->bundleWithPred();
if (Mask) {		if (Mask) {
TII->addFlag(*NewMI, 0, MO_FLAG_MASK);		TII->addFlag(*NewMI, 0, MO_FLAG_MASK);
}		}
if (NotLast) {		if (NotLast) {
TII->addFlag(*NewMI, 0, MO_FLAG_NOT_LAST);		TII->addFlag(*NewMI, 0, MO_FLAG_NOT_LAST);
}		}
SetFlagInNewMI(NewMI, &MI, AMDGPU::OpName::clamp);		SetFlagInNewMI(NewMI, &MI, R600::OpName::clamp);
SetFlagInNewMI(NewMI, &MI, AMDGPU::OpName::literal);		SetFlagInNewMI(NewMI, &MI, R600::OpName::literal);
SetFlagInNewMI(NewMI, &MI, AMDGPU::OpName::src0_abs);		SetFlagInNewMI(NewMI, &MI, R600::OpName::src0_abs);
SetFlagInNewMI(NewMI, &MI, AMDGPU::OpName::src1_abs);		SetFlagInNewMI(NewMI, &MI, R600::OpName::src1_abs);
SetFlagInNewMI(NewMI, &MI, AMDGPU::OpName::src0_neg);		SetFlagInNewMI(NewMI, &MI, R600::OpName::src0_neg);
SetFlagInNewMI(NewMI, &MI, AMDGPU::OpName::src1_neg);		SetFlagInNewMI(NewMI, &MI, R600::OpName::src1_neg);
}		}
MI.eraseFromParent();		MI.eraseFromParent();
}		}
}		}
return false;		return false;
}		}
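
The per-channel expansion above (the `DOT_4` loop and the non-cube branch of the vector expansion) computes three things for each of the four slots: the destination register index, whether the write is masked, and whether the slot is the last of the bundle. A minimal standalone sketch of just that arithmetic, assuming a flat register list indexed as `DstBase * 4 + Chan`; the `Slot` struct and `expandToSlots` are hypothetical names, not LLVM API.

```cpp
#include <array>
#include <cassert>

// One expanded slot of a four-channel vector instruction (hypothetical type).
struct Slot {
  unsigned RegIndex; // index into a flat 32-bit register list
  bool Mask;         // write is masked, mirrors MO_FLAG_MASK
  bool NotLast;      // not the final slot, mirrors MO_FLAG_NOT_LAST
};

// DstBase is the hardware register number and DstChan the channel (0..3)
// that the original instruction writes.
inline std::array<Slot, 4> expandToSlots(unsigned DstBase, unsigned DstChan) {
  std::array<Slot, 4> Slots{};
  for (unsigned Chan = 0; Chan < 4; ++Chan) {
    Slots[Chan].RegIndex = DstBase * 4 + Chan; // same formula as the diff
    Slots[Chan].Mask = (Chan != DstChan);      // only the written channel is unmasked
    Slots[Chan].NotLast = (Chan != 3);         // channel 3 closes the bundle
  }
  return Slots;
}
```

For example, `expandToSlots(2, 1)` yields register indices 8 through 11, with only slot 1 unmasked and only slot 3 marked as last.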

lib/Target/AMDGPU/R600ISelLowering.h

	Show All 17 Lines
	#include "AMDGPUISelLowering.h"			#include "AMDGPUISelLowering.h"

	namespace llvm {			namespace llvm {

	class R600InstrInfo;			class R600InstrInfo;
	class R600Subtarget;			class R600Subtarget;

	class R600TargetLowering final : public AMDGPUTargetLowering {			class R600TargetLowering final : public AMDGPUTargetLowering {

				const R600Subtarget *Subtarget;
	public:			public:
	R600TargetLowering(const TargetMachine &TM, const R600Subtarget &STI);			R600TargetLowering(const TargetMachine &TM, const R600Subtarget &STI);

	const R600Subtarget *getSubtarget() const;			const R600Subtarget *getSubtarget() const;

	MachineBasicBlock *			MachineBasicBlock *
	EmitInstrWithCustomInserter(MachineInstr &MI,			EmitInstrWithCustomInserter(MachineInstr &MI,
	MachineBasicBlock *BB) const override;			MachineBasicBlock *BB) const override;
	SDValue LowerOperation(SDValue Op, SelectionDAG &DAG) const override;			SDValue LowerOperation(SDValue Op, SelectionDAG &DAG) const override;
	SDValue PerformDAGCombine(SDNode *N, DAGCombinerInfo &DCI) const override;			SDValue PerformDAGCombine(SDNode *N, DAGCombinerInfo &DCI) const override;
	void ReplaceNodeResults(SDNode * N,			void ReplaceNodeResults(SDNode * N,
	SmallVectorImpl<SDValue> &Results,			SmallVectorImpl<SDValue> &Results,
	SelectionDAG &DAG) const override;			SelectionDAG &DAG) const override;
				CCAssignFn *CCAssignFnForCall(CallingConv::ID CC, bool IsVarArg) const;
	SDValue LowerFormalArguments(SDValue Chain, CallingConv::ID CallConv,			SDValue LowerFormalArguments(SDValue Chain, CallingConv::ID CallConv,
	bool isVarArg,			bool isVarArg,
	const SmallVectorImpl<ISD::InputArg> &Ins,			const SmallVectorImpl<ISD::InputArg> &Ins,
	const SDLoc &DL, SelectionDAG &DAG,			const SDLoc &DL, SelectionDAG &DAG,
	SmallVectorImpl<SDValue> &InVals) const override;			SmallVectorImpl<SDValue> &InVals) const override;
	EVT getSetCCResultType(const DataLayout &DL, LLVMContext &,			EVT getSetCCResultType(const DataLayout &DL, LLVMContext &,
	EVT VT) const override;			EVT VT) const override;

	▲ Show 20 Lines • Show All 61 Lines • Show Last 20 Lines
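
The header change above adds a `const R600Subtarget *Subtarget` member, so the derived lowering class holds a pointer of its own sub-target type rather than casting a base-class pointer in `getSubtarget()`. A rough sketch of that pattern, using hypothetical stand-in classes (`CommonSubtarget`, `R600LikeSubtarget`, `BaseLowering`, `DerivedLowering`) and an invented `Generation` field; none of these are the real LLVM types.

```cpp
#include <cassert>

// Hypothetical stand-ins for the shared and R600-specific subtarget classes.
struct CommonSubtarget {
  bool HasFMA = false;
};
struct R600LikeSubtarget : CommonSubtarget {
  unsigned Generation = 0;
};

// The shared base only knows about the common subtarget interface.
class BaseLowering {
public:
  explicit BaseLowering(const CommonSubtarget &STI) : Common(&STI) {}

protected:
  const CommonSubtarget *Common;
};

// The derived class keeps its own typed pointer, so no static_cast is
// needed when it queries sub-target-specific properties.
class DerivedLowering final : public BaseLowering {
  const R600LikeSubtarget *Subtarget;

public:
  explicit DerivedLowering(const R600LikeSubtarget &STI)
      : BaseLowering(STI), Subtarget(&STI) {}

  const R600LikeSubtarget *getSubtarget() const { return Subtarget; }
  unsigned generation() const { return Subtarget->Generation; }
};
```

The trade-off is one extra pointer per lowering object in exchange for removing the downcast, which fits the review discussion about keeping the shared base class thin.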

lib/Target/AMDGPU/R600ISelLowering.cpp

//===-- R600ISelLowering.cpp - R600 DAG Lowering Implementation -----------===//		//===-- R600ISelLowering.cpp - R600 DAG Lowering Implementation -----------===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
/// \file		/// \file
/// Custom DAG lowering for R600		/// Custom DAG lowering for R600
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "R600ISelLowering.h"		#include "R600ISelLowering.h"
#include "AMDGPUFrameLowering.h"		#include "AMDGPUFrameLowering.h"
#include "AMDGPUIntrinsicInfo.h"
#include "AMDGPUSubtarget.h"		#include "AMDGPUSubtarget.h"
#include "R600Defines.h"		#include "R600Defines.h"
#include "R600FrameLowering.h"		#include "R600FrameLowering.h"
#include "R600InstrInfo.h"		#include "R600InstrInfo.h"
#include "R600MachineFunctionInfo.h"		#include "R600MachineFunctionInfo.h"
#include "MCTargetDesc/AMDGPUMCTargetDesc.h"		#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
#include "Utils/AMDGPUBaseInfo.h"		#include "Utils/AMDGPUBaseInfo.h"
#include "llvm/ADT/APFloat.h"		#include "llvm/ADT/APFloat.h"
Show All 20 Lines
#include <cassert>		#include <cassert>
#include <cstdint>		#include <cstdint>
#include <iterator>		#include <iterator>
#include <utility>		#include <utility>
#include <vector>		#include <vector>

using namespace llvm;		using namespace llvm;

		static bool allocateKernArg(unsigned ValNo, MVT ValVT, MVT LocVT,
		CCValAssign::LocInfo LocInfo,
		ISD::ArgFlagsTy ArgFlags, CCState &State) {
		MachineFunction &MF = State.getMachineFunction();
		AMDGPUMachineFunction *MFI = MF.getInfo<AMDGPUMachineFunction>();

		uint64_t Offset = MFI->allocateKernArg(LocVT.getStoreSize(),
		ArgFlags.getOrigAlign());
		State.addLoc(CCValAssign::getCustomMem(ValNo, ValVT, Offset, LocVT, LocInfo));
		return true;
		}

		#include "R600GenCallingConv.inc"

R600TargetLowering::R600TargetLowering(const TargetMachine &TM,		R600TargetLowering::R600TargetLowering(const TargetMachine &TM,
const R600Subtarget &STI)		const R600Subtarget &STI)
: AMDGPUTargetLowering(TM, STI), Gen(STI.getGeneration()) {		: AMDGPUTargetLowering(TM, STI), Subtarget(&STI), Gen(STI.getGeneration()) {
addRegisterClass(MVT::f32, &AMDGPU::R600_Reg32RegClass);		addRegisterClass(MVT::f32, &R600::R600_Reg32RegClass);
addRegisterClass(MVT::i32, &AMDGPU::R600_Reg32RegClass);		addRegisterClass(MVT::i32, &R600::R600_Reg32RegClass);
addRegisterClass(MVT::v2f32, &AMDGPU::R600_Reg64RegClass);		addRegisterClass(MVT::v2f32, &R600::R600_Reg64RegClass);
addRegisterClass(MVT::v2i32, &AMDGPU::R600_Reg64RegClass);		addRegisterClass(MVT::v2i32, &R600::R600_Reg64RegClass);
addRegisterClass(MVT::v4f32, &AMDGPU::R600_Reg128RegClass);		addRegisterClass(MVT::v4f32, &R600::R600_Reg128RegClass);
addRegisterClass(MVT::v4i32, &AMDGPU::R600_Reg128RegClass);		addRegisterClass(MVT::v4i32, &R600::R600_Reg128RegClass);

computeRegisterProperties(STI.getRegisterInfo());		computeRegisterProperties(Subtarget->getRegisterInfo());

// Legalize loads and stores to the private address space.		// Legalize loads and stores to the private address space.
setOperationAction(ISD::LOAD, MVT::i32, Custom);		setOperationAction(ISD::LOAD, MVT::i32, Custom);
setOperationAction(ISD::LOAD, MVT::v2i32, Custom);		setOperationAction(ISD::LOAD, MVT::v2i32, Custom);
setOperationAction(ISD::LOAD, MVT::v4i32, Custom);		setOperationAction(ISD::LOAD, MVT::v4i32, Custom);

// EXTLOAD should be the same as ZEXTLOAD. It is legal for some address		// EXTLOAD should be the same as ZEXTLOAD. It is legal for some address
// spaces, so it is custom lowered to handle those where it isn't.		// spaces, so it is custom lowered to handle those where it isn't.
▲ Show 20 Lines • Show All 70 Lines • ▼ Show 20 Lines	R600TargetLowering::R600TargetLowering(const TargetMachine &TM,
setOperationAction(ISD::SETCC, MVT::v2i32, Expand);		setOperationAction(ISD::SETCC, MVT::v2i32, Expand);

setOperationAction(ISD::BR_CC, MVT::i32, Expand);		setOperationAction(ISD::BR_CC, MVT::i32, Expand);
setOperationAction(ISD::BR_CC, MVT::f32, Expand);		setOperationAction(ISD::BR_CC, MVT::f32, Expand);
setOperationAction(ISD::BRCOND, MVT::Other, Custom);		setOperationAction(ISD::BRCOND, MVT::Other, Custom);

setOperationAction(ISD::FSUB, MVT::f32, Expand);		setOperationAction(ISD::FSUB, MVT::f32, Expand);

		setOperationAction(ISD::FCEIL, MVT::f64, Custom);
		setOperationAction(ISD::FTRUNC, MVT::f64, Custom);
		setOperationAction(ISD::FRINT, MVT::f64, Custom);
		setOperationAction(ISD::FFLOOR, MVT::f64, Custom);

setOperationAction(ISD::SELECT_CC, MVT::f32, Custom);		setOperationAction(ISD::SELECT_CC, MVT::f32, Custom);
setOperationAction(ISD::SELECT_CC, MVT::i32, Custom);		setOperationAction(ISD::SELECT_CC, MVT::i32, Custom);

setOperationAction(ISD::SETCC, MVT::i32, Expand);		setOperationAction(ISD::SETCC, MVT::i32, Expand);
setOperationAction(ISD::SETCC, MVT::f32, Expand);		setOperationAction(ISD::SETCC, MVT::f32, Expand);
setOperationAction(ISD::FP_TO_UINT, MVT::i1, Custom);		setOperationAction(ISD::FP_TO_UINT, MVT::i1, Custom);
setOperationAction(ISD::FP_TO_SINT, MVT::i1, Custom);		setOperationAction(ISD::FP_TO_SINT, MVT::i1, Custom);
setOperationAction(ISD::FP_TO_SINT, MVT::i64, Custom);		setOperationAction(ISD::FP_TO_SINT, MVT::i64, Custom);
▲ Show 20 Lines • Show All 52 Lines • ▼ Show 20 Lines	R600TargetLowering::R600TargetLowering(const TargetMachine &TM,
setOperationAction(ISD::SHL_PARTS, MVT::i32, Custom);		setOperationAction(ISD::SHL_PARTS, MVT::i32, Custom);
setOperationAction(ISD::SRL_PARTS, MVT::i32, Custom);		setOperationAction(ISD::SRL_PARTS, MVT::i32, Custom);
setOperationAction(ISD::SRA_PARTS, MVT::i32, Custom);		setOperationAction(ISD::SRA_PARTS, MVT::i32, Custom);

if (!Subtarget->hasFMA()) {		if (!Subtarget->hasFMA()) {
setOperationAction(ISD::FMA, MVT::f32, Expand);		setOperationAction(ISD::FMA, MVT::f32, Expand);
setOperationAction(ISD::FMA, MVT::f64, Expand);		setOperationAction(ISD::FMA, MVT::f64, Expand);
}		}

		jvesely (inline comment, unsubmitted, not done): git complains about whitespace error in this location
		// FIXME: This was moved from AMDGPUTargetLowering, I'm not sure if we
		// need it for R600.
		if (!Subtarget->hasFP32Denormals())
		setOperationAction(ISD::FMAD, MVT::f32, Legal);

		if (!Subtarget->hasBFI()) {
		// fcopysign can be done in a single instruction with BFI.
		setOperationAction(ISD::FCOPYSIGN, MVT::f32, Expand);
		setOperationAction(ISD::FCOPYSIGN, MVT::f64, Expand);
		}

		if (!Subtarget->hasBCNT(32))
		setOperationAction(ISD::CTPOP, MVT::i32, Expand);

		if (!Subtarget->hasBCNT(64))
		setOperationAction(ISD::CTPOP, MVT::i64, Expand);

		if (Subtarget->hasFFBH())
		setOperationAction(ISD::CTLZ_ZERO_UNDEF, MVT::i32, Custom);

		if (Subtarget->hasFFBL())
		setOperationAction(ISD::CTTZ_ZERO_UNDEF, MVT::i32, Custom);

		// FIXME: This was moved from AMDGPUTargetLowering, I'm not sure if we
		// need it for R600.
		if (Subtarget->hasBFE())
		setHasExtractBitsInsn(true);

setOperationAction(ISD::GlobalAddress, MVT::i32, Custom);		setOperationAction(ISD::GlobalAddress, MVT::i32, Custom);

const MVT ScalarIntVTs[] = { MVT::i32, MVT::i64 };		const MVT ScalarIntVTs[] = { MVT::i32, MVT::i64 };
for (MVT VT : ScalarIntVTs) {		for (MVT VT : ScalarIntVTs) {
setOperationAction(ISD::ADDC, VT, Expand);		setOperationAction(ISD::ADDC, VT, Expand);
setOperationAction(ISD::SUBC, VT, Expand);		setOperationAction(ISD::SUBC, VT, Expand);
setOperationAction(ISD::ADDE, VT, Expand);		setOperationAction(ISD::ADDE, VT, Expand);
setOperationAction(ISD::SUBE, VT, Expand);		setOperationAction(ISD::SUBE, VT, Expand);
Show All 13 Lines	R600TargetLowering::R600TargetLowering(const TargetMachine &TM,
setTargetDAGCombine(ISD::FP_ROUND);		setTargetDAGCombine(ISD::FP_ROUND);
setTargetDAGCombine(ISD::FP_TO_SINT);		setTargetDAGCombine(ISD::FP_TO_SINT);
setTargetDAGCombine(ISD::EXTRACT_VECTOR_ELT);		setTargetDAGCombine(ISD::EXTRACT_VECTOR_ELT);
setTargetDAGCombine(ISD::SELECT_CC);		setTargetDAGCombine(ISD::SELECT_CC);
setTargetDAGCombine(ISD::INSERT_VECTOR_ELT);		setTargetDAGCombine(ISD::INSERT_VECTOR_ELT);
setTargetDAGCombine(ISD::LOAD);		setTargetDAGCombine(ISD::LOAD);
}		}

const R600Subtarget *R600TargetLowering::getSubtarget() const {
return static_cast<const R600Subtarget *>(Subtarget);
}

static inline bool isEOP(MachineBasicBlock::iterator I) {		static inline bool isEOP(MachineBasicBlock::iterator I) {
if (std::next(I) == I->getParent()->end())		if (std::next(I) == I->getParent()->end())
return false;		return false;
return std::next(I)->getOpcode() == AMDGPU::RETURN;		return std::next(I)->getOpcode() == R600::RETURN;
}		}

MachineBasicBlock *		MachineBasicBlock *
R600TargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,		R600TargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
MachineBasicBlock *BB) const {		MachineBasicBlock *BB) const {
MachineFunction *MF = BB->getParent();		MachineFunction *MF = BB->getParent();
MachineRegisterInfo &MRI = MF->getRegInfo();		MachineRegisterInfo &MRI = MF->getRegInfo();
MachineBasicBlock::iterator I = MI;		MachineBasicBlock::iterator I = MI;
const R600InstrInfo *TII = getSubtarget()->getInstrInfo();		const R600InstrInfo *TII = Subtarget->getInstrInfo();

switch (MI.getOpcode()) {		switch (MI.getOpcode()) {
default:		default:
// Replace LDS_*_RET instruction that don't have any uses with the		// Replace LDS_*_RET instruction that don't have any uses with the
// equivalent LDS_*_NORET instruction.		// equivalent LDS_*_NORET instruction.
if (TII->isLDSRetInstr(MI.getOpcode())) {		if (TII->isLDSRetInstr(MI.getOpcode())) {
int DstIdx = TII->getOperandIdx(MI.getOpcode(), AMDGPU::OpName::dst);		int DstIdx = TII->getOperandIdx(MI.getOpcode(), R600::OpName::dst);
assert(DstIdx != -1);		assert(DstIdx != -1);
MachineInstrBuilder NewMI;		MachineInstrBuilder NewMI;
// FIXME: getLDSNoRetOp method only handles LDS_1A1D LDS ops. Add		// FIXME: getLDSNoRetOp method only handles LDS_1A1D LDS ops. Add
// LDS_1A2D support and remove this special case.		// LDS_1A2D support and remove this special case.
if (!MRI.use_empty(MI.getOperand(DstIdx).getReg()) ||		if (!MRI.use_empty(MI.getOperand(DstIdx).getReg()) ||
MI.getOpcode() == AMDGPU::LDS_CMPST_RET)		MI.getOpcode() == R600::LDS_CMPST_RET)
return BB;		return BB;

NewMI = BuildMI(*BB, I, BB->findDebugLoc(I),		NewMI = BuildMI(*BB, I, BB->findDebugLoc(I),
TII->get(AMDGPU::getLDSNoRetOp(MI.getOpcode())));		TII->get(R600::getLDSNoRetOp(MI.getOpcode())));
for (unsigned i = 1, e = MI.getNumOperands(); i < e; ++i) {		for (unsigned i = 1, e = MI.getNumOperands(); i < e; ++i) {
NewMI.add(MI.getOperand(i));		NewMI.add(MI.getOperand(i));
}		}
} else {		} else {
return AMDGPUTargetLowering::EmitInstrWithCustomInserter(MI, BB);		return AMDGPUTargetLowering::EmitInstrWithCustomInserter(MI, BB);
}		}
break;		break;

case AMDGPU::FABS_R600: {		case R600::FABS_R600: {
MachineInstr *NewMI = TII->buildDefaultInstruction(		MachineInstr *NewMI = TII->buildDefaultInstruction(
*BB, I, AMDGPU::MOV, MI.getOperand(0).getReg(),		*BB, I, R600::MOV, MI.getOperand(0).getReg(),
MI.getOperand(1).getReg());		MI.getOperand(1).getReg());
TII->addFlag(*NewMI, 0, MO_FLAG_ABS);		TII->addFlag(*NewMI, 0, MO_FLAG_ABS);
break;		break;
}		}

case AMDGPU::FNEG_R600: {		case R600::FNEG_R600: {
MachineInstr *NewMI = TII->buildDefaultInstruction(		MachineInstr *NewMI = TII->buildDefaultInstruction(
*BB, I, AMDGPU::MOV, MI.getOperand(0).getReg(),		*BB, I, R600::MOV, MI.getOperand(0).getReg(),
MI.getOperand(1).getReg());		MI.getOperand(1).getReg());
TII->addFlag(*NewMI, 0, MO_FLAG_NEG);		TII->addFlag(*NewMI, 0, MO_FLAG_NEG);
break;		break;
}		}

case AMDGPU::MASK_WRITE: {		case R600::MASK_WRITE: {
unsigned maskedRegister = MI.getOperand(0).getReg();		unsigned maskedRegister = MI.getOperand(0).getReg();
assert(TargetRegisterInfo::isVirtualRegister(maskedRegister));		assert(TargetRegisterInfo::isVirtualRegister(maskedRegister));
MachineInstr * defInstr = MRI.getVRegDef(maskedRegister);		MachineInstr * defInstr = MRI.getVRegDef(maskedRegister);
TII->addFlag(*defInstr, 0, MO_FLAG_MASK);		TII->addFlag(*defInstr, 0, MO_FLAG_MASK);
break;		break;
}		}

case AMDGPU::MOV_IMM_F32:		case R600::MOV_IMM_F32:
TII->buildMovImm(*BB, I, MI.getOperand(0).getReg(), MI.getOperand(1)		TII->buildMovImm(*BB, I, MI.getOperand(0).getReg(), MI.getOperand(1)
.getFPImm()		.getFPImm()
->getValueAPF()		->getValueAPF()
.bitcastToAPInt()		.bitcastToAPInt()
.getZExtValue());		.getZExtValue());
break;		break;

case AMDGPU::MOV_IMM_I32:		case R600::MOV_IMM_I32:
TII->buildMovImm(*BB, I, MI.getOperand(0).getReg(),		TII->buildMovImm(*BB, I, MI.getOperand(0).getReg(),
MI.getOperand(1).getImm());		MI.getOperand(1).getImm());
break;		break;

case AMDGPU::MOV_IMM_GLOBAL_ADDR: {		case R600::MOV_IMM_GLOBAL_ADDR: {
//TODO: Perhaps combine this instruction with the next if possible		//TODO: Perhaps combine this instruction with the next if possible
auto MIB = TII->buildDefaultInstruction(		auto MIB = TII->buildDefaultInstruction(
*BB, MI, AMDGPU::MOV, MI.getOperand(0).getReg(), AMDGPU::ALU_LITERAL_X);		*BB, MI, R600::MOV, MI.getOperand(0).getReg(), R600::ALU_LITERAL_X);
int Idx = TII->getOperandIdx(*MIB, AMDGPU::OpName::literal);		int Idx = TII->getOperandIdx(*MIB, R600::OpName::literal);
//TODO: Ugh this is rather ugly		//TODO: Ugh this is rather ugly
MIB->getOperand(Idx) = MI.getOperand(1);		MIB->getOperand(Idx) = MI.getOperand(1);
break;		break;
}		}

case AMDGPU::CONST_COPY: {		case R600::CONST_COPY: {
MachineInstr *NewMI = TII->buildDefaultInstruction(		MachineInstr *NewMI = TII->buildDefaultInstruction(
*BB, MI, AMDGPU::MOV, MI.getOperand(0).getReg(), AMDGPU::ALU_CONST);		*BB, MI, R600::MOV, MI.getOperand(0).getReg(), R600::ALU_CONST);
TII->setImmOperand(*NewMI, AMDGPU::OpName::src0_sel,		TII->setImmOperand(*NewMI, R600::OpName::src0_sel,
MI.getOperand(1).getImm());		MI.getOperand(1).getImm());
break;		break;
}		}

case AMDGPU::RAT_WRITE_CACHELESS_32_eg:		case R600::RAT_WRITE_CACHELESS_32_eg:
case AMDGPU::RAT_WRITE_CACHELESS_64_eg:		case R600::RAT_WRITE_CACHELESS_64_eg:
case AMDGPU::RAT_WRITE_CACHELESS_128_eg:		case R600::RAT_WRITE_CACHELESS_128_eg:
BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(MI.getOpcode()))		BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(MI.getOpcode()))
.add(MI.getOperand(0))		.add(MI.getOperand(0))
.add(MI.getOperand(1))		.add(MI.getOperand(1))
.addImm(isEOP(I)); // Set End of program bit		.addImm(isEOP(I)); // Set End of program bit
break;		break;

case AMDGPU::RAT_STORE_TYPED_eg:		case R600::RAT_STORE_TYPED_eg:
BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(MI.getOpcode()))		BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(MI.getOpcode()))
.add(MI.getOperand(0))		.add(MI.getOperand(0))
.add(MI.getOperand(1))		.add(MI.getOperand(1))
.add(MI.getOperand(2))		.add(MI.getOperand(2))
.addImm(isEOP(I)); // Set End of program bit		.addImm(isEOP(I)); // Set End of program bit
break;		break;

case AMDGPU::BRANCH:		case R600::BRANCH:
BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::JUMP))		BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(R600::JUMP))
.add(MI.getOperand(0));		.add(MI.getOperand(0));
break;		break;

case AMDGPU::BRANCH_COND_f32: {		case R600::BRANCH_COND_f32: {
MachineInstr *NewMI =		MachineInstr *NewMI =
BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::PRED_X),		BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(R600::PRED_X),
AMDGPU::PREDICATE_BIT)		R600::PREDICATE_BIT)
.add(MI.getOperand(1))		.add(MI.getOperand(1))
.addImm(AMDGPU::PRED_SETNE)		.addImm(R600::PRED_SETNE)
.addImm(0); // Flags		.addImm(0); // Flags
TII->addFlag(*NewMI, 0, MO_FLAG_PUSH);		TII->addFlag(*NewMI, 0, MO_FLAG_PUSH);
BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::JUMP_COND))		BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(R600::JUMP_COND))
.add(MI.getOperand(0))		.add(MI.getOperand(0))
.addReg(AMDGPU::PREDICATE_BIT, RegState::Kill);		.addReg(R600::PREDICATE_BIT, RegState::Kill);
break;		break;
}		}

case AMDGPU::BRANCH_COND_i32: {		case R600::BRANCH_COND_i32: {
MachineInstr *NewMI =		MachineInstr *NewMI =
BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::PRED_X),		BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(R600::PRED_X),
AMDGPU::PREDICATE_BIT)		R600::PREDICATE_BIT)
.add(MI.getOperand(1))		.add(MI.getOperand(1))
.addImm(AMDGPU::PRED_SETNE_INT)		.addImm(R600::PRED_SETNE_INT)
.addImm(0); // Flags		.addImm(0); // Flags
TII->addFlag(*NewMI, 0, MO_FLAG_PUSH);		TII->addFlag(*NewMI, 0, MO_FLAG_PUSH);
BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::JUMP_COND))		BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(R600::JUMP_COND))
.add(MI.getOperand(0))		.add(MI.getOperand(0))
.addReg(AMDGPU::PREDICATE_BIT, RegState::Kill);		.addReg(R600::PREDICATE_BIT, RegState::Kill);
break;		break;
}		}

case AMDGPU::EG_ExportSwz:		case R600::EG_ExportSwz:
case AMDGPU::R600_ExportSwz: {		case R600::R600_ExportSwz: {
// Instruction is left unmodified if its not the last one of its type		// Instruction is left unmodified if its not the last one of its type
bool isLastInstructionOfItsType = true;		bool isLastInstructionOfItsType = true;
unsigned InstExportType = MI.getOperand(1).getImm();		unsigned InstExportType = MI.getOperand(1).getImm();
for (MachineBasicBlock::iterator NextExportInst = std::next(I),		for (MachineBasicBlock::iterator NextExportInst = std::next(I),
EndBlock = BB->end(); NextExportInst != EndBlock;		EndBlock = BB->end(); NextExportInst != EndBlock;
NextExportInst = std::next(NextExportInst)) {		NextExportInst = std::next(NextExportInst)) {
if (NextExportInst->getOpcode() == AMDGPU::EG_ExportSwz ||		if (NextExportInst->getOpcode() == R600::EG_ExportSwz ||
NextExportInst->getOpcode() == AMDGPU::R600_ExportSwz) {		NextExportInst->getOpcode() == R600::R600_ExportSwz) {
unsigned CurrentInstExportType = NextExportInst->getOperand(1)		unsigned CurrentInstExportType = NextExportInst->getOperand(1)
.getImm();		.getImm();
if (CurrentInstExportType == InstExportType) {		if (CurrentInstExportType == InstExportType) {
isLastInstructionOfItsType = false;		isLastInstructionOfItsType = false;
break;		break;
}		}
}		}
}		}
bool EOP = isEOP(I);		bool EOP = isEOP(I);
if (!EOP && !isLastInstructionOfItsType)		if (!EOP && !isLastInstructionOfItsType)
return BB;		return BB;
unsigned CfInst = (MI.getOpcode() == AMDGPU::EG_ExportSwz) ? 84 : 40;		unsigned CfInst = (MI.getOpcode() == R600::EG_ExportSwz) ? 84 : 40;
BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(MI.getOpcode()))		BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(MI.getOpcode()))
.add(MI.getOperand(0))		.add(MI.getOperand(0))
.add(MI.getOperand(1))		.add(MI.getOperand(1))
.add(MI.getOperand(2))		.add(MI.getOperand(2))
.add(MI.getOperand(3))		.add(MI.getOperand(3))
.add(MI.getOperand(4))		.add(MI.getOperand(4))
.add(MI.getOperand(5))		.add(MI.getOperand(5))
.add(MI.getOperand(6))		.add(MI.getOperand(6))
.addImm(CfInst)		.addImm(CfInst)
.addImm(EOP);		.addImm(EOP);
break;		break;
}		}
case AMDGPU::RETURN: {		case R600::RETURN: {
return BB;		return BB;
}		}
}		}

MI.eraseFromParent();		MI.eraseFromParent();
return BB;		return BB;
}		}
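
The export case above leaves an export untouched unless it ends the program or is the last export of its type in the block. The forward scan behind that decision can be sketched in isolation. `ToyInst` and `isLastExportOfItsType` are hypothetical stand-ins invented for this illustration, not LLVM types, and the sketch assumes the export type is the only field that matters.

```cpp
#include <cstddef>

// Hypothetical stand-in for a machine instruction (not an LLVM type).
struct ToyInst {
  bool IsExport;       // true for EG_ExportSwz / R600_ExportSwz-like opcodes
  unsigned ExportType; // plays the role of MI.getOperand(1).getImm()
};

// Returns true when no export after Block[Idx] has the same export type.
template <std::size_t N>
constexpr bool isLastExportOfItsType(const ToyInst (&Block)[N],
                                     std::size_t Idx) {
  for (std::size_t J = Idx + 1; J < N; ++J)
    if (Block[J].IsExport && Block[J].ExportType == Block[Idx].ExportType)
      return false;
  return true;
}
```

For a block whose exports have types 0, 1, 0 in that order, the first export is not the last of its type, while the other two are.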

[139 lines not shown]	case ISD::INTRINSIC_WO_CHAIN: {
case Intrinsic::r600_read_local_size_x:		case Intrinsic::r600_read_local_size_x:
return LowerImplicitParameter(DAG, VT, DL, 6);		return LowerImplicitParameter(DAG, VT, DL, 6);
case Intrinsic::r600_read_local_size_y:		case Intrinsic::r600_read_local_size_y:
return LowerImplicitParameter(DAG, VT, DL, 7);		return LowerImplicitParameter(DAG, VT, DL, 7);
case Intrinsic::r600_read_local_size_z:		case Intrinsic::r600_read_local_size_z:
return LowerImplicitParameter(DAG, VT, DL, 8);		return LowerImplicitParameter(DAG, VT, DL, 8);

case Intrinsic::r600_read_tgid_x:		case Intrinsic::r600_read_tgid_x:
return CreateLiveInRegisterRaw(DAG, &AMDGPU::R600_TReg32RegClass,		return CreateLiveInRegisterRaw(DAG, &R600::R600_TReg32RegClass,
AMDGPU::T1_X, VT);		R600::T1_X, VT);
case Intrinsic::r600_read_tgid_y:		case Intrinsic::r600_read_tgid_y:
return CreateLiveInRegisterRaw(DAG, &AMDGPU::R600_TReg32RegClass,		return CreateLiveInRegisterRaw(DAG, &R600::R600_TReg32RegClass,
AMDGPU::T1_Y, VT);		R600::T1_Y, VT);
case Intrinsic::r600_read_tgid_z:		case Intrinsic::r600_read_tgid_z:
return CreateLiveInRegisterRaw(DAG, &AMDGPU::R600_TReg32RegClass,		return CreateLiveInRegisterRaw(DAG, &R600::R600_TReg32RegClass,
AMDGPU::T1_Z, VT);		R600::T1_Z, VT);
case Intrinsic::r600_read_tidig_x:		case Intrinsic::r600_read_tidig_x:
return CreateLiveInRegisterRaw(DAG, &AMDGPU::R600_TReg32RegClass,		return CreateLiveInRegisterRaw(DAG, &R600::R600_TReg32RegClass,
AMDGPU::T0_X, VT);		R600::T0_X, VT);
case Intrinsic::r600_read_tidig_y:		case Intrinsic::r600_read_tidig_y:
return CreateLiveInRegisterRaw(DAG, &AMDGPU::R600_TReg32RegClass,		return CreateLiveInRegisterRaw(DAG, &R600::R600_TReg32RegClass,
AMDGPU::T0_Y, VT);		R600::T0_Y, VT);
case Intrinsic::r600_read_tidig_z:		case Intrinsic::r600_read_tidig_z:
return CreateLiveInRegisterRaw(DAG, &AMDGPU::R600_TReg32RegClass,		return CreateLiveInRegisterRaw(DAG, &R600::R600_TReg32RegClass,
AMDGPU::T0_Z, VT);		R600::T0_Z, VT);

case Intrinsic::r600_recipsqrt_ieee:		case Intrinsic::r600_recipsqrt_ieee:
return DAG.getNode(AMDGPUISD::RSQ, DL, VT, Op.getOperand(1));		return DAG.getNode(AMDGPUISD::RSQ, DL, VT, Op.getOperand(1));

case Intrinsic::r600_recipsqrt_clamped:		case Intrinsic::r600_recipsqrt_clamped:
return DAG.getNode(AMDGPUISD::RSQ_CLAMP, DL, VT, Op.getOperand(1));		return DAG.getNode(AMDGPUISD::RSQ_CLAMP, DL, VT, Op.getOperand(1));
default:		default:
return Op;		return Op;
[905 lines not shown]	SDValue R600TargetLowering::LowerBRCOND(SDValue Op, SelectionDAG &DAG) const {

return DAG.getNode(AMDGPUISD::BRANCH_COND, SDLoc(Op), Op.getValueType(),		return DAG.getNode(AMDGPUISD::BRANCH_COND, SDLoc(Op), Op.getValueType(),
Chain, Jump, Cond);		Chain, Jump, Cond);
}		}

SDValue R600TargetLowering::lowerFrameIndex(SDValue Op,		SDValue R600TargetLowering::lowerFrameIndex(SDValue Op,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
MachineFunction &MF = DAG.getMachineFunction();		MachineFunction &MF = DAG.getMachineFunction();
const R600FrameLowering *TFL = getSubtarget()->getFrameLowering();		const R600FrameLowering *TFL = Subtarget->getFrameLowering();

FrameIndexSDNode *FIN = cast<FrameIndexSDNode>(Op);		FrameIndexSDNode *FIN = cast<FrameIndexSDNode>(Op);

unsigned FrameIndex = FIN->getIndex();		unsigned FrameIndex = FIN->getIndex();
unsigned IgnoredFrameReg;		unsigned IgnoredFrameReg;
unsigned Offset =		unsigned Offset =
TFL->getFrameIndexReference(MF, FrameIndex, IgnoredFrameReg);		TFL->getFrameIndexReference(MF, FrameIndex, IgnoredFrameReg);
return DAG.getConstant(Offset * 4 * TFL->getStackWidth(MF), SDLoc(Op),		return DAG.getConstant(Offset * 4 * TFL->getStackWidth(MF), SDLoc(Op),
Op.getValueType());		Op.getValueType());
}		}

		CCAssignFn *R600TargetLowering::CCAssignFnForCall(CallingConv::ID CC,
		bool IsVarArg) const {
		switch (CC) {
		case CallingConv::AMDGPU_KERNEL:
		case CallingConv::SPIR_KERNEL:
		case CallingConv::C:
		case CallingConv::Fast:
		case CallingConv::Cold:
		arsenm (inline comment): Probably should reject these, but that's a separate change
		return CC_R600_Kernel;
		case CallingConv::AMDGPU_VS:
		case CallingConv::AMDGPU_GS:
		case CallingConv::AMDGPU_PS:
		case CallingConv::AMDGPU_CS:
		case CallingConv::AMDGPU_HS:
		case CallingConv::AMDGPU_ES:
		case CallingConv::AMDGPU_LS:
		return CC_R600;
		default:
		report_fatal_error("Unsupported calling convention.");
		}
		}
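
The new `CCAssignFnForCall` groups calling conventions into a kernel table and a shader table and rejects everything else. A minimal sketch of that dispatch follows. `ToyCC`, `ToyAssign`, and `assignTableFor` are hypothetical names for illustration only. The sketch returns a sentinel for unsupported conventions, whereas the code in the diff calls `report_fatal_error`.

```cpp
// Hypothetical calling-convention and assignment-table tags (not LLVM enums).
enum class ToyCC { Kernel, SpirKernel, C, Fast, Cold,
                   VS, GS, PS, CS, HS, ES, LS, Other };
enum class ToyAssign { KernelTable, ShaderTable, Unsupported };

// Maps a calling convention to the table that assigns its arguments.
constexpr ToyAssign assignTableFor(ToyCC CC) {
  switch (CC) {
  case ToyCC::Kernel:
  case ToyCC::SpirKernel:
  case ToyCC::C:
  case ToyCC::Fast:
  case ToyCC::Cold:
    return ToyAssign::KernelTable; // corresponds to CC_R600_Kernel
  case ToyCC::VS:
  case ToyCC::GS:
  case ToyCC::PS:
  case ToyCC::CS:
  case ToyCC::HS:
  case ToyCC::ES:
  case ToyCC::LS:
    return ToyAssign::ShaderTable; // corresponds to CC_R600
  default:
    return ToyAssign::Unsupported; // the diff reports a fatal error here
  }
}
```

The `C`, `Fast`, and `Cold` cases are the ones the inline comment suggests rejecting in a later change.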

/// XXX Only kernel functions are supported, so we can assume for now that		/// XXX Only kernel functions are supported, so we can assume for now that
/// every function is a kernel function, but in the future we should use		/// every function is a kernel function, but in the future we should use
/// separate calling conventions for kernel and non-kernel functions.		/// separate calling conventions for kernel and non-kernel functions.
SDValue R600TargetLowering::LowerFormalArguments(		SDValue R600TargetLowering::LowerFormalArguments(
SDValue Chain, CallingConv::ID CallConv, bool isVarArg,		SDValue Chain, CallingConv::ID CallConv, bool isVarArg,
const SmallVectorImpl<ISD::InputArg> &Ins, const SDLoc &DL,		const SmallVectorImpl<ISD::InputArg> &Ins, const SDLoc &DL,
SelectionDAG &DAG, SmallVectorImpl<SDValue> &InVals) const {		SelectionDAG &DAG, SmallVectorImpl<SDValue> &InVals) const {
SmallVector<CCValAssign, 16> ArgLocs;		SmallVector<CCValAssign, 16> ArgLocs;
[16 lines not shown]	for (unsigned i = 0, e = Ins.size(); i < e; ++i) {
EVT VT = In.VT;		EVT VT = In.VT;
EVT MemVT = VA.getLocVT();		EVT MemVT = VA.getLocVT();
if (!VT.isVector() && MemVT.isVector()) {		if (!VT.isVector() && MemVT.isVector()) {
// Get load source type if scalarized.		// Get load source type if scalarized.
MemVT = MemVT.getVectorElementType();		MemVT = MemVT.getVectorElementType();
}		}

if (AMDGPU::isShader(CallConv)) {		if (AMDGPU::isShader(CallConv)) {
unsigned Reg = MF.addLiveIn(VA.getLocReg(), &AMDGPU::R600_Reg128RegClass);		unsigned Reg = MF.addLiveIn(VA.getLocReg(), &R600::R600_Reg128RegClass);
SDValue Register = DAG.getCopyFromReg(Chain, DL, Reg, VT);		SDValue Register = DAG.getCopyFromReg(Chain, DL, Reg, VT);
InVals.push_back(Register);		InVals.push_back(Register);
continue;		continue;
}		}

PointerType PtrTy = PointerType::get(VT.getTypeForEVT(DAG.getContext()),		PointerType PtrTy = PointerType::get(VT.getTypeForEVT(DAG.getContext()),
AMDGPUASI.CONSTANT_BUFFER_0);		AMDGPUASI.CONSTANT_BUFFER_0);

[14 lines not shown]	for (unsigned i = 0, e = Ins.size(); i < e; ++i) {
}		}

// Compute the offset from the value.		// Compute the offset from the value.
// XXX - I think PartOffset should give you this, but it seems to give the		// XXX - I think PartOffset should give you this, but it seems to give the
// size of the register which isn't useful.		// size of the register which isn't useful.

unsigned ValBase = ArgLocs[In.getOrigArgIndex()].getLocMemOffset();		unsigned ValBase = ArgLocs[In.getOrigArgIndex()].getLocMemOffset();
unsigned PartOffset = VA.getLocMemOffset();		unsigned PartOffset = VA.getLocMemOffset();
unsigned Offset = Subtarget->getExplicitKernelArgOffset(MF.getFunction()) +		unsigned Offset = Subtarget->getExplicitKernelArgOffset(MF) +
VA.getLocMemOffset();		VA.getLocMemOffset();

MachinePointerInfo PtrInfo(UndefValue::get(PtrTy), PartOffset - ValBase);		MachinePointerInfo PtrInfo(UndefValue::get(PtrTy), PartOffset - ValBase);
SDValue Arg = DAG.getLoad(		SDValue Arg = DAG.getLoad(
ISD::UNINDEXED, Ext, VT, DL, Chain,		ISD::UNINDEXED, Ext, VT, DL, Chain,
DAG.getConstant(Offset, DL, MVT::i32), DAG.getUNDEF(MVT::i32), PtrInfo,		DAG.getConstant(Offset, DL, MVT::i32), DAG.getUNDEF(MVT::i32), PtrInfo,
MemVT, /* Alignment = */ 4, MachineMemOperand::MONonTemporal |		MemVT, /* Alignment = */ 4, MachineMemOperand::MONonTemporal |
MachineMemOperand::MODereferenceable |		MachineMemOperand::MODereferenceable |
[371 lines not shown]	SDValue R600TargetLowering::PerformDAGCombine(SDNode *N,

return AMDGPUTargetLowering::PerformDAGCombine(N, DCI);		return AMDGPUTargetLowering::PerformDAGCombine(N, DCI);
}		}

bool R600TargetLowering::FoldOperand(SDNode *ParentNode, unsigned SrcIdx,		bool R600TargetLowering::FoldOperand(SDNode *ParentNode, unsigned SrcIdx,
SDValue &Src, SDValue &Neg, SDValue &Abs,		SDValue &Src, SDValue &Neg, SDValue &Abs,
SDValue &Sel, SDValue &Imm,		SDValue &Sel, SDValue &Imm,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
const R600InstrInfo *TII = getSubtarget()->getInstrInfo();		const R600InstrInfo *TII = Subtarget->getInstrInfo();
if (!Src.isMachineOpcode())		if (!Src.isMachineOpcode())
return false;		return false;

switch (Src.getMachineOpcode()) {		switch (Src.getMachineOpcode()) {
case AMDGPU::FNEG_R600:		case R600::FNEG_R600:
if (!Neg.getNode())		if (!Neg.getNode())
return false;		return false;
Src = Src.getOperand(0);		Src = Src.getOperand(0);
Neg = DAG.getTargetConstant(1, SDLoc(ParentNode), MVT::i32);		Neg = DAG.getTargetConstant(1, SDLoc(ParentNode), MVT::i32);
return true;		return true;
case AMDGPU::FABS_R600:		case R600::FABS_R600:
if (!Abs.getNode())		if (!Abs.getNode())
return false;		return false;
Src = Src.getOperand(0);		Src = Src.getOperand(0);
Abs = DAG.getTargetConstant(1, SDLoc(ParentNode), MVT::i32);		Abs = DAG.getTargetConstant(1, SDLoc(ParentNode), MVT::i32);
return true;		return true;
case AMDGPU::CONST_COPY: {		case R600::CONST_COPY: {
unsigned Opcode = ParentNode->getMachineOpcode();		unsigned Opcode = ParentNode->getMachineOpcode();
bool HasDst = TII->getOperandIdx(Opcode, AMDGPU::OpName::dst) > -1;		bool HasDst = TII->getOperandIdx(Opcode, R600::OpName::dst) > -1;

if (!Sel.getNode())		if (!Sel.getNode())
return false;		return false;

SDValue CstOffset = Src.getOperand(0);		SDValue CstOffset = Src.getOperand(0);
if (ParentNode->getValueType(0).isVector())		if (ParentNode->getValueType(0).isVector())
return false;		return false;

// Gather constants values		// Gather constants values
int SrcIndices[] = {		int SrcIndices[] = {
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0),		TII->getOperandIdx(Opcode, R600::OpName::src0),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1),		TII->getOperandIdx(Opcode, R600::OpName::src1),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src2),		TII->getOperandIdx(Opcode, R600::OpName::src2),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0_X),		TII->getOperandIdx(Opcode, R600::OpName::src0_X),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0_Y),		TII->getOperandIdx(Opcode, R600::OpName::src0_Y),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0_Z),		TII->getOperandIdx(Opcode, R600::OpName::src0_Z),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0_W),		TII->getOperandIdx(Opcode, R600::OpName::src0_W),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1_X),		TII->getOperandIdx(Opcode, R600::OpName::src1_X),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1_Y),		TII->getOperandIdx(Opcode, R600::OpName::src1_Y),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1_Z),		TII->getOperandIdx(Opcode, R600::OpName::src1_Z),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1_W)		TII->getOperandIdx(Opcode, R600::OpName::src1_W)
};		};
std::vector<unsigned> Consts;		std::vector<unsigned> Consts;
for (int OtherSrcIdx : SrcIndices) {		for (int OtherSrcIdx : SrcIndices) {
int OtherSelIdx = TII->getSelIdx(Opcode, OtherSrcIdx);		int OtherSelIdx = TII->getSelIdx(Opcode, OtherSrcIdx);
if (OtherSrcIdx < 0 || OtherSelIdx < 0)		if (OtherSrcIdx < 0 || OtherSelIdx < 0)
continue;		continue;
if (HasDst) {		if (HasDst) {
OtherSrcIdx--;		OtherSrcIdx--;
OtherSelIdx--;		OtherSelIdx--;
}		}
if (RegisterSDNode *Reg =		if (RegisterSDNode *Reg =
dyn_cast<RegisterSDNode>(ParentNode->getOperand(OtherSrcIdx))) {		dyn_cast<RegisterSDNode>(ParentNode->getOperand(OtherSrcIdx))) {
if (Reg->getReg() == AMDGPU::ALU_CONST) {		if (Reg->getReg() == R600::ALU_CONST) {
ConstantSDNode *Cst		ConstantSDNode *Cst
= cast<ConstantSDNode>(ParentNode->getOperand(OtherSelIdx));		= cast<ConstantSDNode>(ParentNode->getOperand(OtherSelIdx));
Consts.push_back(Cst->getZExtValue());		Consts.push_back(Cst->getZExtValue());
}		}
}		}
}		}

ConstantSDNode *Cst = cast<ConstantSDNode>(CstOffset);		ConstantSDNode *Cst = cast<ConstantSDNode>(CstOffset);
Consts.push_back(Cst->getZExtValue());		Consts.push_back(Cst->getZExtValue());
if (!TII->fitsConstReadLimitations(Consts)) {		if (!TII->fitsConstReadLimitations(Consts)) {
return false;		return false;
}		}

Sel = CstOffset;		Sel = CstOffset;
Src = DAG.getRegister(AMDGPU::ALU_CONST, MVT::f32);		Src = DAG.getRegister(R600::ALU_CONST, MVT::f32);
return true;		return true;
}		}
case AMDGPU::MOV_IMM_GLOBAL_ADDR:		case R600::MOV_IMM_GLOBAL_ADDR:
// Check if the Imm slot is used. Taken from below.		// Check if the Imm slot is used. Taken from below.
if (cast<ConstantSDNode>(Imm)->getZExtValue())		if (cast<ConstantSDNode>(Imm)->getZExtValue())
return false;		return false;
Imm = Src.getOperand(0);		Imm = Src.getOperand(0);
Src = DAG.getRegister(AMDGPU::ALU_LITERAL_X, MVT::i32);		Src = DAG.getRegister(R600::ALU_LITERAL_X, MVT::i32);
return true;		return true;
case AMDGPU::MOV_IMM_I32:		case R600::MOV_IMM_I32:
case AMDGPU::MOV_IMM_F32: {		case R600::MOV_IMM_F32: {
unsigned ImmReg = AMDGPU::ALU_LITERAL_X;		unsigned ImmReg = R600::ALU_LITERAL_X;
uint64_t ImmValue = 0;		uint64_t ImmValue = 0;

if (Src.getMachineOpcode() == AMDGPU::MOV_IMM_F32) {		if (Src.getMachineOpcode() == R600::MOV_IMM_F32) {
ConstantFPSDNode *FPC = dyn_cast<ConstantFPSDNode>(Src.getOperand(0));		ConstantFPSDNode *FPC = dyn_cast<ConstantFPSDNode>(Src.getOperand(0));
float FloatValue = FPC->getValueAPF().convertToFloat();		float FloatValue = FPC->getValueAPF().convertToFloat();
if (FloatValue == 0.0) {		if (FloatValue == 0.0) {
ImmReg = AMDGPU::ZERO;		ImmReg = R600::ZERO;
} else if (FloatValue == 0.5) {		} else if (FloatValue == 0.5) {
ImmReg = AMDGPU::HALF;		ImmReg = R600::HALF;
} else if (FloatValue == 1.0) {		} else if (FloatValue == 1.0) {
ImmReg = AMDGPU::ONE;		ImmReg = R600::ONE;
} else {		} else {
ImmValue = FPC->getValueAPF().bitcastToAPInt().getZExtValue();		ImmValue = FPC->getValueAPF().bitcastToAPInt().getZExtValue();
}		}
} else {		} else {
ConstantSDNode *C = dyn_cast<ConstantSDNode>(Src.getOperand(0));		ConstantSDNode *C = dyn_cast<ConstantSDNode>(Src.getOperand(0));
uint64_t Value = C->getZExtValue();		uint64_t Value = C->getZExtValue();
if (Value == 0) {		if (Value == 0) {
ImmReg = AMDGPU::ZERO;		ImmReg = R600::ZERO;
} else if (Value == 1) {		} else if (Value == 1) {
ImmReg = AMDGPU::ONE_INT;		ImmReg = R600::ONE_INT;
} else {		} else {
ImmValue = Value;		ImmValue = Value;
}		}
}		}

// Check that we aren't already using an immediate.		// Check that we aren't already using an immediate.
// XXX: It's possible for an instruction to have more than one		// XXX: It's possible for an instruction to have more than one
// immediate operand, but this is not supported yet.		// immediate operand, but this is not supported yet.
if (ImmReg == AMDGPU::ALU_LITERAL_X) {		if (ImmReg == R600::ALU_LITERAL_X) {
if (!Imm.getNode())		if (!Imm.getNode())
return false;		return false;
ConstantSDNode *C = dyn_cast<ConstantSDNode>(Imm);		ConstantSDNode *C = dyn_cast<ConstantSDNode>(Imm);
assert(C);		assert(C);
if (C->getZExtValue())		if (C->getZExtValue())
return false;		return false;
Imm = DAG.getTargetConstant(ImmValue, SDLoc(ParentNode), MVT::i32);		Imm = DAG.getTargetConstant(ImmValue, SDLoc(ParentNode), MVT::i32);
}		}
Src = DAG.getRegister(ImmReg, MVT::i32);		Src = DAG.getRegister(ImmReg, MVT::i32);
return true;		return true;
}		}
default:		default:
return false;		return false;
}		}
}		}
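
The `MOV_IMM_I32` / `MOV_IMM_F32` case in `FoldOperand` prefers an inline constant register (`ZERO`, `HALF`, `ONE`, `ONE_INT`) and falls back to the literal slot only when none matches. A minimal sketch of that selection is below. `ToyImmReg`, `regForFloat`, `regForInt`, and `canClaimLiteralSlot` are hypothetical names, not LLVM APIs. The sketch omits the case where no literal operand exists at all, which the real code also treats as a failed fold.

```cpp
#include <cstdint>

// Hypothetical tags for the registers FoldOperand can substitute.
enum class ToyImmReg { Zero, Half, One, OneInt, LiteralX };

// Float immediates: 0.0, 0.5 and 1.0 have dedicated inline registers.
constexpr ToyImmReg regForFloat(float V) {
  return V == 0.0f ? ToyImmReg::Zero
       : V == 0.5f ? ToyImmReg::Half
       : V == 1.0f ? ToyImmReg::One
                   : ToyImmReg::LiteralX;
}

// Integer immediates: only 0 and 1 have dedicated inline registers.
constexpr ToyImmReg regForInt(uint64_t V) {
  return V == 0 ? ToyImmReg::Zero
       : V == 1 ? ToyImmReg::OneInt
                : ToyImmReg::LiteralX;
}

// Mirrors the "already using an immediate" check: the literal slot can be
// claimed only while it still holds 0; inline registers never need it.
constexpr bool canClaimLiteralSlot(ToyImmReg Reg, uint64_t CurrentImm) {
  return Reg != ToyImmReg::LiteralX || CurrentImm == 0;
}
```

Under these assumptions, folding `2.0f` into an instruction whose literal slot already holds a nonzero value is rejected, while folding `0.5f` into the same instruction succeeds because `HALF` needs no literal.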

/// Fold the instructions after selecting them		/// Fold the instructions after selecting them
SDNode R600TargetLowering::PostISelFolding(MachineSDNode Node,		SDNode R600TargetLowering::PostISelFolding(MachineSDNode Node,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
const R600InstrInfo *TII = getSubtarget()->getInstrInfo();		const R600InstrInfo *TII = Subtarget->getInstrInfo();
if (!Node->isMachineOpcode())		if (!Node->isMachineOpcode())
return Node;		return Node;

unsigned Opcode = Node->getMachineOpcode();		unsigned Opcode = Node->getMachineOpcode();
SDValue FakeOp;		SDValue FakeOp;

std::vector<SDValue> Ops(Node->op_begin(), Node->op_end());		std::vector<SDValue> Ops(Node->op_begin(), Node->op_end());

if (Opcode == AMDGPU::DOT_4) {		if (Opcode == R600::DOT_4) {
int OperandIdx[] = {		int OperandIdx[] = {
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0_X),		TII->getOperandIdx(Opcode, R600::OpName::src0_X),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0_Y),		TII->getOperandIdx(Opcode, R600::OpName::src0_Y),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0_Z),		TII->getOperandIdx(Opcode, R600::OpName::src0_Z),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0_W),		TII->getOperandIdx(Opcode, R600::OpName::src0_W),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1_X),		TII->getOperandIdx(Opcode, R600::OpName::src1_X),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1_Y),		TII->getOperandIdx(Opcode, R600::OpName::src1_Y),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1_Z),		TII->getOperandIdx(Opcode, R600::OpName::src1_Z),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1_W)		TII->getOperandIdx(Opcode, R600::OpName::src1_W)
};		};
int NegIdx[] = {		int NegIdx[] = {
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0_neg_X),		TII->getOperandIdx(Opcode, R600::OpName::src0_neg_X),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0_neg_Y),		TII->getOperandIdx(Opcode, R600::OpName::src0_neg_Y),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0_neg_Z),		TII->getOperandIdx(Opcode, R600::OpName::src0_neg_Z),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0_neg_W),		TII->getOperandIdx(Opcode, R600::OpName::src0_neg_W),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1_neg_X),		TII->getOperandIdx(Opcode, R600::OpName::src1_neg_X),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1_neg_Y),		TII->getOperandIdx(Opcode, R600::OpName::src1_neg_Y),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1_neg_Z),		TII->getOperandIdx(Opcode, R600::OpName::src1_neg_Z),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1_neg_W)		TII->getOperandIdx(Opcode, R600::OpName::src1_neg_W)
};		};
int AbsIdx[] = {		int AbsIdx[] = {
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0_abs_X),		TII->getOperandIdx(Opcode, R600::OpName::src0_abs_X),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0_abs_Y),		TII->getOperandIdx(Opcode, R600::OpName::src0_abs_Y),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0_abs_Z),		TII->getOperandIdx(Opcode, R600::OpName::src0_abs_Z),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0_abs_W),		TII->getOperandIdx(Opcode, R600::OpName::src0_abs_W),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1_abs_X),		TII->getOperandIdx(Opcode, R600::OpName::src1_abs_X),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1_abs_Y),		TII->getOperandIdx(Opcode, R600::OpName::src1_abs_Y),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1_abs_Z),		TII->getOperandIdx(Opcode, R600::OpName::src1_abs_Z),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1_abs_W)		TII->getOperandIdx(Opcode, R600::OpName::src1_abs_W)
};		};
for (unsigned i = 0; i < 8; i++) {		for (unsigned i = 0; i < 8; i++) {
if (OperandIdx[i] < 0)		if (OperandIdx[i] < 0)
return Node;		return Node;
SDValue &Src = Ops[OperandIdx[i] - 1];		SDValue &Src = Ops[OperandIdx[i] - 1];
SDValue &Neg = Ops[NegIdx[i] - 1];		SDValue &Neg = Ops[NegIdx[i] - 1];
SDValue &Abs = Ops[AbsIdx[i] - 1];		SDValue &Abs = Ops[AbsIdx[i] - 1];
bool HasDst = TII->getOperandIdx(Opcode, AMDGPU::OpName::dst) > -1;		bool HasDst = TII->getOperandIdx(Opcode, R600::OpName::dst) > -1;
int SelIdx = TII->getSelIdx(Opcode, OperandIdx[i]);		int SelIdx = TII->getSelIdx(Opcode, OperandIdx[i]);
if (HasDst)		if (HasDst)
SelIdx--;		SelIdx--;
SDValue &Sel = (SelIdx > -1) ? Ops[SelIdx] : FakeOp;		SDValue &Sel = (SelIdx > -1) ? Ops[SelIdx] : FakeOp;
if (FoldOperand(Node, i, Src, Neg, Abs, Sel, FakeOp, DAG))		if (FoldOperand(Node, i, Src, Neg, Abs, Sel, FakeOp, DAG))
return DAG.getMachineNode(Opcode, SDLoc(Node), Node->getVTList(), Ops);		return DAG.getMachineNode(Opcode, SDLoc(Node), Node->getVTList(), Ops);
}		}
} else if (Opcode == AMDGPU::REG_SEQUENCE) {		} else if (Opcode == R600::REG_SEQUENCE) {
for (unsigned i = 1, e = Node->getNumOperands(); i < e; i += 2) {		for (unsigned i = 1, e = Node->getNumOperands(); i < e; i += 2) {
SDValue &Src = Ops[i];		SDValue &Src = Ops[i];
if (FoldOperand(Node, i, Src, FakeOp, FakeOp, FakeOp, FakeOp, DAG))		if (FoldOperand(Node, i, Src, FakeOp, FakeOp, FakeOp, FakeOp, DAG))
return DAG.getMachineNode(Opcode, SDLoc(Node), Node->getVTList(), Ops);		return DAG.getMachineNode(Opcode, SDLoc(Node), Node->getVTList(), Ops);
}		}
} else {		} else {
if (!TII->hasInstrModifiers(Opcode))		if (!TII->hasInstrModifiers(Opcode))
return Node;		return Node;
int OperandIdx[] = {		int OperandIdx[] = {
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0),		TII->getOperandIdx(Opcode, R600::OpName::src0),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1),		TII->getOperandIdx(Opcode, R600::OpName::src1),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src2)		TII->getOperandIdx(Opcode, R600::OpName::src2)
};		};
int NegIdx[] = {		int NegIdx[] = {
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0_neg),		TII->getOperandIdx(Opcode, R600::OpName::src0_neg),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1_neg),		TII->getOperandIdx(Opcode, R600::OpName::src1_neg),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src2_neg)		TII->getOperandIdx(Opcode, R600::OpName::src2_neg)
};		};
int AbsIdx[] = {		int AbsIdx[] = {
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0_abs),		TII->getOperandIdx(Opcode, R600::OpName::src0_abs),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1_abs),		TII->getOperandIdx(Opcode, R600::OpName::src1_abs),
-1		-1
};		};
for (unsigned i = 0; i < 3; i++) {		for (unsigned i = 0; i < 3; i++) {
if (OperandIdx[i] < 0)		if (OperandIdx[i] < 0)
return Node;		return Node;
SDValue &Src = Ops[OperandIdx[i] - 1];		SDValue &Src = Ops[OperandIdx[i] - 1];
SDValue &Neg = Ops[NegIdx[i] - 1];		SDValue &Neg = Ops[NegIdx[i] - 1];
SDValue FakeAbs;		SDValue FakeAbs;
SDValue &Abs = (AbsIdx[i] > -1) ? Ops[AbsIdx[i] - 1] : FakeAbs;		SDValue &Abs = (AbsIdx[i] > -1) ? Ops[AbsIdx[i] - 1] : FakeAbs;
bool HasDst = TII->getOperandIdx(Opcode, AMDGPU::OpName::dst) > -1;		bool HasDst = TII->getOperandIdx(Opcode, R600::OpName::dst) > -1;
int SelIdx = TII->getSelIdx(Opcode, OperandIdx[i]);		int SelIdx = TII->getSelIdx(Opcode, OperandIdx[i]);
int ImmIdx = TII->getOperandIdx(Opcode, AMDGPU::OpName::literal);		int ImmIdx = TII->getOperandIdx(Opcode, R600::OpName::literal);
if (HasDst) {		if (HasDst) {
SelIdx--;		SelIdx--;
ImmIdx--;		ImmIdx--;
}		}
SDValue &Sel = (SelIdx > -1) ? Ops[SelIdx] : FakeOp;		SDValue &Sel = (SelIdx > -1) ? Ops[SelIdx] : FakeOp;
SDValue &Imm = Ops[ImmIdx];		SDValue &Imm = Ops[ImmIdx];
if (FoldOperand(Node, i, Src, Neg, Abs, Sel, Imm, DAG))		if (FoldOperand(Node, i, Src, Neg, Abs, Sel, Imm, DAG))
return DAG.getMachineNode(Opcode, SDLoc(Node), Node->getVTList(), Ops);		return DAG.getMachineNode(Opcode, SDLoc(Node), Node->getVTList(), Ops);
}		}
}		}

return Node;		return Node;
}		}

lib/Target/AMDGPU/R600InstrFormats.td

[35 lines not shown]	class InstR600 <dag outs, dag ins, string asm, list<dag> pattern,
bit HasNativeOperands = 0;		bit HasNativeOperands = 0;
bit VTXInst = 0;		bit VTXInst = 0;
bit TEXInst = 0;		bit TEXInst = 0;
bit ALUInst = 0;		bit ALUInst = 0;
bit IsExport = 0;		bit IsExport = 0;
bit LDS_1A2D = 0;		bit LDS_1A2D = 0;

let SubtargetPredicate = isR600toCayman;		let SubtargetPredicate = isR600toCayman;
let Namespace = "AMDGPU";		let Namespace = "R600";
let OutOperandList = outs;		let OutOperandList = outs;
let InOperandList = ins;		let InOperandList = ins;
let AsmString = asm;		let AsmString = asm;
let Pattern = pattern;		let Pattern = pattern;
let Itinerary = itin;		let Itinerary = itin;

// No AsmMatcher support.		// No AsmMatcher support.
let isCodeGenOnly = 1;		let isCodeGenOnly = 1;
[453 lines not shown]

lib/Target/AMDGPU/R600InstrInfo.h

[9 lines not shown]
/// \file		/// \file
/// Interface definition for R600InstrInfo		/// Interface definition for R600InstrInfo
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_LIB_TARGET_AMDGPU_R600INSTRINFO_H		#ifndef LLVM_LIB_TARGET_AMDGPU_R600INSTRINFO_H
#define LLVM_LIB_TARGET_AMDGPU_R600INSTRINFO_H		#define LLVM_LIB_TARGET_AMDGPU_R600INSTRINFO_H

#include "AMDGPUInstrInfo.h"
#include "R600RegisterInfo.h"		#include "R600RegisterInfo.h"
		#include "llvm/CodeGen/TargetInstrInfo.h"

		#define GET_INSTRINFO_HEADER
		#include "R600GenInstrInfo.inc"

namespace llvm {		namespace llvm {

namespace R600InstrFlags {		namespace R600InstrFlags {
enum : uint64_t {		enum : uint64_t {
REGISTER_STORE = UINT64_C(1) << 62,		REGISTER_STORE = UINT64_C(1) << 62,
REGISTER_LOAD = UINT64_C(1) << 63		REGISTER_LOAD = UINT64_C(1) << 63
};		};
}		}

class AMDGPUTargetMachine;		class AMDGPUTargetMachine;
class DFAPacketizer;		class DFAPacketizer;
class MachineFunction;		class MachineFunction;
class MachineInstr;		class MachineInstr;
class MachineInstrBuilder;		class MachineInstrBuilder;
class R600Subtarget;		class R600Subtarget;

class R600InstrInfo final : public AMDGPUInstrInfo {		class R600InstrInfo final : public R600GenInstrInfo {
private:		private:
const R600RegisterInfo RI;		const R600RegisterInfo RI;
const R600Subtarget &ST;		const R600Subtarget &ST;

std::vector<std::pair<int, unsigned>>		std::vector<std::pair<int, unsigned>>
ExtractSrcs(MachineInstr &MI, const DenseMap<unsigned, unsigned> &PV,		ExtractSrcs(MachineInstr &MI, const DenseMap<unsigned, unsigned> &PV,
unsigned &ConstCount) const;		unsigned &ConstCount) const;

[273 lines not shown]	public:
bool isRegisterLoad(const MachineInstr &MI) const {		bool isRegisterLoad(const MachineInstr &MI) const {
return get(MI.getOpcode()).TSFlags & R600InstrFlags::REGISTER_LOAD;		return get(MI.getOpcode()).TSFlags & R600InstrFlags::REGISTER_LOAD;
}		}

unsigned getAddressSpaceForPseudoSourceKind(		unsigned getAddressSpaceForPseudoSourceKind(
PseudoSourceValue::PSVKind Kind) const override;		PseudoSourceValue::PSVKind Kind) const override;
};		};

namespace AMDGPU {		namespace R600 {

int getLDSNoRetOp(uint16_t Opcode);		int getLDSNoRetOp(uint16_t Opcode);

} //End namespace AMDGPU		} //End namespace AMDGPU

} // End llvm namespace		} // End llvm namespace

#endif		#endif
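
The header above keeps `REGISTER_STORE` and `REGISTER_LOAD` in the top two bits of the 64-bit `TSFlags` word. A minimal sketch of testing such high bits follows. The `toy_flags` namespace and the free functions are hypothetical, and the sketch assumes the flags word is passed as a plain `uint64_t`.

```cpp
#include <cstdint>

namespace toy_flags {
// Same bit positions as R600InstrFlags in the diff.
constexpr uint64_t REGISTER_STORE = UINT64_C(1) << 62;
constexpr uint64_t REGISTER_LOAD = UINT64_C(1) << 63;
} // namespace toy_flags

// Compare against zero explicitly so the result never depends on narrowing.
constexpr bool isRegisterLoad(uint64_t TSFlags) {
  return (TSFlags & toy_flags::REGISTER_LOAD) != 0;
}

constexpr bool isRegisterStore(uint64_t TSFlags) {
  return (TSFlags & toy_flags::REGISTER_STORE) != 0;
}
```

Returning `bool` from a masked 64-bit value is safe because conversion to `bool` tests for nonzero. Storing the same masked value in a 32-bit integer would drop bits 62 and 63 and always yield 0.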

lib/Target/AMDGPU/R600InstrInfo.cpp

[39 lines not shown]
#include <cstring>		#include <cstring>
#include <iterator>		#include <iterator>
#include <utility>		#include <utility>
#include <vector>		#include <vector>

using namespace llvm;		using namespace llvm;

#define GET_INSTRINFO_CTOR_DTOR		#define GET_INSTRINFO_CTOR_DTOR
#include "AMDGPUGenDFAPacketizer.inc"		#include "R600GenDFAPacketizer.inc"

		#define GET_INSTRINFO_CTOR_DTOR
		#define GET_INSTRMAP_INFO
		#define GET_INSTRINFO_NAMED_OPS
		#include "R600GenInstrInfo.inc"

R600InstrInfo::R600InstrInfo(const R600Subtarget &ST)		R600InstrInfo::R600InstrInfo(const R600Subtarget &ST)
: AMDGPUInstrInfo(ST), RI(), ST(ST) {}		: R600GenInstrInfo(-1, -1), RI(), ST(ST) {}

bool R600InstrInfo::isVector(const MachineInstr &MI) const {		bool R600InstrInfo::isVector(const MachineInstr &MI) const {
return get(MI.getOpcode()).TSFlags & R600_InstFlag::VECTOR;		return get(MI.getOpcode()).TSFlags & R600_InstFlag::VECTOR;
}		}

void R600InstrInfo::copyPhysReg(MachineBasicBlock &MBB,		void R600InstrInfo::copyPhysReg(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MI,		MachineBasicBlock::iterator MI,
const DebugLoc &DL, unsigned DestReg,		const DebugLoc &DL, unsigned DestReg,
unsigned SrcReg, bool KillSrc) const {		unsigned SrcReg, bool KillSrc) const {
unsigned VectorComponents = 0;		unsigned VectorComponents = 0;
if ((AMDGPU::R600_Reg128RegClass.contains(DestReg) ||		if ((R600::R600_Reg128RegClass.contains(DestReg) ||
AMDGPU::R600_Reg128VerticalRegClass.contains(DestReg)) &&		R600::R600_Reg128VerticalRegClass.contains(DestReg)) &&
(AMDGPU::R600_Reg128RegClass.contains(SrcReg) ||		(R600::R600_Reg128RegClass.contains(SrcReg) ||
AMDGPU::R600_Reg128VerticalRegClass.contains(SrcReg))) {		R600::R600_Reg128VerticalRegClass.contains(SrcReg))) {
VectorComponents = 4;		VectorComponents = 4;
} else if((AMDGPU::R600_Reg64RegClass.contains(DestReg) ||		} else if((R600::R600_Reg64RegClass.contains(DestReg) ||
AMDGPU::R600_Reg64VerticalRegClass.contains(DestReg)) &&		R600::R600_Reg64VerticalRegClass.contains(DestReg)) &&
(AMDGPU::R600_Reg64RegClass.contains(SrcReg) ||		(R600::R600_Reg64RegClass.contains(SrcReg) ||
AMDGPU::R600_Reg64VerticalRegClass.contains(SrcReg))) {		R600::R600_Reg64VerticalRegClass.contains(SrcReg))) {
VectorComponents = 2;		VectorComponents = 2;
}		}

if (VectorComponents > 0) {		if (VectorComponents > 0) {
for (unsigned I = 0; I < VectorComponents; I++) {		for (unsigned I = 0; I < VectorComponents; I++) {
unsigned SubRegIndex = AMDGPURegisterInfo::getSubRegFromChannel(I);		unsigned SubRegIndex = AMDGPURegisterInfo::getSubRegFromChannel(I);
buildDefaultInstruction(MBB, MI, AMDGPU::MOV,		buildDefaultInstruction(MBB, MI, R600::MOV,
RI.getSubReg(DestReg, SubRegIndex),		RI.getSubReg(DestReg, SubRegIndex),
RI.getSubReg(SrcReg, SubRegIndex))		RI.getSubReg(SrcReg, SubRegIndex))
.addReg(DestReg,		.addReg(DestReg,
RegState::Define | RegState::Implicit);		RegState::Define | RegState::Implicit);
}		}
} else {		} else {
MachineInstr *NewMI = buildDefaultInstruction(MBB, MI, AMDGPU::MOV,		MachineInstr *NewMI = buildDefaultInstruction(MBB, MI, R600::MOV,
DestReg, SrcReg);		DestReg, SrcReg);
NewMI->getOperand(getOperandIdx(*NewMI, AMDGPU::OpName::src0))		NewMI->getOperand(getOperandIdx(*NewMI, R600::OpName::src0))
.setIsKill(KillSrc);		.setIsKill(KillSrc);
}		}
}		}

/// \returns true if \p MBBI can be moved into a new basic.		/// \returns true if \p MBBI can be moved into a new basic.
bool R600InstrInfo::isLegalToSplitMBBAt(MachineBasicBlock &MBB,		bool R600InstrInfo::isLegalToSplitMBBAt(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MBBI) const {		MachineBasicBlock::iterator MBBI) const {
for (MachineInstr::const_mop_iterator I = MBBI->operands_begin(),		for (MachineInstr::const_mop_iterator I = MBBI->operands_begin(),
E = MBBI->operands_end(); I != E; ++I) {		E = MBBI->operands_end(); I != E; ++I) {
if (I->isReg() && !TargetRegisterInfo::isVirtualRegister(I->getReg()) &&		if (I->isReg() && !TargetRegisterInfo::isVirtualRegister(I->getReg()) &&
I->isUse() && RI.isPhysRegLiveAcrossClauses(I->getReg()))		I->isUse() && RI.isPhysRegLiveAcrossClauses(I->getReg()))
return false;		return false;
}		}
return true;		return true;
}		}

bool R600InstrInfo::isMov(unsigned Opcode) const {		bool R600InstrInfo::isMov(unsigned Opcode) const {
switch(Opcode) {		switch(Opcode) {
default:		default:
return false;		return false;
case AMDGPU::MOV:		case R600::MOV:
case AMDGPU::MOV_IMM_F32:		case R600::MOV_IMM_F32:
case AMDGPU::MOV_IMM_I32:		case R600::MOV_IMM_I32:
return true;		return true;
}		}
}		}
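
The rename from `AMDGPU::` to `R600::` in these opcode checks follows from each sub-target getting its own generated tables. A minimal sketch of the idea, assuming the generated headers boil down to one opcode enum per namespace (the enumerator values and the helper `isR600Mov` are hypothetical, not taken from the patch):

```cpp
#include <cassert>

// Hypothetical stand-ins for the TableGen-generated opcode enums.
// After the split, each sub-target namespace carries only its own opcodes.
namespace R600 {
enum : unsigned { MOV = 1, MOV_IMM_F32, MOV_IMM_I32, DOT_4 };
} // namespace R600

namespace AMDGPU {
enum : unsigned { S_MOV_B32 = 1, V_MOV_B32_e32 };
} // namespace AMDGPU

// Mirrors the shape of R600InstrInfo::isMov, written as a free function.
inline bool isR600Mov(unsigned Opcode) {
  switch (Opcode) {
  default:
    return false;
  case R600::MOV:
  case R600::MOV_IMM_F32:
  case R600::MOV_IMM_I32:
    return true;
  }
}
```

Because the two enums number their opcodes independently, an R600 opcode value is only meaningful when compared against `R600::` enumerators, which is why every use site in this file is renamed.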

bool R600InstrInfo::isReductionOp(unsigned Opcode) const {		bool R600InstrInfo::isReductionOp(unsigned Opcode) const {
return false;		return false;
}		}

bool R600InstrInfo::isCubeOp(unsigned Opcode) const {		bool R600InstrInfo::isCubeOp(unsigned Opcode) const {
switch(Opcode) {		switch(Opcode) {
default: return false;		default: return false;
case AMDGPU::CUBE_r600_pseudo:		case R600::CUBE_r600_pseudo:
case AMDGPU::CUBE_r600_real:		case R600::CUBE_r600_real:
case AMDGPU::CUBE_eg_pseudo:		case R600::CUBE_eg_pseudo:
case AMDGPU::CUBE_eg_real:		case R600::CUBE_eg_real:
return true;		return true;
}		}
}		}

bool R600InstrInfo::isALUInstr(unsigned Opcode) const {		bool R600InstrInfo::isALUInstr(unsigned Opcode) const {
unsigned TargetFlags = get(Opcode).TSFlags;		unsigned TargetFlags = get(Opcode).TSFlags;

return (TargetFlags & R600_InstFlag::ALU_INST);		return (TargetFlags & R600_InstFlag::ALU_INST);
[11 lines not shown]	bool R600InstrInfo::isLDSInstr(unsigned Opcode) const {
unsigned TargetFlags = get(Opcode).TSFlags;		unsigned TargetFlags = get(Opcode).TSFlags;

return ((TargetFlags & R600_InstFlag::LDS_1A) |		return ((TargetFlags & R600_InstFlag::LDS_1A) |
(TargetFlags & R600_InstFlag::LDS_1A1D) |		(TargetFlags & R600_InstFlag::LDS_1A1D) |
(TargetFlags & R600_InstFlag::LDS_1A2D));		(TargetFlags & R600_InstFlag::LDS_1A2D));
}		}

bool R600InstrInfo::isLDSRetInstr(unsigned Opcode) const {		bool R600InstrInfo::isLDSRetInstr(unsigned Opcode) const {
return isLDSInstr(Opcode) && getOperandIdx(Opcode, AMDGPU::OpName::dst) != -1;		return isLDSInstr(Opcode) && getOperandIdx(Opcode, R600::OpName::dst) != -1;
}		}

bool R600InstrInfo::canBeConsideredALU(const MachineInstr &MI) const {		bool R600InstrInfo::canBeConsideredALU(const MachineInstr &MI) const {
if (isALUInstr(MI.getOpcode()))		if (isALUInstr(MI.getOpcode()))
return true;		return true;
if (isVector(MI) || isCubeOp(MI.getOpcode()))		if (isVector(MI) || isCubeOp(MI.getOpcode()))
return true;		return true;
switch (MI.getOpcode()) {		switch (MI.getOpcode()) {
case AMDGPU::PRED_X:		case R600::PRED_X:
case AMDGPU::INTERP_PAIR_XY:		case R600::INTERP_PAIR_XY:
case AMDGPU::INTERP_PAIR_ZW:		case R600::INTERP_PAIR_ZW:
case AMDGPU::INTERP_VEC_LOAD:		case R600::INTERP_VEC_LOAD:
case AMDGPU::COPY:		case R600::COPY:
case AMDGPU::DOT_4:		case R600::DOT_4:
return true;		return true;
default:		default:
return false;		return false;
}		}
}		}

bool R600InstrInfo::isTransOnly(unsigned Opcode) const {		bool R600InstrInfo::isTransOnly(unsigned Opcode) const {
if (ST.hasCaymanISA())		if (ST.hasCaymanISA())
return false;		return false;
return (get(Opcode).getSchedClass() == AMDGPU::Sched::TransALU);		return (get(Opcode).getSchedClass() == R600::Sched::TransALU);
}		}

bool R600InstrInfo::isTransOnly(const MachineInstr &MI) const {		bool R600InstrInfo::isTransOnly(const MachineInstr &MI) const {
return isTransOnly(MI.getOpcode());		return isTransOnly(MI.getOpcode());
}		}

bool R600InstrInfo::isVectorOnly(unsigned Opcode) const {		bool R600InstrInfo::isVectorOnly(unsigned Opcode) const {
return (get(Opcode).getSchedClass() == AMDGPU::Sched::VecALU);		return (get(Opcode).getSchedClass() == R600::Sched::VecALU);
}		}

bool R600InstrInfo::isVectorOnly(const MachineInstr &MI) const {		bool R600InstrInfo::isVectorOnly(const MachineInstr &MI) const {
return isVectorOnly(MI.getOpcode());		return isVectorOnly(MI.getOpcode());
}		}

bool R600InstrInfo::isExport(unsigned Opcode) const {		bool R600InstrInfo::isExport(unsigned Opcode) const {
return (get(Opcode).TSFlags & R600_InstFlag::IS_EXPORT);		return (get(Opcode).TSFlags & R600_InstFlag::IS_EXPORT);
[17 lines not shown]	bool R600InstrInfo::usesTextureCache(const MachineInstr &MI) const {
const MachineFunction *MF = MI.getParent()->getParent();		const MachineFunction *MF = MI.getParent()->getParent();
return (AMDGPU::isCompute(MF->getFunction().getCallingConv()) &&		return (AMDGPU::isCompute(MF->getFunction().getCallingConv()) &&
usesVertexCache(MI.getOpcode())) ||		usesVertexCache(MI.getOpcode())) ||
usesTextureCache(MI.getOpcode());		usesTextureCache(MI.getOpcode());
}		}

bool R600InstrInfo::mustBeLastInClause(unsigned Opcode) const {		bool R600InstrInfo::mustBeLastInClause(unsigned Opcode) const {
switch (Opcode) {		switch (Opcode) {
case AMDGPU::KILLGT:		case R600::KILLGT:
case AMDGPU::GROUP_BARRIER:		case R600::GROUP_BARRIER:
return true;		return true;
default:		default:
return false;		return false;
}		}
}		}

bool R600InstrInfo::usesAddressRegister(MachineInstr &MI) const {		bool R600InstrInfo::usesAddressRegister(MachineInstr &MI) const {
return MI.findRegisterUseOperandIdx(AMDGPU::AR_X) != -1;		return MI.findRegisterUseOperandIdx(R600::AR_X) != -1;
}		}

bool R600InstrInfo::definesAddressRegister(MachineInstr &MI) const {		bool R600InstrInfo::definesAddressRegister(MachineInstr &MI) const {
return MI.findRegisterDefOperandIdx(AMDGPU::AR_X) != -1;		return MI.findRegisterDefOperandIdx(R600::AR_X) != -1;
}		}

bool R600InstrInfo::readsLDSSrcReg(const MachineInstr &MI) const {		bool R600InstrInfo::readsLDSSrcReg(const MachineInstr &MI) const {
if (!isALUInstr(MI.getOpcode())) {		if (!isALUInstr(MI.getOpcode())) {
return false;		return false;
}		}
for (MachineInstr::const_mop_iterator I = MI.operands_begin(),		for (MachineInstr::const_mop_iterator I = MI.operands_begin(),
E = MI.operands_end();		E = MI.operands_end();
I != E; ++I) {		I != E; ++I) {
if (!I->isReg() || !I->isUse() ||		if (!I->isReg() || !I->isUse() ||
TargetRegisterInfo::isVirtualRegister(I->getReg()))		TargetRegisterInfo::isVirtualRegister(I->getReg()))
continue;		continue;

if (AMDGPU::R600_LDS_SRC_REGRegClass.contains(I->getReg()))		if (R600::R600_LDS_SRC_REGRegClass.contains(I->getReg()))
return true;		return true;
}		}
return false;		return false;
}		}

int R600InstrInfo::getSelIdx(unsigned Opcode, unsigned SrcIdx) const {		int R600InstrInfo::getSelIdx(unsigned Opcode, unsigned SrcIdx) const {
static const unsigned SrcSelTable[][2] = {		static const unsigned SrcSelTable[][2] = {
{AMDGPU::OpName::src0, AMDGPU::OpName::src0_sel},		{R600::OpName::src0, R600::OpName::src0_sel},
{AMDGPU::OpName::src1, AMDGPU::OpName::src1_sel},		{R600::OpName::src1, R600::OpName::src1_sel},
{AMDGPU::OpName::src2, AMDGPU::OpName::src2_sel},		{R600::OpName::src2, R600::OpName::src2_sel},
{AMDGPU::OpName::src0_X, AMDGPU::OpName::src0_sel_X},		{R600::OpName::src0_X, R600::OpName::src0_sel_X},
{AMDGPU::OpName::src0_Y, AMDGPU::OpName::src0_sel_Y},		{R600::OpName::src0_Y, R600::OpName::src0_sel_Y},
{AMDGPU::OpName::src0_Z, AMDGPU::OpName::src0_sel_Z},		{R600::OpName::src0_Z, R600::OpName::src0_sel_Z},
{AMDGPU::OpName::src0_W, AMDGPU::OpName::src0_sel_W},		{R600::OpName::src0_W, R600::OpName::src0_sel_W},
{AMDGPU::OpName::src1_X, AMDGPU::OpName::src1_sel_X},		{R600::OpName::src1_X, R600::OpName::src1_sel_X},
{AMDGPU::OpName::src1_Y, AMDGPU::OpName::src1_sel_Y},		{R600::OpName::src1_Y, R600::OpName::src1_sel_Y},
{AMDGPU::OpName::src1_Z, AMDGPU::OpName::src1_sel_Z},		{R600::OpName::src1_Z, R600::OpName::src1_sel_Z},
{AMDGPU::OpName::src1_W, AMDGPU::OpName::src1_sel_W}		{R600::OpName::src1_W, R600::OpName::src1_sel_W}
};		};

for (const auto &Row : SrcSelTable) {		for (const auto &Row : SrcSelTable) {
if (getOperandIdx(Opcode, Row[0]) == (int)SrcIdx) {		if (getOperandIdx(Opcode, Row[0]) == (int)SrcIdx) {
return getOperandIdx(Opcode, Row[1]);		return getOperandIdx(Opcode, Row[1]);
}		}
}		}
return -1;		return -1;
}		}
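
The lookup in `getSelIdx` is a table-driven scan: each row pairs a source operand name with its selector operand name, and the first row whose source resolves to `SrcIdx` yields the selector's index. A minimal standalone sketch of that pattern, assuming a hypothetical `operandIdx` helper in place of `getOperandIdx(Opcode, Name)` and made-up operand positions:

```cpp
#include <cassert>

namespace sketch {
// Hypothetical operand names standing in for R600::OpName::*.
enum OpName : unsigned { src0, src1, src0_sel, src1_sel };

// Hypothetical stand-in for getOperandIdx(Opcode, Name): maps a named
// operand to its position in one fixed instruction, or -1 if absent.
inline int operandIdx(unsigned Name) {
  switch (Name) {
  case src0:     return 1;
  case src1:     return 2;
  case src0_sel: return 5;
  case src1_sel: return 6;
  }
  return -1;
}

// Same shape as getSelIdx: find the row whose source operand sits at
// SrcIdx and return the position of the matching selector operand.
inline int selIdxFor(unsigned SrcIdx) {
  static const unsigned Table[][2] = {
      {src0, src0_sel},
      {src1, src1_sel},
  };
  for (const auto &Row : Table) {
    if (operandIdx(Row[0]) == static_cast<int>(SrcIdx))
      return operandIdx(Row[1]);
  }
  return -1; // SrcIdx is not a source operand of this instruction.
}
} // namespace sketch
```

With these assumed positions, `sketch::selIdxFor(1)` resolves to the position of `src0_sel`, and an index that matches no source operand falls through to `-1`, as in the original function.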

SmallVector<std::pair<MachineOperand *, int64_t>, 3>		SmallVector<std::pair<MachineOperand *, int64_t>, 3>
R600InstrInfo::getSrcs(MachineInstr &MI) const {		R600InstrInfo::getSrcs(MachineInstr &MI) const {
SmallVector<std::pair<MachineOperand *, int64_t>, 3> Result;		SmallVector<std::pair<MachineOperand *, int64_t>, 3> Result;

if (MI.getOpcode() == AMDGPU::DOT_4) {		if (MI.getOpcode() == R600::DOT_4) {
static const unsigned OpTable[8][2] = {		static const unsigned OpTable[8][2] = {
{AMDGPU::OpName::src0_X, AMDGPU::OpName::src0_sel_X},		{R600::OpName::src0_X, R600::OpName::src0_sel_X},
{AMDGPU::OpName::src0_Y, AMDGPU::OpName::src0_sel_Y},		{R600::OpName::src0_Y, R600::OpName::src0_sel_Y},
{AMDGPU::OpName::src0_Z, AMDGPU::OpName::src0_sel_Z},		{R600::OpName::src0_Z, R600::OpName::src0_sel_Z},
{AMDGPU::OpName::src0_W, AMDGPU::OpName::src0_sel_W},		{R600::OpName::src0_W, R600::OpName::src0_sel_W},
{AMDGPU::OpName::src1_X, AMDGPU::OpName::src1_sel_X},		{R600::OpName::src1_X, R600::OpName::src1_sel_X},
{AMDGPU::OpName::src1_Y, AMDGPU::OpName::src1_sel_Y},		{R600::OpName::src1_Y, R600::OpName::src1_sel_Y},
{AMDGPU::OpName::src1_Z, AMDGPU::OpName::src1_sel_Z},		{R600::OpName::src1_Z, R600::OpName::src1_sel_Z},
{AMDGPU::OpName::src1_W, AMDGPU::OpName::src1_sel_W},		{R600::OpName::src1_W, R600::OpName::src1_sel_W},
};		};

for (unsigned j = 0; j < 8; j++) {		for (unsigned j = 0; j < 8; j++) {
MachineOperand &MO =		MachineOperand &MO =
MI.getOperand(getOperandIdx(MI.getOpcode(), OpTable[j][0]));		MI.getOperand(getOperandIdx(MI.getOpcode(), OpTable[j][0]));
unsigned Reg = MO.getReg();		unsigned Reg = MO.getReg();
if (Reg == AMDGPU::ALU_CONST) {		if (Reg == R600::ALU_CONST) {
MachineOperand &Sel =		MachineOperand &Sel =
MI.getOperand(getOperandIdx(MI.getOpcode(), OpTable[j][1]));		MI.getOperand(getOperandIdx(MI.getOpcode(), OpTable[j][1]));
Result.push_back(std::make_pair(&MO, Sel.getImm()));		Result.push_back(std::make_pair(&MO, Sel.getImm()));
continue;		continue;
}		}

}		}
return Result;		return Result;
}		}

static const unsigned OpTable[3][2] = {		static const unsigned OpTable[3][2] = {
{AMDGPU::OpName::src0, AMDGPU::OpName::src0_sel},		{R600::OpName::src0, R600::OpName::src0_sel},
{AMDGPU::OpName::src1, AMDGPU::OpName::src1_sel},		{R600::OpName::src1, R600::OpName::src1_sel},
{AMDGPU::OpName::src2, AMDGPU::OpName::src2_sel},		{R600::OpName::src2, R600::OpName::src2_sel},
};		};

for (unsigned j = 0; j < 3; j++) {		for (unsigned j = 0; j < 3; j++) {
int SrcIdx = getOperandIdx(MI.getOpcode(), OpTable[j][0]);		int SrcIdx = getOperandIdx(MI.getOpcode(), OpTable[j][0]);
if (SrcIdx < 0)		if (SrcIdx < 0)
break;		break;
MachineOperand &MO = MI.getOperand(SrcIdx);		MachineOperand &MO = MI.getOperand(SrcIdx);
unsigned Reg = MO.getReg();		unsigned Reg = MO.getReg();
if (Reg == AMDGPU::ALU_CONST) {		if (Reg == R600::ALU_CONST) {
MachineOperand &Sel =		MachineOperand &Sel =
MI.getOperand(getOperandIdx(MI.getOpcode(), OpTable[j][1]));		MI.getOperand(getOperandIdx(MI.getOpcode(), OpTable[j][1]));
Result.push_back(std::make_pair(&MO, Sel.getImm()));		Result.push_back(std::make_pair(&MO, Sel.getImm()));
continue;		continue;
}		}
if (Reg == AMDGPU::ALU_LITERAL_X) {		if (Reg == R600::ALU_LITERAL_X) {
MachineOperand &Operand =		MachineOperand &Operand =
MI.getOperand(getOperandIdx(MI.getOpcode(), AMDGPU::OpName::literal));		MI.getOperand(getOperandIdx(MI.getOpcode(), R600::OpName::literal));
if (Operand.isImm()) {		if (Operand.isImm()) {
Result.push_back(std::make_pair(&MO, Operand.getImm()));		Result.push_back(std::make_pair(&MO, Operand.getImm()));
continue;		continue;
}		}
assert(Operand.isGlobal());		assert(Operand.isGlobal());
}		}
Result.push_back(std::make_pair(&MO, 0));		Result.push_back(std::make_pair(&MO, 0));
}		}
return Result;		return Result;
}		}

std::vector<std::pair<int, unsigned>>		std::vector<std::pair<int, unsigned>>
R600InstrInfo::ExtractSrcs(MachineInstr &MI,		R600InstrInfo::ExtractSrcs(MachineInstr &MI,
const DenseMap<unsigned, unsigned> &PV,		const DenseMap<unsigned, unsigned> &PV,
unsigned &ConstCount) const {		unsigned &ConstCount) const {
ConstCount = 0;		ConstCount = 0;
const std::pair<int, unsigned> DummyPair(-1, 0);		const std::pair<int, unsigned> DummyPair(-1, 0);
std::vector<std::pair<int, unsigned>> Result;		std::vector<std::pair<int, unsigned>> Result;
unsigned i = 0;		unsigned i = 0;
for (const auto &Src : getSrcs(MI)) {		for (const auto &Src : getSrcs(MI)) {
++i;		++i;
unsigned Reg = Src.first->getReg();		unsigned Reg = Src.first->getReg();
int Index = RI.getEncodingValue(Reg) & 0xff;		int Index = RI.getEncodingValue(Reg) & 0xff;
if (Reg == AMDGPU::OQAP) {		if (Reg == R600::OQAP) {
Result.push_back(std::make_pair(Index, 0U));		Result.push_back(std::make_pair(Index, 0U));
}		}
if (PV.find(Reg) != PV.end()) {		if (PV.find(Reg) != PV.end()) {
// 255 is used to tells its a PS/PV reg		// 255 is used to tells its a PS/PV reg
Result.push_back(std::make_pair(255, 0U));		Result.push_back(std::make_pair(255, 0U));
continue;		continue;
}		}
if (Index > 127) {		if (Index > 127) {
[73 lines not shown]	unsigned R600InstrInfo::isLegalUpTo(
memset(Vector, -1, sizeof(Vector));		memset(Vector, -1, sizeof(Vector));
for (unsigned i = 0, e = IGSrcs.size(); i < e; i++) {		for (unsigned i = 0, e = IGSrcs.size(); i < e; i++) {
const std::vector<std::pair<int, unsigned>> &Srcs =		const std::vector<std::pair<int, unsigned>> &Srcs =
Swizzle(IGSrcs[i], Swz[i]);		Swizzle(IGSrcs[i], Swz[i]);
for (unsigned j = 0; j < 3; j++) {		for (unsigned j = 0; j < 3; j++) {
const std::pair<int, unsigned> &Src = Srcs[j];		const std::pair<int, unsigned> &Src = Srcs[j];
if (Src.first < 0 || Src.first == 255)		if (Src.first < 0 || Src.first == 255)
continue;		continue;
if (Src.first == GET_REG_INDEX(RI.getEncodingValue(AMDGPU::OQAP))) {		if (Src.first == GET_REG_INDEX(RI.getEncodingValue(R600::OQAP))) {
if (Swz[i] != R600InstrInfo::ALU_VEC_012_SCL_210 &&		if (Swz[i] != R600InstrInfo::ALU_VEC_012_SCL_210 &&
Swz[i] != R600InstrInfo::ALU_VEC_021_SCL_122) {		Swz[i] != R600InstrInfo::ALU_VEC_021_SCL_122) {
// The value from output queue A (denoted by register OQAP) can		// The value from output queue A (denoted by register OQAP) can
// only be fetched during the first cycle.		// only be fetched during the first cycle.
return false;		return false;
}		}
// OQAP does not count towards the normal read port restrictions		// OQAP does not count towards the normal read port restrictions
continue;		continue;
[89 lines not shown]	R600InstrInfo::fitsReadPortLimitations(const std::vector<MachineInstr *> &IG,

std::vector<std::vector<std::pair<int, unsigned>>> IGSrcs;		std::vector<std::vector<std::pair<int, unsigned>>> IGSrcs;
ValidSwizzle.clear();		ValidSwizzle.clear();
unsigned ConstCount;		unsigned ConstCount;
BankSwizzle TransBS = ALU_VEC_012_SCL_210;		BankSwizzle TransBS = ALU_VEC_012_SCL_210;
for (unsigned i = 0, e = IG.size(); i < e; ++i) {		for (unsigned i = 0, e = IG.size(); i < e; ++i) {
IGSrcs.push_back(ExtractSrcs(*IG[i], PV, ConstCount));		IGSrcs.push_back(ExtractSrcs(*IG[i], PV, ConstCount));
unsigned Op = getOperandIdx(IG[i]->getOpcode(),		unsigned Op = getOperandIdx(IG[i]->getOpcode(),
AMDGPU::OpName::bank_swizzle);		R600::OpName::bank_swizzle);
ValidSwizzle.push_back( (R600InstrInfo::BankSwizzle)		ValidSwizzle.push_back( (R600InstrInfo::BankSwizzle)
IG[i]->getOperand(Op).getImm());		IG[i]->getOperand(Op).getImm());
}		}
std::vector<std::pair<int, unsigned>> TransOps;		std::vector<std::pair<int, unsigned>> TransOps;
if (!isLastAluTrans)		if (!isLastAluTrans)
return FindSwizzleForVectorSlot(IGSrcs, ValidSwizzle, TransOps, TransBS);		return FindSwizzleForVectorSlot(IGSrcs, ValidSwizzle, TransOps, TransBS);

TransOps = std::move(IGSrcs.back());		TransOps = std::move(IGSrcs.back());
[52 lines not shown]	R600InstrInfo::fitsConstReadLimitations(const std::vector<MachineInstr *> &MIs)
std::vector<unsigned> Consts;		std::vector<unsigned> Consts;
SmallSet<int64_t, 4> Literals;		SmallSet<int64_t, 4> Literals;
for (unsigned i = 0, n = MIs.size(); i < n; i++) {		for (unsigned i = 0, n = MIs.size(); i < n; i++) {
MachineInstr &MI = *MIs[i];		MachineInstr &MI = *MIs[i];
if (!isALUInstr(MI.getOpcode()))		if (!isALUInstr(MI.getOpcode()))
continue;		continue;

for (const auto &Src : getSrcs(MI)) {		for (const auto &Src : getSrcs(MI)) {
if (Src.first->getReg() == AMDGPU::ALU_LITERAL_X)		if (Src.first->getReg() == R600::ALU_LITERAL_X)
Literals.insert(Src.second);		Literals.insert(Src.second);
if (Literals.size() > 4)		if (Literals.size() > 4)
return false;		return false;
if (Src.first->getReg() == AMDGPU::ALU_CONST)		if (Src.first->getReg() == R600::ALU_CONST)
Consts.push_back(Src.second);		Consts.push_back(Src.second);
if (AMDGPU::R600_KC0RegClass.contains(Src.first->getReg()) ||		if (R600::R600_KC0RegClass.contains(Src.first->getReg()) ||
AMDGPU::R600_KC1RegClass.contains(Src.first->getReg())) {		R600::R600_KC1RegClass.contains(Src.first->getReg())) {
unsigned Index = RI.getEncodingValue(Src.first->getReg()) & 0xff;		unsigned Index = RI.getEncodingValue(Src.first->getReg()) & 0xff;
unsigned Chan = RI.getHWRegChan(Src.first->getReg());		unsigned Chan = RI.getHWRegChan(Src.first->getReg());
Consts.push_back((Index << 2) | Chan);		Consts.push_back((Index << 2) | Chan);
}		}
}		}
}		}
return fitsConstReadLimitations(Consts);		return fitsConstReadLimitations(Consts);
}		}
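
For kcache registers, the loop above folds the register index and the channel into one value with `(Index << 2) | Chan`. A small sketch of that packing and its inverse, assuming the channel always fits in two bits (the explicit `& 0x3` mask and the helper names `packConst`, `constIndex`, and `constChan` are additions for illustration, not part of the patch):

```cpp
#include <cassert>

// Pack the low 8 bits of a register encoding and a 2-bit channel into one
// value, following the (Index << 2) | Chan expression in the loop above.
constexpr unsigned packConst(unsigned EncodingValue, unsigned Chan) {
  return ((EncodingValue & 0xff) << 2) | (Chan & 0x3);
}

// Inverse helpers: recover the index and the channel from a packed value.
constexpr unsigned constIndex(unsigned Packed) { return Packed >> 2; }
constexpr unsigned constChan(unsigned Packed) { return Packed & 0x3; }

// Compile-time round-trip check of the packing scheme.
static_assert(constIndex(packConst(0x1A5, 2)) == 0xA5, "index round-trips");
static_assert(constChan(packConst(0x1A5, 2)) == 2, "channel round-trips");
```

Keeping the channel in the two low bits means two packed values compare equal only when both the register index and the channel match.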

DFAPacketizer *		DFAPacketizer *
R600InstrInfo::CreateTargetScheduleState(const TargetSubtargetInfo &STI) const {		R600InstrInfo::CreateTargetScheduleState(const TargetSubtargetInfo &STI) const {
const InstrItineraryData *II = STI.getInstrItineraryData();		const InstrItineraryData *II = STI.getInstrItineraryData();
return static_cast<const R600Subtarget &>(STI).createDFAPacketizer(II);		return static_cast<const R600Subtarget &>(STI).createDFAPacketizer(II);
}		}

static bool		static bool
isPredicateSetter(unsigned Opcode) {		isPredicateSetter(unsigned Opcode) {
switch (Opcode) {		switch (Opcode) {
case AMDGPU::PRED_X:		case R600::PRED_X:
return true;		return true;
default:		default:
return false;		return false;
}		}
}		}

static MachineInstr *		static MachineInstr *
findFirstPredicateSetterFrom(MachineBasicBlock &MBB,		findFirstPredicateSetterFrom(MachineBasicBlock &MBB,
MachineBasicBlock::iterator I) {		MachineBasicBlock::iterator I) {
while (I != MBB.begin()) {		while (I != MBB.begin()) {
--I;		--I;
MachineInstr &MI = *I;		MachineInstr &MI = *I;
if (isPredicateSetter(MI.getOpcode()))		if (isPredicateSetter(MI.getOpcode()))
return &MI;		return &MI;
}		}

return nullptr;		return nullptr;
}		}

static		static
bool isJump(unsigned Opcode) {		bool isJump(unsigned Opcode) {
return Opcode == AMDGPU::JUMP || Opcode == AMDGPU::JUMP_COND;		return Opcode == R600::JUMP || Opcode == R600::JUMP_COND;
}		}

static bool isBranch(unsigned Opcode) {		static bool isBranch(unsigned Opcode) {
return Opcode == AMDGPU::BRANCH || Opcode == AMDGPU::BRANCH_COND_i32 ||		return Opcode == R600::BRANCH || Opcode == R600::BRANCH_COND_i32 ||
Opcode == AMDGPU::BRANCH_COND_f32;		Opcode == R600::BRANCH_COND_f32;
}		}

bool R600InstrInfo::analyzeBranch(MachineBasicBlock &MBB,		bool R600InstrInfo::analyzeBranch(MachineBasicBlock &MBB,
MachineBasicBlock *&TBB,		MachineBasicBlock *&TBB,
MachineBasicBlock *&FBB,		MachineBasicBlock *&FBB,
SmallVectorImpl<MachineOperand> &Cond,		SmallVectorImpl<MachineOperand> &Cond,
bool AllowModify) const {		bool AllowModify) const {
// Most of the following comes from the ARM implementation of AnalyzeBranch		// Most of the following comes from the ARM implementation of AnalyzeBranch

// If the block has no terminators, it just falls into the block after it.		// If the block has no terminators, it just falls into the block after it.
MachineBasicBlock::iterator I = MBB.getLastNonDebugInstr();		MachineBasicBlock::iterator I = MBB.getLastNonDebugInstr();
if (I == MBB.end())		if (I == MBB.end())
return false;		return false;

// AMDGPU::BRANCH* instructions are only available after isel and are not		// R600::BRANCH* instructions are only available after isel and are not
// handled		// handled
if (isBranch(I->getOpcode()))		if (isBranch(I->getOpcode()))
return true;		return true;
if (!isJump(I->getOpcode())) {		if (!isJump(I->getOpcode())) {
return false;		return false;
}		}

// Remove successive JUMP		// Remove successive JUMP
while (I != MBB.begin() && std::prev(I)->getOpcode() == AMDGPU::JUMP) {		while (I != MBB.begin() && std::prev(I)->getOpcode() == R600::JUMP) {
MachineBasicBlock::iterator PriorI = std::prev(I);		MachineBasicBlock::iterator PriorI = std::prev(I);
if (AllowModify)		if (AllowModify)
I->removeFromParent();		I->removeFromParent();
I = PriorI;		I = PriorI;
}		}
MachineInstr &LastInst = *I;		MachineInstr &LastInst = *I;

// If there is only one terminator instruction, process it.		// If there is only one terminator instruction, process it.
unsigned LastOpc = LastInst.getOpcode();		unsigned LastOpc = LastInst.getOpcode();
if (I == MBB.begin() || !isJump((--I)->getOpcode())) {		if (I == MBB.begin() || !isJump((--I)->getOpcode())) {
if (LastOpc == AMDGPU::JUMP) {		if (LastOpc == R600::JUMP) {
TBB = LastInst.getOperand(0).getMBB();		TBB = LastInst.getOperand(0).getMBB();
return false;		return false;
} else if (LastOpc == AMDGPU::JUMP_COND) {		} else if (LastOpc == R600::JUMP_COND) {
auto predSet = I;		auto predSet = I;
while (!isPredicateSetter(predSet->getOpcode())) {		while (!isPredicateSetter(predSet->getOpcode())) {
predSet = --I;		predSet = --I;
}		}
TBB = LastInst.getOperand(0).getMBB();		TBB = LastInst.getOperand(0).getMBB();
Cond.push_back(predSet->getOperand(1));		Cond.push_back(predSet->getOperand(1));
Cond.push_back(predSet->getOperand(2));		Cond.push_back(predSet->getOperand(2));
Cond.push_back(MachineOperand::CreateReg(AMDGPU::PRED_SEL_ONE, false));		Cond.push_back(MachineOperand::CreateReg(R600::PRED_SEL_ONE, false));
return false;		return false;
}		}
return true; // Can't handle indirect branch.		return true; // Can't handle indirect branch.
}		}

// Get the instruction before it if it is a terminator.		// Get the instruction before it if it is a terminator.
MachineInstr &SecondLastInst = *I;		MachineInstr &SecondLastInst = *I;
unsigned SecondLastOpc = SecondLastInst.getOpcode();		unsigned SecondLastOpc = SecondLastInst.getOpcode();

// If the block ends with a B and a Bcc, handle it.		// If the block ends with a B and a Bcc, handle it.
if (SecondLastOpc == AMDGPU::JUMP_COND && LastOpc == AMDGPU::JUMP) {		if (SecondLastOpc == R600::JUMP_COND && LastOpc == R600::JUMP) {
auto predSet = --I;		auto predSet = --I;
while (!isPredicateSetter(predSet->getOpcode())) {		while (!isPredicateSetter(predSet->getOpcode())) {
predSet = --I;		predSet = --I;
}		}
TBB = SecondLastInst.getOperand(0).getMBB();		TBB = SecondLastInst.getOperand(0).getMBB();
FBB = LastInst.getOperand(0).getMBB();		FBB = LastInst.getOperand(0).getMBB();
Cond.push_back(predSet->getOperand(1));		Cond.push_back(predSet->getOperand(1));
Cond.push_back(predSet->getOperand(2));		Cond.push_back(predSet->getOperand(2));
Cond.push_back(MachineOperand::CreateReg(AMDGPU::PRED_SEL_ONE, false));		Cond.push_back(MachineOperand::CreateReg(R600::PRED_SEL_ONE, false));
return false;		return false;
}		}

// Otherwise, can't handle this.		// Otherwise, can't handle this.
return true;		return true;
}		}

static		static
MachineBasicBlock::iterator FindLastAluClause(MachineBasicBlock &MBB) {		MachineBasicBlock::iterator FindLastAluClause(MachineBasicBlock &MBB) {
for (MachineBasicBlock::reverse_iterator It = MBB.rbegin(), E = MBB.rend();		for (MachineBasicBlock::reverse_iterator It = MBB.rbegin(), E = MBB.rend();
It != E; ++It) {		It != E; ++It) {
if (It->getOpcode() == AMDGPU::CF_ALU ||		if (It->getOpcode() == R600::CF_ALU ||
It->getOpcode() == AMDGPU::CF_ALU_PUSH_BEFORE)		It->getOpcode() == R600::CF_ALU_PUSH_BEFORE)
return It.getReverse();		return It.getReverse();
}		}
return MBB.end();		return MBB.end();
}		}

unsigned R600InstrInfo::insertBranch(MachineBasicBlock &MBB,		unsigned R600InstrInfo::insertBranch(MachineBasicBlock &MBB,
MachineBasicBlock *TBB,		MachineBasicBlock *TBB,
MachineBasicBlock *FBB,		MachineBasicBlock *FBB,
ArrayRef<MachineOperand> Cond,		ArrayRef<MachineOperand> Cond,
const DebugLoc &DL,		const DebugLoc &DL,
int *BytesAdded) const {		int *BytesAdded) const {
assert(TBB && "insertBranch must not be told to insert a fallthrough");		assert(TBB && "insertBranch must not be told to insert a fallthrough");
assert(!BytesAdded && "code size not handled");		assert(!BytesAdded && "code size not handled");

if (!FBB) {		if (!FBB) {
if (Cond.empty()) {		if (Cond.empty()) {
BuildMI(&MBB, DL, get(AMDGPU::JUMP)).addMBB(TBB);		BuildMI(&MBB, DL, get(R600::JUMP)).addMBB(TBB);
return 1;		return 1;
} else {		} else {
MachineInstr *PredSet = findFirstPredicateSetterFrom(MBB, MBB.end());		MachineInstr *PredSet = findFirstPredicateSetterFrom(MBB, MBB.end());
assert(PredSet && "No previous predicate !");		assert(PredSet && "No previous predicate !");
addFlag(*PredSet, 0, MO_FLAG_PUSH);		addFlag(*PredSet, 0, MO_FLAG_PUSH);
PredSet->getOperand(2).setImm(Cond[1].getImm());		PredSet->getOperand(2).setImm(Cond[1].getImm());

BuildMI(&MBB, DL, get(AMDGPU::JUMP_COND))		BuildMI(&MBB, DL, get(R600::JUMP_COND))
.addMBB(TBB)		.addMBB(TBB)
.addReg(AMDGPU::PREDICATE_BIT, RegState::Kill);		.addReg(R600::PREDICATE_BIT, RegState::Kill);
MachineBasicBlock::iterator CfAlu = FindLastAluClause(MBB);		MachineBasicBlock::iterator CfAlu = FindLastAluClause(MBB);
if (CfAlu == MBB.end())		if (CfAlu == MBB.end())
return 1;		return 1;
assert (CfAlu->getOpcode() == AMDGPU::CF_ALU);		assert (CfAlu->getOpcode() == R600::CF_ALU);
CfAlu->setDesc(get(AMDGPU::CF_ALU_PUSH_BEFORE));		CfAlu->setDesc(get(R600::CF_ALU_PUSH_BEFORE));
return 1;		return 1;
}		}
} else {		} else {
MachineInstr *PredSet = findFirstPredicateSetterFrom(MBB, MBB.end());		MachineInstr *PredSet = findFirstPredicateSetterFrom(MBB, MBB.end());
assert(PredSet && "No previous predicate !");		assert(PredSet && "No previous predicate !");
addFlag(*PredSet, 0, MO_FLAG_PUSH);		addFlag(*PredSet, 0, MO_FLAG_PUSH);
PredSet->getOperand(2).setImm(Cond[1].getImm());		PredSet->getOperand(2).setImm(Cond[1].getImm());
BuildMI(&MBB, DL, get(AMDGPU::JUMP_COND))		BuildMI(&MBB, DL, get(R600::JUMP_COND))
.addMBB(TBB)		.addMBB(TBB)
.addReg(AMDGPU::PREDICATE_BIT, RegState::Kill);		.addReg(R600::PREDICATE_BIT, RegState::Kill);
BuildMI(&MBB, DL, get(AMDGPU::JUMP)).addMBB(FBB);		BuildMI(&MBB, DL, get(R600::JUMP)).addMBB(FBB);
MachineBasicBlock::iterator CfAlu = FindLastAluClause(MBB);		MachineBasicBlock::iterator CfAlu = FindLastAluClause(MBB);
if (CfAlu == MBB.end())		if (CfAlu == MBB.end())
return 2;		return 2;
assert (CfAlu->getOpcode() == AMDGPU::CF_ALU);		assert (CfAlu->getOpcode() == R600::CF_ALU);
CfAlu->setDesc(get(AMDGPU::CF_ALU_PUSH_BEFORE));		CfAlu->setDesc(get(R600::CF_ALU_PUSH_BEFORE));
return 2;		return 2;
}		}
}		}

unsigned R600InstrInfo::removeBranch(MachineBasicBlock &MBB,		unsigned R600InstrInfo::removeBranch(MachineBasicBlock &MBB,
int *BytesRemoved) const {		int *BytesRemoved) const {
assert(!BytesRemoved && "code size not handled");		assert(!BytesRemoved && "code size not handled");

// Note : we leave PRED* instructions there.		// Note : we leave PRED* instructions there.
// They may be needed when predicating instructions.		// They may be needed when predicating instructions.

MachineBasicBlock::iterator I = MBB.end();		MachineBasicBlock::iterator I = MBB.end();

if (I == MBB.begin()) {		if (I == MBB.begin()) {
return 0;		return 0;
}		}
--I;		--I;
switch (I->getOpcode()) {		switch (I->getOpcode()) {
default:		default:
return 0;		return 0;
case AMDGPU::JUMP_COND: {		case R600::JUMP_COND: {
MachineInstr *predSet = findFirstPredicateSetterFrom(MBB, I);		MachineInstr *predSet = findFirstPredicateSetterFrom(MBB, I);
clearFlag(*predSet, 0, MO_FLAG_PUSH);		clearFlag(*predSet, 0, MO_FLAG_PUSH);
I->eraseFromParent();		I->eraseFromParent();
MachineBasicBlock::iterator CfAlu = FindLastAluClause(MBB);		MachineBasicBlock::iterator CfAlu = FindLastAluClause(MBB);
if (CfAlu == MBB.end())		if (CfAlu == MBB.end())
break;		break;
assert (CfAlu->getOpcode() == AMDGPU::CF_ALU_PUSH_BEFORE);		assert (CfAlu->getOpcode() == R600::CF_ALU_PUSH_BEFORE);
CfAlu->setDesc(get(AMDGPU::CF_ALU));		CfAlu->setDesc(get(R600::CF_ALU));
break;		break;
}		}
case AMDGPU::JUMP:		case R600::JUMP:
I->eraseFromParent();		I->eraseFromParent();
break;		break;
}		}
I = MBB.end();		I = MBB.end();

if (I == MBB.begin()) {		if (I == MBB.begin()) {
return 1;		return 1;
}		}
--I;		--I;
switch (I->getOpcode()) {		switch (I->getOpcode()) {
// FIXME: only one case??		// FIXME: only one case??
default:		default:
return 1;		return 1;
case AMDGPU::JUMP_COND: {		case R600::JUMP_COND: {
MachineInstr *predSet = findFirstPredicateSetterFrom(MBB, I);		MachineInstr *predSet = findFirstPredicateSetterFrom(MBB, I);
clearFlag(*predSet, 0, MO_FLAG_PUSH);		clearFlag(*predSet, 0, MO_FLAG_PUSH);
I->eraseFromParent();		I->eraseFromParent();
MachineBasicBlock::iterator CfAlu = FindLastAluClause(MBB);		MachineBasicBlock::iterator CfAlu = FindLastAluClause(MBB);
if (CfAlu == MBB.end())		if (CfAlu == MBB.end())
break;		break;
assert (CfAlu->getOpcode() == AMDGPU::CF_ALU_PUSH_BEFORE);		assert (CfAlu->getOpcode() == R600::CF_ALU_PUSH_BEFORE);
CfAlu->setDesc(get(AMDGPU::CF_ALU));		CfAlu->setDesc(get(R600::CF_ALU));
break;		break;
}		}
case AMDGPU::JUMP:		case R600::JUMP:
I->eraseFromParent();		I->eraseFromParent();
break;		break;
}		}
return 2;		return 2;
}		}

bool R600InstrInfo::isPredicated(const MachineInstr &MI) const {		bool R600InstrInfo::isPredicated(const MachineInstr &MI) const {
int idx = MI.findFirstPredOperandIdx();		int idx = MI.findFirstPredOperandIdx();
if (idx < 0)		if (idx < 0)
return false;		return false;

unsigned Reg = MI.getOperand(idx).getReg();		unsigned Reg = MI.getOperand(idx).getReg();
switch (Reg) {		switch (Reg) {
default: return false;		default: return false;
case AMDGPU::PRED_SEL_ONE:		case R600::PRED_SEL_ONE:
case AMDGPU::PRED_SEL_ZERO:		case R600::PRED_SEL_ZERO:
case AMDGPU::PREDICATE_BIT:		case R600::PREDICATE_BIT:
return true;		return true;
}		}
}		}

bool R600InstrInfo::isPredicable(const MachineInstr &MI) const {		bool R600InstrInfo::isPredicable(const MachineInstr &MI) const {
// XXX: KILL* instructions can be predicated, but they must be the last		// XXX: KILL* instructions can be predicated, but they must be the last
// instruction in a clause, so this means any instructions after them cannot		// instruction in a clause, so this means any instructions after them cannot
// be predicated. Until we have proper support for instruction clauses in the		// be predicated. Until we have proper support for instruction clauses in the
// backend, we will mark KILL* instructions as unpredicable.		// backend, we will mark KILL* instructions as unpredicable.

if (MI.getOpcode() == AMDGPU::KILLGT) {		if (MI.getOpcode() == R600::KILLGT) {
return false;		return false;
} else if (MI.getOpcode() == AMDGPU::CF_ALU) {		} else if (MI.getOpcode() == R600::CF_ALU) {
// If the clause start in the middle of MBB then the MBB has more		// If the clause start in the middle of MBB then the MBB has more
// than a single clause, unable to predicate several clauses.		// than a single clause, unable to predicate several clauses.
if (MI.getParent()->begin() != MachineBasicBlock::const_iterator(MI))		if (MI.getParent()->begin() != MachineBasicBlock::const_iterator(MI))
return false;		return false;
// TODO: We don't support KC merging atm		// TODO: We don't support KC merging atm
return MI.getOperand(3).getImm() == 0 && MI.getOperand(4).getImm() == 0;		return MI.getOperand(3).getImm() == 0 && MI.getOperand(4).getImm() == 0;
} else if (isVector(MI)) {		} else if (isVector(MI)) {
return false;		return false;
} else {		} else {
return AMDGPUInstrInfo::isPredicable(MI);		return TargetInstrInfo::isPredicable(MI);
}		}
}		}

bool		bool
R600InstrInfo::isProfitableToIfCvt(MachineBasicBlock &MBB,		R600InstrInfo::isProfitableToIfCvt(MachineBasicBlock &MBB,
unsigned NumCycles,		unsigned NumCycles,
unsigned ExtraPredCycles,		unsigned ExtraPredCycles,
BranchProbability Probability) const{		BranchProbability Probability) const{
[24 unchanged lines hidden]	R600InstrInfo::isProfitableToUnpredicate(MachineBasicBlock &TMBB,
MachineBasicBlock &FMBB) const {		MachineBasicBlock &FMBB) const {
return false;		return false;
}		}

bool		bool
R600InstrInfo::reverseBranchCondition(SmallVectorImpl<MachineOperand> &Cond) const {		R600InstrInfo::reverseBranchCondition(SmallVectorImpl<MachineOperand> &Cond) const {
MachineOperand &MO = Cond[1];		MachineOperand &MO = Cond[1];
switch (MO.getImm()) {		switch (MO.getImm()) {
case AMDGPU::PRED_SETE_INT:		case R600::PRED_SETE_INT:
MO.setImm(AMDGPU::PRED_SETNE_INT);		MO.setImm(R600::PRED_SETNE_INT);
break;		break;
case AMDGPU::PRED_SETNE_INT:		case R600::PRED_SETNE_INT:
MO.setImm(AMDGPU::PRED_SETE_INT);		MO.setImm(R600::PRED_SETE_INT);
break;		break;
case AMDGPU::PRED_SETE:		case R600::PRED_SETE:
MO.setImm(AMDGPU::PRED_SETNE);		MO.setImm(R600::PRED_SETNE);
break;		break;
case AMDGPU::PRED_SETNE:		case R600::PRED_SETNE:
MO.setImm(AMDGPU::PRED_SETE);		MO.setImm(R600::PRED_SETE);
break;		break;
default:		default:
return true;		return true;
}		}

MachineOperand &MO2 = Cond[2];		MachineOperand &MO2 = Cond[2];
switch (MO2.getReg()) {		switch (MO2.getReg()) {
case AMDGPU::PRED_SEL_ZERO:		case R600::PRED_SEL_ZERO:
MO2.setReg(AMDGPU::PRED_SEL_ONE);		MO2.setReg(R600::PRED_SEL_ONE);
break;		break;
case AMDGPU::PRED_SEL_ONE:		case R600::PRED_SEL_ONE:
MO2.setReg(AMDGPU::PRED_SEL_ZERO);		MO2.setReg(R600::PRED_SEL_ZERO);
break;		break;
default:		default:
return true;		return true;
}		}
return false;		return false;
}		}

bool R600InstrInfo::DefinesPredicate(MachineInstr &MI,		bool R600InstrInfo::DefinesPredicate(MachineInstr &MI,
std::vector<MachineOperand> &Pred) const {		std::vector<MachineOperand> &Pred) const {
return isPredicateSetter(MI.getOpcode());		return isPredicateSetter(MI.getOpcode());
}		}

bool R600InstrInfo::PredicateInstruction(MachineInstr &MI,		bool R600InstrInfo::PredicateInstruction(MachineInstr &MI,
ArrayRef<MachineOperand> Pred) const {		ArrayRef<MachineOperand> Pred) const {
int PIdx = MI.findFirstPredOperandIdx();		int PIdx = MI.findFirstPredOperandIdx();

if (MI.getOpcode() == AMDGPU::CF_ALU) {		if (MI.getOpcode() == R600::CF_ALU) {
MI.getOperand(8).setImm(0);		MI.getOperand(8).setImm(0);
return true;		return true;
}		}

if (MI.getOpcode() == AMDGPU::DOT_4) {		if (MI.getOpcode() == R600::DOT_4) {
MI.getOperand(getOperandIdx(MI, AMDGPU::OpName::pred_sel_X))		MI.getOperand(getOperandIdx(MI, R600::OpName::pred_sel_X))
.setReg(Pred[2].getReg());		.setReg(Pred[2].getReg());
MI.getOperand(getOperandIdx(MI, AMDGPU::OpName::pred_sel_Y))		MI.getOperand(getOperandIdx(MI, R600::OpName::pred_sel_Y))
.setReg(Pred[2].getReg());		.setReg(Pred[2].getReg());
MI.getOperand(getOperandIdx(MI, AMDGPU::OpName::pred_sel_Z))		MI.getOperand(getOperandIdx(MI, R600::OpName::pred_sel_Z))
.setReg(Pred[2].getReg());		.setReg(Pred[2].getReg());
MI.getOperand(getOperandIdx(MI, AMDGPU::OpName::pred_sel_W))		MI.getOperand(getOperandIdx(MI, R600::OpName::pred_sel_W))
.setReg(Pred[2].getReg());		.setReg(Pred[2].getReg());
MachineInstrBuilder MIB(*MI.getParent()->getParent(), MI);		MachineInstrBuilder MIB(*MI.getParent()->getParent(), MI);
MIB.addReg(AMDGPU::PREDICATE_BIT, RegState::Implicit);		MIB.addReg(R600::PREDICATE_BIT, RegState::Implicit);
return true;		return true;
}		}

if (PIdx != -1) {		if (PIdx != -1) {
MachineOperand &PMO = MI.getOperand(PIdx);		MachineOperand &PMO = MI.getOperand(PIdx);
PMO.setReg(Pred[2].getReg());		PMO.setReg(Pred[2].getReg());
MachineInstrBuilder MIB(*MI.getParent()->getParent(), MI);		MachineInstrBuilder MIB(*MI.getParent()->getParent(), MI);
MIB.addReg(AMDGPU::PREDICATE_BIT, RegState::Implicit);		MIB.addReg(R600::PREDICATE_BIT, RegState::Implicit);
return true;		return true;
}		}

return false;		return false;
}		}

unsigned int R600InstrInfo::getPredicationCost(const MachineInstr &) const {		unsigned int R600InstrInfo::getPredicationCost(const MachineInstr &) const {
return 2;		return 2;
[13 unchanged lines hidden]	unsigned R600InstrInfo::calculateIndirectAddress(unsigned RegIndex,
return RegIndex;		return RegIndex;
}		}

bool R600InstrInfo::expandPostRAPseudo(MachineInstr &MI) const {		bool R600InstrInfo::expandPostRAPseudo(MachineInstr &MI) const {
switch (MI.getOpcode()) {		switch (MI.getOpcode()) {
default: {		default: {
MachineBasicBlock *MBB = MI.getParent();		MachineBasicBlock *MBB = MI.getParent();
int OffsetOpIdx =		int OffsetOpIdx =
AMDGPU::getNamedOperandIdx(MI.getOpcode(), AMDGPU::OpName::addr);		R600::getNamedOperandIdx(MI.getOpcode(), R600::OpName::addr);
// addr is a custom operand with multiple MI operands, and only the		// addr is a custom operand with multiple MI operands, and only the
// first MI operand is given a name.		// first MI operand is given a name.
int RegOpIdx = OffsetOpIdx + 1;		int RegOpIdx = OffsetOpIdx + 1;
int ChanOpIdx =		int ChanOpIdx =
AMDGPU::getNamedOperandIdx(MI.getOpcode(), AMDGPU::OpName::chan);		R600::getNamedOperandIdx(MI.getOpcode(), R600::OpName::chan);
if (isRegisterLoad(MI)) {		if (isRegisterLoad(MI)) {
int DstOpIdx =		int DstOpIdx =
AMDGPU::getNamedOperandIdx(MI.getOpcode(), AMDGPU::OpName::dst);		R600::getNamedOperandIdx(MI.getOpcode(), R600::OpName::dst);
unsigned RegIndex = MI.getOperand(RegOpIdx).getImm();		unsigned RegIndex = MI.getOperand(RegOpIdx).getImm();
unsigned Channel = MI.getOperand(ChanOpIdx).getImm();		unsigned Channel = MI.getOperand(ChanOpIdx).getImm();
unsigned Address = calculateIndirectAddress(RegIndex, Channel);		unsigned Address = calculateIndirectAddress(RegIndex, Channel);
unsigned OffsetReg = MI.getOperand(OffsetOpIdx).getReg();		unsigned OffsetReg = MI.getOperand(OffsetOpIdx).getReg();
if (OffsetReg == AMDGPU::INDIRECT_BASE_ADDR) {		if (OffsetReg == R600::INDIRECT_BASE_ADDR) {
buildMovInstr(MBB, MI, MI.getOperand(DstOpIdx).getReg(),		buildMovInstr(MBB, MI, MI.getOperand(DstOpIdx).getReg(),
getIndirectAddrRegClass()->getRegister(Address));		getIndirectAddrRegClass()->getRegister(Address));
} else {		} else {
buildIndirectRead(MBB, MI, MI.getOperand(DstOpIdx).getReg(), Address,		buildIndirectRead(MBB, MI, MI.getOperand(DstOpIdx).getReg(), Address,
OffsetReg);		OffsetReg);
}		}
} else if (isRegisterStore(MI)) {		} else if (isRegisterStore(MI)) {
int ValOpIdx =		int ValOpIdx =
AMDGPU::getNamedOperandIdx(MI.getOpcode(), AMDGPU::OpName::val);		R600::getNamedOperandIdx(MI.getOpcode(), R600::OpName::val);
unsigned RegIndex = MI.getOperand(RegOpIdx).getImm();		unsigned RegIndex = MI.getOperand(RegOpIdx).getImm();
unsigned Channel = MI.getOperand(ChanOpIdx).getImm();		unsigned Channel = MI.getOperand(ChanOpIdx).getImm();
unsigned Address = calculateIndirectAddress(RegIndex, Channel);		unsigned Address = calculateIndirectAddress(RegIndex, Channel);
unsigned OffsetReg = MI.getOperand(OffsetOpIdx).getReg();		unsigned OffsetReg = MI.getOperand(OffsetOpIdx).getReg();
if (OffsetReg == AMDGPU::INDIRECT_BASE_ADDR) {		if (OffsetReg == R600::INDIRECT_BASE_ADDR) {
buildMovInstr(MBB, MI, getIndirectAddrRegClass()->getRegister(Address),		buildMovInstr(MBB, MI, getIndirectAddrRegClass()->getRegister(Address),
MI.getOperand(ValOpIdx).getReg());		MI.getOperand(ValOpIdx).getReg());
} else {		} else {
buildIndirectWrite(MBB, MI, MI.getOperand(ValOpIdx).getReg(),		buildIndirectWrite(MBB, MI, MI.getOperand(ValOpIdx).getReg(),
calculateIndirectAddress(RegIndex, Channel),		calculateIndirectAddress(RegIndex, Channel),
OffsetReg);		OffsetReg);
}		}
} else {		} else {
return false;		return false;
}		}

MBB->erase(MI);		MBB->erase(MI);
return true;		return true;
}		}
case AMDGPU::R600_EXTRACT_ELT_V2:		case R600::R600_EXTRACT_ELT_V2:
case AMDGPU::R600_EXTRACT_ELT_V4:		case R600::R600_EXTRACT_ELT_V4:
buildIndirectRead(MI.getParent(), MI, MI.getOperand(0).getReg(),		buildIndirectRead(MI.getParent(), MI, MI.getOperand(0).getReg(),
RI.getHWRegIndex(MI.getOperand(1).getReg()), // Address		RI.getHWRegIndex(MI.getOperand(1).getReg()), // Address
MI.getOperand(2).getReg(),		MI.getOperand(2).getReg(),
RI.getHWRegChan(MI.getOperand(1).getReg()));		RI.getHWRegChan(MI.getOperand(1).getReg()));
break;		break;
case AMDGPU::R600_INSERT_ELT_V2:		case R600::R600_INSERT_ELT_V2:
case AMDGPU::R600_INSERT_ELT_V4:		case R600::R600_INSERT_ELT_V4:
buildIndirectWrite(MI.getParent(), MI, MI.getOperand(2).getReg(), // Value		buildIndirectWrite(MI.getParent(), MI, MI.getOperand(2).getReg(), // Value
RI.getHWRegIndex(MI.getOperand(1).getReg()), // Address		RI.getHWRegIndex(MI.getOperand(1).getReg()), // Address
MI.getOperand(3).getReg(), // Offset		MI.getOperand(3).getReg(), // Offset
RI.getHWRegChan(MI.getOperand(1).getReg())); // Channel		RI.getHWRegChan(MI.getOperand(1).getReg())); // Channel
break;		break;
}		}
MI.eraseFromParent();		MI.eraseFromParent();
return true;		return true;
}		}

void R600InstrInfo::reserveIndirectRegisters(BitVector &Reserved,		void R600InstrInfo::reserveIndirectRegisters(BitVector &Reserved,
const MachineFunction &MF,		const MachineFunction &MF,
const R600RegisterInfo &TRI) const {		const R600RegisterInfo &TRI) const {
const R600Subtarget &ST = MF.getSubtarget<R600Subtarget>();		const R600Subtarget &ST = MF.getSubtarget<R600Subtarget>();
const R600FrameLowering *TFL = ST.getFrameLowering();		const R600FrameLowering *TFL = ST.getFrameLowering();

unsigned StackWidth = TFL->getStackWidth(MF);		unsigned StackWidth = TFL->getStackWidth(MF);
int End = getIndirectIndexEnd(MF);		int End = getIndirectIndexEnd(MF);

if (End == -1)		if (End == -1)
return;		return;

for (int Index = getIndirectIndexBegin(MF); Index <= End; ++Index) {		for (int Index = getIndirectIndexBegin(MF); Index <= End; ++Index) {
for (unsigned Chan = 0; Chan < StackWidth; ++Chan) {		for (unsigned Chan = 0; Chan < StackWidth; ++Chan) {
unsigned Reg = AMDGPU::R600_TReg32RegClass.getRegister((4 * Index) + Chan);		unsigned Reg = R600::R600_TReg32RegClass.getRegister((4 * Index) + Chan);
TRI.reserveRegisterTuples(Reserved, Reg);		TRI.reserveRegisterTuples(Reserved, Reg);
}		}
}		}
}		}

const TargetRegisterClass *R600InstrInfo::getIndirectAddrRegClass() const {		const TargetRegisterClass *R600InstrInfo::getIndirectAddrRegClass() const {
return &AMDGPU::R600_TReg32_XRegClass;		return &R600::R600_TReg32_XRegClass;
}		}

MachineInstrBuilder R600InstrInfo::buildIndirectWrite(MachineBasicBlock *MBB,		MachineInstrBuilder R600InstrInfo::buildIndirectWrite(MachineBasicBlock *MBB,
MachineBasicBlock::iterator I,		MachineBasicBlock::iterator I,
unsigned ValueReg, unsigned Address,		unsigned ValueReg, unsigned Address,
unsigned OffsetReg) const {		unsigned OffsetReg) const {
return buildIndirectWrite(MBB, I, ValueReg, Address, OffsetReg, 0);		return buildIndirectWrite(MBB, I, ValueReg, Address, OffsetReg, 0);
}		}

MachineInstrBuilder R600InstrInfo::buildIndirectWrite(MachineBasicBlock *MBB,		MachineInstrBuilder R600InstrInfo::buildIndirectWrite(MachineBasicBlock *MBB,
MachineBasicBlock::iterator I,		MachineBasicBlock::iterator I,
unsigned ValueReg, unsigned Address,		unsigned ValueReg, unsigned Address,
unsigned OffsetReg,		unsigned OffsetReg,
unsigned AddrChan) const {		unsigned AddrChan) const {
unsigned AddrReg;		unsigned AddrReg;
switch (AddrChan) {		switch (AddrChan) {
default: llvm_unreachable("Invalid Channel");		default: llvm_unreachable("Invalid Channel");
case 0: AddrReg = AMDGPU::R600_AddrRegClass.getRegister(Address); break;		case 0: AddrReg = R600::R600_AddrRegClass.getRegister(Address); break;
case 1: AddrReg = AMDGPU::R600_Addr_YRegClass.getRegister(Address); break;		case 1: AddrReg = R600::R600_Addr_YRegClass.getRegister(Address); break;
case 2: AddrReg = AMDGPU::R600_Addr_ZRegClass.getRegister(Address); break;		case 2: AddrReg = R600::R600_Addr_ZRegClass.getRegister(Address); break;
case 3: AddrReg = AMDGPU::R600_Addr_WRegClass.getRegister(Address); break;		case 3: AddrReg = R600::R600_Addr_WRegClass.getRegister(Address); break;
}		}
MachineInstr *MOVA = buildDefaultInstruction(*MBB, I, AMDGPU::MOVA_INT_eg,		MachineInstr *MOVA = buildDefaultInstruction(*MBB, I, R600::MOVA_INT_eg,
AMDGPU::AR_X, OffsetReg);		R600::AR_X, OffsetReg);
setImmOperand(*MOVA, AMDGPU::OpName::write, 0);		setImmOperand(*MOVA, R600::OpName::write, 0);

MachineInstrBuilder Mov = buildDefaultInstruction(*MBB, I, AMDGPU::MOV,		MachineInstrBuilder Mov = buildDefaultInstruction(*MBB, I, R600::MOV,
AddrReg, ValueReg)		AddrReg, ValueReg)
.addReg(AMDGPU::AR_X,		.addReg(R600::AR_X,
RegState::Implicit | RegState::Kill);		RegState::Implicit | RegState::Kill);
setImmOperand(*Mov, AMDGPU::OpName::dst_rel, 1);		setImmOperand(*Mov, R600::OpName::dst_rel, 1);
return Mov;		return Mov;
}		}

MachineInstrBuilder R600InstrInfo::buildIndirectRead(MachineBasicBlock *MBB,		MachineInstrBuilder R600InstrInfo::buildIndirectRead(MachineBasicBlock *MBB,
MachineBasicBlock::iterator I,		MachineBasicBlock::iterator I,
unsigned ValueReg, unsigned Address,		unsigned ValueReg, unsigned Address,
unsigned OffsetReg) const {		unsigned OffsetReg) const {
return buildIndirectRead(MBB, I, ValueReg, Address, OffsetReg, 0);		return buildIndirectRead(MBB, I, ValueReg, Address, OffsetReg, 0);
}		}

MachineInstrBuilder R600InstrInfo::buildIndirectRead(MachineBasicBlock *MBB,		MachineInstrBuilder R600InstrInfo::buildIndirectRead(MachineBasicBlock *MBB,
MachineBasicBlock::iterator I,		MachineBasicBlock::iterator I,
unsigned ValueReg, unsigned Address,		unsigned ValueReg, unsigned Address,
unsigned OffsetReg,		unsigned OffsetReg,
unsigned AddrChan) const {		unsigned AddrChan) const {
unsigned AddrReg;		unsigned AddrReg;
switch (AddrChan) {		switch (AddrChan) {
default: llvm_unreachable("Invalid Channel");		default: llvm_unreachable("Invalid Channel");
case 0: AddrReg = AMDGPU::R600_AddrRegClass.getRegister(Address); break;		case 0: AddrReg = R600::R600_AddrRegClass.getRegister(Address); break;
case 1: AddrReg = AMDGPU::R600_Addr_YRegClass.getRegister(Address); break;		case 1: AddrReg = R600::R600_Addr_YRegClass.getRegister(Address); break;
case 2: AddrReg = AMDGPU::R600_Addr_ZRegClass.getRegister(Address); break;		case 2: AddrReg = R600::R600_Addr_ZRegClass.getRegister(Address); break;
case 3: AddrReg = AMDGPU::R600_Addr_WRegClass.getRegister(Address); break;		case 3: AddrReg = R600::R600_Addr_WRegClass.getRegister(Address); break;
}		}
MachineInstr *MOVA = buildDefaultInstruction(*MBB, I, AMDGPU::MOVA_INT_eg,		MachineInstr *MOVA = buildDefaultInstruction(*MBB, I, R600::MOVA_INT_eg,
AMDGPU::AR_X,		R600::AR_X,
OffsetReg);		OffsetReg);
setImmOperand(*MOVA, AMDGPU::OpName::write, 0);		setImmOperand(*MOVA, R600::OpName::write, 0);
MachineInstrBuilder Mov = buildDefaultInstruction(*MBB, I, AMDGPU::MOV,		MachineInstrBuilder Mov = buildDefaultInstruction(*MBB, I, R600::MOV,
ValueReg,		ValueReg,
AddrReg)		AddrReg)
.addReg(AMDGPU::AR_X,		.addReg(R600::AR_X,
RegState::Implicit | RegState::Kill);		RegState::Implicit | RegState::Kill);
setImmOperand(*Mov, AMDGPU::OpName::src0_rel, 1);		setImmOperand(*Mov, R600::OpName::src0_rel, 1);

return Mov;		return Mov;
}		}

int R600InstrInfo::getIndirectIndexBegin(const MachineFunction &MF) const {		int R600InstrInfo::getIndirectIndexBegin(const MachineFunction &MF) const {
const MachineRegisterInfo &MRI = MF.getRegInfo();		const MachineRegisterInfo &MRI = MF.getRegInfo();
const MachineFrameInfo &MFI = MF.getFrameInfo();		const MachineFrameInfo &MFI = MF.getFrameInfo();
int Offset = -1;		int Offset = -1;
[81 unchanged lines hidden]	MIB.addReg(Src1Reg) // $src1
.addImm(0) // $src1_rel		.addImm(0) // $src1_rel
.addImm(0) // $src1_abs		.addImm(0) // $src1_abs
.addImm(-1); // $src1_sel		.addImm(-1); // $src1_sel
}		}

//XXX: The r600g finalizer expects this to be 1, once we've moved the		//XXX: The r600g finalizer expects this to be 1, once we've moved the
//scheduling to the backend, we can change the default to 0.		//scheduling to the backend, we can change the default to 0.
MIB.addImm(1) // $last		MIB.addImm(1) // $last
.addReg(AMDGPU::PRED_SEL_OFF) // $pred_sel		.addReg(R600::PRED_SEL_OFF) // $pred_sel
.addImm(0) // $literal		.addImm(0) // $literal
.addImm(0); // $bank_swizzle		.addImm(0); // $bank_swizzle

return MIB;		return MIB;
}		}

#define OPERAND_CASE(Label) \		#define OPERAND_CASE(Label) \
case Label: { \		case Label: { \
static const unsigned Ops[] = \		static const unsigned Ops[] = \
{ \		{ \
Label##_X, \		Label##_X, \
Label##_Y, \		Label##_Y, \
Label##_Z, \		Label##_Z, \
Label##_W \		Label##_W \
}; \		}; \
return Ops[Slot]; \		return Ops[Slot]; \
}		}

static unsigned getSlotedOps(unsigned Op, unsigned Slot) {		static unsigned getSlotedOps(unsigned Op, unsigned Slot) {
switch (Op) {		switch (Op) {
OPERAND_CASE(AMDGPU::OpName::update_exec_mask)		OPERAND_CASE(R600::OpName::update_exec_mask)
OPERAND_CASE(AMDGPU::OpName::update_pred)		OPERAND_CASE(R600::OpName::update_pred)
OPERAND_CASE(AMDGPU::OpName::write)		OPERAND_CASE(R600::OpName::write)
OPERAND_CASE(AMDGPU::OpName::omod)		OPERAND_CASE(R600::OpName::omod)
OPERAND_CASE(AMDGPU::OpName::dst_rel)		OPERAND_CASE(R600::OpName::dst_rel)
OPERAND_CASE(AMDGPU::OpName::clamp)		OPERAND_CASE(R600::OpName::clamp)
OPERAND_CASE(AMDGPU::OpName::src0)		OPERAND_CASE(R600::OpName::src0)
OPERAND_CASE(AMDGPU::OpName::src0_neg)		OPERAND_CASE(R600::OpName::src0_neg)
OPERAND_CASE(AMDGPU::OpName::src0_rel)		OPERAND_CASE(R600::OpName::src0_rel)
OPERAND_CASE(AMDGPU::OpName::src0_abs)		OPERAND_CASE(R600::OpName::src0_abs)
OPERAND_CASE(AMDGPU::OpName::src0_sel)		OPERAND_CASE(R600::OpName::src0_sel)
OPERAND_CASE(AMDGPU::OpName::src1)		OPERAND_CASE(R600::OpName::src1)
OPERAND_CASE(AMDGPU::OpName::src1_neg)		OPERAND_CASE(R600::OpName::src1_neg)
OPERAND_CASE(AMDGPU::OpName::src1_rel)		OPERAND_CASE(R600::OpName::src1_rel)
OPERAND_CASE(AMDGPU::OpName::src1_abs)		OPERAND_CASE(R600::OpName::src1_abs)
OPERAND_CASE(AMDGPU::OpName::src1_sel)		OPERAND_CASE(R600::OpName::src1_sel)
OPERAND_CASE(AMDGPU::OpName::pred_sel)		OPERAND_CASE(R600::OpName::pred_sel)
default:		default:
llvm_unreachable("Wrong Operand");		llvm_unreachable("Wrong Operand");
}		}
}		}

#undef OPERAND_CASE		#undef OPERAND_CASE

MachineInstr *R600InstrInfo::buildSlotOfVectorInstruction(		MachineInstr *R600InstrInfo::buildSlotOfVectorInstruction(
MachineBasicBlock &MBB, MachineInstr *MI, unsigned Slot, unsigned DstReg)		MachineBasicBlock &MBB, MachineInstr *MI, unsigned Slot, unsigned DstReg)
const {		const {
assert (MI->getOpcode() == AMDGPU::DOT_4 && "Not Implemented");		assert (MI->getOpcode() == R600::DOT_4 && "Not Implemented");
unsigned Opcode;		unsigned Opcode;
if (ST.getGeneration() <= R600Subtarget::R700)		if (ST.getGeneration() <= R600Subtarget::R700)
Opcode = AMDGPU::DOT4_r600;		Opcode = R600::DOT4_r600;
else		else
Opcode = AMDGPU::DOT4_eg;		Opcode = R600::DOT4_eg;
MachineBasicBlock::iterator I = MI;		MachineBasicBlock::iterator I = MI;
MachineOperand &Src0 = MI->getOperand(		MachineOperand &Src0 = MI->getOperand(
getOperandIdx(MI->getOpcode(), getSlotedOps(AMDGPU::OpName::src0, Slot)));		getOperandIdx(MI->getOpcode(), getSlotedOps(R600::OpName::src0, Slot)));
MachineOperand &Src1 = MI->getOperand(		MachineOperand &Src1 = MI->getOperand(
getOperandIdx(MI->getOpcode(), getSlotedOps(AMDGPU::OpName::src1, Slot)));		getOperandIdx(MI->getOpcode(), getSlotedOps(R600::OpName::src1, Slot)));
MachineInstr *MIB = buildDefaultInstruction(		MachineInstr *MIB = buildDefaultInstruction(
MBB, I, Opcode, DstReg, Src0.getReg(), Src1.getReg());		MBB, I, Opcode, DstReg, Src0.getReg(), Src1.getReg());
static const unsigned Operands[14] = {		static const unsigned Operands[14] = {
AMDGPU::OpName::update_exec_mask,		R600::OpName::update_exec_mask,
AMDGPU::OpName::update_pred,		R600::OpName::update_pred,
AMDGPU::OpName::write,		R600::OpName::write,
AMDGPU::OpName::omod,		R600::OpName::omod,
AMDGPU::OpName::dst_rel,		R600::OpName::dst_rel,
AMDGPU::OpName::clamp,		R600::OpName::clamp,
AMDGPU::OpName::src0_neg,		R600::OpName::src0_neg,
AMDGPU::OpName::src0_rel,		R600::OpName::src0_rel,
AMDGPU::OpName::src0_abs,		R600::OpName::src0_abs,
AMDGPU::OpName::src0_sel,		R600::OpName::src0_sel,
AMDGPU::OpName::src1_neg,		R600::OpName::src1_neg,
AMDGPU::OpName::src1_rel,		R600::OpName::src1_rel,
AMDGPU::OpName::src1_abs,		R600::OpName::src1_abs,
AMDGPU::OpName::src1_sel,		R600::OpName::src1_sel,
};		};

MachineOperand &MO = MI->getOperand(getOperandIdx(MI->getOpcode(),		MachineOperand &MO = MI->getOperand(getOperandIdx(MI->getOpcode(),
getSlotedOps(AMDGPU::OpName::pred_sel, Slot)));		getSlotedOps(R600::OpName::pred_sel, Slot)));
MIB->getOperand(getOperandIdx(Opcode, AMDGPU::OpName::pred_sel))		MIB->getOperand(getOperandIdx(Opcode, R600::OpName::pred_sel))
.setReg(MO.getReg());		.setReg(MO.getReg());

for (unsigned i = 0; i < 14; i++) {		for (unsigned i = 0; i < 14; i++) {
MachineOperand &MO = MI->getOperand(		MachineOperand &MO = MI->getOperand(
getOperandIdx(MI->getOpcode(), getSlotedOps(Operands[i], Slot)));		getOperandIdx(MI->getOpcode(), getSlotedOps(Operands[i], Slot)));
assert (MO.isImm());		assert (MO.isImm());
setImmOperand(*MIB, Operands[i], MO.getImm());		setImmOperand(*MIB, Operands[i], MO.getImm());
}		}
MIB->getOperand(20).setImm(0);		MIB->getOperand(20).setImm(0);
return MIB;		return MIB;
}		}

MachineInstr *R600InstrInfo::buildMovImm(MachineBasicBlock &BB,		MachineInstr *R600InstrInfo::buildMovImm(MachineBasicBlock &BB,
MachineBasicBlock::iterator I,		MachineBasicBlock::iterator I,
unsigned DstReg,		unsigned DstReg,
uint64_t Imm) const {		uint64_t Imm) const {
MachineInstr *MovImm = buildDefaultInstruction(BB, I, AMDGPU::MOV, DstReg,		MachineInstr *MovImm = buildDefaultInstruction(BB, I, R600::MOV, DstReg,
AMDGPU::ALU_LITERAL_X);		R600::ALU_LITERAL_X);
setImmOperand(*MovImm, AMDGPU::OpName::literal, Imm);		setImmOperand(*MovImm, R600::OpName::literal, Imm);
return MovImm;		return MovImm;
}		}

MachineInstr *R600InstrInfo::buildMovInstr(MachineBasicBlock *MBB,		MachineInstr *R600InstrInfo::buildMovInstr(MachineBasicBlock *MBB,
MachineBasicBlock::iterator I,		MachineBasicBlock::iterator I,
unsigned DstReg, unsigned SrcReg) const {		unsigned DstReg, unsigned SrcReg) const {
return buildDefaultInstruction(*MBB, I, AMDGPU::MOV, DstReg, SrcReg);		return buildDefaultInstruction(*MBB, I, R600::MOV, DstReg, SrcReg);
}		}

int R600InstrInfo::getOperandIdx(const MachineInstr &MI, unsigned Op) const {		int R600InstrInfo::getOperandIdx(const MachineInstr &MI, unsigned Op) const {
return getOperandIdx(MI.getOpcode(), Op);		return getOperandIdx(MI.getOpcode(), Op);
}		}

int R600InstrInfo::getOperandIdx(unsigned Opcode, unsigned Op) const {		int R600InstrInfo::getOperandIdx(unsigned Opcode, unsigned Op) const {
return AMDGPU::getNamedOperandIdx(Opcode, Op);		return R600::getNamedOperandIdx(Opcode, Op);
}		}

void R600InstrInfo::setImmOperand(MachineInstr &MI, unsigned Op,		void R600InstrInfo::setImmOperand(MachineInstr &MI, unsigned Op,
int64_t Imm) const {		int64_t Imm) const {
int Idx = getOperandIdx(MI, Op);		int Idx = getOperandIdx(MI, Op);
assert(Idx != -1 && "Operand not supported for this instruction.");		assert(Idx != -1 && "Operand not supported for this instruction.");
assert(MI.getOperand(Idx).isImm());		assert(MI.getOperand(Idx).isImm());
MI.getOperand(Idx).setImm(Imm);		MI.getOperand(Idx).setImm(Imm);
[10 unchanged lines hidden]	MachineOperand &R600InstrInfo::getFlagOp(MachineInstr &MI, unsigned SrcIdx,
if (Flag != 0) {		if (Flag != 0) {
// If we pass something other than the default value of Flag to this		// If we pass something other than the default value of Flag to this
// function, it means we are want to set a flag on an instruction		// function, it means we are want to set a flag on an instruction
// that uses native encoding.		// that uses native encoding.
assert(HAS_NATIVE_OPERANDS(TargetFlags));		assert(HAS_NATIVE_OPERANDS(TargetFlags));
bool IsOP3 = (TargetFlags & R600_InstFlag::OP3) == R600_InstFlag::OP3;		bool IsOP3 = (TargetFlags & R600_InstFlag::OP3) == R600_InstFlag::OP3;
switch (Flag) {		switch (Flag) {
case MO_FLAG_CLAMP:		case MO_FLAG_CLAMP:
FlagIndex = getOperandIdx(MI, AMDGPU::OpName::clamp);		FlagIndex = getOperandIdx(MI, R600::OpName::clamp);
break;		break;
case MO_FLAG_MASK:		case MO_FLAG_MASK:
FlagIndex = getOperandIdx(MI, AMDGPU::OpName::write);		FlagIndex = getOperandIdx(MI, R600::OpName::write);
break;		break;
case MO_FLAG_NOT_LAST:		case MO_FLAG_NOT_LAST:
case MO_FLAG_LAST:		case MO_FLAG_LAST:
FlagIndex = getOperandIdx(MI, AMDGPU::OpName::last);		FlagIndex = getOperandIdx(MI, R600::OpName::last);
break;		break;
case MO_FLAG_NEG:		case MO_FLAG_NEG:
switch (SrcIdx) {		switch (SrcIdx) {
case 0:		case 0:
FlagIndex = getOperandIdx(MI, AMDGPU::OpName::src0_neg);		FlagIndex = getOperandIdx(MI, R600::OpName::src0_neg);
break;		break;
case 1:		case 1:
FlagIndex = getOperandIdx(MI, AMDGPU::OpName::src1_neg);		FlagIndex = getOperandIdx(MI, R600::OpName::src1_neg);
break;		break;
case 2:		case 2:
FlagIndex = getOperandIdx(MI, AMDGPU::OpName::src2_neg);		FlagIndex = getOperandIdx(MI, R600::OpName::src2_neg);
break;		break;
}		}
break;		break;

case MO_FLAG_ABS:		case MO_FLAG_ABS:
assert(!IsOP3 && "Cannot set absolute value modifier for OP3 "		assert(!IsOP3 && "Cannot set absolute value modifier for OP3 "
"instructions.");		"instructions.");
(void)IsOP3;		(void)IsOP3;
switch (SrcIdx) {		switch (SrcIdx) {
case 0:		case 0:
FlagIndex = getOperandIdx(MI, AMDGPU::OpName::src0_abs);		FlagIndex = getOperandIdx(MI, R600::OpName::src0_abs);
break;		break;
case 1:		case 1:
FlagIndex = getOperandIdx(MI, AMDGPU::OpName::src1_abs);		FlagIndex = getOperandIdx(MI, R600::OpName::src1_abs);
break;		break;
}		}
break;		break;

default:		default:
FlagIndex = -1;		FlagIndex = -1;
break;		break;
}		}
[44 unchanged lines hidden]	void R600InstrInfo::clearFlag(MachineInstr &MI, unsigned Operand,
}		}
}		}

unsigned R600InstrInfo::getAddressSpaceForPseudoSourceKind(		unsigned R600InstrInfo::getAddressSpaceForPseudoSourceKind(
PseudoSourceValue::PSVKind Kind) const {		PseudoSourceValue::PSVKind Kind) const {
switch (Kind) {		switch (Kind) {
case PseudoSourceValue::Stack:		case PseudoSourceValue::Stack:
case PseudoSourceValue::FixedStack:		case PseudoSourceValue::FixedStack:
return AMDGPUASI.PRIVATE_ADDRESS;		return ST.getAMDGPUAS().PRIVATE_ADDRESS;
case PseudoSourceValue::ConstantPool:		case PseudoSourceValue::ConstantPool:
case PseudoSourceValue::GOT:		case PseudoSourceValue::GOT:
case PseudoSourceValue::JumpTable:		case PseudoSourceValue::JumpTable:
case PseudoSourceValue::GlobalValueCallEntry:		case PseudoSourceValue::GlobalValueCallEntry:
case PseudoSourceValue::ExternalSymbolCallEntry:		case PseudoSourceValue::ExternalSymbolCallEntry:
case PseudoSourceValue::TargetCustom:		case PseudoSourceValue::TargetCustom:
return AMDGPUASI.CONSTANT_ADDRESS;		return ST.getAMDGPUAS().CONSTANT_ADDRESS;
}		}
llvm_unreachable("Invalid pseudo source kind");		llvm_unreachable("Invalid pseudo source kind");
return AMDGPUASI.PRIVATE_ADDRESS;		return ST.getAMDGPUAS().PRIVATE_ADDRESS;
}		}
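
The renames in this file all follow one pattern: every generated symbol that used to live in the `AMDGPU` namespace is now looked up in `R600`. As a rough illustration of why that is a safe, mechanical change, here is a minimal standalone sketch. It does not use the real LLVM headers. The enums are hypothetical stand-ins for the TableGen-generated tables (their values are made up), and `reversePredicate` is a hypothetical helper that mirrors only the opcode half of `reverseBranchCondition`, including its convention of returning true on failure.

```cpp
#include <cassert>

// Hypothetical stand-ins for the TableGen-generated enums. The real
// enumerators and their values come from generated .inc files.
namespace R600 {
enum : unsigned { PRED_SETE = 1, PRED_SETNE, PRED_SETE_INT, PRED_SETNE_INT, MOV };
} // namespace R600

namespace AMDGPU {
// With separate tables, the GCN side numbers its opcodes independently,
// so values may overlap with R600 without any conflict.
enum : unsigned { S_MOV_B32 = 1, V_MOV_B32 };
} // namespace AMDGPU

// Hypothetical helper: swaps a predicate-set opcode for its inverse.
// Returns true if the opcode could NOT be reversed, matching the
// convention used by reverseBranchCondition in the diff above.
inline bool reversePredicate(unsigned &Opcode) {
  switch (Opcode) {
  case R600::PRED_SETE:
    Opcode = R600::PRED_SETNE;
    return false;
  case R600::PRED_SETNE:
    Opcode = R600::PRED_SETE;
    return false;
  case R600::PRED_SETE_INT:
    Opcode = R600::PRED_SETNE_INT;
    return false;
  case R600::PRED_SETNE_INT:
    Opcode = R600::PRED_SETE_INT;
    return false;
  default:
    return true; // Not a reversible predicate opcode.
  }
}
```

Calling `reversePredicate` twice on `R600::PRED_SETE` should leave the opcode unchanged, while an unrelated opcode such as `R600::MOV` is left alone and reported as not reversible.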

lib/Target/AMDGPU/R600Instructions.td

[12 unchanged lines hidden]
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

include "R600InstrFormats.td"		include "R600InstrFormats.td"

// FIXME: Should not be arbitrarily split from other R600 inst classes.		// FIXME: Should not be arbitrarily split from other R600 inst classes.
class R600WrapperInst <dag outs, dag ins, string asm = "", list<dag> pattern = []> :		class R600WrapperInst <dag outs, dag ins, string asm = "", list<dag> pattern = []> :
AMDGPUInst<outs, ins, asm, pattern>, PredicateControl {		AMDGPUInst<outs, ins, asm, pattern>, PredicateControl {
let SubtargetPredicate = isR600toCayman;		let SubtargetPredicate = isR600toCayman;
		let Namespace = "R600";
}		}


class InstR600ISA <dag outs, dag ins, string asm, list<dag> pattern = []> :		class InstR600ISA <dag outs, dag ins, string asm, list<dag> pattern = []> :
InstR600 <outs, ins, asm, pattern, NullALU> {		InstR600 <outs, ins, asm, pattern, NullALU> {

let Namespace = "AMDGPU";
}		}

def MEMxi : Operand<iPTR> {		def MEMxi : Operand<iPTR> {
let MIOperandInfo = (ops R600_TReg32_X:$ptr, i32imm:$index);		let MIOperandInfo = (ops R600_TReg32_X:$ptr, i32imm:$index);
let PrintMethod = "printMemOperand";		let PrintMethod = "printMemOperand";
}		}

def MEMrr : Operand<iPTR> {		def MEMrr : Operand<iPTR> {
[45 unchanged lines hidden]
def ADDRGA_CONST_OFFSET : ComplexPattern<i32, 1, "SelectGlobalValueConstantOffset", [], []>;		def ADDRGA_CONST_OFFSET : ComplexPattern<i32, 1, "SelectGlobalValueConstantOffset", [], []>;
def ADDRGA_VAR_OFFSET : ComplexPattern<i32, 2, "SelectGlobalValueVariableOffset", [], []>;		def ADDRGA_VAR_OFFSET : ComplexPattern<i32, 2, "SelectGlobalValueVariableOffset", [], []>;
def ADDRIndirect : ComplexPattern<iPTR, 2, "SelectADDRIndirect", [], []>;		def ADDRIndirect : ComplexPattern<iPTR, 2, "SelectADDRIndirect", [], []>;


def R600_Pred : PredicateOperand<i32, (ops R600_Predicate),		def R600_Pred : PredicateOperand<i32, (ops R600_Predicate),
(ops PRED_SEL_OFF)>;		(ops PRED_SEL_OFF)>;

		let isTerminator = 1, isReturn = 1, hasCtrlDep = 1,
		usesCustomInserter = 1, Namespace = "R600" in {
		def RETURN : ILFormat<(outs), (ins variable_ops),
		"RETURN", [(AMDGPUendpgm)]
		>;
		}

let mayLoad = 0, mayStore = 0, hasSideEffects = 0 in {		let mayLoad = 0, mayStore = 0, hasSideEffects = 0 in {

// Class for instructions with only one source register.		// Class for instructions with only one source register.
// If you add new ins to this instruction, make sure they are listed before		// If you add new ins to this instruction, make sure they are listed before
// $literal, because the backend currently assumes that the last operand is		// $literal, because the backend currently assumes that the last operand is
// a literal. Also be sure to update the enum R600Op1OperandIndex::ROI in		// a literal. Also be sure to update the enum R600Op1OperandIndex::ROI in
// R600Defines.h, R600InstrInfo::buildDefaultInstruction(),		// R600Defines.h, R600InstrInfo::buildDefaultInstruction(),
[117 unchanged lines hidden]	InstR600 <(outs R600_Reg32:$dst),
asm,		asm,
pattern,		pattern,
itin>;		itin>;



} // End mayLoad = 1, mayStore = 0, hasSideEffects = 0		} // End mayLoad = 1, mayStore = 0, hasSideEffects = 0

def TEX_SHADOW : PatLeaf<
(imm),
[{uint32_t TType = (uint32_t)N->getZExtValue();
return (TType >= 6 && TType <= 8) || TType == 13;
}]
>;

def TEX_RECT : PatLeaf<
(imm),
[{uint32_t TType = (uint32_t)N->getZExtValue();
return TType == 5;
}]
>;

def TEX_ARRAY : PatLeaf<
(imm),
[{uint32_t TType = (uint32_t)N->getZExtValue();
return TType == 9 || TType == 10 || TType == 16;
}]
>;

def TEX_SHADOW_ARRAY : PatLeaf<
(imm),
[{uint32_t TType = (uint32_t)N->getZExtValue();
return TType == 11 || TType == 12 || TType == 17;
}]
>;

class EG_CF_RAT <bits <8> cfinst, bits <6> ratinst, bits<4> ratid, bits<4> mask,		class EG_CF_RAT <bits <8> cfinst, bits <6> ratinst, bits<4> ratid, bits<4> mask,
dag outs, dag ins, string asm, list<dag> pattern> :		dag outs, dag ins, string asm, list<dag> pattern> :
InstR600ISA <outs, ins, asm, pattern>,		InstR600ISA <outs, ins, asm, pattern>,
CF_ALLOC_EXPORT_WORD0_RAT, CF_ALLOC_EXPORT_WORD1_BUF {		CF_ALLOC_EXPORT_WORD0_RAT, CF_ALLOC_EXPORT_WORD1_BUF {

let rat_id = ratid;		let rat_id = ratid;
let rat_inst = ratinst;		let rat_inst = ratinst;
let rim = 0;		let rim = 0;
[94 unchanged lines hidden]
def vtx_id2_az_extloadi8 : LoadVtxId2 <az_extloadi8>;		def vtx_id2_az_extloadi8 : LoadVtxId2 <az_extloadi8>;
def vtx_id2_az_extloadi16 : LoadVtxId2 <az_extloadi16>;		def vtx_id2_az_extloadi16 : LoadVtxId2 <az_extloadi16>;
def vtx_id2_load : LoadVtxId2 <load>;		def vtx_id2_load : LoadVtxId2 <load>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// R600 SDNodes		// R600 SDNodes
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

		let Namespace = "R600" in {

def INTERP_PAIR_XY : AMDGPUShaderInst <		def INTERP_PAIR_XY : AMDGPUShaderInst <
(outs R600_TReg32_X:$dst0, R600_TReg32_Y:$dst1),		(outs R600_TReg32_X:$dst0, R600_TReg32_Y:$dst1),
(ins i32imm:$src0, R600_TReg32_Y:$src1, R600_TReg32_X:$src2),		(ins i32imm:$src0, R600_TReg32_Y:$src1, R600_TReg32_X:$src2),
"INTERP_PAIR_XY $src0 $src1 $src2 : $dst0 dst1",		"INTERP_PAIR_XY $src0 $src1 $src2 : $dst0 dst1",
[]>;		[]>;

def INTERP_PAIR_ZW : AMDGPUShaderInst <		def INTERP_PAIR_ZW : AMDGPUShaderInst <
(outs R600_TReg32_Z:$dst0, R600_TReg32_W:$dst1),		(outs R600_TReg32_Z:$dst0, R600_TReg32_W:$dst1),
(ins i32imm:$src0, R600_TReg32_Y:$src1, R600_TReg32_X:$src2),		(ins i32imm:$src0, R600_TReg32_Y:$src1, R600_TReg32_X:$src2),
"INTERP_PAIR_ZW $src0 $src1 $src2 : $dst0 dst1",		"INTERP_PAIR_ZW $src0 $src1 $src2 : $dst0 dst1",
[]>;		[]>;

		}

def CONST_ADDRESS: SDNode<"AMDGPUISD::CONST_ADDRESS",		def CONST_ADDRESS: SDNode<"AMDGPUISD::CONST_ADDRESS",
SDTypeProfile<1, -1, [SDTCisInt<0>, SDTCisPtrTy<1>]>,		SDTypeProfile<1, -1, [SDTCisInt<0>, SDTCisPtrTy<1>]>,
[SDNPVariadic]		[SDNPVariadic]
>;		>;

def DOT4 : SDNode<"AMDGPUISD::DOT4",		def DOT4 : SDNode<"AMDGPUISD::DOT4",
SDTypeProfile<1, 8, [SDTCisFP<0>, SDTCisVT<1, f32>, SDTCisVT<2, f32>,		SDTypeProfile<1, 8, [SDTCisFP<0>, SDTCisVT<1, f32>, SDTCisVT<2, f32>,
SDTCisVT<3, f32>, SDTCisVT<4, f32>, SDTCisVT<5, f32>,		SDTCisVT<3, f32>, SDTCisVT<4, f32>, SDTCisVT<5, f32>,
[... 31 lines not shown ...]
def : R600Pat<(TEXTURE_FETCH (i32 TextureOp), vt:$SRC_GPR,
imm:$COORD_TYPE_X, imm:$COORD_TYPE_Y, imm:$COORD_TYPE_Z,		imm:$COORD_TYPE_X, imm:$COORD_TYPE_Y, imm:$COORD_TYPE_Z,
imm:$COORD_TYPE_W)>;		imm:$COORD_TYPE_W)>;
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Interpolation Instructions		// Interpolation Instructions
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

		let Namespace = "R600" in {

def INTERP_VEC_LOAD : AMDGPUShaderInst <		def INTERP_VEC_LOAD : AMDGPUShaderInst <
(outs R600_Reg128:$dst),		(outs R600_Reg128:$dst),
(ins i32imm:$src0),		(ins i32imm:$src0),
"INTERP_LOAD $src0 : $dst">;		"INTERP_LOAD $src0 : $dst">;

		}

def INTERP_XY : R600_2OP <0xD6, "INTERP_XY", []> {		def INTERP_XY : R600_2OP <0xD6, "INTERP_XY", []> {
let bank_swizzle = 5;		let bank_swizzle = 5;
}		}

def INTERP_ZW : R600_2OP <0xD7, "INTERP_ZW", []> {		def INTERP_ZW : R600_2OP <0xD7, "INTERP_ZW", []> {
let bank_swizzle = 5;		let bank_swizzle = 5;
}		}

[... 223 lines not shown ...]
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Common Instructions R600, R700, Evergreen, Cayman		// Common Instructions R600, R700, Evergreen, Cayman
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

let isCodeGenOnly = 1, isPseudo = 1 in {		let isCodeGenOnly = 1, isPseudo = 1 in {

let usesCustomInserter = 1 in {		let Namespace = "R600", usesCustomInserter = 1 in {

class FABS <RegisterClass rc> : AMDGPUShaderInst <		class FABS <RegisterClass rc> : AMDGPUShaderInst <
(outs rc:$dst),		(outs rc:$dst),
(ins rc:$src0),		(ins rc:$src0),
"FABS $dst, $src0",		"FABS $dst, $src0",
[(set f32:$dst, (fabs f32:$src0))]		[(set f32:$dst, (fabs f32:$src0))]
>;		>;

[... 115 lines not shown ...]

let isPseudo = 1, isCodeGenOnly = 1, usesCustomInserter = 1 in {		let isPseudo = 1, isCodeGenOnly = 1, usesCustomInserter = 1 in {

class MOV_IMM <ValueType vt, Operand immType> : R600WrapperInst <		class MOV_IMM <ValueType vt, Operand immType> : R600WrapperInst <
(outs R600_Reg32:$dst),		(outs R600_Reg32:$dst),
(ins immType:$imm),		(ins immType:$imm),
"",		"",
[]		[]
>;		> {
		let Namespace = "R600";
		}

} // end let isPseudo = 1, isCodeGenOnly = 1, usesCustomInserter = 1		} // end let isPseudo = 1, isCodeGenOnly = 1, usesCustomInserter = 1

def MOV_IMM_I32 : MOV_IMM<i32, i32imm>;		def MOV_IMM_I32 : MOV_IMM<i32, i32imm>;
def : R600Pat <		def : R600Pat <
(imm:$val),		(imm:$val),
(MOV_IMM_I32 imm:$val)		(MOV_IMM_I32 imm:$val)
>;		>;
[... 198 lines not shown ...]
class CNDGE_Common <bits<5> inst> : R600_3OP <		class CNDGE_Common <bits<5> inst> : R600_3OP <
inst, "CNDGE",		inst, "CNDGE",
[(set f32:$dst, (selectcc f32:$src0, FP_ZERO, f32:$src1, f32:$src2, COND_OGE))]		[(set f32:$dst, (selectcc f32:$src0, FP_ZERO, f32:$src1, f32:$src2, COND_OGE))]
> {		> {
let Itinerary = VecALU;		let Itinerary = VecALU;
}		}


let isCodeGenOnly = 1, isPseudo = 1, Namespace = "AMDGPU" in {		let isCodeGenOnly = 1, isPseudo = 1, Namespace = "R600" in {
class R600_VEC2OP<list<dag> pattern> : InstR600 <(outs R600_Reg32:$dst), (ins		class R600_VEC2OP<list<dag> pattern> : InstR600 <(outs R600_Reg32:$dst), (ins
// Slot X		// Slot X
UEM:$update_exec_mask_X, UP:$update_pred_X, WRITE:$write_X,		UEM:$update_exec_mask_X, UP:$update_pred_X, WRITE:$write_X,
OMOD:$omod_X, REL:$dst_rel_X, CLAMP:$clamp_X,		OMOD:$omod_X, REL:$dst_rel_X, CLAMP:$clamp_X,
R600_TReg32_X:$src0_X, NEG:$src0_neg_X, REL:$src0_rel_X, ABS:$src0_abs_X, SEL:$src0_sel_X,		R600_TReg32_X:$src0_X, NEG:$src0_neg_X, REL:$src0_rel_X, ABS:$src0_abs_X, SEL:$src0_sel_X,
R600_TReg32_X:$src1_X, NEG:$src1_neg_X, REL:$src1_rel_X, ABS:$src1_abs_X, SEL:$src1_sel_X,		R600_TReg32_X:$src1_X, NEG:$src1_neg_X, REL:$src1_rel_X, ABS:$src1_abs_X, SEL:$src1_sel_X,
R600_Pred:$pred_sel_X,		R600_Pred:$pred_sel_X,
// Slot Y		// Slot Y
[... 302 lines not shown ...]

}		}


//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Regist loads and stores - for indirect addressing		// Regist loads and stores - for indirect addressing
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

		let Namespace = "R600" in {
defm R600_ : RegisterLoadStore <R600_Reg32, FRAMEri, ADDRIndirect>;		defm R600_ : RegisterLoadStore <R600_Reg32, FRAMEri, ADDRIndirect>;
		}

// Hardcode channel to 0		// Hardcode channel to 0
// NOTE: LSHR is not available here. LSHR is per family instruction		// NOTE: LSHR is not available here. LSHR is per family instruction
def : R600Pat <		def : R600Pat <
(i32 (load_private ADDRIndirect:$addr) ),		(i32 (load_private ADDRIndirect:$addr) ),
(R600_RegisterLoad FRAMEri:$addr, (i32 0))		(R600_RegisterLoad FRAMEri:$addr, (i32 0))
>;		>;
def : R600Pat <		def : R600Pat <
[... 35 lines not shown ...]
}		}

} // End isTerminator = 1, isBranch = 1		} // End isTerminator = 1, isBranch = 1

let usesCustomInserter = 1 in {		let usesCustomInserter = 1 in {

let mayLoad = 0, mayStore = 0, hasSideEffects = 1 in {		let mayLoad = 0, mayStore = 0, hasSideEffects = 1 in {

def MASK_WRITE : AMDGPUShaderInst <		def MASK_WRITE : InstR600 <
(outs),		(outs),
(ins R600_Reg32:$src),		(ins R600_Reg32:$src),
"MASK_WRITE $src",		"MASK_WRITE $src",
[]		[],
		NullALU
>;		>;

} // End mayLoad = 0, mayStore = 0, hasSideEffects = 1		} // End mayLoad = 0, mayStore = 0, hasSideEffects = 1


def TXD: InstR600 <		def TXD: InstR600 <
(outs R600_Reg128:$dst),		(outs R600_Reg128:$dst),
(ins R600_Reg128:$src0, R600_Reg128:$src1, R600_Reg128:$src2,		(ins R600_Reg128:$src0, R600_Reg128:$src1, R600_Reg128:$src2,
[... 14 lines not shown ...]
} // End isPseudo = 1		} // End isPseudo = 1
} // End usesCustomInserter = 1		} // End usesCustomInserter = 1


//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Constant Buffer Addressing Support		// Constant Buffer Addressing Support
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

let usesCustomInserter = 1, isCodeGenOnly = 1, isPseudo = 1, Namespace = "AMDGPU" in {		let usesCustomInserter = 1, isCodeGenOnly = 1, isPseudo = 1, Namespace = "R600" in {
def CONST_COPY : Instruction {		def CONST_COPY : Instruction {
let OutOperandList = (outs R600_Reg32:$dst);		let OutOperandList = (outs R600_Reg32:$dst);
let InOperandList = (ins i32imm:$src);		let InOperandList = (ins i32imm:$src);
let Pattern =		let Pattern =
[(set R600_Reg32:$dst, (CONST_ADDRESS ADDRGA_CONST_OFFSET:$src))];		[(set R600_Reg32:$dst, (CONST_ADDRESS ADDRGA_CONST_OFFSET:$src))];
let AsmString = "CONST_COPY";		let AsmString = "CONST_COPY";
let hasSideEffects = 0;		let hasSideEffects = 0;
let isAsCheapAsAMove = 1;		let isAsCheapAsAMove = 1;
[... 106 lines not shown ...]
//		//
// Inst{127-96} = 0;		// Inst{127-96} = 0;
let VTXInst = 1;		let VTXInst = 1;
}		}

//===---------------------------------------------------------------------===//		//===---------------------------------------------------------------------===//
// Flow and Program control Instructions		// Flow and Program control Instructions
//===---------------------------------------------------------------------===//		//===---------------------------------------------------------------------===//
class ILFormat<dag outs, dag ins, string asmstr, list<dag> pattern>
: Instruction {

let Namespace = "AMDGPU";
dag OutOperandList = outs;
dag InOperandList = ins;
let Pattern = pattern;
let AsmString = !strconcat(asmstr, "\n");
let isPseudo = 1;
let Itinerary = NullALU;
bit hasIEEEFlag = 0;
bit hasZeroOpFlag = 0;
let mayLoad = 0;
let mayStore = 0;
let hasSideEffects = 0;
let isCodeGenOnly = 1;
}

multiclass BranchConditional<SDNode Op, RegisterClass rci, RegisterClass rcf> {		multiclass BranchConditional<SDNode Op, RegisterClass rci, RegisterClass rcf> {
def _i32 : ILFormat<(outs),		def _i32 : ILFormat<(outs),
(ins brtarget:$target, rci:$src0),		(ins brtarget:$target, rci:$src0),
"; i32 Pseudo branch instruction",		"; i32 Pseudo branch instruction",
[(Op bb:$target, (i32 rci:$src0))]>;		[(Op bb:$target, (i32 rci:$src0))]>;
def _f32 : ILFormat<(outs),		def _f32 : ILFormat<(outs),
(ins brtarget:$target, rcf:$src0),		(ins brtarget:$target, rcf:$src0),
[... 15 lines not shown ...]
multiclass BranchInstr2<string name> {
def _f32 : ILFormat<(outs), (ins R600_Reg32:$src0, R600_Reg32:$src1),		def _f32 : ILFormat<(outs), (ins R600_Reg32:$src0, R600_Reg32:$src1),
!strconcat(name, " $src0, $src1"), []>;		!strconcat(name, " $src0, $src1"), []>;
}		}

//===---------------------------------------------------------------------===//		//===---------------------------------------------------------------------===//
// Custom Inserter for Branches and returns, this eventually will be a		// Custom Inserter for Branches and returns, this eventually will be a
// separate pass		// separate pass
//===---------------------------------------------------------------------===//		//===---------------------------------------------------------------------===//
let isTerminator = 1, usesCustomInserter = 1, isBranch = 1, isBarrier = 1 in {		let isTerminator = 1, usesCustomInserter = 1, isBranch = 1, isBarrier = 1,
		Namespace = "R600" in {
def BRANCH : ILFormat<(outs), (ins brtarget:$target),		def BRANCH : ILFormat<(outs), (ins brtarget:$target),
"; Pseudo unconditional branch instruction",		"; Pseudo unconditional branch instruction",
[(br bb:$target)]>;		[(br bb:$target)]>;
defm BRANCH_COND : BranchConditional<IL_brcond, R600_Reg32, R600_Reg32>;		defm BRANCH_COND : BranchConditional<IL_brcond, R600_Reg32, R600_Reg32>;
}		}

//===---------------------------------------------------------------------===//
// Return instruction
//===---------------------------------------------------------------------===//
let isTerminator = 1, isReturn = 1, hasCtrlDep = 1,
usesCustomInserter = 1 in {
def RETURN : ILFormat<(outs), (ins variable_ops),
"RETURN", [(AMDGPUendpgm)]
>;
}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Branch Instructions		// Branch Instructions
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

def IF_PREDICATE_SET : ILFormat<(outs), (ins R600_Reg32:$src),		def IF_PREDICATE_SET : ILFormat<(outs), (ins R600_Reg32:$src),
"IF_PREDICATE_SET $src", []>;		"IF_PREDICATE_SET $src", []>;

let isTerminator=1 in {		let isTerminator=1 in {
[... 114 lines not shown ...]
//CNDGE_INT extra pattern		//CNDGE_INT extra pattern
def : R600Pat <		def : R600Pat <
(selectcc i32:$src0, -1, i32:$src1, i32:$src2, COND_SGT),		(selectcc i32:$src0, -1, i32:$src1, i32:$src2, COND_SGT),
(CNDGE_INT $src0, $src1, $src2)		(CNDGE_INT $src0, $src1, $src2)
>;		>;

// KIL Patterns		// KIL Patterns
def KIL : R600Pat <		def KIL : R600Pat <
(int_AMDGPU_kill f32:$src0),		(int_r600_kill f32:$src0),
(MASK_WRITE (KILLGT (f32 ZERO), $src0))		(MASK_WRITE (KILLGT (f32 ZERO), $src0))
>;		>;

def : Extract_Element <f32, v4f32, 0, sub0>;		def : Extract_Element <f32, v4f32, 0, sub0>;
def : Extract_Element <f32, v4f32, 1, sub1>;		def : Extract_Element <f32, v4f32, 1, sub1>;
def : Extract_Element <f32, v4f32, 2, sub2>;		def : Extract_Element <f32, v4f32, 2, sub2>;
def : Extract_Element <f32, v4f32, 3, sub3>;		def : Extract_Element <f32, v4f32, 3, sub3>;

[... 48 lines not shown ...]
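
The `Namespace = "R600"` changes in this file are what let the C++ hunks that follow spell opcodes as `R600::...` instead of `AMDGPU::...`. As a rough illustration only, the sketch below uses two hand-written enums as stand-ins for the generated per-sub-target opcode tables. The names `MockR600`, `MockGCN`, `OPCODE_COUNT` and `classify` are hypothetical and do not appear in the patch; the real tables are emitted by TableGen and are far larger.

```cpp
#include <cassert>

// Hypothetical stand-in for the R600 opcode table. The real enum is
// generated by TableGen from records whose Namespace field is "R600".
namespace MockR600 {
enum Opcode : unsigned { COPY = 0, CONST_COPY, INTERP_PAIR_XY, DOT_4, OPCODE_COUNT };
}

// Hypothetical stand-in for the GCN opcode table. It is numbered
// independently and knows nothing about the R600 entries.
namespace MockGCN {
enum Opcode : unsigned { COPY = 0, S_MOV_B32, V_MOV_B32_e32, OPCODE_COUNT };
}

enum InstKind { IDAlu, IDOther };

// Loosely follows the final switch of R600SchedStrategy::getInstKind():
// only opcodes from the R600 table are named.
inline InstKind classify(unsigned Opcode) {
  switch (Opcode) {
  case MockR600::COPY:
  case MockR600::CONST_COPY:
  case MockR600::INTERP_PAIR_XY:
  case MockR600::DOT_4:
    return IDAlu;
  default:
    return IDOther;
  }
}
```

Under these assumptions each sub-target sees only its own opcode list, which matches the compile-time motivation given in the summary.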

lib/Target/AMDGPU/R600MachineScheduler.cpp

[... 156 lines not shown ...]
case AluT_XYZW:
break;		break;
case AluDiscarded:		case AluDiscarded:
break;		break;
default: {		default: {
++CurEmitted;		++CurEmitted;
for (MachineInstr::mop_iterator It = SU->getInstr()->operands_begin(),		for (MachineInstr::mop_iterator It = SU->getInstr()->operands_begin(),
E = SU->getInstr()->operands_end(); It != E; ++It) {		E = SU->getInstr()->operands_end(); It != E; ++It) {
MachineOperand &MO = *It;		MachineOperand &MO = *It;
if (MO.isReg() && MO.getReg() == AMDGPU::ALU_LITERAL_X)		if (MO.isReg() && MO.getReg() == R600::ALU_LITERAL_X)
++CurEmitted;		++CurEmitted;
}		}
}		}
}		}
} else {		} else {
++CurEmitted;		++CurEmitted;
}		}

LLVM_DEBUG(dbgs() << CurEmitted << " Instructions Emitted in this clause\n");		LLVM_DEBUG(dbgs() << CurEmitted << " Instructions Emitted in this clause\n");

if (CurInstKind != IDFetch) {		if (CurInstKind != IDFetch) {
MoveUnits(Pending[IDFetch], Available[IDFetch]);		MoveUnits(Pending[IDFetch], Available[IDFetch]);
} else		} else
FetchInstCount++;		FetchInstCount++;
}		}

static bool		static bool
isPhysicalRegCopy(MachineInstr *MI) {		isPhysicalRegCopy(MachineInstr *MI) {
if (MI->getOpcode() != AMDGPU::COPY)		if (MI->getOpcode() != R600::COPY)
return false;		return false;

return !TargetRegisterInfo::isVirtualRegister(MI->getOperand(1).getReg());		return !TargetRegisterInfo::isVirtualRegister(MI->getOperand(1).getReg());
}		}

void R600SchedStrategy::releaseTopNode(SUnit *SU) {		void R600SchedStrategy::releaseTopNode(SUnit *SU) {
LLVM_DEBUG(dbgs() << "Top Releasing "; SU->dump(DAG););		LLVM_DEBUG(dbgs() << "Top Releasing "; SU->dump(DAG););
}		}
[... 26 lines not shown ...]

R600SchedStrategy::AluKind R600SchedStrategy::getAluKind(SUnit *SU) const {		R600SchedStrategy::AluKind R600SchedStrategy::getAluKind(SUnit *SU) const {
MachineInstr *MI = SU->getInstr();		MachineInstr *MI = SU->getInstr();

if (TII->isTransOnly(*MI))		if (TII->isTransOnly(*MI))
return AluTrans;		return AluTrans;

switch (MI->getOpcode()) {		switch (MI->getOpcode()) {
case AMDGPU::PRED_X:		case R600::PRED_X:
return AluPredX;		return AluPredX;
case AMDGPU::INTERP_PAIR_XY:		case R600::INTERP_PAIR_XY:
case AMDGPU::INTERP_PAIR_ZW:		case R600::INTERP_PAIR_ZW:
case AMDGPU::INTERP_VEC_LOAD:		case R600::INTERP_VEC_LOAD:
case AMDGPU::DOT_4:		case R600::DOT_4:
return AluT_XYZW;		return AluT_XYZW;
case AMDGPU::COPY:		case R600::COPY:
if (MI->getOperand(1).isUndef()) {		if (MI->getOperand(1).isUndef()) {
// MI will become a KILL, don't considers it in scheduling		// MI will become a KILL, don't considers it in scheduling
return AluDiscarded;		return AluDiscarded;
}		}
default:		default:
break;		break;
}		}

// Does the instruction take a whole IG ?		// Does the instruction take a whole IG ?
// XXX: Is it possible to add a helper function in R600InstrInfo that can		// XXX: Is it possible to add a helper function in R600InstrInfo that can
// be used here and in R600PacketizerList::isSoloInstruction() ?		// be used here and in R600PacketizerList::isSoloInstruction() ?
if(TII->isVector(*MI) ||		if(TII->isVector(*MI) ||
TII->isCubeOp(MI->getOpcode()) ||		TII->isCubeOp(MI->getOpcode()) ||
TII->isReductionOp(MI->getOpcode()) ||		TII->isReductionOp(MI->getOpcode()) ||
MI->getOpcode() == AMDGPU::GROUP_BARRIER) {		MI->getOpcode() == R600::GROUP_BARRIER) {
return AluT_XYZW;		return AluT_XYZW;
}		}

if (TII->isLDSInstr(MI->getOpcode())) {		if (TII->isLDSInstr(MI->getOpcode())) {
return AluT_X;		return AluT_X;
}		}

// Is the result already assigned to a channel ?		// Is the result already assigned to a channel ?
unsigned DestSubReg = MI->getOperand(0).getSubReg();		unsigned DestSubReg = MI->getOperand(0).getSubReg();
switch (DestSubReg) {		switch (DestSubReg) {
case AMDGPU::sub0:		case R600::sub0:
return AluT_X;		return AluT_X;
case AMDGPU::sub1:		case R600::sub1:
return AluT_Y;		return AluT_Y;
case AMDGPU::sub2:		case R600::sub2:
return AluT_Z;		return AluT_Z;
case AMDGPU::sub3:		case R600::sub3:
return AluT_W;		return AluT_W;
default:		default:
break;		break;
}		}

// Is the result already member of a X/Y/Z/W class ?		// Is the result already member of a X/Y/Z/W class ?
unsigned DestReg = MI->getOperand(0).getReg();		unsigned DestReg = MI->getOperand(0).getReg();
if (regBelongsToClass(DestReg, &AMDGPU::R600_TReg32_XRegClass) ||		if (regBelongsToClass(DestReg, &R600::R600_TReg32_XRegClass) ||
regBelongsToClass(DestReg, &AMDGPU::R600_AddrRegClass))		regBelongsToClass(DestReg, &R600::R600_AddrRegClass))
return AluT_X;		return AluT_X;
if (regBelongsToClass(DestReg, &AMDGPU::R600_TReg32_YRegClass))		if (regBelongsToClass(DestReg, &R600::R600_TReg32_YRegClass))
return AluT_Y;		return AluT_Y;
if (regBelongsToClass(DestReg, &AMDGPU::R600_TReg32_ZRegClass))		if (regBelongsToClass(DestReg, &R600::R600_TReg32_ZRegClass))
return AluT_Z;		return AluT_Z;
if (regBelongsToClass(DestReg, &AMDGPU::R600_TReg32_WRegClass))		if (regBelongsToClass(DestReg, &R600::R600_TReg32_WRegClass))
return AluT_W;		return AluT_W;
if (regBelongsToClass(DestReg, &AMDGPU::R600_Reg128RegClass))		if (regBelongsToClass(DestReg, &R600::R600_Reg128RegClass))
return AluT_XYZW;		return AluT_XYZW;

// LDS src registers cannot be used in the Trans slot.		// LDS src registers cannot be used in the Trans slot.
if (TII->readsLDSSrcReg(*MI))		if (TII->readsLDSSrcReg(*MI))
return AluT_XYZW;		return AluT_XYZW;

return AluAny;		return AluAny;
}		}

int R600SchedStrategy::getInstKind(SUnit* SU) {		int R600SchedStrategy::getInstKind(SUnit* SU) {
int Opcode = SU->getInstr()->getOpcode();		int Opcode = SU->getInstr()->getOpcode();

if (TII->usesTextureCache(Opcode) || TII->usesVertexCache(Opcode))		if (TII->usesTextureCache(Opcode) || TII->usesVertexCache(Opcode))
return IDFetch;		return IDFetch;

if (TII->isALUInstr(Opcode)) {		if (TII->isALUInstr(Opcode)) {
return IDAlu;		return IDAlu;
}		}

switch (Opcode) {		switch (Opcode) {
case AMDGPU::PRED_X:		case R600::PRED_X:
case AMDGPU::COPY:		case R600::COPY:
case AMDGPU::CONST_COPY:		case R600::CONST_COPY:
case AMDGPU::INTERP_PAIR_XY:		case R600::INTERP_PAIR_XY:
case AMDGPU::INTERP_PAIR_ZW:		case R600::INTERP_PAIR_ZW:
case AMDGPU::INTERP_VEC_LOAD:		case R600::INTERP_VEC_LOAD:
case AMDGPU::DOT_4:		case R600::DOT_4:
return IDAlu;		return IDAlu;
default:		default:
return IDOther;		return IDOther;
}		}
}		}

SUnit *R600SchedStrategy::PopInst(std::vector<SUnit *> &Q, bool AnyALU) {		SUnit *R600SchedStrategy::PopInst(std::vector<SUnit *> &Q, bool AnyALU) {
if (Q.empty())		if (Q.empty())
[... 29 lines not shown ...]
void R600SchedStrategy::PrepareNextSlot() {
OccupedSlotsMask = 0;		OccupedSlotsMask = 0;
// if (HwGen == R600Subtarget::NORTHERN_ISLANDS)		// if (HwGen == R600Subtarget::NORTHERN_ISLANDS)
// OccupedSlotsMask |= 16;		// OccupedSlotsMask |= 16;
InstructionsGroupCandidate.clear();		InstructionsGroupCandidate.clear();
LoadAlu();		LoadAlu();
}		}

void R600SchedStrategy::AssignSlot(MachineInstr* MI, unsigned Slot) {		void R600SchedStrategy::AssignSlot(MachineInstr* MI, unsigned Slot) {
int DstIndex = TII->getOperandIdx(MI->getOpcode(), AMDGPU::OpName::dst);		int DstIndex = TII->getOperandIdx(MI->getOpcode(), R600::OpName::dst);
if (DstIndex == -1) {		if (DstIndex == -1) {
return;		return;
}		}
unsigned DestReg = MI->getOperand(DstIndex).getReg();		unsigned DestReg = MI->getOperand(DstIndex).getReg();
// PressureRegister crashes if an operand is def and used in the same inst		// PressureRegister crashes if an operand is def and used in the same inst
// and we try to constraint its regclass		// and we try to constraint its regclass
for (MachineInstr::mop_iterator It = MI->operands_begin(),		for (MachineInstr::mop_iterator It = MI->operands_begin(),
E = MI->operands_end(); It != E; ++It) {		E = MI->operands_end(); It != E; ++It) {
MachineOperand &MO = *It;		MachineOperand &MO = *It;
if (MO.isReg() && !MO.isDef() &&		if (MO.isReg() && !MO.isDef() &&
MO.getReg() == DestReg)		MO.getReg() == DestReg)
return;		return;
}		}
// Constrains the regclass of DestReg to assign it to Slot		// Constrains the regclass of DestReg to assign it to Slot
switch (Slot) {		switch (Slot) {
case 0:		case 0:
MRI->constrainRegClass(DestReg, &AMDGPU::R600_TReg32_XRegClass);		MRI->constrainRegClass(DestReg, &R600::R600_TReg32_XRegClass);
break;		break;
case 1:		case 1:
MRI->constrainRegClass(DestReg, &AMDGPU::R600_TReg32_YRegClass);		MRI->constrainRegClass(DestReg, &R600::R600_TReg32_YRegClass);
break;		break;
case 2:		case 2:
MRI->constrainRegClass(DestReg, &AMDGPU::R600_TReg32_ZRegClass);		MRI->constrainRegClass(DestReg, &R600::R600_TReg32_ZRegClass);
break;		break;
case 3:		case 3:
MRI->constrainRegClass(DestReg, &AMDGPU::R600_TReg32_WRegClass);		MRI->constrainRegClass(DestReg, &R600::R600_TReg32_WRegClass);
break;		break;
}		}
}		}

SUnit *R600SchedStrategy::AttemptFillSlot(unsigned Slot, bool AnyAlu) {		SUnit *R600SchedStrategy::AttemptFillSlot(unsigned Slot, bool AnyAlu) {
static const AluKind IndexToID[] = {AluT_X, AluT_Y, AluT_Z, AluT_W};		static const AluKind IndexToID[] = {AluT_X, AluT_Y, AluT_Z, AluT_W};
SUnit *SlotedSU = PopInst(AvailableAlus[IndexToID[Slot]], AnyAlu);		SUnit *SlotedSU = PopInst(AvailableAlus[IndexToID[Slot]], AnyAlu);
if (SlotedSU)		if (SlotedSU)
[... 75 lines not shown ...]
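
The channel handling in `getAluKind()` and `AttemptFillSlot()` above maps sub-register indices and slot numbers onto the X/Y/Z/W ALU kinds. The following is a minimal sketch of just that mapping, under stated assumptions: `MockR600`, `kindForSubReg` and `kindForSlot` are hypothetical names, the `AluKind` enum is a reduced copy of the values visible in the hunk, and the fallback to `AluAny` skips the register-class and LDS checks that the real function performs first.

```cpp
#include <cassert>

// Reduced, hypothetical copy of the scheduler's AluKind values.
enum AluKind { AluAny, AluT_X, AluT_Y, AluT_Z, AluT_W, AluT_XYZW };

// Hypothetical sub-register indices; the real ones come from the
// generated R600 register info.
namespace MockR600 {
enum SubReg : unsigned { NoSubRegister = 0, sub0, sub1, sub2, sub3 };
}

// Destination sub-register to ALU kind, as in the DestSubReg switch.
// The real code falls through to further checks instead of returning AluAny.
inline AluKind kindForSubReg(unsigned DestSubReg) {
  switch (DestSubReg) {
  case MockR600::sub0: return AluT_X;
  case MockR600::sub1: return AluT_Y;
  case MockR600::sub2: return AluT_Z;
  case MockR600::sub3: return AluT_W;
  default:             return AluAny;
  }
}

// Slot number (0..3) to ALU kind, as in the IndexToID table.
inline AluKind kindForSlot(unsigned Slot) {
  static const AluKind IndexToID[] = {AluT_X, AluT_Y, AluT_Z, AluT_W};
  assert(Slot < 4 && "R600 has four vector slots in this sketch");
  return IndexToID[Slot];
}
```

Both helpers agree on the channel order, so `kindForSubReg(MockR600::sub1)` and `kindForSlot(1)` name the same slot.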

lib/Target/AMDGPU/R600OptimizeVectorRegisters.cpp

[... 73 lines not shown ...]

class RegSeqInfo {		class RegSeqInfo {
public:		public:
MachineInstr *Instr;		MachineInstr *Instr;
DenseMap<unsigned, unsigned> RegToChan;		DenseMap<unsigned, unsigned> RegToChan;
std::vector<unsigned> UndefReg;		std::vector<unsigned> UndefReg;

RegSeqInfo(MachineRegisterInfo &MRI, MachineInstr *MI) : Instr(MI) {		RegSeqInfo(MachineRegisterInfo &MRI, MachineInstr *MI) : Instr(MI) {
assert(MI->getOpcode() == AMDGPU::REG_SEQUENCE);		assert(MI->getOpcode() == R600::REG_SEQUENCE);
for (unsigned i = 1, e = Instr->getNumOperands(); i < e; i+=2) {		for (unsigned i = 1, e = Instr->getNumOperands(); i < e; i+=2) {
MachineOperand &MO = Instr->getOperand(i);		MachineOperand &MO = Instr->getOperand(i);
unsigned Chan = Instr->getOperand(i + 1).getImm();		unsigned Chan = Instr->getOperand(i + 1).getImm();
if (isImplicitlyDef(MRI, MO.getReg()))		if (isImplicitlyDef(MRI, MO.getReg()))
UndefReg.push_back(Chan);		UndefReg.push_back(Chan);
else		else
RegToChan[MO.getReg()] = Chan;		RegToChan[MO.getReg()] = Chan;
}		}
[... 63 lines not shown ...]

char &llvm::R600VectorRegMergerID = R600VectorRegMerger::ID;		char &llvm::R600VectorRegMergerID = R600VectorRegMerger::ID;

bool R600VectorRegMerger::canSwizzle(const MachineInstr &MI)		bool R600VectorRegMerger::canSwizzle(const MachineInstr &MI)
const {		const {
if (TII->get(MI.getOpcode()).TSFlags & R600_InstFlag::TEX_INST)		if (TII->get(MI.getOpcode()).TSFlags & R600_InstFlag::TEX_INST)
return true;		return true;
switch (MI.getOpcode()) {		switch (MI.getOpcode()) {
case AMDGPU::R600_ExportSwz:		case R600::R600_ExportSwz:
case AMDGPU::EG_ExportSwz:		case R600::EG_ExportSwz:
return true;		return true;
default:		default:
return false;		return false;
}		}
}		}

bool R600VectorRegMerger::tryMergeVector(const RegSeqInfo *Untouched,		bool R600VectorRegMerger::tryMergeVector(const RegSeqInfo *Untouched,
RegSeqInfo *ToMerge, std::vector< std::pair<unsigned, unsigned>> &Remap)		RegSeqInfo *ToMerge, std::vector< std::pair<unsigned, unsigned>> &Remap)
[... 36 lines not shown ...]
MachineInstr *R600VectorRegMerger::RebuildVector(
MachineBasicBlock &MBB = *Pos->getParent();		MachineBasicBlock &MBB = *Pos->getParent();
DebugLoc DL = Pos->getDebugLoc();		DebugLoc DL = Pos->getDebugLoc();

unsigned SrcVec = BaseRSI->Instr->getOperand(0).getReg();		unsigned SrcVec = BaseRSI->Instr->getOperand(0).getReg();
DenseMap<unsigned, unsigned> UpdatedRegToChan = BaseRSI->RegToChan;		DenseMap<unsigned, unsigned> UpdatedRegToChan = BaseRSI->RegToChan;
std::vector<unsigned> UpdatedUndef = BaseRSI->UndefReg;		std::vector<unsigned> UpdatedUndef = BaseRSI->UndefReg;
for (DenseMap<unsigned, unsigned>::iterator It = RSI->RegToChan.begin(),		for (DenseMap<unsigned, unsigned>::iterator It = RSI->RegToChan.begin(),
E = RSI->RegToChan.end(); It != E; ++It) {		E = RSI->RegToChan.end(); It != E; ++It) {
unsigned DstReg = MRI->createVirtualRegister(&AMDGPU::R600_Reg128RegClass);		unsigned DstReg = MRI->createVirtualRegister(&R600::R600_Reg128RegClass);
unsigned SubReg = (*It).first;		unsigned SubReg = (*It).first;
unsigned Swizzle = (*It).second;		unsigned Swizzle = (*It).second;
unsigned Chan = getReassignedChan(RemapChan, Swizzle);		unsigned Chan = getReassignedChan(RemapChan, Swizzle);

MachineInstr *Tmp = BuildMI(MBB, Pos, DL, TII->get(AMDGPU::INSERT_SUBREG),		MachineInstr *Tmp = BuildMI(MBB, Pos, DL, TII->get(R600::INSERT_SUBREG),
DstReg)		DstReg)
.addReg(SrcVec)		.addReg(SrcVec)
.addReg(SubReg)		.addReg(SubReg)
.addImm(Chan);		.addImm(Chan);
UpdatedRegToChan[SubReg] = Chan;		UpdatedRegToChan[SubReg] = Chan;
std::vector<unsigned>::iterator ChanPos = llvm::find(UpdatedUndef, Chan);		std::vector<unsigned>::iterator ChanPos = llvm::find(UpdatedUndef, Chan);
if (ChanPos != UpdatedUndef.end())		if (ChanPos != UpdatedUndef.end())
UpdatedUndef.erase(ChanPos);		UpdatedUndef.erase(ChanPos);
assert(!is_contained(UpdatedUndef, Chan) &&		assert(!is_contained(UpdatedUndef, Chan) &&
"UpdatedUndef shouldn't contain Chan more than once!");		"UpdatedUndef shouldn't contain Chan more than once!");
LLVM_DEBUG(dbgs() << " ->"; Tmp->dump(););		LLVM_DEBUG(dbgs() << " ->"; Tmp->dump(););
(void)Tmp;		(void)Tmp;
SrcVec = DstReg;		SrcVec = DstReg;
}		}
MachineInstr *NewMI =		MachineInstr *NewMI =
BuildMI(MBB, Pos, DL, TII->get(AMDGPU::COPY), Reg).addReg(SrcVec);		BuildMI(MBB, Pos, DL, TII->get(R600::COPY), Reg).addReg(SrcVec);
LLVM_DEBUG(dbgs() << " ->"; NewMI->dump(););		LLVM_DEBUG(dbgs() << " ->"; NewMI->dump(););

LLVM_DEBUG(dbgs() << " Updating Swizzle:\n");		LLVM_DEBUG(dbgs() << " Updating Swizzle:\n");
for (MachineRegisterInfo::use_instr_iterator It = MRI->use_instr_begin(Reg),		for (MachineRegisterInfo::use_instr_iterator It = MRI->use_instr_begin(Reg),
E = MRI->use_instr_end(); It != E; ++It) {		E = MRI->use_instr_end(); It != E; ++It) {
LLVM_DEBUG(dbgs() << " "; (*It).dump(); dbgs() << " ->");		LLVM_DEBUG(dbgs() << " "; (*It).dump(); dbgs() << " ->");
SwizzleInput(*It, RemapChan);		SwizzleInput(*It, RemapChan);
LLVM_DEBUG((*It).dump());		LLVM_DEBUG((*It).dump());
[... 103 lines not shown ...]
for (MachineFunction::iterator MBB = Fn.begin(), MBBe = Fn.end();
MachineBasicBlock *MB = &*MBB;		MachineBasicBlock *MB = &*MBB;
PreviousRegSeq.clear();		PreviousRegSeq.clear();
PreviousRegSeqByReg.clear();		PreviousRegSeqByReg.clear();
PreviousRegSeqByUndefCount.clear();		PreviousRegSeqByUndefCount.clear();

for (MachineBasicBlock::iterator MII = MB->begin(), MIIE = MB->end();		for (MachineBasicBlock::iterator MII = MB->begin(), MIIE = MB->end();
MII != MIIE; ++MII) {		MII != MIIE; ++MII) {
MachineInstr &MI = *MII;		MachineInstr &MI = *MII;
if (MI.getOpcode() != AMDGPU::REG_SEQUENCE) {		if (MI.getOpcode() != R600::REG_SEQUENCE) {
if (TII->get(MI.getOpcode()).TSFlags & R600_InstFlag::TEX_INST) {		if (TII->get(MI.getOpcode()).TSFlags & R600_InstFlag::TEX_INST) {
unsigned Reg = MI.getOperand(1).getReg();		unsigned Reg = MI.getOperand(1).getReg();
for (MachineRegisterInfo::def_instr_iterator		for (MachineRegisterInfo::def_instr_iterator
It = MRI->def_instr_begin(Reg), E = MRI->def_instr_end();		It = MRI->def_instr_begin(Reg), E = MRI->def_instr_end();
It != E; ++It) {		It != E; ++It) {
RemoveMI(&(*It));		RemoveMI(&(*It));
}		}
}		}
[... 43 lines not shown ...]
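
The `RegSeqInfo` constructor in this file walks a `REG_SEQUENCE` as a destination followed by (register, channel) operand pairs. A simplified sketch of that walk is shown below. It assumes plain structs in place of LLVM's `MachineInstr` and `MachineOperand`; `MockOperand`, `MockRegSeqInfo` and the `ImplicitlyDefined` flag are hypothetical, with the flag standing in for the `isImplicitlyDef()` query, and `std::map` replaces `DenseMap`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical flattened operand: a register number or an immediate,
// plus a flag replacing the isImplicitlyDef(MRI, Reg) lookup.
struct MockOperand {
  unsigned Value;
  bool ImplicitlyDefined;
};

struct MockRegSeqInfo {
  std::map<unsigned, unsigned> RegToChan; // source register -> channel
  std::vector<unsigned> UndefReg;         // channels fed by undefined values

  explicit MockRegSeqInfo(const std::vector<MockOperand> &Ops) {
    // Operand 0 is the destination; sources follow as (register, channel)
    // pairs. The bound check also guards against a trailing unpaired operand.
    for (std::size_t i = 1; i + 1 < Ops.size(); i += 2) {
      unsigned Reg = Ops[i].Value;
      unsigned Chan = Ops[i + 1].Value;
      if (Ops[i].ImplicitlyDefined)
        UndefReg.push_back(Chan);
      else
        RegToChan[Reg] = Chan;
    }
  }
};
```

For a mock sequence with destination 100, register 1 on channel 0 and an undefined register 2 on channel 1, this records `RegToChan = {1 -> 0}` and `UndefReg = {1}`.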

lib/Target/AMDGPU/R600Packetizer.cpp

[... 78 lines not shown ...]
DenseMap<unsigned, unsigned> getPreviousVector(MachineBasicBlock::iterator I)
do {		do {
bool isTrans = false;		bool isTrans = false;
int BISlot = getSlot(*BI);		int BISlot = getSlot(*BI);
if (LastDstChan >= BISlot)		if (LastDstChan >= BISlot)
isTrans = true;		isTrans = true;
LastDstChan = BISlot;		LastDstChan = BISlot;
if (TII->isPredicated(*BI))		if (TII->isPredicated(*BI))
continue;		continue;
int OperandIdx = TII->getOperandIdx(BI->getOpcode(), AMDGPU::OpName::write);		int OperandIdx = TII->getOperandIdx(BI->getOpcode(), R600::OpName::write);
if (OperandIdx > -1 && BI->getOperand(OperandIdx).getImm() == 0)		if (OperandIdx > -1 && BI->getOperand(OperandIdx).getImm() == 0)
continue;		continue;
int DstIdx = TII->getOperandIdx(BI->getOpcode(), AMDGPU::OpName::dst);		int DstIdx = TII->getOperandIdx(BI->getOpcode(), R600::OpName::dst);
if (DstIdx == -1) {		if (DstIdx == -1) {
continue;		continue;
}		}
unsigned Dst = BI->getOperand(DstIdx).getReg();		unsigned Dst = BI->getOperand(DstIdx).getReg();
if (isTrans || TII->isTransOnly(*BI)) {		if (isTrans || TII->isTransOnly(*BI)) {
Result[Dst] = AMDGPU::PS;		Result[Dst] = R600::PS;
continue;		continue;
}		}
if (BI->getOpcode() == AMDGPU::DOT4_r600 ||		if (BI->getOpcode() == R600::DOT4_r600 ||
BI->getOpcode() == AMDGPU::DOT4_eg) {		BI->getOpcode() == R600::DOT4_eg) {
Result[Dst] = AMDGPU::PV_X;		Result[Dst] = R600::PV_X;
continue;		continue;
}		}
if (Dst == AMDGPU::OQAP) {		if (Dst == R600::OQAP) {
continue;		continue;
}		}
unsigned PVReg = 0;		unsigned PVReg = 0;
switch (TRI.getHWRegChan(Dst)) {		switch (TRI.getHWRegChan(Dst)) {
case 0:		case 0:
PVReg = AMDGPU::PV_X;		PVReg = R600::PV_X;
break;		break;
case 1:		case 1:
PVReg = AMDGPU::PV_Y;		PVReg = R600::PV_Y;
break;		break;
case 2:		case 2:
PVReg = AMDGPU::PV_Z;		PVReg = R600::PV_Z;
break;		break;
case 3:		case 3:
PVReg = AMDGPU::PV_W;		PVReg = R600::PV_W;
break;		break;
default:		default:
llvm_unreachable("Invalid Chan");		llvm_unreachable("Invalid Chan");
}		}
Result[Dst] = PVReg;		Result[Dst] = PVReg;
} while ((++BI)->isBundledWithPred());		} while ((++BI)->isBundledWithPred());
return Result;		return Result;
}		}

void substitutePV(MachineInstr &MI, const DenseMap<unsigned, unsigned> &PVs)		void substitutePV(MachineInstr &MI, const DenseMap<unsigned, unsigned> &PVs)
const {		const {
unsigned Ops[] = {		unsigned Ops[] = {
AMDGPU::OpName::src0,		R600::OpName::src0,
AMDGPU::OpName::src1,		R600::OpName::src1,
AMDGPU::OpName::src2		R600::OpName::src2
};		};
for (unsigned i = 0; i < 3; i++) {		for (unsigned i = 0; i < 3; i++) {
int OperandIdx = TII->getOperandIdx(MI.getOpcode(), Ops[i]);		int OperandIdx = TII->getOperandIdx(MI.getOpcode(), Ops[i]);
if (OperandIdx < 0)		if (OperandIdx < 0)
continue;		continue;
unsigned Src = MI.getOperand(OperandIdx).getReg();		unsigned Src = MI.getOperand(OperandIdx).getReg();
const DenseMap<unsigned, unsigned>::const_iterator It = PVs.find(Src);		const DenseMap<unsigned, unsigned>::const_iterator It = PVs.find(Src);
if (It != PVs.end())		if (It != PVs.end())
Show All 23 Lines	public:

// isSoloInstruction - return true if instruction MI can not be packetized		// isSoloInstruction - return true if instruction MI can not be packetized
// with any other instruction, which means that MI itself is a packet.		// with any other instruction, which means that MI itself is a packet.
bool isSoloInstruction(const MachineInstr &MI) override {		bool isSoloInstruction(const MachineInstr &MI) override {
if (TII->isVector(MI))		if (TII->isVector(MI))
return true;		return true;
if (!TII->isALUInstr(MI.getOpcode()))		if (!TII->isALUInstr(MI.getOpcode()))
return true;		return true;
if (MI.getOpcode() == AMDGPU::GROUP_BARRIER)		if (MI.getOpcode() == R600::GROUP_BARRIER)
return true;		return true;
// XXX: This can be removed once the packetizer properly handles all the		// XXX: This can be removed once the packetizer properly handles all the
// LDS instruction group restrictions.		// LDS instruction group restrictions.
return TII->isLDSInstr(MI.getOpcode());		return TII->isLDSInstr(MI.getOpcode());
}		}

// isLegalToPacketizeTogether - Is it legal to packetize SUI and SUJ		// isLegalToPacketizeTogether - Is it legal to packetize SUI and SUJ
// together.		// together.
bool isLegalToPacketizeTogether(SUnit *SUI, SUnit *SUJ) override {		bool isLegalToPacketizeTogether(SUnit *SUI, SUnit *SUJ) override {
MachineInstr *MII = SUI->getInstr(), *MIJ = SUJ->getInstr();		MachineInstr *MII = SUI->getInstr(), *MIJ = SUJ->getInstr();
if (getSlot(*MII) == getSlot(*MIJ))		if (getSlot(*MII) == getSlot(*MIJ))
ConsideredInstUsesAlreadyWrittenVectorElement = true;		ConsideredInstUsesAlreadyWrittenVectorElement = true;
// Does MII and MIJ share the same pred_sel ?		// Does MII and MIJ share the same pred_sel ?
int OpI = TII->getOperandIdx(MII->getOpcode(), AMDGPU::OpName::pred_sel),		int OpI = TII->getOperandIdx(MII->getOpcode(), R600::OpName::pred_sel),
OpJ = TII->getOperandIdx(MIJ->getOpcode(), AMDGPU::OpName::pred_sel);		OpJ = TII->getOperandIdx(MIJ->getOpcode(), R600::OpName::pred_sel);
unsigned PredI = (OpI > -1)?MII->getOperand(OpI).getReg():0,		unsigned PredI = (OpI > -1)?MII->getOperand(OpI).getReg():0,
PredJ = (OpJ > -1)?MIJ->getOperand(OpJ).getReg():0;		PredJ = (OpJ > -1)?MIJ->getOperand(OpJ).getReg():0;
if (PredI != PredJ)		if (PredI != PredJ)
return false;		return false;
if (SUJ->isSucc(SUI)) {		if (SUJ->isSucc(SUI)) {
for (unsigned i = 0, e = SUJ->Succs.size(); i < e; ++i) {		for (unsigned i = 0, e = SUJ->Succs.size(); i < e; ++i) {
const SDep &Dep = SUJ->Succs[i];		const SDep &Dep = SUJ->Succs[i];
if (Dep.getSUnit() != SUI)		if (Dep.getSUnit() != SUI)
Show All 17 Lines	public:

// isLegalToPruneDependencies - Is it legal to prune dependece between SUI		// isLegalToPruneDependencies - Is it legal to prune dependece between SUI
// and SUJ.		// and SUJ.
bool isLegalToPruneDependencies(SUnit *SUI, SUnit *SUJ) override {		bool isLegalToPruneDependencies(SUnit *SUI, SUnit *SUJ) override {
return false;		return false;
}		}

void setIsLastBit(MachineInstr *MI, unsigned Bit) const {		void setIsLastBit(MachineInstr *MI, unsigned Bit) const {
unsigned LastOp = TII->getOperandIdx(MI->getOpcode(), AMDGPU::OpName::last);		unsigned LastOp = TII->getOperandIdx(MI->getOpcode(), R600::OpName::last);
MI->getOperand(LastOp).setImm(Bit);		MI->getOperand(LastOp).setImm(Bit);
}		}

bool isBundlableWithCurrentPMI(MachineInstr &MI,		bool isBundlableWithCurrentPMI(MachineInstr &MI,
const DenseMap<unsigned, unsigned> &PV,		const DenseMap<unsigned, unsigned> &PV,
std::vector<R600InstrInfo::BankSwizzle> &BS,		std::vector<R600InstrInfo::BankSwizzle> &BS,
bool &isTransSlot) {		bool &isTransSlot) {
isTransSlot = TII->isTransOnly(MI);		isTransSlot = TII->isTransOnly(MI);
▲ Show 20 Lines • Show All 64 Lines • ▼ Show 20 Lines	const DenseMap<unsigned, unsigned> &PV =
getPreviousVector(FirstInBundle);		getPreviousVector(FirstInBundle);
std::vector<R600InstrInfo::BankSwizzle> BS;		std::vector<R600InstrInfo::BankSwizzle> BS;
bool isTransSlot;		bool isTransSlot;

if (isBundlableWithCurrentPMI(MI, PV, BS, isTransSlot)) {		if (isBundlableWithCurrentPMI(MI, PV, BS, isTransSlot)) {
for (unsigned i = 0, e = CurrentPacketMIs.size(); i < e; i++) {		for (unsigned i = 0, e = CurrentPacketMIs.size(); i < e; i++) {
MachineInstr *MI = CurrentPacketMIs[i];		MachineInstr *MI = CurrentPacketMIs[i];
unsigned Op = TII->getOperandIdx(MI->getOpcode(),		unsigned Op = TII->getOperandIdx(MI->getOpcode(),
AMDGPU::OpName::bank_swizzle);		R600::OpName::bank_swizzle);
MI->getOperand(Op).setImm(BS[i]);		MI->getOperand(Op).setImm(BS[i]);
}		}
unsigned Op =		unsigned Op =
TII->getOperandIdx(MI.getOpcode(), AMDGPU::OpName::bank_swizzle);		TII->getOperandIdx(MI.getOpcode(), R600::OpName::bank_swizzle);
MI.getOperand(Op).setImm(BS.back());		MI.getOperand(Op).setImm(BS.back());
if (!CurrentPacketMIs.empty())		if (!CurrentPacketMIs.empty())
setIsLastBit(CurrentPacketMIs.back(), 0);		setIsLastBit(CurrentPacketMIs.back(), 0);
substitutePV(MI, PV);		substitutePV(MI, PV);
MachineBasicBlock::iterator It = VLIWPacketizerList::addToPacket(MI);		MachineBasicBlock::iterator It = VLIWPacketizerList::addToPacket(MI);
if (isTransSlot) {		if (isTransSlot) {
endPacket(std::next(It)->getParent(), std::next(It));		endPacket(std::next(It)->getParent(), std::next(It));
}		}
Show All 12 Lines	bool R600Packetizer::runOnMachineFunction(MachineFunction &Fn) {

MachineLoopInfo &MLI = getAnalysis<MachineLoopInfo>();		MachineLoopInfo &MLI = getAnalysis<MachineLoopInfo>();

// Instantiate the packetizer.		// Instantiate the packetizer.
R600PacketizerList Packetizer(Fn, ST, MLI);		R600PacketizerList Packetizer(Fn, ST, MLI);

// DFA state table should not be empty.		// DFA state table should not be empty.
assert(Packetizer.getResourceTracker() && "Empty DFA table!");		assert(Packetizer.getResourceTracker() && "Empty DFA table!");
		assert(Packetizer.getResourceTracker()->getInstrItins());

if (Packetizer.getResourceTracker()->getInstrItins()->isEmpty())		if (Packetizer.getResourceTracker()->getInstrItins()->isEmpty())
return false;		return false;

//		//
// Loop over all basic blocks and remove KILL pseudo-instructions		// Loop over all basic blocks and remove KILL pseudo-instructions
// These instructions confuse the dependence analysis. Consider:		// These instructions confuse the dependence analysis. Consider:
// D0 = ... (Insn 0)		// D0 = ... (Insn 0)
// R0 = KILL R0, D0 (Insn 1)		// R0 = KILL R0, D0 (Insn 1)
// R0 = ... (Insn 2)		// R0 = ... (Insn 2)
// Here, Insn 1 will result in the dependence graph not emitting an output		// Here, Insn 1 will result in the dependence graph not emitting an output
// dependence between Insn 0 and Insn 2. This can lead to incorrect		// dependence between Insn 0 and Insn 2. This can lead to incorrect
// packetization		// packetization
//		//
for (MachineFunction::iterator MBB = Fn.begin(), MBBe = Fn.end();		for (MachineFunction::iterator MBB = Fn.begin(), MBBe = Fn.end();
MBB != MBBe; ++MBB) {		MBB != MBBe; ++MBB) {
MachineBasicBlock::iterator End = MBB->end();		MachineBasicBlock::iterator End = MBB->end();
MachineBasicBlock::iterator MI = MBB->begin();		MachineBasicBlock::iterator MI = MBB->begin();
while (MI != End) {		while (MI != End) {
if (MI->isKill() || MI->getOpcode() == AMDGPU::IMPLICIT_DEF ||		if (MI->isKill() || MI->getOpcode() == R600::IMPLICIT_DEF ||
(MI->getOpcode() == AMDGPU::CF_ALU && !MI->getOperand(8).getImm())) {		(MI->getOpcode() == R600::CF_ALU && !MI->getOperand(8).getImm())) {
MachineBasicBlock::iterator DeleteMI = MI;		MachineBasicBlock::iterator DeleteMI = MI;
++MI;		++MI;
MBB->erase(DeleteMI);		MBB->erase(DeleteMI);
End = MBB->end();		End = MBB->end();
continue;		continue;
}		}
++MI;		++MI;
}		}
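
Note on the KILL-removal loop in R600Packetizer.cpp above: it advances the iterator before erasing, so the iterator never refers to a removed instruction. The following is a minimal standalone sketch of that pattern, assuming a `std::list` in place of a `MachineBasicBlock`. `Insn`, `kImplicitDef`, and `removePseudos` are hypothetical names used only for illustration and are not part of LLVM or of this patch.

```cpp
#include <cassert>
#include <list>

// Hypothetical stand-in for a machine instruction; not an LLVM type.
struct Insn {
  int Opcode;
  bool IsKill;
};

// Hypothetical opcode value used only in this sketch.
constexpr int kImplicitDef = 1;

// Erase every KILL or IMPLICIT_DEF entry from Block and return how many
// entries were removed. The iterator is advanced before the erase, so it
// never refers to an element that has already been removed.
inline int removePseudos(std::list<Insn> &Block) {
  int Removed = 0;
  auto It = Block.begin();
  while (It != Block.end()) {
    if (It->IsKill || It->Opcode == kImplicitDef) {
      auto Dead = It; // remember the element to delete
      ++It;           // step past it first
      Block.erase(Dead);
      ++Removed;
      continue;
    }
    ++It;
  }
  return Removed;
}
```

With `std::list`, erasing an element invalidates only iterators to that element, which is why saving `Dead` and stepping `It` first is sufficient in this sketch.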

lib/Target/AMDGPU/R600Processors.td

	//===-- R600Processors.td - R600 Processor definitions --------------------===//			//===-- R600Processors.td - R600 Processor definitions --------------------===//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

				class SubtargetFeatureFetchLimit <string Value> :
				SubtargetFeature <"fetch"#Value,
				"TexVTXClauseSize",
				Value,
				"Limit the maximum number of fetches in a clause to "#Value
				>;

				def FeatureR600ALUInst : SubtargetFeature<"R600ALUInst",
				"R600ALUInst",
				"false",
				"Older version of ALU instructions encoding"
				>;

				def FeatureFetchLimit8 : SubtargetFeatureFetchLimit <"8">;
				def FeatureFetchLimit16 : SubtargetFeatureFetchLimit <"16">;

				def FeatureVertexCache : SubtargetFeature<"HasVertexCache",
				"HasVertexCache",
				"true",
				"Specify use of dedicated vertex cache"
				>;

				def FeatureCaymanISA : SubtargetFeature<"caymanISA",
				"CaymanISA",
				"true",
				"Use Cayman ISA"
				>;

				def FeatureCFALUBug : SubtargetFeature<"cfalubug",
				"CFALUBug",
				"true",
				"GPU has CF_ALU bug"
				>;

				class R600SubtargetFeatureGeneration <string Value,
				list<SubtargetFeature> Implies> :
				SubtargetFeatureGeneration <Value, "R600Subtarget", Implies>;

				def FeatureR600 : R600SubtargetFeatureGeneration<"R600",
				[FeatureR600ALUInst, FeatureFetchLimit8, FeatureLocalMemorySize0]
				>;

				def FeatureR700 : R600SubtargetFeatureGeneration<"R700",
				[FeatureFetchLimit16, FeatureLocalMemorySize0]
				>;

				def FeatureEvergreen : R600SubtargetFeatureGeneration<"EVERGREEN",
				[FeatureFetchLimit16, FeatureLocalMemorySize32768]
				>;

				def FeatureNorthernIslands : R600SubtargetFeatureGeneration<"NORTHERN_ISLANDS",
				[FeatureFetchLimit16, FeatureWavefrontSize64,
				FeatureLocalMemorySize32768]
				>;


	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Radeon HD 2000/3000 Series (R600).			// Radeon HD 2000/3000 Series (R600).
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	def : Processor<"r600", R600_VLIW5_Itin,			def : Processor<"r600", R600_VLIW5_Itin,
	[FeatureR600, FeatureWavefrontSize64, FeatureVertexCache]			[FeatureR600, FeatureWavefrontSize64, FeatureVertexCache]
	>;			>;


lib/Target/AMDGPU/R600RegisterInfo.h

Show All 9 Lines
/// \file		/// \file
/// Interface definition for R600RegisterInfo		/// Interface definition for R600RegisterInfo
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_LIB_TARGET_AMDGPU_R600REGISTERINFO_H		#ifndef LLVM_LIB_TARGET_AMDGPU_R600REGISTERINFO_H
#define LLVM_LIB_TARGET_AMDGPU_R600REGISTERINFO_H		#define LLVM_LIB_TARGET_AMDGPU_R600REGISTERINFO_H

#include "AMDGPURegisterInfo.h"		#define GET_REGINFO_HEADER
		#include "R600GenRegisterInfo.inc"

namespace llvm {		namespace llvm {

class AMDGPUSubtarget;		class AMDGPUSubtarget;

struct R600RegisterInfo final : public AMDGPURegisterInfo {		struct R600RegisterInfo final : public R600GenRegisterInfo {
RegClassWeight RCW;		RegClassWeight RCW;

R600RegisterInfo();		R600RegisterInfo();

BitVector getReservedRegs(const MachineFunction &MF) const override;		BitVector getReservedRegs(const MachineFunction &MF) const override;
const MCPhysReg *getCalleeSavedRegs(const MachineFunction *MF) const override;		const MCPhysReg *getCalleeSavedRegs(const MachineFunction *MF) const override;
unsigned getFrameRegister(const MachineFunction &MF) const override;		unsigned getFrameRegister(const MachineFunction &MF) const override;

Show All 11 Lines	struct R600RegisterInfo final : public R600GenRegisterInfo {

// \returns true if \p Reg can be defined in one ALU clause and used in		// \returns true if \p Reg can be defined in one ALU clause and used in
// another.		// another.
bool isPhysRegLiveAcrossClauses(unsigned Reg) const;		bool isPhysRegLiveAcrossClauses(unsigned Reg) const;

void eliminateFrameIndex(MachineBasicBlock::iterator MI, int SPAdj,		void eliminateFrameIndex(MachineBasicBlock::iterator MI, int SPAdj,
unsigned FIOperandNum,		unsigned FIOperandNum,
RegScavenger *RS = nullptr) const override;		RegScavenger *RS = nullptr) const override;

		void reserveRegisterTuples(BitVector &Reserved, unsigned Reg) const;
};		};

} // End namespace llvm		} // End namespace llvm

#endif		#endif

lib/Target/AMDGPU/R600RegisterInfo.cpp

	Show All 15 Lines
	#include "AMDGPUTargetMachine.h"			#include "AMDGPUTargetMachine.h"
	#include "R600Defines.h"			#include "R600Defines.h"
	#include "R600InstrInfo.h"			#include "R600InstrInfo.h"
	#include "R600MachineFunctionInfo.h"			#include "R600MachineFunctionInfo.h"
	#include "MCTargetDesc/AMDGPUMCTargetDesc.h"			#include "MCTargetDesc/AMDGPUMCTargetDesc.h"

	using namespace llvm;			using namespace llvm;

	R600RegisterInfo::R600RegisterInfo() : AMDGPURegisterInfo() {			R600RegisterInfo::R600RegisterInfo() : R600GenRegisterInfo(0) {
	RCW.RegWeight = 0;			RCW.RegWeight = 0;
	RCW.WeightLimit = 0;			RCW.WeightLimit = 0;
	}			}

				#define GET_REGINFO_TARGET_DESC
				#include "R600GenRegisterInfo.inc"

	BitVector R600RegisterInfo::getReservedRegs(const MachineFunction &MF) const {			BitVector R600RegisterInfo::getReservedRegs(const MachineFunction &MF) const {
	BitVector Reserved(getNumRegs());			BitVector Reserved(getNumRegs());

	const R600Subtarget &ST = MF.getSubtarget<R600Subtarget>();			const R600Subtarget &ST = MF.getSubtarget<R600Subtarget>();
	const R600InstrInfo *TII = ST.getInstrInfo();			const R600InstrInfo *TII = ST.getInstrInfo();

	reserveRegisterTuples(Reserved, AMDGPU::ZERO);			reserveRegisterTuples(Reserved, R600::ZERO);
	reserveRegisterTuples(Reserved, AMDGPU::HALF);			reserveRegisterTuples(Reserved, R600::HALF);
	reserveRegisterTuples(Reserved, AMDGPU::ONE);			reserveRegisterTuples(Reserved, R600::ONE);
	reserveRegisterTuples(Reserved, AMDGPU::ONE_INT);			reserveRegisterTuples(Reserved, R600::ONE_INT);
	reserveRegisterTuples(Reserved, AMDGPU::NEG_HALF);			reserveRegisterTuples(Reserved, R600::NEG_HALF);
	reserveRegisterTuples(Reserved, AMDGPU::NEG_ONE);			reserveRegisterTuples(Reserved, R600::NEG_ONE);
	reserveRegisterTuples(Reserved, AMDGPU::PV_X);			reserveRegisterTuples(Reserved, R600::PV_X);
	reserveRegisterTuples(Reserved, AMDGPU::ALU_LITERAL_X);			reserveRegisterTuples(Reserved, R600::ALU_LITERAL_X);
	reserveRegisterTuples(Reserved, AMDGPU::ALU_CONST);			reserveRegisterTuples(Reserved, R600::ALU_CONST);
	reserveRegisterTuples(Reserved, AMDGPU::PREDICATE_BIT);			reserveRegisterTuples(Reserved, R600::PREDICATE_BIT);
	reserveRegisterTuples(Reserved, AMDGPU::PRED_SEL_OFF);			reserveRegisterTuples(Reserved, R600::PRED_SEL_OFF);
	reserveRegisterTuples(Reserved, AMDGPU::PRED_SEL_ZERO);			reserveRegisterTuples(Reserved, R600::PRED_SEL_ZERO);
	reserveRegisterTuples(Reserved, AMDGPU::PRED_SEL_ONE);			reserveRegisterTuples(Reserved, R600::PRED_SEL_ONE);
	reserveRegisterTuples(Reserved, AMDGPU::INDIRECT_BASE_ADDR);			reserveRegisterTuples(Reserved, R600::INDIRECT_BASE_ADDR);

	for (TargetRegisterClass::iterator I = AMDGPU::R600_AddrRegClass.begin(),			for (TargetRegisterClass::iterator I = R600::R600_AddrRegClass.begin(),
	E = AMDGPU::R600_AddrRegClass.end(); I != E; ++I) {			E = R600::R600_AddrRegClass.end(); I != E; ++I) {
	reserveRegisterTuples(Reserved, *I);			reserveRegisterTuples(Reserved, *I);
	}			}

	TII->reserveIndirectRegisters(Reserved, MF, *this);			TII->reserveIndirectRegisters(Reserved, MF, *this);

	return Reserved;			return Reserved;
	}			}

	// Dummy to not crash RegisterClassInfo.			// Dummy to not crash RegisterClassInfo.
	static const MCPhysReg CalleeSavedReg = AMDGPU::NoRegister;			static const MCPhysReg CalleeSavedReg = R600::NoRegister;

	const MCPhysReg *R600RegisterInfo::getCalleeSavedRegs(			const MCPhysReg *R600RegisterInfo::getCalleeSavedRegs(
	const MachineFunction *) const {			const MachineFunction *) const {
	return &CalleeSavedReg;			return &CalleeSavedReg;
	}			}

	unsigned R600RegisterInfo::getFrameRegister(const MachineFunction &MF) const {			unsigned R600RegisterInfo::getFrameRegister(const MachineFunction &MF) const {
	return AMDGPU::NoRegister;			return R600::NoRegister;
	}			}

	unsigned R600RegisterInfo::getHWRegChan(unsigned reg) const {			unsigned R600RegisterInfo::getHWRegChan(unsigned reg) const {
	return this->getEncodingValue(reg) >> HW_CHAN_SHIFT;			return this->getEncodingValue(reg) >> HW_CHAN_SHIFT;
	}			}

	unsigned R600RegisterInfo::getHWRegIndex(unsigned Reg) const {			unsigned R600RegisterInfo::getHWRegIndex(unsigned Reg) const {
	return GET_REG_INDEX(getEncodingValue(Reg));			return GET_REG_INDEX(getEncodingValue(Reg));
	}			}

	const TargetRegisterClass * R600RegisterInfo::getCFGStructurizerRegClass(			const TargetRegisterClass * R600RegisterInfo::getCFGStructurizerRegClass(
	MVT VT) const {			MVT VT) const {
	switch(VT.SimpleTy) {			switch(VT.SimpleTy) {
	default:			default:
	case MVT::i32: return &AMDGPU::R600_TReg32RegClass;			case MVT::i32: return &R600::R600_TReg32RegClass;
	}			}
	}			}

	const RegClassWeight &R600RegisterInfo::getRegClassWeight(			const RegClassWeight &R600RegisterInfo::getRegClassWeight(
	const TargetRegisterClass *RC) const {			const TargetRegisterClass *RC) const {
	return RCW;			return RCW;
	}			}

	bool R600RegisterInfo::isPhysRegLiveAcrossClauses(unsigned Reg) const {			bool R600RegisterInfo::isPhysRegLiveAcrossClauses(unsigned Reg) const {
	assert(!TargetRegisterInfo::isVirtualRegister(Reg));			assert(!TargetRegisterInfo::isVirtualRegister(Reg));

	switch (Reg) {			switch (Reg) {
	case AMDGPU::OQAP:			case R600::OQAP:
	case AMDGPU::OQBP:			case R600::OQBP:
	case AMDGPU::AR_X:			case R600::AR_X:
	return false;			return false;
	default:			default:
	return true;			return true;
	}			}
	}			}

	void R600RegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator MI,			void R600RegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator MI,
	int SPAdj,			int SPAdj,
	unsigned FIOperandNum,			unsigned FIOperandNum,
	RegScavenger *RS) const {			RegScavenger *RS) const {
	llvm_unreachable("Subroutines not supported yet");			llvm_unreachable("Subroutines not supported yet");
	}			}

				void R600RegisterInfo::reserveRegisterTuples(BitVector &Reserved, unsigned Reg) const {
				MCRegAliasIterator R(Reg, this, true);

				for (; R.isValid(); ++R)
				Reserved.set(*R);
				}
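
Note on `reserveRegisterTuples` above: the new R600 copy walks `MCRegAliasIterator` with its third argument (`IncludeSelf`) set to `true`, so the register itself and every register overlapping it end up marked as reserved. A rough standalone sketch of that idea follows, assuming a plain alias table instead of LLVM's register description tables. `AliasMap` and `reserveWithAliases` are hypothetical names, not LLVM APIs.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical alias table: register number -> registers that overlap it.
using AliasMap = std::map<unsigned, std::vector<unsigned>>;

// Mark Reg and every register that aliases it as reserved.
inline void reserveWithAliases(std::vector<bool> &Reserved,
                               const AliasMap &Aliases, unsigned Reg) {
  assert(Reg < Reserved.size() && "register number out of range");
  Reserved[Reg] = true; // plays the role of IncludeSelf = true
  auto It = Aliases.find(Reg);
  if (It == Aliases.end())
    return; // no overlapping registers recorded for Reg
  for (unsigned Alias : It->second) {
    assert(Alias < Reserved.size() && "alias out of range");
    Reserved[Alias] = true;
  }
}
```

In this sketch, reserving a register whose table entry lists two tuples sets three bits in `Reserved` and leaves the rest untouched.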

lib/Target/AMDGPU/R600RegisterInfo.td

Show First 20 Lines • Show All 239 Lines • ▼ Show 20 Lines	def R600_Reg128 : RegisterClass<"AMDGPU", [v4f32, v4i32], 128,
(add (sequence "T%u_XYZW", 0, 127))> {		(add (sequence "T%u_XYZW", 0, 127))> {
let CopyCost = -1;		let CopyCost = -1;
}		}

def R600_Reg128Vertical : RegisterClass<"AMDGPU", [v4f32, v4i32], 128,		def R600_Reg128Vertical : RegisterClass<"AMDGPU", [v4f32, v4i32], 128,
(add V0123_W, V0123_Z, V0123_Y, V0123_X)		(add V0123_W, V0123_Z, V0123_Y, V0123_X)
>;		>;

def R600_Reg64 : RegisterClass<"AMDGPU", [v2f32, v2i32], 64,		def R600_Reg64 : RegisterClass<"AMDGPU", [v2f32, v2i32, i64, f64], 64,
(add (sequence "T%u_XY", 0, 63))>;		(add (sequence "T%u_XY", 0, 63))>;

def R600_Reg64Vertical : RegisterClass<"AMDGPU", [v2f32, v2i32], 64,		def R600_Reg64Vertical : RegisterClass<"AMDGPU", [v2f32, v2i32], 64,
(add V01_X, V01_Y, V01_Z, V01_W,		(add V01_X, V01_Y, V01_Z, V01_W,
V23_X, V23_Y, V23_Z, V23_W)>;		V23_X, V23_Y, V23_Z, V23_W)>;

lib/Target/AMDGPU/R700Instructions.td

	//===-- R700Instructions.td - R700 Instruction defs -------- tablegen --===//			//===-- R700Instructions.td - R700 Instruction defs -------- tablegen --===//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// TableGen definitions for instructions which are:			// TableGen definitions for instructions which are:
	// - Available to R700 and newer VLIW4/VLIW5 GPUs			// - Available to R700 and newer VLIW4/VLIW5 GPUs
	// - Available only on R700 family GPUs.			// - Available only on R700 family GPUs.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	def isR700 : Predicate<"Subtarget->getGeneration() == AMDGPUSubtarget::R700">;			def isR700 : Predicate<"Subtarget->getGeneration() == R600Subtarget::R700">;

	let Predicates = [isR700] in {			let Predicates = [isR700] in {
	def SIN_r700 : SIN_Common<0x6E>;			def SIN_r700 : SIN_Common<0x6E>;
	def COS_r700 : COS_Common<0x6F>;			def COS_r700 : COS_Common<0x6F>;
	}			}

lib/Target/AMDGPU/SIFoldOperands.cpp

Show First 20 Lines • Show All 70 Lines • ▼ Show 20 Lines
};		};

class SIFoldOperands : public MachineFunctionPass {		class SIFoldOperands : public MachineFunctionPass {
public:		public:
static char ID;		static char ID;
MachineRegisterInfo *MRI;		MachineRegisterInfo *MRI;
const SIInstrInfo *TII;		const SIInstrInfo *TII;
const SIRegisterInfo *TRI;		const SIRegisterInfo *TRI;
const SISubtarget *ST;		const AMDGPUSubtarget *ST;

void foldOperand(MachineOperand &OpToFold,		void foldOperand(MachineOperand &OpToFold,
MachineInstr *UseMI,		MachineInstr *UseMI,
unsigned UseOpIdx,		unsigned UseOpIdx,
SmallVectorImpl<FoldCandidate> &FoldList,		SmallVectorImpl<FoldCandidate> &FoldList,
SmallVectorImpl<MachineInstr *> &CopiesToReplace) const;		SmallVectorImpl<MachineInstr *> &CopiesToReplace) const;

void foldInstOperand(MachineInstr &MI, MachineOperand &OpToFold) const;		void foldInstOperand(MachineInstr &MI, MachineOperand &OpToFold) const;
▲ Show 20 Lines • Show All 879 Lines • ▼ Show 20 Lines	bool SIFoldOperands::tryFoldOMod(MachineInstr &MI) {
return true;		return true;
}		}

bool SIFoldOperands::runOnMachineFunction(MachineFunction &MF) {		bool SIFoldOperands::runOnMachineFunction(MachineFunction &MF) {
if (skipFunction(MF.getFunction()))		if (skipFunction(MF.getFunction()))
return false;		return false;

MRI = &MF.getRegInfo();		MRI = &MF.getRegInfo();
ST = &MF.getSubtarget<SISubtarget>();		ST = &MF.getSubtarget<AMDGPUSubtarget>();
TII = ST->getInstrInfo();		TII = ST->getInstrInfo();
TRI = &TII->getRegisterInfo();		TRI = &TII->getRegisterInfo();

const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();

// omod is ignored by hardware if IEEE bit is enabled. omod also does not		// omod is ignored by hardware if IEEE bit is enabled. omod also does not
// correctly handle signed zeros.		// correctly handle signed zeros.
//		//

lib/Target/AMDGPU/SIISelLowering.h

	Show All 16 Lines

	#include "AMDGPUISelLowering.h"			#include "AMDGPUISelLowering.h"
	#include "AMDGPUArgumentUsageInfo.h"			#include "AMDGPUArgumentUsageInfo.h"
	#include "SIInstrInfo.h"			#include "SIInstrInfo.h"

	namespace llvm {			namespace llvm {

	class SITargetLowering final : public AMDGPUTargetLowering {			class SITargetLowering final : public AMDGPUTargetLowering {
				private:
				const SISubtarget *Subtarget;

	SDValue lowerKernArgParameterPtr(SelectionDAG &DAG, const SDLoc &SL,			SDValue lowerKernArgParameterPtr(SelectionDAG &DAG, const SDLoc &SL,
	SDValue Chain, uint64_t Offset) const;			SDValue Chain, uint64_t Offset) const;
	SDValue getImplicitArgPtr(SelectionDAG &DAG, const SDLoc &SL) const;			SDValue getImplicitArgPtr(SelectionDAG &DAG, const SDLoc &SL) const;
	SDValue lowerKernargMemParameter(SelectionDAG &DAG, EVT VT, EVT MemVT,			SDValue lowerKernargMemParameter(SelectionDAG &DAG, EVT VT, EVT MemVT,
	const SDLoc &SL, SDValue Chain,			const SDLoc &SL, SDValue Chain,
	uint64_t Offset, unsigned Align, bool Signed,			uint64_t Offset, unsigned Align, bool Signed,
	const ISD::InputArg *Arg = nullptr) const;			const ISD::InputArg *Arg = nullptr) const;


lib/Target/AMDGPU/SIISelLowering.cpp


Show First 20 Lines • Show All 107 Lines • ▼ Show 20 Lines	if (!CCInfo.isAllocated(AMDGPU::SGPR0 + Reg)) {
return AMDGPU::SGPR0 + Reg;		return AMDGPU::SGPR0 + Reg;
}		}
}		}
llvm_unreachable("Cannot allocate sgpr");		llvm_unreachable("Cannot allocate sgpr");
}		}

SITargetLowering::SITargetLowering(const TargetMachine &TM,		SITargetLowering::SITargetLowering(const TargetMachine &TM,
const SISubtarget &STI)		const SISubtarget &STI)
: AMDGPUTargetLowering(TM, STI) {		: AMDGPUTargetLowering(TM, STI),
		Subtarget(&STI) {
addRegisterClass(MVT::i1, &AMDGPU::VReg_1RegClass);		addRegisterClass(MVT::i1, &AMDGPU::VReg_1RegClass);
addRegisterClass(MVT::i64, &AMDGPU::SReg_64RegClass);		addRegisterClass(MVT::i64, &AMDGPU::SReg_64RegClass);

addRegisterClass(MVT::i32, &AMDGPU::SReg_32_XM0RegClass);		addRegisterClass(MVT::i32, &AMDGPU::SReg_32_XM0RegClass);
addRegisterClass(MVT::f32, &AMDGPU::VGPR_32RegClass);		addRegisterClass(MVT::f32, &AMDGPU::VGPR_32RegClass);

addRegisterClass(MVT::f64, &AMDGPU::VReg_64RegClass);		addRegisterClass(MVT::f64, &AMDGPU::VReg_64RegClass);
addRegisterClass(MVT::v2i32, &AMDGPU::SReg_64RegClass);		addRegisterClass(MVT::v2i32, &AMDGPU::SReg_64RegClass);
Show All 17 Lines	if (Subtarget->has16BitInsts()) {

// Unless there are also VOP3P operations, not operations are really legal.		// Unless there are also VOP3P operations, not operations are really legal.
addRegisterClass(MVT::v2i16, &AMDGPU::SReg_32_XM0RegClass);		addRegisterClass(MVT::v2i16, &AMDGPU::SReg_32_XM0RegClass);
addRegisterClass(MVT::v2f16, &AMDGPU::SReg_32_XM0RegClass);		addRegisterClass(MVT::v2f16, &AMDGPU::SReg_32_XM0RegClass);
addRegisterClass(MVT::v4i16, &AMDGPU::SReg_64RegClass);		addRegisterClass(MVT::v4i16, &AMDGPU::SReg_64RegClass);
addRegisterClass(MVT::v4f16, &AMDGPU::SReg_64RegClass);		addRegisterClass(MVT::v4f16, &AMDGPU::SReg_64RegClass);
}		}

computeRegisterProperties(STI.getRegisterInfo());		computeRegisterProperties(Subtarget->getRegisterInfo());

// We need to custom lower vector stores from local memory		// We need to custom lower vector stores from local memory
setOperationAction(ISD::LOAD, MVT::v2i32, Custom);		setOperationAction(ISD::LOAD, MVT::v2i32, Custom);
setOperationAction(ISD::LOAD, MVT::v4i32, Custom);		setOperationAction(ISD::LOAD, MVT::v4i32, Custom);
setOperationAction(ISD::LOAD, MVT::v8i32, Custom);		setOperationAction(ISD::LOAD, MVT::v8i32, Custom);
setOperationAction(ISD::LOAD, MVT::v16i32, Custom);		setOperationAction(ISD::LOAD, MVT::v16i32, Custom);
setOperationAction(ISD::LOAD, MVT::i1, Custom);		setOperationAction(ISD::LOAD, MVT::i1, Custom);

▲ Show 20 Lines • Show All 159 Lines • ▼ Show 20 Lines	#endif
setOperationAction(ISD::ATOMIC_CMP_SWAP, MVT::i32, Custom);		setOperationAction(ISD::ATOMIC_CMP_SWAP, MVT::i32, Custom);
setOperationAction(ISD::ATOMIC_CMP_SWAP, MVT::i64, Custom);		setOperationAction(ISD::ATOMIC_CMP_SWAP, MVT::i64, Custom);

// We can't return success/failure, only the old value,		// We can't return success/failure, only the old value,
// let LLVM add the comparison		// let LLVM add the comparison
setOperationAction(ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS, MVT::i32, Expand);		setOperationAction(ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS, MVT::i32, Expand);
setOperationAction(ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS, MVT::i64, Expand);		setOperationAction(ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS, MVT::i64, Expand);

if (getSubtarget()->hasFlatAddressSpace()) {		if (Subtarget->hasFlatAddressSpace()) {
setOperationAction(ISD::ADDRSPACECAST, MVT::i32, Custom);		setOperationAction(ISD::ADDRSPACECAST, MVT::i32, Custom);
setOperationAction(ISD::ADDRSPACECAST, MVT::i64, Custom);		setOperationAction(ISD::ADDRSPACECAST, MVT::i64, Custom);
}		}

setOperationAction(ISD::BSWAP, MVT::i32, Legal);		setOperationAction(ISD::BSWAP, MVT::i32, Legal);
setOperationAction(ISD::BITREVERSE, MVT::i32, Legal);		setOperationAction(ISD::BITREVERSE, MVT::i32, Legal);

// On SI this is s_memtime and s_memrealtime on VI.		// On SI this is s_memtime and s_memrealtime on VI.
setOperationAction(ISD::READCYCLECOUNTER, MVT::i64, Legal);		setOperationAction(ISD::READCYCLECOUNTER, MVT::i64, Legal);
setOperationAction(ISD::TRAP, MVT::Other, Custom);		setOperationAction(ISD::TRAP, MVT::Other, Custom);
setOperationAction(ISD::DEBUGTRAP, MVT::Other, Custom);		setOperationAction(ISD::DEBUGTRAP, MVT::Other, Custom);

		if (Subtarget->has16BitInsts()) {
		setOperationAction(ISD::FLOG, MVT::f16, Custom);
		setOperationAction(ISD::FLOG10, MVT::f16, Custom);
		}

		// v_mad_f32 does not support denormals according to some sources.
		if (!Subtarget->hasFP32Denormals())
		setOperationAction(ISD::FMAD, MVT::f32, Legal);

		if (!Subtarget->hasBFI()) {
		// fcopysign can be done in a single instruction with BFI.
		setOperationAction(ISD::FCOPYSIGN, MVT::f32, Expand);
		setOperationAction(ISD::FCOPYSIGN, MVT::f64, Expand);
		}

		if (!Subtarget->hasBCNT(32))
		setOperationAction(ISD::CTPOP, MVT::i32, Expand);

		if (!Subtarget->hasBCNT(64))
		setOperationAction(ISD::CTPOP, MVT::i64, Expand);

		if (Subtarget->hasFFBH())
		setOperationAction(ISD::CTLZ_ZERO_UNDEF, MVT::i32, Custom);

		if (Subtarget->hasFFBL())
		setOperationAction(ISD::CTTZ_ZERO_UNDEF, MVT::i32, Custom);

		// We only really have 32-bit BFE instructions (and 16-bit on VI).
		//
		// On SI+ there are 64-bit BFEs, but they are scalar only and there isn't any
		// effort to match them now. We want this to be false for i64 cases when the
		// extraction isn't restricted to the upper or lower half. Ideally we would
		// have some pass reduce 64-bit extracts to 32-bit if possible. Extracts that
		// span the midpoint are probably relatively rare, so don't worry about them
		// for now.
		if (Subtarget->hasBFE())
		setHasExtractBitsInsn(true);

setOperationAction(ISD::FMINNUM, MVT::f64, Legal);		setOperationAction(ISD::FMINNUM, MVT::f64, Legal);
setOperationAction(ISD::FMAXNUM, MVT::f64, Legal);		setOperationAction(ISD::FMAXNUM, MVT::f64, Legal);

if (Subtarget->getGeneration() >= SISubtarget::SEA_ISLANDS) {		if (Subtarget->getGeneration() >= SISubtarget::SEA_ISLANDS) {
setOperationAction(ISD::FTRUNC, MVT::f64, Legal);		setOperationAction(ISD::FTRUNC, MVT::f64, Legal);
setOperationAction(ISD::FCEIL, MVT::f64, Legal);		setOperationAction(ISD::FCEIL, MVT::f64, Legal);
setOperationAction(ISD::FRINT, MVT::f64, Legal);		setOperationAction(ISD::FRINT, MVT::f64, Legal);
		} else {
		setOperationAction(ISD::FCEIL, MVT::f64, Custom);
		setOperationAction(ISD::FTRUNC, MVT::f64, Custom);
		setOperationAction(ISD::FRINT, MVT::f64, Custom);
		setOperationAction(ISD::FFLOOR, MVT::f64, Custom);
}		}

setOperationAction(ISD::FFLOOR, MVT::f64, Legal);		setOperationAction(ISD::FFLOOR, MVT::f64, Legal);

setOperationAction(ISD::FSIN, MVT::f32, Custom);		setOperationAction(ISD::FSIN, MVT::f32, Custom);
setOperationAction(ISD::FCOS, MVT::f32, Custom);		setOperationAction(ISD::FCOS, MVT::f32, Custom);
setOperationAction(ISD::FDIV, MVT::f32, Custom);		setOperationAction(ISD::FDIV, MVT::f32, Custom);
setOperationAction(ISD::FDIV, MVT::f64, Custom);		setOperationAction(ISD::FDIV, MVT::f64, Custom);
▲ Show 20 Lines • Show All 257 Lines • ▼ Show 20 Lines	#endif
setTargetDAGCombine(ISD::ATOMIC_LOAD_XOR);		setTargetDAGCombine(ISD::ATOMIC_LOAD_XOR);
setTargetDAGCombine(ISD::ATOMIC_LOAD_NAND);		setTargetDAGCombine(ISD::ATOMIC_LOAD_NAND);
setTargetDAGCombine(ISD::ATOMIC_LOAD_MIN);		setTargetDAGCombine(ISD::ATOMIC_LOAD_MIN);
setTargetDAGCombine(ISD::ATOMIC_LOAD_MAX);		setTargetDAGCombine(ISD::ATOMIC_LOAD_MAX);
setTargetDAGCombine(ISD::ATOMIC_LOAD_UMIN);		setTargetDAGCombine(ISD::ATOMIC_LOAD_UMIN);
setTargetDAGCombine(ISD::ATOMIC_LOAD_UMAX);		setTargetDAGCombine(ISD::ATOMIC_LOAD_UMAX);

setSchedulingPreference(Sched::RegPressure);		setSchedulingPreference(Sched::RegPressure);

		// SI at least has hardware support for floating point exceptions, but no way
		// of using or handling them is implemented. They are also optional in OpenCL
		// (Section 7.3)
		setHasFloatingPointExceptions(Subtarget->hasFPExceptions());
}		}

const SISubtarget *SITargetLowering::getSubtarget() const {		const SISubtarget *SITargetLowering::getSubtarget() const {
return static_cast<const SISubtarget *>(Subtarget);		return Subtarget;
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// TargetLowering queries		// TargetLowering queries
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

// v_mad_mix* support a conversion from f16 to f32.		// v_mad_mix* support a conversion from f16 to f32.
//		//
▲ Show 20 Lines • Show All 1,377 Lines • ▼ Show 20 Lines	for (unsigned i = 0, realRVLocIdx = 0;

Chain = DAG.getCopyToReg(Chain, DL, VA.getLocReg(), Arg, Flag);		Chain = DAG.getCopyToReg(Chain, DL, VA.getLocReg(), Arg, Flag);
Flag = Chain.getValue(1);		Flag = Chain.getValue(1);
RetOps.push_back(DAG.getRegister(VA.getLocReg(), VA.getLocVT()));		RetOps.push_back(DAG.getRegister(VA.getLocReg(), VA.getLocVT()));
}		}

// FIXME: Does sret work properly?		// FIXME: Does sret work properly?
if (!Info->isEntryFunction()) {		if (!Info->isEntryFunction()) {
const SIRegisterInfo *TRI		const SIRegisterInfo *TRI = Subtarget->getRegisterInfo();
= static_cast<const SISubtarget *>(Subtarget)->getRegisterInfo();
const MCPhysReg *I =		const MCPhysReg *I =
TRI->getCalleeSavedRegsViaCopy(&DAG.getMachineFunction());		TRI->getCalleeSavedRegsViaCopy(&DAG.getMachineFunction());
if (I) {		if (I) {
for (; *I; ++I) {		for (; *I; ++I) {
if (AMDGPU::SReg_64RegClass.contains(*I))		if (AMDGPU::SReg_64RegClass.contains(*I))
RetOps.push_back(DAG.getRegister(*I, MVT::i64));		RetOps.push_back(DAG.getRegister(*I, MVT::i64));
else if (AMDGPU::SReg_32RegClass.contains(*I))		else if (AMDGPU::SReg_32RegClass.contains(*I))
RetOps.push_back(DAG.getRegister(*I, MVT::i32));		RetOps.push_back(DAG.getRegister(*I, MVT::i32));
▲ Show 20 Lines • Show All 85 Lines • ▼ Show 20 Lines	if (!CLI.CS)
return;		return;

const Function *CalleeFunc = CLI.CS.getCalledFunction();		const Function *CalleeFunc = CLI.CS.getCalledFunction();
assert(CalleeFunc);		assert(CalleeFunc);

SelectionDAG &DAG = CLI.DAG;		SelectionDAG &DAG = CLI.DAG;
const SDLoc &DL = CLI.DL;		const SDLoc &DL = CLI.DL;

const SISubtarget *ST = getSubtarget();		const SIRegisterInfo *TRI = Subtarget->getRegisterInfo();
const SIRegisterInfo *TRI = ST->getRegisterInfo();

auto &ArgUsageInfo =		auto &ArgUsageInfo =
DAG.getPass()->getAnalysis<AMDGPUArgumentUsageInfo>();		DAG.getPass()->getAnalysis<AMDGPUArgumentUsageInfo>();
const AMDGPUFunctionArgInfo &CalleeArgInfo		const AMDGPUFunctionArgInfo &CalleeArgInfo
= ArgUsageInfo.lookupFuncArgInfo(*CalleeFunc);		= ArgUsageInfo.lookupFuncArgInfo(*CalleeFunc);

const AMDGPUFunctionArgInfo &CallerArgInfo = Info.getArgInfo();		const AMDGPUFunctionArgInfo &CallerArgInfo = Info.getArgInfo();

▲ Show 20 Lines • Show All 421 Lines • ▼ Show 20 Lines	SDValue SITargetLowering::LowerCall(CallLoweringInfo &CLI,
// into the call.		// into the call.
for (auto &RegToPass : RegsToPass) {		for (auto &RegToPass : RegsToPass) {
Ops.push_back(DAG.getRegister(RegToPass.first,		Ops.push_back(DAG.getRegister(RegToPass.first,
RegToPass.second.getValueType()));		RegToPass.second.getValueType()));
}		}

// Add a register mask operand representing the call-preserved registers.		// Add a register mask operand representing the call-preserved registers.

const AMDGPURegisterInfo *TRI = Subtarget->getRegisterInfo();		auto TRI = static_cast<const SIRegisterInfo *>(Subtarget->getRegisterInfo());
const uint32_t *Mask = TRI->getCallPreservedMask(MF, CallConv);		const uint32_t *Mask = TRI->getCallPreservedMask(MF, CallConv);
assert(Mask && "Missing call preserved mask for calling convention");		assert(Mask && "Missing call preserved mask for calling convention");
Ops.push_back(DAG.getRegisterMask(Mask));		Ops.push_back(DAG.getRegisterMask(Mask));

if (InFlag.getNode())		if (InFlag.getNode())
Ops.push_back(InFlag);		Ops.push_back(InFlag);

SDVTList NodeTys = DAG.getVTList(MVT::Other, MVT::Glue);		SDVTList NodeTys = DAG.getVTList(MVT::Other, MVT::Glue);
▲ Show 20 Lines • Show All 5,601 Lines • ▼ Show 20 Lines

// Figure out which registers should be reserved for stack access. Only after		// Figure out which registers should be reserved for stack access. Only after
// the function is legalized do we know all of the non-spill stack objects or if		// the function is legalized do we know all of the non-spill stack objects or if
// calls are present.		// calls are present.
void SITargetLowering::finalizeLowering(MachineFunction &MF) const {		void SITargetLowering::finalizeLowering(MachineFunction &MF) const {
MachineRegisterInfo &MRI = MF.getRegInfo();		MachineRegisterInfo &MRI = MF.getRegInfo();
SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();		SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();
const MachineFrameInfo &MFI = MF.getFrameInfo();		const MachineFrameInfo &MFI = MF.getFrameInfo();
const SISubtarget &ST = MF.getSubtarget<SISubtarget>();		const SIRegisterInfo *TRI = Subtarget->getRegisterInfo();
const SIRegisterInfo *TRI = ST.getRegisterInfo();

if (Info->isEntryFunction()) {		if (Info->isEntryFunction()) {
// Callable functions have fixed registers used for stack access.		// Callable functions have fixed registers used for stack access.
reservePrivateMemoryRegs(getTargetMachine(), MF, TRI, Info);		reservePrivateMemoryRegs(getTargetMachine(), MF, TRI, Info);
}		}

// We have to assume the SP is needed in case there are calls in the function		// We have to assume the SP is needed in case there are calls in the function
// during lowering. Calls are only detected after the function is		// during lowering. Calls are only detected after the function is
▲ Show 20 Lines • Show All 109 Lines • Show Last 20 Lines

lib/Target/AMDGPU/SIInsertWaitcnts.cpp

Show First 20 Lines • Show All 927 Lines • ▼ Show 20 Lines	else if (MI.getOpcode() == AMDGPU::BUFFER_WBINVL1 ||
MI.getOpcode() == AMDGPU::BUFFER_WBINVL1_SC ||		MI.getOpcode() == AMDGPU::BUFFER_WBINVL1_SC ||
MI.getOpcode() == AMDGPU::BUFFER_WBINVL1_VOL) {		MI.getOpcode() == AMDGPU::BUFFER_WBINVL1_VOL) {
EmitWaitcnt |=		EmitWaitcnt |=
ScoreBrackets->updateByWait(VM_CNT, ScoreBrackets->getScoreUB(VM_CNT));		ScoreBrackets->updateByWait(VM_CNT, ScoreBrackets->getScoreUB(VM_CNT));
}		}

// All waits must be resolved at call return.		// All waits must be resolved at call return.
// NOTE: this could be improved with knowledge of all call sites or		// NOTE: this could be improved with knowledge of all call sites or
// with knowledge of the called routines.		// with knowledge of the called routines.
if (MI.getOpcode() == AMDGPU::RETURN ||		if (MI.getOpcode() == AMDGPU::SI_RETURN_TO_EPILOG ||
		arsenm: Fix formatting
MI.getOpcode() == AMDGPU::SI_RETURN_TO_EPILOG ||
MI.getOpcode() == AMDGPU::S_SETPC_B64_return) {		MI.getOpcode() == AMDGPU::S_SETPC_B64_return) {
for (enum InstCounterType T = VM_CNT; T < NUM_INST_CNTS;		for (enum InstCounterType T = VM_CNT; T < NUM_INST_CNTS;
T = (enum InstCounterType)(T + 1)) {		T = (enum InstCounterType)(T + 1)) {
if (ScoreBrackets->getScoreUB(T) > ScoreBrackets->getScoreLB(T)) {		if (ScoreBrackets->getScoreUB(T) > ScoreBrackets->getScoreLB(T)) {
ScoreBrackets->setScoreLB(T, ScoreBrackets->getScoreUB(T));		ScoreBrackets->setScoreLB(T, ScoreBrackets->getScoreUB(T));
EmitWaitcnt |= CNT_MASK(T);		EmitWaitcnt |= CNT_MASK(T);
}		}
}		}
▲ Show 20 Lines • Show All 179 Lines • ▼ Show 20 Lines	EmitWaitcnt |= ScoreBrackets->updateByWait(
EXP_CNT, ScoreBrackets->getScoreUB(EXP_CNT));		EXP_CNT, ScoreBrackets->getScoreUB(EXP_CNT));
EmitWaitcnt |= ScoreBrackets->updateByWait(		EmitWaitcnt |= ScoreBrackets->updateByWait(
LGKM_CNT, ScoreBrackets->getScoreUB(LGKM_CNT));		LGKM_CNT, ScoreBrackets->getScoreUB(LGKM_CNT));
}		}

// TODO: Remove this work-around, enable the assert for Bug 457939		// TODO: Remove this work-around, enable the assert for Bug 457939
// after fixing the scheduler. Also, the Shader Compiler code is		// after fixing the scheduler. Also, the Shader Compiler code is
// independent of target.		// independent of target.
if (readsVCCZ(MI) && ST->getGeneration() <= SISubtarget::SEA_ISLANDS) {		if (readsVCCZ(MI) && ST->getGeneration() <= AMDGPUSubtarget::SEA_ISLANDS) {
if (ScoreBrackets->getScoreLB(LGKM_CNT) <		if (ScoreBrackets->getScoreLB(LGKM_CNT) <
ScoreBrackets->getScoreUB(LGKM_CNT) &&		ScoreBrackets->getScoreUB(LGKM_CNT) &&
ScoreBrackets->hasPendingSMEM()) {		ScoreBrackets->hasPendingSMEM()) {
// Wait on everything, not just LGKM. vccz reads usually come from		// Wait on everything, not just LGKM. vccz reads usually come from
// terminators, and we always wait on everything at the end of the		// terminators, and we always wait on everything at the end of the
// block, so if we only wait on LGKM here, we might end up with		// block, so if we only wait on LGKM here, we might end up with
// another s_waitcnt inserted right after this if there are non-LGKM		// another s_waitcnt inserted right after this if there are non-LGKM
// instructions still outstanding.		// instructions still outstanding.
▲ Show 20 Lines • Show All 568 Lines • ▼ Show 20 Lines	for (MachineBasicBlock::iterator Iter = Block.begin(), E = Block.end();
}		}

bool VCCZBugWorkAround = false;		bool VCCZBugWorkAround = false;
if (readsVCCZ(Inst) &&		if (readsVCCZ(Inst) &&
(!VCCZBugHandledSet.count(&Inst))) {		(!VCCZBugHandledSet.count(&Inst))) {
if (ScoreBrackets->getScoreLB(LGKM_CNT) <		if (ScoreBrackets->getScoreLB(LGKM_CNT) <
ScoreBrackets->getScoreUB(LGKM_CNT) &&		ScoreBrackets->getScoreUB(LGKM_CNT) &&
ScoreBrackets->hasPendingSMEM()) {		ScoreBrackets->hasPendingSMEM()) {
if (ST->getGeneration() <= SISubtarget::SEA_ISLANDS)		if (ST->getGeneration() <= AMDGPUSubtarget::SEA_ISLANDS)
VCCZBugWorkAround = true;		VCCZBugWorkAround = true;
}		}
}		}

// Generate an s_waitcnt instruction to be placed before		// Generate an s_waitcnt instruction to be placed before
// cur_Inst, if needed.		// cur_Inst, if needed.
generateWaitcntInstBefore(Inst, ScoreBrackets);		generateWaitcntInstBefore(Inst, ScoreBrackets);

▲ Show 20 Lines • Show All 296 Lines • Show Last 20 Lines

lib/Target/AMDGPU/SIInstrFormats.td

Show All 15 Lines	def isGCN : Predicate<"Subtarget->getGeneration() "
AssemblerPredicate<"FeatureGCN">;		AssemblerPredicate<"FeatureGCN">;
def isSI : Predicate<"Subtarget->getGeneration() "		def isSI : Predicate<"Subtarget->getGeneration() "
"== SISubtarget::SOUTHERN_ISLANDS">,		"== SISubtarget::SOUTHERN_ISLANDS">,
AssemblerPredicate<"FeatureSouthernIslands">;		AssemblerPredicate<"FeatureSouthernIslands">;


class InstSI <dag outs, dag ins, string asm = "",		class InstSI <dag outs, dag ins, string asm = "",
list<dag> pattern = []> :		list<dag> pattern = []> :
AMDGPUInst<outs, ins, asm, pattern>, PredicateControl {		AMDGPUInst<outs, ins, asm, pattern>, GCNPredicateControl {
let SubtargetPredicate = isGCN;		let SubtargetPredicate = isGCN;

// Low bits - basic encoding information.		// Low bits - basic encoding information.
field bit SALU = 0;		field bit SALU = 0;
field bit VALU = 0;		field bit VALU = 0;

// SALU instruction formats.		// SALU instruction formats.
field bit SOP1 = 0;		field bit SOP1 = 0;
▲ Show 20 Lines • Show All 303 Lines • Show Last 20 Lines

lib/Target/AMDGPU/SIInstrInfo.h

Show All 25 Lines
#include "llvm/CodeGen/MachineInstr.h"		#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"		#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineOperand.h"		#include "llvm/CodeGen/MachineOperand.h"
#include "llvm/MC/MCInstrDesc.h"		#include "llvm/MC/MCInstrDesc.h"
#include "llvm/Support/Compiler.h"		#include "llvm/Support/Compiler.h"
#include <cassert>		#include <cassert>
#include <cstdint>		#include <cstdint>

		#define GET_INSTRINFO_HEADER
		#include "AMDGPUGenInstrInfo.inc"

namespace llvm {		namespace llvm {

class APInt;		class APInt;
class MachineRegisterInfo;		class MachineRegisterInfo;
class RegScavenger;		class RegScavenger;
class SISubtarget;		class SISubtarget;
class TargetRegisterClass;		class TargetRegisterClass;

class SIInstrInfo final : public AMDGPUInstrInfo {		class SIInstrInfo final : public AMDGPUGenInstrInfo {
private:		private:
const SIRegisterInfo RI;		const SIRegisterInfo RI;
const SISubtarget &ST;		const SISubtarget &ST;

// The inverse predicate should have the negative value.		// The inverse predicate should have the negative value.
enum BranchPredicate {		enum BranchPredicate {
INVALID_BR = 0,		INVALID_BR = 0,
SCC_TRUE = 1,		SCC_TRUE = 1,
▲ Show 20 Lines • Show All 107 Lines • ▼ Show 20 Lines	bool areLoadsFromSameBasePtr(SDNode *Load1, SDNode *Load2,
int64_t &Offset2) const override;		int64_t &Offset2) const override;

bool getMemOpBaseRegImmOfs(MachineInstr &LdSt, unsigned &BaseReg,		bool getMemOpBaseRegImmOfs(MachineInstr &LdSt, unsigned &BaseReg,
int64_t &Offset,		int64_t &Offset,
const TargetRegisterInfo *TRI) const final;		const TargetRegisterInfo *TRI) const final;

bool shouldClusterMemOps(MachineInstr &FirstLdSt, unsigned BaseReg1,		bool shouldClusterMemOps(MachineInstr &FirstLdSt, unsigned BaseReg1,
MachineInstr &SecondLdSt, unsigned BaseReg2,		MachineInstr &SecondLdSt, unsigned BaseReg2,
unsigned NumLoads) const final;		unsigned NumLoads) const override;

		bool shouldScheduleLoadsNear(SDNode *Load0, SDNode *Load1, int64_t Offset0,
		int64_t Offset1, unsigned NumLoads) const override;

void copyPhysReg(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,		void copyPhysReg(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
const DebugLoc &DL, unsigned DestReg, unsigned SrcReg,		const DebugLoc &DL, unsigned DestReg, unsigned SrcReg,
bool KillSrc) const override;		bool KillSrc) const override;

unsigned calculateLDSSpillAddress(MachineBasicBlock &MBB, MachineInstr &MI,		unsigned calculateLDSSpillAddress(MachineBasicBlock &MBB, MachineInstr &MI,
RegScavenger *RS, unsigned TmpReg,		RegScavenger *RS, unsigned TmpReg,
unsigned Offset, unsigned Size) const;		unsigned Offset, unsigned Size) const;
▲ Show 20 Lines • Show All 691 Lines • ▼ Show 20 Lines	MachineInstrBuilder getAddNoCarry(MachineBasicBlock &MBB,
unsigned DestReg) const;		unsigned DestReg) const;

static bool isKillTerminator(unsigned Opcode);		static bool isKillTerminator(unsigned Opcode);
const MCInstrDesc &getKillTerminatorFromPseudo(unsigned Opcode) const;		const MCInstrDesc &getKillTerminatorFromPseudo(unsigned Opcode) const;

static bool isLegalMUBUFImmOffset(unsigned Imm) {		static bool isLegalMUBUFImmOffset(unsigned Imm) {
return isUInt<12>(Imm);		return isUInt<12>(Imm);
}		}

		/// \brief Return a target-specific opcode if Opcode is a pseudo instruction.
		/// Return -1 if the target-specific opcode for the pseudo instruction does
		/// not exist. If Opcode is not a pseudo instruction, this is identity.
		int pseudoToMCOpcode(int Opcode) const;

};		};

namespace AMDGPU {		namespace AMDGPU {

LLVM_READONLY		LLVM_READONLY
int getVOPe64(uint16_t Opcode);		int getVOPe64(uint16_t Opcode);

LLVM_READONLY		LLVM_READONLY
▲ Show 20 Lines • Show All 64 Lines • Show Last 20 Lines

lib/Target/AMDGPU/SIInstrInfo.cpp

//===- SIInstrInfo.cpp - SI Instruction Information ----------------------===//		//===- SIInstrInfo.cpp - SI Instruction Information ----------------------===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
/// \file		/// \file
/// SI Implementation of TargetInstrInfo.		/// SI Implementation of TargetInstrInfo.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "SIInstrInfo.h"		#include "SIInstrInfo.h"
#include "AMDGPU.h"		#include "AMDGPU.h"
		#include "AMDGPUIntrinsicInfo.h"
#include "AMDGPUSubtarget.h"		#include "AMDGPUSubtarget.h"
#include "GCNHazardRecognizer.h"		#include "GCNHazardRecognizer.h"
#include "SIDefines.h"		#include "SIDefines.h"
#include "SIMachineFunctionInfo.h"		#include "SIMachineFunctionInfo.h"
#include "SIRegisterInfo.h"		#include "SIRegisterInfo.h"
#include "MCTargetDesc/AMDGPUMCTargetDesc.h"		#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
#include "Utils/AMDGPUBaseInfo.h"		#include "Utils/AMDGPUBaseInfo.h"
#include "llvm/ADT/APInt.h"		#include "llvm/ADT/APInt.h"
Show All 33 Lines
#include "llvm/Target/TargetMachine.h"		#include "llvm/Target/TargetMachine.h"
#include <cassert>		#include <cassert>
#include <cstdint>		#include <cstdint>
#include <iterator>		#include <iterator>
#include <utility>		#include <utility>

using namespace llvm;		using namespace llvm;

		#define GET_INSTRINFO_CTOR_DTOR
		#include "AMDGPUGenInstrInfo.inc"

		namespace llvm {
		namespace AMDGPU {
		#define GET_D16ImageDimIntrinsics_IMPL
		#define GET_ImageDimIntrinsicTable_IMPL
		#define GET_RsrcIntrinsics_IMPL
		#include "AMDGPUGenSearchableTables.inc"
		}
		}


// Must be at least 4 to be able to branch over minimum unconditional branch		// Must be at least 4 to be able to branch over minimum unconditional branch
// code. This is only for making it possible to write reasonably small tests for		// code. This is only for making it possible to write reasonably small tests for
// long branches.		// long branches.
static cl::opt<unsigned>		static cl::opt<unsigned>
BranchOffsetBits("amdgpu-s-branch-bits", cl::ReallyHidden, cl::init(16),		BranchOffsetBits("amdgpu-s-branch-bits", cl::ReallyHidden, cl::init(16),
cl::desc("Restrict range of branch instructions (DEBUG)"));		cl::desc("Restrict range of branch instructions (DEBUG)"));

SIInstrInfo::SIInstrInfo(const SISubtarget &ST)		SIInstrInfo::SIInstrInfo(const SISubtarget &ST)
: AMDGPUInstrInfo(ST), RI(ST), ST(ST) {}		: AMDGPUGenInstrInfo(AMDGPU::ADJCALLSTACKUP, AMDGPU::ADJCALLSTACKDOWN),
		RI(ST), ST(ST) {}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// TargetInstrInfo callbacks		// TargetInstrInfo callbacks
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

static unsigned getNumOperandsNoGlue(SDNode *Node) {		static unsigned getNumOperandsNoGlue(SDNode *Node) {
unsigned N = Node->getNumOperands();		unsigned N = Node->getNumOperands();
while (N && Node->getOperand(N - 1).getValueType() == MVT::Glue)		while (N && Node->getOperand(N - 1).getValueType() == MVT::Glue)
▲ Show 20 Lines • Show All 350 Lines • ▼ Show 20 Lines	bool SIInstrInfo::shouldClusterMemOps(MachineInstr &FirstLdSt,

const MachineRegisterInfo &MRI =		const MachineRegisterInfo &MRI =
FirstLdSt.getParent()->getParent()->getRegInfo();		FirstLdSt.getParent()->getParent()->getRegInfo();
const TargetRegisterClass *DstRC = MRI.getRegClass(FirstDst->getReg());		const TargetRegisterClass *DstRC = MRI.getRegClass(FirstDst->getReg());

return (NumLoads * (RI.getRegSizeInBits(*DstRC) / 8)) <= LoadClusterThreshold;		return (NumLoads * (RI.getRegSizeInBits(*DstRC) / 8)) <= LoadClusterThreshold;
}		}

		// FIXME: This behaves strangely. If, for example, you have 32 load + stores,
		// the first 16 loads will be interleaved with the stores, and the next 16 will
		// be clustered as expected. It should really split into 2 16 store batches.
		//
		// Loads are clustered until this returns false, rather than trying to schedule
		// groups of stores. This also means we have to deal with saying different
		// address space loads should be clustered, and ones which might cause bank
		// conflicts.
		//
		// This might be deprecated so it might not be worth that much effort to fix.
		bool SIInstrInfo::shouldScheduleLoadsNear(SDNode *Load0, SDNode *Load1,
		int64_t Offset0, int64_t Offset1,
		unsigned NumLoads) const {
		assert(Offset1 > Offset0 &&
		"Second offset should be larger than first offset!");
		// If we have less than 16 loads in a row, and the offsets are within 64
		// bytes, then schedule together.

		// A cacheline is 64 bytes (for global memory).
		return (NumLoads <= 16 && (Offset1 - Offset0) < 64);
		}

static void reportIllegalCopy(const SIInstrInfo *TII, MachineBasicBlock &MBB,		static void reportIllegalCopy(const SIInstrInfo *TII, MachineBasicBlock &MBB,
MachineBasicBlock::iterator MI,		MachineBasicBlock::iterator MI,
const DebugLoc &DL, unsigned DestReg,		const DebugLoc &DL, unsigned DestReg,
unsigned SrcReg, bool KillSrc) {		unsigned SrcReg, bool KillSrc) {
MachineFunction *MF = MBB.getParent();		MachineFunction *MF = MBB.getParent();
DiagnosticInfoUnsupported IllegalCopy(MF->getFunction(),		DiagnosticInfoUnsupported IllegalCopy(MF->getFunction(),
"illegal SGPR to VGPR copy",		"illegal SGPR to VGPR copy",
DL, DS_Error);		DL, DS_Error);
▲ Show 20 Lines • Show All 544 Lines • ▼ Show 20 Lines
}		}

/// \param @Offset Offset in bytes of the FrameIndex being spilled		/// \param @Offset Offset in bytes of the FrameIndex being spilled
unsigned SIInstrInfo::calculateLDSSpillAddress(		unsigned SIInstrInfo::calculateLDSSpillAddress(
MachineBasicBlock &MBB, MachineInstr &MI, RegScavenger *RS, unsigned TmpReg,		MachineBasicBlock &MBB, MachineInstr &MI, RegScavenger *RS, unsigned TmpReg,
unsigned FrameOffset, unsigned Size) const {		unsigned FrameOffset, unsigned Size) const {
MachineFunction *MF = MBB.getParent();		MachineFunction *MF = MBB.getParent();
SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>();		SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>();
const SISubtarget &ST = MF->getSubtarget<SISubtarget>();		const AMDGPUSubtarget &ST = MF->getSubtarget<AMDGPUSubtarget>();
DebugLoc DL = MBB.findDebugLoc(MI);		DebugLoc DL = MBB.findDebugLoc(MI);
unsigned WorkGroupSize = MFI->getMaxFlatWorkGroupSize();		unsigned WorkGroupSize = MFI->getMaxFlatWorkGroupSize();
unsigned WavefrontSize = ST.getWavefrontSize();		unsigned WavefrontSize = ST.getWavefrontSize();

unsigned TIDReg = MFI->getTIDReg();		unsigned TIDReg = MFI->getTIDReg();
if (!MFI->hasCalculatedTID()) {		if (!MFI->hasCalculatedTID()) {
MachineBasicBlock &Entry = MBB.getParent()->front();		MachineBasicBlock &Entry = MBB.getParent()->front();
MachineBasicBlock::iterator Insert = Entry.front();		MachineBasicBlock::iterator Insert = Entry.front();
▲ Show 20 Lines • Show All 119 Lines • ▼ Show 20 Lines	case AMDGPU::S_NOP:
return MI.getOperand(0).getImm() + 1;		return MI.getOperand(0).getImm() + 1;
}		}
}		}

bool SIInstrInfo::expandPostRAPseudo(MachineInstr &MI) const {		bool SIInstrInfo::expandPostRAPseudo(MachineInstr &MI) const {
MachineBasicBlock &MBB = *MI.getParent();		MachineBasicBlock &MBB = *MI.getParent();
DebugLoc DL = MBB.findDebugLoc(MI);		DebugLoc DL = MBB.findDebugLoc(MI);
switch (MI.getOpcode()) {		switch (MI.getOpcode()) {
default: return AMDGPUInstrInfo::expandPostRAPseudo(MI);		default: return TargetInstrInfo::expandPostRAPseudo(MI);
case AMDGPU::S_MOV_B64_term:		case AMDGPU::S_MOV_B64_term:
// This is only a terminator to get the correct spill code placement during		// This is only a terminator to get the correct spill code placement during
// register allocation.		// register allocation.
MI.setDesc(get(AMDGPU::S_MOV_B64));		MI.setDesc(get(AMDGPU::S_MOV_B64));
break;		break;

case AMDGPU::S_XOR_B64_term:		case AMDGPU::S_XOR_B64_term:
// This is only a terminator to get the correct spill code placement during		// This is only a terminator to get the correct spill code placement during
▲ Show 20 Lines • Show All 749 Lines • ▼ Show 20 Lines	bool SIInstrInfo::isFoldableCopy(const MachineInstr &MI) const {
}		}
}		}

unsigned SIInstrInfo::getAddressSpaceForPseudoSourceKind(		unsigned SIInstrInfo::getAddressSpaceForPseudoSourceKind(
PseudoSourceValue::PSVKind Kind) const {		PseudoSourceValue::PSVKind Kind) const {
switch(Kind) {		switch(Kind) {
case PseudoSourceValue::Stack:		case PseudoSourceValue::Stack:
case PseudoSourceValue::FixedStack:		case PseudoSourceValue::FixedStack:
return AMDGPUASI.PRIVATE_ADDRESS;		return ST.getAMDGPUAS().PRIVATE_ADDRESS;
case PseudoSourceValue::ConstantPool:		case PseudoSourceValue::ConstantPool:
case PseudoSourceValue::GOT:		case PseudoSourceValue::GOT:
case PseudoSourceValue::JumpTable:		case PseudoSourceValue::JumpTable:
case PseudoSourceValue::GlobalValueCallEntry:		case PseudoSourceValue::GlobalValueCallEntry:
case PseudoSourceValue::ExternalSymbolCallEntry:		case PseudoSourceValue::ExternalSymbolCallEntry:
case PseudoSourceValue::TargetCustom:		case PseudoSourceValue::TargetCustom:
return AMDGPUASI.CONSTANT_ADDRESS;		return ST.getAMDGPUAS().CONSTANT_ADDRESS;
}		}
return AMDGPUASI.FLAT_ADDRESS;		return ST.getAMDGPUAS().FLAT_ADDRESS;
}		}

static void removeModOperands(MachineInstr &MI) {		static void removeModOperands(MachineInstr &MI) {
unsigned Opc = MI.getOpcode();		unsigned Opc = MI.getOpcode();
int Src0ModIdx = AMDGPU::getNamedOperandIdx(Opc,		int Src0ModIdx = AMDGPU::getNamedOperandIdx(Opc,
AMDGPU::OpName::src0_modifiers);		AMDGPU::OpName::src0_modifiers);
int Src1ModIdx = AMDGPU::getNamedOperandIdx(Opc,		int Src1ModIdx = AMDGPU::getNamedOperandIdx(Opc,
AMDGPU::OpName::src1_modifiers);		AMDGPU::OpName::src1_modifiers);
▲ Show 20 Lines • Show All 2,723 Lines • ▼ Show 20 Lines

unsigned SIInstrInfo::isStackAccess(const MachineInstr &MI,		unsigned SIInstrInfo::isStackAccess(const MachineInstr &MI,
int &FrameIndex) const {		int &FrameIndex) const {
const MachineOperand *Addr = getNamedOperand(MI, AMDGPU::OpName::vaddr);		const MachineOperand *Addr = getNamedOperand(MI, AMDGPU::OpName::vaddr);
if (!Addr || !Addr->isFI())		if (!Addr || !Addr->isFI())
return AMDGPU::NoRegister;		return AMDGPU::NoRegister;

assert(!MI.memoperands_empty() &&		assert(!MI.memoperands_empty() &&
(*MI.memoperands_begin())->getAddrSpace() == AMDGPUASI.PRIVATE_ADDRESS);		(*MI.memoperands_begin())->getAddrSpace() == ST.getAMDGPUAS().PRIVATE_ADDRESS);

FrameIndex = Addr->getIndex();		FrameIndex = Addr->getIndex();
return getNamedOperand(MI, AMDGPU::OpName::vdata)->getReg();		return getNamedOperand(MI, AMDGPU::OpName::vdata)->getReg();
}		}

unsigned SIInstrInfo::isSGPRStackAccess(const MachineInstr &MI,		unsigned SIInstrInfo::isSGPRStackAccess(const MachineInstr &MI,
int &FrameIndex) const {		int &FrameIndex) const {
const MachineOperand *Addr = getNamedOperand(MI, AMDGPU::OpName::addr);		const MachineOperand *Addr = getNamedOperand(MI, AMDGPU::OpName::addr);
▲ Show 20 Lines • Show All 102 Lines • ▼ Show 20 Lines
bool SIInstrInfo::mayAccessFlatAddressSpace(const MachineInstr &MI) const {		bool SIInstrInfo::mayAccessFlatAddressSpace(const MachineInstr &MI) const {
if (!isFLAT(MI))		if (!isFLAT(MI))
return false;		return false;

if (MI.memoperands_empty())		if (MI.memoperands_empty())
return true;		return true;

for (const MachineMemOperand *MMO : MI.memoperands()) {		for (const MachineMemOperand *MMO : MI.memoperands()) {
if (MMO->getAddrSpace() == AMDGPUASI.FLAT_ADDRESS)		if (MMO->getAddrSpace() == ST.getAMDGPUAS().FLAT_ADDRESS)
return true;		return true;
}		}
return false;		return false;
}		}

bool SIInstrInfo::isNonUniformBranchInstr(MachineInstr &Branch) const {		bool SIInstrInfo::isNonUniformBranchInstr(MachineInstr &Branch) const {
return Branch.getOpcode() == AMDGPU::SI_NON_UNIFORM_BRCOND_PSEUDO;		return Branch.getOpcode() == AMDGPU::SI_NON_UNIFORM_BRCOND_PSEUDO;
}		}
▲ Show 20 Lines • Show All 163 Lines • ▼ Show 20 Lines	bool SIInstrInfo::isBufferSMRD(const MachineInstr &MI) const {
// Check that it is using a buffer resource.		// Check that it is using a buffer resource.
int Idx = AMDGPU::getNamedOperandIdx(MI.getOpcode(), AMDGPU::OpName::sbase);		int Idx = AMDGPU::getNamedOperandIdx(MI.getOpcode(), AMDGPU::OpName::sbase);
if (Idx == -1) // e.g. s_memtime		if (Idx == -1) // e.g. s_memtime
return false;		return false;

const auto RCID = MI.getDesc().OpInfo[Idx].RegClass;		const auto RCID = MI.getDesc().OpInfo[Idx].RegClass;
return RCID == AMDGPU::SReg_128RegClassID;		return RCID == AMDGPU::SReg_128RegClassID;
}		}

		// This must be kept in sync with the SIEncodingFamily class in SIInstrInfo.td
		enum SIEncodingFamily {
		SI = 0,
		VI = 1,
		SDWA = 2,
		SDWA9 = 3,
		GFX80 = 4,
		GFX9 = 5
		};

		static SIEncodingFamily subtargetEncodingFamily(const SISubtarget &ST) {
		switch (ST.getGeneration()) {
		case SISubtarget::SOUTHERN_ISLANDS:
		case SISubtarget::SEA_ISLANDS:
		return SIEncodingFamily::SI;
		case SISubtarget::VOLCANIC_ISLANDS:
		case SISubtarget::GFX9:
		return SIEncodingFamily::VI;
		}
		llvm_unreachable("Unknown subtarget generation!");
		}

		int SIInstrInfo::pseudoToMCOpcode(int Opcode) const {
		SIEncodingFamily Gen = subtargetEncodingFamily(ST);

		if ((get(Opcode).TSFlags & SIInstrFlags::renamedInGFX9) != 0 &&
		ST.getGeneration() >= SISubtarget::GFX9)
		Gen = SIEncodingFamily::GFX9;

		if (get(Opcode).TSFlags & SIInstrFlags::SDWA)
		Gen = ST.getGeneration() == SISubtarget::GFX9 ? SIEncodingFamily::SDWA9
		: SIEncodingFamily::SDWA;
		// Adjust the encoding family to GFX80 for D16 buffer instructions when the
		// subtarget has UnpackedD16VMem feature.
		// TODO: remove this when we discard GFX80 encoding.
		if (ST.hasUnpackedD16VMem() && (get(Opcode).TSFlags & SIInstrFlags::D16Buf))
		Gen = SIEncodingFamily::GFX80;

		int MCOp = AMDGPU::getMCOpcode(Opcode, Gen);

		// -1 means that Opcode is already a native instruction.
		if (MCOp == -1)
		return Opcode;

		// (uint16_t)-1 means that Opcode is a pseudo instruction that has
		// no encoding in the given subtarget generation.
		if (MCOp == (uint16_t)-1)
		return -1;

		return MCOp;
		}
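
The two sentinel checks at the end of pseudoToMCOpcode() are easy to misread, because both are written as "-1". A minimal, self-contained sketch of the distinction follows. lookupMCOpcode() and the opcode numbers are hypothetical stand-ins for the TableGen-generated AMDGPU::getMCOpcode(), not the real table.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the generated lookup. It returns -1 when the
// opcode is already a native instruction, and (uint16_t)-1 (65535 as an int)
// when the opcode is a pseudo with no encoding in the requested family.
int lookupMCOpcode(int Opcode, int Family) {
  if (Opcode == 100)                  // pretend 100 is already native
    return -1;
  if (Opcode == 200 && Family == 0)   // pretend 200 has no encoding in family 0
    return static_cast<uint16_t>(-1);
  return Opcode + 1000 + Family;      // arbitrary "real" opcode for the sketch
}

// Mirrors the tail of pseudoToMCOpcode() shown above.
int pseudoToMC(int Opcode, int Family) {
  int MCOp = lookupMCOpcode(Opcode, Family);

  // int -1: nothing to translate, hand the opcode back unchanged.
  if (MCOp == -1)
    return Opcode;

  // The uint16_t value is promoted to int 65535, so it never equals int -1.
  if (MCOp == static_cast<uint16_t>(-1))
    return -1;

  return MCOp;
}
```

Because MCOp is an int, the two comparisons cannot alias: the first matches only the value -1, the second only the value 65535.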

lib/Target/AMDGPU/SIInstrInfo.td

(11 lines not shown)	def isCIOnly : Predicate<"Subtarget->getGeneration() =="
"SISubtarget::SEA_ISLANDS">,		"SISubtarget::SEA_ISLANDS">,
AssemblerPredicate <"FeatureSeaIslands">;		AssemblerPredicate <"FeatureSeaIslands">;
def isVIOnly : Predicate<"Subtarget->getGeneration() =="		def isVIOnly : Predicate<"Subtarget->getGeneration() =="
"SISubtarget::VOLCANIC_ISLANDS">,		"SISubtarget::VOLCANIC_ISLANDS">,
AssemblerPredicate <"FeatureVolcanicIslands">;		AssemblerPredicate <"FeatureVolcanicIslands">;

def DisableInst : Predicate <"false">, AssemblerPredicate<"FeatureDisable">;		def DisableInst : Predicate <"false">, AssemblerPredicate<"FeatureDisable">;

		class GCNPredicateControl : PredicateControl {
		Predicate SIAssemblerPredicate = isSICI;
		Predicate VIAssemblerPredicate = isVI;
		}

// Except for the NONE field, this must be kept in sync with the		// Except for the NONE field, this must be kept in sync with the
// SIEncodingFamily enum in AMDGPUInstrInfo.cpp		// SIEncodingFamily enum in AMDGPUInstrInfo.cpp
def SIEncodingFamily {		def SIEncodingFamily {
int NONE = -1;		int NONE = -1;
int SI = 0;		int SI = 0;
int VI = 1;		int VI = 1;
int SDWA = 2;		int SDWA = 2;
int SDWA9 = 3;		int SDWA9 = 3;
(1,977 lines not shown)

lib/Target/AMDGPU/SIInstructions.td

	//===-- SIInstructions.td - SI Instruction Defintions ---------------------===//			//===-- SIInstructions.td - SI Instruction Defintions ---------------------===//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// This file was originally auto-generated from a GPU register header file and			// This file was originally auto-generated from a GPU register header file and
	// all the instruction definitions were originally commented out. Instructions			// all the instruction definitions were originally commented out. Instructions
	// that are not yet supported remain commented out.			// that are not yet supported remain commented out.
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	class GCNPat<dag pattern, dag result> : AMDGPUPat<pattern, result> {			class GCNPat<dag pattern, dag result> : Pat<pattern, result>, GCNPredicateControl {
	let SubtargetPredicate = isGCN;			let SubtargetPredicate = isGCN;
	}			}


	include "VOPInstructions.td"			include "VOPInstructions.td"
	include "SOPInstructions.td"			include "SOPInstructions.td"
	include "SMInstructions.td"			include "SMInstructions.td"
	include "FLATInstructions.td"			include "FLATInstructions.td"
	include "BUFInstructions.td"			include "BUFInstructions.td"

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// EXP Instructions			// EXP Instructions
	(1,593 lines not shown)

lib/Target/AMDGPU/SIRegisterInfo.h

	(15 lines not shown)
	#define LLVM_LIB_TARGET_AMDGPU_SIREGISTERINFO_H			#define LLVM_LIB_TARGET_AMDGPU_SIREGISTERINFO_H

	#include "AMDGPURegisterInfo.h"			#include "AMDGPURegisterInfo.h"
	#include "SIDefines.h"			#include "SIDefines.h"
	#include "llvm/CodeGen/MachineRegisterInfo.h"			#include "llvm/CodeGen/MachineRegisterInfo.h"

	namespace llvm {			namespace llvm {

				class AMDGPUSubtarget;
	class LiveIntervals;			class LiveIntervals;
	class MachineRegisterInfo;			class MachineRegisterInfo;
	class SISubtarget;			class SISubtarget;
	class SIMachineFunctionInfo;			class SIMachineFunctionInfo;

	class SIRegisterInfo final : public AMDGPURegisterInfo {			class SIRegisterInfo final : public AMDGPURegisterInfo {
	private:			private:
	unsigned SGPRSetID;			unsigned SGPRSetID;
	(215 lines not shown)

lib/Target/AMDGPU/SIRegisterInfo.cpp

(1,226 lines not shown)	static const TargetRegisterClass *const BaseClasses[] = {
&AMDGPU::VReg_96RegClass,		&AMDGPU::VReg_96RegClass,
&AMDGPU::VReg_128RegClass,		&AMDGPU::VReg_128RegClass,
&AMDGPU::SReg_128RegClass,		&AMDGPU::SReg_128RegClass,
&AMDGPU::VReg_256RegClass,		&AMDGPU::VReg_256RegClass,
&AMDGPU::SReg_256RegClass,		&AMDGPU::SReg_256RegClass,
&AMDGPU::VReg_512RegClass,		&AMDGPU::VReg_512RegClass,
&AMDGPU::SReg_512RegClass,		&AMDGPU::SReg_512RegClass,
&AMDGPU::SCC_CLASSRegClass,		&AMDGPU::SCC_CLASSRegClass,
&AMDGPU::R600_Reg32RegClass,
&AMDGPU::R600_PredicateRegClass,
&AMDGPU::Pseudo_SReg_32RegClass,		&AMDGPU::Pseudo_SReg_32RegClass,
&AMDGPU::Pseudo_SReg_128RegClass,		&AMDGPU::Pseudo_SReg_128RegClass,
};		};

for (const TargetRegisterClass *BaseClass : BaseClasses) {		for (const TargetRegisterClass *BaseClass : BaseClasses) {
if (BaseClass->contains(Reg)) {		if (BaseClass->contains(Reg)) {
return BaseClass;		return BaseClass;
}		}
(353 lines not shown)

lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h

(94 lines not shown)	unsigned getMaxWavesPerCU(const FeatureBitset &Features,
unsigned FlatWorkGroupSize);		unsigned FlatWorkGroupSize);

/// \returns Minimum number of waves per execution unit for given subtarget \p		/// \returns Minimum number of waves per execution unit for given subtarget \p
/// Features.		/// Features.
unsigned getMinWavesPerEU(const FeatureBitset &Features);		unsigned getMinWavesPerEU(const FeatureBitset &Features);

/// \returns Maximum number of waves per execution unit for given subtarget \p		/// \returns Maximum number of waves per execution unit for given subtarget \p
/// Features without any kind of limitation.		/// Features without any kind of limitation.
unsigned getMaxWavesPerEU(const FeatureBitset &Features);		unsigned getMaxWavesPerEU();

/// \returns Maximum number of waves per execution unit for given subtarget \p		/// \returns Maximum number of waves per execution unit for given subtarget \p
/// Features and limited by given \p FlatWorkGroupSize.		/// Features and limited by given \p FlatWorkGroupSize.
unsigned getMaxWavesPerEU(const FeatureBitset &Features,		unsigned getMaxWavesPerEU(const FeatureBitset &Features,
unsigned FlatWorkGroupSize);		unsigned FlatWorkGroupSize);

/// \returns Minimum flat work group size for given subtarget \p Features.		/// \returns Minimum flat work group size for given subtarget \p Features.
unsigned getMinFlatWorkGroupSize(const FeatureBitset &Features);		unsigned getMinFlatWorkGroupSize(const FeatureBitset &Features);
(336 lines not shown)

lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp

(175 lines not shown)	if (Features.test(FeatureISAVersion9_0_2))
return {9, 0, 2};		return {9, 0, 2};
if (Features.test(FeatureISAVersion9_0_4))		if (Features.test(FeatureISAVersion9_0_4))
return {9, 0, 4};		return {9, 0, 4};
if (Features.test(FeatureISAVersion9_0_6))		if (Features.test(FeatureISAVersion9_0_6))
return {9, 0, 6};		return {9, 0, 6};
if (Features.test(FeatureGFX9))		if (Features.test(FeatureGFX9))
return {9, 0, 0};		return {9, 0, 0};

if (!Features.test(FeatureGCN) || Features.test(FeatureSouthernIslands))		if (Features.test(FeatureSouthernIslands))
return {0, 0, 0};		return {0, 0, 0};
return {7, 0, 0};		return {7, 0, 0};
}		}

void streamIsaVersion(const MCSubtargetInfo *STI, raw_ostream &Stream) {		void streamIsaVersion(const MCSubtargetInfo *STI, raw_ostream &Stream) {
auto TargetTriple = STI->getTargetTriple();		auto TargetTriple = STI->getTargetTriple();
auto ISAVersion = IsaInfo::getIsaVersion(STI->getFeatureBits());		auto ISAVersion = IsaInfo::getIsaVersion(STI->getFeatureBits());

(45 lines not shown)	unsigned getMaxWorkGroupsPerCU(const FeatureBitset &Features,
unsigned N = getWavesPerWorkGroup(Features, FlatWorkGroupSize);		unsigned N = getWavesPerWorkGroup(Features, FlatWorkGroupSize);
if (N == 1)		if (N == 1)
return 40;		return 40;
N = 40 / N;		N = 40 / N;
return std::min(N, 16u);		return std::min(N, 16u);
}		}

unsigned getMaxWavesPerCU(const FeatureBitset &Features) {		unsigned getMaxWavesPerCU(const FeatureBitset &Features) {
return getMaxWavesPerEU(Features) * getEUsPerCU(Features);		return getMaxWavesPerEU() * getEUsPerCU(Features);
}		}

unsigned getMaxWavesPerCU(const FeatureBitset &Features,		unsigned getMaxWavesPerCU(const FeatureBitset &Features,
unsigned FlatWorkGroupSize) {		unsigned FlatWorkGroupSize) {
return getWavesPerWorkGroup(Features, FlatWorkGroupSize);		return getWavesPerWorkGroup(Features, FlatWorkGroupSize);
}		}

unsigned getMinWavesPerEU(const FeatureBitset &Features) {		unsigned getMinWavesPerEU(const FeatureBitset &Features) {
return 1;		return 1;
}		}

unsigned getMaxWavesPerEU(const FeatureBitset &Features) {		unsigned getMaxWavesPerEU() {
if (!Features.test(FeatureGCN))
return 8;
// FIXME: Need to take scratch memory into account.		// FIXME: Need to take scratch memory into account.
return 10;		return 10;
}		}

unsigned getMaxWavesPerEU(const FeatureBitset &Features,		unsigned getMaxWavesPerEU(const FeatureBitset &Features,
unsigned FlatWorkGroupSize) {		unsigned FlatWorkGroupSize) {
return alignTo(getMaxWavesPerCU(Features, FlatWorkGroupSize),		return alignTo(getMaxWavesPerCU(Features, FlatWorkGroupSize),
getEUsPerCU(Features)) / getEUsPerCU(Features);		getEUsPerCU(Features)) / getEUsPerCU(Features);
(39 lines not shown)	unsigned getAddressableNumSGPRs(const FeatureBitset &Features) {
if (Version.Major >= 8)		if (Version.Major >= 8)
return 102;		return 102;
return 104;		return 104;
}		}

unsigned getMinNumSGPRs(const FeatureBitset &Features, unsigned WavesPerEU) {		unsigned getMinNumSGPRs(const FeatureBitset &Features, unsigned WavesPerEU) {
assert(WavesPerEU != 0);		assert(WavesPerEU != 0);

if (WavesPerEU >= getMaxWavesPerEU(Features))		if (WavesPerEU >= getMaxWavesPerEU())
return 0;		return 0;

unsigned MinNumSGPRs = getTotalNumSGPRs(Features) / (WavesPerEU + 1);		unsigned MinNumSGPRs = getTotalNumSGPRs(Features) / (WavesPerEU + 1);
if (Features.test(FeatureTrapHandler))		if (Features.test(FeatureTrapHandler))
MinNumSGPRs -= std::min(MinNumSGPRs, (unsigned)TRAP_NUM_SGPRS);		MinNumSGPRs -= std::min(MinNumSGPRs, (unsigned)TRAP_NUM_SGPRS);
MinNumSGPRs = alignDown(MinNumSGPRs, getSGPRAllocGranule(Features)) + 1;		MinNumSGPRs = alignDown(MinNumSGPRs, getSGPRAllocGranule(Features)) + 1;
return std::min(MinNumSGPRs, getAddressableNumSGPRs(Features));		return std::min(MinNumSGPRs, getAddressableNumSGPRs(Features));
}		}
(60 lines not shown)

unsigned getAddressableNumVGPRs(const FeatureBitset &Features) {		unsigned getAddressableNumVGPRs(const FeatureBitset &Features) {
return getTotalNumVGPRs(Features);		return getTotalNumVGPRs(Features);
}		}

unsigned getMinNumVGPRs(const FeatureBitset &Features, unsigned WavesPerEU) {		unsigned getMinNumVGPRs(const FeatureBitset &Features, unsigned WavesPerEU) {
assert(WavesPerEU != 0);		assert(WavesPerEU != 0);

if (WavesPerEU >= getMaxWavesPerEU(Features))		if (WavesPerEU >= getMaxWavesPerEU())
return 0;		return 0;
unsigned MinNumVGPRs =		unsigned MinNumVGPRs =
alignDown(getTotalNumVGPRs(Features) / (WavesPerEU + 1),		alignDown(getTotalNumVGPRs(Features) / (WavesPerEU + 1),
getVGPRAllocGranule(Features)) + 1;		getVGPRAllocGranule(Features)) + 1;
return std::min(MinNumVGPRs, getAddressableNumVGPRs(Features));		return std::min(MinNumVGPRs, getAddressableNumVGPRs(Features));
}		}
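
The arithmetic in getMinNumVGPRs() can be checked by hand. The sketch below reproduces the formula with assumed constants: 256 total VGPRs and an allocation granule of 4 are illustrative values only, since the real ones come from the subtarget's feature bits. alignDownTo() is a local helper that rounds down to a multiple, which is what the alignDown() call above appears to do.

```cpp
#include <algorithm>
#include <cassert>

// Round Value down to the nearest multiple of Align.
constexpr unsigned alignDownTo(unsigned Value, unsigned Align) {
  return Value / Align * Align;
}

// Assumed values for illustration; not taken from any real subtarget query.
constexpr unsigned TotalVGPRs = 256;
constexpr unsigned Granule = 4;
// Matches the constant returned by getMaxWavesPerEU() on the right-hand side.
constexpr unsigned MaxWavesPerEU = 10;

// Same shape as getMinNumVGPRs(): the smallest VGPR count at which no more
// than WavesPerEU waves fit, or 0 if the wave limit is already the maximum.
unsigned minNumVGPRs(unsigned WavesPerEU) {
  assert(WavesPerEU != 0);
  if (WavesPerEU >= MaxWavesPerEU)
    return 0;
  unsigned Min = alignDownTo(TotalVGPRs / (WavesPerEU + 1), Granule) + 1;
  return std::min(Min, TotalVGPRs);
}
```

With these assumed constants, a request for 4 waves gives 256 / 5 = 51, rounded down to 48, plus 1 = 49 registers. One reading of the "+ 1" is that 48 registers would still leave room for a fifth wave, so 49 is the first count that does not.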

unsigned getMaxNumVGPRs(const FeatureBitset &Features, unsigned WavesPerEU) {		unsigned getMaxNumVGPRs(const FeatureBitset &Features, unsigned WavesPerEU) {
(327 lines not shown)

#define CASE_CI_VI(node) \		#define CASE_CI_VI(node) \
assert(!isSI(STI)); \		assert(!isSI(STI)); \
case node: return isCI(STI) ? node##_ci : node##_vi;		case node: return isCI(STI) ? node##_ci : node##_vi;

#define CASE_VI_GFX9(node) \		#define CASE_VI_GFX9(node) \
case node: return isGFX9(STI) ? node##_gfx9 : node##_vi;		case node: return isGFX9(STI) ? node##_gfx9 : node##_vi;

unsigned getMCReg(unsigned Reg, const MCSubtargetInfo &STI) {		unsigned getMCReg(unsigned Reg, const MCSubtargetInfo &STI) {
		if (STI.getTargetTriple().getArch() == Triple::r600)
		return Reg;
MAP_REG2REG		MAP_REG2REG
}		}
		arsenm: I would expect this to be a separate function, but not sure where this would go
		tstellar: We can refactor AMDGPUMCInstLower.cpp so that this can be in its own function. I can work on this as one of the follow-on cleanups.
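
One possible shape for the follow-up discussed in the two comments above, as a rough sketch only. Every name here (Arch, getGCNMCReg, getR600MCReg, lowerReg) is hypothetical and chosen for illustration. The patch itself keeps the triple check inside getMCReg().

```cpp
#include <cassert>

// Stand-in for llvm::Triple::ArchType, limited to the two values discussed.
enum class Arch { r600, amdgcn };

// Hypothetical GCN-only mapping. In the patch this role is played by the
// MAP_REG2REG switch; the "+ 1000" is an arbitrary placeholder.
unsigned getGCNMCReg(unsigned Reg) { return Reg + 1000; }

// Hypothetical R600-only mapping: the patch returns Reg unchanged for r600.
unsigned getR600MCReg(unsigned Reg) { return Reg; }

// The shared caller chooses a helper once, so neither helper needs to
// inspect the target triple itself.
unsigned lowerReg(Arch A, unsigned Reg) {
  return A == Arch::r600 ? getR600MCReg(Reg) : getGCNMCReg(Reg);
}
```

Splitting along these lines would move the architecture check out of the GCN register mapping and into the caller, which is roughly what "a separate function" suggests.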

#undef CASE_CI_VI		#undef CASE_CI_VI
#undef CASE_VI_GFX9		#undef CASE_VI_GFX9

#define CASE_CI_VI(node) case node##_ci: case node##_vi: return node;		#define CASE_CI_VI(node) case node##_ci: case node##_vi: return node;
#define CASE_VI_GFX9(node) case node##_vi: case node##_gfx9: return node;		#define CASE_VI_GFX9(node) case node##_vi: case node##_gfx9: return node;

unsigned mc2PseudoReg(unsigned Reg) {		unsigned mc2PseudoReg(unsigned Reg) {
(229 lines not shown)
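
The CASE_VI_GFX9 macro is defined twice above, once for each direction of the mapping, and relies on `##` token pasting to build the per-generation register names. A small self-contained sketch of that pattern follows. The FOO register IDs are made up, and IsGFX9 replaces the isGFX9(STI) query.

```cpp
#include <cassert>

// Made-up register IDs: one pseudo register plus its two encodings.
enum RegId : unsigned { FOO = 1, FOO_vi = 2, FOO_gfx9 = 3, BAR = 4 };

// Forward direction: pseudo register -> generation-specific register.
#define CASE_VI_GFX9(node) \
  case node: return IsGFX9 ? node##_gfx9 : node##_vi;

unsigned toMCReg(unsigned Reg, bool IsGFX9) {
  switch (Reg) {
  CASE_VI_GFX9(FOO)
  default:
    return Reg; // registers without per-generation variants pass through
  }
}

#undef CASE_VI_GFX9

// Reverse direction: either generation-specific register -> pseudo register.
#define CASE_VI_GFX9(node) case node##_vi: case node##_gfx9: return node;

unsigned toPseudoReg(unsigned Reg) {
  switch (Reg) {
  CASE_VI_GFX9(FOO)
  default:
    return Reg;
  }
}

#undef CASE_VI_GFX9
```

Redefining the same macro name lets a single list of registers expand into both switch statements, so the two mappings are less likely to drift apart.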

Revision Contents

Diff 153252

lib/Target/AMDGPU/AMDGPU.td

lib/Target/AMDGPU/AMDGPUCallingConv.td

lib/Target/AMDGPU/AMDGPUFeatures.td

lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp

lib/Target/AMDGPU/AMDGPUISelLowering.h

lib/Target/AMDGPU/AMDGPUISelLowering.cpp

lib/Target/AMDGPU/AMDGPUInstrInfo.h

lib/Target/AMDGPU/AMDGPUInstrInfo.cpp

lib/Target/AMDGPU/AMDGPUInstructions.td

lib/Target/AMDGPU/AMDGPUIntrinsics.td

lib/Target/AMDGPU/AMDGPULowerIntrinsics.cpp

lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp

lib/Target/AMDGPU/AMDGPURegisterInfo.td

lib/Target/AMDGPU/AMDGPUSubtarget.h

lib/Target/AMDGPU/AMDGPUSubtarget.cpp

lib/Target/AMDGPU/AMDGPUTargetMachine.h

lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h

lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp

lib/Target/AMDGPU/AMDILCFGStructurizer.cpp

lib/Target/AMDGPU/CMakeLists.txt

lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp

lib/Target/AMDGPU/EvergreenInstructions.td

lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.h

lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.cpp

lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.h

lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp

lib/Target/AMDGPU/MCTargetDesc/CMakeLists.txt

lib/Target/AMDGPU/MCTargetDesc/R600MCCodeEmitter.cpp

lib/Target/AMDGPU/MCTargetDesc/R600MCTargetDesc.cpp

lib/Target/AMDGPU/MCTargetDesc/SIMCCodeEmitter.cpp

lib/Target/AMDGPU/R600.td

lib/Target/AMDGPU/R600AsmPrinter.cpp

lib/Target/AMDGPU/R600ClauseMergePass.cpp

lib/Target/AMDGPU/R600ControlFlowFinalizer.cpp

lib/Target/AMDGPU/R600EmitClauseMarkers.cpp

lib/Target/AMDGPU/R600ExpandSpecialInstrs.cpp

lib/Target/AMDGPU/R600ISelLowering.h

lib/Target/AMDGPU/R600ISelLowering.cpp

lib/Target/AMDGPU/R600InstrFormats.td

lib/Target/AMDGPU/R600InstrInfo.h

lib/Target/AMDGPU/R600InstrInfo.cpp

lib/Target/AMDGPU/R600Instructions.td

lib/Target/AMDGPU/R600MachineScheduler.cpp

lib/Target/AMDGPU/R600OptimizeVectorRegisters.cpp

lib/Target/AMDGPU/R600Packetizer.cpp

lib/Target/AMDGPU/R600Processors.td

lib/Target/AMDGPU/R600RegisterInfo.h

lib/Target/AMDGPU/R600RegisterInfo.cpp

lib/Target/AMDGPU/R600RegisterInfo.td

lib/Target/AMDGPU/R700Instructions.td

lib/Target/AMDGPU/SIFoldOperands.cpp

lib/Target/AMDGPU/SIISelLowering.h

lib/Target/AMDGPU/SIISelLowering.cpp

lib/Target/AMDGPU/SIInsertWaitcnts.cpp

lib/Target/AMDGPU/SIInstrFormats.td

lib/Target/AMDGPU/SIInstrInfo.h

lib/Target/AMDGPU/SIInstrInfo.cpp

lib/Target/AMDGPU/SIInstrInfo.td

lib/Target/AMDGPU/SIInstructions.td

lib/Target/AMDGPU/SIRegisterInfo.h

lib/Target/AMDGPU/SIRegisterInfo.cpp

lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h

lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
