This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Separate R600 and GCN TableGen files
ClosedPublic

Authored by tstellar on May 2 2018, 1:17 PM.


Details

Reviewers

arsenm
nhaehnle
jvesely

Commits

rGc5a154db48c3: AMDGPU: Separate R600 and GCN TableGen files
rL335942: AMDGPU: Separate R600 and GCN TableGen files

Summary

We now have two sets of generated TableGen files, one for R600 and one
for GCN, so each sub-target now has its own tables of instructions,
registers, ISel patterns, etc. This should help reduce compile time
since each sub-target now only has to consider information that
is specific to itself. This will also help prevent the R600
sub-target from slowing down new features for GCN, like disassembler
support, GlobalISel, etc.

Diff Detail

Repository

rL LLVM

Build Status

Buildable 17626
Build 17626: arc lint + arc unit

Event Timeline

tstellar created this revision. May 2 2018, 1:17 PM

Herald added subscribers: javed.absar, t-tye, tpr and 6 others. May 2 2018, 1:17 PM

Harbormaster completed remote builds in B17626: Diff 144924. May 2 2018, 1:17 PM

kzhuravl added inline comments. May 2 2018, 1:23 PM

lib/Target/AMDGPU/Processors.td
14	Aren't GCN processors defined in GCNProcessors.td? I do not see it being removed or modified in this change.

tstellar added inline comments. May 2 2018, 1:26 PM

lib/Target/AMDGPU/Processors.td
14	This file isn't used at all, I think it was a rebase artifact. I can remove it.

arsenm added inline comments. May 3 2018, 4:28 AM

lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
141	Should this be a separate class as well?
lib/Target/AMDGPU/AMDGPUSubtarget.h
66	Is it possible to avoid making these virtual?

tstellar added inline comments. May 3 2018, 9:19 AM

lib/Target/AMDGPU/AMDGPUSubtarget.h
66	I will look through this again and see if I can eliminate some of these virtual functions, but to get rid of all of them we have a few options:
1. Eliminate the AMDGPUCommonSubtarget super class, and in code shared between r600 and amdgcn (which is mostly IR passes and a few remaining classes like AMDGPUTargetLowering, AMDGPUAsmPrinter, etc.) do something like:
	bool IsAmdHsaOS;
	if (Triple.getArch() == Triple::amdgcn)
	  IsAmdHsaOS = static_cast<const SISubtarget &>(Subtarget).isAmdHsaOS();
	else
	  IsAmdHsaOS = static_cast<const R600Subtarget &>(Subtarget).isAmdHsaOS();
2. Remove subtarget checks from shared classes by refactoring code into r600/gcn specific classes.

Removed unused Processors.td file and made all AMDGPUCommonSubtarget
functions non-virtual.
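The shape of this change (shared state exposed through non-virtual getters, reached from shared code through a helper like the `AMDGPUCommonSubtarget::get(*MF)` calls visible in the AMDGPUAsmPrinter diff) can be sketched roughly as follows. This is a simplified, self-contained model, assuming the helper dispatches on the target architecture: `TargetSubtargetInfo`, `CommonSubtarget`, `getCommon`, `Arch` and the two `*GenSubtargetInfo` structs are hypothetical stand-ins, not LLVM's declarations, and only `isAmdHsaOS()` is modeled.

```cpp
#include <cassert>

// Stand-in for the generic subtarget type that shared code receives.
struct TargetSubtargetInfo {
  virtual ~TargetSubtargetInfo() = default;
};

// Stand-ins for the two TableGen-generated bases that exist after the split.
struct R600GenSubtargetInfo : TargetSubtargetInfo {};
struct GCNGenSubtargetInfo : TargetSubtargetInfo {};

// State shared by both sub-targets. The getter is non-virtual: it reads a
// data member that each derived constructor is responsible for setting.
class CommonSubtarget {
protected:
  bool IsAmdHsaOS = false;

public:
  bool isAmdHsaOS() const { return IsAmdHsaOS; }
};

class R600Subtarget : public R600GenSubtargetInfo, public CommonSubtarget {};

class GCNSubtarget : public GCNGenSubtargetInfo, public CommonSubtarget {
public:
  explicit GCNSubtarget(bool HSA) { IsAmdHsaOS = HSA; }
};

enum class Arch { r600, amdgcn };

// CommonSubtarget is not derived from TargetSubtargetInfo, so shared code
// cannot cast to it directly. Dispatch on the architecture, downcast to the
// concrete class, and let the implicit upcast adjust the reference to the
// CommonSubtarget base subobject.
const CommonSubtarget &getCommon(Arch A, const TargetSubtargetInfo &STI) {
  if (A == Arch::amdgcn)
    return static_cast<const GCNSubtarget &>(STI);
  return static_cast<const R600Subtarget &>(STI);
}
```

For example, `getCommon(Arch::amdgcn, GCN).isAmdHsaOS()` returns the value the `GCNSubtarget` constructor stored, with no virtual call involved. The downcast is only valid when the architecture key matches the object's real type; in this model, passing an R600 object together with `Arch::amdgcn` would be undefined behavior, so the dispatch key has to be reliable.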

arsenm added inline comments. May 23 2018, 11:57 PM

lib/Target/AMDGPU/AMDGPUInstrInfo.cpp
31–32	Commented out code
lib/Target/AMDGPU/AMDGPUInstructions.td
49	Should probably rename this at some point
lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
95–99	Why is there a difference here?
lib/Target/AMDGPU/AMDGPUSubtarget.h
207–208	Why isn't this SISubtarget/GCNSubtarget?
421–423	Why is this needed outside of GCN code?
lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
580	Can’t this be in a GCN class? I think all of this is for packed anyway.
lib/Target/AMDGPU/R600ISelLowering.cpp
1563–1565	Probably should reject these, but that's a separate change
lib/Target/AMDGPU/R600IntrinsicInfo.cpp
24–25	We can probably drop the whole class. We don't have very many intrinsic definitions left in the backend, and this code never really worked well to begin with.
lib/Target/AMDGPU/SIInsertWaitcnts.cpp
925–926	Fix formatting

tstellar marked 8 inline comments as done. May 31 2018, 11:55 AM

tstellar added inline comments.

lib/Target/AMDGPU/AMDGPUSubtarget.h
207–208	I was planning to rename this in a follow-on patch to avoid creating even more churn in this patch.
421–423	It's not. I've dropped the R600 implementation of this.

Rebase patch after splitting some of the changes requested
into separate patches.

Rebase patch on latest ToT.

Harbormaster completed remote builds in B19312: Diff 151267. Jun 13 2018, 4:05 PM

I've tried the updated version of the patch, although it did not apply cleanly. It also causes GPU hangs on my Turks in piglit tests.

In D46365#1132085, @jvesely wrote:

I've tried the updated version of the patch, although it did not apply cleanly. It also causes GPU hangs on my Turks in piglit tests.

Do you have a test case?

I assume that there is no change in generated code intended for r600 (EG/CM).
These are the changes in piglit tests I noticed:

< 	MEM_RAT_CACHELESS STORE_RAW T0.X, T1.X, 1
---
> 	MEM_RAT_CACHELESS STORE_DWORD T0.X, T1.X

There are other changes with respect to register allocation and the packetizer, but this one looks the most suspicious. My Turks is TS2, and STORE_DWORD is not defined in its ISA (STORE_RAW is the only allowed opcode for the CACHELESS target). According to the Cayman ISA, STORE_DWORD is opcode 20 (vs. opcode 2 for STORE_RAW), which is reserved on TS2. The instruction also lost the offset.
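The emitted opcode depends on which ISA the subtarget reports, so an incorrect ISA flag can produce an opcode that is reserved on TS2. A minimal sketch of that dependency, assuming a single boolean flag drives the choice (in the backend the choice is made by the generated instruction-selection tables; `RatOpcode` and `selectCachelessStore` are hypothetical names), using the opcode numbers quoted above:

```cpp
#include <cassert>

// Hypothetical encodings for the two CACHELESS stores discussed above; the
// numeric values are the ones quoted from the ISA documents.
enum class RatOpcode : unsigned {
  STORE_RAW = 2,    // the only CACHELESS store opcode allowed on TS2
  STORE_DWORD = 20, // Cayman opcode; reserved on TS2
};

// Picks the CACHELESS store opcode from the subtarget's ISA flag. If the flag
// is wrong (for example, read before it was initialized), a TS2 part receives
// an opcode that is reserved on its hardware.
RatOpcode selectCachelessStore(bool CaymanISA) {
  return CaymanISA ? RatOpcode::STORE_DWORD : RatOpcode::STORE_RAW;
}
```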

Now, there are tests for MEMRAT_CACHELESS stores, and they pass, so I guess there is another untested store path that got mixed up between TS2 and TS3.
I can paste the .ll file if you're interested.

A quick update: running llc manually on the kernel .ll (dumped using CLOVER_DEBUG=llvm) produces correct assembly. Running it in clover generates incorrect code (dumped using CLOVER_DEBUG=native) and hangs the GPU.

lib/Target/AMDGPU/AMDGPUFeatures.td
53	git complains about a blank line at the end of the file here
lib/Target/AMDGPU/R600ISelLowering.cpp
238	git complains about a whitespace error at this location

I added the below snippet to check whether the caymanISA feature gets initialized correctly:

@@ -415,7 +417,10 @@ R600Subtarget::R600Subtarget(const Triple &TT, StringRef GPU, StringRef FS,
   TLInfo(TM, initializeSubtargetDependencies(TT, GPU, FS)),
   DX10Clamp(false),
   InstrItins(getInstrItineraryForCPU(GPU)),
-  AS (AMDGPU::getAMDGPUAS(TT)) { }
+  AS (AMDGPU::getAMDGPUAS(TT)) {
+  fprintf(stderr, "R600 FEATURE STRING: %s\n", FS.data());
+  fprintf(stderr, "R600 Has Cayman ISA: %s\n", CaymanISA ? "YES" : "NO");
+}

As expected, it randomly printed the following on some runs:

'-fp32-denormals' is not a recognized feature for this target (ignoring feature)
'-fp32-denormals' is not a recognized feature for this target (ignoring feature)
'-fp32-denormals' is not a recognized feature for this target (ignoring feature)
R600 FEATURE STRING: -fp32-denormals
R600 Has Cayman ISA: YES

Running llc through valgrind produced a flood of 'Conditional jump or move depends on uninitialised value(s)' errors
(269 errors from 24 contexts). Initializing just CaymanISA in R600Subtarget gets rid of most of them.
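The '-fp32-denormals' warnings above come from applying the feature string against the R600 sub-target's own feature table; they suggest that, after the split, this feature is not in R600's table. A simplified model of that lookup follows. It is not LLVM's implementation: `applyFeatures` and `FeatureResult` are hypothetical, and only the warning text is copied from the output above.

```cpp
#include <cassert>
#include <cstdio>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result of applying a feature string to one sub-target.
struct FeatureResult {
  std::set<std::string> Enabled;    // features switched on
  std::vector<std::string> Ignored; // flags not found in the table
};

// Applies a comma-separated string such as "+a,-b" against the feature names
// this sub-target knows about. Unknown flags are reported and skipped, so a
// feature defined only for another sub-target has no effect here.
FeatureResult applyFeatures(const std::string &FS,
                            const std::set<std::string> &Known) {
  FeatureResult R;
  std::stringstream SS(FS);
  std::string Flag;
  while (std::getline(SS, Flag, ',')) {
    if (Flag.empty())
      continue;
    bool HasSign = Flag[0] == '+' || Flag[0] == '-';
    bool Enable = Flag[0] != '-';
    std::string Name = HasSign ? Flag.substr(1) : Flag;
    if (Known.count(Name) == 0) {
      std::fprintf(stderr,
                   "'%s' is not a recognized feature for this target "
                   "(ignoring feature)\n",
                   Flag.c_str());
      R.Ignored.push_back(Flag);
      continue;
    }
    if (Enable)
      R.Enabled.insert(Name);
    else
      R.Enabled.erase(Name);
  }
  return R;
}
```

Note that an ignored flag leaves every subtarget member untouched, which is why the printed CaymanISA value depended entirely on how the member was initialized.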

In D46365#1133213, @jvesely wrote:

Now, there are tests for MEMRAT_CACHELESS stores, and they pass, so I guess there is another untested store path that got mixed up between TS2 and TS3.
I can paste the .ll file if you're interested.

Yes, that would be helpful.

Rebase and fix some uninitialized variables in R600Subtarget.
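A minimal sketch of this kind of fix, assuming (as the R600 FEATURE STRING output suggests) that feature parsing only assigns flags for features it actually finds in the string, so any flag it does not mention keeps whatever value the constructor left it. `R600SubtargetModel` and its toy parser are hypothetical. Default member initializers are one idiomatic way to guarantee a defined starting value, alongside explicit entries in the constructor's initializer list such as the existing `DX10Clamp(false)`.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified R600 subtarget used only to illustrate member
// initialization; the flag names follow features from this review.
class R600SubtargetModel {
  // Default member initializers give every flag a defined value before the
  // constructor body runs. Without them, a bool that the feature string never
  // mentions would be read uninitialized (undefined behavior), which is the
  // kind of problem valgrind reports as "Conditional jump or move depends on
  // uninitialised value(s)".
  bool CaymanISA = false;
  bool CFALUBug = false;
  bool HasVertexCache = false;

public:
  explicit R600SubtargetModel(const std::string &FS) {
    // Toy feature parsing, assumed here to behave like the generated parser:
    // it only sets flags to true for features present in FS and leaves the
    // others untouched.
    if (FS.find("+caymanISA") != std::string::npos)
      CaymanISA = true;
    if (FS.find("+cfalubug") != std::string::npos)
      CFALUBug = true;
  }

  bool hasCaymanISA() const { return CaymanISA; }
  bool hasCFALUBug() const { return CFALUBug; }
  bool hasVertexCache() const { return HasVertexCache; }
};
```

With the initializers in place, a subtarget built from "-fp32-denormals" reliably reports no Cayman ISA, instead of whatever happened to be in memory.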

In D46365#1140194, @jvesely wrote:

Running llc through valgrind produced a flood of 'Conditional jump or move depends on uninitialised value(s)' errors
(269 errors from 24 contexts). Initializing just CaymanISA in R600Subtarget gets rid of most of them.

These should be fixed now; can you re-test?

arsenm added inline comments. Jun 26 2018, 12:44 AM

lib/Target/AMDGPU/AMDGPUFeatures.td
2	Missing header comment
lib/Target/AMDGPU/AMDGPUSubtarget.h
473–475	Why are these leftover as virtual?
lib/Target/AMDGPU/R600.td
2	Missing header comment

In D46365#1141392, @tstellar wrote:

In D46365#1140194, @jvesely wrote:

Running llc through valgrind produced a flood of 'Conditional jump or move depends on uninitialised value(s)' errors
(269 errors from 24 contexts). Initializing just CaymanISA in R600Subtarget gets rid of most of them.

These should be fixed now; can you re-test?

Fails to build:
llvm-tblgen: Unknown command line argument '-gen-tgt-intrinsic'. Try: '../../../bin/llvm-tblgen -help'
llvm-tblgen: Did you mean '-gen-tgt-intrinsic-impl'?
make[2]: *** [lib/Target/AMDGPU/CMakeFiles/AMDGPUCommonTableGen.dir/build.make:1730: lib/Target/AMDGPU/R600GenIntrinsics.inc.tmp] Error 1

In D46365#1140270, @tstellar wrote:

In D46365#1133213, @jvesely wrote:

Now, there are tests for MEMRAT_CACHELESS stores, and they pass, so I guess there is another untested store path that got mixed up between TS2 and TS3.
I can paste the .ll file if you're interested.

Yes, that would be helpful.

https://people.freedesktop.org/~jvesely/llvm/

Test cases 46 and 48: the "n-" and "new-" prefixed versions are the output of the previous iteration of this patch.

In D46365#1146098, @jvesely wrote:

In D46365#1141392, @tstellar wrote:

In D46365#1140194, @jvesely wrote:

Running llc through valgrind produced a flood of 'Conditional jump or move depends on uninitialised value(s)' errors
(269 errors from 24 contexts). Initializing just CaymanISA in R600Subtarget gets rid of most of them.

These should be fixed now; can you re-test?

Fails to build:
llvm-tblgen: Unknown command line argument '-gen-tgt-intrinsic'. Try: '../../../bin/llvm-tblgen -help'
llvm-tblgen: Did you mean '-gen-tgt-intrinsic-impl'?
make[2]: *** [lib/Target/AMDGPU/CMakeFiles/AMDGPUCommonTableGen.dir/build.make:1730: lib/Target/AMDGPU/R600GenIntrinsics.inc.tmp] Error 1

After fixing the build file as tblgen suggested (and a few local fixes in my own patches), it builds OK and there are no piglit regressions on my Turks.
I think this should land rather soon, with a bunch of cleanup follow ups. Having things (files, classes) that are prefixed R600, AMDGPU, AMDGPUCommon, GCN, AMDGCN, and SI is rather confusing.

Rebase and stop generating intrinsic info for R600, we don't need this.

Harbormaster completed remote builds in B19808: Diff 153250. Jun 27 2018, 8:53 PM

In D46365#1146123, @jvesely wrote:

In D46365#1146098, @jvesely wrote:

Fails to build:
llvm-tblgen: Unknown command line argument '-gen-tgt-intrinsic'. Try: '../../../bin/llvm-tblgen -help'
llvm-tblgen: Did you mean '-gen-tgt-intrinsic-impl'?
make[2]: *** [lib/Target/AMDGPU/CMakeFiles/AMDGPUCommonTableGen.dir/build.make:1730: lib/Target/AMDGPU/R600GenIntrinsics.inc.tmp] Error 1

After fixing the build file as tblgen suggested (and a few local fixes in my own patches), it builds OK and there are no piglit regressions on my Turks.

IntrinsicInfo isn't needed any more, so I dropped this.

I think this should land rather soon, with a bunch of cleanup follow ups. Having things (files, classes) that are prefixed R600, AMDGPU, AMDGPUCommon, GCN, AMDGCN, and SI is rather confusing.

Ok, I can start working on this once this patch lands.

tstellar marked 3 inline comments as done. Jun 27 2018, 9:07 PM

Add missing headers to tablegen files and remove virtual functions
from AMDGPUSubtarget.

LGTM

lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
716–718	I would expect this to be a separate function, but I'm not sure where it would go.

This revision is now accepted and ready to land. Jun 28 2018, 12:00 AM

tstellar added inline comments. Jun 28 2018, 9:18 AM

lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
716–718	We can refactor AMDGPUMCInstLower.cpp so that this can be in its own function. I can work on this as one of the follow-on cleanups.

Closed by commit rL335942: AMDGPU: Separate R600 and GCN TableGen files (authored by tstellar). Jun 28 2018, 4:52 PM

This revision was automatically updated to reflect the committed changes.

Hi! We encountered a UBSan runtime error after this was merged. I wrote a bug report about it: https://bugs.llvm.org/show_bug.cgi?id=38071.

Revision Contents

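Path: Size
lib/
Target/
AMDGPU/
(file name not shown): 148 lines
(file name not shown): 12 lines
(file name not shown): 17 lines
(file name not shown): 52 lines
AMDGPUISelDAGToDAG.cpp: 109 lines
AMDGPUISelLowering.h: 7 lines
AMDGPUISelLowering.cpp: 17 lines
AMDGPUInstrInfo.h: 14 lines
AMDGPUInstrInfo.cpp: 99 lines
AMDGPUInstructions.td: 88 lines
AMDGPUIntrinsics.td: 2 lines
AMDGPULowerIntrinsics.cpp: 3 lines
AMDGPUMCInstLower.h: 6 lines
AMDGPUMCInstLower.cpp: 74 lines
AMDGPUPromoteAlloca.cpp: 15 lines
AMDGPURegisterInfo.td: 1 line
AMDGPUSubtarget.h: 638 lines
AMDGPUSubtarget.cpp: 87 lines
AMDGPUTargetMachine.h: 19 lines
AMDGPUTargetMachine.cpp: 5 lines
AMDGPUTargetTransformInfo.h: 68 lines
AMDGPUTargetTransformInfo.cpp: 142 lines
AMDILCFGStructurizer.cpp: 92 lines
CMakeLists.txt: 13 lines
Disassembler/
AMDGPUDisassembler.cpp: 1 line
EvergreenInstructions.td: 7 lines
InstPrinter/
AMDGPUInstPrinter.h: 11 lines
AMDGPUInstPrinter.cpp: 95 lines
MCTargetDesc/
AMDGPUMCTargetDesc.h: 16 lines
AMDGPUMCTargetDesc.cpp: 25 lines
CMakeLists.txt: 1 line
R600MCCodeEmitter.cpp: 35 lines
(file name not shown): 27 lines
(file name not shown): 3 lines
(file name not shown): 154 lines
(file name not shown): 52 lines
R600ClauseMergePass.cpp: 26 lines
R600ControlFlowFinalizer.cpp: 106 lines
R600EmitClauseMarkers.cpp: 48 lines
R600ExpandSpecialInstrs.cpp: 55 lines
(file name not shown): 3 lines
(file name not shown): 331 lines
(file name not shown): 2 lines
(file name not shown): 9 lines
(file name not shown): 429 lines
(file name not shown): 94 lines
(file name not shown): 58 lines
R600IntrinsicInfo.cpp: 103 lines
R600Intrinsics.td: 2 lines
R600MachineScheduler.cpp: 62 lines
R600OptimizeVectorRegisters.cpp: 14 lines
(file name not shown): 45 lines
(file name not shown): 56 lines
(file name not shown): 7 lines
(file name not shown): 58 lines
(file name not shown): 2 lines
(file name not shown): 2 lines
(file name not shown): 4 lines
(file name not shown): 3 lines
(file name not shown): 25 lines
(file name not shown): 6 lines
(file name not shown): 10 lines
(file name not shown): 2 lines
(file name not shown): 16 lines
(file name not shown): 107 lines
(file name not shown): 5 lines
(file name not shown): 3 lines
(file name not shown): 1 line
(file name not shown): 2 lines
Utils/
AMDGPUBaseInfo.h: 2 lines
AMDGPUBaseInfo.cpp: 12 lines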

Diff 144924

lib/Target/AMDGPU/AMDGPU.td

	//===-- AMDGPU.td - AMDGPU Tablegen files --------- tablegen --===//			//===-- AMDGPU.td - AMDGPU Tablegen files --------- tablegen --===//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===------------------------------------------------------------===//			//===------------------------------------------------------------===//

	include "llvm/Target/Target.td"			include "llvm/Target/Target.td"
				include "AMDGPUFeatures.td"

	//===------------------------------------------------------------===//			//===------------------------------------------------------------===//
	// Subtarget Features (device properties)			// Subtarget Features (device properties)
	//===------------------------------------------------------------===//			//===------------------------------------------------------------===//

	def FeatureFP64 : SubtargetFeature<"fp64",
	"FP64",
	"true",
	"Enable double precision operations"
	>;

	def FeatureFMA : SubtargetFeature<"fmaf",
	"FMA",
	"true",
	"Enable single precision FMA (not as fast as mul+add, but fused)"
	>;

	def FeatureFastFMAF32 : SubtargetFeature<"fast-fmaf",			def FeatureFastFMAF32 : SubtargetFeature<"fast-fmaf",
	"FastFMAF32",			"FastFMAF32",
	"true",			"true",
	"Assuming f32 fma is at least as fast as mul + add"			"Assuming f32 fma is at least as fast as mul + add"
	>;			>;

	def FeatureMIMG_R128 : SubtargetFeature<"mimg-r128",			def FeatureMIMG_R128 : SubtargetFeature<"mimg-r128",
	"MIMG_R128",			"MIMG_R128",
	"true",			"true",
	"Support 128-bit texture resources"			"Support 128-bit texture resources"
	>;			>;

	def HalfRate64Ops : SubtargetFeature<"half-rate-64-ops",			def HalfRate64Ops : SubtargetFeature<"half-rate-64-ops",
	"HalfRate64Ops",			"HalfRate64Ops",
	"true",			"true",
	"Most fp64 instructions are half rate instead of quarter"			"Most fp64 instructions are half rate instead of quarter"
	>;			>;

	def FeatureR600ALUInst : SubtargetFeature<"R600ALUInst",
	"R600ALUInst",
	"false",
	"Older version of ALU instructions encoding"
	>;

	def FeatureVertexCache : SubtargetFeature<"HasVertexCache",
	"HasVertexCache",
	"true",
	"Specify use of dedicated vertex cache"
	>;

	def FeatureCaymanISA : SubtargetFeature<"caymanISA",
	"CaymanISA",
	"true",
	"Use Cayman ISA"
	>;

	def FeatureCFALUBug : SubtargetFeature<"cfalubug",
	"CFALUBug",
	"true",
	"GPU has CF_ALU bug"
	>;

	def FeatureFlatAddressSpace : SubtargetFeature<"flat-address-space",			def FeatureFlatAddressSpace : SubtargetFeature<"flat-address-space",
	"FlatAddressSpace",			"FlatAddressSpace",
	"true",			"true",
	"Support flat address space"			"Support flat address space"
	>;			>;

	def FeatureFlatInstOffsets : SubtargetFeature<"flat-inst-offsets",			def FeatureFlatInstOffsets : SubtargetFeature<"flat-inst-offsets",
	"FlatInstOffsets",			"FlatInstOffsets",
	[… 63 lines not shown …]
	>;			>;

	def FeatureSGPRInitBug : SubtargetFeature<"sgpr-init-bug",			def FeatureSGPRInitBug : SubtargetFeature<"sgpr-init-bug",
	"SGPRInitBug",			"SGPRInitBug",
	"true",			"true",
	"VI SGPR initialization bug requiring a fixed SGPR allocation size"			"VI SGPR initialization bug requiring a fixed SGPR allocation size"
	>;			>;

	class SubtargetFeatureFetchLimit <string Value> :
	SubtargetFeature <"fetch"#Value,
	"TexVTXClauseSize",
	Value,
	"Limit the maximum number of fetches in a clause to "#Value
	>;

	def FeatureFetchLimit8 : SubtargetFeatureFetchLimit <"8">;
	def FeatureFetchLimit16 : SubtargetFeatureFetchLimit <"16">;

	class SubtargetFeatureWavefrontSize <int Value> : SubtargetFeature<
	"wavefrontsize"#Value,
	"WavefrontSize",
	!cast<string>(Value),
	"The number of threads per wavefront"
	>;

	def FeatureWavefrontSize16 : SubtargetFeatureWavefrontSize<16>;
	def FeatureWavefrontSize32 : SubtargetFeatureWavefrontSize<32>;
	def FeatureWavefrontSize64 : SubtargetFeatureWavefrontSize<64>;

	class SubtargetFeatureLDSBankCount <int Value> : SubtargetFeature <			class SubtargetFeatureLDSBankCount <int Value> : SubtargetFeature <
	"ldsbankcount"#Value,			"ldsbankcount"#Value,
	"LDSBankCount",			"LDSBankCount",
	!cast<string>(Value),			!cast<string>(Value),
	"The number of LDS banks per compute unit."			"The number of LDS banks per compute unit."
	>;			>;

	def FeatureLDSBankCount16 : SubtargetFeatureLDSBankCount<16>;			def FeatureLDSBankCount16 : SubtargetFeatureLDSBankCount<16>;
	def FeatureLDSBankCount32 : SubtargetFeatureLDSBankCount<32>;			def FeatureLDSBankCount32 : SubtargetFeatureLDSBankCount<32>;

	class SubtargetFeatureLocalMemorySize <int Value> : SubtargetFeature<
	"localmemorysize"#Value,
	"LocalMemorySize",
	!cast<string>(Value),
	"The size of local memory in bytes"
	>;

	def FeatureGCN : SubtargetFeature<"gcn",
	"IsGCN",
	"true",
	"GCN or newer GPU"
	>;

	def FeatureGCN3Encoding : SubtargetFeature<"gcn3-encoding",			def FeatureGCN3Encoding : SubtargetFeature<"gcn3-encoding",
	"GCN3Encoding",			"GCN3Encoding",
	"true",			"true",
	"Encoding format for VI"			"Encoding format for VI"
	>;			>;

	def FeatureCIInsts : SubtargetFeature<"ci-insts",			def FeatureCIInsts : SubtargetFeature<"ci-insts",
	"CIInsts",			"CIInsts",
	[… 141 lines not shown …]

	def FeatureFP16Denormals : SubtargetFeature<"fp16-denormals",			def FeatureFP16Denormals : SubtargetFeature<"fp16-denormals",
	"FP64FP16Denormals",			"FP64FP16Denormals",
	"true",			"true",
	"Enable half precision denormal handling",			"Enable half precision denormal handling",
	[FeatureFP64FP16Denormals]			[FeatureFP64FP16Denormals]
	>;			>;

	def FeatureDX10Clamp : SubtargetFeature<"dx10-clamp",
	"DX10Clamp",
	"true",
	"clamp modifier clamps NaNs to 0.0"
	>;

	def FeatureFPExceptions : SubtargetFeature<"fp-exceptions",			def FeatureFPExceptions : SubtargetFeature<"fp-exceptions",
	"FPExceptions",			"FPExceptions",
	"true",			"true",
	"Enable floating point exceptions"			"Enable floating point exceptions"
	>;			>;

	class FeatureMaxPrivateElementSize<int size> : SubtargetFeature<			class FeatureMaxPrivateElementSize<int size> : SubtargetFeature<
	"max-private-element-size-"#size,			"max-private-element-size-"#size,
	[… 26 lines not shown …]
	>;			>;

	def FeatureDumpCodeLower : SubtargetFeature <"dumpcode",			def FeatureDumpCodeLower : SubtargetFeature <"dumpcode",
	"DumpCode",			"DumpCode",
	"true",			"true",
	"Dump MachineInstrs in the CodeEmitter"			"Dump MachineInstrs in the CodeEmitter"
	>;			>;

	def FeaturePromoteAlloca : SubtargetFeature <"promote-alloca",
	"EnablePromoteAlloca",
	"true",
	"Enable promote alloca pass"
	>;

	// XXX - This should probably be removed once enabled by default			// XXX - This should probably be removed once enabled by default
	def FeatureEnableLoadStoreOpt : SubtargetFeature <"load-store-opt",			def FeatureEnableLoadStoreOpt : SubtargetFeature <"load-store-opt",
	"EnableLoadStoreOpt",			"EnableLoadStoreOpt",
	"true",			"true",
	"Enable SI load/store optimizer pass"			"Enable SI load/store optimizer pass"
	>;			>;

	// Performance debugging feature. Allow using DS instruction immediate			// Performance debugging feature. Allow using DS instruction immediate
	[… 47 lines not shown …]
	>;			>;

	// Dummy feature used to disable assembler instructions.			// Dummy feature used to disable assembler instructions.
	def FeatureDisable : SubtargetFeature<"",			def FeatureDisable : SubtargetFeature<"",
	"FeatureDisable","true",			"FeatureDisable","true",
	"Dummy feature to disable assembler instructions"			"Dummy feature to disable assembler instructions"
	>;			>;

	class SubtargetFeatureGeneration <string Value,			def FeatureGCN : SubtargetFeature<"gcn",
	list<SubtargetFeature> Implies> :			"IsGCN",
	SubtargetFeature <Value, "Gen", "AMDGPUSubtarget::"#Value,			"true",
	Value#" GPU generation", Implies>;			"GCN or newer GPU"

	def FeatureLocalMemorySize0 : SubtargetFeatureLocalMemorySize<0>;
	def FeatureLocalMemorySize32768 : SubtargetFeatureLocalMemorySize<32768>;
	def FeatureLocalMemorySize65536 : SubtargetFeatureLocalMemorySize<65536>;

	def FeatureR600 : SubtargetFeatureGeneration<"R600",
	[FeatureR600ALUInst, FeatureFetchLimit8, FeatureLocalMemorySize0]
	>;

	def FeatureR700 : SubtargetFeatureGeneration<"R700",
	[FeatureFetchLimit16, FeatureLocalMemorySize0]
	>;

	def FeatureEvergreen : SubtargetFeatureGeneration<"EVERGREEN",
	[FeatureFetchLimit16, FeatureLocalMemorySize32768]
	>;			>;

	def FeatureNorthernIslands : SubtargetFeatureGeneration<"NORTHERN_ISLANDS",			class AMDGPUSubtargetFeatureGeneration <string Value,
	[FeatureFetchLimit16, FeatureWavefrontSize64,			list<SubtargetFeature> Implies> :
	FeatureLocalMemorySize32768]			SubtargetFeatureGeneration <Value, "AMDGPUSubtarget", Implies>;
	>;

	def FeatureSouthernIslands : SubtargetFeatureGeneration<"SOUTHERN_ISLANDS",			def FeatureSouthernIslands : AMDGPUSubtargetFeatureGeneration<"SOUTHERN_ISLANDS",
	[FeatureFP64, FeatureLocalMemorySize32768, FeatureMIMG_R128,			[FeatureFP64, FeatureLocalMemorySize32768, FeatureMIMG_R128,
	FeatureWavefrontSize64, FeatureGCN,			FeatureWavefrontSize64, FeatureGCN,
	FeatureLDSBankCount32, FeatureMovrel]			FeatureLDSBankCount32, FeatureMovrel]
	>;			>;

	def FeatureSeaIslands : SubtargetFeatureGeneration<"SEA_ISLANDS",			def FeatureSeaIslands : AMDGPUSubtargetFeatureGeneration<"SEA_ISLANDS",
	[FeatureFP64, FeatureLocalMemorySize65536, FeatureMIMG_R128,			[FeatureFP64, FeatureLocalMemorySize65536, FeatureMIMG_R128,
	FeatureWavefrontSize64, FeatureGCN, FeatureFlatAddressSpace,			FeatureWavefrontSize64, FeatureGCN, FeatureFlatAddressSpace,
	FeatureCIInsts, FeatureMovrel]			FeatureCIInsts, FeatureMovrel]
	>;			>;

	def FeatureVolcanicIslands : SubtargetFeatureGeneration<"VOLCANIC_ISLANDS",			def FeatureVolcanicIslands : AMDGPUSubtargetFeatureGeneration<"VOLCANIC_ISLANDS",
	[FeatureFP64, FeatureLocalMemorySize65536, FeatureMIMG_R128,			[FeatureFP64, FeatureLocalMemorySize65536, FeatureMIMG_R128,
	FeatureWavefrontSize64, FeatureFlatAddressSpace, FeatureGCN,			FeatureWavefrontSize64, FeatureFlatAddressSpace, FeatureGCN,
	FeatureGCN3Encoding, FeatureCIInsts, Feature16BitInsts,			FeatureGCN3Encoding, FeatureCIInsts, Feature16BitInsts,
	FeatureSMemRealTime, FeatureVGPRIndexMode, FeatureMovrel,			FeatureSMemRealTime, FeatureVGPRIndexMode, FeatureMovrel,
	FeatureScalarStores, FeatureInv2PiInlineImm,			FeatureScalarStores, FeatureInv2PiInlineImm,
	FeatureSDWA, FeatureSDWAOutModsVOPC, FeatureSDWAMac, FeatureDPP,			FeatureSDWA, FeatureSDWAOutModsVOPC, FeatureSDWAMac, FeatureDPP,
	FeatureIntClamp			FeatureIntClamp
	]			]
	>;			>;

	def FeatureGFX9 : SubtargetFeatureGeneration<"GFX9",			def FeatureGFX9 : AMDGPUSubtargetFeatureGeneration<"GFX9",
	[FeatureFP64, FeatureLocalMemorySize65536,			[FeatureFP64, FeatureLocalMemorySize65536,
	FeatureWavefrontSize64, FeatureFlatAddressSpace, FeatureGCN,			FeatureWavefrontSize64, FeatureFlatAddressSpace, FeatureGCN,
	FeatureGCN3Encoding, FeatureCIInsts, Feature16BitInsts,			FeatureGCN3Encoding, FeatureCIInsts, Feature16BitInsts,
	FeatureSMemRealTime, FeatureScalarStores, FeatureInv2PiInlineImm,			FeatureSMemRealTime, FeatureScalarStores, FeatureInv2PiInlineImm,
	FeatureApertureRegs, FeatureGFX9Insts, FeatureVOP3P, FeatureVGPRIndexMode,			FeatureApertureRegs, FeatureGFX9Insts, FeatureVOP3P, FeatureVGPRIndexMode,
	FeatureFastFMAF32, FeatureDPP, FeatureIntClamp,			FeatureFastFMAF32, FeatureDPP, FeatureIntClamp,
	FeatureSDWA, FeatureSDWAOmod, FeatureSDWAScalar, FeatureSDWASdst,			FeatureSDWA, FeatureSDWAOmod, FeatureSDWAScalar, FeatureSDWASdst,
	FeatureFlatInstOffsets, FeatureFlatGlobalInsts, FeatureFlatScratchInsts,			FeatureFlatInstOffsets, FeatureFlatGlobalInsts, FeatureFlatScratchInsts,
	[… 180 lines not shown …]
	// Dummy Instruction itineraries for pseudo instructions			// Dummy Instruction itineraries for pseudo instructions
	def ALU_NULL : FuncUnit;			def ALU_NULL : FuncUnit;
	def NullALU : InstrItinClass;			def NullALU : InstrItinClass;

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Predicate helper class			// Predicate helper class
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	def TruePredicate : Predicate<"true">;

	def isSICI : Predicate<			def isSICI : Predicate<
	"Subtarget->getGeneration() == AMDGPUSubtarget::SOUTHERN_ISLANDS \|\|"			"Subtarget->getGeneration() == AMDGPUSubtarget::SOUTHERN_ISLANDS \|\|"
	"Subtarget->getGeneration() == AMDGPUSubtarget::SEA_ISLANDS"			"Subtarget->getGeneration() == AMDGPUSubtarget::SEA_ISLANDS"
	>, AssemblerPredicate<"!FeatureGCN3Encoding">;			>, AssemblerPredicate<"!FeatureGCN3Encoding">;

	def isVI : Predicate <			def isVI : Predicate <
	"Subtarget->getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS">,			"Subtarget->getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS">,
	AssemblerPredicate<"FeatureGCN3Encoding">;			AssemblerPredicate<"FeatureGCN3Encoding">;
	[… 63 lines not shown …]
	def HasVGPRIndexMode : Predicate<"Subtarget->hasVGPRIndexMode()">,			def HasVGPRIndexMode : Predicate<"Subtarget->hasVGPRIndexMode()">,
	AssemblerPredicate<"FeatureVGPRIndexMode">;			AssemblerPredicate<"FeatureVGPRIndexMode">;
	def HasMovrel : Predicate<"Subtarget->hasMovrel()">,			def HasMovrel : Predicate<"Subtarget->hasMovrel()">,
	AssemblerPredicate<"FeatureMovrel">;			AssemblerPredicate<"FeatureMovrel">;

	def EnableLateCFGStructurize : Predicate<			def EnableLateCFGStructurize : Predicate<
	"EnableLateStructurizeCFG">;			"EnableLateStructurizeCFG">;

	// Exists to help track down where SubtargetPredicate isn't set rather
	// than letting tablegen crash with an unhelpful error.
	def InvalidPred : Predicate<"predicate not set on instruction or pattern">;

	class PredicateControl {
	Predicate SubtargetPredicate = InvalidPred;
	Predicate SIAssemblerPredicate = isSICI;
	Predicate VIAssemblerPredicate = isVI;
	list<Predicate> AssemblerPredicates = [];
	Predicate AssemblerPredicate = TruePredicate;
	list<Predicate> OtherPredicates = [];
	list<Predicate> Predicates = !listconcat([SubtargetPredicate,
	AssemblerPredicate],
	AssemblerPredicates,
	OtherPredicates);
	}

	class AMDGPUPat<dag pattern, dag result> : Pat<pattern, result>,
	PredicateControl;


	// Include AMDGPU TD files			// Include AMDGPU TD files
	include "R600Schedule.td"
	include "R600Processors.td"
	include "SISchedule.td"			include "SISchedule.td"
	include "GCNProcessors.td"			include "GCNProcessors.td"
	include "AMDGPUInstrInfo.td"			include "AMDGPUInstrInfo.td"
	include "AMDGPUIntrinsics.td"			include "AMDGPUIntrinsics.td"
				include "SIIntrinsics.td"
	include "AMDGPURegisterInfo.td"			include "AMDGPURegisterInfo.td"
	include "AMDGPURegisterBanks.td"			include "AMDGPURegisterBanks.td"
	include "AMDGPUInstructions.td"			include "AMDGPUInstructions.td"
				include "SIInstrInfo.td"
	include "AMDGPUCallingConv.td"			include "AMDGPUCallingConv.td"
	include "AMDGPUSearchableTables.td"			include "AMDGPUSearchableTables.td"

lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp

[… 188 lines not shown …]	bool AMDGPUAsmPrinter::isBlockOnlyReachableByFallthrough(
return (MBB->back().getOpcode() != AMDGPU::S_SETPC_B64);		return (MBB->back().getOpcode() != AMDGPU::S_SETPC_B64);
}		}

void AMDGPUAsmPrinter::EmitFunctionBodyStart() {		void AMDGPUAsmPrinter::EmitFunctionBodyStart() {
const AMDGPUMachineFunction *MFI = MF->getInfo<AMDGPUMachineFunction>();		const AMDGPUMachineFunction *MFI = MF->getInfo<AMDGPUMachineFunction>();
if (!MFI->isEntryFunction())		if (!MFI->isEntryFunction())
return;		return;

const AMDGPUSubtarget &STM = MF->getSubtarget<AMDGPUSubtarget>();		const AMDGPUCommonSubtarget &STM = AMDGPUCommonSubtarget::get(*MF);
amd_kernel_code_t KernelCode;		amd_kernel_code_t KernelCode;
if (STM.isAmdCodeObjectV2(*MF)) {		if (STM.isAmdCodeObjectV2(*MF)) {
getAmdKernelCode(KernelCode, CurrentProgramInfo, *MF);		getAmdKernelCode(KernelCode, CurrentProgramInfo, *MF);

OutStreamer->SwitchSection(getObjFileLowering().getTextSection());		OutStreamer->SwitchSection(getObjFileLowering().getTextSection());
getTargetStreamer()->EmitAMDKernelCodeT(KernelCode);		getTargetStreamer()->EmitAMDKernelCodeT(KernelCode);
}		}

if (TM.getTargetTriple().getOS() != Triple::AMDHSA)		if (TM.getTargetTriple().getOS() != Triple::AMDHSA)
return;		return;

HSAMetadataStream.emitKernel(MF->getFunction(),		HSAMetadataStream.emitKernel(MF->getFunction(),
getHSACodeProps(*MF, CurrentProgramInfo),		getHSACodeProps(*MF, CurrentProgramInfo),
getHSADebugProps(*MF, CurrentProgramInfo));		getHSADebugProps(*MF, CurrentProgramInfo));
}		}

void AMDGPUAsmPrinter::EmitFunctionEntryLabel() {		void AMDGPUAsmPrinter::EmitFunctionEntryLabel() {
const SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>();
const AMDGPUSubtarget &STM = MF->getSubtarget<AMDGPUSubtarget>();		const AMDGPUCommonSubtarget &STM = AMDGPUCommonSubtarget::get(*MF);
if (MFI->isEntryFunction() && STM.isAmdCodeObjectV2(*MF)) {		if (MFI->isEntryFunction() && STM.isAmdCodeObjectV2(*MF)) {
SmallString<128> SymbolName;		SmallString<128> SymbolName;
getNameWithPrefix(SymbolName, &MF->getFunction()),		getNameWithPrefix(SymbolName, &MF->getFunction()),
getTargetStreamer()->EmitAMDGPUSymbolType(		getTargetStreamer()->EmitAMDGPUSymbolType(
SymbolName, ELF::STT_AMDGPU_HSA_KERNEL);		SymbolName, ELF::STT_AMDGPU_HSA_KERNEL);
}		}
const AMDGPUSubtarget &STI = MF->getSubtarget<AMDGPUSubtarget>();		const AMDGPUSubtarget &STI = MF->getSubtarget<AMDGPUSubtarget>();
if (STI.dumpCode()) {		if (STI.dumpCode()) {
[… 72 lines not shown …]	bool AMDGPUAsmPrinter::runOnMachineFunction(MachineFunction &MF) {
const AMDGPUMachineFunction *MFI = MF.getInfo<AMDGPUMachineFunction>();		const AMDGPUMachineFunction *MFI = MF.getInfo<AMDGPUMachineFunction>();

// The starting address of all shader programs must be 256 bytes aligned.		// The starting address of all shader programs must be 256 bytes aligned.
// Regular functions just need the basic required instruction alignment.		// Regular functions just need the basic required instruction alignment.
MF.setAlignment(MFI->isEntryFunction() ? 8 : 2);		MF.setAlignment(MFI->isEntryFunction() ? 8 : 2);

SetupMachineFunction(MF);		SetupMachineFunction(MF);

const AMDGPUSubtarget &STM = MF.getSubtarget<AMDGPUSubtarget>();		const AMDGPUCommonSubtarget &STM = AMDGPUCommonSubtarget::get(MF);
MCContext &Context = getObjFileLowering().getContext();		MCContext &Context = getObjFileLowering().getContext();
// FIXME: This should be an explicit check for Mesa.		// FIXME: This should be an explicit check for Mesa.
if (!STM.isAmdHsaOS() && !STM.isAmdPalOS()) {		if (!STM.isAmdHsaOS() && !STM.isAmdPalOS()) {
MCSectionELF *ConfigSection =		MCSectionELF *ConfigSection =
Context.getELFSection(".AMDGPU.config", ELF::SHT_PROGBITS, 0);		Context.getELFSection(".AMDGPU.config", ELF::SHT_PROGBITS, 0);
OutStreamer->SwitchSection(ConfigSection);		OutStreamer->SwitchSection(ConfigSection);
}		}

if (STM.getGeneration() >= AMDGPUSubtarget::SOUTHERN_ISLANDS) {		if (TM.getTargetTriple().getArch() == Triple::amdgcn) {
if (MFI->isEntryFunction()) {		if (MFI->isEntryFunction()) {
getSIProgramInfo(CurrentProgramInfo, MF);		getSIProgramInfo(CurrentProgramInfo, MF);
} else {		} else {
auto I = CallGraphResourceInfo.insert(		auto I = CallGraphResourceInfo.insert(
std::make_pair(&MF.getFunction(), SIFunctionResourceInfo()));		std::make_pair(&MF.getFunction(), SIFunctionResourceInfo()));
SIFunctionResourceInfo &Info = I.first->second;		SIFunctionResourceInfo &Info = I.first->second;
assert(I.second && "should only be called once per function");		assert(I.second && "should only be called once per function");
Info = analyzeResourceUsage(MF);		Info = analyzeResourceUsage(MF);
[14 lines hidden]	bool AMDGPUAsmPrinter::runOnMachineFunction(MachineFunction &MF) {

EmitFunctionBody();		EmitFunctionBody();

if (isVerbose()) {		if (isVerbose()) {
MCSectionELF *CommentSection =		MCSectionELF *CommentSection =
Context.getELFSection(".AMDGPU.csdata", ELF::SHT_PROGBITS, 0);		Context.getELFSection(".AMDGPU.csdata", ELF::SHT_PROGBITS, 0);
OutStreamer->SwitchSection(CommentSection);		OutStreamer->SwitchSection(CommentSection);

if (STM.getGeneration() >= AMDGPUSubtarget::SOUTHERN_ISLANDS) {		if (TM.getTargetTriple().getArch() == Triple::amdgcn) {
if (!MFI->isEntryFunction()) {		if (!MFI->isEntryFunction()) {
OutStreamer->emitRawComment(" Function info:", false);		OutStreamer->emitRawComment(" Function info:", false);
SIFunctionResourceInfo &Info = CallGraphResourceInfo[&MF.getFunction()];		SIFunctionResourceInfo &Info = CallGraphResourceInfo[&MF.getFunction()];
emitCommonFunctionComments(		emitCommonFunctionComments(
Info.NumVGPR,		Info.NumVGPR,
Info.getTotalNumSGPRs(MF.getSubtarget<SISubtarget>()),		Info.getTotalNumSGPRs(MF.getSubtarget<SISubtarget>()),
Info.PrivateSegmentSize,		Info.PrivateSegmentSize,
getFunctionCodeSize(MF));		getFunctionCodeSize(MF));
[92 lines hidden]	void AMDGPUAsmPrinter::EmitProgramInfoR600(const MachineFunction &MF) {
unsigned MaxGPR = 0;		unsigned MaxGPR = 0;
bool killPixel = false;		bool killPixel = false;
const R600Subtarget &STM = MF.getSubtarget<R600Subtarget>();		const R600Subtarget &STM = MF.getSubtarget<R600Subtarget>();
const R600RegisterInfo *RI = STM.getRegisterInfo();		const R600RegisterInfo *RI = STM.getRegisterInfo();
const R600MachineFunctionInfo *MFI = MF.getInfo<R600MachineFunctionInfo>();		const R600MachineFunctionInfo *MFI = MF.getInfo<R600MachineFunctionInfo>();

for (const MachineBasicBlock &MBB : MF) {		for (const MachineBasicBlock &MBB : MF) {
for (const MachineInstr &MI : MBB) {		for (const MachineInstr &MI : MBB) {
if (MI.getOpcode() == AMDGPU::KILLGT)		if (MI.getOpcode() == R600::KILLGT)
killPixel = true;		killPixel = true;
unsigned numOperands = MI.getNumOperands();		unsigned numOperands = MI.getNumOperands();
for (unsigned op_idx = 0; op_idx < numOperands; op_idx++) {		for (unsigned op_idx = 0; op_idx < numOperands; op_idx++) {
const MachineOperand &MO = MI.getOperand(op_idx);		const MachineOperand &MO = MI.getOperand(op_idx);
if (!MO.isReg())		if (!MO.isReg())
continue;		continue;
unsigned HWReg = RI->getHWRegIndex(MO.getReg());		unsigned HWReg = RI->getHWRegIndex(MO.getReg());

[833 lines hidden]
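The AsmPrinter hunks replace `getGeneration() >= AMDGPUSubtarget::SOUTHERN_ISLANDS` with `TM.getTargetTriple().getArch() == Triple::amdgcn` and fetch the shared sub-target through `AMDGPUCommonSubtarget::get(*MF)`. The following is a minimal, self-contained sketch of that kind of dispatch, not the patch's implementation: every class in it (`GeneratedSubtargetInfo`, `CommonSubtarget`, `GCNSubtargetSketch`, `R600SubtargetSketch`, `MachineFunctionSketch`) is a hypothetical stand-in. It assumes each concrete sub-target inherits from its own generated base and from a common base. In that case, the common base is reached by first downcasting to the concrete class chosen by the architecture.

```cpp
#include <string>

// Hypothetical stand-in for llvm::Triple::ArchType.
enum class ArchType { r600, amdgcn };

// Hypothetical stand-in for the generic/TableGen-generated subtarget base.
struct GeneratedSubtargetInfo {
  virtual ~GeneratedSubtargetInfo() = default;
};

// Hypothetical common base holding queries shared by R600 and GCN.
class CommonSubtarget {
public:
  explicit CommonSubtarget(bool AmdHsaOS) : AmdHsaOS(AmdHsaOS) {}
  virtual ~CommonSubtarget() = default;
  bool isAmdHsaOS() const { return AmdHsaOS; } // non-virtual shared query
  virtual std::string kind() const = 0;

private:
  bool AmdHsaOS;
};

// Each concrete sub-target derives from the generated base *and* the common
// base, so the common base is a sibling of the generated base, not a parent.
class GCNSubtargetSketch : public GeneratedSubtargetInfo,
                           public CommonSubtarget {
public:
  explicit GCNSubtargetSketch(bool Hsa) : CommonSubtarget(Hsa) {}
  std::string kind() const override { return "amdgcn"; }
};

class R600SubtargetSketch : public GeneratedSubtargetInfo,
                            public CommonSubtarget {
public:
  explicit R600SubtargetSketch(bool Hsa) : CommonSubtarget(Hsa) {}
  std::string kind() const override { return "r600"; }
};

// Hypothetical machine function: knows its architecture and holds the
// sub-target through the generic base, as MachineFunction does.
struct MachineFunctionSketch {
  ArchType Arch;
  const GeneratedSubtargetInfo *ST;
};

// Downcast to the concrete class selected by the triple's architecture,
// then convert implicitly to the common base.
const CommonSubtarget &getCommonSubtarget(const MachineFunctionSketch &MF) {
  if (MF.Arch == ArchType::amdgcn)
    return *static_cast<const GCNSubtargetSketch *>(MF.ST);
  return *static_cast<const R600SubtargetSketch *>(MF.ST);
}
```

In this model `isAmdHsaOS()` is a plain stored query on the common base, so only the lookup depends on the architecture. That is one possible answer to the review question about avoiding virtual functions, not necessarily the one the patch adopted.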

lib/Target/AMDGPU/AMDGPUCallingConv.td

[79 lines hidden]	CCIfType<[f32, f16] , CCAssignToReg<[
VGPR96, VGPR97, VGPR98, VGPR99, VGPR100, VGPR101, VGPR102, VGPR103,		VGPR96, VGPR97, VGPR98, VGPR99, VGPR100, VGPR101, VGPR102, VGPR103,
VGPR104, VGPR105, VGPR106, VGPR107, VGPR108, VGPR109, VGPR110, VGPR111,		VGPR104, VGPR105, VGPR106, VGPR107, VGPR108, VGPR109, VGPR110, VGPR111,
VGPR112, VGPR113, VGPR114, VGPR115, VGPR116, VGPR117, VGPR118, VGPR119,		VGPR112, VGPR113, VGPR114, VGPR115, VGPR116, VGPR117, VGPR118, VGPR119,
VGPR120, VGPR121, VGPR122, VGPR123, VGPR124, VGPR125, VGPR126, VGPR127,		VGPR120, VGPR121, VGPR122, VGPR123, VGPR124, VGPR125, VGPR126, VGPR127,
VGPR128, VGPR129, VGPR130, VGPR131, VGPR132, VGPR133, VGPR134, VGPR135		VGPR128, VGPR129, VGPR130, VGPR131, VGPR132, VGPR133, VGPR134, VGPR135
]>>		]>>
]>;		]>;

// Calling convention for R600
def CC_R600 : CallingConv<[
CCIfInReg<CCIfType<[v4f32, v4i32] , CCAssignToReg<[
T0_XYZW, T1_XYZW, T2_XYZW, T3_XYZW, T4_XYZW, T5_XYZW, T6_XYZW, T7_XYZW,
T8_XYZW, T9_XYZW, T10_XYZW, T11_XYZW, T12_XYZW, T13_XYZW, T14_XYZW, T15_XYZW,
T16_XYZW, T17_XYZW, T18_XYZW, T19_XYZW, T20_XYZW, T21_XYZW, T22_XYZW,
T23_XYZW, T24_XYZW, T25_XYZW, T26_XYZW, T27_XYZW, T28_XYZW, T29_XYZW,
T30_XYZW, T31_XYZW, T32_XYZW
]>>>
]>;

// Calling convention for compute kernels		// Calling convention for compute kernels
def CC_AMDGPU_Kernel : CallingConv<[		def CC_AMDGPU_Kernel : CallingConv<[
CCCustom<"allocateKernArg">		CCCustom<"allocateKernArg">
]>;		]>;

def CSR_AMDGPU_VGPRs_24_255 : CalleeSavedRegs<		def CSR_AMDGPU_VGPRs_24_255 : CalleeSavedRegs<
(sequence "VGPR%u", 24, 255)		(sequence "VGPR%u", 24, 255)
>;		>;
[53 lines hidden]	CCIf<"static_cast<const AMDGPUSubtarget&>"
CCDelegateTo<CC_AMDGPU_Kernel>>,		CCDelegateTo<CC_AMDGPU_Kernel>>,
CCIf<"static_cast<const AMDGPUSubtarget&>"		CCIf<"static_cast<const AMDGPUSubtarget&>"
"(State.getMachineFunction().getSubtarget()).getGeneration() >= "		"(State.getMachineFunction().getSubtarget()).getGeneration() >= "
"AMDGPUSubtarget::SOUTHERN_ISLANDS",		"AMDGPUSubtarget::SOUTHERN_ISLANDS",
CCDelegateTo<CC_SI>>,		CCDelegateTo<CC_SI>>,
CCIf<"static_cast<const AMDGPUSubtarget&>"		CCIf<"static_cast<const AMDGPUSubtarget&>"
"(State.getMachineFunction().getSubtarget()).getGeneration() >= "		"(State.getMachineFunction().getSubtarget()).getGeneration() >= "
"AMDGPUSubtarget::SOUTHERN_ISLANDS && State.getCallingConv() == CallingConv::C",		"AMDGPUSubtarget::SOUTHERN_ISLANDS && State.getCallingConv() == CallingConv::C",
CCDelegateTo<CC_AMDGPU_Func>>,		CCDelegateTo<CC_AMDGPU_Func>>
CCIf<"static_cast<const AMDGPUSubtarget&>"
"(State.getMachineFunction().getSubtarget()).getGeneration() < "
"AMDGPUSubtarget::SOUTHERN_ISLANDS",
CCDelegateTo<CC_R600>>
]>;		]>;
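`CC_R600` assigns `inreg` `v4f32`/`v4i32` arguments to `T0_XYZW` through `T32_XYZW`. The sketch below is a simplified model of the `CCAssignToReg` behaviour it relies on: hand out the first register in the list that is still free, and fail once the list is exhausted. The `RegListAssigner` class and the `r600VectorArgRegs` helper are hypothetical. Unlike LLVM's `CCState`, the model only tracks allocations made through this one list.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified model of CCAssignToReg: registers are handed out
// in list order; once the list is exhausted, no register is assigned.
class RegListAssigner {
public:
  explicit RegListAssigner(std::vector<std::string> Regs)
      : Regs(std::move(Regs)) {}

  std::optional<std::string> assign() {
    if (Next == Regs.size())
      return std::nullopt; // list exhausted: the argument must go elsewhere
    return Regs[Next++];
  }

private:
  std::vector<std::string> Regs;
  std::size_t Next = 0; // index of the first register not yet allocated
};

// Builds the T0_XYZW ... T32_XYZW list named in CC_R600.
std::vector<std::string> r600VectorArgRegs() {
  std::vector<std::string> Regs;
  for (int I = 0; I <= 32; ++I)
    Regs.push_back("T" + std::to_string(I) + "_XYZW");
  return Regs;
}
```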

lib/Target/AMDGPU/AMDGPUFeatures.td

This file was added.


				def FeatureFP64 : SubtargetFeature<"fp64",
				arsenm: Missing header comment
				"FP64",
				"true",
				"Enable double precision operations"
				>;

				def FeatureFMA : SubtargetFeature<"fmaf",
				"FMA",
				"true",
				"Enable single precision FMA (not as fast as mul+add, but fused)"
				>;

				class SubtargetFeatureLocalMemorySize <int Value> : SubtargetFeature<
				"localmemorysize"#Value,
				"LocalMemorySize",
				!cast<string>(Value),
				"The size of local memory in bytes"
				>;

				def FeatureLocalMemorySize0 : SubtargetFeatureLocalMemorySize<0>;
				def FeatureLocalMemorySize32768 : SubtargetFeatureLocalMemorySize<32768>;
				def FeatureLocalMemorySize65536 : SubtargetFeatureLocalMemorySize<65536>;

				class SubtargetFeatureWavefrontSize <int Value> : SubtargetFeature<
				"wavefrontsize"#Value,
				"WavefrontSize",
				!cast<string>(Value),
				"The number of threads per wavefront"
				>;

				def FeatureWavefrontSize16 : SubtargetFeatureWavefrontSize<16>;
				def FeatureWavefrontSize32 : SubtargetFeatureWavefrontSize<32>;
				def FeatureWavefrontSize64 : SubtargetFeatureWavefrontSize<64>;

				class SubtargetFeatureGeneration <string Value, string Subtarget,
				list<SubtargetFeature> Implies> :
				SubtargetFeature <Value, "Gen", Subtarget#"::"#Value,
				Value#" GPU generation", Implies>;

				def FeatureDX10Clamp : SubtargetFeature<"dx10-clamp",
				"DX10Clamp",
				"true",
				"clamp modifier clamps NaNs to 0.0"
				>;

				def FeaturePromoteAlloca : SubtargetFeature <"promote-alloca",
				"EnablePromoteAlloca",
				"true",
				"Enable promote alloca pass"
				>;

				jvesely: git complains about a blank line at the end of the file here
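The parameterised classes in AMDGPUFeatures.td assemble record fields with the TableGen paste operator `#` and `!cast<string>`. The following is a rough C++ model of the strings they produce. The `FeatureRecord` struct and both helper functions are hypothetical, and the `Implies` list of `SubtargetFeatureGeneration` is left out.

```cpp
#include <string>

// Hypothetical mirror of a SubtargetFeature record: the feature name, the
// C++ member it sets, the value assigned, and the description.
struct FeatureRecord {
  std::string Name, Attribute, Value, Desc;
};

// Mirrors SubtargetFeatureLocalMemorySize<Value>:
// "localmemorysize"#Value for the name, !cast<string>(Value) for the value.
FeatureRecord localMemorySizeFeature(int Value) {
  std::string V = std::to_string(Value);
  return {"localmemorysize" + V, "LocalMemorySize", V,
          "The size of local memory in bytes"};
}

// Mirrors SubtargetFeatureGeneration<Value, Subtarget, Implies> without the
// Implies list: the value assigned to "Gen" is Subtarget#"::"#Value.
FeatureRecord generationFeature(const std::string &Value,
                                const std::string &Subtarget) {
  return {Value, "Gen", Subtarget + "::" + Value, Value + " GPU generation"};
}
```

For example, `SubtargetFeatureLocalMemorySize<32768>` yields the feature name `localmemorysize32768` with the value `"32768"` for the `LocalMemorySize` member.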

lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp

[96 lines hidden]
protected:		protected:
void SelectBuildVector(SDNode *N, unsigned RegClassID);		void SelectBuildVector(SDNode *N, unsigned RegClassID);

private:		private:
std::pair<SDValue, SDValue> foldFrameIndex(SDValue N) const;		std::pair<SDValue, SDValue> foldFrameIndex(SDValue N) const;
bool isNoNanSrc(SDValue N) const;		bool isNoNanSrc(SDValue N) const;
bool isInlineImmediate(const SDNode *N) const;		bool isInlineImmediate(const SDNode *N) const;

bool isConstantLoad(const MemSDNode *N, int cbID) const;
bool isUniformBr(const SDNode *N) const;		bool isUniformBr(const SDNode *N) const;

SDNode *glueCopyToM0(SDNode *N) const;		SDNode *glueCopyToM0(SDNode *N) const;

const TargetRegisterClass *getOperandRegClass(SDNode *N, unsigned OpNo) const;		const TargetRegisterClass *getOperandRegClass(SDNode *N, unsigned OpNo) const;
bool SelectGlobalValueConstantOffset(SDValue Addr, SDValue& IntPtr);
bool SelectGlobalValueVariableOffset(SDValue Addr, SDValue &BaseReg,
SDValue& Offset);
virtual bool SelectADDRVTX_READ(SDValue Addr, SDValue &Base, SDValue &Offset);		virtual bool SelectADDRVTX_READ(SDValue Addr, SDValue &Base, SDValue &Offset);
virtual bool SelectADDRIndirect(SDValue Addr, SDValue &Base, SDValue &Offset);		virtual bool SelectADDRIndirect(SDValue Addr, SDValue &Base, SDValue &Offset);
bool isDSOffsetLegal(const SDValue &Base, unsigned Offset,		bool isDSOffsetLegal(const SDValue &Base, unsigned Offset,
unsigned OffsetBits) const;		unsigned OffsetBits) const;
bool SelectDS1Addr1Offset(SDValue Ptr, SDValue &Base, SDValue &Offset) const;		bool SelectDS1Addr1Offset(SDValue Ptr, SDValue &Base, SDValue &Offset) const;
bool SelectDS64Bit4ByteAligned(SDValue Ptr, SDValue &Base, SDValue &Offset0,		bool SelectDS64Bit4ByteAligned(SDValue Ptr, SDValue &Base, SDValue &Offset0,
SDValue &Offset1) const;		SDValue &Offset1) const;
bool SelectMUBUF(SDValue Addr, SDValue &SRsrc, SDValue &VAddr,		bool SelectMUBUF(SDValue Addr, SDValue &SRsrc, SDValue &VAddr,
[98 lines hidden]	private:
void SelectATOMIC_CMP_SWAP(SDNode *N);		void SelectATOMIC_CMP_SWAP(SDNode *N);

protected:		protected:
// Include the pieces autogenerated from the target description.		// Include the pieces autogenerated from the target description.
#include "AMDGPUGenDAGISel.inc"		#include "AMDGPUGenDAGISel.inc"
};		};

class R600DAGToDAGISel : public AMDGPUDAGToDAGISel {		class R600DAGToDAGISel : public AMDGPUDAGToDAGISel {
		const R600Subtarget *Subtarget;
		AMDGPUAS AMDGPUASI;

		bool isConstantLoad(const MemSDNode *N, int cbID) const;
		bool SelectGlobalValueConstantOffset(SDValue Addr, SDValue& IntPtr);
		bool SelectGlobalValueVariableOffset(SDValue Addr, SDValue &BaseReg,
		SDValue& Offset);
public:		public:
explicit R600DAGToDAGISel(TargetMachine *TM, CodeGenOpt::Level OptLevel) :		explicit R600DAGToDAGISel(TargetMachine *TM, CodeGenOpt::Level OptLevel) :
AMDGPUDAGToDAGISel(TM, OptLevel) {}		AMDGPUDAGToDAGISel(TM, OptLevel) {
		AMDGPUASI = AMDGPU::getAMDGPUAS(*TM);
		}

void Select(SDNode *N) override;		void Select(SDNode *N) override;

bool SelectADDRIndirect(SDValue Addr, SDValue &Base,		bool SelectADDRIndirect(SDValue Addr, SDValue &Base,
SDValue &Offset) override;		SDValue &Offset) override;
bool SelectADDRVTX_READ(SDValue Addr, SDValue &Base,		bool SelectADDRVTX_READ(SDValue Addr, SDValue &Base,
SDValue &Offset) override;		SDValue &Offset) override;

		bool runOnMachineFunction(MachineFunction &MF) override;
		protected:
		// Include the pieces autogenerated from the target description.
		#include "R600GenDAGISel.inc"
};		};

} // end anonymous namespace		} // end anonymous namespace

INITIALIZE_PASS_BEGIN(AMDGPUDAGToDAGISel, "isel",		INITIALIZE_PASS_BEGIN(AMDGPUDAGToDAGISel, "isel",
"AMDGPU DAG->DAG Pattern Instruction Selection", false, false)		"AMDGPU DAG->DAG Pattern Instruction Selection", false, false)
INITIALIZE_PASS_DEPENDENCY(AMDGPUArgumentUsageInfo)		INITIALIZE_PASS_DEPENDENCY(AMDGPUArgumentUsageInfo)
INITIALIZE_PASS_END(AMDGPUDAGToDAGISel, "isel",		INITIALIZE_PASS_END(AMDGPUDAGToDAGISel, "isel",
[25 lines hidden]	bool AMDGPUDAGToDAGISel::isNoNanSrc(SDValue N) const {
// TODO: Move into isKnownNeverNaN		// TODO: Move into isKnownNeverNaN
if (N->getFlags().isDefined())		if (N->getFlags().isDefined())
return N->getFlags().hasNoNaNs();		return N->getFlags().hasNoNaNs();

return CurDAG->isKnownNeverNaN(N);		return CurDAG->isKnownNeverNaN(N);
}		}

bool AMDGPUDAGToDAGISel::isInlineImmediate(const SDNode *N) const {		bool AMDGPUDAGToDAGISel::isInlineImmediate(const SDNode *N) const {
const SIInstrInfo *TII		const SIInstrInfo *TII = Subtarget->getInstrInfo();
= static_cast<const SISubtarget *>(Subtarget)->getInstrInfo();

if (const ConstantSDNode *C = dyn_cast<ConstantSDNode>(N))		if (const ConstantSDNode *C = dyn_cast<ConstantSDNode>(N))
return TII->isInlineConstant(C->getAPIntValue());		return TII->isInlineConstant(C->getAPIntValue());

if (const ConstantFPSDNode *C = dyn_cast<ConstantFPSDNode>(N))		if (const ConstantFPSDNode *C = dyn_cast<ConstantFPSDNode>(N))
return TII->isInlineConstant(C->getValueAPF().bitcastToAPInt());		return TII->isInlineConstant(C->getValueAPF().bitcastToAPInt());

return false;		return false;
[100 lines hidden]	static bool getConstantValue(SDValue N, uint32_t &Out) {

return false;		return false;
}		}

void AMDGPUDAGToDAGISel::SelectBuildVector(SDNode *N, unsigned RegClassID) {		void AMDGPUDAGToDAGISel::SelectBuildVector(SDNode *N, unsigned RegClassID) {
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
unsigned NumVectorElts = VT.getVectorNumElements();		unsigned NumVectorElts = VT.getVectorNumElements();
EVT EltVT = VT.getVectorElementType();		EVT EltVT = VT.getVectorElementType();
const AMDGPURegisterInfo *TRI = Subtarget->getRegisterInfo();
SDLoc DL(N);		SDLoc DL(N);
SDValue RegClass = CurDAG->getTargetConstant(RegClassID, DL, MVT::i32);		SDValue RegClass = CurDAG->getTargetConstant(RegClassID, DL, MVT::i32);

if (NumVectorElts == 1) {		if (NumVectorElts == 1) {
CurDAG->SelectNodeTo(N, AMDGPU::COPY_TO_REGCLASS, EltVT, N->getOperand(0),		CurDAG->SelectNodeTo(N, AMDGPU::COPY_TO_REGCLASS, EltVT, N->getOperand(0),
RegClass);		RegClass);
return;		return;
}		}
[11 lines hidden]	void AMDGPUDAGToDAGISel::SelectBuildVector(SDNode *N, unsigned RegClassID) {
for (unsigned i = 0; i < NOps; i++) {		for (unsigned i = 0; i < NOps; i++) {
// XXX: Why is this here?		// XXX: Why is this here?
if (isa<RegisterSDNode>(N->getOperand(i))) {		if (isa<RegisterSDNode>(N->getOperand(i))) {
IsRegSeq = false;		IsRegSeq = false;
break;		break;
}		}
RegSeqArgs[1 + (2 * i)] = N->getOperand(i);		RegSeqArgs[1 + (2 * i)] = N->getOperand(i);
RegSeqArgs[1 + (2 * i) + 1] =		RegSeqArgs[1 + (2 * i) + 1] =
CurDAG->getTargetConstant(TRI->getSubRegFromChannel(i), DL,		CurDAG->getTargetConstant(AMDGPURegisterInfo::getSubRegFromChannel(i), DL,
MVT::i32);		MVT::i32);
}		}
if (NOps != NumVectorElts) {		if (NOps != NumVectorElts) {
// Fill in the missing undef elements if this was a scalar_to_vector.		// Fill in the missing undef elements if this was a scalar_to_vector.
assert(N->getOpcode() == ISD::SCALAR_TO_VECTOR && NOps < NumVectorElts);		assert(N->getOpcode() == ISD::SCALAR_TO_VECTOR && NOps < NumVectorElts);
MachineSDNode *ImpDef = CurDAG->getMachineNode(TargetOpcode::IMPLICIT_DEF,		MachineSDNode *ImpDef = CurDAG->getMachineNode(TargetOpcode::IMPLICIT_DEF,
DL, EltVT);		DL, EltVT);
for (unsigned i = NOps; i < NumVectorElts; ++i) {		for (unsigned i = NOps; i < NumVectorElts; ++i) {
RegSeqArgs[1 + (2 * i)] = SDValue(ImpDef, 0);		RegSeqArgs[1 + (2 * i)] = SDValue(ImpDef, 0);
RegSeqArgs[1 + (2 * i) + 1] =		RegSeqArgs[1 + (2 * i) + 1] =
CurDAG->getTargetConstant(TRI->getSubRegFromChannel(i), DL, MVT::i32);		CurDAG->getTargetConstant(AMDGPURegisterInfo::getSubRegFromChannel(i), DL, MVT::i32);
}		}
}		}

if (!IsRegSeq)		if (!IsRegSeq)
SelectCode(N);		SelectCode(N);
CurDAG->SelectNodeTo(N, AMDGPU::REG_SEQUENCE, N->getVTList(), RegSeqArgs);		CurDAG->SelectNodeTo(N, AMDGPU::REG_SEQUENCE, N->getVTList(), RegSeqArgs);
}		}
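`SelectBuildVector` lays out the `REG_SEQUENCE` operands as the register class followed by one (value, sub-register index) pair per channel. When the node is a `SCALAR_TO_VECTOR`, the missing lanes are padded with `IMPLICIT_DEF`. With this patch the sub-register index comes from the static `AMDGPURegisterInfo::getSubRegFromChannel(i)` rather than from `TRI`. A string-based sketch of that layout follows. The `"subN"` names stand in for the real sub-register indices, the helper is hypothetical, and the `RegisterSDNode` fallback to `SelectCode` is omitted.

```cpp
#include <string>
#include <vector>

// Hypothetical string model of the REG_SEQUENCE operand list:
// [RegClass, Val0, Sub0, Val1, Sub1, ...], sized 1 + 2 * NumVectorElts.
// Assumes Ops.size() <= NumVectorElts, as the original code asserts.
std::vector<std::string> buildRegSeqArgs(const std::string &RegClass,
                                         const std::vector<std::string> &Ops,
                                         unsigned NumVectorElts) {
  std::vector<std::string> Args(1 + 2 * NumVectorElts);
  Args[0] = RegClass;
  for (unsigned I = 0; I < NumVectorElts; ++I) {
    // Lanes past the supplied operands get an IMPLICIT_DEF value.
    Args[1 + 2 * I] = I < Ops.size() ? Ops[I] : "IMPLICIT_DEF";
    // "subN" stands in for getSubRegFromChannel(N).
    Args[1 + 2 * I + 1] = "sub" + std::to_string(I);
  }
  return Args;
}
```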

[182 lines hidden]	void AMDGPUDAGToDAGISel::Select(SDNode *N) {
case AMDGPUISD::ATOMIC_CMP_SWAP:		case AMDGPUISD::ATOMIC_CMP_SWAP:
SelectATOMIC_CMP_SWAP(N);		SelectATOMIC_CMP_SWAP(N);
return;		return;
}		}

SelectCode(N);		SelectCode(N);
}		}

bool AMDGPUDAGToDAGISel::isConstantLoad(const MemSDNode *N, int CbId) const {
if (!N->readMem())
return false;
if (CbId == -1)
return N->getAddressSpace() == AMDGPUASI.CONSTANT_ADDRESS ||
N->getAddressSpace() == AMDGPUASI.CONSTANT_ADDRESS_32BIT;

return N->getAddressSpace() == AMDGPUASI.CONSTANT_BUFFER_0 + CbId;
}

bool AMDGPUDAGToDAGISel::isUniformBr(const SDNode *N) const {		bool AMDGPUDAGToDAGISel::isUniformBr(const SDNode *N) const {
const BasicBlock *BB = FuncInfo->MBB->getBasicBlock();		const BasicBlock *BB = FuncInfo->MBB->getBasicBlock();
const Instruction *Term = BB->getTerminator();		const Instruction *Term = BB->getTerminator();
return Term->getMetadata("amdgpu.uniform") ||		return Term->getMetadata("amdgpu.uniform") ||
Term->getMetadata("structurizecfg.uniform");		Term->getMetadata("structurizecfg.uniform");
}		}

StringRef AMDGPUDAGToDAGISel::getPassName() const {		StringRef AMDGPUDAGToDAGISel::getPassName() const {
return "AMDGPU DAG->DAG Pattern Instruction Selection";		return "AMDGPU DAG->DAG Pattern Instruction Selection";
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Complex Patterns		// Complex Patterns
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

bool AMDGPUDAGToDAGISel::SelectGlobalValueConstantOffset(SDValue Addr,
SDValue& IntPtr) {
if (ConstantSDNode *Cst = dyn_cast<ConstantSDNode>(Addr)) {
IntPtr = CurDAG->getIntPtrConstant(Cst->getZExtValue() / 4, SDLoc(Addr),
true);
return true;
}
return false;
}

bool AMDGPUDAGToDAGISel::SelectGlobalValueVariableOffset(SDValue Addr,
SDValue& BaseReg, SDValue &Offset) {
if (!isa<ConstantSDNode>(Addr)) {
BaseReg = Addr;
Offset = CurDAG->getIntPtrConstant(0, SDLoc(Addr), true);
return true;
}
return false;
}

bool AMDGPUDAGToDAGISel::SelectADDRVTX_READ(SDValue Addr, SDValue &Base,		bool AMDGPUDAGToDAGISel::SelectADDRVTX_READ(SDValue Addr, SDValue &Base,
SDValue &Offset) {		SDValue &Offset) {
return false;		return false;
}		}

bool AMDGPUDAGToDAGISel::SelectADDRIndirect(SDValue Addr, SDValue &Base,		bool AMDGPUDAGToDAGISel::SelectADDRIndirect(SDValue Addr, SDValue &Base,
SDValue &Offset) {		SDValue &Offset) {
ConstantSDNode *C;		ConstantSDNode *C;
SDLoc DL(Addr);		SDLoc DL(Addr);

if ((C = dyn_cast<ConstantSDNode>(Addr))) {		if ((C = dyn_cast<ConstantSDNode>(Addr))) {
Base = CurDAG->getRegister(AMDGPU::INDIRECT_BASE_ADDR, MVT::i32);		Base = CurDAG->getRegister(R600::INDIRECT_BASE_ADDR, MVT::i32);
Offset = CurDAG->getTargetConstant(C->getZExtValue(), DL, MVT::i32);		Offset = CurDAG->getTargetConstant(C->getZExtValue(), DL, MVT::i32);
} else if ((Addr.getOpcode() == AMDGPUISD::DWORDADDR) &&		} else if ((Addr.getOpcode() == AMDGPUISD::DWORDADDR) &&
(C = dyn_cast<ConstantSDNode>(Addr.getOperand(0)))) {		(C = dyn_cast<ConstantSDNode>(Addr.getOperand(0)))) {
Base = CurDAG->getRegister(AMDGPU::INDIRECT_BASE_ADDR, MVT::i32);		Base = CurDAG->getRegister(R600::INDIRECT_BASE_ADDR, MVT::i32);
Offset = CurDAG->getTargetConstant(C->getZExtValue(), DL, MVT::i32);		Offset = CurDAG->getTargetConstant(C->getZExtValue(), DL, MVT::i32);
} else if ((Addr.getOpcode() == ISD::ADD || Addr.getOpcode() == ISD::OR) &&		} else if ((Addr.getOpcode() == ISD::ADD || Addr.getOpcode() == ISD::OR) &&
(C = dyn_cast<ConstantSDNode>(Addr.getOperand(1)))) {		(C = dyn_cast<ConstantSDNode>(Addr.getOperand(1)))) {
Base = Addr.getOperand(0);		Base = Addr.getOperand(0);
Offset = CurDAG->getTargetConstant(C->getZExtValue(), DL, MVT::i32);		Offset = CurDAG->getTargetConstant(C->getZExtValue(), DL, MVT::i32);
} else {		} else {
Base = Addr;		Base = Addr;
Offset = CurDAG->getTargetConstant(0, DL, MVT::i32);		Offset = CurDAG->getTargetConstant(0, DL, MVT::i32);
[1,440 lines hidden]	while (Position != CurDAG->allnodes_end()) {
ReplaceUses(Node, ResNode);		ReplaceUses(Node, ResNode);
IsModified = true;		IsModified = true;
}		}
}		}
CurDAG->RemoveDeadNodes();		CurDAG->RemoveDeadNodes();
} while (IsModified);		} while (IsModified);
}		}

		bool R600DAGToDAGISel::runOnMachineFunction(MachineFunction &MF) {
		Subtarget = &MF.getSubtarget<R600Subtarget>();
		return SelectionDAGISel::runOnMachineFunction(MF);
		}

		bool R600DAGToDAGISel::isConstantLoad(const MemSDNode *N, int CbId) const {
		if (!N->readMem())
		return false;
		if (CbId == -1)
		return N->getAddressSpace() == AMDGPUASI.CONSTANT_ADDRESS ||
		N->getAddressSpace() == AMDGPUASI.CONSTANT_ADDRESS_32BIT;

		return N->getAddressSpace() == AMDGPUASI.CONSTANT_BUFFER_0 + CbId;
		}

		bool R600DAGToDAGISel::SelectGlobalValueConstantOffset(SDValue Addr,
		SDValue& IntPtr) {
		if (ConstantSDNode *Cst = dyn_cast<ConstantSDNode>(Addr)) {
		IntPtr = CurDAG->getIntPtrConstant(Cst->getZExtValue() / 4, SDLoc(Addr),
		true);
		return true;
		}
		return false;
		}

		bool R600DAGToDAGISel::SelectGlobalValueVariableOffset(SDValue Addr,
		SDValue& BaseReg, SDValue &Offset) {
		if (!isa<ConstantSDNode>(Addr)) {
		BaseReg = Addr;
		Offset = CurDAG->getIntPtrConstant(0, SDLoc(Addr), true);
		return true;
		}
		return false;
		}
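The two complex-pattern helpers, now private to `R600DAGToDAGISel`, split addresses by kind. A constant byte address becomes a 32-bit word index (divided by 4), and anything else is matched as a base with a zero offset. The following is a scalar sketch of that split. The `Addr` type is a hypothetical stand-in for `SDValue`, and the two functions are illustrative, not LLVM API.

```cpp
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical address model: either a known constant byte address or an
// opaque value identified by a register number.
struct Addr {
  std::optional<std::uint64_t> Const; // set when the address is a constant
  unsigned Reg = 0;                   // stands in for a non-constant SDValue
};

// Mirrors SelectGlobalValueConstantOffset: constant byte address -> dword index.
std::optional<std::uint64_t> selectConstantOffset(const Addr &A) {
  if (!A.Const)
    return std::nullopt;
  return *A.Const / 4;
}

// Mirrors SelectGlobalValueVariableOffset: non-constant address -> (base, 0).
std::optional<std::pair<unsigned, std::uint64_t>>
selectVariableOffset(const Addr &A) {
  if (A.Const)
    return std::nullopt;
  return std::make_pair(A.Reg, std::uint64_t{0});
}
```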

void R600DAGToDAGISel::Select(SDNode *N) {		void R600DAGToDAGISel::Select(SDNode *N) {
unsigned int Opc = N->getOpcode();		unsigned int Opc = N->getOpcode();
if (N->isMachineOpcode()) {		if (N->isMachineOpcode()) {
N->setNodeId(-1);		N->setNodeId(-1);
return; // Already selected.		return; // Already selected.
}		}

switch (Opc) {		switch (Opc) {
default: break;		default: break;
case AMDGPUISD::BUILD_VERTICAL_VECTOR:		case AMDGPUISD::BUILD_VERTICAL_VECTOR:
case ISD::SCALAR_TO_VECTOR:		case ISD::SCALAR_TO_VECTOR:
case ISD::BUILD_VECTOR: {		case ISD::BUILD_VECTOR: {
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
unsigned NumVectorElts = VT.getVectorNumElements();		unsigned NumVectorElts = VT.getVectorNumElements();
unsigned RegClassID;		unsigned RegClassID;
// BUILD_VECTOR was lowered into an IMPLICIT_DEF + 4 INSERT_SUBREG		// BUILD_VECTOR was lowered into an IMPLICIT_DEF + 4 INSERT_SUBREG
// that adds a 128 bits reg copy when going through TwoAddressInstructions		// that adds a 128 bits reg copy when going through TwoAddressInstructions
// pass. We want to avoid 128 bits copies as much as possible because they		// pass. We want to avoid 128 bits copies as much as possible because they
// can't be bundled by our scheduler.		// can't be bundled by our scheduler.
switch(NumVectorElts) {		switch(NumVectorElts) {
case 2: RegClassID = AMDGPU::R600_Reg64RegClassID; break;		case 2: RegClassID = R600::R600_Reg64RegClassID; break;
case 4:		case 4:
if (Opc == AMDGPUISD::BUILD_VERTICAL_VECTOR)		if (Opc == AMDGPUISD::BUILD_VERTICAL_VECTOR)
RegClassID = AMDGPU::R600_Reg128VerticalRegClassID;		RegClassID = R600::R600_Reg128VerticalRegClassID;
else		else
RegClassID = AMDGPU::R600_Reg128RegClassID;		RegClassID = R600::R600_Reg128RegClassID;
break;		break;
default: llvm_unreachable("Do not know how to lower this BUILD_VECTOR");		default: llvm_unreachable("Do not know how to lower this BUILD_VECTOR");
}		}
SelectBuildVector(N, RegClassID);		SelectBuildVector(N, RegClassID);
return;		return;
}		}
}		}

SelectCode(N);		SelectCode(N);
}		}
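`R600DAGToDAGISel::Select` picks the register class for vector-building nodes from the element count alone. The vertical variant is used only for a 4-element `BUILD_VERTICAL_VECTOR`. A hypothetical helper that returns the class names as strings makes the mapping explicit. It throws where the original calls `llvm_unreachable`.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical mirror of the register-class choice made for BUILD_VECTOR,
// SCALAR_TO_VECTOR and BUILD_VERTICAL_VECTOR nodes.
std::string r600BuildVectorRegClass(unsigned NumVectorElts, bool IsVertical) {
  switch (NumVectorElts) {
  case 2:
    return "R600_Reg64"; // the vertical flag is not consulted for 2 elements
  case 4:
    return IsVertical ? "R600_Reg128Vertical" : "R600_Reg128";
  default:
    // The original code calls llvm_unreachable here.
    throw std::invalid_argument("Do not know how to lower this BUILD_VECTOR");
  }
}
```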

bool R600DAGToDAGISel::SelectADDRIndirect(SDValue Addr, SDValue &Base,		bool R600DAGToDAGISel::SelectADDRIndirect(SDValue Addr, SDValue &Base,
SDValue &Offset) {		SDValue &Offset) {
ConstantSDNode *C;		ConstantSDNode *C;
SDLoc DL(Addr);		SDLoc DL(Addr);

if ((C = dyn_cast<ConstantSDNode>(Addr))) {		if ((C = dyn_cast<ConstantSDNode>(Addr))) {
Base = CurDAG->getRegister(AMDGPU::INDIRECT_BASE_ADDR, MVT::i32);		Base = CurDAG->getRegister(R600::INDIRECT_BASE_ADDR, MVT::i32);
Offset = CurDAG->getTargetConstant(C->getZExtValue(), DL, MVT::i32);		Offset = CurDAG->getTargetConstant(C->getZExtValue(), DL, MVT::i32);
} else if ((Addr.getOpcode() == AMDGPUISD::DWORDADDR) &&		} else if ((Addr.getOpcode() == AMDGPUISD::DWORDADDR) &&
(C = dyn_cast<ConstantSDNode>(Addr.getOperand(0)))) {		(C = dyn_cast<ConstantSDNode>(Addr.getOperand(0)))) {
Base = CurDAG->getRegister(AMDGPU::INDIRECT_BASE_ADDR, MVT::i32);		Base = CurDAG->getRegister(R600::INDIRECT_BASE_ADDR, MVT::i32);
Offset = CurDAG->getTargetConstant(C->getZExtValue(), DL, MVT::i32);		Offset = CurDAG->getTargetConstant(C->getZExtValue(), DL, MVT::i32);
} else if ((Addr.getOpcode() == ISD::ADD || Addr.getOpcode() == ISD::OR) &&		} else if ((Addr.getOpcode() == ISD::ADD || Addr.getOpcode() == ISD::OR) &&
(C = dyn_cast<ConstantSDNode>(Addr.getOperand(1)))) {		(C = dyn_cast<ConstantSDNode>(Addr.getOperand(1)))) {
Base = Addr.getOperand(0);		Base = Addr.getOperand(0);
Offset = CurDAG->getTargetConstant(C->getZExtValue(), DL, MVT::i32);		Offset = CurDAG->getTargetConstant(C->getZExtValue(), DL, MVT::i32);
} else {		} else {
Base = Addr;		Base = Addr;
Offset = CurDAG->getTargetConstant(0, DL, MVT::i32);		Offset = CurDAG->getTargetConstant(0, DL, MVT::i32);
[14 lines hidden]	if (Addr.getOpcode() == ISD::ADD
Offset = CurDAG->getTargetConstant(IMMOffset->getZExtValue(), SDLoc(Addr),		Offset = CurDAG->getTargetConstant(IMMOffset->getZExtValue(), SDLoc(Addr),
MVT::i32);		MVT::i32);
return true;		return true;
// If the pointer address is constant, we can move it to the offset field.		// If the pointer address is constant, we can move it to the offset field.
} else if ((IMMOffset = dyn_cast<ConstantSDNode>(Addr))		} else if ((IMMOffset = dyn_cast<ConstantSDNode>(Addr))
&& isInt<16>(IMMOffset->getZExtValue())) {		&& isInt<16>(IMMOffset->getZExtValue())) {
Base = CurDAG->getCopyFromReg(CurDAG->getEntryNode(),		Base = CurDAG->getCopyFromReg(CurDAG->getEntryNode(),
SDLoc(CurDAG->getEntryNode()),		SDLoc(CurDAG->getEntryNode()),
AMDGPU::ZERO, MVT::i32);		R600::ZERO, MVT::i32);
Offset = CurDAG->getTargetConstant(IMMOffset->getZExtValue(), SDLoc(Addr),		Offset = CurDAG->getTargetConstant(IMMOffset->getZExtValue(), SDLoc(Addr),
MVT::i32);		MVT::i32);
return true;		return true;
}		}

// Default case, no offset		// Default case, no offset
Base = Addr;		Base = Addr;
Offset = CurDAG->getTargetConstant(0, SDLoc(Addr), MVT::i32);		Offset = CurDAG->getTargetConstant(0, SDLoc(Addr), MVT::i32);
return true;		return true;
}		}
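`isConstantLoad` moved from `AMDGPUDAGToDAGISel` to `R600DAGToDAGISel`. A `CbId` of -1 accepts either constant address space, while any other `CbId` requires the load to target constant buffer `CbId`. The sketch below reproduces that check on plain integers. The numeric address-space values are placeholders, not the values `AMDGPU::getAMDGPUAS()` actually returns, and the `AddressSpaces` struct is hypothetical.

```cpp
// Hypothetical address-space numbering; placeholder values only.
struct AddressSpaces {
  unsigned CONSTANT_ADDRESS = 4;
  unsigned CONSTANT_ADDRESS_32BIT = 6;
  unsigned CONSTANT_BUFFER_0 = 8;
};

// Mirrors R600DAGToDAGISel::isConstantLoad on plain values: ReadsMem plays
// the role of N->readMem(), AS the role of N->getAddressSpace().
bool isConstantLoad(bool ReadsMem, unsigned AS, int CbId,
                    const AddressSpaces &ASI) {
  if (!ReadsMem)
    return false;
  if (CbId == -1) // any constant address space
    return AS == ASI.CONSTANT_ADDRESS || AS == ASI.CONSTANT_ADDRESS_32BIT;
  // A specific constant buffer: the address space must be buffer 0 + CbId.
  return AS == ASI.CONSTANT_BUFFER_0 + static_cast<unsigned>(CbId);
}
```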

lib/Target/AMDGPU/AMDGPUISelLowering.h

[17 lines hidden]

#include "AMDGPU.h"		#include "AMDGPU.h"
#include "llvm/CodeGen/CallingConvLower.h"		#include "llvm/CodeGen/CallingConvLower.h"
#include "llvm/CodeGen/TargetLowering.h"		#include "llvm/CodeGen/TargetLowering.h"

namespace llvm {		namespace llvm {

class AMDGPUMachineFunction;		class AMDGPUMachineFunction;
class AMDGPUSubtarget;		class AMDGPUCommonSubtarget;
struct ArgDescriptor;		struct ArgDescriptor;

class AMDGPUTargetLowering : public TargetLowering {		class AMDGPUTargetLowering : public TargetLowering {
private:		private:
		const AMDGPUCommonSubtarget *Subtarget;

/// \returns AMDGPUISD::FFBH_U32 node if the incoming \p Op may have been		/// \returns AMDGPUISD::FFBH_U32 node if the incoming \p Op may have been
/// legalized from a smaller type VT. Need to match pre-legalized type because		/// legalized from a smaller type VT. Need to match pre-legalized type because
/// the generic legalization inserts the add/sub between the select and		/// the generic legalization inserts the add/sub between the select and
/// compare.		/// compare.
SDValue getFFBX_U32(SelectionDAG &DAG, SDValue Op, const SDLoc &DL, unsigned Opc) const;		SDValue getFFBX_U32(SelectionDAG &DAG, SDValue Op, const SDLoc &DL, unsigned Opc) const;

public:		public:
static unsigned numBitsUnsigned(SDValue Op, SelectionDAG &DAG);		static unsigned numBitsUnsigned(SDValue Op, SelectionDAG &DAG);
static unsigned numBitsSigned(SDValue Op, SelectionDAG &DAG);		static unsigned numBitsSigned(SDValue Op, SelectionDAG &DAG);

protected:		protected:
const AMDGPUSubtarget *Subtarget;
AMDGPUAS AMDGPUASI;		AMDGPUAS AMDGPUASI;

SDValue LowerEXTRACT_SUBVECTOR(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerEXTRACT_SUBVECTOR(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerCONCAT_VECTORS(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerCONCAT_VECTORS(SDValue Op, SelectionDAG &DAG) const;
/// \brief Split a vector store into multiple scalar stores.		/// \brief Split a vector store into multiple scalar stores.
/// \returns The resulting chain.		/// \returns The resulting chain.

SDValue LowerFREM(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFREM(SDValue Op, SelectionDAG &DAG) const;
[67 lines hidden]	protected:
SDValue LowerSDIVREM(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerSDIVREM(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerUDIVREM(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerUDIVREM(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerDIVREM24(SDValue Op, SelectionDAG &DAG, bool sign) const;		SDValue LowerDIVREM24(SDValue Op, SelectionDAG &DAG, bool sign) const;
void LowerUDIVREM64(SDValue Op, SelectionDAG &DAG,		void LowerUDIVREM64(SDValue Op, SelectionDAG &DAG,
SmallVectorImpl<SDValue> &Results) const;		SmallVectorImpl<SDValue> &Results) const;
void analyzeFormalArgumentsCompute(CCState &State,		void analyzeFormalArgumentsCompute(CCState &State,
const SmallVectorImpl<ISD::InputArg> &Ins) const;		const SmallVectorImpl<ISD::InputArg> &Ins) const;
public:		public:
AMDGPUTargetLowering(const TargetMachine &TM, const AMDGPUSubtarget &STI);		AMDGPUTargetLowering(const TargetMachine &TM, const AMDGPUCommonSubtarget &STI);

bool mayIgnoreSignedZero(SDValue Op) const {		bool mayIgnoreSignedZero(SDValue Op) const {
if (getTargetMachine().Options.NoSignedZerosFPMath)		if (getTargetMachine().Options.NoSignedZerosFPMath)
return true;		return true;

const auto Flags = Op.getNode()->getFlags();		const auto Flags = Op.getNode()->getFlags();
if (Flags.isDefined())		if (Flags.isDefined())
return Flags.hasNoSignedZeros();		return Flags.hasNoSignedZeros();
[446 lines hidden]

lib/Target/AMDGPU/AMDGPUISelLowering.cpp

[145 lines hidden]	unsigned AMDGPUTargetLowering::numBitsSigned(SDValue Op, SelectionDAG &DAG) {
EVT VT = Op.getValueType();		EVT VT = Op.getValueType();

// In order for this to be a signed 24-bit value, bit 23, must		// In order for this to be a signed 24-bit value, bit 23, must
// be a sign bit.		// be a sign bit.
return VT.getSizeInBits() - DAG.ComputeNumSignBits(Op);		return VT.getSizeInBits() - DAG.ComputeNumSignBits(Op);
}		}
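`numBitsSigned` returns the type width minus the number of redundant sign bits. For a known `i32` constant, `ComputeNumSignBits` is the count of leading bits equal to the sign bit (the sign bit included). The computation can therefore be sketched without a `SelectionDAG`. Both helper names are hypothetical.

```cpp
#include <cstdint>

// Counts the leading bits of a 32-bit value that equal its sign bit (the sign
// bit itself included), i.e. the sign-bit count for a known i32 constant.
unsigned numSignBits32(std::int32_t V) {
  std::uint32_t U = static_cast<std::uint32_t>(V);
  std::uint32_t Sign = U >> 31;
  unsigned N = 1;
  while (N < 32 && ((U >> (31 - N)) & 1u) == Sign)
    ++N;
  return N;
}

// Mirrors numBitsSigned for a constant: width minus the sign-bit count.
unsigned numBitsSigned32(std::int32_t V) { return 32 - numSignBits32(V); }
```

Under this definition, a 32-bit value fits in a signed 24-bit field exactly when the result is below 24. This matches the source comment that bit 23 must be a sign bit.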

AMDGPUTargetLowering::AMDGPUTargetLowering(const TargetMachine &TM,		AMDGPUTargetLowering::AMDGPUTargetLowering(const TargetMachine &TM,
const AMDGPUSubtarget &STI)		const AMDGPUCommonSubtarget &STI)
: TargetLowering(TM), Subtarget(&STI) {		: TargetLowering(TM), Subtarget(&STI) {
AMDGPUASI = AMDGPU::getAMDGPUAS(TM);		AMDGPUASI = AMDGPU::getAMDGPUAS(TM);
// Lower floating point store/load to integer store/load to reduce the number		// Lower floating point store/load to integer store/load to reduce the number
// of patterns in tablegen.		// of patterns in tablegen.
setOperationAction(ISD::LOAD, MVT::f32, Promote);		setOperationAction(ISD::LOAD, MVT::f32, Promote);
AddPromotedToType(ISD::LOAD, MVT::f32, MVT::i32);		AddPromotedToType(ISD::LOAD, MVT::f32, MVT::i32);

setOperationAction(ISD::LOAD, MVT::v2f32, Promote);		setOperationAction(ISD::LOAD, MVT::v2f32, Promote);
(187 lines not shown) [context: AMDGPUTargetLowering::AMDGPUTargetLowering(const TargetMachine &TM, ...)]
setOperationAction(ISD::CONCAT_VECTORS, MVT::v8f32, Custom);		setOperationAction(ISD::CONCAT_VECTORS, MVT::v8f32, Custom);
setOperationAction(ISD::EXTRACT_SUBVECTOR, MVT::v2f32, Custom);		setOperationAction(ISD::EXTRACT_SUBVECTOR, MVT::v2f32, Custom);
setOperationAction(ISD::EXTRACT_SUBVECTOR, MVT::v2i32, Custom);		setOperationAction(ISD::EXTRACT_SUBVECTOR, MVT::v2i32, Custom);
setOperationAction(ISD::EXTRACT_SUBVECTOR, MVT::v4f32, Custom);		setOperationAction(ISD::EXTRACT_SUBVECTOR, MVT::v4f32, Custom);
setOperationAction(ISD::EXTRACT_SUBVECTOR, MVT::v4i32, Custom);		setOperationAction(ISD::EXTRACT_SUBVECTOR, MVT::v4i32, Custom);
setOperationAction(ISD::EXTRACT_SUBVECTOR, MVT::v8f32, Custom);		setOperationAction(ISD::EXTRACT_SUBVECTOR, MVT::v8f32, Custom);
setOperationAction(ISD::EXTRACT_SUBVECTOR, MVT::v8i32, Custom);		setOperationAction(ISD::EXTRACT_SUBVECTOR, MVT::v8i32, Custom);

if (Subtarget->getGeneration() < AMDGPUSubtarget::SEA_ISLANDS) {
setOperationAction(ISD::FCEIL, MVT::f64, Custom);
setOperationAction(ISD::FTRUNC, MVT::f64, Custom);
setOperationAction(ISD::FRINT, MVT::f64, Custom);
setOperationAction(ISD::FFLOOR, MVT::f64, Custom);
}

if (!Subtarget->hasBFI()) {		if (!Subtarget->hasBFI()) {
// fcopysign can be done in a single instruction with BFI.		// fcopysign can be done in a single instruction with BFI.
setOperationAction(ISD::FCOPYSIGN, MVT::f32, Expand);		setOperationAction(ISD::FCOPYSIGN, MVT::f32, Expand);
setOperationAction(ISD::FCOPYSIGN, MVT::f64, Expand);		setOperationAction(ISD::FCOPYSIGN, MVT::f64, Expand);
}		}

setOperationAction(ISD::FP16_TO_FP, MVT::f64, Expand);		setOperationAction(ISD::FP16_TO_FP, MVT::f64, Expand);
setOperationAction(ISD::FP_TO_FP16, MVT::f64, Custom);		setOperationAction(ISD::FP_TO_FP16, MVT::f64, Custom);
(395 lines not shown) [context: case ISD::INTRINSIC_WO_CHAIN:]
return true;		return true;
}		}
}		}
break;		break;
case ISD::LOAD:		case ISD::LOAD:
{		{
const LoadSDNode * L = dyn_cast<LoadSDNode>(N);		const LoadSDNode * L = dyn_cast<LoadSDNode>(N);
if (L->getMemOperand()->getAddrSpace()		if (L->getMemOperand()->getAddrSpace()
== Subtarget->getAMDGPUAS().CONSTANT_ADDRESS_32BIT)		== AMDGPUASI.CONSTANT_ADDRESS_32BIT)
return true;		return true;
return false;		return false;
}		}
break;		break;
}		}
}		}

bool AMDGPUTargetLowering::isSDNodeSourceOfDivergence(const SDNode * N,		bool AMDGPUTargetLowering::isSDNodeSourceOfDivergence(const SDNode * N,
(34 lines not shown) [context: case ISD::CopyFromReg:]
}		}
return !DA || DA->isDivergent(FLI->getValueFromVirtualReg(Reg));		return !DA || DA->isDivergent(FLI->getValueFromVirtualReg(Reg));
}		}
}		}
break;		break;
case ISD::LOAD: {		case ISD::LOAD: {
const LoadSDNode *L = dyn_cast<LoadSDNode>(N);		const LoadSDNode *L = dyn_cast<LoadSDNode>(N);
if (L->getMemOperand()->getAddrSpace() ==		if (L->getMemOperand()->getAddrSpace() ==
Subtarget->getAMDGPUAS().PRIVATE_ADDRESS)		AMDGPUASI.PRIVATE_ADDRESS)
return true;		return true;
} break;		} break;
case ISD::CALLSEQ_END:		case ISD::CALLSEQ_END:
return true;		return true;
break;		break;
case ISD::INTRINSIC_WO_CHAIN:		case ISD::INTRINSIC_WO_CHAIN:
{		{

(3,461 lines not shown) [context: else]
Known.Zero.setHighBits(32 - MaxValBits);		Known.Zero.setHighBits(32 - MaxValBits);
break;		break;
}		}
case ISD::INTRINSIC_WO_CHAIN: {		case ISD::INTRINSIC_WO_CHAIN: {
unsigned IID = cast<ConstantSDNode>(Op.getOperand(0))->getZExtValue();		unsigned IID = cast<ConstantSDNode>(Op.getOperand(0))->getZExtValue();
switch (IID) {		switch (IID) {
case Intrinsic::amdgcn_mbcnt_lo:		case Intrinsic::amdgcn_mbcnt_lo:
case Intrinsic::amdgcn_mbcnt_hi: {		case Intrinsic::amdgcn_mbcnt_hi: {
		const SISubtarget &ST =
		DAG.getMachineFunction().getSubtarget<SISubtarget>();
// These return at most the wavefront size - 1.		// These return at most the wavefront size - 1.
unsigned Size = Op.getValueType().getSizeInBits();		unsigned Size = Op.getValueType().getSizeInBits();
Known.Zero.setHighBits(Size - Subtarget->getWavefrontSizeLog2());		Known.Zero.setHighBits(Size - ST.getWavefrontSizeLog2());
break;		break;
}		}
default:		default:
break;		break;
}		}
}		}
}		}
}		}
(34 lines not shown)
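The `amdgcn_mbcnt_lo`/`amdgcn_mbcnt_hi` case marks the high bits of the result as known zero, because the count never exceeds the wavefront size minus one. The sketch below builds the resulting mask for a 32-bit result. It assumes a plain `uint32_t` in place of LLVM's `KnownBits`, and `mbcntKnownZeroMask` is a hypothetical name. With a 64-lane wavefront (log2 = 6), the top 26 bits are known zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for Known.Zero.setHighBits(Size - WavefrontSizeLog2)
// on a 32-bit result: returns a mask with the top (32 - WavefrontSizeLog2)
// bits set, i.e. the bits that can never be 1 in an mbcnt result.
uint32_t mbcntKnownZeroMask(unsigned WavefrontSizeLog2) {
  unsigned HighBits = 32 - WavefrontSizeLog2;
  if (HighBits == 0)
    return 0; // avoid an out-of-range shift by 32
  return ~uint32_t(0) << (32 - HighBits);
}
```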

lib/Target/AMDGPU/AMDGPUInstrInfo.h

	(14 lines not shown)

	#ifndef LLVM_LIB_TARGET_AMDGPU_AMDGPUINSTRINFO_H			#ifndef LLVM_LIB_TARGET_AMDGPU_AMDGPUINSTRINFO_H
	#define LLVM_LIB_TARGET_AMDGPU_AMDGPUINSTRINFO_H			#define LLVM_LIB_TARGET_AMDGPU_AMDGPUINSTRINFO_H

	#include "AMDGPU.h"			#include "AMDGPU.h"
	#include "Utils/AMDGPUBaseInfo.h"			#include "Utils/AMDGPUBaseInfo.h"
	#include "llvm/CodeGen/TargetInstrInfo.h"			#include "llvm/CodeGen/TargetInstrInfo.h"

	#define GET_INSTRINFO_HEADER
	#include "AMDGPUGenInstrInfo.inc"
	#undef GET_INSTRINFO_HEADER

	namespace llvm {			namespace llvm {

	class AMDGPUSubtarget;			class AMDGPUSubtarget;
	class MachineFunction;			class MachineFunction;
	class MachineInstr;			class MachineInstr;
	class MachineInstrBuilder;			class MachineInstrBuilder;

	class AMDGPUInstrInfo : public AMDGPUGenInstrInfo {			class AMDGPUInstrInfo {
	private:			private:
	const AMDGPUSubtarget &ST;			const AMDGPUSubtarget &ST;

	virtual void anchor();			// virtual void anchor();
	protected:
	AMDGPUAS AMDGPUASI;

	public:			public:
	explicit AMDGPUInstrInfo(const AMDGPUSubtarget &st);			explicit AMDGPUInstrInfo(const AMDGPUSubtarget &st);

	bool shouldScheduleLoadsNear(SDNode *Load1, SDNode *Load2,
	int64_t Offset1, int64_t Offset2,
	unsigned NumLoads) const override;

	/// \brief Return a target-specific opcode if Opcode is a pseudo instruction.			/// \brief Return a target-specific opcode if Opcode is a pseudo instruction.
	/// Return -1 if the target-specific opcode for the pseudo instruction does			/// Return -1 if the target-specific opcode for the pseudo instruction does
	/// not exist. If Opcode is not a pseudo instruction, this is identity.			/// not exist. If Opcode is not a pseudo instruction, this is identity.
	int pseudoToMCOpcode(int Opcode) const;			int pseudoToMCOpcode(int Opcode) const;

	static bool isUniformMMO(const MachineMemOperand *MMO);			static bool isUniformMMO(const MachineMemOperand *MMO);
	};			};

	(19 lines not shown)

lib/Target/AMDGPU/AMDGPUInstrInfo.cpp

	(17 lines not shown)
	#include "AMDGPUTargetMachine.h"			#include "AMDGPUTargetMachine.h"
	#include "MCTargetDesc/AMDGPUMCTargetDesc.h"			#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
	#include "llvm/CodeGen/MachineFrameInfo.h"			#include "llvm/CodeGen/MachineFrameInfo.h"
	#include "llvm/CodeGen/MachineInstrBuilder.h"			#include "llvm/CodeGen/MachineInstrBuilder.h"
	#include "llvm/CodeGen/MachineRegisterInfo.h"			#include "llvm/CodeGen/MachineRegisterInfo.h"

	using namespace llvm;			using namespace llvm;

	#define GET_INSTRINFO_CTOR_DTOR
	#include "AMDGPUGenInstrInfo.inc"

	namespace llvm {
	namespace AMDGPU {
	#define GET_RSRCINTRINSIC_IMPL
	#include "AMDGPUGenSearchableTables.inc"

	#define GET_D16IMAGEDIMINTRINSIC_IMPL
	#include "AMDGPUGenSearchableTables.inc"
	}
	}

	// Pin the vtable to this file.			// Pin the vtable to this file.
	void AMDGPUInstrInfo::anchor() {}			//void AMDGPUInstrInfo::anchor() {}

	AMDGPUInstrInfo::AMDGPUInstrInfo(const AMDGPUSubtarget &ST)			AMDGPUInstrInfo::AMDGPUInstrInfo(const AMDGPUSubtarget &ST)
	: AMDGPUGenInstrInfo(AMDGPU::ADJCALLSTACKUP, AMDGPU::ADJCALLSTACKDOWN),			: ST(ST) { }
	ST(ST),
	AMDGPUASI(ST.getAMDGPUAS()) {}

	// FIXME: This behaves strangely. If, for example, you have 32 load + stores,
	// the first 16 loads will be interleaved with the stores, and the next 16 will
	// be clustered as expected. It should really split into 2 16 store batches.
	//
	// Loads are clustered until this returns false, rather than trying to schedule
	// groups of stores. This also means we have to deal with saying different
	// address space loads should be clustered, and ones which might cause bank
	// conflicts.
	//
	// This might be deprecated so it might not be worth that much effort to fix.
	bool AMDGPUInstrInfo::shouldScheduleLoadsNear(SDNode *Load0, SDNode *Load1,
	int64_t Offset0, int64_t Offset1,
	unsigned NumLoads) const {
	assert(Offset1 > Offset0 &&
	"Second offset should be larger than first offset!");
	// If we have less than 16 loads in a row, and the offsets are within 64
	// bytes, then schedule together.

	// A cacheline is 64 bytes (for global memory).
	return (NumLoads <= 16 && (Offset1 - Offset0) < 64);
	}
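The clustering heuristic above reduces to a single predicate on the load count and the byte distance between the two offsets. The restatement below uses plain integers instead of `SDNode`s. The 16-load limit and the 64-byte global-memory cacheline are taken from the code, and `shouldClusterLoads` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Restates the return condition of shouldScheduleLoadsNear: cluster at most
// 16 loads, and only when the two offsets are less than 64 bytes apart.
bool shouldClusterLoads(int64_t Offset0, int64_t Offset1, unsigned NumLoads) {
  assert(Offset1 > Offset0 && "Second offset should be larger than first offset!");
  return NumLoads <= 16 && (Offset1 - Offset0) < 64;
}
```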

	// This must be kept in sync with the SIEncodingFamily class in SIInstrInfo.td
	enum SIEncodingFamily {
	SI = 0,
	VI = 1,
	SDWA = 2,
	SDWA9 = 3,
	GFX80 = 4,
	GFX9 = 5
	};

	static SIEncodingFamily subtargetEncodingFamily(const AMDGPUSubtarget &ST) {
	switch (ST.getGeneration()) {
	case AMDGPUSubtarget::SOUTHERN_ISLANDS:
	case AMDGPUSubtarget::SEA_ISLANDS:
	return SIEncodingFamily::SI;
	case AMDGPUSubtarget::VOLCANIC_ISLANDS:
	case AMDGPUSubtarget::GFX9:
	return SIEncodingFamily::VI;

	// FIXME: This should never be called for r600 GPUs.
	case AMDGPUSubtarget::R600:
	case AMDGPUSubtarget::R700:
	case AMDGPUSubtarget::EVERGREEN:
	case AMDGPUSubtarget::NORTHERN_ISLANDS:
	return SIEncodingFamily::SI;
	}

				arsenm: Commented out code
	llvm_unreachable("Unknown subtarget generation!");
	}
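`subtargetEncodingFamily` collapses the hardware generations into two base encoding families. The R600-era generations map to `SI` only as a fallback, as the FIXME notes, not because they share that encoding. The reduced sketch below reproduces the switch, using hypothetical local enums in place of `AMDGPUSubtarget::Generation` and `SIEncodingFamily`.

```cpp
#include <cassert>

// Hypothetical local copies of the enums used in the diff.
enum class Generation {
  R600, R700, EVERGREEN, NORTHERN_ISLANDS,
  SOUTHERN_ISLANDS, SEA_ISLANDS, VOLCANIC_ISLANDS, GFX9
};
enum class EncodingFamily { SI = 0, VI = 1, SDWA = 2, SDWA9 = 3, GFX80 = 4, GFX9 = 5 };

// Mirrors subtargetEncodingFamily: Southern and Sea Islands use the SI
// encoding, Volcanic Islands and GFX9 share the VI encoding, and the
// R600-era generations also fall back to SI (see the FIXME).
EncodingFamily baseEncodingFamily(Generation G) {
  switch (G) {
  case Generation::SOUTHERN_ISLANDS:
  case Generation::SEA_ISLANDS:
    return EncodingFamily::SI;
  case Generation::VOLCANIC_ISLANDS:
  case Generation::GFX9:
    return EncodingFamily::VI;
  case Generation::R600:
  case Generation::R700:
  case Generation::EVERGREEN:
  case Generation::NORTHERN_ISLANDS:
    return EncodingFamily::SI;
  }
  return EncodingFamily::SI; // not reached; the original uses llvm_unreachable
}
```

`pseudoToMCOpcode` may then override this base family, for example with `SDWA`/`SDWA9` for SDWA instructions.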

	int AMDGPUInstrInfo::pseudoToMCOpcode(int Opcode) const {
	SIEncodingFamily Gen = subtargetEncodingFamily(ST);

	if ((get(Opcode).TSFlags & SIInstrFlags::renamedInGFX9) != 0 &&
	ST.getGeneration() >= AMDGPUSubtarget::GFX9)
	Gen = SIEncodingFamily::GFX9;

	if (get(Opcode).TSFlags & SIInstrFlags::SDWA)
	Gen = ST.getGeneration() == AMDGPUSubtarget::GFX9 ? SIEncodingFamily::SDWA9
	: SIEncodingFamily::SDWA;
	// Adjust the encoding family to GFX80 for D16 buffer instructions when the
	// subtarget has UnpackedD16VMem feature.
	// TODO: remove this when we discard GFX80 encoding.
	if (ST.hasUnpackedD16VMem() && (get(Opcode).TSFlags & SIInstrFlags::D16)
	&& !(get(Opcode).TSFlags & SIInstrFlags::MIMG))
	Gen = SIEncodingFamily::GFX80;

	int MCOp = AMDGPU::getMCOpcode(Opcode, Gen);

	// -1 means that Opcode is already a native instruction.
	if (MCOp == -1)
	return Opcode;

	// (uint16_t)-1 means that Opcode is a pseudo instruction that has
	// no encoding in the given subtarget generation.
	if (MCOp == (uint16_t)-1)
	return -1;

	return MCOp;
	}
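`pseudoToMCOpcode` distinguishes two different "-1" values returned by the opcode lookup. `AMDGPU::getMCOpcode` is generated by TableGen, so the sketch below simply takes its result as an `int` parameter. The function name `interpretMCOpcode` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Interprets a lookup result the same way pseudoToMCOpcode does:
//   -1            -> Opcode is already a native instruction; keep it.
//   (uint16_t)-1  -> a pseudo with no encoding for this generation; fail (-1).
//   anything else -> the target-specific MC opcode.
int interpretMCOpcode(int Opcode, int LookupResult) {
  if (LookupResult == -1)
    return Opcode;
  if (LookupResult == static_cast<uint16_t>(-1)) // 0xFFFF after promotion
    return -1;
  return LookupResult;
}
```

The 16-bit sentinel shows up as 65535 rather than -1 when compared as an `int`. This is why the code checks `MCOp == (uint16_t)-1` separately from `MCOp == -1`.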

	// TODO: Should largely merge with AMDGPUTTIImpl::isSourceOfDivergence.			// TODO: Should largely merge with AMDGPUTTIImpl::isSourceOfDivergence.
	bool AMDGPUInstrInfo::isUniformMMO(const MachineMemOperand *MMO) {			bool AMDGPUInstrInfo::isUniformMMO(const MachineMemOperand *MMO) {
	const Value *Ptr = MMO->getValue();			const Value *Ptr = MMO->getValue();
	// UndefValue means this is a load of a kernel input. These are uniform.			// UndefValue means this is a load of a kernel input. These are uniform.
	// Sometimes LDS instructions have constant pointers.			// Sometimes LDS instructions have constant pointers.
	// If Ptr is null, then that means this mem operand contains a			// If Ptr is null, then that means this mem operand contains a
	// PseudoSourceValue like GOT.			// PseudoSourceValue like GOT.
	(13 lines not shown)

lib/Target/AMDGPU/AMDGPUInstructions.td

(36 lines not shown)
}		}

class AMDGPUShaderInst <dag outs, dag ins, string asm = "",		class AMDGPUShaderInst <dag outs, dag ins, string asm = "",
list<dag> pattern = []> : AMDGPUInst<outs, ins, asm, pattern> {		list<dag> pattern = []> : AMDGPUInst<outs, ins, asm, pattern> {

field bits<32> Inst = 0xffffffff;		field bits<32> Inst = 0xffffffff;
}		}

		//===---------------------------------------------------------------------===//
		// Return instruction
		//===---------------------------------------------------------------------===//

		class ILFormat<dag outs, dag ins, string asmstr, list<dag> pattern>
		arsenm: Should probably rename this at some point
		: Instruction {

		let Namespace = "AMDGPU";
		dag OutOperandList = outs;
		dag InOperandList = ins;
		let Pattern = pattern;
		let AsmString = !strconcat(asmstr, "\n");
		let isPseudo = 1;
		let Itinerary = NullALU;
		bit hasIEEEFlag = 0;
		bit hasZeroOpFlag = 0;
		let mayLoad = 0;
		let mayStore = 0;
		let hasSideEffects = 0;
		let isCodeGenOnly = 1;
		}

		def TruePredicate : Predicate<"true">;

		// Exists to help track down where SubtargetPredicate isn't set rather
		// than letting tablegen crash with an unhelpful error.
		def InvalidPred : Predicate<"predicate not set on instruction or pattern">;

		class PredicateControl {
		Predicate SubtargetPredicate = InvalidPred;
		list<Predicate> AssemblerPredicates = [];
		Predicate AssemblerPredicate = TruePredicate;
		list<Predicate> OtherPredicates = [];
		list<Predicate> Predicates = !listconcat([SubtargetPredicate,
		AssemblerPredicate],
		AssemblerPredicates,
		OtherPredicates);
		}
		class AMDGPUPat<dag pattern, dag result> : Pat<pattern, result>,
		PredicateControl;

def FP16Denormals : Predicate<"Subtarget->hasFP16Denormals()">;		def FP16Denormals : Predicate<"Subtarget->hasFP16Denormals()">;
def FP32Denormals : Predicate<"Subtarget->hasFP32Denormals()">;		def FP32Denormals : Predicate<"Subtarget->hasFP32Denormals()">;
def FP64Denormals : Predicate<"Subtarget->hasFP64Denormals()">;		def FP64Denormals : Predicate<"Subtarget->hasFP64Denormals()">;
def NoFP16Denormals : Predicate<"!Subtarget->hasFP16Denormals()">;		def NoFP16Denormals : Predicate<"!Subtarget->hasFP16Denormals()">;
def NoFP32Denormals : Predicate<"!Subtarget->hasFP32Denormals()">;		def NoFP32Denormals : Predicate<"!Subtarget->hasFP32Denormals()">;
def NoFP64Denormals : Predicate<"!Subtarget->hasFP64Denormals()">;		def NoFP64Denormals : Predicate<"!Subtarget->hasFP64Denormals()">;
def UnsafeFPMath : Predicate<"TM.Options.UnsafeFPMath">;		def UnsafeFPMath : Predicate<"TM.Options.UnsafeFPMath">;
def FMA : Predicate<"Subtarget->hasFMA()">;		def FMA : Predicate<"Subtarget->hasFMA()">;
(36 lines not shown)
// Custom Operands		// Custom Operands
//===--------------------------------------------------------------------===//		//===--------------------------------------------------------------------===//
def brtarget : Operand<OtherVT>;		def brtarget : Operand<OtherVT>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Misc. PatFrags		// Misc. PatFrags
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

class HasOneUseUnaryOp<SDPatternOperator op> : PatFrag<
(ops node:$src0),
(op $src0),
[{ return N->hasOneUse(); }]
>;

class HasOneUseBinOp<SDPatternOperator op> : PatFrag<		class HasOneUseBinOp<SDPatternOperator op> : PatFrag<
(ops node:$src0, node:$src1),		(ops node:$src0, node:$src1),
(op $src0, $src1),		(op $src0, $src1),
[{ return N->hasOneUse(); }]		[{ return N->hasOneUse(); }]
>;		>;

class HasOneUseTernaryOp<SDPatternOperator op> : PatFrag<		class HasOneUseTernaryOp<SDPatternOperator op> : PatFrag<
(ops node:$src0, node:$src1, node:$src2),		(ops node:$src0, node:$src1, node:$src2),
(op $src0, $src1, $src2),		(op $src0, $src1, $src2),
[{ return N->hasOneUse(); }]		[{ return N->hasOneUse(); }]
>;		>;

def trunc_oneuse : HasOneUseUnaryOp<trunc>;

let Properties = [SDNPCommutative, SDNPAssociative] in {		let Properties = [SDNPCommutative, SDNPAssociative] in {
def smax_oneuse : HasOneUseBinOp<smax>;		def smax_oneuse : HasOneUseBinOp<smax>;
def smin_oneuse : HasOneUseBinOp<smin>;		def smin_oneuse : HasOneUseBinOp<smin>;
def umax_oneuse : HasOneUseBinOp<umax>;		def umax_oneuse : HasOneUseBinOp<umax>;
def umin_oneuse : HasOneUseBinOp<umin>;		def umin_oneuse : HasOneUseBinOp<umin>;
def fminnum_oneuse : HasOneUseBinOp<fminnum>;		def fminnum_oneuse : HasOneUseBinOp<fminnum>;
def fmaxnum_oneuse : HasOneUseBinOp<fmaxnum>;		def fmaxnum_oneuse : HasOneUseBinOp<fmaxnum>;
def and_oneuse : HasOneUseBinOp<and>;		def and_oneuse : HasOneUseBinOp<and>;
(109 lines not shown) [context: def COND_NE : PatLeaf <]
[{return N->get() == ISD::SETNE || N->get() == ISD::SETUNE;}]		[{return N->get() == ISD::SETNE || N->get() == ISD::SETUNE;}]
>;		>;

def COND_NULL : PatLeaf <		def COND_NULL : PatLeaf <
(cond),		(cond),
[{(void)N; return false;}]		[{(void)N; return false;}]
>;		>;

		//===----------------------------------------------------------------------===//
		// PatLeafs for Texture Constants
		//===----------------------------------------------------------------------===//

		def TEX_ARRAY : PatLeaf<
		(imm),
		[{uint32_t TType = (uint32_t)N->getZExtValue();
		return TType == 9 || TType == 10 || TType == 16;
		}]
		>;

		def TEX_RECT : PatLeaf<
		(imm),
		[{uint32_t TType = (uint32_t)N->getZExtValue();
		return TType == 5;
		}]
		>;

		def TEX_SHADOW : PatLeaf<
		(imm),
		[{uint32_t TType = (uint32_t)N->getZExtValue();
		return (TType >= 6 && TType <= 8) || TType == 13;
		}]
		>;

		def TEX_SHADOW_ARRAY : PatLeaf<
		(imm),
		[{uint32_t TType = (uint32_t)N->getZExtValue();
		return TType == 11 || TType == 12 || TType == 17;
		}]
		>;
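The four texture-type `PatLeaf`s above are plain membership tests on the texture-type immediate. The sketch below restates them as C++ predicates. The numeric values are copied from the patterns, and the function names are hypothetical. The four sets are disjoint, so at most one predicate matches a given immediate.

```cpp
#include <cassert>
#include <cstdint>

// Same tests as the TEX_ARRAY, TEX_RECT, TEX_SHADOW and TEX_SHADOW_ARRAY
// PatLeafs, applied to the texture-type immediate.
bool isTexArray(uint32_t TType) { return TType == 9 || TType == 10 || TType == 16; }
bool isTexRect(uint32_t TType) { return TType == 5; }
bool isTexShadow(uint32_t TType) { return (TType >= 6 && TType <= 8) || TType == 13; }
bool isTexShadowArray(uint32_t TType) { return TType == 11 || TType == 12 || TType == 17; }

// Counts how many of the four predicates accept TType (0 or 1 for every value).
int texMatchCount(uint32_t TType) {
  return isTexArray(TType) + isTexRect(TType) + isTexShadow(TType) +
         isTexShadowArray(TType);
}
```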

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Load/Store Pattern Fragments		// Load/Store Pattern Fragments
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

class Aligned8Bytes <dag ops, dag frag> : PatFrag <ops, frag, [{		class Aligned8Bytes <dag ops, dag frag> : PatFrag <ops, frag, [{
return cast<MemSDNode>(N)->getAlignment() % 8 == 0;		return cast<MemSDNode>(N)->getAlignment() % 8 == 0;
}]>;		}]>;
(491 lines not shown) [context: class RcpPat<Instruction RcpInst, ValueType vt> : AMDGPUPat <]
(fdiv FP_ONE, vt:$src),		(fdiv FP_ONE, vt:$src),
(RcpInst $src)		(RcpInst $src)
>;		>;

class RsqPat<Instruction RsqInst, ValueType vt> : AMDGPUPat <		class RsqPat<Instruction RsqInst, ValueType vt> : AMDGPUPat <
(AMDGPUrcp (fsqrt vt:$src)),		(AMDGPUrcp (fsqrt vt:$src)),
(RsqInst $src)		(RsqInst $src)
>;		>;

include "R600Instructions.td"
include "R700Instructions.td"
include "EvergreenInstructions.td"
include "CaymanInstructions.td"

include "SIInstrInfo.td"

lib/Target/AMDGPU/AMDGPUIntrinsics.td

	//===-- AMDGPUIntrinsics.td - Common intrinsics -- tablegen ------------===//			//===-- AMDGPUIntrinsics.td - Common intrinsics -- tablegen ------------===//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// This file defines intrinsics that are used by all hw codegen targets.			// This file defines intrinsics that are used by all hw codegen targets.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	let TargetPrefix = "AMDGPU", isTarget = 1 in {			let TargetPrefix = "AMDGPU", isTarget = 1 in {
	def int_AMDGPU_kill : Intrinsic<[], [llvm_float_ty], []>;			def int_AMDGPU_kill : Intrinsic<[], [llvm_float_ty], []>;
	}			}

	include "SIIntrinsics.td"

lib/Target/AMDGPU/AMDGPULowerIntrinsics.cpp

	(111 lines not shown)
	}			}

	bool AMDGPULowerIntrinsics::makeLIDRangeMetadata(Function &F) const {			bool AMDGPULowerIntrinsics::makeLIDRangeMetadata(Function &F) const {
	auto *TPC = getAnalysisIfAvailable<TargetPassConfig>();			auto *TPC = getAnalysisIfAvailable<TargetPassConfig>();
	if (!TPC)			if (!TPC)
	return false;			return false;

	const TargetMachine &TM = TPC->getTM<TargetMachine>();			const TargetMachine &TM = TPC->getTM<TargetMachine>();
	const AMDGPUSubtarget &ST = TM.getSubtarget<AMDGPUSubtarget>(F);
	bool Changed = false;			bool Changed = false;

	for (auto *U : F.users()) {			for (auto *U : F.users()) {
	auto *CI = dyn_cast<CallInst>(U);			auto *CI = dyn_cast<CallInst>(U);
	if (!CI)			if (!CI)
	continue;			continue;

	Changed |= ST.makeLIDRangeMetadata(CI);			Changed |= AMDGPUCommonSubtarget::get(TM, F).makeLIDRangeMetadata(CI);
	}			}
	return Changed;			return Changed;
	}			}

	bool AMDGPULowerIntrinsics::runOnModule(Module &M) {			bool AMDGPULowerIntrinsics::runOnModule(Module &M) {
	bool Changed = false;			bool Changed = false;

	for (Function &F : M) {			for (Function &F : M) {
	(34 lines not shown)

lib/Target/AMDGPU/AMDGPUMCInstLower.h

	//===- AMDGPUMCInstLower.h MachineInstr Lowering Interface ------- C++ --===//			//===- AMDGPUMCInstLower.h MachineInstr Lowering Interface ------- C++ --===//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef LLVM_LIB_TARGET_AMDGPU_AMDGPUMCINSTLOWER_H			#ifndef LLVM_LIB_TARGET_AMDGPU_AMDGPUMCINSTLOWER_H
	#define LLVM_LIB_TARGET_AMDGPU_AMDGPUMCINSTLOWER_H			#define LLVM_LIB_TARGET_AMDGPU_AMDGPUMCINSTLOWER_H

	namespace llvm {			namespace llvm {

	class AMDGPUSubtarget;
	class AsmPrinter;			class AsmPrinter;
	class MachineBasicBlock;			class MachineBasicBlock;
	class MachineInstr;			class MachineInstr;
	class MachineOperand;			class MachineOperand;
	class MCContext;			class MCContext;
	class MCExpr;			class MCExpr;
	class MCInst;			class MCInst;
	class MCOperand;			class MCOperand;
				class TargetSubtargetInfo;

	class AMDGPUMCInstLower {			class AMDGPUMCInstLower {
	MCContext &Ctx;			MCContext &Ctx;
	const AMDGPUSubtarget &ST;			const TargetSubtargetInfo &ST;
	const AsmPrinter &AP;			const AsmPrinter &AP;

	const MCExpr *getLongBranchBlockExpr(const MachineBasicBlock &SrcBB,			const MCExpr *getLongBranchBlockExpr(const MachineBasicBlock &SrcBB,
	const MachineOperand &MO) const;			const MachineOperand &MO) const;

	public:			public:
	AMDGPUMCInstLower(MCContext &ctx, const AMDGPUSubtarget &ST,			AMDGPUMCInstLower(MCContext &ctx, const TargetSubtargetInfo &ST,
	const AsmPrinter &AP);			const AsmPrinter &AP);

	bool lowerOperand(const MachineOperand &MO, MCOperand &MCOp) const;			bool lowerOperand(const MachineOperand &MO, MCOperand &MCOp) const;

	/// \brief Lower a MachineInstr to an MCInst			/// \brief Lower a MachineInstr to an MCInst
	void lower(const MachineInstr *MI, MCInst &OutMI) const;			void lower(const MachineInstr *MI, MCInst &OutMI) const;

	};			};

	} // End namespace llvm			} // End namespace llvm

	#endif			#endif

lib/Target/AMDGPU/AMDGPUMCInstLower.cpp

(33 lines not shown)
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/Format.h"		#include "llvm/Support/Format.h"
#include <algorithm>		#include <algorithm>

using namespace llvm;		using namespace llvm;

#include "AMDGPUGenMCPseudoLowering.inc"		#include "AMDGPUGenMCPseudoLowering.inc"

AMDGPUMCInstLower::AMDGPUMCInstLower(MCContext &ctx, const AMDGPUSubtarget &st,		AMDGPUMCInstLower::AMDGPUMCInstLower(MCContext &ctx,
		const TargetSubtargetInfo &st,
const AsmPrinter &ap):		const AsmPrinter &ap):
Ctx(ctx), ST(st), AP(ap) { }		Ctx(ctx), ST(st), AP(ap) { }

static MCSymbolRefExpr::VariantKind getVariantKind(unsigned MOFlags) {		static MCSymbolRefExpr::VariantKind getVariantKind(unsigned MOFlags) {
switch (MOFlags) {		switch (MOFlags) {
default:		default:
return MCSymbolRefExpr::VK_None;		return MCSymbolRefExpr::VK_None;
case SIInstrInfo::MO_GOTPCREL:		case SIInstrInfo::MO_GOTPCREL:
(10 lines not shown)
}		}

const MCExpr *AMDGPUMCInstLower::getLongBranchBlockExpr(		const MCExpr *AMDGPUMCInstLower::getLongBranchBlockExpr(
const MachineBasicBlock &SrcBB,		const MachineBasicBlock &SrcBB,
const MachineOperand &MO) const {		const MachineOperand &MO) const {
const MCExpr *DestBBSym		const MCExpr *DestBBSym
= MCSymbolRefExpr::create(MO.getMBB()->getSymbol(), Ctx);		= MCSymbolRefExpr::create(MO.getMBB()->getSymbol(), Ctx);
const MCExpr *SrcBBSym = MCSymbolRefExpr::create(SrcBB.getSymbol(), Ctx);		const MCExpr *SrcBBSym = MCSymbolRefExpr::create(SrcBB.getSymbol(), Ctx);
		const SIInstrInfo *TII = static_cast<const SIInstrInfo *>(ST.getInstrInfo());

assert(SrcBB.front().getOpcode() == AMDGPU::S_GETPC_B64 &&		assert(SrcBB.front().getOpcode() == AMDGPU::S_GETPC_B64 &&
ST.getInstrInfo()->get(AMDGPU::S_GETPC_B64).Size == 4);		TII->get(AMDGPU::S_GETPC_B64).Size == 4);

// s_getpc_b64 returns the address of next instruction.		// s_getpc_b64 returns the address of next instruction.
const MCConstantExpr *One = MCConstantExpr::create(4, Ctx);		const MCConstantExpr *One = MCConstantExpr::create(4, Ctx);
SrcBBSym = MCBinaryExpr::createAdd(SrcBBSym, One, Ctx);		SrcBBSym = MCBinaryExpr::createAdd(SrcBBSym, One, Ctx);

if (MO.getTargetFlags() == AMDGPU::TF_LONG_BRANCH_FORWARD)		if (MO.getTargetFlags() == AMDGPU::TF_LONG_BRANCH_FORWARD)
return MCBinaryExpr::createSub(DestBBSym, SrcBBSym, Ctx);		return MCBinaryExpr::createSub(DestBBSym, SrcBBSym, Ctx);

assert(MO.getTargetFlags() == AMDGPU::TF_LONG_BRANCH_BACKWARD);		assert(MO.getTargetFlags() == AMDGPU::TF_LONG_BRANCH_BACKWARD);
return MCBinaryExpr::createSub(SrcBBSym, DestBBSym, Ctx);		return MCBinaryExpr::createSub(SrcBBSym, DestBBSym, Ctx);
}		}
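The expression built in `getLongBranchBlockExpr` is ordinary address arithmetic. `s_getpc_b64` yields the address of the instruction after it, so the reference point is the start of the source block plus 4 bytes, which is the asserted size of `s_getpc_b64`. The sign of the difference depends on the branch direction. The sketch below uses plain integers in place of the `MCSymbolRefExpr`s. `longBranchOffset` is a hypothetical helper, and `Forward` stands for `TF_LONG_BRANCH_FORWARD`.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors getLongBranchBlockExpr with concrete addresses:
// PC = SrcBB + 4 (s_getpc_b64 is 4 bytes and returns the next address);
// forward branches encode Dest - PC, backward branches encode PC - Dest.
int64_t longBranchOffset(int64_t SrcBBAddr, int64_t DestBBAddr, bool Forward) {
  const int64_t PC = SrcBBAddr + 4;
  return Forward ? DestBBAddr - PC : PC - DestBBAddr;
}
```

In both directions, the result is a positive distance for a branch that actually goes that way.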

bool AMDGPUMCInstLower::lowerOperand(const MachineOperand &MO,		bool AMDGPUMCInstLower::lowerOperand(const MachineOperand &MO,
MCOperand &MCOp) const {		MCOperand &MCOp) const {
switch (MO.getType()) {		switch (MO.getType()) {
default:		default:
llvm_unreachable("unknown operand type");		llvm_unreachable("unknown operand type");
case MachineOperand::MO_Immediate:		case MachineOperand::MO_Immediate:
MCOp = MCOperand::createImm(MO.getImm());		MCOp = MCOperand::createImm(MO.getImm());
return true;		return true;
case MachineOperand::MO_Register:		case MachineOperand::MO_Register:
		if (ST.getTargetTriple().getArch() == Triple::amdgcn)
MCOp = MCOperand::createReg(AMDGPU::getMCReg(MO.getReg(), ST));		MCOp = MCOperand::createReg(AMDGPU::getMCReg(MO.getReg(), ST));
		else
		MCOp = MCOperand::createReg(MO.getReg());
return true;		return true;
		arsenm: Why is there a difference here?
case MachineOperand::MO_MachineBasicBlock: {		case MachineOperand::MO_MachineBasicBlock: {
if (MO.getTargetFlags() != 0) {		if (MO.getTargetFlags() != 0) {
MCOp = MCOperand::createExpr(		MCOp = MCOperand::createExpr(
getLongBranchBlockExpr(*MO.getParent()->getParent(), MO));		getLongBranchBlockExpr(*MO.getParent()->getParent(), MO));
} else {		} else {
MCOp = MCOperand::createExpr(		MCOp = MCOperand::createExpr(
MCSymbolRefExpr::create(MO.getMBB()->getSymbol(), Ctx));		MCSymbolRefExpr::create(MO.getMBB()->getSymbol(), Ctx));
}		}
(22 lines not shown) [context: bool AMDGPUMCInstLower::lowerOperand(const MachineOperand &MO, ...)]
case MachineOperand::MO_RegisterMask:		case MachineOperand::MO_RegisterMask:
// Regmasks are like implicit defs.		// Regmasks are like implicit defs.
return false;		return false;
}		}
}		}

void AMDGPUMCInstLower::lower(const MachineInstr *MI, MCInst &OutMI) const {		void AMDGPUMCInstLower::lower(const MachineInstr *MI, MCInst &OutMI) const {
unsigned Opcode = MI->getOpcode();		unsigned Opcode = MI->getOpcode();
const auto *TII = ST.getInstrInfo();		int MCOpcode = Opcode;
		auto &STI = MI->getParent()->getParent()->getSubtarget<TargetSubtargetInfo>();

		if (STI.getTargetTriple().getArch() == Triple::amdgcn) {
		arsenm: Should this be a separate class as well?
		const auto *TII = static_cast<const SIInstrInfo *>(STI.getInstrInfo());

// FIXME: Should be able to handle this with emitPseudoExpansionLowering. We		// FIXME: Should be able to handle this with emitPseudoExpansionLowering. We
// need to select it to the subtarget specific version, and there's no way to		// need to select it to the subtarget specific version, and there's no way to
// do that with a single pseudo source operation.		// do that with a single pseudo source operation.
if (Opcode == AMDGPU::S_SETPC_B64_return)		if (Opcode == AMDGPU::S_SETPC_B64_return)
Opcode = AMDGPU::S_SETPC_B64;		Opcode = AMDGPU::S_SETPC_B64;
else if (Opcode == AMDGPU::SI_CALL) {		else if (Opcode == AMDGPU::SI_CALL) {
// SI_CALL is just S_SWAPPC_B64 with an additional operand to track the		// SI_CALL is just S_SWAPPC_B64 with an additional operand to track the
// called function (which we need to remove here).		// called function (which we need to remove here).
OutMI.setOpcode(TII->pseudoToMCOpcode(AMDGPU::S_SWAPPC_B64));		OutMI.setOpcode(TII->pseudoToMCOpcode(AMDGPU::S_SWAPPC_B64));
MCOperand Dest, Src;		MCOperand Dest, Src;
lowerOperand(MI->getOperand(0), Dest);		lowerOperand(MI->getOperand(0), Dest);
lowerOperand(MI->getOperand(1), Src);		lowerOperand(MI->getOperand(1), Src);
OutMI.addOperand(Dest);		OutMI.addOperand(Dest);
OutMI.addOperand(Src);		OutMI.addOperand(Src);
return;		return;
} else if (Opcode == AMDGPU::SI_TCRETURN) {		} else if (Opcode == AMDGPU::SI_TCRETURN) {
// TODO: How to use branch immediate and avoid register+add?		// TODO: How to use branch immediate and avoid register+add?
Opcode = AMDGPU::S_SETPC_B64;		Opcode = AMDGPU::S_SETPC_B64;
}		}

int MCOpcode = TII->pseudoToMCOpcode(Opcode);
if (MCOpcode == -1) {		if (MCOpcode == -1) {
LLVMContext &C = MI->getParent()->getParent()->getFunction().getContext();		LLVMContext &C = MI->getParent()->getParent()->getFunction().getContext();
C.emitError("AMDGPUMCInstLower::lower - Pseudo instruction doesn't have "		C.emitError("AMDGPUMCInstLower::lower - Pseudo instruction doesn't have "
"a target-specific version: " + Twine(MI->getOpcode()));		"a target-specific version: " + Twine(MI->getOpcode()));
}		}
		MCOpcode = TII->pseudoToMCOpcode(Opcode);
		}

OutMI.setOpcode(MCOpcode);		OutMI.setOpcode(MCOpcode);

for (const MachineOperand &MO : MI->explicit_operands()) {		for (const MachineOperand &MO : MI->explicit_operands()) {
MCOperand MCOp;		MCOperand MCOp;
lowerOperand(MO, MCOp);		lowerOperand(MO, MCOp);
OutMI.addOperand(MCOp);		OutMI.addOperand(MCOp);
}		}
}		}
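On amdgcn, `lower` rewrites a few call and return pseudos before the generic `pseudoToMCOpcode` lookup. The sketch below shows only that rewrite step. It uses a hypothetical local enum in place of the generated `AMDGPU::` opcode numbers. The real code also drops the callee operand of `SI_CALL` by lowering only operands 0 and 1, which this sketch does not model.

```cpp
#include <cassert>

// Hypothetical stand-ins for the generated AMDGPU opcode enumerators.
enum class Op {
  S_SETPC_B64_return, SI_CALL, SI_TCRETURN,
  S_SETPC_B64, S_SWAPPC_B64, OTHER
};

// The pre-lookup rewrite performed in AMDGPUMCInstLower::lower for amdgcn:
// returns and tail calls become S_SETPC_B64, calls become S_SWAPPC_B64,
// and every other opcode passes through unchanged.
Op rewriteCallReturnPseudo(Op Opcode) {
  switch (Opcode) {
  case Op::S_SETPC_B64_return:
  case Op::SI_TCRETURN:
    return Op::S_SETPC_B64;
  case Op::SI_CALL:
    return Op::S_SWAPPC_B64;
  default:
    return Opcode;
  }
}
```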

bool AMDGPUAsmPrinter::lowerOperand(const MachineOperand &MO,		bool AMDGPUAsmPrinter::lowerOperand(const MachineOperand &MO,
MCOperand &MCOp) const {		MCOperand &MCOp) const {
const AMDGPUSubtarget &STI = MF->getSubtarget<AMDGPUSubtarget>();		auto &STI = MF->getSubtarget<TargetSubtargetInfo>();
AMDGPUMCInstLower MCInstLowering(OutContext, STI, *this);		AMDGPUMCInstLower MCInstLowering(OutContext, STI, *this);
return MCInstLowering.lowerOperand(MO, MCOp);		return MCInstLowering.lowerOperand(MO, MCOp);
}		}

const MCExpr *AMDGPUAsmPrinter::lowerConstant(const Constant *CV) {		const MCExpr *AMDGPUAsmPrinter::lowerConstant(const Constant *CV) {
// TargetMachine does not support llvm-style cast. Use C++-style cast.		// TargetMachine does not support llvm-style cast. Use C++-style cast.
// This is safe since TM is always of type AMDGPUTargetMachine or its		// This is safe since TM is always of type AMDGPUTargetMachine or its
// derived class.		// derived class.
[14 lines not shown]	const MCExpr *AMDGPUAsmPrinter::lowerConstant(const Constant *CV) {
}		}
return AsmPrinter::lowerConstant(CV);		return AsmPrinter::lowerConstant(CV);
}		}

void AMDGPUAsmPrinter::EmitInstruction(const MachineInstr *MI) {		void AMDGPUAsmPrinter::EmitInstruction(const MachineInstr *MI) {
if (emitPseudoExpansionLowering(*OutStreamer, MI))		if (emitPseudoExpansionLowering(*OutStreamer, MI))
return;		return;

const AMDGPUSubtarget &STI = MF->getSubtarget<AMDGPUSubtarget>();		const TargetSubtargetInfo &STI = MF->getSubtarget<TargetSubtargetInfo>();
AMDGPUMCInstLower MCInstLowering(OutContext, STI, *this);		AMDGPUMCInstLower MCInstLowering(OutContext, STI, *this);

StringRef Err;		StringRef Err;
if (!STI.getInstrInfo()->verifyInstruction(*MI, Err)) {		if (!STI.getInstrInfo()->verifyInstruction(*MI, Err)) {
LLVMContext &C = MI->getParent()->getParent()->getFunction().getContext();		LLVMContext &C = MI->getParent()->getParent()->getFunction().getContext();
C.emitError("Illegal instruction detected: " + Err);		C.emitError("Illegal instruction detected: " + Err);
MI->print(errs());		MI->print(errs());
}		}
[41 lines not shown]	if (MI->getOpcode() == AMDGPU::SI_MASKED_UNREACHABLE) {
OutStreamer->emitRawComment(" divergent unreachable");		OutStreamer->emitRawComment(" divergent unreachable");
return;		return;
}		}

MCInst TmpInst;		MCInst TmpInst;
MCInstLowering.lower(MI, TmpInst);		MCInstLowering.lower(MI, TmpInst);
EmitToStreamer(*OutStreamer, TmpInst);		EmitToStreamer(*OutStreamer, TmpInst);

if (STI.dumpCode()) {		if (AMDGPUSubtarget::get(*MF).dumpCode()) {
// Disassemble instruction/operands to text.		// Disassemble instruction/operands to text.
DisasmLines.resize(DisasmLines.size() + 1);		DisasmLines.resize(DisasmLines.size() + 1);
std::string &DisasmLine = DisasmLines.back();		std::string &DisasmLine = DisasmLines.back();
raw_string_ostream DisasmStream(DisasmLine);		raw_string_ostream DisasmStream(DisasmLine);

AMDGPUInstPrinter InstPrinter(*TM.getMCAsmInfo(),		AMDGPUInstPrinter InstPrinter(*TM.getMCAsmInfo(),
*STI.getInstrInfo(),		*STI.getInstrInfo(),
*STI.getRegisterInfo());		*STI.getRegisterInfo());
[25 lines not shown]

lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp

[146 lines not shown]	if (auto *TPC = getAnalysisIfAvailable<TargetPassConfig>())
TM = &TPC->getTM<TargetMachine>();		TM = &TPC->getTM<TargetMachine>();
else		else
return false;		return false;

const Triple &TT = TM->getTargetTriple();		const Triple &TT = TM->getTargetTriple();
IsAMDGCN = TT.getArch() == Triple::amdgcn;		IsAMDGCN = TT.getArch() == Triple::amdgcn;
IsAMDHSA = TT.getOS() == Triple::AMDHSA;		IsAMDHSA = TT.getOS() == Triple::AMDHSA;

const AMDGPUSubtarget &ST = TM->getSubtarget<AMDGPUSubtarget>(F);		const AMDGPUCommonSubtarget &ST = AMDGPUCommonSubtarget::get(*TM, F);
if (!ST.isPromoteAllocaEnabled())		if (!ST.isPromoteAllocaEnabled())
return false;		return false;

AS = AMDGPU::getAMDGPUAS(*F.getParent());		AS = AMDGPU::getAMDGPUAS(*F.getParent());

bool SufficientLDS = hasSufficientLocalMem(F);		bool SufficientLDS = hasSufficientLocalMem(F);
bool Changed = false;		bool Changed = false;
BasicBlock &EntryBB = *F.begin();		BasicBlock &EntryBB = *F.begin();
for (auto I = EntryBB.begin(), E = EntryBB.end(); I != E; ) {		for (auto I = EntryBB.begin(), E = EntryBB.end(); I != E; ) {
AllocaInst *AI = dyn_cast<AllocaInst>(I);		AllocaInst *AI = dyn_cast<AllocaInst>(I);

++I;		++I;
if (AI)		if (AI)
Changed |= handleAlloca(*AI, SufficientLDS);		Changed |= handleAlloca(*AI, SufficientLDS);
}		}

return Changed;		return Changed;
}		}

std::pair<Value *, Value *>		std::pair<Value *, Value *>
AMDGPUPromoteAlloca::getLocalSizeYZ(IRBuilder<> &Builder) {		AMDGPUPromoteAlloca::getLocalSizeYZ(IRBuilder<> &Builder) {
const AMDGPUSubtarget &ST = TM->getSubtarget<AMDGPUSubtarget>(		const Function &F = *Builder.GetInsertBlock()->getParent();
*Builder.GetInsertBlock()->getParent());		const AMDGPUCommonSubtarget &ST = AMDGPUCommonSubtarget::get(*TM, F);

if (!IsAMDHSA) {		if (!IsAMDHSA) {
Function *LocalSizeYFn		Function *LocalSizeYFn
= Intrinsic::getDeclaration(Mod, Intrinsic::r600_read_local_size_y);		= Intrinsic::getDeclaration(Mod, Intrinsic::r600_read_local_size_y);
Function *LocalSizeZFn		Function *LocalSizeZFn
= Intrinsic::getDeclaration(Mod, Intrinsic::r600_read_local_size_z);		= Intrinsic::getDeclaration(Mod, Intrinsic::r600_read_local_size_z);

CallInst *LocalSizeY = Builder.CreateCall(LocalSizeYFn, {});		CallInst *LocalSizeY = Builder.CreateCall(LocalSizeYFn, {});
[69 lines not shown]	AMDGPUPromoteAlloca::getLocalSizeYZ(IRBuilder<> &Builder) {

// Extract y component. Upper half of LoadZU should be zero already.		// Extract y component. Upper half of LoadZU should be zero already.
Value *Y = Builder.CreateLShr(LoadXY, 16);		Value *Y = Builder.CreateLShr(LoadXY, 16);

return std::make_pair(Y, LoadZU);		return std::make_pair(Y, LoadZU);
}		}

Value *AMDGPUPromoteAlloca::getWorkitemID(IRBuilder<> &Builder, unsigned N) {		Value *AMDGPUPromoteAlloca::getWorkitemID(IRBuilder<> &Builder, unsigned N) {
const AMDGPUSubtarget &ST = TM->getSubtarget<AMDGPUSubtarget>(		const AMDGPUCommonSubtarget &ST =
*Builder.GetInsertBlock()->getParent());		AMDGPUCommonSubtarget::get(TM, Builder.GetInsertBlock()->getParent());
Intrinsic::ID IntrID = Intrinsic::ID::not_intrinsic;		Intrinsic::ID IntrID = Intrinsic::ID::not_intrinsic;

switch (N) {		switch (N) {
case 0:		case 0:
IntrID = IsAMDGCN ? Intrinsic::amdgcn_workitem_id_x		IntrID = IsAMDGCN ? Intrinsic::amdgcn_workitem_id_x
: Intrinsic::r600_read_tidig_x;		: Intrinsic::r600_read_tidig_x;
break;		break;
case 1:		case 1:
[320 lines not shown]	bool AMDGPUPromoteAlloca::collectUsesWithPtrTypes(
}		}

return true;		return true;
}		}

bool AMDGPUPromoteAlloca::hasSufficientLocalMem(const Function &F) {		bool AMDGPUPromoteAlloca::hasSufficientLocalMem(const Function &F) {

FunctionType *FTy = F.getFunctionType();		FunctionType *FTy = F.getFunctionType();
const AMDGPUSubtarget &ST = TM->getSubtarget<AMDGPUSubtarget>(F);		const AMDGPUCommonSubtarget &ST = AMDGPUCommonSubtarget::get(*TM, F);

// If the function has any arguments in the local address space, then it's		// If the function has any arguments in the local address space, then it's
// possible these arguments require the entire local memory space, so		// possible these arguments require the entire local memory space, so
// we cannot use local memory in the pass.		// we cannot use local memory in the pass.
for (Type *ParamTy : FTy->params()) {		for (Type *ParamTy : FTy->params()) {
PointerType *PtrTy = dyn_cast<PointerType>(ParamTy);		PointerType *PtrTy = dyn_cast<PointerType>(ParamTy);
if (PtrTy && PtrTy->getAddressSpace() == AS.LOCAL_ADDRESS) {		if (PtrTy && PtrTy->getAddressSpace() == AS.LOCAL_ADDRESS) {
LocalMemLimit = 0;		LocalMemLimit = 0;
[109 lines not shown]	default:
DEBUG(dbgs() << " promote alloca to LDS not supported with calling convention.\n");		DEBUG(dbgs() << " promote alloca to LDS not supported with calling convention.\n");
return false;		return false;
}		}

// Not likely to have sufficient local memory for promotion.		// Not likely to have sufficient local memory for promotion.
if (!SufficientLDS)		if (!SufficientLDS)
return false;		return false;

const AMDGPUSubtarget &ST =		const AMDGPUCommonSubtarget &ST = AMDGPUCommonSubtarget::get(*TM, ContainingFunction);
TM->getSubtarget<AMDGPUSubtarget>(ContainingFunction);
unsigned WorkGroupSize = ST.getFlatWorkGroupSizes(ContainingFunction).second;		unsigned WorkGroupSize = ST.getFlatWorkGroupSizes(ContainingFunction).second;

const DataLayout &DL = Mod->getDataLayout();		const DataLayout &DL = Mod->getDataLayout();

unsigned Align = I.getAlignment();		unsigned Align = I.getAlignment();
if (Align == 0)		if (Align == 0)
Align = DL.getABITypeAlignment(I.getAllocatedType());		Align = DL.getABITypeAlignment(I.getAllocatedType());

[174 lines not shown]
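The AMDGPUPromoteAlloca.cpp hunks replace `TM->getSubtarget<AMDGPUSubtarget>(F)` with `AMDGPUCommonSubtarget::get(*TM, F)`, but the body of `get` is not part of this excerpt. Since `AMDGPUCommonSubtarget` is a separate base that does not derive from `TargetSubtargetInfo` (`class AMDGPUSubtarget : public AMDGPUGenSubtargetInfo, public AMDGPUCommonSubtarget` in AMDGPUSubtarget.h), one plausible shape for such an accessor is: check the triple, downcast to the matching concrete subtarget, then upcast to the common base. The sketch below models only that idea; every name in it (`TargetSubtargetInfoLike`, `CommonSubtargetLike`, `GCNSubtargetLike`, `R600SubtargetLike`, `getCommon`) is a hypothetical stand-in, not an LLVM declaration.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins, not the real LLVM declarations.
enum class Arch { r600, amdgcn };

// Models TargetSubtargetInfo: the type a function's subtarget is stored as.
struct TargetSubtargetInfoLike {
  virtual ~TargetSubtargetInfoLike() = default;
};

// Models AMDGPUCommonSubtarget: shared queries, deliberately NOT derived
// from TargetSubtargetInfoLike.
struct CommonSubtargetLike {
  virtual ~CommonSubtargetLike() = default;
  virtual std::string name() const = 0;
};

// Each concrete subtarget inherits from both bases.
struct GCNSubtargetLike : TargetSubtargetInfoLike, CommonSubtargetLike {
  std::string name() const override { return "gcn"; }
};

struct R600SubtargetLike : TargetSubtargetInfoLike, CommonSubtargetLike {
  std::string name() const override { return "r600"; }
};

// Choose the concrete class from the architecture, downcast to it, and
// return it through the common interface (an implicit derived-to-base
// conversion that adjusts the pointer to the CommonSubtargetLike subobject).
const CommonSubtargetLike &getCommon(Arch A, const TargetSubtargetInfoLike &ST) {
  if (A == Arch::amdgcn)
    return static_cast<const GCNSubtargetLike &>(ST);
  return static_cast<const R600SubtargetLike &>(ST);
}
```

The two-step cast is the point of the design: a direct `static_cast` between two unrelated bases does not compile, and casting through the wrong derived class is undefined behavior, so the architecture check has to agree with the subtarget object that was actually created.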

lib/Target/AMDGPU/AMDGPURegisterInfo.td

	[13 lines not shown]
	let Namespace = "AMDGPU" in {			let Namespace = "AMDGPU" in {

	foreach Index = 0-15 in {			foreach Index = 0-15 in {
	def sub#Index : SubRegIndex<32, !shl(Index, 5)>;			def sub#Index : SubRegIndex<32, !shl(Index, 5)>;
	}			}

	}			}

	include "R600RegisterInfo.td"
	include "SIRegisterInfo.td"			include "SIRegisterInfo.td"

lib/Target/AMDGPU/AMDGPUSubtarget.h

[33 lines not shown]
#include "llvm/Support/MathExtras.h"		#include "llvm/Support/MathExtras.h"
#include <cassert>		#include <cassert>
#include <cstdint>		#include <cstdint>
#include <memory>		#include <memory>
#include <utility>		#include <utility>

#define GET_SUBTARGETINFO_HEADER		#define GET_SUBTARGETINFO_HEADER
#include "AMDGPUGenSubtargetInfo.inc"		#include "AMDGPUGenSubtargetInfo.inc"
		#define GET_SUBTARGETINFO_HEADER
		#include "R600GenSubtargetInfo.inc"

namespace llvm {		namespace llvm {

class StringRef;		class StringRef;

class AMDGPUSubtarget : public AMDGPUGenSubtargetInfo {		class AMDGPUCommonSubtarget {
		private:
		Triple TargetTriple;

		protected:
		// Dummy feature to use for assembler in tablegen.
		public:
		AMDGPUCommonSubtarget(const Triple &TT, StringRef GPU, StringRef FS,
		const TargetMachine &TM);

		static const AMDGPUCommonSubtarget &get(const MachineFunction &MF);
		static const AMDGPUCommonSubtarget &get(const TargetMachine &TM,
		const Function &F);
		virtual bool enableDX10Clamp() const = 0;
		virtual unsigned getAlignmentForImplicitArgPtr() const = 0;
		virtual bool isAmdCodeObjectV2(const MachineFunction &MF) const = 0;
		virtual bool isAmdHsaOS() const = 0;
		virtual bool isAmdPalOS() const = 0;
		arsenm: Is it possible to avoid making these virtual?
		tstellar: I will look through this again and see if I can eliminate some of these virtual functions, but to get rid of all of them we have a few options:
		1. Eliminate the AMDGPUCommonSubtarget super class, and then in code shared between r600 and amdgcn (which is mostly IR passes and a few remaining classes like AMDGPUTargetLowering, AMDGPUAsmPrinter, etc.) do something like: bool IsAmdHsaOS; if (Triple.getArch() == Triple::amdgcn) IsAmdHsaOS = static_cast<const SISubtarget &>(Subtarget).isAmdHsaOS(); else IsAmdHsaOS = static_cast<const R600Subtarget &>(Subtarget).isAmdHsaOS();
		2. Remove subtarget checks from shared classes by refactoring code into r600/gcn specific classes.
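A minimal sketch of the first option tstellar describes (no common base class; shared code branches on the architecture before casting), using simplified stand-in types. `SISubtargetLike`, `R600SubtargetLike`, `OSKind`, and the free function `isAmdHsaOS(Arch, ...)` are hypothetical names for illustration only; this is not code from the patch.

```cpp
#include <cassert>

// Hypothetical stand-ins, not the real LLVM declarations.
enum class Arch { r600, amdgcn };
enum class OSKind { Unknown, AMDHSA, AMDPAL, Mesa3D };

// Models TargetSubtargetInfo, the only base that shared code would still see.
struct TargetSubtargetInfoLike {
  virtual ~TargetSubtargetInfoLike() = default;
};

// Concrete subtargets expose ordinary, non-virtual queries.
struct SISubtargetLike : TargetSubtargetInfoLike {
  explicit SISubtargetLike(OSKind OS) : TargetOS(OS) {}
  bool isAmdHsaOS() const { return TargetOS == OSKind::AMDHSA; }
  OSKind TargetOS;
};

struct R600SubtargetLike : TargetSubtargetInfoLike {
  explicit R600SubtargetLike(OSKind OS) : TargetOS(OS) {}
  bool isAmdHsaOS() const { return TargetOS == OSKind::AMDHSA; }
  OSKind TargetOS;
};

// Shared code dispatches on the architecture, then calls the non-virtual
// method of the matching class. The arch must match the object's dynamic
// type; casting through the wrong class is undefined behavior.
bool isAmdHsaOS(Arch A, const TargetSubtargetInfoLike &ST) {
  if (A == Arch::amdgcn)
    return static_cast<const SISubtargetLike &>(ST).isAmdHsaOS();
  return static_cast<const R600SubtargetLike &>(ST).isAmdHsaOS();
}
```

Compared with the pure-virtual interface in the diff, this avoids an indirect call per query, but each query used from shared code needs its own dispatch helper like the one above, which is the maintenance cost the second option tries to avoid by moving the code into r600- or gcn-specific classes.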
		virtual bool has16BitInsts() const = 0;
		virtual bool hasBCNT(unsigned Size) const = 0;
		virtual bool hasBFE() const = 0;
		virtual bool hasBFI() const = 0;
		virtual bool hasMadMixInsts() const = 0;
		virtual bool hasCARRY() const = 0;
		virtual bool hasFFBH() const = 0;
		virtual bool hasFFBL() const = 0;
		virtual bool hasFminFmaxLegacy() const = 0;
		virtual bool hasFP32Denormals() const = 0;
		virtual bool hasFPExceptions() const = 0;
		virtual bool hasMulI24() const = 0;
		virtual bool hasMulU24() const = 0;
		virtual bool hasSDWA() const = 0;
		virtual bool hasVOP3PInsts() const = 0;
		virtual int getLocalMemorySize() const = 0;
		virtual bool dumpCode() const = 0;
		virtual bool isPromoteAllocaEnabled() const = 0;
		virtual FeatureBitset getFeatureBitsImpl() const = 0;
		virtual unsigned getWavefrontSize() const = 0;

		/// \returns Default range flat work group size for a calling convention.
		std::pair<unsigned, unsigned> getDefaultFlatWorkGroupSize(CallingConv::ID CC) const;

		/// \returns Subtarget's default pair of minimum/maximum flat work group sizes
		/// for function \p F, or minimum/maximum flat work group sizes explicitly
		/// requested using "amdgpu-flat-work-group-size" attribute attached to
		/// function \p F.
		///
		/// \returns Subtarget's default values if explicitly requested values cannot
		/// be converted to integer, or violate subtarget's specifications.
		std::pair<unsigned, unsigned> getFlatWorkGroupSizes(const Function &F) const;

		/// \returns Subtarget's default pair of minimum/maximum number of waves per
		/// execution unit for function \p F, or minimum/maximum number of waves per
		/// execution unit explicitly requested using "amdgpu-waves-per-eu" attribute
		/// attached to function \p F.
		///
		/// \returns Subtarget's default values if explicitly requested values cannot
		/// be converted to integer, violate subtarget's specifications, or are not
		/// compatible with minimum/maximum number of waves limited by flat work group
		/// size, register usage, and/or lds usage.
		std::pair<unsigned, unsigned> getWavesPerEU(const Function &F) const;

		/// Return the amount of LDS that can be used that will not restrict the
		/// occupancy lower than WaveCount.
		unsigned getMaxLocalMemSizeWithWaveCount(unsigned WaveCount,
		const Function &) const;

		/// Inverse of getMaxLocalMemWithWaveCount. Return the maximum wavecount if
		/// the given LDS memory size is the only constraint.
		unsigned getOccupancyWithLocalMemSize(uint32_t Bytes, const Function &) const;

		unsigned getOccupancyWithLocalMemSize(const MachineFunction &MF) const;

		/// \returns Maximum number of work groups per compute unit supported by the
		/// subtarget and limited by given \p FlatWorkGroupSize.
		unsigned getMaxWorkGroupsPerCU(unsigned FlatWorkGroupSize) const {
		return AMDGPU::IsaInfo::getMaxWorkGroupsPerCU(getFeatureBitsImpl(),
		FlatWorkGroupSize);
		}

		/// \returns Minimum flat work group size supported by the subtarget.
		unsigned getMinFlatWorkGroupSize() const {
		return AMDGPU::IsaInfo::getMinFlatWorkGroupSize(getFeatureBitsImpl());
		}

		/// \returns Maximum flat work group size supported by the subtarget.
		unsigned getMaxFlatWorkGroupSize() const {
		return AMDGPU::IsaInfo::getMaxFlatWorkGroupSize(getFeatureBitsImpl());
		}

		/// \returns Maximum number of waves per execution unit supported by the
		/// subtarget and limited by given \p FlatWorkGroupSize.
		unsigned getMaxWavesPerEU(unsigned FlatWorkGroupSize) const {
		return AMDGPU::IsaInfo::getMaxWavesPerEU(getFeatureBitsImpl(),
		FlatWorkGroupSize);
		}

		/// \returns Minimum number of waves per execution unit supported by the
		/// subtarget.
		unsigned getMinWavesPerEU() const {
		return AMDGPU::IsaInfo::getMinWavesPerEU(getFeatureBitsImpl());
		}

		unsigned getMaxWavesPerEU() const { return 10; }

		/// Creates value range metadata on an workitemid.* inrinsic call or load.
		bool makeLIDRangeMetadata(Instruction *I) const;

		virtual ~AMDGPUCommonSubtarget() {}
		};

		class AMDGPUSubtarget : public AMDGPUGenSubtargetInfo,
		public AMDGPUCommonSubtarget {
public:		public:
enum Generation {		enum Generation {
R600 = 0,		// Gap for R600 generations, so we can do comparisons between
R700,		// AMDGPUSubtarget and r600Subtarget.
EVERGREEN,		SOUTHERN_ISLANDS = 4,
NORTHERN_ISLANDS,		SEA_ISLANDS = 5,
SOUTHERN_ISLANDS,		VOLCANIC_ISLANDS = 6,
SEA_ISLANDS,		GFX9 = 7,
VOLCANIC_ISLANDS,
GFX9,
};		};
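The new `Generation` enum starts at `SOUTHERN_ISLANDS = 4`, leaving values 0-3 unused so that, per the comment in the diff, generations from `AMDGPUSubtarget` and the R600 subtarget can still be compared. A small sketch of why that works, assuming (this is not shown in the excerpt) that the R600 side keeps the values `R600 = 0` through `NORTHERN_ISLANDS = 3` that those generations had in the old combined enum in the left column. `GCNGeneration`, `R600Generation`, and `isAtLeast` are hypothetical names.

```cpp
#include <cassert>

// GCN generations with the values used in this revision of AMDGPUSubtarget.
enum GCNGeneration : unsigned {
  SOUTHERN_ISLANDS = 4,
  SEA_ISLANDS = 5,
  VOLCANIC_ISLANDS = 6,
  GFX9 = 7,
};

// Assumption: the R600 subtarget keeps 0-3, the values these generations
// had in the old combined enum. Hypothetical type name.
enum R600Generation : unsigned {
  R600 = 0,
  R700 = 1,
  EVERGREEN = 2,
  NORTHERN_ISLANDS = 3,
};

// Both enums convert implicitly to unsigned and their value ranges do not
// overlap, so "is Gen at least Min?" gives a consistent answer across them.
bool isAtLeast(unsigned Gen, unsigned Min) { return Gen >= Min; }
```

Converting through `unsigned` explicitly also sidesteps comparing two different enumeration types directly, which recent C++ standards deprecate.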

enum {		enum {
ISAVersion0_0_0,		ISAVersion0_0_0,
ISAVersion6_0_0,		ISAVersion6_0_0,
ISAVersion6_0_1,		ISAVersion6_0_1,
ISAVersion7_0_0,		ISAVersion7_0_0,
ISAVersion7_0_1,		ISAVersion7_0_1,
[21 lines not shown]	enum TrapID {
TrapIDDebugBreakpoint = 7,		TrapIDDebugBreakpoint = 7,
TrapIDDebugReserved8 = 8,		TrapIDDebugReserved8 = 8,
TrapIDDebugReservedFE = 0xfe,		TrapIDDebugReservedFE = 0xfe,
TrapIDDebugReservedFF = 0xff		TrapIDDebugReservedFF = 0xff
};		};

enum TrapRegValues {		enum TrapRegValues {
LLVMTrapHandlerRegValue = 1		LLVMTrapHandlerRegValue = 1
};		};

		arsenm: Why isn't this SISubtarget/GCNSubtarget?
		tstellar: I was planning to rename this in a follow-on patch to avoid creating even more churn in this patch.
		private:
		SIFrameLowering FrameLowering;

		/// GlobalISel related APIs.
		std::unique_ptr<AMDGPUCallLowering> CallLoweringInfo;
		std::unique_ptr<InstructionSelector> InstSelector;
		std::unique_ptr<LegalizerInfo> Legalizer;
		std::unique_ptr<RegisterBankInfo> RegBankInfo;

protected:		protected:
// Basic subtarget description.		// Basic subtarget description.
Triple TargetTriple;		Triple TargetTriple;
Generation Gen;		unsigned Gen;
unsigned IsaVersion;		unsigned IsaVersion;
unsigned WavefrontSize;		unsigned WavefrontSize;
int LocalMemorySize;		int LocalMemorySize;
int LDSBankCount;		int LDSBankCount;
unsigned MaxPrivateElementSize;		unsigned MaxPrivateElementSize;

// Possibly statically set by tablegen, but may want to be overridden.		// Possibly statically set by tablegen, but may want to be overridden.
bool FastFMAF32;		bool FastFMAF32;
[63 lines not shown]	protected:
bool CFALUBug;		bool CFALUBug;
bool HasVertexCache;		bool HasVertexCache;
short TexVTXClauseSize;		short TexVTXClauseSize;
bool ScalarizeGlobal;		bool ScalarizeGlobal;

// Dummy feature to use for assembler in tablegen.		// Dummy feature to use for assembler in tablegen.
bool FeatureDisable;		bool FeatureDisable;

InstrItineraryData InstrItins;
SelectionDAGTargetInfo TSInfo;		SelectionDAGTargetInfo TSInfo;
AMDGPUAS AS;		AMDGPUAS AS;

public:		public:
AMDGPUSubtarget(const Triple &TT, StringRef GPU, StringRef FS,		AMDGPUSubtarget(const Triple &TT, StringRef GPU, StringRef FS,
const TargetMachine &TM);		const TargetMachine &TM);
~AMDGPUSubtarget() override;		~AMDGPUSubtarget() override;

AMDGPUSubtarget &initializeSubtargetDependencies(const Triple &TT,		AMDGPUSubtarget &initializeSubtargetDependencies(const Triple &TT,
StringRef GPU, StringRef FS);		StringRef GPU, StringRef FS);

const AMDGPUInstrInfo *getInstrInfo() const override = 0;		virtual const SIInstrInfo *getInstrInfo() const override = 0;
const AMDGPUFrameLowering *getFrameLowering() const override = 0;
const AMDGPUTargetLowering *getTargetLowering() const override = 0;
const AMDGPURegisterInfo *getRegisterInfo() const override = 0;

const InstrItineraryData *getInstrItineraryData() const override {		const SIFrameLowering *getFrameLowering() const override {
return &InstrItins;		return &FrameLowering;
		}

		virtual const SITargetLowering *getTargetLowering() const override = 0;

		virtual const SIRegisterInfo *getRegisterInfo() const override = 0;

		const CallLowering *getCallLowering() const override {
		return CallLoweringInfo.get();
		}

		const InstructionSelector *getInstructionSelector() const override {
		return InstSelector.get();
		}

		const LegalizerInfo *getLegalizerInfo() const override {
		return Legalizer.get();
		}

		const RegisterBankInfo *getRegBankInfo() const override {
		return RegBankInfo.get();
}		}

// Nothing implemented, just prevent crashes on use.		// Nothing implemented, just prevent crashes on use.
const SelectionDAGTargetInfo *getSelectionDAGInfo() const override {		const SelectionDAGTargetInfo *getSelectionDAGInfo() const override {
return &TSInfo;		return &TSInfo;
}		}

void ParseSubtargetFeatures(StringRef CPU, StringRef FS);		void ParseSubtargetFeatures(StringRef CPU, StringRef FS);

bool isAmdHsaOS() const {		bool isAmdHsaOS() const override {
return TargetTriple.getOS() == Triple::AMDHSA;		return TargetTriple.getOS() == Triple::AMDHSA;
}		}

bool isMesa3DOS() const {		bool isMesa3DOS() const {
return TargetTriple.getOS() == Triple::Mesa3D;		return TargetTriple.getOS() == Triple::Mesa3D;
}		}

bool isAmdPalOS() const {		bool isAmdPalOS() const override {
return TargetTriple.getOS() == Triple::AMDPAL;		return TargetTriple.getOS() == Triple::AMDPAL;
}		}

Generation getGeneration() const {		Generation getGeneration() const {
return Gen;		return (Generation)Gen;
}		}

unsigned getWavefrontSize() const {		unsigned getWavefrontSize() const override {
return WavefrontSize;		return WavefrontSize;
}		}

unsigned getWavefrontSizeLog2() const {		unsigned getWavefrontSizeLog2() const {
return Log2_32(WavefrontSize);		return Log2_32(WavefrontSize);
}		}

int getLocalMemorySize() const {		int getLocalMemorySize() const override {
return LocalMemorySize;		return LocalMemorySize;
}		}

int getLDSBankCount() const {		int getLDSBankCount() const {
return LDSBankCount;		return LDSBankCount;
}		}

unsigned getMaxPrivateElementSize() const {		unsigned getMaxPrivateElementSize() const {
return MaxPrivateElementSize;		return MaxPrivateElementSize;
}		}

		FeatureBitset getFeatureBitsImpl() const override {
		return getFeatureBits();
		}

AMDGPUAS getAMDGPUAS() const {		AMDGPUAS getAMDGPUAS() const {
return AS;		return AS;
}		}

bool has16BitInsts() const {		bool has16BitInsts() const override {
return Has16BitInsts;		return Has16BitInsts;
}		}

bool hasIntClamp() const {		bool hasIntClamp() const {
return HasIntClamp;		return HasIntClamp;
}		}

bool hasVOP3PInsts() const {		bool hasVOP3PInsts() const override {
return HasVOP3PInsts;		return HasVOP3PInsts;
}		}

bool hasFP64() const {		bool hasFP64() const {
return FP64;		return FP64;
}		}

bool hasMIMG_R128() const {		bool hasMIMG_R128() const {
return MIMG_R128;		return MIMG_R128;
}		}

		bool hasHWFP64() const {
		return FP64;
		}

bool hasFastFMAF32() const {		bool hasFastFMAF32() const {
return FastFMAF32;		return FastFMAF32;
}		}

bool hasHalfRate64Ops() const {		bool hasHalfRate64Ops() const {
return HalfRate64Ops;		return HalfRate64Ops;
}		}

bool hasAddr64() const {		virtual bool hasAddr64() const {
return (getGeneration() < VOLCANIC_ISLANDS);		return (getGeneration() < AMDGPUSubtarget::VOLCANIC_ISLANDS);
}		}
		arsenm: Why is this needed outside of GCN code?
		tstellar: It's not. I've dropped the R600 implementation of this.

bool hasBFE() const {		bool hasBFE() const override {
return (getGeneration() >= EVERGREEN);		return true;
}		}

bool hasBFI() const {		bool hasBFI() const override {
return (getGeneration() >= EVERGREEN);		return true;
}		}

bool hasBFM() const {		bool hasBFM() const {
return hasBFE();		return hasBFE();
}		}

bool hasBCNT(unsigned Size) const {		bool hasBCNT(unsigned Size) const override {
if (Size == 32)		return true;
return (getGeneration() >= EVERGREEN);

if (Size == 64)
return (getGeneration() >= SOUTHERN_ISLANDS);

return false;
}		}

bool hasMulU24() const {		bool hasMulU24() const override {
return (getGeneration() >= EVERGREEN);		return true;
}		}

bool hasMulI24() const {		bool hasMulI24() const override {
return (getGeneration() >= SOUTHERN_ISLANDS ||		return true;
hasCaymanISA());
}		}

bool hasFFBL() const {		bool hasFFBL() const override {
return (getGeneration() >= EVERGREEN);		return true;
}		}

bool hasFFBH() const {		bool hasFFBH() const override {
return (getGeneration() >= EVERGREEN);		return true;
}		}

bool hasMed3_16() const {		virtual bool hasMed3_16() const {
return getGeneration() >= GFX9;		return getGeneration() >= AMDGPUSubtarget::GFX9;
}		}

bool hasMin3Max3_16() const {		virtual bool hasMin3Max3_16() const {
return getGeneration() >= GFX9;		return getGeneration() >= AMDGPUSubtarget::GFX9;
}		}

bool hasMadMixInsts() const {		bool hasMadMixInsts() const override {
return HasMadMixInsts;		return HasMadMixInsts;
}		}

bool hasCARRY() const {		bool hasCARRY() const override {
return (getGeneration() >= EVERGREEN);		return true;
}		}

bool hasBORROW() const {		virtual bool hasBORROW() const {
return (getGeneration() >= EVERGREEN);		return true;
}		}
		arsenm: Why are these left over as virtual?

bool hasCaymanISA() const {		virtual bool hasCaymanISA() const {
return CaymanISA;		return false;
}		}

bool hasFMA() const {		bool hasFMA() const {
return FMA;		return FMA;
}		}

TrapHandlerAbi getTrapHandlerAbi() const {		TrapHandlerAbi getTrapHandlerAbi() const {
return isAmdHsaOS() ? TrapHandlerAbiHsa : TrapHandlerAbiNone;		return isAmdHsaOS() ? TrapHandlerAbiHsa : TrapHandlerAbiNone;
}		}

bool enableHugePrivateBuffer() const {		bool enableHugePrivateBuffer() const {
return EnableHugePrivateBuffer;		return EnableHugePrivateBuffer;
}		}

bool isPromoteAllocaEnabled() const {		bool isPromoteAllocaEnabled() const override {
return EnablePromoteAlloca;		return EnablePromoteAlloca;
}		}

bool unsafeDSOffsetFoldingEnabled() const {		bool unsafeDSOffsetFoldingEnabled() const {
return EnableUnsafeDSOffsetFolding;		return EnableUnsafeDSOffsetFolding;
}		}

bool dumpCode() const {		bool dumpCode() const override {
return DumpCode;		return DumpCode;
}		}

/// Return the amount of LDS that can be used that will not restrict the		/// Return the amount of LDS that can be used that will not restrict the
/// occupancy lower than WaveCount.		/// occupancy lower than WaveCount.
unsigned getMaxLocalMemSizeWithWaveCount(unsigned WaveCount,		unsigned getMaxLocalMemSizeWithWaveCount(unsigned WaveCount,
const Function &) const;		const Function &) const;

/// Inverse of getMaxLocalMemWithWaveCount. Return the maximum wavecount if
/// the given LDS memory size is the only constraint.
unsigned getOccupancyWithLocalMemSize(uint32_t Bytes, const Function &) const;

unsigned getOccupancyWithLocalMemSize(const MachineFunction &MF) const;

bool hasFP16Denormals() const {		bool hasFP16Denormals() const {
return FP64FP16Denormals;		return FP64FP16Denormals;
}		}

bool hasFP32Denormals() const {		bool hasFP32Denormals() const override {
return FP32Denormals;		return FP32Denormals;
}		}

bool hasFP64Denormals() const {		bool hasFP64Denormals() const {
return FP64FP16Denormals;		return FP64FP16Denormals;
}		}

bool supportsMinMaxDenormModes() const {		bool supportsMinMaxDenormModes() const {
return getGeneration() >= AMDGPUSubtarget::GFX9;		return getGeneration() >= AMDGPUSubtarget::GFX9;
}		}

bool hasFPExceptions() const {		bool hasFPExceptions() const override {
return FPExceptions;		return FPExceptions;
}		}

bool enableDX10Clamp() const {		bool enableDX10Clamp() const override {
return DX10Clamp;		return DX10Clamp;
}		}

bool enableIEEEBit(const MachineFunction &MF) const {		bool enableIEEEBit(const MachineFunction &MF) const {
return AMDGPU::isCompute(MF.getFunction().getCallingConv());		return AMDGPU::isCompute(MF.getFunction().getCallingConv());
}		}

bool useFlatForGlobal() const {		bool useFlatForGlobal() const {
[24 lines not shown]	bool hasUnalignedBufferAccess() const {
return UnalignedBufferAccess;		return UnalignedBufferAccess;
}		}

bool hasUnalignedScratchAccess() const {		bool hasUnalignedScratchAccess() const {
return UnalignedScratchAccess;		return UnalignedScratchAccess;
}		}

bool hasApertureRegs() const {		bool hasApertureRegs() const {
return HasApertureRegs;		return HasApertureRegs;
}		}

bool isTrapHandlerEnabled() const {		bool isTrapHandlerEnabled() const {
return TrapHandler;		return TrapHandler;
}		}

bool isXNACKEnabled() const {		bool isXNACKEnabled() const {
return EnableXNACK;		return EnableXNACK;
[37 lines not shown]	bool isMesaKernel(const MachineFunction &MF) const {
return isMesa3DOS() && !AMDGPU::isShader(MF.getFunction().getCallingConv());		return isMesa3DOS() && !AMDGPU::isShader(MF.getFunction().getCallingConv());
}		}

// Covers VS/PS/CS graphics shaders		// Covers VS/PS/CS graphics shaders
bool isMesaGfxShader(const MachineFunction &MF) const {		bool isMesaGfxShader(const MachineFunction &MF) const {
return isMesa3DOS() && AMDGPU::isShader(MF.getFunction().getCallingConv());		return isMesa3DOS() && AMDGPU::isShader(MF.getFunction().getCallingConv());
}		}

bool isAmdCodeObjectV2(const MachineFunction &MF) const {		bool isAmdCodeObjectV2(const MachineFunction &MF) const override {
return isAmdHsaOS() || isMesaKernel(MF);		return isAmdHsaOS() || isMesaKernel(MF);
}		}

bool hasMad64_32() const {		bool hasMad64_32() const {
return getGeneration() >= SEA_ISLANDS;		return getGeneration() >= SEA_ISLANDS;
}		}

bool hasFminFmaxLegacy() const {		bool hasFminFmaxLegacy() const override {
return getGeneration() < AMDGPUSubtarget::VOLCANIC_ISLANDS;		return getGeneration() < AMDGPUSubtarget::VOLCANIC_ISLANDS;
}		}

bool hasSDWA() const {		bool hasSDWA() const override {
return HasSDWA;		return HasSDWA;
}		}

bool hasSDWAOmod() const {		bool hasSDWAOmod() const {
return HasSDWAOmod;		return HasSDWAOmod;
}		}

bool hasSDWAScalar() const {		bool hasSDWAScalar() const {
[17 lines not shown]	public:
}		}

/// \brief Returns the offset in bytes from the start of the input buffer		/// \brief Returns the offset in bytes from the start of the input buffer
/// of the first explicit kernel argument.		/// of the first explicit kernel argument.
unsigned getExplicitKernelArgOffset(const MachineFunction &MF) const {		unsigned getExplicitKernelArgOffset(const MachineFunction &MF) const {
return isAmdCodeObjectV2(MF) ? 0 : 36;		return isAmdCodeObjectV2(MF) ? 0 : 36;
}		}

unsigned getAlignmentForImplicitArgPtr() const {		unsigned getAlignmentForImplicitArgPtr() const override {
return isAmdHsaOS() ? 8 : 4;		return isAmdHsaOS() ? 8 : 4;
}		}

/// \returns Number of bytes of arguments that are passed to a shader or		/// \returns Number of bytes of arguments that are passed to a shader or
/// kernel in addition to the explicit ones declared for the function.		/// kernel in addition to the explicit ones declared for the function.
unsigned getImplicitArgNumBytes(const MachineFunction &MF) const {		unsigned getImplicitArgNumBytes(const MachineFunction &MF) const {
if (isMesaKernel(MF))		if (isMesaKernel(MF))
return 16;		return 16;
[16 lines not shown]	public:
bool enableMachineScheduler() const override {		bool enableMachineScheduler() const override {
return true;		return true;
}		}

bool enableSubRegLiveness() const override {		bool enableSubRegLiveness() const override {
return true;		return true;
}		}

void setScalarizeGlobalBehavior(bool b) { ScalarizeGlobal = b;}		void setScalarizeGlobalBehavior(bool b) { ScalarizeGlobal = b; }
bool getScalarizeGlobalBehavior() const { return ScalarizeGlobal;}		bool getScalarizeGlobalBehavior() const { return ScalarizeGlobal; }

/// \returns Number of execution units per compute unit supported by the		/// \returns Number of execution units per compute unit supported by the
/// subtarget.		/// subtarget.
unsigned getEUsPerCU() const {		unsigned getEUsPerCU() const {
return AMDGPU::IsaInfo::getEUsPerCU(getFeatureBits());		return AMDGPU::IsaInfo::getEUsPerCU(MCSubtargetInfo::getFeatureBits());
}

/// \returns Maximum number of work groups per compute unit supported by the
/// subtarget and limited by given \p FlatWorkGroupSize.
unsigned getMaxWorkGroupsPerCU(unsigned FlatWorkGroupSize) const {
return AMDGPU::IsaInfo::getMaxWorkGroupsPerCU(getFeatureBits(),
FlatWorkGroupSize);
}		}

/// \returns Maximum number of waves per compute unit supported by the		/// \returns Maximum number of waves per compute unit supported by the
/// subtarget without any kind of limitation.		/// subtarget without any kind of limitation.
unsigned getMaxWavesPerCU() const {		unsigned getMaxWavesPerCU() const {
return AMDGPU::IsaInfo::getMaxWavesPerCU(getFeatureBits());		return AMDGPU::IsaInfo::getMaxWavesPerCU(MCSubtargetInfo::getFeatureBits());
}		}

/// \returns Maximum number of waves per compute unit supported by the		/// \returns Maximum number of waves per compute unit supported by the
/// subtarget and limited by given \p FlatWorkGroupSize.		/// subtarget and limited by given \p FlatWorkGroupSize.
unsigned getMaxWavesPerCU(unsigned FlatWorkGroupSize) const {		unsigned getMaxWavesPerCU(unsigned FlatWorkGroupSize) const {
return AMDGPU::IsaInfo::getMaxWavesPerCU(getFeatureBits(),		return AMDGPU::IsaInfo::getMaxWavesPerCU(MCSubtargetInfo::getFeatureBits(),
FlatWorkGroupSize);		FlatWorkGroupSize);
}		}

/// \returns Minimum number of waves per execution unit supported by the
/// subtarget.
unsigned getMinWavesPerEU() const {
return AMDGPU::IsaInfo::getMinWavesPerEU(getFeatureBits());
}

/// \returns Maximum number of waves per execution unit supported by the		/// \returns Maximum number of waves per execution unit supported by the
/// subtarget without any kind of limitation.		/// subtarget without any kind of limitation.
unsigned getMaxWavesPerEU() const {		unsigned getMaxWavesPerEU() const {
return AMDGPU::IsaInfo::getMaxWavesPerEU(getFeatureBits());		return AMDGPU::IsaInfo::getMaxWavesPerEU();
}

/// \returns Maximum number of waves per execution unit supported by the
/// subtarget and limited by given \p FlatWorkGroupSize.
unsigned getMaxWavesPerEU(unsigned FlatWorkGroupSize) const {
return AMDGPU::IsaInfo::getMaxWavesPerEU(getFeatureBits(),
FlatWorkGroupSize);
}

/// \returns Minimum flat work group size supported by the subtarget.
unsigned getMinFlatWorkGroupSize() const {
return AMDGPU::IsaInfo::getMinFlatWorkGroupSize(getFeatureBits());
}

/// \returns Maximum flat work group size supported by the subtarget.
unsigned getMaxFlatWorkGroupSize() const {
return AMDGPU::IsaInfo::getMaxFlatWorkGroupSize(getFeatureBits());
}		}

/// \returns Number of waves per work group supported by the subtarget and		/// \returns Number of waves per work group supported by the subtarget and
/// limited by given \p FlatWorkGroupSize.		/// limited by given \p FlatWorkGroupSize.
unsigned getWavesPerWorkGroup(unsigned FlatWorkGroupSize) const {		unsigned getWavesPerWorkGroup(unsigned FlatWorkGroupSize) const {
return AMDGPU::IsaInfo::getWavesPerWorkGroup(getFeatureBits(),		return AMDGPU::IsaInfo::getWavesPerWorkGroup(
FlatWorkGroupSize);		MCSubtargetInfo::getFeatureBits(), FlatWorkGroupSize);
}

/// \returns Default range flat work group size for a calling convention.
std::pair<unsigned, unsigned> getDefaultFlatWorkGroupSize(CallingConv::ID CC) const;

/// \returns Subtarget's default pair of minimum/maximum flat work group sizes
/// for function \p F, or minimum/maximum flat work group sizes explicitly
/// requested using "amdgpu-flat-work-group-size" attribute attached to
/// function \p F.
///
/// \returns Subtarget's default values if explicitly requested values cannot
/// be converted to integer, or violate subtarget's specifications.
std::pair<unsigned, unsigned> getFlatWorkGroupSizes(const Function &F) const;

/// \returns Subtarget's default pair of minimum/maximum number of waves per
/// execution unit for function \p F, or minimum/maximum number of waves per
/// execution unit explicitly requested using "amdgpu-waves-per-eu" attribute
/// attached to function \p F.
///
/// \returns Subtarget's default values if explicitly requested values cannot
/// be converted to integer, violate subtarget's specifications, or are not
/// compatible with minimum/maximum number of waves limited by flat work group
/// size, register usage, and/or lds usage.
std::pair<unsigned, unsigned> getWavesPerEU(const Function &F) const;

/// Creates value range metadata on a workitemid.* intrinsic call or load.
bool makeLIDRangeMetadata(Instruction *I) const;
};

class R600Subtarget final : public AMDGPUSubtarget {
private:
R600InstrInfo InstrInfo;
R600FrameLowering FrameLowering;
R600TargetLowering TLInfo;

public:
R600Subtarget(const Triple &TT, StringRef CPU, StringRef FS,
const TargetMachine &TM);

const R600InstrInfo *getInstrInfo() const override {
return &InstrInfo;
}

const R600FrameLowering *getFrameLowering() const override {
return &FrameLowering;
}

const R600TargetLowering *getTargetLowering() const override {
return &TLInfo;
}

const R600RegisterInfo *getRegisterInfo() const override {
return &InstrInfo.getRegisterInfo();
}

bool hasCFAluBug() const {
return CFALUBug;
}

bool hasVertexCache() const {
return HasVertexCache;
}

short getTexVTXClauseSize() const {
return TexVTXClauseSize;
}		}
};		};

class SISubtarget final : public AMDGPUSubtarget {		class SISubtarget final : public AMDGPUSubtarget {
private:		private:
SIInstrInfo InstrInfo;		SIInstrInfo InstrInfo;
SIFrameLowering FrameLowering;		SIFrameLowering FrameLowering;
SITargetLowering TLInfo;		SITargetLowering TLInfo;
[34 unchanged lines not shown]

const RegisterBankInfo *getRegBankInfo() const override {		const RegisterBankInfo *getRegBankInfo() const override {
return RegBankInfo.get();		return RegBankInfo.get();
}		}

const SIRegisterInfo *getRegisterInfo() const override {		const SIRegisterInfo *getRegisterInfo() const override {
return &InstrInfo.getRegisterInfo();		return &InstrInfo.getRegisterInfo();
}		}
		// static wrappers
		static bool hasHalfRate64Ops(const TargetSubtargetInfo &STI);

// XXX - Why is this here if it isn't in the default pass set?		// XXX - Why is this here if it isn't in the default pass set?
bool enableEarlyIfConversion() const override {		bool enableEarlyIfConversion() const override {
return true;		return true;
}		}

void overrideSchedPolicy(MachineSchedPolicy &Policy,		void overrideSchedPolicy(MachineSchedPolicy &Policy,
unsigned NumRegionInstrs) const override;		unsigned NumRegionInstrs) const override;

bool isVGPRSpillingEnabled(const Function& F) const;		bool isVGPRSpillingEnabled(const Function &F) const;

unsigned getMaxNumUserSGPRs() const {		unsigned getMaxNumUserSGPRs() const {
return 16;		return 16;
}		}

bool hasSMemRealTime() const {		bool hasSMemRealTime() const {
return HasSMemRealTime;		return HasSMemRealTime;
}		}
[31 unchanged lines not shown]
}		}

bool enableSIScheduler() const {		bool enableSIScheduler() const {
return EnableSIScheduler;		return EnableSIScheduler;
}		}

bool debuggerSupported() const {		bool debuggerSupported() const {
return debuggerInsertNops() && debuggerReserveRegs() &&		return debuggerInsertNops() && debuggerReserveRegs() &&
debuggerEmitPrologue();		debuggerEmitPrologue();
}		}

bool debuggerInsertNops() const {		bool debuggerInsertNops() const {
return DebuggerInsertNops;		return DebuggerInsertNops;
}		}

bool debuggerReserveRegs() const {		bool debuggerReserveRegs() const {
return DebuggerReserveRegs;		return DebuggerReserveRegs;
[25 unchanged lines not shown]

bool hasReadM0SendMsgHazard() const {		bool hasReadM0SendMsgHazard() const {
return getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS;		return getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS;
}		}

unsigned getKernArgSegmentSize(const MachineFunction &MF,		unsigned getKernArgSegmentSize(const MachineFunction &MF,
unsigned ExplictArgBytes) const;		unsigned ExplictArgBytes) const;

/// Return the maximum number of waves per SIMD for kernels using \p SGPRs SGPRs		/// Return the maximum number of waves per SIMD for kernels using \p SGPRs
		/// SGPRs
unsigned getOccupancyWithNumSGPRs(unsigned SGPRs) const;		unsigned getOccupancyWithNumSGPRs(unsigned SGPRs) const;

/// Return the maximum number of waves per SIMD for kernels using \p VGPRs VGPRs		/// Return the maximum number of waves per SIMD for kernels using \p VGPRs
		/// VGPRs
unsigned getOccupancyWithNumVGPRs(unsigned VGPRs) const;		unsigned getOccupancyWithNumVGPRs(unsigned VGPRs) const;

/// \returns true if the flat_scratch register should be initialized with the		/// \returns true if the flat_scratch register should be initialized with the
/// pointer to the wave's scratch memory rather than a size and offset.		/// pointer to the wave's scratch memory rather than a size and offset.
bool flatScratchIsPointer() const {		bool flatScratchIsPointer() const {
return getGeneration() >= GFX9;		return getGeneration() >= AMDGPUSubtarget::GFX9;
}		}

/// \returns true if the machine has merged shaders in which s0-s7 are		/// \returns true if the machine has merged shaders in which s0-s7 are
/// reserved by the hardware and user SGPRs start at s8		/// reserved by the hardware and user SGPRs start at s8
bool hasMergedShaders() const {		bool hasMergedShaders() const {
return getGeneration() >= GFX9;		return getGeneration() >= GFX9;
}		}

/// \returns SGPR allocation granularity supported by the subtarget.		/// \returns SGPR allocation granularity supported by the subtarget.
unsigned getSGPRAllocGranule() const {		unsigned getSGPRAllocGranule() const {
return AMDGPU::IsaInfo::getSGPRAllocGranule(getFeatureBits());		return AMDGPU::IsaInfo::getSGPRAllocGranule(
		MCSubtargetInfo::getFeatureBits());
}		}

/// \returns SGPR encoding granularity supported by the subtarget.		/// \returns SGPR encoding granularity supported by the subtarget.
unsigned getSGPREncodingGranule() const {		unsigned getSGPREncodingGranule() const {
return AMDGPU::IsaInfo::getSGPREncodingGranule(getFeatureBits());		return AMDGPU::IsaInfo::getSGPREncodingGranule(
		MCSubtargetInfo::getFeatureBits());
}		}

/// \returns Total number of SGPRs supported by the subtarget.		/// \returns Total number of SGPRs supported by the subtarget.
unsigned getTotalNumSGPRs() const {		unsigned getTotalNumSGPRs() const {
return AMDGPU::IsaInfo::getTotalNumSGPRs(getFeatureBits());		return AMDGPU::IsaInfo::getTotalNumSGPRs(MCSubtargetInfo::getFeatureBits());
}		}

/// \returns Addressable number of SGPRs supported by the subtarget.		/// \returns Addressable number of SGPRs supported by the subtarget.
unsigned getAddressableNumSGPRs() const {		unsigned getAddressableNumSGPRs() const {
return AMDGPU::IsaInfo::getAddressableNumSGPRs(getFeatureBits());		return AMDGPU::IsaInfo::getAddressableNumSGPRs(
		MCSubtargetInfo::getFeatureBits());
}		}

/// \returns Minimum number of SGPRs that meets the given number of waves per		/// \returns Minimum number of SGPRs that meets the given number of waves per
/// execution unit requirement supported by the subtarget.		/// execution unit requirement supported by the subtarget.
unsigned getMinNumSGPRs(unsigned WavesPerEU) const {		unsigned getMinNumSGPRs(unsigned WavesPerEU) const {
return AMDGPU::IsaInfo::getMinNumSGPRs(getFeatureBits(), WavesPerEU);		return AMDGPU::IsaInfo::getMinNumSGPRs(MCSubtargetInfo::getFeatureBits(),
		WavesPerEU);
}		}

/// \returns Maximum number of SGPRs that meets the given number of waves per		/// \returns Maximum number of SGPRs that meets the given number of waves per
/// execution unit requirement supported by the subtarget.		/// execution unit requirement supported by the subtarget.
unsigned getMaxNumSGPRs(unsigned WavesPerEU, bool Addressable) const {		unsigned getMaxNumSGPRs(unsigned WavesPerEU, bool Addressable) const {
return AMDGPU::IsaInfo::getMaxNumSGPRs(getFeatureBits(), WavesPerEU,		return AMDGPU::IsaInfo::getMaxNumSGPRs(MCSubtargetInfo::getFeatureBits(),
Addressable);		WavesPerEU, Addressable);
}		}

/// \returns Reserved number of SGPRs for given function \p MF.		/// \returns Reserved number of SGPRs for given function \p MF.
unsigned getReservedNumSGPRs(const MachineFunction &MF) const;		unsigned getReservedNumSGPRs(const MachineFunction &MF) const;

/// \returns Maximum number of SGPRs that meets number of waves per execution		/// \returns Maximum number of SGPRs that meets number of waves per execution
/// unit requirement for function \p MF, or number of SGPRs explicitly		/// unit requirement for function \p MF, or number of SGPRs explicitly
/// requested using "amdgpu-num-sgpr" attribute attached to function \p MF.		/// requested using "amdgpu-num-sgpr" attribute attached to function \p MF.
///		///
/// \returns Value that meets number of waves per execution unit requirement		/// \returns Value that meets number of waves per execution unit requirement
/// if explicitly requested value cannot be converted to integer, violates		/// if explicitly requested value cannot be converted to integer, violates
/// subtarget's specifications, or does not meet number of waves per execution		/// subtarget's specifications, or does not meet number of waves per execution
/// unit requirement.		/// unit requirement.
unsigned getMaxNumSGPRs(const MachineFunction &MF) const;		unsigned getMaxNumSGPRs(const MachineFunction &MF) const;

/// \returns VGPR allocation granularity supported by the subtarget.		/// \returns VGPR allocation granularity supported by the subtarget.
unsigned getVGPRAllocGranule() const {		unsigned getVGPRAllocGranule() const {
return AMDGPU::IsaInfo::getVGPRAllocGranule(getFeatureBits());		return AMDGPU::IsaInfo::getVGPRAllocGranule(
		MCSubtargetInfo::getFeatureBits());
}		}

/// \returns VGPR encoding granularity supported by the subtarget.		/// \returns VGPR encoding granularity supported by the subtarget.
unsigned getVGPREncodingGranule() const {		unsigned getVGPREncodingGranule() const {
return AMDGPU::IsaInfo::getVGPREncodingGranule(getFeatureBits());		return AMDGPU::IsaInfo::getVGPREncodingGranule(
		MCSubtargetInfo::getFeatureBits());
}		}

/// \returns Total number of VGPRs supported by the subtarget.		/// \returns Total number of VGPRs supported by the subtarget.
unsigned getTotalNumVGPRs() const {		unsigned getTotalNumVGPRs() const {
return AMDGPU::IsaInfo::getTotalNumVGPRs(getFeatureBits());		return AMDGPU::IsaInfo::getTotalNumVGPRs(MCSubtargetInfo::getFeatureBits());
}		}

/// \returns Addressable number of VGPRs supported by the subtarget.		/// \returns Addressable number of VGPRs supported by the subtarget.
unsigned getAddressableNumVGPRs() const {		unsigned getAddressableNumVGPRs() const {
return AMDGPU::IsaInfo::getAddressableNumVGPRs(getFeatureBits());		return AMDGPU::IsaInfo::getAddressableNumVGPRs(
		MCSubtargetInfo::getFeatureBits());
}		}

/// \returns Minimum number of VGPRs that meets given number of waves per		/// \returns Minimum number of VGPRs that meets given number of waves per
/// execution unit requirement supported by the subtarget.		/// execution unit requirement supported by the subtarget.
unsigned getMinNumVGPRs(unsigned WavesPerEU) const {		unsigned getMinNumVGPRs(unsigned WavesPerEU) const {
return AMDGPU::IsaInfo::getMinNumVGPRs(getFeatureBits(), WavesPerEU);		return AMDGPU::IsaInfo::getMinNumVGPRs(MCSubtargetInfo::getFeatureBits(),
		WavesPerEU);
}		}

/// \returns Maximum number of VGPRs that meets given number of waves per		/// \returns Maximum number of VGPRs that meets given number of waves per
/// execution unit requirement supported by the subtarget.		/// execution unit requirement supported by the subtarget.
unsigned getMaxNumVGPRs(unsigned WavesPerEU) const {		unsigned getMaxNumVGPRs(unsigned WavesPerEU) const {
return AMDGPU::IsaInfo::getMaxNumVGPRs(getFeatureBits(), WavesPerEU);		return AMDGPU::IsaInfo::getMaxNumVGPRs(MCSubtargetInfo::getFeatureBits(),
		WavesPerEU);
}		}

/// \returns Reserved number of VGPRs for given function \p MF.		/// \returns Reserved number of VGPRs for given function \p MF.
unsigned getReservedNumVGPRs(const MachineFunction &MF) const {		unsigned getReservedNumVGPRs(const MachineFunction &MF) const {
return debuggerReserveRegs() ? 4 : 0;		return debuggerReserveRegs() ? 4 : 0;
}		}

/// \returns Maximum number of VGPRs that meets number of waves per execution		/// \returns Maximum number of VGPRs that meets number of waves per execution
/// unit requirement for function \p MF, or number of VGPRs explicitly		/// unit requirement for function \p MF, or number of VGPRs explicitly
/// requested using "amdgpu-num-vgpr" attribute attached to function \p MF.		/// requested using "amdgpu-num-vgpr" attribute attached to function \p MF.
///		///
/// \returns Value that meets number of waves per execution unit requirement		/// \returns Value that meets number of waves per execution unit requirement
/// if explicitly requested value cannot be converted to integer, violates		/// if explicitly requested value cannot be converted to integer, violates
/// subtarget's specifications, or does not meet number of waves per execution		/// subtarget's specifications, or does not meet number of waves per execution
/// unit requirement.		/// unit requirement.
unsigned getMaxNumVGPRs(const MachineFunction &MF) const;		unsigned getMaxNumVGPRs(const MachineFunction &MF) const;

void getPostRAMutations(		void getPostRAMutations(
std::vector<std::unique_ptr<ScheduleDAGMutation>> &Mutations)		std::vector<std::unique_ptr<ScheduleDAGMutation>> &Mutations)
const override;		const override;
};		};
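The SGPR and VGPR helpers in `SISubtarget` forward to `AMDGPU::IsaInfo`, which derives per-wave register budgets from the register file size, the allocation granule and the addressable limit. The sketch below models that relationship under a simplifying assumption: divide the file among the requested waves, round down to the granule, and clamp to the addressable count. It ignores any generation-specific adjustments the real helpers may apply. `RegFileInfo`, `alignDownTo` and `maxRegsForWaves` are hypothetical names, and the numbers used with them are illustrative rather than taken from a real GPU.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical description of one register file (values are illustrative).
struct RegFileInfo {
  unsigned Total;        // registers per SIMD, shared by all resident waves
  unsigned AllocGranule; // registers are allocated in multiples of this
  unsigned Addressable;  // most registers a single wave can address
};

// Round Value down to a multiple of Align.
unsigned alignDownTo(unsigned Value, unsigned Align) {
  assert(Align != 0 && "granule must be non-zero");
  return Value - Value % Align;
}

// Simplified model: largest per-wave register count that still lets
// WavesPerEU waves share the register file.
unsigned maxRegsForWaves(const RegFileInfo &RF, unsigned WavesPerEU) {
  assert(WavesPerEU != 0 && "need at least one wave");
  unsigned Budget = alignDownTo(RF.Total / WavesPerEU, RF.AllocGranule);
  return std::min(Budget, RF.Addressable);
}
```

With `RegFileInfo{256, 4, 256}`, asking for 10 waves leaves 25 registers per wave, which the granule rounds down to 24. This is why the granule and the addressable limit both appear in the real interface.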


		class R600Subtarget final : public R600GenSubtargetInfo,
		public AMDGPUCommonSubtarget {
		public:
		enum Generation { R600 = 0, R700 = 1, EVERGREEN = 2, NORTHERN_ISLANDS = 3 };

		private:
		R600InstrInfo InstrInfo;
		R600FrameLowering FrameLowering;
		R600TargetLowering TLInfo;
		unsigned WavefrontSize;
		bool FMA;
		bool CaymanISA;
		bool CFALUBug;
		bool DX10Clamp;
		bool HasVertexCache;
		bool FP32Denormals;
		bool R600ALUInst;
		bool DumpCode;
		bool FP64;
		bool EnablePromoteAlloca;
		short TexVTXClauseSize;
		Generation Gen;
		int LocalMemorySize;
		unsigned MaxPrivateElementSize;
		int LDSBankCount;
		InstrItineraryData InstrItins;
		SelectionDAGTargetInfo TSInfo;
		AMDGPUAS AS;

		public:
		R600Subtarget(const Triple &TT, StringRef CPU, StringRef FS,
		const TargetMachine &TM);

		const R600InstrInfo *getInstrInfo() const override { return &InstrInfo; }

		const R600FrameLowering *getFrameLowering() const override {
		return &FrameLowering;
		}

		const R600TargetLowering *getTargetLowering() const override {
		return &TLInfo;
		}

		const R600RegisterInfo *getRegisterInfo() const override {
		return &InstrInfo.getRegisterInfo();
		}

		const InstrItineraryData *getInstrItineraryData() const override {
		return &InstrItins;
		}

		// Nothing implemented, just prevent crashes on use.
		const SelectionDAGTargetInfo *getSelectionDAGInfo() const override {
		return &TSInfo;
		}

		void ParseSubtargetFeatures(StringRef CPU, StringRef FS);

		Generation getGeneration() const {
		return Gen;
		}

		unsigned getStackAlignment() const {
		return 4;
		}

		bool isAmdCodeObjectV2(const MachineFunction &MF) const override {
		return false;
		}

		bool isAmdHsaOS() const override {
		return false;
		}

		bool isAmdPalOS() const override {
		return false;
		}

		bool dumpCode() const override {
		return DumpCode;
		}

		bool enableDX10Clamp() const override {
		return DX10Clamp;
		}

		R600Subtarget &initializeSubtargetDependencies(const Triple &TT,
		StringRef GPU, StringRef FS);

		FeatureBitset getFeatureBitsImpl() const override {
		return getFeatureBits();
		}

		unsigned getAlignmentForImplicitArgPtr() const override {
		return 4;
		}

		bool isPromoteAllocaEnabled() const override {
		return EnablePromoteAlloca;
		}

		bool hasAddr64() const {
		return false;
		}

		bool has16BitInsts() const override {
		return false;
		}

		unsigned getWavefrontSize() const override {
		return WavefrontSize;
		}

		int getLocalMemorySize() const override {
		return LocalMemorySize;
		}

		bool hasFP32Denormals() const override {
		return FP32Denormals;
		}

		bool hasBFE() const override {
		return (getGeneration() >= EVERGREEN);
		}

		bool hasBFI() const override {
		return (getGeneration() >= EVERGREEN);
		}

		bool hasBCNT(unsigned Size) const override {
		if (Size == 32)
		return (getGeneration() >= EVERGREEN);

		return false;
		}

		bool hasBORROW() const {
		return (getGeneration() >= EVERGREEN);
		}

		bool hasMadMixInsts() const override {
		return false;
		}

		bool hasCARRY() const override {
		return (getGeneration() >= EVERGREEN);
		}

		bool hasCaymanISA() const {
		return CaymanISA;
		}

		bool hasFFBL() const override {
		return (getGeneration() >= EVERGREEN);
		}

		bool hasFFBH() const override { return (getGeneration() >= EVERGREEN); }

		bool hasFMA() const { return FMA; }

		bool hasFminFmaxLegacy() const override { return true; }

		bool hasFPExceptions() const override { return false; }

		bool hasMed3_16() const { return false; }

		bool hasMin3Max3_16() { return false; }

		bool hasMulU24() const override { return (getGeneration() >= EVERGREEN); }

		bool hasMulI24() const override { return hasCaymanISA(); }

		bool hasSDWA() const override { return false; }

		bool hasVOP3PInsts() const override { return false; }

		unsigned getExplicitKernelArgOffset(const MachineFunction &MF) const {
		return 36;
		}

		bool hasCFAluBug() const { return CFALUBug; }

		bool hasVertexCache() const { return HasVertexCache; }

		short getTexVTXClauseSize() const { return TexVTXClauseSize; }

		AMDGPUAS getAMDGPUAS() const { return AS; }

		bool enableMachineScheduler() const override {
		return true;
		}

		bool enableSubRegLiveness() const override {
		return true;
		}
		};
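Predicates such as `hasBFE()`, `hasMulU24()` and `hasBCNT()` in the new `R600Subtarget` rely on the `Generation` enumerators being declared in chronological order, so one `>=` comparison covers every later family. The following standalone sketch shows the same pattern. `R600Gen` and the two free functions are hypothetical stand-ins that copy the enumerator values from the class above.

```cpp
#include <cassert>

// Stand-in for R600Subtarget::Generation; enumerators are in release order.
enum R600Gen { R600 = 0, R700 = 1, EVERGREEN = 2, NORTHERN_ISLANDS = 3 };
static_assert(R700 < EVERGREEN && EVERGREEN < NORTHERN_ISLANDS,
              "feature checks below depend on chronological ordering");

// Introduced in Evergreen, so present on every later family as well.
bool hasBFE(R600Gen Gen) { return Gen >= EVERGREEN; }

// Like hasBCNT(): only the 32-bit form, and only from Evergreen onward.
bool hasBCNT(R600Gen Gen, unsigned Size) {
  return Size == 32 && Gen >= EVERGREEN;
}
```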

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_LIB_TARGET_AMDGPU_AMDGPUSUBTARGET_H		#endif // LLVM_LIB_TARGET_AMDGPU_AMDGPUSUBTARGET_H
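`R600Subtarget` now derives from its own TableGen-generated info class and from `AMDGPUCommonSubtarget`. It overrides hooks such as `getWavefrontSize()`, `isAmdHsaOS()` and `getFeatureBitsImpl()`, so code shared between R600 and GCN can query either sub-target through one interface. That indirection is the virtual-call cost raised in review. The sketch below is a minimal, hypothetical version of this shape (`CommonSubtargetSketch`, `R600Sketch`, `GCNSketch` and `implicitArgAlignment` are not LLVM classes, and the wavefront sizes are illustrative). Shared non-virtual logic sits in the base and is built on the hooks.

```cpp
#include <cassert>
#include <utility>

// Interface shared by both sub-targets; code common to R600 and GCN only
// sees this type and pays one virtual call per hook.
class CommonSubtargetSketch {
public:
  virtual ~CommonSubtargetSketch() = default;
  virtual unsigned getWavefrontSize() const = 0;
  virtual bool isAmdHsaOS() const = 0;

  // Shared, non-virtual logic on top of the hooks, modelled on the kernel
  // case of getDefaultFlatWorkGroupSize().
  std::pair<unsigned, unsigned> defaultKernelFlatWorkGroupSize() const {
    unsigned W = getWavefrontSize();
    return {W * 2, W * 4};
  }
};

class R600Sketch final : public CommonSubtargetSketch {
  unsigned WavefrontSize;
public:
  explicit R600Sketch(unsigned WS) : WavefrontSize(WS) {}
  unsigned getWavefrontSize() const override { return WavefrontSize; }
  bool isAmdHsaOS() const override { return false; } // as in R600Subtarget
};

class GCNSketch final : public CommonSubtargetSketch {
  bool HsaOS;
public:
  explicit GCNSketch(bool Hsa) : HsaOS(Hsa) {}
  unsigned getWavefrontSize() const override { return 64; }
  bool isAmdHsaOS() const override { return HsaOS; }
};

// Shared caller: mirrors the `isAmdHsaOS() ? 8 : 4` alignment choice.
unsigned implicitArgAlignment(const CommonSubtargetSketch &ST) {
  return ST.isAmdHsaOS() ? 8 : 4;
}
```

The alternative mentioned in review, branching on `Triple::getArch()` and casting in each shared caller, would remove the dispatch. The cost is repeating that branch at every call site.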

lib/Target/AMDGPU/AMDGPUSubtarget.cpp

[17 unchanged lines not shown]
#include "AMDGPUCallLowering.h"		#include "AMDGPUCallLowering.h"
#include "AMDGPUInstructionSelector.h"		#include "AMDGPUInstructionSelector.h"
#include "AMDGPULegalizerInfo.h"		#include "AMDGPULegalizerInfo.h"
#include "AMDGPURegisterBankInfo.h"		#include "AMDGPURegisterBankInfo.h"
#include "SIMachineFunctionInfo.h"		#include "SIMachineFunctionInfo.h"
#include "MCTargetDesc/AMDGPUMCTargetDesc.h"		#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
#include "llvm/ADT/SmallString.h"		#include "llvm/ADT/SmallString.h"
#include "llvm/CodeGen/MachineScheduler.h"		#include "llvm/CodeGen/MachineScheduler.h"
		#include "llvm/MC/MCSubtargetInfo.h"
#include "llvm/IR/MDBuilder.h"		#include "llvm/IR/MDBuilder.h"
#include "llvm/CodeGen/TargetFrameLowering.h"		#include "llvm/CodeGen/TargetFrameLowering.h"
#include <algorithm>		#include <algorithm>

using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "amdgpu-subtarget"		#define DEBUG_TYPE "amdgpu-subtarget"

#define GET_SUBTARGETINFO_TARGET_DESC		#define GET_SUBTARGETINFO_TARGET_DESC
#define GET_SUBTARGETINFO_CTOR		#define GET_SUBTARGETINFO_CTOR
#include "AMDGPUGenSubtargetInfo.inc"		#include "AMDGPUGenSubtargetInfo.inc"
		#define GET_SUBTARGETINFO_TARGET_DESC
		#define GET_SUBTARGETINFO_CTOR
		#include "R600GenSubtargetInfo.inc"

AMDGPUSubtarget::~AMDGPUSubtarget() = default;		AMDGPUSubtarget::~AMDGPUSubtarget() = default;

		R600Subtarget &
		R600Subtarget::initializeSubtargetDependencies(const Triple &TT,
		StringRef GPU, StringRef FS) {
		SmallString<256> FullFS("+promote-alloca,+dx10-clamp,");
		FullFS += FS;
		ParseSubtargetFeatures(GPU, FullFS);

		// FIXME: I don't think Evergreen has any useful support for
		// denormals, but should be checked. Should we issue a warning somewhere
		// if someone tries to enable these?
		if (getGeneration() <= R600Subtarget::NORTHERN_ISLANDS) {
		FP32Denormals = false;
		}

		// Set defaults if needed.
		if (MaxPrivateElementSize == 0)
		MaxPrivateElementSize = 4;

		if (LDSBankCount == 0)
		LDSBankCount = 32;

		return *this;
		}
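`R600Subtarget::initializeSubtargetDependencies` prepends `+promote-alloca,+dx10-clamp,` to the user's feature string, so these act as defaults. Feature flags are applied left to right, which means a later `-dx10-clamp` from the user still wins. The following is a simplified model of that ordering rule. It assumes plain `+name`/`-name` entries, whereas the real `ParseSubtargetFeatures` also handles CPU defaults and implied features. `applyFeatureString` and `r600Features` are hypothetical helpers.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Apply a comma-separated "+feat,-feat" list in order; later entries override
// earlier ones, which is what lets prepended defaults be overridden.
std::set<std::string> applyFeatureString(const std::string &FS) {
  std::set<std::string> Enabled;
  std::stringstream SS(FS);
  std::string Entry;
  while (std::getline(SS, Entry, ',')) {
    if (Entry.size() < 2)
      continue; // skip empty or malformed entries
    std::string Name = Entry.substr(1);
    if (Entry[0] == '+')
      Enabled.insert(Name);
    else if (Entry[0] == '-')
      Enabled.erase(Name);
  }
  return Enabled;
}

// Mirrors the defaulting done for R600: defaults first, user string last.
std::set<std::string> r600Features(const std::string &UserFS) {
  return applyFeatureString("+promote-alloca,+dx10-clamp," + UserFS);
}
```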

AMDGPUSubtarget &		AMDGPUSubtarget &
AMDGPUSubtarget::initializeSubtargetDependencies(const Triple &TT,		AMDGPUSubtarget::initializeSubtargetDependencies(const Triple &TT,
StringRef GPU, StringRef FS) {		StringRef GPU, StringRef FS) {
// Determine default and user-specified characteristics		// Determine default and user-specified characteristics
// On SI+, we want FP64 denormals to be on by default. FP32 denormals can be		// On SI+, we want FP64 denormals to be on by default. FP32 denormals can be
// enabled, but some instructions do not respect them and they run at the		// enabled, but some instructions do not respect them and they run at the
// double precision rate, so don't enable by default.		// double precision rate, so don't enable by default.
//		//
[43 unchanged lines not shown; context: if (TT.getArch() == Triple::amdgcn) {]
// Do something sensible for unspecified target.		// Do something sensible for unspecified target.
if (!HasMovrel && !HasVGPRIndexMode)		if (!HasMovrel && !HasVGPRIndexMode)
HasMovrel = true;		HasMovrel = true;
}		}

return *this;		return *this;
}		}

		AMDGPUCommonSubtarget::AMDGPUCommonSubtarget(const Triple &TT, StringRef GPU,
		StringRef FS, const TargetMachine &TM) { }

AMDGPUSubtarget::AMDGPUSubtarget(const Triple &TT, StringRef GPU, StringRef FS,		AMDGPUSubtarget::AMDGPUSubtarget(const Triple &TT, StringRef GPU, StringRef FS,
const TargetMachine &TM)		const TargetMachine &TM) :
: AMDGPUGenSubtargetInfo(TT, GPU, FS),		AMDGPUGenSubtargetInfo(TT, GPU, FS),
		AMDGPUCommonSubtarget(TT, GPU, FS, TM),
		FrameLowering(TargetFrameLowering::StackGrowsUp, getStackAlignment(), 0),
TargetTriple(TT),		TargetTriple(TT),
Gen(TT.getArch() == Triple::amdgcn ? SOUTHERN_ISLANDS : R600),		Gen(SOUTHERN_ISLANDS),
IsaVersion(ISAVersion0_0_0),		IsaVersion(ISAVersion0_0_0),
WavefrontSize(0),		WavefrontSize(0),
LocalMemorySize(0),		LocalMemorySize(0),
LDSBankCount(0),		LDSBankCount(0),
MaxPrivateElementSize(0),		MaxPrivateElementSize(0),

FastFMAF32(false),		FastFMAF32(false),
HalfRate64Ops(false),		HalfRate64Ops(false),
[20 unchanged lines not shown]
EnablePromoteAlloca(false),		EnablePromoteAlloca(false),
EnableLoadStoreOpt(false),		EnableLoadStoreOpt(false),
EnableUnsafeDSOffsetFolding(false),		EnableUnsafeDSOffsetFolding(false),
EnableSIScheduler(false),		EnableSIScheduler(false),
EnableDS128(false),		EnableDS128(false),
DumpCode(false),		DumpCode(false),

FP64(false),		FP64(false),
FMA(false),
MIMG_R128(false),
IsGCN(false),
GCN3Encoding(false),		GCN3Encoding(false),
CIInsts(false),		CIInsts(false),
GFX9Insts(false),		GFX9Insts(false),
SGPRInitBug(false),		SGPRInitBug(false),
HasSMemRealTime(false),		HasSMemRealTime(false),
Has16BitInsts(false),		Has16BitInsts(false),
HasIntClamp(false),		HasIntClamp(false),
HasVOP3PInsts(false),		HasVOP3PInsts(false),
[12 unchanged lines not shown]
HasDPP(false),		HasDPP(false),
FlatAddressSpace(false),		FlatAddressSpace(false),
FlatInstOffsets(false),		FlatInstOffsets(false),
FlatGlobalInsts(false),		FlatGlobalInsts(false),
FlatScratchInsts(false),		FlatScratchInsts(false),
AddNoCarryInsts(false),		AddNoCarryInsts(false),
HasUnpackedD16VMem(false),		HasUnpackedD16VMem(false),

R600ALUInst(false),
CaymanISA(false),
CFALUBug(false),
HasVertexCache(false),
TexVTXClauseSize(0),
ScalarizeGlobal(false),		ScalarizeGlobal(false),

FeatureDisable(false),		FeatureDisable(false) {
InstrItins(getInstrItineraryForCPU(GPU)) {
AS = AMDGPU::getAMDGPUAS(TT);		AS = AMDGPU::getAMDGPUAS(TT);
initializeSubtargetDependencies(TT, GPU, FS);		initializeSubtargetDependencies(TT, GPU, FS);
}		}

unsigned AMDGPUSubtarget::getMaxLocalMemSizeWithWaveCount(unsigned NWaves,		unsigned AMDGPUCommonSubtarget::getMaxLocalMemSizeWithWaveCount(unsigned NWaves,
const Function &F) const {		const Function &F) const {
if (NWaves == 1)		if (NWaves == 1)
return getLocalMemorySize();		return getLocalMemorySize();
unsigned WorkGroupSize = getFlatWorkGroupSizes(F).second;		unsigned WorkGroupSize = getFlatWorkGroupSizes(F).second;
unsigned WorkGroupsPerCu = getMaxWorkGroupsPerCU(WorkGroupSize);		unsigned WorkGroupsPerCu = getMaxWorkGroupsPerCU(WorkGroupSize);
unsigned MaxWaves = getMaxWavesPerEU();		unsigned MaxWaves = getMaxWavesPerEU();
return getLocalMemorySize() * MaxWaves / WorkGroupsPerCu / NWaves;		return getLocalMemorySize() * MaxWaves / WorkGroupsPerCu / NWaves;
}		}
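`getMaxLocalMemSizeWithWaveCount` scales the LDS budget by the ratio of the subtarget's maximum waves per EU to the requested wave count, and divides it across the work groups that fit on a CU. Below is the same integer arithmetic as a free function. The subtarget queries become parameters, and the signature is hypothetical.

```cpp
#include <cassert>

// Same expression as the member function, evaluated left to right in
// unsigned arithmetic, so each division truncates.
unsigned maxLocalMemWithWaveCount(unsigned LocalMemorySize, unsigned MaxWaves,
                                  unsigned WorkGroupsPerCu, unsigned NWaves) {
  assert(WorkGroupsPerCu != 0 && NWaves != 0 && "divisors must be non-zero");
  if (NWaves == 1)
    return LocalMemorySize; // a single wave may use the whole LDS
  return LocalMemorySize * MaxWaves / WorkGroupsPerCu / NWaves;
}
```

With illustrative values of 64 KiB of LDS, 10 waves per EU and 40 work groups per CU, a target of 4 waves gives 65536 * 10 / 40 / 4 = 4096 bytes.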

unsigned AMDGPUSubtarget::getOccupancyWithLocalMemSize(uint32_t Bytes,		unsigned AMDGPUCommonSubtarget::getOccupancyWithLocalMemSize(uint32_t Bytes,
const Function &F) const {		const Function &F) const {
unsigned WorkGroupSize = getFlatWorkGroupSizes(F).second;		unsigned WorkGroupSize = getFlatWorkGroupSizes(F).second;
unsigned WorkGroupsPerCu = getMaxWorkGroupsPerCU(WorkGroupSize);		unsigned WorkGroupsPerCu = getMaxWorkGroupsPerCU(WorkGroupSize);
unsigned MaxWaves = getMaxWavesPerEU();		unsigned MaxWaves = getMaxWavesPerEU();
unsigned Limit = getLocalMemorySize() * MaxWaves / WorkGroupsPerCu;		unsigned Limit = getLocalMemorySize() * MaxWaves / WorkGroupsPerCu;
unsigned NumWaves = Limit / (Bytes ? Bytes : 1u);		unsigned NumWaves = Limit / (Bytes ? Bytes : 1u);
NumWaves = std::min(NumWaves, MaxWaves);		NumWaves = std::min(NumWaves, MaxWaves);
NumWaves = std::max(NumWaves, 1u);		NumWaves = std::max(NumWaves, 1u);
return NumWaves;		return NumWaves;
}		}
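`getOccupancyWithLocalMemSize` runs the previous calculation in reverse. Given the LDS bytes a kernel uses, it estimates how many waves fit, then clamps the result to between one wave and the hardware maximum. The free-function version below follows the same steps. Its parameter names are hypothetical stand-ins for the subtarget queries.

```cpp
#include <algorithm>
#include <cassert>

// Occupancy estimate from LDS usage: how many waves fit if each uses Bytes.
unsigned occupancyWithLocalMemSize(unsigned LocalMemorySize, unsigned MaxWaves,
                                   unsigned WorkGroupsPerCu, unsigned Bytes) {
  assert(WorkGroupsPerCu != 0 && "divisor must be non-zero");
  unsigned Limit = LocalMemorySize * MaxWaves / WorkGroupsPerCu;
  unsigned NumWaves = Limit / (Bytes ? Bytes : 1u); // treat 0 bytes as 1
  NumWaves = std::min(NumWaves, MaxWaves);          // never above the HW cap
  return std::max(NumWaves, 1u);                    // always at least one wave
}
```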

unsigned		unsigned
AMDGPUSubtarget::getOccupancyWithLocalMemSize(const MachineFunction &MF) const {		AMDGPUCommonSubtarget::getOccupancyWithLocalMemSize(const MachineFunction &MF) const {
const auto *MFI = MF.getInfo<SIMachineFunctionInfo>();		const auto *MFI = MF.getInfo<SIMachineFunctionInfo>();
return getOccupancyWithLocalMemSize(MFI->getLDSSize(), MF.getFunction());		return getOccupancyWithLocalMemSize(MFI->getLDSSize(), MF.getFunction());
}		}

std::pair<unsigned, unsigned>		std::pair<unsigned, unsigned>
AMDGPUSubtarget::getDefaultFlatWorkGroupSize(CallingConv::ID CC) const {		AMDGPUCommonSubtarget::getDefaultFlatWorkGroupSize(CallingConv::ID CC) const {
switch (CC) {		switch (CC) {
case CallingConv::AMDGPU_CS:		case CallingConv::AMDGPU_CS:
case CallingConv::AMDGPU_KERNEL:		case CallingConv::AMDGPU_KERNEL:
case CallingConv::SPIR_KERNEL:		case CallingConv::SPIR_KERNEL:
return std::make_pair(getWavefrontSize() * 2, getWavefrontSize() * 4);		return std::make_pair(getWavefrontSize() * 2, getWavefrontSize() * 4);
case CallingConv::AMDGPU_VS:		case CallingConv::AMDGPU_VS:
case CallingConv::AMDGPU_LS:		case CallingConv::AMDGPU_LS:
case CallingConv::AMDGPU_HS:		case CallingConv::AMDGPU_HS:
case CallingConv::AMDGPU_ES:		case CallingConv::AMDGPU_ES:
case CallingConv::AMDGPU_GS:		case CallingConv::AMDGPU_GS:
case CallingConv::AMDGPU_PS:		case CallingConv::AMDGPU_PS:
return std::make_pair(1, getWavefrontSize());		return std::make_pair(1, getWavefrontSize());
default:		default:
return std::make_pair(1, 16 * getWavefrontSize());		return std::make_pair(1, 16 * getWavefrontSize());
}		}
}		}
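The switch in `getDefaultFlatWorkGroupSize` sorts calling conventions into three groups. Compute kernels get a range of 2 to 4 wavefronts, graphics shader stages get at most one wavefront, and anything else gets up to 16 wavefronts. The sketch below captures that grouping with a hypothetical `CCKind` enum in place of `CallingConv::ID`.

```cpp
#include <cassert>
#include <utility>

// Hypothetical grouping of the calling conventions handled in the switch.
enum class CCKind { Kernel, GraphicsShader, Other };

// Default (min, max) flat work group size for a wavefront size W.
std::pair<unsigned, unsigned> defaultFlatWorkGroupSize(CCKind CC, unsigned W) {
  switch (CC) {
  case CCKind::Kernel:         // AMDGPU_CS, AMDGPU_KERNEL, SPIR_KERNEL
    return {W * 2, W * 4};
  case CCKind::GraphicsShader: // AMDGPU_VS/LS/HS/ES/GS/PS
    return {1, W};
  case CCKind::Other:
    break;
  }
  return {1, 16 * W};          // the `default:` case
}
```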

std::pair<unsigned, unsigned> AMDGPUSubtarget::getFlatWorkGroupSizes(		std::pair<unsigned, unsigned> AMDGPUCommonSubtarget::getFlatWorkGroupSizes(
const Function &F) const {		const Function &F) const {
// FIXME: 1024 if function.		// FIXME: 1024 if function.
// Default minimum/maximum flat work group sizes.		// Default minimum/maximum flat work group sizes.
std::pair<unsigned, unsigned> Default =		std::pair<unsigned, unsigned> Default =
getDefaultFlatWorkGroupSize(F.getCallingConv());		getDefaultFlatWorkGroupSize(F.getCallingConv());

// TODO: Do not process "amdgpu-max-work-group-size" attribute once mesa		// TODO: Do not process "amdgpu-max-work-group-size" attribute once mesa
// starts using "amdgpu-flat-work-group-size" attribute.		// starts using "amdgpu-flat-work-group-size" attribute.
[13 unchanged lines not shown]
if (Requested.first < getMinFlatWorkGroupSize())		if (Requested.first < getMinFlatWorkGroupSize())
return Default;		return Default;
if (Requested.second > getMaxFlatWorkGroupSize())		if (Requested.second > getMaxFlatWorkGroupSize())
return Default;		return Default;

return Requested;		return Requested;
}		}
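The visible tail of `getFlatWorkGroupSizes` accepts the requested range only if it stays within the subtarget's minimum and maximum flat work group sizes, and falls back to the default otherwise. The sketch below models that validation with a hypothetical `flatWorkGroupSizes` function. The minimum-not-above-maximum check is an assumption about what the collapsed lines contain, not something shown in this diff.

```cpp
#include <cassert>
#include <utility>

using Range = std::pair<unsigned, unsigned>; // (minimum, maximum)

// Hypothetical model: accept a requested range only if it is well formed
// and inside what the subtarget supports; otherwise use Default.
Range flatWorkGroupSizes(bool HasRequest, Range Requested, Range Default,
                         unsigned MinSupported, unsigned MaxSupported) {
  if (!HasRequest)
    return Default; // no usable attribute on the function
  if (Requested.first > Requested.second)
    return Default; // assumed check: minimum must not exceed maximum
  if (Requested.first < MinSupported)
    return Default; // mirrors the getMinFlatWorkGroupSize() test
  if (Requested.second > MaxSupported)
    return Default; // mirrors the getMaxFlatWorkGroupSize() test
  return Requested;
}
```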

std::pair<unsigned, unsigned> AMDGPUSubtarget::getWavesPerEU(		std::pair<unsigned, unsigned> AMDGPUCommonSubtarget::getWavesPerEU(
const Function &F) const {		const Function &F) const {
// Default minimum/maximum number of waves per execution unit.		// Default minimum/maximum number of waves per execution unit.
std::pair<unsigned, unsigned> Default(1, getMaxWavesPerEU());		std::pair<unsigned, unsigned> Default(1, getMaxWavesPerEU());

// Default/requested minimum/maximum flat work group sizes.		// Default/requested minimum/maximum flat work group sizes.
std::pair<unsigned, unsigned> FlatWorkGroupSizes = getFlatWorkGroupSizes(F);		std::pair<unsigned, unsigned> FlatWorkGroupSizes = getFlatWorkGroupSizes(F);

// If minimum/maximum flat work group sizes were explicitly requested using		// If minimum/maximum flat work group sizes were explicitly requested using
(31 lines not shown)	std::pair<unsigned, unsigned> AMDGPUCommonSubtarget::getWavesPerEU(
// minimum/maximum flat work group sizes.		// minimum/maximum flat work group sizes.
if (RequestedFlatWorkGroupSize &&		if (RequestedFlatWorkGroupSize &&
Requested.first < MinImpliedByFlatWorkGroupSize)		Requested.first < MinImpliedByFlatWorkGroupSize)
return Default;		return Default;

return Requested;		return Requested;
}		}

bool AMDGPUSubtarget::makeLIDRangeMetadata(Instruction *I) const {		bool AMDGPUCommonSubtarget::makeLIDRangeMetadata(Instruction *I) const {
Function *Kernel = I->getParent()->getParent();		Function *Kernel = I->getParent()->getParent();
unsigned MinSize = 0;		unsigned MinSize = 0;
unsigned MaxSize = getFlatWorkGroupSizes(*Kernel).second;		unsigned MaxSize = getFlatWorkGroupSizes(*Kernel).second;
bool IdQuery = false;		bool IdQuery = false;

// If reqd_work_group_size is present it narrows value down.		// If reqd_work_group_size is present it narrows value down.
if (auto *CI = dyn_cast<CallInst>(I)) {		if (auto *CI = dyn_cast<CallInst>(I)) {
const Function *F = CI->getCalledFunction();		const Function *F = CI->getCalledFunction();
(47 lines not shown)	bool AMDGPUCommonSubtarget::makeLIDRangeMetadata(Instruction *I) const {
MDNode *MaxWorkGroupSizeRange = MDB.createRange(APInt(32, MinSize),		MDNode *MaxWorkGroupSizeRange = MDB.createRange(APInt(32, MinSize),
APInt(32, MaxSize));		APInt(32, MaxSize));
I->setMetadata(LLVMContext::MD_range, MaxWorkGroupSizeRange);		I->setMetadata(LLVMContext::MD_range, MaxWorkGroupSizeRange);
return true;		return true;
}		}

R600Subtarget::R600Subtarget(const Triple &TT, StringRef GPU, StringRef FS,		R600Subtarget::R600Subtarget(const Triple &TT, StringRef GPU, StringRef FS,
const TargetMachine &TM) :		const TargetMachine &TM) :
AMDGPUSubtarget(TT, GPU, FS, TM),		R600GenSubtargetInfo(TT, GPU, FS),
		AMDGPUCommonSubtarget(TT, GPU, FS, TM),
InstrInfo(*this),		InstrInfo(*this),
FrameLowering(TargetFrameLowering::StackGrowsUp, getStackAlignment(), 0),		FrameLowering(TargetFrameLowering::StackGrowsUp, getStackAlignment(), 0),
TLInfo(TM, *this) {}		TLInfo(TM, initializeSubtargetDependencies(TT, GPU, FS)),
		DX10Clamp(false),
		InstrItins(getInstrItineraryForCPU(GPU)),
		AS (AMDGPU::getAMDGPUAS(TT)) {
		}

SISubtarget::SISubtarget(const Triple &TT, StringRef GPU, StringRef FS,		SISubtarget::SISubtarget(const Triple &TT, StringRef GPU, StringRef FS,
const GCNTargetMachine &TM)		const GCNTargetMachine &TM)
: AMDGPUSubtarget(TT, GPU, FS, TM), InstrInfo(*this),		: AMDGPUSubtarget(TT, GPU, FS, TM), InstrInfo(*this),
FrameLowering(TargetFrameLowering::StackGrowsUp, getStackAlignment(), 0),		FrameLowering(TargetFrameLowering::StackGrowsUp, getStackAlignment(), 0),
TLInfo(TM, *this) {		TLInfo(TM, *this) {
CallLoweringInfo.reset(new AMDGPUCallLowering(*getTargetLowering()));		CallLoweringInfo.reset(new AMDGPUCallLowering(*getTargetLowering()));
Legalizer.reset(new AMDGPULegalizerInfo(*this, TM));		Legalizer.reset(new AMDGPULegalizerInfo(*this, TM));
(227 lines not shown)	struct MemOpClusterMutation : ScheduleDAGMutation {
}		}
};		};
} // namespace		} // namespace

void SISubtarget::getPostRAMutations(		void SISubtarget::getPostRAMutations(
std::vector<std::unique_ptr<ScheduleDAGMutation>> &Mutations) const {		std::vector<std::unique_ptr<ScheduleDAGMutation>> &Mutations) const {
Mutations.push_back(llvm::make_unique<MemOpClusterMutation>(&InstrInfo));		Mutations.push_back(llvm::make_unique<MemOpClusterMutation>(&InstrInfo));
}		}

		const AMDGPUCommonSubtarget &AMDGPUCommonSubtarget::get(const MachineFunction &MF) {
		if (MF.getTarget().getTargetTriple().getArch() == Triple::amdgcn)
		return static_cast<const AMDGPUCommonSubtarget&>(MF.getSubtarget<AMDGPUSubtarget>());
		else
		return static_cast<const AMDGPUCommonSubtarget&>(MF.getSubtarget<R600Subtarget>());
		}

		const AMDGPUCommonSubtarget &AMDGPUCommonSubtarget::get(const TargetMachine &TM, const Function &F) {
		if (TM.getTargetTriple().getArch() == Triple::amdgcn)
		return static_cast<const AMDGPUCommonSubtarget&>(TM.getSubtarget<AMDGPUSubtarget>(F));
		else
		return static_cast<const AMDGPUCommonSubtarget&>(TM.getSubtarget<R600Subtarget>(F));
		}
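Both `AMDGPUCommonSubtarget::get` overloads use the same pattern. Each one inspects the triple's architecture, fetches the concrete subtarget, and returns it as the shared base. The sketch below is a reduced version of that dispatch. It uses hypothetical mock types (`Arch`, `CommonST`, `GCNST`, `R600ST`) in place of the LLVM classes.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for Triple::ArchType.
enum class Arch { r600, amdgcn };

// Hypothetical mock hierarchy; name() exists only so callers can tell
// which concrete object they received.
struct CommonST {
  virtual ~CommonST() = default;
  virtual std::string name() const = 0;
};
struct GCNST final : CommonST {
  std::string name() const override { return "gcn"; }
};
struct R600ST final : CommonST {
  std::string name() const override { return "r600"; }
};

// Same shape as AMDGPUCommonSubtarget::get: pick the concrete subtarget
// from the architecture, then return it as the common base.
const CommonST &getCommon(Arch A, const GCNST &G, const R600ST &R) {
  if (A == Arch::amdgcn)
    return static_cast<const CommonST &>(G);
  return static_cast<const CommonST &>(R);
}
```

If both concrete classes derive publicly from the common base, the upcast is implicit. In that case the explicit `static_cast` in the patch is redundant but harmless.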

lib/Target/AMDGPU/AMDGPUTargetMachine.h

	(11 lines not shown)
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef LLVM_LIB_TARGET_AMDGPU_AMDGPUTARGETMACHINE_H			#ifndef LLVM_LIB_TARGET_AMDGPU_AMDGPUTARGETMACHINE_H
	#define LLVM_LIB_TARGET_AMDGPU_AMDGPUTARGETMACHINE_H			#define LLVM_LIB_TARGET_AMDGPU_AMDGPUTARGETMACHINE_H

	#include "AMDGPUIntrinsicInfo.h"			#include "AMDGPUIntrinsicInfo.h"
	#include "AMDGPUSubtarget.h"			#include "AMDGPUSubtarget.h"
				#include "R600IntrinsicInfo.h"
	#include "llvm/ADT/Optional.h"			#include "llvm/ADT/Optional.h"
	#include "llvm/ADT/StringMap.h"			#include "llvm/ADT/StringMap.h"
	#include "llvm/ADT/StringRef.h"			#include "llvm/ADT/StringRef.h"
	#include "llvm/Analysis/TargetTransformInfo.h"			#include "llvm/Analysis/TargetTransformInfo.h"
	#include "llvm/Support/CodeGen.h"			#include "llvm/Support/CodeGen.h"
	#include "llvm/Target/TargetMachine.h"			#include "llvm/Target/TargetMachine.h"
	#include <memory>			#include <memory>

	namespace llvm {			namespace llvm {

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// AMDGPU Target Machine (R600+)			// AMDGPU Target Machine (R600+)
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	class AMDGPUTargetMachine : public LLVMTargetMachine {			class AMDGPUTargetMachine : public LLVMTargetMachine {
	protected:			protected:
	std::unique_ptr<TargetLoweringObjectFile> TLOF;			std::unique_ptr<TargetLoweringObjectFile> TLOF;
	AMDGPUIntrinsicInfo IntrinsicInfo;
	AMDGPUAS AS;			AMDGPUAS AS;

	StringRef getGPUName(const Function &F) const;			StringRef getGPUName(const Function &F) const;
	StringRef getFeatureString(const Function &F) const;			StringRef getFeatureString(const Function &F) const;

	public:			public:
	static bool EnableLateStructurizeCFG;			static bool EnableLateStructurizeCFG;

	AMDGPUTargetMachine(const Target &T, const Triple &TT, StringRef CPU,			AMDGPUTargetMachine(const Target &T, const Triple &TT, StringRef CPU,
	StringRef FS, TargetOptions Options,			StringRef FS, TargetOptions Options,
	Optional<Reloc::Model> RM, Optional<CodeModel::Model> CM,			Optional<Reloc::Model> RM, Optional<CodeModel::Model> CM,
	CodeGenOpt::Level OL);			CodeGenOpt::Level OL);
	~AMDGPUTargetMachine() override;			~AMDGPUTargetMachine() override;

	const AMDGPUSubtarget *getSubtargetImpl() const;			const TargetSubtargetInfo *getSubtargetImpl() const;
	const AMDGPUSubtarget *getSubtargetImpl(const Function &) const override = 0;			const TargetSubtargetInfo *getSubtargetImpl(const Function &) const override = 0;

	const AMDGPUIntrinsicInfo *getIntrinsicInfo() const override {
	return &IntrinsicInfo;
	}
	TargetTransformInfo getTargetTransformInfo(const Function &F) override;			TargetTransformInfo getTargetTransformInfo(const Function &F) override;

	TargetLoweringObjectFile *getObjFileLowering() const override {			TargetLoweringObjectFile *getObjFileLowering() const override {
	return TLOF.get();			return TLOF.get();
	}			}
	AMDGPUAS getAMDGPUAS() const {			AMDGPUAS getAMDGPUAS() const {
	return AS;			return AS;
	}			}

	void adjustPassManager(PassManagerBuilder &) override;			void adjustPassManager(PassManagerBuilder &) override;
	/// Get the integer value of a null pointer in the given address space.			/// Get the integer value of a null pointer in the given address space.
	uint64_t getNullPointerValue(unsigned AddrSpace) const {			uint64_t getNullPointerValue(unsigned AddrSpace) const {
	if (AddrSpace == AS.LOCAL_ADDRESS || AddrSpace == AS.REGION_ADDRESS)			if (AddrSpace == AS.LOCAL_ADDRESS || AddrSpace == AS.REGION_ADDRESS)
	return -1;			return -1;
	return 0;			return 0;
	}			}
	};			};

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// R600 Target Machine (R600 -> Cayman)			// R600 Target Machine (R600 -> Cayman)
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	class R600TargetMachine final : public AMDGPUTargetMachine {			class R600TargetMachine final : public AMDGPUTargetMachine {
	private:			private:
				R600IntrinsicInfo IntrinsicInfo;
	mutable StringMap<std::unique_ptr<R600Subtarget>> SubtargetMap;			mutable StringMap<std::unique_ptr<R600Subtarget>> SubtargetMap;

	public:			public:
	R600TargetMachine(const Target &T, const Triple &TT, StringRef CPU,			R600TargetMachine(const Target &T, const Triple &TT, StringRef CPU,
	StringRef FS, TargetOptions Options,			StringRef FS, TargetOptions Options,
	Optional<Reloc::Model> RM, Optional<CodeModel::Model> CM,			Optional<Reloc::Model> RM, Optional<CodeModel::Model> CM,
	CodeGenOpt::Level OL, bool JIT);			CodeGenOpt::Level OL, bool JIT);

	TargetPassConfig *createPassConfig(PassManagerBase &PM) override;			TargetPassConfig *createPassConfig(PassManagerBase &PM) override;

	const R600Subtarget *getSubtargetImpl(const Function &) const override;			const R600Subtarget *getSubtargetImpl(const Function &) const override;

				const R600IntrinsicInfo *getIntrinsicInfo() const override {
				return &IntrinsicInfo;
				}

	bool isMachineVerifierClean() const override {			bool isMachineVerifierClean() const override {
	return false;			return false;
	}			}
	};			};

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// GCN Target Machine (SI+)			// GCN Target Machine (SI+)
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	class GCNTargetMachine final : public AMDGPUTargetMachine {			class GCNTargetMachine final : public AMDGPUTargetMachine {
	private:			private:
				AMDGPUIntrinsicInfo IntrinsicInfo;
	mutable StringMap<std::unique_ptr<SISubtarget>> SubtargetMap;			mutable StringMap<std::unique_ptr<SISubtarget>> SubtargetMap;

	public:			public:
	GCNTargetMachine(const Target &T, const Triple &TT, StringRef CPU,			GCNTargetMachine(const Target &T, const Triple &TT, StringRef CPU,
	StringRef FS, TargetOptions Options,			StringRef FS, TargetOptions Options,
	Optional<Reloc::Model> RM, Optional<CodeModel::Model> CM,			Optional<Reloc::Model> RM, Optional<CodeModel::Model> CM,
	CodeGenOpt::Level OL, bool JIT);			CodeGenOpt::Level OL, bool JIT);

	TargetPassConfig *createPassConfig(PassManagerBase &PM) override;			TargetPassConfig *createPassConfig(PassManagerBase &PM) override;

	const SISubtarget *getSubtargetImpl(const Function &) const override;			const SISubtarget *getSubtargetImpl(const Function &) const override;

				const AMDGPUIntrinsicInfo *getIntrinsicInfo() const override {
				return &IntrinsicInfo;
				}

	bool useIPRA() const override {			bool useIPRA() const override {
	return true;			return true;
	}			}
	};			};

	} // end namespace llvm			} // end namespace llvm

	#endif // LLVM_LIB_TARGET_AMDGPU_AMDGPUTARGETMACHINE_H			#endif // LLVM_LIB_TARGET_AMDGPU_AMDGPUTARGETMACHINE_H
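`getNullPointerValue` encodes one rule: null is all ones in the local and region address spaces, and zero everywhere else. Because the function returns `uint64_t`, its `return -1;` converts to `0xFFFFFFFFFFFFFFFF`. The standalone sketch below replaces `AMDGPUAS` with a hypothetical `AddrSpaces` struct whose numbers are made up.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical address-space numbers; the real ones come from AMDGPUAS
// and depend on the target triple.
struct AddrSpaces {
  unsigned LOCAL_ADDRESS = 3;
  unsigned REGION_ADDRESS = 2;
};

// Same rule as AMDGPUTargetMachine::getNullPointerValue: local and region
// null pointers are all ones (-1 converted to uint64_t), others are 0.
uint64_t nullPointerValue(const AddrSpaces &AS, unsigned AddrSpace) {
  if (AddrSpace == AS.LOCAL_ADDRESS || AddrSpace == AS.REGION_ADDRESS)
    return static_cast<uint64_t>(-1);
  return 0;
}
```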

lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

(559 lines not shown)	public:
void addPreSched2() override;		void addPreSched2() override;
void addPreEmitPass() override;		void addPreEmitPass() override;
};		};

} // end anonymous namespace		} // end anonymous namespace

TargetTransformInfo		TargetTransformInfo
AMDGPUTargetMachine::getTargetTransformInfo(const Function &F) {		AMDGPUTargetMachine::getTargetTransformInfo(const Function &F) {
		if (getTargetTriple().getArch() == Triple::r600)
		return TargetTransformInfo(R600TTIImpl(this, F));
		else
return TargetTransformInfo(AMDGPUTTIImpl(this, F));		return TargetTransformInfo(AMDGPUTTIImpl(this, F));
}		}
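After this change, `getTargetTransformInfo` wraps a different implementation class for each architecture. This works because `TargetTransformInfo` accepts any implementation object by value and erases its type. The sketch below imitates that with a hypothetical, heavily reduced `MiniTTI` wrapper that exposes a single query. The `GCNImpl` and `R600Impl` classes and their costs are made up.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical, heavily reduced analogue of TargetTransformInfo's type
// erasure: any implementation class with getCFInstrCost(unsigned) fits.
class MiniTTI {
  struct Concept {
    virtual ~Concept() = default;
    virtual int getCFInstrCost(unsigned Opcode) const = 0;
  };
  template <typename T> struct Model final : Concept {
    T Impl;
    explicit Model(T I) : Impl(std::move(I)) {}
    int getCFInstrCost(unsigned Opcode) const override {
      return Impl.getCFInstrCost(Opcode);
    }
  };
  std::unique_ptr<const Concept> P;

public:
  template <typename T>
  explicit MiniTTI(T Impl) : P(new Model<T>(std::move(Impl))) {}
  int getCFInstrCost(unsigned Opcode) const {
    return P->getCFInstrCost(Opcode);
  }
};

// Hypothetical per-architecture implementations with made-up costs.
struct GCNImpl {
  int getCFInstrCost(unsigned) const { return 10; }
};
struct R600Impl {
  int getCFInstrCost(unsigned) const { return 1; }
};

enum class Arch { r600, amdgcn };

// Same shape as AMDGPUTargetMachine::getTargetTransformInfo in the patch.
MiniTTI makeTTI(Arch A) {
  if (A == Arch::r600)
    return MiniTTI(R600Impl{});
  return MiniTTI(GCNImpl{});
}
```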

void AMDGPUPassConfig::addEarlyCSEOrGVNPass() {		void AMDGPUPassConfig::addEarlyCSEOrGVNPass() {
if (getOptLevel() == CodeGenOpt::Aggressive)		if (getOptLevel() == CodeGenOpt::Aggressive)
addPass(createGVNPass());		addPass(createGVNPass());
else		else
addPass(createEarlyCSEPass());		addPass(createEarlyCSEPass());
}		}
(315 lines not shown)

lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h

(39 lines not shown)
class Value;		class Value;

class AMDGPUTTIImpl final : public BasicTTIImplBase<AMDGPUTTIImpl> {		class AMDGPUTTIImpl final : public BasicTTIImplBase<AMDGPUTTIImpl> {
using BaseT = BasicTTIImplBase<AMDGPUTTIImpl>;		using BaseT = BasicTTIImplBase<AMDGPUTTIImpl>;
using TTI = TargetTransformInfo;		using TTI = TargetTransformInfo;

friend BaseT;		friend BaseT;

const AMDGPUSubtarget *ST;		const TargetMachine *TM;
const AMDGPUTargetLowering *TLI;		const TargetSubtargetInfo *ST;
		const TargetLowering *TLI;
bool IsGraphicsShader;		bool IsGraphicsShader;

const FeatureBitset InlineFeatureIgnoreList = {		const FeatureBitset InlineFeatureIgnoreList = {
// Codegen control options which don't matter.		// Codegen control options which don't matter.
AMDGPU::FeatureEnableLoadStoreOpt,		AMDGPU::FeatureEnableLoadStoreOpt,
AMDGPU::FeatureEnableSIScheduler,		AMDGPU::FeatureEnableSIScheduler,
AMDGPU::FeatureEnableUnsafeDSOffsetFolding,		AMDGPU::FeatureEnableUnsafeDSOffsetFolding,
AMDGPU::FeatureFlatForGlobal,		AMDGPU::FeatureFlatForGlobal,
(11 lines not shown)	const FeatureBitset InlineFeatureIgnoreList = {
AMDGPU::FeatureXNACK,		AMDGPU::FeatureXNACK,
AMDGPU::FeatureTrapHandler,		AMDGPU::FeatureTrapHandler,

// Perf-tuning features		// Perf-tuning features
AMDGPU::FeatureFastFMAF32,		AMDGPU::FeatureFastFMAF32,
AMDGPU::HalfRate64Ops		AMDGPU::HalfRate64Ops
};		};

const AMDGPUSubtarget *getST() const { return ST; }		const TargetSubtargetInfo *getST() const { return ST; }
const AMDGPUTargetLowering *getTLI() const { return TLI; }		const TargetLowering *getTLI() const { return TLI; }

static inline int getFullRateInstrCost() {		static inline int getFullRateInstrCost() {
return TargetTransformInfo::TCC_Basic;		return TargetTransformInfo::TCC_Basic;
}		}

static inline int getHalfRateInstrCost() {		static inline int getHalfRateInstrCost() {
return 2 * TargetTransformInfo::TCC_Basic;		return 2 * TargetTransformInfo::TCC_Basic;
}		}

// TODO: The size is usually 8 bytes, but takes 4x as many cycles. Maybe		// TODO: The size is usually 8 bytes, but takes 4x as many cycles. Maybe
// should be 2 or 4.		// should be 2 or 4.
static inline int getQuarterRateInstrCost() {		static inline int getQuarterRateInstrCost() {
return 3 * TargetTransformInfo::TCC_Basic;		return 3 * TargetTransformInfo::TCC_Basic;
}		}

// On some parts, normal fp64 operations are half rate, and others		// On some parts, normal fp64 operations are half rate, and others
// quarter. This also applies to some integer operations.		// quarter. This also applies to some integer operations.
inline int get64BitInstrCost() const {		inline int get64BitInstrCost() const {
return ST->hasHalfRate64Ops() ?		if (TM->getTargetTriple().getArch() == Triple::r600)
		return getQuarterRateInstrCost();
		return static_cast<const AMDGPUSubtarget*>(ST)->hasHalfRate64Ops() ?
getHalfRateInstrCost() : getQuarterRateInstrCost();		getHalfRateInstrCost() : getQuarterRateInstrCost();
}		}

public:		public:
explicit AMDGPUTTIImpl(const AMDGPUTargetMachine *TM, const Function &F)		explicit AMDGPUTTIImpl(const AMDGPUTargetMachine *TM, const Function &F)
: BaseT(TM, F.getParent()->getDataLayout()),		: BaseT(TM, F.getParent()->getDataLayout()),
		TM(TM),
ST(TM->getSubtargetImpl(F)),		ST(TM->getSubtargetImpl(F)),
TLI(ST->getTargetLowering()),		TLI(ST->getTargetLowering()),
IsGraphicsShader(AMDGPU::isShader(F.getCallingConv())) {}		IsGraphicsShader(AMDGPU::isShader(F.getCallingConv())) {}

		bool isGCN() const { return TM->getTargetTriple().getArch() == Triple::amdgcn; }

		bool isR600() const { return TM->getTargetTriple().getArch() == Triple::r600; }

bool hasBranchDivergence() { return true; }		bool hasBranchDivergence() { return true; }

void getUnrollingPreferences(Loop *L, ScalarEvolution &SE,		void getUnrollingPreferences(Loop *L, ScalarEvolution &SE,
TTI::UnrollingPreferences &UP);		TTI::UnrollingPreferences &UP);

TTI::PopcntSupportKind getPopcntSupport(unsigned TyWidth) {		TTI::PopcntSupportKind getPopcntSupport(unsigned TyWidth) {
assert(isPowerOf2_32(TyWidth) && "Ty width must be power of 2");		assert(isPowerOf2_32(TyWidth) && "Ty width must be power of 2");
return TTI::PSK_FastHardware;		return TTI::PSK_FastHardware;
(37 lines not shown)	public:

int getVectorInstrCost(unsigned Opcode, Type *ValTy, unsigned Index);		int getVectorInstrCost(unsigned Opcode, Type *ValTy, unsigned Index);
bool isSourceOfDivergence(const Value *V) const;		bool isSourceOfDivergence(const Value *V) const;
bool isAlwaysUniform(const Value *V) const;		bool isAlwaysUniform(const Value *V) const;

unsigned getFlatAddressSpace() const {		unsigned getFlatAddressSpace() const {
// Don't bother running InferAddressSpaces pass on graphics shaders which		// Don't bother running InferAddressSpaces pass on graphics shaders which
// don't use flat addressing.		// don't use flat addressing.
if (IsGraphicsShader)		if (IsGraphicsShader || TM->getTargetTriple().getArch() == Triple::r600)
return -1;		return -1;
return ST->hasFlatAddressSpace() ?		const AMDGPUSubtarget *Subtarget = static_cast<const AMDGPUSubtarget *>(ST);
ST->getAMDGPUAS().FLAT_ADDRESS : ST->getAMDGPUAS().UNKNOWN_ADDRESS_SPACE;		return Subtarget->hasFlatAddressSpace() ?
		Subtarget->getAMDGPUAS().FLAT_ADDRESS :
		Subtarget->getAMDGPUAS().UNKNOWN_ADDRESS_SPACE;
}		}

unsigned getVectorSplitCost() { return 0; }		unsigned getVectorSplitCost() { return 0; }

unsigned getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index,		unsigned getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index,
Type *SubTp);		Type *SubTp);

bool areInlineCompatible(const Function *Caller,		bool areInlineCompatible(const Function *Caller,
const Function *Callee) const;		const Function *Callee) const;

unsigned getInliningThresholdMultiplier() { return 9; }		unsigned getInliningThresholdMultiplier() { return 9; }
};		};

		class R600TTIImpl final : public BasicTTIImplBase<R600TTIImpl> {
		using BaseT = BasicTTIImplBase<R600TTIImpl>;
		using TTI = TargetTransformInfo;

		friend BaseT;

		const TargetMachine *TM;
		const TargetSubtargetInfo *ST;
		const TargetLowering *TLI;
		AMDGPUTTIImpl CommonTTI;

		public:
		explicit R600TTIImpl(const AMDGPUTargetMachine *TM, const Function &F)
		: BaseT(TM, F.getParent()->getDataLayout()),
		TM(TM),
		ST(TM->getSubtargetImpl(F)),
		TLI(ST->getTargetLowering()),
		CommonTTI(TM, F) {}

		const TargetSubtargetInfo *getST() const { return ST; }
		const TargetLowering *getTLI() const { return TLI; }

		void getUnrollingPreferences(Loop *L, ScalarEvolution &SE,
		TTI::UnrollingPreferences &UP);
		unsigned getHardwareNumberOfRegisters(bool Vec) const;
		unsigned getNumberOfRegisters(bool Vec) const;
		unsigned getRegisterBitWidth(bool Vector) const;
		unsigned getMinVectorRegisterBitWidth() const;
		unsigned getLoadStoreVecRegBitWidth(unsigned AddrSpace) const;
		bool isLegalToVectorizeMemChain(unsigned ChainSizeInBytes, unsigned Alignment,
		unsigned AddrSpace) const;
		bool isLegalToVectorizeLoadChain(unsigned ChainSizeInBytes,
		unsigned Alignment,
		unsigned AddrSpace) const;
		bool isLegalToVectorizeStoreChain(unsigned ChainSizeInBytes,
		unsigned Alignment,
		unsigned AddrSpace) const;
		unsigned getMaxInterleaveFactor(unsigned VF);
		unsigned getCFInstrCost(unsigned Opcode);
		int getVectorInstrCost(unsigned Opcode, Type *ValTy, unsigned Index);
		};

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_LIB_TARGET_AMDGPU_AMDGPUTARGETTRANSFORMINFO_H		#endif // LLVM_LIB_TARGET_AMDGPU_AMDGPUTARGETTRANSFORMINFO_H
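The new `R600TTIImpl` holds an `AMDGPUTTIImpl` member named `CommonTTI`. This suggests that some R600 queries are forwarded to the shared implementation while others get R600-specific answers, although the matching `.cpp` code is not shown here. The sketch below illustrates that composition with hypothetical `CommonCosts` and `R600Costs` classes. All return values are made up except `4 * 128`, which is the value the old shared code returned for pre-SI parts.

```cpp
#include <cassert>

// Hypothetical shared cost model (plays the role of AMDGPUTTIImpl).
struct CommonCosts {
  // Made-up rule for illustration only.
  unsigned getMaxInterleaveFactor(unsigned VF) const { return VF == 1 ? 8 : 1; }
};

// Hypothetical R600-side model: owns a CommonCosts (like the CommonTTI
// member) and forwards the queries it does not answer itself.
class R600Costs {
  CommonCosts CommonTTI;

public:
  unsigned getMaxInterleaveFactor(unsigned VF) const {
    return CommonTTI.getMaxInterleaveFactor(VF); // forwarded
  }
  unsigned getHardwareNumberOfRegisters(bool /*Vec*/) const {
    return 4 * 128; // R600-specific: the old pre-SI value from the diff
  }
};
```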

lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp

(96 lines not shown)	void AMDGPUTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE,

// TODO: Do we want runtime unrolling?		// TODO: Do we want runtime unrolling?

// Maximum alloca size that can fit in registers. Reserve 16 registers.		// Maximum alloca size that can fit in registers. Reserve 16 registers.
const unsigned MaxAlloca = (256 - 16) * 4;		const unsigned MaxAlloca = (256 - 16) * 4;
unsigned ThresholdPrivate = UnrollThresholdPrivate;		unsigned ThresholdPrivate = UnrollThresholdPrivate;
unsigned ThresholdLocal = UnrollThresholdLocal;		unsigned ThresholdLocal = UnrollThresholdLocal;
unsigned MaxBoost = std::max(ThresholdPrivate, ThresholdLocal);		unsigned MaxBoost = std::max(ThresholdPrivate, ThresholdLocal);
AMDGPUAS ASST = ST->getAMDGPUAS();		const AMDGPUAS &ASST = AMDGPU::getAMDGPUAS(TM->getTargetTriple());
for (const BasicBlock *BB : L->getBlocks()) {		for (const BasicBlock *BB : L->getBlocks()) {
const DataLayout &DL = BB->getModule()->getDataLayout();		const DataLayout &DL = BB->getModule()->getDataLayout();
unsigned LocalGEPsSeen = 0;		unsigned LocalGEPsSeen = 0;

if (llvm::any_of(L->getSubLoops(), [BB](const Loop* SubLoop) {		if (llvm::any_of(L->getSubLoops(), [BB](const Loop* SubLoop) {
return SubLoop->contains(BB); }))		return SubLoop->contains(BB); }))
continue; // Block belongs to an inner loop.		continue; // Block belongs to an inner loop.

(93 lines not shown)	for (const Instruction &I : *BB) {
return;		return;
}		}
}		}
}		}
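The `MaxAlloca` bound near the top of `getUnrollingPreferences` works out to 960 bytes if the constants are read as 256 32-bit registers with 16 held in reserve. The compile-time check below confirms that arithmetic. The helper `allocaFitsInRegisters` is hypothetical and only shows how the bound would be compared against an alloca size.

```cpp
#include <cassert>

// Reading of the constants in getUnrollingPreferences: 256 registers,
// 16 reserved, 4 bytes (32 bits) per register.
constexpr unsigned NumRegisters = 256;
constexpr unsigned ReservedRegisters = 16;
constexpr unsigned BytesPerRegister = 4;
constexpr unsigned MaxAlloca = (NumRegisters - ReservedRegisters) * BytesPerRegister;
static_assert(MaxAlloca == 960, "240 usable registers * 4 bytes");

// Hypothetical helper: does an alloca of this many bytes fit the bound?
constexpr bool allocaFitsInRegisters(unsigned AllocaBytes) {
  return AllocaBytes <= MaxAlloca;
}
```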

unsigned AMDGPUTTIImpl::getHardwareNumberOfRegisters(bool Vec) const {		unsigned AMDGPUTTIImpl::getHardwareNumberOfRegisters(bool Vec) const {
// The concept of vector registers doesn't really exist. Some packed vector		// The concept of vector registers doesn't really exist. Some packed vector
// operations operate on the normal 32-bit registers.		// operations operate on the normal 32-bit registers.

// Number of VGPRs on SI.
if (ST->getGeneration() >= AMDGPUSubtarget::SOUTHERN_ISLANDS)
return 256;		return 256;

return 4 * 128; // XXX - 4 channels. Should these count as vector instead?
}		}

unsigned AMDGPUTTIImpl::getNumberOfRegisters(bool Vec) const {		unsigned AMDGPUTTIImpl::getNumberOfRegisters(bool Vec) const {
// This is really the number of registers to fill when vectorizing /		// This is really the number of registers to fill when vectorizing /
// interleaving loops, so we lie to avoid trying to use all registers.		// interleaving loops, so we lie to avoid trying to use all registers.
return getHardwareNumberOfRegisters(Vec) >> 3;		return getHardwareNumberOfRegisters(Vec) >> 3;
}		}
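With the R600 case moved out, `AMDGPUTTIImpl::getHardwareNumberOfRegisters` returns 256 unconditionally. As a result, `getNumberOfRegisters` reports 256 >> 3 = 32. The names below are hypothetical free-function versions of the two methods.

```cpp
#include <cassert>

// Hypothetical free-function version of the new
// AMDGPUTTIImpl::getHardwareNumberOfRegisters, which returns 256.
constexpr unsigned hardwareNumberOfRegisters(bool /*Vec*/) { return 256; }

// Mirrors getNumberOfRegisters: report one eighth of the hardware count so
// vectorization and interleaving do not try to use every register.
constexpr unsigned numberOfRegisters(bool Vec) {
  return hardwareNumberOfRegisters(Vec) >> 3;
}

static_assert(numberOfRegisters(false) == 32, "256 >> 3 == 32");
```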

(22 lines not shown)	unsigned AMDGPUTTIImpl::getStoreVectorFactor(unsigned VF, unsigned StoreSize,
unsigned VecRegBitWidth = VF * StoreSize;		unsigned VecRegBitWidth = VF * StoreSize;
if (VecRegBitWidth > 128)		if (VecRegBitWidth > 128)
return 128 / StoreSize;		return 128 / StoreSize;

return VF;		return VF;
}		}
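The visible tail of `getStoreVectorFactor` caps a store chain at 128 bits. If `VF * StoreSize` exceeds 128, the factor shrinks to `128 / StoreSize`. The sketch below assumes `StoreSize` is the element size in bits. The `VecRegBitWidth` name suggests this, but the full signature is collapsed above. The function name `storeVectorFactor` is hypothetical.

```cpp
#include <cassert>

// Sketch of the visible logic of AMDGPUTTIImpl::getStoreVectorFactor,
// assuming StoreSize is the element size in bits.
unsigned storeVectorFactor(unsigned VF, unsigned StoreSize) {
  unsigned VecRegBitWidth = VF * StoreSize;
  // Wider than 128 bits: shrink the factor so the chain fits in 128 bits.
  if (VecRegBitWidth > 128)
    return 128 / StoreSize;
  return VF;
}
```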

unsigned AMDGPUTTIImpl::getLoadStoreVecRegBitWidth(unsigned AddrSpace) const {		unsigned AMDGPUTTIImpl::getLoadStoreVecRegBitWidth(unsigned AddrSpace) const {
AMDGPUAS AS = ST->getAMDGPUAS();		const Triple &Triple = TM->getTargetTriple();
		const AMDGPUAS &AS = AMDGPU::getAMDGPUAS(Triple);
		auto Subtarget = static_cast<const AMDGPUSubtarget*>(ST);
if (AddrSpace == AS.GLOBAL_ADDRESS ||		if (AddrSpace == AS.GLOBAL_ADDRESS ||
AddrSpace == AS.CONSTANT_ADDRESS ||		AddrSpace == AS.CONSTANT_ADDRESS ||
AddrSpace == AS.CONSTANT_ADDRESS_32BIT) {		AddrSpace == AS.CONSTANT_ADDRESS_32BIT) {
if (ST->getGeneration() <= AMDGPUSubtarget::NORTHERN_ISLANDS)		if (Triple.getArch() == Triple::r600)
return 128;		return 128;
return 512;		return 512;
}		}

if (AddrSpace == AS.FLAT_ADDRESS)		if (AddrSpace == AS.FLAT_ADDRESS)
return 128;		return 128;

if (AddrSpace == AS.LOCAL_ADDRESS ||		if (AddrSpace == AS.LOCAL_ADDRESS ||
AddrSpace == AS.REGION_ADDRESS)		AddrSpace == AS.REGION_ADDRESS) {
return ST->useDS128() ? 128 : 64;		if (Triple.getArch() == Triple::r600)
		return 64;
		return Subtarget->useDS128() ? 128 : 64;
		}

if (AddrSpace == AS.PRIVATE_ADDRESS)		if (AddrSpace == AS.PRIVATE_ADDRESS) {
return 8 * ST->getMaxPrivateElementSize();		if (Triple.getArch() == Triple::r600)
		return 32;

		return 8 * Subtarget->getMaxPrivateElementSize();
		}

if (ST->getGeneration() <= AMDGPUSubtarget::NORTHERN_ISLANDS &&		if (Triple.getArch() == Triple::r600 &&
(AddrSpace == AS.PARAM_D_ADDRESS ||		(AddrSpace == AS.PARAM_D_ADDRESS ||
AddrSpace == AS.PARAM_I_ADDRESS ||		AddrSpace == AS.PARAM_I_ADDRESS ||
(AddrSpace >= AS.CONSTANT_BUFFER_0 &&		(AddrSpace >= AS.CONSTANT_BUFFER_0 &&
AddrSpace <= AS.CONSTANT_BUFFER_15)))		AddrSpace <= AS.CONSTANT_BUFFER_15)))
return 128;		return 128;
llvm_unreachable("unhandled address space");		llvm_unreachable("unhandled address space");
}		}
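After the patch, `getLoadStoreVecRegBitWidth` branches on the triple instead of the subtarget generation. The sketch below restates that decision table as a switch. It uses hypothetical `Space` categories instead of `AMDGPUAS` numbers, and a hypothetical `GCNFeatures` struct for the two GCN subtarget queries. Where the patch calls `llvm_unreachable`, the sketch throws.

```cpp
#include <cassert>
#include <stdexcept>

enum class Arch { r600, amdgcn };

// Hypothetical address-space categories (the patch compares AMDGPUAS numbers).
enum class Space { Global, Constant, Constant32Bit, Flat, Local, Region,
                   Private, ParamD, ParamI, ConstantBuffer };

// Hypothetical answers to the GCN subtarget queries used by the patch.
struct GCNFeatures {
  bool UseDS128;                  // stands in for useDS128()
  unsigned MaxPrivateElementSize; // bytes, getMaxPrivateElementSize()
};

unsigned loadStoreVecRegBitWidth(Arch A, Space S, const GCNFeatures &F) {
  const bool IsR600 = A == Arch::r600;
  switch (S) {
  case Space::Global:
  case Space::Constant:
  case Space::Constant32Bit:
    return IsR600 ? 128 : 512;
  case Space::Flat:
    return 128;
  case Space::Local:
  case Space::Region:
    if (IsR600)
      return 64;
    return F.UseDS128 ? 128 : 64;
  case Space::Private:
    if (IsR600)
      return 32;
    return 8 * F.MaxPrivateElementSize;
  case Space::ParamD:
  case Space::ParamI:
  case Space::ConstantBuffer:
    if (IsR600)
      return 128;
    break; // not meaningful on GCN
  }
  throw std::logic_error("unhandled address space");
}
```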

bool AMDGPUTTIImpl::isLegalToVectorizeMemChain(unsigned ChainSizeInBytes,		bool AMDGPUTTIImpl::isLegalToVectorizeMemChain(unsigned ChainSizeInBytes,
unsigned Alignment,		unsigned Alignment,
unsigned AddrSpace) const {		unsigned AddrSpace) const {
// We allow vectorization of flat stores, even though we may need to decompose		// We allow vectorization of flat stores, even though we may need to decompose
// them later if they may access private memory. We don't have enough context		// them later if they may access private memory. We don't have enough context
// here, and legalization can handle it.		// here, and legalization can handle it.
if (AddrSpace == ST->getAMDGPUAS().PRIVATE_ADDRESS) {		auto Subtarget = static_cast<const AMDGPUSubtarget *>(ST);
return (Alignment >= 4 || ST->hasUnalignedScratchAccess()) &&		if (AddrSpace == AMDGPU::getAMDGPUAS(TM->getTargetTriple()).PRIVATE_ADDRESS) {
ChainSizeInBytes <= ST->getMaxPrivateElementSize();		return (Alignment >= 4 || Subtarget->hasUnalignedScratchAccess()) &&
		ChainSizeInBytes <= Subtarget->getMaxPrivateElementSize();
}		}
return true;		return true;
}		}
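`isLegalToVectorizeMemChain` restricts only private (scratch) chains. Such a chain needs 4-byte alignment unless the subtarget supports unaligned scratch access, and it may not exceed the maximum private element size. The sketch below uses a hypothetical `ScratchFeatures` struct, and an `IsPrivate` flag in place of the address-space comparison.

```cpp
#include <cassert>

// Hypothetical stand-ins for the two subtarget queries.
struct ScratchFeatures {
  bool HasUnalignedScratchAccess;
  unsigned MaxPrivateElementSize; // bytes
};

// Private chains need 4-byte alignment (unless unaligned scratch access is
// supported) and must fit in one private element; everything else is
// accepted and left to legalization.
bool isLegalToVectorizeMemChain(unsigned ChainSizeInBytes, unsigned Alignment,
                                bool IsPrivate, const ScratchFeatures &F) {
  if (IsPrivate)
    return (Alignment >= 4 || F.HasUnalignedScratchAccess) &&
           ChainSizeInBytes <= F.MaxPrivateElementSize;
  return true;
}
```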

bool AMDGPUTTIImpl::isLegalToVectorizeLoadChain(unsigned ChainSizeInBytes,		bool AMDGPUTTIImpl::isLegalToVectorizeLoadChain(unsigned ChainSizeInBytes,
unsigned Alignment,		unsigned Alignment,
unsigned AddrSpace) const {		unsigned AddrSpace) const {
return isLegalToVectorizeMemChain(ChainSizeInBytes, Alignment, AddrSpace);		return isLegalToVectorizeMemChain(ChainSizeInBytes, Alignment, AddrSpace);
(43 lines not shown)	bool AMDGPUTTIImpl::getTgtMemIntrinsic(IntrinsicInst *Inst,
}		}
}		}

int AMDGPUTTIImpl::getArithmeticInstrCost(		int AMDGPUTTIImpl::getArithmeticInstrCost(
unsigned Opcode, Type *Ty, TTI::OperandValueKind Opd1Info,		unsigned Opcode, Type *Ty, TTI::OperandValueKind Opd1Info,
TTI::OperandValueKind Opd2Info, TTI::OperandValueProperties Opd1PropInfo,		TTI::OperandValueKind Opd2Info, TTI::OperandValueProperties Opd1PropInfo,
TTI::OperandValueProperties Opd2PropInfo, ArrayRef<const Value *> Args ) {		TTI::OperandValueProperties Opd2PropInfo, ArrayRef<const Value *> Args ) {
EVT OrigTy = TLI->getValueType(DL, Ty);		EVT OrigTy = TLI->getValueType(DL, Ty);
if (!OrigTy.isSimple()) {		if (isR600() || !OrigTy.isSimple()) {
return BaseT::getArithmeticInstrCost(Opcode, Ty, Opd1Info, Opd2Info,		return BaseT::getArithmeticInstrCost(Opcode, Ty, Opd1Info, Opd2Info,
Opd1PropInfo, Opd2PropInfo);		Opd1PropInfo, Opd2PropInfo);
}		}

		auto STI = static_cast<const SISubtarget*>(ST);
// Legalize the type.		// Legalize the type.
std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Ty);		std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Ty);
int ISD = TLI->InstructionOpcodeToISD(Opcode);		int ISD = TLI->InstructionOpcodeToISD(Opcode);

// Because we don't have any legal vector operations, but the legal types, we		// Because we don't have any legal vector operations, but the legal types, we
// need to account for split vectors.		// need to account for split vectors.
unsigned NElts = LT.second.isVector() ?		unsigned NElts = LT.second.isVector() ?
LT.second.getVectorNumElements() : 1;		LT.second.getVectorNumElements() : 1;
(41 lines not shown)	case ISD::FMUL:
break;		break;
case ISD::FDIV:		case ISD::FDIV:
case ISD::FREM:		case ISD::FREM:
// FIXME: frem should be handled separately. The fdiv in it is most of it,		// FIXME: frem should be handled separately. The fdiv in it is most of it,
// but the current lowering is also not entirely correct.		// but the current lowering is also not entirely correct.
if (SLT == MVT::f64) {		if (SLT == MVT::f64) {
int Cost = 4 * get64BitInstrCost() + 7 * getQuarterRateInstrCost();		int Cost = 4 * get64BitInstrCost() + 7 * getQuarterRateInstrCost();
// Add cost of workaround.		// Add cost of workaround.
if (ST->getGeneration() == AMDGPUSubtarget::SOUTHERN_ISLANDS)		if (STI->getGeneration() == AMDGPUSubtarget::SOUTHERN_ISLANDS)
Cost += 3 * getFullRateInstrCost();		Cost += 3 * getFullRateInstrCost();

return LT.first * Cost * NElts;		return LT.first * Cost * NElts;
}		}

if (!Args.empty() && match(Args[0], PatternMatch::m_FPOne())) {		if (!Args.empty() && match(Args[0], PatternMatch::m_FPOne())) {
// TODO: This is more complicated, unsafe flags etc.		// TODO: This is more complicated, unsafe flags etc.
if ((SLT == MVT::f32 && !ST->hasFP32Denormals()) ||		if ((SLT == MVT::f32 && !STI->hasFP32Denormals()) ||
(SLT == MVT::f16 && ST->has16BitInsts())) {		(SLT == MVT::f16 && STI->has16BitInsts())) {
return LT.first * getQuarterRateInstrCost() * NElts;		return LT.first * getQuarterRateInstrCost() * NElts;
}		}
}		}

if (SLT == MVT::f16 && ST->has16BitInsts()) {		if (SLT == MVT::f16 && STI->has16BitInsts()) {
// 2 x v_cvt_f32_f16		// 2 x v_cvt_f32_f16
// f32 rcp		// f32 rcp
// f32 fmul		// f32 fmul
// v_cvt_f16_f32		// v_cvt_f16_f32
// f16 div_fixup		// f16 div_fixup
int Cost = 4 * getFullRateInstrCost() + 2 * getQuarterRateInstrCost();		int Cost = 4 * getFullRateInstrCost() + 2 * getQuarterRateInstrCost();
return LT.first * Cost * NElts;		return LT.first * Cost * NElts;
}		}

if (SLT == MVT::f32 || SLT == MVT::f16) {		if (SLT == MVT::f32 || SLT == MVT::f16) {
int Cost = 7 * getFullRateInstrCost() + 1 * getQuarterRateInstrCost();		int Cost = 7 * getFullRateInstrCost() + 1 * getQuarterRateInstrCost();

if (!ST->hasFP32Denormals()) {		if (!STI->hasFP32Denormals()) {
// FP mode switches.		// FP mode switches.
Cost += 2 * getFullRateInstrCost();		Cost += 2 * getFullRateInstrCost();
}		}

return LT.first * NElts * Cost;		return LT.first * NElts * Cost;
}		}
break;		break;
default:		default:
(12 lines not shown)	case Instruction::Ret:
return 10;		return 10;
default:		default:
return BaseT::getCFInstrCost(Opcode);		return BaseT::getCFInstrCost(Opcode);
}		}
}		}
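The f64 `FDIV`/`FREM` cost in `getArithmeticInstrCost` can be worked through by hand. Assuming `TargetTransformInfo::TCC_Basic` is 1, full, half, and quarter rate cost 1, 2, and 3. The per-element cost is then:

- 4 * 2 + 7 * 3 = 29 on a part with half-rate 64-bit ops.
- 4 * 3 + 7 * 3 = 33 on a quarter-rate part.
- 3 more on Southern Islands, for the workaround.

The new code returns early for R600, so only the GCN path reaches this branch. The helper names below are hypothetical.

```cpp
#include <cassert>

// Cost units, assuming TargetTransformInfo::TCC_Basic == 1.
constexpr int TCC_Basic = 1;
constexpr int fullRateCost() { return TCC_Basic; }
constexpr int halfRateCost() { return 2 * TCC_Basic; }
constexpr int quarterRateCost() { return 3 * TCC_Basic; }

// get64BitInstrCost after the patch: R600 is always quarter rate; GCN is
// half rate only when the HalfRate64Ops feature is present.
constexpr int cost64(bool IsR600, bool HasHalfRate64Ops) {
  return (!IsR600 && HasHalfRate64Ops) ? halfRateCost() : quarterRateCost();
}

// Hypothetical helper for the f64 FDIV/FREM cost on GCN. LegalizeFactor
// stands for LT.first and NElts for the element count after legalization.
int f64DivCost(bool HasHalfRate64Ops, bool IsSouthernIslands,
               int LegalizeFactor, int NElts) {
  int Cost = 4 * cost64(/*IsR600=*/false, HasHalfRate64Ops) +
             7 * quarterRateCost();
  if (IsSouthernIslands)
    Cost += 3 * fullRateCost(); // cost of the SI workaround
  return LegalizeFactor * Cost * NElts;
}
```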

int AMDGPUTTIImpl::getVectorInstrCost(unsigned Opcode, Type *ValTy,		int AMDGPUTTIImpl::getVectorInstrCost(unsigned Opcode, Type *ValTy,
unsigned Index) {		unsigned Index) {
		bool Has16BitInsts = isGCN() &&
		static_cast<const AMDGPUSubtarget *>(ST)->has16BitInsts();
switch (Opcode) {		switch (Opcode) {
case Instruction::ExtractElement:		case Instruction::ExtractElement:
case Instruction::InsertElement: {		case Instruction::InsertElement: {
unsigned EltSize		unsigned EltSize
= DL.getTypeSizeInBits(cast<VectorType>(ValTy)->getElementType());		= DL.getTypeSizeInBits(cast<VectorType>(ValTy)->getElementType());
if (EltSize < 32) {		if (EltSize < 32) {
if (EltSize == 16 && Index == 0 && ST->has16BitInsts())		if (EltSize == 16 && Index == 0 && Has16BitInsts)
return 0;		return 0;
return BaseT::getVectorInstrCost(Opcode, ValTy, Index);		return BaseT::getVectorInstrCost(Opcode, ValTy, Index);
}		}

// Extracts are just reads of a subregister, so are free. Inserts are		// Extracts are just reads of a subregister, so are free. Inserts are
// considered free because we don't want to have any cost for scalarizing		// considered free because we don't want to have any cost for scalarizing
// operations, and we don't have to copy into a different register class.		// operations, and we don't have to copy into a different register class.

[... 41 lines not shown ...]	bool AMDGPUTTIImpl::isSourceOfDivergence(const Value *V) const {

// Loads from the private address space are divergent, because threads		// Loads from the private address space are divergent, because threads
// can execute the load instruction with the same inputs and get different		// can execute the load instruction with the same inputs and get different
// results.		// results.
//		//
// All other loads are not divergent, because if threads issue loads with the		// All other loads are not divergent, because if threads issue loads with the
// same arguments, they will always get the same result.		// same arguments, they will always get the same result.
if (const LoadInst *Load = dyn_cast<LoadInst>(V))		if (const LoadInst *Load = dyn_cast<LoadInst>(V))
return Load->getPointerAddressSpace() == ST->getAMDGPUAS().PRIVATE_ADDRESS;		return Load->getPointerAddressSpace() ==
		AMDGPU::getAMDGPUAS(TM->getTargetTriple()).PRIVATE_ADDRESS;

// Atomics are divergent because they are executed sequentially: when an		// Atomics are divergent because they are executed sequentially: when an
// atomic operation refers to the same address in each thread, then each		// atomic operation refers to the same address in each thread, then each
// thread after the first sees the value written by the previous thread as		// thread after the first sees the value written by the previous thread as
// original value.		// original value.
if (isa<AtomicRMWInst>(V) || isa<AtomicCmpXchgInst>(V))		if (isa<AtomicRMWInst>(V) || isa<AtomicCmpXchgInst>(V))
return true;		return true;

[... 17 lines not shown ...]	case Intrinsic::amdgcn_readlane:
return true;		return true;
}		}
}		}
return false;		return false;
}		}

unsigned AMDGPUTTIImpl::getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index,		unsigned AMDGPUTTIImpl::getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index,
Type *SubTp) {		Type *SubTp) {
if (ST->hasVOP3PInsts()) {		if (isGCN() && static_cast<const AMDGPUSubtarget*>(ST)->hasVOP3PInsts()) {
		arsenm: Can’t this be in a gcn class? I think all of this is for packed anyway.
VectorType *VT = cast<VectorType>(Tp);		VectorType *VT = cast<VectorType>(Tp);
if (VT->getNumElements() == 2 &&		if (VT->getNumElements() == 2 &&
DL.getTypeSizeInBits(VT->getElementType()) == 16) {		DL.getTypeSizeInBits(VT->getElementType()) == 16) {
// With op_sel VOP3P instructions freely can access the low half or high		// With op_sel VOP3P instructions freely can access the low half or high
// half of a register, so any swizzle is free.		// half of a register, so any swizzle is free.

switch (Kind) {		switch (Kind) {
case TTI::SK_Broadcast:		case TTI::SK_Broadcast:
[... 16 lines not shown ...]	const FeatureBitset &CallerBits =
TM.getSubtargetImpl(*Caller)->getFeatureBits();		TM.getSubtargetImpl(*Caller)->getFeatureBits();
const FeatureBitset &CalleeBits =		const FeatureBitset &CalleeBits =
TM.getSubtargetImpl(*Callee)->getFeatureBits();		TM.getSubtargetImpl(*Callee)->getFeatureBits();

FeatureBitset RealCallerBits = CallerBits & ~InlineFeatureIgnoreList;		FeatureBitset RealCallerBits = CallerBits & ~InlineFeatureIgnoreList;
FeatureBitset RealCalleeBits = CalleeBits & ~InlineFeatureIgnoreList;		FeatureBitset RealCalleeBits = CalleeBits & ~InlineFeatureIgnoreList;
return ((RealCallerBits & RealCalleeBits) == RealCalleeBits);		return ((RealCallerBits & RealCalleeBits) == RealCalleeBits);
}		}

		void R600TTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE,
		TTI::UnrollingPreferences &UP) {
		CommonTTI.getUnrollingPreferences(L, SE, UP);
		}

		unsigned R600TTIImpl::getHardwareNumberOfRegisters(bool Vec) const {
		return 4 * 128; // XXX - 4 channels. Should these count as vector instead?
		}

		unsigned R600TTIImpl::getNumberOfRegisters(bool Vec) const {
		return getHardwareNumberOfRegisters(Vec);
		}

		unsigned R600TTIImpl::getRegisterBitWidth(bool Vector) const {
		return 32;
		}

		unsigned R600TTIImpl::getMinVectorRegisterBitWidth() const {
		return 32;
		}

		unsigned R600TTIImpl::getLoadStoreVecRegBitWidth(unsigned AddrSpace) const {
		const AMDGPUAS &AS = AMDGPU::getAMDGPUAS(TM->getTargetTriple());
		if (AddrSpace == AS.GLOBAL_ADDRESS ||
		AddrSpace == AS.CONSTANT_ADDRESS)
		return 128;
		if (AddrSpace == AS.LOCAL_ADDRESS ||
		AddrSpace == AS.REGION_ADDRESS)
		return 64;
		if (AddrSpace == AS.PRIVATE_ADDRESS)
		return 32;

		if ((AddrSpace == AS.PARAM_D_ADDRESS ||
		AddrSpace == AS.PARAM_I_ADDRESS ||
		(AddrSpace >= AS.CONSTANT_BUFFER_0 &&
		AddrSpace <= AS.CONSTANT_BUFFER_15)))
		return 128;
		llvm_unreachable("unhandled address space");
		}

		bool R600TTIImpl::isLegalToVectorizeMemChain(unsigned ChainSizeInBytes,
		unsigned Alignment,
		unsigned AddrSpace) const {
		// We allow vectorization of flat stores, even though we may need to decompose
		// them later if they may access private memory. We don't have enough context
		// here, and legalization can handle it.
		if (AddrSpace == AMDGPU::getAMDGPUAS(TM->getTargetTriple()).PRIVATE_ADDRESS)
		return false;
		return true;
		}

		bool R600TTIImpl::isLegalToVectorizeLoadChain(unsigned ChainSizeInBytes,
		unsigned Alignment,
		unsigned AddrSpace) const {
		return isLegalToVectorizeMemChain(ChainSizeInBytes, Alignment, AddrSpace);
		}

		bool R600TTIImpl::isLegalToVectorizeStoreChain(unsigned ChainSizeInBytes,
		unsigned Alignment,
		unsigned AddrSpace) const {
		return isLegalToVectorizeMemChain(ChainSizeInBytes, Alignment, AddrSpace);
		}

		unsigned R600TTIImpl::getMaxInterleaveFactor(unsigned VF) {
		// Disable unrolling if the loop is not vectorized.
		// TODO: Enable this again.
		if (VF == 1)
		return 1;

		return 8;
		}

		unsigned R600TTIImpl::getCFInstrCost(unsigned Opcode) {
		return CommonTTI.getCFInstrCost(Opcode);
		}

		int R600TTIImpl::getVectorInstrCost(unsigned Opcode, Type *ValTy,
		unsigned Index) {
		return CommonTTI.getVectorInstrCost(Opcode, ValTy, Index);
		}
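The FDiv cost logic at the top of this file builds a per-element cost out of full-rate and quarter-rate instruction counts. A minimal standalone sketch of the same branch structure follows. The rate constants (1 and 3), the `Features` struct and the `fdivCost` name are assumptions for illustration only, not the TTI API; `LegalizeFactor` stands in for `LT.first`.

```cpp
#include <cassert>

// Assumed relative costs; AMDGPUTTIImpl gets its own values from
// getFullRateInstrCost() and getQuarterRateInstrCost().
constexpr int kFullRate = 1;
constexpr int kQuarterRate = 3;

enum class ScalarTy { F16, F32 };

// Hypothetical stand-in for the two subtarget queries the cost logic uses.
struct Features {
  bool HasFP32Denormals;
  bool Has16BitInsts;
};

// LegalizeFactor plays the role of LT.first; NElts is the vector width.
int fdivCost(ScalarTy Ty, const Features &F, bool NumeratorIsOne,
             int LegalizeFactor, int NElts) {
  assert(LegalizeFactor > 0 && NElts > 0 && "expected a legal type");

  // 1.0 / x is costed as a single quarter-rate reciprocal in these cases.
  if (NumeratorIsOne &&
      ((Ty == ScalarTy::F32 && !F.HasFP32Denormals) ||
       (Ty == ScalarTy::F16 && F.Has16BitInsts)))
    return LegalizeFactor * kQuarterRate * NElts;

  // f16: 2 x cvt f16->f32, f32 rcp, f32 fmul, cvt f32->f16, f16 div_fixup.
  if (Ty == ScalarTy::F16 && F.Has16BitInsts) {
    int Cost = 4 * kFullRate + 2 * kQuarterRate;
    return LegalizeFactor * Cost * NElts;
  }

  // f32 (or f16 without 16-bit instructions): the general division sequence.
  int Cost = 7 * kFullRate + 1 * kQuarterRate;
  if (!F.HasFP32Denormals)
    Cost += 2 * kFullRate; // FP mode switches
  return LegalizeFactor * NElts * Cost;
}
```

With these assumed rates, a scalar f32 divide costs 12 without FP32 denormals (10 plus 2 for the mode switches) and 10 with them.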

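`areInlineCompatible` in the diff above permits inlining only when the callee's feature bits, after masking out `InlineFeatureIgnoreList`, are a subset of the caller's. A sketch of that subset test, using `std::bitset` in place of LLVM's `FeatureBitset`; the 8-bit width and the meaning of each bit are hypothetical.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical 8-feature universe; the real FeatureBitset is much wider.
constexpr std::size_t kNumFeatures = 8;
using FeatureBits = std::bitset<kNumFeatures>;

// Inlining is allowed only if every feature the callee relies on, ignoring
// the features on the ignore list, is also enabled in the caller.
bool areInlineCompatible(const FeatureBits &Caller, const FeatureBits &Callee,
                         const FeatureBits &IgnoreList) {
  FeatureBits RealCaller = Caller & ~IgnoreList;
  FeatureBits RealCallee = Callee & ~IgnoreList;
  return (RealCaller & RealCallee) == RealCallee;
}
```

A callee that needs a feature the caller lacks is rejected unless that feature is on the ignore list.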
lib/Target/AMDGPU/AMDILCFGStructurizer.cpp

[... 431 lines not shown ...]
}		}

void AMDGPUCFGStructurizer::reversePredicateSetter(		void AMDGPUCFGStructurizer::reversePredicateSetter(
MachineBasicBlock::iterator I, MachineBasicBlock &MBB) {		MachineBasicBlock::iterator I, MachineBasicBlock &MBB) {
assert(I.isValid() && "Expected valid iterator");		assert(I.isValid() && "Expected valid iterator");
for (;; --I) {		for (;; --I) {
if (I == MBB.end())		if (I == MBB.end())
continue;		continue;
if (I->getOpcode() == AMDGPU::PRED_X) {		if (I->getOpcode() == R600::PRED_X) {
switch (I->getOperand(2).getImm()) {		switch (I->getOperand(2).getImm()) {
case AMDGPU::PRED_SETE_INT:		case R600::PRED_SETE_INT:
I->getOperand(2).setImm(AMDGPU::PRED_SETNE_INT);		I->getOperand(2).setImm(R600::PRED_SETNE_INT);
return;		return;
case AMDGPU::PRED_SETNE_INT:		case R600::PRED_SETNE_INT:
I->getOperand(2).setImm(AMDGPU::PRED_SETE_INT);		I->getOperand(2).setImm(R600::PRED_SETE_INT);
return;		return;
case AMDGPU::PRED_SETE:		case R600::PRED_SETE:
I->getOperand(2).setImm(AMDGPU::PRED_SETNE);		I->getOperand(2).setImm(R600::PRED_SETNE);
return;		return;
case AMDGPU::PRED_SETNE:		case R600::PRED_SETNE:
I->getOperand(2).setImm(AMDGPU::PRED_SETE);		I->getOperand(2).setImm(R600::PRED_SETE);
return;		return;
default:		default:
llvm_unreachable("PRED_X Opcode invalid!");		llvm_unreachable("PRED_X Opcode invalid!");
}		}
}		}
}		}
}		}

[... 52 lines not shown ...]	void AMDGPUCFGStructurizer::insertCondBranchBefore(
//insert before		//insert before
blk->insert(I, NewInstr);		blk->insert(I, NewInstr);
MachineInstrBuilder(*MF, NewInstr).addReg(RegNum, false);		MachineInstrBuilder(*MF, NewInstr).addReg(RegNum, false);
SHOWNEWINSTR(NewInstr);		SHOWNEWINSTR(NewInstr);
}		}

int AMDGPUCFGStructurizer::getBranchNzeroOpcode(int OldOpcode) {		int AMDGPUCFGStructurizer::getBranchNzeroOpcode(int OldOpcode) {
switch(OldOpcode) {		switch(OldOpcode) {
case AMDGPU::JUMP_COND:		case R600::JUMP_COND:
case AMDGPU::JUMP: return AMDGPU::IF_PREDICATE_SET;		case R600::JUMP: return R600::IF_PREDICATE_SET;
case AMDGPU::BRANCH_COND_i32:		case R600::BRANCH_COND_i32:
case AMDGPU::BRANCH_COND_f32: return AMDGPU::IF_LOGICALNZ_f32;		case R600::BRANCH_COND_f32: return R600::IF_LOGICALNZ_f32;
default: llvm_unreachable("internal error");		default: llvm_unreachable("internal error");
}		}
return -1;		return -1;
}		}

int AMDGPUCFGStructurizer::getBranchZeroOpcode(int OldOpcode) {		int AMDGPUCFGStructurizer::getBranchZeroOpcode(int OldOpcode) {
switch(OldOpcode) {		switch(OldOpcode) {
case AMDGPU::JUMP_COND:		case R600::JUMP_COND:
case AMDGPU::JUMP: return AMDGPU::IF_PREDICATE_SET;		case R600::JUMP: return R600::IF_PREDICATE_SET;
case AMDGPU::BRANCH_COND_i32:		case R600::BRANCH_COND_i32:
case AMDGPU::BRANCH_COND_f32: return AMDGPU::IF_LOGICALZ_f32;		case R600::BRANCH_COND_f32: return R600::IF_LOGICALZ_f32;
default: llvm_unreachable("internal error");		default: llvm_unreachable("internal error");
}		}
return -1;		return -1;
}		}

int AMDGPUCFGStructurizer::getContinueNzeroOpcode(int OldOpcode) {		int AMDGPUCFGStructurizer::getContinueNzeroOpcode(int OldOpcode) {
switch(OldOpcode) {		switch(OldOpcode) {
case AMDGPU::JUMP_COND:		case R600::JUMP_COND:
case AMDGPU::JUMP: return AMDGPU::CONTINUE_LOGICALNZ_i32;		case R600::JUMP: return R600::CONTINUE_LOGICALNZ_i32;
default: llvm_unreachable("internal error");		default: llvm_unreachable("internal error");
}		}
return -1;		return -1;
}		}

int AMDGPUCFGStructurizer::getContinueZeroOpcode(int OldOpcode) {		int AMDGPUCFGStructurizer::getContinueZeroOpcode(int OldOpcode) {
switch(OldOpcode) {		switch(OldOpcode) {
case AMDGPU::JUMP_COND:		case R600::JUMP_COND:
case AMDGPU::JUMP: return AMDGPU::CONTINUE_LOGICALZ_i32;		case R600::JUMP: return R600::CONTINUE_LOGICALZ_i32;
default: llvm_unreachable("internal error");		default: llvm_unreachable("internal error");
}		}
return -1;		return -1;
}		}

MachineBasicBlock *AMDGPUCFGStructurizer::getTrueBranch(MachineInstr *MI) {		MachineBasicBlock *AMDGPUCFGStructurizer::getTrueBranch(MachineInstr *MI) {
return MI->getOperand(0).getMBB();		return MI->getOperand(0).getMBB();
}		}
[... 11 lines not shown ...]	AMDGPUCFGStructurizer::getFalseBranch(MachineBasicBlock *MBB,
MachineBasicBlock::succ_iterator It = MBB->succ_begin();		MachineBasicBlock::succ_iterator It = MBB->succ_begin();
MachineBasicBlock::succ_iterator Next = It;		MachineBasicBlock::succ_iterator Next = It;
++Next;		++Next;
return (*It == TrueBranch) ? *Next : *It;		return (*It == TrueBranch) ? *Next : *It;
}		}

bool AMDGPUCFGStructurizer::isCondBranch(MachineInstr *MI) {		bool AMDGPUCFGStructurizer::isCondBranch(MachineInstr *MI) {
switch (MI->getOpcode()) {		switch (MI->getOpcode()) {
case AMDGPU::JUMP_COND:		case R600::JUMP_COND:
case AMDGPU::BRANCH_COND_i32:		case R600::BRANCH_COND_i32:
case AMDGPU::BRANCH_COND_f32: return true;		case R600::BRANCH_COND_f32: return true;
default:		default:
return false;		return false;
}		}
return false;		return false;
}		}

bool AMDGPUCFGStructurizer::isUncondBranch(MachineInstr *MI) {		bool AMDGPUCFGStructurizer::isUncondBranch(MachineInstr *MI) {
switch (MI->getOpcode()) {		switch (MI->getOpcode()) {
case AMDGPU::JUMP:		case R600::JUMP:
case AMDGPU::BRANCH:		case R600::BRANCH:
return true;		return true;
default:		default:
return false;		return false;
}		}
return false;		return false;
}		}

DebugLoc AMDGPUCFGStructurizer::getLastDebugLocInBB(MachineBasicBlock *MBB) {		DebugLoc AMDGPUCFGStructurizer::getLastDebugLocInBB(MachineBasicBlock *MBB) {
[... 32 lines not shown ...]	MachineInstr *AMDGPUCFGStructurizer::getLoopendBlockBranchInstr(
}		}
return nullptr;		return nullptr;
}		}

MachineInstr *AMDGPUCFGStructurizer::getReturnInstr(MachineBasicBlock *MBB) {		MachineInstr *AMDGPUCFGStructurizer::getReturnInstr(MachineBasicBlock *MBB) {
MachineBasicBlock::reverse_iterator It = MBB->rbegin();		MachineBasicBlock::reverse_iterator It = MBB->rbegin();
if (It != MBB->rend()) {		if (It != MBB->rend()) {
MachineInstr *instr = &(*It);		MachineInstr *instr = &(*It);
if (instr->getOpcode() == AMDGPU::RETURN)		if (instr->getOpcode() == R600::RETURN)
return instr;		return instr;
}		}
return nullptr;		return nullptr;
}		}

bool AMDGPUCFGStructurizer::isReturnBlock(MachineBasicBlock *MBB) {		bool AMDGPUCFGStructurizer::isReturnBlock(MachineBasicBlock *MBB) {
MachineInstr *MI = getReturnInstr(MBB);		MachineInstr *MI = getReturnInstr(MBB);
bool IsReturn = (MBB->succ_size() == 0);		bool IsReturn = (MBB->succ_size() == 0);
[... 37 lines not shown ...]	assert((!MBB->getParent()->getJumpTableInfo()
&& "found a jump table");		&& "found a jump table");

//collect continue right before endloop		//collect continue right before endloop
SmallVector<MachineInstr *, DEFAULT_VEC_SLOTS> ContInstr;		SmallVector<MachineInstr *, DEFAULT_VEC_SLOTS> ContInstr;
MachineBasicBlock::iterator Pre = MBB->begin();		MachineBasicBlock::iterator Pre = MBB->begin();
MachineBasicBlock::iterator E = MBB->end();		MachineBasicBlock::iterator E = MBB->end();
MachineBasicBlock::iterator It = Pre;		MachineBasicBlock::iterator It = Pre;
while (It != E) {		while (It != E) {
if (Pre->getOpcode() == AMDGPU::CONTINUE		if (Pre->getOpcode() == R600::CONTINUE
&& It->getOpcode() == AMDGPU::ENDLOOP)		&& It->getOpcode() == R600::ENDLOOP)
ContInstr.push_back(&*Pre);		ContInstr.push_back(&*Pre);
Pre = It;		Pre = It;
++It;		++It;
}		}

//delete continue right before endloop		//delete continue right before endloop
for (unsigned i = 0; i < ContInstr.size(); ++i)		for (unsigned i = 0; i < ContInstr.size(); ++i)
ContInstr[i]->eraseFromParent();		ContInstr[i]->eraseFromParent();
[... 627 lines not shown ...]	if (!MigrateTrue || !MigrateFalse) {
// lot of instructions.		// lot of instructions.
return 0;		return 0;
}		}

int NumNewBlk = 0;		int NumNewBlk = 0;

bool LandBlkHasOtherPred = (LandBlk->pred_size() > 2);		bool LandBlkHasOtherPred = (LandBlk->pred_size() > 2);

//insert AMDGPU::ENDIF to avoid special case "input landBlk == NULL"		//insert R600::ENDIF to avoid special case "input landBlk == NULL"
MachineBasicBlock::iterator I = insertInstrBefore(LandBlk, AMDGPU::ENDIF);		MachineBasicBlock::iterator I = insertInstrBefore(LandBlk, R600::ENDIF);

if (LandBlkHasOtherPred) {		if (LandBlkHasOtherPred) {
report_fatal_error("Extra register needed to handle CFG");		report_fatal_error("Extra register needed to handle CFG");
unsigned CmpResReg =		unsigned CmpResReg =
HeadMBB->getParent()->getRegInfo().createVirtualRegister(I32RC);		HeadMBB->getParent()->getRegInfo().createVirtualRegister(I32RC);
report_fatal_error("Extra compare instruction needed to handle CFG");		report_fatal_error("Extra compare instruction needed to handle CFG");
insertCondBranchBefore(LandBlk, I, AMDGPU::IF_PREDICATE_SET,		insertCondBranchBefore(LandBlk, I, R600::IF_PREDICATE_SET,
CmpResReg, DebugLoc());		CmpResReg, DebugLoc());
}		}

// XXX: We are running this after RA, so creating virtual registers will		// XXX: We are running this after RA, so creating virtual registers will
// cause an assertion failure in the PostRA scheduling pass.		// cause an assertion failure in the PostRA scheduling pass.
unsigned InitReg =		unsigned InitReg =
HeadMBB->getParent()->getRegInfo().createVirtualRegister(I32RC);		HeadMBB->getParent()->getRegInfo().createVirtualRegister(I32RC);
insertCondBranchBefore(LandBlk, I, AMDGPU::IF_PREDICATE_SET, InitReg,		insertCondBranchBefore(LandBlk, I, R600::IF_PREDICATE_SET, InitReg,
DebugLoc());		DebugLoc());

if (MigrateTrue) {		if (MigrateTrue) {
migrateInstruction(TrueMBB, LandBlk, I);		migrateInstruction(TrueMBB, LandBlk, I);
// need to uncondionally insert the assignment to ensure a path from its		// need to uncondionally insert the assignment to ensure a path from its
// predecessor rather than headBlk has valid value in initReg if		// predecessor rather than headBlk has valid value in initReg if
// (initVal != 1).		// (initVal != 1).
report_fatal_error("Extra register needed to handle CFG");		report_fatal_error("Extra register needed to handle CFG");
}		}
insertInstrBefore(I, AMDGPU::ELSE);		insertInstrBefore(I, R600::ELSE);

if (MigrateFalse) {		if (MigrateFalse) {
migrateInstruction(FalseMBB, LandBlk, I);		migrateInstruction(FalseMBB, LandBlk, I);
// need to uncondionally insert the assignment to ensure a path from its		// need to uncondionally insert the assignment to ensure a path from its
// predecessor rather than headBlk has valid value in initReg if		// predecessor rather than headBlk has valid value in initReg if
// (initVal != 0)		// (initVal != 0)
report_fatal_error("Extra register needed to handle CFG");		report_fatal_error("Extra register needed to handle CFG");
}		}

if (LandBlkHasOtherPred) {		if (LandBlkHasOtherPred) {
// add endif		// add endif
insertInstrBefore(I, AMDGPU::ENDIF);		insertInstrBefore(I, R600::ENDIF);

// put initReg = 2 to other predecessors of landBlk		// put initReg = 2 to other predecessors of landBlk
for (MachineBasicBlock::pred_iterator PI = LandBlk->pred_begin(),		for (MachineBasicBlock::pred_iterator PI = LandBlk->pred_begin(),
PE = LandBlk->pred_end(); PI != PE; ++PI) {		PE = LandBlk->pred_end(); PI != PE; ++PI) {
MachineBasicBlock *MBB = *PI;		MachineBasicBlock *MBB = *PI;
if (MBB != TrueMBB && MBB != FalseMBB)		if (MBB != TrueMBB && MBB != FalseMBB)
report_fatal_error("Extra register needed to handle CFG");		report_fatal_error("Extra register needed to handle CFG");
}		}
[... 70 lines not shown ...]	if (TrueMBB) {
MBB->removeSuccessor(TrueMBB, true);		MBB->removeSuccessor(TrueMBB, true);
if (LandMBB && TrueMBB->succ_size()!=0)		if (LandMBB && TrueMBB->succ_size()!=0)
TrueMBB->removeSuccessor(LandMBB, true);		TrueMBB->removeSuccessor(LandMBB, true);
retireBlock(TrueMBB);		retireBlock(TrueMBB);
MLI->removeBlock(TrueMBB);		MLI->removeBlock(TrueMBB);
}		}

if (FalseMBB) {		if (FalseMBB) {
insertInstrBefore(I, AMDGPU::ELSE);		insertInstrBefore(I, R600::ELSE);
MBB->splice(I, FalseMBB, FalseMBB->begin(),		MBB->splice(I, FalseMBB, FalseMBB->begin(),
FalseMBB->end());		FalseMBB->end());
MBB->removeSuccessor(FalseMBB, true);		MBB->removeSuccessor(FalseMBB, true);
if (LandMBB && FalseMBB->succ_size() != 0)		if (LandMBB && FalseMBB->succ_size() != 0)
FalseMBB->removeSuccessor(LandMBB, true);		FalseMBB->removeSuccessor(LandMBB, true);
retireBlock(FalseMBB);		retireBlock(FalseMBB);
MLI->removeBlock(FalseMBB);		MLI->removeBlock(FalseMBB);
}		}
insertInstrBefore(I, AMDGPU::ENDIF);		insertInstrBefore(I, R600::ENDIF);

BranchMI->eraseFromParent();		BranchMI->eraseFromParent();

if (LandMBB && TrueMBB && FalseMBB)		if (LandMBB && TrueMBB && FalseMBB)
MBB->addSuccessor(LandMBB);		MBB->addSuccessor(LandMBB);
}		}

void AMDGPUCFGStructurizer::mergeLooplandBlock(MachineBasicBlock *DstBlk,		void AMDGPUCFGStructurizer::mergeLooplandBlock(MachineBasicBlock *DstBlk,
MachineBasicBlock *LandMBB) {		MachineBasicBlock *LandMBB) {
DEBUG(dbgs() << "loopPattern header = BB" << DstBlk->getNumber()		DEBUG(dbgs() << "loopPattern header = BB" << DstBlk->getNumber()
<< " land = BB" << LandMBB->getNumber() << "\n";);		<< " land = BB" << LandMBB->getNumber() << "\n";);

insertInstrBefore(DstBlk, AMDGPU::WHILELOOP, DebugLoc());		insertInstrBefore(DstBlk, R600::WHILELOOP, DebugLoc());
insertInstrEnd(DstBlk, AMDGPU::ENDLOOP, DebugLoc());		insertInstrEnd(DstBlk, R600::ENDLOOP, DebugLoc());
DstBlk->replaceSuccessor(DstBlk, LandMBB);		DstBlk->replaceSuccessor(DstBlk, LandMBB);
}		}

void AMDGPUCFGStructurizer::mergeLoopbreakBlock(MachineBasicBlock *ExitingMBB,		void AMDGPUCFGStructurizer::mergeLoopbreakBlock(MachineBasicBlock *ExitingMBB,
MachineBasicBlock *LandMBB) {		MachineBasicBlock *LandMBB) {
DEBUG(dbgs() << "loopbreakPattern exiting = BB" << ExitingMBB->getNumber()		DEBUG(dbgs() << "loopbreakPattern exiting = BB" << ExitingMBB->getNumber()
<< " land = BB" << LandMBB->getNumber() << "\n";);		<< " land = BB" << LandMBB->getNumber() << "\n";);
MachineInstr *BranchMI = getLoopendBlockBranchInstr(ExitingMBB);		MachineInstr *BranchMI = getLoopendBlockBranchInstr(ExitingMBB);
assert(BranchMI && isCondBranch(BranchMI));		assert(BranchMI && isCondBranch(BranchMI));
DebugLoc DL = BranchMI->getDebugLoc();		DebugLoc DL = BranchMI->getDebugLoc();
MachineBasicBlock *TrueBranch = getTrueBranch(BranchMI);		MachineBasicBlock *TrueBranch = getTrueBranch(BranchMI);
MachineBasicBlock::iterator I = BranchMI;		MachineBasicBlock::iterator I = BranchMI;
if (TrueBranch != LandMBB)		if (TrueBranch != LandMBB)
reversePredicateSetter(I, *I->getParent());		reversePredicateSetter(I, *I->getParent());
insertCondBranchBefore(ExitingMBB, I, AMDGPU::IF_PREDICATE_SET, AMDGPU::PREDICATE_BIT, DL);		insertCondBranchBefore(ExitingMBB, I, R600::IF_PREDICATE_SET, R600::PREDICATE_BIT, DL);
insertInstrBefore(I, AMDGPU::BREAK);		insertInstrBefore(I, R600::BREAK);
insertInstrBefore(I, AMDGPU::ENDIF);		insertInstrBefore(I, R600::ENDIF);
//now branchInst can be erase safely		//now branchInst can be erase safely
BranchMI->eraseFromParent();		BranchMI->eraseFromParent();
//now take care of successors, retire blocks		//now take care of successors, retire blocks
ExitingMBB->removeSuccessor(LandMBB, true);		ExitingMBB->removeSuccessor(LandMBB, true);
}		}

void AMDGPUCFGStructurizer::settleLoopcontBlock(MachineBasicBlock *ContingMBB,		void AMDGPUCFGStructurizer::settleLoopcontBlock(MachineBasicBlock *ContingMBB,
MachineBasicBlock *ContMBB) {		MachineBasicBlock *ContMBB) {
[... 12 lines not shown ...]	if (MI) {
bool UseContinueLogical = ((&*ContingMBB->rbegin()) == MI);		bool UseContinueLogical = ((&*ContingMBB->rbegin()) == MI);

if (!UseContinueLogical) {		if (!UseContinueLogical) {
int BranchOpcode =		int BranchOpcode =
TrueBranch == ContMBB ? getBranchNzeroOpcode(OldOpcode) :		TrueBranch == ContMBB ? getBranchNzeroOpcode(OldOpcode) :
getBranchZeroOpcode(OldOpcode);		getBranchZeroOpcode(OldOpcode);
insertCondBranchBefore(I, BranchOpcode, DL);		insertCondBranchBefore(I, BranchOpcode, DL);
// insertEnd to ensure phi-moves, if exist, go before the continue-instr.		// insertEnd to ensure phi-moves, if exist, go before the continue-instr.
insertInstrEnd(ContingMBB, AMDGPU::CONTINUE, DL);		insertInstrEnd(ContingMBB, R600::CONTINUE, DL);
insertInstrEnd(ContingMBB, AMDGPU::ENDIF, DL);		insertInstrEnd(ContingMBB, R600::ENDIF, DL);
} else {		} else {
int BranchOpcode =		int BranchOpcode =
TrueBranch == ContMBB ? getContinueNzeroOpcode(OldOpcode) :		TrueBranch == ContMBB ? getContinueNzeroOpcode(OldOpcode) :
getContinueZeroOpcode(OldOpcode);		getContinueZeroOpcode(OldOpcode);
insertCondBranchBefore(I, BranchOpcode, DL);		insertCondBranchBefore(I, BranchOpcode, DL);
}		}

MI->eraseFromParent();		MI->eraseFromParent();
} else {		} else {
// if we've arrived here then we've already erased the branch instruction		// if we've arrived here then we've already erased the branch instruction
// travel back up the basic block to see the last reference of our debug		// travel back up the basic block to see the last reference of our debug
// location we've just inserted that reference here so it should be		// location we've just inserted that reference here so it should be
// representative insertEnd to ensure phi-moves, if exist, go before the		// representative insertEnd to ensure phi-moves, if exist, go before the
// continue-instr.		// continue-instr.
insertInstrEnd(ContingMBB, AMDGPU::CONTINUE,		insertInstrEnd(ContingMBB, R600::CONTINUE,
getLastDebugLocInBB(ContingMBB));		getLastDebugLocInBB(ContingMBB));
}		}
}		}

int AMDGPUCFGStructurizer::cloneOnSideEntryTo(MachineBasicBlock *PreMBB,		int AMDGPUCFGStructurizer::cloneOnSideEntryTo(MachineBasicBlock *PreMBB,
MachineBasicBlock *SrcMBB, MachineBasicBlock *DstMBB) {		MachineBasicBlock *SrcMBB, MachineBasicBlock *DstMBB) {
int Cloned = 0;		int Cloned = 0;
assert(PreMBB->isSuccessor(SrcMBB));		assert(PreMBB->isSuccessor(SrcMBB));
[... 115 lines not shown ...]	void AMDGPUCFGStructurizer::removeRedundantConditionalBranch(
SHOWNEWBLK(MBB1, "Removing redundant successor");		SHOWNEWBLK(MBB1, "Removing redundant successor");
MBB->removeSuccessor(MBB1, true);		MBB->removeSuccessor(MBB1, true);
}		}

void AMDGPUCFGStructurizer::addDummyExitBlock(		void AMDGPUCFGStructurizer::addDummyExitBlock(
SmallVectorImpl<MachineBasicBlock*> &RetMBB) {		SmallVectorImpl<MachineBasicBlock*> &RetMBB) {
MachineBasicBlock *DummyExitBlk = FuncRep->CreateMachineBasicBlock();		MachineBasicBlock *DummyExitBlk = FuncRep->CreateMachineBasicBlock();
FuncRep->push_back(DummyExitBlk); //insert to function		FuncRep->push_back(DummyExitBlk); //insert to function
insertInstrEnd(DummyExitBlk, AMDGPU::RETURN);		insertInstrEnd(DummyExitBlk, R600::RETURN);

for (SmallVectorImpl<MachineBasicBlock *>::iterator It = RetMBB.begin(),		for (SmallVectorImpl<MachineBasicBlock *>::iterator It = RetMBB.begin(),
E = RetMBB.end(); It != E; ++It) {		E = RetMBB.end(); It != E; ++It) {
MachineBasicBlock *MBB = *It;		MachineBasicBlock *MBB = *It;
MachineInstr *MI = getReturnInstr(MBB);		MachineInstr *MI = getReturnInstr(MBB);
if (MI)		if (MI)
MI->eraseFromParent();		MI->eraseFromParent();
MBB->addSuccessor(DummyExitBlk);		MBB->addSuccessor(DummyExitBlk);
[... 47 lines not shown ...]

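`reversePredicateSetter` walks backwards to the nearest `R600::PRED_X` and negates its equality predicate in place. A sketch of just the negation step, assuming a hypothetical `Pred` enum in place of the generated `R600::PRED_SET*` values:

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-ins for R600::PRED_SETE_INT, PRED_SETNE_INT,
// PRED_SETE and PRED_SETNE.
enum class Pred { SetEInt, SetNEInt, SetE, SetNE };

// Negate an equality predicate, as reversePredicateSetter does to the
// immediate operand of the PRED_X instruction it finds.
Pred reversePredicate(Pred P) {
  switch (P) {
  case Pred::SetEInt:
    return Pred::SetNEInt;
  case Pred::SetNEInt:
    return Pred::SetEInt;
  case Pred::SetE:
    return Pred::SetNE;
  case Pred::SetNE:
    return Pred::SetE;
  }
  // Plays the role of the llvm_unreachable for unexpected predicates.
  throw std::invalid_argument("PRED_X opcode invalid");
}
```

Negation is an involution, so applying it twice returns the original predicate.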
lib/Target/AMDGPU/CMakeLists.txt

set(LLVM_TARGET_DEFINITIONS AMDGPU.td)		set(LLVM_TARGET_DEFINITIONS AMDGPU.td)

tablegen(LLVM AMDGPUGenAsmMatcher.inc -gen-asm-matcher)		tablegen(LLVM AMDGPUGenAsmMatcher.inc -gen-asm-matcher)
tablegen(LLVM AMDGPUGenAsmWriter.inc -gen-asm-writer)		tablegen(LLVM AMDGPUGenAsmWriter.inc -gen-asm-writer)
tablegen(LLVM AMDGPUGenCallingConv.inc -gen-callingconv)		tablegen(LLVM AMDGPUGenCallingConv.inc -gen-callingconv)
tablegen(LLVM AMDGPUGenDAGISel.inc -gen-dag-isel)		tablegen(LLVM AMDGPUGenDAGISel.inc -gen-dag-isel)
tablegen(LLVM AMDGPUGenDFAPacketizer.inc -gen-dfa-packetizer)
tablegen(LLVM AMDGPUGenDisassemblerTables.inc -gen-disassembler)		tablegen(LLVM AMDGPUGenDisassemblerTables.inc -gen-disassembler)
tablegen(LLVM AMDGPUGenInstrInfo.inc -gen-instr-info)		tablegen(LLVM AMDGPUGenInstrInfo.inc -gen-instr-info)
tablegen(LLVM AMDGPUGenIntrinsics.inc -gen-tgt-intrinsic)		tablegen(LLVM AMDGPUGenIntrinsics.inc -gen-tgt-intrinsic)
tablegen(LLVM AMDGPUGenMCCodeEmitter.inc -gen-emitter)		tablegen(LLVM AMDGPUGenMCCodeEmitter.inc -gen-emitter)
tablegen(LLVM AMDGPUGenMCPseudoLowering.inc -gen-pseudo-lowering)		tablegen(LLVM AMDGPUGenMCPseudoLowering.inc -gen-pseudo-lowering)
tablegen(LLVM AMDGPUGenRegisterBank.inc -gen-register-bank)		tablegen(LLVM AMDGPUGenRegisterBank.inc -gen-register-bank)
tablegen(LLVM AMDGPUGenRegisterInfo.inc -gen-register-info)		tablegen(LLVM AMDGPUGenRegisterInfo.inc -gen-register-info)
tablegen(LLVM AMDGPUGenSearchableTables.inc -gen-searchable-tables)		tablegen(LLVM AMDGPUGenSearchableTables.inc -gen-searchable-tables)
tablegen(LLVM AMDGPUGenSubtargetInfo.inc -gen-subtarget)		tablegen(LLVM AMDGPUGenSubtargetInfo.inc -gen-subtarget)

		set(LLVM_TARGET_DEFINITIONS R600.td)
		tablegen(LLVM R600GenAsmWriter.inc -gen-asm-writer)
		tablegen(LLVM R600GenCallingConv.inc -gen-callingconv)
		tablegen(LLVM R600GenDAGISel.inc -gen-dag-isel)
		tablegen(LLVM R600GenDFAPacketizer.inc -gen-dfa-packetizer)
		tablegen(LLVM R600GenInstrInfo.inc -gen-instr-info)
		tablegen(LLVM R600GenIntrinsics.inc -gen-tgt-intrinsic)
		tablegen(LLVM R600GenMCCodeEmitter.inc -gen-emitter)
		tablegen(LLVM R600GenRegisterInfo.inc -gen-register-info)
		tablegen(LLVM R600GenSubtargetInfo.inc -gen-subtarget)

add_public_tablegen_target(AMDGPUCommonTableGen)		add_public_tablegen_target(AMDGPUCommonTableGen)

add_llvm_target(AMDGPUCodeGen		add_llvm_target(AMDGPUCodeGen
AMDGPUAliasAnalysis.cpp		AMDGPUAliasAnalysis.cpp
AMDGPUAlwaysInlinePass.cpp		AMDGPUAlwaysInlinePass.cpp
AMDGPUAnnotateKernelFeatures.cpp		AMDGPUAnnotateKernelFeatures.cpp
AMDGPUAnnotateUniformValues.cpp		AMDGPUAnnotateUniformValues.cpp
AMDGPUArgumentUsageInfo.cpp		AMDGPUArgumentUsageInfo.cpp
[... 36 lines not shown ...]	add_llvm_target(AMDGPUCodeGen
GCNRegPressure.cpp		GCNRegPressure.cpp
GCNSchedStrategy.cpp		GCNSchedStrategy.cpp
R600ClauseMergePass.cpp		R600ClauseMergePass.cpp
R600ControlFlowFinalizer.cpp		R600ControlFlowFinalizer.cpp
R600EmitClauseMarkers.cpp		R600EmitClauseMarkers.cpp
R600ExpandSpecialInstrs.cpp		R600ExpandSpecialInstrs.cpp
R600FrameLowering.cpp		R600FrameLowering.cpp
R600InstrInfo.cpp		R600InstrInfo.cpp
		R600IntrinsicInfo.cpp
R600ISelLowering.cpp		R600ISelLowering.cpp
R600MachineFunctionInfo.cpp		R600MachineFunctionInfo.cpp
R600MachineScheduler.cpp		R600MachineScheduler.cpp
R600OptimizeVectorRegisters.cpp		R600OptimizeVectorRegisters.cpp
R600Packetizer.cpp		R600Packetizer.cpp
R600RegisterInfo.cpp		R600RegisterInfo.cpp
SIAnnotateControlFlow.cpp		SIAnnotateControlFlow.cpp
SIDebuggerInsertNops.cpp		SIDebuggerInsertNops.cpp
[... 31 lines not shown ...]

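With `AMDGPU.td` and `R600.td` now run through TableGen separately, R600 code refers to `R600::` opcodes (as in the structurizer changes above) rather than `AMDGPU::` ones. A toy illustration of why the namespace matters: two independently generated tables can give the same number to unrelated instructions. The namespaces, enumerators and numbering below are simplified stand-ins, not the contents of `R600GenInstrInfo.inc` or `AMDGPUGenInstrInfo.inc`.

```cpp
#include <cassert>
#include <string>

// Simplified stand-ins for two separately generated opcode tables.
// In this toy, each table numbers its own instructions from zero.
namespace R600 {
enum Opcode : unsigned { JUMP, JUMP_COND, RETURN, NUM_OPCODES };

inline std::string getName(unsigned Opc) {
  static const char *const Names[] = {"JUMP", "JUMP_COND", "RETURN"};
  return Opc < NUM_OPCODES ? Names[Opc] : "<invalid>";
}
} // namespace R600

namespace AMDGPU {
enum Opcode : unsigned { S_BRANCH, S_CBRANCH_SCC0, S_ENDPGM, V_MOV_B32, NUM_OPCODES };

inline std::string getName(unsigned Opc) {
  static const char *const Names[] = {"S_BRANCH", "S_CBRANCH_SCC0",
                                      "S_ENDPGM", "V_MOV_B32"};
  return Opc < NUM_OPCODES ? Names[Opc] : "<invalid>";
}
} // namespace AMDGPU
```

In this toy, looking up `R600::RETURN` in the GCN table silently yields `S_ENDPGM`, so an opcode value is only meaningful together with the table it came from.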
lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp

	[... 14 lines not shown ...]
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	// ToDo: What to do with instruction suffixes (v_mov_b32 vs v_mov_b32_e32)?			// ToDo: What to do with instruction suffixes (v_mov_b32 vs v_mov_b32_e32)?

	#include "Disassembler/AMDGPUDisassembler.h"			#include "Disassembler/AMDGPUDisassembler.h"
	#include "AMDGPU.h"			#include "AMDGPU.h"
	#include "AMDGPURegisterInfo.h"			#include "AMDGPURegisterInfo.h"
				#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
	#include "SIDefines.h"			#include "SIDefines.h"
	#include "MCTargetDesc/AMDGPUMCTargetDesc.h"			#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
	#include "Utils/AMDGPUBaseInfo.h"			#include "Utils/AMDGPUBaseInfo.h"
	#include "llvm-c/Disassembler.h"			#include "llvm-c/Disassembler.h"
	#include "llvm/ADT/APInt.h"			#include "llvm/ADT/APInt.h"
	#include "llvm/ADT/ArrayRef.h"			#include "llvm/ADT/ArrayRef.h"
	#include "llvm/ADT/Twine.h"			#include "llvm/ADT/Twine.h"
	#include "llvm/BinaryFormat/ELF.h"			#include "llvm/BinaryFormat/ELF.h"
	[... 903 lines not shown ...]

lib/Target/AMDGPU/EvergreenInstructions.td

	//===-- EvergreenInstructions.td - EG Instruction defs ----- tablegen --===//			//===-- EvergreenInstructions.td - EG Instruction defs ----- tablegen --===//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// TableGen definitions for instructions which are:			// TableGen definitions for instructions which are:
	// - Available to Evergreen and newer VLIW4/VLIW5 GPUs			// - Available to Evergreen and newer VLIW4/VLIW5 GPUs
	// - Available only on Evergreen family GPUs.			// - Available only on Evergreen family GPUs.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	def isEG : Predicate<			def isEG : Predicate<
	"Subtarget->getGeneration() >= AMDGPUSubtarget::EVERGREEN && "			"Subtarget->getGeneration() >= R600Subtarget::EVERGREEN && "
	"Subtarget->getGeneration() <= AMDGPUSubtarget::NORTHERN_ISLANDS && "
	"!Subtarget->hasCaymanISA()"			"!Subtarget->hasCaymanISA()"
	>;			>;

	def isEGorCayman : Predicate<			def isEGorCayman : Predicate<
	"Subtarget->getGeneration() == AMDGPUSubtarget::EVERGREEN ||"			"Subtarget->getGeneration() == R600Subtarget::EVERGREEN ||"
	"Subtarget->getGeneration() == AMDGPUSubtarget::NORTHERN_ISLANDS"			"Subtarget->getGeneration() == R600Subtarget::NORTHERN_ISLANDS"
	>;			>;

	class EGPat<dag pattern, dag result> : AMDGPUPat<pattern, result> {			class EGPat<dag pattern, dag result> : AMDGPUPat<pattern, result> {
	let SubtargetPredicate = isEG;			let SubtargetPredicate = isEG;
	}			}

	class EGOrCaymanPat<dag pattern, dag result> : AMDGPUPat<pattern, result> {			class EGOrCaymanPat<dag pattern, dag result> : AMDGPUPat<pattern, result> {
	let SubtargetPredicate = isEGorCayman;			let SubtargetPredicate = isEGorCayman;
	[741 lines not shown]
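The `isEG` and `isEGorCayman` predicates are generation-range checks evaluated on the subtarget. A minimal C++ sketch of the same conditions follows; the `Generation` enum and the boolean Cayman flag are hypothetical stand-ins for the real subtarget accessors, and only the relative order of the enumerators matters.

```cpp
// Hypothetical stand-in for the R600-family generation enum; only the
// relative order of the enumerators is relied on below.
enum class Generation { R600, R700, Evergreen, NorthernIslands };

// Mirrors isEG: Evergreen up to Northern Islands, excluding Cayman parts.
bool isEG(Generation Gen, bool HasCaymanISA) {
  return Gen >= Generation::Evergreen &&
         Gen <= Generation::NorthernIslands && !HasCaymanISA;
}

// Mirrors isEGorCayman: exactly the Evergreen or Northern Islands generation,
// regardless of the Cayman flag.
bool isEGorCayman(Generation Gen) {
  return Gen == Generation::Evergreen || Gen == Generation::NorthernIslands;
}
```

In this sketch `NorthernIslands` is the last enumerator, so the upper bound in `isEG` can never fail. Assuming an `R600Subtarget` likewise never reports a GCN generation, that would explain why the right-hand version of `isEG` can drop the `<= NORTHERN_ISLANDS` term.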

lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.h

[210 lines not shown]	protected:
void printSwizzle(const MCInst *MI, unsigned OpNo, const MCSubtargetInfo &STI,		void printSwizzle(const MCInst *MI, unsigned OpNo, const MCSubtargetInfo &STI,
raw_ostream &O);		raw_ostream &O);
void printWaitFlag(const MCInst *MI, unsigned OpNo,		void printWaitFlag(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI, raw_ostream &O);		const MCSubtargetInfo &STI, raw_ostream &O);
void printHwreg(const MCInst *MI, unsigned OpNo, const MCSubtargetInfo &STI,		void printHwreg(const MCInst *MI, unsigned OpNo, const MCSubtargetInfo &STI,
raw_ostream &O);		raw_ostream &O);
};		};

// FIXME: R600 specific parts of AMDGPUInstrPrinter should be moved here, and		class R600InstPrinter : public MCInstPrinter {
// MCTargetDesc should be using R600InstPrinter for the R600 target.
class R600InstPrinter : public AMDGPUInstPrinter {
public:		public:
R600InstPrinter(const MCAsmInfo &MAI, const MCInstrInfo &MII,		R600InstPrinter(const MCAsmInfo &MAI, const MCInstrInfo &MII,
const MCRegisterInfo &MRI)		const MCRegisterInfo &MRI)
: AMDGPUInstPrinter(MAI, MII, MRI) {}		: MCInstPrinter(MAI, MII, MRI) {}

		void printInst(const MCInst *MI, raw_ostream &O, StringRef Annot,
		const MCSubtargetInfo &STI) override;
		void printInstruction(const MCInst *MI, raw_ostream &O);
		static const char *getRegisterName(unsigned RegNo);

void printAbs(const MCInst *MI, unsigned OpNo, raw_ostream &O);		void printAbs(const MCInst *MI, unsigned OpNo, raw_ostream &O);
void printBankSwizzle(const MCInst *MI, unsigned OpNo, raw_ostream &O);		void printBankSwizzle(const MCInst *MI, unsigned OpNo, raw_ostream &O);
void printClamp(const MCInst *MI, unsigned OpNo, raw_ostream &O);		void printClamp(const MCInst *MI, unsigned OpNo, raw_ostream &O);
void printCT(const MCInst *MI, unsigned OpNo, raw_ostream &O);		void printCT(const MCInst *MI, unsigned OpNo, raw_ostream &O);
void printKCache(const MCInst *MI, unsigned OpNo, raw_ostream &O);		void printKCache(const MCInst *MI, unsigned OpNo, raw_ostream &O);
void printLast(const MCInst *MI, unsigned OpNo, raw_ostream &O);		void printLast(const MCInst *MI, unsigned OpNo, raw_ostream &O);
void printLiteral(const MCInst *MI, unsigned OpNo, raw_ostream &O);		void printLiteral(const MCInst *MI, unsigned OpNo, raw_ostream &O);
[14 lines not shown]

lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.cpp

[499 lines not shown]	else {
// operand. This is technically allowed for the encoding of s_mov_b64.		// operand. This is technically allowed for the encoding of s_mov_b64.
O << formatHex(static_cast<uint64_t>(Imm));		O << formatHex(static_cast<uint64_t>(Imm));
}		}
}		}

void AMDGPUInstPrinter::printOperand(const MCInst *MI, unsigned OpNo,		void AMDGPUInstPrinter::printOperand(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI,		const MCSubtargetInfo &STI,
raw_ostream &O) {		raw_ostream &O) {
if (!STI.getFeatureBits()[AMDGPU::FeatureGCN]) {
static_cast<R600InstPrinter*>(this)->printOperand(MI, OpNo, O);
return;
}

if (OpNo >= MI->getNumOperands()) {		if (OpNo >= MI->getNumOperands()) {
O << "/Missing OP" << OpNo << "/";		O << "/Missing OP" << OpNo << "/";
return;		return;
}		}

const MCOperand &Op = MI->getOperand(OpNo);		const MCOperand &Op = MI->getOperand(OpNo);
if (Op.isReg()) {		if (Op.isReg()) {
printRegOperand(Op.getReg(), O, MRI);		printRegOperand(Op.getReg(), O, MRI);
[429 lines not shown]	void AMDGPUInstPrinter::printVGPRIndexMode(const MCInst *MI, unsigned OpNo,

if (Val & VGPRIndexMode::SRC2_ENABLE)		if (Val & VGPRIndexMode::SRC2_ENABLE)
O << " src2";		O << " src2";
}		}

void AMDGPUInstPrinter::printMemOperand(const MCInst *MI, unsigned OpNo,		void AMDGPUInstPrinter::printMemOperand(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI,		const MCSubtargetInfo &STI,
raw_ostream &O) {		raw_ostream &O) {
if (!STI.getFeatureBits()[AMDGPU::FeatureGCN]) {
static_cast<R600InstPrinter*>(this)->printMemOperand(MI, OpNo, O);
return;
}

printOperand(MI, OpNo, STI, O);		printOperand(MI, OpNo, STI, O);
O << ", ";		O << ", ";
printOperand(MI, OpNo + 1, STI, O);		printOperand(MI, OpNo + 1, STI, O);
}		}

void AMDGPUInstPrinter::printIfSet(const MCInst *MI, unsigned OpNo,		void AMDGPUInstPrinter::printIfSet(const MCInst *MI, unsigned OpNo,
raw_ostream &O, StringRef Asm,		raw_ostream &O, StringRef Asm,
StringRef Default) {		StringRef Default) {
[9 lines not shown]
void AMDGPUInstPrinter::printIfSet(const MCInst *MI, unsigned OpNo,		void AMDGPUInstPrinter::printIfSet(const MCInst *MI, unsigned OpNo,
raw_ostream &O, char Asm) {		raw_ostream &O, char Asm) {
const MCOperand &Op = MI->getOperand(OpNo);		const MCOperand &Op = MI->getOperand(OpNo);
assert(Op.isImm());		assert(Op.isImm());
if (Op.getImm() == 1)		if (Op.getImm() == 1)
O << Asm;		O << Asm;
}		}

void AMDGPUInstPrinter::printAbs(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI, raw_ostream &O) {
static_cast<R600InstPrinter*>(this)->printAbs(MI, OpNo, O);
}

void AMDGPUInstPrinter::printClamp(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI, raw_ostream &O) {
static_cast<R600InstPrinter*>(this)->printClamp(MI, OpNo, O);
}

void AMDGPUInstPrinter::printHigh(const MCInst *MI, unsigned OpNo,		void AMDGPUInstPrinter::printHigh(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI,		const MCSubtargetInfo &STI,
raw_ostream &O) {		raw_ostream &O) {
if (MI->getOperand(OpNo).getImm())		if (MI->getOperand(OpNo).getImm())
O << " high";		O << " high";
}		}

void AMDGPUInstPrinter::printClampSI(const MCInst *MI, unsigned OpNo,		void AMDGPUInstPrinter::printClampSI(const MCInst *MI, unsigned OpNo,
[10 lines not shown]	void AMDGPUInstPrinter::printOModSI(const MCInst *MI, unsigned OpNo,
if (Imm == SIOutMods::MUL2)		if (Imm == SIOutMods::MUL2)
O << " mul:2";		O << " mul:2";
else if (Imm == SIOutMods::MUL4)		else if (Imm == SIOutMods::MUL4)
O << " mul:4";		O << " mul:4";
else if (Imm == SIOutMods::DIV2)		else if (Imm == SIOutMods::DIV2)
O << " div:2";		O << " div:2";
}		}
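`printOModSI` maps the output-modifier immediate to an assembly suffix. A small sketch of that mapping as a pure function; the numeric values in `OutMods` are assumptions standing in for `SIOutMods` and are not copied from `SIDefines.h`.

```cpp
#include <string>

// Hypothetical encoding of the output-modifier immediate (assumed values,
// standing in for SIOutMods).
namespace OutMods {
enum : unsigned { NONE = 0, MUL2 = 1, MUL4 = 2, DIV2 = 3 };
} // namespace OutMods

// Returns the suffix printOModSI would append for Imm, or an empty string
// when no output modifier is set.
std::string omodSuffix(unsigned Imm) {
  if (Imm == OutMods::MUL2)
    return " mul:2";
  if (Imm == OutMods::MUL4)
    return " mul:4";
  if (Imm == OutMods::DIV2)
    return " div:2";
  return "";
}
```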

void AMDGPUInstPrinter::printLiteral(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI,
raw_ostream &O) {
static_cast<R600InstPrinter*>(this)->printLiteral(MI, OpNo, O);
}

void AMDGPUInstPrinter::printLast(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI, raw_ostream &O) {
static_cast<R600InstPrinter*>(this)->printLast(MI, OpNo, O);
}

void AMDGPUInstPrinter::printNeg(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI, raw_ostream &O) {
static_cast<R600InstPrinter*>(this)->printNeg(MI, OpNo, O);
}

void AMDGPUInstPrinter::printOMOD(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI, raw_ostream &O) {
static_cast<R600InstPrinter*>(this)->printOMOD(MI, OpNo, O);
}

void AMDGPUInstPrinter::printRel(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI, raw_ostream &O) {
static_cast<R600InstPrinter*>(this)->printRel(MI, OpNo, O);
}

void AMDGPUInstPrinter::printUpdateExecMask(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI,
raw_ostream &O) {
static_cast<R600InstPrinter*>(this)->printUpdateExecMask(MI, OpNo, O);
}

void AMDGPUInstPrinter::printUpdatePred(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI,
raw_ostream &O) {
static_cast<R600InstPrinter*>(this)->printUpdatePred(MI, OpNo, O);
}

void AMDGPUInstPrinter::printWrite(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI, raw_ostream &O) {
static_cast<R600InstPrinter*>(this)->printWrite(MI, OpNo, O);
}

void AMDGPUInstPrinter::printBankSwizzle(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI,
raw_ostream &O) {
static_cast<R600InstPrinter*>(this)->printBankSwizzle(MI, OpNo, O);
}

void AMDGPUInstPrinter::printRSel(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI, raw_ostream &O) {
static_cast<R600InstPrinter*>(this)->printRSel(MI, OpNo, O);
}

void AMDGPUInstPrinter::printCT(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI, raw_ostream &O) {
static_cast<R600InstPrinter*>(this)->printCT(MI, OpNo, O);
}

void AMDGPUInstPrinter::printKCache(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI, raw_ostream &O) {
static_cast<R600InstPrinter*>(this)->printKCache(MI, OpNo, O);
}

void AMDGPUInstPrinter::printSendMsg(const MCInst *MI, unsigned OpNo,		void AMDGPUInstPrinter::printSendMsg(const MCInst *MI, unsigned OpNo,
const MCSubtargetInfo &STI,		const MCSubtargetInfo &STI,
raw_ostream &O) {		raw_ostream &O) {
using namespace llvm::AMDGPU::SendMsg;		using namespace llvm::AMDGPU::SendMsg;

const unsigned SImm16 = MI->getOperand(OpNo).getImm();		const unsigned SImm16 = MI->getOperand(OpNo).getImm();
const unsigned Id = SImm16 & ID_MASK_;		const unsigned Id = SImm16 & ID_MASK_;
do {		do {
[188 lines not shown]	void AMDGPUInstPrinter::printHwreg(const MCInst *MI, unsigned OpNo,
if (Width != WIDTH_M1_DEFAULT_ + 1 || Offset != OFFSET_DEFAULT_) {		if (Width != WIDTH_M1_DEFAULT_ + 1 || Offset != OFFSET_DEFAULT_) {
O << ", " << Offset << ", " << Width;		O << ", " << Offset << ", " << Width;
}		}
O << ')';		O << ')';
}		}

#include "AMDGPUGenAsmWriter.inc"		#include "AMDGPUGenAsmWriter.inc"

		void R600InstPrinter::printInst(const MCInst *MI, raw_ostream &O,
		StringRef Annot, const MCSubtargetInfo &STI) {
		O.flush();
		printInstruction(MI, O);
		printAnnotation(O, Annot);
		}

void R600InstPrinter::printAbs(const MCInst *MI, unsigned OpNo,		void R600InstPrinter::printAbs(const MCInst *MI, unsigned OpNo,
raw_ostream &O) {		raw_ostream &O) {
AMDGPUInstPrinter::printIfSet(MI, OpNo, O, '|');		AMDGPUInstPrinter::printIfSet(MI, OpNo, O, '|');
}		}

void R600InstPrinter::printBankSwizzle(const MCInst *MI, unsigned OpNo,		void R600InstPrinter::printBankSwizzle(const MCInst *MI, unsigned OpNo,
raw_ostream &O) {		raw_ostream &O) {
int BankSwizzle = MI->getOperand(OpNo).getImm();		int BankSwizzle = MI->getOperand(OpNo).getImm();
[102 lines not shown]	if (OpNo >= MI->getNumOperands()) {
O << "/Missing OP" << OpNo << "/";		O << "/Missing OP" << OpNo << "/";
return;		return;
}		}

const MCOperand &Op = MI->getOperand(OpNo);		const MCOperand &Op = MI->getOperand(OpNo);
if (Op.isReg()) {		if (Op.isReg()) {
switch (Op.getReg()) {		switch (Op.getReg()) {
// This is the default predicate state, so we don't need to print it.		// This is the default predicate state, so we don't need to print it.
case AMDGPU::PRED_SEL_OFF:		case R600::PRED_SEL_OFF:
break;		break;

default:		default:
O << getRegisterName(Op.getReg());		O << getRegisterName(Op.getReg());
break;		break;
}		}
} else if (Op.isImm()) {		} else if (Op.isImm()) {
O << Op.getImm();		O << Op.getImm();
[59 lines not shown]

void R600InstPrinter::printWrite(const MCInst *MI, unsigned OpNo,		void R600InstPrinter::printWrite(const MCInst *MI, unsigned OpNo,
raw_ostream &O) {		raw_ostream &O) {
const MCOperand &Op = MI->getOperand(OpNo);		const MCOperand &Op = MI->getOperand(OpNo);
if (Op.getImm() == 0) {		if (Op.getImm() == 0) {
O << " (MASKED)";		O << " (MASKED)";
}		}
}		}

		#include "R600GenAsmWriter.inc"
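Several operand printers in this file share one shape: read an immediate flag and print a fixed string for one value only (`printIfSet` with a `char`, `printWrite`, `printHigh`). A self-contained sketch of that pattern, using a hypothetical `Operand` struct in place of `MCOperand` and `std::ostream` in place of `raw_ostream`:

```cpp
#include <cstdint>
#include <sstream>

// Hypothetical stand-in for an MCOperand carrying an immediate flag.
struct Operand {
  int64_t Imm;
};

// Like printIfSet(MI, OpNo, O, char): emit Asm only when the flag equals 1.
void printIfSet(const Operand &Op, std::ostream &O, char Asm) {
  if (Op.Imm == 1)
    O << Asm;
}

// Like R600InstPrinter::printWrite: a zero write flag marks the result masked.
void printWrite(const Operand &Op, std::ostream &O) {
  if (Op.Imm == 0)
    O << " (MASKED)";
}

// Like printHigh: any non-zero immediate selects the high half.
void printHigh(const Operand &Op, std::ostream &O) {
  if (Op.Imm)
    O << " high";
}
```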

lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.h

	[34 lines not shown]
	class raw_pwrite_stream;			class raw_pwrite_stream;

	Target &getTheAMDGPUTarget();			Target &getTheAMDGPUTarget();
	Target &getTheGCNTarget();			Target &getTheGCNTarget();

	MCCodeEmitter *createR600MCCodeEmitter(const MCInstrInfo &MCII,			MCCodeEmitter *createR600MCCodeEmitter(const MCInstrInfo &MCII,
	const MCRegisterInfo &MRI,			const MCRegisterInfo &MRI,
	MCContext &Ctx);			MCContext &Ctx);
				MCInstrInfo *createR600MCInstrInfo();

	MCCodeEmitter *createSIMCCodeEmitter(const MCInstrInfo &MCII,			MCCodeEmitter *createSIMCCodeEmitter(const MCInstrInfo &MCII,
	const MCRegisterInfo &MRI,			const MCRegisterInfo &MRI,
	MCContext &Ctx);			MCContext &Ctx);

	MCAsmBackend *createAMDGPUAsmBackend(const Target &T,			MCAsmBackend *createAMDGPUAsmBackend(const Target &T,
	const MCSubtargetInfo &STI,			const MCSubtargetInfo &STI,
	const MCRegisterInfo &MRI,			const MCRegisterInfo &MRI,
	const MCTargetOptions &Options);			const MCTargetOptions &Options);

	std::unique_ptr<MCObjectWriter>			std::unique_ptr<MCObjectWriter>
	createAMDGPUELFObjectWriter(bool Is64Bit, uint8_t OSABI,			createAMDGPUELFObjectWriter(bool Is64Bit, uint8_t OSABI,
	bool HasRelocationAddend, raw_pwrite_stream &OS);			bool HasRelocationAddend, raw_pwrite_stream &OS);
	} // End llvm namespace			} // End llvm namespace

	#define GET_REGINFO_ENUM			#define GET_REGINFO_ENUM
	#include "AMDGPUGenRegisterInfo.inc"			#include "AMDGPUGenRegisterInfo.inc"
	#undef GET_REGINFO_ENUM			#undef GET_REGINFO_ENUM

				#define GET_REGINFO_ENUM
				#include "R600GenRegisterInfo.inc"
				#undef GET_REGINFO_ENUM

	#define GET_INSTRINFO_ENUM			#define GET_INSTRINFO_ENUM
	#define GET_INSTRINFO_OPERAND_ENUM			#define GET_INSTRINFO_OPERAND_ENUM
	#define GET_INSTRINFO_SCHED_ENUM			#define GET_INSTRINFO_SCHED_ENUM
	#include "AMDGPUGenInstrInfo.inc"			#include "AMDGPUGenInstrInfo.inc"
	#undef GET_INSTRINFO_SCHED_ENUM			#undef GET_INSTRINFO_SCHED_ENUM
	#undef GET_INSTRINFO_OPERAND_ENUM			#undef GET_INSTRINFO_OPERAND_ENUM
	#undef GET_INSTRINFO_ENUM			#undef GET_INSTRINFO_ENUM

				#define GET_INSTRINFO_ENUM
				#define GET_INSTRINFO_OPERAND_ENUM
				#define GET_INSTRINFO_SCHED_ENUM
				#include "R600GenInstrInfo.inc"
				#undef GET_INSTRINFO_SCHED_ENUM
				#undef GET_INSTRINFO_OPERAND_ENUM
				#undef GET_INSTRINFO_ENUM

	#define GET_SUBTARGETINFO_ENUM			#define GET_SUBTARGETINFO_ENUM
	#include "AMDGPUGenSubtargetInfo.inc"			#include "AMDGPUGenSubtargetInfo.inc"
	#undef GET_SUBTARGETINFO_ENUM			#undef GET_SUBTARGETINFO_ENUM

				#define GET_SUBTARGETINFO_ENUM
				#include "R600GenSubtargetInfo.inc"
				#undef GET_SUBTARGETINFO_ENUM

	#endif			#endif
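The header now pulls in the enum sections of both generated instruction tables. As the `AMDGPU::` to `R600::` renames elsewhere in this diff suggest, each set of enumerators lives in its own namespace, so identical spellings no longer clash, but an opcode number is only meaningful together with the table it came from. A toy illustration; all names and values below are invented.

```cpp
#include <string>

// Toy versions of two generated opcode enums. Names and values are invented;
// TableGen numbers the real tables independently.
namespace AMDGPU {
enum : unsigned { PHI = 0, KILL = 1, S_MOV_B32 = 2 };

// Stand-in for looking an opcode up in the GCN instruction table.
inline std::string opcodeName(unsigned Opc) {
  switch (Opc) {
  case PHI: return "PHI";
  case KILL: return "KILL";
  case S_MOV_B32: return "S_MOV_B32";
  default: return "<unknown>";
  }
}
} // namespace AMDGPU

namespace R600 {
enum : unsigned { PHI = 0, KILL = 1, RETURN = 2 };

// Stand-in for looking an opcode up in the R600 instruction table.
inline std::string opcodeName(unsigned Opc) {
  switch (Opc) {
  case PHI: return "PHI";
  case KILL: return "KILL";
  case RETURN: return "RETURN";
  default: return "<unknown>";
  }
}
} // namespace R600
```

Decoding an R600 opcode with the GCN table yields an unrelated instruction in this toy, which is the mismatch the per-target `MCInstrInfo` registration avoids.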

lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp

[31 lines not shown]
using namespace llvm;		using namespace llvm;

#define GET_INSTRINFO_MC_DESC		#define GET_INSTRINFO_MC_DESC
#include "AMDGPUGenInstrInfo.inc"		#include "AMDGPUGenInstrInfo.inc"

#define GET_SUBTARGETINFO_MC_DESC		#define GET_SUBTARGETINFO_MC_DESC
#include "AMDGPUGenSubtargetInfo.inc"		#include "AMDGPUGenSubtargetInfo.inc"

		#define NoSchedModel NoSchedModelR600
		#define GET_SUBTARGETINFO_MC_DESC
		#include "R600GenSubtargetInfo.inc"
		#undef NoSchedModelR600
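Both generated subtarget-info files are included into this one translation unit, and the `#define NoSchedModel NoSchedModelR600` line presumably exists because each emits a file-scope object named `NoSchedModel`. The renaming trick can be reproduced without TableGen by letting a hypothetical macro, `EMIT_SCHED_MODEL_TABLE`, play the role of the generated file:

```cpp
// Hypothetical stand-in for the body of a generated *GenSubtargetInfo.inc:
// it defines a file-scope object whose name is literally NoSchedModel.
#define EMIT_SCHED_MODEL_TABLE(Value) static const int NoSchedModel = Value;

// First "include": defines NoSchedModel.
EMIT_SCHED_MODEL_TABLE(1)

// Second "include": while the alias is defined, the NoSchedModel token in the
// expansion is rewritten, so this defines NoSchedModelR600 instead.
#define NoSchedModel NoSchedModelR600
EMIT_SCHED_MODEL_TABLE(2)
#undef NoSchedModel // drop the alias so later code sees the original name
```

The alias only affects tokens seen while it is defined. The closing `#undef` has to name the macro that was defined (`NoSchedModel`); undefining `NoSchedModelR600` would be a no-op, since no macro of that name exists, and the alias would remain active for the rest of the file.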

#define GET_REGINFO_MC_DESC		#define GET_REGINFO_MC_DESC
#include "AMDGPUGenRegisterInfo.inc"		#include "AMDGPUGenRegisterInfo.inc"

		#define GET_REGINFO_MC_DESC
		#include "R600GenRegisterInfo.inc"

static MCInstrInfo *createAMDGPUMCInstrInfo() {		static MCInstrInfo *createAMDGPUMCInstrInfo() {
MCInstrInfo *X = new MCInstrInfo();		MCInstrInfo *X = new MCInstrInfo();
InitAMDGPUMCInstrInfo(X);		InitAMDGPUMCInstrInfo(X);
return X;		return X;
}		}

static MCRegisterInfo *createAMDGPUMCRegisterInfo(const Triple &TT) {		static MCRegisterInfo *createAMDGPUMCRegisterInfo(const Triple &TT) {
MCRegisterInfo *X = new MCRegisterInfo();		MCRegisterInfo *X = new MCRegisterInfo();
		if (TT.getArch() == Triple::r600)
		InitR600MCRegisterInfo(X, 0);
		else
InitAMDGPUMCRegisterInfo(X, 0);		InitAMDGPUMCRegisterInfo(X, 0);
return X;		return X;
}		}

static MCSubtargetInfo *		static MCSubtargetInfo *
createAMDGPUMCSubtargetInfo(const Triple &TT, StringRef CPU, StringRef FS) {		createAMDGPUMCSubtargetInfo(const Triple &TT, StringRef CPU, StringRef FS) {
		if (TT.getArch() == Triple::r600)
		return createR600MCSubtargetInfoImpl(TT, CPU, FS);
return createAMDGPUMCSubtargetInfoImpl(TT, CPU, FS);		return createAMDGPUMCSubtargetInfoImpl(TT, CPU, FS);
}		}

static MCInstPrinter *createAMDGPUMCInstPrinter(const Triple &T,		static MCInstPrinter *createAMDGPUMCInstPrinter(const Triple &T,
unsigned SyntaxVariant,		unsigned SyntaxVariant,
const MCAsmInfo &MAI,		const MCAsmInfo &MAI,
const MCInstrInfo &MII,		const MCInstrInfo &MII,
const MCRegisterInfo &MRI) {		const MCRegisterInfo &MRI) {
return T.getArch() == Triple::r600 ? new R600InstPrinter(MAI, MII, MRI) :		if (T.getArch() == Triple::r600)
new AMDGPUInstPrinter(MAI, MII, MRI);		return new R600InstPrinter(MAI, MII, MRI);
		else
		return new AMDGPUInstPrinter(MAI, MII, MRI);
}		}
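The old `AMDGPUInstPrinter` forwarded R600 operands with `static_cast<R600InstPrinter*>(this)`, which is only well-defined when the object really is an `R600InstPrinter`. With `R600InstPrinter` now deriving directly from `MCInstPrinter`, the choice is made once, in the factory, from the triple. A simplified sketch of that shape; `Arch`, `InstPrinter`, and the two printer structs are hypothetical stand-ins for `llvm::Triple` and the MC printer classes.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for the triple's architecture.
enum class Arch { r600, amdgcn };

// Hypothetical common base, playing the role of MCInstPrinter.
struct InstPrinter {
  virtual ~InstPrinter() = default;
  virtual std::string name() const = 0;
};

// Both printers derive directly from the base; neither needs to downcast
// to reach the other's operand printers.
struct GCNPrinter final : InstPrinter {
  std::string name() const override { return "amdgcn"; }
};
struct R600Printer final : InstPrinter {
  std::string name() const override { return "r600"; }
};

// Select the concrete printer once, from the target architecture.
std::unique_ptr<InstPrinter> createPrinter(Arch A) {
  if (A == Arch::r600)
    return std::make_unique<R600Printer>();
  return std::make_unique<GCNPrinter>();
}
```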

static MCTargetStreamer *createAMDGPUAsmTargetStreamer(MCStreamer &S,		static MCTargetStreamer *createAMDGPUAsmTargetStreamer(MCStreamer &S,
formatted_raw_ostream &OS,		formatted_raw_ostream &OS,
MCInstPrinter *InstPrint,		MCInstPrinter *InstPrint,
bool isVerboseAsm) {		bool isVerboseAsm) {
return new AMDGPUTargetAsmStreamer(S, OS);		return new AMDGPUTargetAsmStreamer(S, OS);
}		}
[9 lines not shown]	static MCStreamer *createMCStreamer(const Triple &T, MCContext &Context,
raw_pwrite_stream &OS,		raw_pwrite_stream &OS,
std::unique_ptr<MCCodeEmitter> &&Emitter,		std::unique_ptr<MCCodeEmitter> &&Emitter,
bool RelaxAll) {		bool RelaxAll) {
return createAMDGPUELFStreamer(T, Context, std::move(MAB), OS,		return createAMDGPUELFStreamer(T, Context, std::move(MAB), OS,
std::move(Emitter), RelaxAll);		std::move(Emitter), RelaxAll);
}		}

extern "C" void LLVMInitializeAMDGPUTargetMC() {		extern "C" void LLVMInitializeAMDGPUTargetMC() {

		TargetRegistry::RegisterMCInstrInfo(getTheGCNTarget(), createAMDGPUMCInstrInfo);
		TargetRegistry::RegisterMCInstrInfo(getTheAMDGPUTarget(), createR600MCInstrInfo);
for (Target *T : {&getTheAMDGPUTarget(), &getTheGCNTarget()}) {		for (Target *T : {&getTheAMDGPUTarget(), &getTheGCNTarget()}) {
RegisterMCAsmInfo<AMDGPUMCAsmInfo> X(*T);		RegisterMCAsmInfo<AMDGPUMCAsmInfo> X(*T);

TargetRegistry::RegisterMCInstrInfo(*T, createAMDGPUMCInstrInfo);
TargetRegistry::RegisterMCRegInfo(*T, createAMDGPUMCRegisterInfo);		TargetRegistry::RegisterMCRegInfo(*T, createAMDGPUMCRegisterInfo);
TargetRegistry::RegisterMCSubtargetInfo(*T, createAMDGPUMCSubtargetInfo);		TargetRegistry::RegisterMCSubtargetInfo(*T, createAMDGPUMCSubtargetInfo);
TargetRegistry::RegisterMCInstPrinter(*T, createAMDGPUMCInstPrinter);		TargetRegistry::RegisterMCInstPrinter(*T, createAMDGPUMCInstPrinter);
TargetRegistry::RegisterMCAsmBackend(*T, createAMDGPUAsmBackend);		TargetRegistry::RegisterMCAsmBackend(*T, createAMDGPUAsmBackend);
TargetRegistry::RegisterELFStreamer(*T, createMCStreamer);		TargetRegistry::RegisterELFStreamer(*T, createMCStreamer);
}		}

// R600 specific registration		// R600 specific registration
[14 lines not shown]
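`LLVMInitializeAMDGPUTargetMC` now registers a different `MCInstrInfo` factory for each target: `createAMDGPUMCInstrInfo` for `getTheGCNTarget()` and `createR600MCInstrInfo` for `getTheAMDGPUTarget()`, the R600 target. A toy model of per-target registration, with a hypothetical string-keyed map standing in for `TargetRegistry`:

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical, simplified instruction-info object: it just records which
// generated table it was built from.
struct InstrInfo {
  std::string Tables;
};
using InstrInfoFactory = std::function<InstrInfo()>;

// Hypothetical registry keyed by target name (stand-in for TargetRegistry).
std::map<std::string, InstrInfoFactory> &instrInfoRegistry() {
  static std::map<std::string, InstrInfoFactory> Registry;
  return Registry;
}

void registerInstrInfo(const std::string &Target, InstrInfoFactory F) {
  instrInfoRegistry()[Target] = std::move(F);
}

// Mirrors the split in the diff: GCN and R600 no longer share one factory.
void initializeTargets() {
  registerInstrInfo("amdgcn", [] { return InstrInfo{"AMDGPUGenInstrInfo"}; });
  registerInstrInfo("r600", [] { return InstrInfo{"R600GenInstrInfo"}; });
}
```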

lib/Target/AMDGPU/MCTargetDesc/CMakeLists.txt

	add_llvm_library(LLVMAMDGPUDesc			add_llvm_library(LLVMAMDGPUDesc
	AMDGPUAsmBackend.cpp			AMDGPUAsmBackend.cpp
	AMDGPUELFObjectWriter.cpp			AMDGPUELFObjectWriter.cpp
	AMDGPUELFStreamer.cpp			AMDGPUELFStreamer.cpp
	AMDGPUHSAMetadataStreamer.cpp			AMDGPUHSAMetadataStreamer.cpp
	AMDGPUMCAsmInfo.cpp			AMDGPUMCAsmInfo.cpp
	AMDGPUMCCodeEmitter.cpp			AMDGPUMCCodeEmitter.cpp
	AMDGPUMCTargetDesc.cpp			AMDGPUMCTargetDesc.cpp
	AMDGPUTargetStreamer.cpp			AMDGPUTargetStreamer.cpp
	R600MCCodeEmitter.cpp			R600MCCodeEmitter.cpp
				R600MCTargetDesc.cpp
	SIMCCodeEmitter.cpp			SIMCCodeEmitter.cpp
	)			)

lib/Target/AMDGPU/MCTargetDesc/R600MCCodeEmitter.cpp

[9 lines not shown]
/// \file		/// \file
///		///
/// \brief The R600 code emitter produces machine code that can be executed		/// \brief The R600 code emitter produces machine code that can be executed
/// directly on the GPU device.		/// directly on the GPU device.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "MCTargetDesc/AMDGPUFixupKinds.h"		#include "MCTargetDesc/AMDGPUFixupKinds.h"
#include "MCTargetDesc/AMDGPUMCCodeEmitter.h"
#include "MCTargetDesc/AMDGPUMCTargetDesc.h"		#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
#include "R600Defines.h"		#include "R600Defines.h"
#include "llvm/MC/MCCodeEmitter.h"		#include "llvm/MC/MCCodeEmitter.h"
#include "llvm/MC/MCContext.h"		#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCFixup.h"		#include "llvm/MC/MCFixup.h"
#include "llvm/MC/MCInst.h"		#include "llvm/MC/MCInst.h"
#include "llvm/MC/MCInstrDesc.h"		#include "llvm/MC/MCInstrDesc.h"
#include "llvm/MC/MCInstrInfo.h"		#include "llvm/MC/MCInstrInfo.h"
#include "llvm/MC/MCRegisterInfo.h"		#include "llvm/MC/MCRegisterInfo.h"
#include "llvm/MC/MCSubtargetInfo.h"		#include "llvm/MC/MCSubtargetInfo.h"
#include "llvm/Support/Endian.h"		#include "llvm/Support/Endian.h"
#include "llvm/Support/EndianStream.h"		#include "llvm/Support/EndianStream.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include <cassert>		#include <cassert>
#include <cstdint>		#include <cstdint>

using namespace llvm;		using namespace llvm;

namespace {		namespace {

class R600MCCodeEmitter : public AMDGPUMCCodeEmitter {		class R600MCCodeEmitter : public MCCodeEmitter {
const MCRegisterInfo &MRI;		const MCRegisterInfo &MRI;
		const MCInstrInfo &MCII;

public:		public:
R600MCCodeEmitter(const MCInstrInfo &mcii, const MCRegisterInfo &mri)		R600MCCodeEmitter(const MCInstrInfo &mcii, const MCRegisterInfo &mri)
: AMDGPUMCCodeEmitter(mcii), MRI(mri) {}		: MRI(mri), MCII(mcii) {}
R600MCCodeEmitter(const R600MCCodeEmitter &) = delete;		R600MCCodeEmitter(const R600MCCodeEmitter &) = delete;
R600MCCodeEmitter &operator=(const R600MCCodeEmitter &) = delete;		R600MCCodeEmitter &operator=(const R600MCCodeEmitter &) = delete;

/// \brief Encode the instruction and write it to the OS.		/// \brief Encode the instruction and write it to the OS.
void encodeInstruction(const MCInst &MI, raw_ostream &OS,		void encodeInstruction(const MCInst &MI, raw_ostream &OS,
SmallVectorImpl<MCFixup> &Fixups,		SmallVectorImpl<MCFixup> &Fixups,
const MCSubtargetInfo &STI) const override;		const MCSubtargetInfo &STI) const;

/// \returns the encoding for an MCOperand.		/// \returns the encoding for an MCOperand.
uint64_t getMachineOpValue(const MCInst &MI, const MCOperand &MO,		uint64_t getMachineOpValue(const MCInst &MI, const MCOperand &MO,
SmallVectorImpl<MCFixup> &Fixups,		SmallVectorImpl<MCFixup> &Fixups,
const MCSubtargetInfo &STI) const override;		const MCSubtargetInfo &STI) const;

private:		private:

void Emit(uint32_t value, raw_ostream &OS) const;		void Emit(uint32_t value, raw_ostream &OS) const;
void Emit(uint64_t value, raw_ostream &OS) const;		void Emit(uint64_t value, raw_ostream &OS) const;

unsigned getHWReg(unsigned regNo) const;		unsigned getHWReg(unsigned regNo) const;

		uint64_t getBinaryCodeForInstr(const MCInst &MI,
		SmallVectorImpl<MCFixup> &Fixups,
		const MCSubtargetInfo &STI) const;
		uint64_t computeAvailableFeatures(const FeatureBitset &FB) const;
		void verifyInstructionPredicates(const MCInst &MI,
		uint64_t AvailableFeatures) const;

};		};

} // end anonymous namespace		} // end anonymous namespace

enum RegElement {		enum RegElement {
ELEMENT_X = 0,		ELEMENT_X = 0,
ELEMENT_Y,		ELEMENT_Y,
ELEMENT_Z,		ELEMENT_Z,
[18 lines not shown]

void R600MCCodeEmitter::encodeInstruction(const MCInst &MI, raw_ostream &OS,		void R600MCCodeEmitter::encodeInstruction(const MCInst &MI, raw_ostream &OS,
SmallVectorImpl<MCFixup> &Fixups,		SmallVectorImpl<MCFixup> &Fixups,
const MCSubtargetInfo &STI) const {		const MCSubtargetInfo &STI) const {
verifyInstructionPredicates(MI,		verifyInstructionPredicates(MI,
computeAvailableFeatures(STI.getFeatureBits()));		computeAvailableFeatures(STI.getFeatureBits()));

const MCInstrDesc &Desc = MCII.get(MI.getOpcode());		const MCInstrDesc &Desc = MCII.get(MI.getOpcode());
if (MI.getOpcode() == AMDGPU::RETURN ||		if (MI.getOpcode() == R600::RETURN ||
MI.getOpcode() == AMDGPU::FETCH_CLAUSE ||		MI.getOpcode() == R600::FETCH_CLAUSE ||
MI.getOpcode() == AMDGPU::ALU_CLAUSE ||		MI.getOpcode() == R600::ALU_CLAUSE ||
MI.getOpcode() == AMDGPU::BUNDLE ||		MI.getOpcode() == R600::BUNDLE ||
MI.getOpcode() == AMDGPU::KILL) {		MI.getOpcode() == R600::KILL) {
return;		return;
} else if (IS_VTX(Desc)) {		} else if (IS_VTX(Desc)) {
uint64_t InstWord01 = getBinaryCodeForInstr(MI, Fixups, STI);		uint64_t InstWord01 = getBinaryCodeForInstr(MI, Fixups, STI);
uint32_t InstWord2 = MI.getOperand(2).getImm(); // Offset		uint32_t InstWord2 = MI.getOperand(2).getImm(); // Offset
if (!(STI.getFeatureBits()[AMDGPU::FeatureCaymanISA])) {		if (!(STI.getFeatureBits()[R600::FeatureCaymanISA])) {
InstWord2 |= 1 << 19; // Mega-Fetch bit		InstWord2 |= 1 << 19; // Mega-Fetch bit
}		}

Emit(InstWord01, OS);		Emit(InstWord01, OS);
Emit(InstWord2, OS);		Emit(InstWord2, OS);
Emit((uint32_t) 0, OS);		Emit((uint32_t) 0, OS);
} else if (IS_TEX(Desc)) {		} else if (IS_TEX(Desc)) {
int64_t Sampler = MI.getOperand(14).getImm();		int64_t Sampler = MI.getOperand(14).getImm();
[16 lines not shown]	Emit((uint32_t) 0, OS);
SrcSelect[ELEMENT_W] << 29 | Offsets[0] << 0 | Offsets[1] << 5 |		SrcSelect[ELEMENT_W] << 29 | Offsets[0] << 0 | Offsets[1] << 5 |
Offsets[2] << 10;		Offsets[2] << 10;

Emit(Word01, OS);		Emit(Word01, OS);
Emit(Word2, OS);		Emit(Word2, OS);
Emit((uint32_t) 0, OS);		Emit((uint32_t) 0, OS);
} else {		} else {
uint64_t Inst = getBinaryCodeForInstr(MI, Fixups, STI);		uint64_t Inst = getBinaryCodeForInstr(MI, Fixups, STI);
if ((STI.getFeatureBits()[AMDGPU::FeatureR600ALUInst]) &&		if ((STI.getFeatureBits()[R600::FeatureR600ALUInst]) &&
((Desc.TSFlags & R600_InstFlag::OP1) ||		((Desc.TSFlags & R600_InstFlag::OP1) ||
Desc.TSFlags & R600_InstFlag::OP2)) {		Desc.TSFlags & R600_InstFlag::OP2)) {
uint64_t ISAOpCode = Inst & (0x3FFULL << 39);		uint64_t ISAOpCode = Inst & (0x3FFULL << 39);
Inst &= ~(0x3FFULL << 39);		Inst &= ~(0x3FFULL << 39);
Inst |= ISAOpCode << 1;		Inst |= ISAOpCode << 1;
}		}
Emit(Inst, OS);		Emit(Inst, OS);
}		}
[33 lines not shown]	if (MO.isExpr()) {
return 0;		return 0;
}		}

assert(MO.isImm());		assert(MO.isImm());
return MO.getImm();		return MO.getImm();
}		}

#define ENABLE_INSTR_PREDICATE_VERIFIER		#define ENABLE_INSTR_PREDICATE_VERIFIER
#include "AMDGPUGenMCCodeEmitter.inc"		#include "R600GenMCCodeEmitter.inc"
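Two bit manipulations in `encodeInstruction` are easy to misread: for OP1/OP2 instructions on subtargets with `FeatureR600ALUInst`, the 10-bit ALU opcode field is moved up by one bit, and vertex fetches set the Mega-Fetch bit (bit 19) unless the subtarget has the Cayman ISA. A standalone sketch of both on plain integers; the function names are hypothetical.

```cpp
#include <cstdint>

// Move the 10-bit ALU opcode field from bits 39-48 to bits 40-49, following
// the same extract / clear / reinsert steps as the emitter.
uint64_t relocateR600ALUOpcode(uint64_t Inst) {
  const uint64_t Mask = 0x3FFULL << 39; // ten bits starting at bit 39
  uint64_t ISAOpCode = Inst & Mask;     // extract the field in place
  Inst &= ~Mask;                        // clear the old field
  Inst |= ISAOpCode << 1;               // reinsert it one bit higher
  return Inst;
}

// Third dword of a vertex fetch: the offset, plus the Mega-Fetch bit (19)
// when the subtarget does not have the Cayman ISA.
uint32_t vtxWord2(uint32_t Offset, bool HasCaymanISA) {
  uint32_t Word = Offset;
  if (!HasCaymanISA)
    Word |= 1u << 19;
  return Word;
}
```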

lib/Target/AMDGPU/MCTargetDesc/R600MCTargetDesc.cpp

This file was added.

				//===-- R600MCTargetDesc.cpp - R600 Target Descriptions -------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				/// \file
				/// \brief This file provides R600 specific target descriptions.
				//
				//===----------------------------------------------------------------------===//

				#include "AMDGPUMCTargetDesc.h"
				#include "llvm/MC/MCInstrInfo.h"

				using namespace llvm;

				#define GET_INSTRINFO_MC_DESC
				#include "R600GenInstrInfo.inc"

				MCInstrInfo *llvm::createR600MCInstrInfo() {
				MCInstrInfo *X = new MCInstrInfo();
				InitR600MCInstrInfo(X);
				return X;
				}

lib/Target/AMDGPU/MCTargetDesc/SIMCCodeEmitter.cpp

[432 lines not shown]	if (Enc != ~0U && (Enc != 255 || Desc.getSize() == 4))
return Enc;		return Enc;

} else if (MO.isImm())		} else if (MO.isImm())
return MO.getImm();		return MO.getImm();

llvm_unreachable("Encoding of this operand type is not supported yet.");		llvm_unreachable("Encoding of this operand type is not supported yet.");
return 0;		return 0;
}		}

		#define ENABLE_INSTR_PREDICATE_VERIFIER
		#include "AMDGPUGenMCCodeEmitter.inc"

lib/Target/AMDGPU/Processors.td

This file was added.

				//===-- Processors.td - GCN Processor definitions ------------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//

				// The code produced for "generic" is only useful for tests and cannot
				// reasonably be expected to execute on any particular target.
				def : ProcessorModel<"generic", NoSchedModel, []>;

				//===----------------------------------------------------------------------===//
				kzhuravl: Aren't GCN processors defined in GCNProcessors.td? I do not see it being removed or modified in this change.
				tstellar (author): This file isn't used at all; I think it was a rebase artifact. I can remove it.
				// Southern Islands
				//===----------------------------------------------------------------------===//

				def : ProcessorModel<"gfx600", SIFullSpeedModel,
				[FeatureISAVersion6_0_0]>;

				def : ProcessorModel<"tahiti", SIFullSpeedModel,
				[FeatureISAVersion6_0_0]
				>;

				def : ProcessorModel<"gfx601", SIQuarterSpeedModel,
				[FeatureISAVersion6_0_1]
				>;

				def : ProcessorModel<"pitcairn", SIQuarterSpeedModel,
				[FeatureISAVersion6_0_1]>;

				def : ProcessorModel<"verde", SIQuarterSpeedModel,
				[FeatureISAVersion6_0_1]>;

				def : ProcessorModel<"oland", SIQuarterSpeedModel,
				[FeatureISAVersion6_0_1]>;

				def : ProcessorModel<"hainan", SIQuarterSpeedModel, [FeatureISAVersion6_0_1]>;

				//===----------------------------------------------------------------------===//
				// Sea Islands
				//===----------------------------------------------------------------------===//

				def : ProcessorModel<"gfx700", SIQuarterSpeedModel,
				[FeatureISAVersion7_0_0]
				>;

				def : ProcessorModel<"bonaire", SIQuarterSpeedModel,
				[FeatureISAVersion7_0_0]
				>;

				def : ProcessorModel<"kaveri", SIQuarterSpeedModel,
				[FeatureISAVersion7_0_0]
				>;

				def : ProcessorModel<"gfx701", SIFullSpeedModel,
				[FeatureISAVersion7_0_1]
				>;

				def : ProcessorModel<"hawaii", SIFullSpeedModel,
				[FeatureISAVersion7_0_1]
				>;

				def : ProcessorModel<"gfx702", SIQuarterSpeedModel,
				[FeatureISAVersion7_0_2]
				>;

				def : ProcessorModel<"gfx703", SIQuarterSpeedModel,
				[FeatureISAVersion7_0_3]
				>;

				def : ProcessorModel<"kabini", SIQuarterSpeedModel,
				[FeatureISAVersion7_0_3]
				>;

				def : ProcessorModel<"mullins", SIQuarterSpeedModel,
				[FeatureISAVersion7_0_3]>;

				//===----------------------------------------------------------------------===//
				// Volcanic Islands
				//===----------------------------------------------------------------------===//

				def : ProcessorModel<"tonga", SIQuarterSpeedModel,
				[FeatureISAVersion8_0_2]
				>;

				def : ProcessorModel<"iceland", SIQuarterSpeedModel,
				[FeatureISAVersion8_0_0]
				>;

				def : ProcessorModel<"carrizo", SIQuarterSpeedModel,
				[FeatureISAVersion8_0_1]
				>;

				def : ProcessorModel<"fiji", SIQuarterSpeedModel,
				[FeatureISAVersion8_0_3]
				>;

				def : ProcessorModel<"stoney", SIQuarterSpeedModel,
				[FeatureISAVersion8_1_0]
				>;

				def : ProcessorModel<"polaris10", SIQuarterSpeedModel,
				[FeatureISAVersion8_0_3]
				>;

				def : ProcessorModel<"polaris11", SIQuarterSpeedModel,
				[FeatureISAVersion8_0_3]
				>;

				def : ProcessorModel<"gfx800", SIQuarterSpeedModel,
				[FeatureISAVersion8_0_0]
				>;

				def : ProcessorModel<"gfx801", SIQuarterSpeedModel,
				[FeatureISAVersion8_0_1]
				>;

				def : ProcessorModel<"gfx802", SIQuarterSpeedModel,
				[FeatureISAVersion8_0_2]
				>;

				def : ProcessorModel<"gfx803", SIQuarterSpeedModel,
				[FeatureISAVersion8_0_3]
				>;

				def : ProcessorModel<"gfx804", SIQuarterSpeedModel,
				[FeatureISAVersion8_0_4]
				>;

				def : ProcessorModel<"gfx810", SIQuarterSpeedModel,
				[FeatureISAVersion8_1_0]
				>;

				//===----------------------------------------------------------------------===//
				// GFX9
				//===----------------------------------------------------------------------===//

				def : ProcessorModel<"gfx900", SIQuarterSpeedModel,
				[FeatureISAVersion9_0_0]
				>;

				def : ProcessorModel<"gfx901", SIQuarterSpeedModel,
				[FeatureISAVersion9_0_1]
				>;

				def : ProcessorModel<"gfx902", SIQuarterSpeedModel,
				[FeatureISAVersion9_0_2]
				>;

				def : ProcessorModel<"gfx903", SIQuarterSpeedModel,
				[FeatureISAVersion9_0_3]
				>;

lib/Target/AMDGPU/R600.td

This file was added.


				include "llvm/Target/Target.td"
				arsenm: Missing header comment

				def R600InstrInfo : InstrInfo {
				let guessInstructionProperties = 1;
				let noNamedPositionallyEncodedOperands = 1;
				}

				def R600 : Target {
				let InstructionSet = R600InstrInfo;
				let AllowRegisterRenaming = 1;
				}

				let Namespace = "R600" in {

				foreach Index = 0-15 in {
				def sub#Index : SubRegIndex<32, !shl(Index, 5)>;
				}

				include "R600RegisterInfo.td"

				}
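The `foreach` above declares `sub0` through `sub15` as `SubRegIndex<32, !shl(Index, 5)>`, so each index names a 32-bit slice at bit offset `Index << 5`, i.e. `32 * Index`. A minimal C++ sketch of that arithmetic follows; `SubRegLane` and `r600SubRegLanes` are hypothetical names used only for illustration and do not exist in the backend.

```cpp
#include <array>
#include <cassert>
#include <string>

// Hypothetical model of what `def sub#Index : SubRegIndex<32, !shl(Index, 5)>`
// declares: a 32-bit sub-register slice starting at bit Index << 5.
struct SubRegLane {
  std::string Name;    // e.g. "sub3"
  unsigned SizeBits;   // always 32 for these indices
  unsigned OffsetBits; // Index << 5 == Index * 32
};

std::array<SubRegLane, 16> r600SubRegLanes() {
  std::array<SubRegLane, 16> Lanes{};
  for (unsigned Index = 0; Index < 16; ++Index) {
    // Same computation as the TableGen expression !shl(Index, 5).
    Lanes[Index] = {"sub" + std::to_string(Index), 32, Index << 5};
  }
  return Lanes;
}
```

For example, `sub3` lands at bit 96 and `sub15` at bit 480.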

				def NullALU : InstrItinClass;
				def ALU_NULL : FuncUnit;

				include "AMDGPUFeatures.td"
				include "R600Schedule.td"
				include "R600Processors.td"
				include "R600Intrinsics.td"
				include "AMDGPUInstrInfo.td"
				include "AMDGPUInstructions.td"
				include "R600Instructions.td"
				include "R700Instructions.td"
				include "EvergreenInstructions.td"
				include "CaymanInstructions.td"

				// Calling convention for R600
				def CC_R600 : CallingConv<[
				CCIfInReg<CCIfType<[v4f32, v4i32] , CCAssignToReg<[
				T0_XYZW, T1_XYZW, T2_XYZW, T3_XYZW, T4_XYZW, T5_XYZW, T6_XYZW, T7_XYZW,
				T8_XYZW, T9_XYZW, T10_XYZW, T11_XYZW, T12_XYZW, T13_XYZW, T14_XYZW, T15_XYZW,
				T16_XYZW, T17_XYZW, T18_XYZW, T19_XYZW, T20_XYZW, T21_XYZW, T22_XYZW,
				T23_XYZW, T24_XYZW, T25_XYZW, T26_XYZW, T27_XYZW, T28_XYZW, T29_XYZW,
				T30_XYZW, T31_XYZW, T32_XYZW
				]>>>
				]>;

				// Calling convention for compute kernels
				def CC_R600_Kernel : CallingConv<[
				CCCustom<"allocateKernArg">
				]>;
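`CC_R600` has a single rule: an `inreg` argument of type `v4f32` or `v4i32` goes to the next free register in the list `T0_XYZW` through `T32_XYZW` (33 registers). The sketch below models only that in-order assignment, using a hypothetical `ArgKind`/`Arg` description of the arguments. It marks anything the rule does not cover with -1 and makes no attempt to reproduce how LLVM's calling-convention machinery reports such a failure. `CC_R600_Kernel` is not modeled, since it defers entirely to the custom `allocateKernArg` hook.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified argument description.
enum class ArgKind { V4F32, V4I32, Other };

struct Arg {
  ArgKind Kind;
  bool InReg;
};

// Returns, for each argument, the N of the assigned T<N>_XYZW register,
// or -1 when the single CC_R600 rule does not apply.
std::vector<int> assignR600Args(const std::vector<Arg> &Args) {
  const int NumRegs = 33; // T0_XYZW .. T32_XYZW
  int Next = 0;           // next unallocated register in list order
  std::vector<int> Result;
  for (const Arg &A : Args) {
    bool IsVec4 = A.Kind == ArgKind::V4F32 || A.Kind == ArgKind::V4I32;
    if (A.InReg && IsVec4 && Next < NumRegs)
      Result.push_back(Next++);
    else
      Result.push_back(-1);
  }
  return Result;
}
```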

lib/Target/AMDGPU/R600ClauseMergePass.cpp

	[28 lines not shown]
	using namespace llvm;			using namespace llvm;

	#define DEBUG_TYPE "r600mergeclause"			#define DEBUG_TYPE "r600mergeclause"

	namespace {			namespace {

	static bool isCFAlu(const MachineInstr &MI) {			static bool isCFAlu(const MachineInstr &MI) {
	switch (MI.getOpcode()) {			switch (MI.getOpcode()) {
	case AMDGPU::CF_ALU:			case R600::CF_ALU:
	case AMDGPU::CF_ALU_PUSH_BEFORE:			case R600::CF_ALU_PUSH_BEFORE:
	return true;			return true;
	default:			default:
	return false;			return false;
	}			}
	}			}

	class R600ClauseMergePass : public MachineFunctionPass {			class R600ClauseMergePass : public MachineFunctionPass {

	[33 lines not shown]

	char R600ClauseMergePass::ID = 0;			char R600ClauseMergePass::ID = 0;

	char &llvm::R600ClauseMergePassID = R600ClauseMergePass::ID;			char &llvm::R600ClauseMergePassID = R600ClauseMergePass::ID;

	unsigned R600ClauseMergePass::getCFAluSize(const MachineInstr &MI) const {			unsigned R600ClauseMergePass::getCFAluSize(const MachineInstr &MI) const {
	assert(isCFAlu(MI));			assert(isCFAlu(MI));
	return MI			return MI
	.getOperand(TII->getOperandIdx(MI.getOpcode(), AMDGPU::OpName::COUNT))			.getOperand(TII->getOperandIdx(MI.getOpcode(), R600::OpName::COUNT))
	.getImm();			.getImm();
	}			}

	bool R600ClauseMergePass::isCFAluEnabled(const MachineInstr &MI) const {			bool R600ClauseMergePass::isCFAluEnabled(const MachineInstr &MI) const {
	assert(isCFAlu(MI));			assert(isCFAlu(MI));
	return MI			return MI
	.getOperand(TII->getOperandIdx(MI.getOpcode(), AMDGPU::OpName::Enabled))			.getOperand(TII->getOperandIdx(MI.getOpcode(), R600::OpName::Enabled))
	.getImm();			.getImm();
	}			}

	void R600ClauseMergePass::cleanPotentialDisabledCFAlu(			void R600ClauseMergePass::cleanPotentialDisabledCFAlu(
	MachineInstr &CFAlu) const {			MachineInstr &CFAlu) const {
	int CntIdx = TII->getOperandIdx(AMDGPU::CF_ALU, AMDGPU::OpName::COUNT);			int CntIdx = TII->getOperandIdx(R600::CF_ALU, R600::OpName::COUNT);
	MachineBasicBlock::iterator I = CFAlu, E = CFAlu.getParent()->end();			MachineBasicBlock::iterator I = CFAlu, E = CFAlu.getParent()->end();
	I++;			I++;
	do {			do {
	while (I != E && !isCFAlu(*I))			while (I != E && !isCFAlu(*I))
	I++;			I++;
	if (I == E)			if (I == E)
	return;			return;
	MachineInstr &MI = *I++;			MachineInstr &MI = *I++;
	if (isCFAluEnabled(MI))			if (isCFAluEnabled(MI))
	break;			break;
	CFAlu.getOperand(CntIdx).setImm(getCFAluSize(CFAlu) + getCFAluSize(MI));			CFAlu.getOperand(CntIdx).setImm(getCFAluSize(CFAlu) + getCFAluSize(MI));
	MI.eraseFromParent();			MI.eraseFromParent();
	} while (I != E);			} while (I != E);
	}			}

	bool R600ClauseMergePass::mergeIfPossible(MachineInstr &RootCFAlu,			bool R600ClauseMergePass::mergeIfPossible(MachineInstr &RootCFAlu,
	const MachineInstr &LatrCFAlu) const {			const MachineInstr &LatrCFAlu) const {
	assert(isCFAlu(RootCFAlu) && isCFAlu(LatrCFAlu));			assert(isCFAlu(RootCFAlu) && isCFAlu(LatrCFAlu));
	int CntIdx = TII->getOperandIdx(AMDGPU::CF_ALU, AMDGPU::OpName::COUNT);			int CntIdx = TII->getOperandIdx(R600::CF_ALU, R600::OpName::COUNT);
	unsigned RootInstCount = getCFAluSize(RootCFAlu),			unsigned RootInstCount = getCFAluSize(RootCFAlu),
	LaterInstCount = getCFAluSize(LatrCFAlu);			LaterInstCount = getCFAluSize(LatrCFAlu);
	unsigned CumuledInsts = RootInstCount + LaterInstCount;			unsigned CumuledInsts = RootInstCount + LaterInstCount;
	if (CumuledInsts >= TII->getMaxAlusPerClause()) {			if (CumuledInsts >= TII->getMaxAlusPerClause()) {
	DEBUG(dbgs() << "Excess inst counts\n");			DEBUG(dbgs() << "Excess inst counts\n");
	return false;			return false;
	}			}
	if (RootCFAlu.getOpcode() == AMDGPU::CF_ALU_PUSH_BEFORE)			if (RootCFAlu.getOpcode() == R600::CF_ALU_PUSH_BEFORE)
	return false;			return false;
	// Is KCache Bank 0 compatible ?			// Is KCache Bank 0 compatible ?
	int Mode0Idx =			int Mode0Idx =
	TII->getOperandIdx(AMDGPU::CF_ALU, AMDGPU::OpName::KCACHE_MODE0);			TII->getOperandIdx(R600::CF_ALU, R600::OpName::KCACHE_MODE0);
	int KBank0Idx =			int KBank0Idx =
	TII->getOperandIdx(AMDGPU::CF_ALU, AMDGPU::OpName::KCACHE_BANK0);			TII->getOperandIdx(R600::CF_ALU, R600::OpName::KCACHE_BANK0);
	int KBank0LineIdx =			int KBank0LineIdx =
	TII->getOperandIdx(AMDGPU::CF_ALU, AMDGPU::OpName::KCACHE_ADDR0);			TII->getOperandIdx(R600::CF_ALU, R600::OpName::KCACHE_ADDR0);
	if (LatrCFAlu.getOperand(Mode0Idx).getImm() &&			if (LatrCFAlu.getOperand(Mode0Idx).getImm() &&
	RootCFAlu.getOperand(Mode0Idx).getImm() &&			RootCFAlu.getOperand(Mode0Idx).getImm() &&
	(LatrCFAlu.getOperand(KBank0Idx).getImm() !=			(LatrCFAlu.getOperand(KBank0Idx).getImm() !=
	RootCFAlu.getOperand(KBank0Idx).getImm() ||			RootCFAlu.getOperand(KBank0Idx).getImm() ||
	LatrCFAlu.getOperand(KBank0LineIdx).getImm() !=			LatrCFAlu.getOperand(KBank0LineIdx).getImm() !=
	RootCFAlu.getOperand(KBank0LineIdx).getImm())) {			RootCFAlu.getOperand(KBank0LineIdx).getImm())) {
	DEBUG(dbgs() << "Wrong KC0\n");			DEBUG(dbgs() << "Wrong KC0\n");
	return false;			return false;
	}			}
	// Is KCache Bank 1 compatible ?			// Is KCache Bank 1 compatible ?
	int Mode1Idx =			int Mode1Idx =
	TII->getOperandIdx(AMDGPU::CF_ALU, AMDGPU::OpName::KCACHE_MODE1);			TII->getOperandIdx(R600::CF_ALU, R600::OpName::KCACHE_MODE1);
	int KBank1Idx =			int KBank1Idx =
	TII->getOperandIdx(AMDGPU::CF_ALU, AMDGPU::OpName::KCACHE_BANK1);			TII->getOperandIdx(R600::CF_ALU, R600::OpName::KCACHE_BANK1);
	int KBank1LineIdx =			int KBank1LineIdx =
	TII->getOperandIdx(AMDGPU::CF_ALU, AMDGPU::OpName::KCACHE_ADDR1);			TII->getOperandIdx(R600::CF_ALU, R600::OpName::KCACHE_ADDR1);
	if (LatrCFAlu.getOperand(Mode1Idx).getImm() &&			if (LatrCFAlu.getOperand(Mode1Idx).getImm() &&
	RootCFAlu.getOperand(Mode1Idx).getImm() &&			RootCFAlu.getOperand(Mode1Idx).getImm() &&
	(LatrCFAlu.getOperand(KBank1Idx).getImm() !=			(LatrCFAlu.getOperand(KBank1Idx).getImm() !=
	RootCFAlu.getOperand(KBank1Idx).getImm() ||			RootCFAlu.getOperand(KBank1Idx).getImm() ||
	LatrCFAlu.getOperand(KBank1LineIdx).getImm() !=			LatrCFAlu.getOperand(KBank1LineIdx).getImm() !=
	RootCFAlu.getOperand(KBank1LineIdx).getImm())) {			RootCFAlu.getOperand(KBank1LineIdx).getImm())) {
	DEBUG(dbgs() << "Wrong KC0\n");			DEBUG(dbgs() << "Wrong KC0\n");
	return false;			return false;
	[61 lines not shown]
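The part of `mergeIfPossible` visible in this hunk rejects a merge in three cases: the combined `COUNT` reaches `TII->getMaxAlusPerClause()`, the root clause is a `CF_ALU_PUSH_BEFORE`, or both clauses use the same KCache slot (bank 0 or bank 1) with a different bank or line. Here is a plain-data sketch of just those checks. `KCacheSlot`, `CFAluInfo`, and `canMergeClauses` are hypothetical, the limit is passed in rather than queried from `R600InstrInfo`, and whatever the collapsed remainder of the function does after these checks is not modeled.

```cpp
#include <cassert>

// Hypothetical plain-data view of the CF_ALU operands read above.
struct KCacheSlot {
  unsigned Mode; // KCACHE_MODEn; 0 means the slot is unused
  unsigned Bank; // KCACHE_BANKn
  unsigned Addr; // KCACHE_ADDRn (the cache line)
};

struct CFAluInfo {
  unsigned Count;  // COUNT operand
  bool PushBefore; // opcode is CF_ALU_PUSH_BEFORE
  KCacheSlot KCache[2];
};

// Two slots conflict only when both are in use and they reference a
// different bank or a different line.
static bool kcacheConflict(const KCacheSlot &Root, const KCacheSlot &Later) {
  return Root.Mode && Later.Mode &&
         (Root.Bank != Later.Bank || Root.Addr != Later.Addr);
}

// Mirrors only the rejection checks shown in this hunk.
bool canMergeClauses(const CFAluInfo &Root, const CFAluInfo &Later,
                     unsigned MaxAlusPerClause) {
  if (Root.Count + Later.Count >= MaxAlusPerClause)
    return false; // "Excess inst counts"
  if (Root.PushBefore)
    return false;
  for (int Slot = 0; Slot < 2; ++Slot)
    if (kcacheConflict(Root.KCache[Slot], Later.KCache[Slot]))
      return false; // incompatible KCache bank 0 or bank 1
  return true;
}
```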

lib/Target/AMDGPU/R600ControlFlowFinalizer.cpp

[88 lines not shown]	for (std::vector<CFStack::StackItem>::const_iterator I = BranchStack.begin(),
E = BranchStack.end(); I != E; ++I) {		E = BranchStack.end(); I != E; ++I) {
if (*I == Item)		if (*I == Item)
return true;		return true;
}		}
return false;		return false;
}		}

bool CFStack::requiresWorkAroundForInst(unsigned Opcode) {		bool CFStack::requiresWorkAroundForInst(unsigned Opcode) {
if (Opcode == AMDGPU::CF_ALU_PUSH_BEFORE && ST->hasCaymanISA() &&		if (Opcode == R600::CF_ALU_PUSH_BEFORE && ST->hasCaymanISA() &&
getLoopDepth() > 1)		getLoopDepth() > 1)
return true;		return true;

if (!ST->hasCFAluBug())		if (!ST->hasCFAluBug())
return false;		return false;

switch(Opcode) {		switch(Opcode) {
default: return false;		default: return false;
case AMDGPU::CF_ALU_PUSH_BEFORE:		case R600::CF_ALU_PUSH_BEFORE:
case AMDGPU::CF_ALU_ELSE_AFTER:		case R600::CF_ALU_ELSE_AFTER:
case AMDGPU::CF_ALU_BREAK:		case R600::CF_ALU_BREAK:
case AMDGPU::CF_ALU_CONTINUE:		case R600::CF_ALU_CONTINUE:
if (CurrentSubEntries == 0)		if (CurrentSubEntries == 0)
return false;		return false;
if (ST->getWavefrontSize() == 64) {		if (ST->getWavefrontSize() == 64) {
// We are being conservative here. We only require this work-around if		// We are being conservative here. We only require this work-around if
// CurrentSubEntries > 3 &&		// CurrentSubEntries > 3 &&
// (CurrentSubEntries % 4 == 3 || CurrentSubEntries % 4 == 0)		// (CurrentSubEntries % 4 == 3 || CurrentSubEntries % 4 == 0)
//		//
// We have to be conservative, because we don't know for certain that		// We have to be conservative, because we don't know for certain that
[45 lines not shown]	void CFStack::updateMaxStackSize() {
unsigned CurrentStackSize =		unsigned CurrentStackSize =
CurrentEntries + (alignTo(CurrentSubEntries, 4) / 4);		CurrentEntries + (alignTo(CurrentSubEntries, 4) / 4);
MaxStackSize = std::max(CurrentStackSize, MaxStackSize);		MaxStackSize = std::max(CurrentStackSize, MaxStackSize);
}		}
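`updateMaxStackSize()` counts each full entry once and packs sub-entries four to a full entry, rounding up: `CurrentEntries + alignTo(CurrentSubEntries, 4) / 4`. For example, 2 entries and 5 sub-entries give `2 + 8 / 4 = 4`. A standalone sketch of that running maximum follows, with a hypothetical `StackSizeTracker` type and a local `alignUp` helper standing in for `llvm::alignTo`.

```cpp
#include <algorithm>
#include <cassert>

// Round Value up to the next multiple of Align (same rounding as alignTo).
constexpr unsigned alignUp(unsigned Value, unsigned Align) {
  return (Value + Align - 1) / Align * Align;
}

// Hypothetical standalone version of the computation in updateMaxStackSize():
// four sub-entries share one full stack entry, rounded up.
struct StackSizeTracker {
  unsigned CurrentEntries = 0;
  unsigned CurrentSubEntries = 0;
  unsigned MaxStackSize = 0;

  void update() {
    unsigned CurrentStackSize =
        CurrentEntries + alignUp(CurrentSubEntries, 4) / 4;
    MaxStackSize = std::max(CurrentStackSize, MaxStackSize);
  }
};
```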

void CFStack::pushBranch(unsigned Opcode, bool isWQM) {		void CFStack::pushBranch(unsigned Opcode, bool isWQM) {
CFStack::StackItem Item = CFStack::ENTRY;		CFStack::StackItem Item = CFStack::ENTRY;
switch(Opcode) {		switch(Opcode) {
case AMDGPU::CF_PUSH_EG:		case R600::CF_PUSH_EG:
case AMDGPU::CF_ALU_PUSH_BEFORE:		case R600::CF_ALU_PUSH_BEFORE:
if (!isWQM) {		if (!isWQM) {
if (!ST->hasCaymanISA() &&		if (!ST->hasCaymanISA() &&
!branchStackContains(CFStack::FIRST_NON_WQM_PUSH))		!branchStackContains(CFStack::FIRST_NON_WQM_PUSH))
Item = CFStack::FIRST_NON_WQM_PUSH; // May not be required on Evergreen/NI		Item = CFStack::FIRST_NON_WQM_PUSH; // May not be required on Evergreen/NI
// See comment in		// See comment in
// CFStack::getSubEntrySize()		// CFStack::getSubEntrySize()
else if (CurrentEntries > 0 &&		else if (CurrentEntries > 0 &&
ST->getGeneration() > R600Subtarget::EVERGREEN &&		ST->getGeneration() > R600Subtarget::EVERGREEN &&
[54 lines not shown]	private:

const R600InstrInfo *TII = nullptr;		const R600InstrInfo *TII = nullptr;
const R600RegisterInfo *TRI = nullptr;		const R600RegisterInfo *TRI = nullptr;
unsigned MaxFetchInst;		unsigned MaxFetchInst;
const R600Subtarget *ST = nullptr;		const R600Subtarget *ST = nullptr;

bool IsTrivialInst(MachineInstr &MI) const {		bool IsTrivialInst(MachineInstr &MI) const {
switch (MI.getOpcode()) {		switch (MI.getOpcode()) {
case AMDGPU::KILL:		case R600::KILL:
case AMDGPU::RETURN:		case R600::RETURN:
return true;		return true;
default:		default:
return false;		return false;
}		}
}		}

const MCInstrDesc &getHWInstrDesc(ControlFlowInstruction CFI) const {		const MCInstrDesc &getHWInstrDesc(ControlFlowInstruction CFI) const {
unsigned Opcode = 0;		unsigned Opcode = 0;
bool isEg = (ST->getGeneration() >= R600Subtarget::EVERGREEN);		bool isEg = (ST->getGeneration() >= R600Subtarget::EVERGREEN);
switch (CFI) {		switch (CFI) {
case CF_TC:		case CF_TC:
Opcode = isEg ? AMDGPU::CF_TC_EG : AMDGPU::CF_TC_R600;		Opcode = isEg ? R600::CF_TC_EG : R600::CF_TC_R600;
break;		break;
case CF_VC:		case CF_VC:
Opcode = isEg ? AMDGPU::CF_VC_EG : AMDGPU::CF_VC_R600;		Opcode = isEg ? R600::CF_VC_EG : R600::CF_VC_R600;
break;		break;
case CF_CALL_FS:		case CF_CALL_FS:
Opcode = isEg ? AMDGPU::CF_CALL_FS_EG : AMDGPU::CF_CALL_FS_R600;		Opcode = isEg ? R600::CF_CALL_FS_EG : R600::CF_CALL_FS_R600;
break;		break;
case CF_WHILE_LOOP:		case CF_WHILE_LOOP:
Opcode = isEg ? AMDGPU::WHILE_LOOP_EG : AMDGPU::WHILE_LOOP_R600;		Opcode = isEg ? R600::WHILE_LOOP_EG : R600::WHILE_LOOP_R600;
break;		break;
case CF_END_LOOP:		case CF_END_LOOP:
Opcode = isEg ? AMDGPU::END_LOOP_EG : AMDGPU::END_LOOP_R600;		Opcode = isEg ? R600::END_LOOP_EG : R600::END_LOOP_R600;
break;		break;
case CF_LOOP_BREAK:		case CF_LOOP_BREAK:
Opcode = isEg ? AMDGPU::LOOP_BREAK_EG : AMDGPU::LOOP_BREAK_R600;		Opcode = isEg ? R600::LOOP_BREAK_EG : R600::LOOP_BREAK_R600;
break;		break;
case CF_LOOP_CONTINUE:		case CF_LOOP_CONTINUE:
Opcode = isEg ? AMDGPU::CF_CONTINUE_EG : AMDGPU::CF_CONTINUE_R600;		Opcode = isEg ? R600::CF_CONTINUE_EG : R600::CF_CONTINUE_R600;
break;		break;
case CF_JUMP:		case CF_JUMP:
Opcode = isEg ? AMDGPU::CF_JUMP_EG : AMDGPU::CF_JUMP_R600;		Opcode = isEg ? R600::CF_JUMP_EG : R600::CF_JUMP_R600;
break;		break;
case CF_ELSE:		case CF_ELSE:
Opcode = isEg ? AMDGPU::CF_ELSE_EG : AMDGPU::CF_ELSE_R600;		Opcode = isEg ? R600::CF_ELSE_EG : R600::CF_ELSE_R600;
break;		break;
case CF_POP:		case CF_POP:
Opcode = isEg ? AMDGPU::POP_EG : AMDGPU::POP_R600;		Opcode = isEg ? R600::POP_EG : R600::POP_R600;
break;		break;
case CF_END:		case CF_END:
if (ST->hasCaymanISA()) {		if (ST->hasCaymanISA()) {
Opcode = AMDGPU::CF_END_CM;		Opcode = R600::CF_END_CM;
break;		break;
}		}
Opcode = isEg ? AMDGPU::CF_END_EG : AMDGPU::CF_END_R600;		Opcode = isEg ? R600::CF_END_EG : R600::CF_END_R600;
break;		break;
}		}
assert (Opcode && "No opcode selected");		assert (Opcode && "No opcode selected");
return TII->get(Opcode);		return TII->get(Opcode);
}		}

bool isCompatibleWithClause(const MachineInstr &MI,		bool isCompatibleWithClause(const MachineInstr &MI,
std::set<unsigned> &DstRegs) const {		std::set<unsigned> &DstRegs) const {
unsigned DstMI, SrcMI;		unsigned DstMI, SrcMI;
for (MachineInstr::const_mop_iterator I = MI.operands_begin(),		for (MachineInstr::const_mop_iterator I = MI.operands_begin(),
E = MI.operands_end();		E = MI.operands_end();
I != E; ++I) {		I != E; ++I) {
const MachineOperand &MO = *I;		const MachineOperand &MO = *I;
if (!MO.isReg())		if (!MO.isReg())
continue;		continue;
if (MO.isDef()) {		if (MO.isDef()) {
unsigned Reg = MO.getReg();		unsigned Reg = MO.getReg();
if (AMDGPU::R600_Reg128RegClass.contains(Reg))		if (R600::R600_Reg128RegClass.contains(Reg))
DstMI = Reg;		DstMI = Reg;
else		else
DstMI = TRI->getMatchingSuperReg(Reg,		DstMI = TRI->getMatchingSuperReg(Reg,
AMDGPURegisterInfo::getSubRegFromChannel(TRI->getHWRegChan(Reg)),		AMDGPURegisterInfo::getSubRegFromChannel(TRI->getHWRegChan(Reg)),
&AMDGPU::R600_Reg128RegClass);		&R600::R600_Reg128RegClass);
}		}
if (MO.isUse()) {		if (MO.isUse()) {
unsigned Reg = MO.getReg();		unsigned Reg = MO.getReg();
if (AMDGPU::R600_Reg128RegClass.contains(Reg))		if (R600::R600_Reg128RegClass.contains(Reg))
SrcMI = Reg;		SrcMI = Reg;
else		else
SrcMI = TRI->getMatchingSuperReg(Reg,		SrcMI = TRI->getMatchingSuperReg(Reg,
AMDGPURegisterInfo::getSubRegFromChannel(TRI->getHWRegChan(Reg)),		AMDGPURegisterInfo::getSubRegFromChannel(TRI->getHWRegChan(Reg)),
&AMDGPU::R600_Reg128RegClass);		&R600::R600_Reg128RegClass);
}		}
}		}
if ((DstRegs.find(SrcMI) == DstRegs.end())) {		if ((DstRegs.find(SrcMI) == DstRegs.end())) {
DstRegs.insert(DstMI);		DstRegs.insert(DstMI);
return true;		return true;
} else		} else
return false;		return false;
}		}
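`isCompatibleWithClause()` maps each defined and used register to its 128-bit `R600_Reg128RegClass` super-register and refuses to extend the clause if the instruction reads a super-register already written earlier in the same clause. The sketch below assumes a simplified, hypothetical register numbering (`Index * 4 + Chan`, with `T<Index>_XYZW` as the super-register) and exactly one source and one destination per instruction, whereas the real code walks all operands. `FetchInst` and `canJoinFetchClause` are hypothetical names.

```cpp
#include <cassert>
#include <set>

// Hypothetical register model: channel register Index * 4 + Chan belongs to
// the 128-bit super-register T<Index>_XYZW, identified here by Index.
constexpr unsigned superReg128(unsigned ChannelReg) { return ChannelReg / 4; }

struct FetchInst {
  unsigned SrcChannelReg; // register read by the instruction
  unsigned DstChannelReg; // register written by the instruction
};

// The instruction may join the clause only if it does not read a 128-bit
// register written by an earlier instruction of the same clause.
bool canJoinFetchClause(const FetchInst &MI, std::set<unsigned> &DstRegs) {
  if (DstRegs.count(superReg128(MI.SrcChannelReg)))
    return false;
  DstRegs.insert(superReg128(MI.DstChannelReg));
  return true;
}
```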
[23 lines not shown]	MachineInstr *MIb = BuildMI(MBB, ClauseHead, MBB.findDebugLoc(ClauseHead),
getHWInstrDesc(IsTex?CF_TC:CF_VC))		getHWInstrDesc(IsTex?CF_TC:CF_VC))
.addImm(0) // ADDR		.addImm(0) // ADDR
.addImm(AluInstCount - 1); // COUNT		.addImm(AluInstCount - 1); // COUNT
return ClauseFile(MIb, std::move(ClauseContent));		return ClauseFile(MIb, std::move(ClauseContent));
}		}

void getLiteral(MachineInstr &MI, std::vector<MachineOperand *> &Lits) const {		void getLiteral(MachineInstr &MI, std::vector<MachineOperand *> &Lits) const {
static const unsigned LiteralRegs[] = {		static const unsigned LiteralRegs[] = {
AMDGPU::ALU_LITERAL_X,		R600::ALU_LITERAL_X,
AMDGPU::ALU_LITERAL_Y,		R600::ALU_LITERAL_Y,
AMDGPU::ALU_LITERAL_Z,		R600::ALU_LITERAL_Z,
AMDGPU::ALU_LITERAL_W		R600::ALU_LITERAL_W
};		};
const SmallVector<std::pair<MachineOperand *, int64_t>, 3> Srcs =		const SmallVector<std::pair<MachineOperand *, int64_t>, 3> Srcs =
TII->getSrcs(MI);		TII->getSrcs(MI);
for (const auto &Src:Srcs) {		for (const auto &Src:Srcs) {
if (Src.first->getReg() != AMDGPU::ALU_LITERAL_X)		if (Src.first->getReg() != R600::ALU_LITERAL_X)
continue;		continue;
int64_t Imm = Src.second;		int64_t Imm = Src.second;
std::vector<MachineOperand *>::iterator It =		std::vector<MachineOperand *>::iterator It =
llvm::find_if(Lits, [&](MachineOperand *val) {		llvm::find_if(Lits, [&](MachineOperand *val) {
return val->isImm() && (val->getImm() == Imm);		return val->isImm() && (val->getImm() == Imm);
});		});

// Get corresponding Operand		// Get corresponding Operand
MachineOperand &Operand = MI.getOperand(		MachineOperand &Operand = MI.getOperand(
TII->getOperandIdx(MI.getOpcode(), AMDGPU::OpName::literal));		TII->getOperandIdx(MI.getOpcode(), R600::OpName::literal));

if (It != Lits.end()) {		if (It != Lits.end()) {
// Reuse existing literal reg		// Reuse existing literal reg
unsigned Index = It - Lits.begin();		unsigned Index = It - Lits.begin();
Src.first->setReg(LiteralRegs[Index]);		Src.first->setReg(LiteralRegs[Index]);
} else {		} else {
// Allocate new literal reg		// Allocate new literal reg
assert(Lits.size() < 4 && "Too many literals in Instruction Group");		assert(Lits.size() < 4 && "Too many literals in Instruction Group");
Src.first->setReg(LiteralRegs[Lits.size()]);		Src.first->setReg(LiteralRegs[Lits.size()]);
Lits.push_back(&Operand);		Lits.push_back(&Operand);
}		}
}		}
}		}
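`getLiteral()` hands out at most four literal slots (`ALU_LITERAL_X` to `ALU_LITERAL_W`) per instruction group: a source whose immediate is already present in `Lits` reuses that slot, and any other immediate takes the next free one. The sketch below keeps only integer immediates, while the real code stores `MachineOperand` pointers, which can also be global addresses. `LiteralSlot` and `assignLiteralSlot` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Slots corresponding to ALU_LITERAL_X .. ALU_LITERAL_W.
enum LiteralSlot : unsigned { LitX = 0, LitY, LitZ, LitW };

// Identical immediates share a slot; new ones take the next free slot,
// and an instruction group can hold at most four.
LiteralSlot assignLiteralSlot(int64_t Imm, std::vector<int64_t> &Lits) {
  for (std::size_t I = 0; I < Lits.size(); ++I)
    if (Lits[I] == Imm)
      return static_cast<LiteralSlot>(I); // reuse existing literal slot
  assert(Lits.size() < 4 && "Too many literals in Instruction Group");
  Lits.push_back(Imm);
  return static_cast<LiteralSlot>(Lits.size() - 1);
}
```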

MachineBasicBlock::iterator insertLiterals(		MachineBasicBlock::iterator insertLiterals(
MachineBasicBlock::iterator InsertPos,		MachineBasicBlock::iterator InsertPos,
const std::vector<unsigned> &Literals) const {		const std::vector<unsigned> &Literals) const {
MachineBasicBlock *MBB = InsertPos->getParent();		MachineBasicBlock *MBB = InsertPos->getParent();
for (unsigned i = 0, e = Literals.size(); i < e; i+=2) {		for (unsigned i = 0, e = Literals.size(); i < e; i+=2) {
unsigned LiteralPair0 = Literals[i];		unsigned LiteralPair0 = Literals[i];
unsigned LiteralPair1 = (i + 1 < e)?Literals[i + 1]:0;		unsigned LiteralPair1 = (i + 1 < e)?Literals[i + 1]:0;
InsertPos = BuildMI(MBB, InsertPos->getDebugLoc(),		InsertPos = BuildMI(MBB, InsertPos->getDebugLoc(),
TII->get(AMDGPU::LITERALS))		TII->get(R600::LITERALS))
.addImm(LiteralPair0)		.addImm(LiteralPair0)
.addImm(LiteralPair1);		.addImm(LiteralPair1);
}		}
return InsertPos;		return InsertPos;
}		}
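`insertLiterals()` emits one `LITERALS` instruction per pair of values and fills the second operand with 0 when the count is odd. A minimal sketch of that pairing, where the hypothetical `packLiteralPairs` returns the operand pairs instead of building instructions:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Each LITERALS instruction carries two values; an odd trailing value is
// paired with 0, as in insertLiterals().
std::vector<std::pair<unsigned, unsigned>>
packLiteralPairs(const std::vector<unsigned> &Literals) {
  std::vector<std::pair<unsigned, unsigned>> Pairs;
  for (std::size_t I = 0, E = Literals.size(); I < E; I += 2) {
    unsigned First = Literals[I];
    unsigned Second = (I + 1 < E) ? Literals[I + 1] : 0;
    Pairs.emplace_back(First, Second);
  }
  return Pairs;
}
```

With three literals `{1, 2, 3}`, this yields the pairs `(1, 2)` and `(3, 0)`.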

ClauseFile		ClauseFile
MakeALUClause(MachineBasicBlock &MBB, MachineBasicBlock::iterator &I)		MakeALUClause(MachineBasicBlock &MBB, MachineBasicBlock::iterator &I)
[25 lines not shown]	for (MachineBasicBlock::instr_iterator E = MBB.instr_end(); I != E;) {
DeleteMI.eraseFromParent();		DeleteMI.eraseFromParent();
} else {		} else {
getLiteral(*I, Literals);		getLiteral(*I, Literals);
ClauseContent.push_back(&*I);		ClauseContent.push_back(&*I);
I++;		I++;
}		}
for (unsigned i = 0, e = Literals.size(); i < e; i += 2) {		for (unsigned i = 0, e = Literals.size(); i < e; i += 2) {
MachineInstrBuilder MILit = BuildMI(MBB, I, I->getDebugLoc(),		MachineInstrBuilder MILit = BuildMI(MBB, I, I->getDebugLoc(),
TII->get(AMDGPU::LITERALS));		TII->get(R600::LITERALS));
if (Literals[i]->isImm()) {		if (Literals[i]->isImm()) {
MILit.addImm(Literals[i]->getImm());		MILit.addImm(Literals[i]->getImm());
} else {		} else {
MILit.addGlobalAddress(Literals[i]->getGlobal(),		MILit.addGlobalAddress(Literals[i]->getGlobal(),
Literals[i]->getOffset());		Literals[i]->getOffset());
}		}
if (i + 1 < e) {		if (i + 1 < e) {
if (Literals[i + 1]->isImm()) {		if (Literals[i + 1]->isImm()) {
[12 lines not shown]	MakeALUClause(MachineBasicBlock &MBB, MachineBasicBlock::iterator &I)
return ClauseFile(&ClauseHead, std::move(ClauseContent));		return ClauseFile(&ClauseHead, std::move(ClauseContent));
}		}

void EmitFetchClause(MachineBasicBlock::iterator InsertPos,		void EmitFetchClause(MachineBasicBlock::iterator InsertPos,
const DebugLoc &DL, ClauseFile &Clause,		const DebugLoc &DL, ClauseFile &Clause,
unsigned &CfCount) {		unsigned &CfCount) {
CounterPropagateAddr(*Clause.first, CfCount);		CounterPropagateAddr(*Clause.first, CfCount);
MachineBasicBlock *BB = Clause.first->getParent();		MachineBasicBlock *BB = Clause.first->getParent();
BuildMI(BB, DL, TII->get(AMDGPU::FETCH_CLAUSE)).addImm(CfCount);		BuildMI(BB, DL, TII->get(R600::FETCH_CLAUSE)).addImm(CfCount);
for (unsigned i = 0, e = Clause.second.size(); i < e; ++i) {		for (unsigned i = 0, e = Clause.second.size(); i < e; ++i) {
BB->splice(InsertPos, BB, Clause.second[i]);		BB->splice(InsertPos, BB, Clause.second[i]);
}		}
CfCount += 2 * Clause.second.size();		CfCount += 2 * Clause.second.size();
}		}

void EmitALUClause(MachineBasicBlock::iterator InsertPos, const DebugLoc &DL,		void EmitALUClause(MachineBasicBlock::iterator InsertPos, const DebugLoc &DL,
ClauseFile &Clause, unsigned &CfCount) {		ClauseFile &Clause, unsigned &CfCount) {
Clause.first->getOperand(0).setImm(0);		Clause.first->getOperand(0).setImm(0);
CounterPropagateAddr(*Clause.first, CfCount);		CounterPropagateAddr(*Clause.first, CfCount);
MachineBasicBlock *BB = Clause.first->getParent();		MachineBasicBlock *BB = Clause.first->getParent();
BuildMI(BB, DL, TII->get(AMDGPU::ALU_CLAUSE)).addImm(CfCount);		BuildMI(BB, DL, TII->get(R600::ALU_CLAUSE)).addImm(CfCount);
for (unsigned i = 0, e = Clause.second.size(); i < e; ++i) {		for (unsigned i = 0, e = Clause.second.size(); i < e; ++i) {
BB->splice(InsertPos, BB, Clause.second[i]);		BB->splice(InsertPos, BB, Clause.second[i]);
}		}
CfCount += Clause.second.size();		CfCount += Clause.second.size();
}		}
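`EmitFetchClause()` and `EmitALUClause()` each record the current `CfCount` as the clause address and then advance it: by two per fetch instruction and by one per ALU instruction. In the `RETURN` case further down, the counter is first bumped past `CF_END` and padded to an even value with `PAD`, then all fetch clauses are emitted, followed by all ALU clauses. The sketch below reproduces only that arithmetic. `ClauseLayout` and `layoutClauses` are hypothetical, and the `CfCount` argument is taken to be the value right after `CF_END`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical summary of the clause bookkeeping done at RETURN.
struct ClauseLayout {
  std::vector<unsigned> FetchStart; // address recorded for each fetch clause
  std::vector<unsigned> AluStart;   // address recorded for each ALU clause
  unsigned End;                     // final counter value
};

ClauseLayout layoutClauses(unsigned CfCount,
                           const std::vector<unsigned> &FetchSizes,
                           const std::vector<unsigned> &AluSizes) {
  ClauseLayout L;
  if (CfCount % 2)
    ++CfCount; // corresponds to the PAD instruction
  for (unsigned N : FetchSizes) {
    L.FetchStart.push_back(CfCount);
    CfCount += 2 * N; // two units per fetch instruction
  }
  for (unsigned N : AluSizes) {
    L.AluStart.push_back(CfCount);
    CfCount += N; // one unit per ALU instruction
  }
  L.End = CfCount;
  return L;
}
```

For instance, starting at 5 with fetch clauses of 3 and 1 instructions and ALU clauses of 4 and 2, the pad brings the counter to 6, the clauses start at 6, 12, 14 and 18, and the counter ends at 20.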

void CounterPropagateAddr(MachineInstr &MI, unsigned Addr) const {		void CounterPropagateAddr(MachineInstr &MI, unsigned Addr) const {
MI.getOperand(0).setImm(Addr + MI.getOperand(0).getImm());		MI.getOperand(0).setImm(Addr + MI.getOperand(0).getImm());
[40 lines not shown]	for (MachineFunction::iterator MB = MF.begin(), ME = MF.end(); MB != ME;
DEBUG(dbgs() << CfCount << ":"; I->dump(););		DEBUG(dbgs() << CfCount << ":"; I->dump(););
FetchClauses.push_back(MakeFetchClause(MBB, I));		FetchClauses.push_back(MakeFetchClause(MBB, I));
CfCount++;		CfCount++;
LastAlu.back() = nullptr;		LastAlu.back() = nullptr;
continue;		continue;
}		}

MachineBasicBlock::iterator MI = I;		MachineBasicBlock::iterator MI = I;
if (MI->getOpcode() != AMDGPU::ENDIF)		if (MI->getOpcode() != R600::ENDIF)
LastAlu.back() = nullptr;		LastAlu.back() = nullptr;
if (MI->getOpcode() == AMDGPU::CF_ALU)		if (MI->getOpcode() == R600::CF_ALU)
LastAlu.back() = &*MI;		LastAlu.back() = &*MI;
I++;		I++;
bool RequiresWorkAround =		bool RequiresWorkAround =
CFStack.requiresWorkAroundForInst(MI->getOpcode());		CFStack.requiresWorkAroundForInst(MI->getOpcode());
switch (MI->getOpcode()) {		switch (MI->getOpcode()) {
case AMDGPU::CF_ALU_PUSH_BEFORE:		case R600::CF_ALU_PUSH_BEFORE:
if (RequiresWorkAround) {		if (RequiresWorkAround) {
DEBUG(dbgs() << "Applying bug work-around for ALU_PUSH_BEFORE\n");		DEBUG(dbgs() << "Applying bug work-around for ALU_PUSH_BEFORE\n");
BuildMI(MBB, MI, MBB.findDebugLoc(MI), TII->get(AMDGPU::CF_PUSH_EG))		BuildMI(MBB, MI, MBB.findDebugLoc(MI), TII->get(R600::CF_PUSH_EG))
.addImm(CfCount + 1)		.addImm(CfCount + 1)
.addImm(1);		.addImm(1);
MI->setDesc(TII->get(AMDGPU::CF_ALU));		MI->setDesc(TII->get(R600::CF_ALU));
CfCount++;		CfCount++;
CFStack.pushBranch(AMDGPU::CF_PUSH_EG);		CFStack.pushBranch(R600::CF_PUSH_EG);
} else		} else
CFStack.pushBranch(AMDGPU::CF_ALU_PUSH_BEFORE);		CFStack.pushBranch(R600::CF_ALU_PUSH_BEFORE);
LLVM_FALLTHROUGH;		LLVM_FALLTHROUGH;
case AMDGPU::CF_ALU:		case R600::CF_ALU:
I = MI;		I = MI;
AluClauses.push_back(MakeALUClause(MBB, I));		AluClauses.push_back(MakeALUClause(MBB, I));
DEBUG(dbgs() << CfCount << ":"; MI->dump(););		DEBUG(dbgs() << CfCount << ":"; MI->dump(););
CfCount++;		CfCount++;
break;		break;
case AMDGPU::WHILELOOP: {		case R600::WHILELOOP: {
CFStack.pushLoop();		CFStack.pushLoop();
MachineInstr *MIb = BuildMI(MBB, MI, MBB.findDebugLoc(MI),		MachineInstr *MIb = BuildMI(MBB, MI, MBB.findDebugLoc(MI),
getHWInstrDesc(CF_WHILE_LOOP))		getHWInstrDesc(CF_WHILE_LOOP))
.addImm(1);		.addImm(1);
std::pair<unsigned, std::set<MachineInstr *>> Pair(CfCount,		std::pair<unsigned, std::set<MachineInstr *>> Pair(CfCount,
std::set<MachineInstr *>());		std::set<MachineInstr *>());
Pair.second.insert(MIb);		Pair.second.insert(MIb);
LoopStack.push_back(std::move(Pair));		LoopStack.push_back(std::move(Pair));
MI->eraseFromParent();		MI->eraseFromParent();
CfCount++;		CfCount++;
break;		break;
}		}
case AMDGPU::ENDLOOP: {		case R600::ENDLOOP: {
CFStack.popLoop();		CFStack.popLoop();
std::pair<unsigned, std::set<MachineInstr *>> Pair =		std::pair<unsigned, std::set<MachineInstr *>> Pair =
std::move(LoopStack.back());		std::move(LoopStack.back());
LoopStack.pop_back();		LoopStack.pop_back();
CounterPropagateAddr(Pair.second, CfCount);		CounterPropagateAddr(Pair.second, CfCount);
BuildMI(MBB, MI, MBB.findDebugLoc(MI), getHWInstrDesc(CF_END_LOOP))		BuildMI(MBB, MI, MBB.findDebugLoc(MI), getHWInstrDesc(CF_END_LOOP))
.addImm(Pair.first + 1);		.addImm(Pair.first + 1);
MI->eraseFromParent();		MI->eraseFromParent();
CfCount++;		CfCount++;
break;		break;
}		}
case AMDGPU::IF_PREDICATE_SET: {		case R600::IF_PREDICATE_SET: {
LastAlu.push_back(nullptr);		LastAlu.push_back(nullptr);
MachineInstr *MIb = BuildMI(MBB, MI, MBB.findDebugLoc(MI),		MachineInstr *MIb = BuildMI(MBB, MI, MBB.findDebugLoc(MI),
getHWInstrDesc(CF_JUMP))		getHWInstrDesc(CF_JUMP))
.addImm(0)		.addImm(0)
.addImm(0);		.addImm(0);
IfThenElseStack.push_back(MIb);		IfThenElseStack.push_back(MIb);
DEBUG(dbgs() << CfCount << ":"; MIb->dump(););		DEBUG(dbgs() << CfCount << ":"; MIb->dump(););
MI->eraseFromParent();		MI->eraseFromParent();
CfCount++;		CfCount++;
break;		break;
}		}
case AMDGPU::ELSE: {		case R600::ELSE: {
MachineInstr * JumpInst = IfThenElseStack.back();		MachineInstr * JumpInst = IfThenElseStack.back();
IfThenElseStack.pop_back();		IfThenElseStack.pop_back();
CounterPropagateAddr(*JumpInst, CfCount);		CounterPropagateAddr(*JumpInst, CfCount);
MachineInstr *MIb = BuildMI(MBB, MI, MBB.findDebugLoc(MI),		MachineInstr *MIb = BuildMI(MBB, MI, MBB.findDebugLoc(MI),
getHWInstrDesc(CF_ELSE))		getHWInstrDesc(CF_ELSE))
.addImm(0)		.addImm(0)
.addImm(0);		.addImm(0);
DEBUG(dbgs() << CfCount << ":"; MIb->dump(););		DEBUG(dbgs() << CfCount << ":"; MIb->dump(););
IfThenElseStack.push_back(MIb);		IfThenElseStack.push_back(MIb);
MI->eraseFromParent();		MI->eraseFromParent();
CfCount++;		CfCount++;
break;		break;
}		}
case AMDGPU::ENDIF: {		case R600::ENDIF: {
CFStack.popBranch();		CFStack.popBranch();
if (LastAlu.back()) {		if (LastAlu.back()) {
ToPopAfter.push_back(LastAlu.back());		ToPopAfter.push_back(LastAlu.back());
} else {		} else {
MachineInstr *MIb = BuildMI(MBB, MI, MBB.findDebugLoc(MI),		MachineInstr *MIb = BuildMI(MBB, MI, MBB.findDebugLoc(MI),
getHWInstrDesc(CF_POP))		getHWInstrDesc(CF_POP))
.addImm(CfCount + 1)		.addImm(CfCount + 1)
.addImm(1);		.addImm(1);
(void)MIb;		(void)MIb;
DEBUG(dbgs() << CfCount << ":"; MIb->dump(););		DEBUG(dbgs() << CfCount << ":"; MIb->dump(););
CfCount++;		CfCount++;
}		}

MachineInstr *IfOrElseInst = IfThenElseStack.back();		MachineInstr *IfOrElseInst = IfThenElseStack.back();
IfThenElseStack.pop_back();		IfThenElseStack.pop_back();
CounterPropagateAddr(*IfOrElseInst, CfCount);		CounterPropagateAddr(*IfOrElseInst, CfCount);
IfOrElseInst->getOperand(1).setImm(1);		IfOrElseInst->getOperand(1).setImm(1);
LastAlu.pop_back();		LastAlu.pop_back();
MI->eraseFromParent();		MI->eraseFromParent();
break;		break;
}		}
case AMDGPU::BREAK: {		case R600::BREAK: {
CfCount ++;		CfCount ++;
MachineInstr *MIb = BuildMI(MBB, MI, MBB.findDebugLoc(MI),		MachineInstr *MIb = BuildMI(MBB, MI, MBB.findDebugLoc(MI),
getHWInstrDesc(CF_LOOP_BREAK))		getHWInstrDesc(CF_LOOP_BREAK))
.addImm(0);		.addImm(0);
LoopStack.back().second.insert(MIb);		LoopStack.back().second.insert(MIb);
MI->eraseFromParent();		MI->eraseFromParent();
break;		break;
}		}
case AMDGPU::CONTINUE: {		case R600::CONTINUE: {
MachineInstr *MIb = BuildMI(MBB, MI, MBB.findDebugLoc(MI),		MachineInstr *MIb = BuildMI(MBB, MI, MBB.findDebugLoc(MI),
getHWInstrDesc(CF_LOOP_CONTINUE))		getHWInstrDesc(CF_LOOP_CONTINUE))
.addImm(0);		.addImm(0);
LoopStack.back().second.insert(MIb);		LoopStack.back().second.insert(MIb);
MI->eraseFromParent();		MI->eraseFromParent();
CfCount++;		CfCount++;
break;		break;
}		}
case AMDGPU::RETURN: {		case R600::RETURN: {
DebugLoc DL = MBB.findDebugLoc(MI);		DebugLoc DL = MBB.findDebugLoc(MI);
BuildMI(MBB, MI, DL, getHWInstrDesc(CF_END));		BuildMI(MBB, MI, DL, getHWInstrDesc(CF_END));
CfCount++;		CfCount++;
if (CfCount % 2) {		if (CfCount % 2) {
BuildMI(MBB, I, DL, TII->get(AMDGPU::PAD));		BuildMI(MBB, I, DL, TII->get(R600::PAD));
CfCount++;		CfCount++;
}		}
MI->eraseFromParent();		MI->eraseFromParent();
for (unsigned i = 0, e = FetchClauses.size(); i < e; i++)		for (unsigned i = 0, e = FetchClauses.size(); i < e; i++)
EmitFetchClause(I, DL, FetchClauses[i], CfCount);		EmitFetchClause(I, DL, FetchClauses[i], CfCount);
for (unsigned i = 0, e = AluClauses.size(); i < e; i++)		for (unsigned i = 0, e = AluClauses.size(); i < e; i++)
EmitALUClause(I, DL, AluClauses[i], CfCount);		EmitALUClause(I, DL, AluClauses[i], CfCount);
break;		break;
}		}
default:		default:
if (TII->isExport(MI->getOpcode())) {		if (TII->isExport(MI->getOpcode())) {
DEBUG(dbgs() << CfCount << ":"; MI->dump(););		DEBUG(dbgs() << CfCount << ":"; MI->dump(););
CfCount++;		CfCount++;
}		}
break;		break;
}		}
}		}
for (unsigned i = 0, e = ToPopAfter.size(); i < e; ++i) {		for (unsigned i = 0, e = ToPopAfter.size(); i < e; ++i) {
MachineInstr *Alu = ToPopAfter[i];		MachineInstr *Alu = ToPopAfter[i];
BuildMI(MBB, Alu, MBB.findDebugLoc((MachineBasicBlock::iterator)Alu),		BuildMI(MBB, Alu, MBB.findDebugLoc((MachineBasicBlock::iterator)Alu),
TII->get(AMDGPU::CF_ALU_POP_AFTER))		TII->get(R600::CF_ALU_POP_AFTER))
.addImm(Alu->getOperand(0).getImm())		.addImm(Alu->getOperand(0).getImm())
.addImm(Alu->getOperand(1).getImm())		.addImm(Alu->getOperand(1).getImm())
.addImm(Alu->getOperand(2).getImm())		.addImm(Alu->getOperand(2).getImm())
.addImm(Alu->getOperand(3).getImm())		.addImm(Alu->getOperand(3).getImm())
.addImm(Alu->getOperand(4).getImm())		.addImm(Alu->getOperand(4).getImm())
.addImm(Alu->getOperand(5).getImm())		.addImm(Alu->getOperand(5).getImm())
.addImm(Alu->getOperand(6).getImm())		.addImm(Alu->getOperand(6).getImm())
.addImm(Alu->getOperand(7).getImm())		.addImm(Alu->getOperand(7).getImm())
[28 unchanged lines not shown]
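
In the `RETURN` case above, the finalizer emits `CF_END`, increments `CfCount`, and appends a `PAD` whenever the count is odd. The control-flow stream therefore always ends on an even instruction count before the fetch and ALU clauses are emitted. The following is a minimal standalone sketch of only that bookkeeping. The function name `emitEndOfProgram` and the use of strings in place of `MachineInstr`s are illustrative assumptions, not LLVM API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the RETURN lowering: Stream stands in for the
// MachineBasicBlock and each string for one emitted CF instruction.
unsigned emitEndOfProgram(std::vector<std::string> &Stream, unsigned CfCount) {
  Stream.push_back("CF_END");
  ++CfCount;
  // Keep the CF instruction count even, as the pass does with R600::PAD.
  if (CfCount % 2) {
    Stream.push_back("PAD");
    ++CfCount;
  }
  return CfCount;
}
```

Starting from an even count, `CF_END` makes the count odd, so a `PAD` follows. Starting from an odd count, `CF_END` alone restores an even total.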

lib/Target/AMDGPU/R600EmitClauseMarkers.cpp

[46 unchanged lines not shown]

class R600EmitClauseMarkers : public MachineFunctionPass {		class R600EmitClauseMarkers : public MachineFunctionPass {
private:		private:
const R600InstrInfo *TII = nullptr;		const R600InstrInfo *TII = nullptr;
int Address = 0;		int Address = 0;

unsigned OccupiedDwords(MachineInstr &MI) const {		unsigned OccupiedDwords(MachineInstr &MI) const {
switch (MI.getOpcode()) {		switch (MI.getOpcode()) {
case AMDGPU::INTERP_PAIR_XY:		case R600::INTERP_PAIR_XY:
case AMDGPU::INTERP_PAIR_ZW:		case R600::INTERP_PAIR_ZW:
case AMDGPU::INTERP_VEC_LOAD:		case R600::INTERP_VEC_LOAD:
case AMDGPU::DOT_4:		case R600::DOT_4:
return 4;		return 4;
case AMDGPU::KILL:		case R600::KILL:
return 0;		return 0;
default:		default:
break;		break;
}		}

// These will be expanded to two ALU instructions in the		// These will be expanded to two ALU instructions in the
// ExpandSpecialInstructions pass.		// ExpandSpecialInstructions pass.
if (TII->isLDSRetInstr(MI.getOpcode()))		if (TII->isLDSRetInstr(MI.getOpcode()))
return 2;		return 2;

if (TII->isVector(MI) || TII->isCubeOp(MI.getOpcode()) ||		if (TII->isVector(MI) || TII->isCubeOp(MI.getOpcode()) ||
TII->isReductionOp(MI.getOpcode()))		TII->isReductionOp(MI.getOpcode()))
return 4;		return 4;

unsigned NumLiteral = 0;		unsigned NumLiteral = 0;
for (MachineInstr::mop_iterator It = MI.operands_begin(),		for (MachineInstr::mop_iterator It = MI.operands_begin(),
E = MI.operands_end();		E = MI.operands_end();
It != E; ++It) {		It != E; ++It) {
MachineOperand &MO = *It;		MachineOperand &MO = *It;
if (MO.isReg() && MO.getReg() == AMDGPU::ALU_LITERAL_X)		if (MO.isReg() && MO.getReg() == R600::ALU_LITERAL_X)
++NumLiteral;		++NumLiteral;
}		}
return 1 + NumLiteral;		return 1 + NumLiteral;
}		}

bool isALU(const MachineInstr &MI) const {		bool isALU(const MachineInstr &MI) const {
if (TII->isALUInstr(MI.getOpcode()))		if (TII->isALUInstr(MI.getOpcode()))
return true;		return true;
if (TII->isVector(MI) || TII->isCubeOp(MI.getOpcode()))		if (TII->isVector(MI) || TII->isCubeOp(MI.getOpcode()))
return true;		return true;
switch (MI.getOpcode()) {		switch (MI.getOpcode()) {
case AMDGPU::PRED_X:		case R600::PRED_X:
case AMDGPU::INTERP_PAIR_XY:		case R600::INTERP_PAIR_XY:
case AMDGPU::INTERP_PAIR_ZW:		case R600::INTERP_PAIR_ZW:
case AMDGPU::INTERP_VEC_LOAD:		case R600::INTERP_VEC_LOAD:
case AMDGPU::COPY:		case R600::COPY:
case AMDGPU::DOT_4:		case R600::DOT_4:
return true;		return true;
default:		default:
return false;		return false;
}		}
}		}

bool IsTrivialInst(MachineInstr &MI) const {		bool IsTrivialInst(MachineInstr &MI) const {
switch (MI.getOpcode()) {		switch (MI.getOpcode()) {
case AMDGPU::KILL:		case R600::KILL:
case AMDGPU::RETURN:		case R600::RETURN:
case AMDGPU::IMPLICIT_DEF:		case R600::IMPLICIT_DEF:
return true;		return true;
default:		default:
return false;		return false;
}		}
}		}

std::pair<unsigned, unsigned> getAccessedBankLine(unsigned Sel) const {		std::pair<unsigned, unsigned> getAccessedBankLine(unsigned Sel) const {
// Sel is (512 + (kc_bank << 12) + ConstIndex) << 2		// Sel is (512 + (kc_bank << 12) + ConstIndex) << 2
[10 unchanged lines not shown]
}		}

bool		bool
SubstituteKCacheBank(MachineInstr &MI,		SubstituteKCacheBank(MachineInstr &MI,
std::vector<std::pair<unsigned, unsigned>> &CachedConsts,		std::vector<std::pair<unsigned, unsigned>> &CachedConsts,
bool UpdateInstr = true) const {		bool UpdateInstr = true) const {
std::vector<std::pair<unsigned, unsigned>> UsedKCache;		std::vector<std::pair<unsigned, unsigned>> UsedKCache;

if (!TII->isALUInstr(MI.getOpcode()) && MI.getOpcode() != AMDGPU::DOT_4)		if (!TII->isALUInstr(MI.getOpcode()) && MI.getOpcode() != R600::DOT_4)
return true;		return true;

const SmallVectorImpl<std::pair<MachineOperand *, int64_t>> &Consts =		const SmallVectorImpl<std::pair<MachineOperand *, int64_t>> &Consts =
TII->getSrcs(MI);		TII->getSrcs(MI);
assert(		assert(
(TII->isALUInstr(MI.getOpcode()) || MI.getOpcode() == AMDGPU::DOT_4) &&		(TII->isALUInstr(MI.getOpcode()) || MI.getOpcode() == R600::DOT_4) &&
"Can't assign Const");		"Can't assign Const");
for (unsigned i = 0, n = Consts.size(); i < n; ++i) {		for (unsigned i = 0, n = Consts.size(); i < n; ++i) {
if (Consts[i].first->getReg() != AMDGPU::ALU_CONST)		if (Consts[i].first->getReg() != R600::ALU_CONST)
continue;		continue;
unsigned Sel = Consts[i].second;		unsigned Sel = Consts[i].second;
unsigned Chan = Sel & 3, Index = ((Sel >> 2) - 512) & 31;		unsigned Chan = Sel & 3, Index = ((Sel >> 2) - 512) & 31;
unsigned KCacheIndex = Index * 4 + Chan;		unsigned KCacheIndex = Index * 4 + Chan;
const std::pair<unsigned, unsigned> &BankLine = getAccessedBankLine(Sel);		const std::pair<unsigned, unsigned> &BankLine = getAccessedBankLine(Sel);
if (CachedConsts.empty()) {		if (CachedConsts.empty()) {
CachedConsts.push_back(BankLine);		CachedConsts.push_back(BankLine);
UsedKCache.push_back(std::pair<unsigned, unsigned>(0, KCacheIndex));		UsedKCache.push_back(std::pair<unsigned, unsigned>(0, KCacheIndex));
[14 unchanged lines not shown]
}		}
return false;		return false;
}		}

if (!UpdateInstr)		if (!UpdateInstr)
return true;		return true;

for (unsigned i = 0, j = 0, n = Consts.size(); i < n; ++i) {		for (unsigned i = 0, j = 0, n = Consts.size(); i < n; ++i) {
if (Consts[i].first->getReg() != AMDGPU::ALU_CONST)		if (Consts[i].first->getReg() != R600::ALU_CONST)
continue;		continue;
switch(UsedKCache[j].first) {		switch(UsedKCache[j].first) {
case 0:		case 0:
Consts[i].first->setReg(		Consts[i].first->setReg(
AMDGPU::R600_KC0RegClass.getRegister(UsedKCache[j].second));		R600::R600_KC0RegClass.getRegister(UsedKCache[j].second));
break;		break;
case 1:		case 1:
Consts[i].first->setReg(		Consts[i].first->setReg(
AMDGPU::R600_KC1RegClass.getRegister(UsedKCache[j].second));		R600::R600_KC1RegClass.getRegister(UsedKCache[j].second));
break;		break;
default:		default:
llvm_unreachable("Wrong Cache Line");		llvm_unreachable("Wrong Cache Line");
}		}
j++;		j++;
}		}
return true;		return true;
}		}
[55 unchanged lines not shown; continuing in MakeALUClause(MachineBasicBlock &MBB, MachineBasicBlock::iterator I)]
unsigned AluInstCount = 0;		unsigned AluInstCount = 0;
for (MachineBasicBlock::iterator E = MBB.end(); I != E; ++I) {		for (MachineBasicBlock::iterator E = MBB.end(); I != E; ++I) {
if (IsTrivialInst(*I))		if (IsTrivialInst(*I))
continue;		continue;
if (!isALU(*I))		if (!isALU(*I))
break;		break;
if (AluInstCount > TII->getMaxAlusPerClause())		if (AluInstCount > TII->getMaxAlusPerClause())
break;		break;
if (I->getOpcode() == AMDGPU::PRED_X) {		if (I->getOpcode() == R600::PRED_X) {
// We put PRED_X in its own clause to ensure that ifcvt won't create		// We put PRED_X in its own clause to ensure that ifcvt won't create
// clauses with more than 128 insts.		// clauses with more than 128 insts.
// IfCvt is indeed checking that "then" and "else" branches of an if		// IfCvt is indeed checking that "then" and "else" branches of an if
// statement have less than ~60 insts thus converted clauses can't be		// statement have less than ~60 insts thus converted clauses can't be
// bigger than ~121 insts (predicate setter needs to be in the same		// bigger than ~121 insts (predicate setter needs to be in the same
// clause as predicated alus).		// clause as predicated alus).
if (AluInstCount > 0)		if (AluInstCount > 0)
break;		break;
[19 unchanged lines not shown]
if (!canClauseLocalKillFitInClause(AluInstCount, KCacheBanks, I, E))		if (!canClauseLocalKillFitInClause(AluInstCount, KCacheBanks, I, E))
break;		break;

if (!SubstituteKCacheBank(*I, KCacheBanks))		if (!SubstituteKCacheBank(*I, KCacheBanks))
break;		break;
AluInstCount += OccupiedDwords(*I);		AluInstCount += OccupiedDwords(*I);
}		}
unsigned Opcode = PushBeforeModifier ?		unsigned Opcode = PushBeforeModifier ?
AMDGPU::CF_ALU_PUSH_BEFORE : AMDGPU::CF_ALU;		R600::CF_ALU_PUSH_BEFORE : R600::CF_ALU;
BuildMI(MBB, ClauseHead, MBB.findDebugLoc(ClauseHead), TII->get(Opcode))		BuildMI(MBB, ClauseHead, MBB.findDebugLoc(ClauseHead), TII->get(Opcode))
// We don't use the ADDR field until R600ControlFlowFinalizer pass, where		// We don't use the ADDR field until R600ControlFlowFinalizer pass, where
// it is safe to assume it is 0. However if we always put 0 here, the ifcvt		// it is safe to assume it is 0. However if we always put 0 here, the ifcvt
// pass may assume that identical ALU clause starter at the beginning of a		// pass may assume that identical ALU clause starter at the beginning of a
// true and false branch can be factorized which is not the case.		// true and false branch can be factorized which is not the case.
.addImm(Address++) // ADDR		.addImm(Address++) // ADDR
.addImm(KCacheBanks.empty()?0:KCacheBanks[0].first) // KB0		.addImm(KCacheBanks.empty()?0:KCacheBanks[0].first) // KB0
.addImm((KCacheBanks.size() < 2)?0:KCacheBanks[1].first) // KB1		.addImm((KCacheBanks.size() < 2)?0:KCacheBanks[1].first) // KB1
[16 unchanged lines not shown]
bool runOnMachineFunction(MachineFunction &MF) override {		bool runOnMachineFunction(MachineFunction &MF) override {
const R600Subtarget &ST = MF.getSubtarget<R600Subtarget>();		const R600Subtarget &ST = MF.getSubtarget<R600Subtarget>();
TII = ST.getInstrInfo();		TII = ST.getInstrInfo();

for (MachineFunction::iterator BB = MF.begin(), BB_E = MF.end();		for (MachineFunction::iterator BB = MF.begin(), BB_E = MF.end();
BB != BB_E; ++BB) {		BB != BB_E; ++BB) {
MachineBasicBlock &MBB = *BB;		MachineBasicBlock &MBB = *BB;
MachineBasicBlock::iterator I = MBB.begin();		MachineBasicBlock::iterator I = MBB.begin();
if (I != MBB.end() && I->getOpcode() == AMDGPU::CF_ALU)		if (I != MBB.end() && I->getOpcode() == R600::CF_ALU)
continue; // BB was already parsed		continue; // BB was already parsed
for (MachineBasicBlock::iterator E = MBB.end(); I != E;) {		for (MachineBasicBlock::iterator E = MBB.end(); I != E;) {
if (isALU(*I)) {		if (isALU(*I)) {
auto next = MakeALUClause(MBB, I);		auto next = MakeALUClause(MBB, I);
assert(next != I);		assert(next != I);
I = next;		I = next;
} else		} else
++I;		++I;
[22 unchanged lines not shown]
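
`SubstituteKCacheBank` decodes each `ALU_CONST` operand's selector with `Chan = Sel & 3`, `Index = ((Sel >> 2) - 512) & 31`, and `KCacheIndex = Index * 4 + Chan`, then uses `KCacheIndex` to pick a register from `R600_KC0RegClass` or `R600_KC1RegClass`. The minimal sketch below repeats that arithmetic outside LLVM. `encodeSel` is a hypothetical helper that assumes the channel occupies the two low bits beneath the `(512 + (kc_bank << 12) + ConstIndex) << 2` value described in the source comment; the pass itself only decodes.

```cpp
#include <cassert>

// Decoded kcache reference, mirroring the arithmetic in SubstituteKCacheBank.
struct KCacheRef {
  unsigned Chan;        // Sel & 3
  unsigned Index;       // ((Sel >> 2) - 512) & 31
  unsigned KCacheIndex; // Index * 4 + Chan
};

KCacheRef decodeSel(unsigned Sel) {
  KCacheRef R;
  R.Chan = Sel & 3;
  R.Index = ((Sel >> 2) - 512) & 31;
  R.KCacheIndex = R.Index * 4 + R.Chan;
  return R;
}

// Hypothetical inverse (assumption): the channel is ORed into the two low
// bits of (512 + (kc_bank << 12) + ConstIndex) << 2.
unsigned encodeSel(unsigned Bank, unsigned ConstIndex, unsigned Chan) {
  assert(Chan < 4 && "channel must fit in two bits");
  return ((512 + (Bank << 12) + ConstIndex) << 2) | Chan;
}
```

For example, bank 1, constant 5, channel 2 decodes to `Index` 5 and `KCacheIndex` 22. Because of the `& 31`, only the low five bits of the constant index survive, so constant 37 yields the same `Index` as constant 5.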

lib/Target/AMDGPU/R600ExpandSpecialInstrs.cpp

[90 unchanged lines not shown]
MachineBasicBlock &MBB = *BB;		MachineBasicBlock &MBB = *BB;
MachineBasicBlock::iterator I = MBB.begin();		MachineBasicBlock::iterator I = MBB.begin();
while (I != MBB.end()) {		while (I != MBB.end()) {
MachineInstr &MI = *I;		MachineInstr &MI = *I;
I = std::next(I);		I = std::next(I);

// Expand LDS_*_RET instructions		// Expand LDS_*_RET instructions
if (TII->isLDSRetInstr(MI.getOpcode())) {		if (TII->isLDSRetInstr(MI.getOpcode())) {
int DstIdx = TII->getOperandIdx(MI.getOpcode(), AMDGPU::OpName::dst);		int DstIdx = TII->getOperandIdx(MI.getOpcode(), R600::OpName::dst);
assert(DstIdx != -1);		assert(DstIdx != -1);
MachineOperand &DstOp = MI.getOperand(DstIdx);		MachineOperand &DstOp = MI.getOperand(DstIdx);
MachineInstr *Mov = TII->buildMovInstr(&MBB, I,		MachineInstr *Mov = TII->buildMovInstr(&MBB, I,
DstOp.getReg(), AMDGPU::OQAP);		DstOp.getReg(), R600::OQAP);
DstOp.setReg(AMDGPU::OQAP);		DstOp.setReg(R600::OQAP);
int LDSPredSelIdx = TII->getOperandIdx(MI.getOpcode(),		int LDSPredSelIdx = TII->getOperandIdx(MI.getOpcode(),
AMDGPU::OpName::pred_sel);		R600::OpName::pred_sel);
int MovPredSelIdx = TII->getOperandIdx(Mov->getOpcode(),		int MovPredSelIdx = TII->getOperandIdx(Mov->getOpcode(),
AMDGPU::OpName::pred_sel);		R600::OpName::pred_sel);
// Copy the pred_sel bit		// Copy the pred_sel bit
Mov->getOperand(MovPredSelIdx).setReg(		Mov->getOperand(MovPredSelIdx).setReg(
MI.getOperand(LDSPredSelIdx).getReg());		MI.getOperand(LDSPredSelIdx).getReg());
}		}

switch (MI.getOpcode()) {		switch (MI.getOpcode()) {
default: break;		default: break;
// Expand PRED_X to one of the PRED_SET instructions.		// Expand PRED_X to one of the PRED_SET instructions.
case AMDGPU::PRED_X: {		case R600::PRED_X: {
uint64_t Flags = MI.getOperand(3).getImm();		uint64_t Flags = MI.getOperand(3).getImm();
// The native opcode used by PRED_X is stored as an immediate in the		// The native opcode used by PRED_X is stored as an immediate in the
// third operand.		// third operand.
MachineInstr *PredSet = TII->buildDefaultInstruction(MBB, I,		MachineInstr *PredSet = TII->buildDefaultInstruction(MBB, I,
MI.getOperand(2).getImm(), // opcode		MI.getOperand(2).getImm(), // opcode
MI.getOperand(0).getReg(), // dst		MI.getOperand(0).getReg(), // dst
MI.getOperand(1).getReg(), // src0		MI.getOperand(1).getReg(), // src0
AMDGPU::ZERO); // src1		R600::ZERO); // src1
TII->addFlag(*PredSet, 0, MO_FLAG_MASK);		TII->addFlag(*PredSet, 0, MO_FLAG_MASK);
if (Flags & MO_FLAG_PUSH) {		if (Flags & MO_FLAG_PUSH) {
TII->setImmOperand(*PredSet, AMDGPU::OpName::update_exec_mask, 1);		TII->setImmOperand(*PredSet, R600::OpName::update_exec_mask, 1);
} else {		} else {
TII->setImmOperand(*PredSet, AMDGPU::OpName::update_pred, 1);		TII->setImmOperand(*PredSet, R600::OpName::update_pred, 1);
}		}
MI.eraseFromParent();		MI.eraseFromParent();
continue;		continue;
}		}
case AMDGPU::DOT_4: {		case R600::DOT_4: {

const R600RegisterInfo &TRI = TII->getRegisterInfo();		const R600RegisterInfo &TRI = TII->getRegisterInfo();

unsigned DstReg = MI.getOperand(0).getReg();		unsigned DstReg = MI.getOperand(0).getReg();
unsigned DstBase = TRI.getEncodingValue(DstReg) & HW_REG_MASK;		unsigned DstBase = TRI.getEncodingValue(DstReg) & HW_REG_MASK;

for (unsigned Chan = 0; Chan < 4; ++Chan) {		for (unsigned Chan = 0; Chan < 4; ++Chan) {
bool Mask = (Chan != TRI.getHWRegChan(DstReg));		bool Mask = (Chan != TRI.getHWRegChan(DstReg));
unsigned SubDstReg =		unsigned SubDstReg =
AMDGPU::R600_TReg32RegClass.getRegister((DstBase * 4) + Chan);		R600::R600_TReg32RegClass.getRegister((DstBase * 4) + Chan);
MachineInstr *BMI =		MachineInstr *BMI =
TII->buildSlotOfVectorInstruction(MBB, &MI, Chan, SubDstReg);		TII->buildSlotOfVectorInstruction(MBB, &MI, Chan, SubDstReg);
if (Chan > 0) {		if (Chan > 0) {
BMI->bundleWithPred();		BMI->bundleWithPred();
}		}
if (Mask) {		if (Mask) {
TII->addFlag(*BMI, 0, MO_FLAG_MASK);		TII->addFlag(*BMI, 0, MO_FLAG_MASK);
}		}
if (Chan != 3)		if (Chan != 3)
TII->addFlag(*BMI, 0, MO_FLAG_NOT_LAST);		TII->addFlag(*BMI, 0, MO_FLAG_NOT_LAST);
unsigned Opcode = BMI->getOpcode();		unsigned Opcode = BMI->getOpcode();
// While not strictly necessary from hw point of view, we force		// While not strictly necessary from hw point of view, we force
// all src operands of a dot4 inst to belong to the same slot.		// all src operands of a dot4 inst to belong to the same slot.
unsigned Src0 = BMI->getOperand(		unsigned Src0 = BMI->getOperand(
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0))		TII->getOperandIdx(Opcode, R600::OpName::src0))
.getReg();		.getReg();
unsigned Src1 = BMI->getOperand(		unsigned Src1 = BMI->getOperand(
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1))		TII->getOperandIdx(Opcode, R600::OpName::src1))
.getReg();		.getReg();
(void) Src0;		(void) Src0;
(void) Src1;		(void) Src1;
if ((TRI.getEncodingValue(Src0) & 0xff) < 127 &&		if ((TRI.getEncodingValue(Src0) & 0xff) < 127 &&
(TRI.getEncodingValue(Src1) & 0xff) < 127)		(TRI.getEncodingValue(Src1) & 0xff) < 127)
assert(TRI.getHWRegChan(Src0) == TRI.getHWRegChan(Src1));		assert(TRI.getHWRegChan(Src0) == TRI.getHWRegChan(Src1));
}		}
MI.eraseFromParent();		MI.eraseFromParent();
[30 unchanged lines not shown]
// T0_XYZW = CUBE T1_XYZW		// T0_XYZW = CUBE T1_XYZW
// becomes:		// becomes:
// TO_X = CUBE T1_Z, T1_Y		// TO_X = CUBE T1_Z, T1_Y
// T0_Y = CUBE T1_Z, T1_X		// T0_Y = CUBE T1_Z, T1_X
// T0_Z = CUBE T1_X, T1_Z		// T0_Z = CUBE T1_X, T1_Z
// T0_W = CUBE T1_Y, T1_Z		// T0_W = CUBE T1_Y, T1_Z
for (unsigned Chan = 0; Chan < 4; Chan++) {		for (unsigned Chan = 0; Chan < 4; Chan++) {
unsigned DstReg = MI.getOperand(		unsigned DstReg = MI.getOperand(
TII->getOperandIdx(MI, AMDGPU::OpName::dst)).getReg();		TII->getOperandIdx(MI, R600::OpName::dst)).getReg();
unsigned Src0 = MI.getOperand(		unsigned Src0 = MI.getOperand(
TII->getOperandIdx(MI, AMDGPU::OpName::src0)).getReg();		TII->getOperandIdx(MI, R600::OpName::src0)).getReg();
unsigned Src1 = 0;		unsigned Src1 = 0;

// Determine the correct source registers		// Determine the correct source registers
if (!IsCube) {		if (!IsCube) {
int Src1Idx = TII->getOperandIdx(MI, AMDGPU::OpName::src1);		int Src1Idx = TII->getOperandIdx(MI, R600::OpName::src1);
if (Src1Idx != -1) {		if (Src1Idx != -1) {
Src1 = MI.getOperand(Src1Idx).getReg();		Src1 = MI.getOperand(Src1Idx).getReg();
}		}
}		}
if (IsReduction) {		if (IsReduction) {
unsigned SubRegIndex = AMDGPURegisterInfo::getSubRegFromChannel(Chan);		unsigned SubRegIndex = AMDGPURegisterInfo::getSubRegFromChannel(Chan);
Src0 = TRI.getSubReg(Src0, SubRegIndex);		Src0 = TRI.getSubReg(Src0, SubRegIndex);
Src1 = TRI.getSubReg(Src1, SubRegIndex);		Src1 = TRI.getSubReg(Src1, SubRegIndex);
[11 unchanged lines not shown]
if (IsCube) {		if (IsCube) {
unsigned SubRegIndex = AMDGPURegisterInfo::getSubRegFromChannel(Chan);		unsigned SubRegIndex = AMDGPURegisterInfo::getSubRegFromChannel(Chan);
DstReg = TRI.getSubReg(DstReg, SubRegIndex);		DstReg = TRI.getSubReg(DstReg, SubRegIndex);
} else {		} else {
// Mask the write if the original instruction does not write to		// Mask the write if the original instruction does not write to
// the current Channel.		// the current Channel.
Mask = (Chan != TRI.getHWRegChan(DstReg));		Mask = (Chan != TRI.getHWRegChan(DstReg));
unsigned DstBase = TRI.getEncodingValue(DstReg) & HW_REG_MASK;		unsigned DstBase = TRI.getEncodingValue(DstReg) & HW_REG_MASK;
DstReg = AMDGPU::R600_TReg32RegClass.getRegister((DstBase * 4) + Chan);		DstReg = R600::R600_TReg32RegClass.getRegister((DstBase * 4) + Chan);
}		}

// Set the IsLast bit		// Set the IsLast bit
NotLast = (Chan != 3 );		NotLast = (Chan != 3 );

// Add the new instruction		// Add the new instruction
unsigned Opcode = MI.getOpcode();		unsigned Opcode = MI.getOpcode();
switch (Opcode) {		switch (Opcode) {
case AMDGPU::CUBE_r600_pseudo:		case R600::CUBE_r600_pseudo:
Opcode = AMDGPU::CUBE_r600_real;		Opcode = R600::CUBE_r600_real;
break;		break;
case AMDGPU::CUBE_eg_pseudo:		case R600::CUBE_eg_pseudo:
Opcode = AMDGPU::CUBE_eg_real;		Opcode = R600::CUBE_eg_real;
break;		break;
default:		default:
break;		break;
}		}

MachineInstr *NewMI =		MachineInstr *NewMI =
TII->buildDefaultInstruction(MBB, I, Opcode, DstReg, Src0, Src1);		TII->buildDefaultInstruction(MBB, I, Opcode, DstReg, Src0, Src1);

if (Chan != 0)		if (Chan != 0)
NewMI->bundleWithPred();		NewMI->bundleWithPred();
if (Mask) {		if (Mask) {
TII->addFlag(*NewMI, 0, MO_FLAG_MASK);		TII->addFlag(*NewMI, 0, MO_FLAG_MASK);
}		}
if (NotLast) {		if (NotLast) {
TII->addFlag(*NewMI, 0, MO_FLAG_NOT_LAST);		TII->addFlag(*NewMI, 0, MO_FLAG_NOT_LAST);
}		}
SetFlagInNewMI(NewMI, &MI, AMDGPU::OpName::clamp);		SetFlagInNewMI(NewMI, &MI, R600::OpName::clamp);
SetFlagInNewMI(NewMI, &MI, AMDGPU::OpName::literal);		SetFlagInNewMI(NewMI, &MI, R600::OpName::literal);
SetFlagInNewMI(NewMI, &MI, AMDGPU::OpName::src0_abs);		SetFlagInNewMI(NewMI, &MI, R600::OpName::src0_abs);
SetFlagInNewMI(NewMI, &MI, AMDGPU::OpName::src1_abs);		SetFlagInNewMI(NewMI, &MI, R600::OpName::src1_abs);
SetFlagInNewMI(NewMI, &MI, AMDGPU::OpName::src0_neg);		SetFlagInNewMI(NewMI, &MI, R600::OpName::src0_neg);
SetFlagInNewMI(NewMI, &MI, AMDGPU::OpName::src1_neg);		SetFlagInNewMI(NewMI, &MI, R600::OpName::src1_neg);
}		}
MI.eraseFromParent();		MI.eraseFromParent();
}		}
}		}
return false;		return false;
}		}
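
When `R600ExpandSpecialInstrs` splits `DOT_4` or a non-cube vector/reduction instruction into four slots, every slot except the original destination channel is write-masked (`MO_FLAG_MASK`), and every slot except channel 3 receives `MO_FLAG_NOT_LAST`. Cube instructions instead read a fixed source swizzle, listed in the `T0_XYZW = CUBE T1_XYZW` comment. The minimal sketch below encodes that swizzle and the flag rule as plain data. The names `Chan`, `CubeSwizzle`, `SlotFlags`, and `slotFlags` are hypothetical. The sketch models only the channel bookkeeping, not registers or operands.

```cpp
#include <array>
#include <cassert>
#include <utility>

// Channel numbering assumed to follow X, Y, Z, W = 0..3.
enum Chan : unsigned { ChX = 0, ChY = 1, ChZ = 2, ChW = 3 };

// (src0, src1) channels of T1 read by each destination channel of T0,
// transcribed from the comment in the pass.
const std::array<std::pair<Chan, Chan>, 4> CubeSwizzle = {{
    {ChZ, ChY}, // T0_X = CUBE T1_Z, T1_Y
    {ChZ, ChX}, // T0_Y = CUBE T1_Z, T1_X
    {ChX, ChZ}, // T0_Z = CUBE T1_X, T1_Z
    {ChY, ChZ}, // T0_W = CUBE T1_Y, T1_Z
}};

struct SlotFlags {
  bool Mask;    // write suppressed for this slot
  bool NotLast; // more slots follow in the same bundle
};

// Flag rule for DOT_4 and non-cube expansion: only the slot matching the
// original destination channel writes, and slots 0 to 2 are NOT_LAST.
SlotFlags slotFlags(unsigned Slot, unsigned DstChan) {
  return SlotFlags{Slot != DstChan, Slot != 3};
}
```

With a destination in channel Y, for instance, slot 1 is the only unmasked write, and slot 3 is the only slot without `NOT_LAST`.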

lib/Target/AMDGPU/R600ISelLowering.h

[17 unchanged lines not shown]
	#include "AMDGPUISelLowering.h"			#include "AMDGPUISelLowering.h"

	namespace llvm {			namespace llvm {

	class R600InstrInfo;			class R600InstrInfo;
	class R600Subtarget;			class R600Subtarget;

	class R600TargetLowering final : public AMDGPUTargetLowering {			class R600TargetLowering final : public AMDGPUTargetLowering {

				const R600Subtarget *Subtarget;
	public:			public:
	R600TargetLowering(const TargetMachine &TM, const R600Subtarget &STI);			R600TargetLowering(const TargetMachine &TM, const R600Subtarget &STI);

	const R600Subtarget *getSubtarget() const;			const R600Subtarget *getSubtarget() const;

	MachineBasicBlock *			MachineBasicBlock *
	EmitInstrWithCustomInserter(MachineInstr &MI,			EmitInstrWithCustomInserter(MachineInstr &MI,
	MachineBasicBlock *BB) const override;			MachineBasicBlock *BB) const override;
	SDValue LowerOperation(SDValue Op, SelectionDAG &DAG) const override;			SDValue LowerOperation(SDValue Op, SelectionDAG &DAG) const override;
	SDValue PerformDAGCombine(SDNode *N, DAGCombinerInfo &DCI) const override;			SDValue PerformDAGCombine(SDNode *N, DAGCombinerInfo &DCI) const override;
	void ReplaceNodeResults(SDNode * N,			void ReplaceNodeResults(SDNode * N,
	SmallVectorImpl<SDValue> &Results,			SmallVectorImpl<SDValue> &Results,
	SelectionDAG &DAG) const override;			SelectionDAG &DAG) const override;
				CCAssignFn *CCAssignFnForCall(CallingConv::ID CC, bool IsVarArg) const;
	SDValue LowerFormalArguments(SDValue Chain, CallingConv::ID CallConv,			SDValue LowerFormalArguments(SDValue Chain, CallingConv::ID CallConv,
	bool isVarArg,			bool isVarArg,
	const SmallVectorImpl<ISD::InputArg> &Ins,			const SmallVectorImpl<ISD::InputArg> &Ins,
	const SDLoc &DL, SelectionDAG &DAG,			const SDLoc &DL, SelectionDAG &DAG,
	SmallVectorImpl<SDValue> &InVals) const override;			SmallVectorImpl<SDValue> &InVals) const override;
	EVT getSetCCResultType(const DataLayout &DL, LLVMContext &,			EVT getSetCCResultType(const DataLayout &DL, LLVMContext &,
	EVT VT) const override;			EVT VT) const override;

[61 unchanged lines not shown]
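
On the new side of this header, `R600TargetLowering` gains a `const R600Subtarget *Subtarget` member, which the constructor initializes from `STI`. In the `.cpp` hunks, `EmitInstrWithCustomInserter` then calls `Subtarget->getInstrInfo()` directly. The old code went through `getSubtarget()`, which did a `static_cast` from the base class's pointer. The minimal sketch below shows that pattern with hypothetical stand-in classes (`CommonSubtarget`, `R600LikeSubtarget`, `CommonLowering`, `R600LikeLowering`). It is not the LLVM class hierarchy.

```cpp
#include <cassert>

// Hypothetical stand-ins: a common base subtarget and an R600-specific one.
struct CommonSubtarget {
  virtual ~CommonSubtarget() = default;
};

struct R600LikeSubtarget : CommonSubtarget {
  explicit R600LikeSubtarget(int G) : Gen(G) {}
  int getGeneration() const { return Gen; }

private:
  int Gen;
};

// The shared lowering base only sees the common type.
class CommonLowering {
public:
  explicit CommonLowering(const CommonSubtarget &) {}
};

// The derived lowering stores its own typed pointer at construction, so its
// methods reach R600-only queries without casting down from the base type.
class R600LikeLowering final : public CommonLowering {
  const R600LikeSubtarget *Subtarget;

public:
  explicit R600LikeLowering(const R600LikeSubtarget &STI)
      : CommonLowering(STI), Subtarget(&STI) {}

  int generation() const { return Subtarget->getGeneration(); }
};
```

Storing the typed pointer costs one extra pointer per lowering object and removes the need to downcast at each use site.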

lib/Target/AMDGPU/R600ISelLowering.cpp

//===-- R600ISelLowering.cpp - R600 DAG Lowering Implementation -----------===//		//===-- R600ISelLowering.cpp - R600 DAG Lowering Implementation -----------===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
/// \file		/// \file
/// \brief Custom DAG lowering for R600		/// \brief Custom DAG lowering for R600
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "R600ISelLowering.h"		#include "R600ISelLowering.h"
#include "AMDGPUFrameLowering.h"		#include "AMDGPUFrameLowering.h"
#include "AMDGPUIntrinsicInfo.h"
#include "AMDGPUSubtarget.h"		#include "AMDGPUSubtarget.h"
#include "R600Defines.h"		#include "R600Defines.h"
#include "R600FrameLowering.h"		#include "R600FrameLowering.h"
#include "R600InstrInfo.h"		#include "R600InstrInfo.h"
		#include "R600IntrinsicInfo.h"
#include "R600MachineFunctionInfo.h"		#include "R600MachineFunctionInfo.h"
#include "MCTargetDesc/AMDGPUMCTargetDesc.h"		#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
#include "Utils/AMDGPUBaseInfo.h"		#include "Utils/AMDGPUBaseInfo.h"
#include "llvm/ADT/APFloat.h"		#include "llvm/ADT/APFloat.h"
#include "llvm/ADT/APInt.h"		#include "llvm/ADT/APInt.h"
#include "llvm/ADT/ArrayRef.h"		#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
[16 unchanged lines not shown]
#include <cassert>		#include <cassert>
#include <cstdint>		#include <cstdint>
#include <iterator>		#include <iterator>
#include <utility>		#include <utility>
#include <vector>		#include <vector>

using namespace llvm;		using namespace llvm;

		static bool allocateKernArg(unsigned ValNo, MVT ValVT, MVT LocVT,
		CCValAssign::LocInfo LocInfo,
		ISD::ArgFlagsTy ArgFlags, CCState &State) {
		MachineFunction &MF = State.getMachineFunction();
		AMDGPUMachineFunction *MFI = MF.getInfo<AMDGPUMachineFunction>();

		uint64_t Offset = MFI->allocateKernArg(LocVT.getStoreSize(),
		ArgFlags.getOrigAlign());
		State.addLoc(CCValAssign::getCustomMem(ValNo, ValVT, Offset, LocVT, LocInfo));
		return true;
		}

		#include "R600GenCallingConv.inc"

R600TargetLowering::R600TargetLowering(const TargetMachine &TM,		R600TargetLowering::R600TargetLowering(const TargetMachine &TM,
const R600Subtarget &STI)		const R600Subtarget &STI)
: AMDGPUTargetLowering(TM, STI), Gen(STI.getGeneration()) {		: AMDGPUTargetLowering(TM, STI), Subtarget(&STI), Gen(STI.getGeneration()) {
addRegisterClass(MVT::f32, &AMDGPU::R600_Reg32RegClass);		addRegisterClass(MVT::f32, &R600::R600_Reg32RegClass);
addRegisterClass(MVT::i32, &AMDGPU::R600_Reg32RegClass);		addRegisterClass(MVT::i32, &R600::R600_Reg32RegClass);
addRegisterClass(MVT::v2f32, &AMDGPU::R600_Reg64RegClass);		addRegisterClass(MVT::v2f32, &R600::R600_Reg64RegClass);
addRegisterClass(MVT::v2i32, &AMDGPU::R600_Reg64RegClass);		addRegisterClass(MVT::v2i32, &R600::R600_Reg64RegClass);
addRegisterClass(MVT::v4f32, &AMDGPU::R600_Reg128RegClass);		addRegisterClass(MVT::v4f32, &R600::R600_Reg128RegClass);
addRegisterClass(MVT::v4i32, &AMDGPU::R600_Reg128RegClass);		addRegisterClass(MVT::v4i32, &R600::R600_Reg128RegClass);

computeRegisterProperties(STI.getRegisterInfo());		computeRegisterProperties(Subtarget->getRegisterInfo());

// Legalize loads and stores to the private address space.		// Legalize loads and stores to the private address space.
setOperationAction(ISD::LOAD, MVT::i32, Custom);		setOperationAction(ISD::LOAD, MVT::i32, Custom);
setOperationAction(ISD::LOAD, MVT::v2i32, Custom);		setOperationAction(ISD::LOAD, MVT::v2i32, Custom);
setOperationAction(ISD::LOAD, MVT::v4i32, Custom);		setOperationAction(ISD::LOAD, MVT::v4i32, Custom);

// EXTLOAD should be the same as ZEXTLOAD. It is legal for some address		// EXTLOAD should be the same as ZEXTLOAD. It is legal for some address
// spaces, so it is custom lowered to handle those where it isn't.		// spaces, so it is custom lowered to handle those where it isn't.
[70 unchanged lines not shown; continuing in R600TargetLowering::R600TargetLowering]
setOperationAction(ISD::SETCC, MVT::v2i32, Expand);		setOperationAction(ISD::SETCC, MVT::v2i32, Expand);

setOperationAction(ISD::BR_CC, MVT::i32, Expand);		setOperationAction(ISD::BR_CC, MVT::i32, Expand);
setOperationAction(ISD::BR_CC, MVT::f32, Expand);		setOperationAction(ISD::BR_CC, MVT::f32, Expand);
setOperationAction(ISD::BRCOND, MVT::Other, Custom);		setOperationAction(ISD::BRCOND, MVT::Other, Custom);

setOperationAction(ISD::FSUB, MVT::f32, Expand);		setOperationAction(ISD::FSUB, MVT::f32, Expand);

		setOperationAction(ISD::FCEIL, MVT::f64, Custom);
		setOperationAction(ISD::FTRUNC, MVT::f64, Custom);
		setOperationAction(ISD::FRINT, MVT::f64, Custom);
		setOperationAction(ISD::FFLOOR, MVT::f64, Custom);

setOperationAction(ISD::SELECT_CC, MVT::f32, Custom);		setOperationAction(ISD::SELECT_CC, MVT::f32, Custom);
setOperationAction(ISD::SELECT_CC, MVT::i32, Custom);		setOperationAction(ISD::SELECT_CC, MVT::i32, Custom);

setOperationAction(ISD::SETCC, MVT::i32, Expand);		setOperationAction(ISD::SETCC, MVT::i32, Expand);
setOperationAction(ISD::SETCC, MVT::f32, Expand);		setOperationAction(ISD::SETCC, MVT::f32, Expand);
setOperationAction(ISD::FP_TO_UINT, MVT::i1, Custom);		setOperationAction(ISD::FP_TO_UINT, MVT::i1, Custom);
setOperationAction(ISD::FP_TO_SINT, MVT::i1, Custom);		setOperationAction(ISD::FP_TO_SINT, MVT::i1, Custom);
setOperationAction(ISD::FP_TO_SINT, MVT::i64, Custom);		setOperationAction(ISD::FP_TO_SINT, MVT::i64, Custom);
[52 unchanged lines not shown; continuing in R600TargetLowering::R600TargetLowering]
setOperationAction(ISD::SHL_PARTS, MVT::i32, Custom);		setOperationAction(ISD::SHL_PARTS, MVT::i32, Custom);
setOperationAction(ISD::SRL_PARTS, MVT::i32, Custom);		setOperationAction(ISD::SRL_PARTS, MVT::i32, Custom);
setOperationAction(ISD::SRA_PARTS, MVT::i32, Custom);		setOperationAction(ISD::SRA_PARTS, MVT::i32, Custom);

if (!Subtarget->hasFMA()) {		if (!Subtarget->hasFMA()) {
setOperationAction(ISD::FMA, MVT::f32, Expand);		setOperationAction(ISD::FMA, MVT::f32, Expand);
setOperationAction(ISD::FMA, MVT::f64, Expand);		setOperationAction(ISD::FMA, MVT::f64, Expand);
}		}

jvesely: git complains about whitespace error in this location
setOperationAction(ISD::GlobalAddress, MVT::i32, Custom);		setOperationAction(ISD::GlobalAddress, MVT::i32, Custom);

const MVT ScalarIntVTs[] = { MVT::i32, MVT::i64 };		const MVT ScalarIntVTs[] = { MVT::i32, MVT::i64 };
for (MVT VT : ScalarIntVTs) {		for (MVT VT : ScalarIntVTs) {
setOperationAction(ISD::ADDC, VT, Expand);		setOperationAction(ISD::ADDC, VT, Expand);
setOperationAction(ISD::SUBC, VT, Expand);		setOperationAction(ISD::SUBC, VT, Expand);
setOperationAction(ISD::ADDE, VT, Expand);		setOperationAction(ISD::ADDE, VT, Expand);
setOperationAction(ISD::SUBE, VT, Expand);		setOperationAction(ISD::SUBE, VT, Expand);
[13 unchanged lines not shown; continuing in R600TargetLowering::R600TargetLowering]
setTargetDAGCombine(ISD::FP_ROUND);		setTargetDAGCombine(ISD::FP_ROUND);
setTargetDAGCombine(ISD::FP_TO_SINT);		setTargetDAGCombine(ISD::FP_TO_SINT);
setTargetDAGCombine(ISD::EXTRACT_VECTOR_ELT);		setTargetDAGCombine(ISD::EXTRACT_VECTOR_ELT);
setTargetDAGCombine(ISD::SELECT_CC);		setTargetDAGCombine(ISD::SELECT_CC);
setTargetDAGCombine(ISD::INSERT_VECTOR_ELT);		setTargetDAGCombine(ISD::INSERT_VECTOR_ELT);
setTargetDAGCombine(ISD::LOAD);		setTargetDAGCombine(ISD::LOAD);
}		}

const R600Subtarget *R600TargetLowering::getSubtarget() const {
return static_cast<const R600Subtarget *>(Subtarget);
}

static inline bool isEOP(MachineBasicBlock::iterator I) {		static inline bool isEOP(MachineBasicBlock::iterator I) {
if (std::next(I) == I->getParent()->end())		if (std::next(I) == I->getParent()->end())
return false;		return false;
return std::next(I)->getOpcode() == AMDGPU::RETURN;		return std::next(I)->getOpcode() == R600::RETURN;
}		}

MachineBasicBlock *		MachineBasicBlock *
R600TargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,		R600TargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
MachineBasicBlock *BB) const {		MachineBasicBlock *BB) const {
MachineFunction *MF = BB->getParent();		MachineFunction *MF = BB->getParent();
MachineRegisterInfo &MRI = MF->getRegInfo();		MachineRegisterInfo &MRI = MF->getRegInfo();
MachineBasicBlock::iterator I = MI;		MachineBasicBlock::iterator I = MI;
const R600InstrInfo *TII = getSubtarget()->getInstrInfo();		const R600InstrInfo *TII = Subtarget->getInstrInfo();

switch (MI.getOpcode()) {		switch (MI.getOpcode()) {
default:		default:
// Replace LDS_*_RET instruction that don't have any uses with the		// Replace LDS_*_RET instruction that don't have any uses with the
// equivalent LDS_*_NORET instruction.		// equivalent LDS_*_NORET instruction.
if (TII->isLDSRetInstr(MI.getOpcode())) {		if (TII->isLDSRetInstr(MI.getOpcode())) {
int DstIdx = TII->getOperandIdx(MI.getOpcode(), AMDGPU::OpName::dst);		int DstIdx = TII->getOperandIdx(MI.getOpcode(), R600::OpName::dst);
assert(DstIdx != -1);		assert(DstIdx != -1);
MachineInstrBuilder NewMI;		MachineInstrBuilder NewMI;
// FIXME: getLDSNoRetOp method only handles LDS_1A1D LDS ops. Add		// FIXME: getLDSNoRetOp method only handles LDS_1A1D LDS ops. Add
// LDS_1A2D support and remove this special case.		// LDS_1A2D support and remove this special case.
if (!MRI.use_empty(MI.getOperand(DstIdx).getReg()) ||		if (!MRI.use_empty(MI.getOperand(DstIdx).getReg()) ||
MI.getOpcode() == AMDGPU::LDS_CMPST_RET)		MI.getOpcode() == R600::LDS_CMPST_RET)
return BB;		return BB;

NewMI = BuildMI(*BB, I, BB->findDebugLoc(I),		NewMI = BuildMI(*BB, I, BB->findDebugLoc(I),
TII->get(AMDGPU::getLDSNoRetOp(MI.getOpcode())));		TII->get(R600::getLDSNoRetOp(MI.getOpcode())));
for (unsigned i = 1, e = MI.getNumOperands(); i < e; ++i) {		for (unsigned i = 1, e = MI.getNumOperands(); i < e; ++i) {
NewMI.add(MI.getOperand(i));		NewMI.add(MI.getOperand(i));
}		}
} else {		} else {
return AMDGPUTargetLowering::EmitInstrWithCustomInserter(MI, BB);		return AMDGPUTargetLowering::EmitInstrWithCustomInserter(MI, BB);
}		}
break;		break;
case AMDGPU::CLAMP_R600: {		case R600::CLAMP_R600: {
MachineInstr *NewMI = TII->buildDefaultInstruction(		MachineInstr *NewMI = TII->buildDefaultInstruction(
*BB, I, AMDGPU::MOV, MI.getOperand(0).getReg(),		*BB, I, R600::MOV, MI.getOperand(0).getReg(),
MI.getOperand(1).getReg());		MI.getOperand(1).getReg());
TII->addFlag(*NewMI, 0, MO_FLAG_CLAMP);		TII->addFlag(*NewMI, 0, MO_FLAG_CLAMP);
break;		break;
}		}

case AMDGPU::FABS_R600: {		case R600::FABS_R600: {
MachineInstr *NewMI = TII->buildDefaultInstruction(		MachineInstr *NewMI = TII->buildDefaultInstruction(
*BB, I, AMDGPU::MOV, MI.getOperand(0).getReg(),		*BB, I, R600::MOV, MI.getOperand(0).getReg(),
MI.getOperand(1).getReg());		MI.getOperand(1).getReg());
TII->addFlag(*NewMI, 0, MO_FLAG_ABS);		TII->addFlag(*NewMI, 0, MO_FLAG_ABS);
break;		break;
}		}

case AMDGPU::FNEG_R600: {		case R600::FNEG_R600: {
MachineInstr *NewMI = TII->buildDefaultInstruction(		MachineInstr *NewMI = TII->buildDefaultInstruction(
*BB, I, AMDGPU::MOV, MI.getOperand(0).getReg(),		*BB, I, R600::MOV, MI.getOperand(0).getReg(),
MI.getOperand(1).getReg());		MI.getOperand(1).getReg());
TII->addFlag(*NewMI, 0, MO_FLAG_NEG);		TII->addFlag(*NewMI, 0, MO_FLAG_NEG);
break;		break;
}		}

case AMDGPU::MASK_WRITE: {		case R600::MASK_WRITE: {
unsigned maskedRegister = MI.getOperand(0).getReg();		unsigned maskedRegister = MI.getOperand(0).getReg();
assert(TargetRegisterInfo::isVirtualRegister(maskedRegister));		assert(TargetRegisterInfo::isVirtualRegister(maskedRegister));
MachineInstr * defInstr = MRI.getVRegDef(maskedRegister);		MachineInstr * defInstr = MRI.getVRegDef(maskedRegister);
TII->addFlag(*defInstr, 0, MO_FLAG_MASK);		TII->addFlag(*defInstr, 0, MO_FLAG_MASK);
break;		break;
}		}

case AMDGPU::MOV_IMM_F32:		case R600::MOV_IMM_F32:
TII->buildMovImm(*BB, I, MI.getOperand(0).getReg(), MI.getOperand(1)		TII->buildMovImm(*BB, I, MI.getOperand(0).getReg(), MI.getOperand(1)
.getFPImm()		.getFPImm()
->getValueAPF()		->getValueAPF()
.bitcastToAPInt()		.bitcastToAPInt()
.getZExtValue());		.getZExtValue());
break;		break;

case AMDGPU::MOV_IMM_I32:		case R600::MOV_IMM_I32:
TII->buildMovImm(*BB, I, MI.getOperand(0).getReg(),		TII->buildMovImm(*BB, I, MI.getOperand(0).getReg(),
MI.getOperand(1).getImm());		MI.getOperand(1).getImm());
break;		break;

case AMDGPU::MOV_IMM_GLOBAL_ADDR: {		case R600::MOV_IMM_GLOBAL_ADDR: {
//TODO: Perhaps combine this instruction with the next if possible		//TODO: Perhaps combine this instruction with the next if possible
auto MIB = TII->buildDefaultInstruction(		auto MIB = TII->buildDefaultInstruction(
*BB, MI, AMDGPU::MOV, MI.getOperand(0).getReg(), AMDGPU::ALU_LITERAL_X);		*BB, MI, R600::MOV, MI.getOperand(0).getReg(), R600::ALU_LITERAL_X);
int Idx = TII->getOperandIdx(*MIB, AMDGPU::OpName::literal);		int Idx = TII->getOperandIdx(*MIB, R600::OpName::literal);
//TODO: Ugh this is rather ugly		//TODO: Ugh this is rather ugly
MIB->getOperand(Idx) = MI.getOperand(1);		MIB->getOperand(Idx) = MI.getOperand(1);
break;		break;
}		}

case AMDGPU::CONST_COPY: {		case R600::CONST_COPY: {
MachineInstr *NewMI = TII->buildDefaultInstruction(		MachineInstr *NewMI = TII->buildDefaultInstruction(
*BB, MI, AMDGPU::MOV, MI.getOperand(0).getReg(), AMDGPU::ALU_CONST);		*BB, MI, R600::MOV, MI.getOperand(0).getReg(), R600::ALU_CONST);
TII->setImmOperand(*NewMI, AMDGPU::OpName::src0_sel,		TII->setImmOperand(*NewMI, R600::OpName::src0_sel,
MI.getOperand(1).getImm());		MI.getOperand(1).getImm());
break;		break;
}		}

case AMDGPU::RAT_WRITE_CACHELESS_32_eg:		case R600::RAT_WRITE_CACHELESS_32_eg:
case AMDGPU::RAT_WRITE_CACHELESS_64_eg:		case R600::RAT_WRITE_CACHELESS_64_eg:
case AMDGPU::RAT_WRITE_CACHELESS_128_eg:		case R600::RAT_WRITE_CACHELESS_128_eg:
BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(MI.getOpcode()))		BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(MI.getOpcode()))
.add(MI.getOperand(0))		.add(MI.getOperand(0))
.add(MI.getOperand(1))		.add(MI.getOperand(1))
.addImm(isEOP(I)); // Set End of program bit		.addImm(isEOP(I)); // Set End of program bit
break;		break;

case AMDGPU::RAT_STORE_TYPED_eg:		case R600::RAT_STORE_TYPED_eg:
BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(MI.getOpcode()))		BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(MI.getOpcode()))
.add(MI.getOperand(0))		.add(MI.getOperand(0))
.add(MI.getOperand(1))		.add(MI.getOperand(1))
.add(MI.getOperand(2))		.add(MI.getOperand(2))
.addImm(isEOP(I)); // Set End of program bit		.addImm(isEOP(I)); // Set End of program bit
break;		break;

case AMDGPU::BRANCH:		case R600::BRANCH:
BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::JUMP))		BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(R600::JUMP))
.add(MI.getOperand(0));		.add(MI.getOperand(0));
break;		break;

case AMDGPU::BRANCH_COND_f32: {		case R600::BRANCH_COND_f32: {
MachineInstr *NewMI =		MachineInstr *NewMI =
BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::PRED_X),		BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(R600::PRED_X),
AMDGPU::PREDICATE_BIT)		R600::PREDICATE_BIT)
.add(MI.getOperand(1))		.add(MI.getOperand(1))
.addImm(AMDGPU::PRED_SETNE)		.addImm(R600::PRED_SETNE)
.addImm(0); // Flags		.addImm(0); // Flags
TII->addFlag(*NewMI, 0, MO_FLAG_PUSH);		TII->addFlag(*NewMI, 0, MO_FLAG_PUSH);
BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::JUMP_COND))		BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(R600::JUMP_COND))
.add(MI.getOperand(0))		.add(MI.getOperand(0))
.addReg(AMDGPU::PREDICATE_BIT, RegState::Kill);		.addReg(R600::PREDICATE_BIT, RegState::Kill);
break;		break;
}		}

case AMDGPU::BRANCH_COND_i32: {		case R600::BRANCH_COND_i32: {
MachineInstr *NewMI =		MachineInstr *NewMI =
BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::PRED_X),		BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(R600::PRED_X),
AMDGPU::PREDICATE_BIT)		R600::PREDICATE_BIT)
.add(MI.getOperand(1))		.add(MI.getOperand(1))
.addImm(AMDGPU::PRED_SETNE_INT)		.addImm(R600::PRED_SETNE_INT)
.addImm(0); // Flags		.addImm(0); // Flags
TII->addFlag(*NewMI, 0, MO_FLAG_PUSH);		TII->addFlag(*NewMI, 0, MO_FLAG_PUSH);
BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(AMDGPU::JUMP_COND))		BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(R600::JUMP_COND))
.add(MI.getOperand(0))		.add(MI.getOperand(0))
.addReg(AMDGPU::PREDICATE_BIT, RegState::Kill);		.addReg(R600::PREDICATE_BIT, RegState::Kill);
break;		break;
}		}

case AMDGPU::EG_ExportSwz:		case R600::EG_ExportSwz:
case AMDGPU::R600_ExportSwz: {		case R600::R600_ExportSwz: {
// Instruction is left unmodified if its not the last one of its type		// Instruction is left unmodified if its not the last one of its type
bool isLastInstructionOfItsType = true;		bool isLastInstructionOfItsType = true;
unsigned InstExportType = MI.getOperand(1).getImm();		unsigned InstExportType = MI.getOperand(1).getImm();
for (MachineBasicBlock::iterator NextExportInst = std::next(I),		for (MachineBasicBlock::iterator NextExportInst = std::next(I),
EndBlock = BB->end(); NextExportInst != EndBlock;		EndBlock = BB->end(); NextExportInst != EndBlock;
NextExportInst = std::next(NextExportInst)) {		NextExportInst = std::next(NextExportInst)) {
if (NextExportInst->getOpcode() == AMDGPU::EG_ExportSwz ||		if (NextExportInst->getOpcode() == R600::EG_ExportSwz ||
NextExportInst->getOpcode() == AMDGPU::R600_ExportSwz) {		NextExportInst->getOpcode() == R600::R600_ExportSwz) {
unsigned CurrentInstExportType = NextExportInst->getOperand(1)		unsigned CurrentInstExportType = NextExportInst->getOperand(1)
.getImm();		.getImm();
if (CurrentInstExportType == InstExportType) {		if (CurrentInstExportType == InstExportType) {
isLastInstructionOfItsType = false;		isLastInstructionOfItsType = false;
break;		break;
}		}
}		}
}		}
bool EOP = isEOP(I);		bool EOP = isEOP(I);
if (!EOP && !isLastInstructionOfItsType)		if (!EOP && !isLastInstructionOfItsType)
return BB;		return BB;
unsigned CfInst = (MI.getOpcode() == AMDGPU::EG_ExportSwz) ? 84 : 40;		unsigned CfInst = (MI.getOpcode() == R600::EG_ExportSwz) ? 84 : 40;
BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(MI.getOpcode()))		BuildMI(*BB, I, BB->findDebugLoc(I), TII->get(MI.getOpcode()))
.add(MI.getOperand(0))		.add(MI.getOperand(0))
.add(MI.getOperand(1))		.add(MI.getOperand(1))
.add(MI.getOperand(2))		.add(MI.getOperand(2))
.add(MI.getOperand(3))		.add(MI.getOperand(3))
.add(MI.getOperand(4))		.add(MI.getOperand(4))
.add(MI.getOperand(5))		.add(MI.getOperand(5))
.add(MI.getOperand(6))		.add(MI.getOperand(6))
.addImm(CfInst)		.addImm(CfInst)
.addImm(EOP);		.addImm(EOP);
break;		break;
}		}
case AMDGPU::RETURN: {		case R600::RETURN: {
return BB;		return BB;
}		}
}		}

MI.eraseFromParent();		MI.eraseFromParent();
return BB;		return BB;
}		}
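For `EG_ExportSwz` and `R600_ExportSwz`, the custom inserter leaves the pseudo untouched unless it either ends the program or is the last export of its type in the block. Only in those cases is it rebuilt with a control-flow opcode (84 on Evergreen, 40 on R600) and the EOP flag. The sketch below models that decision using hypothetical `Export` and `ExportDecision` structs. As a simplification, it uses a vector that contains only the block's exports; the real loop walks every following instruction and skips non-exports:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of an export pseudo (not an LLVM type).
struct Export {
  bool Evergreen;      // EG_ExportSwz if true, R600_ExportSwz otherwise
  unsigned ExportType; // the immediate held in operand 1
};

struct ExportDecision {
  bool Rewrite;    // false: the pseudo is left as-is
  unsigned CfInst; // control-flow opcode for the rebuilt export
  bool EOP;        // end-of-program bit
};

// Exports holds only the block's exports, in order; EndsProgram is what
// isEOP() would report for Exports[Idx].
ExportDecision decideExport(const std::vector<Export> &Exports,
                            std::size_t Idx, bool EndsProgram) {
  assert(Idx < Exports.size());
  bool LastOfItsType = true;
  for (std::size_t J = Idx + 1; J < Exports.size(); ++J) {
    if (Exports[J].ExportType == Exports[Idx].ExportType) {
      LastOfItsType = false;
      break;
    }
  }
  if (!EndsProgram && !LastOfItsType)
    return {false, 0, false};
  unsigned CfInst = Exports[Idx].Evergreen ? 84 : 40;
  return {true, CfInst, EndsProgram};
}
```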

[28 lines not shown]	SDValue R600TargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
case ISD::BRCOND: return LowerBRCOND(Op, DAG);		case ISD::BRCOND: return LowerBRCOND(Op, DAG);
case ISD::GlobalAddress: return LowerGlobalAddress(MFI, Op, DAG);		case ISD::GlobalAddress: return LowerGlobalAddress(MFI, Op, DAG);
case ISD::FrameIndex: return lowerFrameIndex(Op, DAG);		case ISD::FrameIndex: return lowerFrameIndex(Op, DAG);
case ISD::INTRINSIC_VOID: {		case ISD::INTRINSIC_VOID: {
SDValue Chain = Op.getOperand(0);		SDValue Chain = Op.getOperand(0);
unsigned IntrinsicID =		unsigned IntrinsicID =
cast<ConstantSDNode>(Op.getOperand(1))->getZExtValue();		cast<ConstantSDNode>(Op.getOperand(1))->getZExtValue();
switch (IntrinsicID) {		switch (IntrinsicID) {
case AMDGPUIntrinsic::r600_store_swizzle: {		case r600Intrinsic::r600_store_swizzle: {
SDLoc DL(Op);		SDLoc DL(Op);
const SDValue Args[8] = {		const SDValue Args[8] = {
Chain,		Chain,
Op.getOperand(2), // Export Value		Op.getOperand(2), // Export Value
Op.getOperand(3), // ArrayBase		Op.getOperand(3), // ArrayBase
Op.getOperand(4), // Type		Op.getOperand(4), // Type
DAG.getConstant(0, DL, MVT::i32), // SWZ_X		DAG.getConstant(0, DL, MVT::i32), // SWZ_X
DAG.getConstant(1, DL, MVT::i32), // SWZ_Y		DAG.getConstant(1, DL, MVT::i32), // SWZ_Y
[10 lines not shown]	case ISD::INTRINSIC_VOID: {
break;		break;
}		}
case ISD::INTRINSIC_WO_CHAIN: {		case ISD::INTRINSIC_WO_CHAIN: {
unsigned IntrinsicID =		unsigned IntrinsicID =
cast<ConstantSDNode>(Op.getOperand(0))->getZExtValue();		cast<ConstantSDNode>(Op.getOperand(0))->getZExtValue();
EVT VT = Op.getValueType();		EVT VT = Op.getValueType();
SDLoc DL(Op);		SDLoc DL(Op);
switch (IntrinsicID) {		switch (IntrinsicID) {
case AMDGPUIntrinsic::r600_tex:		case r600Intrinsic::r600_tex:
case AMDGPUIntrinsic::r600_texc: {		case r600Intrinsic::r600_texc: {
unsigned TextureOp;		unsigned TextureOp;
switch (IntrinsicID) {		switch (IntrinsicID) {
case AMDGPUIntrinsic::r600_tex:		case r600Intrinsic::r600_tex:
TextureOp = 0;		TextureOp = 0;
break;		break;
case AMDGPUIntrinsic::r600_texc:		case r600Intrinsic::r600_texc:
TextureOp = 1;		TextureOp = 1;
break;		break;
default:		default:
llvm_unreachable("unhandled texture operation");		llvm_unreachable("unhandled texture operation");
}		}

SDValue TexArgs[19] = {		SDValue TexArgs[19] = {
DAG.getConstant(TextureOp, DL, MVT::i32),		DAG.getConstant(TextureOp, DL, MVT::i32),
[13 lines not shown]	case r600Intrinsic::r600_texc: {
Op.getOperand(6),		Op.getOperand(6),
Op.getOperand(7),		Op.getOperand(7),
Op.getOperand(8),		Op.getOperand(8),
Op.getOperand(9),		Op.getOperand(9),
Op.getOperand(10)		Op.getOperand(10)
};		};
return DAG.getNode(AMDGPUISD::TEXTURE_FETCH, DL, MVT::v4f32, TexArgs);		return DAG.getNode(AMDGPUISD::TEXTURE_FETCH, DL, MVT::v4f32, TexArgs);
}		}
case AMDGPUIntrinsic::r600_dot4: {		case r600Intrinsic::r600_dot4: {
SDValue Args[8] = {		SDValue Args[8] = {
DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::f32, Op.getOperand(1),		DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::f32, Op.getOperand(1),
DAG.getConstant(0, DL, MVT::i32)),		DAG.getConstant(0, DL, MVT::i32)),
DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::f32, Op.getOperand(2),		DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::f32, Op.getOperand(2),
DAG.getConstant(0, DL, MVT::i32)),		DAG.getConstant(0, DL, MVT::i32)),
DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::f32, Op.getOperand(1),		DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::f32, Op.getOperand(1),
DAG.getConstant(1, DL, MVT::i32)),		DAG.getConstant(1, DL, MVT::i32)),
DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::f32, Op.getOperand(2),		DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::f32, Op.getOperand(2),
[30 lines not shown]	case ISD::INTRINSIC_WO_CHAIN: {
case Intrinsic::r600_read_local_size_x:		case Intrinsic::r600_read_local_size_x:
return LowerImplicitParameter(DAG, VT, DL, 6);		return LowerImplicitParameter(DAG, VT, DL, 6);
case Intrinsic::r600_read_local_size_y:		case Intrinsic::r600_read_local_size_y:
return LowerImplicitParameter(DAG, VT, DL, 7);		return LowerImplicitParameter(DAG, VT, DL, 7);
case Intrinsic::r600_read_local_size_z:		case Intrinsic::r600_read_local_size_z:
return LowerImplicitParameter(DAG, VT, DL, 8);		return LowerImplicitParameter(DAG, VT, DL, 8);

case Intrinsic::r600_read_tgid_x:		case Intrinsic::r600_read_tgid_x:
return CreateLiveInRegisterRaw(DAG, &AMDGPU::R600_TReg32RegClass,		return CreateLiveInRegisterRaw(DAG, &R600::R600_TReg32RegClass,
AMDGPU::T1_X, VT);		R600::T1_X, VT);
case Intrinsic::r600_read_tgid_y:		case Intrinsic::r600_read_tgid_y:
return CreateLiveInRegisterRaw(DAG, &AMDGPU::R600_TReg32RegClass,		return CreateLiveInRegisterRaw(DAG, &R600::R600_TReg32RegClass,
AMDGPU::T1_Y, VT);		R600::T1_Y, VT);
case Intrinsic::r600_read_tgid_z:		case Intrinsic::r600_read_tgid_z:
return CreateLiveInRegisterRaw(DAG, &AMDGPU::R600_TReg32RegClass,		return CreateLiveInRegisterRaw(DAG, &R600::R600_TReg32RegClass,
AMDGPU::T1_Z, VT);		R600::T1_Z, VT);
case Intrinsic::r600_read_tidig_x:		case Intrinsic::r600_read_tidig_x:
return CreateLiveInRegisterRaw(DAG, &AMDGPU::R600_TReg32RegClass,		return CreateLiveInRegisterRaw(DAG, &R600::R600_TReg32RegClass,
AMDGPU::T0_X, VT);		R600::T0_X, VT);
case Intrinsic::r600_read_tidig_y:		case Intrinsic::r600_read_tidig_y:
return CreateLiveInRegisterRaw(DAG, &AMDGPU::R600_TReg32RegClass,		return CreateLiveInRegisterRaw(DAG, &R600::R600_TReg32RegClass,
AMDGPU::T0_Y, VT);		R600::T0_Y, VT);
case Intrinsic::r600_read_tidig_z:		case Intrinsic::r600_read_tidig_z:
return CreateLiveInRegisterRaw(DAG, &AMDGPU::R600_TReg32RegClass,		return CreateLiveInRegisterRaw(DAG, &R600::R600_TReg32RegClass,
AMDGPU::T0_Z, VT);		R600::T0_Z, VT);

case Intrinsic::r600_recipsqrt_ieee:		case Intrinsic::r600_recipsqrt_ieee:
return DAG.getNode(AMDGPUISD::RSQ, DL, VT, Op.getOperand(1));		return DAG.getNode(AMDGPUISD::RSQ, DL, VT, Op.getOperand(1));

case Intrinsic::r600_recipsqrt_clamped:		case Intrinsic::r600_recipsqrt_clamped:
return DAG.getNode(AMDGPUISD::RSQ_CLAMP, DL, VT, Op.getOperand(1));		return DAG.getNode(AMDGPUISD::RSQ_CLAMP, DL, VT, Op.getOperand(1));
default:		default:
return Op;		return Op;
[905 lines not shown]	SDValue R600TargetLowering::LowerBRCOND(SDValue Op, SelectionDAG &DAG) const {

return DAG.getNode(AMDGPUISD::BRANCH_COND, SDLoc(Op), Op.getValueType(),		return DAG.getNode(AMDGPUISD::BRANCH_COND, SDLoc(Op), Op.getValueType(),
Chain, Jump, Cond);		Chain, Jump, Cond);
}		}

SDValue R600TargetLowering::lowerFrameIndex(SDValue Op,		SDValue R600TargetLowering::lowerFrameIndex(SDValue Op,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
MachineFunction &MF = DAG.getMachineFunction();		MachineFunction &MF = DAG.getMachineFunction();
const R600FrameLowering *TFL = getSubtarget()->getFrameLowering();		const R600FrameLowering *TFL = Subtarget->getFrameLowering();

FrameIndexSDNode *FIN = cast<FrameIndexSDNode>(Op);		FrameIndexSDNode *FIN = cast<FrameIndexSDNode>(Op);

unsigned FrameIndex = FIN->getIndex();		unsigned FrameIndex = FIN->getIndex();
unsigned IgnoredFrameReg;		unsigned IgnoredFrameReg;
unsigned Offset =		unsigned Offset =
TFL->getFrameIndexReference(MF, FrameIndex, IgnoredFrameReg);		TFL->getFrameIndexReference(MF, FrameIndex, IgnoredFrameReg);
return DAG.getConstant(Offset * 4 * TFL->getStackWidth(MF), SDLoc(Op),		return DAG.getConstant(Offset * 4 * TFL->getStackWidth(MF), SDLoc(Op),
Op.getValueType());		Op.getValueType());
}		}
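`lowerFrameIndex` materializes a frame index as the constant `Offset * 4 * TFL->getStackWidth(MF)`. We assume the factor 4 is the byte size of one 32-bit channel. The sketch only reproduces the arithmetic with plain integers, and `frameIndexConstant` is a hypothetical name:

```cpp
#include <cassert>

// Reproduces the constant built by lowerFrameIndex:
//   Offset * 4 * StackWidth
// Offset stands for the value from getFrameIndexReference and StackWidth for
// getStackWidth(MF); here both are plain inputs.
unsigned frameIndexConstant(unsigned Offset, unsigned StackWidth) {
  // Assumption for this sketch: a stack slot spans at least one channel.
  assert(StackWidth > 0);
  return Offset * 4 * StackWidth;
}
```

For example, slot offset 3 with a stack width of 2 gives 3 * 4 * 2 = 24.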

		CCAssignFn *R600TargetLowering::CCAssignFnForCall(CallingConv::ID CC,
		bool IsVarArg) const {
		switch (CC) {
		case CallingConv::AMDGPU_KERNEL:
		case CallingConv::SPIR_KERNEL:
		case CallingConv::C:
		case CallingConv::Fast:
		case CallingConv::Cold:
		arsenm: Probably should reject these, but that's a separate change
		return CC_R600_Kernel;
		case CallingConv::AMDGPU_VS:
		case CallingConv::AMDGPU_GS:
		case CallingConv::AMDGPU_PS:
		case CallingConv::AMDGPU_CS:
		case CallingConv::AMDGPU_HS:
		case CallingConv::AMDGPU_ES:
		case CallingConv::AMDGPU_LS:
		return CC_R600;
		default:
		report_fatal_error("Unsupported calling convention.");
		}
		}
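The new `CCAssignFnForCall` maps calling conventions to three outcomes:

- Kernel-style conventions go to `CC_R600_Kernel`. For now this includes `C`, `Fast` and `Cold`, which arsenm suggests rejecting later.
- Graphics shader conventions go to `CC_R600`.
- Everything else is rejected.

The sketch below reproduces that mapping. It assumes a hypothetical `CC` enum in place of `CallingConv::ID` and throws an exception in place of `report_fatal_error`:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical subset of calling conventions; LLVM uses CallingConv::ID.
enum class CC {
  AMDGPU_KERNEL, SPIR_KERNEL, C, Fast, Cold,
  AMDGPU_VS, AMDGPU_GS, AMDGPU_PS, AMDGPU_CS,
  AMDGPU_HS, AMDGPU_ES, AMDGPU_LS,
  Other // anything the R600 backend does not handle
};

// Returns the name of the CCAssignFn table the new code would select.
// Throwing stands in for report_fatal_error.
std::string ccAssignFnName(CC Conv) {
  switch (Conv) {
  case CC::AMDGPU_KERNEL:
  case CC::SPIR_KERNEL:
  case CC::C: // accepted for now; the review suggests rejecting these later
  case CC::Fast:
  case CC::Cold:
    return "CC_R600_Kernel";
  case CC::AMDGPU_VS:
  case CC::AMDGPU_GS:
  case CC::AMDGPU_PS:
  case CC::AMDGPU_CS:
  case CC::AMDGPU_HS:
  case CC::AMDGPU_ES:
  case CC::AMDGPU_LS:
    return "CC_R600";
  default:
    throw std::runtime_error("Unsupported calling convention.");
  }
}
```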

/// XXX Only kernel functions are supported, so we can assume for now that		/// XXX Only kernel functions are supported, so we can assume for now that
/// every function is a kernel function, but in the future we should use		/// every function is a kernel function, but in the future we should use
/// separate calling conventions for kernel and non-kernel functions.		/// separate calling conventions for kernel and non-kernel functions.
SDValue R600TargetLowering::LowerFormalArguments(		SDValue R600TargetLowering::LowerFormalArguments(
SDValue Chain, CallingConv::ID CallConv, bool isVarArg,		SDValue Chain, CallingConv::ID CallConv, bool isVarArg,
const SmallVectorImpl<ISD::InputArg> &Ins, const SDLoc &DL,		const SmallVectorImpl<ISD::InputArg> &Ins, const SDLoc &DL,
SelectionDAG &DAG, SmallVectorImpl<SDValue> &InVals) const {		SelectionDAG &DAG, SmallVectorImpl<SDValue> &InVals) const {
SmallVector<CCValAssign, 16> ArgLocs;		SmallVector<CCValAssign, 16> ArgLocs;
[16 lines not shown]	for (unsigned i = 0, e = Ins.size(); i < e; ++i) {
EVT VT = In.VT;		EVT VT = In.VT;
EVT MemVT = VA.getLocVT();		EVT MemVT = VA.getLocVT();
if (!VT.isVector() && MemVT.isVector()) {		if (!VT.isVector() && MemVT.isVector()) {
// Get load source type if scalarized.		// Get load source type if scalarized.
MemVT = MemVT.getVectorElementType();		MemVT = MemVT.getVectorElementType();
}		}

if (AMDGPU::isShader(CallConv)) {		if (AMDGPU::isShader(CallConv)) {
unsigned Reg = MF.addLiveIn(VA.getLocReg(), &AMDGPU::R600_Reg128RegClass);		unsigned Reg = MF.addLiveIn(VA.getLocReg(), &R600::R600_Reg128RegClass);
SDValue Register = DAG.getCopyFromReg(Chain, DL, Reg, VT);		SDValue Register = DAG.getCopyFromReg(Chain, DL, Reg, VT);
InVals.push_back(Register);		InVals.push_back(Register);
continue;		continue;
}		}

PointerType *PtrTy = PointerType::get(VT.getTypeForEVT(*DAG.getContext()),		PointerType *PtrTy = PointerType::get(VT.getTypeForEVT(*DAG.getContext()),
AMDGPUASI.CONSTANT_BUFFER_0);		AMDGPUASI.CONSTANT_BUFFER_0);

[401 lines not shown]	SDValue R600TargetLowering::PerformDAGCombine(SDNode *N,

return AMDGPUTargetLowering::PerformDAGCombine(N, DCI);		return AMDGPUTargetLowering::PerformDAGCombine(N, DCI);
}		}

bool R600TargetLowering::FoldOperand(SDNode *ParentNode, unsigned SrcIdx,		bool R600TargetLowering::FoldOperand(SDNode *ParentNode, unsigned SrcIdx,
SDValue &Src, SDValue &Neg, SDValue &Abs,		SDValue &Src, SDValue &Neg, SDValue &Abs,
SDValue &Sel, SDValue &Imm,		SDValue &Sel, SDValue &Imm,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
const R600InstrInfo *TII = getSubtarget()->getInstrInfo();		const R600InstrInfo *TII = Subtarget->getInstrInfo();
if (!Src.isMachineOpcode())		if (!Src.isMachineOpcode())
return false;		return false;

switch (Src.getMachineOpcode()) {		switch (Src.getMachineOpcode()) {
case AMDGPU::FNEG_R600:		case R600::FNEG_R600:
if (!Neg.getNode())		if (!Neg.getNode())
return false;		return false;
Src = Src.getOperand(0);		Src = Src.getOperand(0);
Neg = DAG.getTargetConstant(1, SDLoc(ParentNode), MVT::i32);		Neg = DAG.getTargetConstant(1, SDLoc(ParentNode), MVT::i32);
return true;		return true;
case AMDGPU::FABS_R600:		case R600::FABS_R600:
if (!Abs.getNode())		if (!Abs.getNode())
return false;		return false;
Src = Src.getOperand(0);		Src = Src.getOperand(0);
Abs = DAG.getTargetConstant(1, SDLoc(ParentNode), MVT::i32);		Abs = DAG.getTargetConstant(1, SDLoc(ParentNode), MVT::i32);
return true;		return true;
case AMDGPU::CONST_COPY: {		case R600::CONST_COPY: {
unsigned Opcode = ParentNode->getMachineOpcode();		unsigned Opcode = ParentNode->getMachineOpcode();
bool HasDst = TII->getOperandIdx(Opcode, AMDGPU::OpName::dst) > -1;		bool HasDst = TII->getOperandIdx(Opcode, R600::OpName::dst) > -1;

if (!Sel.getNode())		if (!Sel.getNode())
return false;		return false;

SDValue CstOffset = Src.getOperand(0);		SDValue CstOffset = Src.getOperand(0);
if (ParentNode->getValueType(0).isVector())		if (ParentNode->getValueType(0).isVector())
return false;		return false;

// Gather constants values		// Gather constants values
int SrcIndices[] = {		int SrcIndices[] = {
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0),		TII->getOperandIdx(Opcode, R600::OpName::src0),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1),		TII->getOperandIdx(Opcode, R600::OpName::src1),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src2),		TII->getOperandIdx(Opcode, R600::OpName::src2),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0_X),		TII->getOperandIdx(Opcode, R600::OpName::src0_X),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0_Y),		TII->getOperandIdx(Opcode, R600::OpName::src0_Y),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0_Z),		TII->getOperandIdx(Opcode, R600::OpName::src0_Z),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0_W),		TII->getOperandIdx(Opcode, R600::OpName::src0_W),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1_X),		TII->getOperandIdx(Opcode, R600::OpName::src1_X),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1_Y),		TII->getOperandIdx(Opcode, R600::OpName::src1_Y),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1_Z),		TII->getOperandIdx(Opcode, R600::OpName::src1_Z),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1_W)		TII->getOperandIdx(Opcode, R600::OpName::src1_W)
};		};
std::vector<unsigned> Consts;		std::vector<unsigned> Consts;
for (int OtherSrcIdx : SrcIndices) {		for (int OtherSrcIdx : SrcIndices) {
int OtherSelIdx = TII->getSelIdx(Opcode, OtherSrcIdx);		int OtherSelIdx = TII->getSelIdx(Opcode, OtherSrcIdx);
if (OtherSrcIdx < 0 || OtherSelIdx < 0)		if (OtherSrcIdx < 0 || OtherSelIdx < 0)
continue;		continue;
if (HasDst) {		if (HasDst) {
OtherSrcIdx--;		OtherSrcIdx--;
OtherSelIdx--;		OtherSelIdx--;
}		}
if (RegisterSDNode *Reg =		if (RegisterSDNode *Reg =
dyn_cast<RegisterSDNode>(ParentNode->getOperand(OtherSrcIdx))) {		dyn_cast<RegisterSDNode>(ParentNode->getOperand(OtherSrcIdx))) {
if (Reg->getReg() == AMDGPU::ALU_CONST) {		if (Reg->getReg() == R600::ALU_CONST) {
ConstantSDNode *Cst		ConstantSDNode *Cst
= cast<ConstantSDNode>(ParentNode->getOperand(OtherSelIdx));		= cast<ConstantSDNode>(ParentNode->getOperand(OtherSelIdx));
Consts.push_back(Cst->getZExtValue());		Consts.push_back(Cst->getZExtValue());
}		}
}		}
}		}

ConstantSDNode *Cst = cast<ConstantSDNode>(CstOffset);		ConstantSDNode *Cst = cast<ConstantSDNode>(CstOffset);
Consts.push_back(Cst->getZExtValue());		Consts.push_back(Cst->getZExtValue());
if (!TII->fitsConstReadLimitations(Consts)) {		if (!TII->fitsConstReadLimitations(Consts)) {
return false;		return false;
}		}

Sel = CstOffset;		Sel = CstOffset;
Src = DAG.getRegister(AMDGPU::ALU_CONST, MVT::f32);		Src = DAG.getRegister(R600::ALU_CONST, MVT::f32);
return true;		return true;
}		}
case AMDGPU::MOV_IMM_GLOBAL_ADDR:		case R600::MOV_IMM_GLOBAL_ADDR:
// Check if the Imm slot is used. Taken from below.		// Check if the Imm slot is used. Taken from below.
if (cast<ConstantSDNode>(Imm)->getZExtValue())		if (cast<ConstantSDNode>(Imm)->getZExtValue())
return false;		return false;
Imm = Src.getOperand(0);		Imm = Src.getOperand(0);
Src = DAG.getRegister(AMDGPU::ALU_LITERAL_X, MVT::i32);		Src = DAG.getRegister(R600::ALU_LITERAL_X, MVT::i32);
return true;		return true;
case AMDGPU::MOV_IMM_I32:		case R600::MOV_IMM_I32:
case AMDGPU::MOV_IMM_F32: {		case R600::MOV_IMM_F32: {
unsigned ImmReg = AMDGPU::ALU_LITERAL_X;		unsigned ImmReg = R600::ALU_LITERAL_X;
uint64_t ImmValue = 0;		uint64_t ImmValue = 0;

if (Src.getMachineOpcode() == AMDGPU::MOV_IMM_F32) {		if (Src.getMachineOpcode() == R600::MOV_IMM_F32) {
ConstantFPSDNode *FPC = dyn_cast<ConstantFPSDNode>(Src.getOperand(0));		ConstantFPSDNode *FPC = dyn_cast<ConstantFPSDNode>(Src.getOperand(0));
float FloatValue = FPC->getValueAPF().convertToFloat();		float FloatValue = FPC->getValueAPF().convertToFloat();
if (FloatValue == 0.0) {		if (FloatValue == 0.0) {
ImmReg = AMDGPU::ZERO;		ImmReg = R600::ZERO;
} else if (FloatValue == 0.5) {		} else if (FloatValue == 0.5) {
ImmReg = AMDGPU::HALF;		ImmReg = R600::HALF;
} else if (FloatValue == 1.0) {		} else if (FloatValue == 1.0) {
ImmReg = AMDGPU::ONE;		ImmReg = R600::ONE;
} else {		} else {
ImmValue = FPC->getValueAPF().bitcastToAPInt().getZExtValue();		ImmValue = FPC->getValueAPF().bitcastToAPInt().getZExtValue();
}		}
} else {		} else {
ConstantSDNode *C = dyn_cast<ConstantSDNode>(Src.getOperand(0));		ConstantSDNode *C = dyn_cast<ConstantSDNode>(Src.getOperand(0));
uint64_t Value = C->getZExtValue();		uint64_t Value = C->getZExtValue();
if (Value == 0) {		if (Value == 0) {
ImmReg = AMDGPU::ZERO;		ImmReg = R600::ZERO;
} else if (Value == 1) {		} else if (Value == 1) {
ImmReg = AMDGPU::ONE_INT;		ImmReg = R600::ONE_INT;
} else {		} else {
ImmValue = Value;		ImmValue = Value;
}		}
}		}

// Check that we aren't already using an immediate.		// Check that we aren't already using an immediate.
// XXX: It's possible for an instruction to have more than one		// XXX: It's possible for an instruction to have more than one
// immediate operand, but this is not supported yet.		// immediate operand, but this is not supported yet.
if (ImmReg == AMDGPU::ALU_LITERAL_X) {		if (ImmReg == R600::ALU_LITERAL_X) {
if (!Imm.getNode())		if (!Imm.getNode())
return false;		return false;
ConstantSDNode *C = dyn_cast<ConstantSDNode>(Imm);		ConstantSDNode *C = dyn_cast<ConstantSDNode>(Imm);
assert(C);		assert(C);
if (C->getZExtValue())		if (C->getZExtValue())
return false;		return false;
Imm = DAG.getTargetConstant(ImmValue, SDLoc(ParentNode), MVT::i32);		Imm = DAG.getTargetConstant(ImmValue, SDLoc(ParentNode), MVT::i32);
}		}
Src = DAG.getRegister(ImmReg, MVT::i32);		Src = DAG.getRegister(ImmReg, MVT::i32);
return true;		return true;
}		}
default:		default:
return false;		return false;
}		}
}		}
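For `MOV_IMM_F32` and `MOV_IMM_I32`, `FoldOperand` prefers R600's inline constant registers (`ZERO`, `HALF`, `ONE`, `ONE_INT`). It falls back to `ALU_LITERAL_X` with the raw bit pattern only when the value has no inline form. The sketch below covers just that selection step, using a hypothetical `ImmSource` enum. It omits the real code's check that the instruction's literal slot is still free:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical labels for where the folded immediate ends up.
enum class ImmSource { ZERO, HALF, ONE, ONE_INT, LITERAL };

struct FoldedImm {
  ImmSource Src;
  std::uint64_t LiteralBits; // meaningful only when Src == LITERAL
};

static_assert(sizeof(float) == sizeof(std::uint32_t), "32-bit float assumed");

// MOV_IMM_F32: 0.0, 0.5 and 1.0 have inline registers; anything else is
// emitted as a literal holding the IEEE-754 bit pattern. (-0.0f compares
// equal to 0.0f, so it also takes the ZERO path here.)
FoldedImm foldFloatImm(float V) {
  if (V == 0.0f)
    return {ImmSource::ZERO, 0};
  if (V == 0.5f)
    return {ImmSource::HALF, 0};
  if (V == 1.0f)
    return {ImmSource::ONE, 0};
  std::uint32_t Bits;
  std::memcpy(&Bits, &V, sizeof(Bits));
  return {ImmSource::LITERAL, Bits};
}

// MOV_IMM_I32: only 0 and 1 have inline registers.
FoldedImm foldIntImm(std::uint64_t V) {
  if (V == 0)
    return {ImmSource::ZERO, 0};
  if (V == 1)
    return {ImmSource::ONE_INT, 0};
  return {ImmSource::LITERAL, V};
}
```

For example, `2.0f` has no inline register, so it becomes a literal with bits `0x40000000`.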

/// \brief Fold the instructions after selecting them		/// \brief Fold the instructions after selecting them
SDNode *R600TargetLowering::PostISelFolding(MachineSDNode *Node,		SDNode *R600TargetLowering::PostISelFolding(MachineSDNode *Node,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
const R600InstrInfo *TII = getSubtarget()->getInstrInfo();		const R600InstrInfo *TII = Subtarget->getInstrInfo();
if (!Node->isMachineOpcode())		if (!Node->isMachineOpcode())
return Node;		return Node;

unsigned Opcode = Node->getMachineOpcode();		unsigned Opcode = Node->getMachineOpcode();
SDValue FakeOp;		SDValue FakeOp;

std::vector<SDValue> Ops(Node->op_begin(), Node->op_end());		std::vector<SDValue> Ops(Node->op_begin(), Node->op_end());

if (Opcode == AMDGPU::DOT_4) {		if (Opcode == R600::DOT_4) {
int OperandIdx[] = {		int OperandIdx[] = {
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0_X),		TII->getOperandIdx(Opcode, R600::OpName::src0_X),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0_Y),		TII->getOperandIdx(Opcode, R600::OpName::src0_Y),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0_Z),		TII->getOperandIdx(Opcode, R600::OpName::src0_Z),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0_W),		TII->getOperandIdx(Opcode, R600::OpName::src0_W),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1_X),		TII->getOperandIdx(Opcode, R600::OpName::src1_X),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1_Y),		TII->getOperandIdx(Opcode, R600::OpName::src1_Y),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1_Z),		TII->getOperandIdx(Opcode, R600::OpName::src1_Z),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1_W)		TII->getOperandIdx(Opcode, R600::OpName::src1_W)
};		};
int NegIdx[] = {		int NegIdx[] = {
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0_neg_X),		TII->getOperandIdx(Opcode, R600::OpName::src0_neg_X),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0_neg_Y),		TII->getOperandIdx(Opcode, R600::OpName::src0_neg_Y),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0_neg_Z),		TII->getOperandIdx(Opcode, R600::OpName::src0_neg_Z),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0_neg_W),		TII->getOperandIdx(Opcode, R600::OpName::src0_neg_W),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1_neg_X),		TII->getOperandIdx(Opcode, R600::OpName::src1_neg_X),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1_neg_Y),		TII->getOperandIdx(Opcode, R600::OpName::src1_neg_Y),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1_neg_Z),		TII->getOperandIdx(Opcode, R600::OpName::src1_neg_Z),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1_neg_W)		TII->getOperandIdx(Opcode, R600::OpName::src1_neg_W)
};		};
int AbsIdx[] = {		int AbsIdx[] = {
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0_abs_X),		TII->getOperandIdx(Opcode, R600::OpName::src0_abs_X),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0_abs_Y),		TII->getOperandIdx(Opcode, R600::OpName::src0_abs_Y),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0_abs_Z),		TII->getOperandIdx(Opcode, R600::OpName::src0_abs_Z),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0_abs_W),		TII->getOperandIdx(Opcode, R600::OpName::src0_abs_W),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1_abs_X),		TII->getOperandIdx(Opcode, R600::OpName::src1_abs_X),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1_abs_Y),		TII->getOperandIdx(Opcode, R600::OpName::src1_abs_Y),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1_abs_Z),		TII->getOperandIdx(Opcode, R600::OpName::src1_abs_Z),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1_abs_W)		TII->getOperandIdx(Opcode, R600::OpName::src1_abs_W)
};		};
for (unsigned i = 0; i < 8; i++) {		for (unsigned i = 0; i < 8; i++) {
if (OperandIdx[i] < 0)		if (OperandIdx[i] < 0)
return Node;		return Node;
SDValue &Src = Ops[OperandIdx[i] - 1];		SDValue &Src = Ops[OperandIdx[i] - 1];
SDValue &Neg = Ops[NegIdx[i] - 1];		SDValue &Neg = Ops[NegIdx[i] - 1];
SDValue &Abs = Ops[AbsIdx[i] - 1];		SDValue &Abs = Ops[AbsIdx[i] - 1];
bool HasDst = TII->getOperandIdx(Opcode, AMDGPU::OpName::dst) > -1;		bool HasDst = TII->getOperandIdx(Opcode, R600::OpName::dst) > -1;
int SelIdx = TII->getSelIdx(Opcode, OperandIdx[i]);		int SelIdx = TII->getSelIdx(Opcode, OperandIdx[i]);
if (HasDst)		if (HasDst)
SelIdx--;		SelIdx--;
SDValue &Sel = (SelIdx > -1) ? Ops[SelIdx] : FakeOp;		SDValue &Sel = (SelIdx > -1) ? Ops[SelIdx] : FakeOp;
if (FoldOperand(Node, i, Src, Neg, Abs, Sel, FakeOp, DAG))		if (FoldOperand(Node, i, Src, Neg, Abs, Sel, FakeOp, DAG))
return DAG.getMachineNode(Opcode, SDLoc(Node), Node->getVTList(), Ops);		return DAG.getMachineNode(Opcode, SDLoc(Node), Node->getVTList(), Ops);
}		}
} else if (Opcode == AMDGPU::REG_SEQUENCE) {		} else if (Opcode == R600::REG_SEQUENCE) {
for (unsigned i = 1, e = Node->getNumOperands(); i < e; i += 2) {		for (unsigned i = 1, e = Node->getNumOperands(); i < e; i += 2) {
SDValue &Src = Ops[i];		SDValue &Src = Ops[i];
if (FoldOperand(Node, i, Src, FakeOp, FakeOp, FakeOp, FakeOp, DAG))		if (FoldOperand(Node, i, Src, FakeOp, FakeOp, FakeOp, FakeOp, DAG))
return DAG.getMachineNode(Opcode, SDLoc(Node), Node->getVTList(), Ops);		return DAG.getMachineNode(Opcode, SDLoc(Node), Node->getVTList(), Ops);
}		}
} else if (Opcode == AMDGPU::CLAMP_R600) {		} else if (Opcode == R600::CLAMP_R600) {
SDValue Src = Node->getOperand(0);		SDValue Src = Node->getOperand(0);
if (!Src.isMachineOpcode() ||		if (!Src.isMachineOpcode() ||
!TII->hasInstrModifiers(Src.getMachineOpcode()))		!TII->hasInstrModifiers(Src.getMachineOpcode()))
return Node;		return Node;
int ClampIdx = TII->getOperandIdx(Src.getMachineOpcode(),		int ClampIdx = TII->getOperandIdx(Src.getMachineOpcode(),
AMDGPU::OpName::clamp);		R600::OpName::clamp);
if (ClampIdx < 0)		if (ClampIdx < 0)
return Node;		return Node;
SDLoc DL(Node);		SDLoc DL(Node);
std::vector<SDValue> Ops(Src->op_begin(), Src->op_end());		std::vector<SDValue> Ops(Src->op_begin(), Src->op_end());
Ops[ClampIdx - 1] = DAG.getTargetConstant(1, DL, MVT::i32);		Ops[ClampIdx - 1] = DAG.getTargetConstant(1, DL, MVT::i32);
return DAG.getMachineNode(Src.getMachineOpcode(), DL,		return DAG.getMachineNode(Src.getMachineOpcode(), DL,
Node->getVTList(), Ops);		Node->getVTList(), Ops);
} else {		} else {
if (!TII->hasInstrModifiers(Opcode))		if (!TII->hasInstrModifiers(Opcode))
return Node;		return Node;
int OperandIdx[] = {		int OperandIdx[] = {
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0),		TII->getOperandIdx(Opcode, R600::OpName::src0),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1),		TII->getOperandIdx(Opcode, R600::OpName::src1),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src2)		TII->getOperandIdx(Opcode, R600::OpName::src2)
};		};
int NegIdx[] = {		int NegIdx[] = {
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0_neg),		TII->getOperandIdx(Opcode, R600::OpName::src0_neg),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1_neg),		TII->getOperandIdx(Opcode, R600::OpName::src1_neg),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src2_neg)		TII->getOperandIdx(Opcode, R600::OpName::src2_neg)
};		};
int AbsIdx[] = {		int AbsIdx[] = {
TII->getOperandIdx(Opcode, AMDGPU::OpName::src0_abs),		TII->getOperandIdx(Opcode, R600::OpName::src0_abs),
TII->getOperandIdx(Opcode, AMDGPU::OpName::src1_abs),		TII->getOperandIdx(Opcode, R600::OpName::src1_abs),
-1		-1
};		};
for (unsigned i = 0; i < 3; i++) {		for (unsigned i = 0; i < 3; i++) {
if (OperandIdx[i] < 0)		if (OperandIdx[i] < 0)
return Node;		return Node;
SDValue &Src = Ops[OperandIdx[i] - 1];		SDValue &Src = Ops[OperandIdx[i] - 1];
SDValue &Neg = Ops[NegIdx[i] - 1];		SDValue &Neg = Ops[NegIdx[i] - 1];
SDValue FakeAbs;		SDValue FakeAbs;
SDValue &Abs = (AbsIdx[i] > -1) ? Ops[AbsIdx[i] - 1] : FakeAbs;		SDValue &Abs = (AbsIdx[i] > -1) ? Ops[AbsIdx[i] - 1] : FakeAbs;
bool HasDst = TII->getOperandIdx(Opcode, AMDGPU::OpName::dst) > -1;		bool HasDst = TII->getOperandIdx(Opcode, R600::OpName::dst) > -1;
int SelIdx = TII->getSelIdx(Opcode, OperandIdx[i]);		int SelIdx = TII->getSelIdx(Opcode, OperandIdx[i]);
int ImmIdx = TII->getOperandIdx(Opcode, AMDGPU::OpName::literal);		int ImmIdx = TII->getOperandIdx(Opcode, R600::OpName::literal);
if (HasDst) {		if (HasDst) {
SelIdx--;		SelIdx--;
ImmIdx--;		ImmIdx--;
}		}
SDValue &Sel = (SelIdx > -1) ? Ops[SelIdx] : FakeOp;		SDValue &Sel = (SelIdx > -1) ? Ops[SelIdx] : FakeOp;
SDValue &Imm = Ops[ImmIdx];		SDValue &Imm = Ops[ImmIdx];
if (FoldOperand(Node, i, Src, Neg, Abs, Sel, Imm, DAG))		if (FoldOperand(Node, i, Src, Neg, Abs, Sel, Imm, DAG))
return DAG.getMachineNode(Opcode, SDLoc(Node), Node->getVTList(), Ops);		return DAG.getMachineNode(Opcode, SDLoc(Node), Node->getVTList(), Ops);
}		}
}		}

return Node;		return Node;
}		}
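The `- 1` and `SelIdx--` adjustments in the folding code above appear to account for the destination operand. A MachineInstr's operand list starts with its def, but the `Ops` list copied from a machine SDNode holds only inputs. The minimal sketch below models that mapping. `OperandLayout`, `getOperandIdx` and `toNodeOperandSlot` are hypothetical stand-ins, not the TableGen-generated named-operand tables or `R600InstrInfo::getOperandIdx`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical operand layout of one opcode, in MachineInstr order
// (the def, if any, comes first).
struct OperandLayout {
  bool HasDst;
  std::vector<std::string> Names;
};

// Stand-in for a named-operand lookup: returns the MachineInstr operand
// index of Name, or -1 if the opcode has no such operand.
int getOperandIdx(const OperandLayout &L, const std::string &Name) {
  for (std::size_t I = 0; I < L.Names.size(); ++I)
    if (L.Names[I] == Name)
      return static_cast<int>(I);
  return -1;
}

// Converts a MachineInstr operand index into a slot in the SDNode operand
// list, which excludes the def. A missing operand stays -1 so the caller
// can fall back to a dummy value instead of indexing out of range.
int toNodeOperandSlot(const OperandLayout &L, int MIIdx) {
  if (MIIdx < 0)
    return -1;
  return L.HasDst ? MIIdx - 1 : MIIdx;
}
```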

lib/Target/AMDGPU/R600InstrFormats.td

class InstR600 <dag outs, dag ins, string asm, list<dag> pattern,
bit HasNativeOperands = 0;		bit HasNativeOperands = 0;
bit VTXInst = 0;		bit VTXInst = 0;
bit TEXInst = 0;		bit TEXInst = 0;
bit ALUInst = 0;		bit ALUInst = 0;
bit IsExport = 0;		bit IsExport = 0;
bit LDS_1A2D = 0;		bit LDS_1A2D = 0;

let SubtargetPredicate = isR600toCayman;		let SubtargetPredicate = isR600toCayman;
let Namespace = "AMDGPU";		let Namespace = "R600";
let OutOperandList = outs;		let OutOperandList = outs;
let InOperandList = ins;		let InOperandList = ins;
let AsmString = asm;		let AsmString = asm;
let Pattern = pattern;		let Pattern = pattern;
let Itinerary = itin;		let Itinerary = itin;

// No AsmMatcher support.		// No AsmMatcher support.
let isCodeGenOnly = 1;		let isCodeGenOnly = 1;

lib/Target/AMDGPU/R600InstrInfo.h

/// \file		/// \file
/// \brief Interface definition for R600InstrInfo		/// \brief Interface definition for R600InstrInfo
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_LIB_TARGET_AMDGPU_R600INSTRINFO_H		#ifndef LLVM_LIB_TARGET_AMDGPU_R600INSTRINFO_H
#define LLVM_LIB_TARGET_AMDGPU_R600INSTRINFO_H		#define LLVM_LIB_TARGET_AMDGPU_R600INSTRINFO_H

#include "AMDGPUInstrInfo.h"
#include "R600RegisterInfo.h"		#include "R600RegisterInfo.h"
		#include "llvm/CodeGen/TargetInstrInfo.h"

		#define GET_INSTRINFO_HEADER
		#include "R600GenInstrInfo.inc"

namespace llvm {		namespace llvm {

namespace R600InstrFlags {		namespace R600InstrFlags {
enum : uint64_t {		enum : uint64_t {
REGISTER_STORE = UINT64_C(1) << 62,		REGISTER_STORE = UINT64_C(1) << 62,
REGISTER_LOAD = UINT64_C(1) << 63		REGISTER_LOAD = UINT64_C(1) << 63
};		};
}		}

class AMDGPUTargetMachine;		class AMDGPUTargetMachine;
class DFAPacketizer;		class DFAPacketizer;
class MachineFunction;		class MachineFunction;
class MachineInstr;		class MachineInstr;
class MachineInstrBuilder;		class MachineInstrBuilder;
class R600Subtarget;		class R600Subtarget;

class R600InstrInfo final : public AMDGPUInstrInfo {		class R600InstrInfo final : public R600GenInstrInfo {
private:		private:
const R600RegisterInfo RI;		const R600RegisterInfo RI;
const R600Subtarget &ST;		const R600Subtarget &ST;

std::vector<std::pair<int, unsigned>>		std::vector<std::pair<int, unsigned>>
ExtractSrcs(MachineInstr &MI, const DenseMap<unsigned, unsigned> &PV,		ExtractSrcs(MachineInstr &MI, const DenseMap<unsigned, unsigned> &PV,
unsigned &ConstCount) const;		unsigned &ConstCount) const;

public:
bool isRegisterLoad(const MachineInstr &MI) const {		bool isRegisterLoad(const MachineInstr &MI) const {
return get(MI.getOpcode()).TSFlags & R600InstrFlags::REGISTER_LOAD;		return get(MI.getOpcode()).TSFlags & R600InstrFlags::REGISTER_LOAD;
}		}

unsigned getAddressSpaceForPseudoSourceKind(		unsigned getAddressSpaceForPseudoSourceKind(
PseudoSourceValue::PSVKind Kind) const override;		PseudoSourceValue::PSVKind Kind) const override;
};		};

namespace AMDGPU {		namespace R600 {

int getLDSNoRetOp(uint16_t Opcode);		int getLDSNoRetOp(uint16_t Opcode);

} //End namespace AMDGPU		} //End namespace AMDGPU

} // End llvm namespace		} // End llvm namespace

#endif		#endif

lib/Target/AMDGPU/R600InstrInfo.cpp

#include <cstring>		#include <cstring>
#include <iterator>		#include <iterator>
#include <utility>		#include <utility>
#include <vector>		#include <vector>

using namespace llvm;		using namespace llvm;

#define GET_INSTRINFO_CTOR_DTOR		#define GET_INSTRINFO_CTOR_DTOR
#include "AMDGPUGenDFAPacketizer.inc"		#include "R600GenDFAPacketizer.inc"

		#define GET_INSTRINFO_CTOR_DTOR
		#define GET_INSTRMAP_INFO
		#define GET_INSTRINFO_NAMED_OPS
		#include "R600GenInstrInfo.inc"

R600InstrInfo::R600InstrInfo(const R600Subtarget &ST)		R600InstrInfo::R600InstrInfo(const R600Subtarget &ST)
: AMDGPUInstrInfo(ST), RI(), ST(ST) {}		: R600GenInstrInfo(-1, -1), RI(), ST(ST) {}

bool R600InstrInfo::isVector(const MachineInstr &MI) const {		bool R600InstrInfo::isVector(const MachineInstr &MI) const {
return get(MI.getOpcode()).TSFlags & R600_InstFlag::VECTOR;		return get(MI.getOpcode()).TSFlags & R600_InstFlag::VECTOR;
}		}

void R600InstrInfo::copyPhysReg(MachineBasicBlock &MBB,		void R600InstrInfo::copyPhysReg(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MI,		MachineBasicBlock::iterator MI,
const DebugLoc &DL, unsigned DestReg,		const DebugLoc &DL, unsigned DestReg,
unsigned SrcReg, bool KillSrc) const {		unsigned SrcReg, bool KillSrc) const {
unsigned VectorComponents = 0;		unsigned VectorComponents = 0;
if ((AMDGPU::R600_Reg128RegClass.contains(DestReg) ||		if ((R600::R600_Reg128RegClass.contains(DestReg) ||
AMDGPU::R600_Reg128VerticalRegClass.contains(DestReg)) &&		R600::R600_Reg128VerticalRegClass.contains(DestReg)) &&
(AMDGPU::R600_Reg128RegClass.contains(SrcReg) ||		(R600::R600_Reg128RegClass.contains(SrcReg) ||
AMDGPU::R600_Reg128VerticalRegClass.contains(SrcReg))) {		R600::R600_Reg128VerticalRegClass.contains(SrcReg))) {
VectorComponents = 4;		VectorComponents = 4;
} else if((AMDGPU::R600_Reg64RegClass.contains(DestReg) ||		} else if((R600::R600_Reg64RegClass.contains(DestReg) ||
AMDGPU::R600_Reg64VerticalRegClass.contains(DestReg)) &&		R600::R600_Reg64VerticalRegClass.contains(DestReg)) &&
(AMDGPU::R600_Reg64RegClass.contains(SrcReg) ||		(R600::R600_Reg64RegClass.contains(SrcReg) ||
AMDGPU::R600_Reg64VerticalRegClass.contains(SrcReg))) {		R600::R600_Reg64VerticalRegClass.contains(SrcReg))) {
VectorComponents = 2;		VectorComponents = 2;
}		}

if (VectorComponents > 0) {		if (VectorComponents > 0) {
for (unsigned I = 0; I < VectorComponents; I++) {		for (unsigned I = 0; I < VectorComponents; I++) {
unsigned SubRegIndex = AMDGPURegisterInfo::getSubRegFromChannel(I);		unsigned SubRegIndex = AMDGPURegisterInfo::getSubRegFromChannel(I);
buildDefaultInstruction(MBB, MI, AMDGPU::MOV,		buildDefaultInstruction(MBB, MI, R600::MOV,
RI.getSubReg(DestReg, SubRegIndex),		RI.getSubReg(DestReg, SubRegIndex),
RI.getSubReg(SrcReg, SubRegIndex))		RI.getSubReg(SrcReg, SubRegIndex))
.addReg(DestReg,		.addReg(DestReg,
RegState::Define | RegState::Implicit);		RegState::Define | RegState::Implicit);
}		}
} else {		} else {
MachineInstr *NewMI = buildDefaultInstruction(MBB, MI, AMDGPU::MOV,		MachineInstr *NewMI = buildDefaultInstruction(MBB, MI, R600::MOV,
DestReg, SrcReg);		DestReg, SrcReg);
NewMI->getOperand(getOperandIdx(*NewMI, AMDGPU::OpName::src0))		NewMI->getOperand(getOperandIdx(*NewMI, R600::OpName::src0))
.setIsKill(KillSrc);		.setIsKill(KillSrc);
}		}
}		}
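`copyPhysReg` above splits a copy into per-channel `MOV`s only when both registers are in the 128-bit classes (4 channels) or both are in the 64-bit classes (2 channels). Every other pair gets a single whole-register `MOV`. The sketch below models only that decision. It uses a hypothetical `RegWidth` enum and string register names, and it omits the implicit full-register def that the real code attaches to each channel `MOV`.

```cpp
#include <string>
#include <vector>

// Hypothetical widths standing in for the R600_Reg128 / R600_Reg64
// register classes (and their Vertical variants) tested above.
enum class RegWidth { Scalar, Vec2, Vec4 };

struct Move {
  std::string Dst, Src;
};

// 4 channel moves only if both sides are 128-bit, 2 only if both are
// 64-bit, otherwise 0 (meaning: one whole-register MOV).
unsigned vectorComponents(RegWidth Dst, RegWidth Src) {
  if (Dst == RegWidth::Vec4 && Src == RegWidth::Vec4)
    return 4;
  if (Dst == RegWidth::Vec2 && Src == RegWidth::Vec2)
    return 2;
  return 0;
}

// Expands a copy into moves; the ".x/.y/.z/.w" suffixes are illustrative.
std::vector<Move> expandCopy(const std::string &Dst, RegWidth DW,
                             const std::string &Src, RegWidth SW) {
  static const char *Chan[] = {".x", ".y", ".z", ".w"};
  std::vector<Move> Out;
  unsigned N = vectorComponents(DW, SW);
  if (N == 0) {
    Out.push_back({Dst, Src});
    return Out;
  }
  for (unsigned I = 0; I < N; ++I)
    Out.push_back({Dst + Chan[I], Src + Chan[I]});
  return Out;
}
```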

/// \returns true if \p MBBI can be moved into a new basic.		/// \returns true if \p MBBI can be moved into a new basic.
bool R600InstrInfo::isLegalToSplitMBBAt(MachineBasicBlock &MBB,		bool R600InstrInfo::isLegalToSplitMBBAt(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MBBI) const {		MachineBasicBlock::iterator MBBI) const {
for (MachineInstr::const_mop_iterator I = MBBI->operands_begin(),		for (MachineInstr::const_mop_iterator I = MBBI->operands_begin(),
E = MBBI->operands_end(); I != E; ++I) {		E = MBBI->operands_end(); I != E; ++I) {
if (I->isReg() && !TargetRegisterInfo::isVirtualRegister(I->getReg()) &&		if (I->isReg() && !TargetRegisterInfo::isVirtualRegister(I->getReg()) &&
I->isUse() && RI.isPhysRegLiveAcrossClauses(I->getReg()))		I->isUse() && RI.isPhysRegLiveAcrossClauses(I->getReg()))
return false;		return false;
}		}
return true;		return true;
}		}

bool R600InstrInfo::isMov(unsigned Opcode) const {		bool R600InstrInfo::isMov(unsigned Opcode) const {
switch(Opcode) {		switch(Opcode) {
default:		default:
return false;		return false;
case AMDGPU::MOV:		case R600::MOV:
case AMDGPU::MOV_IMM_F32:		case R600::MOV_IMM_F32:
case AMDGPU::MOV_IMM_I32:		case R600::MOV_IMM_I32:
return true;		return true;
}		}
}		}

bool R600InstrInfo::isReductionOp(unsigned Opcode) const {		bool R600InstrInfo::isReductionOp(unsigned Opcode) const {
return false;		return false;
}		}

bool R600InstrInfo::isCubeOp(unsigned Opcode) const {		bool R600InstrInfo::isCubeOp(unsigned Opcode) const {
switch(Opcode) {		switch(Opcode) {
default: return false;		default: return false;
case AMDGPU::CUBE_r600_pseudo:		case R600::CUBE_r600_pseudo:
case AMDGPU::CUBE_r600_real:		case R600::CUBE_r600_real:
case AMDGPU::CUBE_eg_pseudo:		case R600::CUBE_eg_pseudo:
case AMDGPU::CUBE_eg_real:		case R600::CUBE_eg_real:
return true;		return true;
}		}
}		}

bool R600InstrInfo::isALUInstr(unsigned Opcode) const {		bool R600InstrInfo::isALUInstr(unsigned Opcode) const {
unsigned TargetFlags = get(Opcode).TSFlags;		unsigned TargetFlags = get(Opcode).TSFlags;

return (TargetFlags & R600_InstFlag::ALU_INST);		return (TargetFlags & R600_InstFlag::ALU_INST);
bool R600InstrInfo::isLDSInstr(unsigned Opcode) const {
unsigned TargetFlags = get(Opcode).TSFlags;		unsigned TargetFlags = get(Opcode).TSFlags;

return ((TargetFlags & R600_InstFlag::LDS_1A) |		return ((TargetFlags & R600_InstFlag::LDS_1A) |
(TargetFlags & R600_InstFlag::LDS_1A1D) |		(TargetFlags & R600_InstFlag::LDS_1A1D) |
(TargetFlags & R600_InstFlag::LDS_1A2D));		(TargetFlags & R600_InstFlag::LDS_1A2D));
}		}

bool R600InstrInfo::isLDSRetInstr(unsigned Opcode) const {		bool R600InstrInfo::isLDSRetInstr(unsigned Opcode) const {
return isLDSInstr(Opcode) && getOperandIdx(Opcode, AMDGPU::OpName::dst) != -1;		return isLDSInstr(Opcode) && getOperandIdx(Opcode, R600::OpName::dst) != -1;
}		}

bool R600InstrInfo::canBeConsideredALU(const MachineInstr &MI) const {		bool R600InstrInfo::canBeConsideredALU(const MachineInstr &MI) const {
if (isALUInstr(MI.getOpcode()))		if (isALUInstr(MI.getOpcode()))
return true;		return true;
if (isVector(MI) || isCubeOp(MI.getOpcode()))		if (isVector(MI) || isCubeOp(MI.getOpcode()))
return true;		return true;
switch (MI.getOpcode()) {		switch (MI.getOpcode()) {
case AMDGPU::PRED_X:		case R600::PRED_X:
case AMDGPU::INTERP_PAIR_XY:		case R600::INTERP_PAIR_XY:
case AMDGPU::INTERP_PAIR_ZW:		case R600::INTERP_PAIR_ZW:
case AMDGPU::INTERP_VEC_LOAD:		case R600::INTERP_VEC_LOAD:
case AMDGPU::COPY:		case R600::COPY:
case AMDGPU::DOT_4:		case R600::DOT_4:
return true;		return true;
default:		default:
return false;		return false;
}		}
}		}

bool R600InstrInfo::isTransOnly(unsigned Opcode) const {		bool R600InstrInfo::isTransOnly(unsigned Opcode) const {
if (ST.hasCaymanISA())		if (ST.hasCaymanISA())
return false;		return false;
return (get(Opcode).getSchedClass() == AMDGPU::Sched::TransALU);		return (get(Opcode).getSchedClass() == R600::Sched::TransALU);
}		}

bool R600InstrInfo::isTransOnly(const MachineInstr &MI) const {		bool R600InstrInfo::isTransOnly(const MachineInstr &MI) const {
return isTransOnly(MI.getOpcode());		return isTransOnly(MI.getOpcode());
}		}

bool R600InstrInfo::isVectorOnly(unsigned Opcode) const {		bool R600InstrInfo::isVectorOnly(unsigned Opcode) const {
return (get(Opcode).getSchedClass() == AMDGPU::Sched::VecALU);		return (get(Opcode).getSchedClass() == R600::Sched::VecALU);
}		}

bool R600InstrInfo::isVectorOnly(const MachineInstr &MI) const {		bool R600InstrInfo::isVectorOnly(const MachineInstr &MI) const {
return isVectorOnly(MI.getOpcode());		return isVectorOnly(MI.getOpcode());
}		}

bool R600InstrInfo::isExport(unsigned Opcode) const {		bool R600InstrInfo::isExport(unsigned Opcode) const {
return (get(Opcode).TSFlags & R600_InstFlag::IS_EXPORT);		return (get(Opcode).TSFlags & R600_InstFlag::IS_EXPORT);
bool R600InstrInfo::usesTextureCache(const MachineInstr &MI) const {
const MachineFunction *MF = MI.getParent()->getParent();		const MachineFunction *MF = MI.getParent()->getParent();
return (AMDGPU::isCompute(MF->getFunction().getCallingConv()) &&		return (AMDGPU::isCompute(MF->getFunction().getCallingConv()) &&
usesVertexCache(MI.getOpcode())) ||		usesVertexCache(MI.getOpcode())) ||
usesTextureCache(MI.getOpcode());		usesTextureCache(MI.getOpcode());
}		}

bool R600InstrInfo::mustBeLastInClause(unsigned Opcode) const {		bool R600InstrInfo::mustBeLastInClause(unsigned Opcode) const {
switch (Opcode) {		switch (Opcode) {
case AMDGPU::KILLGT:		case R600::KILLGT:
case AMDGPU::GROUP_BARRIER:		case R600::GROUP_BARRIER:
return true;		return true;
default:		default:
return false;		return false;
}		}
}		}

bool R600InstrInfo::usesAddressRegister(MachineInstr &MI) const {		bool R600InstrInfo::usesAddressRegister(MachineInstr &MI) const {
return MI.findRegisterUseOperandIdx(AMDGPU::AR_X) != -1;		return MI.findRegisterUseOperandIdx(R600::AR_X) != -1;
}		}

bool R600InstrInfo::definesAddressRegister(MachineInstr &MI) const {		bool R600InstrInfo::definesAddressRegister(MachineInstr &MI) const {
return MI.findRegisterDefOperandIdx(AMDGPU::AR_X) != -1;		return MI.findRegisterDefOperandIdx(R600::AR_X) != -1;
}		}

bool R600InstrInfo::readsLDSSrcReg(const MachineInstr &MI) const {		bool R600InstrInfo::readsLDSSrcReg(const MachineInstr &MI) const {
if (!isALUInstr(MI.getOpcode())) {		if (!isALUInstr(MI.getOpcode())) {
return false;		return false;
}		}
for (MachineInstr::const_mop_iterator I = MI.operands_begin(),		for (MachineInstr::const_mop_iterator I = MI.operands_begin(),
E = MI.operands_end();		E = MI.operands_end();
I != E; ++I) {		I != E; ++I) {
if (!I->isReg() || !I->isUse() ||		if (!I->isReg() || !I->isUse() ||
TargetRegisterInfo::isVirtualRegister(I->getReg()))		TargetRegisterInfo::isVirtualRegister(I->getReg()))
continue;		continue;

if (AMDGPU::R600_LDS_SRC_REGRegClass.contains(I->getReg()))		if (R600::R600_LDS_SRC_REGRegClass.contains(I->getReg()))
return true;		return true;
}		}
return false;		return false;
}		}

int R600InstrInfo::getSelIdx(unsigned Opcode, unsigned SrcIdx) const {		int R600InstrInfo::getSelIdx(unsigned Opcode, unsigned SrcIdx) const {
static const unsigned SrcSelTable[][2] = {		static const unsigned SrcSelTable[][2] = {
{AMDGPU::OpName::src0, AMDGPU::OpName::src0_sel},		{R600::OpName::src0, R600::OpName::src0_sel},
{AMDGPU::OpName::src1, AMDGPU::OpName::src1_sel},		{R600::OpName::src1, R600::OpName::src1_sel},
{AMDGPU::OpName::src2, AMDGPU::OpName::src2_sel},		{R600::OpName::src2, R600::OpName::src2_sel},
{AMDGPU::OpName::src0_X, AMDGPU::OpName::src0_sel_X},		{R600::OpName::src0_X, R600::OpName::src0_sel_X},
{AMDGPU::OpName::src0_Y, AMDGPU::OpName::src0_sel_Y},		{R600::OpName::src0_Y, R600::OpName::src0_sel_Y},
{AMDGPU::OpName::src0_Z, AMDGPU::OpName::src0_sel_Z},		{R600::OpName::src0_Z, R600::OpName::src0_sel_Z},
{AMDGPU::OpName::src0_W, AMDGPU::OpName::src0_sel_W},		{R600::OpName::src0_W, R600::OpName::src0_sel_W},
{AMDGPU::OpName::src1_X, AMDGPU::OpName::src1_sel_X},		{R600::OpName::src1_X, R600::OpName::src1_sel_X},
{AMDGPU::OpName::src1_Y, AMDGPU::OpName::src1_sel_Y},		{R600::OpName::src1_Y, R600::OpName::src1_sel_Y},
{AMDGPU::OpName::src1_Z, AMDGPU::OpName::src1_sel_Z},		{R600::OpName::src1_Z, R600::OpName::src1_sel_Z},
{AMDGPU::OpName::src1_W, AMDGPU::OpName::src1_sel_W}		{R600::OpName::src1_W, R600::OpName::src1_sel_W}
};		};

for (const auto &Row : SrcSelTable) {		for (const auto &Row : SrcSelTable) {
if (getOperandIdx(Opcode, Row[0]) == (int)SrcIdx) {		if (getOperandIdx(Opcode, Row[0]) == (int)SrcIdx) {
return getOperandIdx(Opcode, Row[1]);		return getOperandIdx(Opcode, Row[1]);
}		}
}		}
return -1;		return -1;
}		}
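`getSelIdx` is a two-column table lookup. It finds the row whose source operand sits at `SrcIdx` for this opcode, then returns the index of that row's `_sel` operand. The minimal sketch below reproduces that shape with string operand names. `OperandNames`, `operandIdx` and `selIdx` are hypothetical. The explicit negative-index guard is an addition for the sketch and is not in the original.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical operand names of one opcode, in MachineInstr order.
using OperandNames = std::vector<std::string>;

int operandIdx(const OperandNames &Ops, const std::string &Name) {
  for (std::size_t I = 0; I < Ops.size(); ++I)
    if (Ops[I] == Name)
      return static_cast<int>(I);
  return -1;
}

// Same shape as SrcSelTable: each row pairs a source operand with its
// select operand.
int selIdx(const OperandNames &Ops, int SrcIdx) {
  if (SrcIdx < 0)
    return -1; // guard added in this sketch: missing rows would match -1
  static const std::pair<const char *, const char *> Table[] = {
      {"src0", "src0_sel"}, {"src1", "src1_sel"}, {"src2", "src2_sel"}};
  for (const auto &Row : Table)
    if (operandIdx(Ops, Row.first) == SrcIdx)
      return operandIdx(Ops, Row.second);
  return -1;
}
```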

SmallVector<std::pair<MachineOperand *, int64_t>, 3>		SmallVector<std::pair<MachineOperand *, int64_t>, 3>
R600InstrInfo::getSrcs(MachineInstr &MI) const {		R600InstrInfo::getSrcs(MachineInstr &MI) const {
SmallVector<std::pair<MachineOperand *, int64_t>, 3> Result;		SmallVector<std::pair<MachineOperand *, int64_t>, 3> Result;

if (MI.getOpcode() == AMDGPU::DOT_4) {		if (MI.getOpcode() == R600::DOT_4) {
static const unsigned OpTable[8][2] = {		static const unsigned OpTable[8][2] = {
{AMDGPU::OpName::src0_X, AMDGPU::OpName::src0_sel_X},		{R600::OpName::src0_X, R600::OpName::src0_sel_X},
{AMDGPU::OpName::src0_Y, AMDGPU::OpName::src0_sel_Y},		{R600::OpName::src0_Y, R600::OpName::src0_sel_Y},
{AMDGPU::OpName::src0_Z, AMDGPU::OpName::src0_sel_Z},		{R600::OpName::src0_Z, R600::OpName::src0_sel_Z},
{AMDGPU::OpName::src0_W, AMDGPU::OpName::src0_sel_W},		{R600::OpName::src0_W, R600::OpName::src0_sel_W},
{AMDGPU::OpName::src1_X, AMDGPU::OpName::src1_sel_X},		{R600::OpName::src1_X, R600::OpName::src1_sel_X},
{AMDGPU::OpName::src1_Y, AMDGPU::OpName::src1_sel_Y},		{R600::OpName::src1_Y, R600::OpName::src1_sel_Y},
{AMDGPU::OpName::src1_Z, AMDGPU::OpName::src1_sel_Z},		{R600::OpName::src1_Z, R600::OpName::src1_sel_Z},
{AMDGPU::OpName::src1_W, AMDGPU::OpName::src1_sel_W},		{R600::OpName::src1_W, R600::OpName::src1_sel_W},
};		};

for (unsigned j = 0; j < 8; j++) {		for (unsigned j = 0; j < 8; j++) {
MachineOperand &MO =		MachineOperand &MO =
MI.getOperand(getOperandIdx(MI.getOpcode(), OpTable[j][0]));		MI.getOperand(getOperandIdx(MI.getOpcode(), OpTable[j][0]));
unsigned Reg = MO.getReg();		unsigned Reg = MO.getReg();
if (Reg == AMDGPU::ALU_CONST) {		if (Reg == R600::ALU_CONST) {
MachineOperand &Sel =		MachineOperand &Sel =
MI.getOperand(getOperandIdx(MI.getOpcode(), OpTable[j][1]));		MI.getOperand(getOperandIdx(MI.getOpcode(), OpTable[j][1]));
Result.push_back(std::make_pair(&MO, Sel.getImm()));		Result.push_back(std::make_pair(&MO, Sel.getImm()));
continue;		continue;
}		}

}		}
return Result;		return Result;
}		}

static const unsigned OpTable[3][2] = {		static const unsigned OpTable[3][2] = {
{AMDGPU::OpName::src0, AMDGPU::OpName::src0_sel},		{R600::OpName::src0, R600::OpName::src0_sel},
{AMDGPU::OpName::src1, AMDGPU::OpName::src1_sel},		{R600::OpName::src1, R600::OpName::src1_sel},
{AMDGPU::OpName::src2, AMDGPU::OpName::src2_sel},		{R600::OpName::src2, R600::OpName::src2_sel},
};		};

for (unsigned j = 0; j < 3; j++) {		for (unsigned j = 0; j < 3; j++) {
int SrcIdx = getOperandIdx(MI.getOpcode(), OpTable[j][0]);		int SrcIdx = getOperandIdx(MI.getOpcode(), OpTable[j][0]);
if (SrcIdx < 0)		if (SrcIdx < 0)
break;		break;
MachineOperand &MO = MI.getOperand(SrcIdx);		MachineOperand &MO = MI.getOperand(SrcIdx);
unsigned Reg = MO.getReg();		unsigned Reg = MO.getReg();
if (Reg == AMDGPU::ALU_CONST) {		if (Reg == R600::ALU_CONST) {
MachineOperand &Sel =		MachineOperand &Sel =
MI.getOperand(getOperandIdx(MI.getOpcode(), OpTable[j][1]));		MI.getOperand(getOperandIdx(MI.getOpcode(), OpTable[j][1]));
Result.push_back(std::make_pair(&MO, Sel.getImm()));		Result.push_back(std::make_pair(&MO, Sel.getImm()));
continue;		continue;
}		}
if (Reg == AMDGPU::ALU_LITERAL_X) {		if (Reg == R600::ALU_LITERAL_X) {
MachineOperand &Operand =		MachineOperand &Operand =
MI.getOperand(getOperandIdx(MI.getOpcode(), AMDGPU::OpName::literal));		MI.getOperand(getOperandIdx(MI.getOpcode(), R600::OpName::literal));
if (Operand.isImm()) {		if (Operand.isImm()) {
Result.push_back(std::make_pair(&MO, Operand.getImm()));		Result.push_back(std::make_pair(&MO, Operand.getImm()));
continue;		continue;
}		}
assert(Operand.isGlobal());		assert(Operand.isGlobal());
}		}
Result.push_back(std::make_pair(&MO, 0));		Result.push_back(std::make_pair(&MO, 0));
}		}
return Result;		return Result;
}		}
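In the non-`DOT_4` path, `getSrcs` pairs each source operand with a value. A constant read (`ALU_CONST`) gets its select immediate, a literal read (`ALU_LITERAL_X`) gets the instruction's literal, and any other source gets 0. The sketch below models only that pairing with hypothetical `RegKind`/`SrcOperand` types. It assumes the literal is an immediate. When the literal operand is not an immediate, the real code asserts that it is a global and records 0.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical register kinds standing in for R600::ALU_CONST,
// R600::ALU_LITERAL_X and ordinary registers.
enum class RegKind { AluConst, AluLiteral, Other };

struct SrcOperand {
  RegKind Kind;
  int64_t Sel; // value of the matching *_sel operand
};

// Attaches to each source the value getSrcs would record for it.
std::vector<std::pair<const SrcOperand *, int64_t>>
pairSources(const std::vector<SrcOperand> &Srcs, int64_t Literal) {
  std::vector<std::pair<const SrcOperand *, int64_t>> Result;
  for (const SrcOperand &S : Srcs) {
    switch (S.Kind) {
    case RegKind::AluConst:
      Result.emplace_back(&S, S.Sel);
      break;
    case RegKind::AluLiteral:
      Result.emplace_back(&S, Literal);
      break;
    case RegKind::Other:
      Result.emplace_back(&S, 0);
      break;
    }
  }
  return Result;
}
```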

std::vector<std::pair<int, unsigned>>		std::vector<std::pair<int, unsigned>>
R600InstrInfo::ExtractSrcs(MachineInstr &MI,		R600InstrInfo::ExtractSrcs(MachineInstr &MI,
const DenseMap<unsigned, unsigned> &PV,		const DenseMap<unsigned, unsigned> &PV,
unsigned &ConstCount) const {		unsigned &ConstCount) const {
ConstCount = 0;		ConstCount = 0;
const std::pair<int, unsigned> DummyPair(-1, 0);		const std::pair<int, unsigned> DummyPair(-1, 0);
std::vector<std::pair<int, unsigned>> Result;		std::vector<std::pair<int, unsigned>> Result;
unsigned i = 0;		unsigned i = 0;
for (const auto &Src : getSrcs(MI)) {		for (const auto &Src : getSrcs(MI)) {
++i;		++i;
unsigned Reg = Src.first->getReg();		unsigned Reg = Src.first->getReg();
int Index = RI.getEncodingValue(Reg) & 0xff;		int Index = RI.getEncodingValue(Reg) & 0xff;
if (Reg == AMDGPU::OQAP) {		if (Reg == R600::OQAP) {
Result.push_back(std::make_pair(Index, 0U));		Result.push_back(std::make_pair(Index, 0U));
}		}
if (PV.find(Reg) != PV.end()) {		if (PV.find(Reg) != PV.end()) {
// 255 is used to tells its a PS/PV reg		// 255 is used to tells its a PS/PV reg
Result.push_back(std::make_pair(255, 0U));		Result.push_back(std::make_pair(255, 0U));
continue;		continue;
}		}
if (Index > 127) {		if (Index > 127) {
unsigned R600InstrInfo::isLegalUpTo(
memset(Vector, -1, sizeof(Vector));		memset(Vector, -1, sizeof(Vector));
for (unsigned i = 0, e = IGSrcs.size(); i < e; i++) {		for (unsigned i = 0, e = IGSrcs.size(); i < e; i++) {
const std::vector<std::pair<int, unsigned>> &Srcs =		const std::vector<std::pair<int, unsigned>> &Srcs =
Swizzle(IGSrcs[i], Swz[i]);		Swizzle(IGSrcs[i], Swz[i]);
for (unsigned j = 0; j < 3; j++) {		for (unsigned j = 0; j < 3; j++) {
const std::pair<int, unsigned> &Src = Srcs[j];		const std::pair<int, unsigned> &Src = Srcs[j];
if (Src.first < 0 || Src.first == 255)		if (Src.first < 0 || Src.first == 255)
continue;		continue;
if (Src.first == GET_REG_INDEX(RI.getEncodingValue(AMDGPU::OQAP))) {		if (Src.first == GET_REG_INDEX(RI.getEncodingValue(R600::OQAP))) {
if (Swz[i] != R600InstrInfo::ALU_VEC_012_SCL_210 &&		if (Swz[i] != R600InstrInfo::ALU_VEC_012_SCL_210 &&
Swz[i] != R600InstrInfo::ALU_VEC_021_SCL_122) {		Swz[i] != R600InstrInfo::ALU_VEC_021_SCL_122) {
// The value from output queue A (denoted by register OQAP) can		// The value from output queue A (denoted by register OQAP) can
// only be fetched during the first cycle.		// only be fetched during the first cycle.
return false;		return false;
}		}
// OQAP does not count towards the normal read port restrictions		// OQAP does not count towards the normal read port restrictions
continue;		continue;
R600InstrInfo::fitsReadPortLimitations(const std::vector<MachineInstr *> &IG,

std::vector<std::vector<std::pair<int, unsigned>>> IGSrcs;		std::vector<std::vector<std::pair<int, unsigned>>> IGSrcs;
ValidSwizzle.clear();		ValidSwizzle.clear();
unsigned ConstCount;		unsigned ConstCount;
BankSwizzle TransBS = ALU_VEC_012_SCL_210;		BankSwizzle TransBS = ALU_VEC_012_SCL_210;
for (unsigned i = 0, e = IG.size(); i < e; ++i) {		for (unsigned i = 0, e = IG.size(); i < e; ++i) {
IGSrcs.push_back(ExtractSrcs(*IG[i], PV, ConstCount));		IGSrcs.push_back(ExtractSrcs(*IG[i], PV, ConstCount));
unsigned Op = getOperandIdx(IG[i]->getOpcode(),		unsigned Op = getOperandIdx(IG[i]->getOpcode(),
AMDGPU::OpName::bank_swizzle);		R600::OpName::bank_swizzle);
ValidSwizzle.push_back( (R600InstrInfo::BankSwizzle)		ValidSwizzle.push_back( (R600InstrInfo::BankSwizzle)
IG[i]->getOperand(Op).getImm());		IG[i]->getOperand(Op).getImm());
}		}
std::vector<std::pair<int, unsigned>> TransOps;		std::vector<std::pair<int, unsigned>> TransOps;
if (!isLastAluTrans)		if (!isLastAluTrans)
return FindSwizzleForVectorSlot(IGSrcs, ValidSwizzle, TransOps, TransBS);		return FindSwizzleForVectorSlot(IGSrcs, ValidSwizzle, TransOps, TransBS);

TransOps = std::move(IGSrcs.back());		TransOps = std::move(IGSrcs.back());
R600InstrInfo::fitsConstReadLimitations(const std::vector<MachineInstr *> &MIs)
std::vector<unsigned> Consts;		std::vector<unsigned> Consts;
SmallSet<int64_t, 4> Literals;		SmallSet<int64_t, 4> Literals;
for (unsigned i = 0, n = MIs.size(); i < n; i++) {		for (unsigned i = 0, n = MIs.size(); i < n; i++) {
MachineInstr &MI = *MIs[i];		MachineInstr &MI = *MIs[i];
if (!isALUInstr(MI.getOpcode()))		if (!isALUInstr(MI.getOpcode()))
continue;		continue;

for (const auto &Src : getSrcs(MI)) {		for (const auto &Src : getSrcs(MI)) {
if (Src.first->getReg() == AMDGPU::ALU_LITERAL_X)		if (Src.first->getReg() == R600::ALU_LITERAL_X)
Literals.insert(Src.second);		Literals.insert(Src.second);
if (Literals.size() > 4)		if (Literals.size() > 4)
return false;		return false;
if (Src.first->getReg() == AMDGPU::ALU_CONST)		if (Src.first->getReg() == R600::ALU_CONST)
Consts.push_back(Src.second);		Consts.push_back(Src.second);
if (AMDGPU::R600_KC0RegClass.contains(Src.first->getReg()) ||		if (R600::R600_KC0RegClass.contains(Src.first->getReg()) ||
AMDGPU::R600_KC1RegClass.contains(Src.first->getReg())) {		R600::R600_KC1RegClass.contains(Src.first->getReg())) {
unsigned Index = RI.getEncodingValue(Src.first->getReg()) & 0xff;		unsigned Index = RI.getEncodingValue(Src.first->getReg()) & 0xff;
unsigned Chan = RI.getHWRegChan(Src.first->getReg());		unsigned Chan = RI.getHWRegChan(Src.first->getReg());
Consts.push_back((Index << 2) | Chan);		Consts.push_back((Index << 2) | Chan);
}		}
}		}
}		}
return fitsConstReadLimitations(Consts);		return fitsConstReadLimitations(Consts);
}		}
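The instruction-group overload of `fitsConstReadLimitations` above rejects the group once it sees more than four distinct literal values. It also encodes each constant-cache register read as `(Index << 2) | Chan` before it delegates to the `Consts`-vector overload. The sketch below models just that collection step with a hypothetical `Src` record. It covers the KC0/KC1 packing but not the `ALU_CONST` case, which the real code pushes as the raw select value.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical simplified source: either an inline literal or a
// constant-cache read identified by register index and channel (0-3).
struct Src {
  bool IsLiteral;
  int64_t Literal; // used when IsLiteral
  unsigned Index;  // used otherwise
  unsigned Chan;   // used otherwise
};

// Returns false as soon as more than four distinct literals appear;
// otherwise appends one packed (Index << 2) | Chan entry per constant read.
bool collectConsts(const std::vector<std::vector<Src>> &Group,
                   std::vector<unsigned> &Consts) {
  std::set<int64_t> Literals;
  for (const auto &Instr : Group)
    for (const Src &S : Instr) {
      if (S.IsLiteral) {
        Literals.insert(S.Literal);
        if (Literals.size() > 4)
          return false;
      } else {
        Consts.push_back((S.Index << 2) | S.Chan);
      }
    }
  return true;
}
```

Repeated literal values are counted once, because they go into a set, just as `SmallSet` de-duplicates them in the original.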

DFAPacketizer *		DFAPacketizer *
R600InstrInfo::CreateTargetScheduleState(const TargetSubtargetInfo &STI) const {		R600InstrInfo::CreateTargetScheduleState(const TargetSubtargetInfo &STI) const {
const InstrItineraryData *II = STI.getInstrItineraryData();		const InstrItineraryData *II = STI.getInstrItineraryData();
return static_cast<const R600Subtarget &>(STI).createDFAPacketizer(II);		return static_cast<const R600Subtarget &>(STI).createDFAPacketizer(II);
}		}

static bool		static bool
isPredicateSetter(unsigned Opcode) {		isPredicateSetter(unsigned Opcode) {
switch (Opcode) {		switch (Opcode) {
case AMDGPU::PRED_X:		case R600::PRED_X:
return true;		return true;
default:		default:
return false;		return false;
}		}
}		}

static MachineInstr *		static MachineInstr *
findFirstPredicateSetterFrom(MachineBasicBlock &MBB,		findFirstPredicateSetterFrom(MachineBasicBlock &MBB,
MachineBasicBlock::iterator I) {		MachineBasicBlock::iterator I) {
while (I != MBB.begin()) {		while (I != MBB.begin()) {
--I;		--I;
MachineInstr &MI = *I;		MachineInstr &MI = *I;
if (isPredicateSetter(MI.getOpcode()))		if (isPredicateSetter(MI.getOpcode()))
return &MI;		return &MI;
}		}

return nullptr;		return nullptr;
}		}
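`findFirstPredicateSetterFrom` walks backwards from the given iterator toward the start of the block. It returns the nearest instruction that `isPredicateSetter` accepts, which in this code means only `PRED_X`. The sketch below expresses the same reverse scan over a hypothetical vector of opcodes and uses indices in place of `MachineBasicBlock::iterator`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical opcodes; only PredX sets the predicate.
enum class Op { Mov, PredX, JumpCond };

// Scans backwards from position End (exclusive) to the start of the block;
// returns the index of the nearest predicate setter, or -1 if none exists.
int findFirstPredicateSetterFrom(const std::vector<Op> &Block,
                                 std::size_t End) {
  while (End != 0) {
    --End;
    if (Block[End] == Op::PredX)
      return static_cast<int>(End);
  }
  return -1;
}
```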

static		static
bool isJump(unsigned Opcode) {		bool isJump(unsigned Opcode) {
return Opcode == AMDGPU::JUMP || Opcode == AMDGPU::JUMP_COND;		return Opcode == R600::JUMP || Opcode == R600::JUMP_COND;
}		}

static bool isBranch(unsigned Opcode) {		static bool isBranch(unsigned Opcode) {
return Opcode == AMDGPU::BRANCH || Opcode == AMDGPU::BRANCH_COND_i32 ||		return Opcode == R600::BRANCH || Opcode == R600::BRANCH_COND_i32 ||
Opcode == AMDGPU::BRANCH_COND_f32;		Opcode == R600::BRANCH_COND_f32;
}		}

bool R600InstrInfo::analyzeBranch(MachineBasicBlock &MBB,		bool R600InstrInfo::analyzeBranch(MachineBasicBlock &MBB,
MachineBasicBlock *&TBB,		MachineBasicBlock *&TBB,
MachineBasicBlock *&FBB,		MachineBasicBlock *&FBB,
SmallVectorImpl<MachineOperand> &Cond,		SmallVectorImpl<MachineOperand> &Cond,
bool AllowModify) const {		bool AllowModify) const {
// Most of the following comes from the ARM implementation of AnalyzeBranch		// Most of the following comes from the ARM implementation of AnalyzeBranch

// If the block has no terminators, it just falls into the block after it.		// If the block has no terminators, it just falls into the block after it.
MachineBasicBlock::iterator I = MBB.getLastNonDebugInstr();		MachineBasicBlock::iterator I = MBB.getLastNonDebugInstr();
if (I == MBB.end())		if (I == MBB.end())
return false;		return false;

// AMDGPU::BRANCH* instructions are only available after isel and are not		// R600::BRANCH* instructions are only available after isel and are not
// handled		// handled
if (isBranch(I->getOpcode()))		if (isBranch(I->getOpcode()))
return true;		return true;
if (!isJump(I->getOpcode())) {		if (!isJump(I->getOpcode())) {
return false;		return false;
}		}

// Remove successive JUMP		// Remove successive JUMP
while (I != MBB.begin() && std::prev(I)->getOpcode() == AMDGPU::JUMP) {		while (I != MBB.begin() && std::prev(I)->getOpcode() == R600::JUMP) {
MachineBasicBlock::iterator PriorI = std::prev(I);		MachineBasicBlock::iterator PriorI = std::prev(I);
if (AllowModify)		if (AllowModify)
I->removeFromParent();		I->removeFromParent();
I = PriorI;		I = PriorI;
}		}
MachineInstr &LastInst = *I;		MachineInstr &LastInst = *I;

// If there is only one terminator instruction, process it.		// If there is only one terminator instruction, process it.
unsigned LastOpc = LastInst.getOpcode();		unsigned LastOpc = LastInst.getOpcode();
if (I == MBB.begin() || !isJump((--I)->getOpcode())) {		if (I == MBB.begin() || !isJump((--I)->getOpcode())) {
if (LastOpc == AMDGPU::JUMP) {		if (LastOpc == R600::JUMP) {
TBB = LastInst.getOperand(0).getMBB();		TBB = LastInst.getOperand(0).getMBB();
return false;		return false;
} else if (LastOpc == AMDGPU::JUMP_COND) {		} else if (LastOpc == R600::JUMP_COND) {
auto predSet = I;		auto predSet = I;
while (!isPredicateSetter(predSet->getOpcode())) {		while (!isPredicateSetter(predSet->getOpcode())) {
predSet = --I;		predSet = --I;
}		}
TBB = LastInst.getOperand(0).getMBB();		TBB = LastInst.getOperand(0).getMBB();
Cond.push_back(predSet->getOperand(1));		Cond.push_back(predSet->getOperand(1));
Cond.push_back(predSet->getOperand(2));		Cond.push_back(predSet->getOperand(2));
Cond.push_back(MachineOperand::CreateReg(AMDGPU::PRED_SEL_ONE, false));		Cond.push_back(MachineOperand::CreateReg(R600::PRED_SEL_ONE, false));
return false;		return false;
}		}
return true; // Can't handle indirect branch.		return true; // Can't handle indirect branch.
}		}

// Get the instruction before it if it is a terminator.		// Get the instruction before it if it is a terminator.
MachineInstr &SecondLastInst = *I;		MachineInstr &SecondLastInst = *I;
unsigned SecondLastOpc = SecondLastInst.getOpcode();		unsigned SecondLastOpc = SecondLastInst.getOpcode();

// If the block ends with a B and a Bcc, handle it.		// If the block ends with a B and a Bcc, handle it.
if (SecondLastOpc == AMDGPU::JUMP_COND && LastOpc == AMDGPU::JUMP) {		if (SecondLastOpc == R600::JUMP_COND && LastOpc == R600::JUMP) {
auto predSet = --I;		auto predSet = --I;
while (!isPredicateSetter(predSet->getOpcode())) {		while (!isPredicateSetter(predSet->getOpcode())) {
predSet = --I;		predSet = --I;
}		}
TBB = SecondLastInst.getOperand(0).getMBB();		TBB = SecondLastInst.getOperand(0).getMBB();
FBB = LastInst.getOperand(0).getMBB();		FBB = LastInst.getOperand(0).getMBB();
Cond.push_back(predSet->getOperand(1));		Cond.push_back(predSet->getOperand(1));
Cond.push_back(predSet->getOperand(2));		Cond.push_back(predSet->getOperand(2));
Cond.push_back(MachineOperand::CreateReg(AMDGPU::PRED_SEL_ONE, false));		Cond.push_back(MachineOperand::CreateReg(R600::PRED_SEL_ONE, false));
return false;		return false;
}		}

// Otherwise, can't handle this.		// Otherwise, can't handle this.
return true;		return true;
}		}
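The logic of `analyzeBranch` comes down to a handful of terminator shapes. The sketch below is a simplified, hypothetical model of that decision, working on a plain list of opcode kinds rather than real `MachineInstr`s. It ignores debug instructions and the backward search for the predicate setter. It also only reports which shape was recognized, whereas the real function fills in `TBB`, `FBB` and `Cond` and returns `true` when it gives up.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical opcode kinds; the real code uses generated R600:: opcodes.
enum class Op { Other, PredSetter, Jump, JumpCond, Branch };

// Terminator shapes the model can report (Unanalyzable = real code returns true).
enum class Shape { FallThrough, Uncond, Cond, CondThenUncond, Unanalyzable };

bool isJumpOp(Op O) { return O == Op::Jump || O == Op::JumpCond; }

// Classifies the tail of a block given its instructions in program order.
Shape classifyTail(const std::vector<Op> &Insts) {
  if (Insts.empty())
    return Shape::FallThrough;        // empty block falls through
  std::size_t I = Insts.size() - 1;   // last instruction
  if (Insts[I] == Op::Branch)
    return Shape::Unanalyzable;       // BRANCH* pseudos are not handled
  if (!isJumpOp(Insts[I]))
    return Shape::FallThrough;        // no terminator at all
  // Step back over a run of JUMPs, like the "Remove successive JUMP" loop
  // (the real code also erases the later ones when AllowModify is set).
  while (I > 0 && Insts[I - 1] == Op::Jump)
    --I;
  Op Last = Insts[I];
  if (I == 0 || !isJumpOp(Insts[I - 1]))
    return Last == Op::Jump ? Shape::Uncond : Shape::Cond; // one terminator
  if (Insts[I - 1] == Op::JumpCond && Last == Op::Jump)
    return Shape::CondThenUncond;     // JUMP_COND followed by JUMP
  return Shape::Unanalyzable;         // any other pair of jumps
}
```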

static		static
MachineBasicBlock::iterator FindLastAluClause(MachineBasicBlock &MBB) {		MachineBasicBlock::iterator FindLastAluClause(MachineBasicBlock &MBB) {
for (MachineBasicBlock::reverse_iterator It = MBB.rbegin(), E = MBB.rend();		for (MachineBasicBlock::reverse_iterator It = MBB.rbegin(), E = MBB.rend();
It != E; ++It) {		It != E; ++It) {
if (It->getOpcode() == AMDGPU::CF_ALU ||		if (It->getOpcode() == R600::CF_ALU ||
It->getOpcode() == AMDGPU::CF_ALU_PUSH_BEFORE)		It->getOpcode() == R600::CF_ALU_PUSH_BEFORE)
return It.getReverse();		return It.getReverse();
}		}
return MBB.end();		return MBB.end();
}		}

unsigned R600InstrInfo::insertBranch(MachineBasicBlock &MBB,		unsigned R600InstrInfo::insertBranch(MachineBasicBlock &MBB,
MachineBasicBlock *TBB,		MachineBasicBlock *TBB,
MachineBasicBlock *FBB,		MachineBasicBlock *FBB,
ArrayRef<MachineOperand> Cond,		ArrayRef<MachineOperand> Cond,
const DebugLoc &DL,		const DebugLoc &DL,
int *BytesAdded) const {		int *BytesAdded) const {
assert(TBB && "insertBranch must not be told to insert a fallthrough");		assert(TBB && "insertBranch must not be told to insert a fallthrough");
assert(!BytesAdded && "code size not handled");		assert(!BytesAdded && "code size not handled");

if (!FBB) {		if (!FBB) {
if (Cond.empty()) {		if (Cond.empty()) {
BuildMI(&MBB, DL, get(AMDGPU::JUMP)).addMBB(TBB);		BuildMI(&MBB, DL, get(R600::JUMP)).addMBB(TBB);
return 1;		return 1;
} else {		} else {
MachineInstr *PredSet = findFirstPredicateSetterFrom(MBB, MBB.end());		MachineInstr *PredSet = findFirstPredicateSetterFrom(MBB, MBB.end());
assert(PredSet && "No previous predicate !");		assert(PredSet && "No previous predicate !");
addFlag(*PredSet, 0, MO_FLAG_PUSH);		addFlag(*PredSet, 0, MO_FLAG_PUSH);
PredSet->getOperand(2).setImm(Cond[1].getImm());		PredSet->getOperand(2).setImm(Cond[1].getImm());

BuildMI(&MBB, DL, get(AMDGPU::JUMP_COND))		BuildMI(&MBB, DL, get(R600::JUMP_COND))
.addMBB(TBB)		.addMBB(TBB)
.addReg(AMDGPU::PREDICATE_BIT, RegState::Kill);		.addReg(R600::PREDICATE_BIT, RegState::Kill);
MachineBasicBlock::iterator CfAlu = FindLastAluClause(MBB);		MachineBasicBlock::iterator CfAlu = FindLastAluClause(MBB);
if (CfAlu == MBB.end())		if (CfAlu == MBB.end())
return 1;		return 1;
assert (CfAlu->getOpcode() == AMDGPU::CF_ALU);		assert (CfAlu->getOpcode() == R600::CF_ALU);
CfAlu->setDesc(get(AMDGPU::CF_ALU_PUSH_BEFORE));		CfAlu->setDesc(get(R600::CF_ALU_PUSH_BEFORE));
return 1;		return 1;
}		}
} else {		} else {
MachineInstr *PredSet = findFirstPredicateSetterFrom(MBB, MBB.end());		MachineInstr *PredSet = findFirstPredicateSetterFrom(MBB, MBB.end());
assert(PredSet && "No previous predicate !");		assert(PredSet && "No previous predicate !");
addFlag(*PredSet, 0, MO_FLAG_PUSH);		addFlag(*PredSet, 0, MO_FLAG_PUSH);
PredSet->getOperand(2).setImm(Cond[1].getImm());		PredSet->getOperand(2).setImm(Cond[1].getImm());
BuildMI(&MBB, DL, get(AMDGPU::JUMP_COND))		BuildMI(&MBB, DL, get(R600::JUMP_COND))
.addMBB(TBB)		.addMBB(TBB)
.addReg(AMDGPU::PREDICATE_BIT, RegState::Kill);		.addReg(R600::PREDICATE_BIT, RegState::Kill);
BuildMI(&MBB, DL, get(AMDGPU::JUMP)).addMBB(FBB);		BuildMI(&MBB, DL, get(R600::JUMP)).addMBB(FBB);
MachineBasicBlock::iterator CfAlu = FindLastAluClause(MBB);		MachineBasicBlock::iterator CfAlu = FindLastAluClause(MBB);
if (CfAlu == MBB.end())		if (CfAlu == MBB.end())
return 2;		return 2;
assert (CfAlu->getOpcode() == AMDGPU::CF_ALU);		assert (CfAlu->getOpcode() == R600::CF_ALU);
CfAlu->setDesc(get(AMDGPU::CF_ALU_PUSH_BEFORE));		CfAlu->setDesc(get(R600::CF_ALU_PUSH_BEFORE));
return 2;		return 2;
}		}
}		}

unsigned R600InstrInfo::removeBranch(MachineBasicBlock &MBB,		unsigned R600InstrInfo::removeBranch(MachineBasicBlock &MBB,
int *BytesRemoved) const {		int *BytesRemoved) const {
assert(!BytesRemoved && "code size not handled");		assert(!BytesRemoved && "code size not handled");

// Note : we leave PRED* instructions there.		// Note : we leave PRED* instructions there.
// They may be needed when predicating instructions.		// They may be needed when predicating instructions.

MachineBasicBlock::iterator I = MBB.end();		MachineBasicBlock::iterator I = MBB.end();

if (I == MBB.begin()) {		if (I == MBB.begin()) {
return 0;		return 0;
}		}
--I;		--I;
switch (I->getOpcode()) {		switch (I->getOpcode()) {
default:		default:
return 0;		return 0;
case AMDGPU::JUMP_COND: {		case R600::JUMP_COND: {
MachineInstr *predSet = findFirstPredicateSetterFrom(MBB, I);		MachineInstr *predSet = findFirstPredicateSetterFrom(MBB, I);
clearFlag(*predSet, 0, MO_FLAG_PUSH);		clearFlag(*predSet, 0, MO_FLAG_PUSH);
I->eraseFromParent();		I->eraseFromParent();
MachineBasicBlock::iterator CfAlu = FindLastAluClause(MBB);		MachineBasicBlock::iterator CfAlu = FindLastAluClause(MBB);
if (CfAlu == MBB.end())		if (CfAlu == MBB.end())
break;		break;
assert (CfAlu->getOpcode() == AMDGPU::CF_ALU_PUSH_BEFORE);		assert (CfAlu->getOpcode() == R600::CF_ALU_PUSH_BEFORE);
CfAlu->setDesc(get(AMDGPU::CF_ALU));		CfAlu->setDesc(get(R600::CF_ALU));
break;		break;
}		}
case AMDGPU::JUMP:		case R600::JUMP:
I->eraseFromParent();		I->eraseFromParent();
break;		break;
}		}
I = MBB.end();		I = MBB.end();

if (I == MBB.begin()) {		if (I == MBB.begin()) {
return 1;		return 1;
}		}
--I;		--I;
switch (I->getOpcode()) {		switch (I->getOpcode()) {
// FIXME: only one case??		// FIXME: only one case??
default:		default:
return 1;		return 1;
case AMDGPU::JUMP_COND: {		case R600::JUMP_COND: {
MachineInstr *predSet = findFirstPredicateSetterFrom(MBB, I);		MachineInstr *predSet = findFirstPredicateSetterFrom(MBB, I);
clearFlag(*predSet, 0, MO_FLAG_PUSH);		clearFlag(*predSet, 0, MO_FLAG_PUSH);
I->eraseFromParent();		I->eraseFromParent();
MachineBasicBlock::iterator CfAlu = FindLastAluClause(MBB);		MachineBasicBlock::iterator CfAlu = FindLastAluClause(MBB);
if (CfAlu == MBB.end())		if (CfAlu == MBB.end())
break;		break;
assert (CfAlu->getOpcode() == AMDGPU::CF_ALU_PUSH_BEFORE);		assert (CfAlu->getOpcode() == R600::CF_ALU_PUSH_BEFORE);
CfAlu->setDesc(get(AMDGPU::CF_ALU));		CfAlu->setDesc(get(R600::CF_ALU));
break;		break;
}		}
case AMDGPU::JUMP:		case R600::JUMP:
I->eraseFromParent();		I->eraseFromParent();
break;		break;
}		}
return 2;		return 2;
}		}
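`insertBranch` and `removeBranch` keep two pieces of state in step with a conditional jump. The first is the push flag on the preceding predicate setter. The second is the opcode of the last ALU clause (`CF_ALU` vs `CF_ALU_PUSH_BEFORE`). The sketch below is a hypothetical, stripped-down model of just that bookkeeping for the single conditional-jump case. `Inst`, `Kind` and the helper functions are invented for illustration and are not LLVM APIs.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: only the bookkeeping around a conditional jump is kept.
enum class Kind { CfAlu, CfAluPushBefore, PredSet, JumpCond, Other };

struct Inst {
  Kind K;
  bool PushFlag = false; // stands in for MO_FLAG_PUSH on a predicate setter
};

// Index of the last instruction before position End satisfying Pred, or -1.
template <typename P>
int findLastBefore(const std::vector<Inst> &B, int End, P Pred) {
  for (int I = End - 1; I >= 0; --I)
    if (Pred(B[I]))
      return I;
  return -1;
}

bool isAluClause(const Inst &In) {
  return In.K == Kind::CfAlu || In.K == Kind::CfAluPushBefore;
}
bool isPredSet(const Inst &In) { return In.K == Kind::PredSet; }

// Mirrors the conditional, no-FBB path of insertBranch.
void insertCondJump(std::vector<Inst> &B) {
  int P = findLastBefore(B, static_cast<int>(B.size()), isPredSet);
  assert(P >= 0 && "No previous predicate");
  B[P].PushFlag = true;                 // addFlag(..., MO_FLAG_PUSH)
  B.push_back({Kind::JumpCond});
  int C = findLastBefore(B, static_cast<int>(B.size()), isAluClause);
  if (C >= 0) {
    assert(B[C].K == Kind::CfAlu);
    B[C].K = Kind::CfAluPushBefore;     // CF_ALU -> CF_ALU_PUSH_BEFORE
  }
}

// Mirrors the JUMP_COND case of removeBranch; returns false if nothing removed.
bool removeCondJump(std::vector<Inst> &B) {
  if (B.empty() || B.back().K != Kind::JumpCond)
    return false;
  int P = findLastBefore(B, static_cast<int>(B.size()) - 1, isPredSet);
  assert(P >= 0 && "No previous predicate");
  B[P].PushFlag = false;                // clearFlag(..., MO_FLAG_PUSH)
  B.pop_back();
  int C = findLastBefore(B, static_cast<int>(B.size()), isAluClause);
  if (C >= 0) {
    assert(B[C].K == Kind::CfAluPushBefore);
    B[C].K = Kind::CfAlu;               // CF_ALU_PUSH_BEFORE -> CF_ALU
  }
  return true;
}
```

In this model, running `insertCondJump` and then `removeCondJump` on the same block restores it exactly. That round trip appears to be what the paired real hooks are written to guarantee.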

bool R600InstrInfo::isPredicated(const MachineInstr &MI) const {		bool R600InstrInfo::isPredicated(const MachineInstr &MI) const {
int idx = MI.findFirstPredOperandIdx();		int idx = MI.findFirstPredOperandIdx();
if (idx < 0)		if (idx < 0)
return false;		return false;

unsigned Reg = MI.getOperand(idx).getReg();		unsigned Reg = MI.getOperand(idx).getReg();
switch (Reg) {		switch (Reg) {
default: return false;		default: return false;
case AMDGPU::PRED_SEL_ONE:		case R600::PRED_SEL_ONE:
case AMDGPU::PRED_SEL_ZERO:		case R600::PRED_SEL_ZERO:
case AMDGPU::PREDICATE_BIT:		case R600::PREDICATE_BIT:
return true;		return true;
}		}
}		}

bool R600InstrInfo::isPredicable(const MachineInstr &MI) const {		bool R600InstrInfo::isPredicable(const MachineInstr &MI) const {
// XXX: KILL* instructions can be predicated, but they must be the last		// XXX: KILL* instructions can be predicated, but they must be the last
// instruction in a clause, so this means any instructions after them cannot		// instruction in a clause, so this means any instructions after them cannot
// be predicated. Until we have proper support for instruction clauses in the		// be predicated. Until we have proper support for instruction clauses in the
// backend, we will mark KILL* instructions as unpredicable.		// backend, we will mark KILL* instructions as unpredicable.

if (MI.getOpcode() == AMDGPU::KILLGT) {		if (MI.getOpcode() == R600::KILLGT) {
return false;		return false;
} else if (MI.getOpcode() == AMDGPU::CF_ALU) {		} else if (MI.getOpcode() == R600::CF_ALU) {
// If the clause start in the middle of MBB then the MBB has more		// If the clause start in the middle of MBB then the MBB has more
// than a single clause, unable to predicate several clauses.		// than a single clause, unable to predicate several clauses.
if (MI.getParent()->begin() != MachineBasicBlock::const_iterator(MI))		if (MI.getParent()->begin() != MachineBasicBlock::const_iterator(MI))
return false;		return false;
// TODO: We don't support KC merging atm		// TODO: We don't support KC merging atm
return MI.getOperand(3).getImm() == 0 && MI.getOperand(4).getImm() == 0;		return MI.getOperand(3).getImm() == 0 && MI.getOperand(4).getImm() == 0;
} else if (isVector(MI)) {		} else if (isVector(MI)) {
return false;		return false;
} else {		} else {
return AMDGPUInstrInfo::isPredicable(MI);		return TargetInstrInfo::isPredicable(MI);
}		}
}		}

bool		bool
R600InstrInfo::isProfitableToIfCvt(MachineBasicBlock &MBB,		R600InstrInfo::isProfitableToIfCvt(MachineBasicBlock &MBB,
unsigned NumCycles,		unsigned NumCycles,
unsigned ExtraPredCycles,		unsigned ExtraPredCycles,
BranchProbability Probability) const{		BranchProbability Probability) const{
[24 unchanged lines not shown]
R600InstrInfo::isProfitableToUnpredicate(MachineBasicBlock &TMBB,
MachineBasicBlock &FMBB) const {		MachineBasicBlock &FMBB) const {
return false;		return false;
}		}

bool		bool
R600InstrInfo::reverseBranchCondition(SmallVectorImpl<MachineOperand> &Cond) const {		R600InstrInfo::reverseBranchCondition(SmallVectorImpl<MachineOperand> &Cond) const {
MachineOperand &MO = Cond[1];		MachineOperand &MO = Cond[1];
switch (MO.getImm()) {		switch (MO.getImm()) {
case AMDGPU::PRED_SETE_INT:		case R600::PRED_SETE_INT:
MO.setImm(AMDGPU::PRED_SETNE_INT);		MO.setImm(R600::PRED_SETNE_INT);
break;		break;
case AMDGPU::PRED_SETNE_INT:		case R600::PRED_SETNE_INT:
MO.setImm(AMDGPU::PRED_SETE_INT);		MO.setImm(R600::PRED_SETE_INT);
break;		break;
case AMDGPU::PRED_SETE:		case R600::PRED_SETE:
MO.setImm(AMDGPU::PRED_SETNE);		MO.setImm(R600::PRED_SETNE);
break;		break;
case AMDGPU::PRED_SETNE:		case R600::PRED_SETNE:
MO.setImm(AMDGPU::PRED_SETE);		MO.setImm(R600::PRED_SETE);
break;		break;
default:		default:
return true;		return true;
}		}

MachineOperand &MO2 = Cond[2];		MachineOperand &MO2 = Cond[2];
switch (MO2.getReg()) {		switch (MO2.getReg()) {
case AMDGPU::PRED_SEL_ZERO:		case R600::PRED_SEL_ZERO:
MO2.setReg(AMDGPU::PRED_SEL_ONE);		MO2.setReg(R600::PRED_SEL_ONE);
break;		break;
case AMDGPU::PRED_SEL_ONE:		case R600::PRED_SEL_ONE:
MO2.setReg(AMDGPU::PRED_SEL_ZERO);		MO2.setReg(R600::PRED_SEL_ZERO);
break;		break;
default:		default:
return true;		return true;
}		}
return false;		return false;
}		}
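`reverseBranchCondition` flips both halves of the condition: the `PRED_SET*` comparison in `Cond[1]` and the `PRED_SEL_*` selector in `Cond[2]`. It returns `true` for anything it does not recognize, following the `TargetInstrInfo` convention that `true` means "could not reverse". Below is a standalone sketch in which hypothetical enums stand in for the generated `R600::` values.

```cpp
#include <cassert>

// Hypothetical stand-ins for the R600::PRED_SET* immediates and the
// PRED_SEL_* registers that make up Cond[1] and Cond[2].
enum class CC { SETE_INT, SETNE_INT, SETE, SETNE, SETGT };
enum class Sel { ZERO, ONE, OFF };

struct BranchCond {
  CC Code;    // corresponds to Cond[1]
  Sel Select; // corresponds to Cond[2]
};

// Returns true when the condition cannot be reversed.
bool reverseCond(BranchCond &C) {
  switch (C.Code) {
  case CC::SETE_INT:  C.Code = CC::SETNE_INT; break;
  case CC::SETNE_INT: C.Code = CC::SETE_INT;  break;
  case CC::SETE:      C.Code = CC::SETNE;     break;
  case CC::SETNE:     C.Code = CC::SETE;      break;
  default:            return true;            // e.g. SETGT is not handled
  }
  switch (C.Select) {
  case Sel::ZERO: C.Select = Sel::ONE;  break;
  case Sel::ONE:  C.Select = Sel::ZERO; break;
  default:        return true; // as in the original, Code is already flipped
  }
  return false;
}
```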

bool R600InstrInfo::DefinesPredicate(MachineInstr &MI,		bool R600InstrInfo::DefinesPredicate(MachineInstr &MI,
std::vector<MachineOperand> &Pred) const {		std::vector<MachineOperand> &Pred) const {
return isPredicateSetter(MI.getOpcode());		return isPredicateSetter(MI.getOpcode());
}		}

bool R600InstrInfo::PredicateInstruction(MachineInstr &MI,		bool R600InstrInfo::PredicateInstruction(MachineInstr &MI,
ArrayRef<MachineOperand> Pred) const {		ArrayRef<MachineOperand> Pred) const {
int PIdx = MI.findFirstPredOperandIdx();		int PIdx = MI.findFirstPredOperandIdx();

if (MI.getOpcode() == AMDGPU::CF_ALU) {		if (MI.getOpcode() == R600::CF_ALU) {
MI.getOperand(8).setImm(0);		MI.getOperand(8).setImm(0);
return true;		return true;
}		}

if (MI.getOpcode() == AMDGPU::DOT_4) {		if (MI.getOpcode() == R600::DOT_4) {
MI.getOperand(getOperandIdx(MI, AMDGPU::OpName::pred_sel_X))		MI.getOperand(getOperandIdx(MI, R600::OpName::pred_sel_X))
.setReg(Pred[2].getReg());		.setReg(Pred[2].getReg());
MI.getOperand(getOperandIdx(MI, AMDGPU::OpName::pred_sel_Y))		MI.getOperand(getOperandIdx(MI, R600::OpName::pred_sel_Y))
.setReg(Pred[2].getReg());		.setReg(Pred[2].getReg());
MI.getOperand(getOperandIdx(MI, AMDGPU::OpName::pred_sel_Z))		MI.getOperand(getOperandIdx(MI, R600::OpName::pred_sel_Z))
.setReg(Pred[2].getReg());		.setReg(Pred[2].getReg());
MI.getOperand(getOperandIdx(MI, AMDGPU::OpName::pred_sel_W))		MI.getOperand(getOperandIdx(MI, R600::OpName::pred_sel_W))
.setReg(Pred[2].getReg());		.setReg(Pred[2].getReg());
MachineInstrBuilder MIB(*MI.getParent()->getParent(), MI);		MachineInstrBuilder MIB(*MI.getParent()->getParent(), MI);
MIB.addReg(AMDGPU::PREDICATE_BIT, RegState::Implicit);		MIB.addReg(R600::PREDICATE_BIT, RegState::Implicit);
return true;		return true;
}		}

if (PIdx != -1) {		if (PIdx != -1) {
MachineOperand &PMO = MI.getOperand(PIdx);		MachineOperand &PMO = MI.getOperand(PIdx);
PMO.setReg(Pred[2].getReg());		PMO.setReg(Pred[2].getReg());
MachineInstrBuilder MIB(*MI.getParent()->getParent(), MI);		MachineInstrBuilder MIB(*MI.getParent()->getParent(), MI);
MIB.addReg(AMDGPU::PREDICATE_BIT, RegState::Implicit);		MIB.addReg(R600::PREDICATE_BIT, RegState::Implicit);
return true;		return true;
}		}

return false;		return false;
}		}

unsigned int R600InstrInfo::getPredicationCost(const MachineInstr &) const {		unsigned int R600InstrInfo::getPredicationCost(const MachineInstr &) const {
return 2;		return 2;
[13 unchanged lines not shown]
unsigned R600InstrInfo::calculateIndirectAddress(unsigned RegIndex,
return RegIndex;		return RegIndex;
}		}

bool R600InstrInfo::expandPostRAPseudo(MachineInstr &MI) const {		bool R600InstrInfo::expandPostRAPseudo(MachineInstr &MI) const {
switch (MI.getOpcode()) {		switch (MI.getOpcode()) {
default: {		default: {
MachineBasicBlock *MBB = MI.getParent();		MachineBasicBlock *MBB = MI.getParent();
int OffsetOpIdx =		int OffsetOpIdx =
AMDGPU::getNamedOperandIdx(MI.getOpcode(), AMDGPU::OpName::addr);		R600::getNamedOperandIdx(MI.getOpcode(), R600::OpName::addr);
// addr is a custom operand with multiple MI operands, and only the		// addr is a custom operand with multiple MI operands, and only the
// first MI operand is given a name.		// first MI operand is given a name.
int RegOpIdx = OffsetOpIdx + 1;		int RegOpIdx = OffsetOpIdx + 1;
int ChanOpIdx =		int ChanOpIdx =
AMDGPU::getNamedOperandIdx(MI.getOpcode(), AMDGPU::OpName::chan);		R600::getNamedOperandIdx(MI.getOpcode(), R600::OpName::chan);
if (isRegisterLoad(MI)) {		if (isRegisterLoad(MI)) {
int DstOpIdx =		int DstOpIdx =
AMDGPU::getNamedOperandIdx(MI.getOpcode(), AMDGPU::OpName::dst);		R600::getNamedOperandIdx(MI.getOpcode(), R600::OpName::dst);
unsigned RegIndex = MI.getOperand(RegOpIdx).getImm();		unsigned RegIndex = MI.getOperand(RegOpIdx).getImm();
unsigned Channel = MI.getOperand(ChanOpIdx).getImm();		unsigned Channel = MI.getOperand(ChanOpIdx).getImm();
unsigned Address = calculateIndirectAddress(RegIndex, Channel);		unsigned Address = calculateIndirectAddress(RegIndex, Channel);
unsigned OffsetReg = MI.getOperand(OffsetOpIdx).getReg();		unsigned OffsetReg = MI.getOperand(OffsetOpIdx).getReg();
if (OffsetReg == AMDGPU::INDIRECT_BASE_ADDR) {		if (OffsetReg == R600::INDIRECT_BASE_ADDR) {
buildMovInstr(MBB, MI, MI.getOperand(DstOpIdx).getReg(),		buildMovInstr(MBB, MI, MI.getOperand(DstOpIdx).getReg(),
getIndirectAddrRegClass()->getRegister(Address));		getIndirectAddrRegClass()->getRegister(Address));
} else {		} else {
buildIndirectRead(MBB, MI, MI.getOperand(DstOpIdx).getReg(), Address,		buildIndirectRead(MBB, MI, MI.getOperand(DstOpIdx).getReg(), Address,
OffsetReg);		OffsetReg);
}		}
} else if (isRegisterStore(MI)) {		} else if (isRegisterStore(MI)) {
int ValOpIdx =		int ValOpIdx =
AMDGPU::getNamedOperandIdx(MI.getOpcode(), AMDGPU::OpName::val);		R600::getNamedOperandIdx(MI.getOpcode(), R600::OpName::val);
unsigned RegIndex = MI.getOperand(RegOpIdx).getImm();		unsigned RegIndex = MI.getOperand(RegOpIdx).getImm();
unsigned Channel = MI.getOperand(ChanOpIdx).getImm();		unsigned Channel = MI.getOperand(ChanOpIdx).getImm();
unsigned Address = calculateIndirectAddress(RegIndex, Channel);		unsigned Address = calculateIndirectAddress(RegIndex, Channel);
unsigned OffsetReg = MI.getOperand(OffsetOpIdx).getReg();		unsigned OffsetReg = MI.getOperand(OffsetOpIdx).getReg();
if (OffsetReg == AMDGPU::INDIRECT_BASE_ADDR) {		if (OffsetReg == R600::INDIRECT_BASE_ADDR) {
buildMovInstr(MBB, MI, getIndirectAddrRegClass()->getRegister(Address),		buildMovInstr(MBB, MI, getIndirectAddrRegClass()->getRegister(Address),
MI.getOperand(ValOpIdx).getReg());		MI.getOperand(ValOpIdx).getReg());
} else {		} else {
buildIndirectWrite(MBB, MI, MI.getOperand(ValOpIdx).getReg(),		buildIndirectWrite(MBB, MI, MI.getOperand(ValOpIdx).getReg(),
calculateIndirectAddress(RegIndex, Channel),		calculateIndirectAddress(RegIndex, Channel),
OffsetReg);		OffsetReg);
}		}
} else {		} else {
return false;		return false;
}		}

MBB->erase(MI);		MBB->erase(MI);
return true;		return true;
}		}
case AMDGPU::R600_EXTRACT_ELT_V2:		case R600::R600_EXTRACT_ELT_V2:
case AMDGPU::R600_EXTRACT_ELT_V4:		case R600::R600_EXTRACT_ELT_V4:
buildIndirectRead(MI.getParent(), MI, MI.getOperand(0).getReg(),		buildIndirectRead(MI.getParent(), MI, MI.getOperand(0).getReg(),
RI.getHWRegIndex(MI.getOperand(1).getReg()), // Address		RI.getHWRegIndex(MI.getOperand(1).getReg()), // Address
MI.getOperand(2).getReg(),		MI.getOperand(2).getReg(),
RI.getHWRegChan(MI.getOperand(1).getReg()));		RI.getHWRegChan(MI.getOperand(1).getReg()));
break;		break;
case AMDGPU::R600_INSERT_ELT_V2:		case R600::R600_INSERT_ELT_V2:
case AMDGPU::R600_INSERT_ELT_V4:		case R600::R600_INSERT_ELT_V4:
buildIndirectWrite(MI.getParent(), MI, MI.getOperand(2).getReg(), // Value		buildIndirectWrite(MI.getParent(), MI, MI.getOperand(2).getReg(), // Value
RI.getHWRegIndex(MI.getOperand(1).getReg()), // Address		RI.getHWRegIndex(MI.getOperand(1).getReg()), // Address
MI.getOperand(3).getReg(), // Offset		MI.getOperand(3).getReg(), // Offset
RI.getHWRegChan(MI.getOperand(1).getReg())); // Channel		RI.getHWRegChan(MI.getOperand(1).getReg())); // Channel
break;		break;
}		}
MI.eraseFromParent();		MI.eraseFromParent();
return true;		return true;
}		}

void R600InstrInfo::reserveIndirectRegisters(BitVector &Reserved,		void R600InstrInfo::reserveIndirectRegisters(BitVector &Reserved,
const MachineFunction &MF,		const MachineFunction &MF,
const R600RegisterInfo &TRI) const {		const R600RegisterInfo &TRI) const {
const R600Subtarget &ST = MF.getSubtarget<R600Subtarget>();		const R600Subtarget &ST = MF.getSubtarget<R600Subtarget>();
const R600FrameLowering *TFL = ST.getFrameLowering();		const R600FrameLowering *TFL = ST.getFrameLowering();

unsigned StackWidth = TFL->getStackWidth(MF);		unsigned StackWidth = TFL->getStackWidth(MF);
int End = getIndirectIndexEnd(MF);		int End = getIndirectIndexEnd(MF);

if (End == -1)		if (End == -1)
return;		return;

for (int Index = getIndirectIndexBegin(MF); Index <= End; ++Index) {		for (int Index = getIndirectIndexBegin(MF); Index <= End; ++Index) {
for (unsigned Chan = 0; Chan < StackWidth; ++Chan) {		for (unsigned Chan = 0; Chan < StackWidth; ++Chan) {
unsigned Reg = AMDGPU::R600_TReg32RegClass.getRegister((4 * Index) + Chan);		unsigned Reg = R600::R600_TReg32RegClass.getRegister((4 * Index) + Chan);
TRI.reserveRegisterTuples(Reserved, Reg);		TRI.reserveRegisterTuples(Reserved, Reg);
}		}
}		}
}		}
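For each indirect index, the reservation loop marks register `(4 * Index) + Chan` of `R600_TReg32RegClass` on each of the first `StackWidth` channels. The hypothetical helper below only computes those positions. It takes `Begin`, `End` and `StackWidth` as plain parameters instead of querying them from the `MachineFunction` and `R600FrameLowering`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: lists the R600_TReg32RegClass positions the loop would
// reserve. End == -1 means the function has no indirect accesses.
std::vector<unsigned> reservedTRegPositions(int Begin, int End,
                                            unsigned StackWidth) {
  std::vector<unsigned> Out;
  if (End == -1)
    return Out;
  for (int Index = Begin; Index <= End; ++Index)
    for (unsigned Chan = 0; Chan < StackWidth; ++Chan)
      Out.push_back(4 * Index + Chan); // four channels per register index
  return Out;
}
```

With a stack width below 4, only the leading channels of each index are reserved, so gaps appear between consecutive indices.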

const TargetRegisterClass *R600InstrInfo::getIndirectAddrRegClass() const {		const TargetRegisterClass *R600InstrInfo::getIndirectAddrRegClass() const {
return &AMDGPU::R600_TReg32_XRegClass;		return &R600::R600_TReg32_XRegClass;
}		}

MachineInstrBuilder R600InstrInfo::buildIndirectWrite(MachineBasicBlock *MBB,		MachineInstrBuilder R600InstrInfo::buildIndirectWrite(MachineBasicBlock *MBB,
MachineBasicBlock::iterator I,		MachineBasicBlock::iterator I,
unsigned ValueReg, unsigned Address,		unsigned ValueReg, unsigned Address,
unsigned OffsetReg) const {		unsigned OffsetReg) const {
return buildIndirectWrite(MBB, I, ValueReg, Address, OffsetReg, 0);		return buildIndirectWrite(MBB, I, ValueReg, Address, OffsetReg, 0);
}		}

MachineInstrBuilder R600InstrInfo::buildIndirectWrite(MachineBasicBlock *MBB,		MachineInstrBuilder R600InstrInfo::buildIndirectWrite(MachineBasicBlock *MBB,
MachineBasicBlock::iterator I,		MachineBasicBlock::iterator I,
unsigned ValueReg, unsigned Address,		unsigned ValueReg, unsigned Address,
unsigned OffsetReg,		unsigned OffsetReg,
unsigned AddrChan) const {		unsigned AddrChan) const {
unsigned AddrReg;		unsigned AddrReg;
switch (AddrChan) {		switch (AddrChan) {
default: llvm_unreachable("Invalid Channel");		default: llvm_unreachable("Invalid Channel");
case 0: AddrReg = AMDGPU::R600_AddrRegClass.getRegister(Address); break;		case 0: AddrReg = R600::R600_AddrRegClass.getRegister(Address); break;
case 1: AddrReg = AMDGPU::R600_Addr_YRegClass.getRegister(Address); break;		case 1: AddrReg = R600::R600_Addr_YRegClass.getRegister(Address); break;
case 2: AddrReg = AMDGPU::R600_Addr_ZRegClass.getRegister(Address); break;		case 2: AddrReg = R600::R600_Addr_ZRegClass.getRegister(Address); break;
case 3: AddrReg = AMDGPU::R600_Addr_WRegClass.getRegister(Address); break;		case 3: AddrReg = R600::R600_Addr_WRegClass.getRegister(Address); break;
}		}
MachineInstr MOVA = buildDefaultInstruction(MBB, I, AMDGPU::MOVA_INT_eg,		MachineInstr MOVA = buildDefaultInstruction(MBB, I, R600::MOVA_INT_eg,
AMDGPU::AR_X, OffsetReg);		R600::AR_X, OffsetReg);
setImmOperand(*MOVA, AMDGPU::OpName::write, 0);		setImmOperand(*MOVA, R600::OpName::write, 0);

MachineInstrBuilder Mov = buildDefaultInstruction(*MBB, I, AMDGPU::MOV,		MachineInstrBuilder Mov = buildDefaultInstruction(*MBB, I, R600::MOV,
AddrReg, ValueReg)		AddrReg, ValueReg)
.addReg(AMDGPU::AR_X,		.addReg(R600::AR_X,
RegState::Implicit | RegState::Kill);		RegState::Implicit | RegState::Kill);
setImmOperand(*Mov, AMDGPU::OpName::dst_rel, 1);		setImmOperand(*Mov, R600::OpName::dst_rel, 1);
return Mov;		return Mov;
}		}

MachineInstrBuilder R600InstrInfo::buildIndirectRead(MachineBasicBlock *MBB,		MachineInstrBuilder R600InstrInfo::buildIndirectRead(MachineBasicBlock *MBB,
MachineBasicBlock::iterator I,		MachineBasicBlock::iterator I,
unsigned ValueReg, unsigned Address,		unsigned ValueReg, unsigned Address,
unsigned OffsetReg) const {		unsigned OffsetReg) const {
return buildIndirectRead(MBB, I, ValueReg, Address, OffsetReg, 0);		return buildIndirectRead(MBB, I, ValueReg, Address, OffsetReg, 0);
}		}

MachineInstrBuilder R600InstrInfo::buildIndirectRead(MachineBasicBlock *MBB,		MachineInstrBuilder R600InstrInfo::buildIndirectRead(MachineBasicBlock *MBB,
MachineBasicBlock::iterator I,		MachineBasicBlock::iterator I,
unsigned ValueReg, unsigned Address,		unsigned ValueReg, unsigned Address,
unsigned OffsetReg,		unsigned OffsetReg,
unsigned AddrChan) const {		unsigned AddrChan) const {
unsigned AddrReg;		unsigned AddrReg;
switch (AddrChan) {		switch (AddrChan) {
default: llvm_unreachable("Invalid Channel");		default: llvm_unreachable("Invalid Channel");
case 0: AddrReg = AMDGPU::R600_AddrRegClass.getRegister(Address); break;		case 0: AddrReg = R600::R600_AddrRegClass.getRegister(Address); break;
case 1: AddrReg = AMDGPU::R600_Addr_YRegClass.getRegister(Address); break;		case 1: AddrReg = R600::R600_Addr_YRegClass.getRegister(Address); break;
case 2: AddrReg = AMDGPU::R600_Addr_ZRegClass.getRegister(Address); break;		case 2: AddrReg = R600::R600_Addr_ZRegClass.getRegister(Address); break;
case 3: AddrReg = AMDGPU::R600_Addr_WRegClass.getRegister(Address); break;		case 3: AddrReg = R600::R600_Addr_WRegClass.getRegister(Address); break;
}		}
MachineInstr MOVA = buildDefaultInstruction(MBB, I, AMDGPU::MOVA_INT_eg,		MachineInstr MOVA = buildDefaultInstruction(MBB, I, R600::MOVA_INT_eg,
AMDGPU::AR_X,		R600::AR_X,
OffsetReg);		OffsetReg);
setImmOperand(*MOVA, AMDGPU::OpName::write, 0);		setImmOperand(*MOVA, R600::OpName::write, 0);
MachineInstrBuilder Mov = buildDefaultInstruction(*MBB, I, AMDGPU::MOV,		MachineInstrBuilder Mov = buildDefaultInstruction(*MBB, I, R600::MOV,
ValueReg,		ValueReg,
AddrReg)		AddrReg)
.addReg(AMDGPU::AR_X,		.addReg(R600::AR_X,
RegState::Implicit | RegState::Kill);		RegState::Implicit | RegState::Kill);
setImmOperand(*Mov, AMDGPU::OpName::src0_rel, 1);		setImmOperand(*Mov, R600::OpName::src0_rel, 1);

return Mov;		return Mov;
}		}
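`buildIndirectWrite` and `buildIndirectRead` both pick the address register class with the same four-way switch on `AddrChan`. One possible table-driven way to express that choice is sketched below. It is hypothetical: class names are plain strings, and an exception replaces `llvm_unreachable` so the snippet compiles on its own.

```cpp
#include <array>
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical table-driven version of the AddrChan switch: index 0..3 maps to
// the X/Y/Z/W address register class used for relative addressing.
const char *addrRegClassForChan(unsigned AddrChan) {
  static const std::array<const char *, 4> Classes = {
      "R600_AddrRegClass", "R600_Addr_YRegClass", "R600_Addr_ZRegClass",
      "R600_Addr_WRegClass"};
  if (AddrChan >= Classes.size())
    throw std::out_of_range("Invalid Channel");
  return Classes[AddrChan];
}
```

A shared lookup like this would keep the read and write paths from drifting apart. The trade-off is an extra indirection compared with the explicit switch.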

int R600InstrInfo::getIndirectIndexBegin(const MachineFunction &MF) const {		int R600InstrInfo::getIndirectIndexBegin(const MachineFunction &MF) const {
const MachineRegisterInfo &MRI = MF.getRegInfo();		const MachineRegisterInfo &MRI = MF.getRegInfo();
const MachineFrameInfo &MFI = MF.getFrameInfo();		const MachineFrameInfo &MFI = MF.getFrameInfo();
int Offset = -1;		int Offset = -1;
[81 unchanged lines not shown]
MIB.addReg(Src1Reg) // $src1
.addImm(0) // $src1_rel		.addImm(0) // $src1_rel
.addImm(0) // $src1_abs		.addImm(0) // $src1_abs
.addImm(-1); // $src1_sel		.addImm(-1); // $src1_sel
}		}

//XXX: The r600g finalizer expects this to be 1, once we've moved the		//XXX: The r600g finalizer expects this to be 1, once we've moved the
//scheduling to the backend, we can change the default to 0.		//scheduling to the backend, we can change the default to 0.
MIB.addImm(1) // $last		MIB.addImm(1) // $last
.addReg(AMDGPU::PRED_SEL_OFF) // $pred_sel		.addReg(R600::PRED_SEL_OFF) // $pred_sel
.addImm(0) // $literal		.addImm(0) // $literal
.addImm(0); // $bank_swizzle		.addImm(0); // $bank_swizzle

return MIB;		return MIB;
}		}

#define OPERAND_CASE(Label) \		#define OPERAND_CASE(Label) \
case Label: { \		case Label: { \
static const unsigned Ops[] = \		static const unsigned Ops[] = \
{ \		{ \
Label##_X, \		Label##_X, \
Label##_Y, \		Label##_Y, \
Label##_Z, \		Label##_Z, \
Label##_W \		Label##_W \
}; \		}; \
return Ops[Slot]; \		return Ops[Slot]; \
}		}

static unsigned getSlotedOps(unsigned Op, unsigned Slot) {		static unsigned getSlotedOps(unsigned Op, unsigned Slot) {
switch (Op) {		switch (Op) {
OPERAND_CASE(AMDGPU::OpName::update_exec_mask)		OPERAND_CASE(R600::OpName::update_exec_mask)
OPERAND_CASE(AMDGPU::OpName::update_pred)		OPERAND_CASE(R600::OpName::update_pred)
OPERAND_CASE(AMDGPU::OpName::write)		OPERAND_CASE(R600::OpName::write)
OPERAND_CASE(AMDGPU::OpName::omod)		OPERAND_CASE(R600::OpName::omod)
OPERAND_CASE(AMDGPU::OpName::dst_rel)		OPERAND_CASE(R600::OpName::dst_rel)
OPERAND_CASE(AMDGPU::OpName::clamp)		OPERAND_CASE(R600::OpName::clamp)
OPERAND_CASE(AMDGPU::OpName::src0)		OPERAND_CASE(R600::OpName::src0)
OPERAND_CASE(AMDGPU::OpName::src0_neg)		OPERAND_CASE(R600::OpName::src0_neg)
OPERAND_CASE(AMDGPU::OpName::src0_rel)		OPERAND_CASE(R600::OpName::src0_rel)
OPERAND_CASE(AMDGPU::OpName::src0_abs)		OPERAND_CASE(R600::OpName::src0_abs)
OPERAND_CASE(AMDGPU::OpName::src0_sel)		OPERAND_CASE(R600::OpName::src0_sel)
OPERAND_CASE(AMDGPU::OpName::src1)		OPERAND_CASE(R600::OpName::src1)
OPERAND_CASE(AMDGPU::OpName::src1_neg)		OPERAND_CASE(R600::OpName::src1_neg)
OPERAND_CASE(AMDGPU::OpName::src1_rel)		OPERAND_CASE(R600::OpName::src1_rel)
OPERAND_CASE(AMDGPU::OpName::src1_abs)		OPERAND_CASE(R600::OpName::src1_abs)
OPERAND_CASE(AMDGPU::OpName::src1_sel)		OPERAND_CASE(R600::OpName::src1_sel)
OPERAND_CASE(AMDGPU::OpName::pred_sel)		OPERAND_CASE(R600::OpName::pred_sel)
default:		default:
llvm_unreachable("Wrong Operand");		llvm_unreachable("Wrong Operand");
}		}
}		}

#undef OPERAND_CASE		#undef OPERAND_CASE
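`OPERAND_CASE` relies on token pasting. In `Label##_X`, the `##` joins the last token of the macro argument with `_X`. For `R600::OpName::src0`, that last token is `src0`, so the result is `R600::OpName::src0_X`. The self-contained sketch below applies the same pattern to a hypothetical `OpName` enum with only two operands.

```cpp
#include <cassert>

// Hypothetical stand-ins for generated operand-name enumerators.
namespace OpName {
enum : unsigned { src0, src0_X, src0_Y, src0_Z, src0_W,
                  src1, src1_X, src1_Y, src1_Z, src1_W };
}

// Same shape as the original macro: maps a base operand to its per-slot name.
#define OPERAND_CASE(Label)                                                    \
  case Label: {                                                                \
    static const unsigned Ops[] = {Label##_X, Label##_Y, Label##_Z,            \
                                   Label##_W};                                 \
    return Ops[Slot];                                                          \
  }

// Assumes Slot is in [0, 3].
unsigned getSlotedOpsSketch(unsigned Op, unsigned Slot) {
  switch (Op) {
    OPERAND_CASE(OpName::src0)
    OPERAND_CASE(OpName::src1)
  default:
    assert(false && "Wrong Operand");
    return ~0u;
  }
}

#undef OPERAND_CASE
```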

MachineInstr *R600InstrInfo::buildSlotOfVectorInstruction(		MachineInstr *R600InstrInfo::buildSlotOfVectorInstruction(
MachineBasicBlock &MBB, MachineInstr *MI, unsigned Slot, unsigned DstReg)		MachineBasicBlock &MBB, MachineInstr *MI, unsigned Slot, unsigned DstReg)
const {		const {
assert (MI->getOpcode() == AMDGPU::DOT_4 && "Not Implemented");		assert (MI->getOpcode() == R600::DOT_4 && "Not Implemented");
unsigned Opcode;		unsigned Opcode;
if (ST.getGeneration() <= R600Subtarget::R700)		if (ST.getGeneration() <= R600Subtarget::R700)
Opcode = AMDGPU::DOT4_r600;		Opcode = R600::DOT4_r600;
else		else
Opcode = AMDGPU::DOT4_eg;		Opcode = R600::DOT4_eg;
MachineBasicBlock::iterator I = MI;		MachineBasicBlock::iterator I = MI;
MachineOperand &Src0 = MI->getOperand(		MachineOperand &Src0 = MI->getOperand(
getOperandIdx(MI->getOpcode(), getSlotedOps(AMDGPU::OpName::src0, Slot)));		getOperandIdx(MI->getOpcode(), getSlotedOps(R600::OpName::src0, Slot)));
MachineOperand &Src1 = MI->getOperand(		MachineOperand &Src1 = MI->getOperand(
getOperandIdx(MI->getOpcode(), getSlotedOps(AMDGPU::OpName::src1, Slot)));		getOperandIdx(MI->getOpcode(), getSlotedOps(R600::OpName::src1, Slot)));
MachineInstr *MIB = buildDefaultInstruction(		MachineInstr *MIB = buildDefaultInstruction(
MBB, I, Opcode, DstReg, Src0.getReg(), Src1.getReg());		MBB, I, Opcode, DstReg, Src0.getReg(), Src1.getReg());
static const unsigned Operands[14] = {		static const unsigned Operands[14] = {
AMDGPU::OpName::update_exec_mask,		R600::OpName::update_exec_mask,
AMDGPU::OpName::update_pred,		R600::OpName::update_pred,
AMDGPU::OpName::write,		R600::OpName::write,
AMDGPU::OpName::omod,		R600::OpName::omod,
AMDGPU::OpName::dst_rel,		R600::OpName::dst_rel,
AMDGPU::OpName::clamp,		R600::OpName::clamp,
AMDGPU::OpName::src0_neg,		R600::OpName::src0_neg,
AMDGPU::OpName::src0_rel,		R600::OpName::src0_rel,
AMDGPU::OpName::src0_abs,		R600::OpName::src0_abs,
AMDGPU::OpName::src0_sel,		R600::OpName::src0_sel,
AMDGPU::OpName::src1_neg,		R600::OpName::src1_neg,
AMDGPU::OpName::src1_rel,		R600::OpName::src1_rel,
AMDGPU::OpName::src1_abs,		R600::OpName::src1_abs,
AMDGPU::OpName::src1_sel,		R600::OpName::src1_sel,
};		};

MachineOperand &MO = MI->getOperand(getOperandIdx(MI->getOpcode(),		MachineOperand &MO = MI->getOperand(getOperandIdx(MI->getOpcode(),
getSlotedOps(AMDGPU::OpName::pred_sel, Slot)));		getSlotedOps(R600::OpName::pred_sel, Slot)));
MIB->getOperand(getOperandIdx(Opcode, AMDGPU::OpName::pred_sel))		MIB->getOperand(getOperandIdx(Opcode, R600::OpName::pred_sel))
.setReg(MO.getReg());		.setReg(MO.getReg());

for (unsigned i = 0; i < 14; i++) {		for (unsigned i = 0; i < 14; i++) {
MachineOperand &MO = MI->getOperand(		MachineOperand &MO = MI->getOperand(
getOperandIdx(MI->getOpcode(), getSlotedOps(Operands[i], Slot)));		getOperandIdx(MI->getOpcode(), getSlotedOps(Operands[i], Slot)));
assert (MO.isImm());		assert (MO.isImm());
setImmOperand(*MIB, Operands[i], MO.getImm());		setImmOperand(*MIB, Operands[i], MO.getImm());
}		}
MIB->getOperand(20).setImm(0);		MIB->getOperand(20).setImm(0);
return MIB;		return MIB;
}		}

MachineInstr *R600InstrInfo::buildMovImm(MachineBasicBlock &BB,		MachineInstr *R600InstrInfo::buildMovImm(MachineBasicBlock &BB,
MachineBasicBlock::iterator I,		MachineBasicBlock::iterator I,
unsigned DstReg,		unsigned DstReg,
uint64_t Imm) const {		uint64_t Imm) const {
MachineInstr *MovImm = buildDefaultInstruction(BB, I, AMDGPU::MOV, DstReg,		MachineInstr *MovImm = buildDefaultInstruction(BB, I, R600::MOV, DstReg,
AMDGPU::ALU_LITERAL_X);		R600::ALU_LITERAL_X);
setImmOperand(*MovImm, AMDGPU::OpName::literal, Imm);		setImmOperand(*MovImm, R600::OpName::literal, Imm);
return MovImm;		return MovImm;
}		}

MachineInstr *R600InstrInfo::buildMovInstr(MachineBasicBlock *MBB,		MachineInstr *R600InstrInfo::buildMovInstr(MachineBasicBlock *MBB,
MachineBasicBlock::iterator I,		MachineBasicBlock::iterator I,
unsigned DstReg, unsigned SrcReg) const {		unsigned DstReg, unsigned SrcReg) const {
return buildDefaultInstruction(*MBB, I, AMDGPU::MOV, DstReg, SrcReg);		return buildDefaultInstruction(*MBB, I, R600::MOV, DstReg, SrcReg);
}		}

int R600InstrInfo::getOperandIdx(const MachineInstr &MI, unsigned Op) const {		int R600InstrInfo::getOperandIdx(const MachineInstr &MI, unsigned Op) const {
return getOperandIdx(MI.getOpcode(), Op);		return getOperandIdx(MI.getOpcode(), Op);
}		}

int R600InstrInfo::getOperandIdx(unsigned Opcode, unsigned Op) const {		int R600InstrInfo::getOperandIdx(unsigned Opcode, unsigned Op) const {
return AMDGPU::getNamedOperandIdx(Opcode, Op);		return R600::getNamedOperandIdx(Opcode, Op);
}		}
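`getOperandIdx` forwards to the TableGen-generated `R600::getNamedOperandIdx`, which answers "at what position does operand Name sit in instruction Opcode?" and returns -1 when the instruction has no such operand; `setImmOperand` then asserts that the index is valid before writing. A minimal sketch of that contract, assuming a hand-written `std::map` in place of the generated tables (all opcodes, operand names, and the `ToyInstr` type are hypothetical):

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical opcodes and operand names.
enum ToyOpcode : unsigned { MOV, DOT4 };
enum ToyOpName : unsigned { dst, src0, src1, literal };

// Maps (opcode, operand name) to the operand's position in the instruction.
static const std::map<std::pair<unsigned, unsigned>, int> NamedOperandIdx = {
    {{MOV, dst}, 0},  {{MOV, src0}, 1},  {{MOV, literal}, 2},
    {{DOT4, dst}, 0}, {{DOT4, src0}, 1}, {{DOT4, src1}, 2},
};

// Returns the operand index, or -1 if the opcode has no such named operand.
int getNamedOperandIdx(unsigned Opcode, unsigned Name) {
  auto It = NamedOperandIdx.find({Opcode, Name});
  return It == NamedOperandIdx.end() ? -1 : It->second;
}

// Toy instruction: an opcode plus a flat list of immediate operands.
struct ToyInstr {
  unsigned Opcode;
  std::vector<int64_t> Imms;
};

// Same shape as setImmOperand: look the operand up by name, assert it exists.
void setImmOperand(ToyInstr &MI, unsigned Name, int64_t Imm) {
  int Idx = getNamedOperandIdx(MI.Opcode, Name);
  assert(Idx != -1 && "Operand not supported for this instruction.");
  MI.Imms[Idx] = Imm;
}
```

Looking operands up by name rather than by hard-coded position lets the same helper work across opcodes whose operand lists differ.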

void R600InstrInfo::setImmOperand(MachineInstr &MI, unsigned Op,		void R600InstrInfo::setImmOperand(MachineInstr &MI, unsigned Op,
int64_t Imm) const {		int64_t Imm) const {
int Idx = getOperandIdx(MI, Op);		int Idx = getOperandIdx(MI, Op);
assert(Idx != -1 && "Operand not supported for this instruction.");		assert(Idx != -1 && "Operand not supported for this instruction.");
assert(MI.getOperand(Idx).isImm());		assert(MI.getOperand(Idx).isImm());
MI.getOperand(Idx).setImm(Imm);		MI.getOperand(Idx).setImm(Imm);
(10 lines not shown)	MachineOperand &R600InstrInfo::getFlagOp(MachineInstr &MI, unsigned SrcIdx,
if (Flag != 0) {		if (Flag != 0) {
// If we pass something other than the default value of Flag to this		// If we pass something other than the default value of Flag to this
// function, it means we want to set a flag on an instruction		// function, it means we want to set a flag on an instruction
// that uses native encoding.		// that uses native encoding.
assert(HAS_NATIVE_OPERANDS(TargetFlags));		assert(HAS_NATIVE_OPERANDS(TargetFlags));
bool IsOP3 = (TargetFlags & R600_InstFlag::OP3) == R600_InstFlag::OP3;		bool IsOP3 = (TargetFlags & R600_InstFlag::OP3) == R600_InstFlag::OP3;
switch (Flag) {		switch (Flag) {
case MO_FLAG_CLAMP:		case MO_FLAG_CLAMP:
FlagIndex = getOperandIdx(MI, AMDGPU::OpName::clamp);		FlagIndex = getOperandIdx(MI, R600::OpName::clamp);
break;		break;
case MO_FLAG_MASK:		case MO_FLAG_MASK:
FlagIndex = getOperandIdx(MI, AMDGPU::OpName::write);		FlagIndex = getOperandIdx(MI, R600::OpName::write);
break;		break;
case MO_FLAG_NOT_LAST:		case MO_FLAG_NOT_LAST:
case MO_FLAG_LAST:		case MO_FLAG_LAST:
FlagIndex = getOperandIdx(MI, AMDGPU::OpName::last);		FlagIndex = getOperandIdx(MI, R600::OpName::last);
break;		break;
case MO_FLAG_NEG:		case MO_FLAG_NEG:
switch (SrcIdx) {		switch (SrcIdx) {
case 0:		case 0:
FlagIndex = getOperandIdx(MI, AMDGPU::OpName::src0_neg);		FlagIndex = getOperandIdx(MI, R600::OpName::src0_neg);
break;		break;
case 1:		case 1:
FlagIndex = getOperandIdx(MI, AMDGPU::OpName::src1_neg);		FlagIndex = getOperandIdx(MI, R600::OpName::src1_neg);
break;		break;
case 2:		case 2:
FlagIndex = getOperandIdx(MI, AMDGPU::OpName::src2_neg);		FlagIndex = getOperandIdx(MI, R600::OpName::src2_neg);
break;		break;
}		}
break;		break;

case MO_FLAG_ABS:		case MO_FLAG_ABS:
assert(!IsOP3 && "Cannot set absolute value modifier for OP3 "		assert(!IsOP3 && "Cannot set absolute value modifier for OP3 "
"instructions.");		"instructions.");
(void)IsOP3;		(void)IsOP3;
switch (SrcIdx) {		switch (SrcIdx) {
case 0:		case 0:
FlagIndex = getOperandIdx(MI, AMDGPU::OpName::src0_abs);		FlagIndex = getOperandIdx(MI, R600::OpName::src0_abs);
break;		break;
case 1:		case 1:
FlagIndex = getOperandIdx(MI, AMDGPU::OpName::src1_abs);		FlagIndex = getOperandIdx(MI, R600::OpName::src1_abs);
break;		break;
}		}
break;		break;

default:		default:
FlagIndex = -1;		FlagIndex = -1;
break;		break;
}		}
(44 lines not shown)	void R600InstrInfo::clearFlag(MachineInstr &MI, unsigned Operand,
}		}
}		}

unsigned R600InstrInfo::getAddressSpaceForPseudoSourceKind(		unsigned R600InstrInfo::getAddressSpaceForPseudoSourceKind(
PseudoSourceValue::PSVKind Kind) const {		PseudoSourceValue::PSVKind Kind) const {
switch (Kind) {		switch (Kind) {
case PseudoSourceValue::Stack:		case PseudoSourceValue::Stack:
case PseudoSourceValue::FixedStack:		case PseudoSourceValue::FixedStack:
return AMDGPUASI.PRIVATE_ADDRESS;		return ST.getAMDGPUAS().PRIVATE_ADDRESS;
case PseudoSourceValue::ConstantPool:		case PseudoSourceValue::ConstantPool:
case PseudoSourceValue::GOT:		case PseudoSourceValue::GOT:
case PseudoSourceValue::JumpTable:		case PseudoSourceValue::JumpTable:
case PseudoSourceValue::GlobalValueCallEntry:		case PseudoSourceValue::GlobalValueCallEntry:
case PseudoSourceValue::ExternalSymbolCallEntry:		case PseudoSourceValue::ExternalSymbolCallEntry:
case PseudoSourceValue::TargetCustom:		case PseudoSourceValue::TargetCustom:
return AMDGPUASI.CONSTANT_ADDRESS;		return ST.getAMDGPUAS().CONSTANT_ADDRESS;
}		}
llvm_unreachable("Invalid pseudo source kind");		llvm_unreachable("Invalid pseudo source kind");
return AMDGPUASI.PRIVATE_ADDRESS;		return ST.getAMDGPUAS().PRIVATE_ADDRESS;
}		}

lib/Target/AMDGPU/R600Instructions.td

//===-- R600Instructions.td - R600 Instruction defs -------- tablegen --===//		//===-- R600Instructions.td - R600 Instruction defs -------- tablegen --===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// TableGen definitions for instructions which are available on R600 family		// TableGen definitions for instructions which are available on R600 family
// GPUs.		// GPUs.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

include "R600Intrinsics.td"
include "R600InstrFormats.td"		include "R600InstrFormats.td"

// FIXME: Should not be arbitrarily split from other R600 inst classes.		// FIXME: Should not be arbitrarily split from other R600 inst classes.
class R600WrapperInst <dag outs, dag ins, string asm = "", list<dag> pattern = []> :		class R600WrapperInst <dag outs, dag ins, string asm = "", list<dag> pattern = []> :
AMDGPUInst<outs, ins, asm, pattern>, PredicateControl {		AMDGPUInst<outs, ins, asm, pattern>, PredicateControl {
let SubtargetPredicate = isR600toCayman;		let SubtargetPredicate = isR600toCayman;
		let Namespace = "R600";
}		}


class InstR600ISA <dag outs, dag ins, string asm, list<dag> pattern = []> :		class InstR600ISA <dag outs, dag ins, string asm, list<dag> pattern = []> :
InstR600 <outs, ins, asm, pattern, NullALU> {		InstR600 <outs, ins, asm, pattern, NullALU> {

let Namespace = "AMDGPU";
}		}
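The `Namespace` settings in these classes decide which generated C++ namespace the records land in, which is why the C++ hunks earlier replace `AMDGPU::` with `R600::`. Because R600 and GCN now get separate generated tables, each enum is numbered on its own, so an opcode value only has meaning inside its own namespace. A sketch of the hazard, using two hypothetical namespaces with made-up opcode values:

```cpp
#include <cassert>

// Hypothetical stand-ins for two separately generated opcode enums. Each
// generated table numbers its own records, so equal values can name
// unrelated instructions in the two namespaces.
namespace GCNToy {
enum : unsigned { PHI = 0, V_MOV_B32 = 100, S_ENDPGM = 101 };
} // namespace GCNToy

namespace R600Toy {
enum : unsigned { PHI = 0, MOV = 100, DOT_4 = 101 };
} // namespace R600Toy

// Code that only ever sees R600 instructions must compare against R600Toy::
// values; passing a GCNToy:: constant would silently match the wrong opcode.
bool isR600Dot4(unsigned Opcode) { return Opcode == R600Toy::DOT_4; }
```

Nothing in the type system catches a constant from the wrong namespace, since both decay to `unsigned`; keeping each sub-target's code on its own namespace is what prevents such mix-ups.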

def MEMxi : Operand<iPTR> {		def MEMxi : Operand<iPTR> {
let MIOperandInfo = (ops R600_TReg32_X:$ptr, i32imm:$index);		let MIOperandInfo = (ops R600_TReg32_X:$ptr, i32imm:$index);
let PrintMethod = "printMemOperand";		let PrintMethod = "printMemOperand";
}		}

def MEMrr : Operand<iPTR> {		def MEMrr : Operand<iPTR> {
(45 lines not shown)
def ADDRGA_CONST_OFFSET : ComplexPattern<i32, 1, "SelectGlobalValueConstantOffset", [], []>;		def ADDRGA_CONST_OFFSET : ComplexPattern<i32, 1, "SelectGlobalValueConstantOffset", [], []>;
def ADDRGA_VAR_OFFSET : ComplexPattern<i32, 2, "SelectGlobalValueVariableOffset", [], []>;		def ADDRGA_VAR_OFFSET : ComplexPattern<i32, 2, "SelectGlobalValueVariableOffset", [], []>;
def ADDRIndirect : ComplexPattern<iPTR, 2, "SelectADDRIndirect", [], []>;		def ADDRIndirect : ComplexPattern<iPTR, 2, "SelectADDRIndirect", [], []>;


def R600_Pred : PredicateOperand<i32, (ops R600_Predicate),		def R600_Pred : PredicateOperand<i32, (ops R600_Predicate),
(ops PRED_SEL_OFF)>;		(ops PRED_SEL_OFF)>;

		let isTerminator = 1, isReturn = 1, hasCtrlDep = 1,
		usesCustomInserter = 1, Namespace = "R600" in {
		def RETURN : ILFormat<(outs), (ins variable_ops),
		"RETURN", [(AMDGPUendpgm)]
		>;
		}

let mayLoad = 0, mayStore = 0, hasSideEffects = 0 in {		let mayLoad = 0, mayStore = 0, hasSideEffects = 0 in {

// Class for instructions with only one source register.		// Class for instructions with only one source register.
// If you add new ins to this instruction, make sure they are listed before		// If you add new ins to this instruction, make sure they are listed before
// $literal, because the backend currently assumes that the last operand is		// $literal, because the backend currently assumes that the last operand is
// a literal. Also be sure to update the enum R600Op1OperandIndex::ROI in		// a literal. Also be sure to update the enum R600Op1OperandIndex::ROI in
// R600Defines.h, R600InstrInfo::buildDefaultInstruction(),		// R600Defines.h, R600InstrInfo::buildDefaultInstruction(),
(117 lines not shown)	InstR600 <(outs R600_Reg32:$dst),
asm,		asm,
pattern,		pattern,
itin>;		itin>;



} // End mayLoad = 1, mayStore = 0, hasSideEffects = 0		} // End mayLoad = 1, mayStore = 0, hasSideEffects = 0

def TEX_SHADOW : PatLeaf<
(imm),
[{uint32_t TType = (uint32_t)N->getZExtValue();
return (TType >= 6 && TType <= 8) || TType == 13;
}]
>;

def TEX_RECT : PatLeaf<
(imm),
[{uint32_t TType = (uint32_t)N->getZExtValue();
return TType == 5;
}]
>;

def TEX_ARRAY : PatLeaf<
(imm),
[{uint32_t TType = (uint32_t)N->getZExtValue();
return TType == 9 || TType == 10 || TType == 16;
}]
>;

def TEX_SHADOW_ARRAY : PatLeaf<
(imm),
[{uint32_t TType = (uint32_t)N->getZExtValue();
return TType == 11 || TType == 12 || TType == 17;
}]
>;
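The `TEX_*` `PatLeaf` bodies above are ordinary C++ predicates over the texture-type immediate. Restated as free functions (the function names are hypothetical; in TableGen the value comes from `N->getZExtValue()`), their ranges can be checked directly:

```cpp
#include <cassert>
#include <cstdint>

// Free-function versions of the TEX_SHADOW, TEX_RECT, TEX_ARRAY and
// TEX_SHADOW_ARRAY predicate bodies.
bool isTexShadow(uint32_t TType) {
  return (TType >= 6 && TType <= 8) || TType == 13;
}

bool isTexRect(uint32_t TType) { return TType == 5; }

bool isTexArray(uint32_t TType) {
  return TType == 9 || TType == 10 || TType == 16;
}

bool isTexShadowArray(uint32_t TType) {
  return TType == 11 || TType == 12 || TType == 17;
}
```

The four sets are disjoint, so at most one of these leaves can match a given texture-type immediate.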

class EG_CF_RAT <bits <8> cfinst, bits <6> ratinst, bits<4> ratid, bits<4> mask,		class EG_CF_RAT <bits <8> cfinst, bits <6> ratinst, bits<4> ratid, bits<4> mask,
dag outs, dag ins, string asm, list<dag> pattern> :		dag outs, dag ins, string asm, list<dag> pattern> :
InstR600ISA <outs, ins, asm, pattern>,		InstR600ISA <outs, ins, asm, pattern>,
CF_ALLOC_EXPORT_WORD0_RAT, CF_ALLOC_EXPORT_WORD1_BUF {		CF_ALLOC_EXPORT_WORD0_RAT, CF_ALLOC_EXPORT_WORD1_BUF {

let rat_id = ratid;		let rat_id = ratid;
let rat_inst = ratinst;		let rat_inst = ratinst;
let rim = 0;		let rim = 0;
(94 lines not shown)
def vtx_id2_az_extloadi8 : LoadVtxId2 <az_extloadi8>;		def vtx_id2_az_extloadi8 : LoadVtxId2 <az_extloadi8>;
def vtx_id2_az_extloadi16 : LoadVtxId2 <az_extloadi16>;		def vtx_id2_az_extloadi16 : LoadVtxId2 <az_extloadi16>;
def vtx_id2_load : LoadVtxId2 <load>;		def vtx_id2_load : LoadVtxId2 <load>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// R600 SDNodes		// R600 SDNodes
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

		let Namespace = "R600" in {

def INTERP_PAIR_XY : AMDGPUShaderInst <		def INTERP_PAIR_XY : AMDGPUShaderInst <
(outs R600_TReg32_X:$dst0, R600_TReg32_Y:$dst1),		(outs R600_TReg32_X:$dst0, R600_TReg32_Y:$dst1),
(ins i32imm:$src0, R600_TReg32_Y:$src1, R600_TReg32_X:$src2),		(ins i32imm:$src0, R600_TReg32_Y:$src1, R600_TReg32_X:$src2),
"INTERP_PAIR_XY $src0 $src1 $src2 : $dst0 dst1",		"INTERP_PAIR_XY $src0 $src1 $src2 : $dst0 dst1",
[]>;		[]>;

def INTERP_PAIR_ZW : AMDGPUShaderInst <		def INTERP_PAIR_ZW : AMDGPUShaderInst <
(outs R600_TReg32_Z:$dst0, R600_TReg32_W:$dst1),		(outs R600_TReg32_Z:$dst0, R600_TReg32_W:$dst1),
(ins i32imm:$src0, R600_TReg32_Y:$src1, R600_TReg32_X:$src2),		(ins i32imm:$src0, R600_TReg32_Y:$src1, R600_TReg32_X:$src2),
"INTERP_PAIR_ZW $src0 $src1 $src2 : $dst0 dst1",		"INTERP_PAIR_ZW $src0 $src1 $src2 : $dst0 dst1",
[]>;		[]>;

		}

def CONST_ADDRESS: SDNode<"AMDGPUISD::CONST_ADDRESS",		def CONST_ADDRESS: SDNode<"AMDGPUISD::CONST_ADDRESS",
SDTypeProfile<1, -1, [SDTCisInt<0>, SDTCisPtrTy<1>]>,		SDTypeProfile<1, -1, [SDTCisInt<0>, SDTCisPtrTy<1>]>,
[SDNPVariadic]		[SDNPVariadic]
>;		>;

def DOT4 : SDNode<"AMDGPUISD::DOT4",		def DOT4 : SDNode<"AMDGPUISD::DOT4",
SDTypeProfile<1, 8, [SDTCisFP<0>, SDTCisVT<1, f32>, SDTCisVT<2, f32>,		SDTypeProfile<1, 8, [SDTCisFP<0>, SDTCisVT<1, f32>, SDTCisVT<2, f32>,
SDTCisVT<3, f32>, SDTCisVT<4, f32>, SDTCisVT<5, f32>,		SDTCisVT<3, f32>, SDTCisVT<4, f32>, SDTCisVT<5, f32>,
(31 lines not shown)	def : R600Pat<(TEXTURE_FETCH (i32 TextureOp), vt:$SRC_GPR,
imm:$COORD_TYPE_X, imm:$COORD_TYPE_Y, imm:$COORD_TYPE_Z,		imm:$COORD_TYPE_X, imm:$COORD_TYPE_Y, imm:$COORD_TYPE_Z,
imm:$COORD_TYPE_W)>;		imm:$COORD_TYPE_W)>;
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Interpolation Instructions		// Interpolation Instructions
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

		let Namespace = "R600" in {

def INTERP_VEC_LOAD : AMDGPUShaderInst <		def INTERP_VEC_LOAD : AMDGPUShaderInst <
(outs R600_Reg128:$dst),		(outs R600_Reg128:$dst),
(ins i32imm:$src0),		(ins i32imm:$src0),
"INTERP_LOAD $src0 : $dst">;		"INTERP_LOAD $src0 : $dst">;

		}

def INTERP_XY : R600_2OP <0xD6, "INTERP_XY", []> {		def INTERP_XY : R600_2OP <0xD6, "INTERP_XY", []> {
let bank_swizzle = 5;		let bank_swizzle = 5;
}		}

def INTERP_ZW : R600_2OP <0xD7, "INTERP_ZW", []> {		def INTERP_ZW : R600_2OP <0xD7, "INTERP_ZW", []> {
let bank_swizzle = 5;		let bank_swizzle = 5;
}		}

(223 lines not shown)
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Common Instructions R600, R700, Evergreen, Cayman		// Common Instructions R600, R700, Evergreen, Cayman
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

let isCodeGenOnly = 1, isPseudo = 1 in {		let isCodeGenOnly = 1, isPseudo = 1 in {

let usesCustomInserter = 1 in {		let Namespace = "R600", usesCustomInserter = 1 in {

class CLAMP <RegisterClass rc> : AMDGPUShaderInst <		class CLAMP <RegisterClass rc> : AMDGPUShaderInst <
(outs rc:$dst),		(outs rc:$dst),
(ins rc:$src0),		(ins rc:$src0),
"CLAMP $dst, $src0",		"CLAMP $dst, $src0",
[(set f32:$dst, (AMDGPUclamp f32:$src0))]		[(set f32:$dst, (AMDGPUclamp f32:$src0))]
>;		>;

(122 lines not shown)

let isPseudo = 1, isCodeGenOnly = 1, usesCustomInserter = 1 in {		let isPseudo = 1, isCodeGenOnly = 1, usesCustomInserter = 1 in {

class MOV_IMM <ValueType vt, Operand immType> : R600WrapperInst <		class MOV_IMM <ValueType vt, Operand immType> : R600WrapperInst <
(outs R600_Reg32:$dst),		(outs R600_Reg32:$dst),
(ins immType:$imm),		(ins immType:$imm),
"",		"",
[]		[]
>;		> {
		let Namespace = "R600";
		}

} // end let isPseudo = 1, isCodeGenOnly = 1, usesCustomInserter = 1		} // end let isPseudo = 1, isCodeGenOnly = 1, usesCustomInserter = 1

def MOV_IMM_I32 : MOV_IMM<i32, i32imm>;		def MOV_IMM_I32 : MOV_IMM<i32, i32imm>;
def : R600Pat <		def : R600Pat <
(imm:$val),		(imm:$val),
(MOV_IMM_I32 imm:$val)		(MOV_IMM_I32 imm:$val)
>;		>;
(198 lines not shown)
class CNDGE_Common <bits<5> inst> : R600_3OP <		class CNDGE_Common <bits<5> inst> : R600_3OP <
inst, "CNDGE",		inst, "CNDGE",
[(set f32:$dst, (selectcc f32:$src0, FP_ZERO, f32:$src1, f32:$src2, COND_OGE))]		[(set f32:$dst, (selectcc f32:$src0, FP_ZERO, f32:$src1, f32:$src2, COND_OGE))]
> {		> {
let Itinerary = VecALU;		let Itinerary = VecALU;
}		}


let isCodeGenOnly = 1, isPseudo = 1, Namespace = "AMDGPU" in {		let isCodeGenOnly = 1, isPseudo = 1, Namespace = "R600" in {
class R600_VEC2OP<list<dag> pattern> : InstR600 <(outs R600_Reg32:$dst), (ins		class R600_VEC2OP<list<dag> pattern> : InstR600 <(outs R600_Reg32:$dst), (ins
// Slot X		// Slot X
UEM:$update_exec_mask_X, UP:$update_pred_X, WRITE:$write_X,		UEM:$update_exec_mask_X, UP:$update_pred_X, WRITE:$write_X,
OMOD:$omod_X, REL:$dst_rel_X, CLAMP:$clamp_X,		OMOD:$omod_X, REL:$dst_rel_X, CLAMP:$clamp_X,
R600_TReg32_X:$src0_X, NEG:$src0_neg_X, REL:$src0_rel_X, ABS:$src0_abs_X, SEL:$src0_sel_X,		R600_TReg32_X:$src0_X, NEG:$src0_neg_X, REL:$src0_rel_X, ABS:$src0_abs_X, SEL:$src0_sel_X,
R600_TReg32_X:$src1_X, NEG:$src1_neg_X, REL:$src1_rel_X, ABS:$src1_abs_X, SEL:$src1_sel_X,		R600_TReg32_X:$src1_X, NEG:$src1_neg_X, REL:$src1_rel_X, ABS:$src1_abs_X, SEL:$src1_sel_X,
R600_Pred:$pred_sel_X,		R600_Pred:$pred_sel_X,
// Slot Y		// Slot Y
(303 lines not shown)

}		}


//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Register loads and stores - for indirect addressing		// Register loads and stores - for indirect addressing
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

		let Namespace = "R600" in {
defm R600_ : RegisterLoadStore <R600_Reg32, FRAMEri, ADDRIndirect>;		defm R600_ : RegisterLoadStore <R600_Reg32, FRAMEri, ADDRIndirect>;
		}

// Hardcode channel to 0		// Hardcode channel to 0
// NOTE: LSHR is not available here. LSHR is per family instruction		// NOTE: LSHR is not available here. LSHR is per family instruction
def : R600Pat <		def : R600Pat <
(i32 (load_private ADDRIndirect:$addr) ),		(i32 (load_private ADDRIndirect:$addr) ),
(R600_RegisterLoad FRAMEri:$addr, (i32 0))		(R600_RegisterLoad FRAMEri:$addr, (i32 0))
>;		>;
def : R600Pat <		def : R600Pat <
(35 lines not shown)
}		}

} // End isTerminator = 1, isBranch = 1		} // End isTerminator = 1, isBranch = 1

let usesCustomInserter = 1 in {		let usesCustomInserter = 1 in {

let mayLoad = 0, mayStore = 0, hasSideEffects = 1 in {		let mayLoad = 0, mayStore = 0, hasSideEffects = 1 in {

def MASK_WRITE : AMDGPUShaderInst <		def MASK_WRITE : InstR600 <
(outs),		(outs),
(ins R600_Reg32:$src),		(ins R600_Reg32:$src),
"MASK_WRITE $src",		"MASK_WRITE $src",
[]		[],
		NullALU
>;		>;

} // End mayLoad = 0, mayStore = 0, hasSideEffects = 1		} // End mayLoad = 0, mayStore = 0, hasSideEffects = 1


def TXD: InstR600 <		def TXD: InstR600 <
(outs R600_Reg128:$dst),		(outs R600_Reg128:$dst),
(ins R600_Reg128:$src0, R600_Reg128:$src1, R600_Reg128:$src2,		(ins R600_Reg128:$src0, R600_Reg128:$src1, R600_Reg128:$src2,
(14 lines not shown)
} // End isPseudo = 1		} // End isPseudo = 1
} // End usesCustomInserter = 1		} // End usesCustomInserter = 1


//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Constant Buffer Addressing Support		// Constant Buffer Addressing Support
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

let usesCustomInserter = 1, isCodeGenOnly = 1, isPseudo = 1, Namespace = "AMDGPU" in {		let usesCustomInserter = 1, isCodeGenOnly = 1, isPseudo = 1, Namespace = "R600" in {
def CONST_COPY : Instruction {		def CONST_COPY : Instruction {
let OutOperandList = (outs R600_Reg32:$dst);		let OutOperandList = (outs R600_Reg32:$dst);
let InOperandList = (ins i32imm:$src);		let InOperandList = (ins i32imm:$src);
let Pattern =		let Pattern =
[(set R600_Reg32:$dst, (CONST_ADDRESS ADDRGA_CONST_OFFSET:$src))];		[(set R600_Reg32:$dst, (CONST_ADDRESS ADDRGA_CONST_OFFSET:$src))];
let AsmString = "CONST_COPY";		let AsmString = "CONST_COPY";
let hasSideEffects = 0;		let hasSideEffects = 0;
let isAsCheapAsAMove = 1;		let isAsCheapAsAMove = 1;
(106 lines not shown)
//		//
// Inst{127-96} = 0;		// Inst{127-96} = 0;
let VTXInst = 1;		let VTXInst = 1;
}		}

//===---------------------------------------------------------------------===//		//===---------------------------------------------------------------------===//
// Flow and Program control Instructions		// Flow and Program control Instructions
//===---------------------------------------------------------------------===//		//===---------------------------------------------------------------------===//
class ILFormat<dag outs, dag ins, string asmstr, list<dag> pattern>
: Instruction {

let Namespace = "AMDGPU";
dag OutOperandList = outs;
dag InOperandList = ins;
let Pattern = pattern;
let AsmString = !strconcat(asmstr, "\n");
let isPseudo = 1;
let Itinerary = NullALU;
bit hasIEEEFlag = 0;
bit hasZeroOpFlag = 0;
let mayLoad = 0;
let mayStore = 0;
let hasSideEffects = 0;
let isCodeGenOnly = 1;
}

multiclass BranchConditional<SDNode Op, RegisterClass rci, RegisterClass rcf> {		multiclass BranchConditional<SDNode Op, RegisterClass rci, RegisterClass rcf> {
def _i32 : ILFormat<(outs),		def _i32 : ILFormat<(outs),
(ins brtarget:$target, rci:$src0),		(ins brtarget:$target, rci:$src0),
"; i32 Pseudo branch instruction",		"; i32 Pseudo branch instruction",
[(Op bb:$target, (i32 rci:$src0))]>;		[(Op bb:$target, (i32 rci:$src0))]>;
def _f32 : ILFormat<(outs),		def _f32 : ILFormat<(outs),
(ins brtarget:$target, rcf:$src0),		(ins brtarget:$target, rcf:$src0),
(15 lines not shown)	multiclass BranchInstr2<string name> {
def _f32 : ILFormat<(outs), (ins R600_Reg32:$src0, R600_Reg32:$src1),		def _f32 : ILFormat<(outs), (ins R600_Reg32:$src0, R600_Reg32:$src1),
!strconcat(name, " $src0, $src1"), []>;		!strconcat(name, " $src0, $src1"), []>;
}		}

//===---------------------------------------------------------------------===//		//===---------------------------------------------------------------------===//
// Custom Inserter for Branches and returns, this eventually will be a		// Custom Inserter for Branches and returns, this eventually will be a
// separate pass		// separate pass
//===---------------------------------------------------------------------===//		//===---------------------------------------------------------------------===//
let isTerminator = 1, usesCustomInserter = 1, isBranch = 1, isBarrier = 1 in {		let isTerminator = 1, usesCustomInserter = 1, isBranch = 1, isBarrier = 1,
		Namespace = "R600" in {
def BRANCH : ILFormat<(outs), (ins brtarget:$target),		def BRANCH : ILFormat<(outs), (ins brtarget:$target),
"; Pseudo unconditional branch instruction",		"; Pseudo unconditional branch instruction",
[(br bb:$target)]>;		[(br bb:$target)]>;
defm BRANCH_COND : BranchConditional<IL_brcond, R600_Reg32, R600_Reg32>;		defm BRANCH_COND : BranchConditional<IL_brcond, R600_Reg32, R600_Reg32>;
}		}

//===---------------------------------------------------------------------===//
// Return instruction
//===---------------------------------------------------------------------===//
let isTerminator = 1, isReturn = 1, hasCtrlDep = 1,
usesCustomInserter = 1 in {
def RETURN : ILFormat<(outs), (ins variable_ops),
"RETURN", [(AMDGPUendpgm)]
>;
}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Branch Instructions		// Branch Instructions
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

def IF_PREDICATE_SET : ILFormat<(outs), (ins R600_Reg32:$src),		def IF_PREDICATE_SET : ILFormat<(outs), (ins R600_Reg32:$src),
"IF_PREDICATE_SET $src", []>;		"IF_PREDICATE_SET $src", []>;

let isTerminator=1 in {		let isTerminator=1 in {
(114 lines not shown)
//CNDGE_INT extra pattern		//CNDGE_INT extra pattern
def : R600Pat <		def : R600Pat <
(selectcc i32:$src0, -1, i32:$src1, i32:$src2, COND_SGT),		(selectcc i32:$src0, -1, i32:$src1, i32:$src2, COND_SGT),
(CNDGE_INT $src0, $src1, $src2)		(CNDGE_INT $src0, $src1, $src2)
>;		>;

// KIL Patterns		// KIL Patterns
def KIL : R600Pat <		def KIL : R600Pat <
(int_AMDGPU_kill f32:$src0),		(int_r600_kill f32:$src0),
(MASK_WRITE (KILLGT (f32 ZERO), $src0))		(MASK_WRITE (KILLGT (f32 ZERO), $src0))
>;		>;

def : Extract_Element <f32, v4f32, 0, sub0>;		def : Extract_Element <f32, v4f32, 0, sub0>;
def : Extract_Element <f32, v4f32, 1, sub1>;		def : Extract_Element <f32, v4f32, 1, sub1>;
def : Extract_Element <f32, v4f32, 2, sub2>;		def : Extract_Element <f32, v4f32, 2, sub2>;
def : Extract_Element <f32, v4f32, 3, sub3>;		def : Extract_Element <f32, v4f32, 3, sub3>;

(48 lines not shown)

lib/Target/AMDGPU/R600IntrinsicInfo.h

This file was added.

				//===- R600IntrinsicInfo.h - R600 Intrinsic Information ------- C++ --===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//==-----------------------------------------------------------------------===//
				//
				/// \file
				/// \brief Interface for the R600 Implementation of the Intrinsic Info class.
				//
				//===-----------------------------------------------------------------------===//
				#ifndef LLVM_LIB_TARGET_AMDGPU_R600INTRINSICINFO_H
				#define LLVM_LIB_TARGET_AMDGPU_R600INTRINSICINFO_H

				#include "llvm/IR/Intrinsics.h"
				#include "llvm/Target/TargetIntrinsicInfo.h"

				namespace llvm {
				class TargetMachine;

				namespace r600Intrinsic {
				enum ID {
				last_non_R600_intrinsic = Intrinsic::num_intrinsics - 1,
				#define GET_INTRINSIC_ENUM_VALUES
				#include "R600GenIntrinsics.inc"
				#undef GET_INTRINSIC_ENUM_VALUES
				, num_R600_intrinsics
				};

				} // end namespace R600Intrinsic

				class R600IntrinsicInfo final : public TargetIntrinsicInfo {
				public:
				R600IntrinsicInfo();

				StringRef getName(unsigned IntrId, ArrayRef<Type *> Tys = None) const;

				std::string getName(unsigned IntrId, Type **Tys = nullptr,
				unsigned NumTys = 0) const override;

				unsigned lookupName(const char *Name, unsigned Len) const override;
				bool isOverloaded(unsigned IID) const override;
				Function *getDeclaration(Module *M, unsigned ID,
				Type **Tys = nullptr,
				unsigned NumTys = 0) const override;

				Function *getDeclaration(Module *M, unsigned ID,
				ArrayRef<Type *> = None) const;

				FunctionType *getType(LLVMContext &Context, unsigned ID,
				ArrayRef<Type*> Tys = None) const;
				};

				} // end namespace llvm

				#endif

lib/Target/AMDGPU/R600IntrinsicInfo.cpp

This file was added.

				//===- R600IntrinsicInfo.cpp - R600 Intrinsic Information -------- C++ --===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//==-----------------------------------------------------------------------===//
				//
				/// \file
				/// \brief R600 Implementation of the IntrinsicInfo class.
				//
				//===-----------------------------------------------------------------------===//

				#include "R600IntrinsicInfo.h"
				#include "AMDGPUSubtarget.h"
				#include "llvm/IR/DerivedTypes.h"
				#include "llvm/IR/Intrinsics.h"
				#include "llvm/IR/Module.h"

				using namespace llvm;

				R600IntrinsicInfo::R600IntrinsicInfo()
				: TargetIntrinsicInfo() {}

				arsenm: We can probably drop the whole class. We don't have very many intrinsic definitions left in the backend, and this code never really worked well to begin with.
				static const char *const IntrinsicNameTable[] = {
				#define GET_INTRINSIC_NAME_TABLE
				#include "R600GenIntrinsics.inc"
				#undef GET_INTRINSIC_NAME_TABLE
				};

				namespace {
				#define GET_INTRINSIC_ATTRIBUTES
				#include "R600GenIntrinsics.inc"
				#undef GET_INTRINSIC_ATTRIBUTES
				}

				StringRef R600IntrinsicInfo::getName(unsigned IntrID,
				ArrayRef<Type *> Tys) const {
				if (IntrID < Intrinsic::num_intrinsics)
				return StringRef();

				assert(IntrID < r600Intrinsic::num_R600_intrinsics &&
				"Invalid intrinsic ID");

				return IntrinsicNameTable[IntrID - Intrinsic::num_intrinsics];
				}
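`getName` relies on the enum layout in `R600IntrinsicInfo.h`: target intrinsic IDs continue right after `Intrinsic::num_intrinsics`, so subtracting that base yields an index into the generated name table. The sketch below reproduces that arithmetic with a made-up base of 1000 and two hypothetical intrinsic names:

```cpp
#include <cassert>
#include <string>

namespace toy {
// Hypothetical base; stands in for Intrinsic::num_intrinsics.
constexpr unsigned num_intrinsics = 1000;

// Same layout trick as r600Intrinsic::ID: the first target ID equals the base.
enum ID : unsigned {
  last_non_target_intrinsic = num_intrinsics - 1,
  target_kill,
  target_txf,
  num_target_intrinsics
};

// Hypothetical name table, in the same order as the enum entries above.
const char *const NameTable[] = {"llvm.toy.kill", "llvm.toy.txf"};

// IDs below the base belong to generic intrinsics and get an empty name;
// target IDs are rebased to zero to index the table.
std::string getName(unsigned IntrID) {
  if (IntrID < num_intrinsics)
    return std::string();
  assert(IntrID < num_target_intrinsics && "Invalid intrinsic ID");
  return NameTable[IntrID - num_intrinsics];
}
} // namespace toy
```

Starting the target enum at `num_intrinsics - 1` guarantees that target IDs never collide with generic intrinsic IDs.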

				std::string R600IntrinsicInfo::getName(unsigned IntrID, Type **Tys,
				unsigned NumTys) const {
				return getName(IntrID, makeArrayRef(Tys, NumTys)).str();
				}

				FunctionType *R600IntrinsicInfo::getType(LLVMContext &Context, unsigned ID,
				ArrayRef<Type*> Tys) const {
				// FIXME: Re-use Intrinsic::getType machinery
				llvm_unreachable("unhandled intrinsic");
				}

				unsigned R600IntrinsicInfo::lookupName(const char *NameData,
				unsigned Len) const {
				StringRef Name(NameData, Len);
				if (!Name.startswith("llvm."))
				return 0; // All intrinsics start with 'llvm.'

				// Look for a name match in our table. If the intrinsic is not overloaded,
				// require an exact match. If it is overloaded, require a prefix match. The
				// R600 enum starts at Intrinsic::num_intrinsics.
				int Idx = Intrinsic::lookupLLVMIntrinsicByName(IntrinsicNameTable, Name);
				if (Idx >= 0) {
				bool IsPrefixMatch = Name.size() > strlen(IntrinsicNameTable[Idx]);
				return IsPrefixMatch == isOverloaded(Idx + 1)
				? Intrinsic::num_intrinsics + Idx
				: 0;
				}

				return 0;
				}

				bool R600IntrinsicInfo::isOverloaded(unsigned id) const {
				// Overload Table
				#define GET_INTRINSIC_OVERLOAD_TABLE
				#include "R600GenIntrinsics.inc"
				#undef GET_INTRINSIC_OVERLOAD_TABLE
				}

				Function *R600IntrinsicInfo::getDeclaration(Module *M, unsigned IntrID,
				ArrayRef<Type *> Tys) const {
				FunctionType *FTy = getType(M->getContext(), IntrID, Tys);
				Function *F
				= cast<Function>(M->getOrInsertFunction(getName(IntrID, Tys), FTy));

				AttributeList AS =
				getAttributes(M->getContext(), static_cast<r600Intrinsic::ID>(IntrID));
				F->setAttributes(AS);
				return F;
				}

				Function *R600IntrinsicInfo::getDeclaration(Module *M, unsigned IntrID,
				Type **Tys,
				unsigned NumTys) const {
				return getDeclaration(M, IntrID, makeArrayRef(Tys, NumTys));
				}
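
R600 intrinsic IDs are numbered after the generic ones. That is why getName() above subtracts Intrinsic::num_intrinsics, and why lookupName() accepts a prefix match only for overloaded intrinsics. The minimal standalone sketch below shows that logic. The table contents, the base ID of 1000, the overload flags, and the `llvm.r600.ovl` entry are all hypothetical. A linear scan stands in for Intrinsic::lookupLLVMIntrinsicByName().

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string_view>

// Hypothetical stand-ins for the TableGen-generated tables. Target intrinsic
// IDs start right after the last generic one, like Intrinsic::num_intrinsics.
constexpr unsigned kFirstTargetID = 1000;  // assumed value

struct TargetIntrinsic {
  std::string_view Name;
  bool Overloaded;  // assumed flags, for illustration only
};

constexpr TargetIntrinsic kTable[] = {
    {"llvm.r600.ddx", false},
    {"llvm.r600.dot4", false},
    {"llvm.r600.kill", false},
    {"llvm.r600.ovl", true},  // hypothetical overloaded intrinsic
};

// Returns the target intrinsic ID for Name, or 0 if Name is not in the table.
// A non-overloaded entry needs an exact match; an overloaded one needs the
// entry followed by a ".<type suffix>".
unsigned lookupTargetIntrinsic(std::string_view Name) {
  if (Name.substr(0, 5) != "llvm.")
    return 0;  // All intrinsics start with "llvm."
  int Best = -1;
  for (std::size_t I = 0; I < std::size(kTable); ++I) {
    std::string_view Entry = kTable[I].Name;
    bool Exact = Name == Entry;
    bool Prefix = Name.size() > Entry.size() &&
                  Name.substr(0, Entry.size()) == Entry &&
                  Name[Entry.size()] == '.';
    // Keep the longest matching entry.
    if ((Exact || Prefix) &&
        (Best < 0 || Entry.size() > kTable[Best].Name.size()))
      Best = static_cast<int>(I);
  }
  if (Best < 0)
    return 0;
  bool IsPrefixMatch = Name.size() > kTable[Best].Name.size();
  return IsPrefixMatch == kTable[Best].Overloaded
             ? kFirstTargetID + static_cast<unsigned>(Best)
             : 0;
}

// Inverse mapping: IDs below the first target ID yield an empty name,
// mirroring getName() in the listing above.
std::string_view getTargetIntrinsicName(unsigned ID) {
  if (ID < kFirstTargetID)
    return {};
  assert(ID - kFirstTargetID < std::size(kTable) && "Invalid intrinsic ID");
  return kTable[ID - kFirstTargetID].Name;
}
```

With this table, `lookupTargetIntrinsic("llvm.r600.dot4")` yields 1001. `"llvm.r600.dot4.f32"` yields 0 because dot4 is, by assumption, not overloaded.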

lib/Target/AMDGPU/R600Intrinsics.td

	...
	def int_r600_txf : TextureIntrinsicInt32Input;			def int_r600_txf : TextureIntrinsicInt32Input;
	def int_r600_txq : TextureIntrinsicInt32Input;			def int_r600_txq : TextureIntrinsicInt32Input;
	def int_r600_ddx : TextureIntrinsicFloatInput;			def int_r600_ddx : TextureIntrinsicFloatInput;
	def int_r600_ddy : TextureIntrinsicFloatInput;			def int_r600_ddy : TextureIntrinsicFloatInput;

	def int_r600_dot4 : Intrinsic<[llvm_float_ty],			def int_r600_dot4 : Intrinsic<[llvm_float_ty],
	[llvm_v4f32_ty, llvm_v4f32_ty], [IntrNoMem, IntrSpeculatable]			[llvm_v4f32_ty, llvm_v4f32_ty], [IntrNoMem, IntrSpeculatable]
	>;			>;

				def int_r600_kill : Intrinsic<[], [llvm_float_ty], []>;

	} // End TargetPrefix = "r600", isTarget = 1			} // End TargetPrefix = "r600", isTarget = 1

lib/Target/AMDGPU/R600MachineScheduler.cpp

...	case AluT_XYZW:
break;		break;
case AluDiscarded:		case AluDiscarded:
break;		break;
default: {		default: {
++CurEmitted;		++CurEmitted;
for (MachineInstr::mop_iterator It = SU->getInstr()->operands_begin(),		for (MachineInstr::mop_iterator It = SU->getInstr()->operands_begin(),
E = SU->getInstr()->operands_end(); It != E; ++It) {		E = SU->getInstr()->operands_end(); It != E; ++It) {
MachineOperand &MO = *It;		MachineOperand &MO = *It;
if (MO.isReg() && MO.getReg() == AMDGPU::ALU_LITERAL_X)		if (MO.isReg() && MO.getReg() == R600::ALU_LITERAL_X)
++CurEmitted;		++CurEmitted;
}		}
}		}
}		}
} else {		} else {
++CurEmitted;		++CurEmitted;
}		}


DEBUG(dbgs() << CurEmitted << " Instructions Emitted in this clause\n");		DEBUG(dbgs() << CurEmitted << " Instructions Emitted in this clause\n");

if (CurInstKind != IDFetch) {		if (CurInstKind != IDFetch) {
MoveUnits(Pending[IDFetch], Available[IDFetch]);		MoveUnits(Pending[IDFetch], Available[IDFetch]);
} else		} else
FetchInstCount++;		FetchInstCount++;
}		}

static bool		static bool
isPhysicalRegCopy(MachineInstr *MI) {		isPhysicalRegCopy(MachineInstr *MI) {
if (MI->getOpcode() != AMDGPU::COPY)		if (MI->getOpcode() != R600::COPY)
return false;		return false;

return !TargetRegisterInfo::isVirtualRegister(MI->getOperand(1).getReg());		return !TargetRegisterInfo::isVirtualRegister(MI->getOperand(1).getReg());
}		}

void R600SchedStrategy::releaseTopNode(SUnit *SU) {		void R600SchedStrategy::releaseTopNode(SUnit *SU) {
DEBUG(dbgs() << "Top Releasing ";SU->dump(DAG););		DEBUG(dbgs() << "Top Releasing ";SU->dump(DAG););
}		}
...

R600SchedStrategy::AluKind R600SchedStrategy::getAluKind(SUnit *SU) const {		R600SchedStrategy::AluKind R600SchedStrategy::getAluKind(SUnit *SU) const {
MachineInstr *MI = SU->getInstr();		MachineInstr *MI = SU->getInstr();

if (TII->isTransOnly(*MI))		if (TII->isTransOnly(*MI))
return AluTrans;		return AluTrans;

switch (MI->getOpcode()) {		switch (MI->getOpcode()) {
case AMDGPU::PRED_X:		case R600::PRED_X:
return AluPredX;		return AluPredX;
case AMDGPU::INTERP_PAIR_XY:		case R600::INTERP_PAIR_XY:
case AMDGPU::INTERP_PAIR_ZW:		case R600::INTERP_PAIR_ZW:
case AMDGPU::INTERP_VEC_LOAD:		case R600::INTERP_VEC_LOAD:
case AMDGPU::DOT_4:		case R600::DOT_4:
return AluT_XYZW;		return AluT_XYZW;
case AMDGPU::COPY:		case R600::COPY:
if (MI->getOperand(1).isUndef()) {		if (MI->getOperand(1).isUndef()) {
// MI will become a KILL, don't consider it in scheduling		// MI will become a KILL, don't consider it in scheduling
return AluDiscarded;		return AluDiscarded;
}		}
default:		default:
break;		break;
}		}

// Does the instruction take a whole IG ?		// Does the instruction take a whole IG ?
// XXX: Is it possible to add a helper function in R600InstrInfo that can		// XXX: Is it possible to add a helper function in R600InstrInfo that can
// be used here and in R600PacketizerList::isSoloInstruction() ?		// be used here and in R600PacketizerList::isSoloInstruction() ?
if(TII->isVector(*MI) ||		if(TII->isVector(*MI) ||
TII->isCubeOp(MI->getOpcode()) ||		TII->isCubeOp(MI->getOpcode()) ||
TII->isReductionOp(MI->getOpcode()) ||		TII->isReductionOp(MI->getOpcode()) ||
MI->getOpcode() == AMDGPU::GROUP_BARRIER) {		MI->getOpcode() == R600::GROUP_BARRIER) {
return AluT_XYZW;		return AluT_XYZW;
}		}

if (TII->isLDSInstr(MI->getOpcode())) {		if (TII->isLDSInstr(MI->getOpcode())) {
return AluT_X;		return AluT_X;
}		}

// Is the result already assigned to a channel ?		// Is the result already assigned to a channel ?
unsigned DestSubReg = MI->getOperand(0).getSubReg();		unsigned DestSubReg = MI->getOperand(0).getSubReg();
switch (DestSubReg) {		switch (DestSubReg) {
case AMDGPU::sub0:		case R600::sub0:
return AluT_X;		return AluT_X;
case AMDGPU::sub1:		case R600::sub1:
return AluT_Y;		return AluT_Y;
case AMDGPU::sub2:		case R600::sub2:
return AluT_Z;		return AluT_Z;
case AMDGPU::sub3:		case R600::sub3:
return AluT_W;		return AluT_W;
default:		default:
break;		break;
}		}

// Is the result already member of a X/Y/Z/W class ?		// Is the result already member of a X/Y/Z/W class ?
unsigned DestReg = MI->getOperand(0).getReg();		unsigned DestReg = MI->getOperand(0).getReg();
if (regBelongsToClass(DestReg, &AMDGPU::R600_TReg32_XRegClass) ||		if (regBelongsToClass(DestReg, &R600::R600_TReg32_XRegClass) ||
regBelongsToClass(DestReg, &AMDGPU::R600_AddrRegClass))		regBelongsToClass(DestReg, &R600::R600_AddrRegClass))
return AluT_X;		return AluT_X;
if (regBelongsToClass(DestReg, &AMDGPU::R600_TReg32_YRegClass))		if (regBelongsToClass(DestReg, &R600::R600_TReg32_YRegClass))
return AluT_Y;		return AluT_Y;
if (regBelongsToClass(DestReg, &AMDGPU::R600_TReg32_ZRegClass))		if (regBelongsToClass(DestReg, &R600::R600_TReg32_ZRegClass))
return AluT_Z;		return AluT_Z;
if (regBelongsToClass(DestReg, &AMDGPU::R600_TReg32_WRegClass))		if (regBelongsToClass(DestReg, &R600::R600_TReg32_WRegClass))
return AluT_W;		return AluT_W;
if (regBelongsToClass(DestReg, &AMDGPU::R600_Reg128RegClass))		if (regBelongsToClass(DestReg, &R600::R600_Reg128RegClass))
return AluT_XYZW;		return AluT_XYZW;

// LDS src registers cannot be used in the Trans slot.		// LDS src registers cannot be used in the Trans slot.
if (TII->readsLDSSrcReg(*MI))		if (TII->readsLDSSrcReg(*MI))
return AluT_XYZW;		return AluT_XYZW;

return AluAny;		return AluAny;
}		}

int R600SchedStrategy::getInstKind(SUnit* SU) {		int R600SchedStrategy::getInstKind(SUnit* SU) {
int Opcode = SU->getInstr()->getOpcode();		int Opcode = SU->getInstr()->getOpcode();

if (TII->usesTextureCache(Opcode) || TII->usesVertexCache(Opcode))		if (TII->usesTextureCache(Opcode) || TII->usesVertexCache(Opcode))
return IDFetch;		return IDFetch;

if (TII->isALUInstr(Opcode)) {		if (TII->isALUInstr(Opcode)) {
return IDAlu;		return IDAlu;
}		}

switch (Opcode) {		switch (Opcode) {
case AMDGPU::PRED_X:		case R600::PRED_X:
case AMDGPU::COPY:		case R600::COPY:
case AMDGPU::CONST_COPY:		case R600::CONST_COPY:
case AMDGPU::INTERP_PAIR_XY:		case R600::INTERP_PAIR_XY:
case AMDGPU::INTERP_PAIR_ZW:		case R600::INTERP_PAIR_ZW:
case AMDGPU::INTERP_VEC_LOAD:		case R600::INTERP_VEC_LOAD:
case AMDGPU::DOT_4:		case R600::DOT_4:
return IDAlu;		return IDAlu;
default:		default:
return IDOther;		return IDOther;
}		}
}		}

SUnit *R600SchedStrategy::PopInst(std::vector<SUnit *> &Q, bool AnyALU) {		SUnit *R600SchedStrategy::PopInst(std::vector<SUnit *> &Q, bool AnyALU) {
if (Q.empty())		if (Q.empty())
...	void R600SchedStrategy::PrepareNextSlot() {
OccupedSlotsMask = 0;		OccupedSlotsMask = 0;
// if (HwGen == R600Subtarget::NORTHERN_ISLANDS)		// if (HwGen == R600Subtarget::NORTHERN_ISLANDS)
// OccupedSlotsMask |= 16;		// OccupedSlotsMask |= 16;
InstructionsGroupCandidate.clear();		InstructionsGroupCandidate.clear();
LoadAlu();		LoadAlu();
}		}

void R600SchedStrategy::AssignSlot(MachineInstr* MI, unsigned Slot) {		void R600SchedStrategy::AssignSlot(MachineInstr* MI, unsigned Slot) {
int DstIndex = TII->getOperandIdx(MI->getOpcode(), AMDGPU::OpName::dst);		int DstIndex = TII->getOperandIdx(MI->getOpcode(), R600::OpName::dst);
if (DstIndex == -1) {		if (DstIndex == -1) {
return;		return;
}		}
unsigned DestReg = MI->getOperand(DstIndex).getReg();		unsigned DestReg = MI->getOperand(DstIndex).getReg();
// PressureRegister crashes if an operand is def and used in the same inst		// PressureRegister crashes if an operand is def and used in the same inst
// and we try to constrain its regclass		// and we try to constrain its regclass
for (MachineInstr::mop_iterator It = MI->operands_begin(),		for (MachineInstr::mop_iterator It = MI->operands_begin(),
E = MI->operands_end(); It != E; ++It) {		E = MI->operands_end(); It != E; ++It) {
MachineOperand &MO = *It;		MachineOperand &MO = *It;
if (MO.isReg() && !MO.isDef() &&		if (MO.isReg() && !MO.isDef() &&
MO.getReg() == DestReg)		MO.getReg() == DestReg)
return;		return;
}		}
// Constrains the regclass of DestReg to assign it to Slot		// Constrains the regclass of DestReg to assign it to Slot
switch (Slot) {		switch (Slot) {
case 0:		case 0:
MRI->constrainRegClass(DestReg, &AMDGPU::R600_TReg32_XRegClass);		MRI->constrainRegClass(DestReg, &R600::R600_TReg32_XRegClass);
break;		break;
case 1:		case 1:
MRI->constrainRegClass(DestReg, &AMDGPU::R600_TReg32_YRegClass);		MRI->constrainRegClass(DestReg, &R600::R600_TReg32_YRegClass);
break;		break;
case 2:		case 2:
MRI->constrainRegClass(DestReg, &AMDGPU::R600_TReg32_ZRegClass);		MRI->constrainRegClass(DestReg, &R600::R600_TReg32_ZRegClass);
break;		break;
case 3:		case 3:
MRI->constrainRegClass(DestReg, &AMDGPU::R600_TReg32_WRegClass);		MRI->constrainRegClass(DestReg, &R600::R600_TReg32_WRegClass);
break;		break;
}		}
}		}

SUnit *R600SchedStrategy::AttemptFillSlot(unsigned Slot, bool AnyAlu) {		SUnit *R600SchedStrategy::AttemptFillSlot(unsigned Slot, bool AnyAlu) {
static const AluKind IndexToID[] = {AluT_X, AluT_Y, AluT_Z, AluT_W};		static const AluKind IndexToID[] = {AluT_X, AluT_Y, AluT_Z, AluT_W};
SUnit *SlotedSU = PopInst(AvailableAlus[IndexToID[Slot]], AnyAlu);		SUnit *SlotedSU = PopInst(AvailableAlus[IndexToID[Slot]], AnyAlu);
if (SlotedSU)		if (SlotedSU)
...
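
getAluKind() picks the VLIW slot an ALU instruction may use by checking its properties in a fixed order. The simplified standalone sketch below follows that order. It assumes a hypothetical InstrFacts record in place of the MachineInstr and R600InstrInfo queries. It omits the opcode special cases (PRED_X, the INTERP/DOT_4 group, undef COPY) and the Addr/Reg128 register-class checks.

```cpp
#include <cassert>
#include <optional>

// Hypothetical model of the slot kinds: X, Y, Z, W vector slots plus Trans.
enum class AluKind { Any, X, Y, Z, W, XYZW, Trans };

// Facts the real code reads from the MachineInstr; all assumed here.
struct InstrFacts {
  bool TransOnly = false;   // like TII->isTransOnly()
  bool WholeGroup = false;  // vector, cube, reduction, or GROUP_BARRIER
  bool IsLDS = false;       // like TII->isLDSInstr()
  std::optional<unsigned> DestChan;  // from sub0..sub3 or an X/Y/Z/W class
  bool ReadsLDSSrc = false; // LDS sources cannot use the Trans slot
};

AluKind classify(const InstrFacts &MI) {
  static const AluKind ByChan[] = {AluKind::X, AluKind::Y, AluKind::Z,
                                   AluKind::W};
  if (MI.TransOnly)
    return AluKind::Trans;
  if (MI.WholeGroup)
    return AluKind::XYZW;  // needs the whole instruction group
  if (MI.IsLDS)
    return AluKind::X;
  // A destination already tied to a channel pins the slot.
  // Channels outside 0..3 are treated as unknown.
  if (MI.DestChan && *MI.DestChan < 4)
    return ByChan[*MI.DestChan];
  if (MI.ReadsLDSSrc)
    return AluKind::XYZW;
  return AluKind::Any;
}
```

The ByChan array mirrors IndexToID in AttemptFillSlot(), which maps a slot index back to the same X/Y/Z/W kinds.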

lib/Target/AMDGPU/R600OptimizeVectorRegisters.cpp

...

class RegSeqInfo {		class RegSeqInfo {
public:		public:
MachineInstr *Instr;		MachineInstr *Instr;
DenseMap<unsigned, unsigned> RegToChan;		DenseMap<unsigned, unsigned> RegToChan;
std::vector<unsigned> UndefReg;		std::vector<unsigned> UndefReg;

RegSeqInfo(MachineRegisterInfo &MRI, MachineInstr *MI) : Instr(MI) {		RegSeqInfo(MachineRegisterInfo &MRI, MachineInstr *MI) : Instr(MI) {
assert(MI->getOpcode() == AMDGPU::REG_SEQUENCE);		assert(MI->getOpcode() == R600::REG_SEQUENCE);
for (unsigned i = 1, e = Instr->getNumOperands(); i < e; i+=2) {		for (unsigned i = 1, e = Instr->getNumOperands(); i < e; i+=2) {
MachineOperand &MO = Instr->getOperand(i);		MachineOperand &MO = Instr->getOperand(i);
unsigned Chan = Instr->getOperand(i + 1).getImm();		unsigned Chan = Instr->getOperand(i + 1).getImm();
if (isImplicitlyDef(MRI, MO.getReg()))		if (isImplicitlyDef(MRI, MO.getReg()))
UndefReg.push_back(Chan);		UndefReg.push_back(Chan);
else		else
RegToChan[MO.getReg()] = Chan;		RegToChan[MO.getReg()] = Chan;
}		}
...

char &llvm::R600VectorRegMergerID = R600VectorRegMerger::ID;		char &llvm::R600VectorRegMergerID = R600VectorRegMerger::ID;

bool R600VectorRegMerger::canSwizzle(const MachineInstr &MI)		bool R600VectorRegMerger::canSwizzle(const MachineInstr &MI)
const {		const {
if (TII->get(MI.getOpcode()).TSFlags & R600_InstFlag::TEX_INST)		if (TII->get(MI.getOpcode()).TSFlags & R600_InstFlag::TEX_INST)
return true;		return true;
switch (MI.getOpcode()) {		switch (MI.getOpcode()) {
case AMDGPU::R600_ExportSwz:		case R600::R600_ExportSwz:
case AMDGPU::EG_ExportSwz:		case R600::EG_ExportSwz:
return true;		return true;
default:		default:
return false;		return false;
}		}
}		}

bool R600VectorRegMerger::tryMergeVector(const RegSeqInfo *Untouched,		bool R600VectorRegMerger::tryMergeVector(const RegSeqInfo *Untouched,
RegSeqInfo *ToMerge, std::vector< std::pair<unsigned, unsigned>> &Remap)		RegSeqInfo *ToMerge, std::vector< std::pair<unsigned, unsigned>> &Remap)
...	MachineInstr *R600VectorRegMerger::RebuildVector(
MachineBasicBlock &MBB = *Pos->getParent();		MachineBasicBlock &MBB = *Pos->getParent();
DebugLoc DL = Pos->getDebugLoc();		DebugLoc DL = Pos->getDebugLoc();

unsigned SrcVec = BaseRSI->Instr->getOperand(0).getReg();		unsigned SrcVec = BaseRSI->Instr->getOperand(0).getReg();
DenseMap<unsigned, unsigned> UpdatedRegToChan = BaseRSI->RegToChan;		DenseMap<unsigned, unsigned> UpdatedRegToChan = BaseRSI->RegToChan;
std::vector<unsigned> UpdatedUndef = BaseRSI->UndefReg;		std::vector<unsigned> UpdatedUndef = BaseRSI->UndefReg;
for (DenseMap<unsigned, unsigned>::iterator It = RSI->RegToChan.begin(),		for (DenseMap<unsigned, unsigned>::iterator It = RSI->RegToChan.begin(),
E = RSI->RegToChan.end(); It != E; ++It) {		E = RSI->RegToChan.end(); It != E; ++It) {
unsigned DstReg = MRI->createVirtualRegister(&AMDGPU::R600_Reg128RegClass);		unsigned DstReg = MRI->createVirtualRegister(&R600::R600_Reg128RegClass);
unsigned SubReg = (*It).first;		unsigned SubReg = (*It).first;
unsigned Swizzle = (*It).second;		unsigned Swizzle = (*It).second;
unsigned Chan = getReassignedChan(RemapChan, Swizzle);		unsigned Chan = getReassignedChan(RemapChan, Swizzle);

MachineInstr *Tmp = BuildMI(MBB, Pos, DL, TII->get(AMDGPU::INSERT_SUBREG),		MachineInstr *Tmp = BuildMI(MBB, Pos, DL, TII->get(R600::INSERT_SUBREG),
DstReg)		DstReg)
.addReg(SrcVec)		.addReg(SrcVec)
.addReg(SubReg)		.addReg(SubReg)
.addImm(Chan);		.addImm(Chan);
UpdatedRegToChan[SubReg] = Chan;		UpdatedRegToChan[SubReg] = Chan;
std::vector<unsigned>::iterator ChanPos = llvm::find(UpdatedUndef, Chan);		std::vector<unsigned>::iterator ChanPos = llvm::find(UpdatedUndef, Chan);
if (ChanPos != UpdatedUndef.end())		if (ChanPos != UpdatedUndef.end())
UpdatedUndef.erase(ChanPos);		UpdatedUndef.erase(ChanPos);
assert(!is_contained(UpdatedUndef, Chan) &&		assert(!is_contained(UpdatedUndef, Chan) &&
"UpdatedUndef shouldn't contain Chan more than once!");		"UpdatedUndef shouldn't contain Chan more than once!");
DEBUG(dbgs() << " ->"; Tmp->dump(););		DEBUG(dbgs() << " ->"; Tmp->dump(););
(void)Tmp;		(void)Tmp;
SrcVec = DstReg;		SrcVec = DstReg;
}		}
MachineInstr *NewMI =		MachineInstr *NewMI =
BuildMI(MBB, Pos, DL, TII->get(AMDGPU::COPY), Reg).addReg(SrcVec);		BuildMI(MBB, Pos, DL, TII->get(R600::COPY), Reg).addReg(SrcVec);
DEBUG(dbgs() << " ->"; NewMI->dump(););		DEBUG(dbgs() << " ->"; NewMI->dump(););

DEBUG(dbgs() << " Updating Swizzle:\n");		DEBUG(dbgs() << " Updating Swizzle:\n");
for (MachineRegisterInfo::use_instr_iterator It = MRI->use_instr_begin(Reg),		for (MachineRegisterInfo::use_instr_iterator It = MRI->use_instr_begin(Reg),
E = MRI->use_instr_end(); It != E; ++It) {		E = MRI->use_instr_end(); It != E; ++It) {
DEBUG(dbgs() << " ";(*It).dump(); dbgs() << " ->");		DEBUG(dbgs() << " ";(*It).dump(); dbgs() << " ->");
SwizzleInput(*It, RemapChan);		SwizzleInput(*It, RemapChan);
DEBUG((*It).dump());		DEBUG((*It).dump());
...	for (MachineFunction::iterator MBB = Fn.begin(), MBBe = Fn.end();
MachineBasicBlock *MB = &*MBB;		MachineBasicBlock *MB = &*MBB;
PreviousRegSeq.clear();		PreviousRegSeq.clear();
PreviousRegSeqByReg.clear();		PreviousRegSeqByReg.clear();
PreviousRegSeqByUndefCount.clear();		PreviousRegSeqByUndefCount.clear();

for (MachineBasicBlock::iterator MII = MB->begin(), MIIE = MB->end();		for (MachineBasicBlock::iterator MII = MB->begin(), MIIE = MB->end();
MII != MIIE; ++MII) {		MII != MIIE; ++MII) {
MachineInstr &MI = *MII;		MachineInstr &MI = *MII;
if (MI.getOpcode() != AMDGPU::REG_SEQUENCE) {		if (MI.getOpcode() != R600::REG_SEQUENCE) {
if (TII->get(MI.getOpcode()).TSFlags & R600_InstFlag::TEX_INST) {		if (TII->get(MI.getOpcode()).TSFlags & R600_InstFlag::TEX_INST) {
unsigned Reg = MI.getOperand(1).getReg();		unsigned Reg = MI.getOperand(1).getReg();
for (MachineRegisterInfo::def_instr_iterator		for (MachineRegisterInfo::def_instr_iterator
It = MRI->def_instr_begin(Reg), E = MRI->def_instr_end();		It = MRI->def_instr_begin(Reg), E = MRI->def_instr_end();
It != E; ++It) {		It != E; ++It) {
RemoveMI(&(*It));		RemoveMI(&(*It));
}		}
}		}
...
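
RegSeqInfo splits a REG_SEQUENCE into live (register, channel) pairs and undefined channels. tryMergeVector(), whose body is not shown here, then tries to rewrite one vector in terms of another. The standalone sketch below covers both steps. It assumes a hypothetical SeqOperand record in which IsUndef stands in for isImplicitlyDef(). The merge rule is an assumption about the pass, not code from this diff: reuse a shared register's channel, otherwise claim a free undefined channel.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical operand model: after its destination, a REG_SEQUENCE lists
// (source register, channel) pairs.
struct SeqOperand {
  unsigned Reg;
  unsigned Chan;
  bool IsUndef;  // stands in for "defined by IMPLICIT_DEF"
};

struct RegSeqModel {
  std::map<unsigned, unsigned> RegToChan;  // live source reg -> channel
  std::vector<unsigned> UndefChans;        // channels with no real value

  explicit RegSeqModel(const std::vector<SeqOperand> &Ops) {
    for (const SeqOperand &Op : Ops) {
      if (Op.IsUndef)
        UndefChans.push_back(Op.Chan);
      else
        RegToChan[Op.Reg] = Op.Chan;
    }
  }
};

// Try to express ToMerge in terms of Untouched. Each register of ToMerge must
// either already live in Untouched (reuse its channel) or take one of
// Untouched's undefined channels. Returns (old channel, new channel) pairs.
// std::map keeps the walk order deterministic for the example.
std::optional<std::vector<std::pair<unsigned, unsigned>>>
tryMerge(const RegSeqModel &Untouched, const RegSeqModel &ToMerge) {
  std::vector<std::pair<unsigned, unsigned>> Remap;
  std::size_t NextUndef = 0;
  for (const auto &[Reg, Chan] : ToMerge.RegToChan) {
    auto It = Untouched.RegToChan.find(Reg);
    if (It != Untouched.RegToChan.end()) {
      Remap.emplace_back(Chan, It->second);
      continue;
    }
    if (NextUndef >= Untouched.UndefChans.size())
      return std::nullopt;  // no free channel left
    Remap.emplace_back(Chan, Untouched.UndefChans[NextUndef++]);
  }
  return Remap;
}
```

For example, a vector {R11, R12} can be folded into {R10, R11, undef, undef}: R11 keeps channel 1 and R12 takes the first free channel, 2.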

lib/Target/AMDGPU/R600Packetizer.cpp

...	DenseMap<unsigned, unsigned> getPreviousVector(MachineBasicBlock::iterator I)
do {		do {
bool isTrans = false;		bool isTrans = false;
int BISlot = getSlot(*BI);		int BISlot = getSlot(*BI);
if (LastDstChan >= BISlot)		if (LastDstChan >= BISlot)
isTrans = true;		isTrans = true;
LastDstChan = BISlot;		LastDstChan = BISlot;
if (TII->isPredicated(*BI))		if (TII->isPredicated(*BI))
continue;		continue;
int OperandIdx = TII->getOperandIdx(BI->getOpcode(), AMDGPU::OpName::write);		int OperandIdx = TII->getOperandIdx(BI->getOpcode(), R600::OpName::write);
if (OperandIdx > -1 && BI->getOperand(OperandIdx).getImm() == 0)		if (OperandIdx > -1 && BI->getOperand(OperandIdx).getImm() == 0)
continue;		continue;
int DstIdx = TII->getOperandIdx(BI->getOpcode(), AMDGPU::OpName::dst);		int DstIdx = TII->getOperandIdx(BI->getOpcode(), R600::OpName::dst);
if (DstIdx == -1) {		if (DstIdx == -1) {
continue;		continue;
}		}
unsigned Dst = BI->getOperand(DstIdx).getReg();		unsigned Dst = BI->getOperand(DstIdx).getReg();
if (isTrans || TII->isTransOnly(*BI)) {		if (isTrans || TII->isTransOnly(*BI)) {
Result[Dst] = AMDGPU::PS;		Result[Dst] = R600::PS;
continue;		continue;
}		}
if (BI->getOpcode() == AMDGPU::DOT4_r600 ||		if (BI->getOpcode() == R600::DOT4_r600 ||
BI->getOpcode() == AMDGPU::DOT4_eg) {		BI->getOpcode() == R600::DOT4_eg) {
Result[Dst] = AMDGPU::PV_X;		Result[Dst] = R600::PV_X;
continue;		continue;
}		}
if (Dst == AMDGPU::OQAP) {		if (Dst == R600::OQAP) {
continue;		continue;
}		}
unsigned PVReg = 0;		unsigned PVReg = 0;
switch (TRI.getHWRegChan(Dst)) {		switch (TRI.getHWRegChan(Dst)) {
case 0:		case 0:
PVReg = AMDGPU::PV_X;		PVReg = R600::PV_X;
break;		break;
case 1:		case 1:
PVReg = AMDGPU::PV_Y;		PVReg = R600::PV_Y;
break;		break;
case 2:		case 2:
PVReg = AMDGPU::PV_Z;		PVReg = R600::PV_Z;
break;		break;
case 3:		case 3:
PVReg = AMDGPU::PV_W;		PVReg = R600::PV_W;
break;		break;
default:		default:
llvm_unreachable("Invalid Chan");		llvm_unreachable("Invalid Chan");
}		}
Result[Dst] = PVReg;		Result[Dst] = PVReg;
} while ((++BI)->isBundledWithPred());		} while ((++BI)->isBundledWithPred());
return Result;		return Result;
}		}

void substitutePV(MachineInstr &MI, const DenseMap<unsigned, unsigned> &PVs)		void substitutePV(MachineInstr &MI, const DenseMap<unsigned, unsigned> &PVs)
const {		const {
unsigned Ops[] = {		unsigned Ops[] = {
AMDGPU::OpName::src0,		R600::OpName::src0,
AMDGPU::OpName::src1,		R600::OpName::src1,
AMDGPU::OpName::src2		R600::OpName::src2
};		};
for (unsigned i = 0; i < 3; i++) {		for (unsigned i = 0; i < 3; i++) {
int OperandIdx = TII->getOperandIdx(MI.getOpcode(), Ops[i]);		int OperandIdx = TII->getOperandIdx(MI.getOpcode(), Ops[i]);
if (OperandIdx < 0)		if (OperandIdx < 0)
continue;		continue;
unsigned Src = MI.getOperand(OperandIdx).getReg();		unsigned Src = MI.getOperand(OperandIdx).getReg();
const DenseMap<unsigned, unsigned>::const_iterator It = PVs.find(Src);		const DenseMap<unsigned, unsigned>::const_iterator It = PVs.find(Src);
if (It != PVs.end())		if (It != PVs.end())
...	public:

// isSoloInstruction - return true if instruction MI can not be packetized		// isSoloInstruction - return true if instruction MI can not be packetized
// with any other instruction, which means that MI itself is a packet.		// with any other instruction, which means that MI itself is a packet.
bool isSoloInstruction(const MachineInstr &MI) override {		bool isSoloInstruction(const MachineInstr &MI) override {
if (TII->isVector(MI))		if (TII->isVector(MI))
return true;		return true;
if (!TII->isALUInstr(MI.getOpcode()))		if (!TII->isALUInstr(MI.getOpcode()))
return true;		return true;
if (MI.getOpcode() == AMDGPU::GROUP_BARRIER)		if (MI.getOpcode() == R600::GROUP_BARRIER)
return true;		return true;
// XXX: This can be removed once the packetizer properly handles all the		// XXX: This can be removed once the packetizer properly handles all the
// LDS instruction group restrictions.		// LDS instruction group restrictions.
return TII->isLDSInstr(MI.getOpcode());		return TII->isLDSInstr(MI.getOpcode());
}		}

// isLegalToPacketizeTogether - Is it legal to packetize SUI and SUJ		// isLegalToPacketizeTogether - Is it legal to packetize SUI and SUJ
// together.		// together.
bool isLegalToPacketizeTogether(SUnit *SUI, SUnit *SUJ) override {		bool isLegalToPacketizeTogether(SUnit *SUI, SUnit *SUJ) override {
MachineInstr *MII = SUI->getInstr(), *MIJ = SUJ->getInstr();		MachineInstr *MII = SUI->getInstr(), *MIJ = SUJ->getInstr();
if (getSlot(MII) == getSlot(MIJ))		if (getSlot(MII) == getSlot(MIJ))
ConsideredInstUsesAlreadyWrittenVectorElement = true;		ConsideredInstUsesAlreadyWrittenVectorElement = true;
// Does MII and MIJ share the same pred_sel ?		// Does MII and MIJ share the same pred_sel ?
int OpI = TII->getOperandIdx(MII->getOpcode(), AMDGPU::OpName::pred_sel),		int OpI = TII->getOperandIdx(MII->getOpcode(), R600::OpName::pred_sel),
OpJ = TII->getOperandIdx(MIJ->getOpcode(), AMDGPU::OpName::pred_sel);		OpJ = TII->getOperandIdx(MIJ->getOpcode(), R600::OpName::pred_sel);
unsigned PredI = (OpI > -1)?MII->getOperand(OpI).getReg():0,		unsigned PredI = (OpI > -1)?MII->getOperand(OpI).getReg():0,
PredJ = (OpJ > -1)?MIJ->getOperand(OpJ).getReg():0;		PredJ = (OpJ > -1)?MIJ->getOperand(OpJ).getReg():0;
if (PredI != PredJ)		if (PredI != PredJ)
return false;		return false;
if (SUJ->isSucc(SUI)) {		if (SUJ->isSucc(SUI)) {
for (unsigned i = 0, e = SUJ->Succs.size(); i < e; ++i) {		for (unsigned i = 0, e = SUJ->Succs.size(); i < e; ++i) {
const SDep &Dep = SUJ->Succs[i];		const SDep &Dep = SUJ->Succs[i];
if (Dep.getSUnit() != SUI)		if (Dep.getSUnit() != SUI)
...	public:

// isLegalToPruneDependencies - Is it legal to prune dependence between SUI		// isLegalToPruneDependencies - Is it legal to prune dependence between SUI
// and SUJ.		// and SUJ.
bool isLegalToPruneDependencies(SUnit *SUI, SUnit *SUJ) override {		bool isLegalToPruneDependencies(SUnit *SUI, SUnit *SUJ) override {
return false;		return false;
}		}

void setIsLastBit(MachineInstr *MI, unsigned Bit) const {		void setIsLastBit(MachineInstr *MI, unsigned Bit) const {
unsigned LastOp = TII->getOperandIdx(MI->getOpcode(), AMDGPU::OpName::last);		unsigned LastOp = TII->getOperandIdx(MI->getOpcode(), R600::OpName::last);
MI->getOperand(LastOp).setImm(Bit);		MI->getOperand(LastOp).setImm(Bit);
}		}

bool isBundlableWithCurrentPMI(MachineInstr &MI,		bool isBundlableWithCurrentPMI(MachineInstr &MI,
const DenseMap<unsigned, unsigned> &PV,		const DenseMap<unsigned, unsigned> &PV,
std::vector<R600InstrInfo::BankSwizzle> &BS,		std::vector<R600InstrInfo::BankSwizzle> &BS,
bool &isTransSlot) {		bool &isTransSlot) {
isTransSlot = TII->isTransOnly(MI);		isTransSlot = TII->isTransOnly(MI);
...	const DenseMap<unsigned, unsigned> &PV =
getPreviousVector(FirstInBundle);		getPreviousVector(FirstInBundle);
std::vector<R600InstrInfo::BankSwizzle> BS;		std::vector<R600InstrInfo::BankSwizzle> BS;
bool isTransSlot;		bool isTransSlot;

if (isBundlableWithCurrentPMI(MI, PV, BS, isTransSlot)) {		if (isBundlableWithCurrentPMI(MI, PV, BS, isTransSlot)) {
for (unsigned i = 0, e = CurrentPacketMIs.size(); i < e; i++) {		for (unsigned i = 0, e = CurrentPacketMIs.size(); i < e; i++) {
MachineInstr *MI = CurrentPacketMIs[i];		MachineInstr *MI = CurrentPacketMIs[i];
unsigned Op = TII->getOperandIdx(MI->getOpcode(),		unsigned Op = TII->getOperandIdx(MI->getOpcode(),
AMDGPU::OpName::bank_swizzle);		R600::OpName::bank_swizzle);
MI->getOperand(Op).setImm(BS[i]);		MI->getOperand(Op).setImm(BS[i]);
}		}
unsigned Op =		unsigned Op =
TII->getOperandIdx(MI.getOpcode(), AMDGPU::OpName::bank_swizzle);		TII->getOperandIdx(MI.getOpcode(), R600::OpName::bank_swizzle);
MI.getOperand(Op).setImm(BS.back());		MI.getOperand(Op).setImm(BS.back());
if (!CurrentPacketMIs.empty())		if (!CurrentPacketMIs.empty())
setIsLastBit(CurrentPacketMIs.back(), 0);		setIsLastBit(CurrentPacketMIs.back(), 0);
substitutePV(MI, PV);		substitutePV(MI, PV);
MachineBasicBlock::iterator It = VLIWPacketizerList::addToPacket(MI);		MachineBasicBlock::iterator It = VLIWPacketizerList::addToPacket(MI);
if (isTransSlot) {		if (isTransSlot) {
endPacket(std::next(It)->getParent(), std::next(It));		endPacket(std::next(It)->getParent(), std::next(It));
}		}
...	bool R600Packetizer::runOnMachineFunction(MachineFunction &Fn) {

MachineLoopInfo &MLI = getAnalysis<MachineLoopInfo>();		MachineLoopInfo &MLI = getAnalysis<MachineLoopInfo>();

// Instantiate the packetizer.		// Instantiate the packetizer.
R600PacketizerList Packetizer(Fn, ST, MLI);		R600PacketizerList Packetizer(Fn, ST, MLI);

// DFA state table should not be empty.		// DFA state table should not be empty.
assert(Packetizer.getResourceTracker() && "Empty DFA table!");		assert(Packetizer.getResourceTracker() && "Empty DFA table!");
		assert(Packetizer.getResourceTracker()->getInstrItins());

if (Packetizer.getResourceTracker()->getInstrItins()->isEmpty())		if (Packetizer.getResourceTracker()->getInstrItins()->isEmpty())
return false;		return false;

//		//
// Loop over all basic blocks and remove KILL pseudo-instructions		// Loop over all basic blocks and remove KILL pseudo-instructions
// These instructions confuse the dependence analysis. Consider:		// These instructions confuse the dependence analysis. Consider:
// D0 = ... (Insn 0)		// D0 = ... (Insn 0)
// R0 = KILL R0, D0 (Insn 1)		// R0 = KILL R0, D0 (Insn 1)
// R0 = ... (Insn 2)		// R0 = ... (Insn 2)
// Here, Insn 1 will result in the dependence graph not emitting an output		// Here, Insn 1 will result in the dependence graph not emitting an output
// dependence between Insn 0 and Insn 2. This can lead to incorrect		// dependence between Insn 0 and Insn 2. This can lead to incorrect
// packetization		// packetization
//		//
for (MachineFunction::iterator MBB = Fn.begin(), MBBe = Fn.end();		for (MachineFunction::iterator MBB = Fn.begin(), MBBe = Fn.end();
MBB != MBBe; ++MBB) {		MBB != MBBe; ++MBB) {
MachineBasicBlock::iterator End = MBB->end();		MachineBasicBlock::iterator End = MBB->end();
MachineBasicBlock::iterator MI = MBB->begin();		MachineBasicBlock::iterator MI = MBB->begin();
while (MI != End) {		while (MI != End) {
if (MI->isKill() || MI->getOpcode() == AMDGPU::IMPLICIT_DEF ||		if (MI->isKill() || MI->getOpcode() == R600::IMPLICIT_DEF ||
(MI->getOpcode() == AMDGPU::CF_ALU && !MI->getOperand(8).getImm())) {		(MI->getOpcode() == R600::CF_ALU && !MI->getOperand(8).getImm())) {
MachineBasicBlock::iterator DeleteMI = MI;		MachineBasicBlock::iterator DeleteMI = MI;
++MI;		++MI;
MBB->erase(DeleteMI);		MBB->erase(DeleteMI);
End = MBB->end();		End = MBB->end();
continue;		continue;
}		}
++MI;		++MI;
}		}
...
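
getPreviousVector() and substitutePV() implement operand forwarding. A value written by the previous bundle can be read back through PV.X/Y/Z/W, or through PS for the Trans slot, instead of its GPR. The standalone sketch below shows that mapping. The register numbers and the BundledInstr record are hypothetical. It leaves out predicated instructions and the OQAP special case that the listing handles.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical register numbers for the forwarding registers; the real code
// uses R600::PV_X ... R600::PV_W and R600::PS.
enum : unsigned { PV_X = 1000, PV_Y, PV_Z, PV_W, PS };

struct BundledInstr {
  unsigned Dst;            // destination register
  unsigned Chan;           // hardware channel of Dst, 0..3 (X..W)
  bool TransOnly = false;  // like TII->isTransOnly()
  bool Writes = true;      // false models a "write" operand of 0
  bool IsDot4 = false;     // DOT4 results are forwarded through PV.X
};

// For each register written by the previous bundle, record which forwarding
// register the next bundle can read it from.
std::map<unsigned, unsigned>
previousVector(const std::vector<BundledInstr> &Bundle) {
  static const unsigned PVByChan[] = {PV_X, PV_Y, PV_Z, PV_W};
  std::map<unsigned, unsigned> Result;
  int LastChan = -1;
  for (const BundledInstr &I : Bundle) {
    assert(I.Chan < 4 && "Invalid Chan");
    // Slots fill in channel order; a channel that does not increase means
    // this instruction was placed in the Trans slot.
    bool IsTrans = LastChan >= static_cast<int>(I.Chan);
    LastChan = static_cast<int>(I.Chan);
    if (!I.Writes)
      continue;
    if (IsTrans || I.TransOnly)
      Result[I.Dst] = PS;
    else if (I.IsDot4)
      Result[I.Dst] = PV_X;
    else
      Result[I.Dst] = PVByChan[I.Chan];
  }
  return Result;
}

// Rewrite source operands that read a value produced by the previous bundle.
void substitutePV(std::vector<unsigned> &Srcs,
                  const std::map<unsigned, unsigned> &PVs) {
  for (unsigned &Src : Srcs) {
    auto It = PVs.find(Src);
    if (It != PVs.end())
      Src = It->second;
  }
}
```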

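The KILL/IMPLICIT_DEF cleanup loop in runOnMachineFunction() uses the usual erase-while-iterating pattern. The standalone sketch below runs it over a std::list, assuming a hypothetical Instr record in place of MachineInstr. Here erase() returns the next valid iterator, which does the job of the DeleteMI/++MI sequence and the End refresh in the listing.

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical instruction model: only the fields the cleanup loop inspects.
struct Instr {
  std::string Opcode;
  bool CFALUFlag = true;  // stands in for operand 8 of CF_ALU
};

// Removes KILL, IMPLICIT_DEF, and CF_ALU-with-zero-flag pseudos so they
// cannot hide output dependences from the packetizer's analysis.
void removePseudos(std::list<Instr> &Block) {
  for (auto It = Block.begin(); It != Block.end();) {
    bool Drop = It->Opcode == "KILL" || It->Opcode == "IMPLICIT_DEF" ||
                (It->Opcode == "CF_ALU" && !It->CFALUFlag);
    if (Drop)
      It = Block.erase(It);  // erase returns the following element
    else
      ++It;
  }
}
```
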
lib/Target/AMDGPU/R600Processors.td

	//===-- R600Processors.td - R600 Processor definitions --------------------===//			//===-- R600Processors.td - R600 Processor definitions --------------------===//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

				class SubtargetFeatureFetchLimit <string Value> :
				SubtargetFeature <"fetch"#Value,
				"TexVTXClauseSize",
				Value,
				"Limit the maximum number of fetches in a clause to "#Value
				>;

				def FeatureR600ALUInst : SubtargetFeature<"R600ALUInst",
				"R600ALUInst",
				"false",
				"Older version of ALU instructions encoding"
				>;

				def FeatureFetchLimit8 : SubtargetFeatureFetchLimit <"8">;
				def FeatureFetchLimit16 : SubtargetFeatureFetchLimit <"16">;

				def FeatureVertexCache : SubtargetFeature<"HasVertexCache",
				"HasVertexCache",
				"true",
				"Specify use of dedicated vertex cache"
				>;

				def FeatureCaymanISA : SubtargetFeature<"caymanISA",
				"CaymanISA",
				"true",
				"Use Cayman ISA"
				>;

				def FeatureCFALUBug : SubtargetFeature<"cfalubug",
				"CFALUBug",
				"true",
				"GPU has CF_ALU bug"
				>;

				class R600SubtargetFeatureGeneration <string Value,
				list<SubtargetFeature> Implies> :
				SubtargetFeatureGeneration <Value, "R600Subtarget", Implies>;

				def FeatureR600 : R600SubtargetFeatureGeneration<"R600",
				[FeatureR600ALUInst, FeatureFetchLimit8, FeatureLocalMemorySize0]
				>;

				def FeatureR700 : R600SubtargetFeatureGeneration<"R700",
				[FeatureFetchLimit16, FeatureLocalMemorySize0]
				>;

				def FeatureEvergreen : R600SubtargetFeatureGeneration<"EVERGREEN",
				[FeatureFetchLimit16, FeatureLocalMemorySize32768]
				>;

				def FeatureNorthernIslands : R600SubtargetFeatureGeneration<"NORTHERN_ISLANDS",
				[FeatureFetchLimit16, FeatureWavefrontSize64,
				FeatureLocalMemorySize32768]
				>;


	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Radeon HD 2000/3000 Series (R600).			// Radeon HD 2000/3000 Series (R600).
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	def : Processor<"r600", R600_VLIW5_Itin,			def : Processor<"r600", R600_VLIW5_Itin,
	[FeatureR600, FeatureWavefrontSize64, FeatureVertexCache]			[FeatureR600, FeatureWavefrontSize64, FeatureVertexCache]
	>;			>;

	...
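
Each R600SubtargetFeatureGeneration above carries an Implies list, and a Processor's feature list is expanded through those implications transitively. The standalone sketch below models that expansion, using feature names from the definitions above as plain strings. The graph type and functions are hypothetical and model only the effect: LLVM itself tracks features as bitsets, not string sets.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: feature name -> features it implies.
using FeatureGraph = std::map<std::string, std::vector<std::string>>;

// Enable F and, recursively, everything it implies.
void enableFeature(const FeatureGraph &G, const std::string &F,
                   std::set<std::string> &Enabled) {
  if (!Enabled.insert(F).second)
    return;  // already enabled; also stops cycles
  auto It = G.find(F);
  if (It == G.end())
    return;
  for (const std::string &Implied : It->second)
    enableFeature(G, Implied, Enabled);
}

// Expand a processor's feature list into the full set of enabled features.
std::set<std::string> expandProcessor(const FeatureGraph &G,
                                      const std::vector<std::string> &Features) {
  std::set<std::string> Enabled;
  for (const std::string &F : Features)
    enableFeature(G, F, Enabled);
  return Enabled;
}
```

For the "r600" processor above, [FeatureR600, FeatureWavefrontSize64, FeatureVertexCache] expands to six features, including FeatureR600ALUInst, FeatureFetchLimit8, and FeatureLocalMemorySize0.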

lib/Target/AMDGPU/R600RegisterInfo.h

/// \file		/// \file
/// \brief Interface definition for R600RegisterInfo		/// \brief Interface definition for R600RegisterInfo
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_LIB_TARGET_AMDGPU_R600REGISTERINFO_H		#ifndef LLVM_LIB_TARGET_AMDGPU_R600REGISTERINFO_H
#define LLVM_LIB_TARGET_AMDGPU_R600REGISTERINFO_H		#define LLVM_LIB_TARGET_AMDGPU_R600REGISTERINFO_H

#include "AMDGPURegisterInfo.h"		#define GET_REGINFO_HEADER
		#include "R600GenRegisterInfo.inc"

namespace llvm {		namespace llvm {

class AMDGPUSubtarget;		class AMDGPUSubtarget;

struct R600RegisterInfo final : public AMDGPURegisterInfo {		struct R600RegisterInfo final : public R600GenRegisterInfo {
RegClassWeight RCW;		RegClassWeight RCW;

R600RegisterInfo();		R600RegisterInfo();

BitVector getReservedRegs(const MachineFunction &MF) const override;		BitVector getReservedRegs(const MachineFunction &MF) const override;
const MCPhysReg *getCalleeSavedRegs(const MachineFunction *MF) const override;		const MCPhysReg *getCalleeSavedRegs(const MachineFunction *MF) const override;
unsigned getFrameRegister(const MachineFunction &MF) const override;		unsigned getFrameRegister(const MachineFunction &MF) const override;


// \returns true if \p Reg can be defined in one ALU clause and used in		// \returns true if \p Reg can be defined in one ALU clause and used in
// another.		// another.
bool isPhysRegLiveAcrossClauses(unsigned Reg) const;		bool isPhysRegLiveAcrossClauses(unsigned Reg) const;

void eliminateFrameIndex(MachineBasicBlock::iterator MI, int SPAdj,		void eliminateFrameIndex(MachineBasicBlock::iterator MI, int SPAdj,
unsigned FIOperandNum,		unsigned FIOperandNum,
RegScavenger *RS = nullptr) const override;		RegScavenger *RS = nullptr) const override;

		void reserveRegisterTuples(BitVector &Reserved, unsigned Reg) const;
};		};

} // End namespace llvm		} // End namespace llvm

#endif		#endif

lib/Target/AMDGPU/R600RegisterInfo.cpp

	#include "AMDGPUTargetMachine.h"			#include "AMDGPUTargetMachine.h"
	#include "R600Defines.h"			#include "R600Defines.h"
	#include "R600InstrInfo.h"			#include "R600InstrInfo.h"
	#include "R600MachineFunctionInfo.h"			#include "R600MachineFunctionInfo.h"
	#include "MCTargetDesc/AMDGPUMCTargetDesc.h"			#include "MCTargetDesc/AMDGPUMCTargetDesc.h"

	using namespace llvm;			using namespace llvm;

	R600RegisterInfo::R600RegisterInfo() : AMDGPURegisterInfo() {			R600RegisterInfo::R600RegisterInfo() : R600GenRegisterInfo(0) {
	RCW.RegWeight = 0;			RCW.RegWeight = 0;
	RCW.WeightLimit = 0;			RCW.WeightLimit = 0;
	}			}

				#define GET_REGINFO_TARGET_DESC
				#include "R600GenRegisterInfo.inc"

	BitVector R600RegisterInfo::getReservedRegs(const MachineFunction &MF) const {			BitVector R600RegisterInfo::getReservedRegs(const MachineFunction &MF) const {
	BitVector Reserved(getNumRegs());			BitVector Reserved(getNumRegs());

	const R600Subtarget &ST = MF.getSubtarget<R600Subtarget>();			const R600Subtarget &ST = MF.getSubtarget<R600Subtarget>();
	const R600InstrInfo *TII = ST.getInstrInfo();			const R600InstrInfo *TII = ST.getInstrInfo();

	reserveRegisterTuples(Reserved, AMDGPU::ZERO);			reserveRegisterTuples(Reserved, R600::ZERO);
	reserveRegisterTuples(Reserved, AMDGPU::HALF);			reserveRegisterTuples(Reserved, R600::HALF);
	reserveRegisterTuples(Reserved, AMDGPU::ONE);			reserveRegisterTuples(Reserved, R600::ONE);
	reserveRegisterTuples(Reserved, AMDGPU::ONE_INT);			reserveRegisterTuples(Reserved, R600::ONE_INT);
	reserveRegisterTuples(Reserved, AMDGPU::NEG_HALF);			reserveRegisterTuples(Reserved, R600::NEG_HALF);
	reserveRegisterTuples(Reserved, AMDGPU::NEG_ONE);			reserveRegisterTuples(Reserved, R600::NEG_ONE);
	reserveRegisterTuples(Reserved, AMDGPU::PV_X);			reserveRegisterTuples(Reserved, R600::PV_X);
	reserveRegisterTuples(Reserved, AMDGPU::ALU_LITERAL_X);			reserveRegisterTuples(Reserved, R600::ALU_LITERAL_X);
	reserveRegisterTuples(Reserved, AMDGPU::ALU_CONST);			reserveRegisterTuples(Reserved, R600::ALU_CONST);
	reserveRegisterTuples(Reserved, AMDGPU::PREDICATE_BIT);			reserveRegisterTuples(Reserved, R600::PREDICATE_BIT);
	reserveRegisterTuples(Reserved, AMDGPU::PRED_SEL_OFF);			reserveRegisterTuples(Reserved, R600::PRED_SEL_OFF);
	reserveRegisterTuples(Reserved, AMDGPU::PRED_SEL_ZERO);			reserveRegisterTuples(Reserved, R600::PRED_SEL_ZERO);
	reserveRegisterTuples(Reserved, AMDGPU::PRED_SEL_ONE);			reserveRegisterTuples(Reserved, R600::PRED_SEL_ONE);
	reserveRegisterTuples(Reserved, AMDGPU::INDIRECT_BASE_ADDR);			reserveRegisterTuples(Reserved, R600::INDIRECT_BASE_ADDR);

	for (TargetRegisterClass::iterator I = AMDGPU::R600_AddrRegClass.begin(),			for (TargetRegisterClass::iterator I = R600::R600_AddrRegClass.begin(),
	E = AMDGPU::R600_AddrRegClass.end(); I != E; ++I) {			E = R600::R600_AddrRegClass.end(); I != E; ++I) {
	reserveRegisterTuples(Reserved, *I);			reserveRegisterTuples(Reserved, *I);
	}			}

	TII->reserveIndirectRegisters(Reserved, MF, *this);			TII->reserveIndirectRegisters(Reserved, MF, *this);

	return Reserved;			return Reserved;
	}			}

	// Dummy to not crash RegisterClassInfo.			// Dummy to not crash RegisterClassInfo.
	static const MCPhysReg CalleeSavedReg = AMDGPU::NoRegister;			static const MCPhysReg CalleeSavedReg = R600::NoRegister;

	const MCPhysReg *R600RegisterInfo::getCalleeSavedRegs(			const MCPhysReg *R600RegisterInfo::getCalleeSavedRegs(
	const MachineFunction *) const {			const MachineFunction *) const {
	return &CalleeSavedReg;			return &CalleeSavedReg;
	}			}

	unsigned R600RegisterInfo::getFrameRegister(const MachineFunction &MF) const {			unsigned R600RegisterInfo::getFrameRegister(const MachineFunction &MF) const {
	return AMDGPU::NoRegister;			return R600::NoRegister;
	}			}

	unsigned R600RegisterInfo::getHWRegChan(unsigned reg) const {			unsigned R600RegisterInfo::getHWRegChan(unsigned reg) const {
	return this->getEncodingValue(reg) >> HW_CHAN_SHIFT;			return this->getEncodingValue(reg) >> HW_CHAN_SHIFT;
	}			}

	unsigned R600RegisterInfo::getHWRegIndex(unsigned Reg) const {			unsigned R600RegisterInfo::getHWRegIndex(unsigned Reg) const {
	return GET_REG_INDEX(getEncodingValue(Reg));			return GET_REG_INDEX(getEncodingValue(Reg));
	}			}
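
`getHWRegChan` and `getHWRegIndex` split a register's hardware encoding into a channel (the bits above `HW_CHAN_SHIFT`) and an index (extracted by `GET_REG_INDEX`). The sketch below performs that decode on a plain integer. The 9-bit index field, the channel numbering, and the helper names are assumptions made for illustration; check `R600Defines.h` for the real shift and mask values.

```cpp
#include <cassert>

// Assumed encoding layout for illustration only: the low kChanShift bits hold
// the register index and the bits above them hold the channel (0-3 = X, Y, Z, W).
constexpr unsigned kChanShift = 9;                      // stand-in for HW_CHAN_SHIFT
constexpr unsigned kIndexMask = (1u << kChanShift) - 1; // 0x1ff

// Build an encoding from an index and a channel (hypothetical helper).
constexpr unsigned encodeReg(unsigned Index, unsigned Chan) {
  return (Chan << kChanShift) | (Index & kIndexMask);
}

// Counterpart of getHWRegChan: shift the index bits away.
constexpr unsigned regChan(unsigned Encoding) { return Encoding >> kChanShift; }

// Counterpart of getHWRegIndex / GET_REG_INDEX: keep only the index bits.
constexpr unsigned regIndex(unsigned Encoding) { return Encoding & kIndexMask; }
```

Under these assumed constants, `encodeReg(5, 2)` is `(2 << 9) | 5 = 1029`, which decodes back to channel 2 and index 5.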

	const TargetRegisterClass * R600RegisterInfo::getCFGStructurizerRegClass(			const TargetRegisterClass * R600RegisterInfo::getCFGStructurizerRegClass(
	MVT VT) const {			MVT VT) const {
	switch(VT.SimpleTy) {			switch(VT.SimpleTy) {
	default:			default:
	case MVT::i32: return &AMDGPU::R600_TReg32RegClass;			case MVT::i32: return &R600::R600_TReg32RegClass;
	}			}
	}			}

	const RegClassWeight &R600RegisterInfo::getRegClassWeight(			const RegClassWeight &R600RegisterInfo::getRegClassWeight(
	const TargetRegisterClass *RC) const {			const TargetRegisterClass *RC) const {
	return RCW;			return RCW;
	}			}

	bool R600RegisterInfo::isPhysRegLiveAcrossClauses(unsigned Reg) const {			bool R600RegisterInfo::isPhysRegLiveAcrossClauses(unsigned Reg) const {
	assert(!TargetRegisterInfo::isVirtualRegister(Reg));			assert(!TargetRegisterInfo::isVirtualRegister(Reg));

	switch (Reg) {			switch (Reg) {
	case AMDGPU::OQAP:			case R600::OQAP:
	case AMDGPU::OQBP:			case R600::OQBP:
	case AMDGPU::AR_X:			case R600::AR_X:
	return false;			return false;
	default:			default:
	return true;			return true;
	}			}
	}			}

	void R600RegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator MI,			void R600RegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator MI,
	int SPAdj,			int SPAdj,
	unsigned FIOperandNum,			unsigned FIOperandNum,
	RegScavenger *RS) const {			RegScavenger *RS) const {
	llvm_unreachable("Subroutines not supported yet");			llvm_unreachable("Subroutines not supported yet");
	}			}

				void R600RegisterInfo::reserveRegisterTuples(BitVector &Reserved, unsigned Reg) const {
				MCRegAliasIterator R(Reg, this, true);

				for (; R.isValid(); ++R)
				Reserved.set(*R);
				}
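
The new `reserveRegisterTuples` helper reserves a register together with every register that aliases it (`MCRegAliasIterator` constructed with `IncludeSelf = true`), so reserving a register such as `R600::ZERO` also blocks any overlapping tuple. The sketch below models the same idea with a hand-written alias table and `std::vector<bool>` standing in for `BitVector`. The register numbers, `AliasTable`, and `reserveWithAliases` are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical alias table: register number -> registers that overlap it
// (super- and sub-registers). Stands in for MCRegAliasIterator.
using AliasTable = std::map<unsigned, std::vector<unsigned>>;

// Mark Reg and all of its aliases as reserved, mirroring
// reserveRegisterTuples with IncludeSelf = true.
void reserveWithAliases(std::vector<bool> &Reserved, const AliasTable &Aliases,
                        unsigned Reg) {
  assert(Reg < Reserved.size() && "register number out of range");
  Reserved[Reg] = true; // the register itself (IncludeSelf)
  auto It = Aliases.find(Reg);
  if (It == Aliases.end())
    return; // no overlapping registers
  for (unsigned Alias : It->second) {
    assert(Alias < Reserved.size() && "alias out of range");
    Reserved[Alias] = true;
  }
}
```

Reserving register 1 with the table `{1 -> {4, 5}}` marks registers 1, 4 and 5 and leaves the rest free.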

lib/Target/AMDGPU/R600RegisterInfo.td

def R600_Reg128 : RegisterClass<"AMDGPU", [v4f32, v4i32], 128,
(add (sequence "T%u_XYZW", 0, 127))> {		(add (sequence "T%u_XYZW", 0, 127))> {
let CopyCost = -1;		let CopyCost = -1;
}		}

def R600_Reg128Vertical : RegisterClass<"AMDGPU", [v4f32, v4i32], 128,		def R600_Reg128Vertical : RegisterClass<"AMDGPU", [v4f32, v4i32], 128,
(add V0123_W, V0123_Z, V0123_Y, V0123_X)		(add V0123_W, V0123_Z, V0123_Y, V0123_X)
>;		>;

def R600_Reg64 : RegisterClass<"AMDGPU", [v2f32, v2i32], 64,		def R600_Reg64 : RegisterClass<"AMDGPU", [v2f32, v2i32, i64, f64], 64,
(add (sequence "T%u_XY", 0, 63))>;		(add (sequence "T%u_XY", 0, 63))>;

def R600_Reg64Vertical : RegisterClass<"AMDGPU", [v2f32, v2i32], 64,		def R600_Reg64Vertical : RegisterClass<"AMDGPU", [v2f32, v2i32], 64,
(add V01_X, V01_Y, V01_Z, V01_W,		(add V01_X, V01_Y, V01_Z, V01_W,
V23_X, V23_Y, V23_Z, V23_W)>;		V23_X, V23_Y, V23_Z, V23_W)>;

lib/Target/AMDGPU/R700Instructions.td

	//===-- R700Instructions.td - R700 Instruction defs -------- tablegen --===//			//===-- R700Instructions.td - R700 Instruction defs -------- tablegen --===//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// TableGen definitions for instructions which are:			// TableGen definitions for instructions which are:
	// - Available to R700 and newer VLIW4/VLIW5 GPUs			// - Available to R700 and newer VLIW4/VLIW5 GPUs
	// - Available only on R700 family GPUs.			// - Available only on R700 family GPUs.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	def isR700 : Predicate<"Subtarget->getGeneration() == AMDGPUSubtarget::R700">;			def isR700 : Predicate<"Subtarget->getGeneration() == R600Subtarget::R700">;

	let Predicates = [isR700] in {			let Predicates = [isR700] in {
	def SIN_r700 : SIN_Common<0x6E>;			def SIN_r700 : SIN_Common<0x6E>;
	def COS_r700 : COS_Common<0x6F>;			def COS_r700 : COS_Common<0x6F>;
	}			}

lib/Target/AMDGPU/SIFoldOperands.cpp

};		};

class SIFoldOperands : public MachineFunctionPass {		class SIFoldOperands : public MachineFunctionPass {
public:		public:
static char ID;		static char ID;
MachineRegisterInfo *MRI;		MachineRegisterInfo *MRI;
const SIInstrInfo *TII;		const SIInstrInfo *TII;
const SIRegisterInfo *TRI;		const SIRegisterInfo *TRI;
const SISubtarget *ST;		const AMDGPUSubtarget *ST;

void foldOperand(MachineOperand &OpToFold,		void foldOperand(MachineOperand &OpToFold,
MachineInstr *UseMI,		MachineInstr *UseMI,
unsigned UseOpIdx,		unsigned UseOpIdx,
SmallVectorImpl<FoldCandidate> &FoldList,		SmallVectorImpl<FoldCandidate> &FoldList,
SmallVectorImpl<MachineInstr *> &CopiesToReplace) const;		SmallVectorImpl<MachineInstr *> &CopiesToReplace) const;

void foldInstOperand(MachineInstr &MI, MachineOperand &OpToFold) const;		void foldInstOperand(MachineInstr &MI, MachineOperand &OpToFold) const;
bool SIFoldOperands::tryFoldOMod(MachineInstr &MI) {
return true;		return true;
}		}

bool SIFoldOperands::runOnMachineFunction(MachineFunction &MF) {		bool SIFoldOperands::runOnMachineFunction(MachineFunction &MF) {
if (skipFunction(MF.getFunction()))		if (skipFunction(MF.getFunction()))
return false;		return false;

MRI = &MF.getRegInfo();		MRI = &MF.getRegInfo();
ST = &MF.getSubtarget<SISubtarget>();		ST = &MF.getSubtarget<AMDGPUSubtarget>();
TII = ST->getInstrInfo();		TII = ST->getInstrInfo();
TRI = &TII->getRegisterInfo();		TRI = &TII->getRegisterInfo();

const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();

// omod is ignored by hardware if IEEE bit is enabled. omod also does not		// omod is ignored by hardware if IEEE bit is enabled. omod also does not
// correctly handle signed zeros.		// correctly handle signed zeros.
//		//

lib/Target/AMDGPU/SIISelLowering.h


	#include "AMDGPUISelLowering.h"			#include "AMDGPUISelLowering.h"
	#include "AMDGPUArgumentUsageInfo.h"			#include "AMDGPUArgumentUsageInfo.h"
	#include "SIInstrInfo.h"			#include "SIInstrInfo.h"

	namespace llvm {			namespace llvm {

	class SITargetLowering final : public AMDGPUTargetLowering {			class SITargetLowering final : public AMDGPUTargetLowering {
				private:
				const SISubtarget *Subtarget;

	SDValue lowerKernArgParameterPtr(SelectionDAG &DAG, const SDLoc &SL,			SDValue lowerKernArgParameterPtr(SelectionDAG &DAG, const SDLoc &SL,
	SDValue Chain, uint64_t Offset) const;			SDValue Chain, uint64_t Offset) const;
	SDValue getImplicitArgPtr(SelectionDAG &DAG, const SDLoc &SL) const;			SDValue getImplicitArgPtr(SelectionDAG &DAG, const SDLoc &SL) const;
	SDValue lowerKernargMemParameter(SelectionDAG &DAG, EVT VT, EVT MemVT,			SDValue lowerKernargMemParameter(SelectionDAG &DAG, EVT VT, EVT MemVT,
	const SDLoc &SL, SDValue Chain,			const SDLoc &SL, SDValue Chain,
	uint64_t Offset, bool Signed,			uint64_t Offset, bool Signed,
	const ISD::InputArg *Arg = nullptr) const;			const ISD::InputArg *Arg = nullptr) const;


lib/Target/AMDGPU/SIISelLowering.cpp


if (!CCInfo.isAllocated(AMDGPU::SGPR0 + Reg)) {
return AMDGPU::SGPR0 + Reg;		return AMDGPU::SGPR0 + Reg;
}		}
}		}
llvm_unreachable("Cannot allocate sgpr");		llvm_unreachable("Cannot allocate sgpr");
}		}

SITargetLowering::SITargetLowering(const TargetMachine &TM,		SITargetLowering::SITargetLowering(const TargetMachine &TM,
const SISubtarget &STI)		const SISubtarget &STI)
: AMDGPUTargetLowering(TM, STI) {		: AMDGPUTargetLowering(TM, STI),
		Subtarget(&STI) {
addRegisterClass(MVT::i1, &AMDGPU::VReg_1RegClass);		addRegisterClass(MVT::i1, &AMDGPU::VReg_1RegClass);
addRegisterClass(MVT::i64, &AMDGPU::SReg_64RegClass);		addRegisterClass(MVT::i64, &AMDGPU::SReg_64RegClass);

addRegisterClass(MVT::i32, &AMDGPU::SReg_32_XM0RegClass);		addRegisterClass(MVT::i32, &AMDGPU::SReg_32_XM0RegClass);
addRegisterClass(MVT::f32, &AMDGPU::VGPR_32RegClass);		addRegisterClass(MVT::f32, &AMDGPU::VGPR_32RegClass);

addRegisterClass(MVT::f64, &AMDGPU::VReg_64RegClass);		addRegisterClass(MVT::f64, &AMDGPU::VReg_64RegClass);
addRegisterClass(MVT::v2i32, &AMDGPU::SReg_64RegClass);		addRegisterClass(MVT::v2i32, &AMDGPU::SReg_64RegClass);
if (Subtarget->has16BitInsts()) {
addRegisterClass(MVT::f16, &AMDGPU::SReg_32_XM0RegClass);		addRegisterClass(MVT::f16, &AMDGPU::SReg_32_XM0RegClass);
}		}

if (Subtarget->hasVOP3PInsts()) {		if (Subtarget->hasVOP3PInsts()) {
addRegisterClass(MVT::v2i16, &AMDGPU::SReg_32_XM0RegClass);		addRegisterClass(MVT::v2i16, &AMDGPU::SReg_32_XM0RegClass);
addRegisterClass(MVT::v2f16, &AMDGPU::SReg_32_XM0RegClass);		addRegisterClass(MVT::v2f16, &AMDGPU::SReg_32_XM0RegClass);
}		}

computeRegisterProperties(STI.getRegisterInfo());		computeRegisterProperties(Subtarget->getRegisterInfo());

// We need to custom lower vector stores from local memory		// We need to custom lower vector stores from local memory
setOperationAction(ISD::LOAD, MVT::v2i32, Custom);		setOperationAction(ISD::LOAD, MVT::v2i32, Custom);
setOperationAction(ISD::LOAD, MVT::v4i32, Custom);		setOperationAction(ISD::LOAD, MVT::v4i32, Custom);
setOperationAction(ISD::LOAD, MVT::v8i32, Custom);		setOperationAction(ISD::LOAD, MVT::v8i32, Custom);
setOperationAction(ISD::LOAD, MVT::v16i32, Custom);		setOperationAction(ISD::LOAD, MVT::v16i32, Custom);
setOperationAction(ISD::LOAD, MVT::i1, Custom);		setOperationAction(ISD::LOAD, MVT::i1, Custom);

#endif
setOperationAction(ISD::ATOMIC_CMP_SWAP, MVT::i32, Custom);		setOperationAction(ISD::ATOMIC_CMP_SWAP, MVT::i32, Custom);
setOperationAction(ISD::ATOMIC_CMP_SWAP, MVT::i64, Custom);		setOperationAction(ISD::ATOMIC_CMP_SWAP, MVT::i64, Custom);

// We can't return success/failure, only the old value,		// We can't return success/failure, only the old value,
// let LLVM add the comparison		// let LLVM add the comparison
setOperationAction(ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS, MVT::i32, Expand);		setOperationAction(ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS, MVT::i32, Expand);
setOperationAction(ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS, MVT::i64, Expand);		setOperationAction(ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS, MVT::i64, Expand);

if (getSubtarget()->hasFlatAddressSpace()) {		if (Subtarget->hasFlatAddressSpace()) {
setOperationAction(ISD::ADDRSPACECAST, MVT::i32, Custom);		setOperationAction(ISD::ADDRSPACECAST, MVT::i32, Custom);
setOperationAction(ISD::ADDRSPACECAST, MVT::i64, Custom);		setOperationAction(ISD::ADDRSPACECAST, MVT::i64, Custom);
}		}

setOperationAction(ISD::BSWAP, MVT::i32, Legal);		setOperationAction(ISD::BSWAP, MVT::i32, Legal);
setOperationAction(ISD::BITREVERSE, MVT::i32, Legal);		setOperationAction(ISD::BITREVERSE, MVT::i32, Legal);

// On SI this is s_memtime and s_memrealtime on VI.		// On SI this is s_memtime and s_memrealtime on VI.
setOperationAction(ISD::READCYCLECOUNTER, MVT::i64, Legal);		setOperationAction(ISD::READCYCLECOUNTER, MVT::i64, Legal);
setOperationAction(ISD::TRAP, MVT::Other, Custom);		setOperationAction(ISD::TRAP, MVT::Other, Custom);
setOperationAction(ISD::DEBUGTRAP, MVT::Other, Custom);		setOperationAction(ISD::DEBUGTRAP, MVT::Other, Custom);

setOperationAction(ISD::FMINNUM, MVT::f64, Legal);		setOperationAction(ISD::FMINNUM, MVT::f64, Legal);
setOperationAction(ISD::FMAXNUM, MVT::f64, Legal);		setOperationAction(ISD::FMAXNUM, MVT::f64, Legal);

if (Subtarget->getGeneration() >= SISubtarget::SEA_ISLANDS) {		if (Subtarget->getGeneration() >= SISubtarget::SEA_ISLANDS) {
setOperationAction(ISD::FTRUNC, MVT::f64, Legal);		setOperationAction(ISD::FTRUNC, MVT::f64, Legal);
setOperationAction(ISD::FCEIL, MVT::f64, Legal);		setOperationAction(ISD::FCEIL, MVT::f64, Legal);
setOperationAction(ISD::FRINT, MVT::f64, Legal);		setOperationAction(ISD::FRINT, MVT::f64, Legal);
		} else {
		setOperationAction(ISD::FCEIL, MVT::f64, Custom);
		setOperationAction(ISD::FTRUNC, MVT::f64, Custom);
		setOperationAction(ISD::FRINT, MVT::f64, Custom);
		setOperationAction(ISD::FFLOOR, MVT::f64, Custom);
}		}

setOperationAction(ISD::FFLOOR, MVT::f64, Legal);		setOperationAction(ISD::FFLOOR, MVT::f64, Legal);

setOperationAction(ISD::FSIN, MVT::f32, Custom);		setOperationAction(ISD::FSIN, MVT::f32, Custom);
setOperationAction(ISD::FCOS, MVT::f32, Custom);		setOperationAction(ISD::FCOS, MVT::f32, Custom);
setOperationAction(ISD::FDIV, MVT::f32, Custom);		setOperationAction(ISD::FDIV, MVT::f32, Custom);
setOperationAction(ISD::FDIV, MVT::f64, Custom);		setOperationAction(ISD::FDIV, MVT::f64, Custom);
#endif
setTargetDAGCombine(ISD::ATOMIC_LOAD_MAX);		setTargetDAGCombine(ISD::ATOMIC_LOAD_MAX);
setTargetDAGCombine(ISD::ATOMIC_LOAD_UMIN);		setTargetDAGCombine(ISD::ATOMIC_LOAD_UMIN);
setTargetDAGCombine(ISD::ATOMIC_LOAD_UMAX);		setTargetDAGCombine(ISD::ATOMIC_LOAD_UMAX);

setSchedulingPreference(Sched::RegPressure);		setSchedulingPreference(Sched::RegPressure);
}		}
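
`setOperationAction` records one action per (opcode, value type) pair, so a later call for the same pair replaces an earlier one. In the constructor as shown in this hunk, the unconditional `setOperationAction(ISD::FFLOOR, MVT::f64, Legal)` that follows the generation check therefore takes precedence over the `Custom` setting in the new `else` branch. The sketch below models that table with a `std::map`. `Action`, `ActionTable`, the string keys, and the default of `Legal` for unset pairs are simplifications assumed for illustration, not the `TargetLoweringBase` implementation.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

enum class Action { Legal, Custom, Expand };

// Hypothetical stand-in for the per-(opcode, type) operation-action table.
class ActionTable {
  std::map<std::pair<std::string, std::string>, Action> Table;

public:
  // Like setOperationAction: overwrites any earlier entry for the pair.
  void set(const std::string &Op, const std::string &VT, Action A) {
    Table[{Op, VT}] = A;
  }
  // Unset pairs report Legal in this sketch.
  Action get(const std::string &Op, const std::string &VT) const {
    auto It = Table.find({Op, VT});
    return It == Table.end() ? Action::Legal : It->second;
  }
};

// Replays the order of the f64 rounding calls for a pre-Sea Islands target.
ActionTable buildPreCIActions() {
  ActionTable T;
  T.set("FCEIL", "f64", Action::Custom);  // else branch
  T.set("FFLOOR", "f64", Action::Custom); // else branch
  T.set("FFLOOR", "f64", Action::Legal);  // unconditional call after the if/else
  return T;
}
```

In this model `FCEIL` stays `Custom` while `FFLOOR` ends up `Legal`, because the last write to a pair wins.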

const SISubtarget *SITargetLowering::getSubtarget() const {		const SISubtarget *SITargetLowering::getSubtarget() const {
return static_cast<const SISubtarget *>(Subtarget);		return Subtarget;
}		}
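
The constructor now caches `&STI` in the private `const SISubtarget *Subtarget` member declared in `SITargetLowering`, so `getSubtarget()` and call sites such as `Subtarget->getRegisterInfo()` no longer need a `static_cast` from a base-class pointer. A minimal sketch of the pattern follows, using hypothetical `CommonSubtarget`, `GCNSubtarget`, `BaseLowering` and `GCNLowering` classes in place of the LLVM ones. The derived member hides the base member of the same name, so unqualified `Subtarget->` uses inside the derived class see the derived type.

```cpp
#include <cassert>
#include <string>

struct CommonSubtarget { // hypothetical shared base
  virtual ~CommonSubtarget() = default;
};

struct GCNSubtarget : CommonSubtarget { // hypothetical GCN-only subtarget
  std::string registerInfoName() const { return "SIRegisterInfo"; }
};

class BaseLowering {
protected:
  const CommonSubtarget *Subtarget; // base-typed pointer

public:
  explicit BaseLowering(const CommonSubtarget &ST) : Subtarget(&ST) {}
};

class GCNLowering final : public BaseLowering {
  const GCNSubtarget *Subtarget; // hides BaseLowering::Subtarget

public:
  explicit GCNLowering(const GCNSubtarget &STI)
      : BaseLowering(STI), Subtarget(&STI) {}

  // No static_cast needed: the member already has the derived type.
  const GCNSubtarget *getSubtarget() const { return Subtarget; }
  std::string regInfo() const { return Subtarget->registerInfoName(); }
};
```

The trade-off is one extra pointer per lowering object in exchange for removing an unchecked downcast at every use.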

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// TargetLowering queries		// TargetLowering queries
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

bool SITargetLowering::isShuffleMaskLegal(ArrayRef<int>, EVT) const {		bool SITargetLowering::isShuffleMaskLegal(ArrayRef<int>, EVT) const {
// SI has some legal vector types, but no legal vector operations. Say no		// SI has some legal vector types, but no legal vector operations. Say no
for (unsigned i = 0, realRVLocIdx = 0;

Chain = DAG.getCopyToReg(Chain, DL, VA.getLocReg(), Arg, Flag);		Chain = DAG.getCopyToReg(Chain, DL, VA.getLocReg(), Arg, Flag);
Flag = Chain.getValue(1);		Flag = Chain.getValue(1);
RetOps.push_back(DAG.getRegister(VA.getLocReg(), VA.getLocVT()));		RetOps.push_back(DAG.getRegister(VA.getLocReg(), VA.getLocVT()));
}		}

// FIXME: Does sret work properly?		// FIXME: Does sret work properly?
if (!Info->isEntryFunction()) {		if (!Info->isEntryFunction()) {
const SIRegisterInfo *TRI		const SIRegisterInfo *TRI = Subtarget->getRegisterInfo();
= static_cast<const SISubtarget *>(Subtarget)->getRegisterInfo();
const MCPhysReg *I =		const MCPhysReg *I =
TRI->getCalleeSavedRegsViaCopy(&DAG.getMachineFunction());		TRI->getCalleeSavedRegsViaCopy(&DAG.getMachineFunction());
if (I) {		if (I) {
for (; *I; ++I) {		for (; *I; ++I) {
if (AMDGPU::SReg_64RegClass.contains(*I))		if (AMDGPU::SReg_64RegClass.contains(*I))
RetOps.push_back(DAG.getRegister(*I, MVT::i64));		RetOps.push_back(DAG.getRegister(*I, MVT::i64));
else if (AMDGPU::SReg_32RegClass.contains(*I))		else if (AMDGPU::SReg_32RegClass.contains(*I))
RetOps.push_back(DAG.getRegister(*I, MVT::i32));		RetOps.push_back(DAG.getRegister(*I, MVT::i32));
if (!CLI.CS)
return;		return;

const Function *CalleeFunc = CLI.CS.getCalledFunction();		const Function *CalleeFunc = CLI.CS.getCalledFunction();
assert(CalleeFunc);		assert(CalleeFunc);

SelectionDAG &DAG = CLI.DAG;		SelectionDAG &DAG = CLI.DAG;
const SDLoc &DL = CLI.DL;		const SDLoc &DL = CLI.DL;

const SISubtarget *ST = getSubtarget();		const SIRegisterInfo *TRI = Subtarget->getRegisterInfo();
const SIRegisterInfo *TRI = ST->getRegisterInfo();

auto &ArgUsageInfo =		auto &ArgUsageInfo =
DAG.getPass()->getAnalysis<AMDGPUArgumentUsageInfo>();		DAG.getPass()->getAnalysis<AMDGPUArgumentUsageInfo>();
const AMDGPUFunctionArgInfo &CalleeArgInfo		const AMDGPUFunctionArgInfo &CalleeArgInfo
= ArgUsageInfo.lookupFuncArgInfo(*CalleeFunc);		= ArgUsageInfo.lookupFuncArgInfo(*CalleeFunc);

const AMDGPUFunctionArgInfo &CallerArgInfo = Info.getArgInfo();		const AMDGPUFunctionArgInfo &CallerArgInfo = Info.getArgInfo();

SDValue SITargetLowering::LowerCall(CallLoweringInfo &CLI,
// into the call.		// into the call.
for (auto &RegToPass : RegsToPass) {		for (auto &RegToPass : RegsToPass) {
Ops.push_back(DAG.getRegister(RegToPass.first,		Ops.push_back(DAG.getRegister(RegToPass.first,
RegToPass.second.getValueType()));		RegToPass.second.getValueType()));
}		}

// Add a register mask operand representing the call-preserved registers.		// Add a register mask operand representing the call-preserved registers.

const AMDGPURegisterInfo *TRI = Subtarget->getRegisterInfo();		auto *TRI = static_cast<const SIRegisterInfo*>(Subtarget->getRegisterInfo());
const uint32_t *Mask = TRI->getCallPreservedMask(MF, CallConv);		const uint32_t *Mask = TRI->getCallPreservedMask(MF, CallConv);
assert(Mask && "Missing call preserved mask for calling convention");		assert(Mask && "Missing call preserved mask for calling convention");
Ops.push_back(DAG.getRegisterMask(Mask));		Ops.push_back(DAG.getRegisterMask(Mask));

if (InFlag.getNode())		if (InFlag.getNode())
Ops.push_back(InFlag);		Ops.push_back(InFlag);

SDVTList NodeTys = DAG.getVTList(MVT::Other, MVT::Glue);		SDVTList NodeTys = DAG.getVTList(MVT::Other, MVT::Glue);

// Figure out which registers should be reserved for stack access. Only after		// Figure out which registers should be reserved for stack access. Only after
// the function is legalized do we know all of the non-spill stack objects or if		// the function is legalized do we know all of the non-spill stack objects or if
// calls are present.		// calls are present.
void SITargetLowering::finalizeLowering(MachineFunction &MF) const {		void SITargetLowering::finalizeLowering(MachineFunction &MF) const {
MachineRegisterInfo &MRI = MF.getRegInfo();		MachineRegisterInfo &MRI = MF.getRegInfo();
SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();		SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();
const MachineFrameInfo &MFI = MF.getFrameInfo();		const MachineFrameInfo &MFI = MF.getFrameInfo();
const SISubtarget &ST = MF.getSubtarget<SISubtarget>();		const SIRegisterInfo *TRI = Subtarget->getRegisterInfo();
const SIRegisterInfo *TRI = ST.getRegisterInfo();

if (Info->isEntryFunction()) {		if (Info->isEntryFunction()) {
// Callable functions have fixed registers used for stack access.		// Callable functions have fixed registers used for stack access.
reservePrivateMemoryRegs(getTargetMachine(), MF, TRI, Info);		reservePrivateMemoryRegs(getTargetMachine(), MF, TRI, Info);
}		}

// We have to assume the SP is needed in case there are calls in the function		// We have to assume the SP is needed in case there are calls in the function
// during lowering. Calls are only detected after the function is		// during lowering. Calls are only detected after the function is

lib/Target/AMDGPU/SIInsertWaitcnts.cpp

else if (MI.getOpcode() == AMDGPU::BUFFER_WBINVL1 ||
MI.getOpcode() == AMDGPU::BUFFER_WBINVL1_VOL) {		MI.getOpcode() == AMDGPU::BUFFER_WBINVL1_VOL) {
EmitWaitcnt |=		EmitWaitcnt |=
ScoreBrackets->updateByWait(VM_CNT, ScoreBrackets->getScoreUB(VM_CNT));		ScoreBrackets->updateByWait(VM_CNT, ScoreBrackets->getScoreUB(VM_CNT));
}		}

// All waits must be resolved at call return.		// All waits must be resolved at call return.
// NOTE: this could be improved with knowledge of all call sites or		// NOTE: this could be improved with knowledge of all call sites or
// with knowledge of the called routines.		// with knowledge of the called routines.
if (MI.getOpcode() == AMDGPU::RETURN ||		if (
MI.getOpcode() == AMDGPU::SI_RETURN_TO_EPILOG ||		MI.getOpcode() == AMDGPU::SI_RETURN_TO_EPILOG ||
		arsenm: Fix formatting
MI.getOpcode() == AMDGPU::S_SETPC_B64_return) {		MI.getOpcode() == AMDGPU::S_SETPC_B64_return) {
for (enum InstCounterType T = VM_CNT; T < NUM_INST_CNTS;		for (enum InstCounterType T = VM_CNT; T < NUM_INST_CNTS;
T = (enum InstCounterType)(T + 1)) {		T = (enum InstCounterType)(T + 1)) {
if (ScoreBrackets->getScoreUB(T) > ScoreBrackets->getScoreLB(T)) {		if (ScoreBrackets->getScoreUB(T) > ScoreBrackets->getScoreLB(T)) {
ScoreBrackets->setScoreLB(T, ScoreBrackets->getScoreUB(T));		ScoreBrackets->setScoreLB(T, ScoreBrackets->getScoreUB(T));
EmitWaitcnt |= CNT_MASK(T);		EmitWaitcnt |= CNT_MASK(T);
}		}
}		}
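
At a return, the loop above walks every counter type from `VM_CNT` up to `NUM_INST_CNTS`. For each counter that still has outstanding events (its score upper bound is above its lower bound), it raises the lower bound to the upper bound and records that counter in the `EmitWaitcnt` mask. The self-contained model below follows the same steps. The `Brackets` struct, the counter list, and the one-bit-per-counter mask are hypothetical stand-ins for the pass's score brackets and `CNT_MASK`.

```cpp
#include <array>
#include <cassert>

// Hypothetical counter list mirroring the pass's InstCounterType values.
enum InstCounterType { VM_CNT = 0, LGKM_CNT, EXP_CNT, NUM_INST_CNTS };

struct Brackets {
  std::array<unsigned, NUM_INST_CNTS> LB{}; // lower bound of the pending score range
  std::array<unsigned, NUM_INST_CNTS> UB{}; // upper bound (latest issued score)
};

// Resolve every outstanding counter, as required at a return, and return
// a mask with one bit set per counter that needed a wait.
unsigned flushAllCounters(Brackets &B) {
  unsigned Mask = 0;
  for (int T = VM_CNT; T < NUM_INST_CNTS; ++T) {
    if (B.UB[T] > B.LB[T]) {
      B.LB[T] = B.UB[T]; // nothing remains pending for this counter
      Mask |= 1u << T;
    }
  }
  return Mask;
}
```

Calling `flushAllCounters` a second time returns 0, because every bracket has already been closed.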
EmitWaitcnt |= ScoreBrackets->updateByWait(
EXP_CNT, ScoreBrackets->getScoreUB(EXP_CNT));		EXP_CNT, ScoreBrackets->getScoreUB(EXP_CNT));
EmitWaitcnt |= ScoreBrackets->updateByWait(		EmitWaitcnt |= ScoreBrackets->updateByWait(
LGKM_CNT, ScoreBrackets->getScoreUB(LGKM_CNT));		LGKM_CNT, ScoreBrackets->getScoreUB(LGKM_CNT));
}		}

// TODO: Remove this work-around, enable the assert for Bug 457939		// TODO: Remove this work-around, enable the assert for Bug 457939
// after fixing the scheduler. Also, the Shader Compiler code is		// after fixing the scheduler. Also, the Shader Compiler code is
// independent of target.		// independent of target.
if (readsVCCZ(MI) && ST->getGeneration() <= SISubtarget::SEA_ISLANDS) {		if (readsVCCZ(MI) && ST->getGeneration() <= AMDGPUSubtarget::SEA_ISLANDS) {
if (ScoreBrackets->getScoreLB(LGKM_CNT) <		if (ScoreBrackets->getScoreLB(LGKM_CNT) <
ScoreBrackets->getScoreUB(LGKM_CNT) &&		ScoreBrackets->getScoreUB(LGKM_CNT) &&
ScoreBrackets->hasPendingSMEM()) {		ScoreBrackets->hasPendingSMEM()) {
// Wait on everything, not just LGKM. vccz reads usually come from		// Wait on everything, not just LGKM. vccz reads usually come from
// terminators, and we always wait on everything at the end of the		// terminators, and we always wait on everything at the end of the
// block, so if we only wait on LGKM here, we might end up with		// block, so if we only wait on LGKM here, we might end up with
// another s_waitcnt inserted right after this if there are non-LGKM		// another s_waitcnt inserted right after this if there are non-LGKM
// instructions still outstanding.		// instructions still outstanding.
for (MachineBasicBlock::iterator Iter = Block.begin(), E = Block.end();
}		}

bool VCCZBugWorkAround = false;		bool VCCZBugWorkAround = false;
if (readsVCCZ(Inst) &&		if (readsVCCZ(Inst) &&
(!VCCZBugHandledSet.count(&Inst))) {		(!VCCZBugHandledSet.count(&Inst))) {
if (ScoreBrackets->getScoreLB(LGKM_CNT) <		if (ScoreBrackets->getScoreLB(LGKM_CNT) <
ScoreBrackets->getScoreUB(LGKM_CNT) &&		ScoreBrackets->getScoreUB(LGKM_CNT) &&
ScoreBrackets->hasPendingSMEM()) {		ScoreBrackets->hasPendingSMEM()) {
if (ST->getGeneration() <= SISubtarget::SEA_ISLANDS)		if (ST->getGeneration() <= AMDGPUSubtarget::SEA_ISLANDS)
VCCZBugWorkAround = true;		VCCZBugWorkAround = true;
}		}
}		}

// Generate an s_waitcnt instruction to be placed before		// Generate an s_waitcnt instruction to be placed before
// cur_Inst, if needed.		// cur_Inst, if needed.
generateWaitcntInstBefore(Inst, ScoreBrackets);		generateWaitcntInstBefore(Inst, ScoreBrackets);
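
The VCCZ workaround condition in this hunk combines four checks: the instruction reads VCCZ, it has not already been handled, an LGKM event is still outstanding together with a pending scalar memory (SMEM) access, and the generation is Sea Islands or older. Because the generation values are an ordered enum, `<= SEA_ISLANDS` selects Southern Islands and Sea Islands among GCN targets. The predicate below isolates that logic. The enum, `PendingState`, and `needsVCCZWorkaround` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical GCN generation list, ordered oldest to newest so relational
// comparisons behave like the pass's `<= SEA_ISLANDS` check.
enum Generation { SOUTHERN_ISLANDS, SEA_ISLANDS, VOLCANIC_ISLANDS, GFX9 };

// Simplified view of the LGKM score bracket and pending SMEM state.
struct PendingState {
  unsigned LgkmLB, LgkmUB; // LGKM lower/upper score bounds
  bool HasPendingSMEM;     // an SMEM access is still outstanding
};

// True when the VCCZ-bug workaround should be applied to this instruction.
bool needsVCCZWorkaround(bool ReadsVCCZ, bool AlreadyHandled,
                         const PendingState &S, Generation Gen) {
  if (!ReadsVCCZ || AlreadyHandled)
    return false;
  bool LgkmOutstanding = S.LgkmLB < S.LgkmUB;
  return LgkmOutstanding && S.HasPendingSMEM && Gen <= SEA_ISLANDS;
}
```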


lib/Target/AMDGPU/SIInsertWaits.cpp

using InstType = enum {
VMEM		VMEM
};		};

using RegCounters = Counters[512];		using RegCounters = Counters[512];
using RegInterval = std::pair<unsigned, unsigned>;		using RegInterval = std::pair<unsigned, unsigned>;

class SIInsertWaits : public MachineFunctionPass {		class SIInsertWaits : public MachineFunctionPass {
private:		private:
const SISubtarget *ST = nullptr;		const AMDGPUSubtarget *ST = nullptr;
const SIInstrInfo *TII = nullptr;		const SIInstrInfo *TII = nullptr;
const SIRegisterInfo *TRI = nullptr;		const SIRegisterInfo *TRI = nullptr;
const MachineRegisterInfo *MRI;		const MachineRegisterInfo *MRI;
AMDGPU::IsaInfo::IsaVersion ISA;		AMDGPU::IsaInfo::IsaVersion ISA;

/// \brief Constant zero value		/// \brief Constant zero value
static const Counters ZeroCounts;		static const Counters ZeroCounts;

void SIInsertWaits::pushInstruction(MachineBasicBlock &MBB,
}		}

// If we don't increase anything then that's it		// If we don't increase anything then that's it
if (Sum == 0) {		if (Sum == 0) {
LastOpcodeType = OTHER;		LastOpcodeType = OTHER;
return;		return;
}		}

if (ST->getGeneration() >= SISubtarget::VOLCANIC_ISLANDS) {		if (ST->getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS) {
// Any occurrence of consecutive VMEM or SMEM instructions forms a VMEM		// Any occurrence of consecutive VMEM or SMEM instructions forms a VMEM
// or SMEM clause, respectively.		// or SMEM clause, respectively.
//		//
// The temporary workaround is to break the clauses with S_NOP.		// The temporary workaround is to break the clauses with S_NOP.
//		//
// The proper solution would be to allocate registers such that all source		// The proper solution would be to allocate registers such that all source
// and destination registers don't overlap, e.g. this is illegal:		// and destination registers don't overlap, e.g. this is illegal:
// r0 = load r2		// r0 = load r2
for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
}		}
}		}

return Result;		return Result;
}		}

void SIInsertWaits::handleSendMsg(MachineBasicBlock &MBB,		void SIInsertWaits::handleSendMsg(MachineBasicBlock &MBB,
MachineBasicBlock::iterator I) {		MachineBasicBlock::iterator I) {
if (ST->getGeneration() < SISubtarget::VOLCANIC_ISLANDS)		if (ST->getGeneration() < AMDGPUSubtarget::VOLCANIC_ISLANDS)
return;		return;

// There must be "S_NOP 0" between an instruction writing M0 and S_SENDMSG.		// There must be "S_NOP 0" between an instruction writing M0 and S_SENDMSG.
if (LastInstWritesM0 && (I->getOpcode() == AMDGPU::S_SENDMSG || I->getOpcode() == AMDGPU::S_SENDMSGHALT)) {		if (LastInstWritesM0 && (I->getOpcode() == AMDGPU::S_SENDMSG || I->getOpcode() == AMDGPU::S_SENDMSGHALT)) {
BuildMI(MBB, I, DebugLoc(), TII->get(AMDGPU::S_NOP)).addImm(0);		BuildMI(MBB, I, DebugLoc(), TII->get(AMDGPU::S_NOP)).addImm(0);
LastInstWritesM0 = false;		LastInstWritesM0 = false;
return;		return;
}		}
static bool hasTrivialSuccessor(const MachineBasicBlock &MBB) {
return (Succ->pred_size() == 1) && MBB.isLayoutSuccessor(Succ);		return (Succ->pred_size() == 1) && MBB.isLayoutSuccessor(Succ);
}		}

// FIXME: Insert waits listed in Table 4.2 "Required User-Inserted Wait States"		// FIXME: Insert waits listed in Table 4.2 "Required User-Inserted Wait States"
// around other non-memory instructions.		// around other non-memory instructions.
bool SIInsertWaits::runOnMachineFunction(MachineFunction &MF) {		bool SIInsertWaits::runOnMachineFunction(MachineFunction &MF) {
bool Changes = false;		bool Changes = false;

ST = &MF.getSubtarget<SISubtarget>();		ST = &MF.getSubtarget<AMDGPUSubtarget>();
TII = ST->getInstrInfo();		TII = ST->getInstrInfo();
TRI = &TII->getRegisterInfo();		TRI = &TII->getRegisterInfo();
MRI = &MF.getRegInfo();		MRI = &MF.getRegInfo();
ISA = AMDGPU::IsaInfo::getIsaVersion(ST->getFeatureBits());		ISA = AMDGPU::IsaInfo::getIsaVersion(ST->getFeatureBits());
const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();

HardwareLimits.Named.VM = AMDGPU::getVmcntBitMask(ISA);		HardwareLimits.Named.VM = AMDGPU::getVmcntBitMask(ISA);
HardwareLimits.Named.EXP = AMDGPU::getExpcntBitMask(ISA);		HardwareLimits.Named.EXP = AMDGPU::getExpcntBitMask(ISA);
for (MachineFunction::iterator BI = MF.begin(), BE = MF.end();
BI != BE; ++BI) {		BI != BE; ++BI) {
MachineBasicBlock &MBB = *BI;		MachineBasicBlock &MBB = *BI;

for (MachineBasicBlock::iterator I = MBB.begin(), E = MBB.end();		for (MachineBasicBlock::iterator I = MBB.begin(), E = MBB.end();
I != E; ++I) {		I != E; ++I) {
if (!HaveScalarStores && TII->isScalarStore(*I))		if (!HaveScalarStores && TII->isScalarStore(*I))
HaveScalarStores = true;		HaveScalarStores = true;

if (ST->getGeneration() <= SISubtarget::SEA_ISLANDS) {		if (ST->getGeneration() <= AMDGPUSubtarget::SEA_ISLANDS) {
// There is a hardware bug on CI/SI where SMRD instruction may corrupt		// There is a hardware bug on CI/SI where SMRD instruction may corrupt
// vccz bit, so when we detect that an instruction may read from a		// vccz bit, so when we detect that an instruction may read from a
// corrupt vccz bit, we need to:		// corrupt vccz bit, we need to:
// 1. Insert s_waitcnt lgkm(0) to wait for all outstanding SMRD operations to		// 1. Insert s_waitcnt lgkm(0) to wait for all outstanding SMRD operations to
// complete.		// complete.
// 2. Restore the correct value of vccz by writing the current value		// 2. Restore the correct value of vccz by writing the current value
// of vcc back to vcc.		// of vcc back to vcc.

lib/Target/AMDGPU/SIInstrFormats.td

def isGCN : Predicate<"Subtarget->getGeneration() "
AssemblerPredicate<"FeatureGCN">;		AssemblerPredicate<"FeatureGCN">;
def isSI : Predicate<"Subtarget->getGeneration() "		def isSI : Predicate<"Subtarget->getGeneration() "
"== SISubtarget::SOUTHERN_ISLANDS">,		"== SISubtarget::SOUTHERN_ISLANDS">,
AssemblerPredicate<"FeatureSouthernIslands">;		AssemblerPredicate<"FeatureSouthernIslands">;


class InstSI <dag outs, dag ins, string asm = "",		class InstSI <dag outs, dag ins, string asm = "",
list<dag> pattern = []> :		list<dag> pattern = []> :
AMDGPUInst<outs, ins, asm, pattern>, PredicateControl {		AMDGPUInst<outs, ins, asm, pattern>, GCNPredicateControl {
let SubtargetPredicate = isGCN;		let SubtargetPredicate = isGCN;

// Low bits - basic encoding information.		// Low bits - basic encoding information.
field bit SALU = 0;		field bit SALU = 0;
field bit VALU = 0;		field bit VALU = 0;

// SALU instruction formats.		// SALU instruction formats.
field bit SOP1 = 0;		field bit SOP1 = 0;

lib/Target/AMDGPU/SIInstrInfo.h

#include "llvm/CodeGen/MachineInstr.h"		#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"		#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineOperand.h"		#include "llvm/CodeGen/MachineOperand.h"
#include "llvm/MC/MCInstrDesc.h"		#include "llvm/MC/MCInstrDesc.h"
#include "llvm/Support/Compiler.h"		#include "llvm/Support/Compiler.h"
#include <cassert>		#include <cassert>
#include <cstdint>		#include <cstdint>

		#define GET_INSTRINFO_HEADER
		#include "AMDGPUGenInstrInfo.inc"

namespace llvm {		namespace llvm {

class APInt;		class APInt;
class MachineRegisterInfo;		class MachineRegisterInfo;
class RegScavenger;		class RegScavenger;
class SISubtarget;		class SISubtarget;
class TargetRegisterClass;		class TargetRegisterClass;

class SIInstrInfo final : public AMDGPUInstrInfo {		class SIInstrInfo final : public AMDGPUGenInstrInfo {
private:		private:
const SIRegisterInfo RI;		const SIRegisterInfo RI;
const SISubtarget &ST;		const SISubtarget &ST;

// The inverse predicate should have the negative value.		// The inverse predicate should have the negative value.
enum BranchPredicate {		enum BranchPredicate {
INVALID_BR = 0,		INVALID_BR = 0,
SCC_TRUE = 1,		SCC_TRUE = 1,
bool areLoadsFromSameBasePtr(SDNode *Load1, SDNode *Load2,
int64_t &Offset2) const override;		int64_t &Offset2) const override;

bool getMemOpBaseRegImmOfs(MachineInstr &LdSt, unsigned &BaseReg,		bool getMemOpBaseRegImmOfs(MachineInstr &LdSt, unsigned &BaseReg,
int64_t &Offset,		int64_t &Offset,
const TargetRegisterInfo *TRI) const final;		const TargetRegisterInfo *TRI) const final;

bool shouldClusterMemOps(MachineInstr &FirstLdSt, unsigned BaseReg1,		bool shouldClusterMemOps(MachineInstr &FirstLdSt, unsigned BaseReg1,
MachineInstr &SecondLdSt, unsigned BaseReg2,		MachineInstr &SecondLdSt, unsigned BaseReg2,
unsigned NumLoads) const final;		unsigned NumLoads) const override;

		bool shouldScheduleLoadsNear(SDNode *Load0, SDNode *Load1, int64_t Offset0,
		int64_t Offset1, unsigned NumLoads) const override;

void copyPhysReg(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,		void copyPhysReg(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
const DebugLoc &DL, unsigned DestReg, unsigned SrcReg,		const DebugLoc &DL, unsigned DestReg, unsigned SrcReg,
bool KillSrc) const override;		bool KillSrc) const override;

unsigned calculateLDSSpillAddress(MachineBasicBlock &MBB, MachineInstr &MI,		unsigned calculateLDSSpillAddress(MachineBasicBlock &MBB, MachineInstr &MI,
RegScavenger *RS, unsigned TmpReg,		RegScavenger *RS, unsigned TmpReg,
unsigned Offset, unsigned Size) const;		unsigned Offset, unsigned Size) const;
MachineInstrBuilder getAddNoCarry(MachineBasicBlock &MBB,
unsigned DestReg) const;		unsigned DestReg) const;

static bool isKillTerminator(unsigned Opcode);		static bool isKillTerminator(unsigned Opcode);
const MCInstrDesc &getKillTerminatorFromPseudo(unsigned Opcode) const;		const MCInstrDesc &getKillTerminatorFromPseudo(unsigned Opcode) const;

static bool isLegalMUBUFImmOffset(unsigned Imm) {		static bool isLegalMUBUFImmOffset(unsigned Imm) {
return isUInt<12>(Imm);		return isUInt<12>(Imm);
}		}

		/// \brief Return a target-specific opcode if Opcode is a pseudo instruction.
		/// Return -1 if the target-specific opcode for the pseudo instruction does
		/// not exist. If Opcode is not a pseudo instruction, this is identity.
		int pseudoToMCOpcode(int Opcode) const;

};		};

namespace AMDGPU {		namespace AMDGPU {

LLVM_READONLY		LLVM_READONLY
int getVOPe64(uint16_t Opcode);		int getVOPe64(uint16_t Opcode);

LLVM_READONLY		LLVM_READONLY

lib/Target/AMDGPU/SIInstrInfo.cpp

//===- SIInstrInfo.cpp - SI Instruction Information ----------------------===//		//===- SIInstrInfo.cpp - SI Instruction Information ----------------------===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
/// \file		/// \file
/// \brief SI Implementation of TargetInstrInfo.		/// \brief SI Implementation of TargetInstrInfo.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "SIInstrInfo.h"		#include "SIInstrInfo.h"
#include "AMDGPU.h"		#include "AMDGPU.h"
		#include "AMDGPUIntrinsicInfo.h"
#include "AMDGPUSubtarget.h"		#include "AMDGPUSubtarget.h"
#include "GCNHazardRecognizer.h"		#include "GCNHazardRecognizer.h"
#include "SIDefines.h"		#include "SIDefines.h"
#include "SIMachineFunctionInfo.h"		#include "SIMachineFunctionInfo.h"
#include "SIRegisterInfo.h"		#include "SIRegisterInfo.h"
#include "MCTargetDesc/AMDGPUMCTargetDesc.h"		#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
#include "Utils/AMDGPUBaseInfo.h"		#include "Utils/AMDGPUBaseInfo.h"
#include "llvm/ADT/APInt.h"		#include "llvm/ADT/APInt.h"
#include "llvm/Target/TargetMachine.h"		#include "llvm/Target/TargetMachine.h"
#include <cassert>		#include <cassert>
#include <cstdint>		#include <cstdint>
#include <iterator>		#include <iterator>
#include <utility>		#include <utility>

using namespace llvm;		using namespace llvm;

		#define GET_INSTRINFO_CTOR_DTOR
		#include "AMDGPUGenInstrInfo.inc"

		namespace llvm {
		namespace AMDGPU {
		#define GET_RSRCINTRINSIC_IMPL
		#include "AMDGPUGenSearchableTables.inc"

		#define GET_D16IMAGEDIMINTRINSIC_IMPL
		#include "AMDGPUGenSearchableTables.inc"
		}
		}


// Must be at least 4 to be able to branch over minimum unconditional branch		// Must be at least 4 to be able to branch over minimum unconditional branch
// code. This is only for making it possible to write reasonably small tests for		// code. This is only for making it possible to write reasonably small tests for
// long branches.		// long branches.
static cl::opt<unsigned>		static cl::opt<unsigned>
BranchOffsetBits("amdgpu-s-branch-bits", cl::ReallyHidden, cl::init(16),		BranchOffsetBits("amdgpu-s-branch-bits", cl::ReallyHidden, cl::init(16),
cl::desc("Restrict range of branch instructions (DEBUG)"));		cl::desc("Restrict range of branch instructions (DEBUG)"));

SIInstrInfo::SIInstrInfo(const SISubtarget &ST)		SIInstrInfo::SIInstrInfo(const SISubtarget &ST)
: AMDGPUInstrInfo(ST), RI(ST), ST(ST) {}		: AMDGPUGenInstrInfo(AMDGPU::ADJCALLSTACKUP, AMDGPU::ADJCALLSTACKDOWN),
		RI(ST), ST(ST) {}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// TargetInstrInfo callbacks		// TargetInstrInfo callbacks
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

static unsigned getNumOperandsNoGlue(SDNode *Node) {		static unsigned getNumOperandsNoGlue(SDNode *Node) {
unsigned N = Node->getNumOperands();		unsigned N = Node->getNumOperands();
while (N && Node->getOperand(N - 1).getValueType() == MVT::Glue)		while (N && Node->getOperand(N - 1).getValueType() == MVT::Glue)
bool SIInstrInfo::shouldClusterMemOps(MachineInstr &FirstLdSt,

const MachineRegisterInfo &MRI =		const MachineRegisterInfo &MRI =
FirstLdSt.getParent()->getParent()->getRegInfo();		FirstLdSt.getParent()->getParent()->getRegInfo();
const TargetRegisterClass *DstRC = MRI.getRegClass(FirstDst->getReg());		const TargetRegisterClass *DstRC = MRI.getRegClass(FirstDst->getReg());

return (NumLoads * (RI.getRegSizeInBits(*DstRC) / 8)) <= LoadClusterThreshold;		return (NumLoads * (RI.getRegSizeInBits(*DstRC) / 8)) <= LoadClusterThreshold;
}		}

		// FIXME: This behaves strangely. If, for example, you have 32 load + stores,
		// the first 16 loads will be interleaved with the stores, and the next 16 will
		// be clustered as expected. It should really split into 2 16 store batches.
		//
		// Loads are clustered until this returns false, rather than trying to schedule
		// groups of stores. This also means we have to deal with saying different
		// address space loads should be clustered, and ones which might cause bank
		// conflicts.
		//
		// This might be deprecated so it might not be worth that much effort to fix.
		bool SIInstrInfo::shouldScheduleLoadsNear(SDNode *Load0, SDNode *Load1,
		int64_t Offset0, int64_t Offset1,
		unsigned NumLoads) const {
		assert(Offset1 > Offset0 &&
		"Second offset should be larger than first offset!");
		// If we have less than 16 loads in a row, and the offsets are within 64
		// bytes, then schedule together.

		// A cacheline is 64 bytes (for global memory).
		return (NumLoads <= 16 && (Offset1 - Offset0) < 64);
		}

static void reportIllegalCopy(const SIInstrInfo *TII, MachineBasicBlock &MBB,		static void reportIllegalCopy(const SIInstrInfo *TII, MachineBasicBlock &MBB,
MachineBasicBlock::iterator MI,		MachineBasicBlock::iterator MI,
const DebugLoc &DL, unsigned DestReg,		const DebugLoc &DL, unsigned DestReg,
unsigned SrcReg, bool KillSrc) {		unsigned SrcReg, bool KillSrc) {
MachineFunction *MF = MBB.getParent();		MachineFunction *MF = MBB.getParent();
DiagnosticInfoUnsupported IllegalCopy(MF->getFunction(),		DiagnosticInfoUnsupported IllegalCopy(MF->getFunction(),
"illegal SGPR to VGPR copy",		"illegal SGPR to VGPR copy",
DL, DS_Error);		DL, DS_Error);
}		}

/// \param @Offset Offset in bytes of the FrameIndex being spilled		/// \param @Offset Offset in bytes of the FrameIndex being spilled
unsigned SIInstrInfo::calculateLDSSpillAddress(		unsigned SIInstrInfo::calculateLDSSpillAddress(
MachineBasicBlock &MBB, MachineInstr &MI, RegScavenger *RS, unsigned TmpReg,		MachineBasicBlock &MBB, MachineInstr &MI, RegScavenger *RS, unsigned TmpReg,
unsigned FrameOffset, unsigned Size) const {		unsigned FrameOffset, unsigned Size) const {
MachineFunction *MF = MBB.getParent();		MachineFunction *MF = MBB.getParent();
SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>();		SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>();
const SISubtarget &ST = MF->getSubtarget<SISubtarget>();		const AMDGPUSubtarget &ST = MF->getSubtarget<AMDGPUSubtarget>();
DebugLoc DL = MBB.findDebugLoc(MI);		DebugLoc DL = MBB.findDebugLoc(MI);
unsigned WorkGroupSize = MFI->getMaxFlatWorkGroupSize();		unsigned WorkGroupSize = MFI->getMaxFlatWorkGroupSize();
unsigned WavefrontSize = ST.getWavefrontSize();		unsigned WavefrontSize = ST.getWavefrontSize();

unsigned TIDReg = MFI->getTIDReg();		unsigned TIDReg = MFI->getTIDReg();
if (!MFI->hasCalculatedTID()) {		if (!MFI->hasCalculatedTID()) {
MachineBasicBlock &Entry = MBB.getParent()->front();		MachineBasicBlock &Entry = MBB.getParent()->front();
MachineBasicBlock::iterator Insert = Entry.front();		MachineBasicBlock::iterator Insert = Entry.front();
case AMDGPU::S_NOP:
return MI.getOperand(0).getImm() + 1;		return MI.getOperand(0).getImm() + 1;
}		}
}		}

bool SIInstrInfo::expandPostRAPseudo(MachineInstr &MI) const {		bool SIInstrInfo::expandPostRAPseudo(MachineInstr &MI) const {
MachineBasicBlock &MBB = *MI.getParent();		MachineBasicBlock &MBB = *MI.getParent();
DebugLoc DL = MBB.findDebugLoc(MI);		DebugLoc DL = MBB.findDebugLoc(MI);
switch (MI.getOpcode()) {		switch (MI.getOpcode()) {
default: return AMDGPUInstrInfo::expandPostRAPseudo(MI);		default: return TargetInstrInfo::expandPostRAPseudo(MI);
case AMDGPU::S_MOV_B64_term:		case AMDGPU::S_MOV_B64_term:
// This is only a terminator to get the correct spill code placement during		// This is only a terminator to get the correct spill code placement during
// register allocation.		// register allocation.
MI.setDesc(get(AMDGPU::S_MOV_B64));		MI.setDesc(get(AMDGPU::S_MOV_B64));
break;		break;

case AMDGPU::S_XOR_B64_term:		case AMDGPU::S_XOR_B64_term:
// This is only a terminator to get the correct spill code placement during		// This is only a terminator to get the correct spill code placement during
bool SIInstrInfo::isFoldableCopy(const MachineInstr &MI) const {
}		}
}		}

unsigned SIInstrInfo::getAddressSpaceForPseudoSourceKind(		unsigned SIInstrInfo::getAddressSpaceForPseudoSourceKind(
PseudoSourceValue::PSVKind Kind) const {		PseudoSourceValue::PSVKind Kind) const {
switch(Kind) {		switch(Kind) {
case PseudoSourceValue::Stack:		case PseudoSourceValue::Stack:
case PseudoSourceValue::FixedStack:		case PseudoSourceValue::FixedStack:
return AMDGPUASI.PRIVATE_ADDRESS;		return ST.getAMDGPUAS().PRIVATE_ADDRESS;
case PseudoSourceValue::ConstantPool:		case PseudoSourceValue::ConstantPool:
case PseudoSourceValue::GOT:		case PseudoSourceValue::GOT:
case PseudoSourceValue::JumpTable:		case PseudoSourceValue::JumpTable:
case PseudoSourceValue::GlobalValueCallEntry:		case PseudoSourceValue::GlobalValueCallEntry:
case PseudoSourceValue::ExternalSymbolCallEntry:		case PseudoSourceValue::ExternalSymbolCallEntry:
case PseudoSourceValue::TargetCustom:		case PseudoSourceValue::TargetCustom:
return AMDGPUASI.CONSTANT_ADDRESS;		return ST.getAMDGPUAS().CONSTANT_ADDRESS;
}		}
return AMDGPUASI.FLAT_ADDRESS;		return ST.getAMDGPUAS().FLAT_ADDRESS;
}		}

static void removeModOperands(MachineInstr &MI) {		static void removeModOperands(MachineInstr &MI) {
unsigned Opc = MI.getOpcode();		unsigned Opc = MI.getOpcode();
int Src0ModIdx = AMDGPU::getNamedOperandIdx(Opc,		int Src0ModIdx = AMDGPU::getNamedOperandIdx(Opc,
AMDGPU::OpName::src0_modifiers);		AMDGPU::OpName::src0_modifiers);
int Src1ModIdx = AMDGPU::getNamedOperandIdx(Opc,		int Src1ModIdx = AMDGPU::getNamedOperandIdx(Opc,
AMDGPU::OpName::src1_modifiers);		AMDGPU::OpName::src1_modifiers);

unsigned SIInstrInfo::isStackAccess(const MachineInstr &MI,		unsigned SIInstrInfo::isStackAccess(const MachineInstr &MI,
int &FrameIndex) const {		int &FrameIndex) const {
const MachineOperand *Addr = getNamedOperand(MI, AMDGPU::OpName::vaddr);		const MachineOperand *Addr = getNamedOperand(MI, AMDGPU::OpName::vaddr);
if (!Addr || !Addr->isFI())		if (!Addr || !Addr->isFI())
return AMDGPU::NoRegister;		return AMDGPU::NoRegister;

assert(!MI.memoperands_empty() &&		assert(!MI.memoperands_empty() &&
(*MI.memoperands_begin())->getAddrSpace() == AMDGPUASI.PRIVATE_ADDRESS);		(*MI.memoperands_begin())->getAddrSpace() == ST.getAMDGPUAS().PRIVATE_ADDRESS);

FrameIndex = Addr->getIndex();		FrameIndex = Addr->getIndex();
return getNamedOperand(MI, AMDGPU::OpName::vdata)->getReg();		return getNamedOperand(MI, AMDGPU::OpName::vdata)->getReg();
}		}

unsigned SIInstrInfo::isSGPRStackAccess(const MachineInstr &MI,		unsigned SIInstrInfo::isSGPRStackAccess(const MachineInstr &MI,
int &FrameIndex) const {		int &FrameIndex) const {
const MachineOperand *Addr = getNamedOperand(MI, AMDGPU::OpName::addr);		const MachineOperand *Addr = getNamedOperand(MI, AMDGPU::OpName::addr);
bool SIInstrInfo::mayAccessFlatAddressSpace(const MachineInstr &MI) const {		bool SIInstrInfo::mayAccessFlatAddressSpace(const MachineInstr &MI) const {
if (!isFLAT(MI))		if (!isFLAT(MI))
return false;		return false;

if (MI.memoperands_empty())		if (MI.memoperands_empty())
return true;		return true;

for (const MachineMemOperand *MMO : MI.memoperands()) {		for (const MachineMemOperand *MMO : MI.memoperands()) {
if (MMO->getAddrSpace() == AMDGPUASI.FLAT_ADDRESS)		if (MMO->getAddrSpace() == ST.getAMDGPUAS().FLAT_ADDRESS)
return true;		return true;
}		}
return false;		return false;
}		}

bool SIInstrInfo::isNonUniformBranchInstr(MachineInstr &Branch) const {		bool SIInstrInfo::isNonUniformBranchInstr(MachineInstr &Branch) const {
return Branch.getOpcode() == AMDGPU::SI_NON_UNIFORM_BRCOND_PSEUDO;		return Branch.getOpcode() == AMDGPU::SI_NON_UNIFORM_BRCOND_PSEUDO;
}		}
bool SIInstrInfo::isBufferSMRD(const MachineInstr &MI) const {
// Check that it is using a buffer resource.		// Check that it is using a buffer resource.
int Idx = AMDGPU::getNamedOperandIdx(MI.getOpcode(), AMDGPU::OpName::sbase);		int Idx = AMDGPU::getNamedOperandIdx(MI.getOpcode(), AMDGPU::OpName::sbase);
if (Idx == -1) // e.g. s_memtime		if (Idx == -1) // e.g. s_memtime
return false;		return false;

const auto RCID = MI.getDesc().OpInfo[Idx].RegClass;		const auto RCID = MI.getDesc().OpInfo[Idx].RegClass;
return RCID == AMDGPU::SReg_128RegClassID;		return RCID == AMDGPU::SReg_128RegClassID;
}		}

		// This must be kept in sync with the SIEncodingFamily class in SIInstrInfo.td
		enum SIEncodingFamily {
		SI = 0,
		VI = 1,
		SDWA = 2,
		SDWA9 = 3,
		GFX80 = 4,
		GFX9 = 5
		};

		static SIEncodingFamily subtargetEncodingFamily(const SISubtarget &ST) {
		switch (ST.getGeneration()) {
		case SISubtarget::SOUTHERN_ISLANDS:
		case SISubtarget::SEA_ISLANDS:
		return SIEncodingFamily::SI;
		case SISubtarget::VOLCANIC_ISLANDS:
		case SISubtarget::GFX9:
		return SIEncodingFamily::VI;
		}
		llvm_unreachable("Unknown subtarget generation!");
		}

		int SIInstrInfo::pseudoToMCOpcode(int Opcode) const {
		SIEncodingFamily Gen = subtargetEncodingFamily(ST);

		if ((get(Opcode).TSFlags & SIInstrFlags::renamedInGFX9) != 0 &&
		ST.getGeneration() >= SISubtarget::GFX9)
		Gen = SIEncodingFamily::GFX9;

		if (get(Opcode).TSFlags & SIInstrFlags::SDWA)
		Gen = ST.getGeneration() == SISubtarget::GFX9 ? SIEncodingFamily::SDWA9
		: SIEncodingFamily::SDWA;
		// Adjust the encoding family to GFX80 for D16 buffer instructions when the
		// subtarget has UnpackedD16VMem feature.
		// TODO: remove this when we discard GFX80 encoding.
		if (ST.hasUnpackedD16VMem() && (get(Opcode).TSFlags & SIInstrFlags::D16)
		&& !(get(Opcode).TSFlags & SIInstrFlags::MIMG))
		Gen = SIEncodingFamily::GFX80;

		int MCOp = AMDGPU::getMCOpcode(Opcode, Gen);

		// -1 means that Opcode is already a native instruction.
		if (MCOp == -1)
		return Opcode;

		// (uint16_t)-1 means that Opcode is a pseudo instruction that has
		// no encoding in the given subtarget generation.
		if (MCOp == (uint16_t)-1)
		return -1;

		return MCOp;
		}

lib/Target/AMDGPU/SIInstrInfo.td

def isCIOnly : Predicate<"Subtarget->getGeneration() =="
"SISubtarget::SEA_ISLANDS">,		"SISubtarget::SEA_ISLANDS">,
AssemblerPredicate <"FeatureSeaIslands">;		AssemblerPredicate <"FeatureSeaIslands">;
def isVIOnly : Predicate<"Subtarget->getGeneration() =="		def isVIOnly : Predicate<"Subtarget->getGeneration() =="
"SISubtarget::VOLCANIC_ISLANDS">,		"SISubtarget::VOLCANIC_ISLANDS">,
AssemblerPredicate <"FeatureVolcanicIslands">;		AssemblerPredicate <"FeatureVolcanicIslands">;

def DisableInst : Predicate <"false">, AssemblerPredicate<"FeatureDisable">;		def DisableInst : Predicate <"false">, AssemblerPredicate<"FeatureDisable">;

		class GCNPredicateControl : PredicateControl {
		Predicate SIAssemblerPredicate = isSICI;
		Predicate VIAssemblerPredicate = isVI;
		}

// Except for the NONE field, this must be kept in sync with the		// Except for the NONE field, this must be kept in sync with the
// SIEncodingFamily enum in AMDGPUInstrInfo.cpp		// SIEncodingFamily enum in AMDGPUInstrInfo.cpp
def SIEncodingFamily {		def SIEncodingFamily {
int NONE = -1;		int NONE = -1;
int SI = 0;		int SI = 0;
int VI = 1;		int VI = 1;
int SDWA = 2;		int SDWA = 2;
int SDWA9 = 3;		int SDWA9 = 3;

lib/Target/AMDGPU/SIInstructions.td

	//===-- SIInstructions.td - SI Instruction Definitions --------------------===//			//===-- SIInstructions.td - SI Instruction Definitions --------------------===//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// This file was originally auto-generated from a GPU register header file and			// This file was originally auto-generated from a GPU register header file and
	// all the instruction definitions were originally commented out. Instructions			// all the instruction definitions were originally commented out. Instructions
	// that are not yet supported remain commented out.			// that are not yet supported remain commented out.
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	class GCNPat<dag pattern, dag result> : AMDGPUPat<pattern, result> {			class GCNPat<dag pattern, dag result> : Pat<pattern, result>, GCNPredicateControl {
	let SubtargetPredicate = isGCN;			let SubtargetPredicate = isGCN;
	}			}


	include "VOPInstructions.td"			include "VOPInstructions.td"
	include "SOPInstructions.td"			include "SOPInstructions.td"
	include "SMInstructions.td"			include "SMInstructions.td"
	include "FLATInstructions.td"			include "FLATInstructions.td"
	include "BUFInstructions.td"			include "BUFInstructions.td"

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// EXP Instructions			// EXP Instructions

lib/Target/AMDGPU/SIRegisterInfo.h

	#define LLVM_LIB_TARGET_AMDGPU_SIREGISTERINFO_H			#define LLVM_LIB_TARGET_AMDGPU_SIREGISTERINFO_H

	#include "AMDGPURegisterInfo.h"			#include "AMDGPURegisterInfo.h"
	#include "SIDefines.h"			#include "SIDefines.h"
	#include "llvm/CodeGen/MachineRegisterInfo.h"			#include "llvm/CodeGen/MachineRegisterInfo.h"

	namespace llvm {			namespace llvm {

				class AMDGPUSubtarget;
	class LiveIntervals;			class LiveIntervals;
	class MachineRegisterInfo;			class MachineRegisterInfo;
	class SISubtarget;			class SISubtarget;
	class SIMachineFunctionInfo;			class SIMachineFunctionInfo;

	class SIRegisterInfo final : public AMDGPURegisterInfo {			class SIRegisterInfo final : public AMDGPURegisterInfo {
	private:			private:
	unsigned SGPRSetID;			unsigned SGPRSetID;

lib/Target/AMDGPU/SIRegisterInfo.cpp

static const TargetRegisterClass *const BaseClasses[] = {
&AMDGPU::VReg_96RegClass,		&AMDGPU::VReg_96RegClass,
&AMDGPU::VReg_128RegClass,		&AMDGPU::VReg_128RegClass,
&AMDGPU::SReg_128RegClass,		&AMDGPU::SReg_128RegClass,
&AMDGPU::VReg_256RegClass,		&AMDGPU::VReg_256RegClass,
&AMDGPU::SReg_256RegClass,		&AMDGPU::SReg_256RegClass,
&AMDGPU::VReg_512RegClass,		&AMDGPU::VReg_512RegClass,
&AMDGPU::SReg_512RegClass,		&AMDGPU::SReg_512RegClass,
&AMDGPU::SCC_CLASSRegClass,		&AMDGPU::SCC_CLASSRegClass,
&AMDGPU::R600_Reg32RegClass,
&AMDGPU::R600_PredicateRegClass,
&AMDGPU::Pseudo_SReg_32RegClass,		&AMDGPU::Pseudo_SReg_32RegClass,
&AMDGPU::Pseudo_SReg_128RegClass,		&AMDGPU::Pseudo_SReg_128RegClass,
};		};

for (const TargetRegisterClass *BaseClass : BaseClasses) {		for (const TargetRegisterClass *BaseClass : BaseClasses) {
if (BaseClass->contains(Reg)) {		if (BaseClass->contains(Reg)) {
return BaseClass;		return BaseClass;
}		}

lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h

unsigned getMaxWavesPerCU(const FeatureBitset &Features,
unsigned FlatWorkGroupSize);		unsigned FlatWorkGroupSize);

/// \returns Minimum number of waves per execution unit for given subtarget \p		/// \returns Minimum number of waves per execution unit for given subtarget \p
/// Features.		/// Features.
unsigned getMinWavesPerEU(const FeatureBitset &Features);		unsigned getMinWavesPerEU(const FeatureBitset &Features);

/// \returns Maximum number of waves per execution unit for given subtarget \p		/// \returns Maximum number of waves per execution unit for given subtarget \p
/// Features without any kind of limitation.		/// Features without any kind of limitation.
unsigned getMaxWavesPerEU(const FeatureBitset &Features);		unsigned getMaxWavesPerEU();

/// \returns Maximum number of waves per execution unit for given subtarget \p		/// \returns Maximum number of waves per execution unit for given subtarget \p
/// Features and limited by given \p FlatWorkGroupSize.		/// Features and limited by given \p FlatWorkGroupSize.
unsigned getMaxWavesPerEU(const FeatureBitset &Features,		unsigned getMaxWavesPerEU(const FeatureBitset &Features,
unsigned FlatWorkGroupSize);		unsigned FlatWorkGroupSize);

/// \returns Minimum flat work group size for given subtarget \p Features.		/// \returns Minimum flat work group size for given subtarget \p Features.
unsigned getMinFlatWorkGroupSize(const FeatureBitset &Features);		unsigned getMinFlatWorkGroupSize(const FeatureBitset &Features);

lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp

IsaVersion getIsaVersion(const FeatureBitset &Features) {
// GCN GFX9.		// GCN GFX9.
if (Features.test(FeatureISAVersion9_0_0))		if (Features.test(FeatureISAVersion9_0_0))
return {9, 0, 0};		return {9, 0, 0};
if (Features.test(FeatureISAVersion9_0_2))		if (Features.test(FeatureISAVersion9_0_2))
return {9, 0, 2};		return {9, 0, 2};
if (Features.test(FeatureGFX9))		if (Features.test(FeatureGFX9))
return {9, 0, 0};		return {9, 0, 0};

if (!Features.test(FeatureGCN) || Features.test(FeatureSouthernIslands))		if (Features.test(FeatureSouthernIslands))
return {0, 0, 0};		return {0, 0, 0};
return {7, 0, 0};		return {7, 0, 0};
}		}

void streamIsaVersion(const MCSubtargetInfo *STI, raw_ostream &Stream) {		void streamIsaVersion(const MCSubtargetInfo *STI, raw_ostream &Stream) {
auto TargetTriple = STI->getTargetTriple();		auto TargetTriple = STI->getTargetTriple();
auto ISAVersion = IsaInfo::getIsaVersion(STI->getFeatureBits());		auto ISAVersion = IsaInfo::getIsaVersion(STI->getFeatureBits());

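The getIsaVersion hunk keeps the feature checks ordered from most to least specific and only drops the `!FeatureGCN` guard. A minimal standalone sketch of that control flow follows. It assumes a hypothetical `Feature` enum and uses `std::bitset` in place of LLVM's TableGen-generated `FeatureBitset`; the indices are not the real generated ones.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical feature indices standing in for the TableGen-generated ones.
enum Feature : std::size_t {
  FeatureSouthernIslands,
  FeatureGFX9,
  FeatureISAVersion9_0_0,
  FeatureISAVersion9_0_2,
  NumFeatures
};

using FeatureBits = std::bitset<NumFeatures>;

struct IsaVersion {
  unsigned Major, Minor, Stepping;
};

// Same check order as the new getIsaVersion: the most specific feature is
// tested first and the first match wins. The patch drops the old !FeatureGCN
// test, so this sketch has no "not GCN" case either: any feature set without
// the listed bits falls through to {7, 0, 0}.
IsaVersion getIsaVersionSketch(const FeatureBits &Features) {
  if (Features.test(FeatureISAVersion9_0_0))
    return {9, 0, 0};
  if (Features.test(FeatureISAVersion9_0_2))
    return {9, 0, 2};
  if (Features.test(FeatureGFX9))
    return {9, 0, 0};
  if (Features.test(FeatureSouthernIslands))
    return {0, 0, 0};
  return {7, 0, 0};
}
```

For example, a feature set with only `FeatureGFX9` set yields {9, 0, 0}, and an empty set falls through to {7, 0, 0}.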
@@ 41 lines hidden @@ unsigned getMaxWorkGroupsPerCU(const FeatureBitset &Features,
   unsigned N = getWavesPerWorkGroup(Features, FlatWorkGroupSize);
   if (N == 1)
     return 40;
   N = 40 / N;
   return std::min(N, 16u);
 }
 
 unsigned getMaxWavesPerCU(const FeatureBitset &Features) {
-  return getMaxWavesPerEU(Features) * getEUsPerCU(Features);
+  return getMaxWavesPerEU() * getEUsPerCU(Features);
 }
 
 unsigned getMaxWavesPerCU(const FeatureBitset &Features,
                           unsigned FlatWorkGroupSize) {
   return getWavesPerWorkGroup(Features, FlatWorkGroupSize);
 }
 
 unsigned getMinWavesPerEU(const FeatureBitset &Features) {
   return 1;
 }
 
-unsigned getMaxWavesPerEU(const FeatureBitset &Features) {
-  if (!Features.test(FeatureGCN))
-    return 8;
+unsigned getMaxWavesPerEU() {
   // FIXME: Need to take scratch memory into account.
   return 10;
 }
 
 unsigned getMaxWavesPerEU(const FeatureBitset &Features,
                           unsigned FlatWorkGroupSize) {
   return alignTo(getMaxWavesPerCU(Features, FlatWorkGroupSize),
                  getEUsPerCU(Features)) / getEUsPerCU(Features);
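The occupancy helpers in this hunk are plain integer arithmetic. The sketch below reproduces `getMaxWorkGroupsPerCU` and `getMaxWavesPerEU(Features, FlatWorkGroupSize)` outside LLVM under stated assumptions. It assumes wave64, 4 EUs per CU, and that `getWavesPerWorkGroup` rounds the work-group size up to whole wavefronts. Neither `getWavesPerWorkGroup` nor `getEUsPerCU` is shown in this excerpt, and the helper names below are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Assumed GCN constants for illustration: wave64, 4 EUs (SIMDs) per CU.
constexpr unsigned WavefrontSize = 64;
constexpr unsigned EUsPerCU = 4;

// Rounds Value up to a multiple of Align, as llvm::alignTo does for
// unsigned integers.
constexpr unsigned alignToSketch(unsigned Value, unsigned Align) {
  return (Value + Align - 1) / Align * Align;
}

// Assumed behavior of getWavesPerWorkGroup: one wave per WavefrontSize
// work items, rounded up. A zero-sized work group is treated as invalid.
unsigned wavesPerWorkGroup(unsigned FlatWorkGroupSize) {
  assert(FlatWorkGroupSize != 0);
  return alignToSketch(FlatWorkGroupSize, WavefrontSize) / WavefrontSize;
}

// Mirrors getMaxWorkGroupsPerCU in the diff.
unsigned maxWorkGroupsPerCU(unsigned FlatWorkGroupSize) {
  unsigned N = wavesPerWorkGroup(FlatWorkGroupSize);
  if (N == 1)
    return 40;
  N = 40 / N;
  return std::min(N, 16u);
}

// Mirrors getMaxWavesPerEU(Features, FlatWorkGroupSize): the waves of one
// work group are spread across the EUs, rounding up.
unsigned maxWavesPerEU(unsigned FlatWorkGroupSize) {
  unsigned WavesPerCU = wavesPerWorkGroup(FlatWorkGroupSize);
  return alignToSketch(WavesPerCU, EUsPerCU) / EUsPerCU;
}
```

Under these assumptions, a 256-item work group is 4 waves, so at most min(40 / 4, 16) = 10 such work groups fit per CU. Those 4 waves spread over 4 EUs give 1 wave per EU.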
@@ 39 lines hidden @@ unsigned getAddressableNumSGPRs(const FeatureBitset &Features) {
   if (Version.Major >= 8)
     return 102;
   return 104;
 }
 
 unsigned getMinNumSGPRs(const FeatureBitset &Features, unsigned WavesPerEU) {
   assert(WavesPerEU != 0);
 
-  if (WavesPerEU >= getMaxWavesPerEU(Features))
+  if (WavesPerEU >= getMaxWavesPerEU())
     return 0;
   unsigned MinNumSGPRs =
       alignDown(getTotalNumSGPRs(Features) / (WavesPerEU + 1),
                 getSGPRAllocGranule(Features)) + 1;
   return std::min(MinNumSGPRs, getAddressableNumSGPRs(Features));
 }
 
 unsigned getMaxNumSGPRs(const FeatureBitset &Features, unsigned WavesPerEU,
@@ 23 lines hidden @@
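`getMinNumSGPRs` (and `getMinNumVGPRs`, which has the same shape) returns one more than the largest granule-aligned per-wave budget that would still let WavesPerEU + 1 waves share the register file. The result is capped at the addressable limit. The sketch below replaces the subtarget queries with hypothetical constants: 800 total SGPRs, an allocation granule of 16, 102 addressable SGPRs, and a maximum of 10 waves per EU, loosely modeled on a GFX8 target. Treat these values as assumptions.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-ins for getTotalNumSGPRs, getSGPRAllocGranule,
// getAddressableNumSGPRs and the parameterless getMaxWavesPerEU().
constexpr unsigned TotalNumSGPRs = 800;
constexpr unsigned SGPRAllocGranule = 16;
constexpr unsigned AddressableNumSGPRs = 102;
constexpr unsigned MaxWavesPerEU = 10;

// Rounds Value down to a multiple of Align, as llvm::alignDown does.
constexpr unsigned alignDownSketch(unsigned Value, unsigned Align) {
  return Value / Align * Align;
}

// One more than the largest granule-aligned budget that still lets
// WavesPerEU + 1 waves fit, capped at the addressable register count.
unsigned minNumSGPRs(unsigned WavesPerEU) {
  assert(WavesPerEU != 0);
  if (WavesPerEU >= MaxWavesPerEU)
    return 0; // Already at maximum occupancy; no lower bound needed.
  unsigned MinNumSGPRs =
      alignDownSketch(TotalNumSGPRs / (WavesPerEU + 1), SGPRAllocGranule) + 1;
  return std::min(MinNumSGPRs, AddressableNumSGPRs);
}
```

With these numbers, WavesPerEU = 8 gives alignDown(800 / 9, 16) + 1 = alignDown(88, 16) + 1 = 81. WavesPerEU = 4 gives 161, which the addressable limit caps at 102.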

 unsigned getAddressableNumVGPRs(const FeatureBitset &Features) {
   return getTotalNumVGPRs(Features);
 }
 
 unsigned getMinNumVGPRs(const FeatureBitset &Features, unsigned WavesPerEU) {
   assert(WavesPerEU != 0);
 
-  if (WavesPerEU >= getMaxWavesPerEU(Features))
+  if (WavesPerEU >= getMaxWavesPerEU())
     return 0;
   unsigned MinNumVGPRs =
       alignDown(getTotalNumVGPRs(Features) / (WavesPerEU + 1),
                 getVGPRAllocGranule(Features)) + 1;
   return std::min(MinNumVGPRs, getAddressableNumVGPRs(Features));
 }
 
 unsigned getMaxNumVGPRs(const FeatureBitset &Features, unsigned WavesPerEU) {
@@ 306 lines hidden @@

 #define CASE_CI_VI(node) \
   assert(!isSI(STI)); \
   case node: return isCI(STI) ? node##_ci : node##_vi;
 
 #define CASE_VI_GFX9(node) \
   case node: return isGFX9(STI) ? node##_gfx9 : node##_vi;
 
 unsigned getMCReg(unsigned Reg, const MCSubtargetInfo &STI) {
   MAP_REG2REG
 }
    [inline comment] arsenm (Unsubmitted, Not Done): I would expect this to be a separate function, but not sure where this would go
    [inline comment] tstellar (Author, Unsubmitted, Not Done): We can refactor AMDGPUMCInstLower.cpp so that this can be in its own function. I can work on this as one of the follow on clean ups.
 
 #undef CASE_CI_VI
 #undef CASE_VI_GFX9
 
 #define CASE_CI_VI(node) case node##_ci: case node##_vi: return node;
 #define CASE_VI_GFX9(node) case node##_vi: case node##_gfx9: return node;
 
 unsigned mc2PseudoReg(unsigned Reg) {
@@ 229 lines hidden @@
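The two definitions of `CASE_CI_VI` and `CASE_VI_GFX9` around `MAP_REG2REG` form an X-macro. A single list of register mappings is expanded once to map a pseudo register to its per-generation MC register, then expanded again to map any MC variant back to the pseudo register. The real `MAP_REG2REG` list is outside this excerpt. The sketch below is an illustration of the pattern only. It uses a hypothetical `RegId` enum and a `Gen` selector in place of `MCSubtargetInfo` and `isCI`/`isGFX9`, and it omits the `assert(!isSI(STI))` check.

```cpp
#include <cassert>

// Hypothetical register numbers: each pseudo register is followed by the
// per-generation MC variants, named with the suffixes the CASE_* macros paste.
enum RegId : unsigned {
  FLAT_SCR, FLAT_SCR_ci, FLAT_SCR_vi,
  TTMP0, TTMP0_vi, TTMP0_gfx9
};

enum class Gen { CI, VI, GFX9 };

// The single mapping list, expanded twice with different CASE_* meanings.
#define MAP_REG2REG      \
  switch (Reg) {         \
  default:               \
    return Reg;          \
    CASE_CI_VI(FLAT_SCR) \
    CASE_VI_GFX9(TTMP0)  \
  }

// Pseudo -> MC: choose the variant for the target generation.
#define CASE_CI_VI(node) \
  case node: return G == Gen::CI ? node##_ci : node##_vi;
#define CASE_VI_GFX9(node) \
  case node: return G == Gen::GFX9 ? node##_gfx9 : node##_vi;

unsigned getMCRegSketch(unsigned Reg, Gen G) {
  MAP_REG2REG
}

#undef CASE_CI_VI
#undef CASE_VI_GFX9

// MC -> pseudo: every variant maps back to the shared pseudo register.
#define CASE_CI_VI(node) case node##_ci: case node##_vi: return node;
#define CASE_VI_GFX9(node) case node##_vi: case node##_gfx9: return node;

unsigned mc2PseudoRegSketch(unsigned Reg) {
  MAP_REG2REG
}

#undef CASE_CI_VI
#undef CASE_VI_GFX9
#undef MAP_REG2REG
```

Keeping one list means a new register pair only has to be added in one place for both directions to stay in sync. That is also why moving either direction into its own function, as discussed in the inline comments, would still need access to the shared list.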


Revision Contents

Diff 144924

lib/Target/AMDGPU/AMDGPU.td
lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
lib/Target/AMDGPU/AMDGPUCallingConv.td
lib/Target/AMDGPU/AMDGPUFeatures.td
lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
lib/Target/AMDGPU/AMDGPUISelLowering.h
lib/Target/AMDGPU/AMDGPUISelLowering.cpp
lib/Target/AMDGPU/AMDGPUInstrInfo.h
lib/Target/AMDGPU/AMDGPUInstrInfo.cpp
lib/Target/AMDGPU/AMDGPUInstructions.td
lib/Target/AMDGPU/AMDGPUIntrinsics.td
lib/Target/AMDGPU/AMDGPULowerIntrinsics.cpp
lib/Target/AMDGPU/AMDGPUMCInstLower.h
lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
lib/Target/AMDGPU/AMDGPURegisterInfo.td
lib/Target/AMDGPU/AMDGPUSubtarget.h
lib/Target/AMDGPU/AMDGPUSubtarget.cpp
lib/Target/AMDGPU/AMDGPUTargetMachine.h
lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
lib/Target/AMDGPU/AMDILCFGStructurizer.cpp
lib/Target/AMDGPU/CMakeLists.txt
lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
lib/Target/AMDGPU/EvergreenInstructions.td
lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.h
lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.cpp
lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.h
lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCTargetDesc.cpp
lib/Target/AMDGPU/MCTargetDesc/CMakeLists.txt
lib/Target/AMDGPU/MCTargetDesc/R600MCCodeEmitter.cpp
lib/Target/AMDGPU/MCTargetDesc/R600MCTargetDesc.cpp
lib/Target/AMDGPU/MCTargetDesc/SIMCCodeEmitter.cpp
lib/Target/AMDGPU/Processors.td
lib/Target/AMDGPU/R600.td
lib/Target/AMDGPU/R600ClauseMergePass.cpp
lib/Target/AMDGPU/R600ControlFlowFinalizer.cpp
lib/Target/AMDGPU/R600EmitClauseMarkers.cpp
lib/Target/AMDGPU/R600ExpandSpecialInstrs.cpp
lib/Target/AMDGPU/R600ISelLowering.h
lib/Target/AMDGPU/R600ISelLowering.cpp
lib/Target/AMDGPU/R600InstrFormats.td
lib/Target/AMDGPU/R600InstrInfo.h
lib/Target/AMDGPU/R600InstrInfo.cpp
lib/Target/AMDGPU/R600Instructions.td
lib/Target/AMDGPU/R600IntrinsicInfo.h
lib/Target/AMDGPU/R600IntrinsicInfo.cpp
lib/Target/AMDGPU/R600Intrinsics.td
lib/Target/AMDGPU/R600MachineScheduler.cpp
lib/Target/AMDGPU/R600OptimizeVectorRegisters.cpp
lib/Target/AMDGPU/R600Packetizer.cpp
lib/Target/AMDGPU/R600Processors.td
lib/Target/AMDGPU/R600RegisterInfo.h
lib/Target/AMDGPU/R600RegisterInfo.cpp
lib/Target/AMDGPU/R600RegisterInfo.td
lib/Target/AMDGPU/R700Instructions.td
lib/Target/AMDGPU/SIFoldOperands.cpp
lib/Target/AMDGPU/SIISelLowering.h
lib/Target/AMDGPU/SIISelLowering.cpp
lib/Target/AMDGPU/SIInsertWaitcnts.cpp
lib/Target/AMDGPU/SIInsertWaits.cpp
lib/Target/AMDGPU/SIInstrFormats.td
lib/Target/AMDGPU/SIInstrInfo.h
lib/Target/AMDGPU/SIInstrInfo.cpp
lib/Target/AMDGPU/SIInstrInfo.td
lib/Target/AMDGPU/SIInstructions.td
lib/Target/AMDGPU/SIRegisterInfo.h
lib/Target/AMDGPU/SIRegisterInfo.cpp
lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
