This is an archive of the discontinued LLVM Phabricator instance.

Differential D50633

[AMDGPU] Add new Mode Register pass
Closed, Public

Authored by timcorringham on Aug 13 2018, 6:40 AM.

Details

Reviewers

arsenm
tpr
b-sumner
rampitec
nhaehnle

Group Reviewers

Restricted Project

Summary

A new pass to manage the Mode register.

Currently this just looks at the floating point double precision
rounding requirements, but is intended to be easily extended to
encompass all Mode register settings.

The floating point double precision rounding mode is required by
the 16 bit interpolation instructions.

Diff Detail

Repository

rL LLVM

Build Status

Buildable 25551
Build 25550: arc lint + arc unit

Event Timeline

timcorringham created this revision. Aug 13 2018, 6:40 AM

Herald added subscribers: llvm-commits, t-tye, tpr and 7 others. Aug 13 2018, 6:40 AM

timcorringham added reviewers: arsenm, Restricted Project, tpr. Aug 13 2018, 6:44 AM

arsenm added inline comments. Aug 13 2018, 10:18 AM

lib/Target/AMDGPU/SIInstrFormats.td
124–126	It's not clear to me what this means, since every FP instruction does this
lib/Target/AMDGPU/SIModeRegister.cpp
161–181	I'm not really comfortable inserting something semantically required at this point. Can you do this when the instructions are selected instead?
172	What does this mean exactly by "needs"? Does the instruction fail to function?
225	nullptr
348–349	Since you seem to be relying on inserting these instructions, this is incorrect
368	Typo setregto

timcorringham marked 3 inline comments as done. Aug 13 2018, 12:20 PM

timcorringham added inline comments.

lib/Target/AMDGPU/SIInstrFormats.td
124–126	It is just a way of indicating to this pass that it is an instruction that uses the double precision flag.
lib/Target/AMDGPU/SIModeRegister.cpp
161–181	The problem with doing it during instruction selection is that we end up with many more mode register writes than are strictly required. As the mode register is not modelled as a register, there isn't any way to track the values without a pass to do it. I suppose it is similar to adding nops or waitcnts, which are also done by specific passes.
172	The results can be outside the expected range when other rounding modes are used.

Minor amendments as per review comments.

Harbormaster completed remote builds in B21404: Diff 160427. Aug 13 2018, 12:21 PM

arsenm added inline comments. Aug 15 2018, 2:08 PM

lib/Target/AMDGPU/SIModeRegister.cpp
161–181	This is a semantic property and I think it really belongs in instruction selection. What is the problem with optimizing those out here? What actually changes?
161–181	Or even earlier, a property of the emitted operation

timcorringham added inline comments. Aug 16 2018, 4:25 AM

lib/Target/AMDGPU/SIModeRegister.cpp
161–181	An initial attempt at this functionality did insert the necessary mode register writes at instruction selection, but that resulted in many more changes than were necessary, as the state of the register isn't known at that point. To avoid some of those changes, all instructions that use the mode register would have to be updated to ensure the mode was appropriate, which was considered too invasive. A pass to remove unnecessary setregs would be possible, but it would be very similar to this pass, would still require changes to many instructions to insert the setregs, and would add overhead to all intervening passes. There would also have to be some dependence introduced between the mode register and the instructions that use it to ensure any rescheduling didn't break the code. Overall we thought that this approach was the best compromise. It solves the immediate problem with minimal overhead, and can be extended in a staged manner fairly easily.

Fixes for observed failures:

Corrected which instructions are marked as using the double precision floating point rounding mode flags.

Changed the position where the first setreg in a block is inserted in order to reduce the risk of hitting a hazard that may exist at entry to the first block of a shader.

Herald added a subscriber: jvesely. Oct 31 2018, 1:42 PM

Harbormaster completed remote builds in B24415: Diff 172010. Oct 31 2018, 1:42 PM

You need to model HWREG and add it as an implicit use to affected instructions and as an implicit def/out of a setreg.
That is the only correct way to protect it from rescheduling.

That way you will be able to emit setregs early and let some optimizations happen, including those to minimize setreg calls (since setreg is a big performance hit).

Moreover, you may need to split hwreg into several "registers" to track dependencies on individual bits.

This revision now requires changes to proceed. Oct 31 2018, 2:36 PM

Yes, I think that your suggestion is the correct solution in a perfect world. It is one of the possible approaches that we discussed in our team before implementing the current proposed solution.

The specific issue we are trying to solve is that the three 16 bit interpolation instructions need a non-default rounding mode. These are not yet widely used, but of course we need to ensure they work when they are used.

Modelling the Mode register as a separate register for each field would allow LLVM to track the values and minimise the number of changes required, and having a dependency to that register would avoid any issues with scheduling. To be complete we would need to add something like 14 separate registers corresponding to the fields within the Mode register, and add the dependencies to those to all the instructions that depend on the settings (lots). We would also need a pass to combine changes to separate fields into a single setreg wherever possible that would probably be something similar to the pass we have now. This feels like a rather invasive set of changes. This approach would have the advantage that it would probably also resolve the concerns Matt raised. However, we chose not to adopt this approach as we considered the cost-benefit equation to be too heavy on the cost side.

The approach we have implemented is a compromise that meets our current needs, is extendable for other mode settings should that become necessary, and isn't too invasive. It produces a minimal number of setreg instructions in almost all cases. Running the pass late avoids scheduling issues, but does possibly miss some minor optimization opportunities. However, given the rare occurrence of non-default modes the impact is very small.

Do you think the benefits of the multi-register approach justify the effort required over the current approach?

rampitec added a reviewer: b-sumner. Nov 1 2018, 11:28 AM

One thing we've wanted for compute for quite a while now is a way to request non-default-rounded add, sub, mul, div, fma, and sqrt. Assuming we ever figure out how to represent these in the IR, ideally without falling back on intrinsics, could this approach be used to implement and minimize the mode changes for those as well?

It is not only a few interpolation instructions that need it. We need to implement OpenCL non-default rounding modes for arithmetic instructions. That will require the use of setregs.
In fact I do not see a non-invasive or efficient way to implement OpenCL rounding modes without proper modeling of HWREG and dependencies, because lowering of intrinsics must occur early.
That means we will need to revert any late approach if submitted and reimplement it anyway. I.e. I believe this effort is perfectly justified.

I'm afraid I don't know anything about OpenCL non-default rounding modes - are they set per arithmetic operation or per function? When will these be needed?

These are set per operation. For example one could use a builtin like convert_int4_rte() or convert_float4_rtp() to perform a conversion with a non-default rounding mode.
And yes, we need them.

Actually the conversions don't need non-default-rounded operations, nor are non-default-rounded arithmetic operations required by OpenCL. However, we've had requests to implement functions such as add_rtz(x,y) which computes x+y with round-to-zero rounding. Our competitors offer such functions, and we implemented them for HSAIL. So we are really trying to get back to parity with HSAIL.

As an overall algorithmic remark: I like the organization of the pass into phases, because it provides a path forward to an additional optimization.

The current approach could be described as a straightforward fixed-point iteration over a lattice describing modes at each instruction that uses a mode. This is good and a perfectly fine approach.

However, consider the case of a loop that consistently uses a specific mode setting inside the loop, while code before the loop uses a different mode setting. The current code will introduce an s_setreg_imm32_b32 inside the loop, since the predecessor state at the beginning of the loop header is unknown. Instead, we could add an s_setreg_imm32_b32 at the end of the loop predecessor block.

The organization into separate passes allows a later modification of pass 2 & 3 into something more advanced that can take this possibility into account. I'm not saying that this optimization should necessarily be done with this change, but I just want people to be aware of it; also, it serves as an additional explanation for one of my inline comments :)
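The loop case described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: a single mode field stored as an `int`, with `Unknown` as a sentinel, and the names `Block`, `entryMode`, `propagate`, `needsSetreg` and `SetAtEnd` are hypothetical and do not come from the patch. It shows that a loop block whose predecessors disagree on the exit mode needs its own setreg, while placing a setreg at the end of the loop predecessor makes the entry mode known and removes that need.

```cpp
#include <cassert>
#include <vector>

// Toy model with a single mode field. Unknown means "value not known".
constexpr int Unknown = -1;

struct Block {
  std::vector<int> Preds; // indices of predecessor blocks
  int Require = Unknown;  // mode the block's instructions need, if any
  int SetAtEnd = Unknown; // mode written by a setreg placed at the block's end
  int Exit = Unknown;     // computed: mode known to hold at block exit
};

// The entry mode is known only if every predecessor exits with the same mode.
int entryMode(const std::vector<Block> &Blocks, const Block &B) {
  if (B.Preds.empty())
    return Unknown;
  int M = Blocks[B.Preds.front()].Exit;
  for (int P : B.Preds)
    if (Blocks[P].Exit != M)
      return Unknown;
  return M;
}

// Recompute every exit mode until nothing changes (fixed-point iteration).
void propagate(std::vector<Block> &Blocks) {
  for (Block &B : Blocks)
    B.Exit = Unknown;
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (Block &B : Blocks) {
      int NewExit = B.SetAtEnd != Unknown  ? B.SetAtEnd
                    : B.Require != Unknown ? B.Require
                                           : entryMode(Blocks, B);
      if (NewExit != B.Exit) {
        B.Exit = NewExit;
        Changed = true;
      }
    }
  }
}

// A block needs its own setreg when its requirement is not already
// guaranteed on entry.
bool needsSetreg(const std::vector<Block> &Blocks, int Idx) {
  const Block &B = Blocks[Idx];
  return B.Require != Unknown && entryMode(Blocks, B) != B.Require;
}
```

With block 0 as the loop predecessor (requiring mode 0) and block 1 as a single-block loop (requiring mode 1, with predecessors 0 and 1), `needsSetreg` reports that the loop block needs a setreg. After setting `SetAtEnd = 1` on block 0 and propagating again, it no longer does.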

lib/Target/AMDGPU/SIModeRegister.cpp
91–93	Technically not required on entry, but required at the FirstInsertionPoint.
120	Use smart pointers (unique_ptr should do the trick).
161–181	I tend to agree with Tim here. If we first emit setregs everywhere just to remove most of them later, the pass that removes them will look pretty much identical to this pass, while in the meantime slowing down code generation elsewhere, and possibly even pessimizing things (e.g. restricting machine scheduling).
238–258	I feel like this logic is more convoluted than necessary, and possibly even wrong / overly conservative in some cases because of that. For example, why are you setting InsertionPoint also in the case where FirstInsertionPoint is set? Why are IPChange and NewInfo->Change never cleared? My intuition is that the logic should just be a delayed-update/insert pattern like this:

    if (!NewInfo->FirstInsertionPoint) {
      NewInfo->FirstInsertionPoint = &MI;
    } else if (!IPChange.isCompatible(InstrMode)) {
      if (InsertionPoint)
        insertSetReg(IPChange);
      else
        NewInfo->Require = IPChange;
      merge IPChange into NewInfo->Change and reset IPChange
      InsertionPoint = &MI;
    }
    merge InstrMode into IPChange

Then at the end of the basic block:

    if (!InsertionPoint)
      NewInfo->Require = IPChange;

The RequirePending flag should simply be unnecessary.
273–277	It would be more intuitive to guard this by IPChange being "non-empty". In the InsertionPoint case, insert the SETREG; otherwise, set NewInfo->Require. In both cases, IPChange should be reset. The overall logic can then be described as:
NewInfo->Change describes the current status of the mode registers as we know it.
IPChange describes the pending mode changes that need to be applied at InsertionPoint (if non-null) or NewInfo->Require (otherwise).
297	Remove this from here, it isn't conceptually part of phase 1. Initialize that list as part of external driver code.

OK, I think the patch does not affect a future implementation. I still do believe we need to lower it early for the compute purposes, but it can be done later.
Temporarily resigning to remove the vote.

rampitec removed reviewers: Restricted Project, rampitec. Nov 5 2018, 12:16 PM

rampitec added reviewers: Restricted Project, rampitec.

Refactored SIModeRegister.cpp slightly and added more comments to help explain the processing, and made a couple of minor changes to address review comments.

Harbormaster completed remote builds in B24891: Diff 173668. Nov 12 2018, 6:57 AM

Amended SIModeRegister to address some minor points, and added comments to help explain why it appears more complex than necessary.

lib/Target/AMDGPU/SIModeRegister.cpp
91–93	Yes, I forgot to change this comment when I changed the code to insert at the FirstInsertionPoint rather than at the start of the block.
238–258	There are cases where we don't set FirstInsertionPoint, even if there are other InsertionPoints. This can arise where an explicit setreg appears before the first instruction that uses the mode register. We preserve the setreg (we don't really expect to see any, but if they appear we assume there is a good reason for it). In that case there is no initial mode value requirement, so no FirstInsertionPoint. That is also the reason for the RequirePending flag - there are more states than can be deduced by just the InsertionPoint and FirstInsertionPoint pointers. We don't clear Change as the algorithm assumes it holds the net change to the mode by the block. When we know the predecessor mode(s) in Phase 2 we can then determine the output mode of each block (this can involve revisiting blocks that are successors to any block that changes its output mode). Phase 3 then determines whether a setreg is required at the FirstInsertionPoint.
273–277	The status values aren't designed to work quite the way you assume. I have refactored the code slightly and improved the comments - does that help at all?

The update seems to have messed up the indentation of comments in a few places.

lib/Target/AMDGPU/SIModeRegister.cpp
227	auto NewInfo = llvm::make_unique<BlockData>();
238–258	Okay, I see the point about having a pre-existing setreg before any instruction with mode requirements. However, the point about clearing Change (or rather IPChange) still stands, because there are many different mode bits that could have requirements separately. For example, you could have:
1. Inst that has f32 round & denormal requirements
2. Inst that has f16 round & denormal requirements
3. Inst that has different f32 round & denormal req.s
4. Inst that has different f16 round & denormal req.s
You really only need to insert two setregs (before 1 and 3), but the algorithm will insert setregs before 1, 3, and 4.
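The four-instruction example can be worked through with a small sketch of the delayed-insert idea. This is not the code from the patch: `ModeBits`, `conflictsWith` and `placeSetregs` are hypothetical names, the bit layout (bits 0-1 for an f32 field, bits 2-3 for an f16 field) is invented for illustration, and the mode on entry is assumed to be unknown. Requirements are accumulated into a pending change, and a setreg is only emitted at the pending insertion point once a conflicting requirement appears.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the pass's Status: Mask marks the bits that are
// required, Mode holds their values.
struct ModeBits {
  uint32_t Mask = 0;
  uint32_t Mode = 0;

  ModeBits() = default;
  ModeBits(uint32_t NewMask, uint32_t NewMode)
      : Mask(NewMask), Mode(NewMode & NewMask) {}

  // True if both sides fix some common bit to different values.
  bool conflictsWith(const ModeBits &O) const {
    uint32_t Common = Mask & O.Mask;
    return (Mode & Common) != (O.Mode & Common);
  }

  // Union of the two requirements; O's values win where the masks overlap.
  ModeBits merge(const ModeBits &O) const {
    return ModeBits(Mask | O.Mask, (Mode & ~O.Mask) | (O.Mode & O.Mask));
  }
};

// Returns the indices of the instructions before which a setreg is placed.
// Reqs[I] is the requirement of instruction I (Mask == 0 means "none").
std::vector<std::size_t> placeSetregs(const std::vector<ModeBits> &Reqs) {
  std::vector<std::size_t> Points;
  ModeBits Pending;       // changes to apply at the pending insertion point
  bool HavePoint = false; // whether an insertion point is pending
  std::size_t Point = 0;
  for (std::size_t I = 0; I < Reqs.size(); ++I) {
    if (Reqs[I].Mask == 0)
      continue;
    if (!HavePoint || Pending.conflictsWith(Reqs[I])) {
      if (HavePoint)
        Points.push_back(Point); // flush the pending setreg
      Pending = ModeBits();      // reset, so unrelated fields can be batched
      Point = I;
      HavePoint = true;
    }
    Pending = Pending.merge(Reqs[I]);
  }
  if (HavePoint)
    Points.push_back(Point);
  return Points;
}
```

For the requirement sequence f32, f16, different f32, different f16, this sketch places setregs before the first and third instructions only, because the pending change is reset when it is flushed and the fourth requirement no longer conflicts with it.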

Fixed minor formatting issues, and amended the way mode changes are
combined into as few setreg instructions as possible.

Harbormaster completed remote builds in B25210: Diff 174887. Nov 21 2018, 2:04 AM

Amended the declaration of NewInfo.

Harbormaster completed remote builds in B25211: Diff 174890. Nov 21 2018, 2:29 AM

Thank you. One question left though.

lib/Target/AMDGPU/SIModeRegister.cpp
244–247	Hasn't this become redundant now?

nhaehnle added inline comments. Nov 29 2018, 3:34 AM

lib/Target/AMDGPU/SIModeRegister.cpp
244–247	Okay, no, I see how it isn't redundant if there's a pre-existing s_setreg. But why the call to merge? isCompatible only returns true if InstrMode is a subset of the known bits of NewInfo->Change.
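The redundancy can be checked with a small sketch. The `merge` body follows the one shown in the diff; the `isCompatible` body is an assumption reconstructed from the description above (true only when the argument is a subset of the known bits with matching values), and the constructor parameters are renamed here so that the masking clearly applies to the member. Under those assumptions, merging a status that is already compatible leaves the receiver unchanged.

```cpp
#include <cassert>

struct Status {
  unsigned Mask; // a 1 bit marks a Mode bit whose value is known
  unsigned Mode;

  Status() : Mask(0), Mode(0) {}
  Status(unsigned NewMask, unsigned NewMode)
      : Mask(NewMask), Mode(NewMode & NewMask) {}

  // Combine two status values; S's values win where the masks overlap.
  Status merge(const Status &S) const {
    return Status(Mask | S.Mask, (Mode & ~S.Mask) | (S.Mode & S.Mask));
  }

  // Assumed definition: every bit S cares about is known here and agrees.
  bool isCompatible(const Status &S) const {
    return (Mask & S.Mask) == S.Mask && (Mode & S.Mask) == S.Mode;
  }

  bool operator==(const Status &S) const {
    return Mask == S.Mask && Mode == S.Mode;
  }
};
```

For example, with `Change` = (Mask 0xF, Mode 0x5) and `InstrMode` = (Mask 0x3, Mode 0x1), `isCompatible` holds and `Change.merge(InstrMode)` equals `Change`, so a merge performed only after a successful compatibility check has no effect.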

Removed redundant call to merge mode register status.

Harbormaster completed remote builds in B25536: Diff 176087. Nov 30 2018, 4:20 AM

timcorringham marked 2 inline comments as done. Nov 30 2018, 4:21 AM

timcorringham added inline comments.

lib/Target/AMDGPU/SIModeRegister.cpp
244–247	Good spot - this is now redundant. It is benign but unnecessary, so I'll remove it.

One last thing. If I'm right about this, it'd be good to reduce the check complexity and indentation levels. Then I'm happy :)

lib/Target/AMDGPU/SIModeRegister.cpp
244–247	Nice. Isn't the `InstrMode != Status()` check now also redundant, actually?

Reordered the cases dealt with in Phase 1 so that the most specific
case (setreg instruction) is performed first, allowing the removal
of one condition, and reduced indentation for that case accordingly.

Harbormaster completed remote builds in B25551: Diff 176135. Nov 30 2018, 8:33 AM

timcorringham marked an inline comment as done. Nov 30 2018, 8:34 AM

Oops, I thought I'd posted this earlier... LGTM.

This revision is now accepted and ready to land. Dec 7 2018, 7:44 AM

I forgot to add the Phabricator Review to the commit - whoops!

Committed by:

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348754 91177308-0d34-0410-b5e6-96231b3b80d8

and subsequent minor fix for a sanitizer issue:

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348767 91177308-0d34-0410-b5e6-96231b3b80d8

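Revision Contents

Path	Size
lib/Target/AMDGPU/AMDGPU.h	4 lines
lib/Target/AMDGPU/AMDGPUTargetMachine.cpp	9 lines
lib/Target/AMDGPU/CMakeLists.txt	1 line
lib/Target/AMDGPU/SIDefines.h	5 lines
lib/Target/AMDGPU/SIInstrFormats.td	6 lines
lib/Target/AMDGPU/SIInstrInfo.h	8 lines
lib/Target/AMDGPU/SIModeRegister.cpp	406 lines
lib/Target/AMDGPU/VOP1Instructions.td	8 lines
lib/Target/AMDGPU/VOP2Instructions.td	7 lines
lib/Target/AMDGPU/VOP3Instructions.td	36 lines
lib/Target/AMDGPU/VOP3PInstructions.td	8 lines
test/CodeGen/AMDGPU/mode-register.mir	459 lines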

Diff 176135

lib/Target/AMDGPU/AMDGPU.h

[52 lines hidden]
 FunctionPass *createSIInsertWaitcntsPass();
 FunctionPass *createSIFixWWMLivenessPass();
 FunctionPass *createSIFormMemoryClausesPass();
 FunctionPass *createAMDGPUSimplifyLibCallsPass(const TargetOptions &);
 FunctionPass *createAMDGPUUseNativeCallsPass();
 FunctionPass *createAMDGPUCodeGenPreparePass();
 FunctionPass *createAMDGPUMachineCFGStructurizerPass();
 FunctionPass *createAMDGPURewriteOutArgumentsPass();
+FunctionPass *createSIModeRegisterPass();

 void initializeAMDGPUDAGToDAGISelPass(PassRegistry&);

 void initializeAMDGPUMachineCFGStructurizerPass(PassRegistry&);
 extern char &AMDGPUMachineCFGStructurizerID;

 void initializeAMDGPUAlwaysInlinePass(PassRegistry&);

[117 lines hidden]
 extern char &SIAnnotateControlFlowPassID;

 void initializeSIMemoryLegalizerPass(PassRegistry&);
 extern char &SIMemoryLegalizerID;

 void initializeSIDebuggerInsertNopsPass(PassRegistry&);
 extern char &SIDebuggerInsertNopsID;

+void initializeSIModeRegisterPass(PassRegistry&);
+extern char &SIModeRegisterID;
+
 void initializeSIInsertWaitcntsPass(PassRegistry&);
 extern char &SIInsertWaitcntsID;

 void initializeSIFormMemoryClausesPass(PassRegistry&);
 extern char &SIFormMemoryClausesID;

 void initializeAMDGPUUnifyDivergentExitNodesPass(PassRegistry&);
 extern char &AMDGPUUnifyDivergentExitNodesID;
[85 lines hidden]

lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

[139 lines hidden]

 // Enable atomic optimization
 static cl::opt<bool> EnableAtomicOptimizations(
   "amdgpu-atomic-optimizations",
   cl::desc("Enable atomic optimizations"),
   cl::init(false),
   cl::Hidden);

+// Enable Mode register optimization
+static cl::opt<bool> EnableSIModeRegisterPass(
+  "amdgpu-mode-register",
+  cl::desc("Enable mode register pass"),
+  cl::init(true),
+  cl::Hidden);
+
 extern "C" void LLVMInitializeAMDGPUTarget() {
   // Register the target
   RegisterTargetMachine<R600TargetMachine> X(getTheAMDGPUTarget());
   RegisterTargetMachine<GCNTargetMachine> Y(getTheGCNTarget());

   PassRegistry *PR = PassRegistry::getPassRegistry();
   initializeR600ClauseMergePassPass(*PR);
   initializeR600ControlFlowFinalizerPass(*PR);
[22 lines hidden] extern "C" void LLVMInitializeAMDGPUTarget() {
   initializeAMDGPULowerIntrinsicsPass(*PR);
   initializeAMDGPUOpenCLEnqueuedBlockLoweringPass(*PR);
   initializeAMDGPUPromoteAllocaPass(*PR);
   initializeAMDGPUCodeGenPreparePass(*PR);
   initializeAMDGPURewriteOutArgumentsPass(*PR);
   initializeAMDGPUUnifyMetadataPass(*PR);
   initializeSIAnnotateControlFlowPass(*PR);
   initializeSIInsertWaitcntsPass(*PR);
+  initializeSIModeRegisterPass(*PR);
   initializeSIWholeQuadModePass(*PR);
   initializeSILowerControlFlowPass(*PR);
   initializeSIInsertSkipsPass(*PR);
   initializeSIMemoryLegalizerPass(*PR);
   initializeSIDebuggerInsertNopsPass(*PR);
   initializeSIOptimizeExecMaskingPass(*PR);
   initializeSIFixWWMLivenessPass(*PR);
   initializeSIFormMemoryClausesPass(*PR);
[693 lines hidden]

 void GCNPassConfig::addPreSched2() {
 }

 void GCNPassConfig::addPreEmitPass() {
   addPass(createSIMemoryLegalizerPass());
   addPass(createSIInsertWaitcntsPass());
   addPass(createSIShrinkInstructionsPass());
+  addPass(createSIModeRegisterPass());

   // The hazard recognizer that runs as part of the post-ra scheduler does not
   // guarantee to be able handle all hazards correctly. This is because if there
   // are multiple scheduling regions in a basic block, the regions are scheduled
   // bottom up, so when we begin to schedule a region we don't know what
   // instructions were emitted directly before it.
   //
   // Here we add a stand-alone hazard recognizer pass which can handle all
[14 lines hidden]

lib/Target/AMDGPU/CMakeLists.txt

[113 lines hidden] add_llvm_target(AMDGPUCodeGen
   SIMemoryLegalizer.cpp
   SIOptimizeExecMasking.cpp
   SIOptimizeExecMaskingPreRA.cpp
   SIPeepholeSDWA.cpp
   SIRegisterInfo.cpp
   SIShrinkInstructions.cpp
   SIWholeQuadMode.cpp
   GCNILPSched.cpp
+  SIModeRegister.cpp
   )

 add_subdirectory(AsmParser)
 add_subdirectory(Disassembler)
 add_subdirectory(InstPrinter)
 add_subdirectory(MCTargetDesc)
 add_subdirectory(TargetInfo)
 add_subdirectory(Utils)

lib/Target/AMDGPU/SIDefines.h

[82 lines hidden] // TODO: Should this be spilt into VOP3 a and b?
   // Clamps hi component of register.
   // ClampLo and ClampHi set for packed clamp.
   ClampHi = UINT64_C(1) << 48,

   // Is a packed VOP3P instruction.
   IsPacked = UINT64_C(1) << 49,

   // Is a D16 buffer instruction.
-  D16Buf = UINT64_C(1) << 50
+  D16Buf = UINT64_C(1) << 50,
+
+  // Uses floating point double precision rounding mode
+  FPDPRounding = UINT64_C(1) << 51
 };

 // v_cmp_class_* etc. use a 10-bit mask for what operation is checked.
 // The result is true if any of these tests are true.
 enum ClassFlags {
   S_NAN = 1 << 0, // Signaling NaN
   Q_NAN = 1 << 1, // Quiet NaN
   N_INFINITY = 1 << 2, // Negative infinity
[440 lines hidden]

lib/Target/AMDGPU/SIInstrFormats.td

[115 lines hidden] class InstSI <dag outs, dag ins, string asm = "",
   field bit ClampHi = 0;

   // This bit indicates that this is a packed VOP3P instruction
   field bit IsPacked = 0;

   // This bit indicates that this is a D16 buffer instruction.
   field bit D16Buf = 0;

+  // This bit indicates that this uses the floating point double precision
+  // rounding mode flags
+  field bit FPDPRounding = 0;
+
[inline comment] arsenm: It's not clear to me what this means, since every FP instruction does this
[inline comment] timcorringham: It is just a way of indicating to this pass that it is an instruction that uses the double precision flag.

   // These need to be kept in sync with the enum in SIInstrFlags.
   let TSFlags{0} = SALU;
   let TSFlags{1} = VALU;

   let TSFlags{2} = SOP1;
   let TSFlags{3} = SOP2;
   let TSFlags{4} = SOPC;
   let TSFlags{5} = SOPK;
[41 lines hidden] class InstSI <dag outs, dag ins, string asm = "",
   let TSFlags{46} = IntClamp;
   let TSFlags{47} = ClampLo;
   let TSFlags{48} = ClampHi;

   let TSFlags{49} = IsPacked;

   let TSFlags{50} = D16Buf;

+  let TSFlags{51} = FPDPRounding;
+
   let SchedRW = [Write32Bit];

   field bits<1> DisableSIDecoder = 0;
   field bits<1> DisableVIDecoder = 0;
   field bits<1> DisableDecoder = 0;

   let isAsmParserOnly = !if(!eq(DisableDecoder{0}, {0}), 0, 1);
   let AsmVariantName = AMDGPUAsmVariants.Default;
[148 lines hidden]

lib/Target/AMDGPU/SIInstrInfo.h

[597 lines hidden] public:
   uint64_t getClampMask(const MachineInstr &MI) const {
     const uint64_t ClampFlags = SIInstrFlags::FPClamp |
                                 SIInstrFlags::IntClamp |
                                 SIInstrFlags::ClampLo |
                                 SIInstrFlags::ClampHi;
     return MI.getDesc().TSFlags & ClampFlags;
   }

+  static bool usesFPDPRounding(const MachineInstr &MI) {
+    return MI.getDesc().TSFlags & SIInstrFlags::FPDPRounding;
+  }
+
+  bool usesFPDPRounding(uint16_t Opcode) const {
+    return get(Opcode).TSFlags & SIInstrFlags::FPDPRounding;
+  }
+
   bool isVGPRCopy(const MachineInstr &MI) const {
     assert(MI.isCopy());
     unsigned Dest = MI.getOperand(0).getReg();
     const MachineFunction &MF = *MI.getParent()->getParent();
     const MachineRegisterInfo &MRI = MF.getRegInfo();
     return !RI.isSGPRReg(MRI, Dest);
   }

[388 lines hidden]
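As a standalone illustration of the flag test added here, the following sketch reproduces only the bit positions shown in the SIDefines.h hunk. The `sketch` namespace and the free function taking a raw `uint64_t` are assumptions made so the example compiles without LLVM; in the patch the flags come from `MI.getDesc().TSFlags`. Because the flag lives at bit 51, a 64-bit constant such as `UINT64_C(1) << 51` is needed, since a plain `int` shift of that width would not be valid.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {
// Bit positions copied from the SIDefines.h hunk; everything else is a stand-in.
enum : uint64_t {
  D16Buf = UINT64_C(1) << 50,
  FPDPRounding = UINT64_C(1) << 51
};

// Stand-in for the TSFlags test performed by SIInstrInfo::usesFPDPRounding.
inline bool usesFPDPRounding(uint64_t TSFlags) {
  return (TSFlags & FPDPRounding) != 0;
}
} // namespace sketch
```

A flags word with both `D16Buf` and `FPDPRounding` set passes the test, while one with only `D16Buf` set does not.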

lib/Target/AMDGPU/SIModeRegister.cpp

This file was added.

				//===-- SIModeRegister.cpp - Mode Register --------------------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				/// \file
				/// This pass inserts changes to the Mode register settings as required.
				/// Note that currently it only deals with the Double Precision Floating Point
				/// rounding mode setting, but is intended to be generic enough to be easily
				/// expanded.
				///
				//===----------------------------------------------------------------------===//
				//
				#include "AMDGPU.h"
				#include "AMDGPUInstrInfo.h"
				#include "AMDGPUSubtarget.h"
				#include "SIInstrInfo.h"
				#include "SIMachineFunctionInfo.h"
				#include "llvm/ADT/Statistic.h"
				#include "llvm/CodeGen/MachineFunctionPass.h"
				#include "llvm/CodeGen/MachineInstrBuilder.h"
				#include "llvm/CodeGen/MachineRegisterInfo.h"
				#include "llvm/IR/Constants.h"
				#include "llvm/IR/Function.h"
				#include "llvm/IR/LLVMContext.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Support/raw_ostream.h"
				#include "llvm/Target/TargetMachine.h"
				#include <queue>

				#define DEBUG_TYPE "si-mode-register"

				STATISTIC(NumSetregInserted, "Number of setreg of mode register inserted.");

				using namespace llvm;

				struct Status {
				// Mask is a bitmask where a '1' indicates the corresponding Mode bit has a
				// known value
				unsigned Mask;
				unsigned Mode;

				Status() : Mask(0), Mode(0){};

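				// The parameters are named differently from the members so that the
				// masking below applies to the member rather than to a shadowing parameter.
				Status(unsigned NewMask, unsigned NewMode) : Mask(NewMask), Mode(NewMode) {
				Mode &= Mask;
				};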

				// merge two status values such that only values that don't conflict are
				// preserved
				Status merge(const Status &S) const {
				return Status((Mask | S.Mask), ((Mode & ~S.Mask) | (S.Mode & S.Mask)));
				}

				// merge an unknown value by using the unknown value's mask to remove bits
				// from the result
				Status mergeUnknown(unsigned newMask) {
				return Status(Mask & ~newMask, Mode & ~newMask);
				}

				// intersect two Status values to produce a mode and mask that is a subset
				// of both values
				Status intersect(const Status &S) const {
				unsigned NewMask = (Mask & S.Mask) & (Mode ^ ~S.Mode);
				unsigned NewMode = (Mode & NewMask);
				return Status(NewMask, NewMode);
				}

				// produce the delta required to change the Mode to the required Mode
				Status delta(const Status &S) const {
				return Status((S.Mask & (Mode ^ S.Mode)) | (~Mask & S.Mask), S.Mode);
				}

				bool operator==(const Status &S) const {
				return (Mask == S.Mask) && (Mode == S.Mode);
				}

				bool operator!=(const Status &S) const { return !(*this == S); }

				bool isCompatible(Status &S) {
				return ((Mask & S.Mask) == S.Mask) && ((Mode & S.Mask) == S.Mode);
				}

				bool isCombinable(Status &S) {
				return !(Mask & S.Mask) || isCompatible(S);
				}
				};
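
Editorial note: the mask/mode algebra of the Status struct above is easier to follow with concrete values. The following is a minimal standalone sketch, not the code under review. The name ModeStatus, the helper checkStatusAlgebra, and the choice of bits 2-3 for the double precision rounding field are assumptions made for illustration only.

```cpp
#include <cassert>

// Hypothetical stand-in for the pass's Status struct (illustration only).
struct ModeStatus {
  unsigned Mask; // a 1 bit means the corresponding Mode bit has a known value
  unsigned Mode; // the known bit values, always a subset of Mask

  constexpr ModeStatus(unsigned NewMask = 0, unsigned NewMode = 0)
      : Mask(NewMask), Mode(NewMode & NewMask) {}

  // Apply S on top of *this: bits known in S override the earlier values.
  constexpr ModeStatus merge(ModeStatus S) const {
    return ModeStatus(Mask | S.Mask, (Mode & ~S.Mask) | (S.Mode & S.Mask));
  }
  // Keep only the bits that are known in both and have the same value.
  constexpr ModeStatus intersect(ModeStatus S) const {
    return ModeStatus((Mask & S.Mask) & ~(Mode ^ S.Mode), Mode);
  }
  // Bits that have to be written to move from *this to the requirement S.
  constexpr ModeStatus delta(ModeStatus S) const {
    return ModeStatus((S.Mask & (Mode ^ S.Mode)) | (~Mask & S.Mask), S.Mode);
  }
  // True if every bit required by S is already known and has S's value.
  constexpr bool isCompatible(ModeStatus S) const {
    return ((Mask & S.Mask) == S.Mask) && ((Mode & S.Mask) == S.Mode);
  }
};

// Assumed field layout for this sketch: DP rounding mode in bits 2-3.
inline void checkStatusAlgebra() {
  const ModeStatus Nearest(0xCu, 0x0u); // round to nearest
  const ModeStatus Zero(0xCu, 0xCu);    // round to zero

  // Current mode is "nearest" but the instruction needs "zero": both bits
  // of the field must be rewritten.
  assert(!Nearest.isCompatible(Zero));
  assert(Nearest.delta(Zero).Mask == 0xCu);
  assert(Nearest.delta(Zero).Mode == 0xCu);

  // Two predecessors that disagree leave the field unknown on entry.
  assert(Nearest.intersect(Zero).Mask == 0u);

  // Nothing needs to be written if the mode already matches.
  assert(Zero.delta(Zero).Mask == 0u);
}
```

An empty delta mask is what allows the pass to skip a setreg, and an empty intersect mask is what forces one at a join point where predecessors disagree.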

				class BlockData {
				public:
				nhaehnle: Technically not required on entry, but required at the FirstInsertionPoint.
				timcorringham: Yes, I forgot to change this comment when I changed the code to insert at the FirstInsertionPoint rather than at the start of the block.
				// The Status that represents the mode register settings required by the
				// FirstInsertionPoint (if any) in this block. Calculated in Phase 1.
				Status Require;

				// The Status that represents the net changes to the Mode register made by
				// this block. Calculated in Phase 1.
				Status Change;

				// The Status that represents the mode register settings on exit from this
				// block. Calculated in Phase 2.
				Status Exit;

				// The Status that represents the intersection of exit Mode register settings
				// from all predecessor blocks. Calculated in Phase 2, and used by Phase 3.
				Status Pred;

				// In Phase 1 we record the first instruction that has a mode requirement,
				// which is used in Phase 3 if we need to insert a mode change.
				MachineInstr *FirstInsertionPoint;

				BlockData() : FirstInsertionPoint(nullptr) {};
				};

				namespace {

				class SIModeRegister : public MachineFunctionPass {
				public:
				nhaehnle: Use smart pointers (unique_ptr should do the trick).
				static char ID;

				std::vector<std::unique_ptr<BlockData>> BlockInfo;
				std::queue<MachineBasicBlock *> Phase2List;

				// The default mode register setting currently only caters for the floating
				// point double precision rounding mode.
				// We currently assume the default rounding mode is Round to Nearest
				// NOTE: this should come from a per function rounding mode setting once such
				// a setting exists.
				unsigned DefaultMode = FP_ROUND_ROUND_TO_NEAREST;
				Status DefaultStatus =
				Status(FP_ROUND_MODE_DP(0x3), FP_ROUND_MODE_DP(DefaultMode));

				public:
				SIModeRegister() : MachineFunctionPass(ID) {}

				bool runOnMachineFunction(MachineFunction &MF) override;

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.setPreservesCFG();
				MachineFunctionPass::getAnalysisUsage(AU);
				}

				void processBlockPhase1(MachineBasicBlock &MBB, const SIInstrInfo *TII);

				void processBlockPhase2(MachineBasicBlock &MBB, const SIInstrInfo *TII);

				void processBlockPhase3(MachineBasicBlock &MBB, const SIInstrInfo *TII);

				Status getInstructionMode(MachineInstr &MI, const SIInstrInfo *TII);

				void insertSetreg(MachineBasicBlock &MBB, MachineInstr *I,
				const SIInstrInfo *TII, Status InstrMode);
				};
				} // End anonymous namespace.

				INITIALIZE_PASS(SIModeRegister, DEBUG_TYPE,
				"Insert required mode register values", false, false)

				char SIModeRegister::ID = 0;

				char &llvm::SIModeRegisterID = SIModeRegister::ID;

				FunctionPass *llvm::createSIModeRegisterPass() { return new SIModeRegister(); }

				// Determine the Mode register setting required for this instruction.
				// Instructions which don't use the Mode register return a null Status.
				// Note this currently only deals with instructions that use the floating point
				// double precision setting.
				Status SIModeRegister::getInstructionMode(MachineInstr &MI,
				const SIInstrInfo *TII) {
				arsenm: What does this mean exactly by "needs"? Does the instruction fail to function?
				timcorringham: The results can be outside the expected range when other rounding modes are used.
				if (TII->usesFPDPRounding(MI)) {
				switch (MI.getOpcode()) {
				case AMDGPU::V_INTERP_P1LL_F16:
				case AMDGPU::V_INTERP_P1LV_F16:
				case AMDGPU::V_INTERP_P2_F16:
				// f16 interpolation instructions need double precision round to zero
				return Status(FP_ROUND_MODE_DP(3),
				FP_ROUND_MODE_DP(FP_ROUND_ROUND_TO_ZERO));
				default:
				arsenm: I'm not really comfortable inserting something semantically required at this point. Can you do this when the instructions are selected instead?
				timcorringham: The problem with doing it during instruction selection is that we end up with many more mode register writes than are strictly required. As the mode register is not modelled as a register there isn't any way to track the values without a pass to do it. I suppose it is similar to adding nops or waitcnts, which are also done by specific passes.
				arsenm: This is a semantic property and I think it really belongs in instruction selection. What is the problem with optimizing those out here? What actually changes?
				arsenm: Or even earlier, a property of the emitted operation
				timcorringham: An initial attempt at this functionality did insert necessary mode register writes at instruction selection, but that resulted in many more changes than were necessary as the state of the register isn't known at that point. In order to avoid some of the changes all instructions that use the mode register would have to be updated to ensure the mode was appropriate - which was considered too invasive. A pass to remove unnecessary setregs would be possible, but it would be very similar to this pass, would still require changes to many instructions to insert the setregs, and would also add overhead for all intervening passes. There would have to be some dependence introduced between the mode register and the instructions that use it to ensure any rescheduling didn't break the code. Overall we thought that this approach was the best compromise. It solves the immediate problem with minimal overhead, and can be extended in a staged manner fairly easily.
				nhaehnle: I tend to agree with Tim here. If we first emit setregs everywhere just to remove most of them later, the pass that removes them will look pretty much identical to this pass, while in the meantime slowing down code generation elsewhere, and possibly even pessimizing things (e.g. restricting machine scheduling).
				return DefaultStatus;
				}
				}
				return Status();
				}

				// Insert a setreg instruction to update the Mode register.
				// It is possible (though unlikely) for an instruction to require a change to
				// the value of disjoint parts of the Mode register when we don't know the
				// value of the intervening bits. In that case we need to use more than one
				// setreg instruction.
				void SIModeRegister::insertSetreg(MachineBasicBlock &MBB, MachineInstr *MI,
				const SIInstrInfo *TII, Status InstrMode) {
				while (InstrMode.Mask) {
				unsigned Offset = countTrailingZeros<unsigned>(InstrMode.Mask);
				unsigned Width = countTrailingOnes<unsigned>(InstrMode.Mask >> Offset);
				unsigned Value = (InstrMode.Mode >> Offset) & ((1 << Width) - 1);
				BuildMI(MBB, MI, 0, TII->get(AMDGPU::S_SETREG_IMM32_B32))
				.addImm(Value)
				.addImm(((Width - 1) << AMDGPU::Hwreg::WIDTH_M1_SHIFT_) |
				(Offset << AMDGPU::Hwreg::OFFSET_SHIFT_) |
				(AMDGPU::Hwreg::ID_MODE << AMDGPU::Hwreg::ID_SHIFT_));
				++NumSetregInserted;
				InstrMode.Mask &= ~((1 << Width) - 1) << Offset;
				}
				}
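
Editorial note: the loop in insertSetreg emits one setreg per contiguous run of set bits in the mask. A minimal standalone sketch of that decomposition follows, assuming nothing about the LLVM helpers. SetregField and splitIntoFields are hypothetical names introduced for illustration, and the bit scanning is written out by hand instead of using countTrailingZeros and countTrailingOnes.

```cpp
#include <vector>

// Hypothetical record of one bitfield write (illustration only).
struct SetregField {
  unsigned Offset; // bit position of the field's lowest bit
  unsigned Width;  // number of contiguous bits in the field
  unsigned Value;  // value to write, already shifted down to bit 0
};

// Split (Mask, Mode) into contiguous fields, lowest field first. Each field
// corresponds to one setreg in the pass's loop.
inline std::vector<SetregField> splitIntoFields(unsigned Mask, unsigned Mode) {
  std::vector<SetregField> Fields;
  while (Mask) {
    // Offset: index of the lowest set bit still in the mask.
    unsigned Offset = 0;
    while (!((Mask >> Offset) & 1u))
      ++Offset;
    // Width: length of the run of set bits starting at Offset.
    unsigned Width = 0;
    while (Offset + Width < 32 && ((Mask >> (Offset + Width)) & 1u))
      ++Width;
    // Avoid shifting by 32, which would be undefined for a 32-bit unsigned.
    unsigned FieldMask = (Width == 32) ? ~0u : ((1u << Width) - 1u);
    Fields.push_back({Offset, Width, (Mode >> Offset) & FieldMask});
    // Clear the bits just handled so the next iteration finds the next run.
    Mask &= ~(FieldMask << Offset);
  }
  return Fields;
}
```

For a mask of 0xC this yields a single field at offset 2 with width 2, which matches the single double precision rounding field the pass currently handles. A mask with two separated runs yields two fields, which is the "more than one setreg" case described in the comment above insertSetreg.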

				// In Phase 1 we iterate through the instructions of the block and for each
				// instruction we get its mode usage. If the instruction uses the Mode register
				// we:
				// - update the Change status, which tracks the changes to the Mode register
				// made by this block
				// - if this instruction's requirements are compatible with the current setting
				// of the Mode register we merge the modes
				// - if it isn't compatible and an InsertionPoint isn't set, then we set the
				// InsertionPoint to the current instruction, and we remember the current
				// mode
				// - if it isn't compatible and InsertionPoint is set we insert a setreg before
				// that instruction (unless this instruction forms part of the block's
				// entry requirements in which case the insertion is deferred until Phase 3
				// when predecessor exit values are known), and move the insertion point to
				// this instruction
				// - if this is a setreg instruction we treat it as an incompatible instruction.
				// This is sub-optimal but avoids some nasty corner cases, and is expected to
				arsenm: nullptr
				// occur very rarely.
				// - on exit we have set the Require, Change, and initial Exit modes.
				nhaehnle: auto NewInfo = llvm::make_unique<BlockData>();
				void SIModeRegister::processBlockPhase1(MachineBasicBlock &MBB,
				const SIInstrInfo *TII) {
				auto NewInfo = llvm::make_unique<BlockData>();
				MachineInstr *InsertionPoint = nullptr;
				// RequirePending is used to indicate whether we are collecting the initial
				// requirements for the block, and need to defer the first InsertionPoint to
				// Phase 3. It is set to false once we have set FirstInsertionPoint, or when
				// we discover an explicit setreg that means this block doesn't have any
				// initial requirements.
				bool RequirePending = true;
				Status IPChange;
				for (MachineInstr &MI : MBB) {
				Status InstrMode = getInstructionMode(MI, TII);
				if ((MI.getOpcode() == AMDGPU::S_SETREG_B32) ||
				(MI.getOpcode() == AMDGPU::S_SETREG_IMM32_B32)) {
				// We preserve any explicit mode register setreg instruction we encounter,
				// as we assume it has been inserted by a higher authority (this is
				// likely to be a very rare occurrence).
				unsigned Dst = TII->getNamedOperand(MI, AMDGPU::OpName::simm16)->getImm();
				if (((Dst & AMDGPU::Hwreg::ID_MASK_) >> AMDGPU::Hwreg::ID_SHIFT_) !=
				nhaehnle: Hasn't this become redundant now?
				nhaehnle: Okay, no, I see how it isn't redundant if there's a pre-existing s_setreg. But why the call to merge? isCompatible only returns true if InstrMode is a subset of the known bits of NewInfo->Change.
				timcorringham: Good spot - this is now redundant. It is benign but unnecessary - I'll remove it.
				nhaehnle: Nice. Isn't the `InstrMode != Status()` check now also redundant, actually?
				AMDGPU::Hwreg::ID_MODE)
				continue;

				unsigned Width = ((Dst & AMDGPU::Hwreg::WIDTH_M1_MASK_) >>
				AMDGPU::Hwreg::WIDTH_M1_SHIFT_) +
				1;
				unsigned Offset =
				(Dst & AMDGPU::Hwreg::OFFSET_MASK_) >> AMDGPU::Hwreg::OFFSET_SHIFT_;
				unsigned Mask = ((1 << Width) - 1) << Offset;

				// If an InsertionPoint is set we will insert a setreg there.
				nhaehnle: I feel like this logic is more convoluted than necessary, and possibly even wrong / overly conservative in some cases because of that. For example, why are you setting InsertionPoint also in the case where FirstInsertionPoint is set? Why are IPChange and NewInfo->Change never cleared? My intuition is that the logic should just be a delayed-update/insert pattern like this:
				    if (!NewInfo->FirstInsertionPoint) {
				      NewInfo->FirstInsertionPoint = &MI;
				    } else if (!IPChange.isCompatible(InstrMode)) {
				      if (InsertionPoint)
				        insertSetReg(IPChange);
				      else
				        NewInfo->Require = IPChange;
				      merge IPChange into NewInfo->Change and reset IPChange
				      InsertionPoint = &MI;
				    }
				    merge InstrMode into IPChange
				Then at the end of the basic block:
				    if (!InsertionPoint)
				      NewInfo->Require = IPChange;
				The RequirePending flag should simply be unnecessary.
				timcorringham: There are cases where we don't set FirstInsertionPoint, even if there are other InsertionPoints. This can arise where an explicit setreg appears before the first instruction that uses the mode register. We preserve the setreg (we don't really expect to see any, but if they appear we assume there is a good reason for it). In that case there is no initial mode value requirement, so no FirstInsertionPoint. That is also the reason for the RequirePending flag - there are more states than can be deduced by just the InsertionPoint and FirstInsertionPoint pointers. We don't clear Change as the algorithm assumes it holds the net change to the mode by the block. When we know the predecessor mode(s) in Phase 2 we can then determine the output mode of each block (this can involve revisiting blocks that are successors to any block that changes its output mode). Phase 3 then determines whether a setreg is required at the FirstInsertionPoint.
				nhaehnle: Okay, I see the point about having a pre-existing setreg before any instruction with mode requirements. However, the point about clearing Change (or rather IPChange) still stands, because there are many different mode bits that could have requirements separately. For example, you could have:
				    1. Inst that has f32 round & denormal requirements
				    2. Inst that has f16 round & denormal requirements
				    3. Inst that has different f32 round & denormal req.s
				    4. Inst that has different f16 round & denormal req.s
				You really only need to insert two setregs (before 1 and 3), but the algorithm will insert setregs before 1, 3, and 4.
				if (InsertionPoint) {
				insertSetreg(MBB, InsertionPoint, TII, IPChange.delta(NewInfo->Change));
				InsertionPoint = nullptr;
				}
				// If this is an immediate then we know the value being set, but if it is
				// not an immediate then we treat the modified bits of the mode register
				// as unknown.
				if (MI.getOpcode() == AMDGPU::S_SETREG_IMM32_B32) {
				unsigned Val = TII->getNamedOperand(MI, AMDGPU::OpName::imm)->getImm();
				unsigned Mode = (Val << Offset) & Mask;
				Status Setreg = Status(Mask, Mode);
				// If we haven't already set the initial requirements for the block we
				// don't need to as the requirements start from this explicit setreg.
				RequirePending = false;
				NewInfo->Change = NewInfo->Change.merge(Setreg);
				} else {
				NewInfo->Change = NewInfo->Change.mergeUnknown(Mask);
				}
				} else if (!NewInfo->Change.isCompatible(InstrMode)) {
				nhaehnle: It would be more intuitive to guard this by IPChange being "non-empty". In the InsertionPoint case, insert the SETREG; otherwise, set NewInfo->Require. In both cases, IPChange should be reset. The overall logic can then be described as:
				    - NewInfo->Change describes the current status of the mode registers as we know it
				    - IPChange describes the pending mode changes that need to be applied at InsertionPoint (if non-null) or NewInfo->Require (otherwise)
				timcorringham: The status values aren't designed to work quite the way you assume. I have refactored the code slightly and improved the comments - does that help at all?
				// This instruction uses the Mode register and its requirements aren't
				// compatible with the current mode.
				if (InsertionPoint) {
				// If the required mode change cannot be included in the current
				// InsertionPoint changes, we need a setreg and start a new
				// InsertionPoint.
				if (!IPChange.delta(NewInfo->Change).isCombinable(InstrMode)) {
				if (RequirePending) {
				// This is the first insertionPoint in the block so we will defer
				// the insertion of the setreg to Phase 3 where we know whether or
				// not it is actually needed.
				NewInfo->FirstInsertionPoint = InsertionPoint;
				NewInfo->Require = NewInfo->Change;
				RequirePending = false;
				} else {
				insertSetreg(MBB, InsertionPoint, TII,
				IPChange.delta(NewInfo->Change));
				IPChange = NewInfo->Change;
				}
				// Set the new InsertionPoint
				nhaehnle: Remove this from here, it isn't conceptually part of phase 1. Initialize that list as part of external driver code.
				InsertionPoint = &MI;
				}
				NewInfo->Change = NewInfo->Change.merge(InstrMode);
				} else {
				// No InsertionPoint is currently set - this is either the first in
				// the block or we have previously seen an explicit setreg.
				InsertionPoint = &MI;
				IPChange = NewInfo->Change;
				NewInfo->Change = NewInfo->Change.merge(InstrMode);
				}
				}
				}
				if (RequirePending) {
				// If we haven't yet set the initial requirements for the block we set them
				// now.
				NewInfo->FirstInsertionPoint = InsertionPoint;
				NewInfo->Require = NewInfo->Change;
				} else if (InsertionPoint) {
				// We need to insert a setreg at the InsertionPoint
				insertSetreg(MBB, InsertionPoint, TII, IPChange.delta(NewInfo->Change));
				}
				NewInfo->Exit = NewInfo->Change;
				BlockInfo[MBB.getNumber()] = std::move(NewInfo);
				}

				// In Phase 2 we revisit each block and calculate the common Mode register
				// value provided by all predecessor blocks. If the Exit value for the block
				// is changed, then we add the successor blocks to the worklist so that the
				// exit value is propagated.
				void SIModeRegister::processBlockPhase2(MachineBasicBlock &MBB,
				const SIInstrInfo *TII) {
				// BlockData *BI = BlockInfo[MBB.getNumber()];
				unsigned ThisBlock = MBB.getNumber();
				if (MBB.pred_empty()) {
				// There are no predecessors, so use the default starting status.
				BlockInfo[ThisBlock]->Pred = DefaultStatus;
				} else {
				// Build a status that is common to all the predecessors by intersecting
				// all the predecessor exit status values.
				MachineBasicBlock::pred_iterator P = MBB.pred_begin(), E = MBB.pred_end();
				MachineBasicBlock &PB = *(*P);
				BlockInfo[ThisBlock]->Pred = BlockInfo[PB.getNumber()]->Exit;

				for (P = std::next(P); P != E; P = std::next(P)) {
				MachineBasicBlock *Pred = *P;
				BlockInfo[ThisBlock]->Pred = BlockInfo[ThisBlock]->Pred.intersect(BlockInfo[Pred->getNumber()]->Exit);
				}
				}
				Status TmpStatus = BlockInfo[ThisBlock]->Pred.merge(BlockInfo[ThisBlock]->Change);
				if (BlockInfo[ThisBlock]->Exit != TmpStatus) {
				BlockInfo[ThisBlock]->Exit = TmpStatus;
				// Add the successors to the work list so we can propagate the changed exit
				arsenm: Since you seem to be relying on inserting these instructions, this is incorrect
				// status.
				for (MachineBasicBlock::succ_iterator S = MBB.succ_begin(),
				E = MBB.succ_end();
				S != E; S = std::next(S)) {
				MachineBasicBlock &B = *(*S);
				Phase2List.push(&B);
				}
				}
				}

				// In Phase 3 we revisit each block and if it has an insertion point defined we
				// check whether the predecessor mode meets the block's entry requirements. If
				// not we insert an appropriate setreg instruction to modify the Mode register.
				void SIModeRegister::processBlockPhase3(MachineBasicBlock &MBB,
				const SIInstrInfo *TII) {
				// BlockData *BI = BlockInfo[MBB.getNumber()];
				unsigned ThisBlock = MBB.getNumber();
				if (!BlockInfo[ThisBlock]->Pred.isCompatible(BlockInfo[ThisBlock]->Require)) {
				Status Delta = BlockInfo[ThisBlock]->Pred.delta(BlockInfo[ThisBlock]->Require);
				arsenm: Typo setregto
				if (BlockInfo[ThisBlock]->FirstInsertionPoint)
				insertSetreg(MBB, BlockInfo[ThisBlock]->FirstInsertionPoint, TII, Delta);
				else
				insertSetreg(MBB, &MBB.instr_front(), TII, Delta);
				}
				}

				bool SIModeRegister::runOnMachineFunction(MachineFunction &MF) {
				BlockInfo.resize(MF.getNumBlockIDs());
				const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
				const SIInstrInfo *TII = ST.getInstrInfo();

				// Processing is performed in a number of phases

				// Phase 1 - determine the initial mode required by each block, and add setreg
				// instructions for intra block requirements.
				for (MachineBasicBlock &BB : MF)
				processBlockPhase1(BB, TII);

				// Phase 2 - determine the exit mode from each block. We add all blocks to the
				// list here, but will also add any that need to be revisited during Phase 2
				// processing.
				for (MachineBasicBlock &BB : MF)
				Phase2List.push(&BB);
				while (!Phase2List.empty()) {
				processBlockPhase2(*Phase2List.front(), TII);
				Phase2List.pop();
				}

				// Phase 3 - add an initial setreg to each block where the required entry mode
				// is not satisfied by the exit mode of all its predecessors.
				for (MachineBasicBlock &BB : MF)
				processBlockPhase3(BB, TII);

				BlockInfo.clear();

				return NumSetregInserted > 0;
				}
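
Editorial note: Phase 2 is a worklist propagation over the CFG. The sketch below reproduces that idea on a toy graph so the effect at a join point can be checked by hand. It is not the pass itself. Known, Block, and propagateExitModes are hypothetical names, blocks are identified by index instead of MachineBasicBlock, and the Change values are assumed to have been computed already (Phase 1 in the pass).

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical known-bits record: Mask marks known bits, Mode holds them.
struct Known {
  unsigned Mask, Mode;
  Known(unsigned M = 0, unsigned V = 0) : Mask(M), Mode(V & M) {}
};

// Bits known in both inputs with the same value.
inline Known intersect(Known A, Known B) {
  unsigned M = (A.Mask & B.Mask) & ~(A.Mode ^ B.Mode);
  return Known(M, A.Mode);
}

// B applied on top of A.
inline Known merge(Known A, Known B) {
  return Known(A.Mask | B.Mask, (A.Mode & ~B.Mask) | (B.Mode & B.Mask));
}

struct Block {
  std::vector<int> Preds, Succs; // CFG edges as block indices
  Known Change;                  // net change made by the block
  Known Pred, Exit;              // entry and exit state, filled in below
};

// Compute Pred and Exit for every block; revisit successors whenever a
// block's Exit changes, as the pass does with Phase2List.
inline void propagateExitModes(std::vector<Block> &Blocks, Known Default) {
  std::queue<int> Work;
  for (std::size_t I = 0; I < Blocks.size(); ++I) {
    Blocks[I].Exit = Blocks[I].Change; // initial Exit, as at the end of Phase 1
    Work.push(static_cast<int>(I));
  }
  while (!Work.empty()) {
    Block &B = Blocks[Work.front()];
    Work.pop();
    if (B.Preds.empty()) {
      B.Pred = Default; // function entry: use the default mode
    } else {
      B.Pred = Blocks[B.Preds[0]].Exit;
      for (std::size_t K = 1; K < B.Preds.size(); ++K)
        B.Pred = intersect(B.Pred, Blocks[B.Preds[K]].Exit);
    }
    Known NewExit = merge(B.Pred, B.Change);
    if (NewExit.Mask != B.Exit.Mask || NewExit.Mode != B.Exit.Mode) {
      B.Exit = NewExit;
      for (int S : B.Succs)
        Work.push(S);
    }
  }
}
```

On a diamond CFG where only one arm switches the rounding field, the join block ends up with an empty Pred mask. That is the situation in which Phase 3 has to insert a setreg if the join block has a mode requirement.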

lib/Target/AMDGPU/VOP1Instructions.td


	let SchedRW = [WriteQuarterRate32] in {			let SchedRW = [WriteQuarterRate32] in {
	defm V_CVT_I32_F64 : VOP1Inst <"v_cvt_i32_f64", VOP_I32_F64, fp_to_sint>;			defm V_CVT_I32_F64 : VOP1Inst <"v_cvt_i32_f64", VOP_I32_F64, fp_to_sint>;
	defm V_CVT_F64_I32 : VOP1Inst <"v_cvt_f64_i32", VOP1_F64_I32, sint_to_fp>;			defm V_CVT_F64_I32 : VOP1Inst <"v_cvt_f64_i32", VOP1_F64_I32, sint_to_fp>;
	defm V_CVT_F32_I32 : VOP1Inst <"v_cvt_f32_i32", VOP1_F32_I32, sint_to_fp>;			defm V_CVT_F32_I32 : VOP1Inst <"v_cvt_f32_i32", VOP1_F32_I32, sint_to_fp>;
	defm V_CVT_F32_U32 : VOP1Inst <"v_cvt_f32_u32", VOP1_F32_I32, uint_to_fp>;			defm V_CVT_F32_U32 : VOP1Inst <"v_cvt_f32_u32", VOP1_F32_I32, uint_to_fp>;
	defm V_CVT_U32_F32 : VOP1Inst <"v_cvt_u32_f32", VOP_I32_F32, fp_to_uint>;			defm V_CVT_U32_F32 : VOP1Inst <"v_cvt_u32_f32", VOP_I32_F32, fp_to_uint>;
	defm V_CVT_I32_F32 : VOP1Inst <"v_cvt_i32_f32", VOP_I32_F32, fp_to_sint>;			defm V_CVT_I32_F32 : VOP1Inst <"v_cvt_i32_f32", VOP_I32_F32, fp_to_sint>;
				let FPDPRounding = 1 in {
	defm V_CVT_F16_F32 : VOP1Inst <"v_cvt_f16_f32", VOP_F16_F32, fpround>;			defm V_CVT_F16_F32 : VOP1Inst <"v_cvt_f16_f32", VOP_F16_F32, fpround>;
				} // End FPDPRounding = 1
	defm V_CVT_F32_F16 : VOP1Inst <"v_cvt_f32_f16", VOP_F32_F16, fpextend>;			defm V_CVT_F32_F16 : VOP1Inst <"v_cvt_f32_f16", VOP_F32_F16, fpextend>;
	defm V_CVT_RPI_I32_F32 : VOP1Inst <"v_cvt_rpi_i32_f32", VOP_I32_F32, cvt_rpi_i32_f32>;			defm V_CVT_RPI_I32_F32 : VOP1Inst <"v_cvt_rpi_i32_f32", VOP_I32_F32, cvt_rpi_i32_f32>;
	defm V_CVT_FLR_I32_F32 : VOP1Inst <"v_cvt_flr_i32_f32", VOP_I32_F32, cvt_flr_i32_f32>;			defm V_CVT_FLR_I32_F32 : VOP1Inst <"v_cvt_flr_i32_f32", VOP_I32_F32, cvt_flr_i32_f32>;
	defm V_CVT_OFF_F32_I4 : VOP1Inst <"v_cvt_off_f32_i4", VOP1_F32_I32>;			defm V_CVT_OFF_F32_I4 : VOP1Inst <"v_cvt_off_f32_i4", VOP1_F32_I32>;
	defm V_CVT_F32_F64 : VOP1Inst <"v_cvt_f32_f64", VOP_F32_F64, fpround>;			defm V_CVT_F32_F64 : VOP1Inst <"v_cvt_f32_f64", VOP_F32_F64, fpround>;
	defm V_CVT_F64_F32 : VOP1Inst <"v_cvt_f64_f32", VOP_F64_F32, fpextend>;			defm V_CVT_F64_F32 : VOP1Inst <"v_cvt_f64_f32", VOP_F64_F32, fpextend>;
	defm V_CVT_F32_UBYTE0 : VOP1Inst <"v_cvt_f32_ubyte0", VOP1_F32_I32, AMDGPUcvt_f32_ubyte0>;			defm V_CVT_F32_UBYTE0 : VOP1Inst <"v_cvt_f32_ubyte0", VOP1_F32_I32, AMDGPUcvt_f32_ubyte0>;
	defm V_CVT_F32_UBYTE1 : VOP1Inst <"v_cvt_f32_ubyte1", VOP1_F32_I32, AMDGPUcvt_f32_ubyte1>;			defm V_CVT_F32_UBYTE1 : VOP1Inst <"v_cvt_f32_ubyte1", VOP1_F32_I32, AMDGPUcvt_f32_ubyte1>;
	Show All 36 Lines
	defm V_BFREV_B32 : VOP1Inst <"v_bfrev_b32", VOP_I32_I32>;			defm V_BFREV_B32 : VOP1Inst <"v_bfrev_b32", VOP_I32_I32>;
	defm V_FFBH_U32 : VOP1Inst <"v_ffbh_u32", VOP_I32_I32>;			defm V_FFBH_U32 : VOP1Inst <"v_ffbh_u32", VOP_I32_I32>;
	defm V_FFBL_B32 : VOP1Inst <"v_ffbl_b32", VOP_I32_I32>;			defm V_FFBL_B32 : VOP1Inst <"v_ffbl_b32", VOP_I32_I32>;
	defm V_FFBH_I32 : VOP1Inst <"v_ffbh_i32", VOP_I32_I32>;			defm V_FFBH_I32 : VOP1Inst <"v_ffbh_i32", VOP_I32_I32>;

	let SchedRW = [WriteDoubleAdd] in {			let SchedRW = [WriteDoubleAdd] in {
	defm V_FREXP_EXP_I32_F64 : VOP1Inst <"v_frexp_exp_i32_f64", VOP_I32_F64, int_amdgcn_frexp_exp>;			defm V_FREXP_EXP_I32_F64 : VOP1Inst <"v_frexp_exp_i32_f64", VOP_I32_F64, int_amdgcn_frexp_exp>;
	defm V_FREXP_MANT_F64 : VOP1Inst <"v_frexp_mant_f64", VOP_F64_F64, int_amdgcn_frexp_mant>;			defm V_FREXP_MANT_F64 : VOP1Inst <"v_frexp_mant_f64", VOP_F64_F64, int_amdgcn_frexp_mant>;
				let FPDPRounding = 1 in {
	defm V_FRACT_F64 : VOP1Inst <"v_fract_f64", VOP_F64_F64, AMDGPUfract>;			defm V_FRACT_F64 : VOP1Inst <"v_fract_f64", VOP_F64_F64, AMDGPUfract>;
				} // End FPDPRounding = 1
	} // End SchedRW = [WriteDoubleAdd]			} // End SchedRW = [WriteDoubleAdd]

	defm V_FREXP_EXP_I32_F32 : VOP1Inst <"v_frexp_exp_i32_f32", VOP_I32_F32, int_amdgcn_frexp_exp>;			defm V_FREXP_EXP_I32_F32 : VOP1Inst <"v_frexp_exp_i32_f32", VOP_I32_F32, int_amdgcn_frexp_exp>;
	defm V_FREXP_MANT_F32 : VOP1Inst <"v_frexp_mant_f32", VOP_F32_F32, int_amdgcn_frexp_mant>;			defm V_FREXP_MANT_F32 : VOP1Inst <"v_frexp_mant_f32", VOP_F32_F32, int_amdgcn_frexp_mant>;

	let VOPAsmPrefer32Bit = 1 in {			let VOPAsmPrefer32Bit = 1 in {
	defm V_CLREXCP : VOP1Inst <"v_clrexcp", VOP_NO_EXT<VOP_NONE>>;			defm V_CLREXCP : VOP1Inst <"v_clrexcp", VOP_NO_EXT<VOP_NONE>>;
	}			}
	▲ Show 20 Lines • Show All 90 Lines • ▼ Show 20 Lines
	defm V_EXP_LEGACY_F32 : VOP1Inst <"v_exp_legacy_f32", VOP_F32_F32>;			defm V_EXP_LEGACY_F32 : VOP1Inst <"v_exp_legacy_f32", VOP_F32_F32>;
	} // End SchedRW = [WriteQuarterRate32]			} // End SchedRW = [WriteQuarterRate32]

	} // End SubtargetPredicate = isCIVI			} // End SubtargetPredicate = isCIVI


	let SubtargetPredicate = Has16BitInsts in {			let SubtargetPredicate = Has16BitInsts in {

				let FPDPRounding = 1 in {
	defm V_CVT_F16_U16 : VOP1Inst <"v_cvt_f16_u16", VOP1_F16_I16, uint_to_fp>;			defm V_CVT_F16_U16 : VOP1Inst <"v_cvt_f16_u16", VOP1_F16_I16, uint_to_fp>;
	defm V_CVT_F16_I16 : VOP1Inst <"v_cvt_f16_i16", VOP1_F16_I16, sint_to_fp>;			defm V_CVT_F16_I16 : VOP1Inst <"v_cvt_f16_i16", VOP1_F16_I16, sint_to_fp>;
				} // End FPDPRounding = 1
	defm V_CVT_U16_F16 : VOP1Inst <"v_cvt_u16_f16", VOP_I16_F16, fp_to_uint>;			defm V_CVT_U16_F16 : VOP1Inst <"v_cvt_u16_f16", VOP_I16_F16, fp_to_uint>;
	defm V_CVT_I16_F16 : VOP1Inst <"v_cvt_i16_f16", VOP_I16_F16, fp_to_sint>;			defm V_CVT_I16_F16 : VOP1Inst <"v_cvt_i16_f16", VOP_I16_F16, fp_to_sint>;
	let SchedRW = [WriteQuarterRate32] in {			let SchedRW = [WriteQuarterRate32] in {
	defm V_RCP_F16 : VOP1Inst <"v_rcp_f16", VOP_F16_F16, AMDGPUrcp>;			defm V_RCP_F16 : VOP1Inst <"v_rcp_f16", VOP_F16_F16, AMDGPUrcp>;
	defm V_SQRT_F16 : VOP1Inst <"v_sqrt_f16", VOP_F16_F16, fsqrt>;			defm V_SQRT_F16 : VOP1Inst <"v_sqrt_f16", VOP_F16_F16, fsqrt>;
	defm V_RSQ_F16 : VOP1Inst <"v_rsq_f16", VOP_F16_F16, AMDGPUrsq>;			defm V_RSQ_F16 : VOP1Inst <"v_rsq_f16", VOP_F16_F16, AMDGPUrsq>;
	defm V_LOG_F16 : VOP1Inst <"v_log_f16", VOP_F16_F16, flog2>;			defm V_LOG_F16 : VOP1Inst <"v_log_f16", VOP_F16_F16, flog2>;
	defm V_EXP_F16 : VOP1Inst <"v_exp_f16", VOP_F16_F16, fexp2>;			defm V_EXP_F16 : VOP1Inst <"v_exp_f16", VOP_F16_F16, fexp2>;
	defm V_SIN_F16 : VOP1Inst <"v_sin_f16", VOP_F16_F16, AMDGPUsin>;			defm V_SIN_F16 : VOP1Inst <"v_sin_f16", VOP_F16_F16, AMDGPUsin>;
	defm V_COS_F16 : VOP1Inst <"v_cos_f16", VOP_F16_F16, AMDGPUcos>;			defm V_COS_F16 : VOP1Inst <"v_cos_f16", VOP_F16_F16, AMDGPUcos>;
	} // End SchedRW = [WriteQuarterRate32]			} // End SchedRW = [WriteQuarterRate32]
	defm V_FREXP_MANT_F16 : VOP1Inst <"v_frexp_mant_f16", VOP_F16_F16, int_amdgcn_frexp_mant>;			defm V_FREXP_MANT_F16 : VOP1Inst <"v_frexp_mant_f16", VOP_F16_F16, int_amdgcn_frexp_mant>;
	defm V_FREXP_EXP_I16_F16 : VOP1Inst <"v_frexp_exp_i16_f16", VOP_I16_F16, int_amdgcn_frexp_exp>;			defm V_FREXP_EXP_I16_F16 : VOP1Inst <"v_frexp_exp_i16_f16", VOP_I16_F16, int_amdgcn_frexp_exp>;
	defm V_FLOOR_F16 : VOP1Inst <"v_floor_f16", VOP_F16_F16, ffloor>;			defm V_FLOOR_F16 : VOP1Inst <"v_floor_f16", VOP_F16_F16, ffloor>;
	defm V_CEIL_F16 : VOP1Inst <"v_ceil_f16", VOP_F16_F16, fceil>;			defm V_CEIL_F16 : VOP1Inst <"v_ceil_f16", VOP_F16_F16, fceil>;
	defm V_TRUNC_F16 : VOP1Inst <"v_trunc_f16", VOP_F16_F16, ftrunc>;			defm V_TRUNC_F16 : VOP1Inst <"v_trunc_f16", VOP_F16_F16, ftrunc>;
	defm V_RNDNE_F16 : VOP1Inst <"v_rndne_f16", VOP_F16_F16, frint>;			defm V_RNDNE_F16 : VOP1Inst <"v_rndne_f16", VOP_F16_F16, frint>;
				let FPDPRounding = 1 in {
	defm V_FRACT_F16 : VOP1Inst <"v_fract_f16", VOP_F16_F16, AMDGPUfract>;			defm V_FRACT_F16 : VOP1Inst <"v_fract_f16", VOP_F16_F16, AMDGPUfract>;
				} // End FPDPRounding = 1

	}			}

	let OtherPredicates = [Has16BitInsts] in {			let OtherPredicates = [Has16BitInsts] in {

	def : GCNPat<			def : GCNPat<
	(f32 (f16_to_fp i16:$src)),			(f32 (f16_to_fp i16:$src)),
	(V_CVT_F32_F16_e32 $src)			(V_CVT_F32_F16_e32 $src)
	(362 lines not shown)

lib/Target/AMDGPU/VOP2Instructions.td

(534 lines not shown)	class divergent_i64_BinOp <SDPatternOperator Op, Instruction Inst> :
>;		>;

def : divergent_i64_BinOp <and, V_AND_B32_e32>;		def : divergent_i64_BinOp <and, V_AND_B32_e32>;
def : divergent_i64_BinOp <or, V_OR_B32_e32>;		def : divergent_i64_BinOp <or, V_OR_B32_e32>;
def : divergent_i64_BinOp <xor, V_XOR_B32_e32>;		def : divergent_i64_BinOp <xor, V_XOR_B32_e32>;

let SubtargetPredicate = Has16BitInsts in {		let SubtargetPredicate = Has16BitInsts in {

		let FPDPRounding = 1 in {
def V_MADMK_F16 : VOP2_Pseudo <"v_madmk_f16", VOP_MADMK_F16, [], "">;		def V_MADMK_F16 : VOP2_Pseudo <"v_madmk_f16", VOP_MADMK_F16, [], "">;
		defm V_LDEXP_F16 : VOP2Inst <"v_ldexp_f16", VOP_F16_F16_I32, AMDGPUldexp>;
		} // End FPDPRounding = 1

defm V_LSHLREV_B16 : VOP2Inst <"v_lshlrev_b16", VOP_I16_I16_I16>;		defm V_LSHLREV_B16 : VOP2Inst <"v_lshlrev_b16", VOP_I16_I16_I16>;
defm V_LSHRREV_B16 : VOP2Inst <"v_lshrrev_b16", VOP_I16_I16_I16>;		defm V_LSHRREV_B16 : VOP2Inst <"v_lshrrev_b16", VOP_I16_I16_I16>;
defm V_ASHRREV_I16 : VOP2Inst <"v_ashrrev_i16", VOP_I16_I16_I16>;		defm V_ASHRREV_I16 : VOP2Inst <"v_ashrrev_i16", VOP_I16_I16_I16>;
defm V_LDEXP_F16 : VOP2Inst <"v_ldexp_f16", VOP_F16_F16_I32, AMDGPUldexp>;

let isCommutable = 1 in {		let isCommutable = 1 in {
		let FPDPRounding = 1 in {
defm V_ADD_F16 : VOP2Inst <"v_add_f16", VOP_F16_F16_F16, fadd>;		defm V_ADD_F16 : VOP2Inst <"v_add_f16", VOP_F16_F16_F16, fadd>;
defm V_SUB_F16 : VOP2Inst <"v_sub_f16", VOP_F16_F16_F16, fsub>;		defm V_SUB_F16 : VOP2Inst <"v_sub_f16", VOP_F16_F16_F16, fsub>;
defm V_SUBREV_F16 : VOP2Inst <"v_subrev_f16", VOP_F16_F16_F16, null_frag, "v_sub_f16">;		defm V_SUBREV_F16 : VOP2Inst <"v_subrev_f16", VOP_F16_F16_F16, null_frag, "v_sub_f16">;
defm V_MUL_F16 : VOP2Inst <"v_mul_f16", VOP_F16_F16_F16, fmul>;		defm V_MUL_F16 : VOP2Inst <"v_mul_f16", VOP_F16_F16_F16, fmul>;
def V_MADAK_F16 : VOP2_Pseudo <"v_madak_f16", VOP_MADAK_F16, [], "">;		def V_MADAK_F16 : VOP2_Pseudo <"v_madak_f16", VOP_MADAK_F16, [], "">;
		} // End FPDPRounding = 1
defm V_ADD_U16 : VOP2Inst <"v_add_u16", VOP_I16_I16_I16>;		defm V_ADD_U16 : VOP2Inst <"v_add_u16", VOP_I16_I16_I16>;
defm V_SUB_U16 : VOP2Inst <"v_sub_u16" , VOP_I16_I16_I16>;		defm V_SUB_U16 : VOP2Inst <"v_sub_u16" , VOP_I16_I16_I16>;
defm V_SUBREV_U16 : VOP2Inst <"v_subrev_u16", VOP_I16_I16_I16, null_frag, "v_sub_u16">;		defm V_SUBREV_U16 : VOP2Inst <"v_subrev_u16", VOP_I16_I16_I16, null_frag, "v_sub_u16">;
defm V_MUL_LO_U16 : VOP2Inst <"v_mul_lo_u16", VOP_I16_I16_I16>;		defm V_MUL_LO_U16 : VOP2Inst <"v_mul_lo_u16", VOP_I16_I16_I16>;
defm V_MAX_F16 : VOP2Inst <"v_max_f16", VOP_F16_F16_F16, fmaxnum_like>;		defm V_MAX_F16 : VOP2Inst <"v_max_f16", VOP_F16_F16_F16, fmaxnum_like>;
defm V_MIN_F16 : VOP2Inst <"v_min_f16", VOP_F16_F16_F16, fminnum_like>;		defm V_MIN_F16 : VOP2Inst <"v_min_f16", VOP_F16_F16_F16, fminnum_like>;
defm V_MAX_U16 : VOP2Inst <"v_max_u16", VOP_I16_I16_I16>;		defm V_MAX_U16 : VOP2Inst <"v_max_u16", VOP_I16_I16_I16>;
defm V_MAX_I16 : VOP2Inst <"v_max_i16", VOP_I16_I16_I16>;		defm V_MAX_I16 : VOP2Inst <"v_max_i16", VOP_I16_I16_I16>;
(475 lines not shown)

lib/Target/AMDGPU/VOP3Instructions.td

(214 lines not shown)	def VOP3b_I64_I1_I32_I32_I64 : VOPProfile<[i64, i32, i32, i64]> {
let Outs64 = (outs DstRC:$vdst, SReg_64:$sdst);		let Outs64 = (outs DstRC:$vdst, SReg_64:$sdst);
let Asm64 = " $vdst, $sdst, $src0, $src1, $src2$clamp";		let Asm64 = " $vdst, $sdst, $src0, $src1, $src2$clamp";
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// VOP3 INTERP		// VOP3 INTERP
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

class VOP3Interp<string OpName, VOPProfile P> : VOP3_Pseudo<OpName, P> {		class VOP3Interp<string OpName, VOPProfile P, list<dag> pattern = []> :
		VOP3_Pseudo<OpName, P, pattern> {
let AsmMatchConverter = "cvtVOP3Interp";		let AsmMatchConverter = "cvtVOP3Interp";
}		}

def VOP3_INTERP : VOPProfile<[f32, f32, i32, untyped]> {		def VOP3_INTERP : VOPProfile<[f32, f32, i32, untyped]> {
let Ins64 = (ins Src0Mod:$src0_modifiers, VRegSrc_32:$src0,		let Ins64 = (ins Src0Mod:$src0_modifiers, VRegSrc_32:$src0,
Attr:$attr, AttrChan:$attrchan,		Attr:$attr, AttrChan:$attrchan,
clampmod:$clamp, omod:$omod);		clampmod:$clamp, omod:$omod);

(55 lines not shown)
def V_MAD_LEGACY_F32 : VOP3Inst <"v_mad_legacy_f32", VOP3_Profile<VOP_F32_F32_F32_F32>>;		def V_MAD_LEGACY_F32 : VOP3Inst <"v_mad_legacy_f32", VOP3_Profile<VOP_F32_F32_F32_F32>>;
def V_MAD_F32 : VOP3Inst <"v_mad_f32", VOP3_Profile<VOP_F32_F32_F32_F32>, fmad>;		def V_MAD_F32 : VOP3Inst <"v_mad_f32", VOP3_Profile<VOP_F32_F32_F32_F32>, fmad>;
def V_MAD_I32_I24 : VOP3Inst <"v_mad_i32_i24", VOP3_Profile<VOP_I32_I32_I32_I32, VOP3_CLAMP>>;		def V_MAD_I32_I24 : VOP3Inst <"v_mad_i32_i24", VOP3_Profile<VOP_I32_I32_I32_I32, VOP3_CLAMP>>;
def V_MAD_U32_U24 : VOP3Inst <"v_mad_u32_u24", VOP3_Profile<VOP_I32_I32_I32_I32, VOP3_CLAMP>>;		def V_MAD_U32_U24 : VOP3Inst <"v_mad_u32_u24", VOP3_Profile<VOP_I32_I32_I32_I32, VOP3_CLAMP>>;
def V_FMA_F32 : VOP3Inst <"v_fma_f32", VOP3_Profile<VOP_F32_F32_F32_F32>, fma>;		def V_FMA_F32 : VOP3Inst <"v_fma_f32", VOP3_Profile<VOP_F32_F32_F32_F32>, fma>;
def V_LERP_U8 : VOP3Inst <"v_lerp_u8", VOP3_Profile<VOP_I32_I32_I32_I32>, int_amdgcn_lerp>;		def V_LERP_U8 : VOP3Inst <"v_lerp_u8", VOP3_Profile<VOP_I32_I32_I32_I32>, int_amdgcn_lerp>;

let SchedRW = [WriteDoubleAdd] in {		let SchedRW = [WriteDoubleAdd] in {
		let FPDPRounding = 1 in {
def V_FMA_F64 : VOP3Inst <"v_fma_f64", VOP3_Profile<VOP_F64_F64_F64_F64>, fma>;		def V_FMA_F64 : VOP3Inst <"v_fma_f64", VOP3_Profile<VOP_F64_F64_F64_F64>, fma>;
def V_ADD_F64 : VOP3Inst <"v_add_f64", VOP3_Profile<VOP_F64_F64_F64>, fadd, 1>;		def V_ADD_F64 : VOP3Inst <"v_add_f64", VOP3_Profile<VOP_F64_F64_F64>, fadd, 1>;
def V_MUL_F64 : VOP3Inst <"v_mul_f64", VOP3_Profile<VOP_F64_F64_F64>, fmul, 1>;		def V_MUL_F64 : VOP3Inst <"v_mul_f64", VOP3_Profile<VOP_F64_F64_F64>, fmul, 1>;
		} // End FPDPRounding = 1
def V_MIN_F64 : VOP3Inst <"v_min_f64", VOP3_Profile<VOP_F64_F64_F64>, fminnum_like, 1>;		def V_MIN_F64 : VOP3Inst <"v_min_f64", VOP3_Profile<VOP_F64_F64_F64>, fminnum_like, 1>;
def V_MAX_F64 : VOP3Inst <"v_max_f64", VOP3_Profile<VOP_F64_F64_F64>, fmaxnum_like, 1>;		def V_MAX_F64 : VOP3Inst <"v_max_f64", VOP3_Profile<VOP_F64_F64_F64>, fmaxnum_like, 1>;
} // End SchedRW = [WriteDoubleAdd]		} // End SchedRW = [WriteDoubleAdd]

let SchedRW = [WriteQuarterRate32] in {		let SchedRW = [WriteQuarterRate32] in {
def V_MUL_LO_U32 : VOP3Inst <"v_mul_lo_u32", VOP3_Profile<VOP_I32_I32_I32>>;		def V_MUL_LO_U32 : VOP3Inst <"v_mul_lo_u32", VOP3_Profile<VOP_I32_I32_I32>>;
def V_MUL_HI_U32 : VOP3Inst <"v_mul_hi_u32", VOP3_Profile<VOP_I32_I32_I32>, mulhu>;		def V_MUL_HI_U32 : VOP3Inst <"v_mul_hi_u32", VOP3_Profile<VOP_I32_I32_I32>, mulhu>;
def V_MUL_LO_I32 : VOP3Inst <"v_mul_lo_i32", VOP3_Profile<VOP_I32_I32_I32>>;		def V_MUL_LO_I32 : VOP3Inst <"v_mul_lo_i32", VOP3_Profile<VOP_I32_I32_I32>>;
(13 lines not shown)
// v_div_fmas_f64:		// v_div_fmas_f64:
// result = src0 * src1 + src2		// result = src0 * src1 + src2
// if (vcc)		// if (vcc)
// result *= 2^64		// result *= 2^64
//		//
def V_DIV_FMAS_F64 : VOP3_Pseudo <"v_div_fmas_f64", VOP_F64_F64_F64_F64_VCC,		def V_DIV_FMAS_F64 : VOP3_Pseudo <"v_div_fmas_f64", VOP_F64_F64_F64_F64_VCC,
getVOP3VCC<VOP_F64_F64_F64_F64_VCC, AMDGPUdiv_fmas>.ret> {		getVOP3VCC<VOP_F64_F64_F64_F64_VCC, AMDGPUdiv_fmas>.ret> {
let SchedRW = [WriteDouble];		let SchedRW = [WriteDouble];
		let FPDPRounding = 1;
}		}
} // End Uses = [VCC, EXEC]		} // End Uses = [VCC, EXEC]

} // End isCommutable = 1		} // End isCommutable = 1

def V_CUBEID_F32 : VOP3Inst <"v_cubeid_f32", VOP3_Profile<VOP_F32_F32_F32_F32>, int_amdgcn_cubeid>;		def V_CUBEID_F32 : VOP3Inst <"v_cubeid_f32", VOP3_Profile<VOP_F32_F32_F32_F32>, int_amdgcn_cubeid>;
def V_CUBESC_F32 : VOP3Inst <"v_cubesc_f32", VOP3_Profile<VOP_F32_F32_F32_F32>, int_amdgcn_cubesc>;		def V_CUBESC_F32 : VOP3Inst <"v_cubesc_f32", VOP3_Profile<VOP_F32_F32_F32_F32>, int_amdgcn_cubesc>;
def V_CUBETC_F32 : VOP3Inst <"v_cubetc_f32", VOP3_Profile<VOP_F32_F32_F32_F32>, int_amdgcn_cubetc>;		def V_CUBETC_F32 : VOP3Inst <"v_cubetc_f32", VOP3_Profile<VOP_F32_F32_F32_F32>, int_amdgcn_cubetc>;
(14 lines not shown)
def V_MED3_U32 : VOP3Inst <"v_med3_u32", VOP3_Profile<VOP_I32_I32_I32_I32>, AMDGPUumed3>;		def V_MED3_U32 : VOP3Inst <"v_med3_u32", VOP3_Profile<VOP_I32_I32_I32_I32>, AMDGPUumed3>;
def V_SAD_U8 : VOP3Inst <"v_sad_u8", VOP3_Profile<VOP_I32_I32_I32_I32, VOP3_CLAMP>>;		def V_SAD_U8 : VOP3Inst <"v_sad_u8", VOP3_Profile<VOP_I32_I32_I32_I32, VOP3_CLAMP>>;
def V_SAD_HI_U8 : VOP3Inst <"v_sad_hi_u8", VOP3_Profile<VOP_I32_I32_I32_I32, VOP3_CLAMP>>;		def V_SAD_HI_U8 : VOP3Inst <"v_sad_hi_u8", VOP3_Profile<VOP_I32_I32_I32_I32, VOP3_CLAMP>>;
def V_SAD_U16 : VOP3Inst <"v_sad_u16", VOP3_Profile<VOP_I32_I32_I32_I32, VOP3_CLAMP>>;		def V_SAD_U16 : VOP3Inst <"v_sad_u16", VOP3_Profile<VOP_I32_I32_I32_I32, VOP3_CLAMP>>;
def V_SAD_U32 : VOP3Inst <"v_sad_u32", VOP3_Profile<VOP_I32_I32_I32_I32, VOP3_CLAMP>>;		def V_SAD_U32 : VOP3Inst <"v_sad_u32", VOP3_Profile<VOP_I32_I32_I32_I32, VOP3_CLAMP>>;
def V_CVT_PK_U8_F32 : VOP3Inst<"v_cvt_pk_u8_f32", VOP3_Profile<VOP_I32_F32_I32_I32>, int_amdgcn_cvt_pk_u8_f32>;		def V_CVT_PK_U8_F32 : VOP3Inst<"v_cvt_pk_u8_f32", VOP3_Profile<VOP_I32_F32_I32_I32>, int_amdgcn_cvt_pk_u8_f32>;
def V_DIV_FIXUP_F32 : VOP3Inst <"v_div_fixup_f32", VOP3_Profile<VOP_F32_F32_F32_F32>, AMDGPUdiv_fixup>;		def V_DIV_FIXUP_F32 : VOP3Inst <"v_div_fixup_f32", VOP3_Profile<VOP_F32_F32_F32_F32>, AMDGPUdiv_fixup>;

let SchedRW = [WriteDoubleAdd] in {		let SchedRW = [WriteDoubleAdd], FPDPRounding = 1 in {
def V_DIV_FIXUP_F64 : VOP3Inst <"v_div_fixup_f64", VOP3_Profile<VOP_F64_F64_F64_F64>, AMDGPUdiv_fixup>;		def V_DIV_FIXUP_F64 : VOP3Inst <"v_div_fixup_f64", VOP3_Profile<VOP_F64_F64_F64_F64>, AMDGPUdiv_fixup>;
def V_LDEXP_F64 : VOP3Inst <"v_ldexp_f64", VOP3_Profile<VOP_F64_F64_I32>, AMDGPUldexp, 1>;		def V_LDEXP_F64 : VOP3Inst <"v_ldexp_f64", VOP3_Profile<VOP_F64_F64_I32>, AMDGPUldexp, 1>;
} // End SchedRW = [WriteDoubleAdd]		} // End SchedRW = [WriteDoubleAdd], FPDPRounding = 1

def V_DIV_SCALE_F32 : VOP3_Pseudo <"v_div_scale_f32", VOP3b_F32_I1_F32_F32_F32, [], 1> {		def V_DIV_SCALE_F32 : VOP3_Pseudo <"v_div_scale_f32", VOP3b_F32_I1_F32_F32_F32, [], 1> {
let SchedRW = [WriteFloatFMA, WriteSALU];		let SchedRW = [WriteFloatFMA, WriteSALU];
let AsmMatchConverter = "";		let AsmMatchConverter = "";
}		}

// Double precision division pre-scale.		// Double precision division pre-scale.
def V_DIV_SCALE_F64 : VOP3_Pseudo <"v_div_scale_f64", VOP3b_F64_I1_F64_F64_F64, [], 1> {		def V_DIV_SCALE_F64 : VOP3_Pseudo <"v_div_scale_f64", VOP3b_F64_I1_F64_F64_F64, [], 1> {
let SchedRW = [WriteDouble, WriteSALU];		let SchedRW = [WriteDouble, WriteSALU];
let AsmMatchConverter = "";		let AsmMatchConverter = "";
		let FPDPRounding = 1;
}		}

def V_MSAD_U8 : VOP3Inst <"v_msad_u8", VOP3_Profile<VOP_I32_I32_I32_I32, VOP3_CLAMP>>;		def V_MSAD_U8 : VOP3Inst <"v_msad_u8", VOP3_Profile<VOP_I32_I32_I32_I32, VOP3_CLAMP>>;

let Constraints = "@earlyclobber $vdst" in {		let Constraints = "@earlyclobber $vdst" in {
def V_MQSAD_PK_U16_U8 : VOP3Inst <"v_mqsad_pk_u16_u8", VOP3_Profile<VOP_I64_I64_I32_I64, VOP3_CLAMP>>;		def V_MQSAD_PK_U16_U8 : VOP3Inst <"v_mqsad_pk_u16_u8", VOP3_Profile<VOP_I64_I64_I32_I64, VOP3_CLAMP>>;
} // End Constraints = "@earlyclobber $vdst"		} // End Constraints = "@earlyclobber $vdst"

(47 lines not shown)
} // End SchedRW = [WriteDouble, WriteSALU]		} // End SchedRW = [WriteDouble, WriteSALU]
} // End isCommutable = 1		} // End isCommutable = 1

} // End SubtargetPredicate = isCIVI		} // End SubtargetPredicate = isCIVI


def V_DIV_FIXUP_F16 : VOP3Inst <"v_div_fixup_f16", VOP3_Profile<VOP_F16_F16_F16_F16>, AMDGPUdiv_fixup> {		def V_DIV_FIXUP_F16 : VOP3Inst <"v_div_fixup_f16", VOP3_Profile<VOP_F16_F16_F16_F16>, AMDGPUdiv_fixup> {
let Predicates = [Has16BitInsts, isVIOnly];		let Predicates = [Has16BitInsts, isVIOnly];
		let FPDPRounding = 1;
}		}
def V_DIV_FIXUP_F16_gfx9 : VOP3Inst <"v_div_fixup_f16_gfx9",		def V_DIV_FIXUP_F16_gfx9 : VOP3Inst <"v_div_fixup_f16_gfx9",
VOP3_Profile<VOP_F16_F16_F16_F16, VOP3_OPSEL>, AMDGPUdiv_fixup> {		VOP3_Profile<VOP_F16_F16_F16_F16, VOP3_OPSEL>, AMDGPUdiv_fixup> {
let renamedInGFX9 = 1;		let renamedInGFX9 = 1;
let Predicates = [Has16BitInsts, isGFX9];		let Predicates = [Has16BitInsts, isGFX9];
		let FPDPRounding = 1;
}		}

def V_FMA_F16 : VOP3Inst <"v_fma_f16", VOP3_Profile<VOP_F16_F16_F16_F16>, fma> {		def V_FMA_F16 : VOP3Inst <"v_fma_f16", VOP3_Profile<VOP_F16_F16_F16_F16>, fma> {
let Predicates = [Has16BitInsts, isVIOnly];		let Predicates = [Has16BitInsts, isVIOnly];
		let FPDPRounding = 1;
}		}
def V_FMA_F16_gfx9 : VOP3Inst <"v_fma_f16_gfx9", VOP3_Profile<VOP_F16_F16_F16_F16, VOP3_OPSEL>, fma> {		def V_FMA_F16_gfx9 : VOP3Inst <"v_fma_f16_gfx9", VOP3_Profile<VOP_F16_F16_F16_F16, VOP3_OPSEL>, fma> {
let renamedInGFX9 = 1;		let renamedInGFX9 = 1;
let Predicates = [Has16BitInsts, isGFX9];		let Predicates = [Has16BitInsts, isGFX9];
		let FPDPRounding = 1;
}		}

let SubtargetPredicate = Has16BitInsts, isCommutable = 1 in {		let SubtargetPredicate = Has16BitInsts, isCommutable = 1 in {

let renamedInGFX9 = 1 in {		let renamedInGFX9 = 1 in {
def V_MAD_F16 : VOP3Inst <"v_mad_f16", VOP3_Profile<VOP_F16_F16_F16_F16>, fmad>;
def V_MAD_U16 : VOP3Inst <"v_mad_u16", VOP3_Profile<VOP_I16_I16_I16_I16, VOP3_CLAMP>>;		def V_MAD_U16 : VOP3Inst <"v_mad_u16", VOP3_Profile<VOP_I16_I16_I16_I16, VOP3_CLAMP>>;
def V_MAD_I16 : VOP3Inst <"v_mad_i16", VOP3_Profile<VOP_I16_I16_I16_I16, VOP3_CLAMP>>;		def V_MAD_I16 : VOP3Inst <"v_mad_i16", VOP3_Profile<VOP_I16_I16_I16_I16, VOP3_CLAMP>>;
		let FPDPRounding = 1 in {
		def V_MAD_F16 : VOP3Inst <"v_mad_f16", VOP3_Profile<VOP_F16_F16_F16_F16>, fmad>;
		let Uses = [M0, EXEC] in {
def V_INTERP_P2_F16 : VOP3Interp <"v_interp_p2_f16", VOP3_INTERP16<[f16, f32, i32, f32]>>;		def V_INTERP_P2_F16 : VOP3Interp <"v_interp_p2_f16", VOP3_INTERP16<[f16, f32, i32, f32]>>;
}		} // End Uses = [M0, EXEC]
		} // End FPDPRounding = 1
		} // End renamedInGFX9 = 1

let SubtargetPredicate = isGFX9 in {		let SubtargetPredicate = isGFX9 in {
def V_MAD_F16_gfx9 : VOP3Inst <"v_mad_f16_gfx9", VOP3_Profile<VOP_F16_F16_F16_F16, VOP3_OPSEL>>;		def V_MAD_F16_gfx9 : VOP3Inst <"v_mad_f16_gfx9", VOP3_Profile<VOP_F16_F16_F16_F16, VOP3_OPSEL>> {
		let FPDPRounding = 1;
		}
def V_MAD_U16_gfx9 : VOP3Inst <"v_mad_u16_gfx9", VOP3_Profile<VOP_I16_I16_I16_I16, VOP3_OPSEL>>;		def V_MAD_U16_gfx9 : VOP3Inst <"v_mad_u16_gfx9", VOP3_Profile<VOP_I16_I16_I16_I16, VOP3_OPSEL>>;
def V_MAD_I16_gfx9 : VOP3Inst <"v_mad_i16_gfx9", VOP3_Profile<VOP_I16_I16_I16_I16, VOP3_OPSEL>>;		def V_MAD_I16_gfx9 : VOP3Inst <"v_mad_i16_gfx9", VOP3_Profile<VOP_I16_I16_I16_I16, VOP3_OPSEL>>;
def V_INTERP_P2_F16_gfx9 : VOP3Interp <"v_interp_p2_f16_gfx9", VOP3_INTERP16<[f16, f32, i32, f32]>>;		def V_INTERP_P2_F16_gfx9 : VOP3Interp <"v_interp_p2_f16_gfx9", VOP3_INTERP16<[f16, f32, i32, f32]>>;
} // End SubtargetPredicate = isGFX9		} // End SubtargetPredicate = isGFX9

		let Uses = [M0, EXEC], FPDPRounding = 1 in {
def V_INTERP_P1LL_F16 : VOP3Interp <"v_interp_p1ll_f16", VOP3_INTERP16<[f32, f32, i32, untyped]>>;		def V_INTERP_P1LL_F16 : VOP3Interp <"v_interp_p1ll_f16", VOP3_INTERP16<[f32, f32, i32, untyped]>>;
def V_INTERP_P1LV_F16 : VOP3Interp <"v_interp_p1lv_f16", VOP3_INTERP16<[f32, f32, i32, f16]>>;		def V_INTERP_P1LV_F16 : VOP3Interp <"v_interp_p1lv_f16", VOP3_INTERP16<[f32, f32, i32, f16]>>;
		} // End Uses = [M0, EXEC], FPDPRounding = 1

} // End SubtargetPredicate = Has16BitInsts, isCommutable = 1		} // End SubtargetPredicate = Has16BitInsts, isCommutable = 1

let SubtargetPredicate = isVI in {		let SubtargetPredicate = isVI in {
def V_INTERP_P1_F32_e64 : VOP3Interp <"v_interp_p1_f32", VOP3_INTERP>;		def V_INTERP_P1_F32_e64 : VOP3Interp <"v_interp_p1_f32", VOP3_INTERP>;
def V_INTERP_P2_F32_e64 : VOP3Interp <"v_interp_p2_f32", VOP3_INTERP>;		def V_INTERP_P2_F32_e64 : VOP3Interp <"v_interp_p2_f32", VOP3_INTERP>;
def V_INTERP_MOV_F32_e64 : VOP3Interp <"v_interp_mov_f32", VOP3_INTERP_MOV>;		def V_INTERP_MOV_F32_e64 : VOP3Interp <"v_interp_mov_f32", VOP3_INTERP_MOV>;

(318 lines not shown)

defm V_MAD_F16 : VOP3_F16_Real_vi <0x1ea>;		defm V_MAD_F16 : VOP3_F16_Real_vi <0x1ea>;
defm V_MAD_U16 : VOP3_F16_Real_vi <0x1eb>;		defm V_MAD_U16 : VOP3_F16_Real_vi <0x1eb>;
defm V_MAD_I16 : VOP3_F16_Real_vi <0x1ec>;		defm V_MAD_I16 : VOP3_F16_Real_vi <0x1ec>;
defm V_FMA_F16 : VOP3_F16_Real_vi <0x1ee>;		defm V_FMA_F16 : VOP3_F16_Real_vi <0x1ee>;
defm V_DIV_FIXUP_F16 : VOP3_F16_Real_vi <0x1ef>;		defm V_DIV_FIXUP_F16 : VOP3_F16_Real_vi <0x1ef>;
defm V_INTERP_P2_F16 : VOP3Interp_F16_Real_vi <0x276>;		defm V_INTERP_P2_F16 : VOP3Interp_F16_Real_vi <0x276>;

		let FPDPRounding = 1 in {
defm V_MAD_LEGACY_F16 : VOP3_F16_Real_gfx9 <0x1ea, "V_MAD_F16", "v_mad_legacy_f16">;		defm V_MAD_LEGACY_F16 : VOP3_F16_Real_gfx9 <0x1ea, "V_MAD_F16", "v_mad_legacy_f16">;
defm V_MAD_LEGACY_U16 : VOP3_F16_Real_gfx9 <0x1eb, "V_MAD_U16", "v_mad_legacy_u16">;
defm V_MAD_LEGACY_I16 : VOP3_F16_Real_gfx9 <0x1ec, "V_MAD_I16", "v_mad_legacy_i16">;
defm V_FMA_LEGACY_F16 : VOP3_F16_Real_gfx9 <0x1ee, "V_FMA_F16", "v_fma_legacy_f16">;		defm V_FMA_LEGACY_F16 : VOP3_F16_Real_gfx9 <0x1ee, "V_FMA_F16", "v_fma_legacy_f16">;
defm V_DIV_FIXUP_LEGACY_F16 : VOP3_F16_Real_gfx9 <0x1ef, "V_DIV_FIXUP_F16", "v_div_fixup_legacy_f16">;		defm V_DIV_FIXUP_LEGACY_F16 : VOP3_F16_Real_gfx9 <0x1ef, "V_DIV_FIXUP_F16", "v_div_fixup_legacy_f16">;
defm V_INTERP_P2_LEGACY_F16 : VOP3Interp_F16_Real_gfx9 <0x276, "V_INTERP_P2_F16", "v_interp_p2_legacy_f16">;		defm V_INTERP_P2_LEGACY_F16 : VOP3Interp_F16_Real_gfx9 <0x276, "V_INTERP_P2_F16", "v_interp_p2_legacy_f16">;
		} // End FPDPRounding = 1

		defm V_MAD_LEGACY_U16 : VOP3_F16_Real_gfx9 <0x1eb, "V_MAD_U16", "v_mad_legacy_u16">;
		defm V_MAD_LEGACY_I16 : VOP3_F16_Real_gfx9 <0x1ec, "V_MAD_I16", "v_mad_legacy_i16">;

defm V_MAD_F16_gfx9 : VOP3OpSel_F16_Real_gfx9 <0x203, "v_mad_f16">;		defm V_MAD_F16_gfx9 : VOP3OpSel_F16_Real_gfx9 <0x203, "v_mad_f16">;
defm V_MAD_U16_gfx9 : VOP3OpSel_F16_Real_gfx9 <0x204, "v_mad_u16">;		defm V_MAD_U16_gfx9 : VOP3OpSel_F16_Real_gfx9 <0x204, "v_mad_u16">;
defm V_MAD_I16_gfx9 : VOP3OpSel_F16_Real_gfx9 <0x205, "v_mad_i16">;		defm V_MAD_I16_gfx9 : VOP3OpSel_F16_Real_gfx9 <0x205, "v_mad_i16">;
defm V_FMA_F16_gfx9 : VOP3OpSel_F16_Real_gfx9 <0x206, "v_fma_f16">;		defm V_FMA_F16_gfx9 : VOP3OpSel_F16_Real_gfx9 <0x206, "v_fma_f16">;
defm V_DIV_FIXUP_F16_gfx9 : VOP3OpSel_F16_Real_gfx9 <0x207, "v_div_fixup_f16">;		defm V_DIV_FIXUP_F16_gfx9 : VOP3OpSel_F16_Real_gfx9 <0x207, "v_div_fixup_f16">;
defm V_INTERP_P2_F16_gfx9 : VOP3Interp_F16_Real_gfx9 <0x277, "V_INTERP_P2_F16_gfx9", "v_interp_p2_f16">;		defm V_INTERP_P2_F16_gfx9 : VOP3Interp_F16_Real_gfx9 <0x277, "V_INTERP_P2_F16_gfx9", "v_interp_p2_f16">;

(62 lines not shown)

lib/Target/AMDGPU/VOP3PInstructions.td

(36 lines not shown)	class VOP3_VOP3PInst<string OpName, VOPProfile P, bit UseTiedOutput = 0,

let Constraints = !if(UseTiedOutput, "$vdst = $vdst_in", "");		let Constraints = !if(UseTiedOutput, "$vdst = $vdst_in", "");
let DisableEncoding = !if(UseTiedOutput, "$vdst_in", "");		let DisableEncoding = !if(UseTiedOutput, "$vdst_in", "");
let AsmOperands =		let AsmOperands =
" $vdst, $src0_modifiers, $src1_modifiers, $src2_modifiers$op_sel$op_sel_hi$clamp";		" $vdst, $src0_modifiers, $src1_modifiers, $src2_modifiers$op_sel$op_sel_hi$clamp";
}		}

let isCommutable = 1 in {		let isCommutable = 1 in {
def V_PK_FMA_F16 : VOP3PInst<"v_pk_fma_f16", VOP3_Profile<VOP_V2F16_V2F16_V2F16_V2F16>, fma>;
def V_PK_MAD_I16 : VOP3PInst<"v_pk_mad_i16", VOP3_Profile<VOP_V2I16_V2I16_V2I16_V2I16>>;		def V_PK_MAD_I16 : VOP3PInst<"v_pk_mad_i16", VOP3_Profile<VOP_V2I16_V2I16_V2I16_V2I16>>;
def V_PK_MAD_U16 : VOP3PInst<"v_pk_mad_u16", VOP3_Profile<VOP_V2I16_V2I16_V2I16_V2I16>>;		def V_PK_MAD_U16 : VOP3PInst<"v_pk_mad_u16", VOP3_Profile<VOP_V2I16_V2I16_V2I16_V2I16>>;

		let FPDPRounding = 1 in {
		def V_PK_FMA_F16 : VOP3PInst<"v_pk_fma_f16", VOP3_Profile<VOP_V2F16_V2F16_V2F16_V2F16>, fma>;
def V_PK_ADD_F16 : VOP3PInst<"v_pk_add_f16", VOP3_Profile<VOP_V2F16_V2F16_V2F16>, fadd>;		def V_PK_ADD_F16 : VOP3PInst<"v_pk_add_f16", VOP3_Profile<VOP_V2F16_V2F16_V2F16>, fadd>;
def V_PK_MUL_F16 : VOP3PInst<"v_pk_mul_f16", VOP3_Profile<VOP_V2F16_V2F16_V2F16>, fmul>;		def V_PK_MUL_F16 : VOP3PInst<"v_pk_mul_f16", VOP3_Profile<VOP_V2F16_V2F16_V2F16>, fmul>;
		} // End FPDPRounding = 1
def V_PK_MAX_F16 : VOP3PInst<"v_pk_max_f16", VOP3_Profile<VOP_V2F16_V2F16_V2F16>, fmaxnum_like>;		def V_PK_MAX_F16 : VOP3PInst<"v_pk_max_f16", VOP3_Profile<VOP_V2F16_V2F16_V2F16>, fmaxnum_like>;
def V_PK_MIN_F16 : VOP3PInst<"v_pk_min_f16", VOP3_Profile<VOP_V2F16_V2F16_V2F16>, fminnum_like>;		def V_PK_MIN_F16 : VOP3PInst<"v_pk_min_f16", VOP3_Profile<VOP_V2F16_V2F16_V2F16>, fminnum_like>;

def V_PK_ADD_U16 : VOP3PInst<"v_pk_add_u16", VOP3_Profile<VOP_V2I16_V2I16_V2I16>, add>;		def V_PK_ADD_U16 : VOP3PInst<"v_pk_add_u16", VOP3_Profile<VOP_V2I16_V2I16_V2I16>, add>;
def V_PK_ADD_I16 : VOP3PInst<"v_pk_add_i16", VOP3_Profile<VOP_V2I16_V2I16_V2I16>>;		def V_PK_ADD_I16 : VOP3PInst<"v_pk_add_i16", VOP3_Profile<VOP_V2I16_V2I16_V2I16>>;
def V_PK_MUL_LO_U16 : VOP3PInst<"v_pk_mul_lo_u16", VOP3_Profile<VOP_V2I16_V2I16_V2I16>, mul>;		def V_PK_MUL_LO_U16 : VOP3PInst<"v_pk_mul_lo_u16", VOP3_Profile<VOP_V2I16_V2I16_V2I16>, mul>;

def V_PK_MIN_I16 : VOP3PInst<"v_pk_min_i16", VOP3_Profile<VOP_V2I16_V2I16_V2I16>, smin>;		def V_PK_MIN_I16 : VOP3PInst<"v_pk_min_i16", VOP3_Profile<VOP_V2I16_V2I16_V2I16>, smin>;
(73 lines not shown)

let SubtargetPredicate = HasMadMixInsts in {		let SubtargetPredicate = HasMadMixInsts in {
// These are VOP3a-like opcodes which accept no omod.		// These are VOP3a-like opcodes which accept no omod.
// Size of src arguments (16/32) is controlled by op_sel.		// Size of src arguments (16/32) is controlled by op_sel.
// For 16-bit src arguments their location (hi/lo) are controlled by op_sel_hi.		// For 16-bit src arguments their location (hi/lo) are controlled by op_sel_hi.
let isCommutable = 1 in {		let isCommutable = 1 in {
def V_MAD_MIX_F32 : VOP3_VOP3PInst<"v_mad_mix_f32", VOP3_Profile<VOP_F32_F16_F16_F16, VOP3_OPSEL>>;		def V_MAD_MIX_F32 : VOP3_VOP3PInst<"v_mad_mix_f32", VOP3_Profile<VOP_F32_F16_F16_F16, VOP3_OPSEL>>;

		let FPDPRounding = 1 in {
// Clamp modifier is applied after conversion to f16.		// Clamp modifier is applied after conversion to f16.
def V_MAD_MIXLO_F16 : VOP3_VOP3PInst<"v_mad_mixlo_f16", VOP3_Profile<VOP_F16_F16_F16_F16, VOP3_OPSEL>, 1>;		def V_MAD_MIXLO_F16 : VOP3_VOP3PInst<"v_mad_mixlo_f16", VOP3_Profile<VOP_F16_F16_F16_F16, VOP3_OPSEL>, 1>;

let ClampLo = 0, ClampHi = 1 in {		let ClampLo = 0, ClampHi = 1 in {
def V_MAD_MIXHI_F16 : VOP3_VOP3PInst<"v_mad_mixhi_f16", VOP3_Profile<VOP_F16_F16_F16_F16, VOP3_OPSEL>, 1>;		def V_MAD_MIXHI_F16 : VOP3_VOP3PInst<"v_mad_mixhi_f16", VOP3_Profile<VOP_F16_F16_F16_F16, VOP3_OPSEL>, 1>;
}		}
		} // End FPDPRounding = 1
}		}

defm : MadFmaMixPats<fmad, V_MAD_MIX_F32, V_MAD_MIXLO_F16, V_MAD_MIXHI_F16>;		defm : MadFmaMixPats<fmad, V_MAD_MIX_F32, V_MAD_MIXLO_F16, V_MAD_MIXHI_F16>;
} // End SubtargetPredicate = HasMadMixInsts		} // End SubtargetPredicate = HasMadMixInsts


// Essentially the same as the mad_mix versions		// Essentially the same as the mad_mix versions
let SubtargetPredicate = HasFmaMixInsts in {		let SubtargetPredicate = HasFmaMixInsts in {
let isCommutable = 1 in {		let isCommutable = 1 in {
def V_FMA_MIX_F32 : VOP3_VOP3PInst<"v_fma_mix_f32", VOP3_Profile<VOP_F32_F16_F16_F16, VOP3_OPSEL>>;		def V_FMA_MIX_F32 : VOP3_VOP3PInst<"v_fma_mix_f32", VOP3_Profile<VOP_F32_F16_F16_F16, VOP3_OPSEL>>;

		let FPDPRounding = 1 in {
// Clamp modifier is applied after conversion to f16.		// Clamp modifier is applied after conversion to f16.
def V_FMA_MIXLO_F16 : VOP3_VOP3PInst<"v_fma_mixlo_f16", VOP3_Profile<VOP_F16_F16_F16_F16, VOP3_OPSEL>, 1>;		def V_FMA_MIXLO_F16 : VOP3_VOP3PInst<"v_fma_mixlo_f16", VOP3_Profile<VOP_F16_F16_F16_F16, VOP3_OPSEL>, 1>;

let ClampLo = 0, ClampHi = 1 in {		let ClampLo = 0, ClampHi = 1 in {
def V_FMA_MIXHI_F16 : VOP3_VOP3PInst<"v_fma_mixhi_f16", VOP3_Profile<VOP_F16_F16_F16_F16, VOP3_OPSEL>, 1>;		def V_FMA_MIXHI_F16 : VOP3_VOP3PInst<"v_fma_mixhi_f16", VOP3_Profile<VOP_F16_F16_F16_F16, VOP3_OPSEL>, 1>;
}		}
		} // End FPDPRounding = 1
}		}

defm : MadFmaMixPats<fma, V_FMA_MIX_F32, V_FMA_MIXLO_F16, V_FMA_MIXHI_F16>;		defm : MadFmaMixPats<fma, V_FMA_MIX_F32, V_FMA_MIXLO_F16, V_FMA_MIXHI_F16>;
}		}

// Defines patterns that extract signed 4bit from each Idx[0].		// Defines patterns that extract signed 4bit from each Idx[0].
foreach Idx = [[0,28],[4,24],[8,20],[12,16],[16,12],[20,8],[24,4]] in		foreach Idx = [[0,28],[4,24],[8,20],[12,16],[16,12],[20,8],[24,4]] in
def ExtractSigned4bit_#Idx[0] : PatFrag<(ops node:$src),		def ExtractSigned4bit_#Idx[0] : PatFrag<(ops node:$src),
(189 lines not shown)

test/CodeGen/AMDGPU/mode-register.mir

This file was added.
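
A note for readers decoding the CHECK lines in this test: the second operand of S_SETREG_IMM32_B32 (2177 throughout) is the packed hardware-register selector, and the first operand is the value written. The sketch below is not part of the patch. It unpacks the selector assuming the usual AMDGPU hwreg layout (id in bits [5:0], offset in bits [10:6], width minus one in bits [15:11]); the names HwregFields and decodeHwreg are hypothetical and exist only for illustration.

```cpp
#include <cstdint>

// Illustrative only: these names do not appear in the patch.
// Fields packed into the 16-bit selector operand of s_setreg / s_getreg.
struct HwregFields {
  unsigned Id;     // hardware register id (1 is the MODE register)
  unsigned Offset; // index of the first bit written within that register
  unsigned Width;  // number of bits written
};

// Assumed layout: bits [5:0] = id, [10:6] = offset, [15:11] = width - 1.
constexpr HwregFields decodeHwreg(uint16_t Simm16) {
  return HwregFields{Simm16 & 0x3Fu, (Simm16 >> 6) & 0x1Fu,
                     ((Simm16 >> 11) & 0x1Fu) + 1u};
}

// 2177 == 0x881: register 1 (MODE), offset 2, width 2.
static_assert(decodeHwreg(2177).Id == 1, "selects the MODE register");
static_assert(decodeHwreg(2177).Offset == 2, "field starts at bit 2");
static_assert(decodeHwreg(2177).Width == 2, "field is two bits wide");
```

Under that layout, 2177 addresses bits [3:2] of MODE, the double/half precision rounding field. A written value of 3 therefore corresponds to round-toward-zero (the "rtz" in the test comments) and 0 to round-to-nearest-even ("rtn").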

				# RUN: llc -march=amdgcn -mcpu=gfx900 -run-pass si-mode-register %s -o - | FileCheck %s

				---
				# check that the mode is changed to rtz from default rtn for interp f16
				# CHECK-LABEL: name: interp_f16_default
				# CHECK-LABEL: bb.0:
				# CHECK: S_SETREG_IMM32_B32 3, 2177
				# CHECK-NEXT: V_INTERP_P1LL_F16
				# CHECK: S_SETREG_IMM32_B32 0, 2177
				# CHECK-NEXT: V_ADD_F16_e32
				# CHECK-NOT: S_SETREG_IMM32_B32

				name: interp_f16_default

				body: |
				bb.0:
				liveins: $sgpr0, $sgpr1, $sgpr2
				$m0 = S_MOV_B32 killed $sgpr2
				$vgpr0 = V_MOV_B32_e32 killed $sgpr0, implicit $exec, implicit $exec
				$vgpr1 = V_INTERP_P1LL_F16 0, $vgpr0, 2, 1, 0, 0, 0, implicit $m0, implicit $exec
				$vgpr2 = V_MOV_B32_e32 killed $sgpr1, implicit $exec, implicit $exec
				$vgpr0 = V_INTERP_P1LL_F16 0, killed $vgpr0, 2, 1, -1, 0, 0, implicit $m0, implicit $exec
				$vgpr1 = V_INTERP_P2_F16 0, $vgpr2, 2, 1, 0, killed $vgpr1, 0, 0, implicit $m0, implicit $exec
				$vgpr0 = V_INTERP_P2_F16 0, killed $vgpr2, 2, 1, 0, killed $vgpr0, -1, 0, implicit $m0, implicit $exec
				$vgpr0 = V_ADD_F16_e32 killed $vgpr1, killed $vgpr0, implicit $exec
				S_ENDPGM
				...
				---
				# check that the mode is not changed for interp f16 when the mode is already RTZ
				# CHECK-LABEL: name: interp_f16_explicit_rtz
				# CHECK-LABEL: bb.0:
				# CHECK: S_SETREG_IMM32_B32 3, 2177
				# CHECK-NEXT: V_MOV_B32_e32
				# CHECK: S_SETREG_IMM32_B32 0, 2177
				# CHECK-NEXT: V_ADD_F16_e32
				# CHECK-NOT: S_SETREG_IMM32_B32

				name: interp_f16_explicit_rtz

				body: |
				bb.0:
				liveins: $sgpr0, $sgpr1, $sgpr2
				$m0 = S_MOV_B32 killed $sgpr2
				S_SETREG_IMM32_B32 3, 2177
				$vgpr0 = V_MOV_B32_e32 killed $sgpr0, implicit $exec, implicit $exec
				$vgpr1 = V_INTERP_P1LL_F16 0, $vgpr0, 2, 1, 0, 0, 0, implicit $m0, implicit $exec
				$vgpr2 = V_MOV_B32_e32 killed $sgpr1, implicit $exec, implicit $exec
				$vgpr0 = V_INTERP_P1LL_F16 0, killed $vgpr0, 2, 1, -1, 0, 0, implicit $m0, implicit $exec
				$vgpr1 = V_INTERP_P2_F16 0, $vgpr2, 2, 1, 0, killed $vgpr1, 0, 0, implicit $m0, implicit $exec
				$vgpr0 = V_INTERP_P2_F16 0, killed $vgpr2, 2, 1, 0, killed $vgpr0, -1, 0, implicit $m0, implicit $exec
				$vgpr0 = V_ADD_F16_e32 killed $vgpr1, killed $vgpr0, implicit $exec
				S_ENDPGM
				...
				---
				# check that explicit RTN mode change is registered
				# CHECK-LABEL: name: explicit_rtn
				# CHECK-LABEL: bb.0:
				# CHECK: S_SETREG_IMM32_B32 3, 2177
				# CHECK-NEXT: V_INTERP_P1LL_F16
				# CHECK: S_SETREG_IMM32_B32 0, 2177
				# CHECK-NEXT: V_ADD_F16_e32
				# CHECK-NOT: S_SETREG_IMM32_B32

				name: explicit_rtn

				body: |
				bb.0:
				liveins: $sgpr0, $sgpr1, $sgpr2
				$m0 = S_MOV_B32 killed $sgpr2
				$vgpr0 = V_MOV_B32_e32 killed $sgpr0, implicit $exec, implicit $exec
				$vgpr1 = V_INTERP_P1LL_F16 0, $vgpr0, 2, 1, 0, 0, 0, implicit $m0, implicit $exec
				$vgpr2 = V_MOV_B32_e32 killed $sgpr1, implicit $exec, implicit $exec
				$vgpr0 = V_INTERP_P1LL_F16 0, killed $vgpr0, 2, 1, -1, 0, 0, implicit $m0, implicit $exec
				$vgpr1 = V_INTERP_P2_F16 0, $vgpr2, 2, 1, 0, killed $vgpr1, 0, 0, implicit $m0, implicit $exec
				$vgpr0 = V_INTERP_P2_F16 0, killed $vgpr2, 2, 1, 0, killed $vgpr0, -1, 0, implicit $m0, implicit $exec
				S_SETREG_IMM32_B32 0, 2177
				$vgpr0 = V_ADD_F16_e32 killed $vgpr1, killed $vgpr0, implicit $exec
				S_ENDPGM
				...
				---
				# check that the mode is unchanged from RTN for F64 instruction
				# CHECK-LABEL: name: rtn_default
				# CHECK-LABEL: bb.0:
				# CHECK-NOT: S_SETREG_IMM32_B32
				# CHECK: V_FRACT_F64

				name: rtn_default

				body: |
				bb.0:
				liveins: $vgpr1_vgpr2
				$vgpr1_vgpr2 = V_FRACT_F64_e32 killed $vgpr1_vgpr2, implicit $exec
				S_ENDPGM
				...
				---
				# check that the mode is changed from RTZ to RTN for F64 instruction
				# CHECK-LABEL: name: rtn_from_rtz
				# CHECK-LABEL: bb.0:
				# CHECK: S_SETREG_IMM32_B32 3, 2177
				# CHECK-NEXT: S_SETREG_IMM32_B32 0, 2177
				# CHECK-NEXT: V_FRACT_F64
				# CHECK-NOT: S_SETREG_IMM32_B32

				name: rtn_from_rtz

				body: |
				bb.0:
				liveins: $vgpr1_vgpr2
				S_SETREG_IMM32_B32 3, 2177
				$vgpr1_vgpr2 = V_FRACT_F64_e32 killed $vgpr1_vgpr2, implicit $exec
				S_ENDPGM
				...
				---
				# CHECK-LABEL: name: rtz_from_rtn
				# CHECK-LABEL: bb.1:
				# CHECK: S_SETREG_IMM32_B32 3, 2177
				# CHECK-NOT: S_SETREG_IMM32_B32

				name: rtz_from_rtn

				body: |
				bb.0:
				successors: %bb.1
				liveins: $vgpr1_vgpr2
				$vgpr1_vgpr2 = V_FRACT_F64_e32 killed $vgpr1_vgpr2, implicit $exec
				S_BRANCH %bb.1

				bb.1:
				$vgpr1 = V_INTERP_P1LL_F16 0, $vgpr0, 2, 1, 0, 0, 0, implicit $m0, implicit $exec
				S_ENDPGM
				...
				---
				# check that the mode is changed from RTZ to RTN for F64 instruction
				# and back again for remaining interp instruction
				# CHECK-LABEL: name: interp_f16_plus_sqrt_f64
				# CHECK-LABEL: bb.0:
				# CHECK: S_SETREG_IMM32_B32 3, 2177
				# CHECK: V_INTERP_P1LL_F16
				# CHECK: V_INTERP_P1LL_F16
				# CHECK: V_INTERP_P2_F16
				# CHECK: S_SETREG_IMM32_B32 0, 2177
				# CHECK: V_FRACT_F64
				# CHECK: S_SETREG_IMM32_B32 3, 2177
				# CHECK: V_INTERP_P2_F16

				name: interp_f16_plus_sqrt_f64

				body: |
				bb.0:
				liveins: $sgpr0, $sgpr1, $sgpr2, $vgpr3, $vgpr4
				$m0 = S_MOV_B32 killed $sgpr2
				$vgpr0 = V_MOV_B32_e32 $sgpr0, implicit $exec, implicit $exec
				$vgpr1 = V_INTERP_P1LL_F16 0, $vgpr0, 2, 1, 0, 0, 0, implicit $m0, implicit $exec
				$vgpr2 = V_MOV_B32_e32 $sgpr1, implicit $exec, implicit $exec
				$vgpr0 = V_INTERP_P1LL_F16 0, killed $vgpr0, 2, 1, -1, 0, 0, implicit $m0, implicit $exec
				$vgpr1 = V_INTERP_P2_F16 0, $vgpr2, 2, 1, 0, killed $vgpr1, 0, 0, implicit $m0, implicit $exec
				$vgpr3_vgpr4 = V_FRACT_F64_e32 killed $vgpr3_vgpr4, implicit $exec
				$vgpr0 = V_INTERP_P2_F16 0, killed $vgpr2, 2, 1, 0, killed $vgpr0, -1, 0, implicit $m0, implicit $exec
				$vgpr0 = V_ADD_F16_e32 killed $sgpr0, killed $vgpr0, implicit $exec
				S_ENDPGM
				...
				---
				# check that an explicit change to the single precision mode has no effect
				# CHECK-LABEL: name: single_precision_mode_change
				# CHECK-LABEL: bb.0:
				# CHECK: S_SETREG_IMM32_B32 3, 2177
				# CHECK: V_INTERP_P1LL_F16
				# CHECK: V_INTERP_P1LL_F16
				# CHECK: V_INTERP_P2_F16
				# CHECK: S_SETREG_IMM32_B32 0, 2177
				# CHECK: V_FRACT_F64
				# CHECK: S_SETREG_IMM32_B32 3, 2177
				# CHECK: V_INTERP_P2_F16

				name: single_precision_mode_change

				body: |
				bb.0:
				liveins: $sgpr0, $sgpr1, $sgpr2, $vgpr3, $vgpr4
				$m0 = S_MOV_B32 killed $sgpr2
				$vgpr0 = V_MOV_B32_e32 $sgpr0, implicit $exec, implicit $exec
				$vgpr1 = V_INTERP_P1LL_F16 0, $vgpr0, 2, 1, 0, 0, 0, implicit $m0, implicit $exec
				S_SETREG_IMM32_B32 2, 2049
				$vgpr2 = V_MOV_B32_e32 $sgpr1, implicit $exec, implicit $exec
				$vgpr0 = V_INTERP_P1LL_F16 0, killed $vgpr0, 2, 1, -1, 0, 0, implicit $m0, implicit $exec
				$vgpr1 = V_INTERP_P2_F16 0, $vgpr2, 2, 1, 0, killed $vgpr1, 0, 0, implicit $m0, implicit $exec
				$vgpr3_vgpr4 = V_FRACT_F64_e32 killed $vgpr3_vgpr4, implicit $exec
				$vgpr0 = V_INTERP_P2_F16 0, killed $vgpr2, 2, 1, 0, killed $vgpr0, -1, 0, implicit $m0, implicit $exec
				$vgpr0 = V_ADD_F16_e32 killed $sgpr0, killed $vgpr0, implicit $exec
				S_ENDPGM
				...
				---
				# check that mode is propagated back to start of loop - first instruction is RTN but needs
				# setreg as RTZ is set in loop
				# CHECK-LABEL: name: loop
				# CHECK-LABEL: bb.1:
				# CHECK: S_SETREG_IMM32_B32 0, 2177
				# CHECK: V_FRACT_F64
				# CHECK-LABEL: bb.2:
				# CHECK: S_SETREG_IMM32_B32 3, 2177
				# CHECK: V_INTERP_P1LL_F16
				# CHECK-NOT: S_SETREG_IMM32_B32

				name: loop

				body: |
				bb.0:
				liveins: $sgpr0, $sgpr1, $sgpr2, $vgpr3, $vgpr4
				successors: %bb.1
				$m0 = S_MOV_B32 killed $sgpr2
				S_BRANCH %bb.1

				bb.1:
				successors: %bb.2
				$vgpr3_vgpr4 = V_FRACT_F64_e32 killed $vgpr3_vgpr4, implicit $exec
				S_BRANCH %bb.2

				bb.2:
				successors: %bb.1, %bb.3
				$vgpr0 = V_MOV_B32_e32 $sgpr0, implicit $exec, implicit $exec
				$vgpr1 = V_INTERP_P1LL_F16 0, $vgpr0, 2, 1, 0, 0, 0, implicit $m0, implicit $exec
				S_CBRANCH_VCCZ %bb.1, implicit $vcc
				S_BRANCH %bb.3

				bb.3:
				S_ENDPGM
				...
				---
				# two back-edges to same node with different modes
				# CHECK-LABEL: name: double_loop
				# CHECK-NOT: S_SETREG_IMM32_B32
				# CHECK-LABEL: bb.2:
				# CHECK: S_SETREG_IMM32_B32 0, 2177
				# CHECK: V_FRACT_F64_e32
				# CHECK-LABEL: bb.4:
				# CHECK: S_SETREG_IMM32_B32 3, 2177

				name: double_loop

				body: |
				bb.0:
				liveins: $sgpr0, $sgpr1, $sgpr2, $vgpr3, $vgpr4
				successors: %bb.1
				$m0 = S_MOV_B32 killed $sgpr2
				S_BRANCH %bb.1

				bb.1:
				successors: %bb.2
				S_NOP 1
				S_BRANCH %bb.2

				bb.2:
				successors: %bb.1, %bb.3
				$vgpr3_vgpr4 = V_FRACT_F64_e32 killed $vgpr3_vgpr4, implicit $exec
				S_CBRANCH_VCCZ %bb.1, implicit $vcc
				S_BRANCH %bb.3

				bb.3:
				successors: %bb.4
				S_NOP 1
				S_BRANCH %bb.4

				bb.4:
				successors: %bb.5
				S_NOP 1
				S_BRANCH %bb.5

				bb.5:
				successors: %bb.1, %bb.6
				S_SETREG_IMM32_B32 3, 2177
				S_CBRANCH_VCCZ %bb.1, implicit $vcc
				S_BRANCH %bb.6

				bb.6:
				S_ENDPGM
				...
				---
				# check that mode is propagated back to start of loop and through a block that
				# neither sets nor uses the mode.
				# CHECK-LABEL: name: loop_indirect
				# CHECK-NOT: S_SETREG_IMM32_B32
				# CHECK-LABEL: bb.3:
				# CHECK: S_SETREG_IMM32_B32 3, 2177
				# CHECK: V_INTERP_P1LL_F16
				# CHECK-NOT: S_SETREG_IMM32_B32

				name: loop_indirect

				body: |
				bb.0:
				liveins: $sgpr0, $sgpr1, $sgpr2, $vgpr3, $vgpr4
				successors: %bb.1
				$m0 = S_MOV_B32 killed $sgpr2
				S_BRANCH %bb.1

				bb.1:
				successors: %bb.2
				S_NOP 1
				S_BRANCH %bb.2

				bb.2:
				successors: %bb.3
				S_NOP 1
				S_BRANCH %bb.3

				bb.3:
				successors: %bb.1, %bb.4
				$vgpr0 = V_MOV_B32_e32 $sgpr0, implicit $exec, implicit $exec
				$vgpr1 = V_INTERP_P1LL_F16 0, $vgpr0, 2, 1, 0, 0, 0, implicit $m0, implicit $exec
				S_CBRANCH_VCCZ %bb.1, implicit $vcc
				S_BRANCH %bb.4

				bb.4:
				S_ENDPGM
				...
				---
				# check that multiple mode values are propagated to a block that uses the mode
				# CHECK-LABEL: name: multiple_mode_direct
				# CHECK-LABEL: bb.3:
				# CHECK: S_SETREG_IMM32_B32 0, 2177
				# CHECK: V_FRACT_F64_e32
				# CHECK-NOT: S_SETREG_IMM32_B32

				name: multiple_mode_direct

				body: |
				bb.0:
				liveins: $sgpr0, $sgpr1, $sgpr2, $vgpr3, $vgpr4
				successors: %bb.1
				$m0 = S_MOV_B32 killed $sgpr2
				S_BRANCH %bb.1

				bb.1:
				successors: %bb.2, %bb.3
				S_CBRANCH_VCCZ %bb.2, implicit $vcc
				S_BRANCH %bb.3

				bb.2:
				successors: %bb.3
				S_SETREG_IMM32_B32 3, 2177
				S_BRANCH %bb.3

				bb.3:
				successors: %bb.4
				$vgpr3_vgpr4 = V_FRACT_F64_e32 killed $vgpr3_vgpr4, implicit $exec
				S_BRANCH %bb.4

				bb.4:
				S_ENDPGM
				...
				---
				# check that multiple mode values are propagated through a block that neither
				# sets nor uses the mode.
				# CHECK-LABEL: name: multiple_mode_indirect
				# CHECK-LABEL: bb.4:
				# CHECK: S_SETREG_IMM32_B32 0, 2177
				# CHECK: V_FRACT_F64_e32
				# CHECK-NOT: S_SETREG_IMM32_B32

				name: multiple_mode_indirect

				body: |
				bb.0:
				liveins: $sgpr0, $sgpr1, $sgpr2, $vgpr3, $vgpr4
				successors: %bb.1
				$m0 = S_MOV_B32 killed $sgpr2
				S_BRANCH %bb.1

				bb.1:
				successors: %bb.2, %bb.3
				S_CBRANCH_VCCZ %bb.2, implicit $vcc
				S_BRANCH %bb.3

				bb.2:
				successors: %bb.3
				S_SETREG_IMM32_B32 3, 2177
				S_BRANCH %bb.3

				bb.3:
				successors: %bb.4
				S_NOP 1
				S_BRANCH %bb.4

				bb.4:
				successors: %bb.5
				$vgpr3_vgpr4 = V_FRACT_F64_e32 killed $vgpr3_vgpr4, implicit $exec
				S_BRANCH %bb.5

				bb.5:
				S_ENDPGM
				...
				---
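				# check that the mode set in bb.0 is propagated through blocks that neither
				# set nor use the mode, so that no further setreg is needed
				# CHECK-LABEL: name: pass_through_blocks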
				# CHECK-LABEL: bb.0:
				# CHECK: V_FRACT_F64_e32
				# CHECK-NEXT: S_SETREG_IMM32_B32 3, 2177
				# CHECK-NOT: S_SETREG_IMM32_B32

				name: pass_through_blocks

				body: |
				bb.0:
				successors: %bb.1
				liveins: $vgpr1_vgpr2
				$vgpr1_vgpr2 = V_FRACT_F64_e32 killed $vgpr1_vgpr2, implicit $exec
				$vgpr1 = V_INTERP_P1LL_F16 0, $vgpr0, 2, 1, 0, 0, 0, implicit $m0, implicit $exec
				S_BRANCH %bb.1

				bb.1:
				successors: %bb.2
				S_BRANCH %bb.2

				bb.2:
				successors: %bb.3
				S_BRANCH %bb.3

				bb.3:
				successors: %bb.4
				S_BRANCH %bb.4

				bb.4:
				$vgpr1 = V_INTERP_P1LL_F16 0, $vgpr0, 2, 1, 0, 0, 0, implicit $m0, implicit $exec
				S_ENDPGM
				...
				---
				# check that multiple mode values are propagated
				# CHECK-LABEL: name: if_then_else
				# CHECK-LABEL: bb.3:
				# CHECK: S_SETREG_IMM32_B32 0, 2177
				# CHECK: V_FRACT_F64_e32
				# CHECK-NOT: S_SETREG_IMM32_B32

				name: if_then_else

				body: |
				bb.0:
				liveins: $sgpr0, $sgpr1, $sgpr2, $vgpr3, $vgpr4
				successors: %bb.1
				$m0 = S_MOV_B32 killed $sgpr2
				S_BRANCH %bb.1

				bb.1:
				successors: %bb.2, %bb.3
				S_CBRANCH_VCCZ %bb.3, implicit $vcc
				S_BRANCH %bb.2

				bb.2:
				successors: %bb.3
				S_SETREG_IMM32_B32 3, 2177
				S_BRANCH %bb.3

				bb.3:
				successors: %bb.4
				$vgpr3_vgpr4 = V_FRACT_F64_e32 killed $vgpr3_vgpr4, implicit $exec
				S_BRANCH %bb.4

				bb.4:
				S_ENDPGM
				...

Revision Contents

Diff 176135

lib/Target/AMDGPU/AMDGPU.h

lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

lib/Target/AMDGPU/CMakeLists.txt

lib/Target/AMDGPU/SIDefines.h

lib/Target/AMDGPU/SIInstrFormats.td

lib/Target/AMDGPU/SIInstrInfo.h

lib/Target/AMDGPU/SIModeRegister.cpp

lib/Target/AMDGPU/VOP1Instructions.td

lib/Target/AMDGPU/VOP2Instructions.td

lib/Target/AMDGPU/VOP3Instructions.td

lib/Target/AMDGPU/VOP3PInstructions.td

test/CodeGen/AMDGPU/mode-register.mir
