This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

llvm/
  lib/
    CodeGen/
      ExpandPostRAPseudos.cpp
    Target/AMDGPU/
      AMDGPU.h
      AMDGPUTargetMachine.cpp
      CMakeLists.txt
      SIFixVGPRCopies.cpp
      SIInstrInfo.h
      SIInstrInfo.cpp
      SIInstructions.td
      SILowerPredicatedCopies.cpp
  test/CodeGen/AMDGPU/
    branch-folding-implicit-def-subreg.ll
    greedy-global-heuristic.mir
    llc-pipeline.ll
    load-global-i16.ll
    partial-regcopy-and-spill-missed-at-regalloc.ll
    regalloc-fail-unsatisfiable-overlapping-tuple-hints.mir
    regalloc-introduces-copy-sgpr-to-agpr.mir
    sgpr-regalloc-flags.ll
    skip-subreg-copy-from-iswwmcopy-check.mir

Differential D143762

[AMDGPU] Enable whole wave register copy
Closed (Public)

Authored by cdevadas on Feb 10 2023, 9:51 AM.


Details

Reviewers

arsenm
rampitec

Commits

rGb4a62b1fa546: [AMDGPU] Enable whole wave register copy

Summary

So far, we haven't exposed the allocation of whole-wave
registers to regalloc; we hand-picked them for the various
whole-wave-mode operations. With a future patch, we
want the register allocator to allocate them efficiently
instead of relying on the custom pre-allocation pass.

Any live-range split of a virtual register involved in
whole-wave operations requires the COPY introduced by
the split to be performed for all lanes, which the
compiler does not implement yet.

This patch identifies all such copies and manipulates
the exec mask around them to enable all lanes, without
affecting the value of the exec mask elsewhere.
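A toy model may make the problem concrete. On AMDGPU, a vector move writes only the lanes whose bit is set in EXEC, so an ordinary COPY inserted by a live-range split can silently leave the inactive lanes of a whole-wave register stale. The sketch below is plain C++ with no LLVM APIs; `WaveSize`, `VGPR`, `maskedCopy`, and `wholeWaveCopy` are illustrative names, and the save/enable/restore sequence is only an analogy for the exec manipulation the patch inserts around such copies.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy model, not LLVM code: one 32-bit value per lane in a wave64.
constexpr unsigned WaveSize = 64;
using VGPR = std::array<uint32_t, WaveSize>;

// A vector move writes only the lanes enabled in the EXEC mask.
void maskedCopy(VGPR &Dst, const VGPR &Src, uint64_t Exec) {
  for (unsigned Lane = 0; Lane < WaveSize; ++Lane)
    if (Exec & (uint64_t(1) << Lane))
      Dst[Lane] = Src[Lane];
}

// Whole-wave copy: save EXEC, enable every lane, copy, then restore EXEC so
// the mask seen by the surrounding code is unchanged.
void wholeWaveCopy(VGPR &Dst, const VGPR &Src, uint64_t &Exec) {
  const uint64_t Saved = Exec; // analogous to saving EXEC in a scratch SGPR
  Exec = ~uint64_t(0);         // turn on all lanes
  maskedCopy(Dst, Src, Exec);
  Exec = Saved;                // restore the original mask
}
```

With EXEC = 0x1, `maskedCopy` writes only lane 0, while `wholeWaveCopy` writes all 64 lanes and hands the caller back the original mask.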

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

cdevadas created this revision. Feb 10 2023, 9:51 AM

Herald added a project: Restricted Project. Feb 10 2023, 9:51 AM

Herald added subscribers: kosarev, foad, kerbowa and 6 others.

cdevadas requested review of this revision. Feb 10 2023, 9:51 AM

Herald added a project: Restricted Project. Feb 10 2023, 9:51 AM

Herald added subscribers: llvm-commits, wdng.

Harbormaster completed remote builds in B213099: Diff 496531. Feb 10 2023, 9:52 AM

cdevadas added a parent revision: D143759: [AMDGPU] Implement whole wave register spill. Feb 10 2023, 10:00 AM

cdevadas added a child revision: D124196: [AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs.

cdevadas mentioned this in D143754: [MachineInstr] Introduce generic predicated copy opcode. Feb 10 2023, 10:05 AM

cdevadas removed a parent revision: D143759: [AMDGPU] Implement whole wave register spill. May 8 2023, 4:36 AM

cdevadas removed a child revision: D124196: [AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs.

Rebase
Incorporated the downstream code

Harbormaster completed remote builds in B232819: Diff 523320. May 18 2023, 3:48 AM

fix file comment

Harbormaster completed remote builds in B232824: Diff 523325. May 18 2023, 4:05 AM

yassingh added a parent revision: D143759: [AMDGPU] Implement whole wave register spill. May 18 2023, 4:11 AM

yassingh added a child revision: D124196: [AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs.

yassingh mentioned this in D150390: [AMDGPU] Introduce and use the new PRED_COPY opcode. May 18 2023, 4:30 AM

Rebase and merge D150390 ([AMDGPU] Introduce and use the new PRED_COPY opcode) into this.

Herald added subscribers: qcolombet, MatzeB. Jun 6 2023, 5:36 AM

Harbormaster completed remote builds in B236911: Diff 528812. Jun 6 2023, 5:37 AM

yassingh added a parent revision: D152261: [CodeGen] Move lowerCopy from expandPostRA to TII. Jun 6 2023, 5:48 AM

yassingh added inline comments. Jun 6 2023, 5:53 AM

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
1940–1942	I wanted to keep this in the SILowerPredicatedCopies pass, but it wasn't working there; it needed VirtRegRewriter to run before it.

Revert back to PRED_COPY to COPY lowering mechanism
Rebase

Harbormaster completed remote builds in B239964: Diff 532856. Jun 20 2023, 3:34 AM

Review comments (use getWaveMaskRegClass(), getExec() functions)

Harbormaster completed remote builds in B240270: Diff 533295. Jun 21 2023, 9:26 AM

In D143762#4438232, @yassingh wrote:

Review comments (use getWaveMaskRegClass(), getExec() functions)

Accidentally updated this one instead of D143759. Not reverting as the extra diff will go away once it is rebased.

arsenm added inline comments. Jun 21 2023, 9:34 AM

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
8009–8012	Don't need to query TRI, it's already the RI member. You can also just do RI.isSGPRReg. Also, you're not checking the flags? The point was to check the flags and switch to the WWM copy. You don't care about SGPR or VGPR, you care about WWM VGPR or not

Check vreg flag in getLiveRangeSplitOpcode
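Following arsenm's point, the opcode for a split copy is chosen from a per-virtual-register flag rather than from the SGPR/VGPR register class. A minimal sketch of that decision, assuming a hypothetical flag table: `ToyFuncInfo`, `VRegFlag`, and the `Opcode` enum are illustrative stand-ins, not the SIMachineFunctionInfo API.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-ins; the real code queries per-vreg flags kept by the
// target's machine function info.
enum class Opcode { COPY, PRED_COPY }; // PRED_COPY is later renamed WWM_COPY
enum VRegFlag : uint8_t { WWM_REG = 1u << 0 };

struct ToyFuncInfo {
  std::unordered_map<unsigned, uint8_t> Flags; // virtual reg id -> flag bits

  bool checkFlag(unsigned Reg, uint8_t Flag) const {
    auto It = Flags.find(Reg);
    return It != Flags.end() && (It->second & Flag) != 0;
  }
};

// Opcode for the copy inserted by a live-range split: decided by the WWM
// flag, not by whether the register is an SGPR or a VGPR.
Opcode getLiveRangeSplitOpcode(unsigned VReg, const ToyFuncInfo &FI) {
  return FI.checkFlag(VReg, WWM_REG) ? Opcode::PRED_COPY : Opcode::COPY;
}
```

A flagged register gets the special copy; every other register keeps the plain COPY regardless of its register class.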

Harbormaster completed remote builds in B240437: Diff 533516. Jun 22 2023, 2:25 AM

arsenm added inline comments. Jun 22 2023, 10:44 AM

llvm/lib/CodeGen/SplitKit.cpp
567	(On Diff #533516)	You're missing the new opcode here. buildSingleSubRegCopy should gain a new MCInstrDesc & parameter for the opcode to use
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
2424	Regular copy shouldn't reach here?
8009	SrcReg.isVirtual should be implied
llvm/lib/Target/AMDGPU/SILowerPredicatedCopies.cpp
141	Can you early return false if the WWM reg set is empty?
144	continue and reduce indentation
145	Should add a TODO to try to reduce the saveexec/restore exec pairs for adjacent WWM ops
160	Do these need to gain an implicit exec use?
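The structure requested in these comments (an early exit when no WWM registers exist, `continue` to keep the loop flat, and a TODO about merging adjacent save/restore pairs) can be sketched on a toy instruction list. Everything here is hypothetical: `ToyInstr`, `lowerWWMCopies`, and the pseudo-op strings stand in for MachineInstr and the real exec save/restore instructions. In the toy, the WWM register set only drives the early exit; the opcode name identifies the copies to lower.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a MachineInstr: an opcode name and the
// virtual register it defines.
struct ToyInstr {
  std::string Op;
  unsigned Dst = 0;
};

// Rewrites every WWM copy into "save EXEC and enable all lanes", a plain
// COPY, and "restore EXEC". Returns true if the block changed.
bool lowerWWMCopies(std::vector<ToyInstr> &Block,
                    const std::unordered_set<unsigned> &WWMRegs) {
  // Early exit: no register carries the WWM flag, so nothing to lower.
  if (WWMRegs.empty())
    return false;

  bool Changed = false;
  std::vector<ToyInstr> Out;
  for (const ToyInstr &MI : Block) {
    // `continue` keeps the common (non-WWM) path unindented.
    if (MI.Op != "WWM_COPY") {
      Out.push_back(MI);
      continue;
    }
    // TODO (from review): share one save/restore pair across adjacent
    // WWM copies instead of wrapping each copy separately.
    Out.push_back({"SAVE_EXEC_SET_ALL", 0});
    Out.push_back({"COPY", MI.Dst});
    Out.push_back({"RESTORE_EXEC", 0});
    Changed = true;
  }
  Block = std::move(Out);
  return Changed;
}
```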

arsenm requested changes to this revision. Jun 22 2023, 10:44 AM

This revision now requires changes to proceed. Jun 22 2023, 10:44 AM

review comments

Harbormaster completed remote builds in B240730: Diff 533912. Jun 23 2023, 3:20 AM

yassingh added inline comments. Jun 23 2023, 3:22 AM

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
2424	Yes it won't, will remove.
8009	Added an assert.
llvm/lib/Target/AMDGPU/SILowerPredicatedCopies.cpp
141	Done, had to add a new helper function.
160	Do you mean we should add the implicit exec use? In that case, won't SIFixVGPRCopies take care of it?

arsenm added inline comments. Jun 23 2023, 6:12 AM

llvm/lib/Target/AMDGPU/SILowerPredicatedCopies.cpp
160	This isn't something that can be taken care of later. SIFixVGPRCopies is a horribly broken hack, the less we depend on it the better
160	Actually these split copies might have been the original problem which caused it to be added. Maybe we have a way to drop it now?

arsenm added inline comments. Jun 23 2023, 6:14 AM

llvm/lib/Target/AMDGPU/SILowerPredicatedCopies.cpp
160	That was the reasoning given in D28874 when it was added. How about as a next step we make sure all the VGPR splits end up with exec reads

yassingh added inline comments. Jun 23 2023, 7:52 AM

llvm/lib/Target/AMDGPU/SILowerPredicatedCopies.cpp
160	Will look into it, don't have sufficient expertise to answer this right now. Adding @AMDGPU.
160	I can try and see what comes of it. I see two ways to go about it: add the exec operand here itself, or add it in the TableGen definition of PRED_COPY. What do you suggest?

yassingh added a subscriber: Restricted Project. Jun 23 2023, 7:53 AM

arsenm added inline comments. Jun 23 2023, 7:58 AM

llvm/lib/Target/AMDGPU/SILowerPredicatedCopies.cpp
160	Add to the instruction definition. Though we're still stuck with COPY=PRED_COPY 'sometimes'

Rebase

Harbormaster completed remote builds in B241201: Diff 534588. Jun 26 2023, 8:55 AM

cdevadas mentioned this in D150388: [CodeGen]Allow targets to use target specific COPY instructions for live range splitting. Jun 26 2023, 9:29 AM

arsenm added inline comments. Jun 26 2023, 11:05 AM

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
8011	I guess the name is wrong now. It's technically an unpredicated copy. Rename to WWM_COPY or COPY_WWM?
llvm/lib/Target/AMDGPU/SILowerPredicatedCopies.cpp
83	SIInstrInfo
93	Either this check is unnecessary or should have happened earlier to avoid crashing on the checkFlag

cdevadas added inline comments. Jun 26 2023, 11:16 AM

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
8011	Makes sense. I prefer WWM_COPY.
llvm/lib/Target/AMDGPU/SILowerPredicatedCopies.cpp
83	This function is no longer needed. Now the PRED_COPY is inserted only for wwm-regs after we fine-tuned AMDGPU's `getLiveRangeSplitOpcode` implementation.
152	I guess this whole check now can be entirely avoided. PRED_COPY is inserted only for wwm-regs. If we see any such copy, insert the EXEC mask manipulation unconditionally.

Rename PRED_COPY to WWM_COPY
Review comments delete redundant isWWMCopy() check

Harbormaster completed remote builds in B241386: Diff 534849. Jun 26 2023, 11:25 PM

Remove the test llvm/test/CodeGen/AMDGPU/skip-subreg-copy-from-iswwmcopy-check.mir. It is no longer needed as you have dropped the isWWMCopy function.

llvm/include/llvm/CodeGen/TargetInstrInfo.h
1974	(On Diff #534849)	Did you rebase this patch after the prototype change in D150388? This diff shouldn't be here.
llvm/lib/CodeGen/SplitKit.cpp
539	(On Diff #534849)	Ditto
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
2424	Still see the regular COPY here.
llvm/lib/Target/AMDGPU/SILowerWWMCopies.cpp
93	(On Diff #534849)	Remove the VRM check here. We no longer include this pass in the O0 pipeline.
125	(On Diff #534849)	I think this assertion is redundant. We are inserting WWM_COPY only during the live range split.

yassingh added inline comments. Jun 27 2023, 6:29 AM

llvm/lib/Target/AMDGPU/SILowerWWMCopies.cpp
93	(On Diff #534849)	This pass is currently being invoked in O0 too (added while exploring an independent lowering mechanism for PRED_COPY). Will remove that too.

arsenm added inline comments. Jun 27 2023, 10:57 AM

llvm/lib/Target/AMDGPU/SILowerWWMCopies.cpp
129	(On Diff #534849)	avoid DebugLoc copy, use const ref
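The nit is a general C++ point: `MachineInstr::getDebugLoc()` returns a `const DebugLoc &`, and initializing a by-value `DebugLoc` from it makes a copy, which is not free because `DebugLoc` holds a tracked metadata reference. The hypothetical `ToyDebugLoc` below just counts copies to show the difference between the two spellings.

```cpp
#include <cassert>

// Counts how many times a ToyDebugLoc is copied.
int DebugLocCopies = 0;

// Hypothetical stand-in for llvm::DebugLoc.
struct ToyDebugLoc {
  ToyDebugLoc() = default;
  ToyDebugLoc(const ToyDebugLoc &) { ++DebugLocCopies; }
  ToyDebugLoc &operator=(const ToyDebugLoc &) = default;
};

// Like MachineInstr::getDebugLoc(), the accessor returns a const reference.
struct ToyInstr {
  ToyDebugLoc DL;
  const ToyDebugLoc &getDebugLoc() const { return DL; }
};

void byValue(const ToyInstr &MI) {
  ToyDebugLoc DL = MI.getDebugLoc(); // copy constructor runs
  (void)DL;
}

void byConstRef(const ToyInstr &MI) {
  const ToyDebugLoc &DL = MI.getDebugLoc(); // binds to the existing object
  (void)DL;
}
```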

arsenm added inline comments. Jun 27 2023, 10:59 AM

llvm/lib/Target/AMDGPU/SILowerWWMCopies.cpp
144	(On Diff #534849)	In a follow-up, could release the reserved register

Review comments and remove pass from O0.

Reupload

Harbormaster completed remote builds in B241696: Diff 535253. Jun 28 2023, 12:03 AM

Harbormaster completed remote builds in B241698: Diff 535255.

yassingh added inline comments. Jun 28 2023, 12:06 AM

llvm/lib/Target/AMDGPU/SILowerWWMCopies.cpp
144	(On Diff #534849)	I don't understand: since SGPR allocation is already done, do we have to free the reserved SGPRs? Also, can you point me to some other reserved registers so I can see how to release it?

rebase

Harbormaster completed remote builds in B241816: Diff 535429. Jun 28 2023, 8:41 AM

re-upload

Harbormaster completed remote builds in B241817: Diff 535430. Jun 28 2023, 8:43 AM

LGTM with nit

llvm/lib/Target/AMDGPU/SIInstructions.td
3352	This should not be AMDGPUGenericInstruction, it's not a generic instruction. Should subclass from AMDGPUInst, and move to the other non-generic pseudos. AMDGPUGenericInstruction is for the G_AMDGPU_ globalisel pre-isel pseudos

This revision is now accepted and ready to land. Jun 28 2023, 9:03 AM

arsenm added inline comments. Jun 28 2023, 9:04 AM

llvm/lib/Target/AMDGPU/SILowerWWMCopies.cpp
120	(On Diff #535430)	Typos Club adjancent
135	(On Diff #535430)	Missing newline
llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
376	I think I lost track of what was going on with -O0, did this lose the -O0 run?

Also not sure how this has no test changes

arsenm added inline comments. Jun 29 2023, 4:17 AM

llvm/lib/Target/AMDGPU/SIInstructions.td
3359	Should be marked convergent

Review comments

Harbormaster completed remote builds in B243006: Diff 537048. Jul 4 2023, 4:12 AM

In D143762#4456567, @arsenm wrote:

Also not sure how this has no test changes

This patch was originally submitted with 0 tests. My understanding is we won't see any effects of these changes until WWM copies are inserted in code by the next patch (Spill SGPRs to virtual VGPRs).
cc @cdevadas

llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
376	It was introduced when we moved from PRED_COPY "simplification" to proper lowering mechanism. Was removed again after the approach changed back to lowering WWM_COPY to COPY first. I have added the O0 invocation back but it's not doing anything.

In D143762#4471029, @yassingh wrote:

In D143762#4456567, @arsenm wrote:

Also not sure how this has no test changes

This patch was originally submitted with 0 tests. My understanding is we won't see any effects of these changes until WWM copies are inserted in code by the next patch (Spill SGPRs to virtual VGPRs).
cc @cdevadas

Yes. This is only a pre-patch for D124196 where the actual test was included, llvm/test/CodeGen/AMDGPU/whole-wave-register-copy.ll

arsenm added inline comments. Jul 4 2023, 7:04 AM

llvm/lib/Target/AMDGPU/SILowerWWMCopies.cpp
132	(On Diff #537048)	Back to back debug printing prints same instruction, just remove the second one?

Removed extra debug statement

Harbormaster completed remote builds in B243054: Diff 537120. Jul 4 2023, 9:40 AM

cdevadas added inline comments. Jul 4 2023, 9:58 AM

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
1361	I'm still not convinced this is needed in the -O0 flow. By now, VGPR allocation is done in the -O0 flow, and we no longer have any virtual registers. This pass acts on virtual registers to see whether WWM copies need exec manipulation.
llvm/lib/Target/AMDGPU/SILowerWWMCopies.cpp
90	(On Diff #537120)	If this pass is enabled for -O0 as well, put back the !VRM check to ensure an early return from here for the -O0 flow. And maybe a comment too?

Update instruction class of WWM_COPY

Harbormaster completed remote builds in B243429: Diff 537660. Jul 6 2023, 4:19 AM

arsenm added inline comments. Jul 6 2023, 8:30 AM

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
1361	It's conceptually needed and it's an implementation detail of current regalloc fast that these aren't introduced. Plus I think in general we should have other WWM copies for general WWM support in the future

cdevadas added inline comments. Jul 6 2023, 8:46 AM

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
1361	ok
llvm/lib/Target/AMDGPU/SILowerWWMCopies.cpp
111	(On Diff #537660)	Also, do an early return if `MRI.getNumVirtRegs()` is zero. This would avoid iterating the whole function when there are no virtual registers at all (would take care of the -O0 path when physRegs are already assigned).

Rebase before merge

This revision was landed with ongoing or failed builds. Jul 7 2023, 10:29 AM

Closed by commit rGb4a62b1fa546: [AMDGPU] Enable whole wave register copy (authored by cdevadas, committed by yassingh).

This revision was automatically updated to reflect the committed changes.

yassingh added a commit: rGb4a62b1fa546: [AMDGPU] Enable whole wave register copy.

Harbormaster completed remote builds in B243809: Diff 538190. Jul 7 2023, 12:40 PM

vitalybuka added a reverting change: D156381: Revert "[CodeGen]Allow targets to use target specific COPY instructions for live range splitting". Jul 26 2023, 4:00 PM

vitalybuka added a reverting change: rGa496c8be6e63: Revert "[CodeGen]Allow targets to use target specific COPY instructions for…. Jul 26 2023, 10:13 PM

vitalybuka reopened this revision. Jul 26 2023, 10:15 PM

This revision is now accepted and ready to land. Jul 26 2023, 10:15 PM

Patch relanded with 4d42e8b5d1fa87e49768d100dd1bc53515391e89

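Revision Contents

Path	Size
llvm/lib/CodeGen/ExpandPostRAPseudos.cpp	1 line
llvm/lib/Target/AMDGPU/AMDGPU.h	4 lines
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp	5 lines
llvm/lib/Target/AMDGPU/CMakeLists.txt	1 line
llvm/lib/Target/AMDGPU/SIFixVGPRCopies.cpp	1 line
llvm/lib/Target/AMDGPU/SIInstrInfo.h	19 lines
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp	48 lines
llvm/lib/Target/AMDGPU/SIInstructions.td	9 lines
llvm/lib/Target/AMDGPU/SILowerPredicatedCopies.cpp	163 lines
llvm/test/CodeGen/AMDGPU/branch-folding-implicit-def-subreg.ll	5 lines
llvm/test/CodeGen/AMDGPU/greedy-global-heuristic.mir	16 lines
llvm/test/CodeGen/AMDGPU/llc-pipeline.ll	5 lines
llvm/test/CodeGen/AMDGPU/load-global-i16.ll	1 line
llvm/test/CodeGen/AMDGPU/partial-regcopy-and-spill-missed-at-regalloc.ll	28 lines
llvm/test/CodeGen/AMDGPU/regalloc-fail-unsatisfiable-overlapping-tuple-hints.mir	4 lines
llvm/test/CodeGen/AMDGPU/regalloc-introduces-copy-sgpr-to-agpr.mir	1 line
llvm/test/CodeGen/AMDGPU/sgpr-regalloc-flags.ll	5 lines
llvm/test/CodeGen/AMDGPU/skip-subreg-copy-from-iswwmcopy-check.mir	20 lines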

Diff 528812

llvm/lib/CodeGen/ExpandPostRAPseudos.cpp

@@ 41 lines hidden @@ void getAnalysisUsage(AnalysisUsage &AU) const override {
     MachineFunctionPass::getAnalysisUsage(AU);
   }

   /// runOnMachineFunction - pass entry point
   bool runOnMachineFunction(MachineFunction&) override;

 private:
   bool LowerSubregToReg(MachineInstr *MI);

 };
 } // end anonymous namespace

 char ExpandPostRA::ID = 0;
 char &llvm::ExpandPostRAPseudosID = ExpandPostRA::ID;

 INITIALIZE_PASS(ExpandPostRA, DEBUG_TYPE,
                 "Post-RA pseudo instruction expansion pass", false, false)
@@ 103 lines hidden @@

llvm/lib/Target/AMDGPU/AMDGPU.h

@@ 35 lines hidden @@
 FunctionPass *createSILowerI1CopiesPass();
 FunctionPass *createSIShrinkInstructionsPass();
 FunctionPass *createSILoadStoreOptimizerPass();
 FunctionPass *createSIWholeQuadModePass();
 FunctionPass *createSIFixControlFlowLiveIntervalsPass();
 FunctionPass *createSIOptimizeExecMaskingPreRAPass();
 FunctionPass *createSIOptimizeVGPRLiveRangePass();
 FunctionPass *createSIFixSGPRCopiesPass();
+FunctionPass *createLowerPredicatedCopiesPass();
 FunctionPass *createSIMemoryLegalizerPass();
 FunctionPass *createSIInsertWaitcntsPass();
 FunctionPass *createSIPreAllocateWWMRegsPass();
 FunctionPass *createSIFormMemoryClausesPass();

 FunctionPass *createSIPostRABundlerPass();
 FunctionPass *createAMDGPUSimplifyLibCallsPass(const TargetMachine *);
 FunctionPass *createAMDGPUUseNativeCallsPass();
@@ 114 lines hidden @@
 extern char &SIShrinkInstructionsID;

 void initializeSIFixSGPRCopiesPass(PassRegistry &);
 extern char &SIFixSGPRCopiesID;

 void initializeSIFixVGPRCopiesPass(PassRegistry &);
 extern char &SIFixVGPRCopiesID;

+void initializeSILowerPredicatedCopiesPass(PassRegistry &);
+extern char &SILowerPredicatedCopiesID;
+
 void initializeSILowerI1CopiesPass(PassRegistry &);
 extern char &SILowerI1CopiesID;

 void initializeSILowerSGPRSpillsPass(PassRegistry &);
 extern char &SILowerSGPRSpillsID;

 void initializeSILoadStoreOptimizerPass(PassRegistry &);
 extern char &SILoadStoreOptimizerID;
@@ 278 lines hidden @@

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

@@ 355 lines hidden @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeAMDGPUTarget() {
   initializeR600ControlFlowFinalizerPass(*PR);
   initializeR600PacketizerPass(*PR);
   initializeR600ExpandSpecialInstrsPassPass(*PR);
   initializeR600VectorRegMergerPass(*PR);
   initializeGlobalISel(*PR);
   initializeAMDGPUDAGToDAGISelPass(*PR);
   initializeGCNDPPCombinePass(*PR);
   initializeSILowerI1CopiesPass(*PR);
+  initializeSILowerPredicatedCopiesPass(*PR);
   initializeSILowerSGPRSpillsPass(*PR);
   initializeSIFixSGPRCopiesPass(*PR);
   initializeSIFixVGPRCopiesPass(*PR);
   initializeSIFoldOperandsPass(*PR);
   initializeSIPeepholeSDWAPass(*PR);
   initializeSIShrinkInstructionsPass(*PR);
   initializeSIOptimizeExecMaskingPreRAPass(*PR);
   initializeSIOptimizeVGPRLiveRangePass(*PR);
@@ 926 lines hidden @@ void GCNPassConfig::addOptimizedRegAlloc() {

   if (EnableDCEInRA)
     insertPass(&DetectDeadLanesID, &DeadMachineInstructionElimID);

   TargetPassConfig::addOptimizedRegAlloc();
 }

 bool GCNPassConfig::addPreRewrite() {
+  addPass(&SILowerPredicatedCopiesID);
   if (EnableRegReassign)
     addPass(&GCNNSAReassignID);
   return true;
 }

 FunctionPass *GCNPassConfig::createSGPRAllocPass(bool Optimized) {
   // Initialize the global default.
   llvm::call_once(InitializeDefaultSGPRRegisterAllocatorFlag,
@@ 36 lines hidden @@ if (!usingDefaultRegAlloc())
     report_fatal_error(RegAllocOptNotSupportedMessage);

   addPass(createSGPRAllocPass(false));

   // Equivalent of PEI for SGPRs.
   addPass(&SILowerSGPRSpillsID);

   addPass(createVGPRAllocPass(false));
+
+  addPass(&SILowerPredicatedCopiesID);
+
   return true;
 }

 bool GCNPassConfig::addRegAssignAndRewriteOptimized() {
   if (!usingDefaultRegAlloc())
     report_fatal_error(RegAllocOptNotSupportedMessage);

   addPass(createSGPRAllocPass(true));
@@ 277 lines hidden @@

llvm/lib/Target/AMDGPU/CMakeLists.txt

@@ 141 lines hidden @@ add_llvm_target(AMDGPUCodeGen
   SIInsertHardClauses.cpp
   SIInsertWaitcnts.cpp
   SIInstrInfo.cpp
   SIISelLowering.cpp
   SILateBranchLowering.cpp
   SILoadStoreOptimizer.cpp
   SILowerControlFlow.cpp
   SILowerI1Copies.cpp
+  SILowerPredicatedCopies.cpp
   SILowerSGPRSpills.cpp
   SIMachineFunctionInfo.cpp
   SIMachineScheduler.cpp
   SIMemoryLegalizer.cpp
   SIModeRegister.cpp
   SIModeRegisterDefaults.cpp
   SIOptimizeExecMasking.cpp
   SIOptimizeExecMaskingPreRA.cpp
@@ 43 lines hidden @@

llvm/lib/Target/AMDGPU/SIFixVGPRCopies.cpp

@@ 48 lines hidden @@ bool SIFixVGPRCopies::runOnMachineFunction(MachineFunction &MF) {
   const SIRegisterInfo *TRI = ST.getRegisterInfo();
   const SIInstrInfo *TII = ST.getInstrInfo();
   bool Changed = false;

   for (MachineBasicBlock &MBB : MF) {
     for (MachineInstr &MI : MBB) {
       switch (MI.getOpcode()) {
       case AMDGPU::COPY:
+      case AMDGPU::PRED_COPY:
         if (TII->isVGPRCopy(MI) && !MI.readsRegister(AMDGPU::EXEC, TRI)) {
           MI.addOperand(MF,
                         MachineOperand::CreateReg(AMDGPU::EXEC, false, true));
           LLVM_DEBUG(dbgs() << "Add exec use to " << MI);
           Changed = true;
         }
         break;
       default:
         break;
       }
     }
   }

   return Changed;
 }
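The SIFixVGPRCopies loop above adds an implicit EXEC use to every VGPR copy that lacks one, and this diff only extends it to PRED_COPY. A toy version, assuming a hypothetical `ToyCopy` type with registers spelled as strings, shows the shape of the check and why it is idempotent: a second run finds the EXEC use already present and changes nothing.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a copy instruction: opcode, whether the
// destination is a VGPR, and the implicit registers it reads.
struct ToyCopy {
  std::string Op;
  bool DstIsVGPR = false;
  std::vector<std::string> ImplicitUses;
};

// Same shape as the SIFixVGPRCopies loop: a VGPR copy (COPY or PRED_COPY)
// that does not already read EXEC gets an implicit EXEC use.
bool addImplicitExecUses(std::vector<ToyCopy> &Insts) {
  bool Changed = false;
  for (ToyCopy &MI : Insts) {
    if (MI.Op != "COPY" && MI.Op != "PRED_COPY")
      continue; // the `default: break;` case
    bool ReadsExec =
        std::find(MI.ImplicitUses.begin(), MI.ImplicitUses.end(), "EXEC") !=
        MI.ImplicitUses.end();
    if (MI.DstIsVGPR && !ReadsExec) {
      MI.ImplicitUses.push_back("EXEC");
      Changed = true;
    }
  }
  return Changed;
}
```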

llvm/lib/Target/AMDGPU/SIInstrInfo.h

Show First 20 Lines • Show All 164 Lines • ▼ Show 20 Lines	private:
getDestEquivalentVGPRClass(const MachineInstr &Inst) const;		getDestEquivalentVGPRClass(const MachineInstr &Inst) const;

bool checkInstOffsetsDoNotOverlap(const MachineInstr &MIa,		bool checkInstOffsetsDoNotOverlap(const MachineInstr &MIa,
const MachineInstr &MIb) const;		const MachineInstr &MIb) const;

Register findUsedSGPR(const MachineInstr &MI, int OpIndices[3]) const;		Register findUsedSGPR(const MachineInstr &MI, int OpIndices[3]) const;

protected:		protected:
		/// If the specific machine instruction is a instruction that moves/copies
		/// value from one register to another register return destination and source
		/// registers as machine operands.
		std::optional<DestSourcePair>
		isCopyInstrImpl(const MachineInstr &MI) const override;

bool swapSourceModifiers(MachineInstr &MI,		bool swapSourceModifiers(MachineInstr &MI,
MachineOperand &Src0, unsigned Src0OpName,		MachineOperand &Src0, unsigned Src0OpName,
MachineOperand &Src1, unsigned Src1OpName) const;		MachineOperand &Src1, unsigned Src1OpName) const;

MachineInstr *commuteInstructionImpl(MachineInstr &MI, bool NewMI,		MachineInstr *commuteInstructionImpl(MachineInstr &MI, bool NewMI,
unsigned OpIdx0,		unsigned OpIdx0,
unsigned OpIdx1) const override;		unsigned OpIdx1) const override;

▲ Show 20 Lines • Show All 641 Lines • ▼ Show 20 Lines	static bool doesNotReadTiedSource(const MachineInstr &MI) {
return MI.getDesc().TSFlags & SIInstrFlags::TiedSourceNotRead;		return MI.getDesc().TSFlags & SIInstrFlags::TiedSourceNotRead;
}		}

bool doesNotReadTiedSource(uint16_t Opcode) const {		bool doesNotReadTiedSource(uint16_t Opcode) const {
return get(Opcode).TSFlags & SIInstrFlags::TiedSourceNotRead;		return get(Opcode).TSFlags & SIInstrFlags::TiedSourceNotRead;
}		}

bool isVGPRCopy(const MachineInstr &MI) const {		bool isVGPRCopy(const MachineInstr &MI) const {
assert(MI.isCopy());		assert(isCopyInstr(MI));
Register Dest = MI.getOperand(0).getReg();		Register Dest = MI.getOperand(0).getReg();
const MachineFunction &MF = *MI.getParent()->getParent();		const MachineFunction &MF = *MI.getParent()->getParent();
const MachineRegisterInfo &MRI = MF.getRegInfo();		const MachineRegisterInfo &MRI = MF.getRegInfo();
return !RI.isSGPRReg(MRI, Dest);		return !RI.isSGPRReg(MRI, Dest);
}		}

bool hasVGPRUses(const MachineInstr &MI) const {		bool hasVGPRUses(const MachineInstr &MI) const {
const MachineFunction &MF = *MI.getParent()->getParent();		const MachineFunction &MF = *MI.getParent()->getParent();
▲ Show 20 Lines • Show All 53 Lines • ▼ Show 20 Lines	bool isInlineConstant(const MachineInstr &MI, unsigned OpIdx) const {
return isInlineConstant(MO, MI.getDesc().operands()[OpIdx].OperandType);		return isInlineConstant(MO, MI.getDesc().operands()[OpIdx].OperandType);
}		}

bool isInlineConstant(const MachineInstr &MI, unsigned OpIdx,		bool isInlineConstant(const MachineInstr &MI, unsigned OpIdx,
const MachineOperand &MO) const {		const MachineOperand &MO) const {
if (OpIdx >= MI.getDesc().NumOperands)		if (OpIdx >= MI.getDesc().NumOperands)
return false;		return false;

if (MI.isCopy()) {		if (isCopyInstr(MI)) {
unsigned Size = getOpSize(MI, OpIdx);		unsigned Size = getOpSize(MI, OpIdx);
assert(Size == 8 \|\| Size == 4);		assert(Size == 8 \|\| Size == 4);

uint8_t OpType = (Size == 8) ?		uint8_t OpType = (Size == 8) ?
AMDGPU::OPERAND_REG_IMM_INT64 : AMDGPU::OPERAND_REG_IMM_INT32;		AMDGPU::OPERAND_REG_IMM_INT64 : AMDGPU::OPERAND_REG_IMM_INT32;
return isInlineConstant(MO, OpType);		return isInlineConstant(MO, OpType);
}		}

Show All 32 Lines	public:

bool verifyInstruction(const MachineInstr &MI,		bool verifyInstruction(const MachineInstr &MI,
StringRef &ErrInfo) const override;		StringRef &ErrInfo) const override;

unsigned getVALUOp(const MachineInstr &MI) const;		unsigned getVALUOp(const MachineInstr &MI) const;

void insertScratchExecCopy(MachineFunction &MF, MachineBasicBlock &MBB,		void insertScratchExecCopy(MachineFunction &MF, MachineBasicBlock &MBB,
MachineBasicBlock::iterator MBBI,		MachineBasicBlock::iterator MBBI,
const DebugLoc &DL, Register Reg,		const DebugLoc &DL, Register Reg, bool IsSCCLive,
bool IsSCCLive) const;		SlotIndexes *Indexes = nullptr) const;

void restoreExec(MachineFunction &MF, MachineBasicBlock &MBB,		void restoreExec(MachineFunction &MF, MachineBasicBlock &MBB,
MachineBasicBlock::iterator MBBI, const DebugLoc &DL,		MachineBasicBlock::iterator MBBI, const DebugLoc &DL,
Register Reg) const;		Register Reg, SlotIndexes *Indexes = nullptr) const;

/// Return the correct register class for \p OpNo. For target-specific		/// Return the correct register class for \p OpNo. For target-specific
/// instructions, this will return the register class that has been defined		/// instructions, this will return the register class that has been defined
/// in tablegen. For generic instructions, like REG_SEQUENCE it will return		/// in tablegen. For generic instructions, like REG_SEQUENCE it will return
/// the register class of its machine operand.		/// the register class of its machine operand.
/// to infer the correct register class base on the other operands.		/// to infer the correct register class base on the other operands.
const TargetRegisterClass *getOpRegClass(const MachineInstr &MI,		const TargetRegisterClass *getOpRegClass(const MachineInstr &MI,
unsigned OpNo) const;		unsigned OpNo) const;
▲ Show 20 Lines • Show All 175 Lines • ▼ Show 20 Lines	public:

ScheduleHazardRecognizer *		ScheduleHazardRecognizer *
CreateTargetPostRAHazardRecognizer(const MachineFunction &MF) const override;		CreateTargetPostRAHazardRecognizer(const MachineFunction &MF) const override;

ScheduleHazardRecognizer *		ScheduleHazardRecognizer *
CreateTargetMIHazardRecognizer(const InstrItineraryData *II,		CreateTargetMIHazardRecognizer(const InstrItineraryData *II,
const ScheduleDAGMI *DAG) const override;		const ScheduleDAGMI *DAG) const override;

		unsigned getLiveRangeSplitOpcode(Register reg,
		MachineRegisterInfo &MRI) const override;

bool isBasicBlockPrologue(const MachineInstr &MI) const override;		bool isBasicBlockPrologue(const MachineInstr &MI) const override;

MachineInstr *createPHIDestinationCopy(MachineBasicBlock &MBB,		MachineInstr *createPHIDestinationCopy(MachineBasicBlock &MBB,
MachineBasicBlock::iterator InsPt,		MachineBasicBlock::iterator InsPt,
const DebugLoc &DL, Register Src,		const DebugLoc &DL, Register Src,
Register Dst) const override;		Register Dst) const override;

MachineInstr *createPHISourceCopy(MachineBasicBlock &MBB,		MachineInstr *createPHISourceCopy(MachineBasicBlock &MBB,
[249 unchanged lines not shown]

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp


[1,926 unchanged lines not shown; context: case AMDGPU::S_NOP:]
return MI.getOperand(0).getImm() + 1;		return MI.getOperand(0).getImm() + 1;
// SI_RETURN_TO_EPILOG is a fallthrough to code outside of the function. The		// SI_RETURN_TO_EPILOG is a fallthrough to code outside of the function. The
// hazard, even if one exist, won't really be visible. Should we handle it?		// hazard, even if one exist, won't really be visible. Should we handle it?
}		}
}		}

bool SIInstrInfo::expandPostRAPseudo(MachineInstr &MI) const {		bool SIInstrInfo::expandPostRAPseudo(MachineInstr &MI) const {
const SIRegisterInfo *TRI = ST.getRegisterInfo();		const SIRegisterInfo *TRI = ST.getRegisterInfo();
		const SIInstrInfo *TII = ST.getInstrInfo();
MachineBasicBlock &MBB = *MI.getParent();		MachineBasicBlock &MBB = *MI.getParent();
DebugLoc DL = MBB.findDebugLoc(MI);		DebugLoc DL = MBB.findDebugLoc(MI);
switch (MI.getOpcode()) {		switch (MI.getOpcode()) {
default: return TargetInstrInfo::expandPostRAPseudo(MI);		default: return TargetInstrInfo::expandPostRAPseudo(MI);
		case AMDGPU::PRED_COPY:
		TII->lowerCopy(&MI);
		break;
		yassingh: I wanted to keep this in SILowerPredicatedCopy pass but it wasn't working there, it needed virtregRewriter to be run before it.

case AMDGPU::S_MOV_B64_term:		case AMDGPU::S_MOV_B64_term:
// This is only a terminator to get the correct spill code placement during		// This is only a terminator to get the correct spill code placement during
// register allocation.		// register allocation.
MI.setDesc(get(AMDGPU::S_MOV_B64));		MI.setDesc(get(AMDGPU::S_MOV_B64));
break;		break;

case AMDGPU::S_MOV_B32_term:		case AMDGPU::S_MOV_B32_term:
// This is only a terminator to get the correct spill code placement during		// This is only a terminator to get the correct spill code placement during
[462 unchanged lines not shown; context: BuildMI(MBB, MI, DL, get(AMDGPU::REG_SEQUENCE), Dst)]
.addImm(AMDGPU::sub0)		.addImm(AMDGPU::sub0)
.addReg(Split[1]->getOperand(0).getReg())		.addReg(Split[1]->getOperand(0).getReg())
.addImm(AMDGPU::sub1);		.addImm(AMDGPU::sub1);

MI.eraseFromParent();		MI.eraseFromParent();
return std::pair(Split[0], Split[1]);		return std::pair(Split[0], Split[1]);
}		}

		std::optional<DestSourcePair>
		SIInstrInfo::isCopyInstrImpl(const MachineInstr &MI) const {
		if (MI.getOpcode() == AMDGPU::COPY || MI.getOpcode() == AMDGPU::PRED_COPY)
		arsenm: Regular copy shouldn't reach here?
		yassingh: Yes it won't, will remove.
		cdevadas (author): Still see the regular COPY here.
		return DestSourcePair{MI.getOperand(0), MI.getOperand(1)};

		return std::nullopt;
		}

bool SIInstrInfo::swapSourceModifiers(MachineInstr &MI,		bool SIInstrInfo::swapSourceModifiers(MachineInstr &MI,
MachineOperand &Src0,		MachineOperand &Src0,
unsigned Src0OpName,		unsigned Src0OpName,
MachineOperand &Src1,		MachineOperand &Src1,
unsigned Src1OpName) const {		unsigned Src1OpName) const {
MachineOperand *Src0Mods = getNamedOperand(MI, Src0OpName);		MachineOperand *Src0Mods = getNamedOperand(MI, Src0OpName);
if (!Src0Mods)		if (!Src0Mods)
return false;		return false;
[639 unchanged lines not shown; context: bool SIInstrInfo::isFoldableCopy(const MachineInstr &MI) {]
case AMDGPU::V_MOV_B32_e32:		case AMDGPU::V_MOV_B32_e32:
case AMDGPU::V_MOV_B32_e64:		case AMDGPU::V_MOV_B32_e64:
case AMDGPU::V_MOV_B64_PSEUDO:		case AMDGPU::V_MOV_B64_PSEUDO:
case AMDGPU::V_MOV_B64_e32:		case AMDGPU::V_MOV_B64_e32:
case AMDGPU::V_MOV_B64_e64:		case AMDGPU::V_MOV_B64_e64:
case AMDGPU::S_MOV_B32:		case AMDGPU::S_MOV_B32:
case AMDGPU::S_MOV_B64:		case AMDGPU::S_MOV_B64:
case AMDGPU::COPY:		case AMDGPU::COPY:
		case AMDGPU::PRED_COPY:
case AMDGPU::V_ACCVGPR_WRITE_B32_e64:		case AMDGPU::V_ACCVGPR_WRITE_B32_e64:
case AMDGPU::V_ACCVGPR_READ_B32_e64:		case AMDGPU::V_ACCVGPR_READ_B32_e64:
case AMDGPU::V_ACCVGPR_MOV_B32:		case AMDGPU::V_ACCVGPR_MOV_B32:
return true;		return true;
default:		default:
return false;		return false;
}		}
}		}
[1,873 unchanged lines not shown; context: unsigned SIInstrInfo::getVALUOp(const MachineInstr &MI) const {]
llvm_unreachable(		llvm_unreachable(
"Unexpected scalar opcode without corresponding vector one!");		"Unexpected scalar opcode without corresponding vector one!");
}		}

void SIInstrInfo::insertScratchExecCopy(MachineFunction &MF,		void SIInstrInfo::insertScratchExecCopy(MachineFunction &MF,
MachineBasicBlock &MBB,		MachineBasicBlock &MBB,
MachineBasicBlock::iterator MBBI,		MachineBasicBlock::iterator MBBI,
const DebugLoc &DL, Register Reg,		const DebugLoc &DL, Register Reg,
bool IsSCCLive) const {		bool IsSCCLive,
		SlotIndexes *Indexes) const {
const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();		const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
const SIInstrInfo *TII = ST.getInstrInfo();		const SIInstrInfo *TII = ST.getInstrInfo();
bool IsWave32 = ST.isWave32();		bool IsWave32 = ST.isWave32();
if (IsSCCLive) {		if (IsSCCLive) {
// Insert two move instructions, one to save the original value of EXEC and		// Insert two move instructions, one to save the original value of EXEC and
// the other to turn on all bits in EXEC. This is required as we can't use		// the other to turn on all bits in EXEC. This is required as we can't use
// the single instruction S_OR_SAVEEXEC that clobbers SCC.		// the single instruction S_OR_SAVEEXEC that clobbers SCC.
unsigned MovOpc = IsWave32 ? AMDGPU::S_MOV_B32 : AMDGPU::S_MOV_B64;		unsigned MovOpc = IsWave32 ? AMDGPU::S_MOV_B32 : AMDGPU::S_MOV_B64;
MCRegister Exec = IsWave32 ? AMDGPU::EXEC_LO : AMDGPU::EXEC;		MCRegister Exec = IsWave32 ? AMDGPU::EXEC_LO : AMDGPU::EXEC;
BuildMI(MBB, MBBI, DL, TII->get(MovOpc), Reg).addReg(Exec, RegState::Kill);		auto StoreExecMI = BuildMI(MBB, MBBI, DL, TII->get(MovOpc), Reg)
BuildMI(MBB, MBBI, DL, TII->get(MovOpc), Exec).addImm(-1);		.addReg(Exec, RegState::Kill);
		auto FlipExecMI = BuildMI(MBB, MBBI, DL, TII->get(MovOpc), Exec).addImm(-1);
		if (Indexes) {
		Indexes->insertMachineInstrInMaps(*StoreExecMI);
		Indexes->insertMachineInstrInMaps(*FlipExecMI);
		}
} else {		} else {
const unsigned OrSaveExec =		const unsigned OrSaveExec =
IsWave32 ? AMDGPU::S_OR_SAVEEXEC_B32 : AMDGPU::S_OR_SAVEEXEC_B64;		IsWave32 ? AMDGPU::S_OR_SAVEEXEC_B32 : AMDGPU::S_OR_SAVEEXEC_B64;
auto SaveExec =		auto SaveExec =
BuildMI(MBB, MBBI, DL, TII->get(OrSaveExec), Reg).addImm(-1);		BuildMI(MBB, MBBI, DL, TII->get(OrSaveExec), Reg).addImm(-1);
SaveExec->getOperand(3).setIsDead(); // Mark SCC as dead.		SaveExec->getOperand(3).setIsDead(); // Mark SCC as dead.
		if (Indexes)
		Indexes->insertMachineInstrInMaps(*SaveExec);
}		}
}		}
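The two branches of insertScratchExecCopy can be modeled as plain data. The sketch below is not LLVM code: the function name, the scratch register name `save`, and the use of strings for mnemonics are assumptions for illustration. It returns the sequence chosen for each wave size and SCC state: a single S_OR_SAVEEXEC when SCC is dead, and a pair of S_MOVs when SCC is live, because S_OR_SAVEEXEC also writes SCC.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the opcode choice in insertScratchExecCopy.
// Returns mnemonics that save EXEC into a scratch SGPR ("save") and then
// enable every lane. Illustration only, not the LLVM implementation.
std::vector<std::string> scratchExecCopySequence(bool isWave32, bool isSCCLive) {
  const std::string exec = isWave32 ? "exec_lo" : "exec";
  if (isSCCLive) {
    // S_OR_SAVEEXEC defines SCC, so with SCC live use two plain moves:
    // save the old mask, then set all bits of EXEC.
    const std::string mov = isWave32 ? "S_MOV_B32" : "S_MOV_B64";
    return {mov + " save, " + exec, mov + " " + exec + ", -1"};
  }
  // SCC is dead: one instruction saves EXEC and ORs in -1 (all lanes).
  const std::string orSave =
      isWave32 ? "S_OR_SAVEEXEC_B32" : "S_OR_SAVEEXEC_B64";
  return {orSave + " save, -1"};
}
```

For example, `scratchExecCopySequence(true, true)` yields the wave32 move pair, while `scratchExecCopySequence(false, false)` yields the single wave64 S_OR_SAVEEXEC_B64.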

void SIInstrInfo::restoreExec(MachineFunction &MF, MachineBasicBlock &MBB,		void SIInstrInfo::restoreExec(MachineFunction &MF, MachineBasicBlock &MBB,
MachineBasicBlock::iterator MBBI,		MachineBasicBlock::iterator MBBI,
const DebugLoc &DL, Register Reg) const {		const DebugLoc &DL, Register Reg,
		SlotIndexes *Indexes) const {
const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();		const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
const SIInstrInfo *TII = ST.getInstrInfo();		const SIInstrInfo *TII = ST.getInstrInfo();
unsigned ExecMov = ST.isWave32() ? AMDGPU::S_MOV_B32 : AMDGPU::S_MOV_B64;		unsigned ExecMov = ST.isWave32() ? AMDGPU::S_MOV_B32 : AMDGPU::S_MOV_B64;
MCRegister Exec = ST.isWave32() ? AMDGPU::EXEC_LO : AMDGPU::EXEC;		MCRegister Exec = ST.isWave32() ? AMDGPU::EXEC_LO : AMDGPU::EXEC;
BuildMI(MBB, MBBI, DL, TII->get(ExecMov), Exec).addReg(Reg, RegState::Kill);		auto ExecRestoreMI = BuildMI(MBB, MBBI, DL, TII->get(ExecMov), Exec)
		.addReg(Reg, RegState::Kill);
		if (Indexes)
		Indexes->insertMachineInstrInMaps(*ExecRestoreMI);
}		}

static const TargetRegisterClass *		static const TargetRegisterClass *
adjustAllocatableRegClass(const GCNSubtarget &ST, const SIRegisterInfo &RI,		adjustAllocatableRegClass(const GCNSubtarget &ST, const SIRegisterInfo &RI,
const MachineRegisterInfo &MRI,		const MachineRegisterInfo &MRI,
const MCInstrDesc &TID, unsigned RCID,		const MCInstrDesc &TID, unsigned RCID,
bool IsAllocatable) {		bool IsAllocatable) {
if ((IsAllocatable || !ST.hasGFX90AInsts() || !MRI.reservedRegsFrozen()) &&		if ((IsAllocatable || !ST.hasGFX90AInsts() || !MRI.reservedRegsFrozen()) &&
[2,974 unchanged lines not shown; context: SIInstrInfo::getSerializableMachineMemOperandTargetFlags() const {]
static const std::pair<MachineMemOperand::Flags, const char *> TargetFlags[] =		static const std::pair<MachineMemOperand::Flags, const char *> TargetFlags[] =
{		{
{MONoClobber, "amdgpu-noclobber"},		{MONoClobber, "amdgpu-noclobber"},
};		};

return ArrayRef(TargetFlags);		return ArrayRef(TargetFlags);
}		}

		unsigned SIInstrInfo::getLiveRangeSplitOpcode(Register Reg,
		MachineRegisterInfo &MRI) const {
		auto *TRI = MRI.getTargetRegisterInfo();
		const TargetRegisterClass *RC =
		arsenm: SrcReg.isVirtual should be implied
		yassingh: Added an assert.
		Reg.isVirtual() ? MRI.getRegClass(Reg) : TRI->getPhysRegBaseClass(Reg);
		return SIRegisterInfo::isSGPRClass(RC) ? AMDGPU::COPY : AMDGPU::PRED_COPY;
		arsenm: I guess the name is wrong now. It's technically an unpredicated copy. Rename to WWM_COPY or COPY_WWM?
		cdevadas (author): Make sense. I prefer WWM_COPY.
		}
		arsenm: Don't need to query TRI, it's already the RI member. You can also just do RI.isSGPRReg. Also, you're not checking the flags? The point was to check the flags and switch to the WWM copy. You don't care about SGPR or VGPR, you care about WWM VGPR or not
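The decision arsenm describes can be illustrated with a toy function. In this sketch, `VRegInfo`, `SplitOpcode` and `liveRangeSplitOpcode` are hypothetical stand-ins for the MachineRegisterInfo query, the WWM_REG flag check in SIMachineFunctionInfo, and the AMDGPU opcode enum. The point it shows is that the deciding question is whether the register is a VGPR used by whole-wave operations, not only whether it is an SGPR or a VGPR.

```cpp
#include <cassert>

// Hypothetical stand-ins; real code would use MachineRegisterInfo,
// SIMachineFunctionInfo::checkFlag and AMDGPU opcodes.
enum class SplitOpcode { Copy, WWMCopy };

struct VRegInfo {
  bool isSGPR;      // register class is an SGPR class
  bool hasWWMFlag;  // register carries the WWM_REG virtual-register flag
};

// Only VGPRs that take part in whole-wave operations need the all-lanes
// copy; every other split can use the ordinary COPY.
SplitOpcode liveRangeSplitOpcode(const VRegInfo &r) {
  if (!r.isSGPR && r.hasWWMFlag)
    return SplitOpcode::WWMCopy;
  return SplitOpcode::Copy;
}
```

With this shape, an ordinary VGPR split stays a plain COPY, which matches the later observation in the thread that PRED_COPY ends up inserted only for WWM registers.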

bool SIInstrInfo::isBasicBlockPrologue(const MachineInstr &MI) const {		bool SIInstrInfo::isBasicBlockPrologue(const MachineInstr &MI) const {
return !MI.isTerminator() && MI.getOpcode() != AMDGPU::COPY &&		return !MI.isTerminator() && MI.getOpcode() != AMDGPU::COPY &&
MI.modifiesRegister(AMDGPU::EXEC, &RI);		MI.modifiesRegister(AMDGPU::EXEC, &RI);
}		}

MachineInstrBuilder		MachineInstrBuilder
SIInstrInfo::getAddNoCarry(MachineBasicBlock &MBB,		SIInstrInfo::getAddNoCarry(MachineBasicBlock &MBB,
MachineBasicBlock::iterator I,		MachineBasicBlock::iterator I,
[551 unchanged lines not shown; context: MachineInstr *SIInstrInfo::foldMemoryOperandImpl(]
// We explicitly chose SReg_32 for the virtual register so such a copy might		// We explicitly chose SReg_32 for the virtual register so such a copy might
// be eliminated by RegisterCoalescer. However, that may not be possible, and		// be eliminated by RegisterCoalescer. However, that may not be possible, and
// %0 may even spill. We can't spill $m0 normally (it would require copying to		// %0 may even spill. We can't spill $m0 normally (it would require copying to
// a numbered SGPR anyway), and since it is in the SReg_32 register class,		// a numbered SGPR anyway), and since it is in the SReg_32 register class,
// TargetInstrInfo::foldMemoryOperand() is going to try.		// TargetInstrInfo::foldMemoryOperand() is going to try.
// A similar issue also exists with spilling and reloading $exec registers.		// A similar issue also exists with spilling and reloading $exec registers.
//		//
// To prevent that, constrain the %0 register class here.		// To prevent that, constrain the %0 register class here.
if (MI.isFullCopy()) {		if (isFullCopyInstr(MI)) {
Register DstReg = MI.getOperand(0).getReg();		Register DstReg = MI.getOperand(0).getReg();
Register SrcReg = MI.getOperand(1).getReg();		Register SrcReg = MI.getOperand(1).getReg();
if ((DstReg.isVirtual() || SrcReg.isVirtual()) &&		if ((DstReg.isVirtual() || SrcReg.isVirtual()) &&
(DstReg.isVirtual() != SrcReg.isVirtual())) {		(DstReg.isVirtual() != SrcReg.isVirtual())) {
MachineRegisterInfo &MRI = MF.getRegInfo();		MachineRegisterInfo &MRI = MF.getRegInfo();
Register VirtReg = DstReg.isVirtual() ? DstReg : SrcReg;		Register VirtReg = DstReg.isVirtual() ? DstReg : SrcReg;
const TargetRegisterClass *RC = MRI.getRegClass(VirtReg);		const TargetRegisterClass *RC = MRI.getRegClass(VirtReg);
if (RC->hasSuperClassEq(&AMDGPU::SReg_32RegClass)) {		if (RC->hasSuperClassEq(&AMDGPU::SReg_32RegClass)) {
[80 unchanged lines not shown; context: SIInstrInfo::getInstructionUniformity(const MachineInstr &MI) const {]

if (isNeverUniform(MI))		if (isNeverUniform(MI))
return InstructionUniformity::NeverUniform;		return InstructionUniformity::NeverUniform;

unsigned opcode = MI.getOpcode();		unsigned opcode = MI.getOpcode();
if (opcode == AMDGPU::V_READLANE_B32 || opcode == AMDGPU::V_READFIRSTLANE_B32)		if (opcode == AMDGPU::V_READLANE_B32 || opcode == AMDGPU::V_READFIRSTLANE_B32)
return InstructionUniformity::AlwaysUniform;		return InstructionUniformity::AlwaysUniform;

if (MI.isCopy()) {		if (isCopyInstr(MI)) {
const MachineOperand &srcOp = MI.getOperand(1);		const MachineOperand &srcOp = MI.getOperand(1);
if (srcOp.isReg() && srcOp.getReg().isPhysical()) {		if (srcOp.isReg() && srcOp.getReg().isPhysical()) {
const TargetRegisterClass *regClass =		const TargetRegisterClass *regClass =
RI.getPhysRegBaseClass(srcOp.getReg());		RI.getPhysRegBaseClass(srcOp.getReg());
return RI.isSGPRClass(regClass) ? InstructionUniformity::AlwaysUniform		return RI.isSGPRClass(regClass) ? InstructionUniformity::AlwaysUniform
: InstructionUniformity::NeverUniform;		: InstructionUniformity::NeverUniform;
}		}
return InstructionUniformity::Default;		return InstructionUniformity::Default;
[325 unchanged lines not shown]

llvm/lib/Target/AMDGPU/SIInstructions.td

[3,343 unchanged lines not shown]
	defm : Int16Med3Pat<V_MED3_I16_e64, smin, smax>;			defm : Int16Med3Pat<V_MED3_I16_e64, smin, smax>;
	defm : Int16Med3Pat<V_MED3_U16_e64, umin, umax>;			defm : Int16Med3Pat<V_MED3_U16_e64, umin, umax>;
	} // End Predicates = [isGFX9Plus]			} // End Predicates = [isGFX9Plus]

	class AMDGPUGenericInstruction : GenericInstruction {			class AMDGPUGenericInstruction : GenericInstruction {
	let Namespace = "AMDGPU";			let Namespace = "AMDGPU";
	}			}

				def PRED_COPY : AMDGPUGenericInstruction {
				arsenm: This should not be AMDGPUGenericInstruction, it's not a generic instruction. Should subclass from AMDGPUInst, and move to the other non-generic pseudos. AMDGPUGenericInstruction is for the G_AMDGPU_ globalisel pre-isel pseudos
				let OutOperandList = (outs unknown:$dst);
				let InOperandList = (ins unknown:$src);
				let AsmString = "PRED_COPY";
				let hasSideEffects = false;
				let isAsCheapAsAMove = true;
				let isPredicable = true;
				}
				arsenm: Should be marked convergent

	// Convert a wave address to a swizzled vector address (i.e. this is			// Convert a wave address to a swizzled vector address (i.e. this is
	// for copying the stack pointer to a vector address appropriate to			// for copying the stack pointer to a vector address appropriate to
	// use in the offset field of mubuf instructions).			// use in the offset field of mubuf instructions).
	def G_AMDGPU_WAVE_ADDRESS : AMDGPUGenericInstruction {			def G_AMDGPU_WAVE_ADDRESS : AMDGPUGenericInstruction {
	let OutOperandList = (outs type0:$dst);			let OutOperandList = (outs type0:$dst);
	let InOperandList = (ins type0:$src);			let InOperandList = (ins type0:$src);
	let hasSideEffects = 0;			let hasSideEffects = 0;
	}			}
[286 unchanged lines not shown]

llvm/lib/Target/AMDGPU/SILowerPredicatedCopies.cpp

This file was added.

				//===-- SILowerPredicatedCopies.cpp - Lower Copies after regalloc ---===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				/// \file
				/// Lowering the predicated PRED_COPY instructions for various register
				/// classes. AMDGPU target generates PRED_COPY instruction to differentiate WWM
				/// copy from COPY. This pass generates the necessary exec mask manipulation
				/// instructions to replicate 'Whole Wave Mode' and lowers PRED_COPY back to
				/// COPY.
				//
				//===----------------------------------------------------------------------===//

				#include "AMDGPU.h"
				#include "GCNSubtarget.h"
				#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
				#include "SIMachineFunctionInfo.h"
				#include "llvm/CodeGen/LiveIntervals.h"
				#include "llvm/CodeGen/MachineFunctionPass.h"
				#include "llvm/CodeGen/VirtRegMap.h"
				#include "llvm/InitializePasses.h"

				using namespace llvm;

				#define DEBUG_TYPE "si-lower-predicated-copies"

				namespace {

				class SILowerPredicatedCopies : public MachineFunctionPass {
				public:
				static char ID;

				SILowerPredicatedCopies() : MachineFunctionPass(ID) {
				initializeSILowerPredicatedCopiesPass(*PassRegistry::getPassRegistry());
				}

				bool runOnMachineFunction(MachineFunction &MF) override;

				StringRef getPassName() const override {
				return "SI Lower Predicated Copies";
				}

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.setPreservesAll();
				MachineFunctionPass::getAnalysisUsage(AU);
				}

				private:
				bool isWWMCopy(const MachineInstr &MI, const TargetInstrInfo &TII);
				bool isSCCLiveAtMI(const MachineInstr &MI);
				void addToWWMSpills(MachineFunction &MF, Register Reg);

				LiveIntervals *LIS;
				SlotIndexes *Indexes;
				VirtRegMap *VRM;
				const SIRegisterInfo *TRI;
				const MachineRegisterInfo *MRI;
				SIMachineFunctionInfo *MFI;
				};

				} // End anonymous namespace.

				INITIALIZE_PASS_BEGIN(SILowerPredicatedCopies, DEBUG_TYPE,
				"SI Lower Predicated Copies", false, false)
				INITIALIZE_PASS_DEPENDENCY(LiveIntervals)
				INITIALIZE_PASS_DEPENDENCY(VirtRegMap)
				INITIALIZE_PASS_END(SILowerPredicatedCopies, DEBUG_TYPE,
				"SI Lower Predicated Copies", false, false)

				char SILowerPredicatedCopies::ID = 0;

				char &llvm::SILowerPredicatedCopiesID = SILowerPredicatedCopies::ID;

				// Returns true if \p MI is a whole-wave copy instruction. Iterate
				// recursively skipping the intermediate copies if it maps to any
				// whole-wave operation.
				bool SILowerPredicatedCopies::isWWMCopy(const MachineInstr &MI,
				const TargetInstrInfo &TII) {
				// Skip if it is a subreg copy.
				arsenm: SIInstrInfo
				cdevadas (author): This function is no longer needed. Now the PRED_COPY is inserted only for wwm-regs after we fine-tuned AMDGPU's `getLiveRangeSplitOpcode` implementation.
				if (!TII.isFullCopyInstr(MI))
				return false;

				Register SrcReg = MI.getOperand(1).getReg();

				if (MFI->checkFlag(SrcReg, AMDGPU::VirtRegFlag::WWM_REG))
				return true;

				if (SrcReg.isPhysical())
				return false;
				arsenm: Either this check is unnecessary or should have happened earlier to avoid crashing on the checkFlag

				// Look recursively skipping intermediate copies.
				const MachineInstr *DefMI = MRI->getUniqueVRegDef(SrcReg);
				if (!DefMI || !TII.isCopyInstr(*DefMI))
				return false;

				return isWWMCopy(*DefMI, TII);
				}
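The recursion in isWWMCopy follows full copies back through their sources until it reaches a register flagged WWM_REG. A toy version over integer register IDs might look like the following; `CopyInst`, `ToyFunction` and the ID scheme are assumptions, and copy chains are assumed acyclic, as they are for SSA virtual registers with a unique def. Unlike the patch, this sketch tests for a physical source before consulting the flag set, which is the ordering arsenm raises in the comment above.

```cpp
#include <cassert>
#include <set>
#include <unordered_map>

// Toy model, not LLVM API: "dst = COPY src".
struct CopyInst {
  int dst;
  int src;
  bool isFullCopy;  // false for subregister copies
};

struct ToyFunction {
  std::set<int> wwmRegs;                       // registers flagged WWM_REG
  std::set<int> physRegs;                      // physical registers
  std::unordered_map<int, CopyInst> copyDefs;  // unique copy def of a vreg
};

// Follow full copies back to their source until a WWM register is found
// or the chain ends at a physical register or a non-copy definition.
bool isWWMCopy(const ToyFunction &f, const CopyInst &mi) {
  if (!mi.isFullCopy)
    return false;  // subregister copies are skipped
  if (f.physRegs.count(mi.src))
    return false;  // checked before any flag lookup
  if (f.wwmRegs.count(mi.src))
    return true;
  auto it = f.copyDefs.find(mi.src);
  if (it == f.copyDefs.end())
    return false;  // defined by something other than a copy
  return isWWMCopy(f, it->second);
}
```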

				bool SILowerPredicatedCopies::isSCCLiveAtMI(const MachineInstr &MI) {
				// We can't determine the liveness info if LIS isn't available. Early return
				// in that case and always assume SCC is live.
				if (!LIS)
				return true;

				LiveRange &LR =
				LIS->getRegUnit(*MCRegUnitIterator(MCRegister::from(AMDGPU::SCC), TRI));
				SlotIndex Idx = LIS->getInstructionIndex(MI);
				return LR.liveAt(Idx);
				}
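isSCCLiveAtMI treats SCC as live when LiveIntervals is unavailable. That is the conservative direction: it selects the S_MOV pair, which leaves SCC untouched. A toy model with half-open [start, end) segments shows both cases; `Segment`, `ToyLiveRange` and `isSCCLiveAt` are hypothetical, and the real code queries LiveRange::liveAt with a SlotIndex.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Toy stand-in for a LiveRange: half-open [start, end) slot ranges.
struct Segment {
  unsigned start;
  unsigned end;
};
using ToyLiveRange = std::vector<Segment>;

// With no liveness information, report SCC as live so the caller picks
// the SCC-preserving sequence; otherwise test segment membership.
bool isSCCLiveAt(const std::optional<ToyLiveRange> &sccRange, unsigned idx) {
  if (!sccRange)
    return true;
  for (const Segment &s : *sccRange)
    if (s.start <= idx && idx < s.end)
      return true;
  return false;
}
```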

				// If \p Reg is assigned with a physical VGPR, add the latter into wwm-spills
				// for preserving its entire lanes at function prolog/epilog.
				void SILowerPredicatedCopies::addToWWMSpills(MachineFunction &MF,
				Register Reg) {
				if (!VRM || Reg.isPhysical())
				return;

				Register PhysReg = VRM->getPhys(Reg);
				assert(PhysReg != VirtRegMap::NO_PHYS_REG &&
				"should have allocated a physical register");

				MFI->allocateWWMSpill(MF, PhysReg);
				}

				bool SILowerPredicatedCopies::runOnMachineFunction(MachineFunction &MF) {
				const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
				const SIInstrInfo *TII = ST.getInstrInfo();

				MFI = MF.getInfo<SIMachineFunctionInfo>();
				LIS = getAnalysisIfAvailable<LiveIntervals>();
				Indexes = getAnalysisIfAvailable<SlotIndexes>();
				VRM = getAnalysisIfAvailable<VirtRegMap>();
				TRI = ST.getRegisterInfo();
				MRI = &MF.getRegInfo();
				bool Changed = false;

				for (MachineBasicBlock &MBB : MF) {
				arsenm: Can you early return false if the WWM reg set is empty?
				yassingh: Done, had to add a new helper function.
				for (MachineInstr &MI : MBB) {
				if (MI.getOpcode() == AMDGPU::PRED_COPY) {
				assert(TII->isVGPRCopy(MI));
				arsenm: continue and reduce indentation
				if (MI.getOperand(0).getReg().isVirtual() && isWWMCopy(MI, *TII)) {
				arsenm: Should add a TODO to try to reduce the saveexec/restore exec pairs for adjacent WWM ops
				// For WWM vector copies, manipulate the exec mask around the copy
				// instruction.
				DebugLoc DL = MI.getDebugLoc();
				MachineBasicBlock::iterator InsertPt = MI.getIterator();
				Register RegForExecCopy = MFI->getSGPRForEXECCopy();
				TII->insertScratchExecCopy(MF, MBB, InsertPt, DL, RegForExecCopy,
				isSCCLiveAtMI(MI), Indexes);
				cdevadas (author): I guess this whole check now can be entirely avoided. PRED_COPY is inserted only for wwm-regs. If we see any such copy, insert the EXEC mask manipulation unconditionally.
				TII->restoreExec(MF, MBB, ++InsertPt, DL, RegForExecCopy, Indexes);
				addToWWMSpills(MF, MI.getOperand(0).getReg());
				LLVM_DEBUG(dbgs() << "WWM copy manipulation for " << MI);
				Changed |= true;
				}
				}
				}
				}
				arsenm: Do these need to gain an implicit exec use?
				yassingh: Do you mean we should add the implicit exec? In that case, SIFixVGPRCopies will take care?
				arsenm: This isn't something that can be taken care of later. SIFixVGPRCopies is a horribly broken hack, the less we depend on it the better
				arsenm: Actually these split copies might have been the original problem which caused it to be added. Maybe we have a way to drop it now?
				arsenm: That was the reasoning given in D28874 when it was added. How about as a next step we make sure all the VGPR splits end up with exec reads
				yassingh: I can try and see what comes. I can see 2 ways to go about it, add the exec operand here itself or, in tablegen definition of PRED_COPY. What do you suggest?
				yassingh: Will look into it, don't have sufficient expertise to answer this right now. Adding @AMDGPU.
				arsenm: Add to the instruction definition. Though we're still stuck with COPY=PRED_COPY 'sometimes'

				return Changed;
				}
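Why a split copy of a whole-wave register must cover every lane can be shown with a toy lane model. In the sketch below, the 64-lane array, `maskedCopy` and `wwmCopy` are assumptions and not AMDGPU or LLVM code. A plain vector copy writes only the lanes enabled in exec, so the destination would miss values held in inactive lanes; bracketing the copy with the save, set-all, restore sequence copies every lane and leaves the caller's exec mask as it was.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy wave64 model: bit i of exec enables lane i.
constexpr int kLanes = 64;
using VReg = std::array<std::uint32_t, kLanes>;

// A vector COPY only writes lanes enabled in exec.
void maskedCopy(VReg &dst, const VReg &src, std::uint64_t exec) {
  for (int lane = 0; lane < kLanes; ++lane)
    if (exec & (std::uint64_t{1} << lane))
      dst[lane] = src[lane];
}

// Save exec, enable all lanes, copy, restore exec, mirroring the
// insertScratchExecCopy / restoreExec bracket around a WWM copy.
void wwmCopy(VReg &dst, const VReg &src, std::uint64_t &exec) {
  const std::uint64_t saved = exec;  // save the caller's mask
  exec = ~std::uint64_t{0};          // enable every lane
  maskedCopy(dst, src, exec);
  exec = saved;                      // restore the caller's mask
}
```

Merging adjacent brackets, as the TODO suggested above asks, would amount to running several `maskedCopy` calls between a single save and restore.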

llvm/test/CodeGen/AMDGPU/branch-folding-implicit-def-subreg.ll

[830 unchanged lines not shown; context: define amdgpu_kernel void @f1(ptr addrspace(1) %arg, ptr addrspace(1) %arg1, i64 %arg2, i1 %arg3, i1 %arg4, i1 %arg5, i1 %arg6, ptr addrspace(3) %arg7, ptr addrspace(3) %arg8, ptr addrspace(3) %arg9, ptr addrspace(3) %arg10) {]
; GFX90A-NEXT: renamable $vgpr0 = V_MOV_B32_e32 0, implicit $exec		; GFX90A-NEXT: renamable $vgpr0 = V_MOV_B32_e32 0, implicit $exec
; GFX90A-NEXT: renamable $vgpr24_vgpr25 = DS_READ_B64_gfx9 killed renamable $vgpr0, 0, 0, implicit $exec :: (load (s64) from `ptr addrspace(3) null`, addrspace 3)		; GFX90A-NEXT: renamable $vgpr24_vgpr25 = DS_READ_B64_gfx9 killed renamable $vgpr0, 0, 0, implicit $exec :: (load (s64) from `ptr addrspace(3) null`, addrspace 3)
; GFX90A-NEXT: renamable $vgpr0 = COPY renamable $sgpr23, implicit $exec		; GFX90A-NEXT: renamable $vgpr0 = COPY renamable $sgpr23, implicit $exec
; GFX90A-NEXT: renamable $vgpr22_vgpr23 = DS_READ_B64_gfx9 killed renamable $vgpr0, 0, 0, implicit $exec :: (load (s64) from %ir.434, addrspace 3)		; GFX90A-NEXT: renamable $vgpr22_vgpr23 = DS_READ_B64_gfx9 killed renamable $vgpr0, 0, 0, implicit $exec :: (load (s64) from %ir.434, addrspace 3)
; GFX90A-NEXT: renamable $vgpr0 = COPY renamable $sgpr21, implicit $exec		; GFX90A-NEXT: renamable $vgpr0 = COPY renamable $sgpr21, implicit $exec
; GFX90A-NEXT: renamable $vgpr20_vgpr21 = DS_READ_B64_gfx9 killed renamable $vgpr0, 0, 0, implicit $exec :: (load (s64) from %ir.7, addrspace 3)		; GFX90A-NEXT: renamable $vgpr20_vgpr21 = DS_READ_B64_gfx9 killed renamable $vgpr0, 0, 0, implicit $exec :: (load (s64) from %ir.7, addrspace 3)
; GFX90A-NEXT: renamable $vgpr0 = COPY killed renamable $sgpr17, implicit $exec		; GFX90A-NEXT: renamable $vgpr0 = COPY killed renamable $sgpr17, implicit $exec
; GFX90A-NEXT: renamable $agpr0_agpr1 = DS_READ_B64_gfx9 killed renamable $vgpr0, 0, 0, implicit $exec :: (load (s64) from %ir.435, addrspace 3)		; GFX90A-NEXT: renamable $agpr0_agpr1 = DS_READ_B64_gfx9 killed renamable $vgpr0, 0, 0, implicit $exec :: (load (s64) from %ir.435, addrspace 3)
		; GFX90A-NEXT: renamable $agpr0_agpr1 = PRED_COPY killed renamable $agpr0_agpr1
; GFX90A-NEXT: renamable $vgpr0 = COPY renamable $sgpr22, implicit $exec		; GFX90A-NEXT: renamable $vgpr0 = COPY renamable $sgpr22, implicit $exec
; GFX90A-NEXT: renamable $vgpr26_vgpr27 = DS_READ_B64_gfx9 killed renamable $vgpr0, 0, 0, implicit $exec :: (load (s64) from %ir.8, addrspace 3)		; GFX90A-NEXT: renamable $vgpr26_vgpr27 = DS_READ_B64_gfx9 killed renamable $vgpr0, 0, 0, implicit $exec :: (load (s64) from %ir.8, addrspace 3)
; GFX90A-NEXT: renamable $sgpr36_sgpr37 = S_MOV_B64 -1		; GFX90A-NEXT: renamable $sgpr36_sgpr37 = S_MOV_B64 -1
; GFX90A-NEXT: renamable $sgpr23 = S_MOV_B32 0		; GFX90A-NEXT: renamable $sgpr23 = S_MOV_B32 0
; GFX90A-NEXT: renamable $sgpr17 = S_MOV_B32 0		; GFX90A-NEXT: renamable $sgpr17 = S_MOV_B32 0
; GFX90A-NEXT: S_BRANCH %bb.3		; GFX90A-NEXT: S_BRANCH %bb.3
; GFX90A-NEXT: {{ $}}		; GFX90A-NEXT: {{ $}}
; GFX90A-NEXT: bb.59.bb85:		; GFX90A-NEXT: bb.59.bb85:
[20 unchanged lines not shown; context: define amdgpu_kernel void @f1(ptr addrspace(1) %arg, ptr addrspace(1) %arg1, i64 %arg2, i1 %arg3, i1 %arg4, i1 %arg5, i1 %arg6, ptr addrspace(3) %arg7, ptr addrspace(3) %arg8, ptr addrspace(3) %arg9, ptr addrspace(3) %arg10) {]
; GFX90A-NEXT: {{ $}}		; GFX90A-NEXT: {{ $}}
; GFX90A-NEXT: bb.60.Flow31:		; GFX90A-NEXT: bb.60.Flow31:
; GFX90A-NEXT: successors: %bb.61(0x80000000)		; GFX90A-NEXT: successors: %bb.61(0x80000000)
; GFX90A-NEXT: liveins: $sgpr14, $sgpr15, $sgpr16, $sgpr17, $vgpr17, $vgpr19, $vgpr20, $vgpr30, $vgpr31, $vgpr54, $sgpr4_sgpr5, $sgpr6_sgpr7, $sgpr8_sgpr9:0x000000000000000F, $sgpr10_sgpr11, $sgpr12_sgpr13, $sgpr18_sgpr19, $sgpr24_sgpr25, $sgpr28_sgpr29, $sgpr30_sgpr31, $sgpr34_sgpr35, $sgpr36_sgpr37, $sgpr38_sgpr39, $sgpr40_sgpr41, $sgpr42_sgpr43, $sgpr44_sgpr45, $sgpr46_sgpr47, $sgpr48_sgpr49, $sgpr50_sgpr51, $sgpr52_sgpr53, $sgpr60_sgpr61, $sgpr62_sgpr63, $sgpr20_sgpr21_sgpr22_sgpr23:0x000000000000003C, $sgpr24_sgpr25_sgpr26_sgpr27:0x00000000000000F0, $vgpr0_vgpr1:0x000000000000000F, $vgpr2_vgpr3:0x000000000000000F, $vgpr4_vgpr5:0x0000000000000003, $vgpr6_vgpr7:0x000000000000000F, $vgpr8_vgpr9:0x000000000000000F, $vgpr10_vgpr11:0x000000000000000F, $vgpr12_vgpr13:0x000000000000000C, $vgpr14_vgpr15:0x000000000000000F, $vgpr16_vgpr17:0x0000000000000003, $vgpr18_vgpr19:0x0000000000000003, $vgpr40_vgpr41:0x000000000000000F, $vgpr42_vgpr43:0x000000000000000F, $vgpr44_vgpr45:0x000000000000000F, $vgpr46_vgpr47:0x000000000000000F, $vgpr56_vgpr57:0x000000000000000F, $vgpr58_vgpr59:0x0000000000000003, $vgpr60_vgpr61:0x000000000000000F, $vgpr62_vgpr63:0x000000000000000F, $sgpr0_sgpr1_sgpr2_sgpr3		; GFX90A-NEXT: liveins: $sgpr14, $sgpr15, $sgpr16, $sgpr17, $vgpr17, $vgpr19, $vgpr20, $vgpr30, $vgpr31, $vgpr54, $sgpr4_sgpr5, $sgpr6_sgpr7, $sgpr8_sgpr9:0x000000000000000F, $sgpr10_sgpr11, $sgpr12_sgpr13, $sgpr18_sgpr19, $sgpr24_sgpr25, $sgpr28_sgpr29, $sgpr30_sgpr31, $sgpr34_sgpr35, $sgpr36_sgpr37, $sgpr38_sgpr39, $sgpr40_sgpr41, $sgpr42_sgpr43, $sgpr44_sgpr45, $sgpr46_sgpr47, $sgpr48_sgpr49, $sgpr50_sgpr51, $sgpr52_sgpr53, $sgpr60_sgpr61, $sgpr62_sgpr63, $sgpr20_sgpr21_sgpr22_sgpr23:0x000000000000003C, $sgpr24_sgpr25_sgpr26_sgpr27:0x00000000000000F0, $vgpr0_vgpr1:0x000000000000000F, $vgpr2_vgpr3:0x000000000000000F, $vgpr4_vgpr5:0x0000000000000003, $vgpr6_vgpr7:0x000000000000000F, $vgpr8_vgpr9:0x000000000000000F, $vgpr10_vgpr11:0x000000000000000F, $vgpr12_vgpr13:0x000000000000000C, $vgpr14_vgpr15:0x000000000000000F, $vgpr16_vgpr17:0x0000000000000003, $vgpr18_vgpr19:0x0000000000000003, $vgpr40_vgpr41:0x000000000000000F, $vgpr42_vgpr43:0x000000000000000F, $vgpr44_vgpr45:0x000000000000000F, $vgpr46_vgpr47:0x000000000000000F, $vgpr56_vgpr57:0x000000000000000F, $vgpr58_vgpr59:0x0000000000000003, $vgpr60_vgpr61:0x000000000000000F, $vgpr62_vgpr63:0x000000000000000F, $sgpr0_sgpr1_sgpr2_sgpr3
; GFX90A-NEXT: {{ $}}		; GFX90A-NEXT: {{ $}}
; GFX90A-NEXT: $exec = S_OR_B64 $exec, killed renamable $sgpr52_sgpr53, implicit-def $scc		; GFX90A-NEXT: $exec = S_OR_B64 $exec, killed renamable $sgpr52_sgpr53, implicit-def $scc
; GFX90A-NEXT: renamable $sgpr52_sgpr53 = S_MOV_B64 0		; GFX90A-NEXT: renamable $sgpr52_sgpr53 = S_MOV_B64 0
; GFX90A-NEXT: renamable $vgpr12 = COPY renamable $vgpr16, implicit $exec		; GFX90A-NEXT: renamable $vgpr12 = COPY renamable $vgpr16, implicit $exec
; GFX90A-NEXT: renamable $agpr0_agpr1 = COPY killed renamable $vgpr12_vgpr13, implicit $exec		; GFX90A-NEXT: renamable $agpr0_agpr1 = PRED_COPY killed renamable $vgpr12_vgpr13
; GFX90A-NEXT: {{ $}}		; GFX90A-NEXT: {{ $}}
; GFX90A-NEXT: bb.61.Flow30:		; GFX90A-NEXT: bb.61.Flow30:
; GFX90A-NEXT: successors: %bb.55(0x80000000)		; GFX90A-NEXT: successors: %bb.55(0x80000000)
; GFX90A-NEXT: liveins: $sgpr14, $sgpr15, $sgpr16, $sgpr17, $vgpr17, $vgpr19, $vgpr20, $vgpr30, $vgpr31, $vgpr54, $agpr0_agpr1:0x000000000000000F, $sgpr4_sgpr5, $sgpr6_sgpr7, $sgpr8_sgpr9:0x000000000000000F, $sgpr10_sgpr11, $sgpr12_sgpr13, $sgpr18_sgpr19, $sgpr24_sgpr25, $sgpr28_sgpr29, $sgpr30_sgpr31, $sgpr34_sgpr35, $sgpr36_sgpr37, $sgpr38_sgpr39, $sgpr40_sgpr41, $sgpr42_sgpr43, $sgpr44_sgpr45, $sgpr46_sgpr47, $sgpr48_sgpr49, $sgpr50_sgpr51, $sgpr52_sgpr53, $sgpr60_sgpr61, $sgpr62_sgpr63, $sgpr20_sgpr21_sgpr22_sgpr23:0x000000000000003C, $sgpr24_sgpr25_sgpr26_sgpr27:0x00000000000000F0, $vgpr0_vgpr1:0x000000000000000F, $vgpr2_vgpr3:0x000000000000000F, $vgpr4_vgpr5:0x0000000000000003, $vgpr6_vgpr7:0x000000000000000F, $vgpr8_vgpr9:0x000000000000000F, $vgpr10_vgpr11:0x000000000000000F, $vgpr14_vgpr15:0x000000000000000F, $vgpr16_vgpr17:0x0000000000000003, $vgpr18_vgpr19:0x0000000000000003, $vgpr40_vgpr41:0x000000000000000F, $vgpr42_vgpr43:0x000000000000000F, $vgpr44_vgpr45:0x000000000000000F, $vgpr46_vgpr47:0x000000000000000F, $vgpr56_vgpr57:0x000000000000000F, $vgpr58_vgpr59:0x0000000000000003, $vgpr60_vgpr61:0x000000000000000F, $vgpr62_vgpr63:0x000000000000000F, $sgpr0_sgpr1_sgpr2_sgpr3		; GFX90A-NEXT: liveins: $sgpr14, $sgpr15, $sgpr16, $sgpr17, $vgpr17, $vgpr19, $vgpr20, $vgpr30, $vgpr31, $vgpr54, $agpr0_agpr1:0x000000000000000F, $sgpr4_sgpr5, $sgpr6_sgpr7, $sgpr8_sgpr9:0x000000000000000F, $sgpr10_sgpr11, $sgpr12_sgpr13, $sgpr18_sgpr19, $sgpr24_sgpr25, $sgpr28_sgpr29, $sgpr30_sgpr31, $sgpr34_sgpr35, $sgpr36_sgpr37, $sgpr38_sgpr39, $sgpr40_sgpr41, $sgpr42_sgpr43, $sgpr44_sgpr45, $sgpr46_sgpr47, $sgpr48_sgpr49, $sgpr50_sgpr51, $sgpr52_sgpr53, $sgpr60_sgpr61, $sgpr62_sgpr63, $sgpr20_sgpr21_sgpr22_sgpr23:0x000000000000003C, $sgpr24_sgpr25_sgpr26_sgpr27:0x00000000000000F0, $vgpr0_vgpr1:0x000000000000000F, $vgpr2_vgpr3:0x000000000000000F, $vgpr4_vgpr5:0x0000000000000003, $vgpr6_vgpr7:0x000000000000000F, $vgpr8_vgpr9:0x000000000000000F, $vgpr10_vgpr11:0x000000000000000F, $vgpr14_vgpr15:0x000000000000000F, $vgpr16_vgpr17:0x0000000000000003, $vgpr18_vgpr19:0x0000000000000003, $vgpr40_vgpr41:0x000000000000000F, $vgpr42_vgpr43:0x000000000000000F, $vgpr44_vgpr45:0x000000000000000F, $vgpr46_vgpr47:0x000000000000000F, $vgpr56_vgpr57:0x000000000000000F, $vgpr58_vgpr59:0x0000000000000003, $vgpr60_vgpr61:0x000000000000000F, $vgpr62_vgpr63:0x000000000000000F, $sgpr0_sgpr1_sgpr2_sgpr3
; GFX90A-NEXT: {{ $}}		; GFX90A-NEXT: {{ $}}
; GFX90A-NEXT: renamable $sgpr54_sgpr55 = S_XOR_B64 $exec, -1, implicit-def dead $scc		; GFX90A-NEXT: renamable $sgpr54_sgpr55 = S_XOR_B64 $exec, -1, implicit-def dead $scc
; GFX90A-NEXT: renamable $sgpr56_sgpr57 = S_AND_B64 killed renamable $sgpr52_sgpr53, $exec, implicit-def dead $scc		; GFX90A-NEXT: renamable $sgpr56_sgpr57 = S_AND_B64 killed renamable $sgpr52_sgpr53, $exec, implicit-def dead $scc
; GFX90A-NEXT: renamable $sgpr52_sgpr53 = S_AND_B64 killed renamable $sgpr50_sgpr51, $exec, implicit-def dead $scc		; GFX90A-NEXT: renamable $sgpr52_sgpr53 = S_AND_B64 killed renamable $sgpr50_sgpr51, $exec, implicit-def dead $scc
define amdgpu_kernel void @f1(ptr addrspace(1) %arg, ptr addrspace(1) %arg1, i64 %arg2, i1 %arg3, i1 %arg4, i1 %arg5, i1 %arg6, ptr addrspace(3) %arg7, ptr addrspace(3) %arg8, ptr addrspace(3) %arg9, ptr addrspace(3) %arg10) {
; GFX90A-NEXT: successors: %bb.72(0x40000000), %bb.69(0x40000000)		; GFX90A-NEXT: successors: %bb.72(0x40000000), %bb.69(0x40000000)
; GFX90A-NEXT: liveins: $sgpr14, $sgpr15, $sgpr16, $vgpr17, $vgpr19, $vgpr30, $vgpr31, $vgpr54, $agpr0_agpr1:0x000000000000000F, $sgpr4_sgpr5, $sgpr6_sgpr7, $sgpr8_sgpr9:0x000000000000000F, $sgpr10_sgpr11, $sgpr12_sgpr13, $sgpr18_sgpr19, $sgpr24_sgpr25, $sgpr28_sgpr29, $sgpr34_sgpr35, $sgpr38_sgpr39, $sgpr40_sgpr41, $sgpr42_sgpr43, $sgpr44_sgpr45, $sgpr46_sgpr47, $sgpr48_sgpr49, $sgpr50_sgpr51, $sgpr52_sgpr53, $sgpr54_sgpr55, $sgpr56_sgpr57, $sgpr20_sgpr21_sgpr22_sgpr23:0x000000000000003C, $sgpr24_sgpr25_sgpr26_sgpr27:0x00000000000000F0, $vgpr0_vgpr1:0x000000000000000F, $vgpr2_vgpr3:0x000000000000000F, $vgpr4_vgpr5:0x0000000000000003, $vgpr6_vgpr7:0x000000000000000F, $vgpr8_vgpr9:0x000000000000000F, $vgpr10_vgpr11:0x000000000000000F, $vgpr14_vgpr15:0x000000000000000F, $vgpr16_vgpr17:0x0000000000000003, $vgpr18_vgpr19:0x0000000000000003, $vgpr20_vgpr21:0x000000000000000F, $vgpr22_vgpr23:0x000000000000000F, $vgpr24_vgpr25:0x000000000000000F, $vgpr26_vgpr27:0x000000000000000F, $vgpr40_vgpr41:0x000000000000000F, $vgpr42_vgpr43:0x000000000000000F, $vgpr44_vgpr45:0x000000000000000F, $vgpr46_vgpr47:0x000000000000000F, $vgpr56_vgpr57:0x000000000000000F, $vgpr58_vgpr59:0x000000000000000F, $vgpr60_vgpr61:0x000000000000000F, $vgpr62_vgpr63:0x000000000000000F, $sgpr0_sgpr1_sgpr2_sgpr3		; GFX90A-NEXT: liveins: $sgpr14, $sgpr15, $sgpr16, $vgpr17, $vgpr19, $vgpr30, $vgpr31, $vgpr54, $agpr0_agpr1:0x000000000000000F, $sgpr4_sgpr5, $sgpr6_sgpr7, $sgpr8_sgpr9:0x000000000000000F, $sgpr10_sgpr11, $sgpr12_sgpr13, $sgpr18_sgpr19, $sgpr24_sgpr25, $sgpr28_sgpr29, $sgpr34_sgpr35, $sgpr38_sgpr39, $sgpr40_sgpr41, $sgpr42_sgpr43, $sgpr44_sgpr45, $sgpr46_sgpr47, $sgpr48_sgpr49, $sgpr50_sgpr51, $sgpr52_sgpr53, $sgpr54_sgpr55, $sgpr56_sgpr57, $sgpr20_sgpr21_sgpr22_sgpr23:0x000000000000003C, $sgpr24_sgpr25_sgpr26_sgpr27:0x00000000000000F0, $vgpr0_vgpr1:0x000000000000000F, $vgpr2_vgpr3:0x000000000000000F, $vgpr4_vgpr5:0x0000000000000003, $vgpr6_vgpr7:0x000000000000000F, $vgpr8_vgpr9:0x000000000000000F, $vgpr10_vgpr11:0x000000000000000F, $vgpr14_vgpr15:0x000000000000000F, $vgpr16_vgpr17:0x0000000000000003, $vgpr18_vgpr19:0x0000000000000003, $vgpr20_vgpr21:0x000000000000000F, $vgpr22_vgpr23:0x000000000000000F, $vgpr24_vgpr25:0x000000000000000F, $vgpr26_vgpr27:0x000000000000000F, $vgpr40_vgpr41:0x000000000000000F, $vgpr42_vgpr43:0x000000000000000F, $vgpr44_vgpr45:0x000000000000000F, $vgpr46_vgpr47:0x000000000000000F, $vgpr56_vgpr57:0x000000000000000F, $vgpr58_vgpr59:0x000000000000000F, $vgpr60_vgpr61:0x000000000000000F, $vgpr62_vgpr63:0x000000000000000F, $sgpr0_sgpr1_sgpr2_sgpr3
; GFX90A-NEXT: {{ $}}		; GFX90A-NEXT: {{ $}}
; GFX90A-NEXT: renamable $vgpr28 = V_OR_B32_e32 1, $vgpr26, implicit $exec		; GFX90A-NEXT: renamable $vgpr28 = V_OR_B32_e32 1, $vgpr26, implicit $exec
; GFX90A-NEXT: renamable $vgpr38 = V_OR_B32_e32 $vgpr28, $vgpr24, implicit $exec		; GFX90A-NEXT: renamable $vgpr38 = V_OR_B32_e32 $vgpr28, $vgpr24, implicit $exec
; GFX90A-NEXT: renamable $vgpr36 = V_OR_B32_e32 $vgpr38, $vgpr22, implicit $exec		; GFX90A-NEXT: renamable $vgpr36 = V_OR_B32_e32 $vgpr38, $vgpr22, implicit $exec
; GFX90A-NEXT: renamable $vgpr32 = V_CNDMASK_B32_e64 0, $vgpr36, 0, 0, $sgpr12_sgpr13, implicit $exec		; GFX90A-NEXT: renamable $vgpr32 = V_CNDMASK_B32_e64 0, $vgpr36, 0, 0, $sgpr12_sgpr13, implicit $exec
; GFX90A-NEXT: renamable $vgpr50 = V_OR_B32_e32 $vgpr32, $vgpr20, implicit $exec		; GFX90A-NEXT: renamable $vgpr50 = V_OR_B32_e32 $vgpr32, $vgpr20, implicit $exec
; GFX90A-NEXT: renamable $vgpr12_vgpr13 = COPY renamable $agpr0_agpr1, implicit $exec		; GFX90A-NEXT: renamable $vgpr12_vgpr13 = PRED_COPY renamable $agpr0_agpr1
; GFX90A-NEXT: renamable $vgpr48 = V_OR_B32_e32 $vgpr50, killed $vgpr12, implicit $exec		; GFX90A-NEXT: renamable $vgpr48 = V_OR_B32_e32 $vgpr50, killed $vgpr12, implicit $exec
; GFX90A-NEXT: renamable $vgpr34 = V_OR_B32_e32 $vgpr48, $vgpr14, implicit $exec		; GFX90A-NEXT: renamable $vgpr34 = V_OR_B32_e32 $vgpr48, $vgpr14, implicit $exec
; GFX90A-NEXT: renamable $vgpr52 = V_CNDMASK_B32_e64 0, 0, 0, $vgpr34, killed $sgpr12_sgpr13, implicit $exec		; GFX90A-NEXT: renamable $vgpr52 = V_CNDMASK_B32_e64 0, 0, 0, $vgpr34, killed $sgpr12_sgpr13, implicit $exec
; GFX90A-NEXT: renamable $sgpr12_sgpr13 = S_MOV_B64 -1		; GFX90A-NEXT: renamable $sgpr12_sgpr13 = S_MOV_B64 -1
; GFX90A-NEXT: renamable $vcc = S_AND_B64 $exec, killed renamable $sgpr28_sgpr29, implicit-def dead $scc		; GFX90A-NEXT: renamable $vcc = S_AND_B64 $exec, killed renamable $sgpr28_sgpr29, implicit-def dead $scc
; GFX90A-NEXT: S_CBRANCH_VCCNZ %bb.72, implicit $vcc		; GFX90A-NEXT: S_CBRANCH_VCCNZ %bb.72, implicit $vcc
; GFX90A-NEXT: {{ $}}		; GFX90A-NEXT: {{ $}}
; GFX90A-NEXT: bb.69.Flow:		; GFX90A-NEXT: bb.69.Flow:
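In the hunk above, the VGPR/AGPR copies introduced around `bb.61.Flow30` and `bb.69.Flow` appear as a plain `COPY ..., implicit $exec` in one column and as `PRED_COPY` in the other. A plain VGPR copy only writes the lanes enabled in EXEC, whereas, per the summary, a copy produced by splitting a whole-wave live range has to cover all lanes without disturbing EXEC afterwards. The following is a minimal C++ sketch of that difference using a toy 64-lane model, not LLVM code: `VGPR`, `laneActive`, `predicatedCopy`, and `wholeWaveCopy` are hypothetical names, and the scalar instructions mentioned in the comments are only an assumed illustration of how the save/restore might be spelled.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy model of one wave64 wavefront: a VGPR holds one 32-bit value per
// lane, and EXEC holds one enable bit per lane. Hypothetical, not LLVM.
constexpr int kLanes = 64;
using VGPR = std::array<std::uint32_t, kLanes>;

inline bool laneActive(std::uint64_t exec, int lane) {
  assert(lane >= 0 && lane < kLanes);
  return (exec >> lane) & 1u;
}

// An ordinary VGPR copy (a COPY carrying "implicit $exec") writes only
// the lanes whose EXEC bit is set; inactive lanes keep their old value.
inline void predicatedCopy(VGPR &dst, const VGPR &src, std::uint64_t exec) {
  for (int lane = 0; lane < kLanes; ++lane)
    if (laneActive(exec, lane))
      dst[lane] = src[lane];
}

// A whole-wave copy must move every lane: save EXEC, enable all lanes,
// copy, then restore EXEC so later code still sees the original mask.
inline void wholeWaveCopy(VGPR &dst, const VGPR &src, std::uint64_t &exec) {
  const std::uint64_t saved = exec;  // assumed: s_or_saveexec_b64 sN, -1
  exec = ~std::uint64_t{0};          // all 64 lanes enabled
  predicatedCopy(dst, src, exec);
  exec = saved;                      // assumed: s_mov_b64 exec, sN
}
```

With `exec = 0x00000000FFFFFFFF` (only the low 32 lanes active), `predicatedCopy` leaves lanes 32 to 63 of the destination untouched, while `wholeWaveCopy` writes all 64 lanes and hands back the same EXEC value it was given.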

llvm/test/CodeGen/AMDGPU/greedy-global-heuristic.mir

body: |
; CHECK-NEXT: S_NOP 0		; CHECK-NEXT: S_NOP 0
; CHECK-NEXT: S_NOP 0		; CHECK-NEXT: S_NOP 0
; CHECK-NEXT: S_NOP 0		; CHECK-NEXT: S_NOP 0
; CHECK-NEXT: S_NOP 0		; CHECK-NEXT: S_NOP 0
; CHECK-NEXT: S_NOP 0		; CHECK-NEXT: S_NOP 0
; CHECK-NEXT: S_NOP 0		; CHECK-NEXT: S_NOP 0
; CHECK-NEXT: S_NOP 0		; CHECK-NEXT: S_NOP 0
; CHECK-NEXT: S_NOP 0		; CHECK-NEXT: S_NOP 0
; CHECK-NEXT: [[COPY:%[0-9]+]]:vreg_128 = COPY %31		; CHECK-NEXT: [[PRED_COPY:%[0-9]+]]:vreg_128 = PRED_COPY %31
; CHECK-NEXT: S_NOP 0, implicit %31		; CHECK-NEXT: S_NOP 0, implicit %31
; CHECK-NEXT: [[COPY1:%[0-9]+]]:vreg_128 = COPY %29		; CHECK-NEXT: [[PRED_COPY1:%[0-9]+]]:vreg_128 = PRED_COPY %29
; CHECK-NEXT: S_NOP 0, implicit %29		; CHECK-NEXT: S_NOP 0, implicit %29
; CHECK-NEXT: [[COPY2:%[0-9]+]]:vreg_128 = COPY %27		; CHECK-NEXT: [[PRED_COPY2:%[0-9]+]]:vreg_128 = PRED_COPY %27
; CHECK-NEXT: S_NOP 0, implicit %27		; CHECK-NEXT: S_NOP 0, implicit %27
; CHECK-NEXT: [[SI_SPILL_V128_RESTORE1:%[0-9]+]]:vreg_128 = SI_SPILL_V128_RESTORE %stack.1, $sgpr32, 0, implicit $exec :: (load (s128) from %stack.1, align 4, addrspace 5)		; CHECK-NEXT: [[SI_SPILL_V128_RESTORE1:%[0-9]+]]:vreg_128 = SI_SPILL_V128_RESTORE %stack.1, $sgpr32, 0, implicit $exec :: (load (s128) from %stack.1, align 4, addrspace 5)
; CHECK-NEXT: [[COPY3:%[0-9]+]]:vreg_128 = COPY [[SI_SPILL_V128_RESTORE1]]		; CHECK-NEXT: [[PRED_COPY3:%[0-9]+]]:vreg_128 = PRED_COPY [[SI_SPILL_V128_RESTORE1]]
; CHECK-NEXT: S_NOP 0, implicit [[SI_SPILL_V128_RESTORE1]]		; CHECK-NEXT: S_NOP 0, implicit [[SI_SPILL_V128_RESTORE1]]
; CHECK-NEXT: [[SI_SPILL_V128_RESTORE2:%[0-9]+]]:vreg_128 = SI_SPILL_V128_RESTORE %stack.0, $sgpr32, 0, implicit $exec :: (load (s128) from %stack.0, align 4, addrspace 5)		; CHECK-NEXT: [[SI_SPILL_V128_RESTORE2:%[0-9]+]]:vreg_128 = SI_SPILL_V128_RESTORE %stack.0, $sgpr32, 0, implicit $exec :: (load (s128) from %stack.0, align 4, addrspace 5)
; CHECK-NEXT: S_NOP 0, implicit [[SI_SPILL_V128_RESTORE2]]		; CHECK-NEXT: S_NOP 0, implicit [[SI_SPILL_V128_RESTORE2]]
; CHECK-NEXT: S_NOP 0, implicit %0		; CHECK-NEXT: S_NOP 0, implicit %0
; CHECK-NEXT: [[SI_SPILL_V128_RESTORE3:%[0-9]+]]:vreg_128 = SI_SPILL_V128_RESTORE %stack.2, $sgpr32, 0, implicit $exec :: (load (s128) from %stack.2, align 4, addrspace 5)		; CHECK-NEXT: [[SI_SPILL_V128_RESTORE3:%[0-9]+]]:vreg_128 = SI_SPILL_V128_RESTORE %stack.2, $sgpr32, 0, implicit $exec :: (load (s128) from %stack.2, align 4, addrspace 5)
; CHECK-NEXT: S_NOP 0, implicit [[SI_SPILL_V128_RESTORE3]]		; CHECK-NEXT: S_NOP 0, implicit [[SI_SPILL_V128_RESTORE3]]
; CHECK-NEXT: [[SI_SPILL_V128_RESTORE4:%[0-9]+]]:vreg_128 = SI_SPILL_V128_RESTORE %stack.4, $sgpr32, 0, implicit $exec :: (load (s128) from %stack.4, align 4, addrspace 5)		; CHECK-NEXT: [[SI_SPILL_V128_RESTORE4:%[0-9]+]]:vreg_128 = SI_SPILL_V128_RESTORE %stack.4, $sgpr32, 0, implicit $exec :: (load (s128) from %stack.4, align 4, addrspace 5)
; CHECK-NEXT: S_NOP 0, implicit [[SI_SPILL_V128_RESTORE4]]		; CHECK-NEXT: S_NOP 0, implicit [[SI_SPILL_V128_RESTORE4]]
; CHECK-NEXT: [[SI_SPILL_V128_RESTORE5:%[0-9]+]]:vreg_128 = SI_SPILL_V128_RESTORE %stack.3, $sgpr32, 0, implicit $exec :: (load (s128) from %stack.3, align 4, addrspace 5)		; CHECK-NEXT: [[SI_SPILL_V128_RESTORE5:%[0-9]+]]:vreg_128 = SI_SPILL_V128_RESTORE %stack.3, $sgpr32, 0, implicit $exec :: (load (s128) from %stack.3, align 4, addrspace 5)
; CHECK-NEXT: S_NOP 0, implicit [[SI_SPILL_V128_RESTORE5]]		; CHECK-NEXT: S_NOP 0, implicit [[SI_SPILL_V128_RESTORE5]]
; CHECK-NEXT: {{ $}}		; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:		; CHECK-NEXT: bb.2:
; CHECK-NEXT: S_NOP 0, implicit %0		; CHECK-NEXT: S_NOP 0, implicit %0
; CHECK-NEXT: [[SI_SPILL_V128_RESTORE6:%[0-9]+]]:vreg_128 = SI_SPILL_V128_RESTORE %stack.0, $sgpr32, 0, implicit $exec :: (load (s128) from %stack.0, align 4, addrspace 5)		; CHECK-NEXT: [[SI_SPILL_V128_RESTORE6:%[0-9]+]]:vreg_128 = SI_SPILL_V128_RESTORE %stack.0, $sgpr32, 0, implicit $exec :: (load (s128) from %stack.0, align 4, addrspace 5)
; CHECK-NEXT: S_NOP 0, implicit [[SI_SPILL_V128_RESTORE6]]		; CHECK-NEXT: S_NOP 0, implicit [[SI_SPILL_V128_RESTORE6]]
; CHECK-NEXT: S_NOP 0, implicit [[COPY3]]		; CHECK-NEXT: S_NOP 0, implicit [[PRED_COPY3]]
; CHECK-NEXT: S_NOP 0, implicit [[COPY2]]		; CHECK-NEXT: S_NOP 0, implicit [[PRED_COPY2]]
; CHECK-NEXT: S_NOP 0, implicit [[COPY1]]		; CHECK-NEXT: S_NOP 0, implicit [[PRED_COPY1]]
; CHECK-NEXT: S_NOP 0, implicit [[COPY]]		; CHECK-NEXT: S_NOP 0, implicit [[PRED_COPY]]
bb.0:		bb.0:
S_NOP 0, implicit-def %0:vreg_128		S_NOP 0, implicit-def %0:vreg_128
S_NOP 0, implicit-def %1:vreg_128		S_NOP 0, implicit-def %1:vreg_128
S_NOP 0, implicit-def %2:vreg_128		S_NOP 0, implicit-def %2:vreg_128
S_NOP 0, implicit-def %3:vreg_128		S_NOP 0, implicit-def %3:vreg_128
S_NOP 0, implicit-def %4:vreg_128		S_NOP 0, implicit-def %4:vreg_128
S_NOP 0, implicit-def %5:vreg_128		S_NOP 0, implicit-def %5:vreg_128


llvm/test/CodeGen/AMDGPU/llc-pipeline.ll

	; GCN-O0-NEXT: MachinePostDominator Tree Construction			; GCN-O0-NEXT: MachinePostDominator Tree Construction
	; GCN-O0-NEXT: SI Whole Quad Mode			; GCN-O0-NEXT: SI Whole Quad Mode
	; GCN-O0-NEXT: Virtual Register Map			; GCN-O0-NEXT: Virtual Register Map
	; GCN-O0-NEXT: Live Register Matrix			; GCN-O0-NEXT: Live Register Matrix
	; GCN-O0-NEXT: SI Pre-allocate WWM Registers			; GCN-O0-NEXT: SI Pre-allocate WWM Registers
	; GCN-O0-NEXT: Fast Register Allocator			; GCN-O0-NEXT: Fast Register Allocator
	; GCN-O0-NEXT: SI lower SGPR spill instructions			; GCN-O0-NEXT: SI lower SGPR spill instructions
	; GCN-O0-NEXT: Fast Register Allocator			; GCN-O0-NEXT: Fast Register Allocator
				; GCN-O0-NEXT: SI Lower Predicated Copies
	; GCN-O0-NEXT: SI Fix VGPR copies			; GCN-O0-NEXT: SI Fix VGPR copies
	; GCN-O0-NEXT: Remove Redundant DEBUG_VALUE analysis			; GCN-O0-NEXT: Remove Redundant DEBUG_VALUE analysis
	; GCN-O0-NEXT: Fixup Statepoint Caller Saved			; GCN-O0-NEXT: Fixup Statepoint Caller Saved
	; GCN-O0-NEXT: Lazy Machine Block Frequency Analysis			; GCN-O0-NEXT: Lazy Machine Block Frequency Analysis
	; GCN-O0-NEXT: Machine Optimization Remark Emitter			; GCN-O0-NEXT: Machine Optimization Remark Emitter
	; GCN-O0-NEXT: Prologue/Epilogue Insertion & Frame Finalization			; GCN-O0-NEXT: Prologue/Epilogue Insertion & Frame Finalization
	; GCN-O0-NEXT: Post-RA pseudo instruction expansion pass			; GCN-O0-NEXT: Post-RA pseudo instruction expansion pass
	; GCN-O0-NEXT: SI post-RA bundler			; GCN-O0-NEXT: SI post-RA bundler
	; GCN-O1-NEXT: Lazy Machine Block Frequency Analysis			; GCN-O1-NEXT: Lazy Machine Block Frequency Analysis
	; GCN-O1-NEXT: Machine Optimization Remark Emitter			; GCN-O1-NEXT: Machine Optimization Remark Emitter
	; GCN-O1-NEXT: Greedy Register Allocator			; GCN-O1-NEXT: Greedy Register Allocator
	; GCN-O1-NEXT: Virtual Register Rewriter			; GCN-O1-NEXT: Virtual Register Rewriter
	; GCN-O1-NEXT: SI lower SGPR spill instructions			; GCN-O1-NEXT: SI lower SGPR spill instructions
	; GCN-O1-NEXT: Virtual Register Map			; GCN-O1-NEXT: Virtual Register Map
	; GCN-O1-NEXT: Live Register Matrix			; GCN-O1-NEXT: Live Register Matrix
	; GCN-O1-NEXT: Greedy Register Allocator			; GCN-O1-NEXT: Greedy Register Allocator
				; GCN-O1-NEXT: SI Lower Predicated Copies
				arsenm: I think I lost track of what was going on with -O0. Did this lose the -O0 run?
				yassingh: It was introduced when we moved from the PRED_COPY "simplification" to a proper lowering mechanism, and removed again after the approach changed back to lowering WWM_COPY to COPY first. I have added the -O0 invocation back, but it's not doing anything.
	; GCN-O1-NEXT: GCN NSA Reassign			; GCN-O1-NEXT: GCN NSA Reassign
	; GCN-O1-NEXT: Virtual Register Rewriter			; GCN-O1-NEXT: Virtual Register Rewriter
	; GCN-O1-NEXT: Stack Slot Coloring			; GCN-O1-NEXT: Stack Slot Coloring
	; GCN-O1-NEXT: Machine Copy Propagation Pass			; GCN-O1-NEXT: Machine Copy Propagation Pass
	; GCN-O1-NEXT: Machine Loop Invariant Code Motion			; GCN-O1-NEXT: Machine Loop Invariant Code Motion
	; GCN-O1-NEXT: SI Fix VGPR copies			; GCN-O1-NEXT: SI Fix VGPR copies
	; GCN-O1-NEXT: SI optimize exec mask operations			; GCN-O1-NEXT: SI optimize exec mask operations
	; GCN-O1-NEXT: Remove Redundant DEBUG_VALUE analysis			; GCN-O1-NEXT: Remove Redundant DEBUG_VALUE analysis
	; GCN-O1-OPTS-NEXT: Lazy Machine Block Frequency Analysis			; GCN-O1-OPTS-NEXT: Lazy Machine Block Frequency Analysis
	; GCN-O1-OPTS-NEXT: Machine Optimization Remark Emitter			; GCN-O1-OPTS-NEXT: Machine Optimization Remark Emitter
	; GCN-O1-OPTS-NEXT: Greedy Register Allocator			; GCN-O1-OPTS-NEXT: Greedy Register Allocator
	; GCN-O1-OPTS-NEXT: Virtual Register Rewriter			; GCN-O1-OPTS-NEXT: Virtual Register Rewriter
	; GCN-O1-OPTS-NEXT: SI lower SGPR spill instructions			; GCN-O1-OPTS-NEXT: SI lower SGPR spill instructions
	; GCN-O1-OPTS-NEXT: Virtual Register Map			; GCN-O1-OPTS-NEXT: Virtual Register Map
	; GCN-O1-OPTS-NEXT: Live Register Matrix			; GCN-O1-OPTS-NEXT: Live Register Matrix
	; GCN-O1-OPTS-NEXT: Greedy Register Allocator			; GCN-O1-OPTS-NEXT: Greedy Register Allocator
				; GCN-O1-OPTS-NEXT: SI Lower Predicated Copies
	; GCN-O1-OPTS-NEXT: GCN NSA Reassign			; GCN-O1-OPTS-NEXT: GCN NSA Reassign
	; GCN-O1-OPTS-NEXT: Virtual Register Rewriter			; GCN-O1-OPTS-NEXT: Virtual Register Rewriter
	; GCN-O1-OPTS-NEXT: Stack Slot Coloring			; GCN-O1-OPTS-NEXT: Stack Slot Coloring
	; GCN-O1-OPTS-NEXT: Machine Copy Propagation Pass			; GCN-O1-OPTS-NEXT: Machine Copy Propagation Pass
	; GCN-O1-OPTS-NEXT: Machine Loop Invariant Code Motion			; GCN-O1-OPTS-NEXT: Machine Loop Invariant Code Motion
	; GCN-O1-OPTS-NEXT: SI Fix VGPR copies			; GCN-O1-OPTS-NEXT: SI Fix VGPR copies
	; GCN-O1-OPTS-NEXT: SI optimize exec mask operations			; GCN-O1-OPTS-NEXT: SI optimize exec mask operations
	; GCN-O1-OPTS-NEXT: Remove Redundant DEBUG_VALUE analysis			; GCN-O1-OPTS-NEXT: Remove Redundant DEBUG_VALUE analysis
	; GCN-O2-NEXT: Lazy Machine Block Frequency Analysis			; GCN-O2-NEXT: Lazy Machine Block Frequency Analysis
	; GCN-O2-NEXT: Machine Optimization Remark Emitter			; GCN-O2-NEXT: Machine Optimization Remark Emitter
	; GCN-O2-NEXT: Greedy Register Allocator			; GCN-O2-NEXT: Greedy Register Allocator
	; GCN-O2-NEXT: Virtual Register Rewriter			; GCN-O2-NEXT: Virtual Register Rewriter
	; GCN-O2-NEXT: SI lower SGPR spill instructions			; GCN-O2-NEXT: SI lower SGPR spill instructions
	; GCN-O2-NEXT: Virtual Register Map			; GCN-O2-NEXT: Virtual Register Map
	; GCN-O2-NEXT: Live Register Matrix			; GCN-O2-NEXT: Live Register Matrix
	; GCN-O2-NEXT: Greedy Register Allocator			; GCN-O2-NEXT: Greedy Register Allocator
				; GCN-O2-NEXT: SI Lower Predicated Copies
	; GCN-O2-NEXT: GCN NSA Reassign			; GCN-O2-NEXT: GCN NSA Reassign
	; GCN-O2-NEXT: Virtual Register Rewriter			; GCN-O2-NEXT: Virtual Register Rewriter
	; GCN-O2-NEXT: Stack Slot Coloring			; GCN-O2-NEXT: Stack Slot Coloring
	; GCN-O2-NEXT: Machine Copy Propagation Pass			; GCN-O2-NEXT: Machine Copy Propagation Pass
	; GCN-O2-NEXT: Machine Loop Invariant Code Motion			; GCN-O2-NEXT: Machine Loop Invariant Code Motion
	; GCN-O2-NEXT: SI Fix VGPR copies			; GCN-O2-NEXT: SI Fix VGPR copies
	; GCN-O2-NEXT: SI optimize exec mask operations			; GCN-O2-NEXT: SI optimize exec mask operations
	; GCN-O2-NEXT: Remove Redundant DEBUG_VALUE analysis			; GCN-O2-NEXT: Remove Redundant DEBUG_VALUE analysis
	; GCN-O3-NEXT: Lazy Machine Block Frequency Analysis			; GCN-O3-NEXT: Lazy Machine Block Frequency Analysis
	; GCN-O3-NEXT: Machine Optimization Remark Emitter			; GCN-O3-NEXT: Machine Optimization Remark Emitter
	; GCN-O3-NEXT: Greedy Register Allocator			; GCN-O3-NEXT: Greedy Register Allocator
	; GCN-O3-NEXT: Virtual Register Rewriter			; GCN-O3-NEXT: Virtual Register Rewriter
	; GCN-O3-NEXT: SI lower SGPR spill instructions			; GCN-O3-NEXT: SI lower SGPR spill instructions
	; GCN-O3-NEXT: Virtual Register Map			; GCN-O3-NEXT: Virtual Register Map
	; GCN-O3-NEXT: Live Register Matrix			; GCN-O3-NEXT: Live Register Matrix
	; GCN-O3-NEXT: Greedy Register Allocator			; GCN-O3-NEXT: Greedy Register Allocator
				; GCN-O3-NEXT: SI Lower Predicated Copies
	; GCN-O3-NEXT: GCN NSA Reassign			; GCN-O3-NEXT: GCN NSA Reassign
	; GCN-O3-NEXT: Virtual Register Rewriter			; GCN-O3-NEXT: Virtual Register Rewriter
	; GCN-O3-NEXT: Stack Slot Coloring			; GCN-O3-NEXT: Stack Slot Coloring
	; GCN-O3-NEXT: Machine Copy Propagation Pass			; GCN-O3-NEXT: Machine Copy Propagation Pass
	; GCN-O3-NEXT: Machine Loop Invariant Code Motion			; GCN-O3-NEXT: Machine Loop Invariant Code Motion
	; GCN-O3-NEXT: SI Fix VGPR copies			; GCN-O3-NEXT: SI Fix VGPR copies
	; GCN-O3-NEXT: SI optimize exec mask operations			; GCN-O3-NEXT: SI optimize exec mask operations
	; GCN-O3-NEXT: Remove Redundant DEBUG_VALUE analysis			; GCN-O3-NEXT: Remove Redundant DEBUG_VALUE analysis
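In each pipeline listed above, `SI Lower Predicated Copies` runs right after the last register-allocator run (Fast Register Allocator at -O0, Greedy Register Allocator at -O1 and above), and the review thread notes that the approach changed back to lowering WWM_COPY to COPY first. The following is a toy C++ sketch of what such a post-allocation rewrite could look like: each whole-wave copy becomes a plain COPY bracketed by an EXEC save/enable-all and an EXEC restore. `Inst`, `lowerWholeWaveCopies`, and the `scratchSGPR` parameter are hypothetical, the instruction spellings are illustrative assumptions, and the patch's actual pass works on `MachineInstr`s and is not reproduced here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy machine instruction: an opcode plus textual operands. This is NOT
// llvm::MachineInstr; it only models the shape of the rewrite.
struct Inst {
  std::string opcode;
  std::vector<std::string> ops;
};

// Hypothetical lowering: every WWM_COPY becomes a plain COPY bracketed by
// an EXEC save/enable-all and an EXEC restore. `scratchSGPR` stands in for
// whatever SGPR pair a real pass would use to hold the saved mask.
std::vector<Inst> lowerWholeWaveCopies(const std::vector<Inst> &in,
                                       const std::string &scratchSGPR) {
  std::vector<Inst> out;
  out.reserve(in.size());
  for (const Inst &mi : in) {
    if (mi.opcode != "WWM_COPY") {
      out.push_back(mi);  // everything else passes through unchanged
      continue;
    }
    assert(mi.ops.size() == 2 && "WWM_COPY takes a dst and a src");
    out.push_back({"S_OR_SAVEEXEC_B64", {scratchSGPR, "-1"}});  // save, all lanes on
    out.push_back({"COPY", mi.ops});                             // copy every lane
    out.push_back({"S_MOV_B64", {"$exec", scratchSGPR}});        // restore EXEC
  }
  return out;
}
```

Placing the EXEC restore immediately after the COPY is what confines the all-lanes mask to the copy itself, which is the summary's stated goal of not affecting the value of EXEC elsewhere.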

llvm/test/CodeGen/AMDGPU/load-global-i16.ll


	; GCN-NOHSA-SI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:240			; GCN-NOHSA-SI-NEXT: buffer_store_dwordx4 v[0:3], off, s[0:3], 0 offset:240
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v21, 0			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v21, 0
	; GCN-NOHSA-SI-NEXT: s_waitcnt expcnt(0)			; GCN-NOHSA-SI-NEXT: s_waitcnt expcnt(0)
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v0, v12			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v0, v12
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v1, v13			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v1, v13
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v2, v14			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v2, v14
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v3, 0			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v3, 0
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v7, 0			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v7, 0
				; GCN-NOHSA-SI-NEXT: ; kill: def $vgpr4_vgpr5_vgpr6_vgpr7 killed $vgpr4_vgpr5_vgpr6_vgpr7 killed $exec
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v41, 0			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v41, 0
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v33, 0			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v33, 0
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v49, 0			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v49, 0
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v25, 0			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v25, 0
	; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v61, 0			; GCN-NOHSA-SI-NEXT: v_mov_b32_e32 v61, 0
	; GCN-NOHSA-SI-NEXT: buffer_store_dwordx4 v[8:11], off, s[0:3], 0 offset:208			; GCN-NOHSA-SI-NEXT: buffer_store_dwordx4 v[8:11], off, s[0:3], 0 offset:208
	; GCN-NOHSA-SI-NEXT: buffer_store_dwordx4 v[54:57], off, s[0:3], 0 offset:176			; GCN-NOHSA-SI-NEXT: buffer_store_dwordx4 v[54:57], off, s[0:3], 0 offset:176
	; GCN-NOHSA-SI-NEXT: buffer_store_dwordx4 v[50:53], off, s[0:3], 0 offset:144			; GCN-NOHSA-SI-NEXT: buffer_store_dwordx4 v[50:53], off, s[0:3], 0 offset:144

llvm/test/CodeGen/AMDGPU/partial-regcopy-and-spill-missed-at-regalloc.ll

	; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	;RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 --stop-after=greedy,1 -verify-machineinstrs < %s | FileCheck -check-prefix=REGALLOC-GFX908 %s			;RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 --stop-after=greedy,1 -verify-machineinstrs < %s | FileCheck -check-prefix=REGALLOC-GFX908 %s
	;RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 --stop-after=prologepilog -verify-machineinstrs < %s | FileCheck -check-prefix=PEI-GFX908 %s			;RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 --stop-after=prologepilog -verify-machineinstrs < %s | FileCheck -check-prefix=PEI-GFX908 %s
	;RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a --stop-after=greedy,1 -verify-machineinstrs < %s | FileCheck -check-prefix=REGALLOC-GFX90A %s			;RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a --stop-after=greedy,1 -verify-machineinstrs < %s | FileCheck -check-prefix=REGALLOC-GFX90A %s
	;RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a --stop-after=prologepilog -verify-machineinstrs < %s | FileCheck -check-prefix=PEI-GFX90A %s			;RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a --stop-after=prologepilog -verify-machineinstrs < %s | FileCheck -check-prefix=PEI-GFX90A %s

	; Partial reg copy and spill missed during regalloc handled later at frame lowering.			; Partial reg copy and spill missed during regalloc handled later at frame lowering.
	define amdgpu_kernel void @partial_copy(<4 x i32> %arg) #0 {			define amdgpu_kernel void @partial_copy(<4 x i32> %arg) #0 {
	; REGALLOC-GFX908-LABEL: name: partial_copy			; REGALLOC-GFX908-LABEL: name: partial_copy
	; REGALLOC-GFX908: bb.0 (%ir-block.0):			; REGALLOC-GFX908: bb.0 (%ir-block.0):
	; REGALLOC-GFX908-NEXT: liveins: $sgpr4_sgpr5			; REGALLOC-GFX908-NEXT: liveins: $sgpr4_sgpr5
	; REGALLOC-GFX908-NEXT: {{ $}}			; REGALLOC-GFX908-NEXT: {{ $}}
	; REGALLOC-GFX908-NEXT: INLINEASM &"; use $0", 1 /* sideeffect attdialect */, 1703945 /* reguse:AGPR_32 */, undef %5:agpr_32			; REGALLOC-GFX908-NEXT: INLINEASM &"; use $0", 1 /* sideeffect attdialect */, 1703945 /* reguse:AGPR_32 */, undef %5:agpr_32
	; REGALLOC-GFX908-NEXT: INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 5767178 /* regdef:VReg_128 */, def %26			; REGALLOC-GFX908-NEXT: INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 5767178 /* regdef:VReg_128 */, def %26
	; REGALLOC-GFX908-NEXT: [[COPY:%[0-9]+]]:av_128 = COPY %26			; REGALLOC-GFX908-NEXT: [[PRED_COPY:%[0-9]+]]:av_128 = PRED_COPY %26
	; REGALLOC-GFX908-NEXT: INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 3080202 /* regdef:VReg_64 */, def %23			; REGALLOC-GFX908-NEXT: INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 3080202 /* regdef:VReg_64 */, def %23
	; REGALLOC-GFX908-NEXT: SI_SPILL_V64_SAVE %23, %stack.0, $sgpr32, 0, implicit $exec :: (store (s64) into %stack.0, align 4, addrspace 5)			; REGALLOC-GFX908-NEXT: SI_SPILL_V64_SAVE %23, %stack.0, $sgpr32, 0, implicit $exec :: (store (s64) into %stack.0, align 4, addrspace 5)
	; REGALLOC-GFX908-NEXT: [[COPY1:%[0-9]+]]:vreg_128 = COPY [[COPY]]			; REGALLOC-GFX908-NEXT: [[PRED_COPY1:%[0-9]+]]:vreg_128 = PRED_COPY [[PRED_COPY]]
	; REGALLOC-GFX908-NEXT: GLOBAL_STORE_DWORDX4 undef %14:vreg_64, [[COPY1]], 0, 0, implicit $exec :: (volatile store (s128) into `ptr addrspace(1) undef`, addrspace 1)			; REGALLOC-GFX908-NEXT: GLOBAL_STORE_DWORDX4 undef %14:vreg_64, [[PRED_COPY1]], 0, 0, implicit $exec :: (volatile store (s128) into `ptr addrspace(1) undef`, addrspace 1)
	; REGALLOC-GFX908-NEXT: renamable $sgpr0_sgpr1_sgpr2_sgpr3 = S_LOAD_DWORDX4_IMM killed renamable $sgpr4_sgpr5, 0, 0 :: (dereferenceable invariant load (s128) from %ir.arg.kernarg.offset1, addrspace 4)			; REGALLOC-GFX908-NEXT: renamable $sgpr0_sgpr1_sgpr2_sgpr3 = S_LOAD_DWORDX4_IMM killed renamable $sgpr4_sgpr5, 0, 0 :: (dereferenceable invariant load (s128) from %ir.arg.kernarg.offset1, addrspace 4)
	; REGALLOC-GFX908-NEXT: [[COPY2:%[0-9]+]]:areg_128 = COPY killed renamable $sgpr0_sgpr1_sgpr2_sgpr3			; REGALLOC-GFX908-NEXT: [[COPY:%[0-9]+]]:areg_128 = COPY killed renamable $sgpr0_sgpr1_sgpr2_sgpr3
	; REGALLOC-GFX908-NEXT: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1, implicit $exec			; REGALLOC-GFX908-NEXT: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1, implicit $exec
	; REGALLOC-GFX908-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 2, implicit $exec			; REGALLOC-GFX908-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 2, implicit $exec
	; REGALLOC-GFX908-NEXT: [[V_MFMA_I32_4X4X4I8_e64_:%[0-9]+]]:areg_128 = V_MFMA_I32_4X4X4I8_e64 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], [[COPY2]], 0, 0, 0, implicit $mode, implicit $exec			; REGALLOC-GFX908-NEXT: [[V_MFMA_I32_4X4X4I8_e64_:%[0-9]+]]:areg_128 = V_MFMA_I32_4X4X4I8_e64 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], [[COPY]], 0, 0, 0, implicit $mode, implicit $exec
	; REGALLOC-GFX908-NEXT: [[SI_SPILL_V64_RESTORE:%[0-9]+]]:vreg_64 = SI_SPILL_V64_RESTORE %stack.0, $sgpr32, 0, implicit $exec :: (load (s64) from %stack.0, align 4, addrspace 5)			; REGALLOC-GFX908-NEXT: [[SI_SPILL_V64_RESTORE:%[0-9]+]]:vreg_64 = SI_SPILL_V64_RESTORE %stack.0, $sgpr32, 0, implicit $exec :: (load (s64) from %stack.0, align 4, addrspace 5)
	; REGALLOC-GFX908-NEXT: GLOBAL_STORE_DWORDX2 undef %16:vreg_64, [[SI_SPILL_V64_RESTORE]], 0, 0, implicit $exec :: (volatile store (s64) into `ptr addrspace(1) undef`, addrspace 1)			; REGALLOC-GFX908-NEXT: GLOBAL_STORE_DWORDX2 undef %16:vreg_64, [[SI_SPILL_V64_RESTORE]], 0, 0, implicit $exec :: (volatile store (s64) into `ptr addrspace(1) undef`, addrspace 1)
	; REGALLOC-GFX908-NEXT: [[COPY3:%[0-9]+]]:vreg_128 = COPY [[V_MFMA_I32_4X4X4I8_e64_]]			; REGALLOC-GFX908-NEXT: [[COPY1:%[0-9]+]]:vreg_128 = COPY [[V_MFMA_I32_4X4X4I8_e64_]]
	; REGALLOC-GFX908-NEXT: GLOBAL_STORE_DWORDX4 undef %18:vreg_64, [[COPY3]], 0, 0, implicit $exec :: (volatile store (s128) into `ptr addrspace(1) undef`, addrspace 1)			; REGALLOC-GFX908-NEXT: GLOBAL_STORE_DWORDX4 undef %18:vreg_64, [[COPY1]], 0, 0, implicit $exec :: (volatile store (s128) into `ptr addrspace(1) undef`, addrspace 1)
	; REGALLOC-GFX908-NEXT: S_ENDPGM 0			; REGALLOC-GFX908-NEXT: S_ENDPGM 0
	; PEI-GFX908-LABEL: name: partial_copy			; PEI-GFX908-LABEL: name: partial_copy
	; PEI-GFX908: bb.0 (%ir-block.0):			; PEI-GFX908: bb.0 (%ir-block.0):
	; PEI-GFX908-NEXT: liveins: $agpr4, $sgpr4_sgpr5, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr7			; PEI-GFX908-NEXT: liveins: $agpr4, $sgpr4_sgpr5, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr7
	; PEI-GFX908-NEXT: {{ $}}			; PEI-GFX908-NEXT: {{ $}}
	; PEI-GFX908-NEXT: $sgpr8_sgpr9_sgpr10_sgpr11 = COPY killed $sgpr0_sgpr1_sgpr2_sgpr3			; PEI-GFX908-NEXT: $sgpr8_sgpr9_sgpr10_sgpr11 = COPY killed $sgpr0_sgpr1_sgpr2_sgpr3
	; PEI-GFX908-NEXT: $sgpr8 = S_ADD_U32 $sgpr8, $sgpr7, implicit-def $scc, implicit-def $sgpr8_sgpr9_sgpr10_sgpr11			; PEI-GFX908-NEXT: $sgpr8 = S_ADD_U32 $sgpr8, $sgpr7, implicit-def $scc, implicit-def $sgpr8_sgpr9_sgpr10_sgpr11
	; PEI-GFX908-NEXT: $sgpr9 = S_ADDC_U32 $sgpr9, 0, implicit-def dead $scc, implicit $scc, implicit-def $sgpr8_sgpr9_sgpr10_sgpr11			; PEI-GFX908-NEXT: $sgpr9 = S_ADDC_U32 $sgpr9, 0, implicit-def dead $scc, implicit $scc, implicit-def $sgpr8_sgpr9_sgpr10_sgpr11
	; PEI-GFX908-NEXT: INLINEASM &"; use $0", 1 /* sideeffect attdialect */, 1703945 /* reguse:AGPR_32 */, undef renamable $agpr0			; PEI-GFX908-NEXT: INLINEASM &"; use $0", 1 /* sideeffect attdialect */, 1703945 /* reguse:AGPR_32 */, undef renamable $agpr0
	; PEI-GFX908-NEXT: INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 5767178 /* regdef:VReg_128 */, def renamable $vgpr0_vgpr1_vgpr2_vgpr3			; PEI-GFX908-NEXT: INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 5767178 /* regdef:VReg_128 */, def renamable $vgpr0_vgpr1_vgpr2_vgpr3
	; PEI-GFX908-NEXT: renamable $agpr0_agpr1_agpr2_agpr3 = COPY killed renamable $vgpr0_vgpr1_vgpr2_vgpr3, implicit $exec			; PEI-GFX908-NEXT: renamable $agpr0_agpr1_agpr2_agpr3 = PRED_COPY killed renamable $vgpr0_vgpr1_vgpr2_vgpr3
	; PEI-GFX908-NEXT: INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 3080202 /* regdef:VReg_64 */, def renamable $vgpr0_vgpr1			; PEI-GFX908-NEXT: INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 3080202 /* regdef:VReg_64 */, def renamable $vgpr0_vgpr1
	; PEI-GFX908-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr0, $sgpr8_sgpr9_sgpr10_sgpr11, 0, 4, 0, 0, implicit $exec, implicit-def $vgpr0_vgpr1, implicit $vgpr0_vgpr1 :: (store (s32) into %stack.0, addrspace 5)			; PEI-GFX908-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr0, $sgpr8_sgpr9_sgpr10_sgpr11, 0, 4, 0, 0, implicit $exec, implicit-def $vgpr0_vgpr1, implicit $vgpr0_vgpr1 :: (store (s32) into %stack.0, addrspace 5)
	; PEI-GFX908-NEXT: $agpr4 = V_ACCVGPR_WRITE_B32_e64 killed $vgpr1, implicit $exec, implicit killed $vgpr0_vgpr1			; PEI-GFX908-NEXT: $agpr4 = V_ACCVGPR_WRITE_B32_e64 killed $vgpr1, implicit $exec, implicit killed $vgpr0_vgpr1
	; PEI-GFX908-NEXT: renamable $vgpr0_vgpr1_vgpr2_vgpr3 = COPY killed renamable $agpr0_agpr1_agpr2_agpr3, implicit $exec			; PEI-GFX908-NEXT: renamable $vgpr0_vgpr1_vgpr2_vgpr3 = PRED_COPY killed renamable $agpr0_agpr1_agpr2_agpr3
	; PEI-GFX908-NEXT: GLOBAL_STORE_DWORDX4 undef renamable $vgpr0_vgpr1, killed renamable $vgpr0_vgpr1_vgpr2_vgpr3, 0, 0, implicit $exec :: (volatile store (s128) into `ptr addrspace(1) undef`, addrspace 1)			; PEI-GFX908-NEXT: GLOBAL_STORE_DWORDX4 undef renamable $vgpr0_vgpr1, killed renamable $vgpr0_vgpr1_vgpr2_vgpr3, 0, 0, implicit $exec :: (volatile store (s128) into `ptr addrspace(1) undef`, addrspace 1)
	; PEI-GFX908-NEXT: renamable $sgpr0_sgpr1_sgpr2_sgpr3 = S_LOAD_DWORDX4_IMM killed renamable $sgpr4_sgpr5, 0, 0 :: (dereferenceable invariant load (s128) from %ir.arg.kernarg.offset1, addrspace 4)			; PEI-GFX908-NEXT: renamable $sgpr0_sgpr1_sgpr2_sgpr3 = S_LOAD_DWORDX4_IMM killed renamable $sgpr4_sgpr5, 0, 0 :: (dereferenceable invariant load (s128) from %ir.arg.kernarg.offset1, addrspace 4)
	; PEI-GFX908-NEXT: renamable $agpr0_agpr1_agpr2_agpr3 = COPY killed renamable $sgpr0_sgpr1_sgpr2_sgpr3, implicit $exec			; PEI-GFX908-NEXT: renamable $agpr0_agpr1_agpr2_agpr3 = COPY killed renamable $sgpr0_sgpr1_sgpr2_sgpr3, implicit $exec
	; PEI-GFX908-NEXT: renamable $vgpr0 = V_MOV_B32_e32 1, implicit $exec			; PEI-GFX908-NEXT: renamable $vgpr0 = V_MOV_B32_e32 1, implicit $exec
	; PEI-GFX908-NEXT: renamable $vgpr1 = V_MOV_B32_e32 2, implicit $exec			; PEI-GFX908-NEXT: renamable $vgpr1 = V_MOV_B32_e32 2, implicit $exec
	; PEI-GFX908-NEXT: renamable $agpr0_agpr1_agpr2_agpr3 = V_MFMA_I32_4X4X4I8_e64 killed $vgpr0, killed $vgpr1, killed $agpr0_agpr1_agpr2_agpr3, 0, 0, 0, implicit $mode, implicit $exec			; PEI-GFX908-NEXT: renamable $agpr0_agpr1_agpr2_agpr3 = V_MFMA_I32_4X4X4I8_e64 killed $vgpr0, killed $vgpr1, killed $agpr0_agpr1_agpr2_agpr3, 0, 0, 0, implicit $mode, implicit $exec
	; PEI-GFX908-NEXT: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr8_sgpr9_sgpr10_sgpr11, 0, 4, 0, 0, implicit $exec, implicit-def $vgpr0_vgpr1 :: (load (s32) from %stack.0, addrspace 5)			; PEI-GFX908-NEXT: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr8_sgpr9_sgpr10_sgpr11, 0, 4, 0, 0, implicit $exec, implicit-def $vgpr0_vgpr1 :: (load (s32) from %stack.0, addrspace 5)
	; PEI-GFX908-NEXT: $vgpr1 = V_ACCVGPR_READ_B32_e64 $agpr4, implicit $exec, implicit $vgpr0_vgpr1			; PEI-GFX908-NEXT: $vgpr1 = V_ACCVGPR_READ_B32_e64 $agpr4, implicit $exec, implicit $vgpr0_vgpr1
	; PEI-GFX908-NEXT: GLOBAL_STORE_DWORDX2 undef renamable $vgpr0_vgpr1, killed renamable $vgpr0_vgpr1, 0, 0, implicit $exec :: (volatile store (s64) into `ptr addrspace(1) undef`, addrspace 1)			; PEI-GFX908-NEXT: GLOBAL_STORE_DWORDX2 undef renamable $vgpr0_vgpr1, killed renamable $vgpr0_vgpr1, 0, 0, implicit $exec :: (volatile store (s64) into `ptr addrspace(1) undef`, addrspace 1)
	; PEI-GFX908-NEXT: renamable $vgpr0_vgpr1_vgpr2_vgpr3 = COPY killed renamable $agpr0_agpr1_agpr2_agpr3, implicit $exec			; PEI-GFX908-NEXT: renamable $vgpr0_vgpr1_vgpr2_vgpr3 = COPY killed renamable $agpr0_agpr1_agpr2_agpr3, implicit $exec
	; PEI-GFX908-NEXT: GLOBAL_STORE_DWORDX4 undef renamable $vgpr0_vgpr1, killed renamable $vgpr0_vgpr1_vgpr2_vgpr3, 0, 0, implicit $exec :: (volatile store (s128) into `ptr addrspace(1) undef`, addrspace 1)			; PEI-GFX908-NEXT: GLOBAL_STORE_DWORDX4 undef renamable $vgpr0_vgpr1, killed renamable $vgpr0_vgpr1_vgpr2_vgpr3, 0, 0, implicit $exec :: (volatile store (s128) into `ptr addrspace(1) undef`, addrspace 1)
	; PEI-GFX908-NEXT: S_ENDPGM 0			; PEI-GFX908-NEXT: S_ENDPGM 0
	; REGALLOC-GFX90A-LABEL: name: partial_copy			; REGALLOC-GFX90A-LABEL: name: partial_copy
	; REGALLOC-GFX90A: bb.0 (%ir-block.0):			; REGALLOC-GFX90A: bb.0 (%ir-block.0):
	; REGALLOC-GFX90A-NEXT: liveins: $sgpr4_sgpr5			; REGALLOC-GFX90A-NEXT: liveins: $sgpr4_sgpr5
	; REGALLOC-GFX90A-NEXT: {{ $}}			; REGALLOC-GFX90A-NEXT: {{ $}}
	; REGALLOC-GFX90A-NEXT: INLINEASM &"; use $0", 1 /* sideeffect attdialect */, 1703945 /* reguse:AGPR_32 */, undef %5:agpr_32			; REGALLOC-GFX90A-NEXT: INLINEASM &"; use $0", 1 /* sideeffect attdialect */, 1703945 /* reguse:AGPR_32 */, undef %5:agpr_32
	; REGALLOC-GFX90A-NEXT: INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 6094858 /* regdef:VReg_128_Align2 */, def %25			; REGALLOC-GFX90A-NEXT: INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 6094858 /* regdef:VReg_128_Align2 */, def %25
	; REGALLOC-GFX90A-NEXT: [[COPY:%[0-9]+]]:av_128_align2 = COPY %25			; REGALLOC-GFX90A-NEXT: [[PRED_COPY:%[0-9]+]]:av_128_align2 = PRED_COPY %25
	; REGALLOC-GFX90A-NEXT: INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 3407882 /* regdef:VReg_64_Align2 */, def %23			; REGALLOC-GFX90A-NEXT: INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 3407882 /* regdef:VReg_64_Align2 */, def %23
	; REGALLOC-GFX90A-NEXT: SI_SPILL_V64_SAVE %23, %stack.0, $sgpr32, 0, implicit $exec :: (store (s64) into %stack.0, align 4, addrspace 5)			; REGALLOC-GFX90A-NEXT: SI_SPILL_V64_SAVE %23, %stack.0, $sgpr32, 0, implicit $exec :: (store (s64) into %stack.0, align 4, addrspace 5)
	; REGALLOC-GFX90A-NEXT: GLOBAL_STORE_DWORDX4 undef %14:vreg_64_align2, [[COPY]], 0, 0, implicit $exec :: (volatile store (s128) into `ptr addrspace(1) undef`, addrspace 1)			; REGALLOC-GFX90A-NEXT: GLOBAL_STORE_DWORDX4 undef %14:vreg_64_align2, [[PRED_COPY]], 0, 0, implicit $exec :: (volatile store (s128) into `ptr addrspace(1) undef`, addrspace 1)
	; REGALLOC-GFX90A-NEXT: renamable $sgpr0_sgpr1_sgpr2_sgpr3 = S_LOAD_DWORDX4_IMM killed renamable $sgpr4_sgpr5, 0, 0 :: (dereferenceable invariant load (s128) from %ir.arg.kernarg.offset1, addrspace 4)			; REGALLOC-GFX90A-NEXT: renamable $sgpr0_sgpr1_sgpr2_sgpr3 = S_LOAD_DWORDX4_IMM killed renamable $sgpr4_sgpr5, 0, 0 :: (dereferenceable invariant load (s128) from %ir.arg.kernarg.offset1, addrspace 4)
	; REGALLOC-GFX90A-NEXT: [[COPY1:%[0-9]+]]:areg_128_align2 = COPY killed renamable $sgpr0_sgpr1_sgpr2_sgpr3			; REGALLOC-GFX90A-NEXT: [[COPY:%[0-9]+]]:areg_128_align2 = COPY killed renamable $sgpr0_sgpr1_sgpr2_sgpr3
	; REGALLOC-GFX90A-NEXT: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1, implicit $exec			; REGALLOC-GFX90A-NEXT: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1, implicit $exec
	; REGALLOC-GFX90A-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 2, implicit $exec			; REGALLOC-GFX90A-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 2, implicit $exec
	; REGALLOC-GFX90A-NEXT: [[V_MFMA_I32_4X4X4I8_e64_:%[0-9]+]]:areg_128_align2 = V_MFMA_I32_4X4X4I8_e64 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], [[COPY1]], 0, 0, 0, implicit $mode, implicit $exec			; REGALLOC-GFX90A-NEXT: [[V_MFMA_I32_4X4X4I8_e64_:%[0-9]+]]:areg_128_align2 = V_MFMA_I32_4X4X4I8_e64 [[V_MOV_B32_e32_]], [[V_MOV_B32_e32_1]], [[COPY]], 0, 0, 0, implicit $mode, implicit $exec
	; REGALLOC-GFX90A-NEXT: [[SI_SPILL_AV64_RESTORE:%[0-9]+]]:av_64_align2 = SI_SPILL_AV64_RESTORE %stack.0, $sgpr32, 0, implicit $exec :: (load (s64) from %stack.0, align 4, addrspace 5)			; REGALLOC-GFX90A-NEXT: [[SI_SPILL_AV64_RESTORE:%[0-9]+]]:av_64_align2 = SI_SPILL_AV64_RESTORE %stack.0, $sgpr32, 0, implicit $exec :: (load (s64) from %stack.0, align 4, addrspace 5)
	; REGALLOC-GFX90A-NEXT: GLOBAL_STORE_DWORDX2 undef %16:vreg_64_align2, [[SI_SPILL_AV64_RESTORE]], 0, 0, implicit $exec :: (volatile store (s64) into `ptr addrspace(1) undef`, addrspace 1)			; REGALLOC-GFX90A-NEXT: GLOBAL_STORE_DWORDX2 undef %16:vreg_64_align2, [[SI_SPILL_AV64_RESTORE]], 0, 0, implicit $exec :: (volatile store (s64) into `ptr addrspace(1) undef`, addrspace 1)
	; REGALLOC-GFX90A-NEXT: GLOBAL_STORE_DWORDX4 undef %18:vreg_64_align2, [[V_MFMA_I32_4X4X4I8_e64_]], 0, 0, implicit $exec :: (volatile store (s128) into `ptr addrspace(1) undef`, addrspace 1)			; REGALLOC-GFX90A-NEXT: GLOBAL_STORE_DWORDX4 undef %18:vreg_64_align2, [[V_MFMA_I32_4X4X4I8_e64_]], 0, 0, implicit $exec :: (volatile store (s128) into `ptr addrspace(1) undef`, addrspace 1)
	; REGALLOC-GFX90A-NEXT: S_ENDPGM 0			; REGALLOC-GFX90A-NEXT: S_ENDPGM 0
	; PEI-GFX90A-LABEL: name: partial_copy			; PEI-GFX90A-LABEL: name: partial_copy
	; PEI-GFX90A: bb.0 (%ir-block.0):			; PEI-GFX90A: bb.0 (%ir-block.0):
	; PEI-GFX90A-NEXT: liveins: $agpr4, $sgpr4_sgpr5, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr7			; PEI-GFX90A-NEXT: liveins: $agpr4, $sgpr4_sgpr5, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr7
	; PEI-GFX90A-NEXT: {{ $}}			; PEI-GFX90A-NEXT: {{ $}}
	; PEI-GFX90A-NEXT: $sgpr8_sgpr9_sgpr10_sgpr11 = COPY killed $sgpr0_sgpr1_sgpr2_sgpr3			; PEI-GFX90A-NEXT: $sgpr8_sgpr9_sgpr10_sgpr11 = COPY killed $sgpr0_sgpr1_sgpr2_sgpr3
	; PEI-GFX90A-NEXT: $sgpr8 = S_ADD_U32 $sgpr8, $sgpr7, implicit-def $scc, implicit-def $sgpr8_sgpr9_sgpr10_sgpr11			; PEI-GFX90A-NEXT: $sgpr8 = S_ADD_U32 $sgpr8, $sgpr7, implicit-def $scc, implicit-def $sgpr8_sgpr9_sgpr10_sgpr11
	; PEI-GFX90A-NEXT: $sgpr9 = S_ADDC_U32 $sgpr9, 0, implicit-def dead $scc, implicit $scc, implicit-def $sgpr8_sgpr9_sgpr10_sgpr11			; PEI-GFX90A-NEXT: $sgpr9 = S_ADDC_U32 $sgpr9, 0, implicit-def dead $scc, implicit $scc, implicit-def $sgpr8_sgpr9_sgpr10_sgpr11
	; PEI-GFX90A-NEXT: INLINEASM &"; use $0", 1 /* sideeffect attdialect */, 1703945 /* reguse:AGPR_32 */, undef renamable $agpr0			; PEI-GFX90A-NEXT: INLINEASM &"; use $0", 1 /* sideeffect attdialect */, 1703945 /* reguse:AGPR_32 */, undef renamable $agpr0
	; PEI-GFX90A-NEXT: INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 6094858 /* regdef:VReg_128_Align2 */, def renamable $vgpr0_vgpr1_vgpr2_vgpr3			; PEI-GFX90A-NEXT: INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 6094858 /* regdef:VReg_128_Align2 */, def renamable $vgpr0_vgpr1_vgpr2_vgpr3
	; PEI-GFX90A-NEXT: renamable $agpr0_agpr1_agpr2_agpr3 = COPY killed renamable $vgpr0_vgpr1_vgpr2_vgpr3, implicit $exec			; PEI-GFX90A-NEXT: renamable $agpr0_agpr1_agpr2_agpr3 = PRED_COPY killed renamable $vgpr0_vgpr1_vgpr2_vgpr3
	; PEI-GFX90A-NEXT: INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 3407882 /* regdef:VReg_64_Align2 */, def renamable $vgpr0_vgpr1			; PEI-GFX90A-NEXT: INLINEASM &"; def $0", 1 /* sideeffect attdialect */, 3407882 /* regdef:VReg_64_Align2 */, def renamable $vgpr0_vgpr1
	; PEI-GFX90A-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr0, $sgpr8_sgpr9_sgpr10_sgpr11, 0, 4, 0, 0, implicit $exec, implicit-def $vgpr0_vgpr1, implicit $vgpr0_vgpr1 :: (store (s32) into %stack.0, addrspace 5)			; PEI-GFX90A-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr0, $sgpr8_sgpr9_sgpr10_sgpr11, 0, 4, 0, 0, implicit $exec, implicit-def $vgpr0_vgpr1, implicit $vgpr0_vgpr1 :: (store (s32) into %stack.0, addrspace 5)
	; PEI-GFX90A-NEXT: $agpr4 = V_ACCVGPR_WRITE_B32_e64 killed $vgpr1, implicit $exec, implicit killed $vgpr0_vgpr1			; PEI-GFX90A-NEXT: $agpr4 = V_ACCVGPR_WRITE_B32_e64 killed $vgpr1, implicit $exec, implicit killed $vgpr0_vgpr1
	; PEI-GFX90A-NEXT: GLOBAL_STORE_DWORDX4 undef renamable $vgpr0_vgpr1, killed renamable $agpr0_agpr1_agpr2_agpr3, 0, 0, implicit $exec :: (volatile store (s128) into `ptr addrspace(1) undef`, addrspace 1)			; PEI-GFX90A-NEXT: GLOBAL_STORE_DWORDX4 undef renamable $vgpr0_vgpr1, killed renamable $agpr0_agpr1_agpr2_agpr3, 0, 0, implicit $exec :: (volatile store (s128) into `ptr addrspace(1) undef`, addrspace 1)
	; PEI-GFX90A-NEXT: renamable $sgpr0_sgpr1_sgpr2_sgpr3 = S_LOAD_DWORDX4_IMM killed renamable $sgpr4_sgpr5, 0, 0 :: (dereferenceable invariant load (s128) from %ir.arg.kernarg.offset1, addrspace 4)			; PEI-GFX90A-NEXT: renamable $sgpr0_sgpr1_sgpr2_sgpr3 = S_LOAD_DWORDX4_IMM killed renamable $sgpr4_sgpr5, 0, 0 :: (dereferenceable invariant load (s128) from %ir.arg.kernarg.offset1, addrspace 4)
	; PEI-GFX90A-NEXT: renamable $agpr0_agpr1_agpr2_agpr3 = COPY killed renamable $sgpr0_sgpr1_sgpr2_sgpr3, implicit $exec			; PEI-GFX90A-NEXT: renamable $agpr0_agpr1_agpr2_agpr3 = COPY killed renamable $sgpr0_sgpr1_sgpr2_sgpr3, implicit $exec
	; PEI-GFX90A-NEXT: renamable $vgpr0 = V_MOV_B32_e32 1, implicit $exec			; PEI-GFX90A-NEXT: renamable $vgpr0 = V_MOV_B32_e32 1, implicit $exec
	; PEI-GFX90A-NEXT: renamable $vgpr1 = V_MOV_B32_e32 2, implicit $exec			; PEI-GFX90A-NEXT: renamable $vgpr1 = V_MOV_B32_e32 2, implicit $exec
	[19 lines not shown]

llvm/test/CodeGen/AMDGPU/regalloc-fail-unsatisfiable-overlapping-tuple-hints.mir

[46 lines not shown, within "body: |"]
; CHECK-NEXT: SI_SPILL_V256_SAVE %7, %stack.1, $sgpr32, 0, implicit $exec :: (store (s256) into %stack.1, align 4, addrspace 5)		; CHECK-NEXT: SI_SPILL_V256_SAVE %7, %stack.1, $sgpr32, 0, implicit $exec :: (store (s256) into %stack.1, align 4, addrspace 5)
; CHECK-NEXT: SI_SPILL_V256_SAVE %5, %stack.0, $sgpr32, 0, implicit $exec :: (store (s256) into %stack.0, align 4, addrspace 5)		; CHECK-NEXT: SI_SPILL_V256_SAVE %5, %stack.0, $sgpr32, 0, implicit $exec :: (store (s256) into %stack.0, align 4, addrspace 5)
; CHECK-NEXT: S_NOP 0, implicit-def %17		; CHECK-NEXT: S_NOP 0, implicit-def %17
; CHECK-NEXT: SI_SPILL_V256_SAVE %17, %stack.2, $sgpr32, 0, implicit $exec :: (store (s256) into %stack.2, align 4, addrspace 5)		; CHECK-NEXT: SI_SPILL_V256_SAVE %17, %stack.2, $sgpr32, 0, implicit $exec :: (store (s256) into %stack.2, align 4, addrspace 5)
; CHECK-NEXT: S_NOP 0, implicit-def %4		; CHECK-NEXT: S_NOP 0, implicit-def %4
; CHECK-NEXT: [[SI_SPILL_V256_RESTORE:%[0-9]+]]:vreg_256 = SI_SPILL_V256_RESTORE %stack.1, $sgpr32, 0, implicit $exec :: (load (s256) from %stack.1, align 4, addrspace 5)		; CHECK-NEXT: [[SI_SPILL_V256_RESTORE:%[0-9]+]]:vreg_256 = SI_SPILL_V256_RESTORE %stack.1, $sgpr32, 0, implicit $exec :: (load (s256) from %stack.1, align 4, addrspace 5)
; CHECK-NEXT: [[SI_SPILL_V256_RESTORE1:%[0-9]+]]:vreg_256 = SI_SPILL_V256_RESTORE %stack.3, $sgpr32, 0, implicit $exec :: (load (s256) from %stack.3, align 4, addrspace 5)		; CHECK-NEXT: [[SI_SPILL_V256_RESTORE1:%[0-9]+]]:vreg_256 = SI_SPILL_V256_RESTORE %stack.3, $sgpr32, 0, implicit $exec :: (load (s256) from %stack.3, align 4, addrspace 5)
; CHECK-NEXT: S_NOP 0, implicit [[SI_SPILL_V256_RESTORE]], implicit [[SI_SPILL_V256_RESTORE1]], implicit %4		; CHECK-NEXT: S_NOP 0, implicit [[SI_SPILL_V256_RESTORE]], implicit [[SI_SPILL_V256_RESTORE1]], implicit %4
; CHECK-NEXT: [[COPY:%[0-9]+]]:vreg_256 = COPY [[SI_SPILL_V256_RESTORE1]]		; CHECK-NEXT: [[PRED_COPY:%[0-9]+]]:vreg_256 = PRED_COPY [[SI_SPILL_V256_RESTORE1]]
; CHECK-NEXT: S_CBRANCH_EXECNZ %bb.2, implicit $exec		; CHECK-NEXT: S_CBRANCH_EXECNZ %bb.2, implicit $exec
; CHECK-NEXT: {{ $}}		; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1:		; CHECK-NEXT: bb.1:
; CHECK-NEXT: successors: %bb.2(0x80000000)		; CHECK-NEXT: successors: %bb.2(0x80000000)
; CHECK-NEXT: {{ $}}		; CHECK-NEXT: {{ $}}
; CHECK-NEXT: S_NOP 0, implicit [[COPY]]		; CHECK-NEXT: S_NOP 0, implicit [[PRED_COPY]]
; CHECK-NEXT: [[SI_SPILL_V256_RESTORE2:%[0-9]+]]:vreg_256 = SI_SPILL_V256_RESTORE %stack.0, $sgpr32, 0, implicit $exec :: (load (s256) from %stack.0, align 4, addrspace 5)		; CHECK-NEXT: [[SI_SPILL_V256_RESTORE2:%[0-9]+]]:vreg_256 = SI_SPILL_V256_RESTORE %stack.0, $sgpr32, 0, implicit $exec :: (load (s256) from %stack.0, align 4, addrspace 5)
; CHECK-NEXT: S_NOP 0, implicit [[SI_SPILL_V256_RESTORE2]]		; CHECK-NEXT: S_NOP 0, implicit [[SI_SPILL_V256_RESTORE2]]
; CHECK-NEXT: [[SI_SPILL_V256_RESTORE3:%[0-9]+]]:vreg_256 = SI_SPILL_V256_RESTORE %stack.2, $sgpr32, 0, implicit $exec :: (load (s256) from %stack.2, align 4, addrspace 5)		; CHECK-NEXT: [[SI_SPILL_V256_RESTORE3:%[0-9]+]]:vreg_256 = SI_SPILL_V256_RESTORE %stack.2, $sgpr32, 0, implicit $exec :: (load (s256) from %stack.2, align 4, addrspace 5)
; CHECK-NEXT: S_NOP 0, implicit [[SI_SPILL_V256_RESTORE3]]		; CHECK-NEXT: S_NOP 0, implicit [[SI_SPILL_V256_RESTORE3]]
; CHECK-NEXT: {{ $}}		; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2:		; CHECK-NEXT: bb.2:
; CHECK-NEXT: S_ENDPGM 0		; CHECK-NEXT: S_ENDPGM 0
bb.0:		bb.0:
[15 lines not shown]

llvm/test/CodeGen/AMDGPU/regalloc-introduces-copy-sgpr-to-agpr.mir

[281 lines not shown, within "bb.0:"]
; GFX908-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 killed $agpr32, implicit $exec, implicit $exec		; GFX908-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 killed $agpr32, implicit $exec, implicit $exec
; GFX908-NEXT: GLOBAL_STORE_DWORD undef $vgpr0_vgpr1, killed renamable $vgpr0, 0, 0, implicit $exec		; GFX908-NEXT: GLOBAL_STORE_DWORD undef $vgpr0_vgpr1, killed renamable $vgpr0, 0, 0, implicit $exec
; GFX908-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 killed $agpr33, implicit $exec, implicit $exec		; GFX908-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 killed $agpr33, implicit $exec, implicit $exec
; GFX908-NEXT: GLOBAL_STORE_DWORD undef $vgpr0_vgpr1, killed renamable $vgpr0, 0, 0, implicit $exec		; GFX908-NEXT: GLOBAL_STORE_DWORD undef $vgpr0_vgpr1, killed renamable $vgpr0, 0, 0, implicit $exec
; GFX908-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 killed $agpr34, implicit $exec, implicit $exec		; GFX908-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 killed $agpr34, implicit $exec, implicit $exec
; GFX908-NEXT: GLOBAL_STORE_DWORD undef $vgpr0_vgpr1, killed renamable $vgpr0, 0, 0, implicit $exec		; GFX908-NEXT: GLOBAL_STORE_DWORD undef $vgpr0_vgpr1, killed renamable $vgpr0, 0, 0, implicit $exec
; GFX908-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 killed $agpr35, implicit $exec, implicit $exec		; GFX908-NEXT: $vgpr0 = V_ACCVGPR_READ_B32_e64 killed $agpr35, implicit $exec, implicit $exec
; GFX908-NEXT: GLOBAL_STORE_DWORD undef $vgpr0_vgpr1, killed renamable $vgpr0, 0, 0, implicit $exec		; GFX908-NEXT: GLOBAL_STORE_DWORD undef $vgpr0_vgpr1, killed renamable $vgpr0, 0, 0, implicit $exec
		; GFX908-NEXT: renamable $agpr0 = KILL killed renamable $agpr0, implicit $exec
; GFX908-NEXT: $vgpr1 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 8, 0, 0, implicit $exec :: (load (s32) from %stack.1, addrspace 5)		; GFX908-NEXT: $vgpr1 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 8, 0, 0, implicit $exec :: (load (s32) from %stack.1, addrspace 5)
; GFX908-NEXT: $vgpr2 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 12, 0, 0, implicit $exec :: (load (s32) from %stack.2, addrspace 5)		; GFX908-NEXT: $vgpr2 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 12, 0, 0, implicit $exec :: (load (s32) from %stack.2, addrspace 5)
; GFX908-NEXT: $vgpr3 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 16, 0, 0, implicit $exec :: (load (s32) from %stack.3, addrspace 5)		; GFX908-NEXT: $vgpr3 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 16, 0, 0, implicit $exec :: (load (s32) from %stack.3, addrspace 5)
; GFX908-NEXT: $vgpr4 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 20, 0, 0, implicit $exec :: (load (s32) from %stack.4, addrspace 5)		; GFX908-NEXT: $vgpr4 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 20, 0, 0, implicit $exec :: (load (s32) from %stack.4, addrspace 5)
; GFX908-NEXT: $vgpr5 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 24, 0, 0, implicit $exec :: (load (s32) from %stack.5, addrspace 5)		; GFX908-NEXT: $vgpr5 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 24, 0, 0, implicit $exec :: (load (s32) from %stack.5, addrspace 5)
; GFX908-NEXT: $vgpr6 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 28, 0, 0, implicit $exec :: (load (s32) from %stack.6, addrspace 5)		; GFX908-NEXT: $vgpr6 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 28, 0, 0, implicit $exec :: (load (s32) from %stack.6, addrspace 5)
; GFX908-NEXT: $vgpr7 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 32, 0, 0, implicit $exec :: (load (s32) from %stack.7, addrspace 5)		; GFX908-NEXT: $vgpr7 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 32, 0, 0, implicit $exec :: (load (s32) from %stack.7, addrspace 5)
; GFX908-NEXT: $vgpr8 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 36, 0, 0, implicit $exec :: (load (s32) from %stack.8, addrspace 5)		; GFX908-NEXT: $vgpr8 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, 0, 36, 0, 0, implicit $exec :: (load (s32) from %stack.8, addrspace 5)
[174 lines not shown]

llvm/test/CodeGen/AMDGPU/sgpr-regalloc-flags.ll

	[15 lines not shown]
	; REGALLOC: -regalloc not supported with amdgcn. Use -sgpr-regalloc and -vgpr-regalloc			; REGALLOC: -regalloc not supported with amdgcn. Use -sgpr-regalloc and -vgpr-regalloc

	; DEFAULT: Greedy Register Allocator			; DEFAULT: Greedy Register Allocator
	; DEFAULT-NEXT: Virtual Register Rewriter			; DEFAULT-NEXT: Virtual Register Rewriter
	; DEFAULT-NEXT: SI lower SGPR spill instructions			; DEFAULT-NEXT: SI lower SGPR spill instructions
	; DEFAULT-NEXT: Virtual Register Map			; DEFAULT-NEXT: Virtual Register Map
	; DEFAULT-NEXT: Live Register Matrix			; DEFAULT-NEXT: Live Register Matrix
	; DEFAULT-NEXT: Greedy Register Allocator			; DEFAULT-NEXT: Greedy Register Allocator
				; DEFAULT-NEXT: SI Lower Predicated Copies
	; DEFAULT-NEXT: GCN NSA Reassign			; DEFAULT-NEXT: GCN NSA Reassign
	; DEFAULT-NEXT: Virtual Register Rewriter			; DEFAULT-NEXT: Virtual Register Rewriter
	; DEFAULT-NEXT: Stack Slot Coloring			; DEFAULT-NEXT: Stack Slot Coloring

	; O0: Fast Register Allocator			; O0: Fast Register Allocator
	; O0-NEXT: SI lower SGPR spill instructions			; O0-NEXT: SI lower SGPR spill instructions
	; O0-NEXT: Fast Register Allocator			; O0-NEXT: Fast Register Allocator
				; O0-NEXT: SI Lower Predicated Copies
	; O0-NEXT: SI Fix VGPR copies			; O0-NEXT: SI Fix VGPR copies




	; BASIC-DEFAULT: Debug Variable Analysis			; BASIC-DEFAULT: Debug Variable Analysis
	; BASIC-DEFAULT-NEXT: Live Stack Slot Analysis			; BASIC-DEFAULT-NEXT: Live Stack Slot Analysis
	; BASIC-DEFAULT-NEXT: Machine Natural Loop Construction			; BASIC-DEFAULT-NEXT: Machine Natural Loop Construction
	; BASIC-DEFAULT-NEXT: Machine Block Frequency Analysis			; BASIC-DEFAULT-NEXT: Machine Block Frequency Analysis
	; BASIC-DEFAULT-NEXT: Virtual Register Map			; BASIC-DEFAULT-NEXT: Virtual Register Map
	; BASIC-DEFAULT-NEXT: Live Register Matrix			; BASIC-DEFAULT-NEXT: Live Register Matrix
	; BASIC-DEFAULT-NEXT: Basic Register Allocator			; BASIC-DEFAULT-NEXT: Basic Register Allocator
	; BASIC-DEFAULT-NEXT: Virtual Register Rewriter			; BASIC-DEFAULT-NEXT: Virtual Register Rewriter
	; BASIC-DEFAULT-NEXT: SI lower SGPR spill instructions			; BASIC-DEFAULT-NEXT: SI lower SGPR spill instructions
	; BASIC-DEFAULT-NEXT: Virtual Register Map			; BASIC-DEFAULT-NEXT: Virtual Register Map
	; BASIC-DEFAULT-NEXT: Live Register Matrix			; BASIC-DEFAULT-NEXT: Live Register Matrix
	; BASIC-DEFAULT-NEXT: Bundle Machine CFG Edges			; BASIC-DEFAULT-NEXT: Bundle Machine CFG Edges
	; BASIC-DEFAULT-NEXT: Spill Code Placement Analysis			; BASIC-DEFAULT-NEXT: Spill Code Placement Analysis
	; BASIC-DEFAULT-NEXT: Lazy Machine Block Frequency Analysis			; BASIC-DEFAULT-NEXT: Lazy Machine Block Frequency Analysis
	; BASIC-DEFAULT-NEXT: Machine Optimization Remark Emitter			; BASIC-DEFAULT-NEXT: Machine Optimization Remark Emitter
	; BASIC-DEFAULT-NEXT: Greedy Register Allocator			; BASIC-DEFAULT-NEXT: Greedy Register Allocator
				; BASIC-DEFAULT-NEXT: SI Lower Predicated Copies
	; BASIC-DEFAULT-NEXT: GCN NSA Reassign			; BASIC-DEFAULT-NEXT: GCN NSA Reassign
	; BASIC-DEFAULT-NEXT: Virtual Register Rewriter			; BASIC-DEFAULT-NEXT: Virtual Register Rewriter
	; BASIC-DEFAULT-NEXT: Stack Slot Coloring			; BASIC-DEFAULT-NEXT: Stack Slot Coloring



	; DEFAULT-BASIC: Greedy Register Allocator			; DEFAULT-BASIC: Greedy Register Allocator
	; DEFAULT-BASIC-NEXT: Virtual Register Rewriter			; DEFAULT-BASIC-NEXT: Virtual Register Rewriter
	; DEFAULT-BASIC-NEXT: SI lower SGPR spill instructions			; DEFAULT-BASIC-NEXT: SI lower SGPR spill instructions
	; DEFAULT-BASIC-NEXT: Virtual Register Map			; DEFAULT-BASIC-NEXT: Virtual Register Map
	; DEFAULT-BASIC-NEXT: Live Register Matrix			; DEFAULT-BASIC-NEXT: Live Register Matrix
	; DEFAULT-BASIC-NEXT: Basic Register Allocator			; DEFAULT-BASIC-NEXT: Basic Register Allocator
				; DEFAULT-BASIC-NEXT: SI Lower Predicated Copies
	; DEFAULT-BASIC-NEXT: GCN NSA Reassign			; DEFAULT-BASIC-NEXT: GCN NSA Reassign
	; DEFAULT-BASIC-NEXT: Virtual Register Rewriter			; DEFAULT-BASIC-NEXT: Virtual Register Rewriter
	; DEFAULT-BASIC-NEXT: Stack Slot Coloring			; DEFAULT-BASIC-NEXT: Stack Slot Coloring



	; BASIC-BASIC: Debug Variable Analysis			; BASIC-BASIC: Debug Variable Analysis
	; BASIC-BASIC-NEXT: Live Stack Slot Analysis			; BASIC-BASIC-NEXT: Live Stack Slot Analysis
	; BASIC-BASIC-NEXT: Machine Natural Loop Construction			; BASIC-BASIC-NEXT: Machine Natural Loop Construction
	; BASIC-BASIC-NEXT: Machine Block Frequency Analysis			; BASIC-BASIC-NEXT: Machine Block Frequency Analysis
	; BASIC-BASIC-NEXT: Virtual Register Map			; BASIC-BASIC-NEXT: Virtual Register Map
	; BASIC-BASIC-NEXT: Live Register Matrix			; BASIC-BASIC-NEXT: Live Register Matrix
	; BASIC-BASIC-NEXT: Basic Register Allocator			; BASIC-BASIC-NEXT: Basic Register Allocator
	; BASIC-BASIC-NEXT: Virtual Register Rewriter			; BASIC-BASIC-NEXT: Virtual Register Rewriter
	; BASIC-BASIC-NEXT: SI lower SGPR spill instructions			; BASIC-BASIC-NEXT: SI lower SGPR spill instructions
	; BASIC-BASIC-NEXT: Virtual Register Map			; BASIC-BASIC-NEXT: Virtual Register Map
	; BASIC-BASIC-NEXT: Live Register Matrix			; BASIC-BASIC-NEXT: Live Register Matrix
	; BASIC-BASIC-NEXT: Basic Register Allocator			; BASIC-BASIC-NEXT: Basic Register Allocator
				; BASIC-BASIC-NEXT: SI Lower Predicated Copies
	; BASIC-BASIC-NEXT: GCN NSA Reassign			; BASIC-BASIC-NEXT: GCN NSA Reassign
	; BASIC-BASIC-NEXT: Virtual Register Rewriter			; BASIC-BASIC-NEXT: Virtual Register Rewriter
	; BASIC-BASIC-NEXT: Stack Slot Coloring			; BASIC-BASIC-NEXT: Stack Slot Coloring


	declare void @bar()			declare void @bar()

	; Something with some CSR SGPR spills			; Something with some CSR SGPR spills
	[18 lines not shown]

llvm/test/CodeGen/AMDGPU/skip-subreg-copy-from-iswwmcopy-check.mir

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
				# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a -run-pass=si-lower-predicated-copies -verify-machineinstrs %s -o - | FileCheck -check-prefix=GCN %s

				# The test goes into an infinite loop while checking for isWWMCopy().
				# getUniqueVRegDef of the SrcReg returns the instruction itself if it is a partial copy.
				# WWM copies are always full copies, hence skip subreg copies while checking for one.

				---
				name: subreg_copy
				tracksRegLiveness: true
				machineFunctionInfo:
				  isEntryFunction: false
				body: |
				  bb.0:
				    ; GCN-LABEL: name: subreg_copy
				    ; GCN: dead undef %0.sub3:vreg_128_align2 = PRED_COPY undef %0.sub1
				    ; GCN-NEXT: SI_RETURN
				    dead undef %0.sub3:vreg_128_align2 = PRED_COPY undef %0.sub1
				    SI_RETURN
				...
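The comments in this test describe the hang: for a partial copy such as %0.sub3 = PRED_COPY %0.sub1, looking up the unique definition of the source register returns the copy itself, so a walk that chases definitions never terminates. Below is a minimal C++17 sketch of that reasoning on a hypothetical toy instruction model. Instr, Function, uniqueDef and isWWMCopy are illustrative names, not LLVM's MachineInstr or MachineRegisterInfo API, and the IsWholeWaveDef flag is an assumed stand-in for however the real pass identifies whole-wave values. The sketch rejects any sub-register copy before chasing definitions, mirroring the note that WWM copies are always full copies.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical toy model of a copy-like instruction (not LLVM's API).
struct Instr {
  bool IsCopy = false;
  bool IsWholeWaveDef = false; // assumed marker for a whole-wave definition
  int DstReg = -1;
  int SrcReg = -1;   // only meaningful for copies
  int DstSubReg = 0; // 0 means the full register
  int SrcSubReg = 0;
};

struct Function {
  std::vector<Instr> Instrs;

  // Plays the role of getUniqueVRegDef: the single instruction defining Reg,
  // or nullopt if Reg has zero or several definitions.
  std::optional<std::size_t> uniqueDef(int Reg) const {
    std::optional<std::size_t> Def;
    for (std::size_t I = 0; I < Instrs.size(); ++I) {
      if (Instrs[I].DstReg != Reg)
        continue;
      if (Def)
        return std::nullopt; // more than one definition
      Def = I;
    }
    return Def;
  }
};

// Hypothetical check: a copy counts as a WWM copy if, following full copies
// back through their sources, we reach a whole-wave definition.
bool isWWMCopy(const Function &F, std::size_t Idx) {
  std::vector<bool> Seen(F.Instrs.size(), false);
  std::size_t Cur = Idx;
  while (!Seen[Cur]) {
    Seen[Cur] = true;
    const Instr &MI = F.Instrs[Cur];
    if (!MI.IsCopy)
      return MI.IsWholeWaveDef;
    // The fix described in the test: WWM copies are always full copies, so
    // a subreg copy is rejected before its (possibly self-) def is chased.
    if (MI.DstSubReg != 0 || MI.SrcSubReg != 0)
      return false;
    std::optional<std::size_t> Def = F.uniqueDef(MI.SrcReg);
    if (!Def)
      return false;
    Cur = *Def;
  }
  return false; // defensive guard against a cycle of full copies
}
```

With a whole-wave definition of %1 and a full copy %2 = COPY %1, the walk reaches the flagged definition and returns true; the partial self-copy %0.sub3 = COPY %0.sub1 returns false immediately instead of resolving its source to itself. The visited set is an extra defensive assumption for cycles of full copies; the test comments do not describe it.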


Revision Contents

Diff 528812

llvm/lib/CodeGen/ExpandPostRAPseudos.cpp

llvm/lib/Target/AMDGPU/AMDGPU.h

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

llvm/lib/Target/AMDGPU/CMakeLists.txt

llvm/lib/Target/AMDGPU/SIFixVGPRCopies.cpp

llvm/lib/Target/AMDGPU/SIInstrInfo.h

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp

llvm/lib/Target/AMDGPU/SIInstructions.td

llvm/lib/Target/AMDGPU/SILowerPredicatedCopies.cpp

llvm/test/CodeGen/AMDGPU/branch-folding-implicit-def-subreg.ll

llvm/test/CodeGen/AMDGPU/greedy-global-heuristic.mir

llvm/test/CodeGen/AMDGPU/llc-pipeline.ll

llvm/test/CodeGen/AMDGPU/load-global-i16.ll

llvm/test/CodeGen/AMDGPU/partial-regcopy-and-spill-missed-at-regalloc.ll

llvm/test/CodeGen/AMDGPU/regalloc-fail-unsatisfiable-overlapping-tuple-hints.mir

llvm/test/CodeGen/AMDGPU/regalloc-introduces-copy-sgpr-to-agpr.mir

llvm/test/CodeGen/AMDGPU/sgpr-regalloc-flags.ll

llvm/test/CodeGen/AMDGPU/skip-subreg-copy-from-iswwmcopy-check.mir
