This is an archive of the discontinued LLVM Phabricator instance.


Differential D102212

[AMDGPU] Add Optimize VGPR LiveRange Pass.
ClosedPublic

Authored by ruiling on May 10 2021, 8:17 PM.


Details

Reviewers

arsenm
foad
critson
piotr
nhaehnle

Commits

rG208332de8abf: [AMDGPU] Add Optimize VGPR LiveRange Pass.

Summary

This pass aims to optimize VGPR live ranges in typical divergent if-else
control flow. For example:

def(a)
if(cond)
    use(a)
    ... // A
else
    use(a)

Because AMDGPU accesses VGPRs with respect to the active mask, we can mark a as
dead in region A. For details, please refer to the comments in the
implementation file.

The pass is enabled by default; the frontend can disable it with
"-amdgpu-opt-vgpr-liverange=false".
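The live-range argument in the summary can be checked with a toy lane model. The following is only a sketch under stated assumptions: the 8-lane wave, `Mask`, `VGPR`, `maskedWrite`, `maskedReadEquals` and `elseSeesOriginalValue` are hypothetical names invented for illustration and are not part of the patch. It models a register as one value per lane and shows that overwriting `a` in region A (under the THEN mask) cannot disturb the lanes that later read `a` in the ELSE part.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical toy wave size; real waves are wider.
constexpr std::size_t kLanes = 8;

using Mask = std::bitset<kLanes>;                 // one bit per lane
using VGPR = std::array<std::uint32_t, kLanes>;   // one value per lane

// Write `value` only to the lanes enabled in `exec`.
void maskedWrite(VGPR &reg, const Mask &exec, std::uint32_t value) {
  for (std::size_t lane = 0; lane < kLanes; ++lane)
    if (exec[lane])
      reg[lane] = value;
}

// True if every lane enabled in `exec` still holds `expected`.
bool maskedReadEquals(const VGPR &reg, const Mask &exec,
                      std::uint32_t expected) {
  for (std::size_t lane = 0; lane < kLanes; ++lane)
    if (exec[lane] && reg[lane] != expected)
      return false;
  return true;
}

// Models def(a); if (cond) { use(a); region A reuses a's storage }
// else { use(a) }. Returns true if both uses observe the defined value.
bool elseSeesOriginalValue(const Mask &cond) {
  VGPR a{};
  Mask all;
  all.set();
  maskedWrite(a, all, 42);                 // def(a) in all lanes

  const Mask thenExec = cond;              // lanes that take the THEN part
  const Mask elseExec = ~cond;             // the complementary lanes

  bool thenOk = maskedReadEquals(a, thenExec, 42);  // use(a) in THEN
  maskedWrite(a, thenExec, 0xDEAD);                 // region A clobbers a

  return thenOk && maskedReadEquals(a, elseExec, 42);  // use(a) in ELSE
}
```

Because the THEN and ELSE masks are disjoint, the clobber in region A only ever touches lanes that the ELSE use does not read, which is the property the pass relies on.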

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ruiling created this revision. May 10 2021, 8:17 PM

Herald added subscribers: kerbowa, hiraditya, t-tye and 8 others. May 10 2021, 8:17 PM

ruiling requested review of this revision. May 10 2021, 8:17 PM

Herald added a project: Restricted Project. May 10 2021, 8:17 PM

Herald added subscribers: llvm-commits, wdng.

Harbormaster completed remote builds in B103653: Diff 344279. May 10 2021, 8:30 PM

ping

arsenm added a reviewer: nhaehnle. May 17 2021, 1:54 PM

Is this relying on assumptions about the placement of blocks after structurization? I think we need to augment the MIR to better track vector vs. scalar predecessors.

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
168	Should default to on
llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
39	s/LLVM/register allocator
72	Why no initializer like the rest?
166	printName?
168	single quotes
203	Could also handle AGPR
276	Braces
301	printName
326	printName
459–460	Can move this before the initializations
llvm/test/CodeGen/AMDGPU/vgpr-liverange.ll
3	Could really use additional end-to-end IR checks to make sure this actually gets the allocator to do what you want.

In D102212#2764725, @arsenm wrote:

Is this relying on assumptions about the placement of blocks after structurization? I think we need to augment the MIR to better track vector vs. scalar predecessors.

Thanks for the careful review! This pass assumes the IR is structurized, mainly around if-else-endif: each SI_ELSE should have a corresponding SI_IF and SI_ENDIF, and the sub-regions should be single-entry-single-exit. I don't think I depend on a specific order of the blocks.

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
168	Default on is OK with me, but I really hope we can do some testing on the Compute and Mesa side to make sure we don't have obvious regressions before landing.
llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
72	should be like others.
203	I guess AGPRs are used as physical registers? If so, I think we cannot handle them here, as handling physical registers needs a lot more work on the liveness checking and on updating the LiveVariables information.
llvm/test/CodeGen/AMDGPU/vgpr-liverange.ll
3	good idea, will do that.

arsenm added inline comments. May 17 2021, 4:56 PM

llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
203	No, it's a class. They behave exactly like VGPRs

critson added inline comments. May 21 2021, 3:01 AM

llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
10	"optimize off unnecessary VGPR live range" -> "remove unnecessary VGPR live ranges"?
190	Any reason for explicit 8 here? (I am also being reminded not to write explicit sizes in these.)
193	Is this Else->instrs() to skip phis for a reason? (Rather than *Else.)
196	I think you can just use MI.getNumOperands() directly in the loop condition and leave it up to the compiler whether to hoist it? In fact you do that in the next similar loop.
199	What is the "MO.getReg() == 0" for? It does not seem correct.
222	Where does this loop check that the registers added to KillsInElse are actually from the else branch?
237	Should this test be before the live interval retrieval?
290	Can break out of loop here? Also perhaps assert(ThenEntry) after loop?
347	As Matt noted on the earlier one of these probably should use braces here too.
371	while?
475	As earlier, do we need explicit 8s here?

foad added inline comments. May 21 2021, 3:34 AM

llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
196	Or `for (auto &MO : MI.operands())`?
334	Use llvm::is_contained (here and a few other places).

Thanks for all the careful comments. I will also address the new comments from Carl and Jay in the next version.

llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
190	No specific reason, will remove it.
193	I think they are just the same. instrs() means all the instructions I think.
196	sounds good.
222	I am assuming that the predecessor that is not Flow is from the Else region. Maybe I need to add an assert for this.
237	Do you mean putting this check before the line "LV->getVarInfo(Reg)"? If yes, I will fix it in the next version.
290	sure, will do it.

ruiling added inline comments. May 21 2021, 7:37 AM

llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
199	They are just for the trivial invalid case; it should be AMDGPU::NoRegister. Why is it not correct?

address review comments

Herald added a subscriber: nikic. May 21 2021, 9:24 AM

Harbormaster completed remote builds in B105659: Diff 347063. May 21 2021, 10:15 AM

arsenm added inline comments. May 21 2021, 12:37 PM

llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
203	There's an isVectorRegister helper

use isVectorRegister()

ruiling edited the summary of this revision. May 24 2021, 6:19 PM

ruiling added inline comments.

llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
190	SmallSet still needs an explicit size, so I kept it while removing the explicit sizes for SmallVector.

Harbormaster completed remote builds in B106018: Diff 347538. May 24 2021, 6:46 PM

ping, any further comments?

ping again. I would like to add more background about the patch: this change could improve the performance of some critical workloads by over 8%, and this improvement is quite important for us. Would you be willing to accept this? @arsenm @critson

ruiling marked 23 inline comments as done. Jun 7 2021, 4:17 AM

Sorry, a few more comments, mostly minor.

llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
120	Order of Required and Preserved could be the same?
153	I think this can just be "while (MBB) {"
159	This doesn't seem right. You only collect an arbitrary number of blocks? Should ElseBlocks not simply be growing to hold all relevant blocks?
198	Instead of "MO.getReg() == AMDGPU::NoRegister", I believe you can just write "!MO.getReg()".
231	As above.
llvm/test/CodeGen/AMDGPU/vgpr-liverange2.ll
1 ↗	(On Diff #347538)	These files should have a better name. Perhaps: vgpr-liverange.ll -> vgpr-liverange-ir.ll, and vgpr-liverange2.ll -> vgpr-liverange.ll. We definitely don't use numeric suffixes at present.

foad added inline comments. Jun 8 2021, 3:36 AM

llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
159	Cur tells you the first index in Blocks of a block that you have not already visited to scan its predecessors. Perhaps it would be clearer as:
	SetVector<MachineBasicBlock *> Blocks(Endif);
	for (unsigned Cur = 0; Cur < Blocks.size(); ++Cur) {
	  auto *MBB = Blocks[Cur];
	  for (auto *Pred : MBB->predecessors()) {
	    if (Pred != Flow)
	      Blocks.insert(Pred);
	  }
	}
175	`for (auto &UseMI : MRI->use_nodbg_instructions(Reg))`

ruiling added inline comments. Jun 8 2021, 6:27 PM

llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
159	Hi @critson, does @foad's comment help answer your concern? I don't quite understand your confusion. The idea here is to iteratively visit the predecessors starting from `Endif`, push them into Blocks, and stop when the predecessor is `Flow`. I admit the code here may not be easy to understand; I struggled for a while when writing it. @foad's suggestion missed one subtle requirement: `Endif` should not appear in the final result, and I don't want to remove the first element from the vector afterwards. I think that was the main reason I wrote it like this. Considering that for most cases we would not have too much blocks in an if-else structure (maybe I am wrong?), I think a "find in vector" should not be a big issue.
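The traversal discussed in this thread can be sketched with standard containers. This is a minimal illustration, not the code from the patch: `BlockId`, `PredMap` and the toy CFG encoding are assumptions made for the example, and a `std::vector` plus `std::unordered_set` pair stands in for an insertion-ordered set. Seeding the worklist from the predecessors of `Endif`, rather than from `Endif` itself, keeps `Endif` out of the result without removing an element afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical CFG encoding: preds[B] lists the predecessors of block B.
using BlockId = int;
using PredMap = std::vector<std::vector<BlockId>>;

// Collect every block reachable backwards from `endif` without passing
// through `flow`. Neither `flow` nor `endif` is part of the result.
std::vector<BlockId> collectElseRegionBlocks(const PredMap &preds,
                                             BlockId flow, BlockId endif) {
  std::vector<BlockId> blocks;        // keeps insertion order
  std::unordered_set<BlockId> seen;   // constant-time membership test

  auto addPreds = [&](BlockId block) {
    for (BlockId pred : preds[block])
      if (pred != flow && seen.insert(pred).second)
        blocks.push_back(pred);
  };

  addPreds(endif);  // seed with Endif's predecessors only
  // `blocks` doubles as the worklist: entries before `cur` are processed.
  for (std::size_t cur = 0; cur < blocks.size(); ++cur)
    addPreds(blocks[cur]);
  return blocks;
}
```

With blocks numbered If=0, Then=1, Flow=2, Else1=3, Else2=4, Endif=5 and the predecessor lists `{{}, {0}, {0, 1}, {2}, {3}, {2, 4}}`, the call `collectElseRegionBlocks(preds, 2, 5)` visits Else2 and then Else1. The separate set keeps each membership check cheap even when the if-else wraps a very large region.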

address review comments

ruiling marked 5 inline comments as done. Jun 9 2021, 1:06 AM

Harbormaster completed remote builds in B108354: Diff 350804. Jun 9 2021, 1:37 AM

foad added inline comments. Jun 9 2021, 2:05 AM

llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
159	"for most cases we would not have too much blocks in an if-else structure": this makes me a bit nervous. People write big shaders, so some people probably write big shaders with an if-then-else wrapped around the whole thing.

critson added inline comments. Jun 9 2021, 2:21 AM

llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
159	OK, I understand what the loop does now. I clearly understood it the first time, but got confused on the second reading. As Jay points out, the is_contained in the loop may be executed a lot with a big if-else in a shader. Maybe this should be a SmallSetVector?

Use SmallSetVector for ElseBlocks

ruiling marked 2 inline comments as done. Jun 9 2021, 4:44 AM

ruiling added inline comments.

llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
159	done

Harbormaster completed remote builds in B108387: Diff 350856. Jun 9 2021, 5:18 AM

LGTM, with one nit

llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
350	Since Visited is a set type, I think this should be a direct test on the set, not is_contained.
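The nit above can be illustrated with standard containers. This is a hedged sketch rather than the reviewed code: `containsByScan` and `containsByLookup` are hypothetical helper names, and `std::set` stands in for whatever set type `Visited` actually is. A generic range search walks the elements one by one, while the set's own lookup uses its internal structure.

```cpp
#include <algorithm>
#include <cassert>
#include <set>

// Generic range search: visits elements one at a time (linear time),
// which is what a range-based "is contained" style helper amounts to
// when it is built on std::find.
bool containsByScan(const std::set<int> &visited, int value) {
  return std::find(visited.begin(), visited.end(), value) != visited.end();
}

// Direct test on the set: uses the container's own lookup
// (logarithmic time for std::set).
bool containsByLookup(const std::set<int> &visited, int value) {
  return visited.count(value) != 0;
}
```

Both functions return the same answer; the difference is only in cost, which matters when the test sits inside a loop over a large region.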

This revision is now accepted and ready to land. Jun 9 2021, 6:59 PM

This revision was landed with ongoing or failed builds. Jun 21 2021, 12:27 AM

Closed by commit rG208332de8abf: [AMDGPU] Add Optimize VGPR LiveRange Pass. (authored by ruiling).

This revision was automatically updated to reflect the committed changes.

ruiling marked an inline comment as done.

ruiling added a commit: rG208332de8abf: [AMDGPU] Add Optimize VGPR LiveRange Pass.

Revision Contents

Path	Size
llvm/lib/Target/AMDGPU/AMDGPU.h	4 lines
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp	8 lines
llvm/lib/Target/AMDGPU/CMakeLists.txt	1 line
llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp	497 lines
llvm/test/CodeGen/AMDGPU/vgpr-liverange.ll	187 lines

Diff 344279

llvm/lib/Target/AMDGPU/AMDGPU.h

  ...
  FunctionPass *createSIFoldOperandsPass();
  FunctionPass *createSIPeepholeSDWAPass();
  FunctionPass *createSILowerI1CopiesPass();
  FunctionPass *createSIShrinkInstructionsPass();
  FunctionPass *createSILoadStoreOptimizerPass();
  FunctionPass *createSIWholeQuadModePass();
  FunctionPass *createSIFixControlFlowLiveIntervalsPass();
  FunctionPass *createSIOptimizeExecMaskingPreRAPass();
+ FunctionPass *createSIOptimizeVGPRLiveRangePass();
  FunctionPass *createSIFixSGPRCopiesPass();
  FunctionPass *createSIMemoryLegalizerPass();
  FunctionPass *createSIInsertWaitcntsPass();
  FunctionPass *createSIPreAllocateWWMRegsPass();
  FunctionPass *createSIFormMemoryClausesPass();

  FunctionPass *createSIPostRABundlerPass();
  FunctionPass *createAMDGPUSimplifyLibCallsPass(const TargetMachine *);
  ...

  struct AMDGPUUnifyMetadataPass : PassInfoMixin<AMDGPUUnifyMetadataPass> {
    PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM);
  };

  void initializeSIOptimizeExecMaskingPreRAPass(PassRegistry&);
  extern char &SIOptimizeExecMaskingPreRAID;

+ void initializeSIOptimizeVGPRLiveRangePass(PassRegistry &);
+ extern char &SIOptimizeVGPRLiveRangeID;
+
  void initializeAMDGPUAnnotateUniformValuesPass(PassRegistry&);
  extern char &AMDGPUAnnotateUniformValuesPassID;

  void initializeAMDGPUCodeGenPreparePass(PassRegistry&);
  extern char &AMDGPUCodeGenPrepareID;

  void initializeAMDGPULateCodeGenPreparePass(PassRegistry &);
  extern char &AMDGPULateCodeGenPrepareID;
  ...

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

  static cl::opt<bool> EnableLowerKernelArguments(
  ...
    cl::Hidden);

  static cl::opt<bool> EnableRegReassign(
    "amdgpu-reassign-regs",
    cl::desc("Enable register reassign optimizations on gfx10+"),
    cl::init(true),
    cl::Hidden);

+ static cl::opt<bool> OptVGPRLiveRange(
+   "amdgpu-opt-vgpr-liverange",
+   cl::desc("Enable VGPR liverange optimizations for if-else structure"),
+   cl::init(false), cl::Hidden);
arsenm: Should default to on
ruiling: default on is ok to me, but I really hope we could do some testing at Compute and Mesa side to make sure we don't have obvious regression before landed.
+
  // Enable atomic optimization
  static cl::opt<bool> EnableAtomicOptimizations(
    "amdgpu-atomic-optimizations",
    cl::desc("Enable atomic optimizations"),
    cl::init(false),
    cl::Hidden);

  // Enable Mode register optimization
  ...
  extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeAMDGPUTarget() {
  ...
    initializeSILowerI1CopiesPass(*PR);
    initializeSILowerSGPRSpillsPass(*PR);
    initializeSIFixSGPRCopiesPass(*PR);
    initializeSIFixVGPRCopiesPass(*PR);
    initializeSIFoldOperandsPass(*PR);
    initializeSIPeepholeSDWAPass(*PR);
    initializeSIShrinkInstructionsPass(*PR);
    initializeSIOptimizeExecMaskingPreRAPass(*PR);
+   initializeSIOptimizeVGPRLiveRangePass(*PR);
    initializeSILoadStoreOptimizerPass(*PR);
    initializeAMDGPUFixFunctionBitcastsPass(*PR);
    initializeAMDGPUAlwaysInlinePass(*PR);
    initializeAMDGPUAnnotateKernelFeaturesPass(*PR);
    initializeAMDGPUAnnotateUniformValuesPass(*PR);
    initializeAMDGPUArgumentUsageInfoPass(*PR);
    initializeAMDGPUAtomicOptimizerPass(*PR);
    initializeAMDGPULowerKernelArgumentsPass(*PR);
  ...
  void GCNPassConfig::addOptimizedRegAlloc() {
  ...
    // instructions that cause scheduling barriers.
    insertPass(&MachineSchedulerID, &SIWholeQuadModeID);
    insertPass(&MachineSchedulerID, &SIPreAllocateWWMRegsID);

    if (OptExecMaskPreRA)
      insertPass(&MachineSchedulerID, &SIOptimizeExecMaskingPreRAID);
    insertPass(&MachineSchedulerID, &SIFormMemoryClausesID);

+   if (OptVGPRLiveRange)
+     insertPass(&LiveVariablesID, &SIOptimizeVGPRLiveRangeID);
    // This must be run immediately after phi elimination and before
    // TwoAddressInstructions, otherwise the processing of the tied operand of
    // SI_ELSE will introduce a copy of the tied operand source after the else.
    insertPass(&PHIEliminationID, &SILowerControlFlowID, false);

    if (EnableDCEInRA)
      insertPass(&DetectDeadLanesID, &DeadMachineInstructionElimID);

  ...

llvm/lib/Target/AMDGPU/CMakeLists.txt

  add_llvm_target(AMDGPUCodeGen
  ...
    SILowerControlFlow.cpp
    SILowerI1Copies.cpp
    SILowerSGPRSpills.cpp
    SIMachineFunctionInfo.cpp
    SIMachineScheduler.cpp
    SIMemoryLegalizer.cpp
    SIOptimizeExecMasking.cpp
    SIOptimizeExecMaskingPreRA.cpp
+   SIOptimizeVGPRLiveRange.cpp
    SIPeepholeSDWA.cpp
    SIPostRABundler.cpp
    SIPreEmitPeephole.cpp
    SIProgramInfo.cpp
    SIRegisterInfo.cpp
    SIShrinkInstructions.cpp
    SIWholeQuadMode.cpp
    GCNILPSched.cpp
  ...

llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp

This file was added.

				//===--------------------- SIOptimizeVGPRLiveRange.cpp -------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				/// \file
				/// This pass tries to optimize off unnecessary VGPR live range in divergent
				critson: "optimize off unnecessary VGPR live range" -> "remove unnecessary VGPR live ranges"?
				/// if-else structure.
				///
				/// When we do structurization, we usually transform an if-else into two
				/// successive if-then (with a flow block to do predicate inversion). Consider a
				/// simple case after structurization: A divergent value %a was defined before
				/// if-else and used in both THEN (use in THEN is optional) and ELSE part:
				/// bb.if:
				/// %a = ...
				/// ...
				/// bb.then:
				/// ... = op %a
				/// ... // %a can be dead here
				/// bb.flow:
				/// ...
				/// bb.else:
				/// ... = %a
				/// ...
				/// bb.endif
				///
				/// As LLVM has no idea of the thread-control-flow, it will just assume
				/// %a would be alive in the whole range of bb.then because of a later use in
				/// bb.else. On AMDGPU architecture, the VGPR was accessed with respect to exec
				/// mask. For this if-else case, the lanes active in bb.then will be inactive
				/// in bb.else, and vice versa. So we are safe to say that %a was dead after
				/// the last use in bb.then until the end of the block. The reason is the
				/// instructions in bb.then will only overwrite lanes that will never be
				/// accessed in bb.else.
				///
				/// This pass aims to tell LLVM that %a is in fact dead, through inserting
				arsenm: s/LLVM/register allocator
				/// a phi-node in bb.flow saying that %a is undef when coming from bb.then,
				/// and then replace the uses in the bb.else with the result of newly
				/// inserted phi.
				///
				/// Two key conditions must be met to ensure correctness:
				/// 1.) The def-point should be in the same loop-level as if-else-endif to make
				/// sure the second loop iteration still get correct data.
				/// 2.) There should be no further uses after the IF-ELSE region.
				///
				//
				//===----------------------------------------------------------------------===//

				#include "AMDGPU.h"
				#include "GCNSubtarget.h"
				#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
				#include "SIMachineFunctionInfo.h"
				#include "llvm/CodeGen/LiveVariables.h"
				#include "llvm/CodeGen/MachineDominators.h"
				#include "llvm/CodeGen/MachineLoopInfo.h"
				#include "llvm/CodeGen/TargetRegisterInfo.h"
				#include "llvm/InitializePasses.h"

				using namespace llvm;

				#define DEBUG_TYPE "si-opt-vgpr-liverange"

				namespace {

				class SIOptimizeVGPRLiveRange : public MachineFunctionPass {
				private:
				const SIRegisterInfo *TRI = nullptr;
				const SIInstrInfo *TII = nullptr;
				LiveVariables *LV;
				arsenm: Why no initializer like the rest?
				ruiling: Should be like the others.
				MachineDominatorTree *MDT = nullptr;
				const MachineLoopInfo *Loops = nullptr;
				MachineRegisterInfo *MRI = nullptr;

				public:
				static char ID;

				MachineBasicBlock *getElseTarget(MachineBasicBlock *MBB) const;

				void collectElseRegionBlocks(MachineBasicBlock *Flow,
				MachineBasicBlock *Endif,
				SmallVectorImpl<MachineBasicBlock *> &) const;

				void
				collectCandidateRegisters(MachineBasicBlock *If, MachineBasicBlock *Flow,
				MachineBasicBlock *Endif,
				SmallVectorImpl<MachineBasicBlock *> &ElseBlocks,
				SmallVectorImpl<Register> &CandidateRegs) const;

				void FindNonPHIUsesInBlock(Register Reg, MachineBasicBlock *MBB,
				Lint (pre-merge checks): clang-tidy: warning: invalid case style for function 'FindNonPHIUsesInBlock' [readability-identifier-naming]
				SmallVectorImpl<MachineInstr *> &Uses) const;

				void updateLiveRangeInThenRegion(Register Reg, MachineBasicBlock *If,
				MachineBasicBlock *Flow) const;

				void updateLiveRangeInElseRegion(
				Register Reg, Register NewReg, MachineBasicBlock *Flow,
				MachineBasicBlock *Endif,
				SmallVectorImpl<MachineBasicBlock *> &ElseBlocks) const;

				void
				optimizeLiveRange(Register Reg, MachineBasicBlock *If,
				MachineBasicBlock *Flow, MachineBasicBlock *Endif,
				SmallVectorImpl<MachineBasicBlock *> &ElseBlocks) const;

				SIOptimizeVGPRLiveRange() : MachineFunctionPass(ID) {}

				bool runOnMachineFunction(MachineFunction &MF) override;

				StringRef getPassName() const override {
				return "SI Optimize VGPR LiveRange";
				}

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.addRequired<MachineDominatorTree>();
				AU.addRequired<LiveVariables>();
				AU.addRequired<MachineLoopInfo>();
				AU.addPreserved<MachineDominatorTree>();
				critson: Order of Required and Preserved could be the same?
				AU.addPreserved<MachineLoopInfo>();
				AU.addPreserved<LiveVariables>();
				MachineFunctionPass::getAnalysisUsage(AU);
				}

				MachineFunctionProperties getRequiredProperties() const override {
				return MachineFunctionProperties().set(
				MachineFunctionProperties::Property::IsSSA);
				}
				};

				} // end anonymous namespace

				// Check whether the MBB is a else flow block and get the branching target which
				// is the Endif block
				MachineBasicBlock *
				SIOptimizeVGPRLiveRange::getElseTarget(MachineBasicBlock *MBB) const {
				for (auto &br : MBB->terminators()) {
				Lint (pre-merge checks): clang-tidy: warning: invalid case style for variable 'br' [readability-identifier-naming]
				if (br.getOpcode() == AMDGPU::SI_ELSE) {
				return br.getOperand(2).getMBB();
				}
				}
				return nullptr;
				}

				void SIOptimizeVGPRLiveRange::collectElseRegionBlocks(
				MachineBasicBlock *Flow, MachineBasicBlock *Endif,
				SmallVectorImpl<MachineBasicBlock *> &Blocks) const {
				assert(Flow != Endif);

				MachineBasicBlock *MBB = Endif;
				unsigned Cur = 0;
				while (MBB != nullptr) {
				critson: I think this can just be "while (MBB) {"
				for (auto *Pred : MBB->predecessors())
				if (Pred != Flow && llvm::find(Blocks, Pred) == Blocks.end())
				Blocks.push_back(Pred);

				if (Cur < Blocks.size()) {
				MBB = Blocks[Cur++];
				critson: This doesn't seem right. You only collect an arbitrary number of blocks? Should ElseBlocks not simply be growing to hold all relevant blocks?
				foad: Cur tells you the first index in Blocks of a block that you have not already visited to scan its predecessors. Perhaps it would be clearer as:
				  SetVector<MachineBasicBlock *> Blocks(Endif);
				  for (unsigned Cur = 0; Cur < Blocks.size(); ++Cur) {
				    auto *MBB = Blocks[Cur];
				    for (auto *Pred : MBB->predecessors()) {
				      if (Pred != Flow)
				        Blocks.insert(Pred);
				    }
				  }
				ruiling: Hi @critson, does @foad's comment help answer your concern? I don't quite understand your confusion. The idea here is to iteratively visit the predecessors starting from `Endif`, push them into Blocks, and stop when the predecessor is `Flow`. I admit the code here may not be easy to understand; I struggled for a while when writing it. @foad's suggestion missed one subtle requirement: `Endif` should not appear in the final result, and I don't want to remove the first element from the vector afterwards. I think that was the main reason I wrote it like this. Considering that for most cases we would not have too much blocks in an if-else structure (maybe I am wrong?), I think a "find in vector" should not be a big issue.
				foad: "for most cases we would not have too much blocks in an if-else structure": this makes me a bit nervous. People write big shaders, so some people probably write big shaders with an if-then-else wrapped around the whole thing.
				critson: OK, I understand what the loop does now. I clearly understood it the first time, but got confused on the second reading. As Jay points out, the is_contained in the loop may be executed a lot with a big if-else in a shader. Maybe this should be a SmallSetVector?
				ruiling: done
				} else
				MBB = nullptr;
				}

				LLVM_DEBUG(dbgs() << "Found Else blocks:");
				for (auto *MBB : Blocks) {
				LLVM_DEBUG(dbgs() << " bb." << MBB->getNumber());
				arsenm: printName?
				}
				LLVM_DEBUG(dbgs() << "\n");
				arsenm: single quotes
				}

				/// Find the instructions(excluding phi) in \p MBB that uses the \p Reg.
				void SIOptimizeVGPRLiveRange::FindNonPHIUsesInBlock(
				Lint (pre-merge checks): clang-tidy: warning: invalid case style for function 'FindNonPHIUsesInBlock' [readability-identifier-naming]
				Register Reg, MachineBasicBlock *MBB,
				SmallVectorImpl<MachineInstr *> &Uses) const {
				for (auto I = MRI->use_nodbg_begin(Reg), E = MRI->use_nodbg_end(); I != E;
				foad: `for (auto &UseMI : MRI->use_nodbg_instructions(Reg))`
				++I) {
				auto *UseMI = I->getParent();
				if (UseMI->getParent() == MBB && !UseMI->isPHI())
				Uses.push_back(UseMI);
				}
				}

				/// Collect the killed registers in the ELSE region which are not alive through
				/// the whole THEN region.
				void SIOptimizeVGPRLiveRange::collectCandidateRegisters(
				MachineBasicBlock *If, MachineBasicBlock *Flow, MachineBasicBlock *Endif,
				SmallVectorImpl<MachineBasicBlock *> &ElseBlocks,
				SmallVectorImpl<Register> &CandidateRegs) const {

				SmallSet<Register, 8> KillsInElse;
				critson: Any reason for explicit 8 here? (I am also being reminded not to write explicit sizes in these.)
				ruiling: No specific reason, will remove it.
				ruiling: SmallSet still needs an explicit size, so I kept it while removing the explicit sizes for SmallVector.

				for (auto *Else : ElseBlocks) {
				for (auto &MI : Else->instrs()) {
				critson: Is this Else->instrs() to skip phis for a reason? (Rather than *Else.)
				ruiling: I think they are just the same. instrs() means all the instructions I think.
				if (MI.isDebugInstr())
				continue;
				unsigned NumOps = MI.getNumOperands();
				critson: I think you can just use MI.getNumOperands() directly in the loop condition and leave it up to the compiler whether to hoist it? In fact you do that in the next similar loop.
				foad: Or `for (auto &MO : MI.operands())`?
				ruiling: sounds good.
				for (unsigned Op = 0; Op < NumOps; ++Op) {
				MachineOperand &MO = MI.getOperand(Op);
				critson (Done): Instead of "MO.getReg() == AMDGPU::NoRegister", I believe you can just write "!MO.getReg()".
				if (!MO.isReg() || MO.getReg() == 0 || MO.isDef())
				critson (Not Done): What is the "MO.getReg() == 0" for? It does not seem correct.
				ruiling (Done): they are just for trivial invalid case, should be AMDGPU::NoRegister, why not correct?
				continue;

				Register MOReg = MO.getReg();
				// We can only optimize VGPR virtual register
				arsenm (Done): Could also handle AGPR
				ruiling (Done): I guess AGPRs are used as physical registers? if yes, I think we cannot handle them here, as handling physical register needs lots more work regarding the live-or-not-checking and updating the LiveVariable information.
				arsenm (Done): No, it's a class. They behave exactly like VGPRs
				arsenm (Done): There's an isVectorRegister helper
				if (MOReg.isPhysical() || !TRI->isVGPR(*MRI, MOReg))
				continue;

				if (MO.isKill() && MO.readsReg()) {
				LiveVariables::VarInfo &VI = LV->getVarInfo(MOReg);
				const MachineBasicBlock *DefMBB = MRI->getVRegDef(MOReg)->getParent();
				// Make sure two conditions are met:
				// a.) the value is defined before/in the IF block
				// b.) should be defined in the same loop-level.
				if ((VI.AliveBlocks.test(If->getNumber()) || DefMBB == If) &&
				Loops->getLoopFor(DefMBB) == Loops->getLoopFor(If))
				KillsInElse.insert(MOReg);
				}
				}
				}
				}

				// Check the phis in the Endif, looking for value coming from the ELSE
				// region. Make sure the phi-use is the last use.
				critson (Done): Where does this loop check that the registers added to KillsInElse are actually from the else branch?
				ruiling (Done): I am assuming the predecessor that is not the Flow is from Else region. Maybe I need to add some assert for this.
				for (auto &MI : Endif->phis()) {
				for (unsigned Idx = 1; Idx < MI.getNumOperands(); Idx += 2) {
				auto &MO = MI.getOperand(Idx);
				auto *Pred = MI.getOperand(Idx + 1).getMBB();
				if (Pred == Flow)
				continue;

				if (!MO.isReg() || MO.getReg() == 0 || MO.isUndef())
				continue;
				critson (Done): As above.
				Register Reg = MO.getReg();

				LiveVariables::VarInfo &VI = LV->getVarInfo(Reg);
				const MachineBasicBlock *DefMBB = MRI->getVRegDef(Reg)->getParent();

				if (Reg.isPhysical() || !TRI->isVGPR(*MRI, Reg))
				critson (Done): Should this test be before the live interval retrieval?
				ruiling (Done): Do you mean put this check before the line "LV->getVarInfo(Reg)"? If yes, I will fix it next version.
				continue;

				if (VI.isLiveIn(Endif, Reg, MRI)) {
				LLVM_DEBUG(dbgs() << "Excluding " << printReg(Reg, TRI)
				<< " as Live in Endif\n");
				continue;
				}
				// Make sure two conditions are met:
				// a.) the value is defined before/in the IF block
				// b.) should be defined in the same loop-level.
				if ((VI.AliveBlocks.test(If->getNumber()) || DefMBB == If) &&
				Loops->getLoopFor(DefMBB) == Loops->getLoopFor(If))
				KillsInElse.insert(Reg);
				}
				}

				auto IsLiveThroughThen = [&](Register Reg) {
				for (auto I = MRI->use_nodbg_begin(Reg), E = MRI->use_nodbg_end(); I != E;
				++I) {
				if (!I->readsReg())
				continue;
				auto *UseMI = I->getParent();
				auto *UseMBB = UseMI->getParent();
				if (UseMBB == Flow || UseMBB == Endif) {
				if (!UseMI->isPHI())
				return true;

				auto *IncomingMBB = UseMI->getOperand(I.getOperandNo() + 1).getMBB();
				// The register is live through the path If->Flow or Flow->Endif.
				// we should not optimize for such cases.
				if ((UseMBB == Flow && IncomingMBB != If) ||
				(UseMBB == Endif && IncomingMBB == Flow))
				return true;
				}
				}
				return false;
				};

				for (auto Reg : KillsInElse)
				arsenm (Done): Braces
				if (!IsLiveThroughThen(Reg))
				CandidateRegs.push_back(Reg);
				}

				// Re-calculate the liveness of \p Reg in the THEN-region
				void SIOptimizeVGPRLiveRange::updateLiveRangeInThenRegion(
				Register Reg, MachineBasicBlock *If, MachineBasicBlock *Flow) const {

				SmallPtrSet<MachineBasicBlock *, 16> PHIIncoming;

				MachineBasicBlock *ThenEntry = nullptr;
				for (auto *Succ : If->successors()) {
				if (Succ != Flow)
				ThenEntry = Succ;
				critson (Done): Can break out of loop here? Also perhaps assert(ThenEntry) after loop?
				ruiling (Done): sure, will do it.
				}

				LiveVariables::VarInfo &OldVarInfo = LV->getVarInfo(Reg);
				df_iterator_default_set<MachineBasicBlock *, 16> Visited;

				for (MachineBasicBlock *MBB : depth_first_ext(ThenEntry, Visited)) {
				if (MBB == Flow)
				break;

				// Clear Live bit, as we will recalculate afterwards
				LLVM_DEBUG(dbgs() << "Clear AliveBlock bb." << MBB->getNumber() << "\n");
				arsenm (Done): printName
				OldVarInfo.AliveBlocks.reset(MBB->getNumber());
				}

				// Get the blocks the Reg should be alive through
				for (auto I = MRI->use_nodbg_begin(Reg), E = MRI->use_nodbg_end(); I != E;
				++I) {
				auto *UseMI = I->getParent();
				if (UseMI->isPHI() && I->readsReg()) {
				if (Visited.contains(UseMI->getParent()))
				PHIIncoming.insert(UseMI->getOperand(I.getOperandNo() + 1).getMBB());
				}
				}

				Visited.clear();

				for (MachineBasicBlock *MBB : depth_first_ext(ThenEntry, Visited)) {
				if (MBB == Flow)
				break;

				SmallVector<MachineInstr *, 8> Uses;
				// PHI instructions has been processed before.
				FindNonPHIUsesInBlock(Reg, MBB, Uses);

				if (Uses.size() == 1) {
				LLVM_DEBUG(dbgs() << "Found one Non-PHI use in bb." << MBB->getNumber()
				arsenm (Done): printName
				<< "\n");
				LV->HandleVirtRegUse(Reg, MBB, *(*Uses.begin()));
				} else if (Uses.size() > 1) {
				// Process the instructions in-order
				LLVM_DEBUG(dbgs() << "Found " << Uses.size() << " Non-PHI uses in bb."
				<< MBB->getNumber() << "\n");
				for (MachineInstr &MI : *MBB) {
				if (llvm::find(Uses, &MI) != Uses.end()) {
				foad (Done): Use llvm::is_contained (here and a few other places).
				LV->HandleVirtRegUse(Reg, MBB, MI);
				}
				}
				}

				// Mark Reg alive through the block if this is a PHI incoming block
				if (PHIIncoming.contains(MBB))
				LV->MarkVirtRegAliveInBlock(OldVarInfo, MRI->getVRegDef(Reg)->getParent(),
				MBB);
				}

				// Set the isKilled flag if we get new Kills in the THEN region.
				for (auto *MI : OldVarInfo.Kills)
				critson (Done): As Matt noted on the earlier one of these probably should use braces here too.
				if (llvm::find(Visited, MI->getParent()) != Visited.end())
				MI->addRegisterKilled(Reg, TRI);
				}
				critson (Not Done): Since Visited is a set type, I think this should be a direct test on the set, not is_contained.
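
The difference critson describes can be shown outside LLVM. What follows is a minimal standalone sketch, not the patch's code: `std::set<int>` stands in for `df_iterator_default_set<MachineBasicBlock *, 16>`, integers stand in for block pointers, and all function names are hypothetical. It contrasts a linear scan over the set with the set's own lookup.

```cpp
#include <algorithm>
#include <cassert>
#include <set>

// Hypothetical stand-in for the Visited set; ints play the role of
// MachineBasicBlock pointers.
using BlockSet = std::set<int>;

// Linear scan: visits elements one by one, as a generic find over the
// container's range would.
bool containsByScan(const BlockSet &Visited, int Block) {
  return std::find(Visited.begin(), Visited.end(), Block) != Visited.end();
}

// Direct lookup: uses the container's own search (logarithmic for std::set).
bool containsByLookup(const BlockSet &Visited, int Block) {
  bool Found = Visited.count(Block) != 0;
  // Both forms must agree on the answer; only the cost differs.
  assert(Found == containsByScan(Visited, Block) && "lookup and scan disagree");
  return Found;
}
```

Both functions return the same result for any input; the lookup form simply lets the container pick its own search strategy.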

				void SIOptimizeVGPRLiveRange::updateLiveRangeInElseRegion(
				Register Reg, Register NewReg, MachineBasicBlock *Flow,
				MachineBasicBlock *Endif,
				SmallVectorImpl<MachineBasicBlock *> &ElseBlocks) const {
				LiveVariables::VarInfo &NewVarInfo = LV->getVarInfo(NewReg);
				LiveVariables::VarInfo &OldVarInfo = LV->getVarInfo(Reg);

				// Transfer aliveBlocks from Reg to NewReg
				for (auto *MBB : ElseBlocks) {
				unsigned BBNum = MBB->getNumber();
				if (OldVarInfo.AliveBlocks.test(BBNum)) {
				NewVarInfo.AliveBlocks.set(BBNum);
				LLVM_DEBUG(dbgs() << "Removing ALiveBlock bb." << BBNum << "\n");
				OldVarInfo.AliveBlocks.reset(BBNum);
				}
				}

				// Transfer the possible Kills in ElseBlocks from Reg to NewReg
				std::vector<MachineInstr *>::iterator I = OldVarInfo.Kills.begin();
				for (; I != OldVarInfo.Kills.end();) {
				critson (Done): while?
				auto *KillBB = (*I)->getParent();
				auto It = llvm::find(ElseBlocks, KillBB);
				Lint: Pre-merge checks: clang-tidy: warning: 'auto It' can be declared as 'auto *It' [llvm-qualified-auto]

				if (It != ElseBlocks.end()) {
				NewVarInfo.Kills.push_back(*I);
				I = OldVarInfo.Kills.erase(I);
				} else {
				++I;
				}
				}
				}
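
The kill-transfer loop above is an instance of the erase-while-iterating idiom, which is what critson's `while?` remark points at. Here is a minimal standalone sketch using standard-library types rather than the LLVM ones: the `Kill` struct, its fields, and `transferKills` are hypothetical names standing in for the `MachineInstr *` entries of `VarInfo::Kills`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model of a kill: an instruction id plus the number of the
// block that contains it.
struct Kill {
  int Id;
  int Block;
};

// Move every kill located in one of ElseBlocks from OldKills to NewKills,
// preserving the relative order in both vectors.
void transferKills(std::vector<Kill> &OldKills, std::vector<Kill> &NewKills,
                   const std::vector<int> &ElseBlocks) {
  auto InElse = [&](const Kill &K) {
    return std::find(ElseBlocks.begin(), ElseBlocks.end(), K.Block) !=
           ElseBlocks.end();
  };

  auto I = OldKills.begin();
  while (I != OldKills.end()) {
    if (InElse(*I)) {
      NewKills.push_back(*I);
      // erase() returns the iterator to the element after the erased one,
      // so the loop must not increment in this branch.
      I = OldKills.erase(I);
    } else {
      ++I;
    }
  }

  // Postcondition: nothing left in OldKills belongs to the Else region.
  assert(std::none_of(OldKills.begin(), OldKills.end(), InElse));
}
```

A `while` loop makes it more visible than an empty-increment `for` that the iterator advances in only one of the two branches.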

				void SIOptimizeVGPRLiveRange::optimizeLiveRange(
				Register Reg, MachineBasicBlock *If, MachineBasicBlock *Flow,
				MachineBasicBlock *Endif,
				SmallVectorImpl<MachineBasicBlock *> &ElseBlocks) const {
				// Insert a new PHI, marking the value from the THEN region being
				// undef.
				LLVM_DEBUG(dbgs() << "Optimizing " << printReg(Reg, TRI) << "\n");
				auto *RC = MRI->getRegClass(Reg);
				Lint: Pre-merge checks: clang-tidy: warning: 'auto *RC' can be declared as 'const auto *RC' [llvm-qualified-auto]
				Register NewReg = MRI->createVirtualRegister(RC);
				Register UndefReg = MRI->createVirtualRegister(RC);
				MachineInstrBuilder PHI = BuildMI(*Flow, Flow->getFirstNonPHI(), DebugLoc(),
				TII->get(TargetOpcode::PHI), NewReg);
				for (auto *Pred : Flow->predecessors()) {
				if (Pred == If)
				PHI.addReg(Reg).addMBB(Pred);
				else
				PHI.addReg(UndefReg, RegState::Undef).addMBB(Pred);
				}

				// Replace all uses in the ELSE region or the PHIs in ENDIF block
				for (auto I = MRI->use_begin(Reg), E = MRI->use_end(); I != E;) {
				MachineOperand &O = *I;
				// This is a little bit tricky, the setReg() will update the linked list,
				// so we have to increment the iterator before setReg() to avoid skipping
				// some uses.
				++I;
				auto *UseMI = O.getParent();
				auto *UseBlock = UseMI->getParent();
				// Replace uses in Endif block
				if (UseBlock == Endif) {
				assert(UseMI->isPHI() && "Uses should be PHI in Endif block");
				O.setReg(NewReg);
				continue;
				}

				// Replace uses in Else region
				auto It = llvm::find(ElseBlocks, UseBlock);
				Lint: Pre-merge checks: clang-tidy: warning: 'auto It' can be declared as 'auto *It' [llvm-qualified-auto]
				if (It != ElseBlocks.end()) {
				O.setReg(NewReg);
				}
				}

				// The optimized Reg is not alive through Flow blocks anymore.
				LiveVariables::VarInfo &OldVarInfo = LV->getVarInfo(Reg);
				OldVarInfo.AliveBlocks.reset(Flow->getNumber());

				updateLiveRangeInElseRegion(Reg, NewReg, Flow, Endif, ElseBlocks);
				updateLiveRangeInThenRegion(Reg, If, Flow);
				}
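
The "increment the iterator before setReg()" comment in the use-replacement loop describes a general hazard: the element being visited is moved to another list during the walk. A rough standalone analogy, assuming `std::list` in place of the register use list and `splice` in place of `setReg()`; `UseList` and `retargetUses` are hypothetical names, not LLVM API.

```cpp
#include <cassert>
#include <list>

// Hypothetical model: each register owns a list of its uses, and retargeting
// a use moves its node from the old register's list to the new one's.
using UseList = std::list<int>;

// Move every use accepted by ShouldMove from OldUses to NewUses and return
// how many were moved.
template <typename Pred>
int retargetUses(UseList &OldUses, UseList &NewUses, Pred ShouldMove) {
  const auto Total = OldUses.size() + NewUses.size();
  int Moved = 0;
  for (auto I = OldUses.begin(), E = OldUses.end(); I != E;) {
    auto Cur = I;
    // Advance first: after the splice below, Cur belongs to NewUses, and
    // incrementing it would continue the walk in the wrong list.
    ++I;
    if (ShouldMove(*Cur)) {
      NewUses.splice(NewUses.end(), OldUses, Cur);
      ++Moved;
    }
  }
  // No use is created or lost; nodes only change owner.
  assert(OldUses.size() + NewUses.size() == Total);
  return Moved;
}
```

If the increment came after the splice, the loop would step from the moved node into `NewUses` and skip the remaining entries of `OldUses`.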

				char SIOptimizeVGPRLiveRange::ID = 0;

				INITIALIZE_PASS_BEGIN(SIOptimizeVGPRLiveRange, DEBUG_TYPE,
				"SI Optimize VGPR LiveRange", false, false)
				INITIALIZE_PASS_DEPENDENCY(MachineDominatorTree)
				INITIALIZE_PASS_DEPENDENCY(MachineLoopInfo)
				INITIALIZE_PASS_DEPENDENCY(LiveVariables)
				INITIALIZE_PASS_END(SIOptimizeVGPRLiveRange, DEBUG_TYPE,
				"SI Optimize VGPR LiveRange", false, false)

				char &llvm::SIOptimizeVGPRLiveRangeID = SIOptimizeVGPRLiveRange::ID;

				FunctionPass *llvm::createSIOptimizeVGPRLiveRangePass() {
				return new SIOptimizeVGPRLiveRange();
				}

				bool SIOptimizeVGPRLiveRange::runOnMachineFunction(MachineFunction &MF) {
				const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
				TII = ST.getInstrInfo();
				TRI = &TII->getRegisterInfo();
				MDT = &getAnalysis<MachineDominatorTree>();
				Loops = &getAnalysis<MachineLoopInfo>();
				LV = &getAnalysis<LiveVariables>();
				MRI = &MF.getRegInfo();

				if (skipFunction(MF.getFunction()))
				return false;
				arsenm (Done): Can move this before the initializations

				bool MadeChange = false;

				// TODO: we need to think about the order of visiting the blocks to get
				// optimal result for nesting if-else cases.
				for (MachineBasicBlock &MBB : MF) {
				for (auto &MI : MBB.terminators()) {
				// Detect the if-else blocks
				if (MI.getOpcode() == AMDGPU::SI_IF) {
				MachineBasicBlock *IfTarget = MI.getOperand(2).getMBB();
				auto *Endif = getElseTarget(IfTarget);
				if (!Endif)
				continue;

				SmallVector<MachineBasicBlock *, 8> ElseBlocks;
				critson (Done): As earlier, do we need explicit 8s here?
				SmallVector<Register, 8> CandidateRegs;

				LLVM_DEBUG(dbgs() << "Checking IF-FLOW-ENDIF: bb." << MBB.getNumber()
				<< " bb." << IfTarget->getNumber() << " bb."
				<< Endif->getNumber() << "\n");

				// Collect all the blocks in the ELSE region
				collectElseRegionBlocks(IfTarget, Endif, ElseBlocks);

				// Collect the registers can be optimized
				collectCandidateRegisters(&MBB, IfTarget, Endif, ElseBlocks,
				CandidateRegs);
				MadeChange |= !CandidateRegs.empty();
				// Now we are safe to optimize.
				for (auto Reg : CandidateRegs)
				optimizeLiveRange(Reg, &MBB, IfTarget, Endif, ElseBlocks);
				}
				}
				}

				return MadeChange;
				}

llvm/test/CodeGen/AMDGPU/vgpr-liverange.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
				; RUN: llc -march=amdgcn -mcpu=tonga -amdgpu-opt-vgpr-liverange=true -stop-after=si-opt-vgpr-liverange -verify-machineinstrs < %s | FileCheck -check-prefix=SI %s

				arsenm (Not Done): Could really use an additional end to end IR checks to make sure this actually gets the allocator to do what you want
				ruiling (Done): good idea, will do that.
				; a normal if-else
				define amdgpu_ps float @else1(i32 %z, float %v) #0 {
				; SI-LABEL: name: else1
				; SI: bb.0.main_body:
				; SI: successors: %bb.3(0x40000000), %bb.1(0x40000000)
				; SI: liveins: $vgpr0, $vgpr1
				; SI: [[COPY:%[0-9]+]]:vgpr_32 = COPY killed $vgpr1
				; SI: [[COPY1:%[0-9]+]]:vgpr_32 = COPY killed $vgpr0
				; SI: [[V_CMP_GT_I32_e64_:%[0-9]+]]:sreg_64 = V_CMP_GT_I32_e64 6, killed [[COPY1]], implicit $exec
				; SI: [[SI_IF:%[0-9]+]]:sreg_64 = SI_IF killed [[V_CMP_GT_I32_e64_]], %bb.1, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
				; SI: S_BRANCH %bb.3
				; SI: bb.1.Flow:
				; SI: successors: %bb.2(0x40000000), %bb.4(0x40000000)
				; SI: [[PHI:%[0-9]+]]:vgpr_32 = PHI undef %13:vgpr_32, %bb.0, %4, %bb.3
				; SI: [[PHI1:%[0-9]+]]:vgpr_32 = PHI [[COPY]], %bb.0, undef %15:vgpr_32, %bb.3
				; SI: [[SI_ELSE:%[0-9]+]]:sreg_64 = SI_ELSE killed [[SI_IF]], %bb.4, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
				; SI: S_BRANCH %bb.2
				; SI: bb.2.if:
				; SI: successors: %bb.4(0x80000000)
				; SI: %3:vgpr_32 = nofpexcept V_ADD_F32_e32 killed [[PHI1]], [[PHI1]], implicit $mode, implicit $exec
				; SI: S_BRANCH %bb.4
				; SI: bb.3.else:
				; SI: successors: %bb.1(0x80000000)
				; SI: %4:vgpr_32 = nofpexcept V_MUL_F32_e32 1077936128, killed [[COPY]], implicit $mode, implicit $exec
				; SI: S_BRANCH %bb.1
				; SI: bb.4.end:
				; SI: [[PHI2:%[0-9]+]]:vgpr_32 = PHI [[PHI]], %bb.1, %3, %bb.2
				; SI: SI_END_CF killed [[SI_ELSE]], implicit-def dead $exec, implicit-def dead $scc, implicit $exec
				; SI: $vgpr0 = COPY killed [[PHI2]]
				; SI: SI_RETURN_TO_EPILOG killed $vgpr0
				main_body:
				%cc = icmp sgt i32 %z, 5
				br i1 %cc, label %if, label %else

				if:
				%v.if = fmul float %v, 2.0
				br label %end

				else:
				%v.else = fmul float %v, 3.0
				br label %end

				end:
				%r = phi float [ %v.if, %if ], [ %v.else, %else ]
				ret float %r
				}
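
The immediate `1077936128` in the `V_MUL_F32_e32` CHECK line is the IEEE-754 single-precision bit pattern of the `3.0` in `fmul float %v, 3.0`. A small standalone sketch to confirm the encoding; `floatBits` is a hypothetical helper name, not part of the test or of LLVM.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Return the IEEE-754 single-precision bit pattern of F as an unsigned
// 32-bit integer. memcpy is used to avoid undefined behaviour from
// pointer-based type punning.
std::uint32_t floatBits(float F) {
  static_assert(sizeof(float) == sizeof(std::uint32_t),
                "this sketch assumes a 32-bit float");
  std::uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits));
  return Bits;
}
```

On a platform with IEEE-754 floats, `assert(floatBits(3.0f) == 1077936128u)` holds, since the pattern is `0x40400000`.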


				; %v was used after if-else
				define amdgpu_ps float @else2(i32 %z, float %v) #0 {
				; SI-LABEL: name: else2
				; SI: bb.0.main_body:
				; SI: successors: %bb.3(0x40000000), %bb.1(0x40000000)
				; SI: liveins: $vgpr0, $vgpr1
				; SI: [[COPY:%[0-9]+]]:vgpr_32 = COPY killed $vgpr1
				; SI: [[COPY1:%[0-9]+]]:vgpr_32 = COPY killed $vgpr0
				; SI: [[V_CMP_GT_I32_e64_:%[0-9]+]]:sreg_64 = V_CMP_GT_I32_e64 6, killed [[COPY1]], implicit $exec
				; SI: [[SI_IF:%[0-9]+]]:sreg_64 = SI_IF killed [[V_CMP_GT_I32_e64_]], %bb.1, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
				; SI: S_BRANCH %bb.3
				; SI: bb.1.Flow:
				; SI: successors: %bb.2(0x40000000), %bb.4(0x40000000)
				; SI: [[PHI:%[0-9]+]]:vgpr_32 = PHI undef %15:vgpr_32, %bb.0, %4, %bb.3
				; SI: [[SI_ELSE:%[0-9]+]]:sreg_64 = SI_ELSE killed [[SI_IF]], %bb.4, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
				; SI: S_BRANCH %bb.2
				; SI: bb.2.if:
				; SI: successors: %bb.4(0x80000000)
				; SI: %3:vgpr_32 = nofpexcept V_ADD_F32_e32 killed [[COPY]], [[COPY]], implicit $mode, implicit $exec
				; SI: S_BRANCH %bb.4
				; SI: bb.3.else:
				; SI: successors: %bb.1(0x80000000)
				; SI: %4:vgpr_32 = nofpexcept V_MUL_F32_e32 1077936128, [[COPY]], implicit $mode, implicit $exec
				; SI: S_BRANCH %bb.1
				; SI: bb.4.end:
				; SI: [[PHI1:%[0-9]+]]:vgpr_32 = PHI [[COPY]], %bb.1, %3, %bb.2
				; SI: [[PHI2:%[0-9]+]]:vgpr_32 = PHI [[PHI]], %bb.1, %3, %bb.2
				; SI: SI_END_CF killed [[SI_ELSE]], implicit-def dead $exec, implicit-def dead $scc, implicit $exec
				; SI: %14:vgpr_32 = nofpexcept V_ADD_F32_e32 killed [[PHI1]], killed [[PHI2]], implicit $mode, implicit $exec
				; SI: $vgpr0 = COPY killed %14
				; SI: SI_RETURN_TO_EPILOG killed $vgpr0
				main_body:
				%cc = icmp sgt i32 %z, 5
				br i1 %cc, label %if, label %else

				if:
				%v.if = fmul float %v, 2.0
				br label %end

				else:
				%v.else = fmul float %v, 3.0
				br label %end

				end:
				%r0 = phi float [ %v.if, %if ], [ %v, %else ]
				%r1 = phi float [ %v.if, %if ], [ %v.else, %else ]
				%r2 = fadd float %r0, %r1
				ret float %r2
				}

				; if-else inside loop, %x can be optimized, but %v cannot be.
				define amdgpu_ps float @else3(i32 %z, float %v, i32 inreg %bound, i32 %x0) #0 {
				; SI-LABEL: name: else3
				; SI: bb.0.entry:
				; SI: successors: %bb.1(0x80000000)
				; SI: liveins: $vgpr0, $vgpr1, $sgpr0, $vgpr2
				; SI: [[COPY:%[0-9]+]]:vgpr_32 = COPY killed $vgpr2
				; SI: [[COPY1:%[0-9]+]]:sgpr_32 = COPY killed $sgpr0
				; SI: [[COPY2:%[0-9]+]]:vgpr_32 = COPY killed $vgpr1
				; SI: [[COPY3:%[0-9]+]]:vgpr_32 = COPY killed $vgpr0
				; SI: [[V_CMP_GT_I32_e64_:%[0-9]+]]:sreg_64 = V_CMP_GT_I32_e64 6, killed [[COPY3]], implicit $exec
				; SI: %1:vgpr_32 = nofpexcept V_MUL_F32_e32 1077936128, [[COPY2]], implicit $mode, implicit $exec
				; SI: %2:vgpr_32 = nofpexcept V_ADD_F32_e32 killed [[COPY2]], [[COPY2]], implicit $mode, implicit $exec
				; SI: [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 0
				; SI: bb.1.for.body:
				; SI: successors: %bb.4(0x40000000), %bb.2(0x40000000)
				; SI: [[PHI:%[0-9]+]]:sreg_32 = PHI [[S_MOV_B32_]], %bb.0, %13, %bb.5
				; SI: [[PHI1:%[0-9]+]]:vgpr_32 = PHI [[COPY]], %bb.0, %12, %bb.5
				; SI: [[SI_IF:%[0-9]+]]:sreg_64 = SI_IF [[V_CMP_GT_I32_e64_]], %bb.2, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
				; SI: S_BRANCH %bb.4
				; SI: bb.2.Flow:
				; SI: successors: %bb.3(0x40000000), %bb.5(0x40000000)
				; SI: [[PHI2:%[0-9]+]]:vgpr_32 = PHI undef %35:vgpr_32, %bb.1, %9, %bb.4
				; SI: [[PHI3:%[0-9]+]]:vgpr_32 = PHI [[PHI1]], %bb.1, undef %38:vgpr_32, %bb.4
				; SI: [[SI_ELSE:%[0-9]+]]:sreg_64 = SI_ELSE killed [[SI_IF]], %bb.5, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
				; SI: S_BRANCH %bb.3
				; SI: bb.3.if:
				; SI: successors: %bb.5(0x80000000)
				; SI: %8:vgpr_32, dead %31:sreg_64 = V_ADD_CO_U32_e64 1, killed [[PHI3]], 0, implicit $exec
				; SI: S_BRANCH %bb.5
				; SI: bb.4.else:
				; SI: successors: %bb.2(0x80000000)
				; SI: [[V_MUL_LO_U32_e64_:%[0-9]+]]:vgpr_32 = V_MUL_LO_U32_e64 killed [[PHI1]], 3, implicit $exec
				; SI: [[COPY4:%[0-9]+]]:vgpr_32 = COPY killed [[V_MUL_LO_U32_e64_]]
				; SI: S_BRANCH %bb.2
				; SI: bb.5.if.end:
				; SI: successors: %bb.6(0x04000000), %bb.1(0x7c000000)
				; SI: [[PHI4:%[0-9]+]]:vgpr_32 = PHI %1, %bb.2, %2, %bb.3
				; SI: [[PHI5:%[0-9]+]]:vgpr_32 = PHI [[PHI2]], %bb.2, %8, %bb.3
				; SI: SI_END_CF killed [[SI_ELSE]], implicit-def dead $exec, implicit-def dead $scc, implicit $exec
				; SI: %12:vgpr_32, dead %33:sreg_64 = V_ADD_CO_U32_e64 1, [[PHI5]], 0, implicit $exec
				; SI: [[S_ADD_I32_:%[0-9]+]]:sreg_32 = S_ADD_I32 killed [[PHI]], 1, implicit-def dead $scc
				; SI: S_CMP_LT_I32 [[S_ADD_I32_]], [[COPY1]], implicit-def $scc
				; SI: S_CBRANCH_SCC1 %bb.1, implicit killed $scc
				; SI: S_BRANCH %bb.6
				; SI: bb.6.for.end:
				; SI: %34:vgpr_32 = nofpexcept V_ADD_F32_e32 killed [[PHI5]], killed [[PHI4]], implicit $mode, implicit $exec
				; SI: $vgpr0 = COPY killed %34
				; SI: SI_RETURN_TO_EPILOG killed $vgpr0
				entry:
				; %break = icmp sgt i32 %bound, 0
				; br i1 %break, label %for.body, label %for.end
				br label %for.body

				for.body:
				%i = phi i32 [ 0, %entry ], [ %inc, %if.end ]
				%x = phi i32 [ %x0, %entry ], [ %xinc, %if.end ]
				%cc = icmp sgt i32 %z, 5
				br i1 %cc, label %if, label %else

				if:
				%v.if = fmul float %v, 2.0
				%x.if = add i32 %x, 1
				br label %if.end

				else:
				%v.else = fmul float %v, 3.0
				%x.else = mul i32 %x, 3
				br label %if.end

				if.end:
				%v.endif = phi float [ %v.if, %if ], [ %v.else, %else ]
				%x.endif = phi i32 [ %x.if, %if ], [ %x.else, %else ]

				%xinc = add i32 %x.endif, 1
				%inc = add i32 %i, 1
				%cond = icmp slt i32 %inc, %bound
				br i1 %cond, label %for.body, label %for.end

				for.end:
				%x_float = bitcast i32 %x.endif to float
				%r = fadd float %x_float, %v.endif
				ret float %r
				}

				attributes #0 = { nounwind }