This is an archive of the discontinued LLVM Phabricator instance.

Differential D87157

[GlobalISel] Add a localizer for copies from physregs and use it in AArch64
Needs Revision · Public

Authored by paquette on Sep 4 2020, 12:51 PM.

Details

Reviewers

aemerson
arsenm
volkan
aditya_nandakumar
qcolombet
rtereshin
dsanders

Summary

In the LLVM test suite, very small functions like this are fairly common:

declare i8* @bar(i32*, i32, i32**)
define i8* @foo(i32 %x, i32** %y) {
  %z = tail call i8* @bar(i32* null, i32 %x, i32** %y)
  ret i8* %z
}

In GlobalISel, during CallLowering, we tend to produce copies in the following order (for AArch64):

%0:_(s32) = COPY $w0
%1:_(p0) = COPY $x1
%3:_(p0) = G_CONSTANT i64 0
$x0 = COPY %3(p0)
$w1 = COPY %0(s32)
$x2 = COPY %1(p0)
TCRETURNdi @bar, 0, csr_darwin_aarch64_aapcs, implicit $sp, implicit $x0, implicit $w1, implicit $x2

SelectionDAG, on the other hand, produces this:

%1:gpr64 = COPY $x1
%0:gpr32 = COPY $w0
%2:gpr64all = COPY $xzr
$x0 = COPY %2
$w1 = COPY %0
$x2 = COPY %1
TCRETURNdi @bar, 0, csr_darwin_aarch64_aapcs, implicit $sp, implicit $x0, implicit $w1, implicit $x2

All that's different here is the order of the copies from physical registers. However, GlobalISel ultimately produces twice as many copies: https://godbolt.org/z/GKGsh5

This is because at some point during greedy regalloc, we (greedily) make the wrong choice of virtual register mapping, and miss the chance to emit identity copies.

It seems like the register allocator is making a reasonable choice with what it's been given. So, personally, I think it makes sense to canonicalize these copies into a form which is amenable to the greedy register allocator.

This patch adds a specialized copy localizer pass. This pass is separate from the main localizer because I think the circumstances where it should be used are more specialized. Only targets (and opt-levels) which use the greedy register allocator can benefit from this.

This pass will reorder copies in a contiguous block of copies from physregs such that the first copy has the *latest* user in the MachineBasicBlock, and the last copy has the *earliest* user in the MachineBasicBlock.
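The ordering rule can be sketched in isolation. This is a minimal model, not the pass itself: `PhysRegCopy` and `orderCopies` are hypothetical names, each copy is reduced to the position of its earliest user, and `std::stable_sort` stands in for the pass's own bookkeeping.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for a COPY from a physical register.
struct PhysRegCopy {
  std::string Def;      // virtual register defined by the copy
  unsigned FirstUseIdx; // position of its earliest user in the block
};

// Reorder one contiguous run of copies so that the copy whose earliest user
// is furthest away comes first and the copy with the closest user comes last.
std::vector<PhysRegCopy> orderCopies(std::vector<PhysRegCopy> Copies) {
  std::stable_sort(Copies.begin(), Copies.end(),
                   [](const PhysRegCopy &A, const PhysRegCopy &B) {
                     return A.FirstUseIdx > B.FirstUseIdx;
                   });
  return Copies;
}
```

For the example in this summary, %0 is first used by `$w1 = COPY %0` and %1 by the later `$x2 = COPY %1`, so the sketch places the copy defining %1 ahead of the copy defining %0.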

So for this MIR:

%0:_(s32) = COPY $w0
%1:_(p0) = COPY $x1
%3:_(p0) = G_CONSTANT i64 0
$x0 = COPY %3(p0)
$w1 = COPY %0(s32)
$x2 = COPY %1(p0)
TCRETURNdi @bar, 0, csr_darwin_aarch64_aapcs, implicit $sp, implicit $x0, implicit $w1, implicit $x2

We'll reorder the copies like so:

%1:_(p0) = COPY $x1
%0:_(s32) = COPY $w0
%3:_(p0) = G_CONSTANT i64 0
$x0 = COPY %3(p0)
$w1 = COPY %0(s32)
$x2 = COPY %1(p0)
TCRETURNdi @bar, 0, csr_darwin_aarch64_aapcs, implicit $sp, implicit $x0, implicit $w1, implicit $x2

This:

- Minimizes the length of the live range for %0:_(s32) = COPY $w0.
- Ensures that its live range is fully contained within the live range of %1:_(p0) = COPY $x1 rather than partially overlapping with it.
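To make these two points concrete, each virtual register's live range can be modelled as a closed interval of instruction positions, counted from 0 within the block. This is only an illustrative model with hypothetical helper names; "partially overlapping" is used in this summary's sense (the ranges intersect, but neither contains the other), not in the sub-register sense.

```cpp
// Hypothetical model: a live range as a closed interval of instruction
// positions within one basic block.
struct LiveRange {
  unsigned Def;     // position of the defining COPY
  unsigned LastUse; // position of the last user
};

// Number of positions the value stays live after its definition.
unsigned rangeLength(LiveRange R) { return R.LastUse - R.Def; }

// True if Inner lies entirely within Outer.
bool contains(LiveRange Outer, LiveRange Inner) {
  return Outer.Def <= Inner.Def && Inner.LastUse <= Outer.LastUse;
}

// True if the ranges intersect but neither contains the other.
bool partiallyOverlaps(LiveRange A, LiveRange B) {
  bool Intersect = A.Def <= B.LastUse && B.Def <= A.LastUse;
  return Intersect && !contains(A, B) && !contains(B, A);
}
```

In the original order, %0 spans positions 0 to 4 and %1 spans 1 to 5, so the two ranges partially overlap. After the reorder, %1 spans 0 to 5 and %0 spans 1 to 4, so %0 is nested inside %1 and is one position shorter.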

This pass only:

- Shuffles the position of copies from physregs within one contiguous range of copies from physregs.
- Runs on MachineFunctions with a single MachineBasicBlock. (More MBBs -> more likely that there's enough stuff going on that this won't make a difference.)
- Runs on very small MachineFunctions. (This problem only seems to show up in very small functions where there isn't much going on other than copies.)
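The gating conditions can be summarised as a small predicate. This is a sketch under assumptions: `FunctionShape` and `shouldRun` are hypothetical names standing in for queries on a MachineFunction, and the thresholds 4 and 25 are the MinBlockSize and MaxBlockSize values in the posted CopyLocalizer.h.

```cpp
// Hypothetical summary of the properties the pass inspects; the real pass
// queries a MachineFunction instead.
struct FunctionShape {
  bool FailedISel;         // GlobalISel already gave up on this function
  bool OptNone;            // function is marked optnone
  unsigned NumBlocks;      // number of MachineBasicBlocks
  unsigned EntryBlockSize; // number of instructions in the entry block
};

constexpr unsigned MinBlockSize = 4;  // two copies, each with one user
constexpr unsigned MaxBlockSize = 25; // bound on compile time

// Returns true if the copy reordering should be attempted at all.
bool shouldRun(const FunctionShape &F) {
  if (F.FailedISel || F.OptNone)
    return false;
  if (F.NumBlocks != 1)
    return false;
  return F.EntryBlockSize >= MinBlockSize && F.EntryBlockSize <= MaxBlockSize;
}
```

Note that the sketch rejects any block count other than one, whereas the posted code only checks for more than one block.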

The pass will swap the positions of %0 and %1 because

I think there are a lot of places where this could live (CallLowering, the IRTranslator, the existing Localizer, ...). I'm not particularly attached to any of these options, because I think the algorithm would come out being about the same. Personally, I think this makes sense as a standalone pass because the circumstances in which it should run, and the scope of what it does, are distinct enough that it feels like its own thing.

(FWIW, if it's possible and makes sense to teach the register allocator to handle this situation, I'd prefer that solution. However, my gut feeling says that it's probably better to just canonicalize the copies using a pass.)

For now, only add this pass to AArch64.

Diff Detail

Event Timeline

paquette created this revision. Sep 4 2020, 12:51 PM

paquette requested review of this revision. Sep 4 2020, 12:51 PM

Hi Jessica,

This feels to me that we are adding another level of heuristics that could well degrade performance in other cases. The fact that this runs only on functions with only one basic block and with fewer than 25 instructions tells me this is really a workaround for a more general problem.
Put differently, I don't see this being generally useful enough to be worth the compile time.

Solving that properly IMHO would require modeling batches of copies as parallel copies (all the copies happen in parallel in a bundle) and we would serialize them only after register allocation. We already do something like that in SplitKit IIRC.

I am guessing your concern is that we regress some cases compared to SDISel (I would also expect that we improve some as well!), thus I'm wondering if we should try to "fix" this at all.
If we need to fix this, could we instead issue the arguments in the same order as SDISel in the meantime? (While we bring up the proper support for parallel copies.)

Cheers,
-Quentin
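For readers unfamiliar with the parallel-copy idea mentioned above, serialization can be sketched as follows. This is a generic textbook-style illustration, not SplitKit's code and not part of this patch; `Copy`, `sequentialize`, and the scratch register `Tmp` are hypothetical, and the sketch assumes every destination is written by exactly one copy and that `Tmp` does not appear among the inputs.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// One register-to-register copy: {Dst, Src}.
using Copy = std::pair<std::string, std::string>;

// Turn a set of copies that conceptually happen at once into a sequence that
// has the same effect, using Tmp to break cycles such as a swap.
std::vector<Copy> sequentialize(std::vector<Copy> Pending,
                                const std::string &Tmp) {
  std::vector<Copy> Out;
  // Identity copies need no code at all.
  Pending.erase(std::remove_if(Pending.begin(), Pending.end(),
                               [](const Copy &C) { return C.first == C.second; }),
                Pending.end());
  while (!Pending.empty()) {
    // A copy is safe to emit once no other pending copy reads its destination.
    auto Ready = std::find_if(Pending.begin(), Pending.end(), [&](const Copy &C) {
      return std::none_of(Pending.begin(), Pending.end(),
                          [&](const Copy &Other) {
                            return &Other != &C && Other.second == C.first;
                          });
    });
    if (Ready != Pending.end()) {
      Out.push_back(*Ready);
      Pending.erase(Ready);
      continue;
    }
    // Every destination is still read by someone, so the copies form a cycle.
    // Save one destination in Tmp and redirect its readers to Tmp.
    const std::string Saved = Pending.front().first;
    Out.push_back({Tmp, Saved});
    for (Copy &C : Pending)
      if (C.second == Saved)
        C.second = Tmp;
  }
  return Out;
}
```

With this model the order in which the copies were originally written no longer matters: a chain such as x2 <- x1, x1 <- x0 is emitted in the only safe order regardless of input order, and a swap costs one extra copy through the scratch register.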

llvm/include/llvm/CodeGen/GlobalISel/CopyLocalizer.h
31: I don't understand what you mean here. The physreg interferes no matter how you arrange the live-ranges. I don't get the partial part either. (Partial overlapping comes from sub registers, and this example doesn't have any :)).

45: Before we had (read '->' == interferes with):
  %x1 -> py, pz, x2, x3
  %x2 -> pz, pa, x1, x3
  %x3 -> pa, pb, x1, x2
After we have:
  %x1 -> x2, x3
  %x2 -> pz, px, pa, x1, x3
  %x3 -> px, pa, pb, x1, x2
What is the benefit of doing so? We basically relaxed the constraint on x1, because increased it on x2 and x3.

I remember you mentioned that the CallLowering approach to this issue resulted in worse perf or code size. That does seem odd to me since we should be just aping SDAG, so even if we had some regressions we shouldn't be worse overall.

In D87157#2257366, @qcolombet wrote:

> Hi Jessica,
>
> This feels to me that we are adding another level of heuristics that could well degrade performance in other cases. The fact that this runs only on functions with only one basic block and with fewer than 25 instructions tells me this is really a workaround for a more general problem.
> Put differently, I don't see this being generally useful enough to be worth the compile time.

I don't disagree; I'm not particularly attached to or happy with this approach. :)

> Solving that properly IMHO would require modeling batches of copies as parallel copies (all the copies happen in parallel in a bundle) and we would serialize them only after register allocation. We already do something like that in SplitKit IIRC.

I think I prefer this.

> I am guessing your concern is that we regress some cases compared to SDISel (I would also expect that we improve some as well!), thus I'm wondering if we should try to "fix" this at all.

I found a surprising number of places where this happens. It's extremely noticeable in small functions containing tail calls.

These are some of the cases I found:

https://godbolt.org/z/xakndt
https://godbolt.org/z/pnK5xQ
https://godbolt.org/z/ziYBUs
https://godbolt.org/z/fwmRvE
https://godbolt.org/z/w5uPHH
https://godbolt.org/z/xJVMgp
https://godbolt.org/z/ZDXn-_
https://godbolt.org/z/fcDWGT
https://godbolt.org/z/-trkgD
https://godbolt.org/z/faLS0z
https://godbolt.org/z/xUdnUq

I suspect there are more cases like this. These are just the ones I found by manually clicking through code size regressions I found with a script.

> If we need to fix this, could we instead issue the arguments in the same order as SDISel in the meantime? (While we bring up the proper support for parallel copies.)

I tried doing that, and it made some cases better, and some cases worse. (Like you mentioned, we improve some things!) It didn't seem like there was a strong improvement or regression, so I don't think it's worth doing.

If there's a proper way to model what we're missing here (e.g. parallel copies), I think it would be best to just wait for that.

I think we need some kind of scheduler / physreg copy scheduler, not necessarily a copy cloning approach

This revision now requires changes to proceed. Aug 18 2023, 6:39 AM

Revision Contents

Path                                                                Size
llvm/include/llvm/CodeGen/GlobalISel/CopyLocalizer.h                106 lines
llvm/include/llvm/InitializePasses.h                                1 line
llvm/lib/CodeGen/GlobalISel/CMakeLists.txt                          1 line
llvm/lib/CodeGen/GlobalISel/CopyLocalizer.cpp                       137 lines
llvm/lib/CodeGen/GlobalISel/GlobalISel.cpp                          1 line
llvm/lib/Target/AArch64/AArch64TargetMachine.cpp                    4 lines
llvm/test/CodeGen/AArch64/GlobalISel/copy-localizer.mir             312 lines
llvm/test/CodeGen/AArch64/GlobalISel/gisel-commandline-option.ll    1 line
llvm/test/CodeGen/AArch64/GlobalISel/integration-shuffle-vector.ll  8 lines

Diff 290008

llvm/include/llvm/CodeGen/GlobalISel/CopyLocalizer.h

This file was added.

				//== llvm/CodeGen/GlobalISel/CopyLocalizer.h - Localize copies ----- C++--==//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				/// \file This describes the interface for a localizer pass specifically for
				/// physreg copies.
				///
				/// This is intended to create more favourable copy live ranges for the greedy
				/// register allocator in very small MachineFunctions. In such MachineFunctions,
				/// unfavourable live ranges can block the greedy register allocator from
				/// recognizing identity copies. As a result, these functions may contain
				/// unnecessary copies.
				///
				/// An example of an unfavourable live range is like so:
				///
				/// \code
				/// %x1 = COPY $px
				/// %x2 = COPY $py
				/// %x3 = COPY $pz
				/// ...
				/// $pa = COPY %x1
				/// $pb = COPY %x2
				/// $pc = COPY %x3
				/// \endcode
				///
				/// In this case, every physreg's live range partially overlaps with every other
				/// physreg's range.
				///
				/// This pass will reorder the live ranges so that there are as few partial
				/// overlaps as possible.
				///
				/// For the above example, the pass will produce:
				///
				/// \code
				/// %x2 = COPY $py
				/// %x3 = COPY $pz
				/// %x1 = COPY $px
				/// ...
				/// $pa = COPY %x1
				/// $pb = COPY %x2
				/// $pc = COPY %x3
				/// \endcode
				///
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_CODEGEN_GLOBALISEL_COPYLOCALIZER_H
				#define LLVM_CODEGEN_GLOBALISEL_COPYLOCALIZER_H

				#include "llvm/ADT/SetVector.h"
				#include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"
				#include "llvm/CodeGen/MachineFunctionPass.h"

				namespace llvm {
				// Forward declarations.
				class MachineRegisterInfo;
				class TargetTransformInfo;

				class CopyLocalizer : public MachineFunctionPass {
				public:
				  static char ID;

				private:
				  typedef SmallSetVector<MachineInstr *, 32> LocalizedSetVecT;

				  /// The maximum number of allowed instructions in the MachineFunction's entry
				  /// block.
				  /// Used to restrict compile time.
				  const unsigned MaxBlockSize = 25;

				  /// The minimum number of allowed instructions in the MachineFunction's entry
				  /// block.
				  /// We need at least two copies, each with at least one user.
				  const unsigned MinBlockSize = 4;

				  /// \returns true if the pass should run on \p MF.
				  bool shouldRunOnMF(const MachineFunction &MF);

				  /// Find copies to localize.
				  /// \p MF [in] - The MachineFunction to search in.
				  /// \p CopiesToLocalize [out] - Localizable copies within the MachineFunction.
				  /// \returns true if any copies were found.
				  ///
				  /// If any copies are found, \p CopiesToLocalize is ordered such that the
				  /// copy with the closest use comes first.
				  bool findCopiesToLocalize(MachineFunction &MF,
				                            LocalizedSetVecT &CopiesToLocalize);

				  /// Moves the copies in \p CopiesToLocalize closer to their users.
				  bool localizeCopies(MachineFunction &MF, LocalizedSetVecT &CopiesToLocalize);

				public:
				  CopyLocalizer();
				  StringRef getPassName() const override { return "Copy Localizer"; }
				  MachineFunctionProperties getRequiredProperties() const override {
				    return MachineFunctionProperties().set(
				        MachineFunctionProperties::Property::IsSSA);
				  }
				  void getAnalysisUsage(AnalysisUsage &AU) const override;
				  bool runOnMachineFunction(MachineFunction &MF) override;
				};
				} // namespace llvm
				#endif

llvm/include/llvm/InitializePasses.h

[...]
 void initializeCallGraphViewerPass(PassRegistry&);
 void initializeCallGraphWrapperPassPass(PassRegistry&);
 void initializeCallSiteSplittingLegacyPassPass(PassRegistry&);
 void initializeCalledValuePropagationLegacyPassPass(PassRegistry &);
 void initializeCodeGenPreparePass(PassRegistry&);
 void initializeConstantHoistingLegacyPassPass(PassRegistry&);
 void initializeConstantMergeLegacyPassPass(PassRegistry&);
 void initializeControlHeightReductionLegacyPassPass(PassRegistry&);
+void initializeCopyLocalizerPass(PassRegistry &);
 void initializeCorrelatedValuePropagationPass(PassRegistry&);
 void initializeCostModelAnalysisPass(PassRegistry&);
 void initializeCrossDSOCFIPass(PassRegistry&);
 void initializeDAEPass(PassRegistry&);
 void initializeDAHPass(PassRegistry&);
 void initializeDCELegacyPassPass(PassRegistry&);
 void initializeDSELegacyPassPass(PassRegistry&);
 void initializeDataFlowSanitizerLegacyPassPass(PassRegistry &);
[...]

llvm/lib/CodeGen/GlobalISel/CMakeLists.txt

 add_llvm_component_library(LLVMGlobalISel
   CSEInfo.cpp
   GISelKnownBits.cpp
   CSEMIRBuilder.cpp
   CallLowering.cpp
   GlobalISel.cpp
   Combiner.cpp
   CombinerHelper.cpp
+  CopyLocalizer.cpp
   GISelChangeObserver.cpp
   IRTranslator.cpp
   InlineAsmLowering.cpp
   InstructionSelect.cpp
   InstructionSelector.cpp
   LegalityPredicates.cpp
   LegalizeMutations.cpp
   Legalizer.cpp
[...]

llvm/lib/CodeGen/GlobalISel/CopyLocalizer.cpp

This file was added.

				//===- CopyLocalizer.cpp - Localize copies ------------------------ C++ --==//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				/// \file
				/// Implementation of the CopyLocalizer class.
				//===----------------------------------------------------------------------===//
				#include "llvm/CodeGen/GlobalISel/CopyLocalizer.h"
				#include "llvm/ADT/DenseMap.h"
				#include "llvm/CodeGen/MachineRegisterInfo.h"
				#include "llvm/InitializePasses.h"
				#include "llvm/Support/Debug.h"

				#define DEBUG_TYPE "copy-localizer"
				using namespace llvm;

				char CopyLocalizer::ID = 0;
				INITIALIZE_PASS_BEGIN(CopyLocalizer, DEBUG_TYPE,
				                      "Move copies closer to their uses", false, false)
				INITIALIZE_PASS_END(CopyLocalizer, DEBUG_TYPE,
				                    "Move copies closer to their uses", false, false)
				CopyLocalizer::CopyLocalizer() : MachineFunctionPass(ID) {}

				void CopyLocalizer::getAnalysisUsage(AnalysisUsage &AU) const {
				  getSelectionDAGFallbackAnalysisUsage(AU);
				  MachineFunctionPass::getAnalysisUsage(AU);
				}

				bool CopyLocalizer::shouldRunOnMF(const MachineFunction &MF) {
				  // If the ISel pipeline failed, do not bother running that pass.
				  if (MF.getProperties().hasProperty(
				          MachineFunctionProperties::Property::FailedISel))
				    return false;

				  // This is somewhat expensive (we need to keep track of all copies + users in
				  // a block), so don't do it at -O0.
				  if (MF.getFunction().hasOptNone())
				    return false;

				  // This is most beneficial in MachineFunctions with a single basic block.
				  if (MF.size() > 1)
				    return false;

				  unsigned NumInstrsInEntry = MF.front().size();
				  return NumInstrsInEntry >= MinBlockSize && NumInstrsInEntry <= MaxBlockSize;
				}

				bool CopyLocalizer::findCopiesToLocalize(
				    MachineFunction &MF, LocalizedSetVecT &CopiesToLocalize) {
				  // Find each localizable copy in the entry block of MF. Output is ordered such
				  // that the copy with the earliest user in the block comes first.
				  MachineBasicBlock &MBB = MF.front();
				  MachineRegisterInfo &MRI = MF.getRegInfo();

				  // Keeps track of each localizable copy an instruction uses.
				  DenseMap<MachineInstr *, SmallVector<MachineInstr *, 2>>
				      UsesToLocalizableCopies;
				  for (MachineInstr &MI : instructionsWithoutDebug(MBB.begin(), MBB.end())) {
				    // Every time we see a copy from a physreg, save its users. Note that we
				    // don't have to check if they're local; we're restricted to functions
				    // which contain a single block.
				    if (MI.isCopy() && MI.getOperand(1).getReg().isPhysical())
				      for (MachineInstr &UseInstr :
				           MRI.use_nodbg_instructions(MI.getOperand(0).getReg()))
				        UsesToLocalizableCopies[&UseInstr].push_back(&MI);

				    // MI is not a copy from a physreg. Check if it is known to use any of the
				    // copies we saved earlier. If so, save it so we can localize it.
				    auto KnownUse = UsesToLocalizableCopies.find(&MI);
				    if (KnownUse == UsesToLocalizableCopies.end())
				      continue;
				    for (MachineInstr *Use : KnownUse->second)
				      CopiesToLocalize.insert(Use);
				  }
				  return CopiesToLocalize.size();
				}

				/// \returns true if it is safe to move a copy from a physical register past
				/// \p MI.
				static bool safeToMoveCopyFromPhysRegPast(const MachineInstr &MI) {
				  return MI.isCopy() && MI.getOperand(1).getReg().isPhysical();
				}

				bool CopyLocalizer::localizeCopies(MachineFunction &MF,
				                                   LocalizedSetVecT &CopiesToLocalize) {
				  MachineBasicBlock &MBB = MF.front();
				  // Iterate over each of the copies and their uses in reverse. We want to
				  // localize the copy with the furthest away use first, and the copy with the
				  // closest use last. This ensures that the last insert (the one with the
				  // closest use) will be placed at the end of the range.
				  for (MachineInstr *Copy : reverse(CopiesToLocalize)) {
				    auto Range = instructionsWithoutDebug(std::next(Copy->getIterator()),
				                                          MBB.instr_end());
				    // We can only localize copies from physical registers within a contiguous
				    // range of copies from physical registers. We have no idea how register
				    // allocation will play out with other instructions.
				    //
				    // e.g. in this situation:
				    //
				    // %x = COPY $p
				    // %y = G_SOMETHING
				    //
				    // We should not move %x past %y, because %y could end up being allocated to
				    // $p. This is true regardless of register bank in some situations for some
				    // targets.
				    //
				    // FIXME: It's kind of wasteful to recalculate this; each "block" of copies
				    // could store this position.
				    auto NewPos = find_if(Range, [](const MachineInstr &MI) {
				      return !safeToMoveCopyFromPhysRegPast(MI);
				    });
				    assert(
				        NewPos != Range.end() &&
				        "Must have something which isn't a copy from a physreg in the block?");
				    LLVM_DEBUG(dbgs() << "... Will localize: " << *Copy
				                      << "... New position is before: " << *NewPos << '\n');
				    MachineInstr *LocalizedMI = MF.CloneMachineInstr(Copy);
				    MBB.insert(MBB.SkipPHIsAndLabels(&*NewPos), LocalizedMI);
				    Copy->eraseFromParent();
				  }

				  return true;
				}

				bool CopyLocalizer::runOnMachineFunction(MachineFunction &MF) {
				  if (!shouldRunOnMF(MF))
				    return false;
				  LLVM_DEBUG(dbgs() << "Localizing copies in entry block of: " << MF.getName()
				                    << '\n');
				  LocalizedSetVecT CopiesToLocalize;
				  if (!findCopiesToLocalize(MF, CopiesToLocalize))
				    return false;
				  return localizeCopies(MF, CopiesToLocalize);
				}

llvm/lib/CodeGen/GlobalISel/GlobalISel.cpp

[...]

 using namespace llvm;

 void llvm::initializeGlobalISel(PassRegistry &Registry) {
   initializeIRTranslatorPass(Registry);
   initializeLegalizerPass(Registry);
   initializeLocalizerPass(Registry);
   initializeRegBankSelectPass(Registry);
+  initializeCopyLocalizerPass(Registry);
   initializeInstructionSelectPass(Registry);
 }

llvm/lib/Target/AArch64/AArch64TargetMachine.cpp

[...]
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/Triple.h"
 #include "llvm/Analysis/TargetTransformInfo.h"
 #include "llvm/CodeGen/CSEConfigBase.h"
 #include "llvm/CodeGen/GlobalISel/IRTranslator.h"
 #include "llvm/CodeGen/GlobalISel/InstructionSelect.h"
 #include "llvm/CodeGen/GlobalISel/Legalizer.h"
 #include "llvm/CodeGen/GlobalISel/Localizer.h"
+#include "llvm/CodeGen/GlobalISel/CopyLocalizer.h"
 #include "llvm/CodeGen/GlobalISel/RegBankSelect.h"
 #include "llvm/CodeGen/MIRParser/MIParser.h"
 #include "llvm/CodeGen/MachineScheduler.h"
 #include "llvm/CodeGen/Passes.h"
 #include "llvm/CodeGen/TargetPassConfig.h"
 #include "llvm/IR/Attributes.h"
 #include "llvm/IR/Function.h"
 #include "llvm/InitializePasses.h"
[...]

 bool AArch64PassConfig::addRegBankSelect() {
   addPass(new RegBankSelect());
   return false;
 }

 void AArch64PassConfig::addPreGlobalInstructionSelect() {
   addPass(new Localizer());
+  bool IsOptNone = getOptLevel() == CodeGenOpt::None;
+  if (!IsOptNone)
+    addPass(new CopyLocalizer());
 }

 bool AArch64PassConfig::addGlobalInstructionSelect() {
   addPass(new InstructionSelect());
   return false;
 }

 bool AArch64PassConfig::addILPOpts() {
[...]

llvm/test/CodeGen/AArch64/GlobalISel/copy-localizer.mir

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
				# RUN: llc -mtriple aarch64-apple-ios -global-isel -run-pass=copy-localizer -verify-machineinstrs %s -o - | FileCheck %s

				...
				---
				name: localize_overlapping_ranges
				legalized: true
				regBankSelected: true
				tracksRegLiveness: true
				body: |
				bb.0:
				liveins: $w0, $w1, $w2

				; We should move the copies at the top closer to their uses.

				; CHECK-LABEL: name: localize_overlapping_ranges
				; CHECK: liveins: $w0, $w1, $w2
				; CHECK: %copy_from_w2:gpr(s32) = COPY $w2
				; CHECK: %copy_from_w1:gpr(s32) = COPY $w1
				; CHECK: %copy_from_w0:gpr(s32) = COPY $w0
				; CHECK: $w0 = COPY %copy_from_w0(s32)
				; CHECK: $w1 = COPY %copy_from_w1(s32)
				; CHECK: $w2 = COPY %copy_from_w2(s32)
				%copy_from_w0:gpr(s32) = COPY $w0
				%copy_from_w1:gpr(s32) = COPY $w1
				%copy_from_w2:gpr(s32) = COPY $w2
				$w0 = COPY %copy_from_w0(s32)
				$w1 = COPY %copy_from_w1(s32)
				$w2 = COPY %copy_from_w2(s32)

				...
				---
				name: localize_overlapping_ranges_with_constant
				legalized: true
				regBankSelected: true
				tracksRegLiveness: true
				body: |
				bb.0:
				liveins: $w0, $w1, $w2

				; We should move the copies at the top closer to their uses.
				; Verifies that we won't move anything past the constants.

				; CHECK-LABEL: name: localize_overlapping_ranges_with_constant
				; CHECK: liveins: $w0, $w1, $w2
				; CHECK: %copy_from_w2:gpr(s32) = COPY $w2
				; CHECK: %copy_from_w1:gpr(s32) = COPY $w1
				; CHECK: %copy_from_w0:gpr(s32) = COPY $w0
				; CHECK: %c0:gpr(s32) = G_CONSTANT i32 0
				; CHECK: %c1:gpr(s32) = G_CONSTANT i32 1
				; CHECK: $w0 = COPY %copy_from_w0(s32)
				; CHECK: $w1 = COPY %copy_from_w1(s32)
				; CHECK: $w2 = COPY %copy_from_w2(s32)
				%copy_from_w0:gpr(s32) = COPY $w0
				%copy_from_w1:gpr(s32) = COPY $w1
				%copy_from_w2:gpr(s32) = COPY $w2

				%c0:gpr(s32) = G_CONSTANT i32 0
				%c1:gpr(s32) = G_CONSTANT i32 1

				$w0 = COPY %copy_from_w0(s32)
				$w1 = COPY %copy_from_w1(s32)
				$w2 = COPY %copy_from_w2(s32)

				...
				---
				name: split_range_by_constant
				legalized: true
				regBankSelected: true
				tracksRegLiveness: true
				body: |
				bb.0:
				liveins: $w0, $w1, $w2, $w3, $w4

				; Only the copies above the G_CONSTANT can be moved in the space before the
				; G_CONSTANT. Only the copies after the G_CONSTANT can be moved in the
				; space after the G_CONSTANT.
				;
				; We don't know which register the G_CONSTANT will be allocated to, so we
				; don't want to move things around here.

				; CHECK-LABEL: name: split_range_by_constant
				; CHECK: liveins: $w0, $w1, $w2, $w3, $w4
				; CHECK: %copy_from_w2:gpr(s32) = COPY $w2
				; CHECK: %copy_from_w1:gpr(s32) = COPY $w1
				; CHECK: %copy_from_w0:gpr(s32) = COPY $w0
				; CHECK: %blocked:gpr(s32) = G_CONSTANT i32 0
				; CHECK: %copy_from_w4:gpr(s32) = COPY $w4
				; CHECK: %copy_from_w3:gpr(s32) = COPY $w3
				; CHECK: $w0 = COPY %copy_from_w0(s32)
				; CHECK: $w1 = COPY %copy_from_w1(s32)
				; CHECK: $w2 = COPY %copy_from_w2(s32)
				; CHECK: $w3 = COPY %copy_from_w3(s32)
				; CHECK: $w4 = COPY %copy_from_w4(s32)
				%copy_from_w0:gpr(s32) = COPY $w0
				%copy_from_w1:gpr(s32) = COPY $w1
				%copy_from_w2:gpr(s32) = COPY $w2
				%blocked:gpr(s32) = G_CONSTANT i32 0
				%copy_from_w3:gpr(s32) = COPY $w3
				%copy_from_w4:gpr(s32) = COPY $w4
				$w0 = COPY %copy_from_w0(s32)
				$w1 = COPY %copy_from_w1(s32)
				$w2 = COPY %copy_from_w2(s32)
				$w3 = COPY %copy_from_w3(s32)
				$w4 = COPY %copy_from_w4(s32)

				...
				---
				name: split_range_by_fconstant
				legalized: true
				regBankSelected: true
				tracksRegLiveness: true
				body: |
				bb.0:
				liveins: $w0, $w1, $w2, $w3, $w4

				; Even though the G_FCONSTANT says that it needs a FPR, we don't want to
				; move anything across it. It could later be selected to a mov.

				; CHECK-LABEL: name: split_range_by_fconstant
				; CHECK: liveins: $w0, $w1, $w2, $w3, $w4
				; CHECK: %copy_from_w2:gpr(s32) = COPY $w2
				; CHECK: %copy_from_w1:gpr(s32) = COPY $w1
				; CHECK: %copy_from_w0:gpr(s32) = COPY $w0
				; CHECK: %blocked:fpr(s32) = G_FCONSTANT float 1.000000e+00
				; CHECK: %copy_from_w4:gpr(s32) = COPY $w4
				; CHECK: %copy_from_w3:gpr(s32) = COPY $w3
				; CHECK: $w0 = COPY %copy_from_w0(s32)
				; CHECK: $w1 = COPY %copy_from_w1(s32)
				; CHECK: $w2 = COPY %copy_from_w2(s32)
				; CHECK: $w3 = COPY %copy_from_w3(s32)
				; CHECK: $w4 = COPY %copy_from_w4(s32)
				%copy_from_w0:gpr(s32) = COPY $w0
				%copy_from_w1:gpr(s32) = COPY $w1
				%copy_from_w2:gpr(s32) = COPY $w2
				%blocked:fpr(s32) = G_FCONSTANT float 1.0
				%copy_from_w3:gpr(s32) = COPY $w3
				%copy_from_w4:gpr(s32) = COPY $w4
				$w0 = COPY %copy_from_w0(s32)
				$w1 = COPY %copy_from_w1(s32)
				$w2 = COPY %copy_from_w2(s32)
				$w3 = COPY %copy_from_w3(s32)
				$w4 = COPY %copy_from_w4(s32)

				...
				---
				name: dont_change_ideal_range
				legalized: true
				regBankSelected: true
				tracksRegLiveness: true
				body: |
				bb.0:
				liveins: $w0, $w1, $w2

				; This range should not be modified, because it is already ideal.

				; CHECK-LABEL: name: dont_change_ideal_range
				; CHECK: liveins: $w0, $w1, $w2
				; CHECK: %copy_from_w2:gpr(s32) = COPY $w2
				; CHECK: %copy_from_w1:gpr(s32) = COPY $w1
				; CHECK: %copy_from_w0:gpr(s32) = COPY $w0
				; CHECK: $w0 = COPY %copy_from_w0(s32)
				; CHECK: $w1 = COPY %copy_from_w1(s32)
				; CHECK: $w2 = COPY %copy_from_w2(s32)
				%copy_from_w2:gpr(s32) = COPY $w2
				%copy_from_w1:gpr(s32) = COPY $w1
				%copy_from_w0:gpr(s32) = COPY $w0
				$w0 = COPY %copy_from_w0(s32)
				$w1 = COPY %copy_from_w1(s32)
				$w2 = COPY %copy_from_w2(s32)


				...
				---
				name: closest_use
				legalized: true
				regBankSelected: true
				tracksRegLiveness: true
				body: |
				bb.0:
				liveins: $w0, $x1, $w2, $w3, $w4

				; The G_TRUNC is the closest to the block of copies. It uses %copy_from_x1,
				; so that should be at the end of the range.

				; CHECK-LABEL: name: closest_use
				; CHECK: liveins: $w0, $x1, $w2, $w3, $w4
				; CHECK: %copy_from_w2:gpr(s32) = COPY $w2
				; CHECK: %copy_from_w0:gpr(s32) = COPY $w0
				; CHECK: %copy_from_x1:gpr(s64) = COPY $x1
				; CHECK: %trunc_x1:gpr(s32) = G_TRUNC %copy_from_x1(s64)
				; CHECK: $w0 = COPY %copy_from_w0(s32)
				; CHECK: $w1 = COPY %trunc_x1(s32)
				; CHECK: $w2 = COPY %copy_from_w2(s32)
				%copy_from_w0:gpr(s32) = COPY $w0
				%copy_from_x1:gpr(s64) = COPY $x1
				%copy_from_w2:gpr(s32) = COPY $w2

				%trunc_x1:gpr(s32) = G_TRUNC %copy_from_x1
				$w0 = COPY %copy_from_w0(s32)
				$w1 = COPY %trunc_x1(s32)
				$w2 = COPY %copy_from_w2(s32)

				...
				---
				name: binop_1
				legalized: true
				regBankSelected: true
				tracksRegLiveness: true
				body: |
				bb.0:
				liveins: $w1, $w2, $w3

				; We should localize the copies from w1 and w2 to be closer to the G_ADD.

				; CHECK-LABEL: name: binop_1
				; CHECK: liveins: $w1, $w2, $w3
				; CHECK: %copy_from_w3:gpr(s32) = COPY $w3
				; CHECK: %copy_from_w2:gpr(s32) = COPY $w2
				; CHECK: %copy_from_w1:gpr(s32) = COPY $w1
				; CHECK: %add:gpr(s32) = G_ADD %copy_from_w1, %copy_from_w2
				; CHECK: $w3 = COPY %copy_from_w3(s32)
				%copy_from_w1:gpr(s32) = COPY $w1
				%copy_from_w2:gpr(s32) = COPY $w2
				%copy_from_w3:gpr(s32) = COPY $w3

				%add:gpr(s32) = G_ADD %copy_from_w1(s32), %copy_from_w2(s32)
				$w3 = COPY %copy_from_w3(s32)

				...
				---
				name: binop_swap_params
				legalized: true
				regBankSelected: true
				tracksRegLiveness: true
				body: |
				bb.0:
				liveins: $w1, $w2, $w3

				; If we change the order of parameters on the G_ADD, it shouldn't change
				; the order of localization.

				; CHECK-LABEL: name: binop_swap_params
				; CHECK: liveins: $w1, $w2, $w3
				; CHECK: %copy_from_w3:gpr(s32) = COPY $w3
				; CHECK: %copy_from_w2:gpr(s32) = COPY $w2
				; CHECK: %copy_from_w1:gpr(s32) = COPY $w1
				; CHECK: %add:gpr(s32) = G_ADD %copy_from_w2, %copy_from_w1
				; CHECK: $w3 = COPY %copy_from_w3(s32)
				%copy_from_w1:gpr(s32) = COPY $w1
				%copy_from_w2:gpr(s32) = COPY $w2
				%copy_from_w3:gpr(s32) = COPY $w3
				%add:gpr(s32) = G_ADD %copy_from_w2(s32), %copy_from_w1(s32)
				$w3 = COPY %copy_from_w3(s32)

				...
				---
				name: dont_localize_multiple_blocks
				legalized: true
				regBankSelected: true
				tracksRegLiveness: true
				body: |
				; Check that this doesn't impact functions with multiple blocks.
				; (This function is an example of one we really should run on, since
				; the second block is empty.)

				; CHECK-LABEL: name: dont_localize_multiple_blocks
				; CHECK: bb.0:
				; CHECK: successors: %bb.1(0x80000000)
				; CHECK: liveins: $w0, $w1, $w2
				; CHECK: %copy_from_w0:gpr(s32) = COPY $w0
				; CHECK: %copy_from_w1:gpr(s32) = COPY $w1
				; CHECK: %copy_from_w2:gpr(s32) = COPY $w2
				; CHECK: $w0 = COPY %copy_from_w0(s32)
				; CHECK: $w1 = COPY %copy_from_w1(s32)
				; CHECK: $w2 = COPY %copy_from_w2(s32)
				; CHECK: bb.1:
				; CHECK: RET_ReallyLR
				bb.0:
				liveins: $w0, $w1, $w2

				%copy_from_w0:gpr(s32) = COPY $w0
				%copy_from_w1:gpr(s32) = COPY $w1
				%copy_from_w2:gpr(s32) = COPY $w2
				$w0 = COPY %copy_from_w0(s32)
				$w1 = COPY %copy_from_w1(s32)
				$w2 = COPY %copy_from_w2(s32)
				bb.1:
				RET_ReallyLR

				...
				---
				name: smallest_range
				legalized: true
				regBankSelected: true
				tracksRegLiveness: true
				body: |
				bb.0:
				liveins: $w0, $w1, $w2

				; Check the minimum range for localizing copies.

				; CHECK-LABEL: name: smallest_range
				; CHECK: liveins: $w0, $w1, $w2
				; CHECK: %copy_from_w1:gpr(s32) = COPY $w1
				; CHECK: %copy_from_w0:gpr(s32) = COPY $w0
				; CHECK: $w0 = COPY %copy_from_w0(s32)
				; CHECK: $w1 = COPY %copy_from_w1(s32)
				%copy_from_w0:gpr(s32) = COPY $w0
				%copy_from_w1:gpr(s32) = COPY $w1
				$w0 = COPY %copy_from_w0(s32)
				$w1 = COPY %copy_from_w1(s32)
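
The CHECK lines in copy-localizer.mir are all consistent with one ordering rule, which can be modelled outside of LLVM. The following is a minimal sketch in Python, inferred only from the test expectations above and not taken from CopyLocalizer.cpp. `Inst`, `is_copy_from_physreg`, `reorder_copies`, and `localize_function` are hypothetical names, and the handling of a copy with no use is an assumption, since no test covers it.

```python
from typing import Dict, List, NamedTuple, Tuple


class Inst(NamedTuple):
    """Toy stand-in for a MachineInstr: registers written, opcode, registers read."""
    defs: Tuple[str, ...]
    opcode: str
    uses: Tuple[str, ...]


def is_copy_from_physreg(inst: Inst) -> bool:
    # Physical registers are spelled with a leading '$', as in MIR.
    return (inst.opcode == "COPY"
            and len(inst.defs) == 1 and not inst.defs[0].startswith("$")
            and len(inst.uses) == 1 and inst.uses[0].startswith("$"))


def reorder_copies(block: List[Inst]) -> List[Inst]:
    # Index of the first instruction that reads each register.
    first_use: Dict[str, int] = {}
    for idx, inst in enumerate(block):
        for reg in inst.uses:
            first_use.setdefault(reg, idx)

    result: List[Inst] = []
    i = 0
    while i < len(block):
        # Collect a maximal run of adjacent copies from physical registers.
        j = i
        while j < len(block) and is_copy_from_physreg(block[j]):
            j += 1
        if j == i:
            result.append(block[i])  # Anything else ends the run and stays put.
            i += 1
            continue
        run = block[i:j]
        # Order by (first use, original position), then reverse, so the copy
        # whose value is needed soonest sits at the end of the run.
        # Assumption: a copy with no use sorts as if used past the block end.
        ordered = sorted(
            range(len(run)),
            key=lambda k: (first_use.get(run[k].defs[0], len(block)), k))
        result.extend(run[k] for k in reversed(ordered))
        i = j
    return result


def localize_function(blocks: List[List[Inst]]) -> List[List[Inst]]:
    # Mirrors dont_localize_multiple_blocks: only single-block functions change.
    if len(blocks) != 1:
        return blocks
    return [reorder_copies(blocks[0])]


# The closest_use input from the test above.
closest_use = [
    Inst(("%copy_from_w0",), "COPY", ("$w0",)),
    Inst(("%copy_from_x1",), "COPY", ("$x1",)),
    Inst(("%copy_from_w2",), "COPY", ("$w2",)),
    Inst(("%trunc_x1",), "G_TRUNC", ("%copy_from_x1",)),
    Inst(("$w0",), "COPY", ("%copy_from_w0",)),
    Inst(("$w1",), "COPY", ("%trunc_x1",)),
    Inst(("$w2",), "COPY", ("%copy_from_w2",)),
]
for inst in localize_function([closest_use])[0][:3]:
    print(inst.defs[0])
```

For the closest_use input this prints %copy_from_w2, %copy_from_w0, %copy_from_x1, matching the CHECK order. Keying on the position of the first use rather than on operand order is what makes binop_1 and binop_swap_params produce the same result in this model.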

llvm/test/CodeGen/AArch64/GlobalISel/gisel-commandline-option.ll

	  ; ENABLED-NEXT: PreLegalizerCombiner
	  ; VERIFY-NEXT: Verify generated machine code
	  ; ENABLED-NEXT: Analysis containing CSE Info
	  ; ENABLED-NEXT: Legalizer
	  ; VERIFY-NEXT: Verify generated machine code
	  ; ENABLED: RegBankSelect
	  ; VERIFY-NEXT: Verify generated machine code
	  ; ENABLED-NEXT: Localizer
	+ ; ENABLED-O1-NEXT: Copy Localizer
	  ; VERIFY-O0-NEXT: Verify generated machine code
	  ; ENABLED-NEXT: Analysis for ComputingKnownBits
	  ; ENABLED-NEXT: InstructionSelect
	  ; VERIFY-NEXT: Verify generated machine code
	  ; ENABLED-NEXT: ResetMachineFunction

	  ; FALLBACK: AArch64 Instruction Selection
	  ; NOFALLBACK-NOT: AArch64 Instruction Selection

llvm/test/CodeGen/AArch64/GlobalISel/integration-shuffle-vector.ll

	  ; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	  ; RUN: llc -global-isel -mtriple aarch64-apple-ios -stop-after=instruction-select %s -o - | FileCheck %s

	  ; Check that packing incoming arguments into a big vector type
	  ; and unpacking them in registers for the call to @bar gets selected as just
	  ; simple copies. I.e., we don't artificial try to keep the big
	  ; vector (%vec) alive.
	  define void @shuffle_to_concat_vector(<2 x i64> %a, <2 x i64> %b) {
	  ; CHECK-LABEL: name: shuffle_to_concat_vector
	  ; CHECK: bb.1 (%ir-block.0):
	  ; CHECK: liveins: $q0, $q1
	- ; CHECK: [[COPY:%[0-9]+]]:fpr128 = COPY $q0
	- ; CHECK: [[COPY1:%[0-9]+]]:fpr128 = COPY $q1
	+ ; CHECK: [[COPY:%[0-9]+]]:fpr128 = COPY $q1
	+ ; CHECK: [[COPY1:%[0-9]+]]:fpr128 = COPY $q0
	  ; CHECK: ADJCALLSTACKDOWN 0, 0, implicit-def $sp, implicit $sp
	- ; CHECK: $q0 = COPY [[COPY]]
	- ; CHECK: $q1 = COPY [[COPY1]]
	+ ; CHECK: $q0 = COPY [[COPY1]]
	+ ; CHECK: $q1 = COPY [[COPY]]
	  ; CHECK: BL @bar, csr_darwin_aarch64_aapcs, implicit-def $lr, implicit $sp, implicit $q0, implicit $q1
	  ; CHECK: ADJCALLSTACKUP 0, 0, implicit-def $sp, implicit $sp
	  ; CHECK: RET_ReallyLR
	  %vec = shufflevector <2 x i64> %a, <2 x i64> %b, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	  call void @bar(<4 x i64> %vec)
	  ret void
	  }

	  declare void @bar(<4 x i64> %vec)

Diff 290008
