This is an archive of the discontinued LLVM Phabricator instance.

Differential D30808

shrink-wrap: implement more advanced algorithm
Needs Review · Public

Authored by thegameg on Mar 9 2017, 8:46 PM.

Details

Reviewers

MatzeB

Summary

THIS IS A WORK IN PROGRESS.

Putting this up for early review.

This pass improves on the current shrink-wrapping pass and is based, this time, on the dataflow analysis described in "Minimizing Register Usage Penalty at Procedure Calls - Fred C. Chow" [1]. The aim is to lift the main restriction of the current shrink-wrapping pass, which supports only one save / restore point.

Diff Detail

Event Timeline

thegameg created this revision. Mar 9 2017, 8:46 PM

Herald added a subscriber: mgorny. Mar 9 2017, 8:46 PM

thegameg edited the summary of this revision. Mar 9 2017, 8:48 PM

thegameg removed a subscriber: mgorny.

Thanks for working on this and posting the early version. On a first high-level glance this looks like a great start! I may need a few more days to dive into it and give detailed comments.

Update data structures to use the MachineBasicBlock number instead of a pointer.
Update data structures to use register sets instead of boolean maps.
Add new test case.

Hi Francis,

Overall this is heading in the right direction and it looks like a great start!

Here's a first round of comments. I mostly looked at stylistic issues and did not look at any of the code in ShrinkWrap2.cpp yet.

I really like all those tests getting created during development; bonus points for them being small and reasonably reduced.

Naming: There are a lot of different names for the same/similar things around:

Saves/Restores seems to be predominantly used for callee saves.
Spills/Reloads (or Spills/Fills we don't even agree there) seem to be used by the register allocator.

I think you are picking names from both domains. I probably talked too much about spills/reloads to you as I am a regalloc guy, unfortunately this leads to more mixups here (i.e. having insertSpills() and insertRestores()).

include/llvm/CodeGen/MachineFrameInfo.h
18	Looks like this change is not needed (anymore?). I only see usage of MachineBasicBlock*
296–297	This looks like the main information you compute so it would deserve a few more comments explaining details. I wondered about: I assume we can have multiple entries for the same register (as long as they are in different blocks)? Can we have different FrameIndexes for the same register in different blocks? How does this relate to the existing CSInfo struct?
684–696	Would it make sense to sprinkle some asserts here to make sure you are not accidentally mixing up the different systems and only use one at a time?
include/llvm/CodeGen/MachineFunction.h
478 ↗	(On Diff #91431)	Unrelated to this patch and obvious. Feel free to just commit this without review.
lib/CodeGen/PrologEpilogInserter.cpp
400	Do not repeat function names in doxygen comments (A lot of old code does the same, but we try to avoid it for new code).
403	In LLVM we try to use `ArrayRef<T>` instead of `const std::vector<T>&`. It typically provides the same performance but works with std::vector, llvm::SmallVector and other containers (as long as the container stores all values in a contiguous memory region).
409	I tend to use references (e.g. `const TargetRegisterInfo &TRI =`) for stuff that cannot be nullptr. Similar in a number of other places.
415–417	Use range based for.
432	Maybe call it `RestoreBB` as the variable doesn't hold a restore instruction but is about a basic block.
442	Avoiding `auto` is friendlier to the reader if the type isn't immediately obvious.
446	How about `I = Restore.getFirstTerminator()`.
466–473	Would it make sense to just reverse the loop over CSI instead of using the subtle AtStart/BeforeI logic?
lib/Target/X86/X86FrameLowering.cpp
1162–1165	Odd line break. Also could be written as: `NumBytes = FrameSize; if (!MFI.getSaves().empty()) NumBytes -= X86FI->getCalleeSavedFrameSize();`
1675–1681	see above
test/CodeGen/X86/ShrinkWrapping/2006-04-27-ISelFoldingBug-Reduced.mir
4 ↗	(On Diff #91431)	Maybe move this CHECK: next to the other CHECKS. It is also often a good idea to have some CHECK-LABEL for a point where the function begins (not sure if there actually is such a thing in the debug output here though). Similar in the other tests.
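
The `ArrayRef<T>` suggestion for line 403 above can be illustrated with a small standalone sketch. `ArrayView` and `sumRegs` are hypothetical names used only for this example; `ArrayView` is a simplified stand-in for `llvm::ArrayRef<T>`, not its actual implementation. The point is that a non-owning view over contiguous storage accepts several container types without copying.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for llvm::ArrayRef<T>: a non-owning view
// over a contiguous range of elements.
template <typename T> class ArrayView {
  const T *Data = nullptr;
  std::size_t Length = 0;

public:
  // Implicit conversions so callers can pass containers directly.
  ArrayView(const std::vector<T> &V) : Data(V.data()), Length(V.size()) {}
  template <std::size_t N>
  ArrayView(const T (&A)[N]) : Data(A), Length(N) {}

  const T *begin() const { return Data; }
  const T *end() const { return Data + Length; }
  std::size_t size() const { return Length; }
};

// Takes a view by value: no copy of the elements, and no dependency on the
// caller's container type.
unsigned sumRegs(ArrayView<unsigned> Regs) {
  unsigned Sum = 0;
  for (unsigned R : Regs)
    Sum += R;
  return Sum;
}
```

A function declared with `const std::vector<unsigned> &` would force callers holding a plain array or a small vector to build a temporary `std::vector` first; the view avoids that.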

thegameg added inline comments. Mar 14 2017, 2:45 PM

include/llvm/CodeGen/MachineFrameInfo.h
18	It is, unfortunately. `DenseMap<K, V>` uses `DenseMapInfo<K>`, which is specialized for pointer keys (`DenseMapInfo<T *>`), which uses `PointerLikeTypeTraits<T *>`, which uses `alignof(T)` to check the number of low bits available, so `T` has to be a complete type at that point.
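
To illustrate why a forward declaration is not enough for a pointer-keyed map of this kind, here is a minimal standalone sketch. `PointerTraitsSketch`, `log2Floor`, `Block` and `OnlyDeclared` are hypothetical names; the trait only mimics the idea behind `PointerLikeTypeTraits` and is not LLVM's code. Deriving the number of free low bits from `alignof(T)` requires `T` to be complete.

```cpp
#include <cassert>
#include <cstddef>

// Floor of log2 for N >= 1, written as a single return so it is valid C++11
// constexpr.
constexpr unsigned log2Floor(std::size_t N) {
  return N <= 1 ? 0 : 1 + log2Floor(N / 2);
}

// Hypothetical stand-in for a pointer-traits class.
template <typename T> struct PointerTraitsSketch;

template <typename T> struct PointerTraitsSketch<T *> {
  // alignof(T) is ill-formed when T is only forward-declared, so using this
  // member forces the full definition of T to be visible.
  static constexpr unsigned NumLowBitsAvailable = log2Floor(alignof(T));
};

struct alignas(8) Block { // complete type with 8-byte alignment
  int Number;
};
struct OnlyDeclared; // incomplete type

static_assert(PointerTraitsSketch<Block *>::NumLowBitsAvailable == 3,
              "8-byte alignment leaves 3 low bits free");
// Using PointerTraitsSketch<OnlyDeclared *>::NumLowBitsAvailable here would
// not compile, because alignof needs a complete type.
```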

Thank you for the review Matthias!

Hope I addressed all your concerns so far in this patch.

I also "added support" for AArch64, fixed some region-related bugs and added some more tests.

Herald added a reviewer: javed.absar. Mar 14 2017, 6:37 PM

Herald added subscribers: qcolombet, aemerson.

thegameg removed a reviewer: javed.absar. Mar 14 2017, 6:38 PM

thegameg removed subscribers: aemerson, qcolombet.

Hi Francis,

Out of curiosity, what is the compile time impact of this early version compared to the performance gain?

I am not aware of the improvement that Lupo and Wilken did to the algorithm, but back in 2009, we worked on Fred's shrink-wrapping and the compile time impact was just not worth the improvement.
You can have a look at 378553cb for instance. Thus, I am wondering if we are heading in the same direction and if yes, if instead iterative improvements of the current approach wouldn't be more productive.

Cheers,
-Quentin

In D30808#701653, @qcolombet wrote:

Out of curiosity, what is the compile time impact of this early version compared to the performance gain?

For the current version, I'm expecting it to be quite high, since the implementation of the algorithm is quite basic, but I didn't measure that yet, and of course, I'm going to look into it after I get most of it fixed.

I am not aware of the improvement that Lupo and Wilken did to the algorithm, but back in 2009, we worked on Fred's shrink-wrapping and the compile time impact was just not worth the improvement.
You can have a look at 378553cb for instance. Thus, I am wondering if we are heading in the same direction and if yes, if instead iterative improvements of the current approach wouldn't be more productive.

I'm aware of the 2009 version of shrink-wrapping, and yes, I'm heading in the same direction as that, but I think we can improve it to reduce compile time. I also took a look at your implementation before, and I couldn't think of a way to produce multiple save / restore points for one register (which seems to be the feature that we're the most interested in?) using dominators (I could look into it if you have any ideas, of course).

And one of my concerns related to the improvement by Lupo and Wilken is actually compile time, since we actually need SESE regions, which are expensive to compute and unused through the pipeline, so we wouldn't be able to reuse the analysis results as we do with dominator trees (RegionInfo.h:25).

Thanks,
Francis

Cheers,
-Quentin

I couldn't think of a way to produce multiple save / restore points for one register (which seems to be the feature that we're the most interested in?) using dominators (I could look into it if you have any ideas, of course).

Is that what will bring performance?
I remember looking at candidates we missed with the current implementation, and the main problem was the fact that we don't support several restore/save points, not that we don't split the set of CSRs. At that time we were not interested in pushing the approach further, so we actually didn't categorize what is needed to get more interesting candidates.
What does your categorization tell you?

Put differently, what is the percentage that we miss because of the granularity of the CSRs set we track, the number of save/restore points, "poor" allocation and so on?

Basically what I am saying is that your approach may make sense, but I would rather have the implementation be data-driven. My worry is that at the end of your work we will end up in the same situation as in 2009 and decide not to use it.

My 2 cents.

Cheers,
-Quentin

In D30808#701826, @qcolombet wrote:

I couldn't think of a way to produce multiple save / restore points for one register (which seems to be the feature that we're the most interested in?) using dominators (I could look into it if you have any ideas, of course).

Is that what will bring performance?
I remember looking at candidates we missed with the current implementation, and the main problem was the fact that we don't support several restore/save points, not that we don't split the set of CSRs. At that time we were not interested in pushing the approach further, so we actually didn't categorize what is needed to get more interesting candidates.
What does your categorization tell you?

Well, splitting the CSRs set and the save / restore points is, to me, very related. What I noticed (maybe I should look further into that?) is that once we have more than one CSR use, our allocation gets dropped because we have to merge the points with the other register's points (same for stack operations). But I agree, that having multiple save / restore points should be more beneficial than splitting the CSRs.

Put differently, what is the percentage that we miss because of the granularity of the CSRs set we track, the number of save/restore points, "poor" allocation and so on?

I don't have any numbers, and I'm not sure my estimation would be correct, and I actually think that I should look into that (maybe I should have gathered those numbers from the beginning?).

Basically what I am saying is that your approach may make sense, but I would rather have the implementation be data-driven. My worry is that at the end of your work we will end up in the same situation as in 2009 and decide not to use it.

I see, thank you for taking a look, I'll see if I can get some measurements asap.

Hey, here's another round of comments on the data flow analysis. Remember that dataflow is just a mental model/theoretical framework. For a concrete instance you can often find shortcuts/optimizations:

The code is usually more elegant if you just have a single map from blocks (numbers) to a struct that contains both anticipated and available for that block, rather than having a separate map for anticipated and one for available.
Instead of calculating each register on its own, it is often beneficial to calculate all registers at the same time. That enables the use of bit operations (like and/or) for the join operations!
In your specific case, making a difference between in/out is probably not worth it. For example, looking at ANTOUT/ANTIN: every time you want to compute ANTIN from ANTOUT, you just take the information from ANTOUT and add the stuff from usesReg(). You could just not store ANTOUT (in memory) at all; instead, every time you would change ANTOUT, you go ahead and add usesReg() immediately and store the result in ANTIN.
Similarly, you could skip storing the results of a block with 0 or 1 successors (immediately forward the results in the 1-successor case).
The anticipated information seems to be strictly growing (= you are never removing bits); in that case you can probably put the results of the usesReg() analysis immediately into the ANTOUT array instead of keeping a separate thing around.
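
The suggestions above can be sketched as follows. This is a minimal standalone illustration, not the code from the patch: `RegMask`, `BlockInfo`, `CFG` and `computeAntAv` are hypothetical names, the tracked registers are assumed to fit in a 32-bit mask, the CFG is assumed to be acyclic with blocks numbered in topological order, and the join is assumed to be an intersection over successors (anticipated) or predecessors (available).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One bit per tracked register (assumption: at most 32 registers).
typedef std::uint32_t RegMask;

// A single struct per block instead of one map per dataflow property.
struct BlockInfo {
  RegMask Uses = 0;  // registers used inside the block
  RegMask AntIn = 0; // anticipated at block entry
  RegMask AvOut = 0; // available at block exit
};

// Hypothetical CFG representation indexed by block number.
struct CFG {
  std::vector<std::vector<unsigned>> Succs;
  std::vector<std::vector<unsigned>> Preds;
};

// Assumes an acyclic CFG whose block numbers are topologically ordered, so a
// single backward pass and a single forward pass are enough.
void computeAntAv(const CFG &G, std::vector<BlockInfo> &Info) {
  const unsigned N = Info.size();
  assert(G.Succs.size() == N && G.Preds.size() == N);

  // Backward pass: ANTOUT is never stored, only folded into AntIn.
  for (unsigned I = N; I-- > 0;) {
    RegMask AntOut = G.Succs[I].empty() ? 0 : ~RegMask(0);
    for (unsigned S : G.Succs[I])
      AntOut &= Info[S].AntIn; // join for all registers at once
    Info[I].AntIn = AntOut | Info[I].Uses;
  }

  // Forward pass: AVIN is never stored, only folded into AvOut.
  for (unsigned I = 0; I < N; ++I) {
    RegMask AvIn = G.Preds[I].empty() ? 0 : ~RegMask(0);
    for (unsigned P : G.Preds[I])
      AvIn &= Info[P].AvOut;
    Info[I].AvOut = AvIn | Info[I].Uses;
  }
}
```

On a diamond-shaped CFG where only one arm uses a register, the register is anticipated at the entry of that arm but not at the entry block, which is the situation where a save placed in the arm is preferable to one in the entry block.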

lib/CodeGen/PrologEpilogInserter.cpp
746	llvm coding style recommends against curly braces to call constructors.
lib/CodeGen/ShrinkWrap2.cpp
69–72 ↗	(On Diff #91809)	At least implement this with a function taking a lambda instead of using preprocessor macros (though a patch to add iterators to BitVector would of course be appreciated as well).
99 ↗	(On Diff #91809)	BitVector should be the better choice over SmallSet for block numbers as the block numbers should be pretty dense.
100–104 ↗	(On Diff #91809)	We use `typedef` over `using` in llvm when possible.
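
A lambda-based helper in place of a preprocessor macro could look like the sketch below. `forEachSetBit` and `markedBlocks` are hypothetical names, and `std::vector<bool>` stands in for `llvm::BitVector` only to keep the example standalone.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Calls Fn(Index) for every set bit, in increasing index order.
// std::vector<bool> is used as a stand-in for a dense bit vector indexed by
// block number.
template <typename Callable>
void forEachSetBit(const std::vector<bool> &Bits, Callable Fn) {
  for (std::size_t I = 0, E = Bits.size(); I != E; ++I)
    if (Bits[I])
      Fn(I);
}

// Example use: gather the numbers of all marked blocks.
std::vector<std::size_t> markedBlocks(const std::vector<bool> &Bits) {
  std::vector<std::size_t> Result;
  forEachSetBit(Bits, [&Result](std::size_t I) { Result.push_back(I); });
  return Result;
}
```

Unlike a macro, the helper is type-checked, scoped, and can be stepped through in a debugger; a dense bit vector also suits block numbers, since those are small consecutive integers.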

I actually think that I should look into that (maybe I should have gathered those numbers from the beginning?).

I guess it depends what is the goal of this implementation. For instance, if you want to have a baseline regarding what ShrinkWrapping can do, but don't actually plan to use this implementation in production, your approach makes sense. It will help to identify what are the cases that we miss right now and give elements to draw a path forward for fixing the existing shrink-wrapping for the important case or coming up with a new approach for shrink-wrapping.
Now, if you approach this as "this is the v2 shrink-wrapping", then yes, I would suggest to gather numbers first :).

Cheers,
-Quentin

Handle irreducible loops.
Process the CFG as a DAG, by merging all the blocks in a SCC into a single block.
Remove loop, use RPO and PO traversal for AV and AN.
Struct of maps -> map of structs.
Remove curly braces for constructors.
Add BitVector iterators (D32060).
using -> typedef.
More tests.
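
The idea of treating the CFG as a DAG by collapsing each SCC can be sketched with Tarjan's algorithm. This is a standalone illustration with hypothetical names (`AdjList`, `SCCFinder`), not the code from this revision; LLVM itself offers `scc_iterator` for this kind of traversal. Tarjan's algorithm completes SCCs in reverse topological order of the condensed graph, so ascending SCC ids give a post-order of the DAG and descending ids give a topological order.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

typedef std::vector<std::vector<unsigned>> AdjList;

// Assigns every block the id of its strongly connected component.
struct SCCFinder {
  const AdjList &Succs;
  std::vector<int> Index, LowLink;
  std::vector<bool> OnStack;
  std::vector<unsigned> Stack;
  std::vector<unsigned> SCCOf; // block number -> SCC id
  int NextIndex = 0;
  unsigned NumSCCs = 0;

  explicit SCCFinder(const AdjList &S)
      : Succs(S), Index(S.size(), -1), LowLink(S.size(), 0),
        OnStack(S.size(), false), SCCOf(S.size(), 0) {
    for (unsigned B = 0; B < S.size(); ++B)
      if (Index[B] < 0)
        visit(B);
  }

  void visit(unsigned B) {
    Index[B] = LowLink[B] = NextIndex++;
    Stack.push_back(B);
    OnStack[B] = true;
    for (unsigned S : Succs[B]) {
      if (Index[S] < 0) {
        visit(S); // tree edge
        LowLink[B] = std::min(LowLink[B], LowLink[S]);
      } else if (OnStack[S]) {
        LowLink[B] = std::min(LowLink[B], Index[S]); // edge into current SCC
      }
    }
    if (LowLink[B] != Index[B])
      return;
    // B is the root of an SCC: pop all of its members off the stack.
    unsigned Member;
    do {
      Member = Stack.back();
      Stack.pop_back();
      OnStack[Member] = false;
      SCCOf[Member] = NumSCCs;
    } while (Member != B);
    ++NumSCCs;
  }
};
```

With the blocks of a loop sharing one SCC id, anticipated information can be propagated over SCC ids in ascending order and available information in descending order, each in a single pass.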

Herald added a subscriber: rengolin. Apr 13 2017, 4:46 PM

thegameg removed a subscriber: rengolin. Apr 13 2017, 4:47 PM

kbarton added a subscriber: kbarton. May 3 2017, 7:23 PM

sfertile added a subscriber: sfertile. May 4 2017, 9:59 AM

Fixed many bugs.
Refactor the pass as a helper object used by PEI.
Improve compile-time.
Add CFI support.
Clean tests.
More and more PEI hacks.

Herald added a subscriber: javed.absar. Jun 13 2017, 4:41 PM

Fix some CFI-related changes that regressed during the refactoring.

It is worth noting that D18046 seems to solve this problem in a more general way, which would definitely help with stack-setup shrink-wrapping, which needs CFA updates between basic blocks.

Also, enable the SP-bump combining on AArch64 and start XFAILing some tests that need to be looked into later.

davide added a subscriber: davide. Jul 28 2017, 11:58 AM

This now needs to update the "Restored" flag from CalleeSavedInfo, as of r310619: Add "Restored" flag to CalleeSavedInfo.

Revision Contents

Path

Size

include/

llvm/

CodeGen/

AsmPrinter.h

23 lines

MachineFrameInfo.h

38 lines

ShrinkWrapper.h

331 lines

Target/

TargetFrameLowering.h

17 lines

lib/

CodeGen/

AsmPrinter/

AsmPrinter.cpp

183 lines

CodeViewDebug.cpp

1 line

CMakeLists.txt

2 lines

PrologEpilogInserter.cpp

369 lines

ShrinkWrapper.cpp

891 lines

TargetPassConfig.cpp

9 lines

Target/

AArch64/

AArch64FrameLowering.h

16 lines

AArch64FrameLowering.cpp

475 lines

AArch64LoadStoreOptimizer.cpp

2 lines

AArch64MachineFunctionInfo.h

32 lines

X86/

X86FrameLowering.h

3 lines

X86FrameLowering.cpp

140 lines

test/

CodeGen/

AArch64/

ShrinkWrapping/

AliasInRegMask.mir

26 lines

CFIStackFrame.mir

28 lines

CSRUsedOnTerminator.mir

52 lines

CompactUnwindingFPSPPair.mir

44 lines

DetermineCalleeSavesSideEffects.mir

37 lines

FirstMBBNum2.ll

18 lines

NoPostPreLoadStore.mir

38 lines

NoStackObjects.mir

53 lines

aarch64-dynamic-stack-layout.ll

2 lines

alloca.ll

3 lines

arm64-aapcs-be.ll

1 line

arm64-abi_align.ll

3 lines

arm64-alloca-frame-pointer-offset.ll

3 lines

arm64-dead-register-def-bug.ll

1 line

arm64-fp128.ll

3 lines

arm64-hello.ll

3 lines

arm64-join-reserved.ll

3 lines

arm64-large-frame.ll

3 lines

X86/

ShrinkWrapping/

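BasicBranch.mir

37 lines

CriticalEdge.mir

42 lines

CriticalEdge2.mir

36 lines

CriticalEdgeLoop.mir

52 lines

InfiniteLoop.mir

36 lines

IrreducibleCFG.mir

81 lines

LoopBasic.mir

61 lines

LoopInCondition.mir

40 lines

LoopNoPreheader.mir

57 lines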

LoopNoPreheaderLatchExit.mir

53 lines

MultipleCriticalEdges.mir

50 lines

NestedLoopsCriticalEdges.mir

64 lines

NoReturnPath.mir

60 lines

Paper1Figure2CriticalEdge.mir

47 lines

Paper2Figure1.mir

56 lines

Paper2Figure2.mir

120 lines

PropagateLoopUses.mir

113 lines

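SCCCriticalEdge.mir

48 lines

SaveBeforeLoop.mir

58 lines

SimpleLoopBranch.mir

40 lines

StackAlignment.mir

37 lines

Tree.mir

57 lines

lit.local.cfg

2 lines

optimize-max-0.mir

47 lines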

Diff 103086

include/llvm/CodeGen/AsmPrinter.h

[16 lines not shown]
 #define LLVM_CODEGEN_ASMPRINTER_H

 #include "llvm/ADT/MapVector.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/ADT/Twine.h"
 #include "llvm/CodeGen/DwarfStringPoolEntry.h"
 #include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/CodeGen/MachineFrameInfo.h"
 #include "llvm/IR/InlineAsm.h"
 #include "llvm/IR/LLVMContext.h"
 #include "llvm/Support/ErrorHandling.h"
 #include "llvm/Support/SourceMgr.h"
 #include <cstdint>
 #include <memory>
 #include <utility>
 #include <vector>
[115 lines not shown] HandlerInfo(AsmPrinterHandler *Handler, const char *TimerName,
 : Handler(Handler), TimerName(TimerName),
 TimerDescription(TimerDescription), TimerGroupName(TimerGroupName),
 TimerGroupDescription(TimerGroupDescription) {}
 };
 /// A vector of all debug/EH info emitters we should use. This vector
 /// maintains ownership of the emitters.
 SmallVector<HandlerInfo, 1> Handlers;

+// FIXME: ShrinkWrap2: Find a way to emit CFI directives compatible with
+// shrink-wrapping. We now emit .cfi_offset and .cfi_restore for saves and
+// restores, we re-process them to see if the final layout needs more work or
+// not based on the block order.

+typedef DenseMap<unsigned, BitVector> CSRMap;

+// FIXME: This shouldn't be here.
+DenseMap<unsigned, unsigned> RegToCSRIdx;

+// FIXME: ShrinkWrap2: Compute CFI save / restore directives based on the
+// final layout.
+CSRMap ExtraSaveCFI;
+CSRMap ExtraRestoreCFI;

+// FIXME: ShrinkWrap2: How does this work with stack shrink-wrapping. Is there
+// a way to "restore" everything?

 public:
 struct SrcMgrDiagInfo {
 SourceMgr SrcMgr;
 std::vector<const MDNode *> LocInfos;
 LLVMContext::InlineAsmDiagHandlerTy DiagHandler;
 void *DiagContext;
 };

[125 lines not shown] public:
 /// This method emits the body and trailer for a function.
 void EmitFunctionBody();

 void emitCFIInstruction(const MachineInstr &MI);

 void emitFrameAlloc(const MachineInstr &MI);

 enum CFIMoveType { CFI_M_None, CFI_M_EH, CFI_M_Debug };
-CFIMoveType needsCFIMoves();
+CFIMoveType needsCFIMoves() const; // FIXME: ShrinkWrap2: Separate commit.

 /// Returns false if needsCFIMoves() == CFI_M_EH for any function
 /// in the module.
 bool needsOnlyDebugCFIMoves() const { return isCFIMoveForDebugging; }

+void generateShrinkWrappingCFI();

 bool needsSEHMoves();

 /// Print to the current output stream assembly representations of the
 /// constants in the constant pool MCP. This is used to print out constants
 /// which have been "spilled to memory" by the code generator.
 ///
 virtual void EmitConstantPool();

[323 lines not shown]

include/llvm/CodeGen/MachineFrameInfo.h

[9 lines not shown]
 // The file defines the MachineFrameInfo class.
 //
 //===----------------------------------------------------------------------===//

 #ifndef LLVM_CODEGEN_MACHINEFRAMEINFO_H
 #define LLVM_CODEGEN_MACHINEFRAMEINFO_H

 #include "llvm/ADT/SmallVector.h"
+// FIXME: ShrinkWrap2: Temporary hack. Remove.
+#include "llvm/CodeGen/RegisterScavenging.h"
+#include "llvm/CodeGen/MachineBasicBlock.h"
 #include "llvm/Support/DataTypes.h"
 #include <cassert>
 #include <vector>

 namespace llvm {
 class raw_ostream;
 class MachineFunction;
-class MachineBasicBlock;
 class BitVector;
 class AllocaInst;

 /// The CalleeSavedInfo class tracks the information need to locate where a
 /// callee saved register is in the current frame.
 class CalleeSavedInfo {
 unsigned Reg;
 int FrameIdx;

 public:
 explicit CalleeSavedInfo(unsigned R, int FI = 0)
 : Reg(R), FrameIdx(FI) {}

 // Accessors.
 unsigned getReg() const { return Reg; }
 int getFrameIdx() const { return FrameIdx; }
 void setFrameIdx(int FI) { FrameIdx = FI; }
 };

+/// Map a set of registers to a basic block. This is a replacement for CSInfo
+/// with extra information about the location of the saves / restores pinned
+/// to a basic block. One register may appear more than once in the map, as
+/// long as it is associated to a different basic block. The CSIs may share
+/// frame indexes for different registers, for different basic blocks.
+/// Similar to CSInfo, the frame indexes in the CalleeSavedInfo struct are
+/// valid ony if CSIValid is true.
+// FIXME: ShrinkWrap2: Make this a DenseMap<unsigned, BitVector>
+typedef DenseMap<MachineBasicBlock *, std::vector<CalleeSavedInfo>>
+CalleeSavedMap;

 /// The MachineFrameInfo class represents an abstract stack frame until
 /// prolog/epilog code is inserted. This class is key to allowing stack frame
 /// representation optimizations, such as frame pointer elimination. It also
 /// allows more mundane (but still important) optimizations, such as reordering
 /// of abstract objects on the stack frame.
 ///
 /// To support this, the class assigns unique integer identifiers to stack
 /// objects requested clients. These identifiers are negative integers for
[208 lines not shown] class MachineFrameInfo {
 /// True if this is a varargs function that contains a musttail call.
 bool HasMustTailInVarArgFunc = false;

 /// True if this function contains a tail call. If so immutable objects like
 /// function arguments are no longer so. A tail call can override fixed
 /// stack objects like arguments so we can't treat them as immutable.
 bool HasTailCall = false;

+// FIXME: ShrinkWrap2: Deprecate.
 /// Not null, if shrink-wrapping found a better place for the prologue.
 MachineBasicBlock *Save = nullptr;
 /// Not null, if shrink-wrapping found a better place for the epilogue.
 MachineBasicBlock *Restore = nullptr;

+private:
+/// Should the PrologEpilogInserter and the various target hooks use the
+/// information gathered from shrink-wrapping?
+// FIXME: ShrinkWrap2: Fix name.
+// FIXME: ShrinkWrap2: Merge shrink-wrapped / non-shrink-wrapped paths.
+bool ShouldUseShrinkWrap2 = false;

 public:
+// FIXME: ShrinkWrap2: Temporary hack. Remove.
+RegScavenger *RS;
 explicit MachineFrameInfo(unsigned StackAlignment, bool StackRealignable,
 bool ForcedRealign)
 : StackAlignment(StackAlignment), StackRealignable(StackRealignable),
 ForcedRealign(ForcedRealign) {}

 /// Return true if there are any stack objects in this function.
 bool hasStackObjects() const { return !Objects.empty(); }

[370 lines not shown] void setCalleeSavedInfo(const std::vector<CalleeSavedInfo> &CSI) {
 CSInfo = CSI;
 }

 /// Has the callee saved info been calculated yet?
 bool isCalleeSavedInfoValid() const { return CSIValid; }

 void setCalleeSavedInfoValid(bool v) { CSIValid = v; }

+// FIXME: ShrinkWrap2: Merge with multiple points.
 MachineBasicBlock *getSavePoint() const { return Save; }
 void setSavePoint(MachineBasicBlock *NewSave) { Save = NewSave; }
 MachineBasicBlock *getRestorePoint() const { return Restore; }
 void setRestorePoint(MachineBasicBlock *NewRestore) { Restore = NewRestore; }

+// FIXME: ShrinkWrap2: Is this the right place for this? This should be
+// somewhere in PEI or TargetFrameLowering, since they are the only ones using
+// it.
+// FIXME: ShrinkWrap2: This gets really messy and we should merge all the
+// behaviour for both shrink-wrapping passes and with it disabled.
+// FIXME: ShrinkWrap2: Name.
+// FIXME: ShrinkWrap2: Merge shrink-wrapped / non-shrink-wrapped paths.
+bool getShouldUseShrinkWrap2() const { return ShouldUseShrinkWrap2; }
+// FIXME: ShrinkWrap2: Name.
+// FIXME: ShrinkWrap2: Merge shrink-wrapped / non-shrink-wrapped paths.
+void setShouldUseShrinkWrap2(bool New) { ShouldUseShrinkWrap2 = New; }

 /// Return a set of physical registers that are pristine.
 ///
 /// Pristine registers hold a value that is useless to the current function,
 /// but that must be preserved - they are callee saved registers that are not
 /// saved.
 ///
 /// Before the PrologueEpilogueInserter has placed the CSR spill code, this
 /// method always returns an empty set.
[13 lines not shown]

include/llvm/CodeGen/ShrinkWrapper.h

This file was added.

				//===- llvm/CodeGen/ShrinkWrapper.h - Shrink Wrapping Utility ---- C++ --===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				// This class is the main utility to provide shrink-wrapping properties to any
				// kind of attributes. This is used to do callee-saved registers and stack
				// shrink-wrapping. The algorithm is based on "Minimizing Register Usage Penalty
				// at Procedure Calls - Fred C. Chow" [1], with the usage of SCCs to exclude
				// loops and provide a linear pass instead of a complete dataflow analysis.
				// FIXME: ShrinkWrap2: Random thoughts:
				// - r193749 removed an old pass that was an implementation of [1].
				// - Cost model: use MachineBlockFrequency and some instruction cost model?
				// - Split critical edges on demand?
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_CODEGEN_SHRINKWRAP_H
				#define LLVM_CODEGEN_SHRINKWRAP_H

				#include "llvm/ADT/BitVector.h"
				#include "llvm/ADT/DenseMap.h"
				#include "llvm/ADT/SetVector.h"
				#include "llvm/ADT/SmallVector.h"
				#include "llvm/ADT/iterator_range.h"
				#include "llvm/ADT/PostOrderIterator.h"

				#include "llvm/Target/TargetFrameLowering.h"
				#include "llvm/Target/TargetRegisterInfo.h"
				#include "llvm/Target/TargetSubtargetInfo.h"

				#include "llvm/CodeGen/MachineBasicBlock.h"
				#include "llvm/CodeGen/MachineFunction.h"

				namespace llvm {

				class MachineBlockFrequencyInfo;
				class MachineOptimizationRemarkEmitter;

				/// Information about the requirements on shrink-wrapping. This should describe
				/// what "used" means, and it should be the main interface to work with the
				/// targets and other shrink-wrappable inputs.
				class ShrinkWrapInfo {
				protected:
				/// Track all the uses per basic block.
				SmallVector<BitVector, 8> Uses;

				/// The machine function we're shrink-wrapping.
				const MachineFunction &MF;

				/// Generic code to determine callee saved register uses. This checks for
				/// regmasks, and tracks all the register units.
				/// If there is a use on a terminator, the successors will also be marked as
				/// used.
				// FIXME: ShrinkWrap2: Make this a free-function outside shrink-wrapping.
				void determineCSRUses();

				public:
				ShrinkWrapInfo(const MachineFunction &MF)
				: Uses(MF.getNumBlockIDs()), MF(MF) {}
				/// Get the number of results we want per block. i.e. number of registers in
				/// the target.
				virtual unsigned getNumResultBits() const { return 0; }

				/// Get the elements that are used for a particular basic block. The result is
				/// `nullptr` if there are no uses.
				virtual const BitVector *getUses(unsigned MBBNum) const;

				/// Provide a way to print elements. Debug only.
				// FIXME: ShrinkWrap2: Add DUMP macros.
				virtual raw_ostream &printElt(unsigned Elt, raw_ostream &OS) const {
				OS << Elt;
				return OS;
				};

				virtual ~ShrinkWrapInfo() = default;
				};

				/// Iterator for successors / predecessors. This is here to work with
				/// SmallVector and std::vector at the same time.
				// FIXME: ShrinkWrap2: Use ArrayRef?
				typedef const MachineBasicBlock *const *MBBIterator;

				class ShrinkWrapper {
				typedef BitVector MBBSet;
				/// Result type used to store results / uses. The target decides the meaning
				/// of the bits.
				typedef BitVector TargetResultSet;
				// Idx = MBB.getNumber()
				typedef SmallVector<TargetResultSet, 8> BBResultSetMap;
				typedef DenseMap<unsigned, TargetResultSet> SparseBBResultSetMap;

				/// The shrink-wrapping analysis is based on two properties:
				/// * Anticipation:
				/// The use of a register is anticipated at a given point if a use of the
				/// register will be encountered in all possible execution paths leading from
				/// that point.

				/// * Availability:
				/// The use of a register is available at a given point if a use of the
				/// register has been encountered in all possible execution paths that lead to
				/// that point.

				/// Both attributes are propagated at the beginning and at the end of a block
				/// (which could be an SCC, or a basic block).
				// FIXME: ShrinkWrap2: Remove OUT/IN.
				struct SWAttributes {
				/// Is the element anticipated at the beginning of this block?
				TargetResultSet ANTIN;
				/// Is the element available at the end of this block?
				TargetResultSet AVOUT;

				/// Resize all the sets.
				SWAttributes(const ShrinkWrapInfo &SWI) {
				unsigned Max = SWI.getNumResultBits();
				for (TargetResultSet *BV : {&ANTIN, &AVOUT})
				(*BV).resize(Max);
				}
				};

				// Idx = MBB.getNumber()
				typedef SmallVector<SWAttributes, 4> AttributeMap;

				/// An SCC that was discovered through the scc_iterator on the function.
				/// This is used in order to detect loops, reducible AND irreducible.
				struct SCCLoop {
				typedef SmallVector<const MachineBasicBlock *, 4> MBBVector;
				/// The successors of the SCC. These are blocks outside the SCC.
				SetVector<const MachineBasicBlock *, MBBVector> Successors;
				iterator_range<MBBIterator> successors() const {
				return {&*Successors.begin(), &*Successors.end()};
				}
				/// The predecessors of the SCC. These are blocks outside the SCC.
				SetVector<const MachineBasicBlock *, MBBVector> Predecessors;
				iterator_range<MBBIterator> predecessors() const {
				return {&*Predecessors.begin(), &*Predecessors.end()};
				}
				/// This number is the number of the first MBB in the SCC.
				unsigned Number;
				unsigned getNumber() const { return Number; }
				/// The number of blocks the SCC contains.
				unsigned Size;
				unsigned getSize() const { return Size; }
				};

				/// Wrapper around scc_iterator that collects SCCs that are loops, computes
				/// their successor / predecessor and assigns a unique number based on the
				/// basic blocks it contains.
				struct SCCLoopInfo {
				/// Own the SCCs.
				SmallVector<SCCLoop, 4> SCCs;
				/// Map a basic block number to an SCCLoop number. The SCCLoop number is
				/// the position in the `SCCs` vector, and it is different from the
				/// SCCLoop::Number attribute, which is the first basic block's number in
				/// the SCC.
				DenseMap<unsigned, unsigned> MBBToSCC;

				/// Initialize the successors / predecessors of the SCCLoops.
				SCCLoopInfo(const MachineFunction &MF);
				/// Get the SCCLoop for a designated basic block number. If there is no
				/// SCCLoop associated, return `nullptr`.
				SCCLoop *getSCCLoopFor(unsigned MBBNum) {
				auto It = MBBToSCC.find(MBBNum);
				if (It == MBBToSCC.end())
				return nullptr;
				return &SCCs[It->second];
				}
				const SCCLoop *getSCCLoopFor(unsigned MBBNum) const {
				return const_cast<SCCLoopInfo *>(this)->getSCCLoopFor(MBBNum);
				}
				};

				/// The MachineFunction we're working on.
				const MachineFunction &MF;

				/// Target-found uses.
				// FIXME: ShrinkWrap2: Use the one from ShrinkWrapInfo, but detecting critical
				// edges may need to modify it.
				BBResultSetMap Uses;

				// FIXME: ShrinkWrap2: Is this the correct place to compute this?
				/// Blocks that never return.
				MBBSet NoReturnBlocks;

				/// Target-specific shrink-wrap information.
				std::unique_ptr<ShrinkWrapInfo> SWI;

				/// The replacement for the MachineLoopInfo, that handles irreducible loops
				/// as well.
				SCCLoopInfo SI;

				/// Final results.
				SparseBBResultSetMap Saves;
				SparseBBResultSetMap Restores;

				/// Number of times the attributes have been recomputed because of critical
				/// edges.
				unsigned AttributesRecomputed = 0;

				/// All the elements encountered so far.
				TargetResultSet AllElts;

				/// The CFG we're working on is no longer composed of basic blocks. It's
				/// basically the CFG of SCCs, and we're using numbers to identify nodes. A
				/// simple basic block's number is MBB->getNumber(), and a SCC that is a
				/// loop gets the number of the first basic block encountered. For that,
				/// we're using the following functions to traverse our CFG.

				/// Get the block number or the SCCLoop's number.
				unsigned blockNumber(unsigned MBBNum) const;
				/// Get the block successors or the SCCLoop exit blocks.
				iterator_range<MBBIterator> blockSuccessors(unsigned MBBNum) const;
				/// Get the block predecessors or the SCCLoop's predecessors.
				iterator_range<MBBIterator> blockPredecessors(unsigned MBBNum) const;

				/// Anticipability
				// If there is a use of this on all the paths starting from
				// this basic block, the element is anticipated at the end of this
				// block.
				// (propagate the IN attribute of successors to possibly merge saves)
				//           -
				//          | false if no successor.
				// ANTOUT = |
				//          | && ANTIN(succ[i]) otherwise.
				//           -
				bool ANTOUT(const AttributeMap &Attrs, unsigned MBBNum, unsigned Elt) const {
				auto Successors = blockSuccessors(MBBNum);
				if (Successors.begin() == Successors.end())
				return false;
				return all_of(Successors, [&](const MachineBasicBlock *S) {
				return Attrs[blockNumber(S->getNumber())].ANTIN.test(Elt);
				});
				}

				/// Availability
				// If there is a use of this on all the paths arriving in this block,
				// then the element is available in this block (propagate the out attribute
				// of predecessors to possibly merge restores).
				//         -
				//        | false if no predecessor.
				// AVIN = |
				//        | && AVOUT(pred[i]) otherwise.
				//         -
				bool AVIN(const AttributeMap &Attrs, unsigned MBBNum, unsigned Elt) const {
				auto Predecessors = blockPredecessors(MBBNum);
				if (Predecessors.begin() == Predecessors.end())
				return false;
				return all_of(Predecessors, [&](const MachineBasicBlock *P) {
				return Attrs[blockNumber(P->getNumber())].AVOUT.test(Elt);
				});
				}

				/// Determine uses based on ShrinkWrapInfo.
				// FIXME: ShrinkWrap2: Remove. Call SWI directly.
				void determineUses();
				/// Remove uses and fill NoReturnBlocks with the blocks that we know are not
				/// going to return from the function.
				/// FIXME: ShrinkWrap2: Is this the correct place to compute this?
				void removeUsesOnNoReturnPaths();
				void dumpUses() const;
				/// Mark all the basic blocks / SCCs around a loop (pred, succ) as used,
				/// if there is a use of a CSR inside a loop. We want to avoid any save /
				/// restore operations inside a loop.
				void markUsesOutsideLoops();

				/// Compute the attributes for one element.
				// FIXME: ShrinkWrap2: Don't do this per element.
				void computeAttributes(
				unsigned Elt, AttributeMap &Attrs,
				ReversePostOrderTraversal<const MachineFunction *> &RPOT) const;
				/// Save the results for this particular element.
				// FIXME: ShrinkWrap2: Don't do this per element.
				void gatherAttributesResults(unsigned Elt, AttributeMap &Attrs);
				/// Check for critical edges and mark new blocks as needed.
				// FIXME: ShrinkWrap2: Don't do this per element.
				bool hasCriticalEdges(unsigned Elt, AttributeMap &Attrs);
				/// Dump the contents of the attributes.
				// FIXME: ShrinkWrap2: Don't do this per element.
				void dumpAttributes(unsigned Elt, const AttributeMap &Attrs) const;

				/// * Verify if the results are better than obvious results, like:
				/// * CSR used in a single MBB: only one save and one restore.
				/// * Remove empty entries from the Saves / Restores maps.
				// FIXME: ShrinkWrap2: This shouldn't happen, we better fix the algorithm
				// first.
				void postProcessResults(const BBResultSetMap &OldUses);
				/// Compute the shrink-wrapping cost, which is based on block frequency.
				unsigned computeShrinkWrappingCost(MachineBlockFrequencyInfo *MBFI) const;
				/// Compute the same cost, in entry / return blocks, which is based on block
				/// frequency.
				unsigned computeDefaultCost(MachineBlockFrequencyInfo *MBFI) const;
				/// Verify save / restore points by walking the CFG.
				/// This asserts if anything went wrong.
				// FIXME: ShrinkWrap2: Should this be guarded by a macro?
				void verifySavesRestores() const;

				/// Dump the final shrink-wrapping results.
				void dumpResults() const;

				public:
				/// Run the shrink-wrapper on the function. If there are no uses, there will
				/// be no saves / restores.
				/// By default, run the shrink-wrapper with the target's CSRShrinkWrapInfo.
				ShrinkWrapper(const MachineFunction &MF);
				/// Run the shrink-wrapper with a custom ShrinkWrapInfo.
				ShrinkWrapper(const MachineFunction &MF, std::unique_ptr<ShrinkWrapInfo> SWI);

				/// Check if the function has any uses that can be shrink-wrapped.
				bool hasUses() const { return !Uses.empty(); }

				/// Get the target's shrink-wrap info.
				ShrinkWrapInfo &getSWI() { return *SWI; };
				const ShrinkWrapInfo &getSWI() const { return *SWI; };

				/// Get the final results.
				const SparseBBResultSetMap &getSaves() { return Saves; }
				const SparseBBResultSetMap &getRestores() { return Restores; }

				/// Emit optimization remarks for the whole function.
				void emitRemarks(MachineOptimizationRemarkEmitter *ORE,
				MachineBlockFrequencyInfo *MBFI) const;

				/// Check that the final results are better than the default behaviour.
				bool areResultsInteresting(MachineBlockFrequencyInfo *MBFI) const;
				};

				} // end namespace llvm

				#endif // LLVM_CODEGEN_SHRINKWRAP_H
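
To make the ANTOUT / AVIN definitions in ShrinkWrapper.h concrete, here is a minimal standalone sketch on a four-block diamond CFG (0 branches to 1 and 2, both join in 3). It does not use the LLVM classes: ToyBlock, ToyAttributes, makeDiamond and computeToyAttributes are hypothetical names, and only a single register is tracked. The transfer functions ANTIN = Used || ANTOUT and AVOUT = Used || AVIN are assumed from the usual formulation of Chow's analysis; they are not copied from the patch. The blocks are assumed to be numbered in topological order and the CFG is acyclic, so one backward and one forward sweep suffice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy CFG node: successors / predecessors by block index, plus a flag that
// says whether the single tracked register is used in this block.
struct ToyBlock {
  std::vector<std::size_t> Succs;
  std::vector<std::size_t> Preds;
  bool Used;
};

struct ToyAttributes {
  std::vector<bool> AntIn; // anticipated at the beginning of the block
  std::vector<bool> AvOut; // available at the end of the block
};

// ANTOUT: false if there is no successor, else AND of ANTIN over successors.
bool antOut(const std::vector<ToyBlock> &CFG, const std::vector<bool> &AntIn,
            std::size_t B) {
  assert(B < CFG.size() && "block index out of range");
  const std::vector<std::size_t> &S = CFG[B].Succs;
  if (S.empty())
    return false;
  return std::all_of(S.begin(), S.end(),
                     [&](std::size_t X) { return bool(AntIn[X]); });
}

// AVIN: false if there is no predecessor, else AND of AVOUT over predecessors.
bool avIn(const std::vector<ToyBlock> &CFG, const std::vector<bool> &AvOut,
          std::size_t B) {
  assert(B < CFG.size() && "block index out of range");
  const std::vector<std::size_t> &P = CFG[B].Preds;
  if (P.empty())
    return false;
  return std::all_of(P.begin(), P.end(),
                     [&](std::size_t X) { return bool(AvOut[X]); });
}

// Assumed transfer functions: ANTIN = Used || ANTOUT, AVOUT = Used || AVIN.
ToyAttributes computeToyAttributes(const std::vector<ToyBlock> &CFG) {
  ToyAttributes A{std::vector<bool>(CFG.size()),
                  std::vector<bool>(CFG.size())};
  // Backward sweep (successors are visited before their predecessors).
  for (std::size_t I = CFG.size(); I-- > 0;)
    A.AntIn[I] = CFG[I].Used || antOut(CFG, A.AntIn, I);
  // Forward sweep (predecessors are visited before their successors).
  for (std::size_t I = 0; I < CFG.size(); ++I)
    A.AvOut[I] = CFG[I].Used || avIn(CFG, A.AvOut, I);
  return A;
}

// Diamond: 0 -> {1, 2} -> 3, with optional uses on the two arms.
std::vector<ToyBlock> makeDiamond(bool UseLeft, bool UseRight) {
  return {
      {{1, 2}, {}, false},
      {{3}, {0}, UseLeft},
      {{3}, {0}, UseRight},
      {{}, {1, 2}, false},
  };
}
```

With a use only in block 1, only block 1 ends up anticipated and available, so nothing is hoisted to the entry or sunk to the exit. With uses on both arms, the use becomes anticipated at the entry (block 0) and available at the exit (block 3), which is the case where saves and restores can be merged.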

include/llvm/Target/TargetFrameLowering.h

Show All 9 Lines
// Interface to describe the layout of a stack frame on the target machine.		// Interface to describe the layout of a stack frame on the target machine.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_TARGET_TARGETFRAMELOWERING_H		#ifndef LLVM_TARGET_TARGETFRAMELOWERING_H
#define LLVM_TARGET_TARGETFRAMELOWERING_H		#define LLVM_TARGET_TARGETFRAMELOWERING_H

#include "llvm/CodeGen/MachineBasicBlock.h"		#include "llvm/CodeGen/MachineBasicBlock.h"
		#include "llvm/CodeGen/MachineFrameInfo.h"
		#include "llvm/CodeGen/ShrinkWrapper.h"
#include <utility>		#include <utility>
#include <vector>		#include <vector>

namespace llvm {		namespace llvm {
class BitVector;		class BitVector;
class CalleeSavedInfo;		class CalleeSavedInfo;
class MachineFunction;		class MachineFunction;
class RegScavenger;		class RegScavenger;
		class ShrinkWrapInfo;

/// Information about stack frame layout on the target. It holds the direction		/// Information about stack frame layout on the target. It holds the direction
/// of stack growth, the known stack alignment on entry to each function, and		/// of stack growth, the known stack alignment on entry to each function, and
/// the offset to the locals area.		/// the offset to the locals area.
///		///
/// The offset to the local area is the offset from the stack pointer on		/// The offset to the local area is the offset from the stack pointer on
/// function entry to the first location where function data (local variables,		/// function entry to the first location where function data (local variables,
/// spill locations) can be stored.		/// spill locations) can be stored.
▲ Show 20 Lines • Show All 287 Lines • ▼ Show 20 Lines	public:
/// \p MBB will be correctly handled by the target.		/// \p MBB will be correctly handled by the target.
/// As soon as the target enables shrink-wrapping without overriding		/// As soon as the target enables shrink-wrapping without overriding
/// this method, we assume that each basic block is a valid		/// this method, we assume that each basic block is a valid
/// epilogue.		/// epilogue.
virtual bool canUseAsEpilogue(const MachineBasicBlock &MBB) const {		virtual bool canUseAsEpilogue(const MachineBasicBlock &MBB) const {
return true;		return true;
}		}

		// FIXME: ShrinkWrap2: Yet another target hook to be removed later. See
		// comment in PrologEpilogInserter.cpp:579
		virtual void
		processValidCalleeSavedInfo(MachineFunction &MF,
		const TargetRegisterInfo *TRI,
		std::vector<CalleeSavedInfo> &CSI) const {}

/// Check if given function is safe for not having callee saved registers.		/// Check if given function is safe for not having callee saved registers.
/// This is used when interprocedural register allocation is enabled.		/// This is used when interprocedural register allocation is enabled.
static bool isSafeForNoCSROpt(const Function *F) {		static bool isSafeForNoCSROpt(const Function *F) {
if (!F->hasLocalLinkage() || F->hasAddressTaken() ||		if (!F->hasLocalLinkage() || F->hasAddressTaken() ||
!F->hasFnAttribute(Attribute::NoRecurse))		!F->hasFnAttribute(Attribute::NoRecurse))
return false;		return false;
// Function should not be optimized as tail call.		// Function should not be optimized as tail call.
for (const User *U : F->users())		for (const User *U : F->users())
if (auto CS = ImmutableCallSite(U))		if (auto CS = ImmutableCallSite(U))
if (CS.isTailCall())		if (CS.isTailCall())
return false;		return false;
return true;		return true;
}		}

		/// Provide all the target-hooks needed for shrink-wrapping.
		virtual std::unique_ptr<ShrinkWrapInfo>
		createCSRShrinkWrapInfo(const MachineFunction &MF) const {
		llvm_unreachable("Target didn't implement a ShrinkWrapInfo subclass!");
		return nullptr;
		}
};		};

} // End llvm namespace		} // End llvm namespace

#endif		#endif
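
The createCSRShrinkWrapInfo hook above hands the shrink-wrapper an owning pointer to a target-specific ShrinkWrapInfo subclass. The following is a rough standalone sketch of that factory pattern, not the patch's code: ToyShrinkWrapInfo, ToyCSRInfo, addUse and createToyCSRInfo are hypothetical stand-ins, std::vector<bool> replaces BitVector, and the "nullptr when there are no uses" behaviour is assumed from the doc comment on getUses.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Stand-in for the ShrinkWrapInfo base class: per-block use sets plus the
// virtual queries the shrink-wrapper would call.
class ToyShrinkWrapInfo {
protected:
  // Indexed by block number; an empty set means "no uses in this block".
  std::vector<std::vector<bool>> Uses;

public:
  explicit ToyShrinkWrapInfo(std::size_t NumBlocks) : Uses(NumBlocks) {}
  virtual ~ToyShrinkWrapInfo() = default;

  // Number of result bits per block (e.g. number of tracked registers).
  virtual unsigned getNumResultBits() const { return 0; }

  // Assumed behaviour: nullptr if the block has no recorded uses.
  virtual const std::vector<bool> *getUses(std::size_t BlockNum) const {
    const std::vector<bool> &U = Uses[BlockNum];
    return U.empty() ? nullptr : &U;
  }
};

// Hypothetical target implementation tracking a fixed number of CSRs.
class ToyCSRInfo : public ToyShrinkWrapInfo {
  unsigned NumCSRs;

public:
  ToyCSRInfo(std::size_t NumBlocks, unsigned NumCSRs)
      : ToyShrinkWrapInfo(NumBlocks), NumCSRs(NumCSRs) {}

  unsigned getNumResultBits() const override { return NumCSRs; }

  // Record that CSR number CSRIdx is used in block BlockNum.
  void addUse(std::size_t BlockNum, unsigned CSRIdx) {
    assert(CSRIdx < NumCSRs && "CSR index out of range");
    std::vector<bool> &U = Uses[BlockNum];
    U.resize(NumCSRs);
    U[CSRIdx] = true;
  }
};

// Factory in the spirit of createCSRShrinkWrapInfo: the caller only sees the
// base interface and owns the result.
std::unique_ptr<ToyShrinkWrapInfo> createToyCSRInfo() {
  auto Info = std::make_unique<ToyCSRInfo>(/*NumBlocks=*/2, /*NumCSRs=*/3);
  Info->addUse(/*BlockNum=*/1, /*CSRIdx=*/2);
  return Info;
}
```

Returning the base type keeps the analysis independent of the target, which seems to be the intent of making ShrinkWrapInfo "the main interface to work with the targets".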

lib/CodeGen/AsmPrinter/AsmPrinter.cpp

Show All 14 Lines
#include "AsmPrinterHandler.h"		#include "AsmPrinterHandler.h"
#include "CodeViewDebug.h"		#include "CodeViewDebug.h"
#include "DwarfDebug.h"		#include "DwarfDebug.h"
#include "DwarfException.h"		#include "DwarfException.h"
#include "WinException.h"		#include "WinException.h"
#include "llvm/ADT/APFloat.h"		#include "llvm/ADT/APFloat.h"
#include "llvm/ADT/APInt.h"		#include "llvm/ADT/APInt.h"
#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
		#include "llvm/ADT/PostOrderIterator.h"
#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallString.h"		#include "llvm/ADT/SmallString.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/ADT/Triple.h"		#include "llvm/ADT/Triple.h"
#include "llvm/ADT/Twine.h"		#include "llvm/ADT/Twine.h"
▲ Show 20 Lines • Show All 863 Lines • ▼ Show 20 Lines	static bool emitDebugValueComment(const MachineInstr *MI, AsmPrinter &AP) {
if (MemLoc)		if (MemLoc)
OS << '+' << Offset << ']';		OS << '+' << Offset << ']';

// NOTE: Want this comment at start of line, don't emit with AddComment.		// NOTE: Want this comment at start of line, don't emit with AddComment.
AP.OutStreamer->emitRawComment(OS.str());		AP.OutStreamer->emitRawComment(OS.str());
return true;		return true;
}		}

AsmPrinter::CFIMoveType AsmPrinter::needsCFIMoves() {		AsmPrinter::CFIMoveType AsmPrinter::needsCFIMoves() const {
if (MAI->getExceptionHandlingType() == ExceptionHandling::DwarfCFI &&		ExceptionHandling ExceptionHandlingType = MAI->getExceptionHandlingType();
		if (ExceptionHandlingType != ExceptionHandling::DwarfCFI &&
		ExceptionHandlingType != ExceptionHandling::ARM)
		return CFI_M_None;

		if (ExceptionHandlingType == ExceptionHandling::DwarfCFI &&
MF->getFunction()->needsUnwindTableEntry())		MF->getFunction()->needsUnwindTableEntry())
return CFI_M_EH;		return CFI_M_EH;

if (MMI->hasDebugInfo())		if (MMI->hasDebugInfo())
return CFI_M_Debug;		return CFI_M_Debug;

return CFI_M_None;		return CFI_M_None;
}		}

		void AsmPrinter::generateShrinkWrappingCFI() {
		// Reset everything.
		ExtraSaveCFI.clear();
		ExtraRestoreCFI.clear();

		// FIXME: ShrinkWrap2: Gather all the saving points (based on CFI).
		CSRMap Saves;
		// FIXME: ShrinkWrap2: Gather all the restoring points (based on CFI).
		CSRMap Restores;

		const MCRegisterInfo *MCRI = MF->getMMI().getContext().getRegisterInfo();
		const MachineRegisterInfo &MRI = MF->getRegInfo();

		// Collect all the CSRs and their index.
		const MCPhysReg *CSRegs = MRI.getCalleeSavedRegs();
		for (unsigned i = 0; CSRegs[i]; ++i) {
		unsigned DwarfReg = MCRI->getDwarfRegNum(CSRegs[i], true);
		unsigned Reg = MCRI->getLLVMRegNum(DwarfReg, false);
		RegToCSRIdx[Reg] = i;
		}

		// First pass, collect .cfi_offset and .cfi_restore directives:
		// * .cfi_offset represents a csr save
		// * .cfi_restore represents a csr restore
		for (const MachineBasicBlock &MBB : *MF) {
		for (const MachineInstr &MI : MBB) {
		if (!MI.isCFIInstruction())
		continue;
		const std::vector<MCCFIInstruction> &Instrs = MF->getFrameInstructions();
		unsigned CFIIndex = MI.getOperand(0).getCFIIndex();
		const MCCFIInstruction &CFI = Instrs[CFIIndex];

		// Check if it's a save.
		if (CFI.getOperation() == MCCFIInstruction::OpOffset) {
		unsigned DwarfReg = CFI.getRegister();
		unsigned Reg = MCRI->getLLVMRegNum(DwarfReg, false);
		if (RegToCSRIdx.count(Reg)) {
		BitVector &Save = Saves[MBB.getNumber()];
		Save.resize(RegToCSRIdx.size());
		Save.set(RegToCSRIdx[Reg]);
		}
		}

		// Check if it's a restore.
		if (CFI.getOperation() == MCCFIInstruction::OpRestore) {
		unsigned DwarfReg = CFI.getRegister();
		unsigned Reg = MCRI->getLLVMRegNum(DwarfReg, false);
		if (RegToCSRIdx.count(Reg)) {
		BitVector &Restore = Restores[MBB.getNumber()];
		Restore.resize(RegToCSRIdx.size());
		Restore.set(RegToCSRIdx[Reg]);
		}
		}
		}
		}

		// Compute the "liveness" of the CSRs. A CSR is live if it has been saved,
		// and killed if it has been restored.
		SmallVector<BitVector, 8> LiveCSRs{MF->getNumBlockIDs()};
		for (BitVector &BV : LiveCSRs)
		BV.resize(RegToCSRIdx.size());

		ReversePostOrderTraversal<const MachineFunction *> RPOT(MF);
		for (const MachineBasicBlock *MBB : RPOT) {
		BitVector &LiveHere = LiveCSRs[MBB->getNumber()];
		// LIVE(MBB) += LIVE(EACH_PRED) - RESTORE(EACH_PRED) + SAVE(MBB)
		// Propagate the liveness information.
		for (const MachineBasicBlock *Pred : MBB->predecessors())
		LiveHere |= LiveCSRs[Pred->getNumber()];
		// If any of the predecessors restored any CSR, kill them.
		for (const MachineBasicBlock *Pred : MBB->predecessors()) {
		auto Found = Restores.find(Pred->getNumber());
		if (Found == Restores.end())
		continue;
		BitVector &Killed = Found->second;
		LiveHere.flip();
		LiveHere |= Killed;
		LiveHere.flip();
		}
		// If this block saved any CSRs, make them live.
		auto Found = Saves.find(MBB->getNumber());
		if (Found == Saves.end())
		continue;
		BitVector &Saved = Found->second;
		LiveHere |= Saved;
		}

		// Now compute the state changes we need in between the blocks.
		BitVector LastState(RegToCSRIdx.size());
		for (const MachineBasicBlock &MBB : *MF) {
		BitVector &LiveHere = LiveCSRs[MBB.getNumber()];
		if (&MBB != &MF->front()) {
		auto Prev = std::prev(MBB.getIterator());
		auto Found = Restores.find(Prev->getNumber());
		if (Found != Restores.end() && !Found->second.empty()) {
		BitVector &Killed = Found->second;
		LastState.flip();
		LastState |= Killed;
		LastState.flip();
		}
		}

		// Save everything that is added in the current state and was not there in
		// the last one (and the saves that are already here).
		BitVector ToSave = LastState;
		ToSave |= Saves[MBB.getNumber()];
		ToSave.flip();
		ToSave &= LiveHere;
		if (ToSave.count())
		ExtraSaveCFI[MBB.getNumber()] = std::move(ToSave);

		// Restore everything that is not in the current state anymore but it was
		// in the last one.
		BitVector ToRestore = LastState;
		ToRestore.flip();
		ToRestore |= LiveHere;
		ToRestore.flip();
		if (ToRestore.count())
		ExtraRestoreCFI[MBB.getNumber()] = std::move(ToRestore);

		LastState = LiveHere;
		}
		}

bool AsmPrinter::needsSEHMoves() {		bool AsmPrinter::needsSEHMoves() {
return MAI->usesWindowsCFI() && MF->getFunction()->needsUnwindTableEntry();		return MAI->usesWindowsCFI() && MF->getFunction()->needsUnwindTableEntry();
}		}

void AsmPrinter::emitCFIInstruction(const MachineInstr &MI) {		void AsmPrinter::emitCFIInstruction(const MachineInstr &MI) {
ExceptionHandling ExceptionHandlingType = MAI->getExceptionHandlingType();
if (ExceptionHandlingType != ExceptionHandling::DwarfCFI &&
ExceptionHandlingType != ExceptionHandling::ARM)
return;

if (needsCFIMoves() == CFI_M_None)		if (needsCFIMoves() == CFI_M_None)
return;		return;

// If there is no "real" instruction following this CFI instruction, skip		// If there is no "real" instruction following this CFI instruction, skip
// emitting it; it would be beyond the end of the function's FDE range.		// emitting it; it would be beyond the end of the function's FDE range.
auto *MBB = MI.getParent();		auto *MBB = MI.getParent();
auto I = std::next(MI.getIterator());		auto I = std::next(MI.getIterator());
while (I != MBB->end() && I->isTransient())		while (I != MBB->end() && I->isTransient())
▲ Show 20 Lines • Show All 493 Lines • ▼ Show 20 Lines	void AsmPrinter::SetupMachineFunction(MachineFunction &MF) {
ORE = &getAnalysis<MachineOptimizationRemarkEmitterPass>().getORE();		ORE = &getAnalysis<MachineOptimizationRemarkEmitterPass>().getORE();
if (isVerbose())		if (isVerbose())
LI = &getAnalysis<MachineLoopInfo>();		LI = &getAnalysis<MachineLoopInfo>();

const TargetSubtargetInfo &STI = MF.getSubtarget();		const TargetSubtargetInfo &STI = MF.getSubtarget();
EnablePrintSchedInfo = PrintSchedule.getNumOccurrences()		EnablePrintSchedInfo = PrintSchedule.getNumOccurrences()
? PrintSchedule		? PrintSchedule
: STI.supportPrintSchedInfo();		: STI.supportPrintSchedInfo();

		if (needsCFIMoves() == CFI_M_None)
		return;

		// FIXME: ShrinkWrap2: Compute the blocks that need CFI state switching.
		const MachineFrameInfo &MFI = MF.getFrameInfo();
		if (MFI.getShouldUseShrinkWrap2())
		generateShrinkWrappingCFI();
}		}

namespace {		namespace {

// Keep track the alignment, constpool entries per Section.		// Keep track the alignment, constpool entries per Section.
struct SectionCPs {		struct SectionCPs {
MCSection *S;		MCSection *S;
unsigned Alignment;		unsigned Alignment;
▲ Show 20 Lines • Show All 1,214 Lines • ▼ Show 20 Lines	if (MBB.pred_empty() ||
(isBlockOnlyReachableByFallthrough(&MBB) && !MBB.isEHFuncletEntry())) {		(isBlockOnlyReachableByFallthrough(&MBB) && !MBB.isEHFuncletEntry())) {
if (isVerbose()) {		if (isVerbose()) {
// NOTE: Want this comment at start of line, don't emit with AddComment.		// NOTE: Want this comment at start of line, don't emit with AddComment.
OutStreamer->emitRawComment(" BB#" + Twine(MBB.getNumber()) + ":", false);		OutStreamer->emitRawComment(" BB#" + Twine(MBB.getNumber()) + ":", false);
}		}
} else {		} else {
OutStreamer->EmitLabel(MBB.getSymbol());		OutStreamer->EmitLabel(MBB.getSymbol());
}		}

		// FIXME: ShrinkWrap2: Insert the CFI that are needed to do the transition
		// between each block.
		if (needsCFIMoves() == CFI_M_None)
		return;

		DenseMap<unsigned, unsigned> CSRIdxToCSIIdx;
		const MCRegisterInfo *MCRI = MF->getMMI().getContext().getRegisterInfo();
		const MachineFrameInfo &MFI = MF->getFrameInfo();
		const std::vector<CalleeSavedInfo> &CSIs = MFI.getCalleeSavedInfo();
		for (auto &KV : enumerate(CSIs)) {
		const CalleeSavedInfo &CSI = KV.value();
		unsigned Reg = CSI.getReg();
		unsigned DwarfReg = MCRI->getDwarfRegNum(Reg, true);
		Reg = MCRI->getLLVMRegNum(DwarfReg, false);
		unsigned CSIIdx = KV.index();
		CSRIdxToCSIIdx[RegToCSRIdx.lookup(Reg)] = CSIIdx;
		}

		if (MFI.getShouldUseShrinkWrap2()) {
		const MCRegisterInfo *MRI = MF->getMMI().getContext().getRegisterInfo();
		for (unsigned CSRIdx : ExtraSaveCFI.lookup(MBB.getNumber()).set_bits()) {
		const CalleeSavedInfo &CSI = CSIs[CSRIdxToCSIIdx[CSRIdx]];
		int64_t Offset = MFI.getObjectOffset(CSI.getFrameIdx());
		unsigned DwarfReg = MRI->getDwarfRegNum(CSI.getReg(), true);
		// .cfi_offset %reg, off
		emitCFIInstruction(
		MCCFIInstruction::createOffset(nullptr, DwarfReg, Offset));
		}
		for (unsigned CSRIdx : ExtraRestoreCFI.lookup(MBB.getNumber()).set_bits()) {
		const CalleeSavedInfo &CSI = CSIs[CSRIdxToCSIIdx[CSRIdx]];
		unsigned DwarfReg = MRI->getDwarfRegNum(CSI.getReg(), true);
		// .cfi_restore %reg
		emitCFIInstruction(MCCFIInstruction::createRestore(nullptr, DwarfReg));
		}
		}
}		}

void AsmPrinter::EmitVisibility(MCSymbol *Sym, unsigned Visibility,		void AsmPrinter::EmitVisibility(MCSymbol *Sym, unsigned Visibility,
bool IsDefinition) const {		bool IsDefinition) const {
MCSymbolAttr Attr = MCSA_Invalid;		MCSymbolAttr Attr = MCSA_Invalid;

switch (Visibility) {		switch (Visibility) {
default: break;		default: break;
▲ Show 20 Lines • Show All 193 Lines • Show Last 20 Lines
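
A note on the bit manipulation in generateShrinkWrappingCFI: the repeated flip / or / flip sequences are set differences. The sketch below, which uses std::bitset instead of llvm::BitVector and hypothetical helper names (clearByFlip, clearKilled, extraSaves, extraRestores), spells out the equivalent expressions: clearing killed bits is Live & ~Killed, ToSave is LiveHere & ~(LastState | SavedHere), and ToRestore is LastState & ~LiveHere. The fixed width of 8 bits is an arbitrary assumption for the example.

```cpp
#include <bitset>
#include <cassert>

using CSRSet = std::bitset<8>;

// The idiom as written in the patch: flip, OR in the killed bits, flip back.
CSRSet clearByFlip(CSRSet Live, const CSRSet &Killed) {
  Live.flip();
  Live |= Killed;
  Live.flip();
  return Live;
}

// The same operation written as a set difference.
CSRSet clearKilled(const CSRSet &Live, const CSRSet &Killed) {
  CSRSet Result = Live & ~Killed;
  assert(Result == clearByFlip(Live, Killed) && "idioms must agree");
  return Result;
}

// CSRs that are live in this block but were neither live in the previous
// layout block nor saved by this block itself.
CSRSet extraSaves(const CSRSet &LastState, const CSRSet &SavedHere,
                  const CSRSet &LiveHere) {
  return LiveHere & ~(LastState | SavedHere);
}

// CSRs that were live in the previous layout block but are not live here.
CSRSet extraRestores(const CSRSet &LastState, const CSRSet &LiveHere) {
  return LastState & ~LiveHere;
}
```

If I remember correctly, BitVector already offers reset(const BitVector &RHS) for exactly this "clear the bits set in RHS" operation, which might read better than the double flip; that is worth checking against the BitVector header.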

lib/CodeGen/AsmPrinter/CodeViewDebug.cpp

Show First 20 Lines • Show All 1,014 Lines • ▼ Show 20 Lines	void CodeViewDebug::beginFunctionImpl(const MachineFunction *MF) {
CurFn->Begin = Asm->getFunctionBegin();		CurFn->Begin = Asm->getFunctionBegin();

OS.EmitCVFuncIdDirective(CurFn->FuncId);		OS.EmitCVFuncIdDirective(CurFn->FuncId);

// Find the end of the function prolog. First known non-DBG_VALUE and		// Find the end of the function prolog. First known non-DBG_VALUE and
// non-frame setup location marks the beginning of the function body.		// non-frame setup location marks the beginning of the function body.
// FIXME: is there a simpler a way to do this? Can we just search		// FIXME: is there a simpler a way to do this? Can we just search
// for the first instruction of the function, not the last of the prolog?		// for the first instruction of the function, not the last of the prolog?
		// FIXME: ShrinkWrap2: This won't work with shrink-wrapping, I guess.
DebugLoc PrologEndLoc;		DebugLoc PrologEndLoc;
bool EmptyPrologue = true;		bool EmptyPrologue = true;
for (const auto &MBB : *MF) {		for (const auto &MBB : *MF) {
for (const auto &MI : MBB) {		for (const auto &MI : MBB) {
if (!MI.isMetaInstruction() && !MI.getFlag(MachineInstr::FrameSetup) &&		if (!MI.isMetaInstruction() && !MI.getFlag(MachineInstr::FrameSetup) &&
MI.getDebugLoc()) {		MI.getDebugLoc()) {
PrologEndLoc = MI.getDebugLoc();		PrologEndLoc = MI.getDebugLoc();
break;		break;
▲ Show 20 Lines • Show All 1,219 Lines • Show Last 20 Lines

lib/CodeGen/CMakeLists.txt

Show First 20 Lines • Show All 121 Lines • ▼ Show 20 Lines	add_llvm_library(LLVMCodeGen
SafeStackLayout.cpp		SafeStackLayout.cpp
ScalarizeMaskedMemIntrin.cpp		ScalarizeMaskedMemIntrin.cpp
ScheduleDAG.cpp		ScheduleDAG.cpp
ScheduleDAGInstrs.cpp		ScheduleDAGInstrs.cpp
ScheduleDAGPrinter.cpp		ScheduleDAGPrinter.cpp
ScoreboardHazardRecognizer.cpp		ScoreboardHazardRecognizer.cpp
ShadowStackGCLowering.cpp		ShadowStackGCLowering.cpp
ShrinkWrap.cpp		ShrinkWrap.cpp
		# FIXME: ShrinkWrap2: Merge.
		ShrinkWrapper.cpp
SjLjEHPrepare.cpp		SjLjEHPrepare.cpp
SlotIndexes.cpp		SlotIndexes.cpp
SpillPlacement.cpp		SpillPlacement.cpp
SplitKit.cpp		SplitKit.cpp
StackColoring.cpp		StackColoring.cpp
StackMapLivenessAnalysis.cpp		StackMapLivenessAnalysis.cpp
StackMaps.cpp		StackMaps.cpp
StackProtector.cpp		StackProtector.cpp
Show All 32 Lines

lib/CodeGen/PrologEpilogInserter.cpp

Show All 14 Lines
// executed, it is illegal to construct MO_FrameIndex operands.		// executed, it is illegal to construct MO_FrameIndex operands.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SetVector.h"		#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/SmallSet.h"		#include "llvm/ADT/SmallSet.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
		#include "llvm/CodeGen/MachineBlockFrequencyInfo.h"
#include "llvm/CodeGen/MachineDominators.h"		#include "llvm/CodeGen/MachineDominators.h"
#include "llvm/CodeGen/MachineFrameInfo.h"		#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineInstr.h"		#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineLoopInfo.h"		#include "llvm/CodeGen/MachineLoopInfo.h"
#include "llvm/CodeGen/MachineModuleInfo.h"		#include "llvm/CodeGen/MachineModuleInfo.h"
		#include "llvm/CodeGen/MachineOptimizationRemarkEmitter.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"		#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/Passes.h"		#include "llvm/CodeGen/Passes.h"
#include "llvm/CodeGen/RegisterScavenging.h"		#include "llvm/CodeGen/RegisterScavenging.h"
#include "llvm/CodeGen/StackProtector.h"		#include "llvm/CodeGen/StackProtector.h"
		#include "llvm/CodeGen/ShrinkWrapper.h"
#include "llvm/CodeGen/WinEHFuncInfo.h"		#include "llvm/CodeGen/WinEHFuncInfo.h"
#include "llvm/IR/DiagnosticInfo.h"		#include "llvm/IR/DiagnosticInfo.h"
#include "llvm/IR/InlineAsm.h"		#include "llvm/IR/InlineAsm.h"
#include "llvm/IR/LLVMContext.h"		#include "llvm/IR/LLVMContext.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetFrameLowering.h"		#include "llvm/Target/TargetFrameLowering.h"
#include "llvm/Target/TargetInstrInfo.h"		#include "llvm/Target/TargetInstrInfo.h"
#include "llvm/Target/TargetMachine.h"		#include "llvm/Target/TargetMachine.h"
#include "llvm/Target/TargetRegisterInfo.h"		#include "llvm/Target/TargetRegisterInfo.h"
#include "llvm/Target/TargetSubtargetInfo.h"		#include "llvm/Target/TargetSubtargetInfo.h"
		#include "llvm/CodeGen/MachineInstrBuilder.h"
		#include "llvm/MC/MCAsmInfo.h"
#include <climits>		#include <climits>

using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "prologepilog"		#define DEBUG_TYPE "prologepilog"

		// FIXME: ShrinkWrap2: Fix name.
		cl::opt<cl::boolOrDefault>
		EnableShrinkWrap2Opt("enable-shrink-wrap2", cl::Hidden,
		cl::desc("enable the shrink-wrapping 2 pass"));

typedef SmallVector<MachineBasicBlock *, 4> MBBVector;		typedef SmallVector<MachineBasicBlock *, 4> MBBVector;
static void doSpillCalleeSavedRegs(MachineFunction &MF, RegScavenger *RS,
unsigned &MinCSFrameIndex,
unsigned &MaxCXFrameIndex,
const MBBVector &SaveBlocks,
const MBBVector &RestoreBlocks);

namespace {		namespace {
class PEI : public MachineFunctionPass {		class PEI : public MachineFunctionPass {
public:		public:
static char ID;		static char ID;
PEI() : MachineFunctionPass(ID) {		PEI() : MachineFunctionPass(ID) {
initializePEIPass(*PassRegistry::getPassRegistry());		initializePEIPass(*PassRegistry::getPassRegistry());
}		}

void getAnalysisUsage(AnalysisUsage &AU) const override;		void getAnalysisUsage(AnalysisUsage &AU) const override;

		/// \brief Check if shrink wrapping is enabled for this target and function.
		bool isShrinkWrapEnabled(const MachineFunction &MF);

MachineFunctionProperties getRequiredProperties() const override {		MachineFunctionProperties getRequiredProperties() const override {
MachineFunctionProperties MFP;		MachineFunctionProperties MFP;
if (UsesCalleeSaves)		if (UsesCalleeSaves)
MFP.set(MachineFunctionProperties::Property::NoVRegs);		MFP.set(MachineFunctionProperties::Property::NoVRegs);
return MFP;		return MFP;
}		}

/// runOnMachineFunction - Insert prolog/epilog code and replace abstract		/// runOnMachineFunction - Insert prolog/epilog code and replace abstract
/// frame indexes with appropriate references.		/// frame indexes with appropriate references.
///		///
bool runOnMachineFunction(MachineFunction &Fn) override;		bool runOnMachineFunction(MachineFunction &Fn) override;

private:		private:
std::function<void(MachineFunction &MF, RegScavenger *RS,		std::function<void(MachineFunction &MF)> SpillCalleeSavedRegisters;
unsigned &MinCSFrameIndex, unsigned &MaxCSFrameIndex,
const MBBVector &SaveBlocks,
const MBBVector &RestoreBlocks)>
SpillCalleeSavedRegisters;
std::function<void(MachineFunction &MF, RegScavenger &RS)>		std::function<void(MachineFunction &MF, RegScavenger &RS)>
ScavengeFrameVirtualRegs;		ScavengeFrameVirtualRegs;

bool UsesCalleeSaves = false;		bool UsesCalleeSaves = false;

		// FIXME: ShrinkWrap2: Temporary hack. Remove.
RegScavenger *RS;		RegScavenger *RS;

// MinCSFrameIndex, MaxCSFrameIndex - Keeps the range of callee saved		// MinCSFrameIndex, MaxCSFrameIndex - Keeps the range of callee saved
// stack frame indexes.		// stack frame indexes.
unsigned MinCSFrameIndex = std::numeric_limits<unsigned>::max();		unsigned MinCSFrameIndex = std::numeric_limits<unsigned>::max();
unsigned MaxCSFrameIndex = 0;		unsigned MaxCSFrameIndex = 0;

		// FIXME: ShrinkWrap2: Merge the shrink-wrapping logic here.
// Save and Restore blocks of the current function. Typically there is a		// Save and Restore blocks of the current function. Typically there is a
// single save block, unless Windows EH funclets are involved.		// single save block, unless Windows EH funclets are involved.
MBBVector SaveBlocks;		MBBVector SaveBlocks;
MBBVector RestoreBlocks;		MBBVector RestoreBlocks;

// Flag to control whether to use the register scavenger to resolve		// Flag to control whether to use the register scavenger to resolve
// frame index materialization registers. Set according to		// frame index materialization registers. Set according to
// TRI->requiresFrameIndexScavenging() for the current function.		// TRI->requiresFrameIndexScavenging() for the current function.
bool FrameIndexVirtualScavenging;		bool FrameIndexVirtualScavenging;

// Flag to control whether the scavenger should be passed even though		// Flag to control whether the scavenger should be passed even though
// FrameIndexVirtualScavenging is used.		// FrameIndexVirtualScavenging is used.
bool FrameIndexEliminationScavenging;		bool FrameIndexEliminationScavenging;

		// Emit optimization remarks.
		MachineOptimizationRemarkEmitter *ORE;

		void doSpillCalleeSavedRegs(MachineFunction &MF);
		void doSpillCalleeSavedRegsShrinkWrap2(MachineFunction &Fn,
		CalleeSavedMap &Saves,
		CalleeSavedMap &Restores);

void calculateCallFrameInfo(MachineFunction &Fn);		void calculateCallFrameInfo(MachineFunction &Fn);
void calculateSaveRestoreBlocks(MachineFunction &Fn);		void calculateSaveRestoreBlocks(MachineFunction &Fn);

void calculateFrameObjectOffsets(MachineFunction &Fn);		void calculateFrameObjectOffsets(MachineFunction &Fn);
void replaceFrameIndices(MachineFunction &Fn);		void replaceFrameIndices(MachineFunction &Fn);
void replaceFrameIndices(MachineBasicBlock *BB, MachineFunction &Fn,		void replaceFrameIndices(MachineBasicBlock *BB, MachineFunction &Fn,
int &SPAdj);		int &SPAdj);
void insertPrologEpilogCode(MachineFunction &Fn);		void insertPrologEpilogCode(MachineFunction &Fn);
};		};
} // namespace		} // namespace

char PEI::ID = 0;		char PEI::ID = 0;
char &llvm::PrologEpilogCodeInserterID = PEI::ID;		char &llvm::PrologEpilogCodeInserterID = PEI::ID;

static cl::opt<unsigned>		static cl::opt<unsigned>
WarnStackSize("warn-stack-size", cl::Hidden, cl::init((unsigned)-1),		WarnStackSize("warn-stack-size", cl::Hidden, cl::init((unsigned)-1),
cl::desc("Warn for stack size bigger than the given"		cl::desc("Warn for stack size bigger than the given"
" number"));		" number"));

INITIALIZE_PASS_BEGIN(PEI, DEBUG_TYPE, "Prologue/Epilogue Insertion", false,		INITIALIZE_PASS_BEGIN(PEI, DEBUG_TYPE, "Prologue/Epilogue Insertion", false,
false)		false)
INITIALIZE_PASS_DEPENDENCY(MachineLoopInfo)		INITIALIZE_PASS_DEPENDENCY(MachineLoopInfo)
INITIALIZE_PASS_DEPENDENCY(MachineDominatorTree)		INITIALIZE_PASS_DEPENDENCY(MachineDominatorTree)
INITIALIZE_PASS_DEPENDENCY(StackProtector)		INITIALIZE_PASS_DEPENDENCY(StackProtector)
		INITIALIZE_PASS_DEPENDENCY(MachineOptimizationRemarkEmitterPass)
		INITIALIZE_PASS_DEPENDENCY(MachineBlockFrequencyInfo)
INITIALIZE_PASS_END(PEI, DEBUG_TYPE,		INITIALIZE_PASS_END(PEI, DEBUG_TYPE,
"Prologue/Epilogue Insertion & Frame Finalization", false,		"Prologue/Epilogue Insertion & Frame Finalization", false,
false)		false)

MachineFunctionPass *llvm::createPrologEpilogInserterPass() {		MachineFunctionPass *llvm::createPrologEpilogInserterPass() {
return new PEI();		return new PEI();
}		}

STATISTIC(NumBytesStackSpace,		STATISTIC(NumBytesStackSpace,
"Number of bytes used for stack in all functions");		"Number of bytes used for stack in all functions");

void PEI::getAnalysisUsage(AnalysisUsage &AU) const {		void PEI::getAnalysisUsage(AnalysisUsage &AU) const {
AU.setPreservesCFG();		AU.setPreservesCFG();
AU.addPreserved<MachineLoopInfo>();		AU.addPreserved<MachineLoopInfo>();
AU.addPreserved<MachineDominatorTree>();		AU.addPreserved<MachineDominatorTree>();
AU.addRequired<StackProtector>();		AU.addRequired<StackProtector>();
		AU.addRequired<MachineOptimizationRemarkEmitterPass>();
		AU.addRequired<MachineBlockFrequencyInfo>();
MachineFunctionPass::getAnalysisUsage(AU);		MachineFunctionPass::getAnalysisUsage(AU);
}		}

		bool PEI::isShrinkWrapEnabled(const MachineFunction &MF) {
		auto BecauseOf = [&](const char *Title, const char *Msg, DebugLoc Loc = {}) {
		MachineOptimizationRemarkMissed R(DEBUG_TYPE, Title, Loc, &MF.front());
		R << "Couldn't shrink-wrap this function because " << Msg;
		ORE->emit(R);
		return false;
		};

		const TargetFrameLowering *TFI = MF.getSubtarget().getFrameLowering();

		switch (EnableShrinkWrap2Opt) {
		case cl::BOU_UNSET: {
		if (MF.getTarget().getOptLevel() == CodeGenOpt::None)
		return BecauseOf("ShrinkWrapDisabledOpt",
		"shrink-wrapping is enabled at O1+.");

		if (!TFI->enableShrinkWrapping(MF))
		return BecauseOf("ShrinkWrapDisabledTarget",
		"shrink-wrapping is not enabled on this target.");
		// Windows with CFI has some limitations that make it impossible
		// to use shrink-wrapping.
		if (MF.getTarget().getMCAsmInfo()->usesWindowsCFI())
		return BecauseOf("ShrinkWrapDisabledWindowsCFI",
		"shrink-wrapping does not support Windows CFI yet.");

		// Sanitizers look at the value of the stack at the location
		// of the crash. Since a crash can happen anywhere, the
		// frame must be lowered before anything else happens for the
		// sanitizers to be able to get a correct stack frame.
		if (MF.getFunction()->hasFnAttribute(Attribute::SanitizeAddress))
		return BecauseOf("ShrinkWrapDisabledASAN",
		"shrink-wrapping can't be enabled with ASAN.");
		if (MF.getFunction()->hasFnAttribute(Attribute::SanitizeThread))
		return BecauseOf("ShrinkWrapDisabledTSAN",
		"shrink-wrapping can't be enabled with TSAN.");
		if (MF.getFunction()->hasFnAttribute(Attribute::SanitizeMemory))
		return BecauseOf("ShrinkWrapDisabledMSAN",
		"shrink-wrapping can't be enabled with MSAN.");
		}
		case cl::BOU_TRUE:
		return true;
		case cl::BOU_FALSE:
		return false;
		}
		llvm_unreachable("Invalid shrink-wrapping state");
		}
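The decision logic in `isShrinkWrapEnabled` can be read as a three-state switch: an explicit `-enable-shrink-wrap2` value forces the answer, and only the unset state consults the per-function checks. The following is a minimal standalone sketch of that shape, not the patch's code. `TriState`, `FunctionFacts` and `shouldShrinkWrap` are hypothetical names invented for illustration, and the reason strings only paraphrase the remarks emitted above.

```cpp
#include <string>

// Hypothetical stand-in for llvm::cl::boolOrDefault.
enum class TriState { Unset, True, False };

// Hypothetical summary of the per-function facts the checks consult.
struct FunctionFacts {
  bool Optimized = true;       // compiled at O1 or higher
  bool TargetSupports = true;  // target opts in to shrink-wrapping
  bool UsesWindowsCFI = false; // Windows CFI is not supported yet
  bool HasSanitizer = false;   // ASAN, TSAN or MSAN attribute present
};

// Returns true when shrink-wrapping should run. When a heuristic rejects
// the function, Reason is set to a short explanation.
bool shouldShrinkWrap(TriState Flag, const FunctionFacts &F,
                      std::string &Reason) {
  switch (Flag) {
  case TriState::True:
    return true;  // forced on: the heuristics are skipped entirely
  case TriState::False:
    return false; // forced off
  case TriState::Unset:
    break;        // fall out of the switch and run the heuristics
  }
  if (!F.Optimized) {
    Reason = "optimizations disabled";
    return false;
  }
  if (!F.TargetSupports) {
    Reason = "target does not enable shrink-wrapping";
    return false;
  }
  if (F.UsesWindowsCFI) {
    Reason = "Windows CFI not supported";
    return false;
  }
  if (F.HasSanitizer) {
    Reason = "sanitizer attribute present";
    return false;
  }
  return true;
}
```

Handling the forced states first makes the final `return true` explicit, whereas the patch reaches it by falling through from the `BOU_UNSET` case into `BOU_TRUE`.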

/// StackObjSet - A set of stack object indexes		/// StackObjSet - A set of stack object indexes
typedef SmallSetVector<int, 8> StackObjSet;		typedef SmallSetVector<int, 8> StackObjSet;

/// runOnMachineFunction - Insert prolog/epilog code and replace abstract		/// runOnMachineFunction - Insert prolog/epilog code and replace abstract
/// frame indexes with appropriate references.		/// frame indexes with appropriate references.
///		///
bool PEI::runOnMachineFunction(MachineFunction &Fn) {		bool PEI::runOnMachineFunction(MachineFunction &Fn) {
if (!SpillCalleeSavedRegisters) {		if (!SpillCalleeSavedRegisters) {
const TargetMachine &TM = Fn.getTarget();		const TargetMachine &TM = Fn.getTarget();
if (!TM.usesPhysRegsForPEI()) {		if (!TM.usesPhysRegsForPEI()) {
SpillCalleeSavedRegisters = [](MachineFunction &, RegScavenger *,		SpillCalleeSavedRegisters = [](MachineFunction &) {};
unsigned &, unsigned &, const MBBVector &,
const MBBVector &) {};
ScavengeFrameVirtualRegs = [](MachineFunction &, RegScavenger &) {};		ScavengeFrameVirtualRegs = [](MachineFunction &, RegScavenger &) {};
} else {		} else {
SpillCalleeSavedRegisters = doSpillCalleeSavedRegs;		SpillCalleeSavedRegisters = [this](MachineFunction &MF) {
		return this->doSpillCalleeSavedRegs(MF);
		};
ScavengeFrameVirtualRegs = scavengeFrameVirtualRegs;		ScavengeFrameVirtualRegs = scavengeFrameVirtualRegs;
UsesCalleeSaves = true;		UsesCalleeSaves = true;
}		}
}		}

const Function* F = Fn.getFunction();		const Function* F = Fn.getFunction();
const TargetRegisterInfo *TRI = Fn.getSubtarget().getRegisterInfo();		const TargetRegisterInfo *TRI = Fn.getSubtarget().getRegisterInfo();
const TargetFrameLowering *TFI = Fn.getSubtarget().getFrameLowering();		const TargetFrameLowering *TFI = Fn.getSubtarget().getFrameLowering();

		MachineFrameInfo &MFI = Fn.getFrameInfo();
RS = TRI->requiresRegisterScavenging(Fn) ? new RegScavenger() : nullptr;		RS = TRI->requiresRegisterScavenging(Fn) ? new RegScavenger() : nullptr;
		// FIXME: ShrinkWrap2: Temporary hack. Remove.
		MFI.RS = RS;
FrameIndexVirtualScavenging = TRI->requiresFrameIndexScavenging(Fn);		FrameIndexVirtualScavenging = TRI->requiresFrameIndexScavenging(Fn);
FrameIndexEliminationScavenging = (RS && !FrameIndexVirtualScavenging) ||		FrameIndexEliminationScavenging = (RS && !FrameIndexVirtualScavenging) ||
TRI->requiresFrameIndexReplacementScavenging(Fn);		TRI->requiresFrameIndexReplacementScavenging(Fn);
		ORE = &getAnalysis<MachineOptimizationRemarkEmitterPass>().getORE();

// Calculate the MaxCallFrameSize and AdjustsStack variables for the		// Calculate the MaxCallFrameSize and AdjustsStack variables for the
// function's frame information. Also eliminates call frame pseudo		// function's frame information. Also eliminates call frame pseudo
// instructions.		// instructions.
calculateCallFrameInfo(Fn);		calculateCallFrameInfo(Fn);

// Determine placement of CSR spill/restore code and prolog/epilog code:		// Determine placement of CSR spill/restore code and prolog/epilog code:
// place all spills in the entry block, all restores in return blocks.		// place all spills in the entry block, all restores in return blocks.
calculateSaveRestoreBlocks(Fn);		calculateSaveRestoreBlocks(Fn);

// Handle CSR spilling and restoring, for targets that need it.		// Handle CSR spilling and restoring, for targets that need it.
SpillCalleeSavedRegisters(Fn, RS, MinCSFrameIndex, MaxCSFrameIndex,		SpillCalleeSavedRegisters(Fn);
SaveBlocks, RestoreBlocks);

// Allow the target machine to make final modifications to the function		// Allow the target machine to make final modifications to the function
// before the frame layout is finalized.		// before the frame layout is finalized.
TFI->processFunctionBeforeFrameFinalized(Fn, RS);		TFI->processFunctionBeforeFrameFinalized(Fn, RS);

// Calculate actual frame offsets for all abstract stack objects...		// Calculate actual frame offsets for all abstract stack objects...
calculateFrameObjectOffsets(Fn);		calculateFrameObjectOffsets(Fn);

[16 lines not shown]	bool PEI::runOnMachineFunction(MachineFunction &Fn) {
if (TRI->requiresRegisterScavenging(Fn) && FrameIndexVirtualScavenging) {		if (TRI->requiresRegisterScavenging(Fn) && FrameIndexVirtualScavenging) {
ScavengeFrameVirtualRegs(Fn, *RS);		ScavengeFrameVirtualRegs(Fn, *RS);

// Clear any vregs created by virtual scavenging.		// Clear any vregs created by virtual scavenging.
Fn.getRegInfo().clearVirtRegs();		Fn.getRegInfo().clearVirtRegs();
}		}

// Warn on stack size when we exceed the given limit.		// Warn on stack size when we exceed the given limit.
MachineFrameInfo &MFI = Fn.getFrameInfo();
uint64_t StackSize = MFI.getStackSize();		uint64_t StackSize = MFI.getStackSize();
if (WarnStackSize.getNumOccurrences() > 0 && WarnStackSize < StackSize) {		if (WarnStackSize.getNumOccurrences() > 0 && WarnStackSize < StackSize) {
DiagnosticInfoStackSize DiagStackSize(*F, StackSize);		DiagnosticInfoStackSize DiagStackSize(*F, StackSize);
F->getContext().diagnose(DiagStackSize);		F->getContext().diagnose(DiagStackSize);
}		}

		// FIXME: ShrinkWrap2: Temporary hack. Remove.
		MFI.RS = nullptr;
delete RS;		delete RS;
SaveBlocks.clear();		SaveBlocks.clear();
RestoreBlocks.clear();		RestoreBlocks.clear();
MFI.setSavePoint(nullptr);		MFI.setSavePoint(nullptr);
MFI.setRestorePoint(nullptr);		MFI.setRestorePoint(nullptr);
return true;		return true;
}		}

[57 lines not shown]	void PEI::calculateSaveRestoreBlocks(MachineFunction &Fn) {
const MachineFrameInfo &MFI = Fn.getFrameInfo();		const MachineFrameInfo &MFI = Fn.getFrameInfo();

// Even when we do not change any CSR, we still want to insert the		// Even when we do not change any CSR, we still want to insert the
// prologue and epilogue of the function.		// prologue and epilogue of the function.
// So set the save points for those.		// So set the save points for those.

// Use the points found by shrink-wrapping, if any.		// Use the points found by shrink-wrapping, if any.
if (MFI.getSavePoint()) {		if (MFI.getSavePoint()) {
		// FIXME: ShrinkWrap2: Remove check.
		assert(!MFI.getShouldUseShrinkWrap2() && "Mixing shrink-wrapping passes.");
SaveBlocks.push_back(MFI.getSavePoint());		SaveBlocks.push_back(MFI.getSavePoint());
assert(MFI.getRestorePoint() && "Both restore and save must be set");		assert(MFI.getRestorePoint() && "Both restore and save must be set");
MachineBasicBlock *RestoreBlock = MFI.getRestorePoint();		MachineBasicBlock *RestoreBlock = MFI.getRestorePoint();
// If RestoreBlock does not have any successor and is not a return block		// If RestoreBlock does not have any successor and is not a return block
// then the end point is unreachable and we do not need to insert any		// then the end point is unreachable and we do not need to insert any
// epilogue.		// epilogue.
if (!RestoreBlock->succ_empty() || RestoreBlock->isReturnBlock())		if (!RestoreBlock->succ_empty() || RestoreBlock->isReturnBlock())
RestoreBlocks.push_back(RestoreBlock);		RestoreBlocks.push_back(RestoreBlock);
return;		return;
}		}

// Save refs to entry and return blocks.		// Save refs to entry and return blocks.
SaveBlocks.push_back(&Fn.front());		SaveBlocks.push_back(&Fn.front());
for (MachineBasicBlock &MBB : Fn) {		for (MachineBasicBlock &MBB : Fn) {
if (MBB.isEHFuncletEntry())		if (MBB.isEHFuncletEntry())
SaveBlocks.push_back(&MBB);		SaveBlocks.push_back(&MBB);
if (MBB.isReturnBlock())		if (MBB.isReturnBlock())
RestoreBlocks.push_back(&MBB);		RestoreBlocks.push_back(&MBB);
}		}
}		}

		/// Insert code that saves the callee saved registers used in the basic block.
		MatzeB (done): Do not repeat function names in doxygen comments (a lot of old code does the same, but we try to avoid it for new code).
		static void insertCSRSaves(MachineBasicBlock &SaveBB,
		ArrayRef<CalleeSavedInfo> CSIs) {
		MachineFunction &Fn = *SaveBB.getParent();
		MatzeB (done): In LLVM we try to use `ArrayRef<T>` instead of `const std::vector<T>&`. It typically provides the same performance but works with std::vector, llvm::SmallVector and other containers (as long as the container stores all values in a contiguous memory region).
		const TargetInstrInfo &TII = *Fn.getSubtarget().getInstrInfo();
		const TargetFrameLowering &TFI = *Fn.getSubtarget().getFrameLowering();
		const TargetRegisterInfo &TRI = *Fn.getSubtarget().getRegisterInfo();

		assert(!CSIs.empty() && "No saves to insert.");

		MatzeB (done): I tend to use references (`const TargetRegisterInfo &TRI =`) for stuff that cannot be nullptr. Similar in a number of other places.
		MachineBasicBlock::iterator I = SaveBB.begin();
		if (!TFI.spillCalleeSavedRegisters(SaveBB, I, CSIs, &TRI)) {
		for (const CalleeSavedInfo &CSI : CSIs) {
		unsigned Reg = CSI.getReg();

		// Update liveness.
		if (!Fn.getRegInfo().isLiveIn(Reg))
		SaveBB.addLiveIn(Reg);
		MatzeB (done): Use range-based for.

		// Insert the spill to the stack frame.
		// FIXME: ShrinkWrap2: Check if can be killed.
		const TargetRegisterClass *RC = TRI.getMinimalPhysRegClass(Reg);
		TII.storeRegToStackSlot(SaveBB, I, Reg, false, CSI.getFrameIdx(), RC,
		&TRI);
		std::prev(I)->setFlag(MachineInstr::FrameSetup);

		// FIXME: ShrinkWrap2: Check whether we need CFI, even though it is
		// ignored by the AsmPrinter.
		// Emit CFI for every CSR spill:
		// .cfi_offset %reg, off
		MachineFrameInfo &MFI = Fn.getFrameInfo();
		if (MFI.getShouldUseShrinkWrap2()) {
		unsigned Offset = MFI.getObjectOffset(CSI.getFrameIdx());
		MatzeB (done): Maybe call it `RestoreBB`, as the variable doesn't hold a restore instruction but is about a basic block.
		const MCRegisterInfo *MRI = Fn.getMMI().getContext().getRegisterInfo();
		unsigned DwarfReg = MRI->getDwarfRegNum(Reg, true);
		unsigned CFIIndex = Fn.addFrameInst(
		MCCFIInstruction::createOffset(nullptr, DwarfReg, Offset));
		BuildMI(SaveBB, I, {}, TII.get(TargetOpcode::CFI_INSTRUCTION))
		.addCFIIndex(CFIIndex);
		}
		}
		}
		}
		MatzeB (done): Avoiding `auto` is friendlier to the reader if the type isn't immediately obvious.
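The review remark on `insertCSRSaves` about preferring `ArrayRef<T>` over a reference to `std::vector<T>` comes down to taking a non-owning pointer-plus-length view, so any contiguous container can be passed without a copy. The sketch below illustrates that idea with a hypothetical `ArrayView` class written for this note. It is not `llvm::ArrayRef`, which offers a much richer interface, and `sumRegs` is an invented example function.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical minimal non-owning view over contiguous elements.
template <typename T> class ArrayView {
  const T *Data = nullptr;
  std::size_t Length = 0;

public:
  ArrayView() = default;
  ArrayView(const T *D, std::size_t N) : Data(D), Length(N) {}
  // Implicit conversions from common contiguous containers.
  ArrayView(const std::vector<T> &V) : Data(V.data()), Length(V.size()) {}
  template <std::size_t N>
  ArrayView(const std::array<T, N> &A) : Data(A.data()), Length(N) {}

  const T *begin() const { return Data; }
  const T *end() const { return Data + Length; }
  std::size_t size() const { return Length; }
  bool empty() const { return Length == 0; }
};

// Takes the view by value: only a pointer and a length are copied.
unsigned sumRegs(ArrayView<unsigned> Regs) {
  unsigned Sum = 0;
  for (unsigned R : Regs)
    Sum += R;
  return Sum;
}
```

A caller can pass a `std::vector<unsigned>` or a `std::array<unsigned, N>` to `sumRegs` unchanged. The view must not outlive the container it points into.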

		/// Insert code that restores the callee saved registers used in the basic
		/// block.
		static void insertCSRRestores(MachineBasicBlock &RestoreBB,
		MatzeB (done): How about `I = Restore.getFirstTerminator()`?
		ArrayRef<CalleeSavedInfo> CSIs) {
		MachineFunction &Fn = *RestoreBB.getParent();
		const TargetInstrInfo &TII = *Fn.getSubtarget().getInstrInfo();
		const TargetFrameLowering &TFI = *Fn.getSubtarget().getFrameLowering();
		const TargetRegisterInfo &TRI = *Fn.getSubtarget().getRegisterInfo();

		assert(!CSIs.empty() && "No restores to insert.");

		// Restore using target interface.
		MachineBasicBlock::iterator I = RestoreBB.getFirstTerminator();

		// Restore all registers immediately before the return and any terminators
		// that precede it.
		if (!TFI.restoreCalleeSavedRegisters(RestoreBB, I, CSIs, &TRI)) {
		for (int i = CSIs.size() - 1; i >= 0; --i) {
		unsigned Reg = CSIs[i].getReg();
		const TargetRegisterClass *RC = TRI.getMinimalPhysRegClass(Reg);
		TII.loadRegFromStackSlot(RestoreBB, I, Reg, CSIs[i].getFrameIdx(), RC,
		&TRI);
		std::prev(I)->setFlag(MachineInstr::FrameDestroy);

		assert(I != RestoreBB.begin() &&
		"loadRegFromStackSlot didn't insert any code!");

		// FIXME: ShrinkWrap2: Check whether we need CFI, even though it is
		// ignored by the AsmPrinter.
		// Emit CFI for every CSR restore.
		MatzeB (not done): Would it make sense to just reverse the loop over CSI instead of using the subtle AtStart/BeforeI logic?
		// .cfi_restore %reg
		MachineFrameInfo &MFI = Fn.getFrameInfo();
		if (MFI.getShouldUseShrinkWrap2()) {
		MachineModuleInfo &MMI = Fn.getMMI();
		const MCRegisterInfo *MRI = MMI.getContext().getRegisterInfo();
		unsigned DwarfReg = MRI->getDwarfRegNum(Reg, true);
		unsigned CFIIndex =
		Fn.addFrameInst(MCCFIInstruction::createRestore(nullptr, DwarfReg));
		BuildMI(RestoreBB, I, {}, TII.get(TargetOpcode::CFI_INSTRUCTION))
		.addCFIIndex(CFIIndex);
		}
		}
		}
		}
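The restore loop above walks the callee-saved list from the last entry to the first and always inserts before the same terminator, so the reloads appear in the opposite order from the spills. A standalone sketch of that ordering follows. `SavedReg`, `emitSaves` and `emitRestores` are hypothetical names, and the strings stand in for the machine instructions that the target hooks would insert.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for CalleeSavedInfo: a register and its stack slot.
struct SavedReg {
  unsigned Reg;
  int FrameIdx;
};

// Spills are emitted in list order.
std::vector<std::string> emitSaves(const std::vector<SavedReg> &CSIs) {
  std::vector<std::string> Out;
  for (const SavedReg &CSI : CSIs)
    Out.push_back("store r" + std::to_string(CSI.Reg) + " -> slot " +
                  std::to_string(CSI.FrameIdx));
  return Out;
}

// Reloads are emitted in reverse list order. Reverse iterators avoid the
// signed index used in the patch and cannot underflow on an empty list.
std::vector<std::string> emitRestores(const std::vector<SavedReg> &CSIs) {
  std::vector<std::string> Out;
  for (auto It = CSIs.rbegin(), E = CSIs.rend(); It != E; ++It)
    Out.push_back("load r" + std::to_string(It->Reg) + " <- slot " +
                  std::to_string(It->FrameIdx));
  return Out;
}
```

For the list `{r19, slot 0}, {r20, slot 1}`, `emitSaves` stores r19 and then r20, while `emitRestores` loads r20 and then r19.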

static void assignCalleeSavedSpillSlots(MachineFunction &F,		static void assignCalleeSavedSpillSlots(MachineFunction &F,
const BitVector &SavedRegs,		const BitVector &SavedRegs,
unsigned &MinCSFrameIndex,		unsigned &MinCSFrameIndex,
unsigned &MaxCSFrameIndex) {		unsigned &MaxCSFrameIndex) {
if (SavedRegs.empty())		if (SavedRegs.empty())
return;		return;

const TargetRegisterInfo *RegInfo = F.getSubtarget().getRegisterInfo();		const TargetRegisterInfo *RegInfo = F.getSubtarget().getRegisterInfo();
[55 lines not shown]	for (auto &CS : CSI) {
FrameIdx = MFI.CreateFixedSpillStackObject(Size, FixedSlot->Offset);		FrameIdx = MFI.CreateFixedSpillStackObject(Size, FixedSlot->Offset);
}		}

CS.setFrameIdx(FrameIdx);		CS.setFrameIdx(FrameIdx);
}		}
}		}

MFI.setCalleeSavedInfo(CSI);		MFI.setCalleeSavedInfo(CSI);
		// FIXME: ShrinkWrap2: AArch64FrameLowering needs to call
		// computeCalleeSaveRegisterPairs after calling the generic code above. We
		// could duplicate this code inside
		// AArch64FrameLowering::assignCalleeSavedSpillSlots, but we need to update
		// MinCSFrameIndex and MaxCSFrameIndex.
		if (MFI.getShouldUseShrinkWrap2())
		TFI->processValidCalleeSavedInfo(F, RegInfo, CSI);
}		}

/// Helper function to update the liveness information for the callee-saved		/// Helper function to update the liveness information for the callee-saved
/// registers.		/// registers.
static void updateLiveness(MachineFunction &MF) {		static void updateLiveness(MachineFunction &MF) {
MachineFrameInfo &MFI = MF.getFrameInfo();		MachineFrameInfo &MFI = MF.getFrameInfo();
// Visited will contain all the basic blocks that are in the region		// Visited will contain all the basic blocks that are in the region
// where the callee saved registers are alive:		// where the callee saved registers are alive:
[61 lines not shown]	static void insertCSRSpillsAndRestores(MachineFunction &Fn,
const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();		const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();

MFI.setCalleeSavedInfoValid(true);		MFI.setCalleeSavedInfoValid(true);

// Early exit if no callee saved registers are modified!		// Early exit if no callee saved registers are modified!
if (CSI.empty())		if (CSI.empty())
return;		return;

const TargetInstrInfo &TII = *Fn.getSubtarget().getInstrInfo();
const TargetFrameLowering *TFI = Fn.getSubtarget().getFrameLowering();
const TargetRegisterInfo *TRI = Fn.getSubtarget().getRegisterInfo();
MachineBasicBlock::iterator I;

// Spill using target interface.		// Spill using target interface.
for (MachineBasicBlock *SaveBlock : SaveBlocks) {		for (MachineBasicBlock *SaveBlock : SaveBlocks) {
I = SaveBlock->begin();		insertCSRSaves(*SaveBlock, CSI);
if (!TFI->spillCalleeSavedRegisters(*SaveBlock, I, CSI, TRI)) {
for (unsigned i = 0, e = CSI.size(); i != e; ++i) {
// Insert the spill to the stack frame.
unsigned Reg = CSI[i].getReg();
const TargetRegisterClass *RC = TRI->getMinimalPhysRegClass(Reg);
TII.storeRegToStackSlot(*SaveBlock, I, Reg, true, CSI[i].getFrameIdx(),
RC, TRI);
}
}
// Update the live-in information of all the blocks up to the save point.		// Update the live-in information of all the blocks up to the save point.
updateLiveness(Fn);		updateLiveness(Fn);
}		}

// Restore using target interface.		// Restore using target interface.
for (MachineBasicBlock *MBB : RestoreBlocks) {		for (MachineBasicBlock *RestoreBlock : RestoreBlocks)
I = MBB->end();		insertCSRRestores(*RestoreBlock, CSI);
		}

// Skip over all terminator instructions, which are part of the return		// FIXME: ShrinkWrap2: Name.
// sequence.		void PEI::doSpillCalleeSavedRegsShrinkWrap2(MachineFunction &Fn,
MachineBasicBlock::iterator I2 = I;		CalleeSavedMap &Saves,
while (I2 != MBB->begin() && (--I2)->isTerminator())		CalleeSavedMap &Restores) {
I = I2;		const TargetRegisterInfo &TRI = *Fn.getSubtarget().getRegisterInfo();
		MachineFrameInfo &MFI = Fn.getFrameInfo();
bool AtStart = I == MBB->begin();
MachineBasicBlock::iterator BeforeI = I;		// Now gather the callee-saved registers we found using shrink-wrapping.
if (!AtStart)		// FIXME: ShrinkWrap2: We already gathered all the CSRs in ShrinkWrap. Reuse
--BeforeI;		// somehow?
		BitVector ShrinkWrapSavedRegs(TRI.getNumRegs());
// Restore all registers immediately before the return and any		for (auto &Save : Saves)
// terminators that precede it.		for (const CalleeSavedInfo &CSI : Save.second)
if (!TFI->restoreCalleeSavedRegisters(*MBB, I, CSI, TRI)) {		ShrinkWrapSavedRegs.set(CSI.getReg());
for (unsigned i = 0, e = CSI.size(); i != e; ++i) {
unsigned Reg = CSI[i].getReg();		// FIXME: ShrinkWrap2: Re-use stack slots.
const TargetRegisterClass *RC = TRI->getMinimalPhysRegClass(Reg);		assignCalleeSavedSpillSlots(Fn, ShrinkWrapSavedRegs, MinCSFrameIndex,
TII.loadRegFromStackSlot(*MBB, I, Reg, CSI[i].getFrameIdx(), RC, TRI);		MaxCSFrameIndex);
assert(I != MBB->begin() &&
"loadRegFromStackSlot didn't insert any code!");		MFI.setCalleeSavedInfoValid(true);
// Insert in reverse order. loadRegFromStackSlot can insert
// multiple instructions.		if (Fn.getFunction()->hasFnAttribute(Attribute::Naked))
if (AtStart)		return;
I = MBB->begin();
else {		// FIXME: ShrinkWrap2: This is awful. We first call
I = BeforeI;		// assignCalleeSavedSpillSlots, that fills MFI.CalleeSavedInfo which is used
++I;		// for the ENTIRE function. Then, we need to reassign the FrameIdx back to the
		// Saves / Restores map.
		SmallVector<std::pair<std::vector<CalleeSavedInfo> *, unsigned>, 2> ToRemove;
		const std::vector<CalleeSavedInfo> &CSIs = MFI.getCalleeSavedInfo();
		for (auto *Map : {&Saves, &Restores}) {
		for (auto &Elt : *Map) {
		for (const CalleeSavedInfo &CSI : Elt.second) {
		unsigned Reg = CSI.getReg();
		// Look for the register in the assigned CSIs, and reassign it in the
		// map.
		auto It = find_if(CSIs, [&](const CalleeSavedInfo &NewCSI) {
		return NewCSI.getReg() == Reg;
		});
		if (It != CSIs.end())
		// FIXME: ShrinkWrap2: const_cast...
		const_cast<CalleeSavedInfo &>(CSI).setFrameIdx(It->getFrameIdx());
		else // Also, if we can't find it in the list, it means the target
		// removed it. x86 does this for FP, since the spill is part of the
		// prologue emission.
		ToRemove.emplace_back(&Elt.second, Reg);
}		}
}		}
}		}
		for (auto& Pair : ToRemove) {
		std::vector<CalleeSavedInfo> &V = *Pair.first;
		unsigned Reg = Pair.second;
		V.erase(std::remove_if(V.begin(), V.end(),
		[&](const CalleeSavedInfo &CSI) {
		return CSI.getReg() == Reg;
		}),
		V.end());
		}

		for (auto &Save : Saves) {
		insertCSRSaves(*Save.first, Save.second);
		// FIXME: ShrinkWrap2: Update liveness only after all spills / restores?
		updateLiveness(Fn);
}		}

		for (auto &Restore : Restores)
		insertCSRRestores(*Restore.first, Restore.second);
}		}
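The middle of `doSpillCalleeSavedRegsShrinkWrap2` does two things to every per-block list: it copies the frame index that the function-wide assignment chose for each register, and it drops registers that the target removed from the assigned list. The patch does this with a `const_cast` and a separate `ToRemove` pass. The sketch below swaps in a different, simpler technique that rebuilds each list instead. `CSREntry` and `syncFrameIndices` are hypothetical names, and this is not the code under review.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical stand-in for CalleeSavedInfo.
struct CSREntry {
  unsigned Reg;
  int FrameIdx;
};

// Rewrites Local so that each kept entry carries the frame index found in
// Assigned. Entries whose register is absent from Assigned are dropped.
// No sentinel frame index is used, since negative indices can be valid.
void syncFrameIndices(std::vector<CSREntry> &Local,
                      const std::vector<CSREntry> &Assigned) {
  std::vector<CSREntry> Kept;
  Kept.reserve(Local.size());
  for (const CSREntry &L : Local) {
    auto It = std::find_if(Assigned.begin(), Assigned.end(),
                           [&](const CSREntry &A) { return A.Reg == L.Reg; });
    if (It != Assigned.end())
      Kept.push_back({L.Reg, It->FrameIdx});
  }
  Local = std::move(Kept);
}
```

Calling this once per block for both the save map and the restore map would give the same end state as the loop above, at the cost of one temporary vector per block.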

static void doSpillCalleeSavedRegs(MachineFunction &Fn, RegScavenger *RS,		void PEI::doSpillCalleeSavedRegs(MachineFunction &Fn) {
unsigned &MinCSFrameIndex,
unsigned &MaxCSFrameIndex,
const MBBVector &SaveBlocks,
const MBBVector &RestoreBlocks) {
const Function *F = Fn.getFunction();		const Function *F = Fn.getFunction();
const TargetFrameLowering *TFI = Fn.getSubtarget().getFrameLowering();		const TargetFrameLowering *TFI = Fn.getSubtarget().getFrameLowering();
		MachineFrameInfo &MFI = Fn.getFrameInfo();

MinCSFrameIndex = std::numeric_limits<unsigned>::max();		MinCSFrameIndex = std::numeric_limits<unsigned>::max();
MaxCSFrameIndex = 0;		MaxCSFrameIndex = 0;


		/// If any, contains better save points for the prologue found by
		/// shrink-wrapping.
		CalleeSavedMap Saves;
		/// If any, contains better restore points for the epilogue found by
		/// shrink-wrapping.
		CalleeSavedMap Restores;

		if (!Fn.empty() && isShrinkWrapEnabled(Fn)) {
		ShrinkWrapper SW(Fn);
		auto *MBFI = &getAnalysis<MachineBlockFrequencyInfo>();
		if (SW.areResultsInteresting(MBFI)) {
		MachineFrameInfo &MFI = Fn.getFrameInfo();
		MFI.setShouldUseShrinkWrap2(true);
		MatzeB (not done): LLVM coding style recommends against curly braces to call constructors.
		SW.emitRemarks(ORE, MBFI);
		}
		auto &SWSaves = SW.getSaves();
		auto &SWRestores = SW.getRestores();
		const MCPhysReg *CSRegs = Fn.getRegInfo().getCalleeSavedRegs();
		auto Transform = [&](const DenseMap<unsigned, BitVector> &Src,
		CalleeSavedMap &Dst) {
		for (auto &KV : Src) {
		MachineBasicBlock *MBB = Fn.getBlockNumbered(KV.first);
		const BitVector &Regs = KV.second;
		std::vector<CalleeSavedInfo> &CSI = Dst[MBB];

		for (unsigned RegIdx : Regs.set_bits())
		CSI.emplace_back(CSRegs[RegIdx]);
		}
		};
		Transform(SWSaves, Saves);
		Transform(SWRestores, Restores);
		}

		// FIXME: ShrinkWrap2: Share code somehow.
		if (MFI.getShouldUseShrinkWrap2())
		return doSpillCalleeSavedRegsShrinkWrap2(Fn, Saves, Restores);

// Determine which of the registers in the callee save list should be saved.		// Determine which of the registers in the callee save list should be saved.
BitVector SavedRegs;		BitVector SavedRegs;
TFI->determineCalleeSaves(Fn, SavedRegs, RS);		TFI->determineCalleeSaves(Fn, SavedRegs, RS);

// Assign stack slots for any callee-saved registers that must be spilled.		// Assign stack slots for any callee-saved registers that must be spilled.
assignCalleeSavedSpillSlots(Fn, SavedRegs, MinCSFrameIndex, MaxCSFrameIndex);		assignCalleeSavedSpillSlots(Fn, SavedRegs, MinCSFrameIndex, MaxCSFrameIndex);

// Add the code to save and restore the callee saved registers.		// Add the code to save and restore the callee saved registers.

/// insertPrologEpilogCode - Scan the function for modified callee saved		/// insertPrologEpilogCode - Scan the function for modified callee saved
/// registers, insert spill code for these callee saved registers, then add		/// registers, insert spill code for these callee saved registers, then add
/// prolog and epilog code to the function.		/// prolog and epilog code to the function.
///		///
void PEI::insertPrologEpilogCode(MachineFunction &Fn) {		void PEI::insertPrologEpilogCode(MachineFunction &Fn) {
const TargetFrameLowering &TFI = *Fn.getSubtarget().getFrameLowering();		const TargetFrameLowering &TFI = *Fn.getSubtarget().getFrameLowering();

		// FIXME: ShrinkWrap2: Stack alignment / adjustment / etc. go in emitPrologue.
		// For now, we add these at the entry / exit of the function, and we spill
		// callee saves using our own blocks. There should be a way to shrink-wrap the
		// stack operations as well.

// Add prologue to the function...		// Add prologue to the function...
for (MachineBasicBlock *SaveBlock : SaveBlocks)		for (MachineBasicBlock *SaveBlock : SaveBlocks)
TFI.emitPrologue(Fn, *SaveBlock);		TFI.emitPrologue(Fn, *SaveBlock);

// Add epilogue to restore the callee-save registers in each exiting block.		// Add epilogue to restore the callee-save registers in each exiting block.
for (MachineBasicBlock *RestoreBlock : RestoreBlocks)		for (MachineBasicBlock *RestoreBlock : RestoreBlocks)
TFI.emitEpilogue(Fn, *RestoreBlock);		TFI.emitEpilogue(Fn, *RestoreBlock);

		// FIXME: ShrinkWrap2: Will this still work?
for (MachineBasicBlock *SaveBlock : SaveBlocks)		for (MachineBasicBlock *SaveBlock : SaveBlocks)
TFI.inlineStackProbe(Fn, *SaveBlock);		TFI.inlineStackProbe(Fn, *SaveBlock);

		// FIXME: ShrinkWrap2: Will this still work?
// Emit additional code that is required to support segmented stacks, if		// Emit additional code that is required to support segmented stacks, if
// we've been asked for it. This, when linked with a runtime with support		// we've been asked for it. This, when linked with a runtime with support
// for segmented stacks (libgcc is one), will result in allocating stack		// for segmented stacks (libgcc is one), will result in allocating stack
// space in small chunks instead of one large contiguous block.		// space in small chunks instead of one large contiguous block.
if (Fn.shouldSplitStack()) {		if (Fn.shouldSplitStack()) {
for (MachineBasicBlock *SaveBlock : SaveBlocks)		for (MachineBasicBlock *SaveBlock : SaveBlocks)
TFI.adjustForSegmentedStacks(Fn, *SaveBlock);		TFI.adjustForSegmentedStacks(Fn, *SaveBlock);
}		}

		// FIXME: ShrinkWrap2: Will this still work?
// Emit additional code that is required to explicitly handle the stack in		// Emit additional code that is required to explicitly handle the stack in
// HiPE native code (if needed) when loaded in the Erlang/OTP runtime. The		// HiPE native code (if needed) when loaded in the Erlang/OTP runtime. The
// approach is rather similar to that of Segmented Stacks, but it uses a		// approach is rather similar to that of Segmented Stacks, but it uses a
// different conditional check and another BIF for allocating more stack		// different conditional check and another BIF for allocating more stack
// space.		// space.
if (Fn.getFunction()->getCallingConv() == CallingConv::HiPE)		if (Fn.getFunction()->getCallingConv() == CallingConv::HiPE)
for (MachineBasicBlock *SaveBlock : SaveBlocks)		for (MachineBasicBlock *SaveBlock : SaveBlocks)
TFI.adjustForHiPEPrologue(Fn, *SaveBlock);		TFI.adjustForHiPEPrologue(Fn, *SaveBlock);

lib/CodeGen/ShrinkWrapper.cpp

This file was added.

				//===- lib/CodeGen/ShrinkWrapper.cpp - Shrink Wrapping Utility --- C++ --===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				// Shrink-wrapper implementation.
				//===----------------------------------------------------------------------===//

				#include "llvm/ADT/BitVector.h"
				#include "llvm/ADT/DenseMap.h"
				#include "llvm/ADT/DepthFirstIterator.h"
				#include "llvm/ADT/PostOrderIterator.h"
				#include "llvm/ADT/SCCIterator.h"
				#include "llvm/CodeGen/MachineBlockFrequencyInfo.h"
				#include "llvm/CodeGen/MachineOptimizationRemarkEmitter.h"
				#include "llvm/CodeGen/MachineRegisterInfo.h"
				#include "llvm/Support/raw_ostream.h"
				#include <algorithm>

				#include "llvm/CodeGen/ShrinkWrapper.h"

				// FIXME: ShrinkWrap2: Name
				#define DEBUG_TYPE "shrink-wrap2"

				#define VERBOSE_DEBUG(X) \
				do { \
				if (VerboseDebug) \
				DEBUG(X); \
				} while (0)

				using namespace llvm;

				// FIXME: ShrinkWrap2: Remove ?
				static cl::opt<cl::boolOrDefault>
				VerboseDebug("shrink-wrap-verbose", cl::Hidden,
				cl::desc("verbose debug output"));

				// FIXME: ShrinkWrap2: Remove, debug.
				static cl::opt<cl::boolOrDefault> ViewCFGDebug("shrink-wrap-view", cl::Hidden,
				cl::desc("view cfg"));

				void ShrinkWrapInfo::determineCSRUses() {
				const TargetRegisterInfo &TRI = *MF.getSubtarget().getRegisterInfo();
				const MachineRegisterInfo &MRI = MF.getRegInfo();

				// Walk all the uses of each callee-saved register, and map them to their
				// basic blocks.
				const MCPhysReg *CSRegs = MRI.getCalleeSavedRegs();

				BitVector CSRegUnits(TRI.getNumRegUnits());
				DenseMap<unsigned, unsigned> RegUnitToCSRIdx;
				for (unsigned i = 0; CSRegs[i]; ++i) {
				for (MCRegUnitIterator RegUnit(CSRegs[i], &TRI); RegUnit.isValid();
				++RegUnit) {
				RegUnitToCSRIdx[*RegUnit] = i;
				CSRegUnits.set(*RegUnit);
				}
				}

				auto MarkAsUsedBase = [&](unsigned RegIdx, unsigned MBBNum) {

				BitVector &Used = Uses[MBBNum];
				if (Used.empty())
				Used.resize(getNumResultBits());
				Used.set(RegIdx);
				};
				auto MarkAsUsed = [&](unsigned RegIdx, const MachineBasicBlock &MBB,
				bool isTerminator = false) {
				unsigned MBBNum = MBB.getNumber();
				MarkAsUsedBase(RegIdx, MBBNum);
				// If it's a terminator, mark the successors as used as well,
				// since we can't save after a terminator (i.e. cbz w23, #10).
				if (isTerminator)
				for (MachineBasicBlock *Succ : MBB.successors())
				MarkAsUsedBase(RegIdx, Succ->getNumber());
				};

				// FIXME: ShrinkWrap2: Naked functions.
				// FIXME: ShrinkWrap2: __builtin_unwind_init.

				for (const MachineBasicBlock &MBB : MF) {
				for (const MachineInstr &MI : MBB) {
				for (const MachineOperand &MO : MI.operands()) {

				if (MO.isRegMask()) {
				// Check for regmasks only on the original CSR, as the aliases are not
				// always there.
				for (unsigned i = 0; CSRegs[i]; ++i)
				if (MO.clobbersPhysReg(CSRegs[i]))
				MarkAsUsed(i, MBB, MI.isTerminator());
				} else if (MO.isReg() && MO.getReg() && (MO.readsReg() || MO.isDef())) {
				for (MCRegUnitIterator RegUnit(MO.getReg(), &TRI); RegUnit.isValid();
				++RegUnit)
				if (CSRegUnits.test(*RegUnit))
				MarkAsUsed(RegUnitToCSRIdx[*RegUnit], MBB, MI.isTerminator());
				}
				}
				}
				}
				}

				const BitVector *ShrinkWrapInfo::getUses(unsigned MBBNum) const {
				auto& Use = Uses[MBBNum];
				if (Use.empty())
				return nullptr;
				return &Use;
				}

				ShrinkWrapper::SCCLoopInfo::SCCLoopInfo(const MachineFunction &MF) {
				// Create the SCCLoops.
				for (auto I = scc_begin(&MF); !I.isAtEnd(); ++I) {
				// Skip non-loop SCCs.
				if (!I.hasLoop())
				continue;

				SCCs.emplace_back();
				// The SCCLoop number is the first basic block number in the SCC.
				unsigned Number = (*I->begin())->getNumber();
				SCCs.back().Number = Number;
				SCCs.back().Size = I->size();

				// The number used in MBBToSCC is the position of the SCC in `SCCs`
				for (const MachineBasicBlock *MBB : *I)
				MBBToSCC[MBB->getNumber()] = SCCs.size() - 1;
				}

				// Compute successors / predecessors of the SCCLoops.
				for (const MachineBasicBlock &MBB : MF) {
				for (const MachineBasicBlock *Succ : MBB.successors()) {
				SCCLoop *MBBSCC = getSCCLoopFor(MBB.getNumber());
				SCCLoop *SuccSCC = getSCCLoopFor(Succ->getNumber());
				// The successor is a loop, but not the current block. It means the
				// successor's predecessor is the current block.
				if (!MBBSCC && SuccSCC)
				SuccSCC->Predecessors.insert(&MBB);
				// The successor is not a loop, but the current block is one. It means
				// that the loop's successor is the block's successor.
				else if (MBBSCC && !SuccSCC)
				MBBSCC->Successors.insert(Succ);
				// The successor and the block are loops. We now need to connect SCCs
				// together.
				else if (MBBSCC && SuccSCC && MBBSCC != SuccSCC) {
				MBBSCC->Successors.insert(Succ);
				SuccSCC->Predecessors.insert(&MBB);
				}
				}
				for (const MachineBasicBlock *Pred : MBB.predecessors()) {
				SCCLoop *MBBSCC = getSCCLoopFor(MBB.getNumber());
				SCCLoop *PredSCC = getSCCLoopFor(Pred->getNumber());
				// The predecessor is a loop, but not the current block. It means the
				// predecessor's successor is the current block.
				if (!MBBSCC && PredSCC)
				PredSCC->Successors.insert(&MBB);
				// The predecessor is not a loop, but the current block is one. It
				// means that the loop's predecessor is the block's predecessor.
				else if (MBBSCC && !PredSCC)
				MBBSCC->Predecessors.insert(Pred);
				// The predecessor and the block are loops. We now need to connect SCCs
				// together.
				else if (MBBSCC && PredSCC && MBBSCC != PredSCC) {
				MBBSCC->Predecessors.insert(Pred);
				PredSCC->Successors.insert(&MBB);
				}
				}
				}
				}

				unsigned ShrinkWrapper::blockNumber(unsigned MBBNum) const {
				if (const SCCLoop *C = SI.getSCCLoopFor(MBBNum))
				return C->getNumber();
				return MBBNum;
				}

				iterator_range<MBBIterator>
				ShrinkWrapper::blockSuccessors(unsigned MBBNum) const {
				if (const SCCLoop *C = SI.getSCCLoopFor(MBBNum))
				return {C->Successors.begin(), C->Successors.end()};
				const MachineBasicBlock *MBB = MF.getBlockNumbered(MBBNum);
				return {&MBB->succ_begin(), &MBB->succ_end()};
				}

				iterator_range<MBBIterator>
				ShrinkWrapper::blockPredecessors(unsigned MBBNum) const {
				if (const SCCLoop *C = SI.getSCCLoopFor(MBBNum))
				return {C->Predecessors.begin(), C->Predecessors.end()};
				const MachineBasicBlock *MBB = MF.getBlockNumbered(MBBNum);
				return {&MBB->pred_begin(), &MBB->pred_end()};
				}

				void ShrinkWrapper::determineUses() {
				// FIXME: ShrinkWrap2: We do unnecessary copies here.
				for (const MachineBasicBlock &MBB : MF) {
				if (const TargetResultSet *Use = SWI->getUses(MBB.getNumber())) {
				unsigned MBBNum = blockNumber(MBB.getNumber());
				Uses[MBBNum].resize(SWI->getNumResultBits());
				Uses[MBBNum] |= *Use;
				}
				}
				}

				void ShrinkWrapper::removeUsesOnNoReturnPaths() {
				NoReturnBlocks.resize(MF.getNumBlockIDs());

				// Mark all reachable blocks from any return blocks.
				for (const MachineBasicBlock &MBB : MF)
				if (MBB.isReturnBlock())
				for (const MachineBasicBlock *Block : inverse_depth_first(&MBB))
				NoReturnBlocks.set(Block->getNumber());

				// Flip, so that we can get the non-reachable blocks.
				NoReturnBlocks.flip();

				for (unsigned MBBNum : NoReturnBlocks.set_bits()) {
				DEBUG(dbgs() << "Remove uses from no-return BB#" << MBBNum << '\n');
				Uses[MBBNum].clear();
				}
				}
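
A minimal standalone sketch of the idea behind removeUsesOnNoReturnPaths: walk the CFG backwards from every return block, then flip the result to get the blocks that can never reach a return. The `ToyCFG` type and `findNoReturnBlocks` are hypothetical names for illustration only; the patch itself uses `inverse_depth_first` and a `BitVector`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy CFG: Succs[B] lists the successors of block B and
// IsReturn[B] is true when block B ends in a return.
struct ToyCFG {
  std::vector<std::vector<unsigned>> Succs;
  std::vector<bool> IsReturn;
};

// Returns one flag per block: true if no return block is reachable from it.
std::vector<bool> findNoReturnBlocks(const ToyCFG &G) {
  const std::size_t N = G.Succs.size();
  assert(G.IsReturn.size() == N && "one IsReturn flag per block");

  // Build predecessor lists so the CFG can be walked backwards.
  std::vector<std::vector<unsigned>> Preds(N);
  for (unsigned B = 0; B < N; ++B)
    for (unsigned S : G.Succs[B])
      Preds[S].push_back(B);

  // Backward depth-first walk starting from every return block.
  std::vector<bool> ReachesReturn(N, false);
  std::vector<unsigned> Worklist;
  for (unsigned B = 0; B < N; ++B)
    if (G.IsReturn[B]) {
      ReachesReturn[B] = true;
      Worklist.push_back(B);
    }
  while (!Worklist.empty()) {
    unsigned B = Worklist.back();
    Worklist.pop_back();
    for (unsigned P : Preds[B])
      if (!ReachesReturn[P]) {
        ReachesReturn[P] = true;
        Worklist.push_back(P);
      }
  }

  // Flip: the blocks that were not marked are on no-return paths.
  std::vector<bool> NoReturn(N);
  for (unsigned B = 0; B < N; ++B)
    NoReturn[B] = !ReachesReturn[B];
  return NoReturn;
}
```

In this sketch, a block that ends in something like a call to `abort` has no successors and is not a return block, so it is flagged and its uses would be dropped.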

				void ShrinkWrapper::dumpUses() const {
				for (const auto& Use : enumerate(Uses)) {
				if (!Use.value().count())
				continue;

				dbgs() << "BB#" << Use.index() << " uses : ";
				int Elt = Use.value().find_first();
				if (Elt >= 0)
				SWI->printElt(Elt, dbgs());
				for (Elt = Use.value().find_next(Elt); Elt > 0;
				Elt = Use.value().find_next(Elt)) {
				dbgs() << ", ";
				SWI->printElt(Elt, dbgs());
				}
				dbgs() << '\n';
				}
				}

				void ShrinkWrapper::markUsesOutsideLoops() {
				// Keep track of the elements to attach to a basic block.
				SparseBBResultSetMap ToInsert;
				for (const auto &Use : enumerate(Uses)) {
				unsigned MBBNum = Use.index();
				const TargetResultSet &Elts = Use.value();

				auto Mark = [&](const MachineBasicBlock *Block) {
				unsigned BlockNum = Block->getNumber();
				TargetResultSet &ToInsertTo = ToInsert[BlockNum];
				if (ToInsertTo.empty())
				ToInsertTo.resize(SWI->getNumResultBits());
				ToInsertTo |= Elts;
				VERBOSE_DEBUG(dbgs() << "Mark: BB#" << BlockNum << '\n');
				};

				if (const SCCLoop *C = SI.getSCCLoopFor(MBBNum)) {
				DEBUG(dbgs() << "Loop for CSR: BB#" << MBBNum << '\n');

				// Mark all the entry blocks of the loop.
				for (const MachineBasicBlock *Block : C->predecessors())
				Mark(Block);

				// Mark all the exit blocks of the loop.
				for (const MachineBasicBlock *Exit : C->successors())
				Mark(Exit);
				}
				}

				for (auto &KV : ToInsert)
				Uses[blockNumber(KV.first)] |= KV.second;
				}

				void ShrinkWrapper::computeAttributes(
				unsigned Elt, AttributeMap &Attrs,
				ReversePostOrderTraversal<const MachineFunction *> &RPOT) const {
				auto UsesElt = [&](unsigned MBBNum) {
				auto &Use = Uses[MBBNum];
				if (Use.empty())
				return false;
				return Use.test(Elt);
				};

				auto Assign = [&](TargetResultSet &Set, bool New) {
				if (Set.test(Elt) != New)
				Set.flip(Elt);
				};

				// Count how many times we visited a SCCLoop.
				DenseMap<const SCCLoop *, unsigned> SCCVisited;

				// PO traversal for anticipation computation. We want to handle the SCC only
				// when we reach the LAST component.
				for (const MachineBasicBlock *MBB : make_range(RPOT.rbegin(), RPOT.rend())) {
				unsigned MBBNum = MBB->getNumber();
				if (const SCCLoop *C = SI.getSCCLoopFor(MBB->getNumber())) {
				if (++SCCVisited[C] != C->getSize())
				continue;
				else
				MBBNum = C->getNumber();
				}

				SWAttributes &Attr = Attrs[MBBNum];

				// If the element is used in the block, or if it is anticipated in all
				// successors it is also anticipated at the beginning, since we consider
				// entire blocks.
				// -
				// ANTIN = APP || ANTOUT
				// -
				TargetResultSet &ANTINb = Attr.ANTIN;
				bool NewANTIN = UsesElt(MBBNum) || ANTOUT(Attrs, MBBNum, Elt);
				Assign(ANTINb, NewANTIN);
				}

				// Reuse the map.
				SCCVisited.clear();

				// RPO traversal for availability computation. We want to handle the SCC only
				// when we reach the FIRST component.
				for (const MachineBasicBlock *MBB : RPOT) {
				unsigned MBBNum = MBB->getNumber();
				if (const SCCLoop *C = SI.getSCCLoopFor(MBB->getNumber())) {
				if (++SCCVisited[C] != 1)
				continue;
				else
				MBBNum = C->getNumber();
				}

				SWAttributes &Attr = Attrs[MBBNum];

				// If the element is used in the block, or if it is always available in
				// all predecessors, it is also available on exit, since we consider
				// entire blocks.
				// -
				// AVOUT = APP || AVIN
				// -
				TargetResultSet &AVOUTb = Attr.AVOUT;
				bool NewAVOUT = UsesElt(MBBNum) || AVIN(Attrs, MBBNum, Elt);
				Assign(AVOUTb, NewAVOUT);
				}

				VERBOSE_DEBUG(dumpAttributes(Elt, Attrs));
				}

				bool ShrinkWrapper::hasCriticalEdges(unsigned Elt, AttributeMap &Attrs) {
				bool Needs = false;
				for (const MachineBasicBlock &MBB : MF) {
				bool IsSCCLoop = false;
				if (const SCCLoop *C = SI.getSCCLoopFor(MBB.getNumber())) {
				// Skip all the blocks that are not the number of the SCC, since all the
				// attributes are based on that number.
				if (static_cast<unsigned>(MBB.getNumber()) != C->getNumber())
				continue;
				else
				IsSCCLoop = true;
				}

				unsigned MBBNum = blockNumber(MBB.getNumber());
				// If the block is never returning, we won't bother saving / restoring.
				if (NoReturnBlocks.test(MBBNum))
				continue;

				SWAttributes &Attr = Attrs[MBBNum];
				// Check if this block is ANTIN and has an incoming critical edge where it
				// is not ANTIN. If it's the case, mark it as used, and recompute.
				if (Attr.ANTIN.test(Elt)) {
				auto Preds = blockPredecessors(MBBNum);
				// We're looking for at least 2 predecessors. Also, if it's a SCCLoop, it
				// has a predecessor that is itself.
				if (std::distance(Preds.begin(), Preds.end()) >= 2 || IsSCCLoop) {
				for (const MachineBasicBlock *P : Preds) {
				unsigned PredNum = blockNumber(P->getNumber());
				SWAttributes &Attr = Attrs[PredNum];
				TargetResultSet &ANTINp = Attr.ANTIN;
				if (!ANTINp.test(Elt)) {
				// FIXME: ShrinkWrap2: emit remark.
				VERBOSE_DEBUG(dbgs()
				<< "Incoming critical edge in " << MBBNum << ".\n");
				// Mark it as used.
				TargetResultSet &Used = Uses[PredNum];
				if (Used.empty())
				Used.resize(SWI->getNumResultBits());
				Used.set(Elt);

				// Also, mark it as ANTIN and AVOUT, since we're not calling
				// populateAttributes anymore.
				ANTINp.set(Elt);
				Attr.AVOUT.set(Elt);
				Needs = true;
				}
				}
				}
				}
				// Check if this block is AVOUT and has an outgoing critical edge where it
				// is not AVOUT. If it's the case, mark it as used, and recompute.
				if (Attr.AVOUT.test(Elt)) {
				auto Succs = blockSuccessors(MBBNum);
				// We're looking for at least 2 successors. Also, if it's a SCCLoop, it
				// has a successor that is itself.
				if (std::distance(Succs.begin(), Succs.end()) >= 2 || IsSCCLoop) {
				for (const MachineBasicBlock *S : Succs) {
				unsigned SuccNum = blockNumber(S->getNumber());
				SWAttributes &Attr = Attrs[SuccNum];
				TargetResultSet &AVOUTs = Attr.AVOUT;
				if (!AVOUTs.test(Elt)) {
				// FIXME: ShrinkWrap2: emit remark.
				VERBOSE_DEBUG(dbgs()
				<< "Outgoing critical edge in " << MBBNum << ".\n");
				// Mark it as used.
				TargetResultSet &Used = Uses[SuccNum];
				if (Used.empty())
				Used.resize(SWI->getNumResultBits());
				Used.set(Elt);

				// Also, mark it as AVOUT and ANTIN, since we're not calling
				// populateAttributes anymore.
				AVOUTs.set(Elt);
				Attr.ANTIN.set(Elt);
				Needs = true;
				}
				}
				}
				}
				}
				// Recompute if needed.
				return Needs;
				}

				void ShrinkWrapper::gatherAttributesResults(unsigned Elt, AttributeMap &Attrs) {
				for (const MachineBasicBlock &MBB : MF) {
				bool IsSCCLoop = false;
				if (const SCCLoop *C = SI.getSCCLoopFor(MBB.getNumber())) {
				// Skip all the blocks that are not the number of the SCC, since all the
				// attributes are based on that number.
				if (static_cast<unsigned>(MBB.getNumber()) != C->getNumber())
				continue;
				else
				IsSCCLoop = true;
				}

				unsigned MBBNum = blockNumber(MBB.getNumber());
				// If the block is never returning, we won't bother saving / restoring.
				if (NoReturnBlocks.test(MBBNum))
				continue;

				SWAttributes &Attr = Attrs[MBBNum];

				// If the uses are anticipated on all the paths leaving this block, and if
				// it is not available at the entry of this block (if it is, then it means
				// it has been saved already, but not restored), and if none of the
				// predecessors anticipates this element on their output (we want to get the
				// "highest" block), then we can identify a save point for the function.
				//
				// SAVE = ANTIN && !AVIN && !ANTIN(pred[i])
				//
				bool NS =
				none_of(blockPredecessors(MBBNum), [&](const MachineBasicBlock *P) {
				return Attrs[blockNumber(P->getNumber())].ANTIN.test(Elt);
				});
				if (NS && Attr.ANTIN.test(Elt) && !AVIN(Attrs, MBBNum, Elt)) {
				TargetResultSet &Save = Saves[MBBNum];
				if (Save.empty())
				Save.resize(SWI->getNumResultBits());
				Save.set(Elt);
				}

				// If the uses are available on all the paths leading to this block, and
				// if the element is not anticipated at the exit of this block (if it is,
				// then it means it has been restored already), and if none of the
				// successors make the element available (we want to cover the deepest
				// use), then we can identify a restore point for the function.
				//
				// RESTORE = AVOUT && !ANTOUT && !AVOUT(succ[i])
				//
				bool NR = none_of(blockSuccessors(MBBNum), [&](const MachineBasicBlock *S) {
				return Attrs[blockNumber(S->getNumber())].AVOUT.test(Elt);
				});
				if (NR && Attr.AVOUT.test(Elt) && !ANTOUT(Attrs, MBBNum, Elt)) {
				TargetResultSet &Restore = Restores[MBBNum];
				if (Restore.empty())
				Restore.resize(SWI->getNumResultBits());
				Restore.set(Elt);
				}
				}
				}
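
To make the SAVE / RESTORE equations above concrete, here is a minimal standalone sketch for a single element on an acyclic toy CFG. It assumes that ANTOUT means "ANTIN in all successors" and AVIN means "AVOUT in all predecessors" (both false when the list is empty); those helpers are not shown in this part of the diff, so that definition is an assumption. `ToyBlock`, `Placement` and `placeSaveRestore` are hypothetical names, and blocks are assumed to be numbered in topological order so that one pass per direction is enough.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy block for one tracked element (e.g. one CSR).
struct ToyBlock {
  std::vector<unsigned> Preds; // predecessor block indices
  std::vector<unsigned> Succs; // successor block indices
  bool Uses;                   // does this block use the element?
};

struct Placement {
  std::vector<bool> Save, Restore;
};

// True only if Ids is non-empty and Attr holds for every listed block.
static bool allOf(const std::vector<unsigned> &Ids,
                  const std::vector<bool> &Attr) {
  if (Ids.empty())
    return false;
  for (unsigned Id : Ids)
    if (!Attr[Id])
      return false;
  return true;
}

// True if Attr holds for at least one listed block.
static bool anyOf(const std::vector<unsigned> &Ids,
                  const std::vector<bool> &Attr) {
  for (unsigned Id : Ids)
    if (Attr[Id])
      return true;
  return false;
}

Placement placeSaveRestore(const std::vector<ToyBlock> &Blocks) {
  const std::size_t N = Blocks.size();
  std::vector<bool> ANTIN(N, false), AVOUT(N, false);

  // Backward pass: ANTIN = Uses || ANTOUT.
  for (std::size_t I = N; I-- > 0;) {
    for (unsigned S : Blocks[I].Succs) {
      assert(S > I && "blocks must be numbered in topological order");
      (void)S;
    }
    ANTIN[I] = Blocks[I].Uses || allOf(Blocks[I].Succs, ANTIN);
  }

  // Forward pass: AVOUT = Uses || AVIN.
  for (std::size_t I = 0; I < N; ++I)
    AVOUT[I] = Blocks[I].Uses || allOf(Blocks[I].Preds, AVOUT);

  Placement P{std::vector<bool>(N, false), std::vector<bool>(N, false)};
  for (std::size_t I = 0; I < N; ++I) {
    const ToyBlock &B = Blocks[I];
    bool AVIN = allOf(B.Preds, AVOUT);
    bool ANTOUT = allOf(B.Succs, ANTIN);
    // SAVE = ANTIN && !AVIN && !ANTIN(pred[i])
    P.Save[I] = ANTIN[I] && !AVIN && !anyOf(B.Preds, ANTIN);
    // RESTORE = AVOUT && !ANTOUT && !AVOUT(succ[i])
    P.Restore[I] = AVOUT[I] && !ANTOUT && !anyOf(B.Succs, AVOUT);
  }
  return P;
}
```

On a diamond CFG (0 branches to 1 and 2, both join in 3) where only block 1 uses the element, the sketch places both the save and the restore in block 1. If blocks 1 and 2 both use it, the save moves up to block 0 and the restore down to block 3.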

				void ShrinkWrapper::dumpAttributes(unsigned Elt,
				const AttributeMap &Attrs) const {
				for (const MachineBasicBlock &MBB : MF) {
				unsigned MBBNum = MBB.getNumber();
				if (const SCCLoop *C = SI.getSCCLoopFor(MBBNum))
				if (MBBNum != C->getNumber())
				continue;
				const SWAttributes &Attr = Attrs[MBBNum];
				dbgs() << "BB#" << MBBNum << "<";
				SWI->printElt(Elt, dbgs());
				dbgs() << ">"
				<< ":\n\tANTOUT : " << ANTOUT(Attrs, MBBNum, Elt) << '\n'
				<< "\tANTIN : " << Attr.ANTIN.test(Elt) << '\n'
				<< "\tAVIN : " << AVIN(Attrs, MBBNum, Elt) << '\n'
				<< "\tAVOUT : " << Attr.AVOUT.test(Elt) << '\n';
				}
				}

				void ShrinkWrapper::postProcessResults(const BBResultSetMap &OldUses) {
				// If there is only one use of the element, and multiple saves / restores,
				// remove them and place the save / restore at the used MBB's boundaries.
				for (unsigned Elt : AllElts.set_bits()) {
				// FIXME: ShrinkWrap2: 2x std::find_if.
				auto HasElt = [&](const TargetResultSet &Res) {
				return Res.empty() ? false : Res.test(Elt);
				};
				auto Found1 = find_if(OldUses, HasElt);
				auto Found2 = Found1 == OldUses.end()
				? Found1
				: std::find_if(std::next(Found1), OldUses.end(), HasElt);
				if (Found1 != OldUses.end() && Found2 == OldUses.end()) {
				// Gather all the saves.
				MBBSet SavesElt(MF.getNumBlockIDs());
				for (auto &KV : Saves) {
				unsigned MBBNum = KV.first;
				const TargetResultSet &Elts = KV.second;
				if (Elts.test(Elt))
				SavesElt.set(MBBNum);
				}

				// Gather all the restores.
				MBBSet RestoresElt(MF.getNumBlockIDs());
				for (auto &KV : Restores) {
				unsigned MBBNum = KV.first;
				const TargetResultSet &Elts = KV.second;
				if (Elts.test(Elt))
				RestoresElt.set(MBBNum);
				}

				// If we only have a single save and a single restore, keep it that way.
				if (SavesElt.count() == 1 && RestoresElt.count() == 1)
				continue;

				// Remove saves and restores from the maps.
				for (unsigned MBBNum : SavesElt.set_bits())
				Saves[MBBNum].reset(Elt);
				for (unsigned MBBNum : RestoresElt.set_bits())
				Restores[MBBNum].reset(Elt);

				// Add it to the unique block that uses it.
				unsigned MBBNum = std::distance(OldUses.begin(), Found1);
				for (auto *Map : {&Saves, &Restores}) {
				TargetResultSet &Elts = (*Map)[MBBNum];
				if (Elts.empty())
				Elts.resize(SWI->getNumResultBits());
				Elts.set(Elt);
				}
				}
				}

				// Remove all the empty entries from the Saves / Restores maps.
				// FIXME: ShrinkWrap2: Should we even have empty entries?
				SmallVector<SparseBBResultSetMap::iterator, 4> ToRemove;
				for (auto *Map : {&Saves, &Restores}) {
				for (auto It = Map->begin(), End = Map->end(); It != End; ++It)
				if (It->second.count() == 0)
				ToRemove.push_back(It);
				for (auto It : ToRemove)
				Map->erase(It);
				ToRemove.clear();
				}
				}

				unsigned ShrinkWrapper::computeShrinkWrappingCost(
				MachineBlockFrequencyInfo *MBFI) const {
				unsigned Cost = 0;
				for (const MachineBasicBlock &MBB : MF) {
				unsigned BlockCost = 0;
				for (auto *Map : {&Saves, &Restores}) {
				auto Found = Map->find(MBB.getNumber());
				if (Found != Map->end())
				BlockCost += Found->second.count();
				}
				auto Frequency =
				static_cast<double>(MBFI->getBlockFreq(&MBB).getFrequency()) /
				MBFI->getEntryFreq();
				Cost += BlockCost * Frequency * 100;
				}
				return Cost;
				}

				unsigned
				ShrinkWrapper::computeDefaultCost(MachineBlockFrequencyInfo *MBFI) const {
				unsigned Cost = 0;
				for (const MachineBasicBlock &MBB : MF) {
				unsigned BlockCost =
				&MBB == &MF.front() || MBB.isReturnBlock() ? AllElts.count() : 0;
				auto Frequency =
				static_cast<double>(MBFI->getBlockFreq(&MBB).getFrequency()) /
				MBFI->getEntryFreq();
				Cost += BlockCost * Frequency * 100;
				}
				return Cost;
				}
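
The two cost functions above weight each save / restore by the block's frequency relative to the entry block. A minimal standalone sketch of that weighting follows; `BlockCostInput` and `placementCost` are hypothetical names, plain integers stand in for `MachineBlockFrequencyInfo`, and the sketch truncates per block, which may round slightly differently from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-block input: a frequency in the same unit as the entry
// frequency, plus the number of saves and restores placed in the block.
struct BlockCostInput {
  std::uint64_t Frequency;
  unsigned NumSaves;
  unsigned NumRestores;
};

// Sum of (saves + restores) * relative frequency * 100 over all blocks.
unsigned placementCost(const std::vector<BlockCostInput> &Blocks,
                       std::uint64_t EntryFreq) {
  assert(EntryFreq != 0 && "entry frequency must be non-zero");
  unsigned Cost = 0;
  for (const BlockCostInput &B : Blocks) {
    unsigned BlockCost = B.NumSaves + B.NumRestores;
    double Relative =
        static_cast<double>(B.Frequency) / static_cast<double>(EntryFreq);
    Cost += static_cast<unsigned>(BlockCost * Relative * 100);
  }
  return Cost;
}
```

With three CSRs saved and restored around a block that runs a quarter as often as the entry, the shrink-wrapped placement is cheaper than saving in the entry block and restoring in the return block, which is the comparison areResultsInteresting relies on.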

				void ShrinkWrapper::verifySavesRestores() const {
				auto HasElt = [&](const SparseBBResultSetMap &Map, unsigned Elt) {
				return find_if(Map, [&](const std::pair<unsigned, TargetResultSet> &KV) {
				return KV.second.test(Elt);
				}) != Map.end();
				};

				auto RestoresElt = [&](unsigned Elt) { return HasElt(Restores, Elt); };
				auto SavesElt = [&](unsigned Elt) { return HasElt(Saves, Elt); };

				// Check that all the CSRs used in the function are saved at least once.
				for (unsigned Elt : AllElts.set_bits())
				if (!SavesElt(Elt) && !RestoresElt(Elt))
				llvm_unreachable("Used CSR is never saved!");

				// Check that there are no saves / restores in a loop.
				for (const SparseBBResultSetMap *Map : {&Saves, &Restores})
				for (auto &KV : *Map)
				if (SI.getSCCLoopFor(KV.first))
				llvm_unreachable("Save / restore in a loop.");

				// Keep track of the currently saved elements.
				TargetResultSet Saved(SWI->getNumResultBits());
				// Cache the state of each call, to avoid redundant checks.
				std::vector<SmallVector<TargetResultSet, 2>> Cache(MF.getNumBlockIDs());

				// Verify if:
				// * All the saves are restored.
				// * All the restores are related to a save.
				// * There are no nested saves.
				std::function<void(const MachineBasicBlock *)> verifySavesRestoresRec =
				[&](const MachineBasicBlock *MBB) {
				unsigned MBBNum = MBB->getNumber();
				// Don't even check no-return blocks.
				if (MBB->succ_empty() && !MBB->isReturnBlock()) {
				VERBOSE_DEBUG(dbgs() << "IN: BB#" << MBBNum << " is a no-return block\n");
				return;
				}

				SmallVectorImpl<TargetResultSet> &State = Cache[MBBNum];
				if (find(State, Saved) != State.end()) {
				VERBOSE_DEBUG(dbgs() << "IN: BB#" << MBBNum << " already visited.\n");
				return;
				}

				State.push_back(Saved);

				VERBOSE_DEBUG(dbgs() << "IN: BB#" << MBBNum << ": Save ";
				for (unsigned Elt
				: Saved.set_bits()) {
				SWI->printElt(Elt, dbgs());
				dbgs() << " ";
				} dbgs()
				<< '\n');

				const TargetResultSet &SavesMBB = Saves.lookup(MBBNum);
				const TargetResultSet &RestoresMBB = Restores.lookup(MBBNum);

				// Get the intersection of the currently saved elements and the
				// elements to be saved for this basic block. If the intersection is
				// not empty, it means we have nested saves for the same elements.
				TargetResultSet Intersection(SavesMBB);
				Intersection &= Saved;

				DEBUG(for (unsigned Elt
				: Intersection.set_bits()) {
				SWI->printElt(Elt, dbgs());
				dbgs() << " is saved twice.\n";
				});

				assert(Intersection.count() == 0 &&
				"Nested saves for the same elements.");
				Intersection.reset();

				// Save the elements to be saved.
				for (unsigned Elt : SavesMBB.set_bits()) {
				Saved.set(Elt);
				VERBOSE_DEBUG(dbgs() << "IN: BB#" << MBBNum << ": Save ";
				SWI->printElt(Elt, dbgs()); dbgs() << ".\n");
				}

				// If the intersection of the currently saved elements and the
				// elements to be restored for this basic block is not equal to the
				// restores, it means we are trying to restore something that is not
				// saved.
				Intersection = RestoresMBB;
				Intersection &= Saved;

				assert(Intersection.count() == RestoresMBB.count() &&
				"Not all restores are saved.");

				// Restore the elements to be restored.
				for (int Elt : RestoresMBB.set_bits()) {
				Saved.reset(Elt);
				VERBOSE_DEBUG(dbgs() << "IN: BB#" << MBBNum << ": Restore ";
				SWI->printElt(Elt, dbgs()); dbgs() << ".\n");
				}

				if (MBB->succ_empty() && Saved.count() != 0)
				llvm_unreachable("Not all saves are restored.");

				// Using the current set of saved elements, walk all the successors
				// recursively.
				for (MachineBasicBlock *Succ : MBB->successors())
				verifySavesRestoresRec(Succ);

				// Restore the state prior to the function exit.
				for (unsigned Elt : RestoresMBB.set_bits()) {
				Saved.set(Elt);
				VERBOSE_DEBUG(dbgs() << "OUT: BB#" << MBBNum << ": Save ";
				SWI->printElt(Elt, dbgs()); dbgs() << ".\n");
				}
				for (unsigned Elt : SavesMBB.set_bits()) {
				Saved.reset(Elt);
				VERBOSE_DEBUG(dbgs() << "OUT: BB#" << MBBNum << ": Restore ";
				SWI->printElt(Elt, dbgs()); dbgs() << ".\n");
				}
				};

				verifySavesRestoresRec(&MF.front());
				}
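
The path-sensitive check done by verifySavesRestores can be illustrated with a minimal standalone sketch for a single element. It walks every (block, saved-state) pair once and rejects a nested save, a restore without a save, or an exit with a save still pending. `VBlock` and `verifyPlacement` are hypothetical names; unlike the patch, the sketch treats every block without successors as a return block and reports a boolean instead of asserting.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy block: successors plus save / restore flags for one element.
struct VBlock {
  std::vector<unsigned> Succs;
  bool Save;
  bool Restore;
};

// Returns true when, on every path from block 0, the element is never saved
// twice, never restored without a save, and never left saved at an exit.
bool verifyPlacement(const std::vector<VBlock> &Blocks) {
  assert(!Blocks.empty() && "need at least an entry block");
  // Seen[B][S] records that block B was already entered with saved-state S.
  std::vector<std::array<bool, 2>> Seen(Blocks.size(),
                                        std::array<bool, 2>{{false, false}});
  struct Item {
    unsigned Block;
    bool Saved;
  };
  std::vector<Item> Stack;
  Stack.push_back({0, false});

  while (!Stack.empty()) {
    Item It = Stack.back();
    Stack.pop_back();
    if (Seen[It.Block][It.Saved])
      continue; // This state was already checked for this block.
    Seen[It.Block][It.Saved] = true;

    const VBlock &B = Blocks[It.Block];
    bool Saved = It.Saved;
    if (B.Save) {
      if (Saved)
        return false; // Nested save.
      Saved = true;
    }
    if (B.Restore) {
      if (!Saved)
        return false; // Restore without a matching save.
      Saved = false;
    }
    if (B.Succs.empty() && Saved)
      return false; // Exit reached with the element still saved.
    for (unsigned S : B.Succs)
      Stack.push_back({S, Saved});
  }
  return true;
}
```

Caching the visited (block, state) pairs plays the same role as the `Cache` vector in the patch: it keeps the walk finite on CFGs with cycles and avoids re-checking a block that was already entered with the same set of saved elements.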

				void ShrinkWrapper::emitRemarks(MachineOptimizationRemarkEmitter *ORE,
				MachineBlockFrequencyInfo *MBFI) const {
				unsigned Cost = computeShrinkWrappingCost(MBFI);
				unsigned DefaultCost = computeDefaultCost(MBFI);
				int Improvement = DefaultCost - Cost;
				MachineOptimizationRemarkAnalysis R(DEBUG_TYPE, "ShrinkWrapped", {},
				&MF.front());
				R << "Shrink-wrapped function with cost " << ore::NV("ShrinkWrapCost", Cost)
				<< " which is " << ore::NV("ShrinkWrapCostImprovement", Improvement)
				<< " better than "
				<< ore::NV("OriginalShrinkWrapCost", DefaultCost)
				<< ", during which attributes were recomputed "
				<< ore::NV("ShrinkWrapRecomputed", AttributesRecomputed) << " times.";
				ORE->emit(R);
				}

				bool ShrinkWrapper::areResultsInteresting(
				MachineBlockFrequencyInfo *MBFI) const {
				if (!hasUses())
				return false;
				if (Saves.size() == 1) { // If we have only one save,
				unsigned MBBNum = Saves.begin()->first;
				unsigned FrontMBBNum = MF.front().getNumber();
				const TargetResultSet &EltsSaved = Saves.begin()->second;
				if (MBBNum == FrontMBBNum // and the save is in the entry block,
				&& EltsSaved == AllElts) { // and it saves ALL the CSRs
				DEBUG(dbgs() << "No shrink-wrapping performed, all saves in the entry "
				"block.\n";);
				return false; // then it's not interesting.
				}
				}

				// If the cost with shrink wrapping is better than the default, use it.
				unsigned Cost = computeShrinkWrappingCost(MBFI);
				unsigned DefaultCost = computeDefaultCost(MBFI);
				if (Cost >= DefaultCost)
				DEBUG(dbgs() << "No shrink-wrapping performed. ShrinkWrapCost: " << Cost
				<< ", DefaultCost: " << DefaultCost << '\n');
				return Cost < DefaultCost;
				}

				void ShrinkWrapper::dumpResults() const {
				for (unsigned MBBNum = 0; MBBNum < MF.getNumBlockIDs(); ++MBBNum) {
				if (Saves.count(MBBNum) || Restores.count(MBBNum)) {
				DEBUG(dbgs() << "BB#" << MBBNum << ": Saves: ");
				auto Save = Saves.lookup(MBBNum);
				DEBUG(for (unsigned Elt
				: Save.set_bits()) {
				SWI->printElt(Elt, dbgs());
				dbgs() << ", ";
				});
				DEBUG(dbgs() << "| Restores: ");
				auto Restore = Restores.lookup(MBBNum);
				DEBUG(for (unsigned Elt
				: Restore.set_bits()) {
				SWI->printElt(Elt, dbgs());
				dbgs() << ", ";
				});

				DEBUG(dbgs() << '\n');
				}
				}
				}

				ShrinkWrapper::ShrinkWrapper(const MachineFunction &MF)
				: ShrinkWrapper(
				MF,
				MF.getSubtarget().getFrameLowering()->createCSRShrinkWrapInfo(MF)) {}

				ShrinkWrapper::ShrinkWrapper(const MachineFunction &MF,
				std::unique_ptr<ShrinkWrapInfo> SW)
				: MF(MF), Uses(MF.getNumBlockIDs()), SWI(std::move(SW)), SI(MF) {
				DEBUG(dbgs() << "**** Analysing " << MF.getName() << '\n');

				if (ViewCFGDebug == cl::BOU_TRUE)
				MF.viewCFGOnly();

				VERBOSE_DEBUG(for (auto &SCC
				: SI.SCCs) {
				dbgs() << "SCCLoop: " << SCC.getNumber() << "\n Pred: ";
				for (auto *Pred : SCC.Predecessors)
				dbgs() << Pred->getNumber() << ", ";
				dbgs() << "\n Succ: ";
				for (auto *Succ : SCC.Successors)
				dbgs() << Succ->getNumber() << ", ";
				dbgs() << '\n';
				});

				// FIXME: ShrinkWrap2: Remove. Call SWI directly.
				determineUses();
				if (!hasUses())
				return;

				DEBUG(dumpUses());

				// Don't bother saving if we know we're never going to return.
				removeUsesOnNoReturnPaths();
				// FIXME: ShrinkWrap2: Check if there are any modifications before printing.
				DEBUG(dbgs() << "**** After removing uses on no-return paths\n";);
				DEBUG(dumpUses());

				markUsesOutsideLoops();
				// FIXME: ShrinkWrap2: Check if there are any modifications before printing.
				DEBUG(dbgs() << "**** After marking uses inside loops\n";);
				DEBUG(dumpUses());

				// FIXME: ShrinkWrap2: Find a better way to avoid treating added CSRs the same
				// as original ones. This is needed for postProcessResults.
				// FIXME: ShrinkWrap2: Probably just save / restore once per block if there
				// is only one register from the beginning.
				auto OldUses = Uses;

				AllElts.resize(SWI->getNumResultBits());
				for (const auto &Use : Uses)
				AllElts |= Use;

				auto &EntryUses = Uses[MF.front().getNumber()];

				// Compute the dataflow attributes described by Fred C. Chow.
				AttributeMap Attrs;
				// Reserve + emplace_back to avoid copies of empty bitvectors.
				unsigned Max = MF.getNumBlockIDs();
				Attrs.reserve(Max);
				for (unsigned i = 0; i < Max; ++i)
				Attrs.emplace_back(*SWI);
				// For each register, compute the dataflow attributes.
				// FIXME: ShrinkWrap2: Compute all elements at once.
				ReversePostOrderTraversal<const MachineFunction *> RPOT(&MF);
				for (unsigned Elt : AllElts.set_bits()) {
				// If it's used in the entry block, don't even compute it. We know the
				// results already.
				if (!EntryUses.empty() && EntryUses.test(Elt))
				continue;
				// Compute the attributes.
				computeAttributes(Elt, Attrs, RPOT);

				// If we detected critical edges, compute again.
				while (hasCriticalEdges(Elt, Attrs)) {
				++AttributesRecomputed;
				computeAttributes(Elt, Attrs, RPOT);
				}

				gatherAttributesResults(Elt, Attrs);
				VERBOSE_DEBUG(dumpResults());
				}

				VERBOSE_DEBUG(dbgs() << "**** Analysis results\n";);
				VERBOSE_DEBUG(dumpResults());

				if (!EntryUses.empty()) {
				Saves[MF.front().getNumber()] |= EntryUses;
				for (const MachineBasicBlock &MBB : MF) {
				// FIXME: ShrinkWrap2: EHFuncletEntry.
				if (MBB.isReturnBlock())
				Restores[MBB.getNumber()] |= EntryUses;
				}
				}
				postProcessResults(OldUses);

				DEBUG(dbgs() << "**** Shrink-wrapping results\n");
				// FIXME: ShrinkWrap2: Check if there are any modifications before printing.
				DEBUG(dumpResults());

				// FIXME: ShrinkWrap2: Remove NDEBUG.
				#if !defined(NDEBUG) || defined(EXPENSIVE_CHECKS)
				verifySavesRestores();
				#endif // EXPENSIVE_CHECKS
				}
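
The decision in `areResultsInteresting` comes down to comparing two frequency-weighted costs. The sketch below shows one way such a comparison could work. It is not the patch's implementation: `computeShrinkWrappingCost` and `computeDefaultCost` are not shown in this part of the diff, so the cost model (block frequency times the number of elements saved or restored) and the names `PlacementMap`, `placementCost` and `isWorthShrinkWrapping` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for the pass's per-block save/restore sets:
// block number -> number of elements saved or restored in that block.
using PlacementMap = std::map<unsigned, unsigned>;

// Frequency-weighted cost of a placement (assumed model): every save or
// restore is charged the frequency of the block it is placed in.
inline uint64_t placementCost(const PlacementMap &Saves,
                              const PlacementMap &Restores,
                              const std::vector<uint64_t> &BlockFreq) {
  uint64_t Cost = 0;
  for (const auto &KV : Saves)
    Cost += BlockFreq[KV.first] * KV.second;
  for (const auto &KV : Restores)
    Cost += BlockFreq[KV.first] * KV.second;
  return Cost;
}

// Same comparison as the last line of areResultsInteresting: the
// shrink-wrapped placement is kept only when it is strictly cheaper.
inline bool isWorthShrinkWrapping(uint64_t Cost, uint64_t DefaultCost) {
  return Cost < DefaultCost;
}
```

With a hot entry block, a cold middle block and a hot return block, moving two saves and two restores into the cold block gives a much lower cost under this model than saving in the entry block and restoring in the return block, so the placement would be kept.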

lib/CodeGen/TargetPassConfig.cpp

[33 lines not shown]
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetMachine.h"		#include "llvm/Target/TargetMachine.h"
#include "llvm/Transforms/Instrumentation.h"		#include "llvm/Transforms/Instrumentation.h"
#include "llvm/Transforms/Scalar.h"		#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/Utils/SymbolRewriter.h"		#include "llvm/Transforms/Utils/SymbolRewriter.h"

using namespace llvm;		using namespace llvm;

		// FIXME: ShrinkWrap2: Keep the second one only. Move it from TPC when we
		// decide that ShrinkWrapping is no longer a pass.
		extern cl::opt<cl::boolOrDefault> EnableShrinkWrap2Opt;
		static cl::opt<int> ShrinkWrapPass("shrink-wrap-pass", cl::init(2), cl::Hidden,
		cl::desc("Choose shrink-wrap-pass to use"));
static cl::opt<bool> DisablePostRASched("disable-post-ra", cl::Hidden,		static cl::opt<bool> DisablePostRASched("disable-post-ra", cl::Hidden,
cl::desc("Disable Post Regalloc Scheduler"));		cl::desc("Disable Post Regalloc Scheduler"));
static cl::opt<bool> DisableBranchFold("disable-branch-fold", cl::Hidden,		static cl::opt<bool> DisableBranchFold("disable-branch-fold", cl::Hidden,
cl::desc("Disable branch folding"));		cl::desc("Disable branch folding"));
static cl::opt<bool> DisableTailDuplicate("disable-tail-duplicate", cl::Hidden,		static cl::opt<bool> DisableTailDuplicate("disable-tail-duplicate", cl::Hidden,
cl::desc("Disable tail duplication"));		cl::desc("Disable tail duplication"));
static cl::opt<bool> DisableEarlyTailDup("disable-early-taildup", cl::Hidden,		static cl::opt<bool> DisableEarlyTailDup("disable-early-taildup", cl::Hidden,
cl::desc("Disable pre-register allocation tail duplication"));		cl::desc("Disable pre-register allocation tail duplication"));
[664 lines not shown]	if (RegAlloc != &useDefaultRegisterAllocator &&
report_fatal_error("Must use fast (default) register allocator for unoptimized regalloc.");		report_fatal_error("Must use fast (default) register allocator for unoptimized regalloc.");
addFastRegAlloc(createRegAllocPass(false));		addFastRegAlloc(createRegAllocPass(false));
}		}

// Run post-ra passes.		// Run post-ra passes.
addPostRegAlloc();		addPostRegAlloc();

// Insert prolog/epilog code. Eliminate abstract frame index references...		// Insert prolog/epilog code. Eliminate abstract frame index references...
if (getOptLevel() != CodeGenOpt::None)		if (getOptLevel() != CodeGenOpt::None && ShrinkWrapPass == 1) {
addPass(&ShrinkWrapID);		addPass(&ShrinkWrapID);
		EnableShrinkWrap2Opt = cl::BOU_FALSE;
		}

// Prolog/Epilog inserter needs a TargetMachine to instantiate. But only		// Prolog/Epilog inserter needs a TargetMachine to instantiate. But only
// do so if it hasn't been disabled, substituted, or overridden.		// do so if it hasn't been disabled, substituted, or overridden.
if (!isPassSubstitutedOrOverridden(&PrologEpilogCodeInserterID))		if (!isPassSubstitutedOrOverridden(&PrologEpilogCodeInserterID))
addPass(createPrologEpilogInserterPass());		addPass(createPrologEpilogInserterPass());

/// Add passes that optimize machine instructions after register allocation.		/// Add passes that optimize machine instructions after register allocation.
if (getOptLevel() != CodeGenOpt::None)		if (getOptLevel() != CodeGenOpt::None)
[293 lines not shown]

lib/Target/AArch64/AArch64FrameLowering.h

[30 lines not shown]	public:
eliminateCallFramePseudoInstr(MachineFunction &MF, MachineBasicBlock &MBB,		eliminateCallFramePseudoInstr(MachineFunction &MF, MachineBasicBlock &MBB,
MachineBasicBlock::iterator I) const override;		MachineBasicBlock::iterator I) const override;

/// emitProlog/emitEpilog - These methods insert prolog and epilog code into		/// emitProlog/emitEpilog - These methods insert prolog and epilog code into
/// the function.		/// the function.
void emitPrologue(MachineFunction &MF, MachineBasicBlock &MBB) const override;		void emitPrologue(MachineFunction &MF, MachineBasicBlock &MBB) const override;
void emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const override;		void emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const override;

		// FIXME: ShrinkWrap2: Delay the computation of NumRegsSpilled.
		bool
		assignCalleeSavedSpillSlots(MachineFunction &MF,
		const TargetRegisterInfo *TRI,
		std::vector<CalleeSavedInfo> &CSI) const override;

bool canUseAsPrologue(const MachineBasicBlock &MBB) const override;		bool canUseAsPrologue(const MachineBasicBlock &MBB) const override;

int getFrameIndexReference(const MachineFunction &MF, int FI,		int getFrameIndexReference(const MachineFunction &MF, int FI,
unsigned &FrameReg) const override;		unsigned &FrameReg) const override;
int resolveFrameIndexReference(const MachineFunction &MF, int FI,		int resolveFrameIndexReference(const MachineFunction &MF, int FI,
unsigned &FrameReg,		unsigned &FrameReg,
bool PreferFP = false) const;		bool PreferFP = false) const;
bool spillCalleeSavedRegisters(MachineBasicBlock &MBB,		bool spillCalleeSavedRegisters(MachineBasicBlock &MBB,
[17 lines not shown]	public:

/// Returns true if the target will correctly handle shrink wrapping.		/// Returns true if the target will correctly handle shrink wrapping.
bool enableShrinkWrapping(const MachineFunction &MF) const override {		bool enableShrinkWrapping(const MachineFunction &MF) const override {
return true;		return true;
}		}

bool enableStackSlotScavenging(const MachineFunction &MF) const override;		bool enableStackSlotScavenging(const MachineFunction &MF) const override;

		// FIXME: ShrinkWrap2: We need this to call computeCalleeSaveRegisterPairs
		// before we spill them.
		void
		processValidCalleeSavedInfo(MachineFunction &MF,
		const TargetRegisterInfo *TRI,
		std::vector<CalleeSavedInfo> &CSI) const override;

		std::unique_ptr<ShrinkWrapInfo>
		createCSRShrinkWrapInfo(const MachineFunction &MF) const override;

private:		private:
bool shouldCombineCSRLocalStackBump(MachineFunction &MF,		bool shouldCombineCSRLocalStackBump(MachineFunction &MF,
unsigned StackBumpBytes) const;		unsigned StackBumpBytes) const;
};		};

} // End llvm namespace		} // End llvm namespace

#endif		#endif

lib/Target/AArch64/AArch64FrameLowering.cpp

[131 lines not shown]
#define DEBUG_TYPE "frame-info"		#define DEBUG_TYPE "frame-info"

static cl::opt<bool> EnableRedZone("aarch64-redzone",		static cl::opt<bool> EnableRedZone("aarch64-redzone",
cl::desc("enable use of redzone on AArch64"),		cl::desc("enable use of redzone on AArch64"),
cl::init(false), cl::Hidden);		cl::init(false), cl::Hidden);

STATISTIC(NumRedZoneFunctions, "Number of functions using red zone");		STATISTIC(NumRedZoneFunctions, "Number of functions using red zone");

		static bool produceCompactUnwindFrame(const MachineFunction &MF);
		static unsigned estimateRSStackSizeLimit(MachineFunction &MF);

		class AArch64CSRShrinkWrapInfo final : public ShrinkWrapInfo {
		/// Number of bits the result needs.
		unsigned NumCSRs = 0;

		public:
		unsigned getNumResultBits() const override { return NumCSRs; }

		AArch64CSRShrinkWrapInfo(const MachineFunction &MF) : ShrinkWrapInfo(MF) {

		// All calls are tail calls in GHC calling conv, and functions have no
		// prologue/epilogue.
		if (MF.getFunction()->getCallingConv() == CallingConv::GHC)
		return;

		const AArch64RegisterInfo *RegInfo =
		static_cast<const AArch64RegisterInfo *>(
		MF.getSubtarget().getRegisterInfo());
		const MCPhysReg *CSRegs = RegInfo->getCalleeSavedRegs(&MF);
		// Count the number of CSRs.
		for (unsigned i = 0; CSRegs[i]; ++i)
		++NumCSRs;

		determineCSRUses();

		// FIXME: ShrinkWrap2: This is a duplicate of determineCalleeSaves. We
		// should split this into multiple functions, and remove all the side
		// effects from here.
		auto AFI =
		const_cast<AArch64FunctionInfo *>(MF.getInfo<AArch64FunctionInfo>());
		unsigned UnspilledCSGPR = AArch64::NoRegister;
		unsigned UnspilledCSGPRIdx = static_cast<unsigned>(-1);
		unsigned UnspilledCSGPRPaired = AArch64::NoRegister;
		unsigned UnspilledCSGPRPairedIdx = static_cast<unsigned>(-1);

		// FIXME: ShrinkWrap2: This should be available later somehow.
		BitVector SavedRegs(getNumResultBits());
		for (BitVector &BV : Uses)
		SavedRegs |= BV;

		auto *EntrySaves = &Uses[MF.front().getNumber()];
		if (EntrySaves->empty())
		EntrySaves->resize(getNumResultBits());

		const TargetFrameLowering *TFI = MF.getSubtarget().getFrameLowering();
		// The frame record needs to be created by saving the appropriate registers
		if (TFI->hasFP(MF)) {
		// The frame pointer needs to be used in the entry and return of a
		// function, to prevent optimizations.
		EntrySaves->set(AArch64::FP);
		SavedRegs.set(AArch64::FP);
		for (const MachineBasicBlock &MBB : MF) {
		if (MBB.isReturnBlock()) {
		BitVector &Use = Uses[MBB.getNumber()];
		if (Use.empty())
		Use.resize(getNumResultBits());
		Use.set(AArch64::FP);
		EntrySaves = &Uses[MF.front().getNumber()];
		}
		}
		// FIXME: ShrinkWrap2: Should we let LR be shrink-wrapped?
		// EntrySaves.set(AArch64::LR);
		// SavedRegs.set(AArch64::LR);
		}

		unsigned BasePointerReg = AArch64::NoRegister;
		if (RegInfo->hasBasePointer(MF))
		BasePointerReg = RegInfo->getBaseRegister();

		unsigned ExtraCSSpill = 0;
		// Figure out which callee-saved registers to save/restore.
		for (unsigned i = 0; CSRegs[i]; ++i) {
		const unsigned Reg = CSRegs[i];
		const unsigned RegIdx = i;

		// Add the base pointer register to SavedRegs if it is callee-save.
		if (Reg == BasePointerReg) {
		EntrySaves->set(RegIdx);
		SavedRegs.set(RegIdx);
		// FIXME: ShrinkWrap2: gather the return blocks and re-use them.
		for (const MachineBasicBlock &MBB : MF) {
		if (MBB.isReturnBlock()) {
		BitVector &Use = Uses[MBB.getNumber()];
		if (Use.empty())
		Use.resize(getNumResultBits());
		Use.set(RegIdx);
		EntrySaves = &Uses[MF.front().getNumber()];
		}
		}
		}

		bool RegUsed = SavedRegs.test(RegIdx);
		unsigned PairedReg = CSRegs[i ^ 1];
		unsigned PairedRegIdx = i ^ 1;
		if (!RegUsed) {
		if (AArch64::GPR64RegClass.contains(Reg) &&
		!RegInfo->isReservedReg(MF, Reg)) {
		UnspilledCSGPR = Reg;
		UnspilledCSGPRIdx = RegIdx;
		UnspilledCSGPRPaired = PairedReg;
		UnspilledCSGPRPairedIdx = PairedRegIdx;
		}
		continue;
		}

		// MachO's compact unwind format relies on all registers being stored in
		// pairs.
		// FIXME: the usual format is actually better if unwinding isn't needed.
		// FIXME: ShrinkWrap2: don't check if the paired register is saved if it's
		// not a callee save. This can happen if we have an odd number of CSRs
		// (like MostRegsCC).
		if (produceCompactUnwindFrame(MF) && PairedRegIdx < NumCSRs &&
		!SavedRegs.test(PairedRegIdx)) {
		SavedRegs.set(PairedRegIdx);
		if (AArch64::GPR64RegClass.contains(PairedReg) &&
		!RegInfo->isReservedReg(MF, PairedReg))
		ExtraCSSpill = PairedReg;
		}
		}

		DEBUG(dbgs() << "*** determineCalleeSaves\nUsed CSRs:";
		for (int RegIdx
		: SavedRegs.set_bits()) dbgs()
		<< ' ' << PrintReg(CSRegs[RegIdx], RegInfo);
		dbgs() << "\n";);

		// If any callee-saved registers are used, the frame cannot be eliminated.
		unsigned NumRegsSpilled = SavedRegs.count();
		bool CanEliminateFrame = NumRegsSpilled == 0;

		// The CSR spill slots have not been allocated yet, so estimateStackSize
		// won't include them.
		MachineFrameInfo &MFI = const_cast<MachineFrameInfo &>(MF.getFrameInfo());
		unsigned CFSize = MFI.estimateStackSize(MF) + 8 * NumRegsSpilled;
		DEBUG(dbgs() << "Estimated stack frame size: " << CFSize << " bytes.\n");
		unsigned EstimatedStackSizeLimit =
		estimateRSStackSizeLimit(const_cast<MachineFunction &>(MF));
		bool BigStack = (CFSize > EstimatedStackSizeLimit);
		if (BigStack || !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF))
		AFI->setHasStackFrame(true);

		// Estimate if we might need to scavenge a register at some point in order
		// to materialize a stack offset. If so, either spill one additional
		// callee-saved register or reserve a special spill slot to facilitate
		// register scavenging. If we already spilled an extra callee-saved register
		// above to keep the number of spills even, we don't need to do anything
		// else here.
		if (BigStack) {
		if (!ExtraCSSpill && UnspilledCSGPR != AArch64::NoRegister) {
		DEBUG(dbgs() << "Spilling " << PrintReg(UnspilledCSGPR, RegInfo)
		<< " to get a scratch register.\n");
		EntrySaves->set(UnspilledCSGPRIdx);
		// FIXME: ShrinkWrap2: Mark it in the return blocks too.
		SavedRegs.set(UnspilledCSGPRIdx);
		// MachO's compact unwind format relies on all registers being stored in
		// pairs, so if we need to spill one extra for BigStack, then we need to
		// store the pair.
		if (produceCompactUnwindFrame(MF)) {
		EntrySaves->set(UnspilledCSGPRPairedIdx);
		// FIXME: ShrinkWrap2: Mark it in the return blocks too.
		SavedRegs.set(UnspilledCSGPRPairedIdx);
		}
		ExtraCSSpill = UnspilledCSGPRPaired;
		NumRegsSpilled = SavedRegs.count();
		}

		// If we didn't find an extra callee-saved register to spill, create
		// an emergency spill slot.
		if (!ExtraCSSpill || MF.getRegInfo().isPhysRegUsed(ExtraCSSpill)) {
		const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
		const TargetRegisterClass &RC = AArch64::GPR64RegClass;
		unsigned Size = TRI->getSpillSize(RC);
		unsigned Align = TRI->getSpillAlignment(RC);
		int FI = MFI.CreateStackObject(Size, Align, false);
		// FIXME: ShrinkWrap2: Temporary hack. Remove.
		MFI.RS->addScavengingFrameIndex(FI);
		DEBUG(dbgs() << "No available CS registers, allocated fi#" << FI
		<< " as the emergency spill slot.\n");
		}
		}

		// Round up to register pair alignment to avoid additional SP adjustment
		// instructions.
		AFI->setCalleeSavedStackSize(alignTo(8 * NumRegsSpilled, 16));
		}

		raw_ostream &printElt(unsigned Elt, raw_ostream &OS) const override {
		auto &TRI = *MF.getSubtarget().getRegisterInfo();
		OS << PrintReg(TRI.getCalleeSavedRegs(&MF)[Elt], &TRI);
		return OS;
		}
		};
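
Two small pieces of arithmetic in the constructor above are easy to miss: the partner of a callee-saved register is found with `i ^ 1`, and the callee-save area is sized with `alignTo(8 * NumRegsSpilled, 16)`. The sketch below restates both without any LLVM dependency. The names `pairedIndex`, `roundUpTo` and `calleeSavedStackSize` are hypothetical, and the sketch assumes the callee-saved list is laid out as consecutive pairs (0,1), (2,3), and so on.

```cpp
#include <cassert>
#include <cstdint>

// Index of the register sharing a pair with CSR index Idx, assuming the
// callee-saved list is laid out as consecutive pairs (0,1), (2,3), ...
constexpr unsigned pairedIndex(unsigned Idx) { return Idx ^ 1u; }

// Round Value up to the next multiple of Align (Align must be non-zero).
// Stand-in for llvm::alignTo so the sketch builds on its own.
constexpr uint64_t roundUpTo(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) / Align * Align;
}

// Bytes reserved for callee-saved registers: 8 bytes per spilled register,
// rounded up to 16 so the area stays aligned for register pairs.
constexpr uint64_t calleeSavedStackSize(unsigned NumRegsSpilled) {
  return roundUpTo(8ull * NumRegsSpilled, 16);
}
```

With an odd number of CSRs the partner index of the last register falls outside the list (for five registers, index 4 pairs with index 5), which is why the patch guards the compact-unwind check with `PairedRegIdx < NumCSRs`.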

/// Look at each instruction that references stack frames and return the stack		/// Look at each instruction that references stack frames and return the stack
/// size limit beyond which some of these instructions will require a scratch		/// size limit beyond which some of these instructions will require a scratch
/// register during their expansion later.		/// register during their expansion later.
static unsigned estimateRSStackSizeLimit(MachineFunction &MF) {		static unsigned estimateRSStackSizeLimit(MachineFunction &MF) {
// FIXME: For now, just conservatively guestimate based on unscaled indexing		// FIXME: For now, just conservatively guestimate based on unscaled indexing
// range. We'll end up allocating an unnecessary spill slot a lot, but		// range. We'll end up allocating an unnecessary spill slot a lot, but
// realistically that's not a big deal at this stage of the game.		// realistically that's not a big deal at this stage of the game.
for (MachineBasicBlock &MBB : MF) {		for (MachineBasicBlock &MBB : MF) {
[211 lines not shown]	if (canUseRedZone(MF))
return false;		return false;

return true;		return true;
}		}

// Convert callee-save register save/restore instruction to do stack pointer		// Convert callee-save register save/restore instruction to do stack pointer
// decrement/increment to allocate/deallocate the callee-save stack area by		// decrement/increment to allocate/deallocate the callee-save stack area by
// converting store/load to use pre/post increment version.		// converting store/load to use pre/post increment version.
		LLVM_ATTRIBUTE_USED // FIXME: ShrinkWrap2: Remove attribute when we reuse this.
static MachineBasicBlock::iterator convertCalleeSaveRestoreToSPPrePostIncDec(		static MachineBasicBlock::iterator convertCalleeSaveRestoreToSPPrePostIncDec(
MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,		MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
const DebugLoc &DL, const TargetInstrInfo *TII, int CSStackSizeInc) {		const DebugLoc &DL, const TargetInstrInfo *TII, int CSStackSizeInc) {
unsigned NewOpc;		unsigned NewOpc;
bool NewIsUnscaled = false;		bool NewIsUnscaled = false;
switch (MBBI->getOpcode()) {		switch (MBBI->getOpcode()) {
default:		default:
llvm_unreachable("Unexpected callee-save save/restore opcode!");		llvm_unreachable("Unexpected callee-save save/restore opcode!");
[54 lines not shown]	static MachineBasicBlock::iterator convertCalleeSaveRestoreToSPPrePostIncDec(
return std::prev(MBB.erase(MBBI));		return std::prev(MBB.erase(MBBI));
}		}

// Fixup callee-save register save/restore instructions to take into account		// Fixup callee-save register save/restore instructions to take into account
// combined SP bump by adding the local stack size to the stack offsets.		// combined SP bump by adding the local stack size to the stack offsets.
static void fixupCalleeSaveRestoreStackOffset(MachineInstr &MI,		static void fixupCalleeSaveRestoreStackOffset(MachineInstr &MI,
unsigned LocalStackSize) {		unsigned LocalStackSize) {
unsigned Opc = MI.getOpcode();		unsigned Opc = MI.getOpcode();
		if (Opc == TargetOpcode::CFI_INSTRUCTION)
		return;
(void)Opc;		(void)Opc;
assert((Opc == AArch64::STPXi || Opc == AArch64::STPDi ||		assert((Opc == AArch64::STPXi || Opc == AArch64::STPDi ||
Opc == AArch64::STRXui || Opc == AArch64::STRDui ||		Opc == AArch64::STRXui || Opc == AArch64::STRDui ||
Opc == AArch64::LDPXi || Opc == AArch64::LDPDi ||		Opc == AArch64::LDPXi || Opc == AArch64::LDPDi ||
Opc == AArch64::LDRXui || Opc == AArch64::LDRDui) &&		Opc == AArch64::LDRXui || Opc == AArch64::LDRDui) &&
"Unexpected callee-save save/restore opcode!");		"Unexpected callee-save save/restore opcode!");

unsigned OffsetIdx = MI.getNumExplicitOperands() - 1;		unsigned OffsetIdx = MI.getNumExplicitOperands() - 1;
[24 lines not shown]	void AArch64FrameLowering::emitPrologue(MachineFunction &MF,
DebugLoc DL;		DebugLoc DL;

// All calls are tail calls in GHC calling conv, and functions have no		// All calls are tail calls in GHC calling conv, and functions have no
// prologue/epilogue.		// prologue/epilogue.
if (MF.getFunction()->getCallingConv() == CallingConv::GHC)		if (MF.getFunction()->getCallingConv() == CallingConv::GHC)
return;		return;

int NumBytes = (int)MFI.getStackSize();		int NumBytes = (int)MFI.getStackSize();
		// FIXME: ShrinkWrap2: This is set by determineCalleeSaves. Seems wrong to me.
if (!AFI->hasStackFrame()) {		if (!AFI->hasStackFrame()) {
assert(!HasFP && "unexpected function without stack frame but with FP");		assert(!HasFP && "unexpected function without stack frame but with FP");

// All of the stack allocation is for locals.		// All of the stack allocation is for locals.
AFI->setLocalStackSize(NumBytes);		AFI->setLocalStackSize(NumBytes);

if (!NumBytes)		if (!NumBytes)
return;		return;
[12 lines not shown]	else {
MCCFIInstruction::createDefCfaOffset(FrameLabel, -NumBytes));		MCCFIInstruction::createDefCfaOffset(FrameLabel, -NumBytes));
BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))		BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
.addCFIIndex(CFIIndex)		.addCFIIndex(CFIIndex)
.setMIFlags(MachineInstr::FrameSetup);		.setMIFlags(MachineInstr::FrameSetup);
}		}
return;		return;
}		}

		// FIXME: ShrinkWrap2: This is set by determineCalleeSaves. Seems wrong to me.
auto CSStackSize = AFI->getCalleeSavedStackSize();		auto CSStackSize = AFI->getCalleeSavedStackSize();
// All of the remaining stack allocations are for locals.		// All of the remaining stack allocations are for locals.
AFI->setLocalStackSize(NumBytes - CSStackSize);		AFI->setLocalStackSize(NumBytes - CSStackSize);

bool CombineSPBump = shouldCombineCSRLocalStackBump(MF, NumBytes);		bool CombineSPBump = shouldCombineCSRLocalStackBump(MF, NumBytes);
if (CombineSPBump) {		if (CombineSPBump) {
emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP, -NumBytes, TII,		emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP, -NumBytes, TII,
MachineInstr::FrameSetup);		MachineInstr::FrameSetup);
NumBytes = 0;		NumBytes = 0;
} else if (CSStackSize != 0) {		} else if (CSStackSize != 0) {
		// FIXME: ShrinkWrap2: For now, we can't use push / pop for save / restore
		// of CSR.
		if (MFI.getShouldUseShrinkWrap2()) {
		emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP, -CSStackSize,
		TII, MachineInstr::FrameSetup);
		}
		else {
MBBI = convertCalleeSaveRestoreToSPPrePostIncDec(MBB, MBBI, DL, TII,		MBBI = convertCalleeSaveRestoreToSPPrePostIncDec(MBB, MBBI, DL, TII,
-CSStackSize);		-CSStackSize);
		}
NumBytes -= CSStackSize;		NumBytes -= CSStackSize;
}		}
assert(NumBytes >= 0 && "Negative stack allocation size!?");		assert(NumBytes >= 0 && "Negative stack allocation size!?");

// Move past the saves of the callee-saved registers, fixing up the offsets		// Move past the saves of the callee-saved registers, fixing up the offsets
// and pre-inc if we decided to combine the callee-save and local stack		// and pre-inc if we decided to combine the callee-save and local stack
// pointer bump above.		// pointer bump above.
MachineBasicBlock::iterator End = MBB.end();		MachineBasicBlock::iterator End = MBB.end();
while (MBBI != End && MBBI->getFlag(MachineInstr::FrameSetup)) {		while (MBBI != End && MBBI->getFlag(MachineInstr::FrameSetup)) {
if (CombineSPBump)		if (CombineSPBump)
fixupCalleeSaveRestoreStackOffset(*MBBI, AFI->getLocalStackSize());		fixupCalleeSaveRestoreStackOffset(*MBBI, AFI->getLocalStackSize());
++MBBI;		++MBBI;
}		}
		if (CombineSPBump &&
		(MFI.getShouldUseShrinkWrap2() || MFI.getShouldUseStackShrinkWrap2())) {
		for (MachineOperand *MO : AFI->getCSROffsetsToFix()) {
		MachineInstr &MI = *MO->getParent();
		fixupCalleeSaveRestoreStackOffset(MI, AFI->getLocalStackSize());
		}
		}
if (HasFP) {		if (HasFP) {
// Only set up FP if we actually need to. Frame pointer is fp = sp - 16.		// Only set up FP if we actually need to. Frame pointer is fp = sp - 16.
int FPOffset = CSStackSize - 16;		int FPOffset = CSStackSize - 16;
if (CombineSPBump)		if (CombineSPBump)
FPOffset += AFI->getLocalStackSize();		FPOffset += AFI->getLocalStackSize();

// Issue sub fp, sp, FPOffset or		// Issue sub fp, sp, FPOffset or
// mov fp,sp when FPOffset is zero.		// mov fp,sp when FPOffset is zero.
[9 lines not shown]	if (NumBytes) {
unsigned scratchSPReg = AArch64::SP;		unsigned scratchSPReg = AArch64::SP;

if (NeedsRealignment) {		if (NeedsRealignment) {
scratchSPReg = findScratchNonCalleeSaveRegister(&MBB);		scratchSPReg = findScratchNonCalleeSaveRegister(&MBB);
assert(scratchSPReg != AArch64::NoRegister);		assert(scratchSPReg != AArch64::NoRegister);
}		}

// If we're a leaf function, try using the red zone.		// If we're a leaf function, try using the red zone.
if (!canUseRedZone(MF))		if (!canUseRedZone(MF)) {
// FIXME: in the case of dynamic re-alignment, NumBytes doesn't have		// FIXME: in the case of dynamic re-alignment, NumBytes doesn't have
// the correct value here, as NumBytes also includes padding bytes,		// the correct value here, as NumBytes also includes padding bytes,
// which shouldn't be counted here.		// which shouldn't be counted here.
emitFrameOffset(MBB, MBBI, DL, scratchSPReg, AArch64::SP, -NumBytes, TII,		emitFrameOffset(MBB, MBBI, DL, scratchSPReg, AArch64::SP, -NumBytes, TII,
MachineInstr::FrameSetup);		MachineInstr::FrameSetup);

		// FIXME: ShrinkWrap2: If we have another stack allocation here, and we're
		// using SP for all the non-entry/non-return blocks, we have to fixup our
		// offsets emitted for the callee saved regs. The ideal would be to know
		// if we have this extra local stack allocation when computing the
		// offsets, but that information is not available yet at that point.

		// Another solution would be to actually use the FI operands as all the
		// targets do, and let resolveFrameIndex do the job.
		for (MachineOperand *MO : AFI->getCSROffsetsToFix())
		MO->setImm(MO->getImm() + NumBytes / 8); // This is SP-relative, it only
		// occurs when we don't have a
		// stack frame. Which means
		// that the offset is unsigned
		// and scaled, so we need to
		// divide by 8.
		}

if (NeedsRealignment) {		if (NeedsRealignment) {
const unsigned Alignment = MFI.getMaxAlignment();		const unsigned Alignment = MFI.getMaxAlignment();
const unsigned NrBitsToZero = countTrailingZeros(Alignment);		const unsigned NrBitsToZero = countTrailingZeros(Alignment);
assert(NrBitsToZero > 1);		assert(NrBitsToZero > 1);
assert(scratchSPReg != AArch64::SP);		assert(scratchSPReg != AArch64::SP);

// SUB X9, SP, NumBytes		// SUB X9, SP, NumBytes
// -- X9 is temporary register, so shouldn't contain any live data here,		// -- X9 is temporary register, so shouldn't contain any live data here,
[107 lines not shown]	if (HasFP) {
// Encode the stack size of the leaf function.		// Encode the stack size of the leaf function.
unsigned CFIIndex = MF.addFrameInst(		unsigned CFIIndex = MF.addFrameInst(
MCCFIInstruction::createDefCfaOffset(nullptr, -MFI.getStackSize()));		MCCFIInstruction::createDefCfaOffset(nullptr, -MFI.getStackSize()));
BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))		BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
.addCFIIndex(CFIIndex)		.addCFIIndex(CFIIndex)
.setMIFlags(MachineInstr::FrameSetup);		.setMIFlags(MachineInstr::FrameSetup);
}		}


		// FIXME: ShrinkWrap2: We emit CFI when we emit the instructions.
		if (MFI.getShouldUseShrinkWrap2())
		return;
// Now emit the moves for whatever callee saved regs we have (including FP,		// Now emit the moves for whatever callee saved regs we have (including FP,
// LR if those are saved).		// LR if those are saved).
emitCalleeSavedFrameMoves(MBB, MBBI);		emitCalleeSavedFrameMoves(MBB, MBBI);
}		}
}		}
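
The fix-up loop in `emitPrologue` above adds `NumBytes / 8` to each recorded immediate because, as the patch comment notes, those offsets are SP-relative, unsigned and scaled by the 8-byte access size. A minimal sketch of that adjustment follows. The helper name `shiftScaledOffset` and its `Scale` parameter are hypothetical, and the sketch assumes the extra allocation is a multiple of the scale.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: after SP is lowered by a further ExtraBytes, a slot
// addressed with a scaled immediate (byte offset / Scale) sits ExtraBytes
// further from SP, so the immediate grows by ExtraBytes / Scale.
inline int64_t shiftScaledOffset(int64_t ScaledImm, int64_t ExtraBytes,
                                 int64_t Scale = 8) {
  assert(Scale > 0 && ExtraBytes % Scale == 0 &&
         "adjustment must be a multiple of the scale");
  return ScaledImm + ExtraBytes / Scale;
}
```

For instance, a slot stored at scaled immediate 2 (byte offset 16) moves to scaled immediate 6 (byte offset 48) once another 32 bytes of locals are allocated below it.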

void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,		void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
MachineBasicBlock &MBB) const {		MachineBasicBlock &MBB) const {
[60 lines not shown]	void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
// = StackSize + ArgumentPopSize		// = StackSize + ArgumentPopSize
//		//
// AArch64TargetLowering::LowerCall figures out ArgumentPopSize and keeps		// AArch64TargetLowering::LowerCall figures out ArgumentPopSize and keeps
// it as the 2nd argument of AArch64ISD::TC_RETURN.		// it as the 2nd argument of AArch64ISD::TC_RETURN.

auto CSStackSize = AFI->getCalleeSavedStackSize();		auto CSStackSize = AFI->getCalleeSavedStackSize();
bool CombineSPBump = shouldCombineCSRLocalStackBump(MF, NumBytes);		bool CombineSPBump = shouldCombineCSRLocalStackBump(MF, NumBytes);

		// FIXME: ShrinkWrap2: For now, we can't use push / pop for save / restore
		// of CSR.
		if (!MFI.getShouldUseShrinkWrap2()) {
if (!CombineSPBump && CSStackSize != 0)		if (!CombineSPBump && CSStackSize != 0)
convertCalleeSaveRestoreToSPPrePostIncDec(		convertCalleeSaveRestoreToSPPrePostIncDec(
MBB, std::prev(MBB.getFirstTerminator()), DL, TII, CSStackSize);		MBB, std::prev(MBB.getFirstTerminator()), DL, TII, CSStackSize);
		}

// Move past the restores of the callee-saved registers.		// Move past the restores of the callee-saved registers.
MachineBasicBlock::iterator LastPopI = MBB.getFirstTerminator();		MachineBasicBlock::iterator LastPopI = MBB.getFirstTerminator();
MachineBasicBlock::iterator Begin = MBB.begin();		MachineBasicBlock::iterator Begin = MBB.begin();
while (LastPopI != Begin) {		while (LastPopI != Begin) {
--LastPopI;		--LastPopI;
if (!LastPopI->getFlag(MachineInstr::FrameDestroy)) {		if (!LastPopI->getFlag(MachineInstr::FrameDestroy)) {
++LastPopI;		++LastPopI;
break;		break;
} else if (CombineSPBump)		} else if (CombineSPBump)
fixupCalleeSaveRestoreStackOffset(*LastPopI, AFI->getLocalStackSize());		fixupCalleeSaveRestoreStackOffset(*LastPopI, AFI->getLocalStackSize());
}		}

// If there is a single SP update, insert it before the ret and we're done.		// If there is a single SP update, insert it before the ret and we're done.
if (CombineSPBump) {		if (CombineSPBump) {
emitFrameOffset(MBB, MBB.getFirstTerminator(), DL, AArch64::SP, AArch64::SP,		emitFrameOffset(MBB, MBB.getFirstTerminator(), DL, AArch64::SP, AArch64::SP,
NumBytes + ArgumentPopSize, TII,		NumBytes + ArgumentPopSize, TII,
MachineInstr::FrameDestroy);		MachineInstr::FrameDestroy);
return;		return;
}		}

		// FIXME: ShrinkWrap2: For now, we can't use push / pop for save / restore
		// of CSR, so we have to restore SP manually.
		if (MFI.getShouldUseShrinkWrap2()) {
		emitFrameOffset(MBB, MBB.getFirstTerminator(), DL, AArch64::SP, AArch64::SP,
		CSStackSize, TII, MachineInstr::FrameDestroy);
		}
NumBytes -= CSStackSize;		NumBytes -= CSStackSize;
assert(NumBytes >= 0 && "Negative stack allocation size!?");		assert(NumBytes >= 0 && "Negative stack allocation size!?");

if (!hasFP(MF)) {		if (!hasFP(MF)) {
bool RedZone = canUseRedZone(MF);		bool RedZone = canUseRedZone(MF);
// If this was a redzone leaf function, we don't need to restore the		// If this was a redzone leaf function, we don't need to restore the
// stack pointer (but we may need to pop stack args for fastcc).		// stack pointer (but we may need to pop stack args for fastcc).
if (RedZone && ArgumentPopSize == 0)		if (RedZone && ArgumentPopSize == 0)
[...]	static unsigned getPrologueDeath(MachineFunction &MF, unsigned Reg) {
// happens with the @llvm-returnaddress intrinsic and with arguments passed in		// happens with the @llvm-returnaddress intrinsic and with arguments passed in
// callee saved registers.		// callee saved registers.
// Omitting the kill flags is conservatively correct even if the live-in		// Omitting the kill flags is conservatively correct even if the live-in
// is not used after all.		// is not used after all.
bool IsLiveIn = MF.getRegInfo().isLiveIn(Reg);		bool IsLiveIn = MF.getRegInfo().isLiveIn(Reg);
return getKillRegState(!IsLiveIn);		return getKillRegState(!IsLiveIn);
}		}

static bool produceCompactUnwindFrame(MachineFunction &MF) {		static bool produceCompactUnwindFrame(const MachineFunction &MF) {
		// FIXME: ShrinkWrap2: Fix compact unwinding.
		const MachineFrameInfo &MFI = MF.getFrameInfo();
		if (MFI.getShouldUseShrinkWrap2())
		return false;
const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();		const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
AttributeList Attrs = MF.getFunction()->getAttributes();		AttributeList Attrs = MF.getFunction()->getAttributes();
return Subtarget.isTargetMachO() &&		return Subtarget.isTargetMachO() &&
!(Subtarget.getTargetLowering()->supportSwiftError() &&		!(Subtarget.getTargetLowering()->supportSwiftError() &&
Attrs.hasAttrSomewhere(Attribute::SwiftError));		Attrs.hasAttrSomewhere(Attribute::SwiftError));
}		}

namespace {

struct RegPairInfo {
unsigned Reg1 = AArch64::NoRegister;
unsigned Reg2 = AArch64::NoRegister;
int FrameIdx;
int Offset;
bool IsGPR;

RegPairInfo() = default;

bool isPaired() const { return Reg2 != AArch64::NoRegister; }
};

} // end anonymous namespace

static void computeCalleeSaveRegisterPairs(		static void computeCalleeSaveRegisterPairs(
MachineFunction &MF, const std::vector<CalleeSavedInfo> &CSI,		MachineFunction &MF, const std::vector<CalleeSavedInfo> &CSI,
const TargetRegisterInfo *TRI, SmallVectorImpl<RegPairInfo> &RegPairs) {		const TargetRegisterInfo *TRI, SmallVectorImpl<RegPairInfo> &RegPairs) {

if (CSI.empty())		if (CSI.empty())
return;		return;

AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();		AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
MachineFrameInfo &MFI = MF.getFrameInfo();		MachineFrameInfo &MFI = MF.getFrameInfo();
CallingConv::ID CC = MF.getFunction()->getCallingConv();		CallingConv::ID CC = MF.getFunction()->getCallingConv();
unsigned Count = CSI.size();		unsigned Count = CSI.size();
(void)CC;		(void)CC;
// MachO's compact unwind format relies on all registers being stored in		// MachO's compact unwind format relies on all registers being stored in
// pairs.		// pairs.
		// FIXME: ShrinkWrap2: Fix compact unwind format.
		if (!MFI.getShouldUseShrinkWrap2()) {
assert((!produceCompactUnwindFrame(MF) ||		assert((!produceCompactUnwindFrame(MF) ||
CC == CallingConv::PreserveMost ||		CC == CallingConv::PreserveMost ||
(Count & 1) == 0) &&		(Count & 1) == 0) &&
"Odd number of callee-saved regs to spill!");		"Odd number of callee-saved regs to spill!");
		}
unsigned Offset = AFI->getCalleeSavedStackSize();		unsigned Offset = AFI->getCalleeSavedStackSize();

for (unsigned i = 0; i < Count; ++i) {		for (unsigned i = 0; i < Count; ++i) {
RegPairInfo RPI;		RegPairInfo RPI;
RPI.Reg1 = CSI[i].getReg();		RPI.Reg1 = CSI[i].getReg();

assert(AArch64::GPR64RegClass.contains(RPI.Reg1) ||		assert(AArch64::GPR64RegClass.contains(RPI.Reg1) ||
AArch64::FPR64RegClass.contains(RPI.Reg1));		AArch64::FPR64RegClass.contains(RPI.Reg1));
RPI.IsGPR = AArch64::GPR64RegClass.contains(RPI.Reg1);		RPI.IsGPR = AArch64::GPR64RegClass.contains(RPI.Reg1);

// Add the next reg to the pair if it is in the same register class.		// Add the next reg to the pair if it is in the same register class.
		// FIXME: ShrinkWrap2: Creating real pairs during shrink-wrapping may lead to
		// double saves / restores, which can corrupt registers.
		if (!MFI.getShouldUseShrinkWrap2()) {
if (i + 1 < Count) {		if (i + 1 < Count) {
unsigned NextReg = CSI[i + 1].getReg();		unsigned NextReg = CSI[i + 1].getReg();
if ((RPI.IsGPR && AArch64::GPR64RegClass.contains(NextReg)) ||		if ((RPI.IsGPR && AArch64::GPR64RegClass.contains(NextReg)) ||
(!RPI.IsGPR && AArch64::FPR64RegClass.contains(NextReg)))		(!RPI.IsGPR && AArch64::FPR64RegClass.contains(NextReg)))
RPI.Reg2 = NextReg;		RPI.Reg2 = NextReg;
}		}
		}

// GPRs and FPRs are saved in pairs of 64-bit regs. We expect the CSI		// GPRs and FPRs are saved in pairs of 64-bit regs. We expect the CSI
// list to come in sorted by frame index so that we can issue the store		// list to come in sorted by frame index so that we can issue the store
// pair instructions directly. Assert if we see anything otherwise.		// pair instructions directly. Assert if we see anything otherwise.
//		//
// The order of the registers in the list is controlled by		// The order of the registers in the list is controlled by
// getCalleeSavedRegs(), so they will always be in-order, as well.		// getCalleeSavedRegs(), so they will always be in-order, as well.
		// FIXME: ShrinkWrap2: Make it work with shrink-wrapping.
		if (!MFI.getShouldUseShrinkWrap2()) {
assert((!RPI.isPaired() ||		assert((!RPI.isPaired() ||
(CSI[i].getFrameIdx() + 1 == CSI[i + 1].getFrameIdx())) &&		(CSI[i].getFrameIdx() + 1 == CSI[i + 1].getFrameIdx())) &&
"Out of order callee saved regs!");		"Out of order callee saved regs!");
		}

// MachO's compact unwind format relies on all registers being stored in		// MachO's compact unwind format relies on all registers being stored in
// adjacent register pairs.		// adjacent register pairs.
		// FIXME: ShrinkWrap2: Fix compact unwind format.
		if (!MFI.getShouldUseShrinkWrap2()) {
assert((!produceCompactUnwindFrame(MF) ||		assert((!produceCompactUnwindFrame(MF) ||
CC == CallingConv::PreserveMost ||		CC == CallingConv::PreserveMost ||
(RPI.isPaired() &&		(RPI.isPaired() &&
((RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP) ||		((RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP) ||
RPI.Reg1 + 1 == RPI.Reg2))) &&		RPI.Reg1 + 1 == RPI.Reg2))) &&
"Callee-save registers not saved as adjacent register pair!");		"Callee-save registers not saved as adjacent register pair!");
		}

RPI.FrameIdx = CSI[i].getFrameIdx();		RPI.FrameIdx = CSI[i].getFrameIdx();

if (Count * 8 != AFI->getCalleeSavedStackSize() && !RPI.isPaired()) {		// FIXME: ShrinkWrap2: We are never using pairs.
		if (!MFI.getShouldUseShrinkWrap2() && Count * 8 != AFI->getCalleeSavedStackSize() && !RPI.isPaired()) {
// Round up size of non-pair to pair size if we need to pad the		// Round up size of non-pair to pair size if we need to pad the
// callee-save area to ensure 16-byte alignment.		// callee-save area to ensure 16-byte alignment.
Offset -= 16;		Offset -= 16;
assert(MFI.getObjectAlignment(RPI.FrameIdx) <= 16);		assert(MFI.getObjectAlignment(RPI.FrameIdx) <= 16);
MFI.setObjectAlignment(RPI.FrameIdx, 16);		MFI.setObjectAlignment(RPI.FrameIdx, 16);
AFI->setCalleeSaveStackHasFreeSpace(true);		AFI->setCalleeSaveStackHasFreeSpace(true);
} else		} else
Offset -= RPI.isPaired() ? 16 : 8;		Offset -= RPI.isPaired() ? 16 : 8;
assert(Offset % 8 == 0);		assert(Offset % 8 == 0);
RPI.Offset = Offset / 8;		RPI.Offset = Offset / 8;

		// FIXME: ShrinkWrap2: This is unused throughout the backend, which relies on
		// RegPairInfo instead.
		MFI.setObjectSize(RPI.FrameIdx, 8);
		MFI.setObjectOffset(RPI.FrameIdx, RPI.Offset);

		// FIXME: ShrinkWrap2: Check for out-of-bounds offsets for STR/STUR/etc?
assert((RPI.Offset >= -64 && RPI.Offset <= 63) &&		assert((RPI.Offset >= -64 && RPI.Offset <= 63) &&
"Offset out of bounds for LDP/STP immediate");		"Offset out of bounds for LDP/STP immediate");

RegPairs.push_back(RPI);		RegPairs.push_back(RPI);
if (RPI.isPaired())		if (RPI.isPaired())
++i;		++i;
}		}
}		}
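
A minimal sketch of the slot numbering that the loop above appears to produce when ShrinkWrap2 is enabled, i.e. when no pairs are formed and the padding branch is skipped. The names `alignToSketch` and `unpairedSlotOffsets` are hypothetical and are not part of the patch; the callee-save area size is assumed to be `alignTo(8 * NumRegs, 16)`, as set in `assignCalleeSavedSpillSlots`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for llvm::alignTo: round Value up to a multiple of Align.
inline uint64_t alignToSketch(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) / Align * Align;
}

// Returns the scaled (units of 8 bytes) SP-relative slot for each of NumRegs
// unpaired callee-saved registers, in CSI order. Assumption: every register
// takes the "Offset -= 8" path, as in the ShrinkWrap2 configuration.
inline std::vector<int> unpairedSlotOffsets(unsigned NumRegs) {
  int Offset = static_cast<int>(alignToSketch(8ull * NumRegs, 16));
  std::vector<int> Slots;
  for (unsigned I = 0; I < NumRegs; ++I) {
    Offset -= 8;                  // one 64-bit register per slot
    assert(Offset % 8 == 0);
    int Scaled = Offset / 8;      // the immediate is scaled by 8
    assert(Scaled >= -64 && Scaled <= 63 && "outside the LDP/STP range");
    Slots.push_back(Scaled);
  }
  return Slots;
}
```

With three registers the area is 32 bytes and the slots are 3, 2, 1, so the lowest slot stays unused; with two registers the area is 16 bytes and the slots are 1, 0.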

		// FIXME: ShrinkWrap2: We need this here because we have to call
		// computeCalleeSaveRegisterPairs once after frame indices have been assigned.
		void AArch64FrameLowering::processValidCalleeSavedInfo(
		MachineFunction &MF, const TargetRegisterInfo *TRI,
		std::vector<CalleeSavedInfo> &CSI) const {
		AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
		computeCalleeSaveRegisterPairs(MF, CSI, TRI, AFI->getRegPairs());
		}

bool AArch64FrameLowering::spillCalleeSavedRegisters(		bool AArch64FrameLowering::spillCalleeSavedRegisters(
MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,		MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
const std::vector<CalleeSavedInfo> &CSI,		const std::vector<CalleeSavedInfo> &CSI,
const TargetRegisterInfo *TRI) const {		const TargetRegisterInfo *TRI) const {
MachineFunction &MF = *MBB.getParent();		MachineFunction &MF = *MBB.getParent();
		MachineFrameInfo &MFI = MF.getFrameInfo();
		MachineModuleInfo &MMI = MF.getMMI();
		AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();		const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
		const MachineRegisterInfo &MRI = MF.getRegInfo();
DebugLoc DL;		DebugLoc DL;
SmallVector<RegPairInfo, 8> RegPairs;		SmallVectorImpl<RegPairInfo> &RegPairs = AFI->getRegPairs();
		const TargetSubtargetInfo &STI = MF.getSubtarget();
		bool needsCFI =
		MMI.hasDebugInfo() || MF.getFunction()->needsUnwindTableEntry();
		// FIXME: ShrinkWrap2: We should always use AFI->getRegPairs(), or at least
		// avoid calling computeCalleeSaveRegisterPairs more than once.
		if (!MFI.getShouldUseShrinkWrap2()) {
		RegPairs.clear();
computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs);		computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs);
const MachineRegisterInfo &MRI = MF.getRegInfo();		}

for (auto RPII = RegPairs.rbegin(), RPIE = RegPairs.rend(); RPII != RPIE;		for (auto RPII = RegPairs.rbegin(), RPIE = RegPairs.rend(); RPII != RPIE;
++RPII) {		++RPII) {
RegPairInfo RPI = *RPII;		RegPairInfo RPI = *RPII;
unsigned Reg1 = RPI.Reg1;		unsigned Reg1 = RPI.Reg1;
unsigned Reg2 = RPI.Reg2;		unsigned Reg2 = RPI.Reg2;
unsigned StrOpc;		unsigned StrOpc;

		// FIXME: ShrinkWrap2: Skip all the registers that are not related to this
		// block.
		if (MFI.getShouldUseShrinkWrap2()) {
		if (find_if(CSI, [&](const CalleeSavedInfo &CS) {
		return CS.getReg() == Reg1;
		}) == CSI.end())
		continue;
		}

// Issue sequence of spills for cs regs. The first spill may be converted		// Issue sequence of spills for cs regs. The first spill may be converted
// to a pre-decrement store later by emitPrologue if the callee-save stack		// to a pre-decrement store later by emitPrologue if the callee-save stack
// area allocation can't be combined with the local stack area allocation.		// area allocation can't be combined with the local stack area allocation.
// For example:		// For example:
// stp x22, x21, [sp, #0] // addImm(+0)		// stp x22, x21, [sp, #0] // addImm(+0)
// stp x20, x19, [sp, #16] // addImm(+2)		// stp x20, x19, [sp, #16] // addImm(+2)
// stp fp, lr, [sp, #32] // addImm(+4)		// stp fp, lr, [sp, #32] // addImm(+4)
// Rationale: This sequence saves uop updates compared to a sequence of		// Rationale: This sequence saves uop updates compared to a sequence of
// pre-increment spills like stp xi,xj,[sp,#-16]!		// pre-increment spills like stp xi,xj,[sp,#-16]!
// Note: Similar rationale and sequence for restores in epilog.		// Note: Similar rationale and sequence for restores in epilog.
if (RPI.IsGPR)		if (RPI.IsGPR)
StrOpc = RPI.isPaired() ? AArch64::STPXi : AArch64::STRXui;		StrOpc = RPI.isPaired() ? AArch64::STPXi : AArch64::STRXui;
else		else
StrOpc = RPI.isPaired() ? AArch64::STPDi : AArch64::STRDui;		StrOpc = RPI.isPaired() ? AArch64::STPDi : AArch64::STRDui;
DEBUG(dbgs() << "CSR spill: (" << TRI->getName(Reg1);		DEBUG(dbgs() << "CSR spill: (" << TRI->getName(Reg1);
if (RPI.isPaired())		if (RPI.isPaired())
dbgs() << ", " << TRI->getName(Reg2);		dbgs() << ", " << TRI->getName(Reg2);
dbgs() << ") -> fi#(" << RPI.FrameIdx;		dbgs() << ") -> fi#(" << RPI.FrameIdx;
if (RPI.isPaired())		if (RPI.isPaired())
dbgs() << ", " << RPI.FrameIdx+1;		dbgs() << ", " << RPI.FrameIdx+1;
dbgs() << ")\n");		dbgs() << ")\n");

		// FIXME: ShrinkWrap2: We need to decide whether to use SP or FP-relative
		// store / load here. In order to do that, we have several factors:
		// * If we don't use shrink-wrapping, always use SP.
		// * If we don't have a frame, always use SP.
		// * If it's the entry block, do not use SP, because we might have SP
		// adjustments before / after.
		// * If we don't have a frame, and we have local variables, and we have to
		// use SP, then we have to keep track of the offsets that are used to store
		// / load the CSR, and update them during prologue emission, where we have
		// the information about the local stack size.
		bool isEntryBlock = &MF.front() == &MBB;
		bool ShouldUseSP = !hasFP(*MBB.getParent()) || isEntryBlock;
		int CSStackSize = AFI->getCalleeSavedStackSize();
		int Imm = -(CSStackSize - 16 - int(RPI.Offset) * 8) / 8;
		if (MFI.getShouldUseShrinkWrap2() && !ShouldUseSP) {
		if (StrOpc == AArch64::STRXui)
		StrOpc = AArch64::STURXi;
		else if (StrOpc == AArch64::STRDui)
		StrOpc = AArch64::STURDi;
		Imm *= 8;
		}
MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));		MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
if (!MRI.isReserved(Reg1))		if (!MRI.isReserved(Reg1))
MBB.addLiveIn(Reg1);		MBB.addLiveIn(Reg1);
if (RPI.isPaired()) {		if (RPI.isPaired()) {
if (!MRI.isReserved(Reg2))		if (!MRI.isReserved(Reg2))
MBB.addLiveIn(Reg2);		MBB.addLiveIn(Reg2);
MIB.addReg(Reg2, getPrologueDeath(MF, Reg2));		MIB.addReg(Reg2, getPrologueDeath(MF, Reg2));
MIB.addMemOperand(MF.getMachineMemOperand(		MIB.addMemOperand(MF.getMachineMemOperand(
MachinePointerInfo::getFixedStack(MF, RPI.FrameIdx + 1),		MachinePointerInfo::getFixedStack(MF, RPI.FrameIdx + 1),
MachineMemOperand::MOStore, 8, 8));		MachineMemOperand::MOStore, 8, 8));
}		}
		if (MFI.getShouldUseShrinkWrap2()) {
		if (ShouldUseSP) {
MIB.addReg(Reg1, getPrologueDeath(MF, Reg1))		MIB.addReg(Reg1, getPrologueDeath(MF, Reg1))
.addReg(AArch64::SP)		.addReg(AArch64::SP)
.addImm(RPI.Offset) // [sp, #offset8], where factor8 is implicit		.addImm(RPI.Offset) // [sp, #offset8], where factor8 is implicit
.setMIFlag(MachineInstr::FrameSetup);		.setMIFlag(MachineInstr::FrameSetup);
		MachineInstr *MI = MIB;
		if (&MBB != &MF.front())
		AFI->getCSROffsetsToFix().push_back(&MI->getOperand(2));
		} else {
		MIB.addReg(Reg1, getPrologueDeath(MF, Reg1))
		.addReg(AArch64::FP)
		.addImm(Imm) // [fp, #imm], a byte offset for the unscaled STUR form
		.setMIFlag(MachineInstr::FrameSetup);
		}
		} else {
		MIB.addReg(Reg1, getPrologueDeath(MF, Reg1))
		.addReg(AArch64::SP)
		.addImm(RPI.Offset) // [sp, #offset8], where factor8 is implicit
		.setMIFlag(MachineInstr::FrameSetup);
		}
MIB.addMemOperand(MF.getMachineMemOperand(		MIB.addMemOperand(MF.getMachineMemOperand(
MachinePointerInfo::getFixedStack(MF, RPI.FrameIdx),		MachinePointerInfo::getFixedStack(MF, RPI.FrameIdx),
MachineMemOperand::MOStore, 8, 8));		MachineMemOperand::MOStore, 8, 8));
		if (MFI.getShouldUseShrinkWrap2() && needsCFI) {
		int64_t Offset = ((-(CSStackSize - 16 - int(RPI.Offset) * 8) / 8) - 2) * 8;
		const MCRegisterInfo *MCRI = STI.getRegisterInfo();
		unsigned DwarfReg = MCRI->getDwarfRegNum(Reg1, true);
		unsigned CFIIndex = MF.addFrameInst(
		MCCFIInstruction::createOffset(nullptr, DwarfReg, Offset));
		BuildMI(MBB, MI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
		.addCFIIndex(CFIIndex)
		.setMIFlag(MachineInstr::FrameSetup);
}		}
		}

return true;		return true;
}		}
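
As a worked example of the immediate arithmetic in `spillCalleeSavedRegisters`, the sketch below reproduces the three values computed for one unpaired register: the scaled SP-relative immediate, the FP-relative byte immediate used with the unscaled STUR form, and the offset passed to `MCCFIInstruction::createOffset`. The struct and function names are hypothetical, and the interpretation assumes the frame pointer sits 16 bytes below the top of the callee-save area.

```cpp
#include <cassert>

// Hypothetical container for the three immediates derived in the diff.
struct CSRAddressing {
  int SPImm;     // scaled by 8, used with STRXui / STRDui off SP
  int FPImm;     // byte offset, used with STURXi / STURDi off FP
  int CFIOffset; // byte offset handed to the CFI offset directive
};

// Mirrors the expressions in the patch:
//   Imm    = -(CSStackSize - 16 - int(RPI.Offset) * 8) / 8;
//   FPImm  = Imm * 8;
//   Offset = (Imm - 2) * 8;
inline CSRAddressing computeCSRAddressing(int CSStackSize, int ScaledOffset) {
  int Imm = -(CSStackSize - 16 - ScaledOffset * 8) / 8;
  CSRAddressing A;
  A.SPImm = ScaledOffset;
  A.FPImm = Imm * 8;
  A.CFIOffset = (Imm - 2) * 8;
  return A;
}
```

For a 32-byte callee-save area, slot 3 gives an FP-relative immediate of 8 and a CFI offset of -8, slot 2 gives 0 and -16, and slot 1 gives -8 and -24. Under the stated assumption the CFI offset is always the FP-relative offset minus 16.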

bool AArch64FrameLowering::restoreCalleeSavedRegisters(		bool AArch64FrameLowering::restoreCalleeSavedRegisters(
MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,		MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
const std::vector<CalleeSavedInfo> &CSI,		const std::vector<CalleeSavedInfo> &CSI,
const TargetRegisterInfo *TRI) const {		const TargetRegisterInfo *TRI) const {
MachineFunction &MF = *MBB.getParent();		MachineFunction &MF = *MBB.getParent();
		MachineFrameInfo &MFI = MF.getFrameInfo();
		AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();		const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
DebugLoc DL;		DebugLoc DL;
SmallVector<RegPairInfo, 8> RegPairs;		SmallVectorImpl<RegPairInfo> &RegPairs = AFI->getRegPairs();
		const TargetSubtargetInfo &STI = MF.getSubtarget();
		const MCRegisterInfo *MRI = STI.getRegisterInfo();
		MachineModuleInfo &MMI = MF.getMMI();
		bool needsCFI =
		MMI.hasDebugInfo() || MF.getFunction()->needsUnwindTableEntry();

if (MI != MBB.end())		if (MI != MBB.end())
DL = MI->getDebugLoc();		DL = MI->getDebugLoc();

		// FIXME: ShrinkWrap2: We should always use AFI->getRegPairs(), or at least
		// avoid calling computeCalleeSaveRegisterPairs more than once.
		if (!MFI.getShouldUseShrinkWrap2()) {
		RegPairs.clear();
computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs);		computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs);
		}

for (auto RPII = RegPairs.begin(), RPIE = RegPairs.end(); RPII != RPIE;		for (auto RPII = RegPairs.begin(), RPIE = RegPairs.end(); RPII != RPIE;
++RPII) {		++RPII) {
RegPairInfo RPI = *RPII;		RegPairInfo RPI = *RPII;
unsigned Reg1 = RPI.Reg1;		unsigned Reg1 = RPI.Reg1;
unsigned Reg2 = RPI.Reg2;		unsigned Reg2 = RPI.Reg2;

		// FIXME: ShrinkWrap2: Skip all the registers that are not related to this
		// block.
		if (MFI.getShouldUseShrinkWrap2()) {
		if (find_if(CSI, [&](const CalleeSavedInfo &CS) {
		return CS.getReg() == Reg1;
		}) == CSI.end())
		continue;
		}

// Issue sequence of restores for cs regs. The last restore may be converted		// Issue sequence of restores for cs regs. The last restore may be converted
// to a post-increment load later by emitEpilogue if the callee-save stack		// to a post-increment load later by emitEpilogue if the callee-save stack
// area allocation can't be combined with the local stack area allocation.		// area allocation can't be combined with the local stack area allocation.
// For example:		// For example:
// ldp fp, lr, [sp, #32] // addImm(+4)		// ldp fp, lr, [sp, #32] // addImm(+4)
// ldp x20, x19, [sp, #16] // addImm(+2)		// ldp x20, x19, [sp, #16] // addImm(+2)
// ldp x22, x21, [sp, #0] // addImm(+0)		// ldp x22, x21, [sp, #0] // addImm(+0)
// Note: see comment in spillCalleeSavedRegisters()		// Note: see comment in spillCalleeSavedRegisters()
unsigned LdrOpc;		unsigned LdrOpc;
if (RPI.IsGPR)		if (RPI.IsGPR)
LdrOpc = RPI.isPaired() ? AArch64::LDPXi : AArch64::LDRXui;		LdrOpc = RPI.isPaired() ? AArch64::LDPXi : AArch64::LDRXui;
else		else
LdrOpc = RPI.isPaired() ? AArch64::LDPDi : AArch64::LDRDui;		LdrOpc = RPI.isPaired() ? AArch64::LDPDi : AArch64::LDRDui;
DEBUG(dbgs() << "CSR restore: (" << TRI->getName(Reg1);		DEBUG(dbgs() << "CSR restore: (" << TRI->getName(Reg1);
if (RPI.isPaired())		if (RPI.isPaired())
dbgs() << ", " << TRI->getName(Reg2);		dbgs() << ", " << TRI->getName(Reg2);
dbgs() << ") -> fi#(" << RPI.FrameIdx;		dbgs() << ") -> fi#(" << RPI.FrameIdx;
if (RPI.isPaired())		if (RPI.isPaired())
dbgs() << ", " << RPI.FrameIdx+1;		dbgs() << ", " << RPI.FrameIdx+1;
dbgs() << ")\n");		dbgs() << ")\n");

		// FIXME: ShrinkWrap2: See comment in spillCalleeSavedRegisters.
		bool isReturnBlock = MBB.isReturnBlock();
		bool ShouldUseSP = !hasFP(*MBB.getParent()) || isReturnBlock;
		int CSStackSize = AFI->getCalleeSavedStackSize();
		int Imm = -(CSStackSize - 16 - int(RPI.Offset) * 8) / 8;
		if (MFI.getShouldUseShrinkWrap2() && !ShouldUseSP) {
		if (LdrOpc == AArch64::LDRXui)
		LdrOpc = AArch64::LDURXi;
		else if (LdrOpc == AArch64::LDRDui)
		LdrOpc = AArch64::LDURDi;
		Imm *= 8;
		}

MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(LdrOpc));		MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(LdrOpc));
if (RPI.isPaired()) {		if (RPI.isPaired()) {
MIB.addReg(Reg2, getDefRegState(true));		MIB.addReg(Reg2, getDefRegState(true));
MIB.addMemOperand(MF.getMachineMemOperand(		MIB.addMemOperand(MF.getMachineMemOperand(
MachinePointerInfo::getFixedStack(MF, RPI.FrameIdx + 1),		MachinePointerInfo::getFixedStack(MF, RPI.FrameIdx + 1),
MachineMemOperand::MOLoad, 8, 8));		MachineMemOperand::MOLoad, 8, 8));
}		}
		if (MFI.getShouldUseShrinkWrap2()) {
		if (ShouldUseSP) {
		MIB.addReg(Reg1, getDefRegState(true))
		.addReg(AArch64::SP)
		.addImm(
		RPI.Offset) // [sp, #offset8] where the factor8 is implicit
		.setMIFlag(MachineInstr::FrameDestroy);
		MachineInstr *MI = MIB;
		if (!MBB.isReturnBlock())
		AFI->getCSROffsetsToFix().push_back(&MI->getOperand(2));
		} else {
		MIB.addReg(Reg1, getDefRegState(true))
		.addReg(AArch64::FP)
		.addImm(Imm) // [fp, #imm], a byte offset for the unscaled LDUR form
		.setMIFlag(MachineInstr::FrameDestroy);
		}
		} else {
MIB.addReg(Reg1, getDefRegState(true))		MIB.addReg(Reg1, getDefRegState(true))
.addReg(AArch64::SP)		.addReg(AArch64::SP)
.addImm(RPI.Offset) // [sp, #offset8] where the factor8 is implicit		.addImm(RPI.Offset) // [sp, #offset8] where the factor8 is implicit
.setMIFlag(MachineInstr::FrameDestroy);		.setMIFlag(MachineInstr::FrameDestroy);
		}

MIB.addMemOperand(MF.getMachineMemOperand(		MIB.addMemOperand(MF.getMachineMemOperand(
MachinePointerInfo::getFixedStack(MF, RPI.FrameIdx),		MachinePointerInfo::getFixedStack(MF, RPI.FrameIdx),
MachineMemOperand::MOLoad, 8, 8));		MachineMemOperand::MOLoad, 8, 8));

		if (MFI.getShouldUseShrinkWrap2() && needsCFI) {
		unsigned DwarfReg = MRI->getDwarfRegNum(Reg1, true);
		unsigned CFIIndex =
		MF.addFrameInst(MCCFIInstruction::createRestore(nullptr, DwarfReg));
		BuildMI(MBB, MI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
		.addCFIIndex(CFIIndex)
		.setMIFlag(MachineInstr::FrameDestroy);
		}
}		}
return true;		return true;
}		}
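
Both `spillCalleeSavedRegisters` and `restoreCalleeSavedRegisters` walk the function-wide `RegPairs` list and skip every entry whose register is absent from the `CSI` list passed in for the current block. A simplified sketch of that filtering follows; `SlotSketch` and `slotsForBlock` are hypothetical names, and registers are plain integers here rather than LLVM register numbers.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, reduced form of RegPairInfo: one register and its slot.
struct SlotSketch {
  unsigned Reg;
  int Offset;
};

// Keeps only the function-wide slots whose register is listed for this block.
// Reverse = true models the spill path (reverse iteration over RegPairs),
// Reverse = false models the restore path (forward iteration).
inline std::vector<SlotSketch>
slotsForBlock(const std::vector<SlotSketch> &All,
              const std::vector<unsigned> &BlockRegs, bool Reverse) {
  std::vector<SlotSketch> Out;
  for (const SlotSketch &S : All)
    if (std::find(BlockRegs.begin(), BlockRegs.end(), S.Reg) != BlockRegs.end())
      Out.push_back(S);
  if (Reverse)
    std::reverse(Out.begin(), Out.end());
  return Out;
}
```

Because the slots come from the function-wide list, a register keeps the same offset in every block that saves or restores it, regardless of which other registers that block handles.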

void AArch64FrameLowering::determineCalleeSaves(MachineFunction &MF,		void AArch64FrameLowering::determineCalleeSaves(MachineFunction &MF,
BitVector &SavedRegs,		BitVector &SavedRegs,
RegScavenger *RS) const {		RegScavenger *RS) const {
// All calls are tail calls in GHC calling conv, and functions have no		// All calls are tail calls in GHC calling conv, and functions have no
[...]	if (!ExtraCSSpill || MF.getRegInfo().isPhysRegUsed(ExtraCSSpill)) {
RS->addScavengingFrameIndex(FI);		RS->addScavengingFrameIndex(FI);
DEBUG(dbgs() << "No available CS registers, allocated fi#" << FI		DEBUG(dbgs() << "No available CS registers, allocated fi#" << FI
<< " as the emergency spill slot.\n");		<< " as the emergency spill slot.\n");
}		}
}		}

// Round up to register pair alignment to avoid additional SP adjustment		// Round up to register pair alignment to avoid additional SP adjustment
// instructions.		// instructions.
		}

		bool AArch64FrameLowering::assignCalleeSavedSpillSlots(
		MachineFunction &MF, const TargetRegisterInfo *TRI,
		std::vector<CalleeSavedInfo> &CSI) const {
		// FIXME: ShrinkWrap2: This is only a hack to delay the computation of
		// NumRegsSpilled.
		AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
		unsigned NumRegsSpilled = CSI.size();
AFI->setCalleeSavedStackSize(alignTo(8 * NumRegsSpilled, 16));		AFI->setCalleeSavedStackSize(alignTo(8 * NumRegsSpilled, 16));
		return false;
}		}

bool AArch64FrameLowering::enableStackSlotScavenging(		bool AArch64FrameLowering::enableStackSlotScavenging(
const MachineFunction &MF) const {		const MachineFunction &MF) const {
const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();		const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
return AFI->hasCalleeSaveStackFreeSpace();		return AFI->hasCalleeSaveStackFreeSpace();
}		}

		std::unique_ptr<ShrinkWrapInfo>
		AArch64FrameLowering::createCSRShrinkWrapInfo(const MachineFunction &MF) const {
		return llvm::make_unique<AArch64CSRShrinkWrapInfo>(MF);
		}

lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp

[...]	AArch64LoadStoreOpt::mergeNarrowZeroStores(MachineBasicBlock::iterator I,
MergeMI->eraseFromParent();		MergeMI->eraseFromParent();
return NextI;		return NextI;
}		}

MachineBasicBlock::iterator		MachineBasicBlock::iterator
AArch64LoadStoreOpt::mergePairedInsns(MachineBasicBlock::iterator I,		AArch64LoadStoreOpt::mergePairedInsns(MachineBasicBlock::iterator I,
MachineBasicBlock::iterator Paired,		MachineBasicBlock::iterator Paired,
const LdStPairFlags &Flags) {		const LdStPairFlags &Flags) {
		// FIXME: ShrinkWrap2: Add optimization remarks to see when we miss forming a
		// pair.
MachineBasicBlock::iterator NextI = I;		MachineBasicBlock::iterator NextI = I;
++NextI;		++NextI;
// If NextI is the second of the two instructions to be merged, we need		// If NextI is the second of the two instructions to be merged, we need
// to skip one further. Either way we merge will invalidate the iterator,		// to skip one further. Either way we merge will invalidate the iterator,
// and we don't need to scan the new instruction, as it's a pairwise		// and we don't need to scan the new instruction, as it's a pairwise
// instruction, which we're not considering for further action anyway.		// instruction, which we're not considering for further action anyway.
if (NextI == Paired)		if (NextI == Paired)
++NextI;		++NextI;
[...]

lib/Target/AArch64/AArch64MachineFunctionInfo.h

[...]
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/CodeGen/MachineFunction.h"		#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/MC/MCLinkerOptimizationHint.h"		#include "llvm/MC/MCLinkerOptimizationHint.h"
#include <cassert>		#include <cassert>

namespace llvm {		namespace llvm {

		class MachineOperand;

		/// This contains register pairs computed for callee-save saves / restores.
		struct RegPairInfo {
		unsigned Reg1 = AArch64::NoRegister;
		unsigned Reg2 = AArch64::NoRegister;
		int FrameIdx;
		int Offset;
		bool IsGPR;

		RegPairInfo() = default;

		bool isPaired() const { return Reg2 != llvm::AArch64::NoRegister; }
		};

/// AArch64FunctionInfo - This class is derived from MachineFunctionInfo and		/// AArch64FunctionInfo - This class is derived from MachineFunctionInfo and
/// contains private AArch64-specific information for each MachineFunction.		/// contains private AArch64-specific information for each MachineFunction.
class AArch64FunctionInfo final : public MachineFunctionInfo {		class AArch64FunctionInfo final : public MachineFunctionInfo {
/// Number of bytes of arguments this function has on the stack. If the callee		/// Number of bytes of arguments this function has on the stack. If the callee
/// is expected to restore the argument stack this should be a multiple of 16,		/// is expected to restore the argument stack this should be a multiple of 16,
/// all usable during a tail call.		/// all usable during a tail call.
///		///
/// The alternative would forbid tail call optimisation in some cases: if we		/// The alternative would forbid tail call optimisation in some cases: if we
/// want to transfer control from a function with 8-bytes of stack-argument		/// want to transfer control from a function with 8-bytes of stack-argument
/// space to a function with 16-bytes then misalignment of this value would		/// space to a function with 16-bytes then misalignment of this value would
/// make a stack adjustment necessary, which could not be undone by the		/// make a stack adjustment necessary, which could not be undone by the
/// callee.		/// callee.
unsigned BytesInStackArgArea = 0;		unsigned BytesInStackArgArea = 0;

/// The number of bytes to restore to deallocate space for incoming		/// The number of bytes to restore to deallocate space for incoming
/// arguments. Canonically 0 in the C calling convention, but non-zero when		/// arguments. Canonically 0 in the C calling convention, but non-zero when
/// callee is expected to pop the args.		/// callee is expected to pop the args.
unsigned ArgumentStackToRestore = 0;		unsigned ArgumentStackToRestore = 0;

/// HasStackFrame - True if this function has a stack frame. Set by		/// HasStackFrame - True if this function has a stack frame. Set by
/// determineCalleeSaves().		/// determineCalleeSaves().
		// FIXME: ShrinkWrap2: This should not be set in determineCalleeSaves...
bool HasStackFrame = false;		bool HasStackFrame = false;

/// \brief Amount of stack frame size, not including callee-saved registers.		/// \brief Amount of stack frame size, not including callee-saved registers.
unsigned LocalStackSize;		unsigned LocalStackSize;

/// \brief Amount of stack frame size used for saving callee-saved registers.		/// \brief Amount of stack frame size used for saving callee-saved registers.
unsigned CalleeSavedStackSize;		unsigned CalleeSavedStackSize;

[...]	class AArch64FunctionInfo final : public MachineFunctionInfo {
/// True when the stack gets realigned dynamically because the size of stack		/// True when the stack gets realigned dynamically because the size of stack
/// frame is unknown at compile time. e.g., in case of VLAs.		/// frame is unknown at compile time. e.g., in case of VLAs.
bool StackRealigned = false;		bool StackRealigned = false;

/// True when the callee-save stack area has unused gaps that may be used for		/// True when the callee-save stack area has unused gaps that may be used for
/// other stack allocations.		/// other stack allocations.
bool CalleeSaveStackHasFreeSpace = false;		bool CalleeSaveStackHasFreeSpace = false;

		// FIXME: ShrinkWrap2: This should be replaced with MFI.Objects.
		/// Register pairs computed for CSR save / restore.
		SmallVector<RegPairInfo, 8> RegPairs;

		// FIXME: ShrinkWrap2: The offsets that probably need to be fixed are
		// collected during spillCalleeSavedRegisters but need to be fixed during
		// emitPrologue.
		/// Machine operands representing SP-related offsets to CSRs, that need to be
		/// fixed if local stack allocation happens afterwards.
		SmallVector<MachineOperand*, 8> CSROffsetsToFix;

public:		public:
AArch64FunctionInfo() = default;		AArch64FunctionInfo() = default;

explicit AArch64FunctionInfo(MachineFunction &MF) {		explicit AArch64FunctionInfo(MachineFunction &MF) {
(void)MF;		(void)MF;
}		}

unsigned getBytesInStackArgArea() const { return BytesInStackArgArea; }		unsigned getBytesInStackArgArea() const { return BytesInStackArgArea; }
[...]	public:

bool hasCalleeSaveStackFreeSpace() const {		bool hasCalleeSaveStackFreeSpace() const {
return CalleeSaveStackHasFreeSpace;		return CalleeSaveStackHasFreeSpace;
}		}
void setCalleeSaveStackHasFreeSpace(bool s) {		void setCalleeSaveStackHasFreeSpace(bool s) {
CalleeSaveStackHasFreeSpace = s;		CalleeSaveStackHasFreeSpace = s;
}		}

		SmallVectorImpl<RegPairInfo> &getRegPairs() { return RegPairs; }
		SmallVectorImpl<MachineOperand *> &getCSROffsetsToFix() {
		return CSROffsetsToFix;
		}

bool isSplitCSR() const { return IsSplitCSR; }		bool isSplitCSR() const { return IsSplitCSR; }
void setIsSplitCSR(bool s) { IsSplitCSR = s; }		void setIsSplitCSR(bool s) { IsSplitCSR = s; }

void setLocalStackSize(unsigned Size) { LocalStackSize = Size; }		void setLocalStackSize(unsigned Size) { LocalStackSize = Size; }
unsigned getLocalStackSize() const { return LocalStackSize; }		unsigned getLocalStackSize() const { return LocalStackSize; }

void setCalleeSavedStackSize(unsigned Size) { CalleeSavedStackSize = Size; }		void setCalleeSavedStackSize(unsigned Size) { CalleeSavedStackSize = Size; }
unsigned getCalleeSavedStackSize() const { return CalleeSavedStackSize; }		unsigned getCalleeSavedStackSize() const { return CalleeSavedStackSize; }
▲ Show 20 Lines • Show All 64 Lines • Show Last 20 Lines

lib/Target/X86/X86FrameLowering.h

Show First 20 Lines • Show All 171 Lines • ▼ Show 20 Lines	public:

/// Sets up EBP and optionally ESI based on the incoming EBP value. Only		/// Sets up EBP and optionally ESI based on the incoming EBP value. Only
/// needed for 32-bit. Used in funclet prologues and at catchret destinations.		/// needed for 32-bit. Used in funclet prologues and at catchret destinations.
MachineBasicBlock::iterator		MachineBasicBlock::iterator
restoreWin32EHStackPointers(MachineBasicBlock &MBB,		restoreWin32EHStackPointers(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MBBI,		MachineBasicBlock::iterator MBBI,
const DebugLoc &DL, bool RestoreSP = false) const;		const DebugLoc &DL, bool RestoreSP = false) const;

		std::unique_ptr<ShrinkWrapInfo>
		createCSRShrinkWrapInfo(const MachineFunction &MF) const override;

private:		private:
uint64_t calculateMaxStackAlign(const MachineFunction &MF) const;		uint64_t calculateMaxStackAlign(const MachineFunction &MF) const;

/// Emit target stack probe as a call to a helper function		/// Emit target stack probe as a call to a helper function
void emitStackProbeCall(MachineFunction &MF, MachineBasicBlock &MBB,		void emitStackProbeCall(MachineFunction &MF, MachineBasicBlock &MBB,
MachineBasicBlock::iterator MBBI, const DebugLoc &DL,		MachineBasicBlock::iterator MBBI, const DebugLoc &DL,
bool InProlog) const;		bool InProlog) const;

Show All 34 Lines

lib/Target/X86/X86FrameLowering.cpp

Show All 29 Lines
#include "llvm/MC/MCAsmInfo.h"		#include "llvm/MC/MCAsmInfo.h"
#include "llvm/MC/MCSymbol.h"		#include "llvm/MC/MCSymbol.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Target/TargetOptions.h"		#include "llvm/Target/TargetOptions.h"
#include <cstdlib>		#include <cstdlib>

using namespace llvm;		using namespace llvm;

		class X86CSRShrinkWrapInfo final : public ShrinkWrapInfo {
		/// Number of bits the result needs.
		unsigned NumCSRs = 0;
		public:
		unsigned getNumResultBits() const override { return NumCSRs; }

		X86CSRShrinkWrapInfo(const MachineFunction &MF) : ShrinkWrapInfo(MF) {
		bool Is64Bit = MF.getSubtarget<X86Subtarget>().is64Bit();
		auto TRI = static_cast<const X86RegisterInfo *>(
		MF.getSubtarget().getRegisterInfo());
		const MCPhysReg *CSRegs = TRI->getCalleeSavedRegs(&MF);
		unsigned BasePtrIndex = static_cast<unsigned>(-1);
		unsigned RBPIndex = static_cast<unsigned>(-1);
		// Count the number of CSRs.
		unsigned BasePtr = TRI->getBaseRegister();
		if (Is64Bit && BasePtr == X86::EBX)
		BasePtr = X86::RBX;
		unsigned FramePtr = TRI->getFramePtr();
		if (Is64Bit && FramePtr == X86::EBP)
		FramePtr = X86::RBP;
		// FIXME: ShrinkWrap2: Fix HHVM, which has only R12 as a CSR.
		for (unsigned i = 0; CSRegs[i]; ++i) {
		if (CSRegs[i] == FramePtr)
		RBPIndex = i;
		else if (CSRegs[i] == BasePtr)
		BasePtrIndex = i;
		++NumCSRs;
		}

		determineCSRUses();

		// FIXME: ShrinkWrap2: const_cast
		MachineFrameInfo &MFI = const_cast<MachineFrameInfo &>(MF.getFrameInfo());

		// FIXME: ShrinkWrap2: This is a copy of the code in determineCalleeSaves.
		// It also feels like there should not be any side effects done here.
		// FIXME: ShrinkWrap2: const_cast
		auto X86FI = const_cast<X86MachineFunctionInfo *>(
		MF.getInfo<X86MachineFunctionInfo>());
		int64_t TailCallReturnAddrDelta = X86FI->getTCReturnAddrDelta();
		auto SlotSize = TRI->getSlotSize();

		if (TailCallReturnAddrDelta < 0) {
		// create RETURNADDR area
		// arg
		// arg
		// RETADDR
		// { ...
		// RETADDR area
		// ...
		// }
		// [EBP]
		MFI.CreateFixedObject(-TailCallReturnAddrDelta,
		TailCallReturnAddrDelta - SlotSize, true);
		}

		// Spill the BasePtr if it's used.
		if (TRI->hasBasePointer(MF)) {
		auto &SavedRegs = Uses[MF.front().getNumber()];
		if (SavedRegs.empty())
		SavedRegs.resize(getNumResultBits());
		SavedRegs.set(BasePtrIndex);

		// Allocate a spill slot for EBP if we have a base pointer and EH
		// funclets.
		if (MF.hasEHFunclets()) {
		int FI = MFI.CreateSpillStackObject(SlotSize, SlotSize);
		X86FI->setHasSEHFramePtrSave(true);
		X86FI->setSEHFramePtrSaveIndex(FI);
		}
		}

		// X86FrameLowering::EmitPrologue spills RBP manually. Remove it from the
		// uses.
		for (BitVector &BV : Uses)
		if (!BV.empty())
		BV.reset(RBPIndex);
		}

		raw_ostream &printElt(unsigned Elt, raw_ostream &OS) const override {
		auto &TRI = *MF.getSubtarget().getRegisterInfo();
		OS << PrintReg(TRI.getCalleeSavedRegs(&MF)[Elt], &TRI);
		return OS;
		}
		};

X86FrameLowering::X86FrameLowering(const X86Subtarget &STI,		X86FrameLowering::X86FrameLowering(const X86Subtarget &STI,
unsigned StackAlignOverride)		unsigned StackAlignOverride)
: TargetFrameLowering(StackGrowsDown, StackAlignOverride,		: TargetFrameLowering(StackGrowsDown, StackAlignOverride,
STI.is64Bit() ? -8 : -4),		STI.is64Bit() ? -8 : -4),
STI(STI), TII(*STI.getInstrInfo()), TRI(STI.getRegisterInfo()) {		STI(STI), TII(*STI.getInstrInfo()), TRI(STI.getRegisterInfo()) {
// Cache a bunch of frame-related predicates for this subtarget.		// Cache a bunch of frame-related predicates for this subtarget.
SlotSize = TRI->getSlotSize();		SlotSize = TRI->getSlotSize();
Is64Bit = STI.is64Bit();		Is64Bit = STI.is64Bit();
▲ Show 20 Lines • Show All 1,019 Lines • ▼ Show 20 Lines	if (HasFP) {
assert(MF.getRegInfo().isReserved(MachineFramePtr) && "FP reserved");		assert(MF.getRegInfo().isReserved(MachineFramePtr) && "FP reserved");

// Calculate required stack adjustment.		// Calculate required stack adjustment.
uint64_t FrameSize = StackSize - SlotSize;		uint64_t FrameSize = StackSize - SlotSize;
// If required, include space for extra hidden slot for stashing base pointer.		// If required, include space for extra hidden slot for stashing base pointer.
if (X86FI->getRestoreBasePointer())		if (X86FI->getRestoreBasePointer())
FrameSize += SlotSize;		FrameSize += SlotSize;

NumBytes = FrameSize - X86FI->getCalleeSavedFrameSize();		NumBytes = FrameSize;
		// FIXME: ShrinkWrap2: Since we disabled the push / pop spilling, we now
		// have to include the callee saves in our frame size, so that our sp
		// displacement can be updated properly.
		if (!MFI.getShouldUseShrinkWrap2())
		NumBytes -= X86FI->getCalleeSavedFrameSize();

		MatzeB: odd linebreak. Also could be written as: NumBytes = FrameSize; if (!MFI.getSaves().empty()) NumBytes -= X86FI->getCalleeSavedFrameSize();
// Callee-saved registers are pushed on stack before the stack is realigned.		// Callee-saved registers are pushed on stack before the stack is realigned.
if (TRI->needsStackRealignment(MF) && !IsWin64Prologue)		if (TRI->needsStackRealignment(MF) && !IsWin64Prologue)
NumBytes = alignTo(NumBytes, MaxAlign);		NumBytes = alignTo(NumBytes, MaxAlign);

// Get the offset of the stack slot for the EBP register, which is		// Get the offset of the stack slot for the EBP register, which is
// guaranteed to be the last slot by processFunctionBeforeFrameFinalized.		// guaranteed to be the last slot by processFunctionBeforeFrameFinalized.
// Update the frame offset adjustment.		// Update the frame offset adjustment.
if (!IsFunclet)		if (!IsFunclet)
Show All 40 Lines	if (!IsWin64Prologue && !IsFunclet) {
// Define the current CFA to use the EBP/RBP register.		// Define the current CFA to use the EBP/RBP register.
unsigned DwarfFramePtr = TRI->getDwarfRegNum(MachineFramePtr, true);		unsigned DwarfFramePtr = TRI->getDwarfRegNum(MachineFramePtr, true);
BuildCFI(MBB, MBBI, DL, MCCFIInstruction::createDefCfaRegister(		BuildCFI(MBB, MBBI, DL, MCCFIInstruction::createDefCfaRegister(
nullptr, DwarfFramePtr));		nullptr, DwarfFramePtr));
}		}
}		}
} else {		} else {
assert(!IsFunclet && "funclets without FPs not yet implemented");		assert(!IsFunclet && "funclets without FPs not yet implemented");
NumBytes = StackSize - X86FI->getCalleeSavedFrameSize();		NumBytes = StackSize;
		// FIXME: ShrinkWrap2: Since we disabled the push / pop spilling, we now
		// have to include the callee saves in our frame size, so that our sp
		// displacement can be updated properly.
		if (!MFI.getShouldUseShrinkWrap2())
		NumBytes -= X86FI->getCalleeSavedFrameSize();
}		}

// For EH funclets, only allocate enough space for outgoing calls. Save the		// For EH funclets, only allocate enough space for outgoing calls. Save the
// NumBytes value that we would've used for the parent frame.		// NumBytes value that we would've used for the parent frame.
unsigned ParentFrameNumBytes = NumBytes;		unsigned ParentFrameNumBytes = NumBytes;
if (IsFunclet)		if (IsFunclet)
NumBytes = getWinEHFuncletFrameSize(MF);		NumBytes = getWinEHFuncletFrameSize(MF);

// Skip the callee-saved push instructions.		// Skip the callee-saved push instructions.
bool PushedRegs = false;		bool PushedRegs = false;
int StackOffset = 2 * stackGrowth;		int StackOffset = 2 * stackGrowth;

		// FIXME: Add CFI for all the callee saved registers. Since the saves /
		// restores are not at the beginning of the function, we need to go through
		// all the basic blocks.

while (MBBI != MBB.end() &&		while (MBBI != MBB.end() &&
MBBI->getFlag(MachineInstr::FrameSetup) &&		MBBI->getFlag(MachineInstr::FrameSetup) &&
(MBBI->getOpcode() == X86::PUSH32r ||		(MBBI->getOpcode() == X86::PUSH32r ||
MBBI->getOpcode() == X86::PUSH64r)) {		MBBI->getOpcode() == X86::PUSH64r)) {
PushedRegs = true;		PushedRegs = true;
unsigned Reg = MBBI->getOperand(0).getReg();		unsigned Reg = MBBI->getOperand(0).getReg();
++MBBI;		++MBBI;

▲ Show 20 Lines • Show All 415 Lines • ▼ Show 20 Lines	if (RetOpcode && *RetOpcode == X86::CATCHRET) {
NumBytes = getWinEHFuncletFrameSize(MF);		NumBytes = getWinEHFuncletFrameSize(MF);
assert(hasFP(MF) && "EH funclets without FP not yet implemented");		assert(hasFP(MF) && "EH funclets without FP not yet implemented");
BuildMI(MBB, MBBI, DL, TII.get(Is64Bit ? X86::POP64r : X86::POP32r),		BuildMI(MBB, MBBI, DL, TII.get(Is64Bit ? X86::POP64r : X86::POP32r),
MachineFramePtr)		MachineFramePtr)
.setMIFlag(MachineInstr::FrameDestroy);		.setMIFlag(MachineInstr::FrameDestroy);
} else if (hasFP(MF)) {		} else if (hasFP(MF)) {
// Calculate required stack adjustment.		// Calculate required stack adjustment.
uint64_t FrameSize = StackSize - SlotSize;		uint64_t FrameSize = StackSize - SlotSize;
NumBytes = FrameSize - CSSize;		NumBytes = FrameSize;
		// FIXME: ShrinkWrap2: Since we disabled the push / pop spilling, we now
		// have to include the callee saves in our frame size, so that our sp
		// displacement can be updated properly.
		if (!MFI.getShouldUseShrinkWrap2())
		NumBytes -= CSSize;

		MatzeB: see above
// Callee-saved registers were pushed on stack before the stack was		// Callee-saved registers were pushed on stack before the stack was
// realigned.		// realigned.
if (TRI->needsStackRealignment(MF) && !IsWin64Prologue)		if (TRI->needsStackRealignment(MF) && !IsWin64Prologue)
NumBytes = alignTo(FrameSize, MaxAlign);		NumBytes = alignTo(FrameSize, MaxAlign);

// Pop EBP.		// Pop EBP.
BuildMI(MBB, MBBI, DL,		BuildMI(MBB, MBBI, DL,
TII.get(Is64Bit ? X86::POP64r : X86::POP32r), MachineFramePtr)		TII.get(Is64Bit ? X86::POP64r : X86::POP32r), MachineFramePtr)
.setMIFlag(MachineInstr::FrameDestroy);		.setMIFlag(MachineInstr::FrameDestroy);
} else {		} else {
NumBytes = StackSize - CSSize;		NumBytes = StackSize;
		// FIXME: ShrinkWrap2: Since we disabled the push / pop spilling, we now
		// have to include the callee saves in our frame size, so that our sp
		// displacement can be updated properly.
		if (!MFI.getShouldUseShrinkWrap2())
		NumBytes -= CSSize;

}		}
uint64_t SEHStackAllocAmt = NumBytes;		uint64_t SEHStackAllocAmt = NumBytes;

MachineBasicBlock::iterator FirstCSPop = MBBI;		MachineBasicBlock::iterator FirstCSPop = MBBI;
// Skip the callee-saved pop instructions.		// Skip the callee-saved pop instructions.
while (MBBI != MBB.begin()) {		while (MBBI != MBB.begin()) {
MachineBasicBlock::iterator PI = std::prev(MBBI);		MachineBasicBlock::iterator PI = std::prev(MBBI);
unsigned Opc = PI->getOpcode();		unsigned Opc = PI->getOpcode();
▲ Show 20 Lines • Show All 44 Lines • ▼ Show 20 Lines	void X86FrameLowering::emitEpilogue(MachineFunction &MF,
// will not do realignment or dynamic stack allocation.		// will not do realignment or dynamic stack allocation.
if ((TRI->needsStackRealignment(MF) || MFI.hasVarSizedObjects()) &&		if ((TRI->needsStackRealignment(MF) || MFI.hasVarSizedObjects()) &&
!IsFunclet) {		!IsFunclet) {
if (TRI->needsStackRealignment(MF))		if (TRI->needsStackRealignment(MF))
MBBI = FirstCSPop;		MBBI = FirstCSPop;
unsigned SEHFrameOffset = calculateSetFPREG(SEHStackAllocAmt);		unsigned SEHFrameOffset = calculateSetFPREG(SEHStackAllocAmt);
uint64_t LEAAmount =		uint64_t LEAAmount =
IsWin64Prologue ? SEHStackAllocAmt - SEHFrameOffset : -CSSize;		IsWin64Prologue ? SEHStackAllocAmt - SEHFrameOffset : -CSSize;
		// FIXME: ShrinkWrap2: Here, we can't assume we are going to pop all the
		// callee saves (because we aren't, we actually move them back, then adjust
		// the stack), so we just want to restore the stack pointer. This should go
		// away at some point...
		if (MFI.getShouldUseShrinkWrap2())
		LEAAmount = 0;

// There are only two legal forms of epilogue:		// There are only two legal forms of epilogue:
// - add SEHAllocationSize, %rsp		// - add SEHAllocationSize, %rsp
// - lea SEHAllocationSize(%FramePtr), %rsp		// - lea SEHAllocationSize(%FramePtr), %rsp
//		//
// 'mov %FramePtr, %rsp' will not be recognized as an epilogue sequence.		// 'mov %FramePtr, %rsp' will not be recognized as an epilogue sequence.
// However, we may use this sequence if we have a frame pointer because the		// However, we may use this sequence if we have a frame pointer because the
// effects of the prologue can safely be undone.		// effects of the prologue can safely be undone.
▲ Show 20 Lines • Show All 276 Lines • ▼ Show 20 Lines	bool X86FrameLowering::assignCalleeSavedSpillSlots(

return true;		return true;
}		}

bool X86FrameLowering::spillCalleeSavedRegisters(		bool X86FrameLowering::spillCalleeSavedRegisters(
MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,		MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
const std::vector<CalleeSavedInfo> &CSI,		const std::vector<CalleeSavedInfo> &CSI,
const TargetRegisterInfo *TRI) const {		const TargetRegisterInfo *TRI) const {
		// FIXME: ShrinkWrap2: Save using this function when it's adapted to work
		// without push / pop.
		if (MBB.getParent()->getFrameInfo().getShouldUseShrinkWrap2())
		return false;

DebugLoc DL = MBB.findDebugLoc(MI);		DebugLoc DL = MBB.findDebugLoc(MI);

// Don't save CSRs in 32-bit EH funclets. The caller saves EBX, EBP, ESI, EDI		// Don't save CSRs in 32-bit EH funclets. The caller saves EBX, EBP, ESI, EDI
// for us, and there are no XMM CSRs on Win32.		// for us, and there are no XMM CSRs on Win32.
if (MBB.isEHFuncletEntry() && STI.is32Bit() && STI.isOSWindows())		if (MBB.isEHFuncletEntry() && STI.is32Bit() && STI.isOSWindows())
return true;		return true;

// Push GPRs. It increases frame size.		// Push GPRs. It increases frame size.
▲ Show 20 Lines • Show All 50 Lines • ▼ Show 20 Lines	bool X86FrameLowering::spillCalleeSavedRegisters(

return true;		return true;
}		}

bool X86FrameLowering::restoreCalleeSavedRegisters(MachineBasicBlock &MBB,		bool X86FrameLowering::restoreCalleeSavedRegisters(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MI,		MachineBasicBlock::iterator MI,
const std::vector<CalleeSavedInfo> &CSI,		const std::vector<CalleeSavedInfo> &CSI,
const TargetRegisterInfo *TRI) const {		const TargetRegisterInfo *TRI) const {
		// FIXME: ShrinkWrap2: Restore using this function when it's adapted to work
		// without push / pop.
		if (MBB.getParent()->getFrameInfo().getShouldUseShrinkWrap2())
		return false;

if (CSI.empty())		if (CSI.empty())
return false;		return false;

if (MI != MBB.end() && isFuncletReturnInstr(*MI) && STI.isOSWindows()) {		if (MI != MBB.end() && isFuncletReturnInstr(*MI) && STI.isOSWindows()) {
// Don't restore CSRs in 32-bit EH funclets. Matches		// Don't restore CSRs in 32-bit EH funclets. Matches
// spillCalleeSavedRegisters.		// spillCalleeSavedRegisters.
if (STI.is32Bit())		if (STI.is32Bit())
return true;		return true;
▲ Show 20 Lines • Show All 1,020 Lines • ▼ Show 20 Lines	void X86FrameLowering::processFunctionBeforeFrameFinalized(
while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup))		while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup))
++MBBI;		++MBBI;

DebugLoc DL = MBB.findDebugLoc(MBBI);		DebugLoc DL = MBB.findDebugLoc(MBBI);
addFrameReference(BuildMI(MBB, MBBI, DL, TII.get(X86::MOV64mi32)),		addFrameReference(BuildMI(MBB, MBBI, DL, TII.get(X86::MOV64mi32)),
UnwindHelpFI)		UnwindHelpFI)
.addImm(-2);		.addImm(-2);
}		}

		std::unique_ptr<ShrinkWrapInfo>
		X86FrameLowering::createCSRShrinkWrapInfo(const MachineFunction &MF) const {
		return llvm::make_unique<X86CSRShrinkWrapInfo>(MF);
		}
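
The per-block bookkeeping done in the X86CSRShrinkWrapInfo constructor above can be modelled outside LLVM. The following is a minimal sketch, not the patch's code: `CSRBits`, `BlockUses`, `markUse`, `dropEverywhere` and `NoIndex` are hypothetical names, and `std::vector<bool>` stands in for `BitVector`. The guard against `NoIndex` is an assumption added here; the patch calls `BV.reset(RBPIndex)` without such a check, even though `RBPIndex` keeps its `static_cast<unsigned>(-1)` initial value when the frame pointer is not in the CSR list.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for llvm::BitVector: one bit per callee-saved register.
using CSRBits = std::vector<bool>;

// Hypothetical stand-in for the per-basic-block `Uses` table. An empty entry
// means the block uses no CSR, mirroring the `BV.empty()` checks in the patch.
using BlockUses = std::vector<CSRBits>;

// Sentinel for "register not found in the CSR list" (assumption: mirrors the
// static_cast<unsigned>(-1) initial value of RBPIndex / BasePtrIndex).
constexpr std::size_t NoIndex = static_cast<std::size_t>(-1);

// Mark CSR `Index` as used in `Block`, sizing the bit set lazily on first use.
void markUse(BlockUses &Uses, std::size_t Block, std::size_t Index,
             std::size_t NumCSRs) {
  if (Index == NoIndex)
    return; // Nothing to mark: the register is not a CSR.
  CSRBits &Bits = Uses[Block];
  if (Bits.empty())
    Bits.resize(NumCSRs);
  Bits[Index] = true;
}

// Clear CSR `Index` in every block that tracks any use, as the patch does for
// RBP because the prologue spills it manually.
void dropEverywhere(BlockUses &Uses, std::size_t Index) {
  if (Index == NoIndex)
    return; // Guard added in this sketch only.
  for (CSRBits &Bits : Uses)
    if (!Bits.empty())
      Bits[Index] = false;
}
```

With three blocks and four CSRs, marking index 1 in blocks 0 and 2 and then calling `dropEverywhere(Uses, 1)` leaves block 1 empty and clears bit 1 in the other two, which is the shape of the "remove RBP from the uses" step.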

test/CodeGen/AArch64/ShrinkWrapping/AliasInRegMask.mir

This file was added.

				# RUN: llc -mtriple=aarch64-- -run-pass prologepilog -debug-only=shrink-wrap2 %s -o /dev/null 2>&1 | FileCheck %s
				# REQUIRES: asserts
				--- |
				declare void @f0() nounwind
				define void @f1() nounwind { ret void }
				...
				---
				name: f1
				tracksRegLiveness: true
				body: |
				bb.0:
				successors: %bb.2, %bb.1

				CBNZW %wzr, %bb.2
				B %bb.1

				bb.1:
				TCRETURNdi @f0, 0, csr_aarch64_aapcs, implicit %sp

				bb.2:
				RET_ReallyLR
				...
				# Check that we don't look for aliased regs in RegMasks.

				# CHECK-LABEL: f1
				# CHECK-NOT: Uses:

test/CodeGen/AArch64/ShrinkWrapping/CFIStackFrame.mir

This file was added.

				# RUN: llc -filetype obj -mtriple=arm64-apple-ios10.3.0 -run-pass=prologepilog -debug-only=shrink-wrap2 %s -o /dev/null 2>&1 | FileCheck %s
				# REQUIRES: asserts
				--- |
				declare void @f0() nounwind
				define void @f1() nounwind { ret void }
				...
				---
				name: f1
				tracksRegLiveness: true
				body: |
				bb.0:
				successors: %bb.1, %bb.2

				CBNZW %wzr, %bb.2
				B %bb.1

				bb.1:
				successors: %bb.2

				ADJCALLSTACKDOWN 0, 0, implicit-def dead %sp, implicit %sp
				BL @f0, csr_aarch64_aapcs, implicit-def dead %lr, implicit %sp, implicit-def %sp
				ADJCALLSTACKUP 0, 0, implicit-def dead %sp, implicit %sp

				bb.2:
				RET_ReallyLR
				...
				# CHECK-LABEL: f1
				# CHECK-NOT: Insufficient CFI instructions to define a frame!

test/CodeGen/AArch64/ShrinkWrapping/CSRUsedOnTerminator.mir

This file was added.

				# RUN: llc -mtriple=aarch64-- -run-pass prologepilog -debug-only=shrink-wrap2 %s -o /dev/null 2>&1 | FileCheck %s
				# REQUIRES: asserts
				--- |
				define void @f0() nounwind { ret void }
				...
				---
				name: f0
				tracksRegLiveness: true
				body: |
				bb.0:
				successors: %bb.1, %bb.2

				%nzcv = IMPLICIT_DEF
				Bcc 0, %bb.1, implicit killed %nzcv
				B %bb.2

				bb.1:
				RET_ReallyLR

				bb.2:
				successors: %bb.3, %bb.4

				%x21 = IMPLICIT_DEF

				%nzcv = IMPLICIT_DEF
				Bcc 0, %bb.3, implicit killed %nzcv
				B %bb.4

				bb.3:
				RET_ReallyLR

				bb.4:
				liveins: %x21
				successors: %bb.5, %bb.6

				CBZX killed %x21, %bb.5
				B %bb.6

				bb.5:
				RET_ReallyLR

				bb.6:
				RET_ReallyLR
				...
				# Check that we mark uses in terminator instructions as used in all the successors as well.

				# CHECK-LABEL: f0

				# CHECK: BB#2 uses : %X21
				# CHECK-NEXT: BB#4 uses : %X21
				# CHECK-NEXT: BB#5 uses : %X21
				# CHECK-NEXT: BB#6 uses : %X21
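
The rule this test checks (a CSR used by a terminator counts as used in every successor too) can be sketched on a toy CFG. This is an illustration reconstructed from the test's comment and CHECK lines only; the pass's actual implementation is not shown in this part of the diff. `Block` and `blocksUsingCSR` are hypothetical names.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical toy basic block: successor indices plus two flags saying
// whether the tracked CSR appears in the body or in a terminator.
struct Block {
  std::vector<std::size_t> Succs;
  bool UsedInBody = false;
  bool UsedInTerminator = false;
};

// Collect the blocks considered to "use" the CSR: any block touching it, plus
// all successors of a block whose terminator touches it.
std::set<std::size_t> blocksUsingCSR(const std::vector<Block> &CFG) {
  std::set<std::size_t> Result;
  for (std::size_t I = 0; I < CFG.size(); ++I) {
    const Block &B = CFG[I];
    if (B.UsedInBody || B.UsedInTerminator)
      Result.insert(I);
    if (B.UsedInTerminator)
      for (std::size_t S : B.Succs)
        Result.insert(S);
  }
  return Result;
}
```

Modelling f0 from this test (bb.2 defines %x21 in its body, bb.4 reads it in CBZX and branches to bb.5 and bb.6) yields the set {2, 4, 5, 6}, matching the four CHECK lines.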

test/CodeGen/AArch64/ShrinkWrapping/CompactUnwindingFPSPPair.mir

This file was added.

				# RUN: llc -filetype obj -mtriple=arm64-apple-ios10.3.0 -run-pass=prologepilog %s -debug-only=shrink-wrap2 -o - 2>&1 | FileCheck %s
				# REQUIRES: asserts
				--- |
				define void @f0() nounwind { ret void }
				...
				---
				name: f0
				tracksRegLiveness: true
				liveins:
				- { reg: '%x1' }
				body: |
				bb.0:
				successors: %bb.1, %bb.3
				liveins: %x1

				%x19 = COPY %x1
				CBNZW %wzr, %bb.3
				B %bb.1

				bb.1:
				successors: %bb.2
				liveins: %x19


				bb.2:
				successors: %bb.2, %bb.3
				liveins: %x19

				ADJCALLSTACKDOWN 0, 0, implicit-def dead %sp, implicit %sp
				%x1 = COPY %x19
				BL @f0, csr_aarch64_aapcs, implicit-def dead %lr, implicit %sp, implicit undef %x0, implicit %x1, implicit-def %sp
				ADJCALLSTACKUP 0, 0, implicit-def dead %sp, implicit %sp
				dead %xzr = SUBSXri undef %x8, 8, 0, implicit-def %nzcv
				Bcc 12, %bb.2, implicit killed %nzcv
				B %bb.3

				bb.3:
				RET_ReallyLR

				...
				# Check that we're not trying to produce compact unwinding when FP and LR are split.

				# CHECK-LABEL: f0
				# CHECK-NOT: Pushing invalid registers for frame!

test/CodeGen/AArch64/ShrinkWrapping/DetermineCalleeSavesSideEffects.mir

This file was added.

				# RUN: llc -march=aarch64 -mcpu=cortex-a57 -aarch64-a57-fp-load-balancing-override=1 -aarch64-a57-fp-load-balancing-force-all -enable-misched=false -enable-post-misched=false -run-pass=prologepilog -debug-only=shrink-wrap2 %s -o /dev/null 2>&1 | FileCheck %s
				# REQUIRES: asserts
				--- |
				declare void @f1() #0
				define void @f0() #1 { ret void }

				attributes #0 = { nounwind "target-cpu"="cortex-a57" }
				attributes #1 = { nounwind "no-frame-pointer-elim-non-leaf" "target-cpu"="cortex-a57" }

				...
				---
				name: f0
				tracksRegLiveness: true
				body: |
				bb.0:
				successors: %bb.1, %bb.2

				CBNZW %wzr, %bb.2
				B %bb.1

				bb.1:
				successors: %bb.2

				ADJCALLSTACKDOWN 0, 0, implicit-def dead %sp, implicit %sp
				BL @f1, csr_aarch64_aapcs, implicit-def dead %lr, implicit %sp, implicit-def %sp
				ADJCALLSTACKUP 0, 0, implicit-def dead %sp, implicit %sp

				bb.2:
				RET_ReallyLR
				...
				# Check that while we look for CSRs, we set the appropriate internal state of AArch64MachineFunction.

				# CHECK-LABEL: f0
				# CHECK-NOT: unexpected function without stack frame but with FP
				# CHECK: BB#1 uses : %LR
				# CHECK: **** Shrink-wrapping results
				# CHECK-NEXT: BB#1: Saves: %LR, | Restores: %LR,

test/CodeGen/AArch64/ShrinkWrapping/FirstMBBNum2.ll

This file was added.

				; RUN: llc -mtriple=aarch64-- -O0 -global-isel -global-isel-abort=0 -verify-machineinstrs -enable-shrink-wrap2=true -debug-only=shrink-wrap2 %s -o - 2>&1 | FileCheck %s
				; FIXME: ShrinkWrap2: use MIR once we fix stack protector assert.
				; REQUIRES: asserts
				; This test causes the first MBB ID to be 2, which provoked a bug.

				; CHECK-LABEL: ABIi128

				; CHECK: BB#2 uses : %LR
				; CHECK: **** Shrink-wrapping results
				; CHECK-NEXT: BB#2: Saves: %LR, | Restores: %LR,

				target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
				target triple = "aarch64--"

				define i128 @ABIi128(i128 %arg1) nounwind {
				%res = fptoui fp128 undef to i128
				ret i128 %res
				}

test/CodeGen/AArch64/ShrinkWrapping/NoPostPreLoadStore.mir

This file was added.

				# RUN: llc -mtriple=arm64-apple-ios -debug-only=shrink-wrap2 -run-pass=prologepilog %s -o /dev/null 2>&1 | FileCheck %s
				# REQUIRES: asserts

				--- |
				define void @f0() nounwind { ret void }
				declare void @f1() nounwind
				declare void @f2() nounwind
				...
				---
				name: f0
				tracksRegLiveness: true
				body: |
				bb.0:
				successors: %bb.1, %bb.2

				CBNZW %wzr, %bb.2
				B %bb.1

				bb.1:
				successors: %bb.2

				ADJCALLSTACKDOWN 0, 0, implicit-def dead %sp, implicit %sp
				BL @f1, csr_aarch64_aapcs, implicit-def dead %lr, implicit %sp, implicit-def %sp
				ADJCALLSTACKUP 0, 0, implicit-def dead %sp, implicit %sp

				bb.2:
				TCRETURNdi @f2, 0, csr_aarch64_aapcs, implicit %sp

				...

				# This test makes sure that we don't convert callee save save / restores from
				# store / load to pre / post increment load store.

				# CHECK-LABEL: f0
				# CHECK-NOT: This is not a register operand
				# CHECK: BB#1 uses : %LR
				# CHECK: **** Shrink-wrapping results
				# CHECK-NEXT: BB#1: Saves: %LR, | Restores: %LR,

test/CodeGen/AArch64/ShrinkWrapping/NoStackObjects.mir

This file was added.

				# RUN: llc -filetype obj -mtriple=arm64-apple-ios10.3.0 -run-pass=prologepilog -debug-only=shrink-wrap2 %s -o /dev/null 2>&1 | FileCheck %s
				# REQUIRES: asserts
				--- |
				define void @f0() nounwind { entry: ret void }
				declare void @f1()
				...
				---
				name: f0
				tracksRegLiveness: true
				liveins:
				- { reg: '%d0' }
				- { reg: '%d1' }
				body: |
				bb.0:
				successors: %bb.2, %bb.1
				liveins: %d0, %d1

				dead %wzr = SUBSWri undef %w8, 0, 0, implicit-def %nzcv
				Bcc 12, %bb.2, implicit killed %nzcv
				B %bb.1

				bb.1:
				successors: %bb.4, %bb.3
				liveins: %d0, %d1

				CBNZW %wzr, %bb.4
				B %bb.3

				bb.2:
				ADJCALLSTACKDOWN 0, 0, implicit-def dead %sp, implicit %sp
				%x3 = COPY %sp
				BL @f1, csr_aarch64_aapcs_thisreturn, implicit-def dead %lr, implicit %sp, implicit undef %x0, implicit undef %x1, implicit undef %x2, implicit killed %x3, implicit-def %sp
				ADJCALLSTACKUP 0, 0, implicit-def dead %sp, implicit %sp

				bb.3:
				successors: %bb.4
				liveins: %d0, %d1

				ADJCALLSTACKDOWN 0, 0, implicit-def dead %sp, implicit %sp
				%x3 = COPY %sp
				%w4 = MOVi32imm 70
				%w5 = COPY %wzr
				BL @f1, csr_aarch64_aapcs_thisreturn, implicit-def dead %lr, implicit %sp, implicit undef %x0, implicit %d0, implicit %d1, implicit undef %x1, implicit undef %x2, implicit killed %x3, implicit undef %d2, implicit killed %w4, implicit killed %w5, implicit-def %sp
				ADJCALLSTACKUP 0, 0, implicit-def dead %sp, implicit %sp

				bb.4:
				%w0 = MOVi32imm 1
				RET_ReallyLR implicit killed %w0
				...
				# Check that we don't use the stack objects in the AArch64 backend.

				# CHECK-LABEL: f0
				# CHECK-NOT: Getting frame offset for a dead object?

test/CodeGen/AArch64/aarch64-dynamic-stack-layout.ll

	; RUN: llc -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -disable-post-ra < %s | FileCheck %s			; RUN: llc -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -disable-post-ra < %s | FileCheck %s
	; RUN: llc -verify-machineinstrs -mtriple=arm64-apple-ios -disable-fp-elim -disable-post-ra < %s | FileCheck %s --check-prefix=CHECK-MACHO			; RUN: llc -verify-machineinstrs -mtriple=arm64-apple-ios -disable-fp-elim -disable-post-ra < %s | FileCheck %s --check-prefix=CHECK-MACHO
				; XFAIL: *

	; This test aims to check basic correctness of frame layout &			; This test aims to check basic correctness of frame layout &
	; frame access code. There are 8 functions in this test file,			; frame access code. There are 8 functions in this test file,
	; each function implements one element in the cartesian product			; each function implements one element in the cartesian product
	; of:			; of:
	; . a function having a VLA/noVLA			; . a function having a VLA/noVLA
	; . a function with dynamic stack realignment/no dynamic stack realignment.			; . a function with dynamic stack realignment/no dynamic stack realignment.
	; . a function needing a frame pointer/no frame pointer,			; . a function needing a frame pointer/no frame pointer,
	▲ Show 20 Lines • Show All 644 Lines • ▼ Show 20 Lines
	bb0:			bb0:
	%MyAlloca = alloca i8, i64 64, align 32			%MyAlloca = alloca i8, i64 64, align 32
	br label %bb1			br label %bb1

	bb1:			bb1:
	ret void			ret void
	}			}

				; FIXME: ShrinkWrap2: This fails because we don't combine the two sp displacements.
	; CHECK-LABEL: realign_conditional			; CHECK-LABEL: realign_conditional
	; No realignment in the prologue.			; No realignment in the prologue.
	; CHECK-NOT: and			; CHECK-NOT: and
	; CHECK-NOT: 0xffffffffffffffe0			; CHECK-NOT: 0xffffffffffffffe0
	; CHECK: tbz {{.*}} .[[LABEL:.*]]			; CHECK: tbz {{.*}} .[[LABEL:.*]]
	; Stack is realigned in a non-entry BB.			; Stack is realigned in a non-entry BB.
	; CHECK: sub [[REG:x[01-9]+]], sp, #64			; CHECK: sub [[REG:x[01-9]+]], sp, #64
	; CHECK: and sp, [[REG]], #0xffffffffffffffe0			; CHECK: and sp, [[REG]], #0xffffffffffffffe0
	Show All 36 Lines

test/CodeGen/AArch64/alloca.ll

	; RUN: llc -mtriple=aarch64-linux-gnu -disable-post-ra -verify-machineinstrs -o - %s | FileCheck %s			; RUN: llc -mtriple=aarch64-linux-gnu -disable-post-ra -verify-machineinstrs -o - %s | FileCheck %s
	; RUN: llc -mtriple=arm64-apple-ios -disable-post-ra -verify-machineinstrs -o - %s | FileCheck %s --check-prefix=CHECK-MACHO			; RUN: llc -mtriple=arm64-apple-ios -disable-post-ra -verify-machineinstrs -o - %s | FileCheck %s --check-prefix=CHECK-MACHO
	; RUN: llc -mtriple=aarch64-none-linux-gnu -disable-post-ra -mattr=-fp-armv8 -verify-machineinstrs < %s | FileCheck --check-prefix=CHECK-NOFP-ARM64 %s			; RUN: llc -mtriple=aarch64-none-linux-gnu -disable-post-ra -mattr=-fp-armv8 -verify-machineinstrs < %s | FileCheck --check-prefix=CHECK-NOFP-ARM64 %s
				; XFAIL: *
				; FIXME: ShrinkWrap2: This fails with shrink-wrapping enabled because we don't
				; care about compact unwinding, and we don't force x20 to be spilled anyway.

	declare void @use_addr(i8*)			declare void @use_addr(i8*)

	define void @test_simple_alloca(i64 %n) {			define void @test_simple_alloca(i64 %n) {
	; CHECK-LABEL: test_simple_alloca:			; CHECK-LABEL: test_simple_alloca:

	%buf = alloca i8, i64 %n			%buf = alloca i8, i64 %n
	; Make sure we align the stack change to 16 bytes:			; Make sure we align the stack change to 16 bytes:
	▲ Show 20 Lines • Show All 159 Lines • Show Last 20 Lines

test/CodeGen/AArch64/arm64-aapcs-be.ll

	; RUN: llc -mtriple=aarch64_be-none-eabi -fast-isel=false < %s | FileCheck %s			; RUN: llc -mtriple=aarch64_be-none-eabi -fast-isel=false < %s | FileCheck %s
	; RUN: llc -mtriple=aarch64_be-none-eabi -fast-isel=true < %s | FileCheck %s			; RUN: llc -mtriple=aarch64_be-none-eabi -fast-isel=true < %s | FileCheck %s
				; XFAIL: *
				; FIXME: ShrinkWrap2: This test fails with shrink-wrapping enabled because,
				; with pairs enabled, a single register gets a 16B alignment, so we use 32B
				; for the stack instead of just 8B. See computeCalleeSaveRegisterPairs.

	; Check narrow argument passing via stack - callee end			; Check narrow argument passing via stack - callee end
	define i32 @test_narrow_args_callee(i64 %x0, i64 %x1, i64 %x2, i64 %x3, i64 %x4, i64 %x5, i64 %x6, i64 %x7, i8 %c, i16 %s) #0 {			define i32 @test_narrow_args_callee(i64 %x0, i64 %x1, i64 %x2, i64 %x3, i64 %x4, i64 %x5, i64 %x6, i64 %x7, i8 %c, i16 %s) #0 {
	entry:			entry:
	%conv = zext i8 %c to i32			%conv = zext i8 %c to i32
	%conv1 = sext i16 %s to i32			%conv1 = sext i16 %s to i32
	%add = add nsw i32 %conv1, %conv			%add = add nsw i32 %conv1, %conv
	; CHECK-LABEL: test_narrow_args_callee:			; CHECK-LABEL: test_narrow_args_callee:
	Show All 31 Lines

test/CodeGen/AArch64/arm64-abi_align.ll

	; RUN: llc < %s -mtriple=arm64-apple-darwin -mcpu=cyclone -enable-misched=false -disable-fp-elim \| FileCheck %s			; RUN: llc < %s -mtriple=arm64-apple-darwin -mcpu=cyclone -enable-misched=false -disable-fp-elim \| FileCheck %s
	; RUN: llc < %s -mtriple=arm64-apple-darwin -O0 -disable-fp-elim \| FileCheck -check-prefix=FAST %s			; RUN: llc < %s -mtriple=arm64-apple-darwin -O0 -disable-fp-elim \| FileCheck -check-prefix=FAST %s
				; XFAIL: *
				; FIXME: ShrinkWrap2: This test fails with shrink-wrapping enabled because we
				; don't combine SP updates.

	; rdar://12648441			; rdar://12648441
	; Generated from arm64-arguments.c with -O2.			; Generated from arm64-arguments.c with -O2.
	; Test passing structs with size < 8, < 16 and > 16			; Test passing structs with size < 8, < 16 and > 16
	; with alignment of 16 and without			; with alignment of 16 and without

	; Structs with size < 8			; Structs with size < 8
	%struct.s38 = type { i32, i16 }			%struct.s38 = type { i32, i16 }

test/CodeGen/AArch64/arm64-alloca-frame-pointer-offset.ll

	; RUN: llc -mtriple=arm64-eabi -mcpu=cyclone < %s \| FileCheck %s			; RUN: llc -mtriple=arm64-eabi -mcpu=cyclone < %s \| FileCheck %s
				; XFAIL: *
				; FIXME: ShrinkWrap2: This test fails with shrink-wrapping enabled because we
				; don't save LR, since there are no calls.

	; CHECK: foo			; CHECK: foo
	; CHECK: str w[[REG0:[0-9]+]], [x19, #264]			; CHECK: str w[[REG0:[0-9]+]], [x19, #264]
	; CHECK: mov w[[REG1:[0-9]+]], w[[REG0]]			; CHECK: mov w[[REG1:[0-9]+]], w[[REG0]]
	; CHECK: str w[[REG1]], [x19, #132]			; CHECK: str w[[REG1]], [x19, #132]

	define i32 @foo(i32 %a) nounwind {			define i32 @foo(i32 %a) nounwind {
	%retval = alloca i32, align 4			%retval = alloca i32, align 4

test/CodeGen/AArch64/arm64-dead-register-def-bug.ll

				; FIXME: ShrinkWrap2: .ll -> .mir when stack protector stuff is fixed.
	; RUN: llc -mtriple="arm64-apple-ios" < %s \| FileCheck %s			; RUN: llc -mtriple="arm64-apple-ios" < %s \| FileCheck %s
	;			;
	; Check that the dead register definition pass is considering implicit defs.			; Check that the dead register definition pass is considering implicit defs.
	; When rematerializing through truncates, the coalescer may produce instructions			; When rematerializing through truncates, the coalescer may produce instructions
	; with dead defs, but live implicit-defs of subregs:			; with dead defs, but live implicit-defs of subregs:
	; E.g. %X1<def, dead> = MOVi64imm 2, %W1<imp-def>; %X1:GPR64, %W1:GPR32			; E.g. %X1<def, dead> = MOVi64imm 2, %W1<imp-def>; %X1:GPR64, %W1:GPR32
	; These instructions are live, and their definitions should not be rewritten.			; These instructions are live, and their definitions should not be rewritten.
	;			;

test/CodeGen/AArch64/arm64-fp128.ll

	; RUN: llc -mtriple=arm64-linux-gnu -verify-machineinstrs -mcpu=cyclone -aarch64-enable-atomic-cfg-tidy=0 < %s \| FileCheck %s			; RUN: llc -mtriple=arm64-linux-gnu -verify-machineinstrs -mcpu=cyclone -aarch64-enable-atomic-cfg-tidy=0 < %s \| FileCheck %s
				; XFAIL: *
				; FIXME: ShrinkWrap2: This test fails with shrink-wrapping enabled because we
				; insert a restore point between a cmp and a jump.

	@lhs = global fp128 zeroinitializer, align 16			@lhs = global fp128 zeroinitializer, align 16
	@rhs = global fp128 zeroinitializer, align 16			@rhs = global fp128 zeroinitializer, align 16

	define fp128 @test_add() {			define fp128 @test_add() {
	; CHECK-LABEL: test_add:			; CHECK-LABEL: test_add:

	%lhs = load fp128, fp128* @lhs, align 16			%lhs = load fp128, fp128* @lhs, align 16

test/CodeGen/AArch64/arm64-hello.ll

	; RUN: llc < %s -mtriple=arm64-apple-ios7.0 -disable-post-ra -disable-fp-elim \| FileCheck %s			; RUN: llc < %s -mtriple=arm64-apple-ios7.0 -disable-post-ra -disable-fp-elim \| FileCheck %s
	; RUN: llc < %s -mtriple=arm64-linux-gnu -disable-post-ra \| FileCheck %s --check-prefix=CHECK-LINUX			; RUN: llc < %s -mtriple=arm64-linux-gnu -disable-post-ra \| FileCheck %s --check-prefix=CHECK-LINUX
				; XFAIL: *
				; FIXME: ShrinkWrap2: This test fails with shrink-wrapping because we don't
				; combine SP updates.

	; CHECK-LABEL: main:			; CHECK-LABEL: main:
	; CHECK: sub sp, sp, #32			; CHECK: sub sp, sp, #32
	; CHECK-NEXT: stp x29, x30, [sp, #16]			; CHECK-NEXT: stp x29, x30, [sp, #16]
	; CHECK-NEXT: add x29, sp, #16			; CHECK-NEXT: add x29, sp, #16
	; CHECK-NEXT: stur wzr, [x29, #-4]			; CHECK-NEXT: stur wzr, [x29, #-4]
	; CHECK: adrp x0, l_.str@PAGE			; CHECK: adrp x0, l_.str@PAGE
	; CHECK: add x0, x0, l_.str@PAGEOFF			; CHECK: add x0, x0, l_.str@PAGEOFF

test/CodeGen/AArch64/arm64-join-reserved.ll

	; RUN: llc < %s -verify-machineinstrs \| FileCheck %s			; RUN: llc < %s -verify-machineinstrs \| FileCheck %s
				; XFAIL: *
				; FIXME: ShrinkWrap2: This test fails with shrink-wrapping enabled because we
				; don't spill x29, so we merge the store of x30 with wzr.
	target triple = "arm64-apple-macosx10"			target triple = "arm64-apple-macosx10"

	; Make sure that a store to [sp] addresses off sp directly.			; Make sure that a store to [sp] addresses off sp directly.
	; A move isn't necessary.			; A move isn't necessary.
	; <rdar://problem/11492712>			; <rdar://problem/11492712>
	; CHECK-LABEL: g:			; CHECK-LABEL: g:
	; CHECK: str xzr, [sp]			; CHECK: str xzr, [sp]
	; CHECK: bl			; CHECK: bl
	; CHECK: ret			; CHECK: ret
	define void @g() nounwind ssp {			define void @g() nounwind ssp {
	entry:			entry:
	tail call void (i32, ...) @f(i32 0, i32 0) nounwind			tail call void (i32, ...) @f(i32 0, i32 0) nounwind
	ret void			ret void
	}			}

	declare void @f(i32, ...)			declare void @f(i32, ...)

test/CodeGen/AArch64/arm64-large-frame.ll

	; RUN: llc -verify-machineinstrs -mtriple=arm64-none-linux-gnu -disable-fp-elim -disable-post-ra < %s \| FileCheck %s			; RUN: llc -verify-machineinstrs -mtriple=arm64-none-linux-gnu -disable-fp-elim -disable-post-ra < %s \| FileCheck %s
				; XFAIL: *
				; FIXME: ShrinkWrap2: This test fails with shrink-wrapping enabled because we
				; don't save LR.
	declare void @use_addr(i8*)			declare void @use_addr(i8*)

	@addr = global i8* null			@addr = global i8* null

	define void @test_bigframe() {			define void @test_bigframe() {
	; CHECK-LABEL: test_bigframe:			; CHECK-LABEL: test_bigframe:
	; CHECK: .cfi_startproc			; CHECK: .cfi_startproc


test/CodeGen/X86/ShrinkWrapping/BasicBranch.mir

This file was added.

				# RUN: llc -mtriple=x86_64-- -run-pass=prologepilog -debug-only=shrink-wrap2 %s -o /dev/null 2>&1 \| FileCheck %s
				# REQUIRES: asserts
				--- \|
				define void @f0() nounwind { ret void }
				...
				---
				name: f0
				tracksRegLiveness: true
				body: \|
				bb.0:
				successors: %bb.2, %bb.1

				%eflags = IMPLICIT_DEF
				JE_1 %bb.2, implicit killed %eflags
				JMP_1 %bb.1

				bb.1:
				successors: %bb.3

				%rbx = IMPLICIT_DEF
				JMP_1 %bb.3

				bb.2:
				RET 0

				bb.3:
				%rbx = IMPLICIT_DEF
				RET 0
				...
				# Basic shrink-wrapping example. Early return with uses of CSRs in the body.
				#CHECK-LABEL: f0

				#CHECK: BB#1 uses : %RBX
				#CHECK-NEXT: BB#3 uses : %RBX
				#CHECK: **** Shrink-wrapping results
				#CHECK-NEXT: BB#1: Saves: %RBX, \| Restores:
				#CHECK-NEXT: BB#3: Saves: \| Restores: %RBX,
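
The expected placement above (save in BB#1, restore in BB#3) can be sanity-checked with a small path-sensitive walk over the CFG. This is a minimal sketch, not the pass's implementation. The helper name `placement_is_valid` and the model it uses are assumptions for illustration: a save happens at block entry, a restore at block exit, and a block without successors is a return.

```python
from collections import deque

def placement_is_valid(succs, entry, uses, saves, restores):
    """Check that on every path the register is saved before any use
    and is no longer in the saved state when the function returns."""
    seen = set()
    work = deque([(entry, False)])  # (block, register saved on entry?)
    while work:
        block, saved = work.popleft()
        if (block, saved) in seen:
            continue
        seen.add((block, saved))
        if block in saves:
            if saved:
                return False  # saved twice on the same path
            saved = True
        if block in uses and not saved:
            return False      # use without a preceding save
        if block in restores:
            if not saved:
                return False  # restore without a preceding save
            saved = False
        if not succs[block] and saved:
            return False      # return without a restore
        for succ in succs[block]:
            work.append((succ, saved))
    return True

# CFG of BasicBranch.mir: bb.0 -> {bb.2, bb.1}, bb.1 -> bb.3.
basic_branch_cfg = {0: [2, 1], 1: [3], 2: [], 3: []}

# Placement expected by the CHECK lines: save in BB#1, restore in BB#3.
print(placement_is_valid(basic_branch_cfg, 0, {1, 3}, {1}, {3}))  # True
```

Under the same model, saving in BB#0 and restoring in both BB#2 and BB#3 is also valid. The shrink-wrapped placement is preferable because the early-return path through BB#2 then executes no save or restore at all.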

test/CodeGen/X86/ShrinkWrapping/CriticalEdge.mir

This file was added.

				# RUN: llc -march=x86 -run-pass=prologepilog %s -debug-only=shrink-wrap2 -o /dev/null 2>&1 \| FileCheck %s
				# REQUIRES: asserts
				# This is a reduced test case from test/CodeGen/X86/2006-04-27-ISelFoldingBug.ll
				--- \|
				define void @f0() nounwind { ret void }
				...
				---
				name: f0
				tracksRegLiveness: true
				body: \|
				bb.0:
				successors: %bb.3

				JMP_1 %bb.3

				bb.1:
				RET 0

				bb.2:
				RET 0

				bb.3:
				successors: %bb.4, %bb.2

				%esi = IMPLICIT_DEF

				%eflags = IMPLICIT_DEF
				JGE_1 %bb.2, implicit killed %eflags
				JMP_1 %bb.4

				bb.4:
				successors: %bb.1, %bb.2

				%eflags = IMPLICIT_DEF
				JNE_1 %bb.1, implicit killed %eflags
				JMP_1 %bb.2
				...
				#CHECK-LABEL: f0

				#CHECK: BB#3 uses : %ESI
				#CHECK: **** Shrink-wrapping results
				#CHECK-NEXT: BB#3: Saves: %ESI, \| Restores: %ESI,
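
For reference, a critical edge is an edge whose source has more than one successor and whose destination has more than one predecessor. The sketch below lists the critical edges of the CFG in this test. The function name `critical_edges` and the dictionary encoding of the CFG are assumptions for illustration, not part of the patch.

```python
def critical_edges(succs):
    """Return the (source, destination) edges where the source has several
    successors and the destination has several predecessors."""
    preds = {}
    for block, targets in succs.items():
        for target in targets:
            preds.setdefault(target, []).append(block)
    return sorted(
        (block, target)
        for block, targets in succs.items()
        for target in targets
        if len(targets) > 1 and len(preds[target]) > 1
    )

# CFG of CriticalEdge.mir: bb.0 -> bb.3, bb.3 -> {bb.4, bb.2}, bb.4 -> {bb.1, bb.2}.
critical_edge_cfg = {0: [3], 1: [], 2: [], 3: [4, 2], 4: [1, 2]}

print(critical_edges(critical_edge_cfg))  # [(3, 2), (4, 2)]
```

Both edges into bb.2 are critical, so a restore cannot be placed on them without splitting the edge. This is consistent with the expected result keeping both the save and the restore inside BB#3.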

test/CodeGen/X86/ShrinkWrapping/CriticalEdge2.mir

This file was added.

				# RUN: llc -march=x86 -run-pass=prologepilog %s -debug-only=shrink-wrap2 -o /dev/null 2>&1 \| FileCheck %s
				# REQUIRES: asserts
				# This is a reduced test case from test/CodeGen/X86/2007-08-09-IllegalX86-64Asm.ll
				--- \|
				define void @f0() nounwind { ret void }
				...
				---
				name: f0
				tracksRegLiveness: true
				body: \|
				bb.0:
				successors: %bb.4, %bb.2

				%eflags = IMPLICIT_DEF
				JNE_1 %bb.4, implicit killed %eflags
				JMP_1 %bb.2

				bb.2:
				successors: %bb.3, %bb.4

				%ebx = IMPLICIT_DEF
				%eflags = IMPLICIT_DEF
				JNE_1 %bb.4, implicit killed %eflags
				JMP_1 %bb.3

				bb.3:
				RET 0

				bb.4:
				RET 0
				...
				#CHECK-LABEL: f0

				#CHECK: BB#1 uses : %EBX
				#CHECK: **** Shrink-wrapping results
				#CHECK-NEXT: BB#1: Saves: %EBX, \| Restores: %EBX,

test/CodeGen/X86/ShrinkWrapping/CriticalEdgeLoop.mir

This file was added.

				# RUN: llc -mtriple=x86_64-- -run-pass=prologepilog %s -debug-only=shrink-wrap2 -o /dev/null 2>&1 \| FileCheck %s
				# REQUIRES: asserts
				# This is a reduced test case from test/CodeGen/X86/2009-04-27-CoalescerAssert.ll
				--- \|
				define void @f0() nounwind { ret void }
				...
				---
				name: f0
				tracksRegLiveness: true
				body: \|
				bb.0:
				successors: %bb.3, %bb.1

				%eflags = IMPLICIT_DEF
				JNE_1 %bb.3, implicit killed %eflags
				JMP_1 %bb.1

				bb.1:
				successors: %bb.4

				JMP_1 %bb.4

				bb.2:

				bb.3:
				successors: %bb.4, %bb.2

				%eflags = IMPLICIT_DEF
				JNE_1 %bb.2, implicit killed %eflags
				JMP_1 %bb.4

				bb.4:
				successors: %bb.6, %bb.5

				%rbx = IMPLICIT_DEF

				%eflags = IMPLICIT_DEF
				JE_1 %bb.6, implicit killed %eflags
				JMP_1 %bb.5

				bb.5:
				RET 0

				bb.6:
				RET 0

				...
				#CHECK-LABEL: f0

				#CHECK: BB#4 uses : %RBX
				#CHECK: **** Shrink-wrapping results
				#CHECK-NEXT: BB#4: Saves: %RBX, \| Restores: %RBX,

test/CodeGen/X86/ShrinkWrapping/InfiniteLoop.mir

This file was added.

				# RUN: llc -mtriple=x86_64-- -run-pass=prologepilog %s -debug-only=shrink-wrap2 -o /dev/null 2>&1 \| FileCheck %s
				# REQUIRES: asserts
				--- \|
				define void @f0() nounwind { ret void }
				...
				---
				name: f0
				body: \|
				bb.0:
				successors: %bb.1, %bb.2
				liveins: %edi

				%eflags = IMPLICIT_DEF
				JNE_1 %bb.2, implicit killed %eflags
				JMP_1 %bb.1

				bb.1:
				RET 0

				bb.2:
				successors: %bb.3

				%rbx = IMPLICIT_DEF

				bb.3:
				successors: %bb.3

				JMP_1 %bb.3
				...
				# Check that we don't save on a branch that never returns.
				#CHECK-LABEL: f0

				#CHECK: BB#2 uses : %RBX
				#CHECK-NEXT: Remove uses from no-return BB#2
				#CHECK-NOT: Saves:
				#CHECK-NOT: Restores:
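
The "no-return" blocks in this test are the ones from which no returning block can be reached. A minimal sketch of that classification, assuming a backward reachability walk from the return blocks (the helper name `no_return_blocks` is hypothetical and the pass may compute this differently):

```python
def no_return_blocks(succs, returns):
    """Return the blocks from which no block in `returns` is reachable."""
    preds = {block: [] for block in succs}
    for block, targets in succs.items():
        for target in targets:
            preds[target].append(block)
    # Walk the CFG backwards from every return block.
    can_return = set(returns)
    stack = list(returns)
    while stack:
        block = stack.pop()
        for pred in preds[block]:
            if pred not in can_return:
                can_return.add(pred)
                stack.append(pred)
    return sorted(set(succs) - can_return)

# CFG of InfiniteLoop.mir: bb.0 -> {bb.1, bb.2}, bb.2 -> bb.3, bb.3 -> bb.3.
infinite_loop_cfg = {0: [1, 2], 1: [], 2: [3], 3: [3]}

print(no_return_blocks(infinite_loop_cfg, returns=[1]))  # [2, 3]
```

Only bb.2 carries a CSR use here, which matches the single "Remove uses from no-return BB#2" line expected above.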

test/CodeGen/X86/ShrinkWrapping/IrreducibleCFG.mir

This file was added.

				# RUN: llc -mtriple=x86_64-- -run-pass=prologepilog %s -debug-only=shrink-wrap2 -o /dev/null 2>&1 \| FileCheck %s
				# REQUIRES: asserts
				--- \|
				define void @f0() nounwind { ret void }
				...
				---
				name: f0
				tracksRegLiveness: true
				body: \|
				bb.0:
				successors: %bb.10, %bb.6

				%eflags = IMPLICIT_DEF
				JNE_1 %bb.10, implicit killed %eflags
				JMP_1 %bb.6

				bb.1:
				successors: %bb.6

				JMP_1 %bb.6

				bb.2:
				successors: %bb.10

				JMP_1 %bb.10

				bb.3:
				successors: %bb.4

				%ebx = IMPLICIT_DEF
				JMP_1 %bb.4

				bb.4:
				successors: %bb.5, %bb.9

				%eflags = IMPLICIT_DEF
				JNE_1 %bb.9, implicit killed %eflags
				JMP_1 %bb.5

				bb.5:
				RET 0

				bb.6:
				successors: %bb.2, %bb.7

				%eflags = IMPLICIT_DEF
				JE_1 %bb.2, implicit killed %eflags
				JMP_1 %bb.7

				bb.7:
				successors: %bb.4

				JMP_1 %bb.4

				bb.8:
				successors: %bb.3, %bb.1

				%eflags = IMPLICIT_DEF
				JNE_1 %bb.3, implicit killed %eflags
				JMP_1 %bb.1

				bb.9:
				successors: %bb.4, %bb.8

				%eflags = IMPLICIT_DEF
				JNE_1 %bb.4, implicit killed %eflags
				JMP_1 %bb.8

				bb.10:
				successors: %bb.7

				JMP_1 %bb.7

				...
				# Check that we handle irreducible loops and save / restore outside them.

				#CHECK-LABEL: f0
				#CHECK: BB#2 uses : %RBX
				#CHECK: **** Shrink-wrapping results
				#CHECK-NEXT: BB#0: Saves: %RBX, \| Restores:
				#CHECK-NEXT: BB#5: Saves: \| Restores: %RBX,

test/CodeGen/X86/ShrinkWrapping/LoopBasic.mir

This file was added.

				# RUN: llc -mtriple=x86_64-- -run-pass=prologepilog %s -debug-only=shrink-wrap2 -o /dev/null 2>&1 \| FileCheck %s
				# REQUIRES: asserts
				--- \|
				define void @f0() nounwind { ret void }
				...
				---
				name: f0
				tracksRegLiveness: true
				body: \|
				bb.0:
				successors: %bb.2, %bb.1


				%eflags = IMPLICIT_DEF
				JE_1 %bb.2, implicit killed %eflags
				JMP_1 %bb.1

				bb.1:
				successors: %bb.3

				JMP_1 %bb.3

				bb.2:
				RET 0

				bb.3:
				successors: %bb.4, %bb.5

				%eflags = IMPLICIT_DEF
				JNE_1 %bb.5, implicit killed %eflags
				JMP_1 %bb.4

				bb.4:
				successors: %bb.6

				JMP_1 %bb.6

				bb.5:
				successors: %bb.6

				%ebx = IMPLICIT_DEF
				JMP_1 %bb.6

				bb.6:
				successors: %bb.7, %bb.3

				%eflags = IMPLICIT_DEF
				JNE_1 %bb.3, implicit killed %eflags
				JMP_1 %bb.7

				bb.7:
				RET 0
				...
				# Check that we don't save inside loops.

				#CHECK-LABEL: f0

				#CHECK: BB#5 uses : %RBX
				#CHECK: **** Shrink-wrapping results
				#CHECK-NEXT: BB#1: Saves: %RBX, \| Restores:
				#CHECK-NEXT: BB#7: Saves: \| Restores: %RBX,
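
To see why the save lands in BB#1 and the restore in BB#7, it helps to know which blocks sit on a cycle. The sketch below marks a block as part of a cycle when it can reach itself. The name `blocks_in_cycles` is an assumption for illustration, and the pass itself may rely on a different loop analysis.

```python
def blocks_in_cycles(succs):
    """Return the blocks that can reach themselves through at least one edge."""
    def reachable_from(start):
        seen, stack = set(), list(succs[start])
        while stack:
            block = stack.pop()
            if block not in seen:
                seen.add(block)
                stack.extend(succs[block])
        return seen
    return sorted(block for block in succs if block in reachable_from(block))

# CFG of LoopBasic.mir.
loop_basic_cfg = {
    0: [2, 1], 1: [3], 2: [],
    3: [4, 5], 4: [6], 5: [6], 6: [7, 3], 7: [],
}

print(blocks_in_cycles(loop_basic_cfg))  # [3, 4, 5, 6]
```

The use in bb.5 is inside the loop formed by bb.3 through bb.6, so the expected placement moves the save to the block entering the loop (BB#1) and the restore to the block leaving it (BB#7).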

test/CodeGen/X86/ShrinkWrapping/LoopInCondition.mir

This file was added.

				# RUN: llc -march=x86 -run-pass=prologepilog %s -debug-only=shrink-wrap2 -o /dev/null 2>&1 \| FileCheck %s
				# REQUIRES: asserts
				# This is a reduced test case from test/CodeGen/X86/2007-11-06-InstrSched.ll.
				--- \|
				define void @f0() nounwind { ret void }
				...
				---
				name: f0
				tracksRegLiveness: true
				body: \|
				bb.0:
				successors: %bb.1, %bb.3

				%eflags = IMPLICIT_DEF
				JNE_1 %bb.3, implicit killed %eflags
				JMP_1 %bb.1

				bb.1:
				successors: %bb.4

				JMP_1 %bb.4

				bb.3:
				successors: %bb.3, %bb.4

				%esi = IMPLICIT_DEF
				%eflags = IMPLICIT_DEF
				JB_1 %bb.3, implicit killed %eflags
				JMP_1 %bb.4

				bb.4:

				RET 0
				...
				#CHECK-LABEL: f0

				#CHECK: BB#2 uses : %ESI
				#CHECK: **** Shrink-wrapping results
				#CHECK-NEXT: BB#0: Saves: %ESI, \| Restores:
				#CHECK-NEXT: BB#3: Saves: \| Restores: %ESI,

test/CodeGen/X86/ShrinkWrapping/LoopNoPreheader.mir

This file was added.

				# RUN: llc -mtriple=x86_64-- -run-pass=prologepilog %s -debug-only=shrink-wrap2 -o /dev/null 2>&1 \| FileCheck %s
				# REQUIRES: asserts
				--- \|
				define void @f0() nounwind {
				entry:
				ret void
				}
				...
				---
				name: f0
				tracksRegLiveness: true
				body: \|
				bb.0:
				successors: %bb.2, %bb.1

				%eflags = IMPLICIT_DEF
				JE_1 %bb.2, implicit killed %eflags
				JMP_1 %bb.1

				bb.1:
				successors: %bb.3

				JMP_1 %bb.3

				bb.2:
				successors: %bb.3

				JMP_1 %bb.3

				bb.3:
				successors: %bb.4

				%rbx = IMPLICIT_DEF
				JMP_1 %bb.4

				bb.4:
				successors: %bb.3, %bb.5

				%eflags = IMPLICIT_DEF
				JE_1 %bb.3, implicit killed %eflags
				JMP_1 %bb.5
				RET 0

				bb.5:

				RET 0

				...
				# Check that we handle loops with no preheader. This should propagate through
				# the loop's predecessors.

				#CHECK-LABEL: f0

				#CHECK: BB#4 uses : %RBX
				#CHECK: **** Shrink-wrapping results
				#CHECK-NEXT: BB#0: Saves: %RBX, \| Restores:
				#CHECK-NEXT: BB#5: Saves: \| Restores: %RBX,

test/CodeGen/X86/ShrinkWrapping/LoopNoPreheaderLatchExit.mir

This file was added.

				# RUN: llc -mtriple=x86_64-- -run-pass=prologepilog %s -debug-only=shrink-wrap2 -o /dev/null 2>&1 \| FileCheck %s
				# REQUIRES: asserts
				# XFAIL: *
				--- \|
				define void @f0() nounwind {
				entry:
				ret void
				}
				...
				---
				name: f0
				tracksRegLiveness: true
				body: \|
				bb.0:
				successors: %bb.2, %bb.1

				%eflags = IMPLICIT_DEF
				JE_1 %bb.2, implicit killed %eflags
				JMP_1 %bb.1

				bb.1:
				successors: %bb.3

				JMP_1 %bb.3

				bb.2:
				successors: %bb.3

				JMP_1 %bb.3

				bb.3:
				successors: %bb.4

				%rbx = IMPLICIT_DEF
				JMP_1 %bb.4

				bb.4:
				successors: %bb.3

				%eflags = IMPLICIT_DEF
				JE_1 %bb.3, implicit killed %eflags
				RET 0

				...
				# FIXME: ShrinkWrap2: This test still fails, since there is no way to place a
				# restore outside a loop. This should not be possible in real code.

				#CHECK-LABEL: f0

				#CHECK: BB#3 uses : %RBX
				#CHECK: **** Shrink-wrapping results
				#CHECK-NEXT: BB#0: Saves: %RBX, \| Restores:
				#CHECK-NEXT: BB#3: Saves: \| Restores: %RBX,

test/CodeGen/X86/ShrinkWrapping/MultipleCriticalEdges.mir

This file was added.

				# RUN: llc -mtriple=x86_64-- -run-pass=prologepilog %s -debug-only=shrink-wrap2 -o /dev/null 2>&1 \| FileCheck %s
				# REQUIRES: asserts
				--- \|
				define void @f0() nounwind { ret void }

				...
				---
				name: f0
				tracksRegLiveness: true
				body: \|
				bb.0:
				successors: %bb.1, %bb.3

				%eflags = IMPLICIT_DEF
				JNE_1 %bb.3, implicit killed %eflags
				JMP_1 %bb.1

				bb.1:
				successors: %bb.4, %bb.2

				%ebx = IMPLICIT_DEF
				%eflags = IMPLICIT_DEF
				JNE_1 %bb.2, implicit killed %eflags
				JMP_1 %bb.4

				bb.2:
				successors: %bb.4, %bb.3

				%ebx = IMPLICIT_DEF
				%eflags = IMPLICIT_DEF
				JNE_1 %bb.3, implicit killed %eflags
				JMP_1 %bb.4

				bb.4:
				RET 0

				bb.3:
				RET 0

				...
				# Check that we handle multiple critical edges.

				#CHECK-LABEL: f0

				#CHECK: BB#1 uses : %RBX
				#CHECK-NEXT: BB#2 uses : %RBX
				#CHECK: **** Shrink-wrapping results
				#CHECK-NEXT: BB#0: Saves: %RBX, \| Restores:
				#CHECK-NEXT: BB#3: Saves: \| Restores: %RBX,
				#CHECK-NEXT: BB#4: Saves: \| Restores: %RBX,

test/CodeGen/X86/ShrinkWrapping/NestedLoopsCriticalEdges.mir

This file was added.

				# RUN: llc -mtriple=x86_64-- -run-pass=stack-protector -run-pass=prologepilog %s -enable-shrink-wrap2=true -debug-only=shrink-wrap2 -o /dev/null 2>&1 \| FileCheck %s
				# REQUIRES: asserts
				# XFAIL: *
				--- \|
				define void @f0() nounwind {
				entry:
				ret void
				}
				...
				---
				name: f0
				tracksRegLiveness: true
				body: \|
				bb.0:
				successors: %bb.1, %bb.6

				%eflags = IMPLICIT_DEF
				JE_1 %bb.1, implicit killed %eflags
				JMP_1 %bb.6

				bb.1:
				successors: %bb.2, %bb.6

				%eflags = IMPLICIT_DEF
				JE_1 %bb.2, implicit killed %eflags
				JMP_1 %bb.6

				bb.2:
				successors: %bb.3

				%rbx = IMPLICIT_DEF
				JMP_1 %bb.3

				bb.3:
				successors: %bb.4
				JMP_1 %bb.4

				bb.4:
				successors: %bb.4, %bb.5

				%eflags = IMPLICIT_DEF
				JE_1 %bb.4, implicit killed %eflags
				JMP_1 %bb.5

				bb.5:
				successors: %bb.6, %bb.3

				%eflags = IMPLICIT_DEF
				JE_1 %bb.6, implicit killed %eflags
				JMP_1 %bb.3

				bb.6:
				RET 0

				...
				# Mix nested loops and critical edges.
				# FIXME: ShrinkWrap2: This fails because we propagate attributes to the
				# critical edges.

				#CHECK-LABEL: f0

				#CHECK: BB#2 uses : %RBX
				#CHECK: **** Shrink-wrapping results
				#CHECK-NEXT: BB#2: Saves: %RBX, \| Restores: %RBX,

test/CodeGen/X86/ShrinkWrapping/NoReturnPath.mir

This file was added.

				# RUN: llc -mtriple=x86_64-- -run-pass=stack-protector -run-pass=prologepilog %s -enable-shrink-wrap2=true -debug-only=shrink-wrap2 -o /dev/null 2>&1 \| FileCheck %s
				# REQUIRES: asserts
				# This is a reduced test case from test/CodeGen/X86/2009-09-10-SpillComments.ll
				--- \|
				define void @f0() nounwind { ret void }
				...
				---
				name: f0
				tracksRegLiveness: true
				body: \|
				bb.0:
				successors: %bb.6, %bb.1

				%rbx = IMPLICIT_DEF
				%eflags = IMPLICIT_DEF
				JNE_1 %bb.6, implicit killed %eflags
				JMP_1 %bb.1

				bb.1:
				successors: %bb.2, %bb.3
				liveins: %rbx

				%eflags = IMPLICIT_DEF
				JNE_1 %bb.3, implicit killed %eflags
				JMP_1 %bb.2

				bb.2:
				RET 0

				bb.3:
				successors: %bb.4
				liveins: %rbx

				bb.4:
				successors: %bb.5, %bb.4
				liveins: %rbx

				%eflags = IMPLICIT_DEF
				JNE_1 %bb.4, implicit killed %eflags
				JMP_1 %bb.5

				bb.5:
				successors: %bb.4
				liveins: %rbx

				%rbx = IMPLICIT_DEF
				JMP_1 %bb.4

				bb.6:
				RET 0
				...
				#CHECK-LABEL: f0

				#CHECK: BB#0 uses : %RBX
				#CHECK-NEXT: BB#5 uses : %RBX
				#CHECK-NEXT: Remove uses from no-return BB#3
				#CHECK-NEXT: Remove uses from no-return BB#4
				#CHECK-NEXT: Remove uses from no-return BB#5
				#CHECK: **** Shrink-wrapping results
				#CHECK-NEXT: BB#0: Saves: %RBX, \| Restores: %RBX,

test/CodeGen/X86/ShrinkWrapping/Paper1Figure2CriticalEdge.mir

This file was added.

				# RUN: llc -mtriple=x86_64-- -run-pass=prologepilog %s -debug-only=shrink-wrap2 -o /dev/null 2>&1 \| FileCheck %s
				# REQUIRES: asserts
				--- \|
				define void @f0() nounwind { ret void }
				...
				---
				name: f0
				tracksRegLiveness: true
				body: \|
				bb.0:
				successors: %bb.1, %bb.2

				%eflags = IMPLICIT_DEF
				JNE_1 %bb.2, implicit killed %eflags
				JMP_1 %bb.1

				bb.1:
				successors: %bb.3, %bb.4

				%eflags = IMPLICIT_DEF
				JE_1 %bb.4, implicit killed %eflags
				JMP_1 %bb.3

				bb.2:
				successors: %bb.4

				%ebx = IMPLICIT_DEF
				JMP_1 %bb.4

				bb.3:
				RET 0

				bb.4:

				%ebx = IMPLICIT_DEF
				RET 0
				...
				# Fig. 2 in Chow's paper.

				#CHECK-LABEL: f0

				#CHECK: BB#2 uses : %RBX
				#CHECK-NEXT: BB#4 uses : %RBX
				#CHECK: **** Shrink-wrapping results
				#CHECK-NEXT: BB#0: Saves: %RBX, \| Restores:
				#CHECK-NEXT: BB#3: Saves: \| Restores: %RBX,
				#CHECK-NEXT: BB#4: Saves: \| Restores: %RBX,

test/CodeGen/X86/ShrinkWrapping/Paper2Figure1.mir

This file was added.

				# RUN: llc -mtriple=x86_64-- -run-pass=prologepilog %s -debug-only=shrink-wrap2 -o /dev/null 2>&1 \| FileCheck %s
				# REQUIRES: asserts
				--- \|
				define void @f0() nounwind { ret void }
				...
				---
				name: f0
				tracksRegLiveness: true
				body: \|
				bb.0:
				successors: %bb.1, %bb.2

				%eflags = IMPLICIT_DEF
				JNE_1 %bb.2, implicit killed %eflags
				JMP_1 %bb.1

				bb.1:
				successors: %bb.3
				JMP_1 %bb.3

				bb.2:
				successors: %bb.3

				%ebx = IMPLICIT_DEF
				JMP_1 %bb.3

				bb.3:
				successors: %bb.5, %bb.4

				%eflags = IMPLICIT_DEF
				JE_1 %bb.5, implicit killed %eflags
				JMP_1 %bb.4

				bb.4:
				successors: %bb.6

				%ebx = IMPLICIT_DEF
				JMP_1 %bb.6

				bb.5:
				successors: %bb.6

				JMP_1 %bb.6

				bb.6:
				RET 0
				...
				# Fig 1 in Lupo and Wilken's paper.

				#CHECK-LABEL: f0

				#CHECK: BB#2 uses : %RBX
				#CHECK-NEXT: BB#4 uses : %RBX
				#CHECK: **** Shrink-wrapping results
				#CHECK-NEXT: BB#2: Saves: %RBX, \| Restores: %RBX,
				#CHECK-NEXT: BB#4: Saves: %RBX, \| Restores: %RBX,

test/CodeGen/X86/ShrinkWrapping/Paper2Figure2.mir

This file was added.

				# RUN: llc -mtriple=x86_64-- -run-pass=prologepilog %s -debug-only=shrink-wrap2 -o /dev/null 2>&1 \| FileCheck %s
				# REQUIRES: asserts
				--- \|
				define void @f0() nounwind { ret void }
				...
				---
				name: f0
				tracksRegLiveness: true
				body: \|
				bb.0:
				successors: %bb.1, %bb.8

				%eflags = IMPLICIT_DEF
				JNE_1 %bb.8, implicit killed %eflags
				JMP_1 %bb.1

				bb.1:
				successors: %bb.2, %bb.7

				%eflags = IMPLICIT_DEF
				JNE_1 %bb.7, implicit killed %eflags
				JMP_1 %bb.2

				bb.2:
				successors: %bb.3, %bb.5

				%eflags = IMPLICIT_DEF
				JNE_1 %bb.5, implicit killed %eflags
				JMP_1 %bb.3

				bb.3:
				successors: %bb.4, %bb.5

				%ebx = IMPLICIT_DEF
				%eflags = IMPLICIT_DEF
				JNE_1 %bb.5, implicit killed %eflags
				JMP_1 %bb.4

				bb.4:
				successors: %bb.5

				%ebx = MOV32ri 9
				JMP_1 %bb.5

				bb.5:
				successors: %bb.6, %bb.7

				%eflags = IMPLICIT_DEF
				JNE_1 %bb.7, implicit killed %eflags
				JMP_1 %bb.6

				bb.6:
				successors: %bb.7

				%ebx = IMPLICIT_DEF
				JMP_1 %bb.7

				bb.7:
				successors: %bb.15

				JMP_1 %bb.15

				bb.8:
				successors: %bb.9, %bb.10

				%eflags = IMPLICIT_DEF
				JNE_1 %bb.10, implicit killed %eflags
				JMP_1 %bb.9

				bb.9:
				successors: %bb.11

				JMP_1 %bb.11

				bb.10:
				successors: %bb.11

				%ebx = IMPLICIT_DEF
				JMP_1 %bb.11

				bb.11:
				successors: %bb.12, %bb.13

				%eflags = IMPLICIT_DEF
				JNE_1 %bb.13, implicit killed %eflags
				JMP_1 %bb.12

				bb.12:
				successors: %bb.14

				JMP_1 %bb.14

				bb.13:
				successors: %bb.14

				%ebx = IMPLICIT_DEF
				JMP_1 %bb.14

				bb.14:
				successors: %bb.15
				JMP_1 %bb.15


				bb.15:
				RET 0
				...
				# Fig 2 in Lupo and Wilken's paper.

				#CHECK-LABEL: f0

				#CHECK: BB#3 uses : %RBX
				#CHECK-NEXT: BB#4 uses : %RBX
				#CHECK-NEXT: BB#6 uses : %RBX
				#CHECK-NEXT: BB#10 uses : %RBX
				#CHECK-NEXT: BB#13 uses : %RBX
				#CHECK: **** Shrink-wrapping results
				#CHECK-NEXT: BB#1: Saves: %RBX, \| Restores:
				#CHECK-NEXT: BB#7: Saves: \| Restores: %RBX,
				#CHECK-NEXT: BB#10: Saves: %RBX, \| Restores: %RBX,
				#CHECK-NEXT: BB#13: Saves: %RBX, \| Restores: %RBX,

test/CodeGen/X86/ShrinkWrapping/PropagateLoopUses.mir

This file was added.

				# RUN: llc -mtriple=x86_64-- -run-pass=prologepilog %s -debug-only=shrink-wrap2 -o /dev/null 2>&1 \| FileCheck %s
				# REQUIRES: asserts
				--- \|
				define void @f0() nounwind { ret void }
				...
				---
				name: f0
				tracksRegLiveness: true
				body: \|
				bb.0:
				successors: %bb.1, %bb.2

				%eflags = IMPLICIT_DEF
				JNE_1 %bb.2, implicit killed %eflags
				JMP_1 %bb.1

				bb.1:
				successors: %bb.15

				JMP_1 %bb.15

				bb.2:
				successors: %bb.11

				%r15 = IMPLICIT_DEF
				%r14 = IMPLICIT_DEF
				%rbx = IMPLICIT_DEF
				JMP_1 %bb.11

				bb.3:
				successors: %bb.4, %bb.3

				%eflags = IMPLICIT_DEF
				JNE_1 %bb.3, implicit killed %eflags
				JMP_1 %bb.4

				bb.4:
				successors: %bb.6
				liveins: %r14

				%r14 = IMPLICIT_DEF
				JMP_1 %bb.6

				bb.5:
				successors: %bb.6

				JMP_1 %bb.6

				bb.6:
				successors: %bb.7

				JMP_1 %bb.7

				bb.7:
				successors: %bb.8, %bb.9

				%eflags = IMPLICIT_DEF
				JA_1 %bb.8, implicit killed %eflags
				JMP_1 %bb.9

				bb.8:
				successors: %bb.5, %bb.7

				%eflags = IMPLICIT_DEF
				JE_1 %bb.5, implicit killed %eflags
				JMP_1 %bb.7

				bb.9:
				successors: %bb.10, %bb.7
				liveins: %rbx

				%eflags = IMPLICIT_DEF
				JE_1 %bb.7, implicit killed %eflags
				JMP_1 %bb.10

				bb.10:
				successors: %bb.11


				bb.11:
				successors: %bb.12, %bb.3

				%eflags = IMPLICIT_DEF
				JE_1 %bb.12, implicit killed %eflags
				JMP_1 %bb.3

				bb.12:
				successors: %bb.13, %bb.14

				%eflags = IMPLICIT_DEF
				JE_1 %bb.14, implicit killed %eflags

				bb.13:
				successors: %bb.15

				JMP_1 %bb.15

				bb.14:
				RET 0

				bb.15:
				RET 0
				...
				# Check that we propagate the loop uses to its predecessors and successors.

				#CHECK-LABEL: f0

				#CHECK: BB#2 uses : %RBX, %R14, %R15
				#CHECK-NEXT: BB#10 uses : %R14
				#CHECK: **** Shrink-wrapping results
				#CHECK-NEXT: BB#2: Saves: %RBX, %R14, %R15, \| Restores: %RBX, %R15
				#CHECK-NEXT: BB#13: Saves: \| Restores: %R14
				#CHECK-NEXT: BB#14: Saves: \| Restores: %R14

test/CodeGen/X86/ShrinkWrapping/SCCCriticalEdge.mir

This file was added.

				# RUN: llc -mtriple=x86_64-- -run-pass=prologepilog %s -debug-only=shrink-wrap2 -o /dev/null 2>&1 \| FileCheck %s
				# REQUIRES: asserts
				# XFAIL: *
				--- \|
				define void @f0() nounwind { ret void }
				...
				---
				name: f0
				tracksRegLiveness: true
				body: \|
				bb.0:
				successors: %bb.1, %bb.5

				%eflags = IMPLICIT_DEF
				JE_1 %bb.1, implicit killed %eflags
				JMP_1 %bb.5

				bb.1:
				successors: %bb.2

				%rbx = IMPLICIT_DEF
				JMP_1 %bb.2

				bb.2:
				successors: %bb.3

				JMP_1 %bb.3

				bb.3:
				successors: %bb.3, %bb.5

				%eflags = IMPLICIT_DEF
				JE_1 %bb.3, implicit killed %eflags
				JMP_1 %bb.5

				bb.5:
				RET 0

				...
				# FIXME: ShrinkWrap2: This still fails because we propagate attributes where we
				# could avoid doing it.

				#CHECK-LABEL: f0

				#CHECK: BB#1 uses : %RBX
				#CHECK: **** Shrink-wrapping results
				#CHECK-NEXT: BB#1: Saves: %RBX, \| Restores:
				#CHECK-NEXT: BB#2: Saves: \| Restores: %RBX,

test/CodeGen/X86/ShrinkWrapping/SaveBeforeLoop.mir

This file was added.

				# RUN: llc -mtriple=x86_64-- -run-pass=prologepilog %s -debug-only=shrink-wrap2 -o /dev/null 2>&1 \| FileCheck %s
				# REQUIRES: asserts
				# XFAIL: x86
				--- \|
				define void @f0() nounwind { ret void }
				...
				---
				name: f0
				tracksRegLiveness: true
				body: \|
				bb.0:
				successors: %bb.1, %bb.4

				%eflags = IMPLICIT_DEF
				JE_1 %bb.1, implicit killed %eflags
				JMP_1 %bb.4

				bb.1:
				successors: %bb.2, %bb.5

				%eflags = IMPLICIT_DEF
				JE_1 %bb.2, implicit killed %eflags
				JMP_1 %bb.5

				bb.2:
				successors: %bb.3

				%rbx = IMPLICIT_DEF
				JMP_1 %bb.3

				bb.3:
				successors: %bb.3, %bb.5

				%eflags = IMPLICIT_DEF
				JE_1 %bb.3, implicit killed %eflags
				JMP_1 %bb.5

				bb.5:
				successors: %bb.5, %bb.6

				%eflags = IMPLICIT_DEF
				JE_1 %bb.5, implicit killed %eflags
				JMP_1 %bb.6

				bb.4:
				RET 0

				bb.6:
				RET 0
				...
				# FIXME: ShrinkWrap2: This fails because we propagate attributes where we could
				# avoid doing it.

				#CHECK-LABEL: f0

				#CHECK: BB#2 uses : %RBX
				#CHECK: **** Shrink-wrapping results
				#CHECK-NEXT: BB#2: Saves: %RBX, \| Restores: %RBX

test/CodeGen/X86/ShrinkWrapping/SimpleLoopBranch.mir

This file was added.

				# RUN: llc -mtriple=x86_64-- -run-pass=prologepilog %s -debug-only=shrink-wrap2 -o /dev/null 2>&1 \| FileCheck %s
				# REQUIRES: asserts
				--- \|
				define void @f0() nounwind { ret void }
				...
				---
				name: f0
				tracksRegLiveness: true
				body: \|
				bb.0:
				successors: %bb.3, %bb.2

				%eflags = IMPLICIT_DEF
				JNE_1 %bb.3, implicit killed %eflags
				JMP_1 %bb.2

				bb.1:
				successors: %bb.3, %bb.2

				%rbx = IMPLICIT_DEF
				%eflags = IMPLICIT_DEF
				JNE_1 %bb.3, implicit killed %eflags
				JMP_1 %bb.2

				bb.2:
				successors: %bb.1

				JMP_1 %bb.1

				bb.3:
				RET 0
				...
				# Check that we don't save inside loops.

				#CHECK-LABEL: f0

				#CHECK: BB#1 uses : %RBX
				#CHECK: **** Shrink-wrapping results
				#CHECK-NEXT: BB#0: Saves: %RBX, \| Restores:
				#CHECK-NEXT: BB#3: Saves: \| Restores: %RBX,

test/CodeGen/X86/ShrinkWrapping/StackAlignment.mir

This file was added.

				# RUN: llc -disable-fp-elim -mtriple=x86_64-- -run-pass=prologepilog %s -o - \| FileCheck %s
				# REQUIRES: asserts
				--- \|
				define void @f0() nounwind { ret void }
				...
				---
				name: f0
				tracksRegLiveness: true
				stack:
				- { id: 0, offset: 0, size: 8, alignment: 8 }
				body: \|
				bb.0:
				successors: %bb.2, %bb.1

				%eflags = IMPLICIT_DEF
				JE_1 %bb.2, implicit killed %eflags
				JMP_1 %bb.1

				bb.1:
				successors: %bb.3

				%rbx = IMPLICIT_DEF
				%r14 = IMPLICIT_DEF
				JMP_1 %bb.3

				bb.2:
				RET 0

				bb.3:
				liveins: %rbx

				%rax = MOV64rm %stack.0, %rbx, _, 0, _
				RET 0, %rax
				...
				# Check that we do the stack adjustments instead of pushes.
				#CHECK-LABEL: f0
				#CHECK: %rsp = frame-setup SUB64ri8 %rsp, 16

test/CodeGen/X86/ShrinkWrapping/Tree.mir

This file was added.

				# RUN: llc -mtriple=x86_64-- -run-pass=prologepilog %s -debug-only=shrink-wrap2 -o /dev/null 2>&1 | FileCheck %s
				# REQUIRES: asserts
				--- |
				  define void @f0() nounwind { ret void }
				...
				---
				name: f0
				tracksRegLiveness: true
				body: |
				  bb.0:
				    successors: %bb.1, %bb.4

				    %eflags = IMPLICIT_DEF
				    JNE_1 %bb.4, implicit killed %eflags
				    JMP_1 %bb.1

				  bb.1:
				    successors: %bb.2, %bb.3

				    %eflags = IMPLICIT_DEF
				    JNE_1 %bb.3, implicit killed %eflags
				    JMP_1 %bb.2

				  bb.2:
				    %ebx = IMPLICIT_DEF
				    RET 0

				  bb.3:
				    %ebx = IMPLICIT_DEF
				    RET 0

				  bb.4:
				    successors: %bb.5, %bb.6

				    %eflags = IMPLICIT_DEF
				    JNE_1 %bb.6, implicit killed %eflags
				    JMP_1 %bb.5

				  bb.5:
				    %ebx = IMPLICIT_DEF
				    RET 0

				  bb.6:
				    RET 0
				...
				# Check that we save only on branches we need in a tree-like CFG.

				#CHECK-LABEL: f0

				#CHECK: BB#2 uses : %RBX
				#CHECK-NEXT: BB#3 uses : %RBX
				#CHECK-NEXT: BB#5 uses : %RBX
				#CHECK: **** Shrink-wrapping results
				#CHECK-NEXT: BB#1: Saves: %RBX, | Restores:
				#CHECK-NEXT: BB#2: Saves: | Restores: %RBX,
				#CHECK-NEXT: BB#3: Saves: | Restores: %RBX,
				#CHECK-NEXT: BB#5: Saves: %RBX, | Restores: %RBX,
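
The placement expected by Tree.mir can be sanity-checked by walking every entry-to-exit path of the CFG. This is only a sketch under stated assumptions, not the algorithm implemented by the patch: it assumes a save executes at the top of its block, a restore at the bottom, and that the CFG is acyclic. The helper `check_placement` and the constant names are hypothetical.

```python
# CFG, uses and expected placement transcribed from Tree.mir and its CHECK lines.
CFG = {0: [1, 4], 1: [2, 3], 2: [], 3: [], 4: [5, 6], 5: [], 6: []}
USES = {2, 3, 5}
SAVES = {1, 5}
RESTORES = {2, 3, 5}


def check_placement(cfg, entry, uses, saves, restores):
    """Walk all entry-to-exit paths of an acyclic CFG and validate the placement.

    Assumption: a save runs at the top of its block, a restore at the bottom.
    Returns the list of paths, or raises ValueError on the first violation.
    """
    def walk(block, saved, path):
        path = path + [block]
        if block in saves:
            if saved:
                raise ValueError(f"double save on path {path}")
            saved = True
        if block in uses and not saved:
            raise ValueError(f"use without save on path {path}")
        if block in restores:
            if not saved:
                raise ValueError(f"restore without save on path {path}")
            saved = False
        if not cfg[block]:  # exit block: the register must have been restored
            if saved:
                raise ValueError(f"missing restore on path {path}")
            return [path]
        return [p for succ in cfg[block] for p in walk(succ, saved, path)]

    return walk(entry, False, [])


paths = check_placement(CFG, 0, USES, SAVES, RESTORES)
print(paths)  # [[0, 1, 2], [0, 1, 3], [0, 4, 5], [0, 4, 6]]

# Paths that never execute a save or restore of %RBX.
print([p for p in paths if not SAVES & set(p)])  # [[0, 4, 6]]
```

Under these assumptions every path is balanced, and the bb.0, bb.4, bb.6 path pays no save/restore cost, which matches the intent stated in the test comment.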

test/CodeGen/X86/ShrinkWrapping/lit.local.cfg

This file was added.

				if not 'X86' in config.root.targets:
				    config.unsupported = True

test/CodeGen/X86/ShrinkWrapping/optimize-max-0.mir

This file was added.

				# RUN: llc -mtriple=x86_64-- -run-pass=stack-protector -run-pass=prologepilog %s -enable-shrink-wrap2=true -debug-only=shrink-wrap2 -o /dev/null 2>&1 | FileCheck %s
				# REQUIRES: asserts
				# XFAIL: x86
				--- |
				  define void @f0() nounwind { ret void }
				...
				---
				name: f0
				tracksRegLiveness: true
				body: |
				  bb.0:
				    successors: %bb.6, %bb.3

				    %eflags = IMPLICIT_DEF
				    JE_1 %bb.6, implicit killed %eflags
				    JMP_1 %bb.3

				  bb.3:
				    successors: %bb.3, %bb.4

				    %eflags = IMPLICIT_DEF
				    JNE_1 %bb.3, implicit killed %eflags
				    JMP_1 %bb.4

				  bb.4:
				    successors: %bb.6

				    JMP_1 %bb.6

				  bb.6:
				    successors: %bb.8

				    %ebx = IMPLICIT_DEF
				    JMP_1 %bb.8

				  bb.8:
				    RET 0

				...
				# FIXME: ShrinkWrap2: This fails because we detect a critical edge.

				#CHECK-LABEL: f0

				#CHECK: BB#3 uses : %RBX
				#CHECK: **** Shrink-wrapping results
				#CHECK-NEXT: BB#3: Saves: %RBX, | Restores:
				#CHECK-NEXT: BB#4: Saves: | Restores: %RBX,
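
The FIXME above attributes the failure to a critical edge. As a rough illustration, the sketch below lists the critical edges of this test's CFG using the usual definition: an edge whose source has more than one successor and whose destination has more than one predecessor. It is not taken from the patch, the helper name `critical_edges` is hypothetical, and the source does not say which of these edges triggers the failure.

```python
# CFG transcribed from the "successors:" lists in optimize-max-0.mir.
CFG = {0: [6, 3], 3: [3, 4], 4: [6], 6: [8], 8: []}


def critical_edges(cfg):
    """Return edges (src, dst) where src has >1 successors and dst has >1 predecessors."""
    pred_count = {block: 0 for block in cfg}
    for succs in cfg.values():
        for succ in succs:
            pred_count[succ] += 1
    return [
        (src, dst)
        for src, succs in cfg.items()
        for dst in succs
        if len(succs) > 1 and pred_count[dst] > 1
    ]


print(critical_edges(CFG))  # [(0, 6), (0, 3), (3, 3)]
```

Note that the self-loop on bb.3 is itself a critical edge under this definition, as are both edges leaving bb.0.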

Diff 103086
