
Details

Reviewers

qcolombet
nadav

Commits

rG7cf324772fde: LEA code size optimization pass (Part 1): Remove redundant address…
rL254712: LEA code size optimization pass (Part 1): Remove redundant address…

Summary

Add a new x86 pass which replaces address calculations in load or store instructions with the def register of an existing LEA (which must be in the same basic block), if the LEA calculates an address that differs only by a displacement. Works only with -Os or -Oz.
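
To make "differs only by a displacement" concrete, here is a minimal standalone sketch. It does not use LLVM types: MemAddr and dispShiftIfSimilar are hypothetical names introduced for illustration, and the struct models only the base, scale, index and displacement parts of an x86 address (segment registers and symbolic displacements are left out).

```cpp
#include <cstdint>
#include <string>

// Hypothetical model of an x86 memory operand: Base + Index * Scale + Disp.
// This is not an LLVM type; it only illustrates the comparison.
struct MemAddr {
  std::string Base;  // register name, empty if there is no base register
  unsigned Scale;    // 1, 2, 4 or 8
  std::string Index; // register name, empty if there is no index register
  int64_t Disp;      // constant displacement
};

// Returns true if A and B share base, scale and index. In that case Shift is
// set to the displacement that must be added to B's address to obtain A's.
bool dispShiftIfSimilar(const MemAddr &A, const MemAddr &B, int64_t &Shift) {
  if (A.Base != B.Base || A.Scale != B.Scale || A.Index != B.Index)
    return false;
  Shift = A.Disp - B.Disp;
  return true;
}
```

With a load from 8(vreg1,vreg2,4) and an LEA of 4(vreg1,vreg2,4), the shift is 4, so the load could address 4(lea_def) instead. The test in this diff shows the same idea with a negative shift: the load of field 0 becomes -4([[REG2]]) because the LEA computes arr+4.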

Diff Detail

Event Timeline

aturetsk updated this revision to Diff 36112. Sep 30 2015, 8:46 AM

aturetsk retitled this revision from to LEA code size optimization pass (Part 1): Remove redundant address recalculations.

aturetsk updated this object.

aturetsk added a reviewer: nadav.

aturetsk added a subscriber: llvm-commits.

Followed up by http://reviews.llvm.org/D13295.

aturetsk updated this object. Sep 30 2015, 9:04 AM

Ping.

Hi Andrey,

Thanks for working on this.

What is the compile time impact of that pass?

Right now, the scope of the pass is pretty narrow. For instance, it is basic block scope only whereas it should be MachineFunction scope.
Anyhow, I am fine with it if we have a plan to improve it, therefore the question:
What is the long term plan for this pass?

Finally, the added test case seems too small to cover all the code added. I may of course be wrong, but it seems we are missing some cases. In particular, please make sure to cover the cases where:

The size of the displacement exceeds 4 bytes.
The matching LEA is after the definition.
The size of the displacement exceeds 1 byte for a closer candidate.
The function doesn’t have the minsize/optsize attribute.

Cheers,
-Quentin

lib/Target/X86/X86OptimizeLEAs.cpp
42	That’s a bit strange to have that private.
55	Since you expect both instructions to be valid, put references. This is also true for the rest of the class.
59	Use SmallVectorImpl to not have to repeat the number of elements everywhere.
60	Please explain all the fields in the comment. Also, shouldn’t List be const here?
68	What are the unsigned arguments used for? In other words, please document how they are used.
97	use \p MI.
98	The address… by the displacement… I am not a native English speaker, but without the articles, I found the sentence strange.
100	The register class of the definition of the LEA… is compatible?
104	Why? You didn’t define “best” for the LEA instruction, so I guess it may make sense, but I don’t see why now. Also, after reading the code, looks like you bail on the first LEA that is past MI, so what is the strategy here? I.e., what if the next LEA would have matched but not this one.
132	You can also constrain the register class to the intersection of both classes.
172	What are the advantages of the vector against a static const array? Look into X86TransformInfo for instance.
175	Even if both operands are identical, that does not mean they carry the same value. This is true for SSA variables, but physical registers are not SSA. You must check that the operand is not a physical register.
220	the number
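
The point about physical registers (comment on line 175) can be sketched without LLVM. The Operand struct and both function names below are hypothetical stand-ins, not llvm::MachineOperand or the isIdenticalOp helper from the updated patch; the sketch only shows the shape of the check under those assumptions.

```cpp
#include <cstdint>

// Hypothetical, simplified operand model (not llvm::MachineOperand).
struct Operand {
  bool IsReg;      // true for register operands, false for immediates
  bool IsPhysical; // meaningful only when IsReg is true
  int64_t Value;   // register number or immediate value
};

// Structural equality: same kind of operand and same register/immediate.
bool sameOperand(const Operand &A, const Operand &B) {
  return A.IsReg == B.IsReg && A.Value == B.Value &&
         (!A.IsReg || A.IsPhysical == B.IsPhysical);
}

// Structurally identical operands are only assumed to carry the same value
// when they are not physical registers. A virtual register in SSA form has a
// single definition, while a physical register may be redefined between the
// two instructions being compared.
bool carriesSameValue(const Operand &A, const Operand &B) {
  return sameOperand(A, B) && !(A.IsReg && A.IsPhysical);
}
```

Rejecting every physical register is conservative: it gives up on some operands that do hold the same value, in exchange for not needing any liveness or reaching-definition analysis.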

Fix most of the comments

Hi Quentin!

Thanks for the review.

My measurements show no significant compile-time impact of the pass in -Os mode.

There are plenty of things to improve in the patch. Here are my thoughts about it:

I believe the thing the pass needs most is a proper heuristic to decide which address calculations should be replaced and which should not. The current code size impact of the pass is -0.2% in -Os mode and -0.3% in -Oz mode (including part 2 of the patch) on Spec 2000. That's not much; I was hoping for more when I started the development. However, I think a good heuristic can improve the results. Moreover, a heuristic could probably uncover some performance opportunities for the pass. Last time I measured, I got about 0% geomean change in performance on Spec 2000, but the measurements showed that some tests improve significantly and some degrade significantly with the pass. So if a heuristic could cut off some of the bad cases, we would get a performance gain as well.

MachineFunction scope. I implemented this as one of my experiments with the pass, but the changes didn't give any improvement (or even had a negative effect, I can't remember). I believe that extending the live ranges of LEA defs across basic blocks damages the code. So to extend the scope we need some heuristic to decide when we want to use LEAs across basic blocks. I wasn't able to find one. Unfortunately I didn't save the patch, but extending the scope is easy to implement: search for LEAs in basic blocks from DominatorTree in findLEAs and tweak calcInstrDist and chooseBestLEA to handle instructions from different basic blocks.

Some time ago I discovered that at the point where the LEA pass runs, some instructions that calculate addresses are not yet LEAs (they are converted later, after RA; see convertToThreeAddress for details). I created an experimental patch to handle these not-yet-LEAs as well, but it didn't give any improvement, so I dropped it. So this change requires a heuristic as well; otherwise we can't be sure whether we improve the code or make it worse.

The biggest part of the code size improvement from the pass comes from replacing %esp with another register in moves to/from the stack (example: lea 256(%esp), %eax; mov $111, 260(%esp)). That's profitable because if the new displacement fits in 1 byte (and the old one does not), we save a few bytes through a different encoding of the instruction. However, at the point where the LEA pass runs we don't have the actual displacement, only a frame index and some displacement relative to it, so we can't really judge whether we should replace %esp with another register or not. So another possible improvement is to keep %esp cases unchanged until after RA, and then decide whether to replace %esp according to the actual displacements.
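
The saving described above can be illustrated with a small standalone model. It is a simplification and an assumption of this sketch, not the encoder used by LLVM: it counts only the displacement field, treats every displacement as either 1 or 4 bytes, and ignores other encoding details (for example the SIB byte needed when %esp is the base, or omitted zero displacements). dispFieldBytes and bytesSaved are hypothetical names.

```cpp
#include <cstdint>

// Size of the displacement field in this simplified model: a value that fits
// in a signed 8-bit integer takes 1 byte, anything else takes 4 bytes.
unsigned dispFieldBytes(int64_t Disp) {
  return (Disp >= -128 && Disp <= 127) ? 1u : 4u;
}

// Bytes saved in the displacement field when an operand with displacement
// OldDisp is rewritten to use NewDisp relative to an LEA def register.
int bytesSaved(int64_t OldDisp, int64_t NewDisp) {
  return static_cast<int>(dispFieldBytes(OldDisp)) -
         static_cast<int>(dispFieldBytes(NewDisp));
}
```

For the example in the text, mov $111, 260(%esp) rewritten against lea 256(%esp), %eax becomes mov $111, 4(%eax), so the displacement field shrinks from 4 bytes to 1 in this model. The model also shows why the decision is hard before RA: without the final frame offsets, OldDisp is not yet known.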

That's my vision of what could be done to the pass in the future. I'm most interested in having a deeper analysis to understand when replacing an address calculation is profitable, but I have no idea how to do that.
About tests: I will add some soon to increase the coverage.

lib/Target/X86/X86OptimizeLEAs.cpp
44	Fixed.
57	Fixed here and in the other places.
61	Fixed.
62	Extended the comment. Put 'const' here and in other places where possible.
70	Extended the comment.
99	Should I put "\p" before every argument name mentioned in every comment?
100	Fixed.
102	Fixed.
106	"3) Displacement of the new memory operand should fit in 1 byte if possible. 4) The LEA should be as close to MI as possible, and prior to it if possible." These two conditions define the best LEA. Choosing a 1-byte displacement over a bigger one saves a few bytes through a different instruction encoding. And we try to use the closest LEA to avoid significantly lengthening the live range of the LEA's def. It also just seems more right to me to give priority to the first LEA before MI over the first LEA after MI, regardless of the distance between the LEAs and MI, to avoid moving instructions.
134	The case where this is needed is extremely rare. I used to have this in the initial version of the patch, but I wasn't able to write the test for it. So I just decided to drop it.
174	Fixed.
177	To avoid large 'if' expressions I created a separate function to perform the check: isIdenticalOp.
222	Fixed.
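
The selection strategy discussed for line 106 can be sketched as a standalone function. This is a simplified model of the loop in chooseBestLEA from this diff, under stated assumptions: Candidate and chooseBest are hypothetical names, positions are plain integers, and the similarity, 32-bit displacement and register class checks are assumed to have been applied already.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical candidate record; in the pass this would be an LEA instruction.
struct Candidate {
  int Pos;           // position of the LEA within the basic block
  int64_t DispShift; // displacement the rewritten memory operand would need
};

bool fitsInt8(int64_t V) { return V >= -128 && V <= 127; }

// Cands must be sorted by Pos. Returns the index of the chosen candidate,
// or -1 if there is none. Prefers the closest LEA before the instruction at
// MIPos, but keeps an earlier choice whose displacement fits in 1 byte when
// the new candidate's displacement does not. An LEA after MIPos is taken only
// if nothing was found before it, and the scan stops at the first such LEA.
int chooseBest(const std::vector<Candidate> &Cands, int MIPos) {
  int Best = -1;
  for (int I = 0, E = static_cast<int>(Cands.size()); I != E; ++I) {
    int Dist = MIPos - Cands[I].Pos; // > 0 means the LEA precedes MI
    if (Dist > 0 || Best < 0) {
      bool KeepCurrent = Best >= 0 && fitsInt8(Cands[Best].DispShift) &&
                         !fitsInt8(Cands[I].DispShift);
      if (!KeepCurrent)
        Best = I;
    }
    if (Dist < 0)
      break;
  }
  return Best;
}
```

The early break is the behavior questioned in the review: a later LEA after MI is never considered, even if it would have been a better match.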

Added more tests.
"The size of the displacement exceeds 4-bytes." - this case is the only one remaining uncovered. I wasn't able to force the compiler to use such a big address displacement (if an address has a big displacement, it is stored into a register and then used through it). Not sure if this is possible without changing compiler sources.

qcolombet added inline comments. Dec 1 2015, 11:52 AM

lib/Target/X86/X86OptimizeLEAs.cpp
99	Those are only processed in doxygen-like comments (e.g., ///). In these comments, I think we usually put \p before every argument name.
106	Although I get what you are saying, it feels arbitrary to me to stop on the first LEA after the current instruction when this is the only candidate we found. For now, this is fine, just make sure to add a FIXME.
134	Maybe say that in the comment then.

Fixed the patch according to the latest comments.

Hi Andrey,

LGTM modulo two things:

Merge the test cases in one file.
Run opt -instnamer on the IR to get rid of the %[0-9]+ variables.

Please commit with those changes.

Thanks,
-Quentin

This revision is now accepted and ready to land. Dec 3 2015, 10:41 AM

Merged tests and ran 'opt -instnamer' on them.

Closed by commit rL254712: LEA code size optimization pass (Part 1): Remove redundant address… (authored by ABataev). Dec 4 2015, 2:56 AM

This revision was automatically updated to reflect the committed changes.

jevinskie added a subscriber: jevinskie. Dec 7 2015, 10:34 AM

Please see https://llvm.org/bugs/show_bug.cgi?id=25843: this pass can be
very slow, exponentially blowing up the compile time for large functions.
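
The bug report is not quoted here, so the cause is not stated on this page. One plausible contributor, judging only from the code in this diff, is that calcInstrDist walks from the start of the basic block twice on every call, and it is called once per candidate LEA for every memory instruction. A possible mitigation, sketched below with hypothetical names (Instr, numberInstrs, instrDist) and standard containers instead of LLVM ones, is to number the instructions once per block and look distances up in constant time.

```cpp
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a machine instruction; only identity matters.
struct Instr {
  int Opcode;
};

// Record the position of every instruction in a single pass over the block.
std::unordered_map<const Instr *, int>
numberInstrs(const std::vector<Instr> &Block) {
  std::unordered_map<const Instr *, int> Pos;
  int N = 0;
  for (const Instr &I : Block)
    Pos[&I] = N++;
  return Pos;
}

// Constant-time distance lookup; the result is positive when First precedes
// Last and negative when the instructions occur in reverse order.
int instrDist(const std::unordered_map<const Instr *, int> &Pos,
              const Instr *First, const Instr *Last) {
  return Pos.at(Last) - Pos.at(First);
}
```

A cache like this goes stale when instructions are moved, which the pass does when it lifts an LEA above the instruction that uses it, so the positions would have to be updated or recomputed after such a move.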

Diff 36112

lib/Target/X86/CMakeLists.txt

[28 lines not shown]
  set(sources
  X86SelectionDAGInfo.cpp
  X86Subtarget.cpp
  X86TargetMachine.cpp
  X86TargetObjectFile.cpp
  X86TargetTransformInfo.cpp
  X86VZeroUpper.cpp
  X86FixupLEAs.cpp
  X86WinEHState.cpp
+ X86OptimizeLEAs.cpp
  )

  if( CMAKE_CL_64 )
  enable_language(ASM_MASM)
  ADD_CUSTOM_COMMAND(
  OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/X86CompilationCallback_Win64.obj
  MAIN_DEPENDENCY X86CompilationCallback_Win64.asm
  COMMAND ${CMAKE_ASM_MASM_COMPILER} /nologo /Fo ${CMAKE_CURRENT_BINARY_DIR}/X86CompilationCallback_Win64.obj /c ${CMAKE_CURRENT_SOURCE_DIR}/X86CompilationCallback_Win64.asm
[12 lines not shown]

lib/Target/X86/X86.h

[58 lines not shown]
  /// with NOOPs. This will prevent a stall when returning on the Atom.
  FunctionPass *createX86PadShortFunctions();
  /// createX86FixupLEAs - Return a a pass that selectively replaces
  /// certain instructions (like add, sub, inc, dec, some shifts,
  /// and some multiplies) by equivalent LEA instructions, in order
  /// to eliminate execution delays in some Atom processors.
  FunctionPass *createX86FixupLEAs();

+ /// createX86OptimizeLEAs() - Return a pass that removes redundant
+ /// address recalculations.
+ FunctionPass *createX86OptimizeLEAs();

  /// createX86CallFrameOptimization - Return a pass that optimizes
  /// the code-size of x86 call sequences. This is done by replacing
  /// esp-relative movs with pushes.
  FunctionPass *createX86CallFrameOptimization();

  /// createX86WinEHStatePass - Return an IR pass that inserts EH registration
  /// stack objects and explicit EH state updates. This pass must run after EH
  /// preparation, which does Windows-specific but architecture-neutral
[11 lines not shown]

lib/Target/X86/X86OptimizeLEAs.cpp

This file was added.

				//===-- X86OptimizeLEAs.cpp - optimize usage of LEA instructions ----------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This file defines the pass that performs some optimizations with LEA
				// instructions in order to improve code size.
				// Currently, it does one thing:
				// 1) Address calculations in load and store instructions are replaced by
				// existing LEA def registers where possible.
				//
				//===----------------------------------------------------------------------===//

				#include "X86.h"
				#include "X86InstrInfo.h"
				#include "X86Subtarget.h"
				#include "llvm/ADT/Statistic.h"
				#include "llvm/CodeGen/LiveVariables.h"
				#include "llvm/CodeGen/MachineFunctionPass.h"
				#include "llvm/CodeGen/MachineInstrBuilder.h"
				#include "llvm/CodeGen/MachineRegisterInfo.h"
				#include "llvm/CodeGen/Passes.h"
				#include "llvm/IR/Function.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Support/raw_ostream.h"
				#include "llvm/Target/TargetInstrInfo.h"

				using namespace llvm;

				#define DEBUG_TYPE "x86-optimize-LEAs"

				STATISTIC(NumSubstLEAs, "Number of LEA instruction substitutions");

				namespace {
				class OptimizeLEAPass : public MachineFunctionPass {
				static char ID;

				const char *getPassName() const override { return "X86 LEA Optimize"; }

				public:
				OptimizeLEAPass() : MachineFunctionPass(ID) {}

				/// \brief Loop over all of the basic blocks, replacing address
				/// calculations in load and store instructions, if it's already
				/// been calculated by LEA. Also, remove redundant LEAs.
				bool runOnMachineFunction(MachineFunction &MF) override;

				private:
				/// \brief Returns a distance between two instructions inside one basic block.
				/// Negative result means, that instructions occur in reverse order.
				int calcInstrDist(MachineInstr *First, MachineInstr *Last);

				/// \brief Choose the best LEA instruction from the list to replace address
				/// calculation in MI instruction.
				bool chooseBestLEA(SmallVector<MachineInstr *, 16> &List, MachineInstr *MI,
				MachineInstr *&LEA, int64_t &AddrDispShift, int &Dist);

				/// \brief Returns true if the instruction is LEA.
				bool isLEA(MachineInstr *MI);

				/// \brief Returns true if two instructions have memory operands that only
				/// differ by displacement.
				bool isSimilarMemOp(MachineInstr *MI1, unsigned N1, MachineInstr *MI2,
				unsigned N2, int64_t &AddrDispShift);

				/// \brief Find all LEA instructions in the basic block.
				void findLEAs(MachineBasicBlock &MBB, SmallVector<MachineInstr *, 16> &List);

				/// \brief Removes redundant address calculations.
				bool removeRedundantAddrCalc(SmallVector<MachineInstr *, 16> &List);

				MachineRegisterInfo *MRI;
				const X86InstrInfo *TII;
				const X86RegisterInfo *TRI;
				};
				char OptimizeLEAPass::ID = 0;
				}

				FunctionPass *llvm::createX86OptimizeLEAs() { return new OptimizeLEAPass(); }

				int OptimizeLEAPass::calcInstrDist(MachineInstr *First, MachineInstr *Last) {
				MachineBasicBlock *MBB = First->getParent();

				// Both instructions must be in the same basic block.
				assert(Last->getParent() == MBB &&
				"Instructions are in different basic blocks");

				return std::distance(MBB->begin(), MachineBasicBlock::iterator(Last)) -
				std::distance(MBB->begin(), MachineBasicBlock::iterator(First));
				}

				// Find the best LEA instruction in the List to replace address recalculation in
				// MI. Such LEA must meet these requirements:
				// 1) Address calculated by the LEA differs only by displacement from the
				// address used in MI.
				// 2) Class of the LEA def register doesn't conflict with class of MI
				// address base register.
				// 3) Displacement of the new memory operand should fit in 1 byte if possible.
				// 4) The LEA should be as close to MI as possible, and prior to it if
				// possible.
				bool OptimizeLEAPass::chooseBestLEA(SmallVector<MachineInstr *, 16> &List,
				MachineInstr *MI, MachineInstr *&LEA,
				int64_t &AddrDispShift, int &Dist) {
				MachineFunction *MF = MI->getParent()->getParent();
				const MCInstrDesc &Desc = MI->getDesc();
				int MemOpNo = X86II::getMemoryOperandNo(Desc.TSFlags, MI->getOpcode()) +
				X86II::getOperandBias(Desc);

				LEA = nullptr;

				// Loop over all LEA instructions.
				for (auto DefMI : List) {
				int64_t AddrDispShiftTemp = 0;

				// Compare instructions memory operands.
				if (!isSimilarMemOp(MI, MemOpNo, DefMI, 1, AddrDispShiftTemp))
				continue;

				// Make sure address displacement fits 4 bytes.
				if (!isInt<32>(AddrDispShiftTemp))
				continue;

				// Check that LEA def register can be used as MI address base. Some
				// instructions can use a limited set of registers as address base, for
				// example MOV8mr_NOREX.
				if (TII->getRegClass(Desc, MemOpNo + X86::AddrBaseReg, TRI, *MF) !=
				MRI->getRegClass(DefMI->getOperand(0).getReg()))
				continue;

				// Choose the closest LEA instruction from the list, prior to MI if
				// possible. Note that we took into account resulting address displacement
				// as well. Also note that the list is sorted by the order in which the LEAs
				// occur, so the break condition is pretty simple.
				int DistTemp = calcInstrDist(DefMI, MI);
				assert(DistTemp != 0 &&
				"The distance between two different instructions cannot be zero");
				if (DistTemp > 0 || LEA == nullptr) {
				// Do not update return LEA, if the current one provides a displacement
				// which fits in 1 byte, while the new candidate does not.
				if (LEA != nullptr && !isInt<8>(AddrDispShiftTemp) &&
				isInt<8>(AddrDispShift))
				continue;

				LEA = DefMI;
				AddrDispShift = AddrDispShiftTemp;
				Dist = DistTemp;
				}
				if (DistTemp < 0)
				break;
				}

				return LEA != nullptr;
				}

				bool OptimizeLEAPass::isLEA(MachineInstr *MI) {
				unsigned Opcode = MI->getOpcode();
				return Opcode == X86::LEA16r || Opcode == X86::LEA32r ||
				Opcode == X86::LEA64r || Opcode == X86::LEA64_32r;
				}

				// Check if MI1 and MI2 have memory operands which represent addresses that
				// differ only by displacement.
				bool OptimizeLEAPass::isSimilarMemOp(MachineInstr *MI1, unsigned N1,
				MachineInstr *MI2, unsigned N2,
				int64_t &AddrDispShift) {
				// Address base, scale, index and segment operands must be identical.
				std::vector<int> IdenticalOpNums = {X86::AddrBaseReg, X86::AddrScaleAmt,
				X86::AddrIndexReg, X86::AddrSegmentReg};
				for (auto &N : IdenticalOpNums)
				if (!MI1->getOperand(N1 + N).isIdenticalTo(MI2->getOperand(N2 + N)))
				return false;

				// Address displacement operands may differ by a constant.
				MachineOperand *Op1 = &MI1->getOperand(N1 + X86::AddrDisp);
				MachineOperand *Op2 = &MI2->getOperand(N2 + X86::AddrDisp);
				if (!Op1->isIdenticalTo(*Op2)) {
				if (Op1->isImm() && Op2->isImm())
				AddrDispShift = Op1->getImm() - Op2->getImm();
				else if (Op1->isGlobal() && Op2->isGlobal() &&
				Op1->getGlobal() == Op2->getGlobal())
				AddrDispShift = Op1->getOffset() - Op2->getOffset();
				else
				return false;
				}

				return true;
				}

				void OptimizeLEAPass::findLEAs(MachineBasicBlock &MBB,
				SmallVector<MachineInstr *, 16> &List) {
				for (auto &MI : MBB) {
				if (isLEA(&MI))
				List.push_back(&MI);
				}
				}

				// Try to find load and store instructions which recalculate addresses already
				// calculated by some LEA and replace their memory operands with its def
				// register.
				bool OptimizeLEAPass::removeRedundantAddrCalc(
				SmallVector<MachineInstr *, 16> &List) {
				bool Changed = false;

				assert(List.size() > 0);
				MachineBasicBlock *MBB = List[0]->getParent();

				// Process all instructions in basic block.
				for (auto I = MBB->begin(), E = MBB->end(); I != E;) {
				MachineInstr *MI = I++;
				unsigned Opcode = MI->getOpcode();

				// Instruction must be load or store.
				if (!MI->mayLoadOrStore())
				continue;

				// Get a number of the first memory operand.
				const MCInstrDesc &Desc = MI->getDesc();
				int MemOpNo = X86II::getMemoryOperandNo(Desc.TSFlags, Opcode);

				// If instruction has no memory operand - skip it.
				if (MemOpNo < 0)
				continue;

				MemOpNo += X86II::getOperandBias(Desc);

				// Get the best LEA instruction to replace address calculation.
				MachineInstr *DefMI;
				int64_t AddrDispShift;
				int Dist;
				if (!chooseBestLEA(List, MI, DefMI, AddrDispShift, Dist))
				continue;

				// If LEA occurs before current instruction, we can freely replace
				// the instruction. If LEA occurs after, we can lift LEA above the
				// instruction and this way to be able to replace it. Since LEA and the
				// instruction have similar memory operands (thus, the same def
				// instructions for these operands), we can always do that, without
				// worries of using registers before their defs.
				if (Dist < 0) {
				DefMI->removeFromParent();
				MBB->insert(MachineBasicBlock::iterator(MI), DefMI);
				}

				// Since we can possibly extend register lifetime, clear kill flags.
				MRI->clearKillFlags(DefMI->getOperand(0).getReg());

				++NumSubstLEAs;
				DEBUG(dbgs() << "OptimizeLEAs: Candidate to replace: "; MI->dump(););

				// Change instruction operands.
				MI->getOperand(MemOpNo + X86::AddrBaseReg)
				.ChangeToRegister(DefMI->getOperand(0).getReg(), false);
				MI->getOperand(MemOpNo + X86::AddrScaleAmt).ChangeToImmediate(1);
				MI->getOperand(MemOpNo + X86::AddrIndexReg)
				.ChangeToRegister(X86::NoRegister, false);
				MI->getOperand(MemOpNo + X86::AddrDisp).ChangeToImmediate(AddrDispShift);
				MI->getOperand(MemOpNo + X86::AddrSegmentReg)
				.ChangeToRegister(X86::NoRegister, false);

				DEBUG(dbgs() << "OptimizeLEAs: Replaced by: "; MI->dump(););

				Changed = true;
				}

				return Changed;
				}

				bool OptimizeLEAPass::runOnMachineFunction(MachineFunction &MF) {
				bool Changed = false;
				bool OptSize = MF.getFunction()->optForSize();
				bool MinSize = MF.getFunction()->optForMinSize();

				// Perform this optimization only if we care about code size.
				if (!OptSize && !MinSize)
				return false;

				MRI = &MF.getRegInfo();
				TII = MF.getSubtarget<X86Subtarget>().getInstrInfo();
				TRI = MF.getSubtarget<X86Subtarget>().getRegisterInfo();

				// Process all basic blocks.
				for (auto &MBB : MF) {
				SmallVector<MachineInstr *, 16> LEAs;

				// Find all LEA instructions in basic block.
				findLEAs(MBB, LEAs);

				// If current basic block has no LEAs, move on to the next one.
				if (LEAs.empty())
				continue;

				// Remove redundant address calculations.
				Changed |= removeRedundantAddrCalc(LEAs);
				}

				return Changed;
				}

lib/Target/X86/X86TargetMachine.cpp

[248 lines not shown]
  bool X86PassConfig::addPreISel() {
    // Only add this pass for 32-bit x86 Windows.
    const Triple &TT = TM->getTargetTriple();
    if (TT.isOSWindows() && TT.getArch() == Triple::x86)
      addPass(createX86WinEHStatePass());
    return true;
  }

  void X86PassConfig::addPreRegAlloc() {
+   if (getOptLevel() != CodeGenOpt::None)
+     addPass(createX86OptimizeLEAs());
+
    addPass(createX86CallFrameOptimization());
  }

  void X86PassConfig::addPostRegAlloc() {
    addPass(createX86FloatingPointStackifierPass());
  }

  void X86PassConfig::addPreSched2() { addPass(createX86ExpandPseudoPass()); }
[13 lines not shown]

test/CodeGen/X86/lea-opt.ll

This file was added.

				; RUN: llc < %s -mtriple=x86_64-linux | FileCheck %s

				%struct.anon = type { i32, i32, i32 }

				@arr = external global [65 x %struct.anon], align 16

				define void @test1(i64 %x) nounwind optsize {
				entry:
				%a = getelementptr inbounds [65 x %struct.anon], [65 x %struct.anon]* @arr, i64 0, i64 %x, i32 0
				%0 = load i32, i32* %a, align 4
				%b = getelementptr inbounds [65 x %struct.anon], [65 x %struct.anon]* @arr, i64 0, i64 %x, i32 1
				%1 = load i32, i32* %b, align 4
				%sub = sub i32 %0, %1
				%c = getelementptr inbounds [65 x %struct.anon], [65 x %struct.anon]* @arr, i64 0, i64 %x, i32 2
				%2 = load i32, i32* %c, align 4
				%add = add nsw i32 %sub, %2
				switch i32 %add, label %sw.epilog [
				i32 1, label %sw.bb.1
				i32 2, label %sw.bb.2
				]

				sw.bb.1:
				store i32 111, i32* %b, align 4
				store i32 222, i32* %c, align 4
				br label %sw.epilog

				sw.bb.2:
				store i32 333, i32* %b, align 4
				store i32 444, i32* %c, align 4
				br label %sw.epilog

				sw.epilog:
				ret void
				; CHECK-LABEL: test1:
				; CHECK: leaq (%rdi,%rdi,2), [[REG1:%[a-z]+]]
				; CHECK: leaq arr+4(,[[REG1]],4), [[REG2:%[a-z]+]]
				; CHECK: movl -4([[REG2]]), {{.*}}
				; CHECK: subl ([[REG2]]), {{.*}}
				; CHECK: leaq arr+8(,[[REG1]],4), [[REG3:%[a-z]+]]
				; CHECK: addl ([[REG3]]), {{.*}}
				; CHECK: movl ${{[1-4]+}}, ([[REG2]])
				; CHECK: movl ${{[1-4]+}}, ([[REG3]])
				; CHECK: movl ${{[1-4]+}}, ([[REG2]])
				; CHECK: movl ${{[1-4]+}}, ([[REG3]])
				}

This is an archive of the discontinued LLVM Phabricator instance.

LEA code size optimization pass (Part 1): Remove redundant address recalculations
ClosedPublic
