This is an archive of the discontinued LLVM Phabricator instance.

Differential D21183

Better selection of common base address in constant hoisting
ClosedPublic

Authored by SjoerdMeijer on Jun 9 2016, 7:36 AM.

Details

Reviewers

chandlerc
jmolloy
mehdi_amini
ributzka
mcrosier

Commits

rG38c2cd0c1499: This implements a more optimal algorithm for selecting a base constant in…
rL275382: This implements a more optimal algorithm for selecting a base constant in

Summary

This implements a more effective algorithm for selecting a base constant in constant hoisting. It takes into account not only the number of uses of each constant, but also the resulting integer range of the offsets. The algorithm thus maximizes the number of uses that fall within an integer range that enables more efficient code generation. On ARM, for example, this enables code size optimisations because fewer negative offsets are created. Thumb1 does not support negative offsets/immediates, so negative offsets prevent the more compact instruction encoding.
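To make the idea in the summary concrete, here is a minimal sketch that uses plain integers instead of LLVM types. The `ConstUse` struct, the function names, the sample constants and the offset bound are all assumptions chosen for illustration; this is not the code from the patch, which works on TTI cost queries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a hoisting candidate: a constant and its use count.
struct ConstUse {
  int64_t Value;
  unsigned NumUses;
};

// Count the uses whose offset from Base lies in [0, MaxOffset].
unsigned usesInRange(const std::vector<ConstUse> &Consts, int64_t Base,
                     int64_t MaxOffset) {
  unsigned Total = 0;
  for (const ConstUse &C : Consts) {
    int64_t Offset = C.Value - Base;
    if (Offset >= 0 && Offset <= MaxOffset)
      Total += C.NumUses;
  }
  return Total;
}

// Return the index of the base that keeps the most uses in range.
// The first candidate wins ties.
std::size_t pickBaseByRange(const std::vector<ConstUse> &Consts,
                            int64_t MaxOffset) {
  assert(!Consts.empty() && "need at least one candidate");
  std::size_t Best = 0;
  unsigned BestUses = 0;
  for (std::size_t I = 0; I < Consts.size(); ++I) {
    unsigned Uses = usesInRange(Consts, Consts[I].Value, MaxOffset);
    if (Uses > BestUses) {
      BestUses = Uses;
      Best = I;
    }
  }
  return Best;
}
```

With the made-up constants 100, 104, 120 and 400 (used 3, 2, 8 and 7 times) and an assumed offset bound of 255, picking the most-used constant 120 leaves only its own 8 uses in range, whereas picking 100 keeps 13 uses in range.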

Event Timeline

SjoerdMeijer updated this revision to Diff 60170. Jun 9 2016, 7:36 AM

SjoerdMeijer retitled this revision from to Better selection of common base address in constant hoisting.

SjoerdMeijer updated this object.

SjoerdMeijer added reviewers: jmolloy, mcrosier, ributzka.

SjoerdMeijer added a subscriber: llvm-commits.

Herald added a subscriber: aemerson. Jun 9 2016, 7:36 AM

Currently, this change impacts all targets. For those targets that don't implement the isImmediateInRangeForLoad API, do we expect this change to be an improvement in general? If not, we might consider having isImmediateInRangeForLoad return an optional bool, so only those targets that implement isImmediateInRangeForLoad are impacted.
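The optional-return idea raised in this comment could look roughly like the sketch below. It uses `std::optional` from the standard library as a stand-in so that it compiles without LLVM headers; the class names, the hook name `isImmediateInRange` and the 0..255 bound are hypothetical and do not come from the patch.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical base class for target hooks. Returning std::nullopt means
// "this target does not implement the hook".
struct TargetHooksBase {
  virtual ~TargetHooksBase() = default;
  virtual std::optional<bool> isImmediateInRange(int64_t /*Imm*/) const {
    return std::nullopt;
  }
};

// Hypothetical target that opts in, with an assumed range of 0..255.
struct Thumb1LikeHooks : TargetHooksBase {
  std::optional<bool> isImmediateInRange(int64_t Imm) const override {
    return Imm >= 0 && Imm < 256;
  }
};

// Only targets that give an answer take the new code path; all others keep
// the previous behaviour.
bool shouldUseRangeAwareSelection(const TargetHooksBase &Hooks) {
  return Hooks.isImmediateInRange(0).has_value();
}
```

The point of the design is that a default answer of "true" cannot be told apart from a real answer, whereas an empty optional can.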

lib/Transforms/Scalar/ConstantHoisting.cpp
296	differend -> different
300	Can this just be a static helper function?
302	Does this need to be initialized to 'None'?
308	No need for the extra brackets.
359	No need for the extra brackets.
363	No need for the extra brackets.
411	I'd prefer the original 'ConstCand' over just 'i'.
565	No need for the extra brackets.

Looks like a nice improvement to ConstantHoisting, but I am a little worried about the limited scope and implementation for load optimization only.

Constants are not only used by load/store instructions, so using "isImmediateInRangeForLoad" is very misleading. Also it might negatively impact decisions we used to make for other instructions.

Did you run the test suite to measure the performance and compile time impact for X86 and ARM/AArch64?

include/llvm/Analysis/TargetTransformInfo.h
410	Please add a comment describing the new TTI method.
lib/Transforms/Scalar/ConstantHoisting.cpp
79	CandidatesHaveUses -> candidatesHaveUses
80	calcOffsetDiff -> calculateOffsetDiff
357	What about the case when the constant is not used by a load?
test/Transforms/ConstantHoisting/X86/phi.ll
23 ↗	(On Diff #60170)	OLDHECK???

Thanks for reviewing!
Yes, this impacts all targets. For targets that don't implement the isImmediateInRangeForLoad API, the default value "true" is returned. This is of course to make sure that we don't exclude any constants that were considered before this change. But yes, this might cause codegen differences. Examples are the 2 regression tests that I had to change (masks.ll and phi.ll). There can be multiple solutions with the same "gain", and simply the first solution is picked. This has the side effect that offsets will be positive, which happens to be good, at least on ARM. I don't see if that would negatively impact other targets (because I don't know them well enough). And yes, I was struggling with the name "isImmediateInRangeForLoad". Initially I just had "isImmediateInRange", but then thought it might be too vague and changed it. But it might actually be better describing its usage, because it can be in range of anything, loads/stores etc.

I have been primarily focusing on correctness (obviously) and code size. This patch shows significant code size reductions on our motivating examples. The new regression test const-addr-no-neg-offset.ll is a representative, minimal code example of that; significantly more 16-bit loads/stores will be generated.

Tomorrow I will try to get performance numbers on the table for ARM and X86 and thus see if there is a performance penalty; I don't know the other architectures well enough to make a statement about this.

I am not worried about compile times. In my first straightforward O(N^2) prototype implementation this finishes in virtually no time for lists with hundreds of constants. With thousands it really started to take some time for number crunching. This more efficient implementation had no problems at all (but I don't have hard numbers for the test suite, will also do that tomorrow).

I haven't run actual numbers, but I did diff a few Spec2006 binaries with this patch applied. In general, I see more instructions on AArch64 (a target that doesn't implement the isImmediateInRangeForLoad hook).

For example, here's the diff for gcc:
Opcode static count diff summary:

  -304  movk w #
  -258  sub w w #
   -37  mov w #
   -27  ldr x [x #]
   -27  add w w w
   -18  mov x x
   -16  sub w w # lsl #
    -9  b 
    -6  ldp x x [x #] #
    -6  stp x x [x #]!
    -3  fadd s s s
    -2  sub x x # lsl #
    -2  str s [x #]
    -2  ldr s [x #]
    -1  fsub s s s
    -1  fmov s w
    -1  ldp s s [x #]
    -1  fmul s s s
    -1  stp s s [x #]
    -1  sub x x #
    -1  add x x x
    -1  scvtf s x
     1  orr w w #
     1  cbz/nz w 
     1  ldrb w [x #]
     2  and w w #
     3  orr w w w
     3  cbz/nz x 
     4  str x [x #]
     5  str w [x #]
     5  ldr x [x #] #
     5  str x [x #]!
     6  add x x # lsl #
     7  ldr w [x #]
     9  bl  
    19  ldp x x [x #]
    20  mov w w
    22  stp x x [x #]
    43  add x x #
    54  add w w # lsl #
    57  adrp x  
   783  add w w #
-------------------------
  1050  added (excluding nops)
   725  removed (excluding nops)
   325  net (excluding nops)
  1050  added
   725  removed
   325  net

In short, an additional 325 static instructions are introduced.

Here's the diff for Sphinx:
Opcode static count diff summary:

   -22  mov x x
   -16  adrp x  
    -5  mov w #
    -3  sub x x #
    -2  stp x x [x #]
    -1  ldrb w [x #]!
    -1  bl  
    -1  orr w w #
    -1  strb w [x #]!
    -1  ldp x x [x #]
    -1  ldr w [x #]!
    -1  movk w #
     1  ldrb w [x #]
     1  ldr w [x #]
     3  str x [x #]
     4  b 
     7  add x x # lsl #
    13  add x x #
    31  ldr x [x #]
-------------------------
    60  added (excluding nops)
    55  removed (excluding nops)
     5  net (excluding nops)
    60  added
    55  removed
     5  net

I see many more loads, but I haven't investigated further. Interestingly enough for sjeng I see an opposite trend (more adds and fewer loads).

I'm going to see if I can implement the isImmediateInRangeForLoad for AArch64.

lib/Transforms/Scalar/ConstantHoisting.cpp
288	Rather than wrap all these dbgs() print statements with the DEBUG macro, why not just wrap the call to printOffsetRange() with the DEBUG macro. IMO, this makes it more clear that this is just a debug dump in the context of the caller.
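A sketch of what this comment asks for: the print helper stays free of debug macros and only the call site is guarded. `SKETCH_DEBUG` and `SketchDebugFlag` are stand-ins for LLVM's `DEBUG` macro so that the example compiles on its own, and the signature of `printOffsetRange` is an assumption, since the review only names the function.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>

// Stand-in for LLVM's debug flag and DEBUG(X) macro (assumption, not the
// real implementation).
static bool SketchDebugFlag = false;
#define SKETCH_DEBUG(X)                                                        \
  do {                                                                         \
    if (SketchDebugFlag) {                                                     \
      X;                                                                       \
    }                                                                          \
  } while (false)

// The helper prints unconditionally; it contains no debug macros itself.
void printOffsetRange(std::ostream &OS, long Min, long Max) {
  assert(Min <= Max && "range must not be empty");
  OS << "Offset range: [" << Min << ", " << Max << "]\n";
}

// The caller wraps the single call, which makes it obvious at the call site
// that this is only a debug dump.
void selectBase(std::ostream &OS) {
  SKETCH_DEBUG(printOffsetRange(OS, 0, 40));
}
```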

I have some first performance results for 2 configurations: "cortex-a53, aarch32, -mthumb" and "cortex-a53, aarch64". The results are fairly neutral: some regressions, some improvements; they really cancel each other out. I want to run a few more benchmarks, and also run the test suite on X86.
Next is also to get more numbers on code size.

In D21183#453606, @ributzka wrote:

Constants are not only used by load/store instructions, so using "isImmediateInRangeForLoad" is very misleading. Also it might negatively impact decisions we used to make for other instructions.

Mhh. The fact that AArch64 doesn't even implement the TTI hook and shows different behavior is not great - especially since it seems to have increased code size.

In its current form this is still a very targeted optimization for Thumb1 load/store. Wouldn't an ARM specific MachineFunction pass be better for this?

Yes, I agree that the results are somewhat disappointing; it looks like it is beneficial for Thumb but not for the other targets, at least not in its current form. That suggests it should be an ARM/Thumb-specific optimisation, or the current implementation needs improvement. I will look further into this to see what's best/feasible.

Resigning to get this off my queue. Please add me back if you decide to proceed with this or a similar change.

SjoerdMeijer added a reviewer: mcrosier. Jul 6 2016, 9:27 AM

Sorry for the delay (this work was interrupted by my holiday and some other work).

Compared to the previous patch, these are the changes:

It still affects all targets, but the code triggers only at -Os.
We have traded efficiency for readability/maintainability of the algorithm. It is now less efficient, but a simple check makes the code fall back to the old algorithm when the worklist contains more than 100 items. The algorithm is also slightly different: it sums up the costs of all uses and then subtracts values for offsets that are out of range; before, we were not really tracking the costs of all uses.
For X86 I have checked that the behaviour is the same as the old algorithm for the LNT suite.
I don't see any regressions for aarch32 Thumb mode on LNT, coremark, dhrystone, eembc, but no improvements either. Our motivating example still sees the significant code size reduction though.
I still need an extra callback to get the costs for offsets. The reason is that the existing getIntImmCost functions check which subtarget is being targeted. This doesn't help because we want to know the cost of an immediate in a different ISA (i.e. Thumb1).
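The revised cost model described above (sum the cost of every use, subtract a penalty for each out-of-range offset, fall back for long worklists) can be sketched with plain integers. Everything here is an assumption made for illustration: the `Candidate` struct, the flat `CostPerUse`, the penalty function (modelled on a 0..255 immediate range), and the fallback, which is assumed to pick the most-used constant.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a hoisting candidate.
struct Candidate {
  int64_t Value;
  unsigned NumUses;
};

// Assumed code-size penalty: offsets in [0, 256) are free, others cost 1.
int offsetPenalty(int64_t Offset) {
  return (Offset >= 0 && Offset < 256) ? 0 : 1;
}

// Stand-in for the old heuristic (assumed to pick the most-used constant).
std::size_t pickMostUsed(const std::vector<Candidate> &Cands) {
  std::size_t Best = 0;
  for (std::size_t I = 1; I < Cands.size(); ++I)
    if (Cands[I].NumUses > Cands[Best].NumUses)
      Best = I;
  return Best;
}

// Quadratic selection: for every use of a candidate, add its cost and
// subtract the penalty of every offset that this base would create.
std::size_t pickBaseByCost(const std::vector<Candidate> &Cands, int CostPerUse,
                           std::size_t Limit = 100) {
  assert(!Cands.empty() && "need at least one candidate");
  if (Cands.size() > Limit)
    return pickMostUsed(Cands); // too many items: use the cheap heuristic

  std::size_t Best = 0;
  long MaxCost = -1;
  for (std::size_t I = 0; I < Cands.size(); ++I) {
    long Cost = 0;
    for (unsigned U = 0; U < Cands[I].NumUses; ++U) {
      Cost += CostPerUse;
      for (const Candidate &Other : Cands)
        Cost -= offsetPenalty(Other.Value - Cands[I].Value);
    }
    if (Cost > MaxCost) { // strict comparison: the first candidate wins ties
      MaxCost = Cost;
      Best = I;
    }
  }
  return Best;
}
```

For the made-up candidates 100, 104, 120 and 400 (used 3, 2, 8 and 7 times) and a cost of 2 per use, the adjusted costs are 3, 0, -8 and -7, so the first candidate is chosen. Lowering `Limit` below the list size switches to the fallback, which picks 120.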

jmolloy added inline comments. Jul 8 2016, 5:45 AM

include/llvm/Analysis/TargetTransformInfo.h
418	Having this hook be context-free will hinder targets implementing it usefully. It should take the same arguments as getIntImmCost(), so that targets can give a contextual answer if they wish.
lib/Transforms/Scalar/ConstantHoisting.cpp
288	"different"
341	You can use a range-for loop here.
342	This should be hoisted out of the inner loop as it is invariant.
347	The formatting looks off here - have you used clang-format?

Addressed James' comments.

This is looking fine to me now, but please wait for Mehdi or someone else to also look it over.

include/llvm/Analysis/TargetTransformInfoImpl.h
261	The formatting here isn't LLVM style. The return should be on a new line, then the } on the next line.

A few very superficial comments.

lib/Transforms/Scalar/ConstantHoisting.cpp
328	Thanks very much for well documenting this!
390	Any reason to add braces here?
392	All this code could be sunk in `maximizeConstantsInRange()`. It would also make the comment that described `maximizeConstantsInRange` more correct (it mentions the 100 limits and falling back to the old algorithm).

Thanks Mehdi and James for reviewing! I have addressed your comments.

LGTM

This revision is now accepted and ready to land. Jul 13 2016, 9:37 AM

Closed by commit rL275382: This implements a more optimal algorithm for selecting a base constant in (authored by SjoerdMeijer). Jul 14 2016, 12:51 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                                   Size
include/llvm/Analysis/TargetTransformInfo.h                            16 lines
include/llvm/Analysis/TargetTransformInfoImpl.h                        3 lines
include/llvm/Transforms/Scalar/ConstantHoisting.h                      3 lines
lib/Analysis/TargetTransformInfo.cpp                                   8 lines
lib/Target/ARM/ARMTargetTransformInfo.h                                3 lines
lib/Target/ARM/ARMTargetTransformInfo.cpp                              11 lines
lib/Transforms/Scalar/ConstantHoisting.cpp                             101 lines
test/Transforms/ConstantHoisting/ARM/const-addr-no-neg-offset.ll       42 lines

Diff 63660

include/llvm/Analysis/TargetTransformInfo.h

[401 lines not shown; enclosing context: public:]
    /// \brief Return the expected cost of supporting the floating point operation
    /// of the specified type.
    int getFPOpCost(Type *Ty) const;

    /// \brief Return the expected cost of materializing for the given integer
    /// immediate of the specified type.
    int getIntImmCost(const APInt &Imm, Type *Ty) const;

    /// \brief Return the expected cost of materialization for the given integer
    /// immediate of the specified type for a given instruction. The cost can be
    /// zero if the immediate can be folded into the specified instruction.
    int getIntImmCost(unsigned Opc, unsigned Idx, const APInt &Imm,
                      Type *Ty) const;
    int getIntImmCost(Intrinsic::ID IID, unsigned Idx, const APInt &Imm,
                      Type *Ty) const;

+   /// \brief Return the expected cost for the given integer when optimising
+   /// for size. This is different than the other integer immediate cost
+   /// functions in that it is subtarget agnostic. This is useful when you e.g.
+   /// target one ISA such as Aarch32 but smaller encodings could be possible
+   /// with another such as Thumb. This return value is used as a penalty when
+   /// the total costs for a constant is calculated (the bigger the cost, the
+   /// more beneficial constant hoisting is).
+   int getIntImmCodeSizeCost(unsigned Opc, unsigned Idx, const APInt &Imm,
+                             Type *Ty) const;
    /// @}

    /// \name Vector Target Information
    /// @{

    /// \brief The various kinds of shuffle patterns for vector queries.
    enum ShuffleKind {
      SK_Broadcast, ///< Broadcast element 0 to all other elements.
[235 lines not shown; enclosing context: public:]
    virtual bool isFPVectorizationPotentiallyUnsafe() = 0;
    virtual bool allowsMisalignedMemoryAccesses(unsigned BitWidth,
                                                unsigned AddressSpace,
                                                unsigned Alignment,
                                                bool *Fast) = 0;
    virtual PopcntSupportKind getPopcntSupport(unsigned IntTyWidthInBit) = 0;
    virtual bool haveFastSqrt(Type *Ty) = 0;
    virtual int getFPOpCost(Type *Ty) = 0;
+   virtual int getIntImmCodeSizeCost(unsigned Opc, unsigned Idx, const APInt &Imm,
+                                     Type *Ty) = 0;
    virtual int getIntImmCost(const APInt &Imm, Type *Ty) = 0;
    virtual int getIntImmCost(unsigned Opc, unsigned Idx, const APInt &Imm,
                              Type *Ty) = 0;
    virtual int getIntImmCost(Intrinsic::ID IID, unsigned Idx, const APInt &Imm,
                              Type *Ty) = 0;
    virtual unsigned getNumberOfRegisters(bool Vector) = 0;
    virtual unsigned getRegisterBitWidth(bool Vector) = 0;
    virtual unsigned getLoadStoreVecRegBitWidth(unsigned AddrSpace) = 0;
[160 lines not shown; enclosing context: public:]
    }
    PopcntSupportKind getPopcntSupport(unsigned IntTyWidthInBit) override {
      return Impl.getPopcntSupport(IntTyWidthInBit);
    }
    bool haveFastSqrt(Type *Ty) override { return Impl.haveFastSqrt(Ty); }

    int getFPOpCost(Type *Ty) override { return Impl.getFPOpCost(Ty); }

+   int getIntImmCodeSizeCost(unsigned Opc, unsigned Idx, const APInt &Imm,
+                             Type *Ty) override {
+     return Impl.getIntImmCodeSizeCost(Opc, Idx, Imm, Ty);
+   }
    int getIntImmCost(const APInt &Imm, Type *Ty) override {
      return Impl.getIntImmCost(Imm, Ty);
    }
    int getIntImmCost(unsigned Opc, unsigned Idx, const APInt &Imm,
                      Type *Ty) override {
      return Impl.getIntImmCost(Opc, Idx, Imm, Ty);
    }
    int getIntImmCost(Intrinsic::ID IID, unsigned Idx, const APInt &Imm,
[215 lines not shown]

include/llvm/Analysis/TargetTransformInfoImpl.h

[251 lines not shown; enclosing context: public:]
    TTI::PopcntSupportKind getPopcntSupport(unsigned IntTyWidthInBit) {
      return TTI::PSK_Software;
    }

    bool haveFastSqrt(Type *Ty) { return false; }

    unsigned getFPOpCost(Type *Ty) { return TargetTransformInfo::TCC_Basic; }

+   int getIntImmCodeSizeCost(unsigned Opcode, unsigned Idx, const APInt &Imm,
+                             Type *Ty) { return 0; }
+
    unsigned getIntImmCost(const APInt &Imm, Type *Ty) { return TTI::TCC_Basic; }

    unsigned getIntImmCost(unsigned Opcode, unsigned Idx, const APInt &Imm,
                           Type *Ty) {
      return TTI::TCC_Free;
    }

    unsigned getIntImmCost(Intrinsic::ID IID, unsigned Idx, const APInt &Imm,
[269 lines not shown]

include/llvm/Transforms/Scalar/ConstantHoisting.h

[128 lines not shown; enclosing context: private:]
    void collectConstantCandidates(ConstCandMapType &ConstCandMap,
                                   Instruction *Inst, unsigned Idx,
                                   ConstantInt *ConstInt);
    void collectConstantCandidates(ConstCandMapType &ConstCandMap,
                                   Instruction *Inst);
    void collectConstantCandidates(Function &Fn);
    void findAndMakeBaseConstant(ConstCandVecType::iterator S,
                                 ConstCandVecType::iterator E);
+   unsigned maximizeConstantsInRange(ConstCandVecType::iterator S,
+                                     ConstCandVecType::iterator E,
+                                     ConstCandVecType::iterator &MaxCostItr);
    void findBaseConstants();
    void emitBaseConstants(Instruction *Base, Constant *Offset,
                           const consthoist::ConstantUser &ConstUser);
    bool emitBaseConstants();
    void deleteDeadCastInst() const;
    bool optimizeConstants(Function &Fn);
  };
  }

  #endif // LLVM_TRANSFORMS_SCALAR_CONSTANTHOISTING_H

lib/Analysis/TargetTransformInfo.cpp

[203 lines not shown]
  }

  int TargetTransformInfo::getFPOpCost(Type *Ty) const {
    int Cost = TTIImpl->getFPOpCost(Ty);
    assert(Cost >= 0 && "TTI should not produce negative costs!");
    return Cost;
  }

+ int TargetTransformInfo::getIntImmCodeSizeCost(unsigned Opcode, unsigned Idx,
+                                                const APInt &Imm,
+                                                Type *Ty) const {
+   int Cost = TTIImpl->getIntImmCodeSizeCost(Opcode, Idx, Imm, Ty);
+   assert(Cost >= 0 && "TTI should not produce negative costs!");
+   return Cost;
+ }
+
  int TargetTransformInfo::getIntImmCost(const APInt &Imm, Type *Ty) const {
    int Cost = TTIImpl->getIntImmCost(Imm, Ty);
    assert(Cost >= 0 && "TTI should not produce negative costs!");
    return Cost;
  }

  int TargetTransformInfo::getIntImmCost(unsigned Opcode, unsigned Idx,
                                         const APInt &Imm, Type *Ty) const {
[241 lines not shown]

lib/Target/ARM/ARMTargetTransformInfo.h

[58 lines not shown; enclosing context: public:]
    /// is IEEE-754 compliant, but it's not covered in this target.
    bool isFPVectorizationPotentiallyUnsafe() {
      return !ST->isTargetDarwin();
    }

    /// \name Scalar TTI Implementations
    /// @{

+   int getIntImmCodeSizeCost(unsigned Opcode, unsigned Idx, const APInt &Imm,
+                             Type *Ty);
+
    using BaseT::getIntImmCost;
    int getIntImmCost(const APInt &Imm, Type *Ty);

    int getIntImmCost(unsigned Opcode, unsigned Idx, const APInt &Imm, Type *Ty);

    /// @}

    /// \name Vector TTI Implementations
[59 lines not shown]

lib/Target/ARM/ARMTargetTransformInfo.cpp

[41 lines not shown; enclosing context: if (Bits == 0 || Imm.getActiveBits() >= 64)]
    if (SImmVal >= 0 && SImmVal < 256)
      return 1;
    if ((~ZImmVal < 256) || ARM_AM::isThumbImmShiftedVal(ZImmVal))
      return 2;
    // Load from constantpool.
    return 3;
  }

+
+ // Constants smaller than 256 fit in the immediate field of
+ // Thumb1 instructions so we return a zero cost and 1 otherwise.
+ int ARMTTIImpl::getIntImmCodeSizeCost(unsigned Opcode, unsigned Idx,
+                                       const APInt &Imm, Type *Ty) {
+   if (Imm.isNonNegative() && Imm.getLimitedValue() < 256)
+     return 0;
+
+   return 1;
+ }
+
  int ARMTTIImpl::getIntImmCost(unsigned Opcode, unsigned Idx, const APInt &Imm,
                                Type *Ty) {
    // Division by a constant can be turned into multiplication, but only if we
    // know it's constant. So it's not so much that the immediate is cheap (it's
    // not), but that the alternative is worse.
    // FIXME: this is probably unneeded with GlobalISel.
    if ((Opcode == Instruction::SDiv || Opcode == Instruction::UDiv ||
         Opcode == Instruction::SRem || Opcode == Instruction::URem) &&
[451 lines not shown]

lib/Transforms/Scalar/ConstantHoisting.cpp

Show First 20 Lines • Show All 70 Lines • ▼ Show 20 Lines	void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.addRequired<DominatorTreeWrapperPass>();		AU.addRequired<DominatorTreeWrapperPass>();
AU.addRequired<TargetTransformInfoWrapperPass>();		AU.addRequired<TargetTransformInfoWrapperPass>();
}		}

void releaseMemory() override { Impl.releaseMemory(); }		void releaseMemory() override { Impl.releaseMemory(); }

private:		private:
ConstantHoistingPass Impl;		ConstantHoistingPass Impl;
};		};
		ributzkaUnsubmitted Not Done Reply Inline Actions CandidatesHaveUses -> candidatesHaveUses ributzka: CandidatesHaveUses -> candidatesHaveUses
}		}
		ributzkaUnsubmitted Not Done Reply Inline Actions calcOffsetDiff -> calculateOffsetDiff ributzka: calcOffsetDiff -> calculateOffsetDiff

char ConstantHoistingLegacyPass::ID = 0;		char ConstantHoistingLegacyPass::ID = 0;
INITIALIZE_PASS_BEGIN(ConstantHoistingLegacyPass, "consthoist",		INITIALIZE_PASS_BEGIN(ConstantHoistingLegacyPass, "consthoist",
"Constant Hoisting", false, false)		"Constant Hoisting", false, false)
INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)		INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
INITIALIZE_PASS_END(ConstantHoistingLegacyPass, "consthoist",		INITIALIZE_PASS_END(ConstantHoistingLegacyPass, "consthoist",
"Constant Hoisting", false, false)		"Constant Hoisting", false, false)
▲ Show 20 Lines • Show All 191 Lines • ▼ Show 20 Lines
/// into an instruction itself.		/// into an instruction itself.
void ConstantHoistingPass::collectConstantCandidates(Function &Fn) {		void ConstantHoistingPass::collectConstantCandidates(Function &Fn) {
ConstCandMapType ConstCandMap;		ConstCandMapType ConstCandMap;
for (BasicBlock &BB : Fn)		for (BasicBlock &BB : Fn)
for (Instruction &Inst : BB)		for (Instruction &Inst : BB)
collectConstantCandidates(ConstCandMap, &Inst);		collectConstantCandidates(ConstCandMap, &Inst);
}		}

		// This helper function is necessary to deal with values that have different
		mcrosierUnsubmitted Not Done Reply Inline Actions Rather than wrap all these dbgs() print statements with the DEBUG macro, why not just wrap the call to printOffsetRange() with the DEBUG macro. IMO, this makes it more clear that this is just a debug dump in the context of the caller. mcrosier: Rather than wrap all these dbgs() print statements with the DEBUG macro, why not just wrap the…
		jmolloyUnsubmitted Not Done Reply Inline Actions "different" jmolloy: "different"
		// bit widths (APInt Operator- does not like that). If the value cannot be
		// represented in uint64 we return an "empty" APInt. This is then interpreted
		// as the value is not in range.
		static llvm::Optional<APInt> calculateOffsetDiff(APInt V1, APInt V2)
		{
		llvm::Optional<APInt> Res = None;
		unsigned BW = V1.getBitWidth() > V2.getBitWidth() ?
		V1.getBitWidth() : V2.getBitWidth();
		mcrosierUnsubmitted Not Done Reply Inline Actions differend -> different mcrosier: differend -> different
		uint64_t LimVal1 = V1.getLimitedValue();
		uint64_t LimVal2 = V2.getLimitedValue();

		if (LimVal1 == ~0ULL \|\| LimVal2 == ~0ULL)
		mcrosierUnsubmitted Not Done Reply Inline Actions Can this just be a static helper function? mcrosier: Can this just be a static helper function?
		return Res;

		mcrosierUnsubmitted Not Done Reply Inline Actions Does this need to be initialized to 'None'? mcrosier: Does this need to be initialized to 'None'?
		uint64_t Diff = LimVal1 - LimVal2;
		return APInt(BW, Diff, true);
		}

		// From a list of constants, one needs to picked as the base and the other
		// constants will be transformed into an offset from that base constant. The
		mcrosierUnsubmitted Not Done Reply Inline Actions No need for the extra brackets. mcrosier: No need for the extra brackets.
		// question is which we can pick best? For example, consider these constants
		// and their number of uses:
		//
		// Constants\| 2 \| 4 \| 12 \| 42 \|
		// NumUses \| 3 \| 2 \| 8 \| 7 \|
		//
		// Selecting constant 12 because it has the most uses will generate negative
		// offsets for constants 2 and 4 (i.e. -10 and -8 respectively). If negative
		// offsets lead to less optimal code generation, then there might be better
		// solutions. Suppose immediates in the range of 0..35 are most optimally
		// supported by the architecture, then selecting constant 2 is most optimal
		// because this will generate offsets: 0, 2, 10, 40. Offsets 0, 2 and 10 are in
		// range 0..35, and thus 3 + 2 + 8 = 13 uses are in range. Selecting 12 would
		// have only 8 uses in range, so choosing 2 as a base is more optimal. Thus, in
		// selecting the base constant the range of the offsets is a very important
		// factor too that we take into account here. This algorithm calculates a total
		// costs for selecting a constant as the base and substract the costs if
		// immediates are out of range. It has quadratic complexity, so we call this
		// function only when we're optimising for size and there are less than 100
		// constants, we fall back to the old algorithm otherwise.
		mehdi_aminiUnsubmitted Not Done Reply Inline Actions Thanks very much for well documenting this! mehdi_amini: Thanks very much for well documenting this!
		unsigned
		ConstantHoistingPass::maximizeConstantsInRange(ConstCandVecType::iterator S,
		ConstCandVecType::iterator E,
		ConstCandVecType::iterator &MaxCostItr) {
		unsigned NumUses = 0;

		DEBUG(dbgs() << "== Maximize constants in range ==\n");
		int MaxCost = -1;
		for (auto ConstCand = S; ConstCand != E; ++ConstCand) {
		auto Value = ConstCand->ConstInt->getValue();
		Type *Ty = ConstCand->ConstInt->getType();
		int Cost = 0;
		NumUses += ConstCand->Uses.size();
		jmolloyUnsubmitted Not Done Reply Inline Actions You can use a range-for loop here. jmolloy: You can use a range-for loop here.
		DEBUG(dbgs() << "= Constant: " << ConstCand->ConstInt->getValue() << "\n");
		jmolloyUnsubmitted Not Done Reply Inline Actions This should be hoisted out of the inner loop as it is it invariant. jmolloy: This should be hoisted out of the inner loop as it is it invariant.

		for (auto User : ConstCand->Uses) {
		unsigned Opcode = User.Inst->getOpcode();
		unsigned OpndIdx = User.OpndIdx;
		Cost += TTI->getIntImmCost(Opcode, OpndIdx, Value, Ty);
		jmolloy: The formatting looks off here - have you used clang-format?
		DEBUG(dbgs() << "Cost: " << Cost << "\n");

		for (auto C2 = S; C2 != E; ++C2) {
		llvm::Optional<APInt> Diff = calculateOffsetDiff(
		C2->ConstInt->getValue(),
		ConstCand->ConstInt->getValue());
		if (Diff) {
		const int ImmCosts =
		TTI->getIntImmCodeSizeCost(Opcode, OpndIdx, Diff.getValue(), Ty);
		Cost -= ImmCosts;
		ributzka: What about the case when the constant is not used by a load?
		DEBUG(dbgs() << "Offset " << Diff.getValue() << " "
		<< "has penalty: " << ImmCosts << "\n"
		mcrosier: No need for the extra brackets.
		<< "Adjusted cost: " << Cost << "\n");
		}
		}
		}
		mcrosier: No need for the extra brackets.
		DEBUG(dbgs() << "Cumulative cost: " << Cost << "\n");
		if (Cost > MaxCost) {
		MaxCost = Cost;
		MaxCostItr = ConstCand;
		DEBUG(dbgs() << "New candidate: " << MaxCostItr->ConstInt->getValue()
		<< "\n");
		}
		}
		return NumUses;
		}

/// \brief Find the base constant within the given range and rebase all other		/// \brief Find the base constant within the given range and rebase all other
/// constants with respect to the base constant.		/// constants with respect to the base constant.
void ConstantHoistingPass::findAndMakeBaseConstant(		void ConstantHoistingPass::findAndMakeBaseConstant(
ConstCandVecType::iterator S, ConstCandVecType::iterator E) {		ConstCandVecType::iterator S, ConstCandVecType::iterator E) {
auto MaxCostItr = S;		auto MaxCostItr = S;
unsigned NumUses = 0;		unsigned NumUses = 0;

// Use the constant that has the maximum cost as base constant.		// Use the constant that has the maximum cost as base constant.
		if(Entry->getParent()->optForSize() && std::distance(S,E) < 100) {
		NumUses = maximizeConstantsInRange(S, E, MaxCostItr);
		} else {
for (auto ConstCand = S; ConstCand != E; ++ConstCand) {		for (auto ConstCand = S; ConstCand != E; ++ConstCand) {
NumUses += ConstCand->Uses.size();		NumUses += ConstCand->Uses.size();
if (ConstCand->CumulativeCost > MaxCostItr->CumulativeCost)		if (ConstCand->CumulativeCost > MaxCostItr->CumulativeCost) {
MaxCostItr = ConstCand;		MaxCostItr = ConstCand;
}		}
		mehdi_amini: Any reason to add braces here?
		}
		}
		mehdi_amini: All this code could be sunk in `maximizeConstantsInRange()`. It would also make the comment that describes `maximizeConstantsInRange` more correct (it mentions the 100 limit and falling back to the old algorithm).

// Don't hoist constants that have only one use.		// Don't hoist constants that have only one use.
if (NumUses <= 1)		if (NumUses <= 1)
return;		return;

ConstantInfo ConstInfo;		ConstantInfo ConstInfo;
ConstInfo.BaseConstant = MaxCostItr->ConstInt;		ConstInfo.BaseConstant = MaxCostItr->ConstInt;
Type *Ty = ConstInfo.BaseConstant->getType();		Type *Ty = ConstInfo.BaseConstant->getType();

// Rebase the constants with respect to the base constant.		// Rebase the constants with respect to the base constant.
for (auto ConstCand = S; ConstCand != E; ++ConstCand) {		for (auto ConstCand = S; ConstCand != E; ++ConstCand) {
APInt Diff = ConstCand->ConstInt->getValue() -		APInt Diff = ConstCand->ConstInt->getValue() -
ConstInfo.BaseConstant->getValue();		ConstInfo.BaseConstant->getValue();
Constant *Offset = Diff == 0 ? nullptr : ConstantInt::get(Ty, Diff);		Constant *Offset = Diff == 0 ? nullptr : ConstantInt::get(Ty, Diff);
ConstInfo.RebasedConstants.push_back(		ConstInfo.RebasedConstants.push_back(
RebasedConstantInfo(std::move(ConstCand->Uses), Offset));		RebasedConstantInfo(std::move(ConstCand->Uses), Offset));
}		}
ConstantVec.push_back(std::move(ConstInfo));		ConstantVec.push_back(std::move(ConstInfo));
}		}
		mcrosier: I'd prefer the original 'ConstCand' over just 'i'.

/// \brief Finds and combines constant candidates that can be easily		/// \brief Finds and combines constant candidates that can be easily
/// rematerialized with an add from a common base constant.		/// rematerialized with an add from a common base constant.
void ConstantHoistingPass::findBaseConstants() {		void ConstantHoistingPass::findBaseConstants() {
// Sort the constants by value and type. This invalidates the mapping!		// Sort the constants by value and type. This invalidates the mapping!
std::sort(ConstCandVec.begin(), ConstCandVec.end(),		std::sort(ConstCandVec.begin(), ConstCandVec.end(),
[](const ConstantCandidate &LHS, const ConstantCandidate &RHS) {		[](const ConstantCandidate &LHS, const ConstantCandidate &RHS) {
if (LHS.ConstInt->getType() != RHS.ConstInt->getType())		if (LHS.ConstInt->getType() != RHS.ConstInt->getType())
[137 lines not shown]
		Instruction *Base =
new BitCastInst(ConstInfo.BaseConstant, Ty, "const", IP);		new BitCastInst(ConstInfo.BaseConstant, Ty, "const", IP);
DEBUG(dbgs() << "Hoist constant (" << *ConstInfo.BaseConstant << ") to BB "		DEBUG(dbgs() << "Hoist constant (" << *ConstInfo.BaseConstant << ") to BB "
<< IP->getParent()->getName() << '\n' << *Base << '\n');		<< IP->getParent()->getName() << '\n' << *Base << '\n');
NumConstantsHoisted++;		NumConstantsHoisted++;

// Emit materialization code for all rebased constants.		// Emit materialization code for all rebased constants.
for (auto const &RCI : ConstInfo.RebasedConstants) {		for (auto const &RCI : ConstInfo.RebasedConstants) {
NumConstantsRebased++;		NumConstantsRebased++;
for (auto const &U : RCI.Uses)		for (auto const &U : RCI.Uses)
		mcrosier: No need for the extra brackets.
emitBaseConstants(Base, RCI.Offset, U);		emitBaseConstants(Base, RCI.Offset, U);
}		}

// Use the same debug location as the last user of the constant.		// Use the same debug location as the last user of the constant.
assert(!Base->use_empty() && "The use list is empty!?");		assert(!Base->use_empty() && "The use list is empty!?");
assert(isa<Instruction>(Base->user_back()) &&		assert(isa<Instruction>(Base->user_back()) &&
"All uses should be instructions.");		"All uses should be instructions.");
Base->setDebugLoc(cast<Instruction>(Base->user_back())->getDebugLoc());		Base->setDebugLoc(cast<Instruction>(Base->user_back())->getDebugLoc());
[57 lines not shown]
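
The base-selection idea described in the comment on `maximizeConstantsInRange` can be sketched in isolation. This is a simplified model, not the code under review: it counts uses whose offset falls in a fixed range instead of querying `TTI` for costs. The `Candidate` struct, `isOffsetInRange`, `usesInRange` and `pickBase` are hypothetical names, and the fourth constant (60, with one use) is an assumption chosen only to make the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a constant candidate: its value and its use count.
struct Candidate {
  int64_t Value;
  unsigned NumUses;
};

// Assumed target property: offsets in 0..35 are the cheap ones to encode.
inline bool isOffsetInRange(int64_t Offset) {
  return Offset >= 0 && Offset <= 35;
}

// Count the uses whose offset from Base would land in the cheap range.
inline unsigned usesInRange(const std::vector<Candidate> &Cands, int64_t Base) {
  unsigned InRange = 0;
  for (const Candidate &C : Cands)
    if (isOffsetInRange(C.Value - Base))
      InRange += C.NumUses;
  return InRange;
}

// Quadratic search: try every candidate as the base and return the index of
// the first one that maximizes the number of in-range uses.
inline std::size_t pickBase(const std::vector<Candidate> &Cands) {
  assert(!Cands.empty() && "need at least one candidate");
  std::size_t Best = 0;
  unsigned BestUses = usesInRange(Cands, Cands[0].Value);
  for (std::size_t I = 1; I < Cands.size(); ++I) {
    unsigned Uses = usesInRange(Cands, Cands[I].Value);
    if (Uses > BestUses) {
      BestUses = Uses;
      Best = I;
    }
  }
  return Best;
}
```

With candidates `{2, 3}, {4, 2}, {12, 8}, {60, 1}`, base 2 puts 3 + 2 + 8 = 13 uses in range while base 12 puts only 8 in range, so `pickBase` returns index 0. The nested loop over all candidates for every candidate is what makes the approach quadratic, which is why the patch guards it with a size check.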

test/Transforms/ConstantHoisting/ARM/const-addr-no-neg-offset.ll

This file was added.

				; RUN: opt -mtriple=arm-arm-none-eabi -consthoist -S < %s | FileCheck %s

				; There are different candidates here for the base constant: 1073876992 and
				; 1073876996. But we don't want to see the latter because it results in
				; negative offsets.

				define void @foo() #0 {
				entry:
				; CHECK-LABEL: @foo
				; CHECK-NOT: [[CONST1:%const_mat[0-9]*]] = add i32 %const, -4
				%0 = load volatile i32, i32* inttoptr (i32 1073876992 to i32*), align 4096
				%or = or i32 %0, 1
				store volatile i32 %or, i32* inttoptr (i32 1073876992 to i32*), align 4096
				%1 = load volatile i32, i32* inttoptr (i32 1073876996 to i32*), align 4
				%and = and i32 %1, -117506048
				store volatile i32 %and, i32* inttoptr (i32 1073876996 to i32*), align 4
				%2 = load volatile i32, i32* inttoptr (i32 1073876992 to i32*), align 4096
				%and1 = and i32 %2, -17367041
				store volatile i32 %and1, i32* inttoptr (i32 1073876996 to i32*), align 4096
				%3 = load volatile i32, i32* inttoptr (i32 1073876992 to i32*), align 4096
				%and2 = and i32 %3, -262145
				store volatile i32 %and2, i32* inttoptr (i32 1073876992 to i32*), align 4096
				%4 = load volatile i32, i32* inttoptr (i32 1073876996 to i32*), align 4
				%and3 = and i32 %4, -8323073
				store volatile i32 %and3, i32* inttoptr (i32 1073876996 to i32*), align 4
				store volatile i32 10420224, i32* inttoptr (i32 1073877000 to i32*), align 8
				%5 = load volatile i32, i32* inttoptr (i32 1073876996 to i32*), align 4096
				%or4 = or i32 %5, 65536
				store volatile i32 %or4, i32* inttoptr (i32 1073876996 to i32*), align 4096
				%6 = load volatile i32, i32* inttoptr (i32 1073881088 to i32*), align 8192
				%or6.i.i = or i32 %6, 16
				store volatile i32 %or6.i.i, i32* inttoptr (i32 1073881088 to i32*), align 8192
				%7 = load volatile i32, i32* inttoptr (i32 1073881088 to i32*), align 8192
				%and7.i.i = and i32 %7, -4
				store volatile i32 %and7.i.i, i32* inttoptr (i32 1073881088 to i32*), align 8192
				%8 = load volatile i32, i32* inttoptr (i32 1073881088 to i32*), align 8192
				%or8.i.i = or i32 %8, 2
				store volatile i32 %or8.i.i, i32* inttoptr (i32 1073881088 to i32*), align 8192
				ret void
				}

				attributes #0 = { minsize norecurse nounwind optsize readnone uwtable }
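
To see why the test rejects an `add i32 %const, -4`, it may help to compute the offsets each candidate base would produce for the four addresses used in `@foo`. The helper names `offsetsFrom` and `hasNegativeOffset` are hypothetical and exist only for this illustration; the addresses are copied from the IR above.

```cpp
#include <cstdint>
#include <vector>

// Offsets of every address relative to a chosen base (hypothetical helper).
inline std::vector<int64_t> offsetsFrom(int64_t Base,
                                        const std::vector<int64_t> &Addrs) {
  std::vector<int64_t> Offsets;
  Offsets.reserve(Addrs.size());
  for (int64_t A : Addrs)
    Offsets.push_back(A - Base);
  return Offsets;
}

// True if any offset is negative, i.e. the base is not the smallest address.
inline bool hasNegativeOffset(const std::vector<int64_t> &Offsets) {
  for (int64_t O : Offsets)
    if (O < 0)
      return true;
  return false;
}
```

For the addresses 1073876992, 1073876996, 1073877000 and 1073881088, a base of 1073876992 yields offsets 0, 4, 8 and 4096, whereas a base of 1073876996 yields -4, 0, 4 and 4092. The `CHECK-NOT` line guards against the second choice.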

Revision Contents

Diff 63660
