This is an archive of the discontinued LLVM Phabricator instance.

Differential D21183

Better selection of common base address in constant hoisting
ClosedPublic

Authored by SjoerdMeijer on Jun 9 2016, 7:36 AM.

Details

Reviewers

chandlerc
jmolloy
mehdi_amini
ributzka
mcrosier

Commits

rG38c2cd0c1499: This implements a more optimal algorithm for selecting a base constant in…
rL275382: This implements a more optimal algorithm for selecting a base constant in

Summary

This implements a better algorithm for selecting a base constant in constant hoisting. It takes into account not only the number of uses of each constant, but also the integer range of the resulting offsets. The algorithm thus maximizes the number of uses whose offsets fall within a range that enables more efficient code generation. On ARM, for example, this enables code size optimisations because fewer negative offsets are created. Thumb1 does not support negative offsets/immediates, so negative offsets prevent the more compact instruction encodings.
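A minimal sketch of the selection idea in the summary, assuming a hypothetical `Candidate` record and a caller-supplied offset range. None of these names come from the patch, which works on APInt values and TTI cost hooks rather than plain integers.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical candidate record: a constant and how often it is used.
struct Candidate {
  int64_t Value;
  unsigned NumUses;
};

// Counts the uses whose offset from Base falls inside [Lo, Hi].
unsigned usesInRange(const std::vector<Candidate> &Cands, int64_t Base,
                     int64_t Lo, int64_t Hi) {
  unsigned Uses = 0;
  for (const Candidate &C : Cands) {
    int64_t Offset = C.Value - Base;
    if (Offset >= Lo && Offset <= Hi)
      Uses += C.NumUses;
  }
  return Uses;
}

// Returns the index of the base that keeps the most uses in range.
// Ties go to the earliest candidate; an empty list yields index 0.
std::size_t pickBase(const std::vector<Candidate> &Cands, int64_t Lo,
                     int64_t Hi) {
  std::size_t Best = 0;
  unsigned BestUses = 0;
  for (std::size_t I = 0; I < Cands.size(); ++I) {
    unsigned Uses = usesInRange(Cands, Cands[I].Value, Lo, Hi);
    if (Uses > BestUses) {
      BestUses = Uses;
      Best = I;
    }
  }
  return Best;
}
```

With constants 2, 4, 12 and 42 used 3, 2, 8 and 7 times and an assumed range of 0..15, base 2 keeps 13 uses in range, whereas base 12, the constant with the most uses, keeps only 8.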

Diff Detail

Repository: rL LLVM

Event Timeline

SjoerdMeijer updated this revision to Diff 60170. · Jun 9 2016, 7:36 AM

SjoerdMeijer retitled this revision from to Better selection of common base address in constant hoisting.

SjoerdMeijer updated this object.

SjoerdMeijer added reviewers: jmolloy, mcrosier, ributzka.

SjoerdMeijer added a subscriber: llvm-commits.

Herald added a subscriber: aemerson. · Jun 9 2016, 7:36 AM

Currently, this change impacts all targets. For those targets that don't implement the isImmediateInRangeForLoad API, do we expect this change to be an improvement in general? If not, we might consider having isImmediateInRangeForLoad return an optional bool, so only those targets that implement isImmediateInRangeForLoad are impacted.
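The "optional bool" suggestion could look roughly like the sketch below. It uses `std::optional` (C++17) as a stand-in for `llvm::Optional`, and `TargetHooks`, `SmallImmTarget` and `useRangeAwareSelection` are hypothetical names for illustration, not the real TTI interface.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a target hook interface.
struct TargetHooks {
  virtual ~TargetHooks() = default;
  // An empty result means "this target does not implement the hook".
  virtual std::optional<bool> isImmediateInRange(int64_t Imm) const {
    (void)Imm;
    return std::nullopt;
  }
};

// Hypothetical target that only encodes small non-negative immediates.
struct SmallImmTarget : TargetHooks {
  std::optional<bool> isImmediateInRange(int64_t Imm) const override {
    return Imm >= 0 && Imm < 256;
  }
};

// The pass would switch to range-aware selection only when the target
// actually answers the query; other targets keep the old behaviour.
bool useRangeAwareSelection(const TargetHooks &T, int64_t ProbeImm) {
  return T.isImmediateInRange(ProbeImm).has_value();
}
```

Under this design, targets that do not override the hook are not affected by the new selection logic at all.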

lib/Transforms/Scalar/ConstantHoisting.cpp
399 ↗	(On Diff #60170)	differend -> different
403 ↗	(On Diff #60170)	Can this just be a static helper function?
405 ↗	(On Diff #60170)	Does this need to be initialized to 'None'?
411 ↗	(On Diff #60170)	No need for the extra brackets.
462 ↗	(On Diff #60170)	No need for the extra brackets.
466 ↗	(On Diff #60170)	No need for the extra brackets.
514 ↗	(On Diff #60170)	I'd prefer the original 'ConstCand' over just 'i'.
704 ↗	(On Diff #60170)	No need for the extra brackets.

Looks like a nice improvement to ConstantHoisting, but I am a little worried about the limited scope and implementation for load optimization only.

Constants are not only used by load/store instructions, so using "isImmediateInRangeForLoad" is very misleading. Also it might negatively impact decisions we used to make for other instructions.

Did you run the test suite to measure the performance and compile time impact for X86 and ARM/AArch64?

include/llvm/Analysis/TargetTransformInfo.h
405 ↗	(On Diff #60170)	Please add a comment describing the new TTI method.
lib/Transforms/Scalar/ConstantHoisting.cpp
168 ↗	(On Diff #60170)	CandidatesHaveUses -> candidatesHaveUses
169 ↗	(On Diff #60170)	calcOffsetDiff -> calculateOffsetDiff
460 ↗	(On Diff #60170)	What about the case when the constant is not used by a load?
test/Transforms/ConstantHoisting/X86/phi.ll
23 ↗	(On Diff #60170)	OLDHECK???

Thanks for reviewing!
Yes, this impacts all targets. For targets that don't implement the isImmediateInRangeForLoad API, the default value "true" is returned. This makes sure that we don't exclude any constants that were considered before this change. But yes, this might cause codegen differences; examples are the 2 regression tests that I had to change (masks.ll and phi.ll). There can be multiple solutions with the same "gain", and the first solution is simply picked. This has the side effect that offsets will be positive, which happens to be good, at least on ARM. I can't tell whether that would negatively impact other targets (because I don't know them well enough). And yes, I was struggling with the name "isImmediateInRangeForLoad". Initially I just had "isImmediateInRange", but then thought it might be too vague and changed it. The original name might actually describe its usage better, because an immediate can be in range for anything: loads, stores, etc.
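The "first solution is picked" behaviour follows from a strict comparison when scanning candidates, as in this minimal sketch. `firstBest` is a hypothetical helper, and the remark about positive offsets additionally assumes that the candidates are visited in ascending order of value, which this thread does not state.

```cpp
#include <cstddef>
#include <vector>

// Returns the index of the first maximum in Gains. Because the comparison
// is strict, a later candidate with an equal gain never replaces an earlier
// one. The caller must pass a non-empty vector.
std::size_t firstBest(const std::vector<int> &Gains) {
  std::size_t Best = 0;
  for (std::size_t I = 1; I < Gains.size(); ++I)
    if (Gains[I] > Gains[Best])
      Best = I;
  return Best;
}
```

If candidates are sorted by ascending value, the winner among tied candidates is the smallest of them, so the other tied constants end up at non-negative offsets from the base.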

I have been primarily focusing on correctness (obviously) and code size. This patch shows significant code size reductions on our motivating examples. The new regression test const-addr-no-neg-offset.ll is a representative, minimal example of that; significantly more 16-bit loads/stores are generated.

Tomorrow I will try to get performance numbers on the table for ARM and X86 and thus see if there is a performance penalty; I don't know the other architectures well enough to make a statement about this.

I am not worried about compile times. My first straightforward O(N^2) prototype implementation finished in virtually no time for lists with hundreds of constants; only with thousands did the number crunching start to take noticeable time. This more efficient implementation had no problems at all (but I don't have hard numbers for the test suite; I will also gather those tomorrow).

I haven't run actual numbers, but I did diff a few Spec2006 binaries with this patch applied. In general, I see more instructions on AArch64 (a target that doesn't implement the isImmediateInRangeForLoad hook).

For example, here's the diff for gcc:
Opcode static count diff summary:

  -304  movk w #
  -258  sub w w #
   -37  mov w #
   -27  ldr x [x #]
   -27  add w w w
   -18  mov x x
   -16  sub w w # lsl #
    -9  b 
    -6  ldp x x [x #] #
    -6  stp x x [x #]!
    -3  fadd s s s
    -2  sub x x # lsl #
    -2  str s [x #]
    -2  ldr s [x #]
    -1  fsub s s s
    -1  fmov s w
    -1  ldp s s [x #]
    -1  fmul s s s
    -1  stp s s [x #]
    -1  sub x x #
    -1  add x x x
    -1  scvtf s x
     1  orr w w #
     1  cbz/nz w 
     1  ldrb w [x #]
     2  and w w #
     3  orr w w w
     3  cbz/nz x 
     4  str x [x #]
     5  str w [x #]
     5  ldr x [x #] #
     5  str x [x #]!
     6  add x x # lsl #
     7  ldr w [x #]
     9  bl  
    19  ldp x x [x #]
    20  mov w w
    22  stp x x [x #]
    43  add x x #
    54  add w w # lsl #
    57  adrp x  
   783  add w w #
-------------------------
  1050  added (excluding nops)
   725  removed (excluding nops)
   325  net (excluding nops)
  1050  added
   725  removed
   325  net

In short, an additional 325 static instructions are introduced.

Here's the diff for Sphinx:
Opcode static count diff summary:

   -22  mov x x
   -16  adrp x  
    -5  mov w #
    -3  sub x x #
    -2  stp x x [x #]
    -1  ldrb w [x #]!
    -1  bl  
    -1  orr w w #
    -1  strb w [x #]!
    -1  ldp x x [x #]
    -1  ldr w [x #]!
    -1  movk w #
     1  ldrb w [x #]
     1  ldr w [x #]
     3  str x [x #]
     4  b 
     7  add x x # lsl #
    13  add x x #
    31  ldr x [x #]
-------------------------
    60  added (excluding nops)
    55  removed (excluding nops)
     5  net (excluding nops)
    60  added
    55  removed
     5  net

I see many more loads, but I haven't investigated further. Interestingly enough for sjeng I see an opposite trend (more adds and fewer loads).

I'm going to see if I can implement the isImmediateInRangeForLoad hook for AArch64.

lib/Transforms/Scalar/ConstantHoisting.cpp
391 ↗	(On Diff #60170)	Rather than wrap all these dbgs() print statements with the DEBUG macro, why not just wrap the call to printOffsetRange() with the DEBUG macro? IMO, this makes it clearer that this is just a debug dump in the context of the caller.

I have some first performance results for 2 configurations: "cortex-a53, aarch32, -mthumb" and "cortex-a53, aarch64", and the results are fairly neutral. So some regressions, some improvements; they really cancel each other out. I want to run a few more benchmarks, and also run the test suite on X86.
Next is also to get more numbers on code size.

In D21183#453606, @ributzka wrote:

Constants are not only used by load/store instructions, so using "isImmediateInRangeForLoad" is very misleading. Also it might negatively impact decisions we used to make for other instructions.

Mhh. The fact that AArch64 doesn't even implement the TTI hook and shows different behavior is not great - especially since it seems to have increased code size.

In its current form this is still a very targeted optimization for Thumb1 load/store. Wouldn't an ARM specific MachineFunction pass be better for this?

Yes, I agree that the results are somewhat disappointing; it looks like it is beneficial for Thumb but not for the other targets, at least not in its current form. That suggests it should be an ARM/Thumb-specific optimisation, or the current implementation needs improvement. I will look further into this to see what's best/feasible.

Resigning to get this off my queue. Please add me back if you decide to proceed with this or a similar change.

SjoerdMeijer added a reviewer: mcrosier. · Jul 6 2016, 9:27 AM

Sorry for the delay (this work was interrupted by my holiday and some other work).

Compared to the previous patch, these are the changes:

It still affects all targets, but the code triggers only at -Os.
We have traded efficiency of the algorithm for readability/maintainability. So it is now less efficient, but a simple check makes the code fall back to the old algorithm if the worklist contains more than 100 items. The algorithm is also slightly different: it sums up the costs of all uses and then subtracts values for offsets that are out of range; before, we were not really tracking the costs of all uses.
For X86 I have checked that the behaviour is the same as the old algorithm for the LNT suite.
I don't see any regressions for aarch32 Thumb mode on LNT, coremark, dhrystone, eembc, but no improvements either. Our motivating example still sees the significant code size reduction though.
I still need an extra callback to get the costs for offsets. The reason is that the existing getIntImmCost functions check which subtarget is being targeted. This doesn't help because we want to know the cost of an immediate in a different ISA (i.e. Thumb1).
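The revised selection can be sketched as follows, under simplifying assumptions: every use of a constant has the same materialization cost, and the penalty mimics a Thumb1-style rule where offsets in [0, 256) are free. `CostedCandidate`, `offsetPenalty` and `pickBaseWithPenalty` are hypothetical names; the patch itself queries TTI for both the use cost and the penalty.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical candidate: a constant, its number of uses, and the cost of
// materializing it at each use (assumed uniform to keep the sketch short).
struct CostedCandidate {
  int64_t Value;
  unsigned NumUses;
  int CostPerUse;
};

// Assumed Thumb1-style penalty: offsets in [0, 256) are free, others cost 1.
int offsetPenalty(int64_t Offset) {
  return (Offset >= 0 && Offset < 256) ? 0 : 1;
}

// Old heuristic: the candidate with the highest cumulative cost wins.
std::size_t pickByCumulativeCost(const std::vector<CostedCandidate> &Cands) {
  std::size_t Best = 0;
  for (std::size_t I = 1; I < Cands.size(); ++I) {
    int CostI = static_cast<int>(Cands[I].NumUses) * Cands[I].CostPerUse;
    int CostBest =
        static_cast<int>(Cands[Best].NumUses) * Cands[Best].CostPerUse;
    if (CostI > CostBest)
      Best = I;
  }
  return Best;
}

// New heuristic: sum the cost of every use and subtract a penalty for each
// offset that would be out of range. Quadratic, so it is used only when
// optimizing for size and the list has at most 100 candidates.
std::size_t pickBaseWithPenalty(const std::vector<CostedCandidate> &Cands,
                                bool OptForSize) {
  const std::size_t MaxCandidates = 100;
  if (!OptForSize || Cands.size() > MaxCandidates)
    return pickByCumulativeCost(Cands);

  std::size_t Best = 0;
  int BestCost = -1;
  for (std::size_t I = 0; I < Cands.size(); ++I) {
    int Cost = 0;
    for (unsigned U = 0; U < Cands[I].NumUses; ++U) {
      Cost += Cands[I].CostPerUse;
      for (const CostedCandidate &Other : Cands)
        Cost -= offsetPenalty(Other.Value - Cands[I].Value);
    }
    if (Cost > BestCost) {
      BestCost = Cost;
      Best = I;
    }
  }
  return Best;
}
```

For hypothetical addresses 1000, 1004 and 1008 with 3, 4 and 1 uses and a per-use cost of 2, the old heuristic picks 1004 (most cumulative cost), while the penalty-aware version picks 1000, because every other address is then at a non-negative offset.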

jmolloy added inline comments. · Jul 8 2016, 5:45 AM

include/llvm/Analysis/TargetTransformInfo.h
413 ↗	(On Diff #62887)	Having this hook be context-free will hinder targets implementing it usefully. It should take the same arguments as getIntImmCost(), so that targets can give a contextual answer if they wish.
lib/Transforms/Scalar/ConstantHoisting.cpp
288 ↗	(On Diff #62887)	"different"
341 ↗	(On Diff #62887)	You can use a range-for loop here.
342 ↗	(On Diff #62887)	This should be hoisted out of the inner loop as it is invariant.
347 ↗	(On Diff #62887)	The formatting looks off here - have you used clang-format?

Addressed James' comments.

This is looking fine to me now, but please wait for Mehdi or someone else to also look it over.

include/llvm/Analysis/TargetTransformInfoImpl.h
261 ↗	(On Diff #63660)	The formatting here isn't LLVM style. The return should be on a new line, then the } on the next line.

A few very superficial comments.

lib/Transforms/Scalar/ConstantHoisting.cpp
328 ↗	(On Diff #63660)	Thanks very much for well documenting this!
390 ↗	(On Diff #63660)	Any reason to add braces here?
392 ↗	(On Diff #63660)	All this code could be sunk into `maximizeConstantsInRange()`. It would also make the comment that describes `maximizeConstantsInRange` more accurate (it mentions the limit of 100 and falling back to the old algorithm).

Thanks Mehdi and James for reviewing! I have addressed your comments.

LGTM

This revision is now accepted and ready to land. · Jul 13 2016, 9:37 AM

Closed by commit rL275382: This implements a more optimal algorithm for selecting a base constant in (authored by SjoerdMeijer). · Jul 14 2016, 12:51 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                                          Size
llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h                        16 lines
llvm/trunk/include/llvm/Analysis/TargetTransformInfoImpl.h                    5 lines
llvm/trunk/include/llvm/Transforms/Scalar/ConstantHoisting.h                  3 lines
llvm/trunk/lib/Analysis/TargetTransformInfo.cpp                               8 lines
llvm/trunk/lib/Target/ARM/ARMTargetTransformInfo.h                            3 lines
llvm/trunk/lib/Target/ARM/ARMTargetTransformInfo.cpp                          11 lines
llvm/trunk/lib/Transforms/Scalar/ConstantHoisting.cpp                         105 lines
llvm/trunk/test/Transforms/ConstantHoisting/ARM/const-addr-no-neg-offset.ll   42 lines

Diff 63933

llvm/trunk/include/llvm/Analysis/TargetTransformInfo.h

[408 lines not shown; enclosing context: public:]
   /// \brief Return the expected cost of materialization for the given integer
   /// immediate of the specified type for a given instruction. The cost can be
   /// zero if the immediate can be folded into the specified instruction.
   int getIntImmCost(unsigned Opc, unsigned Idx, const APInt &Imm,
                     Type *Ty) const;
   int getIntImmCost(Intrinsic::ID IID, unsigned Idx, const APInt &Imm,
                     Type *Ty) const;

+  /// \brief Return the expected cost for the given integer when optimising
+  /// for size. This is different than the other integer immediate cost
+  /// functions in that it is subtarget agnostic. This is useful when you e.g.
+  /// target one ISA such as Aarch32 but smaller encodings could be possible
+  /// with another such as Thumb. This return value is used as a penalty when
+  /// the total costs for a constant is calculated (the bigger the cost, the
+  /// more beneficial constant hoisting is).
+  int getIntImmCodeSizeCost(unsigned Opc, unsigned Idx, const APInt &Imm,
+                            Type *Ty) const;
   /// @}

   /// \name Vector Target Information
   /// @{

   /// \brief The various kinds of shuffle patterns for vector queries.
   enum ShuffleKind {
     SK_Broadcast, ///< Broadcast element 0 to all other elements.
[235 lines not shown; enclosing context: public:]
   virtual bool isFPVectorizationPotentiallyUnsafe() = 0;
   virtual bool allowsMisalignedMemoryAccesses(unsigned BitWidth,
                                               unsigned AddressSpace,
                                               unsigned Alignment,
                                               bool *Fast) = 0;
   virtual PopcntSupportKind getPopcntSupport(unsigned IntTyWidthInBit) = 0;
   virtual bool haveFastSqrt(Type *Ty) = 0;
   virtual int getFPOpCost(Type *Ty) = 0;
+  virtual int getIntImmCodeSizeCost(unsigned Opc, unsigned Idx, const APInt &Imm,
+                                    Type *Ty) = 0;
   virtual int getIntImmCost(const APInt &Imm, Type *Ty) = 0;
   virtual int getIntImmCost(unsigned Opc, unsigned Idx, const APInt &Imm,
                             Type *Ty) = 0;
   virtual int getIntImmCost(Intrinsic::ID IID, unsigned Idx, const APInt &Imm,
                             Type *Ty) = 0;
   virtual unsigned getNumberOfRegisters(bool Vector) = 0;
   virtual unsigned getRegisterBitWidth(bool Vector) = 0;
   virtual unsigned getLoadStoreVecRegBitWidth(unsigned AddrSpace) = 0;
[160 lines not shown; enclosing context: public:]
   }
   PopcntSupportKind getPopcntSupport(unsigned IntTyWidthInBit) override {
     return Impl.getPopcntSupport(IntTyWidthInBit);
   }
   bool haveFastSqrt(Type *Ty) override { return Impl.haveFastSqrt(Ty); }

   int getFPOpCost(Type *Ty) override { return Impl.getFPOpCost(Ty); }

+  int getIntImmCodeSizeCost(unsigned Opc, unsigned Idx, const APInt &Imm,
+                            Type *Ty) override {
+    return Impl.getIntImmCodeSizeCost(Opc, Idx, Imm, Ty);
+  }
   int getIntImmCost(const APInt &Imm, Type *Ty) override {
     return Impl.getIntImmCost(Imm, Ty);
   }
   int getIntImmCost(unsigned Opc, unsigned Idx, const APInt &Imm,
                     Type *Ty) override {
     return Impl.getIntImmCost(Opc, Idx, Imm, Ty);
   }
   int getIntImmCost(Intrinsic::ID IID, unsigned Idx, const APInt &Imm,
[215 lines not shown]

llvm/trunk/include/llvm/Analysis/TargetTransformInfoImpl.h

[251 lines not shown; enclosing context: public:]
   TTI::PopcntSupportKind getPopcntSupport(unsigned IntTyWidthInBit) {
     return TTI::PSK_Software;
   }

   bool haveFastSqrt(Type *Ty) { return false; }

   unsigned getFPOpCost(Type *Ty) { return TargetTransformInfo::TCC_Basic; }

+  int getIntImmCodeSizeCost(unsigned Opcode, unsigned Idx, const APInt &Imm,
+                            Type *Ty) {
+    return 0;
+  }
+
   unsigned getIntImmCost(const APInt &Imm, Type *Ty) { return TTI::TCC_Basic; }

   unsigned getIntImmCost(unsigned Opcode, unsigned Idx, const APInt &Imm,
                          Type *Ty) {
     return TTI::TCC_Free;
   }

   unsigned getIntImmCost(Intrinsic::ID IID, unsigned Idx, const APInt &Imm,
[269 lines not shown]

llvm/trunk/include/llvm/Transforms/Scalar/ConstantHoisting.h

[128 lines not shown; enclosing context: private:]
   void collectConstantCandidates(ConstCandMapType &ConstCandMap,
                                  Instruction *Inst, unsigned Idx,
                                  ConstantInt *ConstInt);
   void collectConstantCandidates(ConstCandMapType &ConstCandMap,
                                  Instruction *Inst);
   void collectConstantCandidates(Function &Fn);
   void findAndMakeBaseConstant(ConstCandVecType::iterator S,
                                ConstCandVecType::iterator E);
+  unsigned maximizeConstantsInRange(ConstCandVecType::iterator S,
+                                    ConstCandVecType::iterator E,
+                                    ConstCandVecType::iterator &MaxCostItr);
   void findBaseConstants();
   void emitBaseConstants(Instruction *Base, Constant *Offset,
                          const consthoist::ConstantUser &ConstUser);
   bool emitBaseConstants();
   void deleteDeadCastInst() const;
   bool optimizeConstants(Function &Fn);
 };
 }

 #endif // LLVM_TRANSFORMS_SCALAR_CONSTANTHOISTING_H

llvm/trunk/lib/Analysis/TargetTransformInfo.cpp

[203 lines not shown]
 }

 int TargetTransformInfo::getFPOpCost(Type *Ty) const {
   int Cost = TTIImpl->getFPOpCost(Ty);
   assert(Cost >= 0 && "TTI should not produce negative costs!");
   return Cost;
 }

+int TargetTransformInfo::getIntImmCodeSizeCost(unsigned Opcode, unsigned Idx,
+                                               const APInt &Imm,
+                                               Type *Ty) const {
+  int Cost = TTIImpl->getIntImmCodeSizeCost(Opcode, Idx, Imm, Ty);
+  assert(Cost >= 0 && "TTI should not produce negative costs!");
+  return Cost;
+}
+
 int TargetTransformInfo::getIntImmCost(const APInt &Imm, Type *Ty) const {
   int Cost = TTIImpl->getIntImmCost(Imm, Ty);
   assert(Cost >= 0 && "TTI should not produce negative costs!");
   return Cost;
 }

 int TargetTransformInfo::getIntImmCost(unsigned Opcode, unsigned Idx,
                                        const APInt &Imm, Type *Ty) const {
[241 lines not shown]

llvm/trunk/lib/Target/ARM/ARMTargetTransformInfo.h

[58 lines not shown; enclosing context: public:]
   /// is IEEE-754 compliant, but it's not covered in this target.
   bool isFPVectorizationPotentiallyUnsafe() {
     return !ST->isTargetDarwin();
   }

   /// \name Scalar TTI Implementations
   /// @{

+  int getIntImmCodeSizeCost(unsigned Opcode, unsigned Idx, const APInt &Imm,
+                            Type *Ty);
+
   using BaseT::getIntImmCost;
   int getIntImmCost(const APInt &Imm, Type *Ty);

   int getIntImmCost(unsigned Opcode, unsigned Idx, const APInt &Imm, Type *Ty);

   /// @}

   /// \name Vector TTI Implementations
[59 lines not shown]

llvm/trunk/lib/Target/ARM/ARMTargetTransformInfo.cpp

[41 lines not shown; enclosing context: if (Bits == 0 || Imm.getActiveBits() >= 64)]
   if (SImmVal >= 0 && SImmVal < 256)
     return 1;
   if ((~ZImmVal < 256) || ARM_AM::isThumbImmShiftedVal(ZImmVal))
     return 2;
   // Load from constantpool.
   return 3;
 }

+
+// Constants smaller than 256 fit in the immediate field of
+// Thumb1 instructions so we return a zero cost and 1 otherwise.
+int ARMTTIImpl::getIntImmCodeSizeCost(unsigned Opcode, unsigned Idx,
+                                      const APInt &Imm, Type *Ty) {
+  if (Imm.isNonNegative() && Imm.getLimitedValue() < 256)
+    return 0;
+
+  return 1;
+}

 int ARMTTIImpl::getIntImmCost(unsigned Opcode, unsigned Idx, const APInt &Imm,
                               Type *Ty) {
   // Division by a constant can be turned into multiplication, but only if we
   // know it's constant. So it's not so much that the immediate is cheap (it's
   // not), but that the alternative is worse.
   // FIXME: this is probably unneeded with GlobalISel.
   if ((Opcode == Instruction::SDiv || Opcode == Instruction::UDiv ||
        Opcode == Instruction::SRem || Opcode == Instruction::URem) &&
[451 lines not shown]

llvm/trunk/lib/Transforms/Scalar/ConstantHoisting.cpp

[279 lines not shown]
 /// into an instruction itself.
 void ConstantHoistingPass::collectConstantCandidates(Function &Fn) {
   ConstCandMapType ConstCandMap;
   for (BasicBlock &BB : Fn)
     for (Instruction &Inst : BB)
       collectConstantCandidates(ConstCandMap, &Inst);
 }

-/// \brief Find the base constant within the given range and rebase all other
-/// constants with respect to the base constant.
-void ConstantHoistingPass::findAndMakeBaseConstant(
-    ConstCandVecType::iterator S, ConstCandVecType::iterator E) {
-  auto MaxCostItr = S;
+// This helper function is necessary to deal with values that have different
+// bit widths (APInt Operator- does not like that). If the value cannot be
+// represented in uint64 we return an "empty" APInt. This is then interpreted
+// as the value is not in range.
+static llvm::Optional<APInt> calculateOffsetDiff(APInt V1, APInt V2)
+{
+  llvm::Optional<APInt> Res = None;
+  unsigned BW = V1.getBitWidth() > V2.getBitWidth() ?
+                V1.getBitWidth() : V2.getBitWidth();
+  uint64_t LimVal1 = V1.getLimitedValue();
+  uint64_t LimVal2 = V2.getLimitedValue();
+
+  if (LimVal1 == ~0ULL || LimVal2 == ~0ULL)
+    return Res;
+
+  uint64_t Diff = LimVal1 - LimVal2;
+  return APInt(BW, Diff, true);
+}
+
+// From a list of constants, one needs to picked as the base and the other
+// constants will be transformed into an offset from that base constant. The
+// question is which we can pick best? For example, consider these constants
+// and their number of uses:
+//
+//  Constants| 2 | 4 | 12 | 42 |
+//  NumUses  | 3 | 2 |  8 |  7 |
+//
+// Selecting constant 12 because it has the most uses will generate negative
+// offsets for constants 2 and 4 (i.e. -10 and -8 respectively). If negative
+// offsets lead to less optimal code generation, then there might be better
+// solutions. Suppose immediates in the range of 0..35 are most optimally
+// supported by the architecture, then selecting constant 2 is most optimal
+// because this will generate offsets: 0, 2, 10, 40. Offsets 0, 2 and 10 are in
+// range 0..35, and thus 3 + 2 + 8 = 13 uses are in range. Selecting 12 would
+// have only 8 uses in range, so choosing 2 as a base is more optimal. Thus, in
+// selecting the base constant the range of the offsets is a very important
+// factor too that we take into account here. This algorithm calculates a total
+// costs for selecting a constant as the base and substract the costs if
+// immediates are out of range. It has quadratic complexity, so we call this
+// function only when we're optimising for size and there are less than 100
+// constants, we fall back to the straightforward algorithm otherwise
+// which does not do all the offset calculations.
+unsigned
+ConstantHoistingPass::maximizeConstantsInRange(ConstCandVecType::iterator S,
+                                           ConstCandVecType::iterator E,
+                                           ConstCandVecType::iterator &MaxCostItr) {
   unsigned NumUses = 0;
-  // Use the constant that has the maximum cost as base constant.
+  if(!Entry->getParent()->optForSize() || std::distance(S,E) > 100) {
     for (auto ConstCand = S; ConstCand != E; ++ConstCand) {
       NumUses += ConstCand->Uses.size();
       if (ConstCand->CumulativeCost > MaxCostItr->CumulativeCost)
         MaxCostItr = ConstCand;
     }
+    return NumUses;
+  }
+
+  DEBUG(dbgs() << "== Maximize constants in range ==\n");
+  int MaxCost = -1;
+  for (auto ConstCand = S; ConstCand != E; ++ConstCand) {
+    auto Value = ConstCand->ConstInt->getValue();
+    Type *Ty = ConstCand->ConstInt->getType();
+    int Cost = 0;
+    NumUses += ConstCand->Uses.size();
+    DEBUG(dbgs() << "= Constant: " << ConstCand->ConstInt->getValue() << "\n");
+
+    for (auto User : ConstCand->Uses) {
+      unsigned Opcode = User.Inst->getOpcode();
+      unsigned OpndIdx = User.OpndIdx;
+      Cost += TTI->getIntImmCost(Opcode, OpndIdx, Value, Ty);
+      DEBUG(dbgs() << "Cost: " << Cost << "\n");
+
+      for (auto C2 = S; C2 != E; ++C2) {
+        llvm::Optional<APInt> Diff = calculateOffsetDiff(
+                                      C2->ConstInt->getValue(),
+                                      ConstCand->ConstInt->getValue());
+        if (Diff) {
+          const int ImmCosts =
+            TTI->getIntImmCodeSizeCost(Opcode, OpndIdx, Diff.getValue(), Ty);
+          Cost -= ImmCosts;
+          DEBUG(dbgs() << "Offset " << Diff.getValue() << " "
+                       << "has penalty: " << ImmCosts << "\n"
+                       << "Adjusted cost: " << Cost << "\n");
+        }
+      }
+    }
+    DEBUG(dbgs() << "Cumulative cost: " << Cost << "\n");
+    if (Cost > MaxCost) {
+      MaxCost = Cost;
+      MaxCostItr = ConstCand;
+      DEBUG(dbgs() << "New candidate: " << MaxCostItr->ConstInt->getValue()
+                   << "\n");
+    }
+  }
+  return NumUses;
+}
+
+/// \brief Find the base constant within the given range and rebase all other
+/// constants with respect to the base constant.
+void ConstantHoistingPass::findAndMakeBaseConstant(
+    ConstCandVecType::iterator S, ConstCandVecType::iterator E) {
+  auto MaxCostItr = S;
+  unsigned NumUses = maximizeConstantsInRange(S, E, MaxCostItr);

   // Don't hoist constants that have only one use.
   if (NumUses <= 1)
     return;

   ConstantInfo ConstInfo;
   ConstInfo.BaseConstant = MaxCostItr->ConstInt;
   Type *Ty = ConstInfo.BaseConstant->getType();
[230 lines not shown]

llvm/trunk/test/Transforms/ConstantHoisting/ARM/const-addr-no-neg-offset.ll

+; RUN: opt -mtriple=arm-arm-none-eabi -consthoist -S < %s | FileCheck %s
+
+; There are different candidates here for the base constant: 1073876992 and
+; 1073876996. But we don't want to see the latter because it results in
+; negative offsets.
+
+define void @foo() #0 {
+entry:
+; CHECK-LABEL: @foo
+; CHECK-NOT: [[CONST1:%const_mat[0-9]*]] = add i32 %const, -4
+  %0 = load volatile i32, i32* inttoptr (i32 1073876992 to i32*), align 4096
+  %or = or i32 %0, 1
+  store volatile i32 %or, i32* inttoptr (i32 1073876992 to i32*), align 4096
+  %1 = load volatile i32, i32* inttoptr (i32 1073876996 to i32*), align 4
+  %and = and i32 %1, -117506048
+  store volatile i32 %and, i32* inttoptr (i32 1073876996 to i32*), align 4
+  %2 = load volatile i32, i32* inttoptr (i32 1073876992 to i32*), align 4096
+  %and1 = and i32 %2, -17367041
+  store volatile i32 %and1, i32* inttoptr (i32 1073876996 to i32*), align 4096
+  %3 = load volatile i32, i32* inttoptr (i32 1073876992 to i32*), align 4096
+  %and2 = and i32 %3, -262145
+  store volatile i32 %and2, i32* inttoptr (i32 1073876992 to i32*), align 4096
+  %4 = load volatile i32, i32* inttoptr (i32 1073876996 to i32*), align 4
+  %and3 = and i32 %4, -8323073
+  store volatile i32 %and3, i32* inttoptr (i32 1073876996 to i32*), align 4
+  store volatile i32 10420224, i32* inttoptr (i32 1073877000 to i32*), align 8
+  %5 = load volatile i32, i32* inttoptr (i32 1073876996 to i32*), align 4096
+  %or4 = or i32 %5, 65536
+  store volatile i32 %or4, i32* inttoptr (i32 1073876996 to i32*), align 4096
+  %6 = load volatile i32, i32* inttoptr (i32 1073881088 to i32*), align 8192
+  %or6.i.i = or i32 %6, 16
+  store volatile i32 %or6.i.i, i32* inttoptr (i32 1073881088 to i32*), align 8192
+  %7 = load volatile i32, i32* inttoptr (i32 1073881088 to i32*), align 8192
+  %and7.i.i = and i32 %7, -4
+  store volatile i32 %and7.i.i, i32* inttoptr (i32 1073881088 to i32*), align 8192
+  %8 = load volatile i32, i32* inttoptr (i32 1073881088 to i32*), align 8192
+  %or8.i.i = or i32 %8, 2
+  store volatile i32 %or8.i.i, i32* inttoptr (i32 1073881088 to i32*), align 8192
+  ret void
+}
+
+attributes #0 = { minsize norecurse nounwind optsize readnone uwtable }
