This is an archive of the discontinued LLVM Phabricator instance.

Differential D19152

Improve LoopStrengthReduce RateFormula, by calling a new target hook increasedLSRFormulaCost()
ClosedPublic

Authored by jonpa on Apr 15 2016, 2:15 AM.

Details

Reviewers

qcolombet
hfinkel

Summary

While experimenting with the LoopUnroller pass for SystemZ, it became obvious that some loops regressed because LSR introduced negative offsets for load / store operations. These require an extra load-address instruction for float and vector types on SystemZ.

To avoid this, I tried using isLegalAddressingMode() by returning false for float / double. This, however, caused other regressions.
Therefore I had to add a new method to TargetTransformInfo which adds an extra cost where appropriate. It seems to make sense generally to let targets adjust LSR formula rating.

During rating of formulas, the new TTI method needs to look at the actual user instructions. This was a bit awkward to get in place, and it may very well be that the code owner of this file could do a better job of it, or do some refactoring. I resorted to passing the Fixups to RateFormula, while letting the LSRUse access the right LSRFixup by means of an index. I had to rearrange code a bit to be able to use the LSRFixup class as an argument to RateFormula.

This seems to be the only viable way of using partial / runtime loop unrolling on SystemZ.

Diff Detail

Event Timeline

jonpa updated this revision to Diff 53858. Apr 15 2016, 2:15 AM

jonpa retitled this revision from to Improve LoopStrengthReduce RateFormula, by calling a new target hook increasedLSRFormulaCost().

jonpa updated this object.

jonpa added a reviewer: hfinkel.

jonpa added subscribers: uweigand, chandlerc.

Herald added subscribers: mzolotukhin, sanjoy. Apr 15 2016, 2:15 AM

ping

(Once again, this patch isn't really that big except I had to move up a class definition, so it just looks that way.)

The question that this patch answers is "Does this memory access (load/store, type) with the new total offset require an extra reg add?".

Perhaps this should be part of isAMCompletelyFolded()?

Could this problem be solved in the backend by implementing needsFrameBaseReg, materializeFrameBaseRegister, resolveFrameIndex and isFrameOffsetLegal in SystemZRegisterInfo.{h,cpp}?

I have no fundamental problem with the target being able to adjust the LSR formula costs, but the specific problem described seems to be what the virtual base register functionality should address.

lib/Target/SystemZ/SystemZISelLowering.cpp
31	rm extra blank line.

I don't think the RegisterInfo methods you mention will do the trick, as they all handle just frame index operands, meaning variables on the stack. Wouldn't that still leave e.g. global variables and argument addresses unhandled? Besides, since this relates to loops, even a simple extra register load could perhaps be worth avoiding.

The motivation for an LSR target cost adjustment is that it seems to be a good way to improve performance while avoiding regressions. As I said earlier, I tried using isLegalAddressingMode(), but that also caused regressions. I think this difference is because SystemZ actually has support for loading with big offset of float values. However, as those loads get folded into their uses an extra load-address instruction is emitted if the offset is negative. The instructions with a folded memory-operand only support small offsets.
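
To make the distinction concrete, here is a minimal standalone sketch of the two offset checks involved. It is not LLVM code: AccessKind, isLegalOffset and isFoldableOffset are hypothetical names, and the ranges (20-bit signed for a legal address, 12-bit unsigned for FP / vector accesses) are taken from the SystemZ code in this patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classification of the value a load or store moves.
enum class AccessKind { Integer, FloatingPoint, Vector };

// True if X fits in an N-bit signed immediate.
template <unsigned N> constexpr bool fitsSigned(int64_t X) {
  return X >= -(INT64_C(1) << (N - 1)) && X < (INT64_C(1) << (N - 1));
}

// True if X fits in an N-bit unsigned immediate (negative values never do).
template <unsigned N> constexpr bool fitsUnsigned(int64_t X) {
  return X >= 0 && X < (INT64_C(1) << N);
}

// Coarse check: can reg + Offset be encoded at all?
constexpr bool isLegalOffset(int64_t Offset) { return fitsSigned<20>(Offset); }

// Fine check: can this kind of access use Offset without an extra
// load-address instruction? Integer accesses are not penalized here.
constexpr bool isFoldableOffset(AccessKind Kind, int64_t Offset) {
  return Kind == AccessKind::Integer || fitsUnsigned<12>(Offset);
}

// Usage: a negative offset is legal, yet not foldable for an FP access.
inline void demoOffsets() {
  assert(isLegalOffset(-8));
  assert(!isFoldableOffset(AccessKind::FloatingPoint, -8));
}
```

Returning false from the coarse check would forbid the offset entirely, while the fine check only lets the cost model prefer formulas that avoid it.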

I am not really sure about the way LoopStrengthReduce.cpp is organized. It seems intuitive to me that the LSRUse class should know about the actual user instructions (lsr-fixups). This was not the case, so I added just the indexes of the LSRFixups to LSRUse, since the fixups are stored in LSRInstance. This has the drawback of having to pass the Fixups vector to RateFormula as well, so that the LSRUse can index into it. Perhaps the LSRUse class should own the fixups instead?

In D19152#413455, @jonpa wrote:

I don't think the RegisterInfo methods you mention will do the trick, as they all handle just frame index operands, meaning variables on the stack. Wouldn't that still leave e.g. global variables and argument addresses unhandled? Besides, since this relates to loops, even a simple extra register load could perhaps be worth avoiding.

This makes sense.

The motivation for an LSR target cost adjustment is that it seems to be a good way to improve performance while avoiding regressions. As I said earlier, I tried using isLegalAddressingMode(), but that also caused regressions. I think this difference is because SystemZ actually has support for loading with big offset of float values. However, as those loads get folded into their uses an extra load-address instruction is emitted if the offset is negative. The instructions with a folded memory-operand only support small offsets.

I am not really sure about the way LoopStrengthReduce.cpp is organized. It seems intuitive to me that the LSRUse class should know about the actual user instructions (lsr-fixups). This was not the case, so I added just the indexes of the LSRFixups to LSRUse, since the fixups are stored in LSRInstance. This has the drawback of having to pass the Fixups vector to RateFormula as well, so that the LSRUse can index into it. Perhaps the LSRUse class should own the fixups instead?

Andy, Quentin, Adam, Sanjoy, et al., opinions?

Adding TTI hooks in LSR is fine with me. increasedLSRFormulaCost seems a little specific though for a TTI hook.

Iterating over Fixups.Offset does seem redundant with LSRUse.Offsets. My general concern is really the growing complexity of LSR utilities, with multiple ways of handling the same instruction set problems. It feels to me like this is something that should be handled by isAlwaysFoldable.

Can someone else take a look at this? Adam, MichaelZ, Quentin?

Hi Jonas,

I share Andy’s points:

This seems redundant with LU.Offsets.
That may be accounted within isAlwaysFoldable.

Could you add a test case to see how things are playing together?

Cheers,
-Quentin

The SystemZTargetLowering::increasedLSRFormulaCost() has been extended to consider vector store instructions also.

Hi Andy and Quentin,

thanks for review so far!

My feeling is that looking at LU.Offsets is not enough, since not all offsets of all user instructions are guaranteed / supposed to be present in the Offsets vector.

Using isAlwaysFoldable() also seems difficult, since there is no info about if the user instruction is a store or a load.

These two parameters are what is needed to make a proper cost function, at least for SystemZ, since it makes a difference if the user instruction is a load or a store (see the implementation in this patch).

One alternative to iterating over the LSRFixups (use instructions) may be to make sure *all* LU.Offsets are present in the vector, and also keep track of the load / store type for each offset somehow.

Another thought is to extend MemAccessTy (or KindType) with a load / store attribute, although that seems problematic since stores and loads would then not belong to the same LSRUse even if they are otherwise compatible.

It may also be possible to compromise and ignore the load / store differences, but that is of course not as good.

What are your thoughts on this?

/Jonas

PS will send test case in separate mail

Patch updated so that

LSRUse will own its fixups instead of indexes into the LSRFixups vector.
Renamed the new method to isFoldableMemAccessOffset(), since it really isn't an LSR specific function.

I think it's generally useful to let the LSRUse own its fixups, and as Andy pointed out, the Offsets vector is not necessary if already iterating over the fixups, so that vector has been removed. (And again, class definitions have been moved around, so it looks like more changes than there really are).
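
A rough sketch of the ownership change, using simplified stand-in types rather than the real LSR classes. OwnedFixup, OwningUse and addFixup are hypothetical names, and only the offset bookkeeping is modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Stand-in for LSRFixup: only the constant offset is modelled.
struct OwnedFixup {
  int64_t Offset;
};

// Stand-in for LSRUse: it stores its fixups directly, so no separate
// Offsets vector (or index into a global fixup list) is needed.
struct OwningUse {
  std::vector<OwnedFixup> Fixups;
  int64_t MinOffset = std::numeric_limits<int64_t>::max();
  int64_t MaxOffset = std::numeric_limits<int64_t>::min();

  // Append a fixup and keep the min / max offsets up to date.
  void addFixup(const OwnedFixup &F) {
    Fixups.push_back(F);
    if (F.Offset < MinOffset)
      MinOffset = F.Offset;
    if (F.Offset > MaxOffset)
      MaxOffset = F.Offset;
  }
};

// Usage: three accesses of an unrolled loop sharing one use.
inline OwningUse makeUnrolledUse() {
  OwningUse U;
  U.addFixup({0});
  U.addFixup({8});
  U.addFixup({-16});
  assert(U.MinOffset <= U.MaxOffset);
  return U;
}
```

Anything that rates the use can then walk U.Fixups and see both the offset and, in the real classes, the user instruction of each fixup.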

With this patch, LSR is *NOT* making entirely identical decisions. In RateFormula all fixup offsets are now considered, whereas before Offsets were a bit sloppily kept unique, without any guarantee of them being so. From what I could quickly see, there seem to be fewer or smaller offsets used with the patch applied, so these appear to be welcome changes here and there. Perhaps someone could look over the logic under "Tally up the non-zero immediates." in RateFormula(), which is where this effect comes into play?

The new TargetTransformInfo hook could be considered a complement to isLegalAddressingMode(), which is more of a high level, while this new method is a more precise cost function with the Instruction included as argument. Perhaps they could be merged somehow, but I am not sure that is the right thing to do.
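
The layering can be pictured with a small standalone sketch: a base class whose hook accepts every offset by default, and a target that overrides it. ToyLowering, ToyZLowering, ToyInst and needsExtraAdd are hypothetical stand-ins, not the LLVM classes, and the 12-bit limit mirrors the SystemZ code in this patch.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for an IR instruction: all the hook needs to know here.
struct ToyInst {
  bool IsFPOrVector;
};

// Generic target description: by default every offset is foldable, so
// targets that do not override the hook see no extra cost.
struct ToyLowering {
  virtual ~ToyLowering() = default;
  virtual bool isFoldableMemAccessOffset(const ToyInst &, int64_t) const {
    return true;
  }
};

// A target that refines the answer using the user instruction.
struct ToyZLowering : ToyLowering {
  bool isFoldableMemAccessOffset(const ToyInst &I,
                                 int64_t Offset) const override {
    if (!I.IsFPOrVector)
      return true;
    return Offset >= 0 && Offset < 4096; // 12-bit unsigned
  }
};

// Client code only sees the base interface.
inline bool needsExtraAdd(const ToyLowering &TL, const ToyInst &I,
                          int64_t Offset) {
  return !TL.isFoldableMemAccessOffset(I, Offset);
}
```

Because the default answer is true, only a target that opts in changes how formulas are rated.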

There is a myriad of isAlwaysFoldable() and isAMCompletelyFolded() functions used in LSR. There is also the Formula::UnfoldedOffset and the MaxOffset and MinOffset of the LSRUse which all play a role in the handling of immediate offsets. I am not sure about all the reasoning behind all this, so I have now only done the simple thing by adding this new cost function in RateFormula(). It may be that it could / should be used at earlier stages also, for efficiency's sake.

Even if this is not a complete reworking of LSR, I think this patch is a refactoring in the right direction, along with a needed improvement for SystemZ.

Just to clarify: LSR results are changed by both 1) Considering all LSRFixups instead of the old Offsets vector. This gives a little different ImmCost sums. 2) The new target hook, which may add 1 to NumBaseAdds.
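
Both effects can be seen in a toy model of the tally. This is only an illustration under stated assumptions: RatedFixup, ToyCost, toyIsFoldable and rateFixups are hypothetical names, and the weight of 1 per non-zero immediate is a placeholder, not the weighting RateFormula() actually uses.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for one LSRFixup as seen by the rating code.
struct RatedFixup {
  bool IsLoadOrStore; // false for PHIs, compares, ...
  bool IsFPOrVector;  // type of the accessed value
  int64_t Offset;     // constant offset of this fixup
};

struct ToyCost {
  unsigned ImmCost = 0;
  unsigned NumBaseAdds = 0;
};

// Hypothetical target hook: FP / vector accesses fold only 12-bit
// unsigned offsets.
inline bool toyIsFoldable(const RatedFixup &F, int64_t Offset) {
  assert(F.IsLoadOrStore && "hook is only meant for loads and stores");
  return !F.IsFPOrVector || (Offset >= 0 && Offset < 4096);
}

// Visit every fixup of the use (duplicates included) and tally the cost.
inline ToyCost rateFixups(const std::vector<RatedFixup> &Fixups,
                          int64_t BaseOffset) {
  ToyCost C;
  for (const RatedFixup &F : Fixups) {
    int64_t Offset = F.Offset + BaseOffset;
    if (Offset != 0)
      C.ImmCost += 1; // placeholder weight, not LSR's real one
    if (F.IsLoadOrStore && !toyIsFoldable(F, Offset))
      ++C.NumBaseAdds;
  }
  return C;
}
```

With a negative base offset, every FP access in the model lands on a negative total offset and is charged a base add, which is the kind of formula the patch wants LSR to rate worse.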

Thanks for the clarifications and updates.

Will have a closer look.

Hi Jonas,

The updated patch makes sense to me.
The patch is missing a test case though.

@Andy, do you have any additional comment?

Cheers,
-Quentin

include/llvm/Analysis/TargetTransformInfo.h
357	We should mention that the addressing mode is of the form reg + Offset.

Comment fixed as suggested.
New tests for the SystemZ backend.

qcolombet added inline comments. Jun 20 2016, 11:51 AM

test/CodeGen/SystemZ/loop-01.ll
5	How different is the codegen for the z13 cpu? I am asking because right now the test cases are partitioned between one target and the other with no overlap, whereas with several RUN commands in the file I would have expected at least some overlap. I.e., I would expect a common prefix between both CPUs that is used for most of the tests.
152	Use "opt -instnamer" to get %[0-9]+ variables.

jonpa added inline comments. Jun 21 2016, 1:31 AM

test/CodeGen/SystemZ/loop-01.ll
5	The one test for z13 is for vector instructions, which only z13 supports. The other tests I added should be common for all subtargets, so I just reused the already present RUN command. Perhaps it should use the generic subtarget instead of z10 (which I think would be equivalent)?

jonpa added inline comments. Jun 21 2016, 2:01 AM

test/CodeGen/SystemZ/loop-01.ll
152	It seems to replace %[0-9]+ variable names with %tmp[1-9]* names. Is that what you want?

The other tests I added should be common for all subtargets, so I just reused the already present RUN command. Perhaps it should use the generic subtarget instead of z10 (which I think would be equivalent)?

In the past, LLVM would default to detecting the host CPU when compiling natively, which would cause random test failures depending on the particular host that was running the test suite. To fix this, most tests hard-coded a z10 CPU.

These days, this is no longer necessary since LLVM no longer defaults to automatically detecting the CPU, but always defaults to z10 anyway.

qcolombet added inline comments. Jun 21 2016, 8:21 AM

test/CodeGen/SystemZ/loop-01.ll
5	In that case, for the RUN line with z13 also add -check-prefix=CHECK.
152	Yes, that's what I want. [0-9]+ variables cannot be reordered or removed, tmp[0-9]+ can :).

Updated per review:
-check-prefix=CHECK added to -z13 RUN line so that all tests run for -z13 as well.
New tests filtered through opt -instnamer, to eliminate numeral based instruction names.

LGTM.

This revision is now accepted and ready to land. Jul 6 2016, 11:27 AM

I know this patch was approved already, but I have made some minor changes, which might as well get reviewed since this hasn't been committed yet.

Added a check so that the new function is only called for loads and stores, after realizing it also gets called on PHIs etc.

+ if ((isa<LoadInst>(Fixup.UserInst) || isa<StoreInst>(Fixup.UserInst)) &&
+     !TTI.isFoldableMemAccessOffset(Fixup.UserInst, Offset))
+   NumBaseAdds++;

SystemZ implementation of isFoldableMemAccessOffset() rewritten to also handle the case of fp-stores.
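
The two changes fit together as in the following standalone sketch. It uses hypothetical stand-ins (GuardOpcode, GuardType, GuardInst, memAccessType, hookIsFoldable, chargesBaseAdd) instead of LLVM's IR classes: the accessed type is the result type for a load and the type of the stored value for a store, and the caller filters out everything that is not a load or store before asking the hook.

```cpp
#include <cassert>
#include <cstdint>

enum class GuardOpcode { Load, Store, Phi };
enum class GuardType { Int64, Double, Vector, Void };

// Stand-in for an IR instruction.
struct GuardInst {
  GuardOpcode Op;
  GuardType ResultTy;      // type the instruction produces (Void for stores)
  GuardType StoredValueTy; // type of operand 0; meaningful for stores only
};

// Type of the value moved to or from memory.
inline GuardType memAccessType(const GuardInst &I) {
  assert(I.Op == GuardOpcode::Load || I.Op == GuardOpcode::Store);
  return I.Op == GuardOpcode::Load ? I.ResultTy : I.StoredValueTy;
}

// Hypothetical hook: FP / vector accesses fold only 12-bit unsigned offsets.
inline bool hookIsFoldable(const GuardInst &I, int64_t Offset) {
  GuardType Ty = memAccessType(I);
  bool FPOrVec = Ty == GuardType::Double || Ty == GuardType::Vector;
  return !FPOrVec || (Offset >= 0 && Offset < 4096);
}

// Caller-side guard: PHIs and other users never reach the hook.
inline bool chargesBaseAdd(const GuardInst &I, int64_t Offset) {
  bool IsMemAccess = I.Op == GuardOpcode::Load || I.Op == GuardOpcode::Store;
  return IsMemAccess && !hookIsFoldable(I, Offset);
}
```

Without the guard, a PHI user would trip the assertion inside the hook; with it, an fp-store at a negative offset is charged while a PHI is simply skipped.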

Quentin, you approved this patch before; could you please take a quick look again to confirm that my last changes look good to you? Sorry for the confusion.

New rev LGTM as well!

Thanks,
-Quentin

This revision is now accepted and ready to land. Aug 8 2016, 5:32 PM

Committed as r278927.

Revision Contents

Path                                               Size
include/llvm/Analysis/TargetTransformInfo.h        10 lines
include/llvm/Analysis/TargetTransformInfoImpl.h    2 lines
include/llvm/CodeGen/BasicTTIImpl.h                4 lines
include/llvm/Target/TargetLowering.h               4 lines
lib/Analysis/TargetTransformInfo.cpp               5 lines
lib/Target/SystemZ/SystemZISelLowering.h           1 line
lib/Target/SystemZ/SystemZISelLowering.cpp         31 lines
lib/Transforms/Scalar/LoopStrengthReduce.cpp       434 lines
test/CodeGen/SystemZ/loop-01.ll                    117 lines

Diff 66931

include/llvm/Analysis/TargetTransformInfo.h

... 345 lines hidden, context: public:
  /// of the specified type.
  /// If the AM is supported, the return value must be >= 0.
  /// If the AM is not supported, it returns a negative value.
  /// TODO: Handle pre/postinc as well.
  int getScalingFactorCost(Type *Ty, GlobalValue *BaseGV, int64_t BaseOffset,
                           bool HasBaseReg, int64_t Scale,
                           unsigned AddrSpace = 0) const;

+ /// \brief Return true if target supports the load / store
+ /// instruction with the given Offset on the form reg + Offset. It
+ /// may be that Offset is too big for a certain type (register
+ /// class).
+ bool isFoldableMemAccessOffset(Instruction *I, int64_t Offset) const;
+
  /// \brief Return true if it's free to truncate a value of type Ty1 to type
  /// Ty2. e.g. On x86 it's free to truncate a i32 value in register EAX to i16
  /// by referencing its sub-register AX.
  bool isTruncateFree(Type *Ty1, Type *Ty2) const;

  /// \brief Return true if it is profitable to hoist instruction in the
  /// then/else to before if.
  bool isProfitableToHoist(Instruction *I) const;
... 292 lines hidden, context: virtual bool isLegalAddressingMode(Type *Ty, GlobalValue *BaseGV,
                                     unsigned AddrSpace) = 0;
  virtual bool isLegalMaskedStore(Type *DataType) = 0;
  virtual bool isLegalMaskedLoad(Type *DataType) = 0;
  virtual bool isLegalMaskedScatter(Type *DataType) = 0;
  virtual bool isLegalMaskedGather(Type *DataType) = 0;
  virtual int getScalingFactorCost(Type *Ty, GlobalValue *BaseGV,
                                   int64_t BaseOffset, bool HasBaseReg,
                                   int64_t Scale, unsigned AddrSpace) = 0;
+ virtual bool isFoldableMemAccessOffset(Instruction *I, int64_t Offset) = 0;
  virtual bool isTruncateFree(Type *Ty1, Type *Ty2) = 0;
  virtual bool isProfitableToHoist(Instruction *I) = 0;
  virtual bool isTypeLegal(Type *Ty) = 0;
  virtual unsigned getJumpBufAlignment() = 0;
  virtual unsigned getJumpBufSize() = 0;
  virtual bool shouldBuildLookupTables() = 0;
  virtual bool enableAggressiveInterleaving(bool LoopHasReductions) = 0;
  virtual bool enableInterleavedAccessVectorization() = 0;
... 145 lines hidden, context: bool isLegalMaskedGather(Type *DataType) override {
    return Impl.isLegalMaskedGather(DataType);
  }
  int getScalingFactorCost(Type *Ty, GlobalValue *BaseGV, int64_t BaseOffset,
                           bool HasBaseReg, int64_t Scale,
                           unsigned AddrSpace) override {
    return Impl.getScalingFactorCost(Ty, BaseGV, BaseOffset, HasBaseReg,
                                     Scale, AddrSpace);
  }
+ bool isFoldableMemAccessOffset(Instruction *I, int64_t Offset) override {
+   return Impl.isFoldableMemAccessOffset(I, Offset);
+ }
  bool isTruncateFree(Type *Ty1, Type *Ty2) override {
    return Impl.isTruncateFree(Ty1, Ty2);
  }
  bool isProfitableToHoist(Instruction *I) override {
    return Impl.isProfitableToHoist(I);
  }
  bool isTypeLegal(Type *Ty) override { return Impl.isTypeLegal(Ty); }
  unsigned getJumpBufAlignment() override { return Impl.getJumpBufAlignment(); }
... 252 lines hidden

include/llvm/Analysis/TargetTransformInfoImpl.h

... 229 lines hidden, context: int getScalingFactorCost(Type *Ty, GlobalValue *BaseGV, int64_t BaseOffset,
                           bool HasBaseReg, int64_t Scale, unsigned AddrSpace) {
    // Guess that all legal addressing mode are free.
    if (isLegalAddressingMode(Ty, BaseGV, BaseOffset, HasBaseReg,
                              Scale, AddrSpace))
      return 0;
    return -1;
  }

+ bool isFoldableMemAccessOffset(Instruction *I, int64_t Offset) { return true; }
+
  bool isTruncateFree(Type *Ty1, Type *Ty2) { return false; }

  bool isProfitableToHoist(Instruction *I) { return true; }

  bool isTypeLegal(Type *Ty) { return false; }

  unsigned getJumpBufAlignment() { return 0; }

... 305 lines hidden

include/llvm/CodeGen/BasicTTIImpl.h

... 138 lines hidden, context: int getScalingFactorCost(Type *Ty, GlobalValue *BaseGV, int64_t BaseOffset,
    TargetLoweringBase::AddrMode AM;
    AM.BaseGV = BaseGV;
    AM.BaseOffs = BaseOffset;
    AM.HasBaseReg = HasBaseReg;
    AM.Scale = Scale;
    return getTLI()->getScalingFactorCost(DL, AM, Ty, AddrSpace);
  }

+ bool isFoldableMemAccessOffset(Instruction *I, int64_t Offset) {
+   return getTLI()->isFoldableMemAccessOffset(I, Offset);
+ }
+
  bool isTruncateFree(Type *Ty1, Type *Ty2) {
    return getTLI()->isTruncateFree(Ty1, Ty2);
  }

  bool isProfitableToHoist(Instruction *I) {
    return getTLI()->isProfitableToHoist(I);
  }

... 811 lines hidden

include/llvm/Target/TargetLowering.h

... 1,622 lines hidden, context: public:
  virtual int getScalingFactorCost(const DataLayout &DL, const AddrMode &AM,
                                   Type *Ty, unsigned AS = 0) const {
    // Default: assume that any scaling factor used in a legal AM is free.
    if (isLegalAddressingMode(DL, AM, Ty, AS))
      return 0;
    return -1;
  }

+ virtual bool isFoldableMemAccessOffset(Instruction *I, int64_t Offset) const {
+   return true;
+ }
+
  /// Return true if the specified immediate is legal icmp immediate, that is
  /// the target has icmp instructions which can compare a register against the
  /// immediate without having to materialize the immediate into a register.
  virtual bool isLegalICmpImmediate(int64_t) const {
    return true;
  }

  /// Return true if the specified immediate is legal add immediate, that is the
... 1,439 lines hidden

lib/Analysis/TargetTransformInfo.cpp

... 144 lines hidden, context: int TargetTransformInfo::getScalingFactorCost(Type *Ty, GlobalValue *BaseGV,
                                                int64_t Scale,
                                                unsigned AddrSpace) const {
    int Cost = TTIImpl->getScalingFactorCost(Ty, BaseGV, BaseOffset, HasBaseReg,
                                             Scale, AddrSpace);
    assert(Cost >= 0 && "TTI should not produce negative costs!");
    return Cost;
  }

+ bool TargetTransformInfo::isFoldableMemAccessOffset(Instruction *I,
+                                                     int64_t Offset) const {
+   return TTIImpl->isFoldableMemAccessOffset(I, Offset);
+ }
+
  bool TargetTransformInfo::isTruncateFree(Type *Ty1, Type *Ty2) const {
    return TTIImpl->isTruncateFree(Ty1, Ty2);
  }

  bool TargetTransformInfo::isProfitableToHoist(Instruction *I) const {
    return TTIImpl->isProfitableToHoist(I);
  }

... 308 lines hidden

lib/Target/SystemZ/SystemZISelLowering.h

... 388 lines hidden, context: public:
  EVT getSetCCResultType(const DataLayout &DL, LLVMContext &,
                         EVT) const override;
  bool isFMAFasterThanFMulAndFAdd(EVT VT) const override;
  bool isFPImmLegal(const APFloat &Imm, EVT VT) const override;
  bool isLegalICmpImmediate(int64_t Imm) const override;
  bool isLegalAddImmediate(int64_t Imm) const override;
  bool isLegalAddressingMode(const DataLayout &DL, const AddrMode &AM, Type *Ty,
                             unsigned AS) const override;
+ bool isFoldableMemAccessOffset(Instruction *I, int64_t Offset) const override;
  bool allowsMisalignedMemoryAccesses(EVT VT, unsigned AS,
                                      unsigned Align,
                                      bool *Fast) const override;
  bool isTruncateFree(Type *, Type *) const override;
  bool isTruncateFree(EVT, EVT) const override;
  const char *getTargetNodeName(unsigned Opcode) const override;
  std::pair<unsigned, const TargetRegisterClass *>
  getRegForInlineAsmConstraint(const TargetRegisterInfo *TRI,
... 193 lines hidden

lib/Target/SystemZ/SystemZISelLowering.cpp

... 14 lines hidden
  #include "SystemZCallingConv.h"
  #include "SystemZConstantPoolValue.h"
  #include "SystemZMachineFunctionInfo.h"
  #include "SystemZTargetMachine.h"
  #include "llvm/CodeGen/CallingConvLower.h"
  #include "llvm/CodeGen/MachineInstrBuilder.h"
  #include "llvm/CodeGen/MachineRegisterInfo.h"
  #include "llvm/CodeGen/TargetLoweringObjectFileImpl.h"
+ #include "llvm/Support/CommandLine.h"
  #include "llvm/IR/Intrinsics.h"
  #include <cctype>

  using namespace llvm;

  #define DEBUG_TYPE "systemz-lower"

+ static cl::opt<bool> DisableZ13LSRTuning(
+     "disable-z13-lsr-tuning", cl::Hidden, cl::init(false),
+     cl::desc("Experimental: Disable z13 LSR tuning"));
+
  namespace {
  // Represents a sequence for extracting a 0/1 value from an IPM result:
  // (((X ^ XORValue) + AddValue) >> Bit)
  struct IPMConversion {
    IPMConversion(unsigned xorValue, int64_t addValue, unsigned bit)
      : XORValue(xorValue), AddValue(addValue), Bit(bit) {}

    int64_t XORValue;
... 488 lines hidden, context: bool SystemZTargetLowering::isLegalAddressingMode(const DataLayout &DL,
    // Require a 20-bit signed offset.
    if (!isInt<20>(AM.BaseOffs))
      return false;

    // Indexing is OK but no scale factor can be applied.
    return AM.Scale == 0 || AM.Scale == 1;
  }

+ bool SystemZTargetLowering::isFoldableMemAccessOffset(Instruction *I,
+                                                       int64_t Offset) const {
+   // Experimental
+   if (DisableZ13LSRTuning)
+     return true;
+
+   // This only applies to z13.
+   if (!Subtarget.hasVector())
+     return true;
+
+   // * Use LDE instead of LE/LEY to avoid partial register
+   //   dependencies (LDE only supports small offsets).
+   // * Utilize the vector registers to hold floating point
+   //   values (vector load / store instructions only support small
+   //   offsets).
+
+   assert (isa<LoadInst>(I) || isa<StoreInst>(I));
+   Type *MemAccessTy = (isa<LoadInst>(I) ? I->getType() :
+                        I->getOperand(0)->getType());
+   if (!isUInt<12>(Offset) &&
+       (MemAccessTy->isFloatingPointTy() || MemAccessTy->isVectorTy()))
+     return false;
+
+   return true;
+ }
+
  bool SystemZTargetLowering::isTruncateFree(Type *FromType, Type *ToType) const {
    if (!FromType->isIntegerTy() || !ToType->isIntegerTy())
      return false;
    unsigned FromBits = FromType->getPrimitiveSizeInBits();
    unsigned ToBits = ToType->getPrimitiveSizeInBits();
    return FromBits > ToBits;
  }

... 5,691 lines hidden

lib/Transforms/Scalar/LoopStrengthReduce.cpp

... 879 lines hidden, context: bool isLoser() {
      return NumRegs == ~0u;
    }

    void RateFormula(const TargetTransformInfo &TTI,
                     const Formula &F,
                     SmallPtrSetImpl<const SCEV *> &Regs,
                     const DenseSet<const SCEV *> &VisitedRegs,
                     const Loop *L,
-                    const SmallVectorImpl<int64_t> &Offsets,
                     ScalarEvolution &SE, DominatorTree &DT,
                     const LSRUse &LU,
                     SmallPtrSetImpl<const SCEV *> *LoserRegs = nullptr);

    void print(raw_ostream &OS) const;
    void dump() const;

  private:
    void RateRegister(const SCEV *Reg,
                      SmallPtrSetImpl<const SCEV *> &Regs,
                      const Loop *L,
                      ScalarEvolution &SE, DominatorTree &DT);
    void RatePrimaryRegister(const SCEV *Reg,
                             SmallPtrSetImpl<const SCEV *> &Regs,
                             const Loop *L,
                             ScalarEvolution &SE, DominatorTree &DT,
                             SmallPtrSetImpl<const SCEV *> *LoserRegs);
  };

+ /// An operand value in an instruction which is to be replaced with some
+ /// equivalent, possibly strength-reduced, replacement.
+ struct LSRFixup {
+   /// The instruction which will be updated.
+   Instruction *UserInst;
+
+   /// The operand of the instruction which will be replaced. The operand may be
+   /// used more than once; every instance will be replaced.
+   Value *OperandValToReplace;
+
+   /// If this user is to use the post-incremented value of an induction
+   /// variable, this variable is non-null and holds the loop associated with the
+   /// induction variable.
+   PostIncLoopSet PostIncLoops;
+
+   /// A constant offset to be added to the LSRUse expression. This allows
+   /// multiple fixups to share the same LSRUse with different offsets, for
+   /// example in an unrolled loop.
+   int64_t Offset;
+
+   bool isUseFullyOutsideLoop(const Loop *L) const;
+
+   LSRFixup();
+
+   void print(raw_ostream &OS) const;
+   void dump() const;
+ };
+
+
+ /// A DenseMapInfo implementation for holding DenseMaps and DenseSets of sorted
+ /// SmallVectors of const SCEV*.
+ struct UniquifierDenseMapInfo {
+   static SmallVector<const SCEV *, 4> getEmptyKey() {
+     SmallVector<const SCEV *, 4> V;
+     V.push_back(reinterpret_cast<const SCEV *>(-1));
+     return V;
+   }
+
+   static SmallVector<const SCEV *, 4> getTombstoneKey() {
+     SmallVector<const SCEV *, 4> V;
+     V.push_back(reinterpret_cast<const SCEV *>(-2));
+     return V;
+   }
+
+   static unsigned getHashValue(const SmallVector<const SCEV *, 4> &V) {
+     return static_cast<unsigned>(hash_combine_range(V.begin(), V.end()));
+   }
+
+   static bool isEqual(const SmallVector<const SCEV *, 4> &LHS,
+                       const SmallVector<const SCEV *, 4> &RHS) {
+     return LHS == RHS;
+   }
+ };
+
+ /// This class holds the state that LSR keeps for each use in IVUsers, as well
+ /// as uses invented by LSR itself. It includes information about what kinds of
+ /// things can be folded into the user, information about the user itself, and
+ /// information about how the use may be satisfied. TODO: Represent multiple
+ /// users of the same expression in common?
+ class LSRUse {
+   DenseSet<SmallVector<const SCEV *, 4>, UniquifierDenseMapInfo> Uniquifier;
+
+ public:
+   /// An enum for a kind of use, indicating what types of scaled and immediate
+   /// operands it might support.
+   enum KindType {
+     Basic,   ///< A normal use, with no folding.
+     Special, ///< A special case of basic, allowing -1 scales.
+     Address, ///< An address use; folding according to TargetLowering
+     ICmpZero ///< An equality icmp with both operands folded into one.
+     // TODO: Add a generic icmp too?
+   };
+
+   typedef PointerIntPair<const SCEV *, 2, KindType> SCEVUseKindPair;
+
+   KindType Kind;
+   MemAccessTy AccessTy;
+
+   /// The list of operands which are to be replaced.
+   SmallVector<LSRFixup, 8> Fixups;
+
+   /// Keep track of the min and max offsets of the fixups.
+   int64_t MinOffset;
+   int64_t MaxOffset;
+
+   /// This records whether all of the fixups using this LSRUse are outside of
+   /// the loop, in which case some special-case heuristics may be used.
+   bool AllFixupsOutsideLoop;
+
+   /// RigidFormula is set to true to guarantee that this use will be associated
		/// with a single formula--the one that initially matched. Some SCEV
		/// expressions cannot be expanded. This allows LSR to consider the registers
		/// used by those expressions without the need to expand them later after
		/// changing the formula.
		bool RigidFormula;

		/// This records the widest use type for any fixup using this
		/// LSRUse. FindUseWithSimilarFormula can't consider uses with different max
		/// fixup widths to be equivalent, because the narrower one may be relying on
		/// the implicit truncation to truncate away bogus bits.
		Type *WidestFixupType;

		/// A list of ways to build a value that can satisfy this user. After the
		/// list is populated, one of these is selected heuristically and used to
		/// formulate a replacement for OperandValToReplace in UserInst.
		SmallVector<Formula, 12> Formulae;

		/// The set of register candidates used by all formulae in this LSRUse.
		SmallPtrSet<const SCEV *, 4> Regs;

		LSRUse(KindType K, MemAccessTy AT)
		: Kind(K), AccessTy(AT), MinOffset(INT64_MAX), MaxOffset(INT64_MIN),
		AllFixupsOutsideLoop(true), RigidFormula(false),
		WidestFixupType(nullptr) {}

		LSRFixup &getNewFixup() {
		Fixups.push_back(LSRFixup());
		return Fixups.back();
		}

		void pushFixup(LSRFixup &f) {
		Fixups.push_back(f);
		if (f.Offset > MaxOffset)
		MaxOffset = f.Offset;
		if (f.Offset < MinOffset)
		MinOffset = f.Offset;
		}

		bool HasFormulaWithSameRegs(const Formula &F) const;
		bool InsertFormula(const Formula &F);
		void DeleteFormula(Formula &F);
		void RecomputeRegs(size_t LUIdx, RegUseTracker &Reguses);

		void print(raw_ostream &OS) const;
		void dump() const;
		};

}		}

/// Tally up interesting quantities from the given register.		/// Tally up interesting quantities from the given register.
void Cost::RateRegister(const SCEV *Reg,		void Cost::RateRegister(const SCEV *Reg,
SmallPtrSetImpl<const SCEV *> &Regs,		SmallPtrSetImpl<const SCEV *> &Regs,
const Loop *L,		const Loop *L,
ScalarEvolution &SE, DominatorTree &DT) {		ScalarEvolution &SE, DominatorTree &DT) {
if (const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(Reg)) {		if (const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(Reg)) {
[56 unchanged lines hidden]	void Cost::RatePrimaryRegister(const SCEV *Reg,
}		}
}		}

void Cost::RateFormula(const TargetTransformInfo &TTI,		void Cost::RateFormula(const TargetTransformInfo &TTI,
const Formula &F,		const Formula &F,
SmallPtrSetImpl<const SCEV *> &Regs,		SmallPtrSetImpl<const SCEV *> &Regs,
const DenseSet<const SCEV *> &VisitedRegs,		const DenseSet<const SCEV *> &VisitedRegs,
const Loop *L,		const Loop *L,
const SmallVectorImpl<int64_t> &Offsets,
ScalarEvolution &SE, DominatorTree &DT,		ScalarEvolution &SE, DominatorTree &DT,
const LSRUse &LU,		const LSRUse &LU,
SmallPtrSetImpl<const SCEV *> *LoserRegs) {		SmallPtrSetImpl<const SCEV *> *LoserRegs) {
assert(F.isCanonical() && "Cost is accurate only for canonical formula");		assert(F.isCanonical() && "Cost is accurate only for canonical formula");
// Tally up the registers.		// Tally up the registers.
if (const SCEV *ScaledReg = F.ScaledReg) {		if (const SCEV *ScaledReg = F.ScaledReg) {
if (VisitedRegs.count(ScaledReg)) {		if (VisitedRegs.count(ScaledReg)) {
Lose();		Lose();
[21 unchanged lines hidden]	if (NumBaseParts > 1)
NumBaseAdds +=		NumBaseAdds +=
NumBaseParts - (1 + (F.Scale && isAMCompletelyFolded(TTI, LU, F)));		NumBaseParts - (1 + (F.Scale && isAMCompletelyFolded(TTI, LU, F)));
NumBaseAdds += (F.UnfoldedOffset != 0);		NumBaseAdds += (F.UnfoldedOffset != 0);

// Accumulate non-free scaling amounts.		// Accumulate non-free scaling amounts.
ScaleCost += getScalingFactorCost(TTI, LU, F);		ScaleCost += getScalingFactorCost(TTI, LU, F);

// Tally up the non-zero immediates.		// Tally up the non-zero immediates.
for (int64_t O : Offsets) {		for (const LSRFixup &Fixup : LU.Fixups) {
		int64_t O = Fixup.Offset;
int64_t Offset = (uint64_t)O + F.BaseOffset;		int64_t Offset = (uint64_t)O + F.BaseOffset;
if (F.BaseGV)		if (F.BaseGV)
ImmCost += 64; // Handle symbolic values conservatively.		ImmCost += 64; // Handle symbolic values conservatively.
// TODO: This should probably be the pointer size.		// TODO: This should probably be the pointer size.
else if (Offset != 0)		else if (Offset != 0)
ImmCost += APInt(64, Offset, true).getMinSignedBits();		ImmCost += APInt(64, Offset, true).getMinSignedBits();

		// Check with target if this offset with this instruction is
		// specifically not supported.
		if ((isa<LoadInst>(Fixup.UserInst) || isa<StoreInst>(Fixup.UserInst)) &&
		!TTI.isFoldableMemAccessOffset(Fixup.UserInst, Offset))
		NumBaseAdds++;
}		}
assert(isValid() && "invalid cost");		assert(isValid() && "invalid cost");
}		}

/// Set this cost to a losing value.		/// Set this cost to a losing value.
void Cost::Lose() {		void Cost::Lose() {
NumRegs = ~0u;		NumRegs = ~0u;
AddRecCost = ~0u;		AddRecCost = ~0u;
[30 unchanged lines hidden]	if (SetupCost != 0)
OS << ", plus " << SetupCost << " setup cost";		OS << ", plus " << SetupCost << " setup cost";
}		}

LLVM_DUMP_METHOD		LLVM_DUMP_METHOD
void Cost::dump() const {		void Cost::dump() const {
print(errs()); errs() << '\n';		print(errs()); errs() << '\n';
}		}

namespace {

/// An operand value in an instruction which is to be replaced with some
/// equivalent, possibly strength-reduced, replacement.
struct LSRFixup {
/// The instruction which will be updated.
Instruction *UserInst;

/// The operand of the instruction which will be replaced. The operand may be
/// used more than once; every instance will be replaced.
Value *OperandValToReplace;

/// If this user is to use the post-incremented value of an induction
/// variable, this variable is non-null and holds the loop associated with the
/// induction variable.
PostIncLoopSet PostIncLoops;

/// The index of the LSRUse describing the expression which this fixup needs,
/// minus an offset (below).
size_t LUIdx;

/// A constant offset to be added to the LSRUse expression. This allows
/// multiple fixups to share the same LSRUse with different offsets, for
/// example in an unrolled loop.
int64_t Offset;

bool isUseFullyOutsideLoop(const Loop *L) const;

LSRFixup();

void print(raw_ostream &OS) const;
void dump() const;
};

}

LSRFixup::LSRFixup()		LSRFixup::LSRFixup()
: UserInst(nullptr), OperandValToReplace(nullptr), LUIdx(~size_t(0)),		: UserInst(nullptr), OperandValToReplace(nullptr),
Offset(0) {}		Offset(0) {}

/// Test whether this fixup always uses its value outside of the given loop.		/// Test whether this fixup always uses its value outside of the given loop.
bool LSRFixup::isUseFullyOutsideLoop(const Loop *L) const {		bool LSRFixup::isUseFullyOutsideLoop(const Loop *L) const {
// PHI nodes use their value in their incoming blocks.		// PHI nodes use their value in their incoming blocks.
if (const PHINode *PN = dyn_cast<PHINode>(UserInst)) {		if (const PHINode *PN = dyn_cast<PHINode>(UserInst)) {
for (unsigned i = 0, e = PN->getNumIncomingValues(); i != e; ++i)		for (unsigned i = 0, e = PN->getNumIncomingValues(); i != e; ++i)
if (PN->getIncomingValue(i) == OperandValToReplace &&		if (PN->getIncomingValue(i) == OperandValToReplace &&
[19 unchanged lines hidden]	void LSRFixup::print(raw_ostream &OS) const {
OS << ", OperandValToReplace=";		OS << ", OperandValToReplace=";
OperandValToReplace->printAsOperand(OS, /*PrintType=*/false);		OperandValToReplace->printAsOperand(OS, /*PrintType=*/false);

for (const Loop *PIL : PostIncLoops) {		for (const Loop *PIL : PostIncLoops) {
OS << ", PostIncLoop=";		OS << ", PostIncLoop=";
PIL->getHeader()->printAsOperand(OS, /*PrintType=*/false);		PIL->getHeader()->printAsOperand(OS, /*PrintType=*/false);
}		}

if (LUIdx != ~size_t(0))
OS << ", LUIdx=" << LUIdx;

if (Offset != 0)		if (Offset != 0)
OS << ", Offset=" << Offset;		OS << ", Offset=" << Offset;
}		}

LLVM_DUMP_METHOD		LLVM_DUMP_METHOD
void LSRFixup::dump() const {		void LSRFixup::dump() const {
print(errs()); errs() << '\n';		print(errs()); errs() << '\n';
}		}

namespace {

/// A DenseMapInfo implementation for holding DenseMaps and DenseSets of sorted
/// SmallVectors of const SCEV*.
struct UniquifierDenseMapInfo {
static SmallVector<const SCEV *, 4> getEmptyKey() {
SmallVector<const SCEV *, 4> V;
V.push_back(reinterpret_cast<const SCEV *>(-1));
return V;
}

static SmallVector<const SCEV *, 4> getTombstoneKey() {
SmallVector<const SCEV *, 4> V;
V.push_back(reinterpret_cast<const SCEV *>(-2));
return V;
}

static unsigned getHashValue(const SmallVector<const SCEV *, 4> &V) {
return static_cast<unsigned>(hash_combine_range(V.begin(), V.end()));
}

static bool isEqual(const SmallVector<const SCEV *, 4> &LHS,
const SmallVector<const SCEV *, 4> &RHS) {
return LHS == RHS;
}
};

/// This class holds the state that LSR keeps for each use in IVUsers, as well
/// as uses invented by LSR itself. It includes information about what kinds of
/// things can be folded into the user, information about the user itself, and
/// information about how the use may be satisfied. TODO: Represent multiple
/// users of the same expression in common?
class LSRUse {
DenseSet<SmallVector<const SCEV *, 4>, UniquifierDenseMapInfo> Uniquifier;

public:
/// An enum for a kind of use, indicating what types of scaled and immediate
/// operands it might support.
enum KindType {
Basic, ///< A normal use, with no folding.
Special, ///< A special case of basic, allowing -1 scales.
Address, ///< An address use; folding according to TargetLowering
ICmpZero ///< An equality icmp with both operands folded into one.
// TODO: Add a generic icmp too?
};

typedef PointerIntPair<const SCEV *, 2, KindType> SCEVUseKindPair;

KindType Kind;
MemAccessTy AccessTy;

SmallVector<int64_t, 8> Offsets;
int64_t MinOffset;
int64_t MaxOffset;

/// This records whether all of the fixups using this LSRUse are outside of
/// the loop, in which case some special-case heuristics may be used.
bool AllFixupsOutsideLoop;

/// RigidFormula is set to true to guarantee that this use will be associated
/// with a single formula--the one that initially matched. Some SCEV
/// expressions cannot be expanded. This allows LSR to consider the registers
/// used by those expressions without the need to expand them later after
/// changing the formula.
bool RigidFormula;

/// This records the widest use type for any fixup using this
/// LSRUse. FindUseWithSimilarFormula can't consider uses with different max
/// fixup widths to be equivalent, because the narrower one may be relying on
/// the implicit truncation to truncate away bogus bits.
Type *WidestFixupType;

/// A list of ways to build a value that can satisfy this user. After the
/// list is populated, one of these is selected heuristically and used to
/// formulate a replacement for OperandValToReplace in UserInst.
SmallVector<Formula, 12> Formulae;

/// The set of register candidates used by all formulae in this LSRUse.
SmallPtrSet<const SCEV *, 4> Regs;

LSRUse(KindType K, MemAccessTy AT)
: Kind(K), AccessTy(AT), MinOffset(INT64_MAX), MaxOffset(INT64_MIN),
AllFixupsOutsideLoop(true), RigidFormula(false),
WidestFixupType(nullptr) {}

bool HasFormulaWithSameRegs(const Formula &F) const;
bool InsertFormula(const Formula &F);
void DeleteFormula(Formula &F);
void RecomputeRegs(size_t LUIdx, RegUseTracker &Reguses);

void print(raw_ostream &OS) const;
void dump() const;
};

}

/// Test whether this use has a formula which has the same registers as the given		/// Test whether this use has a formula which has the same registers as the given
/// formula.		/// formula.
bool LSRUse::HasFormulaWithSameRegs(const Formula &F) const {		bool LSRUse::HasFormulaWithSameRegs(const Formula &F) const {
SmallVector<const SCEV *, 4> Key = F.BaseRegs;		SmallVector<const SCEV *, 4> Key = F.BaseRegs;
if (F.ScaledReg) Key.push_back(F.ScaledReg);		if (F.ScaledReg) Key.push_back(F.ScaledReg);
// Unstable sort by host order ok, because this is only used for uniquifying.		// Unstable sort by host order ok, because this is only used for uniquifying.
std::sort(Key.begin(), Key.end());		std::sort(Key.begin(), Key.end());
return Uniquifier.count(Key);		return Uniquifier.count(Key);
[71 unchanged lines hidden]	else {
OS << *AccessTy.MemTy;		OS << *AccessTy.MemTy;
}		}

OS << " in addrspace(" << AccessTy.AddrSpace << ')';		OS << " in addrspace(" << AccessTy.AddrSpace << ')';
}		}

OS << ", Offsets={";		OS << ", Offsets={";
bool NeedComma = false;		bool NeedComma = false;
for (int64_t O : Offsets) {		for (const LSRFixup &Fixup : Fixups) {
if (NeedComma) OS << ',';		if (NeedComma) OS << ',';
OS << O;		OS << Fixup.Offset;
NeedComma = true;		NeedComma = true;
}		}
OS << '}';		OS << '}';

if (AllFixupsOutsideLoop)		if (AllFixupsOutsideLoop)
OS << ", all-fixups-outside-loop";		OS << ", all-fixups-outside-loop";

if (WidestFixupType)		if (WidestFixupType)
[290 unchanged lines hidden]	class LSRInstance {
Instruction *IVIncInsertPos;		Instruction *IVIncInsertPos;

/// Interesting factors between use strides.		/// Interesting factors between use strides.
SmallSetVector<int64_t, 8> Factors;		SmallSetVector<int64_t, 8> Factors;

/// Interesting use types, to facilitate truncation reuse.		/// Interesting use types, to facilitate truncation reuse.
SmallSetVector<Type *, 4> Types;		SmallSetVector<Type *, 4> Types;

/// The list of operands which are to be replaced.
SmallVector<LSRFixup, 16> Fixups;

/// The list of interesting uses.		/// The list of interesting uses.
SmallVector<LSRUse, 16> Uses;		SmallVector<LSRUse, 16> Uses;

/// Track which uses use which register candidates.		/// Track which uses use which register candidates.
RegUseTracker RegUses;		RegUseTracker RegUses;

// Limit the number of chains to avoid quadratic behavior. We don't expect to		// Limit the number of chains to avoid quadratic behavior. We don't expect to
// have more than a few IV increment chains in a loop. Missing a Chain falls		// have more than a few IV increment chains in a loop. Missing a Chain falls
[16 unchanged lines hidden]	class LSRInstance {
void FinalizeChain(IVChain &Chain);		void FinalizeChain(IVChain &Chain);
void CollectChains();		void CollectChains();
void GenerateIVChain(const IVChain &Chain, SCEVExpander &Rewriter,		void GenerateIVChain(const IVChain &Chain, SCEVExpander &Rewriter,
SmallVectorImpl<WeakVH> &DeadInsts);		SmallVectorImpl<WeakVH> &DeadInsts);

void CollectInterestingTypesAndFactors();		void CollectInterestingTypesAndFactors();
void CollectFixupsAndInitialFormulae();		void CollectFixupsAndInitialFormulae();

LSRFixup &getNewFixup() {
Fixups.push_back(LSRFixup());
return Fixups.back();
}

// Support for sharing of LSRUses between LSRFixups.		// Support for sharing of LSRUses between LSRFixups.
typedef DenseMap<LSRUse::SCEVUseKindPair, size_t> UseMapTy;		typedef DenseMap<LSRUse::SCEVUseKindPair, size_t> UseMapTy;
UseMapTy UseMap;		UseMapTy UseMap;

bool reconcileNewOffset(LSRUse &LU, int64_t NewOffset, bool HasBaseReg,		bool reconcileNewOffset(LSRUse &LU, int64_t NewOffset, bool HasBaseReg,
LSRUse::KindType Kind, MemAccessTy AccessTy);		LSRUse::KindType Kind, MemAccessTy AccessTy);

std::pair<size_t, int64_t> getUse(const SCEV *&Expr, LSRUse::KindType Kind,		std::pair<size_t, int64_t> getUse(const SCEV *&Expr, LSRUse::KindType Kind,
[53 unchanged lines hidden]	BasicBlock::iterator
HoistInsertPosition(BasicBlock::iterator IP,		HoistInsertPosition(BasicBlock::iterator IP,
const SmallVectorImpl<Instruction *> &Inputs) const;		const SmallVectorImpl<Instruction *> &Inputs) const;
BasicBlock::iterator		BasicBlock::iterator
AdjustInsertPositionForExpand(BasicBlock::iterator IP,		AdjustInsertPositionForExpand(BasicBlock::iterator IP,
const LSRFixup &LF,		const LSRFixup &LF,
const LSRUse &LU,		const LSRUse &LU,
SCEVExpander &Rewriter) const;		SCEVExpander &Rewriter) const;

Value *Expand(const LSRFixup &LF,		Value *Expand(const LSRUse &LU, const LSRFixup &LF,
const Formula &F,		const Formula &F,
BasicBlock::iterator IP,		BasicBlock::iterator IP,
SCEVExpander &Rewriter,		SCEVExpander &Rewriter,
SmallVectorImpl<WeakVH> &DeadInsts) const;		SmallVectorImpl<WeakVH> &DeadInsts) const;
void RewriteForPHI(PHINode *PN, const LSRFixup &LF,		void RewriteForPHI(PHINode *PN, const LSRUse &LU, const LSRFixup &LF,
const Formula &F,		const Formula &F,
SCEVExpander &Rewriter,		SCEVExpander &Rewriter,
SmallVectorImpl<WeakVH> &DeadInsts) const;		SmallVectorImpl<WeakVH> &DeadInsts) const;
void Rewrite(const LSRFixup &LF,		void Rewrite(const LSRUse &LU, const LSRFixup &LF,
const Formula &F,		const Formula &F,
SCEVExpander &Rewriter,		SCEVExpander &Rewriter,
SmallVectorImpl<WeakVH> &DeadInsts) const;		SmallVectorImpl<WeakVH> &DeadInsts) const;
void ImplementSolution(const SmallVectorImpl<const Formula *> &Solution);		void ImplementSolution(const SmallVectorImpl<const Formula *> &Solution);

public:		public:
LSRInstance(Loop *L, IVUsers &IU, ScalarEvolution &SE, DominatorTree &DT,		LSRInstance(Loop *L, IVUsers &IU, ScalarEvolution &SE, DominatorTree &DT,
LoopInfo &LI, const TargetTransformInfo &TTI);		LoopInfo &LI, const TargetTransformInfo &TTI);
[463 unchanged lines hidden]	if (!isAlwaysFoldable(TTI, Kind, NewAccessTy, /*BaseGV=*/nullptr,
return false;		return false;
NewMaxOffset = NewOffset;		NewMaxOffset = NewOffset;
}		}

// Update the use.		// Update the use.
LU.MinOffset = NewMinOffset;		LU.MinOffset = NewMinOffset;
LU.MaxOffset = NewMaxOffset;		LU.MaxOffset = NewMaxOffset;
LU.AccessTy = NewAccessTy;		LU.AccessTy = NewAccessTy;
if (NewOffset != LU.Offsets.back())
LU.Offsets.push_back(NewOffset);
return true;		return true;
}		}

/// Return an LSRUse index and an offset value for a fixup which needs the given		/// Return an LSRUse index and an offset value for a fixup which needs the given
/// expression, with the given kind and optional access type. Either reuse an		/// expression, with the given kind and optional access type. Either reuse an
/// existing use or create a new one, as needed.		/// existing use or create a new one, as needed.
std::pair<size_t, int64_t> LSRInstance::getUse(const SCEV *&Expr,		std::pair<size_t, int64_t> LSRInstance::getUse(const SCEV *&Expr,
LSRUse::KindType Kind,		LSRUse::KindType Kind,
[20 unchanged lines hidden]	std::pair<size_t, int64_t> LSRInstance::getUse(const SCEV *&Expr,
}		}

// Create a new use.		// Create a new use.
size_t LUIdx = Uses.size();		size_t LUIdx = Uses.size();
P.first->second = LUIdx;		P.first->second = LUIdx;
Uses.push_back(LSRUse(Kind, AccessTy));		Uses.push_back(LSRUse(Kind, AccessTy));
LSRUse &LU = Uses[LUIdx];		LSRUse &LU = Uses[LUIdx];

// We don't need to track redundant offsets, but we don't need to go out
// of our way here to avoid them.
if (LU.Offsets.empty() || Offset != LU.Offsets.back())
LU.Offsets.push_back(Offset);

LU.MinOffset = Offset;		LU.MinOffset = Offset;
LU.MaxOffset = Offset;		LU.MaxOffset = Offset;
return std::make_pair(LUIdx, Offset);		return std::make_pair(LUIdx, Offset);
}		}

/// Delete the given use from the Uses list.		/// Delete the given use from the Uses list.
void LSRInstance::DeleteUse(LSRUse &LU, size_t LUIdx) {		void LSRInstance::DeleteUse(LSRUse &LU, size_t LUIdx) {
if (&LU != &Uses.back())		if (&LU != &Uses.back())
[638 unchanged lines hidden]	for (const IVStrideUse &U : IU) {
Instruction *UserInst = U.getUser();		Instruction *UserInst = U.getUser();
// Skip IV users that are part of profitable IV Chains.		// Skip IV users that are part of profitable IV Chains.
User::op_iterator UseI = std::find(UserInst->op_begin(), UserInst->op_end(),		User::op_iterator UseI = std::find(UserInst->op_begin(), UserInst->op_end(),
U.getOperandValToReplace());		U.getOperandValToReplace());
assert(UseI != UserInst->op_end() && "cannot find IV operand");		assert(UseI != UserInst->op_end() && "cannot find IV operand");
if (IVIncSet.count(UseI))		if (IVIncSet.count(UseI))
continue;		continue;

// Record the uses.
LSRFixup &LF = getNewFixup();
LF.UserInst = UserInst;
LF.OperandValToReplace = U.getOperandValToReplace();
LF.PostIncLoops = U.getPostIncLoops();

LSRUse::KindType Kind = LSRUse::Basic;		LSRUse::KindType Kind = LSRUse::Basic;
MemAccessTy AccessTy;		MemAccessTy AccessTy;
if (isAddressUse(LF.UserInst, LF.OperandValToReplace)) {		if (isAddressUse(UserInst, U.getOperandValToReplace())) {
Kind = LSRUse::Address;		Kind = LSRUse::Address;
AccessTy = getAccessType(LF.UserInst);		AccessTy = getAccessType(UserInst);
}		}

const SCEV *S = IU.getExpr(U);		const SCEV *S = IU.getExpr(U);
		PostIncLoopSet TmpPostIncLoops = U.getPostIncLoops();

// Equality (== and !=) ICmps are special. We can rewrite (i == N) as		// Equality (== and !=) ICmps are special. We can rewrite (i == N) as
// (N - i == 0), and this allows (N - i) to be the expression that we work		// (N - i == 0), and this allows (N - i) to be the expression that we work
// with rather than just N or i, so we can consider the register		// with rather than just N or i, so we can consider the register
// requirements for both N and i at the same time. Limiting this code to		// requirements for both N and i at the same time. Limiting this code to
// equality icmps is not a problem because all interesting loops use		// equality icmps is not a problem because all interesting loops use
// equality icmps, thanks to IndVarSimplify.		// equality icmps, thanks to IndVarSimplify.
if (ICmpInst *CI = dyn_cast<ICmpInst>(LF.UserInst))		if (ICmpInst *CI = dyn_cast<ICmpInst>(UserInst))
if (CI->isEquality()) {		if (CI->isEquality()) {
// Swap the operands if needed to put the OperandValToReplace on the		// Swap the operands if needed to put the OperandValToReplace on the
// left, for consistency.		// left, for consistency.
Value *NV = CI->getOperand(1);		Value *NV = CI->getOperand(1);
if (NV == LF.OperandValToReplace) {		if (NV == U.getOperandValToReplace()) {
CI->setOperand(1, CI->getOperand(0));		CI->setOperand(1, CI->getOperand(0));
CI->setOperand(0, NV);		CI->setOperand(0, NV);
NV = CI->getOperand(1);		NV = CI->getOperand(1);
Changed = true;		Changed = true;
}		}

// x == y --> x - y == 0		// x == y --> x - y == 0
const SCEV *N = SE.getSCEV(NV);		const SCEV *N = SE.getSCEV(NV);
if (SE.isLoopInvariant(N, L) && isSafeToExpand(N, SE)) {		if (SE.isLoopInvariant(N, L) && isSafeToExpand(N, SE)) {
// S is normalized, so normalize N before folding it into S		// S is normalized, so normalize N before folding it into S
// to keep the result normalized.		// to keep the result normalized.
N = TransformForPostIncUse(Normalize, N, CI, nullptr,		N = TransformForPostIncUse(Normalize, N, CI, nullptr,
LF.PostIncLoops, SE, DT);		TmpPostIncLoops, SE, DT);
Kind = LSRUse::ICmpZero;		Kind = LSRUse::ICmpZero;
S = SE.getMinusSCEV(N, S);		S = SE.getMinusSCEV(N, S);
}		}

// -1 and the negations of all interesting strides (except the negation		// -1 and the negations of all interesting strides (except the negation
// of -1) are now also interesting.		// of -1) are now also interesting.
for (size_t i = 0, e = Factors.size(); i != e; ++i)		for (size_t i = 0, e = Factors.size(); i != e; ++i)
if (Factors[i] != -1)		if (Factors[i] != -1)
Factors.insert(-(uint64_t)Factors[i]);		Factors.insert(-(uint64_t)Factors[i]);
Factors.insert(-1);		Factors.insert(-1);
}		}

// Set up the initial formula for this use.		// Get or create an LSRUse.
std::pair<size_t, int64_t> P = getUse(S, Kind, AccessTy);		std::pair<size_t, int64_t> P = getUse(S, Kind, AccessTy);
LF.LUIdx = P.first;		size_t LUIdx = P.first;
LF.Offset = P.second;		int64_t Offset = P.second;
LSRUse &LU = Uses[LF.LUIdx];		LSRUse &LU = Uses[LUIdx];

		// Record the fixup.
		LSRFixup &LF = LU.getNewFixup();
		LF.UserInst = UserInst;
		LF.OperandValToReplace = U.getOperandValToReplace();
		LF.PostIncLoops = TmpPostIncLoops;
		LF.Offset = Offset;
LU.AllFixupsOutsideLoop &= LF.isUseFullyOutsideLoop(L);		LU.AllFixupsOutsideLoop &= LF.isUseFullyOutsideLoop(L);

if (!LU.WidestFixupType ||		if (!LU.WidestFixupType ||
SE.getTypeSizeInBits(LU.WidestFixupType) <		SE.getTypeSizeInBits(LU.WidestFixupType) <
SE.getTypeSizeInBits(LF.OperandValToReplace->getType()))		SE.getTypeSizeInBits(LF.OperandValToReplace->getType()))
LU.WidestFixupType = LF.OperandValToReplace->getType();		LU.WidestFixupType = LF.OperandValToReplace->getType();

// If this is the first use of this LSRUse, give it a formula.		// If this is the first use of this LSRUse, give it a formula.
if (LU.Formulae.empty()) {		if (LU.Formulae.empty()) {
InsertInitialFormula(S, LU, LF.LUIdx);		InsertInitialFormula(S, LU, LUIdx);
CountRegisters(LU.Formulae.back(), LF.LUIdx);		CountRegisters(LU.Formulae.back(), LUIdx);
}		}
}		}

DEBUG(print_fixups(dbgs()));		DEBUG(print_fixups(dbgs()));
}		}

/// Insert a formula for the given expression into the given use, separating out		/// Insert a formula for the given expression into the given use, separating out
/// loop-variant portions from loop-invariant and loop-computable portions.		/// loop-variant portions from loop-invariant and loop-computable portions.
[109 unchanged lines hidden]	else if (const SCEVUDivExpr *D = dyn_cast<SCEVUDivExpr>(S)) {
// Ignore icmp instructions which are already being analyzed.		// Ignore icmp instructions which are already being analyzed.
if (const ICmpInst *ICI = dyn_cast<ICmpInst>(UserInst)) {		if (const ICmpInst *ICI = dyn_cast<ICmpInst>(UserInst)) {
unsigned OtherIdx = !U.getOperandNo();		unsigned OtherIdx = !U.getOperandNo();
Value *OtherOp = const_cast<Value *>(ICI->getOperand(OtherIdx));		Value *OtherOp = const_cast<Value *>(ICI->getOperand(OtherIdx));
if (SE.hasComputableLoopEvolution(SE.getSCEV(OtherOp), L))		if (SE.hasComputableLoopEvolution(SE.getSCEV(OtherOp), L))
continue;		continue;
}		}

LSRFixup &LF = getNewFixup();
LF.UserInst = const_cast<Instruction *>(UserInst);
LF.OperandValToReplace = U;
std::pair<size_t, int64_t> P = getUse(		std::pair<size_t, int64_t> P = getUse(
S, LSRUse::Basic, MemAccessTy());		S, LSRUse::Basic, MemAccessTy());
LF.LUIdx = P.first;		size_t LUIdx = P.first;
LF.Offset = P.second;		int64_t Offset = P.second;
LSRUse &LU = Uses[LF.LUIdx];		LSRUse &LU = Uses[LUIdx];
		LSRFixup &LF = LU.getNewFixup();
		LF.UserInst = const_cast<Instruction *>(UserInst);
		LF.OperandValToReplace = U;
		LF.Offset = Offset;
LU.AllFixupsOutsideLoop &= LF.isUseFullyOutsideLoop(L);		LU.AllFixupsOutsideLoop &= LF.isUseFullyOutsideLoop(L);
if (!LU.WidestFixupType ||		if (!LU.WidestFixupType ||
SE.getTypeSizeInBits(LU.WidestFixupType) <		SE.getTypeSizeInBits(LU.WidestFixupType) <
SE.getTypeSizeInBits(LF.OperandValToReplace->getType()))		SE.getTypeSizeInBits(LF.OperandValToReplace->getType()))
LU.WidestFixupType = LF.OperandValToReplace->getType();		LU.WidestFixupType = LF.OperandValToReplace->getType();
InsertSupplementalFormula(US, LU, LF.LUIdx);		InsertSupplementalFormula(US, LU, LUIdx);
CountRegisters(LU.Formulae.back(), Uses.size() - 1);		CountRegisters(LU.Formulae.back(), Uses.size() - 1);
break;		break;
}		}
}		}
}		}
}		}

/// Split S into subexpressions which can be pulled out into separate		/// Split S into subexpressions which can be pulled out into separate
[712 unchanged lines hidden]	for (size_t FIdx = 0, NumForms = LU.Formulae.size();
// nonexistent AddRecs from other loops. These need to be filtered		// nonexistent AddRecs from other loops. These need to be filtered
// immediately, otherwise heuristics could choose them over others leading		// immediately, otherwise heuristics could choose them over others leading
// to an unsatisfactory solution. Passing LoserRegs into RateFormula here		// to an unsatisfactory solution. Passing LoserRegs into RateFormula here
// avoids the need to recompute this information across formulae using the		// avoids the need to recompute this information across formulae using the
// same bad AddRec. Passing LoserRegs is also essential unless we remove		// same bad AddRec. Passing LoserRegs is also essential unless we remove
// the corresponding bad register from the Regs set.		// the corresponding bad register from the Regs set.
Cost CostF;		Cost CostF;
Regs.clear();		Regs.clear();
CostF.RateFormula(TTI, F, Regs, VisitedRegs, L, LU.Offsets, SE, DT, LU,		CostF.RateFormula(TTI, F, Regs, VisitedRegs, L, SE, DT, LU, &LoserRegs);
&LoserRegs);
if (CostF.isLoser()) {		if (CostF.isLoser()) {
// During initial formula generation, undesirable formulae are generated		// During initial formula generation, undesirable formulae are generated
// by uses within other loops that have some non-trivial address mode or		// by uses within other loops that have some non-trivial address mode or
// use the postinc form of the IV. LSR needs to provide these formulae		// use the postinc form of the IV. LSR needs to provide these formulae
// as the basis of rediscovering the desired formula that uses an AddRec		// as the basis of rediscovering the desired formula that uses an AddRec
// corresponding to the existing phi. Once all formulae have been		// corresponding to the existing phi. Once all formulae have been
// generated, these initial losers may be pruned.		// generated, these initial losers may be pruned.
DEBUG(dbgs() << " Filtering loser "; F.print(dbgs());		DEBUG(dbgs() << " Filtering loser "; F.print(dbgs());
[16 unchanged lines hidden]	for (size_t FIdx = 0, NumForms = LU.Formulae.size();
BestFormulae.insert(std::make_pair(Key, FIdx));		BestFormulae.insert(std::make_pair(Key, FIdx));
if (P.second)		if (P.second)
continue;		continue;

Formula &Best = LU.Formulae[P.first->second];		Formula &Best = LU.Formulae[P.first->second];

Cost CostBest;		Cost CostBest;
Regs.clear();		Regs.clear();
CostBest.RateFormula(TTI, Best, Regs, VisitedRegs, L, LU.Offsets, SE,		CostBest.RateFormula(TTI, Best, Regs, VisitedRegs, L, SE, DT, LU);
DT, LU);
if (CostF < CostBest)		if (CostF < CostBest)
std::swap(F, Best);		std::swap(F, Best);
DEBUG(dbgs() << " Filtering out formula "; F.print(dbgs());		DEBUG(dbgs() << " Filtering out formula "; F.print(dbgs());
dbgs() << "\n"		dbgs() << "\n"
" in favor of formula "; Best.print(dbgs());		" in favor of formula "; Best.print(dbgs());
dbgs() << '\n');		dbgs() << '\n');
}		}
#ifndef NDEBUG		#ifndef NDEBUG
... (129 lines not shown) ...	for (const Formula &F : LU.Formulae) {
if (!reconcileNewOffset(LUThatHas, F.BaseOffset, /*HasBaseReg=*/ false,		if (!reconcileNewOffset(LUThatHas, F.BaseOffset, /*HasBaseReg=*/ false,
LU.Kind, LU.AccessTy))		LU.Kind, LU.AccessTy))
continue;		continue;

DEBUG(dbgs() << " Deleting use "; LU.print(dbgs()); dbgs() << '\n');		DEBUG(dbgs() << " Deleting use "; LU.print(dbgs()); dbgs() << '\n');

LUThatHas->AllFixupsOutsideLoop &= LU.AllFixupsOutsideLoop;		LUThatHas->AllFixupsOutsideLoop &= LU.AllFixupsOutsideLoop;

// Update the relocs to reference the new use.		// Transfer the fixups of LU to LUThatHas.
for (LSRFixup &Fixup : Fixups) {		for (LSRFixup &Fixup : LU.Fixups) {
if (Fixup.LUIdx == LUIdx) {
Fixup.LUIdx = LUThatHas - &Uses.front();
Fixup.Offset += F.BaseOffset;		Fixup.Offset += F.BaseOffset;
// Add the new offset to LUThatHas' offset list.		LUThatHas->pushFixup(Fixup);
if (LUThatHas->Offsets.back() != Fixup.Offset) {
LUThatHas->Offsets.push_back(Fixup.Offset);
if (Fixup.Offset > LUThatHas->MaxOffset)
LUThatHas->MaxOffset = Fixup.Offset;
if (Fixup.Offset < LUThatHas->MinOffset)
LUThatHas->MinOffset = Fixup.Offset;
}
DEBUG(dbgs() << "New fixup has offset " << Fixup.Offset << '\n');		DEBUG(dbgs() << "New fixup has offset " << Fixup.Offset << '\n');
}		}
if (Fixup.LUIdx == NumUses-1)
Fixup.LUIdx = LUIdx;
}

// Delete formulae from the new use which are no longer legal.		// Delete formulae from the new use which are no longer legal.
bool Any = false;		bool Any = false;
for (size_t i = 0, e = LUThatHas->Formulae.size(); i != e; ++i) {		for (size_t i = 0, e = LUThatHas->Formulae.size(); i != e; ++i) {
Formula &F = LUThatHas->Formulae[i];		Formula &F = LUThatHas->Formulae[i];
if (!isLegalUse(TTI, LUThatHas->MinOffset, LUThatHas->MaxOffset,		if (!isLegalUse(TTI, LUThatHas->MinOffset, LUThatHas->MaxOffset,
LUThatHas->Kind, LUThatHas->AccessTy, F)) {		LUThatHas->Kind, LUThatHas->AccessTy, F)) {
DEBUG(dbgs() << " Deleting "; F.print(dbgs());		DEBUG(dbgs() << " Deleting "; F.print(dbgs());
dbgs() << '\n');		dbgs() << '\n');
... (158 lines not shown) ...	if (NumReqRegsToFind != 0) {
// clear ReqRegs and try again. Currently, we simply give up in this case.		// clear ReqRegs and try again. Currently, we simply give up in this case.
continue;		continue;
}		}

// Evaluate the cost of the current formula. If it's already worse than		// Evaluate the cost of the current formula. If it's already worse than
// the current best, prune the search at that point.		// the current best, prune the search at that point.
NewCost = CurCost;		NewCost = CurCost;
NewRegs = CurRegs;		NewRegs = CurRegs;
NewCost.RateFormula(TTI, F, NewRegs, VisitedRegs, L, LU.Offsets, SE, DT,		NewCost.RateFormula(TTI, F, NewRegs, VisitedRegs, L, SE, DT, LU);
LU);
if (NewCost < SolutionCost) {		if (NewCost < SolutionCost) {
Workspace.push_back(&F);		Workspace.push_back(&F);
if (Workspace.size() != Uses.size()) {		if (Workspace.size() != Uses.size()) {
SolveRecurse(Solution, SolutionCost, Workspace, NewCost,		SolveRecurse(Solution, SolutionCost, Workspace, NewCost,
NewRegs, VisitedRegs);		NewRegs, VisitedRegs);
if (F.getNumRegs() == 1 && Workspace.size() == 1)		if (F.getNumRegs() == 1 && Workspace.size() == 1)
VisitedRegs.insert(F.ScaledReg ? F.ScaledReg : F.BaseRegs[0]);		VisitedRegs.insert(F.ScaledReg ? F.ScaledReg : F.BaseRegs[0]);
} else {		} else {
... (166 lines not shown) ...	LSRInstance::AdjustInsertPositionForExpand(BasicBlock::iterator LowestIP,
while (Rewriter.isInsertedInstruction(&*IP) && IP != LowestIP)		while (Rewriter.isInsertedInstruction(&*IP) && IP != LowestIP)
++IP;		++IP;

return IP;		return IP;
}		}

/// Emit instructions for the leading candidate expression for this LSRUse (this		/// Emit instructions for the leading candidate expression for this LSRUse (this
/// is called "expanding").		/// is called "expanding").
Value *LSRInstance::Expand(const LSRFixup &LF,		Value *LSRInstance::Expand(const LSRUse &LU,
		const LSRFixup &LF,
const Formula &F,		const Formula &F,
BasicBlock::iterator IP,		BasicBlock::iterator IP,
SCEVExpander &Rewriter,		SCEVExpander &Rewriter,
SmallVectorImpl<WeakVH> &DeadInsts) const {		SmallVectorImpl<WeakVH> &DeadInsts) const {
const LSRUse &LU = Uses[LF.LUIdx];
if (LU.RigidFormula)		if (LU.RigidFormula)
return LF.OperandValToReplace;		return LF.OperandValToReplace;

// Determine an input position which will be dominated by the operands and		// Determine an input position which will be dominated by the operands and
// which will dominate the result.		// which will dominate the result.
IP = AdjustInsertPositionForExpand(IP, LF, LU, Rewriter);		IP = AdjustInsertPositionForExpand(IP, LF, LU, Rewriter);

// Inform the Rewriter if we have a post-increment use, so that it can		// Inform the Rewriter if we have a post-increment use, so that it can
... (164 lines not shown) ...	Value *LSRInstance::Expand(const LSRUse &LU,

return FullV;		return FullV;
}		}

/// Helper for Rewrite. PHI nodes are special because the use of their operands		/// Helper for Rewrite. PHI nodes are special because the use of their operands
/// effectively happens in their predecessor blocks, so the expression may need		/// effectively happens in their predecessor blocks, so the expression may need
/// to be expanded in multiple places.		/// to be expanded in multiple places.
void LSRInstance::RewriteForPHI(PHINode *PN,		void LSRInstance::RewriteForPHI(PHINode *PN,
		const LSRUse &LU,
const LSRFixup &LF,		const LSRFixup &LF,
const Formula &F,		const Formula &F,
SCEVExpander &Rewriter,		SCEVExpander &Rewriter,
SmallVectorImpl<WeakVH> &DeadInsts) const {		SmallVectorImpl<WeakVH> &DeadInsts) const {
DenseMap<BasicBlock *, Value *> Inserted;		DenseMap<BasicBlock *, Value *> Inserted;
for (unsigned i = 0, e = PN->getNumIncomingValues(); i != e; ++i)		for (unsigned i = 0, e = PN->getNumIncomingValues(); i != e; ++i)
if (PN->getIncomingValue(i) == LF.OperandValToReplace) {		if (PN->getIncomingValue(i) == LF.OperandValToReplace) {
BasicBlock *BB = PN->getIncomingBlock(i);		BasicBlock *BB = PN->getIncomingBlock(i);
... (37 lines not shown) ...	if (PN->getIncomingValue(i) == LF.OperandValToReplace) {
}		}
}		}

std::pair<DenseMap<BasicBlock *, Value *>::iterator, bool> Pair =		std::pair<DenseMap<BasicBlock *, Value *>::iterator, bool> Pair =
Inserted.insert(std::make_pair(BB, static_cast<Value *>(nullptr)));		Inserted.insert(std::make_pair(BB, static_cast<Value *>(nullptr)));
if (!Pair.second)		if (!Pair.second)
PN->setIncomingValue(i, Pair.first->second);		PN->setIncomingValue(i, Pair.first->second);
else {		else {
Value *FullV = Expand(LF, F, BB->getTerminator()->getIterator(),		Value *FullV = Expand(LU, LF, F, BB->getTerminator()->getIterator(),
Rewriter, DeadInsts);		Rewriter, DeadInsts);

// If this is reuse-by-noop-cast, insert the noop cast.		// If this is reuse-by-noop-cast, insert the noop cast.
Type *OpTy = LF.OperandValToReplace->getType();		Type *OpTy = LF.OperandValToReplace->getType();
if (FullV->getType() != OpTy)		if (FullV->getType() != OpTy)
FullV =		FullV =
CastInst::Create(CastInst::getCastOpcode(FullV, false,		CastInst::Create(CastInst::getCastOpcode(FullV, false,
OpTy, false),		OpTy, false),
FullV, LF.OperandValToReplace->getType(),		FullV, LF.OperandValToReplace->getType(),
"tmp", BB->getTerminator());		"tmp", BB->getTerminator());

PN->setIncomingValue(i, FullV);		PN->setIncomingValue(i, FullV);
Pair.first->second = FullV;		Pair.first->second = FullV;
}		}
}		}
}		}

/// Emit instructions for the leading candidate expression for this LSRUse (this		/// Emit instructions for the leading candidate expression for this LSRUse (this
/// is called "expanding"), and update the UserInst to reference the newly		/// is called "expanding"), and update the UserInst to reference the newly
/// expanded value.		/// expanded value.
void LSRInstance::Rewrite(const LSRFixup &LF,		void LSRInstance::Rewrite(const LSRUse &LU,
		const LSRFixup &LF,
const Formula &F,		const Formula &F,
SCEVExpander &Rewriter,		SCEVExpander &Rewriter,
SmallVectorImpl<WeakVH> &DeadInsts) const {		SmallVectorImpl<WeakVH> &DeadInsts) const {
// First, find an insertion point that dominates UserInst. For PHI nodes,		// First, find an insertion point that dominates UserInst. For PHI nodes,
// find the nearest block which dominates all the relevant uses.		// find the nearest block which dominates all the relevant uses.
if (PHINode *PN = dyn_cast<PHINode>(LF.UserInst)) {		if (PHINode *PN = dyn_cast<PHINode>(LF.UserInst)) {
RewriteForPHI(PN, LF, F, Rewriter, DeadInsts);		RewriteForPHI(PN, LU, LF, F, Rewriter, DeadInsts);
} else {		} else {
Value *FullV =		Value *FullV =
Expand(LF, F, LF.UserInst->getIterator(), Rewriter, DeadInsts);		Expand(LU, LF, F, LF.UserInst->getIterator(), Rewriter, DeadInsts);

// If this is reuse-by-noop-cast, insert the noop cast.		// If this is reuse-by-noop-cast, insert the noop cast.
Type *OpTy = LF.OperandValToReplace->getType();		Type *OpTy = LF.OperandValToReplace->getType();
if (FullV->getType() != OpTy) {		if (FullV->getType() != OpTy) {
Instruction *Cast =		Instruction *Cast =
CastInst::Create(CastInst::getCastOpcode(FullV, false, OpTy, false),		CastInst::Create(CastInst::getCastOpcode(FullV, false, OpTy, false),
FullV, OpTy, "tmp", LF.UserInst);		FullV, OpTy, "tmp", LF.UserInst);
FullV = Cast;		FullV = Cast;
}		}

// Update the user. ICmpZero is handled specially here (for now) because		// Update the user. ICmpZero is handled specially here (for now) because
// Expand may have updated one of the operands of the icmp already, and		// Expand may have updated one of the operands of the icmp already, and
// its new value may happen to be equal to LF.OperandValToReplace, in		// its new value may happen to be equal to LF.OperandValToReplace, in
// which case doing replaceUsesOfWith leads to replacing both operands		// which case doing replaceUsesOfWith leads to replacing both operands
// with the same value. TODO: Reorganize this.		// with the same value. TODO: Reorganize this.
if (Uses[LF.LUIdx].Kind == LSRUse::ICmpZero)		if (LU.Kind == LSRUse::ICmpZero)
LF.UserInst->setOperand(0, FullV);		LF.UserInst->setOperand(0, FullV);
else		else
LF.UserInst->replaceUsesOfWith(LF.OperandValToReplace, FullV);		LF.UserInst->replaceUsesOfWith(LF.OperandValToReplace, FullV);
}		}

DeadInsts.emplace_back(LF.OperandValToReplace);		DeadInsts.emplace_back(LF.OperandValToReplace);
}		}

... (16 lines not shown) ...	#endif

// Mark phi nodes that terminate chains so the expander tries to reuse them.		// Mark phi nodes that terminate chains so the expander tries to reuse them.
for (const IVChain &Chain : IVChainVec) {		for (const IVChain &Chain : IVChainVec) {
if (PHINode *PN = dyn_cast<PHINode>(Chain.tailUserInst()))		if (PHINode *PN = dyn_cast<PHINode>(Chain.tailUserInst()))
Rewriter.setChainedPhi(PN);		Rewriter.setChainedPhi(PN);
}		}

// Expand the new value definitions and update the users.		// Expand the new value definitions and update the users.
for (const LSRFixup &Fixup : Fixups) {		for (size_t LUIdx = 0, NumUses = Uses.size(); LUIdx != NumUses; ++LUIdx)
Rewrite(Fixup, *Solution[Fixup.LUIdx], Rewriter, DeadInsts);		for (const LSRFixup &Fixup : Uses[LUIdx].Fixups) {
		Rewrite(Uses[LUIdx], Fixup, *Solution[LUIdx], Rewriter, DeadInsts);
Changed = true;		Changed = true;
}		}

for (const IVChain &Chain : IVChainVec) {		for (const IVChain &Chain : IVChainVec) {
GenerateIVChain(Chain, Rewriter, DeadInsts);		GenerateIVChain(Chain, Rewriter, DeadInsts);
Changed = true;		Changed = true;
}		}
// Clean up after ourselves. This must be done before deleting any		// Clean up after ourselves. This must be done before deleting any
// instructions.		// instructions.
Rewriter.clear();		Rewriter.clear();
... (127 lines not shown) ...	for (Type *Ty : Types) {
First = false;		First = false;
OS << '(' << *Ty << ')';		OS << '(' << *Ty << ')';
}		}
OS << '\n';		OS << '\n';
}		}

void LSRInstance::print_fixups(raw_ostream &OS) const {		void LSRInstance::print_fixups(raw_ostream &OS) const {
OS << "LSR is examining the following fixup sites:\n";		OS << "LSR is examining the following fixup sites:\n";
for (const LSRFixup &LF : Fixups) {		for (const LSRUse &LU : Uses)
		for (const LSRFixup &LF : LU.Fixups) {
dbgs() << " ";		dbgs() << " ";
LF.print(OS);		LF.print(OS);
OS << '\n';		OS << '\n';
}		}
}		}

void LSRInstance::print_uses(raw_ostream &OS) const {		void LSRInstance::print_uses(raw_ostream &OS) const {
OS << "LSR is examining the following uses:\n";		OS << "LSR is examining the following uses:\n";
for (const LSRUse &LU : Uses) {		for (const LSRUse &LU : Uses) {
dbgs() << " ";		dbgs() << " ";
LU.print(OS);		LU.print(OS);
OS << '\n';		OS << '\n';
... (129 lines not shown) ...
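
For readers following the refactoring above: the fixups now live inside each LSRUse, and merging one use into another goes through pushFixup instead of patching LUIdx values and the Offsets list by hand. The sketch below is only an illustration of that idea. SketchFixup, SketchUse and transferFixups are hypothetical, simplified stand-ins and not the LSR classes from the patch; the pushFixup body, the initial MinOffset / MaxOffset values, and the clearing of the source use are assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for LSRFixup: only the offset matters here.
struct SketchFixup {
  int64_t Offset;
};

// Hypothetical stand-in for LSRUse: it owns its fixups and tracks the
// range of offsets they cover.
struct SketchUse {
  // Assumed initial values: an empty range that any offset will widen.
  int64_t MinOffset = std::numeric_limits<int64_t>::max();
  int64_t MaxOffset = std::numeric_limits<int64_t>::min();
  std::vector<SketchFixup> Fixups;

  // Store the fixup and widen [MinOffset, MaxOffset] to include it.
  void pushFixup(const SketchFixup &F) {
    Fixups.push_back(F);
    if (F.Offset > MaxOffset)
      MaxOffset = F.Offset;
    if (F.Offset < MinOffset)
      MinOffset = F.Offset;
  }
};

// Move every fixup of From into Into, shifting each offset by BaseOffset.
// Clearing From afterwards is an assumption of this sketch; it models the
// source use being deleted once its fixups have been transferred.
inline void transferFixups(SketchUse &From, SketchUse &Into,
                           int64_t BaseOffset) {
  for (SketchFixup &F : From.Fixups) {
    F.Offset += BaseOffset;
    Into.pushFixup(F);
  }
  From.Fixups.clear();
}
```

With this layout no per-fixup use index needs to be kept in sync when a use is merged or deleted, which seems to be the reason the rewrite loop can simply iterate over Uses and then over each use's Fixups.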

test/CodeGen/SystemZ/loop-01.ll

; Test loop tuning.		; Test loop tuning.
;		;
; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z10 | FileCheck %s		; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z10 | FileCheck %s
		; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z13 \
		; RUN: | FileCheck %s -check-prefix=CHECK -check-prefix=CHECK-Z13
		qcolombet: How different is the codegen for the z13 CPU? I am asking because right now the test cases are partitioned between one target and the other with no overlap, whereas with several RUN commands in the file I would have expected at least some overlap. That is, I would expect a common prefix between both CPUs that is used for most of the tests.
		jonpa (author): The one test for z13 is for vector instructions, which only z13 supports. The other tests I added should be common to all subtargets, so I just reused the RUN command that was already present. Perhaps it should use the generic subtarget instead of z10 (which I think would be equivalent)?
		qcolombet: In that case, also add -check-prefix=CHECK to the RUN line with z13.

; Test that strength reduction is applied to addresses with a scale factor,		; Test that strength reduction is applied to addresses with a scale factor,
; but that indexed addressing can still be used.		; but that indexed addressing can still be used.
define void @f1(i32 *%dest, i32 %a) {		define void @f1(i32 *%dest, i32 %a) {
; CHECK-LABEL: f1:		; CHECK-LABEL: f1:
; CHECK-NOT: sllg		; CHECK-NOT: sllg
; CHECK: st %r3, 0({{%r[1-5],%r[1-5]}})		; CHECK: st %r3, 0({{%r[1-5],%r[1-5]}})
; CHECK: br %r14		; CHECK: br %r14
... (105 lines not shown) ...	loop.next:
%or = or i64 %shl, %and		%or = or i64 %shl, %and
store volatile i64 %or, i64 *%dest2		store volatile i64 %or, i64 *%dest2
%cont = icmp ne i64 %next, 0		%cont = icmp ne i64 %next, 0
br i1 %cont, label %loop, label %exit		br i1 %cont, label %loop, label %exit

exit:		exit:
ret void		ret void
}		}

		; Test that negative offsets are avoided for loads of floating point.
		%s.float = type { float, float, float }
		define void @f5(%s.float* nocapture %a,
		%s.float* nocapture readonly %b,
		i32 zeroext %S) {
		; CHECK-Z13-LABEL: f5:
		; CHECK-Z13-NOT: -{{[0-9]+}}(%r

		entry:
		%cmp9 = icmp eq i32 %S, 0
		br i1 %cmp9, label %for.cond.cleanup, label %for.body.preheader

		for.body.preheader: ; preds = %entry
		br label %for.body

		for.cond.cleanup.loopexit: ; preds = %for.body
		br label %for.cond.cleanup

		for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry
		ret void

		for.body: ; preds = %for.body.preheader, %for.body
		%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %for.body.preheader ]
		%a1 = getelementptr inbounds %s.float, %s.float* %b, i64 %indvars.iv, i32 0
		%tmp = load float, float* %a1, align 4
		qcolombet: Use "opt -instnamer" to get %[0-9]+ variables.
		jonpa (author): It seems to replace %[0-9]+ variable names with %tmp[1-9]* names. Is that what you want?
		qcolombet: Yes, that's what I want. [0-9]+ variables cannot be reordered or removed, tmp[0-9]+ can :).
		%b4 = getelementptr inbounds %s.float, %s.float* %b, i64 %indvars.iv, i32 1
		%tmp1 = load float, float* %b4, align 4
		%add = fadd float %tmp, %tmp1
		%c = getelementptr inbounds %s.float, %s.float* %b, i64 %indvars.iv, i32 2
		%tmp2 = load float, float* %c, align 4
		%add7 = fadd float %add, %tmp2
		%a10 = getelementptr inbounds %s.float, %s.float* %a, i64 %indvars.iv, i32 0
		store float %add7, float* %a10, align 4
		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
		%lftr.wideiv = trunc i64 %indvars.iv.next to i32
		%exitcond = icmp eq i32 %lftr.wideiv, %S
		br i1 %exitcond, label %for.cond.cleanup.loopexit, label %for.body
		}

		; Test that negative offsets are avoided for loads of double.
		%s.double = type { double, double, double }
		define void @f6(%s.double* nocapture %a,
		%s.double* nocapture readonly %b,
		i32 zeroext %S) {
		; CHECK-Z13-LABEL: f6:
		; CHECK-Z13-NOT: -{{[0-9]+}}(%r
		entry:
		%cmp9 = icmp eq i32 %S, 0
		br i1 %cmp9, label %for.cond.cleanup, label %for.body.preheader

		for.body.preheader: ; preds = %entry
		br label %for.body

		for.cond.cleanup.loopexit: ; preds = %for.body
		br label %for.cond.cleanup

		for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry
		ret void

		for.body: ; preds = %for.body.preheader, %for.body
		%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %for.body.preheader ]
		%a1 = getelementptr inbounds %s.double, %s.double* %b, i64 %indvars.iv, i32 0
		%tmp = load double, double* %a1, align 4
		%b4 = getelementptr inbounds %s.double, %s.double* %b, i64 %indvars.iv, i32 1
		%tmp1 = load double, double* %b4, align 4
		%add = fadd double %tmp, %tmp1
		%c = getelementptr inbounds %s.double, %s.double* %b, i64 %indvars.iv, i32 2
		%tmp2 = load double, double* %c, align 4
		%add7 = fadd double %add, %tmp2
		%a10 = getelementptr inbounds %s.double, %s.double* %a, i64 %indvars.iv, i32 0
		store double %add7, double* %a10, align 4
		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
		%lftr.wideiv = trunc i64 %indvars.iv.next to i32
		%exitcond = icmp eq i32 %lftr.wideiv, %S
		br i1 %exitcond, label %for.cond.cleanup.loopexit, label %for.body
		}

		; Test that negative offsets are avoided for memory accesses of vector type.
		%s.vec = type { <4 x i32>, <4 x i32>, <4 x i32> }
		define void @f7(%s.vec* nocapture %a,
		%s.vec* nocapture readonly %b,
		i32 zeroext %S) {
		; CHECK-Z13-LABEL: f7:
		; CHECK-Z13-NOT: -{{[0-9]+}}(%r
		entry:
		%cmp9 = icmp eq i32 %S, 0
		br i1 %cmp9, label %for.cond.cleanup, label %for.body.preheader

		for.body.preheader: ; preds = %entry
		br label %for.body

		for.cond.cleanup.loopexit: ; preds = %for.body
		br label %for.cond.cleanup

		for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry
		ret void

		for.body: ; preds = %for.body.preheader, %for.body
		%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %for.body.preheader ]
		%a1 = getelementptr inbounds %s.vec, %s.vec* %b, i64 %indvars.iv, i32 0
		%tmp = load <4 x i32>, <4 x i32>* %a1, align 4
		%b4 = getelementptr inbounds %s.vec, %s.vec* %b, i64 %indvars.iv, i32 1
		%tmp1 = load <4 x i32>, <4 x i32>* %b4, align 4
		%add = add <4 x i32> %tmp1, %tmp
		%c = getelementptr inbounds %s.vec, %s.vec* %b, i64 %indvars.iv, i32 2
		%tmp2 = load <4 x i32>, <4 x i32>* %c, align 4
		%add7 = add <4 x i32> %add, %tmp2
		%a10 = getelementptr inbounds %s.vec, %s.vec* %a, i64 %indvars.iv, i32 0
		store <4 x i32> %add7, <4 x i32>* %a10, align 4
		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
		%lftr.wideiv = trunc i64 %indvars.iv.next to i32
		%exitcond = icmp eq i32 %lftr.wideiv, %S
		br i1 %exitcond, label %for.cond.cleanup.loopexit, label %for.body
		}