This is an archive of the discontinued LLVM Phabricator instance.

Eliminate TargetTransformInfo::isFoldableMemAccess()
Closed, Public

Authored by jonpa on Jul 27 2017, 1:55 AM.

Details

Reviewers

uweigand
qcolombet

Summary

isLegalAddressingMode() recently gained an optional Instruction* parameter, so it can now do the job that previously only isFoldableMemAccessOffset() could do.

The SystemZ implementation of isLegalAddressingMode() now also checks offsets, which used to be done with isFoldableMemAccessOffset().

The isFoldableMemAccessOffset() hook can therefore be removed everywhere.

I used the isAMCompletelyFolded() wrapper again in LoopStrengthReduce.cpp, to avoid duplicating code just like last time.
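To illustrate the idea in the summary, here is a minimal, self-contained sketch of how a single legality query that optionally receives the user instruction can replace a separate offset hook when counting extra base adds. The types MemInst and Fixup, the function countBaseAdds, and the simplified signature of isLegalAddressingMode are hypothetical stand-ins, not the LLVM classes; the 20-bit signed and 12-bit unsigned ranges are taken from the SystemZ checks in the diff.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins; these are not the LLVM types.
struct MemInst {
  bool LongDisplacement; // can the instruction encode a 20-bit signed offset?
};
struct Fixup {
  int64_t Offset;
  const MemInst *UserInst;
};

// Toy target query: one hook that optionally inspects the instruction.
bool isLegalAddressingMode(int64_t BaseOffset, const MemInst *I = nullptr) {
  // Outside the signed 20-bit range nothing can be folded.
  if (BaseOffset < -(INT64_C(1) << 19) || BaseOffset >= (INT64_C(1) << 19))
    return false;
  // Short-displacement instructions only take unsigned 12-bit offsets.
  if (I != nullptr && !I->LongDisplacement)
    return BaseOffset >= 0 && BaseOffset < (INT64_C(1) << 12);
  return true;
}

// Count fixups whose offset cannot be folded and so needs an extra add.
unsigned countBaseAdds(const std::vector<Fixup> &Fixups) {
  unsigned NumBaseAdds = 0;
  for (const Fixup &F : Fixups)
    if (F.Offset != 0 && !isLegalAddressingMode(F.Offset, F.UserInst))
      ++NumBaseAdds;
  return NumBaseAdds;
}

// Usage: only the 5000-byte offset on the short-displacement user costs an add.
void countBaseAddsExample() {
  MemInst Short{false};
  MemInst Long{true};
  std::vector<Fixup> Fixups = {{4000, &Short}, {5000, &Short}, {5000, &Long}};
  assert(countBaseAdds(Fixups) == 1);
}
```

In this sketch the caller no longer needs a second hook: the per-instruction offset limit is just one more condition inside the legality query.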

Event Timeline

jonpa created this revision. Jul 27 2017, 1:55 AM

Herald added a subscriber: mzolotukhin. Jul 27 2017, 1:55 AM

The SystemZ part looks correct to me, see the inline comment for a style/readability suggestion.

I'm not sure about the common code changes -- what does simply calling isLegalAddressingMode instead of isFoldableMemAccessOffset do to targets that have not yet added support to do instruction-specific checks to the former? I'd assume you'd see performance regressions there.

Maybe this transition should be done on a target-by-target basis.

lib/Target/SystemZ/SystemZISelLowering.cpp
Line 699: Hmm, this pulls the conditions apart again ... Maybe readability would be improved by just always having a SupportedAM variable, initialized at the top like:

    AddressingMode SupportedAM(true, true);
    if (I != nullptr)
      SupportedAM = supportedAddressingMode(I, Subtarget.hasVector());

and then just use it throughout the function.
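The suggested pattern can be sketched outside LLVM roughly as follows. This is a minimal illustration, not the SystemZ code: AddrMode is reduced to two fields, the optional Inst pointer stands in for the result of supportedAddressingMode(I, ...), and fitsSigned20 / fitsUnsigned12 are hypothetical replacements for isInt<20> / isUInt<12>. Only the names AddressingMode, SupportedAM, LongDisplacement, and IndexReg come from the patch.

```cpp
#include <cassert>
#include <cstdint>

// The two flags mirror the patch; everything else here is hypothetical.
struct AddressingMode {
  bool LongDisplacement;
  bool IndexReg;
};

// Reduced address description (the real AddrMode has more fields).
struct AddrMode {
  int64_t BaseOffs;
  int64_t Scale;
};

constexpr bool fitsSigned20(int64_t X) { return X >= -524288 && X <= 524287; }
constexpr bool fitsUnsigned12(int64_t X) { return X >= 0 && X <= 4095; }

// Inst plays the role of the optional instruction: when it is null, the most
// permissive mode is assumed, as in the suggested initialization.
bool isLegalAddressingMode(const AddrMode &AM,
                           const AddressingMode *Inst = nullptr) {
  if (!fitsSigned20(AM.BaseOffs))
    return false;

  AddressingMode SupportedAM{true, true};
  if (Inst != nullptr)
    SupportedAM = *Inst;

  if (!SupportedAM.LongDisplacement && !fitsUnsigned12(AM.BaseOffs))
    return false;

  if (!SupportedAM.IndexReg)
    return AM.Scale == 0; // No indexing allowed.
  return AM.Scale == 0 || AM.Scale == 1;
}

// Usage: the same address is legal or not depending on the instruction.
void supportedAMExample() {
  AddressingMode ShortDispl{false, true};
  assert(isLegalAddressingMode(AddrMode{5000, 1}));
  assert(!isLegalAddressingMode(AddrMode{5000, 1}, &ShortDispl));
}
```

Initializing SupportedAM once keeps the null check in a single place, so every later condition reads the same whether or not an instruction was supplied.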

I'm not sure about the common code changes -- what does simply calling isLegalAddressingMode instead of isFoldableMemAccessOffset do to targets that have not yet added support to do instruction-specific checks to the former? I'd assume you'd see performance regressions there.

Maybe this transition should be done on a target-by-target basis.

SystemZ was the only user of isFoldableMemAccessOffset() except for any out-of-tree targets. I was thinking that this transition should not be much more difficult than what it was on SystemZ. Since it is now clear that this method is obsolete, perhaps a heads-up in the commit message will do?

Hmm, this pulls the conditions apart again ... Maybe readability would be improved by just always having a SupportedAM variable, initialized at the top like...

Done.

In D35933#823813, @jonpa wrote:

I'm not sure about the common code changes -- what does simply calling isLegalAddressingMode instead of isFoldableMemAccessOffset do to targets that have not yet added support to do instruction-specific checks to the former? I'd assume you'd see performance regressions there.

Maybe this transition should be done on a target-by-target basis.

SystemZ was the only user of isFoldableMemAccessOffset() except for any out-of-tree targets.

Well, even if they had no separate implementation, they would fall back to the default one (always returning true). Now the code will call isLegalAddressingMode instead. Is it clear that this will either be no change (i.e. also always return true), or else lead to an actual improvement, on all other targets?

I'm just wondering whether this could expose a performance regression on some other target ...
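The concern can be made concrete with a small, purely hypothetical model. TargetHooks, SmallOffsetTarget, and the two needsBaseAdd helpers below are invented for illustration and heavily simplify the real hooks; the 8-bit offset limit is an arbitrary assumption. The sketch only shows that a target which never overrode the old hook can still see a different answer once the query is routed through isLegalAddressingMode.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified model of the two hooks; not LLVM's API.
struct TargetHooks {
  virtual ~TargetHooks() = default;
  // Old hook: the default says every offset is foldable.
  virtual bool isFoldableMemAccessOffset(int64_t /*Offset*/) const {
    return true;
  }
  // Pre-existing hook that targets commonly override.
  virtual bool isLegalAddressingMode(int64_t /*BaseOffset*/) const {
    return true;
  }
};

// An imagined target that never implemented the old hook but restricts
// offsets to a signed 8-bit range in isLegalAddressingMode.
struct SmallOffsetTarget : TargetHooks {
  bool isLegalAddressingMode(int64_t BaseOffset) const override {
    return BaseOffset >= -128 && BaseOffset <= 127;
  }
};

// Cost bump as computed through the old hook and through the new call.
bool needsBaseAddOld(const TargetHooks &T, int64_t Offset) {
  return Offset != 0 && !T.isFoldableMemAccessOffset(Offset);
}
bool needsBaseAddNew(const TargetHooks &T, int64_t Offset) {
  return Offset != 0 && !T.isLegalAddressingMode(Offset);
}

// Usage: for the imagined target the answer changes for offset 1000.
void hookTransitionExample() {
  SmallOffsetTarget T;
  assert(!needsBaseAddOld(T, 1000));
  assert(needsBaseAddNew(T, 1000));
}
```

Whether such a change in the computed cost is an improvement or a regression depends on the target, which is exactly the question raised here.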

I see your point now. Of course, if this does change things around it should hopefully be for the better.

Given all the other calls to isAMCompletelyFolded() and so on, I have never been quite sure whether other targets ever needed this extra checking against the actual instructions the way SystemZ does. That's why I recently proposed guarding this with TTI.LSRWithInstrQueries(), which is still an alternative, I guess. I think Quentin was against this for the sake of out-of-tree targets, IIRC.

Quentin, what is your opinion now?

Hi Jonas,

Looks reasonable to me.

Thanks for killing some hooks :).

Cheers,
-Quentin

This revision is now accepted and ready to land. Jul 31 2017, 4:41 PM

Just before I was about to commit, I noticed that a Hexagon test broke (test/CodeGen/Hexagon/swp-const-tc.ll), so unfortunately this will have to wait a bit, since I am now on vacation.

The Hexagon test passes after just updating the expected loop count in the CHECK line from #998 to #999. The loop also seems to be one instruction smaller now, so I am assuming everyone is happy with this.

Diff on output (old <> new):

10,11c10
< 			loop0(.LBB0_1,#998)
< 			r2 = #-4000
---
> 			loop0(.LBB0_1,#999)
15,21c14,15
< 			r3 = add(r0,r2)
< 			r4 = add(r2,#4)
< 	}
< 	{
< 			r2 = add(r0,r4)
< 			r4 = add(r4,#4)
< 			r3 = memw(r3+#4000)
---
> 			r3 = add(r1,#4)
> 			r2 = memw(r0+r1<<#0)
27,30c21,23
< 			r1 = add(r3,r1)
< 			r2 = add(r0,r4)
< 			r4 = add(r4,#4)
< 			r3 = memw(r2+#4000)
---
> 			r1 = add(r2,r1)
> 			r3 = add(r3,#4)
> 			r2 = memw(r0+r3<<#0)
34,38c27
< 			r0 = add(r3,r1)
< 			r2 = memw(r2+#4000)
< 	}
< 	{
< 			r0 = add(r2,r0)
---
> 			r0 = add(r2,r1)

Thanks for review.
trunk@310463

Revision Contents

Path                                              Size
include/llvm/Analysis/TargetTransformInfo.h       10 lines
include/llvm/Analysis/TargetTransformInfoImpl.h   2 lines
include/llvm/CodeGen/BasicTTIImpl.h               4 lines
include/llvm/Target/TargetLowering.h              4 lines
lib/Analysis/TargetTransformInfo.cpp              5 lines
lib/Target/SystemZ/SystemZISelLowering.h          1 line
lib/Target/SystemZ/SystemZISelLowering.cpp        24 lines
lib/Transforms/Scalar/LoopStrengthReduce.cpp      15 lines
test/CodeGen/Hexagon/swp-const-tc.ll              2 lines

Diff 110349

include/llvm/Analysis/TargetTransformInfo.h

[...]
   /// immediate offset and no index register.
   bool LSRWithInstrQueries() const;

-  /// \brief Return true if target supports the load / store
-  /// instruction with the given Offset on the form reg + Offset. It
-  /// may be that Offset is too big for a certain type (register
-  /// class).
-  bool isFoldableMemAccessOffset(Instruction *I, int64_t Offset) const;
-
   /// \brief Return true if it's free to truncate a value of type Ty1 to type
   /// Ty2. e.g. On x86 it's free to truncate a i32 value in register EAX to i16
   /// by referencing its sub-register AX.
[...]
       int64_t BaseOffset, bool HasBaseReg,
       int64_t Scale, unsigned AddrSpace) = 0;
   virtual bool LSRWithInstrQueries() = 0;
-  virtual bool isFoldableMemAccessOffset(Instruction *I, int64_t Offset) = 0;
   virtual bool isTruncateFree(Type *Ty1, Type *Ty2) = 0;
   virtual bool isProfitableToHoist(Instruction *I) = 0;
   virtual bool isTypeLegal(Type *Ty) = 0;
[...]
   bool LSRWithInstrQueries() override {
     return Impl.LSRWithInstrQueries();
   }
-  bool isFoldableMemAccessOffset(Instruction *I, int64_t Offset) override {
-    return Impl.isFoldableMemAccessOffset(I, Offset);
-  }
   bool isTruncateFree(Type *Ty1, Type *Ty2) override {
     return Impl.isTruncateFree(Ty1, Ty2);
   }
[...]

include/llvm/Analysis/TargetTransformInfoImpl.h

[...]

   bool LSRWithInstrQueries() { return false; }

-  bool isFoldableMemAccessOffset(Instruction *I, int64_t Offset) { return true; }
-
   bool isTruncateFree(Type *Ty1, Type *Ty2) { return false; }

   bool isProfitableToHoist(Instruction *I) { return true; }
[...]

include/llvm/CodeGen/BasicTTIImpl.h

[...]
     return getTLI()->getScalingFactorCost(DL, AM, Ty, AddrSpace);
   }

-  bool isFoldableMemAccessOffset(Instruction *I, int64_t Offset) {
-    return getTLI()->isFoldableMemAccessOffset(I, Offset);
-  }
-
   bool isTruncateFree(Type *Ty1, Type *Ty2) {
     return getTLI()->isTruncateFree(Ty1, Ty2);
   }
[...]

include/llvm/Target/TargetLowering.h

[...]
     return -1;
   }

-  virtual bool isFoldableMemAccessOffset(Instruction *I, int64_t Offset) const {
-    return true;
-  }
-
   /// Return true if the specified immediate is legal icmp immediate, that is
   /// the target has icmp instructions which can compare a register against the
   /// immediate without having to materialize the immediate into a register.
[...]

lib/Analysis/TargetTransformInfo.cpp

[...]
   return TTIImpl->LSRWithInstrQueries();
 }

-bool TargetTransformInfo::isFoldableMemAccessOffset(Instruction *I,
-                                                    int64_t Offset) const {
-  return TTIImpl->isFoldableMemAccessOffset(I, Offset);
-}
-
 bool TargetTransformInfo::isTruncateFree(Type *Ty1, Type *Ty2) const {
   return TTIImpl->isTruncateFree(Ty1, Ty2);
 }
[...]

lib/Target/SystemZ/SystemZISelLowering.h

[...]
   bool isLegalAddressingMode(const DataLayout &DL, const AddrMode &AM, Type *Ty,
                              unsigned AS,
                              Instruction *I = nullptr) const override;
-  bool isFoldableMemAccessOffset(Instruction *I, int64_t Offset) const override;
   bool allowsMisalignedMemoryAccesses(EVT VT, unsigned AS,
                                       unsigned Align,
                                       bool *Fast) const override;
[...]

lib/Target/SystemZ/SystemZISelLowering.cpp

[...]
   return AddressingMode(true/*LongDispl*/, true/*IdxReg*/);
 }

-// TODO: This method should also check for the displacement when *I is
-// passed. It may also be possible to merge with isFoldableMemAccessOffset()
-// now that both methods get the *I.
 bool SystemZTargetLowering::isLegalAddressingMode(const DataLayout &DL,
        const AddrMode &AM, Type *Ty, unsigned AS, Instruction *I) const {
   // Punt on globals for now, although they can be used in limited
   // RELATIVE LONG cases.
   if (AM.BaseGV)
[...]
   if (!isInt<20>(AM.BaseOffs))
     return false;

-  if (I != nullptr &&
-      !supportedAddressingMode(I, Subtarget.hasVector()).IndexReg)
+  AddressingMode SupportedAM(true, true);
+  if (I != nullptr)
+    SupportedAM = supportedAddressingMode(I, Subtarget.hasVector());
+
+  if (!SupportedAM.LongDisplacement && !isUInt<12>(AM.BaseOffs))
+    return false;
+
+  if (!SupportedAM.IndexReg)
     // No indexing allowed.
     return AM.Scale == 0;
   else
[...]
     return AM.Scale == 0 || AM.Scale == 1;
 }

-// TODO: Should we check for isInt<20> also?
-bool SystemZTargetLowering::isFoldableMemAccessOffset(Instruction *I,
-                                                      int64_t Offset) const {
-  if (!supportedAddressingMode(I, Subtarget.hasVector()).LongDisplacement)
-    return (isUInt<12>(Offset));
-
-  return true;
-}
-
 bool SystemZTargetLowering::isTruncateFree(Type *FromType, Type *ToType) const {
   if (!FromType->isIntegerTy() || !ToType->isIntegerTy())
     return false;
[...]

lib/Transforms/Scalar/LoopStrengthReduce.cpp

[...]

 } // end anonymous namespace

+static bool isAMCompletelyFolded(const TargetTransformInfo &TTI,
+                                 LSRUse::KindType Kind, MemAccessTy AccessTy,
+                                 GlobalValue *BaseGV, int64_t BaseOffset,
+                                 bool HasBaseReg, int64_t Scale,
+                                 Instruction *Fixup = nullptr);
+
 /// Tally up interesting quantities from the given register.
 void Cost::RateRegister(const SCEV *Reg,
                         SmallPtrSetImpl<const SCEV *> &Regs,
[...]
     // Check with target if this offset with this instruction is
     // specifically not supported.
     if (LU.Kind == LSRUse::Address && Offset != 0 &&
-        !TTI.isFoldableMemAccessOffset(Fixup.UserInst, Offset))
+        !isAMCompletelyFolded(TTI, LSRUse::Address, LU.AccessTy, F.BaseGV,
+                              Offset, F.HasBaseReg, F.Scale, Fixup.UserInst))
       C.NumBaseAdds++;
   }

[...]
                                  LSRUse::KindType Kind, MemAccessTy AccessTy,
                                  GlobalValue *BaseGV, int64_t BaseOffset,
                                  bool HasBaseReg, int64_t Scale,
-                                 Instruction *Fixup = nullptr) {
+                                 Instruction *Fixup/*= nullptr*/) {
   switch (Kind) {
   case LSRUse::Address:
     return TTI.isLegalAddressingMode(AccessTy.MemTy, BaseGV, BaseOffset,
[...]
   if (LU.Kind == LSRUse::Address && TTI.LSRWithInstrQueries()) {
     for (const LSRFixup &Fixup : LU.Fixups)
       if (!isAMCompletelyFolded(TTI, LSRUse::Address, LU.AccessTy, F.BaseGV,
-                                F.BaseOffset, F.HasBaseReg, F.Scale,
-                                Fixup.UserInst))
+                                (F.BaseOffset + Fixup.Offset), F.HasBaseReg,
+                                F.Scale, Fixup.UserInst))
         return false;
     return true;
   }
[...]

test/CodeGen/Hexagon/swp-const-tc.ll

[...]
 ; of computing a new LC0 value.

 ; CHECK-LABEL: @test
-; CHECK: loop0(.LBB0_1,#998)
+; CHECK: loop0(.LBB0_1,#999)

 define i32 @test(i32* %A, i32* %B, i32 %count) {
 entry:
[...]