This is an archive of the discontinued LLVM Phabricator instance.

Commoning of target specific load/store intrinsics in Early CSE
Closed, Public

Authored by ssijaric on Jan 22 2015, 12:48 AM.

Details

Reviewers

pete
hfinkel

Summary

Common target specific load/store intrinsics in Early CSE. Currently, only a few AArch64 load/store intrinsics are supported (ld2, st2, etc). Updated patch based on Pete and Hal's reviews. Uploading to Phabricator this time.

Thanks,
Sanjin

Event Timeline

ssijaric updated this revision to Diff 18588. Jan 22 2015, 12:48 AM

ssijaric retitled this revision to "Commoning of target specific load/store intrinsics in Early CSE".

ssijaric updated this object.

ssijaric edited the test plan for this revision.

ssijaric added reviewers: pete, hfinkel.

ssijaric added a subscriber: Unknown Object (MLST).

Herald added a subscriber: aemerson. Jan 22 2015, 12:48 AM

Thanks for fixing all my prior comments.

The patch LGTM with just the few additional comments I made. I'm no expert on this part of the compiler though, so it would be best if Hal or someone else can give the final LGTM.

Thanks,
Pete

include/llvm/Analysis/TargetTransformInfo.h
39	Please use \brief here instead of 'MemIntrinsicInfo -'
43	You need to initialize Vol and NumMemRefs here
463	I think this comment should say it returns a value, not an instruction. I think it's possible for your code to return a Value here if the st2 intrinsic was storing constants, for example.
lib/Transforms/Scalar/EarlyCSE.cpp
408	I think this can also be getPointerOperand() as you've done on the store

Diff contains a few fixes to address Pete's latest review.

Thanks for the review, Pete. New patch uploaded.

include/llvm/Analysis/TargetTransformInfo.h
39	Changed in the updated patch.
43	Thanks for catching this. Fixed in the updated patch.
463	Yes, changed description in the updated patch.
lib/Transforms/Scalar/EarlyCSE.cpp
408	Yes, changed in the updated patch.

Fix initialization list order.

LGTM too.

lib/Transforms/Scalar/EarlyCSE.cpp
478	Don't add an extra blank line here.

This revision is now accepted and ready to land. Jan 24 2015, 6:50 AM

Remove a blank line and rebase the change.

Thanks for the review, Hal.

lib/Transforms/Scalar/EarlyCSE.cpp
478	Removed in the latest update.

Okay (there was no need to repost this for just the whitespace change); feel free to commit.

Hi Hal,

I don't have commit rights. Can you or Chad please commit?

Thanks,
Sanjin

Committed r227149.

Sorry to reopen a long dead review thread, but I had a reason to look at EarlyCSE today and just noticed the changes.

I feel the introduction of the ParseMemoryInst class has badly obscured the logic of this transform. I get that you wanted to abstract over the existence of both normal loads and stores and target specific loads and stores, but the extra layer of abstraction makes this code substantially harder to follow.

Particular areas of concern include:

isValid checks as opposed to dyn_cast, cast style tests (i.e. less error checking!)
MatchingId - what does this even do? It's not documented at all.
isVolatile is confused with isSimple in problematic ways. These are related but distinct concepts.
The usage of fields in the object appear unnecessary. Simply dispatching by the contained instruction type would be far more clear and less error prone.

A potentially clearer design would have been to introduce a helper class for each type of operation: load, and store. Like CallSite, each helper class could simply proxy to the underlying instruction/information based on the type of the instruction so contained.

In D7121#123602, @reames wrote:

Sorry to reopen a long dead review thread, but I had a reason to look at EarlyCSE today and just noticed the changes.

I feel the introduction of the ParseMemoryInst class has badly obscured the logic of this transform. I get that you wanted to abstract over the existence of both normal loads and stores and target specific loads and stores, but the extra layer of abstraction makes this code substantially harder to follow.

Particular areas of concern include:

isValid checks as opposed to dyn_cast, cast style tests (i.e. less error checking!)

Can you please provide an example of what you mean?

MatchingId - what does this even do? It's not documented at all.

There is a comment in the TTI header explaining what this does:

// Same Id is set by the target for corresponding load/store intrinsics.
unsigned short MatchingId;

and there is a comment in the implementation too:

// For regular (non-intrinsic) loads/stores, this is set to -1. For
// intrinsic loads/stores, the id is retrieved from the corresponding
// field in the MemIntrinsicInfo structure.  That field contains
// non-negative values only.
int MatchingId;

isVolatile is confused with isSimple in problematic ways. These are related but distinct concepts.

I assume you're talking about this:

Vol = !LI->isSimple();

FWIW, I don't believe there was a behavioral change here. We can certainly re-name these variables.

The usage of fields in the object appear unnecessary. Simply dispatching by the contained instruction type would be far more clear and less error prone.

A potentially clearer design would have been to introduce a helper class for each type of operation: load, and store. Like CallSite, each helper class could simply proxy to the underlying instruction/information based on the type of the instruction so contained.

I'm not sure. There is common behavior between load intrinsics and loads, store intrinsics and stores, etc.

In D7121#123627, @hfinkel wrote:

In D7121#123602, @reames wrote:

Sorry to reopen a long dead review thread, but I had a reason to look at EarlyCSE today and just noticed the changes.

I feel the introduction of the ParseMemoryInst class has badly obscured the logic of this transform. I get that you wanted to abstract over the existence of both normal loads and stores and target specific loads and stores, but the extra layer of abstraction makes this code substantially harder to follow.

Particular areas of concern include:

isValid checks as opposed to dyn_cast, cast style tests (i.e. less error checking!)

Can you please provide an example of what you mean?

If you write
if (LoadInst *LI = dyn_cast<LoadInst>(I))
or even:
if (isa<LoadInst>(I)) { LI = cast<LoadInst>(I); ... }

You are pretty much guaranteed that LI is in fact a load.

With ParseMemoryInst(I).isLoad() there's no self check. You could see a path being added that set Load=true, but forgot to actually save the instruction, or similar things.

To be clear, I am *not* stating the current impl is buggy, just that it's more error prone going forward.

MatchingId - what does this even do? It's not documented at all.

There is a comment in the TTI header explaining what this does:
// Same Id is set by the target for corresponding load/store intrinsics.
unsigned short MatchingId;
and there is a comment in the implementation too:

Neither of these comments describes the *semantics*. What is the *meaning* or *usage* of the matching id? Does it affect must-alias rules? (Serious question here, I can't tell what it actually means.)

// For regular (non-intrinsic) loads/stores, this is set to -1. For
// intrinsic loads/stores, the id is retrieved from the corresponding
// field in the MemIntrinsicInfo structure.  That field contains
// non-negative values only.
int MatchingId;
isVolatile is confused with isSimple in problematic ways. These are related but distinct concepts.

I assume you're talking about this:
Vol = !LI->isSimple();
FWIW, I don't believe there was a behavioral change here. We can certainly re-name these variables.

Thanks. A rename is definitely appropriate.

The usage of fields in the object appear unnecessary. Simply dispatching by the contained instruction type would be far more clear and less error prone.

A potentially clearer design would have been to introduce a helper class for each type of operation: load, and store. Like CallSite, each helper class could simply proxy to the underlying instruction/information based on the type of the instruction so contained.

I'm not sure. There is common behavior between load intrinsics and loads, store intrinsics and stores, etc.

I think you're misreading my suggestion. Here's some pseudocode:

class LoadSite {
  Instruction *Inst;

public:
  LoadSite(Instruction *I);

  operator bool() const;

  bool isSimple() const {
    if (LoadInst *LI = dyn_cast<LoadInst>(Inst))
      return LI->isSimple();
    // assert that Inst is a target intrinsic
    return !TTI->getTarget...().isVolatile;
  }
};

if (LoadSite LS = I) {
  ...
}

class StoreSite { ... };

if (StoreSite SS = I)
  ...

Having the abstraction is fine, it should just be minimal. In particular, the abstraction itself shouldn't be stateful.

Given there's little shared between loads (of any type) and stores (of any type), I think these should be separate.
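
The LoadSite idea above can be made concrete without LLVM. What follows is a minimal, compilable sketch under stated assumptions: `Instruction`, `TargetInfo`, and `isVolatileIntrinsic` are hypothetical stand-ins invented for illustration, not the LLVM classes or the TTI interface from this patch. It only shows the shape of a wrapper whose sole state is the wrapped instruction pointer, so every query dispatches on the instruction kind instead of reading cached flags.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::Instruction; not the real class.
struct Instruction {
  enum Kind { Load, Store, TargetLoadIntrinsic, Other };
  Kind K;
  bool Simple; // for Load/Store: neither volatile nor atomic
  explicit Instruction(Kind Kd, bool IsSimple = true)
      : K(Kd), Simple(IsSimple) {}
};

// Hypothetical target hook; the real TTI interface in the patch differs.
struct TargetInfo {
  bool isVolatileIntrinsic(const Instruction &) const { return false; }
};

// Stateless proxy: the wrapper stores only the instruction it wraps.
// It is "empty" (converts to false) when the instruction is not a load.
class LoadSite {
  const Instruction *Inst;
  const TargetInfo *Target;

public:
  LoadSite(const Instruction *I, const TargetInfo *T)
      : Inst(nullptr), Target(T) {
    if (I && (I->K == Instruction::Load ||
              I->K == Instruction::TargetLoadIntrinsic))
      Inst = I;
  }

  explicit operator bool() const { return Inst != nullptr; }

  bool isSimple() const {
    assert(Inst && "querying an empty LoadSite");
    if (Inst->K == Instruction::Load)
      return Inst->Simple;
    // Otherwise this is a target load intrinsic: ask the target.
    return !Target->isVolatileIntrinsic(*Inst);
  }
};
```

Because the only way to obtain a non-empty LoadSite is to hand it a load-like instruction, the wrapper cannot claim to be a load while holding no instruction, which is the self-check property attributed to dyn_cast earlier in this comment.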

Hi Philip,

Thanks for the review. Just to briefly add onto Hal's comment.

In D7121#123602, @reames wrote:

isVolatile is confused with isSimple in problematic ways. These are related but distinct concepts.

Yes, this should be renamed to avoid confusion. There is no behavioral change, as Hal noted.
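
To illustrate why the name matters: in LLVM, a load or store is "simple" when it is neither volatile nor atomic, so `!isSimple()` is a broader condition than `isVolatile()`. The sketch below uses a hypothetical `MockLoad` struct (not `llvm::LoadInst`) and a hypothetical helper `treatedAsVolatile` that mirrors the `Vol = !LI->isSimple()` assignment in the patch.

```cpp
#include <cassert>

// Hypothetical mock of the relevant LoadInst queries; not the LLVM class.
struct MockLoad {
  bool Volatile;
  bool Atomic;
  bool isVolatile() const { return Volatile; }
  bool isAtomic() const { return Atomic; }
  // Simple means neither atomic nor volatile.
  bool isSimple() const { return !isAtomic() && !isVolatile(); }
};

// Mirrors `Vol = !LI->isSimple()`: the flag is named after volatility
// but is also set for atomic, non-volatile accesses.
bool treatedAsVolatile(const MockLoad &L) { return !L.isSimple(); }
```

An atomic, non-volatile load is therefore reported through a flag called Vol, which is the naming confusion raised in the review; the behavior is unchanged, only the name is misleading.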

MatchingId - what does this even do? It's not documented at all.

The reason for MatchingId is to identify matching loads and stores for target specific intrinsics. For example, the pointer type on different target specific intrinsic calls may be i8*, and MatchingId is there to make sure that only matching intrinsics get commoned (e.g. test case test_nocse3). This comment could be clearer.

The usage of fields in the object appear unnecessary. Simply dispatching by the contained instruction type would be far more clear and less error prone.

A potentially clearer design would have been to introduce a helper class for each type of operation: load, and store. Like CallSite, each helper class could simply proxy to the underlying instruction/information based on the type of the instruction so contained.

I thought about introducing new classes to deal with both regular loads and target specific intrinsic loads (same for stores and target specific intrinsic stores). This was only needed for EarlyCSE, so I kept it localized.

Looking back, it may have been cleaner to introduce these new classes, as they can be reused elsewhere (GVN?). If everyone agrees, I can come up with a new patch to do away with ParseMemoryInst.

Thanks,
Sanjin

In D7121#123664, @ssijaric wrote:

Hi Philip,

Thanks for the review. Just to briefly add onto Hal's comment.

In D7121#123602, @reames wrote:

isVolatile is confused with isSimple in problematic ways. These are related but distinct concepts.

Yes, this should be renamed to avoid confusion. There is no behavioral change, as Hal noted.

Okay, please do so.

MatchingId - what does this even do? It's not documented at all.

The reason for MatchingId is to identify matching loads and stores for target specific intrinsics. For example, the pointer type on different target specific intrinsic calls may be i8*, and MatchingId is there to make sure that only matching intrinsics get commoned (e.g. test case test_nocse3). This comment could be clearer.

I think the way to state this is that the action of a target-specific memory intrinsic is not uniquely identified by the pointer type. We need to make sure that magic permuting load of type 1 is only matched with magic permuting store of type 1, magic permuting load of type 2 is only matched with magic permuting store of type 2, etc.
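
As a rough illustration of that rule, the sketch below mirrors the `isMatchingMemLoc` comparison from the patch (same pointer and same MatchingId). `MemOp` is a hypothetical, simplified struct introduced only for this example, and the id values are placeholders; the convention that ordinary loads and stores use -1 comes from the comment on `MatchingId` in EarlyCSE.cpp.

```cpp
#include <cassert>

// Hypothetical, simplified model of a parsed memory operation.
struct MemOp {
  const void *Ptr; // address operand
  int MatchingId;  // -1 for ordinary loads/stores, target-chosen id >= 0
                   // for target intrinsics
};

// Two operations refer to the same location in the same way only if both
// the pointer and the target-assigned id agree.
bool isMatchingMemLoc(const MemOp &A, const MemOp &B) {
  return A.Ptr == B.Ptr && A.MatchingId == B.MatchingId;
}
```

Under this comparison, a two-element store intrinsic and a three-element store intrinsic through the same i8* pointer do not match, even though their pointer operands are identical.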

The usage of fields in the object appear unnecessary. Simply dispatching by the contained instruction type would be far more clear and less error prone.

A potentially clearer design would have been to introduce a helper class for each type of operation: load, and store. Like CallSite, each helper class could simply proxy to the underlying instruction/information based on the type of the instruction so contained.

I thought about introducing new classes to deal with both regular loads and target specific intrinsic loads (same for stores and target specific intrinsic stores). This was only needed for EarlyCSE, so I kept it localized.

Looking back, it may have been cleaner to introduce these new classes, as they can be reused elsewhere (GVN?). If everyone agrees, I can come up with a new patch to do away with ParseMemoryInst.

I think this is worth trying.

Thanks,
Sanjin

Revision Contents

Path	Size
include/llvm/Analysis/TargetTransformInfo.h	29 lines
lib/Analysis/TargetTransformInfo.cpp	19 lines
lib/Target/AArch64/AArch64TargetTransformInfo.cpp	91 lines
lib/Transforms/Scalar/EarlyCSE.cpp	140 lines
test/Transforms/EarlyCSE/AArch64/intrinsics.ll	231 lines
test/Transforms/EarlyCSE/AArch64/lit.local.cfg	5 lines

Diff 18727

include/llvm/Analysis/TargetTransformInfo.h

Context not available.
	#define LLVM_ANALYSIS_TARGETTRANSFORMINFO_H	#define LLVM_ANALYSIS_TARGETTRANSFORMINFO_H

	#include "llvm/IR/Intrinsics.h"	#include "llvm/IR/Intrinsics.h"
		#include "llvm/IR/IntrinsicInst.h"
	#include "llvm/Pass.h"	#include "llvm/Pass.h"
	#include "llvm/Support/DataTypes.h"	#include "llvm/Support/DataTypes.h"

Context not available.
	class User;	class User;
	class Value;	class Value;

		/// \brief Information about a load/store intrinsic defined by the target.
		struct MemIntrinsicInfo {
		MemIntrinsicInfo()
		: ReadMem(false), WriteMem(false), Vol(false), MatchingId(0),
		NumMemRefs(0), PtrVal(nullptr) {}
		bool ReadMem;
		bool WriteMem;
		bool Vol;
		// Same Id is set by the target for corresponding load/store intrinsics.
		unsigned short MatchingId;
		int NumMemRefs;
		Value *PtrVal;
		};

	/// TargetTransformInfo - This pass provides access to the codegen	/// TargetTransformInfo - This pass provides access to the codegen
	/// interfaces that are needed for IR-level transformations.	/// interfaces that are needed for IR-level transformations.
	class TargetTransformInfo {	class TargetTransformInfo {
Context not available.
	/// any callee-saved registers, so would require a spill and fill.	/// any callee-saved registers, so would require a spill and fill.
	virtual unsigned getCostOfKeepingLiveOverCall(ArrayRef<Type*> Tys) const;	virtual unsigned getCostOfKeepingLiveOverCall(ArrayRef<Type*> Tys) const;

		/// \returns True if the intrinsic is a supported memory intrinsic. Info
		/// will contain additional information - whether the intrinsic may write
		/// or read to memory, volatility and the pointer. Info is undefined
		/// if false is returned.
		virtual bool getTgtMemIntrinsic(IntrinsicInst *Inst,
		MemIntrinsicInfo &Info) const;

		/// \returns A value which is the result of the given memory intrinsic. New
		/// instructions may be created to extract the result from the given intrinsic
		/// memory operation. Returns nullptr if the target cannot create a result
		/// from the given intrinsic.
		virtual Value *getOrCreateResultFromMemIntrinsic(IntrinsicInst *Inst,
		Type *ExpectedType) const;

	/// @}	/// @}

	/// Analysis group identification.	/// Analysis group identification.
Context not available.

lib/Analysis/TargetTransformInfo.cpp

Context not available.
	return PrevTTI->getCostOfKeepingLiveOverCall(Tys);	return PrevTTI->getCostOfKeepingLiveOverCall(Tys);
	}	}

		Value *TargetTransformInfo::getOrCreateResultFromMemIntrinsic(
		IntrinsicInst *Inst, Type *ExpectedType) const {
		return PrevTTI->getOrCreateResultFromMemIntrinsic(Inst, ExpectedType);
		}

		bool TargetTransformInfo::getTgtMemIntrinsic(IntrinsicInst *Inst,
		MemIntrinsicInfo &Info) const {
		return PrevTTI->getTgtMemIntrinsic(Inst, Info);
		}

	namespace {	namespace {

	struct NoTTI final : ImmutablePass, TargetTransformInfo {	struct NoTTI final : ImmutablePass, TargetTransformInfo {
Context not available.
	return 0;	return 0;
	}	}

		bool getTgtMemIntrinsic(IntrinsicInst *Inst,
		MemIntrinsicInfo &Info) const override {
		return false;
		}

		Value *getOrCreateResultFromMemIntrinsic(IntrinsicInst *Inst,
		Type *ExpectedType) const override {
		return nullptr;
		}
	};	};

	} // end anonymous namespace	} // end anonymous namespace
Context not available.

lib/Target/AArch64/AArch64TargetTransformInfo.cpp

Context not available.
	/// are set if the result needs to be inserted and/or extracted from vectors.	/// are set if the result needs to be inserted and/or extracted from vectors.
	unsigned getScalarizationOverhead(Type *Ty, bool Insert, bool Extract) const;	unsigned getScalarizationOverhead(Type *Ty, bool Insert, bool Extract) const;

		enum MemIntrinsicType {
		VECTOR_LDST_TWO_ELEMENTS,
		VECTOR_LDST_THREE_ELEMENTS,
		VECTOR_LDST_FOUR_ELEMENTS
		};

	public:	public:
	AArch64TTI() : ImmutablePass(ID), TM(nullptr), ST(nullptr), TLI(nullptr) {	AArch64TTI() : ImmutablePass(ID), TM(nullptr), ST(nullptr), TLI(nullptr) {
	llvm_unreachable("This pass cannot be directly constructed");	llvm_unreachable("This pass cannot be directly constructed");
Context not available.
	void getUnrollingPreferences(const Function *F, Loop *L,	void getUnrollingPreferences(const Function *F, Loop *L,
	UnrollingPreferences &UP) const override;	UnrollingPreferences &UP) const override;

		Value *getOrCreateResultFromMemIntrinsic(IntrinsicInst *Inst,
		Type *ExpectedType) const override;

		bool getTgtMemIntrinsic(IntrinsicInst *Inst,
		MemIntrinsicInfo &Info) const override;

	/// @}	/// @}
	};	};
Context not available.
	// Disable partial & runtime unrolling on -Os.	// Disable partial & runtime unrolling on -Os.
	UP.PartialOptSizeThreshold = 0;	UP.PartialOptSizeThreshold = 0;
	}	}

		Value *AArch64TTI::getOrCreateResultFromMemIntrinsic(IntrinsicInst *Inst,
		Type *ExpectedType) const {
		switch (Inst->getIntrinsicID()) {
		default:
		return nullptr;
		case Intrinsic::aarch64_neon_st2:
		case Intrinsic::aarch64_neon_st3:
		case Intrinsic::aarch64_neon_st4: {
		// Create a struct type
		StructType *ST = dyn_cast<StructType>(ExpectedType);
		if (!ST)
		return nullptr;
		unsigned NumElts = Inst->getNumArgOperands() - 1;
		if (ST->getNumElements() != NumElts)
		return nullptr;
		for (unsigned i = 0, e = NumElts; i != e; ++i) {
		if (Inst->getArgOperand(i)->getType() != ST->getElementType(i))
		return nullptr;
		}
		Value *Res = UndefValue::get(ExpectedType);
		IRBuilder<> Builder(Inst);
		for (unsigned i = 0, e = NumElts; i != e; ++i) {
		Value *L = Inst->getArgOperand(i);
		Res = Builder.CreateInsertValue(Res, L, i);
		}
		return Res;
		}
		case Intrinsic::aarch64_neon_ld2:
		case Intrinsic::aarch64_neon_ld3:
		case Intrinsic::aarch64_neon_ld4:
		if (Inst->getType() == ExpectedType)
		return Inst;
		return nullptr;
		}
		}

		bool AArch64TTI::getTgtMemIntrinsic(IntrinsicInst *Inst,
		MemIntrinsicInfo &Info) const {
		switch (Inst->getIntrinsicID()) {
		default:
		break;
		case Intrinsic::aarch64_neon_ld2:
		case Intrinsic::aarch64_neon_ld3:
		case Intrinsic::aarch64_neon_ld4:
		Info.ReadMem = true;
		Info.WriteMem = false;
		Info.Vol = false;
		Info.NumMemRefs = 1;
		Info.PtrVal = Inst->getArgOperand(0);
		break;
		case Intrinsic::aarch64_neon_st2:
		case Intrinsic::aarch64_neon_st3:
		case Intrinsic::aarch64_neon_st4:
		Info.ReadMem = false;
		Info.WriteMem = true;
		Info.Vol = false;
		Info.NumMemRefs = 1;
		Info.PtrVal = Inst->getArgOperand(Inst->getNumArgOperands() - 1);
		break;
		}

		switch (Inst->getIntrinsicID()) {
		default:
		return false;
		case Intrinsic::aarch64_neon_ld2:
		case Intrinsic::aarch64_neon_st2:
		Info.MatchingId = VECTOR_LDST_TWO_ELEMENTS;
		break;
		case Intrinsic::aarch64_neon_ld3:
		case Intrinsic::aarch64_neon_st3:
		Info.MatchingId = VECTOR_LDST_THREE_ELEMENTS;
		break;
		case Intrinsic::aarch64_neon_ld4:
		case Intrinsic::aarch64_neon_st4:
		Info.MatchingId = VECTOR_LDST_FOUR_ELEMENTS;
		break;
		}
		return true;
		}
Context not available.

lib/Transforms/Scalar/EarlyCSE.cpp

Context not available.
	#include "llvm/ADT/Statistic.h"	#include "llvm/ADT/Statistic.h"
	#include "llvm/Analysis/AssumptionCache.h"	#include "llvm/Analysis/AssumptionCache.h"
	#include "llvm/Analysis/InstructionSimplify.h"	#include "llvm/Analysis/InstructionSimplify.h"
		#include "llvm/Analysis/TargetTransformInfo.h"
	#include "llvm/IR/DataLayout.h"	#include "llvm/IR/DataLayout.h"
	#include "llvm/IR/Dominators.h"	#include "llvm/IR/Dominators.h"
	#include "llvm/IR/Instructions.h"	#include "llvm/IR/Instructions.h"
Context not available.
	public:	public:
	const DataLayout *DL;	const DataLayout *DL;
	const TargetLibraryInfo *TLI;	const TargetLibraryInfo *TLI;
		const TargetTransformInfo *TTI;
	DominatorTree *DT;	DominatorTree *DT;
	AssumptionCache *AC;	AssumptionCache *AC;
	typedef RecyclingAllocator<	typedef RecyclingAllocator<
Context not available.
	bool Processed;	bool Processed;
	};	};

		/// \brief Wrapper class to handle memory instructions, including loads,
		/// stores and intrinsic loads and stores defined by the target.
		class ParseMemoryInst {
		public:
		ParseMemoryInst(Instruction *Inst, const TargetTransformInfo *TTI)
		: Load(false), Store(false), Vol(false), MayReadFromMemory(false),
		MayWriteToMemory(false), MatchingId(-1), Ptr(nullptr) {
		MayReadFromMemory = Inst->mayReadFromMemory();
		MayWriteToMemory = Inst->mayWriteToMemory();
		if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(Inst)) {
		MemIntrinsicInfo Info;
		if (!TTI->getTgtMemIntrinsic(II, Info))
		return;
		if (Info.NumMemRefs == 1) {
		Store = Info.WriteMem;
		Load = Info.ReadMem;
		MatchingId = Info.MatchingId;
		MayReadFromMemory = Info.ReadMem;
		MayWriteToMemory = Info.WriteMem;
		Vol = Info.Vol;
		Ptr = Info.PtrVal;
		}
		} else if (LoadInst *LI = dyn_cast<LoadInst>(Inst)) {
		Load = true;
		Vol = !LI->isSimple();
		Ptr = LI->getPointerOperand();
		} else if (StoreInst *SI = dyn_cast<StoreInst>(Inst)) {
		Store = true;
		Vol = !SI->isSimple();
		Ptr = SI->getPointerOperand();
		}
		}
		bool isLoad() { return Load; }
		bool isStore() { return Store; }
		bool isVolatile() { return Vol; }
		bool isMatchingMemLoc(const ParseMemoryInst &Inst) {
		return Ptr == Inst.Ptr && MatchingId == Inst.MatchingId;
		}
		bool isValid() { return Ptr != nullptr; }
		int getMatchingId() { return MatchingId; }
		Value *getPtr() { return Ptr; }
		bool mayReadFromMemory() { return MayReadFromMemory; }
		bool mayWriteToMemory() { return MayWriteToMemory; }

		private:
		bool Load;
		bool Store;
		bool Vol;
		bool MayReadFromMemory;
		bool MayWriteToMemory;
		// For regular (non-intrinsic) loads/stores, this is set to -1. For
		// intrinsic loads/stores, the id is retrieved from the corresponding
		// field in the MemIntrinsicInfo structure. That field contains
		// non-negative values only.
		int MatchingId;
		Value *Ptr;
		};

	bool processNode(DomTreeNode *Node);	bool processNode(DomTreeNode *Node);

	void getAnalysisUsage(AnalysisUsage &AU) const override {	void getAnalysisUsage(AnalysisUsage &AU) const override {
	AU.addRequired<AssumptionCacheTracker>();	AU.addRequired<AssumptionCacheTracker>();
	AU.addRequired<DominatorTreeWrapperPass>();	AU.addRequired<DominatorTreeWrapperPass>();
	AU.addRequired<TargetLibraryInfoWrapperPass>();	AU.addRequired<TargetLibraryInfoWrapperPass>();
		AU.addRequired<TargetTransformInfo>();
	AU.setPreservesCFG();	AU.setPreservesCFG();
	}	}

		Value *getOrCreateResult(Value *Inst, Type *ExpectedType) const {
		if (LoadInst *LI = dyn_cast<LoadInst>(Inst))
		return LI;
		else if (StoreInst *SI = dyn_cast<StoreInst>(Inst))
		return SI->getValueOperand();
		assert(isa<IntrinsicInst>(Inst) && "Instruction not supported");
		return TTI->getOrCreateResultFromMemIntrinsic(cast<IntrinsicInst>(Inst),
		ExpectedType);
		}
	};	};
	}	}

Context not available.
	/// as long as there in no instruction that reads memory. If we see a store	/// as long as there in no instruction that reads memory. If we see a store
	/// to the same location, we delete the dead store. This zaps trivial dead	/// to the same location, we delete the dead store. This zaps trivial dead
	/// stores which can occur in bitfield code among other things.	/// stores which can occur in bitfield code among other things.
	StoreInst *LastStore = nullptr;	Instruction *LastStore = nullptr;

	bool Changed = false;	bool Changed = false;

Context not available.
	continue;	continue;
	}	}

		ParseMemoryInst MemInst(Inst, TTI);
	// If this is a non-volatile load, process it.	// If this is a non-volatile load, process it.
	if (LoadInst *LI = dyn_cast<LoadInst>(Inst)) {	if (MemInst.isValid() && MemInst.isLoad()) {
	// Ignore volatile loads.	// Ignore volatile loads.
	if (!LI->isSimple()) {	if (MemInst.isVolatile()) {
	LastStore = nullptr;	LastStore = nullptr;
	continue;	continue;
	}	}
Context not available.
	// If we have an available version of this load, and if it is the right	// If we have an available version of this load, and if it is the right
	// generation, replace this instruction.	// generation, replace this instruction.
	std::pair<Value *, unsigned> InVal =	std::pair<Value *, unsigned> InVal =
	AvailableLoads->lookup(Inst->getOperand(0));	AvailableLoads->lookup(MemInst.getPtr());
	if (InVal.first != nullptr && InVal.second == CurrentGeneration) {	if (InVal.first != nullptr && InVal.second == CurrentGeneration) {
	DEBUG(dbgs() << "EarlyCSE CSE LOAD: " << *Inst	Value *Op = getOrCreateResult(InVal.first, Inst->getType());
	<< " to: " << *InVal.first << '\n');	if (Op != nullptr) {
	if (!Inst->use_empty())	DEBUG(dbgs() << "EarlyCSE CSE LOAD: " << *Inst
	Inst->replaceAllUsesWith(InVal.first);	<< " to: " << *InVal.first << '\n');
	Inst->eraseFromParent();	if (!Inst->use_empty())
	Changed = true;	Inst->replaceAllUsesWith(Op);
	++NumCSELoad;	Inst->eraseFromParent();
	continue;	Changed = true;
		++NumCSELoad;
		continue;
		}
	}	}

	// Otherwise, remember that we have this instruction.	// Otherwise, remember that we have this instruction.
	AvailableLoads->insert(Inst->getOperand(0), std::pair<Value *, unsigned>(	AvailableLoads->insert(MemInst.getPtr(), std::pair<Value *, unsigned>(
	Inst, CurrentGeneration));	Inst, CurrentGeneration));
	LastStore = nullptr;	LastStore = nullptr;
	continue;	continue;
	}	}

	// If this instruction may read from memory, forget LastStore.	// If this instruction may read from memory, forget LastStore.
	if (Inst->mayReadFromMemory())	// Load/store intrinsics will indicate both a read and a write to
		// memory. The target may override this (e.g. so that a store intrinsic
		// does not read from memory, and thus will be treated the same as a
		// regular store for commoning purposes).
		if (Inst->mayReadFromMemory() &&
		!(MemInst.isValid() && !MemInst.mayReadFromMemory()))
	LastStore = nullptr;	LastStore = nullptr;

	// If this is a read-only call, process it.	// If this is a read-only call, process it.
Context not available.
	if (Inst->mayWriteToMemory()) {	if (Inst->mayWriteToMemory()) {
	++CurrentGeneration;	++CurrentGeneration;

	if (StoreInst *SI = dyn_cast<StoreInst>(Inst)) {	if (MemInst.isValid() && MemInst.isStore()) {
	// We do a trivial form of DSE if there are two stores to the same	// We do a trivial form of DSE if there are two stores to the same
	// location with no intervening loads. Delete the earlier store.	// location with no intervening loads. Delete the earlier store.
	if (LastStore &&	if (LastStore) {
	LastStore->getPointerOperand() == SI->getPointerOperand()) {	ParseMemoryInst LastStoreMemInst(LastStore, TTI);
	DEBUG(dbgs() << "EarlyCSE DEAD STORE: " << *LastStore	if (LastStoreMemInst.isMatchingMemLoc(MemInst)) {
	<< " due to: " << *Inst << '\n');	DEBUG(dbgs() << "EarlyCSE DEAD STORE: " << *LastStore
	LastStore->eraseFromParent();	<< " due to: " << *Inst << '\n');
	Changed = true;	LastStore->eraseFromParent();
	++NumDSE;	Changed = true;
	LastStore = nullptr;	++NumDSE;
		LastStore = nullptr;
		}
	// fallthrough - we can exploit information about this store	// fallthrough - we can exploit information about this store
	}	}

Context not available.
	// version of the pointer. It is safe to forward from volatile stores	// version of the pointer. It is safe to forward from volatile stores
	// to non-volatile loads, so we don't have to check for volatility of	// to non-volatile loads, so we don't have to check for volatility of
	// the store.	// the store.
	AvailableLoads->insert(SI->getPointerOperand(),	AvailableLoads->insert(MemInst.getPtr(), std::pair<Value *, unsigned>(
	std::pair<Value *, unsigned>(	Inst, CurrentGeneration));
	SI->getValueOperand(), CurrentGeneration));

	// Remember that this was the last store we saw for DSE.	// Remember that this was the last store we saw for DSE.
	if (SI->isSimple())	if (!MemInst.isVolatile())
	LastStore = SI;	LastStore = Inst;
	}	}
	}	}
	}	}
Context not available.
	DataLayoutPass *DLP = getAnalysisIfAvailable<DataLayoutPass>();	DataLayoutPass *DLP = getAnalysisIfAvailable<DataLayoutPass>();
	DL = DLP ? &DLP->getDataLayout() : nullptr;	DL = DLP ? &DLP->getDataLayout() : nullptr;
	TLI = &getAnalysis<TargetLibraryInfoWrapperPass>().getTLI();	TLI = &getAnalysis<TargetLibraryInfoWrapperPass>().getTLI();
		TTI = &getAnalysis<TargetTransformInfo>();
	DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();	DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();
	AC = &getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);	AC = &getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);

Context not available.

test/Transforms/EarlyCSE/AArch64/intrinsics.ll

This file was added.

				; RUN: opt < %s -S -mtriple=aarch64-none-linux-gnu -mattr=+neon -early-cse | FileCheck %s

				define <4 x i32> @test_cse(i32* %a, [2 x <4 x i32>] %s.coerce, i32 %n) {
				entry:
				; Check that @llvm.aarch64.neon.ld2 is optimized away by Early CSE.
				; CHECK-LABEL: @test_cse
				; CHECK-NOT: call { <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld2.v4i32.p0i8
				%s.coerce.fca.0.extract = extractvalue [2 x <4 x i32>] %s.coerce, 0
				%s.coerce.fca.1.extract = extractvalue [2 x <4 x i32>] %s.coerce, 1
				br label %for.cond

				for.cond: ; preds = %for.body, %entry
				%i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
				%res.0 = phi <4 x i32> [ undef, %entry ], [ %call, %for.body ]
				%cmp = icmp slt i32 %i.0, %n
				br i1 %cmp, label %for.body, label %for.end

				for.body: ; preds = %for.cond
				%0 = bitcast i32* %a to i8*
				%1 = bitcast <4 x i32> %s.coerce.fca.0.extract to <16 x i8>
				%2 = bitcast <4 x i32> %s.coerce.fca.1.extract to <16 x i8>
				%3 = bitcast <16 x i8> %1 to <4 x i32>
				%4 = bitcast <16 x i8> %2 to <4 x i32>
				call void @llvm.aarch64.neon.st2.v4i32.p0i8(<4 x i32> %3, <4 x i32> %4, i8* %0)
				%5 = bitcast i32* %a to i8*
				%vld2 = call { <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld2.v4i32.p0i8(i8* %5)
				%vld2.fca.0.extract = extractvalue { <4 x i32>, <4 x i32> } %vld2, 0
				%vld2.fca.1.extract = extractvalue { <4 x i32>, <4 x i32> } %vld2, 1
				%call = call <4 x i32> @vaddq_s32(<4 x i32> %vld2.fca.0.extract, <4 x i32> %vld2.fca.0.extract)
				%inc = add nsw i32 %i.0, 1
				br label %for.cond

				for.end: ; preds = %for.cond
				ret <4 x i32> %res.0
				}

				define <4 x i32> @test_cse2(i32* %a, [2 x <4 x i32>] %s.coerce, i32 %n) {
				entry:
				; Check that the first @llvm.aarch64.neon.st2 is optimized away by Early CSE.
				; CHECK-LABEL: @test_cse2
				; CHECK-NOT: call void @llvm.aarch64.neon.st2.v4i32.p0i8(<4 x i32> %3, <4 x i32> %3, i8* %0)
				; CHECK: call void @llvm.aarch64.neon.st2.v4i32.p0i8(<4 x i32> %3, <4 x i32> %4, i8* %0)
				%s.coerce.fca.0.extract = extractvalue [2 x <4 x i32>] %s.coerce, 0
				%s.coerce.fca.1.extract = extractvalue [2 x <4 x i32>] %s.coerce, 1
				br label %for.cond

				for.cond: ; preds = %for.body, %entry
				%i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
				%res.0 = phi <4 x i32> [ undef, %entry ], [ %call, %for.body ]
				%cmp = icmp slt i32 %i.0, %n
				br i1 %cmp, label %for.body, label %for.end

				for.body: ; preds = %for.cond
				%0 = bitcast i32* %a to i8*
				%1 = bitcast <4 x i32> %s.coerce.fca.0.extract to <16 x i8>
				%2 = bitcast <4 x i32> %s.coerce.fca.1.extract to <16 x i8>
				%3 = bitcast <16 x i8> %1 to <4 x i32>
				%4 = bitcast <16 x i8> %2 to <4 x i32>
				call void @llvm.aarch64.neon.st2.v4i32.p0i8(<4 x i32> %3, <4 x i32> %3, i8* %0)
				call void @llvm.aarch64.neon.st2.v4i32.p0i8(<4 x i32> %3, <4 x i32> %4, i8* %0)
				%5 = bitcast i32* %a to i8*
				%vld2 = call { <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld2.v4i32.p0i8(i8* %5)
				%vld2.fca.0.extract = extractvalue { <4 x i32>, <4 x i32> } %vld2, 0
				%vld2.fca.1.extract = extractvalue { <4 x i32>, <4 x i32> } %vld2, 1
				%call = call <4 x i32> @vaddq_s32(<4 x i32> %vld2.fca.0.extract, <4 x i32> %vld2.fca.0.extract)
				%inc = add nsw i32 %i.0, 1
				br label %for.cond

				for.end: ; preds = %for.cond
				ret <4 x i32> %res.0
				}

				define <4 x i32> @test_cse3(i32* %a, [2 x <4 x i32>] %s.coerce, i32 %n) #0 {
				entry:
				; Check that the first @llvm.aarch64.neon.ld2 is optimized away by Early CSE.
				; CHECK-LABEL: @test_cse3
				; CHECK: call { <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld2.v4i32.p0i8
				; CHECK-NOT: call { <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld2.v4i32.p0i8
				%s.coerce.fca.0.extract = extractvalue [2 x <4 x i32>] %s.coerce, 0
				%s.coerce.fca.1.extract = extractvalue [2 x <4 x i32>] %s.coerce, 1
				br label %for.cond

				for.cond: ; preds = %for.body, %entry
				%i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
				%res.0 = phi <4 x i32> [ undef, %entry ], [ %call, %for.body ]
				%cmp = icmp slt i32 %i.0, %n
				br i1 %cmp, label %for.body, label %for.end

				for.body: ; preds = %for.cond
				%0 = bitcast i32* %a to i8*
				%vld2 = call { <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld2.v4i32.p0i8(i8* %0)
				%vld2.fca.0.extract = extractvalue { <4 x i32>, <4 x i32> } %vld2, 0
				%vld2.fca.1.extract = extractvalue { <4 x i32>, <4 x i32> } %vld2, 1
				%1 = bitcast i32* %a to i8*
				%vld22 = call { <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld2.v4i32.p0i8(i8* %1)
				%vld22.fca.0.extract = extractvalue { <4 x i32>, <4 x i32> } %vld22, 0
				%vld22.fca.1.extract = extractvalue { <4 x i32>, <4 x i32> } %vld22, 1
				%call = call <4 x i32> @vaddq_s32(<4 x i32> %vld2.fca.0.extract, <4 x i32> %vld22.fca.0.extract)
				%inc = add nsw i32 %i.0, 1
				br label %for.cond

				for.end: ; preds = %for.cond
				ret <4 x i32> %res.0
				}


				define <4 x i32> @test_nocse(i32* %a, i32* %b, [2 x <4 x i32>] %s.coerce, i32 %n) {
				entry:
				; Check that the store prevents @llvm.aarch64.neon.ld2 from being optimized
				; away by Early CSE.
				; CHECK-LABEL: @test_nocse
				; CHECK: call { <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld2.v4i32.p0i8
				%s.coerce.fca.0.extract = extractvalue [2 x <4 x i32>] %s.coerce, 0
				%s.coerce.fca.1.extract = extractvalue [2 x <4 x i32>] %s.coerce, 1
				br label %for.cond

				for.cond: ; preds = %for.body, %entry
				%i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
				%res.0 = phi <4 x i32> [ undef, %entry ], [ %call, %for.body ]
				%cmp = icmp slt i32 %i.0, %n
				br i1 %cmp, label %for.body, label %for.end

				for.body: ; preds = %for.cond
				%0 = bitcast i32* %a to i8*
				%1 = bitcast <4 x i32> %s.coerce.fca.0.extract to <16 x i8>
				%2 = bitcast <4 x i32> %s.coerce.fca.1.extract to <16 x i8>
				%3 = bitcast <16 x i8> %1 to <4 x i32>
				%4 = bitcast <16 x i8> %2 to <4 x i32>
				call void @llvm.aarch64.neon.st2.v4i32.p0i8(<4 x i32> %3, <4 x i32> %4, i8* %0)
				store i32 0, i32* %b, align 4
				%5 = bitcast i32* %a to i8*
				%vld2 = call { <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld2.v4i32.p0i8(i8* %5)
				%vld2.fca.0.extract = extractvalue { <4 x i32>, <4 x i32> } %vld2, 0
				%vld2.fca.1.extract = extractvalue { <4 x i32>, <4 x i32> } %vld2, 1
				%call = call <4 x i32> @vaddq_s32(<4 x i32> %vld2.fca.0.extract, <4 x i32> %vld2.fca.0.extract)
				%inc = add nsw i32 %i.0, 1
				br label %for.cond

				for.end: ; preds = %for.cond
				ret <4 x i32> %res.0
				}

				define <4 x i32> @test_nocse2(i32* %a, [2 x <4 x i32>] %s.coerce, i32 %n) {
				entry:
				; Check that @llvm.aarch64.neon.ld3 is not optimized away by Early CSE due
				; to mismatch between st2 and ld3.
				; CHECK-LABEL: @test_nocse2
				; CHECK: call { <4 x i32>, <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld3.v4i32.p0i8
				%s.coerce.fca.0.extract = extractvalue [2 x <4 x i32>] %s.coerce, 0
				%s.coerce.fca.1.extract = extractvalue [2 x <4 x i32>] %s.coerce, 1
				br label %for.cond

				for.cond: ; preds = %for.body, %entry
				%i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
				%res.0 = phi <4 x i32> [ undef, %entry ], [ %call, %for.body ]
				%cmp = icmp slt i32 %i.0, %n
				br i1 %cmp, label %for.body, label %for.end

				for.body: ; preds = %for.cond
				%0 = bitcast i32* %a to i8*
				%1 = bitcast <4 x i32> %s.coerce.fca.0.extract to <16 x i8>
				%2 = bitcast <4 x i32> %s.coerce.fca.1.extract to <16 x i8>
				%3 = bitcast <16 x i8> %1 to <4 x i32>
				%4 = bitcast <16 x i8> %2 to <4 x i32>
				call void @llvm.aarch64.neon.st2.v4i32.p0i8(<4 x i32> %3, <4 x i32> %4, i8* %0)
				%5 = bitcast i32* %a to i8*
				%vld3 = call { <4 x i32>, <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld3.v4i32.p0i8(i8* %5)
				%vld3.fca.0.extract = extractvalue { <4 x i32>, <4 x i32>, <4 x i32> } %vld3, 0
				%vld3.fca.2.extract = extractvalue { <4 x i32>, <4 x i32>, <4 x i32> } %vld3, 2
				%call = call <4 x i32> @vaddq_s32(<4 x i32> %vld3.fca.0.extract, <4 x i32> %vld3.fca.2.extract)
				%inc = add nsw i32 %i.0, 1
				br label %for.cond

				for.end: ; preds = %for.cond
				ret <4 x i32> %res.0
				}

				define <4 x i32> @test_nocse3(i32* %a, [2 x <4 x i32>] %s.coerce, i32 %n) {
				entry:
				; Check that @llvm.aarch64.neon.st3 is not optimized away by Early CSE due to
				; mismatch between st2 and st3.
				; CHECK-LABEL: @test_nocse3
				; CHECK: call void @llvm.aarch64.neon.st3.v4i32.p0i8
				; CHECK: call void @llvm.aarch64.neon.st2.v4i32.p0i8
				%s.coerce.fca.0.extract = extractvalue [2 x <4 x i32>] %s.coerce, 0
				%s.coerce.fca.1.extract = extractvalue [2 x <4 x i32>] %s.coerce, 1
				br label %for.cond

				for.cond: ; preds = %for.body, %entry
				%i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
				%res.0 = phi <4 x i32> [ undef, %entry ], [ %call, %for.body ]
				%cmp = icmp slt i32 %i.0, %n
				br i1 %cmp, label %for.body, label %for.end

				for.body: ; preds = %for.cond
				%0 = bitcast i32* %a to i8*
				%1 = bitcast <4 x i32> %s.coerce.fca.0.extract to <16 x i8>
				%2 = bitcast <4 x i32> %s.coerce.fca.1.extract to <16 x i8>
				%3 = bitcast <16 x i8> %1 to <4 x i32>
				%4 = bitcast <16 x i8> %2 to <4 x i32>
				call void @llvm.aarch64.neon.st3.v4i32.p0i8(<4 x i32> %4, <4 x i32> %3, <4 x i32> %3, i8* %0)
				call void @llvm.aarch64.neon.st2.v4i32.p0i8(<4 x i32> %3, <4 x i32> %3, i8* %0)
				%5 = bitcast i32* %a to i8*
				%vld3 = call { <4 x i32>, <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld3.v4i32.p0i8(i8* %5)
				%vld3.fca.0.extract = extractvalue { <4 x i32>, <4 x i32>, <4 x i32> } %vld3, 0
				%vld3.fca.1.extract = extractvalue { <4 x i32>, <4 x i32>, <4 x i32> } %vld3, 1
				%call = call <4 x i32> @vaddq_s32(<4 x i32> %vld3.fca.0.extract, <4 x i32> %vld3.fca.0.extract)
				%inc = add nsw i32 %i.0, 1
				br label %for.cond

				for.end: ; preds = %for.cond
				ret <4 x i32> %res.0
				}

				; Function Attrs: nounwind
				declare void @llvm.aarch64.neon.st2.v4i32.p0i8(<4 x i32>, <4 x i32>, i8* nocapture)

				; Function Attrs: nounwind
				declare void @llvm.aarch64.neon.st3.v4i32.p0i8(<4 x i32>, <4 x i32>, <4 x i32>, i8* nocapture)

				; Function Attrs: nounwind readonly
				declare { <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld2.v4i32.p0i8(i8*)

				; Function Attrs: nounwind readonly
				declare { <4 x i32>, <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld3.v4i32.p0i8(i8*)

				define internal fastcc <4 x i32> @vaddq_s32(<4 x i32> %__p0, <4 x i32> %__p1) {
				entry:
				%add = add <4 x i32> %__p0, %__p1
				ret <4 x i32> %add
				}
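
The six test functions above exercise two mechanisms: forwarding an available value to a later load, and deleting a store that is overwritten by a matching store. The sketch below is a toy model of that bookkeeping, not the code from this patch. The names `ToyMemOp`, `ToyEarlyCSE`, `makeStore` and `makeLoad` are hypothetical, pointers and values are plain strings, and a single integer `Kind` stands in for whatever compatibility check the real pass and target hooks perform between, say, st2 and ld3.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified description of one memory operation.
// Kind is an assumed compatibility tag (e.g. 2 for ld2/st2, 3 for ld3/st3,
// 0 for an ordinary load/store).
struct ToyMemOp {
  bool IsStore;
  bool IsVolatile;
  int Kind;
  std::string Ptr;   // symbolic pointer operand, e.g. "a"
  std::string Value; // stored value, or the name of the load's result
};

ToyMemOp makeStore(int Kind, const std::string &Ptr, const std::string &Val) {
  return ToyMemOp{true, false, Kind, Ptr, Val};
}
ToyMemOp makeLoad(int Kind, const std::string &Ptr, const std::string &Res) {
  return ToyMemOp{false, false, Kind, Ptr, Res};
}

class ToyEarlyCSE {
  struct Entry {
    std::string Value;
    unsigned Generation;
    int Kind;
  };
  std::map<std::string, Entry> AvailableLoads; // pointer -> known value
  unsigned CurrentGeneration = 0;              // bumped on every write
  bool HasLastStore = false;
  ToyMemOp LastStore{};

public:
  // Returns the name of an available value that can replace this load,
  // or an empty string if the load has to stay.
  std::string visitLoad(const ToyMemOp &Op) {
    HasLastStore = false; // a load may read what the last store wrote
    auto It = AvailableLoads.find(Op.Ptr);
    if (!Op.IsVolatile && It != AvailableLoads.end() &&
        It->second.Generation == CurrentGeneration &&
        It->second.Kind == Op.Kind)
      return It->second.Value;
    AvailableLoads[Op.Ptr] = {Op.Value, CurrentGeneration, Op.Kind};
    return "";
  }

  // Returns true if this store makes the previously seen store dead.
  bool visitStore(const ToyMemOp &Op) {
    bool KillsPrevious = HasLastStore && LastStore.Ptr == Op.Ptr &&
                         LastStore.Kind == Op.Kind;
    ++CurrentGeneration; // any write invalidates older table entries
    AvailableLoads[Op.Ptr] = {Op.Value, CurrentGeneration, Op.Kind};
    HasLastStore = !Op.IsVolatile; // mirrors `if (!MemInst.isVolatile())`
    if (HasLastStore)
      LastStore = Op;
    return KillsPrevious;
  }
};
```

Under these assumptions, a store to `a` followed by a load of the same kind from `a` returns the stored value (the `test_cse` shape). An intervening store to `b` bumps the generation, so the load stays (the `test_nocse` shape). A kind mismatch blocks both forwarding and dead-store elimination (the `test_nocse2` and `test_nocse3` shapes).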

test/Transforms/EarlyCSE/AArch64/lit.local.cfg

This file was added.

				config.suffixes = ['.ll']

				targets = set(config.root.targets_to_build.split())
				if not 'AArch64' in targets:
				    config.unsupported = True

Revision Contents

Diff 18727

include/llvm/Analysis/TargetTransformInfo.h

lib/Analysis/TargetTransformInfo.cpp

lib/Target/AArch64/AArch64TargetTransformInfo.cpp

lib/Transforms/Scalar/EarlyCSE.cpp

test/Transforms/EarlyCSE/AArch64/intrinsics.ll

test/Transforms/EarlyCSE/AArch64/lit.local.cfg
