This is an archive of the discontinued LLVM Phabricator instance.

Differential D104801

[MemCpyOpt] Enable memcpy optimizations unconditionally.
Closed, Public

Authored by tra on Jun 23 2021, 11:14 AM.

Details

Reviewers

nikic

Commits

rG2c98298a7559: [MemCpyOpt] Enable memcpy optimizations unconditionally.

Summary

The patch does not depend on the availability of the library functions for memcpy/memset as it operates on LLVM intrinsics.
The optimizations are useful on the targets that have these functions disabled (e.g. NVPTX & AMDGPU).

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
400 ms	x64 debian > Clang.CodeGen::aggregate-assign-call.c
800 ms	x64 debian > Clang.CodeGen::attr-arm-sve-vector-bits-call.c
520 ms	x64 debian > Clang.CodeGen::available-externally-suppress.c
430 ms	x64 debian > Clang.CodeGen::cfi-icall-cross-dso.c
520 ms	x64 debian > Clang.CodeGen::dllimport.c
(185 tests failed in total; the first five are listed.)

Event Timeline

tra created this revision. (Jun 23 2021, 11:14 AM)

Herald added subscribers: bixia, hiraditya, jholewinski. (Jun 23 2021, 11:14 AM)

tra requested review of this revision. (Jun 23 2021, 11:14 AM)

Herald added a project: Restricted Project. (Jun 23 2021, 11:14 AM)

jlebar added a subscriber: jlebar. (Jun 23 2021, 11:17 AM)

MemCpyOpt works on intrinsics, not libcalls. Why is it checking TLI at all?

I think the only place TLI should come in is when SimplifyLibCall converts memcpy into llvm.memcpy, but MemCpyOpt itself shouldn't care about it.

Harbormaster completed remote builds in B110538: Diff 353830. (Jun 23 2021, 11:52 AM)

In D104801#2836730, @nikic wrote:

MemCpyOpt works on intrinsics, not libcalls. Why is it checking TLI at all?

I think the only place TLI should come in is when SimplifyLibCall converts memcpy into llvm.memcpy, but MemCpyOpt itself shouldn't care about it.

The check has been added long ago to deal with -fno-builtin, but it's not clear what exactly was the issue.
https://github.com/llvm/llvm-project/commit/23f61a09aff4a3c5ca4bba9410878dcfebf0656c

My best guess here is that this exists to guard against the introduction of memset/memcpy where none existed before. MemCpyOpt mostly does optimizations that remove/replace memcpys/memsets, but there are two that convert loads/stores into memcpy/memset. Possibly we should only be guarding those two and let everything else happen?

Not sure who would be familiar with this area.
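To make the two store-to-intrinsic rewrites mentioned above concrete, here is a minimal source-level sketch. It is an illustration only: MemCpyOpt operates on LLVM IR, not on C++, and the names `Pixel`, `zeroWithStores`, `zeroWithMemset`, `copyByAssignment` and `copyWithMemcpy` are hypothetical and do not appear in the patch. Each pair shows code of the shape a pass might start from and the memset/memcpy form it is equivalent to.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical aggregate, used only for this illustration.
struct Pixel {
  unsigned char R, G, B, A;
};

// A run of adjacent stores of the same byte value...
void zeroWithStores(unsigned char *Buf) {
  Buf[0] = 0;
  Buf[1] = 0;
  Buf[2] = 0;
  Buf[3] = 0;
}

// ...is equivalent to a single memset over the same four bytes.
void zeroWithMemset(unsigned char *Buf) { std::memset(Buf, 0, 4); }

// A whole-aggregate load followed by a store...
void copyByAssignment(Pixel &Dst, const Pixel &Src) { Dst = Src; }

// ...is equivalent to a memcpy of the aggregate's bytes.
void copyWithMemcpy(Pixel &Dst, const Pixel &Src) {
  std::memcpy(&Dst, &Src, sizeof(Pixel));
}

// Usage example: both forms of each pair leave memory in the same state.
void usageExample() {
  unsigned char A[4] = {1, 2, 3, 4};
  unsigned char B[4] = {1, 2, 3, 4};
  zeroWithStores(A);
  zeroWithMemset(B);
  assert(std::memcmp(A, B, sizeof(A)) == 0);

  Pixel Src{10, 20, 30, 40};
  Pixel D1{}, D2{};
  copyByAssignment(D1, Src);
  copyWithMemcpy(D2, Src);
  assert(std::memcmp(&D1, &D2, sizeof(Pixel)) == 0);
}
```

The second form of each pair is the one that introduces a memset or memcpy where none existed before, which is the case the guess above is concerned with.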

This change only enables the pass for NVPTX where use of memcpy/memset intrinsics is fine and it does not change anything for other back-ends.

Figuring out better criteria for enabling/disabling the pass can be dealt with separately by someone with a better understanding of the pass than myself.

Not sure who would be familiar with this area.

We may need to ask @lattner who authored the change, but it was long ago, so our chances are probably not great.

In D104801#2836898, @tra wrote:

This change only enables the pass for NVPTX where use of memcpy/memset intrinsics is fine and it does not change anything for other back-ends.

Figuring out better criteria for enabling/disabling the pass can be dealt with separately by someone with a better understanding of the pass than myself.

I think these questions are rather related. Why does NVPTX require special handling here? If these libcalls are actually available, then you need to enable them in TLI. If they aren't, but the intrinsics form is still usable, then the libcall checks in MemCpyOpt are wrong and we should adjust those. A separate TTI hook seems like the wrong solution in either case.

In D104801#2836912, @nikic wrote:

I think these questions are rather related. Why does NVPTX require special handling here? If these libcalls are actually available, then you need to enable them in TLI. If they aren't, but the intrinsics form is still usable, then the libcall checks in MemCpyOpt are wrong and we should adjust those. A separate TTI hook seems like the wrong solution in either case.

NVPTX is... special.
There's no standard library of any kind on the GPU, hence no libcalls.
We do make an effort to provide enough functionality to keep things working.
E.g. most of the math functions end up being pulled from an external bitcode library.
We rely on memcpy/memset intrinsics to get replaced by loops by one of the NVPTX-specific passes.

So, reporting libcalls as unavailable is correct.
"memcpy/memset intrinsics are still usable" is... mostly true. It is for the early passes. We do have some corner cases where we end up materializing them too late, but it's rarely an issue in practice.
Given that this behavior is very specific to this particular back-end and that the NVPTX back-end does something rather peculiar, TTI is an appropriate source for the hint, IMO.
I don't think we have a good way to express "things work in a weird way here" in a back-end agnostic way here.

In D104801#2836940, @tra wrote:

In D104801#2836912, @nikic wrote:

I think these questions are rather related. Why does NVPTX require special handling here? If these libcalls are actually available, then you need to enable them in TLI. If they aren't, but the intrinsics form is still usable, then the libcall checks in MemCpyOpt are wrong and we should adjust those. A separate TTI hook seems like the wrong solution in either case.

NVPTX is... special.
There's no standard library of any kind on the GPU, hence no libcalls.
We do make an effort to provide enough functionality to keep things working.
E.g. most of the math functions end up being pulled from an external bitcode library.
We rely on memcpy/memset intrinsics to get replaced by loops by one of the NVPTX-specific passes.

So, reporting libcalls as unavailable is correct.
"memcpy/memset intrinsics are still usable" is... mostly true. It is for the early passes. We do have some corner cases where we end up materializing them too late, but it's rarely an issue in practice.
Given that this behavior is very specific to this particular back-end and that the NVPTX back-end does something rather peculiar, TTI is an appropriate source for the hint, IMO.
I don't think we have a good way to express "things work in a weird way here" in a back-end agnostic way here.

So IIUC the behavior you want to express is that certain lib functions actually are considered available up to a point? I'm not sure about having a dedicated TTI hook to specifically enable/disable the pass. Would it be possible to mark the lib functions as available until the pass that lowers them runs (there's TLI->disableAllFunctions)?

In D104801#2838064, @fhahn wrote:

So IIUC the behavior you want to express is that certain lib functions actually are considered available up to a point?

No, *intrinsics* are available. We can not currently lower any of them to libcalls and rely on alternative lowering mechanisms.

I think the pass' use of "is libcall available" is an imperfect proxy for either "should we bother looking for memcpy/memset ops" or for "can we materialize new memcpy/memset". It appears to assume that intrinsics and builtins are either both supported or both unsupported.

I'm not sure about having a dedicated TTI hook to specifically enable/disable the pass.

IMO TTI is the standard mechanism to make target-specific information available to otherwise generic passes. There are ~50 passes under lib/Transforms that utilize it.

Would it be possible to mark the lib functions as available until the pass that lowers them runs (there's TLI->disableAllFunctions)?

I don't think it's feasible. Back-end has no idea or control over when/where the pass is going to run.
E.g. I can run opt -memcpyopt my.ll and I would want it to work with NVPTX.

I agree that this seems overly specific and unfortunate, but I'm not a competent reviewer for this area, so take that with a grain of salt.

llvm/include/llvm/Analysis/TargetTransformInfo.h
1417	typo providing

typo fix

I agree that this seems overly specific and unfortunate, but I'm not a competent reviewer for this area, so take that with a grain of salt.

The TTI knob can be generalized to something like "hasMemoryIntrinsicsAlwaysAvailable" which would allow decoupling availability of memcpy libcalls from usability of memcpy intrinsics without being overly specific about MemCpyOpt pass.

I'm open about any other suggestions on a better way to get the pass working for NVPTX.

Harbormaster completed remote builds in B110883: Diff 354330. (Jun 24 2021, 1:01 PM)

In D104801#2839313, @tra wrote:

In D104801#2838064, @fhahn wrote:

So IIUC the behavior you want to express is that certain lib functions actually are considered available up to a point?

No, *intrinsics* are available. We can not currently lower any of them to libcalls and rely on alternative lowering mechanisms.

Sure, the intrinsics should *always* be available on any target regardless of whether they are lowered to lib calls or not.

I think the pass' use of "is libcall available" is an imperfect proxy for either "should we bother looking for memcpy/memset ops" or for "can we materialize new memcpy/memset". It appears to assume that intrinsics and builtins are either both supported or both unsupported.

Agreed, the current check is not ideal. IIUC the current check acts as a proxy to check whether the backends can lower the intrinsic using a lib call. At the moment, I think this mainly guards against MemCpyOpt introducing llvm.memcpy calls that later get lowered to library calls in the backend, even if the library function is not marked as available.

Perhaps an alternative to the TLI check would be to provide a generic lowering for llvm.memcpy intrinsics when lib calls are not available? (One thing to note is that I think both Clang and GCC require memcpy and a few others to be available, even in freestanding environments)

I'm not sure about having a dedicated TTI hook to specifically enable/disable the pass.

IMO TTI is the standard mechanism to make target-specific information available to otherwise generic passes. There are ~50 passes under lib/Transforms that utilize it.

Yes, but at the moment, I don't think there's much precedent for backends specifically disabling certain passes. This could get messy very quickly, e.g. even if we add only hooks for some of the available passes. Those hooks are also not composable. IMO it would be preferable to model this in a way so other passes that may want to introduce llvm.memcpy calls also benefit.

Generalizing the hook to whether the intrinsics can be lowered without lib calls seems a step in the right direction to me and this could be helpful for other passes as well (as you suggested in a later comment). If backends could easily lower any llvm.memcpy call without needing to fall back to library calls, that would also be compelling.

In D104801#2843821, @fhahn wrote:

Perhaps an alternative to the TLI check would be to provide a generic lowering for llvm.memcpy intrinsics when lib calls are not available? (One thing to note is that I think both Clang and GCC require memcpy and a few others to be available, even in freestanding environments)

I'm not ready to bite that much.

I wonder if we could use the existing int_memcpy_inline in cases where libcalls are not available. It's supposed to guarantee that it does not call external functions.
https://github.com/llvm/llvm-project/blob/44826ecd929bdd33b3c86650198a5f8a57965cc7/llvm/include/llvm/IR/Intrinsics.td#L616

I'm not sure about having a dedicated TTI hook to specifically enable/disable the pass.

IMO TTI is the standard mechanism to make target-specific information available to otherwise generic passes. There are ~50 passes under lib/Transforms that utilize it.

Yes, but at the moment, I don't think there's much precedent for backends specifically disabling certain passes. This could get messy very quickly, e.g. even if we add only hooks for some of the available passes. Those hooks are also not composable. IMO it would be preferable to model this in a way so other passes that may want to introduce llvm.memcpy calls also benefit.

In this case the knob only *allows* using the pass in a wider range of cases. Passes that don't want or don't need it are not affected.
In a way, enabling/disabling libcalls already acts as such an external enable/disable knob. So do the various other existing knobs provided by TTI, if you squint just right. E.g. setting the inlining threshold very low would effectively disable the inlining pass. I don't think this pass introduces anything conceptually new.

Generalizing the hook to whether the intrinsics can be lowered without lib calls seems a step in the right direction to me and this could be helpful for other passes as well (as you suggested in a later comment). If backends could easily lower any llvm.memcpy call without needing to fall back to library calls, that would also be compelling.

OK. Let me try to see if I can make MemCpyOpt pass fall back to inline version of memcpy intrinsic if libcalls are not available.

If that does not pan out, plan B would be to provide a generic TTI knob (e.g. optional<bool> canLowerIntrinsic()) and then change the MemCpyOpt pass to use that, instead of checking libcalls.
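A tri-state knob of the kind described as plan B could be sketched roughly as follows. This is a standalone illustration that does not use LLVM headers; `TargetHooks`, `LibInfo` and `mayCreateMemIntrinsics` are hypothetical names, and the fallback rule (use libcall availability when the target has no opinion) is an assumption about how such a knob might be consumed, not something the review settled on.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for a target hook; not an existing LLVM interface.
struct TargetHooks {
  // std::nullopt means "the target has no opinion".
  std::optional<bool> CanLowerMemIntrinsics;
};

// Hypothetical stand-in for the libcall availability that the pass checks.
struct LibInfo {
  bool HasMemcpy;
  bool HasMemset;
};

// The target's answer wins when it gives one; otherwise fall back to the
// libcall-availability proxy.
bool mayCreateMemIntrinsics(const TargetHooks &Target, const LibInfo &Lib) {
  if (Target.CanLowerMemIntrinsics.has_value())
    return *Target.CanLowerMemIntrinsics;
  return Lib.HasMemcpy && Lib.HasMemset;
}

// Usage example covering the three states of the knob.
void usageExample() {
  const LibInfo NoLibc{false, false};
  const LibInfo WithLibc{true, true};
  // A GPU-like target: no libcalls, but intrinsics can still be lowered.
  assert(mayCreateMemIntrinsics(TargetHooks{true}, NoLibc));
  // No opinion from the target: behave like the libcall check.
  assert(!mayCreateMemIntrinsics(TargetHooks{std::nullopt}, NoLibc));
  assert(mayCreateMemIntrinsics(TargetHooks{std::nullopt}, WithLibc));
  // A target that explicitly opts out, even though libcalls exist.
  assert(!mayCreateMemIntrinsics(TargetHooks{false}, WithLibc));
}
```

The optional return type is what lets targets that never implement the hook keep their current behavior.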

Allow the pass to run regardless of libcall availability.

I've always hated how TLI acts as if the llvm intrinsics and libc/libm functions are interchangeable. TLI should have no bearing on the introduction of an intrinsic call which can be easily lowered without relying on a host library call. We should just always canonicalize to llvm.memcpy and let the backend expand it if the target needs to later.

tra added inline comments. (Jun 30 2021, 3:50 PM)

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
1747–1751	AFAICT the check does not affect correctness. The "little point of doing anything here" argument is questionable. These optimizations do allow other optimizations to happen and that's exactly how I ended up here -- a memcpy that was not replaced with memset ended up keeping other things alive which resulted in more work for the compiler and slower code at runtime. In practice only AMDGPU and NVPTX disable these libcalls and for both targets this pass is beneficial. For the others the change is a no-op.

Harbormaster completed remote builds in B111861: Diff 355705. (Jun 30 2021, 4:24 PM)

Ping. PTAL. The patch no longer relies on TTI and has been simplified to remove the checks for libcall availability.

LGTM, but the patch description needs an update.

This revision is now accepted and ready to land. (Jul 16 2021, 10:46 AM)

tra retitled this revision from "[MemCpyOpt] Enable memcpy optimization for NVPTX back-end." to "[MemCpyOpt] Enable memcpy optimizations unconditionally." (Jul 16 2021, 11:06 AM)

tra edited the summary of this revision.

Herald added a subscriber: tpr. (Jul 16 2021, 11:06 AM)

In D104801#2883786, @nikic wrote:

LGTM, but the patch description needs an update.

Done.

This revision was landed with ongoing or failed builds. (Jul 19 2021, 11:58 AM)

Closed by commit rG2c98298a7559: [MemCpyOpt] Enable memcpy optimizations unconditionally. (authored by tra).

This revision was automatically updated to reflect the committed changes.

tra added a commit: rG2c98298a7559: [MemCpyOpt] Enable memcpy optimizations unconditionally.

The one thing I'm concerned about with this change is that we might potentially make a memcpy implementation call memcpy, which seems bad.

In D104801#2888266, @efriedma wrote:

The one thing I'm concerned about with this change is that we might potentially make a memcpy implementation call memcpy, which seems bad.

memcpy implementation is recognized by LoopIdiom pass.
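The concern can be illustrated with a hand-written copy routine of the kind a runtime library might contain. This is a sketch only; `naiveCopy` is a hypothetical name and the code is not taken from any library. If an optimization rewrote the loop body into a call to memcpy, and this routine were itself the library's memcpy, the function would end up calling itself. Whether a particular pass performs such a rewrite depends on the pass and on the build flags in use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical byte-by-byte copy, written the way a freestanding runtime
// might implement its own memcpy. The regions must not overlap.
void *naiveCopy(void *Dst, const void *Src, std::size_t N) {
  auto *D = static_cast<unsigned char *>(Dst);
  const auto *S = static_cast<const unsigned char *>(Src);
  for (std::size_t I = 0; I != N; ++I)
    D[I] = S[I];
  return Dst;
}

// Usage example: the loop and the library call produce the same bytes,
// which is exactly why a compiler may treat them as interchangeable.
void usageExample() {
  const char Src[] = "memcpyopt";
  char ViaLoop[sizeof(Src)] = {};
  char ViaLibc[sizeof(Src)] = {};
  naiveCopy(ViaLoop, Src, sizeof(Src));
  std::memcpy(ViaLibc, Src, sizeof(Src));
  assert(std::memcmp(ViaLoop, ViaLibc, sizeof(Src)) == 0);
}
```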

tra added a reverting change: rG1a43ee65d1bb: Revert "[MemCpyOpt] Enable memcpy optimizations unconditionally.". (Jul 19 2021, 2:28 PM)

In D104801#2888266, @efriedma wrote:

The one thing I'm concerned about with this change is that we might potentially make a memcpy implementation call memcpy, which seems bad.

Turns out that the patch breaks sanitizers that didn't expect memset to materialize when they were compiling their runtime.
https://lab.llvm.org/buildbot/#/builders/37/builds/5453

I've reverted the patch for now and will try again once I figure out how to keep sanitizers happy.

Besides that this patch created the intrinsics llvm.memset which in our target was unsupported, it also created pointers that may be invalid on the target, e.g. i8 addrspace(X)* where addrspace(X) only supports vector memory. While this is probably all solvable, please consider adding TTI switches for targets to disable this optimization (e.g. for the intrinsics).

In D104801#2889283, @hgreving wrote:

Besides that this patch created the intrinsics llvm.memset which in our target was unsupported, it also created pointers that may be invalid on the target, e.g. i8 addrspace(X)* where addrspace(X) only supports vector memory. While this is probably all solvable, please consider adding TTI switches for targets to disable this optimization (e.g. for the intrinsics).

...and that brings us back to where this patch has started -- with a TTI knob to control whether it wants this pass enabled. :-/

That said, I'm not quite sure that "this type/AS combination is invalid" is something you can expect all parts of LLVM to respect. While this pass may happen to tickle it, I can see other cases where the type of the pointer may change. I would think that dealing with the quirks of type support of a particular target would be something for the legalizer to do.

In D104801#2890876, @tra wrote:

In D104801#2889283, @hgreving wrote:

Besides that this patch created the intrinsics llvm.memset which in our target was unsupported, it also created pointers that may be invalid on the target, e.g. i8 addrspace(X)* where addrspace(X) only supports vector memory. While this is probably all solvable, please consider adding TTI switches for targets to disable this optimization (e.g. for the intrinsics).

...and that brings us back to where this patch has started -- with a TTI knob to control whether it wants this pass enabled. :-/

That said, I'm not quite sure that "this type/AS combination is invalid" is something you can expect all parts of LLVM to respect. While this pass may happen to tickle it, I can see other cases where the type of the pointer may change. I would think that dealing with the quirks of type support of a particular target would be something for the legalizer to do.

I agree with the above. But in practice, since this change is quite drastic, I would recommend adding some kind of control for targets not wanting this. Also, a flat-rate knob to disable the pass altogether would be fine IMO.

tra mentioned this in D106401: [CUDA, MemCpyOpt] Add a flag to force-enable memcpyopt and use it for CUDA. (Jul 20 2021, 2:48 PM)

nikic mentioned this in D106769: [MemCpyOpt] Relax libcall checks. (Jul 25 2021, 8:40 AM)

nikic mentioned this in rGbb15861e149a: [MemCpyOpt] Relax libcall checks. (Aug 4 2021, 12:18 PM)

tra mentioned this in rG6a9cf21f5a2d: [CUDA, MemCpyOpt] Add a flag to force-enable memcpyopt and use it for CUDA. (Aug 6 2021, 11:22 AM)

Revision Contents

Path	Size
llvm/include/llvm/Analysis/TargetTransformInfo.h	7 lines
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h	2 lines
llvm/include/llvm/Transforms/Scalar/MemCpyOptimizer.h	6 lines
llvm/lib/Analysis/TargetTransformInfo.cpp	4 lines
llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h	1 line
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp	23 lines

Diff 354330

llvm/include/llvm/Analysis/TargetTransformInfo.h

@@ ... @@ struct VPLegalization {
     }
     VPLegalization(VPTransform EVLParamStrategy, VPTransform OpStrategy)
         : EVLParamStrategy(EVLParamStrategy), OpStrategy(OpStrategy) {}
   };

   /// \returns How the target needs this vector-predicated operation to be
   /// transformed.
   VPLegalization getVPLegalizationStrategy(const VPIntrinsic &PI) const;
+
+  /// \returns True, if the target wants to enable MemCpy optimization pass
+  /// despite not providing library functions for memset/memcpy.
+  bool enableMemCpyOpt() const;
   /// @}

   /// @}

 private:
   /// Estimate the latency of specified instruction.
   /// Returns 1 as the default value.
   InstructionCost getInstructionLatency(const Instruction *I) const;

Inline comment (lattner, marked Done): typo providing

@@ ... @@ virtual bool preferPredicatedReductionSelect(unsigned Opcode, Type *Ty,
                                                ReductionFlags) const = 0;
   virtual bool shouldExpandReduction(const IntrinsicInst *II) const = 0;
   virtual unsigned getGISelRematGlobalCost() const = 0;
   virtual bool supportsScalableVectors() const = 0;
   virtual bool hasActiveVectorLength() const = 0;
   virtual InstructionCost getInstructionLatency(const Instruction *I) = 0;
   virtual VPLegalization
   getVPLegalizationStrategy(const VPIntrinsic &PI) const = 0;
+  virtual bool enableMemCpyOpt() const = 0;
 };

 template <typename T>
 class TargetTransformInfo::Model final : public TargetTransformInfo::Concept {
   T Impl;

 public:
   Model(T Impl) : Impl(std::move(Impl)) {}

@@ ... @@ public:
   InstructionCost getInstructionLatency(const Instruction *I) override {
     return Impl.getInstructionLatency(I);
   }

   VPLegalization
   getVPLegalizationStrategy(const VPIntrinsic &PI) const override {
     return Impl.getVPLegalizationStrategy(PI);
   }
+
+  bool enableMemCpyOpt() const override { return Impl.enableMemCpyOpt(); }
 };

 template <typename T>
 TargetTransformInfo::TargetTransformInfo(T Impl)
     : TTIImpl(new Model<T>(Impl)) {}

 /// Analysis pass providing the \c TargetTransformInfo.
 ///
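The hook in this diff has to be threaded through three layers: the public wrapper, the abstract `Concept`, and the templated `Model` that forwards to a target implementation. The following standalone sketch shows that forwarding pattern in miniature. It does not use LLVM headers, and `TransformInfo`, `DefaultImpl` and `GpuLikeImpl` are hypothetical stand-ins for `TargetTransformInfo`, the default implementation and a target override; the use of `std::unique_ptr` is a simplification chosen for this sketch.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical miniature of a type-erased target-info wrapper.
class TransformInfo {
  // Abstract interface: one pure virtual per hook.
  struct Concept {
    virtual ~Concept() = default;
    virtual bool enableMemCpyOpt() const = 0;
  };

  // Adapter that forwards each hook to a concrete implementation type.
  template <typename T> struct Model final : Concept {
    T Impl;
    explicit Model(T Impl) : Impl(std::move(Impl)) {}
    bool enableMemCpyOpt() const override { return Impl.enableMemCpyOpt(); }
  };

  std::unique_ptr<Concept> TTIImpl;

public:
  template <typename T>
  explicit TransformInfo(T Impl)
      : TTIImpl(std::make_unique<Model<T>>(std::move(Impl))) {}

  // Public entry point used by passes.
  bool enableMemCpyOpt() const { return TTIImpl->enableMemCpyOpt(); }
};

// Default answer for targets that do not override the hook.
struct DefaultImpl {
  bool enableMemCpyOpt() const { return false; }
};

// A target that opts in by shadowing the default.
struct GpuLikeImpl : DefaultImpl {
  bool enableMemCpyOpt() const { return true; }
};

// Usage example: a pass only ever sees the wrapper.
void usageExample() {
  TransformInfo Generic{DefaultImpl{}};
  TransformInfo Gpu{GpuLikeImpl{}};
  assert(!Generic.enableMemCpyOpt());
  assert(Gpu.enableMemCpyOpt());
}
```

Every new hook needs an entry in each layer, which is part of why reviewers were reluctant to add a knob specific to a single pass.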

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

@@ ... @@ public:

   TargetTransformInfo::VPLegalization
   getVPLegalizationStrategy(const VPIntrinsic &PI) const {
     return TargetTransformInfo::VPLegalization(
         /* EVLParamStrategy */ TargetTransformInfo::VPLegalization::Discard,
         /* OperatorStrategy */ TargetTransformInfo::VPLegalization::Convert);
   }

+  bool enableMemCpyOpt() const { return false; }
+
 protected:
   // Obtain the minimum required size to hold the value (without the sign)
   // In case of a vector it returns the min required size for one element.
   unsigned minRequiredElementSize(const Value *Val, bool &isSigned) const {
     if (isa<ConstantDataVector>(Val) || isa<ConstantVector>(Val)) {
       const auto *VectorValue = cast<Constant>(Val);

       // In case of a vector need to pick the max between the min

llvm/include/llvm/Transforms/Scalar/MemCpyOptimizer.h

@@ ... @@
 class MemCpyInst;
 class MemMoveInst;
 class MemoryDependenceResults;
 class MemorySSA;
 class MemorySSAUpdater;
 class MemSetInst;
 class StoreInst;
 class TargetLibraryInfo;
+class TargetTransformInfo;
 class Value;

 class MemCpyOptPass : public PassInfoMixin<MemCpyOptPass> {
   MemoryDependenceResults *MD = nullptr;
   TargetLibraryInfo *TLI = nullptr;
+  TargetTransformInfo *TTI = nullptr;
   AAResults *AA = nullptr;
   AssumptionCache *AC = nullptr;
   DominatorTree *DT = nullptr;
   MemorySSA *MSSA = nullptr;
   MemorySSAUpdater *MSSAU = nullptr;

 public:
   MemCpyOptPass() = default;

   PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);

   // Glue for the old PM.
   bool runImpl(Function &F, MemoryDependenceResults *MD, TargetLibraryInfo *TLI,
-               AAResults *AA, AssumptionCache *AC, DominatorTree *DT,
-               MemorySSA *MSSA);
+               TargetTransformInfo *TTI, AAResults *AA, AssumptionCache *AC,
+               DominatorTree *DT, MemorySSA *MSSA);

 private:
   // Helper functions
   bool processStore(StoreInst *SI, BasicBlock::iterator &BBI);
   bool processMemSet(MemSetInst *SI, BasicBlock::iterator &BBI);
   bool processMemCpy(MemCpyInst *M, BasicBlock::iterator &BBI);
   bool processMemMove(MemMoveInst *M);
   bool performCallSlotOptzn(Instruction *cpyLoad, Instruction *cpyStore,

llvm/lib/Analysis/TargetTransformInfo.cpp

@@ ... @@
 }

 AnalysisKey TargetIRAnalysis::Key;

 TargetIRAnalysis::Result TargetIRAnalysis::getDefaultTTI(const Function &F) {
   return Result(F.getParent()->getDataLayout());
 }

+bool TargetTransformInfo::enableMemCpyOpt() const {
+  return TTIImpl->enableMemCpyOpt();
+}
+
 // Register the basic pass.
 INITIALIZE_PASS(TargetTransformInfoWrapperPass, "tti",
                 "Target Transform Information", false, true)
 char TargetTransformInfoWrapperPass::ID = 0;

 void TargetTransformInfoWrapperPass::anchor() {}

 TargetTransformInfoWrapperPass::TargetTransformInfoWrapperPass()

llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.h

@@ ... @@ bool hasVolatileVariant(Instruction *I, unsigned AddrSpace) {
     switch(I->getOpcode()){
     default:
       return false;
     case Instruction::Load:
     case Instruction::Store:
       return true;
     }
   }
+  bool enableMemCpyOpt() const { return true; }
 };

 } // end namespace llvm

 #endif

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp

Show All 21 Lines
#include "llvm/Analysis/AssumptionCache.h"		#include "llvm/Analysis/AssumptionCache.h"
#include "llvm/Analysis/GlobalsModRef.h"		#include "llvm/Analysis/GlobalsModRef.h"
#include "llvm/Analysis/Loads.h"		#include "llvm/Analysis/Loads.h"
#include "llvm/Analysis/MemoryDependenceAnalysis.h"		#include "llvm/Analysis/MemoryDependenceAnalysis.h"
#include "llvm/Analysis/MemoryLocation.h"		#include "llvm/Analysis/MemoryLocation.h"
#include "llvm/Analysis/MemorySSA.h"		#include "llvm/Analysis/MemorySSA.h"
#include "llvm/Analysis/MemorySSAUpdater.h"		#include "llvm/Analysis/MemorySSAUpdater.h"
#include "llvm/Analysis/TargetLibraryInfo.h"		#include "llvm/Analysis/TargetLibraryInfo.h"
		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Argument.h"		#include "llvm/IR/Argument.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DerivedTypes.h"		#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Dominators.h"		#include "llvm/IR/Dominators.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
▲ Show 20 Lines • Show All 263 Lines • ▼ Show 20 Lines
FunctionPass *llvm::createMemCpyOptPass() { return new MemCpyOptLegacyPass(); }		FunctionPass *llvm::createMemCpyOptPass() { return new MemCpyOptLegacyPass(); }

INITIALIZE_PASS_BEGIN(MemCpyOptLegacyPass, "memcpyopt", "MemCpy Optimization",		INITIALIZE_PASS_BEGIN(MemCpyOptLegacyPass, "memcpyopt", "MemCpy Optimization",
false, false)		false, false)
INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)		INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)
INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)		INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
INITIALIZE_PASS_DEPENDENCY(MemoryDependenceWrapperPass)		INITIALIZE_PASS_DEPENDENCY(MemoryDependenceWrapperPass)
INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)
		INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)		INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)
INITIALIZE_PASS_DEPENDENCY(GlobalsAAWrapperPass)		INITIALIZE_PASS_DEPENDENCY(GlobalsAAWrapperPass)
INITIALIZE_PASS_DEPENDENCY(MemorySSAWrapperPass)		INITIALIZE_PASS_DEPENDENCY(MemorySSAWrapperPass)
INITIALIZE_PASS_END(MemCpyOptLegacyPass, "memcpyopt", "MemCpy Optimization",		INITIALIZE_PASS_END(MemCpyOptLegacyPass, "memcpyopt", "MemCpy Optimization",
false, false)		false, false)

// Check that V is either not accessible by the caller, or unwinding cannot		// Check that V is either not accessible by the caller, or unwinding cannot
// occur between Start and End.		// occur between Start and End.
[1,394 unchanged lines not shown; context: PreservedAnalyses MemCpyOptPass::run(Function &F, FunctionAnalysisManager &AM) {]
auto *MD = !EnableMemorySSA ? &AM.getResult<MemoryDependenceAnalysis>(F)		auto *MD = !EnableMemorySSA ? &AM.getResult<MemoryDependenceAnalysis>(F)
: AM.getCachedResult<MemoryDependenceAnalysis>(F);		: AM.getCachedResult<MemoryDependenceAnalysis>(F);
auto &TLI = AM.getResult<TargetLibraryAnalysis>(F);		auto &TLI = AM.getResult<TargetLibraryAnalysis>(F);
auto *AA = &AM.getResult<AAManager>(F);		auto *AA = &AM.getResult<AAManager>(F);
auto *AC = &AM.getResult<AssumptionAnalysis>(F);		auto *AC = &AM.getResult<AssumptionAnalysis>(F);
auto *DT = &AM.getResult<DominatorTreeAnalysis>(F);		auto *DT = &AM.getResult<DominatorTreeAnalysis>(F);
auto *MSSA = EnableMemorySSA ? &AM.getResult<MemorySSAAnalysis>(F)		auto *MSSA = EnableMemorySSA ? &AM.getResult<MemorySSAAnalysis>(F)
: AM.getCachedResult<MemorySSAAnalysis>(F);		: AM.getCachedResult<MemorySSAAnalysis>(F);
		auto &TTI = AM.getResult<TargetIRAnalysis>(F);
bool MadeChange =		bool MadeChange =
runImpl(F, MD, &TLI, AA, AC, DT, MSSA ? &MSSA->getMSSA() : nullptr);		runImpl(F, MD, &TLI, &TTI, AA, AC, DT, MSSA ? &MSSA->getMSSA() : nullptr);
if (!MadeChange)		if (!MadeChange)
return PreservedAnalyses::all();		return PreservedAnalyses::all();

PreservedAnalyses PA;		PreservedAnalyses PA;
PA.preserveSet<CFGAnalyses>();		PA.preserveSet<CFGAnalyses>();
if (MD)		if (MD)
PA.preserve<MemoryDependenceAnalysis>();		PA.preserve<MemoryDependenceAnalysis>();
if (MSSA)		if (MSSA)
PA.preserve<MemorySSAAnalysis>();		PA.preserve<MemorySSAAnalysis>();
return PA;		return PA;
}		}

bool MemCpyOptPass::runImpl(Function &F, MemoryDependenceResults *MD_,		bool MemCpyOptPass::runImpl(Function &F, MemoryDependenceResults *MD_,
TargetLibraryInfo *TLI_, AliasAnalysis *AA_,		TargetLibraryInfo *TLI_, TargetTransformInfo *TTI_,
		Lint (pre-merge checks): clang-tidy: warning: invalid case style for parameter 'TLI_' [readability-identifier-naming]
		Lint (pre-merge checks): clang-tidy: warning: invalid case style for parameter 'TTI_' [readability-identifier-naming]
AssumptionCache *AC_, DominatorTree *DT_,		AliasAnalysis *AA_, AssumptionCache *AC_,
		Lint (pre-merge checks): clang-tidy: warning: invalid case style for parameter 'AA_' [readability-identifier-naming]
		Lint (pre-merge checks): clang-tidy: warning: invalid case style for parameter 'AC_' [readability-identifier-naming]
MemorySSA *MSSA_) {		DominatorTree *DT_, MemorySSA *MSSA_) {
		Lint (pre-merge checks): clang-tidy: warning: invalid case style for parameter 'DT_' [readability-identifier-naming]
		Lint (pre-merge checks): clang-tidy: warning: invalid case style for parameter 'MSSA_' [readability-identifier-naming]
bool MadeChange = false;		bool MadeChange = false;
MD = MD_;		MD = MD_;
TLI = TLI_;		TLI = TLI_;
		TTI = TTI_;
AA = AA_;		AA = AA_;
AC = AC_;		AC = AC_;
DT = DT_;		DT = DT_;
MSSA = MSSA_;		MSSA = MSSA_;
MemorySSAUpdater MSSAU_(MSSA_);		MemorySSAUpdater MSSAU_(MSSA_);
MSSAU = MSSA_ ? &MSSAU_ : nullptr;		MSSAU = MSSA_ ? &MSSAU_ : nullptr;
// If we don't have at least memset and memcpy, there is little point of doing		// If we don't have at least memset and memcpy, there is little point of doing
// anything here. These are required by a freestanding implementation, so if		// anything here. These are required by a freestanding implementation, so if
// even they are disabled, there is no point in trying hard.		// even they are disabled, there is no point in trying hard.
if (!TLI->has(LibFunc_memset) || !TLI->has(LibFunc_memcpy))		// Some targets (NVPTX) don't have memcpy/memset library functions, but still
		// want to enable this pass, so keep going if TTI info says so.
		if (!(TLI->has(LibFunc_memset) && TLI->has(LibFunc_memcpy)) &&
		!TTI->enableMemCpyOpt())
return false;		return false;
tra (Author), inline comment: AFAICT the check does not affect correctness. The "little point of doing anything here" argument is questionable. These optimizations enable other optimizations, and that is exactly how I ended up here: a memcpy that was not replaced with memset kept other things alive, which meant more work for the compiler and slower code at runtime. In practice only AMDGPU and NVPTX disable these libcalls, and this pass is beneficial for both targets. For the others the change is a no-op.
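To make the inline comment above concrete, here is a minimal sketch of the three possible gates: the original TLI-only check, the TTI opt-in written in this diff, and no gate at all, as the revision title proposes. `TargetFacts` and the three `runs...` functions are hypothetical stand-ins invented for illustration. They are not LLVM types or APIs, and the sample targets are assumptions, not measurements.

```cpp
#include <cassert>

// Hypothetical model of the two queries the early-exit consults.
// These fields only mirror the calls in the diff; they are not LLVM APIs.
struct TargetFacts {
  bool HasMemsetLibcall; // models TLI->has(LibFunc_memset)
  bool HasMemcpyLibcall; // models TLI->has(LibFunc_memcpy)
  bool EnableMemCpyOpt;  // models the TTI->enableMemCpyOpt() hook from this diff
};

// Left column: bail out unless both libcalls are available.
constexpr bool runsBeforePatch(TargetFacts T) {
  return T.HasMemsetLibcall && T.HasMemcpyLibcall;
}

// Right column: bail out only if the libcalls are missing AND the target
// has not opted in through TTI.
constexpr bool runsWithTTIHook(TargetFacts T) {
  return !(!(T.HasMemsetLibcall && T.HasMemcpyLibcall) && !T.EnableMemCpyOpt);
}

// What the revision title proposes: no gate at all.
constexpr bool runsUnconditionally(TargetFacts) { return true; }

// Assumed sample targets for the truth table below.
constexpr TargetFacts HostLike{true, true, false};    // libcalls available
constexpr TargetFacts GpuOptIn{false, false, true};   // no libcalls, TTI opt-in
constexpr TargetFacts GpuNoOptIn{false, false, false}; // no libcalls, no opt-in

static_assert(runsBeforePatch(HostLike) && runsWithTTIHook(HostLike),
              "targets with the libcalls are unaffected by the patch");
static_assert(!runsBeforePatch(GpuOptIn) && runsWithTTIHook(GpuOptIn),
              "the TTI hook re-enables the pass for opted-in targets");
static_assert(!runsWithTTIHook(GpuNoOptIn) && runsUnconditionally(GpuNoOptIn),
              "only the unconditional form covers targets that did not opt in");
```

The only row where the TTI-hook form and the unconditional form differ is a target that lacks the libcalls and has not opted in. If the check has no bearing on correctness, as the comment argues, the hook adds an interface without protecting anything.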

while (true) {		while (true) {
if (!iterateOnFunction(F))		if (!iterateOnFunction(F))
break;		break;
MadeChange = true;		MadeChange = true;
}		}

if (MSSA_ && VerifyMemorySSA)		if (MSSA_ && VerifyMemorySSA)
[13 unchanged lines not shown; context: auto *MDWP = !EnableMemorySSA]
: getAnalysisIfAvailable<MemoryDependenceWrapperPass>();		: getAnalysisIfAvailable<MemoryDependenceWrapperPass>();
auto *TLI = &getAnalysis<TargetLibraryInfoWrapperPass>().getTLI(F);		auto *TLI = &getAnalysis<TargetLibraryInfoWrapperPass>().getTLI(F);
auto *AA = &getAnalysis<AAResultsWrapperPass>().getAAResults();		auto *AA = &getAnalysis<AAResultsWrapperPass>().getAAResults();
auto *AC = &getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);		auto *AC = &getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);
auto *DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();		auto *DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();
auto *MSSAWP = EnableMemorySSA		auto *MSSAWP = EnableMemorySSA
? &getAnalysis<MemorySSAWrapperPass>()		? &getAnalysis<MemorySSAWrapperPass>()
: getAnalysisIfAvailable<MemorySSAWrapperPass>();		: getAnalysisIfAvailable<MemorySSAWrapperPass>();
		auto *TTI = &getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);

return Impl.runImpl(F, MDWP ? & MDWP->getMemDep() : nullptr, TLI, AA, AC, DT,		return Impl.runImpl(F, MDWP ? &MDWP->getMemDep() : nullptr, TLI, TTI, AA, AC,
MSSAWP ? &MSSAWP->getMSSA() : nullptr);		DT, MSSAWP ? &MSSAWP->getMSSA() : nullptr);
}		}
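For readers unfamiliar with the case the author mentions in the inline comment (a memcpy that was not replaced with memset), the following source-level sketch shows the kind of pattern involved. It is an illustration only: the function names and buffer size are made up, and whether a given compiler lowers these calls to the `llvm.memset`/`llvm.memcpy` intrinsics and whether MemCpyOpt then rewrites them depends on the target and flags. The code only demonstrates that the two forms are observably equivalent, which is what makes the rewrite legal.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr std::size_t N = 64; // arbitrary size chosen for the example

// Zero a temporary, then copy it to Dst. If the copy is not recognized as
// "memcpy from memset'ed memory", Tmp and the copy both stay alive.
void fillViaTemp(unsigned char *Dst) {
  unsigned char Tmp[N];
  std::memset(Tmp, 0, N);
  std::memcpy(Dst, Tmp, N);
}

// The form the optimizer would ideally reach: a single memset on Dst.
void fillDirect(unsigned char *Dst) { std::memset(Dst, 0, N); }

// Run both forms on differently initialized buffers and compare the results.
bool sameResult() {
  std::array<unsigned char, N> A, B;
  A.fill(0xAA);
  B.fill(0xBB);
  fillViaTemp(A.data());
  fillDirect(B.data());
  return A == B;
}
```

Calling `sameResult()` returns true on any conforming implementation, since both functions leave all `N` bytes of the destination zeroed.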