This is an archive of the discontinued LLVM Phabricator instance.

Differential D107345

Introduce LowerGCLeafIntrinsics pass
Abandoned, Public

Authored by mkazantsev on Aug 3 2021, 5:54 AM.

Details

Reviewers

apilipenko
anna
reames
nikic

Summary

Some intrinsics (such as llvm.memcpy.element.unordered.atomic) may be
lowered differently depending on whether or not they are considered GC leaves.
Those that are GC leaves can easily be lowered at the IR level according to their
semantics.

On small data, this saves the call overhead (which may be significant when
only a small amount of data is copied). On large data, codegen should ideally
be able to generate code no worse than any other possible lowering
for such simple cases.

Another advantage of IR lowering is that the compiler may figure out some facts
(e.g. about the length of the copied data) and do less work than a straightforward
lowering into a library call would.

This patch introduces a pass that may lower various GC leaf intrinsics on IR level,
and implements it for llvm.memcpy.element.unordered.atomic.
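
As a rough illustration of the lowering described in this summary, the copy can be thought of as a wide main loop followed by a tail that handles the remainder in halving chunk sizes. The sketch below models this in plain C++ rather than IR. `chunkedCopy` and its parameters are hypothetical names, `std::memcpy` stands in for the vector load/store pair, and the per-iteration element count is assumed to be a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical model of the lowered copy (not the patch's code).
// Copies `len` elements of `elemSize` bytes each from src to dst.
// Assumption: elemsPerIter is a non-zero power of two.
void chunkedCopy(uint8_t *dst, const uint8_t *src, uint64_t len,
                 uint64_t elemSize, uint64_t elemsPerIter) {
  assert(elemsPerIter != 0 && (elemsPerIter & (elemsPerIter - 1)) == 0);

  // Main loop: len / elemsPerIter full-width copies.
  for (uint64_t iters = len / elemsPerIter; iters != 0; --iters) {
    std::memcpy(dst, src, elemsPerIter * elemSize);
    src += elemsPerIter * elemSize;
    dst += elemsPerIter * elemSize;
  }

  // Tail: one conditional copy per smaller power of two, selected by
  // testing the corresponding bit of the element count.
  for (uint64_t n = elemsPerIter / 2; n != 0; n /= 2) {
    if (len & n) {
      std::memcpy(dst, src, n * elemSize);
      src += n * elemSize;
      dst += n * elemSize;
    }
  }
}
```

For 13 four-byte elements with 8 elements per iteration, this performs one 8-element copy in the main loop, then tail copies of 4 elements and 1 element.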

Diff Detail

Event Timeline

mkazantsev created this revision. Aug 3 2021, 5:54 AM

Herald added subscribers: jfb, hiraditya, mgorny. Aug 3 2021, 5:54 AM

mkazantsev requested review of this revision. Aug 3 2021, 5:54 AM

Herald added a project: Restricted Project. Aug 3 2021, 5:54 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B117614: Diff 363708. Aug 3 2021, 6:30 AM

Without commenting on anything else, this implementation is not compatible with opaque pointers. Please remove mentions of getPointerElementType() and instead rely on the provided element size, possibly by inserting appropriate casts. If this is not sufficient for some reason and you really do need to know the exact element type, then this is going to need a change to the intrinsic design.

This revision now requires changes to proceed. Aug 3 2021, 11:51 AM

I'm not sure I fully understand what is and is not compatible with opaque pointers, but I've tried to address the comment by removing getPointerElementType. I also added a preheader to the loop and moved the iteration-count computation there.

mkazantsev updated this revision to Diff 364026. Aug 4 2021, 3:22 AM

Harbormaster completed remote builds in B117855: Diff 364026. Aug 4 2021, 4:07 AM

Thanks, should be fine now. Just avoiding getPointerElementType() is usually enough.

I think the name of the pass is misleading. What you are doing here is providing an inlined lowering for a memory builtin. Your current implementation is limited to inlining gc-leaf element-atomic memcpys, but I don't think the fact that it is gc-leaf is critical here. This transform can be extended to handle non-GC-leaf operations. It can also be extended to handle non-atomic operations.

Here are some high-level questions about this optimization:

Should this lowering be a middle-end pass or something that resides in the backend?
If we choose to do this as a middle-end pass, when should this pass be scheduled?
How does this lowering interact with backend optimizations for memcpys, such as replacing short constant-length memcpys with loads/stores?
Is it always profitable to inline a memcpy? Should we have some heuristic here to select hot and short memcpys?
In this patch you do hand-rolled vectorization. Have you considered the alternative where you rely on the existing vectorizer instead?

I suggest writing a proposal to llvm-dev and discussing it there first.

There is a target-specific hook to emit code for regular (non-atomic) memcpys: EmitTargetCodeForMemcpy. Maybe we should just implement a similar hook for element-atomic copy?

llvm/lib/Transforms/Scalar/LowerGCLeafIntrinsics.cpp
67–68	LangRef doesn't prohibit other length types.
128–131	These loads and stores must be at least ElementSizeInBytes-atomic. I'm not sure you can express this in the IR at the moment.

That's a non-starter then.
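
To make the atomicity concern above concrete: per LangRef, llvm.memcpy.element.unordered.atomic performs the copy as a sequence of unordered atomic loads and stores, each a multiple of the element size wide. A minimal C++ analogue of such an element-wise copy is sketched below. It is only an approximation: C++ has no `unordered` ordering, so `memory_order_relaxed` (which is stronger) is used, the element type is fixed at 4 bytes, and `elementAtomicCopy` is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical analogue of an element-wise atomic copy with 4-byte elements.
// Each element is moved by its own atomic load and atomic store, so a
// concurrent reader can never observe a torn element.
void elementAtomicCopy(std::atomic<uint32_t> *dst,
                       const std::atomic<uint32_t> *src, std::size_t count) {
  assert(count == 0 || (dst != nullptr && src != nullptr));
  for (std::size_t i = 0; i < count; ++i) {
    // Relaxed is the weakest ordering C++ offers; LLVM's `unordered` is
    // weaker still, so this over-approximates the intrinsic's guarantee.
    uint32_t value = src[i].load(std::memory_order_relaxed);
    dst[i].store(value, std::memory_order_relaxed);
  }
}
```

Per-element atomicity is the property the reviewer doubts a plain wide vector load/store can express in IR.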

Revision Contents

Path	Size
llvm/include/llvm/InitializePasses.h	1 line
llvm/include/llvm/Transforms/Scalar.h	6 lines
llvm/include/llvm/Transforms/Scalar/LowerGCLeafIntrinsics.h	29 lines
llvm/lib/Passes/PassBuilder.cpp	1 line
llvm/lib/Passes/PassRegistry.def	1 line
llvm/lib/Transforms/Scalar/CMakeLists.txt	1 line
llvm/lib/Transforms/Scalar/LowerGCLeafIntrinsics.cpp	306 lines
llvm/lib/Transforms/Scalar/Scalar.cpp	1 line
llvm/test/Transforms/LowerGCLeafIntrinsics/memcpy.ll	96 lines

Diff 364026

llvm/include/llvm/InitializePasses.h

	void initializeLoopUnrollAndJamPass(PassRegistry&);
	void initializeLoopUnrollPass(PassRegistry&);
	void initializeLoopUnswitchPass(PassRegistry&);
	void initializeLoopVectorizePass(PassRegistry&);
	void initializeLoopVersioningLICMLegacyPassPass(PassRegistry &);
	void initializeLoopVersioningLegacyPassPass(PassRegistry &);
	void initializeLowerAtomicLegacyPassPass(PassRegistry&);
	void initializeLowerConstantIntrinsicsPass(PassRegistry&);
				void initializeLowerGCLeafIntrinsicsLegacyPassPass(PassRegistry &);
	void initializeLowerEmuTLSPass(PassRegistry&);
	void initializeLowerExpectIntrinsicPass(PassRegistry&);
	void initializeLowerGuardIntrinsicLegacyPassPass(PassRegistry&);
	void initializeLowerWidenableConditionLegacyPassPass(PassRegistry&);
	void initializeLowerIntrinsicsPass(PassRegistry&);
	void initializeLowerInvokeLegacyPassPass(PassRegistry&);
	void initializeLowerSwitchLegacyPassPass(PassRegistry &);
	void initializeLowerTypeTestsPass(PassRegistry&);

llvm/include/llvm/Transforms/Scalar.h

	//===----------------------------------------------------------------------===//
	//
	// LowerAtomic - Lower atomic intrinsics to non-atomic form
	//
	Pass *createLowerAtomicPass();

	//===----------------------------------------------------------------------===//
	//
				// Lower GC leaf intrinsics - Lower gc-leaf calls to intrinsics into IR
				//
				FunctionPass *createLowerGCLeafIntrinsicsPass();

				//===----------------------------------------------------------------------===//
				//
	// LowerGuardIntrinsic - Lower guard intrinsics to normal control flow.
	//
	Pass *createLowerGuardIntrinsicPass();

	//===----------------------------------------------------------------------===//
	//
	// LowerMatrixIntrinsics - Lower matrix intrinsics to vector operations.
	//

llvm/include/llvm/Transforms/Scalar/LowerGCLeafIntrinsics.h

This file was added.

				//===--- LowerGCLeafIntrinsics.h - lower gc leaf intrinsic calls -*- C++
				//-*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This pass tries to inline gc-leaf versions of intrinsics that may also have a
				// non gc-leaf implementation.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_TRANSFORMS_SCALAR_LOWERGCLEAFINTRINSICS_H
				#define LLVM_TRANSFORMS_SCALAR_LOWERGCLEAFINTRINSICS_H

				#include "llvm/IR/Module.h"
				#include "llvm/IR/PassManager.h"

				namespace llvm {
				class LowerGCLeafIntrinsicsPass
				: public PassInfoMixin<LowerGCLeafIntrinsicsPass> {
				public:
				PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);
				};
				}

				#endif // LLVM_TRANSFORMS_SCALAR_LOWERGCLEAFINTRINSICS_H

llvm/lib/Passes/PassBuilder.cpp

	#include "llvm/Transforms/Scalar/LoopSimplifyCFG.h"
	#include "llvm/Transforms/Scalar/LoopSink.h"
	#include "llvm/Transforms/Scalar/LoopStrengthReduce.h"
	#include "llvm/Transforms/Scalar/LoopUnrollAndJamPass.h"
	#include "llvm/Transforms/Scalar/LoopUnrollPass.h"
	#include "llvm/Transforms/Scalar/LoopVersioningLICM.h"
	#include "llvm/Transforms/Scalar/LowerAtomic.h"
	#include "llvm/Transforms/Scalar/LowerConstantIntrinsics.h"
				#include "llvm/Transforms/Scalar/LowerGCLeafIntrinsics.h"
	#include "llvm/Transforms/Scalar/LowerExpectIntrinsic.h"
	#include "llvm/Transforms/Scalar/LowerGuardIntrinsic.h"
	#include "llvm/Transforms/Scalar/LowerMatrixIntrinsics.h"
	#include "llvm/Transforms/Scalar/LowerWidenableCondition.h"
	#include "llvm/Transforms/Scalar/MakeGuardsExplicit.h"
	#include "llvm/Transforms/Scalar/MemCpyOptimizer.h"
	#include "llvm/Transforms/Scalar/MergeICmps.h"
	#include "llvm/Transforms/Scalar/MergedLoadStoreMotion.h"
	#include "llvm/Transforms/Scalar/NaryReassociate.h"

llvm/lib/Passes/PassRegistry.def

	FUNCTION_PASS("irce", IRCEPass())
	FUNCTION_PASS("float2int", Float2IntPass())
	FUNCTION_PASS("no-op-function", NoOpFunctionPass())
	FUNCTION_PASS("libcalls-shrinkwrap", LibCallsShrinkWrapPass())
	FUNCTION_PASS("lint", LintPass())
	FUNCTION_PASS("inject-tli-mappings", InjectTLIMappings())
	FUNCTION_PASS("instnamer", InstructionNamerPass())
	FUNCTION_PASS("loweratomic", LowerAtomicPass())
				FUNCTION_PASS("lower-gc-leaf-intrinsics", LowerGCLeafIntrinsicsPass())
	FUNCTION_PASS("lower-expect", LowerExpectIntrinsicPass())
	FUNCTION_PASS("lower-guard-intrinsic", LowerGuardIntrinsicPass())
	FUNCTION_PASS("lower-constant-intrinsics", LowerConstantIntrinsicsPass())
	FUNCTION_PASS("lower-matrix-intrinsics", LowerMatrixIntrinsicsPass())
	FUNCTION_PASS("lower-matrix-intrinsics-minimal", LowerMatrixIntrinsicsPass(true))
	FUNCTION_PASS("lower-widenable-condition", LowerWidenableConditionPass())
	FUNCTION_PASS("guard-widening", GuardWideningPass())
	FUNCTION_PASS("load-store-vectorizer", LoadStoreVectorizerPass())

llvm/lib/Transforms/Scalar/CMakeLists.txt

add_llvm_component_library(LLVMScalarOpts
LoopSimplifyCFG.cpp
LoopStrengthReduce.cpp
LoopUnrollPass.cpp
LoopUnrollAndJamPass.cpp
LoopUnswitch.cpp
LoopVersioningLICM.cpp
LowerAtomic.cpp
LowerConstantIntrinsics.cpp
		LowerGCLeafIntrinsics.cpp
LowerExpectIntrinsic.cpp
LowerGuardIntrinsic.cpp
LowerMatrixIntrinsics.cpp
LowerWidenableCondition.cpp
MakeGuardsExplicit.cpp
MemCpyOptimizer.cpp
MergeICmps.cpp
MergedLoadStoreMotion.cpp

llvm/lib/Transforms/Scalar/LowerGCLeafIntrinsics.cpp

This file was added.

				//===- LowerGCLeafIntrinsics.cpp - lower gc leaf intrinsic calls -*- C++
				//-*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This pass tries to inline gc-leaf versions of intrinsics that may also have a
				// non gc-leaf implementation.
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/Transforms/Scalar/LowerGCLeafIntrinsics.h"
				#include "llvm/ADT/Statistic.h"
				#include "llvm/Analysis/TargetLibraryInfo.h"
				#include "llvm/Analysis/TargetTransformInfo.h"
				#include "llvm/IR/Dominators.h"
				#include "llvm/IR/InstIterator.h"
				#include "llvm/IR/IRBuilder.h"
				#include "llvm/InitializePasses.h"
				#include "llvm/Support/DebugCounter.h"
				#include "llvm/Transforms/Scalar.h"
				#include "llvm/Transforms/Utils/BasicBlockUtils.h"

				using namespace llvm;

				#define DEBUG_TYPE "lower-gc-leaf-intrinsics"

				static cl::opt<bool>
				UsePrefetching("lower-gc-leaf-intrinsics-use-prefetching", cl::init(true),
				cl::Hidden,
				cl::desc("Use software prefetching when lowering intrinsics."));

				STATISTIC(NumAtomicMemCpyLowered,
				"Number of atomic memcpy instructions lowered");

				static bool lowerAtomicMemCpy(AtomicMemCpyInst *MemCpy,
				const TargetTransformInfo *TTI,
				DominatorTree *DT) {
				Value *Src = MemCpy->getRawSource();
				PointerType *SrcType = cast<PointerType>(Src->getType());

				Value *Dest = MemCpy->getRawDest();
				PointerType *DestType = cast<PointerType>(Dest->getType());

				// Take vector width suitable for both address spaces.
				uint64_t ElementSizeInBytes = MemCpy->getElementSizeInBytes();
				uint64_t MaxVectorSizeInBits =
				std::min(TTI->getLoadStoreVecRegBitWidth(SrcType->getAddressSpace()),
				TTI->getLoadStoreVecRegBitWidth(DestType->getAddressSpace()));
				assert(((MaxVectorSizeInBits & 7) == 0) && "Fractional number of bytes?");
				uint64_t MaxVectorSizeInBytes = MaxVectorSizeInBits / 8;
				if (MaxVectorSizeInBytes == 0 ||
				MaxVectorSizeInBytes % ElementSizeInBytes != 0)
				return false;

				uint64_t ElementsPerIteration = MaxVectorSizeInBytes / ElementSizeInBytes;
				assert(ElementsPerIteration != 0 && "Zero vector length is impossible!");

				IRBuilder<> Builder(MemCpy);
				LLVMContext &C = MemCpy->getContext();
				auto *LenInBytes = MemCpy->getLength();
				Type *LenType = LenInBytes->getType();
				if (!LenType->isIntegerTy(64)) {
				assert(LenInBytes->getType()->isIntegerTy(32) &&
				"Only 32 and 64-bit lengths are allowed!");
				apilipenko: LangRef doesn't prohibit other length types.
				LenInBytes = Builder.CreateZExt(LenInBytes, Type::getInt64Ty(C),
				LenInBytes->getName() + ".wide");
				LenType = Type::getInt64Ty(C);
				}
				auto *Len = Builder.CreateUDiv(LenInBytes,
				ConstantInt::get(LenType, ElementSizeInBytes),
				"elements.len");
				Value *ElementsPerVectorizedLoopIter =
				ConstantInt::get(LenType, ElementsPerIteration);
				Value *LoopCond =
				Builder.CreateICmpSLT(Len, ElementsPerVectorizedLoopIter, "loop.cond");
				BasicBlock *BB = Builder.GetInsertBlock();
				BasicBlock *Preheader = SplitBlock(BB, BB->getTerminator(), DT);
				Builder.SetInsertPoint(Preheader->getTerminator());
				Preheader->setName("memcpy.loop.preheader");
				Value *NumIter =
				Builder.CreateUDiv(Len, ElementsPerVectorizedLoopIter, "loop.iters");

				// Create a vectorized loop that will copy majority of data using widest
				// registers available.
				BasicBlock *Loop = SplitBlock(Preheader, Preheader->getTerminator(), DT);
				Loop->setName("memcpy.loop");

				BasicBlock *Tail = SplitBlock(Loop, Loop->getTerminator(), DT);

				Builder.SetInsertPoint(Loop);
				Loop->getTerminator()->eraseFromParent();

				PHINode *Idx = Builder.CreatePHI(Type::getInt64Ty(C), 2, "idx");
				auto *SrcIdx = Builder.CreatePHI(Src->getType(), 2, "src.idx");
				auto *DestIdx = Builder.CreatePHI(Dest->getType(), 2, "dst.idx");

				// Prefetch data.
				if (UsePrefetching) {
				auto *Int32Ty = Type::getInt32Ty(C);
				auto *ReadPrefetch = ConstantInt::get(Int32Ty, 0);
				auto *WritePrefetch = ConstantInt::get(Int32Ty, 1);
				auto *PrefetchLocality = ConstantInt::get(Int32Ty, 0);
				auto *DataCacheType = ConstantInt::get(Int32Ty, 1);

				Builder.CreateIntrinsic(
				Intrinsic::prefetch, { SrcIdx->getType() },
				{ SrcIdx, ReadPrefetch, PrefetchLocality, DataCacheType });
				Builder.CreateIntrinsic(
				Intrinsic::prefetch, { DestIdx->getType() },
				{ DestIdx, WritePrefetch, PrefetchLocality, DataCacheType });
				}

				// Copy from Src to Dest and return pointers to the end of the copied regions.
				auto GenerateCopy = [&](unsigned ElementsPerIteration, Value *Src,
				Value *Dest) {
				Type *ValueType =
				VectorType::get(Type::getIntNTy(C, 8 * ElementSizeInBytes),
				ElementsPerIteration, false);
				Type *SrcValuePtrType =
				PointerType::get(ValueType, SrcType->getAddressSpace());
				Type *DestValuePtrType =
				PointerType::get(ValueType, DestType->getAddressSpace());
				Value *SrcBC = Builder.CreateBitCast(Src, SrcValuePtrType, "src.vec");
				LoadInst *Val = Builder.CreateLoad(ValueType, SrcBC, "vec");
				Val->setAlignment(Align(ElementSizeInBytes));
				Value *DestBC = Builder.CreateBitCast(Dest, DestValuePtrType, "dst.vec");
				StoreInst *Store = Builder.CreateStore(Val, DestBC);
				apilipenko: These loads and stores must be at least ElementSizeInBytes-atomic. I'm not sure you can express this in the IR at the moment.
				Store->setAlignment(Align(ElementSizeInBytes));
				auto *Int8Ty = Type::getInt8Ty(C);
				auto *SrcNext = cast<Instruction>(Builder.CreateGEP(
				Int8Ty, SrcIdx,
				ConstantInt::get(LenType, ElementsPerIteration * ElementSizeInBytes),
				"src.next"));
				auto *DestNext = cast<Instruction>(Builder.CreateGEP(
				Int8Ty, DestIdx,
				ConstantInt::get(LenType, ElementsPerIteration * ElementSizeInBytes),
				"dest.next"));
				return std::make_pair(SrcNext, DestNext);
				};

				auto Next = GenerateCopy(ElementsPerIteration, SrcIdx, DestIdx);
				Instruction *SrcNext = Next.first;
				Instruction *DestNext = Next.second;

				Value *IdxNext =
				Builder.CreateSub(Idx, ConstantInt::get(LenType, 1), "idx.next");
				Value *MemcpyLoopCond = Builder.CreateICmpNE(
				IdxNext, ConstantInt::getNullValue(LenType), "memcpy-loop.cond");
				Builder.CreateCondBr(MemcpyLoopCond, Loop, Tail);
				SrcIdx->addIncoming(Src, Preheader);
				SrcIdx->addIncoming(SrcNext, Loop);
				DestIdx->addIncoming(Dest, Preheader);
				DestIdx->addIncoming(DestNext, Loop);
				Idx->addIncoming(NumIter, Preheader);
				Idx->addIncoming(IdxNext, Loop);

				// And then construct a tail that will handle the remainder with more narrow
				// registers.
				Builder.SetInsertPoint(BB->getTerminator());
				BranchInst *NewTerminator = BranchInst::Create(Tail, Preheader, LoopCond);
				ReplaceInstWithInst(BB->getTerminator(), NewTerminator);
				if (DT)
				DT->insertEdge(BB, Tail);

				Builder.SetInsertPoint(Tail->getTerminator());
				Instruction *Term = Tail->getTerminator();

				SrcIdx = Builder.CreatePHI(SrcIdx->getType(), 2, "src.idx");
				SrcIdx->addIncoming(Src, BB);
				SrcIdx->addIncoming(SrcNext, Loop);

				DestIdx = Builder.CreatePHI(DestIdx->getType(), 2, "dest.idx");
				DestIdx->addIncoming(Dest, BB);
				DestIdx->addIncoming(DestNext, Loop);

				for (unsigned VF = MaxVectorSizeInBytes / 2; VF >= ElementSizeInBytes;
				VF /= 2) {
				ElementsPerIteration = VF / ElementSizeInBytes;

				Value *TestBit = Builder.CreateAnd(Len, ElementsPerIteration);
				Value *Cond =
				Builder.CreateICmpNE(TestBit, ConstantInt::getNullValue(LenType));
				Term->getParent()->setName(StringRef("need_check_") +
				std::to_string(ElementsPerIteration) + "_");
				Builder.SetInsertPoint(SplitBlockAndInsertIfThen(
				Cond, Term, false, /*BrMetaData*/ nullptr, DT));
				Builder.GetInsertPoint()->getParent()->setName(
				StringRef("tail_") + std::to_string(ElementsPerIteration) + "_");

				auto Next = GenerateCopy(ElementsPerIteration, SrcIdx, DestIdx);
				Instruction *SrcNext = Next.first;
				Instruction *DestNext = Next.second;
				if (VF > ElementSizeInBytes) {
				Builder.SetInsertPoint(Term);

				auto *NewSrcIdx = Builder.CreatePHI(SrcIdx->getType(), 2, "src.idx");
				NewSrcIdx->addIncoming(SrcIdx, SrcIdx->getParent());
				NewSrcIdx->addIncoming(SrcNext, SrcNext->getParent());
				SrcIdx = NewSrcIdx;

				auto *NewDestIdx = Builder.CreatePHI(DestIdx->getType(), 2, "dest.idx");
				NewDestIdx->addIncoming(DestIdx, DestIdx->getParent());
				NewDestIdx->addIncoming(DestNext, DestNext->getParent());
				DestIdx = NewDestIdx;
				} else {
				SrcNext->eraseFromParent();
				DestNext->eraseFromParent();
				}
				}

				++NumAtomicMemCpyLowered;
				MemCpy->eraseFromParent();
				return true;
				}

				static bool lowerCall(CallInst *CI, const TargetTransformInfo *TTI,
				DominatorTree *DT) {
				if (auto *MemCpy = dyn_cast<AtomicMemCpyInst>(CI))
				return lowerAtomicMemCpy(MemCpy, TTI, DT);
				return false;
				}

				static bool lowerGCLeafIntrinsics(Function &F, const TargetTransformInfo *TTI,
				DominatorTree *DT) {
				// Intrinsic inlining will blow up the code size, so don't do it if it's our
				// concern.
				if (F.hasOptSize())
				return false;

				SmallVector<CallInst *, 8> Candidates;
				// Collect all GC leaf calls as potential candidates.
				for (Instruction &I : instructions(F))
				if (auto *CI = dyn_cast<CallInst>(&I))
				if (CI->hasFnAttr("gc-leaf-function"))
				Candidates.push_back(CI);

				bool Changed = false;
				for (auto *Candidate : Candidates)
				Changed |= lowerCall(Candidate, TTI, DT);

				#ifndef NDEBUG
				if (DT)
				assert(DT->verify(DominatorTree::VerificationLevel::Fast));
				#endif
				return Changed;
				}

				PreservedAnalyses LowerGCLeafIntrinsicsPass::run(Function &F,
				FunctionAnalysisManager &AM) {
				auto &TTI = AM.getResult<TargetIRAnalysis>(F);
				// auto *DT = AM.getCachedResult<DominatorTreeAnalysis>(F);
				auto *DT = &AM.getResult<DominatorTreeAnalysis>(F);
				if (!lowerGCLeafIntrinsics(F, &TTI, DT))
				return PreservedAnalyses::all();
				PreservedAnalyses PA;
				PA.preserve<DominatorTreeAnalysis>();
				return PA;
				}

				namespace {
				class LowerGCLeafIntrinsicsLegacyPass : public FunctionPass {
				public:
				static char ID;

				LowerGCLeafIntrinsicsLegacyPass() : FunctionPass(ID) {
				initializeLowerGCLeafIntrinsicsLegacyPassPass(
				*PassRegistry::getPassRegistry());
				}

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.addRequired<TargetLibraryInfoWrapperPass>();
				AU.addRequired<TargetTransformInfoWrapperPass>();
				AU.addPreserved<DominatorTreeWrapperPass>();
				FunctionPass::getAnalysisUsage(AU);
				}

				bool runOnFunction(Function &F) override {
				if (skipFunction(F))
				return false;

				const TargetTransformInfo *TTI =
				&getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);
				DominatorTree *DT = nullptr;
				if (auto *DTWP = getAnalysisIfAvailable<DominatorTreeWrapperPass>())
				DT = &DTWP->getDomTree();
				return lowerGCLeafIntrinsics(F, TTI, DT);
				}
				};
				}

				char LowerGCLeafIntrinsicsLegacyPass::ID = 0;
				INITIALIZE_PASS_BEGIN(LowerGCLeafIntrinsicsLegacyPass,
				"lower-gc-leaf-intrinsics",
				"Lower GC leaf intrinsic calls", false, false)
				INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
				INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
				INITIALIZE_PASS_END(LowerGCLeafIntrinsicsLegacyPass, "lower-gc-leaf-intrinsics",
				"Lower GC leaf intrinsic calls", false, false)

				FunctionPass *llvm::createLowerGCLeafIntrinsicsPass() {
				return new LowerGCLeafIntrinsicsLegacyPass();
				}
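
The width selection and tail sizes computed in lowerAtomicMemCpy can be restated as plain arithmetic. The following sketch is not part of the patch: `CopyPlan` and `planCopy` are hypothetical names, and the register widths are passed in directly instead of being queried from TargetTransformInfo.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical summary of what the lowering decides up front.
struct CopyPlan {
  uint64_t elemsPerIter;            // elements copied per main-loop iteration
  std::vector<uint64_t> tailElems;  // element counts of the tail steps
};

// Mirrors the arithmetic at the top of lowerAtomicMemCpy: take the narrower
// of the two register widths, bail out if it cannot hold a whole number of
// elements, then halve the width for each tail step.
std::optional<CopyPlan> planCopy(uint64_t srcVecBits, uint64_t dstVecBits,
                                 uint64_t elemSize) {
  assert(elemSize != 0 && "element size must be non-zero");
  uint64_t maxBits = std::min(srcVecBits, dstVecBits);
  assert((maxBits & 7) == 0 && "fractional number of bytes");
  uint64_t maxBytes = maxBits / 8;
  if (maxBytes == 0 || maxBytes % elemSize != 0)
    return std::nullopt;  // corresponds to the pass returning false

  CopyPlan plan{maxBytes / elemSize, {}};
  for (uint64_t vf = maxBytes / 2; vf >= elemSize; vf /= 2)
    plan.tailElems.push_back(vf / elemSize);
  return plan;
}
```

With 256-bit registers for both address spaces and 4-byte elements, the plan is 8 elements per loop iteration with tail steps of 4, 2 and 1 elements, which appears consistent with the `udiv ..., 8` and `need_check_4_` names in the test's CHECK lines.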

llvm/lib/Transforms/Scalar/Scalar.cpp

void llvm::initializeScalarOpts(PassRegistry &Registry) {
initializeLoopUnrollPass(Registry);
initializeLoopUnrollAndJamPass(Registry);
initializeLoopUnswitchPass(Registry);
initializeWarnMissedTransformationsLegacyPass(Registry);
initializeLoopVersioningLICMLegacyPassPass(Registry);
initializeLoopIdiomRecognizeLegacyPassPass(Registry);
initializeLowerAtomicLegacyPassPass(Registry);
initializeLowerConstantIntrinsicsPass(Registry);
		initializeLowerGCLeafIntrinsicsLegacyPassPass(Registry);
initializeLowerExpectIntrinsicPass(Registry);
initializeLowerGuardIntrinsicLegacyPassPass(Registry);
initializeLowerMatrixIntrinsicsLegacyPassPass(Registry);
initializeLowerMatrixIntrinsicsMinimalLegacyPassPass(Registry);
initializeLowerWidenableConditionLegacyPassPass(Registry);
initializeMemCpyOptLegacyPassPass(Registry);
initializeMergeICmpsLegacyPassPass(Registry);
initializeMergedLoadStoreMotionLegacyPassPass(Registry);

llvm/test/Transforms/LowerGCLeafIntrinsics/memcpy.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -S -lower-gc-leaf-intrinsics -mtriple=x86_64-unknown-linux-gnu -mattr=+avx < %s | FileCheck %s
				; RUN: opt -S -passes=lower-gc-leaf-intrinsics -mtriple=x86_64-unknown-linux-gnu -mattr=+avx < %s | FileCheck %s
				; RUN: opt -S -domtree -lower-gc-leaf-intrinsics -mtriple=x86_64-unknown-linux-gnu -mattr=+avx < %s | FileCheck %s

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				declare void @llvm.memcpy.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* nocapture writeonly, i8 addrspace(1)* nocapture readonly, i32, i32) nounwind argmemonly

				; GC-leaf memcpy can be lowered into vector loop.
				define void @test_memcpy_gc_leaf(i8 addrspace(1)* align 16 %src, i8 addrspace(1)* align 16 %dest, i32 %len) gc "statepoint-example" {
				; CHECK-LABEL: @test_memcpy_gc_leaf(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[LEN_WIDE:%.*]] = zext i32 [[LEN:%.*]] to i64
				; CHECK-NEXT: [[ELEMENTS_LEN:%.*]] = udiv i64 [[LEN_WIDE]], 4
				; CHECK-NEXT: [[LOOP_COND:%.*]] = icmp slt i64 [[ELEMENTS_LEN]], 8
				; CHECK-NEXT: br i1 [[LOOP_COND]], label [[NEED_CHECK_4_:%.*]], label [[MEMCPY_LOOP_PREHEADER:%.*]]
				; CHECK: memcpy.loop.preheader:
				; CHECK-NEXT: [[LOOP_ITERS:%.*]] = udiv i64 [[ELEMENTS_LEN]], 8
				; CHECK-NEXT: br label [[MEMCPY_LOOP:%.*]]
				; CHECK: memcpy.loop:
				; CHECK-NEXT: [[IDX:%.*]] = phi i64 [ [[LOOP_ITERS]], [[MEMCPY_LOOP_PREHEADER]] ], [ [[IDX_NEXT:%.*]], [[MEMCPY_LOOP]] ]
				; CHECK-NEXT: [[SRC_IDX:%.*]] = phi i8 addrspace(1)* [ [[SRC:%.*]], [[MEMCPY_LOOP_PREHEADER]] ], [ [[SRC_NEXT:%.*]], [[MEMCPY_LOOP]] ]
				; CHECK-NEXT: [[DST_IDX:%.*]] = phi i8 addrspace(1)* [ [[DEST:%.*]], [[MEMCPY_LOOP_PREHEADER]] ], [ [[DEST_NEXT:%.*]], [[MEMCPY_LOOP]] ]
				; CHECK-NEXT: call void @llvm.prefetch.p1i8(i8 addrspace(1)* [[SRC_IDX]], i32 0, i32 0, i32 1)
				; CHECK-NEXT: call void @llvm.prefetch.p1i8(i8 addrspace(1)* [[DST_IDX]], i32 1, i32 0, i32 1)
				; CHECK-NEXT: [[SRC_VEC:%.*]] = bitcast i8 addrspace(1)* [[SRC_IDX]] to <8 x i32> addrspace(1)*
				; CHECK-NEXT: [[VEC:%.*]] = load <8 x i32>, <8 x i32> addrspace(1)* [[SRC_VEC]], align 4
				; CHECK-NEXT: [[DST_VEC:%.*]] = bitcast i8 addrspace(1)* [[DST_IDX]] to <8 x i32> addrspace(1)*
				; CHECK-NEXT: store <8 x i32> [[VEC]], <8 x i32> addrspace(1)* [[DST_VEC]], align 4
				; CHECK-NEXT: [[SRC_NEXT]] = getelementptr i8, i8 addrspace(1)* [[SRC_IDX]], i64 32
				; CHECK-NEXT: [[DEST_NEXT]] = getelementptr i8, i8 addrspace(1)* [[DST_IDX]], i64 32
				; CHECK-NEXT: [[IDX_NEXT]] = sub i64 [[IDX]], 1
				; CHECK-NEXT: [[MEMCPY_LOOP_COND:%.*]] = icmp ne i64 [[IDX_NEXT]], 0
				; CHECK-NEXT: br i1 [[MEMCPY_LOOP_COND]], label [[MEMCPY_LOOP]], label [[NEED_CHECK_4_]]
				; CHECK: need_check_4_:
				; CHECK-NEXT: [[SRC_IDX1:%.*]] = phi i8 addrspace(1)* [ [[SRC]], [[ENTRY:%.*]] ], [ [[SRC_NEXT]], [[MEMCPY_LOOP]] ]
				; CHECK-NEXT: [[DEST_IDX:%.*]] = phi i8 addrspace(1)* [ [[DEST]], [[ENTRY]] ], [ [[DEST_NEXT]], [[MEMCPY_LOOP]] ]
				; CHECK-NEXT: [[TMP0:%.*]] = and i64 [[ELEMENTS_LEN]], 4
				; CHECK-NEXT: [[TMP1:%.*]] = icmp ne i64 [[TMP0]], 0
				; CHECK-NEXT: br i1 [[TMP1]], label [[TAIL_4_:%.*]], label [[NEED_CHECK_2_:%.*]]
				; CHECK: tail_4_:
				; CHECK-NEXT: [[SRC_VEC2:%.*]] = bitcast i8 addrspace(1)* [[SRC_IDX1]] to <4 x i32> addrspace(1)*
				; CHECK-NEXT: [[VEC3:%.*]] = load <4 x i32>, <4 x i32> addrspace(1)* [[SRC_VEC2]], align 4
				; CHECK-NEXT: [[DST_VEC4:%.*]] = bitcast i8 addrspace(1)* [[DEST_IDX]] to <4 x i32> addrspace(1)*
				; CHECK-NEXT: store <4 x i32> [[VEC3]], <4 x i32> addrspace(1)* [[DST_VEC4]], align 4
				; CHECK-NEXT: [[SRC_NEXT5:%.*]] = getelementptr i8, i8 addrspace(1)* [[SRC_IDX1]], i64 16
				; CHECK-NEXT: [[DEST_NEXT6:%.*]] = getelementptr i8, i8 addrspace(1)* [[DEST_IDX]], i64 16
				; CHECK-NEXT: br label [[NEED_CHECK_2_]]
				; CHECK: need_check_2_:
				; CHECK-NEXT: [[SRC_IDX7:%.*]] = phi i8 addrspace(1)* [ [[SRC_IDX1]], [[NEED_CHECK_4_]] ], [ [[SRC_NEXT5]], [[TAIL_4_]] ]
				; CHECK-NEXT: [[DEST_IDX8:%.*]] = phi i8 addrspace(1)* [ [[DEST_IDX]], [[NEED_CHECK_4_]] ], [ [[DEST_NEXT6]], [[TAIL_4_]] ]
				; CHECK-NEXT: [[TMP2:%.*]] = and i64 [[ELEMENTS_LEN]], 2
				; CHECK-NEXT: [[TMP3:%.*]] = icmp ne i64 [[TMP2]], 0
				; CHECK-NEXT: br i1 [[TMP3]], label [[TAIL_2_:%.*]], label [[NEED_CHECK_1_:%.*]]
				; CHECK: tail_2_:
				; CHECK-NEXT: [[SRC_VEC9:%.*]] = bitcast i8 addrspace(1)* [[SRC_IDX7]] to <2 x i32> addrspace(1)*
				; CHECK-NEXT: [[VEC10:%.*]] = load <2 x i32>, <2 x i32> addrspace(1)* [[SRC_VEC9]], align 4
				; CHECK-NEXT: [[DST_VEC11:%.*]] = bitcast i8 addrspace(1)* [[DEST_IDX8]] to <2 x i32> addrspace(1)*
				; CHECK-NEXT: store <2 x i32> [[VEC10]], <2 x i32> addrspace(1)* [[DST_VEC11]], align 4
				; CHECK-NEXT: [[SRC_NEXT12:%.*]] = getelementptr i8, i8 addrspace(1)* [[SRC_IDX7]], i64 8
				; CHECK-NEXT: [[DEST_NEXT13:%.*]] = getelementptr i8, i8 addrspace(1)* [[DEST_IDX8]], i64 8
				; CHECK-NEXT: br label [[NEED_CHECK_1_]]
				; CHECK: need_check_1_:
				; CHECK-NEXT: [[SRC_IDX14:%.*]] = phi i8 addrspace(1)* [ [[SRC_IDX7]], [[NEED_CHECK_2_]] ], [ [[SRC_NEXT12]], [[TAIL_2_]] ]
				; CHECK-NEXT: [[DEST_IDX15:%.*]] = phi i8 addrspace(1)* [ [[DEST_IDX8]], [[NEED_CHECK_2_]] ], [ [[DEST_NEXT13]], [[TAIL_2_]] ]
				; CHECK-NEXT: [[TMP4:%.*]] = and i64 [[ELEMENTS_LEN]], 1
				; CHECK-NEXT: [[TMP5:%.*]] = icmp ne i64 [[TMP4]], 0
				; CHECK-NEXT: br i1 [[TMP5]], label [[TAIL_1_:%.*]], label [[TMP6:%.*]]
				; CHECK: tail_1_:
				; CHECK-NEXT: [[SRC_VEC16:%.*]] = bitcast i8 addrspace(1)* [[SRC_IDX14]] to <1 x i32> addrspace(1)*
				; CHECK-NEXT: [[VEC17:%.*]] = load <1 x i32>, <1 x i32> addrspace(1)* [[SRC_VEC16]], align 4
				; CHECK-NEXT: [[DST_VEC18:%.*]] = bitcast i8 addrspace(1)* [[DEST_IDX15]] to <1 x i32> addrspace(1)*
				; CHECK-NEXT: store <1 x i32> [[VEC17]], <1 x i32> addrspace(1)* [[DST_VEC18]], align 4
				; CHECK-NEXT: br label [[TMP6]]
				; CHECK: 6:
				; CHECK-NEXT: ret void
				;
				entry:
				call void @llvm.memcpy.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* align 16 %dest, i8 addrspace(1)* align 16 %src, i32 %len, i32 4) #0
				ret void
				}

				; This may trigger GC, so we should not lower it
				define void @test_memcpy_gc(i8 addrspace(1)* align 16 %src, i8 addrspace(1)* align 16 %dest, i32 %len) gc "statepoint-example" {
				; CHECK-LABEL: @test_memcpy_gc(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: call void @llvm.memcpy.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* align 16 [[DEST:%.*]], i8 addrspace(1)* align 16 [[SRC:%.*]], i32 [[LEN:%.*]], i32 4) [ "deopt"() ]
				; CHECK-NEXT: ret void
				;
				entry:
				call void @llvm.memcpy.element.unordered.atomic.p1i8.p1i8.i32(i8 addrspace(1)* align 16 %dest, i8 addrspace(1)* align 16 %src, i32 %len, i32 4) [ "deopt"() ]
				ret void
				}

				attributes #0 = { "gc-leaf-function" }
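
The CHECK lines for @test_memcpy_gc_leaf encode a chunk schedule: a main loop that copies 8 elements per iteration, followed by tails of 4, 2 and 1 elements selected by the corresponding bits of the element count. The standalone C++ sketch below mirrors only that schedule so it can be followed outside the IR. It is not code from the patch: the helper name copyLikeLoweredMemcpy is hypothetical, the element size (4) and maximum chunk width (8) are taken from this one test, and std::memcpy stands in for the vector load/store pairs, so the element-wise unordered-atomic guarantee is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper, not part of the patch. Copies lenBytes bytes using the
// chunk schedule visible in the CHECK lines and returns the chunk widths
// (in elements) in the order they were copied.
std::vector<unsigned> copyLikeLoweredMemcpy(std::uint8_t *dst,
                                            const std::uint8_t *src,
                                            std::uint32_t lenBytes) {
  constexpr std::uint64_t ElemSize = 4; // last intrinsic argument in the test
  constexpr std::uint64_t MaxChunk = 8; // <8 x i32> in memcpy.loop
  assert(lenBytes % ElemSize == 0 && "length must be a multiple of ElemSize");

  std::vector<unsigned> chunks;
  // ELEMENTS_LEN = zext(len) / 4
  const std::uint64_t elements = std::uint64_t(lenBytes) / ElemSize;

  // One chunk copy; advances both pointers like SRC_NEXT / DEST_NEXT.
  auto copyChunk = [&](std::uint64_t width) {
    std::memcpy(dst, src, width * ElemSize);
    dst += width * ElemSize;
    src += width * ElemSize;
    chunks.push_back(static_cast<unsigned>(width));
  };

  // memcpy.loop: LOOP_ITERS = ELEMENTS_LEN / 8, counted down to zero.
  // A zero trip count corresponds to the entry branch to need_check_4_.
  for (std::uint64_t i = elements / MaxChunk; i != 0; --i)
    copyChunk(MaxChunk);

  // need_check_4_, need_check_2_, need_check_1_: test one bit of
  // ELEMENTS_LEN per tail and copy that many elements if it is set.
  for (std::uint64_t width = MaxChunk / 2; width != 0; width /= 2)
    if (elements & width)
      copyChunk(width);

  return chunks;
}
```

For a 60-byte copy (15 elements) the sketch performs one 8-element iteration and then takes the 4, 2 and 1 tails. The tails need no separate remainder computation because the low three bits of the element count are exactly the part the main loop leaves uncopied.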


Revision Contents

Diff 364026
