This is an archive of the discontinued LLVM Phabricator instance.

[MemCpyOpt/MemorySSA] Do not run the pass for prohibitively large number of memory accesses.
AbandonedPublic

Authored by asbirlea on Aug 4 2021, 4:45 PM.

Details

Summary

Provide a knob in MemorySSA for passes to query if they're dealing with a pathological testcase: very large number of memory accesses.
While MemorySSA queries are bounded, updates to the analysis are not, and they are necessary for correctness. Inserting a new memory access triggers the renaming of all affected accesses, so an update is linear in the number of accesses.
The utility introduced here uses the counter MemorySSA uses to assign unique IDs, which approximates how many accesses have been created, not necessarily how many are currently in the function.
Since the knob is set to 100000, an approximation should be sufficient.

Use case: triggered in MemCpyOpt on a function with a single BB and 100k+ memory accesses.

Diff Detail

Event Timeline

asbirlea created this revision.Aug 4 2021, 4:45 PM
asbirlea requested review of this revision.Aug 4 2021, 4:45 PM
Herald added a project: Restricted Project.Aug 4 2021, 4:45 PM

For some context, we're seeing this in llvm_gcov_reset, a function generated by the gcov pass. One module contains an llvm_gcov_reset with 100000+ stores, many of which are changed to memsets by MemCpyOpt.
I'm somewhat on the fence about this change, since it is not particularly principled. The IR created by the gcov pass is rather unreasonable and IMO an extreme edge case. D107538 fixes GCov.
Any other thoughts? Should we support extreme IR edge cases?

Needs a test though

There are many places in LLVM that rely on various caps to prevent high compile times.
IMO, this is another case we need to accept can happen and guard against, rather than letting other users discover it.
I'd also prefer to add a warning message when such a large number of accesses is reached, as this will affect any pass that inserts accesses and updates MemorySSA.

asbirlea updated this revision to Diff 364711.Aug 6 2021, 12:28 AM

Add test and debug messages.

nikic added a comment.Aug 6 2021, 12:39 AM

Is the problem here updating MemorySSA in general, or only the use-optimized form? At one of the MemorySSA meetings we discussed the possibility of constructing MSSA without use optimization and then requesting that to happen later. MemCpyOpt itself doesn't particularly benefit from optimized uses.

This is not related to optimized uses.
The issue arises when making a change in the IR, in this case inserting a memset. This triggers an update of all accesses below the new access: the renaming phase traverses all the accesses in a block (see MemorySSA::renameBlock).

nikic added a comment.Aug 7 2021, 1:53 PM

This is not related to optimized uses.
The issue arises when making a change in the IR, in this case inserting a memset. This triggers an update of all accesses below the new access: the renaming phase traverses all the accesses in a block (see MemorySSA::renameBlock).

Just looked at the MemorySSA update code for the first time, and this is a lot more complicated than I expected...

I do wonder if we can't make the updating more efficient for the problematic case, though. What we're doing here is less introducing a new def and more merging a number of defs into one -- any uses of those defs should point to the new def. In terms of the existing API surface, rather than inserting the new memset def after the existing store defs with RenameUses=true, can we instead insert it before the existing defs with RenameUses=false? The actual "renaming" would then happen when the store memory accesses are removed, which should be efficient and avoid any full block scans.

Given that D107702 seems to fix the original use case, do we still want to do this?

asbirlea abandoned this revision.Aug 9 2021, 5:34 PM

I don't think we need this patch for the specific issue that motivated it. It may still be useful for other passes that insert defs and end up in a similar scenario, though.
At this point it's not worth pursuing; I'd prefer to analyze the uses of the insertDef API instead and reopen if a use case makes sense (e.g. there is a cap in LICM, and that pass iterates through the access list).