
[LVI] Restructure caching
ClosedPublic

Authored by nikic on Nov 18 2019, 12:59 AM.

Details

Summary

Variant on D70103. The caching is switched to always use a BB to cache entry map, which then contains per-value caches. A separate set contains value handles with a deletion callback. This allows us to properly invalidate overdefined values.

A possible alternative would be to always cache by value first and have per-BB maps/sets in each cache entry. In that case we could use a ValueMap and would avoid the separate value handle set. I went with the BB indexing at the top level to make it easier to integrate D69914, but possibly that's not the right choice.
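The layout described in the summary can be sketched in a toy model. This is a hypothetical simplification, not the actual LazyValueInfoCache code: plain standard containers and strings stand in for LLVM's BasicBlock*, Value*, DenseMap, and value-handle types, and all names here are invented for illustration.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Simplified stand-in for ValueLatticeElement.
struct LatticeVal { int Lo, Hi; };

// Per-block cache entry: a per-value lattice cache plus the set of
// values known to be overdefined in this block.
struct BlockCacheEntry {
  std::map<std::string, LatticeVal> LatticeElements;
  std::set<std::string> OverDefined;
};

struct LVICacheSketch {
  // Outer map: basic block -> per-value caches.
  std::map<std::string, BlockCacheEntry> BlockCache;
  // Stand-in for the separate set of value handles with a deletion
  // callback; it records every value that appears anywhere in the cache.
  std::set<std::string> ValueHandles;

  void insert(const std::string &BB, const std::string &V, LatticeVal LV) {
    BlockCache[BB].LatticeElements[V] = LV;
    ValueHandles.insert(V);
  }

  // Invoked from the deletion callback: because the cache is keyed by
  // block first, erasing a value must visit every block entry.
  void eraseValue(const std::string &V) {
    for (auto &Pair : BlockCache) {
      Pair.second.LatticeElements.erase(V);
      Pair.second.OverDefined.erase(V);
    }
    ValueHandles.erase(V);
  }
};
```

Note that in this layout, invalidating a single value is linear in the number of cached blocks.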

Diff Detail

Event Timeline

nikic created this revision.Nov 18 2019, 12:59 AM
Herald added a project: Restricted Project.Nov 18 2019, 12:59 AM
efriedma accepted this revision.Nov 18 2019, 4:36 PM

Simpler is good, I think. And I like the use of AssertingVH. LGTM.

It would be nice to have a testcase for PR43909? But not critical, I think; the expanded use of AssertingVH should catch any future issues here.

This revision is now accepted and ready to land.Nov 18 2019, 4:36 PM
nikic added a subscriber: reames.

@efriedma When using AssertingVH, a bunch of the existing JT/CVP tests start to fail, so I think we don't need a test case that produces an actual collision in the map.

Adding @reames in case he has some thoughts on how we handle caching/invalidation.

uenoku added a subscriber: uenoku.Nov 19 2019, 4:59 AM
fhahn added a subscriber: fhahn.Nov 19 2019, 8:40 AM
jmorse added a subscriber: jmorse.Nov 28 2019, 2:06 AM
reames added a comment.Dec 2 2019, 5:15 PM

Adding @reames in case he has some thoughts on how we handle caching/invalidation.

This looks very reasonable to me. I ran through it and didn't spot anything obviously problematic, so also LGTM.

nikic updated this revision to Diff 232150.Dec 4 2019, 8:38 AM

Don't use AssertingVH for BasicBlock. See D29006 for why this needs to be a PoisoningVH. I believe using AssertingVH should be fine for the Value case though, as we have a CallbackVH that should *always* remove these.

This revision was automatically updated to reflect the committed changes.

We're seeing a large memory increase in compilations as a result of this patch: 4.7G -> 6.0G (25%)

Would it be possible to revert this patch while we investigate? I'm trying to upstream a reproduction in the meantime.

nikic added a comment.Dec 20 2019, 1:04 AM

@rupprecht Sure. You'll also have to revert rG21fbd5587cdf.

It's expected that this patch increases memory usage, but that does seem a bit excessive.

Thanks -- the code in question seems to be heavily templated, so it may be a corner case with how those are handled. I'll revert shortly and follow up with some kind of repro, but may be delayed due to holidays.

nikic reopened this revision.Dec 20 2019, 11:28 AM
This revision is now accepted and ready to land.Dec 20 2019, 11:28 AM
nikic added a comment.Mar 22 2020, 4:04 PM

@rupprecht Ping regarding a reproducing case. I just double checked that this patch is memory usage neutral on ctmark, so I don't really have a starting point on how to improve this. I'm also wondering whether you are using the legacy PM or the new PM?

nikic updated this revision to Diff 258604.Apr 19 2020, 8:33 AM

Rebase and move BlockCacheEntry behind a unique_ptr, as it is large.

nikic added a comment.Apr 19 2020, 9:55 AM

I plan to reapply this change sometime soon, as we do need to fix the non-determinism. I have not been able to reproduce a memory usage regression, and there is no test case available; I'm happy to look into it if one turns up.

One of the big alternatives here is to switch LVI from automatic to manual invalidation, as that removes the need to store value handles. Probably relatively easy to do for CVP, but messy for JumpThreading.

Any progress?

This revision was automatically updated to reflect the committed changes.

@rupprecht Ping regarding a reproducing case. I just double checked that this patch is memory usage neutral on ctmark, so I don't really have a starting point on how to improve this.

Sorry about this -- the test case I tried to reduce stubbornly refused to get any smaller (e.g. something like a ~30% increase when using 2GB of memory, but only a 1% increase when using 1GB). And then I've been on leave for the past few months and forgot to get to this before I left.

At any rate, we'll watch this patch and see if the issues still occur.

I'm also wondering whether you are using the legacy PM or the new PM?

The new PM.

Hi @nikic, I just tracked down a big compile time increase to this patch. The issue cropped up for a very large function, where a cycle profile showed hotspots in:

llvm::DenseMapBase<llvm::SmallDenseMap<llvm::AssertingVH<llvm::Value>, llvm::ValueLatticeElement, 4u, llvm::DenseMapInfo<llvm::AssertingVH<llvm::Value> >, llvm::detail::DenseMapPair<llvm::AssertingVH<llvm::Value>, llvm::ValueLatticeElement> >, llvm::AssertingVH<llvm::Value>, llvm::ValueLatticeElement, llvm::DenseMapInfo<llvm::AssertingVH<llvm::Value> >, llvm::detail::DenseMapPair<llvm::AssertingVH<llvm::Value>, llvm::ValueLatticeElement> >::erase(llvm::AssertingVH<llvm::Value> const&)

and

(anonymous namespace)::LVIValueHandle::deleted()

The problem is related to this patch's restructuring of the LazyValueInfoCache so that instead of a 2-level map from Value* -> BB -> ValueLatticeElement, it now has a 2-level map from BB -> Value -> ValueLatticeElement. The problem is that LVIValueHandle::deleted invokes LazyValueInfoCache::eraseValue on a Value*, which now needs to walk through every entry in the outer map to remove the Value* from the cache of every block that contains it. Before, it could simply do a single lookup on the outer map to remove the Value.

When I revert this patch the compile time goes down ~35%.
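The asymmetry described above reduces to a toy comparison (hypothetical simplified types, not the actual LLVM code): with value-first keying, erasing a deleted value is a single erase on the outer map; with block-first keying, it has to walk every block entry.

```cpp
#include <cassert>
#include <map>
#include <string>

using Lattice = int; // stand-in for ValueLatticeElement

// Old layout: Value -> (BB -> Lattice).
std::map<std::string, std::map<std::string, Lattice>> ValueFirst;
// New layout: BB -> (Value -> Lattice).
std::map<std::string, std::map<std::string, Lattice>> BlockFirst;

// Deleting a value under the old layout: one erase on the outer map.
void eraseValueOld(const std::string &V) { ValueFirst.erase(V); }

// Deleting a value under the new layout: O(#blocks) work per deleted
// value, which is the cost LVIValueHandle::deleted ends up paying here.
void eraseValueNew(const std::string &V) {
  for (auto &BB : BlockFirst)
    BB.second.erase(V);
}
```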

I noticed in your description this comment:
"A possible alternative would be to always cache by value first and have per-BB maps/sets in the each cache entry. In that case we could use a ValueMap and would avoid the separate value handle set. I went with the BB indexing at the top level to make it easier to integrate D69914, but possibly that's not the right choice."

It sounds like your proposed alternative would address this issue. Would you be able to do that?

nikic added a comment.Jul 11 2020, 4:03 AM

Hi @nikic, I just tracked down a big compile time increase to this patch. The issue cropped up for a very large function, where a cycle profile showed hotspots in:

llvm::DenseMapBase<llvm::SmallDenseMap<llvm::AssertingVH<llvm::Value>, llvm::ValueLatticeElement, 4u, llvm::DenseMapInfo<llvm::AssertingVH<llvm::Value> >, llvm::detail::DenseMapPair<llvm::AssertingVH<llvm::Value>, llvm::ValueLatticeElement> >, llvm::AssertingVH<llvm::Value>, llvm::ValueLatticeElement, llvm::DenseMapInfo<llvm::AssertingVH<llvm::Value> >, llvm::detail::DenseMapPair<llvm::AssertingVH<llvm::Value>, llvm::ValueLatticeElement> >::erase(llvm::AssertingVH<llvm::Value> const&)

and

(anonymous namespace)::LVIValueHandle::deleted()

The problem is related to this patch's restructuring of the LazyValueInfoCache so that instead of a 2-level map from Value* -> BB -> ValueLatticeElement, it now has a 2-level map from BB -> Value -> ValueLatticeElement. The problem is that LVIValueHandle::deleted invokes LazyValueInfoCache::eraseValue on a Value*, which now needs to walk through every entry in the outer map to remove the Value* from every block containing it in its lattice. Before, it could simply do a single lookup on the outer map to remove the Value.

When I revert this patch the compile time goes down ~35%.

I noticed in your description this comment:
"A possible alternative would be to always cache by value first and have per-BB maps/sets in the each cache entry. In that case we could use a ValueMap and would avoid the separate value handle set. I went with the BB indexing at the top level to make it easier to integrate D69914, but possibly that's not the right choice."

It sounds like your proposed alternative would address this issue. Would you be able to do that?

Unfortunately the alternative approach has its own issues. It would fix this performance problem, but I think it would also raise memory usage significantly. D81811 might be a good way to go about it, because it removes the need for separate tracking of overdefined values, which would fix the non-determinism problem here as a side-effect.

Is it possible to share the problematic test case, so I can evaluate different approaches?

Unfortunately there is an additional complication here. Apparently the separate per-block storage of overdefined values is not just there to reduce memory usage, it is also necessary for the implementation of threadEdgeImpl: https://github.com/llvm/llvm-project/blob/8f183d9f3d13d66a679bd449b1f5d34942560028/llvm/lib/Analysis/LazyValueInfo.cpp#L264

This code needs to know which values are cached as overdefined for a given block, which would be highly inefficient (scan over all values) if we don't key by block first.
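The constraint above can be illustrated with a toy query (hypothetical names, not the real threadEdgeImpl): a per-block overdefined set makes "which values are cached as overdefined in block BB" a single lookup, whereas value-first storage would force a scan over every cached value.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Block-first: each block keeps its own set of overdefined values.
std::map<std::string, std::set<std::string>> OverDefinedPerBlock;

// threadEdgeImpl-style query: all values cached as overdefined in BB.
// With block-first keying this is a single map lookup.
std::vector<std::string> overdefinedIn(const std::string &BB) {
  auto It = OverDefinedPerBlock.find(BB);
  if (It == OverDefinedPerBlock.end())
    return {};
  return {It->second.begin(), It->second.end()};
}
```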

I'm not sure whether the threadEdgeImpl code can be dropped -- no tests fail if I do, but it's not like the commit that introduced this (https://github.com/llvm/llvm-project/commit/aa7f66ba6798ea946baa622b55679597dab60742) had any tests either...

Ideally we'd switch LVI to be based on PredicateInfo and thus avoid the need to track data per-block in the first place. Of course, this is not simple to do.

Hi @nikic, I just tracked down a big compile time increase to this patch. The issue cropped up for a very large function, where a cycle profile showed hotspots in:

llvm::DenseMapBase<llvm::SmallDenseMap<llvm::AssertingVH<llvm::Value>, llvm::ValueLatticeElement, 4u, llvm::DenseMapInfo<llvm::AssertingVH<llvm::Value> >, llvm::detail::DenseMapPair<llvm::AssertingVH<llvm::Value>, llvm::ValueLatticeElement> >, llvm::AssertingVH<llvm::Value>, llvm::ValueLatticeElement, llvm::DenseMapInfo<llvm::AssertingVH<llvm::Value> >, llvm::detail::DenseMapPair<llvm::AssertingVH<llvm::Value>, llvm::ValueLatticeElement> >::erase(llvm::AssertingVH<llvm::Value> const&)

and

(anonymous namespace)::LVIValueHandle::deleted()

The problem is related to this patch's restructuring of the LazyValueInfoCache so that instead of a 2-level map from Value* -> BB -> ValueLatticeElement, it now has a 2-level map from BB -> Value -> ValueLatticeElement. The problem is that LVIValueHandle::deleted invokes LazyValueInfoCache::eraseValue on a Value*, which now needs to walk through every entry in the outer map to remove the Value* from every block containing it in its lattice. Before, it could simply do a single lookup on the outer map to remove the Value.

When I revert this patch the compile time goes down ~35%.

I noticed in your description this comment:
"A possible alternative would be to always cache by value first and have per-BB maps/sets in the each cache entry. In that case we could use a ValueMap and would avoid the separate value handle set. I went with the BB indexing at the top level to make it easier to integrate D69914, but possibly that's not the right choice."

It sounds like your proposed alternative would address this issue. Would you be able to do that?

Unfortunately the alternative approach has its own issues. It would fix this performance problem, but I think it would also raise memory usage significantly. D81811 might be a good way to go about it, because it removes the need for separate tracking of overdefined values, which would fix the non-determinism problem here as a side-effect.

Is it possible to share the problematic test case, so I can evaluate different approaches?

Will work on getting you a test case today. Thanks!

Ideally we'd switch LVI to be based on PredicateInfo and thus avoid the need to track data per-block in the first place. Of course, this is not simple to do.

Alternatively we might be able to move some of the logic from LVI/CVP to (IP)SCCP, which already uses PredicateInfo. There is already a bit of functional overlap: currently SCCP mostly uses ranges to simplify conditions, but it could do more.

There probably are a few things in LVI/CVP that might not be possible to handle in (IP)SCCP, e.g. I am not sure about the edge threading.

It looks like I may not be able to share the obfuscated IR for this particular test case. I'll try to look for something else that demonstrates a compile time increase from the patch. But in any case, really any code that invokes LazyValueInfoCache::eraseValue on a whole lot of values in a function with a large number of BBs is going to suffer a compile-time increase from the change to the data structures. Here's the hot call path to it in my test case:

llvm::detail::PassModel<llvm::Function, llvm::SimplifyCFGPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&)
llvm::SimplifyCFGPass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&)
simplifyFunctionCFG(llvm::Function&, llvm::TargetTransformInfo const&, llvm::SimplifyCFGOptions const&)
llvm::removeUnreachableBlocks(llvm::Function&, llvm::DomTreeUpdater*, llvm::MemorySSAUpdater*)
llvm::BasicBlock::eraseFromParent()
llvm::BasicBlock::~BasicBlock()
llvm::Value::deleteValue()
llvm::Instruction::~Instruction()
llvm::ValueHandleBase::ValueIsDeleted(llvm::Value*)
(anonymous namespace)::LVIValueHandle::deleted()

I think this is after JumpThreading, which presumably created all these simplify opportunities.

It looks like I may not be able to share the obfuscated IR for this particular test case. I'll try to look for something else that demonstrates a compile time increase from the patch. But in any case, really any code that invokes LazyValueInfoCache::eraseValue on a whole lot of values in a function with a large number of BBs is going to suffer with a compile time increase from the change to the data structures. Here's the hot call path to it in my test case:

llvm::detail::PassModel<llvm::Function, llvm::SimplifyCFGPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&)
llvm::SimplifyCFGPass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&)
simplifyFunctionCFG(llvm::Function&, llvm::TargetTransformInfo const&, llvm::SimplifyCFGOptions const&)
llvm::removeUnreachableBlocks(llvm::Function&, llvm::DomTreeUpdater*, llvm::MemorySSAUpdater*)
llvm::BasicBlock::eraseFromParent()
llvm::BasicBlock::~BasicBlock()
llvm::Value::deleteValue()
llvm::Instruction::~Instruction()
llvm::ValueHandleBase::ValueIsDeleted(llvm::Value*)
(anonymous namespace)::LVIValueHandle::deleted()

I think this is after JumpThreading, which presumably created all these simplify opportunities.

This looks fishy. LazyValueInfo should not be alive when SimplifyCFG gets run. SimplifyCFG does not depend on it, or preserve it.

Do you use the legacy PM or new PM?

It looks like I may not be able to share the obfuscated IR for this particular test case. I'll try to look for something else that demonstrates a compile time increase from the patch. But in any case, really any code that invokes LazyValueInfoCache::eraseValue on a whole lot of values in a function with a large number of BBs is going to suffer with a compile time increase from the change to the data structures. Here's the hot call path to it in my test case:

llvm::detail::PassModel<llvm::Function, llvm::SimplifyCFGPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&)
llvm::SimplifyCFGPass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&)
simplifyFunctionCFG(llvm::Function&, llvm::TargetTransformInfo const&, llvm::SimplifyCFGOptions const&)
llvm::removeUnreachableBlocks(llvm::Function&, llvm::DomTreeUpdater*, llvm::MemorySSAUpdater*)
llvm::BasicBlock::eraseFromParent()
llvm::BasicBlock::~BasicBlock()
llvm::Value::deleteValue()
llvm::Instruction::~Instruction()
llvm::ValueHandleBase::ValueIsDeleted(llvm::Value*)
(anonymous namespace)::LVIValueHandle::deleted()

I think this is after JumpThreading, which presumably created all these simplify opportunities.

This looks fishy. LazyValueInfo should not be alive when SimplifyCFG gets run. SimplifyCFG does not depend on it, or preserve it.

Do you use the legacy PM or new PM?

This is with the new PM. I couldn't repro it with the old PM, but I assumed it was because PGO based inlining is different. But maybe it is related to where LVI is alive in the new PM.

BTW, that call path was with a release-built compiler. Here is the stack trace in a debug build of opt where I am reproducing it (it had been running for well over an hour when I attached via gdb). The BlockCache contains 322648 entries.

#0 0x0000000000cc11e1 in llvm::DenseMapInfo<llvm::Value*>::isEqual (LHS=0x54370218, RHS=0x4699a400) at llvm/include/llvm/ADT/DenseMapInfo.h:82
#1 0x000000000284416d in llvm::DenseMapBase<llvm::SmallDenseMap<llvm::AssertingVH<llvm::Value>, llvm::ValueLatticeElement, 4u, llvm::DenseMapInfo<llvm::AssertingVH<llvm::Value> >, llvm::detail::DenseMapPair<llvm::AssertingVH<llvm::Value>, llvm::ValueLatticeElement> >, llvm::AssertingVH<llvm::Value>, llvm::ValueLatticeElement, llvm::DenseMapInfo<llvm::AssertingVH<llvm::Value> >, llvm::detail::DenseMapPair<llvm::AssertingVH<llvm::Value>, llvm::ValueLatticeElement> >::LookupBucketFor<llvm::AssertingVH<llvm::Value> > (this=0x4699a400, Val=..., FoundBucket=@0x7ffdaf052cc0: 0x3ad127a0) at llvm/include/llvm/ADT/DenseMap.h:622
#2 0x0000000002843fd5 in llvm::DenseMapBase<llvm::SmallDenseMap<llvm::AssertingVH<llvm::Value>, llvm::ValueLatticeElement, 4u, llvm::DenseMapInfo<llvm::AssertingVH<llvm::Value> >, llvm::detail::DenseMapPair<llvm::AssertingVH<llvm::Value>, llvm::ValueLatticeElement> >, llvm::AssertingVH<llvm::Value>, llvm::ValueLatticeElement, llvm::DenseMapInfo<llvm::AssertingVH<llvm::Value> >, llvm::detail::DenseMapPair<llvm::AssertingVH<llvm::Value>, llvm::ValueLatticeElement> >::LookupBucketFor<llvm::AssertingVH<llvm::Value> > (this=0x4699a400, Val=..., FoundBucket=@0x7ffdaf052d20: 0x7ffdaf052d40) at llvm/include/llvm/ADT/DenseMap.h:662
#3 0x0000000002843e28 in llvm::DenseMapBase<llvm::SmallDenseMap<llvm::AssertingVH<llvm::Value>, llvm::ValueLatticeElement, 4u, llvm::DenseMapInfo<llvm::AssertingVH<llvm::Value> >, llvm::detail::DenseMapPair<llvm::AssertingVH<llvm::Value>, llvm::ValueLatticeElement> >, llvm::AssertingVH<llvm::Value>, llvm::ValueLatticeElement, llvm::DenseMapInfo<llvm::AssertingVH<llvm::Value> >, llvm::detail::DenseMapPair<llvm::AssertingVH<llvm::Value>, llvm::ValueLatticeElement> >::erase (this=0x4699a400, Val=...) at llvm/include/llvm/ADT/DenseMap.h:304
#4 0x0000000002835c62 in (anonymous namespace)::LazyValueInfoCache::eraseValue (this=0xd79b990, V=0x3ad127a0) at llvm/lib/Analysis/LazyValueInfo.cpp:245
#5 0x0000000002835b7c in (anonymous namespace)::LVIValueHandle::deleted (this=0x7f1e9496e478) at llvm/lib/Analysis/LazyValueInfo.cpp:257
#6 0x000000000344c7c6 in llvm::ValueHandleBase::ValueIsDeleted (V=0x3ad127a0) at llvm/lib/IR/Value.cpp:987
#7 0x000000000344c41c in llvm::Value::~Value (this=0x3ad127a0) at llvm/lib/IR/Value.cpp:76
#8 0x000000000290d308 in llvm::User::~User (this=0x3ad127a0) at llvm/include/llvm/IR/User.h:94
#9 0x0000000003354800 in llvm::Instruction::~Instruction (this=0x3ad127a0) at llvm/lib/IR/Instruction.cpp:61
#10 0x0000000003453dc8 in llvm::UnaryInstruction::~UnaryInstruction (this=0x3ad127a0) at llvm/include/llvm/IR/InstrTypes.h:57
#11 0x0000000003452688 in llvm::LoadInst::~LoadInst (this=0x3ad127a0) at llvm/include/llvm/IR/Instructions.h:173
#12 0x000000000344d27d in llvm::Value::deleteValue (this=0x3ad127a0) at llvm/include/llvm/IR/Instruction.def:172
#13 0x00000000030988b8 in llvm::ilist_alloc_traits<llvm::Instruction>::deleteNode (V=0x3ad127a0) at llvm/include/llvm/IR/Instruction.h:829
#14 0x0000000003098885 in llvm::iplist_impl<llvm::simple_ilist<llvm::Instruction>, llvm::SymbolTableListTraits<llvm::Instruction> >::erase (this=0x3ad124e8, where=...) at llvm/include/llvm/ADT/ilist.h:268
#15 0x00000000030986db in llvm::iplist_impl<llvm::simple_ilist<llvm::Instruction>, llvm::SymbolTableListTraits<llvm::Instruction> >::erase (this=0x3ad124e8, first=..., last=...) at llvm/include/llvm/ADT/ilist.h:305
#16 0x000000000320d014 in llvm::iplist_impl<llvm::simple_ilist<llvm::Instruction>, llvm::SymbolTableListTraits<llvm::Instruction> >::clear (this=0x3ad124e8) at llvm/include/llvm/ADT/ilist.h:309
#17 0x000000000320a7fd in llvm::BasicBlock::~BasicBlock (this=0x3ad124c0) at llvm/lib/IR/BasicBlock.cpp:90
#18 0x000000000320e837 in llvm::ilist_alloc_traits<llvm::BasicBlock>::deleteNode (V=0x3ad124c0) at llvm/include/llvm/ADT/ilist.h:41
#19 0x000000000320d575 in llvm::iplist_impl<llvm::simple_ilist<llvm::BasicBlock>, llvm::SymbolTableListTraits<llvm::BasicBlock> >::erase (this=0x91ad8e0, where=...) at llvm/include/llvm/ADT/ilist.h:268
#20 0x000000000320ad3c in llvm::BasicBlock::eraseFromParent (this=0x3ad124c0) at llvm/lib/IR/BasicBlock.cpp:128
#21 0x0000000003fd4b72 in llvm::removeUnreachableBlocks (F=..., DTU=0x0, MSSAU=0x0) at llvm/lib/Transforms/Utils/Local.cpp:2325
#22 0x0000000003d597c7 in simplifyFunctionCFG (F=..., TTI=..., Options=...) at llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp:191
#23 0x0000000003d59736 in llvm::SimplifyCFGPass::run (this=0xc68ea28, F=..., AM=...) at llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp:237
#24 0x000000000422e645 in llvm::detail::PassModel<llvm::Function, llvm::SimplifyCFGPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (this=0xc68ea20, IR=..., AM=...) at llvm/include/llvm/IR/PassManagerInternal.h:79
#25 0x00000000034118e2 in llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (this=0x92a7a48, IR=..., AM=...) at llvm/include/llvm/IR/PassManager.h:520
#26 0x0000000004240fb8 in llvm::CGSCCToFunctionPassAdaptor<llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>> >::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (this=0x92a7a48, C=..., AM=..., CG=..., UR=...) at llvm/include/llvm/Analysis/CGSCCPassManager.h:506
#27 0x0000000004240ce5 in llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::CGSCCToFunctionPassAdaptor<llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>> >, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) (this=0x92a7a40, IR=..., AM=..., ExtraArgs=..., ExtraArgs=...) at llvm/include/llvm/IR/PassManagerInternal.h:79
#28 0x0000000002721019 in llvm::PassManager<llvm::LazyCallGraph::SCC, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run (this=0xd03b248, InitialC=..., AM=..., G=..., UR=...) at llvm/lib/Analysis/CGSCCPassManager.cpp:87
#29 0x00000000035e2f46 in llvm::DevirtSCCRepeatedPass<llvm::PassManager<llvm::LazyCallGraph::SCC, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&> >::run (this=0xd03b248, InitialC=..., AM=..., CG=..., UR=...) at llvm/include/llvm/Analysis/CGSCCPassManager.h:635
#30 0x00000000035e2887 in llvm::ModuleToPostOrderCGSCCPassAdaptor<llvm::DevirtSCCRepeatedPass<llvm::PassManager<llvm::LazyCallGraph::SCC, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&> > >::run (this=0xd03b248, M=..., AM=...) at llvm/include/llvm/Analysis/CGSCCPassManager.h:898
#31 0x00000000035e1ff5 in llvm::detail::PassModel<llvm::Module, llvm::ModuleToPostOrderCGSCCPassAdaptor<llvm::DevirtSCCRepeatedPass<llvm::PassManager<llvm::LazyCallGraph::SCC, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&> > >, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (this=0xd03b240, IR=..., AM=...) at llvm/include/llvm/IR/PassManagerInternal.h:79
#32 0x0000000003410a4c in llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (this=0x92a1cd0, IR=..., AM=...) at llvm/include/llvm/IR/PassManager.h:520
#33 0x00000000035d04b6 in llvm::ModuleInlinerWrapperPass::run (this=0x92a1c68, M=..., MAM=...) at llvm/lib/Transforms/IPO/Inliner.cpp:1070
#34 0x00000000042417d5 in llvm::detail::PassModel<llvm::Module, llvm::ModuleInlinerWrapperPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (this=0x92a1c60, IR=..., AM=...)

nikic added a subscriber: aeubanks.Jul 13 2020, 1:45 PM

@aeubanks or anyone else familiar with NewPM: How do you mark an analysis as eagerly invalidated in the NewPM? The legacy PM had a releaseMemory() method that was called when an analysis is no longer needed, what is the replacement for that mechanism?

Independently of the issue here, LazyValueAnalysis really should not stay alive longer than strictly necessary, as it is something of a memory hog.

nikic added a comment.Jul 13 2020, 1:54 PM

@tejohnson Just to check whether this would improve anything, could you try this patch?

diff --git a/llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp b/llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp
index cd2f4ca36f3..ef81699c77f 100644
--- a/llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp
+++ b/llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp
@@ -929,11 +929,19 @@ CorrelatedValuePropagationPass::run(Function &F, FunctionAnalysisManager &AM) {
 
   bool Changed = runImpl(F, LVI, DT, getBestSimplifyQuery(AM, F));
 
-  if (!Changed)
-    return PreservedAnalyses::all();
   PreservedAnalyses PA;
-  PA.preserve<GlobalsAA>();
-  PA.preserve<DominatorTreeAnalysis>();
-  PA.preserve<LazyValueAnalysis>();
+  if (!Changed) {
+    PA = PreservedAnalyses::all();
+  } else {
+    PA.preserve<GlobalsAA>();
+    PA.preserve<DominatorTreeAnalysis>();
+    PA.preserve<LazyValueAnalysis>();
+  }
+
+  // Keeping LVI alive is expensive, both because it uses a lot of memory, and
+  // because invalidating values in LVI is expensive. While CVP does preserve
+  // LVI, we know that passes after JumpThreading+CVP will not need the result
+  // of this analysis, so we forcefully discard it early.
+  PA.abandon<LazyValueAnalysis>();
   return PA;
 }

@aeubanks or anyone else familiar with NewPM: How do you mark an analysis as eagerly invalidated in the NewPM? The legacy PM had a releaseMemory() method that was called when an analysis is no longer needed, what is the replacement for that mechanism?

Independently of the issue here, LazyValueAnalysis really should not stay alive longer than strictly necessary, as it is something of a memory hog.

Besides the PreservedAnalyses return value of run(), does adding InvalidateAnalysisPass<AnalysisT> to the NPM pipeline work?

@tejohnson Just to check whether this would improve anything, could you try this patch?

Yep - it seems to fix the issue!

diff --git a/llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp b/llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp
index cd2f4ca36f3..ef81699c77f 100644
--- a/llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp
+++ b/llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp
@@ -929,11 +929,19 @@ CorrelatedValuePropagationPass::run(Function &F, FunctionAnalysisManager &AM) {
 
   bool Changed = runImpl(F, LVI, DT, getBestSimplifyQuery(AM, F));
 
-  if (!Changed)
-    return PreservedAnalyses::all();
   PreservedAnalyses PA;
-  PA.preserve<GlobalsAA>();
-  PA.preserve<DominatorTreeAnalysis>();
-  PA.preserve<LazyValueAnalysis>();
+  if (!Changed) {
+    PA = PreservedAnalyses::all();
+  } else {
+    PA.preserve<GlobalsAA>();
+    PA.preserve<DominatorTreeAnalysis>();
+    PA.preserve<LazyValueAnalysis>();
+  }
+
+  // Keeping LVI alive is expensive, both because it uses a lot of memory, and
+  // because invalidating values in LVI is expensive. While CVP does preserve
+  // LVI, we know that passes after JumpThreading+CVP will not need the result
+  // of this analysis, so we forcefully discard it early.
+  PA.abandon<LazyValueAnalysis>();
   return PA;
 }

FWIW, the patch to abandon the analysis via PreservedAnalyses is correct and the right approach IMO.

AFAIK, adding an invalidate in the NPM pipeline is mostly done in testing: we pass a custom pipeline to opt and include a specific invalidate<pass> to check that the invalidation happens or that an analysis is recomputed.

Ping on fix - will the patch to add the abandon call be committed?

Ping on fix - will the patch to add the abandon call be committed?

Ping 2 - @nikic can the fix be committed asap? This is still causing issues for us.