This is an archive of the discontinued LLVM Phabricator instance.

[MemCpyOpt/MemorySSA] Do not run the pass for prohibitively large number of memory accesses.
AbandonedPublic

Authored by asbirlea on Aug 4 2021, 4:45 PM.

Details

Summary

Provide a knob in MemorySSA for passes to query if they're dealing with a pathological testcase: very large number of memory accesses.
While MemorySSA queries are bounded, updates to the analysis are not, and they are necessary for correctness. Inserting a new memory access triggers the renaming of all affected accesses, so an update is linear in the number of accesses.
The utility introduced here uses the counter MemorySSA uses to assign unique IDs, which approximates how many accesses have been created, not necessarily how many are currently in the function.
Since the knob is set to 100000, an approximation should be sufficient.

Use case: triggered in MemCpyOpt on a function with a single BB and 100k+ memory accesses.

Diff Detail

Event Timeline

asbirlea created this revision.Aug 4 2021, 4:45 PM
asbirlea requested review of this revision.Aug 4 2021, 4:45 PM
Herald added a project: Restricted Project.Aug 4 2021, 4:45 PM

For some context, we're seeing this in llvm_gcov_reset, a function generated by the gcov pass. One module contains an llvm_gcov_reset with 100000+ stores, many of which are changed to memsets by MemCpyOpt.
I'm somewhat on the fence about this change, since it is not particularly principled. The IR created by the gcov pass is rather unreasonable and IMO an extreme edge case. D107538 fixes GCov.
Any other thoughts? Should we support extreme IR edge cases?

Needs a test though

There are many places in LLVM that rely on various caps to prevent high compile times.
IMO, this is another case we need to accept can happen and guard against, rather than letting other users discover it.
I'd also prefer to add a warning message when such a large number of accesses is reached, as this will affect any pass that inserts accesses and updates MemorySSA.

asbirlea updated this revision to Diff 364711.Aug 6 2021, 12:28 AM

Add test and debug messages.

nikic added a comment.Aug 6 2021, 12:39 AM

Is the problem here updating MemorySSA in general, or only the use-optimized form? At one of the MemorySSA meetings we discussed the possibility of constructing MSSA without use optimization and then requesting that to happen later. MemCpyOpt itself doesn't particularly benefit from optimized uses.

This is not related to optimized uses.
The issue arises when making a change in the IR, in this case inserting a memset. This triggers an update of all accesses below the new access: the renaming phase traverses all the accesses in a block (see MemorySSA::renameBlock).

nikic added a comment.Aug 7 2021, 1:53 PM

This is not related to optimized uses.
The issue arises when making a change in the IR, in this case inserting a memset. This triggers an update of all accesses below the new access: the renaming phase traverses all the accesses in a block (see MemorySSA::renameBlock).

Just looked at the MemorySSA update code for the first time, and this is a lot more complicated than I expected...

I do wonder if we can't make the updating more efficient for the problematic case, though. What we're doing here is less introducing a new def and more merging a number of defs into one -- any uses of those defs should point to the new def. In terms of the existing API surface, rather than inserting the new memset def after the existing store defs with RenameUses=true, can we instead insert it before the existing defs with RenameUses=false? The actual "renaming" would then happen when the store memory accesses are removed, which should be efficient and avoid any full block scans.

Given that D107702 seems to fix the original use case, do we still want to do this?

asbirlea abandoned this revision.Aug 9 2021, 5:34 PM

I don't think we need this patch for the specific issue that motivated it. It may still be useful for other passes that insert defs and end up in a similar scenario, though.
At this point it's not worth pursuing; I'd prefer to analyze the uses of the insertDef API instead and reopen if a use case makes sense (e.g. there is a cap in LICM, and that pass iterates through the access list).