This is an archive of the discontinued LLVM Phabricator instance.

[MergedLoadStoreMotion] Fix a debug invariant bug in mergeStores
ClosedPublic

Authored by bjope on May 8 2018, 1:46 PM.

Details

Summary

MergedLoadStoreMotion::mergeStores is using some heuristics
to limit the amount of stores that it tries to sink (see
MagicCompileTimeControl in MergedLoadStoreMotion.cpp). The
heuristic involves counting the number of instructions in
one of the basic blocks that is part of the transformation.

We now ignore dbg intrinsics when counting instruction for
the MagicCompileTimeControl heuristic. This to make sure that
the amount of stores that are sunk doesn't depend on the amount
of debug information (if -g is used or not).

Diff Detail

Repository
rL LLVM

Event Timeline

bjope created this revision.May 8 2018, 1:46 PM
Gerolf added a subscriber: bjope.May 8 2018, 1:48 PM

The test case could possibly be shorter, but the change LGTM.

Thanks
Gerolf

davide accepted this revision.May 8 2018, 1:49 PM

Simple enough. Gerolf?

This revision is now accepted and ready to land.May 8 2018, 1:49 PM
davide added inline comments.May 8 2018, 1:51 PM
lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
303–305 ↗(On Diff #145772)

Ideally, MagicCompileTimeControl shouldn't be a thing, but that's a discussion for another day.

bjope added a comment.May 8 2018, 1:54 PM

Thanks alot, that was a quick review!
(I'll land this tomorrow when I have time to monitor the buildbots)

bjope added inline comments.May 8 2018, 2:22 PM
lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
303–305 ↗(On Diff #145772)

I agree (and yes, I focused on the debug invariance here, since changing the algorithm would take lots more time to implement and test).

One idea, if we think the complexity of finding all store instruction in Pred1 for each store in Pred0 in canSinkFromBlock is a problem, then we can create a list of all store instructions in Pred1 already before starting to iterate over Pred0 (at line 292). And then iterate over that list in canSinkFromBlock (removing stores that are sunk after each call to canSinkFromBlock).
If we still want to have a (less magic) limit it could be based on the number of store instructions that we allow to sink, rather than the total number of instructions in one of the two predecessors that we sink from.

dberlin added inline comments.
lib/Transforms/Scalar/MergedLoadStoreMotion.cpp
303–305 ↗(On Diff #145772)

This is still N^2 because of how memdep works.

I posted an O(N) version of MergedLoadStoreMotion that performs the same optimization, without limits, using MemorySSA by hash tabling, at D8688.

You could pick that up if you want.

This revision was automatically updated to reflect the committed changes.