This is an archive of the discontinued LLVM Phabricator instance.

[SimplifyCFG] Merge identical basic blocks (WIP)
Abandoned · Public

Authored by nikic on Jun 28 2020, 2:15 PM.

Details

Reviewers
None
Summary

Merge basic blocks that are exactly the same, inspired by https://bugs.llvm.org/show_bug.cgi?id=46402.

There's stuff to figure out here first, such as a 350% compile-time regression on CMakeFiles/clamscan.dir/libclamav_nsis_LZMADecode.c.o.

Diff Detail

Event Timeline

nikic created this revision. Jun 28 2020, 2:15 PM
Herald added a project: Restricted Project. Jun 28 2020, 2:15 PM

I think @AndrewLitteken & @paquette are working on adding infrastructure for IR outlining. Might be worth syncing up on sharing the IR equivalence checks/hashing. It would be good to have a single place where we define those kinds of IR equivalence for merging/outlining, rather than having multiple slightly diverging ones.

Also, I am not sure if SimplifyCFG is the ideal place to do this transform, as it runs quite a few times in the pipeline. I think running the merging sometime after GVN should allow us to catch most cases. Also, various parts in the backend do not handle huge BBs well (in terms of compile time), so we should probably avoid creating BBs that are too large.

nikic updated this revision to Diff 274224. Jun 29 2020, 1:37 PM

Improve performance, mainly by hashing in the successors, which weeds out the many blocks that differ only in their successors.
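
For illustration, a minimal sketch of what successor-aware block hashing can look like; the helper name hashBlock is hypothetical and not taken from the diff.

#include "llvm/ADT/Hashing.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CFG.h"
#include "llvm/IR/Instruction.h"

using namespace llvm;

static hash_code hashBlock(const BasicBlock &BB) {
  hash_code Hash = hash_value(BB.size());
  // Hash the shape of each instruction: opcode, type, operand count.
  for (const Instruction &I : BB)
    Hash = hash_combine(Hash, I.getOpcode(), I.getType(), I.getNumOperands());
  // Mix in the successor blocks, so two blocks with identical bodies but
  // different successors land in different buckets and are never compared.
  for (const BasicBlock *Succ : successors(&BB))
    Hash = hash_combine(Hash, Succ);
  return Hash;
}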

nikic added a comment. Jun 30 2020, 9:46 AM

I think @AndrewLitteken & @paquette are working on adding infrastructure for IR outlining. Might be worth syncing up on sharing the IR equivalence checks/hashing. It would be good to have a single place where we define those kinds of IR equivalence for merging/outlining, rather than having multiple slightly diverging ones.

Also, I am not sure if SimplifyCFG is the ideal place to do this transform, as it runs quite a few times in the pipeline. I think running the merging sometime after GVN should allow us to catch most cases.

For reference, these are the current compile-time numbers: http://llvm-compile-time-tracker.com/compare.php?from=ef79026aad943fa9b5c12d462127cd9dfa9ea0d8&to=8fb2407c2f795d92f2e5d69a483ab749eff488f2&stat=instructions

After fixing some obvious issues, the merging itself appears to be quite cheap. The big compile-time effect (in either direction) comes from the change in the amount of IR (text-size changes) and its impact on subsequent optimizations (the largest regression is addressed by D82799).

That said, the particular placement here is just an initial attempt that puts it next to the existing merging code for return blocks; it may not be the best placement. Having it as a separate pass, or at a different position in SimplifyCFG, may make more sense.

Performing merging after GVN seems too late. The block merging is intended not just as a code-size optimization, but also as a means to enable further control-flow optimization. As such, it should probably at least happen before jump threading.

Also, various parts in the backend do not handle huge BBs well (in terms of compile time), so we should probably avoid creating BBs that are too large.

I'm not immediately seeing how this optimization would result in larger blocks?

We're using a similar method for instruction equivalence checking, relying on isSameOperationAs for most of the checks, but is there a reason you are not using isIdenticalTo when checking that instructions are equivalent? It looks like it does much of the same thing, including checking whether operands and phi nodes are the same.

jfb added subscribers: jfb, vish99. Jun 30 2020, 10:33 AM
jfb added a comment. Jun 30 2020, 10:37 AM

I like this, but it's going to run into the same issues as MergeFuncs. Can you sync with @hiraditya and @vish99 to solve both block comparison and function comparison using the same methodology? Then we can turn both of them on by default. Or in other words, how does this not have exactly the same issues MergeFuncs has? The principal one is that the IR comparator goes out of sync with the IR definition, and we get miscompiles because of this.

Block hashing seems useful. MergeFuncs also hashes IR, but for the whole function. I wonder if hashing blocks should be an analysis pass, which we invalidate when we change IR, and which both this pass and MergeFuncs consume (i.e. try to de-dup functions, then blocks, and use the block hashes to create a hash for the entire function).
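
As an illustration of that idea, a rough sketch of a shared hashing analysis under the new pass manager; the names IRHashAnalysis and IRHashes are made up for this example and are not existing LLVM API.

#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/Hashing.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Instruction.h"
#include "llvm/IR/PassManager.h"

namespace llvm {

struct IRHashes {
  DenseMap<const BasicBlock *, hash_code> BlockHashes;
  hash_code FunctionHash;
};

class IRHashAnalysis : public AnalysisInfoMixin<IRHashAnalysis> {
  friend AnalysisInfoMixin<IRHashAnalysis>;
  static AnalysisKey Key;

public:
  using Result = IRHashes;

  Result run(Function &F, FunctionAnalysisManager &) {
    IRHashes Hashes;
    Hashes.FunctionHash = hash_value(F.size());
    for (const BasicBlock &BB : F) {
      // Per-block hash; a real implementation would also mix in operands and
      // successors, as in the earlier sketch.
      hash_code H = hash_value(BB.size());
      for (const Instruction &I : BB)
        H = hash_combine(H, I.getOpcode(), I.getType(), I.getNumOperands());
      Hashes.BlockHashes[&BB] = H;
      // Function-level de-duplication can reuse the per-block hashes.
      Hashes.FunctionHash = hash_combine(Hashes.FunctionHash, H);
    }
    return Hashes;
  }
};

AnalysisKey IRHashAnalysis::Key;

} // namespace llvm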

SimplifyCFG doesn't look like a good place to put this as it is called multiple times and impacts compile time significantly.
Agreed with @jfb on having the hasher as a separate pass which other passes can consume. @vish99 may have more ideas on hashing.
Hashing can be expensive and we could use clever tricks to make it cheaper. D52896 could help, happy to provide more input.

We're using a similar method for instruction equivalence checking, relying on isSameOperationAs for most of the checks, but is there a reason you are not using isIdenticalTo when checking that instructions are equivalent? It looks like it does much of the same thing, including checking whether operands and phi nodes are the same.

isIdenticalTo isn't being used here, because the operand equality check does not use identity. In particular, for blocks like

%x1 = add i32 1, 2
%y1 = add i32 %x1, 3

%x2 = add i32 1, 2
%y2 = add i32 %x2, 3

we need to determine that the blocks are the same, even though %x1 and %x2 are only equal after remapping. That's what the ValuesEqual predicate checks, and how this differs from isIdenticalTo.
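
A condensed sketch of this kind of lockstep comparison, with a hypothetical map from values of the first block to values of the second; it illustrates the idea and is not the exact predicate from the diff.

#include "llvm/ADT/DenseMap.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instruction.h"

using namespace llvm;

// Operand check: values defined inside the blocks being compared (%x1 vs %x2
// above) are only equal modulo the remapping built up while walking the
// blocks; everything defined outside (constants, arguments, values from
// dominating blocks) must be literally identical.
static bool valuesEqual(const Value *L, const Value *R,
                        const DenseMap<const Value *, const Value *> &Map) {
  auto It = Map.find(L);
  if (It != Map.end())
    return It->second == R;
  return L == R;
}

static bool blocksMatch(const BasicBlock &BB1, const BasicBlock &BB2) {
  DenseMap<const Value *, const Value *> Map;
  auto It2 = BB2.begin();
  for (const Instruction &I1 : BB1) {
    if (It2 == BB2.end())
      return false;
    const Instruction &I2 = *It2++;
    // Same opcode, type and operand types; isIdenticalTo would additionally
    // require literally identical operands, which is too strict here.
    if (!I1.isSameOperationAs(&I2))
      return false;
    for (unsigned Op = 0, E = I1.getNumOperands(); Op != E; ++Op)
      if (!valuesEqual(I1.getOperand(Op), I2.getOperand(Op), Map))
        return false;
    Map[&I1] = &I2;
  }
  // Phi nodes and terminators need extra care (incoming block order,
  // successor updates), which is omitted in this sketch.
  return It2 == BB2.end();
}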

In D82730#2123353, @jfb wrote:

I like this, but it's going to run into the same issues as MergeFuncs. Can you sync with @hiraditya and @vish99 to solve both block comparison and function comparison using the same methodology? Then we can turn both of them on by default. Or in other words, how does this not have exactly the same issues MergeFuncs has? The principal one is that the IR comparator goes out of sync with the IR definition, and we get miscompiles because of this.

The issue with MergeFuncs is that it does not use simple equality. First, it wants to merge code that is working on different, but bitwise equivalent, types. Second, IIRC it needs an ordering, not just an equality comparison. For that reason it reimplements the comparison logic from scratch. This transform is based on isSameOperationAs(), which is a standard instruction comparison operation used by a number of other transforms (including SimplifyCFG).

Block hashing seems useful. mergeFuncs also hashes IR, but for the whole function. I wonder if hashing blocks should be an analysis pass, which we invalidate when we change IR, and which both this pass as well as MergeFuncs consumes (i.e. try to de-dup functions, then blocks, and use the block hash to create a hash for entire functions).

Possibly, but I suspect it's not worthwhile to treat it as an analysis pass. The hash calculation is not particularly expensive as these things go, and would be invalidated by pretty much any transform.

nikic updated this revision to Diff 275554. Jul 5 2020, 8:50 AM

Improve phi handling. Blocks with phi nodes can be merged, but it requires some care when backedges are involved.

AndrewLitteken added inline comments. Jul 8 2020, 3:20 PM
llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp
210

I have a similar mechanism for checking whether two sections of IR are similar, so it might make sense to create a shared utility function for the instruction mapping/comparison: a basic block or range of instructions is passed in and the isSameOperationAs check is performed over the range. Then, if we develop new mechanisms for deciding whether these sections perform the same operations, both passes get the same gains (one possible interface is sketched below).
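
One possible shape for such a shared utility, with operand equivalence left to a caller-supplied predicate so that merging and outlining can each plug in their own; the function name and signature are assumptions, not an existing API.

#include "llvm/ADT/STLExtras.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instruction.h"

using namespace llvm;

// Returns true if the two instruction ranges perform the same operations,
// pairing instructions up in order and using Instruction::isSameOperationAs.
// Operand equivalence (value remapping, phi handling) is delegated to the
// caller-supplied predicate.
static bool rangesAreSameOperations(
    iterator_range<BasicBlock::const_iterator> R1,
    iterator_range<BasicBlock::const_iterator> R2,
    function_ref<bool(const Value *, const Value *)> OperandsEqual) {
  auto I1 = R1.begin(), I2 = R2.begin();
  for (; I1 != R1.end() && I2 != R2.end(); ++I1, ++I2) {
    if (!I1->isSameOperationAs(&*I2))
      return false;
    for (unsigned Op = 0, E = I1->getNumOperands(); Op != E; ++Op)
      if (!OperandsEqual(I1->getOperand(Op), I2->getOperand(Op)))
        return false;
  }
  // Both ranges must have the same length.
  return I1 == R1.end() && I2 == R2.end();
}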

dmajor added a subscriber: dmajor. Nov 11 2020, 12:11 PM
nikic abandoned this revision. Aug 9 2023, 6:16 AM

Not working on this at present, and it would need better heuristics for when the transform is profitable.

Herald added a project: Restricted Project. Aug 9 2023, 6:16 AM