This is an archive of the discontinued LLVM Phabricator instance.

[MergeICmps][WIP] Initial MergeICmps prototype
ClosedPublic

Authored by courbet on Jun 7 2017, 5:34 AM.

Details

Summary

MergeICmps is a new optimization pass that turns chains of integer comparisons into memcmp (thanks to recent improvements in the LLVM codegen, the memcmp is typically inlined as a chain of efficient hardware comparisons). This typically benefits C++ member or nonmember operator==().

The proof of concept shows speedups of about 2-3x with a reduced code size (typically by a factor 2-3x) .

More details can be found in this document.

Diff Detail

Event Timeline

courbet created this revision.Jun 7 2017, 5:34 AM
courbet edited the summary of this revision. (Show Details)Jun 7 2017, 8:59 AM
davide added a subscriber: davide.Jun 7 2017, 12:49 PM
davide added inline comments.
lib/Transforms/Scalar/MergeICmps.cpp
521–524

This is OK, but I wonder if you can end up with this case frequently (the SSA updater should always produce minimal SSA so single arguments phi shouldn't exits [although I can see passes introducing them]).

539–550

So if I understand correctly here you're trying to find whether all the PHI arguments but the last one are constant.
Are you relying on some ordering of the PHI arguments (i.e. they're in predecessor order?) Note that llvm currently doesn't provide this guarantee.

608–613

You can skip the entry block as the first instruction can't be possibly a PHINode.

courbet updated this revision to Diff 101871.Jun 8 2017, 1:33 AM
courbet marked 3 inline comments as done.
  • Add comments
  • Do not scan the entry block.

I'll take a look at the algorithm in a few days and see if i can help formalize it a bit more :)

lib/Transforms/Scalar/MergeICmps.cpp
294

This entire block looks like:

if (Comparison.doesOtherWork()

if (Comparisons.empty)
 continue
return

Which, IMHO reads better.

dberlin added inline comments.Jun 8 2017, 9:32 AM
lib/Transforms/Scalar/MergeICmps.cpp
294

Reformatting to make clear:

if (Comparison.doesOtherWork())
  if (Comparisons.empty)
   continue
  return
courbet updated this revision to Diff 102157.Jun 12 2017, 1:22 AM
courbet marked 2 inline comments as done.

Reorganize if blocks (NFC)

lib/Transforms/Scalar/MergeICmps.cpp
521–524

Yes, that's just for safety:)

539–550

No, I'm reconstructing the order (see the call to getOrderedBlocks() below). I've added a comment.

spatel added a subscriber: spatel.Jun 15 2017, 11:08 AM

Now that Sanjay has enabled the custom memcmp lowering (yay!), I think you can have a look at the code.

spatel edited edge metadata.Aug 18 2017, 8:08 AM
spatel added a subscriber: zvi.

Note: memcmp expansion for x86 was reduced with https://reviews.llvm.org/rL308986 because it caused a perf regression in PR33914 ( https://bugs.llvm.org/show_bug.cgi?id=33914 ).
There's a proposal to revert the expansion for x86 entirely in PR34032 ( https://bugs.llvm.org/show_bug.cgi?id=34032 ) because of a related perf regression and compile-time explosion in CGP (PR33900 - https://bugs.llvm.org/show_bug.cgi?id=33900 ). IMO, we need to solve the CGP addressing-mode bug anyway, but just so everyone is aware - memcmp expansion for x86 is still shaky at this point.

davide added inline comments.Aug 18 2017, 8:19 AM
lib/Transforms/Scalar/MergeICmps.cpp
104–106

single-line if -> remove braces.

133–136

Add messages for this assertions?

172–174

No braces needed.

176–178

No braces needed (here and everywhere else)

591

I'm at a loss here as you don't use postdominance anywhere? Is this a TODO item?

608

auto.

614

You might want to preserve CFG analyses also in the new pass manager.

courbet updated this revision to Diff 111933.Aug 21 2017, 12:55 AM
courbet marked 6 inline comments as done.
courbet added a reviewer: davide.

Fix PreservedAnalyses + Cosmetics & comments.

Thanks, PTAL.

lib/Transforms/Scalar/MergeICmps.cpp
133–136

Done (moved to a different function for documentation)

591

Oops, I used another pass as a template for this one, and forgot to remove the comment.

614

I'm actually adding/removing blocks, so I've removed CFG preservation from the old pass manager.

dberlin added inline comments.Aug 29 2017, 9:54 AM
lib/Transforms/Scalar/MergeICmps.cpp
142

Can you give an overview of why this matters that it does other work?
I'm struggling to understand why this affects whether you can form a memcmp or not out of the conditions.

The only case that pops into my head is if instructions in the block depend on the condition, or if the block isn't control dependent on the conditions "before" it (IE the other atoms you are trying to form a chain from).

IE given

if (a[0] > b[0])
  if (a[1] > b[1])
    do a ton of stuff.

I don't see why it would matter to transform these comparisons into a memcmp.
Because the inner if is control dependent on the outer one, you can extract the value from the memcmp result, even if you need it, since memcp returns the first differing byte.

But the above isn't what you are checking, so i feel like i'm misunderstanding what's going on here :)

courbet added inline comments.Aug 31 2017, 5:15 AM
lib/Transforms/Scalar/MergeICmps.cpp
142

Consider the following:

if (a[0] == b[0]) {  // bb1
  if (a[1] == b[1]) {  // bb2
    some_value = 3; //bb3
    if (a[2] == b[2]) { //bb3
      do a ton of stuff  //bb4
    }
  }
}

This is:

`

bb1 --eq--> bb2 --eq--> bb3* -eq--> bb4 --+
  \            \           \               \
   ne           ne          ne              \
    \            \           \               v
     +------------+-----------+----------> bb_phi

`

We can only merge the first two comparisons, because BB3 does "other work" (setting some_value to 3)

(As a side note I'm only handling equality comparisons here (a[0] == b[0] && a[1] == b[1]), not ordering comparisons)

courbet updated this revision to Diff 113386.Aug 31 2017, 5:16 AM

Add more comments regarding doesOtherWork().

courbet updated this revision to Diff 113407.Aug 31 2017, 8:06 AM

Add a comment about merging incomplete chains and resuming from there (as per dberlin's comment).

dberlin edited edge metadata.Aug 31 2017, 8:41 AM

So, LGTM at this point.
Have you performance tested it while being on by default?
Usually, passes that are off by default are undergoing work, with the expectation they will be on by default at some opt level at some point.

So if this is always a win, IMHO, we should do it. If it's not always a win, we should understand why.

(I expect one issue you will hit is that things understand comparisons better than memcmps)

lib/Transforms/Scalar/MergeICmps.cpp
142

You should add a variant of this comment to doesOtherWork

courbet added a comment.EditedSep 1 2017, 12:52 AM

So, LGTM at this point.
Have you performance tested it while being on by default?
Usually, passes that are off by default are undergoing work, with the expectation they will be on by default at some opt level at some point.

Thanks for the review.
Unfortunately this needs to remain off by default until memcmp expansion is enabled again (see Sanjay's comment above).
In the meantime I'll run some experiments on a larger codebase :)

courbet updated this revision to Diff 113516.Sep 1 2017, 1:07 AM

Add an example for doesMoreWork()

This revision was automatically updated to reflect the committed changes.
davide edited edge metadata.Sep 1 2017, 11:25 AM
davide added a subscriber: zhendongsu.

Sorry for being a little late to the party, but life took over recently :(
I don't mind the structure of the pass as is (and I saw @dberlin approved which increases my confidence).
I see this was reverted because it breaks the bot. Clement, I suppose you plan to recommit this soon, and after that happens, maybe @zhendongsu can enable the new cl::opt and find new trophies for the fuzzers we have available? Zhendong, what do you think?

I see this was reverted because it breaks the bot. Clement, I suppose you plan to recommit this soon, and after that happens, maybe @zhendongsu can enable the new cl::opt and find new trophies for the fuzzers we have available? Zhendong, what do you think?

This was resubmitted with the missing headers. I'm currently trying to benchmark this on real-life binaries.

rsmith added a subscriber: rsmith.May 18 2018, 5:32 PM

This change causes the compiler to crash under -fno-builtin mode, presumably because it's not checking that there is a memcmp builtin library function available for use.

This change causes the compiler to crash under -fno-builtin mode, presumably because it's not checking that there is a memcmp builtin library function available for use.

I created PR37548, thanks for reporting.