This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Defend against worst-case exponential execution time
ClosedPublic

Authored by jmolloy on Jan 4 2016, 9:39 AM.

Details

Summary

It is possible to create an IR pattern that results in exponential execution
time in CollectBitParts, by making a tree where every OR refers to one other OR via *both*
operands:

for (i = 0; i < N; ++i)
    b |= b >> 1;  // Causes 2^N executions of CollectBitParts!

To defend against this (and because ORs are the only node that cause us to
fork control flow) we keep a counter alive between all invocations of
CollectBitParts, expected to be initialized to bitwidth(b).

Diff Detail

Repository
rL LLVM

Event Timeline

jmolloy updated this revision to Diff 43894.Jan 4 2016, 9:39 AM
jmolloy retitled this revision from to [InstCombine] Defend against worst-case exponential execution time.
jmolloy updated this object.
jmolloy added a reviewer: joerg.
jmolloy set the repository for this revision to rL LLVM.
jmolloy added a subscriber: llvm-commits.
majnemer added inline comments.
lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1775–1777

This seems a bit much if you have i128. What is NumOrsRemaining typically when this optimization kicks in?

Out of curiosity, is there a major benefit to canonicalizing this in the middle-end vs doing this transform in code-gen prepare?

jmolloy updated this revision to Diff 44108.Jan 6 2016, 5:17 AM

Hi David, Joerg,

This version rewrites the algorithm significantly. In the original algorithm every recursive call was a function of a Value* and multiple other operands. This made memoization impossible as it is not often that the collectBitParts is called with the same Value* and also the same BitMask/OverallLeftShift.

collectBitParts is now just a function of a single Value*, and it is up to the caller to apply any transformations on its result. This means the result can be memoized which means that Joerg's testcase requires ~64 recursions instead of 2^32. The new design makes the code smaller and more concise too.

David, I have implemented this in InstCombine still. I also have a version of this patch implemented in CodeGenPrepare. The CGP patch cannot fold bitreverse(bitreverse(X)) -> X, but perhaps that pattern is less common than the equivalent with bswap.

I am happy to switch to the CGP version if you wish, which would mean I'd revert all of my patches to InstCombineAndOrXor.cpp.

Cheers,

James

jmolloy updated this revision to Diff 44315.Jan 8 2016, 3:14 AM

Hi Joerg, David,

I think this version should tick all the boxes. There are several requirements that ended up with this design;

  1. Matching bitreversals is too heavyweight for InstCombine and doesn't really need to be done so early.
  2. Bitreversals and byteswaps are very related in their matching logic.
  3. We want to implement support for matching more advanced bswap/bitreverse patterns like partial bswaps/bitreverses.
  4. Bswaps are best matched early in InstCombine.

The result of these is that a new utility function is created in Transforms/Utils/Local.h that can be configured to search for bswaps, bitreverses or both. InstCombine uses it to find only bswaps, CGP uses it to find only bitreversals.

We can then extend the matching logic in one place only.

jmolloy updated this revision to Diff 44450.Jan 11 2016, 2:07 AM

Hi Joerg,

Thanks for that bug report. It saved me some embarassment when the asan bot would have picked it up post-commit.

The issue was holding a reference into a DenseMap which can be invalidated during a hashtable resize. This is solved (as in other places in LLVM) by explicitly using a std::map.

James

majnemer added inline comments.Jan 12 2016, 1:58 PM
lib/CodeGen/CodeGenPrepare.cpp
5241–5250 ↗(On Diff #44450)

Should we bother doing this on targets which can't lower biterverse to anything good?

jmolloy updated this revision to Diff 44753.Jan 13 2016, 7:28 AM

Hi David,

This makes sense. I've updated the diff.

James

Hi David,

Does this look OK now? I'm keen to get this committed as it needs to be merged to 3.8 to fix a compiler hang.

Cheers,

James

majnemer accepted this revision.Jan 14 2016, 3:54 PM
majnemer added a reviewer: majnemer.

LGTM

This revision is now accepted and ready to land.Jan 14 2016, 3:54 PM
mssimpso resigned from this revision.Feb 19 2016, 10:07 AM
mssimpso removed a reviewer: mssimpso.
jmolloy closed this revision.Apr 27 2016, 7:26 AM