Page MenuHomePhabricator

Fix for bug 8281 - Extremely slow assembling and disassembling of ptrtoint
ClosedPublic

Authored by wecing on Dec 10 2014, 3:44 PM.

Details

Summary

The current getFoldedSizeOf() implementation uses naive recursion, which could be really slow when the input structure type is too complex.

This issue was first brought up in http://llvm.org/bugs/show_bug.cgi?id=8281; this change fixes it by adding memoization.

Diff Detail

Event Timeline

wecing updated this revision to Diff 17126.Dec 10 2014, 3:44 PM
wecing retitled this revision from to Fix for bug 8281 - Extremely slow assembling and disassembling of ptrtoint.
wecing updated this object.
wecing edited the test plan for this revision. (Show Details)
wecing added a subscriber: wecing.
wecing updated this revision to Diff 313905.Dec 28 2020, 2:48 PM
wecing edited the summary of this revision. (Show Details)
wecing edited the summary of this revision. (Show Details)Dec 28 2020, 2:57 PM

Could you add the test as well. Also, include the context in the patch. See https://llvm.org/docs/Phabricator.html#phabricator-request-review-web

llvm/lib/IR/LLVMContextImpl.h
1418 ↗(On Diff #313905)

Documentation missing.

wecing edited the summary of this revision. (Show Details)Jan 15 2021, 5:20 PM
wecing updated this revision to Diff 317122.Jan 15 2021, 5:20 PM

Update implementation to use local cache and add unit tests.

wecing updated this revision to Diff 317123.Jan 15 2021, 5:22 PM

Include more commits.

wecing marked an inline comment as done.Jan 15 2021, 5:23 PM
wecing updated this revision to Diff 317124.Jan 15 2021, 5:24 PM

include all commits.

wecing updated this revision to Diff 317146.Jan 15 2021, 8:03 PM

remove unused include

wecing updated this revision to Diff 317153.Jan 15 2021, 9:34 PM

remove unused cast

wecing updated this revision to Diff 317154.Jan 15 2021, 9:37 PM

update comment

Sorry about the commit spam.

Not sure what would be the best way to set a deadline for the tests, so I made the test run (almost) forever if not properly optimized.
The latest version also changed Cache to be a local map, instead of a shared instance within type context. This allows us to assume the cached result will always have the correct DestTy (it doesn't really make much difference, though).

PTAL, @jdoerfert

jdoerfert accepted this revision.Feb 5 2021, 3:43 PM

LGTM, one nit.

llvm/lib/IR/ConstantFold.cpp
363

This looks up Ty in cache twice, you can do it once, e.g., remember the find result.

This revision is now accepted and ready to land.Feb 5 2021, 3:43 PM
wecing updated this revision to Diff 321919.Feb 5 2021, 9:27 PM

Avoid redundant map indexing.

wecing updated this revision to Diff 321920.Feb 5 2021, 9:28 PM

diff against main branch

wecing marked an inline comment as done.Feb 5 2021, 10:28 PM
timshen added inline comments.
llvm/lib/IR/ConstantFold.cpp
368

This is correct today, but newly added return statements in the future might miss the Cache[Ty] = part.

Do you mind doing the following:

static Constant *getFoldedSizeOfImpl(Type *Ty, Type *DestTy, bool Folded,
                                 DenseMap<Type *, Constant *> &Cache) {
  // The actual implementation, no explicit caching.
  // It recursively calls into getFoldedSizeOf below.
}

static Constant *getFoldedSizeOf(Type *Ty, Type *DestTy, bool Folded,
                                 DenseMap<Type *, Constant *> &Cache) {
  Constant*& V = Cache[Ty];
  if (V == nullptr)
    V = getFoldedSizeOfImpl(...);
  return V;
}
timshen added inline comments.Feb 10 2021, 11:26 AM
llvm/lib/IR/ConstantFold.cpp
368
Constant*& V = Cache[Ty];
if (V == nullptr)

Actually we should use result = Cache.insert(...); if (result.second) ... instead, since nullptr might be a valid cached value.

wecing added inline comments.Feb 11 2021, 3:23 PM
llvm/lib/IR/ConstantFold.cpp
368

Thank you @timshen ! I think you made a good point, but your proposal introduces another problem: now within getFoldedSizeOfImpl(), people must be very careful to always call getFoldedSizeOf() instead of getFoldedSizeOfImpl() to get the caching behavior, which is also pretty error-prone.

A better solution might be to create a wrapper type for the Constant *, so getFoldedSizeOf() would be something like:

#define CACHE_VALUE(v) (new cached_val<Constant *>(Cache[Ty] = v))

static cached_val<Constant *> getFoldedSizeOf(..., DenseMap<...> *Cache) {
  ...
  if (ArrayType *ATy = dyn_cast<ArrayType>(Ty)) {
    Constant *N = ConstantInt::get(DestTy, ATy->getNumElements());
    cached_val<Constant *>E = getFoldedSizeOf(ATy->getElementType(), DestTy, true, Cache);
    return CACHE_VALUE(ConstantExpr::getNUWMul(E.value, N));
  }
  ...
}

static Constant *getFoldedSizeOf(...) {
  return getFoldedSizeOf(...).value;
}

But that is obviously an overkill for the problem, so I would rather keep it simple here. WDYT?

timshen added inline comments.Feb 12 2021, 11:55 AM
llvm/lib/IR/ConstantFold.cpp
368

Yes, I agree that using cached_value would be an overkill. I would say adding a comment about the recursive call target is sufficient.

wecing updated this revision to Diff 323522.Feb 12 2021, 10:23 PM
  • add wrapper method for caching
wecing marked 3 inline comments as done.Feb 12 2021, 10:25 PM
wecing added inline comments.
llvm/lib/IR/ConstantFold.cpp
368

I see. Done, PTAL.

timshen accepted this revision.Feb 19 2021, 12:18 PM

Nicely done!

This revision was landed with ongoing or failed builds.Feb 19 2021, 12:45 PM
This revision was automatically updated to reflect the committed changes.