This is an archive of the discontinued LLVM Phabricator instance.

[ConstantRange] Reduce the number of allocations in ConstantRange::makeGuaranteedNoWrapRegion
AbandonedPublic

Authored by craig.topper on May 1 2017, 11:47 AM.

Details

Summary

Currently, makeGuaranteedNoWrapRegion creates 3 temporary APInts each time it calls SubsetIntersect. Two to hold the inverses of CR0 and CR1 and one to hold the result before inverting for the return. If the APInt is larger than 64-bits, each of these represents an allocation. We're also doing extra inversions on Result if SubsetIntersect is called multiple time.

This patch adds an inverseInPlace method that will turn a ConstantRange to its inverse without creating a new object. This can help remove the temporary allocations.

Additionally, we now store Result in its inverse form through all of the possible calls to SubsetIntersect and only invert it at the end before the return. I've modified the lambda to write to read/write Result through a capture so that it no longer looks like a generic function. It also takes Other by value now so that we can capture the temporary ConstantRange that's always passed to it and call inverseInPlace on it.

Diff Detail

Event Timeline

craig.topper created this revision.May 1 2017, 11:47 AM
sanjoy edited edge metadata.May 14 2017, 3:05 PM

Hi Craig,

I think this change makes makeGuaranteedNoWrapRegion less readable. Did you take a look at implementing a ConstantRange::subsetIntersectWith directly? That'll be more work, but I think the end result will be cleaner.

Add a new subsetIntersectWith method that doesn't rely on inverting and calling unionWith.

I'll work on some unit tests, but wanted another set of eyes on the various cases.

davide requested changes to this revision.May 15 2017, 12:32 AM
davide added a subscriber: davide.

This patch switches a one-liner with a much larger function.
Overall, I don't think this is necessarily unreasonable, but the increased complexity needs to be justified.
Craig, were you able to measure any measurable performance impact (in compile time) from the reduced malloc traffic?
If so, can you show the numbers?

This revision now requires changes to proceed.May 15 2017, 12:32 AM

This was based on reviewing the output of callgrind on one file and seeing what places create a lot of wider than 64-bit APInts. I can try to collect a larger data set of real numbers.

Thinking about this more, I wonder if this is even the right interface for the question some of the callers are trying to determine.

Several of the callers call this and then check if the nowrap range fully contains another range. But the makeGuaranteedNoWrap register sometimes makes compromises where it has to decide pick two different possible ranges to return by deciding which choice is larger. You can see this in the two places in where we compute d1 and d2 in my last patch. But its possible the range the caller wants to check is contained inside the smaller choice, but not the larger choice. Should we instead have a method to ask if the addition of two ranges won't cause overflow directly without having to create a possibly compromised nowrap region?

craig.topper abandoned this revision.Aug 8 2017, 9:30 AM