Page MenuHomePhabricator

[analyzer] Implemented RangeSet::Factory::unite function to handle intersections and adjacency
Needs ReviewPublic

Authored by ASDenysPetrov on Fri, Apr 2, 9:06 AM.

Details

Summary

Handle intersected and adjacent ranges uniting them into a single one.
Example:
intersection [0, 10] U [5, 20] = [0, 20]
adjacency [0, 10] U [11, 20] = [0, 20]

Diff Detail

Event Timeline

ASDenysPetrov created this revision.Fri, Apr 2, 9:06 AM
ASDenysPetrov requested review of this revision.Fri, Apr 2, 9:06 AM

Thanks for working on improvements of the solver and constraints! However, I have some tough questions about this patch.

What I really want to understand here is motivation. Why do we need to have add operation semantics like this in the first place? My guess is that "the user" will be in the following patch.
Additionally, I don't really like the idea of replacing something simple and fast (old add methods`) with something more complex unconditionally. Old users still don't need this additional logic. C++ has always been the language where we "pay for what we use".
So, with good motivation, I'd still prefer methods like add and addUnchecked or smith similar.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h
127

This most certainly can be done in O(N + M) the same way the intersection is done.

vsavchenko requested changes to this revision.Fri, Apr 2, 9:15 AM
This revision now requires changes to proceed.Fri, Apr 2, 9:15 AM
ASDenysPetrov added a comment.EditedFri, Apr 2, 9:54 AM

Thanks for working on improvements of the solver and constraints! However, I have some tough questions about this patch.

What I really want to understand here is motivation. Why do we need to have add operation semantics like this in the first place? My guess is that "the user" will be in the following patch.
Additionally, I don't really like the idea of replacing something simple and fast (old add methods`) with something more complex unconditionally. Old users still don't need this additional logic. C++ has always been the language where we "pay for what we use".
So, with good motivation, I'd still prefer methods like add and addUnchecked or smith similar.

My motivation is that I'm currently working on some bigger improvement (symbolic integral cast) and stucked here of the lack of handling intersections. Would you mind of accepting this revision in case if I restore complexity to O(N+M)?

Or. We always can have both methods. The quick add and caring add. Actually you've said that :)

clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h
127

Actually yes, it can. And it was, when I played with different solutions. But the code was poor readable. So I decided make it easier to understand to bring to review. Of couse I can move further to improve it and retain readability. I'll do.

@vsavchenko
OK, what do you think of *adjacency* feature? I mean it simplifies such ranges [1,2][3,4][5,6] to [1,6]. Is it worth for implementation?

Updated. Restored complexity to O(N).

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
112–113

Also optimized this particular case.

128–129

This allows to add a RangeSet of any type. E.g. RangeSet(uchar) + RangeSet(int) = valid, because of pin

I'm wondering whether we really need it here in practice?

clang/unittests/StaticAnalyzer/RangeSetTest.cpp
166

Fixed the misprint.

@vsavchenko
OK, what do you think of *adjacency* feature? I mean it simplifies such ranges [1,2][3,4][5,6] to [1,6]. Is it worth for implementation?

I want to clarify my position here. It's not that I am opposed to this change in principle, but I want to understand the motivation (a small example will be sufficient) and I don't want to sacrifice a single bit of performance efficiency of this part of code.
Even for the O(N + M) solution, I'd still be standing strong on keeping the old functions as is (except for maybe renaming them). Range sets are small and asymptotics don't work that well when reasoning about the expected performance benefits and gains.
For this reason, whenever we can, we should have the simplest operation possible in terms of the number of instructions, ie the constant factor is very strong here.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h
127

Bring it here and we'll see what can be done on that front. *makeover time!*

145

nit: O(N + log(N)) == O(N)

vsavchenko added inline comments.Sat, Apr 3, 3:28 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
128–129

We mix ranges of different types way more than one would expect.

vsavchenko added inline comments.Sat, Apr 3, 3:52 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
114–115

This is REAL bad. The main benefit of the new RangeSet over the old one is the fact that common operations that consist of many basic operations are done on mutable containers, i.e. when RHS has 10 elements this code will create and copy a new array 10 times discarding 9 of them.

That's why every implementation method here operates on mutable ContainerType and then makes is persistent.

Additionally, merging add can be done with one iteration over two containers, but instead we do O(N) operation here M times, so it is not O(N + M), but O(N * M).

132

Is there a reason not to use range-based loop in this case?

140

This should be done outside of the loop, we assume that all the ranges are of the same type.

162

This is a problem here. This essentially doubles the work you did before. What can be done in one O(N) loop is done with two.

However, I don't really see a point in fixing this algorithm because the more generic RangeSet + RangeSet should be optimal O(N + M) and this one can be implemented as a special case.

ASDenysPetrov added a comment.EditedSat, Apr 3, 7:25 AM

@vsavchenko Many thanks for your feedback!
I will make a new separate function for checking intersections considering all your suggestions along with the old quick add versions. It'll be more optimized.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
132

You are right. Working on restoring O(N) I been played with iterators, then found another solution, but forgot to return range-based loop back. Thnx.

140

+1 I'll move it outside the loop.

ASDenysPetrov updated this revision to Diff 335768.EditedWed, Apr 7, 3:03 AM
ASDenysPetrov retitled this revision from [analyzer] Handle intersections and adjacency in RangeSet::Factory::add function to [analyzer] Implemented RangeSet::Factory::unite function to handle intersections and adjacency.

Updated. Implemented four separate functions RangeSet::Factory::unite. Each function has the most optimized approach to handle intersections for the particular case.

@vsavchenko You are welcome to evaluate this changes.

I want to understand the motivation (a small example will be sufficient)

I'm afraid a snippet out of context wouldn't make much sense for you. Let me present you the next patch soon.

Well, that is a nice exercise for "two pointer" problems, but can we please talk about the actual use case for it?

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
136

I'd prefer merge

217

This algorithm is indeed O(N +M), good job!
But let me get picky again 😅

It looks like we do too many comparisons in this loop. As I said earlier, constant factor is the key here and while this iterator idea is good it is a tradeoff. There is a piece of knowledge that would've been obvious with regular iterators, now is a run-time unknown that we constantly need to check (isTo and isFrom).

What we are trying to achieve here is not harder than what we do in intersect and there it is way less comparisons. While iterating through sets we can keep a set of invariants, so we don't need to re-check something that we already know.

Here is a sketch for the algorithm:

{
  assert(First != FirstEnd && Second != SecondEnd);
  // We have && here because one of them reaches its end,
  // we should not check for it again and simply add the rest
  // of the other set.
  while (true) {
    // We need to find the beginning of the next range to add
    // First should be the iterator which To is a candidate to be
    // an and for the merged range.
    if (First.From() > Second.From()) {
      swap();
    }

    auto From = First.From();

    // That's it, we found our start, and we need to go through
    // other ranges and look for the end.
    // After this point, First.From() shouldn'y be accessed.

    while (true) {
      // We can compress all the checks into just one.
      // This essentially means that Second should not get merged with First.
      if (Second.From() > First.To() + 1) {
        break;
      }

      if (Second.From() > First.From()) {
        // First is maintained as a candidate for the end.
        swap();
      }

      // At this point we know that Second lives fully inside
      // of the new range and we can skip it.
      ++Second;

      // If we have nothing else in the second set...
      if (Second == SecondEnd) {
        // ...let's finish the current range first...
        Result.emplace_back(From, First.To());
        while (++First != FirstEnd) {
          // ...and copy the rest of the ranges
          Result.push_back(*First);
        }
        // The range is ready.
        return makePersistent(Result);
      }
    };

    // Second is outside of the range, and we can
    // safely add a new range.
    Result.emplace_back(From, First.To());
    ++First;

    // First set can be over at this point and we should...
    if (First == FirstEnd) {
      // ...copy the rest if the second set's ranges.
      while (Second != SecondEnd) {
        Result.push_back(*(Second++));
      }
      // Nothing left to do.
      return makePersistent(Result);
    }
  };

  // No way for us to get here!
  llvm_unreachable("...");
}

It's trickier in the way we end things than the intersection because we still need to add the rest of the other set.

279–293

I don't see any practical reasons to keep this version over the more generic RangeSet + RangeSet

321–335

Same here, it is just way more code to maintain.

vsavchenko added inline comments.Thu, Apr 8, 5:20 AM
clang/unittests/StaticAnalyzer/RangeSetTest.cpp
471

I guess I also want to have more cases where ranges from RHS are in between ranges from LHS, also it should be a case with LHS being a set of ranges.

497

Also I'd like to see cases when there are ranges to merge from two sets but then one set has a bunch of other ranges that should be added as is. We can automatically think of two possibilities here: when these additional ranges are before and after the common part.
And I guess I want at least one check with two sets with a good amount of ranges in both covering all possible situations and overlappings.

ASDenysPetrov added a comment.EditedFri, Apr 9, 10:43 AM

@vsavchenko
Thank you for the proposed solution. It looks much easier to understand and maintain. Great! I will take it into account.

Well, that is a nice exercise for "two pointer" problems, but can we please talk about the actual use case for it?

I'm currently working on integral cast between ranges. Consider the range of int which is casted to char. You've got some ranges of int which obviously should be corespondently represented as some other ranges of char.
Some examples:

int [257, 259]  -> char [1, 3]
int [510, 513]  -> char [-2, 1]
int [42, 1000]  -> char [-128, 127]
int [257, 259] U [2049, 2051]  -> char [1,3] // Here we need `unite` logic to get the casted range because both original ranges lay on the same area after trancation.