This is an archive of the discontinued LLVM Phabricator instance.

Differential D99797

[analyzer] Implemented RangeSet::Factory::unite function to handle intersections and adjacency
ClosedPublic

Authored by ASDenysPetrov on Apr 2 2021, 9:06 AM.


Details

Reviewers

vsavchenko
steakhal
NoQ
xazax.hun
dcoughlin
Szelethus
martong

Commits

rG6a399bf4b3aa: [analyzer] Implemented RangeSet::Factory::unite function to handle…

Summary

Handle intersected and adjacent ranges uniting them into a single one.
Example:
intersection [0, 10] U [5, 20] = [0, 20]
adjacency [0, 10] U [11, 20] = [0, 20]
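For illustration, the two examples above can be reproduced with a minimal sketch. It is not the patch's algorithm: it simply sorts and sweeps, and `Interval`, `IntervalSet` and `uniteSimple` are hypothetical names standing in for `Range`, `RangeSet` and `RangeSet::Factory::unite`. It assumes plain 64-bit integers that stay away from the maximum value, so the `+ 1` adjacency check cannot overflow.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-ins for Range / RangeSet: closed intervals [first, second].
using Interval = std::pair<std::int64_t, std::int64_t>;
using IntervalSet = std::vector<Interval>;

// Unite two sets, merging intervals that intersect or are adjacent.
// Sort-and-sweep: O((N + M) log(N + M)), chosen for brevity, not speed.
IntervalSet uniteSimple(IntervalSet LHS, const IntervalSet &RHS) {
  LHS.insert(LHS.end(), RHS.begin(), RHS.end());
  std::sort(LHS.begin(), LHS.end());

  IntervalSet Result;
  for (const Interval &I : LHS) {
    // Merge when I starts inside the last interval or right after it.
    // Assumption: values are below INT64_MAX, so `+ 1` does not overflow.
    if (!Result.empty() && I.first <= Result.back().second + 1)
      Result.back().second = std::max(Result.back().second, I.second);
    else
      Result.push_back(I);
  }
  return Result;
}
```

With this sketch, `uniteSimple({{0, 10}}, {{5, 20}})` and `uniteSimple({{0, 10}}, {{11, 20}})` both yield the single interval `[0, 20]`, while `[0, 10]` and `[12, 20]` stay separate.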

Diff Detail

Event Timeline


Herald added subscribers: martong, Charusso, dkrupp and 6 others. · Apr 2 2021, 9:06 AM

ASDenysPetrov requested review of this revision.Apr 2 2021, 9:06 AM

Herald added a subscriber: cfe-commits. · Apr 2 2021, 9:06 AM

Thanks for working on improvements of the solver and constraints! However, I have some tough questions about this patch.

What I really want to understand here is motivation. Why do we need to have add operation semantics like this in the first place? My guess is that "the user" will be in the following patch.
Additionally, I don't really like the idea of unconditionally replacing something simple and fast (the old `add` methods) with something more complex. Old users still don't need this additional logic. C++ has always been the language where we "pay for what we use".
So, with good motivation, I'd still prefer methods like `add` and `addUnchecked` or something similar.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h
127	This most certainly can be done in `O(N + M)` the same way the intersection is done.

vsavchenko requested changes to this revision.Apr 2 2021, 9:15 AM

This revision now requires changes to proceed.Apr 2 2021, 9:15 AM

Harbormaster completed remote builds in B96909: Diff 334964.Apr 2 2021, 9:52 AM

In D99797#2666358, @vsavchenko wrote:

Thanks for working on improvements of the solver and constraints! However, I have some tough questions about this patch.

What I really want to understand here is motivation. Why do we need to have add operation semantics like this in the first place? My guess is that "the user" will be in the following patch.
Additionally, I don't really like the idea of unconditionally replacing something simple and fast (the old `add` methods) with something more complex. Old users still don't need this additional logic. C++ has always been the language where we "pay for what we use".
So, with good motivation, I'd still prefer methods like `add` and `addUnchecked` or something similar.

My motivation is that I'm currently working on a bigger improvement (symbolic integral cast) and got stuck here because intersections are not handled. Would you mind accepting this revision if I restore the complexity to O(N+M)?

Or we can always have both methods: the quick add and the careful add. Actually, you've said that :)

clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h
127	Actually yes, it can. And it was, when I played with different solutions. But the code was poorly readable, so I decided to make it easier to understand before bringing it to review. Of course I can go further and improve it while retaining readability. I'll do that.

@vsavchenko
OK, what do you think of the *adjacency* feature? I mean it simplifies ranges such as [1,2][3,4][5,6] to [1,6]. Is it worth implementing?

Updated. Restored complexity to O(N).

@vsavchenko FYI.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
112–113	Also optimized this particular case.
129–130	This allows adding a RangeSet of any type, e.g. RangeSet(uchar) + RangeSet(int) is valid because of `pin`. I'm wondering whether we really need it here in practice.
clang/unittests/StaticAnalyzer/RangeSetTest.cpp
167	Fixed the misprint.

Harbormaster completed remote builds in B96942: Diff 335011.Apr 2 2021, 1:44 PM

In D99797#2666565, @ASDenysPetrov wrote:

@vsavchenko
OK, what do you think of the *adjacency* feature? I mean it simplifies ranges such as [1,2][3,4][5,6] to [1,6]. Is it worth implementing?

I want to clarify my position here. It's not that I am opposed to this change in principle, but I want to understand the motivation (a small example will be sufficient) and I don't want to sacrifice a single bit of performance efficiency of this part of code.
Even for the O(N + M) solution, I'd still be standing strong on keeping the old functions as is (except for maybe renaming them). Range sets are small and asymptotics don't work that well when reasoning about the expected performance benefits and gains.
For this reason, whenever we can, we should have the simplest operation possible in terms of the number of instructions, i.e. the constant factor matters a lot here.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h
127	Bring it here and we'll see what can be done on that front. makeover time!
145	nit: `O(N + log(N)) == O(N)`

vsavchenko added inline comments.Apr 3 2021, 3:28 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
129–130	We mix ranges of different types way more than one would expect.

vsavchenko added inline comments.Apr 3 2021, 3:52 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
114–115	This is REAL bad. The main benefit of the new `RangeSet` over the old one is the fact that common operations that consist of many basic operations are done on mutable containers, i.e. when `RHS` has 10 elements this code will create and copy a new array 10 times, discarding 9 of them. That's why every implementation method here operates on a mutable `ContainerType` and then makes it persistent. Additionally, a merging add can be done with one iteration over two containers, but instead we do an `O(N)` operation here `M` times, so it is not `O(N + M)`, but `O(N * M)`.
133	Is there a reason not to use range-based loop in this case?
141	This should be done outside of the loop, we assume that all the ranges are of the same type.
163	This is a problem here. This essentially doubles the work you did before. What can be done in one `O(N)` loop is done with two. However, I don't really see a point in fixing this algorithm because the more generic `RangeSet` + `RangeSet` should be optimal `O(N + M)` and this one can be implemented as a special case.

@vsavchenko Many thanks for your feedback!
I will make a new separate function for checking intersections considering all your suggestions along with the old quick add versions. It'll be more optimized.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
133	You are right. While working on restoring O(N) I played with iterators, then found another solution, but forgot to bring the range-based loop back. Thanks.
141	+1 I'll move it outside the loop.

Updated. Implemented four separate functions RangeSet::Factory::unite. Each function has the most optimized approach to handle intersections for the particular case.

@vsavchenko You are welcome to evaluate these changes.

I want to understand the motivation (a small example will be sufficient)

I'm afraid a snippet out of context wouldn't make much sense to you. Let me present the next patch soon.

Harbormaster completed remote builds in B97484: Diff 335768.Apr 7 2021, 4:02 AM

Well, that is a nice exercise for "two pointer" problems, but can we please talk about the actual use case for it?

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
137	I'd prefer `merge`
218	This algorithm is indeed `O(N + M)`, good job! But let me get picky again 😅 It looks like we do too many comparisons in this loop. As I said earlier, the constant factor is the key here, and while this iterator idea is good, it is a tradeoff. A piece of knowledge that would've been obvious with regular iterators is now a run-time unknown that we constantly need to check (`isTo` and `isFrom`). What we are trying to achieve here is not harder than what we do in `intersect`, and there we have way fewer comparisons. While iterating through the sets we can keep a set of invariants, so we don't need to re-check something that we already know. Here is a sketch for the algorithm:

    {
      assert(First != FirstEnd && Second != SecondEnd);
      // We have && here because one of them reaches its end,
      // we should not check for it again and simply add the rest
      // of the other set.
      while (true) {
        // We need to find the beginning of the next range to add
        // First should be the iterator which To is a candidate to be
        // an end for the merged range.
        if (First.From() > Second.From()) {
          swap();
        }
        auto From = First.From();
        // That's it, we found our start, and we need to go through
        // other ranges and look for the end.
        // After this point, First.From() shouldn't be accessed.
        while (true) {
          // We can compress all the checks into just one.
          // This essentially means that Second should not get merged with First.
          if (Second.From() > First.To() + 1) {
            break;
          }
          if (Second.From() > First.From()) {
            // First is maintained as a candidate for the end.
            swap();
          }
          // At this point we know that Second lives fully inside
          // of the new range and we can skip it.
          ++Second;
          // If we have nothing else in the second set...
          if (Second == SecondEnd) {
            // ...let's finish the current range first...
            Result.emplace_back(From, First.To());
            while (++First != FirstEnd) {
              // ...and copy the rest of the ranges
              Result.push_back(First);
            }
            // The range is ready.
            return makePersistent(Result);
          }
        };
        // Second is outside of the range, and we can
        // safely add a new range.
        Result.emplace_back(From, First.To());
        ++First;
        // First set can be over at this point and we should...
        if (First == FirstEnd) {
          // ...copy the rest of the second set's ranges.
          while (Second != SecondEnd) {
            Result.push_back((Second++));
          }
          // Nothing left to do.
          return makePersistent(Result);
        }
      };
      // No way for us to get here!
      llvm_unreachable("...");
    }

It's trickier in the way we end things than the intersection because we still need to add the rest of the other set.
280–294	I don't see any practical reasons to keep this version over the more generic `RangeSet` + `RangeSet`
322–336	Same here, it is just way more code to maintain.
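As a reading aid, the two-pointer idea from the sketch in the comment on line 218 can be written as a small standalone function. This is only a sketch under stated assumptions, not the code that landed: `Interval`, `IntervalSet` and `uniteSorted` are hypothetical names, plain 64-bit integers replace `llvm::APSInt`, each input is assumed to be sorted with no intersecting or adjacent ranges inside it, and values are assumed to stay below the maximum so that `To + 1` cannot overflow. One deliberate difference from the sketch: the swap inside the inner loop compares the ends (`To`), since `First` is supposed to hold the candidate end of the merged range.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using Interval = std::pair<std::int64_t, std::int64_t>; // closed [From, To]
using IntervalSet = std::vector<Interval>;              // sorted, normalized

IntervalSet uniteSorted(const IntervalSet &LHS, const IntervalSet &RHS) {
  if (LHS.empty()) return RHS;
  if (RHS.empty()) return LHS;

  IntervalSet Result;
  Result.reserve(LHS.size() + RHS.size());

  auto First = LHS.begin(), FirstEnd = LHS.end();
  auto Second = RHS.begin(), SecondEnd = RHS.end();
  auto swapIterators = [&] {
    std::swap(First, Second);
    std::swap(FirstEnd, SecondEnd);
  };

  while (true) {
    // Invariant: both iterators are valid. First must start the next range.
    if (First->first > Second->first)
      swapIterators();
    const std::int64_t From = First->first;

    // Extend the current range; First always holds the candidate end.
    while (true) {
      // Second starts beyond First's end + 1: neither intersecting nor
      // adjacent. Assumption: First->second < INT64_MAX (no overflow).
      if (Second->first > First->second + 1)
        break;
      // Keep the interval that ends later as First.
      if (Second->second > First->second)
        swapIterators();
      // Second is now fully covered by [From, First->second]; skip it.
      ++Second;
      if (Second == SecondEnd) {
        // Finish the current range and copy the rest of the other set.
        Result.emplace_back(From, First->second);
        Result.insert(Result.end(), ++First, FirstEnd);
        return Result;
      }
    }

    // Second lies outside the current range, so the range is complete.
    Result.emplace_back(From, First->second);
    ++First;
    if (First == FirstEnd) {
      Result.insert(Result.end(), Second, SecondEnd);
      return Result;
    }
  }
}
```

For example, uniting `{[0,3], [5,8]}` with `{[4,4]}` produces `{[0,8]}`: the point 4 is adjacent to both neighbours, so the iterators swap twice and a single range is emitted.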

vsavchenko added inline comments.Apr 8 2021, 5:20 AM

clang/unittests/StaticAnalyzer/RangeSetTest.cpp
469	I guess I also want to have more cases where ranges from RHS are in between ranges from LHS; there should also be a case with LHS being a set of ranges.
495	Also I'd like to see cases when there are ranges to merge from two sets but then one set has a bunch of other ranges that should be added as is. We can automatically think of two possibilities here: when these additional ranges are before and after the common part. And I guess I want at least one check with two sets with a good amount of ranges in both covering all possible situations and overlappings.

@vsavchenko
Thank you for the proposed solution. It looks much easier to understand and maintain. Great! I will take it into account.

Well, that is a nice exercise for "two pointer" problems, but can we please talk about the actual use case for it?

I'm currently working on integral casts between ranges. Consider a range of int which is cast to char. You've got some ranges of int which obviously should be correspondingly represented as some other ranges of char.
Some examples:

int [257, 259]  -> char [1, 3]
int [510, 513]  -> char [-2, 1]
int [42, 1000]  -> char [-128, 127]
int [257, 259] U [2049, 2051]  -> char [1,3] // Here we need `unite` logic to get the cast range because both original ranges lie in the same area after truncation.
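The arithmetic behind these examples can be checked with a small sketch. It assumes `char` is a signed 8-bit type, as the examples do, and uses explicit modular arithmetic rather than a real cast; `CharRange`, `truncToChar` and `truncRange` are hypothetical helpers, not part of the patch or of D103094.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using CharRange = std::pair<int, int>; // closed range within [-128, 127]

// Wrap a value into the signed 8-bit domain (two's-complement truncation).
int truncToChar(std::int64_t V) {
  std::int64_t M = ((V % 256) + 256) % 256; // 0..255
  return static_cast<int>(M >= 128 ? M - 256 : M);
}

// Truncate the closed range [From, To] to signed 8-bit ranges.
std::vector<CharRange> truncRange(std::int64_t From, std::int64_t To) {
  // 256 or more values cover the whole target type.
  if (To - From >= 255)
    return {{-128, 127}};
  int F = truncToChar(From), T = truncToChar(To);
  if (F <= T)
    return {{F, T}};
  // The range wraps around the boundary and splits into two pieces.
  return {{-128, T}, {F, 127}};
}
```

Here `truncRange(257, 259)` and `truncRange(2049, 2051)` both give `[1, 3]`, which is why uniting the truncated pieces has to collapse them into one range.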

Updated the patch due to comments. Added more tests. Simplified and improved solution.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
137	There are common names for set operations, and `union` is one of them. IMO this is the most appropriate one.

Harbormaster completed remote builds in B103228: Diff 343714.May 7 2021, 11:02 AM

Minor comment fix.

Harbormaster completed remote builds in B103252: Diff 343750.May 7 2021, 2:19 PM

Minor performance improvement. Add more comments.

Harbormaster completed remote builds in B103771: Diff 344444.May 11 2021, 10:37 AM

vsavchenko added inline comments.May 12 2021, 4:21 AM

clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h
147	`LHS`
247	nit: "...on the fact that..."
250	`ContainerType` is basically a mutable version of `RangeSet`, so there is only one reason to return it - you believe that the users might want to modify it after they called this `unite`. But as long as this `unite` is just a generalized version of the user-facing `unite`s, it can totally return `RangeSet`.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
112	Let's reserve some place here. Because `LHS` and `RHS` don't have intersections, the result always has `size(LHS) + size(RHS)` elements
222	Oof, I don't know about this algorithm. I mean it does its job. But IMO it lacks a good description of what are the invariants and what are the different situations we are looking for. Aaaand you kind of re-check certain conditions multiple times. One example here is the check for `Min` and `Max`. Those situations are super rare, but we check for them on every single iteration. `std::min` and `std::max` are additional comparisons. As I mentioned before, constant factor is the key here and less comparisons we do is way more important than doing binary search at some point. Just make a benchmark if you don't believe me (with google-benchmark, for example). The version with less comparisons will dominate one with more on `RangeSet` under 20 (and they'll be even smaller in practice).
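The suggestion on line 112 (reserve space, because the result of a plain `add` always has `size(LHS) + size(RHS)` elements) might look roughly like the following. This is a sketch with hypothetical names (`Interval`, `IntervalSet`, `addDisjoint`) over plain integers; it assumes both inputs are sorted and share no intersecting ranges, which is the precondition of the unchecked `add`.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <utility>
#include <vector>

using Interval = std::pair<long, long>; // closed [From, To]
using IntervalSet = std::vector<Interval>;

// Unchecked add: interleave two sorted, mutually disjoint sets.
IntervalSet addDisjoint(const IntervalSet &LHS, const IntervalSet &RHS) {
  IntervalSet Result;
  // Nothing is merged, so the final size is known up front and a single
  // allocation is enough.
  Result.reserve(LHS.size() + RHS.size());
  std::merge(LHS.begin(), LHS.end(), RHS.begin(), RHS.end(),
             std::back_inserter(Result));
  return Result;
}
```

The merge is a single `O(N + M)` pass with one comparison per emitted element, which is the kind of small constant factor the review keeps asking for.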

@vsavchenko Thanks for the suggestions! I'll take them into account and update the patch.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h
250	I'm going to use raw ContainerType in further patches. So this is exactly what I want.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
222	I'll investigate the whole algorithm once more and reduce comparisons.

vsavchenko added inline comments.May 12 2021, 5:27 AM

clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h
250	Oh, I see. OK then :)

Reworked the solution. Implemented two versions of the same algorithm: the most optimized (but more verbose) one and a generalized (but less optimized) one.
Added a bit more tests. @vsavchenko , please, look.

Herald added a subscriber: manas. · May 21 2021, 5:04 AM

Harbormaster completed remote builds in B105604: Diff 346741.May 21 2021, 6:22 AM

Minor improvements in unit tests.

More minor improvements in unit tests.

Harbormaster completed remote builds in B106082: Diff 347662.May 25 2021, 6:12 AM

ASDenysPetrov added a child revision: D103094: [analyzer] Implemented RangeSet::Factory::castTo function to perform promotions, truncations and conversions.May 25 2021, 9:03 AM

To be honest, I don't think that it solves the problem I mentioned before. The fact that conditions and branching are part of operator++ now, doesn't cancel them. I noticed that you made the first loop, so we don't need to check for Min in the main loop, and this is the right direction. But this approach has the same problem as one of the previous methods - you check isFrom, this is the price of abstraction. This is something that we won't be doing when working outside of such abstractions. Also, when one of the range sets is shorter, you will keep checking if the other iterator is end on every iteration when you run out of the ranges in shorter set.

I'm sorry for being so hard on you about this patch. 😔

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
257	nit: it has different ticks than you use in all other places

@vsavchenko
Thanks for your suggestions! I really appreciate it! I'll do my best on this algorithm.
BTW, I've just presented the motivation you were looking for in D103094 and D103096.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
257	I'm sorry. I didn't get it. What do you mean?

vsavchenko added inline comments.May 26 2021, 6:58 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
257	This is a super duper tiny thing: you usually write `)` or `(`, but here wrote '('.

ASDenysPetrov added inline comments.May 26 2021, 8:20 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
257	Oh! :-) Sharp eye!

@vsavchenko
Reworked the algorithm.

I hope this is the final version. Honestly, I also have a more optimized version, but it has twice as much similar (but different) code and uses gotos. I decided not to present it. Let it be a less micro-optimized but much more readable version.

Harbormaster completed remote builds in B106526: Diff 348284.May 27 2021, 8:42 AM

Fixed the issue. Added more unit tests.

Harbormaster completed remote builds in B106533: Diff 348294.May 27 2021, 9:05 AM

I think this iteration is much better, but it requires way more description than it has now. You didn't actually describe anywhere how this algorithm works.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
210	nit: "is adjacent"

vsavchenko added inline comments.Jun 17 2021, 3:39 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
200–201	At this point, we know for a fact that the next range we are going to add to our result set starts with `I1->From()`. No need to check `F` for null or anything
212	Why `break`? At this point we know that what we peeked into is actually a part of another range (in terms of the end result). And this range is over, you just need to add it to the final result!
214–218	You actually don't need it, the moment you swap two iterators and find out that `I1->From() <= I2->From()`, that's it `I1->From()` is the beginning of the new range. No need to carry extra flag here.
227	nit Additionally, you didn't tell your readers WHY you are even swapping iterators in the first place and what relationship they have. You need to communicate in terms of invariants.
244	This goto is always finishing with a return, so we can refactor `goto end1` into something like `return copyIntoResult(I1, E1);` and `goto end2` into `return copyIntoResult(I2, E2);`. As I mentioned before, optional `F` should be removed. On every path we should know deterministically what is the beginning and what is the end of the range that we are trying to add.

Updated. Removed F as flag. Replaced goto with closure. Detailed comments and fixed typos.

@vsavchenko I made changes according to your suggestions.

Harbormaster completed remote builds in B109809: Diff 352851.Jun 18 2021, 7:41 AM

Rebased

Harbormaster completed remote builds in B114024: Diff 358656.Jul 14 2021, 11:38 AM

How about this patch and the entire stack?

Rebased. Review, please.

Harbormaster completed remote builds in B125151: Diff 374265.Sep 22 2021, 10:39 AM

Gentle ping.

@ASDenysPetrov Nice work! I really appreciate the hard work you guys (with @vsavchenko) had done here. I really like that you have created visible test cases (though the last ones are a bit cryptic for me). It is going to take some more time to finish my review.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
177	`This is called`

I still need to chew through the code but on a high level, I think it looks correct.
PS: the test coverage is outstanding!

Attachment: unite-patch-line-coverage.zip (51 KB)

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
147	Why do you take `APSInt`s by value? Generally, we take them by reference.
clang/unittests/StaticAnalyzer/RangeSetTest.cpp
76–79	Shouldn't you use `sizeof(BaseType) * CHAR_BIT` instead?

In D99797#3059203, @steakhal wrote:

PS: the test coverage is outstanding!

Thank you for this analysis.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
147	I want to send a message to the caller that they can pass an arbitrary APSInt without worrying about making it permanent (i.e. stored by the Factory). But we can revise this contract and shift this responsibility to the caller.
clang/unittests/StaticAnalyzer/RangeSetTest.cpp
76–79	Agree. It's better to avoid magic numbers. I'll fix.

martong added inline comments.Oct 20 2021, 2:41 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
147	Why do you take `APSInt`s by value? Generally, we take them by reference. Actually, it is specific to `BasicValueFactory` to cache the `APSInt`s, however, it might not be the best practice. I doubt that somebody has ever measured the performance of passing APSInts by value, so my guess is the caching of `APSInt`s might be an early optimization that might be more harmful than advantageous. On top of all this, we do the caching inconsistently, just consider the member functions of `APSIntType`, they all return by value. Perhaps (totally independently from this patch of course), it might be worth to have a measurement/comparison with removed cache and pass by value.

steakhal added inline comments.Oct 20 2021, 8:53 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
147	Okay, then we shall leave this as-is now.
clang/unittests/StaticAnalyzer/RangeSetTest.cpp
76–79	It's not only that, but just imagine testing clang on special hardware that has, let's say, 9-bit bytes, for parity or something similar. The test would suddenly break. Although I suspect there would be many more things to break TBH xD

I think the visual comments that we have in intersect make the code of intersect a lot easier to follow. Could you please add similar visual comments here? Also, I find First and Second in intersect much better names than I1 and I2; could you please update?

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
167	I think, the naming conventions we have in `intersect` are easier to follow. Could we call `I1` as `First` and `I2` as `Second`?
179
186	It is not clear what is the complicated condition that you'd like to avoid. Could you please elaborate?
187–199	I'd like to see similar visual comments that we have in `intersect`.
202–203	Let's reuse as much as possible from `intersect`, both code and comments.
205–206	We could reuse `SwapIterators` from `intersect`. Of course for that we need to make a real function out of the lambda.
209
210	Let's be consistent with `intersect`. Also, you could introduce the variable here, and it is not needed to declare that at L176.
212
240

@martong
Thanks for your inlines. I'll update the patch.

clang/unittests/StaticAnalyzer/RangeSetTest.cpp
76–79	I am always skeptical about using `CHAR_BIT`, because it represents the number of bits in a `char`. What if it were 16, for instance (i.e. 2 bytes)? My intention is to get the number of bits for a particular type, and I want something that represents the number of bits in a byte as a fundamental unit, not something that depends on the `char` size on a particular platform. I would rather introduce something like `constexpr size_t BITS_IN_BYTE = 8;`.

@martong @steakhal
Updated according to your comments. Thank you!

steakhal added inline comments.Oct 27 2021, 9:14 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
198–202	Might be `First->To() == Second->To()`. In this case the comment is not completely accurate.
clang/unittests/StaticAnalyzer/RangeSetTest.cpp
76–79	basic.memobj 6.7.1/22: The number of bits in a byte is reported by the macro `CHAR_BIT` in the header `<climits>`.
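A minimal sketch of the `sizeof(BaseType) * CHAR_BIT` suggestion follows. The helper name `bitWidth` is hypothetical; the test file may spell this differently. Note that the expression counts all bits of the object representation, including any padding bits a type might have.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Number of bits in the object representation of BaseType.
template <typename BaseType>
constexpr std::size_t bitWidth() {
  return sizeof(BaseType) * CHAR_BIT;
}

// std::int32_t, where it exists, is exactly 32 bits wide with no padding.
static_assert(bitWidth<std::int32_t>() == 32, "unexpected width");
// sizeof(unsigned char) is 1 by definition, so this is CHAR_BIT itself.
static_assert(bitWidth<unsigned char>() == CHAR_BIT, "unexpected width");
```

Because a byte is defined as the storage of one `char`, this stays correct even on a platform where `CHAR_BIT` is not 8, whereas a hard-coded constant of 8 would not.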

Harbormaster completed remote builds in B130962: Diff 382679.Oct 27 2021, 9:32 AM

ASDenysPetrov added inline comments.Oct 28 2021, 9:10 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
198–202	I'll update.
clang/unittests/StaticAnalyzer/RangeSetTest.cpp
76–79	These legacy names... :) I'll update.

Fixed nits.

ASDenysPetrov added inline comments.Oct 28 2021, 10:00 AM

clang/unittests/StaticAnalyzer/RangeSetTest.cpp
76–79	BTW, this definition first appeared only in C++17, and only in a footnote. Apparently the committee is not yet confident enough about CHAR_BIT, since it still resides there :)

Harbormaster completed remote builds in B131226: Diff 383065.Oct 28 2021, 10:18 AM

Thanks Denys for the update! Very good! However, I think maybe we could make the code a bit simpler.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
227–231	I am not sure about this, but perhaps we could put this `swapIterators` call right at the beginning of the nested `while` loop (L243). That would eliminate the need to call it again at the end of the second while loop.
241	So, this loop is about to merge conjunct Ranges. The first time we find disjoint Ranges, we break out. (Or we return once we reach the end of any of the RangeSets.) This makes me wonder whether it would be possible to split this `while` loop into a lambda named `mergeConjunctRanges`?
288–290	Should this be like in the code suggestion? You incremented `First` at L276.
291	?

Gentle ping.

ASDenysPetrov added inline comments.Nov 18 2021, 8:40 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
227–231	I'm afraid that is not the case. Every loop needs its own `swapIterators`. As you can see, `swapIterators` in the nested loop is not always invoked because of the `break;` in the middle; otherwise, it would be invoked each time. I checked your suggestion anyway, and it doesn't work.
241	I'm not in favor of introducing another level of abstraction. It would divide the flow/comments and could reduce readability and performance which is crucial here.

Fixed comments.

Thanks for the update, I am okay with the .cpp file, now I continue the review with the tests.

Harbormaster completed remote builds in B134916: Diff 388219.Nov 18 2021, 9:13 AM

martong mentioned this in D113753: [Analyzer][Core] Better simplification in SimpleSValBuilder::evalBinOpNN.Dec 2 2021, 8:42 AM

Nice, assiduous work! The tests are awesome!
LGTM, with minor revisions. Please check out my suggestions about the tests' formatting and there are those disturbing (LHS, RHS) swaps in the comments.

I am going to continue with the next patch in the stack. I recognize that the handling of casts is very important and problems related to not handling them occurs even more frequently as other parts of the engine evolve (e.g. https://reviews.llvm.org/D113753#3167134)

clang/unittests/StaticAnalyzer/RangeSetTest.cpp
436
444	I think, either we should use `{{X, X}}` or `X` everywhere, but not mixed.
520–521	LHS and RHS is swapped here?
523–524	I think we could better format these more complex cases.
528–529	LHS and RHS is swapped?
531–532
536–537	LHS and RHS is swapped?
553–554	LHS and RHS is swapped here as well.
577
585–587
585–587	What do you think about this format? The result can be easily verified this way I think, but a bit ugly ...

@martong

Nice, assiduous work!

Many thanks for your time! Your work is not less assiduous!

(LHS, RHS) swaps

It doesn't really matter, as the operation is commutative, but, yes, it does matter for depicting the particular test case. I'll double-check the LHS/RHS order.

I am going to continue with the next patch in the stack. I recognize that the handling of casts is very important and problems related to not handling them occurs even more frequently as other parts of the engine evolve (e.g. https://reviews.llvm.org/D113753#3167134)

Aha. I saw your patch. I'll join the review.

clang/unittests/StaticAnalyzer/RangeSetTest.cpp
436	There are three overloads of `checkUnite` which correspond to three overloads of `RangeSet::unite`. I made it consistent to other check-functions (`add` e.g.).
444	This tests different overloaded versions of `RangeSet::unite`.
523–524	clang-format acts on its own. But I agree, it looks way better. I'll consider wrapping it in `// clang-format off` / `// clang-format on` directives.
585–587	I think ASCII art does this job. Let code look as code :)

Fixed code formatting in the unit test file according to remarks. Ready to load.

This revision was not accepted when it landed; it landed in state Needs Review.Dec 10 2021, 8:48 AM

This revision was landed with ongoing or failed builds.

Closed by commit rG6a399bf4b3aa: [analyzer] Implemented RangeSet::Factory::unite function to handle… (authored by ASDenysPetrov).

This revision was automatically updated to reflect the committed changes.

ASDenysPetrov added a commit: rG6a399bf4b3aa: [analyzer] Implemented RangeSet::Factory::unite function to handle….

Harbormaster completed remote builds in B138682: Diff 393509.Dec 10 2021, 9:21 AM

ASDenysPetrov removed a child revision: D103094: [analyzer] Implemented RangeSet::Factory::castTo function to perform promotions, truncations and conversions.Apr 15 2022, 9:19 AM

Revision Contents

Path	Size
clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h	27 lines
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp	145 lines
clang/unittests/StaticAnalyzer/RangeSetTest.cpp	327 lines

Diff 348294

clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h

(118 lines not shown, context: public:)

    class Factory {
    public:
      Factory(BasicValueFactory &BV) : ValueFactory(BV) {}

      /// Create a new set with all ranges from both LHS and RHS.
      /// Possible intersections are not checked here.
      ///
      /// Complexity: O(N + M)
      ///             where N = size(LHS), M = size(RHS)
      RangeSet add(RangeSet LHS, RangeSet RHS);
      /// Create a new set with all ranges from the original set plus the new one.
      /// Possible intersections are not checked here.
      ///
      /// Complexity: O(N)
      ///             where N = size(Original)
      RangeSet add(RangeSet Original, Range Element);
      /// Create a new set with all ranges from the original set plus the point.
      /// Possible intersections are not checked here.
      ///
      /// Complexity: O(N)
      ///             where N = size(Original)
      RangeSet add(RangeSet Original, const llvm::APSInt &Point);
+     /// Create a new set which is a union of two given ranges.
+     /// Possible intersections are not checked here.
+     ///
+     /// Complexity: O(N + M)
+     ///             where N = size(LHS), M = size(RHS)
+     RangeSet unite(RangeSet LHS, RangeSet RHS);
+     /// Create a new set by uniting given range set with the given range.
+     /// All intersections and adjacent ranges are handled here.
+     ///
+     /// Complexity: O(N)
+     ///             where N = size(Original)
+     RangeSet unite(RangeSet Original, Range Element);
+     /// Create a new set by uniting given range set with the given point.
+     /// All intersections and adjacent ranges are handled here.
+     ///
+     /// Complexity: O(N)
+     ///             where N = size(Original)
+     RangeSet unite(RangeSet Original, llvm::APSInt Point);
+     /// Create a new set by uniting given range set with the given range
+     /// between points. All intersections and adjacent ranges are handled here.
+     ///
+     /// Complexity: O(N)
+     ///             where N = size(Original)
+     RangeSet unite(RangeSet Original, llvm::APSInt From, llvm::APSInt To);

      RangeSet getEmptySet() { return &EmptySet; }

      /// Create a new set with just one range.
      /// @{
      RangeSet getRangeSet(Range Origin);
      RangeSet getRangeSet(const llvm::APSInt &From, const llvm::APSInt &To) {
        return getRangeSet(Range(From, To));

(65 lines not shown, context: public:)

    private:
      /// Return a persistent version of the given container.
      RangeSet makePersistent(ContainerType &&From);
      /// Construct a new persistent version of the given container.
      ContainerType *construct(ContainerType &&From);

      RangeSet intersect(const ContainerType &LHS, const ContainerType &RHS);
		/// NOTE: This function relies on the fact that all values in the
		vsavchenkoUnsubmitted Not Done nit: "...on the fact that..."
		/// containers are persistent (created via BasicValueFactory::getValue).
		ContainerType unite(const ContainerType &LHS, const ContainerType &RHS);

		vsavchenkoUnsubmitted Not Done `ContainerType` is basically a mutable version of `RangeSet`, so there is only one reason to return it: you believe that the users might want to modify it after they call this `unite`. But as long as this `unite` is just a generalized version of the user-facing `unite`s, it can simply return `RangeSet`.
		ASDenysPetrovAuthorUnsubmitted Done I'm going to use raw ContainerType in further patches. So this is exactly what I want.
		vsavchenkoUnsubmitted Not Done Oh, I see. OK then :)
// Many operations include producing new APSInt values and that's why		// Many operations include producing new APSInt values and that's why
// we need this factory.		// we need this factory.
BasicValueFactory &ValueFactory;		BasicValueFactory &ValueFactory;
// Allocator for all the created containers.		// Allocator for all the created containers.
// Containers might own their own memory and that's why it is specific		// Containers might own their own memory and that's why it is specific
// for the type, so it calls container destructors upon deletion.		// for the type, so it calls container destructors upon deletion.
llvm::SpecificBumpPtrAllocator<ContainerType> Arena;		llvm::SpecificBumpPtrAllocator<ContainerType> Arena;
// Usually we deal with the same ranges and range sets over and over.		// Usually we deal with the same ranges and range sets over and over.
▲ Show 20 Lines • Show All 161 Lines • Show Last 20 Lines

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp

Show First 20 Lines • Show All 102 Lines • ▼ Show 20 Lines

}; };

//===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===//

// RangeSet implementation // RangeSet implementation

//===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===//

RangeSet::ContainerType RangeSet::Factory::EmptySet{}; RangeSet::ContainerType RangeSet::Factory::EmptySet{};

RangeSet RangeSet::Factory::add(RangeSet LHS, RangeSet RHS) {

ContainerType Result;

vsavchenkoUnsubmitted

Not Done

Let's reserve some place here. Because LHS and RHS don't have intersections, the result always has size(LHS) + size(RHS) elements


Result.reserve(LHS.size() + RHS.size());

ASDenysPetrovAuthorUnsubmitted

Done

Also optimized this particular case.


std::merge(LHS.begin(), LHS.end(), RHS.begin(), RHS.end(),

std::back_inserter(Result));

vsavchenkoUnsubmitted

Not Done

This is REAL bad. The main benefit of the new RangeSet over the old one is the fact that common operations that consist of many basic operations are done on mutable containers, i.e. when RHS has 10 elements this code will create and copy a new array 10 times discarding 9 of them.

That's why every implementation method here operates on a mutable ContainerType and then makes it persistent.

Additionally, merging add can be done with one iteration over two containers, but instead we do O(N) operation here M times, so it is not O(N + M), but O(N * M).
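
To make the O(N + M) point concrete, here is a minimal sketch of a single-pass merging add on a mutable container. It assumes plain `std::pair<int, int>` closed intervals in a `std::vector` as hypothetical stand-ins for `Range` and `ContainerType`, and the helper name `addDisjoint` is illustrative only; it is not the patch's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical stand-ins for Range and RangeSet::ContainerType.
using Span = std::pair<int, int>;  // closed interval [first, second]
using SpanVec = std::vector<Span>; // sorted by first, pairwise disjoint

// Merge two sorted, mutually disjoint sets in one pass. The result is
// built once in a mutable buffer, so no intermediate copies are discarded.
SpanVec addDisjoint(const SpanVec &LHS, const SpanVec &RHS) {
  assert(std::is_sorted(LHS.begin(), LHS.end()));
  assert(std::is_sorted(RHS.begin(), RHS.end()));
  SpanVec Result;
  // Without intersections the result always has size(LHS) + size(RHS) items.
  Result.reserve(LHS.size() + RHS.size());
  std::merge(LHS.begin(), LHS.end(), RHS.begin(), RHS.end(),
             std::back_inserter(Result));
  return Result;
}
```

Inserting the elements of one set into the other one by one would instead rebuild the container once per element, which is where the O(N * M) cost comes from.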


return makePersistent(std::move(Result));

}

RangeSet RangeSet::Factory::add(RangeSet Original, Range Element) { RangeSet RangeSet::Factory::add(RangeSet Original, Range Element) {

ContainerType Result; ContainerType Result;

Result.reserve(Original.size() + 1); Result.reserve(Original.size() + 1);

const_iterator Lower = llvm::lower_bound(Original, Element); const_iterator Lower = llvm::lower_bound(Original, Element);

Result.insert(Result.end(), Original.begin(), Lower); Result.insert(Result.end(), Original.begin(), Lower);

Result.push_back(Element); Result.push_back(Element);

Result.insert(Result.end(), Lower, Original.end()); Result.insert(Result.end(), Lower, Original.end());

return makePersistent(std::move(Result)); return makePersistent(std::move(Result));

} }

ASDenysPetrovAuthorUnsubmitted

Done

This allows adding RangeSets of different types, e.g. RangeSet(uchar) + RangeSet(int) is valid because of pin.

I'm wondering whether we really need it here in practice?


vsavchenkoUnsubmitted

Not Done

We mix ranges of different types way more than one would expect.


RangeSet RangeSet::Factory::add(RangeSet Original, const llvm::APSInt &Point) { RangeSet RangeSet::Factory::add(RangeSet Original, const llvm::APSInt &Point) {

return add(Original, Range(Point)); return add(Original, Range(Point));

} }

vsavchenkoUnsubmitted

Not Done

Is there a reason not to use range-based loop in this case?


ASDenysPetrovAuthorUnsubmitted

Done

You are right. While working on restoring O(N) I played with iterators, then found another solution, but forgot to bring the range-based loop back. Thanks.


RangeSet RangeSet::Factory::unite(RangeSet LHS, RangeSet RHS) {

ContainerType Result = unite(*LHS.Impl, *RHS.Impl);

return makePersistent(std::move(Result));

vsavchenkoUnsubmitted

Not Done

I'd prefer merge

vsavchenko: I'd prefer `merge`

ASDenysPetrovAuthorUnsubmitted

Done

Set operations have common names, and union is one of them. IMO this is the most appropriate one.


}

RangeSet RangeSet::Factory::unite(RangeSet Original, Range R) {

ContainerType Result;

vsavchenkoUnsubmitted

Not Done

This should be done outside of the loop, we assume that all the ranges are of the same type.


ASDenysPetrovAuthorUnsubmitted

Done

+1 I'll move it outside the loop.


Result.push_back(R);

Result = unite(*Original.Impl, Result);

return makePersistent(std::move(Result));

}

RangeSet RangeSet::Factory::unite(RangeSet Original, llvm::APSInt Point) {

steakhalUnsubmitted

Not Done

Why do you take APSInts by value? Generally, we take them by reference.

steakhal: Why do you take `APSInt`s by value? Generally, we take them by reference.

ASDenysPetrovAuthorUnsubmitted

Done

I want to send a message to the caller that they can pass an arbitrary APSInt without worrying about making it permanent (aka stored by the Factory). But we can revise this contract and shift this responsibility to the caller.


martongUnsubmitted

Not Done

> Why do you take APSInts by value? Generally, we take them by reference.

Actually, it is specific to BasicValueFactory to cache the APSInts, however, it might not be the best practice. I doubt that somebody has ever measured the performance of passing APSInts by value, so my guess is the caching of APSInts might be an early optimization that might be more harmful than advantageous. On top of all this, we do the caching inconsistently, just consider the member functions of APSIntType, they all return by value.

Perhaps (totally independently from this patch of course), it might be worth measuring and comparing against a version with the cache removed and APSInts passed by value.

martong: > Why do you take `APSInt`s by value? Generally, we take them by reference. Actually, it is…

steakhalUnsubmitted

Done

Okay, then we shall leave this as-is now.


return unite(Original, Range(ValueFactory.getValue(Point)));

}

RangeSet RangeSet::Factory::unite(RangeSet Original, llvm::APSInt From,

llvm::APSInt To) {

return unite(Original,

Range(ValueFactory.getValue(From), ValueFactory.getValue(To)));

}

RangeSet::ContainerType RangeSet::Factory::unite(const ContainerType &LHS,

const ContainerType &RHS) {

if (LHS.empty())

return RHS;

if (RHS.empty())

return LHS;

vsavchenkoUnsubmitted

Not Done

This is a problem here. This essentially doubles the work you did before. What can be done in one O(N) loop is done with two.

However, I don't really see a point in fixing this algorithm because the more generic RangeSet + RangeSet should be optimal O(N + M) and this one can be implemented as a special case.
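
One way to read this suggestion is sketched below, under loud assumptions: `Iv`, `IvSet`, `uniteGeneric`, `uniteRange`, and `unitePoint` are hypothetical names, intervals are plain `int` pairs rather than `APSInt` ranges, and the generic union is done with a simple merge-then-coalesce pass rather than the iterator-swapping loop discussed in this review. The point is only that the range and point overloads can delegate to the generic O(N + M) version.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical stand-ins: closed int intervals, sorted and disjoint.
using Iv = std::pair<int, int>;
using IvSet = std::vector<Iv>;

// Generic union: merge by start point, then coalesce neighbours that
// overlap or are adjacent. Both passes are linear, so this is O(N + M).
IvSet uniteGeneric(const IvSet &LHS, const IvSet &RHS) {
  assert(std::is_sorted(LHS.begin(), LHS.end()));
  assert(std::is_sorted(RHS.begin(), RHS.end()));
  IvSet Merged;
  Merged.reserve(LHS.size() + RHS.size());
  std::merge(LHS.begin(), LHS.end(), RHS.begin(), RHS.end(),
             std::back_inserter(Merged));
  IvSet Result;
  Result.reserve(Merged.size());
  for (const Iv &Cur : Merged) {
    // Widen to long long so that "To + 1" cannot overflow at INT_MAX.
    if (!Result.empty() && static_cast<long long>(Cur.first) <=
                               static_cast<long long>(Result.back().second) + 1)
      Result.back().second = std::max(Result.back().second, Cur.second);
    else
      Result.push_back(Cur);
  }
  return Result;
}

// The single-range and single-point versions are just special cases.
IvSet uniteRange(const IvSet &Original, Iv Element) {
  return uniteGeneric(Original, IvSet{Element});
}

IvSet unitePoint(const IvSet &Original, int Point) {
  return uniteRange(Original, Iv{Point, Point});
}
```

With this shape, only `uniteGeneric` needs careful optimization and testing; the other overloads inherit its behavior.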


using llvm::APSInt;

using iterator = ContainerType::const_iterator;

iterator I1 = LHS.begin();

martongUnsubmitted

Not Done

I think, the naming conventions we have in intersect are easier to follow. Could we call I1 as First and I2 as Second?


iterator E1 = LHS.end();

iterator I2 = RHS.begin();

iterator E2 = RHS.end();

APSIntType Ty = APSIntType(I1->From());

const APSInt One = Ty.getValue(1);

const APSInt Min = Ty.getMinValue();

const APSInt *F = nullptr;

ContainerType Result;

// Handle a corner case first when both range sets start from MIN.

martongUnsubmitted

Not Done

This is called

martong: `This is called`

// This helps to avoid complicated conditions below.

if (Min == I1->From() && Min == I2->From()) {

martongUnsubmitted

Not Done

// Append the rest of the ranges from another range set to the Result

- // and return the later.

+ // and return with that.

auto AppendRest = [&Result](iterator I, iterator E) {


if (I1->To() > I2->To()) {

// The second range is entirely inside the first one. Skip it.

// Check for the end of the range for every incrementation.

if (++I2 == E2)

goto end2;

} else {

// The first range is entirely inside the second one. Skip it.

martongUnsubmitted

Not Done

It is not clear what is the complicated condition that you'd like to avoid. Could you please elaborate?


// Check for the end of the range for every incrementation.

if (++I1 == E1)

goto end1;

}

while (true) {

// I1->From() shall be lower than I2->From().

// Otherwise, swap the iterators.

if (I1->From() > I2->From()) {

std::swap(I1, I2);

std::swap(E1, E2);

}

martongUnsubmitted

Not Done

// This helps to avoid complicated conditions below.

if (Min == I1->From() && Min == I2->From()) {

+ // ----[ First ---------------------->

+ // ----[ Second --------------------->

if (I1->To() > I2->To()) {

- // The second range is entirely inside the first one. Skip it.

- // Check for the end of the range for every incrementation.

+ // ----[ First ]-[ First + 1 ---->

+ // ----[ Second ]-[ Second + 1 ------>

+ // The Second range is entirely inside the First one.

+ // Check if Second is the last in its RangeSet.

if (++I2 == E2)

+ // ----[ First ]-[ First + 1 ---->

+ // ----[ Second ]--------------------->

+ // The union is equal to First's RangeSet.

return AppendRest(I1, E1);

} else {

- // The first range is entirely inside the second one. Skip it.

- // Check for the end of the range for every incrementation.

+ // ----[ First ]-[ First + 1 ------>

+ // ----[ Second ]-[ Second + 1 ---->

+ // The First range is entirely inside the Second one.

if (++I1 == E1)

+ // ----[ Second ]-[ Second + 1 ---->

+ // ----[ First ]--------------------->

+ // The union is equal to Second's RangeSet.

return AppendRest(I2, E2);

}

while (true) {

I'd like to see similar visual comments that we have in intersect.


// Build a new range.

vsavchenkoUnsubmitted

Not Done

At this point, we know for a fact that the next range we are going to add to our result set starts with I1->From(). No need to check F for null or anything


while (true) {

steakhalUnsubmitted

Not Done

Might be First->To() == Second->To(). In this case the comment is not completely accurate.


ASDenysPetrovAuthorUnsubmitted

Done

I'll update.


// Skip all enclosed ranges.

martongUnsubmitted

Not Done

while (true) {

- // I1->From() shall be lower than I2->From().

- // Otherwise, swap the iterators.

+ // We want to keep the following invariant at all times:

+ //

+ // ----[ First ---------------------->

+ // --------[ Second ----------------->

if (I1->From() > I2->From()) {

Let's reuse as much as possible from intersect, both code and comments.


while (I1->To() >= I2->To()) {

// Check for the end of the range for every incrementation.

if (++I2 == E2)

martongUnsubmitted

Not Done

We could reuse SwapIterators from intersect. Of course for that we need to make a real function out of the lambda.


goto end2;

}

martongUnsubmitted

Not Done

std::swap(E1, E2);

}

- // At this point, the next range surely starts with I1->From().

+ // The union definitely starts with First->From().

F = &I1->From();


// Check if the second range intersects or adjucent to the first one.

vsavchenkoUnsubmitted

Not Done

nit: "is adjacent"

vsavchenko: nit: "**is** adj**a**cent"

martongUnsubmitted

Not Done

// At this point, the next range surely starts with I1->From().

- F = &I1->From();

+ const llvm::APSInt &UnionStart = First->From();

// Build a new range.

Let's be consistent with intersect. Also, you could introduce the variable here, and it is not needed to declare that at L176.


if (I1->To() < I2->From() - One)

break;

vsavchenkoUnsubmitted

Not Done

Why break? At this point we know that what we peeked into is actually a part of another range (in terms of the end result).
And this range is over, you just need to add it to the final result!


martongUnsubmitted

Not Done

F = &I1->From();

- // Build a new range.

+ // Loop where the invariant holds.

while (true) {


// We use `F` as a flag to notify that we are in a building of a new

// range. Set `From` of a new range if it is not set yet. If it has

// already been set, then we are inside this range and just looking for

// its end.

if (!F)

vsavchenkoUnsubmitted

Not Done

This algorithm is indeed O(N +M), good job!
But let me get picky again 😅

It looks like we do too many comparisons in this loop. As I said earlier, the constant factor is the key here, and while this iterator idea is good, it is a tradeoff. A piece of knowledge that would have been obvious with regular iterators is now a run-time unknown that we constantly need to check (isTo and isFrom).

What we are trying to achieve here is not harder than what we do in intersect, and there we do far fewer comparisons. While iterating through the sets we can maintain a set of invariants, so we don't need to re-check something that we already know.

Here is a sketch for the algorithm:

{
  assert(First != FirstEnd && Second != SecondEnd);
  // We have && here because one of them reaches its end,
  // we should not check for it again and simply add the rest
  // of the other set.
  while (true) {
    // We need to find the beginning of the next range to add.
    // First should be the iterator whose To is a candidate to be
    // the end of the merged range.
    if (First.From() > Second.From()) {
      swap();
    }

    auto From = First.From();

    // That's it, we found our start, and we need to go through
    // other ranges and look for the end.
    // After this point, First.From() shouldn't be accessed.

    while (true) {
      // We can compress all the checks into just one.
      // This essentially means that Second should not get merged with First.
      if (Second.From() > First.To() + 1) {
        break;
      }

      if (Second.To() > First.To()) {
        // First is maintained as a candidate for the end.
        swap();
      }

      // At this point we know that Second lives fully inside
      // of the new range and we can skip it.
      ++Second;

      // If we have nothing else in the second set...
      if (Second == SecondEnd) {
        // ...let's finish the current range first...
        Result.emplace_back(From, First.To());
        while (++First != FirstEnd) {
          // ...and copy the rest of the ranges
          Result.push_back(*First);
        }
        // The range is ready.
        return makePersistent(Result);
      }
    };

    // Second is outside of the range, and we can
    // safely add a new range.
    Result.emplace_back(From, First.To());
    ++First;

    // First set can be over at this point and we should...
    if (First == FirstEnd) {
      // ...copy the rest of the second set's ranges.
      while (Second != SecondEnd) {
        Result.push_back(*(Second++));
      }
      // Nothing left to do.
      return makePersistent(Result);
    }
  };

  // No way for us to get here!
  llvm_unreachable("...");
}

It's trickier in the way we end things than the intersection because we still need to add the rest of the other set.
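
For readers who want to run the idea, the following is a self-contained sketch of the same invariant-based loop. It is an adaptation under stated assumptions, not the code that was committed: `Interval`, `IntervalSet`, and `uniteSketch` are hypothetical names, endpoints are plain `int` instead of `APSInt`, each input is assumed sorted with no overlapping or adjacent ranges, and the end candidate is chosen by comparing `To` values. The adjacency check is done in `long long` to sidestep overflow at the maximum value.

```cpp
#include <cassert>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical stand-ins for Range and RangeSet::ContainerType.
using Interval = std::pair<int, int>; // closed interval [first, second]
using IntervalSet = std::vector<Interval>;

IntervalSet uniteSketch(const IntervalSet &LHS, const IntervalSet &RHS) {
  if (LHS.empty())
    return RHS;
  if (RHS.empty())
    return LHS;

  IntervalSet Result;
  Result.reserve(LHS.size() + RHS.size());

  auto First = LHS.begin(), FirstEnd = LHS.end();
  auto Second = RHS.begin(), SecondEnd = RHS.end();
  assert(First != FirstEnd && Second != SecondEnd);

  auto SwapIterators = [&] {
    std::swap(First, Second);
    std::swap(FirstEnd, SecondEnd);
  };

  while (true) {
    // Invariant: First starts no later than Second, so the next merged
    // range definitely starts at First's beginning.
    if (First->first > Second->first)
      SwapIterators();
    const int From = First->first;

    while (true) {
      // Second neither overlaps nor touches First: the merged range ends.
      if (static_cast<long long>(Second->first) >
          static_cast<long long>(First->second) + 1)
        break;
      // Keep First as the range whose end reaches furthest.
      if (Second->second > First->second)
        SwapIterators();
      // Second now lies fully inside [From, First->second]; skip it.
      if (++Second == SecondEnd) {
        // Finish the current range and copy the rest of the other set.
        Result.emplace_back(From, First->second);
        Result.insert(Result.end(), std::next(First), FirstEnd);
        return Result;
      }
    }

    // Second is outside of the merged range, so the range is complete.
    Result.emplace_back(From, First->second);
    if (++First == FirstEnd) {
      Result.insert(Result.end(), Second, SecondEnd);
      return Result;
    }
  }
}
```

Every exit path is a plain `return`, so no flag for "range in progress" and no `goto` labels are needed: the start of the merged range is known as soon as the outer invariant is established.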


vsavchenkoUnsubmitted

Not Done

You actually don't need it, the moment you swap two iterators and find out that I1->From() <= I2->From(), that's it I1->From() is the beginning of the new range. No need to carry extra flag here.


F = &I1->From();

// The first range is entirely inside the new range. Go next.

// Check for the end of the range for every incrementation.

vsavchenkoUnsubmitted

Not Done

Oof, I don't know about this algorithm. I mean it does its job. But IMO it lacks a good description of what are the invariants and what are the different situations we are looking for.
Aaaand you kind of re-check certain conditions multiple times. One example here is the check for Min and Max. Those situations are super rare, but we check for them on every single iteration. std::min and std::max are additional comparisons. As I mentioned before, the constant factor is the key here, and doing fewer comparisons matters far more than doing a binary search at some point.
Just make a benchmark if you don't believe me (with google-benchmark, for example). The version with fewer comparisons will dominate the one with more on RangeSets under 20 elements (and they'll be even smaller in practice).


ASDenysPetrovAuthorUnsubmitted

Done

I'll investigate the whole algorithm once more and reduce comparisons.


if (++I1 == E1)

goto end1;

// Now we can surely swap the iterators without any check, as we know

// I2-From() to be lower than I1-From().

vsavchenkoUnsubmitted

Not Done

// Now we can surely swap the iterators without any check, as we know

- // I2-From() to be lower than I1-From().

+ // I2->From() to be lower than I1->From().

std::swap(I1, I2);

nit
Additionally, you didn't tell your readers WHY you are even swapping iterators in the first place and what relationship they have. You need to communicate in terms of invariants.


std::swap(I1, I2);

std::swap(E1, E2);

}

martongUnsubmitted

Not Done

I am not sure about this, but perhaps we could put this swapIterators call right at the beginning of the nested while loop (L243). That would eliminate the need to call it again at the end of the second while loop.


ASDenysPetrovAuthorUnsubmitted

Done

I'm afraid that is not the case. Every loop needs its own swapIterators. As you can see, swapIterators in the nested loop is not always invoked, because of the break; in the middle. Otherwise, it would be invoked every time. I did check your suggestion, though, and it doesn't work.


// Here the first and second ranges are disjoint. So we can add a new

// range, if expected, or add just the first range.

if (F) {

Result.emplace_back(*F, I1->To());

// Clear `F` to notify we are out of the new range.

F = nullptr;

} else

Result.push_back(*I1);

martongUnsubmitted

Not Done

// Every next range of the first set always go after the second range.

- // So swap the iterators without any check.

+ // Make sure that the loop invariant holds.

std::swap(I1, I2);


// The first range is entirely inside the added range. Go next.

martongUnsubmitted

Not Done

So, this loop is about to merge conjunct Ranges. The first time when we find disjoint Ranges then we break out. (Or we return once we reach the end of any of the RangeSets.)
This makes me wonder whether it would be possible to split this while loop into a lambda named mergeConjunctRanges?


ASDenysPetrovAuthorUnsubmitted

Done

I'm not in favor of introducing another level of abstraction. It would divide the flow/comments and could reduce readability and performance which is crucial here.


// Check for the end of the range for every incrementation.

if (++I1 == E1)

goto end1;

vsavchenkoUnsubmitted

Not Done

This goto is always finishing with a return, so we can refactor goto end1 into something like return copyIntoResult(I1, E1); and goto end2 into return copyIntoResult(I2, E2);.
As I mentioned before, optional F should be removed. On every path we should know deterministically what is the beginning and what is the end of the range that we are trying to add.


};

// There are no ranges left in one of the ranges. Append the rest of the ranges

// from another range set to the Result.

end1:

I1 = I2;

E1 = E2;

end2:

// Do not forget to add a new range if it expects to be set.

if (F) {

Result.emplace_back(*F, I1->To());

// Skip the current range as it was enclosed by a new range.

// We don't need to check `I1 == E1` here, as it handles in `append`

vsavchenkoUnsubmitted

Not Done

nit: it has different ticks than you use in all other places


ASDenysPetrovAuthorUnsubmitted

Done

I'm sorry. I didn't get it. What do you mean?


vsavchenkoUnsubmitted

Not Done

This is a super duper tiny thing: you usually write ) or (, but here wrote '('.

vsavchenko: This is a super duper tiny thing: you usually write `)` or `(`, but here wrote '('.

ASDenysPetrovAuthorUnsubmitted

Done

Oh! :-) Sharp eye!


// function below.

++I1;

}

Result.append(I1, E1);

return Result;

}

RangeSet RangeSet::Factory::getRangeSet(Range From) { RangeSet RangeSet::Factory::getRangeSet(Range From) {

ContainerType Result; ContainerType Result;

Result.push_back(From); Result.push_back(From);

return makePersistent(std::move(Result)); return makePersistent(std::move(Result));

} }

RangeSet RangeSet::Factory::makePersistent(ContainerType &&From) { RangeSet RangeSet::Factory::makePersistent(ContainerType &&From) {

llvm::FoldingSetNodeID ID; llvm::FoldingSetNodeID ID;

void *InsertPos; void *InsertPos;

From.Profile(ID); From.Profile(ID);

ContainerType *Result = Cache.FindNodeOrInsertPos(ID, InsertPos); ContainerType *Result = Cache.FindNodeOrInsertPos(ID, InsertPos);

if (!Result) { if (!Result) {

// It is cheaper to fully construct the resulting range on stack // It is cheaper to fully construct the resulting range on stack

// and move it to the freshly allocated buffer if we don't have // and move it to the freshly allocated buffer if we don't have

// a set like this already. // a set like this already.

Result = construct(std::move(From)); Result = construct(std::move(From));

Cache.InsertNode(Result, InsertPos); Cache.InsertNode(Result, InsertPos);

} }

return Result; return Result;

} }

RangeSet::ContainerType *RangeSet::Factory::construct(ContainerType &&From) { RangeSet::ContainerType *RangeSet::Factory::construct(ContainerType &&From) {

void *Buffer = Arena.Allocate(); void *Buffer = Arena.Allocate();

martongUnsubmitted

Not Done

// We know that we are at one of the two cases:

- // case 1: --[ First ]-[ First + 1 ]------>

- // case 2: --[ First ]-[ First + 1 ]--->

- // -------------------[ Second + 1]------->

+ // case 1: --[ First - 1]-[ First ]------>

+ // case 2: --[ First - 1 ]-[ First ]--->

+ // ----------------------[ Second ]------->

// In any case First + 1 goes after Second.

Should this be like in the code suggestion? You incremented First at L276.


return new (Buffer) ContainerType(std::move(From)); return new (Buffer) ContainerType(std::move(From));

martongUnsubmitted

Not Done

// -------------------[ Second + 1]------->

- // In any case First + 1 goes after Second.

+ // In both cases First starts after the Second's beginning (`Second->From`).

// Make sure that the loop invariant holds.

martong: ?

} }

RangeSet RangeSet::Factory::add(RangeSet LHS, RangeSet RHS) {

ContainerType Result;

std::merge(LHS.begin(), LHS.end(), RHS.begin(), RHS.end(),

std::back_inserter(Result));

return makePersistent(std::move(Result));

}

const llvm::APSInt &RangeSet::getMinValue() const { const llvm::APSInt &RangeSet::getMinValue() const {

vsavchenkoUnsubmitted

Not Done

I don't see any practical reasons to keep this version over the more generic RangeSet + RangeSet


assert(!isEmpty()); assert(!isEmpty());

return begin()->From(); return begin()->From();

} }

const llvm::APSInt &RangeSet::getMaxValue() const { const llvm::APSInt &RangeSet::getMaxValue() const {

assert(!isEmpty()); assert(!isEmpty());

return std::prev(end())->To(); return std::prev(end())->To();

} }

Show All 11 Lines

} }

bool RangeSet::pin(llvm::APSInt &Point) const { bool RangeSet::pin(llvm::APSInt &Point) const {

APSIntType Type(getMinValue()); APSIntType Type(getMinValue());

if (Type.testInRange(Point, true) != APSIntType::RTR_Within) if (Type.testInRange(Point, true) != APSIntType::RTR_Within)

return false; return false;

Type.apply(Point); Type.apply(Point);

return true; return true;

} }

bool RangeSet::pin(llvm::APSInt &Lower, llvm::APSInt &Upper) const { bool RangeSet::pin(llvm::APSInt &Lower, llvm::APSInt &Upper) const {

// This function has nine cases, the cartesian product of range-testing // This function has nine cases, the cartesian product of range-testing

// both the upper and lower bounds against the symbol's type. // both the upper and lower bounds against the symbol's type.

// Each case requires a different pinning operation. // Each case requires a different pinning operation.

// The function returns false if the described range is entirely outside // The function returns false if the described range is entirely outside

// the range of values for the associated symbol. // the range of values for the associated symbol.

APSIntType Type(getMinValue()); APSIntType Type(getMinValue());

APSIntType::RangeTestResultKind LowerTest = Type.testInRange(Lower, true); APSIntType::RangeTestResultKind LowerTest = Type.testInRange(Lower, true);

APSIntType::RangeTestResultKind UpperTest = Type.testInRange(Upper, true); APSIntType::RangeTestResultKind UpperTest = Type.testInRange(Upper, true);

switch (LowerTest) { switch (LowerTest) {

case APSIntType::RTR_Below: case APSIntType::RTR_Below:

vsavchenkoUnsubmitted

Not Done

Same here, it is just way more code to maintain.


switch (UpperTest) { switch (UpperTest) {

case APSIntType::RTR_Below: case APSIntType::RTR_Below:

// The entire range is outside the symbol's set of possible values. // The entire range is outside the symbol's set of possible values.

// If this is a conventionally-ordered range, the state is infeasible. // If this is a conventionally-ordered range, the state is infeasible.

if (Lower <= Upper) if (Lower <= Upper)

return false; return false;

// However, if the range wraps around, it spans all possible values. // However, if the range wraps around, it spans all possible values.

[... 2,220 lines hidden ...]

clang/unittests/StaticAnalyzer/RangeSetTest.cpp

[... 35 lines hidden ...]

LLVM_ATTRIBUTE_UNUSED static std::ostream &operator<<(std::ostream &OS,

return OS << toString(Set); return OS << toString(Set);

} }

} // namespace ento } // namespace ento

} // namespace clang } // namespace clang

namespace { namespace {

template <typename T> struct TestValues {

static constexpr T MIN = std::numeric_limits<T>::min();

static constexpr T MAX = std::numeric_limits<T>::max();

// MID is a value in the middle of the range

// which unary minus does not affect on,

// e.g. int8/int32(0), uint8(128), uint32(2147483648).

static constexpr T MID =

std::is_signed<T>::value ? 0 : ~(static_cast<T>(-1) / static_cast<T>(2));

static constexpr T A = MID - (MAX - MID) / 3 * 2;

static constexpr T B = MID - (MAX - MID) / 3;

static constexpr T C = -B;

static constexpr T D = -A;

static_assert(MIN < A && A < B && B < MID && MID < C && C < D && D < MAX,

"Values shall be in an ascending order");

};
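To make the test points concrete, here is a minimal standalone sketch of the same arithmetic. The name TestValuesSketch and the explicit static_casts are assumptions for illustration (the casts only keep integer promotion from changing the result type); the expected values were worked out by hand for 8-bit types.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical stand-in for the TestValues struct in the diff above.
template <typename T> struct TestValuesSketch {
  static constexpr T MIN = std::numeric_limits<T>::min();
  static constexpr T MAX = std::numeric_limits<T>::max();
  // MID is a value in the middle of the range that unary minus leaves
  // unchanged: 0 for signed types, 2^(N-1) for unsigned N-bit types.
  static constexpr T MID =
      std::is_signed<T>::value
          ? static_cast<T>(0)
          : static_cast<T>(~(static_cast<T>(-1) / static_cast<T>(2)));
  // A and B sit two thirds and one third of the way below MID.
  static constexpr T A = static_cast<T>(MID - (MAX - MID) / 3 * 2);
  static constexpr T B = static_cast<T>(MID - (MAX - MID) / 3);
  // C and D mirror B and A through unary minus (modulo 2^N when unsigned).
  static constexpr T C = static_cast<T>(-B);
  static constexpr T D = static_cast<T>(-A);
};

// int8_t: MIN=-128, A=-84, B=-42, MID=0, C=42, D=84, MAX=127.
static_assert(TestValuesSketch<std::int8_t>::A == -84 &&
                  TestValuesSketch<std::int8_t>::D == 84,
              "signed 8-bit points");
// uint8_t: MIN=0, A=44, B=86, MID=128, C=170, D=212, MAX=255.
static_assert(TestValuesSketch<std::uint8_t>::MID == 128 &&
                  TestValuesSketch<std::uint8_t>::C == 170,
              "unsigned 8-bit points");

// Runtime check of the ordering that the diff's static_assert requires.
template <typename T> void checkAscending() {
  using TV = TestValuesSketch<T>;
  assert(TV::MIN < TV::A && TV::A < TV::B && TV::B < TV::MID &&
         TV::MID < TV::C && TV::C < TV::D && TV::D < TV::MAX);
}
```

For int8_t the points coincide with the 42 and 42 + 42 offsets used by the older negate test.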

template <typename BaseType> class RangeSetTest : public testing::Test { template <typename BaseType> class RangeSetTest : public testing::Test {

public: public:

// Init block // Init block

std::unique_ptr<ASTUnit> AST = tooling::buildASTFromCode("struct foo;"); std::unique_ptr<ASTUnit> AST = tooling::buildASTFromCode("struct foo;");

ASTContext &Context = AST->getASTContext(); ASTContext &Context = AST->getASTContext();

llvm::BumpPtrAllocator Arena; llvm::BumpPtrAllocator Arena;

BasicValueFactory BVF{Context, Arena}; BasicValueFactory BVF{Context, Arena};

RangeSet::Factory F{BVF}; RangeSet::Factory F{BVF};

// End init block // End init block

using Self = RangeSetTest<BaseType>; using Self = RangeSetTest<BaseType>;

using RawRange = std::pair<BaseType, BaseType>; using RawRange = std::pair<BaseType, BaseType>;

using RawRangeSet = std::initializer_list<RawRange>; using RawRangeSet = std::initializer_list<RawRange>;

static constexpr BaseType getMin() {

return std::numeric_limits<BaseType>::min();

}

static constexpr BaseType getMax() {

return std::numeric_limits<BaseType>::max();

}

static constexpr BaseType getMid() {

return isSigned() ? 0 : ~(fromInt(-1) / fromInt(2));

}

static constexpr bool isSigned() { return std::is_signed<BaseType>::value; }

static constexpr BaseType fromInt(int X) { return static_cast<BaseType>(X); }

static llvm::APSInt Base;

const llvm::APSInt &from(BaseType X) {

- llvm::APSInt Dummy = Base;

- Dummy = X;

- return BVF.getValue(Dummy);

+ static llvm::APSInt Base{sizeof(BaseType) * 8,

+ std::is_unsigned<BaseType>::value};

+ Base = X;

+ return BVF.getValue(Base);

steakhalUnsubmitted

Not Done

Shouldn't you use sizeof(BaseType) * CHAR_BIT instead?


ASDenysPetrovAuthorUnsubmitted

Done

Agreed. It's better to avoid magic numbers. I'll fix it.

steakhalUnsubmitted

Done

It's not only that: just imagine testing clang on special hardware with, say, 9-bit bytes, for parity or something similar.
The test would suddenly break.
Although I suspect many more things would break, TBH xD

ASDenysPetrovAuthorUnsubmitted

Done

I am always skeptical about using CHAR_BIT, because it represents the number of bits in a char. What if that were 16, for instance (i.e. 2 bytes)? My intention is to get the number of bits for a particular type, so I want something that represents the number of bits in a byte as a fundamental unit, not something that depends on the size of char on a particular platform.
I would rather introduce something like constexpr size_t BITS_IN_BYTE = 8;.

steakhalUnsubmitted

Not Done

basic.memobj 6.7.1/22 (https://eel.is/c++draft/basic.memobj#footnote-22):

> The number of bits in a byte is reported by the macro CHAR_BIT in the header <climits>.

ASDenysPetrovAuthorUnsubmitted

Done

These legacy names... :) I'll update.


ASDenysPetrovAuthorUnsubmitted

Done

BTW, this definition first appeared only in C++17, and only in a footnote. The committee is apparently not yet confident enough about CHAR_BIT, since it still resides there :)
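As a hedged illustration of the point settled in this thread, the sketch below computes a type's width from CHAR_BIT instead of a literal 8. The helper names bitWidthOf and valueBitsOf are hypothetical and not part of the patch; the second helper shows an alternative via std::numeric_limits that avoids sizeof altogether.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper: storage width of T in bits, using the number of bits
// per byte reported by CHAR_BIT rather than a magic 8.
template <typename T> constexpr unsigned bitWidthOf() {
  return static_cast<unsigned>(sizeof(T) * CHAR_BIT);
}

// Hypothetical alternative: value bits from numeric_limits, plus one sign
// bit for signed types.
template <typename T> constexpr unsigned valueBitsOf() {
  return static_cast<unsigned>(std::numeric_limits<T>::digits) +
         (std::is_signed<T>::value ? 1u : 0u);
}

// Exact-width types have no padding bits, so both helpers give N for intN_t.
static_assert(bitWidthOf<std::int32_t>() == 32, "int32_t is 32 bits wide");
static_assert(valueBitsOf<std::uint16_t>() == 16, "uint16_t has 16 bits");

// Runtime cross-check that the two helpers agree for a given type.
template <typename T> void checkWidthsAgree() {
  assert(bitWidthOf<T>() == valueBitsOf<T>());
}
```

Either form could feed the bit-width argument of the llvm::APSInt constructor used in from(); which one the final patch adopted is not shown in this excerpt.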

} }

Range from(const RawRange &Init) { Range from(const RawRange &Init) {

return Range(from(Init.first), from(Init.second)); return Range(from(Init.first), from(Init.second));

} }

RangeSet from(const RawRangeSet &Init) { RangeSet from(const RawRangeSet &Init) {

RangeSet RangeSet = F.getEmptySet(); RangeSet RangeSet = F.getEmptySet();

[... 71 lines hidden ...]

public:

} }

void checkAdd(RawRangeSet RawLHS, RawRange RawRHS, RawRangeSet RawExpected) { void checkAdd(RawRangeSet RawLHS, RawRange RawRHS, RawRangeSet RawExpected) {

wrap(&Self::checkAddImpl<Range>, RawLHS, RawRHS, RawExpected); wrap(&Self::checkAddImpl<Range>, RawLHS, RawRHS, RawExpected);

} }

void checkAdd(RawRangeSet RawLHS, RawRangeSet RawRHS, void checkAdd(RawRangeSet RawLHS, RawRangeSet RawRHS,

RawRangeSet RawExpected) { RawRangeSet RawExpected) {

- wrap(&Self::checkAddImpl<RangeSet>, RawRHS, RawLHS, RawExpected);

+ wrap(&Self::checkAddImpl<RangeSet>, RawLHS, RawRHS, RawExpected);

ASDenysPetrovAuthorUnsubmitted

Done

Fixed the misprint.


} }

void checkAdd(RawRangeSet RawLHS, BaseType RawRHS, RawRangeSet RawExpected) { void checkAdd(RawRangeSet RawLHS, BaseType RawRHS, RawRangeSet RawExpected) {

wrap(&Self::checkAddImpl<const llvm::APSInt &>, RawLHS, RawRHS, wrap(&Self::checkAddImpl<const llvm::APSInt &>, RawLHS, RawRHS,

RawExpected); RawExpected);

} }

template <class RHSType>

void checkUniteImpl(RangeSet LHS, RHSType RHS, RangeSet Expected) {

RangeSet Result = F.unite(LHS, RHS);

EXPECT_EQ(Result, Expected)

<< "while uniting " << toString(LHS) << " and " << toString(RHS);

}

void checkUnite(RawRangeSet RawLHS, RawRange RawRHS,

RawRangeSet RawExpected) {

wrap(&Self::checkUniteImpl<Range>, RawLHS, RawRHS, RawExpected);

}

void checkUnite(RawRangeSet RawLHS, RawRangeSet RawRHS,

RawRangeSet RawExpected) {

wrap(&Self::checkUniteImpl<RangeSet>, RawLHS, RawRHS, RawExpected);

}

void checkUnite(RawRangeSet RawLHS, BaseType RawRHS,

RawRangeSet RawExpected) {

wrap(&Self::checkUniteImpl<const llvm::APSInt &>, RawLHS, RawRHS,

RawExpected);

}

void checkDeleteImpl(const llvm::APSInt &Point, RangeSet From, void checkDeleteImpl(const llvm::APSInt &Point, RangeSet From,

RangeSet Expected) { RangeSet Expected) {

RangeSet Result = F.deletePoint(From, Point); RangeSet Result = F.deletePoint(From, Point);

EXPECT_EQ(Result, Expected) EXPECT_EQ(Result, Expected)

<< "while deleting " << toString(Point) << " from " << toString(From); << "while deleting " << toString(Point) << " from " << toString(From);

} }

void checkDelete(BaseType Point, RawRangeSet RawFrom, void checkDelete(BaseType Point, RawRangeSet RawFrom,

RawRangeSet RawExpected) { RawRangeSet RawExpected) {

wrap(&Self::checkDeleteImpl, Point, RawFrom, RawExpected); wrap(&Self::checkDeleteImpl, Point, RawFrom, RawExpected);

} }

}; };

} // namespace } // namespace

template <typename BaseType>

llvm::APSInt RangeSetTest<BaseType>::Base{sizeof(BaseType) * 8, !isSigned()};

using IntTypes = ::testing::Types<int8_t, uint8_t, int16_t, uint16_t, int32_t, using IntTypes = ::testing::Types<int8_t, uint8_t, int16_t, uint16_t, int32_t,

uint32_t, int64_t, uint64_t>; uint32_t, int64_t, uint64_t>;

TYPED_TEST_CASE(RangeSetTest, IntTypes); TYPED_TEST_CASE(RangeSetTest, IntTypes);

TYPED_TEST(RangeSetTest, RangeSetNegateTest) { TYPED_TEST(RangeSetTest, RangeSetNegateTest) {

- // Use next values of the range {MIN, A, B, MID, C, D, MAX}.

- constexpr TypeParam MIN = TestFixture::getMin();

- constexpr TypeParam MAX = TestFixture::getMax();

- // MID is a value in the middle of the range

- // which unary minus does not affect on,

- // e.g. int8/int32(0), uint8(128), uint32(2147483648).

- constexpr TypeParam MID = TestFixture::getMid();

- constexpr TypeParam A = MID - TestFixture::fromInt(42 + 42);

- constexpr TypeParam B = MID - TestFixture::fromInt(42);

- constexpr TypeParam C = -B;

- constexpr TypeParam D = -A;

- static_assert(MIN < A && A < B && B < MID && MID < C && C < D && D < MAX,

- "Values shall be in an ascending order");

+ using TV = TestValues<TypeParam>;

+ constexpr auto MIN = TV::MIN;

+ constexpr auto MAX = TV::MAX;

+ constexpr auto MID = TV::MID;

+ constexpr auto A = TV::A;

+ constexpr auto B = TV::B;

+ constexpr auto C = TV::C;

+ constexpr auto D = TV::D;

this->checkNegate({{MIN, A}}, {{MIN, MIN}, {D, MAX}}); this->checkNegate({{MIN, A}}, {{MIN, MIN}, {D, MAX}});

this->checkNegate({{MIN, C}}, {{MIN, MIN}, {B, MAX}}); this->checkNegate({{MIN, C}}, {{MIN, MIN}, {B, MAX}});

this->checkNegate({{MIN, MID}}, {{MIN, MIN}, {MID, MAX}}); this->checkNegate({{MIN, MID}}, {{MIN, MIN}, {MID, MAX}});

this->checkNegate({{MIN, MAX}}, {{MIN, MAX}}); this->checkNegate({{MIN, MAX}}, {{MIN, MAX}});

this->checkNegate({{A, D}}, {{A, D}}); this->checkNegate({{A, D}}, {{A, D}});

this->checkNegate({{A, B}}, {{C, D}}); this->checkNegate({{A, B}}, {{C, D}});

this->checkNegate({{MIN, A}, {D, MAX}}, {{MIN, A}, {D, MAX}}); this->checkNegate({{MIN, A}, {D, MAX}}, {{MIN, A}, {D, MAX}});

[... 12 lines hidden ...]

TYPED_TEST(RangeSetTest, RangeSetPointIntersectTest) {

// Check that intersection with itself produces the same set. // Check that intersection with itself produces the same set.

this->checkIntersect({{42, 42}}, 42, {{42, 42}}); this->checkIntersect({{42, 42}}, 42, {{42, 42}});

// Check more general cases. // Check more general cases.

this->checkIntersect({{0, 10}, {20, 30}, {30, 40}, {50, 60}}, 42, {}); this->checkIntersect({{0, 10}, {20, 30}, {30, 40}, {50, 60}}, 42, {});

this->checkIntersect({{0, 10}, {20, 30}, {30, 60}}, 42, {{42, 42}}); this->checkIntersect({{0, 10}, {20, 30}, {30, 60}}, 42, {{42, 42}});

} }

TYPED_TEST(RangeSetTest, RangeSetRangeIntersectTest) { TYPED_TEST(RangeSetTest, RangeSetRangeIntersectTest) {

- constexpr TypeParam MIN = TestFixture::getMin();

- constexpr TypeParam MAX = TestFixture::getMax();

+ using TV = TestValues<TypeParam>;

+ constexpr auto MIN = TV::MIN;

+ constexpr auto MAX = TV::MAX;

// Check that we can correctly intersect empty sets. // Check that we can correctly intersect empty sets.

this->checkIntersect({}, 10, 20, {}); this->checkIntersect({}, 10, 20, {});

this->checkIntersect({}, 20, 10, {}); this->checkIntersect({}, 20, 10, {});

// Check that intersection with itself produces the same set. // Check that intersection with itself produces the same set.

this->checkIntersect({{10, 20}}, 10, 20, {{10, 20}}); this->checkIntersect({{10, 20}}, 10, 20, {{10, 20}});

this->checkIntersect({{MIN, 10}, {20, MAX}}, 20, 10, {{MIN, 10}, {20, MAX}}); this->checkIntersect({{MIN, 10}, {20, MAX}}, 20, 10, {{MIN, 10}, {20, MAX}});

// Check non-overlapping range intersections. // Check non-overlapping range intersections.

[... 48 lines hidden ...]

TYPED_TEST(RangeSetTest, RangeSetContainsTest) {

this->checkContains({{5, 10}}, 10, true); this->checkContains({{5, 10}}, 10, true);

// * when the range has the point somewhere in the middle // * when the range has the point somewhere in the middle

this->checkContains({{0, 25}}, 10, true); this->checkContains({{0, 25}}, 10, true);

// Check similar cases, but with larger sets. // Check similar cases, but with larger sets.

this->checkContains({{0, 5}, {10, 10}, {15, 20}}, 10, true); this->checkContains({{0, 5}, {10, 10}, {15, 20}}, 10, true);

this->checkContains({{0, 5}, {10, 12}, {15, 20}}, 10, true); this->checkContains({{0, 5}, {10, 12}, {15, 20}}, 10, true);

this->checkContains({{0, 5}, {5, 7}, {8, 10}, {12, 41}}, 10, true); this->checkContains({{0, 5}, {5, 7}, {8, 10}, {12, 41}}, 10, true);

- constexpr TypeParam MIN = TestFixture::getMin();

- constexpr TypeParam MAX = TestFixture::getMax();

- constexpr TypeParam MID = TestFixture::getMid();

+ using TV = TestValues<TypeParam>;

+ constexpr auto MIN = TV::MIN;

+ constexpr auto MAX = TV::MAX;

+ constexpr auto MID = TV::MID;

this->checkContains({{MIN, MAX}}, 0, true); this->checkContains({{MIN, MAX}}, 0, true);

this->checkContains({{MIN, MAX}}, MID, true); this->checkContains({{MIN, MAX}}, MID, true);

this->checkContains({{MIN, MAX}}, -10, true); this->checkContains({{MIN, MAX}}, -10, true);

this->checkContains({{MIN, MAX}}, 10, true); this->checkContains({{MIN, MAX}}, 10, true);

} }

TYPED_TEST(RangeSetTest, RangeSetAddTest) { TYPED_TEST(RangeSetTest, RangeSetAddTest) {

// Check adding single points // Check adding single points

[... 12 lines hidden ...]

this->checkAdd({{0, 5}, {30, 40}}, {{10, 20}}, {{0, 5}, {10, 20}, {30, 40}}); this->checkAdd({{0, 5}, {30, 40}}, {{10, 20}}, {{0, 5}, {10, 20}, {30, 40}});

this->checkAdd({{0, 5}, {30, 40}}, {{10, 20}, {50, 60}}, this->checkAdd({{0, 5}, {30, 40}}, {{10, 20}, {50, 60}},

{{0, 5}, {10, 20}, {30, 40}, {50, 60}}); {{0, 5}, {10, 20}, {30, 40}, {50, 60}});

this->checkAdd({{10, 20}, {50, 60}}, {{0, 5}, {30, 40}, {70, 80}}, this->checkAdd({{10, 20}, {50, 60}}, {{0, 5}, {30, 40}, {70, 80}},

{{0, 5}, {10, 20}, {30, 40}, {50, 60}, {70, 80}}); {{0, 5}, {10, 20}, {30, 40}, {50, 60}, {70, 80}});

} }

TYPED_TEST(RangeSetTest, RangeSetDeletePointTest) { TYPED_TEST(RangeSetTest, RangeSetDeletePointTest) {

- constexpr TypeParam MIN = TestFixture::getMin();

- constexpr TypeParam MAX = TestFixture::getMax();

- constexpr TypeParam MID = TestFixture::getMid();

+ using TV = TestValues<TypeParam>;

+ constexpr auto MIN = TV::MIN;

+ constexpr auto MAX = TV::MAX;

+ constexpr auto MID = TV::MID;

this->checkDelete(MID, {{MIN, MAX}}, {{MIN, MID - 1}, {MID + 1, MAX}}); this->checkDelete(MID, {{MIN, MAX}}, {{MIN, MID - 1}, {MID + 1, MAX}});

// Check that delete works with an empty set. // Check that delete works with an empty set.

this->checkDelete(10, {}, {}); this->checkDelete(10, {}, {});

// Check that delete can remove entire ranges. // Check that delete can remove entire ranges.

this->checkDelete(10, {{10, 10}}, {}); this->checkDelete(10, {{10, 10}}, {});

this->checkDelete(10, {{0, 5}, {10, 10}, {20, 30}}, {{0, 5}, {20, 30}}); this->checkDelete(10, {{0, 5}, {10, 10}, {20, 30}}, {{0, 5}, {20, 30}});

// Check that delete can split existing ranges into two. // Check that delete can split existing ranges into two.

this->checkDelete(10, {{0, 5}, {7, 15}, {20, 30}}, this->checkDelete(10, {{0, 5}, {7, 15}, {20, 30}},

{{0, 5}, {7, 9}, {11, 15}, {20, 30}}); {{0, 5}, {7, 9}, {11, 15}, {20, 30}});

// Check that delete of the point not from the range set works as expected. // Check that delete of the point not from the range set works as expected.

this->checkDelete(10, {{0, 5}, {20, 30}}, {{0, 5}, {20, 30}}); this->checkDelete(10, {{0, 5}, {20, 30}}, {{0, 5}, {20, 30}});

} }

TYPED_TEST(RangeSetTest, RangeSetUniteTest) {

using TV = TestValues<TypeParam>;

constexpr auto MIN = TV::MIN;

constexpr auto MAX = TV::MAX;

constexpr auto MID = TV::MID;

constexpr auto A = TV::A;

constexpr auto B = TV::B;

constexpr auto C = TV::C;

constexpr auto D = TV::D;

// LHS and RHS is empty.

// RHS =>

// LHS => =

// ___________________ ___________________

this->checkUnite({}, {}, {});

// RHS is empty.

// RHS =>

// LHS => _____ = _____

// ______/_____\______ ______/_____\______

this->checkUnite({{A, B}}, {}, {{A, B}});

this->checkUnite({{A, B}, {C, D}}, {}, {{A, B}, {C, D}});

this->checkUnite({{MIN, MIN}}, {}, {{MIN, MIN}});

this->checkUnite({{MAX, MAX}}, {}, {{MAX, MAX}});

this->checkUnite({{MIN, MIN}, {MAX, MAX}}, {}, {{MIN, MIN}, {MAX, MAX}});

// LHS is empty.

// RHS => ___

// LHS => / \ = _____

// ______/_____\______ ______/_____\______

this->checkUnite({}, B, {{B, B}});

this->checkUnite({}, {B, C}, {{B, C}});

this->checkUnite({}, {{MIN, B}, {C, MAX}}, {{MIN, B}, {C, MAX}});

this->checkUnite({}, {{MIN, MIN}}, {{MIN, MIN}});

this->checkUnite({}, {{MAX, MAX}}, {{MAX, MAX}});

this->checkUnite({}, {{MIN, MIN}, {MAX, MAX}}, {{MIN, MIN}, {MAX, MAX}});

// RHS is detached from LHS.

// RHS => ___

// LHS => ___ / \ = ___ _____

// __/___\___/_____\__ __/___\___/_____\__

this->checkUnite({{A, C}}, D, {{A, C}, {D, D}});

this->checkUnite({{MID, C}, {D, MAX}}, A, {{A, A}, {MID, C}, {D, MAX}});

this->checkUnite({{A, B}}, {MID, D}, {{A, B}, {MID, D}});

this->checkUnite({{MIN, A}, {D, MAX}}, {B, C}, {{MIN, A}, {B, C}, {D, MAX}});

this->checkUnite({{B, MID}, {D, MAX}}, {{MIN, A}, {C, C}},

{{MIN, A}, {B, MID}, {C, C}, {D, MAX}});

this->checkUnite({{MIN, A}, {C, C}}, {{B, MID}, {D, MAX}},

{{MIN, A}, {B, MID}, {C, C}, {D, MAX}});

this->checkUnite({{MAX, MAX}}, {A, B}, {{A, B}, {MAX, MAX}});

this->checkUnite({{MIN, MIN}}, {A, B}, {{MIN, MIN}, {A, B}});

this->checkUnite({{MIN, MIN}}, {MAX, MAX}, {{MIN, MIN}, {MAX, MAX}});

// RHS is inside LHS.

// RHS => ___

// LHS => ___/___\___ = ___________

// ___/__/_____\__\___ ___/___________\___

this->checkUnite({{A, C}}, MID, {{A, C}});

this->checkUnite({{A, D}}, {B, C}, {{A, D}});

// RHS wraps LHS.

// RHS => _________

// LHS => / _____ \ = ___________

// ___/__/_____\__\___ ___/___________\___

this->checkUnite({{MID, MID}}, {A, D}, {{A, D}});

martongUnsubmitted

Not Done

// ___/__/_____\__\___ ___/___________\___

- this->checkUnite({{MID, MID}}, {A, D}, {{A, D}});

+ this->checkUnite(MID, {A, D}, {{A, D}});

this->checkUnite({{B, C}}, {A, D}, {{A, D}});


ASDenysPetrovAuthorUnsubmitted

Done

There are three overloads of checkUnite, which correspond to the three overloads of RangeSet::unite. I made it consistent with the other check functions (e.g. add).

this->checkUnite({{B, C}}, {A, D}, {{A, D}});

this->checkUnite({{A, B}}, {MIN, MAX}, {{MIN, MAX}});

// RHS equals to LHS.

// RHS => _________

// LHS => /_________\ = ___________

// ___/___________\___ ___/___________\___

this->checkUnite({{MIN, MIN}}, MIN, {{MIN, MIN}});

martongUnsubmitted

Not Done

// ___/___________\___ ___/___________\___

- this->checkUnite({{MIN, MIN}}, MIN, {{MIN, MIN}});

+ this->checkUnite(MIN, MIN, MIN);

this->checkUnite({{A, B}}, {A, B}, {{A, B}});

I think we should use either {{X, X}} or X everywhere, but not a mix of the two.

ASDenysPetrovAuthorUnsubmitted

Done

This tests different overloaded versions of RangeSet::unite.

this->checkUnite({{A, B}}, {A, B}, {{A, B}});

this->checkUnite({{MAX, MAX}}, {{MAX, MAX}}, {{MAX, MAX}});

this->checkUnite({{MIN, MIN}}, {{MIN, MIN}}, {{MIN, MIN}});

this->checkUnite({{MIN, MIN}, {MAX, MAX}}, {{MIN, MIN}, {MAX, MAX}},

{{MIN, MIN}, {MAX, MAX}});

// RHS equals to LHS.

// RHS => _____

// LHS => /_____\_____ = ___________

// /_______\____\___ /___________\___

this->checkUnite({{MIN, A}}, {MIN, B}, {{MIN, B}});

// RHS equals to LHS.

// RHS => __________

// LHS => /______ \ = ___________

// /_______\____\___ /___________\___

this->checkUnite({{MIN, B}}, {MIN, A}, {{MIN, B}});

// RHS intersects right of LHS.

// RHS => ______

// LHS => ___/____ \ = ___________

// ___/__/_____\__\___ ___/___________\___

this->checkUnite({{A, C}}, C, {{A, C}});

this->checkUnite({{A, C}}, {B, D}, {{A, D}});

vsavchenkoUnsubmitted

Not Done

I guess I also want to have more cases where ranges from RHS are in between ranges from LHS; there should also be a case with LHS being a set of ranges.

// RHS intersects left of LHS.

// RHS => ______

// LHS => / ____\___ = ___________

// ___/__/_____\__\___ ___/___________\___

this->checkUnite({{B, D}}, B, {{B, D}});

this->checkUnite({{B, D}}, {A, C}, {{A, D}});

// RHS adjacent to LHS on right.

// RHS => _____

// LHS => ______ / \ = _______________

// _/______\/_______\_ _/_______________\_

this->checkUnite({{A, B - 1}}, B, {{A, B}});

this->checkUnite({{A, C}}, {C + 1, D}, {{A, D}});

// RHS adjacent to LHS on left.

// RHS => _____

// LHS => / \ ______ = _______________

// _/_______\/______\_ _/_______________\_

this->checkUnite({{B + 1, C}}, B, {{B, C}});

this->checkUnite({{B, D}}, {A, B - 1}, {{A, D}});

// RHS adjacent to LHS in between.

// RHS => ___

// LHS => ___ / \ ___ = _______________

// _/___\/_____\/___\_ _/_______________\_

this->checkUnite({{A, MID - 1}, {MID + 1, D}}, MID, {{A, D}});

vsavchenkoUnsubmitted

Not Done

Also, I'd like to see cases where there are ranges to merge from the two sets, but one set also has a bunch of other ranges that should be added as is. We can immediately think of two possibilities here: when these additional ranges are before and after the common part.
And I guess I want at least one check with two sets with a good number of ranges in both, covering all possible situations and overlaps.

this->checkUnite({{MIN, A}, {D, MAX}}, {A + 1, D - 1}, {{MIN, MAX}});

// RHS adjacent to LHS on the outside.

// RHS => __ __

// LHS => / \ ___ / \ = _______________

// _/____\/___\/____\_ _/_______________\_

this->checkUnite({{C, C}}, {{A, C - 1}, {C + 1, D}}, {{A, D}});

this->checkUnite({{B, MID}}, {{A, B - 1}, {MID + 1, D}}, {{A, D}});

// RHS wraps two subranges of LHS.

// RHS => ___________

// LHS => / ___ ___ \ = _____________

// __/_/___\_/___\_\__ __/_____________\__

this->checkUnite({{B, B}, {MID, MID}, {C, C}}, {{A, D}}, {{A, D}});

this->checkUnite({{A, B}, {MID, C}}, {{MIN, D}}, {{MIN, D}});

// RHS intersects two subranges of LHS.

// RHS => _________

// LHS => __/__ _\__ = _______________

// _/_/___\____/__\_\_ _/_______________\_

this->checkUnite({{MIN, B}, {C, MAX}}, {{A, D}}, {{MIN, MAX}});

// Multiple intersections.

// RHS =>

// LHS => /\ /\ = __ __

martongUnsubmitted

Not Done

// Multiple intersections.

- // RHS =>

- // LHS => /\ /\ = __ __

+ // LHS =>

+ // RHS => /\ /\ = __ __

// _/__\_/__\_/\_/\_/\_ _/__\_/__\_/\_/\_/\_

Are LHS and RHS swapped here?

// _/__\_/__\_/\_/\_/\_ _/__\_/__\_/\_/\_/\_

this->checkUnite({{MIN, A}, {A + 2, B}}, {{MID, C}, {C + 2, D - 2}, {D, MAX}},

{{MIN, A}, {A + 2, B}, {MID, C}, {C + 2, D - 2}, {D, MAX}});

martongUnsubmitted

Not Done

// _/__\_/__\_/\_/\_/\_ _/__\_/__\_/\_/\_/\_

- this->checkUnite({{MIN, A}, {A + 2, B}}, {{MID, C}, {C + 2, D - 2}, {D, MAX}},

- {{MIN, A}, {A + 2, B}, {MID, C}, {C + 2, D - 2}, {D, MAX}});

+ this->checkUnite({{MIN, A}, {A + 2, B}}, // LHS

+ {{MID, C}, {C + 2, D - 2}, {D, MAX}}, // RHS

+ {{MIN, A}, {A + 2, B}, {MID, C}, {C + 2, D - 2}, {D, MAX}}); // Result

this->checkUnite({{MIN, MIN}, {A, A}}, {{B, B}, {C, C}, {MAX, MAX}},

I think we could better format these more complex cases.


ASDenysPetrovAuthorUnsubmitted

Done

clang-format acts on its own. But I agree, it looks way better. I'll consider wrapping it in // clang-format off/on directives.
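For reference, a minimal sketch of the directive pair mentioned here. The function alignedCase and its data are hypothetical and only illustrate the layout; the comments `// clang-format off` and `// clang-format on` are the markers clang-format recognizes, and the lines between them are left untouched by the formatter.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using RawRange = std::pair<int, int>;

// Hypothetical data for illustration; the values are not taken from the patch.
// Rows are: LHS, RHS, expected union.
inline std::vector<std::vector<RawRange>> alignedCase() {
  // clang-format off
  return {
      {{0, 5},          {10, 20}},  // LHS
      {        {6, 9}           },  // RHS
      {{0,                   20}},  // Result
  };
  // clang-format on
}

// Sanity check on the shape of the table above.
inline void checkShape() {
  assert(alignedCase().size() == 3);
}
```

The manual alignment survives reformatting, at the cost of two extra comment lines per block.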

this->checkUnite({{MIN, MIN}, {A, A}}, {{B, B}, {C, C}, {MAX, MAX}},

{{MIN, MIN}, {A, A}, {B, B}, {C, C}, {MAX, MAX}});

// RHS =>

// LHS => /\ /\ = __ __

martongUnsubmitted

Not Done

{{MIN, MIN}, {A, A}, {B, B}, {C, C}, {MAX, MAX}});

- // RHS =>

- // LHS => /\ /\ = __ __

+ // LHS =>

+ // RHS => /\ /\ = __ __

// _/\_/\_/\__/__\_/__\_ _/\_/\_/\_/__\_/__\_

Are LHS and RHS swapped?

// _/\_/\_/\__/__\_/__\_ _/\_/\_/\_/__\_/__\_

this->checkUnite({{C + 2, D - 2}, {D, MAX}}, {{MIN, A}, {A + 2, B}, {MID, C}},

{{MIN, A}, {A + 2, B}, {MID, C}, {C + 2, D - 2}, {D, MAX}});

martongUnsubmitted

Not Done

// _/\_/\_/\__/__\_/__\_ _/\_/\_/\_/__\_/__\_

- this->checkUnite({{C + 2, D - 2}, {D, MAX}}, {{MIN, A}, {A + 2, B}, {MID, C}},

- {{MIN, A}, {A + 2, B}, {MID, C}, {C + 2, D - 2}, {D, MAX}});

+ this->checkUnite({{C + 2, D - 2}, {D, MAX}}, // LHS

+ {{MIN, A}, {A + 2, B}, {MID, C}}, // RHS

+ {{MIN, A}, {A + 2, B}, {MID, C}, {C + 2, D - 2}, {D, MAX}}); // Result

this->checkUnite({{C, C}, {MAX, MAX}}, {{MIN, MIN}, {A, A}, {B, B}},


this->checkUnite({{C, C}, {MAX, MAX}}, {{MIN, MIN}, {A, A}, {B, B}},

{{MIN, MIN}, {A, A}, {B, B}, {C, C}, {MAX, MAX}});

// RHS =>

// LHS => _ /\ _ /\ _ /\ =

martongUnsubmitted

Not Done

Are LHS and RHS swapped?

// _/_\_/__\_/_\_/__\_/_\_/__\_

// RSLT => _ __ _ __ _ __

// _/_\_/__\_/_\_/__\_/_\_/__\_

this->checkUnite({{A + 2, B}, {MID + 2, C}, {D + 2, MAX}},

{{MIN, A}, {B + 2, MID}, {C + 2, D}},

{{MIN, A},

{A + 2, B},

{B + 2, MID},

{MID + 2, C},

{C + 2, D},

{D + 2, MAX}});

this->checkUnite({{A, A}, {C, C}, {MAX, MAX}}, {{MIN, MIN}, {B, B}, {D, D}},

{{MIN, MIN}, {A, A}, {B, B}, {C, C}, {D, D}, {MAX, MAX}});

// RHS =>

// LHS => /\ _ /\ _ /\ _ =

martongUnsubmitted

Not Done

LHS and RHS are swapped here as well.

// _/__\_/_\_/__\_/_\_/__\_/_\_

// RSLT => __ _ __ _ __ _

// _/__\_/_\_/__\_/_\_/__\_/_\_

this->checkUnite({{MIN, A}, {B + 2, MID}, {C + 2, D}},

{{A + 2, B}, {MID + 2, C}, {D + 2, MAX}},

{{MIN, A},

{A + 2, B},

{B + 2, MID},

{MID + 2, C},

{C + 2, D},

{D + 2, MAX}});

this->checkUnite({{MIN, MIN}, {B, B}, {D, D}}, {{A, A}, {C, C}, {MAX, MAX}},

{{MIN, MIN}, {A, A}, {B, B}, {C, C}, {D, D}, {MAX, MAX}});

// RHS => _ __ _

// LHS => /_\ /_ \ _ / \ = ___ ____________

// _/___\_/__\_\/_\/___\_ _/___\_/____________\_

this->checkUnite({{MIN, A}, {B, C}, {D, MAX}},

{{MIN, A}, {B, C - 2}, {C + 1, D - 1}},

{{MIN, A}, {B, MAX}});

this->checkUnite({{A, A}, {B, MID}, {D, D}},

{{A, A}, {B, B}, {MID + 1, D - 1}}, {{A, A}, {B, D}});

martongUnsubmitted

Not Done

this->checkUnite({{A, A}, {B, MID}, {D, D}},

- {{A, A}, {B, B}, {MID + 1, D - 1}}, {{A, A}, {B, D}});

+ {{A, A}, {B, B}, {MID + 1, D - 1}},

+ {{A, A}, {B, D}});

// RHS => ___ ___


// RHS => ___ ___

// LHS => /\ _/_ \_ / _ \ /\ =

// _/\_/__\//__\ /\\_/_/_\_\_/__\_

// RSLT => ___________ _____ __

// _/\_/___________\_/_____\_/__\_

this->checkUnite({{A, B - 1}, {B + 1, C - 1}, {C + 2, D}, {MAX - 1, MAX}},

{{MIN, MIN}, {B, MID}, {MID + 1, C}, {C + 4, D - 1}},

{{MIN, MIN}, {A, C}, {C + 2, D}, {MAX - 1, MAX}});

martongUnsubmitted

Not Done

// _/\_/___________\_/_____\_/__\_

- this->checkUnite({{A, B - 1}, {B + 1, C - 1}, {C + 2, D}, {MAX - 1, MAX}},

- {{MIN, MIN}, {B, MID}, {MID + 1, C}, {C + 4, D - 1}},

- {{MIN, MIN}, {A, C}, {C + 2, D}, {MAX - 1, MAX}});

+ this->checkUnite({{A, B - 1}, {B + 1, C - 1}, {C + 2, D}, {MAX - 1, MAX}}, // LHS

+ {{MIN, MIN}, {B, MID}, {MID + 1, C}, {C + 4, D - 1}}, // RHS

+ {{MIN, MIN}, {A, C}, {C + 2, D}, {MAX - 1, MAX}}); // Result

}


martongUnsubmitted

Not Done

// _/\_/___________\_/_____\_/__\_

- this->checkUnite({{A, B - 1}, {B + 1, C - 1}, {C + 2, D}, {MAX - 1, MAX}},

- {{MIN, MIN}, {B, MID}, {MID + 1, C}, {C + 4, D - 1}},

- {{MIN, MIN}, {A, C}, {C + 2, D}, {MAX - 1, MAX}});

- }

+ // clang-format off

+ this->checkUnite(

+ {{A, B - 1}, {B + 1, C - 1}, {C + 2, D}, {MAX - 1, MAX}},

+ {{MIN, MIN}, {B, MID}, {MID + 1, C}, {C + 4, D - 1}},

+ {{MIN, MIN}, {A, C}, {C + 2, D}, {MAX - 1, MAX}});

+ // clang-format on

+ }

What do you think about this format? I think the result can be verified easily this way, though it is a bit ugly...

ASDenysPetrovAuthorUnsubmitted

Done

I think the ASCII art does this job. Let code look like code :)

}
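The semantics these tests exercise (overlapping and adjacent ranges collapse into one) can be sketched outside the analyzer. The following is not the patch's implementation: it assumes plain closed intervals of int64_t held in sorted, disjoint vectors instead of RangeSet and llvm::APSInt, and the names Interval, IntervalSet and uniteSketch are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <utility>
#include <vector>

// Closed interval [first, second]; a set is sorted and pairwise disjoint.
using Interval = std::pair<std::int64_t, std::int64_t>;
using IntervalSet = std::vector<Interval>;

// Sketch of a union that merges intersecting and adjacent intervals.
IntervalSet uniteSketch(const IntervalSet &LHS, const IntervalSet &RHS) {
  // Interleave both inputs by lower bound (pairs compare lexicographically).
  IntervalSet All;
  All.reserve(LHS.size() + RHS.size());
  std::merge(LHS.begin(), LHS.end(), RHS.begin(), RHS.end(),
             std::back_inserter(All));

  IntervalSet Result;
  for (const Interval &I : All) {
    assert(I.first <= I.second && "intervals must be conventionally ordered");
    // Extend the last interval if I overlaps it or starts right after it.
    // The overlap test comes first, so "+ 1" is only evaluated when
    // back().second < I.first and therefore cannot overflow.
    if (!Result.empty() && (Result.back().second >= I.first ||
                            Result.back().second + 1 == I.first)) {
      Result.back().second = std::max(Result.back().second, I.second);
    } else {
      Result.push_back(I);
    }
  }
  return Result;
}
```

With this sketch, uniting {[0, 10]} with {[5, 20]} or with {[11, 20]} gives {[0, 20]} in both cases, matching the two examples in the revision summary, while detached ranges are kept as separate entries.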


Revision Contents

Diff 348294

clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp

clang/unittests/StaticAnalyzer/RangeSetTest.cpp
