This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Implemented RangeSet::Factory::castTo function to perform promotions, truncations and conversions
ClosedPublic

Authored by ASDenysPetrov on May 25 2021, 9:03 AM.

Details

Summary

Handle casts for ranges working similarly to APSIntType::apply function but for the whole range set. Support promotions, truncations and conversions.
Example:
Promotion: char [0, 42] -> short [0, 42] -> int [0, 42] -> llong [0, 42]
Truncation: llong [4295033088, 4295033130] -> int [65792, 65834] -> short [256, 298] -> char [0, 42]
Conversion: char [-42, 42] -> uint [0, 42]U[4294967254, 4294967295] -> short[-42, 42]

Diff Detail

Event Timeline

ASDenysPetrov created this revision.May 25 2021, 9:03 AM
ASDenysPetrov requested review of this revision.May 25 2021, 9:03 AM

OK, about the quadratic algorithm. I can think of one solution that is actually linear asymptotically, but because of the constant factor can be worse in practice (we usually have extremely small range sets). So, while I don't like it, it's probably fine.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h
246–247

That is a bit of a red flag because instantly my intuition tells me that there should be a linear option.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
682

We have LHS (left-hand side) and RHS (right-hand side) as a common convention for naming operands of binary operators.
The fact that you later on use it as an RHS for unite, doesn't make it a good name.

vsavchenko added inline comments.Jun 17 2021, 4:11 AM
clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h
249

If we talk about types for ranges, the interface we have is incomplete.
We should introduce something like getType for range sets, so that the user can have a more holistic picture about types and ranges.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
687

That's incorrect. uint64_t holds exactly this many values. If you count every number from 0 to UINT64_MAX, you get UINT64_MAX + 1, which is the number you are talking about.
Maybe you want to talk about the maximum stored value?

691–720

I think we should have a shortcut for widening casts, where we don't need to go and unite them. We just literally need to convert every range from one type to another and be done with it.

clang/unittests/StaticAnalyzer/RangeSetTest.cpp
137–149

I don't really understand what's going on here.

651–652

Yes, please.

700

Test cases don't belong in RangeSetTest.

Each piece of code like this should be a separate test case.

@vsavchenko Repled in inlines. Preparing an update.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h
246–247

Unfortunately, castTo is N^2 because technically we can call unite(which is N) inside it N times.

249

+1. Is there any place where we can put tasks as tech debt or just keep noting with FIXME/TODO's?

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
682

What about Tmp?

687

Maybe you want to talk about the maximum stored value?

Yes. I should rephrase the comment.
E.g.: 128 is an amount of all possible values of char and we can't keep it inside char.

691–720

we don't need to go and unite them.

Um, yes. 'unite' only needed for truncated range sets which have more then one range. I'll try to refactor this.
That will help make the algorithm linear for most cases.

clang/unittests/StaticAnalyzer/RangeSetTest.cpp
137–149

This is a generalized version of from function. But this one you can use independently from which type the current test case uses. You can set a custom type to get a range set for a given type. This allows to get range sets based on different types (e.g. char and int) in a single test case.
Previously from produced range sets only for the type specified by a test case.

ASDenysPetrov added inline comments.Jun 17 2021, 5:44 AM
clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h
246–247

Corrected.
Unfortunately, castTo is N^2 because technically we can call unite(which is N+M) inside it N times.

vsavchenko added inline comments.Jun 17 2021, 5:55 AM
clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h
246–247

Unfortunately, castTo is N^2 because technically we can call unite(which is N) inside it N times.

I know why you put N^2, and what I meant was that there is probably a better algorithm than that.

And as I said there is a linear algorithm:
If you have a smaller type that is, let's say, 8 times smaller than the larger type. We can split the original range set into 8 range sets and unite them.
This difference between type sizes is limited (because we don't support integer types of arbitrary bit width) aka is a constant. However, IRL the size of range sets is very small, and this constant factor can make it much slower than your N^2 algorithm.

But it doesn't mean that there is no better algorithm and that you HAVE TO call unite in a loop N times.

vsavchenko added inline comments.Jun 17 2021, 6:06 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
682

Maybe something like Dummy or Intermediate.

clang/unittests/StaticAnalyzer/RangeSetTest.cpp
137–149

OK then, if you make this from templated, make all of them templated and you won't need to duplicate the logic with sizeof * 8 everywhere.

Made changes according to comments. Optimized castTo function for each cast case. Simplified unit test.

vsavchenko added inline comments.Jun 23 2021, 7:47 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
356–366

We should add assert(!isEmpty()); in all three

702–733

Maybe we can extract this into something like truncateTo and the next if to convertTo private methods?

clang/unittests/StaticAnalyzer/RangeSetTest.cpp
121–122

Great, that works too!

124

Default to BaseType?

127

Default to BaseType?

131–132

Default to BaseType?

132

Unused parameter?

726–804

If loop and promotion share the same test case, why should we split them into two groups?

I'll update tommorow.

clang/unittests/StaticAnalyzer/RangeSetTest.cpp
124

It's implicitly deduced. I think it is not necessary.

132

Oh... :-)

726–804

Yes, the tests are identical but they use different testing::Types in the suites. Also I don't want to mix them to make problem localizing easier if occurs. But still I haven't strong preferences on that.

Added assertions. Added two helper functions for castTo method. Moved some distinct code to them. Fixed clang-tidy complaints.

ASDenysPetrov retitled this revision from [analyzer] Implemented RangeSet::Factory::castTo function to perform promotions, truncations and conversions. to [analyzer] Implemented RangeSet::Factory::castTo function to perform promotions, truncations and conversions.Nov 17 2021, 1:58 AM

Fixed missed part during rebasing in the unit test.

Herald added a project: Restricted Project. · View Herald TranscriptMar 23 2022, 10:13 AM

Ping.

I am still interested in this! I am sorry for being inactive with it lately, I hope I'll have time soon to continue the review of this patch stack.

Thanks for you patience Denys! Finally I had some time for the review. Nice work!

clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h
242
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
358

Probably it is unrelated to this patch, but
Could it happen that (++begin())->From().isUnsigned() gives a different signedness? Or we had a separate assertion when we added the second Range? The same question applies to the below two functions as well. Seems like in unite we don't have such validity check...

687

Could you please explain why do we need the || !What.isUnsigned() part of the condition?

This would make much more sense to me:

if (IsConversion && !IsPromotion)
    return makePersistent(convertTo(What, Ty));
690

This comment is confusing, since we can get here also if
isConversion is false and
isPromotion is false

690–711

I think it would be more consistent with the other operations (truncate and convert) if we had this logic in its own separate function: promoteTo. And this function should be called only if isPromotion is set (?)

740–741

?

773

We could certainly have a better name for this. E.g. SecondHalf. (And FirstHalf instead of Result?)

774–791

I think this comment would be better positioned as a header comment for the entire function.

792

Why not -> Range?

802

Would it be a better name:LastConvertedInt? The naming From is confusing because that suggests the value we cast from, however, this variable refers to the casted to values, isn't it?

812–814

I think this case (wrap) deserves a comment.

clang/unittests/StaticAnalyzer/RangeSetTest.cpp
731

It doesn't matter, becasue T and F are equal here, but for consistency with the other tests.

830

I am getting lost. Why don't you check against ToMIN and ToMAX here? Could you explain e.g. with int16->int8?

It is confusing that at many places you test against MIN, MAX, A, ... and the conversion is happening automatically by the checkCastTo template. Would it be more explicit to use everywhere ToMIN, ToA, ToB, ... and check against them?

877–882

Could you please elaborate what do we test here?

ASDenysPetrov added inline comments.Apr 15 2022, 6:10 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
358

Or we had a separate assertion when we added the second Range?

I didn't find any assertion while adding.
Moreover, I'm not sure if there can happen APSInts of different types in a single Range. We only have assert(From < To) in a ctor.

Seems like in unite we don't have such validity check...

Not only unite doesn't have such, but all the rest don't as well. We rely on a fact that the caller garantees the validity of RangeSet by using AdjustmentType.
But, yes, it's worth to make some checks.

687

Look. Here we handle 2 cases:

  • IsConversion && !IsPromotion. In this case we handle changing a sign with same bitwidth: char -> uchar, uint -> int. Here we convert negatives to positives and positives which is out of range to negatives. We use convertTo function for that.
  • IsConversion && IsPromotion && !What.isUnsigned(). In this case we handle changing a sign from signeds to unsigneds with higher bitwidth: char -> uint, int-> uint64. The point is that we also need convert negatives to positives and use convertTo function as well. For example, we don't need such a convertion when converting unsigned to signed with higher bitwidth, because all the values of unsigned is valid for the such signed.
690

Nothing confusing is here.
We have 7 main cases:
NOOP: u -> u, s -> s
Conversion: u -> s, s -> u
Truncation: U-> u, S -> s
Promotion: u->U, s -> S
Truncation + Conversion: U -> s, S -> u
Promotion + Conversion: s -> U, u -> S

As you can see, we've handled all the bolds above this line .
So only promotions from unsigneds left. That means, isPromotion never should be false here. We could add an assertion here if you want.

690–711

That makes sense. I'll do.

792

Because Range requires from < to in its contract. But I need to store the values after conversions. You can see a check if (NewBounds.first > NewBounds.second) on line #812.

clang/unittests/StaticAnalyzer/RangeSetTest.cpp
830

We can't use ToMIN, ToMAX, ... everywhere. That would be incorrect:
int16(-32768, 32767) -> int8(-128, 127), aka (MIN, MAX) -> (ToMIN, ToMAX) // OK.
int16(-32768, -32768) -> int8(-128, -128), aka (MIN, MIN) -> (ToMIN, ToMIN) // NOK.
int16(-32768,-32768) -> int8(0, 0), aka (MIN, MIN) -> ((int8)MIN, (int8)MIN) // OK.

877–882

E.g.

                 RangeA                   U                   RangeB
(0000'1111'0000'0100, 0000'1111'0000'0111)U(1111'0000'0000'0100, 1111'0000'0000'0111)
truncates to
                 RangeC                   U                   RangeC
(          0000'0100,           0000'0111)U(          0000'0100,           0000'0111)
As you can see, we got two equal truncated ranges.
                           RangeC
Then unite them to (0000'0100, 0000'0111).

Updated according to suggestions.
@martong thank you for the review.

ASDenysPetrov marked 33 inline comments as done.Apr 15 2022, 9:09 AM
martong accepted this revision.Apr 19 2022, 6:27 AM

Thanks, LGTM! With minor revisions.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
687

Ok, makes sense. Could you please attach your explanation as a comment then into the code?

690

So only promotions from unsigneds left. That means, isPromotion never should be false here. We could add an assertion here if you want.

Yeah, I think that would be useful, to enforce that the precondition for promoteTo holds all the time (even after future changes).

749
clang/unittests/StaticAnalyzer/RangeSetTest.cpp
830

Okay makes sense, thanks for the explanation. Nevertheless, I think this explanatory comment would be useful to have in the code.

877–882

Ok.

This revision is now accepted and ready to land.Apr 19 2022, 6:27 AM
This revision was landed with ongoing or failed builds.Apr 19 2022, 12:34 PM
This revision was automatically updated to reflect the committed changes.

Loaded with comment updateds according to the remarks.
@martong thank you for your time for the review!