This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] StdLibraryFunctionsChecker refactor w/ inheritance
ClosedPublic

Authored by martong on Feb 21 2020, 9:50 AM.

Details

Summary

Currently, ValueRange is very hard to extend with new kind of constraints.
For instance, it forcibly encapsulates relations between arguments and the
return value (ComparesToArgument) besides handling the regular value
ranges (OutOfRange, WithinRange).
ValueRange in this form is not suitable to add new constraints on
arguments like "not-null".

This refactor introduces a new base class ValueConstraint with an
abstract apply function. Descendants must override this. There are 2
descendants: RangeConstraint and ComparisonConstraint. In the following
patches I am planning to add the NotNullConstraint, and additional
virtual functions like negate() and warning().

Diff Detail

Event Timeline

martong created this revision.Feb 21 2020, 9:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 21 2020, 9:50 AM
martong marked an inline comment as done.Feb 21 2020, 10:06 AM
martong added inline comments.
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
158

Note here, we need a copyable, polymorphic and default initializable type (vector needs that). A raw pointer were good, however, we cannot default initialize that. unique_ptr makes the Summary class non-copyable, therefore not an option.
Releasing the copyablitly requirement would render the initialization of the Summary map infeasible.
Perhaps we could come up with a type erasure technique without inheritance once we consider the shared_ptr as restriction, but for now that seems to be overkill.

Is really more kind of constraint needed than range constraint? A non-null can be represented as range constraint too. The compare constraint is used only for the return value for which a special ReturnConstraint can be used to handle the return value not like a normal argument (and then the Ret special value is not needed). Or are there sometimes relations between arguments of a function?

Is really more kind of constraint needed than range constraint?

Yes, there are other constraints I am planning to implement:

  • Size requirements E.g.: asctime_s(char *buf, rsize_t bufsz, const struct tm *time_ptr); buf size must be at least bufsz.
  • Not-null
  • Not-uninitalized
  • Not-tainted

A non-null can be represented as range constraint too.

Actually, to implement that we should have a branch in all ValueRange::apply* functions that handles Loc SVals. Unfortunately, a pointer cannot be handled as NonLoc, and the current Range based implementation handles NonLocs only.

The compare constraint is used only for the return value for which a special ReturnConstraint can be used to handle the return value not like a normal argument (and then the Ret special value is not needed).

The Compare constraint is already forced into a Range "concept" whereas it has nothing to do with ranges. By handling compare constraints separately, we attach a single responsibility to each constraint class, instead of having a monolithic god constraint class. Take a look at this coerced data representation that we have today in ValueRange:

BinaryOperator::Opcode getOpcode() const {
  assert(Kind == ComparesToArgument);
  assert(Args.size() == 1);
  BinaryOperator::Opcode Op =
      static_cast<BinaryOperator::Opcode>(Args[0].first);
  assert(BinaryOperator::isComparisonOp(Op) &&
         "Only comparison ops are supported for ComparesToArgument");
  return Op;
}

Subclasses are a good way to get rid of this not-so-intuitive structure and assertions.

Or are there sometimes relations between arguments of a function?

I can't recall now a direct relation by heart.
But there could be more subtle indirect relations, see the size requirements above. (Thought I'd rather implement that in a separate constraint class.)

FYI I've been seeing your patches to this checker and I will respond to them, but I need to do some learning on my own before having the confidence to accept or request changes. Working on it!

FYI I've been seeing your patches to this checker and I will respond to them, but I need to do some learning on my own before having the confidence to accept or request changes. Working on it!

No worries, just take your time! :) In the meanwhile, I am trying to implement the "non-null" argument constraint based on these two patches.

@balazske See https://reviews.llvm.org/D75063 about the simple independent implementation of the NotNull constraint.

martong marked an inline comment as done.Feb 24 2020, 10:01 AM
martong added inline comments.
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
92

virtual dtor is missing.

gamesh411 added inline comments.Feb 27 2020, 3:15 AM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
158

std::variant (with std::monostate for the default constructibility) would also be an option (if c++17 were supported). But this is not really an issue, i agree with that.

I have some high level questions, you have spent far more time with this code and I'm happy to be proven wrong! :)

Is really more kind of constraint needed than range constraint?

Yes, there are other constraints I am planning to implement:

  • Size requirements E.g.: asctime_s(char *buf, rsize_t bufsz, const struct tm *time_ptr); buf size must be at least bufsz.
  • Not-null
  • Not-uninitalized
  • Not-tainted

Are we really sure that we need to express that with constraints? Can't we just change the name of ValueRange (or encapsulate it in another class) and add more fields to it, such as taintedness or initializedness? Is there an incentive to keep ValueRange lean?

This doesn't look too bad:

auto Getc = [&]() {
  return Summary(ArgTypes{Irrelevant}, RetType{IntTy}, NoEvalCall)
      .Case(
          {ReturnValueDescription(RangeConstraints(WithinRange, {{EOFv, EOFv}, {0, UCharMax}},
                                  Tainted, Non_Uninitialized});
};

A non-null can be represented as range constraint too.

Actually, to implement that we should have a branch in all ValueRange::apply* functions that handles Loc SVals. Unfortunately, a pointer cannot be handled as NonLoc, and the current Range based implementation handles NonLocs only.

So, why didn't we take that route instead? Marking a pointer non-null seems to be a less invasive change.

The compare constraint is used only for the return value for which a special ReturnConstraint can be used to handle the return value not like a normal argument (and then the Ret special value is not needed).

The Compare constraint is already forced into a Range "concept" whereas it has nothing to do with ranges. By handling compare constraints separately, we attach a single responsibility to each constraint class, instead of having a monolithic god constraint class. Take a look at this coerced data representation that we have today in ValueRange:

BinaryOperator::Opcode getOpcode() const {
  assert(Kind == ComparesToArgument);
  assert(Args.size() == 1);
  BinaryOperator::Opcode Op =
      static_cast<BinaryOperator::Opcode>(Args[0].first);
  assert(BinaryOperator::isComparisonOp(Op) &&
         "Only comparison ops are supported for ComparesToArgument");
  return Op;
}

Subclasses are a good way to get rid of this not-so-intuitive structure and assertions.

Having more fields feels like another possible solution.

clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
158

Ugh, we've historically been very hostile towards virtual functions. We don't mind them that much when they don't have to run a lot, like during bug report construction, but as a core part of the analysis, I'm not sure what the current stance is on it.

I'm not inherently (haha) against it, and I'm fine with leaving this as-is for the time being, though I'd prefer if you placed a TODO to revisit this issue.

martong marked 4 inline comments as done.Mar 6 2020, 9:23 AM

I have some high level questions, you have spent far more time with this code and I'm happy to be proven wrong! :)

Is really more kind of constraint needed than range constraint?

Yes, there are other constraints I am planning to implement:

  • Size requirements E.g.: asctime_s(char *buf, rsize_t bufsz, const struct tm *time_ptr); buf size must be at least bufsz.
  • Not-null
  • Not-uninitalized
  • Not-tainted

Are we really sure that we need to express that with constraints? Can't we just change the name of ValueRange (or encapsulate it in another class) and add more fields to it, such as taintedness or initializedness? Is there an incentive to keep ValueRange lean?

Yes there is: separation of concerns, i.e. ValueRange should handle Values with int ranges. One class should handle only one well established responsibility.

This doesn't look too bad:

auto Getc = [&]() {
  return Summary(ArgTypes{Irrelevant}, RetType{IntTy}, NoEvalCall)
      .Case(
          {ReturnValueDescription(RangeConstraints(WithinRange, {{EOFv, EOFv}, {0, UCharMax}},
                                  Tainted, Non_Uninitialized});
};

A non-null can be represented as range constraint too.

Actually, to implement that we should have a branch in all ValueRange::apply* functions that handles Loc SVals. Unfortunately, a pointer cannot be handled as NonLoc, and the current Range based implementation handles NonLocs only.

So, why didn't we take that route instead? Marking a pointer non-null seems to be a less invasive change.

The answer is the same here. I think we should not mix too much implementation of different cases into one monumental function/class.

The compare constraint is used only for the return value for which a special ReturnConstraint can be used to handle the return value not like a normal argument (and then the Ret special value is not needed).

The Compare constraint is already forced into a Range "concept" whereas it has nothing to do with ranges. By handling compare constraints separately, we attach a single responsibility to each constraint class, instead of having a monolithic god constraint class. Take a look at this coerced data representation that we have today in ValueRange:

BinaryOperator::Opcode getOpcode() const {
  assert(Kind == ComparesToArgument);
  assert(Args.size() == 1);
  BinaryOperator::Opcode Op =
      static_cast<BinaryOperator::Opcode>(Args[0].first);
  assert(BinaryOperator::isComparisonOp(Op) &&
         "Only comparison ops are supported for ComparesToArgument");
  return Op;
}

Subclasses are a good way to get rid of this not-so-intuitive structure and assertions.

Having more fields feels like another possible solution.

Yes, having fields is another approach of having an enum and a union (i.e. tagged union). A tagged union is a variant basically. So adding more fields would finally be very similar to a variant based solution. And that is useful only if we know the set of classes beforehand and we want to gradually implement new operations on them.
In our case we know that we need 3 operations: apply, negate, warning and no more. And we want to gradually add new classes: NonNull, NotUninitialized, ...

clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
158

std::variant (with std::monostate for the default constructibility) would also be an option (if c++17 were supported). But this is not really an issue, i agree with that.

Variant would be useful if we knew the set of classes prior and we wanted to add operations gradually. Class hierarchies (or run-time concepts [Sean Parent]) are very useful if we know the set of operations prior and we want to add classes gradually, and we have this case here.

158

Ugh, we've historically been very hostile towards virtual functions. We don't mind them that much when they don't have to run a lot, like during bug report construction, but as a core part of the analysis, I'm not sure what the current stance is on it.

I did not find any evidence for this statement. Consider as a counter example the ExternalASTSource interface in Clang, which is filled with virtual functions and is part of the C/C++ lookup mechanism, which is quite on the hot path of C/C++ parsing I think. Did not find any prohibition in LLVM coding guidelines neither. I do believe virtual functions have their use cases exactly where (runtime) polimorphism is needed, such as in this patch.

martong marked an inline comment as done and an inline comment as not done.Mar 6 2020, 9:26 AM

Ah, okay I think I got why you chose this direction. Summaries are little more then a collection of constraints, and at select points in the execution we check and apply them one-by-one. If we want to preserve this architecture (and it seems great, why not?), inheritance is indeed the correct design decision.

The costliest code to run according to my limited knowledge about C++ development is indeed the one that is hard to understand and maintain, and this seems to have a good bit of thought behind it. I'll check your followup patches to gain some confidence before formally accepting, but the general idea seems great.

clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
87–88

Why did we remove the initialization here? Can we make this constexpr?

89

We should totally have a good bit of documentation here.

158

I consider myself proven wrong here then.

martong marked 2 inline comments as done.Mar 10 2020, 7:40 AM
martong added inline comments.
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
158

Thanks for the review and for considering other alternatives! And please accept my apologies, maybe I was pushing too hard on inheritance.

Szelethus added inline comments.Mar 11 2020, 8:08 AM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
158

We should definitely decorate this with a TODO: Can we change this to not use a shared_ptr?. Worst case scenario, there it will stay for eternity :)

clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
101

I would move this into the class to encapsulate the values instead of contaminating namespace ento.

158

FIXME is the official, not TODO, afaik.

158

I think that inheritance is the right approach here. However, if it is unacceptable for performance reasons it could be replaced by a template-based solution.

NoQ accepted this revision.Mar 15 2020, 8:46 PM

I like this! Please address the remaining nits^^

clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
101

It's checker-local anyway and i suspect we write these a lot in the summaries, so let's keep it in the global scope.

This revision is now accepted and ready to land.Mar 15 2020, 8:46 PM
martong marked 12 inline comments as done.Mar 17 2020, 5:15 AM
martong added inline comments.
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
87–88

I am receiving an undefined reference linker error (I use gold) if the initialization is happening in-class.
Even if constexpr is used.

/usr/bin/ld.gold: error: tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/StdLibraryFunctionsChecker.cpp.o: requires dynamic R_X86_64_PC32 reloc against '_ZN12_GLOBAL__N_126StdLibraryFunctionsChecker3RetE' which may overflow at runtime; recompile with -fPIC
tools/clang/lib/StaticAnalyzer/Checkers/CMakeFiles/obj.clangStaticAnalyzerCheckers.dir/StdLibraryFunctionsChecker.cpp.o:StdLibraryFunctionsChecker.cpp:function (anonymous namespace)::StdLibraryFunctionsChecker::initFunctionSummaries(clang::ento::CheckerContext&) const: error: undefined reference to '(anonymous namespace)::StdLibraryFunctionsChecker::Ret'
89

Ok, I added some comments to the class and to apply as well.

158

Actually, this is not necessary something that we need to fix or do. So, instead of the TODO I have added a comment that explains why do we use the shared_ptr.

martong updated this revision to Diff 250732.Mar 17 2020, 5:15 AM
martong marked 3 inline comments as done.
  • Rebase to master
  • Add comments to ValueConstraint
  • Add virtual dtor
  • Add comments to ValueConstraintPtr
This revision was automatically updated to reflect the committed changes.

Please have my post-commit approval :^) Nice work!

Please have my post-commit approval :^) Nice work!

Thanks! :)