This is an archive of the discontinued LLVM Phabricator instance.

Implement BufferOverlap check for sprint/snprintf
ClosedPublic

Authored by ArnaudBienner on May 12 2023, 1:08 AM.

Details

Summary

I discussed about this with you @dergachev.a a long time ago on the mailing list ("static or dynamic code analysis for undefined behavior in sprintf") and finally took the time to implement something :)

Let me know what you think about this.

Diff Detail

Event Timeline

ArnaudBienner created this revision.May 12 2023, 1:08 AM
Herald added a project: Restricted Project. · View Herald Transcript
ArnaudBienner published this revision for review.May 12 2023, 1:16 AM
ArnaudBienner edited the summary of this revision. (Show Details)
ArnaudBienner added a subscriber: dergachev.a.
Herald added a project: Restricted Project. · View Herald TranscriptMay 12 2023, 1:16 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
ArnaudBienner added inline comments.May 12 2023, 1:01 PM
clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
2385

Just realized this should be 2 (position 3, but index 2).

Will upload a newer version of the patch with this change, and new tests

Fix start index for sprintf ovlerap check + tests

Updating D150430: Implement BufferOverlap check for sprint/snprintf

Updating D150430: Implement BufferOverlap check for sprint/snprintf

I like it.

How about an implementation like this:

void CStringChecker::evalSprintfCommon(CheckerContext &C, const CallExpr *CE,
                                       bool IsBounded) const {
  ProgramStateRef State = C.getState();
  DestinationArgExpr Dest = {CE->getArg(0), 0};

  const auto NumParams = CE->getCalleeDecl()->getAsFunction()->getNumParams();
  assert(CE->getNumArgs() >= NumParams);

  auto AllArguments =
      llvm::make_range(CE->getArgs(), CE->getArgs() + CE->getNumArgs());
  auto VariadicArguments = drop_begin(enumerate(AllArguments), NumParams);

  for (auto [ArgIdx, ArgExpr] : VariadicArguments) {
    // We consider only string buffers
    if (QualType PointeeTy = ArgExpr->getType()->getPointeeType();
        PointeeTy.isNull() || !PointeeTy->isAnyCharacterType())
      continue;
    SourceArgExpr Source = {ArgExpr, unsigned(ArgIdx)};

    // Ensure the buffers do not overlap.
    SizeArgExpr SrcExprAsSizeDummy = {Source.Expression, Source.ArgumentIndex};
    State = CheckOverlap(
        C, State,
        (IsBounded ? SizeArgExpr{CE->getArg(1), 1} : SrcExprAsSizeDummy), Dest,
        Source);
    if (!State)
      return;
  }

  C.addTransition(State);
}
  • Using ranges to iterate over the variadic arguments. This way it will just work with both builtins and whatnot.
  • Adding a transition to preserve the gained knowledge about the relation of the source and destination buffers. In this case, it won't be really effective, but at least we obey this convention.

Maybe add a few test cases where the warning should not be present.
And it is also valuable to have at least one test case asserting the complete message.

clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
179–180

Let's express that we expect at least 2 arguments at the callsites.

  • Code review comments

Thanks for the review :)

I implemented the suggested changes.

Just one question: PointeeTy.isNull(): is this guaranteed to always work even though getType()->isAnyPointerType() == false?

I'm assuming yes since the tests still pass, but wanted to confirm this is the best way to do this check for char* arguments.

Thanks for the review :)

I implemented the suggested changes.

Just one question: PointeeTy.isNull(): is this guaranteed to always work even though getType()->isAnyPointerType() == false?

I'm assuming yes since the tests still pass, but wanted to confirm this is the best way to do this check for char* arguments.

I was also surprised, but check the definition of that function. It will try to cast the nonnull type to a list of types, and return null if didnt match.

Giving this a second thought, I feel like the initial check:

if (!ArgExpr->getType()->isAnyPointerType() ||
    !ArgExpr->getType()->getPointeeType()->isAnyCharacterType())

is better than the new one. To me it reads like "expr type is a pointer and it points to character type" which is more understandable IMHO.

If you're worried about the expression being a bit long, I could move type to a temp variable:

if (const QualType type = ArgExpr->getType();
    !type->isAnyPointerType() ||
    !type->getPointeeType()->isAnyCharacterType())

Though I'm not sure this is really more readable.

What do you think?

Any other suggestion/comment about this patch?

@steakhal did you get a chance to look at my comment?

I would really love to see this merged upstream if you think this could be a beneficial change :)

steakhal accepted this revision.May 29 2023, 2:32 AM

Yea, it still looks okay. Let me know if you don't have commit access. In that case, also let me know what should be used for the commit author.

This revision is now accepted and ready to land.May 29 2023, 2:32 AM

Ah, I can see that pre-merge bots failed due to clang-format violations.
Please make all the touched lines as clang-formatted prior to committing anything.

  • Code review comments: change back the string buffer check + run clang format
steakhal accepted this revision.May 30 2023, 1:56 AM

Can you commit this change or should I do it on your behalf?

  • clang-format fix
This revision was automatically updated to reflect the committed changes.

Thanks @steakhal for the proposal :) but I pushed it myself: https://github.com/llvm/llvm-project/commit/ce97312d109b21acb97d3ea243e214f20bd87cfc

I used to have svn access, and Chris kindly give me write access to the git repository.

@steakhal should the release notes page be updated to add this?

I think this could be beneficial for the users to be aware of that change.

If yes, who is responsible for doing that? Developers? (me) or someone else is taking care of reviewing the list of changes since latest releases?

TBH we don't really have processes there. And its a bit of a mess, regarding CSA. Usually we forgot to update the release notes, and at best we aggregate the changes just prior a release for the release notes. But we haven't done it for the last two releases for sure.

I was thinking of asking you, but given this context we wouldn't gain much. We would still need to go through the diffs anyway. So, let not touch the notes now.