This is an archive of the discontinued LLVM Phabricator instance.

[-Wunsafe-buffer-usage] Move the whole analysis to the end of a translation unit
ClosedPublic

Authored by ziqingluo-90 on Mar 17 2023, 5:12 PM.

Details

Summary

Our analysis requires a complete view of the translation unit to be conservative. As mentioned in this patch, we need to know if there is any function overload of a function f declared after f. In addition, we later may want to make global variables safe too. In such a case, we need to know if a global variable is used somewhere in the translation unit. Moreover, the analysis now can ignore ill-formed code detected at the end of a TU.

A summary of the change:
Add a new IssueWarnings function in AnalysisBasedWarnings.cpp for TU-based analyses. So far [-Wunsafe-buffer-usage] is the only analysis using it but there could be more. Sema will call the new IssueWarnings function at the end of parsing a TU.

This patch is still work in progress as the existence of the following concerns:

  1. [No] Can we move everything in AnalysisBasedWarnings.cpp to Sema? So far AnalysisBasedWarnings is used to bridge Sema and UnsafeBufferAnalysis so that the changes are minimal.
  2. [Done] We probably need a more efficient TU traversal implementation.
  3. [Solved] Current tests are mostly fine except that some notes with message "in instantiation of ... " are missing. Although these notes are not emitted by our analysis, we better understand why things change.
  4. [Done] To test this patch on a branch with all ongoing [-Wunsafe-buffer-usage] patches.
  5. Maybe there are better solutions? (Looking for comments!)

Diff Detail

Event Timeline

ziqingluo-90 created this revision.Mar 17 2023, 5:12 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 17 2023, 5:12 PM
Herald added a subscriber: ChuanqiXu. · View Herald Transcript
ziqingluo-90 requested review of this revision.Mar 17 2023, 5:12 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 17 2023, 5:12 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
NoQ added a subscriber: usama54321.Mar 20 2023, 6:27 PM

Thank you for a detailed summary, it's very helpful!

Can we move everything in AnalysisBasedWarnings.cpp to Sema? So far AnalysisBasedWarnings is used to bridge Sema and UnsafeBufferAnalysis so that the changes are minimal.

Currently Sema is completely oblivious to what specific analysis-based-warnings get issued per-function, and I see no need to do it differently for per-translation-unit warnings. (See also the inline comment.)

Current tests are mostly fine except that some notes with message "in instantiation of ... " are missing. Although these notes are not emitted by our analysis, we better understand why things change.

This sounds familiar, I think @usama54321 ran into a similar problem from inside clang-tidy: https://discourse.llvm.org/t/clang-tidy-diagnostics-for-template-code/62909

clang/include/clang/Sema/AnalysisBasedWarnings.h
96–102

Let's make the name generic. It doesn't really matter to the caller which specific warnings are emitted here (just like in the original method). We're likely to add more warnings of this kind in the future. What we're really doing in this patch, is create a new home for them to live in, so let's call it what it is!

Current tests are mostly fine except that some notes with message "in instantiation of ... " are missing. Although these notes are not emitted by our analysis, we better understand why things change.

This sounds familiar, I think @usama54321 ran into a similar problem from inside clang-tidy: https://discourse.llvm.org/t/clang-tidy-diagnostics-for-template-code/62909

Thank you for the reference. I checked out the code printing instantiation stacks:

void PrintContextStack() {
  if (!CodeSynthesisContexts.empty() &&
      CodeSynthesisContexts.size() != LastEmittedCodeSynthesisContextDepth) {
    PrintInstantiationStack();
    LastEmittedCodeSynthesisContextDepth = CodeSynthesisContexts.size();
  }
...

The CodeSynthesisContexts keeps track of instantiation stack during parsing. It's not a surprise that CodeSynthesisContexts is empty by the end of parsing. That's why no such note gets printed.

ziqingluo-90 edited the summary of this revision. (Show Details)Mar 21 2023, 4:35 PM
ziqingluo-90 edited the summary of this revision. (Show Details)Mar 21 2023, 5:29 PM
NoQ added a comment.Mar 22 2023, 11:28 AM

Yeah this information is probably at least partially unavailable in the fully parsed AST. We have all the instantiations with their AST copied from the original template AST, but we no longer remember *why* we copied it (which is what the CodeSynthesisContext stack captures) . And at this point there could actually be more than one reason (whereas the regular compiler warnings showed up the *first* time the instantiation was requested, so at that point there was just one reason *so far*).

I wonder if it's possible to build an "example" reason after the fact. There's probably no readily available solution but it might be possible to build one with relative ease 🤔 I'm not sure we need to do this for this patch.

But as a smaller subtask of this task, it's probably really valuable to at least make sure the current template arguments are explained. Eg., in

template <typename T>
T addFive(T t) {
  return t + 5;
}

void foo() {
  int *x = ...;
  x = addFive(x);
}

it's very important to make sure that the warning about unsafe buffer operation t + 5 carries a hint that T is int *. Without that the warning may be completely incomprehensible.

The template specialization information is still available but we may need to print it by ourselves.
I will add a FIXME there and we will deal with it in a follow-up patch.

Address comments.

ziqingluo-90 retitled this revision from [WIP][-Wunsafe-buffer-usage] Move the whole analysis to the end of a translation unit to [-Wunsafe-buffer-usage] Move the whole analysis to the end of a translation unit.Apr 6 2023, 4:16 PM
NoQ added inline comments.Apr 19 2023, 5:15 PM
clang/lib/Analysis/UnsafeBufferUsage.cpp
1478 ↗(On Diff #508249)

I think this entire matcher business can easily live in AnalysisBasedWarnings::IssueWarnings(TU). Just because we're running at the end of TU, doesn't mean each warning will need to figure out how to traverse it. In our case the warning still acts on function-by-function basis, it just benefits from other functions being already parsed. I suspect that this is going to be the case for almost every other analysis-based warning we ever put there. So I think let's run the matcher once in IssueWarnings(TU) , for all warnings at once, and then invoke void checkUnsafeBufferUsage(const Decl *D, ...) directly from inside it, just like we did before the patch. If we ever run into a warning that really needs a TranslationUnitDecl, we can always invoke it from outside the matcher.

Move the traversal framework to AnalysisBasedWarnings so that the UnsafeBufferUsage analyzer (as well as other possible analyzers) is still function based. It is in the same pattern as the original IssueWarnings function.

ziqingluo-90 marked an inline comment as done.Apr 26 2023, 6:10 PM
NoQ added inline comments.May 2 2023, 1:16 PM
clang/lib/Sema/AnalysisBasedWarnings.cpp
2323–2326

RecursiveASTVisitor does this automatically. Just ask it to Visit(TU).

2338

The intended way to make recursive visitors is to have TraverseFunctionDecl, TraverseBlockDecl, etc., which automatically do the isa check and fall through to the superclass if the method isn't defined. Then they can all call a common method to analyze the body.

This doesn't necessarily lead to better code though, just to more traditional code, so up to you^^ in this case it probably doesn't matter. But it's definitely nice to separate the common part (the part that invokes all individual analyses) into its own method. And then you can eg. isolate the special hasBody() check in the TraverseFunctionDecl() method, so that other code paths didn't have an extra runtime check.

2346–2348

I wonder if ObjCMethodDecl has the same problem. They too can have prototypes and definitions separate from each other.

2366–2367

I suspect that you might have reinvented shouldVisitLambdaBody().

ziqingluo-90 marked 3 inline comments as done.

Addressed the comments:

  • 1) about using a more traditional way of AST traversal; and
  • 2) about the concern on ObjcMethodDecl traversal.
clang/lib/Sema/AnalysisBasedWarnings.cpp
2338

yeah, I think overriding VisitFunctionDecl and others separately is a better form because I was clearly dealing with FunctionDecl and others case by case there. So put the specific logic for FunctionDecl in VisitFunctionDecl makes it clear.

Thanks for your advice.

2346–2348

It looks like hasBody() works properly for ObjCMethodDecl. I'm adding a test for it.

ziqingluo-90 marked an inline comment as done.May 5 2023, 1:21 PM
ziqingluo-90 added inline comments.
clang/lib/Sema/AnalysisBasedWarnings.cpp
2366–2367

I tried shouldVisitLambdaBody but it didn't work (I didn't look into why though)

ziqingluo-90 marked an inline comment as done.
ziqingluo-90 edited the summary of this revision. (Show Details)May 5 2023, 1:32 PM

Visit LambdaExpr properly

Clean up the code for early return in case of ignoring unsafe_buffer_usage warnings.

NoQ added inline comments.May 10 2023, 7:24 PM
clang/lib/Sema/AnalysisBasedWarnings.cpp
2341

This code will grow bigger when more warnings are added, and it's quite likely that most warnings would want to duplicate themselves in each of the Visit methods. Maybe let's make a single function from which all per-function warnings are invoked?

Also, hmm, UnsafeBufferUsageReporter can probably be reused between callables.

Actually, here's an even fancier approach: maybe let's move the visitor outside the function so that it didn't get in the way of readability, and run warnings in a lambda so that we didn't need to explicitly capture anything:

class CallableVisitor {
  llvm::function_ref<void(const Decl *)> callback;

public:
  CallableVisitor(llvm::function_ref<void(const Decl *)> callback):
    callback(callback)
  {}

  VisitFunctionDecl(FunctionDecl *Node) {
      if (Node->doesThisDeclarationHaveABody()) {
        callback(Node);
      }
  }

  // More Visit() methods.
};

void clang::sema::AnalysisBasedWarnings::IssueWarnings(const TranslationUnitDecl *TU) {
  // Common whole-TU safety checks go here.
  if (S.hasUncompilableErrorOccurred() || Diags.getIgnoreAllWarnings())
    return;

  // Whole-TU variables for all warnings go here. Then they're implicitly captured,
  // so no need to list them in the constructor anymore.
  UnsafeBufferUsageReporter UnsafeBufferUsageDiags(S);
  const bool UnsafeBufferUsageEmitFixits =
      Diags.getDiagnosticOptions().ShowFixits && S.getLangOpts().CPlusPlus20;
 
  CallableVisitor([&](const Decl *Node) {
    // Per-decl code for all warnings goes here.
    if (!Diags.isIgnored(diag::warn_unsafe_buffer_operation, D->getBeginLoc()) ||
        !Diags.isIgnored(diag::warn_unsafe_buffer_variable, D->getBeginLoc())) {
      checkUnsafeBufferUsage(Node, UnsafeBufferUsageDiags, UnsafeBufferUsageEmitFixits);
    }

  }).TraverseTranslationUnitDecl(TU);
}

I think that's a pretty neat way to organize this. This way it's immediately clear which code is executed how many times.

2370–2372

Ooo interesting, there is indeed a good reason to keep those checks outside; we shouldn't make an entire extra pass when none of the analysis-based warnings are enabled.

I suggest making the comment a bit more generic, so that users felt empowered to add more checks there and reuse the visitor for their warnings.

Address comments: refactor the callable-definition visitor to take a lambda Callback, who calls various analyses that need whole-TU information. If one needs to add a new such analysis later, just add a call in the Callback function.

NoQ accepted this revision.May 11 2023, 3:42 PM

Ok looks great to me now!

clang/lib/Sema/AnalysisBasedWarnings.cpp
2364

So WDYT about capturing the reporter from outside lambda instead?

This revision is now accepted and ready to land.May 11 2023, 3:42 PM
ziqingluo-90 edited the summary of this revision. (Show Details)May 12 2023, 11:42 AM
ziqingluo-90 marked an inline comment as done.
ziqingluo-90 added inline comments.
clang/lib/Sema/AnalysisBasedWarnings.cpp
2364

The Reporter is stateless so we definitely can let analyses of functions share it.

This revision was landed with ongoing or failed builds.May 12 2023, 11:52 AM
This revision was automatically updated to reflect the committed changes.
ziqingluo-90 marked an inline comment as done.