Page MenuHomePhabricator

[analyzer] Add StackSizeChecker to StaticAnalyzer

Authored by mate1214 on Sep 22 2018, 4:04 AM.



This checker can be used to warn about potential stack overflows or be used to estimate the size of used stack space. Both Clang Static Analyzer and Clang-Tidy versions exist.

The Clang-Tidy version references the StackUsageMeasuringVisitor class included in this commit so it can only come after this.

Both versions work by examining function bodies via the AST:

  • the static analyzer version can follow and summarize the space used by call chains
  • while the Tidy one version can analyze function bodies faster, one by one.

This version emits warnings if the estimated stack size surpasses the StackUsageLimit parameter, which is measured in bytes and defaults to 100 000. Setting it to some small value can be used to turn the checker into a statistical tool since the calculated values are included in the warning messages.

The calculations used for the size estimations do not take into account potential compile-time optimizations and contain some
minor simplifications. Only the information stored in the AST is used and variable lifetime rules are respected. The calculations in a particular function body are carried out by the StackUsageMeasuringVisitor class, which gives a composite result about a piece of the AST containing the maximal estimated space, the space that remain in use after the execution of those lines and flags about encountered variable length arrays or special nodes that satisfy a given predicate (e.g.: templates).

The current version takes no special actions upon encountering variable length arrays, the Tidy version has a simple extra logic
for them. The tests are divided between the two versions, the static analyzer ones are about the ability to calculate the size
of a complete call stack and the Tidy ones focus on particular statements and expression types, this is referenced
in one of the comments of the test files for those who want to use the test cases to understand the checker a little bit more.

The code favors readability and maintainability over performance in some places.

Diff Detail

rC Clang

Event Timeline

mate1214 created this revision.Sep 22 2018, 4:04 AM


Always great to see a new checker! I've started working in this project little over half a year ago, so I don't claim to be an expert, read my remarks as such! It'll be some time before I go through the entire code, but so far here are the things that caught my eye.

I think you should move the checker related files to a new folder, maybe StackOverflow/?


Alphabetical sorting?
(get it? alphabetical? I'm sorry.)


Missing header guard.


You included map but I don't see std::map anywhere?


Use ento after clang. Btw, does anybody know what ento refers to? Never got to know it :D

namespace clang {
namespace ento {
namespace stack {

Since this is the only constructor of this visitor, I think C++11 in-class member initialization would be nicer.


It isn't obvious to me what this function does.




Are you sure this function belongs in a visitor?


// end of namespace .*


I bet you can't just #include this without getting some nasty buildbut errors down the line, just #include "StackUsageMeasuringVisitor.h".


If you have a conversion operator to int, do you need operator==?


I think you don't need to make this mutable anymore, you can just initialize it in the constructor.

size_t length(const llvm::ImmutableList<StackInt> &L) {
  int Length = 0;
  for (auto It = L.begin(), E = L.end(); It != E; ++It)
  return Length;

// end of anonymous namespace


Would look better right below the definition of StackInt.


I think some dividers would help a lot:

// Methods for StackSizeChecker.

Use LLVM streams.

llvm::SmallString<20> WarningBuf;
llvm::raw_svector_ostream OS(WarningBuf);
OS << "whatever"<< " needs" << " to" << " be" << " concatenated";

It isn't obvious to me why what "StackLevels" mean. Did you mean depth?


You use dyn_cast_or_null, which to me indicates that there's a possibility of Declaration being null, but you never check it.


In essence, this looks like

if (/* Calculated stack usage so far*/ > StackUsageLimit)

but to me it's very obfuscated. Can you please either add more comments or move this to a separate function?


Same issue here. It would be better just to see something like int CurrentStackUsage = getCurrentStackUsage(C);


#include "StackUsageMeasuringVisitor.h"


// end of anonymous namespace

// Methods for StackUsageMeasuringVisitor.

To me it seems like ContinueTraversing always equals to !ExitFlag. Maybe remove one of them?


I think the idea for a checker like this is great! I left some inline comments, but most of them are minor nits. I have some general remarks to make however:

  • Your code lacks comments, especially a nice documentation describing this problem, how you approach it, what the general algorithm is etc. For example, you use Usage extensively, but I am still a little unsure what the difference is between Empty and EmptyFlagged. I'd encourage you to look at a couple nice examples:
    1. @baloghadamsoftware's IteratorChecker (A long code with fairly long comments to support it),
    2. @rnkovacs's DeleteWithNonVirtualDtorChecker (A shorter code with shorter explanation),
    3. My UninitializedObjectChecker (Very similar to your checker in terms of quantity of files etc)
  • Make sure you follow the LLVM Coding Standards.
  • As stated before, should move this to a new directory.
  • For the amount of checker implementation you have, there aren't no many test cases. I can see that you handle CXXTryStmt, but there are no test cases for it.
  • This revision is way too large for comfortable reading, you can help this by learning from my mistake and split it up into smaller patches. @NoQ summarized the reasons for this very nicely here: D45532#1083670
    1. Post a checker class with an empty callback [1]. Even in this stage, some discussion could be had with the general direction the implementation should go. This avoids the potential of investing a lot of effort into a project that could be wrong from the get-go. Don't get me wrong, I'm not at all saying this is the case here!
    2. Add implementation for checking a very easy case, such as a function with 10 array declarations, each having an absurd size.
    3. Add each new functionality as a separate patch, and add dependencies with "Edit Related Revisions". For example, one for handling branches, one for loops, etc.

Let me emphasize that I'm a beginner myself, so I could be wrong on many of these ^-^

[1] This could be a nice first patch:

//=======- StackSizeChecker.cpp ----------------------------------*- C++ -*-==//
//                     The LLVM Compiler Infrastructure
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//  This file defines a checker that checks for suspiciously high stack use.
//  The checker produces a warning whenever the given stack limit is exceeded.
//  The limit is defined by the "StackUsageLimit" analyzer parameter,
//  or defaults to 100000 bytes.

class StackSizeChecker
    : public Checker<check::PreCall, check::PostCall, check::EndFunction> {
  mutable std::unique_ptr<BugType> StackOverflowBugType;

  int StackUsageLimit;

  void checkPreCall(const CallEvent &Call, CheckerContext &C) const {}
  void checkPostCall(const CallEvent &Call, CheckerContext &C) const {}
  void checkEndFunction(CheckerContext &C) const {}

void clang::ento::registerStackSizeChecker(CheckerManager &Mgr) {
  StackSizeChecker *Check = Mgr.registerChecker<StackSizeChecker>();
  Check->StackUsageLimit = Mgr.getAnalyzerOptions().getOptionAsInteger(
      "StackUsageLimit", /*DefaultVal*/ 100000, Check);

To me, Usage seems to be a way too general name. Also, for a class being used this heavily, it has no comments explaining what it does. Is this a property of a variable? A function call?


Which tree? I know you refer to the statement tree after reading the code for a while, but it would be more obvious if you put some comments for Usage.


Actually, it might be cleaner to move these into a new class, and have StackUsageMeasuringVisitor manage an object from that class.


Would isEmptyFlagged be a better method name?


The function name states sizeInBytes, but

  • ASTContext::getTypeSizeInChars returns the width in CharUnits,
  • which you divide with Context.getCharWidth(), which is in bits (according to the doxygen comments),
  • and we end up getting the amount of bytes T occupies?

I might make a fool out of myself, but wouldn't the formula look something like this: (Context.getTypeSize(T) * Context.getCharWidth()) / 8?


Needs explanation.


Why initialize here if you're going to overwrite it when you register the checker?


I think this would be better as static with the definition being out-of-line on the bottom of the file (but not before registering the checker, as it's traditionally the last function in the checker's TU).


Actually, same here, should be static and defined out of line, so the declarations and documentation can be placed on top of the file and implementation can be found on the bottom.

"StackUsageLimit", /*DefaultVal*/ 100000, Check);

Actually, these shouldn't be static variables, but rather static functions (and probably within the class definition):

static Usage getEmptyUsage() { return Usage{0, 0, false, false}; }
static Usage getEmptyFlaggedUsage() { return Usage{0, 0, false, true}; }

I think this would be more fitting as a static member function.


I don't really understand what this function does.




Can you rephrase this? "signal the algorithm" doesn't sound too descriptive to me.


We only started doing this recently, but you can actually organize the run line like this:

// RUN: %clang_cc1 -analyze -analyzer-checker=core,alpha.core.StackSize \
// RUN:   -analyzer-config alpha.core.StackSize:StackUsageLimit=200 \
// RUN:   -std=c++11 -triple x86_64-unknown-linux-gnu -verify %s
Szelethus added inline comments.Sep 22 2018, 9:14 AM

Ah, I I guess it's actually the height of the statement tree.

Szelethus added inline comments.Sep 22 2018, 9:27 AM

Oh never mind, you had it in include/clang/StaticAnalyzer/Checkers/StackUsageMeasuringVisitor.h. I bet it is for the clang-tidy check. Never mind.


Never mind, but should use "" instead of <>.

Without seeing the full picture (i.e. what you want to do with the clang-tidy check) it is hard to tell,
but are you *very* sure all this logic should be in clang/lib/StaticAnalyzer/, and not in clang/lib/Analysis/ ?

Hi @Szelethus !

Thanks for all your detailed and helpful input, I will make sure to go over all the comments and answer them, but it will take some time.

A bit more background information on this checker and how it came to be might help you and others to understand some of the choices I made, and also address some of your general questions and worries.
I was hesitant about putting too much in the summary but it seems I should have added more.

This was a university assignment and I was encouraged to put it up here. This code has seen quite a few iterations and criticism.
It was my introduction to the clang tool-chain, and the clang-tidy checker was actually the first one to exist. Explaining it briefly will help with the "bigger picture".
It focused on single function bodies and could tell you how much the maximal stack space that body can occupy is.
It was able to calculate with:

  • variable lifetimes (even taking into account lifetimes extended by const references etc.)
  • recycled space (if the lifetime of a variable ends you can place the new ones can take up its place)
  • execution rules (for example: if there is an if statement then you take the maximal space from its branches and not add them together etc.).
  • variable length arrays (it took note when it saw one and if the resulting number was close enough to the limit but not over it you still got a warning)

The final result was the all-time maximum so if you had a ... {... char c[1000]; ...} ... somewhere in the middle that was, by lifetime rules, gone by the end and nothing surpassed it, you still got the correct numbers.
To produce the result, 2-3 massive recursive functions with tons of else if-s were used to traverse the AST of the function body with an in-order traverse strategy.
Now for ordinary functions it was an easy "traverse it all then get the final number" task. But enter the templates. There seemed to be no way to tell right away that we entered a function that had dependent types.
The easiest answer to templates was going until you see something that has dependent types involved then quit right away and say "it is a template we have nothing to do here".
The implementation was crude but it got the job done. It had (and still has) tests for most of the cases (statement types, expressions, VLA, templates etc.) and those tests felt natural there because they were concerned
with the contents of a particular function body.

Now single functions are fun, but entire call chains are what usually matter. You can pack a few char[10000]-s in one body but having them in each function of a call stack may not be so good idea.
So the static analyzer checker basically uses the same logic for single functions as mentioned before but as the analyzer follows the execution paths it adds up everything.
Now you could say that simply taking the whole bodies into account is enough but consider the following example:

void f(...){
    ... some minor stack space is used
    if(...) {
    } else {
        char x[5000];

Here taking the whole body of f may yield 5000+"someting" bytes, instead of the actual small space, that gets used in the call stack when the condition is true and the other function gets called.
So it is reasonable to say that partial summaries should be possible up to certain point in the body.
Having said these and the fact that the original code was even harder to read, the StackUsageMeasuringVisitor came into existence to replace those massive recursive functions.
It is based on the RecursiveASTVisitor and uses post-order traverse strategy to accomplish its goals. This is where the readability over performance aspect comes in.
Without going too deep into details my visitor assigns Usage values to the AST nodes and then their parents can calculate with that, depending on what they actually are. The aforementioned requirement to stop on templates is quite easy,
because you do not need a real answer at the end so you can stop. The requirement to leave out the parts after a certain point is trickier, because it has to assign a value to all the nodes up to the top for proper calculations.
With the specifics of the visitor base class (I've seen no in-order traverse possibility), the solution was to assign zero to everything we do not care about and basically after a certain point it is a dry run on the rest of the function body and some calculation in the parent nodes.
I saw a question about that ContinueTraversing and ExitFlag in the visitor. They are both needed, because you can use the same logic for "I don't care about anything following this statement" and "I don't care about anything if I see a template type"
with some predicates, but in the first case you still need to do stuff with the post-order strategy and can not stop the visitor right away.
Also I saw a comment about Usage::Empty and Usage::EmptyFlagged, the first one says something uses no space at all in general, the second one says that it may, but it is considered zero by decision and nothing should change that (i.e.: visiting the same node again but as a subclass of the statement or expression type).
The rest of this SA checker is basically "use the visitor on every function body you enter and add the numbers to the global picture".

I'm contradicting my previous example here a bit, but if you take the SA approach, probably what you are *really* interested in the grand picture is the fact that the checker can follow the execution path and not make too big mistakes during calculations.
That is what the currently included tests try to aim for. The real "nit picky" ones like const references or ones about casts and such are left in the tidy checker. There you are really interested in them because you want to really focus on one function at a time and get the best possible result.
I realize this breakdown might not be the best, and anyone is welcome to suggest a proper test layout. The other concern I had about tests is the fact that these warnings contain numbers that the checker has to match exactly with the ones in the test, and sometimes it feels a bit forced to require exactly 238 bytes in the message instead of "somewhere between 230-250" if we are talking about some f --> g --> h call graph. I'm fine with doing the "exact" math for a single function f but it gets redundant and cumbersome.
Also if someone says for example that something that has been assumed to take up some space on the stack should just be zero because clang always optimizes it out no matter what (just a thought, but may happen with the various types of casts that the AST can include), then you may have to recalculate more tests than it should be necessary. Again if I have slipped over some fact about the capabilities of the testing system please correct me.

About your suggestion with the folder, it seems reasonable, but I'm not sure about the name.

I hope this helps some of the general questions, I will post other answers as I get to them. Thanks again for your comments!

Hi @lebedev.ri!

Thanks for the question. I was not sure as to where exactly put the files. The important thing for me is that the StackUsageMeasuringVisitor should be reachable from the clang-tidy checker. (For the bigger picture please refer to my answer to @Szelethus). I was not able to find documentation I could use to put the extra logic in the right place (may be my fault). Any suggestions and references are welcome!

Hi @lebedev.ri!

Thanks for the question. I was not sure as to where exactly put the files. The important thing for me is that the StackUsageMeasuringVisitor should be reachable from the clang-tidy checker. (For the bigger picture please refer to my answer to @Szelethus). I was not able to find documentation I could use to put the extra logic in the right place (may be my fault). Any suggestions and references are welcome!

Reachable *how*?
Is clang-tidy check using this very same logic?
For recent examples, see ExprMutationAnalyzer, how it is structured (especially the tests, they are so nice!).
It started in clang-tidy, but then was promoted to analysis/.
So i'm wondering if the same should happen here, too.

Szelethus added a comment.EditedSep 24 2018, 1:26 PM

Thanks for all your detailed and helpful input, I will make sure to go over all the comments and answer them, but it will take some time.

Cheers! I can't emphasize enough however that I might be wrong on what I've said, or say in this comment.

It was my introduction to the clang tool-chain

I always struggled getting enough literature about the static analyzer, if you didn't come across these works already, it might be worth taking a look :)

  5. <--- by far the best guide for the static analyzer at the moment.

A bit more background information on this checker and how it came to be might help you and others to understand some of the choices I made, and also address some of your general questions and worries.
I was hesitant about putting too much in the summary but it seems I should have added more.

Summaries are one thing, that most likely won't be (and shouldn't have to be) read by a future maintainer, the code should speak for itself.

This was a university assignment and I was encouraged to put it up here. This code has seen quite a few iterations and criticism.

It's great that you put it up here! New checkers (which from what I've seen are mostly written by beginners) tend to go through in some cases a many month long review process, so I guess be prepared for some back and forth, but I personally really enjoyed the process, I hope you will too.

I have read the rest of your comment, thanks for the details! I can't say that I understand every aspect of your algorithm, but I have a couple question for the grand picture (but feel free to correct me if I misunderstood anything):

I can see that you use check::PreCall, check::PostCall, check::EndFunction, and you also modifiy the program state to have a fairly good idea about what execution path did the analyzer take, but a function's stack usage is calculated very roughly. Let's imagine that in the next example, f is only called after heavy stack usage:

// Let's just pretend that this function actually
// has a meaningful implementation that the analyzer
// knows will return false in this case.
bool hasStackEnoughSpace() { return false; }

void f() {
  if (hasStackEnoughSpace())
    // use the stack like an absolute madman

This is silly, but what it wants to demonstrate, that you could report a false positive here, despite the analyzer potentially knowing that that path will never be taken. To me it seems like you don't utilize the actual ProgramState, but I think you should( edit: nvm, partially, see my next comment).

To me it also seems like that you pretty much reinvent the compiler in this patch, by modeling almost everything the compiler does by itself. I'm sadly nowhere near an expert on this topic, but it begs the question whether there's an already existing solution to this. Let's wait for others to weigh in on this, maybe I'm wrong and this is what has to be done.

A couple tips for easier development and review process:

  • I think before deep diving into the fixes, you really should split up this patch into an empty callback (essentially an announcement), and add each feature as a separate patch. Ideally, each time you implement a new feature, such as handling branches, you should make a neat patch with a small implementation and related test cases. As a beginner (and I suffered from this myself) this is exceptionally hard to do, because you can often find yourself deleting everything and starting all over, but I've grown to appreciate this principle as I often saved myself a whole lot of effort due to some feedback. However, making a rough proof of concept is in this case IMO a good idea, because it's hard to see at the start where the code will end up, but later it should be split up.
  • Comment everything. I mean, bool isSuccessful() const { return IsSuccessful; } and similar functions speak for themselves, but ideally every non-trivial function and structure should be documented, as well as any non-trivial hackery inside a function.

One important thing I forgot, that you cant rely on ProgramState due to tidy's constraints. Btw, how do you plan to make this into a tidy checker? To me it seems like it would amplify the already existing false positive issues (if I understand your currect way of thinking correctly).

whisperity retitled this revision from [analyzer] StackSizeChecker to [analyzer] Add StackSizeChecker to StaticAnalyzer.Sep 26 2018, 3:23 AM
whisperity edited the summary of this revision. (Show Details)

One final nit apart from a few in-line comments (most of them are just arguing with @Szelethus anyways 😛): KB vs. KiB (remember how a 2 TB hard drive appears as 1.8 TB on your machine? Because it's TiB!) is one of my pet peeves especially that Windows and most people still get it wrong nowadays (and they also teach it wrong). Can we change the default from 100 000 to 102 400? That would make it truly 100 KiB in the proper sense.


Speaking of missing header guards and in general: apart from tests, it might be a good idea to run CSA and Clang-Tidy on your checker's code. E.g. Tidy has a llvm-header-guard checker -- which might not find this particular issue, but in general we could find some issues on this code itself.

(For the ultimate version, run your own checker on your own checker's code... maybe you yourself used a too large stack somewhere! 😜 )


Then calling this method would mean answering this question: "A Stmt/Decl is empty flagged?" This is a "What?" moment.


If we're talking about sizes, int (especially because it's signed int) might not be the most appropriate return type for these functions.


Need to make a distinction whether or not we want to use byte in the sense of 8 bit or in the sense of sizeof(char). Also, the code calls getTypeSize, not getTypeSizeInChars. This gives the answer in pure bits.

(Because of this, your suggested formula is wrong. getTypeSize() is already in bits, which you multiply further by getCharWidth() -- which is on some systems 8 -- then you divide it out. In most conventional systems the division cancels out the multiplication, and on systems where a sizeof(char)-byte is not 8, you'll get a non-integer quotient truncated as result.)


  • in case of you want to define byte to be 8 bits forever and always, the divisor should be 8, not getCharWidth().
  • In case you want to define byte to be sizeof(char), then the current code is correct, but you could just use: Context.getTypeSizeInChar() . getQuantity().

Just for the record: the C++14 standard says the following in § 1.7 intro.memory:

The fundamental storage unit in the C++ memory is the byte. A byte is at least large enough to contain any member of the basic execution character set (-> § 2.3 lex.charset) and the eight-bit code units of the Unicode UTF-8 encoding form and is composed of a contiguous sequence of bits, the number of which is implementation-defined.

(lex.charset is basically an ASCII subset of 96 cardinality, lowercase, uppercase, parens, etc.)

I'm not sure what the convention is for error reports when referring to bytes, but maybe it would be better to consider byte in the C++ sense and let the value change based on what target the analyzer file is being compiled for.


Note that using the other stream headers (<sstream> for example) is not problematic in this regard


generateError is const-qualified but writes this.


const_cast catches the eye. Why do you need to make this non-const?


Could we use int32_t instead of int, and uint8_t instead of bool?

Szelethus added inline comments.Sep 26 2018, 5:51 AM

Right. And it doesn't actually describe what the function does. hasEmptyFlaggedSubDecl?


Oh, right, sorry about that. In either case, every other checker uses LLVM streams, so I guess llvm::raw_*_ostream are the preferred stream classes anyways.

MTC added a subscriber: MTC.Sep 26 2018, 7:53 PM
NoQ added a comment.Oct 1 2018, 12:41 PM

@Szelethus, thank you a lot for covering this review!

@mate1214, yes, please please split this up, like @Szelethus described.

At the moment this patch is not only huge, it is very under-tested. Eg., the Visitor promises support for C++ temporaries (which can potentially be a very difficult problem), but tests don't contain even a single class. If the functionality was added incrementally, it would have been easy to make sure every improvement is covered by regression tests.

Am i understanding correctly that the visitor is capable of summing up stack usage on the current path, i.e. ignoring objects that are put on the stack only within parts of the function that were not in fact executed on the current path, while also ignoring objects that are already removed from the stack? It should indeed be possible to accomplish this by only looking at the AST (unless VLAs or alloca() calls are used excessively), but in this case why is the checkEndFunction() callback not specifying the current path?

I guess a good idea for this checker would be to add a BugReporterVisitor to it in order to highlight where exactly does the stack usage go up. It is allowed to be slow, so it might be reasonable to just outright re-run the Visitor on every statement in the bug path.

I approve moving the visitor into lib/Analysis. That's indeed the library into which we dump all static analysis that is not specific to Static Analyzer.

Once out of alpha, this checker will look great in optin.performance, where we already have a checker for excessive structure padding.


It's Entomology.

NoQ added a subscriber: rsmith.Oct 1 2018, 6:22 PM
In D52390#1251526, @NoQ wrote:

I approve moving the visitor into lib/Analysis. That's indeed the library into which we dump all static analysis that is not specific to Static Analyzer.

@rsmith disagrees though, and he has a point:

Let's keep this in lib/StaticAnalyzer then!

It'd also be good to have an entry in www/analyzer/alpha_checks.html. We've been neglecting it for long enough :/.

mate1214 abandoned this revision.EditedOct 16 2018, 12:42 PM
mate1214 marked 24 inline comments as done.


Sorry for the delay. After considering the amount of requested tests and investigating the possibly reviewer friendly ways to split this up, I came to the conclusion that I can not go through with all this work for now. Thank you everyone who took the time to comment.
I'm abandoning this revision.

nhaehnle removed a subscriber: nhaehnle.Oct 17 2018, 3:17 AM