[analyzer][www] Add more open projects
ClosedPublic

Authored by george.karpenkov on Tue, Oct 9, 8:27 AM.

Details

Summary

Added some extra tasks to the open projects. These are the ideas of @NoQ and @george.karpenkov, I just converted them to HTML.

Diff Detail

Repository
rC Clang
Szelethus created this revision.Tue, Oct 9, 8:27 AM
Szelethus added a comment.EditedTue, Oct 9, 9:32 AM

Mind you, there are some ideas I didn't add to the list -- I just don't know how to put them to words nicely, but I'm on it.

@NoQ mentioned

  • Enhancing constraint solving. I just simply lack the knowledge to make an entry here about it.
  • Fixing up taint analysis.

@george.karpenkov mentioned assigning a unique error code to each analyzer warning, and maintaining it on the website.

Also, a lot of items on this list is outdated, but I joined the project relatively recently, so I'm not sure what's the state on all of them.

NoQ added a comment.Tue, Oct 9, 1:09 PM

Whoa thanks! Will have a closer look again.

www/analyzer/open_projects.html
30

I'll try to list other constructor kinds that i have in mind.

73

Typo: methods.

83–84

I'll try to be more, emm, positive on the website :]

156

Typo: However.

157

Typo: Development.

184

Something is unfinished here.

Thanks!

I admit that the difficulty was mostly chosen at random, so that could be brought closer to the actual difficulty of the project.

www/analyzer/open_projects.html
83–84

Oh, right, sorry, I tried to "positivitize" most of these, but apparently missed this one O:)

Also, a lot of items on this list is outdated, but I joined the project relatively recently, so I'm not sure what's the state on all of them.

AFAIK, the items below is outdated.

  • Enhance CFG to model C++ temporaries properly (This problem has basically been fixed by @NoQ.)
  • Enhance CFG to model C++ new more precisely (This problem has basically been fixed by @NoQ.)
  • Implement iterators invalidation checker (IIRC, @baloghadamsoftware has solved this, see IteratorChecker.cpp.)
  • Write checkers which catch Copy and Paste errors (IIRC, @teemperor has solved this, see CloneChecker.cpp)
  • Enhance CFG to model C++ delete more precisely (@blitz.opensource's focus is no longer on clang static analyzer, so we should not keep him as current contact.).

And there are items, I'm not sure what the current state is. Like:

  • Explicitly model standard library functions with BodyFarm. (This item is marked as ongoing, it doesn't look very active nowadays.)

If I'm wrong, @NoQ and @george.karpenkov, please correct me. In addition 2018 Bay Area LLVM Developers' Meetings may bring some new open projects :), see http://llvm.org/devmtg/2018-10/talk-abstracts.html#bof6.

At the end, there are some punctuation problems, yea, I browsed this page through the browser :).

www/analyzer/open_projects.html
95

“this function is pure” -> "this function is pure"

97

“partial -> "partial

181

Also here? “Schrödinger state” -> "Schrödinger state"

189

can‘t -> can't

190

there‘s -> there's
aren‘t -> aren't

Szelethus updated this revision to Diff 169193.Thu, Oct 11, 4:54 AM
  • Fixed typos
  • Fixed stupid characters (Thank you so much @MTC for going the extra mile and opening this patch in a browser!)
  • Removed outdated entries as mentioned by @MTC
Szelethus marked 12 inline comments as done.Thu, Oct 11, 4:55 AM
NoQ added a comment.Thu, Oct 11, 3:44 PM

In addition 2018 Bay Area LLVM Developers' Meetings may bring some new open projects :)

Actually, let's commit this before the conference, even if it's not perfect, so that people who suddenly get inspired to work on Static Analyzer already had an updated list :)

www/analyzer/open_projects.html
24–25

Let's move this into a sub-bullet of "Enhance the modeling of the standard library." and remove the part about std::string, but instead say "This is often effective for simple C functions such as atomic builtins and certain related threading functions (such as call_once) because atomicity doesn't have any meaning for Static Analyzer and can be completely ignored. This approach is not recommended for functions dealing with complicated C++ code - temporaries, templates - unless you have unbelievably good AST handwriting and recite all supported versions of the C++ standard by heart."

24–25

Let's move this into the construction context bullet. I think we do call the destructor now but it might be that we don't call the custom delete operator.

26

@george.karpenkov your turn here.

26

Let's mark this with "Refactoring: ".

30

Actually, maybe let's turn this into a single project with sub-bullets and list all problems instead of just one.

Ok if i just dump my thoughts with markdown? >_<

Something like:

  • Handle more ways of constructing C++ objects by identifying ConstructionContexts in the CFG and using them to identify and track the object until the permanent storage for the object is evaluated.
    • Constructors of fields and [[since C++17] base classes] of aggregates. When aggregates are brace-initialized, their own trivial constructor does not happen, but the constructor for sub-objects does. It would be necessary to see beyond the InitListExpr in order to find out which field of which aggregate is actually constructed. Because brace initializers can be nested, ConstructionContext for this constructor would potentially contain an indefinite amount of intermediate InitListExprs. Additionally, if the field is of reference type, lifetime extension needs to be modeled.
    • Constructors within new[]. Once an array of objects is allocated via operator new[], constructors for all elements of the array are called. Arguably, we should evaluate at least a few of them, as if it was some sort of loop. The same applies to destructors within delete[]. ConstructionContext is already available here, so it is likely that there's no CFG work required, though indicating the presence of a loopy control flow in the CFG might be helpful.
    • Constructors that can be elided due to Named Return Value Optimization (NRVO). If a local variable within a function is of object type and is returned by value on all return statements, then the compiler is allowed (but not required to, even in C++17) to immediately store this variable at the address into which the function would put the return value, thereby skipping ("eliding") copy/move constructor call (even if it has user-visible side effects). Variables eligible for NRVO can be easily identified in the AST via VarDecl::isNRVOVariable(), so no CFG work is necessary here, but Static Analyzer needs to realize that the respecitive VarRegion should be entirely replaced with the return region of the function for the whole duration of the respective stack frame.
    • Constructors of virtual method arguments. Constructors of function arguments are already supported, but Static Analyzer has problems finding the correct parameter variable declaration and the correct stack frame for the object under construction if it doesn't know how exactly a virtual call is devirtualized. It might help to simplify the identity of the parameter region to exclude the Decl of the callee and its parameters from its identity.
    • Constructors of default arguments. In C++ functions can have default values for parameters that are re-computed at run-time every time the the function is called without specifying the argument explicitly. There isn't much difference between default arguments and normal arguments when it comes to ConstructionContext-related problems. But the actual problem here is that the expression that initializes the parameter is not part of the current stack frame, but instead lives in the middle of nowhere within the parameter declaration. This means that if two calls of the same function with defaulted arguments are present within the same full-expression, we may accidentally assign two different values (corresponding to two different default argument objects) to the same expression in the Environment, which is incorrect. One of the possible solutions is to define an additional LocationContext to discriminate between different default argument evaluations, similarly to how StackFrameContext allows discriminating between values of the same expression on different layers of recursion.
    • Constructors that perform lambda captures. If a lambda captures a variable of object type by value, the object needs to be copied into the lambda, which implies calling a copy-constructor. A new kind of ConstructionContext would need to be defined in order to identify the memory region occupied by the lambda object (most likely a MaterializeTemporaryExpr that surrounds the LambdaExpr) and the sub-region that corresponds to the implicit field of the lambda object that contains the capture.

(Difficulty: Medium)

66

"supporting all of them" -> "modeling all methods in all classes of the standard library"

67

Maybe move "One good thing to start..." into a sub-bullet?

83–84

"Pointer-to-pointer casts are also causing a surprising amount of problems. The decision to re-use ElementRegion to track the type of the object behind the pointer value seems to create more problems than it solves, but..."

229–230

This should be removed because we already have a bullet for this.

@Szelethus I take it this is mostly formed from @NoQ email? Language could use polishing in quite a few places, could I just commandeer this revision and try to fix it?

www/analyzer/open_projects.html
26

Let's just skip it.
I suspect the explanation would not benefit here, because a large body of assumed knowledge would be required to take on the project.

234

This is handled by Z3 invalidation, I'm not sure it's worth it spending much effort here in the long run.

@Szelethus I take it this is mostly formed from @NoQ email? Language could use polishing in quite a few places, could I just commandeer this revision and try to fix it?

Yes it is. Though the other item you mentioned should be on this list -- I just simply forgot to put it there before updating the diff O:). Feel free to commandeer it, this patch is barely my work.

george.karpenkov commandeered this revision.Mon, Oct 15, 2:52 PM
george.karpenkov edited reviewers, added: Szelethus; removed: george.karpenkov.

I have tried to clean up the list.

Szelethus accepted this revision.Mon, Oct 15, 3:29 PM

Thanks, this looks great!

This revision is now accepted and ready to land.Mon, Oct 15, 3:29 PM

@Szelethus thanks! BTW if you really want to invest into maintaining the website,
I think it's totally worth it to change all contents to markdown,
and then have a script to generate HTML from that.
Committers would be expected to manually run that script.
This would also solve our problem with the disappearing header.

I dislike web development, but that would indeed be invaluable. I'll take a look.

NoQ added inline comments.Mon, Oct 15, 4:00 PM
clang/www/analyzer/open_projects.html
27–32 ↗(On Diff #169758)

This is extremely important to get right. Alpha doesn't mean "i did an experiment, let me dump my code so that it wasn't lost, maybe others will pick it up and turn it into a useful checker if they figure out how". Alpha doesn't mean "a checker that power-users can use at their own risk when they want to find more bugs". Alpha doesn't mean "i think this checker is great but maintainers think it's bad so they keep me in alpha but i'm happy because i can write in my resume that i'm an llvm contributor". All of these are super popular misconceptions.

Alpha means "i'm working on it". That's it.

Let's re-phrase to something like: "When a new checker is being developed incrementally, it is committed into clang and is put into the hidden "alpha" package (from "alpha version"). Ideally, once all desired functionality of the checker is implemented, checker should be moved out of the alpha package and become enabled by default or recommended to opt-in into, but development of many alpha checkers has stalled over the years."

80 ↗(On Diff #169758)

I guess let's add the rest of the constructors from my message above.

88–89 ↗(On Diff #169758)

Something's wrong here.

NoQ added a comment.EditedMon, Oct 15, 4:01 PM

Other than that, the current text looks great.

Szelethus added a comment.EditedTue, Oct 16, 2:49 AM
This comment has been deleted.
clang/www/analyzer/open_projects.html
153 ↗(On Diff #169758)

Did you mean to have this newline here? Difficulty seems to have a weird placement when viewed in a browser.

clang/www/analyzer/open_projects.html
27–32 ↗(On Diff #169758)

Let's ignore (3) as a red herring, but I'm not sure I see the difference between (1), (2) and (4). When someone works actively on a checker, but then stops, it immediately transfers from state (4) to state (2) and optionally (1)

80 ↗(On Diff #169758)

What message above? The original email?

NoQ added inline comments.Tue, Oct 16, 1:59 PM
clang/www/analyzer/open_projects.html
27–32 ↗(On Diff #169758)

This is kinda mostly about the attitude with which people should put stuff into the alpha package. Yeah, if someone stops for inevitable reasons, then we're inevitably left with unmaintained experimental code in the repo, i guess that's why this section exists. But contributors shouldn't plan for this from the start. A lot of contributors literally believe that alpha is by design a stockpile of incomplete work and weird experiments and it'll make everybody happy if they add more incomplete work and weird experiments into it, i myself misunderstood this for a long time and i want to address this misconception.

NoQ accepted this revision.Tue, Oct 16, 4:22 PM

Ok, let's see if this actually works :)

This revision was automatically updated to reflect the committed changes.
dkrupp added inline comments.Wed, Oct 17, 2:17 AM
www/analyzer/open_projects.html
198

Probably it is worth mentioning here, that there is a macro language already for describing summaries of standard library functions in StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp.

This macro language could be refactored to a separate header file so it could be used in other checkers too. Could also be extended for C++.

Another useful addition would be to enable users to describe these summaries in run-time (in YAML files for example) to be able to model their own proprietary library functions.

Then as a next step we could introduce a flow sensitive analysis to generate such summaries automatically. Which is a hard problem indeed, the others above should not be too difficult

pchan added a subscriber: pchan.Thu, Oct 18, 12:58 AM