This is an archive of the discontinued LLVM Phabricator instance.

[WIP][analyzer][doc] Add Container- and IteratorModeling developer docs
Needs RevisionPublic

Authored by gamesh411 on Nov 23 2020, 1:44 AM.

Details

Summary

Container and IteratorModeling is a feature in ClangSA which handles the containers and their
iterators. Containers include arrays and STL containers, and their iterators implemented as either
pointers or class instances.

Diff Detail

Event Timeline

gamesh411 created this revision.Nov 23 2020, 1:44 AM
gamesh411 requested review of this revision.Nov 23 2020, 1:45 AM
gamesh411 retitled this revision from [analyzer][doc] Add Container- and IteratorModeling developer docs to [WIP][analyzer][doc] Add Container- and IteratorModeling developer docs.Nov 23 2020, 1:48 AM

In general, perhaps you should go over the rendered text once again and make use of emph and bold some more.
Explanation looks okay from my part, although I'm really not knowledgeable about CSA internals. But there are plenty of "typesetting" issues.

Do you have to stick to 80-col in the documentation file? It is customary (at least in LaTeX) that prose is organised in a "one line of code per sentence" fashion. This also makes handling the diff easier... In the latter part towards the discussion about how iterators are targeted, the "80-col" is broken anyways.

clang/docs/analyzer/developer-docs.rst
14

How come this new addition is a .rst but the rest isn't. Does the rendering work without it? We should keep a style here, either way. So turn all into proper .rst links, or drop the .rst here.

clang/docs/analyzer/developer-docs/ContainerModeling.rst
5

Can the checker's name be turned into a link, or am I mixing up individual checker documentations' availability with Tidy?

Either way, show the check's name in bold face.

6
8
11

what does it refer to here?

21

Will this appear to the reader as alpha- or alpha.?

29
35
35–36

Iterator was linked already, I don't think this needs to repeat.

37–47

What about neibloids (see right before Parameters) (callables that are explicitly disabling ADL at a call site even though they look like an ADL call, with tricks, usually found in the ranges library).

40

Existence function?

51–55

Format these as monospace too.

57
71
87
106

(While I don't at all agree with the whole put the * to the variable name, not the type name yadda, Clang prints type names with a before the *, so let's keep it consistent...ly bad.)

110
114
122
123
126
147
148

I think something is missing for here, I am incapable of decyphering this sentence.

154–155
157
158

Is typesize a defined thing in the constraint manager? Otherwise, wouldn't sizeof(T) / 4 be better readable?

159
161–162
165–166
169
180

(same with T -> T below consistently)

185
204

What does the ' do here?

210

Should be?
Is.

213
224
227
236
261–282

Perhaps it would be better if indivudal comments for each line would simply be on the line before the code?

Something like:

// No modelling needed:
Container C;

// Bla bla
foo()
263

Yeaaaaaaah...

290
290–293

Previously, lvalue was written as LValue!

292
295
299
311
312
315

LValues, RValues, incosistency

317
320

Could we turn this on its head and simply call these "user-defined iterator types" instead of this, while technically correct, rather wonky "class instance" kind of stuff?

324–325

Which directory? Previously the algorithm checker was referenced with its logical name alpha.cplusplus.SomethingSomething.

328–329
331
335
340
360

I think LLVM's RST rendering rules have a .. option:: element, which renders it in a nice style. But this might also be Tidy-specific...

363
367
369
369

typo

372
373–374

Actually, while the explanation is understandable for me with additional knowledge about the representation... I think it would be useful to add the most simple example from the iterator checkers to the end of the document, how this whole thing ties together and how it is useful in a checker.

I don't have the time to comb through this doc, unfortunately, but I want to applaud this effort to make the iterator checker family more accessible. Its certainly a forerunner in modeling tricky C++ constructs, and I can't wait to be a more valuable reviewer after being a bit more knowledgeable about it.

Actually, while the explanation is understandable for me with additional knowledge about the representation... I think it would be useful to add the most simple example from the iterator checkers to the end of the document, how this whole thing ties together and how it is useful in a checker.

I cannot agree more. Please add examples (with indicators where we want warning) and explain how they would be handled according to the different implementation options. Also please highlight, if possible, what are the limitations of the different solutions: potential false positives that we cannot filter out, or bugs we will not be able to find.

Szelethus requested changes to this revision.Feb 18 2021, 8:15 AM

As far as content goes, here are my thoughts:

There is a huge block of comments in the source code: https://github.com/llvm/llvm-project/blob/main/clang/lib/StaticAnalyzer/Checkers/IteratorModeling.cpp#L13. You have not mentioned these iterator categories: are they still a thing? If so, lets move those comments here, if not, just remove them altogether.

Please mention these docs on the top of iterator checker files.

Everything else:

In general, you have some meaningless sentences that I always catch myself writing as well: "There are a variety of techniques.". Its better to just list them straight away.

clang/docs/analyzer/developer-docs/ContainerModeling.rst
105–106

I think const MemRegion * is implied.

107–108

Fair enough, is this true though for containers whose lifetime was extended? Maybe I should know this myself :^)

132–140

Huh, interesting, considering the earlier point about how you track iterator positions. Couldn't you just say that <end symbol> = <begin symbol> + <some concrete value>? I mean, onbviously not, because if it was that simple it would have landed already, but why is that so difficult? Half a sentence about this should suffice.

213–214

This is the juicy part. I want some more thoughts in here: why? Can I have some examples?

290–313

Ah, there it is. Move it to where I placed my other inline! :)

301–305

This example is here to explain the previous two paragraphs, yet I'm left confused. I heard these things:

"If no Region is found for an iterator value then a Symbol is used (SymbolRef)."

Do we have a region here, or not? If we do, then this example doesn't demonstrate what was said highlighted in the first paragraph, so lets do that. There is no region here? Why?

"In the case of pointer iterators, the modeling of the symbolic value is simpler. The lifetime of such values is simple to model,
there is no need for constructors, destructors and copy-elision rules to be taken into consideration."

Do I gather correctly that we need to keep around a memregion and a symbolic value implementation anyways, so we might as well pick the easier one when possible? This look clunky, even if its true. With that said, a paragraph guiding me through the example outlining why its preferable to model it as a symbolic value would be appreciated.

319

You spent long enough on pointer iterators that subsectioning here would be justified.

324–325

This should be moved out of this section. You could mention those files that have the the lowest likelihood of being split up / renamed.

327–329

Why is this an issue? Please explain.

383

Any links to a previous discussion?

This revision now requires changes to proceed.Feb 18 2021, 8:15 AM

Nice work!

clang/docs/analyzer/developer-docs/ContainerModeling.rst
125
148

+1

160

Or ...comparisons still cannot be handled properly. If the ContstraintManager` was also able ... then ...

210
222

:

227
264
266
276

branch 'A'

290

Why don't we track Rvalue containers? It's not clear from this document.

310–313

Break up this sentence please.

317

I think it would be useful to have a few sentences (preceding this paragraph) that introduces the SymbolMap and RegionMap and for what they used for.

329
353

iterators

369

chain

369
371–372

These sentences are word-by-word copies of the previous two sentences.

380

I vote for the extra layer of abstraction.