Container and IteratorModeling is a feature in ClangSA which handles the containers and their
iterators. Containers include arrays and STL containers, and their iterators implemented as either
pointers or class instances.
|330 ms||linux > HWAddressSanitizer-x86_64.TestCases::sizes.cpp|
Script: -- : 'RUN: at line 3'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang --driver-mode=g++ -m64 -gline-tables-only -fsanitize=hwaddress -fuse-ld=lld -mcmodel=large -mllvm -hwasan-globals -mllvm -hwasan-use-short-granules -mllvm -hwasan-instrument-landing-pads=0 -mllvm -hwasan-instrument-personality-functions /mnt/disks/ssd0/agent/llvm-project/compiler-rt/test/hwasan/TestCases/sizes.cpp -nostdlib++ -lstdc++ -o /mnt/disks/ssd0/agent/llvm-project/build/projects/compiler-rt/test/hwasan/X86_64/TestCases/Output/sizes.cpp.tmp
In general, perhaps you should go over the rendered text once again and make use of emph and bold some more.
Explanation looks okay from my part, although I'm really not knowledgeable about CSA internals. But there are plenty of "typesetting" issues.
Do you have to stick to 80-col in the documentation file? It is customary (at least in LaTeX) that prose is organised in a "one line of code per sentence" fashion. This also makes handling the diff easier... In the latter part towards the discussion about how iterators are targeted, the "80-col" is broken anyways.
How come this new addition is a .rst but the rest isn't. Does the rendering work without it? We should keep a style here, either way. So turn all into proper .rst links, or drop the .rst here.
Can the checker's name be turned into a link, or am I mixing up individual checker documentations' availability with Tidy?
Either way, show the check's name in bold face.
what does it refer to here?
Will this appear to the reader as alpha- or alpha.?
Iterator was linked already, I don't think this needs to repeat.
What about neibloids (see right before Parameters) (callables that are explicitly disabling ADL at a call site even though they look like an ADL call, with tricks, usually found in the ranges library).
Format these as monospace too.
(While I don't at all agree with the whole put the * to the variable name, not the type name yadda, Clang prints type names with a before the *, so let's keep it consistent...ly bad.)
I think something is missing for here, I am incapable of decyphering this sentence.
Is typesize a defined thing in the constraint manager? Otherwise, wouldn't sizeof(T) / 4 be better readable?
(same with T -> T below consistently)
What does the ' do here?
Perhaps it would be better if indivudal comments for each line would simply be on the line before the code?
// No modelling needed: Container C; // Bla bla foo()
Previously, lvalue was written as LValue!
LValues, RValues, incosistency
Could we turn this on its head and simply call these "user-defined iterator types" instead of this, while technically correct, rather wonky "class instance" kind of stuff?
Which directory? Previously the algorithm checker was referenced with its logical name alpha.cplusplus.SomethingSomething.
I think LLVM's RST rendering rules have a .. option:: element, which renders it in a nice style. But this might also be Tidy-specific...
Actually, while the explanation is understandable for me with additional knowledge about the representation... I think it would be useful to add the most simple example from the iterator checkers to the end of the document, how this whole thing ties together and how it is useful in a checker.
I don't have the time to comb through this doc, unfortunately, but I want to applaud this effort to make the iterator checker family more accessible. Its certainly a forerunner in modeling tricky C++ constructs, and I can't wait to be a more valuable reviewer after being a bit more knowledgeable about it.
I cannot agree more. Please add examples (with indicators where we want warning) and explain how they would be handled according to the different implementation options. Also please highlight, if possible, what are the limitations of the different solutions: potential false positives that we cannot filter out, or bugs we will not be able to find.
As far as content goes, here are my thoughts:
There is a huge block of comments in the source code: https://github.com/llvm/llvm-project/blob/main/clang/lib/StaticAnalyzer/Checkers/IteratorModeling.cpp#L13. You have not mentioned these iterator categories: are they still a thing? If so, lets move those comments here, if not, just remove them altogether.
Please mention these docs on the top of iterator checker files.
In general, you have some meaningless sentences that I always catch myself writing as well: "There are a variety of techniques.". Its better to just list them straight away.
I think const MemRegion * is implied.
Fair enough, is this true though for containers whose lifetime was extended? Maybe I should know this myself :^)
Huh, interesting, considering the earlier point about how you track iterator positions. Couldn't you just say that <end symbol> = <begin symbol> + <some concrete value>? I mean, onbviously not, because if it was that simple it would have landed already, but why is that so difficult? Half a sentence about this should suffice.
This is the juicy part. I want some more thoughts in here: why? Can I have some examples?
Ah, there it is. Move it to where I placed my other inline! :)
This example is here to explain the previous two paragraphs, yet I'm left confused. I heard these things:
"If no Region is found for an iterator value then a Symbol is used (SymbolRef)."
Do we have a region here, or not? If we do, then this example doesn't demonstrate what was said highlighted in the first paragraph, so lets do that. There is no region here? Why?
"In the case of pointer iterators, the modeling of the symbolic value is simpler. The lifetime of such values is simple to model,
Do I gather correctly that we need to keep around a memregion and a symbolic value implementation anyways, so we might as well pick the easier one when possible? This look clunky, even if its true. With that said, a paragraph guiding me through the example outlining why its preferable to model it as a symbolic value would be appreciated.
You spent long enough on pointer iterators that subsectioning here would be justified.
This should be moved out of this section. You could mention those files that have the the lowest likelihood of being split up / renamed.
Why is this an issue? Please explain.
Any links to a previous discussion?