Added a checker for non-determinism caused by iterating unordered containers like std::unordered_set containing pointer elements.
Following are the assumptions/limitations of this patch:
1. The assumption is that iteration of ordered containers of pointers is not non-deterministic. 2. Currently we only detect std::unordered_set. Detection of unordered_map can be added later. 3. Currently, we also do not check for what happens inside the for loop. Not all iterations may causes non-determinism. For example, counting or summing up the elements should not be non-deterministic.
Code is great, idea's great, so green light on my part, but maybe we should allow some folks who are more knowledgeable on AST matching to have a say in this.
The obvious question, why not implement this in clang-tidy? I have nothing against landing this in the static analyzer, just curious.
Please also add some doc to docs/analyzer/checkers.rst.
Uh-oh, we changed licence not long ago, and your last patch also landed with the incorrect header -- could you please fix them?
// RUN: %clang_analyze_cc1 %s -analyzer-output=text -verify \ // RUN: -analyzer-checker=core,alpha.nondeterminism.PointerIteration
The obvious question, why not implement this in clang-tidy?
Going forward I would like to make these non-determinism checks more precise by reasoning about what happens inside the loop, for example. I am not sure if clang-tidy has support for such deeper reasoning. I thought clang-tidy is more of a pattern matcher?
Also, what is the difference between these two: https://clang.llvm.org/docs/analyzer/checkers.html, https://clang-analyzer.llvm.org/available_checks.html. It seems they document similar stuff. Should we add the doc for each new checker in both of these?
Could you please add a commit that changes the top of the file on your previous commit? Feel free to do that without review (I can't commit it myself atm).
Umm, yea, we're in the middle of moving from the HTML format to the new sphinx format, so I think it's fine to ignore the latter, and only extend docs/analyzer/checkers.rst. Since the moving is a lot of manual labor, we haven't gotten around getting rid of it entirely just yet. But of course, it's welcome to have that maintained for the time being too!
Yup, I'm sold on that.
I was trying to write another checker for hashing of pointers. Basically, I wanted to match all instances where the keys of std::map are pointers. Writing an AST matcher for std::map is straightforward. However, I am not sure how to check for pointer keys after matching std::map.
The following matches std::map<K, V>. But I need to match std::map<int *, V>. So I was thinking something on the lines of templateArgument(hasType(pointerType())).
auto ContainersM = anyOf( hasName("std::map"), hasName("std::unordered_map") ); auto HashContainerM = varDecl(hasType(cxxRecordDecl(ContainersM)))
Could you please explain which type of non-determinism we are addressing here? If our issue is that iteration order is not consistent across runs, then an unordered set of integers seems just as non-deterministic as an unordered set of pointers. On the other hand, if our issue is that pointer values vary between runs, then an ordered set of pointers seems just as non-deterministic as an unordered set of pointers. Are unordered sets of pointers distinguished because they lie in the intersection of these categories and thus avoid the most false positive cases? If so, for someone debugging non-deterministic behavior in their code, would it be useful to add a strict option that shows other cases too? If not, maybe we could document our reasons somewhere.
Yes, the reason we limit the checks only to unordered containers is to reduce the false positive rate. Although, as you rightly pointed out that ordered sets of pointers are as non-deterministic as unordered ones. Once our checks have the capability to detect what happens inside the loop maybe we can add ordered sets too. I will add this to the TODO. Thanks.
What if you store pointers to the elements of an array? In that case, it should be deterministic.
----------- |1|2|3|5|6| ----------- ^ ^ p q
In the above example, pointer p's memory address will always be less then q's, so in an ordered set, iterating over them would be deterministic. When hashing algorithms come into the picture, than it wouldn't be :^)