Page MenuHomePhabricator

[Analyzer] Checker for non-determinism caused by iteration of unordered container of pointers
AcceptedPublic

Authored by mgrang on Mar 12 2019, 5:23 PM.

Details

Summary

Added a checker for non-determinism caused by iterating unordered containers like std::unordered_set containing pointer elements.

Diff Detail

Event Timeline

mgrang created this revision.Mar 12 2019, 5:23 PM

Following are the assumptions/limitations of this patch:

1. The assumption is that iteration of ordered containers of pointers is not non-deterministic.

2. Currently we only detect std::unordered_set. Detection of unordered_map can be added later.

3. Currently, we also do not check for what happens inside the for loop. Not all iterations may causes non-determinism. For example, counting or summing up the elements should not be non-deterministic.
Szelethus requested changes to this revision.Mar 13 2019, 4:28 AM
Szelethus added a reviewer: baloghadamsoftware.

Code is great, idea's great, so green light on my part, but maybe we should allow some folks who are more knowledgeable on AST matching to have a say in this.

The obvious question, why not implement this in clang-tidy? I have nothing against landing this in the static analyzer, just curious.

Please also add some doc to docs/analyzer/checkers.rst.

lib/StaticAnalyzer/Checkers/PointerIterationChecker.cpp
6–7

Uh-oh, we changed licence not long ago, and your last patch also landed with the incorrect header -- could you please fix them?

test/Analysis/ptr-iter.cpp
2
// RUN: %clang_analyze_cc1 %s -analyzer-output=text -verify \
// RUN:   -analyzer-checker=core,alpha.nondeterminism.PointerIteration
This revision now requires changes to proceed.Mar 13 2019, 4:28 AM
3. Currently, we also do not check for what happens inside the for loop. Not all iterations may causes non-determinism. For example, counting or summing up the elements should not be non-deterministic.

Could you please add a TODO then to the code? We definitely wouldn't want to forget this before moving out of alpha.

mgrang updated this revision to Diff 190453.Mar 13 2019, 10:52 AM
mgrang marked 2 inline comments as done.
mgrang added a comment.EditedMar 13 2019, 11:01 AM

The obvious question, why not implement this in clang-tidy?

Going forward I would like to make these non-determinism checks more precise by reasoning about what happens inside the loop, for example. I am not sure if clang-tidy has support for such deeper reasoning. I thought clang-tidy is more of a pattern matcher?

Also, what is the difference between these two: https://clang.llvm.org/docs/analyzer/checkers.html, https://clang-analyzer.llvm.org/available_checks.html. It seems they document similar stuff. Should we add the doc for each new checker in both of these?

Szelethus added a comment.EditedMar 13 2019, 12:00 PM

Could you please add a commit that changes the top of the file on your previous commit? Feel free to do that without review (I can't commit it myself atm).

Also, what is the difference between these two: https://clang.llvm.org/docs/analyzer/checkers.html, https://clang-analyzer.llvm.org/available_checks.html. It seems they document similar stuff. Should we add the doc for each new checker in both of these?

Umm, yea, we're in the middle of moving from the HTML format to the new sphinx format, so I think it's fine to ignore the latter, and only extend docs/analyzer/checkers.rst. Since the moving is a lot of manual labor, we haven't gotten around getting rid of it entirely just yet. But of course, it's welcome to have that maintained for the time being too!

The obvious question, why not implement this in clang-tidy?

Going forward I would like to make these non-determinism checks more precise by reasoning about what happens inside the loop, for example. I am not sure if clang-tidy has support for such deeper reasoning. I thought clang-tidy is more of a pattern matcher?

Yup, I'm sold on that.

Could you please add a commit that changes the top of the file on your previous commit? Feel free to do that without review (I can't commit it myself atm).

Done in https://reviews.llvm.org/rC356086.

mgrang updated this revision to Diff 190472.Mar 13 2019, 12:29 PM

I was trying to write another checker for hashing of pointers. Basically, I wanted to match all instances where the keys of std::map are pointers. Writing an AST matcher for std::map is straightforward. However, I am not sure how to check for pointer keys after matching std::map.

The following matches std::map<K, V>. But I need to match std::map<int *, V>. So I was thinking something on the lines of templateArgument(hasType(pointerType())).

auto ContainersM = anyOf(
                     hasName("std::map"),
                     hasName("std::unordered_map")
                   );

auto HashContainerM = varDecl(hasType(cxxRecordDecl(ContainersM)))

Ping for reviews please.

Following are the assumptions/limitations of this patch:

1. The assumption is that iteration of ordered containers of pointers is not non-deterministic.

Could you please explain which type of non-determinism we are addressing here? If our issue is that iteration order is not consistent across runs, then an unordered set of integers seems just as non-deterministic as an unordered set of pointers. On the other hand, if our issue is that pointer values vary between runs, then an ordered set of pointers seems just as non-deterministic as an unordered set of pointers. Are unordered sets of pointers distinguished because they lie in the intersection of these categories and thus avoid the most false positive cases? If so, for someone debugging non-deterministic behavior in their code, would it be useful to add a strict option that shows other cases too? If not, maybe we could document our reasons somewhere.

Following are the assumptions/limitations of this patch:

1. The assumption is that iteration of ordered containers of pointers is not non-deterministic.

Could you please explain which type of non-determinism we are addressing here? If our issue is that iteration order is not consistent across runs, then an unordered set of integers seems just as non-deterministic as an unordered set of pointers. On the other hand, if our issue is that pointer values vary between runs, then an ordered set of pointers seems just as non-deterministic as an unordered set of pointers. Are unordered sets of pointers distinguished because they lie in the intersection of these categories and thus avoid the most false positive cases? If so, for someone debugging non-deterministic behavior in their code, would it be useful to add a strict option that shows other cases too? If not, maybe we could document our reasons somewhere.

Yes, the reason we limit the checks only to unordered containers is to reduce the false positive rate. Although, as you rightly pointed out that ordered sets of pointers are as non-deterministic as unordered ones. Once our checks have the capability to detect what happens inside the loop maybe we can add ordered sets too. I will add this to the TODO. Thanks.

Although, as you rightly pointed out that ordered sets of pointers are as non-deterministic as unordered ones.

What if you store pointers to the elements of an array? In that case, it should be deterministic.

-----------
|1|2|3|5|6|
-----------
 ^     ^ 
 p     q

In the above example, pointer p's memory address will always be less then q's, so in an ordered set, iterating over them would be deterministic. When hashing algorithms come into the picture, than it wouldn't be :^)

Although, as you rightly pointed out that ordered sets of pointers are as non-deterministic as unordered ones.

What if you store pointers to the elements of an array? In that case, it should be deterministic.

-----------
|1|2|3|5|6|
-----------
 ^     ^ 
 p     q

In the above example, pointer p's memory address will always be less then q's, so in an ordered set, iterating over them would be deterministic. When hashing algorithms come into the picture, than it wouldn't be :^)

Yes, by containers here I mean hashing containers.

Szelethus accepted this revision.Tue, Mar 26, 4:37 AM

Other than that, LGTM, thanks! Please, again, let @NoQ have the final say.

This revision is now accepted and ready to land.Tue, Mar 26, 4:37 AM

Yes, the reason we limit the checks only to unordered containers is to reduce the false positive rate. Although, as you rightly pointed out that ordered sets of pointers are as non-deterministic as unordered ones. Once our checks have the capability to detect what happens inside the loop maybe we can add ordered sets too. I will add this to the TODO. Thanks.

Great, thanks!