A pointer referring to the elements of a basic_string may be invalidated by calling a non-const member function, except operator[], at, front, back, begin, rbegin, end, and rend. The checker now warns if the pointer is used after such operations.
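For illustration, here is a minimal sketch of the rule described above. The helper names (`buffer_address`, `at_moves_buffer`, `append_moves_buffer`) are hypothetical and are not part of the patch. The code records the buffer address as an integer before mutating the string, so it never touches a stale pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: the address of the string's buffer as an integer,
// so a possibly stale pointer value is never dereferenced or compared.
std::uintptr_t buffer_address(const std::string &s) {
  return reinterpret_cast<std::uintptr_t>(s.data());
}

// at() is on the exception list: a pointer obtained from c_str() before
// this call remains valid afterwards.
bool at_moves_buffer(std::string &s) {
  assert(!s.empty());
  const std::uintptr_t before = buffer_address(s);
  s.at(0) = 'H';
  return buffer_address(s) != before;
}

// append() is a non-const member function that is not on the exception
// list. When the new size exceeds the old capacity, the string must
// reallocate, and any pointer obtained earlier is left dangling.
bool append_moves_buffer(std::string &s, std::size_t extra) {
  const std::uintptr_t before = buffer_address(s);
  s.append(extra, 'x');
  return buffer_address(s) != before;
}
```

Note that a call such as append may leave the buffer in place when the capacity is sufficient; the pointer counts as invalidated either way, which is why the checker treats the call itself as the invalidating operation.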
Cool! I don't have a strong preference with respect to whitelist vs. blacklist; your approach is safer but listing functions that don't immediately invalidate the buffer would allow us to avoid hard-to-detect false negatives while pretending that our users would notice and report easy-to-fix false positives for us. Also we rarely commit to adding a test for every single supported API function; bonus points for that, but usually 2-3 functions from a series of similar functions is enough :)
That quote from the Standard would look great here.
It is really nice to see this checker take shape! Some drive-by diagnostic comments inline.
In other parts of clang we use the term "inner pointer" to mean a pointer that will be invalidated if its containing object is destroyed (see https://clang.llvm.org/docs/AutomaticReferenceCounting.html#interior-pointers). There are existing attributes that use this term to specify that a method returns an inner pointer.
I think it would be good to use the same terminology here. So the diagnostic could be something like "Dangling inner pointer obtained here".
What do you think about explicitly mentioning the name of the method here when we have it? This will make it more clear when there are multiple methods on the same line.
I also think that instead of saying "is allowed to" (which raises the question "by whom?") you could make it more direct:
"Inner pointer invalidated by call to 'clear'"
or, for the destructor, "Inner pointer invalidated by call to destructor"?
What do you think?
If you're worried about this wording being too strong, you could weaken it with a "may be" to:
"Inner pointer may be invalidated by call to 'clear'"
I showed the bug mentioned in D49058 to a friend who didn't do much C++ recently, for a fresh look, and he provided a bunch of interesting feedback by explaining the way he didn't understand what the analyzer was trying to say.
- When we call c_str(), the pointer is not dangling yet, not until the destructor or realloc is called. He didn't understand the report because he was trying to figure out why we think the pointer is already dangling.
- A generic "use after free" warning on the return site is confusing because the user would expect to see an actual "use" instead of just passing it around. We should be more specific, i.e. "Deallocated pointer returned to the caller".
- We mention that there's a destructor, but the destructor is hard to see. Knowing the type of the destroyed object would help. Knowing that it's a temporary object would help.
- The whole idea of "string has a buffer that would be destroyed when the string is destroyed and we shouldn't pass the pointer around" needs to be explained all together, rather than separated into different diagnostic pieces. The user needs to be somehow informed that this is how std::string operates because he doesn't necessarily know that.
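To make the scenario in these points concrete, here is a hedged sketch of the pattern being discussed. The function names are hypothetical and this is not the test case from D49058. The buggy variant is left in a comment so that the block compiles and runs without undefined behavior.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical function returning a std::string by value.
std::string make_greeting(const std::string &name) {
  return "hello, " + name;
}

// Buggy pattern, kept as a comment on purpose:
//
//   const char *dangling_greeting(const std::string &name) {
//     // c_str() returns a pointer into the temporary's buffer. The pointer
//     // is still fine at this point; it only becomes dangling when the
//     // temporary std::string is destroyed at the end of the full
//     // expression, so a deallocated pointer is returned to the caller.
//     return make_greeting(name).c_str();
//   }

// Safer alternative: return the owning object and let the caller keep it.
std::string owned_greeting(const std::string &name) {
  return make_greeting(name);
}

// The inner pointer is only used while its owner is alive and unmodified.
std::size_t greeting_length(const std::string &name) {
  const std::string owner = owned_greeting(name);
  const char *p = owner.c_str();
  return std::strlen(p);
}
```

In the commented-out variant there is no visible destructor call and no obvious "use" of the pointer, which matches the confusion described in the points above.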
I think the word "invalidated" may be confusing; how about "reallocated"? And "deallocated" in case of destructors.
My intuition suggests that we should remove the word "Dangling" from the checker name, because our checker names usually indicate what they check, not what bugs they find. E.g., MallocChecker doesn't find all mallocs, it checks that mallocs are used correctly. This checker checks that pointers to inner buffers are used correctly, so we may call it InnerPointerChecker or something like that.