This is an archive of the discontinued LLVM Phabricator instance.

clang-tidy: introduce readability-containter-data-pointer check
ClosedPublic

Authored by compnerd on Aug 29 2021, 10:43 AM.

Details

Summary

This introduces a new check, readability-containter-data-pointer. This
check is meant to catch the cases where the user may be trying to
materialize the data pointer by taking the address of the 0-th member of
a container. With C++11 or newer, the data member should be used for
this. This provides the following benefits:

  • .data() is easier to read than &[0]
  • it avoids an unnecessary re-materialization of the pointer
    • this doesn't matter in the case of optimized code, but in the case of unoptimized code, this will be visible
  • it avoids a potential invalid memory de-reference caused by the indexing when the container is empty (in debug mode, clang will normally optimize away the re-materialization in optimized builds).

The small potential behavioural change raises the question of where the
check should belong. A reasoning of defense in depth applies here, and
this does an unchecked conversion, with the assumption that users can
use the static analyzer to catch cases where we can statically identify
an invalid memory de-reference. For the cases where the static analysis
is unable to prove the size of the container, UBSan can be used to track
the invalid access.

Special thanks to Aaron Ballman for the discussion on whether this
check would be useful and where to place it.

This partially resolves PR26817!

Diff Detail

Event Timeline

compnerd created this revision.Aug 29 2021, 10:43 AM
compnerd requested review of this revision.Aug 29 2021, 10:43 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 29 2021, 10:43 AM
compnerd updated this revision to Diff 369331.Aug 29 2021, 10:47 AM

Reflow the text using clang-format.

Thank you for implementing 26817! But shouldn't this check belong to modernize module?

Please add documentation and mention new check in Release Notes.

clang-tools-extra/clang-tidy/readability/ContainerDataPointerCheck.cpp
19

Please separate with empty line.

118

Please separate with empty line.

clang-tools-extra/clang-tidy/readability/ContainerDataPointerCheck.h
18

Please separate with empty line.

41

Please separate with empty line.

46

Please add inclusion guard in comment.

compnerd updated this revision to Diff 369340.Aug 29 2021, 1:58 PM

Thank you for implementing 26817! But shouldn't this check belong to modernize module?

Oh, I was unaware of the PR, I'll tag that in the commit message, thanks!

Hmm, I'm somewhat on the fence, but I wouldn't be against the reorganization to modernize, only I would prefer that we get that settled before I do the actual rename/move.

Please add documentation and mention new check in Release Notes.

Ah, good idea, I'll add that as well.

clang-tools-extra/clang-tidy/readability/ContainerDataPointerCheck.cpp
19

Hmm, this is what clang-format does, which should be unambiguously correct. Is there something that needs to be changed in clang-format or .clang-format?

Eugene.Zelenko added inline comments.Aug 29 2021, 2:09 PM
clang-tools-extra/clang-tidy/readability/ContainerDataPointerCheck.cpp
19

Hard to tell about current state of clang-format, but you could take a look on existing checks code.

compnerd added inline comments.Aug 29 2021, 2:44 PM
clang-tools-extra/clang-tidy/readability/ContainerDataPointerCheck.cpp
19

I would prefer that we fix the clang-format configuration rather than manually change the formatting here. This is introducing a new set of files, so it isn't resulting in inconsistency in a single file. I don't mind reformatting the code with clang-format, but generally, that is considered the definitive arbiter for formatting.

compnerd added a comment.EditedAug 29 2021, 3:04 PM

Hmm, one case that doesn't currently get handled properly is the following test case:

template <typename T> void f(const T *);
void g(const std::vector<unsigned char> **v) {
  f(&(**v)[0]);
}
compnerd updated this revision to Diff 369343.Aug 29 2021, 3:10 PM
compnerd edited the summary of this revision. (Show Details)

Add some documentation and release notes.

Eugene.Zelenko added inline comments.Aug 29 2021, 8:09 PM
clang-tools-extra/docs/ReleaseNotes.rst
87

It'll be good idea to mention containers and that data is method.

clang-tools-extra/docs/clang-tidy/checks/readability-data-pointer.rst
6

Please make first statement same as in Release Notes.

compnerd marked 2 inline comments as done.Aug 30 2021, 8:44 AM
compnerd added inline comments.
clang-tools-extra/docs/ReleaseNotes.rst
87

That was dumb, you're right. Thanks!

clang-tools-extra/docs/clang-tidy/checks/readability-data-pointer.rst
6

That was a good idea. I find that the updated text reads much better too.

compnerd updated this revision to Diff 369453.Aug 30 2021, 8:46 AM
compnerd marked 2 inline comments as done.

Update release notes to incorporate feedback.

@Eugene.Zelenko I'd like to have @aaron.ballman weigh in on the naming, and assuming that he's okay with modernize.container-data-pointer, I'll change it to that.

@Eugene.Zelenko I'd like to have @aaron.ballman weigh in on the naming, and assuming that he's okay with modernize.container-data-pointer, I'll change it to that.

Sure, Aaron has much broader vision than me.

@Eugene.Zelenko I'd like to have @aaron.ballman weigh in on the naming, and assuming that he's okay with modernize.container-data-pointer, I'll change it to that.

Sure, Aaron has much broader vision than me.

Lies! I just wear glasses, that's all. ;-)

I can see a case being made for either modernize or readability. For modernize, data() was newly added in C++11, so you could argue that this is used to modernize C++98 code. For readability, data() makes the semantics of the underlying code more clear. Also, bugprone makes sense too because doing &v[0] is UB if v is empty, but v.data() is not (immediately) UB.

I think bugprone would be a tough place to put it because of false positive concerns. This is more of a blanket code rewriting utility than it is trying to catch bugs (the static analyzer and UBSan are better tools for that in that case).
I think modernize would be reasonable, but given that C++11 is already over ten years old, I feel like this is losing its "modernization" aspects in some ways. Also, this only covers converting &vec[0] into a call to data(); for modernize, I'd expect it to convert &vec[N] into vec.data() + N more broadly.
I think readability makes slightly more sense because then we can position this as a rewriting tool to make the code more readable (so there are no false positives -- it transforms all &vec[0] calls to data()).

However, if someone wants to make a strong case for modernize, I think it'd be defensible. Also, if someone wanted to suggest we add it to both readability and modernize, that could also make sense (perhaps with an option to convert &vec[N] that's on by default for modernize and off by default for readability).

aaron.ballman added inline comments.Sep 13 2021, 6:20 AM
clang-tools-extra/clang-tidy/readability/ContainerDataPointerCheck.cpp
57

Is this pointsTo() correct? IIRC, that would be trying to capture something like:

std::vector<int> *container;
&container[0];

but this wouldn't call the operator[] overload, it'd be using the builtin array subscripting, so I'm not certain this would match anything.

64

Same question here about pointsTo().

76

Here too.

89

And here.

103–106
clang-tools-extra/clang-tidy/readability/ContainerDataPointerCheck.h
24
46
compnerd marked 7 inline comments as done.Sep 13 2021, 9:58 AM
compnerd added inline comments.
clang-tools-extra/clang-tidy/readability/ContainerDataPointerCheck.cpp
64

This is actually meant to catch cases such as references to UnOp(DRE).

compnerd updated this revision to Diff 372285.Sep 13 2021, 9:59 AM

Address review feedback

aaron.ballman accepted this revision.Sep 13 2021, 10:41 AM
aaron.ballman added inline comments.
clang-tools-extra/clang-tidy/readability/ContainerDataPointerCheck.cpp
64

Good catch and thank you for all the off-list discussion on this point!

clang-tools-extra/test/clang-tidy/checkers/readability-container-data-pointer.cpp
105

Can you add one more test for how this behaves on uninstantiated templates?

template <typename Ty>
void func(const std::vector<Ty> &Vec) {
  const Ty *Ptr = &Vec[0]; // Should diagnose and provide a fix-it despite not instantiating func()
}
This revision is now accepted and ready to land.Sep 13 2021, 10:41 AM
compnerd marked an inline comment as done.Sep 13 2021, 1:14 PM
compnerd added inline comments.
clang-tools-extra/test/clang-tidy/checkers/readability-container-data-pointer.cpp
105

Absolutely! Thanks for the additional test case.

LGTM modulo the testing request (I forgot to mention that when marking it yesterday, oops!).

What about adding modernize/bugprone aliases?

What about adding modernize/bugprone aliases?

I'd be fine if we wanted to add aliases, but I'd sort of expect some extra functionality out of a check in those modules. I think it's fine to land this now, and we can add aliases in a follow-up. WDYT?

What about adding modernize/bugprone aliases?

I'd be fine if we wanted to add aliases, but I'd sort of expect some extra functionality out of a check in those modules. I think it's fine to land this now, and we can add aliases in a follow-up. WDYT?

Aliases code is trivial comparing with check itself, so it make sense to add it in this patch.

What about adding modernize/bugprone aliases?

I'd be fine if we wanted to add aliases, but I'd sort of expect some extra functionality out of a check in those modules. I think it's fine to land this now, and we can add aliases in a follow-up. WDYT?

Aliases code is trivial comparing with check itself, so it make sense to add it in this patch.

So you'd like to see that extra functionality added now? (I don't think it makes sense to have the check as-is in both readability and modernize.)

So you'd like to see that extra functionality added now? (I don't think it makes sense to have the check as-is in both readability and modernize.)

There are plenty of other cross-module aliases already.

So you'd like to see that extra functionality added now? (I don't think it makes sense to have the check as-is in both readability and modernize.)

There are plenty of other cross-module aliases already.

Those are most often checks shared with coding guidelines or they have configurable differences between the modules. I do not think we commonly share identical checks between modules that are not coding standards.

This revision was automatically updated to reflect the committed changes.
compnerd marked an inline comment as done.

@Eugene.Zelenko - sorry, I didn't see the additional comments before the commit. I'm happy to do a follow up depending on the resolution.

@Eugene.Zelenko - sorry, I didn't see the additional comments before the commit. I'm happy to do a follow up depending on the resolution.

If there's a follow-up to add this to modernize, I would say that version of the check should transition &v[N] to v.data() + N (perhaps optionally). I don't think a follow-up to add this to bugprone makes a lot of sense because of the high potential for false positive diagnostics (I think the bugprone one should be a static analyzer check that looks for a preceding predicate that checks the size of the container or whether it's empty).

thakis added a subscriber: thakis.Sep 14 2021, 8:57 AM

This breaks tests: http://45.33.8.238/linux/55784/step_8.txt

Please take a look and revert for now if it takes a while to fix.

Reverted in 76dc8ac36d07 for now.

@thakis - thanks, seems that I had a part of the change sitting in my stash ... I had added a -NOT to verify the behaviour, and forgot to remove it. I'll revert the revert with the fix.

49992c04148e5327bef9bd2dff53a0d46004b4b4 relanded the change. It is useful to attach the original Differential Revision: so that it is connected to this Differential.

It seems like the checker is documented as readability-data-pointer but in the tests it reports issues under the readability-container-data-pointer name.
Shouldn't they be the same? I think it will confuse the users.
@aaron.ballman @whisperity @compnerd

It seems like the checker is documented as readability-data-pointer but in the tests it reports issues under the readability-container-data-pointer name.
Shouldn't they be the same? I think it will confuse the users.
@aaron.ballman @whisperity @compnerd

Sounds like a typo or an intermittent rebase or rename missing something. Definitely should be consistent.

It seems like the checker is documented as readability-data-pointer but in the tests it reports issues under the readability-container-data-pointer name.
Shouldn't they be the same? I think it will confuse the users.

Sounds like a typo or an intermittent rebase or rename missing something. Definitely should be consistent.

@steakhal Fixed in rG164ee457a04daf727ead99b43ad490ea05523652.