Page MenuHomePhabricator

[clang-tidy] Extend readability-container-size-empty to arbitrary class with size() and empty()
ClosedPublic

Authored by omtcyfz on Sep 8 2016, 8:57 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

omtcyfz updated this revision to Diff 70708.Sep 8 2016, 8:57 AM
omtcyfz retitled this revision from to [clang-tidy] Extend readability-container-size-empty to arbitrary class with size() and empty().
omtcyfz updated this object.
omtcyfz added reviewers: alexfh, Eugene.Zelenko, klimek.
omtcyfz added a subscriber: cfe-commits.
aaron.ballman added a subscriber: aaron.ballman.
aaron.ballman added inline comments.
clang-tidy/readability/ContainerSizeEmptyCheck.cpp
33 ↗(On Diff #70708)

While it's not entirely unreasonable, it makes me uneasy to assume that anything with a size() and empty() member function must be a container. Is there a way to silence false positives aside from disabling the check entirely?

Perhaps this should instead be a user-configurable list of containers that's prepopulated with STL container names?

Also note: returns(isInteger()) seems a bit permissive. Consider:

bool size() const;

enum E { one };
enum E size() const;

These both will qualify for that matcher, but don't seem like particularly valid indications that the object is a container.

omtcyfz updated this revision to Diff 70717.Sep 8 2016, 10:00 AM

Blacklist enum and bool return types for size().

omtcyfz marked an inline comment as done.Sep 8 2016, 10:00 AM
omtcyfz added inline comments.
clang-tidy/readability/ContainerSizeEmptyCheck.cpp
33 ↗(On Diff #70717)

Thank you!

Blacklisted these types.

Actually, I believe if someone has bool size() in their codebase there's still a problem...

Probably check should have options to extend list of containers and also to assume all classes with integer type size() const and bool empty() const as containers. It may be not trivial to find out all custom containers and last option will be helpful to assemble such list.

aaron.ballman edited edge metadata.Sep 8 2016, 12:18 PM

Probably check should have options to extend list of containers and also to assume all classes with integer type size() const and bool empty() const as containers. It may be not trivial to find out all custom containers and last option will be helpful to assemble such list.

I think this should actually have two options: one is a list of default-checked containers (default is the list of STL containers we previously had), and the other is a Boolean option for guessing at what might be a container based on function signature (default is true). This gives people a way to silence the diagnostic while still being useful. I would document what function signatures we use as our heuristic, but note that the function signatures we care about may change (for instance, we may decide that const-qualification is unimportant, or that we want to require begin()/end() functions, etc). I think <standard integer type> size() const and bool empty() const are a reasonable heuristic for a first pass.

clang-tidy/readability/ContainerSizeEmptyCheck.cpp
33 ↗(On Diff #70717)

Blacklisted these types.

I'm still not certain this is correct. For instance, if someone wants to do uint8_t size() const;, we shouldn't punish them because uint8_t happens to be unsigned char for their platform. But you are correct that we don't want this to work with char32_t, for instance.

I think the best approach is to make a local matcher for the type predicate. We want isInteger() unless it's char (that's not explicitly qualified with signed or unsigned), bool, wchar_t, char16_t, char32_t, or an enumeration type.

Actually, I believe if someone has bool size() in their codebase there's still a problem...

Agreed, though not as part of this check. ;-)

Should we also check for absence of parameters in size() and empty() as well as const?

Should we also check for absence of parameters in size() and empty() as well as const?

I think that would be reasonable.

Eugene.Zelenko edited edge metadata.Sep 8 2016, 1:06 PM
Eugene.Zelenko set the repository for this revision to rL LLVM.
Eugene.Zelenko edited subscribers, added: xazax.hun, hokein, Prazek, etienneb; removed: aaron.ballman.
omtcyfz marked an inline comment as done.EditedSep 8 2016, 1:14 PM

Probably check should have options to extend list of containers and also to assume all classes with integer type size() const and bool empty() const as containers. It may be not trivial to find out all custom containers and last option will be helpful to assemble such list.

I was thinking about that, but I'm not sure people are misusing size() and empty() so much.

Should we also check for absence of parameters in size() and empty() as well as const?

I think that would be reasonable.

I do not agree about the const part. It should be encouraged to use const for these, but I do not think they should be a requirement.

If size() and empty() change object's state, it may be not equivalent replacement.

If size() and empty() change object's state, it may be not equivalent replacement.

True. But my point is that they are not required to do that if they're just not marked const.

If size() and empty() change object's state, it may be not equivalent replacement.

True. But my point is that they are not required to do that if they're just not marked const.

I agree with Eugene. But I think a good consensus could be to warn on the non-const case but do not do the rewrite. What do you think?

If size() and empty() change object's state, it may be not equivalent replacement.

True. But my point is that they are not required to do that if they're just not marked const.

I agree with Eugene. But I think a good consensus could be to warn on the non-const case but do not do the rewrite. What do you think?

I think that's reasonable, depending on whether we find false positives with the warning as well (I have a slight concern about size() and empty() being unrelated operations on a non-container class that someone writes).

I think that's reasonable, depending on whether we find false positives with the warning as well (I have a slight concern about size() and empty() being unrelated operations on a non-container class that someone writes).

Agreed. I'd like to try LLVM as an example of a huge codebase tomorrow to see how many false-positives there actually are.

Generally, I can see the point of requiring const and providing options to reduce the set of allowed classes to std::container's, but at the same time my thoughts are as follows:

  1. Chances of people misusing C++ so much are very low. That's something I can check tomorrow just by running the check on few large open source codebases to see whether it's true.
  2. If people are misusing C++ it's their fault anyway. One can do whatever he likes to do. One can rewrite all interfaces to STL containers and make size() and empty() most sophisticated functions the world has ever seen while performing whatever strange operations inside of them. Thus, I am not very supportive of the idea "let's try to think about all weird cornercases".

I'm very curious to see how many false-positives there would be in LLVM codebase, for example. I'll try it tomorrow.

Prazek added inline comments.Sep 8 2016, 2:57 PM
clang-tidy/readability/ContainerSizeEmptyCheck.cpp
34 ↗(On Diff #70717)

Would be nice to have 'isStrictlyInteger' matcher to do this thing.

alexfh edited edge metadata.Sep 9 2016, 5:17 AM

Thank you for the patch!

Probably check should have options to extend list of containers and also to assume all classes with integer type size() const and bool empty() const as containers. It may be not trivial to find out all custom containers and last option will be helpful to assemble such list.

I think this should actually have two options: one is a list of default-checked containers (default is the list of
STL containers we previously had), and the other is a Boolean option for guessing at what might be a
container based on function signature (default is true). This gives people a way to silence the diagnostic
while still being useful.

While I understand your concerns, I would suggest to wait for actual bug reports before introducing more options. The duck typing approach worked really well for readability-redundant-smartptr-get and I expect it to work equally well here. The constraints checked here are pretty strict and should make the check safe.

I would document what function signatures we use as our heuristic, but note that the function signatures we care about may change (for instance, we may decide that const-qualification is unimportant, or that we want to require begin()/end() functions, etc). I think <standard integer type> size() const and bool empty() const are a reasonable heuristic for a first pass.

Yep, documentation should be updated accordingly.

clang-tidy/readability/ContainerSizeEmptyCheck.cpp
33 ↗(On Diff #70717)

Same here, let's wait for a real user asking to support his real container class that defines uint8_t size() const;.

145 ↗(On Diff #70717)

Since we're expanding the set of supported classes, it makes sense to include the name of the class to the diagnostic message.

alexfh removed a reviewer: klimek.Sep 9 2016, 5:17 AM

Thank you for the patch!

Probably check should have options to extend list of containers and also to assume all classes with integer type size() const and bool empty() const as containers. It may be not trivial to find out all custom containers and last option will be helpful to assemble such list.

I think this should actually have two options: one is a list of default-checked containers (default is the list of
STL containers we previously had), and the other is a Boolean option for guessing at what might be a
container based on function signature (default is true). This gives people a way to silence the diagnostic
while still being useful.

While I understand your concerns, I would suggest to wait for actual bug reports before introducing more options. The duck typing approach worked really well for readability-redundant-smartptr-get and I expect it to work equally well here. The constraints checked here are pretty strict and should make the check safe.

That means there is no way to silence this check on false positives aside from disabling it entirely, which is not a particularly good user experience. However, I do agree that if we constrain the heuristic appropriately, the number of false positives should be low.

clang-tidy/readability/ContainerSizeEmptyCheck.cpp
33 ↗(On Diff #70717)

I'm not certain I agree with that (I don't like adding new functionality that is known to be imprecise, especially when dealing with heuristic-based checking), but I don't have that strong of feelings.

omtcyfz updated this revision to Diff 70818.Sep 9 2016, 5:51 AM
omtcyfz removed rL LLVM as the repository for this revision.

Restricted size() and empty() functions a little bit more.

omtcyfz updated this revision to Diff 70822.Sep 9 2016, 6:27 AM

Allow inheritance for size() and empty().

alexfh requested changes to this revision.Sep 12 2016, 2:47 AM
alexfh edited edge metadata.

Thank you for the patch!

Probably check should have options to extend list of containers and also to assume all classes with integer type size() const and bool empty() const as containers. It may be not trivial to find out all custom containers and last option will be helpful to assemble such list.

I think this should actually have two options: one is a list of default-checked containers (default is the list of
STL containers we previously had), and the other is a Boolean option for guessing at what might be a
container based on function signature (default is true). This gives people a way to silence the diagnostic
while still being useful.

While I understand your concerns, I would suggest to wait for actual bug reports before introducing more options. The duck typing approach worked really well for readability-redundant-smartptr-get and I expect it to work equally well here. The constraints checked here are pretty strict and should make the check safe.

That means there is no way to silence this check on false positives aside from disabling it entirely, which is not a particularly good user experience. However, I do agree that if we constrain the heuristic appropriately, the number of false positives should be low.

Just checked: readability-container-size-empty with the patch finds a few thousand new warnings related to about 400 unique container names in our code. A representative sample shows no false positives and we've found just a single false negative (that triggers the check without the patch) - a container's size() method was not const. Which makes me pretty sure that:

  1. The new algorithm is much more consistent and useful than the whitelist approach (gathering and maintaining a whitelist of a few hundred names is also not fun at all).
  2. Given the size of our code and the number of third-party libraries, possibility of false positives is extremely low.
  3. If there are reports of false positives, we could go with a blacklist approach instead, but I don't think this will ever be needed.
clang-tidy/readability/ContainerSizeEmptyCheck.cpp
33 ↗(On Diff #70822)

I don't think we can come up with a proper solution until we see a real world example. If we want to find one faster, we can allow all character types and wait for reports of false positives ;)

docs/clang-tidy/checks/readability-container-size-empty.rst
18 ↗(On Diff #70822)

The check

18 ↗(On Diff #70822)

Double backquotes should be used.

21 ↗(On Diff #70822)

nit: space before c++.

23 ↗(On Diff #70822)

Explain the constraints size_type should satisfy.

This revision now requires changes to proceed.Sep 12 2016, 2:47 AM
omtcyfz updated this revision to Diff 70986.Sep 12 2016, 3:04 AM
omtcyfz edited edge metadata.
omtcyfz marked 4 inline comments as done.

Address another round of comments.

omtcyfz updated this revision to Diff 70989.Sep 12 2016, 3:15 AM
omtcyfz edited edge metadata.

Messed up with the last diff; fix that + clang-format the check.

alexfh accepted this revision.Sep 13 2016, 1:44 AM
alexfh edited edge metadata.

LG

clang-tidy/readability/ContainerSizeEmptyCheck.cpp
35 ↗(On Diff #70989)

You could probably combine the two returns matchers. Something like this: returns(qualType(isInteger(), unless(booleanType()))).

This revision is now accepted and ready to land.Sep 13 2016, 1:44 AM
omtcyfz updated this revision to Diff 71119.Sep 13 2016, 1:57 AM
omtcyfz edited edge metadata.
omtcyfz marked an inline comment as done.

Combine two returns matchers.

This revision was automatically updated to reflect the committed changes.