This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] Add check for classes missing -hash ⚠️
ClosedPublic

Authored by stephanemoore on Sep 18 2019, 3:44 PM.

Details

Summary

Apple documentation states that:
"If two objects are equal, they must have the same hash value. This last
point is particularly important if you define isEqual: in a subclass and
intend to put instances of that subclass into a collection. Make sure
you also define hash in your subclass."
https://developer.apple.com/documentation/objectivec/1418956-nsobject/1418795-isequal?language=objc

In many or all versions of libobjc, -[NSObject isEqual:] is a pointer
equality check and -[NSObject hash] returns the messaged object's
pointer. A relatively common form of developer error is for a developer to
override -isEqual: in a subclass without overriding -hash to ensure that
hashes are equal for objects that are equal.

It is assumed that an override of -isEqual: is a strong signal for
changing the object's equality operator to something other than pointer
equality which implies that a missing override of -hash could result in
distinct objects being equal but having distinct hashes because they are
independent instances. This added check flags classes that override
-isEqual: but inherit NSObject's implementation of -hash to warn of the
potential for unexpected behavior.

The proper implementation of -hash is the responsibility of the
developer and the check will only verify that the developer made an
effort to properly implement -hash. Developers can set up unit tests
to verify that their implementation of -hash is appropriate.

Test Notes:
Ran check-clang-tools.

Diff Detail

Repository
rL LLVM

Event Timeline

stephanemoore created this revision.Sep 18 2019, 3:44 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 18 2019, 3:44 PM
Eugene.Zelenko added inline comments.
clang-tools-extra/docs/ReleaseNotes.rst
105 ↗(On Diff #220763)

Wrong place. Please move to new checks list (in alphabetical order).

Eugene.Zelenko added inline comments.Sep 18 2019, 5:30 PM
clang-tools-extra/clang-tidy/objc/MissingHashCheck.cpp
40 ↗(On Diff #220763)

Should check if language is Objective-C. See ForbiddenSubclassingCheck.cpp as example. Will be good ensure that all Objective-C checks do this.

Restrict ojbc-missing-hash to Objective-C language variants and fix sorting of
release notes.

stephanemoore marked 4 inline comments as done.Sep 19 2019, 12:45 AM
stephanemoore added inline comments.
clang-tools-extra/clang-tidy/objc/MissingHashCheck.cpp
40 ↗(On Diff #220763)

Thanks for calling this out; I tried using the check addition script and forgot to include this important conditional. I believe this has been resolved with my proposed changes.

clang-tools-extra/docs/ReleaseNotes.rst
105 ↗(On Diff #220763)

Thanks for pointing this out. I believe I may have made an error while resolving merge conflicts. I believe this has now been resolved.

aaron.ballman added inline comments.Sep 19 2019, 7:42 AM
clang-tools-extra/clang-tidy/objc/MissingHashCheck.cpp
56 ↗(On Diff #220823)

Do you think we could generate a fixit to add the hash method? Do you think we could even add a default implementation that returns the pointer to the object (assuming that's the correct default behavior)?

stephanemoore marked 3 inline comments as done.Sep 19 2019, 7:34 PM
stephanemoore added inline comments.
clang-tools-extra/clang-tidy/objc/MissingHashCheck.cpp
56 ↗(On Diff #220823)

Do you think we could generate a fixit to add the hash method?

I think it would be pretty tough to generate a reasonable hash method without knowing the equality and hashing semantics that the scenario calls for.

Here is an analogous situation presented in C++ (please excuse the hastily assembled sample code):

namespace {

class NSObject {
  public:
    NSObject() {}
    virtual ~NSObject() {}

    virtual bool isEqual(const NSObject *other) const {
      return this == other;
    }
    virtual unsigned long long hash() const {
      return (unsigned long long)this;
    }
};

}

#include <stdio.h>
#include <string>

namespace {

class Movie : public virtual NSObject {
  private:
    std::string name;
    std::string language;

  public:
    Movie(std::string name, std::string language) : name(name), language(language) {}
    ~Movie() override {}
    bool isEqual(const NSObject *other) const override {
      if (auto otherMovie = dynamic_cast<const Movie *>(other)) {
        // Movies with the same name are considered equal
        // regardless of the language of the screening.
        return name == otherMovie->name;
      }
      return false;
    }
    unsigned long long hash() const override {
      return name.length();
    }
};

}

As before, the base class uses pointer equality and the pointer as a hash. A subclass may arbitrarily add additional state but only the developer knows which added state factors into equality operations and consequently should be considered—but not necessarily required—in the hash operation. The matter can technically get even more complicated if an object stores state externally. I would hope that externally stored state would not factor into the equality operation of an object but I hesitate to make an assumption.

The developer is also in the best position to prioritize different properties of the hash function including performance, collision resistance, uniformity, and non-invertibility.

Writing effective hash functions is probably difficult independent of the programming language but it might help to consider some specific examples in Objective-C. GPBMessage, the Objective-C base class for Google Protocol Buffer message classes, implements -hash but has an extensive comment explaining that its complex but generic implementation is not generally optimal and recommends that developers override -hash and -isEqual: to optimize for runtime performance. In contrast, the basic collection classes in Apple's Foundation framework have surprisingly simple hash behavior that clearly indicate priority to runtime performance over uniformity and collision resistance. The former is a conservatively expensive hash function and the latter is a conservatively inexpensive hash function.

Do you think we could even add a default implementation that returns the pointer to the object (assuming that's the correct default behavior)?

A hash returning the object pointer is already inherited from the superclass (i..e, -[NSObject hash]). Defining an override that returns the object pointer would be a functional no-op for classes directly derived from NSObject (although the explicit override could be useful as a signal of intended behavior).

aaron.ballman accepted this revision.Sep 20 2019, 5:40 AM

LGTM!

clang-tools-extra/clang-tidy/objc/MissingHashCheck.cpp
56 ↗(On Diff #220823)

A hash returning the object pointer is already inherited from the superclass (i..e, -[NSObject hash]). Defining an override that returns the object pointer would be a functional no-op for classes directly derived from NSObject (although the explicit override could be useful as a signal of intended behavior).

Ah, my ObjC knowledge is weak and I was thinking that the one inherited from NSObject would be hidden. Thank you for the detailed explanation, that makes a lot of sense to me.

This revision is now accepted and ready to land.Sep 20 2019, 5:40 AM
stephanemoore marked 2 inline comments as done.Sep 20 2019, 5:44 PM

Thanks for the review!

clang-tools-extra/clang-tidy/objc/MissingHashCheck.cpp
56 ↗(On Diff #220823)

Thank you for the detailed explanation, that makes a lot of sense to me.

My pleasure!

This revision was automatically updated to reflect the committed changes.
stephanemoore marked an inline comment as done.
Herald added a project: Restricted Project. · View Herald TranscriptSep 20 2019, 6:22 PM