This is an archive of the discontinued LLVM Phabricator instance.

[clang-doc] De-duplicate comments and locations
ClosedPublic

Authored by DiegoAstiazaran on Jun 6 2019, 11:01 AM.

Details

Summary

De-duplicate comments and declaration locations in reduce function.
When two files include the same header file, this file's content is mapped twice causing comments and locations to be duplicated after the reduce stage.

Diff Detail

Repository
rL LLVM

Event Timeline

Please add a test case to unittests/clang-doc/MergeTest.cpp.

clang-tools-extra/clang-doc/Representation.cpp
124–127 ↗(On Diff #203404)
if (std::find(Description.begin(), Description.end(), Comment) == Description.end())
  Description.emplace_back(std::move(Comment));
clang-tools-extra/clang-doc/Representation.h
51–59 ↗(On Diff #203404)

Use std::tie here (see Reference::operator== below as an example).

jakehehrlich added inline comments.Jun 6 2019, 2:31 PM
clang-tools-extra/clang-doc/Representation.cpp
124 ↗(On Diff #203404)

Instead of deduping like this can we just make Description a set?

clang-tools-extra/clang-doc/Representation.h
60–65 ↗(On Diff #203404)

You can do this using llvm::make_pointer_range/llvm::pointer_iterator and std::equal

Actually if we can make Description a hash set that would be best. I think using std::set would require an awkward < operator to be defined. There should be standard ways of getting hashes out of all the string and vector types and you can use hash_combine to combine all of these into a single hash.

Actually if we can make Description a hash set that would be best. I think using std::set would require an awkward < operator to be defined. There should be standard ways of getting hashes out of all the string and vector types and you can use hash_combine to combine all of these into a single hash.

This SGTM -- this would actually make it a bit more deterministic in the ordering, since we'd no longer depend on the order of file load, even.

clang-tools-extra/clang-doc/Representation.cpp
139–140 ↗(On Diff #203404)

While you're here, could you actually do the same for Location? That is, make it a hash set on SymbolInfo and dedup here?

DiegoAstiazaran retitled this revision from [clang-doc] De-duplicate comments to [clang-doc] De-duplicate comments and locations.
DiegoAstiazaran edited the summary of this revision. (Show Details)

Change how the vectors are de-duplicated.
Add de-duplication of declaration locations.
Add tests.

DiegoAstiazaran marked 4 inline comments as done.Jun 11 2019, 1:33 PM
DiegoAstiazaran added inline comments.
clang-tools-extra/clang-doc/Representation.cpp
124 ↗(On Diff #203404)

The YAML generator uses an llvm method that doesn't support sets, so changing Description to a set would break that.
Talking with Julie we decided to de-duplicate with sort and unique.

DiegoAstiazaran marked 2 inline comments as done.Jun 11 2019, 1:34 PM

LGTM

clang-tools-extra/clang-doc/Representation.h
66 ↗(On Diff #204146)

We should be explicit about what we intend to happen here. Do we just want a total ordering of any kind? Do we want some things to be "more important" than others? For instance because the ordering is lexicographical here Kind is more important than Text which is more important than Name etc...

If the intent is just to have any total order state so at the top and why we need a total ordering (for instance to have an std::set/std::map of these things or we need to sort them, etc...)

186 ↗(On Diff #204146)

Same comment here.

DiegoAstiazaran marked 3 inline comments as done.Jun 25 2019, 6:12 PM
juliehockett accepted this revision.Jun 26 2019, 10:16 AM
This revision is now accepted and ready to land.Jun 26 2019, 10:16 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJun 28 2019, 11:18 AM