Adds a bit of boiler plate, but removing the need to build temporary strings to compare the calls definitely worth it.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
clang/lib/Tooling/NodeIntrospection.cpp | ||
---|---|---|
84 | Would it make sense to compare the sizes (leftsize < rightsize) return true etc and only loop if the containers are the same size? |
clang/lib/Tooling/NodeIntrospection.cpp | ||
---|---|---|
84 | That wouldn't work as we are comparing lexicographically. If leftsize was smaller than rightsize, but the first item in left was zzzz, and first in right was aaaa. We'd want right to be marked as less then left. |
clang/lib/Tooling/NodeIntrospection.cpp | ||
---|---|---|
84 | I don't think we would want that. All we need is to be able to put these things in a std set. We only need uniquenes and deterministic order. We don't need or want any particular order at all. Besides, your implementation orders elements as they are formatted for c++, which could differ from the order the calls would be in if casts are omitted (js, python). Besides, https://en.m.wikipedia.org/wiki/Lexicographic_order "One variant applies to sequences of different lengths by comparing the lengths of the sequences before considering their elements." It should be possible to implement this without creating the local vectors. Just iterate in parallel through the location calls, comparing names, and if one chain ends before the other return early. That would seem to have all the properties we need, right? |
Rebase and remove Args from comparison.
Add quick check for skippig common prefix calls.
Your implementation is getting very complicated and it requires many comments.
Also, if we get to this point in the execution of RangeLessThan::operator(), you're creating and populating two vectors for every two elements compared. I know it's llvm::SmallVector and maybe not so expensive etc, but doing something is more expensive than not doing it.
You're trying to sort elements (which have the same location/range) alphabetically according to how it will be sorted after c++ formatting. Sorting according to c++ formatting means sorting incorrectly according to JS/python. This is simple to demonstrate:
TEST(Introspection, SourceLocations_CallContainer3) { SourceLocationMap slm; SharedLocationCall Prefix; auto call1 = llvm::makeIntrusiveRefCnt<LocationCall>( llvm::makeIntrusiveRefCnt<LocationCall>( llvm::makeIntrusiveRefCnt<LocationCall>( Prefix, "dddd"), "eeee", LocationCall::IsCast), "ffff"); auto call2 = llvm::makeIntrusiveRefCnt<LocationCall>( llvm::makeIntrusiveRefCnt<LocationCall>( llvm::makeIntrusiveRefCnt<LocationCall>( Prefix, "dddd"), "gggg", LocationCall::IsCast), "bbbb"); slm.insert( std::make_pair(SourceLocation(), call1)); slm.insert( std::make_pair(SourceLocation(), call2)); for (auto& item : slm) { llvm::errs() << LocationCallFormatterCpp::format(*item.second) << "\n"; } for (auto& item : slm) { llvm::errs() << LocationCallFormatterSimple::format(*item.second) << "\n"; } }
output:
dddd().eeee().ffff() dddd().gggg().bbbb() dddd().ffff() dddd().bbbb()
So, the cost of creating two vectors for each invocation of lessThan that reaches this point delivers no benefit.
Here is an implementation which does not create those vectors:
diff --git a/clang/lib/Tooling/NodeIntrospection.cpp b/clang/lib/Tooling/NodeIntrospection.cpp index 8e2d446c3efe..6e846acf9ecf 100644 --- a/clang/lib/Tooling/NodeIntrospection.cpp +++ b/clang/lib/Tooling/NodeIntrospection.cpp @@ -40,6 +40,22 @@ std::string LocationCallFormatterCpp::format(const LocationCall &Call) { return Result; } +bool locationLessThan(const LocationCall *LHS, const LocationCall *RHS) +{ + if (!LHS && !RHS) + return false; + if (LHS && !RHS) + return true; + if (!LHS && RHS) + return false; + auto compareResult = LHS->name().compare(RHS->name()); + if (compareResult < 0) + return true; + if (compareResult > 0) + return false; + return locationLessThan(LHS->on(), RHS->on()); +} + namespace internal { bool RangeLessThan::operator()( std::pair<SourceRange, SharedLocationCall> const &LHS, @@ -54,15 +70,13 @@ bool RangeLessThan::operator()( else if (LHS.first.getEnd() != RHS.first.getEnd()) return false; - return LocationCallFormatterCpp::format(*LHS.second) < - LocationCallFormatterCpp::format(*RHS.second); + return locationLessThan(LHS.second.get(), RHS.second.get()); } bool RangeLessThan::operator()( std::pair<SourceLocation, SharedLocationCall> const &LHS, std::pair<SourceLocation, SharedLocationCall> const &RHS) const { if (LHS.first == RHS.first) - return LocationCallFormatterCpp::format(*LHS.second) < - LocationCallFormatterCpp::format(*RHS.second); + return locationLessThan(LHS.second.get(), RHS.second.get()); return LHS.first < RHS.first; } } // namespace internal
Would it make sense to compare the sizes (leftsize < rightsize) return true etc and only loop if the containers are the same size?