This is an archive of the discontinued LLVM Phabricator instance.

[AST][Introspection] Avoid creating temporary strings when comparing LocationCalls.
AbandonedPublic

Authored by njames93 on Apr 16 2021, 3:51 AM.

Details

Reviewers
steveire
Summary

Adds a bit of boiler plate, but removing the need to build temporary strings to compare the calls definitely worth it.

Diff Detail

Event Timeline

njames93 requested review of this revision.Apr 16 2021, 3:51 AM
njames93 created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptApr 16 2021, 3:51 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
steveire added inline comments.Apr 16 2021, 4:28 AM
clang/lib/Tooling/NodeIntrospection.cpp
75

Would it make sense to compare the sizes (leftsize < rightsize) return true etc and only loop if the containers are the same size?

njames93 added inline comments.Apr 16 2021, 5:06 AM
clang/lib/Tooling/NodeIntrospection.cpp
75

That wouldn't work as we are comparing lexicographically. If leftsize was smaller than rightsize, but the first item in left was zzzz, and first in right was aaaa. We'd want right to be marked as less then left.

steveire added inline comments.Apr 16 2021, 5:15 AM
clang/lib/Tooling/NodeIntrospection.cpp
75

I don't think we would want that. All we need is to be able to put these things in a std set. We only need uniquenes and deterministic order. We don't need or want any particular order at all.

Besides, your implementation orders elements as they are formatted for c++, which could differ from the order the calls would be in if casts are omitted (js, python).

Besides, https://en.m.wikipedia.org/wiki/Lexicographic_order "One variant applies to sequences of different lengths by comparing the lengths of the sequences before considering their elements."

It should be possible to implement this without creating the local vectors. Just iterate in parallel through the location calls, comparing names, and if one chain ends before the other return early. That would seem to have all the properties we need, right?

njames93 updated this revision to Diff 338356.Apr 17 2021, 11:50 PM

Rebase and remove Args from comparison.
Add quick check for skippig common prefix calls.

steveire added a comment.EditedApr 18 2021, 6:35 AM

Your implementation is getting very complicated and it requires many comments.

Also, if we get to this point in the execution of RangeLessThan::operator(), you're creating and populating two vectors for every two elements compared. I know it's llvm::SmallVector and maybe not so expensive etc, but doing something is more expensive than not doing it.

You're trying to sort elements (which have the same location/range) alphabetically according to how it will be sorted after c++ formatting. Sorting according to c++ formatting means sorting incorrectly according to JS/python. This is simple to demonstrate:

TEST(Introspection, SourceLocations_CallContainer3) {

  SourceLocationMap slm;
  SharedLocationCall Prefix;
  auto call1 = llvm::makeIntrusiveRefCnt<LocationCall>(
                                        llvm::makeIntrusiveRefCnt<LocationCall>(
                                        llvm::makeIntrusiveRefCnt<LocationCall>(
                                        Prefix, "dddd"), "eeee", LocationCall::IsCast), "ffff");
  auto call2 = llvm::makeIntrusiveRefCnt<LocationCall>(
                                        llvm::makeIntrusiveRefCnt<LocationCall>(
                                        llvm::makeIntrusiveRefCnt<LocationCall>(
                                        Prefix, "dddd"), "gggg", LocationCall::IsCast), "bbbb");
  slm.insert(
      std::make_pair(SourceLocation(), call1));
  slm.insert(
      std::make_pair(SourceLocation(), call2));

  for (auto& item : slm)
  {
    llvm::errs() << LocationCallFormatterCpp::format(*item.second) << "\n";
  }
  for (auto& item : slm)
  {
    llvm::errs() << LocationCallFormatterSimple::format(*item.second) << "\n";
  }
}

output:

dddd().eeee().ffff()
dddd().gggg().bbbb()
dddd().ffff()
dddd().bbbb()

So, the cost of creating two vectors for each invocation of lessThan that reaches this point delivers no benefit.

Here is an implementation which does not create those vectors:

diff --git a/clang/lib/Tooling/NodeIntrospection.cpp b/clang/lib/Tooling/NodeIntrospection.cpp
index 8e2d446c3efe..6e846acf9ecf 100644
--- a/clang/lib/Tooling/NodeIntrospection.cpp
+++ b/clang/lib/Tooling/NodeIntrospection.cpp
@@ -40,6 +40,22 @@ std::string LocationCallFormatterCpp::format(const LocationCall &Call) {
   return Result;
 }
 
+bool locationLessThan(const LocationCall *LHS, const LocationCall *RHS)
+{
+  if (!LHS && !RHS)
+    return false;
+  if (LHS && !RHS)
+    return true;
+  if (!LHS && RHS)
+    return false;
+  auto compareResult = LHS->name().compare(RHS->name());
+  if (compareResult < 0)
+    return true;
+  if (compareResult > 0)
+    return false;
+  return locationLessThan(LHS->on(), RHS->on());
+}
+
 namespace internal {
 bool RangeLessThan::operator()(
     std::pair<SourceRange, SharedLocationCall> const &LHS,
@@ -54,15 +70,13 @@ bool RangeLessThan::operator()(
   else if (LHS.first.getEnd() != RHS.first.getEnd())
     return false;
 
-  return LocationCallFormatterCpp::format(*LHS.second) <
-         LocationCallFormatterCpp::format(*RHS.second);
+  return locationLessThan(LHS.second.get(), RHS.second.get());
 }
 bool RangeLessThan::operator()(
     std::pair<SourceLocation, SharedLocationCall> const &LHS,
     std::pair<SourceLocation, SharedLocationCall> const &RHS) const {
   if (LHS.first == RHS.first)
-    return LocationCallFormatterCpp::format(*LHS.second) <
-           LocationCallFormatterCpp::format(*RHS.second);
+    return locationLessThan(LHS.second.get(), RHS.second.get());
   return LHS.first < RHS.first;
 }
 } // namespace internal
njames93 abandoned this revision.Apr 24 2021, 11:39 PM