This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Introduce MemoryTrees
ClosedPublic

Authored by kadircet on Sep 28 2020, 6:01 AM.

Details

Summary

A structure that can be used to represent memory usage of a nested
set of systems.

Diff Detail

Event Timeline

kadircet created this revision.Sep 28 2020, 6:01 AM
kadircet requested review of this revision.Sep 28 2020, 6:01 AM
sammccall added inline comments.Sep 28 2020, 6:56 AM
clang-tools-extra/clangd/support/MemoryTree.h
23

(I'm wondering if there are other resources we'd like to count, like disk size or so, but probably YAGNI and a concrete name is nice)

25

The concept of "detail" should be introduced somewhere, or just be something more concrete
(like IsFilename)

There's a few things about filenames that make erasing them sensible:

  • they have uniform semantics - we don't really know the difference between them
  • can be very many of them so the result is inherently hard to interpret
  • can be many/large so we don't want to copy them

Thinking about how to generalize these - I think the name detail is OK (if you want a general name) but we should mention some of it.

26

do we support nodes with both usage and children?
This API makes such a construction awkward.

vs something like addUsage(size_t)

32

In earlier discussions we talked about avoiding the cost of *assembling* the detailed tree, not just summarizing it at output time.

This requires CollapseDetails to be passed into the profiling function, which in turn suggests making it part of a tree and passing a reference to the tree in. e.g. an API like

class MemoryTree {
  MemoryTree(bool Detailed);
  MemoryTree* addChild(StringLiteral); // no copy
  MemoryTree* addDetail(StringRef); // returns `this` unless detailed
  void addUsage(size_t);
}
32

It's hard to evaluate a tree-traversal API without its usages... how will this be used?

sammccall added inline comments.Sep 28 2020, 7:02 AM
clang-tools-extra/clangd/support/MemoryTree.cpp
29

Here the detailed nodes are entirely collapsed. This isn't quite as powerful as collapsing the detail *edges* only, which is worth considering.

e.g. consider TUScheduler. The natural accounting of resources is (IMO)

clangd.TUScheduler.<filename>.AST
clangd.TUScheduler.<filename>.Preamble.InMemory
clangd.TUScheduler.<filename>.Preamble.Inputs

or something like that.

Instead of aggregating to clangd.TUScheduler only, collapsing the <filename> edges mean you get clangd.TUScheduler.AST etc.

kadircet marked an inline comment as done.Oct 2 2020, 3:01 AM
kadircet added inline comments.
clang-tools-extra/clangd/support/MemoryTree.cpp
29

yes this was an option i hadn't considered. my only hesitation is this likely implies different paths with the same name. current api also doesn't protect against it, but they are a lot less likely as we collapse whole subtrees.

I believe deleting the edge should also imply aggregating all the paths with the same name, e.g.
a.b.<deleted-1>.c, 5 and a.b.<deleted-2>.c, 8 should end up being a.b.c,13. WDYT?

clang-tools-extra/clangd/support/MemoryTree.h
23

i went with this one as i didn't want to call it Tree and couldn't find a nice generic name, maybe something around NamedBytesTree.

26

yes we do, I haven't added any because it wasn't needed and internal collapsing mechanism could update the size. i'll add an explicit addUsage too.

32

you can find some in https://reviews.llvm.org/D88414

sammccall added inline comments.Oct 2 2020, 4:38 AM
clang-tools-extra/clangd/support/MemoryTree.cpp
29

Agree with your desired behavior.

If you're willing to make collapsing a property of the tree rather than the query, you avoid this problem, because you collapse up-front and make addChild idempotent:

class Tree {
  BumpPtrAllocator *DetailAllocator; // Null if no detail
  StringMap<Tree> Children;
  size_t Self = 0;

  Tree* childImpl(StringRef Name) {
    return &*Children.try_emplace(Name, DetailAllocator).first;
  }

public:  
  Tree(BumpPtrAllocator *);
  Tree* child(StringLiteral Name) { return childImpl(Name); }
  Tree* detail(StringRef Name) {
    return DetailAllocator ? child(DetailAllocator->copy(Name)) : this;
  }
  void add(unsigned Size) { Self += Size; }
};
kadircet updated this revision to Diff 296111.Oct 5 2020, 1:53 AM
kadircet marked 7 inline comments as done.
  • Address comments
sammccall accepted this revision.Oct 5 2020, 2:26 AM

Looks good - some concerns that the traversal isn't flexible enough but we can address that later once it's used.

clang-tools-extra/clangd/support/MemoryTree.cpp
11

If you're only going to pass one size, I'd think total (rather than self) is the most meaningful.
This is easier if you do postorder of course...

12

if you pass ComponentName as a StringRef you can avoid lots of allocations by using the same underlying string (which will grow and shrink, but not reallocate much).
You'll need a recursion helper with a different signature though.

clang-tools-extra/clangd/support/MemoryTree.h
32

I guess you mean D88417?

This works well when exporting all nodes somewhere keyed by dotted path, but it's pretty awkward+inefficient if you want to - totally hypothetical example here - dump the memory tree as a JSON structure over RPC.

So I think we need a more flexible API, at which point... maybe we should punt, expose const DenseMap<StringRef, MemoryTree> &children() const, and move the traverse implementation to the metrics exporter?

34

nit: I'd call this parameter DetailAlloc to hint at what's stored there

42

nit: Name.copy(*Alloc) is a little more direct

50

nit: need to be explicit whether this is self or subtree (or could pass both)

This revision is now accepted and ready to land.Oct 5 2020, 2:26 AM
sammccall added inline comments.Oct 5 2020, 2:35 AM
clang-tools-extra/clangd/support/MemoryTree.h
36

Maybe mention that child pointers are invalidated by future addChild/addDetail

37

actually, why do these return pointers rather than references?

sammccall added inline comments.Oct 5 2020, 3:04 AM
clang-tools-extra/clangd/support/MemoryTree.h
37

reading call sites, child() might be both more fluent and more accurate than addChild - we're not calling it for the side effect and there may or may not be one.

50

I think it'd be worth having size_t total() with a comment that it traverses the tree.

We have places where we only want this info (e.g. log messages) and it's probably nice in unit tests too.

kadircet updated this revision to Diff 296667.Oct 7 2020, 7:19 AM
kadircet marked 9 inline comments as done.
  • Rename add{Detail,Child} -> {detail,child}
  • Get rid of traverseTree API and only expose total
  • Add a children() getter
clang-tools-extra/clangd/support/MemoryTree.h
32

SGTM.

37

actually, why do these return pointers rather than references?

It is to make usage with auto safer, as it won't capture ref by default and make a copy. e.g. auto child = MT.child(); child.addusage(1) won't work as expected.

I suppose we can make the object non-copyable and prevent such misuse.

42

ah thanks, didn't know that!

sammccall accepted this revision.Oct 7 2020, 7:25 AM

LGTM, thanks!

clang-tools-extra/clangd/support/MemoryTree.h
37

Oh, right - yes ref but noncopyable seems even better to me

58

oops, we should probably also expose self?

kadircet updated this revision to Diff 296675.Oct 7 2020, 7:43 AM
  • Add self
kadircet marked an inline comment as done.Oct 7 2020, 7:43 AM
kadircet updated this revision to Diff 296677.Oct 7 2020, 7:51 AM
kadircet marked 2 inline comments as done.
  • Make MemoryTree move-only and return refs instead of pointers on child and detail.
sammccall accepted this revision.Oct 7 2020, 9:37 AM

Still LGTM, commit it already :-D

clang-tools-extra/clangd/support/MemoryTree.cpp
27

nit: could inline this consistent with addUsage

Still LGTM, commit it already :-D

Haha, waiting for dependents to be stamped as well :D

kadircet updated this revision to Diff 296910.Oct 8 2020, 3:13 AM
kadircet marked an inline comment as done.
  • Inline MemoryTree::self
This revision was landed with ongoing or failed builds.Oct 12 2020, 6:27 AM
This revision was automatically updated to reflect the committed changes.