This is an archive of the discontinued LLVM Phabricator instance.

[pseudo] GC GSS nodes, reuse them with a freelist
ClosedPublic

Authored by sammccall on May 31 2022, 2:58 PM.

Details

Summary

Most GSS nodes have short effective lifetimes, keeping them around until the
end of the parse is wasteful. Mark and sweep them every 20 tokens.

When parsing clangd/AST.cpp, this reduces the GSS memory from 1MB to 20kB.
We pay ~5% performance for this according to the glrParse benchmark.
(Parsing more tokens between GCs doesn't seem to improve this further).

Compared to the refcounting approach in https://reviews.llvm.org/D126337, this
is simpler (at least the complexity is better isolated) and has >2x less
overhead. It doesn't provide death handlers (for error-handling) but we have
an alternative solution in mind.

Diff Detail

Event Timeline

sammccall created this revision.May 31 2022, 2:58 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 31 2022, 2:58 PM
sammccall requested review of this revision.May 31 2022, 2:58 PM
sammccall updated this revision to Diff 433201.May 31 2022, 3:01 PM

remove dead variable

hokein added a comment.Jun 1 2022, 1:55 AM

Nice! I really like the form it goes now.

clang-tools-extra/pseudo/include/clang-pseudo/GLR.h
81–82

This FIXME is done by this patch :)

clang-tools-extra/pseudo/lib/GLR.cpp
15

an off-topic comment: we only use the function in debug branch code, this will trigger the include-cleaner warning in release mode :)

we could warp it with #ifndef NDEBUG as well.

88

I would just call it before the NewHeads.clear(), and run the gc on the NewHeads directly.

It simplifies the implementation (no need to traverse three pending actions, and no duplicated elements in Roots). The downside is that some dead heads (heads with no available actions on Terminal.symbol()) will not be GCed in this run, but I think it is negligible, as those will be GCed in next run.

392

The NodeCount becomes vague now, we update in both addNode and allocate methods, I think we should have two separate counts.

404

What's the practical size of the FreeList? I think we can just find one, and use it as initial default size, to save some cost of resize.

clang-tools-extra/pseudo/unittests/GLRTest.cpp
412

nit: since we free D, I think here E will reuse the memory free from D, maybe add pointer comparison with E and D ?

sammccall added inline comments.Jun 1 2022, 2:58 AM
clang-tools-extra/pseudo/lib/GLR.cpp
15

conditional including is ugly and can cause subtle opt-only compile failures.
I'm tempted to add a "keep" pragma but want to think about how to handle this case a bit more.

88

Oops, I forgot that Pending* are all empty at the beginning of this loop. I should probably add an assertion for that.

We can call it right at the top of the loop, yes? It doesn't need to be between AddSsteps and NewHeads.clear().

run the gc on the NewHeads directly

Unfortunately not: the GC naturally consumes the argument (it becomes the stack for the DFS) so we need a copy.
We could move the copy into gc() but:

  • this way we can reuse the same vector every time and don't have to allocate
  • soon we'll have error-recovery candidates as additional roots we want to pass in
392
  • Oops, that duplicated ++NodeCount is just a mistake.
  • names are vague now, both "NodeCount" and "Allocated" are dubious. Maybe I'll rename to NodesCreated and Alive, WDYT?
  • are there actually more metrics we want to record? we have nodes-created, currently-live-nodes (Allocated.size()), nodes-destroyed (NodeCount - Allocated.size()). We could track max-live-node but I think Arena.bytes() is fine for our purposes.
  • do we want to expose any of those numbers through APIs? I couldn't work out what I'd actually do with them.
404

It's whatever the max #parents is - maybe 10 or something?
I don't think it's worth adding a hard-coded guess to save ~1 resize of a single vector per parse!

hokein added inline comments.Jun 1 2022, 3:21 AM
clang-tools-extra/pseudo/lib/GLR.cpp
88

Oops, I forgot that Pending* are all empty at the beginning of this loop. I should probably add an assertion for that.

Yes. I would probably do it after glrReduce and glrShift calls.

We can call it right at the top of the loop, yes? It doesn't need to be between AddSsteps and NewHeads.clear().

Yes, exactly. We could also do that at the end of the loop.

run the gc on the NewHeads directly

Unfortunately not: the GC naturally consumes the argument (it becomes the stack for the DFS) so we need a copy.
We could move the copy into gc() but:

  • this way we can reuse the same vector every time and don't have to allocate
  • soon we'll have error-recovery candidates as additional roots we want to pass in

oh, I didn't notice that. I think making a copy is fine.

392

names are vague now, both "NodeCount" and "Allocated" are dubious. Maybe I'll rename to NodesCreated and Alive, WDYT?

Sounds good to me.

are there actually more metrics we want to record? we have nodes-created, currently-live-nodes (Allocated.size()), nodes-destroyed (NodeCount - Allocated.size()). We could track max-live-node but I think Arena.bytes() is fine for our purposes.

I think these are sufficient, should provide enough details about GSS-memory-related information. The other things I will like to record is the max-active-heads, but that's a different bit.

do we want to expose any of those numbers through APIs? I couldn't work out what I'd actually do with them.

I think we should avoid to expose an API per field, we probably just need a single API which returns a Stats struct containing all these things. We can do it later.

404

I would image there will be multiple resize calls, if the path like allocate(1), allocate(3), ..., allocate(max), which I think it is practical? unless the first call is allocate(max) which is unlikely I think.

sammccall updated this revision to Diff 434103.Jun 3 2022, 12:09 PM
sammccall marked 6 inline comments as done.

address comments

sammccall marked an inline comment as done.Jun 3 2022, 12:09 PM
sammccall added inline comments.
clang-tools-extra/pseudo/lib/GLR.cpp
392

OK, I haven't added any new stats APIs for now.

404

So for parsing clangd/AST.cpp, we have 4 small allocations for the whole file, which we could reduce by 3.

size increased to 1
capacity grew from 0 to 1
size increased to 2
capacity grew from 1 to 2
size increased to 3
capacity grew from 2 to 4
size increased to 4
size increased to 6
capacity grew from 4 to 8
size increased to 7

By comparison, each call to glrReduce creates 5 vectors and 2 priority queues, each of which can have multiple allocations as it grows. There are 4000 tokens, for a total of probably 50000 mallocs.

I think you're microoptimizing the wrong places :-)

hokein accepted this revision.Jun 7 2022, 12:32 AM
hokein added inline comments.
clang-tools-extra/pseudo/lib/GLR.cpp
78

nit: I'd rather pass the NewHeads as a vector parameter, and get rid of the Roots variable in the lambda, but up to you.

404

that's fair enough (and there are some opportunities to optimize the glrReduce). But I think adding a single FreeList.revese(10); statement seems trivial. Up to you.

This revision is now accepted and ready to land.Jun 7 2022, 12:32 AM
sammccall added inline comments.Jun 8 2022, 2:43 PM
clang-tools-extra/pseudo/lib/GLR.cpp
78

That would result in reallocating the array every time: we would allocate a new vector to copy NewHeads into, copy it, gc() would own it and destroy (deallocate) it at the end.

As the code is now, Roots stays alive. Each time we GC we fill it up with elements, and then gc() clears them out again, but we never actually deallocate the buffer.

This revision was landed with ongoing or failed builds.Jun 8 2022, 2:44 PM
This revision was automatically updated to reflect the committed changes.