This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Reuse buffer for JSONTransport::sendMessage
ClosedPublic

Authored by njames93 on Dec 18 2020, 3:56 AM.

Details

Summary

Allocate a Buffer in the JSONTransport to be used when sending messages to the client.
This gets reused each time a message is sent, resulting in fewer mallocs, which is always a bonus.
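
As a rough illustration of the idea (not the patch itself; the member names OutputBuffer, Out, and Pretty are made up for this sketch), the buffer is cleared but never deallocated between messages, so its capacity is reused:

    // Sketch only: reuse one heap allocation across sendMessage calls.
    #include "llvm/Support/FormatVariadic.h"
    #include "llvm/Support/JSON.h"
    #include "llvm/Support/raw_ostream.h"
    #include <string>

    class JSONTransport {
      llvm::raw_ostream &Out;
      bool Pretty;
      std::string OutputBuffer; // grows to the largest message sent, never shrinks

    public:
      JSONTransport(llvm::raw_ostream &Out, bool Pretty) : Out(Out), Pretty(Pretty) {}

      void sendMessage(llvm::json::Value Message) {
        OutputBuffer.clear(); // drops contents but keeps the existing capacity
        llvm::raw_string_ostream OS(OutputBuffer);
        OS << llvm::formatv(Pretty ? "{0:2}" : "{0}", Message);
        OS.flush();
        // LSP framing: Content-Length header, blank line, then the JSON body.
        Out << "Content-Length: " << OutputBuffer.size() << "\r\n\r\n" << OutputBuffer;
        Out.flush();
      }
    };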

Diff Detail

Event Timeline

njames93 created this revision. Dec 18 2020, 3:56 AM
njames93 requested review of this revision. Dec 18 2020, 3:56 AM

100% sure that failure is from trunk

njames93 updated this revision to Diff 312780. Dec 18 2020, 6:10 AM

Extend buffer behaviour to sendMessage.

Calls to this function seem to be guarded (up the call stack) by a mutex, so again we shouldn't worry about races.
Besides, there is no synchronization in the function currently when writing to the output stream.

This adds a bit of complexity, making the code here a fair amount harder to follow and verify the correctness of.

  • Do we have evidence that these allocations are causing a problem? (e.g. do we observe a significant decrease in RSS after the patch)? Naively I would expect these allocations to be basically unimportant compared to those of the JSON objects themselves. (And I don't particularly expect either of them to be significant - the comment on the other review was really just "0 probably isn't the right arg to malloc_trim if there's any allocation going on").
  • there seem to be simpler ways to structure this while avoiding allocations. JSONLineBuffer is effectively statically bounded, and can be SmallString<32> or so. The content buffer could simply be passed in if I'm reading right: bool readRawMessage(std::string&)? OutputBuffer probably does need to be a member variable though.

This adds a bit of complexity, making the code here a fair amount harder to follow and verify the correctness of.

  • Do we have evidence that these allocations are causing a problem? (e.g. do we observe a significant decrease in RSS after the patch)? Naively I would expect these allocations to be basically unimportant compared to those of the JSON objects themselves. (And I don't particularly expect either of them to be significant - the comment on the other review was really just "0 probably isn't the right arg to malloc_trim if there's any allocation going on").

It's not causing problems per se, but given that the incoming JSON messages can contain a whole file plus things like escape chars, it's wise to allocate a buffer that will grow to the largest JSON it receives but never shrink.

  • there seem to be simpler ways to structure this while avoiding allocations. JSONLineBuffer is effectively statically bounded, and can be SmallString<32> or so. The content buffer could simply be passed in if I'm reading right: bool readRawMessage(std::string&)? OutputBuffer probably does need to be a member variable though.

It's not statically bounded unfortunately; the length is however long a line in the JSON is, which can be arbitrarily large. The read-line function uses a buffer size of 1024 as an upper bound.
However, that can easily be exceeded, as I think the contents of files are escaped, so they will be read as one long line. It may be slightly more readable to make the function take a reference to the buffer, though.
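
For illustration, the "pass a reference to the buffer" shape being discussed would look roughly like this (readRawMessage here is simplified, and handleMessage is a hypothetical stand-in for dispatch, not clangd's actual API):

    // Sketch: the caller owns the buffer, so its capacity survives iterations.
    #include <string>

    bool readRawMessage(std::string &JSON);      // fills JSON, returns false on EOF/error
    void handleMessage(const std::string &JSON); // hypothetical dispatch

    void loop() {
      std::string JSON; // one string reused for every incoming message
      while (readRawMessage(JSON)) {
        handleMessage(JSON);
        JSON.clear(); // drop contents, keep capacity for the next message
      }
    }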

njames93 updated this revision to Diff 312796. Dec 18 2020, 7:52 AM

Moved a few members out of the class.

This adds a bit of complexity, making the code here a fair amount harder to follow and verify the correctness of.

  • Do we have evidence that these allocations are causing a problem? (e.g. do we observe a significant decrease in RSS after the patch)? Naively I would expect these allocations to be basically unimportant compared to those of the JSON objects themselves. (And I don't particularly expect either of them to be significant - the comment on the other review was really just "0 probably isn't the right arg to malloc_trim if there's any allocation going on").

It's not causing problems per se, but given that the incoming JSON messages can contain a whole file plus things like escape chars, it's wise to allocate a buffer that will grow to the largest JSON it receives but never shrink.

Well, it's also wise to measure before optimizing. The large requests (ones containing a whole file) typically cause us to parse a C++ TU. After we read this JSON message, we will immediately:

  • allocate a std::string to hold the decoded file content payload as part of the json::Value
  • copy that string into the DraftManager
  • allocate a MemoryBuffer and copy the file into it, for parsing
  • read all of the thousands of transitively included headers into newly-allocated MemoryBuffers
  • run the clang parser and then the clangd indexer

I think the most likely case is that this change has no observable impact on performance or resource usage. If that's true, I don't want to land it - it makes the code harder to reason about for no detectable benefit.

It's definitely *plausible* there's something surprising going on with allocation patterns such that this makes a difference. The impact of malloc_trim shows that clangd's overall allocation patterns have surprising effects. But right now I don't see any reason to think that these allocations matter.

  • there seem to be simpler ways to structure this while avoiding allocations. JSONLineBuffer is effectively statically bounded, and can be SmallString<32> or so. The content buffer could simply be passed in if I'm reading right: bool readRawMessage(std::string&)? OutputBuffer probably does need to be a member variable though.

It's not statically bounded unfortunately; the length is however long a line in the JSON is, which can be arbitrarily large. The read-line function uses a buffer size of 1024 as an upper bound.

JSONLineBuffer is not used for reading JSON, but rather HTTP-style headers. The only headers supported are Content-Length and Content-Type, and I don't believe clients send Content-Type in practice (there's only one supported). In any case, you'll never see a line beyond 256 chars from a well-behaved client. ("statically bounded" is too strong, but from a performance-tuning perspective it's enough).
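
For reference, the line-oriented reads only cover the LSP header block; a complete message on the wire looks roughly like this (the JSON body after the blank line is read by byte count from Content-Length, not line by line):

    Content-Length: 44\r\n
    Content-Type: application/vscode-jsonrpc; charset=utf-8\r\n
    \r\n
    {"jsonrpc":"2.0","id":1,"method":"shutdown"}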

njames93 updated this revision to Diff 312930. Dec 19 2020, 5:26 AM

Remove input buffers, but keep the output buffer, as it's easy to reason about.

sammccall accepted this revision. Dec 21 2020, 10:42 AM

Thanks, this version seems just as clear as the original to me, and fewer allocations is nice :-)

This revision is now accepted and ready to land. Dec 21 2020, 10:42 AM

I've put a version of the reading part in D93653. This seems a bit less invasive to me - I'm still not sure if it's worth the readability hit though...

njames93 retitled this revision from [clangd] Reuse buffer for JSONTransport::readRawMessage to [clangd] Reuse buffer for JSONTransport::sendMessage. Dec 22 2020, 3:29 AM
njames93 edited the summary of this revision. (Show Details)
This revision was automatically updated to reflect the committed changes.