This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Fix bugs with incorrect memory estimate report
ClosedPublic

Authored by kbobyrev on Sep 25 2018, 9:06 AM.

Details

Summary
  • With the current implementation, sizeof(std::vector<Chunk>) is added twice to the Dex memory estimate which is incorrect
  • Dex logs memory usage estimation before BackingDataSize is set and hence the log report excludes size of the external SymbolSlab which is coupled with Dex instance

Diff Detail

Event Timeline

kbobyrev created this revision.Sep 25 2018, 9:06 AM
ioeric added inline comments.Sep 25 2018, 10:14 AM
clang-tools-extra/clangd/index/dex/Dex.cpp
248–250

Would InvertedIndex.getMemorySize() be a better estimate?

kbobyrev added inline comments.Sep 25 2018, 10:38 AM
clang-tools-extra/clangd/index/dex/Dex.cpp
248–250

IIUC this is only precise for the objects which do not make any allocations (e.g. std::vector and std::string memory estimate would not be "correct").

From the documentation

This is just the raw memory used by DenseMap. If entries are pointers to objects, the size of the referenced objects are not included.

I also have Bytes += InvertedIndex.getMemorySize(); above, so the purpose of this code would be to take into account the size of these "reference objects".

However, since PostingList::bytes() also returns underlying std::vector size, this code will probably add these std::vector objects size twice (the first one was in InvertedIndex.getMemorySize()). I should fix that.

ioeric added inline comments.Sep 25 2018, 11:12 AM
clang-tools-extra/clangd/index/dex/Dex.cpp
248–250

If entries are pointers to objects, the size of the referenced objects are not included.

This applies to *pointers* to objects, but the PostingLists in InvertedList are not pointerd? So it seems to me that InvertedIndex.getMemorySize() covers everything in the InvertedIndex. One way to verify is probably check the size calculated with loop against InvertedIndex.getMemorySize().

kbobyrev updated this revision to Diff 167062.Sep 26 2018, 12:49 AM
kbobyrev edited the summary of this revision. (Show Details)
  • Prevent sizeof(std::vector<Chunk>) to be counted twice in the memory estimates
  • Pull memory usage logging to the caller side: it is currently incorrect because Dex::BackingDataSize is not set by the time vlog(..., estimateMemoryUsage()) is called and hence the log only includes size of the index overhead

I will remove these changes from D52047 and hopefully make it easier to review.

clang-tools-extra/clangd/index/dex/Dex.cpp
248–250

As discussed offline, pointers and references are an example of objects which would be incorrectly measured by DesneMap::getMemorySize(), but std::vector and std::string are pointers in a way because they also just store a pointer to the underlying storage which is hidden from DenseMap::getMemorySize(); thus, it would be more precise to use the proposed scheme for memory estimation.

kbobyrev updated this revision to Diff 167079.Sep 26 2018, 1:37 AM
kbobyrev retitled this revision from [clangd] More precise index memory usage estimate to [clangd] Fix bugs with incorrect memory estimate report.
kbobyrev edited the summary of this revision. (Show Details)

As discussed offline, using std::string::size() would not be a precise estimate of std::string size, therefore I'm excluding Token::Data size estimate.

ioeric added inline comments.Sep 26 2018, 3:48 AM
clang-tools-extra/clangd/index/dex/Dex.cpp
248–250

I see. I was expecting DenseMap to maintain an arena which could provide memory usage of all owned data, but it seems to only consider sizeof(KV-Pair)...

If the posting lists (vectors) are not covered, shouldn't we also also add the size of tokens (string) here?

kbobyrev added inline comments.Sep 26 2018, 6:51 AM
clang-tools-extra/clangd/index/dex/Dex.cpp
248–250

Yes, I was also adding TokenToPostingList.first.Data.size() to the size estimate in the previous revision snapshot, but as @sammccall pointed out, there are multiple string implementations and one of them (IIUC this one is currently pushed in the C++ Standard) utilizes Small Object Optimization (i.e. it behaves like llvm::SmallVector rather than std::vector). Therefore, it does not perform external allocations before some limit is reached. I don't know whether it is possible to know whether the external allocation happened or not and the summary of our discussion with Sam was that we should just omit this.

ioeric accepted this revision.Sep 26 2018, 6:57 AM
This revision is now accepted and ready to land.Sep 26 2018, 6:57 AM
This revision was automatically updated to reflect the committed changes.