This is an archive of the discontinued LLVM Phabricator instance.

[llvm-tblgen] - Stop using std::string in RecordKeeper.
ClosedPublic

Authored by grimar on Nov 17 2017, 2:09 AM.

Details

Summary

I was wondering why building of LLVM takes so much time for me and
tried to profile llvm-tblgen, which calls for X86CommonTableGen module looks
take about 24 minutes on my i7-4970k@~4.0 under windows with MSVS 2015 for debug build.

I found RecordKeeper::getDef() shows up in profiling and it
creates std::string instance for each search in RecordMap though RecordKeeper::RecordMap
can use StringRef as a key instead to avoid that I believe.

Total time seem reduces for about 12 seconds (from initial ~24minutes) so it is not so significant, but still a bit better.
(During investigation I found LLVM_OPTIMIZED_TABLEGEN flag and going to check how it helps, but still looks
we can fix that place to avoid excessive allocations too).

Sorry I also not sure who should be reviewer for that, and code looks really old,
so added people who modified code around that place.

Diff Detail

Repository
rL LLVM

Event Timeline

grimar created this revision.Nov 17 2017, 2:09 AM
grimar added a comment.EditedNov 17 2017, 3:23 AM

LLVM_OPTIMIZED_TABLEGEN speedups build more than 2x for me, I wonder why it is not default option. (we probably could emit warning or something that if no release binaries found to use).

There's been some talk about making LLVM_OPTIMIZED_TABLEGEN the default, but it hasn't happened for whatever reason.

As for this patch, I would actually prefer if we didn't even use std::map. Is ordering important here? Can this be changed to a llvm::StringMap?

As for this patch, I would actually prefer if we didn't even use std::map. Is ordering important here? Can this be changed to a llvm::StringMap?

RecordKeeper has getClasses and getDefs which are used for iterations over std::map returned currently, so I think ordering is important.

I tried to use StringMap + std::vector (demo of this is D40239) and X86CommonTableGen rebuild time changes from 1446 seconds to 1430 for me
(FWIW I am using MSVS built in timing for calculation), or about 1%. Do you think we can use such approach ?

I just realized that I do not know if ordering is important or not. I know almost nothing about TableGen.
Some code iterates over std::map returned, but it does not automatially means ordering is really important.

And some code just applies sorting by itself (https://github.com/llvm-mirror/llvm/blob/master/utils/TableGen/CTagsEmitter.cpp#L65):

for (const auto &C : Classes)
  Tags.push_back(Tag(C->getName(), locate(C)));
for (const auto &D : Defs)
  Tags.push_back(Tag(D->getName(), locate(D)));
std::sort(Tags.begin(), Tags.end());

My earlier suggested approach (D40239) also changes ordering to insertion order, what may be excessive (if we do not need ordering)
or incorrect (if we need sorted order and not just any fixed order), so I now don't think it was good idea to suggest.

grimar planned changes to this revision.Nov 20 2017, 3:54 AM
grimar updated this revision to Diff 123572.EditedNov 20 2017, 5:59 AM
grimar retitled this revision from [llvm-tblgen] - Stop using std:string in RecordKeeper. to [llvm-tblgen] - Stop using std::string in RecordKeeper..
  • Use StringRef in Tag instead of std::string*'. (Tag` remembers ID of Classes/Defs from Record and since patch changes type of map key from std::string to StringRef, this place should be updated.)
This revision was automatically updated to reflect the committed changes.

When built with memory sanitizer, this change causes many use-of-uninitialized-value errors and also causes compilation failures when built with the cmake option -DBUILD_SHARED_LIBS=ON. AFAIK, StringRef does not store the string contents in itself and just maintains a pointer to an external storage. Are you sure all the string keys supplied to Classes or Defs map in RecordKeeper class are persistent throughout all the uses of those maps?

If you can't reproduce the same error, try building with the memory sanitizer, which can be enabled with the cmake option -DLLVM_USE_SANITIZER=Memory. If you confirm this problem, I suggest maybe reverting this change.

When built with memory sanitizer, this change causes many use-of-uninitialized-value errors and also causes compilation failures when built with the cmake option -DBUILD_SHARED_LIBS=ON. AFAIK, StringRef does not store the string contents in itself and just maintains a pointer to an external storage. Are you sure all the string keys supplied to Classes or Defs map in RecordKeeper class are persistent throughout all the uses of those maps?

If you can't reproduce the same error, try building with the memory sanitizer, which can be enabled with the cmake option -DLLVM_USE_SANITIZER=Memory. If you confirm this problem, I suggest maybe reverting this change.

I'll try to reproduce it, thanks for reporting. Reverted for now in r318899.

BTW, there was a delay of 24h between commit and revert of this and no bots reported any failtures to me. I wonder why it happened ? Don't we have ones with memory sanitizer enabled ?
(I did not test it with -DLLVM_USE_SANITIZER=Memory yet, but just wonder how it is possible ?).

I don't know if they have bots with sanitizer enabled. And it looks like -DBUILD_SHARED_LIBS=ON does not always reproduce the failure; it looks like it depends on the system. But I guess the memory sanitizer behavior should be the same. Please let me know if you can reproduce the error. Thank you!