This is an archive of the discontinued LLVM Phabricator instance.

Bitcode: Add a string table to the bitcode format.
ClosedPublic

Authored by pcc on Apr 7 2017, 6:48 PM.

Details

Summary

Add a top-level STRTAB block containing a string table blob, and start storing
strings for module codes FUNCTION, GLOBALVAR, ALIAS, IFUNC and COMDAT in
the string table.

This change allows us to share names between globals and comdats as well
as between modules, and improves the efficiency of loading bitcode files by
no longer using a bit encoding for symbol names. Once we start writing the
irsymtab to the bitcode file we will also be able to share strings between
it and the module.

On my machine, link time for Chromium for Linux with ThinLTO decreases by
about 7% for no-op incremental builds or about 1% for full builds. Total
bitcode file size decreases by about 3%.

As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2017-April/111732.html

Depends on D31828

Depends on D31829

Diff Detail

Repository
rL LLVM

Event Timeline

pcc created this revision.Apr 7 2017, 6:48 PM
pcc updated this revision to Diff 94581.Apr 7 2017, 7:09 PM
  • Update some comments
  • Move writing the version into a function
tejohnson edited edge metadata.Apr 11 2017, 9:57 AM

I got through the end of the BitcodeReader.cpp, sending comments so far.

llvm/lib/Bitcode/Reader/BitcodeReader.cpp
398 ↗(On Diff #94581)

Document, especially return value.

1799 ↗(On Diff #94581)

We don't have namecharXN anymore, right?

1800 ↗(On Diff #94581)

Can this block be refactored out of here and parseValueSymbolTable? I guess that requires some change to recordValue too if we want to share the accessing of the Function pointer, but it would at least be nice to refactor and share the rest.

2734 ↗(On Diff #94581)

Maybe add comment here and in parseFunctionRecord and parseGlobalIndirectSymbolRecord that in v1 the name will be added to the value later when the old style VST is parsed.

4904 ↗(On Diff #94581)

refactor out and share with analogous parseValueSymbolTable handling?

vsk added a subscriber: vsk.Apr 11 2017, 2:12 PM

I've got a collection of old bitcode files which I'll try recompiling with this patch applied, once the review winds down.

llvm/tools/llvm-modextract/llvm-modextract.cpp
67 ↗(On Diff #94581)

I see two places where we need to perform a byte-for-byte copy of a BitcodeModule's buffer, and then copy the strtab. Is there any reason to perform such a copy and not copy the strtab? If not, we could reduce the duplication by adding a 'writeCopyOfModule(BitcodeModule &)' helper to BitcodeWriter.

pcc updated this revision to Diff 94902.Apr 11 2017, 3:53 PM
pcc marked 5 inline comments as done.
  • Address review comments
pcc added inline comments.Apr 11 2017, 3:56 PM
llvm/tools/llvm-modextract/llvm-modextract.cpp
67 ↗(On Diff #94581)

Is there any reason to perform such a copy and not copy the strtab?

In principle you could concatenate multi-module bitcode files by copying only unique strtabs instead of potentially copying the same strtab multiple times.

If not, we could reduce the duplication by adding a 'writeCopyOfModule(BitcodeModule &)' helper to BitcodeWriter.

Maybe, but we only need to perform these copies in low level tools, so I'm not sure if we want to support their operations at the API level. That would also create a direct dependency from the bitcode writer to the bitcode reader, and I'd like to avoid adding any new such dependencies. Maybe we can reconsider if we need to do this elsewhere (i.e. rule of 3).

tejohnson added inline comments.Apr 12 2017, 4:33 PM
llvm/lib/Bitcode/Reader/BitcodeReader.cpp
5013 ↗(On Diff #94902)

Add comment with record format above here (see other cases).

llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
1285 ↗(On Diff #94902)

Why does this move?

1310 ↗(On Diff #94902)

Why does this go away?

2902 ↗(On Diff #94902)

Comment needs updating.

2905 ↗(On Diff #94902)

Maybe rename to writeFunctionLevelValueSymbolTable or something like that to distinguish from the other one.

2924 ↗(On Diff #94902)

This one goes away I think

3539 ↗(On Diff #94902)

Can getValueId be split into two, part that does the assignment if it doesn't exist in map and returns nothing (called from this code), and one that asserts it exists in map and returns the value (called from where we originally called getValueId further down? Presumably by the time we reach there we will always have them assigned due to this code.

llvm/tools/llvm-cat/llvm-cat.cpp
54 ↗(On Diff #94902)

Needed because string in string table owned by parsed modules?

llvm/tools/llvm-modextract/llvm-modextract.cpp
64 ↗(On Diff #94902)

Rename variable? "Header" no longer seems appropriate.

pcc marked 6 inline comments as done.Apr 13 2017, 10:45 AM
pcc added inline comments.
llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
1285 ↗(On Diff #94902)

We now create the summary entries as we read MODULE_CODE_* records. That means that we need the source file name to compute the GUID correctly.

1310 ↗(On Diff #94902)

We now create a VST unconditionally. Avoiding creating a VST if the module is empty seems like a micro-optimization that is not worth it.

llvm/tools/llvm-cat/llvm-cat.cpp
54 ↗(On Diff #94902)

Yes, comment added.

pcc updated this revision to Diff 95165.Apr 13 2017, 10:45 AM
  • Address review comments
pcc added a comment.Apr 13 2017, 10:47 AM

Hmm, I somehow managed to revert my changes from the first round of review comments. Uploading a new patch shortly.

pcc updated this revision to Diff 95167.Apr 13 2017, 10:58 AM

Re-applying changes from first round of review comments

This revision is now accepted and ready to land.Apr 16 2017, 2:33 PM
This revision was automatically updated to reflect the committed changes.