Page MenuHomePhabricator

[Remarks] Add an LLVM-bitstream-based remark serializer
ClosedPublic

Authored by thegameg on Jun 17 2019, 6:20 PM.

Details

Summary

Add a new serializer, using a binary format based on the LLVM bitstream format.

This format provides a way to serialize the remarks in two modes:

  1. Separate mode: the metadata is separate from the remark entries.
  2. Standalone mode: the metadata and the remark entries are in the same file.

The format contains:

  • a meta block: container version, container type, string table, external file path, remark version
  • a remark block: type, remark name, pass name, function name, debug file, debug line, debug column, hotness, arguments (key, value, debug file, debug line, debug column)

A string table is required for this format, which will be dumped in the meta block to be consumed before parsing the remark blocks.

On clang itself, we noticed a size reduction of 13.4x compared to YAML, and a compile-time reduction of between 1.7% and 3.5% on CTMark.

Diff Detail

Repository
rL LLVM

Event Timeline

thegameg created this revision.Jun 17 2019, 6:20 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 17 2019, 6:20 PM
ormris added a subscriber: ormris.Jun 18 2019, 2:41 PM
paquette added inline comments.Jun 19 2019, 10:09 AM
llvm/include/llvm/Remarks/BitstreamRemarkContainer.h
32–34 ↗(On Diff #205228)

A couple things:

  1. Is remark metadata explicitly documented anywhere? The comment in BitstreamRemarkSerializer.h seems to be about half way there. Maybe it would be good to expand a bit upon that, and then reference that file here? (I think it might be worth adding to the LLVM docs on remarks eventually.)
  1. The "separate" point only talks about SeparateRemarksFile?
52 ↗(On Diff #205228)

extra comma

55 ↗(On Diff #205228)

Can we have a documentation comment on this?

63 ↗(On Diff #205228)

A documentation comment would be useful here as well.

llvm/include/llvm/Remarks/BitstreamRemarkSerializer.h
62–63 ↗(On Diff #205228)

Is it possible to verify this with a test?

E.g. add a debug dumper for the available abbrev IDs and do something similar to llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir?

76 ↗(On Diff #205228)

Grammar nit: Setup is a noun. I would use set up here. English is weird.

81 ↗(On Diff #205228)

This, and the other ones should probably say "set up" like setupMetaBlockInfo

llvm/lib/Remarks/BitstreamRemarkSerializer.cpp
23–26 ↗(On Diff #205228)

Whenever this function is called, there's a call to Bitstream.EmitRecord following it. Would it make sense to pull that into the function and give it a more descriptive name?

30–32 ↗(On Diff #205228)

This is a pretty common pattern as well. Could this be pulled out into a function?

thegameg marked 2 inline comments as done.Jun 19 2019, 2:17 PM
thegameg added inline comments.
llvm/include/llvm/Remarks/BitstreamRemarkContainer.h
32–34 ↗(On Diff #205228)
  1. Yes, I documented it right above the SeparateRemarksMeta enum entry. I agree that I should add something to the LLVM docs.
  2. Actually, it only talks about SeparateRemarksMeta. I will make this more explicit, but the idea for the separate model is:
    • the meta part will eventually go into a section. It will contain one BLOCK_META with some records.
    • the remark part will go into a separate file (with its path saved in the meta part). It will also contain one BLOCK_META (for more versioning) and multiple BLOCK_REMARK blocks.
llvm/include/llvm/Remarks/BitstreamRemarkSerializer.h
62–63 ↗(On Diff #205228)

Nothing obvious comes to mind right now for iterating on the available abbrev IDs without clobbering the code with debug stuff. Technically, the BitstreamWriter will assert if the abbrev id's width is bigger than what specified when entering the block.

thegameg updated this revision to Diff 206982.Jun 27 2019, 6:30 PM
thegameg marked 7 inline comments as done.
  • Address Jessica's comments.
  • Rename BLOCK_* to *_BLOCK_ID.
  • Teach llvm-bcanalyzer to recognize the magic number.
  • Add a unit test to check for the values that shouldn't change without a version change.
paquette added inline comments.Jun 28 2019, 10:33 AM
llvm/lib/Remarks/BitstreamRemarkSerializer.cpp
342 ↗(On Diff #206982)

This should only run when DidSetUp == true, correct?

Would it make sense to move DidSetUp into the Helper? Then you could set it in setupBlockInfo and either assert or throw an error if DidSetUp == false in appropriate Helper functions.

llvm/test/CodeGen/X86/remarks-bitstream.ll
11 ↗(On Diff #206982)

Are these outputs guaranteed to be stable? Judging by the opt runline, it looks like this might rely on inlining behaviour + remark formatting?

llvm/tools/llvm-bcanalyzer/llvm-bcanalyzer.cpp
975 ↗(On Diff #206982)

"Bitstream" seems a little redundant here. Just "LLVM Remarks" instead?

thegameg marked 3 inline comments as done.Jun 28 2019, 4:00 PM
thegameg added inline comments.
llvm/lib/Remarks/BitstreamRemarkSerializer.cpp
342 ↗(On Diff #206982)

Correct.

I wouldn't want to move DidSetUp into the Helper because I intend to use the Helper in other places, like the generation of the remark section where emitRemarkBlock won't be called.

Maybe I should just assert(DidSetUp == true) to avoid code getting in between the initialization and the emission of the remark?

llvm/test/CodeGen/X86/remarks-bitstream.ll
11 ↗(On Diff #206982)

You're right, they rely on inlining behavior and on the emission of the remarks from the passes. Ideally we would have unit tests for this, but I'm afraid at that point we'll have to check for byte-by-byte correctness, while here we can sort of have a textual check.

Maybe we should move the whole analysis behavior from llvm-bcanalyzer to lib/Bitcode? That way we can have unit tests that come from actual Remark objects and can at least compare strings instead of raw bytes.

Let me know if you have any ideas for testing this differently.

thegameg updated this revision to Diff 207681.Jul 2 2019, 5:57 PM

This adds a lot more testing.

  1. BitstreamRemarksFormatTest.cpp: test for the stability of the versions and IDs.
  2. BitstreamRemarksSerializerTest.cpp: don't depend on actual codegen to generate remarks. Manually create remark objects that get serialized then analyzed by the bitcode analyzer. This tests the multiple remark container types for the metadata blocks.
  3. Gets rid of the codegen test: llvm/test/CodeGen/X86/remarks-bitstream.ll.
thegameg updated this revision to Diff 208806.Jul 9 2019, 2:06 PM

Add documentation.

paquette accepted this revision.Jul 10 2019, 9:14 AM

LGTM!

This revision is now accepted and ready to land.Jul 10 2019, 9:14 AM
This revision was automatically updated to reflect the committed changes.