This is an archive of the discontinued LLVM Phabricator instance.

[Remarks] Add string deduplication using a string table
ClosedPublic

Authored by thegameg on Apr 3 2019, 12:42 PM.

Details

Summary
  • Add support for uniquing strings in the remark streamer and emitting the string table in the remarks section.
  • Add parsing support for the string table in the RemarkParser.

From this remark:

--- !Missed
Pass:            inline
Name:            NoDefinition
DebugLoc:        { File: 'test-suite/SingleSource/UnitTests/2002-04-17-PrintfChar.c',
                   Line: 7, Column: 3 }
Function:        printArgsNoRet
Args:
  - Callee:          printf
  - String:          ' will not be inlined into '
  - Caller:          printArgsNoRet
    DebugLoc:        { File: 'test-suite/SingleSource/UnitTests/2002-04-17-PrintfChar.c',
                       Line: 6, Column: 0 }
  - String:          ' because its definition is unavailable'
...

to:

--- !Missed
Pass:            0
Name:            1
DebugLoc:        { File: 3, Line: 7, Column: 3 }
Function:        2
Args:
  - Callee:          4
  - String:          5
  - Caller:          2
    DebugLoc:        { File: 3, Line: 6, Column: 0 }
  - String:          6
...

And the string table in the .remarks/__remarks section containing:

inline\0NoDefinition\0printArgsNoRet\0test-suite/SingleSource/UnitTests/2002-04-17-PrintfChar.c\0printf\0 will not be inlined into \0 because its definition is unavailable\0

This is mostly supposed to be used for testing purposes, but it gives us a 2x reduction in the remark size, and is an incremental change for the updates to the remarks file format.

Diff Detail

Event Timeline

thegameg created this revision.Apr 3 2019, 12:42 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 3 2019, 12:42 PM
paquette added inline comments.Apr 3 2019, 1:51 PM
llvm/include/llvm/Remarks/RemarkStringTable.h
47

Why undefined?

thegameg marked 2 inline comments as done.Apr 3 2019, 2:29 PM
thegameg added inline comments.
llvm/include/llvm/Remarks/RemarkStringTable.h
47

This class is supposed to be used by having a sequence of calls to add(...) and then serialize(OS). After thinking a bit more I will just remove the get function.

thegameg updated this revision to Diff 193598.Apr 3 2019, 2:30 PM
thegameg marked an inline comment as done.

Remove remarks::StringTable::get.

Is this transparent to tools that use LLVM's YAML library? Other tools?

If not, we'll need to update the llvm-opt-report tool and the opt-viewer tool.

JDevlieghere added inline comments.Apr 3 2019, 3:08 PM
llvm/include/llvm/Remarks/RemarkStringTable.h
41

Can we not use the size of the StringMap?

llvm/lib/IR/DiagnosticInfo.cpp
442

Since the keys are the same and only the values are different, I'd extract this in a helper function.

llvm/lib/Remarks/RemarkParser.cpp
85

Not sure if this would actually simplify things, but you could do

size_t NextOffset = (Index == Offsets.size() - 1) ? Buffer.size() : Offsets[Index + 1];

to not repeat the computation in the StringRef ctor.

thegameg updated this revision to Diff 193649.Apr 3 2019, 8:18 PM
thegameg marked 3 inline comments as done.

Address Jonas' remarks.

Is this transparent to tools that use LLVM's YAML library? Other tools?

If not, we'll need to update the llvm-opt-report tool and the opt-viewer tool.

This is disabled by default, and enabled under -mllvm -remarks-yaml-string-table so tools like llvm-opt-report and the opt-viewer won't need any changes yet.

On the parsing side, this will expect the usage of a string table in the YAML only if the constructor with an actual string table buffer is provided. The tools don't know how to do this yet since they operate on the YAML files alone (the string table is in a section in the object file), but anyone using libRemarks.dylib/.so could use this to parse these files. I plan on making all these tools use this library at some point.

llvm/include/llvm/Remarks/RemarkStringTable.h
41

This is the total size of the string table that is going to be written to disk. It's the sum of the sizes of every string in the map. I can recompute this in serialize() if you think it's a nicer way to do this.

JDevlieghere accepted this revision.Apr 10 2019, 1:43 AM

LGTM with the outstanding comments/questions addressed.

llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
1365

Can you be more specific? Could we write everything but the version to a buffer and EmitBinaryData here?

This revision is now accepted and ready to land.Apr 10 2019, 1:43 AM
thegameg marked an inline comment as done.Apr 17 2019, 3:08 PM
thegameg added inline comments.
llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
1365

Yes, we could, but we would perform an extra copy of the strings which this allows us to avoid.

LGTM

llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
1365

Fair!

This revision was automatically updated to reflect the committed changes.