Page MenuHomePhabricator

COFF: Implement string tail merging.
ClosedPublic

Authored by pcc on Mar 14 2018, 5:39 PM.

Details

Summary

In COFF, duplicate string literals are merged by placing them in a
comdat whose leader symbol name contains a specific prefix followed
by the hash and partial contents of the string literal. This gives
us an easy way to identify sections containing string literals in
the linker: check for leader symbol names with the given prefix.

Any sections that are identified in this way as containing string
literals may be tail merged. We do so using the StringTableBuilder
class, which is also used to tail merge string literals in the ELF
linker. Tail merging is enabled only if ICF is enabled, as this
provides a signal as to whether the user cares about binary size.

Depends on D44501

Diff Detail

Repository
rL LLVM

Event Timeline

pcc created this revision.Mar 14 2018, 5:39 PM
pcc updated this revision to Diff 138481.Mar 14 2018, 5:52 PM
  • Check contents of .rdata
pcc updated this revision to Diff 138486.Mar 14 2018, 6:40 PM
  • Align MergeChunks correctly

Do you have a test to make sure that a wide string literal doesn't point into a misaligned offset into a narrow string literal?

Also, does the standard allow aliasing string literals of different underlying types together?

pcc added a comment.Mar 14 2018, 7:36 PM

Do you have a test to make sure that a wide string literal doesn't point into a misaligned offset into a narrow string literal?

I will add one.

Also, does the standard allow aliasing string literals of different underlying types together?

That's an interesting question. The standardese is:

Whether all string literals are distinct (that is, are stored in nonoverlapping objects) and whether successive evaluations of a string-literal yield the same or a different object is unspecified.

This doesn't make any distinction between different kinds of string literal, so I think the answer is "yes". In any event, this seems to be the intent of the standard.

pcc updated this revision to Diff 138489.Mar 14 2018, 8:45 PM
  • Add a test that makes sure that we don't merge string literals with different alignments
ruiu added a comment.Mar 15 2018, 11:14 AM

Overall looking very nice.

lld/COFF/Chunks.cpp
590–592 ↗(On Diff #138489)

I believe you can use toStringRef(C->getContents()).

601 ↗(On Diff #138489)

I'd flip the condition and continue early.

lld/COFF/Chunks.h
230–231 ↗(On Diff #138489)

I'd expand this comment a little bit to say that this is an lld-specific feature (not implemented in MSVC) to minimize the output size by finding string literals sharing tail parts and merge them. Maybe I'd explain how it work a bit too.

232 ↗(On Diff #138489)

Do all members have to be public? Even that's the case, I feel comfortable if you make it not a struct but a class with no private member.

lld/COFF/ICF.cpp
230 ↗(On Diff #138489)

auto -> MergeSection

lld/COFF/Writer.cpp
429 ↗(On Diff #138489)

auto -> MergeSection

pcc updated this revision to Diff 138605.Mar 15 2018, 12:47 PM
pcc marked 4 inline comments as done.
  • Address review comments
lld/COFF/ICF.cpp
230 ↗(On Diff #138489)

It would actually be std::pair<const uint32_t, MergeChunk *>, which seems a little verbose, so I left it the way it is.

lld/COFF/Writer.cpp
429 ↗(On Diff #138489)

Likewise

ruiu accepted this revision.Mar 15 2018, 12:58 PM

LGTM

This feature is awesome. I strongly believe that Microsoft should do the same thing in the next version of MSVC. :)

lld/COFF/ICF.cpp
230 ↗(On Diff #138489)

But what about SC? Isn't it SectionChunk?

This revision is now accepted and ready to land.Mar 15 2018, 12:58 PM
pcc updated this revision to Diff 138619.Mar 15 2018, 1:54 PM

Use SectionChunk

pcc added a comment.Mar 15 2018, 1:55 PM

I will commit when D44501 is approved :)

lld/COFF/ICF.cpp
230 ↗(On Diff #138489)

Oh, right.

Closed by commit rL327668: COFF: Implement string tail merging. (authored by pcc, committed by ). · Explain Why
This revision was automatically updated to reflect the committed changes.
This revision was automatically updated to reflect the committed changes.