This is an archive of the discontinued LLVM Phabricator instance.

Print C-string literals in mapfile
ClosedPublic

Authored by Roger on Jan 24 2022, 1:59 PM.

Details

Reviewers
int3
Group Reviewers
Restricted Project
Commits
rG4f2c46c35ccd: Print C-string literals in mapfile
Summary

This diff has the C-string literals printed into the mapfile in the symbol table like how ld64 does.

Here is what ld64's mapfile looks like with C-string literals:

# Path: out
# Arch: x86_64
# Object files:
[  0] linker synthesized
[  1] foo.o
# Sections:
# Address       Size            Segment Section
0x100003F7D     0x0000001D      __TEXT  __text
0x100003F9A     0x0000001E      __TEXT  __cstring
0x100003FB8     0x00000048      __TEXT  __unwind_info
# Symbols:
# Address       Size            File  Name
0x100003F7D     0x0000001D      [  1] _main
0x100003F9A     0x0000000E      [  1] literal string: Hello world!\n
0x100003FA8     0x00000010      [  1] literal string: Hello, it's me\n
0x100003FB8     0x00000048      [  0] compact unwind info

Here is what the new lld's Mach-O mapfile looks like:

# Path: /Users/rgr/local/llvm-project/build/Debug/tools/lld/test/MachO/Output/map-file.s.tmp/c-string-liter
al-out
# Arch: x86_64
# Object files:
[  0] linker synthesized
[  1] /Users/rgr/local/llvm-project/build/Debug/tools/lld/test/MachO/Output/map-file.s.tmp/c-string-literal
.o
# Sections:
# Address       Size            Segment Section
0x1000002E0     0x0000001D      __TEXT  __text
0x1000002FD     0x0000001D      __TEXT  __cstring
# Symbols:
# Address           File  Name
0x1000002E0     [  1] _main
0x1000002FD     [  1] literal string: Hello world!\n
0x10000030B     [  1] literal string: Hello, it's me\n

Diff Detail

Event Timeline

Roger created this revision.Jan 24 2022, 1:59 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 24 2022, 1:59 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript
Roger retitled this revision from Print C-string literals in symbol table. to Print C-string literals in mapfile.Jan 25 2022, 8:27 AM
Roger edited the summary of this revision. (Show Details)
Roger updated this revision to Diff 406203.Feb 5 2022, 12:35 PM

Ready for review

Roger published this revision for review.Feb 5 2022, 12:39 PM
Roger edited the summary of this revision. (Show Details)
Herald added a project: Restricted Project. · View Herald TranscriptFeb 5 2022, 12:39 PM
Roger added a reviewer: int3.Feb 5 2022, 12:40 PM
int3 added inline comments.Feb 7 2022, 8:55 AM
lld/MachO/MapFile.cpp
83–98

we don't need to match cstrings with symbols -- in fact, to follow ld64's behavior, we shouldn't. Given an input like

.cstring
_y:
.asciz "hi"
.asciz "unnamed"

The mapfile contains separate entries for "hi" and "unnamed", even though "unnamed" doesn't have a symbol associated with it.

So basically we can iterate from i=0...isec->pieces.size()-1 and call isec->getStringRef(i) on each one. No need to work directly with the string pieces either :)

lld/test/MachO/map-file.s
71–79

so most of the other tests have all the RUN and CHECK lines at the top, above the input files. I see that the other tests in this file have already interleaved them... could we fix that please? Thanks :)

int3 accepted this revision.Feb 7 2022, 9:27 AM

Thanks!

lld/MachO/MapFile.cpp
83–98

talked offline. Roger pointed out that in order to sort the cstrings along with the rest of the symbols in address order, we would need some sort of Symbol * key in the map. We could create fake Symbols for this purpose, but that doesn't make for any simpler code, so let's stick with this.

This does mean that we won't be able to deal with "unnamed" strings, but I suspect that in practice it's a rare enough case. I don't believe clang generates any such strings in its assembly output.

86–87

cast<> does a runtime type check + non-null check in debug builds

88–91

codebase convention is to favor explicit types over auto. Exceptions are if the RHS has the type explicitly mentioned already (e.g. via a cast), or if the type is some messy thing (e.g. iterator types)

also, let's add const wherever possible

lld/test/MachO/map-file.s
78–79

let's also have the test cover the case of dead-stripped cstrings

This revision is now accepted and ready to land.Feb 7 2022, 9:27 AM

This may make the output much larger... The information can be achieved with llvm-readobj -p __cstring

int3 added a comment.EditedFeb 8 2022, 12:45 AM

Well we are trying to have feature parity with ld64... I guess we can add additional LLD-specific flags to disable this behavior for builds that don't need that info from their mapfiles

Roger updated this revision to Diff 407577.Feb 10 2022, 9:30 AM

Address comments

Roger added a comment.Feb 10 2022, 9:55 AM

@int3 I've simplified the code to not use offsets from StringPieces and to just print the StringRef that corresponds to each StringPiece. This is because we no longer do any tail-merging, making offsets unnecessary.

Roger marked 5 inline comments as done.Feb 10 2022, 9:56 AM
Roger added inline comments.
lld/test/MachO/map-file.s
71–79

I created a new diff to do this.

int3 accepted this revision.Feb 10 2022, 11:08 AM

This is because we no longer do any tail-merging, making offsets unnecessary.

I believe they're technically still possible, assuming the input is like

.ascii "non-null-terminated"
_foo:
.asciz "null-terminated

Probably unlikely to happen -- plus our current implementation doesn't really work when we have strings without corresponding symbols anyway -- but it wouldn't hurt to assert that the offset is indeed zero.

lld/MachO/MapFile.cpp
86

does this not work?

Roger updated this revision to Diff 407727.Feb 10 2022, 5:12 PM
Roger marked an inline comment as done.

Fix cast.

Roger marked an inline comment as done.Feb 10 2022, 5:20 PM

it wouldn't hurt to assert that the offset is indeed zero.

I realized this is not necessary as the function CStringInputSection::getStringRef should handle any complexities around offsets. Looking at that getStringRef's code right now, the logic actually assumes that there is no tail merging as it does the following:

StringRef getStringRef(size_t i) const {
    size_t begin = pieces[i].inSecOff;
    size_t end =
        (pieces.size() - 1 == i) ? data.size() : pieces[i + 1].inSecOff;
    return toStringRef(data.slice(begin, end - begin));
}

You can see that it uses the start of the next StringPiece to determine the end of the current StringPiece. This implies that StringPieces do not overlap (assuming they are ordered by their starting location and that there is 1:1 relationship between strings and StringPieces).

int3 added inline comments.Feb 11 2022, 12:14 PM
lld/MachO/MapFile.cpp
87

By "assert the offset is zero" I meant assert(sym->value == piece.inSecOff)... since we are locating the StringPiece via binary search, we are not guaranteed to find a string who starts at the exact location of the symbol

Even if strings don't overlap, we can have a symbol that starts in the middle of an existing string -- that's what the assert would catch.

Roger updated this revision to Diff 407988.Feb 11 2022, 12:27 PM

Add assert statement.

This revision was automatically updated to reflect the committed changes.