This is an archive of the discontinued LLVM Phabricator instance.

[lld-macho] Implement -object_path_lto
ClosedPublic

Authored by int3 on Dec 2 2020, 8:47 PM.

Details

Reviewers
clayborg
Group Reviewers
Restricted Project
Commits
rG95831a56d092: [lld-macho] Implement -object_path_lto
Summary

This makes it possible for STABS entries to reference the debug info
contained in the LTO-compiled output.

I'm not sure how to test the file mtime within llvm-lit -- GNU and BSD
stat take different command-line arguments. I've omitted the check for
now.

Diff Detail

Event Timeline

int3 created this revision.Dec 2 2020, 8:47 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 2 2020, 8:47 PM
int3 requested review of this revision.Dec 2 2020, 8:47 PM
int3 updated this revision to Diff 309142.Dec 2 2020, 8:57 PM

error handling

int3 added a comment.Dec 7 2020, 2:56 PM

@clayborg was hoping to get your opinion on my approach to modtime in this diff

@clayborg was hoping to get your opinion on my approach to modtime in this diff

What is the main reason for this? Is this so you can run the linker on the same set of input .o files and end up with a binary whose MD5 checksum doesn't change? The danger of this approach is that we will end up allowing any lto.o file to be loaded and linked with dsymutil or lldb. If the lto.o file is modified, we will have no way of knowing if we are loading the right file. Can we add an option for this if this is solely needed for testing? I think people in the real world are going to want the modification file set correctly. If we fix https://reviews.llvm.org/D92736 to generate the UUID with the MD5 of all of the sections that don't change with where or when the binary was built, you can end up using the UUID for verification that things are the same or correct.

lld/MachO/LTO.cpp
61

Remove this.

If you just emit the N_OSO STABS entry for the lto.o file with zero a the value (mod time) then LLDB will not try to match the mod time of the binary and won't reject it if it doesn't match. Same goes for dsymutil.

110

Remove this (see above inline comment) and maybe special case it when emitting the N_OSO?

int3 planned changes to this revision.Dec 7 2020, 3:17 PM

What is the main reason for this? Is this so you can run the linker on the same set of input .o files and end up with a binary whose MD5 checksum doesn't change?

Yeah pretty much... but you raise good points. I'll change it and have the test just skip checking for the exact timestamp

int3 updated this revision to Diff 310066.Dec 7 2020, 5:29 PM

emit real modTime in STABS

clayborg accepted this revision.Dec 8 2020, 2:15 PM
This revision is now accepted and ready to land.Dec 8 2020, 2:15 PM
int3 edited the summary of this revision. (Show Details)Dec 8 2020, 10:29 PM
thakis added a subscriber: thakis.Dec 9 2020, 11:28 AM

FWIW we're very interested in having deterministic build outputs (in the senses of https://blog.llvm.org/2019/11/deterministic-builds-with-clang-and-lld.html), and putting file mtimes in the linker output defeats that. Could we put content hashes in the mtime instead? (We could make the compiler produce a content hash of its .o output at compile time so that we could just read the content off out the .o file instead of computing it if computing it is a perf issue.)

int3 added a comment.Dec 9 2020, 1:51 PM

That sounds like it'd need a corresponding change in LLDB. (And possibly gdb too?)

Seems out of scope for now...

int3 added a comment.Dec 9 2020, 3:26 PM

If I understood @clayborg's earlier comments, LLDB will match nonzero STABS values with the mtime, so a content hash would result in a mismatch. But a zero STABS value will match any file (at the given path), so I guess we could have a flag to enable that behavior in order to get reproducible builds?

FWIW we're very interested in having deterministic build outputs (in the senses of https://blog.llvm.org/2019/11/deterministic-builds-with-clang-and-lld.html), and putting file mtimes in the linker output defeats that. Could we put content hashes in the mtime instead? (We could make the compiler produce a content hash of its .o output at compile time so that we could just read the content off out the .o file instead of computing it if computing it is a perf issue.)

We can't change the value of the N_OSO symbol without a change to LLDB and GDB. The debuggers really want to have this timestamp because someone can recompile a .o file and then try to debug a program that refers to this .o file which now has completely different contents. If we use a hash, then each time you debug, the debugger must fully hash each .o file to figure out if the hash matches which we really do not want to do as that will force it to touch every page in each .o file each time you debug and that will kill performance.

If we can come up with a solution that doesn't adversely affect performance, the N_OSO currently must have the "n_desc" field of each symbol set to 1. We could set this to 2 and know that the value in the symbol would be different. But that again requires an update to the debugger and anyone else that tries to load a binary with this change into an older debugger will get no debug info with no explanation of what is going wrong.

If I understood @clayborg's earlier comments, LLDB will match nonzero STABS values with the mtime, so a content hash would result in a mismatch. But a zero STABS value will match any file (at the given path), so I guess we could have a flag to enable that behavior in order to get reproducible builds?

zeroing one the value of the N_OSO can cause the debugger to just load any version of the .o file even if it is out of date.

BTW: dsymutil does support loading the debug map from a yaml file (or maybe JSON?). They use this for testing. So there is nothing that states that the debug map must be in the binary itself. We could teach dsymutil to check for N_OSO entries in the binary it is trying to link and if there are none, check for a JSON or YAML file right next to the binary and load the debug map from there automatically. Another option is to add a new STAB entry that specifies the path to the YAML file, and this could be made relative for repro builds.

thakis added a comment.Dec 9 2020, 5:09 PM

Ok, then let's not worry about that for now, thanks.

This revision was landed with ongoing or failed builds.Dec 10 2020, 3:58 PM
This revision was automatically updated to reflect the committed changes.