This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objcopy] Cache gnu_debuglink's target CRC
ClosedPublic

Authored by janisozaur on Apr 30 2019, 2:33 PM.

Details

Summary

.gnu_debuglink section contains information regarding file with
debugging symbols, identified by its CRC32. This target file is not
intended to ever change or it would invalidate the stored checksum, yet
the checksum is calculated over and over again for each of the objects
inside the archive, usually hundreds of times.

This patch precomputes the CRC32 of the target once and then reuses the
value where required, saving lots of redundant I/O.

The error message reported should stay the same, although now it might
be reported earlier.

Diff Detail

Repository
rL LLVM

Event Timeline

janisozaur created this revision.Apr 30 2019, 2:33 PM
janisozaur edited the summary of this revision. (Show Details)

Whilst I don't necessarily object to this change, I do wonder as to its necessity. Have you seen this as an issue in practice? My understanding of the .gnu_debuglink section is that there should be a single entry in it in an executable, pointing at a single file containing debug information for that executable. Adding the section to relocatable object files (and therefore to archive members) makes no sense, since you don't debug those.

llvm/tools/llvm-objcopy/CopyConfig.cpp
502 ↗(On Diff #197450)

Is "no_such_file_or_directory" the only way MemoryBuffer::getFile can file? What about things like "it's a directory" or "access denied"? If it isn't the only way, we shouldn't be using this misleading error code.

Whilst I don't necessarily object to this change, I do wonder as to its necessity. Have you seen this as an issue in practice?

Yes. For a given foo.a archive and its debug symbols foo.a.debug, try executing llvm-objcopy --add-gnu-debuglink=foo.a.debug foo.a. In my tests with ~130MiB of debug symbols and 300 entries in the archive, it takes about 1 minute 25 seconds to complete without this patch and less than 500ms with this patch, producing SHA1-identical output.

My understanding of the .gnu_debuglink section is that there should be a single entry in it in an executable, pointing at a single file containing debug information for that executable. Adding the section to relocatable object files (and therefore to archive members) makes no sense, since you don't debug those.

I have not investigated why is it being added to each member of the archive, but I can imagine when you create a static archive, there is no section to place this information and only object files exist which are capable of holding such information. Then, when you link said archive to executable, linker is free to only select and use the object files actually in use, hence each of them should carry relevant information.

You can observe very similar behaviour in GNU binutils' objcopy, where I've filed a similar patch.

janisozaur added inline comments.May 1 2019, 11:34 AM
llvm/tools/llvm-objcopy/CopyConfig.cpp
502 ↗(On Diff #197450)

This one seemed the most relevant. Originally exit(1) was used explicitly.

All the other option parsing code returns invalid argument (I think it's 22), but it also doesn't verify existence and readability of files.

jakehehrlich accepted this revision.May 1 2019, 1:02 PM

This code looks good to me and the use case seems valid. I'll give my LGTM but please wait for others.

llvm/tools/llvm-objcopy/CopyConfig.cpp
494–498 ↗(On Diff #197450)

This comment reads a bit aggressive to me. Maybe something like "We can avoid recalculating the checksum for every target file inside an archive by precomputing the CRC here. This prevents a significant amount of I/O."

This revision is now accepted and ready to land.May 1 2019, 1:02 PM
janisozaur updated this revision to Diff 197718.May 2 2019, 1:47 AM

Updated comment as per review request

jhenderson added inline comments.May 2 2019, 1:53 AM
llvm/tools/llvm-objcopy/CopyConfig.cpp
502 ↗(On Diff #197450)

Okay. How about we just use the error code stored in DebugOrErr?

MaskRay added inline comments.May 2 2019, 1:54 AM
llvm/tools/llvm-objcopy/CopyConfig.cpp
502 ↗(On Diff #197450)

return errorCodeToError(DebugOrErr.getError());

MaskRay added inline comments.May 2 2019, 1:58 AM
llvm/tools/llvm-objcopy/CopyConfig.cpp
502 ↗(On Diff #197450)

return createFileError(Config.AddGnuDebugLink, DebugOrErr.getError());

janisozaur updated this revision to Diff 197735.May 2 2019, 3:35 AM

Update reported error

janisozaur marked 2 inline comments as done.May 2 2019, 3:36 AM
MaskRay added inline comments.May 2 2019, 4:11 AM
llvm/tools/llvm-objcopy/CopyConfig.cpp
508 ↗(On Diff #197735)

Since you are moving the comment, may I ask you to delete It also doesn't negate the initial CRC32 value but it starts by default at 0xFFFFFFFF which is the complement of zero. This sentence if verbose and doesn't appear to help a casual reader understand the rationale.

janisozaur updated this revision to Diff 197756.May 2 2019, 5:11 AM

Remove redundant part of comment

janisozaur marked an inline comment as done.May 2 2019, 5:11 AM

Simple rebase + removal of unrelated formatting change

Ping. Is there still anything else to improve here?

Ping. Is there still anything else to improve here?

I don't think so. You've addressed all the outstanding comments, got two LGTMs, and have given it more than enough time. This can go in when you are ready to commit.

Is this an action I can take? This is first time I'm using Phabricator and I can't see any button to submit the change myself. I would expect this is something project maintainers need to take care of.

janisozaur accepted this revision.May 14 2019, 1:25 AM

Is this an action I can take? This is first time I'm using Phabricator and I can't see any button to submit the change myself. I would expect this is something project maintainers need to take care of.

Do you have commit access for LLVM? If not, I can commit it later today.

I doubt I have such access. Can you take over?

I doubt I have such access. Can you take over?

Sure, no problem.

This revision was automatically updated to reflect the committed changes.