Page MenuHomePhabricator

[llvm-objcopy] Preserve .ARM.attributes section when stripping files
ClosedPublic

Authored by thieta on Oct 18 2019, 11:40 AM.

Details

Summary

This works around a bug in Debian's patchset for glibc. The bug is described in detail in the upstream debian bug: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=943798, but the short version of it is that glibc on any Debian based distro don't load libraries unless it has a .ARM.attribute section.

Diff Detail

Event Timeline

thieta created this revision.Oct 18 2019, 11:40 AM

Seems like a hack, no? What about using --keep-section when you need it? You say its not strictly needed on modern systems and the problem is with the dlopen implementation of these older libcs, and not llvm-objcopy, so I would be hesitant to add something like this especially given we have a solution with --keep-section

Seems like a hack, no? What about using --keep-section when you need it? You say its not strictly needed on modern systems and the problem is with the dlopen implementation of these older libcs, and not llvm-objcopy, so I would be hesitant to add something like this especially given we have a solution with --keep-section

Keep-sections totally works. But since this produces unusable binaries on systems that require this section and lld already makes this consession I think it's worth adding this very low maintenance hack.

The most common usecase here is to replace GNU strip with llvm-strip, to pass keep-section in this case you would need a wrapper or patch build systems (painful with automake for example).

Reading the launchpad report linked from Peter's post it seems like this might actually still be a problem which would mean that even a new Ubuntu install on arm wouldn't be able to run these binaries.

Seems like a hack, no? What about using --keep-section when you need it? You say its not strictly needed on modern systems and the problem is with the dlopen implementation of these older libcs, and not llvm-objcopy, so I would be hesitant to add something like this especially given we have a solution with --keep-section

Keep-sections totally works. But since this produces unusable binaries on systems that require this section and lld already makes this consession I think it's worth adding this very low maintenance hack.

I agree, we generally want --strip-all to be the aggressive option and --strip-all-gnu to be the option when more compatibility is needed (and --keep-section when things seriously go wrong), but I think we should probably not have --strip-all be completely unusable for default modes.

OTOH, Alex has a point, we don't want to make binaries larger just so buggy systems are compatible. How large are arm attribute sections anyway? Is there any significant binary bloat to worry about? Maybe this just warrants a more explicit comment in the source that we should revisit this in a few years -- and if anyone is still using an old system, they can stay pinned to an old llvm-strip.

The most common usecase here is to replace GNU strip with llvm-strip, to pass keep-section in this case you would need a wrapper or patch build systems (painful with automake for example).

Reading the launchpad report linked from Peter's post it seems like this might actually still be a problem which would mean that even a new Ubuntu install on arm wouldn't be able to run these binaries.

For more context, I dug up the patch where we added the .gnu.warning exemption above, and tracked it down to where --strip-all-gnu was created in rL319071. The patch description there says that the more aggressive --strip-all was inspired by eu-strip. Have you tried out that tool to see if it has the same issue? It would be interesting to see if they exempt that & if they've run into this issue before.

llvm/test/tools/llvm-objcopy/ELF/strip-preserve-arm-attributes.test
3

Copying the input file is only required for tests that plan to check that the input file wasn't accidentally modified (i.e. there would be a check for cmp %t %t3 later). I think that's not necessary for this test, and you can remove this line.

4

I think --strip-all-gnu should work too. Can you add another check that it also works?

Also, a couple docs that should be updated (where we mention .gnu.warning):

llvm/tools/llvm-objcopy/CommonOpts.td
llvm/docs/CommandGuide/llvm-objcopy.rst
llvm/docs/CommandGuide/llvm-strip.rst

+1 for moving the to --strip-all-gnu

Hello!

Thanks everyone for the feedback. I was meeting up with our QA team this week and they said that they seen the issue with these binaries on a much larger array of devices than I initially thought. This is not a bug in upstream GLIBC - this is a bug in debian/ubuntu versions that patch glibc to specifically check for SHT_ARM_ATTRIBUTES. We tested with all debian based NAS devices we had and saw the same issue. We even tested with raspbian on raspberry pi and this also contains the issue.

Navigating patches in upstream ubuntu/debian is pretty hard - but as far as I can gatherr this patch is still shipping with modern debian based distributions: https://git.launchpad.net/ubuntu/+source/glibc/tree/debian/patches/arm/unsubmitted-ldso-abi-check.diff?h=debian/sid#n45

This means that the impact of this right now is that llvm-strip with no arguments produces binaries that can't be run on modern ubuntu and debian for arm, so I think this patch has value.

The size of .ARM.attributes seems to be pretty tiny - in my binary it's 112 bytes.

@rupprecht thanks for the review - happy to address your comments and change the commit message and inline comment to reflect my findings above if you guys decide to accept this approach. I think this needs to be documented if nothing else considering how confusing the error on device actually is.

Thanks,
Tobias

MaskRay added inline comments.Oct 21 2019, 10:44 AM
llvm/test/tools/llvm-objcopy/ELF/strip-preserve-arm-attributes.test
2

We need a ## file-level comments that explains the purpose of the test. You can find many examples in the same directory.

llvm/tools/llvm-objcopy/ELF/ELFObjcopy.cpp
506

We should document the upstream glibc bug (as an example, put a link http://sourceware.org/PR16773) and the first version of glibc that fixed the bug, so that we can retire the exemption at some point (I think we usually don't care about toolchains that are more than 5 years old).

thieta marked an inline comment as done.Oct 21 2019, 8:54 PM
thieta added inline comments.
llvm/tools/llvm-objcopy/ELF/ELFObjcopy.cpp
506

Unfortunately this bug is not in upstream glibc but rather in the patchset that ubuntu/debian uses. There has been no upstream fix as far as I can tell and the bug hasn't been updated since 2017: https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1712938

I added a little info to that bug now to hope to see someone review it again.

@rupprecht thanks for the review - happy to address your comments and change the commit message and inline comment to reflect my findings above if you guys decide to accept this approach. I think this needs to be documented if nothing else considering how confusing the error on device actually is.

I'm good with the approach in this patch -- once the test/docs changes are done, I'll accept it. I also ran into Jake yesterday at the llvm devmtg and came to the same conclusion. (Also, sorry for the delay, which was also because of the devmtg). --strip-all should generally be more aggressive, but not so much that it totally doesn't work as a default mode on common platforms.

Also, do you have commit access (or plan on getting it), or would you like one of us to commit it?

thieta updated this revision to Diff 226370.Oct 24 2019, 8:11 PM
thieta marked an inline comment as not done.
thieta edited the summary of this revision. (Show Details)

This revision should address all the comments in the previous revision. Comments are updated, testing --strip-all-gnu and documentation changes. Let me know if you have additional comments.

thieta marked 5 inline comments as done.Oct 24 2019, 8:13 PM

@rupprecht - Thanks! I have addressed the comments and uploaded a new rev. Hope I did it correctly. I don't have commit access - if you could commit it for me it would be great. I might have other patches in the future (looking at you MingW driver) but I'll cross that bridge then.

MaskRay added a comment.EditedOct 24 2019, 11:21 PM

I got a misimpression that this is an upstream glibc issue but it turns out that it is not.

A glibc ARM maintainer said https://git.launchpad.net/ubuntu/+source/glibc/tree/debian/patches/arm/unsubmitted-ldso-abi-check.diff?h=debian/sid# is only shipped by Ubuntu (Debian?) and probably its derivatives. They can fix the problem by dropping the patch. The glibc upstream does not have the problem.

We probably need more input from their side before we proceed to add an llvm-objcopy hack that fixes issues brought by a potentially problematic distribution patch to glibc. In the best case, if Ubuntu developers decide that the glibc patch is a mistake - this is possible, e.g. it used to work around some problems but the package maintainers failed to notice that the patch is no longer needed. We do not need the generic llvm-objcopy patch that may impact users from other distributions.

rupprecht accepted this revision.Oct 25 2019, 11:38 AM

LGTM to the change, though I want to resolve this thread before committing:

I got a misimpression that this is an upstream glibc issue but it turns out that it is not.

A glibc ARM maintainer said https://git.launchpad.net/ubuntu/+source/glibc/tree/debian/patches/arm/unsubmitted-ldso-abi-check.diff?h=debian/sid# is only shipped by Ubuntu (Debian?) and probably its derivatives. They can fix the problem by dropping the patch. The glibc upstream does not have the problem.

We probably need more input from their side before we proceed to add an llvm-objcopy hack that fixes issues brought by a potentially problematic distribution patch to glibc. In the best case, if Ubuntu developers decide that the glibc patch is a mistake - this is possible, e.g. it used to work around some problems but the package maintainers failed to notice that the patch is no longer needed. We do not need the generic llvm-objcopy patch that may impact users from other distributions.

Suppose the glibc patch is dropped from ubuntu (and any other distro that still has this patch), isn't there still a problem that some very large percentage of machines would need to update to pick up the patch, and llvm-strip would be producing unusable binaries for the long tail that haven't? Would it be fine to commit this with a TODO to remove it after some appropriate amount of time (1 year maybe)?

I might have other patches in the future (looking at you MingW driver)

mstorsjo should be able to help/redirect you for questions there

llvm/test/tools/llvm-objcopy/ELF/strip-preserve-arm-attributes.test
9–10

nit: this should be %t3 so %t2 from the previous case isn't overwritten (in case this test needs to be debugged)

This revision is now accepted and ready to land.Oct 25 2019, 11:38 AM

LGTM to the change, though I want to resolve this thread before committing:

I got a misimpression that this is an upstream glibc issue but it turns out that it is not.

A glibc ARM maintainer said https://git.launchpad.net/ubuntu/+source/glibc/tree/debian/patches/arm/unsubmitted-ldso-abi-check.diff?h=debian/sid# is only shipped by Ubuntu (Debian?) and probably its derivatives. They can fix the problem by dropping the patch. The glibc upstream does not have the problem.

We probably need more input from their side before we proceed to add an llvm-objcopy hack that fixes issues brought by a potentially problematic distribution patch to glibc. In the best case, if Ubuntu developers decide that the glibc patch is a mistake - this is possible, e.g. it used to work around some problems but the package maintainers failed to notice that the patch is no longer needed. We do not need the generic llvm-objcopy patch that may impact users from other distributions.

Suppose the glibc patch is dropped from ubuntu (and any other distro that still has this patch), isn't there still a problem that some very large percentage of machines would need to update to pick up the patch, and llvm-strip would be producing unusable binaries for the long tail that haven't? Would it be fine to commit this with a TODO to remove it after some appropriate amount of time (1 year maybe)?

We don't receive more complaints, do we? It is also possible for Ubuntu package maintainers to keep this patch in their repositories. We have more choices than accepting this unconditionally and penalizing every other distribution.

I totally defer to you guys on if this is something you want to accept or not. We are now patching our own toolchain with this patch to not create bad binaries. If I where to make a case for accepting this patch it would be something like:

Stripping with llvm-strip without any arguments currently creates binaries that doesn't work on any debian derriative distribution on arm32. And while yes this is totally a bug in a patch they ship, the upstream have had a bug opened against this for over 2 years without even triaging it, so I don't think we can see it be removed for a long time. What's worse is that the error is really hard to spot / understand - it just gives feedback that the shared object can't be loaded and it lead us to troubleshoot for hours to find the culprit.

The downside to accepting this is that we add around 100 bytes to files for other dists that doesn't really need it. Personally I think that's a small price to pay to get working binaries.

One option could be to add a new option that more aggressively removes things that can break compatibility and document those (could even drop .gnu.warnings). Or make --strip-all-gnu the default and --strip-all something you opt into.

From my point of view I think argument less llvm-strip should retain high compatibility and only people that know what they are doing can opt for more aggressive removal.

But as stated above - I won't be crushed if this patch doesn't make it to the mainline. I think maybe the incompatibility should be documented in that case so it can be easier to find in the future without a lot of troubleshooting.

Thanks again for taking the time to review my first attempt at a LLVM patch!

My impression is that a large fleet of devices for some reason decided to do this much like the infamous 8-byte aligned note bug. I think we should allow people to cope while giving them the best results possible. I'm personally in favor of adding this for now and removing it later personally.

However I have an even better solution that might work and its something I've been meaning to do for a while anyways. We should add llvm-strip-gnu and llvm-objcopy-gnu which act as really strong drop in replacements for GNU objcopy and strip. This lets us maintain the best possible results for 99% of users while allowing the 1% to not be blocked when they're doing something unsavory like relaying on a non-allocated section, or section headers at all, at runtime.

MaskRay added a comment.EditedOct 28 2019, 6:59 PM

Thank you for the long reply but can you hold it off a bit though it is displayed as "accepted". I am discussing with a Debian developer on this issue. I think we should make Debian aware of it. FYI The Debian patch is at https://salsa.debian.org/glibc-team/glibc/commits/sid/debian/patches/arm/unsubmitted-ldso-abi-check.diff

Program behaviors should not depend on details of the file outside of the loadable segments. To some degree I may agree that some sort of introspectability is allowed, but it definitely should not segfault due the missing of a non-SHF_ALLOC section.

The downside to accepting this is that we add around 100 bytes to files for other dists that doesn't really need it. Personally I think that's a small price to pay to get working binaries.

The downside also includes the additional rule to --strip-all, and the behavior changes introduced now and when we remove it in the future.

A more effective approach moving forward is probably to notify the patch author. I just did that.

jhenderson added inline comments.Oct 29 2019, 4:39 AM
llvm/docs/CommandGuide/llvm-objcopy.rst
101–103

Nit: not strictly required, but when I worked on much of the documentation recently, I tried to keep the lines to 80 columns, like the source code, where possible, so please reflow this comment.

llvm/docs/CommandGuide/llvm-strip.rst
80–82

Ditto.

llvm/test/tools/llvm-objcopy/ELF/strip-preserve-arm-attributes.test
2

preserves -> preserve

8

You don't need to do --file-headers in this or the other check two lines below.

llvm/tools/llvm-objcopy/CommonOpts.td
43–44

This goes over the 80 character limit. There's a way to run clang-format on tablegen files, but I can't remember the specific invocation off the top of my head.

MaskRay added inline comments.Oct 29 2019, 10:59 AM
llvm/tools/llvm-objcopy/CommonOpts.td
43–44

Ensure the change is either git added, or included in the branch,

git diff -U0 --no-color 'HEAD^' | /path/to/llvm-project/clang/tools/clang-format/clang-format-diff.py -i -p1

to format .td files.

If you have a newer clang-format, append -binary=/path/to/clang-format to the clang-format-diff.py command line.

MaskRay added a comment.EditedOct 29 2019, 4:46 PM

So chatted with Sledge. Their authoritative input on this matter is that we do need to carry a workaround. I think this patch is good to go once Sledge has an official bug id and the comment is updated with that piece of information.

<Sledge> it's going to take quite a while for that fix to filter through to users, particularly for Ubuntu and Debian LTS users :-/

MaskRay accepted this revision.Oct 29 2019, 6:08 PM
MaskRay added inline comments.
llvm/tools/llvm-objcopy/ELF/ELFObjcopy.cpp
509

Change it to https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=943798 because it is the upstream of other distributions.

So chatted with Sledge. Their authoritative input on this matter is that we do need to carry a workaround. I think this patch is good to go once Sledge has an official bug id and the comment is updated with that piece of information.

<Sledge> it's going to take quite a while for that fix to filter through to users, particularly for Ubuntu and Debian LTS users :-/

Thanks a lot for reaching out to debian developers @MaskRay ! - I will address the last set of comments raised and hopefully submit a final pass of this patch today or tomorrow.

thieta updated this revision to Diff 227043.Oct 30 2019, 2:16 AM
thieta edited the summary of this revision. (Show Details)

Fixed all outstanding comments with documentation, test cases and linking to the correct upstream bug.

thieta marked 8 inline comments as done.Oct 30 2019, 2:17 AM
jhenderson added inline comments.Oct 30 2019, 2:41 AM
llvm/test/tools/llvm-objcopy/ELF/strip-preserve-arm-attributes.test
7–10

Since the original issue was with llvm-strip, it probably makes sense to include llvm-strip invocations to test this. Probably the best thing to do would be to run llvm-strip and llvm-strip --strip-all-gnu and then cmp the output with the corresponding llvm-objcopy output.

thieta marked an inline comment as done.Oct 30 2019, 2:57 AM
thieta added inline comments.
llvm/test/tools/llvm-objcopy/ELF/strip-preserve-arm-attributes.test
7–10

I think I understood what you mean - but just to make sure you meant something like this right:

 # RUN: yaml2obj %s > %t
-# RUN: llvm-objcopy --strip-all %t %t2
+# RUN: llvm-strip %t %t2
 # RUN: llvm-readobj --sections %t2 | FileCheck %s
-# RUN: llvm-objcopy --strip-all-gnu %t %t3
+# RUN: llvm-strip --strip-all-gnu %t %t3
 # RUN: llvm-readobj --sections %t3 | FileCheck %s
+# RUN: llvm-objcopy --strip-all %t %t4
+# RUN: cmp %t4 %t2
+# RUN: llvm-objcopy --strip-all-gnu %t %t5
+# RUN: cmp %t5 %t3
jhenderson added inline comments.Oct 30 2019, 3:08 AM
llvm/test/tools/llvm-objcopy/ELF/strip-preserve-arm-attributes.test
7–10

That works. Or you can just add the llvm-strip ones afterwards instead:

# RUN: yaml2obj %s > %t
# RUN: llvm-objcopy --strip-all %t %t2
# RUN: llvm-readobj --sections %t2 | FileCheck %s
# RUN: llvm-objcopy --strip-all-gnu %t %t3
# RUN: llvm-readobj --sections %t3 | FileCheck %s
# RUN: llvm-strip %t -o %t4
# RUN: cmp %t4 %t2
# RUN: llvm-strip --strip-all-gnu %t -o %t5
# RUN: cmp %t5 %t3

As a note, llvm-strip strips files inline unless the -o option is specified. Thus llvm-objcopy %t %t2 copies %t to %t2, whereas llvm-strip %t %t2 strips both %t and %t2.

thieta updated this revision to Diff 227056.Oct 30 2019, 3:20 AM

Updated the tests with @jhenderson comments.

jhenderson accepted this revision.Oct 30 2019, 4:23 AM

Thanks! LGTM.

Can someone please commit this for me if everyone is happy with it now?

Can someone please commit this for me if everyone is happy with it now?

I'll take care of this later today.

This revision was automatically updated to reflect the committed changes.

Thanks everyone! It was a great learning experience getting this patch in and learning more about building/testing LLVM!

MaskRay added a comment.EditedOct 31 2019, 10:45 AM

Can someone please commit this for me if everyone is happy with it now?

I'll take care of this later today.

Since LLVM has migrated from svn to git, we can do arc patch D69188; git commit --ammend --author='Tobias Hieta <...>'; git checkout master; git merge --ff arc-D69188; git push now :) I don't know how to fix the --author=... inconvenience (https://lists.llvm.org/pipermail/llvm-dev/2019-October/136374.html), though.

Requested the commit to be included in llvm 9.0.1 https://bugs.llvm.org/show_bug.cgi?id=43863