This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objcopy] Make removeSectionReferences batched
ClosedPublic

Authored by rupprecht on Feb 15 2019, 11:26 AM.

Details

Summary

Removing a large number of sections from a file with a lot of symbols can have abysmal (i.e. O(n^2)) performance, e.g. when running --only-section to extract one section out of a large file.

This comes from iterating over all symbols in the symbol table each time we remove a section, to remove references to the section we just removed.
Instead, do just one pass of symbol removal by passing a hash set of all the sections we'd like to remove references to.

This fixes a regression when running llvm-objcopy -j <one section> on an object file with many sections and symbols -- on my machine, running objcopy -j .keep_me huge-input.o /tmp/foo.o takes .3s with GNU objcopy, 1.3s with an updated llvm-objcopy, and 7+ minutes with llvm-objcopy prior to this patch.

Diff Detail

Repository
rL LLVM

Event Timeline

rupprecht created this revision.Feb 15 2019, 11:26 AM

The approach looks good to me, but is anyone concerned about the 3MB compressed file?

llvm/tools/llvm-objcopy/ELF/Object.h
340 ↗(On Diff #187053)

Do you consider a plural form?

Don't know if you're planning on using the description as your commit message (I usually do), but there's a typo in it: abyssmal -> abysmal.

llvm/test/tools/llvm-objcopy/ELF/only-section-huge.test
1 ↗(On Diff #187053)

How does this test show reasonable performance in a CI run? I don't see anything that checks that the runtime is sane.

llvm/tools/llvm-objcopy/ELF/Object.h
278 ↗(On Diff #187053)

ArrayRef?

Also Sec -> Sections.

rupprecht updated this revision to Diff 187452.Feb 19 2019, 3:02 PM
rupprecht marked 2 inline comments as done.
  • vector->ArrayRef
  • Pluralize Sections arg
rupprecht marked 2 inline comments as done.Feb 19 2019, 3:05 PM

The approach looks good to me, but is anyone concerned about the 3MB compressed file?

FWIW, this would be the 11th largest file:

$ find . -size +1M -print0 | xargs -0 du -sk | grep -v .git | sort -nr
15752   ./compiler-rt/test/builtins/Unit/udivmodti4_test.c
8124    ./lldb/unittests/Process/minidump/Inputs/fizzbuzz_wow64.dmp
8124    ./lldb/packages/Python/lldbsuite/test/functionalities/postmortem/wow64_minidump/fizzbuzz_wow64.dmp
7416    ./lldb/www/python_reference/lldb-pysrc.html
5404    ./llvm/test/MC/Disassembler/AMDGPU/gfx9_dasm_all.txt
5244    ./libcxxabi/test/test_demangle.pass.cpp
5108    ./llvm/test/MC/Disassembler/AMDGPU/gfx8_dasm_all.txt
3712    ./llvm/test/tools/sancov/Inputs/test-linux_android_aarch64
3688    ./llvm/test/MC/AMDGPU/gfx9_asm_all.s
3508    ./llvm/test/MC/AMDGPU/gfx8_asm_all.s
3352    ./llvm/test/tools/llvm-objcopy/ELF/Inputs/huge-input.o.gz

As mentioned in a comment though, I plan to pull it from the review.

llvm/test/tools/llvm-objcopy/ELF/only-section-huge.test
1 ↗(On Diff #187053)

It doesn't -- it will just take a very long time, and hopefully buildbots would be configured to run with a timeout and kill the test if it takes too long.

I think I should actually revert the test portion of this change, but I can leave it up in phab until I commit it. Testing timeouts like this doesn't seem very common or well supported with lit. Also, someone on IRC (sorry, I forget who) suggested not writing a test with a timeout, because it could fail if it just happens to run on really slow hardware. Maybe I'll save it and commit it later if there's a better regression testing framework -- there is LNT, but I'm not sure if it's suited to this task.

jhenderson added inline comments.Feb 20 2019, 2:08 AM
llvm/tools/llvm-objcopy/ELF/Object.cpp
1358 ↗(On Diff #187452)

I'm staring at this and thinking that an unordered Set may be a better container for performance here, especially given our use of pointers, making the comparison and hashing operations cheap. It would remove the need for sort and binary_search in favour of a set lookup, which is (usually) constant time.

Have I missed something?

If we're just giving a set we might as well give the user a function instead right? That's a bit more general of an interface and easier to use. many-sections.o.gz is smaller than 3.3 MB right? I went to a lot of pains to make sure that I was uploading as small a file as was possible. Do we have large file support

cc @echristo who might have a better idea of what our limits are there.

rupprecht marked 3 inline comments as done.Feb 20 2019, 3:30 PM

If we're just giving a set we might as well give the user a function instead right? That's a bit more general of an interface and easier to use.

That's a good idea, and makes with experimenting with the lookup algorithm (switching out a vector for an unordered_set) trivial. Done,

many-sections.o.gz is smaller than 3.3 MB right? I went to a lot of pains to make sure that I was uploading as small a file as was possible. Do we have large file support

cc @echristo who might have a better idea of what our limits are there.

I'll update the patch description -- given concerns about the file size, I'll drop it when committing, but it's here for review if anyone wants to see it.
I could check in a tiny script instead and just generate it on the fly, but that ends up being very slow, so I'd rather check in a pre-built object if anything at all.

llvm/tools/llvm-objcopy/ELF/Object.cpp
1358 ↗(On Diff #187452)

Switched, although it is only a minor improvement:

python -m timeit -n 3 -r 10 -v -s 'import os' 'os.system("llvm-objcopy -j .keep_me /tmp/huge-input.o /tmp/foo.o")'

w/ sorted vector:
raw times: 4.23 4.14 4.16 4.17 4.27 4.22 4.13 4.14 4.14 4.2
3 loops, best of 10: 1.38 sec per loop

w/ unordered_set:
raw times: 3.83 3.75 3.85 3.77 3.76 3.87 3.81 3.82 3.84 3.83
3 loops, best of 10: 1.25 sec per loop

rupprecht updated this revision to Diff 187690.Feb 20 2019, 3:31 PM
rupprecht marked an inline comment as done.
  • Use function_ref to isolate the lookup logic to one place
  • Use unordered_set
rupprecht edited the summary of this revision. (Show Details)Feb 20 2019, 3:33 PM

Awesome patch.

As far as the testcase:

Right now we don't really have anything in the testsuite that tackles "How long does something take?" and I'm not sure I want to add it here yet. One thing we could do is add it to projects/test-suite where there is some tracking of how long things take via lnt. I'm not quite sure how to do that here, but it could work.

One more comment:

I also have no objections to adding a test of this size if we need to for correctness concerns.

FWIW, this would be the 11th largest file:

llvm/tools/llvm-objcopy/ELF/Object.cpp
1358 ↗(On Diff #187452)

No preference here.. Is SmallPtrSet<const SectionBase *, ?> faster?

rupprecht marked 2 inline comments as done.Feb 20 2019, 5:29 PM

Looking at projects/test-suite; it doesn't seem to have much support for running other llvm tools, but I can add llvm-objcopy. Still planning to do that separately from this patch though.

llvm/tools/llvm-objcopy/ELF/Object.cpp
1358 ↗(On Diff #187452)

No, but it's not slower -- it produces numbers within noise.

MaskRay accepted this revision.Feb 20 2019, 6:19 PM
This revision is now accepted and ready to land.Feb 20 2019, 6:19 PM
jhenderson accepted this revision.Feb 21 2019, 2:08 AM

LGTM.

I'm surprised at just how much slower we are than GNU objcopy still, but this is definitely a big win.

llvm/tools/llvm-objcopy/ELF/Object.cpp
1358 ↗(On Diff #187452)

Switched, although it is only a minor improvement:

I'll take a 10% performance improvement, thanks!

This revision was automatically updated to reflect the committed changes.
rupprecht marked an inline comment as done.