This is an archive of the discontinued LLVM Phabricator instance.

[Object] Support large archive members
ClosedPublic

Authored by jhenderson on Mar 6 2020, 6:59 AM.

Details

Summary

The archive library truncated the size of archive members whose size was greater than max uint32_t, leading to errors in tools that use the library. This patch fixes the issue and adds some unit tests to verify.

Diff Detail

Event Timeline

jhenderson created this revision.Mar 6 2020, 6:59 AM
Herald added a project: Restricted Project. · View Herald Transcript
MaskRay added a comment.EditedMar 6 2020, 7:52 AM
Command Output (stderr):
--

bzip2: Compressed file ends unexpectedly;
	perhaps it is corrupted?  *Possible* reason follows.
bzip2: Success
	Input file = /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/lld/test/ELF/Inputs/large-files1.o.bz2, output file = (stdout)

It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.

As you see, many systems may not be able to run such tests. Deleting the test may be fine.

llvm/lib/Object/Archive.cpp
394–395

Does return Header.getSize() work?

grimar added a comment.Mar 7 2020, 2:21 AM

I am also not fond of having tests that might consume gigabytes to run.
Perhaps you could craft an object that has a broken size written to a header, e.g. (MAX_UINT32 + 1) ant test
that LLD does not crash and report an error with that would contain the size value found?

Perhaps you could craft an object that has a broken size written to a header, e.g. (MAX_UINT32 + 1) ant test
that LLD does not crash and report an error with that would contain the size value found?

I'm not sure I understand what you mean here. The bug fixed by this patch manifests as attempting to read a section header table past the end of the file, because the result of the truncated size is failing to create a buffer that is large enough. This fix can therefore only be verified by attempting to use data in the latter part of a buffer that is > 4GB in size. The only alternative I can think of to having large files as inputs, is to create sparse files, but I'm not sure if the test would be portable consequently.

jhenderson updated this revision to Diff 249053.Mar 9 2020, 3:57 AM

Simplify size handling slightly.

I've not deleted the test yet, but I will do so in due course before this gets committed. I'd like to see if we can come up with an alternative test option first, if we can. For the record, the test passes locally with my patch, but didn't without the fix (it was how I found the bug in the first place).

I realised that I could at least test Archive::Child::getSize() function using gtest unittests. I'll take a look at that, and see if getBuffer is possible too.

jhenderson updated this revision to Diff 249089.Mar 9 2020, 7:19 AM
jhenderson marked an inline comment as done.

Test with unit tests.

jhenderson edited the summary of this revision. (Show Details)Mar 9 2020, 7:19 AM
jhenderson edited the summary of this revision. (Show Details)
MaskRay accepted this revision.Mar 9 2020, 8:51 AM
This revision is now accepted and ready to land.Mar 9 2020, 8:51 AM
ruiu accepted this revision.Mar 9 2020, 11:36 PM

LGTM

grimar accepted this revision.Mar 10 2020, 4:01 AM

LGTM. Using unit test is good I think.

Perhaps you could craft an object that has a broken size written to a header, e.g. (MAX_UINT32 + 1) ant test
that LLD does not crash and report an error with that would contain the size value found?

I'm not sure I understand what you mean here. The bug fixed by this patch manifests as attempting to read a section header table past the end of the file, because the result of the truncated size is failing to create a buffer that is large enough. This fix can therefore only be verified by attempting to use data in the latter part of a buffer that is > 4GB in size. The only alternative I can think of to having large files as inputs, is to create sparse files, but I'm not sure if the test would be portable consequently.

Idea was based on a testing of a values reported in a error message.
See the code this patch changes:

Expected<StringRef> Archive::Child::getBuffer() const {
  Expected<bool> isThinOrErr = isThinMember();
  if (!isThinOrErr)
    return isThinOrErr.takeError();
  bool isThin = isThinOrErr.get();
  if (!isThin) {
    Expected<uint64_t> Size = getSize();
    if (!Size)
      return Size.takeError();
    return StringRef(Data.data() + StartOfFile, Size.get());
  }

The problem is with the original logic after Expected<uint64_t> Size = getSize();.
Imagine we have a broken object and getSize() returns (uint64)-1.
We do not report it and return a StringRef that has overruns the file size.
But we could report an error (i.e. add a new one) instead.

This error could contain an information about a size of a broken child.
It would be printed differently in Expected<uint64_t> Size and Expected<uint32_t> Size
cases I think. That is what something I was thinking about.

llvm/unittests/Object/ArchiveTest.cpp
26

static char const -> static const char?

67

Not strong suggestion:

MemoryBufferRef Source(ArchiveWithMember, "regular");
Expected<std::unique_ptr<Archive>> A = Archive::create(Source);
ASSERT_THAT_EXPECTED(A, Succeeded());
Error ChildErr = Error::success();
auto Child = (*A)->child_begin(ChildErr);
ASSERT_THAT_ERROR(std::move(ChildErr), Succeeded());

This part is almost similar to one in ArchiveChildGetSizeThinArchive and ArchiveChildGetSizeRegularArchive.
Should this code be moved to a helper that returns a Child?

jhenderson marked an inline comment as done.Mar 10 2020, 4:51 AM
jhenderson added inline comments.
llvm/unittests/Object/ArchiveTest.cpp
67

I considered that. Unfortunately, ASSERT_* macros return void, so can't be inside other functions that return something else. Consequently, we have to put them in yet another function, then check to see if the thing is valid anyway (rendering the ASSERT pointless), and the logic becomes somewhat complex and hard to follow. I don't think there's much benefit as a result.

Simplify size handling slightly.

I've not deleted the test yet, but I will do so in due course before this gets committed. I'd like to see if we can come up with an alternative test option first, if we can. For the record, the test passes locally with my patch, but didn't without the fix (it was how I found the bug in the first place).

I don't know if it's portable (you'd have to add requires for Linux/shell, or maybe do some lit configuration to detect if it's available), but you could try using fallocate:

$ time fallocate -l 3G test.large
fallocate -l 3G test.large  0.00s user 0.01s system 39% cpu 0.017 total
$ time cat test.large > /dev/null
cat test.large > /dev/null  0.01s user 1.18s system 99% cpu 1.193 total

If you want to try that, I would still recommend keeping the unit test that runs on all platforms, and just having this lit one be an extra test.

llvm/unittests/Object/ArchiveTest.cpp
67

One way you might be able to refactor this is to not have the assert in the helper method:

Expected<Child> createChild(StringRef src, StringRef name) {
  MemoryBufferRef Source(src, name);
  Expected<std::unique_ptr<Archive>> A = Archive::create(Source);
  if (!A) return A.takeError();

  Error ChildErr = Error::success();
  auto Child = (*A)->child_begin(ChildErr);
  if (ChildErr) return std::move(ChildErr);
  return std::move(Child);
}

TEST(ArchiveTests, ArchiveChildGetBuffer) {
  auto Child = createChild(ArchiveWithMember, "regular");
  ASSERT_THAT_EXPECTED(Child, Succeeded());
...

(heavy pseudocode, I almost definitely got a lot of types wrong... and I'm also ignoring the lifetime issues)

72

ASSERT_TRUE((bool)Buffer);

jhenderson marked 4 inline comments as done.Mar 10 2020, 9:24 AM

Simplify size handling slightly.

I've not deleted the test yet, but I will do so in due course before this gets committed. I'd like to see if we can come up with an alternative test option first, if we can. For the record, the test passes locally with my patch, but didn't without the fix (it was how I found the bug in the first place).

I don't know if it's portable (you'd have to add requires for Linux/shell, or maybe do some lit configuration to detect if it's available), but you could try using fallocate:

$ time fallocate -l 3G test.large
fallocate -l 3G test.large  0.00s user 0.01s system 39% cpu 0.017 total
$ time cat test.large > /dev/null
cat test.large > /dev/null  0.01s user 1.18s system 99% cpu 1.193 total

If you want to try that, I would still recommend keeping the unit test that runs on all platforms, and just having this lit one be an extra test.

I'm not sure that's going to work. We'll still need an object of the right size in the first place, to punch holes in, as otherwise it won't be a valid linker input, so it doesn't avoid the size issue. Furthermore, I doubt that llvm-ar or LLD will preserve the sparseness, so it won't be useful for testing this or LLD's behaviour.

Address review comments (char const -> const char, pull code out into helper).

MaskRay accepted this revision.EditedMar 10 2020, 5:52 PM

Simplify size handling slightly.

I've not deleted the test yet, but I will do so in due course before this gets committed. I'd like to see if we can come up with an alternative test option first, if we can. For the record, the test passes locally with my patch, but didn't without the fix (it was how I found the bug in the first place).

I don't know if it's portable (you'd have to add requires for Linux/shell, or maybe do some lit configuration to detect if it's available), but you could try using fallocate:

$ time fallocate -l 3G test.large
fallocate -l 3G test.large  0.00s user 0.01s system 39% cpu 0.017 total
$ time cat test.large > /dev/null
cat test.large > /dev/null  0.01s user 1.18s system 99% cpu 1.193 total

If you want to try that, I would still recommend keeping the unit test that runs on all platforms, and just having this lit one be an extra test.

Nice tip! Sparse files is even supported on ext2. So running it on Linux will be safe. However, if the disk cannot allocate 3G space (if the space is less 3G, then it surely cannot), fallocate(2) can fail with ENOSPC 28 No space left on device.

Additionally, if a test can take more than one second, I will still be hesitant..

ld.lld -m elf_x86_64 -b binary test.large -o out can take quite a few seconds. 2.5s in tmpfs with warn cache on my pc. It could be much slower with a different configuration.

FTR.

The current size of LLVM build folder for me is 26.1 Gb.
I think it is reasonable to avoid letting it go to 30.1 Gb (e.g. +4 Gb, +15%).

We have huge-temporarily-file.s test in LLD with the following comment:

  1. This inputs previously created a 4gb temporarily file under 32 bit
  2. configuration. Issue was fixed. There is no clean way to check that from here.
  3. This testcase added for documentation purposes.

I believe that +15% of peak free space consumption is probably too much. It might be better to
have no test at all sometimes.

FTR.

The current size of LLVM build folder for me is 26.1 Gb.
I think it is reasonable to avoid letting it go to 30.1 Gb (e.g. +4 Gb, +15%).

We have huge-temporarily-file.s test in LLD with the following comment:

  1. This inputs previously created a 4gb temporarily file under 32 bit
  2. configuration. Issue was fixed. There is no clean way to check that from here.
  3. This testcase added for documentation purposes.

I believe that +15% of peak free space consumption is probably too much. It might be better to
have no test at all sometimes.

We have or had that file? I couldn't find it in git as things stand.

This revision was automatically updated to reflect the committed changes.

FTR.

The current size of LLVM build folder for me is 26.1 Gb.
I think it is reasonable to avoid letting it go to 30.1 Gb (e.g. +4 Gb, +15%).

We have huge-temporarily-file.s test in LLD with the following comment:

  1. This inputs previously created a 4gb temporarily file under 32 bit
  2. configuration. Issue was fixed. There is no clean way to check that from here.
  3. This testcase added for documentation purposes.

I believe that +15% of peak free space consumption is probably too much. It might be better to
have no test at all sometimes.

We have or had that file? I couldn't find it in git as things stand.

I believe we have it:
https://github.com/llvm/llvm-project/blob/master/lld/test/ELF/linkerscript/huge-temporary-file.s

jhenderson added a comment.EditedMar 11 2020, 4:52 AM

FTR.

The current size of LLVM build folder for me is 26.1 Gb.
I think it is reasonable to avoid letting it go to 30.1 Gb (e.g. +4 Gb, +15%).

We have huge-temporarily-file.s test in LLD with the following comment:

  1. This inputs previously created a 4gb temporarily file under 32 bit
  2. configuration. Issue was fixed. There is no clean way to check that from here.
  3. This testcase added for documentation purposes.

I believe that +15% of peak free space consumption is probably too much. It might be better to
have no test at all sometimes.

We have or had that file? I couldn't find it in git as things stand.

I believe we have it:
https://github.com/llvm/llvm-project/blob/master/lld/test/ELF/linkerscript/huge-temporary-file.s

Thanks. I didn't look in the linkerscript directory.