The archive library truncated the size of archive members whose size was greater than max uint32_t, leading to errors in tools that use the library. This patch fixes the issue and adds some unit tests to verify.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Time | Test | |
---|---|---|
50 ms | lld.ELF::Unknown Unit Message ("") |
Event Timeline
Command Output (stderr): -- bzip2: Compressed file ends unexpectedly; perhaps it is corrupted? *Possible* reason follows. bzip2: Success Input file = /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/lld/test/ELF/Inputs/large-files1.o.bz2, output file = (stdout) It is possible that the compressed file(s) have become corrupted. You can use the -tvv option to test integrity of such files. You can use the `bzip2recover' program to attempt to recover data from undamaged sections of corrupted files.
As you see, many systems may not be able to run such tests. Deleting the test may be fine.
llvm/lib/Object/Archive.cpp | ||
---|---|---|
394–395 | Does return Header.getSize() work? |
I am also not fond of having tests that might consume gigabytes to run.
Perhaps you could craft an object that has a broken size written to a header, e.g. (MAX_UINT32 + 1) ant test
that LLD does not crash and report an error with that would contain the size value found?
I'm not sure I understand what you mean here. The bug fixed by this patch manifests as attempting to read a section header table past the end of the file, because the result of the truncated size is failing to create a buffer that is large enough. This fix can therefore only be verified by attempting to use data in the latter part of a buffer that is > 4GB in size. The only alternative I can think of to having large files as inputs, is to create sparse files, but I'm not sure if the test would be portable consequently.
Simplify size handling slightly.
I've not deleted the test yet, but I will do so in due course before this gets committed. I'd like to see if we can come up with an alternative test option first, if we can. For the record, the test passes locally with my patch, but didn't without the fix (it was how I found the bug in the first place).
I realised that I could at least test Archive::Child::getSize() function using gtest unittests. I'll take a look at that, and see if getBuffer is possible too.
LGTM. Using unit test is good I think.
Idea was based on a testing of a values reported in a error message.
See the code this patch changes:
Expected<StringRef> Archive::Child::getBuffer() const { Expected<bool> isThinOrErr = isThinMember(); if (!isThinOrErr) return isThinOrErr.takeError(); bool isThin = isThinOrErr.get(); if (!isThin) { Expected<uint64_t> Size = getSize(); if (!Size) return Size.takeError(); return StringRef(Data.data() + StartOfFile, Size.get()); }
The problem is with the original logic after Expected<uint64_t> Size = getSize();.
Imagine we have a broken object and getSize() returns (uint64)-1.
We do not report it and return a StringRef that has overruns the file size.
But we could report an error (i.e. add a new one) instead.
This error could contain an information about a size of a broken child.
It would be printed differently in Expected<uint64_t> Size and Expected<uint32_t> Size
cases I think. That is what something I was thinking about.
llvm/unittests/Object/ArchiveTest.cpp | ||
---|---|---|
25 ↗ | (On Diff #249089) | static char const -> static const char? |
66 ↗ | (On Diff #249089) | Not strong suggestion: MemoryBufferRef Source(ArchiveWithMember, "regular"); Expected<std::unique_ptr<Archive>> A = Archive::create(Source); ASSERT_THAT_EXPECTED(A, Succeeded()); Error ChildErr = Error::success(); auto Child = (*A)->child_begin(ChildErr); ASSERT_THAT_ERROR(std::move(ChildErr), Succeeded()); This part is almost similar to one in ArchiveChildGetSizeThinArchive and ArchiveChildGetSizeRegularArchive. |
llvm/unittests/Object/ArchiveTest.cpp | ||
---|---|---|
66 ↗ | (On Diff #249089) | I considered that. Unfortunately, ASSERT_* macros return void, so can't be inside other functions that return something else. Consequently, we have to put them in yet another function, then check to see if the thing is valid anyway (rendering the ASSERT pointless), and the logic becomes somewhat complex and hard to follow. I don't think there's much benefit as a result. |
I don't know if it's portable (you'd have to add requires for Linux/shell, or maybe do some lit configuration to detect if it's available), but you could try using fallocate:
$ time fallocate -l 3G test.large fallocate -l 3G test.large 0.00s user 0.01s system 39% cpu 0.017 total $ time cat test.large > /dev/null cat test.large > /dev/null 0.01s user 1.18s system 99% cpu 1.193 total
If you want to try that, I would still recommend keeping the unit test that runs on all platforms, and just having this lit one be an extra test.
llvm/unittests/Object/ArchiveTest.cpp | ||
---|---|---|
66 ↗ | (On Diff #249089) | One way you might be able to refactor this is to not have the assert in the helper method: Expected<Child> createChild(StringRef src, StringRef name) { MemoryBufferRef Source(src, name); Expected<std::unique_ptr<Archive>> A = Archive::create(Source); if (!A) return A.takeError(); Error ChildErr = Error::success(); auto Child = (*A)->child_begin(ChildErr); if (ChildErr) return std::move(ChildErr); return std::move(Child); } TEST(ArchiveTests, ArchiveChildGetBuffer) { auto Child = createChild(ArchiveWithMember, "regular"); ASSERT_THAT_EXPECTED(Child, Succeeded()); ... (heavy pseudocode, I almost definitely got a lot of types wrong... and I'm also ignoring the lifetime issues) |
71 ↗ | (On Diff #249089) | ASSERT_TRUE((bool)Buffer); |
I'm not sure that's going to work. We'll still need an object of the right size in the first place, to punch holes in, as otherwise it won't be a valid linker input, so it doesn't avoid the size issue. Furthermore, I doubt that llvm-ar or LLD will preserve the sparseness, so it won't be useful for testing this or LLD's behaviour.
Nice tip! Sparse files is even supported on ext2. So running it on Linux will be safe. However, if the disk cannot allocate 3G space (if the space is less 3G, then it surely cannot), fallocate(2) can fail with ENOSPC 28 No space left on device.
Additionally, if a test can take more than one second, I will still be hesitant..
ld.lld -m elf_x86_64 -b binary test.large -o out can take quite a few seconds. 2.5s in tmpfs with warn cache on my pc. It could be much slower with a different configuration.
FTR.
The current size of LLVM build folder for me is 26.1 Gb.
I think it is reasonable to avoid letting it go to 30.1 Gb (e.g. +4 Gb, +15%).
We have huge-temporarily-file.s test in LLD with the following comment:
- This inputs previously created a 4gb temporarily file under 32 bit
- configuration. Issue was fixed. There is no clean way to check that from here.
- This testcase added for documentation purposes.
I believe that +15% of peak free space consumption is probably too much. It might be better to
have no test at all sometimes.
Does return Header.getSize() work?