This is an archive of the discontinued LLVM Phabricator instance.

[llvm-ar][Object] Fix detection of need for 64-bit archive symbol tables
ClosedPublic

Authored by andrewng on Oct 21 2020, 8:58 AM.

Details

Summary

The code to detect the requirement for 64-bit offsets in the archive
symbol table was not correctly accounting for the archive file signature
and the size of all the contents of the symbol table itself, e.g. the
symbol table's header and string table. Also was not considering the
variation in symbol table formats. This could result in the creation of
large archives with a corrupt symbol table.

Change the testing environment variable SYM64_THRESHOLD to be an
absolute value rather than a power of 2 in order to enable precise
testing of this detection code.

Diff Detail

Event Timeline

andrewng created this revision.Oct 21 2020, 8:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 21 2020, 8:58 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
andrewng requested review of this revision.Oct 21 2020, 8:58 AM

I understand that a test may be problematic because it requires 4+GiB output. Do you have a script to craft such a test case so that people working on llvm-ar can run it on demand? The test can have UNSUPPORTED: *.

grimar added inline comments.Oct 21 2020, 11:52 PM
llvm/lib/Object/ArchiveWriter.cpp
294

Size is a size of the symbol table, not a number of entries, right? Though I am not sure it makes much sense to have a comment about it, I think it is clear from the function name?

337

Perhaps just is64BitKind(Kind) ? 8 : 4? (it is a very often used construction in LLVM).

610

I was a bit confused when saw writeSymbolTableHeader here.
Kind can be changed below:

if (LastOffset >= (1ULL << Sym64Threshold)) {
  if (Kind == object::Archive::K_DARWIN)
    Kind = object::Archive::K_DARWIN64;
  else
    Kind = object::Archive::K_GNU64;
}

I think would be more natural to call writeSymbolTableHeader after last possible modification of Kind, i.e. after this code block. Is it possible?

grimar added inline comments.Oct 22 2020, 12:17 AM
llvm/lib/Object/ArchiveWriter.cpp
294

Ah, I see now. It describes that the number of entries is written first.

andrewng updated this revision to Diff 299925.Oct 22 2020, 5:02 AM
andrewng edited the summary of this revision. (Show Details)

Updated to address review comments and updated to add better testing. Whilst adding this testing, found another bug in the detection code; it was also not accounting for the size of the file signature!

grimar added inline comments.Oct 22 2020, 5:07 AM
llvm/lib/Object/ArchiveWriter.cpp
610

This wasn't addressed?

aside: could you please mask addressed comments as "Done"? So that reviewers can see the status.

610

mask -> mark

I understand that a test may be problematic because it requires 4+GiB output. Do you have a script to craft such a test case so that people working on llvm-ar can run it on demand? The test can have UNSUPPORTED: *.

No, I don't have such a script. I actually had a use case that was showing the issue. I've updated the existing testing to be more precise to provide coverage.

llvm/lib/Object/ArchiveWriter.cpp
610

This call to writeSymbolTableHeader is purely used to determine the size of the header and is not a write to the output which happens later.

grimar added inline comments.Oct 22 2020, 5:11 AM
llvm/lib/Object/ArchiveWriter.cpp
610

This call to writeSymbolTableHeader is purely used to determine the size of the header and is not a write to the output which happens later.

writeSymbolTableHeader still uses Kind. It is just not clean to give it one Kind and change it to another later.

andrewng marked an inline comment as done.Oct 22 2020, 5:11 AM
andrewng marked 2 inline comments as done.Oct 22 2020, 6:07 AM
andrewng added inline comments.
llvm/lib/Object/ArchiveWriter.cpp
610

But this is by design. This call to writeSymbolTableHeader is used as part of the calculation of the 32-bit symbol table size in order to determine whether a 64-bit symbol table is required. If this is required, then the Kind is changed and the appropriate format is output. Does that make sense?

grimar added inline comments.Oct 22 2020, 6:24 AM
llvm/lib/Object/ArchiveWriter.cpp
610

I believe that the behavior of original code was:

  1. Calculate the size of the symbol table.
  2. Switch to 64-bit symbol table Kind if needed.
  3. Write the header and the body of the symbol table in one move.

Now the behavior with this patch is:

  1. Calculate the size of the symbol table.
  2. Write the symbol table header using the Kind (Kind1) we have at this step.
  3. Switch to 64-bit symbol table Kind (Kind2) if needed.
  4. Write the body of the symbol table using the new Kind (Kind2)

computeSymbolTableSize uses Kind for isBSDLike(Kind) call.
Perhaps instead of Kind it could take bool IsBSDLike? That way we know that
changing of Kind from 32->64 does not affect anything. What do you think?

andrewng marked 2 inline comments as done.Oct 22 2020, 6:42 AM
andrewng added inline comments.
llvm/lib/Object/ArchiveWriter.cpp
610

Sorry if I haven't explained clearly, but the behaviour with this patch is:

  1. Correctly calculate LastOffset including the correct calculation for the 32-bit symbol table size and also include the file signature.
  2. The above calculation includes a call to writeSymbolTableHeader with the original Kind to a temporary buffer in order to determine the symbol table header size.
  3. Switch to a 64-bit symbol table Kind if required.
  4. Write archive which includes a call to writeSymbolTableHeader with the final Kind and the actual output buffer.

Hope that clarifies the situation.

andrewng marked an inline comment as done.Oct 22 2020, 6:49 AM
andrewng added inline comments.
llvm/lib/Object/ArchiveWriter.cpp
610

Perhaps it would make it clearer if I wrap this use of writeSymbolTableHeader (in point 2 above) in a function such as computeSymbolTableHeaderSize? Although this would be the only place that it would be used. What do you think?

grimar added inline comments.Oct 22 2020, 7:02 AM
llvm/lib/Object/ArchiveWriter.cpp
610

Perhaps it would make it clearer if I wrap this use of writeSymbolTableHeader (in point 2 above) in a function such as computeSymbolTableHeaderSize

Makes sense I think. Probably you can just have a lambda names like that instead of having a function.

610

names -> named

andrewng updated this revision to Diff 300010.Oct 22 2020, 9:13 AM

Update to clarify the purpose of the initial call to writeSymbolTableHeader.

andrewng marked an inline comment as done.Oct 22 2020, 9:14 AM

I think this reads much better, thanks! I have no more comments.

This revision is now accepted and ready to land.Oct 23 2020, 10:55 AM
MaskRay accepted this revision.Oct 23 2020, 11:28 PM