This is an archive of the discontinued LLVM Phabricator instance.

[Archive] Don't throw away errors for malformed archive members
ClosedPublic

Authored by jhenderson on Sep 25 2020, 2:29 AM.

Details

Summary

When adding an archive member with a problem, e.g. a new bitcode with an old archiver, containing an unsupported attribute, or an ELF file with a malformed symbol table, the archiver would throw away the error and simply add the member to the archive without any symbol entries. This meant that the resultant archive could be silently unusable when not using --whole-archive, and result in unexpected undefined symbols.

This change fixes this issue by addressing two FIXMEs and only throwing away not-an-object errors. However, this meant that some LLD tests which didn't need symbol tables and were using invalid members deliberately to test the linker's malformed input handling no longer worked, so this patch also stops the archiver from looking for symbols in an object if it doesn't require a symbol table, and updates the tests accordingly.

Diff Detail

Event Timeline

jhenderson created this revision.Sep 25 2020, 2:29 AM
Herald added a project: Restricted Project. · View Herald Transcript
jhenderson requested review of this revision.Sep 25 2020, 2:29 AM
jhenderson updated this revision to Diff 294259.
jhenderson added reviewers: rupprecht, gbreynoo.

Fix test comment.

grimar added inline comments.Sep 25 2020, 2:56 AM
llvm/lib/Object/ArchiveWriter.cpp
381

At line 362 we call identify_magic(Buf.getBuffer()). I wonder if it will be cleaner to
add a new static method to SymbolicFile, e.g:

bool SymbolicFile::isSymbolicFile(file_magic Type);

and then to use it from here and also from SymbolicFile::createSymbolicFile,
instead of the code it has:

case file_magic::unknown:
case file_magic::archive:
case file_magic::coff_cl_gl_object:
case file_magic::macho_universal_binary:
case file_magic::windows_resource:
case file_magic::pdb:
case file_magic::minidump:
case file_magic::tapi_file:
  return errorCodeToError(object_error::invalid_file_type);

With that you'll don't need to handle error code in this way, what is probably better because
I think the general direction is to get rid of them in favor of Expected<> etc. What do you think?

llvm/test/Object/archive-unknown-filetype.test
2

Did you mean this?

jhenderson marked 2 inline comments as done.Sep 25 2020, 7:30 AM
jhenderson added inline comments.
llvm/test/Object/archive-unknown-filetype.test
2

I fixed this one already (think you must have missed my patch update).

jhenderson marked an inline comment as done.

Refactor to remove reliance on error code.

Minor nit. Otherwise looks good to me. Lets see what others think.

llvm/lib/Object/ArchiveWriter.cpp
363

I'd use const.

rupprecht accepted this revision.Sep 25 2020, 11:02 AM

There's a build warning that needs to be fixed, but otherwise LG.

llvm/lib/Object/ArchiveWriter.cpp
499–501

nit: looks like tab characters were inserted, can you run clang-format?

llvm/lib/Object/SymbolicFile.cpp
47

Splitting up the switch leads to -Wswitch errors because now each switch statement is incomplete:

/home/rupprecht/src/llvm-project/llvm/lib/Object/SymbolicFile.cpp:47:11: warning: 8 enumeration values not handled in switch: 'unknown', 'archive', 'macho_universal_binary'... [-Wswitch]
  switch (Type) {
          ^
/home/rupprecht/src/llvm-project/llvm/lib/Object/SymbolicFile.cpp:97:11: warning: 23 enumeration values not handled in switch: 'bitcode', 'elf', 'elf_relocatable'... [-Wswitch]
  switch (Type) {
          ^

Normally the right fix is to add a default switch case to handle all the unused cases. If you're concerned about the list of file_magic::unknown, file_magic::archive, ... in each method going out of sync in the two definitions (and new enums accidentally being handled with the default case), you could define that list in a macro, e.g.

#define NOT_SYMBOLIC_FILES \
  case file_magic::unknown: \
  case file_magic::archive: \
  case file_magic::coff_cl_gl_object: \
...
  case file_magic::tapi_file

  switch (Type) {
    NOT_SYMBOLIC_FILES:
      return false;
    default:
      return true;
  }

(note no trailing : in the last case you can write NOT_SYMBOLIC_FILES: and not confused IDEs)

This revision is now accepted and ready to land.Sep 25 2020, 11:02 AM
MaskRay added inline comments.Sep 25 2020, 11:13 AM
llvm/lib/Object/SymbolicFile.cpp
139
default:
  return true;

resolves a clang warning: Object/SymbolicFile.cpp:97:11: warning: 23 enumeration values not handled in switch: 'bitcode', 'elf', 'elf_relocatable'... [-Wswitch]

rupprecht added inline comments.Sep 25 2020, 11:17 AM
llvm/lib/Object/SymbolicFile.cpp
139

Yes, but see my other comment, because that is addressing a symptom instead of fixing the cause. Having an explicit switch in the previous switch ensures that a new file type added doesn't accidentally get included in the default switch branch.

MaskRay added a comment.EditedSep 25 2020, 11:27 AM

Checked lld/test/ELF updates, GNU ar silently allows invalid 'symbolic' files. I think being more rigid in llvm-ar is fine.

An alternative to the new isSymbolicFile is by checking whether createSymbolicFile returns an ECError whose error_code (ECError::convertToErrorCode) is object_error::invalid_file_type.

Checked lld/test/ELF updates, GNU ar silently allows invalid 'symbolic' files. I think being more rigid in llvm-ar is fine.

An alternative to the new isSymbolicFile is by checking whether createSymbolicFile returns an ECError whose error_code (ECError::convertToErrorCode) is object_error::invalid_file_type.

That's what I originally did, but a) the code is not particularly clean, and b) it's brittle to changes in how the error is reported (relies on on the error code being set, rather than e.g. changing to a normal LLVM error). @grimar previously suggested to change this (see the inline comment).

llvm/lib/Object/ArchiveWriter.cpp
499–501

That's an unfortunate UI issue in Phabricator that was recently introduced. Code that has been indented now shows up this way. There are no tab characters here.

llvm/lib/Object/SymbolicFile.cpp
47

I don't particularly like macros. They are a pain to debug and not particularly readable, in my opinion. Maybe the way to resolve this is to flip the switch in isSymbolicFile on its head, so that any new file_magic types will return false (specified via a default label), until this function is modified. This means you wouldn't be able to create a symbolic file for them, which I don't think is unreasonable. This switch here then gains a default: llvm_unreachable() line, since it would only be reachable if the isSymbolicFile method returned true.

jhenderson marked 5 inline comments as done.

Invert isSymbolFIle switch, and add const.

grimar added inline comments.Sep 28 2020, 3:53 AM
llvm/lib/Object/SymbolicFile.cpp
49–51

I think the Content is always non-null here now?

jhenderson added inline comments.Sep 28 2020, 4:22 AM
llvm/lib/Object/SymbolicFile.cpp
49–51

Yeah, I think this can be simplified. I'm not going to have a chance to update the patch again today, but will fix in the commit, if you're otherwise happy.

It will become:

case file_magic::bitcode:
  return IRObjectFile::create(Object, *Context);
grimar added inline comments.Sep 28 2020, 4:40 AM
llvm/lib/Object/SymbolicFile.cpp
49–51

The rest looks fine to me. Please wait to see @MaskRay/@rupprecht opinions too

MaskRay accepted this revision.Sep 28 2020, 11:18 AM
MaskRay added inline comments.
llvm/lib/Object/SymbolicFile.cpp
49–51

Context is always non-null. This is however a hidden fact only clear after I read all the callers..

llvm/test/Object/archive-unknown-filetype.test
6

Using echo something > %t can be more robust. touch %t does not update the file content. If %t preexists as a different file type, this may fail spuriously.

rupprecht requested changes to this revision.Sep 28 2020, 1:18 PM

Requesting changes mostly because this is marked as accepted but I still see build errors; overall this looks fine though.

llvm/lib/Object/SymbolicFile.cpp
47–48

I'm still seeing -Wswitch errors:

/home/rupprecht/src/llvm-project/llvm/lib/Object/SymbolicFile.cpp:47:11: warning: 8 enumeration values not handled in switch: 'unknown', 'archive', 'macho_universal_binary'... [-Wswitch]
  switch (Type) {
49–51

I found this thread hard to reconcile with the code: IIUC, Context is only non-null *in this case statement*: if Context is null and this is a bitcode file, isSymbolicFile returns false, and an error is returned above, so we don't reach this line. However, if this is a wasm_object (just one example), isSymbolicFile returns true even if Context is null. Context can be null in this switch statement, but not this particular case.

I think it's fine to make that simplification, but with a comment that isSymbolicFile verifies that Context is non-null when the type is bitcode. Otherwise, the nullness of Context is very difficult to reason with -- the baseline is much easier to understand in that regard.

This revision now requires changes to proceed.Sep 28 2020, 1:18 PM
jhenderson marked 6 inline comments as done.

Add missing default; simplify bitcode object checking; add comment about Context being non-null; use echo instead of touch in the test.

llvm/lib/Object/SymbolicFile.cpp
47–48

Oops, sorry. Forgot to add the necessary default label.

grimar accepted this revision.Sep 29 2020, 11:32 PM

LGTM

rupprecht accepted this revision.Sep 30 2020, 12:21 PM

Thanks!

This revision is now accepted and ready to land.Sep 30 2020, 12:21 PM