This is an archive of the discontinued LLVM Phabricator instance.

[BinaryFormat] Limit COFF file detection with unknown machine type
Needs ReviewPublic

Authored by sbc100 on Jan 27 2020, 1:25 PM.

Details

Reviewers
Bigcheese
Summary

When the machine type is zero (first two bytes) also check that we
have a non-zero number of sections (second two bytes). This avoids
mistaking any file that has two leaving zeros as a COFF file.

Fixes PR44683

Diff Detail

Event Timeline

sbc100 created this revision.Jan 27 2020, 1:25 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 27 2020, 1:25 PM
sbc100 added a subscriber: alexcrichton.
sbc100 updated this revision to Diff 240670.Jan 27 2020, 1:41 PM

update comment

Unit tests: fail. 62143 tests passed, 6 failed and 811 were skipped.

failed: LLVM.tools/llvm-readobj/COFF/file-headers.test
failed: libc++.std/language_support/cmp/cmp_partialord/partialord.pass.cpp
failed: libc++.std/language_support/cmp/cmp_strongeq/cmp.strongeq.pass.cpp
failed: libc++.std/language_support/cmp/cmp_strongord/strongord.pass.cpp
failed: libc++.std/language_support/cmp/cmp_weakeq/cmp.weakeq.pass.cpp
failed: libc++.std/language_support/cmp/cmp_weakord/weakord.pass.cpp

clang-tidy: fail. clang-tidy found 0 errors and 1 warnings. 0 of them are added as review comments below (why?).

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.

Unit tests: fail. 62142 tests passed, 7 failed and 811 were skipped.

failed: LLVM.tools/llvm-readobj/COFF/file-headers.test
failed: libc++.std/language_support/cmp/cmp_partialord/partialord.pass.cpp
failed: libc++.std/language_support/cmp/cmp_strongeq/cmp.strongeq.pass.cpp
failed: libc++.std/language_support/cmp/cmp_strongord/strongord.pass.cpp
failed: libc++.std/language_support/cmp/cmp_weakeq/cmp.weakeq.pass.cpp
failed: libc++.std/language_support/cmp/cmp_weakord/weakord.pass.cpp
failed: libc++.std/thread/futures/futures_async/async.pass.cpp

clang-tidy: fail. clang-tidy found 0 errors and 1 warnings. 0 of them are added as review comments below (why?).

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.

llvm::identify_magic is not really a general-purpose file identification routine; it's specifically designed to sniff out the format of files which are interesting to LLVM tools, and it doesn't try very hard to figure out whether the input is valid. If it's returning file_magic::unknown, probably something has gone wrong elsewhere. That said, it would be nice to use more than two bytes to make a positive identification, if there's some other reliable indicator for a COFF object.

Are PE object files with IMAGE_FILE_MACHINE_UNKNOWN actually something that shows up in practice (besides the special case of "\0\0\xFF\xFF")? What would such a file contain? Can any LLVM tools do anything interesting with it?

If we're going to try to sniff out whether the COFF header is valid, should we do it for all coff_object files?

Is it reasonable to assume all COFF object files have at least one section? It's sort of degenerate, but it's possible to construct an empty object file.

llvm::identify_magic is not really a general-purpose file identification routine; it's specifically designed to sniff out the format of files which are interesting to LLVM tools, and it doesn't try very hard to figure out whether the input is valid. If it's returning file_magic::unknown, probably something has gone wrong elsewhere. That said, it would be nice to use more than two bytes to make a positive identification, if there's some other reliable indicator for a COFF object.

But tools like llvm-nm do end up using identify_magic to decide if they should decode an object. Running llvm-nm on ar archives is supposed to be robust against non-object files being in the archive I believe.

Are PE object files with IMAGE_FILE_MACHINE_UNKNOWN actually something that shows up in practice (besides the special case of "\0\0\xFF\xFF")? What would such a file contain? Can any LLVM tools do anything interesting with it?

It seem very unlikely that there are any uses for such an object.

If we're going to try to sniff out whether the COFF header is valid, should we do it for all coff_object files?

Yes, I thought that was what this change was doing but maybe it should happen elsewhere too.

Is it reasonable to assume all COFF object files have at least one section? It's sort of degenerate, but it's possible to construct an empty object file.

I think it is reasonable, although this change did uncover a test file that creates and emtpy object file from yaml. Note that in my experiments it is *not* possible create such an empty object file using clang or llvm tools.

In the mean time, it looks like rust has worked around this issue upstream so I'm tempted to simply close this issue and abandon this change: https://github.com/rust-lang/rust/pull/66235