This is an archive of the discontinued LLVM Phabricator instance.

[obj2yaml] - Add support of dumping archives.
Abandoned · Public

Authored by grimar on Oct 19 2020, 4:59 AM.

Details

Reviewers

jhenderson
MaskRay

Summary

Currently, obj2yaml does not support dumping archives.
With this change, when the input is an archive, obj2yaml will:

Try to extract each member binary in turn and dump it.
Report an error and skip any member that cannot be dumped.
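
The report-and-skip behavior described in the summary can be illustrated with a minimal, LLVM-free sketch. `Member`, `dumpMember`, and this `dumpArchive` signature are hypothetical stand-ins, not the patch's code (the actual change is in the obj2yaml.cpp diff of this revision). As a simplifying assumption, any member that does not start with the ELF magic is treated as unrecognized.

```cpp
#include <cassert>
#include <optional>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for an archive member: a name plus its raw bytes.
struct Member {
  std::string Name;
  std::string Bytes;
};

// Hypothetical dumper. Returns an error message on failure and
// std::nullopt on success. Assumption: only ELF-looking members dump.
std::optional<std::string> dumpMember(const Member &M, std::ostream &Out) {
  // The literal is split so that 'E' is not read as part of the \x escape.
  if (M.Bytes.rfind("\x7f" "ELF", 0) != 0)
    return "The file was not recognized as a valid object file";
  Out << "--- !ELF\n...\n";
  return std::nullopt;
}

// Dump every member in order. A failing member is reported as
// "archive(member)" and skipped; the remaining members are still dumped.
// Returns false if at least one member failed.
bool dumpArchive(const std::string &ArchiveName,
                 const std::vector<Member> &Members, std::ostream &Out,
                 std::ostream &Err) {
  assert(!ArchiveName.empty() && "archive name is used in diagnostics");
  bool Ok = true;
  for (const Member &M : Members) {
    if (std::optional<std::string> E = dumpMember(M, Out)) {
      Err << "Error reading file: " << ArchiveName << "(" << M.Name
          << "): " << *E << "\n";
      Ok = false;
      continue;
    }
  }
  return Ok;
}
```

Calling `dumpArchive` with one ELF-looking member, one text member, and another ELF-looking member writes two YAML documents to `Out`, writes one diagnostic to `Err`, and returns `false`, which a caller could map to a non-zero exit code.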

Event Timeline

grimar created this revision. Oct 19 2020, 4:59 AM

Herald added a project: Restricted Project. Oct 19 2020, 4:59 AM

grimar requested review of this revision. Oct 19 2020, 4:59 AM

What's the use case for this? I don't follow why you can't just use llvm-ar to extract the files and then run obj2yaml on them individually.

There was a TODO in the code that suggested fixing this. I think this is also consistent with the behavior of other tools, e.g. llvm-readelf.

Honestly, I'm not convinced that just because we can, we should. There are good reasons for other tools like llvm-readelf to run over all archive members, but obj2yaml isn't, to my knowledge, used in a production environment or even in a testing environment where such functionality would prove particularly useful. Increasing code complexity in order to support it seems unwise to me.

I actually think there might be a slightly different approach to this, for which there might be a use-case: rather than dumping individual YAML blocks, add them all as "children" for an archive YAML block. That way, you can also see the non-binary members of the archive, e.g. the symbol table and the string table. This is probably more useful for the yaml2obj side, to allow testing of malformed archives in llvm-ar testing.

In D89691#2341153, @jhenderson wrote:

Honestly, I'm not convinced that just because we can, we should. There are good reasons for other tools like llvm-readelf to run over all archive members, but obj2yaml isn't, to my knowledge, used in a production environment or even in a testing environment where such functionality would prove particularly useful. Increasing code complexity in order to support it seems unwise to me.

OK.

In D89691#2341153, @jhenderson wrote:

I actually think there might be a slightly different approach to this, for which there might be a use-case: rather than dumping individual YAML blocks, add them all as "children" for an archive YAML block. That way, you can also see the non-binary members of the archive, e.g. the symbol table and the string table. This is probably more useful for the yaml2obj side, to allow testing of malformed archives in llvm-ar testing.

I thought about it too (as an independent feature/option, though). It could be useful for tests like https://github.com/llvm/llvm-project/blob/master/llvm/test/tools/llvm-objdump/malformed-archives.test, which currently has 14 precompiled broken archives that are a bit hard to investigate (at least they were for me when I tried to craft the last test case for this patch).

I've even started to dig into the format details of archives. I'm going to investigate how to implement it for yaml2obj/obj2yaml.
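
For readers unfamiliar with the format details mentioned here: a common `ar` archive starts with the 8-byte magic `!<arch>\n`, followed by members that each have a 60-byte header (name 16, date 12, uid 6, gid 6, mode 8, size 10, and a 2-byte end marker) and a payload padded to an even length. The sketch below walks those headers in an in-memory buffer. `MemberHeader`, `appendMember`, and `listMembers` are hypothetical helpers written for illustration, not LLVM APIs. It assumes a GNU-style layout in which a `//` member holds long names, which appears consistent with the offsets 8 and 98 in the expected error messages of this revision's test.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct MemberHeader {
  std::size_t Offset; // Offset of the 60-byte header within the archive.
  std::string Name;   // Raw name field with trailing spaces trimmed.
  std::size_t Size;   // Payload size in bytes.
};

// Append one member: a 60-byte header, then the payload, padded to even.
void appendMember(std::string &Ar, const std::string &Name,
                  const std::string &Data) {
  assert(Name.size() <= 16 && "name must fit the 16-byte field");
  auto Field = [](std::string S, std::size_t Width) {
    S.resize(Width, ' '); // Pad with spaces to the fixed field width.
    return S;
  };
  Ar += Field(Name, 16) + Field("0", 12) + Field("0", 6) + Field("0", 6) +
        Field("644", 8) + Field(std::to_string(Data.size()), 10) + "`\n";
  Ar += Data;
  if (Data.size() % 2 != 0)
    Ar += '\n';
}

// Walk the member headers. Returns an empty list if the magic is missing
// and stops early if fewer than 60 bytes remain for the next header.
// Note: std::stoul throws if the size field holds no decimal digits.
std::vector<MemberHeader> listMembers(const std::string &Ar) {
  std::vector<MemberHeader> Result;
  const std::string Magic = "!<arch>\n";
  if (Ar.compare(0, Magic.size(), Magic) != 0)
    return Result;
  std::size_t Pos = Magic.size();
  while (Pos < Ar.size()) {
    if (Ar.size() - Pos < 60)
      break; // Truncated: not enough room for another header.
    std::string Name = Ar.substr(Pos, 16);
    Name.erase(Name.find_last_not_of(' ') + 1);
    std::size_t Size = std::stoul(Ar.substr(Pos + 48, 10));
    Result.push_back({Pos, Name, Size});
    Pos += 60 + Size + (Size % 2); // Skip the header, payload, and padding.
  }
  return Result;
}
```

With a `//` member holding `an_arbitrary_long_file_name/\n` (29 bytes, padded to 30) followed by a `/0` member, the second header starts at 8 + 60 + 30 = 98, so byte 0x63 (99) is the digit after the `/` in its name field.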

Abandoning. As discussed, going to post a patch for obj2yaml/yaml2obj to support dumping/crafting archives very soon instead.

In D89691#2346607, @grimar wrote:

Abandoning. As discussed, going to post a patch for obj2yaml/yaml2obj to support dumping/crafting archives very soon instead.

An archive header with Children fields referencing some filenames/opaque contents? I think it might be useful.

Revision Contents

Path                                     Size
llvm/test/tools/obj2yaml/archive.test    131 lines
llvm/tools/obj2yaml/obj2yaml.cpp         89 lines

Diff 299012

llvm/test/tools/obj2yaml/archive.test

This file was added.

				## Check that obj2yaml is able to dump archives.

				# RUN: rm -f %t.a
				# RUN: rm -rf %t.dir
				# RUN: mkdir -p %t.dir
				# RUN: yaml2obj --docnum=1 %s -o %t.dir/obj.elf-x86-64
				# RUN: yaml2obj --docnum=2 %s -o %t.dir/obj.coff-arm
				# RUN: yaml2obj --docnum=3 %s -o %t.dir/minidump
				# RUN: yaml2obj --docnum=4 %s -o %t.dir/machO

				## Check we are able to extract and dump binaries of different types.
				# RUN: llvm-ar rc %t.a %t.dir/obj.elf-x86-64 %t.dir/obj.coff-arm %t.dir/minidump %t.dir/machO
				# RUN: obj2yaml %t.a | FileCheck %s

				# CHECK: --- !ELF
				# CHECK-NEXT: FileHeader:
				# CHECK-NEXT: Class: ELFCLASS64
				# CHECK-NEXT: Data: ELFDATA2LSB
				# CHECK-NEXT: Type: ET_REL
				# CHECK-NEXT: Machine: EM_X86_64
				# CHECK-NEXT: ...
				# CHECK-NEXT: --- !COFF
				# CHECK-NEXT: header:
				# CHECK-NEXT: Machine: IMAGE_FILE_MACHINE_ARMNT
				# CHECK-NEXT: Characteristics: [ ]
				# CHECK-NEXT: sections: []
				# CHECK-NEXT: symbols: []
				# CHECK-NEXT: ...
				# CHECK-NEXT: --- !minidump
				# CHECK-NEXT: Streams:
				# CHECK-NEXT: - Type: SystemInfo
				# CHECK-NEXT: Processor Arch: BP_ARM64
				# CHECK-NEXT: Platform ID: Linux
				# CHECK-NEXT: CPU:
				# CHECK-NEXT: CPUID: 0x00000000
				# CHECK-NEXT: ...
				# CHECK-NEXT: --- !mach-o
				# CHECK-NEXT: FileHeader:
				# CHECK-NEXT: magic: 0xFEEDFACF
				# CHECK-NEXT: cputype: 0x01000007
				# CHECK-NEXT: cpusubtype: 0x00000003
				# CHECK-NEXT: filetype: 0x0000000A
				# CHECK-NEXT: ncmds: 0
				# CHECK-NEXT: sizeofcmds: 0
				# CHECK-NEXT: flags: 0x00000000
				# CHECK-NEXT: reserved: 0x00000000
				# CHECK-NEXT: ...

				--- !ELF
				FileHeader:
				  Class: ELFCLASS64
				  Data: ELFDATA2LSB
				  Type: ET_REL
				  Machine: EM_X86_64

				--- !COFF
				header:
				  Machine: IMAGE_FILE_MACHINE_ARMNT
				  Characteristics: [ ]
				sections: []
				symbols: []

				--- !minidump
				Streams:
				  - Type: SystemInfo
				    Processor Arch: BP_ARM64
				    Platform ID: Linux

				--- !mach-o
				FileHeader:
				  magic: 0xFEEDFACF
				  cputype: 0x01000007
				  cpusubtype: 0x00000003
				  filetype: 0x0000000A
				  ncmds: 0
				  sizeofcmds: 0
				  flags: 0x00000000
				  reserved: 0x00000000
				LoadCommands: []

				## Check we report an error when trying to dump an invalid archive. Here it is truncated.

				# RUN: rm -f %t.truncated.a
				# RUN: echo -e "!<arch>\nfoo" > %t.truncated.a
				# RUN: not obj2yaml %t.truncated.a 2>&1 | FileCheck %s -DFILE=%t.truncated.a --check-prefix=TRUNC

				# TRUNC: Error reading file: [[FILE]]: truncated or malformed archive (remaining size of archive too small for next archive member header at offset 8)

				## Check we report errors and skip broken binaries when dumping an archive.

				# RUN: rm -f %t.broken.objects.a
				# RUN: echo "foo" > %t.dir/foo
				# RUN: echo "bar" > %t.dir/bar
				# RUN: llvm-ar rc %t.broken.objects.a %t.dir/obj.elf-x86-64 %t.dir/foo %t.dir/bar %t.dir/machO
				# RUN: not obj2yaml %t.broken.objects.a 2>&1 | FileCheck %s -DARFILE=%t.broken.objects.a --check-prefix=BROKEN

				# BROKEN: --- !ELF
				# BROKEN-NEXT: FileHeader:
				# BROKEN-NEXT: Class: ELFCLASS64
				# BROKEN-NEXT: Data: ELFDATA2LSB
				# BROKEN-NEXT: Type: ET_REL
				# BROKEN-NEXT: Machine: EM_X86_64
				# BROKEN-NEXT: ...
				# BROKEN-NEXT: Error reading file: [[ARFILE]](foo): The file was not recognized as a valid object file
				# BROKEN-NEXT: Error reading file: [[ARFILE]](bar): The file was not recognized as a valid object file
				# BROKEN-NEXT: --- !mach-o
				# BROKEN-NEXT: FileHeader:
				# BROKEN-NEXT: magic: 0xFEEDFACF
				# BROKEN-NEXT: cputype: 0x01000007
				# BROKEN-NEXT: cpusubtype: 0x00000003
				# BROKEN-NEXT: filetype: 0x0000000A
				# BROKEN-NEXT: ncmds: 0
				# BROKEN-NEXT: sizeofcmds: 0
				# BROKEN-NEXT: flags: 0x00000000
				# BROKEN-NEXT: reserved: 0x00000000
				# BROKEN-NEXT: ...

				## Check we print a file index in an error message when we can't read a file name.

				# RUN: rm -f %t.broken.name.a
				## Use an arbitrary long name to trigger creation of the string table (long names table).
				# RUN: echo "foo" > %t.dir/an_arbitrary_long_file_name
				# RUN: llvm-ar rc %t.broken.name.a %t.dir/an_arbitrary_long_file_name
				# RUN: echo "with open('%/t.broken.name.a', 'rb+') as input:" > %t.py
				## Override the long name offset characters with a broken value.
				# RUN: echo " input.seek(0x63)" >> %t.py
				# RUN: echo " input.write(bytearray.fromhex('FF'))" >> %t.py
				# RUN: %python %t.py
				# RUN: not obj2yaml %t.broken.name.a 2>&1 | FileCheck -DARFILE=%t.broken.name.a %s --check-prefix=BROKEN-NAME

				# BROKEN-NAME: Error reading file: [[ARFILE]](<file index: 0>): truncated or malformed archive (long name offset characters after the '/' are not all decimal numbers: '\377' for archive member header at offset 98)

llvm/tools/obj2yaml/obj2yaml.cpp

 #include "llvm/Object/Minidump.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/Errc.h"
 #include "llvm/Support/InitLLVM.h"

 using namespace llvm;
 using namespace llvm::object;

+cl::opt<std::string> InputFilename(cl::Positional, cl::desc("<input file>"),
+                                   cl::init("-"));
+
+static bool HasError = false;
+static void reportError(StringRef Input, Error Err) {
+  HasError = true;
+  outs().flush();
+  if (Input == "-")
+    Input = "<stdin>";
+  std::string ErrMsg;
+  raw_string_ostream OS(ErrMsg);
+  logAllUnhandledErrors(std::move(Err), OS);
+  OS.flush();
+  errs() << "Error reading file: " << Input << ": " << ErrMsg;
+  errs().flush();
+}
+
 static Error dumpObject(const ObjectFile &Obj) {
   if (Obj.isCOFF())
     return errorCodeToError(coff2yaml(outs(), cast<COFFObjectFile>(Obj)));

   if (Obj.isXCOFF())
     return errorCodeToError(xcoff2yaml(outs(), cast<XCOFFObjectFile>(Obj)));

   if (Obj.isELF())
     return elf2yaml(outs(), Obj);

   if (Obj.isWasm())
     return errorCodeToError(wasm2yaml(outs(), cast<WasmObjectFile>(Obj)));

   llvm_unreachable("unexpected object file format");
 }

-static Error dumpInput(StringRef File) {
-  Expected<OwningBinary<Binary>> BinaryOrErr = createBinary(File);
-  if (!BinaryOrErr)
-    return BinaryOrErr.takeError();
-
-  Binary &Binary = *BinaryOrErr.get().getBinary();
+static Error dumpBinary(const Binary &Binary) {
+  assert(!Binary.isArchive());
   // Universal MachO is not a subclass of ObjectFile, so it needs to be handled
   // here with the other binary types.
   if (Binary.isMachO() || Binary.isMachOUniversalBinary())
     return macho2yaml(outs(), Binary);
-  // TODO: If this is an archive, then burst it and dump each entry
-  if (ObjectFile *Obj = dyn_cast<ObjectFile>(&Binary))
+  if (const ObjectFile *Obj = dyn_cast<ObjectFile>(&Binary))
     return dumpObject(*Obj);
-  if (MinidumpFile *Minidump = dyn_cast<MinidumpFile>(&Binary))
+  if (const MinidumpFile *Minidump = dyn_cast<MinidumpFile>(&Binary))
     return minidump2yaml(outs(), *Minidump);

   return Error::success();
 }

-static void reportError(StringRef Input, Error Err) {
-  if (Input == "-")
-    Input = "<stdin>";
-  std::string ErrMsg;
-  raw_string_ostream OS(ErrMsg);
-  logAllUnhandledErrors(std::move(Err), OS);
-  OS.flush();
-  errs() << "Error reading file: " << Input << ": " << ErrMsg;
-  errs().flush();
-}
-
-cl::opt<std::string> InputFilename(cl::Positional, cl::desc("<input file>"),
-                                   cl::init("-"));
+static void dumpArchive(StringRef ArchiveName, const Archive &A) {
+  auto getFileNameForError = [&](const Archive::Child &C,
+                                 unsigned Index) -> std::string {
+    Expected<StringRef> NameOrErr = C.getName();
+    if (NameOrErr)
+      return (ArchiveName + "(" + NameOrErr.get() + ")").str();
+    // If we have an error getting the name then we print the index of the
+    // archive member. Since we are already in an error state, we just ignore
+    // this error.
+    consumeError(NameOrErr.takeError());
+    return (ArchiveName + "(<file index: " + std::to_string(Index) + ">)")
+        .str();
+  };
+
+  unsigned I = -1;
+  Error Err = Error::success();
+  for (auto &C : A.children(Err)) {
+    ++I;
+    Expected<std::unique_ptr<Binary>> ChildOrErr = C.getAsBinary();
+    if (!ChildOrErr) {
+      reportError(getFileNameForError(C, I), std::move(ChildOrErr.takeError()));
+      continue;
+    }
+
+    if (Error E = dumpBinary(**ChildOrErr))
+      reportError(getFileNameForError(C, I), std::move(E));
+  }
+
+  if (Err)
+    reportError(ArchiveName, std::move(Err));
+}

 int main(int argc, char *argv[]) {
   InitLLVM X(argc, argv);
   cl::ParseCommandLineOptions(argc, argv);

-  if (Error Err = dumpInput(InputFilename)) {
-    reportError(InputFilename, std::move(Err));
+  Expected<OwningBinary<Binary>> BinaryOrErr = createBinary(InputFilename);
+  if (!BinaryOrErr) {
+    reportError(InputFilename, std::move(BinaryOrErr.takeError()));
+    return 1;
+  }
+
+  Binary &Binary = *BinaryOrErr->getBinary();
+  if (Archive *Arc = dyn_cast<Archive>(&Binary)) {
+    dumpArchive(InputFilename, *Arc);
+    return HasError ? 1 : 0;
+  }
+
+  if (Error E = dumpBinary(Binary)) {
+    reportError(InputFilename, std::move(E));
     return 1;
   }

   return 0;
 }