This is an archive of the discontinued LLVM Phabricator instance.

[yaml2obj][obj2yaml] - Teach tools to work with regular archives.
ClosedPublic

Authored by grimar on Oct 22 2020, 4:10 AM.

Details

Summary

This teaches obj2yaml to dump valid regular (not thin) archives.
This also teaches yaml2obj to recognize archives YAML descriptions,
what allows to craft all different kinds of archives (valid and broken ones).

Diff Detail

Event Timeline

grimar created this revision.Oct 22 2020, 4:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 22 2020, 4:10 AM
grimar requested review of this revision.Oct 22 2020, 4:10 AM

Now I am going to use this to drop precompiled binaries from https://github.com/llvm/llvm-project/blob/master/llvm/test/tools/llvm-objdump/malformed-archives.test
and to see how well this approach works. Hope to post a patch soon.

jhenderson added inline comments.Oct 23 2020, 1:30 AM
llvm/include/llvm/ObjectYAML/ArchiveYAML.h
26

I feel like Child or Member are more common terms for the parts of an archive. Probably best to pick one and use that? Same goes for the strings used in the YAML.

llvm/include/llvm/ObjectYAML/yaml2obj.h
53

Maybe rename to yaml2archive, since "arch" could be ambiguous (e.g. meaning "architecture").

llvm/lib/ObjectYAML/ArchiveYAML.cpp
24

I think the magic should be optional, and default to a regular archive's magic.

28–30

Could you explain why it doesn't get invoked? Presumably the equivalent functions for other formats are?

49

I'd like these to be optional too, with sensible defaults for the various fields. Having to specify them in most cases just generates noise.

50

This should be optional, with a default of no content.

llvm/test/tools/obj2yaml/Archives/regular.yaml
59–67

Can we omit the Entries here?

llvm/test/tools/yaml2obj/Archives/regular.yaml
4
7

llvm-ar options traditionally don't have -.

32

(fun fact: the ability to change the magic bytes arbitrarily allows someone to use yaml2obj as a kind of "echo" tool, by just specifying the arbitrary bytes to anything they want and omitting content and entries)

47

Maybe we don't need the "same size" case?

60–74

Rather than using od, I wonder if you could just use FileCheck directly on the archive, since the archive is a textual format?

# RUN: FileCheck --input-file=%t.two.a --check-prefix=TWO-DATA --strict-whitespace --match-full-lines --implicit-check-not={{.}}

#      TWO-DATA:!<arch>
# TWO-DATA-NEXT:bbbbbbbbbbbbbbb/1234567890abqwertyasdfgh876543217`
# TWO-DATA-NEXT:<some ASCII data>
...

I think the only change you'd need to make to the YAML is to use an ASCII character for the padding byte (e.g. 0x0a).

90
llvm/tools/obj2yaml/archive2yaml.cpp
18

Personal opinion: I don't see any benefit to final here, or even in general, since it prevents users extending things. I don't think that's useful.

29
50–56

If we make the header fields optional in the YAML, it's probably worth omitting them from obj2yaml's output.

61

This message maybe should include the Size field contents.

64

This should maybe list the expected size and the buffer size.

67

Is there testing for the Buffer.size() > Size aspect of this? In other words, showing that if the member ends at the end of the file, no padding.

grimar updated this revision to Diff 300631.Oct 26 2020, 4:25 AM
grimar marked 19 inline comments as done.
  • Addressed review comments.
llvm/lib/ObjectYAML/ArchiveYAML.cpp
28–30

Presumably the equivalent functions for other formats are?

No. Below is the piece of the code that calls mapping for the top level for each format:

void MappingTraits<YamlObjectFile>::mapping(IO &IO,
                                            YamlObjectFile &ObjectFile) {
  if (IO.outputting()) {
...
  } else {
    Input &In = (Input &)IO;
    if (IO.mapTag("!Arch")) {
      ObjectFile.Arch.reset(new ArchYAML::Archive());
      MappingTraits<ArchYAML::Archive>::mapping(IO, *ObjectFile.Arch);
    } else if (IO.mapTag("!ELF")) {
      ObjectFile.Elf.reset(new ELFYAML::Object());
      MappingTraits<ELFYAML::Object>::mapping(IO, *ObjectFile.Elf);
    } else if (IO.mapTag("!COFF")) {
      ObjectFile.Coff.reset(new COFFYAML::Object());
      MappingTraits<COFFYAML::Object>::mapping(IO, *ObjectFile.Coff);
    } else if (IO.mapTag("!mach-o")) {
      ObjectFile.MachO.reset(new MachOYAML::Object());
      MappingTraits<MachOYAML::Object>::mapping(IO, *ObjectFile.MachO);
    } else if (IO.mapTag("!fat-mach-o")) {
      ObjectFile.FatMachO.reset(new MachOYAML::UniversalBinary());
      MappingTraits<MachOYAML::UniversalBinary>::mapping(IO,
                                                         *ObjectFile.FatMachO);
    } else if (IO.mapTag("!minidump")) {
      ObjectFile.Minidump.reset(new MinidumpYAML::Object());
      MappingTraits<MinidumpYAML::Object>::mapping(IO, *ObjectFile.Minidump);
    } else if (IO.mapTag("!WASM")) {
      ObjectFile.Wasm.reset(new WasmYAML::Object());
      MappingTraits<WasmYAML::Object>::mapping(IO, *ObjectFile.Wasm);
    } else if (const Node *N = In.getCurrentNode()) {
      if (N->getRawTag().empty())
        IO.setError("YAML Object File missing document type tag!");
      else
        IO.setError("YAML Object File unsupported document type tag '" +
                    N->getRawTag() + "'!");
    }
  }
}

The code above just doesn't call validate. I think it just because it doesn't need it atm. E.g. when
we describe an ELF file, we can't specify both "Sections" and "Content" as ELF parser doesn't allow
us to set a content.

I've moved this piece of code to MappingTraits<YamlObjectFile>::mapping

llvm/test/tools/obj2yaml/Archives/regular.yaml
59–67

Now, when all fields are optional - yes.

llvm/test/tools/yaml2obj/Archives/regular.yaml
32

Not exactly. They still need to provide a file type first, e.g. "--- !Arch".

47

OK. I've removed the "same size" case.

60–74

Done.

llvm/tools/obj2yaml/archive2yaml.cpp
67

No. I've added the test.

jhenderson added inline comments.Oct 27 2020, 2:41 AM
llvm/test/tools/obj2yaml/Archives/regular.yaml
2

Possible additional test case: dumping the output when the header fields are empty, where such are "legal" (i.e. I think all but the name).

60

I think this comment better explains what's going on?

95
105

Looks like you could use '1' as the default here?

107
llvm/tools/obj2yaml/archive2yaml.cpp
38

I think it might be useful to include the offset of the bad header here. What do you think?

Same comment probably applies for most other messages below.

grimar updated this revision to Diff 300973.Oct 27 2020, 6:20 AM
grimar marked 6 inline comments as done.
  • Addressed review comments.
llvm/test/tools/obj2yaml/Archives/regular.yaml
2

Done.

i.e. I think all but the name

The Size value should be a valid integer to read the content. The name can be empty, but it`s the default value for it, so it is not dumped.

60

I think so.

llvm/tools/obj2yaml/archive2yaml.cpp
38

Done.

jhenderson added inline comments.Oct 28 2020, 2:50 AM
llvm/tools/obj2yaml/archive2yaml.cpp
39

I think the "at offset" bit is necessary, as otherwise it's not immediately clear what the number represents (it might be an index for example). Same applies below.

grimar updated this revision to Diff 301235.Oct 28 2020, 4:20 AM
grimar marked an inline comment as done.
  • Addressed review comment.
This revision is now accepted and ready to land.Oct 28 2020, 4:48 AM