This teaches obj2yaml to dump valid regular (not thin) archives.
This also teaches yaml2obj to recognize archives YAML descriptions,
what allows to craft all different kinds of archives (valid and broken ones).
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Now I am going to use this to drop precompiled binaries from https://github.com/llvm/llvm-project/blob/master/llvm/test/tools/llvm-objdump/malformed-archives.test
and to see how well this approach works. Hope to post a patch soon.
llvm/include/llvm/ObjectYAML/ArchiveYAML.h | ||
---|---|---|
26 | I feel like Child or Member are more common terms for the parts of an archive. Probably best to pick one and use that? Same goes for the strings used in the YAML. | |
llvm/include/llvm/ObjectYAML/yaml2obj.h | ||
53 | Maybe rename to yaml2archive, since "arch" could be ambiguous (e.g. meaning "architecture"). | |
llvm/lib/ObjectYAML/ArchiveYAML.cpp | ||
24 | I think the magic should be optional, and default to a regular archive's magic. | |
28–30 | Could you explain why it doesn't get invoked? Presumably the equivalent functions for other formats are? | |
49 | I'd like these to be optional too, with sensible defaults for the various fields. Having to specify them in most cases just generates noise. | |
50 | This should be optional, with a default of no content. | |
llvm/test/tools/obj2yaml/Archives/regular.yaml | ||
59–67 | Can we omit the Entries here? | |
llvm/test/tools/yaml2obj/Archives/regular.yaml | ||
4 | ||
7 | llvm-ar options traditionally don't have -. | |
32 | (fun fact: the ability to change the magic bytes arbitrarily allows someone to use yaml2obj as a kind of "echo" tool, by just specifying the arbitrary bytes to anything they want and omitting content and entries) | |
47 | Maybe we don't need the "same size" case? | |
60–74 | Rather than using od, I wonder if you could just use FileCheck directly on the archive, since the archive is a textual format? # RUN: FileCheck --input-file=%t.two.a --check-prefix=TWO-DATA --strict-whitespace --match-full-lines --implicit-check-not={{.}} # TWO-DATA:!<arch> # TWO-DATA-NEXT:bbbbbbbbbbbbbbb/1234567890abqwertyasdfgh876543217` # TWO-DATA-NEXT:<some ASCII data> ... I think the only change you'd need to make to the YAML is to use an ASCII character for the padding byte (e.g. 0x0a). | |
90 | ||
llvm/tools/obj2yaml/archive2yaml.cpp | ||
18 | Personal opinion: I don't see any benefit to final here, or even in general, since it prevents users extending things. I don't think that's useful. | |
29 | ||
50–56 | If we make the header fields optional in the YAML, it's probably worth omitting them from obj2yaml's output. | |
61 | This message maybe should include the Size field contents. | |
64 | This should maybe list the expected size and the buffer size. | |
67 | Is there testing for the Buffer.size() > Size aspect of this? In other words, showing that if the member ends at the end of the file, no padding. |
- Addressed review comments.
llvm/lib/ObjectYAML/ArchiveYAML.cpp | ||
---|---|---|
28–30 |
No. Below is the piece of the code that calls mapping for the top level for each format: void MappingTraits<YamlObjectFile>::mapping(IO &IO, YamlObjectFile &ObjectFile) { if (IO.outputting()) { ... } else { Input &In = (Input &)IO; if (IO.mapTag("!Arch")) { ObjectFile.Arch.reset(new ArchYAML::Archive()); MappingTraits<ArchYAML::Archive>::mapping(IO, *ObjectFile.Arch); } else if (IO.mapTag("!ELF")) { ObjectFile.Elf.reset(new ELFYAML::Object()); MappingTraits<ELFYAML::Object>::mapping(IO, *ObjectFile.Elf); } else if (IO.mapTag("!COFF")) { ObjectFile.Coff.reset(new COFFYAML::Object()); MappingTraits<COFFYAML::Object>::mapping(IO, *ObjectFile.Coff); } else if (IO.mapTag("!mach-o")) { ObjectFile.MachO.reset(new MachOYAML::Object()); MappingTraits<MachOYAML::Object>::mapping(IO, *ObjectFile.MachO); } else if (IO.mapTag("!fat-mach-o")) { ObjectFile.FatMachO.reset(new MachOYAML::UniversalBinary()); MappingTraits<MachOYAML::UniversalBinary>::mapping(IO, *ObjectFile.FatMachO); } else if (IO.mapTag("!minidump")) { ObjectFile.Minidump.reset(new MinidumpYAML::Object()); MappingTraits<MinidumpYAML::Object>::mapping(IO, *ObjectFile.Minidump); } else if (IO.mapTag("!WASM")) { ObjectFile.Wasm.reset(new WasmYAML::Object()); MappingTraits<WasmYAML::Object>::mapping(IO, *ObjectFile.Wasm); } else if (const Node *N = In.getCurrentNode()) { if (N->getRawTag().empty()) IO.setError("YAML Object File missing document type tag!"); else IO.setError("YAML Object File unsupported document type tag '" + N->getRawTag() + "'!"); } } } The code above just doesn't call validate. I think it just because it doesn't need it atm. E.g. when I've moved this piece of code to MappingTraits<YamlObjectFile>::mapping | |
llvm/test/tools/obj2yaml/Archives/regular.yaml | ||
59–67 | Now, when all fields are optional - yes. | |
llvm/test/tools/yaml2obj/Archives/regular.yaml | ||
32 | Not exactly. They still need to provide a file type first, e.g. "--- !Arch". | |
47 | OK. I've removed the "same size" case. | |
60–74 | Done. | |
llvm/tools/obj2yaml/archive2yaml.cpp | ||
67 | No. I've added the test. |
llvm/test/tools/obj2yaml/Archives/regular.yaml | ||
---|---|---|
2 | Possible additional test case: dumping the output when the header fields are empty, where such are "legal" (i.e. I think all but the name). | |
60 | I think this comment better explains what's going on? | |
95 | ||
105 | Looks like you could use '1' as the default here? | |
107 | ||
llvm/tools/obj2yaml/archive2yaml.cpp | ||
38 | I think it might be useful to include the offset of the bad header here. What do you think? Same comment probably applies for most other messages below. |
- Addressed review comments.
llvm/test/tools/obj2yaml/Archives/regular.yaml | ||
---|---|---|
2 | Done.
The Size value should be a valid integer to read the content. The name can be empty, but it`s the default value for it, so it is not dumped. | |
60 | I think so. | |
llvm/tools/obj2yaml/archive2yaml.cpp | ||
38 | Done. |
llvm/tools/obj2yaml/archive2yaml.cpp | ||
---|---|---|
40 | I think the "at offset" bit is necessary, as otherwise it's not immediately clear what the number represents (it might be an index for example). Same applies below. |
I feel like Child or Member are more common terms for the parts of an archive. Probably best to pick one and use that? Same goes for the strings used in the YAML.