This is the first part of an effort to add wasm binary
support across all llvm tools.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
@Bigcheese would you mind taking a look? I think this is the smallest testable unit that makes sense.
I'm taking a look. First thing I noticed is that it parses up front on creation. I'm not familiar with the wasm binary format, but generally we try to be as lazy as possible with this API.
lib/Object/WasmObjectFile.cpp | ||
---|---|---|
22–24 ↗ | (On Diff #77910) | auto ObjectFile = llvm::make_unique<WasmObjectFile>(Buffer, Err)); |
I could probably remove a all/most of the parseXXXSection functions from the PR actually since they are not used by objdump directly. Would make the PR smaller and mean less up-front parsing in the WasmObjectFile.cpp.
There's a lot more code than needed here to pass the tests. Most of the section parsing code isn't actually used.
I'm wondering how you envision mapping between the binary tools and wasm. What should each of the major options of objdump print? Should ar create archives with symbol tables? Is wasm an object file format, executable format, or both?
I'd like to know what the plan is so I can review the design with that in mind.
lib/Object/ObjectFile.cpp | ||
---|---|---|
16 ↗ | (On Diff #78048) | Sort alphabetically. |
lib/Object/WasmObjectFile.cpp | ||
45 ↗ | (On Diff #78048) | This is ignoring the endianess. |
52 ↗ | (On Diff #78048) | This is ignoring the endianess. |
119 ↗ | (On Diff #78048) | Why not return WasmSection by value? |
434 ↗ | (On Diff #78048) | Isn't the name of the section for USER sections a string from the file? |
tools/llvm-objdump/WasmDump.cpp | ||
30 ↗ | (On Diff #78048) | The section table for other formats isn't considered part of the file header. |
- Better support for user sections
- Update identify_magic change to match recent changes
- Remove unused parse functions
- Run clang-format
- Cleanup program header
I removed a lot of the code that is not needed for this initial PR.
Added tests for objdump -s, -p and -h.
The wasm file format as it stands today is designed as an executable file format. We plan to use it for object files down the line but our initial target is to use bitcode linking and only emit the wasm file for the final binary. The next phase is to design some extensions to the file format to allow for it to be used for object files also.
The binary format is described here: http://webassembly.org/docs/binary-encoding/
And the initial plan for extending the format for linkable object file is here: https://github.com/WebAssembly/tool-conventions/pull/3
@sunfish please correct me if I'm wrong here.
include/llvm/Object/Wasm.h | ||
---|---|---|
88 ↗ | (On Diff #78113) | Any reason why you're inlining this method here but not all the other trivial ones? (getArch() et similia) |
include/llvm/Support/Wasm.h | ||
38–49 ↗ | (On Diff #78113) | Adding comment about what these variables represent (as we do, e.g. in ELF.h) would be nice. |
lib/Object/WasmObjectFile.cpp | ||
117–119 ↗ | (On Diff #78113) | Can any of these unreachable be triggered unvoluntarily by the user? (indirectly, e.g. via objdump?). If so, they should be converted to report_fatal_error. Otherwise it's fine, but please add the name of the function to the unreachable string. |
210 ↗ | (On Diff #78113) | Can this fail? e.g. empty section contents? |
- Better support for user sections
- Update identify_magic change to match recent changes
- Remove unused parse functions
- Run clang-format
- Cleanup program header
- Add comments to Wasm.h
- Add check for empty section
include/llvm/Object/Wasm.h | ||
---|---|---|
88 ↗ | (On Diff #78113) | Looks like I cargo-culted this from MachO.h and COFF.h which both have this strangeness. Will move to cpp file. |
lib/Object/WasmObjectFile.cpp | ||
117–119 ↗ | (On Diff #78113) | None of the objdump flags seem to tickle any of these unreachables. I don't see any precedent for including the name of the unreachable function for unimplemented functions (See git grep -i "llvm_unreachable.*unimp". Want me to go ahead and do this anyway? |
210 ↗ | (On Diff #78113) | No, sections cannot be empty. |
include/llvm/Object/Wasm.h | ||
---|---|---|
31 ↗ | (On Diff #78219) | I think inline inside class definition is not necessary (check elsewhere). |
34 ↗ | (On Diff #78219) | This comment is too trivial to be useful. Please remove. |
lib/Object/WasmObjectFile.cpp | ||
71 ↗ | (On Diff #78219) | Can't this just be a StringError instead of a GenericBinaryError? |
117–119 ↗ | (On Diff #78113) | No, leave it as is. Hopefully they won't be unreachable for long. |
210 ↗ | (On Diff #78113) | Please add a comment explaining why this can't fail. |
- Better support for user sections
- Update identify_magic change to match recent changes
- Remove unused parse functions
- Run clang-format
- Cleanup program header
- Add comments to Wasm.h
- Add check for empty section
- Add comment and remove unneeded friend declartions
include/llvm/Object/Wasm.h | ||
---|---|---|
31 ↗ | (On Diff #78219) | All the other headers in include/llvm/Object do it this way |
lib/Object/WasmObjectFile.cpp | ||
71 ↗ | (On Diff #78219) | GenericBinaryError seems to make sense since this is an problem with binary file. Its a subclass of BinaryError which I think is that kind of error we want here. From the docs from GenericBinaryError: / For errors that don't require their own specific sub-error (most errors) MachOObjectFile.cpp uses it too. |
- Remove inline from classof and use StringError
include/llvm/Object/Wasm.h | ||
---|---|---|
31 ↗ | (On Diff #78219) | Hmm, seems like the codebase is quite divided on this one: $ git grep classof ./include | grep inline | wc -l Perhaps I should propose a cleanup CL? In the mean time I'll drop inline here I guess? |
lib/Object/WasmObjectFile.cpp | ||
71 ↗ | (On Diff #78219) | OK.. will change. |
This LGTM. Michael might have some extra comment.
include/llvm/Object/Wasm.h | ||
---|---|---|
31 ↗ | (On Diff #78219) | That's a great idea if you feel like it =) |