Page MenuHomePhabricator

[llvm-objcopy][WebAssembly] Initial support for wasm in llvm-objcopy
Needs ReviewPublic

Authored by dschuff on Mon, Dec 2, 3:54 PM.

Details

Summary

Supports dumping, adding, and removing of named custom sections, but
nothing else yet.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Herald added a project: Restricted Project. · View Herald Transcript
dschuff retitled this revision from [llvm-objcopy] Initial support for wasm in llvm-objcopy to [llvm-objcopy][WebAssembly] Initial support for wasm in llvm-objcopy.Mon, Dec 2, 3:54 PM
dschuff updated this revision to Diff 231798.Mon, Dec 2, 4:05 PM
  • Add test that removed section is gone

In general, we've tended to only add support for a basic copy in the first patch, and add further functionality as needed for each switch. That helps keep the patch size down and focuses on getting the basic writing algorithm right before worrying about any manipulations that need to make. As such, please break this patch into several smaller patches.

I don't really know anything about WASM by the way, but I've taken a look at generic stuff.

llvm/test/tools/llvm-objcopy/Wasm/add-section.test
1 ↗(On Diff #231798)

Nit: missing full stop.

In general with newer tests, we use '##' to indicate comments, so that they stand out from test lines.

I'd also be more expansive in your top-level comment with what the test is doing, i.e. it tests adding a section to a WASM object.

Same sort of comments apply in the other tests.

Why are you testing multiple pieces of functionality in one test?

2 ↗(On Diff #231798)

yaml2obj %s -o %t is more common in newer tests, to avoid a shell redirection.

11 ↗(On Diff #231798)

At least in ELF YAML, we tend to remove the excessive whitespace that yaml2obj currently produces. I'd recommend doing the same here. Keep only enough whitespace so that all values within a block are lined up.

71 ↗(On Diff #231798)

Add '#' characters for these CHECK lines.

78 ↗(On Diff #231798)

Nit: no new line at EOF.

llvm/test/tools/llvm-objcopy/Wasm/basic-archive-copy.test
1 ↗(On Diff #231798)

Same sort of comment here as in the above test:

  1. Add a top-level comment explaining the purpose of the test.
  2. Use -o for yaml2obj output.
11 ↗(On Diff #231798)

Nit: too many blank lines.

15 ↗(On Diff #231798)

'##'

89 ↗(On Diff #231798)

Use '#' characters here.

llvm/test/tools/llvm-objcopy/Wasm/basic-copy.test
1 ↗(On Diff #231798)

Same as above.

7–8 ↗(On Diff #231798)

What's stopping llvm-objcopy using the 5-byte LEBs?

FWIW, in ELF, we don't require a basic copy to produce identical output to the input, because the input might be written in an order style (e.g. the section header table might appear in a different place to "normal", there might be gaps in the object which objcopy compresses etc). Assuming this applies for WASM, you shouldn't try to do a binary comparison, but rather test the key things are all identical.

llvm/tools/llvm-objcopy/Wasm/Object.cpp
49 ↗(On Diff #231798)

Is there any structure that you can do a sizeof operation on or similar to avoid a literal here (and the need of the comment)?

75 ↗(On Diff #231798)

Put a blank line before each grouping here. Also, comments need leading capital letters and trailing full stops.

llvm/tools/llvm-objcopy/Wasm/Object.h
60 ↗(On Diff #231798)

You're already in the llvm namespace. Can't you just do wasm::WasmObjectHeader?

62 ↗(On Diff #231798)

This line of the comment isn't useful. The names of the functions show this. Please delete.

llvm/tools/llvm-objcopy/Wasm/WasmObjcopy.cpp
39–41 ↗(On Diff #231798)

Should this be an error? In ELF, an empty section is legitimate, and there's no reason it couldn't generate an empty file in this case.

53 ↗(On Diff #231798)

This may want to be errc::invalid_argument, since the process hasn't failed to parse the file.

114 ↗(On Diff #231798)

Copy-paste error?

dschuff updated this revision to Diff 231932.Tue, Dec 3, 9:36 AM
dschuff marked 17 inline comments as done.
  • Add test that removed section is gone
  • Address round 1 of review comments

Thanks for the detailed review. I just uploaded a diff to address the comments (other than splitting the patch apart).

llvm/test/tools/llvm-objcopy/Wasm/add-section.test
1 ↗(On Diff #231798)

This started out as a copy of test/tools/llvm-objcopy/ELF/add-section.test which has a similar structure. Since you've asked to start without this functionality anyway, I can back this test out entirely for now.

llvm/test/tools/llvm-objcopy/Wasm/basic-copy.test
7–8 ↗(On Diff #231798)

Nothing is stopping it; I just decided to match clang's behavior instead of yaml2obj's (as the comment mentions, they are different). I'll just clarify the comment, since as you say, there's not really any reason to change it.

llvm/tools/llvm-objcopy/Wasm/Object.h
60 ↗(On Diff #231798)

No:
/s/llvm-upstream/llvm-project/llvm/tools/llvm-objcopy/Wasm/Object.h:60:3: error: no type named 'WasmObjectHeader' in namespace 'llvm::objcopy::wasm'; did you mean '::llvm::wasm::WasmObjectHeader'?

llvm/tools/llvm-objcopy/Wasm/WasmObjcopy.cpp
39–41 ↗(On Diff #231798)

Actually you're right; I believe custom sections can be empty (Wasm has a distinction between "known" sections whose structure is specified by the standard and "custom" sections which are not interpreted by the VM). Since we only try to support custom sections for now, I'll remove this check.

dschuff updated this revision to Diff 231934.Tue, Dec 3, 9:48 AM
  • Remove add/remove/dump section support

Great to see this happening.

I wonder if the directory name should be called just wasm.. like e.g. lld/test/wasm and lld/wasm, We do have a mixture of wasm, 'Wasm` and WebAssembly. I've been thinking of WebAssembly as the target and wasm as the object/container format. So in this case I think just wasm makes the most sense for the directory name.

Do you mean wasm instead of Wasm?

dschuff updated this revision to Diff 231943.Tue, Dec 3, 10:21 AM
  • rename Wasm directories to wasm
sbc100 added inline comments.Tue, Dec 3, 10:24 AM
llvm/test/tools/llvm-objcopy/Wasm/basic-archive-copy.test
1 ↗(On Diff #231934)

I looks like objcopy might be the exception here but normally when writing tests in a given format such as this we give them the file extension of the format. So this file should be a .yaml maybe? One of the advantages of this is that you get syntax highlighting by default.

Maybe we should land a rename of the existing tests to follow this convention first?

llvm/tools/llvm-objcopy/Wasm/Object.cpp
57 ↗(On Diff #231934)

Seems like kind of a shame we have so many different places where we write the object format. This existing ones that I know of are:

  1. llvm/lib/MC/WasmObjectWriter.cpp
  2. llvm/lib/ObjectYAML/WasmEmitter.cpp
  3. lld/wasm/Writer.cpp

So this is added a 4th. But maybe its the same for ELF. I wonder if it would make sense to add a writing capability to lib/Object to unify these. I guess they are all subtly different so maybe it doesn't make sense?

llvm/tools/llvm-objcopy/Wasm/WasmObjcopy.cpp
53 ↗(On Diff #231934)

Is this assert useful? The contract with the Expected is that that the object pointer is valid if its not an error right? Also "deserialize" is a strange word here.. read or load would make more sense to me).

dschuff marked 2 inline comments as done.Tue, Dec 3, 10:53 AM
dschuff added inline comments.
llvm/tools/llvm-objcopy/Wasm/Object.cpp
57 ↗(On Diff #231934)

It is a bit odd that there is a completely independent data model of object files here (the ELF one is particularly complex but I don't have a particular intuition about how much duplication there is for ELF). For wasm (assuming we want to model symbols and relocations against custom sections), my hope was that we could continue to treat the known sections and most custom sections as opaque blobs and hopefully reuse the MC logic for symbols, which would mean that this code wouldn't get too much more complex. Not sure about how much work/duplication symbols would be though.

llvm/tools/llvm-objcopy/Wasm/WasmObjcopy.cpp
53 ↗(On Diff #231934)

I think if Expected holds a pointer/smart pointer, it can still be null?

I looks like objcopy might be the exception here but normally when writing tests in a given format such as this we give them the file extension of the format. So this file should be a .yaml maybe? One of the advantages of this is that you get syntax highlighting by default.

Maybe we should land a rename of the existing tests to follow this convention first?

objcopy isn't the exception. It's common practice for YAML tests throughout the binutils to have .test as an extension, because .yaml isn't in the default set of extensions that lit recognises as a test. I don't object to a mass renaming, but it should probably be done for each of the tools and probably needs to include a change in the default suffixes too (which might cause a problem in some places where a separate YAML file is used as an input...).

llvm/test/tools/llvm-objcopy/wasm/basic-archive-copy.test
2

Maybe rephrase this slightly:

"Test a basic copy of an archive containing a WASM object."

16–17

This is probably not needed: you already have basic-copy.test, which shows that objects are copied correctly, so the comparison of the extracted object is probably sufficient.

43

As mentioned in the now-deleted test, please remove the excessive whitespace, so that the Values in any given block line up neatly, but as close as possible to their tags, i.e.

--- !WASM
FileHeader:
  Version: 0x00000001
Sections:
  - Type: TYPE
    Signatures:
      - Index: 0
        ParamTypes:
          - I32
        ReturnTypes:
          - F32
      - Index: 1
        ParamTypes:
          - I32
          - I64
        ReturnTypes: []
# etc
      - Type:   R_WASM_TABLE_INDEX_SLEB
        Index:  0
        Offset: 0x00000006
74

Not knowing anything really about WASM, how much of the stuff in this output is really necessary for the test? Can, e.g. the Body value here be reduced down to a minimal content?

94–95

You don't need these two lines.

llvm/test/tools/llvm-objcopy/wasm/basic-copy.test
5

Prefer -o to shell redirection.

11

Same comments apply to this YAML as in the other test.

llvm/tools/llvm-objcopy/Wasm/Object.cpp
57 ↗(On Diff #231934)

I can't comment about WASM, but to some degree, ELF llvm-objcopy needs to have a different internal representation of an Object to that of the library, because the existing Object library object file there doesn't really support mutation. That's largely why things are written the way they were. @jakehehrlich might have more comments on that though.

llvm/tools/llvm-objcopy/wasm/Object.h
66

Can we get rid of this for this version (i.e. add it later when they are needed)?

llvm/tools/llvm-objcopy/wasm/WasmObjcopy.cpp
39

clang-format?

42

I'd probably just change this comment so that it is future proof "unsupported flag specified", or something like that. Also, we usually use lower-case first letter for error/warning messages.

dschuff updated this revision to Diff 232200.Wed, Dec 4, 1:55 PM
dschuff marked 12 inline comments as done.
  • Review 2
llvm/test/tools/llvm-objcopy/wasm/basic-archive-copy.test
43

OK, but is it really worthwhile to deviate from the default output format of obj2yaml and hand-edit everything just so there's less whitespace? It doesn't look any more readable to me.

74

I guess for the purposes of this test it doesn't actually matter whether the function body is valid wasm code or not. I do want to have some known (CODE, TYPE, etc) and linking/reloc sections in the test so that we can ensure that both code that is and is not understood by the object file layer doesn't get modified. That also means that the body has to be long enough that 5-byte relocations against it make sense.

94–95

Removing the readobj test above means removing all of this anyway.

llvm/tools/llvm-objcopy/wasm/WasmObjcopy.cpp
42

That's more future-proof for me, but much less useful for a user. In order to figure out what *is* supported if someone gets this error, they'd have to keep removing flags until it works. I did update the case though.

jhenderson added inline comments.Thu, Dec 5, 2:08 AM
llvm/test/tools/llvm-objcopy/wasm/basic-archive-copy.test
43

Excessive whitespace for me at least makes it harder to read.

An obj2yaml-generated document is not much better than a canned-object file in most test cases. The vast majority (all?) of our ELF tests use YAML that has been hand-written to make sure it covers the things we care about and nothing more. obj2yaml may well generate significantly more output than is needed to test the different bits that are really being implemented, so whilst it may be useful to provide a basis, the output really should be cut down.

llvm/tools/llvm-objcopy/wasm/WasmObjcopy.cpp
42

Yeah, I'm not particularly happy with how this is implemented in COFF or Mach-O either. Note that when you do add support for a single switch, you'll need to message, and that will just get unwieldy as more options are added. Perhaps we need to figure out a better method of resolving this, probably with some sort of command-line validation layer somewhere that goes over all switches and sees if they're supported.

alexshap added inline comments.Thu, Dec 5, 2:25 AM
llvm/tools/llvm-objcopy/wasm/Object.h
2

I'd prefer to follow the structure which we use for other formats (MachO, COFF, ELF), i.e. put the description of the model into Object.h/Object.cpp (the class Object etc), and put the classes Reader and Writer into separate files, even if they are small at the moment, they might grow in the future, it makes things a little bit easier to read.

dschuff updated this revision to Diff 232358.Thu, Dec 5, 8:39 AM
dschuff marked an inline comment as done.
  • Split Reader/Writer into separate files

I believe I've addressed all the feedback now.

llvm/tools/llvm-objcopy/wasm/Object.h
2

I don't really mind doing this, but I will note that ELF also has its Readers and Writers in Object.h

jhenderson added inline comments.Fri, Dec 6, 1:31 AM
llvm/test/tools/llvm-objcopy/wasm/basic-archive-copy.test
87–88

Minor nit: add an extra space between Kind/Name and their values to match the Index 1 symbol.

llvm/tools/llvm-objcopy/wasm/Object.h
2

I wouldn't mind somebody changing ELF to split them out too. Having them in the same file always confuses me initially.

llvm/tools/llvm-objcopy/wasm/Writer.cpp
27–29

I think we've been trying to get rid of these sort of non-obvious auto usages within llvm-objcopy. Whilst I can tell that S and Header are likely some kind of section and section header respectively, it's not obvious to me what their type is. Please replace them with the concrete type.

32

Does wasm define a constant for this 0 case here?

36

What is the 5 here? Is there a constant for it somewhere if applicable?

41

Same sort of comment as above. What are the 1 and 5?

53

Does wasm have a concept of endianness that we need to care about here?

56

i -> I

Also, I believe the standard LLVM style is:

for (size_t I = 0, S = SectionHeaders.size(); I < S; ++I)

58

Don't use auto here please. It's not clear what the type is again.

dschuff updated this revision to Diff 233408.Wed, Dec 11, 10:33 AM
dschuff marked 8 inline comments as done.
  • Review comments 3
llvm/tools/llvm-objcopy/wasm/Writer.cpp
36

No constant (it's just the max encoded LEB size that can encode 32 bits) but added a comment.

53

No, the spec says that these header values here are both little-endian and we are not trying to interpret them anywhere. All the other values in the format are either 1 byte or LEB-encoded.

Only one small pair of questions from me, though absolutely others should approve this too.

llvm/tools/llvm-objcopy/wasm/Writer.cpp
36

Okay, thanks. Why 32-bit though? Also, do we need to handle data that is greater in size than max 32-bits in some way?