This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly][Objcopy] Write output section headers identically to inputs
ClosedPublic

Authored by aheejin on Jul 17 2023, 6:20 PM.

Details

Summary

Previously when objcopy generated section headers, it padded the LEB that
encodes the section size out to 5 bytes, matching the behavior of clang.
This is correct, but results in a binary that differs from the input.
This can sometimes have undesirable consequences (e.g. breaking source maps).

This change makes the object reader remember the size of the LEB encoding
in the section header, so that llvm-objcopy can reproduce it exactly. For sections
not read from an object file (e.g. that llvm-objcopy is adding itself), pad to 5 bytes.

Diff Detail

Event Timeline

dschuff created this revision.Jul 17 2023, 6:20 PM
dschuff requested review of this revision.Jul 17 2023, 6:20 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 17 2023, 6:20 PM
dschuff edited the summary of this revision. (Show Details)Jul 17 2023, 6:21 PM
dschuff updated this revision to Diff 541299.Jul 17 2023, 6:25 PM

rename test file

jhenderson added inline comments.Jul 18 2023, 1:08 AM
llvm/test/tools/llvm-objcopy/wasm/section-header-size.test
4

Rather than adding a canned binary, it seems to me like it would be fairly straightforward to add some additional functionality to yaml2obj to customise the LEB size? Probably an additional, optional field called "HeaderSizeLength" or something to that effect as a section member?

Code LGTM, but the yaml2obj idea seems like a good one.

aheejin accepted this revision.Jul 18 2023, 3:36 PM
aheejin added inline comments.
llvm/include/llvm/Object/Wasm.h
112

One-liner comment on what this is (in line with other members above) would be nice

This revision is now accepted and ready to land.Jul 18 2023, 3:36 PM
dschuff added inline comments.Jul 18 2023, 5:37 PM
llvm/test/tools/llvm-objcopy/wasm/section-header-size.test
4

Yes, it's straightforward to add YAML support for this. The reason I didn't was that it would mean a new field in all of the sections of every YAML output, which would be a lot of extra output that we don't care about in every section in every test (which would make the output harder to read and require rewriting the expectations for basically every wasm YAML test). There are a couple of options here.

  1. Just use the checked-in binary in this one case
  2. Implement reading the value from YAML but not print it when outputting YAML. This allows writing a test without the checked-in binary and avoids polluting all the output. The test would read YAML with this field and diff the output of llvm-objcopy against the one generated by yaml2obj. That's nice, although it's a slightly odd that the support is asymmetric.
  3. Just bite the bullet and rewrite all the YAML tests. Then we could write this test with yaml2obj and obj2yaml.
  4. Same as option 2 but only print the value in YAML if it's a "non-default" value, i.e. neither the minimum size for the section in question, nor 5 (the default "padded" value). This would complicate the wasm2yaml logic slightly but not require rewriting the tests.

I think I favor option 2, WDYT?

dschuff added inline comments.Jul 18 2023, 5:39 PM
llvm/test/tools/llvm-objcopy/wasm/section-header-size.test
4

(or if not option 2, then 4, 1, or 3 in that order. Really any option other than 3 is fine with me).

jhenderson added inline comments.Jul 19 2023, 12:59 AM
llvm/include/llvm/Object/Wasm.h
112

FWIW, I think the comments for the other variables are redundant for the most part (e.g. the ones for type, offset and relocations provide no additional info beyond what the variable name provides).

llvm/test/tools/llvm-objcopy/wasm/section-header-size.test
4

Option 4 would be my suggestion, mostly because that's what ELF yaml2obj does for many of its fields. It means that we can have minimal YAML whilst also being able to go yaml2obj -> obj2yaml -> yaml2obj -> ... ad infinitum without the file ever changing.

dschuff updated this revision to Diff 542690.Jul 20 2023, 2:51 PM
  • add bidirectional YAML support, only print if non-default
dschuff accepted this revision.Jul 20 2023, 2:53 PM
dschuff added inline comments.
llvm/test/tools/llvm-objcopy/wasm/section-header-size.test
4

Thanks makes sense to me. Patch updated.

dschuff resigned from this revision.Jul 20 2023, 3:24 PM

oops, didn't mean to self-accept. Still wanted to give @jhenderson a chance to respond if he wants.

dschuff updated this revision to Diff 542704.Jul 20 2023, 3:31 PM
dschuff marked 2 inline comments as done.
  • update comments on section fields
dschuff added inline comments.Jul 20 2023, 4:02 PM
llvm/include/llvm/Object/Wasm.h
112

I agree that some of those comments don't add much use, and removed those; I left the others in, and added one for the new field.

jhenderson added inline comments.Jul 21 2023, 12:38 AM
llvm/lib/ObjectYAML/WasmEmitter.cpp
660

What happens if the encoding length is smaller than what is actually required to encode the size? I assume it either throws some kind of error somehow (no idea how), or silently writes size in full without truncation. If the latter, I think it might be worth reporting an error from yaml2obj, as otherwise the data won't reflect what was explicitly requested.

llvm/test/ObjectYAML/wasm/section_header_size.yaml
92

Nit: new line at EOF.

llvm/tools/obj2yaml/wasm2yaml.cpp
402–403

Maybe I'm missing something, but doesn't this mean that there's ambiguity between whether a length of the LEB size or 5 is appropriate when the emitted YAML is converted back to an object?

dschuff updated this revision to Diff 542970.Jul 21 2023, 9:36 AM
  • report error when LEB can't be encoded as requested
llvm/lib/ObjectYAML/WasmEmitter.cpp
660

The length field is "padTo", meaning that extra padding will be added if necessary, but otherwise you'll get whatever the minimum length is. I added an error report here if the length can't be encoded in the specified size.

llvm/tools/obj2yaml/wasm2yaml.cpp
402–403

No, you're correct. This does mean that a binary could fail to roundtrip from obj -> YAML -> obj. However doing it this way meant that we don't have to rewrite all of the YAML test expectations (since some components e.g. MCAssembler emit patchable 5-byte encodings, while others emit minimal-sized encodings). This potential failure to round-trip is maybe suboptimal in theory but I think it doesn't matter that much.

Any more thoughts on this?

Looks like a bundle of the pre-merge checks are failing due to this latest patch? Otherwise, this basically looks good to me, barring a couple of nits.

llvm/lib/ObjectYAML/WasmEmitter.cpp
652

LLVM coding standards state that this should be a lower-case letter.

A simple test case for the new error would be good too.

llvm/test/ObjectYAML/wasm/section_header_size.yaml
92

Ping this comment?

aheejin commandeered this revision.Jul 25 2023, 3:14 PM
aheejin removed a reviewer: aheejin.

@dschuff is gonna be OOO for a while and we need to make this available to our users sooner, so I'm taking this over to address the remaining issues.

This revision now requires review to proceed.Jul 25 2023, 3:14 PM
aheejin updated this revision to Diff 544160.Jul 25 2023, 5:33 PM
aheejin marked 4 inline comments as done.

Address comments + fix errors

llvm/lib/ObjectYAML/WasmEmitter.cpp
650

I think this is why the tests are failing. Will fix that.

aheejin updated this revision to Diff 544162.Jul 25 2023, 5:34 PM

Remove newline

aheejin updated this revision to Diff 544163.Jul 25 2023, 5:35 PM

I think I messed up diff.. Attempt to recover

Nearly there. Just a couple of small points now from me.

llvm/lib/ObjectYAML/WasmEmitter.cpp
650

Thanks, I agree the previous version looks suspicious. This number controls the default encoding length, so 5 seems appropriate.

Looking at this again, I think there is one technical bug here still though, in that if OutString happened to be so long that the LEB had to be 6 bytes or longer, then the user would be forced to specify the YAML field, even if it was just to the same size as the expected LEB. Perhaps worth changing 5 to something like std::max(5, getULEB128Size(OutString.size())? (Probably factor out the size calculation into a local variable)

652

Nit: I'd tend to say "leb" rather than "el-ee-bee" for "LEB", so "an" -> "a", but if you pronounce it differently, I'm fine with this staying as is.

llvm/test/ObjectYAML/wasm/invalid_section_header_size.yaml
1 ↗(On Diff #544164)

Rather than spin this off into a separate test file, I think it should be part of the existing section_header_size.yaml test. You could use either --docnum to allow you to have multiple YAML docs in the same file, or you could use yaml2obj's -D option to parmaterise an existing section header length field to be the appropriate too-small value.

tlively added inline comments.Jul 26 2023, 9:26 AM
llvm/lib/ObjectYAML/WasmEmitter.cpp
650

It's not possible for the size to require greater than 5 bytes because it is an unsigned 32-bit value. Each byte of a LEB has 7 significant bits and ceil(32/7) = 5. Indeed the Wasm spec does not allow using LEBs larger than 5 bytes to encode 32-bit values.

aheejin updated this revision to Diff 544525.Jul 26 2023, 2:44 PM
aheejin marked 2 inline comments as done.

Address comments

llvm/test/ObjectYAML/wasm/invalid_section_header_size.yaml
1 ↗(On Diff #544164)

Thanks! Didn't know about those options.

jhenderson accepted this revision.Jul 27 2023, 12:04 AM

LGTM, with nits addressed.

llvm/lib/ObjectYAML/WasmEmitter.cpp
650

Fair enough. FWIW, a ULEB has no technical upper limit on its size, so the constraint is in the wasm spec and what we implement rather than the theoretical what the user could write here, I guess.

llvm/test/ObjectYAML/wasm/section_header_size.yaml
83

Two nits:

  1. I personally wouldn't mix the check line in with the yaml. I think most readers would expect it to appear immediately after the RUN line, or after the end of the YAML doc (I personally prefer the former). However, I don't feel strongly about this.
  1. Should this be "section header length" rather than "section length"?
This revision is now accepted and ready to land.Jul 27 2023, 12:04 AM
aheejin marked an inline comment as done.Jul 27 2023, 3:27 PM
aheejin added inline comments.
llvm/lib/ObjectYAML/WasmEmitter.cpp
650

Changed to this so we assert the required length should not be greater than 5 bytes:

unsigned RequiredLen = getULEB128Size(OutString.size());                   
// Wasm spec does not allow LEBs larger than 5 bytes
assert(RequiredLen <= 5);                                                   
if (HeaderSecSizeEncodingLen < RequiredLen) {                     
  reportError("section length can't be encoded in a LEB of size " +
              Twine(HeaderSecSizeEncodingLen));                             
  return false;                                                             
}
aheejin updated this revision to Diff 544956.Jul 27 2023, 3:28 PM

Address comments

This revision was landed with ongoing or failed builds.Jul 27 2023, 3:44 PM
This revision was automatically updated to reflect the committed changes.