This is an archive of the discontinued LLVM Phabricator instance.

Differential D155535

[WebAssembly][Objcopy] Write output section headers identically to inputs
ClosedPublic

Authored by aheejin on Jul 17 2023, 6:20 PM.

Details

Reviewers

tlively
alexander-shaposhnikov
jhenderson
dschuff

Commits

rG1b21067cf247: [WebAssembly][Objcopy] Write output section headers identically to inputs

Summary

Previously when objcopy generated section headers, it padded the LEB that
encodes the section size out to 5 bytes, matching the behavior of clang.
This is correct, but results in a binary that differs from the input.
This can sometimes have undesirable consequences (e.g. breaking source maps).

This change makes the object reader remember the size of the LEB encoding
in the section header, so that llvm-objcopy can reproduce it exactly. For sections
not read from an object file (e.g. that llvm-objcopy is adding itself), pad to 5 bytes.
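
To make the padding behavior concrete, here is a minimal standalone sketch of a padded unsigned LEB128 encoder and a matching decoder. The names `encodePaddedULEB128` and `decodeULEB128` are hypothetical helpers written for illustration; they only mimic the `PadTo` behavior described above and are not LLVM's implementation. The sketch shows that a minimal and a 5-byte encoding decode to the same value even though the bytes differ.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not LLVM's encodeULEB128): encode Value as unsigned
// LEB128 and pad the result to at least PadTo bytes. If PadTo is smaller than
// the minimal encoding, the minimal encoding is produced.
std::vector<uint8_t> encodePaddedULEB128(uint32_t Value, unsigned PadTo = 0) {
  std::vector<uint8_t> Out;
  do {
    uint8_t Byte = Value & 0x7f;
    Value >>= 7;
    // Set the continuation bit if value bits or padding bytes follow.
    if (Value != 0 || Out.size() + 1 < PadTo)
      Byte |= 0x80;
    Out.push_back(Byte);
  } while (Value != 0);
  // Padding: continuation bytes carrying zero bits, then a final 0x00.
  while (Out.size() < PadTo)
    Out.push_back(Out.size() + 1 < PadTo ? 0x80 : 0x00);
  return Out;
}

// Hypothetical decoder for values of at most 32 bits (at most 5 bytes).
uint32_t decodeULEB128(const std::vector<uint8_t> &Bytes) {
  uint32_t Value = 0;
  unsigned Shift = 0;
  for (uint8_t Byte : Bytes) {
    assert(Shift < 32 && "more than 5 bytes for a 32-bit value");
    Value |= static_cast<uint32_t>(Byte & 0x7f) << Shift;
    Shift += 7;
    if ((Byte & 0x80) == 0)
      break;
  }
  return Value;
}
```

With these helpers, a section size of 3 encodes minimally as the single byte 0x03 and, padded to 5 bytes, as 0x83 0x80 0x80 0x80 0x00. Both decode to 3, which is why the padded form is valid but changes the file's byte layout.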

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

dschuff created this revision. Jul 17 2023, 6:20 PM

Herald added a reviewer: alexander-shaposhnikov. Jul 17 2023, 6:20 PM

Herald added a reviewer: jhenderson.

Herald added a project: Restricted Project.

Herald added subscribers: pmatos, asb, wingo and 4 others.

dschuff requested review of this revision. Jul 17 2023, 6:20 PM

Herald added a project: Restricted Project. Jul 17 2023, 6:20 PM

Herald added subscribers: llvm-commits, MaskRay.

dschuff edited the summary of this revision. Jul 17 2023, 6:21 PM

rename test file

Harbormaster completed remote builds in B246053: Diff 541299. Jul 18 2023, 12:32 AM

jhenderson added inline comments. Jul 18 2023, 1:08 AM

llvm/test/tools/llvm-objcopy/wasm/section-header-size.test
4	Rather than adding a canned binary, it seems to me like it would be fairly straightforward to add some additional functionality to yaml2obj to customise the LEB size? Probably an additional, optional field called "HeaderSizeLength" or something to that effect as a section member?

Code LGTM, but the yaml2obj idea seems like a good one.

aheejin accepted this revision. Jul 18 2023, 3:36 PM

aheejin added inline comments.

llvm/include/llvm/Object/Wasm.h
112	One-liner comment on what this is (in line with other members above) would be nice

This revision is now accepted and ready to land. Jul 18 2023, 3:36 PM

dschuff added inline comments. Jul 18 2023, 5:37 PM

llvm/test/tools/llvm-objcopy/wasm/section-header-size.test
4	Yes, it's straightforward to add YAML support for this. The reason I didn't was that it would mean a new field in all of the sections of every YAML output, which would be a lot of extra output that we don't care about in every section in every test (which would make the output harder to read and require rewriting the expectations for basically every wasm YAML test). There are a couple of options here.
1. Just use the checked-in binary in this one case.
2. Implement reading the value from YAML but not print it when outputting YAML. This allows writing a test without the checked-in binary and avoids polluting all the output. The test would read YAML with this field and diff the output of llvm-objcopy against the one generated by yaml2obj. That's nice, although it's slightly odd that the support is asymmetric.
3. Just bite the bullet and rewrite all the YAML tests. Then we could write this test with yaml2obj and obj2yaml.
4. Same as option 2 but only print the value in YAML if it's a "non-default" value, i.e. neither the minimum size for the section in question, nor 5 (the default "padded" value). This would complicate the wasm2yaml logic slightly but not require rewriting the tests.
I think I favor option 2, WDYT?

dschuff added inline comments. Jul 18 2023, 5:39 PM

llvm/test/tools/llvm-objcopy/wasm/section-header-size.test
4	(or if not option 2, then 4, 1, or 3 in that order. Really any option other than 3 is fine with me).

jhenderson added inline comments. Jul 19 2023, 12:59 AM

llvm/include/llvm/Object/Wasm.h
112	FWIW, I think the comments for the other variables are redundant for the most part (e.g. the ones for type, offset and relocations provide no additional info beyond what the variable name provides).
llvm/test/tools/llvm-objcopy/wasm/section-header-size.test
4	Option 4 would be my suggestion, mostly because that's what ELF yaml2obj does for many of its fields. It means that we can have minimal YAML whilst also being able to go yaml2obj -> obj2yaml -> yaml2obj -> ... ad infinitum without the file ever changing.

add bidirectional YAML support, only print if non-default

rebase

dschuff accepted this revision. Jul 20 2023, 2:53 PM

dschuff added inline comments.

llvm/test/tools/llvm-objcopy/wasm/section-header-size.test
4	Thanks makes sense to me. Patch updated.

oops, didn't mean to self-accept. Still wanted to give @jhenderson a chance to respond if he wants.

update comments on section fields

dschuff added inline comments. Jul 20 2023, 4:02 PM

llvm/include/llvm/Object/Wasm.h
112	I agree that some of those comments don't add much use, and removed those; I left the others in, and added one for the new field.

Harbormaster completed remote builds in B247054: Diff 542704. Jul 20 2023, 8:37 PM

jhenderson added inline comments. Jul 21 2023, 12:38 AM

llvm/lib/ObjectYAML/WasmEmitter.cpp
660	What happens if the encoding length is smaller than what is actually required to encode the size? I assume it either throws some kind of error somehow (no idea how), or silently writes size in full without truncation. If the latter, I think it might be worth reporting an error from yaml2obj, as otherwise the data won't reflect what was explicitly requested.
llvm/test/ObjectYAML/wasm/section_header_size.yaml
92	Nit: new line at EOF.
llvm/tools/obj2yaml/wasm2yaml.cpp
402–403	Maybe I'm missing something, but doesn't this mean that there's ambiguity between whether a length of the LEB size or 5 is appropriate when the emitted YAML is converted back to an object?

report error when LEB can't be encoded as requested

llvm/lib/ObjectYAML/WasmEmitter.cpp
660	The length field is "padTo", meaning that extra padding will be added if necessary, but otherwise you'll get whatever the minimum length is. I added an error report here if the length can't be encoded in the specified size.
llvm/tools/obj2yaml/wasm2yaml.cpp
402–403	No, you're correct. This does mean that a binary could fail to roundtrip from obj -> YAML -> obj. However doing it this way meant that we don't have to rewrite all of the YAML test expectations (since some components e.g. MCAssembler emit patchable 5-byte encodings, while others emit minimal-sized encodings). This potential failure to round-trip is maybe suboptimal in theory but I think it doesn't matter that much.
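
The "only print if non-default" rule discussed here can be sketched as a small decision function. This is an illustrative reconstruction from the discussion, not the code in wasm2yaml.cpp (which is not shown on this page); `minimalULEB128Size` and `encodingLenForYAML` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper: number of bytes in the shortest ULEB128 encoding.
unsigned minimalULEB128Size(uint32_t Value) {
  unsigned Size = 0;
  do {
    Value >>= 7;
    ++Size;
  } while (Value != 0);
  return Size;
}

// Decide whether the encoding length should appear in the YAML output.
// It is omitted when it is either the minimal length for this section size
// or 5 (the padded default); otherwise it is emitted explicitly.
std::optional<uint8_t> encodingLenForYAML(uint32_t SectionSize,
                                          uint8_t EncodingLen) {
  unsigned Minimal = minimalULEB128Size(SectionSize);
  assert(EncodingLen >= Minimal &&
         "a parsed LEB cannot be shorter than the minimum");
  if (EncodingLen == Minimal || EncodingLen == 5)
    return std::nullopt;
  return EncodingLen;
}
```

Under this rule a section with a minimal 1-byte size field produces YAML without the field, and converting that YAML back pads to 5 bytes. That is the round-trip ambiguity acknowledged in the reply above.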

Harbormaster completed remote builds in B247259: Diff 542970. Jul 21 2023, 2:11 PM

Any more thoughts on this?

Looks like a bundle of the pre-merge checks are failing due to this latest patch? Otherwise, this basically looks good to me, barring a couple of nits.

llvm/lib/ObjectYAML/WasmEmitter.cpp
652	LLVM coding standards state that this should be a lower-case letter. A simple test case for the new error would be good too.
llvm/test/ObjectYAML/wasm/section_header_size.yaml
92	Ping this comment?

@dschuff is gonna be OOO for a while and we need to make this available to our users sooner, so I'm taking this over to address the remaining issues.

This revision now requires review to proceed. Jul 25 2023, 3:14 PM

Herald added a subscriber: dschuff. Jul 25 2023, 3:14 PM

Address comments + fix errors

llvm/lib/ObjectYAML/WasmEmitter.cpp
650	I think this is why the tests are failing. Will fix that.

Harbormaster completed remote builds in B248125: Diff 544160. Jul 25 2023, 5:33 PM

Remove newline

Harbormaster completed remote builds in B248127: Diff 544162. Jul 25 2023, 5:35 PM

I think I messed up diff.. Attempt to recover

again

Harbormaster completed remote builds in B248128: Diff 544163. Jul 25 2023, 5:36 PM

Nearly there. Just a couple of small points now from me.

llvm/lib/ObjectYAML/WasmEmitter.cpp
650	Thanks, I agree the previous version looks suspicious. This number controls the default encoding length, so 5 seems appropriate. Looking at this again, I think there is one technical bug here still though, in that if OutString happened to be so long that the LEB had to be 6 bytes or longer, then the user would be forced to specify the YAML field, even if it was just to the same size as the expected LEB. Perhaps worth changing 5 to something like `std::max(5, getULEB128Size(OutString.size()))`? (Probably factor out the size calculation into a local variable)
652	Nit: I'd tend to say "leb" rather than "el-ee-bee" for "LEB", so "an" -> "a", but if you pronounce it differently, I'm fine with this staying as is.
llvm/test/ObjectYAML/wasm/invalid_section_header_size.yaml
1 ↗	(On Diff #544164)	Rather than spin this off into a separate test file, I think it should be part of the existing section_header_size.yaml test. You could use either --docnum to allow you to have multiple YAML docs in the same file, or you could use yaml2obj's -D option to parameterise an existing section header length field to be the appropriate too-small value.

Harbormaster completed remote builds in B248129: Diff 544164. Jul 26 2023, 3:12 AM

tlively added inline comments. Jul 26 2023, 9:26 AM

llvm/lib/ObjectYAML/WasmEmitter.cpp
650	It's not possible for the size to require greater than 5 bytes because it is an unsigned 32-bit value. Each byte of a LEB has 7 significant bits and ceil(32/7) = 5. Indeed the Wasm spec does not allow using LEBs larger than 5 bytes to encode 32-bit values.
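
The ceil(32/7) = 5 bound can be checked with a short sketch. `uleb128Size` below is a hypothetical stand-in for `getULEB128Size`, restricted to 32-bit values, and `fitsInRequestedLength` is an illustrative name for the check that a requested encoding length is large enough.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for getULEB128Size, for 32-bit values only:
// each LEB byte carries 7 value bits.
unsigned uleb128Size(uint32_t Value) {
  unsigned Size = 0;
  do {
    Value >>= 7;
    ++Size;
  } while (Value != 0);
  return Size;
}

// True if SectionSize can be encoded in RequestedLen bytes (padding allowed).
bool fitsInRequestedLength(uint32_t SectionSize, unsigned RequestedLen) {
  unsigned RequiredLen = uleb128Size(SectionSize);
  // 32 bits at 7 bits per byte never need more than ceil(32/7) = 5 bytes.
  assert(RequiredLen <= 5 && "a 32-bit value never needs more than 5 bytes");
  return RequestedLen >= RequiredLen;
}
```

Even the largest 32-bit size, UINT32_MAX, needs exactly 5 bytes, so a default of 5 is always sufficient and only an explicitly smaller request can fail.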

Address comments

llvm/test/ObjectYAML/wasm/invalid_section_header_size.yaml
1 ↗	(On Diff #544164)	Thanks! Didn't know about those options.

Harbormaster completed remote builds in B248377: Diff 544525. Jul 26 2023, 7:31 PM

LGTM, with nits addressed.

llvm/lib/ObjectYAML/WasmEmitter.cpp
650	Fair enough. FWIW, a ULEB has no technical upper limit on its size, so the constraint is in the wasm spec and what we implement rather than the theoretical what the user could write here, I guess.
llvm/test/ObjectYAML/wasm/section_header_size.yaml
83	Two nits:
1) I personally wouldn't mix the check line in with the yaml. I think most readers would expect it to appear immediately after the RUN line, or after the end of the YAML doc (I personally prefer the former). However, I don't feel strongly about this.
2) Should this be "section header length" rather than "section length"?

This revision is now accepted and ready to land. Jul 27 2023, 12:04 AM

aheejin marked an inline comment as done. Jul 27 2023, 3:27 PM

aheejin added inline comments.

llvm/lib/ObjectYAML/WasmEmitter.cpp
650	Changed to this so we assert the required length should not be greater than 5 bytes:

unsigned RequiredLen = getULEB128Size(OutString.size());
// Wasm spec does not allow LEBs larger than 5 bytes
assert(RequiredLen <= 5);
if (HeaderSecSizeEncodingLen < RequiredLen) {
  reportError("section length can't be encoded in a LEB of size " +
              Twine(HeaderSecSizeEncodingLen));
  return false;
}

Address comments

This revision was landed with ongoing or failed builds. Jul 27 2023, 3:44 PM

Closed by commit rG1b21067cf247: [WebAssembly][Objcopy] Write output section headers identically to inputs (authored by dschuff, committed by aheejin).

This revision was automatically updated to reflect the committed changes.

aheejin added a commit: rG1b21067cf247: [WebAssembly][Objcopy] Write output section headers identically to inputs.

Harbormaster completed remote builds in B248709: Diff 544956. Jul 27 2023, 5:34 PM

Revision Contents

Path	Size
llvm/include/llvm/Object/Wasm.h	10 lines
llvm/include/llvm/ObjectYAML/WasmYAML.h	1 line
llvm/lib/ObjCopy/wasm/WasmObject.h	1 line
llvm/lib/ObjCopy/wasm/WasmReader.cpp	4 lines
llvm/lib/ObjCopy/wasm/WasmWriter.cpp	13 lines
llvm/lib/Object/WasmObjectFile.cpp	4 lines
llvm/lib/ObjectYAML/WasmEmitter.cpp	12 lines
llvm/lib/ObjectYAML/WasmYAML.cpp	1 line
llvm/test/ObjectYAML/wasm/section_header_size.yaml	91 lines
llvm/test/tools/llvm-objcopy/wasm/section-header-size.test	41 lines
llvm/tools/obj2yaml/wasm2yaml.cpp	12 lines

Diff 544968

llvm/include/llvm/Object/Wasm.h

	Show First 20 Lines • Show All 98 Lines • ▼ Show 20 Lines
	#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)			#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
	LLVM_DUMP_METHOD void dump() const;			LLVM_DUMP_METHOD void dump() const;
	#endif			#endif
	};			};

	struct WasmSection {			struct WasmSection {
	WasmSection() = default;			WasmSection() = default;

	uint32_t Type = 0; // Section type (See below)			uint32_t Type = 0;
	uint32_t Offset = 0; // Offset with in the file			uint32_t Offset = 0; // Offset within the file
	StringRef Name; // Section name (User-defined sections only)			StringRef Name; // Section name (User-defined sections only)
	uint32_t Comdat = UINT32_MAX; // From the "comdat info" section			uint32_t Comdat = UINT32_MAX; // From the "comdat info" section
	ArrayRef<uint8_t> Content; // Section content			ArrayRef<uint8_t> Content;
	std::vector<wasm::WasmRelocation> Relocations; // Relocations for this section			std::vector<wasm::WasmRelocation> Relocations;
				// Length of the LEB encoding of the section header's size field
				std::optional<uint8_t> HeaderSecSizeEncodingLen;
	};			};

	struct WasmSegment {			struct WasmSegment {
	uint32_t SectionOffset;			uint32_t SectionOffset;
	wasm::WasmDataSegment Data;			wasm::WasmDataSegment Data;
	};			};

	class WasmObjectFile : public ObjectFile {			class WasmObjectFile : public ObjectFile {
	▲ Show 20 Lines • Show All 248 Lines • Show Last 20 Lines

llvm/include/llvm/ObjectYAML/WasmYAML.h

	Show First 20 Lines • Show All 183 Lines • ▼ Show 20 Lines
	};			};

	struct Section {			struct Section {
	explicit Section(SectionType SecType) : Type(SecType) {}			explicit Section(SectionType SecType) : Type(SecType) {}
	virtual ~Section();			virtual ~Section();

	SectionType Type;			SectionType Type;
	std::vector<Relocation> Relocations;			std::vector<Relocation> Relocations;
				std::optional<uint8_t> HeaderSecSizeEncodingLen;
	};			};

	struct CustomSection : Section {			struct CustomSection : Section {
	explicit CustomSection(StringRef Name)			explicit CustomSection(StringRef Name)
	: Section(wasm::WASM_SEC_CUSTOM), Name(Name) {}			: Section(wasm::WASM_SEC_CUSTOM), Name(Name) {}

	static bool classof(const Section *S) {			static bool classof(const Section *S) {
	return S->Type == wasm::WASM_SEC_CUSTOM;			return S->Type == wasm::WASM_SEC_CUSTOM;
	▲ Show 20 Lines • Show All 403 Lines • Show Last 20 Lines

llvm/lib/ObjCopy/wasm/WasmObject.h

	Show All 17 Lines
	namespace llvm {			namespace llvm {
	namespace objcopy {			namespace objcopy {
	namespace wasm {			namespace wasm {

	struct Section {			struct Section {
	// For now, each section is only an opaque binary blob with no distinction			// For now, each section is only an opaque binary blob with no distinction
	// between custom and known sections.			// between custom and known sections.
	uint8_t SectionType;			uint8_t SectionType;
				std::optional<uint8_t> HeaderSecSizeEncodingLen;
	StringRef Name;			StringRef Name;
	ArrayRef<uint8_t> Contents;			ArrayRef<uint8_t> Contents;
	};			};

	struct Object {			struct Object {
	llvm::wasm::WasmObjectHeader Header;			llvm::wasm::WasmObjectHeader Header;
	// For now don't discriminate between kinds of sections.			// For now don't discriminate between kinds of sections.
	std::vector<Section> Sections;			std::vector<Section> Sections;
	Show All 14 Lines

llvm/lib/ObjCopy/wasm/WasmReader.cpp

	Show All 16 Lines

	Expected<std::unique_ptr<Object>> Reader::create() const {			Expected<std::unique_ptr<Object>> Reader::create() const {
	auto Obj = std::make_unique<Object>();			auto Obj = std::make_unique<Object>();
	Obj->Header = WasmObj.getHeader();			Obj->Header = WasmObj.getHeader();
	std::vector<Section> Sections;			std::vector<Section> Sections;
	Obj->Sections.reserve(WasmObj.getNumSections());			Obj->Sections.reserve(WasmObj.getNumSections());
	for (const SectionRef &Sec : WasmObj.sections()) {			for (const SectionRef &Sec : WasmObj.sections()) {
	const WasmSection &WS = WasmObj.getWasmSection(Sec);			const WasmSection &WS = WasmObj.getWasmSection(Sec);
	Obj->Sections.push_back(			Obj->Sections.push_back({static_cast<uint8_t>(WS.Type),
	{static_cast<uint8_t>(WS.Type), WS.Name, WS.Content});			WS.HeaderSecSizeEncodingLen, WS.Name, WS.Content});
	// Give known sections standard names to allow them to be selected. (Custom			// Give known sections standard names to allow them to be selected. (Custom
	// sections already have their names filled in by the parser).			// sections already have their names filled in by the parser).
	Section &ReaderSec = Obj->Sections.back();			Section &ReaderSec = Obj->Sections.back();
	if (ReaderSec.SectionType > WASM_SEC_CUSTOM &&			if (ReaderSec.SectionType > WASM_SEC_CUSTOM &&
	ReaderSec.SectionType <= WASM_SEC_LAST_KNOWN)			ReaderSec.SectionType <= WASM_SEC_LAST_KNOWN)
	ReaderSec.Name = sectionTypeToString(ReaderSec.SectionType);			ReaderSec.Name = sectionTypeToString(ReaderSec.SectionType);
	}			}
	return std::move(Obj);			return std::move(Obj);
	}			}

	} // end namespace wasm			} // end namespace wasm
	} // end namespace objcopy			} // end namespace objcopy
	} // end namespace llvm			} // end namespace llvm

llvm/lib/ObjCopy/wasm/WasmWriter.cpp

Show All 23 Lines	Writer::SectionHeader Writer::createSectionHeader(const Section &S,
size_t &SectionSize) {		size_t &SectionSize) {
SectionHeader Header;		SectionHeader Header;
raw_svector_ostream OS(Header);		raw_svector_ostream OS(Header);
OS << S.SectionType;		OS << S.SectionType;
bool HasName = S.SectionType == WASM_SEC_CUSTOM;		bool HasName = S.SectionType == WASM_SEC_CUSTOM;
SectionSize = S.Contents.size();		SectionSize = S.Contents.size();
if (HasName)		if (HasName)
SectionSize += getULEB128Size(S.Name.size()) + S.Name.size();		SectionSize += getULEB128Size(S.Name.size()) + S.Name.size();
// Pad the LEB value out to 5 bytes to make it a predictable size, and		// If we read this section from an object file, use its original size for the
// match the behavior of clang.		// padding of the LEB value to avoid changing the file size. Otherwise, pad
encodeULEB128(SectionSize, OS, 5);		// out to 5 bytes to make it predictable, and match the behavior of clang.
		unsigned HeaderSecSizeEncodingLen =
		S.HeaderSecSizeEncodingLen ? *S.HeaderSecSizeEncodingLen : 5;
		encodeULEB128(SectionSize, OS, HeaderSecSizeEncodingLen);
if (HasName) {		if (HasName) {
encodeULEB128(S.Name.size(), OS);		encodeULEB128(S.Name.size(), OS);
OS << S.Name;		OS << S.Name;
}		}
// Total section size is the content size plus 1 for the section type and		// Total section size is the content size plus 1 for the section type and
// 5 for the LEB-encoded size.		// the LEB-encoded size.
SectionSize = SectionSize + 1 + 5;		SectionSize = SectionSize + 1 + HeaderSecSizeEncodingLen;
return Header;		return Header;
}		}

size_t Writer::finalize() {		size_t Writer::finalize() {
size_t ObjectSize = sizeof(WasmMagic) + sizeof(WasmVersion);		size_t ObjectSize = sizeof(WasmMagic) + sizeof(WasmVersion);
SectionHeaders.reserve(Obj.Sections.size());		SectionHeaders.reserve(Obj.Sections.size());
// Finalize the headers of each section so we know the total size.		// Finalize the headers of each section so we know the total size.
for (const Section &S : Obj.Sections) {		for (const Section &S : Obj.Sections) {
Show All 30 Lines

llvm/lib/Object/WasmObjectFile.cpp

Show First 20 Lines • Show All 262 Lines • ▼ Show 20 Lines	static wasm::WasmTableType readTableType(WasmObjectFile::ReadContext &Ctx) {
return TableType;		return TableType;
}		}

static Error readSection(WasmSection &Section, WasmObjectFile::ReadContext &Ctx,		static Error readSection(WasmSection &Section, WasmObjectFile::ReadContext &Ctx,
WasmSectionOrderChecker &Checker) {		WasmSectionOrderChecker &Checker) {
Section.Offset = Ctx.Ptr - Ctx.Start;		Section.Offset = Ctx.Ptr - Ctx.Start;
Section.Type = readUint8(Ctx);		Section.Type = readUint8(Ctx);
LLVM_DEBUG(dbgs() << "readSection type=" << Section.Type << "\n");		LLVM_DEBUG(dbgs() << "readSection type=" << Section.Type << "\n");
		// When reading the section's size, store the size of the LEB used to encode
		// it. This allows objcopy/strip to reproduce the binary identically.
		const uint8_t *PreSizePtr = Ctx.Ptr;
uint32_t Size = readVaruint32(Ctx);		uint32_t Size = readVaruint32(Ctx);
		Section.HeaderSecSizeEncodingLen = Ctx.Ptr - PreSizePtr;
if (Size == 0)		if (Size == 0)
return make_error<StringError>("zero length section",		return make_error<StringError>("zero length section",
object_error::parse_failed);		object_error::parse_failed);
if (Ctx.Ptr + Size > Ctx.End)		if (Ctx.Ptr + Size > Ctx.End)
return make_error<StringError>("section too large",		return make_error<StringError>("section too large",
object_error::parse_failed);		object_error::parse_failed);
if (Section.Type == wasm::WASM_SEC_CUSTOM) {		if (Section.Type == wasm::WASM_SEC_CUSTOM) {
WasmObjectFile::ReadContext SectionCtx;		WasmObjectFile::ReadContext SectionCtx;
▲ Show 20 Lines • Show All 1,730 Lines • Show Last 20 Lines
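
The pointer-difference technique used in `readSection` above can be shown in isolation. The following is a minimal sketch with a hypothetical `Cursor` type standing in for `WasmObjectFile::ReadContext` and a simplified `readVaruint32`; the struct and function names are illustrative assumptions, and error handling is reduced to assertions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cursor over an in-memory Wasm binary.
struct Cursor {
  const uint8_t *Ptr;
  const uint8_t *End;
};

// Decode an unsigned LEB128 value of at most 5 bytes, advancing the cursor.
uint32_t readVaruint32(Cursor &C) {
  uint32_t Value = 0;
  for (unsigned Shift = 0; Shift < 35; Shift += 7) {
    assert(C.Ptr < C.End && "unexpected end of input");
    uint8_t Byte = *C.Ptr++;
    Value |= static_cast<uint32_t>(Byte & 0x7f) << Shift;
    if ((Byte & 0x80) == 0)
      return Value;
  }
  assert(false && "LEB is longer than 5 bytes");
  return Value;
}

struct SectionHeaderInfo {
  uint8_t Type;
  uint32_t Size;
  uint8_t SizeEncodingLen; // Bytes the size LEB occupied in the input.
};

// Read the section type and size, remembering how many bytes the size used.
SectionHeaderInfo readSectionHeader(Cursor &C) {
  assert(C.Ptr < C.End && "unexpected end of input");
  SectionHeaderInfo H;
  H.Type = *C.Ptr++;
  const uint8_t *PreSizePtr = C.Ptr;
  H.Size = readVaruint32(C);
  H.SizeEncodingLen = static_cast<uint8_t>(C.Ptr - PreSizePtr);
  return H;
}
```

Recording the length at parse time is what lets a later writer reproduce the same number of header bytes instead of guessing between the minimal and padded forms.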

llvm/lib/ObjectYAML/WasmEmitter.cpp

Show First 20 Lines • Show All 640 Lines • ▼ Show 20 Lines	for (const std::unique_ptr<WasmYAML::Section> &Sec : Obj.Sections) {
else		else
reportError("unknown section type: " + Twine(Sec->Type));		reportError("unknown section type: " + Twine(Sec->Type));
if (HasError)		if (HasError)
return false;		return false;
StringStream.flush();		StringStream.flush();
		unsigned HeaderSecSizeEncodingLen =
		Sec->HeaderSecSizeEncodingLen ? *Sec->HeaderSecSizeEncodingLen : 5;
		unsigned RequiredLen = getULEB128Size(OutString.size());
		// Wasm spec does not allow LEBs larger than 5 bytes
		assert(RequiredLen <= 5);
		if (HeaderSecSizeEncodingLen < RequiredLen) {
		reportError("section header length can't be encoded in a LEB of size " +
		Twine(HeaderSecSizeEncodingLen));
		return false;
		}
// Write the section size followed by the content		// Write the section size followed by the content
encodeULEB128(OutString.size(), OS);		encodeULEB128(OutString.size(), OS, HeaderSecSizeEncodingLen);
OS << OutString;		OS << OutString;
}		}
// write reloc sections for any section that have relocations		// write reloc sections for any section that have relocations
uint32_t SectionIndex = 0;		uint32_t SectionIndex = 0;
for (const std::unique_ptr<WasmYAML::Section> &Sec : Obj.Sections) {		for (const std::unique_ptr<WasmYAML::Section> &Sec : Obj.Sections) {
if (Sec->Relocations.empty()) {		if (Sec->Relocations.empty()) {
SectionIndex++;		SectionIndex++;
Show All 26 Lines

llvm/lib/ObjectYAML/WasmYAML.cpp

Show All 39 Lines	void MappingTraits<WasmYAML::Object>::mapping(IO &IO,
IO.mapRequired("FileHeader", Object.Header);		IO.mapRequired("FileHeader", Object.Header);
IO.mapOptional("Sections", Object.Sections);		IO.mapOptional("Sections", Object.Sections);
IO.setContext(nullptr);		IO.setContext(nullptr);
}		}

static void commonSectionMapping(IO &IO, WasmYAML::Section &Section) {		static void commonSectionMapping(IO &IO, WasmYAML::Section &Section) {
IO.mapRequired("Type", Section.Type);		IO.mapRequired("Type", Section.Type);
IO.mapOptional("Relocations", Section.Relocations);		IO.mapOptional("Relocations", Section.Relocations);
		IO.mapOptional("HeaderSecSizeEncodingLen", Section.HeaderSecSizeEncodingLen);
}		}

static void sectionMapping(IO &IO, WasmYAML::DylinkSection &Section) {		static void sectionMapping(IO &IO, WasmYAML::DylinkSection &Section) {
commonSectionMapping(IO, Section);		commonSectionMapping(IO, Section);
IO.mapRequired("Name", Section.Name);		IO.mapRequired("Name", Section.Name);
IO.mapRequired("MemorySize", Section.MemorySize);		IO.mapRequired("MemorySize", Section.MemorySize);
IO.mapRequired("MemoryAlignment", Section.MemoryAlignment);		IO.mapRequired("MemoryAlignment", Section.MemoryAlignment);
IO.mapRequired("TableSize", Section.TableSize);		IO.mapRequired("TableSize", Section.TableSize);
▲ Show 20 Lines • Show All 591 Lines • Show Last 20 Lines

llvm/test/ObjectYAML/wasm/section_header_size.yaml

This file was added.

				## Test that obj2yaml output includes the section header size encoding length
				## only when the length isn't padded to 5 bytes.
				# RUN: yaml2obj --docnum=1 %s | obj2yaml | FileCheck %s

				--- !WASM
				FileHeader:
				Version: 0x1
				Sections:
				- Type: TYPE
				HeaderSecSizeEncodingLen: 3
				Signatures:
				- Index: 0
				ParamTypes:
				- I32
				- I32
				ReturnTypes:
				- I32
				- Type: FUNCTION
				HeaderSecSizeEncodingLen: 4
				FunctionTypes: [ 0 ]
				- Type: MEMORY
				HeaderSecSizeEncodingLen: 1
				Memories:
				- Flags: [ HAS_MAX ]
				Minimum: 0x100
				Maximum: 0x100
				- Type: EXPORT
				HeaderSecSizeEncodingLen: 5
				Exports:
				- Name: add
				Kind: FUNCTION
				Index: 0
				- Type: CODE
				HeaderSecSizeEncodingLen: 2
				Functions:
				- Index: 0
				Locals: []
				Body: 200020016A0B
				...
				# CHECK: --- !WASM
				# CHECK-NEXT: FileHeader:
				# CHECK-NEXT: Version: 0x1
				# CHECK-NEXT: Sections:
				# CHECK-NEXT: - Type: TYPE
				# CHECK-NEXT: HeaderSecSizeEncodingLen: 3
				# CHECK-NEXT: Signatures:
				# CHECK-NEXT: - Index: 0
				# CHECK-NEXT: ParamTypes:
				# CHECK-NEXT: - I32
				# CHECK-NEXT: - I32
				# CHECK-NEXT: ReturnTypes:
				# CHECK-NEXT: - I32
				# CHECK-NEXT: - Type: FUNCTION
				# CHECK-NEXT: HeaderSecSizeEncodingLen: 4
				# CHECK-NEXT: FunctionTypes: [ 0 ]
				# CHECK-NEXT: - Type: MEMORY
				# CHECK-NEXT: Memories:
				# CHECK-NEXT: - Flags: [ HAS_MAX ]
				# CHECK-NEXT: Minimum: 0x100
				# CHECK-NEXT: Maximum: 0x100
				# CHECK-NEXT: - Type: EXPORT
				# CHECK-NEXT: Exports:
				# CHECK-NEXT: - Name: add
				# CHECK-NEXT: Kind: FUNCTION
				# CHECK-NEXT: Index: 0
				# CHECK-NEXT: - Type: CODE
				# CHECK-NEXT: HeaderSecSizeEncodingLen: 2
				# CHECK-NEXT: Functions:
				# CHECK-NEXT: - Index: 0
				# CHECK-NEXT: Locals: []
				# CHECK-NEXT: Body: 200020016A0B

				## Test if we correctly error out if the provided section header size is less
				## than the size required.
				# RUN: not yaml2obj --docnum=2 %s -o /dev/null 2>&1 | FileCheck %s --check-prefix=INVALID
				# INVALID: yaml2obj: error: section header length can't be encoded in a LEB of size 0

				--- !WASM
				FileHeader:
				  Version: 0x1
				Sections:
				  - Type: TYPE
				    HeaderSecSizeEncodingLen: 0
				    Signatures:
				      - Index: 0
				        ParamTypes:
				          - I32
				          - I32
				        ReturnTypes:
				          - I32
				...

				Inline comments on this file:

				jhenderson (Done), on the INVALID check: Two nits: 1) I personally wouldn't mix the check line in with the yaml. I think most readers would expect it to appear immediately after the RUN line, or after the end of the YAML doc (I personally prefer the former). However, I don't feel strongly about this. 2) Should this be "section header length" rather than "section length"?
				jhenderson (Done), at end of file: Nit: new line at EOF.
				jhenderson (Done): Ping this comment?
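The INVALID case in this test rests on a property of ULEB128: a value can be padded out to a longer encoding, but it cannot be encoded in fewer bytes than its minimal form. The following is a minimal, self-contained sketch of that rule. The helper names `minimalULEBSize` and `encodePaddedULEB` are hypothetical and written for illustration only; they are not the helpers from llvm/Support/LEB128.h that the patch itself uses.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: minimal number of bytes needed to encode Value as
// ULEB128 (7 payload bits per byte, at least one byte).
unsigned minimalULEBSize(uint64_t Value) {
  unsigned Size = 0;
  do {
    Value >>= 7;
    ++Size;
  } while (Value != 0);
  return Size;
}

// Hypothetical helper: encode Value as ULEB128 using exactly PadTo bytes.
// Returns std::nullopt when Value cannot be represented in PadTo bytes,
// which corresponds to the "can't be encoded in a LEB of size N" error.
std::optional<std::vector<uint8_t>> encodePaddedULEB(uint64_t Value,
                                                     unsigned PadTo) {
  if (PadTo < minimalULEBSize(Value))
    return std::nullopt;
  std::vector<uint8_t> Out;
  for (unsigned I = 0; I < PadTo; ++I) {
    uint8_t Byte = Value & 0x7f;
    Value >>= 7;
    if (I + 1 != PadTo)
      Byte |= 0x80; // Continuation bit on every byte except the last.
    Out.push_back(Byte);
  }
  assert(Value == 0 && "all payload bits must have been consumed");
  return Out;
}
```

Under these assumptions, a requested length of 0 is rejected for any value, since even zero needs one byte, while a length of 5 always suffices for a 32-bit section size.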

llvm/test/tools/llvm-objcopy/wasm/section-header-size.test

This file was added.

				## Test that objcopy generates section headers that are identical to those from
				## the input binary, including the encoded size of the LEB that represents the
				## section size.

				jhenderson (Not Done): Rather than adding a canned binary, it seems to me like it would be fairly straightforward to add some additional functionality to yaml2obj to customise the LEB size? Probably an additional, optional field called "HeaderSizeLength" or something to that effect as a section member?
				dschuff (Done): Yes, it's straightforward to add YAML support for this. The reason I didn't was that it would mean a new field in all of the sections of every YAML output, which would be a lot of extra output that we don't care about in every section in every test (which would make the output harder to read and require rewriting the expectations for basically every wasm YAML test). There are a couple of options here.
				1. Just use the checked-in binary in this one case.
				2. Implement reading the value from YAML but not print it when outputting YAML. This allows writing a test without the checked-in binary and avoids polluting all the output. The test would read YAML with this field and diff the output of llvm-objcopy against the one generated by yaml2obj. That's nice, although it's slightly odd that the support is asymmetric.
				3. Just bite the bullet and rewrite all the YAML tests. Then we could write this test with yaml2obj and obj2yaml.
				4. Same as option 2 but only print the value in YAML if it's a "non-default" value, i.e. neither the minimum size for the section in question, nor 5 (the default "padded" value). This would complicate the wasm2yaml logic slightly but not require rewriting the tests.
				I think I favor option 2, WDYT?
				dschuff (Done): (or if not option 2, then 4, 1, or 3 in that order. Really any option other than 3 is fine with me).
				jhenderson (Not Done): Option 4 would be my suggestion, mostly because that's what ELF yaml2obj does for many of its fields. It means that we can have minimal YAML whilst also being able to go yaml2obj -> obj2yaml -> yaml2obj -> ... ad infinitum without the file ever changing.
				dschuff (Done): Thanks, makes sense to me. Patch updated.
				# RUN: yaml2obj %s -o %t.wasm
				# RUN: llvm-objcopy %t.wasm %t.wasm.copy
				# RUN: diff %t.wasm %t.wasm.copy

				--- !WASM
				FileHeader:
				  Version: 0x1
				Sections:
				  - Type: TYPE
				    HeaderSecSizeEncodingLen: 3
				    Signatures:
				      - Index: 0
				        ParamTypes:
				          - I32
				          - I32
				        ReturnTypes:
				          - I32
				  - Type: FUNCTION
				    HeaderSecSizeEncodingLen: 4
				    FunctionTypes: [ 0 ]
				  - Type: MEMORY
				    HeaderSecSizeEncodingLen: 1
				    Memories:
				      - Flags: [ HAS_MAX ]
				        Minimum: 0x100
				        Maximum: 0x100
				  - Type: EXPORT
				    HeaderSecSizeEncodingLen: 5
				    Exports:
				      - Name: add
				        Kind: FUNCTION
				        Index: 0
				  - Type: CODE
				    Functions:
				      - Index: 0
				        Locals: []
				        Body: 200020016A0B
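The behaviour this test checks can be pictured as a writer that reuses the recorded encoding length when a section came from an input file and falls back to the padded 5-byte form otherwise, as the summary describes. The sketch below illustrates that idea under stated assumptions: `SectionSketch` and `buildSectionHeader` are hypothetical names, not the types or functions in llvm/lib/ObjCopy/wasm, and the section size is simply taken to be the content size.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for a section as seen by a writer.
struct SectionSketch {
  uint8_t Type;
  std::vector<uint8_t> Contents;
  // Set only when the section was read from an input object.
  std::optional<uint8_t> HeaderSecSizeEncodingLen;
};

// Build the section header: one type byte followed by the ULEB128-encoded
// size, written with the recorded length if present, else padded to 5 bytes.
std::vector<uint8_t> buildSectionHeader(const SectionSketch &S) {
  unsigned Len = S.HeaderSecSizeEncodingLen.value_or(5);
  uint64_t Size = S.Contents.size();
  std::vector<uint8_t> Header{S.Type};
  for (unsigned I = 0; I < Len; ++I) {
    uint8_t Byte = Size & 0x7f;
    Size >>= 7;
    if (I + 1 != Len)
      Byte |= 0x80; // Continuation bit on every byte except the last.
    Header.push_back(Byte);
  }
  assert(Size == 0 && "recorded length is too small for the section size");
  return Header;
}
```

With this shape, a section whose input header used a 3-byte size round-trips with a 3-byte size, while a section with no recorded length (such as the CODE section in the YAML above) gets the 5-byte padded form.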

llvm/tools/obj2yaml/wasm2yaml.cpp

//===------ utils/wasm2yaml.cpp - obj2yaml conversion tool ------- C++ --===//		//===------ utils/wasm2yaml.cpp - obj2yaml conversion tool ------- C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "obj2yaml.h"		#include "obj2yaml.h"
#include "llvm/Object/COFF.h"		#include "llvm/Object/COFF.h"
#include "llvm/ObjectYAML/WasmYAML.h"		#include "llvm/ObjectYAML/WasmYAML.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"
		#include "llvm/Support/LEB128.h"
#include "llvm/Support/YAMLTraits.h"		#include "llvm/Support/YAMLTraits.h"

using namespace llvm;		using namespace llvm;
using object::WasmSection;		using object::WasmSection;

namespace {		namespace {

class WasmDumper {		class WasmDumper {
[...]
case wasm::WASM_SEC_DATACOUNT: {		case wasm::WASM_SEC_DATACOUNT: {
DataCountSec->Count = Obj.dataSegments().size();		DataCountSec->Count = Obj.dataSegments().size();
S = std::move(DataCountSec);		S = std::move(DataCountSec);
break;		break;
}		}
default:		default:
llvm_unreachable("Unknown section type");		llvm_unreachable("Unknown section type");
break;		break;
}		}

		// Only propagate the section size encoding length if it's not the minimal
		// size or 5 (the default "padded" value). This is to avoid having every
		// YAML output polluted with this value when we usually don't care about it
		// (and avoid rewriting all the test expectations).
		if (WasmSec.HeaderSecSizeEncodingLen &&
		    WasmSec.HeaderSecSizeEncodingLen !=
		        getULEB128Size(WasmSec.Content.size()) &&
		    WasmSec.HeaderSecSizeEncodingLen != 5)
		  S->HeaderSecSizeEncodingLen = WasmSec.HeaderSecSizeEncodingLen;

		Inline comments on the condition above:

		jhenderson (Not Done): Maybe I'm missing something, but doesn't this mean that there's ambiguity between whether a length of the LEB size or 5 is appropriate when the emitted YAML is converted back to an object?
		dschuff (Done): No, you're correct. This does mean that a binary could fail to roundtrip from obj -> YAML -> obj. However doing it this way meant that we don't have to rewrite all of the YAML test expectations (since some components e.g. MCAssembler emit patchable 5-byte encodings, while others emit minimal-sized encodings). This potential failure to round-trip is maybe suboptimal in theory but I think it doesn't matter that much.

for (const wasm::WasmRelocation &Reloc : WasmSec.Relocations) {		for (const wasm::WasmRelocation &Reloc : WasmSec.Relocations) {
WasmYAML::Relocation R;		WasmYAML::Relocation R;
R.Type = Reloc.Type;		R.Type = Reloc.Type;
R.Index = Reloc.Index;		R.Index = Reloc.Index;
R.Offset = Reloc.Offset;		R.Offset = Reloc.Offset;
R.Addend = Reloc.Addend;		R.Addend = Reloc.Addend;
S->Relocations.push_back(R);		S->Relocations.push_back(R);
}		}
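The condition added in this file can be read as a standalone predicate: emit the field only when a length was recorded and it is neither the minimal encoding nor the padded default of 5. The sketch below restates it in isolation so the three cases are easy to check. `shouldEmitEncodingLen` and `minimalULEBSize` are hypothetical names introduced for illustration; the patch itself calls `getULEB128Size` from llvm/Support/LEB128.h.

```cpp
#include <cstdint>

// Hypothetical helper: minimal number of bytes needed to encode Value as
// ULEB128 (7 payload bits per byte, at least one byte).
constexpr unsigned minimalULEBSize(uint64_t Value) {
  unsigned Size = 0;
  do {
    Value >>= 7;
    ++Size;
  } while (Value != 0);
  return Size;
}

// Hypothetical restatement of the wasm2yaml condition: a length of 0 means
// "not recorded"; the minimal length and 5 are treated as defaults and are
// therefore not written to the YAML output.
constexpr bool shouldEmitEncodingLen(unsigned EncodingLen,
                                     uint64_t ContentSize) {
  return EncodingLen != 0 && EncodingLen != minimalULEBSize(ContentSize) &&
         EncodingLen != 5;
}

// A 7-byte section needs 1 byte for its size, so 3 is non-default,
// while 1 (minimal) and 5 (padded) are both suppressed.
static_assert(shouldEmitEncodingLen(3, 7), "non-default length is emitted");
static_assert(!shouldEmitEncodingLen(1, 7), "minimal length is suppressed");
static_assert(!shouldEmitEncodingLen(5, 7), "padded length is suppressed");
```

This also makes the ambiguity raised in review concrete: both the minimal length and 5 map to "field absent", so a reader of the YAML cannot tell which of the two the original object used.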
Diff 544968