Details

Reviewers

ruiu
ncw

Commits

rGdbd33b80b7f9: [WebAssembly] Verify contents of relocation target before writing it
rLLD327325: [WebAssembly] Verify contents of relocation target before writing it
rL327325: [WebAssembly] Verify contents of relocation target before writing it

Summary

Verify that the location where a relocation is about to be
applied contains the expected existing value.

This is essentially a sanity check to catch bugs in the compiler
and the linker.
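
As a rough illustration of the idea in this summary, the sketch below reads the existing padded ULEB128 value at a relocation site, compares it with the value the object file is expected to contain, and only then overwrites it. The helper names (`readULEB128`, `writePaddedULEB128`, `verifyAndPatch`) are hypothetical and are not lld functions. Unlike the patch under review, which writes the new value and reports the mismatch afterwards, this sketch checks before writing.

```cpp
#include <cstdint>
#include <cstdio>

// Decode an unsigned LEB128 value starting at P.
// Hypothetical stand-in for LLVM's decodeULEB128.
static uint64_t readULEB128(const uint8_t *P) {
  uint64_t Value = 0;
  unsigned Shift = 0;
  uint8_t Byte;
  do {
    Byte = *P++;
    Value |= uint64_t(Byte & 0x7f) << Shift;
    Shift += 7;
  } while ((Byte & 0x80) && Shift < 64);
  return Value;
}

// Encode Value as exactly 5 bytes, padding with continuation bytes so the
// field width never changes when the value is rewritten.
static void writePaddedULEB128(uint32_t Value, uint8_t *P) {
  for (int I = 0; I < 4; ++I) {
    *P++ = uint8_t(Value & 0x7f) | 0x80;
    Value >>= 7;
  }
  *P = uint8_t(Value & 0x7f);
}

// Returns false and leaves the buffer untouched if the bytes at Loc do not
// hold the expected pre-link value; otherwise writes NewValue.
static bool verifyAndPatch(uint8_t *Loc, uint32_t Expected, uint32_t NewValue) {
  uint64_t Existing = readULEB128(Loc);
  if (Existing != Expected) {
    std::fprintf(stderr, "unexpected existing value: existing=%llu expected=%u\n",
                 static_cast<unsigned long long>(Existing),
                 static_cast<unsigned>(Expected));
    return false;
  }
  writePaddedULEB128(NewValue, Loc);
  return true;
}
```

A caller would pass the pointer to the relocation site, the index or address recorded in the input object, and the final linked value.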

Diff Detail

Repository

rLLD LLVM Linker

Build Status

Buildable 15926
Build 15926: arc lint + arc unit

Event Timeline

sbc100 created this revision. Mar 9 2018, 9:25 PM

Herald added subscribers: llvm-commits, sunfish, aheejin and 3 others. Mar 9 2018, 9:25 PM

Harbormaster completed remote builds in B15924: Diff 137896. Mar 9 2018, 9:28 PM

factor out separate change

fix type

Harbormaster completed remote builds in B15926: Diff 137898. Mar 9 2018, 9:32 PM

Harbormaster completed remote builds in B15927: Diff 137899.

sbc100 added reviewers: ruiu, ncw. Mar 9 2018, 9:53 PM

Cool, this is a good error-catching device. It could perhaps be debug-only or assert-only - but it's not exactly an expensive check, so I don't mind it being always on.

wasm/InputFiles.cpp
133	Can't you take the address of imported functions - yet functionTypes only contains the defined ones? Something doesn't seem quite right about these indexes, but I'd have to play around with a couple of examples to be sure. I think the code that uses TableEntries above is correct (ie ElementIndex is the right thing to use).

sbc100 added inline comments. Mar 10 2018, 5:39 PM

wasm/InputFiles.cpp
133	You are right.. i was trying to calculate the max function index here but I failed. I will try again :)

feedback

Harbormaster completed remote builds in B15939: Diff 137933. Mar 10 2018, 6:15 PM

Looks good!

This revision is now accepted and ready to land. Mar 12 2018, 8:56 AM

sbc100 retitled this revision from "[WebAssembly] Verify contents of relocations target before writing in" to "[WebAssembly] Verify contents of relocations target before writing it". Mar 12 2018, 12:54 PM

Closed by commit rL327325: [WebAssembly] Verify contents of relocation target before writing it (authored by sbc). Mar 12 2018, 12:57 PM

This revision was automatically updated to reflect the committed changes.

I doubt that you really need to do that for every invocation of the linker. Generally speaking, relocation handling is what you want to optimize the most because the number of relocations can be really huge (it can be tens of millions). In addition to that, there are a lot of different ways that a compiler can be wrong, and catching only one error in the linker doesn't make much sense to me. Why is finding this particular error so important? We generally trust compilers to create sane object files. To be honest, I think we should remove this check completely.

In D44349#1037809, @ruiu wrote:

I doubt that you really need to do that for every invocation of the linker. Generally speaking, relocation handling is what you want to optimize the most because the number of relocations can be really huge (it can be tens of millions). In addition to that, there are a lot of different ways that a compiler can be wrong, and catching only one error in the linker doesn't make much sense to me. Why is finding this particular error so important? We generally trust compilers to create sane object files. To be honest, I think we should remove this check completely.

I'd be OK with doing this in debug builds only perhaps.

The motivating reason is that we are thinking of adding linker relaxation for accessing external global addresses. I was speaking to @mcgrathr about this and he said that the linker (in this case at least) can/should/does check for sanity of the surrounding instructions before replacing/modifying it. So I'd also be OK only doing this when we do relaxation perhaps?

The motivating reason is that we are thinking of adding linker relaxation for accessing external global addresses. I was speaking to @mcgrathr about this and he said that the linker (in this case at least) can/should/does check for sanity of the surrounding instructions before replacing/modifying it. So I'd also be OK only doing this when we do relaxation perhaps?

Maybe. If you already have some piece of data when processing something, and you can easily use that data to verify something, you probably should do that. But I don't think we should do extra validation for the sake of validation. In particular, creating a map and looking it up for each relocation just for validation is too expensive and doesn't feel lld-ish.

So Sam and I had dinner together. :-) I don't know a lot about WebAssembly and Sam doesn't know a lot about traditional (e.g. ELF) linkers, so we talked about the parallels in somewhat general terms while explaining the specifics to each other without distracting from our dessert.
In talking about what WebAssembly wants to do with converting some kinds of references to other kinds, I described the obvious analogy to the various kinds of linker relaxation that have been done in traditional machine code linking such as ELF.
There are two precedents that I can cite in detail off hand. In both cases the reloc type is specified in the ABI as for use with certain instruction patterns and the linker is expected to rewrite instructions with complete knowledge of their meaning, rather than just deliver values into bitfields.
One is x86-64's GOTPCREL and GOTPCRELX relocs, where GNU linkers check for expected opcode bytes immediately prior to the r_offset location and silently forego the instruction-rewriting optimization if the reloc is not used in the expected context.
The other is x86-64's TLS relocs, where GNU linkers check the opcode bytes immediately prior to the r_offset location and diagnose an error if they don't match specific instruction patterns prescribed in the ABI for each reloc that enables linker relaxation (IIRC for some cases the x86 ABI mandates that linkers perform the relaxation, rather than just describing how the optimization is possible).
I don't know off hand how LLD handles these cases, but I would say that matching the GNU linkers' behavior is right--with the possible caveat that perhaps there should be a warning emitted for using GOTPCRELX relocs with instructions the linker doesn't know how to rewrite.
I know that other machines have many more relocs that are prescribed in their ABIs as for use with specific instruction patterns, but I don't know off hand what the details of those are, nor how existing linkers handle situations where such relocs are used with mismatching instructions.
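
The GOTPCRELX case described above can be sketched roughly as follows. This is an illustrative reduction, not GNU ld or lld code: it only recognizes the `mov foo@GOTPCREL(%rip), %reg` form (opcode byte `0x8b` with a RIP-relative ModRM byte) and rewrites it to `lea foo(%rip), %reg` (opcode byte `0x8d`), silently skipping anything else. The names `RelaxResult` and `tryRelaxGotPcRelX` are hypothetical, and a real linker would also recompute the 32-bit displacement so that it refers to the symbol rather than its GOT slot.

```cpp
#include <cstdint>

enum class RelaxResult { Relaxed, Skipped };

// Loc points at the 32-bit displacement field (the r_offset location).
// The caller must guarantee that at least two bytes precede Loc.
static RelaxResult tryRelaxGotPcRelX(uint8_t *Loc) {
  const uint8_t Opcode = Loc[-2];
  const uint8_t ModRM = Loc[-1];

  // RIP-relative addressing has mod == 00 and r/m == 101.
  const bool IsRipRelative = (ModRM & 0xc7) == 0x05;

  // Only `mov r/m, reg` (0x8b) is handled here; any other instruction is
  // left alone, mirroring the "silently forego the optimization" behavior.
  if (Opcode != 0x8b || !IsRipRelative)
    return RelaxResult::Skipped;

  Loc[-2] = 0x8d; // mov -> lea; displacement fix-up is omitted in this sketch
  return RelaxResult::Relaxed;
}
```

The TLS case differs mainly in what happens on a mismatch: instead of returning `Skipped`, the linker would report an error.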

In D44349#1037891, @mcgrathr wrote:

So Sam and I had dinner together. :-) I don't know a lot about WebAssembly and Sam doesn't know a lot about traditional (e.g. ELF) linkers, so we talked about the parallels in somewhat general terms while explaining the specifics to each other without distracting from our dessert.
In talking about what WebAssembly wants to do with converting some kinds of references to other kinds, I described the obvious analogy to the various kinds of linker relaxation that have been done in traditional machine code linking such as ELF.
There are two precedents that I can cite in detail off hand. In both cases the reloc type is specified in the ABI as for use with certain instruction patterns and the linker is expected to rewrite instructions with complete knowledge of their meaning, rather than just deliver values into bitfields.
One is x86-64's GOTPCREL and GOTPCRELX relocs, where GNU linkers check for expected opcode bytes immediately prior to the r_offset location and silently forego the instruction-rewriting optimization if the reloc is not used in the expected context.
The other is x86-64's TLS relocs, where GNU linkers check the opcode bytes immediately prior to the r_offset location and diagnose an error if they don't match specific instruction patterns prescribed in the ABI for each reloc that enables linker relaxation (IIRC for some cases the x86 ABI mandates that linkers perform the relaxation, rather than just describing how the optimization is possible).
I don't know off hand how LLD handles these cases, but I would say that matching the GNU linkers' behavior is right--with the possible caveat that perhaps there should be a warning emitted for using GOTPCRELX relocs with instructions the linker doesn't know how to rewrite.

lld relaxes these relocations assuming that the compiler emits an appropriate instruction at the location where a relocation is applied. We read data from that location if it is needed to relax the relocation, but we don't really verify whether the instruction at that location is a valid one or not. lld just does what it is instructed to do by the compiler when processing relocations, and I like it. I don't think that finding a mismatching instruction for some type of relocation adds practical value to the linker.

I know that other machines have many more relocs that are prescribed in their ABIs as for use with specific instruction patterns, but I don't know off hand what the details of those are, nor how existing linkers handle situations where such relocs are used with mismatching instructions.

Would you be OK with leaving this code in behind ifndef NDEBUG? I think it is of some use during development at least.
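
A minimal sketch of what keeping the check behind `#ifndef NDEBUG` could look like is shown below. The function name `existingValueLooksSane` is hypothetical and the error reporting is simplified to `fprintf`; in lld the check would presumably stay inline in the relocation loop and use `error()`.

```cpp
#include <cstdint>
#include <cstdio>

// Compare the value found at a relocation site with the expected one.
// In release builds (NDEBUG defined) the comparison is compiled out and
// the function always reports success.
static bool existingValueLooksSane(uint32_t Existing, uint32_t Expected) {
#ifndef NDEBUG
  if (Existing != Expected) {
    std::fprintf(stderr, "unexpected existing value: existing=%u expected=%u\n",
                 static_cast<unsigned>(Existing),
                 static_cast<unsigned>(Expected));
    return false;
  }
#else
  (void)Existing;
  (void)Expected;
#endif
  return true;
}
```

With this shape, release builds pay nothing per relocation, which addresses the cost concern raised earlier in the thread, while development builds still catch mismatches.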

Can you make it a completely separate pass in the linker? I think that you can construct a table and visit all relocations (or symbols?) in a separate function rather than doing it while processing some other data.
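
One way to read this suggestion is sketched below: collect the relocation sites and their expected values up front, then verify them all in a single function that is independent of the code applying relocations. Everything here is hypothetical (`PendingReloc`, `readLE32`, `countMismatches`), and only 32-bit little-endian fields are handled to keep the example short.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One relocation site and the value the input object should contain there.
struct PendingReloc {
  size_t Offset;
  uint32_t Expected;
};

// Read a 32-bit little-endian value.
static uint32_t readLE32(const uint8_t *P) {
  return uint32_t(P[0]) | uint32_t(P[1]) << 8 | uint32_t(P[2]) << 16 |
         uint32_t(P[3]) << 24;
}

// Standalone verification pass: returns the number of relocation sites that
// are out of bounds or do not hold their expected value.
static size_t countMismatches(const std::vector<uint8_t> &Buf,
                              const std::vector<PendingReloc> &Relocs) {
  size_t Bad = 0;
  for (const PendingReloc &R : Relocs) {
    const bool InBounds = R.Offset <= Buf.size() && Buf.size() - R.Offset >= 4;
    if (!InBounds || readLE32(Buf.data() + R.Offset) != R.Expected)
      ++Bad;
  }
  return Bad;
}
```

Because the pass only reads the buffer, it could be run or skipped as a unit without touching the relocation-writing loop.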

Diff 137898

wasm/InputChunks.cpp

@@ 55 lines hidden @@ if (Relocations.empty())
     return;
 
   DEBUG(dbgs() << "applying relocations: count=" << Relocations.size() << "\n");
   int32_t Off = OutputOffset - getInputSectionOffset();
 
   for (const WasmRelocation &Rel : Relocations) {
     uint8_t *Loc = Buf + Rel.Offset + Off;
     uint32_t Value = File->calcNewValue(Rel);
+    uint32_t ExistingValue;
     DEBUG(dbgs() << "apply reloc: type=" << ReloctTypeToString(Rel.Type)
                  << " addend=" << Rel.Addend << " index=" << Rel.Index
                  << " value=" << Value << " offset=" << Rel.Offset << "\n");
 
     switch (Rel.Type) {
     case R_WEBASSEMBLY_TYPE_INDEX_LEB:
     case R_WEBASSEMBLY_FUNCTION_INDEX_LEB:
     case R_WEBASSEMBLY_GLOBAL_INDEX_LEB:
     case R_WEBASSEMBLY_MEMORY_ADDR_LEB:
+      ExistingValue = decodeULEB128(Loc);
       encodeULEB128(Value, Loc, 5);
       break;
     case R_WEBASSEMBLY_TABLE_INDEX_SLEB:
     case R_WEBASSEMBLY_MEMORY_ADDR_SLEB:
+      ExistingValue = static_cast<uint32_t>(decodeSLEB128(Loc));
       encodeSLEB128(static_cast<int32_t>(Value), Loc, 5);
       break;
     case R_WEBASSEMBLY_TABLE_INDEX_I32:
     case R_WEBASSEMBLY_MEMORY_ADDR_I32:
+      ExistingValue = static_cast<uint32_t>(read32le(Loc));
       write32le(Loc, Value);
       break;
     default:
       llvm_unreachable("unknown relocation type");
     }
 
+    uint64_t ExpectedValue = File->calcExpectedValue(Rel);
+    if (ExpectedValue != ExistingValue)
+      error("unexpected existing value for " + ReloctTypeToString(Rel.Type) +
+            ": existing=" + Twine(ExistingValue) +
+            " expected=" + Twine(ExpectedValue));
   }
 }
 
 // Copy relocation entries to a given output stream.
 // This function is used only when a user passes "-r". For a regular link,
 // we consume relocations instead of copying them to an output file.
 void InputChunk::writeRelocations(raw_ostream &OS) const {
   if (Relocations.empty())
@@ 34 lines hidden @@

wasm/InputFiles.h

@@ 88 lines hidden @@ public:
 
   // Returns the underlying wasm file.
   const WasmObjectFile *getWasmObj() const { return WasmObj.get(); }
 
   void dumpInfo() const;
 
   uint32_t calcNewIndex(const WasmRelocation &Reloc) const;
   uint32_t calcNewValue(const WasmRelocation &Reloc) const;
+  uint32_t calcExpectedValue(const WasmRelocation &Reloc) const;
 
   const WasmSection *CodeSection = nullptr;
   const WasmSection *DataSection = nullptr;
 
+  // Maps input type indices to output type indices
   std::vector<uint32_t> TypeMap;
   std::vector<bool> TypeIsUsed;
+  // Maps function indices to table indices
+  std::vector<uint32_t> TableEntries;
   std::vector<InputSegment *> Segments;
   std::vector<InputFunction *> Functions;
   std::vector<InputGlobal *> Globals;
 
   ArrayRef<Symbol *> getSymbols() const { return Symbols; }
   Symbol *getSymbol(uint32_t Index) const { return Symbols[Index]; }
   FunctionSymbol *getFunctionSymbol(uint32_t Index) const;
   DataSymbol *getDataSymbol(uint32_t Index) const;
@@ 24 lines hidden @@

wasm/InputFiles.cpp

@@ 54 lines hidden @@
 uint32_t ObjFile::calcNewIndex(const WasmRelocation &Reloc) const {
   if (Reloc.Type == R_WEBASSEMBLY_TYPE_INDEX_LEB) {
     assert(TypeIsUsed[Reloc.Index]);
     return TypeMap[Reloc.Index];
   }
   return Symbols[Reloc.Index]->getOutputSymbolIndex();
 }
 
+// Calculate the value we expect to find at the relocation location.
+// This is used as a sanity check before applying a relocation to a given
+// location. It is useful for catching bugs in the compiler and linker.
+uint32_t ObjFile::calcExpectedValue(const WasmRelocation &Reloc) const {
+  switch (Reloc.Type) {
+  case R_WEBASSEMBLY_TABLE_INDEX_I32:
+  case R_WEBASSEMBLY_TABLE_INDEX_SLEB: {
+    const WasmSymbol& Sym = WasmObj->syms()[Reloc.Index];
+    return TableEntries[Sym.Info.ElementIndex];
+  }
+  case R_WEBASSEMBLY_MEMORY_ADDR_SLEB:
+  case R_WEBASSEMBLY_MEMORY_ADDR_I32:
+  case R_WEBASSEMBLY_MEMORY_ADDR_LEB: {
+    const WasmSymbol& Sym = WasmObj->syms()[Reloc.Index];
+    if (Sym.isUndefined())
+      return 0;
+    const WasmSegment& Segment = WasmObj->dataSegments()[Sym.Info.DataRef.Segment];
+    return Segment.Data.Offset.Value.Int32 + Sym.Info.DataRef.Offset;
+  }
+  case R_WEBASSEMBLY_TYPE_INDEX_LEB:
+    return Reloc.Index;
+  case R_WEBASSEMBLY_FUNCTION_INDEX_LEB:
+  case R_WEBASSEMBLY_GLOBAL_INDEX_LEB: {
+    const WasmSymbol& Sym = WasmObj->syms()[Reloc.Index];
+    return Sym.Info.ElementIndex;
+  }
+  default:
+    llvm_unreachable("unknown relocation type");
+  }
+}
+
 // Translate from the relocation's index into the final linked output value.
 uint32_t ObjFile::calcNewValue(const WasmRelocation &Reloc) const {
   switch (Reloc.Type) {
   case R_WEBASSEMBLY_TABLE_INDEX_I32:
   case R_WEBASSEMBLY_TABLE_INDEX_SLEB:
     return getFunctionSymbol(Reloc.Index)->getTableIndex();
   case R_WEBASSEMBLY_MEMORY_ADDR_SLEB:
   case R_WEBASSEMBLY_MEMORY_ADDR_I32:
@@ 21 lines hidden @@ void ObjFile::parse() {
   if (!Obj)
     fatal(toString(this) + ": not a wasm file");
   if (!Obj->isRelocatableObject())
     fatal(toString(this) + ": not a relocatable wasm file");
 
   Bin.release();
   WasmObj.reset(Obj);
 
+  // Build up a map of function indices to table indices for use when
+  // verifying the existing table index relocations
+  TableEntries.resize(WasmObj->functionTypes().size());
      ncw: Can't you take the address of imported functions - yet functionTypes only contains the defined ones? Something doesn't seem quite right about these indexes, but I'd have to play around with a couple of examples to be sure. I think the code that uses TableEntries above is correct (ie ElementIndex is the right thing to use).
      sbc100 (author): You are right.. i was trying to calculate the max function index here but I failed. I will try again :)
+  for (const WasmElemSegment &Seg : WasmObj->elements()) {
+    if (Seg.Offset.Opcode != WASM_OPCODE_I32_CONST)
+      fatal(toString(this) + ": invalid table elements");
+    uint32_t Offset = Seg.Offset.Value.Int32;
+    for (uint32_t Index = 0; Index < Seg.Functions.size(); Index++) {
+
+      uint32_t FunctionIndex = Seg.Functions[Index];
+      TableEntries[FunctionIndex] = Offset + Index;
+    }
+  }
+
   // Find the code and data sections. Wasm objects can have at most one code
   // and one data section.
   for (const SectionRef &Sec : WasmObj->sections()) {
     const WasmSection &Section = WasmObj->getWasmSection(Sec);
     if (Section.Type == WASM_SEC_CODE)
       CodeSection = &Section;
     else if (Section.Type == WASM_SEC_DATA)
       DataSection = &Section;
@@ 170 lines hidden @@

This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Verify contents of relocations target before writing it
ClosedPublic
