This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
wasm/
-
InputChunks.h
-
InputChunks.cpp
2
InputFiles.cpp

Differential D42176

[WebAssembly] Optimise relocation iteration to remove n^2 loop. NFC.
AbandonedPublic

Authored by ncw on Jan 17 2018, 6:17 AM.

Download Raw Diff

Details

Reviewers

sbc100
ruiu

Summary

Addresses issue: https://github.com/WebAssembly/tool-conventions/issues/32

Previously, for each function/segment we iterated over every relocation to find the relevant ones, which is an n^2 operation. Now, we just make a single pass.

Diff Detail

Repository

rLLD LLVM Linker

Build Status

Buildable 15352
Build 15352: arc lint + arc unit

Event Timeline

ncw created this revision.Jan 17 2018, 6:17 AM

Herald added subscribers: llvm-commits, mgrang, sunfish and 4 others. · View Herald TranscriptJan 17 2018, 6:17 AM

Updated:

Remove copy for already-sorted input

This should be OK to merge (assuming it passes review :) ), doesn't change anything functionally.

Another n^2 loop I've noticed in LLD is the one where the segment is looked up from the data symbol, which is done by iterating over all the segments in order. That will be addressed of course by the symtab changes, which will add a direct data-symbol-to-segment association.

Somewhat orthogonal to this patch, but one of the reasons why lld is so fast is that it doesn't only avoid expensive algorithms such as O(n^2) or O(log n) but also avoids even hash table lookups, though it's O(1). Loading something through two-levels of pointer indirection is very fast, but obtaining the same information through a hash table can be very expensive, as the amount of data the linker has to deal with is huge. Maybe I should write a document about that somewhere in the lld repository, but please keep in mind that that's one of the design principles.

Rebased.

Any objections to this?

(NB. I have LLVM commit access now, so can merge patches myself once reviewed.)

Harbormaster completed remote builds in B15352: Diff 135634.Feb 23 2018, 6:32 AM

ruiu added inline comments.Feb 23 2018, 12:09 PM

wasm/InputFiles.cpp
195	I read this part of code recently and found the current code does something very weird. It copies relocations for no obvious reason. Instead of copying relocations and attach it to a Chunk, you could directly consume mmap'ed relocation section. So, I think "copy" or "set" relocation should be eliminated, instead of making it efficient.

ncw added inline comments.Feb 23 2018, 2:15 PM

wasm/InputFiles.cpp
195	I agree - in fact, I think that's exactly what I've done! I've renamed the methods from "copy" to "set" because the copy is eliminated - see the std::vector in InputChunk that's been replaced with an ArrayRef that points directly into the WasmObjectFile (OK, so WasmObjectFile does a copy to unserialise the relocations, so it's not quite a direct pointer into the input file... but it's closer). I think that with this change, there are no longer any copies of relocations in Wasm-LLD. All we're doing here is giving each InputChunk a single pointer to that chunk's relocs in the InputFile's array.

Ah, sorry. If that's the case, I misunderstood this patch. But why do you have to explicitly set relocations to chunks? Chunks knows which file they are created from, and Files have pointers to relocations, so Chunks doesn't have to be told what relocations they have, no?

In D42176#1017836, @ruiu wrote:

Ah, sorry. If that's the case, I misunderstood this patch. But why do you have to explicitly set relocations to chunks? Chunks knows which file they are created from, and Files have pointers to relocations, so Chunks doesn't have to be told what relocations they have, no?

That's all true... the chunks already know the file, and the file already knows the relocations - but the file stores a big list of all the relocations (in one array). They are sorted by chunk, but when we write out the (non-discarded) chunks we have to know which relocations go with which chunk.

Alternatives:

Don't remember the association between chunks and relocs. Do a binary search for every chunk, to find the relocs that are relevant to it. Could make LLD slower.
Change WasmObjectFile so the reloc list in WasmObjectFile is per-chunk, not per-file; not really any better than this commit, it just pushes the same code into a different class.
Change the actual Wasm file format, so we store relocs per-chunk on disk, rather than per-file (Sam was suggesting this one actually). Could result in marginally higher memory usage? I'm not too bothered by the current format; relocs in one list sorted by chunk seems a reasonable file format.

Hmm, I think I'm confused. IIUC, a wasm object file has single .data and .code section, and there is one relocation section for each section.

A "section" in wasm is actually not an atomic unit of inclusion/exclusion. It can be split into multiple "chunks", and "chunks" is a unit of garbage collection.

But why does that two-level abstraction needed? Why can't everything just a section?

In D42176#1018179, @ruiu wrote:

Hmm, I think I'm confused. IIUC, a wasm object file has single .data and .code section, and there is one relocation section for each section.

A "section" in wasm is actually not an atomic unit of inclusion/exclusion. It can be split into multiple "chunks", and "chunks" is a unit of garbage collection.

Yes - I think that's right.

But why does that two-level abstraction needed? Why can't everything just a section?

I don't follow what your alternative suggestion is ("why can't everything just a section"). Possibly one of these:

WasmObjectFile should implement llvm::object::ObjectFile to return an llvm::object::SectionRef for each chunk. Then there would be a one-to-one mapping between the LLVM "sections" and the LLD "chunks". You don't have any opinion on how Wasm files are encoded on disk - you're simply asking for the code I've written to essentially be pushed down into WasmObjectFile, so that LLD doesn't have to handle lists of relocations.
Or, maybe you are concerned about the Wasm file encoding on disk - you want to change that so that relocations are stored per chunk/function. You don't care about what WasmObjectFile's sections iterate over, as long as each input chunk (function etc) has a list of relocations associated with it by WasmObjectFile, so that LLD doesn't have to track that. I think this is what Sam was suggesting on GitHub.
Or maybe you're worried about both? The file format and the section abstraction are both wrong to you.

I don't have a strong opinion - I mean functionally they all end up doing the same thing! It's just a matter of shuffling around/renaming some code, to make future maintenance less confusing.

I'll go with whatever other people want - this PR is just a quick win that gets rid of an unoptimised loop, while avoiding changing the file format, and without redefining what "section" currently means in the context of Wasm. With those constraints, I don't think there's that much that could be done differently...

I suggest we move to storing the relocations on per chunk basis. There is already a bug open to do this: https://github.com/WebAssembly/tool-conventions/issues/32. I also don't it makes much sense to try to fix this performance issue in the mean time.

This may be a silly question, but what is the distinction of Chunk and Section? Chunk seems like a section in a section, so I wonder why wasm object files is designed that way. In ELF and COFF, everything is a section, and section is a atomic unit of linking.

In D42176#1021076, @ruiu wrote:

This may be a silly question, but what is the distinction of Chunk and Section? Chunk seems like a section in a section, so I wonder why wasm object files is designed that way. In ELF and COFF, everything is a section, and section is a atomic unit of linking.

In a wasm binary all code is in a single section (WASM_SEC_CODE) and all data is defined in single section (WASM_SEC_DATA). Within the code section there are N functions and within the data section there are N data *segments*. Things are not as homogeneous as in ELF were everything is just bytes and virtual addresses. We use the term "chunk" there as an abstraction that linker can use to handle both functions bodies and data segements, even though in the file format these are no really the same thing.

We could have gone a different route for the intermediate object and allowed multiple code sections, but then none of the existing wasm tools would load the file (i.e. they would no longer we valid wasm files) and I'm not sure this would have made the linker any simpler, since we would still not be able to match ELF in its ability to blindly concatenate everything.

We could have gone a different route for the intermediate object and allowed multiple code sections, but then none of the existing wasm tools would load the file (i.e. they would no longer we valid wasm files) and I'm not sure this would have made the linker any simpler, since we would still not be able to match ELF in its ability to blindly concatenate everything.

I don't think I agree with this point. Object files and executable files are different kind of files, so the requirement that executables must have one text and one data section doesn't have to be applied to object files. If you apply that, then you would have to come up with a way to workaround -- and that's the "chunk". I'm not particularly interested in making wasm as much similar as ELF because they are naturally different, but "chunk" is beyond that -- it just feels more complicated than necessary. I strongly believe that everything can be (and should be) sections in object files.

In D42176#1021117, @ruiu wrote:

We could have gone a different route for the intermediate object and allowed multiple code sections, but then none of the existing wasm tools would load the file (i.e. they would no longer we valid wasm files) and I'm not sure this would have made the linker any simpler, since we would still not be able to match ELF in its ability to blindly concatenate everything.

I don't think I agree with this point. Object files and executable files are different kind of files, so the requirement that executables must have one text and one data section doesn't have to be applied to object files. If you apply that, then you would have to come up with a way to workaround -- and that's the "chunk". I'm not particularly interested in making wasm as much similar as ELF because they are naturally different, but "chunk" is beyond that -- it just feels more complicated than necessary. I strongly believe that everything can be (and should be) sections in object files.

I understand that executables and objects are different and we are not striving to make the object files actually run-able, but we tried to make them into at least valid wasm files (with extra metadata sections).

If you feel strongly, perhaps we should have a longer discussion about that. An issue in https://github.com/WebAssembly/tool-conventions/ is probably the best place for that. I guess we could still revisit this.

We can't change WebAssembly - it's too late for that. What seems to be the disagreement here is that the Wasm spec has chosen the word "section" to describe a different concept to what "section" means in ELF. We're stuck with that!

Rui's suggesting changing the Wasm object format, so that each function gets its own Wasm "section". But that's not going to make the format any better, the thing that's called a "section" in Wasm isn't designed for holding individual functions. Just because it's called a "section" in the Wasm spec, has got nothing to do with whether it means the same thing to the linker that an ELF "section" does.

I think we just have to accept that the ELF and Wasm specs have chosen to use terminology differently. That's not even a defect in those specs, the Wasm meaning of "section" is perfectly in line with how it's used in other file formats to represent a portion of the data in the file.

To resolve this, I think it's just a question of how terminology and naming should be used in LLD. We have total freedom to change the LLD class names, so if Rui really wants LLD to operate on "sections" then it can, just with some renaming! But, we can't change the Wasm format itself without a good reason, since there are downstream projects already using these files.

Non-exhaustive options:

Status quo. Accept the difference, and use "chunk" where the ELF linker uses "section", and use "section" in the Wasm code with the same meaning it has in the Wasm spec.
Rename the Wasm code to avoid the contentious "section" terminology entirely. Come up with a different name for Wasm "sections" so no-one is tempted into thinking it's supposed to be the unit of linking like it is in ELF.
Rename the Wasm code to use the terminology "section" to match the ELF linker (ie rename InputChunk -> InputSection), with a comment explaining that InputSection doesn't represent the thing that's called a "section" in the Wasm spec.

TLDR:

I agree that everything in the linker should be "sections" - and that's already how it works, it's just that that abstraction is called a "chunk" currently in the Wasm code, because "section" is taken to mean something else in Wasm. Surely it already is what you want it to be, apart from the fact it's called "chunk". The fact that Wasm has something called a "section" doesn't mean that's what Wasm object files should be using for storing each function. We can do whatever we want with the terminology in the code to fix the situation, but the file format itself isn't the problem.

Maybe you are right. I didn't have time to take a look and understand the wasm file format (that's why I didn't send any comments to https://github.com/WebAssembly/tool-conventions yet), but by reading wasm/InputChunks.h and wasm/InputFiles.cpp again, there's no two-level containers in the file format. What I found a bit confusing is the name of "InputSegment" -- that sounds like it is some base class, but it is actually a leaf class for data segment.

To be clear, I'm not suggesting to make wasm look simliar to ELF. I'm of course not complaining that wasm is different from ELF. All file formats should have equal rights in lld. :) When I say something seems weird, that's just that. I don't think wasm needs to be similar to ELF because the two file formats are naturally different for a good reason. That's why you guys have invented a new file format in the first place.

If "section" is used for other stuff in wasm, we shouldn't use the same terminology for something else to avoid confusion. I think this is more like a problem of lack of documentation or comments in wasm lld. Maybe we should expand the file comment of InputChunk.h to describe what Chunk is and how a file consists with chunks. I'll try to do that.

I also think that we should rename InputSegment InputData. A function is called InputFunction, so it is I believe a better name.

As to the spec, how much is it fixed at the moment? You said that it's too late, but IIRC, Sam is trying to emit relocation per chunk, instead of emitting a single relocation per file. That's not a file format change?

In D42176#1024134, @ruiu wrote:

Maybe you are right. I didn't have time to take a look and understand the wasm file format (that's why I didn't send any comments to https://github.com/WebAssembly/tool-conventions yet), but by reading wasm/InputChunks.h and wasm/InputFiles.cpp again, there's no two-level containers in the file format. What I found a bit confusing is the name of "InputSegment" -- that sounds like it is some base class, but it is actually a leaf class for data segment.

To be clear, I'm not suggesting to make wasm look simliar to ELF. I'm of course not complaining that wasm is different from ELF. All file formats should have equal rights in lld. :) When I say something seems weird, that's just that. I don't think wasm needs to be similar to ELF because the two file formats are naturally different for a good reason. That's why you guys have invented a new file format in the first place.

If "section" is used for other stuff in wasm, we shouldn't use the same terminology for something else to avoid confusion. I think this is more like a problem of lack of documentation or comments in wasm lld. Maybe we should expand the file comment of InputChunk.h to describe what Chunk is and how a file consists with chunks. I'll try to do that.

I also think that we should rename InputSegment InputData. A function is called InputFunction, so it is I believe a better name.

The term "data segment" in the wasm world has a specific meaning. It refers to the things in the data section. The data section is made up of segments. I know this is directly counter to the meaning in the ELF world but it make some sense. Each wasm data segment is a contiguous indivisible run of of bytes. InputSegment already has a comment specifying why its called what it is, and why it might be counter intuitive to someone coming from ELF.

I think the name InputSegment makes sense when this is understood, but I don't really might what we call it.

Thanks for looking into improving the comments! :)

In D42176#1024151, @ruiu wrote:

As to the spec, how much is it fixed at the moment? You said that it's too late, but IIRC, Sam is trying to emit relocation per chunk, instead of emitting a single relocation per file. That's not a file format change?

The Wasm core sections are very fixed, they're used by browsers (Firefox, Chrome, Edge, Safari...) and that boat has sailed. So that includes the way functions and globals are encoded. The bits specific to object files (like relocations) are under our control, and we can still change those. And of course, object files don't have to use a FUNCTION section for functions - but not doing so would be counterintuitive. The FUNCTION section we have no control over, that's set in stone.

In D42176#1025688, @ncw wrote:

Thanks for looking into improving the comments! :)

In D42176#1024151, @ruiu wrote:

As to the spec, how much is it fixed at the moment? You said that it's too late, but IIRC, Sam is trying to emit relocation per chunk, instead of emitting a single relocation per file. That's not a file format change?

The Wasm core sections are very fixed, they're used by browsers (Firefox, Chrome, Edge, Safari...) and that boat has sailed. So that includes the way functions and globals are encoded. The bits specific to object files (like relocations) are under our control, and we can still change those. And of course, object files don't have to use a FUNCTION section for functions - but not doing so would be counterintuitive. The FUNCTION section we have no control over, that's set in stone.

While you are correct that we can't change the executable format, the object file format, as you say, is under our control, so I think think should rely on the "we can't change it" argument too strongly.

We could, for example, have an object format the contains one code section per function and one data section per segment. I'm not suggesting we do that.. but its conceivable. It would even be pretty easy to have existing tools such as wabt support this (by parsing multiple code and data sections as if they were one).

In D42176#1025779, @sbc100 wrote:

We could, for example, have an object format the contains one code section per function and one data section per segment. I'm not suggesting we do that.. but its conceivable. It would even be pretty easy to have existing tools such as wabt support this (by parsing multiple code and data sections as if they were one).

It wouldn't be a valid Wasm file if it had more than one CODE or DATA section. We'd have to use a custom section if we wanted to do that; it would be an abuse to reuse existing official Wasm sections in a way that doesn't the specified syntax. You're right it's possible, my objection is that it just feels odd to use the Wasm container but not use its way of enclosing functions/data, since those are the core things that we're trying to work with.

Abandoned in favour of https://reviews.llvm.org/D51063

Revision Contents

Path

Size

wasm/

InputChunks.h

18 lines

InputChunks.cpp

10 lines

InputFiles.cpp

46 lines

Diff 135634

wasm/InputChunks.h

Show First 20 Lines • Show All 44 Lines • ▼ Show 20 Lines
class InputChunk {		class InputChunk {
public:		public:
enum Kind { DataSegment, Function };		enum Kind { DataSegment, Function };

Kind kind() const { return SectionKind; }		Kind kind() const { return SectionKind; }

uint32_t getSize() const { return data().size(); }		uint32_t getSize() const { return data().size(); }

void copyRelocations(const WasmSection &Section);

void writeTo(uint8_t *SectionStart) const;		void writeTo(uint8_t *SectionStart) const;

ArrayRef<WasmRelocation> getRelocations() const { return Relocations; }		ArrayRef<WasmRelocation> getRelocations() const { return Relocations; }
		void setRelocations(ArrayRef<WasmRelocation> Rs) { Relocations = Rs; }

virtual StringRef getComdat() const = 0;		virtual StringRef getComdat() const = 0;
virtual StringRef getName() const = 0;		virtual StringRef getName() const = 0;
		virtual uint32_t getInputSectionOffset() const = 0;

size_t NumRelocations() const { return Relocations.size(); }		size_t NumRelocations() const { return Relocations.size(); }
void writeRelocations(llvm::raw_ostream &OS) const;		void writeRelocations(llvm::raw_ostream &OS) const;

ObjFile *File;		ObjFile *File;
int32_t OutputOffset = 0;		int32_t OutputOffset = 0;

// Signals that the section is part of the output. The garbage collector,		// Signals that the section is part of the output. The garbage collector,
// and COMDAT handling can set a sections' Live bit.		// and COMDAT handling can set a sections' Live bit.
// If GC is disabled, all sections start out as live by default.		// If GC is disabled, all sections start out as live by default.
unsigned Live : 1;		unsigned Live : 1;

protected:		protected:
InputChunk(ObjFile *F, Kind K)		InputChunk(ObjFile *F, Kind K)
: File(F), Live(!Config->GcSections), SectionKind(K) {}		: File(F), Live(!Config->GcSections), SectionKind(K) {}
virtual ~InputChunk() = default;		virtual ~InputChunk() = default;
virtual ArrayRef<uint8_t> data() const = 0;		virtual ArrayRef<uint8_t> data() const = 0;
virtual uint32_t getInputSectionOffset() const = 0;

std::vector<WasmRelocation> Relocations;		ArrayRef<WasmRelocation> Relocations;
Kind SectionKind;		Kind SectionKind;
};		};

// Represents a WebAssembly data segment which can be included as part of		// Represents a WebAssembly data segment which can be included as part of
// an output data segments. Note that in WebAssembly, unlike ELF and other		// an output data segments. Note that in WebAssembly, unlike ELF and other
// formats, used the term "data segment" to refer to the continous regions of		// formats, used the term "data segment" to refer to the continous regions of
// memory that make on the data section. See:		// memory that make on the data section. See:
// https://webassembly.github.io/spec/syntax/modules.html#syntax-data		// https://webassembly.github.io/spec/syntax/modules.html#syntax-data
Show All 16 Lines	public:
void setOutputSegment(const OutputSegment *Segment, uint32_t Offset) {		void setOutputSegment(const OutputSegment *Segment, uint32_t Offset) {
OutputSeg = Segment;		OutputSeg = Segment;
OutputSegmentOffset = Offset;		OutputSegmentOffset = Offset;
}		}

uint32_t getAlignment() const { return Segment.Data.Alignment; }		uint32_t getAlignment() const { return Segment.Data.Alignment; }
StringRef getName() const override { return Segment.Data.Name; }		StringRef getName() const override { return Segment.Data.Name; }
StringRef getComdat() const override { return Segment.Data.Comdat; }		StringRef getComdat() const override { return Segment.Data.Comdat; }
		uint32_t getInputSectionOffset() const override {
		return Segment.SectionOffset;
		}

int32_t OutputSegmentOffset = 0;		int32_t OutputSegmentOffset = 0;

protected:		protected:
ArrayRef<uint8_t> data() const override { return Segment.Data.Content; }		ArrayRef<uint8_t> data() const override { return Segment.Data.Content; }
uint32_t getInputSectionOffset() const override {
return Segment.SectionOffset;
}

const WasmSegment &Segment;		const WasmSegment &Segment;
const OutputSegment *OutputSeg = nullptr;		const OutputSegment *OutputSeg = nullptr;
};		};

// Represents a single wasm function within and input file. These are		// Represents a single wasm function within and input file. These are
// combined to create the final output CODE section.		// combined to create the final output CODE section.
class InputFunction : public InputChunk {		class InputFunction : public InputChunk {
public:		public:
InputFunction(const WasmSignature &S, const WasmFunction *Func,		InputFunction(const WasmSignature &S, const WasmFunction *Func,
ObjFile *F)		ObjFile *F)
: InputChunk(F, InputChunk::Function), Signature(S), Function(Func) {}		: InputChunk(F, InputChunk::Function), Signature(S), Function(Func) {}

static bool classof(const InputChunk *C) {		static bool classof(const InputChunk *C) {
return C->kind() == InputChunk::Function;		return C->kind() == InputChunk::Function;
}		}

StringRef getName() const override { return Function->Name; }		StringRef getName() const override { return Function->Name; }
StringRef getComdat() const override { return Function->Comdat; }		StringRef getComdat() const override { return Function->Comdat; }
		uint32_t getInputSectionOffset() const override {
		return Function->CodeSectionOffset;
		}
uint32_t getOutputIndex() const { return OutputIndex.getValue(); }		uint32_t getOutputIndex() const { return OutputIndex.getValue(); }
bool hasOutputIndex() const { return OutputIndex.hasValue(); }		bool hasOutputIndex() const { return OutputIndex.hasValue(); }
void setOutputIndex(uint32_t Index);		void setOutputIndex(uint32_t Index);
uint32_t getTableIndex() const { return TableIndex.getValue(); }		uint32_t getTableIndex() const { return TableIndex.getValue(); }
bool hasTableIndex() const { return TableIndex.hasValue(); }		bool hasTableIndex() const { return TableIndex.hasValue(); }
void setTableIndex(uint32_t Index);		void setTableIndex(uint32_t Index);

const WasmSignature &Signature;		const WasmSignature &Signature;

protected:		protected:
ArrayRef<uint8_t> data() const override {		ArrayRef<uint8_t> data() const override {
return File->CodeSection->Content.slice(getInputSectionOffset(),		return File->CodeSection->Content.slice(getInputSectionOffset(),
Function->Size);		Function->Size);
}		}
uint32_t getInputSectionOffset() const override {
return Function->CodeSectionOffset;
}

const WasmFunction *Function;		const WasmFunction *Function;
llvm::Optional<uint32_t> OutputIndex;		llvm::Optional<uint32_t> OutputIndex;
llvm::Optional<uint32_t> TableIndex;		llvm::Optional<uint32_t> TableIndex;
};		};

class SyntheticFunction : public InputFunction {		class SyntheticFunction : public InputFunction {
public:		public:
Show All 19 Lines

wasm/InputChunks.cpp

	Show All 27 Lines
	}			}

	uint32_t InputSegment::translateVA(uint32_t Offset) const {			uint32_t InputSegment::translateVA(uint32_t Offset) const {
	assert(Offset <= getSize());			assert(Offset <= getSize());
	DEBUG(dbgs() << "translateVA: " << getName() << " Offset=" << Offset << "\n");			DEBUG(dbgs() << "translateVA: " << getName() << " Offset=" << Offset << "\n");
	return OutputSeg->StartVA + OutputSegmentOffset + Offset;			return OutputSeg->StartVA + OutputSegmentOffset + Offset;
	}			}

	void InputChunk::copyRelocations(const WasmSection &Section) {
	if (Section.Relocations.empty())
	return;
	size_t Start = getInputSectionOffset();
	size_t Size = getSize();
	for (const WasmRelocation &R : Section.Relocations)
	if (R.Offset >= Start && R.Offset < Start + Size)
	Relocations.push_back(R);
	}

	// Copy this input chunk to an mmap'ed output file and apply relocations.			// Copy this input chunk to an mmap'ed output file and apply relocations.
	void InputChunk::writeTo(uint8_t *Buf) const {			void InputChunk::writeTo(uint8_t *Buf) const {
	// Copy contents			// Copy contents
	memcpy(Buf + OutputOffset, data().data(), data().size());			memcpy(Buf + OutputOffset, data().data(), data().size());

	// Apply relocations			// Apply relocations
	if (Relocations.empty())			if (Relocations.empty())
	return;			return;
	▲ Show 20 Lines • Show All 69 Lines • Show Last 20 Lines

wasm/InputFiles.cpp

	Show First 20 Lines • Show All 185 Lines • ▼ Show 20 Lines
	GlobalSymbol *ObjFile::getGlobalSymbol(uint32_t Index) const {			GlobalSymbol *ObjFile::getGlobalSymbol(uint32_t Index) const {
	return cast<GlobalSymbol>(Symbols[Index]);			return cast<GlobalSymbol>(Symbols[Index]);
	}			}

	DataSymbol *ObjFile::getDataSymbol(uint32_t Index) const {			DataSymbol *ObjFile::getDataSymbol(uint32_t Index) const {
	return cast<DataSymbol>(Symbols[Index]);			return cast<DataSymbol>(Symbols[Index]);
	}			}

				template<class T>
				static void setRelocs(const std::vector<T *> &Chunks,
				ruiuUnsubmitted Not Done Reply Inline Actions I read this part of code recently and found the current code does something very weird. It copies relocations for no obvious reason. Instead of copying relocations and attach it to a Chunk, you could directly consume mmap'ed relocation section. So, I think "copy" or "set" relocation should be eliminated, instead of making it efficient. ruiu: I read this part of code recently and found the current code does something very weird. It…
				ncwAuthorUnsubmitted Not Done Reply Inline Actions I agree - in fact, I think that's exactly what I've done! I've renamed the methods from "copy" to "set" because the copy is eliminated - see the std::vector in InputChunk that's been replaced with an ArrayRef that points directly into the WasmObjectFile (OK, so WasmObjectFile does a copy to unserialise the relocations, so it's not quite a direct pointer into the input file... but it's closer). I think that with this change, there are no longer any copies of relocations in Wasm-LLD. All we're doing here is giving each InputChunk a single pointer to that chunk's relocs in the InputFile's array. ncw: I agree - in fact, I think that's exactly what I've done! I've renamed the methods from "copy"…
				const WasmSection *Section) {
				if (!Section)
				return;

				ArrayRef<WasmRelocation> Relocs = Section->Relocations;
				assert(std::is_sorted(
				Relocs.begin(), Relocs.end(),
				[](const WasmRelocation &R1, const WasmRelocation &R2) {
				return R1.Offset < R2.Offset;
				}));
				assert(std::is_sorted(
				Chunks.begin(), Chunks.end(),
				[](InputChunk C1, InputChunk C2) {
				return C1->getInputSectionOffset() < C2->getInputSectionOffset();
				}));

				auto RelocsNext = Relocs.begin();
				auto RelocsEnd = Relocs.end();
				auto RelocLess = [](const WasmRelocation &R, uint32_t Val) {
				return R.Offset < Val;
				};
				for (InputChunk *C : Chunks) {
				auto RelocsStart = std::lower_bound(RelocsNext, RelocsEnd,
				C->getInputSectionOffset(), RelocLess);
				RelocsNext = std::lower_bound(RelocsStart, RelocsEnd,
				C->getInputSectionOffset() + C->getSize(),
				RelocLess);
				C->setRelocations(ArrayRef<WasmRelocation>(RelocsStart, RelocsNext));
				}
				}

	void ObjFile::initializeSymbols() {			void ObjFile::initializeSymbols() {
	Symbols.reserve(WasmObj->getNumberOfSymbols());			Symbols.reserve(WasmObj->getNumberOfSymbols());

	NumFunctionImports = WasmObj->getNumImportedFunctions();			NumFunctionImports = WasmObj->getNumImportedFunctions();
	NumGlobalImports = WasmObj->getNumImportedGlobals();			NumGlobalImports = WasmObj->getNumImportedGlobals();

	ArrayRef<WasmFunction> Funcs = WasmObj->functions();			ArrayRef<WasmFunction> Funcs = WasmObj->functions();
	ArrayRef<uint32_t> FuncTypes = WasmObj->functionTypes();			ArrayRef<uint32_t> FuncTypes = WasmObj->functionTypes();
	ArrayRef<WasmSignature> Types = WasmObj->types();			ArrayRef<WasmSignature> Types = WasmObj->types();

	for (const auto &C : WasmObj->comdats())			for (const auto &C : WasmObj->comdats())
	Symtab->addComdat(C, this);			Symtab->addComdat(C, this);

	for (const WasmSegment &S : WasmObj->dataSegments()) {			for (const WasmSegment &S : WasmObj->dataSegments())
	InputSegment *Seg = make<InputSegment>(S, this);			Segments.emplace_back(make<InputSegment>(S, this));
	Seg->copyRelocations(*DataSection);			setRelocs(Segments, DataSection);
	Segments.emplace_back(Seg);
	}

	for (const WasmGlobal &G : WasmObj->globals())			for (const WasmGlobal &G : WasmObj->globals())
	Globals.emplace_back(make<InputGlobal>(G));			Globals.emplace_back(make<InputGlobal>(G));

	for (size_t I = 0; I < Funcs.size(); ++I) {			for (size_t I = 0; I < Funcs.size(); ++I) {
	const WasmFunction &Func = Funcs[I];			const WasmFunction &Func = Funcs[I];
	const WasmSignature &Sig = Types[FuncTypes[I]];			const WasmSignature &Sig = Types[FuncTypes[I]];
	InputFunction *F = make<InputFunction>(Sig, &Func, this);			Functions.emplace_back(make<InputFunction>(Sig, &Func, this));
	F->copyRelocations(*CodeSection);
	Functions.emplace_back(F);
	}			}
				setRelocs(Functions, CodeSection);

	// Populate `Symbols` based on the WasmSymbols in the object			// Populate `Symbols` based on the WasmSymbols in the object
	for (const SymbolRef &Sym : WasmObj->symbols()) {			for (const SymbolRef &Sym : WasmObj->symbols()) {
	const WasmSymbol &WasmSym = WasmObj->getWasmSymbol(Sym.getRawDataRefImpl());			const WasmSymbol &WasmSym = WasmObj->getWasmSymbol(Sym.getRawDataRefImpl());
	bool IsDefined = WasmSym.isDefined();			bool IsDefined = WasmSym.isDefined();

	if (IsDefined) {			if (IsDefined) {
	switch (WasmSym.Info.Kind) {			switch (WasmSym.Info.Kind) {
	▲ Show 20 Lines • Show All 121 Lines • Show Last 20 Lines