This is an archive of the discontinued LLVM Phabricator instance.

Paths

COFF/
    Chunks.h
    Chunks.cpp
    Writer.h
    Writer.cpp
test/COFF/
    Inputs/
        far-arm-thumb-abs.s
        far-arm-thumb-abs20.s
    arm-thumb-branch-error.s
    arm-thumb-branch20-error.s
    arm-thumb-thunks.s

Differential D51089

[LLD] [COFF] Add support for creating range extension thunks for ARM
Abandoned · Public

Authored by mstorsjo on Aug 22 2018, 2:14 AM.

Details

Reviewers

ruiu
peter.smith
pcc
rnk
javed.absar

Summary

This is a feature that MS link.exe lacks; it currently errors out on such relocations, just like lld did before.

This allows linking clang.exe for ARM - practically, any image over 16 MB will likely run into the issue.

Event Timeline

mstorsjo created this revision. Aug 22 2018, 2:14 AM

Herald added a reviewer: javed.absar. Aug 22 2018, 2:14 AM

Herald added subscribers: chrib, kristof.beyls.

Just FWIW, this change plus D51032 seems to be enough to build a working (at least for small trivial examples) clang+lld for ARM/Windows (AArch64 seems to work just fine as is).

I've left a couple of comments where I think that there may be some unintentional inefficiencies, but as far as I can tell it looks like it will be correct. I suggest adding a lot more test cases as you go for things like thunk reuse.

COFF/Writer.cpp
355	Assuming getRVA() is the virtual address of the symbol, is Target->getRVA() stable between passes? Presumably if thunks are inserted then assignAddresses() may cause some symbols to change address? I'm not too familiar with the COFF code base so I could be missing something here. If I'm right the reuse between passes may not work as well as it could do.
407	In theory if you are iterating a fixed number (Chunks.size()) of the Chunks vector then inserting thunks into the Chunks vector in the same loop will mean that Chunks near the end may not be scanned for Thunks. Given that the algorithm will only terminate when 0 thunks are inserted you'll eventually scan all of them but it may cost you more passes than you would need if you inserted all Thunks in one go. I think you'll be unlikely to hit 10 passes without a contrived test case though.

In D51089#1209038, @peter.smith wrote:

I've left a couple of comments where I think that there may be some unintentional inefficiencies, but as far as I can tell it looks like it will be correct. I suggest adding a lot more test cases as you go for things like thunk reuse.

Thanks for taking a look!

Yup, some more tests definitely would be good. Do you have any suggestions on how to test things like this efficiently without creating >16 MB binaries? The existing tests (that used to check for errors) just use absolute symbols as targets to make it fail. I see that the ELF tests either do padding with .space or huge alignment with .balign - I guess something like that would work here as well. And I see that the existing ELF tests also produce huge binaries (although they hopefully are stored sparsely).

COFF/Writer.cpp
355	No, the RVAs aren't stable between passes, but we don't keep the map between passes either; it's a local variable in createThunks below. I guess it would be useful to allow this to find a different thunk from the previous pass (that would also reduce the amount of changes in later passes, reducing the number of passes required before it converges), but that's not implemented (yet).
407	Hmm, I'm not quite sure I understand what you mean here. You mean that since I'm adding more elements to the Chunks vector, I'd miss the last few ones that were pushed forward? The limit on the outer loop, on line 379, explicitly checks for Chunks.size(), so it will loop until the very end of the vector, even if Chunks grows meanwhile.
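
The point about the loop bound can be illustrated with a small standalone sketch. This is not the patch's code: `scanAndInsert` and the integer "chunks" are hypothetical stand-ins. Because the loop condition re-reads `Chunks.size()` on every iteration, elements pushed towards the end by an insertion are still visited in the same pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a chunk list: positive values are ordinary
// chunks, and -1 marks a thunk inserted during the scan.
// Returns how many ordinary chunks were visited in one pass.
inline size_t scanAndInsert(std::vector<int> &Chunks) {
  size_t Visited = 0;
  // The bound is re-evaluated on each iteration, so it follows the growth
  // of the vector caused by the insertions below.
  for (size_t I = 0; I != Chunks.size(); ++I) {
    if (Chunks[I] < 0)
      continue; // Skip thunks inserted earlier in this pass.
    ++Visited;
    // Pretend that every even-valued chunk needs a thunk right after it.
    if (Chunks[I] % 2 == 0)
      Chunks.insert(Chunks.begin() + I + 1, -1);
  }
  return Visited;
}
```

Caching `Chunks.size()` in a local before the loop would instead stop at the old end of the vector and leave the displaced chunks for a later pass.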

Sadly I had to resort to using .space to create large binaries. Creating the binaries is usually quick; disassembling them is unfortunately not. I tended to use GNU objdump first, as that skips zeros by default, to find the address ranges I needed, then used --start-address and --stop-address in llvm-objdump to pull out the bits that I needed.

In ELF we can use linker scripts for quite a few of the cases, although linker scripts make some corner cases possible that wouldn't exist otherwise. You may be able to do most of your tests with the conditional branch (±1 MB). I didn't do that with ELF, as the ThunkSections were placed at 16 MB intervals because the vast majority of the relocations we would encounter were ±16 MB.

COFF/Writer.cpp
407	Apologies, I had it in my mind that Chunks.size() would only be calculated once per pass.

mstorsjo added inline comments. Aug 24 2018, 12:01 PM

COFF/Chunks.cpp
54	Does anyone have an opinion on the mechanism of overriding what symbol an individual reloc points to? Here I provide a full vector of symbols (which can't be initialized directly but after all symbols actually exist) - an alternative would be e.g. a DenseMap to only provide the individual symbols that are overridden. Or something else?
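
To make the question concrete, here is a minimal sketch of the "full vector" option. The types are simplified, hypothetical stand-ins rather than the real lld classes: each relocation gets one slot in `RelocTargets`, filled from its symbol table index once all symbols exist, and a thunk is applied by overwriting that slot while the relocation record itself stays untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified, hypothetical stand-ins for the linker's types.
struct Symbol { uint64_t RVA; };
struct Reloc { uint32_t SymbolTableIndex; };

struct Section {
  std::vector<Reloc> Relocs;
  // One entry per relocation; initially the symbol named by the
  // relocation's symbol table index.
  std::vector<Symbol *> RelocTargets;

  // Fill RelocTargets; must run after all symbols exist.
  void readRelocTargets(const std::vector<Symbol *> &SymTab) {
    RelocTargets.reserve(Relocs.size());
    for (const Reloc &R : Relocs) {
      assert(R.SymbolTableIndex < SymTab.size());
      RelocTargets.push_back(SymTab[R.SymbolTableIndex]);
    }
  }

  // Redirect relocation I to a thunk; the relocation record is unchanged.
  void redirect(size_t I, Symbol *Thunk) { RelocTargets[I] = Thunk; }
};
```

The map-based alternative would store only the redirected entries and fall back to the symbol table index on a miss, trading memory for an extra lookup per relocation.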

smeenai added a subscriber: smeenai. Aug 24 2018, 12:50 PM

ruiu added inline comments. Aug 26 2018, 9:52 PM

COFF/Chunks.cpp
51–52	Can this happen?
53–54	Since this vector can be very large, it is perhaps better to call `reserve()`.
54	I think this vector should be fine because this vector will be used very heavily and a vector lookup is extremely fast.
431–432	It is more straightforward to write this loop as a plain old `for` loop instead of a range-based for loop with `Counter`.
COFF/Chunks.h
231	Maybe something like `RelocTargets` is better? `Symbols` looks like it represents symbol table contents. Please add a comment to explain why we want to cache relocation targets in this table (i.e. we need to modify relocation targets when a relocation is redirected to other symbol due to thunk insertion.)
COFF/ICF.cpp
150 ↗	(On Diff #161900)	I think this comment is hard to understand unless you know it is written in the context of ARM thunk support. Maybe we can just omit it? Does ICF run after ARM thunk creation?
178 ↗	(On Diff #161900)	Ditto.
COFF/Writer.cpp
338	nit: move this assert at the beginning of this function.
359	I don't know if your above comment is true, but if the one this loop is looking for is likely at the end of the vector, I'd search in the reverse order. I.e. for (Defined *Sym : llvm::reverse(TargetThunks))
361	Can this just be `return {Sym, false}`?
362	nit: omit {}
366	I don't think you need to pass this `ThunkCounter` around; I'd define this as a static local variable in this function.
369	Error messages should start with a lowercase letter. Shouldn't this be an `assert`? If this can be triggered by a valid user input, this error message should contain more info as to what is the problem.
371	Maybe `return {D, true}`
375	std::map is an ordered map and usually much slower than DenseMap, so please use DenseMap.
384	nit: please insert a blank line before a comment.
405	Nesting is too deep. Please consider splitting to multiple functions.
428	Adding -> adding
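
Several of the Writer.cpp suggestions above (reverse search, returning `{Sym, false}` / `{D, true}`, a hash map instead of `std::map`) can be combined in one sketch. Everything here is an illustrative assumption: `Thunk`, `ThunkMap`, and this `getThunk` signature are hypothetical, `std::unordered_map` stands in for `llvm::DenseMap` so the example builds without LLVM, and placing the new thunk at the caller's address is only a placeholder.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

struct Thunk { uint64_t RVA; uint64_t Target; };

// Maps a target address to the thunks created for it so far.
// std::unordered_map stands in for llvm::DenseMap here.
using ThunkMap = std::unordered_map<uint64_t, std::vector<Thunk *>>;

inline bool isInRange(uint64_t From, uint64_t To, uint64_t Range) {
  uint64_t Dist = From > To ? From - To : To - From;
  return Dist <= Range;
}

// Returns {thunk, false} if an existing thunk within range could be reused
// and {thunk, true} if a new one had to be created.
inline std::pair<Thunk *, bool>
getThunk(ThunkMap &Thunks, std::vector<std::unique_ptr<Thunk>> &Storage,
         uint64_t Target, uint64_t From, uint64_t Range) {
  assert(Range > 0);
  std::vector<Thunk *> &Candidates = Thunks[Target];
  // Search newest-first, assuming the likeliest match is at the end.
  for (auto It = Candidates.rbegin(); It != Candidates.rend(); ++It)
    if (isInRange(From, (*It)->RVA, Range))
      return {*It, false};
  // Placeholder placement: put the new thunk at the caller's address.
  Storage.push_back(std::unique_ptr<Thunk>(new Thunk{From, Target}));
  Candidates.push_back(Storage.back().get());
  return {Candidates.back(), true};
}
```

The boolean in the returned pair lets the caller count newly added thunks, which is what decides whether another layout pass is needed.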

Thanks for the feedback! I'll fold this into the next iteration of the patch; I have a bunch of other improvements planned.

COFF/Chunks.cpp
51–52	I don't think so - perhaps I should make it an assert. I had to insert a call to finalizeContents() in relocateDebugChunk() in PDB.cpp though, so I wanted to make sure.
53–54	Indeed, I'll make it use reserve in the next iteration.
431–432	Ok, will do.
COFF/ICF.cpp
150 ↗	(On Diff #161900)	Yeah, it's probably best to omit the comment. ICF runs before the thunk creation. The original reason for the comment was that I wanted to replace all uses of coff_relocation::SymbolTableIndex with the Symbols vector (outside the case when initializing the vector), to make things consistent, but it wasn't practical in all cases. In reality it mostly is necessary in SectionChunk::writeTo() and SectionChunk::getBaserels().
COFF/Writer.cpp
366	I later figured out I didn't need to name the thunk symbol at all; I just create the DefinedSynthetic directly without a name, and don't add it to Symtab - just like we do with object file local symbols.
369	I managed to remove this error altogether by not adding the thunk symbols to the symbol table.

mstorsjo added inline comments. Aug 27 2018, 12:25 AM

COFF/Writer.cpp
359	That's a nice idea. I'm changing code to keep the mapping of existing thunks stable across passes, so then the comment doesn't apply quite as much as before. I tested this and it didn't really save any measurable runtime, but it might be worthwhile anyway, as long as we iterate through the whole vector.
361	Indeed, will simplify the syntax of these.

mstorsjo added inline comments. Aug 27 2018, 4:11 AM

COFF/Chunks.cpp
51–52	Actually, yes, it does happen. `finalizeContents` gets called by `assignAddresses`, which gets called repeatedly when relayouting after adding thunks. (After realizing this, I had to make sure MergeChunk::finalizeContents works properly for this case as well.) The alternative would be to add another callback to Chunk, which we'd call just once, when all symbols are available.
COFF/Writer.cpp
359	In practice with the case I'm testing, TargetThunks will only have 0 or 1 members, so it doesn't really matter much, but in general I guess it could be useful. (My testcase produces a 46 MB clang.exe, so it's only roughly twice as large as the branch range which is 16 MB.)
405	It's a bit hard to split this part to a separate method since it touches almost every single local variable from this method, but I can easily change the `if (!isInRange())` into an `if (isInRange()) continue;` to reduce the nesting a little.
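
The early-`continue` form mentioned for line 405 might look like the following sketch. The `BranchKind` enum, the `Branch` struct, and both functions are hypothetical; the limits are the ±1 MB and ±16 MB Thumb branch ranges discussed earlier in the thread, and the displacement is measured from `P + 4`, where a Thumb instruction reads PC.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical relocation kinds, named after the two Thumb branch forms
// discussed in this thread.
enum class BranchKind { Cond20, Branch24 };

struct Branch {
  BranchKind Kind;
  uint64_t P; // Address of the branch instruction.
  uint64_t S; // Address of the branch target.
};

// True if the signed displacement from the branch to S fits the encoding.
// The displacement is taken relative to P + 4, where Thumb reads PC.
inline bool isInRange(const Branch &B) {
  int64_t Diff = static_cast<int64_t>(B.S) - static_cast<int64_t>(B.P + 4);
  int64_t Limit = B.Kind == BranchKind::Cond20 ? (1 << 20) : (1 << 24);
  return Diff >= -Limit && Diff < Limit;
}

// Count branches that need a thunk. The early `continue` keeps the
// out-of-range handling at the top nesting level of the loop body.
inline unsigned countNeedingThunks(const std::vector<Branch> &Branches) {
  unsigned N = 0;
  for (const Branch &B : Branches) {
    if (isInRange(B))
      continue;
    ++N; // Real code would look up or create a thunk here.
  }
  return N;
}
```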

Updated taking @ruiu and @peter.smith's feedback into account. I still haven't added more tests though, so that's still a clear todo, but reposting for more potential feedback meanwhile.

I updated the code to keep the thunk maps between passes, which leads to far fewer additions in later passes (originally I could occasionally hit up to 8 passes before things were done; now it gets done in 3 passes), and fixed the code to avoid chaining thunks in case the originally chosen thunk went out of range.

Added a pretty complete testcase testing most aspects of the algorithm that is implemented, added a testcase for when unable to fix range issues with thunks.

Missed the RelocTargets part of the diff in the previous update.

efriedma added a subscriber: efriedma. Aug 28 2018, 1:56 PM

efriedma added inline comments.

COFF/Chunks.cpp
663	Can you use "add pc, ip" instead? That's not an interworking branch, but I think we can assume the target is Thumb mode here.

mstorsjo added inline comments. Aug 28 2018, 2:20 PM

COFF/Chunks.cpp
663	Oh, indeed, that'll make it as small as the other ones, while being PIC. Will update.

Changed the thunk implementation to a shorter, non-interworking, form as suggested by @efriedma.

ruiu added inline comments. Aug 28 2018, 11:28 PM

COFF/Chunks.cpp
50	We generally don't micro-optimize code, but for relocations we do, because the number of relocations can be on the order of tens of millions for large programs. Spending one more microsecond on each relocation adds up to one second if your program has one million relocations. This function is a bit concerning in that regard. Could you measure the performance impact? Also, it looks odd that you do this in `finalizeContents`, as it doesn't correspond to finalizing contents. Perhaps this function should be given a new name.
COFF/Chunks.h
369–372	We generally don't define trivial accessors like this; instead, just define a member as a public one.

mstorsjo added inline comments. Aug 29 2018, 2:11 AM

COFF/Chunks.cpp
50	I tried measuring it, and I think it's making things slower, but it's mostly within measurement noise. My testcase was linking a 66 MB clang.exe (for x86_64). Before this change, the fastest link was in 480 ms, after the change the fastest link was 520 ms. But in both cases the runtimes occasionally go up to over 600 ms. So it's not huge but I think it's consistently measurable. Do you have any other suggestions on how to achieve this without affecting performance? Only use the RelocTargets vector if `Machine==ARMNT`? Or make it a `DenseMap` for overridden targets, which is empty for all other cases than when we have added thunks? Yes, it's a bit odd with `finalizeContents`, maybe a new method `initRelocTargets` which just gets called once before we start doing `assignAddresses`?
COFF/Chunks.h
369–372	Ok, will include that change into the next update.

ruiu added inline comments. Aug 29 2018, 2:19 AM

COFF/Chunks.cpp
50	Of the 520 ms, it'd be interesting to know how much time this function takes. gprof might help, but I'm not sure if it works on Windows. I don't think 480 ms to 520 ms is a marginal difference; it's a slowdown of roughly 8% if the measurement is accurate. One idea to make it faster (potentially faster than it is now) is to parallelize it. I believe you can make it a separate function, say `readRelocTargets`, and call it on all input sections in parallel. It should be safe to do because filling this new vector doesn't affect other threads.

mstorsjo added inline comments. Aug 29 2018, 2:58 AM

COFF/Chunks.cpp
50	I don't run things on Windows myself, I'm mostly working with cross compilation here. I'll try making this a separate parallel pass and see what difference it makes!

Split out readRelocTargets() to a separate method, which is called in parallel. This reduced the performance drop quite a bit, now the difference is much smaller. In this test round, the fastest run for the original version was 395 ms, after this patch the fastest run was 401 ms. So in this case, the slowdown is around 2% which hopefully is more acceptable.
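
The reason a parallel pass is safe here can be shown with a small sketch. This is an illustration, not the patch: it uses plain `std::thread` rather than LLVM's parallel helpers, and `Section`, `Symbol`, and `readAllRelocTargets` are hypothetical. Each worker only reads the shared symbol table and writes the `RelocTargets` vectors of its own slice of sections, so no locking is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

struct Symbol { uint64_t RVA; };

struct Section {
  std::vector<uint32_t> RelocSymIndices; // Symbol table index per relocation.
  std::vector<const Symbol *> RelocTargets;

  // Reads the shared symbol table; writes only this section's own vector.
  void readRelocTargets(const std::vector<Symbol> &SymTab) {
    RelocTargets.reserve(RelocSymIndices.size());
    for (uint32_t Idx : RelocSymIndices)
      RelocTargets.push_back(&SymTab[Idx]);
  }
};

// Split the sections into contiguous slices, one per worker thread.
inline void readAllRelocTargets(std::vector<Section> &Sections,
                                const std::vector<Symbol> &SymTab,
                                unsigned NumThreads) {
  assert(NumThreads > 0);
  std::vector<std::thread> Workers;
  size_t PerThread = (Sections.size() + NumThreads - 1) / NumThreads;
  for (unsigned T = 0; T < NumThreads; ++T) {
    size_t Begin = std::min(Sections.size(), T * PerThread);
    size_t End = std::min(Sections.size(), Begin + PerThread);
    Workers.emplace_back([&Sections, &SymTab, Begin, End] {
      for (size_t I = Begin; I < End; ++I)
        Sections[I].readRelocTargets(SymTab);
    });
  }
  for (std::thread &W : Workers)
    W.join();
}
```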

@ruiu - any further comments? Is this form more acceptable?

Ping @ruiu

Another ping for @ruiu

@ruiu - Do you have time to proceed with this one? Is the performance regression, which now is smaller thanks to your suggestion, acceptable? Or should I try other alternatives which make messier code with different codepaths for architectures that don't need thunks?

The thunk algorithm here should be good to go now (no longer RFC level), in case @peter.smith wants to have another look (it's just a few minor improvements over the original one, which was more or less ok'd).

The algorithm looks good to me. For ELF I preferred to give the Thunk a name related to the destination as it makes it a bit easier to follow disassembled binaries, but it is not essential.

Sorry for the belated response. I was thinking of this patch for a while.

Every time I see thunk range-extension code, I wonder if we really need a multi-pass algorithm that adds thunks iteratively on each pass. I believe that in almost all cases the algorithm would finish on the first iteration if we allowed a very small margin when determining "reachability". As long as the margin is small, the size increase from allowing it should be negligible.

For pathological executables that need tons of thunks (which enlarges the distance between callers and callees and thus requires multiple passes with the current algorithm), we can simply discard everything made in the previous iteration instead of keeping it, double the margin, and try again from scratch. In practice, I believe that fallback wouldn't happen very often.

What do you think of that algorithm? If it works, I prefer it, because discarding everything and redoing with a larger margin is simpler than keeping thunks created in previous passes.
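
A toy model of the proposal, to make the control flow concrete. This is a sketch under strong simplifying assumptions, not lld code: chunks sit back to back in one dimension, each chunk makes at most one branch from its start address to the start of another chunk, a thunk is placed directly after its caller, and a thunk can reach any target. `Chunk`, `layout`, `absDiff`, and `placeThunks` are all hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Chunk {
  uint64_t Size;
  int Target; // Index of the chunk this one branches to, or -1 for none.
};

// Start address of each chunk when a thunk of ThunkSize bytes follows
// every chunk flagged in HasThunk.
inline std::vector<uint64_t> layout(const std::vector<Chunk> &C,
                                    const std::vector<bool> &HasThunk,
                                    uint64_t ThunkSize) {
  std::vector<uint64_t> Addr(C.size());
  uint64_t Pos = 0;
  for (size_t I = 0; I < C.size(); ++I) {
    Addr[I] = Pos;
    Pos += C[I].Size + (HasThunk[I] ? ThunkSize : 0);
  }
  return Addr;
}

inline uint64_t absDiff(uint64_t A, uint64_t B) { return A > B ? A - B : B - A; }

// Returns the margin that produced a valid layout, or 0 if none did.
inline uint64_t placeThunks(const std::vector<Chunk> &C, uint64_t Range,
                            uint64_t ThunkSize, uint64_t Margin,
                            std::vector<bool> &HasThunk) {
  assert(Margin > 0);
  for (; Margin < Range; Margin *= 2) {
    // Discard earlier decisions and decide from the thunk-free layout.
    HasThunk.assign(C.size(), false);
    std::vector<uint64_t> Addr = layout(C, HasThunk, ThunkSize);
    for (size_t I = 0; I < C.size(); ++I)
      if (C[I].Target >= 0 &&
          absDiff(Addr[I], Addr[C[I].Target]) + Margin > Range)
        HasThunk[I] = true;

    // Verify the direct branches against the layout that includes thunks.
    Addr = layout(C, HasThunk, ThunkSize);
    bool AllInRange = true;
    for (size_t I = 0; I < C.size(); ++I)
      if (C[I].Target >= 0 && !HasThunk[I] &&
          absDiff(Addr[I], Addr[C[I].Target]) > Range)
        AllInRange = false;
    if (AllInRange)
      return Margin;
  }
  return 0;
}
```

With a range of 100, a thunk size of 10, and chunks `{10, 2}, {85, 3}, {200, -1}, {10, -1}`, an initial margin of 4 leaves the first branch direct at distance 95, the thunk inserted for the second branch pushes it to 105, and the retry with margin 8 succeeds.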

COFF/Chunks.cpp
659	This variable name seems a bit too long for my taste; I'd name it `ArmThunk` or something like that, and that should be fine as long as this is a file-scope variable.
829–836	This part is not guarded by `Finalized` -- is that intended?
COFF/PDB.cpp
775 ↗	(On Diff #163043)	If you change this line to cast<SectionChunk>(DebugChunk)->readRelocTargets(); then can you make `readRelocTargets` a non-virtual member function that belongs to `SectionChunk`?
COFF/Writer.cpp
364	Please insert a blank line before a multi-line comment.

In D51089#1231980, @ruiu wrote:

Sorry for the belated response. I was thinking of this patch for a while.

Every time I see thunk range-extension code, I wonder if we really need a multi-pass algorithm that adds thunks iteratively on each pass. I believe that in almost all cases the algorithm would finish on the first iteration if we allowed a very small margin when determining "reachability". As long as the margin is small, the size increase from allowing it should be negligible.

For pathological executables that need tons of thunks (which enlarges the distance between callers and callees and thus requires multiple passes with the current algorithm), we can simply discard everything made in the previous iteration instead of keeping it, double the margin, and try again from scratch. In practice, I believe that fallback wouldn't happen very often.

What do you think of that algorithm? If it works, I prefer it, because discarding everything and redoing with a larger margin is simpler than keeping thunks created in previous passes.

It can work; I worked on a proprietary linker for embedded systems that used that algorithm. It worked well enough 99% of the time. It could fail in nasty corner cases though; for example, increasing the margin means more calls go out of range, which leads to more thunks, and so on. Having said that, I suspect that won't be a problem for COFF, as most of the failures were a combination of a strange linker script and the number of thunks (Thumb branch range used to be 4 megabytes, so there could be thousands of thunks in a large project).

In D51089#1231980, @ruiu wrote:

Sorry for the belated response. I was thinking of this patch for a while.

Every time I see thunk range-extension code, I wonder if we really need a multi-pass algorithm that adds thunks iteratively on each pass. I believe that in almost all cases the algorithm would finish on the first iteration if we allowed a very small margin when determining "reachability". As long as the margin is small, the size increase from allowing it should be negligible.

I actually thought about that before, but either didn't think it through properly or didn't feel it was necessary - but by adding that in this current design I can make it succeed after the first pass already (previously it required two passes adding thunks on my testcase).

For pathological executables that need tons of thunks (which enlarges the distance between callers and callees and thus requires multiple passes with the current algorithm), we can simply discard everything made in the previous iteration instead of keeping it, double the margin, and try again from scratch. In practice, I believe that fallback wouldn't happen very often.

What do you think of that algorithm? If it works, I prefer it, because discarding everything and redoing with a larger margin is simpler than keeping thunks created in previous passes.

The approach you describe feels a bit fragile. Unless you're really sure the margin tradeoff is right and it will be done on the first pass in real-world cases, it'll degrade pretty badly.

I was going to give this an honest and objective try, but I don't feel it will be much simpler. For your suggestion, we would need two kinds of loops over the chunks: one loop which checks whether thunks are needed with a margin, adding them as necessary, and a second loop which checks whether all relocations are now in range (not trying to add any thunks, but just aborting the loop), then resetting everything back to the original state and starting over. The current code (which is also very similar to the corresponding ELF thunk code) does the verification at the same time as it runs a new pass trying to add more thunks if needed. If no more are needed, the algorithm is done.

COFF/Chunks.cpp
659	Sure, I'll shorten it.
829–836	Yes - this is called from assignAddresses on each relayout, to propagate the current location in the layout.
COFF/PDB.cpp
775 ↗	(On Diff #163043)	That also requires changes like `if (SectionChunk *SC = dyn_cast_or_null<SectionChunk>(C))` in readRelocTargets() in Writer.cpp. Or we move calling that to somewhere else, but then it's probably not quite as easy to parallelize.

Optimized the algorithm further by checking ranges with a margin in the first pass, making it succeed after the first run in my testcase. Shortened a variable name and added whitespace as @ruiu suggested.

The testcase isn't updated after the last adjustments though. It's quite a lot of work to hand-craft a testcase that triggers as many of the corner cases of the algorithm as possible, so I'll update it once we settle on which algorithm to choose.

Since using a margin for adding thunks, I think it will be extremely tedious to actually create a testcase which would trigger more than one pass.

In D51089#1232372, @mstorsjo wrote:

Since using a margin for adding thunks, I think it will be extremely tedious to actually create a testcase which would trigger more than one pass.

In order to get sensible test coverage, I could make the margin (used only in the first pass) configurable (somehow), and run the multipass test with a very small margin.

Even with the other approach suggested by @ruiu, testing of the case when one pass isn't enough would require a huge, pathological test.

mstorsjo mentioned this in D52156: [LLD] [COFF] Alternative ARM range thunk algorithm. Sep 16 2018, 2:30 PM

If I understand it, to get a test case we'd need to have a branch that is just in range (including margin) such that no thunk is generated, but adding sufficient thunks causes that branch to go out of range. The brute-force way to do it would be to generate more than (margin/thunk-size) thunks, but even with macros that would be a large, tedious test to write. One possibility in ELF, which I don't know would carry over to COFF, is to have some sections with high alignment so that inserting a thunk could displace one of these sections off an alignment boundary and hence add much more size than just the thunk.
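
The alignment effect is easy to quantify with a few lines of arithmetic. The helpers below are a hypothetical sketch (the 10-byte thunk size used in the example afterwards is an assumption for illustration): they compute how far an aligned section moves when a thunk is inserted somewhere before it.

```cpp
#include <cassert>
#include <cstdint>

// Round Value up to the next multiple of Align (Align must be a power of two).
inline uint64_t alignTo(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0);
  return (Value + Align - 1) & ~(Align - 1);
}

// How far a section with the given alignment moves when ThunkSize bytes
// are inserted before it. End is where the preceding data ends.
inline uint64_t displacement(uint64_t End, uint64_t ThunkSize, uint64_t Align) {
  return alignTo(End + ThunkSize, Align) - alignTo(End, Align);
}
```

If the preceding data ends exactly on a 4096-byte boundary, `displacement(0x1000, 10, 0x1000)` is 4096, so a 10-byte thunk costs a full page. If there was already slack before the boundary, as in `displacement(0x1004, 10, 0x1000)`, the result is 0.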

Interesting. I think that the implementation you have here will converge faster as it makes it more likely that pass 1 has all the thunks, at the expense of potentially generating more thunks than is strictly required. However if the goal is simplicity I think that you'll need to do everything in one pass, and accept that there will be corner cases that might not link if the margin isn't sufficient. For ELF and arbitrary linker scripts I thought the chance of failure too high, for COFF the chance of failure may be low enough. I think the most likely edge case in COFF will be the presence of sections with high alignment requirements as inserting thunks could cause a lot of bytes of alignment padding to be added.

My apologies, I clicked on the wrong link in the email; I should have been looking at D52156. Please ignore the previous comment as it won't make much sense. The comment about writing the test might still make sense.

Went with the alternative algorithm in D52156 instead.

Diff 162943

COFF/Chunks.h

[222 lines hidden]  public:
   // The file that this chunk was created from.
   ObjFile *File;

   // The COMDAT leader symbol if this is a COMDAT chunk.
   DefinedRegular *Sym = nullptr;

   ArrayRef<coff_relocation> Relocs;

   // When inserting a thunk, we need to adjust a relocation to point to
   // the thunk instead of the actual original target Symbol.
   std::vector<Symbol *> RelocTargets;

 private:
   StringRef SectionName;
   std::vector<SectionChunk *> AssocChildren;

[28 lines hidden]  public:
   size_t getSize() const override;
   void writeTo(uint8_t *Buf) const override;

   static std::map<uint32_t, MergeChunk *> Instances;
   std::vector<SectionChunk *> Sections;

 private:
   llvm::StringTableBuilder Builder;
+  bool Finalized = false;
 };

 // A chunk for common symbols. Common chunks don't have actual data.
 class CommonChunk : public Chunk {
 public:
   CommonChunk(const COFFSymbolRef Sym);
   size_t getSize() const override { return Sym.getValue(); }
   bool hasData() const override { return false; }
[71 lines hidden]  public:
   explicit ImportThunkChunkARM64(Defined *S) : ImpSymbol(S) {}
   size_t getSize() const override { return sizeof(ImportThunkARM64); }
   void writeTo(uint8_t *Buf) const override;

 private:
   Defined *ImpSymbol;
 };

+class RangeExtensionThunk : public Chunk {
+public:
+  explicit RangeExtensionThunk(Defined *T) : Target(T) {}
+  size_t getSize() const override;
+  void writeTo(uint8_t *Buf) const override;
+  Defined *getTarget() const { return Target; }
+
+private:
+  Defined *Target;
+};

 // Windows-specific.
 // See comments for DefinedLocalImport class.
 class LocalImportChunk : public Chunk {
 public:
   explicit LocalImportChunk(Defined *S) : Sym(S) {}
   size_t getSize() const override;
   void getBaserels(std::vector<Baserel> *Res) override;
   void writeTo(uint8_t *Buf) const override;
[125 lines hidden]

COFF/Chunks.cpp

Show First 20 Lines • Show All 41 Lines • ▼ Show 20 Lines	SectionChunk::SectionChunk(ObjFile F, const coff_section H)
// files will be built with -ffunction-sections or /Gy, so most things worth		// files will be built with -ffunction-sections or /Gy, so most things worth
// stripping will be in a comdat.		// stripping will be in a comdat.
Live = !Config->DoGC \|\| !isCOMDAT();		Live = !Config->DoGC \|\| !isCOMDAT();
}		}

// Initialize the RelocTargets vector, to allow redirecting certain relocations		// Initialize the RelocTargets vector, to allow redirecting certain relocations
// to a thunk instead of the actual symbol the relocation's symbol table index		// to a thunk instead of the actual symbol the relocation's symbol table index
// indicates.		// indicates.
void SectionChunk::finalizeContents() {		void SectionChunk::finalizeContents() {
		ruiu: We generally don't micro-optimize code, but for relocations we do, because the number of relocations can be on the order of tens of millions for large programs. Spending one more microsecond on each relocation adds up to one second if your program has one million relocations. This function is a bit concerning in that regard. Could you measure the performance impact? Also, it looks odd that you do this in `finalizeContents`, as it doesn't correspond to finalizing contents. Perhaps this function should be given a new name.
		mstorsjo: I tried measuring it, and I think it's making things slower, but it's mostly within measurement noise. My testcase was linking a 66 MB clang.exe (for x86_64). Before this change, the fastest link took 480 ms; after the change, the fastest link took 520 ms. But in both cases the runtimes occasionally go up to over 600 ms. So it's not huge but I think it's consistently measurable. Do you have any other suggestions on how to achieve this without affecting performance? Only use the RelocTargets vector if `Machine==ARMNT`? Or make it a `DenseMap` for overridden targets, which is empty in all cases except when we have added thunks? Yes, it's a bit odd with `finalizeContents`; maybe a new method `initRelocTargets` which just gets called once before we start doing `assignAddresses`?
		ruiu: Of the 520 ms, it'd be interesting to know how much time this function accounts for. gprof might help, but I'm not sure if it works on Windows. I don't think 480 ms to 520 ms is a marginal difference; it's close to a 10% slowdown if the measurement is accurate. One idea to make it faster (potentially faster than it is now) is to parallelize it. I believe you can make it a separate function, say `readRelocTargets`, and call it on all input sections in parallel. It should be safe to do because filling this new vector doesn't affect other threads.
		mstorsjo: I don't run things on Windows myself; I'm mostly working with cross compilation here. I'll try making this a separate parallel pass and see what difference it makes!
if (!RelocTargets.empty())		if (!RelocTargets.empty())
return;		return;
		ruiu: Can this happen?
		mstorsjo: I don't think so - perhaps I should make it an assert. I had to insert a call to finalizeContents() in relocateDebugChunk() in PDB.cpp though, so I wanted to make sure.
		mstorsjo: Actually, yes, it does happen. `finalizeContents` gets called by `assignAddresses`, which gets called repeatedly when relayouting after adding thunks. (After realizing this, I had to make sure MergeChunk::finalizeContents works properly for this case as well.) The alternative would be to add another callback to Chunk, which we'd call just once, when all symbols are available.
RelocTargets.reserve(Relocs.size());		RelocTargets.reserve(Relocs.size());
for (const coff_relocation &Rel : Relocs)		for (const coff_relocation &Rel : Relocs)
		mstorsjo: Does anyone have an opinion on the mechanism of overriding what symbol an individual reloc points to? Here I provide a full vector of symbols (which can't be initialized directly, only after all symbols actually exist); an alternative would be e.g. a DenseMap that only holds the individual symbols that are overridden. Or something else?
		ruiu: I think this vector should be fine because it will be used very heavily and a vector lookup is extremely fast.
		ruiu: Since this vector can be very large, it is perhaps better to call `reserve()`.
		mstorsjo: Indeed, I'll make it use reserve in the next iteration.
RelocTargets.push_back(File->getSymbol(Rel.SymbolTableIndex));		RelocTargets.push_back(File->getSymbol(Rel.SymbolTableIndex));
}		}
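
The comments above weigh a full per-relocation vector against a sparse map of overrides. A minimal standalone sketch of the vector approach follows; `Sym`, `Reloc`, `Section`, `initRelocTargets` and `redirect` are hypothetical stand-ins chosen for illustration, not lld's actual classes or API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for lld's Symbol and coff_relocation.
struct Sym { uint64_t RVA; };
struct Reloc { uint32_t SymbolTableIndex; };

struct Section {
  std::vector<Reloc> Relocs;
  std::vector<Sym *> RelocTargets; // one entry per relocation

  // Fill RelocTargets once from the symbol table; repeated calls are no-ops,
  // so redirections made in between survive a relayout.
  void initRelocTargets(const std::vector<Sym *> &SymTab) {
    if (!RelocTargets.empty())
      return;
    RelocTargets.reserve(Relocs.size());
    for (const Reloc &R : Relocs)
      RelocTargets.push_back(SymTab[R.SymbolTableIndex]);
  }

  // Redirect a single relocation, e.g. to a thunk symbol.
  void redirect(size_t RelocIndex, Sym *NewTarget) {
    assert(RelocIndex < RelocTargets.size());
    RelocTargets[RelocIndex] = NewTarget;
  }
};
```

The lookup at relocation time is then a plain index into `RelocTargets`, which is the property ruiu argues for; the cost is one pointer per relocation even when nothing is redirected.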

static void add16(uint8_t *P, int16_t V) { write16le(P, read16le(P) + V); }		static void add16(uint8_t *P, int16_t V) { write16le(P, read16le(P) + V); }
static void add32(uint8_t *P, int32_t V) { write32le(P, read32le(P) + V); }		static void add32(uint8_t *P, int32_t V) { write32le(P, read32le(P) + V); }
static void add64(uint8_t *P, int64_t V) { write64le(P, read64le(P) + V); }		static void add64(uint8_t *P, int64_t V) { write64le(P, read64le(P) + V); }
static void or16(uint8_t *P, uint16_t V) { write16le(P, read16le(P) \| V); }		static void or16(uint8_t *P, uint16_t V) { write16le(P, read16le(P) \| V); }
static void or32(uint8_t *P, uint32_t V) { write32le(P, read32le(P) \| V); }		static void or32(uint8_t *P, uint32_t V) { write32le(P, read32le(P) \| V); }
[360 lines not shown]	static uint8_t getBaserelType(const coff_relocation &Rel) {
}		}
}		}

// Windows-specific.		// Windows-specific.
// Collect all locations that contain absolute addresses, which need to be		// Collect all locations that contain absolute addresses, which need to be
// fixed by the loader if load-time relocation is needed.		// fixed by the loader if load-time relocation is needed.
// Only called when base relocation is enabled.		// Only called when base relocation is enabled.
void SectionChunk::getBaserels(std::vector<Baserel> *Res) {		void SectionChunk::getBaserels(std::vector<Baserel> *Res) {
for (size_t I = 0, E = Relocs.size(); I < E; I++) {		for (size_t I = 0, E = Relocs.size(); I < E; I++) {
const coff_relocation &Rel = Relocs[I];		const coff_relocation &Rel = Relocs[I];
		ruiu: It is more straightforward to write this loop as a plain old `for` loop instead of a range-based for loop with `Counter`.
		mstorsjo: Ok, will do.
uint8_t Ty = getBaserelType(Rel);		uint8_t Ty = getBaserelType(Rel);
if (Ty == IMAGE_REL_BASED_ABSOLUTE)		if (Ty == IMAGE_REL_BASED_ABSOLUTE)
continue;		continue;
// Use the potentially remapped Symbol instead of the one that the		// Use the potentially remapped Symbol instead of the one that the
// relocation points to.		// relocation points to.
Symbol *Target = RelocTargets[I];		Symbol *Target = RelocTargets[I];
if (!Target \|\| isa<DefinedAbsolute>(Target))		if (!Target \|\| isa<DefinedAbsolute>(Target))
continue;		continue;
[209 lines not shown]

void ImportThunkChunkARM64::writeTo(uint8_t *Buf) const {		void ImportThunkChunkARM64::writeTo(uint8_t *Buf) const {
int64_t Off = ImpSymbol->getRVA() & 0xfff;		int64_t Off = ImpSymbol->getRVA() & 0xfff;
memcpy(Buf + OutputSectionOff, ImportThunkARM64, sizeof(ImportThunkARM64));		memcpy(Buf + OutputSectionOff, ImportThunkARM64, sizeof(ImportThunkARM64));
applyArm64Addr(Buf + OutputSectionOff, ImpSymbol->getRVA(), RVA, 12);		applyArm64Addr(Buf + OutputSectionOff, ImpSymbol->getRVA(), RVA, 12);
applyArm64Ldr(Buf + OutputSectionOff + 4, Off);		applyArm64Ldr(Buf + OutputSectionOff + 4, Off);
}		}

		// A Thumb2, PIC range extension thunk. A non-PIC one would be 2 bytes
		// shorter but would require a base relocation instead.
		ruiu: This variable name seems a bit too long for my taste; I'd name it `ArmThunk` or something like that, and that should be fine as long as this is a file-scope variable.
		mstorsjo: Sure, I'll shorten it.
		const uint8_t RangeExtensionThunkARMData[] = {
		0x40, 0xf2, 0x00, 0x0c, // P: movw ip,:lower16:S - (P + (L1-P) + 4)
		0xc0, 0xf2, 0x00, 0x0c, // movt ip,:upper16:S - (P + (L1-P) + 4)
		0xfc, 0x44, // L1: add ip, pc
		efriedma: Can you use "add pc, ip" instead? That's not an interworking branch, but I think we can assume the target is Thumb mode here.
		mstorsjo: Oh, indeed, that'll make it as small as the other ones, while being PIC. Will update.
		0x60, 0x47, // bx ip
		};

		size_t RangeExtensionThunk::getSize() const {
		assert(Config->Machine == ARMNT);
		return sizeof(RangeExtensionThunkARMData);
		}

		void RangeExtensionThunk::writeTo(uint8_t *Buf) const {
		assert(Config->Machine == ARMNT);
		uint64_t Offset = Target->getRVA() - RVA - 12;
		// The target address needs to have the Thumb bit set.
		Offset \|= 1;
		memcpy(Buf + OutputSectionOff, RangeExtensionThunkARMData,
		sizeof(RangeExtensionThunkARMData));
		applyMOV32T(Buf + OutputSectionOff, uint32_t(Offset));
		}
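
As a rough illustration of what `writeTo` computes, the sketch below derives the PC-relative offset and patches a 16-bit immediate into a Thumb2 MOVW/MOVT instruction pair. The helper names (`patchMov16T`, `readMov16T`, `thunkOffset`) are made up for this example and are not lld functions; the bit layout follows the Thumb2 `imm4:i:imm3:imm8` immediate split.

```cpp
#include <cassert>
#include <cstdint>

// Patch a 16-bit immediate V into the two halfwords of a Thumb2 MOVW or MOVT.
// The immediate is scattered as imm4 (Hw1[3:0]), i (Hw1[10]),
// imm3 (Hw2[14:12]) and imm8 (Hw2[7:0]).
inline void patchMov16T(uint16_t &Hw1, uint16_t &Hw2, uint16_t V) {
  Hw1 = static_cast<uint16_t>((Hw1 & 0xfbf0) | ((V & 0x0800) >> 1) |
                              ((V >> 12) & 0xf));
  Hw2 = static_cast<uint16_t>((Hw2 & 0x8f00) | ((V & 0x0700) << 4) |
                              (V & 0xff));
}

// Inverse of patchMov16T: reassemble the 16-bit immediate.
inline uint16_t readMov16T(uint16_t Hw1, uint16_t Hw2) {
  return static_cast<uint16_t>(((Hw1 & 0xf) << 12) | ((Hw1 & 0x0400) << 1) |
                               ((Hw2 & 0x7000) >> 4) | (Hw2 & 0xff));
}

// Offset loaded into ip by the thunk. "add ip, pc" sits 8 bytes into the
// thunk and reads pc as its own address + 4, hence the 12. The low bit is
// set so that the final branch stays in Thumb mode.
inline uint32_t thunkOffset(uint64_t TargetRVA, uint64_t ThunkRVA) {
  uint64_t Offset = TargetRVA - ThunkRVA - 12;
  return static_cast<uint32_t>(Offset) | 1;
}
```

With the template bytes from the diff, the first halfword pair `0xf240, 0x0c00` is `movw ip, #0`; the low 16 bits of the offset would go into that pair and the high 16 bits into the following `movt` pair.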

void LocalImportChunk::getBaserels(std::vector<Baserel> *Res) {		void LocalImportChunk::getBaserels(std::vector<Baserel> *Res) {
Res->emplace_back(getRVA());		Res->emplace_back(getRVA());
}		}

size_t LocalImportChunk::getSize() const {		size_t LocalImportChunk::getSize() const {
return Config->is64() ? 8 : 4;		return Config->is64() ? 8 : 4;
}		}

[123 lines not shown]
void MergeChunk::addSection(SectionChunk *C) {		void MergeChunk::addSection(SectionChunk *C) {
auto *&MC = Instances[C->Alignment];		auto *&MC = Instances[C->Alignment];
if (!MC)		if (!MC)
MC = make<MergeChunk>(C->Alignment);		MC = make<MergeChunk>(C->Alignment);
MC->Sections.push_back(C);		MC->Sections.push_back(C);
}		}

void MergeChunk::finalizeContents() {		void MergeChunk::finalizeContents() {
		if (!Finalized) {
for (SectionChunk *C : Sections)		for (SectionChunk *C : Sections)
if (C->isLive())		if (C->isLive())
Builder.add(toStringRef(C->getContents()));		Builder.add(toStringRef(C->getContents()));
Builder.finalize();		Builder.finalize();
		Finalized = true;
		}

for (SectionChunk *C : Sections) {		for (SectionChunk *C : Sections) {
if (!C->isLive())		if (!C->isLive())
continue;		continue;
size_t Off = Builder.getOffset(toStringRef(C->getContents()));		size_t Off = Builder.getOffset(toStringRef(C->getContents()));
C->setOutputSection(Out);		C->setOutputSection(Out);
C->setRVA(RVA + Off);		C->setRVA(RVA + Off);
C->OutputSectionOff = OutputSectionOff + Off;		C->OutputSectionOff = OutputSectionOff + Off;
}		}
		ruiu: This part is not guarded by `Finalized` -- is that intended?
		mstorsjo: Yes - this is called from assignAddresses on each relayout, to propagate the current location in the layout.
}		}
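
The split between one-time work and per-layout work in `MergeChunk::finalizeContents` can be sketched in isolation. The type below is a hypothetical stand-in that only deduplicates identical strings (it does not do the tail merging lld's string table builder does); it is meant to show the guard pattern, not the real implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a merged string chunk; not lld's MergeChunk.
struct MergedStrings {
  std::vector<std::string> Inputs;
  std::unordered_map<std::string, uint64_t> Offsets; // string -> offset
  std::vector<uint64_t> InputRVAs;                   // one per input
  uint64_t Size = 0;
  int BuildCount = 0;
  bool Finalized = false;

  void finalizeContents(uint64_t BaseRVA) {
    if (!Finalized) {
      // One-time work: deduplicate contents and fix their relative offsets.
      for (const std::string &S : Inputs)
        if (Offsets.emplace(S, Size).second)
          Size += S.size() + 1; // +1 for the terminating NUL
      ++BuildCount;
      Finalized = true;
    }
    // Per-layout work: recompute absolute addresses from the current base.
    InputRVAs.clear();
    for (const std::string &S : Inputs)
      InputRVAs.push_back(BaseRVA + Offsets.at(S));
  }
};
```

Calling `finalizeContents` again with a different base leaves the merged contents and size untouched and only moves the addresses, which is the behaviour the relayout loop relies on.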

uint32_t MergeChunk::getOutputCharacteristics() const {		uint32_t MergeChunk::getOutputCharacteristics() const {
return IMAGE_SCN_MEM_READ \| IMAGE_SCN_CNT_INITIALIZED_DATA;		return IMAGE_SCN_MEM_READ \| IMAGE_SCN_CNT_INITIALIZED_DATA;
}		}

size_t MergeChunk::getSize() const {		size_t MergeChunk::getSize() const {
return Builder.getSize();		return Builder.getSize();
}		}

void MergeChunk::writeTo(uint8_t *Buf) const {		void MergeChunk::writeTo(uint8_t *Buf) const {
Builder.write(Buf + OutputSectionOff);		Builder.write(Buf + OutputSectionOff);
}		}

} // namespace coff		} // namespace coff
} // namespace lld		} // namespace lld

COFF/Writer.h

	//===- Writer.h -------------------------------------------------- C++ --===//			//===- Writer.h -------------------------------------------------- C++ --===//
	//			//
	// The LLVM Linker			// The LLVM Linker
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef LLD_COFF_WRITER_H			#ifndef LLD_COFF_WRITER_H
	#define LLD_COFF_WRITER_H			#define LLD_COFF_WRITER_H

	#include "Chunks.h"			#include "Chunks.h"
				#include "Symbols.h"
	#include "llvm/ADT/StringRef.h"			#include "llvm/ADT/StringRef.h"
	#include "llvm/Object/COFF.h"			#include "llvm/Object/COFF.h"
	#include <chrono>			#include <chrono>
	#include <cstdint>			#include <cstdint>
	#include <vector>			#include <vector>

	namespace lld {			namespace lld {
	namespace coff {			namespace coff {
	[9 lines not shown]
	class OutputSection {			class OutputSection {
	public:			public:
	OutputSection(llvm::StringRef N, uint32_t Chars) : Name(N) {			OutputSection(llvm::StringRef N, uint32_t Chars) : Name(N) {
	Header.Characteristics = Chars;			Header.Characteristics = Chars;
	}			}
	void addChunk(Chunk *C);			void addChunk(Chunk *C);
	void merge(OutputSection *Other);			void merge(OutputSection *Other);
	ArrayRef<Chunk *> getChunks() { return Chunks; }			ArrayRef<Chunk *> getChunks() { return Chunks; }
				void clear() { Chunks.clear(); }
	void addPermissions(uint32_t C);			void addPermissions(uint32_t C);
	void setPermissions(uint32_t C);			void setPermissions(uint32_t C);
	uint64_t getRVA() { return Header.VirtualAddress; }			uint64_t getRVA() { return Header.VirtualAddress; }
	uint64_t getFileOff() { return Header.PointerToRawData; }			uint64_t getFileOff() { return Header.PointerToRawData; }
	void writeHeaderTo(uint8_t *Buf);			void writeHeaderTo(uint8_t *Buf);
				bool createThunks(int Pass,
				llvm::DenseMap<std::pair<Chunk *, uint64_t>,
				std::vector<Defined *>> &ThunksPerTarget,
				llvm::DenseMap<Defined *, RangeExtensionThunk *> &Thunks);

	// Returns the size of this section in an executable memory image.			// Returns the size of this section in an executable memory image.
	// This may be smaller than the raw size (the raw size is multiple			// This may be smaller than the raw size (the raw size is multiple
	// of disk sector size, so there may be padding at end), or may be			// of disk sector size, so there may be padding at end), or may be
	// larger (if that's the case, the loader reserves spaces after end			// larger (if that's the case, the loader reserves spaces after end
	// of raw data).			// of raw data).
	uint64_t getVirtualSize() { return Header.VirtualSize; }			uint64_t getVirtualSize() { return Header.VirtualSize; }

	[22 lines not shown]

COFF/Writer.cpp

[147 lines not shown]

private:		private:
void createSections();		void createSections();
void createMiscChunks();		void createMiscChunks();
void createImportTables();		void createImportTables();
void createExportTable();		void createExportTable();
void mergeSections();		void mergeSections();
void assignAddresses();		void assignAddresses();
		void finalizeAddresses();
void removeEmptySections();		void removeEmptySections();
void createSymbolAndStringTable();		void createSymbolAndStringTable();
void openFile(StringRef OutputPath);		void openFile(StringRef OutputPath);
template <typename PEHeaderTy> void writeHeader();		template <typename PEHeaderTy> void writeHeader();
void createSEHTable();		void createSEHTable();
void createRuntimePseudoRelocs();		void createRuntimePseudoRelocs();
void insertRuntimePseudoRelocs();		void insertRuntimePseudoRelocs();
void createGuardCFTables();		void createGuardCFTables();
[161 lines not shown]	for (const auto &DebugDir : File.debug_directories()) {
// id that we recognize / support, ignore it.		// id that we recognize / support, ignore it.
if (ExistingDI->Signature.CVSignature != OMF::Signature::PDB70)		if (ExistingDI->Signature.CVSignature != OMF::Signature::PDB70)
return None;		return None;
return *ExistingDI;		return *ExistingDI;
}		}
return None;		return None;
}		}

		static bool machineRequiresThunks() {
		// Only ARMNT requires range extension thunks out of the currently supported
		// architectures.
		return Config->Machine == ARMNT;
		}
		ruiu: nit: move this assert to the beginning of this function.

		// Check whether the target address S is in range from a relocation
		// of type RelType at address P.
		static bool isInRange(uint16_t RelType, uint64_t S, uint64_t P) {
		assert(Config->Machine == ARMNT);
		int64_t Diff = S - P - 4;
		switch (RelType) {
		case IMAGE_REL_ARM_BRANCH20T:
		return isInt<21>(Diff);
		case IMAGE_REL_ARM_BRANCH24T:
		case IMAGE_REL_ARM_BLX23T:
		return isInt<25>(Diff);
		default:
		return true;
		}
		}
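
To make the range check concrete without pulling in LLVM headers, here is a small standalone sketch. `fitsSigned` is a local replacement for `llvm::isInt<N>`, and `BranchKind` and `branchInRange` are hypothetical names standing in for the COFF relocation type constants and for `isInRange`.

```cpp
#include <cassert>
#include <cstdint>

// Local replacement for llvm::isInt<N>: true if X fits in an N-bit signed
// integer.
template <unsigned N> constexpr bool fitsSigned(int64_t X) {
  static_assert(N > 0 && N < 64, "N out of range");
  return X >= -(INT64_C(1) << (N - 1)) && X < (INT64_C(1) << (N - 1));
}

// Hypothetical stand-ins for IMAGE_REL_ARM_BRANCH20T and the 24-bit branch
// relocation types.
enum class BranchKind { Branch20T, Branch24T };

// S is the target address, P the address of the branch instruction. The
// Thumb pc reads as P + 4, so the encoded displacement is S - P - 4.
inline bool branchInRange(BranchKind Kind, uint64_t S, uint64_t P) {
  int64_t Diff = static_cast<int64_t>(S - P - 4);
  return Kind == BranchKind::Branch20T ? fitsSigned<21>(Diff)
                                       : fitsSigned<25>(Diff);
}
```

A 25-bit signed displacement covers roughly plus or minus 16 MB, which matches the 16 MB figure in the summary; the conditional branch form with 21 bits reaches only about 1 MB either way.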

		peter.smith: Assuming getRVA() is the virtual address of the symbol, is Target->getRVA() stable between passes? Presumably if thunks are inserted then assignAddresses() may cause some symbols to change address? I'm not too familiar with the COFF code base so I could be missing something here. If I'm right the reuse between passes may not work as well as it could do.
		mstorsjo: No, the RVAs aren't stable between passes, but we don't keep the map between passes either; it's a local variable in createThunks below. I guess it would be useful to allow this to find a different thunk from the previous pass (that would also reduce the amount of changes in later passes, reducing the number of passes required before it converges), but that's not implemented (yet).
		// Return an existing thunk which is in range, or create a new one.
		static std::pair<Defined *, bool>
		getThunk(DenseMap<std::pair<Chunk *, uint64_t>, std::vector<Defined *>>
		&ThunksPerTarget,
		ruiu: I don't know if your above comment is true, but if the one this loop is looking for is likely at the end of the vector, I'd search in reverse order, i.e. `for (Defined *Sym : llvm::reverse(TargetThunks))`.
		mstorsjo: That's a nice idea. I'm changing code to keep the mapping of existing thunks stable across passes, so then the comment doesn't apply quite as much as before. I tested this and it didn't really save any measurable runtime, but it might be worthwhile anyway, as long as we iterate through the whole vector.
		mstorsjo: In practice with the case I'm testing, TargetThunks will only have 0 or 1 members, so it doesn't really matter much, but in general I guess it could be useful. (My testcase produces a 46 MB clang.exe, so it's only roughly twice as large as the branch range which is 16 MB.)
		DenseMap<Defined *, RangeExtensionThunk *> &Thunks, Defined *Target,
		uint64_t P, uint16_t Type) {
		ruiu: Can this just be `return {Sym, false}`?
		mstorsjo: Indeed, will simplify the syntax of these.
		Chunk *TargetChunk = Target->getChunk();
		ruiu: nit: omit {}
		uint64_t TargetChunkRVA = TargetChunk ? TargetChunk->getRVA() : 0;
		// A unique representation of the target address of a Defined symbol,
		ruiu: Please insert a blank line before a multi-line comment.
		// stable across relayouts. This is represented as the base Chunk* with
		// an offset, which should be stable across relayouts.
		ruiu: I don't think you need to pass this `ThunkCounter` around; I'd define this as a static local variable in this function.
		mstorsjo: I later figured out I didn't need to name the thunk symbol at all; I just create the DefinedSynthetic directly without a name, and don't add it to Symtab - just like we do with object file local symbols.
		// Some symbols return a nullptr Chunk, which we should be ready to handle.
		std::pair<Chunk *, uint64_t> UniqueTarget = {TargetChunk, Target->getRVA() -
		TargetChunkRVA};
		ruiu: Error messages should start with a lowercase letter. Shouldn't this be an `assert`? If this can be triggered by a valid user input, this error message should contain more info as to what the problem is.
		mstorsjo: I managed to remove this error altogether by not adding the thunk symbols to the symbol table.
		std::vector<Defined *> &TargetThunks = ThunksPerTarget[UniqueTarget];
		// For the first pass, any matches are most likely at the end of the vector,
		ruiu: Maybe `return {D, true}`
		// so by iterating in reverse order, we might find a match sooner. As long
		// as the image size only is in the same order of magnitude as the branch
		// range (16 MB for ARMNT), there will in practice only be one or a few
		// thunks per target.
		ruiu: std::map is an ordered map and usually much slower than DenseMap, so please use DenseMap.
		for (Defined *Sym : llvm::reverse(TargetThunks))
		if (isInRange(Type, Sym->getRVA(), P))
		return {Sym, false};
		RangeExtensionThunk *C = make<RangeExtensionThunk>(Target);
		Defined *D = make<DefinedSynthetic>("", C);
		TargetThunks.push_back(D);
		Thunks[D] = C;
		return {D, true};
		}
		ruiu: nit: please insert a blank line before a comment.

		// Check if the symbol currently points at a thunk, and if it does, if it still
		// is usable. Returns true if it is a thunk and it still is usable.
		static bool
		normalizeExistingThunk(DenseMap<Defined *, RangeExtensionThunk *> &Thunks,
		Symbol *&RelocTarget, uint16_t RelType,
		uint64_t RelAddr) {
		Defined *Sym = dyn_cast_or_null<Defined>(RelocTarget);
		if (!Sym)
		return false;
		if (RangeExtensionThunk *RET = Thunks.lookup(Sym)) {
		if (isInRange(RelType, Sym->getRVA(), RelAddr))
		return true;
		// The previously used thunk is out of range; don't refer to the thunk any
		// longer but directly to the original target, to avoid chaining thunks.
		RelocTarget = RET->getTarget();
		}
		return false;
		}

		bool OutputSection::createThunks(
		ruiu: Nesting is too deep. Please consider splitting to multiple functions.
		mstorsjo: It's a bit hard to split this part to a separate method since it touches almost every single local variable from this method, but I can easily change the `if (!isInRange())` into an `if (isInRange()) continue;` to reduce the nesting a little.
		int Pass, DenseMap<std::pair<Chunk *, uint64_t>, std::vector<Defined *>>
		&ThunksPerTarget,
		peter.smith: In theory if you are iterating a fixed number (Chunks.size()) of the Chunks vector then inserting thunks into the Chunks vector in the same loop will mean that Chunks near the end may not be scanned for Thunks. Given that the algorithm will only terminate when 0 thunks are inserted you'll eventually scan all of them but it may cost you more passes than you would need if you inserted all Thunks in one go. I think you'll be unlikely to hit 10 passes without a contrived test case though.
		mstorsjo: Hmm, I'm not quite sure I understand what you mean here. You mean that since I'm adding more elements to the Chunks vector, I'd miss the last few ones that were pushed forward? The limit on the outer loop, on line 379, explicitly checks for Chunks.size(), so it will loop until the very end of the vector, even if Chunks grows meanwhile.
		peter.smith: Apologies, I had it in my mind that Chunks.size() would only be calculated once per pass.
		DenseMap<Defined *, RangeExtensionThunk *> &Thunks) {
		bool AddressesChanged = false;
		size_t ThunksSize = 0;
		// Recheck Chunks.size() each iteration, since we can insert more
		// elements into it.
		for (size_t I = 0; I != Chunks.size(); ++I) {
		SectionChunk *SC = dyn_cast_or_null<SectionChunk>(Chunks[I]);
		if (!SC)
		continue;
		size_t ThunkInsertionSpot = I + 1;

		// Try to get a good enough estimate of where new thunks will be placed.
		// Offset this by the size of the new thunks added so far, to make the
		// estimate slightly better.
		size_t ThunkInsertionRVA = SC->getRVA() + SC->getSize() + ThunksSize;
		for (size_t J = 0, E = SC->Relocs.size(); J < E; ++J) {
		const coff_relocation &Rel = SC->Relocs[J];
		Symbol *&RelocTarget = SC->RelocTargets[J];

		// The estimate of the source address P should be pretty accurate,
		// but we don't know whether the target Symbol address should be
		ruiu: Adding -> adding
		// offset by ThunkSize or not (or by some of ThunksSize but not all of
		// it), giving us some uncertainty once we have added one thunk.
		uint64_t P = SC->getRVA() + Rel.VirtualAddress + ThunksSize;

		// If this Symbol already is a thunk, and it is in range, no need to do
		// anything. If it was a thunk but the thunk now also is out of range,
		// this resets the Symbol to point to the original symbol, allowing the
		// new thunk to point directly to the target.
		if (Pass > 0 && normalizeExistingThunk(Thunks, RelocTarget, Rel.Type, P))
		continue;

		Defined *Sym = dyn_cast_or_null<Defined>(RelocTarget);
		if (!Sym)
		continue;

		uint64_t S = Sym->getRVA();

		if (isInRange(Rel.Type, S, P))
		continue;

		// If the target isn't in range, hook it up to an existing or new
		// thunk.
		Defined *Thunk;
		bool WasNew;
		std::tie(Thunk, WasNew) =
		getThunk(ThunksPerTarget, Thunks, Sym, P, Rel.Type);
		if (WasNew) {
		Chunk *ThunkChunk = Thunk->getChunk();
		ThunkChunk->setRVA(ThunkInsertionRVA); // Estimate of where it will be located.
		Chunks.insert(Chunks.begin() + ThunkInsertionSpot, ThunkChunk);
		ThunkInsertionSpot++;
		ThunksSize += ThunkChunk->getSize();
		ThunkInsertionRVA += ThunkChunk->getSize();
		AddressesChanged = true;
		}
		RelocTarget = Thunk;
		}
		}
		return AddressesChanged;
		}

		// Assign addresses and add thunks if necessary.
		void Writer::finalizeAddresses() {
		int ThunkPass = 0;
		bool AddressesChanged;
		DenseMap<std::pair<Chunk *, uint64_t>, std::vector<Defined *>>
		ThunksPerTarget;
		DenseMap<Defined *, RangeExtensionThunk *> Thunks;
		do {
		if (ThunkPass >= 10)
		fatal("adding thunks hasn't converged after " + Twine(ThunkPass) +
		" passes");
		assignAddresses();
		if (!machineRequiresThunks())
		return;
		AddressesChanged = false;
		for (OutputSection *Sec : OutputSections)
		AddressesChanged \|= Sec->createThunks(ThunkPass, ThunksPerTarget, Thunks);
		ThunkPass++;
		// Iterate until no new thunks have been added. Even if the last pass
		// hooked up a relocation to a different target than before, we don't need
		// to run another pass unless addresses actually have changed.
		} while (AddressesChanged);
		log("Added " + Twine(Thunks.size()) + " thunks in " + Twine(ThunkPass) +
		" passes");
		}
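
The control flow of `finalizeAddresses` is a bounded fixed-point iteration: lay out, adjust, repeat until nothing changes, and give up after a fixed number of passes. A generic sketch of that loop, with the hypothetical helper `iterateToFixedPoint` and an exception standing in for lld's `fatal()`, might look like this:

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Runs Layout followed by Adjust until Adjust reports that nothing changed.
// Returns the number of passes executed; throws if MaxPasses is exceeded.
inline int iterateToFixedPoint(const std::function<void()> &Layout,
                               const std::function<bool()> &Adjust,
                               int MaxPasses = 10) {
  int Pass = 0;
  bool Changed;
  do {
    if (Pass >= MaxPasses)
      throw std::runtime_error("layout hasn't converged");
    Layout();           // corresponds to assignAddresses()
    Changed = Adjust(); // corresponds to the createThunks() sweep
    ++Pass;
  } while (Changed);
  return Pass;
}
```

Note that the limit is checked before each layout, so a process that keeps changing runs exactly `MaxPasses` full passes before the error is raised, mirroring the `ThunkPass >= 10` check in the diff.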

// The main function of the writer.		// The main function of the writer.
void Writer::run() {		void Writer::run() {
ScopedTimer T1(CodeLayoutTimer);		ScopedTimer T1(CodeLayoutTimer);

// Find pseudo relocs early, to allow marking the corresponding		// Find pseudo relocs early, to allow marking the corresponding
// section chunks as writable, before assigning them to output sections.		// section chunks as writable, before assigning them to output sections.
if (Config->MinGW)		if (Config->MinGW)
createRuntimePseudoRelocs();		createRuntimePseudoRelocs();

createSections();		createSections();
createMiscChunks();		createMiscChunks();
createImportTables();		createImportTables();
createExportTable();		createExportTable();
mergeSections();		mergeSections();
assignAddresses();		finalizeAddresses();
removeEmptySections();		removeEmptySections();
setSectionPermissions();		setSectionPermissions();
createSymbolAndStringTable();		createSymbolAndStringTable();

if (FileSize > UINT32_MAX)		if (FileSize > UINT32_MAX)
fatal("image size (" + Twine(FileSize) + ") " +		fatal("image size (" + Twine(FileSize) + ") " +
"exceeds maximum allowable size (" + Twine(UINT32_MAX) + ")");		"exceeds maximum allowable size (" + Twine(UINT32_MAX) + ")");

[956 lines not shown]	if (S->Header.Characteristics & IMAGE_SCN_CNT_INITIALIZED_DATA)
Res += S->getRawSize();		Res += S->getRawSize();
return Res;		return Res;
}		}

// Add base relocations to .reloc section.		// Add base relocations to .reloc section.
void Writer::addBaserels() {		void Writer::addBaserels() {
if (!Config->Relocatable)		if (!Config->Relocatable)
return;		return;
		RelocSec->clear();
std::vector<Baserel> V;		std::vector<Baserel> V;
for (OutputSection *Sec : OutputSections) {		for (OutputSection *Sec : OutputSections) {
if (Sec->Header.Characteristics & IMAGE_SCN_MEM_DISCARDABLE)		if (Sec->Header.Characteristics & IMAGE_SCN_MEM_DISCARDABLE)
continue;		continue;
// Collect all locations for base relocations.		// Collect all locations for base relocations.
for (Chunk *C : Sec->getChunks())		for (Chunk *C : Sec->getChunks())
C->getBaserels(&V);		C->getBaserels(&V);
// Add the addresses to .reloc section.		// Add the addresses to .reloc section.
[23 lines not shown]

test/COFF/Inputs/far-arm-thumb-abs.s

This file was deleted.

	.global too_far1
	too_far1 = 0x1401004

test/COFF/Inputs/far-arm-thumb-abs20.s

This file was deleted.

	.global too_far20
	too_far20 = 0x501004

test/COFF/arm-thumb-branch-error.s

This file was deleted.

	// RUN: llvm-mc -filetype=obj -triple=thumbv7a-windows-gnu %s -o %t
	// RUN: llvm-mc -filetype=obj -triple=thumbv7a-windows-gnu %S/Inputs/far-arm-thumb-abs.s -o %tfar
	// RUN: not lld-link -entry:_start -subsystem:console %t %tfar -out:%t2 2>&1 \| FileCheck %s
	// REQUIRES: arm
	.syntax unified
	.globl _start
	_start:
	bl too_far1

	// CHECK: relocation out of range

test/COFF/arm-thumb-branch20-error.s

	// REQUIRES: arm			// REQUIRES: arm
	// RUN: llvm-mc -filetype=obj -triple=thumbv7a-windows-gnu %s -o %t.obj			// RUN: llvm-mc -filetype=obj -triple=thumbv7a-windows-gnu %s -o %t.obj
	// RUN: llvm-mc -filetype=obj -triple=thumbv7a-windows-gnu %S/Inputs/far-arm-thumb-abs20.s -o %t.far.obj			// RUN: not lld-link -entry:_start -subsystem:console %t.obj -out:%t.exe 2>&1 \| FileCheck %s
	// RUN: not lld-link -entry:_start -subsystem:console %t.obj %t.far.obj -out:%t.exe 2>&1 \| FileCheck %s
	.syntax unified			.syntax unified
	.globl _start			.globl _start
	_start:			_start:
	bne too_far20			bne too_far20
				.space 0x100000
				.section .text$a, "xr"
				too_far20:
				bx lr

	// CHECK: relocation out of range			// When trying to add a thunk at the end of the section, the thunk itself
				// will be too far away, so this won't converge.

				// CHECK: adding thunks hasn't converged

test/COFF/arm-thumb-thunks.s

This file was added.

				// REQUIRES: arm
				// RUN: llvm-mc -filetype=obj -triple=thumbv7-windows %s -o %t.obj
				// RUN: lld-link -entry:main -subsystem:console %t.obj -out:%t.exe
				// RUN: llvm-objdump -d %t.exe -start-address=0x401000 -stop-address=0x401022 | FileCheck -check-prefix=MAIN %s
				// RUN: llvm-objdump -d %t.exe -start-address=0x501012 -stop-address=0x501030 | FileCheck -check-prefix=FUNC1 %s
				// RUN: llvm-objdump -d %t.exe -start-address=0x601030 | FileCheck -check-prefix=FUNC2 %s

				// Pass 0:
				// main->func1 in range
				// main->func2 out of range, adding thunk after main
				// func1->func2 (first) out of range, using thunk from main
				// func1->func2 (second) in range
				// Pass 1:
				// main->func1 out of range, adding thunk after main
				// func1->thunk from main out of range, adding new thunk after func1
				// Pass 2:
				// func1->func2 (second) now out of range, using existing thunk after func1
				.syntax unified
				.globl main
				.globl func1
				.text
				main:
				bne func1
				bne func2
				nop
				.section .text$a, "xr"
				.space 0x100000 - 16
				.section .text$b, "xr"
				func1:
				bne func2
				nop
				nop
				nop
				nop
				bne func2
				bx lr
				.section .text$c, "xr"
				.space 0x100000
				.section .text$d, "xr"
				func2:
				// Test using string tail merging. This is irrelevant to the thunking itself,
				// but running multiple passes of assignAddresses() calls finalizeAddresses()
				// multiple times; check that MergeChunk handles this correctly.
				movw r0, :lower16:"??_C@string1"
				movt r0, :upper16:"??_C@string1"
				movw r1, :lower16:"??_C@string2"
				movt r1, :upper16:"??_C@string2"
				bx lr

				.section .rdata,"dr",discard,"??_C@string1"
				.globl "??_C@string1"
				"??_C@string1":
				.asciz "foobar"
				.section .rdata,"dr",discard,"??_C@string2"
				.globl "??_C@string2"
				"??_C@string2":
				.asciz "bar"

				// MAIN: 401000: 40 f0 03 80 bne.w #6 <.text+0xa>
				// MAIN: 401004: 40 f0 07 80 bne.w #14 <.text+0x16>
				// MAIN: 401008: 00 bf nop
				// func1 thunk
				// MAIN: 40100a: 4f f6 fd 7c movw r12, #65533
				// MAIN: 40100e: c0 f2 0f 0c movt r12, #15
				// MAIN: 401012: fc 44 add r12, pc
				// MAIN: 401014: 60 47 bx r12
				// func2 thunk
				// MAIN: 401016: 40 f2 0f 0c movw r12, #15
				// MAIN: 40101a: c0 f2 20 0c movt r12, #32
				// MAIN: 40101e: fc 44 add r12, pc
				// MAIN: 401020: 60 47 bx r12

				// FUNC1: 501012: 40 f0 07 80 bne.w #14 <.text+0x100024>
				// FUNC1: 501016: 00 bf nop
				// FUNC1: 501018: 00 bf nop
				// FUNC1: 50101a: 00 bf nop
				// FUNC1: 50101c: 00 bf nop
				// FUNC1: 50101e: 40 f0 01 80 bne.w #2 <.text+0x100024>
				// FUNC1: 501022: 70 47 bx lr
				// func2 thunk
				// FUNC1: 501024: 40 f2 01 0c movw r12, #1
				// FUNC1: 501028: c0 f2 10 0c movt r12, #16
				// FUNC1: 50102c: fc 44 add r12, pc
				// FUNC1: 50102e: 60 47 bx r12

				// FUNC2: 601030: 42 f2 00 00 movw r0, #8192
				// FUNC2: 601034: c0 f2 60 00 movt r0, #96
				// FUNC2: 601038: 42 f2 03 01 movw r1, #8195
				// FUNC2: 60103c: c0 f2 60 01 movt r1, #96
				// FUNC2: 601040: 70 47 bx lr
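
The expected disassembly above can be cross-checked by decoding where each thunk jumps. The sketch below assumes the thunk shape shown in the MAIN and FUNC1 checks (`movw r12, #Lo; movt r12, #Hi; add r12, pc; bx r12`); `thunkTarget` is a hypothetical helper written for this check, not a function from the patch.

```cpp
#include <cstdint>

// Hypothetical helper: the address reached by a thunk of the form
//   movw r12, #Lo ; movt r12, #Hi ; add r12, pc ; bx r12
// given the address of its first instruction.
constexpr uint32_t thunkTarget(uint32_t ThunkAddr, uint32_t Lo, uint32_t Hi) {
  // "add r12, pc" sits 8 bytes into the thunk, and in Thumb state the PC
  // reads 4 bytes ahead of the executing instruction.
  return ((Hi << 16) | Lo) + ThunkAddr + 8 + 4;
}

// First thunk after main (0x40100a): func1 at 0x501012, with the Thumb bit set.
static_assert(thunkTarget(0x40100a, 65533, 15) == 0x501013, "reaches func1");
// Second thunk after main (0x401016): func2 at 0x601030, with the Thumb bit set.
static_assert(thunkTarget(0x401016, 15, 32) == 0x601031, "reaches func2");
// Thunk after func1 (0x501024): also func2.
static_assert(thunkTarget(0x501024, 1, 16) == 0x601031, "reaches func2");
```

The decoded targets agree with the branch checks: `bne.w #6` at 0x401000 lands on the thunk at 0x40100a, which reaches func1, and `bne.w #14` at 0x401004 lands on the thunk at 0x401016, which reaches func2. The low bit in each result is the Thumb interworking bit consumed by `bx r12`.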