This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

COFF/
Chunks.h
Chunks.cpp
ICF.cpp
PDB.cpp
Writer.h
Writer.cpp
test/COFF/
arm-thumb-branch-error.s
arm-thumb-branch-thunk.s
arm-thumb-branch20-error.s
arm-thumb-branch20-thunk.s

Differential D51089

[LLD] [COFF] Add support for creating range extension thunks for ARM
AbandonedPublic

Authored by mstorsjo on Aug 22 2018, 2:14 AM.

Details

Reviewers

ruiu
peter.smith
pcc
rnk
javed.absar

Summary

This is a feature that MS link.exe lacks; it currently errors out on such relocations, just like lld did before.

This allows linking clang.exe for ARM - practically, any image over 16 MB will likely run into the issue.

Diff Detail

Event Timeline

mstorsjo created this revision. Aug 22 2018, 2:14 AM

Herald added a reviewer: javed.absar. Aug 22 2018, 2:14 AM

Herald added subscribers: chrib, kristof.beyls.

Just FWIW, this change plus D51032 seems to be enough to build a working (at least for small trivial examples) clang+lld for ARM/Windows (AArch64 seems to work just fine as is).

I've left a couple of comments where I think that there may be some unintentional inefficiencies, but as far as I can tell it looks like it will be correct. I suggest adding a lot more test cases as you go for things like thunk reuse.

COFF/Writer.cpp
356	Assuming getRVA() is the virtual address of the symbol, is Target->getRVA() stable between passes? Presumably if thunks are inserted then assignAddresses() may cause some symbols to change address? I'm not too familiar with the COFF code base so I could be missing something here. If I'm right the reuse between passes may not work as well as it could do.
408	In theory if you are iterating a fixed number (Chunks.size()) of the Chunks vector then inserting thunks into the Chunks vector in the same loop will mean that Chunks near the end may not be scanned for Thunks. Given that the algorithm will only terminate when 0 thunks are inserted you'll eventually scan all of them but it may cost you more passes than you would need if you inserted all Thunks in one go. I think you'll be unlikely to hit 10 passes without a contrived test case though.

In D51089#1209038, @peter.smith wrote:

I've left a couple of comments where I think that there may be some unintentional inefficiencies, but as far as I can tell it looks like it will be correct. I suggest adding a lot more test cases as you go for things like thunk reuse.

Thanks for taking a look!

Yup, some more tests definitely would be good. Do you have any suggestions on how to test things like this efficiently without creating >16 MB binaries? The existing tests (that used to check for errors) just use absolute symbols as targets to make it fail. I see that the ELF tests either do padding with .space or huge alignment with .balign - I guess something like that would work here as well. And I see that the existing ELF tests also produce huge binaries (although they hopefully are stored sparsely).

COFF/Writer.cpp
356	No, the RVAs aren't stable between passes, but we don't keep the map between passes either; it's a local variable in createThunks below. I guess it would be useful to allow this to find a different thunk from the previous pass (that would also reduce the amount of changes in later passes, reducing the number of passes required before it converges), but that's not implemented (yet).
408	Hmm, I'm not quite sure I understand what you mean here. You mean that since I'm adding more elements to the Chunks vector, I'd miss the last few ones that were pushed forward? The limit on the outer loop, on line 379, explicitly checks for Chunks.size(), so it will loop until the very end of the vector, even if Chunks grows meanwhile.

Sadly I had to resort to using .space to create large binaries. Creating the binaries is usually quick; disassembling them is unfortunately not. I tended to use GNU objdump first, as that skips zeros by default, to find the address ranges I needed, then used --start-address and --stop-address in llvm-objdump to pull out the bits that I needed.

In ELF we can use linker scripts for quite a few of the cases, although linker scripts make some corner cases possible that wouldn't exist otherwise. You may be able to do most of your tests with the conditional branch (±1 MB). I didn't do that with ELF, as the ThunkSections were placed at 16 MB intervals because the vast majority of the relocations we would encounter were ±16 MB.

COFF/Writer.cpp
408	Apologies, I had it in my mind that Chunks.size() would only be calculated once per pass.
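To illustrate the point settled in this exchange, here is a minimal sketch of an index-based loop whose bound is re-read on every iteration. It is not the code from the patch: `ToyChunk`, its fields, the thunk size of 8 and `insertThunks` are all hypothetical names and values chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a chunk: a size plus a flag saying whether a
// thunk has to be placed right after it.
struct ToyChunk {
  int Size;
  bool NeedsThunk;
};

// Walks the vector by index and inserts a thunk after every chunk that asks
// for one. Chunks.size() is re-evaluated on each iteration, so elements that
// an insertion pushes toward the end are still visited in the same pass.
inline std::size_t insertThunks(std::vector<ToyChunk> &Chunks) {
  std::size_t Added = 0;
  for (std::size_t I = 0; I != Chunks.size(); ++I) {
    if (!Chunks[I].NeedsThunk)
      continue;
    Chunks[I].NeedsThunk = false;
    // Insert by index; iterators would be invalidated by the insertion.
    Chunks.insert(Chunks.begin() + I + 1, ToyChunk{8, false});
    ++Added;
    ++I; // Skip the thunk that was just inserted.
  }
  return Added;
}
```

Had the bound been captured once before the loop, the last few chunks would only be scanned in a later pass, which is the inefficiency the original question was about.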

mstorsjo added inline comments. Aug 24 2018, 12:01 PM

COFF/Chunks.cpp
54	Does anyone have an opinion on the mechanism of overriding what symbol an individual reloc points to? Here I provide a full vector of symbols (which can't be initialized directly but after all symbols actually exist) - an alternative would be e.g. a DenseMap to only provide the individual symbols that are overridden. Or something else?

smeenai added a subscriber: smeenai. Aug 24 2018, 12:50 PM

ruiu added inline comments. Aug 26 2018, 9:52 PM

COFF/Chunks.cpp
51–52	Can this happen?
53–54	Since this vector can be very large, it is perhaps better to call `reserve()`.
54	I think this vector should be fine because this vector will be used very heavily and a vector lookup is extremely fast.
424	It is more straightforward to write this as a plain old `for` loop instead of a range-based for loop with `Counter`.
COFF/Chunks.h
227	Maybe something like `RelocTargets` is better? `Symbols` looks like it represents symbol table contents. Please add a comment to explain why we want to cache relocation targets in this table (i.e. we need to modify relocation targets when a relocation is redirected to other symbol due to thunk insertion.)
COFF/ICF.cpp
150	I think this comment is not easy to understand unless you know it was written in the context of ARM thunk support. Maybe we can just omit it? Does ICF run after ARM thunk creation?
178	Ditto.
COFF/Writer.cpp
339	nit: move this assert at the beginning of this function.
360	I don't know if your above comment is true, but if the one this loop is looking for is likely at the end of the vector, I'd search in the reverse order. I.e. for (Defined *Sym : llvm::reverse(TargetThunks))
362	Can this just be `return {Sym, false}`?
363	nit: omit {}
367	I don't think you need to pass this `ThunkCounter` around; I'd define this as a static local variable in this function.
370	Error messages should start with a lowercase letter. Shouldn't this be an `assert`? If this can be triggered by a valid user input, this error message should contain more info as to what is the problem.
372	Maybe `return {D, true}`
376	std::map is an ordered map and usually much slower than DenseMap, so please use DenseMap.
385	nit: please insert a blank line before a comment.
406	Nesting is too deep. Please consider splitting to multiple functions.
429	Adding -> adding
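Several of the Writer.cpp comments above (search in reverse, return `{Sym, false}` / `{D, true}`, prefer a hash map over `std::map`) can be pictured with a small sketch. This is only an illustration under stated assumptions: `ToyThunk`, `ThunkMap`, `reachable` and `getThunk` are hypothetical, addresses are plain integers instead of `Defined` symbols, and `std::unordered_map` stands in for `llvm::DenseMap` so that the example needs no LLVM headers.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical toy types; the patch itself works on Defined symbols and chunks.
struct ToyThunk {
  uint64_t RVA;       // where the thunk itself lives
  uint64_t TargetRVA; // where it jumps to
};

// Stand-in for llvm::DenseMap, keyed by the target address.
using ThunkMap = std::unordered_map<uint64_t, std::vector<ToyThunk>>;

constexpr int64_t BranchRange = 16 * 1024 * 1024; // the 16 MB figure from the thread

inline bool reachable(uint64_t From, uint64_t To) {
  int64_t Diff = static_cast<int64_t>(To) - static_cast<int64_t>(From);
  return Diff >= -BranchRange && Diff < BranchRange;
}

// Returns {thunk RVA, false} when an existing thunk for the target is in
// range of the caller, otherwise records a new one and returns {RVA, true}.
inline std::pair<uint64_t, bool> getThunk(ThunkMap &Thunks, uint64_t TargetRVA,
                                          uint64_t CallerRVA,
                                          uint64_t NewThunkRVA) {
  std::vector<ToyThunk> &Existing = Thunks[TargetRVA];
  // Search newest first: the most recently added thunk is the likeliest hit.
  for (auto It = Existing.rbegin(); It != Existing.rend(); ++It)
    if (reachable(CallerRVA, It->RVA))
      return {It->RVA, false};
  assert(reachable(CallerRVA, NewThunkRVA) && "new thunk must be in range");
  Existing.push_back(ToyThunk{NewThunkRVA, TargetRVA});
  return {NewThunkRVA, true};
}
```

The boolean in the returned pair tells the caller whether a relayout is needed, which is what the `{Sym, false}` and `{D, true}` suggestions express.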

Thanks for the feedback! I'll fold this into the next iteration of the patch; I have a bunch of other improvements planned.

COFF/Chunks.cpp
51–52	I don't think so - perhaps I should make it an assert. I had to insert a call to finalizeContents() in relocateDebugChunk() in PDB.cpp though, so I wanted to make sure.
53–54	Indeed, I'll make it use reserve in the next iteration.
424	Ok, will do.
COFF/ICF.cpp
150	Yeah, it's probably best to omit the comment. ICF runs before the thunk creation. The original reason for the comment was that I wanted to replace all uses of coff_relocation::SymbolTableIndex with the Symbols vector (outside the case when initializing the vector), to make things consistent, but it wasn't practical in all cases. In reality it is mostly necessary in SectionChunk::writeTo() and SectionChunk::getBaserels().
COFF/Writer.cpp
367	I later figured out I didn't need to name the thunk symbol at all; I just create the DefinedSynthetic directly without a name, and don't add it to Symtab - just like we do with object file local symbols.
370	I managed to remove this error altogether by not adding the thunk symbols to the symbol table.

mstorsjo added inline comments. Aug 27 2018, 12:25 AM

COFF/Writer.cpp
360	That's a nice idea. I'm changing code to keep the mapping of existing thunks stable across passes, so then the comment doesn't apply quite as much as before. I tested this and it didn't really save any measurable runtime, but it might be worthwhile anyway, as long as we iterate through the whole vector.
362	Indeed, will simplify the syntax of these.

mstorsjo added inline comments. Aug 27 2018, 4:11 AM

COFF/Chunks.cpp
51–52	Actually, yes, it does happen. `finalizeContents` gets called by `assignAddresses`, which gets called repeatedly when relayouting after adding thunks. (After realizing this, I had to make sure MergeChunk::finalizeContents works properly for this case as well.) The alternative would be to add another callback to Chunk, which we'd call just once, when all symbols are available.
COFF/Writer.cpp
360	In practice with the case I'm testing, TargetThunks will only have 0 or 1 members, so it doesn't really matter much, but in general I guess it could be useful. (My testcase produces a 46 MB clang.exe, so it's only roughly twice as large as the branch range which is 16 MB.)
406	It's a bit hard to split this part to a separate method since it touches almost every single local variable from this method, but I can easily change the `if (!isInRange())` into an `if (isInRange()) continue;` to reduce the nesting a little.
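The `if (isInRange()) continue;` restructuring mentioned for line 406 can be sketched as follows. The names `ToyBranch`, `ToyReloc`, `isInRange` and `findOutOfRange` are hypothetical and the signature is not taken from the patch; the ranges used are the ±1 MB (conditional branch) and ±16 MB figures quoted in this thread, and the `P + 4` term reflects that a Thumb branch offset is relative to the instruction address plus 4.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical relocation kinds: a 20-bit conditional branch and a 24-bit
// branch/call, with the ranges quoted in the discussion.
enum class ToyBranch { Cond20, Branch24 };

struct ToyReloc {
  ToyBranch Kind;
  uint64_t P; // address of the branch instruction
  uint64_t S; // address of the branch target
};

inline bool isInRange(ToyBranch Kind, uint64_t S, uint64_t P) {
  // Thumb branch offsets are relative to the instruction address plus 4.
  int64_t Diff = static_cast<int64_t>(S) - static_cast<int64_t>(P + 4);
  int64_t Limit = Kind == ToyBranch::Cond20 ? (int64_t(1) << 20)   // 1 MB
                                            : (int64_t(1) << 24);  // 16 MB
  return Diff >= -Limit && Diff < Limit;
}

// Collects the indices of relocations that would need a thunk. The early
// `continue` keeps the interesting work at one nesting level.
inline std::vector<std::size_t>
findOutOfRange(const std::vector<ToyReloc> &Relocs) {
  std::vector<std::size_t> Out;
  for (std::size_t I = 0; I != Relocs.size(); ++I) {
    const ToyReloc &R = Relocs[I];
    if (isInRange(R.Kind, R.S, R.P))
      continue;
    Out.push_back(I);
  }
  return Out;
}
```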

Updated taking @ruiu and @peter.smith's feedback into account. I still haven't added more tests though, so that's still a clear todo, but reposting for more potential feedback meanwhile.

I updated the code to keep the thunk maps between passes, which leads to far fewer additions in later passes (originally I could occasionally hit up to 8 passes before things were done, now it gets done in 3 passes), and fixed the code to avoid chaining thunks in case the originally chosen thunk went out of range.

Added a pretty complete testcase testing most aspects of the algorithm that is implemented, added a testcase for when unable to fix range issues with thunks.

Missed the RelocTargets part of the diff in the previous update.

efriedma added a subscriber: efriedma. Aug 28 2018, 1:56 PM

efriedma added inline comments.

COFF/Chunks.cpp
642	Can you use "add pc, ip" instead? That's not an interworking branch, but I think we can assume the target is Thumb mode here.

mstorsjo added inline comments. Aug 28 2018, 2:20 PM

COFF/Chunks.cpp
642	Oh, indeed, that'll make it as small as the other ones, while being PIC. Will update.

Changed the thunk implementation to a shorter, non-interworking, form as suggested by @efriedma.
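The position-independent form can be pictured with a little arithmetic. The sketch below assumes, and this is an assumption since the thunk bytes are not shown in this part of the page, that the offset is loaded into `ip` by a 4-byte `movw` and a 4-byte `movt` before the `add pc, ip`. In Thumb state, reading `pc` yields the address of the current instruction plus 4, so the value added to `ip` is the thunk address plus 12. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bytes from the start of the thunk to the value PC holds when "add pc, ip"
// executes: two 4-byte moves (assumed), then the add's own address plus 4.
constexpr uint32_t PCBias = 4 + 4 + 4;

// Offset to load into ip so that PC + ip lands on the target. The arithmetic
// wraps modulo 2^32, which also covers targets below the thunk.
inline uint32_t thunkOffset(uint32_t TargetRVA, uint32_t ThunkRVA) {
  return TargetRVA - (ThunkRVA + PCBias);
}

// The two halves a movw/movt pair would carry.
inline uint16_t lower16(uint32_t V) { return static_cast<uint16_t>(V & 0xffff); }
inline uint16_t upper16(uint32_t V) { return static_cast<uint16_t>(V >> 16); }

// Address reached by the thunk for a given offset; used to check the math.
inline uint32_t landingAddress(uint32_t ThunkRVA, uint32_t Offset) {
  return ThunkRVA + PCBias + Offset;
}
```

Because only the difference between the two RVAs is encoded, the thunk needs no base relocation, which is what makes this form PIC.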

ruiu added inline comments. Aug 28 2018, 11:28 PM

COFF/Chunks.cpp
50	We generally don't micro-optimize code, but for relocations we do, because the number of relocations can be on the order of tens of millions for large programs. Spending one more microsecond on each relocation adds up to one second if your program has one million relocations. This function is a bit concerning in that regard. Could you measure the performance impact? Also, it looks odd that you do this in `finalizeContents`, as it doesn't correspond to finalizing contents. Perhaps this function should be given a new name.
COFF/Chunks.h
361–364	We generally don't define trivial accessors like this; instead, just define a member as a public one.

mstorsjo added inline comments. Aug 29 2018, 2:11 AM

COFF/Chunks.cpp
50	I tried measuring it, and I think it's making things slower, but it's mostly within measurement noise. My testcase was linking a 66 MB clang.exe (for x86_64). Before this change, the fastest link was in 480 ms, after the change the fastest link was 520 ms. But in both cases the runtimes occasionally go up to over 600 ms. So it's not huge but I think it's consistently measurable. Do you have any other suggestions on how to achieve this without affecting performance? Only use the RelocTargets vector if `Machine==ARMNT`? Or make it a `DenseMap` for overridden targets, which is empty for all other cases than when we have added thunks? Yes, it's a bit odd with `finalizeContents`, maybe a new method `initRelocTargets` which just gets called once before we start doing `assignAddresses`?
COFF/Chunks.h
361–364	Ok, will include that change into the next update.

ruiu added inline comments. Aug 29 2018, 2:19 AM

COFF/Chunks.cpp
50	Of the 520 ms, it'd be interesting to know how much time this function takes. gprof might help, but I'm not sure if it works on Windows. Going from 480 ms to 520 ms isn't, I think, a marginal difference; it's close to a 10% slowdown (40 ms on 480 ms is about 8%) if the measurement is accurate. One idea to make it faster (potentially even faster than it is now) is to parallelize it. I believe you can make it a separate function, say `readRelocTargets`, and call it on all input sections in parallel. It should be safe to do because filling this new vector doesn't affect other threads.

mstorsjo added inline comments. Aug 29 2018, 2:58 AM

COFF/Chunks.cpp
50	I don't run things on Windows myself, I'm mostly working with cross compilation here. I'll try making this a separate parallel pass and see what difference it makes!

Split out readRelocTargets() to a separate method, which is called in parallel. This reduced the performance drop quite a bit, now the difference is much smaller. In this test round, the fastest run for the original version was 395 ms, after this patch the fastest run was 401 ms. So in this case, the slowdown is around 2% which hopefully is more acceptable.
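As a rough picture of why this parallelizes safely, here is a minimal sketch in which every section fills only its own vector while the symbol table is read-only. It uses `std::thread` directly, which is an assumption made to keep the example free of LLVM headers; the patch presumably relies on LLD's own parallel helpers. `ToySection`, the integer "symbols" and `readAllRelocTargets` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for a section: one symbol table index per relocation
// and a cache of resolved targets (plain ints here instead of Symbol *).
struct ToySection {
  std::vector<uint32_t> SymbolIndices;
  std::vector<int> RelocTargets;

  // Resolves each relocation's symbol once and caches the result.
  void readRelocTargets(const std::vector<int> &SymTab) {
    RelocTargets.clear();
    RelocTargets.reserve(SymbolIndices.size());
    for (uint32_t Idx : SymbolIndices)
      RelocTargets.push_back(SymTab[Idx]);
  }
};

// Splits the sections across NumThreads workers in a strided fashion. Each
// worker writes only to the sections it owns, so no locking is needed.
inline void readAllRelocTargets(std::vector<ToySection> &Sections,
                                const std::vector<int> &SymTab,
                                unsigned NumThreads) {
  assert(NumThreads > 0);
  std::vector<std::thread> Workers;
  for (unsigned T = 0; T != NumThreads; ++T)
    Workers.emplace_back([&Sections, &SymTab, T, NumThreads] {
      for (std::size_t I = T; I < Sections.size(); I += NumThreads)
        Sections[I].readRelocTargets(SymTab);
    });
  for (std::thread &W : Workers)
    W.join();
}
```

The `reserve()` call mirrors the earlier review suggestion for the per-section vector.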

In D51089#1217205, @mstorsjo wrote:

Split out readRelocTargets() to a separate method, which is called in parallel. This reduced the performance drop quite a bit, now the difference is much smaller. In this test round, the fastest run for the original version was 395 ms, after this patch the fastest run was 401 ms. So in this case, the slowdown is around 2% which hopefully is more acceptable.

@ruiu - any further comments? Is this form more acceptable?

Ping @ruiu

Another ping for @ruiu

@ruiu - Do you have time to proceed with this one? Is the performance regression, which now is smaller thanks to your suggestion, acceptable? Or should I try other alternatives which make messier code with different codepaths for architectures that don't need thunks?

The thunk algorithm here should be good to go now (no longer RFC level), in case @peter.smith wants to have another look (it's just a few minor improvements over the original one, which was more or less ok'd).

The algorithm looks good to me. For ELF I preferred to give the Thunk a name related to the destination as it makes it a bit easier to follow disassembled binaries, but it is not essential.

Sorry for the belated response. I was thinking of this patch for a while.

Every time I see the thunk range extension code, I wonder if we really need this multi-pass algorithm, which adds thunks iteratively on each pass. I believe that in almost all cases the algorithm finishes on the first iteration if we allow a very small margin when determining "reachability". As long as the margin is small, the size increase from allowing it should be negligible.

For pathological executables for which we need to generate tons of thunks (which enlarges the distance between callers and callees and thus needs multiple passes with the current algorithm), we can simply discard everything that we made in the previous iteration instead of keeping it, double the margin, and then try again from scratch. In practice, I believe that fallback doesn't happen too frequently.

What do you think of the algorithm? If it works, I prefer it, because discarding everything and redoing the work with a larger margin is simpler than keeping thunks created in previous passes.
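The proposal can be sketched on a toy model; this is an illustration of the idea as described in the comment, not code from either patch. The assumptions: each chunk makes at most one call, a thunk sits directly after its caller and can reach any target, and `CallChunk`, `ThunkSize`, `layout`, `tryWithMargin` and `placeThunks` are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model: a chunk has a size and at most one call to another chunk
// (index into the same vector, -1 for none).
struct CallChunk {
  int64_t Size;
  int Callee;
};

constexpr int64_t ThunkSize = 10; // hypothetical size of one thunk

inline int64_t distance(int64_t A, int64_t B) { return A < B ? B - A : A - B; }

// Start address of every chunk when chunk I is followed by a thunk iff
// HasThunk[I] is set.
inline std::vector<int64_t> layout(const std::vector<CallChunk> &Chunks,
                                   const std::vector<bool> &HasThunk) {
  std::vector<int64_t> Addr(Chunks.size());
  int64_t Cur = 0;
  for (std::size_t I = 0; I != Chunks.size(); ++I) {
    Addr[I] = Cur;
    Cur += Chunks[I].Size + (HasThunk[I] ? ThunkSize : 0);
  }
  return Addr;
}

// One attempt: decide on thunks using the thunk-free layout and a safety
// margin, then verify the remaining direct calls on the final layout.
inline bool tryWithMargin(const std::vector<CallChunk> &Chunks, int64_t Range,
                          int64_t Margin, std::vector<bool> &HasThunk) {
  HasThunk.assign(Chunks.size(), false);
  std::vector<int64_t> Addr = layout(Chunks, HasThunk);
  for (std::size_t I = 0; I != Chunks.size(); ++I) {
    int C = Chunks[I].Callee;
    if (C >= 0 && distance(Addr[I], Addr[C]) + Margin >= Range)
      HasThunk[I] = true;
  }
  Addr = layout(Chunks, HasThunk);
  for (std::size_t I = 0; I != Chunks.size(); ++I) {
    int C = Chunks[I].Callee;
    if (C < 0 || HasThunk[I])
      continue;
    if (distance(Addr[I], Addr[C]) >= Range)
      return false; // a direct call was pushed out of range by the thunks
  }
  return true;
}

// Discards everything and restarts with a doubled margin on failure.
inline bool placeThunks(const std::vector<CallChunk> &Chunks, int64_t Range,
                        int64_t Margin, std::vector<bool> &HasThunk) {
  assert(Margin > 0);
  for (; Margin < Range; Margin *= 2)
    if (tryWithMargin(Chunks, Range, Margin, HasThunk))
      return true;
  return false;
}
```

With a range of 100 and a margin of 4, a call that is 95 bytes away is left alone at first; a 10-byte thunk inserted in between then pushes it out of range, and the retry with a margin of 8 gives that call a thunk as well.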

COFF/Chunks.cpp
638	This variable name seems a bit too long to my taste; I'd name `ArmThunk` or something like that, and that should be fine as long as this is a file-scope variable.
790–797	This part is not guarded by `Finalized` -- is that intended?
COFF/PDB.cpp
775	If you change this line to cast<SectionChunk>(DebugChunk)->readRelocTargets(); then can you make `readRelocTargets` a non-virtual member function that belongs to `SectionChunk`?
COFF/Writer.cpp
365	Please insert a blank line before a multi-line comment.

In D51089#1231980, @ruiu wrote:

Sorry for the belated response. I was thinking of this patch for a while.

Every time I see the thunk range extension code, I wonder if we really need this multi-pass algorithm, which adds thunks iteratively on each pass. I believe that in almost all cases the algorithm finishes on the first iteration if we allow a very small margin when determining "reachability". As long as the margin is small, the size increase from allowing it should be negligible.

For pathological executables for which we need to generate tons of thunks (which enlarges the distance between callers and callees and thus needs multiple passes with the current algorithm), we can simply discard everything that we made in the previous iteration instead of keeping it, double the margin, and then try again from scratch. In practice, I believe that fallback doesn't happen too frequently.

What do you think of the algorithm? If it works, I prefer it, because discarding everything and redoing the work with a larger margin is simpler than keeping thunks created in previous passes.

It can work; I worked on a proprietary linker for embedded systems that used that algorithm, and it worked well enough 99% of the time. It could fail in nasty corner cases though; for example, increasing the margin means more calls go out of range, which leads to more thunks, and so on. Having said that, I suspect that won't be a problem for COFF, as most of the failures were a combination of a strange linker script and the number of thunks (the Thumb branch range used to be 4 MB, so there could be thousands of thunks in a large project).

In D51089#1231980, @ruiu wrote:

Sorry for the belated response. I was thinking of this patch for a while.

Every time I see the thunk range extension code, I wonder if we really need this multi-pass algorithm, which adds thunks iteratively on each pass. I believe that in almost all cases the algorithm finishes on the first iteration if we allow a very small margin when determining "reachability". As long as the margin is small, the size increase from allowing it should be negligible.

I actually thought about that before, but either didn't think it through properly or didn't feel it was necessary - but by adding that in this current design I can make it succeed after the first pass already (previously it required two passes adding thunks on my testcase).

For pathological executables for which we need to generate tons of thunks (which enlarges the distance between callers and callees and thus needs multiple passes with the current algorithm), we can simply discard everything that we made in the previous iteration instead of keeping it, double the margin, and then try again from scratch. In practice, I believe that fallback doesn't happen too frequently.

What do you think of the algorithm? If it works, I prefer it, because discarding everything and redoing the work with a larger margin is simpler than keeping thunks created in previous passes.

The approach you describe feels a bit fragile. Unless you're really sure the margin tradeoff is right and it will be done on the first pass in real-world cases, it'll degrade pretty badly.

I was going to try to give this an honest and objective try, but I don't feel it will be much simpler. For your suggestion, we would need two kinds of loops over the chunks: one loop which checks whether thunks are needed with a margin, adding them as necessary, and a second loop which checks whether all relocations are now in range (not trying to add any thunks, but just aborting the loop, then resetting everything back to the original state and starting over). By contrast, the current code (which is also very similar to the corresponding ELF thunk code) does the verification in the same pass that tries to add more thunks if needed; if no more are needed, the algorithm is done.

COFF/Chunks.cpp
638	Sure, I'll shorten it.
790–797	Yes - this is called from assignAddresses on each relayout, to propagate the current location in the layout.
COFF/PDB.cpp
775	That also requires changes like `if (SectionChunk *SC = dyn_cast_or_null<SectionChunk>(C))` in readRelocTargets() in Writer.cpp. Or we move calling that to somewhere else, but then it's probably not quite as easy to parallelize.

Optimized the algorithm further by checking ranges with a margin in the first pass, making it succeed after the first run in my testcase. Shortened a variable name and added whitespace as @ruiu suggested.

The testcase isn't updated after the last adjustments though. It's quite a lot of work to handcraft a testcase that triggers as many of the corner cases of the algorithm as possible; I'll update it once we settle on which algorithm to choose.

Since using a margin for adding thunks, I think it will be extremely tedious to actually create a testcase which would trigger more than one pass.

In D51089#1232372, @mstorsjo wrote:

Since using a margin for adding thunks, I think it will be extremely tedious to actually create a testcase which would trigger more than one pass.

In order to get sensible test coverage, I could make the margin (used only in the first pass) configurable (somehow), and run the multipass test with a very small margin.

Even with the other approach suggested by @ruiu, testing of the case when one pass isn't enough would require a huge, pathological test.

mstorsjo mentioned this in D52156: [LLD] [COFF] Alternative ARM range thunk algorithm. Sep 16 2018, 2:30 PM

If I understand it, to get a test case we'd need to have a branch that is just in range (including margin) such that no thunk is generated, but adding sufficient thunks causes that branch to go out of range. The brute force way to do it would be to generate more than (margin/thunk-size) thunks, but even with macros that would be a large, tedious test to write. One possibility from ELF, which I don't know would carry over to COFF, is to have some sections with high alignment, so that inserting a thunk could displace one of these sections off an alignment boundary and hence add much more size than just the thunk.
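The brute-force bound is simple arithmetic: the smallest thunk count whose combined size exceeds the margin. The helper below is a hypothetical illustration; the byte values in the usage note are made up, since the actual margin and thunk size are not stated in this comment.

```cpp
#include <cassert>
#include <cstdint>

// Smallest number of thunks whose total size is strictly greater than the
// margin, i.e. the smallest N with N * ThunkBytes > MarginBytes.
inline uint64_t thunksToExceedMargin(uint64_t MarginBytes, uint64_t ThunkBytes) {
  assert(ThunkBytes > 0);
  return MarginBytes / ThunkBytes + 1;
}
```

For instance, with a made-up margin of 100 bytes and 10-byte thunks, 10 thunks exactly use up the margin, so 11 are needed to push a borderline branch out of range.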

Interesting. I think that the implementation you have here will converge faster as it makes it more likely that pass 1 has all the thunks, at the expense of potentially generating more thunks than is strictly required. However if the goal is simplicity I think that you'll need to do everything in one pass, and accept that there will be corner cases that might not link if the margin isn't sufficient. For ELF and arbitrary linker scripts I thought the chance of failure too high, for COFF the chance of failure may be low enough. I think the most likely edge case in COFF will be the presence of sections with high alignment requirements as inserting thunks could cause a lot of bytes of alignment padding to be added.

In D51089#1237524, @peter.smith wrote:

Interesting. I think that the implementation you have here will converge faster as it makes it more likely that pass 1 has all the thunks, at the expense of potentially generating more thunks than is strictly required. However if the goal is simplicity I think that you'll need to do everything in one pass, and accept that there will be corner cases that might not link if the margin isn't sufficient. For ELF and arbitrary linker scripts I thought the chance of failure too high, for COFF the chance of failure may be low enough. I think the most likely edge case in COFF will be the presence of sections with high alignment requirements as inserting thunks could cause a lot of bytes of alignment padding to be added.

My apologies, I clicked on the wrong link in the email; I should have been looking at D52156. Please ignore the previous comment as it won't make much sense. The comment about writing the test might still make sense.

Went with the alternative algorithm in D52156 instead.

Revision Contents

Path                                     Size
COFF/Chunks.h                            12 lines
COFF/Chunks.cpp                          44 lines
COFF/ICF.cpp                             6 lines
COFF/PDB.cpp                             1 line
COFF/Writer.h                            2 lines
COFF/Writer.cpp                          112 lines
test/COFF/arm-thumb-branch-error.s
test/COFF/arm-thumb-branch-thunk.s       17 lines
test/COFF/arm-thumb-branch20-error.s
test/COFF/arm-thumb-branch20-thunk.s     17 lines

Diff 161900

COFF/Chunks.h

[135 lines not shown]
    public:
      symbol_iterator() = default;

      Symbol *operator*() const { return File->getSymbol(I->SymbolTableIndex); }
    };

    SectionChunk(ObjFile *File, const coff_section *Header);
    static bool classof(const Chunk *C) { return C->kind() == SectionKind; }
+   void finalizeContents() override;
    size_t getSize() const override { return Header->SizeOfRawData; }
    ArrayRef<uint8_t> getContents() const;
    void writeTo(uint8_t *Buf) const override;
    bool hasData() const override;
    uint32_t getOutputCharacteristics() const override;
    StringRef getSectionName() const override;
    void getBaserels(std::vector<Baserel> *Res) override;
    bool isCOMDAT() const;
[66 lines not shown]

    // The file that this chunk was created from.
    ObjFile *File;

    // The COMDAT leader symbol if this is a COMDAT chunk.
    DefinedRegular *Sym = nullptr;

    ArrayRef<coff_relocation> Relocs;
+   std::vector<Symbol *> Symbols;
[Inline comment by ruiu on this line; see the COFF/Chunks.h 227 entry in the Event Timeline.]

  private:
    StringRef SectionName;
    std::vector<SectionChunk *> AssocChildren;

    // Used by the garbage collector.
    bool Live;

[112 lines not shown]
    explicit ImportThunkChunkARM64(Defined *S) : ImpSymbol(S) {}
    size_t getSize() const override { return sizeof(ImportThunkARM64); }
    void writeTo(uint8_t *Buf) const override;

  private:
    Defined *ImpSymbol;
  };

+ class RangeExtensionThunkARM : public Chunk {
+ public:
+   explicit RangeExtensionThunkARM(Defined *T) : Target(T) {}
+   size_t getSize() const override;
+   void writeTo(uint8_t *Buf) const override;
+
+ private:
+   Defined *Target;
+ };
[Inline comments by ruiu and mstorsjo here; see the COFF/Chunks.h 361–364 entries in the Event Timeline.]

  // Windows-specific.
  // See comments for DefinedLocalImport class.
  class LocalImportChunk : public Chunk {
  public:
    explicit LocalImportChunk(Defined *S) : Sym(S) {}
    size_t getSize() const override;
    void getBaserels(std::vector<Baserel> *Res) override;
    void writeTo(uint8_t *Buf) const override;
[115 lines not shown]

COFF/Chunks.cpp

Show All 38 Lines	SectionChunk::SectionChunk(ObjFile F, const coff_section H)

// If linker GC is disabled, every chunk starts out alive. If linker GC is		// If linker GC is disabled, every chunk starts out alive. If linker GC is
// enabled, treat non-comdat sections as roots. Generally optimized object		// enabled, treat non-comdat sections as roots. Generally optimized object
// files will be built with -ffunction-sections or /Gy, so most things worth		// files will be built with -ffunction-sections or /Gy, so most things worth
// stripping will be in a comdat.		// stripping will be in a comdat.
Live = !Config->DoGC \|\| !isCOMDAT();		Live = !Config->DoGC \|\| !isCOMDAT();
}		}

		// Initialize the Symbols vector, to allow redirecting certain relocations
		// to a thunk instead of the actual symbol the relocation's symbol table index
		// indicates.
		void SectionChunk::finalizeContents() {
		ruiu: We generally don't micro-optimize code, but for relocations we do, because large programs can have on the order of tens of millions of relocations. Spending one more microsecond per relocation adds up to one second if your program has one million relocations. This function is a bit concerning in that regard. Could you measure the performance impact? Also, it looks odd that you do this in `finalizeContents`, as it doesn't correspond to finalizing contents. Perhaps this function should be given a new name.
		mstorsjo: I tried measuring it, and I think it's making things slower, but it's mostly within measurement noise. My testcase was linking a 66 MB clang.exe (for x86_64). Before this change, the fastest link took 480 ms; after the change, the fastest link took 520 ms. But in both cases the runtimes occasionally go up to over 600 ms. So it's not huge, but I think it's consistently measurable. Do you have any other suggestions on how to achieve this without affecting performance? Only use the RelocTargets vector if `Machine==ARMNT`? Or make it a `DenseMap` for overridden targets, which is empty in all cases except when we have added thunks? Yes, it's a bit odd with `finalizeContents`; maybe a new method `initRelocTargets` which just gets called once before we start doing `assignAddresses`?
		ruiu: Of the 520 ms, it'd be interesting to know how much time this function is spending. gprof might help, but I'm not sure if it works on Windows. I don't think 480 ms to 520 ms is a marginal difference; it's a slowdown of roughly 8% if the measurement is accurate. One idea to make it faster (potentially even faster than it is now) is to parallelize it. I believe you can make it a separate function, say `readRelocTargets`, and call it on all input sections in parallel. It should be safe to do because filling this new vector doesn't affect other threads.
		mstorsjo: I don't run things on Windows myself; I'm mostly working with cross compilation here. I'll try making this a separate parallel pass and see what difference it makes!
		if (!Symbols.empty())
		return;
		ruiu: Can this happen?
		mstorsjo: I don't think so - perhaps I should make it an assert. I had to insert a call to finalizeContents() in relocateDebugChunk() in PDB.cpp though, so I wanted to make sure.
		mstorsjo: Actually, yes, it does happen. `finalizeContents` gets called by `assignAddresses`, which gets called repeatedly when relayouting after adding thunks. (After realizing this, I had to make sure MergeChunk::finalizeContents works properly for this case as well.) The alternative would be to add another callback to Chunk, which we'd call just once, when all symbols are available.
		for (const coff_relocation &Rel : Relocs)
		Symbols.push_back(File->getSymbol(Rel.SymbolTableIndex));
		mstorsjo: Does anyone have an opinion on the mechanism of overriding what symbol an individual reloc points to? Here I provide a full vector of symbols (which can't be initialized directly, only after all symbols actually exist) - an alternative would be e.g. a DenseMap to only provide the individual symbols that are overridden. Or something else?
		ruiu: I think this vector should be fine because it will be used very heavily and a vector lookup is extremely fast.
		ruiu: Since this vector can be very large, it is perhaps better to call `reserve()`.
		mstorsjo: Indeed, I'll make it use reserve in the next iteration.
		}
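
The thread above settles on a per-relocation vector of targets that is filled once, sized with `reserve()`, and later patched when a relocation is redirected to a thunk. A minimal sketch of that idea follows; `Symbol`, `Reloc`, `Section` and `readRelocTargets` are simplified stand-ins invented for illustration and are not lld's actual classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for lld's Symbol and coff_relocation.
struct Symbol { int Id; };
struct Reloc { std::size_t SymbolTableIndex; };

struct Section {
  std::vector<Reloc> Relocs;
  std::vector<Symbol *> RelocTargets; // one entry per relocation

  // Fill RelocTargets once. Later calls (for example on every relayout)
  // return early, so entries already redirected to a thunk are kept.
  void readRelocTargets(const std::vector<Symbol *> &SymTab) {
    if (!RelocTargets.empty())
      return;
    RelocTargets.reserve(Relocs.size()); // avoid repeated reallocation
    for (const Reloc &R : Relocs)
      RelocTargets.push_back(SymTab[R.SymbolTableIndex]);
  }
};
```

Because each section only writes to its own vector, a pass like this could presumably be run over all sections in parallel, as suggested in the review.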

static void add16(uint8_t *P, int16_t V) { write16le(P, read16le(P) + V); }		static void add16(uint8_t *P, int16_t V) { write16le(P, read16le(P) + V); }
static void add32(uint8_t *P, int32_t V) { write32le(P, read32le(P) + V); }		static void add32(uint8_t *P, int32_t V) { write32le(P, read32le(P) + V); }
static void add64(uint8_t *P, int64_t V) { write64le(P, read64le(P) + V); }		static void add64(uint8_t *P, int64_t V) { write64le(P, read64le(P) + V); }
static void or16(uint8_t *P, uint16_t V) { write16le(P, read16le(P) \| V); }		static void or16(uint8_t *P, uint16_t V) { write16le(P, read16le(P) \| V); }
static void or32(uint8_t *P, uint32_t V) { write32le(P, read32le(P) \| V); }		static void or32(uint8_t *P, uint32_t V) { write32le(P, read32le(P) \| V); }

// Verify that given sections are appropriate targets for SECREL		// Verify that given sections are appropriate targets for SECREL
// relocations. This check is relaxed because unfortunately debug		// relocations. This check is relaxed because unfortunately debug
▲ Show 20 Lines • Show All 243 Lines • ▼ Show 20 Lines	if (!hasData())
return;		return;
// Copy section contents from source object file to output file.		// Copy section contents from source object file to output file.
ArrayRef<uint8_t> A = getContents();		ArrayRef<uint8_t> A = getContents();
if (!A.empty())		if (!A.empty())
memcpy(Buf + OutputSectionOff, A.data(), A.size());		memcpy(Buf + OutputSectionOff, A.data(), A.size());

// Apply relocations.		// Apply relocations.
size_t InputSize = getSize();		size_t InputSize = getSize();
		size_t Idx = 0;
for (const coff_relocation &Rel : Relocs) {		for (const coff_relocation &Rel : Relocs) {
// Check for an invalid relocation offset. This check isn't perfect, because		// Check for an invalid relocation offset. This check isn't perfect, because
// we don't have the relocation size, which is only known after checking the		// we don't have the relocation size, which is only known after checking the
// machine and relocation type. As a result, a relocation may overwrite the		// machine and relocation type. As a result, a relocation may overwrite the
// beginning of the following input section.		// beginning of the following input section.
if (Rel.VirtualAddress >= InputSize) {		if (Rel.VirtualAddress >= InputSize) {
error("relocation points beyond the end of its parent section");		error("relocation points beyond the end of its parent section");
continue;		continue;
}		}

uint8_t *Off = Buf + OutputSectionOff + Rel.VirtualAddress;		uint8_t *Off = Buf + OutputSectionOff + Rel.VirtualAddress;

auto *Sym =		// Use the potentially remapped Symbol instead of the one that the
dyn_cast_or_null<Defined>(File->getSymbol(Rel.SymbolTableIndex));		// relocation points to.
		auto *Sym = dyn_cast_or_null<Defined>(Symbols[Idx++]);
if (!Sym) {		if (!Sym) {
if (isCodeView() \|\| isDWARF())		if (isCodeView() \|\| isDWARF())
continue;		continue;
// Symbols in early discarded sections are represented using null pointers,		// Symbols in early discarded sections are represented using null pointers,
// so we need to retrieve the name from the object file.		// so we need to retrieve the name from the object file.
COFFSymbolRef Sym =		COFFSymbolRef Sym =
check(File->getCOFFObj()->getSymbol(Rel.SymbolTableIndex));		check(File->getCOFFObj()->getSymbol(Rel.SymbolTableIndex));
StringRef Name;		StringRef Name;
▲ Show 20 Lines • Show All 75 Lines • ▼ Show 20 Lines	static uint8_t getBaserelType(const coff_relocation &Rel) {
}		}
}		}

// Windows-specific.		// Windows-specific.
// Collect all locations that contain absolute addresses, which need to be		// Collect all locations that contain absolute addresses, which need to be
// fixed by the loader if load-time relocation is needed.		// fixed by the loader if load-time relocation is needed.
// Only called when base relocation is enabled.		// Only called when base relocation is enabled.
void SectionChunk::getBaserels(std::vector<Baserel> *Res) {		void SectionChunk::getBaserels(std::vector<Baserel> *Res) {
		size_t Counter = 0;
for (const coff_relocation &Rel : Relocs) {		for (const coff_relocation &Rel : Relocs) {
		ruiu: It is more straightforward to write this loop as a plain old `for` loop instead of a range-based for loop with `Counter`.
		mstorsjo: Ok, will do.
uint8_t Ty = getBaserelType(Rel);		uint8_t Ty = getBaserelType(Rel);
		size_t Idx = Counter++;
if (Ty == IMAGE_REL_BASED_ABSOLUTE)		if (Ty == IMAGE_REL_BASED_ABSOLUTE)
continue;		continue;
Symbol *Target = File->getSymbol(Rel.SymbolTableIndex);		// Use the potentially remapped Symbol instead of the one that the
		// relocation points to.
		Symbol *Target = Symbols[Idx];
if (!Target \|\| isa<DefinedAbsolute>(Target))		if (!Target \|\| isa<DefinedAbsolute>(Target))
continue;		continue;
Res->emplace_back(RVA + Rel.VirtualAddress, Ty);		Res->emplace_back(RVA + Rel.VirtualAddress, Ty);
}		}
}		}

static int getRuntimePseudoRelocSize(uint16_t Type) {		static int getRuntimePseudoRelocSize(uint16_t Type) {
// Relocations that either contain an absolute address, or a plain		// Relocations that either contain an absolute address, or a plain
▲ Show 20 Lines • Show All 189 Lines • ▼ Show 20 Lines

void ImportThunkChunkARM64::writeTo(uint8_t *Buf) const {		void ImportThunkChunkARM64::writeTo(uint8_t *Buf) const {
int64_t Off = ImpSymbol->getRVA() & 0xfff;		int64_t Off = ImpSymbol->getRVA() & 0xfff;
memcpy(Buf + OutputSectionOff, ImportThunkARM64, sizeof(ImportThunkARM64));		memcpy(Buf + OutputSectionOff, ImportThunkARM64, sizeof(ImportThunkARM64));
applyArm64Addr(Buf + OutputSectionOff, ImpSymbol->getRVA(), RVA, 12);		applyArm64Addr(Buf + OutputSectionOff, ImpSymbol->getRVA(), RVA, 12);
applyArm64Ldr(Buf + OutputSectionOff + 4, Off);		applyArm64Ldr(Buf + OutputSectionOff + 4, Off);
}		}

		// A Thumb2, PIC range extension thunk. A non-PIC one would be 2 bytes
		// shorter but would require a base relocation instead.
		ruiu: This variable name seems a bit too long for my taste; I'd name it `ArmThunk` or something like that, and that should be fine as long as this is a file-scope variable.
		mstorsjo: Sure, I'll shorten it.
		const uint8_t RangeExtensionThunkARMData[] = {
		0x40, 0xf2, 0x00, 0x0c, // P: movw ip,:lower16:S - (P + (L1-P) + 4)
		0xc0, 0xf2, 0x00, 0x0c, // movt ip,:upper16:S - (P + (L1-P) + 4)
		0xfc, 0x44, // L1: add ip, pc
		efriedma: Can you use "add pc, ip" instead? That's not an interworking branch, but I think we can assume the target is Thumb mode here.
		mstorsjo: Oh, indeed, that'll make it as small as the other ones, while being PIC. Will update.
		0x60, 0x47, // bx ip
		};

		size_t RangeExtensionThunkARM::getSize() const {
		return sizeof(RangeExtensionThunkARMData);
		}

		void RangeExtensionThunkARM::writeTo(uint8_t *Buf) const {
		uint64_t Offset = Target->getRVA() - RVA - 12;
		// The target address needs to have the Thumb bit set.
		Offset \|= 1;
		memcpy(Buf + OutputSectionOff, RangeExtensionThunkARMData,
		sizeof(RangeExtensionThunkARMData));
		applyMOV32T(Buf + OutputSectionOff, uint32_t(Offset));
		}
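
The constant 12 in `writeTo` follows from the thunk layout: the `add ip, pc` sits 8 bytes into the thunk, and in Thumb state PC reads as the address of the current instruction plus 4. A small sketch of the arithmetic is below; the helper names are made up for illustration, and the actual MOVW/MOVT bit encoding done by `applyMOV32T` is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Value loaded into ip by the movw/movt pair. Adding PC (thunk + 8 + 4)
// to it must yield the target address with the Thumb bit set.
uint32_t thunkOffset(uint64_t TargetRVA, uint64_t ThunkRVA) {
  uint64_t Offset = TargetRVA - ThunkRVA - 12;
  Offset |= 1; // keep the Thumb bit set for the final `bx ip`
  return uint32_t(Offset);
}

// The two 16-bit halves that movw and movt would carry.
uint16_t lower16(uint32_t V) { return uint16_t(V & 0xffff); }
uint16_t upper16(uint32_t V) { return uint16_t(V >> 16); }

// What ip holds after `add ip, pc`, modulo 2^32.
uint32_t resolve(uint32_t Offset, uint64_t ThunkRVA) {
  return uint32_t(ThunkRVA + 12 + Offset);
}
```

Backward targets work the same way because the 32-bit addition wraps around.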

void LocalImportChunk::getBaserels(std::vector<Baserel> *Res) {		void LocalImportChunk::getBaserels(std::vector<Baserel> *Res) {
Res->emplace_back(getRVA());		Res->emplace_back(getRVA());
}		}

size_t LocalImportChunk::getSize() const {		size_t LocalImportChunk::getSize() const {
return Config->is64() ? 8 : 4;		return Config->is64() ? 8 : 4;
}		}

▲ Show 20 Lines • Show All 115 Lines • ▼ Show 20 Lines
}		}

void MergeChunk::finalizeContents() {		void MergeChunk::finalizeContents() {
for (SectionChunk *C : Sections)		for (SectionChunk *C : Sections)
if (C->isLive())		if (C->isLive())
Builder.add(toStringRef(C->getContents()));		Builder.add(toStringRef(C->getContents()));
Builder.finalize();		Builder.finalize();

for (SectionChunk *C : Sections) {		for (SectionChunk *C : Sections) {
if (!C->isLive())		if (!C->isLive())
continue;		continue;
size_t Off = Builder.getOffset(toStringRef(C->getContents()));		size_t Off = Builder.getOffset(toStringRef(C->getContents()));
C->setOutputSection(Out);		C->setOutputSection(Out);
C->setRVA(RVA + Off);		C->setRVA(RVA + Off);
C->OutputSectionOff = OutputSectionOff + Off;		C->OutputSectionOff = OutputSectionOff + Off;
}		}
		ruiu: This part is not guarded by `Finalized` -- is that intended?
		mstorsjo: Yes - this is called from assignAddresses on each relayout, to propagate the current location in the layout.
}		}

uint32_t MergeChunk::getOutputCharacteristics() const {		uint32_t MergeChunk::getOutputCharacteristics() const {
return IMAGE_SCN_MEM_READ \| IMAGE_SCN_CNT_INITIALIZED_DATA;		return IMAGE_SCN_MEM_READ \| IMAGE_SCN_CNT_INITIALIZED_DATA;
}		}

size_t MergeChunk::getSize() const {		size_t MergeChunk::getSize() const {
return Builder.getSize();		return Builder.getSize();
}		}

void MergeChunk::writeTo(uint8_t *Buf) const {		void MergeChunk::writeTo(uint8_t *Buf) const {
Builder.write(Buf + OutputSectionOff);		Builder.write(Buf + OutputSectionOff);
}		}

} // namespace coff		} // namespace coff
} // namespace lld		} // namespace lld

COFF/ICF.cpp

Show First 20 Lines • Show All 141 Lines • ▼ Show 20 Lines	if (A->Relocs.size() != B->Relocs.size())
return false;		return false;

// Compare relocations.		// Compare relocations.
auto Eq = [&](const coff_relocation &R1, const coff_relocation &R2) {		auto Eq = [&](const coff_relocation &R1, const coff_relocation &R2) {
if (R1.Type != R2.Type \|\|		if (R1.Type != R2.Type \|\|
R1.VirtualAddress != R2.VirtualAddress) {		R1.VirtualAddress != R2.VirtualAddress) {
return false;		return false;
}		}
		// This doesn't use the potentially remapped Symbol, but at this point,
		ruiu: I think this comment is not easy to understand unless you know it was written in the context of ARM thunk support. Maybe we can just omit it? Does ICF run after ARM thunk creation?
		mstorsjo: Yeah, it's probably best to omit the comment. ICF runs before the thunk creation. The original reason for the comment was that I wanted to replace all uses of coff_relocation::SymbolTableIndex with the Symbols vector (outside the case when initializing the vector), to make things consistent, but it wasn't practical in all cases. In reality it is mostly necessary in SectionChunk::writeTo() and SectionChunk::getBaserels().
		// we shouldn't have done any remapping yet (and using the original
		// Symbol is probably better in any case).
Symbol *B1 = A->File->getSymbol(R1.SymbolTableIndex);		Symbol *B1 = A->File->getSymbol(R1.SymbolTableIndex);
Symbol *B2 = B->File->getSymbol(R2.SymbolTableIndex);		Symbol *B2 = B->File->getSymbol(R2.SymbolTableIndex);
if (B1 == B2)		if (B1 == B2)
return true;		return true;
if (auto *D1 = dyn_cast<DefinedRegular>(B1))		if (auto *D1 = dyn_cast<DefinedRegular>(B1))
if (auto *D2 = dyn_cast<DefinedRegular>(B2))		if (auto *D2 = dyn_cast<DefinedRegular>(B2))
return D1->getValue() == D2->getValue() &&		return D1->getValue() == D2->getValue() &&
D1->getChunk()->Class[Cnt % 2] == D2->getChunk()->Class[Cnt % 2];		D1->getChunk()->Class[Cnt % 2] == D2->getChunk()->Class[Cnt % 2];
Show All 9 Lines	return A->getOutputCharacteristics() == B->getOutputCharacteristics() &&
A->Checksum == B->Checksum && A->getContents() == B->getContents() &&		A->Checksum == B->Checksum && A->getContents() == B->getContents() &&
assocEquals(A, B);		assocEquals(A, B);
}		}

// Compare "moving" part of two sections, namely relocation targets.		// Compare "moving" part of two sections, namely relocation targets.
bool ICF::equalsVariable(const SectionChunk *A, const SectionChunk *B) {		bool ICF::equalsVariable(const SectionChunk *A, const SectionChunk *B) {
// Compare relocations.		// Compare relocations.
auto Eq = [&](const coff_relocation &R1, const coff_relocation &R2) {		auto Eq = [&](const coff_relocation &R1, const coff_relocation &R2) {
		// This doesn't use the potentially remapped Symbol, but at this point,
		ruiu: Ditto.
		// we shouldn't have done any remapping yet (and using the original
		// Symbol is probably better in any case).
Symbol *B1 = A->File->getSymbol(R1.SymbolTableIndex);		Symbol *B1 = A->File->getSymbol(R1.SymbolTableIndex);
Symbol *B2 = B->File->getSymbol(R2.SymbolTableIndex);		Symbol *B2 = B->File->getSymbol(R2.SymbolTableIndex);
if (B1 == B2)		if (B1 == B2)
return true;		return true;
if (auto *D1 = dyn_cast<DefinedRegular>(B1))		if (auto *D1 = dyn_cast<DefinedRegular>(B1))
if (auto *D2 = dyn_cast<DefinedRegular>(B2))		if (auto *D2 = dyn_cast<DefinedRegular>(B2))
return D1->getChunk()->Class[Cnt % 2] == D2->getChunk()->Class[Cnt % 2];		return D1->getChunk()->Class[Cnt % 2] == D2->getChunk()->Class[Cnt % 2];
return false;		return false;
▲ Show 20 Lines • Show All 119 Lines • Show Last 20 Lines

COFF/PDB.cpp

	Show First 20 Lines • Show All 766 Lines • ▼ Show 20 Lines
	}			}

	// Allocate memory for a .debug$S section and relocate it.			// Allocate memory for a .debug$S section and relocate it.
	static ArrayRef<uint8_t> relocateDebugChunk(BumpPtrAllocator &Alloc,			static ArrayRef<uint8_t> relocateDebugChunk(BumpPtrAllocator &Alloc,
	SectionChunk *DebugChunk) {			SectionChunk *DebugChunk) {
	uint8_t *Buffer = Alloc.Allocate<uint8_t>(DebugChunk->getSize());			uint8_t *Buffer = Alloc.Allocate<uint8_t>(DebugChunk->getSize());
	assert(DebugChunk->OutputSectionOff == 0 &&			assert(DebugChunk->OutputSectionOff == 0 &&
	"debug sections should not be in output sections");			"debug sections should not be in output sections");
				DebugChunk->finalizeContents();
				ruiu: If you change this line to `cast<SectionChunk>(DebugChunk)->readRelocTargets();`, can you then make `readRelocTargets` a non-virtual member function that belongs to `SectionChunk`?
				mstorsjo: That also requires changes like `if (SectionChunk *SC = dyn_cast_or_null<SectionChunk>(C))` in readRelocTargets() in Writer.cpp. Or we move calling that to somewhere else, but then it's probably not quite as easy to parallelize.
	DebugChunk->writeTo(Buffer);			DebugChunk->writeTo(Buffer);
	return consumeDebugMagic(makeArrayRef(Buffer, DebugChunk->getSize()),			return consumeDebugMagic(makeArrayRef(Buffer, DebugChunk->getSize()),
	".debug$S");			".debug$S");
	}			}

	static pdb::SectionContrib createSectionContrib(const Chunk *C, uint32_t Modi) {			static pdb::SectionContrib createSectionContrib(const Chunk *C, uint32_t Modi) {
	OutputSection *OS = C->getOutputSection();			OutputSection *OS = C->getOutputSection();
	pdb::SectionContrib SC;			pdb::SectionContrib SC;
	▲ Show 20 Lines • Show All 552 Lines • Show Last 20 Lines

COFF/Writer.h

	Show All 30 Lines
	class OutputSection {			class OutputSection {
	public:			public:
	OutputSection(llvm::StringRef N, uint32_t Chars) : Name(N) {			OutputSection(llvm::StringRef N, uint32_t Chars) : Name(N) {
	Header.Characteristics = Chars;			Header.Characteristics = Chars;
	}			}
	void addChunk(Chunk *C);			void addChunk(Chunk *C);
	void merge(OutputSection *Other);			void merge(OutputSection *Other);
	ArrayRef<Chunk *> getChunks() { return Chunks; }			ArrayRef<Chunk *> getChunks() { return Chunks; }
				void clear() { Chunks.clear(); }
	void addPermissions(uint32_t C);			void addPermissions(uint32_t C);
	void setPermissions(uint32_t C);			void setPermissions(uint32_t C);
	uint64_t getRVA() { return Header.VirtualAddress; }			uint64_t getRVA() { return Header.VirtualAddress; }
	uint64_t getFileOff() { return Header.PointerToRawData; }			uint64_t getFileOff() { return Header.PointerToRawData; }
	void writeHeaderTo(uint8_t *Buf);			void writeHeaderTo(uint8_t *Buf);
				bool createThunks(size_t &ThunkCounter);

	// Returns the size of this section in an executable memory image.			// Returns the size of this section in an executable memory image.
	// This may be smaller than the raw size (the raw size is multiple			// This may be smaller than the raw size (the raw size is multiple
	// of disk sector size, so there may be padding at end), or may be			// of disk sector size, so there may be padding at end), or may be
	// larger (if that's the case, the loader reserves spaces after end			// larger (if that's the case, the loader reserves spaces after end
	// of raw data).			// of raw data).
	uint64_t getVirtualSize() { return Header.VirtualSize; }			uint64_t getVirtualSize() { return Header.VirtualSize; }

	Show All 22 Lines

COFF/Writer.cpp

Show First 20 Lines • Show All 148 Lines • ▼ Show 20 Lines
private:		private:
void findRuntimePseudoRelocs();		void findRuntimePseudoRelocs();
void createSections();		void createSections();
void createMiscChunks();		void createMiscChunks();
void createImportTables();		void createImportTables();
void createExportTable();		void createExportTable();
void mergeSections();		void mergeSections();
void assignAddresses();		void assignAddresses();
		void finalizeAddresses();
void removeEmptySections();		void removeEmptySections();
void createSymbolAndStringTable();		void createSymbolAndStringTable();
void openFile(StringRef OutputPath);		void openFile(StringRef OutputPath);
template <typename PEHeaderTy> void writeHeader();		template <typename PEHeaderTy> void writeHeader();
void createSEHTable();		void createSEHTable();
void createRuntimePseudoRelocs();		void createRuntimePseudoRelocs();
void createGuardCFTables();		void createGuardCFTables();
void markSymbolsForRVATable(ObjFile *File,		void markSymbolsForRVATable(ObjFile *File,
▲ Show 20 Lines • Show All 161 Lines • ▼ Show 20 Lines	for (const auto &DebugDir : File.debug_directories()) {
// id that we recognize / support, ignore it.		// id that we recognize / support, ignore it.
if (ExistingDI->Signature.CVSignature != OMF::Signature::PDB70)		if (ExistingDI->Signature.CVSignature != OMF::Signature::PDB70)
return None;		return None;
return *ExistingDI;		return *ExistingDI;
}		}
return None;		return None;
}		}

		// Check whether the target address S is in range from a relocation
		// of type RelType at address P.
		static bool isInRange(uint16_t RelType, uint64_t S, uint64_t P) {
		int64_t Diff = S - P - 4;
		assert(Config->Machine == ARMNT);
		ruiu: nit: move this assert to the beginning of this function.
		switch (RelType) {
		case IMAGE_REL_ARM_BRANCH20T:
		return isInt<21>(Diff);
		case IMAGE_REL_ARM_BRANCH24T:
		case IMAGE_REL_ARM_BLX23T:
		return isInt<25>(Diff);
		default:
		return true;
		}
		}
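
For reference, the range check above amounts to asking whether the PC-relative displacement fits in a signed 21-bit field (about ±1 MB, for BRANCH20T) or a signed 25-bit field (about ±16 MB, for BRANCH24T and BLX23T). A self-contained sketch follows, with a hypothetical `BranchKind` enum standing in for the `IMAGE_REL_ARM_*` constants and `fitsSigned` standing in for `llvm::isInt`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical relocation kinds standing in for IMAGE_REL_ARM_*.
enum class BranchKind { Branch20T, Branch24T, Other };

// True if V fits in an N-bit two's complement integer.
template <unsigned N> bool fitsSigned(int64_t V) {
  return V >= -(int64_t(1) << (N - 1)) && V < (int64_t(1) << (N - 1));
}

// S is the target address, P the address of the branch instruction.
bool isInRange(BranchKind Kind, uint64_t S, uint64_t P) {
  int64_t Diff = int64_t(S - P - 4); // Thumb PC reads as P + 4
  switch (Kind) {
  case BranchKind::Branch20T:
    return fitsSigned<21>(Diff);
  case BranchKind::Branch24T:
    return fitsSigned<25>(Diff);
  case BranchKind::Other:
    return true; // not a range-limited branch
  }
  return true;
}
```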

		// Return an existing thunk which is in range, or create a new one.
		static std::pair<Defined *, bool>
		getThunk(std::map<uint64_t, std::vector<Defined *>> &ExistingThunks,
		Defined *Target, uint64_t P, uint16_t Type, size_t &ThunkCounter,
		uint32_t EstimatedThunkRVA) {
		std::vector<Defined *> &TargetThunks = ExistingThunks[Target->getRVA()];
		peter.smith: Assuming getRVA() is the virtual address of the symbol, is Target->getRVA() stable between passes? Presumably if thunks are inserted then assignAddresses() may cause some symbols to change address? I'm not too familiar with the COFF code base so I could be missing something here. If I'm right the reuse between passes may not work as well as it could do.
		mstorsjo: No, the RVAs aren't stable between passes, but we don't keep the map between passes either; it's a local variable in createThunks below. I guess it would be useful to allow this to find a different thunk from the previous pass (that would also reduce the amount of changes in later passes, reducing the number of passes required before it converges), but that's not implemented (yet).
		// TODO: Since we create thunks linearly in the same order as we iterate
		// over the section, it should be enough to just check the last one of
		// the existing thunks?
		for (Defined *Sym : TargetThunks) {
		ruiu: I don't know if your above comment is true, but if the one this loop is looking for is likely at the end of the vector, I'd search in reverse order, i.e. `for (Defined *Sym : llvm::reverse(TargetThunks))`.
		mstorsjo: That's a nice idea. I'm changing code to keep the mapping of existing thunks stable across passes, so then the comment doesn't apply quite as much as before. I tested this and it didn't really save any measurable runtime, but it might be worthwhile anyway, as long as we iterate through the whole vector.
		mstorsjo: In practice with the case I'm testing, TargetThunks will only have 0 or 1 members, so it doesn't really matter much, but in general I guess it could be useful. (My testcase produces a 46 MB clang.exe, so it's only roughly twice as large as the branch range which is 16 MB.)
		if (isInRange(Type, Sym->getRVA(), P))
		return std::pair<Defined *, bool>(Sym, false);
		ruiu: Can this just be `return {Sym, false}`?
		mstorsjo: Indeed, will simplify the syntax of these.
		}
		ruiu: nit: omit {}
		Chunk *C = make<RangeExtensionThunkARM>(Target);
		C->setRVA(EstimatedThunkRVA); // Estimate of where it will be located.
		ruiu: Please insert a blank line before a multi-line comment.
		Defined *D = dyn_cast_or_null<Defined>(Symtab->addSynthetic(
		Saver.save("__thunk_" + Target->getName() + "_" + Twine(ThunkCounter++)),
		ruiu: I don't think you need to pass this `ThunkCounter` around; I'd define this as a static local variable in this function.
		mstorsjo: I later figured out I didn't need to name the thunk symbol at all; I just create the DefinedSynthetic directly without a name, and don't add it to Symtab - just like we do with object file local symbols.
		C));
		if (!D)
		fatal("Thunk collision with other symbol?");
		ruiu: Error messages should start with a lowercase letter. Shouldn't this be an `assert`? If this can be triggered by valid user input, the error message should contain more info about what the problem is.
		mstorsjo: I managed to remove this error altogether by not adding the thunk symbols to the symbol table.
		TargetThunks.push_back(D);
		return std::pair<Defined *, bool>(D, true);
		ruiu: Maybe `return {D, true}`
		}
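
The reuse logic discussed above (look for an existing thunk to the same target that is still reachable from P, newest first, otherwise create one) can be sketched in isolation. Everything below is a simplified assumption for illustration: `Thunk`, `reaches` and the `std::deque` arena are invented stand-ins, and `std::unordered_map` is used only to keep the example free of LLVM headers.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <unordered_map>
#include <utility>
#include <vector>

struct Thunk { uint64_t RVA; uint64_t TargetRVA; };

// Hypothetical reach of the branch: +/-16 MB, like BRANCH24T.
bool reaches(uint64_t From, uint64_t To) {
  int64_t Diff = int64_t(To - From - 4);
  return Diff >= -(int64_t(1) << 24) && Diff < (int64_t(1) << 24);
}

using ThunkMap = std::unordered_map<uint64_t, std::vector<Thunk *>>;

// Returns {thunk, true} if a new thunk had to be created, or
// {thunk, false} if an existing one within reach of P was reused.
std::pair<Thunk *, bool> getThunk(ThunkMap &Existing, std::deque<Thunk> &Arena,
                                  uint64_t TargetRVA, uint64_t P,
                                  uint64_t EstimatedRVA) {
  std::vector<Thunk *> &Candidates = Existing[TargetRVA];
  // Search newest first: the most recently created thunk is the one
  // most likely to be close to the current position.
  for (auto It = Candidates.rbegin(); It != Candidates.rend(); ++It)
    if (reaches(P, (*It)->RVA))
      return {*It, false};
  Arena.push_back(Thunk{EstimatedRVA, TargetRVA}); // deque keeps pointers stable
  Candidates.push_back(&Arena.back());
  return {&Arena.back(), true};
}
```

Keying the map on the target RVA only works within a single pass, since the RVAs can shift once thunks are inserted; that is the caveat raised in the thread.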

		bool OutputSection::createThunks(size_t &ThunkCounter) {
		std::map<uint64_t, std::vector<Defined *>> ExistingThunks;
		ruiu: std::map is an ordered map and usually much slower than DenseMap, so please use DenseMap.
		bool Changed = false;
		size_t ThunksSize = 0;
		for (size_t I = 0; I != Chunks.size(); ++I) {
		SectionChunk *SC = dyn_cast_or_null<SectionChunk>(Chunks[I]);
		if (!SC)
		continue;
		size_t SymbolIdx = 0;
		size_t ThunkInsertionSpot = I + 1;
		// Try to get a good enough estimate of where new thunks will be placed.
		ruiu: nit: please insert a blank line before a comment.
		// Offset this by the size of the new thunks added so far, to make the
		// estimate slightly better.
		size_t ThunkInsertionRVA = SC->getRVA() + SC->getSize() + ThunksSize;
		for (const coff_relocation &Rel : SC->Relocs) {
		Defined *Sym = dyn_cast_or_null<Defined>(SC->Symbols[SymbolIdx]);
		if (!Sym) {
		SymbolIdx++;
		continue;
		}
		// The estimate of the source address P should be pretty accurate,
		// but we don't know whether the target Symbol address should be
		// offset or not, giving us uncertainty once we have added one thunk.
		uint64_t S = Sym->getRVA();
		uint64_t P = SC->getRVA() + Rel.VirtualAddress + ThunksSize;
		if (!isInRange(Rel.Type, S, P)) {
		Defined *Thunk;
		bool WasNew;
		std::tie(Thunk, WasNew) = getThunk(ExistingThunks, Sym, P, Rel.Type,
		ThunkCounter, ThunkInsertionRVA);
		if (WasNew && Thunk) {
		// TODO: .insert() in a std::vector will move all later elements
		ruiu: Nesting is too deep. Please consider splitting to multiple functions.
		mstorsjo: It's a bit hard to split this part to a separate method since it touches almost every single local variable from this method, but I can easily change the `if (!isInRange())` into an `if (isInRange()) continue;` to reduce the nesting a little.
		// forward - is this a concern?
		Chunks.insert(Chunks.begin() + ThunkInsertionSpot, Thunk->getChunk());
		peter.smith: In theory if you are iterating a fixed number (Chunks.size()) of the Chunks vector then inserting thunks into the Chunks vector in the same loop will mean that Chunks near the end may not be scanned for Thunks. Given that the algorithm will only terminate when 0 thunks are inserted you'll eventually scan all of them but it may cost you more passes than you would need if you inserted all Thunks in one go. I think you'll be unlikely to hit 10 passes without a contrived test case though.
		mstorsjo: Hmm, I'm not quite sure I understand what you mean here. You mean that since I'm adding more elements to the Chunks vector, I'd miss the last few ones that were pushed forward? The limit on the outer loop, on line 379, explicitly checks for Chunks.size(), so it will loop until the very end of the vector, even if Chunks grows meanwhile.
		peter.smith: Apologies, I had it in my mind that Chunks.size() would only be calculated once per pass.
		ThunkInsertionSpot++;
		ThunksSize += Thunk->getChunk()->getSize();
		ThunkInsertionRVA += Thunk->getChunk()->getSize();
		}
		SC->Symbols[SymbolIdx] = Thunk;
		Changed = true;
		}
		SymbolIdx++;
		}
		}
		return Changed;
		}
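The inline exchange above about `Chunks.size()` can be illustrated with a small standalone sketch. It is not code from this patch: `scanAndInsert` and the integer "chunks" are hypothetical stand-ins, and the thunk is inserted directly after the chunk that needs it, whereas the patch inserts at `ThunkInsertionSpot`. It only shows that an index loop which re-evaluates `size()` on every iteration still reaches elements pushed back by an insertion.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the chunk list: positive values are ordinary
// chunks, -1 marks an inserted thunk. Returns how many ordinary chunks
// were scanned.
std::size_t scanAndInsert(std::vector<int> &Chunks) {
  std::size_t Scanned = 0;
  // Chunks.size() is re-evaluated on every iteration, so elements that an
  // insertion pushes towards the end are still visited in the same pass.
  for (std::size_t I = 0; I < Chunks.size(); ++I) {
    if (Chunks[I] < 0)
      continue; // Skip thunks inserted earlier in this pass.
    ++Scanned;
    // Assumption for the demo: every even-valued chunk needs a thunk.
    if (Chunks[I] % 2 == 0)
      Chunks.insert(Chunks.begin() + static_cast<std::ptrdiff_t>(I + 1), -1);
  }
  return Scanned;
}
```

With `{1, 2, 3, 4}` as input, all four ordinary chunks are scanned and the vector ends up with six elements. Each `insert` shifts every later element by one slot, which is the cost the TODO comment asks about.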

		// Assign addresses and add thunks if necessary.
		void Writer::finalizeAddresses() {
		int ThunkPass = 0;
		size_t ThunkCounter = 0;
		bool Changed;
		do {
		if (ThunkPass >= 10)
		fatal("Adding thunks hasn't converged after " + Twine(ThunkPass) +
		ruiu: Adding -> adding
		" passes");
		assignAddresses();
		// Only ARMNT requires range extension thunks at the moment.
		if (Config->Machine != ARMNT)
		break;
		Changed = false;
		for (OutputSection *Sec : OutputSections)
		Changed |= Sec->createThunks(ThunkCounter);
		ThunkPass++;
		// Iterate until no new thunks are needed.
		} while (Changed);
		}

// The main function of the writer.		// The main function of the writer.
void Writer::run() {		void Writer::run() {
ScopedTimer T1(CodeLayoutTimer);		ScopedTimer T1(CodeLayoutTimer);

// Find pseudo relocs early, to allow marking the corresponding		// Find pseudo relocs early, to allow marking the corresponding
// section chunks as writable, before assigning them to output sections.		// section chunks as writable, before assigning them to output sections.
if (Config->MinGW)		if (Config->MinGW)
findRuntimePseudoRelocs();		findRuntimePseudoRelocs();

createSections();		createSections();
createMiscChunks();		createMiscChunks();
createImportTables();		createImportTables();
createExportTable();		createExportTable();
mergeSections();		mergeSections();
assignAddresses();		finalizeAddresses();
removeEmptySections();		removeEmptySections();
setSectionPermissions();		setSectionPermissions();
createSymbolAndStringTable();		createSymbolAndStringTable();

if (FileSize > UINT32_MAX)		if (FileSize > UINT32_MAX)
fatal("image size (" + Twine(FileSize) + ") " +		fatal("image size (" + Twine(FileSize) + ") " +
"exceeds maximum allowable size (" + Twine(UINT32_MAX) + ")");		"exceeds maximum allowable size (" + Twine(UINT32_MAX) + ")");

(957 lines not shown)
		if (S->Header.Characteristics & IMAGE_SCN_CNT_INITIALIZED_DATA)
Res += S->getRawSize();		Res += S->getRawSize();
return Res;		return Res;
}		}

// Add base relocations to .reloc section.		// Add base relocations to .reloc section.
void Writer::addBaserels() {		void Writer::addBaserels() {
if (!Config->Relocatable)		if (!Config->Relocatable)
return;		return;
		RelocSec->clear();
std::vector<Baserel> V;		std::vector<Baserel> V;
for (OutputSection *Sec : OutputSections) {		for (OutputSection *Sec : OutputSections) {
if (Sec->Header.Characteristics & IMAGE_SCN_MEM_DISCARDABLE)		if (Sec->Header.Characteristics & IMAGE_SCN_MEM_DISCARDABLE)
continue;		continue;
// Collect all locations for base relocations.		// Collect all locations for base relocations.
for (Chunk *C : Sec->getChunks())		for (Chunk *C : Sec->getChunks())
C->getBaserels(&V);		C->getBaserels(&V);
// Add the addresses to .reloc section.		// Add the addresses to .reloc section.
(23 lines not shown)

test/COFF/arm-thumb-branch-error.s

This file was deleted.

	// RUN: llvm-mc -filetype=obj -triple=thumbv7a-windows-gnu %s -o %t
	// RUN: llvm-mc -filetype=obj -triple=thumbv7a-windows-gnu %S/Inputs/far-arm-thumb-abs.s -o %tfar
	// RUN: not lld-link -entry:_start -subsystem:console %t %tfar -out:%t2 2>&1 | FileCheck %s
	// REQUIRES: arm
	.syntax unified
	.globl _start
	_start:
	bl too_far1

	// CHECK: relocation out of range

test/COFF/arm-thumb-branch-thunk.s

This file was added.

				// RUN: llvm-mc -filetype=obj -triple=thumbv7a-windows-gnu %s -o %t
				// RUN: llvm-mc -filetype=obj -triple=thumbv7a-windows-gnu %S/Inputs/far-arm-thumb-abs.s -o %tfar
				// RUN: lld-link -entry:_start -subsystem:console %t %tfar -out:%t.exe
				// RUN: llvm-objdump -d %t.exe | FileCheck %s
				// REQUIRES: arm
				.syntax unified
				.globl _start
				_start:
				bl too_far1

				// CHECK: Disassembly of section .text:
				// CHECK: .text:
				// CHECK: 401000: 00 f0 00 f8 bl #0
				// CHECK: 401004: 4f f6 f5 7c movw r12, #65525
				// CHECK: 401008: c0 f2 ff 0c movt r12, #255
				// CHECK: 40100c: fc 44 add r12, pc
				// CHECK: 40100e: 60 47 bx r12
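The immediates in the CHECK lines above can be decoded by hand to see where the thunk jumps. The sketch below is an illustration only: `movwMovt` and `thunkDestination` are hypothetical helper names, not functions from the patch. It relies on two architectural facts: `movw`/`movt` load the low and high 16 bits of a register, and reading `pc` in Thumb state yields the address of the current instruction plus 4.

```cpp
#include <cstdint>

// Value loaded by a movw/movt pair: movw sets the low 16 bits, movt the
// high 16 bits.
constexpr uint32_t movwMovt(uint16_t Lo, uint16_t Hi) {
  return (static_cast<uint32_t>(Hi) << 16) | Lo;
}

// Value in r12 when `bx r12` executes, given the address of the
// `add r12, pc` instruction. In Thumb state pc reads as that address + 4.
constexpr uint32_t thunkDestination(uint32_t AddInsnAddr, uint16_t Lo,
                                    uint16_t Hi) {
  return AddInsnAddr + 4 + movwMovt(Lo, Hi);
}

// movw r12, #65525; movt r12, #255; add r12, pc (at 0x40100c); bx r12
static_assert(thunkDestination(0x40100c, 65525, 255) == 0x1401005,
              "bit 0 is the Thumb bit, so the code address is 0x1401004");
```

The result, 0x1401005, has bit 0 set, which keeps `bx` in Thumb state; the instruction address reached is 0x1401004. That the input file places `too_far1` there is an inference from these immediates, since the input file itself is not shown here.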

test/COFF/arm-thumb-branch20-error.s

This file was deleted.

	// REQUIRES: arm
	// RUN: llvm-mc -filetype=obj -triple=thumbv7a-windows-gnu %s -o %t.obj
	// RUN: llvm-mc -filetype=obj -triple=thumbv7a-windows-gnu %S/Inputs/far-arm-thumb-abs20.s -o %t.far.obj
	// RUN: not lld-link -entry:_start -subsystem:console %t.obj %t.far.obj -out:%t.exe 2>&1 | FileCheck %s
	.syntax unified
	.globl _start
	_start:
	bne too_far20

	// CHECK: relocation out of range

test/COFF/arm-thumb-branch20-thunk.s

This file was added.

				// REQUIRES: arm
				// RUN: llvm-mc -filetype=obj -triple=thumbv7a-windows-gnu %s -o %t.obj
				// RUN: llvm-mc -filetype=obj -triple=thumbv7a-windows-gnu %S/Inputs/far-arm-thumb-abs20.s -o %t.far.obj
				// RUN: lld-link -entry:_start -subsystem:console %t.obj %t.far.obj -out:%t.exe
				// RUN: llvm-objdump -d %t.exe | FileCheck %s
				.syntax unified
				.globl _start
				_start:
				bne too_far20

				// CHECK: Disassembly of section .text:
				// CHECK: .text:
				// CHECK: 401000: 40 f0 00 80 bne.w #0 <.text+0x4>
				// CHECK: 401004: 4f f6 f5 7c movw r12, #65525
				// CHECK: 401008: c0 f2 0f 0c movt r12, #15
				// CHECK: 40100c: fc 44 add r12, pc
				// CHECK: 40100e: 60 47 bx r12
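A minimal sketch of why both tests need a thunk, assuming the destinations implied by the CHECK lines (0x1401004 for the `bl` test and 0x501004 for the `bne` test, derived from the `movw`/`movt` immediates; the input files are not shown). `fitsThumbBranch` is a hypothetical helper, not the patch's `isInRange()`. It uses the Thumb-2 encodings, where `b<cond>.w` carries a 21-bit and `bl` a 25-bit signed byte offset, both relative to the branch address plus 4.

```cpp
#include <cstdint>

// Hypothetical range check. Target is the destination address with the
// Thumb bit cleared; Bits is the width of the signed byte offset
// (21 for b<cond>.w, 25 for bl).
constexpr bool fitsThumbBranch(uint32_t BranchAddr, uint32_t Target,
                               int Bits) {
  // Offsets are measured from the branch address plus 4.
  int64_t Offset =
      static_cast<int64_t>(Target) - (static_cast<int64_t>(BranchAddr) + 4);
  int64_t Limit = int64_t{1} << (Bits - 1);
  return Offset >= -Limit && Offset < Limit;
}

// bne at 0x401000 to 0x501004: the offset is exactly 1 MiB, one step too far.
static_assert(!fitsThumbBranch(0x401000, 0x501004, 21), "needs a thunk");
// bl at 0x401000 to 0x1401004: the offset is exactly 16 MiB, also too far.
static_assert(!fitsThumbBranch(0x401000, 0x1401004, 25), "needs a thunk");
// The thunk placed right after the branch, at 0x401004, is in range.
static_assert(fitsThumbBranch(0x401000, 0x401004, 21), "thunk reachable");
```

Both destinations sit just past the largest encodable forward offset, which is consistent with the tests being boundary cases for the two relocation types.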