This is an archive of the discontinued LLVM Phabricator instance.

Differential D68065

Propeller: LLD Support for Basic Block Sections
ClosedPublic

Authored by tmsriram on Sep 25 2019, 5:58 PM.

Details

Reviewers

ruiu
espindola
MaskRay
davidxl
echristo

Commits

rG94317878d826: LLD Support for Basic Block Sections

Summary

LLD Support for Basic Block Sections

This is part of the Propeller framework to do post link code layout optimizations. Please see the RFC here: https://groups.google.com/forum/#!msg/llvm-dev/ef3mKzAdJ7U/1shV64BYBAAJ and the detailed RFC doc here: https://github.com/google/llvm-propeller/blob/plo-dev/Propeller_RFC.pdf

This is one in the series of patches for Propeller.

This patch adds lld support for basic block sections and performs relaxations after the basic blocks have been reordered.

After the linker has reordered the basic block sections according to the desired sequence, it runs a relaxation pass to optimize jump instructions. Currently, the compiler emits the long form of all jump instructions. The AMD64 ISA supports variants of jump instructions with a one-byte offset or a four-byte offset. The compiler generates jump instructions with R_X86_64 32-bit PC-relative relocations. We would like to use a new relocation type for these jump instructions, as it would make relaxing them easier and more accurate.
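To make the size difference between the two forms concrete, here is a minimal sketch of the standard x86-64 near-jump encodings. The helper names (`jumpSize`, `fitsInRel8`, `bytesSaved`) are hypothetical and do not come from the patch; per the thread, the actual shrinking of jumps is handled in a separate patch, so this only illustrates why the long form leaves room for relaxation.

```cpp
#include <cassert>
#include <cstdint>

// Encoded sizes of x86-64 near jumps (opcode bytes + displacement bytes):
//   jmp rel8 : EB cb     -> 2 bytes
//   jmp rel32: E9 cd     -> 5 bytes
//   jcc rel8 : 7x cb     -> 2 bytes
//   jcc rel32: 0F 8x cd  -> 6 bytes
constexpr unsigned jumpSize(bool conditional, bool longForm) {
  return longForm ? (conditional ? 6u : 5u) : 2u;
}

// True if a PC-relative displacement fits in a single signed byte,
// i.e. the short form could encode it.
constexpr bool fitsInRel8(int64_t displacement) {
  return displacement >= INT8_MIN && displacement <= INT8_MAX;
}

// Bytes saved by replacing a long-form jump with its short form.
constexpr unsigned bytesSaved(bool conditional) {
  return jumpSize(conditional, true) - jumpSize(conditional, false);
}
```

Under these encodings, a conditional jump whose target ends up within a signed byte of the instruction could in principle shrink by 4 bytes, and an unconditional one by 3.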

A new relocation type for jmp instructions that need to be relaxed would let the linker find these instructions easily and accurately. Our current method peeks back at the opcode bytes preceding a relocation whose type could correspond to a jmp instruction.

The relaxation pass does two things:

First, it deletes all explicit fall-through direct jump instructions between adjacent basic blocks. This is done by discarding the tail of the basic block section.

Second, if there are consecutive jump instructions, it checks whether the first (conditional) jump can be inverted so that the second becomes a fall-through, and then deletes the second.
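The two steps can be sketched on a simplified model of a block tail. The `J_*` names follow the list of jump kinds discussed later in this review; `J_UNKNOWN`, `BlockTail`, `invertJmpOpcode` and `relaxTail` are hypothetical names for this illustration, and block targets are modelled as plain integers rather than sections. This is a sketch of the idea, not the code in the patch.

```cpp
#include <cassert>

// Jump kinds as named in the review; J_UNKNOWN is a sentinel for this sketch.
enum JmpInsnOpcode {
  J_JMP_32, J_JNE_32, J_JE_32, J_JG_32, J_JGE_32, J_JB_32,
  J_JBE_32, J_JL_32, J_JLE_32, J_JA_32, J_JAE_32, J_UNKNOWN
};

// Returns the conditional jump testing the opposite condition.
// An unconditional jump has no inverse and maps to J_UNKNOWN.
JmpInsnOpcode invertJmpOpcode(JmpInsnOpcode op) {
  switch (op) {
  case J_JE_32:  return J_JNE_32;
  case J_JNE_32: return J_JE_32;
  case J_JG_32:  return J_JLE_32;
  case J_JLE_32: return J_JG_32;
  case J_JGE_32: return J_JL_32;
  case J_JL_32:  return J_JGE_32;
  case J_JB_32:  return J_JAE_32;
  case J_JAE_32: return J_JB_32;
  case J_JBE_32: return J_JA_32;
  case J_JA_32:  return J_JBE_32;
  default:       return J_UNKNOWN;
  }
}

// Tail of a basic block: an optional conditional jump followed by an
// optional unconditional jump. Targets are block ids.
struct BlockTail {
  JmpInsnOpcode cond; // J_UNKNOWN if there is no conditional jump
  int condTarget;
  bool hasJmp;        // true if the block ends in an unconditional jmp
  int jmpTarget;
};

// Relaxes the tail of a block whose layout successor is `next`.
// Returns the number of jump instructions deleted (0 or 1).
int relaxTail(BlockTail &t, int next) {
  if (!t.hasJmp)
    return 0;
  // Step 1: "jmp next" is an explicit fall-through; drop it.
  if (t.jmpTarget == next) {
    t.hasJmp = false;
    return 1;
  }
  // Step 2: "jcc next; jmp other" becomes "j!cc other".
  if (t.cond != J_UNKNOWN && t.condTarget == next) {
    JmpInsnOpcode inverted = invertJmpOpcode(t.cond);
    if (inverted == J_UNKNOWN)
      return 0;
    t.cond = inverted;
    t.condTarget = t.jmpTarget;
    t.hasJmp = false;
    return 1;
  }
  return 0;
}
```

For example, a tail `je 3; jmp 9` followed in layout by block 3 becomes `jne 9`, with the trailing `jmp` removed.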

The jump instructions are relaxed by using jump instruction mods, something like relocations. These are used to modify the opcode of the jump instruction. Jump instruction mods contain three values, instruction offset, jump type and size. While writing this jump instruction out to the final binary, the linker uses the jump instruction mod to determine the opcode and the size of the modified jump instruction. These mods are required because the input object files are memory-mapped without write permissions and directly modifying the object files requires copying these sections. Copying a large number of basic block sections significantly bloats memory.
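As a rough illustration of the idea, the sketch below records a jump instruction mod and applies it to the output buffer only, leaving the input bytes untouched. The field names (`offset`, `original`, `size`) follow the names mentioned in this review, but their exact meaning here is an assumption: `offset` is taken to be the position of the 32-bit displacement, and only the long (rel32) form is handled. `jccSecondByte` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum JumpModType {
  J_JMP_32, J_JNE_32, J_JE_32, J_JG_32, J_JGE_32, J_JB_32,
  J_JBE_32, J_JL_32, J_JLE_32, J_JA_32, J_JAE_32
};

// One pending opcode rewrite, recorded instead of patching the input.
struct JumpInstrMod {
  uint64_t offset;      // assumed: offset of the displacement in the section
  JumpModType original; // jump kind to emit at that location
  unsigned size;        // displacement size in bytes (only 4 handled here)
};

// Second opcode byte of the long-form "0F 8x" conditional jumps.
uint8_t jccSecondByte(JumpModType t) {
  switch (t) {
  case J_JB_32:  return 0x82;
  case J_JAE_32: return 0x83;
  case J_JE_32:  return 0x84;
  case J_JNE_32: return 0x85;
  case J_JBE_32: return 0x86;
  case J_JA_32:  return 0x87;
  case J_JL_32:  return 0x8C;
  case J_JGE_32: return 0x8D;
  case J_JLE_32: return 0x8E;
  case J_JG_32:  return 0x8F;
  default:       return 0; // J_JMP_32 is handled by the caller
  }
}

// Writes the opcode selected by `mod` into the output copy `out`.
void applyJumpInstrMod(std::vector<uint8_t> &out, const JumpInstrMod &mod) {
  assert(mod.size == 4 && "only the rel32 form is sketched");
  assert(mod.offset + 4 <= out.size());
  if (mod.original == J_JMP_32) {
    assert(mod.offset >= 1);
    out[mod.offset - 1] = 0xE9; // jmp rel32
    return;
  }
  assert(mod.offset >= 2);
  out[mod.offset - 2] = 0x0F;
  out[mod.offset - 1] = jccSecondByte(mod.original);
}
```

The point of the indirection is that the input stays read-only: the mod is a small record, and the opcode bytes are only changed in the buffer that is being written out.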

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

There are a very large number of changes, so older changes are hidden.

ruiu added inline comments. Feb 11 2020, 11:11 PM

lld/ELF/Arch/X86_64.cpp
172	Does "direct jmp" actually mean just J_JMP_32?
230	In this context, does "direct jump" mean all J* instructions?
279	Please fix local variable names.
lld/ELF/InputSection.h
225	Since JumpInstrMod is a public member, you don't need this?
lld/ELF/LTO.cpp
81	nit: it is more common to use `operator==` instead of `equals` to compare StringRefs.

Address reviewer comments, camel case, delete and rename methods.

lld/ELF/Arch/X86_64.cpp
172	Yes, it is J_JMP_32. Do you want this refactored into getJmpInsnType?

Add a test for ICF with bb sections.

tmsriram marked an inline comment as done. Feb 12 2020, 5:24 PM

tmsriram added inline comments.

lld/ELF/Writer.cpp
1689	I added an icf all test for this.

ruiu added inline comments. Feb 12 2020, 11:43 PM

lld/ELF/Arch/X86_64.cpp
172	I found it a little confusing to use "direct jump" to refer only to J_JMP_32 here and used the same term to represent all J* instructions in the deleteFallThruJmpInsn function comment. But maybe you can just inline and remove this function.
lld/ELF/InputSection.h
134	Now I wonder if you can just shrink `rawData`? Then you can revert the change you made to `getSize()`.
lld/ELF/OutputSections.cpp
247	Please use the actual type instead of `auto`.

Address reviewer comments.

tmsriram added inline comments. Feb 13 2020, 11:44 AM

lld/ELF/InputSection.h
134	I looked at this and saw that "trimmed" can be deleted, which simplifies the code here and in getSize() considerably. I need to keep track of both the actual available space and the shrunk space, because we both shrink and grow the section; the growing is done by an optimization in a follow-up patch. So bytesDropped is useful to undo the shrink. PTAL and see if the simplification helps. Also, if we don't keep track of the actual size of the section when growing and shrinking, it is not possible to catch bugs where we accidentally grow more than the original size of the section, potentially writing out-of-bounds into rawData.

tmsriram marked an inline comment as done. Feb 13 2020, 11:50 AM

tmsriram added inline comments.

lld/ELF/InputSection.h
134	Sorry, I mistyped the last line. It should read "potentially reading out-of-bounds from rawData".
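The bookkeeping described in the comments above can be sketched as follows. `SectionSketch`, `dropBack` and `pushBack` are hypothetical names for this illustration, and a `std::vector` stands in for the read-only mapped input bytes; only `rawData`, `bytesDropped` and `getSize()` are names taken from the discussion. The point is that the original bytes are never resized, so growing past the original size can be detected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an input section whose tail can be
// logically shrunk and later grown back.
struct SectionSketch {
  std::vector<uint8_t> rawData; // original bytes, never resized
  size_t bytesDropped = 0;      // bytes logically removed from the tail

  // Current logical size: original size minus the dropped tail.
  size_t getSize() const { return rawData.size() - bytesDropped; }

  // Shrinks the section by n bytes; refuses to drop more than is left.
  bool dropBack(size_t n) {
    if (n > getSize())
      return false;
    bytesDropped += n;
    return true;
  }

  // Undoes part of an earlier shrink; refuses to grow past the original
  // size, which would mean reading out of bounds from rawData.
  bool pushBack(size_t n) {
    if (n > bytesDropped)
      return false;
    bytesDropped -= n;
    return true;
  }
};
```

With a single shrunk view of the data and no record of the original size, the failed `pushBack` case above could not be distinguished from a valid one.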

Overall looking good, but it looks like this test file doesn't cover all relocations you want to handle. You are handling the following relocations, and compared to that the test file seems too small.

J_JMP_32,
J_JNE_32,
J_JE_32,
J_JG_32,
J_JGE_32,
J_JB_32,
J_JBE_32,
J_JL_32,
J_JLE_32,
J_JA_32,
J_JAE_32,

lld/ELF/Arch/X86_64.cpp
160	nit: you should cache the result of is.relocations.size() as `for (unsigned e = is.relocations.size(); i < e; ++i)`
272	nit: jmpOpcode_B -> jmpOpcodeB
lld/ELF/Driver.cpp
929	This should be named `ltoBasicblockSections` to exactly match the option name.
lld/ELF/InputSection.cpp
1030–1031	nit: this can be `uint64_t offset = jumpMod.Offset + sec->outSecOff`
lld/ELF/InputSection.h
134	I see. Maybe we can have two ArrayRefs, one is an ArrayRef of the original size and the other is a (possibly) shrunk one, but I think it doesn't matter much, so I'm fine with this approach.
lld/ELF/OutputSections.cpp
245	Lowercase
249	Lowercase
lld/ELF/Relocations.h
116–118	Original -> original, Offset -> offset, Size -> size
lld/ELF/Target.h
91–92	Lowercase
101–102	Lowercase
lld/ELF/Writer.cpp
1719	What is this `!ELFT::Is64Bits` condition for? It looks like you can just remove it.

Add tests for more jmp opcodes; other nits.

lld/ELF/InputSection.h
134	Once the relocations are added, this could be made simpler so I will leave it for now.

LGTM

lld/ELF/Arch/X86_64.cpp
180	nit: remove extra parentheses
184	nit: remove extra parentheses

This revision is now accepted and ready to land. Feb 24 2020, 6:31 PM

40+ comments have not been addressed. Sorry, but I don't think this can be accepted as is. I am not catching up recent discussions but I believe there are still major concerns about the overall approach, whether code should be implemented in CodeGenPrepare.cpp, etc.

Mark as "Request Changes" in fear of an accidental commit.

lld/ELF/Arch/X86_64.cpp
41	Fix parameter case.
63	const
162	In an old comment, I suggested See Relocations.cpp:scanRelocs. A better way is to sort relocations beforehand. It is not addressed.
179	.byte 0xe8; .long foo - . - 4

This revision now requires changes to proceed. Feb 24 2020, 7:43 PM

MaskRay added inline comments. Feb 24 2020, 8:03 PM

lld/ELF/Arch/X86_64.cpp
174	Delete this once psABI reserves a new relocation type for fallthrough jumps.
184	Delete `SignExtend64`. config->wordsize * 8 is a constant, 64.
188	Fix variable case.
190	Delete excess parentheses
261	Delete excess parentheses
lld/ELF/OutputSections.cpp
353	Is `target->nopInstrs` redundant?
lld/ELF/Writer.cpp
1695	Better to check if there are cases where st_value > original_input_section_size. I asked because in binutils, bfd/elfnn-riscv.c has a similar check. if (sym->st_value > addr && sym->st_value <= toaddr) sym->st_value -= count;
1704	Similarly, in BFD, this is: else if (sym->st_value <= addr && sym->st_value + sym->st_size > addr && sym->st_value + sym->st_size <= toaddr) sym->st_size -= count;
1739	excess parentheses
lld/test/ELF/bb-sections-and-icf.s
23	This comment is redundant.
38	ditto
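The two BFD checks quoted in the Writer.cpp comments above can be transcribed into a small standalone sketch. `SymSketch` and `adjustSymbol` are hypothetical names; `addr`, `toaddr` and `count` keep the meaning they have in the quoted snippets (the deletion point, the end of the affected range, and the number of bytes removed). This only restates the quoted conditions and is not the code that landed in lld.

```cpp
#include <cassert>
#include <cstdint>

// Minimal symbol model: start address and size, like st_value/st_size.
struct SymSketch {
  uint64_t value;
  uint64_t size;
};

// Adjusts one symbol after `count` bytes were deleted at `addr`, where
// `toaddr` is the end of the affected range.
void adjustSymbol(SymSketch &sym, uint64_t addr, uint64_t toaddr,
                  uint64_t count) {
  if (sym.value > addr && sym.value <= toaddr) {
    // Symbol starts after the deletion point: it moves down.
    sym.value -= count;
  } else if (sym.value <= addr && sym.value + sym.size > addr &&
             sym.value + sym.size <= toaddr) {
    // Symbol spans the deletion point: it gets shorter.
    sym.size -= count;
  }
}
```

For a 20-byte section whose last 5 bytes (a trailing jump) are deleted, a symbol covering the whole section shrinks to 15 bytes, an end-of-section label moves from 20 to 15, and a symbol covering only the first 10 bytes is left alone.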

In D68065#1890650, @MaskRay wrote:

40+ comments have not been addressed.

Since the patch was split, many of the changes didn't apply, as we separated the shrink instruction optimization into another patch. We will address the comments you pointed out.

Sorry, but I don't think this can be accepted as is. I am not catching up recent discussions but I believe there are still major concerns about the overall approach, whether code should be implemented in CodeGenPrepare.cpp, etc.

Could you be more specific as to what the major concern is? Removing code from CodeGenPrepare.cpp is for the LLVM patch, not this one, and is not a major concern; we will be updating that patch, removing all code from CodeGenPrepare.cpp. We cannot land this patch without first getting the LLVM patch approved anyways.

Mark as "Request Changes" in fear of an accidental commit.

Address reviewer comments, new test for PC32 reloc.

Pending: fixing size of symbols.

lld/ELF/Arch/X86_64.cpp
162	Could you please reconsider this? I understand what you mean here. This code is going to change when new psABI relocations are added. Could we re-open this then?
179	Okay, added a test for PC32. Also, the PC8 reloc will be created when we shrink a jmp insn offset from 32 to 8. The actual shrink code will be presented in a separate patch which will test for the PC8 so I will leave that out for now. Further, when we add new psABI relocs, this code and tests will have to be modified.
lld/ELF/InputSection.cpp
952	I am assuming this is alright.
lld/ELF/OutputSections.cpp
353	I can just assert instead!
lld/ELF/Target.h
105	The grow part has been split, not applicable.
lld/ELF/Writer.cpp
43	Not applicable.
573	Not applicable.
654	Not applicable.
656	Not applicable.
662	Not applicable.
700	Not applicable.
725	Not applicable.
1695	I will address this in your most recent comment.
1704	Working on this one, will fix this in the next iteration. Keeping this open until then.
1717	Accommodating thunks and creating them optimally needs a little more thought. We haven't looked at this and we haven't tested it. Could we do this in a later patch?
1740	Not applicable.
1747	Not applicable.
1764	Not applicable.
1770	Not applicable.
1772	Not applicable.
1779	Not applicable.
1779	Not applicable.

Fix symbol shrinking code to incorporate reviewer comments.

tmsriram marked an inline comment as done. Feb 26 2020, 12:45 PM

A Phabricator tip: marking every review comment Not applicable. is not needed. You can just click "Close" (status: Unsubmitted) but don't click Submit. When you upload a new Diff (with arc diff), the comments will be closed automatically.

I believe a large number of Not applicable comments were originally applicable. There have been many changes (see the number of Diff in History: 16). A large number of comments were not addressed with several versions. Now a bunch of comments were commented as Not applicable at once... It made it slightly difficult to understand which Diff addresses the comments. Though, they don't matter now.

When you upload a new Diff, I hope you can rebase the patch on origin/master. Other reviewers have complained about "Patch not applicable on master" in some other patches.

I think a patch hierarchy has been suggested by other reviewers and me several times. It is not clear what patches to apply and in what order. You can mention it in the SUMMARY. It will be very useful and also save reviewers' time. (To be honest I felt a lot of pressure when I received messages after I clicked "Request Changes". I try to be responsive.)

lld/ELF/Arch/X86_64.cpp
162	What is the average size of `is.relocations.size()`? If it is small in practice, the comment should mention it.

tmsriram added a parent revision: D68063: Propeller: LLVM support for basic block sections. Feb 26 2020, 4:42 PM

In D68065#1894573, @MaskRay wrote:

A Phabricator tip: marking every review comment Not applicable. is not needed. You can just click "Close" (status: Unsubmitted) but don't click Submit. When you upload a new Diff (with arc diff), the comments will be closed automatically.

Alright, will do that in future.

I believe a large number of Not applicable comments were originally applicable. There have been many changes (see the number of Diff in History: 16). A large number of comments were not addressed with several versions. Now a bunch of comments were commented as Not applicable at once... It made it slightly difficult to understand which Diff addresses the comments. Though, they don't matter now.

Just to be clear, the original patch also had shrink and grow optimization which we split out of the main patch and hence many of the original comments don't apply.

When you upload a new Diff, I hope you can rebase the patch on origin/master. Other reviewers have complained about "Patch not applicable on master" in some other patches.

Sure, will do.

I think a patch hierarchy has been suggested by other reviewers and me several times. It is not clear what patches to apply and in what order. You can mention it in the SUMMARY. It will be very useful and also save reviewers' time. (To be honest I felt a lot of pressure when I received messages after I clicked "Request Changes". I try to be responsive.)

Sure, this patch depends on TargetOptions.h changes in D68063. I marked it as a parent patch. I will put a larger summary of all patches associated with BBSections and Propeller.

Rebase.

The number of relocations per input section with basic block sections is small: eyeballing it, about 3-4 relocations, including the extra jump relocation that gets added.

Here is the summary of all the basic block sections patches in clang, llvm and lld:

D68063 : Propeller: LLVM support for basic block sections [BB: LLVM1]
D73674 : Propeller: LLVM support for basic block sections (Base Patch - Part 2) [BB: LLVM2]
D68049 : Propeller: Clang options for basic block sections [BB: Clang]
D68065 : Propeller: LLD Support for Basic Block Sections [BB: LLD] -- This Patch!
D73497 : lld: Propeller framework part I [Propeller: LLD]

            BB: LLVM1
           /         \
          /           \
   BB: LLVM2         BB: LLD
      |     \           |
      |      \          |
      |       \         |
  BB: Clang    Propeller: LLD

If Sriraman has already resolved all Fangrui's comments, I can reiterate my LGTM, as the new code is well written, and I don't have a specific concern. I actually think that this is an interesting feature and in some degree a natural extension of -ffunction-sections.

tmsriram added a reviewer: echristo. Mar 11 2020, 6:53 PM

It looks like there has been no activity in the llvm-dev thread or this review thread, so I'll reiterate my LGTM, as I think we should not block this change, in particular given that the corresponding change to LLVM has been submitted.

In general I think a lot of this looks pretty good. It could use some more comments - in particular to call out what a lot of this code is used for. It's not used on a typical linking path and so could be confusing to people going through the code.

-eric

lld/ELF/OutputSections.cpp
248	Might want to assert size != 0 or early return.

Address Reviewer comments.

+ More comments
+ Check for early return in nopInstrFill

In D68065#1927117, @echristo wrote:

In general I think a lot of this looks pretty good. It could use some more comments - in particular to call out what a lot of this code is used for. It's not used on a typical linking path and so could be confusing to people going through the code.

-eric

I have added more comments and mentioned that this is only applicable with basic block sections. I have also added "TODO:" in places where the code is going to get a lot simpler with newer relocations

@echristo Eric, is this alright now? Thanks.

I am very glad to see that we have made progress by landing D68063 (llvm/CodeGen/CommandFlags.inc) and D73674 (llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp). Basic block sections is agreed to be useful even outside Propeller.

There are several optimization goals:

Alignment inserting
Automatic cache prefetching
Large code model addressing can lower performance quite a bit. A post-link scheme can relax large code model addressing to small code model addressing.
...

There is a CPU erratum that we want to mitigate.

Intel's Jump Condition Code Erratum

By making this change, we will go the object file level route: annotate object files with metadata so that certain transformations can be performed.

Whether this scheme can satisfy the goals and avoid the erratum, and the uncertainty about how much we will have to reinvent, are my biggest concerns.

On https://lists.llvm.org/pipermail/llvm-dev/2020-February/139543.html (my brainstorming), I mentioned we may achieve our goals and make it suitable for future optimizations by using a file format with more semantics (rather than an object file). I hope we can think more on this, rather than rush to conclusions "this is redoing full LTO. it can't scale"

Considering the above points, I re-iterate my "Request Changes". We need a plan to prove that we can achieve our optimization goals while avoiding caveats (erratum).

This revision now requires changes to proceed. Mar 20 2020, 5:30 PM

Fangrui, I agree with Eric here. JCC erratum handling is not part of the core feature and it is perfectly reasonable to submit that as a follow up patch. This review thread has been going on for almost half a year and if the comments about the core patch have been addressed, we shall unlock the progress and move on.

In the meantime, the discussion on the erratum handling should also start -- probably in the contexts of other mitigations (not just this JCC one).

thanks,

David

@echristo @ruiu

If the JCC erratum is the only concern then we are able to show now with experiments that Propeller can produce JCC erratum free binaries with almost no performance impact and only by using the existing assembler mitigations : http://lists.llvm.org/pipermail/llvm-dev/2020-March/140134.html

Let's use that thread to continue to investigate how the linker could potentially do a better job of handling this or other erratums in general. Could we please unblock this?

I agree. While the erratum is going to be important I think that this going in now doesn't block any future work on erratum handling in the linker or make it more difficult to add other than adding it on top of propeller perhaps, but that's a different issue?

@MaskRay : Is the Propeller patch going to make implementing the JCC erratum mitigation more difficult (outside of Propeller itself)? If not, then I think we should go forward. If so, can you propose some (hopefully narrow) changes to this patch in order to make that more possible? Even "you asked for jcc erratum and we can't do that here" as an error seems like it would be an ok incremental step forward.

dxf added a subscriber: dxf. Mar 23 2020, 2:33 PM

Intel JCC erratum is not the only concern. My bigger concern is whether we can achieve our post-link optimization goals other than layout shuffling with the current scheme:

Alignment inserting
Automatic cache prefetching
Large code model addressing can lower performance quite a bit. A post-link scheme can relax large code model addressing to small code model addressing.
...

These points were already listed in my previous comments. I believe internally you probably have more brainstorming thoughts. As I said on https://lists.llvm.org/pipermail/llvm-dev/2020-March/139639.html , I am not yet convinced that with the no disassembly assumption, reordering opaque sections can achieve the above goals. Post-link optimization is not a new idea and there have been several engineering efforts before Propeller. However, Propeller is the first integrating the great idea into LLVM. As I said I look forward to its bright future. I just hope that we can create a generic framework. Our focus is currently section reordering. When we start to think future optimization opportunities, we don't need to create one, two, three, four more frameworks.

I saw Rahman posted https://lists.llvm.org/pipermail/llvm-dev/2020-March/140134.html yesterday. Sorry that I did not have time reading it today. If the idea is that more layout work will be done by the compiler, then it starts to look good to me.

arichardson removed a subscriber: arichardson. Mar 24 2020, 12:24 AM

In D68065#1938447, @MaskRay wrote:

In D68065#1937331, @tmsriram wrote:

In D68065#1934866, @MaskRay wrote:

I am very glad to see that we have made progress by landing D68063 (llvm/CodeGen/CommandFlags.inc) and D73674 (llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp). Basic block sections is agreed to be useful even outside Propeller.

There are several optimizations goals:

Alignment inserting

Automatic cache prefetching

Large code model addressing can lower performance quite a bit. A post-link scheme can relax large code model addressing to small code model addressing.

...

@echristo @snehasish @dxf @ruiu @tejohnson @rnk

All the above ideas you mentioned here, you heard it from my team in face to face meetings and the last one in internal discussions. I recall specifically telling you some of it is in very early stages and to keep it to yourself. You could have at least checked with us before mentioning it, as just courtesy if not anything else.

Intel JCC erratum is not the only concern. My bigger concern is whether we can achieve our post-link optimization goals other than layout shuffling with the current scheme:

Alignment inserting

Automatic cache prefetching

Large code model addressing can lower performance quite a bit. A post-link scheme can relax large code model addressing to small code model addressing.

...

These points were already listed in my previous comments. I believe internally you probably have more brainstorming thoughts. As I said on https://lists.llvm.org/pipermail/llvm-dev/2020-March/139639.html , I am not yet convinced that with the no disassembly assumption, reordering opaque sections can achieve the above goals. Post-link optimization is not a new idea and there have been several engineering efforts before Propeller. However, Propeller is the first integrating the great idea into LLVM. As I said I look forward to its bright future. I just hope that we can create a generic framework. Our focus is currently section reordering. When we start to think future optimization opportunities, we don't need to create one, two, three, four more frameworks.

I don't mean any disrespect here but your tone suggests that you are quite experienced in this area :) If you have a better proposal, I strongly encourage you to propose it, evaluate it with experiments and performance numbers and get it into LLVM. Asking us to evaluate completely new designs along the lines of Full or Partial LTO is not feasible as it takes several weeks if not months, and IMHO, not reasonable particularly at this stage in the review.

We have put in a lot of effort towards this work to come this far. Asking us to go do it totally differently is not something we are going to do. We now know the partial LTO idea was actually not yours and only suggested to you, which I believe you should at least acknowledge for transparency.

Multiple people with significant LTO experience have told you ideas resembling Full LTO have scalability issues and ThinLTO has had a lot of adoption due to exactly this. If you want to prove them wrong, good luck to you but please don't ask us to do the heavy lifting!

As for disassembly, we have not presented a single patch that does any serious disassembly and we fully understand the pitfalls. We understand that the jump relaxation does mild disassembly and we are looking at relocations to overcome that as you already know.

We are looking at efficient ways to accomplish our other optimization objectives and we will present clear designs with experiments on llvm-dev when we get it done. The idea here is to do thin links like @mehdi_amini alluded to in that thread of yours: https://llvm.org/devmtg/2020-02-23/#kl which will do most of the transformations in the compiler and use thin links to generate summaries that are whole program.

I saw Rahman posted https://lists.llvm.org/pipermail/llvm-dev/2020-March/140134.html yesterday. Sorry that I did not have time to read it today. If the idea is that more layout work will be done by the compiler, then it starts to look good to me.

I urge you to read that, as we spent significant time to conclusively prove that the JCC erratum is a non-issue. I can summarize the plan for you:

We have been looking at constantly reducing the bloat from extra sections to be as low as possible. Some of the work we did here was to selectively create basic block sections.
During this, we realized we can do even better if we compute basic block orders early, immediately after profiling. This would require using a dynamic CFG, but we have just prototyped it and it has the same performance benefits.
This could also move the bulk of the Propeller work from the linker to a third-party tool, create_llvm_prof. This patch is still necessary as it relates to bb sections.
This allows us to form larger sections in the compiler and not wait for the linker to do the reordering. We would still have to create multiple sections but a lot fewer, significantly reducing the bloat.
This means we can also reuse the existing assembler mitigations without having to reinvent them in the linker, which gives us an immediate solution.
Performance is neutral (0.2% slowdown) after applying the mitigations. In fact, without Propeller the performance is down by 0.6% from the mitigations.
To be clear, we get all of the Propeller wins even with the mitigations, measured on the clang benchmark.
We feel the linker can do a much better job here since the mitigations are only using NOPs and, contrary to what you told us in the meeting, prefixes don't seem to help. You have noted this yourself, so I am still wondering why you told us that prefixes help: https://reviews.llvm.org/D72225#1818149 where you say "NOP padding alone seems good".
The linker does not have to use the large hammer of aligning every function to 32-byte boundaries, but this is something best discussed in a design thread.

You also say in your previous message that "I am very glad to see that we have made progress by landing D68063 ..." and yet you are blocking this. This is ridiculous. If you have fundamental disagreements, you should also have been blocking the other patches. The other patches are not very useful without this, so what's up with the selective blocking?

To conclude, it is perfectly fine if you are opposed to this and don't wish to unblock. I am trying to act in good faith and I am just going to have to push this around you if I get the approvals or kill basic block sections. We have to agree to disagree.

Hi Sriraman,

Sounds like there is strong support for your patch, so let’s move forward on it.

I do have a few more code review items I’d like addressed if we can before we commit.
There are several nits about excess parentheses which do not fit with the rest of lld code.
Please scan the full patch and delete them. Rebase and upload a new diff so that I can patch my local repository with arc patch D68065.
It does not apply so it isn’t convenient for me to review the tests now.

In addition, we probably should move test/ELF/bb-sections-* tests to test/ELF/propeller/

I don't mean any disrespect here but your tone suggests that you are quite experienced in this area :) If you have a better proposal, I strongly encourage you to propose it, evaluate it with experiments and performance numbers and get it into LLVM. Asking us to evaluate completely new designs along the lines of Full or Partial LTO is not feasible as it takes several weeks if not months, and IMHO, not reasonable particularly at this stage in the review.

The first sentence is fair. I have just a bit more than 2 years of experience contributing to LLVM. I made an llvm-dev proposal, as everyone can see. In my spare time I will probably investigate further how to speed things up.

We have put in a lot of effort towards this work to come this far. Asking us to go do it totally differently is not something we are going to do. We now know the partial LTO idea was actually not yours and only suggested to you, which I believe you should at least acknowledge for transparency.

Just wanted to defend myself. I was not sure I could mention the person. That email also includes lots of my own ideas.

You also say in your previous message that "I am very glad to see that we have made progress by landing D68063 ..." and yet you are blocking this. This is ridiculous.

Please note that I made it very clear that basic block sections are helpful for other things. I was blocking this linker patch because of concerns about long-term viability and maintenance.

That said, I think we need to try out new ideas. I am now focusing on the code review itself, instead of repeating my previous high-level concerns.

lld/ELF/Arch/X86_64.cpp
66	Should follow `variableName` convention. Add `See X86AsmBackend::writeNopData`
162	for (unsigned i = is.relocations.size(); i != 0; ) { --i; if (is.relocations[i].offset == offset && is.relocations[i].expr != R_NONE) return i; }
171	delete parens
254	Nit: delete parens `(is.getSize() - 4)`
lld/ELF/Options.td
47	Add `(default)`. See other options.
513	`"Do not give unique names to every basic block section for LTO (default)"`
lld/ELF/OutputSections.cpp
259	`assert(nopFiller[remaining - 1].size() == remaining)`
352–356	The code (`if (isec->nopFiller)`) self explains. No need for a comment.
lld/ELF/Target.h
96	delete excess space after `//`
lld/ELF/Writer.cpp
1695	I asked for research into st_value, but I don't think it needs to be made accurate. However, I don't think we should just copy the elfnn-riscv.c behavior. `if (def->value + def->size > NewSize && def->value <= OldSize && def->value + def->size <= OldSize) {` should be simplified to `if (def->value + def->size > NewSize && def->value + def->size <= OldSize) {`
1700	Nit: Moving -> move (make it a bit simpler)
1724	Nit: move the condition to the call site.
1729	There is only one list item. Why use `1.` ?
1739	Where is `Step 2`?
1744	Nit: delete parens
1747	Nit: static_cast<unsigned>
1753	Removing -> removed
2088	`optimizeBasicBlockJumps` calls assignAddresses, which was previously only called in finalizeAddressDependentContent. We would like the callers of assignAddresses to be grouped together (if in.symTab needs to be finalized first, please add a comment). Can you move this pass immediately before (or after) finalizeAddressDependentContent?
lld/test/ELF/bb-sections-and-icf.s
5	deleted
7	`x86_64-pc-linux` -> `x86_64`
8	delete excess space. Prefer `--optimize-bb-jumps` over `-optimize-bb-jumps`
8	If the `--icf=all` result is different from `--icf=none`, add a comment.
17	Use `##` for test comments.
18	Don't add excess space.
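
The simplification requested for Writer.cpp line 1695 above rests on a small piece of arithmetic: with unsigned values and no wrap-around, `value + size <= oldSize` already implies `value <= oldSize`, so the middle clause is redundant. The sketch below only demonstrates that equivalence. The function names and the free-standing parameters are hypothetical, and the no-overflow precondition is an assumption of the sketch, not something stated in the review.

```cpp
#include <cassert>
#include <cstdint>

// Three-clause form quoted in the review comment.
bool originalCondition(uint64_t value, uint64_t size, uint64_t oldSize,
                       uint64_t newSize) {
  return value + size > newSize && value <= oldSize &&
         value + size <= oldSize;
}

// Two-clause form proposed by the reviewer.
bool simplifiedCondition(uint64_t value, uint64_t size, uint64_t oldSize,
                         uint64_t newSize) {
  // Assumption of this sketch: value + size does not wrap around.
  assert(value + size >= value && "value + size overflowed");
  return value + size > newSize && value + size <= oldSize;
}
```

Under that assumption the two predicates agree on every input, which is why dropping the clause does not change behavior.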

MaskRay added inline comments. Mar 24 2020, 5:19 PM

lld/test/ELF/bb-sections-delete-fallthru.s
14	Just delete `{{[0-9|a-f| ]*}}` if the address is not significant. Please also apply to other test files.
19	Delete `Begin function foo`. Scrub clang output to the minimum.

Address recent reviewer comments.

rebase
check it applies clean to trunk

In D68065#1940343, @MaskRay wrote:

Hi Sriraman,

Sounds like there is strong support for your patch, so let’s move forward on it.

I do have a few more code review items I’d like addressed if we can before we commit.
There are several nits about excess parentheses which do not fit with the rest of lld code.
Please scan the full patch and delete them. Rebase and upload a new diff so that I can patch my local repository with arc patch D68065.
It does not apply so it isn’t convenient for me to review the tests now.

In addition, we probably should move test/ELF/bb-sections-* tests to test/ELF/propeller/

In D68063 and D73674, we decided (one of your suggestions too) to remove all references to Propeller as this was related to basic block sections. This is also fully a part of bb sections and fits naturally and is part of core LLD. I don't think we should move it to propeller/.

I don't mean any disrespect here but your tone suggests that you are quite experienced in this area :) If you have a better proposal, I strongly encourage you to propose it, evaluate it with experiments and performance numbers and get it into LLVM. Asking us to evaluate completely new designs along the lines of Full or Partial LTO is not feasible as it takes several weeks if not months, and IMHO, not reasonable particularly at this stage in the review.

The first sentence is fair. I have just a bit more than 2 years of experience contributing to LLVM. I made an llvm-dev proposal, as everyone can see. In my spare time I will probably investigate further how to speed things up.

We have put in a lot of effort towards this work to come this far. Asking us to go do it totally differently is not something we are going to do. We now know the partial LTO idea was actually not yours and only suggested to you, which I believe you should at least acknowledge for transparency.

Just wanted to defend myself. I was not sure I could mention the person. That email also includes lots of my own ideas.

You also say in your previous message that "I am very glad to see that we have made progress by landing D68063 ..." and yet you are blocking this. This is ridiculous.

Please note that I made it very clear that basic block sections are helpful for other things. I was blocking this linker patch because of concerns about long-term viability and maintenance.

That said, I think we need to try out new ideas. I am now focusing on the code review itself, instead of repeating my previous high-level concerns.

lld/ELF/Arch/X86_64.cpp
162	This won't work because this function should return the max. size when not found. If I change that assumption, I must also change how this is consumed by its callers. Is there a reason why you want to search in the reverse direction? I mean I will change it if you could tell me why.
lld/ELF/OutputSections.cpp
352–356	I removed it but maybe I should say that NOPs are needed here as opposed to TRAP because they might be executed.
lld/ELF/Writer.cpp
1695	@amharc We are going back and forth here. We didn't think it was necessary to copy the behavior either, but since you suggested that we copy the elfnn-riscv.c behavior, we went ahead and did it. We don't see a problem with either, so should we just leave it as is? Is there a particular concern here? It is not clear to me.
1700	I am not a native speaker but "Moving" seems more appropriate here unless you meant that 'M' should be lower case. My take, "move" seems to imply the user must do it.
1724	Moved and asserted at the beginning.
1729	Rephrased, shrinking the jump instrs was the 2. but I split that patch out and this remained unnoticed.
1739	Same deal.
1744	With parens reads better to me personally. If this is not acceptable w.r.t the coding style, lmk and I will delete.
1753	Past tense is preferred? I see mixed use here in many places in LLD.
2088	Correct me if I am wrong but finalizing in.symtab before we optimize seems important, I added a comment here. If we relax jumps going forward, we would definitely need to know how far they are.
lld/test/ELF/bb-sections-and-icf.s
8	Didn't follow. This test explicitly checks for the folding. You want to test icf=none too? Why?
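
The remark above about OutputSections.cpp (NOPs rather than TRAP, because the padding may be executed on fall-through) and the earlier requested `assert(nopFiller[remaining - 1].size() == remaining)` can be illustrated with a small sketch. This is not the patch's implementation: `nopTable` copies the encodings from the X86_64.cpp table in this diff, while `makeNopFill` and its return-a-vector interface are hypothetical choices made to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// NOP encodings of 1 to 9 bytes, copied from the table in this patch.
static const std::vector<std::vector<uint8_t>> nopTable = {
    {0x90},
    {0x66, 0x90},
    {0x0f, 0x1f, 0x00},
    {0x0f, 0x1f, 0x40, 0x00},
    {0x0f, 0x1f, 0x44, 0x00, 0x00},
    {0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00},
    {0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00},
    {0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},
    {0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00}};

// Build `size` bytes of executable padding: as many of the longest NOP as
// fit, then one shorter NOP that covers the remainder exactly.
std::vector<uint8_t> makeNopFill(size_t size) {
  std::vector<uint8_t> out;
  const std::vector<uint8_t> &longest = nopTable.back();
  while (size - out.size() >= longest.size())
    out.insert(out.end(), longest.begin(), longest.end());

  size_t remaining = size - out.size();
  if (remaining == 0)
    return out;

  // Entry k-1 must be exactly k bytes long for this lookup to be valid.
  assert(nopTable[remaining - 1].size() == remaining);
  const std::vector<uint8_t> &last = nopTable[remaining - 1];
  out.insert(out.end(), last.begin(), last.end());
  return out;
}
```

The assertion guards the indexing convention the table relies on; if an entry were ever the wrong length, the padding would silently change the section size.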

tmsriram edited the summary of this revision. Mar 25 2020, 8:36 PM

Ping.

In D68065#1956518, @tmsriram wrote:

Ping.

Will read tomorrow.

MaskRay added inline comments. Apr 2 2020, 2:51 PM

lld/test/ELF/bb-sections-delete-fallthru.s
9	Delete excess space and use `--optimize-bb-jumps`. I mentioned this in a previous comment.
25	Add some spaces after `CHECK:` to make the label aligned. For subsequent `-NEXT:` lines, you can increase the indentation (say, by 1) to make it clear the instructions follow the label: `# CHECK: <a.BB.foo>` followed by `# CHECK-NEXT: nopl (%rax)`. Please fix other tests as well.
27	`je {{.*}} <r.BB.foo>` I have recently updated llvm-objdump -d to print the target address (to be consistent with GNU objdump, which is what most users desire) instead of a decimal PC-relative immediate. No need to align the first operand.

MaskRay added inline comments. Apr 2 2020, 2:53 PM

lld/test/ELF/bb-sections-delete-fallthru.s
25	Add a colon to make it clear that `a.BB.foo` is a label: `# CHECK: <a.BB.foo>:` followed by `# CHECK-NEXT: nopl (%rax)`.

I have a question about aaaaaaa.BB.foo style labels. Are they STB_LOCAL symbols? If so,

The assembler (MC) will generally convert relocations referencing STB_LOCAL to reference STT_SECTION instead.
The assembler will keep them in .symtab
The linker will retain such labels in the .symtab section in the executable/shared object unless --discard-locals is specified.
aaaa.BB.foo will just be marker symbols in executables/shared objects: "hey, these addresses are special and are referenced by some instructions."
The strip tool will drop .symtab . It seems that the executables/shared objects cannot be stripped.

Are all the items above expected?

lld/test/ELF/bb-sections-delete-fallthru.s
8	Drop `-pc-linux`. I mentioned this in a previous comment.
10	Drop `--check-prefix=CHECK`. It is the default.

MaskRay added inline comments. Apr 2 2020, 3:32 PM

lld/ELF/Arch/X86_64.cpp
162	An input section may have several relocations. Scanning backward can be more efficient: `for (unsigned i = is.relocations.size(); i != 0; ) { --i; if (is.relocations[i].offset == offset && is.relocations[i].expr != R_NONE) return i; } return is.relocations.size();`

tmsriram marked an inline comment as done. Apr 2 2020, 3:42 PM

tmsriram added inline comments.

lld/ELF/Arch/X86_64.cpp
162	We discussed this f2f and even about sorting it and you expressed that you were alright leaving it as is, maybe you forgot.

tmsriram marked an inline comment as done. Apr 2 2020, 3:44 PM

tmsriram added inline comments.

lld/ELF/Arch/X86_64.cpp
162	Forgot to add that since this will change when the new relocations come in.

MaskRay added inline comments. Apr 2 2020, 5:00 PM

lld/ELF/Arch/X86_64.cpp
162	No, I did not forget it. The relocations are still sorted in practice. It is just that this is not a guaranteed property. Iterating the relocations backward is faster.
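
The backward scan discussed in this exchange uses an unsigned index, so the loop cannot be written as `i >= 0` (that condition is always true for unsigned types). The suggested form decrements inside the body and keeps the "return the vector size when nothing matches" contract that the callers rely on. A minimal, self-contained sketch follows; `Reloc`, its `live` flag (standing in for `expr != R_NONE`), and `findRelocBackward` are hypothetical names, not LLD types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an LLD relocation record.
struct Reloc {
  uint64_t offset;
  bool live; // Models `expr != R_NONE` in the review's snippet.
};

// Scan from the last relocation toward the first. Returns the index of the
// match, or relocs.size() if no live relocation has the given offset.
size_t findRelocBackward(const std::vector<Reloc> &relocs, uint64_t offset) {
  for (size_t i = relocs.size(); i != 0;) {
    --i; // Decrement first so that i is always a valid index below.
    assert(i < relocs.size());
    if (relocs[i].offset == offset && relocs[i].live)
      return i;
  }
  return relocs.size();
}
```

Scanning from the end pays off when the relocation of interest tends to sit near the end of the section, which is the case for a trailing jump instruction.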

Rebase and make recent suggested changes.

In D68065#1958233, @MaskRay wrote:

I have a question about aaaaaaa.BB.foo style labels. Are they STB_LOCAL symbols? If so,

The assembler (MC) will generally convert relocations referencing STB_LOCAL to reference STT_SECTION instead.

The assembler will keep them in .symtab

The linker will retain such labels in the .symtab section in the executable/shared object unless --discard-locals is specified.

aaaa.BB.foo will just be marker symbols in executables/shared objects: "hey, these addresses are special and are referenced by some instructions."

The strip tool will drop .symtab . It seems that the executables/shared objects cannot be stripped.

Acknowledged. We are aware of all these points. We will be adding -mrelocate-with-symbols which will relocate with symbols instead of sections to perform relaxation more easily. These symbols are used to map profiles to virtual addresses and hence the unstripped binary is needed. This is our first version and we are brain-storming other methods including putting it in Debug Info. Thanks.

Are all the items above expected?

lld/ELF/Arch/X86_64.cpp
162	Alright, backward scanning now.

LGTM

I had a VC meeting with maskray to discuss the concerns he has, and he is OK with this patch. I'm fine with this patch too. I'll give a final LGTM to unblock you. Please make the following changes and submit. Thanks!

lld/ELF/LTO.cpp
84	nit: if else has {}, add {} to other `if` and `else if` clauses.
94–95	Please use `error()` to report an error and use the same style as other error messages. E.g. `error("cannot open " + config->ltoBasicBlockSections + ":" + MBOrErr.getError().message());`

MaskRay added inline comments. Apr 6 2020, 8:55 PM

lld/ELF/Writer.cpp
1743	delete excess parens
2088	`Run after in.symTab is finalized.` Why is that important?

This revision was not accepted when it landed; it landed in state Needs Review. Apr 7 2020, 7:02 AM

Closed by commit rG94317878d826: LLD Support for Basic Block Sections (authored by tmsriram).

This revision was automatically updated to reflect the committed changes.

tmsriram marked 6 inline comments as done and an inline comment as not done.

tmsriram added inline comments. Apr 7 2020, 7:07 AM

lld/ELF/Writer.cpp
2088	Modified comment.

MaskRay added inline comments. Apr 7 2020, 12:00 PM

lld/ELF/Writer.cpp
2088	I am still not sure this is correct. .symtab, .strtab and potentially .shstrtab are placed after everything else. The internal representation does not use the contents of `.symtab` at all. However, you have already committed this patch. We probably don't want to change more here and cause churn.

MaskRay mentioned this in D77694: [WIP][RISCV][ELF] Linker relaxation support. Apr 7 2020, 6:08 PM

MaskRay mentioned this in D91018: [ELF] Make InputSection smaller. Nov 10 2020, 10:19 PM

Revision Contents

Path	Size
lld/ELF/ (11 files including Arch/, in listing order)	318, 3, 7, 33, 27, 26, 9, 25, 10, 17, 97 lines
lld/test/ELF/bb-sections-and-icf.s	47 lines
lld/test/ELF/bb-sections-delete-fallthru.s	128 lines
lld/test/ELF/bb-sections-pc32reloc.s	37 lines

Diff 255670

lld/ELF/Arch/X86_64.cpp

//===- X86_64.cpp ---------------------------------------------------------===//		//===- X86_64.cpp ---------------------------------------------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "InputFiles.h"		#include "InputFiles.h"
		#include "OutputSections.h"
#include "Symbols.h"		#include "Symbols.h"
#include "SyntheticSections.h"		#include "SyntheticSections.h"
#include "Target.h"		#include "Target.h"
#include "lld/Common/ErrorHandler.h"		#include "lld/Common/ErrorHandler.h"
#include "llvm/Object/ELF.h"		#include "llvm/Object/ELF.h"
#include "llvm/Support/Endian.h"		#include "llvm/Support/Endian.h"

using namespace llvm;		using namespace llvm;
// ... 14 lines hidden ...		public:
RelType getDynRel(RelType type) const override;		RelType getDynRel(RelType type) const override;
void writeGotPltHeader(uint8_t *buf) const override;		void writeGotPltHeader(uint8_t *buf) const override;
void writeGotPlt(uint8_t *buf, const Symbol &s) const override;		void writeGotPlt(uint8_t *buf, const Symbol &s) const override;
void writePltHeader(uint8_t *buf) const override;		void writePltHeader(uint8_t *buf) const override;
void writePlt(uint8_t *buf, const Symbol &sym,		void writePlt(uint8_t *buf, const Symbol &sym,
uint64_t pltEntryAddr) const override;		uint64_t pltEntryAddr) const override;
void relocate(uint8_t *loc, const Relocation &rel,		void relocate(uint8_t *loc, const Relocation &rel,
uint64_t val) const override;		uint64_t val) const override;
		void applyJumpInstrMod(uint8_t *loc, JumpModType type,
		MaskRay (Not Done): Fix parameter case.
		unsigned size) const override;

RelExpr adjustRelaxExpr(RelType type, const uint8_t *data,		RelExpr adjustRelaxExpr(RelType type, const uint8_t *data,
RelExpr expr) const override;		RelExpr expr) const override;
void relaxGot(uint8_t *loc, const Relocation &rel,		void relaxGot(uint8_t *loc, const Relocation &rel,
uint64_t val) const override;		uint64_t val) const override;
void relaxTlsGdToIe(uint8_t *loc, const Relocation &rel,		void relaxTlsGdToIe(uint8_t *loc, const Relocation &rel,
uint64_t val) const override;		uint64_t val) const override;
void relaxTlsGdToLe(uint8_t *loc, const Relocation &rel,		void relaxTlsGdToLe(uint8_t *loc, const Relocation &rel,
uint64_t val) const override;		uint64_t val) const override;
void relaxTlsIeToLe(uint8_t *loc, const Relocation &rel,		void relaxTlsIeToLe(uint8_t *loc, const Relocation &rel,
uint64_t val) const override;		uint64_t val) const override;
void relaxTlsLdToLe(uint8_t *loc, const Relocation &rel,		void relaxTlsLdToLe(uint8_t *loc, const Relocation &rel,
uint64_t val) const override;		uint64_t val) const override;
bool adjustPrologueForCrossSplitStack(uint8_t *loc, uint8_t *end,		bool adjustPrologueForCrossSplitStack(uint8_t *loc, uint8_t *end,
uint8_t stOther) const override;		uint8_t stOther) const override;
		bool deleteFallThruJmpInsn(InputSection &is, InputFile *file,
		InputSection *nextIS) const override;
};		};
} // namespace		} // namespace

		// This is vector of NOP instructions of sizes from 1 to 8 bytes. The
		MaskRay (Done): const
		// appropriately sized instructions are used to fill the gaps between sections
		// which are executed during fall through.
		static const std::vector<std::vector<uint8_t>> nopInstructions = {
		MaskRay (Done): Should follow `variableName` convention. Add `See X86AsmBackend::writeNopData`
		{0x90},
		{0x66, 0x90},
		{0x0f, 0x1f, 0x00},
		{0x0f, 0x1f, 0x40, 0x00},
		{0x0f, 0x1f, 0x44, 0x00, 0x00},
		{0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00},
		{0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00},
		{0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},
		{0x66, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00}};

X86_64::X86_64() {		X86_64::X86_64() {
copyRel = R_X86_64_COPY;		copyRel = R_X86_64_COPY;
gotRel = R_X86_64_GLOB_DAT;		gotRel = R_X86_64_GLOB_DAT;
noneRel = R_X86_64_NONE;		noneRel = R_X86_64_NONE;
pltRel = R_X86_64_JUMP_SLOT;		pltRel = R_X86_64_JUMP_SLOT;
relativeRel = R_X86_64_RELATIVE;		relativeRel = R_X86_64_RELATIVE;
iRelativeRel = R_X86_64_IRELATIVE;		iRelativeRel = R_X86_64_IRELATIVE;
symbolicRel = R_X86_64_64;		symbolicRel = R_X86_64_64;
tlsDescRel = R_X86_64_TLSDESC;		tlsDescRel = R_X86_64_TLSDESC;
tlsGotRel = R_X86_64_TPOFF64;		tlsGotRel = R_X86_64_TPOFF64;
tlsModuleIndexRel = R_X86_64_DTPMOD64;		tlsModuleIndexRel = R_X86_64_DTPMOD64;
tlsOffsetRel = R_X86_64_DTPOFF64;		tlsOffsetRel = R_X86_64_DTPOFF64;
pltHeaderSize = 16;		pltHeaderSize = 16;
pltEntrySize = 16;		pltEntrySize = 16;
ipltEntrySize = 16;		ipltEntrySize = 16;
trapInstr = {0xcc, 0xcc, 0xcc, 0xcc}; // 0xcc = INT3		trapInstr = {0xcc, 0xcc, 0xcc, 0xcc}; // 0xcc = INT3
		nopInstrs = nopInstructions;
		ruiu (Done): Maybe `nopInstrs` is slightly better?

// Align to the large page size (known as a superpage or huge page).		// Align to the large page size (known as a superpage or huge page).
// FreeBSD automatically promotes large, superpage-aligned allocations.		// FreeBSD automatically promotes large, superpage-aligned allocations.
defaultImageBase = 0x200000;		defaultImageBase = 0x200000;
}		}

int X86_64::getTlsGdRelaxSkip(RelType type) const { return 2; }		int X86_64::getTlsGdRelaxSkip(RelType type) const { return 2; }

		// Opcodes for the different X86_64 jmp instructions.
		enum JmpInsnOpcode : uint32_t {
		J_JMP_32,
		J_JNE_32,
		J_JE_32,
		J_JG_32,
		J_JGE_32,
		J_JB_32,
		J_JBE_32,
		J_JL_32,
		J_JLE_32,
		J_JA_32,
		J_JAE_32,
		J_UNKNOWN,
		};

		// Given the first (optional) and second byte of the insn's opcode, this
		// returns the corresponding enum value.
		static JmpInsnOpcode getJmpInsnType(const uint8_t *first,
		const uint8_t *second) {
		ruiu (Done): First -> first, Second -> second
		if (*second == 0xe9)
		return J_JMP_32;

		if (first == nullptr)
		return J_UNKNOWN;

		if (*first == 0x0f) {
		switch (*second) {
		case 0x84:
		return J_JE_32;
		case 0x85:
		return J_JNE_32;
		case 0x8f:
		return J_JG_32;
		case 0x8d:
		return J_JGE_32;
		case 0x82:
		return J_JB_32;
		case 0x86:
		return J_JBE_32;
		case 0x8c:
		return J_JL_32;
		case 0x8e:
		return J_JLE_32;
		case 0x87:
		return J_JA_32;
		case 0x83:
		return J_JAE_32;
		}
		}
		return J_UNKNOWN;
		}

		// Return the relocation index for input section IS with a specific Offset.
		// Returns the maximum size of the vector if no such relocation is found.
		static unsigned getRelocationWithOffset(const InputSection &is,
		uint64_t offset) {
		ruiu (Done): IS -> is, Offset -> offset
		unsigned size = is.relocations.size();
		for (unsigned i = size - 1; i + 1 > 0; --i) {
		ruiu (Done): nit: you should cache the result of is.relocations.size() as `for (unsigned e = is.relocations.size(); i < e; ++i)`
		if (is.relocations[i].offset == offset && is.relocations[i].expr != R_NONE)
		ruiu (Done): nit: `Return I` here and add an `llvm_unreachable` after the loop, so that we can catch an impossible condition at runtime.
		tmsriram (Author, Done): Don't follow. I == relocations.size() is used to check that no such relocation was found. It is not an impossible condition.
		return i;
		MaskRay (Done): In an old comment, I suggested: "See Relocations.cpp:scanRelocs. A better way is to sort relocations beforehand." It is not addressed.
		tmsriram (Author, Done): Could you please reconsider this? I understand what you mean here. This code is going to change when new psABI relocations are added. Could we re-open this then?
		MaskRay (Not Done): What is the average size of `is.relocations.size()`? If it is small in practice, the comment should mention it.
		MaskRay (Not Done): `for (unsigned i = is.relocations.size(); i != 0; ) { --i; if (is.relocations[i].offset == offset && is.relocations[i].expr != R_NONE) return i; }`
		tmsriram (Author, Done): This won't work because this function should return the max. size when not found. If I change that assumption, I must also change how this is consumed by its callers. Is there a reason why you want to search in the reverse direction? I mean I will change it if you could tell me why.
		MaskRay (Done): `for (unsigned i = is.relocations.size(); i != 0; ) { --i; if (is.relocations[i].offset == offset && is.relocations[i].expr != R_NONE) return i; } return is.relocations.size();` An input section may have several relocations. Scanning backward can be more efficient.
		tmsriram (Author, Done): We discussed this f2f and even about sorting it and you expressed that you were alright leaving it as is, maybe you forgot.
		tmsriram (Author, Done): Forgot to add that since this will change when the new relocations come in.
		MaskRay (Done): No, I did not forget it. The relocations are still sorted in practice. It is just that this is not a guaranteed property. Iterating the relocations backward is faster.
		tmsriram (Author, Done): Alright, backward scanning now.
		}
		return size;
		}

		// Returns true if R corresponds to a relocation used for a jump instruction.
		MaskRay (Done): getJumpRelocationWithOffset is unused. See Relocations.cpp:scanRelocs. A better way is to sort relocations beforehand.
		tmsriram (Author, Done): Deleted.
		// TODO: Once special relocations for relaxable jump instructions are available,
		// this should be modified to use those relocations.
		static bool isRelocationForJmpInsn(Relocation &R) {
		return R.type == R_X86_64_PLT32 || R.type == R_X86_64_PC32 ||
		MaskRay (Done): delete parens
		R.type == R_X86_64_PC8;
		ruiu (Done): Does "direct jmp" actually mean just J_JMP_32?
		tmsriram (Author, Done): Yes, it is J_JMP_32. Do you want this refactored into getJmpInsnType?
		ruiu (Done): I found it a little confusing to use "direct jump" to refer only to J_JMP_32 here and used the same term to represent all J* instructions in the deleteFallThruJmpInsn function comment. But maybe you can just inline and remove this function.
		}

		MaskRay (Done): Delete this once psABI reserves a new relocation type for fallthrough jumps.
		// Return true if Relocation R points to the first instruction in the
		// next section.
		// TODO: Delete this once psABI reserves a new relocation type for fall thru
		// jumps.
		static bool isFallThruRelocation(InputSection &is, InputFile *file,
		MaskRay (Done): Neither _PC8 nor _PC32 is tested.
		tmsriram (Author, Done): How do I test this? LLVM uses PLT32 relocs for this. Potentially, PC32 or PC8 can be used for this.
		MaskRay (Done): `.byte 0xe8` followed by `.long foo - . - 4`
		tmsriram (Author, Done): Okay, added a test for PC32. Also, the PC8 reloc will be created when we shrink a jmp insn offset from 32 to 8. The actual shrink code will be presented in a separate patch which will test for the PC8 so I will leave that out for now. Further, when we add new psABI relocs, this code and tests will have to be modified.
		InputSection *nextIS, Relocation &r) {
		ruiu: nit: remove extra parentheses
		if (!isRelocationForJmpInsn(r))
		return false;

		uint64_t addrLoc = is.getOutputSection()->addr + is.outSecOff + r.offset;
		ruiu: nit: remove extra parentheses
		MaskRay: Delete `SignExtend64`. config->wordsize * 8 is a constant, 64.
		uint64_t targetOffset = InputSectionBase::getRelocTargetVA(
		file, r.type, r.addend, addrLoc, *r.sym, r.expr);

		// If this jmp is a fall thru, the target offset is the beginning of the
		MaskRay: Fix variable case.
		// next section.
		uint64_t nextSectionOffset =
		MaskRay: Delete excess parentheses.
		nextIS->getOutputSection()->addr + nextIS->outSecOff;
		return (addrLoc + 4 + targetOffset) == nextSectionOffset;
		}
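
To make the arithmetic in isFallThruRelocation concrete, here is a minimal standalone sketch. It assumes a PC-relative relocation whose value is S + A - P (symbol address, addend, place); `pcRelValue` and `isFallThru` are hypothetical names for illustration, not lld functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the value of a PC-relative relocation: S + A - P.
static uint64_t pcRelValue(uint64_t symAddr, int64_t addend, uint64_t place) {
  return symAddr + static_cast<uint64_t>(addend) - place;
}

// Mirrors the check in the patch: the jump falls through if the address right
// after the 4-byte displacement, plus the displacement, is the start of the
// next section.
static bool isFallThru(uint64_t place, uint64_t symAddr, int64_t addend,
                       uint64_t nextSectionAddr) {
  uint64_t targetOffset = pcRelValue(symAddr, addend, place);
  return place + 4 + targetOffset == nextSectionAddr;
}
```

With the usual addend of -4 for a 32-bit jump displacement, the sum collapses to the symbol address, so the check reduces to asking whether the jump targets the first byte of the next section.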

		// Return the jmp instruction opcode that is the inverse of the given
		// opcode. For example, JE inverted is JNE.
		static JmpInsnOpcode invertJmpOpcode(const JmpInsnOpcode opcode) {
		ruiu: I think you can just return the result of the boolean expression.
		switch (opcode) {
		case J_JE_32:
		ruiu: If you define a new set of relocation types for bb sections, you can guarantee that the relocations always point to the beginning of a section (because if there's a jump instruction jumping to the middle of a bb section, it is not really a BB).
		tmsriram (author): Agreed, a new relocation type will simplify this. I added a comment here. Could we keep this temporarily until we can get a new relocation added?
		return J_JNE_32;
		case J_JNE_32:
		return J_JE_32;
		case J_JG_32:
		return J_JLE_32;
		case J_JGE_32:
		return J_JL_32;
		case J_JB_32:
		return J_JAE_32;
		case J_JBE_32:
		return J_JA_32;
		case J_JL_32:
		return J_JGE_32;
		case J_JLE_32:
		return J_JG_32;
		case J_JA_32:
		return J_JBE_32;
		case J_JAE_32:
		return J_JB_32;
		default:
		return J_UNKNOWN;
		}
		}
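
As an aside to the switch above: x86 condition codes come in complementary even/odd pairs, so the inverse of a long-form Jcc can also be derived from the opcode byte itself. The sketch below is an alternative illustration, not what the patch does; `invertJccByte` is a hypothetical helper operating on the second opcode byte (the one after the 0x0f prefix).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of the patch: inverts the condition of a
// long-form Jcc given its second opcode byte (0x80..0x8f). Returns 0 for any
// byte that is not a Jcc second byte.
static uint8_t invertJccByte(uint8_t op) {
  if (op < 0x80 || op > 0x8f)
    return 0;
  // Flipping the low bit of the condition code selects the opposite
  // condition, e.g. 0x84 (je) <-> 0x85 (jne).
  return static_cast<uint8_t>(op ^ 1);
}
```

The explicit switch in the patch has the advantage of rejecting any condition that is not listed in JmpInsnOpcode, which the bit trick does not.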

		// Deletes direct jump instruction in input sections that jumps to the
		// following section as it is not required. If there are two consecutive jump
		// instructions, it checks if they can be flipped and one can be deleted.
		// For example:
		ruiu: You have a `default` as well as this `return` statement, but I guess that compilers are smart enough to figure out that this return is unreachable?
		// .section .text
		// a.BB.foo:
		// ...
		ruiu: In this context, does "direct jump" mean all J* instructions?
		// 10: jne aa.BB.foo
		// 16: jmp bar
		// aa.BB.foo:
		// ...
		ruiu: is, file and nextIS
		//
		ruiu: Ditto
		// can be converted to:
		// a.BB.foo:
		// ...
		// 10: je bar #jne flipped to je and the jmp is deleted.
		// aa.BB.foo:
		// ...
		bool X86_64::deleteFallThruJmpInsn(InputSection &is, InputFile *file,
		InputSection *nextIS) const {
		const unsigned sizeOfDirectJmpInsn = 5;

		if (nextIS == nullptr)
		return false;

		if (is.getSize() < sizeOfDirectJmpInsn)
		return false;

		// If this jmp insn can be removed, it is the last insn and the
		// relocation is 4 bytes before the end.
		unsigned rIndex = getRelocationWithOffset(is, is.getSize() - 4);
		MaskRay: Nit: delete parens `(is.getSize() - 4)`
		if (rIndex == is.relocations.size())
		return false;

		Relocation &r = is.relocations[rIndex];

		// Check if the relocation corresponds to a direct jmp.
		const uint8_t *secContents = is.data().data();
		MaskRay: Delete excess parentheses.
		// If it is not a direct jmp instruction, there is nothing to do here.
		MaskRay: Variable cases need to be fixed. If `R` is an invalid R_X86_64_PLT32 that happens to have r_offset=0, R.offset-1 will be an out-of-bounds read.
		tmsriram (author): We check above that the size of the InputSection is at least that of a direct jmp. We also check the relocation 4 bytes from the end. So, this out-of-bounds read is guaranteed not to occur.
		if (*(secContents + r.offset - 1) != 0xe9)
		return false;
		MaskRay: I am a bit skeptical about this scheme. The code sequence relies heavily on the order of the last two instructions of a basic block. Does it disable optimizations placing additional instructions between jcc and jmp?
		tmsriram (author): If there are instructions between the jumps, the optimization is simply not valid. The optimization is only valid for consecutive jumps like "jne <next_BB>; jmp <far_BB>", which can be converted to "je <far_BB>; jmp <next_BB>".

		if (isFallThruRelocation(is, file, nextIS, r)) {
		// This is a fall thru and can be deleted.
		r.expr = R_NONE;
		r.offset = 0;
		is.drop_back(sizeOfDirectJmpInsn);
		is.nopFiller = true;
		return true;
		ruiu: nit: jmpOpcode_B -> jmpOpcodeB
		}

		// Now, check if flip and delete is possible.
		const unsigned sizeOfJmpCCInsn = 6;
		// To flip, there must be at least one JmpCC and one direct jmp.
		if (is.getSize() < sizeOfDirectJmpInsn + sizeOfJmpCCInsn)
		return false;
		ruiu: Please fix local variable names.

		unsigned rbIndex =
		MaskRay: Ditto. An invalid r_offset can cause the read to go out-of-bounds.
		tmsriram (author): Same argument here: the read cannot go out of bounds, as we explicitly check the size of the InputSection in the lines above.
		getRelocationWithOffset(is, (is.getSize() - sizeOfDirectJmpInsn - 4));
		if (rbIndex == is.relocations.size())
		return false;

		Relocation &rB = is.relocations[rbIndex];

		const uint8_t *jmpInsnB = secContents + rB.offset - 1;
		JmpInsnOpcode jmpOpcodeB = getJmpInsnType(jmpInsnB - 1, jmpInsnB);
		if (jmpOpcodeB == J_UNKNOWN)
		return false;

		if (!isFallThruRelocation(is, file, nextIS, rB))
		return false;

		// jmpCC jumps to the fall thru block, the branch can be flipped and the
		// jmp can be deleted.
		JmpInsnOpcode jInvert = invertJmpOpcode(jmpOpcodeB);
		if (jInvert == J_UNKNOWN)
		return false;
		is.jumpInstrMods.push_back({jInvert, (rB.offset - 1), 4});
		MaskRay: `rb = {... , ..., ...}` Set all fields at once.
		// Move R's values to rB except the offset.
		rB = {r.expr, r.type, rB.offset, r.addend, r.sym};
		// Cancel R
		r.expr = R_NONE;
		r.offset = 0;
		is.drop_back(sizeOfDirectJmpInsn);
		is.nopFiller = true;
		return true;
		}
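
The flip-and-delete step can be illustrated on raw bytes. This is a simplified sketch, not the patch's implementation: the patch rewrites relocations and defers the opcode change to jumpInstrMods, whereas this hypothetical `flipAndDelete` assumes displacements are already resolved, that the next block starts exactly where this one ends, and that the host is little-endian.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: if the code ends with "jcc <next block>; jmp <far>",
// rewrite it to "j!cc <far>" and drop the trailing jmp. Returns true on change.
static bool flipAndDelete(std::vector<uint8_t> &code) {
  const size_t jmpSize = 5; // e9 + rel32
  const size_t jccSize = 6; // 0f 8x + rel32
  if (code.size() < jmpSize + jccSize)
    return false;
  size_t jmpPos = code.size() - jmpSize;
  size_t jccPos = jmpPos - jccSize;
  if (code[jmpPos] != 0xe9)
    return false;
  if (code[jccPos] != 0x0f || (code[jccPos + 1] & 0xf0) != 0x80)
    return false;

  int32_t jccDisp, jmpDisp;
  std::memcpy(&jccDisp, &code[jccPos + 2], 4);
  std::memcpy(&jmpDisp, &code[jmpPos + 1], 4);
  // The jcc must target the byte right after the jmp, i.e. the next block.
  if (jccDisp != static_cast<int32_t>(jmpSize))
    return false;

  // The far target is now measured from the end of the jcc, not of the jmp.
  int64_t newDisp = static_cast<int64_t>(jmpDisp) + static_cast<int64_t>(jmpSize);
  if (newDisp > INT32_MAX)
    return false;
  int32_t disp32 = static_cast<int32_t>(newDisp);

  code[jccPos + 1] ^= 1; // invert the condition
  std::memcpy(&code[jccPos + 2], &disp32, 4);
  code.resize(code.size() - jmpSize); // delete the jmp
  return true;
}
```

For example, the bytes `0f 85 05 00 00 00 e9 10 00 00 00` (jne to the next block, then jmp +0x10) become `0f 84 15 00 00 00`.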

RelExpr X86_64::getRelExpr(RelType type, const Symbol &s,		RelExpr X86_64::getRelExpr(RelType type, const Symbol &s,
const uint8_t *loc) const {		const uint8_t *loc) const {
if (type == R_X86_64_GOTTPOFF)		if (type == R_X86_64_GOTTPOFF)
config->hasStaticTlsModel = true;		config->hasStaticTlsModel = true;

switch (type) {		switch (type) {
case R_X86_64_8:		case R_X86_64_8:
case R_X86_64_16:		case R_X86_64_16:
...	if (memcmp(inst, "\x48\x03\x25", 3) == 0) {
// "addq foo@gottpoff(%rip),%rsp" -> "addq $foo,%rsp"		// "addq foo@gottpoff(%rip),%rsp" -> "addq $foo,%rsp"
memcpy(inst, "\x48\x81\xc4", 3);		memcpy(inst, "\x48\x81\xc4", 3);
} else if (memcmp(inst, "\x4c\x03\x25", 3) == 0) {		} else if (memcmp(inst, "\x4c\x03\x25", 3) == 0) {
// "addq foo@gottpoff(%rip),%r12" -> "addq $foo,%r12"		// "addq foo@gottpoff(%rip),%r12" -> "addq $foo,%r12"
memcpy(inst, "\x49\x81\xc4", 3);		memcpy(inst, "\x49\x81\xc4", 3);
} else if (memcmp(inst, "\x4c\x03", 2) == 0) {		} else if (memcmp(inst, "\x4c\x03", 2) == 0) {
// "addq foo@gottpoff(%rip),%r[8-15]" -> "leaq foo(%r[8-15]),%r[8-15]"		// "addq foo@gottpoff(%rip),%r[8-15]" -> "leaq foo(%r[8-15]),%r[8-15]"
memcpy(inst, "\x4d\x8d", 2);		memcpy(inst, "\x4d\x8d", 2);
*regSlot = 0x80 | (reg << 3) | reg;		*regSlot = 0x80 | (reg << 3) | reg;
		MaskRay: Delete ()
} else if (memcmp(inst, "\x48\x03", 2) == 0) {		} else if (memcmp(inst, "\x48\x03", 2) == 0) {
// "addq foo@gottpoff(%rip),%reg -> "leaq foo(%reg),%reg"		// "addq foo@gottpoff(%rip),%reg -> "leaq foo(%reg),%reg"
		MaskRay: Delete superfluous `{}` and blank lines.
memcpy(inst, "\x48\x8d", 2);		memcpy(inst, "\x48\x8d", 2);
*regSlot = 0x80 | (reg << 3) | reg;		*regSlot = 0x80 | (reg << 3) | reg;
} else if (memcmp(inst, "\x4c\x8b", 2) == 0) {		} else if (memcmp(inst, "\x4c\x8b", 2) == 0) {
// "movq foo@gottpoff(%rip),%r[8-15]" -> "movq $foo,%r[8-15]"		// "movq foo@gottpoff(%rip),%r[8-15]" -> "movq $foo,%r[8-15]"
memcpy(inst, "\x49\xc7", 2);		memcpy(inst, "\x49\xc7", 2);
*regSlot = 0xc0 | reg;		*regSlot = 0xc0 | reg;
} else if (memcmp(inst, "\x48\x8b", 2) == 0) {		} else if (memcmp(inst, "\x48\x8b", 2) == 0) {
// "movq foo@gottpoff(%rip),%reg" -> "movq $foo,%reg"		// "movq foo@gottpoff(%rip),%reg" -> "movq $foo,%reg"
...	if (loc[4] == 0xe8) {
// to		// to
// .word 0x6666		// .word 0x6666
// .byte 0x66		// .byte 0x66
// mov %fs:0,%rax		// mov %fs:0,%rax
// leaq bar@tpoff(%rax), %rcx		// leaq bar@tpoff(%rax), %rcx
memcpy(loc - 3, inst, sizeof(inst));		memcpy(loc - 3, inst, sizeof(inst));
return;		return;
}		}

		MaskRay: Delete `if (BytesGrown)`
if (loc[4] == 0xff && loc[5] == 0x15) {		if (loc[4] == 0xff && loc[5] == 0x15) {
// Convert		// Convert
// leaq x@tlsld(%rip),%rdi # 48 8d 3d <Loc>		// leaq x@tlsld(%rip),%rdi # 48 8d 3d <Loc>
// call *__tls_get_addr@GOTPCREL(%rip) # ff 15 <disp32>		// call *__tls_get_addr@GOTPCREL(%rip) # ff 15 <disp32>
// to		// to
// .long 0x66666666		// .long 0x66666666
// movq %fs:0,%rax		// movq %fs:0,%rax
// See "Table 11.9: LD -> LE Code Transition (LP64)" in		// See "Table 11.9: LD -> LE Code Transition (LP64)" in
// https://raw.githubusercontent.com/wiki/hjl-tools/x86-psABI/x86-64-psABI-1.0.pdf		// https://raw.githubusercontent.com/wiki/hjl-tools/x86-psABI/x86-64-psABI-1.0.pdf
loc[-3] = 0x66;		loc[-3] = 0x66;
memcpy(loc - 2, inst, sizeof(inst));		memcpy(loc - 2, inst, sizeof(inst));
return;		return;
}		}

error(getErrorLocation(loc - 3) +		error(getErrorLocation(loc - 3) +
"expected R_X86_64_PLT32 or R_X86_64_GOTPCRELX after R_X86_64_TLSLD");		"expected R_X86_64_PLT32 or R_X86_64_GOTPCRELX after R_X86_64_TLSLD");
}		}

		// A JumpInstrMod at a specific offset indicates that the jump instruction
		// opcode at that offset must be modified. This is specifically used to relax
		// jump instructions with basic block sections. This function looks at the
		// JumpMod and effects the change.
		void X86_64::applyJumpInstrMod(uint8_t *loc, JumpModType type,
		unsigned size) const {
		switch (type) {
		case J_JMP_32:
		MaskRay: loc[-1]
		if (size == 4)
		*loc = 0xe9;
		else
		*loc = 0xeb;
		break;
		case J_JE_32:
		if (size == 4) {
		loc[-1] = 0x0f;
		*loc = 0x84;
		} else
		*loc = 0x74;
		break;
		case J_JNE_32:
		if (size == 4) {
		loc[-1] = 0x0f;
		*loc = 0x85;
		} else
		*loc = 0x75;
		break;
		case J_JG_32:
		if (size == 4) {
		loc[-1] = 0x0f;
		*loc = 0x8f;
		} else
		*loc = 0x7f;
		break;
		case J_JGE_32:
		if (size == 4) {
		loc[-1] = 0x0f;
		*loc = 0x8d;
		MaskRay: llvm_unreachable if this is unreachable
		} else
		*loc = 0x7d;
		break;
		case J_JB_32:
		if (size == 4) {
		loc[-1] = 0x0f;
		*loc = 0x82;
		} else
		*loc = 0x72;
		break;
		case J_JBE_32:
		if (size == 4) {
		loc[-1] = 0x0f;
		*loc = 0x86;
		} else
		*loc = 0x76;
		break;
		case J_JL_32:
		if (size == 4) {
		loc[-1] = 0x0f;
		*loc = 0x8c;
		} else
		*loc = 0x7c;
		break;
		case J_JLE_32:
		if (size == 4) {
		loc[-1] = 0x0f;
		*loc = 0x8e;
		} else
		*loc = 0x7e;
		break;
		case J_JA_32:
		if (size == 4) {
		loc[-1] = 0x0f;
		*loc = 0x87;
		} else
		*loc = 0x77;
		break;
		case J_JAE_32:
		if (size == 4) {
		loc[-1] = 0x0f;
		*loc = 0x83;
		MaskRay: Change default to J_UNKNOWN. All cases of the switch should be covered, instead of relying on `default:`.
		} else
		*loc = 0x73;
		break;
		case J_UNKNOWN:
		llvm_unreachable("Unknown Jump Relocation");
		}
		}
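
The conditional-jump cases in applyJumpInstrMod follow one pattern: the short form is a single opcode byte in 0x70..0x7f, and the long form is 0x0f followed by that byte plus 0x10. A compact sketch of that pattern, using a hypothetical `writeJccOpcode` helper (not part of the patch; the unconditional jmp, 0xe9 vs 0xeb, is not covered):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: writes a Jcc opcode ending at loc, given the one-byte
// short opcode (0x70..0x7f). size is the displacement width in bytes (4 or 1),
// mirroring the size parameter of applyJumpInstrMod.
static void writeJccOpcode(uint8_t *loc, uint8_t shortOp, unsigned size) {
  if (size == 4) {
    loc[-1] = 0x0f;                              // two-byte opcode prefix
    *loc = static_cast<uint8_t>(shortOp + 0x10); // 0x7x -> 0x8x
  } else {
    *loc = shortOp; // one-byte opcode, 8-bit displacement follows
  }
}
```

The explicit switch in the patch is longer but keeps every opcode greppable and lets the compiler warn when a JmpInsnOpcode enumerator is not handled.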

void X86_64::relocate(uint8_t *loc, const Relocation &rel, uint64_t val) const {		void X86_64::relocate(uint8_t *loc, const Relocation &rel, uint64_t val) const {
switch (rel.type) {		switch (rel.type) {
case R_X86_64_8:		case R_X86_64_8:
checkIntUInt(loc, val, 8, rel);		checkIntUInt(loc, val, 8, rel);
*loc = val;		*loc = val;
break;		break;
case R_X86_64_PC8:		case R_X86_64_PC8:
checkInt(loc, val, 8, rel);		checkInt(loc, val, 8, rel);
...

lld/ELF/Config.h

...	struct Configuration {
llvm::StringRef optRemarksPasses;		llvm::StringRef optRemarksPasses;
llvm::StringRef optRemarksFormat;		llvm::StringRef optRemarksFormat;
llvm::StringRef progName;		llvm::StringRef progName;
llvm::StringRef printSymbolOrder;		llvm::StringRef printSymbolOrder;
llvm::StringRef soName;		llvm::StringRef soName;
llvm::StringRef sysroot;		llvm::StringRef sysroot;
llvm::StringRef thinLTOCacheDir;		llvm::StringRef thinLTOCacheDir;
llvm::StringRef thinLTOIndexOnlyArg;		llvm::StringRef thinLTOIndexOnlyArg;
		llvm::StringRef ltoBasicBlockSections;
std::pair<llvm::StringRef, llvm::StringRef> thinLTOObjectSuffixReplace;		std::pair<llvm::StringRef, llvm::StringRef> thinLTOObjectSuffixReplace;
std::pair<llvm::StringRef, llvm::StringRef> thinLTOPrefixReplace;		std::pair<llvm::StringRef, llvm::StringRef> thinLTOPrefixReplace;
std::string rpath;		std::string rpath;
std::vector<VersionDefinition> versionDefinitions;		std::vector<VersionDefinition> versionDefinitions;
std::vector<llvm::StringRef> auxiliaryList;		std::vector<llvm::StringRef> auxiliaryList;
std::vector<llvm::StringRef> filterList;		std::vector<llvm::StringRef> filterList;
std::vector<llvm::StringRef> searchPaths;		std::vector<llvm::StringRef> searchPaths;
std::vector<llvm::StringRef> symbolOrderingFile;		std::vector<llvm::StringRef> symbolOrderingFile;
...	struct Configuration {
bool gnuUnique;		bool gnuUnique;
bool hasDynamicList = false;		bool hasDynamicList = false;
bool hasDynSymTab;		bool hasDynSymTab;
bool ignoreDataAddressEquality;		bool ignoreDataAddressEquality;
bool ignoreFunctionAddressEquality;		bool ignoreFunctionAddressEquality;
bool ltoCSProfileGenerate;		bool ltoCSProfileGenerate;
bool ltoDebugPassManager;		bool ltoDebugPassManager;
bool ltoNewPassManager;		bool ltoNewPassManager;
		bool ltoUniqueBBSectionNames;
bool ltoWholeProgramVisibility;		bool ltoWholeProgramVisibility;
bool mergeArmExidx;		bool mergeArmExidx;
bool mipsN32Abi = false;		bool mipsN32Abi = false;
bool mmapOutputFile;		bool mmapOutputFile;
bool nmagic;		bool nmagic;
bool noDynamicLinker = false;		bool noDynamicLinker = false;
bool noinhibitExec;		bool noinhibitExec;
bool nostdlib;		bool nostdlib;
bool oFormatBinary;		bool oFormatBinary;
bool omagic;		bool omagic;
		bool optimizeBBJumps;
bool optRemarksWithHotness;		bool optRemarksWithHotness;
bool picThunk;		bool picThunk;
bool pie;		bool pie;
bool printGcSections;		bool printGcSections;
bool printIcfSections;		bool printIcfSections;
bool relocatable;		bool relocatable;
bool relrPackDynRelocs;		bool relrPackDynRelocs;
bool saveTemps;		bool saveTemps;
...

lld/ELF/Driver.cpp

...	static void readConfigs(opt::InputArgList &args) {
config->bsymbolicFunctions = args.hasArg(OPT_Bsymbolic_functions);		config->bsymbolicFunctions = args.hasArg(OPT_Bsymbolic_functions);
config->checkSections =		config->checkSections =
args.hasFlag(OPT_check_sections, OPT_no_check_sections, true);		args.hasFlag(OPT_check_sections, OPT_no_check_sections, true);
config->chroot = args.getLastArgValue(OPT_chroot);		config->chroot = args.getLastArgValue(OPT_chroot);
config->compressDebugSections = getCompressDebugSections(args);		config->compressDebugSections = getCompressDebugSections(args);
config->cref = args.hasFlag(OPT_cref, OPT_no_cref, false);		config->cref = args.hasFlag(OPT_cref, OPT_no_cref, false);
config->defineCommon = args.hasFlag(OPT_define_common, OPT_no_define_common,		config->defineCommon = args.hasFlag(OPT_define_common, OPT_no_define_common,
!args.hasArg(OPT_relocatable));		!args.hasArg(OPT_relocatable));
		config->optimizeBBJumps =
		args.hasFlag(OPT_optimize_bb_jumps, OPT_no_optimize_bb_jumps, false);
config->demangle = args.hasFlag(OPT_demangle, OPT_no_demangle, true);		config->demangle = args.hasFlag(OPT_demangle, OPT_no_demangle, true);
config->dependentLibraries = args.hasFlag(OPT_dependent_libraries, OPT_no_dependent_libraries, true);		config->dependentLibraries = args.hasFlag(OPT_dependent_libraries, OPT_no_dependent_libraries, true);
config->disableVerify = args.hasArg(OPT_disable_verify);		config->disableVerify = args.hasArg(OPT_disable_verify);
config->discard = getDiscard(args);		config->discard = getDiscard(args);
config->dwoDir = args.getLastArgValue(OPT_plugin_opt_dwo_dir_eq);		config->dwoDir = args.getLastArgValue(OPT_plugin_opt_dwo_dir_eq);
config->dynamicLinker = getDynamicLinker(args);		config->dynamicLinker = getDynamicLinker(args);
config->ehFrameHdr =		config->ehFrameHdr =
args.hasFlag(OPT_eh_frame_hdr, OPT_no_eh_frame_hdr, false);		args.hasFlag(OPT_eh_frame_hdr, OPT_no_eh_frame_hdr, false);
...	static void readConfigs(opt::InputArgList &args) {
config->ltoNewPassManager = args.hasArg(OPT_lto_new_pass_manager);		config->ltoNewPassManager = args.hasArg(OPT_lto_new_pass_manager);
config->ltoNewPmPasses = args.getLastArgValue(OPT_lto_newpm_passes);		config->ltoNewPmPasses = args.getLastArgValue(OPT_lto_newpm_passes);
config->ltoWholeProgramVisibility =		config->ltoWholeProgramVisibility =
args.hasArg(OPT_lto_whole_program_visibility);		args.hasArg(OPT_lto_whole_program_visibility);
config->ltoo = args::getInteger(args, OPT_lto_O, 2);		config->ltoo = args::getInteger(args, OPT_lto_O, 2);
config->ltoObjPath = args.getLastArgValue(OPT_lto_obj_path_eq);		config->ltoObjPath = args.getLastArgValue(OPT_lto_obj_path_eq);
config->ltoPartitions = args::getInteger(args, OPT_lto_partitions, 1);		config->ltoPartitions = args::getInteger(args, OPT_lto_partitions, 1);
config->ltoSampleProfile = args.getLastArgValue(OPT_lto_sample_profile);		config->ltoSampleProfile = args.getLastArgValue(OPT_lto_sample_profile);
		config->ltoBasicBlockSections =
		ruiu: This should be named `ltoBasicblockSections` to exactly match the option name.
		args.getLastArgValue(OPT_lto_basicblock_sections);
		config->ltoUniqueBBSectionNames =
		args.hasFlag(OPT_lto_unique_bb_section_names,
		OPT_no_lto_unique_bb_section_names, false);
config->mapFile = args.getLastArgValue(OPT_Map);		config->mapFile = args.getLastArgValue(OPT_Map);
config->mipsGotSize = args::getInteger(args, OPT_mips_got_size, 0xfff0);		config->mipsGotSize = args::getInteger(args, OPT_mips_got_size, 0xfff0);
config->mergeArmExidx =		config->mergeArmExidx =
args.hasFlag(OPT_merge_exidx_entries, OPT_no_merge_exidx_entries, true);		args.hasFlag(OPT_merge_exidx_entries, OPT_no_merge_exidx_entries, true);
config->mmapOutputFile =		config->mmapOutputFile =
args.hasFlag(OPT_mmap_output_file, OPT_no_mmap_output_file, true);		args.hasFlag(OPT_mmap_output_file, OPT_no_mmap_output_file, true);
config->nmagic = args.hasFlag(OPT_nmagic, OPT_no_nmagic, false);		config->nmagic = args.hasFlag(OPT_nmagic, OPT_no_nmagic, false);
config->noinhibitExec = args.hasArg(OPT_noinhibit_exec);		config->noinhibitExec = args.hasArg(OPT_noinhibit_exec);
...

lld/ELF/InputSection.h

...	public:
// ObjFile<ELFT>, but in order to avoid ELFT, we use InputFile as		// ObjFile<ELFT>, but in order to avoid ELFT, we use InputFile as
// its static type.		// its static type.
InputFile *file;		InputFile *file;

template <class ELFT> ObjFile<ELFT> *getFile() const {		template <class ELFT> ObjFile<ELFT> *getFile() const {
return cast_or_null<ObjFile<ELFT>>(file);		return cast_or_null<ObjFile<ELFT>>(file);
}		}

		// If basic block sections are enabled, many code sections could end up with
		// one or two jump instructions at the end that could be relaxed to a smaller
		// instruction. The members below help trimming the trailing jump instruction
		ruiu: bytesDropped, trimmed
		ruiu: This needs a brief comment as to what they are for, e.g. "If Basic-Block Sections is enabled, most code sections end with a jump instruction, and if a basic block just falls through in the final code layout, we want to trim the trailing jump instruction by shrinking a section. We have a few members to support that operation."
		// and shrinking a section.
		ruiu: Now I wonder if you can just shrink `rawData`? Then you can revert the change you made to `getSize()`.
		tmsriram (author): I looked at this and saw that "trimmed" can be deleted, which simplifies the code here and in getSize() much more. I need to keep track of both the actual available space and the shrunk space, because we both shrink and grow the section; a follow-up patch does this optimization. So, bytesDropped is useful to undo the shrink. PTAL and see if the simplification helps. Also, if we don't keep track of the actual size of the section when growing and shrinking, it is not possible to catch bugs where we accidentally grow more than the original size of the section, potentially writing out-of-bounds into rawData.
		tmsriram (author): Sorry, I mistyped the last line. It should read "potentially reading out-of-bounds from rawData".
		ruiu: I see. Maybe we can have two ArrayRefs, one an ArrayRef of the original size and the other a (possibly) shrunk one, but I think it doesn't matter much, so I'm fine with this approach.
		tmsriram (author): Once the relocations are added, this could be made simpler so I will leave it for now.
		unsigned bytesDropped = 0;

		void drop_back(uint64_t num) { bytesDropped += num; }

		void push_back(uint64_t num) {
		assert(bytesDropped >= num);
		bytesDropped -= num;
		}

		void trim() {
		if (bytesDropped) {
		rawData = rawData.drop_back(bytesDropped);
		bytesDropped = 0;
		}
		}
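
The bookkeeping discussed in the comments above can be sketched in isolation. `SectionSketch` is a hypothetical stand-in that uses a std::vector where the patch uses an ArrayRef; it only illustrates how the logical size is tracked separately from the backing data until trim() is called.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the members added to the input section class.
struct SectionSketch {
  std::vector<uint8_t> rawData;
  size_t bytesDropped = 0;

  // Logical size: the backing data minus the bytes currently dropped.
  size_t getSize() const { return rawData.size() - bytesDropped; }

  // Shrink the logical size without touching the data.
  void dropBack(size_t num) { bytesDropped += num; }

  // Undo part of an earlier shrink; growing past the original size is a bug.
  void pushBack(size_t num) {
    assert(bytesDropped >= num);
    bytesDropped -= num;
  }

  // Commit the shrink to the backing data.
  void trim() {
    rawData.resize(rawData.size() - bytesDropped);
    bytesDropped = 0;
  }
};
```

Because the data is untouched until trim(), a later pass can still grow the section back up to its original size, which is the reason given in the thread for keeping bytesDropped.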

ArrayRef<uint8_t> data() const {		ArrayRef<uint8_t> data() const {
if (uncompressedSize >= 0)		if (uncompressedSize >= 0)
uncompress();		uncompress();
return rawData;		return rawData;
}		}

uint64_t getOffsetInFile() const;		uint64_t getOffsetInFile() const;

...	public:
std::string getSrcMsg(const Symbol &sym, uint64_t offset);		std::string getSrcMsg(const Symbol &sym, uint64_t offset);
std::string getObjMsg(uint64_t offset);		std::string getObjMsg(uint64_t offset);

// Each section knows how to relocate itself. These functions apply		// Each section knows how to relocate itself. These functions apply
// relocations, assuming that Buf points to this section's copy in		// relocations, assuming that Buf points to this section's copy in
// the mmap'ed output buffer.		// the mmap'ed output buffer.
template <class ELFT> void relocate(uint8_t *buf, uint8_t *bufEnd);		template <class ELFT> void relocate(uint8_t *buf, uint8_t *bufEnd);
void relocateAlloc(uint8_t *buf, uint8_t *bufEnd);		void relocateAlloc(uint8_t *buf, uint8_t *bufEnd);
		static uint64_t getRelocTargetVA(const InputFile *File, RelType Type,
		int64_t A, uint64_t P, const Symbol &Sym,
		RelExpr Expr);

// The native ELF reloc data type is not very convenient to handle.		// The native ELF reloc data type is not very convenient to handle.
// So we convert ELF reloc records to our own records in Relocations.cpp.		// So we convert ELF reloc records to our own records in Relocations.cpp.
// This vector contains such "cooked" relocations.		// This vector contains such "cooked" relocations.
std::vector<Relocation> relocations;		std::vector<Relocation> relocations;

		// Indicates that this section needs to be padded with a NOP filler if set to
		// true.
		bool nopFiller = false;

		// These are modifiers to jump instructions that are necessary when basic
		// block sections are enabled. Basic block sections creates opportunities to
		// relax jump instructions at basic block boundaries after reordering the
		// basic blocks.
		std::vector<JumpInstrMod> jumpInstrMods;
		ruiu: lowercase

// A function compiled with -fsplit-stack calling a function		// A function compiled with -fsplit-stack calling a function
		ruiu: Since JumpInstrMod is a public member, you don't need this?
// compiled without -fsplit-stack needs its prologue adjusted. Find		// compiled without -fsplit-stack needs its prologue adjusted. Find
// such functions and adjust their prologues. This is very similar		// such functions and adjust their prologues. This is very similar
// to relocation. See https://gcc.gnu.org/wiki/SplitStacks for more		// to relocation. See https://gcc.gnu.org/wiki/SplitStacks for more
// information.		// information.
template <typename ELFT>		template <typename ELFT>
void adjustSplitStackFunctionPrologues(uint8_t *buf, uint8_t *end);		void adjustSplitStackFunctionPrologues(uint8_t *buf, uint8_t *end);


...

lld/ELF/InputSection.cpp

...	if (hdr.sh_addralign > UINT32_MAX)
fatal(toString(&file) + ": section sh_addralign is too large");		fatal(toString(&file) + ": section sh_addralign is too large");
}		}

size_t InputSectionBase::getSize() const {		size_t InputSectionBase::getSize() const {
if (auto *s = dyn_cast<SyntheticSection>(this))		if (auto *s = dyn_cast<SyntheticSection>(this))
return s->getSize();		return s->getSize();
if (uncompressedSize >= 0)		if (uncompressedSize >= 0)
return uncompressedSize;		return uncompressedSize;
return rawData.size();		return rawData.size() - bytesDropped;
}		}

void InputSectionBase::uncompress() const {		void InputSectionBase::uncompress() const {
size_t size = uncompressedSize;		size_t size = uncompressedSize;
char *uncompressedBuf;		char *uncompressedBuf;
{		{
static std::mutex mu;		static std::mutex mu;
std::lock_guard<std::mutex> lock(mu);		std::lock_guard<std::mutex> lock(mu);
...	static int64_t getTlsTpOffset(const Symbol &s) {
case EM_X86_64:		case EM_X86_64:
return s.getVA(0) - tls->p_memsz -		return s.getVA(0) - tls->p_memsz -
((-tls->p_vaddr - tls->p_memsz) & (tls->p_align - 1));		((-tls->p_vaddr - tls->p_memsz) & (tls->p_align - 1));
default:		default:
llvm_unreachable("unhandled Config->EMachine");		llvm_unreachable("unhandled Config->EMachine");
}		}
}		}

static uint64_t getRelocTargetVA(const InputFile *file, RelType type, int64_t a,		uint64_t InputSectionBase::getRelocTargetVA(const InputFile *file, RelType type,
uint64_t p, const Symbol &sym, RelExpr expr) {		int64_t a, uint64_t p,
		const Symbol &sym, RelExpr expr) {
switch (expr) {		switch (expr) {
case R_ABS:		case R_ABS:
case R_DTPREL:		case R_DTPREL:
case R_RELAX_TLS_LD_TO_LE_ABS:		case R_RELAX_TLS_LD_TO_LE_ABS:
case R_RELAX_GOT_PC_NOPIC:		case R_RELAX_GOT_PC_NOPIC:
case R_RISCV_ADD:		case R_RISCV_ADD:
return sym.getVA(a);		return sym.getVA(a);
case R_ADDEND:		case R_ADDEND:
...	for (const RelTy &rel : rels) {
if (!RelTy::IsRela)		if (!RelTy::IsRela)
addend += target->getImplicitAddend(bufLoc, type);		addend += target->getImplicitAddend(bufLoc, type);

Symbol &sym = getFile<ELFT>()->getRelocTargetSym(rel);		Symbol &sym = getFile<ELFT>()->getRelocTargetSym(rel);
RelExpr expr = target->getRelExpr(type, sym, bufLoc);		RelExpr expr = target->getRelExpr(type, sym, bufLoc);
if (expr == R_NONE)		if (expr == R_NONE)
continue;		continue;

		if (expr == R_SIZE) {
		MaskRay: How is `R_SIZE` used?
		tmsriram (author): In cases where the range is represented in DWARF as start and length, we defer the length calculation to the link stage, by emitting an appropriate SIZE relocation instead of having the compiler hardcode the length directly in the object file.
		target->relocateNoSym(bufLoc, type,
		SignExtend64<bits>(sym.getSize() + addend));
		continue;
		}

if (expr != R_ABS && expr != R_DTPREL && expr != R_RISCV_ADD) {		if (expr != R_ABS && expr != R_DTPREL && expr != R_RISCV_ADD) {
std::string msg = getLocation<ELFT>(offset) +		std::string msg = getLocation<ELFT>(offset) +
": has non-ABS relocation " + toString(type) +		": has non-ABS relocation " + toString(type) +
" against symbol '" + toString(sym) + "'";		" against symbol '" + toString(sym) + "'";
if (expr != R_PC && expr != R_ARM_PCA) {		if (expr != R_PC && expr != R_ARM_PCA) {
error(msg);		error(msg);
return;		return;
}		}
▲ Show 20 Lines • Show All 55 Lines • ▼ Show 20 Lines	else
sec->relocateNonAlloc<ELFT>(buf, sec->template rels<ELFT>());		sec->relocateNonAlloc<ELFT>(buf, sec->template rels<ELFT>());
}		}

void InputSectionBase::relocateAlloc(uint8_t *buf, uint8_t *bufEnd) {		void InputSectionBase::relocateAlloc(uint8_t *buf, uint8_t *bufEnd) {
assert(flags & SHF_ALLOC);		assert(flags & SHF_ALLOC);
const unsigned bits = config->wordsize * 8;		const unsigned bits = config->wordsize * 8;

for (const Relocation &rel : relocations) {		for (const Relocation &rel : relocations) {
		if (rel.expr == R_NONE)
		MaskRay (Done): After patching `expr` to R_NONE, propeller should delete these relocations. InputSection.cpp should not know R_NONE has to be skipped.
		tmsriram (Author, Done): I assumed it would be helpful in general to avoid R_NONE relocations here. I don't mind deleting the R_NONE relocations with Propeller.
		tmsriram (Author, Done): I am assuming this is alright.
		continue;
uint64_t offset = rel.offset;		uint64_t offset = rel.offset;
if (auto *sec = dyn_cast<InputSection>(this))		if (auto *sec = dyn_cast<InputSection>(this))
offset += sec->outSecOff;		offset += sec->outSecOff;
uint8_t *bufLoc = buf + offset;		uint8_t *bufLoc = buf + offset;
RelType type = rel.type;		RelType type = rel.type;

uint64_t addrLoc = getOutputSection()->addr + offset;		uint64_t addrLoc = getOutputSection()->addr + offset;
RelExpr expr = rel.expr;		RelExpr expr = rel.expr;
▲ Show 20 Lines • Show All 53 Lines • ▼ Show 20 Lines	case R_PPC64_CALL:
}		}
target->relocate(bufLoc, rel, targetVA);		target->relocate(bufLoc, rel, targetVA);
break;		break;
default:		default:
target->relocate(bufLoc, rel, targetVA);		target->relocate(bufLoc, rel, targetVA);
break;		break;
}		}
}		}

		// Apply jumpInstrMods. jumpInstrMods are created when the opcode of
		// a jmp insn must be modified to shrink the jmp insn or to flip the jmp
		// insn. This is primarily used to relax and optimize jumps created with
		// basic block sections.
		if (auto *sec = dyn_cast<InputSection>(this)) {
		for (const JumpInstrMod &jumpMod : jumpInstrMods) {
		uint64_t offset = jumpMod.offset + sec->outSecOff;
		uint8_t *bufLoc = buf + offset;
		ruiu (Done): nit: this can be `uint64_t offset = jumpMod.Offset + sec->outSecOff`
		target->applyJumpInstrMod(bufLoc, jumpMod.original, jumpMod.size);
		}
		}
}		}

// For each function-defining prologue, find any calls to __morestack,		// For each function-defining prologue, find any calls to __morestack,
// and replace them with calls to __morestack_non_split.		// and replace them with calls to __morestack_non_split.
static void switchMorestackCallsToMorestackNonSplit(		static void switchMorestackCallsToMorestackNonSplit(
DenseSet<Defined *> &prologues, std::vector<Relocation *> &morestackCalls) {		DenseSet<Defined *> &prologues, std::vector<Relocation *> &morestackCalls) {

// If the target adjusted a function's prologue, all calls to		// If the target adjusted a function's prologue, all calls to
▲ Show 20 Lines • Show All 355 Lines • Show Last 20 Lines

lld/ELF/LTO.cpp

Show First 20 Lines • Show All 70 Lines • ▼ Show 20 Lines	static lto::Config createConfig() {
c.Options = initTargetOptionsFromCodeGenFlags();		c.Options = initTargetOptionsFromCodeGenFlags();
c.Options.RelaxELFRelocations = true;		c.Options.RelaxELFRelocations = true;
c.Options.EmitAddrsig = true;		c.Options.EmitAddrsig = true;

// Always emit a section per function/datum with LTO.		// Always emit a section per function/datum with LTO.
c.Options.FunctionSections = true;		c.Options.FunctionSections = true;
c.Options.DataSections = true;		c.Options.DataSections = true;

		// Check if basic block sections must be used.
		// Allowed values for --lto-basicblock-sections are "all", "labels",
		// "<file name specifying basic block ids>", or none. This is the equivalent
		ruiu (Done): nit: it is more common to use `operator==` instead of `equals` to compare StringRefs.
		// of -fbasicblock-sections= flag in clang.
		if (!config->ltoBasicBlockSections.empty()) {
		if (config->ltoBasicBlockSections == "all") {
		ruiu (Done): nit: if `else` has {}, add {} to the other `if` and `else if` clauses.
		c.Options.BBSections = BasicBlockSection::All;
		} else if (config->ltoBasicBlockSections == "labels") {
		c.Options.BBSections = BasicBlockSection::Labels;
		MaskRay (Done): Check `config->ltoBBSections == "list"`
		tmsriram (Author, Done): "list" is not an option but is specified as a file name with a list. The file contains a list of functions for which basic block sections must be generated.
		} else if (config->ltoBasicBlockSections == "none") {
		c.Options.BBSections = BasicBlockSection::None;
		} else {
		ErrorOr<std::unique_ptr<MemoryBuffer>> MBOrErr =
		MemoryBuffer::getFile(config->ltoBasicBlockSections.str());
		if (!MBOrErr) {
		error("cannot open " + config->ltoBasicBlockSections + ":" +
		MBOrErr.getError().message());
		ruiu (Done): Please use `error()` to report an error and use the same style as other error messages, e.g. `error("cannot open " + config->ltoBasicBlockSections + ":" + MBOrErr.getError().message());`
		} else {
		c.Options.BBSectionsFuncListBuf = std::move(*MBOrErr);
		}
		c.Options.BBSections = BasicBlockSection::List;
		}
		}

		c.Options.UniqueBBSectionNames = config->ltoUniqueBBSectionNames;

if (auto relocModel = getRelocModelFromCMModel())		if (auto relocModel = getRelocModelFromCMModel())
c.RelocModel = *relocModel;		c.RelocModel = *relocModel;
else if (config->relocatable)		else if (config->relocatable)
c.RelocModel = None;		c.RelocModel = None;
else if (config->isPic)		else if (config->isPic)
c.RelocModel = Reloc::PIC_;		c.RelocModel = Reloc::PIC_;
else		else
c.RelocModel = Reloc::Static;		c.RelocModel = Reloc::Static;
▲ Show 20 Lines • Show All 232 Lines • Show Last 20 Lines
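
The option handling added to createConfig() above can be sketched in isolation. This is only an illustration under stated assumptions: `BBSectionsMode`, `BBSectionsChoice` and `parseBBSections` are hypothetical names standing in for the patch's `BasicBlockSection` enum and the inline `if`/`else if` chain, and the file is not actually opened here.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the BasicBlockSection enum used by the patch.
enum class BBSectionsMode { All, Labels, None, List };

// Hypothetical result type: the chosen mode plus the list file name, if any.
struct BBSectionsChoice {
  BBSectionsMode mode;
  std::string listFile; // only meaningful when mode == List
};

// Map the value of --lto-basicblock-sections= to a mode. Any value other than
// "all", "labels" or "none" is treated as the name of a file listing functions.
// The patch skips this logic when the option is empty, so the sketch requires a
// non-empty value.
BBSectionsChoice parseBBSections(const std::string &value) {
  assert(!value.empty() && "caller should skip an unset option");
  if (value == "all")
    return {BBSectionsMode::All, ""};
  if (value == "labels")
    return {BBSectionsMode::Labels, ""};
  if (value == "none")
    return {BBSectionsMode::None, ""};
  return {BBSectionsMode::List, value};
}
```

As tmsriram notes in the thread, there is no literal "list" keyword: `parseBBSections("funcs.txt")` in this sketch yields the List mode with `funcs.txt` as the file to read.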

lld/ELF/Options.td

Show All 36 Lines	defm check_sections: B<"check-sections",
"Do not check section addresses for overlaps">;		"Do not check section addresses for overlaps">;

defm compress_debug_sections:		defm compress_debug_sections:
Eq<"compress-debug-sections", "Compress DWARF debug sections">,		Eq<"compress-debug-sections", "Compress DWARF debug sections">,
MetaVarName<"[none,zlib]">;		MetaVarName<"[none,zlib]">;

defm defsym: Eq<"defsym", "Define a symbol alias">, MetaVarName<"<symbol>=<value>">;		defm defsym: Eq<"defsym", "Define a symbol alias">, MetaVarName<"<symbol>=<value>">;

		defm optimize_bb_jumps: B<"optimize-bb-jumps",
		"Remove direct jumps at the end to the next basic block",
		"Do not remove any direct jumps at the end to the next basic block (default)">;
		MaskRay (Not Done): Add `(default)`. See other options.

defm split_stack_adjust_size		defm split_stack_adjust_size
: Eq<"split-stack-adjust-size",		: Eq<"split-stack-adjust-size",
"Specify adjustment to stack size when a split-stack function calls a "		"Specify adjustment to stack size when a split-stack function calls a "
"non-split-stack function">,		"non-split-stack function">,
MetaVarName<"<value>">;		MetaVarName<"<value>">;

defm library_path:		defm library_path:
Eq<"library-path", "Add a directory to the library search path">, MetaVarName<"<dir>">;		Eq<"library-path", "Add a directory to the library search path">, MetaVarName<"<dir>">;
▲ Show 20 Lines • Show All 444 Lines • ▼ Show 20 Lines
def opt_remarks_passes: Separate<["--"], "opt-remarks-passes">,		def opt_remarks_passes: Separate<["--"], "opt-remarks-passes">,
HelpText<"Regex for the passes that need to be serialized to the output file">;		HelpText<"Regex for the passes that need to be serialized to the output file">;
def opt_remarks_with_hotness: Flag<["--"], "opt-remarks-with-hotness">,		def opt_remarks_with_hotness: Flag<["--"], "opt-remarks-with-hotness">,
HelpText<"Include hotness information in the optimization remarks file">;		HelpText<"Include hotness information in the optimization remarks file">;
def opt_remarks_format: Separate<["--"], "opt-remarks-format">,		def opt_remarks_format: Separate<["--"], "opt-remarks-format">,
HelpText<"The format used for serializing remarks (default: YAML)">;		HelpText<"The format used for serializing remarks (default: YAML)">;
defm plugin_opt: Eq<"plugin-opt", "specifies LTO options for compatibility with GNU linkers">;		defm plugin_opt: Eq<"plugin-opt", "specifies LTO options for compatibility with GNU linkers">;
def save_temps: F<"save-temps">;		def save_temps: F<"save-temps">;
		def lto_basicblock_sections: J<"lto-basicblock-sections=">,
		HelpText<"Enable basic block sections for LTO">;
		defm lto_unique_bb_section_names: B<"lto-unique-bb-section-names",
		"Give unique names to every basic block section for LTO",
		"Do not give unique names to every basic block section for LTO (default)">;
		MaskRay (Done): `"Do not give unique names to every basic block section for LTO (default)"`
def shuffle_sections: J<"shuffle-sections=">, MetaVarName<"<seed>">,		def shuffle_sections: J<"shuffle-sections=">, MetaVarName<"<seed>">,
HelpText<"Shuffle input sections using the given seed. If 0, use a random seed">;		HelpText<"Shuffle input sections using the given seed. If 0, use a random seed">;
def thinlto_cache_dir: J<"thinlto-cache-dir=">,		def thinlto_cache_dir: J<"thinlto-cache-dir=">,
HelpText<"Path to ThinLTO cached object file directory">;		HelpText<"Path to ThinLTO cached object file directory">;
defm thinlto_cache_policy: Eq<"thinlto-cache-policy", "Pruning policy for the ThinLTO cache">;		defm thinlto_cache_policy: Eq<"thinlto-cache-policy", "Pruning policy for the ThinLTO cache">;
def thinlto_emit_imports_files: F<"thinlto-emit-imports-files">;		def thinlto_emit_imports_files: F<"thinlto-emit-imports-files">;
def thinlto_index_only: F<"thinlto-index-only">;		def thinlto_index_only: F<"thinlto-index-only">;
def thinlto_index_only_eq: J<"thinlto-index-only=">;		def thinlto_index_only_eq: J<"thinlto-index-only=">;
▲ Show 20 Lines • Show All 85 Lines • Show Last 20 Lines

lld/ELF/OutputSections.cpp

Show First 20 Lines • Show All 236 Lines • ▼ Show 20 Lines

void OutputSection::sort(llvm::function_ref<int(InputSectionBase *s)> order) {		void OutputSection::sort(llvm::function_ref<int(InputSectionBase *s)> order) {
assert(isLive());		assert(isLive());
for (BaseCommand *b : sectionCommands)		for (BaseCommand *b : sectionCommands)
if (auto *isd = dyn_cast<InputSectionDescription>(b))		if (auto *isd = dyn_cast<InputSectionDescription>(b))
sortByOrder(isd->sections, order);		sortByOrder(isd->sections, order);
}		}

		static void nopInstrFill(uint8_t *buf, size_t size) {
		ruiu (Done): Please give this function a new name, as it is not easy to distinguish this function from the other `fill`.
		ruiu (Done): Lowercase
		if (size == 0)
		return;
		ruiu (Done): Please use the actual type instead of `auto`.
		unsigned i = 0;
		ruiu (Done): I think you can directly use `target->sizedNOPInstrs` instead of passing it as an argument.
		echristo (Done): Might want to assert size != 0 or early return.
		if (size == 0)
		ruiu (Done): Lowercase
		return;
		std::vector<std::vector<uint8_t>> nopFiller = *target->nopInstrs;
		unsigned num = size / nopFiller.back().size();
		for (unsigned c = 0; c < num; ++c) {
		memcpy(buf + i, nopFiller.back().data(), nopFiller.back().size());
		i += nopFiller.back().size();
		}
		MaskRay (Done): `at` -> `operator[]`; `at` has a bounds-checking overhead.
		unsigned remaining = size - i;
		MaskRay (Done): `failed ...` and no full stop
		if (!remaining)
		MaskRay (Done): `Sfillter[remaining - 1]`; `vector::at` has an unneeded bounds check.
		return;
		MaskRay (Done): `assert(nopFiller[remaining - 1].size() == remaining)`
		assert(nopFiller[remaining - 1].size() == remaining);
		memcpy(buf + i, nopFiller[remaining - 1].data(), remaining);
		}

// Fill [Buf, Buf + Size) with Filler.		// Fill [Buf, Buf + Size) with Filler.
// This is used for linker script "=fillexp" command.		// This is used for linker script "=fillexp" command.
static void fill(uint8_t *buf, size_t size,		static void fill(uint8_t *buf, size_t size,
const std::array<uint8_t, 4> &filler) {		const std::array<uint8_t, 4> &filler) {
size_t i = 0;		size_t i = 0;
for (; i + 4 < size; i += 4)		for (; i + 4 < size; i += 4)
memcpy(buf + i, filler.data(), 4);		memcpy(buf + i, filler.data(), 4);
memcpy(buf + i, filler.data(), size - i);		memcpy(buf + i, filler.data(), size - i);
▲ Show 20 Lines • Show All 72 Lines • ▼ Show 20 Lines	parallelForEachN(0, sections.size(), [&](size_t i) {
// Fill gaps between sections.		// Fill gaps between sections.
if (nonZeroFiller) {		if (nonZeroFiller) {
uint8_t *start = buf + isec->outSecOff + isec->getSize();		uint8_t *start = buf + isec->outSecOff + isec->getSize();
uint8_t *end;		uint8_t *end;
if (i + 1 == sections.size())		if (i + 1 == sections.size())
end = buf + size;		end = buf + size;
else		else
end = buf + sections[i + 1]->outSecOff;		end = buf + sections[i + 1]->outSecOff;
		if (isec->nopFiller) {
		assert(target->nopInstrs);
		ruiu (Done): This SpecialFiller is always NOP instructions, so how about adding just a boolean flag (e.g. `isec->fallthrough`) to an input section and removing this `SpecialFiller` vector?
		tmsriram (Author, Done): Done.
		MaskRay (Done): Is `target->nopInstrs` redundant?
		tmsriram (Author, Done): I can just assert instead!
		nopInstrFill(start, end - start);
		ruiu (Done): nit: `(target->sizedNOPInstrs)` -> `target->sizedNOPInstrs`
		} else
fill(start, end - start, filler);		fill(start, end - start, filler);
		MaskRay (Done): The code (`if (isec->nopFiller)`) is self-explanatory. No need for a comment.
		tmsriram (Author, Done): I removed it, but maybe I should say that NOPs are needed here as opposed to TRAP because they might be executed.
}		}
});		});

// Linker scripts may have BYTE()-family commands with which you		// Linker scripts may have BYTE()-family commands with which you
// can write arbitrary bytes to the output. Process them if any.		// can write arbitrary bytes to the output. Process them if any.
for (BaseCommand *base : sectionCommands)		for (BaseCommand *base : sectionCommands)
if (auto *data = dyn_cast<ByteCommand>(base))		if (auto *data = dyn_cast<ByteCommand>(base))
writeInt(buf + data->offset, data->expression().getValue(), data->size);		writeInt(buf + data->offset, data->expression().getValue(), data->size);
▲ Show 20 Lines • Show All 178 Lines • Show Last 20 Lines
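
The padding strategy of nopInstrFill() above (repeat the longest NOP, then finish with one NOP of exactly the remaining size) can be tried out standalone. The sketch below is not the patch's code: `kNops` and `fillWithNops` are hypothetical names, and the table is cut down to the 1- to 4-byte x86-64 NOP encodings purely for illustration, whereas the patch reads the table from `target->nopInstrs`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Illustrative table (an assumption for this sketch): entry i holds an
// (i + 1)-byte x86-64 NOP encoding.
static const std::vector<std::vector<uint8_t>> kNops = {
    {0x90},                    // nop
    {0x66, 0x90},              // 2-byte nop
    {0x0f, 0x1f, 0x00},        // 3-byte nop
    {0x0f, 0x1f, 0x40, 0x00},  // 4-byte nop
};

// Fill [buf, buf + size) with NOPs: as many copies of the longest encoding as
// fit, followed by a single NOP covering whatever is left.
void fillWithNops(uint8_t *buf, size_t size) {
  const std::vector<uint8_t> &longest = kNops.back();
  size_t i = 0;
  for (; i + longest.size() <= size; i += longest.size())
    std::memcpy(buf + i, longest.data(), longest.size());
  size_t remaining = size - i;
  if (remaining == 0)
    return;
  // The table must be indexed by size, as the assert in the patch also checks.
  assert(kNops[remaining - 1].size() == remaining);
  std::memcpy(buf + i, kNops[remaining - 1].data(), remaining);
}
```

For a 7-byte gap this writes one 4-byte NOP followed by one 3-byte NOP. As tmsriram points out in the thread, NOPs rather than trap bytes are used for these gaps because the padding might be executed.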

lld/ELF/Relocations.h

	Show All 18 Lines
	class Symbol;			class Symbol;
	class InputSection;			class InputSection;
	class InputSectionBase;			class InputSectionBase;
	class OutputSection;			class OutputSection;
	class SectionBase;			class SectionBase;

	// Represents a relocation type, such as R_X86_64_PC32 or R_ARM_THM_CALL.			// Represents a relocation type, such as R_X86_64_PC32 or R_ARM_THM_CALL.
	using RelType = uint32_t;			using RelType = uint32_t;
				using JumpModType = uint32_t;

	// List of target-independent relocation types. Relocations read			// List of target-independent relocation types. Relocations read
	// from files are converted to these types so that the main code			// from files are converted to these types so that the main code
	// doesn't have to know about architecture-specific details.			// doesn't have to know about architecture-specific details.
	enum RelExpr {			enum RelExpr {
	R_ABS,			R_ABS,
	R_ADDEND,			R_ADDEND,
	R_DTPREL,			R_DTPREL,
	▲ Show 20 Lines • Show All 68 Lines • ▼ Show 20 Lines
	struct Relocation {			struct Relocation {
	RelExpr expr;			RelExpr expr;
	RelType type;			RelType type;
	uint64_t offset;			uint64_t offset;
	int64_t addend;			int64_t addend;
	Symbol *sym;			Symbol *sym;
	};			};

				// Manipulate jump instructions with these modifiers. These are used to relax
				// jump instruction opcodes at basic block boundaries and are particularly
				// useful when basic block sections are enabled.
				struct JumpInstrMod {
				JumpModType original;
				uint64_t offset;
				unsigned size;
				ruiu (Done): `Original` -> `original`, `Offset` -> `offset`, `Size` -> `size`
				};

	// This function writes undefined symbol diagnostics to an internal buffer.			// This function writes undefined symbol diagnostics to an internal buffer.
	// Call reportUndefinedSymbols() after calling scanRelocations() to emit			// Call reportUndefinedSymbols() after calling scanRelocations() to emit
	// the diagnostics.			// the diagnostics.
	template <class ELFT> void scanRelocations(InputSectionBase &);			template <class ELFT> void scanRelocations(InputSectionBase &);

	template <class ELFT> void reportUndefinedSymbols();			template <class ELFT> void reportUndefinedSymbols();

	void hexagonTLSSymbolUpdate(ArrayRef<OutputSection *> outputSections);			void hexagonTLSSymbolUpdate(ArrayRef<OutputSection *> outputSections);
	▲ Show 20 Lines • Show All 72 Lines • Show Last 20 Lines

lld/ELF/Target.h

Show First 20 Lines • Show All 82 Lines • ▼ Show 20 Lines	virtual bool inBranchRange(RelType type, uint64_t src,
uint64_t dst) const;		uint64_t dst) const;

virtual void relocate(uint8_t *loc, const Relocation &rel,		virtual void relocate(uint8_t *loc, const Relocation &rel,
uint64_t val) const = 0;		uint64_t val) const = 0;
void relocateNoSym(uint8_t *loc, RelType type, uint64_t val) const {		void relocateNoSym(uint8_t *loc, RelType type, uint64_t val) const {
relocate(loc, Relocation{R_NONE, type, 0, 0, nullptr}, val);		relocate(loc, Relocation{R_NONE, type, 0, 0, nullptr}, val);
}		}

		virtual void applyJumpInstrMod(uint8_t *loc, JumpModType type,
		JumpModType val) const {}
		ruiu (Done): Lowercase

virtual ~TargetInfo();		virtual ~TargetInfo();

		// This deletes a jump insn at the end of the section if it is a fall thru to
		PkmX (Done): Document the meaning of the `bool` returned.
		MaskRay (Done): Delete excess space after `//`.
		// the next section. Further, if there is a conditional jump and a direct
		// jump consecutively, it tries to flip the conditional jump to convert the
		// direct jump into a fall thru and delete it. Returns true if a jump
		// instruction can be deleted.
		virtual bool deleteFallThruJmpInsn(InputSection &is, InputFile *file,
		InputSection *nextIS) const {
		ruiu (Done): Lowercase
		return false;
		}

		MaskRay (Done): If growJmpInsn is X86-specific, can it be moved to Arch/X86_64.cpp?
		tmsriram (Author, Done): growJmpInsn has been removed from this patch and will be presented as a separate patch.
		tmsriram (Author, Done): The grow part has been split, not applicable.
unsigned defaultCommonPageSize = 4096;		unsigned defaultCommonPageSize = 4096;
unsigned defaultMaxPageSize = 4096;		unsigned defaultMaxPageSize = 4096;

uint64_t getImageBase() const;		uint64_t getImageBase() const;

// True if _GLOBAL_OFFSET_TABLE_ is relative to .got.plt, false if .got.		// True if _GLOBAL_OFFSET_TABLE_ is relative to .got.plt, false if .got.
bool gotBaseSymInGotPlt = true;		bool gotBaseSymInGotPlt = true;

Show All 20 Lines	public:
unsigned gotHeaderEntriesNum = 0;		unsigned gotHeaderEntriesNum = 0;

bool needsThunks = false;		bool needsThunks = false;

// A 4-byte field corresponding to one or more trap instructions, used to pad		// A 4-byte field corresponding to one or more trap instructions, used to pad
// executable OutputSections.		// executable OutputSections.
std::array<uint8_t, 4> trapInstr;		std::array<uint8_t, 4> trapInstr;

		// Stores the NOP instructions of different sizes for the target and is used
		// to pad sections that are relaxed.
		llvm::Optional<std::vector<std::vector<uint8_t>>> nopInstrs;

// If a target needs to rewrite calls to __morestack to instead call		// If a target needs to rewrite calls to __morestack to instead call
// __morestack_non_split when a split-stack enabled caller calls a		// __morestack_non_split when a split-stack enabled caller calls a
// non-split-stack callee this will return true. Otherwise returns false.		// non-split-stack callee this will return true. Otherwise returns false.
bool needsMoreStackNonSplit = true;		bool needsMoreStackNonSplit = true;

virtual RelExpr adjustRelaxExpr(RelType type, const uint8_t *data,		virtual RelExpr adjustRelaxExpr(RelType type, const uint8_t *data,
RelExpr expr) const;		RelExpr expr) const;
virtual void relaxGot(uint8_t *loc, const Relocation &rel,		virtual void relaxGot(uint8_t *loc, const Relocation &rel,
▲ Show 20 Lines • Show All 131 Lines • Show Last 20 Lines
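
The comment on deleteFallThruJmpInsn() above mentions flipping a conditional jump so that the following direct jump becomes a fall-through. On x86, the short-form conditional jumps occupy opcodes 0x70 to 0x7F and each condition sits next to its inverse, differing only in the lowest bit (for example `je` is 0x74 and `jne` is 0x75). The sketch below shows just that opcode flip. `isShortJcc` and `invertShortJcc` are hypothetical helpers, not the implementation in Arch/X86_64.cpp, which is not shown in this part of the diff.

```cpp
#include <cassert>
#include <cstdint>

// True if `op` is a short-form conditional jump opcode (Jcc rel8, 0x70..0x7F).
bool isShortJcc(uint8_t op) { return (op & 0xF0) == 0x70; }

// Return the opcode that tests the inverse condition. Inverse conditions are
// encoded as adjacent opcodes, so toggling the lowest bit is enough.
uint8_t invertShortJcc(uint8_t op) {
  assert(isShortJcc(op) && "expected a Jcc rel8 opcode");
  return op ^ 1;
}
```

After such a flip, the conditional jump would have to be retargeted at the old direct jump's destination before the direct jump can be dropped. That bookkeeping is outside the scope of this sketch.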

lld/ELF/Writer.cpp

Show All 25 Lines
#include "llvm/ADT/StringMap.h"		#include "llvm/ADT/StringMap.h"
#include "llvm/ADT/StringSwitch.h"		#include "llvm/ADT/StringSwitch.h"
#include "llvm/Support/RandomNumberGenerator.h"		#include "llvm/Support/RandomNumberGenerator.h"
#include "llvm/Support/SHA1.h"		#include "llvm/Support/SHA1.h"
#include "llvm/Support/TimeProfiler.h"		#include "llvm/Support/TimeProfiler.h"
#include "llvm/Support/xxhash.h"		#include "llvm/Support/xxhash.h"
#include <climits>		#include <climits>

		#define DEBUG_TYPE "lld"

using namespace llvm;		using namespace llvm;
using namespace llvm::ELF;		using namespace llvm::ELF;
using namespace llvm::object;		using namespace llvm::object;
using namespace llvm::support;		using namespace llvm::support;
using namespace llvm::support::endian;		using namespace llvm::support::endian;

namespace lld {		namespace lld {
namespace elf {		namespace elf {
		MaskRay (Done): Delete these.
		tmsriram (Author, Done): Not applicable.
namespace {		namespace {
// The writer writes a SymbolTable result to a file.		// The writer writes a SymbolTable result to a file.
template <class ELFT> class Writer {		template <class ELFT> class Writer {
public:		public:
Writer() : buffer(errorHandler().outputBuffer) {}		Writer() : buffer(errorHandler().outputBuffer) {}
using Elf_Shdr = typename ELFT::Shdr;		using Elf_Shdr = typename ELFT::Shdr;
using Elf_Ehdr = typename ELFT::Ehdr;		using Elf_Ehdr = typename ELFT::Ehdr;
using Elf_Phdr = typename ELFT::Phdr;		using Elf_Phdr = typename ELFT::Phdr;

void run();		void run();

private:		private:
void copyLocalSymbols();		void copyLocalSymbols();
void addSectionSymbols();		void addSectionSymbols();
void forEachRelSec(llvm::function_ref<void(InputSectionBase &)> fn);		void forEachRelSec(llvm::function_ref<void(InputSectionBase &)> fn);
void sortSections();		void sortSections();
void resolveShfLinkOrder();		void resolveShfLinkOrder();
void finalizeAddressDependentContent();		void finalizeAddressDependentContent();
		void optimizeBasicBlockJumps();
void sortInputSections();		void sortInputSections();
void finalizeSections();		void finalizeSections();
void checkExecuteOnly();		void checkExecuteOnly();
void setReservedSymbolSections();		void setReservedSymbolSections();

std::vector<PhdrEntry *> createPhdrs(Partition &part);		std::vector<PhdrEntry *> createPhdrs(Partition &part);
void addPhdrForSection(Partition &part, unsigned shType, unsigned pType,		void addPhdrForSection(Partition &part, unsigned shType, unsigned pType,
unsigned pFlags);		unsigned pFlags);
▲ Show 20 Lines • Show All 494 Lines • ▼ Show 20 Lines	template <class ELFT> void Writer<ELFT>::run() {

if (config->copyRelocs)		if (config->copyRelocs)
addSectionSymbols();		addSectionSymbols();

// Now that we have a complete set of output sections. This function		// Now that we have a complete set of output sections. This function
// completes section contents. For example, we need to add strings		// completes section contents. For example, we need to add strings
// to the string table, and add entries to .got and .plt.		// to the string table, and add entries to .got and .plt.
// finalizeSections does that.		// finalizeSections does that.
finalizeSections();		finalizeSections();
		MaskRay (Done): Delete unneeded comments.
		tmsriram (Author, Done): Not applicable.
checkExecuteOnly();		checkExecuteOnly();
if (errorCount())		if (errorCount())
return;		return;

// If -compressed-debug-sections is specified, we need to compress		// If -compressed-debug-sections is specified, we need to compress
// .debug_* sections. Do it right now because it changes the size of		// .debug_* sections. Do it right now because it changes the size of
// output sections.		// output sections.
for (OutputSection *sec : outputSections)		for (OutputSection *sec : outputSections)
▲ Show 20 Lines • Show All 64 Lines • ▼ Show 20 Lines	if (config->discard == DiscardPolicy::None)
return true;		return true;

// If -emit-reloc is given, all symbols including local ones need to be		// If -emit-reloc is given, all symbols including local ones need to be
// copied because they may be referenced by relocations.		// copied because they may be referenced by relocations.
if (config->emitRelocs)		if (config->emitRelocs)
return true;		return true;

// In ELF assembly .L symbols are normally discarded by the assembler.		// In ELF assembly .L symbols are normally discarded by the assembler.
// If the assembler fails to do so, the linker discards them if		// If the assembler fails to do so, the linker discards them if
		MaskRay (Done): Delete the blank line.
		tmsriram (Author, Done): Not applicable.
// * --discard-locals is used.		// * --discard-locals is used.
// * The symbol is in a SHF_MERGE section, which is normally the reason for		// * The symbol is in a SHF_MERGE section, which is normally the reason for
		MaskRay (Done): Delete surrounding `{}`
		tmsriram (Author, Done): Not applicable.
// the assembler keeping the .L symbol.		// the assembler keeping the .L symbol.
StringRef name = sym.getName();		StringRef name = sym.getName();
bool isLocal = name.startswith(".L") || name.empty();		bool isLocal = name.startswith(".L") || name.empty();
if (!isLocal)		if (!isLocal)
return true;		return true;

		MaskRay (Done): Return the condition directly rather than `if (condition) return true; else return false;`.
		tmsriram (Author, Done): Not applicable.
if (config->discard == DiscardPolicy::Locals)		if (config->discard == DiscardPolicy::Locals)
return false;		return false;

SectionBase *sec = sym.section;		SectionBase *sec = sym.section;
return !sec || !(sec->flags & SHF_MERGE);		return !sec || !(sec->flags & SHF_MERGE);
}		}

static bool includeInSymtab(const Symbol &b) {		static bool includeInSymtab(const Symbol &b) {
Show All 21 Lines

// Local symbols are not in the linker's symbol table. This function scans		// Local symbols are not in the linker's symbol table. This function scans
// each object file's symbol table to copy local symbols to the output.		// each object file's symbol table to copy local symbols to the output.
template <class ELFT> void Writer<ELFT>::copyLocalSymbols() {		template <class ELFT> void Writer<ELFT>::copyLocalSymbols() {
if (!in.symTab)		if (!in.symTab)
return;		return;
for (InputFile *file : objectFiles) {		for (InputFile *file : objectFiles) {
ObjFile<ELFT> *f = cast<ObjFile<ELFT>>(file);		ObjFile<ELFT> *f = cast<ObjFile<ELFT>>(file);
for (Symbol *b : f->getLocalSymbols()) {		for (Symbol *b : f->getLocalSymbols()) {
		MaskRay (Done): Use vectors and move them outside the loop.
		tmsriram (Author, Done): Not applicable.
if (!b->isLocal())		if (!b->isLocal())
fatal(toString(f) +		fatal(toString(f) +
": broken object: getLocalSymbols returns a non-local symbol");		": broken object: getLocalSymbols returns a non-local symbol");
auto *dr = dyn_cast<Defined>(b);		auto *dr = dyn_cast<Defined>(b);

// No reason to keep local undefined symbol in symtab.		// No reason to keep local undefined symbol in symtab.
if (!dr)		if (!dr)
continue;		continue;
if (!includeInSymtab(*b))		if (!includeInSymtab(*b))
continue;		continue;
if (!shouldKeepInSymtab(*dr))		if (!shouldKeepInSymtab(*dr))
continue;		continue;
in.symTab->addSymbol(b);		in.symTab->addSymbol(b);
}		}
}		}
}		}

// Create a section symbol for each output section so that we can represent		// Create a section symbol for each output section so that we can represent
// relocations that point to the section. If we know that no relocation is		// relocations that point to the section. If we know that no relocation is
// referring to a section (that happens if the section is a synthetic one), we		// referring to a section (that happens if the section is a synthetic one), we
// don't create a section symbol for that section.		// don't create a section symbol for that section.
template <class ELFT> void Writer<ELFT>::addSectionSymbols() {		template <class ELFT> void Writer<ELFT>::addSectionSymbols() {
for (BaseCommand *base : script->sectionCommands) {		for (BaseCommand *base : script->sectionCommands) {
auto *sec = dyn_cast<OutputSection>(base);		auto *sec = dyn_cast<OutputSection>(base);
if (!sec)		if (!sec)
		MaskRay (Done): Delete `{}`; `auto` -> `Symbol`
		tmsriram (Author, Done): Not applicable.
continue;		continue;
auto i = llvm::find_if(sec->sectionCommands, [](BaseCommand *base) {		auto i = llvm::find_if(sec->sectionCommands, [](BaseCommand *base) {
if (auto *isd = dyn_cast<InputSectionDescription>(base))		if (auto *isd = dyn_cast<InputSectionDescription>(base))
return !isd->sections.empty();		return !isd->sections.empty();
return false;		return false;
});		});
if (i == sec->sectionCommands.end())		if (i == sec->sectionCommands.end())
continue;		continue;
[934 lines not shown; context: template <class ELFT> void Writer<ELFT>::finalizeAddressDependentContent()]
for (BaseCommand *cmd : script->sectionCommands)		for (BaseCommand *cmd : script->sectionCommands)
if (auto *os = dyn_cast<OutputSection>(cmd))		if (auto *os = dyn_cast<OutputSection>(cmd))
if (os->addr % os->alignment != 0)		if (os->addr % os->alignment != 0)
warn("address (0x" + Twine::utohexstr(os->addr) + ") of section " +		warn("address (0x" + Twine::utohexstr(os->addr) + ") of section " +
os->name + " is not a multiple of alignment (" +		os->name + " is not a multiple of alignment (" +
Twine(os->alignment) + ")");		Twine(os->alignment) + ")");
}		}

		// If input sections have been shrunk (basic block sections) then
		// update symbol values and sizes associated with these sections. With basic
		// block sections, input sections can shrink when the jump instructions at
		// the end of the section are relaxed.
		PkmX (Done): I don't think you need to loop over all input files, just files with sections that have been shrunk.
		tmsriram (Author, Done): How would we determine that? We need to keep track of this somewhere.
		static void fixSymbolsAfterShrinking() {
		for (InputFile *File : objectFiles) {
		parallelForEach(File->getSymbols(), [&](Symbol *Sym) {
		auto *def = dyn_cast<Defined>(Sym);
		if (!def)
		return;
		MaskRay (Done): `auto` -> `SectionBase`

		const SectionBase *sec = def->section;
		if (!sec)
		return;
		MaskRay (Done): If you use `sec->repl`, there should be tests checking `--icf={safe,all}`
		tmsriram (Author, Done): I added an icf all test for this.

		const InputSectionBase *inputSec = dyn_cast<InputSectionBase>(sec->repl);
		if (!inputSec || !inputSec->bytesDropped)
		return;
		MaskRay (Done): size_t

		const size_t OldSize = inputSec->data().size();
		MaskRay (Not Done): This is a dangerous operation. Are you arbitrarily changing st_value/st_size? In what conditions can this be triggered?
		PkmX (Done): I think having symbols in the part of section that is shrunk should just be an error, or at least it should clamp them to the end of a section.
		tmsriram (Author, Done): This is called from optimizeBBJumps() and can only be triggered when the size of the section is shrunk from deleting jump instructions.
		tmsriram (Author, Done): I will address this in your most recent comment.
		MaskRay (Not Done): Better to check if there are cases where st_value > original_input_section_size. I asked because in binutils, bfd/elfnn-riscv.c has a similar check: `if (sym->st_value > addr && sym->st_value <= toaddr) sym->st_value -= count;`
		MaskRay (Not Done): I requested a research for st_value but I think it is not needed to get it accurate. However, I don't think we should just copy the elfnn-riscv.c behavior. `if (def->value + def->size > NewSize && def->value <= OldSize && def->value + def->size <= OldSize) {` should be simplified to `if (def->value + def->size > NewSize && def->value + def->size <= OldSize) {`
		tmsriram (Author, Done): @amharc We are going back and forth here. We didn't think it was necessary to copy the behavior either, but since you suggested that we copy elfnn-riscv.c behavior, we went ahead and did it. We don't see a problem with either, so should we just leave it as is? Is there a particular concern here that is not clear to me?
		const size_t NewSize = OldSize - inputSec->bytesDropped;

		if (def->value > NewSize && def->value <= OldSize) {
		LLVM_DEBUG(llvm::dbgs()
		<< "Moving symbol " << Sym->getName() << " from "
		MaskRay (Done): I am still skeptical about this st_value adjustment. Maybe you could find some introspection programs and check they don't break. This may sometimes confuse symbolizers.
		MaskRay (Done): Nit: Moving -> move (make it a bit simpler)
		tmsriram (Author, Done): I am not a native speaker but "Moving" seems more appropriate here unless you meant that 'M' should be lower case. My take, "move" seems to imply the user must do it.
		<< def->value << " to "
		<< def->value - inputSec->bytesDropped << " bytes\n");
		def->value -= inputSec->bytesDropped;
		return;
		MaskRay (Done): Similarly, in BFD, this is: `else if (sym->st_value <= addr && sym->st_value + sym->st_size > addr && sym->st_value + sym->st_size <= toaddr) sym->st_size -= count;`
		tmsriram (Author, Done): Working on this one, will fix this in the next iteration. Keeping this open until then.
		}

		if (def->value + def->size > NewSize && def->value <= OldSize &&
		def->value + def->size <= OldSize) {
		LLVM_DEBUG(llvm::dbgs()
		<< "Shrinking symbol " << Sym->getName() << " from "
		<< def->size << " to " << def->size - inputSec->bytesDropped
		<< " bytes\n");
		def->size -= inputSec->bytesDropped;
		}
		});
		}
		}
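The two adjustments made by fixSymbolsAfterShrinking() above can be tried out in isolation with the sketch below. It is a simplified model, not lld code: `SymbolRange` and `fixSymbolAfterShrinking` are hypothetical names standing in for `Defined` and the lambda body, and the section is reduced to an old size plus a count of bytes dropped from its tail. The conditions are copied from the patch as shown, including the clause MaskRay suggested dropping.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for lld's Defined symbol: only the two fields the
// adjustment touches.
struct SymbolRange {
  uint64_t value; // offset of the symbol from the start of its section
  uint64_t size;  // number of bytes the symbol covers
};

// Adjust `sym` after `bytesDropped` bytes were removed from the tail of a
// section that used to be `oldSize` bytes long.
inline void fixSymbolAfterShrinking(SymbolRange &sym, uint64_t oldSize,
                                    uint64_t bytesDropped) {
  assert(bytesDropped <= oldSize && "cannot drop more bytes than exist");
  const uint64_t newSize = oldSize - bytesDropped;

  // Case 1: the symbol starts inside the dropped tail (for example an
  // end-of-section label). Move it back by the number of bytes dropped.
  if (sym.value > newSize && sym.value <= oldSize) {
    sym.value -= bytesDropped;
    return;
  }

  // Case 2: the symbol starts in the kept part but extends into the dropped
  // tail. Keep its start and shrink its size.
  if (sym.value + sym.size > newSize && sym.value <= oldSize &&
      sym.value + sym.size <= oldSize)
    sym.size -= bytesDropped;
}
```

For a 16-byte section that loses a 5-byte trailing jmp, a symbol covering the whole section shrinks from 16 to 11 bytes, a zero-sized label at offset 16 moves to offset 11, and a symbol covering only the first 8 bytes is left alone.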
		MaskRay (Not Done): optimizeBasicBlockJumps should be placed in finalizeAddressDependentContent to accommodate thunks and linker script symbol changes.
		tmsriram (Author, Done): Accommodating thunks and creating them optimally needs a little more thought. We haven't looked at this and we haven't tested it. Could we do this in a later patch?

		MaskRay (Done): Why `!ELFT::Is64Bits`?
		// If basic block sections exist, there are opportunities to delete fall thru
		ruiu (Done): What is this `!ELFT::Is64Bits` condition for? It looks like you can just remove it.
		// jumps and shrink jump instructions after basic block reordering. This
		// relaxation pass does that. It is only enabled when --optimize-bb-jumps
		// option is used.
		template <class ELFT> void Writer<ELFT>::optimizeBasicBlockJumps() {
		assert(config->optimizeBBJumps);
		MaskRay (Done): Nit: move the condition to the call site.
		tmsriram (Author, Done): Moved and asserted at the beginning.

		MaskRay (Done): vector<bool> is thread hostile because it is represented as a bit vector. See SyntheticSections.cpp:createSymbols for an example of how to parallelize correctly.
		script->assignAddresses();
		// For every output section that has executable input sections, this
		// does the following:
		// 1. Deletes all direct jump instructions in input sections that
		MaskRay (Done): There is only one list item. Why use `1.`?
		tmsriram (Author, Done): Rephrased, shrinking the jump instrs was the 2. but I split that patch out and this remained unnoticed.
		// jump to the following section as it is not required.
		// 2. If there are two consecutive jump instructions, it checks
		// if they can be flipped and one can be deleted.
		for (OutputSection *os : outputSections) {
		if (!(os->flags & SHF_EXECINSTR))
		continue;
		std::vector<InputSection *> sections = getInputSections(os);
		std::vector<unsigned> result(sections.size());
		// Delete all fall through jump instructions. Also, check if two
		// consecutive jump instructions can be flipped so that a fall
		MaskRay (Done): excess parentheses
		MaskRay (Done): Where is `Step 2`?
		tmsriram (Author, Done): Same deal.
		// through jmp instruction can be deleted.
		MaskRay (Done): MaxIt is not needed. `MaxAlign = 0; if !config->shrinkJumpsAggressively && !sections.empty() MaxAlign = max_element(...)->alignment;`
		tmsriram (Author, Done): Not applicable.
		parallelForEachN(0, sections.size(), [&](size_t i) {
		InputSection *next = i + 1 < sections.size() ? sections[i + 1] : nullptr;
		InputSection &is = *sections[i];
		MaskRay (Done): delete excess parens
		result[i] =
		MaskRay (Done): Nit: delete parens
		tmsriram (Author, Done): With parens reads better to me personally. If this is not acceptable w.r.t the coding style, lmk and I will delete.
		target->deleteFallThruJmpInsn(is, is.getFile<ELFT>(), next) ? 1 : 0;
		});
		size_t numDeleted = std::count(result.begin(), result.end(), 1);
		MaskRay (Done): More comments on how sections are shrunk.
		tmsriram (Author, Done): Not applicable.
		MaskRay (Not Done): Nit: static_cast<unsigned>
		if (numDeleted > 0) {
		script->assignAddresses();
		MaskRay (Done): vector<bool> is thread hostile.
		LLVM_DEBUG(llvm::dbgs()
		<< "Removing " << numDeleted << " fall through jumps\n");
		}
		}
		MaskRay (Done): Removing -> removed
		tmsriram (Author, Done): Past tense is preferred? I see mixed use here in many places in LLD.

		fixSymbolsAfterShrinking();

		MaskRay (Done): Superfluous `()`
		for (OutputSection *os : outputSections) {
		std::vector<InputSection *> sections = getInputSections(os);
		for (InputSection *is : sections)
		is->trim();
		}
		}
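The shape of the pass above (one result slot per section, then a count) can be illustrated with a small standalone model. This is a sketch under stated assumptions, not the lld implementation: `Block`, `kJmpRel32Size` and `deleteFallThroughJumps` are hypothetical names, the loop is serial where the patch uses parallelForEachN, and the target-specific deleteFallThruJmpInsn() is reduced to "the trailing jmp targets the next block in layout order". The per-index `std::vector<unsigned>` follows the review remark that `vector<bool>` is thread hostile.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of an input section that may end in a direct jmp.
struct Block {
  int jumpTarget;        // index of the block the trailing jmp targets, or -1
  unsigned bytesDropped; // bytes removed from the tail of the block so far
};

// Size of a long-form x86-64 jmp: opcode 0xE9 plus a 4-byte displacement.
constexpr unsigned kJmpRel32Size = 5;

// Delete every trailing jmp whose target is the block laid out immediately
// after it, and return how many jumps were deleted.
inline std::size_t deleteFallThroughJumps(std::vector<Block> &blocks) {
  // One full element per block: each iteration writes only its own slot, so
  // the loop could be run in parallel without sharing bits of a word.
  std::vector<unsigned> result(blocks.size());
  for (std::size_t i = 0; i < blocks.size(); ++i) {
    assert(blocks[i].jumpTarget < static_cast<int>(blocks.size()));
    const bool hasNext = i + 1 < blocks.size();
    if (hasNext && blocks[i].jumpTarget == static_cast<int>(i + 1)) {
      blocks[i].jumpTarget = -1;               // the jmp becomes a fall through
      blocks[i].bytesDropped += kJmpRel32Size; // bytes to trim from the tail
      result[i] = 1;
    }
  }
  return static_cast<std::size_t>(std::count(result.begin(), result.end(), 1u));
}
```

In the patch, a nonzero count triggers another script->assignAddresses() call, since every later section moves when an earlier one shrinks.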

static void finalizeSynthetic(SyntheticSection *sec) {		static void finalizeSynthetic(SyntheticSection *sec) {
		MaskRay (Done): Magic 4 here seems very x86 specific.
		tmsriram (Author, Done): Not applicable.
if (sec && sec->isNeeded() && sec->getParent())		if (sec && sec->isNeeded() && sec->getParent())
sec->finalizeContents();		sec->finalizeContents();
}		}

// In order to allow users to manipulate linker-synthesized sections,		// In order to allow users to manipulate linker-synthesized sections,
// we had to add synthetic sections to the input section list early,		// we had to add synthetic sections to the input section list early,
		PkmX (Done): Let the user specify how many rounds to run? I suppose the last few rounds may have marginal benefits.
		tmsriram (Author, Done): Not applicable.
// even before we make decisions whether they are needed. This allows		// even before we make decisions whether they are needed. This allows
// users to write scripts like this: ".mygot : { .got }".		// users to write scripts like this: ".mygot : { .got }".
		ruiu (Done): Please add a comment as to in what condition this shrinks sections too much. If it is monotonically decreasing and alignments are all the same, I think this condition should never occur.
		tmsriram (Author, Done): Not applicable.
//		//
// Doing it has an unintended side effect. If it turns out that we		// Doing it has an unintended side effect. If it turns out that we
// don't need a .got (for example) at all because there's no		// don't need a .got (for example) at all because there's no
// relocation that needs a .got, we don't want to emit .got.		// relocation that needs a .got, we don't want to emit .got.
//		//
// To deal with the above problem, this function is called after		// To deal with the above problem, this function is called after
// scanRelocations is called to remove synthetic sections that turn		// scanRelocations is called to remove synthetic sections that turn
		ruiu (Done): Is growing different from undoing? I'm not quite sure why you can't simply revert all changes we made to a section if we have to grow it.
		MaskRay (Done): Another related question: why does shrinkJumpsAggressively call growJmpInsn?
		tmsriram (Author, Done): Not applicable.
		tmsriram (Author, Done): Not applicable.
// out to be empty.		// out to be empty.
static void removeUnusedSyntheticSections() {		static void removeUnusedSyntheticSections() {
// All input synthetic sections that can be empty are placed after		// All input synthetic sections that can be empty are placed after
// all regular ones. We iterate over them all and exit at first		// all regular ones. We iterate over them all and exit at first
// non-synthetic.		// non-synthetic.
for (InputSectionBase *s : llvm::reverse(inputSections)) {		for (InputSectionBase *s : llvm::reverse(inputSections)) {
SyntheticSection *ss = dyn_cast<SyntheticSection>(s);		SyntheticSection *ss = dyn_cast<SyntheticSection>(s);
if (!ss)		if (!ss)
return;		return;
OutputSection *os = ss->getParent();		OutputSection *os = ss->getParent();
if (!os || ss->isNeeded())		if (!os || ss->isNeeded())
continue;		continue;

// If we reach here, then ss is an unused synthetic section and we want to		// If we reach here, then ss is an unused synthetic section and we want to
// remove it from the corresponding input section description, and		// remove it from the corresponding input section description, and
// orphanSections.		// orphanSections.
for (BaseCommand *b : os->sectionCommands)		for (BaseCommand *b : os->sectionCommands)
if (auto *isd = dyn_cast<InputSectionDescription>(b))		if (auto *isd = dyn_cast<InputSectionDescription>(b))
llvm::erase_if(isd->sections,		llvm::erase_if(isd->sections,
[=](InputSection *isec) { return isec == ss; });		[=](InputSection *isec) { return isec == ss; });
llvm::erase_if(script->orphanSections,		llvm::erase_if(script->orphanSections,
[=](const InputSectionBase *isec) { return isec == ss; });		[=](const InputSectionBase *isec) { return isec == ss; });
		MaskRay (Done): Try not adding another field InputSectionBase::Trimmed. Instead, moving it here.
}		}
}		}

// Create output section objects and add them to OutputSections.		// Create output section objects and add them to OutputSections.
template <class ELFT> void Writer<ELFT>::finalizeSections() {		template <class ELFT> void Writer<ELFT>::finalizeSections() {
Out::preinitArray = findSection(".preinit_array");		Out::preinitArray = findSection(".preinit_array");
Out::initArray = findSection(".init_array");		Out::initArray = findSection(".init_array");
Out::finiArray = findSection(".fini_array");		Out::finiArray = findSection(".fini_array");
[268 lines not shown; context: template <class ELFT> void Writer<ELFT>::finalizeSections()]
// sometimes using forward symbol declarations. We want to set the correct		// sometimes using forward symbol declarations. We want to set the correct
// values. They also might change after adding the thunks.		// values. They also might change after adding the thunks.
finalizeAddressDependentContent();		finalizeAddressDependentContent();

// finalizeAddressDependentContent may have added local symbols to the static symbol table.		// finalizeAddressDependentContent may have added local symbols to the static symbol table.
finalizeSynthetic(in.symTab);		finalizeSynthetic(in.symTab);
finalizeSynthetic(in.ppc64LongBranchTarget);		finalizeSynthetic(in.ppc64LongBranchTarget);

		// Relaxation to delete inter-basic block jumps created by basic block
		// sections. Run after in.symTab is finalized as optimizeBasicBlockJumps
		// can relax jump instructions based on symbol offset.
		MaskRay (Done): Delete unrelated comments.
		MaskRay (Done): `optimizeBasicBlockJumps` calls assignAddresses, which was only called in finalizeAddressDependentContent. We hope assignAddresses callers are grouped together (if in.symTab needs to be finalized first, please add a comment). Can you move this pass immediately before (or after) finalizeAddressDependentContent?
		tmsriram (Author, Done): Correct me if I am wrong but finalizing in.symtab before we optimize seems important, I added a comment here. If we relax jumps going forward, we would definitely need to know how far they are.
		MaskRay (Done): "Run after in.symTab is finalized." Why is this important?
		tmsriram (Author, Done): Modified comment.
		MaskRay (Not Done): I am still not sure this is correct. .symtab and .strtab and potential .shstrtab are placed after everything else. The internal representation does not use the contents of `.symtab` at all. Though, you already committed this patch. We probably don't want to change here more to cause churn.
		if (config->optimizeBBJumps)
		optimizeBasicBlockJumps();

// Fill other section headers. The dynamic table is finalized		// Fill other section headers. The dynamic table is finalized
// at the end because some tags like RELSZ depend on result		// at the end because some tags like RELSZ depend on result
// of finalizing other sections.		// of finalizing other sections.
for (OutputSection *sec : outputSections)		for (OutputSection *sec : outputSections)
sec->finalize();		sec->finalize();
}		}

// Ensure data sections are not mixed with executable sections when		// Ensure data sections are not mixed with executable sections when
[809 lines not shown]

lld/test/ELF/bb-sections-and-icf.s

This file was added.

				# REQUIRES: x86
				## basicblock-sections tests.
				## This simple test checks foo is folded into bar with bb sections
				## and the jumps are deleted.

				MaskRay (Done): deleted
				# RUN: llvm-mc -filetype=obj -triple=x86_64 %s -o %t.o
				# RUN: ld.lld --optimize-bb-jumps --icf=all %t.o -o %t.out
				MaskRay (Done): `x86_64-pc-linux` -> `x86_64`
				# RUN: llvm-objdump -d %t.out | FileCheck %s
				MaskRay (Done): delete excess space. Prefer `--optimize-bb-jumps` over `-optimize-bb-jumps`
				MaskRay (Done): If the `--icf=all` result is different from `--icf=none`, add a comment.
				tmsriram (Author, Done): Didn't follow. This test explicitly checks for the folding. You want to test icf=none too? Why?

				# CHECK: <foo>:
				# CHECK-NEXT: nopl (%rax)
				# CHECK-NEXT: je 0x{{[[:xdigit:]]+}} <aa.BB.foo>
				# CHECK-NOT: jmp

				# CHECK: <a.BB.foo>:
				## Explicitly check that bar is folded and not emitted.
				# CHECK-NOT: <bar>:
				MaskRay (Done): Use `## ` for test comments.
				# CHECK-NOT: <a.BB.bar>:
				MaskRay (Done): Don't add excess space.
				# CHECK-NOT: <aa.BB.bar>:

				.section .text.bar,"ax",@progbits
				.type bar,@function
				bar:
				MaskRay (Done): This comment is redundant.
				nopl (%rax)
				jne a.BB.bar
				jmp aa.BB.bar

				.section .text.a.BB.bar,"ax",@progbits,unique,3
				a.BB.bar:
				nopl (%rax)

				aa.BB.bar:
				ret

				.section .text.foo,"ax",@progbits
				.type foo,@function
				foo:
				nopl (%rax)
				MaskRay (Done): ditto
				jne a.BB.foo
				jmp aa.BB.foo

				.section .text.a.BB.foo,"ax",@progbits,unique,2
				a.BB.foo:
				nopl (%rax)

				aa.BB.foo:
				ret

lld/test/ELF/bb-sections-delete-fallthru.s

This file was added.

				# REQUIRES: x86
				## basicblock-sections tests.
				## This simple test checks if redundant direct jumps are converted to
				## implicit fallthrus. The jcc's must be converted to their inverted
				## opcode, for instance jne to je and jmp must be deleted.

				# RUN: llvm-mc -filetype=obj -triple=x86_64 %s -o %t.o
				# RUN: ld.lld --optimize-bb-jumps %t.o -o %t.out
				MaskRay (Done): Drop `-pc-linux`. I mentioned this in a previous comment.
				# RUN: llvm-objdump -d %t.out | FileCheck %s
				MaskRay (Done): delete excess space and use `--optimize-bb-jumps`. I mentioned this in a previous comment.

				MaskRay (Done): Drop `--check-prefix=CHECK`. It is the default.
				# CHECK: <foo>:
				# CHECK-NEXT: nopl (%rax)
				# CHECK-NEXT: jne 0x{{[[:xdigit:]]+}} <r.BB.foo>
				# CHECK-NOT: jmp
				MaskRay (Done): Just delete `{{[0-9|a-f| ]*}}` if the address is not significant. Please also apply to other test files.


				.section .text,"ax",@progbits
				.type foo,@function
				foo:
				MaskRay (Done): Delete `Begin function foo`. Scrub clang output to the minimum.
				nopl (%rax)
				je a.BB.foo
				jmp r.BB.foo

				# CHECK: <a.BB.foo>:
				# CHECK-NEXT: nopl (%rax)
				MaskRay (Done): Add some spaces after `CHECK: ` to make the label aligned. For subsequent `-NEXT:` lines, you can increase the indentation (say, by 1) to make it clear the instructions follow the label: `# CHECK: <a.BB.foo>` then `# CHECK-NEXT: nopl (%rax)`. Please fix other tests as well.
				MaskRay (Done): `# CHECK: <a.BB.foo>:` then `# CHECK-NEXT: nopl (%rax)`. Add a colon to make it clear that `a.BB.foo` is a label.
				# CHECK-NEXT: je 0x{{[[:xdigit:]]+}} <r.BB.foo>
				# CHECK-NOT: jmp
				MaskRay (Done): `je {{.*}} <r.BB.foo>` I have recently updated llvm-objdump -d to print the target address (to be consistent with GNU objdump; this is what most users desire) instead of a decimal PC relative immediate. No need to align the first operand.

				.section .text,"ax",@progbits,unique,3
				a.BB.foo:
				nopl (%rax)
				jne aa.BB.foo
				jmp r.BB.foo

				# CHECK: <aa.BB.foo>:
				# CHECK-NEXT: nopl (%rax)
				# CHECK-NEXT: jle 0x{{[[:xdigit:]]+}} <r.BB.foo>
				# CHECK-NOT: jmp
				#
				.section .text,"ax",@progbits,unique,4
				aa.BB.foo:
				nopl (%rax)
				jg aaa.BB.foo
				jmp r.BB.foo

				# CHECK: <aaa.BB.foo>:
				# CHECK-NEXT: nopl (%rax)
				# CHECK-NEXT: jl 0x{{[[:xdigit:]]+}} <r.BB.foo>
				# CHECK-NOT: jmp
				#
				.section .text,"ax",@progbits,unique,5
				aaa.BB.foo:
				nopl (%rax)
				jge aaaa.BB.foo
				jmp r.BB.foo

				# CHECK: <aaaa.BB.foo>:
				# CHECK-NEXT: nopl (%rax)
				# CHECK-NEXT: jae 0x{{[[:xdigit:]]+}} <r.BB.foo>
				# CHECK-NOT: jmp
				#
				.section .text,"ax",@progbits,unique,6
				aaaa.BB.foo:
				nopl (%rax)
				jb aaaaa.BB.foo
				jmp r.BB.foo

				# CHECK: <aaaaa.BB.foo>:
				# CHECK-NEXT: nopl (%rax)
				# CHECK-NEXT: ja 0x{{[[:xdigit:]]+}} <r.BB.foo>
				# CHECK-NOT: jmp
				#
				.section .text,"ax",@progbits,unique,7
				aaaaa.BB.foo:
				nopl (%rax)
				jbe aaaaaa.BB.foo
				jmp r.BB.foo

				# CHECK: <aaaaaa.BB.foo>:
				# CHECK-NEXT: nopl (%rax)
				# CHECK-NEXT: jge 0x{{[[:xdigit:]]+}} <r.BB.foo>
				# CHECK-NOT: jmp
				#
				.section .text,"ax",@progbits,unique,8
				aaaaaa.BB.foo:
				nopl (%rax)
				jl aaaaaaa.BB.foo
				jmp r.BB.foo

				# CHECK: <aaaaaaa.BB.foo>:
				# CHECK-NEXT: nopl (%rax)
				# CHECK-NEXT: jg 0x{{[[:xdigit:]]+}} <r.BB.foo>
				# CHECK-NOT: jmp
				#
				.section .text,"ax",@progbits,unique,9
				aaaaaaa.BB.foo:
				nopl (%rax)
				jle aaaaaaaa.BB.foo
				jmp r.BB.foo

				# CHECK: <aaaaaaaa.BB.foo>:
				# CHECK-NEXT: nopl (%rax)
				# CHECK-NEXT: jbe 0x{{[[:xdigit:]]+}} <r.BB.foo>
				# CHECK-NOT: jmp
				#
				.section .text,"ax",@progbits,unique,10
				aaaaaaaa.BB.foo:
				nopl (%rax)
				ja aaaaaaaaa.BB.foo
				jmp r.BB.foo

				# CHECK: <aaaaaaaaa.BB.foo>:
				# CHECK-NEXT: nopl (%rax)
				# CHECK-NEXT: jb 0x{{[[:xdigit:]]+}} <r.BB.foo>
				# CHECK-NOT: jmp
				#
				.section .text,"ax",@progbits,unique,11
				aaaaaaaaa.BB.foo:
				nopl (%rax)
				jae aaaaaaaaaa.BB.foo
				jmp r.BB.foo

				.section .text,"ax",@progbits,unique,20
				aaaaaaaaaa.BB.foo:
				nopl (%rax)

				r.BB.foo:
				ret
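The inversions this test expects (je/jne, jb/jae, jbe/ja, jl/jge, jle/jg) follow a regular pattern in the x86-64 encoding: the near Jcc opcodes are `0x0F 0x80+cc`, and each condition code and its negation differ only in the lowest bit. The sketch below illustrates that property only. `JccOpcode` and `invertJcc` are hypothetical names, and nothing here is claimed to match how X86_64.cpp in this patch performs the inversion.

```cpp
#include <cassert>
#include <cstdint>

// Second opcode byte of the two-byte near Jcc encodings (0x0F 0x8x), limited
// to the conditions exercised by bb-sections-delete-fallthru.s.
enum JccOpcode : uint8_t {
  JB = 0x82,
  JAE = 0x83,
  JE = 0x84,
  JNE = 0x85,
  JBE = 0x86,
  JA = 0x87,
  JL = 0x8C,
  JGE = 0x8D,
  JLE = 0x8E,
  JG = 0x8F,
};

// Return the opcode byte of the opposite condition by flipping bit 0 of the
// condition code.
inline uint8_t invertJcc(uint8_t opcode) {
  assert((opcode & 0xF0) == 0x80 && "expected a near Jcc opcode byte");
  return static_cast<uint8_t>(opcode ^ 1);
}
```

With this, the pair `je a.BB.foo; jmp r.BB.foo` followed directly by `a.BB.foo` can become the single instruction `jne r.BB.foo`, which is what the CHECK lines in this file verify.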

lld/test/ELF/bb-sections-pc32reloc.s

This file was added.

				# REQUIRES: x86
				## basicblock-sections tests.
				## This simple test checks if redundant direct jumps are converted to
				## implicit fallthrus when PC32 reloc is present. The jcc's must be converted
				## to their inverted opcode, for instance jne to je and jmp must be deleted.

				# RUN: llvm-mc -filetype=obj -triple=x86_64 %s -o %t.o
				# RUN: llvm-objdump -dr %t.o | FileCheck %s --check-prefix=RELOC
				# RUN: ld.lld --optimize-bb-jumps %t.o -o %t.out
				# RUN: llvm-objdump -d %t.out | FileCheck %s

				# RELOC: jmp
				# RELOC-NEXT: R_X86_64_PC32

				# CHECK: <foo>:
				# CHECK-NEXT: nopl (%rax)
				# CHECK-NEXT: jne 0x{{[[:xdigit:]]+}} <r.BB.foo>
				# CHECK-NOT: jmp


				.section .text,"ax",@progbits
				.type foo,@function
				foo:
				nopl (%rax)
				je a.BB.foo
				# Encode a jmp r.BB.foo insn using a PC32 reloc
				.byte 0xe9
				.long r.BB.foo - . - 4

				# CHECK: <a.BB.foo>:
				# CHECK-NEXT: nopl (%rax)

				.section .text,"ax",@progbits,unique,3
				a.BB.foo:
				nopl (%rax)
				r.BB.foo:
				ret
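The hand-encoded jump in this test, `.byte 0xe9` followed by `.long r.BB.foo - . - 4`, relies on how a rel32 displacement is measured: from the end of the instruction, which is 4 bytes past the start of the displacement field. That is the R_X86_64_PC32 calculation S + A - P with addend A = -4. The helper below is a minimal sketch of that arithmetic; `jmpRel32` is a hypothetical name and not an lld function.

```cpp
#include <cassert>
#include <cstdint>

// Displacement to store in the 4-byte field of a `jmp rel32`.
//   target:    address of the destination (S)
//   fieldAddr: address of the displacement field itself (P), which is one
//              byte past the 0xE9 opcode
// The CPU adds the displacement to the address of the next instruction,
// fieldAddr + 4, hence the addend of -4.
inline int32_t jmpRel32(uint64_t target, uint64_t fieldAddr) {
  const int64_t disp =
      static_cast<int64_t>(target) - static_cast<int64_t>(fieldAddr) - 4;
  assert(disp >= INT32_MIN && disp <= INT32_MAX && "target out of range");
  return static_cast<int32_t>(disp);
}
```

For a jmp at 0x1000 the field sits at 0x1001 and the next instruction at 0x1005, so a target of 0x1010 needs a displacement of 0xB, and a jump back to 0x1000 needs -5.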


Diff 255670
