Page MenuHomePhabricator

Propeller: LLD Support for Basic Block Sections
ClosedPublic

Authored by tmsriram on Sep 25 2019, 5:58 PM.

Details

Summary

LLD Support for Basic Block Sections

This is part of the Propeller framework to do post link code layout optimizations. Please see the RFC here: https://groups.google.com/forum/#!msg/llvm-dev/ef3mKzAdJ7U/1shV64BYBAAJ and the detailed RFC doc here: https://github.com/google/llvm-propeller/blob/plo-dev/Propeller_RFC.pdf

This is one in the series of patches for Propeller.

This patch adds lld support for basic block sections and performs relaxations after the basic blocks have been reordered.

After the linker has reordered the basic block sections according to the desired sequence, it runs a relaxation pass to optimize jump instructions. Currently, the compiler emits the long form of all jump instructions . AMD64 ISA supports variants of jump instructions with one byte offset or a four byte offset. The compiler generates jump instructions with R_X86_64 32-bit PC relative relocations. We would like to use a new relocation type for these jump instructions as it makes it easy and accurate while relaxing these instructions.

A new relocation type for jmp instructions which need to be relaxed makes it easy and accurate when the linker tries to find these instructions. Our current method peeks back to look at the opcode of the relocation type that could correspond to jmp instructions.

The relaxation pass does two things:

First, it deletes all explicit fall-through direct jump instructions between adjacent basic blocks. This is done by discarding the tail of the basic block section.

Second, If there are consecutive jump instructions, it checks if the first conditional jump can be inverted to convert the second into a fall through and delete the second.

The jump instructions are relaxed by using jump instruction mods, something like relocations. These are used to modify the opcode of the jump instruction. Jump instruction mods contain three values, instruction offset, jump type and size. While writing this jump instruction out to the final binary, the linker uses the jump instruction mod to determine the opcode and the size of the modified jump instruction. These mods are required because the input object files are memory-mapped without write permissions and directly modifying the object files requires copying these sections. Copying a large number of basic block sections significantly bloats memory.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
ruiu added a comment.Feb 27 2020, 8:32 PM

If Sriraman has already resolved all Fangrui's comments, I can reiterate my LGTM, as the new code is well written, and I don't have a specific concern. I actually think that this is an interesting feature and in some degree a natural extension of -ffunction-sections.

ruiu accepted this revision.Mar 16 2020, 2:51 AM

It looks like there's no activities in the llvm-dev thread and this review thread, so I'll reiterate my LGTM, as I think we should not block this change, in particular given that the corresponding change to LLVM has been submitted.

In general I think a lot of this looks pretty good. It could use some more comments - in particular to call out what a lot of this code is used for. It's not used on a typical linking path and so could be confusing to people going through the code.

-eric

lld/ELF/OutputSections.cpp
249

Might want to assert size != 0 or early return.

tmsriram updated this revision to Diff 250933.Mar 17 2020, 3:49 PM
tmsriram marked an inline comment as done.

Address Reviewer comments.

+ More comments
+ Check for early return in nopInstrFill

In general I think a lot of this looks pretty good. It could use some more comments - in particular to call out what a lot of this code is used for. It's not used on a typical linking path and so could be confusing to people going through the code.

-eric

I have added more comments and mentioned that this is only applicable with basic block sections. I have also added "TODO:" in places where the code is going to get a lot simpler with newer relocations

@echristo Eric, is this alright now? Thanks.

MaskRay requested changes to this revision.EditedMar 20 2020, 5:30 PM

I am very glad to see that we have made progress by landing D68063 (llvm/CodeGen/CommandFlags.inc) and D73674 (llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp). Basic block sections is agreed to be useful even outside Properller.

There are several optimizations goals:

  • Alignment inserting
  • Automatic cache prefetching
  • Large code model addressing can lower performance quite a bit. A post-link scheme can relax large code model addressting to small code model addressing.
  • ...

There is a CPU erratum that we want to mitigate.

  • Intel's Jump Condition Code Erratum

By making this change, we will go the object file level route: annotate object files with metadata so that certain transformations can be performed.

Whether this scheme can satisfy the goals and avoid the erratum, and the uncertainty about how many stuff we will have to reinvent is my biggest concerns.

On https://lists.llvm.org/pipermail/llvm-dev/2020-February/139543.html (my brainstorming), I mentioned we may achieve our goals and make it suitable for future optimizations by using a file format with more semantics (rather than an object file). I hope we can think more on this, rather than rush to conclusions "this is redoing full LTO. it can't scale"

Considering the above points, I re-iterate my "Request Changes". We need a plan to prove that we can achieve our optimization goals while avoiding caveats (erratum).

This revision now requires changes to proceed.Mar 20 2020, 5:30 PM

Fangrui, I agree with Eric here. JCC erratum handling is not part of the core feature and it is perfectly reasonable to submit that as a follow up patch. This review thread has been going on for almost half a year and if the comments about the core patch have been addressed, we shall unlock the progress and move on.

In the meantime, the discussion on the erratum handling should also start -- probably in the contexts of other mitigations (not just this JCC one).

thanks,

David

I am very glad to see that we have made progress by landing D68063 (llvm/CodeGen/CommandFlags.inc) and D73674 (llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp). Basic block sections is agreed to be useful even outside Properller.

There are several optimizations goals:

  • Alignment inserting
  • Automatic cache prefetching
  • Large code model addressing can lower performance quite a bit. A post-link scheme can relax large code model addressting to small code model addressing.
  • ...

There is a CPU erratum that we want to mitigate.

  • Intel's Jump Condition Code Erratum

By making this change, we will go the object file level route: annotate object files with metadata so that certain transformations can be performed.

Whether this scheme can satisfy the goals and avoid the erratum, and the uncertainty about how many stuff we will have to reinvent is my biggest concerns.

On https://lists.llvm.org/pipermail/llvm-dev/2020-February/139543.html (my brainstorming), I mentioned we may achieve our goals and make it suitable for future optimizations by using a file format with more semantics (rather than an object file). I hope we can think more on this, rather than rush to conclusions "this is redoing full LTO. it can't scale"

Considering the above points, I re-iterate my "Request Changes". We need a plan to prove that we can achieve our optimization goals while avoiding caveats (erratum).

@echristo @ruiu

If the JCC erratum is the only concern then we are able to show now with experiments that Propeller can produce JCC erratum free binaries with almost no performance impact and only by using the existing assembler mitigations : http://lists.llvm.org/pipermail/llvm-dev/2020-March/140134.html

Let's use that thread to continue to investigate how the linker could potentially do a better job of handling this or other erratums in general. Could we please unblock this?

I am very glad to see that we have made progress by landing D68063 (llvm/CodeGen/CommandFlags.inc) and D73674 (llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp). Basic block sections is agreed to be useful even outside Properller.

There are several optimizations goals:

  • Alignment inserting
  • Automatic cache prefetching
  • Large code model addressing can lower performance quite a bit. A post-link scheme can relax large code model addressting to small code model addressing.
  • ...

There is a CPU erratum that we want to mitigate.

  • Intel's Jump Condition Code Erratum

By making this change, we will go the object file level route: annotate object files with metadata so that certain transformations can be performed.

Whether this scheme can satisfy the goals and avoid the erratum, and the uncertainty about how many stuff we will have to reinvent is my biggest concerns.

On https://lists.llvm.org/pipermail/llvm-dev/2020-February/139543.html (my brainstorming), I mentioned we may achieve our goals and make it suitable for future optimizations by using a file format with more semantics (rather than an object file). I hope we can think more on this, rather than rush to conclusions "this is redoing full LTO. it can't scale"

Considering the above points, I re-iterate my "Request Changes". We need a plan to prove that we can achieve our optimization goals while avoiding caveats (erratum).

@echristo @ruiu

If the JCC erratum is the only concern then we are able to show now with experiments that Propeller can produce JCC erratum free binaries with almost no performance impact and only by using the existing assembler mitigations : http://lists.llvm.org/pipermail/llvm-dev/2020-March/140134.html

Let's use that thread to continue to investigate how the linker could potentially do a better job of handling this or other erratums in general. Could we please unblock this?

I agree. While the erratum is going to be important I think that this going in now doesn't block any future work on erratum handling in the linker or make it more difficult to add other than adding it on top of propeller perhaps, but that's a different issue?

@MaskRay : Is the propeller patch going to (outside of in propeller) going to cause implementing the jcc erratum to be more difficult? If not, then I think we should go forward. If so, can you propose some (hopefully narrow) changes to this patch in order to make that more possible - even "you asked for jcc erratum and we can't do that here" as an error seems like it would be an ok incremental step forward?

dxf added a subscriber: dxf.Mar 23 2020, 2:33 PM
MaskRay added a comment.EditedMar 23 2020, 10:56 PM

I am very glad to see that we have made progress by landing D68063 (llvm/CodeGen/CommandFlags.inc) and D73674 (llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp). Basic block sections is agreed to be useful even outside Properller.

There are several optimizations goals:

  • Alignment inserting
  • Automatic cache prefetching
  • Large code model addressing can lower performance quite a bit. A post-link scheme can relax large code model addressting to small code model addressing.
  • ...

There is a CPU erratum that we want to mitigate.

  • Intel's Jump Condition Code Erratum

By making this change, we will go the object file level route: annotate object files with metadata so that certain transformations can be performed.

Whether this scheme can satisfy the goals and avoid the erratum, and the uncertainty about how many stuff we will have to reinvent is my biggest concerns.

On https://lists.llvm.org/pipermail/llvm-dev/2020-February/139543.html (my brainstorming), I mentioned we may achieve our goals and make it suitable for future optimizations by using a file format with more semantics (rather than an object file). I hope we can think more on this, rather than rush to conclusions "this is redoing full LTO. it can't scale"

Considering the above points, I re-iterate my "Request Changes". We need a plan to prove that we can achieve our optimization goals while avoiding caveats (erratum).

@echristo @ruiu

If the JCC erratum is the only concern then we are able to show now with experiments that Propeller can produce JCC erratum free binaries with almost no performance impact and only by using the existing assembler mitigations : http://lists.llvm.org/pipermail/llvm-dev/2020-March/140134.html

Let's use that thread to continue to investigate how the linker could potentially do a better job of handling this or other erratums in general. Could we please unblock this?

Intel JCC erratum is not the only concern. My bigger concern is whether we can achieve our post-link optimization goals other than layout shuffling with the current scheme:

  • Alignment inserting
  • Automatic cache prefetching
  • Large code model addressing can lower performance quite a bit. A post-link scheme can relax large code model addressting to small code model addressing. ...
  • ...

These points were already listed in my previous comments. I believe internally you probably have more brainstorming thoughts. As I said on https://lists.llvm.org/pipermail/llvm-dev/2020-March/139639.html , I am not yet convinced that with the no disassembly assumption, reordering opaque sections can achieve the above goals. Post-link optimization is not a new idea and there have been several engineering efforts before Propeller. However, Propeller is the first integrating the great idea into LLVM. As I said I look forward to its bright future. I just hope that we can create a generic framework. Our focus is currently section reordering. When we start to think future optimization opportunities, we don't need to create one, two, three, four more frameworks.

I saw Rahman posted https://lists.llvm.org/pipermail/llvm-dev/2020-March/140134.html yesterday. Sorry that I did not have time reading it today. If the idea is that more layout work will be done by the compiler, then it starts to look good to me.

I am very glad to see that we have made progress by landing D68063 (llvm/CodeGen/CommandFlags.inc) and D73674 (llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp). Basic block sections is agreed to be useful even outside Properller.

There are several optimizations goals:

  • Alignment inserting
  • Automatic cache prefetching
  • Large code model addressing can lower performance quite a bit. A post-link scheme can relax large code model addressting to small code model addressing.
  • ...

@echristo @snehasish @dxf @ruiu @tejohnson @rnk

All the above ideas you mentioned here, you heard it from my team in face to face meetings and the last one in internal discussions. I recall specifically telling you some of it is in very early stages and to keep it to yourself. You could have at least checked with us before mentioning it, as just courtesy if not anything else.

There is a CPU erratum that we want to mitigate.

  • Intel's Jump Condition Code Erratum

By making this change, we will go the object file level route: annotate object files with metadata so that certain transformations can be performed.

Whether this scheme can satisfy the goals and avoid the erratum, and the uncertainty about how many stuff we will have to reinvent is my biggest concerns.

On https://lists.llvm.org/pipermail/llvm-dev/2020-February/139543.html (my brainstorming), I mentioned we may achieve our goals and make it suitable for future optimizations by using a file format with more semantics (rather than an object file). I hope we can think more on this, rather than rush to conclusions "this is redoing full LTO. it can't scale"

Considering the above points, I re-iterate my "Request Changes". We need a plan to prove that we can achieve our optimization goals while avoiding caveats (erratum).

@echristo @ruiu

If the JCC erratum is the only concern then we are able to show now with experiments that Propeller can produce JCC erratum free binaries with almost no performance impact and only by using the existing assembler mitigations : http://lists.llvm.org/pipermail/llvm-dev/2020-March/140134.html

Let's use that thread to continue to investigate how the linker could potentially do a better job of handling this or other erratums in general. Could we please unblock this?

Intel JCC erratum is not the only concern. My bigger concern is whether we can achieve our post-link optimization goals other than layout shuffling with the current scheme:

  • Alignment inserting
  • Automatic cache prefetching
  • Large code model addressing can lower performance quite a bit. A post-link scheme can relax large code model addressting to small code model addressing. ...
  • ...

These points were already listed in my previous comments. I believe internally you probably have more brainstorming thoughts. As I said on https://lists.llvm.org/pipermail/llvm-dev/2020-March/139639.html , I am not yet convinced that with the no disassembly assumption, reordering opaque sections can achieve the above goals. Post-link optimization is not a new idea and there have been several engineering efforts before Propeller. However, Propeller is the first integrating the great idea into LLVM. As I said I look forward to its bright future. I just hope that we can create a generic framework. Our focus is currently section reordering. When we start to think future optimization opportunities, we don't need to create one, two, three, four more frameworks.

I don't mean any disrespect here but your tone suggests that you are quite experienced in this area :) If you have a better proposal, I strongly encourage you to propose it, evaluate it with experiments and performance numbers and get it into LLVM. Asking us to evaluate completely new designs along the lines of Full or Partial LTO is not feasible as it takes several weeks if not months, and IMHO, not reasonable particularly at this stage in the review.

We have put in a lot of effort towards this work to come this far. Asking us to go do it totally differently is not something we are going to do. We now know the partial LTO idea was actually not yours and only suggested to you, which I believe you should at least acknowledge for transparency.

Multiple people with significant LTO experience have told you ideas resembling Full LTO have scalability issues and ThinLTO has had a lot of adoption due to exactly this. If you want to prove them wrong, good luck to you but please don't ask us to do the heavy lifting!

As for disassembly, we have not presented a single patch that does any serious disassembly and we fully understand the pitfalls. We understand that the jump relaxation does mild disassembly and we are looking at relocations to overcome that as you already know.

We are looking at efficient ways to accomplish our other optimization objectives and we will present clear designs with experiments on llvm-dev when we get it done. The idea here is to do thin links like @mehdi_amini alluded to in that thread of yours: https://llvm.org/devmtg/2020-02-23/#kl which will do most of the transformations in the compiler and use thin links to generate summaries that are whole program.

I saw Rahman posted https://lists.llvm.org/pipermail/llvm-dev/2020-March/140134.html yesterday. Sorry that I did not have time reading it today. If the idea is that more layout work will be done by the compiler, then it starts to look good to me.

I urge you to read that as we spent significant time to conclusively prove that JCC erratum is a non-issue. I can summarize the plan to you:

  • We have been looking at constantly reducing the bloat from extra sections to be as low as possible. Some of the work we did here was to selectively create basic block sections.
  • During this, we realized we can even do better if we can compute basic block orders early immediately after profiling. This would require using a dynamic CFG but we have just protoyped it and it has the same performance benefits.
  • This could also moves the bulk of the Propeller work from the linker to a third party tool, create_llvm_prof. This patch is still necessary as it relates to bb sections.
  • This allows us to form larger sections in the compiler and not wait for the linker to do the reordering. We would still have to create multiple sections but a lot fewer, significantly reducing the bloats.
  • This means we can also reuse the existing assembler mitigations developed without having to reinvent them in the linker which gives us an immediate solution.
  • Performance is neutral (0.2% slowdown) after applying the mitigations. Infact, without Propeller the performance is down by 0.6% from the mitigations.
  • To be clear, we get all of the Propeller wins even with the mitigations, measured on clang benchmark.
  • We feel the linker can do a much better job here since the mitigations are only using NOPs and contrary to what you told us in the meeting, prefixes dont seem to help. You have noted this yourself so I am still wondering why you told us that prefixes help: https://reviews.llvm.org/D72225#1818149 where you say " NOP padding alone seems good"
  • The linker does not have to use the large hammer of aligning every function to 32 byte boundaries but this is something best discussed in a design thread

You also say in your previous message that "I am very glad to see that we have made progress by landing D68063 ..." and yet you are blocking this. This is ridiculous. If you have fundamental disagreements, you should also have been blocking the other patches. The other patches are not very useful without this, whats up with the selective blocking!

To conclude, it is perfectly fine if you are opposed to this and don't wish to unblock. I am trying to act in good faith and I am just going to have to push this around you if I get the approvals or kill basic block sections. We have to agree to disagree.

MaskRay added a comment.EditedMar 24 2020, 5:15 PM

Hi Sriraman,

Sounds like there is strong support for your patch, so let’s move forward on it.

I do have a few more code review items I’d like addressed if we can before we commit.
There are several nits about excess parentheses which do not fit with the rest of lld code.
Please scan the full patch and delete them. Rebase and upload a new diff so that I can patch my local repository with arc patch D68065.
It does not apply so it isn’t convenient for me to review the tests now.

In addition, we probably should move test/ELF/bb-sections-* tests to test/ELF/propeller/

I don't mean any disrespect here but your tone suggests that you are quite experienced in this area :) If you have a better proposal, I strongly encourage you to propose it, evaluate it with experiments and performance numbers and get it into LLVM. Asking us to evaluate completely new designs along the lines of Full or Partial LTO is not feasible as it takes several weeks if not months, and IMHO, not reasonable particularly at this stage in the review.

The first sentence is fair. I have just a bit more than 2 years experience contributing to LLVM. I made a llvm-dev proposal as everyone can see. In my spare time I will probably make more investigations how to speed up things.

We have put in a lot of effort towards this work to come this far. Asking us to go do it totally differently is not something we are going to do. We now know the partial LTO idea was actually not yours and only suggested to you, which I believe you should at least acknowledge for transparency.

Just wanted to defend for myself. I was not sure I could mention the person. That email also includes lots of my own ideas.

You also say in your previous message that "I am very glad to see that we have made progress by landing D68063 ..." and yet you are blocking this. This is ridiculous.

Please note that I made it very clear that basic block sections are helpful for other things. I was blocking this linker patch because of concerns about long-term viability and maintenance.

That said, I think we need to try out new ideas. I am now focusing on code view itself, instead of repeating my previous high level concerns.

lld/ELF/Arch/X86_64.cpp
66

Should follow variableName convention.

Add See X86AsmBackend::writeNopData

159
for (unsigned i = is.relocations.size(); i != 0; ) {
  --i;
  if (is.relocations[i].offset == offset && is.relocations[i].expr != R_NONE)
    return i;
}
168

delete parens

251

Nit: delete parens (is.getSize() - 4)

lld/ELF/Options.td
47

Add (default). See other options.

510

"Do not give unique names to every basic block section for LTO (default)"

lld/ELF/OutputSections.cpp
260

assert(nopFiller[remaining - 1].size() == remaining)

350–354

The code (if (isec->nopFiller)) self explains. No need for a comment.

lld/ELF/Target.h
96

delete excess space after //

lld/ELF/Writer.cpp
1633

I requested a research for st_value but I think it is not needed to get it accurate.

However, I don't think we should just copy the elfnn-riscv.c behavior.

if (def->value + def->size > NewSize && def->value <= OldSize && def->value + def->size <= OldSize) {

should be simplified to

if (def->value + def->size > NewSize && def->value + def->size <= OldSize) {
1638

Nit: Moving -> move (make it a bit simpler)

1662

Nit: move the condition to the call site.

1667

There is only one list item. Why use 1. ?

1677

Where is Step 2?

1682

Nit: delete parens

1685

Nit: static_cast<unsigned>

1691

Removing -> removed

2004

optimizeBasicBlockJumps calls assignAddresses, which was only called in finalizeAddressDependentContent.

We hope assignAddresses caller are grouped together (if in.symTab needs to be finalized first, please add a comment). Can you move this pass immediately before (or after) finalizeAddressDependentContent?

lld/test/ELF/bb-sections-and-icf.s
4 ↗(On Diff #250933)

deleted

6 ↗(On Diff #250933)

x86_64-pc-linux -> x86_64

7 ↗(On Diff #250933)

delete excess space.

Prefer --optimize-bb-jumps over -optimize-bb-jumps

7 ↗(On Diff #250933)

If --icf=all result is different from --icf=none. Add a comment.

16 ↗(On Diff #250933)

Use ## for test comments.

17 ↗(On Diff #250933)

Don't add excess space.

MaskRay added inline comments.Mar 24 2020, 5:19 PM
lld/test/ELF/bb-sections-delete-fallthru.s
14

Just delete {{[0-9|a-f| ]*}} if the address is not significant.

Please also apply to other test files.

19

Delete Begin function foo

Scrub clang output to the minimum

tmsriram updated this revision to Diff 252733.Mar 25 2020, 8:31 PM
tmsriram marked 33 inline comments as done and an inline comment as not done.

Address recent reviewer comments.

  • rebase
  • check it applies clean to trunk

Hi Sriraman,

Sounds like there is strong support for your patch, so let’s move forward on it.

I do have a few more code review items I’d like addressed if we can before we commit.
There are several nits about excess parentheses which do not fit with the rest of lld code.
Please scan the full patch and delete them. Rebase and upload a new diff so that I can patch my local repository with arc patch D68065.
It does not apply so it isn’t convenient for me to review the tests now.

In addition, we probably should move test/ELF/bb-sections-* tests to test/ELF/propeller/

In D68063 and D73674, we decided (one of your suggestions too) to remove all references to Propeller as this was related to basic block sections. This is also fully a part of bb sections and fits naturally and is part of core LLD. I dont think we should move it to propeller/.

I don't mean any disrespect here but your tone suggests that you are quite experienced in this area :) If you have a better proposal, I strongly encourage you to propose it, evaluate it with experiments and performance numbers and get it into LLVM. Asking us to evaluate completely new designs along the lines of Full or Partial LTO is not feasible as it takes several weeks if not months, and IMHO, not reasonable particularly at this stage in the review.

The first sentence is fair. I have just a bit more than 2 years experience contributing to LLVM. I made a llvm-dev proposal as everyone can see. In my spare time I will probably make more investigations how to speed up things.

We have put in a lot of effort towards this work to come this far. Asking us to go do it totally differently is not something we are going to do. We now know the partial LTO idea was actually not yours and only suggested to you, which I believe you should at least acknowledge for transparency.

Just wanted to defend for myself. I was not sure I could mention the person. That email also includes lots of my own ideas.

You also say in your previous message that "I am very glad to see that we have made progress by landing D68063 ..." and yet you are blocking this. This is ridiculous.

Please note that I made it very clear that basic block sections are helpful for other things. I was blocking this linker patch because of concerns about long-term viability and maintenance.

That said, I think we need to try out new ideas. I am now focusing on code view itself, instead of repeating my previous high level concerns.

lld/ELF/Arch/X86_64.cpp
159

This won't work because this function should return the max. size when not found. If I change that assumption, I must also change how this is consumed by its callers. Is there a reason why you want to search in the reverse direction? I mean I will change it if you could tell me why.

lld/ELF/OutputSections.cpp
350–354

I removed it but maybe I should say that NOPs are needed here as opposed to TRAP because they might be executed.

lld/ELF/Writer.cpp
1633

@amharc We are going back and forth here. We didn't think it was necessary to copy the behavior either, but since you suggested that we copy elfnn-riscv.c behavior, we went ahead and did it. We dont see a problem with either, so should we just leave it as is? Is there a particular concern here, that is not clear to me.

1638

I am not a native speaker but "Moving" seems more appropriate here unless you meant that 'M' should be lower case. My take, "move" seems to imply the user must do it.

1662

Moved and asserted at the beginning.

1667

Rephrased, shrinking the jump instrs was the 2. but I split that patch out and this remained unnoticed.

1677

Same deal.

1682

With parens reads better to me personally. If this is not acceptable w.r.t the coding style, lmk and I will delete.

1691

Past tense is preferred? I see mixed use here in many places in LLD.

2004

Correct me if I am wrong but finalizing in.symtab before we optimize seems important, I added a comment here. If we relax jumps going forward, we would definitely need to know how far they are.

lld/test/ELF/bb-sections-and-icf.s
7 ↗(On Diff #250933)

Didn't follow. This test explicitly checks for the folding. You want to test icf=none too? Why?

tmsriram edited the summary of this revision. (Show Details)Mar 25 2020, 8:36 PM

Ping.

Will read tomorrow.

MaskRay added inline comments.Apr 2 2020, 2:51 PM
lld/test/ELF/bb-sections-delete-fallthru.s
9

delete excess space and use --optimize-bb-jumps

I mentioned this in a previous comment.

25

Add some spaces after CHECK: to make the label aligned.

For subsequent -NEXT: lines, you can increase the indentation (say, by 1) to make it clear the instructions follow the label:

# CHECK:      <a.BB.foo>
# CHECK-NEXT:  nopl (%rax)

Please fix other tests as well.

27

je {{.*}} <r.BB.foo> I have recently updated llvm-objdump -d to print the target address (to be consistent with GNU objdump ; is what most users desire) instead of a decimal PC relative immediate

No need to align the first operand.

MaskRay added inline comments.Apr 2 2020, 2:53 PM
lld/test/ELF/bb-sections-delete-fallthru.s
25
# CHECK:      <a.BB.foo>:
# CHECK-NEXT:  nopl (%rax)

Add a colon to make it clear that a.BB.foo is a label:

MaskRay added a comment.EditedApr 2 2020, 3:26 PM

I have a question about aaaaaaa.BB.foo style labels. Are they STB_LOCAL symbol? If so,

  • The assembler (MC) will generally convert relocations referencing STB_LOCAL to reference STT_SECTION instead.
  • The assembler will keep them in .symtab
  • The linker will retain such labels in the .symtab section in the executable/shared object unless --discard-locals is specified.
  • aaaa.BB.foo will just be marker symbols in executables/shared objects: "hey, these addresses are special and are referenced by some instructions."
  • The strip tool will drop .symtab . It seems that the executables/shared objects cannot be stripped.

Are all the items above expected?

lld/test/ELF/bb-sections-delete-fallthru.s
8

Drop -pc-linux.

I mentioned this in a previous comment.

10

Drop --check-prefix=CHECK. It is the default.

MaskRay added inline comments.Apr 2 2020, 3:32 PM
lld/ELF/Arch/X86_64.cpp
159
for (unsigned i = is.relocations.size(); i != 0; ) {
  --i;
  if (is.relocations[i].offset == offset && is.relocations[i].expr != R_NONE)
    return i;
}
return is.relocations.size();

An input section may have several relocations. Scanning backward can be more efficient.

tmsriram marked an inline comment as done.Apr 2 2020, 3:42 PM
tmsriram added inline comments.
lld/ELF/Arch/X86_64.cpp
159

We discussed this f2f and even about sorting it and you expressed that you were alright leaving it as is, maybe you forgot.

tmsriram marked an inline comment as done.Apr 2 2020, 3:44 PM
tmsriram added inline comments.
lld/ELF/Arch/X86_64.cpp
159

Forgot to add that since this will change when the new relocations come in.

MaskRay added inline comments.Apr 2 2020, 5:00 PM
lld/ELF/Arch/X86_64.cpp
159

No, I do not forget it. The relocations are still sorted in practice. It is just that this is not a guaranteed property. Iterating the relocations backward is faster.

tmsriram updated this revision to Diff 254655.Apr 2 2020, 5:20 PM
tmsriram marked 10 inline comments as done.

Rebase and make recent suggested changes.

I have a question about aaaaaaa.BB.foo style labels. Are they STB_LOCAL symbol? If so,

  • The assembler (MC) will generally convert relocations referencing STB_LOCAL to reference STT_SECTION instead.
  • The assembler will keep them in .symtab
  • The linker will retain such labels in the .symtab section in the executable/shared object unless --discard-locals is specified.
  • aaaa.BB.foo will just be marker symbols in executables/shared objects: "hey, these addresses are special and are referenced by some instructions."
  • The strip tool will drop .symtab . It seems that the executables/shared objects cannot be stripped.

Acknowledged. We are aware of all these points. We will be adding -mrelocate-with-symbols which will relocate with symbols instead of sections to perform relaxation more easily. These symbols are used to map profiles to virtual addresses and hence the unstripped binary is needed. This is our first version and we are brain-storming other methods including putting it in Debug Info. Thanks.

Are all the items above expected?

lld/ELF/Arch/X86_64.cpp
159

Alright, backward scanning now.

ruiu accepted this revision.Apr 6 2020, 8:29 PM

LGTM

I had a VC meeting wtih maskray to discuss concerns he has, and he is ok with this patch. I'm fine with this patch too. I'll give a final LGTM to unblock you. Please make the following changes and submit. Thanks!

lld/ELF/LTO.cpp
85

nit: if else has {}, add {} to other if and else if clauses.

95–96

Please use error() to report an error and use the same style as other error messages. E.g.

error("cannot open " + config->ltoBasicBlockSections + ":" + MBOrErr.getError().message());
MaskRay added inline comments.Apr 6 2020, 8:55 PM
lld/ELF/Writer.cpp
1681

delete excess parens

2004

Run after in.symTab is finalized.

Why is important?

This revision was not accepted when it landed; it landed in state Needs Review.Apr 7 2020, 7:02 AM
This revision was automatically updated to reflect the committed changes.
tmsriram marked 6 inline comments as done and an inline comment as not done.
tmsriram added inline comments.Apr 7 2020, 7:07 AM
lld/ELF/Writer.cpp
2004

Modified comment.

MaskRay added inline comments.Apr 7 2020, 12:00 PM
lld/ELF/Writer.cpp
2004

I am still not sure this is correct. .symtab and .strtab and potential .shstrtab are placed after everything else. The internal representation does not use the contents of .symtab at all.

Though, you already committed this patch. We probably don't want to change here more to cause churn.