This is an archive of the discontinued LLVM Phabricator instance.

[lld] [LinkerScript] Implement semantics for simple sections mappings
AcceptedPublic

Authored by rafaelauler on Mar 8 2015, 6:49 PM.

Details

Summary

This patch should work on top of D8156 because it needs linker script
expression evaluation.

This patch implements the behaviour of the SECTIONS linker script directive,
used to define a custom mapping between input and output sections. Not all
sections constructs are currently supported, but only the ones that do not
use special sort orders. I added two LIT test as practical examples of which
sections directives are currently supported.

In terms of high-level changes, I created a new class "script::Sema" that owns
all linker script ASTs and the logic for linker script semantics as well.
ELFLinkingContext owns a single copy of Sema, which will be used throughout
the object file reading process (to assign rule ids to atoms) and writing
process (to layout sections as proposed by the linker script).

Other high-level change is that the writer no longer uses a "const" copy of
the linking context. This happens because linker script expressions must be
calculated *while* calculating final virtual addresses, which is a very late
step in object file writing. While calculating these expressions, we need to
update the linker script symbol table (inside the semantics object), and, thus,
we are "modifying our context" as we prepare to write the file.

The difference D7915 is that I implemented several suggestions from Rui and
Shankar:

  • there is no exposed "id". Clients always ask how to order a section, and whether we use "ids" to sort or not is internal to the Sema class implementation. (suggestion from Rui)
  • improved the input sections ordering and addressed the test case mentioned by Shankar (second LIT test case in this patch) (suggestion from Shankar)
  • refactored the code in DefaultLayout to live in ScriptLayout (suggestion from Shankar)

I also took the opportunity to improve the code and implement simple
SORT directives and wildcard matching.

Diff Detail

Event Timeline

rafaelauler updated this revision to Diff 21466.Mar 8 2015, 6:49 PM
rafaelauler retitled this revision from to [lld] [LinkerScript] Implement semantics for simple sections mappings.
rafaelauler updated this object.
rafaelauler edited the test plan for this revision. (Show Details)
rafaelauler added a subscriber: Unknown Object (MLST).

can we split this patch by functionalities too, its easier for review.

include/lld/ReaderWriter/LinkerScript.h
168

Could you represent all the different kinds of functions that can operate on the input sections as Functions instead of InputSectionKinds ?

759–760

this could be simplified if you use lld::range.

1122

can you rename this to archiveNameOrPath

1127–1130

caching layout order is error prone, as users may selectively want to specify a layout order from one rule to another.

1154

All the end users should be using Sema right ? So This function could be made private.

1159

returns void ?

1186

Could we evaluate as and when the input section is appended to the output section ?

lib/ReaderWriter/ELF/DefaultLayout.h
229–235

If you can associate the expression also to an outputsection, this would not be needed. The code gets overly simplified too.

lib/ReaderWriter/ELF/SegmentChunks.h
437

why would you do this ?

ruiu edited edge metadata.Mar 8 2015, 8:22 PM

As a general comment, this patch is still too large, and I prefer smaller, incremental patches. For examples, we could drop some directives from this patch, or remove some features such as wildcard pattern matching from this, and basic features should still work. And then we could add missing features one by one. It may be too late, and I read the whole patch anyway, but please consider doing that way next time. Thanks!

Review comments are inline.

include/lld/ReaderWriter/LinkerScript.h
1130

Instead of defining a class (struct) implementing an operator=(const SectionKey &), you can simply define of a function (int64_t f(const SectionKey &)) and pass it to std::unordered_map.

You want to move this below private:.

1138

Ditto

1154

Returning a const std::vector would be better?

lib/ReaderWriter/ELF/DefaultLayout.h
350

Merge this with DefaultLayout and remove this class.

587

It's good to add a brief comment here, like, section names may be renamed by linker scripts.

993

This code is complicated. If we just want to group sections by content type, can you just stable_sort them by content type?

lib/ReaderWriter/LinkerScript.cpp
2111

I think you need to check whether or not i is not greater than the length of name on each iteration.

2123

StringRef::find returns npos if not found.

2210

This loop content is duplicate of the previous one. Do we really need to separate non-wildcard patterns and wildcard patterns? Technically, the former is a special case of the latter, and I don't think we have to optimize this routine. Simplicity is preferred, I guess.

2378

You can do

if (Sections *sec = dyn_cast<Sections>(c))
  linearlizeAst(sec)
ruiu added inline comments.Mar 8 2015, 8:26 PM
include/lld/ReaderWriter/LinkerScript.h
759–760

This cannot be replaced with lld::range since these two methods allows us to directly use this class in range-based for loops. Also, don't use lld::range anymore. We cannot justify having that generic library only inside LLD.

1159

Returning void is fine. Don't make this function return std::error_code "just in case".

Thank you all for your patience with this patch size and for the reviews. I'll keep the next patches shorter. Comments inline.

include/lld/ReaderWriter/LinkerScript.h
168

I'm afraid I'm not on the same page, which functions are you mentioning?

1122

I was under the impression that, from the linker script perspective, we should always work with file names instead of paths. For example, to make my test case work correctly, I need to drop the path that comes from the File object and use only the file name. Only then I can match it with a linker script rule because they omit the full path. Can a linker script specify a full path to an archive?

1130

I don't understand how caching here would be error prone. I'm simply implementing the design that was suggested from the previous thread of D7915:

  • Make DefaultLayout ask Sema about the order of two sections by section name (or, more precisely, section "key") and do not expose "ids".
  • Since these questions are frequently asked for the same input section, we cache the result.

Which users do you refer to? You mean, the TargetLayout?

1154

Rui: Yes, I should return a const std::vector.

You are right, Shankar, but this function is currently not private to enable unit testing of linker script expression evaluation.

lib/ReaderWriter/ELF/DefaultLayout.h
229–235

Ok, I'll work on this. Thanks.

350

No problem, but first can we get a consensus on whether do we need ScriptLayout or not? I'm asking this because I only moved the code out of DefaultLayout due to Shankar's solicitation.

587

Sure

993

Unfortunately this turned out to be more complicated than intended. Here is the problem:

We can't sort everything again because this would undo the linker script order. We can only move sections that weren't assigned by the linker script.

If we stable_sort everything by content_type, we may also move sections that should be pinned by the linker script.

lib/ReaderWriter/ELF/SegmentChunks.h
437

This is used to keep in sync the file image with the memory image.

lib/ReaderWriter/LinkerScript.cpp
2111

Will do.

2123

Thanks for pointing out, I will fix this.

2210

I will refactor this.

2378

Thanks, will do.

shankarke added inline comments.Mar 9 2015, 8:56 AM
include/lld/ReaderWriter/LinkerScript.h
168

Thinking about this again, Its ok.

1122

I think so, you can specify a full path to the archive too. why not ?

1130

Yes, TargetLayout.

lib/ReaderWriter/ELF/DefaultLayout.h
350

DefaultLayout was designed as a class to handle generic ELF Layout when not having linker scripts. Some of the linker script usecases are not even relevant for the DefaultLayout. Conditionally executing things in the DefaultLayout if we have a linker script IMO is a bad design.

lib/ReaderWriter/ELF/SegmentChunks.h
437

Do you have a specific test for this ? I would like to see that to understand here.

ruiu added inline comments.Mar 9 2015, 11:31 AM
lib/ReaderWriter/ELF/DefaultLayout.h
350

I'm sorry but I completely disagree. I need to keep an eye on this kind of overkill abstraction as the history of this project shows. I spent so much time to reduce complexity of the linker, such as removal of InputGraph. Don't overdesign class hierarchy. That's going to become technical debt.

shankarke added inline comments.Mar 9 2015, 11:42 AM
lib/ReaderWriter/ELF/DefaultLayout.h
350

I dont think making changes in the DefaultLayout to accomodate LinkerScripts is a good idea.

If others / Rafael think its a great idea, please go for it.

rafaelauler added inline comments.Mar 10 2015, 2:04 PM
include/lld/ReaderWriter/LinkerScript.h
1122

I checked the manual and you are right, thanks for pointing out. I renamed this to archivePath (and the other to memberPath) to maintain consistency with file.archivePath, which also can be a name or a path.

I also deleted a function call that I was using to retrieve the file name from a path in DefaultLayout. Now I checked with GNU ld that if the linker script has a full path inside it, it must only match inputs with this exact name, and vice-verse (if the input file was specified with a full path, it does not match a rule that only specifies the file name).

1130

Ok, but in this current implementation, it is impossible for TargetLayouts to change this id, since this is private to Sema. If the user wants to change this order, we can later implement an interface to do this in a clean way that also updates our caches.

1138

I'm under the impression that std::unordered_map only accepts functors, no? I tried to remove this from the class, but I couldn't get this to work. Am I missing something? But I moved this to private.

1186

Woudn't this be equivalent to evaluating once we append to segments? I mean, the algorithm I have in mind is exactly the same. Before adding a section to the output section, we gather all expressions and put it there. This is exactly the same as putting expressions once we attach an input section to a segment.

lib/ReaderWriter/ELF/DefaultLayout.h
229–235

See above.

350

I guess currently we only have a few functions for ScriptLayout, so I merged them back into DefaultLayout. I tried to maintain them factored out from the rest of DefaultLayout, so as to be easy to see where script-related functions hook in the default behavior.

587

After a comment from Shankar explaining that we may need the full path (instead of just the file name), I removed this behavior (I don't call llvm::sys::path::filename() anymore).

lib/ReaderWriter/ELF/SegmentChunks.h
437

Well, the test attached to this patch tests this. Let me try to briefly show what's happening here:

you see the line below, "fileOffset = llvm::RoundUpToAlignment..." ?

Well, it is fixing the offset because when we assigned virtual addresses, we may have changed the address where a given chunk resides to meet alignment contraints. Thus, to keep the fileOffset in sync with the virtual addresses (memory image), this line fixes this.

However, the linker script also has power to change the current virtual address (we see now that not only alignments can change this). So I added these three lines of code that basically checks if we are calculating the address of an expression chunk and, if yes, check if our virtual address (calculated when assigning virtual addresses to chunks) is deviating from the expected value found in "lastVirtualAddress". If yes, we need to fix "fileOffset" by adding this "gap". It's just like a "custom alignment".

To see how this is necessary, any test case that uses a linker script that changes any address will suffice. The attached test case works: it requests a .ro.data section to start somewhere around address 0x506000 (actually, it is 0x50600 added to the size of previously emitted sections), but the current address for sections is around 0x500000 + previous sections size. If this code is removed, even though symbols get fixed up as though these sections reside at 0x506000, the memory content at run time will not reflect this because the file contents will be created putting everything cluttered at 0x500000.

lib/ReaderWriter/LinkerScript.cpp
2210

I gave a look at this and thinks are already refactored. The real contents of the loop are actually seen in "matchSectionName()". However, I can't merge these two different loops into one because they are iterating over quite different things: the first iterates over a range of a multimap that contains names associated with "order ids", while the second iterated over a vector. I don't see how I can "blend" these two different ranges into the same loop. But if you have an idea of how to simplify this and there is something I'm missing here, I'll gladly implement it.

rafaelauler edited edge metadata.

Addressed reviewers' concerns.

ruiu accepted this revision.Mar 11 2015, 1:29 PM
ruiu edited edge metadata.

LGTM

This revision is now accepted and ready to land.Mar 11 2015, 1:29 PM
shankarke added inline comments.Mar 12 2015, 12:44 PM
include/lld/ReaderWriter/LinkerScript.h
1186

You cannot get all the expressions and evaluate them for a simple case like this :-

SECTIONS {

.text : {
start_text_val = .;
. = . + 1000;
end_text_val = .;
*(.text)
}

}

rafaelauler added inline comments.Mar 12 2015, 12:50 PM
include/lld/ReaderWriter/LinkerScript.h
1186

Thanks for detecting this unhandled corner case. To fix this, we need to return all expressions in "getExprs()" in the original order of the linker script. Currently, we return it in the reverse order, so "end_text_val = ." is evaluated before others. I'll reverse the order and I'll add this test as a new test case.