- User Since
- Apr 11 2016, 8:56 AM (153 w, 23 h)
- I tend to collect the CET features in GnuPropertySection (SyntheticSection) class or in Writer->run like combineEhFrameSections(), but I found it should first get the CET features, then decide to create the GnuPropertySection or not (SPltSection too).
- I didn't remove the Config->X86Feature1AND, because I need it to tell the createSyntheticSections() to create the GnuPropertySection or not (SPltSection too).
Thu, Mar 14
Wed, Mar 13
Updated diff to address review comments.
Thanks for the comments I'll update the diff.
Tue, Mar 12
As an earlier comment mentioned, there is quite a lot of code that depends on the details of the .note.gnu.property section added very early to Driver.c which is not where we'd normally expect to see it. It also looks like you using quite a lot of code to get an InputSection to represent the output .note.gnu.property section. Given that there will be multiple Input .note.gnu.property sections but only one Output .note.gnu.property section, I think it would be cleaner to use a SyntheticSection to represent the .note.gnu.property section. SyntheticSections can implement all sorts of custom functionality that could move the parsing and merging code out of Driver.cpp. My suggestion is to:
- Create a new SyntheticSection (see SyntheticSections.h), something like NotePropertySyntheticSection. This will represent our single output .note.gnu.property section. It can store the status of the FEATUREAND bits.
- Add an entry to InStruct for a single instance of it that we will use for the output.
- After InStruct instance is created in Writer.c but before the relocations are scanned to create PLT entries iterate through InputSections and pass each .note.gnu.property to the NotePropertySyntheticSection. Then erase these InputSections from the InputSections. The combineEhFrameSections<ELFT> function in Writer.cpp is one way of doing something like that.
- When you add the .note.gnu.property InputSection to the NotePropertySyntheticSection you can update the FEATUREAND bits
- The writeTo() member function of the NotePropertySyntheticSection can output the data needed to make up the output.
Updated to make setSizeAndOffsets() use of Size less confusing, fixed some curly brace nits.
Updated the diff with the requested changes. I've followed pcc's idea to make the section behave more like the EHFrame section. This gets rid of some of the nastier relocation handling code in writeTo().
Thanks for the comments, I'll upload another diff with the changes.
Mon, Mar 11
I personally favour a separate PT_NOTE section per alignment, that gives a note parser simple rule to iterate through the section (the p_align field). I think ordering by decreasing alignment will work in practice for the note sections that we have now, but it may not be in the future. I also think that this is worth doing even if clang is changed to generate .note.gnu.property sections that are 4-byte aligned; clang will often use libraries compiled by gcc (or some other compiler) that does use 8-byte aligned sections.
Fri, Mar 8
I think https://bugs.llvm.org/show_bug.cgi?id=41000 "PT_NOTE segment not packed correctly" has some relevance to this review. In particular the .note.gnu.property sections on 64-bit machines have 8-byte alignment, merging these with 4-byte aligned note sections such as build-id can cause problems for consumers of PT_NOTE (see also https://sourceware.org/ml/libc-alpha/2018-08/msg00340.html for a discussion on the glibc mailing list). My understanding is that the eventual solution was to collate 4 and 8 byte SHT_NOTE sections separately and output a PT_NOTE section for each. I think that we should follow that and have tests for it.
Tue, Mar 5
I've no objections to adding the command line as a Darwin only option. Implementation looks fine to me although I've not got any prior experience with Darwin.
Mon, Mar 4
Fri, Mar 1
It looks like this change broke the clang-armv7-linux-build-cache which uses clang 5.0.1 running on a 32-bit ARM container http://lab.llvm.org:8011/builders/clang-armv7-linux-build-cache/builds/11063 . I've committed a temporary fix that r355195 that changes FileSize to use size_t rather than uint64_t. Mentioning it here just in case you want to fix it in different way.
Wed, Feb 27
Unfortunately I can't point you at any official documentation of the flags yet, the next release of the 64-bit ABI will have the details and should be released soon. I hope to post some patches soon after that has been done.
Soon happened to be a few hours ago: https://developer.arm.com/docs/ihi0056/latest/elf-for-the-arm-64-bit-architecture-aarch64-abi-2018q4 search for BTI and PAC
Thanks for putting this feature forward. I've not had a chance to go through everything in detail but I thought it would be important to mention that AArch64 has a similar set of features (Pointer Authentication PAC and Branch Target Identification BTI) that are going to use .note.gnu.property sections with GNU_PROPERTY_AARCH64_FEATURE_1_AND (same meaning as GNU_PROPERTY_X86_FEATURE_1_AND), with two associated feature bits GNU_PROPERTY_AARCH64_FEATURE_1_BTI and GNU_PROPERTY_AARCH64_FEATURE_1_PAC. AArch64 does need a modified PLT entry to make this work but it doesn't use a .splt, in effect an extra instruction at the top of the PLT if BTI is used and one at the end if PAC is used, or both if both BTI and PAC are needed. Given that we will have at least two targets implementing a similar mechanism but with target specific details then we'll either need to make it my responsibility to generalise the .note.gnu.property implementation so that it can support both AArch64 and X86, or we make it generic from the start. The main difference for AArch64 is that there are two independent feature bits to track.
Tue, Feb 26
Rui is correct, neither AArch64 or Arm support big endian at the moment. The output of the linker would certainly be broken. AArch64 and ARM make some demands of the linker that mean some additional complexity is needed:
- AArch64 instructions are little-endian, data is big-endian. Input objects have little-endian instructions.
- ARM v6+ in be-8 mode (default), instructions are little-endian, data is big-endian. Input objects have big-endian instructions and a linker must endian reverse the instructions.
- ARM prior to v6 and v6/v7 in be-32 mode (backwards compatibility only) have big-endian instructions and data.
My understanding is that big-endian is used in some networking and embedded systems as they have native big-endian data, however these tend not to make it to consumer hardware.
Mon, Feb 25
Fri, Feb 22
Thu, Feb 21
Thanks for the update. To summarise I'd like to keep the integrated and non-integrated assembler in synch for Linux like Android, even at the risk of adding an fpu when it isn't needed. I think that given how difficult it is to not use NEON, I think the mfloat-abi=soft guard you've put on will be sufficient. By the way, I'm hoping we aren't talking past each other with the default for armv7-a. I've tried to put as much of what I understand in the comments and I hope the answers make sense, or you'll be able to correct me where I'm wrong. If the timezone delayed comments aren't working well it may be worth finding me on IRC, I usually leave the office about 6:30pm but I can easily arrange a time and log on later if you wanted to discuss.
At the moment I'm happy with the current implementation assuming that the .ARM.exidx sections from input objects are treated are modelled as InputSections. If I'm going to need to rewrite all of it as a single synthetic section I can do it, but that is quite a bit of work and there isn't a guarantee it will be any better. Would it be possible to take this patch for the benefit of the teams waiting on the PR, and I'll work on a replacement? If it turns out to be better we can switch to it?
Wed, Feb 20
LGTM. Thanks for the fix. My apologies for missing that at the time, it looks like a cut and paste error on my part.
Tue, Feb 19
My main concern is that this changes the default behaviour for arm-linux-gnueabi and arm-linux-gnueabihf targets. I've put some suggestions on what I think the behaviour should be.
The implementation changes in the Darwin toolchain look fine to me, although with respect to the command line option I think Petr Hosek's message on cfe-dev is interesting:
GCC implements -nolibc which could be used to achieve the same effect when combined with -nostartfiles (and -nostdlib++ when compiling C++). I'd prefer that approach not only because it improves compatibility with with GCC, but also because it matches existing flag scheme which is subtractive rather than additive (i.e. -nodefaultlibs, -nostdlib, -nostdlib++, -nostartfiles). Clang already defines this flag but the only toolchain that currently supports it is DragonFly.
The test added here is intermittently hanging on our AArch64 buildbot.
Feb 14 2019
The LLVM associated symbol metadata uses SHF_LINK_ORDER (https://llvm.org/docs/LangRef.html#associated-metadata) and I think this is used by the at least some of the sanitizers. I think it would be difficult to get rid of completely.
Feb 12 2019
I think what I have there is correct but it obviously could be clearer. I've added some alternatives inline.
Feb 11 2019
Uploaded new diff to address review comments, main changes:
- Added expanded comment explaining why we need these sections.
- Moved the addition of the synthetic sections to where we add the sentinel and added test that fails if we don't.
Thanks for the comments. I will upload another diff in a few minutes.
Feb 6 2019
Thanks for the update. My Arm tests were fine as well. This all looks good to me, I don't have any more comments at the moment.
Feb 5 2019
I've made a few suggestions on the comments, otherwise looks good to me. I think it is better to keep the complexity in one place and document it than spreading it out. It would be good to get some input from George and Rui, or someone else that hasn't been deep diving on ifuncs recently to see if they had any problems following the code. I've run this on some address comparison tests I had lying around for AArch64 and it looks good on these. I'll run these on Arm tomorrow, as that puts the .igot.plt in the .got and will comment if I find any problems, but I'm not expecting any.
I've suggested a test case that fails when toPlt() is removed in D57743. Aiming to have comments on this review by the end of the day.
Jan 31 2019
The mix of soft-floating point and hard-floating point (with soft-float interface) can happen on embedded systems code as well. The solution Arm's proprietary toolchain had to this problem is for the linker to select libraries based on build-attibutes. It would detect the rounding mode build attribute and select the appropriate library variant, but that isn't available to us here. A colleague suggested an easily available customisation point that someone that had a local need for a different rounding mode could implement.
Thanks for the update. I'll be able to take a look at this in detail next week.
Jan 29 2019
Thanks for sharing the WIP. I think the approach will work and should match the behaviour of ld.bfd more closely. I've been trying to think of alternatives to redirecting all the symbols for the non GOT generating case. Off the top of my head:
- Keep the _PLT RelExpr variants and instead of changing the symbol alter the Expr for the non-plt non-got generating reference. This has the disadvantage of needing _PLT variants for any RelType that might fall into that category.
- Still create a new symbol for the non-plt, non-got generating reference, but instead of replacing the existing symbol, just change Sym for the affected relocations, leaving the others untouched. I think this would mean you wouldn't need to create DirectSym but it would mean having to store the symbol somewhere so that subsequent non-plt, non-got generating references could use it.
Not sure if either of those are better at this point though.
Jan 28 2019
Looks good to me. Thanks for the fix.
Jan 25 2019
Jan 24 2019
I've not got a strong opinion here, for what it is worth:
- The code changes look fine.
- I can see LMA being useful to embedded developers, or tools vendors, given a binary without access to or ability to generate a map file. Would they intuitively look for objdump for that information though? I don't know.
- gnu objdump also gives file offset and flag information which probably have more justification as they can't be derived from other options. If we are looking for output format compatibility then we ought to add those too. I don't think that there is any expectation of that at the moment though.
- If there is a command line option to print the LMA, then there is a risk that a good portion of people won't read far into the help to discover it exists. For that reason I quite like the idea of printing the column only when there are program headers that have different VMA and LMA.
Jan 17 2019
I'm happy to split the patch into basic support and consider how we support GC separately.
Jan 16 2019
Given the comments from bd1976llvm about the intention vs the wording of the ELF specification, and the restriction in this patch of the behaviour to non-comdat groups (assuming intentional) then I think we should take a step back and decide why we want to make the change and what the requirements are. I think that if we are taking the annobin interpretation of the specification with respect to garbage collection and what to be strictly compliant then we should apply the behaviour to all groups. Another interpretation is that only use case for non-comdat groups is to affect garbage collection so we can handle them differently to comdat groups; given the only use case I've seen in many years is annobin then this could be low risk without adding group member squared extra dependencies for all comdat groups. Alternatively if we just want to support annobin then there may be a better way of doing this without involving groups. There is a relocation from each SHT_NOTE section to the code section that it describes. It could be possible to handle these in the same way as .ARM.exidx sections by making the SHT_NOTE section from the group dependent on the code section it relocates against. If we did that then you'd only need to preserve the SHF_GROUP flag on the SHT_NOTES section and wouldn't need to make all the sections in the group depend on each other.
Jan 15 2019
Ping. I think that all the currently requested changes have been made and the out of range link errors problem caused by enabling it has been fixed in D56396.
Jan 14 2019
I've had a look at the code in a bit more detail and have left some comments. I think we should state somewhere, either as a comment or in a commit message about how we handle garbage collection/icf for Section Groups. As it stands LLD marks all non-SHF_ALLOC sections as live so the primary use case for the non-comdat section groups (ensure removal of function removes all the build notes associated with it) isn't going to happen.
I think we need to be very careful here. If I've understood this correctly then if this functionality is used for anything critical then a manually supplied target will be needed when doing cross-linking. For example my host LLD is x86_64, is just called ld.lld and will have an inferred x86_64 target triple. If someone customises the behaviour of LLD on the triple in a way that doesn't get caught by the test suite then we could get some strange breakages when doing cross-linking. I personally would prefer to see any option like this not try and auto-infer the target unless it can do it reliably and accurately from the input objects and I don't know if that is possible for all supported targets.
Jan 10 2019
I don't it is correct to use common-page-size to override the value of max-page-size. My understanding of common-page-size is that it is used to save memory in some cases when the same page in the file can be mapped at a different virtual address. At the moment I don't think LLD supports this so there isn't any benefit in having a different common-page-size.
Thanks, I've updated the test to use multiple echos. I also specified a load address of .text_high to force the generation of a second program header. A single one was generating a close to 4Gb file.
Jan 9 2019
Thanks very much for the review. I'll wait a day before committing (I don't think that this is controversial) and will make the suggested changes in the test there.
I can confirm that the group sections are coming from the annobin tool.
From the man page: https://www.mankier.com/1/annobin
"attach""no-attach" When gcc compiles code with the -ffunction-sections option active it will place each function into its own section. When the annobin attach option is active the plugin will attempt to attach the function section to a group containing the notes and relocations for the function. In that way, if the linker decides to discard the function, it will also know that it should discard the notes and relocations as well.
Jan 8 2019
For some background, the static glibc.a in the latest (unreleased) build of Fedora contains these non-comdat group sections.
I've got a small suggestion about the comment. Otherwise I'm happy with the change. I think that it is reasonable to increase the alignment for all operating systems as the amount is not likely to be significant for platforms that make use of TLS.
Jan 7 2019
I've created D56396 to cover the relocation out of range errors seen when linking the linux kernel.
Updated to remove --pic-thunk command line option. We now use only the ld.bfd option --pic-veneer. No other changes.
Jan 2 2019
Dec 20 2018
LGTM as well
LGTM as well
Dec 18 2018
Unless we have evidence that many users are making the mistake of linking against incompatible shared objects, my personal preference is that it is better to follow ld.bfd and ignore. I think most toolchains and distributions make this mistake hard to do by accident.
I have several reservations about doing this:
- BuildAttributes are only defined for relocatable object files, the ABI does not require or prevent them from appearing in Shared Objects or Executables therefore nothing should rely on their presence.
- Can it be guaranteed that the shared object we link against will have the same attributes as the shared object at run time?
- Should the attributes of the shared object be aggregated with the attributes from the relocatable object?
Dec 17 2018
Thanks for the comments. I used projects as that was what was in the LLVM Getting Started guide. I do agree that it is confusing though. I've put a bit more text at the start about where to get compiler-rt source code and where to place it. I've also moved the Baremetal CMake cache to the end as it is different to the other two sections.
Dec 14 2018
Thanks for the comments. I've added some more information to mention where the flags come from and added a reference to InputFiles.cpp where updateSupportedARMFeatures() contains the mapping between architecture and the flags along with some explanation.
Dec 13 2018
Ok I've removed the extra register. I've revisited my notes and the instructions that could potentially fault if not 8-byte aligned (ldrd and strd on v5te) don't exist in v6-m anyway.
Dec 12 2018
It is true that the AAPCS only requires 8-byte stack alignment at procedure call boundaries. I have seen some interrupt handlers assuming 8 byte stack alignment, or at least not considering the case that it might not be, so I thought it better to be safe given that thunks are not easily visible to the programmer. I'm not wedded to the idea though so if you'd prefer I can remove the save of the extra register.
Dec 11 2018
Yes, thanks for the spot, there is a copy-paste error. My apologies for missing that; now corrected.
Thanks for the comments. I've applied the suggested change to remove the else.
Overall I'm happy with this change as I think it is simpler than adding another call to Writer<ELFT>::run(). The one remaining thought is whether Writer<ELFT>::run() can be moved into finalizeAddressDependentContent() as there shouldn't be anything after that function that changes address. Could we move the remaining Writer<ELFT>::run() to the end of finalizeAddressDependentContent() ?
Dec 10 2018
Thanks very much for the comments, I've applied the suggested changes and updated the docs.
I quite like the idea of renaming maybeNeedsThunks(), for example addAddressDependentContent()? We could make sure that there is always at least one pass of assignAddresses done there.
Dec 7 2018
Thanks for the fix. I kind of thought it might be possible to lazily postpone symbol assignments to other symbols till after linker scripts but this requires quite a bit of contextual information to know when we need the value of a symbol expression. It looks like ld.bfd has a similar post-script fixup for symbol sections and values although it just evaluates all linker defined symbols again rather than all the addresses again. That could be another option if there were performance concerns about redoing the whole of assignAddresses.
Dec 6 2018
I've had a chance to check through this and other 3 patches in series. If we are intending to do the minimum to stop the linker from crashing and assume the user knows what they are doing when they do this, then these look correct to me. I think these changes are important for use of LLD to link the linux kernel.
Dec 3 2018
I've managed to build the linux kernel with D55211, D55215 and D55218 (after working around https://bugs.llvm.org/show_bug.cgi?id=39857) . I've not got the means to run the kernel, but it seems like the rela.dyn is present and correct and can be dumped with readelf.
Will try and test these patches out on the linux kernel build on Thursday (going to be away Tuesday/Wednesday). One thought I had that could potentially simplify all three patches is to treat .dynamic, .dynstr and and .dynsym as a single discardable unit. For example:
- The .dynsym is not useful without the .dynstr.
- If there is a .dynsym then there must be a symbol that needs looking up with a dynamic loader, hence there is a strong case for the .dynamic section.
Nov 30 2018
This could also occur with ALIGNOF and SIZEOF. It looks like Script->getOrCreateSectionName() is only called when a part of a script references another, whereas SECTIONS uses Script->createSectionName(). Might it be better to set ForceKeepIfEmpty in that function.
I personally would prefer D54621 but I think this is better than producing incorrect output.
I think that this type of script is common in embedded systems. A company will have a product family represented by a common core + variable feature sets, they often have a single linker script that is shared between the projects. This often results in some Output Sections being empty for some of the products. On larger projects it can be difficult to know when this will occur, for example all it takes is one developer to remove the use of a library to silently break a working program.
Nov 29 2018
Yes this is fine. The effects are entirely within the Android target.
Nov 28 2018
Thanks for the comment. I've made the suggested change.
Thanks for the comments. I've added a specific error message for when _GLOBAL_OFFSET_TABLE_ is already defined.
Nov 27 2018
No problem, I'm still ok with the changes.
Thanks, superfluous parentheses removed.
Updated to remove now unnecessary cast and to make sure REQUIRES has a colon at the end. Reposting in case Rui has any comments.
Thanks for the comments. I've made the lambda a free function and have made sure REQUIRES has a colon after it in all the tests.
Nov 23 2018
Nov 22 2018
I think this looks good. It is likely that US based people will be away for thanks giving so it will be worth pinging Rui next week.
LGTM. I think that this one is low risk anyway.
Nov 21 2018
Thanks, I'll wait a bit before committing to see if Rui has any comments.
Agreed, I've made the change to In.RelaIplt->Name == In.RelaPlt->Name.