- User Since
- Apr 11 2016, 8:56 AM (196 w, 4 d)
- Removed support for corner case where there is just a filespec. This permits the INPUT_SECTION_FLAGS logic to be moved to readInputSectionDescription from readOutputSectionDescription.
- Changed comment in test
The implementation looks good to me. Will defer to others that know -fsanitize-coverage=inline-8bit-counters and LTO better. There was a comment in the PR about adding a dummy non-gcable section instead, I think that this would have the same effect as having the linker handle sh-link == 0, but it might be more portable to other linkers. Any thoughts on whether a dummy section would work better?
I'm reluctantly in favour of making the change on the basis that there exists a design using SHF_LINKORDER that lets this happen. My reservation is that in some use cases of SHF_LINKORDER such as .ARM.exidx sections where there is a 1 : 1 relationship between meta-data and "content" (for want of a better word) section then sh_link = 0 would be a sign that something had gone badly wrong. Having said that there isn't any thing in ELF that forbids the use case that this needs. I think that for .ARM.exidx which is the other main user of SHF_LINKORDER we handle it as a special case so this shouldn't make a difference in practice.
I'll remove the support for the corner-case of when there is just a filepattern. Most likely won't get the time today, if I don't then I'll update on Monday. Thanks for the comments.
Happy to submit a patch to implement these relocations in LLD, on the initial implementation I concentrated on what llvm-mc could output. Now, we'd be able to write a test in yaml2elf.
Updated diff. Main changes:
- Use stringswitch with enumerations
- Incorporate simplification suggestions
Thanks for the review. I'll upload another patch that I hope to have addressed all the comments.
As I understand it these relocation types are not often implemented is due to their short range. The Arm and Thumb2 ldr (literal) range (search for A8.8.64 in https://static.docs.arm.com/ddi0406/c/DDI0406C_C_arm_architecture_reference_manual.pdf) is:
Encoding T2 or A1 Any value in the range -4095 to 4095.
Outside of carefully crafted section orders, the chances of the relocation being in range if it went outside of a section boundary is very low. If the relocation is within a section boundary then the assembler can resolve it.
Thu, Jan 16
Address code review comments so far. Main changes:
- Used EnumEntry as in ELFDumper.cpp
- Updated parser with better diagnostic message
- Updated tests
Thanks for the comments, I'll upload a new patch shortly.
Updated diff (and description) to simplify part of the check to os->size > target->getThunkSectionSpacing(). This is conservative, but simple. We can in theory do better, but it is difficult to do so without using heuristics. In effect it is a judgement that ThunkSections close to, but not necessarily at the end of the OutputSection have less of a chance of generating sufficient extra thunks and patches to cause non-convergence.
Thanks for the comments. I'll post an update with a simplification as I don't think I can justify the heuristic without being too hand wavy.
I'm happy to go ahead with this. The maximum alignment for instructions on Arm and AArch64 is 4, which is what the compiler uses for code sections.
Wed, Jan 15
Thanks for the comments, I'll make an update tomorrow. I'll plan to:
- Try the flagsMask + withMask to see if it looks any better.
- Limit the recognised flags to what BFD accepts, I'd like to continue to recognise integers.
- Look at the error reporting in the parser.
- Update the tests.
Tue, Jan 14
Updated the fix, and description after testing on a real linux kernel allyesconfig build. The description contains the full details, which should also be present in the comment. In summary we round up the size of a thunk section to nearest 4KiB when:
- The erratum fix is being used.
- The thunk section may be placed before the end of Output Section so there may be code after it.
- The InputSectionDescription is larger than 4KiB.
Thanks for the comments. I've decided to rework this In light of it not actually fixing the problem in Linux. In the allyesconfig build the overall .text OutputSection is larger than thunkSectionSpacing, but the assert was for in effect an InputSectionDescription within .text so I've had to think again. I'm nearly ready to submit again, just got to test in on the kernel itself first.
Mon, Jan 13
Committed as 10c11e4e2d05cf0e8f8251f50d84ce77eb1e9b8d , I've used the new attribution process https://llvm.org/docs/DeveloperPolicy.html so you should show up as the author of the patch in the commit log. I'll comment here if there are any problems with buildbots.
Fri, Jan 10
LGTM. ld.bfd, at least on AARCH64 will write the address of PLT into the .got.plt, but this may just be out of convenience as it will be overwritten by the IRELATIVE relocation before it can be used.
LGTM thanks for the update.
I can do it early next week, unfortunately a little too busy today. If anyone else can do it earlier please go ahead. To obtain commit access it is worth following https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access
I'm personally strongly enough in favour to approve given no strong objections so far. Will be worth holding off to next week to see if the approval provokes any.
Wed, Jan 8
If I could summarise the comments so far:
- There is some concern that this is not the full solution and there maybe other things that can break interposing.
- The response is that fixing the assembler will be a necessary part of that work.
- There maybe some extra via-PLT calls in shared libraries if -ffunction-sections is not used, but these can be mitigated by giving the symbols non-default visibility, and in any case these would exist with -ffunction-sections.
Thanks for the update, looks good to me.
Tue, Jan 7
Ok LGTM. May be worth waiting a day to see if there are any final objections.
On looking again at the linux kernel makefile
Apologies for the delay in responding, just come back from vacation. I've checked the implementation against GCC and it looks like it will give the same behaviour. I've got one minor suggestion surrounding the clamping of TLSSize to its maximum value. It looks like it would only need to be done once, is there a convenient place, such as AArch64TargetMachine where it can be done once?
Apologies for the delay in responding, the original came in when I was on vacation.
- Am I correct in thinking that the extra dynamic symbols that will result with -pie and no shared libraries are just wasted space, but won't cause anything to happen at run time as there are no dynamic relocations generated against them?
- The main argument for removing is that this our ld.bfd is ad-hoc and this part doesn't help much, removing it would simplify here and permit a simplification in D71795. Is there a plan beyond D71795 to more closely follow ld.bfd behaviour, or is this just an area where we expect cosmetic inconsistencies?
Mon, Jan 6
I'm not sure of the history of LLVM policy on this. I note that as of today it is possible to get different program behaviour (either performance regressions due to extra PLT entries, or interposing) with -ffunction-sections and without as there will always be a relocation for a branch to another global STT_FUNC symbol with -ffunction-sections as it will be in a different section.
Dec 19 2019
Apologies, I'm out of office for the next couple of weeks. If you don't get any other volunteers to review can you ping again in January, I'll be back on the 6th.
Dec 16 2019
Looks good to me. One minor nit about a stale comment. I'll be out of office for the next few weeks, I'll check to make sure there isn't anything critical but otherwise might be a bit slow to respond.
Looks good to me too.
Dec 13 2019
Looks good to me. Will be worth seeing if there are any comments in the US time zone.
Dec 11 2019
Looks good to me.
Dec 10 2019
Updated diff with test changes.
Thanks, I'll upload a new diff with the test changes.
Looks good to me. Will be worth giving Tim or other reviewers a day for a last chance to comment before committing.
Dec 9 2019
Reading through the PR I think that this is the right thing to. I think that there is a possibility of narrowing down the assumption that a Shared definition is common, by checking its address against the loadable segment containing BSS. I don't think it is worth doing that.
Assuming George is happy with them, changes look good to me,
Dec 6 2019
Apologies for the delay in responding. I've got a query about the alignment of the .note.gnu.property sections. As I understand it they should be 8-byte aligned for a 64-bit machine. I also think it would be helpful to warn that if at least one, but not all functions have the branch-target-enforcement attribute.
Dec 5 2019
Dec 4 2019
Uploading diff, I could have sworn I did this early today. Maybe I forgot to press something.
Thanks for the comments I'll upload a new version shortly.
Dec 3 2019
Thanks both for the comments. Updated to use config->wordsize, converted test to x86_64 and added 32-bit x86 test for the alignment 4 case.
Thanks for the comments. Converted test to x86_64, removed indendation and fixed comments.
Thanks for the comments. I've paired the tests back to only print the header type. The GNU readelf string type matches, it looks like GNU objdump is missing the string, I've followed the same convention it uses for PT_GNU_RELRO, which is just RELRO.
Thanks for the comments. I've updated the diff with context, apologies it has been some time since I last posted a review and I forgot. I've also applied the suggested changes to the test.
Updated to add context to diffs.
Updated to add context.
The change looks reasonable to me but I'll defer to the PPC experts for the PPC specific changes, as I'm not that familiar with it.
Nov 28 2019
I've put some comments inline. I'm no CMake expert though, so there may be other reviewers that can make alternative suggestions.
Nov 27 2019
I'm happy with this change. It would be worth waiting a bit to see if there are any comments from other reviewers.
LGTM. I think this looks a lot cleaner thanks. I also agree that the getISDThunk call sites don't need modifying as at this point the Thunk is our destination and we don't want an addend from the thunk symbol.
There are a couple of downsides to putting all the globals in one place. I think the LLD codebase is probably small enough for them to not be too serious:
- It couples pretty much every file to a single place. Touching this file means recompiling everything.
- It can make the code harder to reason about as any file can see all the globals, and can potentially modify them. This is a possible argument for not promoting statics to globals as the scope of a static is much easier to reason about.
Nov 26 2019
Is there any reason not to have just one function inBranchRange() and assume that it needs to take care of the relocation addend explicitly, that way the PC Bias could be limited to the ARM implementation of inBranchRange() and inBranchRangeBeforePCBias wouldn't be needed?
Nov 25 2019
I'm comfortable with the detailed code-changes. I initially thought that using a nested pair might make the code too difficult to read, but it is clear at the construction of the pair which field is which. I'd definitely want to avoid referring the elements of the pair with first.first or first.second, but as we only use it as a key it doesn't matter.
Looks good to me, looks like it matches the discussion in https://bugs.llvm.org/show_bug.cgi?id=44089 will be worth waiting for US timezone based reviewers to comment.
Will look at the code in detail this afternoon.
Nov 22 2019
I'm happy with this too.
Nov 21 2019
I think this is a good idea, I'd say most of my typos are of this form, such as SubTarget vs Subtarget. Will be worth waiting to see if anyone in US timezone has any objections.
I'm comfortable with defining a filter iterator, I think the trade off of increased complexity vs simple interface is worth it. I can see some benefit in `for (Symbol *sym : symtab->symbols())` as it is not clear from sym : symtab that PlaceHolders are being skipped.
Nov 19 2019
This looks good to me too.
One thing that might work better than potentially random problems if the containers aren't cleared, is some kind of assert only, or self-diagnosis mode that could give an assert or error message that a data-structure hadn't been cleared. For example assert(inputSections.empty()); assert(In.bss == nullptr);. With asserts checking I think we could get away with a small set of tests to run twice with some --self-diagnostic-option to check the majority of the cases.
Nov 13 2019
I'm broadly in favour of making this change. I recognise there are some concerns about the intent behind the ELF specification, which is largely written without garbage collection in mind. One reading of it is that garbage collection is orthogonal to group selection, and the all or nothing comments relate to group selection only. Will be good to wait a few days to see if there are any strong objections. I tend towards the conservative approach unless there are some demonstrable benefits from not doing so.
Nov 12 2019
To summarise where I think we are right now.
- D76002 fixes a problem affecting within MCDataFragment eager evaluation of .if expressions. Specifically when there is a label that would be inserted into the MCDataFragment, but at the time of encountering the label the fragment hadn't been created. MC will attempt to reuse an MCDataFragment for new instructions, see CanReuseDataFragment() in MCObjectStreamer.cpp. As noted earlier when the Subtarget changes a new MCDataFragment is created. In the majority of cases this is done via a .arch or .arch_extension directive.
- This patch extends the eager evaluation to cope with two adjacent MCDataFragments. My understanding is that this only occurs in the following circumstances:
- When bundle locking and --mc-relax-all is used, there is a complicated 1 instruction per fragment + fragment merging that goes on here. This is only used in NaCl which I'm not sure what the status of in Chrome is. I think it is at best deprecated.
- When there is a Subtarget change between MCDataFragments.
- In all other cases such as .align, .space and a relaxable instruction there is a separate non MCDataFragment created so we cannot fix these up.
- The patch restricts the eager evaluation to .if as some asm backends do not want the expressions between fragments evaluated eagerly in some cases.
Thanks for the updates, I've no more comments and have no objections.
I'm happy with this too. I checked the ARM Mapping Symbol code, and it only uses the EmitLabelAtPos in one specific circumstance, a pending data mapping symbol which always has a MCDataFragment associated with it.