- User Since
- Apr 11 2016, 8:56 AM (114 w, 6 d)
Fri, Jun 15
Thanks very much for the update. I don't have any more comments at this stage.
Thu, Jun 14
Thanks, I'll wait till tomorrow before committing to see if anyone in the US timezone has any objections.
Thanks for the patches. Out of curiosity did you have any thoughts about how the proposal to ELF was going? https://groups.google.com/forum/#!topic/generic-abi/MPr8TVtnVn4 . In particular compressed SHT_SYMATTR seemed promising.
Tue, Jun 12
Tue, Jun 5
This looks good to me. I've checked that each of the files has some requirement such as an x86 target triple. I've added the code owner Rui and a frequent contributor George to see if they have any objections.
Mon, Jun 4
Thanks for the comments I've updated the diff to show:
Fri, Jun 1
Thanks for the review, the code here is rebased on top of D44928 (not landed yet) which passes STI in as a pointer. The existing code has STI as a member (const reference) so I'd need to change STI-> to STI. but nothing else would change.
This looks sensible to me. If I've got this right when the DW_AT_linkage name differs from the DW_AT_name the symbol name will match the DW_AT_linkage_name so the lookup by symbol name won't succeed if done via DW_AT_name.
Apologies I don't know x86 well enough to be authoritative here. It looks like at least the test case needs to use %rip to see a meaningful difference in the output of mc.
Tue, May 29
Ping. Is anyone willing to approve or has any thoughts on a better design?
I wonder if the next would be the correct way to solve this:
As far I understand, _GLOBAL_OFFSET_TABLE_ seems intended to be used with special types of relocations (this needs clarification).
R_X86_64_GOTPC64/R_X86_64_GOTPC32 and you mentioned R_ARM_BASE_PREL for arm, for example (your D46319 seems to make sense to me)
I am not sure that the second way should work (if I understand everything correctly).
For x86 as you know we define _GLOBAL_OFFSET_TABLE_ in the .got.plt now.
Then for example if we have the following TLS code:addl $_GLOBAL_OFFSET_TABLE_, %ebx #add $0x199f,%ebx, R_386_GOTPC subl $8, %esp #sub $0x8,%esp leal gd@tlsgd(%ebx), %eax #lea -0x10(%ebx),%eax, R_386_TLS_GD
it expects the first line to be resolved to the end of .got (to the GOT base in ABI terminology).
If we redefine the relocations that use _FROM_END to use the location of the _GLOBAL_OFFSET_TABLE_ (I think that is what this patch tried to do),
then it will not help us I think as we will end up with the address of .got.plt at the first line. And the code will fail to find a proper TLS slot as a result since .got.plt
can be far away from .got in LLD.
Apologies for the delay in responding, Monday was a public holiday in the UK. I'm not sure if I have much to add over what has been already said. My understanding from when I looked at pr36555:
- Each target has a convention on where _GLOBAL_OFFSET_TABLE_ is placed.
- glibc uses _GLOBAL_OFFSET_TABLE_ to find where it has been loading, it expects the link time address of _DYNAMIC. This seems to be the case for all targets.
- glibc reserves entries in the .got.plt to support lazy loading, these seem to be accessed via the dynamic tag DT_PLTGOT rather than relative to _GLOBAL_OFFSET_TABLE_ despite the comment mentioning _GLOBAL_OFFSET_TABLE_ and _GLOBAL_OFFSET_TABLE_, I suspect that some time ago these used _GLOBAL_OFFSET_TABLE_ but was at some point switched to .got.plt and .got.plt with .got.plt possibly used for _GLOBAL_OFFSET_TABLE_ if the target sets _GLOBAL_OFFSET_TABLE_ to .got.plt.
- Some targets have a convention of loading the value of _GLOBAL_OFFSET_TABLE_ and accessing variables using some offset from the _GLOBAL_OFFSET_TABLE_.
- LLD initially set _GLOBAL_OFFSET_TABLE_ to 0 and defined some of the relocation expressions from the end of the .got.
- This works for most programs as the precise value of _GLOBAL_OFFSET_TABLE_ isn't used, as long as the relocations that are calculated relative to it come out consistently. This obviously doesn't work for glibc as it uses _GLOBAL_OFFSET_TABLE_ directly.
May 24 2018
Thanks for the update. That looks good to me.
May 23 2018
I think that this is a sensible thing to do and it also helps out when trying to build compiler-rt for armv7-m as compiler-rt currently assumes that you have a target with a VFP when many armv7-m devices do not.
May 21 2018
Thanks very much for the comments. Although the STI is available at the time the instruction is encoded; due to relaxation being a separate pass resolved after all instructions have been emitted we don't have the STI to pass to applyFixups() as all we have is a list of fragments. For example the MCRelaxableFragment has to store the STI as it might need to re-encode the instruction during relaxation. Thinking about some possible alternatives:
- Make another pass over the input during relaxation (and at other times we need the STI, such as emitting NOPS in alignment padding), so that we know what STI is in effect at the time. I think that this would be a much larger change, in effect making the assembler 2 pass.
- Maintain some kind of mapping table between fragment and STI. I think that this is logically equivalent to storing the STI in the fragment though.
- Store the STI per MCInst, this seems wasteful and I don't think it would work for writeNops() as these come from alignment and padding fragments.
May 18 2018
Thanks for the update, looks good to me.
May 17 2018
Thanks for the update, typo aside they look fine to me.
May 16 2018
I've rebased the patch against current trunk. The littleEndian parameter was removed from applyFixup, NFC for the patch.
May 15 2018
Thanks for the update. I've left some minor comments but I'm overall happy with the change.
Thank you. I've made the suggested changes and committed under r332332
May 14 2018
Thanks very much for the comments. I've made the following changes:
- Used a bitfield for KeepUnique
- moved findKeepUniqueSections to the driver
- Simplified test case with nops, and removed absolute paths.
May 11 2018
Thanks, I'll wait till next Monday before committing.
Rebased against master. I'd like to see if I can progress this a bit further as there is now interest in Android to switch to LLD and this issue is preventing some of Bionic from being linked with LLD.
May 10 2018
May 8 2018
If I've understood correctly (from test/ELF/comdat.s) this will affect relocations from all sections to discarded local symbols. If I'm right I think that this is too wide a use case. My understanding is that in gold and bfd this case is specifically for relocations from Dwarf debug sections to discarded local symbols. Relocations from non-dwarf debug sections to discarded sections will be either an error or ignored depending on the use case.
May 4 2018
May 3 2018
Updated diff to remove relocations from usesOnlyLowPageBits(). While this is true for these relocations the usesOnlyLowPageBits() function is never called for local exec TLS so by the same logic as applied to D46249 (don't add dead code) I've removed them from the patch.
Ok I'll abandon this one and I'll update D46255 to remove the new TLSLE relocations I added to usesOnlyLowPageBits as the same argument applies.
Perhaps it is worth a comment instead as the function name implies a complete list. For example "// not a complete list, excludes relocations that can always be resolved statically such as TLS local exec."
May 2 2018
Looks good to me. I checked the relocation against the table in https://msdn.microsoft.com/en-us/library/windows/desktop/ms680547(v=vs.85).aspx#base_relocation_types
Thanks for the comments. I've rebased on top of D44928, only change to the source is STI. goes to STI->. I've added .arch directives to the test to check that we can switch and keep the correct check. By the way I'd be really grateful if you could accept D44928, I've been threatening to commit it unless there are any objections, but I'd feel better if the review was accepted.
May 1 2018
Apr 30 2018
Apr 27 2018
I've not had any more comments this week on the approach to instruction bundling. Would anyone have any objections to committing this next week as this will fix pr36542, and will at least not make the writeNops case any worse?
I'm happy to LGTM. Apologies for the delay.
Apr 23 2018
FWIW I'm in favour of this approach, and of merging the Asm and ObjectStreamer as I've recently found a case where this would have been useful (deriving a MCSubtargetInfo when emitting constant pools). There were others with stronger objections though.
I've added reviews for the support of writeNops(). My apologies for the size of the changes, the first three make sure the STI is recorded in the fragment so that we can retrieve it when calling writeNops. The final one passes STI through to writeNops and removes STI from the X86 and ARM AsmBackends.
1 [MC] Move bundling and MCSubtargetInfo to MCEncodedFragment [NFC] D45959
2 [MC] Add MCSubtargetInfo to MCPaddingFragment [NFC] D45960
3 [MC] Add MCSubtargetInfo to MCAlignFragment [NFC] D45961
4 [MC] Use local MCSubtargetInfo in writeNops D45962
Thanks for the review. I'll wait a bit to see if there is a second opinion, I too would value more input on instruction bundling as I've not used it myself. I've completed a set of patches that will pass through the subtarget to writeNops() and remove STI from the X86 and ARM AsmBackends, these have some further reorganisation of the position of instruction bundling inside the MCFragment hierarchy. I'll add these as dependent reviews to this one and link them in a comment.
Apr 19 2018
Thanks for adding the test, it looks like it will cover my concern.
As well as Thunks and errata patching, I'm guessing that any future generic (not just Android) uses of packed dynamic relocs might also need symbol addresses to be recalculated (https://sourceware.org/ml/gnu-gabi/2017-q2/msg00000.html)
Apr 18 2018
I've updated the diff to account for instruction bundling. When an instruction must be put in the same fragment for the purposes of bundling we now check that the MCSubtargetInfo for the fragment is consistent. I've added some tests to check that we detect an attempt to change subtarget within a bundle.
Apr 17 2018
If I've understood correctly, this will evaluate the expression if there is something simple and unrelaxable such as (Armv7a)
Apr 10 2018
Thanks for the comments. I think your second one brings up a couple of wider problems that I'll need to address:
- getOrCreateDataFragment() needs to start a new fragment when the original contains instructions and the new one has a different Subtarget. Otherwise the later Subtarget will overwrite the first, I don't think that this makes any practical difference for the MCAsmBackend functions changed in this patch, however it will do for the WriteNops().
- For instruction bundling in cases where the previous fragment must be used then it must either have the same Subtarget, or we permit the later Subtarget to overwrite the first.
Apr 6 2018
I've updated the diff, description and title to include passing through MCSubtargetInfo to applyFixup() as well. I think that it is worth combining these changes into one as it is possible to introduce a code generation fault with just fixupNeedsRelaxation() receiving the STI.
Apr 5 2018
I think that an EmitAssemblerFlag that is used in the same way as the flag to switch from Thumb to Arm can toggle the support for the Architectural NOP without having to do large amounts of source code changes, most of them outside the ARM backend.
This is really hacky... the fragments in question should really have an STI instead. Plus there are actually separate features; whether NOP is supported in ARM mode (which is what hasNOP queries), and whether NOP is supported in Thumb mode (i.e. whether Thumb2 is supported). Plus other architectures have similar problems (e.g. x86 changes the generated NOPs based on the target). If you're not going to fix it properly, please just leave it as-is for now, with a FIXME.
Perhaps --check-library-dependency is too strong a name for the cases that this can catch. I think it is really a --find-library-backreferences that can be helpful as part of resolving differences in behaviour between lld and other unix linkers, but there are limitations.
Apr 4 2018
I've done a bit of thinking about how best to get the necessary information to ARMAsmBackend::writeNopData and ARMAsmBackend::applyFixup. I made an attempt at passing MCSubtargetInfo through the necessary layers so that STI could be removed. I think that this is impractical for writeNopData, although possible for applyFixup. What I propose to do:
- The writeNopData() currently checks to see if the Architectural NOP instruction is supported, based on the MCSubtargetInfo in ARMAsmBackend. I think that an EmitAssemblerFlag that is used in the same way as the flag to switch from Thumb to Arm can toggle the support for the Architectural NOP without having to do large amounts of source code changes, most of them outside the ARM backend.
- All the fixups that test the MCSubTargetInfo for target features are for instruction fixups, these are covered by the MCRelaxableFragment type that already has STI embedded in it. Rather than adding STI to the Data Fragments with relaxations I propose to pass a pointer to STI that will be null for Data Fragments.
Apr 3 2018
Thanks for the comments and apologies for the delayed response (Easter break). The patch here is somewhat fortunate in that the MCRelaxableFragment already has a reference to the MCSubtargetInfo. The other instances ARMAsmBackend::writeNopData and ARMAsmBackend::applyFixup may be called on other Fragments that don't have the MCSubtargetInfo member such as MCAlignFragment and MCDataFragment. I'll do some investigation to see what the best options are.
Mar 29 2018
Thanks for the update, I'm happy with the changes and the new test case.
Thanks, I'm happy with the changes.
Given that this is important for compiling the Linux kernel with clang I think that it is worth improving compatibility with GCC. LGTM.
Mar 28 2018
That looks reasonable to me. Another heuristic that can be used in at least C programs is to look for the sections with the most number of calls to them and place these in the centre. This tends to reduce the number of thunks as many library functions are called from all over the program, however I think the performance loss of going through a thunk means that moving the hot functions makes more sense.
Thanks for putting in the effort to doing this, especially updating the tests! This can also apply to AArch64 as well. The only concern that I have right now are whether it is possible to construct an example that oscillates between short and long thunks and would infinite loop if it were not for the iteration count. In general transitioning from a short thunk to a large thunk will increase the distance between sections, however there is one case where adding a thunk, or increasing the distance can shorten a distance.
This sounds reasonable to me. For large programs there will be sufficient thunks that the small benefit to readability (all your thunks can be found at the end of the section) won't apply.
With respect to parsing, at a recent conference one of the main requirements embedded developers had on ld.bfd was a machine readable (xml, json, etc.) map file. If ease of parsing is a concern, could we separate those concerns and have an alternative way to output the necessary information?
Mar 27 2018
It can be very useful when developing embedded systems. There is often a case where the load address of an output section is in flash and it is copied to a different execution address in sdram. Seeing the load address in the map file can be useful when debugging problems, especially when the device won't boot up. It is probably sufficient to give the LMA of each OutputSection as the LMA of the input sections can be derived from it.
Thanks for the clarification. With that in mind I'm much less concerned that adding "r" as an alias will make extra warnings more difficult. I agree that there should be at least a warning for register long x asm ("wn") although that is separate from this patch. Has anyone else got any objections?
Thanks for pointing that out. I should check each use of the member to see if the information can be used incorrectly. I think it would be good if we can remove the MCSubtargetInfo, I have my suspicions that it might be used from Assembly and for BuildAttributes generation (on Arm), again some more checking is needed.
Mar 26 2018
I think that we may have a bit of a conceptual difference with GCC here. As far as I can tell in GCC it doesn't matter if "rn", "wn", "xn" or even "bn" is used, this refers to register 0. The Operand constraint such as %w0 is used to control the substitution so the r registers aren't strictly aliases like the other registers. The comment
The S/D/Q and W/X registers overlap, but aren't really aliases; we
don't want to substitute one of these for a different-sized one.
suggests that this may have been the intent that Clang behave differently to GCC in this respect. For example in Clang there appears to be a bit more significance placed on the "wn" or "xn", for example clang will warn in the following example, whereas gcc will not:
I had a quick look at the PR last week after some people mentioned that they were seeing link errors in their project and were blaming the linker and not the compiler. I'm not a debug expert so it would be best to get someone else to look over the test case. I think the source code changes are probably the best that we can do right now.
Looks good to me.
Mar 21 2018
No performance implications that I know of, the source code history might contain some clues. I'm guessing that the function wouldn't be significant in the context of a compile so I personally think it would be good to clean it up.
Thanks, I'm happy with that. Will be worth waiting to see if there are any comments from the US timezones.
Mar 20 2018
Looks a lot better than before. One possible alternative to making the subtarget then subtracting ARM, could we use the same way that nacl adds the nacl-trap feature in ParseARMTriple?
Mar 14 2018
https://sourceware.org/bugzilla/show_bug.cgi?id=22969 and https://sourceware.org/bugzilla/show_bug.cgi?id=22970 raised on gold and ld.bfd respectively.