- User Since
- Apr 11 2016, 8:56 AM (127 w, 4 d)
Wed, Sep 19
Thanks for the updates. LGTM.
Tue, Sep 18
Thanks for the updates. I don't have any more comments.
Mon, Sep 17
My best suggestion for a test that needs to exceed the margin is to use input sections with high alignment requirements. If these are on a natural boundary prior to adding thunks, then adding a thunk will result in a large number of bytes being inserted to maintain section alignment. As long as these padding bytes are between the source and destination of a branch and exceed the margin, a test case can be constructed without needing to insert hundreds of thunks.
Interesting. I think that the implementation you have here will converge faster, as it makes it more likely that pass 1 has all the thunks, at the expense of potentially generating more thunks than strictly required. However, if the goal is simplicity, I think you'll need to do everything in one pass and accept that there will be corner cases that might not link if the margin isn't sufficient. For ELF and arbitrary linker scripts I thought the chance of failure was too high; for COFF the chance of failure may be low enough. I think the most likely edge case in COFF will be the presence of sections with high alignment requirements, as inserting thunks could cause a lot of bytes of alignment padding to be added.
If I understand it, to get a test case we'd need to have a branch that is just in range (including margin) such that no thunk is generated, but where adding sufficient thunks causes that branch to go out of range. The brute force way to do it would be to generate more than (margin/thunk-size) thunks, but even with macros that would be a large, tedious test to write. One possibility in ELF, though I don't know how well it would carry over to COFF, is to have some sections with high alignment so that inserting a thunk could displace one of these sections off an alignment boundary and hence add much more size than just the thunk.
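The alignment-padding effect described above can be sketched with some simple arithmetic (an illustration only; the function names are mine, not from the patch):

```c
#include <stdint.h>

/* Padding needed to bring addr up to align (align is a power of two). */
static uint64_t alignPadding(uint64_t addr, uint64_t align) {
  return (align - (addr % align)) % align;
}

/* Total growth from inserting a thunk of thunkSize bytes immediately
 * before a section that currently sits exactly on its alignment
 * boundary: the thunk itself plus the padding needed to restore the
 * section's alignment. */
static uint64_t growthFromThunk(uint64_t sectionAddr, uint64_t thunkSize,
                                uint64_t align) {
  return thunkSize + alignPadding(sectionAddr + thunkSize, align);
}
```

For a 12-byte thunk in front of a section aligned to 4 KiB this gives 12 + 4084 = 4096 bytes of growth, which is how a small number of thunks can consume a large margin.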
Yes I'm happy with this too.
Does FreeBSD use 4 KiB granules for the page size, or the LLD default of 64 KiB? From what I can see in the MMU documentation, when 64 KiB pages are used the equivalent of the super page size is 512 MiB, not 2 MiB. If I'm right, then aligning to 2 MiB is only going to be beneficial when 4 KiB pages are used. On the assumption that it doesn't do much harm to people using 64 KiB pages, I don't have any objections to changing it though. Aligning to 512 MiB for 64 KiB pages sounds a bit extreme though.
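The relationship between granule size and block (super page) size follows from each translation table filling one granule with 8-byte descriptors; a sketch of the arithmetic, not code from the patch:

```c
#include <stdint.h>

/* A translation table occupies one granule and holds granule/8
 * eight-byte descriptors, so a block entry one level above the page
 * level maps granule * (granule / 8) bytes. */
static uint64_t blockSize(uint64_t granule) {
  uint64_t entriesPerTable = granule / 8;
  return granule * entriesPerTable;
}
```

blockSize(4096) is 2 MiB and blockSize(65536) is 512 MiB, matching the 4 KiB and 64 KiB granule cases above.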
I've added Stephen Hines as a reviewer to see if this affects Android in a significant way. I'd like to run this by a few people in the OS community and also check that there are no adverse code/file size increases, as these could be significant on low-end mobile devices. Will hopefully get back to you soon.
Fri, Sep 14
If no-one else beats me to it, I'll take a look when I get back from a conference (24th September).
I think the natural default CPU for ARMv6K is the "mpcore". The K stands for the kernel extensions and includes the load, store, and clear exclusive instructions, but otherwise has no effect on code generation. I would expect that ARMv6KZ includes the instructions for TrustZone such as SMC (secure monitor call), but again this doesn't have an effect on code generation.
There isn't a processor called ARM1176J-S, only ARM1176JZ-S and ARM1176JZF-S. I've already left a comment about J and S; Z is for TrustZone and F is for hardware floating point. There isn't a good external source for all the possible ARM11 CPU names, unfortunately. There is a page of trademarks that contains all the ones that I know about: https://www.arm.com/company/policies/trademarks/arm-trademark-list/arm11-trademark.
Thu, Sep 13
Wed, Sep 12
The algorithm looks good to me. For ELF I preferred to give the Thunk a name related to the destination as it makes it a bit easier to follow disassembled binaries, but it is not essential.
Mon, Sep 10
I don't have much to add over Eli's existing comments. I think the general form of (symA - symB) + constant may not be handled well for all instructions, though.
I agree that we should be using APSR_nzcvq rather than CPSR_f, as both assemble to the same instruction encoding (they disassemble to APSR_nzcvq). I think it is worth seeing if there are any objections to removing the ARM-state case msr CPSR_f, #APSR_C, as it saves one instruction. Overall this is looking good to me though.
Mon, Sep 3
Committed on behalf of Bharathi as r341312.
Fri, Aug 31
Changes look good to me, although I'd be happy to remove the max. The only reason I can think of for raising the alignment based on sh_entsize is that it would mean a section (align n, entsize s) could be merged into a section (align m, entsize s) where m < n, but I'm struggling to think of a case where this would be significant. I think that if the compiler has taken advantage of the alignment of a data element or string in a SHF_MERGE section, it sets the alignment of the section.
Thu, Aug 30
Thanks for the update. I still think this looks good to me.
I've been referred back to this by a colleague who is trying to build big-endian with clang and binutils and is seeing an error message out of binutils. I've checked that binutils has always used aarch64linuxb since its initial commit, and I've verified that I cannot build an aarch64_be-linux-gnu executable with binutils without this change, but I can with it.
Wed, Aug 29
I've added a couple of comments. One other thing to think about is the -S output from the compiler. If I compile a trivial test case with -mexecute-only I get the output:
Wed, Aug 22
Thanks for looking. What is our usual response to issues like this? Is it OK for this to go ahead, given that it can be used fine with some linkers? I could put it behind a flag, I think, so that users have the option to turn it off (although that might not be very obvious...).
I think it depends on the context. For example:
- How many existing builds is it going to break?
- How easy would it be to backport the fix in binutils if it were important enough to do so?
- Are there alternatives to binutils available?
- Does the same fault exist when compiling the file with GCC?
Aug 22 2018
Sadly I had to resort to using .space to create large binaries. Creating the binaries is usually quick; disassembling them, unfortunately, is not. I tended to use GNU objdump first, as it skips runs of zeroes by default, to find the address ranges I needed, then used --start-address and --stop-address in llvm-objdump to pull out the bits that I needed.
I've left a couple of comments where I think that there may be some unintentional inefficiencies, but as far as I can tell it looks like it will be correct. I suggest adding a lot more test cases as you go for things like thunk reuse.
Thanks for the test update, I've no further comments.
Do you have a better source of documentation for these linker script features than the GNU docs? I find the GNU docs difficult; for example, they state for NOLOAD that "the section should be marked as not loadable, so that it will not be loaded into memory when the program is run", which I find difficult to interpret in terms of ELF and leaves me needing to read the source code to understand what NOLOAD does!
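For what it's worth, my reading (an assumption based on observed GNU ld and LLD behaviour, not a statement from the docs) is that a NOLOAD output section is emitted as SHT_NOBITS: it occupies address space but contributes no bytes to the file image. A minimal sketch:

```
SECTIONS {
  /* NOLOAD: the output section takes up address space but, in ELF
   * terms, is emitted as SHT_NOBITS, so the file contains no bytes
   * for it and nothing is loaded for it at run time. */
  .noinit (NOLOAD) : { *(.noinit) }
}
```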
Looks good to me now that LLVM support has been approved in D49673.
Aug 21 2018
I'm not sure it is possible to do exactly what the author of https://bugs.llvm.org/show_bug.cgi?id=38625 is doing with a section without SHF_ALLOC. The entire contents of the output section are defined by . = stack_start. As stack_start is calculated by the linker, I don't think it would have been possible to do this with a section.
Looks good to me. I've not seen any other comments and I've no further comments.
The code changes look good to me, although someone more familiar with the ConstantIslands pass may have a better idea of how to prevent splitting. The only other thought about the test is to make it a MIR test and run just the constant-islands pass; for example, see the test constant-islands-cfg.mir. I suspect that it will be hard to write the test, but it should be a lot less fragile.
For what it's worth, I've checked that the bit patterns used in the MOVT/MOVW instructions are correct. In assembler it seems possible to create an error that isn't detectable at link time.
Aug 16 2018
I've uploaded a new version that doesn't use getLastSubtargetInfo(). Thanks for the warning about clang's tools/driver/cc1as_main.cpp; I've not seen anywhere else in the project that uses InitSections(). The code there will become:
As the STI has been computed earlier. I'm happy to post another review if people prefer; otherwise I could just make the change immediately after this is committed.
From what I can see, it should just be the InitialExec model that does a LOADgot, which should match the GCC behaviour with respect to TLS, so I'm happy with that for this patch. I think we can add better support for tiny code model TLS later if there is a need for it. I don't have any more comments myself; if there are no other comments from the other reviewers by the end of the week, I'll LGTM.
Thanks for the update. This looks good to me. It may be worth waiting a few days before committing to see if there are any objections/other opinions given that this touches some code shared between multiple backends.
Aug 14 2018
Generally looking good to me. Apologies for the delay.
Ping, I don't think that there are any outstanding comments that need addressing.
Ping, I don't think that there are any outstanding comments that need addressing.
Interesting. This seems to be one of the situations where linker scripts aren't specified well enough to give a definitive answer of what a linker should do.
Aug 10 2018
Approving as these are Linaro buildbot specific.
Approving as these are Linaro buildbot specific.
Aug 9 2018
Looks good to me, we've tested this out internally.
Looks good to me, this matches the configuration we've had internally at Linaro.
Aug 8 2018
I've left some brief comments. As this is hitting the generic parts of MC it may be worth casting the net wider to look for some more reviewers?
Aug 3 2018
Thanks for the comments. I've updated the diff.
Thanks for the suggestion to use yaml2obj. I keep forgetting about that. I've been able to make the same test case.
I've updated the comment to mention the gcc version. The code for the object is in the comment, I've made that a bit clearer.
I've proposed a test case in https://reviews.llvm.org/D50234 it does need a binary file as llvm-mc does indeed relax the b.n into a b.w.
Aug 2 2018
Aug 1 2018
I don't have any better suggestions about where to move the function right now. I have a couple of minor suggestions.
Updated patch. Changes:
- Correct instruction sequence in PI thunk to avoid use of Armv7 instruction
- Update and add tests for error messages
- Move ThunkSectionSpacing into a function
Thanks for the comments, I'll post another patch shortly.
Jul 31 2018
LGTM. Probably worth waiting a bit to see if Rui has any opinions.
One example where gcc (7.1 in my case) will generate different code for TLS is:
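The code sample referred to here appears to have been lost from the log. A typical pattern where gcc can pick different TLS access sequences — my illustrative guess, not the original sample — contrasts an externally visible TLS variable with a TU-local one:

```c
/* Hypothetical reconstruction: the original sample was cut off.
 * Under -fpic, gcc can use a general-dynamic sequence for the
 * externally visible TLS variable (it could be interposed), while
 * relaxing the TU-local one to a cheaper model, so the two accessors
 * get different TLS code. */
__thread int visible_tls = 1;        /* visible to other modules */
static __thread int hidden_tls = 2;  /* local to this translation unit */

int get_visible(void) { return visible_tls; }
int get_hidden(void) { return hidden_tls; }
```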
Looks reasonable to me. I've made a suggestion that might simplify a bit further, I've not tried or tested it though.
Thanks, I'll update the patches to use ToolChain on commit.
Thanks all for the comments.
I think this looks fine from an ELF perspective. As I understand it the default section names like .data and .bss are more convention than a requirement so it shouldn't matter what the name of the section is.
Jul 30 2018
Apologies for the delay. This looks good to me. I've checked that I can build v7-m without floating point support and I've checked that I can build and run the tests for armv7-linux-gnueabihf.
I can't comment too much on the use of .inst in Mach-O and Windows. I have a small suggestion that might make it possible to share more of the implementation, but it is only a personal preference.
That looks like the right behaviour to me. For reference, the Arm ARM definition of Thumb instruction width from encoding is in https://developer.arm.com/products/architecture/cpu-architecture/a-profile/docs/ddi0406/latest/arm-architecture-reference-manual-armv7-a-and-armv7-r-edition, section 6.1, Thumb instruction set encoding.
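The width rule from that section can be sketched as a small predicate (my paraphrase of the encoding rule, not code from the patch): a Thumb instruction is 32-bit when bits [15:11] of its first halfword are 0b11101, 0b11110, or 0b11111, and 16-bit otherwise.

```c
#include <stdint.h>

/* Width in bytes of a Thumb instruction, determined from its first
 * halfword: 32-bit iff bits [15:11] are 0b11101, 0b11110 or 0b11111
 * (i.e. 0x1D or above). */
static int thumbInstrSize(uint16_t firstHalfword) {
  unsigned top5 = firstHalfword >> 11;
  return (top5 >= 0x1D) ? 4 : 2;
}
```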
Jul 25 2018
This patch is causing http://lab.llvm.org:8011/builders/clang-cmake-aarch64-lld to fail. Unfortunately (for me) the failures seem to be restricted to older versions of clang and are not reproducible on Arm or X86. The buildbot itself is using the default /usr/bin/clang for Ubuntu 16.04 LTS (clang version 3.8.0-2ubuntu4, tags/RELEASE_380/final). The failures go away if I use the 3.9.1 release (no failures on 4.0.1, 6.0.0, or a build from master). At this stage I think this could be a code-gen bug in the AArch64 backend of clang 3.8, so I don't think it is worth reverting, but I'm mentioning it here just in case there is a subtle problem that is only being exposed by that version.
Jul 24 2018
I think that it is important to support position independent code in the tiny code model. There was a thread on llvm-dev in May where someone couldn't 4 KiB-align his position independent code on a bare-metal AArch64 platform (http://lists.llvm.org/pipermail/llvm-dev/2018-May/123465.html); a tiny code model that avoids ADRP could be a solution there. A PR was raised to implement it: https://bugs.llvm.org/show_bug.cgi?id=37543. If we can get position independent support added then it will be worth closing that PR. I guess it could be added in a follow-up patch though.
I definitely think that it is the right thing to do to consider the section and the SHF_LINK_ORDER section as a unit. I'm assuming that no merging of the code sections is done if the SHF_LINK_ORDER sections are different.
Jul 18 2018
Adding Rui as code owner as he'll probably want to sign off on the introduction of a command line option.
Jul 17 2018
Can you elaborate on why the armhf buildbot failed and why you have excluded armhf from the MATCHES expression above? A simple test of a source file containing
Looks good to me, thanks.
This looks correct to me. I've got a few small suggestions.
Speaking as primarily a linker person: the implementation of the R_AARCH64_ABS32 relocation is correct. You'll need to get some input from an LLDB person about whether extending the function like this is the right approach, though; I'm guessing there are more targets than just AArch64 and Arm that support relocated debug sections, and the more relocation codes that are added, the less readable that assert is going to get. Final comment: is there a test case you can add to show that R_AARCH64_ABS32 is handled correctly, or even just that it doesn't hit the assert?
Jul 6 2018
Looks good to me.
Jul 4 2018
Looks OK to me. I've checked that gcc also supports -mcpu=native on an AArch64 machine. It also supports -march=native; while I don't think it is necessary to support that for this patch, is it worth considering as well?
Some early feedback from the AArch64 ABI along the lines of "Do you actually need SHF_AARCH64_PURECODE?". In Arm there has been a long history of using literal pools, so it is important to mark code that has been compiled to avoid them as SHF_ARM_PURECODE. In AArch64 the expectation is that code is execute-only unless someone has forced it otherwise. An alternative model is to assume that all executable sections are implicitly execute-only, and therefore all you would need is an option at link time to remove PF_R from a program header that contains only executable sections. In essence we would have a one-time "declare all executable sections execute-only" at link time. I think this would be compatible with adding SHF_AARCH64_PURECODE later if there were a code-generation model that made use of literal pools.
Jul 3 2018
As an example for LTO, or just using a single bitcode file I'm thinking of something like
Jul 2 2018
I think that setting SHF_ARM_PURECODE for the initial .text section is the right thing to do; the flag is a declaration that the section is not read from as data, and that property is satisfied by an empty section. For reference, the code you've restored was removed in rL306927 "Rewrite ARM execute only support to avoid the use of a command line flag and unqualified ARMSubtarget lookup." I think the unqualified ARMSubtarget lookup might need a bit more exploration. For example, what happens if this is the link step of an LTO build where the functions have ARMSubtargets with SHF_ARM_PURECODE but the command line to the link step has no code generation flags? I think the ARMSubtarget is then likely not to have ARM::FeatureExecuteOnly. If that is the case then we may need an alternative approach. At a glance, would it be possible to always add SHF_ARM_PURECODE to the initial .text section?
I've left a comment about the ABI problems with using SHF_ARM_PURECODE. I've raised an issue to support execute only on AArch64 with the ABI team at Arm.
There is an ABI issue with using an Arm specific flag that isn't defined in AArch64. I've not had a chance to check over the code yet, but will try to do so later today.
Jun 29 2018
I'll come up with something; will aim to do it on Monday. The load/store exclusive instructions are architecture v8.1 only, so they aren't supported by the v8.0 Cortex-A53. It is unlikely that anyone will run the erratum fix, which only affects the Cortex-A53, with the option, but it should be possible to come up with a test case.
Jun 26 2018
Committed as r335463. My apologies, I forgot the Differential Revision line in the commit message that would have automatically closed this revision.
Jun 15 2018
Thanks very much for the update. I don't have any more comments at this stage.
Jun 14 2018
Thanks, I'll wait until tomorrow before committing to see if anyone in the US time zone has any objections.