- User Since
- Apr 11 2016, 8:56 AM (79 w, 1 d)
Fri, Oct 13
I don't have a particularly good answer here. If a test could be a maintenance burden due to a possibly unstable API then it may not be worthwhile if the change it is testing is on the safe side. I'm hoping for input from a code owner to see whether they think introducing a UnitTest is worth it.
I think the change is the right thing to do, I was hoping to see if there were any objections/feedback from anyone that commented more strongly on D38050. If there aren't any more comments by next week I can approve.
Thu, Oct 12
I don't have any objections to using a unittest, although I think their use is uncommon; I could only find AArch64 as the other example. It will be worth finding out what the Code Owners think as this type of test is somewhat sensitive to changes in llvm API so they may think the cost/benefit trade-off isn't worth it. Personally I would be happy with the change without a heavyweight test.
Rebased in light of RelType refactoring, no other changes.
Rebase on top of recent refactoring, no other changes.
Rebase in light of recent refactoring changes to RelType. No other changes.
Wed, Oct 11
Update diff in light of recent refactoring, no other changes.
Thank you for pointing out the missing semicolons. I have updated the diff and rebased in light of recent refactoring changes.
Rebase in light of recent refactoring. No other changes.
Thanks very much for spotting that. Updated patch to fix and to rebase in light of recent refactoring.
Mon, Oct 9
Whilst attempting to avoid this estimating by calling assignAddresses again, I found another issue, which is also theoretically present prior to my changes: the DT_MIPS_LOCAL_GOTNO depends on the number of local got entries, but these are dependent on the size calculated in updateAllocSize(). Updating this size, e.g. due to additional thunks being created, might render the the dynamic tag value invalid.
Thanks for the update, I'm happy with that change. No more comments from me.
I think that this could be a useful thing to do as it would allow the handling of SHF_LINK_ORDER sections to be moved from its current location in OutputSection::finalize() to just after OutSecPos is determined. This would permit an implementation of .ARM.exidx table deduplication (pr34501) to run without needing an early AssignAddresses.
I think that this is the right thing to do. On Arm and Mips there is no TLS relaxation so we can have a reference to __tls_get_addr at static link time that isn't removed, however the static libraries for these platforms do provide a definition. At the time of implementation in D25978 we had special cases to not add this symbol on Arm and Mips, I don't remember when these were deleted.
Fri, Oct 6
Apologies for the drive by comment, I need to leave the office fairly soon. In this specific case might it be simpler to only early calculate the size of non SHF_ALLOC content such as debug sections as the OutputSection doesn't get an address, just an offset and doesn't affect the addresses of any other SHF_ALLOC content, it is therefore safe to calculate the OutputSection size early.
After some research I don't think that it will be possible to generate a test case that fails using the standard llvm tools.
- llvm-mc or clang calling the integrated assembly doesn't even call reset on the streamer.
- clang using inline assembly embedded in a .c file will create an ARMElfStreamer Object for each module.
- lto merges the bitcode files before creating only one ARMElfStreamer Object for the output.
Thu, Oct 5
On the assumption that the Sections encountered after the streamer has been reset are completely new, i.e. we don't want to go back to the previous Streamer's Sections, I think that clearing out the last mapping symbol state on reset() is definitely the right thing to do even if D30724 can be fixed to avoid a crash. For example we want to avoid false matches in LastMappingSymbols.
The use of the hash table and the proposed new member is so that, for
example, two arm relocations to a thumb function use a single thunk,
If so the hash table is only needed during the one pass when we are
mutating relocations to point to the thunk.
If the above is correct, a hash table seems better IMHO since it is only
used in a very specific point in the code.
Wed, Oct 4
For what it is worth I've created https://git.linaro.org/people/peter.smith/range-thunks-arm.git/ that contains all the range-thunks patches. I'll keep this in synch with the reviews.
Rebased to account for r314797 addition of thunks for microMIPS code. No other changes.
Tue, Oct 3
Has anyone had any more thoughts on this or is there a different approach that I can try to fix this pr?
Rebase, change lld/Core to lld/Common in include file. No further changes.
Rebased, to account for changes in Driver.c. Is there anything I can do to progress this patch? At present lld may silently use instructions that will break on older Arm CPUs that don't support them. This patch gives a warning, and I have follow up patches that add compatible alternatives.
Fri, Sep 29
I've made one suggestion for the comment, but otherwise looks ok to me.
Wed, Sep 27
I'm happy with this change; it matches my understanding of what gcc does as well. I think that in practice it is the users responsibility to match the compiler-rt or libgcc with their choice of mfloat-abi.
Tue, Sep 26
Apologies for taking so long to respond, I've been trying to find a reference as to whether the Arm cache clean using MRC CP15, ... should always cause a Data Abort if there is no write permission as with the AArch64 version, or only when it tries to actually write to memory, which would only occur if the data in the cache were dirty.
Mon, Sep 25
A colleague mentioned that on Arm removing write and adding exec should cause mprotect() to flush the at least some of the cache hierarchy. This might explain why we don't see lots of failures, however it doesn't explain why we are seeing an abort when trying to clear the cache. The only thing that I can think of is that there is some dirty data in _a_ cache that can't be written back to the memory as the mprotect() has just marked it read-only. However I'm struggling to work out how this situation could arise.
Interesting, I think the next thing to find out is "Is the cache flushed, and then the syscall returns -EFAULT, or is the cache not flushed and the syscall returns -EFAULT?" If it is the former then we may have a reasonable case for removing the check for compiler-rt, however if it is the latter then compiler-rt may have found a subtle bug in the idiom of removing write access, call clear cache on Arm.
Apologies for being a pain, could you split the changes into their own review, even if they depend on each other? It will be easier to review, and to commit each one separately.
I agree that the SHF_LINK_ORDER flag shouldn't need to be kept in a non-relocatable object, I kept it in I've run into at least one ELF disassembler that complained about it not being there as it was inferring that one should be there from the .ARM.exidx section name. This was many years ago and I can't remember the program though. It might be worth trying to output the exceptions with GNU readelf just to check.
Thu, Sep 21
Thanks for spotting that, and my apologies for the mistake, I agree the offset needs to be from the add ip, ip, pc, and shouldn't vary between the movw and movt that construct the offset. Looks good to me, but will be worth waiting to see if Rui or Rafael have any comments.
Wed, Sep 20
Thank you very much for the comment, it is helpful to know what your constraints are. AArch64 and especially Arm are somewhat awkward targets as the ABI has been designed to move some of the toolchain complexity into the linker in order to simplify the compiler. Apologies in advance for the long response which I hope makes sense and is of some use. While I obviously would prefer to get these in earlier I will be at the US developers meeting next Month and will be happy to discuss.
My apologies in advance if I've missed something. If I understand this correctly we are tying the calling convention used for compiler-rt to the triple, but for user code, and with this change for builtins provided by libm) we allow -mfloat-abi to determine the calling convention? I think that this can be simplified, but it does require removing the association between triple and compiler-rt float-abi, which I don't think should be there. I think that the behaviour for targets intending to follow the ABI for the Arm Architecture (i.e. not Apple or Microsoft) should be:
- The aeabi functions described in RTABI chapter 4.1.2 required to use AAPCS should always use ARM_AAPCS.
- A triple with a suffix of HF such as arm-linux-gnueabihf implies ARM_AAPCS_VFP for all other builtins, otherwise ARM_AAPCS is used.
- The -mfloat-abi=hard should override whatever is in the triple for all other builtins. There should be no special treatment for compiler-rt.
We would then not need to maintain a list of builtins with floating point parameters defined outside compiler-rt.
Only thing I can spot is that I think that we are missing an alignment check on the result, otherwise looks good to me as well. The address of the value to be loaded must be at least 4-byte aligned. BFD has implemented this check, for example if I change the test case a little:
Tue, Sep 19
I've refactored the patch in an attempt to simplify it. The involves a change of strategy to make each patch its own SyntheticSection, this has the following implications:
- We can get rid of the distinction between Patches and PatchSections, there is one instance of a Patch843419Section per erratum instance.
- We can't round up the size of an individual Patch843419Section to avoid disturbing the page-offset of subsequent sections as this would be too wasteful in size.
- As adding one or more Patch843419Sections may disturb the page offset of following sections, we have to do multiple passes until no more Patches are added.
- I've converted the function into a class so we only need to create the SectionMap once.
- Updated the tests to account for the smaller patches.
Mon, Sep 18
So, I'm in two minds about this. While I understand the frustration of virtualisation users, I find it hard to concede without harder evidence (which so far has been anecdotal).
I've not been able to come to a firm conclusion on this either, I think that there is a good chance that there is call to the builtin that doesn't specify the address range with sufficient precision. However on a pragmatic basis I'm tending towards removing/altering the check for the following reasons:
- The clear_cache() implementations on all the other architectures and platforms do not call compilerrt_abort(), for example the return value of sys_icache_invalidate() on Apple is not checked.
- We are adding a stronger pre-condition [start, end) virtual addresses must be currently mapped, to libgcc's implementation. In return we can offer a stronger post-condition. True we don't need to slavishly follow libgcc, but we don't have any documentation stating this anywhere and I think that most people porting applications to compiler-rt will assume that the clear-cache function has the same interface.
- Unlike the other uses of compilerrt_abort which usually represent unimplemented cases or precondition failures, I think that only the application can truly judge whether the program should be aborted.
Thanks for pointing the typos in the comment out, I've uploaded a new patch to fix them.
This looks like the right change to make. There isn't anything written down about case-sensitivity in unified assembly language syntax that I could find in the Arm ARM, the best I could find was in the Armasm documentation http://infocenter.arm.com/help/topic/com.arm.doc.dui0473m/dom1359731141352.html "Considerations when writing assembly language source code", which implies that all lower case or all upper case is acceptable. I think the only potential problem would be if the same letter meant two different things in upper and lower case and I can't think of any case where that happens. Both armasm and gcc seem to be fine with AIF as well as aif.
Sep 14 2017
Some more thoughts.
- For a [VA start, VA end) interval that is entirely unmapped it is safe for the cache flush to continue without error as by definition the addresses won't be in the cache.
- If we can split [VA start, VA end) into two sub-ranges [Mapped VA start, Mapped VA end), [Mapped VA end, Unmapped VA end) then when the function returns all the Mapped addresses will have been cleared, we don't care about the unmapped addresses.
- If the [VA start, VA end) is sparsely mapped into non contiguous chunks of mapped and non-mapped then there is a potential problem if the program is written to assume that all addresses in the interval have been flushed.
Sep 13 2017
From speaking to a few people internally and digging a bit further into the code it seems like:
- The prototype for the builtin is void builtin_clear_cache (char *begin, char *end) so there isn't a way of reporting failure from that function to the caller
- The system call can fail on Linux when StartAddress > EndAddress which the user can check themselves before calling, or if the access_ok check fails (see comment later) or if a data abort exception occurs when doing one of the CP15 cache manipulation instructions. According to the linux kernel code, this happens when the virtual address isn't mapped.
From reading the code and the disassembly this looks like it isn't checking for a NULL start address but is actually checking the value in r0 after the svc call that will clear the cache has completed. Looking in http://elixir.free-electrons.com/linux/latest/source/arch/arm/kernel/traps.c [*] I can see that the call may use non zero return status to signal an error. Am I missing something here?
Sep 12 2017
Updated diff to implement the idea to use the pass number the ThunkSection was created in to indicate the ThunkSections that need to be merged (those created this pass). This has the advantage of only having one vector in InputSectionDescription.
Rebased on top of refactoring change D37743. No other changes.
Updated diff to rebase on top of the refactoring done in D37743 no other changes made.
I think I prefer adding all these stuffs to InputSectionDescription because (1) InputSectionDescription should be the central place to describe an input section and (2) doing hash table lookups that often could be slow.
I've implemented just the change to use InputSectionDescription in a separate refactoring review D37743.
Sep 11 2017
Using a std::vector<T> * as a key seems a bit odd to me. IIUC, this vector is a member of InputSection, so we can attach a vector of ThunkSections to InputSections, no?
Rebased to account for recent changes in SyntheticSections.h, no other changes.
Yes you are correct, A < is better than <= as the ThunkSectionSpacing already lowers the start of the ThunkSection to be within range. I've updated the diff with the change and to account for D37627
I've removed the unused parameter and have searched for other locations where Cmd is used instead of OS.
Sep 8 2017
I've made the straightforward and comment changes.
I've updated the diff with the review changes and a couple of new tests.
Thanks very much for the comments. Yes you are correct that the ThunkSpacing could be increased to ~32Mb after the first one, however I thought I'd try and keep it simple for the first iteration.
I've updated the diff for the inline comments. With respect to the overall approach:
Sep 7 2017
Rebased to account for changes in D34689
Rebased to account for changes in D34689
I've added some inline comments to show what I changed.
Thanks very much for the review comments, especially those surrounding createInitialThunkSections(), I've made an attempt at applying the suggestions.
Sep 6 2017
Thanks very much for the comments. I've made the small changes and have attempted to split up the nested function. I've removed the check for a relocation to an existing patch as this isn't necessary in a one pass implementation and it lets me simplify the code a bit.
Thanks very much for the review comments. I've updated the diff with the extra spaces. I personally would prefer to stick with the map, but I'm happy to change it.
Thanks very much for the comments. I've update the diff with a revised comment mentioning the -fix-cortex-a53-843419 option, I have also added LLVM_FALLTHROUGTH.
Yes you are correct, thanks for pointing that out. I had somewhat naively assumed the True and False was the test. I've updated the diff.
Sep 5 2017
Looks like a useful error message. I've checked that ld.bfd matches its documentation and has a maximum limit of 10. I've spotted one typo in the error message, and have made a minor stylistic suggestion but don't have a strong opinion on it.
Thanks for pointing that out, I've removed the duplicate.
Sep 4 2017
Ok, I've done something similar with the yaml test and removed the aarch64-branch-fullrange.s test
Thanks for pointing the yaml tests out, I've been able to use it to write a test case to show that the b opcode is written.
Rebased tests to account for removal of -print-fixes in favour of just using -verbose. No other changes.
Thanks both for the comments. My apologies for forgetting the context, I've added a new diff with it in. Adding a comprehensive test at this stage is difficult because (sensibly) there is no way with llvm-mc to generate a R_AARCH64_ JUMP26 relocation on a non branch instruction. What I've done here is added a test case that uses the full range of the immediate field of the JUMP26, if the extra overwriting would break anything in the immediate the test will fail. The next review in the sequence that implements the patching will heavily test this.
Thanks very much for the comments. I've got rid of the -print-fixes option in favour of -verbose, and I've also added a check (with test) to error if the -fix-cortex-a53-843419 option is used when the machine is not AArch64.
Thanks very much for the comments.
Sep 1 2017
I'm ok with the proposed changes. I agree the SVC setting isCall is a much wider question than just this patch and shouldn't block it, however it would be worth having a think about though. As Renato suggests it would be good to wait for a couple of working days to see there are any comments from later timezones before committing.
General ping for the remaining range thunks patches. If it helps at all I will be at a conference in the Bay area 25th to 28th September, but I could escape to talk these over in person if any of the maintainers happened to be within public transport range.
General ping for the -fix-cortex-a53-843419 patch set. Collectively these fix pr33463 which would be necessary to fix in order for lld to be used to build Android on AArch64.
Ping for the build attributes scanning and warning patch. As mentioned previously, I've got some follow up patches to add Arm V5 and Arm V6 support that rely on detecting (lack of support) for features used by lld.
Will be worth uploading the diff again with context git diff -U999999.
Aug 24 2017
General ping for the -fix-cortex-a53-843419 patch set. Collectively these fix pr33463 which would be necessary to fix to allow lld to be used to build Android on AArch64.
General ping for the remaining range thunks patches. Is there anything more I can do to progress these?
Ping for the build attributes scanning and warning patch. I have some follow up patches that implement support for Arm architectures v5 and v6 (1st revision Raspberry Pi with an Arm1176) that rely on the detection of architecture to restrict the Thumb branch range to 4 megabytes and to not use alternative code sequences in Thunks that avoid use of movt and movw instructions.
Aug 23 2017
I'm ok with this, I've set the status to approved, although it will be worth waiting a day or so to see if compnerd has any objections.
Aug 22 2017
Thanks very much for the update, I've no further comments.
I think that this is definitely an improvement, I've made some suggestions but I haven't got a strong opinion.
Aug 18 2017
I think that this is a worthwhile addition, I've seen at least a couple of questions on llvm-dev that this improvement might have helped to resolve.
I'd like to see something like this for embedded systems as the ELF file is often not directly run on the target (PT_LOAD regions are extracted into binary or hex and burned into flash), having the ELF headers mapped in the first PT_LOAD is often detrimental. I suggest you add Rafael as a reviewer as well as he has touched this area fairly recently.
Rebasing after changes to use llvm::erase_if in D34689
Thanks for pointing out llvm::erase_if, I've updated the diff to use it.
Thanks for the comment, I've updated the diff with the change from Mb to Mib.