Page MenuHomePhabricator

peter.smith (Peter Smith)
User

Projects

User does not belong to any projects.

User Details

User Since
Apr 11 2016, 8:56 AM (161 w, 6 d)

Recent Activity

Fri, May 17

peter.smith added a comment to D62055: [ARM][AArch64] Revert Android Bionic PT_TLS overaligning hack.

I'm happy to go along with the Android folks decision here.

Fri, May 17, 1:53 AM · Restricted Project
peter.smith added a comment to D62051: [ELF][test] Reorganize some R_*_NONE tests.

I've no objections to making this change.

Fri, May 17, 1:48 AM · Restricted Project
peter.smith accepted D62052: [ELF] -r: fix R_*_NONE to section symbols on Elf*_Rel targets.

LGTM. The addend in a R_*_NONE relocation is not meaningful anyway as the relocation is never resolved to a value.

Fri, May 17, 1:46 AM · Restricted Project

Thu, May 16

peter.smith accepted D61973: [AArch64] Support .reloc *, R_AARCH64_NONE, *.

Thanks, looks good to me. Will be worth seeing if there are any further comments from the US Timezone.

Thu, May 16, 8:10 AM · Restricted Project
peter.smith accepted D61992: [ARM] Support .reloc *, R_ARM_NONE, *.

Thanks, this looks good to me. May be worth waiting before committing to see if there are any comments from the US timezone.

Thu, May 16, 6:49 AM · Restricted Project
peter.smith added a comment to D61973: [AArch64] Support .reloc *, R_AARCH64_NONE, *.

To get back to this patch. I think the non-bigendian comments I made on D61992 apply here as well. Generally looks good. Will be worth checking that the format is ELF before accepting R_AARCH64_NONE.

Thu, May 16, 6:49 AM · Restricted Project
peter.smith added a comment to D61973: [AArch64] Support .reloc *, R_AARCH64_NONE, *.

I think the ld -r failure with arm is due to InputSection::copyRelocations(uint8_t *Buf, ArrayRef<RelTy> Rels) in particular:

Thu, May 16, 6:41 AM · Restricted Project
peter.smith added a comment to D61973: [AArch64] Support .reloc *, R_AARCH64_NONE, *.

I have similar comments to D61992

Thu, May 16, 6:15 AM · Restricted Project
peter.smith added a comment to D61992: [ARM] Support .reloc *, R_ARM_NONE, *.

I think this is looking close. I've got a small concern about what happens on non-ELF platforms Windows and Macho where R_ARM_NONE is not defined. Other than that it is just a few suggestions.

Thu, May 16, 4:29 AM · Restricted Project
peter.smith added a comment to D61992: [ARM] Support .reloc *, R_ARM_NONE, *.

I get an assertion failure from this patch on the armv7eb-linux-gnueabi test case: llvm/lib/Target/ARM/MCTargetDesc/ARMAsmBackend.cpp:872! (the llvm_unreachable)

Thu, May 16, 3:13 AM · Restricted Project

Wed, May 15

peter.smith added a comment to D61824: [ARM][AArch64] Overalign .tbss (Android Bionic hack) to avoid p_vaddr%p_align!=0.

I'm not sure I understand the comment about ALIGNUP, did you mean the linker script command ALIGN? I guess by perceivable you mean it has more impact on the file and following sections?

Sorry, I meant ALIGNOF. If we lower the hack to the OutputSection layer, it may interfere with linker scripts. Though my understanding about linker scripts is largely learned from lld source, I don't use linker scripts myself so I am not very clear what issues it may cause.

I think that should be fine. If someone happened to write ALIGNOF(.tbss) then it would return the higher value, which would be consistent with the actual section alignment. The only reason I can think of for doing that is to roll their own TLS implementation using linker defined symbols rather than the PT_TLS segment.

Wed, May 15, 9:20 AM · Restricted Project
peter.smith added a comment to D61824: [ARM][AArch64] Overalign .tbss (Android Bionic hack) to avoid p_vaddr%p_align!=0.

I have a question based on the updated description of D61825 (makes more sense in the context of this patch)

- Moving the hack to the OutputSection layer makes the hack more perceivable as it has interaction with the linker script ALIGNUP command.
- It may waste up to 8*WordSize * 2 bytes of each thread (when both .tbss and .tdata exist and are overaligned). With more code, we can apply the overalignment to the first TLS output section (either .tbss or .tdata) and decrease the wastage to 8*WordSize, but that is still unsatisfactory on other platforms.

I'm not sure I understand the comment about ALIGNUP, did you mean the linker script command ALIGN? I guess by perceivable you mean it has more impact on the file and following sections?

Wed, May 15, 6:51 AM · Restricted Project
peter.smith added a comment to D61824: [ARM][AArch64] Overalign .tbss (Android Bionic hack) to avoid p_vaddr%p_align!=0.

Aside: An R_{ARM,AARCH64}_NONE relocation to a TLS symbol has the desired effect. Is there a better way to create this relocation than by using a hex editor?

why not export the symbol?
then it cannot be removed for sure.
(i think gas has a .reloc directive that accepts R_AARCH64_NONE as argument, not sure if llvm handles that)

Wed, May 15, 6:12 AM · Restricted Project

Mon, May 13

peter.smith committed rG4e21c770ec32: [ELF] Full support for -n (--nmagic) and -N (--omagic) via common page (authored by peter.smith).
[ELF] Full support for -n (--nmagic) and -N (--omagic) via common page
Mon, May 13, 9:02 AM
peter.smith added a comment to D61824: [ARM][AArch64] Overalign .tbss (Android Bionic hack) to avoid p_vaddr%p_align!=0.

Below is my understanding about how glibc is wrong. glibc/elf/dl-tls.c:250 (the #elif TLS_DTV_AT_TP part)

// I don't know what this piece of code intended to do with non-zero firstbyte...
off = roundup (offset, slotinfo[cnt].map->l_tls_align); // roundup(16, 64) = 64
if (off - offset < firstbyte) // 64 - 16 ; 32
  off += slotinfo[cnt].map->l_tls_align;
slotinfo[cnt].map->l_tls_offset = off - firstbyte; // 64 - 32 = 32, wrong, should be 64

// If firstbyte is 0 (i.e. p_vaddr % l_tls_align == 0), this is simply:
off = roundup (offset, slotinfo[cnt].map->l_tls_align);
slotinfo[cnt].map->l_tls_offset = off;
// and is correct
Mon, May 13, 8:16 AM · Restricted Project
peter.smith added a comment to D61688: [LLD][ELF] Full support for -n (--nmagic) and -N (--omagic) via -zcommon-page-size.

Thanks for the reviews, I'll fix the nits, and will add a comment to describe the difference between max and common page size.

Mon, May 13, 6:18 AM · Restricted Project
peter.smith added a comment to D61824: [ARM][AArch64] Overalign .tbss (Android Bionic hack) to avoid p_vaddr%p_align!=0.

I think that this will definitely fix the problem, but I'd like to understand a bit more about what glibc is doing and what assumptions it is making, there does seem to be some code that attempts to deal with a PT_TLS with p_vaddr != 0 modulo p_align. My understanding from looking at dt-tls.c _dl_determine_tlsoffset() which sets the offset of each tls block (with the executable being in slot 0). Does something like this:

  • offset = TLS_TCB_SIZE (hard-coded to 2 * (void*) pointers.
  • calculates firstbyte, which is derived from l_tls_firstbyte_offset which is PT_TLS p_vaddr % p_align. For the example in pr41527 this is 32 bytes
  • sets the l_tls_offset of slot 0 to alignto(offset, p_align) - firstbyte.
Mon, May 13, 4:18 AM · Restricted Project

Thu, May 9

peter.smith updated the diff for D61688: [LLD][ELF] Full support for -n (--nmagic) and -N (--omagic) via -zcommon-page-size.

Update diff to add (default) to the --no-omagic and --no-nmagic options.

Thu, May 9, 8:11 AM · Restricted Project
peter.smith added a comment to D61688: [LLD][ELF] Full support for -n (--nmagic) and -N (--omagic) via -zcommon-page-size.

Sorry for my ignorance, but I don't think I fully understand the difference of "page size" and "common page size". In what situation you want to set different values to these variables? Is there any constraint between them, such as one value is always greater than the other, etc.?

Thu, May 9, 5:42 AM · Restricted Project
peter.smith abandoned D61201: [LLD][ELF] Full support for -n (--nmagic) and -N (--omagic).

Abandoned in favour of a different approach D61688

Thu, May 9, 3:58 AM

Wed, May 8

peter.smith added a comment to D61201: [LLD][ELF] Full support for -n (--nmagic) and -N (--omagic).

https://sourceware.org/bugzilla/show_bug.cgi?id=24505 doesn't get a response, but I researched the users of CONSTANT(...):

CONSTANT(COMMONPAGESIZE): 2 users: edk2, u-boot
CONSTANT(MAXPAGESIZE): 1 user: fuchsia zircon, but they don't use -n or -N

u-boot and fuchsia zircon don't seem to use -n or -N. edk2 is probably the only project that uses both -n/-N and CONSTANT(...). It has DEFINE GCC44_IA32_X64_DLINK_COMMON = -nostdlib -Wl,-n,-q,--gc-sections -z common-page-size=0x20

We probably shouldn't be constrained by ld.bfd's behavior for this single user,
if making MaxPageSize and PageSize to 1 makes our implementation simpler.

The potential users can make straightforward changes to adapt to lld's behavior: change CONSTANT(COMMONPAGESIZE) to a literal constant.

I'll make an alternative implementation that in effect implements -z common-page, and have -n/-N set it to 1 and will post as a separate review. It will probably be of similar complexity to this. In the example above it does look to be using -z common-page-size and CONSTANT(COMMONPAGESIZE) as a way of passing a parameter to the linker script.

Wed, May 8, 8:52 AM
peter.smith created D61688: [LLD][ELF] Full support for -n (--nmagic) and -N (--omagic) via -zcommon-page-size.
Wed, May 8, 8:51 AM · Restricted Project
peter.smith added a comment to D61666: [ELF] Optimize getISDThunkSec() to amortized O(1) in thunk creation.

If I understand correctly I think this could prematurely skip over ThunkSections if there is more than one branch relocation range. For example consider a case where there are two relocation ranges S (short) and L (long). If we search for a ThunkSection for S where the ThunkSection spacing is L then there is a chance that at a relocation with range S at offset O will be out of range of one of more ThunkSections, but these would have been in range had it been R. The next relocation with range L at offset O2 will not even check some ThunkSections that it would have been in range of. The chances of this happening are much greater when S is much smaller than L.

Wed, May 8, 2:48 AM · Restricted Project
peter.smith added a comment to D61610: [PPC64] implement Thunk Section Spacing.

I was trying to find a good value used in
uint32_t PPC64::getThunkSectionSpacing() const { return 0x8000 - 0x1000; } before I noticed this caused an extreme slowdown in thunk creation.

We have a large program whose text segment is of 1.3GiB. Its OutputSection .text consists of 1948775 InputSections. It takes 20 seconds to link but with this patch it doesn't stop in 10 minutes.

The reason is that:

// O(|ThunkVec|). Not the critical path, but there is a bottleneck, too. Many ThunkVec's have hundreds/thousands of elements
            std::tie(T, IsNew) = getThunk(*Rel.Sym, Rel.Type, Src); 

            if (IsNew) { // called 112353 times
              // Find or create a ThunkSection for the new Thunk
              ThunkSection *TS;
              if (auto *TIS = T->getTargetInputSection())
                TS = getISThunkSec(TIS);
              else
                TS = getISDThunkSec(OS, IS, ISD, Rel.Type, Src); // it takes O(|ISD->ThunkSections|) time, `ISD->ThunkSections` is large (25980) when getThunkSectionSpacing is enabled.
              TS->addThunk(T);
              Thunks[T->getThunkTargetSym()] = T;
            }
Wed, May 8, 1:59 AM · Restricted Project

Tue, May 7

peter.smith added a comment to D61201: [LLD][ELF] Full support for -n (--nmagic) and -N (--omagic).

https://sourceware.org/bugzilla/show_bug.cgi?id=24505 doesn't get a response, but I researched the users of CONSTANT(...):

CONSTANT(COMMONPAGESIZE): 2 users: edk2, u-boot
CONSTANT(MAXPAGESIZE): 1 user: fuchsia zircon, but they don't use -n or -N

u-boot and fuchsia zircon don't seem to use -n or -N. edk2 is probably the only project that uses both -n/-N and CONSTANT(...). It has DEFINE GCC44_IA32_X64_DLINK_COMMON = -nostdlib -Wl,-n,-q,--gc-sections -z common-page-size=0x20

We probably shouldn't be constrained by ld.bfd's behavior for this single user,
if making MaxPageSize and PageSize to 1 makes our implementation simpler.

The potential users can make straightforward changes to adapt to lld's behavior: change CONSTANT(COMMONPAGESIZE) to a literal constant.

Tue, May 7, 8:54 AM
peter.smith added a comment to D61610: [PPC64] implement Thunk Section Spacing.

@atanasyan Should MIPS override getThunkSectionSpacing as well? Then we can make the function pure virtual: virtual uint32_t getThunkSectionSpacing() const = 0;

Tue, May 7, 7:19 AM · Restricted Project
peter.smith added a comment to D61629: [ELF] Try placing newly created ThunkSection at the end of the InputSection first.

I don't think it will do any harm, but it is probably not going to make a lot of difference. The thunk code makes the assumption that there is one branch range for the thunk section spacing, the vast majority will get placed in these sections. The no suitable ThunkSection case is for the small number of edge cases that don't fit the model, for example in Thumb there is one very rare branch that has a much shorter range so it may need to use this, but there aren't enough of them to make reuse opportunities that likely.

createInitialThunkSections() looks to me a very good global heuristic. Without it, Relocations.cpp:1669 getISDThunkSec is making local decisions that may not be very good. I think it this way. There are three positions to create the thunk:

  • IS->OutSecOff: current status
  • IS->OutSecOff + IS->getSize(): this patch. It improves sharing slightly.
  • The last InputSection whose start address is before IS->OutputSecOff + getThunkSectionSpacing(). This is the best place to insert the thunk. (This is heuristically achieved by the proactive createInitialThunkSections().

    If you think the marginal benefit is not bad to have, I'll update the tests:)
Tue, May 7, 2:52 AM · Restricted Project
peter.smith committed rG3f585ae3ce5f: [libFuzzer] Increase timeouts on fork tests and skip one on aarch64 (authored by peter.smith).
[libFuzzer] Increase timeouts on fork tests and skip one on aarch64
Tue, May 7, 2:34 AM
peter.smith added a comment to D61629: [ELF] Try placing newly created ThunkSection at the end of the InputSection first.

I don't think it will do any harm, but it is probably not going to make a lot of difference. The thunk code makes the assumption that there is one branch range for the thunk section spacing, the vast majority will get placed in these sections. The no suitable ThunkSection case is for the small number of edge cases that don't fit the model, for example in Thumb there is one very rare branch that has a much shorter range so it may need to use this, but there aren't enough of them to make reuse opportunities that likely.

Tue, May 7, 1:35 AM · Restricted Project

Thu, May 2

peter.smith created D61449: [libFuzzer] Increase timeouts on fork tests and skip one on aarch64.
Thu, May 2, 9:55 AM · Restricted Project, Restricted Project

Wed, May 1

peter.smith added a comment to D61052: [compiler-rt][builtins] Implement some fetch-and-x operations for Cortex-M.

Thanks for the update.

Wed, May 1, 9:28 AM · Restricted Project, Restricted Project
peter.smith committed rG101bf520d1b5: [libFuzzer] Add --dump-input-on-failure to help diagnose AArch64 failures (authored by peter.smith).
[libFuzzer] Add --dump-input-on-failure to help diagnose AArch64 failures
Wed, May 1, 5:29 AM
peter.smith added a comment to D61201: [LLD][ELF] Full support for -n (--nmagic) and -N (--omagic).

Thanks for the review, I'll update the docs for --no-omagic and --no-nmagic and will wait for the response to MaskRay's question before committing just in case it changes anything significant.

Wed, May 1, 3:19 AM

Tue, Apr 30

peter.smith updated the diff for D61201: [LLD][ELF] Full support for -n (--nmagic) and -N (--omagic).

Updated diff for review comments. Main changes:

  • Move location of Config::Paged
  • Update captitalisation in comments
  • Add documentation
  • Use echo in linkerscript test
Tue, Apr 30, 8:27 AM
peter.smith added a comment to D61201: [LLD][ELF] Full support for -n (--nmagic) and -N (--omagic).

Thanks for the suggestions I'll apply and upload a new diff.

Tue, Apr 30, 8:22 AM
peter.smith created D61315: [libFuzzer] Add --dump-input-on-failure to help diagnose AArch64 failures [NFC].
Tue, Apr 30, 8:04 AM · Restricted Project
peter.smith added a comment to D61052: [compiler-rt][builtins] Implement some fetch-and-x operations for Cortex-M.

Is there any reason to avoid 32-bit ldrex/strex on M-class processors other than Cortex-M0?

No there isn't any reason to avoid ldrex/strex on M-class. It is the recommended way of implementing these functions on all architectures that support them.

Tue, Apr 30, 4:28 AM · Restricted Project, Restricted Project
peter.smith added a comment to D61298: [LLD] Emit dynamic relocations for references to script symbols in -pie links.

That looks ok to me, IIRC the main problem in D55423 was hitting the cannot refer to absolute symbol error, and this will still avoid that.

Tue, Apr 30, 2:59 AM · Restricted Project

Mon, Apr 29

peter.smith added a comment to D61186: [LLD][ELF] /DISCARD/ output sections should not be orphans.

I think it is 2 lines of code change actually :) I believe the much more correct way to fix it is this: https://reviews.llvm.org/D61251
With this approach there is no /DISCARD/ in the output.
(note, I did not change the original test case there).

Unfortunately, this doesn't completely work, although it does "work" for this particular issue. The problem is if the "/DISCARD/" has a PHDR assignment that needs to propagate to other sections, which is why the "isDiscardable" function returns false for sections with explicit PHDR assignment. It's an unlikely but still possible scenario.

Mon, Apr 29, 8:45 AM · Restricted Project
peter.smith updated the diff for D61201: [LLD][ELF] Full support for -n (--nmagic) and -N (--omagic).

Updated diff to address review comments. Main changes:

  • Removed Config->Nmagic in favour of just using Config->Paged
  • Moved test to linkerscript, converted to .test and switched to x86
Mon, Apr 29, 3:53 AM
peter.smith added a comment to D61201: [LLD][ELF] Full support for -n (--nmagic) and -N (--omagic).

Thanks all for the comments. I'll upload a new diff with the changes shortly.

Mon, Apr 29, 3:51 AM

Fri, Apr 26

peter.smith created D61201: [LLD][ELF] Full support for -n (--nmagic) and -N (--omagic).
Fri, Apr 26, 10:25 AM
peter.smith added a comment to D60887: [AsmPrinter] refactor to support %c w/ GlobalAddress'.

This looks like a nice cleanup. Accepting unless Peter has any objections.

Fri, Apr 26, 1:47 AM · Restricted Project

Thu, Apr 25

peter.smith added inline comments to D60887: [AsmPrinter] refactor to support %c w/ GlobalAddress'.
Thu, Apr 25, 2:23 AM · Restricted Project

Wed, Apr 24

peter.smith added a comment to D60887: [AsmPrinter] refactor to support %c w/ GlobalAddress'.

so I guess these backends are defaulted off or something?

@srhines identified that AVR and RISCV are not in LLVM_ALL_TARGETS in llvm/CMakeLists.txt. Should I send a patch to turn them on? (assuming the tests aren't red).

Wed, Apr 24, 8:59 AM · Restricted Project
peter.smith added a comment to D54621: [ELF] - Do not remove empty sections referenced in LOADADDR/ADDR commands..

I think that this change is worth making as it is a reasonable to expect ADDR and LOADADDR to return a valid result even when empty. I note that with wildcards the author may not know ahead of time whether the OutputSection will be empty or not so this doesn't just affect people intentionally using empty OutputSections.

Wed, Apr 24, 8:39 AM · Restricted Project
peter.smith added inline comments to D60927: [WIP][llvm-objdump] Switch between ARM/Thumb based on mapping symbols..
Wed, Apr 24, 3:46 AM · Restricted Project

Tue, Apr 23

peter.smith added a comment to D60927: [WIP][llvm-objdump] Switch between ARM/Thumb based on mapping symbols..

I've not got any great ideas about how to move the target switching logic out of llvm-objdump. I think that we would have to either construct a map (Arm/Thumb/Data) in llvm-objdump that we can pass to the disassembler and it can tell from the index which SubTarget to use assuming the disassembler also has the index, or we'd have to make the disassembler aware of symbols so it could look them up.

Tue, Apr 23, 6:15 AM · Restricted Project

Apr 17 2019

peter.smith added a comment to D55550: [LLD][ELF] - Fix the different behavior of the linker script symbols on different platforms..

Rui, I am a bit tired of uncertainty in some LLD patches like this one. Could you please either confirm you're not going to accept this, so I'll abandon it. Or if we can proceed with that somehow, please say your word.

Apr 17 2019, 4:02 AM · Restricted Project
peter.smith added a comment to D60809: [builtins] __gnu_[u]ldivmod_helper for libgcc compatibility.

Can you update the description to give some context on why these functions are needed in compiler-rt? To the best of my knowledge gnu_uldivmod_helper is no longer in libgcc and gnu_ldivmod is only called from the v6-m (cortex-m0) implementation of aeabi_ldivmod in libgcc/config/arm/bpabi-v6m.S which would imply that these functions would not not get called.

Apr 17 2019, 3:12 AM · Restricted Project, Restricted Project

Apr 10 2019

peter.smith added a comment to D60472: [AArch64][PowerPC][Driver] Allow setting crypto feature through -mcrypto for ARM/AArch64.

The motivation for this change is to make "crypto" setting an additive option e.g. like "-mavx" used in many media packages. Some packages in Chrome want to enable crypto conditionally for a few files to allow crypto feature to be used based on runtime cpu detection. They set "-march=armv8+crypto" flag but it gets overridden by the global "-march=armv8a" flag set by the build system in Chrome OS because the target cpu does not support crypto causing compile-time errors.
Ability to specify "-mcrypto" standalone makes it an additive option and ensures that it it is not lost. i.e. this will help in decoupling "-mcrypto" from "-march" so that they could be set independently. The current additive alternate is '-Xclang -target-feature -Xclang "+crypto" ' which is ugly.

Apr 10 2019, 7:55 AM · Restricted Project
peter.smith added a comment to D60472: [AArch64][PowerPC][Driver] Allow setting crypto feature through -mcrypto for ARM/AArch64.

I'm not in favour of adding AArch64 support to -mcrypto -mnocrypto for a few reasons:

Apr 10 2019, 7:15 AM · Restricted Project
peter.smith added a comment to D60428: [Aarch64AsmPrinter] support %c output template for global addresses.

I have a mild preference for following most of the other backends so that we don't split the handling of 'c' across the generic and the target specific

Thinking more about this last night; it's going to be a pain/stupid to keep reimplementing this for other arches. The Linux kernel is using %c with global addresses for arm32 and ppc (and many others). I think I should:

  1. Create AsmPrinter::printSymbolOperand(), a virtual method of the generic class, that simply forwards its arguments to the virtual method printAsmOperand().
  2. Update X86AsmPrinter::printSymbolOperand() to be a member function rather than static local function, overriding the base AsmPrinter::printSymbolOperand().
  3. Handle %c in the generic AsmPrinter::PrintAsmOperand(), deferring to the now virtual method printSymbolOperand() for MachineOperand::MO_GlobalAddress, thus supporting %c w/ MachineOperand::MO_GlobalAddress for all architectures.

    How does that sound?
Apr 10 2019, 6:37 AM · Restricted Project

Apr 9 2019

peter.smith added a comment to D60428: [Aarch64AsmPrinter] support %c output template for global addresses.

I have a mild preference for following most of the other backends so that we don't split the handling of 'c' across the generic and the target specific, but I'm happy to let that pass if others prefer it this way.

Apr 9 2019, 2:38 AM · Restricted Project

Apr 8 2019

peter.smith added a comment to D58951: [compiler-rt][tests] Improve handling with non-default toolchains.

As I understand it compiler-rt is built by make/ninja from CMake derived files so CMake only features such as CMAKE_C(XX)_EXTERNAL_TOOLCHAIN and CMAKE_SYSROOT will affect the make/ninja build files. The tests are run by llvm-lit so we wouldn't expect CMake only features to affect the output unless we convert them to something llvm-lit understands. We've also got a situation where cross-compiling and native tests have different options available to them:

  • Cross compiled test runs can use COMPILER_RT_TEST_COMPILER_CFLAGS to pass in --gcc-toolchain=... --sysroot=... duplicating any use of CMAKE_C(XX)_EXTERNAL_TOOLCHAIN and CMAKE_SYSROOT, but at least it is possible.

That is also my understanding. I also believe that it is working sufficiently today for people using such configurations, or at least sufficiently so that improving the situation was not seen as being urgent.

I think once you know what to you need to do it suffices, however finding out how to do it is more difficult. I think people have largely been reliant on the mailing list for guidance.
...

Possible ways to fix this are:

  • Convert the CMAKE_C(XX)_EXTERNAL_TOOLCHAIN and CMAKE_SYSROOT to a form that llvm-lit understands, either by converting them to c-flags or adding support to them in llvm-lit so that they have their own line in lit.common.configured.in

This patch implements the above approach for CMAKE_CXX_COMPILER_EXTERNAL_TOOLCHAIN. The various *_TEST_TARGET_CFLAGS variables are picked up by their respective lit.site.cfg.in files.

In the suggestion I was thinking more along the lines of a separate toolchain and sysroot rather than passing them as C Flags, something like

Apr 8 2019, 5:41 AM · Restricted Project, Restricted Project

Apr 5 2019

peter.smith added a comment to D60143: [compiler-rt][tests] Propagate COMPILER_RT_UNITTEST_LINK_FLAGS.

I checked with the original introduction of the support for COMPILER_RT_UNITTEST_LINKFLAGS in https://reviews.llvm.org/D16165 and this looks like it is consistent with that. In the original patch it looks like it was fixing a specific link failure, which I think is why it didn't get propagated to the other files. Unless there are any objections I'm inclined to approve this?

Apr 5 2019, 9:43 AM · Restricted Project, Restricted Project
peter.smith added a comment to D58951: [compiler-rt][tests] Improve handling with non-default toolchains.

I've had a chance to look into this a bit more, although my understanding is far from complete. At the moment I think the patch is too specific to the use-case, I'm not an expert in this area so I welcome a second opinion. I don't want to block this, but I also not confident enough to approve either.

Apr 5 2019, 9:39 AM · Restricted Project, Restricted Project

Apr 4 2019

peter.smith added inline comments to D60274: [ELF] Implement Dependent Libraries Feature.
Apr 4 2019, 8:36 PM · Restricted Project
peter.smith added a comment to D55550: [LLD][ELF] - Fix the different behavior of the linker script symbols on different platforms..

As mentioned above, I'm in favour of making this change.

Apr 4 2019, 8:23 PM · Restricted Project
peter.smith added a comment to D60274: [ELF] Implement Dependent Libraries Feature.

I think it will be worth adding some user level documentation, whether in this patch or another, for the pragma and how autolinking is expecting to work. My apologies I've not had time to go through the code in detail.

Apr 4 2019, 8:08 PM · Restricted Project
peter.smith added a comment to D60242: Add IR support, ELF section and user documentation for partitioning feature..

Thanks for the updates and clarifications, I don't have any more comments.

Apr 4 2019, 7:21 PM · Restricted Project
peter.smith added a comment to D60242: Add IR support, ELF section and user documentation for partitioning feature..

I've put some suggestions for the documentation. Feel free to ignore them if you prefer the original. I can't see anything wrong with the LLVM changes at a cursory look.

Apr 4 2019, 7:50 AM · Restricted Project

Apr 3 2019

peter.smith added a comment to D55423: [LLD][ELF] - A fix for "linker script assignment loses relative nature of section" bug..

Thanks for the test suggestion, that looks good to me. My understanding is that this would unblock the Linux Kernel KASLR build on AArch64. The upstream linux maintainers are not willing to change the linker script for what seems like a linker limitation/bug.

My understanding is that only the Arm and AArch64 Linux kernel's are actively using LLD right now and the Linker script is specific to those builds so this half-fix in practice would be a step forward.

Apr 3 2019, 7:57 AM · Restricted Project

Apr 2 2019

peter.smith added a comment to D58951: [compiler-rt][tests] Improve handling with non-default toolchains.

Thanks for splitting the patch, I'll take a look in detail as soon as I can, apologies I'm at a conference at the moment so it may be a few days.. I've got some experience and requirements from the cross-compilation side, although I'm certainly not the only one. At a high level I'd like to try and support --gcc-toolchain in both cross and native builds consistently if it is all possible,

Apr 2 2019, 7:07 PM · Restricted Project, Restricted Project
peter.smith added inline comments to D55423: [LLD][ELF] - A fix for "linker script assignment loses relative nature of section" bug..
Apr 2 2019, 8:09 AM · Restricted Project
peter.smith added a comment to D58951: [compiler-rt][tests] Improve handling with non-default toolchains.

If I've understood this correctly the patch introduces COMPILER_RT_TOOLCHAIN_CFLAGS which we don't expect users to set directly, but instead set -gcc-toolchain if the compiler is clang and CMAKE_CXX_COMPILER_EXTERNAL_TOOLCHAIN is set?

Apr 2 2019, 8:04 AM · Restricted Project, Restricted Project

Mar 30 2019

peter.smith added a comment to D60026: ELF: Perform per-section .ARM.exidx processing during combineEhFrameSections(). NFCI..

I'm happy to make the change. One minor reservation is that from combineEHSections to ARMExidxSections::finalize() the section will report its size as 0 and each individual ARMExidx InputSection won't have a parent. To the best of my knowledge this doesn't matter right now as nothing depends on these being set during that time. It may give someone a surprise in the future though.

Mar 30 2019, 11:26 PM · Restricted Project

Mar 28 2019

peter.smith committed rG3ce9af9370d0: [ELF][ARM] Recommit Redesign of .ARM.exidx handling to use a SyntheticSection (authored by peter.smith).
[ELF][ARM] Recommit Redesign of .ARM.exidx handling to use a SyntheticSection
Mar 28 2019, 4:10 AM
peter.smith added a comment to D59531: [ELF] Produce multiple PT_NOTE segments as needed.

We don't know addresses at this point, and indeed we can't because the layout of the rest of the output file depends on how many program headers there are, which is decided by this function.

Yep I think the only correct solution then is to use one PT_NOTE per note section then. That's kind of a shame but I think all other behaviors have a counter example (academic or otherwise). We can't relay on vaddrs and there aren't any assumptions we can make about what the relative properties between the final vaddrs will be either because of linker scripts. One per is the only solution resistant to academic counter examples (that I personally think we should take seriously).

Mar 28 2019, 3:03 AM · Restricted Project

Mar 27 2019

peter.smith added a comment to D59581: Python 2/3 compat: urllib.

I think that this change has already been committed as r356538 with an almost identical diff (only difference is the path to benchmark.py including trunk) at D59538 which is already approved. Were you intending this review to be for some other change such as LLDB? FWIW I've no objections to the changes.

Mar 27 2019, 3:56 AM · Restricted Project, Restricted Project
peter.smith accepted D59834: [ARM] Don't confuse the scheduler for very large VLDMDIA etc..

I've tentatively marked this as LGTM, if there are any disagreements please feel free to override. If I've understood correctly the only uses of that function are from the ARMSchedule*.td files which will treat larger number of registers as if they were the maximum size available. I think that this, or reverting the previous patch, are the most plausible short term ways to fix the bots (I've checked I can do a 2-stage build).

Mar 27 2019, 3:14 AM · Restricted Project

Mar 26 2019

peter.smith added a comment to D59780: Support Intel Control-flow Enforcement Technology.

I'm happy with this approach and I think it could be extended to handle the AArch64 BTI/PAC features very simply. I think it will be worth adding some tests for the new PLT sequences at some point.

Mar 26 2019, 10:28 AM · Restricted Project
peter.smith added a comment to D59780: Support Intel Control-flow Enforcement Technology.

Is --force-cet the best option name? H. J. Lu pointed out that gold has -z ibtplt and other options, but I couldn't find these options in the binutils' repository. Is the option really implemneted to gold?

I think the name doesn't quite match the concept. To me --force means "do it anyway", for example "git force push". As it has been implemented, which is error if object hasn't got CET then I'd say that would be better described with "enforce", "mandate", or more simply "require". So I'd say --require-cet would be a better name for that concept.

Mar 26 2019, 9:16 AM · Restricted Project
peter.smith updated the diff for D59216: [LLD][ELF][ARM] Redesign of .ARM.exidx handling to use a SyntheticSection.

Updated the diff to just remove the .rel.ARM.exidx relocation sections when --emit-relocations is used. This is a few lines in the ARMExidxSyntheticSection constructor. This permits us to remove the additional fixup code in InputSections.cpp and OutputSections.cpp. The test remains to make sure we don't crash with --emit-relocs and to check that there are no .rel.ARM.exidx relocation sections.

Mar 26 2019, 7:44 AM · Restricted Project
peter.smith added a comment to D59713: [ARM] Add missing memory operands to a bunch of instructions..

It does not appear to be showing up on our Arm v8 builders clang-cmake-armv8-selfhost-neon , and I've not yet been able to reproduce that on a v8 machine. One difference is that the v7 builders use gcc 5.4 for stage-1 whereas I think the v8 builders use clang. I will try and reproduce with gcc 5.4.0 on a v8 machine.

Mar 26 2019, 6:08 AM · Restricted Project
peter.smith added a comment to D59713: [ARM] Add missing memory operands to a bunch of instructions..

This broke compilation for me in a number of cases, see https://bugs.llvm.org/show_bug.cgi?id=41231 for repro.

Mar 26 2019, 5:03 AM · Restricted Project
peter.smith added a comment to D59216: [LLD][ELF][ARM] Redesign of .ARM.exidx handling to use a SyntheticSection.

This may be a silly question, but is --emit-reloc for .ARM.exidx important? If we re-create an .ARM.exidx section, we probably have to re-create a relocation section for .ARM.exidx just for --emit-relocs, but I'm wondering if there's an actual user who wants that behavior. Perhaps this is a kind of question no one really knows, though...

Mar 26 2019, 3:05 AM · Restricted Project

Mar 25 2019

peter.smith added a comment to D59780: Support Intel Control-flow Enforcement Technology.

I'll take a look at this tomorrow. I found https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=92ac8192dd3bd4b48d6ba882af1f1649231258e9 which was the initial commit for cet and ibt. Specifically:

Mar 25 2019, 10:53 AM · Restricted Project
peter.smith added a comment to D59216: [LLD][ELF][ARM] Redesign of .ARM.exidx handling to use a SyntheticSection.

I've added comments to highlight the changes. I've also added test cases for them.

Mar 25 2019, 10:33 AM · Restricted Project
peter.smith updated the diff for D59216: [LLD][ELF][ARM] Redesign of .ARM.exidx handling to use a SyntheticSection.

I've discovered 3 problems with this, the first 2 are trivial to fix, the last is not so I think I'll need to send for review when I've fixed rather than just recommit. The trivial problems are:

  • When all the .ARM.exidx sections from input files are merged into a synthetic one the getLinkOrderDep() function can't find any .ARM.exidx section and returns nullptr.
    • 1 line fix to just return the front() of the ExecutableSections vector, as that will always exist.
  • If the In.ARMexidx is discarded via \DISCARD\ then scanning its relocations adds a dependency to the personality routine (undefined symbol error).
    • 1 line fix to skip the relocation scanning if In.ARM.exidx is not live.

      The difficult problem to fix is that --emit-relocs doesn't work with the SyntheticSection.
  • The .ARM.exidx InputSections have been removed so there is no way to get the OutputSection
  • If table merging is on then we end up with spurious relocations for the removed table. This is true of the existing implementation.

    I'm investigating the best way of fixing this. The approach I'm currently favouring is following the example of EhInputSection. The alternative is some special cases for .ARM.exidx, I think that these could prove fragile so I'm tending towards the former even if it is more code.
Mar 25 2019, 10:24 AM · Restricted Project
peter.smith added a comment to D57143: [builtins] Rounding mode support for addxf3/subxf3.

The mix of soft-floating point and hard-floating point (with soft-float interface) can happen on embedded systems code as well. The solution Arm's proprietary toolchain had to this problem is for the linker to select libraries based on build-attibutes. It would detect the rounding mode build attribute and select the appropriate library variant, but that isn't available to us here. A colleague suggested an easily available customisation point that someone that had a local need for a different rounding mode could implement.

Mar 25 2019, 5:05 AM · Restricted Project, Restricted Project
peter.smith added a comment to D59733: ARM: Allow cp10/cp11 coprocessor register access with a warning.

I prefer having the warning here as there are use cases for using the coprocessor interface for VFP instructions, notably when access to the instructions is gated at run-time and the same source needs to assemble on multiple targets. From what I can see the mrc in venum_read_pmresr() is not a VFP/Neon instruction. It hails from arch/arm/kernel/perf_event_v7.c and is specific to Qualcomm's Scorpion and Krait CPUs. My understanding is that CP15 is used for the PMU in v7, it is possible that the use of p10 there is a mistake but I'm not knowledgeable enough about Scorpion and Krait to know whether this is the case.

Mar 25 2019, 4:15 AM · Restricted Project

Mar 22 2019

peter.smith reopened D59216: [LLD][ELF][ARM] Redesign of .ARM.exidx handling to use a SyntheticSection.

I've discovered 3 problems with this, the first 2 are trivial to fix, the last is not so I think I'll need to send for review when I've fixed rather than just recommit. The trivial problems are:

  • When all the .ARM.exidx sections from input files are merged into a synthetic one the getLinkOrderDep() function can't find any .ARM.exidx section and returns nullptr.
    • 1 line fix to just return the front() of the ExecutableSections vector, as that will always exist.
  • If the In.ARMexidx is discarded via \DISCARD\ then scanning its relocations adds a dependency to the personality routine (undefined symbol error).
    • 1 line fix to skip the relocation scanning if In.ARM.exidx is not live.
Mar 22 2019, 9:19 AM · Restricted Project

Mar 21 2019

peter.smith committed rG54dab70bb75e: [ELF][ARM] Revert Redesign of .ARM.exidx handling to use a SyntheticSection (authored by peter.smith).
[ELF][ARM] Revert Redesign of .ARM.exidx handling to use a SyntheticSection
Mar 21 2019, 10:18 AM
peter.smith committed rGd3511a214e47: [ELF][ARM] Redesign of .ARM.exidx handling to use a SyntheticSection (authored by peter.smith).
[ELF][ARM] Redesign of .ARM.exidx handling to use a SyntheticSection
Mar 21 2019, 7:09 AM
peter.smith abandoned D58047: [LLD][ELF][ARM] Synthesise missing .ARM.exidx sections..

Abandoned in favour of D59216

Mar 21 2019, 7:08 AM
peter.smith added a comment to D58102: Support X86 Control-flow Enforcement Technology (CET) in LLD.

I've checked that it does work, although at the moment it does leave a redundant .splt section in the binary. I think that this can be removed by accounting for this in the SPltSection::empty(), .........

It shouldn't have redundant .splt section, there should not have any .splt section if IBT is disable or non-dynamic link. Could you show me your test please?

Mar 21 2019, 2:43 AM · Restricted Project, lld

Mar 20 2019

peter.smith added a comment to D58102: Support X86 Control-flow Enforcement Technology (CET) in LLD.

Hi, Peter, very appreciate for your suggestion, for the Splt, in fact, it is just the entry of the old PLT, you can see in this way: SPLT+PLT=(old)PLT, we split the (old)Plt to Plt +SPlt, one of the reasons is for performance that we want to put them into different areas (hot code, cold code) .
And in this patch, I used getSPltVA() instead of getPltVA when face outside. For example in getRelocTargetVA():

case R_PLT:
  // When the CET Branch protection is enabled, the second PLT
  // should be call to.
  if (Config->SPltSectionEnable)
    return Sym.getSPltVA() + A;
  else
    return Sym.getPltVA() + A;

Maybe it's hard to use In.SpltSection::Enable to do these.
Any way I will check it.
Thank you very much!

Mar 20 2019, 5:03 AM · Restricted Project, lld
peter.smith added a comment to D58102: Support X86 Control-flow Enforcement Technology (CET) in LLD.

I rewrited the collecting feature functions.
It is very simple now!

+struct GnuPropertyHeadType {
+  uint32_t Namesz;
+  uint32_t Descsz;
+  uint32_t NType;
+  uint32_t Name;
+};                                              // Did here need packed attribute ? I have a little concern
+
+struct GnuPropertyDescType {
+  uint32_t DescTy;
+  uint32_t Descsz;
+  uint32_t Feature;
+  uint32_t Pad;
+};                                              // Did here need packed attribute ? I have a little concern
+
+struct GnuPropertyType {
+  GnuPropertyHeadType Head;
+  GnuPropertyDescType Desc[];
+};                                            // Did here need packed attribute ? I have a little concern
+
+static bool findGNUPropertyX86Feature1AND(InputSectionBase *S,
+                                          unsigned &Features) {
+  auto Data = S->getDataAs<uint32_t>();
+  GnuPropertyType *GnuProperty = cast<GnuPropertyType>(Data.data());
+  size_t DescNum = GnuProperty->Head.Descsz / sizeof (GnuPropertyDescType);
+  if (GnuProperty->Head.NType != NT_GNU_PROPERTY_TYPE_0)
+    return false;
+  for (size_t i = 0; i < DescNum; i ++) {
+    if (GnuProperty->Desc[i].DescTy == GNU_PROPERTY_X86_FEATURE_1_AND) {
+      Features = GnuProperty->Desc[i].Feature;
+      return true;
+    }
+  }
+  return false;
+}
Mar 20 2019, 3:32 AM · Restricted Project, lld
peter.smith added a comment to D59216: [LLD][ELF][ARM] Redesign of .ARM.exidx handling to use a SyntheticSection.

Ping. I think I have addressed all the existing comments. Happy to make any further changes if needed.

Mar 20 2019, 3:21 AM · Restricted Project

Mar 19 2019

peter.smith added a comment to D59531: [ELF] Produce multiple PT_NOTE segments as needed.
In D59531#1434312, @pcc wrote:

Is it possible for this to fail with a linker script that looks like:

SECTIONS {
  . = SIZEOF_HEADERS;
  .note1 : { *(.note1) }
  . = 0x10000;
  .note2 : { *(.note2) }
}

?

Mar 19 2019, 5:22 AM · Restricted Project
peter.smith added a comment to D58102: Support X86 Control-flow Enforcement Technology (CET) in LLD.

I guess you mean "I tried", not "I tend" in 1). I think that you can do accomplish 1.) via unconditionally creating the In.GnuPropertySection and SPltSection and by implementing the empty() member function, ensure that they are removed by removeUnusedSyntheticSections() if there isn't anything that uses them. Unconditionally creating the sections means that you don't need to call mergeAggregateMetaData from Driver.cpp prior to createSyntheticSections(). For the case of ifunc where you need to turn off the feature then just inform In.GnuPropertySection and In.SPltSection as they will always be there.

Hi peter, I have implemented your idea in my code, and it work well, I did not update here now because I am not sure about the ifunc.
One of my destination is to remove the Config->X86Feature1AND and do the feature collection work inside synthetic GnuPropertySection class.
I found I can remove the Config->X86Feature1AND except dealing with ifunc, If I remove the Config->X86Feature1AND I still need another global variable.
At this stage I want to disable CET when the program will use ifunc, my current code is Relocations.cpp:1072

// To Be Done.
// Now the IBT can't work with IFUNC feature, because they may conflict in
// changing the PLT. So we disable the IBT when the IFUNC enabled.
if (Sym.isGnuIFunc()) {
  Config->X86Feature1AND &= ~GNU_PROPERTY_X86_FEATURE_1_IBT;
  Config->SPltSectionEnable = false;
}

Do you have some good idea to turn off the CET feature in ifunc ?
Thank you very much!

Mar 19 2019, 4:45 AM · Restricted Project, lld

Mar 18 2019

peter.smith added a comment to D58102: Support X86 Control-flow Enforcement Technology (CET) in LLD.

deficiencies:

  1. I tend to collect the CET features in GnuPropertySection (SyntheticSection) class or in Writer->run like combineEhFrameSections(), but I found it should first get the CET features, then decide to create the GnuPropertySection or not (SPltSection too).
  2. I didn't remove the Config->X86Feature1AND, because I need it to tell the createSyntheticSections() to create the GnuPropertySection or not (SPltSection too).
Mar 18 2019, 7:22 AM · Restricted Project, lld

Mar 14 2019

peter.smith added a comment to D59120: [ELF] Sort notes by alignment in decreasing order.

Both choices look good to me:

  • emit one PT_NOTE for each SHF_ALLOC .note*
  • emit one PT_NOTE for adjacent SHF_ALLOC .note* with the same alignment (don't check alignment of their sizes)
Mar 14 2019, 4:20 AM · Restricted Project

Mar 13 2019

peter.smith updated the diff for D59216: [LLD][ELF][ARM] Redesign of .ARM.exidx handling to use a SyntheticSection.

Updated diff to address review comments.

Mar 13 2019, 3:39 AM · Restricted Project
peter.smith added a comment to D59216: [LLD][ELF][ARM] Redesign of .ARM.exidx handling to use a SyntheticSection.

Thanks for the comments I'll update the diff.

Mar 13 2019, 3:32 AM · Restricted Project

Mar 12 2019

peter.smith added a comment to D58102: Support X86 Control-flow Enforcement Technology (CET) in LLD.

As an earlier comment mentioned, there is quite a lot of code that depends on the details of the .note.gnu.property section added very early to Driver.c which is not where we'd normally expect to see it. It also looks like you using quite a lot of code to get an InputSection to represent the output .note.gnu.property section. Given that there will be multiple Input .note.gnu.property sections but only one Output .note.gnu.property section, I think it would be cleaner to use a SyntheticSection to represent the .note.gnu.property section. SyntheticSections can implement all sorts of custom functionality that could move the parsing and merging code out of Driver.cpp. My suggestion is to:

  • Create a new SyntheticSection (see SyntheticSections.h), something like NotePropertySyntheticSection. This will represent our single output .note.gnu.property section. It can store the status of the FEATUREAND bits.
  • Add an entry to InStruct for a single instance of it that we will use for the output.
  • After InStruct instance is created in Writer.c but before the relocations are scanned to create PLT entries iterate through InputSections and pass each .note.gnu.property to the NotePropertySyntheticSection. Then erase these InputSections from the InputSections. The combineEhFrameSections<ELFT> function in Writer.cpp is one way of doing something like that.
  • When you add the .note.gnu.property InputSection to the NotePropertySyntheticSection you can update the FEATUREAND bits
  • The writeTo() member function of the NotePropertySyntheticSection can output the data needed to make up the output.
Mar 12 2019, 9:48 AM · Restricted Project, lld
peter.smith updated the diff for D59216: [LLD][ELF][ARM] Redesign of .ARM.exidx handling to use a SyntheticSection.

Updated to make setSizeAndOffsets() use of Size less confusing, fixed some curly brace nits.

Mar 12 2019, 8:44 AM · Restricted Project
peter.smith updated the diff for D59216: [LLD][ELF][ARM] Redesign of .ARM.exidx handling to use a SyntheticSection.

Updated the diff with the requested changes. I've followed pcc's idea to make the section behave more like the EHFrame section. This gets rid of some of the nastier relocation handling code in writeTo().

Mar 12 2019, 8:20 AM · Restricted Project
peter.smith added a comment to D59216: [LLD][ELF][ARM] Redesign of .ARM.exidx handling to use a SyntheticSection.

Thanks for the comments, I'll upload another diff with the changes.

Mar 12 2019, 8:12 AM · Restricted Project