Page MenuHomePhabricator

[ELF/AArch64] - Implemented set of R_AARCH64_TLSDESC_* relocations.
AbandonedPublic

Authored by grimar on Jan 14 2016, 10:59 AM.

Details

Reviewers
ruiu
rafael
Summary

This patch implements next relocations:
R_AARCH64_TLSDESC_ADR_PAGE21, R_AARCH64_TLSDESC_LD64_LO12_NC, R_AARCH64_TLSDESC_ADD_LO12_NC, R_AARCH64_TLSDESC_CALL.

They have lazy relocation nature, so require entries in got.plt. Each one is double sized. Also special Plt entry with resolver is created in addition to single got entry.
A module that uses lazy TLSDESC relocations must define next two entries in dynamic section:

  • DT_TLSDESC_PLT - Location of PLT entry for TLS descriptor resolver calls.
  • DT_TLSDESC_GOT - Location of GOT entry used by TLS descriptor resolver PLT entry.

More information can be found in "Thread-Local Storage Descriptors for IA32 and AMD64/EM64T" and "Thread-Local Storage Descriptors for the ARM platform" (http://www.fsfla.org/~lxoliva/writeups/TLS/RFC-TLSDESC-x86.txt, http://www.fsfla.org/~lxoliva/writeups/TLS/RFC-TLSDESC-ARM.txt)

Sample app to see them on AArch64 is:
__thread int foo;
int main() {
  foo = 5;
}

aarch64-linux-gnu-g++ -fPIC test.cpp -S -o test.s

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
grimar updated this revision to Diff 44902.Jan 14 2016, 10:59 AM
grimar updated this object.
grimar added reviewers: ruiu, rafael.
grimar added subscribers: llvm-commits, grimar.
ruiu added inline comments.Jan 14 2016, 11:05 AM
ELF/InputSection.cpp
272

This needs a comment.

ELF/OutputSections.cpp
237

Ditto

303

Ditto

ELF/Writer.cpp
389

Ditto

emaste added a subscriber: emaste.Jan 14 2016, 5:19 PM
grimar added inline comments.
test/ELF/aarch64-tls-desc.s
76

I`ll remove the unknowns once http://reviews.llvm.org/D16224 is commited.

grimar updated this revision to Diff 45007.Jan 15 2016, 10:04 AM

Review comments addressed (added comments).

grimar updated this revision to Diff 45252.Jan 19 2016, 6:08 AM
  • Implemented TlsDesc GD->LE relaxation.
ruiu added inline comments.Jan 19 2016, 4:53 PM
ELF/InputSection.cpp
272

Instead of adding this code,

284–298

... why don't you add

} else if (Body->isTls() && Target->isTlsDescReloc(Type, *Body)) {
  SymVA = Out<ELFT>::GotPlt->getEntryAddr(*Body)
  continue;

here? (I'm not saying that this long if-elseif-elseif is easy to read, but consistency matters.

ELF/Writer.cpp
390

Can you move this code to a new function? This function got too large.

grimar updated this revision to Diff 45383.Jan 20 2016, 5:29 AM

Review comments addressed.

grimar added inline comments.
ELF/InputSection.cpp
284–298

In general descriptor relocations are very specific and if for example we have logic for isTlsGlobalDynamicReloc separated (which is also specific case) then why dont do the same for descreloc ?
Code for Target->isTlsLocalDynamicReloc(Type) and Target->isTlsGlobalDynamicReloc(Type) is above it. So I did it exactly for consistency.
Also I think in the similar situation for R_X86_64_GOTTPOFF optimization (http://reviews.llvm.org/D14713?vs=on&id=40303&whitespace=ignore-most#toc, ELF/InputSection.cpp, line 150) you asked me to move the code above uintX_t SymVA :)

ELF/Writer.cpp
390

Created new function for all TLS code in http://reviews.llvm.org/D16354, moved that code.

ruiu edited edge metadata.Jan 20 2016, 2:55 PM

Please submit http://reviews.llvm.org/D16354 and sync this patch to HEAD.

grimar updated this revision to Diff 45501.Jan 21 2016, 2:15 AM
grimar edited edge metadata.

Synced with head as requested.

Rui, I know you want to suspend adding complexity to the relocation handler until cleanup, I am not sure if this one does it or not, so this update is mostly to have the lastest vesion of this patch available here as was requested above.

ruiu added a comment.Jan 21 2016, 11:15 AM

Yeah, it would be OK if this were a small patch, but this is fairly large and bring in a new concept of TLS descriptor in the linker. I'd like to suspend this. Most of the code will be reusable after the refactoring.

rafael edited edge metadata.Feb 16 2016, 1:44 PM
rafael added a subscriber: rafael.

The refactoring on how dynamic relocations are handled is done (r259829).

We still need to figure out a better way to propagate information for
the relocation scan to relocateOne, but I don't know of anyone doing
that on the immediate future, so this might be a reasonable time to
implement this.

The refactoring on how dynamic relocations are handled is done (r259829).

We still need to figure out a better way to propagate information for
the relocation scan to relocateOne, but I don't know of anyone doing
that on the immediate future, so this might be a reasonable time to
implement this.

it looks r260677 "[ELF/AArch64] Add support to some GD/LE/IS TLS relocations" also implemented something relative,
I didn`t look close on it yet.
Will try to ressurect the patch soon.

George.

grimar updated this revision to Diff 48611.Feb 21 2016, 12:09 AM
grimar edited edge metadata.
  • Rebased
ruiu added inline comments.Feb 22 2016, 3:05 PM
ELF/InputSection.cpp
272

Do you mean special by specific?

276–277

What is not clear in the TLS documents and in LLD's code is what is the ABI requirement and what's an optimization. We probably need to write our own document (read "long comments") to describe the TLS ABI.

Although my TLSDESC patch have missing bits (as you pointed out and I am inclided to abandon it), there is something missing on this patch. Although I can bootstrap the compiler itself, the make check-ldd is stuck in:

tools/lld/unittests/DriverTests/DriverTests --gtest_list_tests

(gdb) bt
#0 0x0000007f8842ca04 in lll_lock_wait (futex=0x34c59980, private=0) at lowlevellock.c:46
#1 0x0000007f88428964 in
GI___pthread_rwlock_rdlock (rwlock=0x34c59980) at pthread_rwlock_rdlock.c:113
#2 0x0000000001971c54 in llvm::sys::RWMutexImpl::RWMutexImpl() ()
#3 0x0000000001971c54 in llvm::sys::RWMutexImpl::RWMutexImpl() ()
#4 0x0000000001971c54 in llvm::sys::RWMutexImpl::RWMutexImpl() ()
#5 0x0000000001971c54 in llvm::sys::RWMutexImpl::RWMutexImpl() ()
#6 0x0000000001971c54 in llvm::sys::RWMutexImpl::RWMutexImpl() ()
#7 0x0000000001971c54 in llvm::sys::RWMutexImpl::RWMutexImpl() ()
#8 0x0000000001971c54 in llvm::sys::RWMutexImpl::RWMutexImpl() ()
#9 0x0000000001971c54 in llvm::sys::RWMutexImpl::RWMutexImpl() ()
[...]
(gdb) info threads

Id   Target Id         Frame
  • 1 Thread 0x7f880c8000 (LWP 14280) "DriverTests" 0x0000007f8842ca04 in __lll_lock_wait (futex=0x34c59980, private=0) at lowlevellock.c:46

Where with my patch I can bootstrap and run all the make-ldd tests without issue. I am debugging this with your patch.

Although my TLSDESC patch have missing bits (as you pointed out and I am inclided to abandon it), there is something missing on this patch. Although I can bootstrap the compiler itself, the make check-ldd is stuck in:

tools/lld/unittests/DriverTests/DriverTests --gtest_list_tests

(gdb) bt
#0 0x0000007f8842ca04 in lll_lock_wait (futex=0x34c59980, private=0) at lowlevellock.c:46
#1 0x0000007f88428964 in
GI___pthread_rwlock_rdlock (rwlock=0x34c59980) at pthread_rwlock_rdlock.c:113
#2 0x0000000001971c54 in llvm::sys::RWMutexImpl::RWMutexImpl() ()
#3 0x0000000001971c54 in llvm::sys::RWMutexImpl::RWMutexImpl() ()
#4 0x0000000001971c54 in llvm::sys::RWMutexImpl::RWMutexImpl() ()
#5 0x0000000001971c54 in llvm::sys::RWMutexImpl::RWMutexImpl() ()
#6 0x0000000001971c54 in llvm::sys::RWMutexImpl::RWMutexImpl() ()
#7 0x0000000001971c54 in llvm::sys::RWMutexImpl::RWMutexImpl() ()
#8 0x0000000001971c54 in llvm::sys::RWMutexImpl::RWMutexImpl() ()
#9 0x0000000001971c54 in llvm::sys::RWMutexImpl::RWMutexImpl() ()
[...]
(gdb) info threads

Id   Target Id         Frame
  • 1 Thread 0x7f880c8000 (LWP 14280) "DriverTests" 0x0000007f8842ca04 in __lll_lock_wait (futex=0x34c59980, private=0) at lowlevellock.c:46

    Where with my patch I can bootstrap and run all the make-ldd tests without issue. I am debugging this with your patch.

Thanks for info ! I`ll review the output produced by this patch once again.

Although my TLSDESC patch have missing bits (as you pointed out and I am inclided to abandon it)

Looks your one implements non-lazy case and since your one is already workable, I think there is no real need to abandon it.
This one really can be commited on top if issues with it be resolved.

I found that I can run the next helloworld app fine with this:

main.c

int method();
int main(void) {
  return method();
}

shared.c

int puts(const char *);
__thread int foo;
int method() {
  puts("Hello, World");
  foo = 0;
}

/home/umb/LLVM/build/bin/clang -fuse-ld=lld -target aarch64-linux-gnu -fPIC -shared shared.c -o shared.so
/home/umb/LLVM/build/bin/clang -fuse-ld=lld -target aarch64-linux-gnu -fPIC main.c shared.so -o test
root@ubuntu:/home# ./test
Hello, World

but if I switch lines:

foo = 0;
puts("Hello, World");

Then it crashes:

root@ubuntu:/home# ./test
Inconsistency detected by ld.so: dl-runtime.c: 79: _dl_fixup: Assertion `((reloc->r_info) & 0xffffffff) == 1026' failed!
(source code of dynamic linker for that place: https://github.com/lattera/glibc/blob/master/elf/dl-runtime.c#L86)

My guess that it is because TLSDESC is placed not at the end in that case:
000000003020 000100000402 R_AARCH64_JUMP_SL 0000000000000000 gmon_start + 0
000000003028 000e00000402 R_AARCH64_JUMP_SL 00000000000011b0 __cxa_finalize + 0
000000003030 000c00000407 R_AARCH64_TLSDESC 0000000000000000 foo + 0
000000003040 000500000402 R_AARCH64_JUMP_SL 0000000000000000 puts + 0

I think gold/bfd place it at the end and that is the root of problem of this patch. I am fixing it.
I think this is the first relocation we meet that requires placing after all JUMP_SLOTs ones.
I can remember that also IFunc relocations were placed to the end for dynamic case by gold, but
we did not implement dynamic case for IFunc`s.

I am also not sure about the position of the TLSDESC relocation, I will check that.

I also noted that the patches does not create dynamic R_AARCH64_TLSDESC relocations against local symbols. For instance:

__thread int v1;
static __thread int v2;

// We don't use these pointers, but putting them in tests alignment on
// a 64-bit target.
__thread char* p1;
char dummy;
__thread char* p2 = &dummy;

__thread int v3 = 3;
static __thread int v4 = 4;
__thread int v5;
static __thread int v6;

int*
f() {
  return &v2;
}
clang++ -target aarch64-linux-gnu -fpic test.cc -c -o test.o
clang++ -target aarch64-linux-gnu -fpic test.o -shared -o test.so
Improper alignment for relocation R_AARCH64_TLSDESC_LD64_LO12_NC
clang-3.9: error: linker command failed with exit code 1 (use -v to see invocation)

It is due the fact is passing the wrong value on relocateOne (...). Also to avoid extra bogus dynamic relocations with recent changes in trunk I am using:

diff --git a/ELF/Target.cpp b/ELF/Target.cpp
index dbe0cc5..fef5b60 100644
--- a/ELF/Target.cpp
+++ b/ELF/Target.cpp
@@ -1220,11 +1220,14 @@ AArch64TargetInfo::AArch64TargetInfo() {
 }
 
 bool AArch64TargetInfo::isRelRelative(uint32_t Type) const {
-  return Type == R_AARCH64_PREL32 || Type == R_AARCH64_ADR_PREL_PG_HI21 ||
-         Type == R_AARCH64_LDST8_ABS_LO12_NC ||
-         Type == R_AARCH64_LDST32_ABS_LO12_NC ||
-         Type == R_AARCH64_LDST64_ABS_LO12_NC ||
-         Type == R_AARCH64_ADD_ABS_LO12_NC || Type == R_AARCH64_CALL26;
+  switch (Type) {
+  default:
+    return true;
+  case R_AARCH64_ABS64:
+  case R_AARCH64_ADR_GOT_PAGE:
+  case R_AARCH64_LD64_GOT_LO12_NC:
+    return !Config->Shared;
+  }
 }

Which I intend to create a patch to simplify the logic on dynamic relocation creation.

I placed R_AARCH64_TLSDESC after R_AARCH64_JUMP_SLOT ones. And also placed .got.plt entries after common ones for these relocations.
These changes together resolved the issues I observed, helloworld now works fine for me. I hope it is enough for initial version of patch and I am going to update it with this changes soon.

I am also not sure about the position of the TLSDESC relocation, I will check that.

I also noted that the patches does not create dynamic R_AARCH64_TLSDESC relocations against local symbols.

I`ll check that. But I think that can be fixed in a different patch(s). Changes I wrote about above IMO a good point to commit as this patch is already large.

So I have plans to update it tommorrow.

grimar updated this revision to Diff 49377.Feb 29 2016, 8:01 AM
  • Placed R_AARCH64_TLSDESC after R_AARCH64_JUMP_SLOT ones.
  • Placed .got.plt entries after common ones for these relocations.
  • Updated the testcase.

With that changes helloworld application mentioned above in this thread works fine.

grimar added inline comments.Feb 29 2016, 8:04 AM
ELF/InputSection.cpp
272

Probably yes. Fixed.

276–277

I am not sure here. There is long comment about how TlsDesc works in Writer.cpp.
If we are talk about optimizations then I removed the aarch64-tls-desc-gdle.s test from this patch, because optimizations for these relocs were commited separately in r260677. And this patch does not contain any mention of optimizations anymore anywhere.

I see placing TLSDESC after jump slots seems to be the right way. From elf aarch64 code in glibc (sysdeps/aarch64/dl-machine.h) I see that only TLSDESC is a double GOT, so we should we place it on the end to avoid dynamic relocation handling issues (gold also place the TLSDESC in .got.plt after jump slot and IRELATIVE).

However I still think it is something missing, I am still seeing some issues with bootstrap with an random testcase:

$ gdb ./bin/lld-link
(gdb) r @tools/lld/test/COFF/Output/responsefile.test.tmp.rsp /heap:0x3000
(gdb) bt
#0 0x00454211f0000f10 in ?? ()
#1 0x0000007fb7f52030 in pthread_once_slow (once_control=0x1aac6d0, init_routine=0xf9454211f0000f10) at pthread_once.c:114
#2 0x0000000000900368 in std::
future_base::_Async_state_commonV2::_M_complete_async() ()
#3 0x00000000008fd3e4 in std::future<lld::coff::InputFile*>::get() ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) up
[...]
(gdb) disas
[...]

0x0000007fb7f52020 <+132>:   add     x1, x1, #0x7c
0x0000007fb7f52024 <+136>:   mov     x19, x4
0x0000007fb7f52028 <+140>:   bl      0x7fb7f53440 <_pthread_cleanup_push>
0x0000007fb7f5202c <+144>:   blr     x20

> 0x0000007fb7f52030 <+148>: mov x0, x21

ELF/InputSection.cpp
280

I think you could just use the already got addend 'A' at line 235.

ELF/OutputSections.cpp
245

I think this should use the non-specialized 'GotPlt' access (Out<ELT>::GotPlt).

grimar updated this revision to Diff 49453.Feb 29 2016, 10:12 PM
grimar marked 2 inline comments as done.
  • Addressed Adhemerval's review comments.
  • Updated comment.

However I still think it is something missing, I am still seeing some issues with bootstrap with an random testcase:

That might be not something directly relative with this patch. At least all sample apps that uses TLSDESC relocations are working now. And we probably can land this and continue investigating issues and fix them separatelly. What do you think ?

ELF/InputSection.cpp
280

Done.

ELF/OutputSections.cpp
245

Fixed.

Right I think we can investigate it later with this patch as base.

grimar added a comment.Mar 4 2016, 5:35 AM

I am sorry that I am pinging this on 4th day and not on 5th day how it declared in llvm polite rules :) I just feel that it is an old enough patch with lot of discussions and it was just forgotten, probably.
But actually it still requires an review and I will appreciate really any comment about it, since it is large and probably needs tweaking.

At least it introduces the approach that adds relocations to the end of .got.plt, after the other ones.
(implemented in void GotPltSection<ELFT>::addTlsDescEntry(SymbolBody *Sym)).
That is the first relocations implemented in lld that requires that.

With this patch the next helloworld apps on AArch64 are running fine:

main.c

int method();
int main(void) {
  return method();
}

shared.c

int puts(const char *);
__thread int foo;
int method() {
  foo = 0;
  puts("Hello, World");
}

And that was not possible before, so I really believe it is useful to have.

Sorry if I was not clear, this LGMT to me.

Although I think you will need to rebase again, i see that canRelaxTls is now Target agnostic and it will require tlsdesc logic on it.

Sorry if I was not clear, this LGMT to me.

Yap, but I think we need to wait to know what Rafael and Rui thinks about it.

ruiu added a comment.Mar 9 2016, 10:44 PM

Do you know the number of performance gains you can get with this optimization? I'm a bit worried about the complexity of the peephole optimizations we have in the linker as it is probably the most hard-to-read part. Also, as I wrote before, it hard to identify what we have to do and what are optional optimizations (so you can skip that code when reading) by just reading code.

In D16201#371640, @ruiu wrote:

Do you know the number of performance gains you can get with this optimization? I'm a bit worried about the complexity of the peephole optimizations we have in the linker as it is probably the most hard-to-read part. Also, as I wrote before, it hard to identify what we have to do and what are optional optimizations (so you can skip that code when reading) by just reading code.

Probably there is some misunderstanding here. That is not an optimization. I think we actually have no choice about implementing them, It is just the set of relocations that are used on aarch64 for tls. This relocations can be optimized, and that is optional path I believe, that was implemented separatelly (http://reviews.llvm.org/D17980). This patch just implements the basic not optimized way.

Consider the code from my comments above. When we compile it using clang without any additional options it creates such relocations.
test.c

__thread int foo;
int method() {
  foo = 0;
}

clang -target aarch64-linux-gnu -fPIC -c test.c -o test.o
aarch64-linux-gnu-readelf -r test.o

Relocation section '.rela.text' at offset 0x160 contains 4 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
00000000000c  000600000232 R_AARCH64_TLSDESC 0000000000000000 foo + 0
000000000010  000600000233 R_AARCH64_TLSDESC 0000000000000000 foo + 0
000000000014  000600000234 R_AARCH64_TLSDESC 0000000000000000 foo + 0
000000000018  000600000239 R_AARCH64_TLSDESC 0000000000000000 foo + 0

They requires .got entry, .got.plt and so on but that is not the optimization and just the way how TLSDECS implementation actually works.

test/ELF/aarch64-tls-desc.s
7

I`ll move REQUIRES to be the first.

130

I will also add the space after # to be consistent in this test case.

This patch is required because clang only implements the tlsdesc dialect for aarch64. GCC implements both dialects, so you can:

$ gcc test.c -fPIC -o test.o -c -mtls-dialect=trad
$ readelf -r test.o

Relocation section '.rela.text' at offset 0x248 contains 3 entries:

Offset          Info           Type           Sym. Value    Sym. Name + Addend

000000000008 000a00000201 R_AARCH64_TLSGD_A 0000000000000000 foo + 0
00000000000c 000a00000202 R_AARCH64_TLSGD_A 0000000000000000 foo + 0
000000000010 000c0000011b R_AARCH64_CALL26 0000000000000000 __tls_get_addr + 0

Also, even is clang implements the traditional TLS dialect for aarch64, we do need to support tlsdesc one (and it is the default for aarch64).

grimar added a comment.EditedMar 10 2016, 2:18 AM

This patch is required because clang only implements the tlsdesc dialect for aarch64. GCC implements both dialects, so you can:

$ gcc test.c -fPIC -o test.o -c -mtls-dialect=trad
...
Also, even is clang implements the traditional TLS dialect for aarch64, we do need to support tlsdesc one (and it is the default for aarch64).

So, that mean that without additional options, like "-mtls-dialect=trad", gcc will also produce the output with TLSDESC. That what I was pointing on about clang: it is aarch64 specific tls relocation type that we can't just avoid to support, because it is used often.

And as a side note, x86_64 does have tlsdesc abi dialect implemented by gcc. Using the same example:

$ gcc test3.c -fPIC -c -mtls-dialect=gnu2
$ readelf -r test3.o

Relocation section '.rela.text' at offset 0x5b8 contains 2 entries:

Offset          Info           Type           Sym. Value    Sym. Name + Addend

000000000007 000900000022 R_X86_64_GOTPC32_ 0000000000000000 foo - 4
00000000000b 000900000023 R_X86_64_TLSDESC_ 0000000000000000 foo + 0

It does not trigger with object built with clang because it only implementes the traditional TLS argument.

grimar updated this revision to Diff 50417.Mar 11 2016, 4:42 AM
  • Rebased
  • Cleaned up
  • Fixed testcase formatting

I have been testing this patch along other that fix aarch64-linux support [1] [2] [3] and I can bootstrap and run both lld own testcases and test-suite with with relative success (test-suite shows 5 runtime failures I am investigating).

This patch also requires a new rebase and some adjustments due addEntry interface changes: instead of using a pointer to SymbolBody, I would recommend to use a reference on addTlsDescEntry (as it is for addEntry).

The only thing I am not sure about this patch is for the architecture that might use the new tlsdesc dialect, saying x86_64, i386, arm, and aarch64; the position seems to be architecture dependent. Using gold implementation as reference:

  • x86_64 places TLSDESC relocs, if any, after regular PLT followed by IRELATIVE
  • aarch64 and i386 seems to place IRELATIVE, if any, after regular PLT followed by TLSDESC
  • arm is not implemented.

So I think this solves aarc64 and possible i386 case (if the idea is to implement it in the future), however I think we will need something more flexible for a possible x86_64 implementation.

[1] http://reviews.llvm.org/D18026
[2] http://reviews.llvm.org/D17980
[3] http://reviews.llvm.org/D18031

The only thing I am not sure about this patch is for the architecture that might use the new tlsdesc dialect, saying x86_64, i386, arm, and aarch64; the position seems to be architecture dependent. Using gold implementation as reference:

  • x86_64 places TLSDESC relocs, if any, after regular PLT followed by IRELATIVE
  • aarch64 and i386 seems to place IRELATIVE, if any, after regular PLT followed by TLSDESC

That should not be a problem. We are supporting gnu_ifunc only for static case. Therefore there are no any other than IRELATIVE relocations in rela.plt when they are used.
There is probably no real need in supporting dynamic case in closest future because as far as I know ifunc mostly needed to statically link against multiarch libc.
So for now I would not overcomplicate the solution for that.

grimar added inline comments.Mar 15 2016, 6:53 AM
ELF/OutputSections.cpp
294

Ah, Relocs is a vector, not list. That is bad line then. I`ll replace it with the list on next iteration.

{F1670821}I have rebased this patch against master and checked on an aarch64 machine (using both lld-test and a full bootstrap).

ELF/OutputSections.cpp
294

I do not think this is incorrect since the vector will rearrange the itens by expanding it internally. It might however show performance regression, although I also think changing it to std::list or similar will show less performance and more memory usage (even with itens rearrange by inserting in the middle). If you check this small performance test (http://pastie.org/10764242), std::list append shows pretty much the same performance of the std::vector rearrange (the 13000000 is number of relocation in NewLLD doc).

{F1670821}I have rebased this patch against master and checked on an aarch64 machine (using both lld-test and a full bootstrap).

Did you mean this, my patch ? Then attached patch is a different one it seems. I am going to rebase this one too soon.

Actually I would put this on hold for now. It is not reviewed for about a month or so, so I think there is a reason for that or this patch just is not "in time" for current lld.
So I am ready to rebase this on request, but will not continue support of that until that.

Do you mind if I use this patch as base and repost a similar one for revision? I really want to implement this continue the TLS support on aarch64.

Do you mind if I use this patch as base and repost a similar one for revision? I really want to implement this continue the TLS support on aarch64.

Sure, please feel free to use this :)

Do you mind if I use this patch as base and repost a similar one for revision? I really want to implement this continue the TLS support on aarch64.

So since Ed is ready to start investigating lld on FreeBSD/arm64, I think it is reasonable to keep that in rebased state for him. I`ll rebase this in a few hours.
But just in case if you still want to do your own patch based on this that is also fine for me.

grimar updated this revision to Diff 51106.Mar 19 2016, 12:38 AM
  • Rebased
ELF/OutputSections.cpp
294

Well, that test uses list of ints, but we have Relocs which is a struct with several fields:

template <class ELFT> struct DynamicReloc {
  typedef typename ELFT::uint uintX_t;
  uint32_t Type;
...
  SymbolBody *Sym = nullptr;
  InputSectionBase<ELFT> *OffsetSec = nullptr;
  uintX_t OffsetInSec = 0;
  bool UseSymVA = false;
  uintX_t Addend = 0;

So shift of one element should cost a bit more.
Perfomance probably will depend a lot on amount of that tail relocations what may vary from application to application.

I don't think we should care much about memory usage here, there are probably not much relocations anyways, every list item should add only 2 pointers for each element, while vector anyways will reserve some unused place too.

I will leave that place of code as is for now.

So since Ed is ready to start investigating lld on FreeBSD/arm64, I think it is reasonable to keep that in rebased state for him. I`ll rebase this in a few hours.

Thank you!

Current status, I've applied this and D18031 and the build fails early on, when linking libc, with:

relocation R_AARCH64_ADR_PREL_PG_HI21 cannot be used when making a shared object; recompile with -fPIC.
relocation R_AARCH64_LDST64_ABS_LO12_NC cannot be used when making a shared object; recompile with -fPIC.
relocation R_AARCH64_ADR_PREL_PG_HI21 cannot be used when making a shared object; recompile with -fPIC.
relocation R_AARCH64_LDST64_ABS_LO12_NC cannot be used when making a shared object; recompile with -fPIC.
relocation R_AARCH64_ADR_PREL_PG_HI21 cannot be used when making a shared object; recompile with -fPIC.
relocation R_AARCH64_LDST64_ABS_LO12_NC cannot be used when making a shared object; recompile with -fPIC.
relocation R_AARCH64_LDST64_ABS_LO12_NC cannot be used when making a shared object; recompile with -fPIC.
relocation R_AARCH64_LDST64_ABS_LO12_NC cannot be used when making a shared object; recompile with -fPIC.
cc: error: linker command failed with exit code 1 (use -v to see invocation)
*** [libc.so.7.full] Error code 1

I'll move on to trying self-hosted clang/llvm/lld builds on FreeBSD first.

So since Ed is ready to start investigating lld on FreeBSD/arm64, I think it is reasonable to keep that in rebased state for him. I`ll rebase this in a few hours.

Thank you!

Current status, I've applied this and D18031 and the build fails early on, when linking libc, with:

relocation R_AARCH64_ADR_PREL_PG_HI21 cannot be used when making a shared object; recompile with -fPIC.
relocation R_AARCH64_LDST64_ABS_LO12_NC cannot be used when making a shared object; recompile with -fPIC.
relocation R_AARCH64_ADR_PREL_PG_HI21 cannot be used when making a shared object; recompile with -fPIC.
relocation R_AARCH64_LDST64_ABS_LO12_NC cannot be used when making a shared object; recompile with -fPIC.
relocation R_AARCH64_ADR_PREL_PG_HI21 cannot be used when making a shared object; recompile with -fPIC.
relocation R_AARCH64_LDST64_ABS_LO12_NC cannot be used when making a shared object; recompile with -fPIC.
relocation R_AARCH64_LDST64_ABS_LO12_NC cannot be used when making a shared object; recompile with -fPIC.
relocation R_AARCH64_LDST64_ABS_LO12_NC cannot be used when making a shared object; recompile with -fPIC.
cc: error: linker command failed with exit code 1 (use -v to see invocation)
*** [libc.so.7.full] Error code 1

I'll move on to trying self-hosted clang/llvm/lld builds on FreeBSD first.

Yes, with master you will hit this issue. I fixed with http://reviews.llvm.org/D18031 and you will also need http://reviews.llvm.org/D17980 and http://reviews.llvm.org/D18026 (which are not rebased yet).

I plan to resend all the patches mentioned (including this one) today rebased against master (I am currently bootstrapping and checking the test-suite on aarch64).

ELF/OutputSections.cpp
294

Although I did use a vector of ints, if you check the push_back is dominated by internal libc++ calls where insert it is dominated by memmove. And if you use the same benchmark and increase the total number of elements you will see that it still somewhat faster. It is due for general usage of lld, bulk memmove yields much more CPI and performance than pointer chasing and function call from std::list.

I would really ask the reviewers then if they will have a time to take a look on current aarch64 patches.
As updating them in every day lld changing world is really painfull sometimes.

We have multiple aarch64 patches like "on hold" now, I think any progress here would be really appreciated.

Best regards,
George.


От: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Отправлено: 21 марта 2016 г. 19:07
Кому: George Rimar; ruiu@google.com; rafael.espindola@gmail.com
Копия: adhemerval.zanella@linaro.org; emaste@freebsd.org; amara.emerson@arm.com; renato.golin@linaro.org; llvm-commits@lists.llvm.org
Тема: Re: [PATCH] D16201: [ELF/AArch64] - Implemented set of R_AARCH64_TLSDESC_* relocations.

zatrazz added a comment.

In http://reviews.llvm.org/D16201#379217, @emaste wrote:

> So since Ed is ready to start investigating lld on FreeBSD/arm64, I think it is reasonable to keep that in rebased state for him. I`ll rebase this in a few hours.

Thank you!

Current status, I've applied this and http://reviews.llvm.org/D18031 and the build fails early on, when linking libc, with:

relocation R_AARCH64_ADR_PREL_PG_HI21 cannot be used when making a shared object; recompile with -fPIC.
relocation R_AARCH64_LDST64_ABS_LO12_NC cannot be used when making a shared object; recompile with -fPIC.
relocation R_AARCH64_ADR_PREL_PG_HI21 cannot be used when making a shared object; recompile with -fPIC.
relocation R_AARCH64_LDST64_ABS_LO12_NC cannot be used when making a shared object; recompile with -fPIC.
relocation R_AARCH64_ADR_PREL_PG_HI21 cannot be used when making a shared object; recompile with -fPIC.
relocation R_AARCH64_LDST64_ABS_LO12_NC cannot be used when making a shared object; recompile with -fPIC.
relocation R_AARCH64_LDST64_ABS_LO12_NC cannot be used when making a shared object; recompile with -fPIC.
relocation R_AARCH64_LDST64_ABS_LO12_NC cannot be used when making a shared object; recompile with -fPIC.
cc: error: linker command failed with exit code 1 (use -v to see invocation)
*** [libc.so.7.full] Error code 1

I'll move on to trying self-hosted clang/llvm/lld builds on FreeBSD first.

Yes, with master you will hit this issue. I fixed with http://reviews.llvm.org/D18031 and you will also need http://reviews.llvm.org/D17980 and http://reviews.llvm.org/D18026 (which are not rebased yet).

I plan to resend all the patches mentioned (including this one) today rebased against master (I am currently bootstrapping and checking the test-suite on aarch64).

Comment at: ELF/OutputSections.cpp:294
@@ +293,3 @@
+ Relocs.insert(Relocs.end() - Tail, Reloc);
+}

+

Although I did use a vector of ints, if you check the push_back is dominated by internal libc++ calls where insert it is dominated by memmove. And if you use the same benchmark and increase the total number of elements you will see that it still somewhat faster. It is due for general usage of lld, bulk memmove yields much more CPI and performance than pointer chasing and function call from std::list.

http://reviews.llvm.org/D16201


llvm-commits mailing list
llvm-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits

I plan to resend all the patches mentioned (including this one) today rebased against master (I am currently bootstrapping and checking the test-suite on aarch64).

Ok, once that's ready I will apply all of those and try the FreeBSD/arm64 build.

Let support all of that in one place: http://reviews.llvm.org/D18330

Let support all of that in one place: http://reviews.llvm.org/D18330

If you put just the review ID phabricator will add a cross-reference to the target also: D18330.

grimar abandoned this revision.Mar 23 2016, 8:16 AM

Let support all of that in one place: http://reviews.llvm.org/D18330

If you put just the review ID phabricator will add a cross-reference to the target also: D18330.

Sorry, what I actually wanted it was to abandon this. I forgot to choose the status it seems.