Details
Diff Detail
Event Timeline
I don't know - this is the way it's done currently for other, normal global variable: An ADDlow can get merged into a following LDR/STR. For cases when the ADDlow isn't merged or matched by something else, there needs to be a fallback rule that lowers it.
I guess I could look at adding other code for matching an ADDXri machine node with LDR/STR, but I don't know if that has got other implications.
I think this change is good.
Here is what happens on arm64-tls-execs.ll
before this patch, the dag after instruction selection looks like this:
t0: ch = EntryToken t12: i64 = ADDXri MOVbaseTLS:i64, TargetGlobalTLSAddress:i64<i32* @local_exec_var> 0 [TF=71], TargetConstant:i32<0> t13: i64 = ADDXri t12, TargetGlobalTLSAddress:i64<i32* @local_exec_var> 0 [TF=98], TargetConstant:i32<0> t4: i32,ch = LDRWui<Mem:LD4[@local_exec_var](dereferenceable)> t13, TargetConstant:i64<0>, t0 t6: ch,glue = CopyToReg t0, Register:i32 $w0, t4 t7: ch = RET_ReallyLR Register:i32 $w0, t6, t6:1
with the patch there is one less ADDXri that got folded into the load:
t0: ch = EntryToken t12: i64 = ADDXri MOVbaseTLS:i64, TargetGlobalTLSAddress:i64<i32* @local_exec_var> 0 [TF=71], TargetConstant:i32<0> t4: i32,ch = LDRWui<Mem:LD4[@local_exec_var](dereferenceable)> t12, TargetGlobalTLSAddress:i64<i32* @local_exec_var> 0 [TF=98], t0 t6: ch,glue = CopyToReg t0, Register:i32 $w0, t4 t7: ch = RET_ReallyLR Register:i32 $w0, t6, t6:1
That is because aarch64 has a pattern
defm : ExtLoadTo32ROPat<ro8, extloadi8, LDRBBroW, LDRBBroX>;
to match
t13: i64 = AArch64ISD::ADDlow t12, TargetGlobalTLSAddress:i64<i32* @local_exec_var> 0 [TF=98] t4: i32,ch = load<LD4[@local_exec_var](dereferenceable)> t0, t13, undef:i64
and transform that into:
Morphed node: t4: i32,ch = LDRWui<Mem:LD4[@local_exec_var](dereferenceable)> t12, TargetGlobalTLSAddress:i64<i32* @local_exec_var> 0 [TF=98], t0
I guess I could look at adding other code for matching an ADDXri machine node with LDR/STR, but I don't know if that has got other implications.
I think it is impossible to specify a pattern to match a load with a machine node AArch64::ADDXri.
The LHS matching part of a def-pat should be a generic dag node.
The current patch avoids lowering the add into a machine node too early, and keeps the add as a generic addlow node, making the load+addlow ISEL pattern match.
If the addlow node is not folded into a load, it gets caught by the pseudo after regalloc and lowered into a machine node ADDXri.
Unfortunately this change looks like it is going to be incompatible with binutils and LLD for the ELF targets as the ldr . The new output via llvm-objdump is:
0: 48 d0 3b d5 mrs x8, TPIDR_EL0 4: 08 01 40 91 add x8, x8, #0, lsl #12 0000000000000004: R_AARCH64_TLSLE_ADD_TPREL_HI12 tlsVar 8: 00 01 40 b9 ldr w0, [x8] 0000000000000008: R_AARCH64_TLSLE_LDST32_TPREL_LO12_NC tlsVar c: c0 03 5f d6 ret
The GNU tools and LLD don't implement R_AARCH64_TLSLE_LDST32_TPREL_LO12_NC yet. For example using aarch64-linux-gnu-objdump:
0000000000000000 <getVar>: 0: d53bd048 mrs x8, tpidr_el0 4: 91400108 add x8, x8, #0x0, lsl #12 4: R_AARCH64_TLSLE_ADD_TPREL_HI12 tlsVar 8: b9400100 ldr w0, [x8] 8: *unknown* tlsVar c: d65f03c0 ret
Whereas the equivalent add relocation R_AARCH64_TLSLE_ADD_TPREL_LO12 is supported.
This is currently manifesting itself as a buildbot failure on AArch64 http://lab.llvm.org:8011/builders/clang-cmake-aarch64-full/builds/4609/steps/ninja%20check%202/logs/stdio
I think we'll need to get R_AARCH64_TLSLE_ADD_TPREL_LO12 implemented in binutils and LLD and widely adopted by distributions before we can enable this by default, at least for Linux targets.
Oh, I hadn't realized that this relocation wasn't implemented for ELF even though it did exist in the LLVM object library. When working on TLS for windows, I implemented both the ADD and LDR/STR relocations at the same time, and MSVC uses the LDR/STR form by default.
I'm fine with reverting the ELF part of this change for now - I'll do that in a moment.
Thanks.
I've raised https://bugs.llvm.org/show_bug.cgi?id=36727 to get the relocation implemented in LLD. Will need to raise issues in Gold and BFD as well.
https://sourceware.org/bugzilla/show_bug.cgi?id=22969 and https://sourceware.org/bugzilla/show_bug.cgi?id=22970 raised on gold and ld.bfd respectively.