This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Fold adds with tprel_lo12_nc and secrel_lo12 into a following ldr/str
ClosedPublic

Authored by mstorsjo on Mar 10 2018, 1:23 PM.

Diff Detail

Event Timeline

mstorsjo created this revision.Mar 10 2018, 1:23 PM

Can't this be done in the DAGCombine?

I don't know - this is the way it's done currently for other, normal global variable: An ADDlow can get merged into a following LDR/STR. For cases when the ADDlow isn't merged or matched by something else, there needs to be a fallback rule that lowers it.

I guess I could look at adding other code for matching an ADDXri machine node with LDR/STR, but I don't know if that has got other implications.

sebpop accepted this revision.Mar 12 2018, 8:50 AM

I think this change is good.

Here is what happens on arm64-tls-execs.ll
before this patch, the dag after instruction selection looks like this:

t0: ch = EntryToken
      t12: i64 = ADDXri MOVbaseTLS:i64, TargetGlobalTLSAddress:i64<i32* @local_exec_var> 0 [TF=71], TargetConstant:i32<0>
    t13: i64 = ADDXri t12, TargetGlobalTLSAddress:i64<i32* @local_exec_var> 0 [TF=98], TargetConstant:i32<0>
  t4: i32,ch = LDRWui<Mem:LD4[@local_exec_var](dereferenceable)> t13, TargetConstant:i64<0>, t0
t6: ch,glue = CopyToReg t0, Register:i32 $w0, t4
t7: ch = RET_ReallyLR Register:i32 $w0, t6, t6:1

with the patch there is one less ADDXri that got folded into the load:

t0: ch = EntryToken
    t12: i64 = ADDXri MOVbaseTLS:i64, TargetGlobalTLSAddress:i64<i32* @local_exec_var> 0 [TF=71], TargetConstant:i32<0>
  t4: i32,ch = LDRWui<Mem:LD4[@local_exec_var](dereferenceable)> t12, TargetGlobalTLSAddress:i64<i32* @local_exec_var> 0 [TF=98], t0
t6: ch,glue = CopyToReg t0, Register:i32 $w0, t4
t7: ch = RET_ReallyLR Register:i32 $w0, t6, t6:1

That is because aarch64 has a pattern

defm : ExtLoadTo32ROPat<ro8,  extloadi8,   LDRBBroW, LDRBBroX>;

to match

  t13: i64 = AArch64ISD::ADDlow t12, TargetGlobalTLSAddress:i64<i32* @local_exec_var> 0 [TF=98]
t4: i32,ch = load<LD4[@local_exec_var](dereferenceable)> t0, t13, undef:i64

and transform that into:

Morphed node: t4: i32,ch = LDRWui<Mem:LD4[@local_exec_var](dereferenceable)> t12, TargetGlobalTLSAddress:i64<i32* @local_exec_var> 0 [TF=98], t0

I guess I could look at adding other code for matching an ADDXri machine node with LDR/STR, but I don't know if that has got other implications.

I think it is impossible to specify a pattern to match a load with a machine node AArch64::ADDXri.
The LHS matching part of a def-pat should be a generic dag node.

The current patch avoids lowering the add into a machine node too early, and keeps the add as a generic addlow node, making the load+addlow ISEL pattern match.
If the addlow node is not folded into a load, it gets caught by the pseudo after regalloc and lowered into a machine node ADDXri.

This revision is now accepted and ready to land.Mar 12 2018, 8:50 AM
This revision was automatically updated to reflect the committed changes.

Unfortunately this change looks like it is going to be incompatible with binutils and LLD for the ELF targets as the ldr . The new output via llvm-objdump is:

       0:	48 d0 3b d5 	mrs	x8, TPIDR_EL0
       4:	08 01 40 91 	add	x8, x8, #0, lsl #12
		0000000000000004:  R_AARCH64_TLSLE_ADD_TPREL_HI12	tlsVar
       8:	00 01 40 b9 	ldr	w0, [x8]
		0000000000000008:  R_AARCH64_TLSLE_LDST32_TPREL_LO12_NC	tlsVar
       c:	c0 03 5f d6 	ret

The GNU tools and LLD don't implement R_AARCH64_TLSLE_LDST32_TPREL_LO12_NC yet. For example using aarch64-linux-gnu-objdump:

0000000000000000 <getVar>:
   0:	d53bd048 	mrs	x8, tpidr_el0
   4:	91400108 	add	x8, x8, #0x0, lsl #12
			4: R_AARCH64_TLSLE_ADD_TPREL_HI12	tlsVar
   8:	b9400100 	ldr	w0, [x8]
			8: *unknown*	tlsVar
   c:	d65f03c0 	ret

Whereas the equivalent add relocation R_AARCH64_TLSLE_ADD_TPREL_LO12 is supported.

This is currently manifesting itself as a buildbot failure on AArch64 http://lab.llvm.org:8011/builders/clang-cmake-aarch64-full/builds/4609/steps/ninja%20check%202/logs/stdio

I think we'll need to get R_AARCH64_TLSLE_ADD_TPREL_LO12 implemented in binutils and LLD and widely adopted by distributions before we can enable this by default, at least for Linux targets.

Unfortunately this change looks like it is going to be incompatible with binutils and LLD for the ELF targets as the ldr . The new output via llvm-objdump is:

       0:	48 d0 3b d5 	mrs	x8, TPIDR_EL0
       4:	08 01 40 91 	add	x8, x8, #0, lsl #12
		0000000000000004:  R_AARCH64_TLSLE_ADD_TPREL_HI12	tlsVar
       8:	00 01 40 b9 	ldr	w0, [x8]
		0000000000000008:  R_AARCH64_TLSLE_LDST32_TPREL_LO12_NC	tlsVar
       c:	c0 03 5f d6 	ret

The GNU tools and LLD don't implement R_AARCH64_TLSLE_LDST32_TPREL_LO12_NC yet. For example using aarch64-linux-gnu-objdump:

0000000000000000 <getVar>:
   0:	d53bd048 	mrs	x8, tpidr_el0
   4:	91400108 	add	x8, x8, #0x0, lsl #12
			4: R_AARCH64_TLSLE_ADD_TPREL_HI12	tlsVar
   8:	b9400100 	ldr	w0, [x8]
			8: *unknown*	tlsVar
   c:	d65f03c0 	ret

Whereas the equivalent add relocation R_AARCH64_TLSLE_ADD_TPREL_LO12 is supported.

This is currently manifesting itself as a buildbot failure on AArch64 http://lab.llvm.org:8011/builders/clang-cmake-aarch64-full/builds/4609/steps/ninja%20check%202/logs/stdio

I think we'll need to get R_AARCH64_TLSLE_ADD_TPREL_LO12 implemented in binutils and LLD and widely adopted by distributions before we can enable this by default, at least for Linux targets.

Oh, I hadn't realized that this relocation wasn't implemented for ELF even though it did exist in the LLVM object library. When working on TLS for windows, I implemented both the ADD and LDR/STR relocations at the same time, and MSVC uses the LDR/STR form by default.

I'm fine with reverting the ELF part of this change for now - I'll do that in a moment.

I think we'll need to get R_AARCH64_TLSLE_ADD_TPREL_LO12 implemented in binutils and LLD and widely adopted by distributions before we can enable this by default, at least for Linux targets.

I'm fine with reverting the ELF part of this change for now - I'll do that in a moment.

Reverted in SVN r327503.

Thanks.

I've raised https://bugs.llvm.org/show_bug.cgi?id=36727 to get the relocation implemented in LLD. Will need to raise issues in Gold and BFD as well.