Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
I don't know much about COFF TLS to tell why you need the special treatment, but the AArch64 asm looks correct (though very inefficient, but you know that:).
Thanks for taking a look in any case!
Let's see if I can get some more progress on this if I turn it into a few more concrete questions:
First: Currently there's the pseudoinstruction LOADgot which lowers into a "adrp + ldr" instruction pair, but the ldr always loads a full X-register. Here I needed to do that, but load a W-register instead, to only read 32 bits. I tried to do this by adding a separate LOADgot32 pseudoinstruction, but I wasn't able to make it return a 32 bit value on the SelectionDAG level.
Secondly: When I do a series of SHL + ZERO_EXTEND + Load, I would want it lowered into a ldr x0, [x0, w1 uxtw #3], but it ends up as lsl w1, w1, #3; ldr x0, [x0, w1 uxtw] which feels pointless. I remember doing similar things elsewhere, where it was folded properly into the ldr as a shift - what's missing here?
Who's familiar with the tablegen part of the AArch64 backend and/or SelectionDAG and can help out with this?
Right, I see. This seems to be a new problem. When trying it on current trunk, I do get an unreachable, but when I try on older versions (ex. 3.8) I see a much simpler code coming out.
@tlsVar16 = thread_local global i16 0
@tlsVar32 = thread_local global i32 0
@tlsVar64 = thread_local global i64 0
define i64 @getVar() {
  %1 = load i16, i16* @tlsVar16
  %2 = load i32, i32* @tlsVar32
  %3 = load i64, i64* @tlsVar64
  %4 = zext i16 %1 to i32
  %5 = add i32 %2, %4
  %6 = zext i32 %5 to i64
  %7 = add i64 %3, %6
  ret i64 %7
}then:
$ llc -mtriple aarch64-windows tls-32b.ll -o -I get:
mrs x8, TPIDR_EL0 add x9, x8, :tprel_hi12:tlsVar16 add x10, x8, :tprel_hi12:tlsVar32 add x8, x8, :tprel_hi12:tlsVar64 add x9, x9, :tprel_lo12_nc:tlsVar16 add x10, x10, :tprel_lo12_nc:tlsVar32 add x8, x8, :tprel_lo12_nc:tlsVar64 ldr w10, [x10] ldrh w9, [x9] ldr x8, [x8] add w9, w10, w9 add x0, x8, w9, uxtw ret
Which does the correct loads for half, single and double sizes.
Maybe the functionality was lost for not being tested properly on Windows?
Secondly: When I do a series of SHL + ZERO_EXTEND + Load, I would want it lowered into a ldr x0, [x0, w1 uxtw #3], but it ends up as lsl w1, w1, #3; ldr x0, [x0, w1 uxtw] which feels pointless. I remember doing similar things elsewhere, where it was folded properly into the ldr as a shift - what's missing here?
Who's familiar with the tablegen part of the AArch64 backend and/or SelectionDAG and can help out with this?
AFAICR, this is supposed to be done by the DAGCombine. However, it depends on when the Pseudos get expanded (I forget). If it's too late, then they won't get merged.
No, that's probably because it aarch64-windows didn't exist at all back then (there's even no publicly available hardware yet), so this just did something else (looks like ELF relocations). And the issue isn't with reading tls vars of different sizes, but just in one small piece of implementing it properly - when reading the global _tls_index variable, which is a 32 bit value.
To reproduce the issue with this, that I'm facing, try adding the following to AArch64InstrInfo.td (next to the existing similar pattern for LOADgot):
def LOADgot32 : Pseudo<(outs GPR32:$dst), (ins i64imm:$addr),
                     [(set GPR32:$dst, (AArch64LOADgot tglobaladdr:$addr))]>,
              Sched<[WriteLDAdr]>;When building with this pattern in place, tablegen fails like this:
FAILED: cd /home/martin/code/llvm/build/lib/Target/AArch64 && /home/martin/code/llvm/build/bin/llvm-tblgen -gen-instr-info -I /home/martin/code/llvm/lib/Target/AArch64 -I /home/martin/code/llvm/include -I /home/martin/code/llvm/lib/Target /home/martin/code/llvm/lib/Target/AArch64/AArch64.td -o /home/martin/code/llvm/build/lib/Target/AArch64/AArch64GenInstrInfo.inc.tmp
Type set is empty for each HW mode:
possible type contradiction in the pattern below (use -print-records with llvm-tblgen to see all expanded records).
LOADgot32:      (LOADgot32:{ *:[i32] } (tglobaladdr:{ *:[] }):$addr)
UNREACHABLE executed at ../utils/TableGen/CodeGenDAGPatterns.cpp:817!I guess I could try to see what dag nodes this produces if I do such a load of a 32 bit global variable from normal IR.
D'oh, of course! :)
I guess I could try to see what dag nodes this produces if I do such a load of a 32 bit global variable from normal IR.
That's what I'd recommend, yes.
Managed to implement it without adding any new pseudo instructions. No significant effect on the generated code (the two missing combines are still as before), but this isn't nearly as messy in as many places as before.
While still suboptimal, I wouldn't be quite as embarrased to commit this as such now, with the rest of the instruction selection mess cleaned up now.
Right, this makes a lot more sense! :)
I'm assuming only 32-bit values are supported? If not, would it be too hard to add different sizes now?
Managed to merge the lsl into the ldr addressing by swapping the order of ZERO_EXTEND and SHL.
No, it supports any size. This added code just calculates the address of the tls variable. (The fiddling with a 32 bit load was only for the _tls_index ABI variable, not the user defined tls variable itself.)
Added testcases for getting a plain pointer, for storing to the tls var, and for other sizes of tls vars.