This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Use hardware TLS register in Thumb2 mode when -mtp=cp15 is passed
ClosedPublic

Authored by ardb on Oct 27 2021, 1:57 AM.

Details

Summary

In ARM mode, passing -mtp=cp15 forces the use of an inline MRC system register read to move the thread pointer value into a register.

Currently, in Thumb2 mode, -mtp=cp15 is ignored, and a call to the __aeabi_read_tp helper is emitted instead.

This is inconsistent, and breaks the Linux/ARM build for Thumb2 targets, as the Linux kernel does not provide an implementation of __aeabi_read_tp,.

Diff Detail

Event Timeline

ardb created this revision.Oct 27 2021, 1:57 AM
ardb requested review of this revision.Oct 27 2021, 1:57 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptOct 27 2021, 1:57 AM

I tested this against next-20211027, where it resolved the build error and boots in QEMU master without any visible issues. Disassembly between GCC and clang seems to be good. I'll leave it up to @nickdesaulniers and @peter.smith to approve, as I am not very familiar with TableGen.

peter.smith accepted this revision.Oct 27 2021, 9:55 AM

LGTM as this as CP15 can be used on architecture v6k and above, which maps to IsThumb2.

As an aside from this patch, the Arm state could be considered too permissive as it will permit -mtls=cp15 on architectures that wouldn't have the coprocessor register like arm7tdmi, although GCC also permits this so I guess we're in the set this option with care territory.

This revision is now accepted and ready to land.Oct 27 2021, 9:55 AM
nickdesaulniers requested changes to this revision.Oct 27 2021, 10:28 AM

Thanks for the patch!

clang/test/CodeGen/arm-tphard.c
10 ↗(On Diff #382558)

Let's make this a test under llvm/test/CodeGen/, using IR:

; RUN: llc --mtriple=armv7-linux-gnueabihf -o - %s | FileCheck %s
; RUN: llc --mtriple=thumbv7-linux-gnu -o - %s | FileCheck %s
define dso_local i8* @tphard() "target-features"="+read-tp-hard" {
// CHECK-NOT: __aeabi_read_tp
  %1 = tail call i8* @llvm.thread.pointer()
  ret i8* %1
}

declare i8* @llvm.thread.pointer()
This revision now requires changes to proceed.Oct 27 2021, 10:28 AM
ardb added inline comments.Oct 27 2021, 2:30 PM
clang/test/CodeGen/arm-tphard.c
10 ↗(On Diff #382558)

Let's make this a test under llvm/test/CodeGen/, using IR:

Why is that better?

; RUN: llc --mtriple=armv7-linux-gnueabihf -o - %s | FileCheck %s
; RUN: llc --mtriple=thumbv7-linux-gnu -o - %s | FileCheck %s

Are you sure using this triple forces generation of Thumb2 code? It didn't seem to when I tried.

define dso_local i8* @tphard() "target-features"="+read-tp-hard" {
// CHECK-NOT: __aeabi_read_tp

Are you sure __aeabi_read_tp will appear in the IR for soft TP?

%1 = tail call i8* @llvm.thread.pointer()
ret i8* %1

}

declare i8* @llvm.thread.pointer()

clang/test/CodeGen/arm-tphard.c
10 ↗(On Diff #382558)

Why is that better?

Because the front-end isn't really involved in this back end code gen bug, so we should just be testing the back end.

Are you sure using this triple forces generation of Thumb2 code? It didn't seem to when I tried.

Good question; I guess for thumb2 there's no command line flag passed to the compiler that says "I would like thumb[1] as opposed to thumb2?" Maybe @peter.smith can provide some color on that? Is it simply armv7 for thumb2 vs v6 for thumb[1], or something else?

Are you sure __aeabi_read_tp will appear in the IR for soft TP?

__aeabi_read_tp will never appear in the IR (unless someone explicitly called it`. Instead, we're testing that the intrinsic (@llvm.thread.pointer) is lowered to either that libcall or mrc. __aeabi_read_tp may appear in the object file or assembler code generated from that intrinsic call.

But also, it looks like there's already coverage for __aeabi_read_tp being generated for soft TP. See also llvm/test/CodeGen/ARM/readtp.ll. There's also thread_pointer.ll in that same dir (and a few more tests mentioning __aeabi_read_tp that all look like candidates to add these tests to rather than creating a new test file, perhaps.

ardb updated this revision to Diff 382826.Oct 27 2021, 3:46 PM

Drop new Clang CodeGen test, and add the Thumb2 check to an existing backend test instead.

nickdesaulniers accepted this revision.Oct 27 2021, 3:55 PM

Cool, if you wouldn't mind additional extending this patch to cover the intrinsic that the front end also generates, this LGTM:

diff --git a/llvm/test/CodeGen/ARM/thread_pointer.ll b/llvm/test/CodeGen/ARM/thread_pointer.ll
index c6318a58277c..f1ef2ddac2d0 100644
--- a/llvm/test/CodeGen/ARM/thread_pointer.ll
+++ b/llvm/test/CodeGen/ARM/thread_pointer.ll
@@ -1,4 +1,7 @@
-; RUN: llc -mtriple arm-linux-gnueabi -filetype asm -o - %s | FileCheck %s
+; RUN: llc -mtriple arm-linux-gnueabi -o - %s | FileCheck %s -check-prefix=CHECK-SOFT
+; RUN: llc -mtriple arm-linux-gnueabi -mattr=+read-tp-hard -o - %s | FileCheck %s -check-prefix=CHECK-HARD
+; RUN: llc -mtriple thumbv7-linux-gnueabi -o - %s | FileCheck %s -check-prefix=CHECK-SOFT
+; RUN: llc -mtriple thumbv7-linux-gnueabi -mattr=+read-tp-hard -o - %s | FileCheck %s -check-prefix=CHECK-HARD
 
 declare i8* @llvm.thread.pointer()
 
@@ -8,5 +11,6 @@ entry:
   ret i8* %tmp1
 }
 
-; CHECK: bl __aeabi_read_tp
+; CHECK-SOFT: bl __aeabi_read_tp
+; CHECK-HARD: mrc p15, #0, {{r[0-9]+}}, c13, c0, #3

Thanks for the patch!

This revision is now accepted and ready to land.Oct 27 2021, 3:55 PM
ardb updated this revision to Diff 382839.Oct 27 2021, 4:14 PM

Add another test suggested by Nick.

nickdesaulniers accepted this revision.Oct 27 2021, 4:18 PM
ardb added a comment.Oct 27 2021, 4:26 PM

Thanks all. Could someone with commit access please merge this? Thanks.

Thanks all. Could someone with commit access please merge this? Thanks.

Sure thing; I'll pick it up. Thanks again for the patch!

This revision was landed with ongoing or failed builds.Oct 27 2021, 4:46 PM
This revision was automatically updated to reflect the committed changes.
peter.smith added inline comments.Oct 28 2021, 10:02 AM
clang/test/CodeGen/arm-tphard.c
10 ↗(On Diff #382558)

Are you sure using this triple forces generation of Thumb2 code? It didn't seem to when I tried.

Good question; I guess for thumb2 there's no command line flag passed to the compiler that says "I would like thumb[1] as opposed to thumb2?" Maybe @peter.smith can provide some color on that? Is it simply armv7 for thumb2 vs v6 for thumb[1], or something else?

As I understand it, "Thumb2 Technology" to give it the marketing name is still the Thumb (T32) ISA, it just has access to a lot more instructions than Thumb. In the backend this means that the compiler is free to select instructions with the Thumb2 predicate. It doesn't have to if there is an equivalent 2-byte sized Thumb instruction that does the job.

In theory to get Thumb only code -march=armv6 might work, armv6 is the intersection of the A, R and M profiles that include no Thumb 2.

The architecture for Thumb 2 is close to v7+ although like most things Arm it is more complicated:

  • Armv6k as implemented by one CPU Arm1165t2-s (which I wouldn't expect to see running Linux). All other Arm v6 CPUs including the Arm 1176jzf-s (original Raspberry Pi) do not have Thumb 2.
  • All Arm v7 CPUs have Thumb 2, including M, R and A profiles.
  • All Arm v8 R and A profile CPUs have Thumb 2 as do Armv8-M.mainline from M profile. The Armv8-M.baseline CPUs do not (these are the smallest microcontrollers).
clang/test/CodeGen/arm-tphard.c
10 ↗(On Diff #382558)

Thanks for all the historical context!

In the backend this means that the compiler is free to select instructions with the Thumb2 predicate. It doesn't have to if there is an equivalent 2-byte sized Thumb instruction that does the job.

So I guess this would answer @ardb 's question:

Are you sure using this triple forces generation of Thumb2 code? It didn't seem to when I tried.

It seems that the answer is "it depends on whether there is an equivalent 2-byte size Thumb instruction that does the job."