This is an archive of the discontinued LLVM Phabricator instance.

[lld][ARM] support absolute thunks for Armv4T Thumb and interworking
ClosedPublic

Authored by stuij on Dec 12 2022, 3:17 PM.

Details

Summary

changes:

  • BLX: The Arm architecture versions that support the branch and link instruction (BLX), can rewrite BLs in place when a state change from Arm<->Thumb is required. Armv4T does not have BLX and so needs thunks for state changes.
  • v4T Thumb long branches needed their own thunk. We could have used the v6M implementation, but v6M doesn't have Arm state and must resolve to rather inefficient stack reshuffling. We also can't reuse v7 thumb thunks as they use MOVV/MOVT, which wasn't available yet for v4T.
  • Remove the `lack of BLX' warning. LLVM only supports Arm Architecture versions upwards of v4, which we now all support in LLD.
  • renamed existing thunks to better reflect their use: ARMV5ABSLongThunk -> ARMV5LongLdrPcThunk, ARMV5PILongThunk -> ARMV4PILongThunk
  • removed isCompatibleWith method from ARMV5ABSLongThunk and ARMV5PILongThunk, as they were identical to the ARMThunk parent class implementation.

Support for (efficient) position independent thunks for v4T will be added in a
follow-up patch, including possible related thunk renaming and code comment
cleanup.

Diff Detail

Event Timeline

stuij created this revision.Dec 12 2022, 3:17 PM
Herald added a project: Restricted Project. · View Herald Transcript
stuij requested review of this revision.Dec 12 2022, 3:17 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 12 2022, 3:17 PM
stuij edited the summary of this revision. (Show Details)Dec 12 2022, 3:28 PM
stuij added a reviewer: peter.smith.
MaskRay added a comment.EditedDec 12 2022, 11:06 PM

(On a trip until Dec 15, with limited time using a computer)

add linker support for Armv4(T) absolute relocations

What are the "absolute relocations"? They usually refer to R_ARM_ABS32 and other relocation types that are represented as R_ABS in lld.

"linker" in the subject isn't useful. The directory is all about the linker implementation.

If this is about supporting additional thunks, a more useful subject may be "Support thunks for ..."

Does the support make GBA and NDS programs with mixed A32/T32 code linkable with lld? If yes, that'll be great... I've played many GBA/NDS games in the past...

stuij updated this revision to Diff 482401.Dec 13 2022, 2:05 AM

renamed commit message

stuij retitled this revision from [lld][ARM] add linker support for Armv4(T) absolute relocations to [lld][ARM] support absolute thunks for Armv4T Thumb and interworking.Dec 13 2022, 2:06 AM
stuij added a comment.Dec 13 2022, 2:46 AM

What are the "absolute relocations"? They usually refer to R_ARM_ABS32 and other relocation types that are represented as R_ABS in lld.

"linker" in the subject isn't useful. The directory is all about the linker implementation.

Thanks. Yes you're right, this is about making R_ARM_CALL and R_ARM_THUMB_CALL work with non-PIC thunks. Previously I actually had "thunks" in the subject and didn't have "linker", but then at the last minute late at night for reasons of bad judgement I guess, I decided to go with relocation and added linker :) I changed the subject to follow your suggestion.

Does the support make GBA and NDS programs with mixed A32/T32 code linkable with lld? If yes, that'll be great... I've played many GBA/NDS games in the past...

Yes, indeed it does :) And inexplicably I didn't even bother to test-drive this code on a GBA program. But now I did and I'm staring at a red screen written by Thumb code called from Arm created by an all-LLVM toolchain :)

What are the "absolute relocations"? They usually refer to R_ARM_ABS32 and other relocation types that are represented as R_ABS in lld.

When I added the Thunk support I borrowed terminology from Arm's linker. It uses ABS as short for ABSOLUTE which was its term for Non-PI or position dependent.

Does the support make GBA and NDS programs with mixed A32/T32 code linkable with lld? If yes, that'll be great... I've played many GBA/NDS games in the past...

GBA is v4T so in theory this should make it possible as I think the existing homebrew dev-kits are based on GCC. NDS has a v5T and a v4T, not sure what the development story was for it, whether the software used v4T exclusively or the chips were programmed independently.

A few small comments, but overall looking good to me.

For v4T PI thunks I think the V4 Thumb to Arm is essentially

P:  bx px 
     b #-6
     ldr ip, [pc, #0] ; L2
L1: add pc, pc, ip
L2: .word s - (p + (L1-p) + 8)

The Arm to Thumb looks like it can reuse the existing v5 PI Thunk.

lld/ELF/Thunks.cpp
242

Is this extra new line intentional?

664–665

It looks like the original naming convention for Thunks has broken down a bit. When I first saw the sequence I left a comment that you can't use this for V4 as it can't handle interworking, only saw the ARMLongBXThunk later.

As I understand it the thunk has a code sequence that is V4 ABS (non-interworking), but V5 ABS (interworking). The new v4 ABS Long BX thunk is only needed for V4 interworking.

Although a more significant name change, perhaps:
ARMV4V5LongLdrPcThunk
ARMV4LongBxThunk

Not sure what the capitalisation should be for Ldr and Bx. As instructions they need to be all one case, but this makes it difficult to read the class name.

693

perhaps worth a comment: Arm recommended sequence to follow bx pc. As it looks strange, and in theory anything can be used as it is never executed, but it can make a difference if run on a CPU with speculative execution.

1142

I think it will be worth separating what is not supported, from what is supported inefficiently.
TODO: Support PIC interworking thunks for V4T.
TODO: More efficient PIC non-interworking thunks for V4T.

1144

This would be useful as it looks like there is a potential problem with the existing implementation for V5T and below as it always uses the 16 MiB V6+ Thumb branch range. This puts it at risk of relocation out of range errors if there is more than 4 MiB of code in a single output section.

1153

I think the ARMV5PILongThunk can work for V4T as well here.

P:   ldr ip, [pc, #4] ; L2
L1: add ip, pc, ip
      bx ip
L2: .word S - (P + (L1 - P) + 8)

The entry is Arm state so no BLX required, and the LDR is just loading an offset, the state change is done by the bx ip.

1229–1230

Possible to rerrange to:

if (config->armJ1J2BranchEncoding)
  return addThunkV6M(reloc, s, a);
else if (config->armHasBlx)
  return addThunkArmv5v6(reloc, s, a);
return addThunkArmv4(reloc, s, a);

Should be similar but could be easier to read.

stuij added a comment.Dec 13 2022, 3:55 AM

NDS has a v5T and a v4T, not sure what the development story was for it, whether the software used v4T exclusively or the chips were programmed independently.

The official devkit didn't allow programmers to write for the arm7tdmi directly actually. It was mainly used to control hardware and you had to use an API to talk to it from the Arm9 (which is horrible to program for btw, as cost-saving measures meant that you can only run it at full speed when interfacing with TCM and caches). For home-brew devkits of course anything goes and you can program them both. They communicate through an interconnect FIFO. And yes, the most popular home-brew devkit is based on current GCC.

stuij added inline comments.Dec 13 2022, 4:07 AM
lld/ELF/Thunks.cpp
1144

I have a separate fix for that issue btw.

stuij updated this revision to Diff 482444.Dec 13 2022, 5:54 AM
stuij marked 6 inline comments as done.

addressed various review comments

lld/ELF/Thunks.cpp
242

no. removed.

1153

ah right. thanks.

stuij added inline comments.Dec 13 2022, 6:45 AM
lld/ELF/Thunks.cpp
664–665

As I understand it the thunk has a code sequence that is V4 ABS (non-interworking), but V5 ABS (interworking). The new v4 ABS Long BX thunk is only needed for V4 interworking.

Correct, LDR doesn't do state change for Armv4T.

Regarding naming, V6 also uses the V5 thunks, so the thunk name would become:
ARMV4V5V6LongLdrPcThunk

Similarly for PI thunks that would become:
ARMV4V5V6PILongThunk

I *think* I'm ok with this as it does make the intent clear, but it's not pretty. Thoughts?

Thanks for the updates. One more suggestion on the naming, but I've not got a strong opinion.

lld/ELF/Thunks.cpp
664–665

Likely that the best naming scheme means renaming everything, which is not ideal as it will mean updating a lot of tests.

I think the requirement is probably the best so V7 requires Arm V7 etc. This would imply that only ARMV4PILongThunk is needed.

The awkward case is the LongLdrPcThunk as that can be used in Arm state for non-interworking. This would end up with a monstrosity like ARMV5AndNotInterworkingV4LongLdrPcThunk. It is probably less confusing to say ARMV5LongLdrPcThunk with a comment that says can be used for ARMV4T if not interworking.

Probably best err on the side of readable names with the one exception rather than precise but unreadable names.

stuij added inline comments.Dec 13 2022, 8:39 AM
lld/ELF/Thunks.cpp
664–665

Right, so we're back to minimum arch requirement encoding so to speak. Yes I'd prefer that. And can live with ARMV5LongLdrPcThunk.

So the name changes included in this patch would then be:
ARMV4PILongThunk
ARMV5LongLdrPcThunk
ARMV4ABSLongBxThunk (as there will also be an ARMV4PILongBxThunk in follow-up patch)

peter.smith added inline comments.Dec 14 2022, 1:28 AM
lld/ELF/Thunks.cpp
664–665

Yes that looks good to me.

stuij updated this revision to Diff 482783.Dec 14 2022, 3:15 AM

change thunk names to agreed upon wording

stuij marked 3 inline comments as done.Dec 14 2022, 3:16 AM
stuij added inline comments.
lld/ELF/Thunks.cpp
664–665

changed the thunk names

stuij edited the summary of this revision. (Show Details)Dec 14 2022, 3:21 AM
stuij edited the summary of this revision. (Show Details)
stuij updated this revision to Diff 482785.Dec 14 2022, 3:23 AM
stuij marked an inline comment as done.

edit commit message to reflect thunk renaming

peter.smith accepted this revision.Dec 16 2022, 7:11 AM

Thanks for the updates, I'm happy from the Arm side. Will be worth waiting some time to give MaskRay some time to take a look.

This revision is now accepted and ready to land.Dec 16 2022, 7:11 AM
MaskRay accepted this revision.Dec 17 2022, 8:38 PM
MaskRay added inline comments.
lld/ELF/Thunks.cpp
737

delete blank line.

753
756

delete blank line.

777

delete blank line.

795

S - (P + (L1 - P) + 8) can be simplified

1157

ditto below

1232
lld/test/ELF/arm-bl-v4t.s
3

Use split-file %s %t. See some new tests

22

/// for non-RUN non-CHECK comments

lld/test/ELF/arm-bx-v4t.s
60

FAR-LABEL: <target>: (omit addresses so that diagnostics will be better in case of an error)

74

Align (omit addresses so that diagnostics will be better in case of an error))

MaskRay added a comment.EditedDec 17 2022, 8:39 PM

Remove the `lack of BLX' warning. LLVM only supports Arm Architecture versions upwards of v4, which we now all support in LLD.

renamed existing thunks to better reflect their use: ARMV5ABSLongThunk -> ARMV5LongLdrPcThunk, ARMV5PILongThunk -> ARMV4PILongThunk

s/renamed/rename. Just use imperative sentences

We also can't reuse v7 thumb thunks as they use MOVV/MOVT, which wasn't available yet for v4T.

... not supported by v4T.

stuij updated this revision to Diff 484228.Dec 20 2022, 5:20 AM
stuij marked 10 inline comments as done.

addressed review comments

stuij marked an inline comment as done.Dec 20 2022, 5:28 AM
stuij added inline comments.
lld/ELF/Thunks.cpp
795

This is Peter's comment. I assume he went for descriptive, rather than succinct. He's on vacation for a couple of weeks, but I'll ask him when he gets back.

lld/test/ELF/arm-bl-v4t.s
3

Ah, I like that. Thanks, and thanks for implementing it :) Much cleaner and easier to debug.

lld/test/ELF/arm-bx-v4t.s
60

Thanks, fixed. I took aarch64-thunk-reuse.s as a model for how this should look.

stuij marked an inline comment as done.Dec 20 2022, 10:49 AM
MaskRay accepted this revision.Dec 20 2022, 4:53 PM

Looks great!

lld/ELF/Thunks.cpp
1164
1168

Delete else. Also, when the code is similarly beautiful with a positive or a negative condition, prefer a positive condition.

stuij marked 2 inline comments as done.Dec 21 2022, 2:30 AM
This revision was landed with ongoing or failed builds.Dec 21 2022, 3:06 AM
This revision was automatically updated to reflect the committed changes.
lld/test/ELF/arm-bl-v6.s