This is an archive of the discontinued LLVM Phabricator instance.

[ARM] armv6m eXecute Only (XO) long branch Thunk
ClosedPublic

Authored by keith.walker.arm on Jun 26 2023, 7:39 AM.

Details

Summary

This patch adds a thunk for Thumb long branch on V6-M for eXecute Only.

Note that there is currently no support for a position independent and
eXecute Only V6-M long branch thunk.

Diff Detail

Event Timeline

keith.walker.arm requested review of this revision.Jun 26 2023, 7:39 AM
stuij added inline comments.Jun 26 2023, 8:56 AM
lld/ELF/Thunks.cpp
774

Not sure if we should prefer speed over space, but seeing the speed penalty, wouldn't it be better to use r12 instead of the stack to store the address?

Can you put [LLD] in the title [LLD][ARM] armv6m eXecute Only (XO) long branch Thunk?

Note that there is currently no support for a position independant

Possible Typo, I think it is position independent.

I think the use case of position independence and execute-only on v6-m, and requirements for long branches will be exceedingly rare so I think an error message is good enough. I think a PI variant is possible in theory although I don't think the existing ABI relocations could be used, would need to manually code them or alter the ABI.

lld/ELF/Thunks.cpp
780

Did you mean R_ARM_THM_ALU_ABS_G0_NC?

lld/test/ELF/arm-thumb-thunk-v6m-xo.s
3

New tests are being written with split-file which makes it easier to edit the linker script. For an example see aarch64-long-thunk-converge.s

For example:
// --- a.lds
SECTIONS {
.text_low : { *(.text_low) }
.text_high 0x2000000 : { *(.text_high) }
}

// --- a.s
.syntax unified

4

I think it would be better to start with a known constant value as then you can make the distance just out of range.

5

There is no .text_low2 and .text_high2 in this example, they can be removed.

6

The destination has a lot of zeros in it. Possibly worth choosing a number that will show up more non-zero identical immediates ? For example 0x21345678 (which I think is 2-byte aligned).

peter.smith added inline comments.Jun 26 2023, 9:19 AM
lld/ELF/Thunks.cpp
774

I think that would be

push {r0}
...
mov r12, r0
pop {r0}
bx r12

Looking at the M0 cycle counts

push {r0, r1} is 1 more cycle that push {r0} (1 + N) cycles where N is number of registers
mov r12, r0 is 1 cycle
pop {r0, pc} is 4 cycles more than pop {ro} (4 + N) with branch, (1 + N) without.
bx r12 is 3 cycles.

If I've got that correct the alternative sequence using r12 is 1 fewer cycle for 4-bytes extra code-size. For M0 I would tend to go for space over speed for 1 cycle.

keith.walker.arm edited the summary of this revision. (Show Details)

Updated to address review comments:

  • Decision is to make no change to the thunk code sequence. As per Peters calculations the difference is 4 bytes of code vs 1 less instruction cycle. Decided that the 4 bytes codes saving was more important.
  • Corrected the usage of R_ARM_THM_ALU_ABS_G0_NC.
  • Rewrote the test to use the split-file feature
  • Changed addresses of the test sections to :
    • Have unique values for each applied relocation.
    • Have the 2 sections near the limit of when it is required (see below)
  • Removed unused section specification from linker script.

During testing I found that there is an issue if the distance between the source & destination is exactly 16M. A branch instruction is generated (rather than the long thunk code squence) which is not legal for armv6-m which only allows branches +/-2K. However this is not specific to this XO change, it already happens if XO is not involved so I think this is something that should be fixed separately.
The current test has an offset of 16M-4

keith.walker.arm marked 5 inline comments as done.Jun 27 2023, 8:18 AM
keith.walker.arm added inline comments.
lld/ELF/Thunks.cpp
774

Agrred with Peter's reasoning and making no change.

Minor correction to a previous stement: The current test has an offset of 16M+4

During testing I found that there is an issue if the distance between the source & destination is exactly 16M. A branch instruction is generated (rather than the long thunk code squence) which is not legal for armv6-m which only allows branches +/-2K. However this is not specific to this XO change, it already happens if XO is not involved so I think this is something that should be fixed separately.
The current test has an offset of 16M-4

I had thought something similar, but then managed to convince myself it wouldn't happen as LLD wouldn't generate a thunk for the narrow branch relocation, forgetting that these could be generated from BL relocations. I agree that it would be best fixed as a separate patch rather than tagged on to this one.

Thanks for the update. I've only got a few more nits that I should have noticed first time round. Otherwise this looks good to me.

lld/ELF/Thunks.cpp
1331

LLD tends not to use auto unless unless it is the result of a cast or an iterator. I think const bool would be most idiomatic.

lld/test/ELF/arm-thumb-thunk-v6m-xo.s
6

can you add --no-show-raw-insn as this will remove the raw encoding from the test.

44

Can you remove the addresses from the disassembly? We usually leave these off unless they are significant to the test as it makes the test more fragile and harder to update if there are minor changes.

Made the following changes:

  • Added --no-show-raw-insn to the test and removed the raw instructions from the expected test cases
  • removed addresss from expected test cases, expect for the 1st line in each section to ensure that the section start address is correct.
  • replaced used of auto with explicit type,
keith.walker.arm marked 2 inline comments as done.Jun 28 2023, 8:21 AM
MaskRay accepted this revision.Jun 28 2023, 3:45 PM

During testing I found that there is an issue if the distance between the source & destination is exactly 16M. A branch instruction is generated (rather than the long thunk code squence) which is not legal for armv6-m which only allows branches +/-2K. However this is not specific to this XO change, it already happens if XO is not involved so I think this is something that should be fixed separately.
The current test has an offset of 16M-4

I had thought something similar, but then managed to convince myself it wouldn't happen as LLD wouldn't generate a thunk for the narrow branch relocation, forgetting that these could be generated from BL relocations. I agree that it would be best fixed as a separate patch rather than tagged on to this one.

Thanks for the update. I've only got a few more nits that I should have noticed first time round. Otherwise this looks good to me.

Ideally, we should improve the test to test the limit of the range extension thunk. ppc64-long-branch-pi.s checks this but such tests are not common for range extension thunk support.
But I'm ok if this is too difficult to create.

I'll be out of town til July 5. I think the code is good. Accepted once @peter.smith is happy as well.

lld/ELF/Thunks.cpp
1340

place single quotes around the symbol name.

lld/test/ELF/arm-thumb-thunk-v6m-xo.s
6

For small output files, we don't typically remove them. Delete // RUN: rm -f %t/a %t/a2

10

New tests use ## for non-RUN-non-CHECK comments. For Arm where the universal comment marker // is preferred, use ///.

30

We can add another bl far to demonstrate that the thunk can be shared.

44

See my comment https://reviews.llvm.org/D153264#inline-1489080 for a prevailing style in newer tests.

// CHECK: Disassembly of section .text_low: can probably be removed.

This revision is now accepted and ready to land.Jun 28 2023, 3:45 PM
peter.smith accepted this revision.Jun 29 2023, 9:51 AM

LGTM too, assuming MaskRays comments can be addressed prior to commit.

Ideally, we should improve the test to test the limit of the range extension thunk. ppc64-long-branch-pi.s checks this but such tests are not common for range extension thunk support.
But I'm ok if this is too difficult to create.

I did originally try to add a range limit check and that was when I found (and mentioned in an earlier comment) that for armv6-m in general (not just XO) a branch instuction is generated which is not supported by the armv6-m architecure. Fixing that issue is outside the scope of this change specifically aimed at XO. Attempting to add a range limit check at this time would enshrine the invalid branch instruction as being correct. I think adding the range limit checks should be added when this branch problem is fixed for armv6-m in general.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJun 30 2023, 4:00 AM