This is an archive of the discontinued LLVM Phabricator instance.

[lld][ELF] Relax long jump/call to short jump/call on AVR
AbandonedPublic

Authored by benshi001 on Mar 16 2023, 4:47 AM.

Details

Reviewers
MaskRay
Summary

The short jump/call costs less cycles than the long jump/call.

Diff Detail

Event Timeline

benshi001 created this revision.Mar 16 2023, 4:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 16 2023, 4:47 AM
benshi001 requested review of this revision.Mar 16 2023, 4:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 16 2023, 4:47 AM
benshi001 updated this revision to Diff 505777.Mar 16 2023, 5:13 AM
benshi001 added inline comments.Mar 16 2023, 5:15 AM
lld/test/ELF/avr-relax.s
37

These 2 jmp instructions are no relaxed, due to out of range.

lld/test/ELF/basic-avr.s
12

We should not display <unknown>, we can display the instructions via adding --mcpu=atmega328p to llvm-objdump.

benshi001 added inline comments.Mar 18 2023, 10:27 PM
lld/ELF/Arch/AVR.cpp
245

Is there any better we can remove this nop ? I can only figure out the way by memcpy uint8_t *loc .

MaskRay added inline comments.Mar 18 2023, 11:15 PM
lld/ELF/Arch/AVR.cpp
245

I implemented this for RISC-V but I am unsure we should get the complexity for the less-used AVR port. This complexity is exactly what I called out in a previous patch and you said that you did not intend to implement it.

benshi001 marked an inline comment as done.Mar 19 2023, 12:48 AM
benshi001 added inline comments.
lld/ELF/Arch/AVR.cpp
245

I see. I will not pursue removing the NOP any more.

And I think current form long jump -> short jump + nop is simple enough, at least 1 cycle is saved. Hopefully you will accept. ^_^

benshi001 marked an inline comment as done.Mar 19 2023, 12:50 AM

Apologies but this patch gives a feeling of overengineering for a less-popular (experimental) arch. Adding code with unclear benefits... Is it measurable?

Replacing an instruction to two likely makes the execution slower, so I think since we don't implement linker relaxation for AVR, we should not do the jmp/call rewriting as well.

Replacing an instruction to two likely makes the execution slower, so I think since we don't implement linker relaxation for AVR, we should not do the jmp/call rewriting as well.

My change is measurable,

  1. short jmp/call cost 1 less tick than long jmp/call
  2. the nop costs 1 tick.

So the rewriting of call has neither improvement nor regression, since the nop is always executed, so there is no tick/space change.

But the rewriting of jmp does have improvement on time, since the nop is no longer executed, one tick can be saved.

Here is the cost of short jmp

And this is the cost of long jmp

Increasing the number of instructions, while it may improve performance for some processors (any number?), doesn't look right...

Increasing the number of instructions, while it may improve performance for some processors (any number?), doesn't look right...

  1. On all devices, long jump costs 4 bytes, and short jump + nop also cost four bytes, there is no space expansion or shrink.
  2. As AVR instruction manual indicates, short jump costs 2 cycles, while long jump costs 3 cycles. So one CPU cycle is saved.

For example,

long jump _foo ; this is an unconditional jump which costs 4 bytes and 3 cpu cycle
...
short jump _foo; this is an unconditional jump which costs 2 bytes and 2 cpu cycle 
nop            ; this `nop` is just for padding the space of the replaced `long jump` , it is never executed.

In the above contrast, the nop is never executed, so short jump + nop does not waste any space, but saves one CPU cycle.

benshi001 added inline comments.Apr 12 2023, 8:16 PM
lld/ELF/Arch/AVR.cpp
258

This NOP is just for padding, actually we need not handle it and left it unchanged. (However this will make llvm-objdump show an <unknown>)

I can give up this patch. At least it is a tiny optimization, and have little affect on lld. But can my another patch https://reviews.llvm.org/D147364 be reviewd and accepted ?
R_AVR_LO8_LDI_GS/R_AVR_HI8_LDI_GS are still missing in lld, with them implemented, lld will be fully functional as GNU-ld, and can finally replace GNU-ld. My aim of clang+llvm+compilerrt+lld fully replace gnu toolchain, can be achieved. I really appreciate for that !

benshi001 abandoned this revision.Apr 14 2023, 8:23 PM