This is an archive of the discontinued LLVM Phabricator instance.

lld/test/ELF/avr-relax.s
37	These two jmp instructions are not relaxed because their targets are out of range.
lld/test/ELF/basic-avr.s
12	We should not display `<unknown>`; adding `--mcpu=atmega328p` to `llvm-objdump` makes it display the instructions.

Harbormaster completed remote builds in B219844: Diff 505777. Mar 16 2023, 5:36 AM

benshi001 added inline comments. Mar 18 2023, 10:27 PM

lld/ELF/Arch/AVR.cpp
245	Is there a better way to remove this nop? The only way I can figure out is to use `memcpy` on `uint8_t *loc`.

MaskRay added inline comments. Mar 18 2023, 11:15 PM

lld/ELF/Arch/AVR.cpp
245	I implemented this for RISC-V but I am unsure we should take on the complexity for the less-used AVR port. This complexity is exactly what I called out in a previous patch and you said that you did not intend to implement it.

benshi001 marked an inline comment as done. Mar 19 2023, 12:48 AM

benshi001 added inline comments.

lld/ELF/Arch/AVR.cpp
245	I see. I will no longer pursue removing the `NOP`. I think the current form, `long jump` -> `short jump + nop`, is simple enough, and at least 1 cycle is saved. Hopefully you will accept it. ^_^

benshi001 marked an inline comment as done. Mar 19 2023, 12:50 AM

ping ...

Apologies but this patch gives a feeling of overengineering for a less-popular (experimental) arch. Adding code with unclear benefits... Is it measurable?

Replacing one instruction with two likely makes execution slower, so I think that since we don't implement linker relaxation for AVR, we should not do the jmp/call rewriting either.

In D146216#4225135, @MaskRay wrote:

Replacing one instruction with two likely makes execution slower, so I think that since we don't implement linker relaxation for AVR, we should not do the jmp/call rewriting either.

My change is measurable:

a short jmp/call costs 1 tick less than a long jmp/call;
the nop costs 1 tick.

So rewriting a call is neither an improvement nor a regression: the nop is always executed after the call returns, so there is no change in ticks or space.

But rewriting a jmp does improve execution time: the nop is never executed, so one tick is saved.
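The accounting above can be written down as a tiny model. This is only a sketch of the argument in the comment, not code from the patch; `Kind` and `ticksSaved` are hypothetical names, and the only inputs are the two figures quoted above (the short form is 1 tick cheaper, the nop costs 1 tick when it runs).

```cpp
#include <cassert>

// Hypothetical model of the tick accounting described in the comment above.
enum class Kind { Jump, Call };

// Net ticks saved by rewriting a long jmp/call as a short one plus a padding nop.
constexpr int ticksSaved(Kind kind) {
  const int shortFormGain = 1; // short jmp/call is 1 tick cheaper than the long form
  // After a call returns, execution resumes at the nop, so it costs 1 tick.
  // After a jmp, control never reaches the nop.
  const int nopCost = (kind == Kind::Call) ? 1 : 0;
  return shortFormGain - nopCost;
}

static_assert(ticksSaved(Kind::Jump) == 1, "jmp rewrite saves one tick");
static_assert(ticksSaved(Kind::Call) == 0, "call rewrite is tick-neutral");
```

In both cases the rewritten sequence occupies the same 4 bytes as the original instruction, so only the tick count changes.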

Here is the cost of short jmp

And this is the cost of long jmp

Please refer to GNU ld:

https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=bfd/elf32-avr.c;h=702719136d09acbc8c98ec49ab8129d0f33fffa8;hb=HEAD#l2721
https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=bfd/elf32-avr.c;h=702719136d09acbc8c98ec49ab8129d0f33fffa8;hb=HEAD#l2738

GNU ld also replaces JMP with an RJMP+NOP pair, since that saves one CPU cycle: the NOP is never executed, it is just a padding word.

ping ...

Relaxing a long jump to short jump + nop does save 1 CPU cycle;
GNU ld also does this optimization, as

Increasing the number of instructions, while it may improve performance for some processors (any number?), doesn't look right...

In D146216#4263682, @MaskRay wrote:

Increasing the number of instructions, while it may improve performance for some processors (any number?), doesn't look right...

On all devices, a long jump takes 4 bytes and short jump + nop also takes 4 bytes, so the code neither grows nor shrinks.
As the AVR instruction manual indicates, a short jump costs 2 cycles while a long jump costs 3 cycles, so one CPU cycle is saved.

For example,

long jump _foo  ; an unconditional jump that takes 4 bytes and 3 CPU cycles
...
short jump _foo ; an unconditional jump that takes 2 bytes and 2 CPU cycles
nop             ; this `nop` only pads the space of the replaced `long jump`; it is never executed.

In the comparison above, the nop is never executed, so short jump + nop wastes no space and saves one CPU cycle.
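To make the byte-level picture concrete, here is a minimal standalone sketch of the 4-byte replacement. It mirrors the arithmetic used in `tryRelaxLongJumpCall` in this diff, but `encodeShortJumpAndNop` is a hypothetical helper and not part of lld; it assumes the byte offset has already been checked to lie in [-4094, 4096].

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Illustrative only: not lld's API. `offset` is the byte distance from the
// address of the jump/call to its target, already known to be in range.
constexpr std::array<uint8_t, 4> encodeShortJumpAndNop(int64_t offset,
                                                       bool isCall) {
  // RJMP/RCALL hold a 12-bit word offset relative to the next instruction,
  // hence the "- 2" (skip this instruction) and ">> 1" (bytes to words).
  const uint16_t words =
      static_cast<uint16_t>(static_cast<uint64_t>(offset - 2) >> 1) & 0xfff;
  const uint16_t insn = (isCall ? 0xd000 : 0xc000) | words;
  // Little-endian layout: the short jump/call first, then NOP (0x0000) padding.
  return {static_cast<uint8_t>(insn & 0xff), static_cast<uint8_t>(insn >> 8),
          0x00, 0x00};
}
```

With the addresses from `avr-relax.s`, `encodeShortJumpAndNop(6, true)` gives the bytes `02 d0 00 00` (`rcall .+4`, `nop`) and `encodeShortJumpAndNop(0, false)` gives `ff cf 00 00` (`rjmp .-2`, `nop`), matching the `RELAX` check lines in that test.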

benshi001 added inline comments. Apr 12 2023, 8:16 PM

lld/ELF/Arch/AVR.cpp
258	This `NOP` is just padding; we actually need not write it and could leave those bytes unchanged. (However, that would make `llvm-objdump` show an `<unknown>`.)

I can give up this patch. After all, it is only a tiny optimization and has little effect on lld. But can my other patch, https://reviews.llvm.org/D147364, be reviewed and accepted?
R_AVR_LO8_LDI_GS/R_AVR_HI8_LDI_GS are still missing in lld. With them implemented, lld will be as fully functional as GNU ld and can finally replace it, achieving my aim of having clang+llvm+compiler-rt+lld fully replace the GNU toolchain. I would really appreciate that!

benshi001 abandoned this revision. Apr 14 2023, 8:23 PM

Revision Contents

Path	Size
lld/ELF/Arch/AVR.cpp	55 lines
lld/test/ELF/avr-relax.s	39 lines
lld/test/ELF/basic-avr.s	11 lines

Diff 505777

lld/ELF/Arch/AVR.cpp

Show All 20 Lines
// objcopy -O binary --only-section=.text foo output.bin		// objcopy -O binary --only-section=.text foo output.bin
//		//
// Note that the current AVR support is very preliminary so you can't		// Note that the current AVR support is very preliminary so you can't
// link any useful program yet, though.		// link any useful program yet, though.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "InputFiles.h"		#include "InputFiles.h"
		#include "OutputSections.h"
#include "Symbols.h"		#include "Symbols.h"
#include "Target.h"		#include "Target.h"
#include "lld/Common/ErrorHandler.h"		#include "lld/Common/ErrorHandler.h"
#include "llvm/BinaryFormat/ELF.h"		#include "llvm/BinaryFormat/ELF.h"
#include "llvm/Support/Endian.h"		#include "llvm/Support/Endian.h"

using namespace llvm;		using namespace llvm;
using namespace llvm::object;		using namespace llvm::object;
using namespace llvm::support::endian;		using namespace llvm::support::endian;
using namespace llvm::ELF;		using namespace llvm::ELF;
using namespace lld;		using namespace lld;
using namespace lld::elf;		using namespace lld::elf;

namespace {		namespace {
class AVR final : public TargetInfo {		class AVR final : public TargetInfo {
public:		public:
uint32_t calcEFlags() const override;		uint32_t calcEFlags() const override;
RelExpr getRelExpr(RelType type, const Symbol &s,		RelExpr getRelExpr(RelType type, const Symbol &s,
const uint8_t *loc) const override;		const uint8_t *loc) const override;
void relocate(uint8_t *loc, const Relocation &rel,		void relocate(uint8_t *loc, const Relocation &rel,
uint64_t val) const override;		uint64_t val) const override;
		void relocateAlloc(InputSectionBase &sec, uint8_t *buf) const override;
		bool tryRelaxLongJumpCall(uint8_t *loc, uint64_t callAddr,
		uint64_t destAddr) const;
};		};
} // namespace		} // namespace

RelExpr AVR::getRelExpr(RelType type, const Symbol &s,		RelExpr AVR::getRelExpr(RelType type, const Symbol &s,
const uint8_t *loc) const {		const uint8_t *loc) const {
switch (type) {		switch (type) {
case R_AVR_6:		case R_AVR_6:
case R_AVR_6_ADIW:		case R_AVR_6_ADIW:
Show All 171 Lines
TargetInfo *elf::getAVRTargetInfo() {		TargetInfo *elf::getAVRTargetInfo() {
static AVR target;		static AVR target;
return &target;		return &target;
}		}

static uint32_t getEFlags(InputFile *file) {		static uint32_t getEFlags(InputFile *file) {
return cast<ObjFile<ELF32LE>>(file)->getObj().getHeader().e_flags;		return cast<ObjFile<ELF32LE>>(file)->getObj().getHeader().e_flags;
}		}

		// Try to relax
		//   jmp _foo  ; 4-byte instruction
		// to
		//   rjmp _foo ; 2-byte instruction
		//   nop       ; 2-byte instruction
		bool AVR::tryRelaxLongJumpCall(uint8_t *loc, uint64_t callAddr,
		                               uint64_t destAddr) const {
		  // The offset must be in range [-4094, 4096].
		  const int64_t Off = destAddr - callAddr;
		  if (Off < -4094 || Off > 4096)
		    return false;

		  // Set the first instruction to RCALL/RJMP.
		  const uint16_t OffCode = static_cast<uint64_t>(Off - 2) >> 1;
		  const uint16_t OpCode = read16le(loc) == 0x940c ? 0xc000 : 0xd000;
		  write16le(loc, OpCode | (OffCode & 0xfff));

		  // Set the second instruction to NOP.
		  write16le(loc + 2, 0);

		  return true;
		}

		void AVR::relocateAlloc(InputSectionBase &sec, uint8_t *buf) const {
		  uint64_t secAddr = sec.getOutputSection()->addr;
		  if (auto *s = dyn_cast<InputSection>(&sec))
		    secAddr += s->outSecOff;
		  for (const Relocation &rel : sec.relocs()) {
		    uint8_t *loc = buf + rel.offset;
		    const uint64_t val = SignExtend64(
		        sec.getRelocTargetVA(sec.file, rel.type, rel.addend,
		                             secAddr + rel.offset, *rel.sym, rel.expr),
		        32);

		    switch (rel.type) {
		    // Try to relax a long jump/call (a 4-byte instruction) to a pair of
		    // short jump/call (a 2-byte instruction) and nop (a 2-byte instruction).
		    case R_AVR_CALL:
		      if (config->relax &&
		          (getEFlags(ctx.objectFiles[0]) & EF_AVR_LINKRELAX_PREPARED) != 0)
		        if (tryRelaxLongJumpCall(loc, secAddr + rel.offset, val))
		          continue;
		      [[fallthrough]];

		    default:
		      relocate(loc, rel, val);
		      break;
		    }
		  }
		}

uint32_t AVR::calcEFlags() const {		uint32_t AVR::calcEFlags() const {
assert(!ctx.objectFiles.empty());		assert(!ctx.objectFiles.empty());

uint32_t flags = getEFlags(ctx.objectFiles[0]);		uint32_t flags = getEFlags(ctx.objectFiles[0]);
bool hasLinkRelaxFlag = flags & EF_AVR_LINKRELAX_PREPARED;		bool hasLinkRelaxFlag = flags & EF_AVR_LINKRELAX_PREPARED;

for (InputFile *f : ArrayRef(ctx.objectFiles).slice(1)) {		for (InputFile *f : ArrayRef(ctx.objectFiles).slice(1)) {
uint32_t objFlags = getEFlags(f);		uint32_t objFlags = getEFlags(f);
Show All 12 Lines

lld/test/ELF/avr-relax.s

This file was added.

				# REQUIRES: avr
				# RUN: llvm-mc -filetype=obj -triple=avr -mcpu=atmega328p %s -o %t.o
				# RUN: ld.lld %t.o -o %t0.exe -Ttext=0 --no-relax
				# RUN: llvm-objdump -d %t0.exe --mcpu=atmega328 | FileCheck %s --check-prefix=NORELAX
				# RUN: ld.lld %t.o -o %t1.exe -Ttext=0
				# RUN: llvm-objdump -d %t1.exe --mcpu=atmega328 | FileCheck %s --check-prefix=RELAX

				main:
				call foo
				rcall foo
				foo:
				jmp foo
				rjmp foo

				la0:
				jmp la1
				.zero 4096
				la1:
				jmp la0

				# NORELAX: <main>:
				# NORELAX-NEXT: 0: 0e 94 03 00 call 0x6
				# NORELAX-NEXT: 4: 00 d0 rcall .+0
				# NORELAX: <foo>:
				# NORELAX-NEXT: 6: 0c 94 03 00 jmp 0x6
				# NORELAX-NEXT: a: fd cf rjmp .-6

				# RELAX: <main>:
				# RELAX-NEXT: 0: 02 d0 rcall .+4
				# RELAX-NEXT: 2: 00 00 nop
				# RELAX-NEXT: 4: 00 d0 rcall .+0
				# RELAX: <foo>:
				# RELAX-NEXT: 6: ff cf rjmp .-2
				# RELAX-NEXT: 8: 00 00 nop
				# RELAX-NEXT: a: fd cf rjmp .-6
				# RELAX: <la0>:
				# RELAX-NEXT: c: 0c 94 08 08 jmp 0x1010
				# RELAX: <la1>:
				# RELAX-NEXT: 1010: 0c 94 06 00 jmp 0xc
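The relaxed and unrelaxed cases in this test follow directly from the range check in the patch. The sketch below lifts that check into a standalone predicate and applies it to the addresses in the expected output; `fitsShortJump` is a hypothetical name used only for illustration, not a function in lld.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the range test from tryRelaxLongJumpCall on its own.
// Returns true when the byte offset from the instruction to its target can be
// encoded in the 12-bit word offset of RJMP/RCALL.
constexpr bool fitsShortJump(uint64_t insnAddr, uint64_t targetAddr) {
  const int64_t off = static_cast<int64_t>(targetAddr - insnAddr);
  return off >= -4094 && off <= 4096;
}

static_assert(fitsShortJump(0x0, 0x6), "call foo at 0x0 is relaxed");
static_assert(fitsShortJump(0x6, 0x6), "jmp foo at 0x6 is relaxed");
static_assert(!fitsShortJump(0xc, 0x1010), "jmp la1: offset 4100 is too far");
static_assert(!fitsShortJump(0x1010, 0xc), "jmp la0: offset -4100 is too far");
```

The `.zero 4096` between `la0` and `la1` pushes both jumps just past the limit (4100 and -4100 bytes), which is why they stay as 4-byte `jmp` instructions in the `RELAX` output.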

lld/test/ELF/basic-avr.s

	# REQUIRES: avr			# RUN: llvm-mc -filetype=obj -triple=avr -mcpu=atmega328p %s -o %t.o
	# RUN: llvm-mc -filetype=obj -triple=avr-unknown-linux -mcpu=atmega328p %s -o %t.o			# RUN: ld.lld %t.o -o %t.exe -Ttext=0 --no-relax
	# RUN: ld.lld %t.o -o %t.exe -Ttext=0			# RUN: llvm-objdump -d %t.exe --mcpu=atmega328 | FileCheck %s
	# RUN: llvm-objdump -d %t.exe | FileCheck %s

	main:			main:
	call foo			call foo
	foo:			foo:
	jmp foo			jmp foo

	# CHECK: <main>:			# CHECK: <main>:
	# CHECK-NEXT: 0: 0e 94 02 00 <unknown>			# CHECK-NEXT: 0: 0e 94 02 00 call 0x4
	# CHECK: <foo>:			# CHECK: <foo>:
	# CHECK-NEXT: 4: 0c 94 02 00 <unknown>			# CHECK-NEXT: 4: 0c 94 02 00 jmp 0x4

[lld][ELF] Relax long jump/call to short jump/call on AVR
AbandonedPublic