This is an archive of the discontinued LLVM Phabricator instance.

[LLD][ELF][AVR] Implement the missing relocation types
ClosedPublic

Authored by aykevl on Apr 23 2020, 11:57 AM.

Details

Summary

Implements the missing relocation types for AVR target.
The results have been cross-checked with binutils.

Original patch by LemonBoy. Some changes by me.

Diff Detail

Event Timeline

LemonBoy created this revision.Apr 23 2020, 11:57 AM

Similar question: where can we find the psABI?

lld/test/ELF/avr-reloc.s
8

Does AVR use uppercase section names?

53

Prefer -NEXT: if applicable. And align the content with the previous line.

Similar question: where can we find the psABI?

Good question, I wasn't able to find any document online.
All the info I've used come from this ISA manual and by feeding some asm into ld to see if the result matched my expectations.

(I forgot to add that D78721 is needed to let the test complete)

Good question, I wasn't able to find any document online.
All the info I've used come from this ISA manual and by feeding some asm into ld to see if the result matched my expectations.

Yes - there is no document I have been able to find either. As you note, the best we can do is compile and dump the sources under AVR-GCC and see that the expectations match up.

This patch is really good, the test coverage is nice too (TIL --defsym=).

I will have a deeper review of the specific relocation bitwise expressions for correctness this week.

lld/test/ELF/avr-reloc.s
8

Not normally but I suppose this is chosen in frontend code so it is allowed but may not be a section that is understood by an existing AVR ELF program flashing tool. Of course, it says nowhere in The Ten Commandments that every section in the ELF file must be flashed to the final AVR device, so uppercase section names are reasonable in this example.

I've checked over all of the relocation logic, cross referenced with both the AVR ISA manual and the AVR LLVM fixture implementations in AVRAsmBackend.cpp - everything looks consistent to me.

Nice patch!

I think there are a few missing check[U]Int calls (I've left couple comments comments) but other than that, everything looks good from an AVR perspective.

lld/ELF/Arch/AVR.cpp
96

missing checkUInt(loc, val, 8, rel)

102

I suspect all of these remaining cases can/should have checkUInt(loc, val, N, rel) calls

156

Missing checkInt?

dylanmckay requested changes to this revision.May 31 2020, 5:02 AM
This revision now requires changes to proceed.May 31 2020, 5:02 AM

I won't have time to update this diff in the near future so feel free to add the value range checks in a follow-up commit/patch.

aykevl added a subscriber: aykevl.Jun 10 2020, 12:42 PM

I actually wrote a patch very similar to this one (but not implementing as many relocations). It's here: https://gist.github.com/aykevl/ce3c04e1175d9602c9dfaf0cf91298ba
I wanted to hold off on submitting it before the AVR disassembler was a bit more reliable, to make proper tests.

It's heavily documented. Feel free to use it however you wish. I have an inline comment for what I suspect is a bug in this patch.

lld/ELF/Arch/AVR.cpp
77

The comment I wrote on my patch is the following:

Note: this relocation is often used between code and data space, which are 0x800000 apart in the output ELF file. The bitmask cuts off the high bit.

I'm not sure when it would trigger, perhaps with debug info. Did you test that?

Also, what does the MEMORY part of your linker script look like? For me it looks like this:

MEMORY
{
    FLASH_TEXT (rw) : ORIGIN = 0,                      LENGTH = __flash_size - _bootloader_size
    RAM (xrw)       : ORIGIN = 0x800000 + __ram_start, LENGTH = __ram_size
}

In other words, code and data are 0x800000 apart (in the ELF file). This is also the convention followed by the avr-gcc toolchain.

aykevl added inline comments.Jun 10 2020, 12:44 PM
lld/test/ELF/avr-reloc.s
59

Since D74049 landed, it is able to disassemble I/O instructions such as in.

aykevl added a comment.EditedJun 29 2020, 12:39 PM

I have attached a patch that addresses review comments:

  • Fixed R_AVR_16 to mask off the high bits instead of raising an error.
  • Added checkInt to R_AVR_7_PCREL.
  • Used -NEXT in the test where possible.
  • Switched to checking instructions now that more instructions can be disassembled.

With these changes, I can successfully link at least some TinyGo compiled programs.

@LemonBoy from your previous comment it looks like you don't intend to work on this further, is that correct? If so, are you okay with me taking over this patch, thus uploading this new patch for example?

lld/ELF/Arch/AVR.cpp
102

This relocation also goes between code and data space, so may be 0x800000 apart. Therefore there is a good reason why these symbols would be far apart and masking off the high bits is valid.

(Sidenote: I've heard avr-ld shipped with GCC 9 requires code and data to not overlap or it complains, so it actually has to be done this way).

@LemonBoy from your previous comment it looks like you don't intend to work on this further, is that correct? If so, are you okay with me taking over this patch, thus uploading this new patch for example?

Yeah go for it, let's land this patchset.

Looks like I'm not able to upload a new diff, so I've made a new revision: D82854.

You can click "Commandeer Revision" if you want to keep history...

aykevl commandeered this revision.Jul 1 2020, 8:16 AM
aykevl edited reviewers, added: LemonBoy; removed: aykevl.

Ok, let's see whether this works...

aykevl updated this revision to Diff 274822.Jul 1 2020, 8:20 AM
aykevl edited the summary of this revision. (Show Details)

@MaskRay thanks! I wasn't aware of that option.

Changes I made:

  • Fixed R_AVR_16 to mask off the high bits instead of raising an error.
  • Added checkInt to R_AVR_7_PCREL.
  • Used -NEXT in the test where possible.
  • Switched to checking instructions now that more instructions can be disassembled.
aykevl updated this revision to Diff 274826.Jul 1 2020, 8:38 AM
  • some minor formatting fixes to CHECK lines
MaskRay added inline comments.Jul 9 2020, 10:24 AM
lld/ELF/Arch/AVR.cpp
98

~val + 1 = -val

So just use (-val)

lld/test/ELF/avr-reloc.s
4

-o %t

Omit .exe

I don't know why -Ttext=0 is there. Use --image-base=0 if you really want a 0 address text segment.

aykevl updated this revision to Diff 277061.Jul 10 2020, 8:50 AM

Thanks for the review! This should address all comments.

-Ttext=0 was indeed unnecessary (tests still pass) so I've removed it.

MaskRay added inline comments.Jul 10 2020, 10:49 AM
lld/test/ELF/avr-reloc.s
5

You may want --print-imm-hex to print immediates in hexadecimal.

18

Might be worth adding comments about the exact relocation type used, e.g.

ldi r20, lo8(a) # R_AVR_LO8_LDI...

81

{{.*}} 1e1e000f 00785634 12

The address of .DATA is insignificant and should be omitted.

aykevl marked 3 inline comments as done.Jul 11 2020, 9:56 AM
aykevl added inline comments.
lld/test/ELF/avr-reloc.s
5

I tried, but it doesn't seem to make a difference in the output. However I agree the assembly would be easier to read if it were hexadecimal (because of the hexadecimal input).

18

Good idea, I'll change this.

81

What do you mean? The address is not included in the test.

This is the full output:

Contents of section .DATA:
 110e4 1e1e000f 00785634 12                 .....xV4.

I believe 110e4 is the address (not included in the test) while 1e1e000f 00785634 12 is the contents of the section (in hex).

Or do you mean I should check for the presence of an address using {{.*}}?

aykevl updated this revision to Diff 277249.Jul 11 2020, 10:04 AM
  • add relocation types in comments
MaskRay added inline comments.Jul 11 2020, 11:09 AM
lld/test/ELF/avr-reloc.s
18

Doesn't # work as well?

81

110e4 as the address is insignificant. If the content is not dependent on the address, omitting 110e4 has the benefit that the test does not need an update if the assigned addresses change.

aykevl marked 2 inline comments as done.Jul 11 2020, 1:07 PM
aykevl added inline comments.
lld/test/ELF/avr-reloc.s
18

Apparently not. I tried, but the assembler chokes on it. It appears that ; is the comment char in AVR assembly.

81

Sorry, I still don't understand. There is no address 110e4 in the test code that could be removed. The test is independent of the address of the .DATA section.

What change do you suggest?

MaskRay added inline comments.Jul 11 2020, 1:46 PM
lld/test/ELF/avr-reloc.s
81

I have mentioned {{.*}} 1e1e000f 00785634 12. See the first comment.

It matches 110e4 1e1e000f 00785634 12 and 210e4 1e1e000f 00785634 12 and other addresses. For a contributor updating LLD's address assignment algorithm, it is very annoying to update every test.

aykevl marked an inline comment as done.Jul 11 2020, 2:02 PM
aykevl added inline comments.
lld/test/ELF/avr-reloc.s
81

Oh okay, didn't interpret that as a code change suggestion. Will update the patch.

aykevl updated this revision to Diff 277261.Jul 11 2020, 2:06 PM
  • Updated .DATA test.

I think this addresses all review comments.

Please check D83634. We can then enable --print-imm-hex

If the canonical comment marker of AVR is ;, I think we should just use ; everywhere...

lld/ELF/Arch/AVR.cpp
161

Delete redundant parentheses: ((val - 2) >> 1) -> (val - 2) >> 1

167

Delete redundant parentheses

aykevl updated this revision to Diff 277282.Jul 12 2020, 3:17 AM
aykevl edited the summary of this revision. (Show Details)
  • added --print-imm-hex (so this patch depends on D83634)
  • deleted some redundant parentheses
  • changed all comment markers to ;

I've confirmed that # is not a valid comment char in the original AVR assembler. See: http://ww1.microchip.com/downloads/en/devicedoc/40001917a.pdf#page=12 (section 4.3). I don't know why # at the start of a line works as a comment, perhaps these lines are removed by llvm-lit?

Thanks for the update. All look good to me now.

  • added --print-imm-hex (so this patch depends on D83634)
  • deleted some redundant parentheses
  • changed all comment markers to ;

I've confirmed that # is not a valid comment char in the original AVR assembler. See: http://ww1.microchip.com/downloads/en/devicedoc/40001917a.pdf#page=12 (section 4.3). I don't know why # at the start of a line works as a comment, perhaps these lines are removed by llvm-lit?

MCParser supports # in the line beginning.

6.12 of the document you linked says "Unsurprisingly, this directive does exactly nothing. The only reason it exists is that it is required by the ANSI C standard" but I think that is not true. GCC/clang will report error: invalid preprocessing directive

MaskRay accepted this revision.Jul 12 2020, 8:11 AM
This revision was not accepted when it landed; it landed in state Needs Review.Jul 12 2020, 9:20 AM
This revision was automatically updated to reflect the committed changes.