Page MenuHomePhabricator

[AVR] Generate ELPM for loading byte/word from extended program memory
ClosedPublic

Authored by benshi001 on Jan 2 2022, 3:25 AM.

Diff Detail

Event Timeline

benshi001 created this revision.Jan 2 2022, 3:25 AM
benshi001 requested review of this revision.Jan 2 2022, 3:25 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 2 2022, 3:25 AM
This comment was removed by benshi001.
benshi001 updated this revision to Diff 396920.Jan 2 2022, 4:13 AM
benshi001 updated this revision to Diff 396925.Jan 2 2022, 5:09 AM

For my context, I believe this is for devices that have a larger flash memory, such as the ATmega2560?

I haven't looked too deeply at the code yet.

llvm/lib/Target/AVR/AVRExpandPseudoInsts.cpp
829

There is actually a much more elegant way to do this: by giving ELPMRdZ (and similar) instructions an extra LD8 input and output register. This makes sure the register allocator allocates a spare register for this instruction. The scavengeGPR8 method is really just a hack to find an unused register and it can fail if no register is available. In addition, by telling the register allocator you need an LD8 register, you don't need the if/else below.

aykevl added inline comments.Jan 4 2022, 3:51 PM
llvm/lib/Target/AVR/AVRExpandPseudoInsts.cpp
829

I recently implemented this, in a work-in-progress change. It looks like this in my case:

+let Constraints = "@earlyclobber $scratch" in
 class AtomicLoad<PatFrag Op, RegisterClass DRC,
                  RegisterClass PTRRC> :
-  Pseudo<(outs DRC:$rd), (ins PTRRC:$rr), "atomic_op",
-         [(set DRC:$rd, (Op i16:$rr))]>;
+  Pseudo<(outs DRC:$rd, GPR8:$scratch), (ins PTRRC:$rr), "atomic_op",
+         [(set DRC:$rd, (Op i16:$rr)), (set GPR8:$scratch, SREG)]>;

So essentially, this adds a new output with the "earlyclobber" flag set. This scratch register can then be used however you like.

aykevl added inline comments.Jan 5 2022, 4:39 PM
llvm/lib/Target/AVR/AVRExpandPseudoInsts.cpp
829

Another option is to make the Bank operand a register operand instead of an immediate. This would make the code generator put the value there without an extra ldi instruction, and allows optimizations such as using the same value multiple times for multiple elpm instructions.

benshi001 added inline comments.Jan 5 2022, 6:00 PM
llvm/lib/Target/AVR/AVRExpandPseudoInsts.cpp
829

That is a good idea, Thanks. I will try it later this week.

benshi001 updated this revision to Diff 398645.Jan 10 2022, 7:55 AM
benshi001 marked 2 inline comments as done.Jan 10 2022, 7:58 AM
benshi001 added inline comments.
llvm/lib/Target/AVR/AVRExpandPseudoInsts.cpp
829

Done. I have created an extra LDI machine node, which I think is the best way to revmove calling of scavengeGPR8 in my previous patch revision.

benshi001 updated this revision to Diff 398819.Jan 10 2022, 7:59 PM

This looks pretty good. Thanks for the improvements!

Can you add a test where you read twice from the same extended address space? It should only have one ldi instruction to load the address space number.

We should also save RAMPZ in interrupts on devices that support ELPM. But I guess that can happen in a separate patch.
See for example: https://godbolt.org/z/x9Tx1jzMc

llvm/lib/Target/AVR/AVRExpandPseudoInsts.cpp
833

I don't think this necessarily kills the register. A later instruction might use the same value.

892

Same here.

llvm/lib/Target/AVR/AVRISelDAGToDAG.cpp
366–368

I think it is better to use report_fatal_error here instead of assert. Asserts are removed in release builds, so this check will not be performed in release builds, while I think it should always be checked.

The same is true for the assert above (Subtarget->hasLPM()) but that is outside this patch.

In general, asserts should only be used for things that are always true unless there are bugs in the code. In this case, it is possible to trigger the assert without a bug in the AVR backend.

385–387

I don't really understand this comment?
The instruction selector should always use a register class that is supported by all the instruction operands for the virtual register. In this case, that would be LD8.

I found out that avr-gcc does not support reusing the same ldi instruction for multiple elpm instructions, so if we can do that, we're better than avr-gcc :)
https://godbolt.org/z/Yzhaj7e54

benshi001 updated this revision to Diff 401169.Jan 19 2022, 4:19 AM
benshi001 marked 4 inline comments as done.Jan 19 2022, 4:22 AM
benshi001 added inline comments.
llvm/lib/Target/AVR/AVRISelDAGToDAG.cpp
366–368

I also change the above assert to report_fatal_error.

385–387

My previous intention is to say

do not combine LDI into ELPM, since ELPM may be allocated a register in R0-R15, and LDI can not access.

Anyway, I change the comment to a more clear expression.

benshi001 marked 2 inline comments as done.Jan 19 2022, 4:22 AM
benshi001 added a comment.EditedJan 19 2022, 4:26 AM

I also have added a new test foob4 in the new test elpm.ll, which will show the case you mentioned:

read twice from the same extended address space, and there is only one LDI instruction to load the address space number.

This looks pretty good. Thanks for the improvements!

Can you add a test where you read twice from the same extended address space? It should only have one ldi instruction to load the address space number.

We should also save RAMPZ in interrupts on devices that support ELPM. But I guess that can happen in a separate patch.
See for example: https://godbolt.org/z/x9Tx1jzMc

aykevl accepted this revision.Jan 19 2022, 3:20 PM

Looks good to me!

This revision is now accepted and ready to land.Jan 19 2022, 3:20 PM