Details
Diff Detail
Event Timeline
For my context, I believe this is for devices that have a larger flash memory, such as the ATmega2560?
I haven't looked too deeply at the code yet.
llvm/lib/Target/AVR/AVRExpandPseudoInsts.cpp | ||
---|---|---|
829 | There is actually a much more elegant way to do this: by giving ELPMRdZ (and similar) instructions an extra LD8 input and output register. This makes sure the register allocator allocates a spare register for this instruction. The scavengeGPR8 method is really just a hack to find an unused register and it can fail if no register is available. In addition, by telling the register allocator you need an LD8 register, you don't need the if/else below. |
llvm/lib/Target/AVR/AVRExpandPseudoInsts.cpp | ||
---|---|---|
829 | I recently implemented this, in a work-in-progress change. It looks like this in my case: +let Constraints = "@earlyclobber $scratch" in class AtomicLoad<PatFrag Op, RegisterClass DRC, RegisterClass PTRRC> : - Pseudo<(outs DRC:$rd), (ins PTRRC:$rr), "atomic_op", - [(set DRC:$rd, (Op i16:$rr))]>; + Pseudo<(outs DRC:$rd, GPR8:$scratch), (ins PTRRC:$rr), "atomic_op", + [(set DRC:$rd, (Op i16:$rr)), (set GPR8:$scratch, SREG)]>; So essentially, this adds a new output with the "earlyclobber" flag set. This scratch register can then be used however you like. |
llvm/lib/Target/AVR/AVRExpandPseudoInsts.cpp | ||
---|---|---|
829 | Another option is to make the Bank operand a register operand instead of an immediate. This would make the code generator put the value there without an extra ldi instruction, and allows optimizations such as using the same value multiple times for multiple elpm instructions. |
llvm/lib/Target/AVR/AVRExpandPseudoInsts.cpp | ||
---|---|---|
829 | That is a good idea, Thanks. I will try it later this week. |
llvm/lib/Target/AVR/AVRExpandPseudoInsts.cpp | ||
---|---|---|
829 | Done. I have created an extra LDI machine node, which I think is the best way to revmove calling of scavengeGPR8 in my previous patch revision. |
This looks pretty good. Thanks for the improvements!
Can you add a test where you read twice from the same extended address space? It should only have one ldi instruction to load the address space number.
We should also save RAMPZ in interrupts on devices that support ELPM. But I guess that can happen in a separate patch.
See for example: https://godbolt.org/z/x9Tx1jzMc
llvm/lib/Target/AVR/AVRExpandPseudoInsts.cpp | ||
---|---|---|
833 | I don't think this necessarily kills the register. A later instruction might use the same value. | |
892 | Same here. | |
llvm/lib/Target/AVR/AVRISelDAGToDAG.cpp | ||
366–368 | I think it is better to use report_fatal_error here instead of assert. Asserts are removed in release builds, so this check will not be performed in release builds, while I think it should always be checked. The same is true for the assert above (Subtarget->hasLPM()) but that is outside this patch. In general, asserts should only be used for things that are always true unless there are bugs in the code. In this case, it is possible to trigger the assert without a bug in the AVR backend. | |
385–387 | I don't really understand this comment? |
I found out that avr-gcc does not support reusing the same ldi instruction for multiple elpm instructions, so if we can do that, we're better than avr-gcc :)
https://godbolt.org/z/Yzhaj7e54
llvm/lib/Target/AVR/AVRISelDAGToDAG.cpp | ||
---|---|---|
366–368 | I also change the above assert to report_fatal_error. | |
385–387 | My previous intention is to say do not combine LDI into ELPM, since ELPM may be allocated a register in R0-R15, and LDI can not access. Anyway, I change the comment to a more clear expression. |
I also have added a new test foob4 in the new test elpm.ll, which will show the case you mentioned:
read twice from the same extended address space, and there is only one LDI instruction to load the address space number.
There is actually a much more elegant way to do this: by giving ELPMRdZ (and similar) instructions an extra LD8 input and output register. This makes sure the register allocator allocates a spare register for this instruction. The scavengeGPR8 method is really just a hack to find an unused register and it can fail if no register is available. In addition, by telling the register allocator you need an LD8 register, you don't need the if/else below.