Details
- Reviewers
rengolin t.p.northover
Diff Detail
Event Timeline
Hi Jon,
See my comments inline.
Also, please add some tests with ARMv4T and ARMv5T variants, checking for their specific returns, using FileCheck.
Thanks!
--renato
llvm/lib/Target/ARM/Thumb1FrameLowering.cpp | ||
---|---|---|
474 ↗ | (On Diff #12088) | Since you're already creating a new boolean, why not make it IsV4TReturn and set it inside the conditional below, so that your special lowering only checks for a flag and not a bunch of them. |
475 ↗ | (On Diff #12088) | A comment here hinting at the block you just added below would be nice. Something like: // ARMv4T requires BX, see below |
495 ↗ | (On Diff #12088) |
Hi Jon,
I had a bit of a think about the liveness thing I mentioned. I think it's mostly OK, but there is one issue.
Cheers.
Tim.
llvm/lib/Target/ARM/Thumb1FrameLowering.cpp | ||
---|---|---|
502–503 ↗ | (On Diff #12088) | I think R3 is potentially reserved for a return value. E.g. define i128 @foo() { ret i128 0 } It's rather an edge-case (particularly on v4t, but then v4t is one big edge case). But we probably shouldn't break it anyway. The safest thing to do would be use the register scavenger. And if the machinery is wired up to restoreCalleeSavedRegisters (fingers crossed) the easiest way to do *that* is to just create a virtual register (even though it is post regalloc, it's got special handling in PrologEpilogInserter). |
I'll add some tests, and look into the register scavenger thing. I think I'm going to hold off on coalescing the POPs though until I'm convinced that it's even possible to do.
Thanks for the quick review comments!
Jon
llvm/lib/Target/ARM/Thumb1FrameLowering.cpp | ||
---|---|---|
474 ↗ | (On Diff #12088) | Ok. |
475 ↗ | (On Diff #12088) | Ok. |
495 ↗ | (On Diff #12088) | Yes, it will create two POPs. ISTR that POPs require their register lists to be in ascending order with no duplicates, so I think swapping PC for R3 only works if reglist contains at most {r0,r1,r2,pc}. Please correct me here if I'm mistaken. If I use a virtual register as Tim suggests, I think I'm going to have a hard time detecting the few cases where it is possible to coalesce these two POPs. Any suggestions on how to make that happen? |
502–503 ↗ | (On Diff #12088) | Ohhhh, good point. What about i256? That would eat up _all_ the lo registers, leaving us with no option to pop the link address. I guess for the ABI to be sane, that would have to go on the stack... |
llvm/lib/Target/ARM/Thumb1FrameLowering.cpp | ||
---|---|---|
495 ↗ | (On Diff #12088) | That's a good point. If you can use LR, than the problem is solved. If not, let's leave the two POPs for now and create a FIXME comment to that effect. |
502–503 ↗ | (On Diff #12088) | The AAPCS only allows you to use R0~R3, everything else goes on the stack. The confusion from using R3 comes from ARM's own docs suggesting it's a reasonable way of doing it (even not coalescing the POPs), but it fails to mention that large return values could use it. I had a read on the old APCS and couldn't find any mentions and to why R3 should be used as a return register (though it says R0~R3 can be used for returning values). But since we don't follow the APCS any more, even on ARMv4T, I think we should discard any reference to that document. Any other register would have to be spilled in the caller, which is just not viable. Tim's suggestion would be the only reasonable way of doing it as a generic case. Without thinking too much, we may get away with POPing it into LR again, and BX LR. I haven't covered all tail-return / recursive cases, but it would be a way of not needing the register scavenger. |
I had a read on the old APCS and couldn't find any mentions and to why R3 should be used as a return register (though it says R0~R3 can be used for returning values). But since we don't follow the APCS any more, even on ARMv4T, I think we should discard any reference to that document.
iOS is unfortunately stuck with the APCS.
Cheers.
Tim.
*gasp* *cough* *spit*... er... ok.
Anyway, AAPCS is more strict anyway, so whatever we do, must follow both.
I found another sequence that will work for the case where we would be clobbering r3 of the return value.
pop {r4, r5, r6, r7}
mov ip, r3
pop {r3}
mov lr, r3
mov r3, ip
bx lr
This is slightly less horrible than all the alternatives I've been able to come up with... correctness first, then performance later.
With this patch, I've fixed both the varargs and non-varargs cases, and added a test for the latter. I still need to write one for the former, but it is exercising the same logic.
Jon,
You probably have thought of it, but what's wrong with just "pop"ing it manually?
pop {r4, r5, r6, r7} ldr lr, sp add sp, #4 bx lr
ldr lr, [sp]
I don't think there's any Thumb1 load into registers above r7.
Cheers.
Tim.
Of course. I keep forgetting Thumb1 is that restrictive, and the v5 manual is not particularly descriptive. :(
The generated code looks horrible, but I guess that's Thumb1's fault, I can't think of anything better. I also like the conditional between larger return values and v5T vs. v4T lowering.
I have no more comments, if Tim agrees, this patch looks good to go to me.
cheers,
--renato
Add tests for varargs functions.
Fix stack pointer update (accidentally dropped this).