Support prologue/epilogue generation for RV64 when the stack offset exceeds 32-bit
Details
- Reviewers
asb apazos lenary luismarques - Commits
- rGa49a16ddd0eb: [RISCV] Support stack offset exceed 32-bit for RV64
rL371810: [RISCV] Support stack offset exceed 32-bit for RV64
rGeaa230fe3c86: [RISCV] Support stack offset exceed 32-bit for RV64
rL371806: [RISCV] Support stack offset exceed 32-bit for RV64
Diff Detail
- Repository
- rL LLVM
Event Timeline
Using T1 as temp register for prologue/epilogue generation if there is no shrink wrapping optimization occur.
Thanks for the patch! I have two comments, which I think will help improve this patch.
lib/Target/RISCV/RISCVFrameLowering.cpp | ||
---|---|---|
90 | I think you can probably get rid of movImm32, and only use movImm64 (the logic in generateInstSeq covers 32-bit operands), which I think will address the TODO in movImm32. Then my only worry would be how we error when we need a larger-than-32-bit offset on RISC-V 32, which would probably cause errors somewhere way before this code anyway. | |
test/CodeGen/RISCV/rv64-large-stack.ll | ||
43 | I think this test case can be significantly simplified. You should just need a single alloca, and then maybe a call that uses a pointer into the alloca? |
lib/Target/RISCV/RISCVFrameLowering.cpp | ||
---|---|---|
90 | Yes, there's some test case improvement by replacing movImm32 with the generateInstSeq one, thanks! I would add an error message for RV32 with a larger-than-32-bit offset. We could update the message when we support a larger-than-32-bit offset in RV32. | |
test/CodeGen/RISCV/rv64-large-stack.ll | ||
43 | Yes, it becomes significantly cleaner, thanks! |
Nice, this is looking a lot better.
I chatted to @asb about this this morning, and he's not sure of the value of using t1 (if it's free) instead of any general-purpose-register. This is going to compromise how many of these instructions we can compress when the C extension is enabled. If you use a virtual register, then the register allocator can use a0-5 if it wishes, which can be compressed, unlike t1. Do you have a justification for explicitly choosing t1?
To my understanding, the caller saved registers which are not parameter registers(a0-7) could be used as temp register for prologue and epilogue without any spill because the lifetime of these registers won't cross function call, and it would start after prologue and end before the epilogue. For the virtual register, if there are not enough scratch registers, the allocator will try to spill one. So choosing t1 could avoid spill and GCC choose t1, too. If the offset need to be rematlize, the constant after spilit might not fit in the C extension instructions in most cases.
I see your point that t0 is guaranteed not to be live over a function call boundary.
If parameter registers a0-7 are not being used to pass arguments, then their lifetime will also not cross the function call and they are effectively no different from the temp registers, as far as their usage in the ABI is concerned.
In the case where there are unallocated (general purpose) registers, presumably choosing a virtual GPR will choose an unallocated one, and not spill. LLVM should know that if a0-7 is used for argument passing, then it cannot be used for a spill.
I think if we have to spill, we should let the compiler choose the register that is cheapest to spill. And if we don't have to spill, choosing in allocation order is more important as we have put the compressible registers earlier in the GPR allocation order.
Thus, if we have the ability to let LLVM choose for us, then we should.
Choosing virtual register as temp register.
I thought to choose a virtual register may cause spill, but I was wrong. After dumping the liveness info in Scavenger, LLVM could know caller saved register which is not using for parameter passing could be used. If the shrink wrap didn't occur, the caller saved registers which not used for parameter passing should always free to use for prologue and epilogue. And Scavenger will follow the AllocationOrder for the backend. So choosing a virtual register will be more reasonable. Thanks for @asb and @lenary comments, I got your point now :)
Great! This looks good to me! I'm glad we're getting rid of all those mov a0, a0 (which are really useless addi a0, a0, 0).
I'm not worried about changes from addi to addiw.
I'm happy that this means improvements to RISCVMatInt will affect these offsets as well as anywhere else we use that code.
I think you can probably get rid of movImm32, and only use movImm64 (the logic in generateInstSeq covers 32-bit operands), which I think will address the TODO in movImm32.
Then my only worry would be how we error when we need a larger-than-32-bit offset on RISC-V 32, which would probably cause errors somewhere way before this code anyway.