Page MenuHomePhabricator

[RISCV] Support stack offset exceed 32-bit for RV64
ClosedPublic

Authored by shiva0217 on May 14 2019, 1:49 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

shiva0217 created this revision.May 14 2019, 1:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 14 2019, 1:49 AM
lenary added a subscriber: lenary.Aug 1 2019, 5:58 AM
shiva0217 updated this revision to Diff 217758.Aug 28 2019, 6:59 PM

Rebase to the trunk

shiva0217 added reviewers: lenary, luismarques.

Using T1 as temp register for prologue/epilogue generation if there is no shrink wrapping optimization occur.

Thanks for the patch! I have two comments, which I think will help improve this patch.

lib/Target/RISCV/RISCVFrameLowering.cpp
88 ↗(On Diff #219481)

I think you can probably get rid of movImm32, and only use movImm64 (the logic in generateInstSeq covers 32-bit operands), which I think will address the TODO in movImm32.

Then my only worry would be how we error when we need a larger-than-32-bit offset on RISC-V 32, which would probably cause errors somewhere way before this code anyway.

test/CodeGen/RISCV/rv64-large-stack.ll
42 ↗(On Diff #219481)

I think this test case can be significantly simplified. You should just need a single alloca, and then maybe a call that uses a pointer into the alloca?

shiva0217 marked 2 inline comments as done.Sep 11 2019, 7:08 AM
shiva0217 added inline comments.
lib/Target/RISCV/RISCVFrameLowering.cpp
88 ↗(On Diff #219481)

Yes, there's some test case improvement by replacing movImm32 with the generateInstSeq one, thanks!

I would add an error message for RV32 with a larger-than-32-bit offset. We could update the message when we support a larger-than-32-bit offset in RV32.

test/CodeGen/RISCV/rv64-large-stack.ll
42 ↗(On Diff #219481)

Yes, it becomes significantly cleaner, thanks!

shiva0217 updated this revision to Diff 219708.Sep 11 2019, 7:14 AM

Update patch to address the comments.

Hi @lenary, thanks for the review.

Nice, this is looking a lot better.

I chatted to @asb about this this morning, and he's not sure of the value of using t1 (if it's free) instead of any general-purpose-register. This is going to compromise how many of these instructions we can compress when the C extension is enabled. If you use a virtual register, then the register allocator can use a0-5 if it wishes, which can be compressed, unlike t1. Do you have a justification for explicitly choosing t1?

Nice, this is looking a lot better.

I chatted to @asb about this this morning, and he's not sure of the value of using t1 (if it's free) instead of any general-purpose-register. This is going to compromise how many of these instructions we can compress when the C extension is enabled. If you use a virtual register, then the register allocator can use a0-5 if it wishes, which can be compressed, unlike t1. Do you have a justification for explicitly choosing t1?

To my understanding, the caller saved registers which are not parameter registers(a0-7) could be used as temp register for prologue and epilogue without any spill because the lifetime of these registers won't cross function call, and it would start after prologue and end before the epilogue. For the virtual register, if there are not enough scratch registers, the allocator will try to spill one. So choosing t1 could avoid spill and GCC choose t1, too. If the offset need to be rematlize, the constant after spilit might not fit in the C extension instructions in most cases.

lenary requested changes to this revision.Sep 12 2019, 2:36 AM

Nice, this is looking a lot better.

I chatted to @asb about this this morning, and he's not sure of the value of using t1 (if it's free) instead of any general-purpose-register. This is going to compromise how many of these instructions we can compress when the C extension is enabled. If you use a virtual register, then the register allocator can use a0-5 if it wishes, which can be compressed, unlike t1. Do you have a justification for explicitly choosing t1?

To my understanding, the caller saved registers which are not parameter registers(a0-7) could be used as temp register for prologue and epilogue without any spill because the lifetime of these registers won't cross function call, and it would start after prologue and end before the epilogue. For the virtual register, if there are not enough scratch registers, the allocator will try to spill one. So choosing t1 could avoid spill and GCC choose t1, too. If the offset need to be rematlize, the constant after spilit might not fit in the C extension instructions in most cases.

I see your point that t0 is guaranteed not to be live over a function call boundary.

If parameter registers a0-7 are not being used to pass arguments, then their lifetime will also not cross the function call and they are effectively no different from the temp registers, as far as their usage in the ABI is concerned.

In the case where there are unallocated (general purpose) registers, presumably choosing a virtual GPR will choose an unallocated one, and not spill. LLVM should know that if a0-7 is used for argument passing, then it cannot be used for a spill.

I think if we have to spill, we should let the compiler choose the register that is cheapest to spill. And if we don't have to spill, choosing in allocation order is more important as we have put the compressible registers earlier in the GPR allocation order.

Thus, if we have the ability to let LLVM choose for us, then we should.

This revision now requires changes to proceed.Sep 12 2019, 2:36 AM
shiva0217 updated this revision to Diff 219922.Sep 12 2019, 8:01 AM

Choosing virtual register as temp register.

I thought to choose a virtual register may cause spill, but I was wrong. After dumping the liveness info in Scavenger, LLVM could know caller saved register which is not using for parameter passing could be used. If the shrink wrap didn't occur, the caller saved registers which not used for parameter passing should always free to use for prologue and epilogue. And Scavenger will follow the AllocationOrder for the backend. So choosing a virtual register will be more reasonable. Thanks for @asb and @lenary comments, I got your point now :)

lenary accepted this revision.Sep 12 2019, 8:57 AM

Great! This looks good to me! I'm glad we're getting rid of all those mov a0, a0 (which are really useless addi a0, a0, 0).

I'm not worried about changes from addi to addiw.

I'm happy that this means improvements to RISCVMatInt will affect these offsets as well as anywhere else we use that code.

This revision is now accepted and ready to land.Sep 12 2019, 8:57 AM
This revision was automatically updated to reflect the committed changes.