This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Add a test showing overlapping stack offsets with RVV
ClosedPublic

Authored by frasercrmck on May 19 2022, 3:32 AM.

Details

Summary

This test (and its forthcoming fix) was split off from D125787. It shows
that the logic we use to determine when we need to add extra RVV padding
is insufficient.

In this example, we may have a situation involving dynamic stack
alignment -- but no variable-sized objects -- where we have no FP but
must still use SP to index objects. In this case we also need the
extra RVV padding, otherwise objects may overlap. Specifically, the test
shows that the RVV vector object may clobber the lowest callee-save.

|------------------------------| -- <-- Incoming SP
| 4-byte callee-save (ra)      |
|------------------------------| -- <-- SP + VLENB*2 + 60
| 4-byte callee-save (s0)      |
|------------------------------| -- <-- SP + VLENB*2 + 56  --
| 4-byte callee-save (s9)      |                            |
|------------------------------| -- <-- SP + VLENB*2 + 52   | RVV object(!!)
| VLENB*2 RVV object           |                            |
|------------------------------| -- <-- SP + 56            --
| 4-byte local object          | 
|------------------------------| -- <-- SP + 32
| Dead area                    |
|------------------------------| -- <-- InSP - 2*VLENB - 64
| Possibly-zero realignment    | 
|------------------------------| -- <-- SP (realigned to 32)

This diagram should help show that when SP==InSP -- e.g., when the incoming SP
is 32-byte aligned, subtracting 2*VLENB+64 may keep it that way -- the RVV
object clobbers the spill of s9.

Diff Detail

Event Timeline

frasercrmck created this revision.May 19 2022, 3:32 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 19 2022, 3:32 AM
frasercrmck requested review of this revision.May 19 2022, 3:32 AM
reames accepted this revision.May 19 2022, 8:27 AM

LGTM

This revision is now accepted and ready to land.May 19 2022, 8:27 AM
This revision was landed with ongoing or failed builds.May 20 2022, 5:11 AM
This revision was automatically updated to reflect the committed changes.
eopXD added a comment.May 24 2022, 2:45 AM

Hi @frasercrmck ,

May I ask why does the generated asm stores is trying to store a0 (x10)?
The RISC-V calling convention specifies that a0 is a caller saved register.

Regards,

eop Chen

Hi @frasercrmck ,

May I ask why does the generated asm stores is trying to store a0 (x10)?
The RISC-V calling convention specifies that a0 is a caller saved register.

Regards,

eop Chen

Hi. The test is arbitrary and is storing s9/$x25 just to have something storing to a scalar stack slot to exercise the frame lowering; it's not a real callee spill.

Since $x25 is a copy of a0/$x10, eventually the machine copy propagation pass removes the copy and that's why it becomes a store of a0.

We could update the tests to use a callee-saved register for readability, but this isn't a correctness issue.

eopXD added a comment.May 24 2022, 5:38 PM

Hi. The test is arbitrary and is storing s9/$x25 just to have something storing to a scalar stack slot to exercise the frame lowering; it's not a real callee spill.
Since $x25 is a copy of a0/$x10, eventually the machine copy propagation pass removes the copy and that's why it becomes a store of a0.
We could update the tests to use a callee-saved register for readability, but this isn't a correctness issue.

Thank you for the reply. I think I am getting closer to understanding the whole thing.

So after the machine copy propagation, the code looks like the following.
s9/$x25 contains value (so we need to callee-save it) and a0/$x10 is a spill-slot object (so we need to callee save it too along with $v30) as well, right?

$x25 = COPY $x10
SW renamable $x10, %stack.0, 0 :: (store (s32) into %stack.0)
PseudoVSPILL_M2 renamable $v30m2, %stack.1 :: (store unknown-size into %stack.1, align 8)
PseudoRET

So after the machine copy propagation, the code looks like the following.
s9/$x25 contains value (so we need to callee-save it) and a0/$x10 is a spill-slot object (so we need to callee save it too along with $v30) as well, right?

$x25 = COPY $x10
SW renamable $x10, %stack.0, 0 :: (store (s32) into %stack.0)
PseudoVSPILL_M2 renamable $v30m2, %stack.1 :: (store unknown-size into %stack.1, align 8)
PseudoRET

Yes, we callee-save s9/$x25 because we're clobbering it in the test. That spill (sw s9, 52(sp)) is automatically generated by the frame lowering code.

But no, I wouldn't say we're "callee-saving" a0/$x10 or $v30. Personally I think of the term "callee-saving" as describing the process of saving registers for the purposes of the calling convention. But in this test we're only storing a0/$x10 and $v30 to the stack because we've concocted a test with those stores in it. We're just demonstrating some codegen in an arbitrary fashion. The stores/spills are unnecessary from a calling convention point of view, but LLVM isn't going to (or able to) remove such unnecessary stack spills - at least at this stage of the pipeline.

I think maybe the type: spill-slot is possibly leading to some confusion? Sorry about that - I just copy/pasted from other tests. I think type: default would be more accurate.

I hope that helps and I'm not confusing you further :)

eopXD added a comment.Jun 1 2022, 8:34 PM

So after the machine copy propagation, the code looks like the following.
s9/$x25 contains value (so we need to callee-save it) and a0/$x10 is a spill-slot object (so we need to callee save it too along with $v30) as well, right?

$x25 = COPY $x10
SW renamable $x10, %stack.0, 0 :: (store (s32) into %stack.0)
PseudoVSPILL_M2 renamable $v30m2, %stack.1 :: (store unknown-size into %stack.1, align 8)
PseudoRET

Yes, we callee-save s9/$x25 because we're clobbering it in the test. That spill (sw s9, 52(sp)) is automatically generated by the frame lowering code.

But no, I wouldn't say we're "callee-saving" a0/$x10 or $v30. Personally I think of the term "callee-saving" as describing the process of saving registers for the purposes of the calling convention. But in this test we're only storing a0/$x10 and $v30 to the stack because we've concocted a test with those stores in it. We're just demonstrating some codegen in an arbitrary fashion. The stores/spills are unnecessary from a calling convention point of view, but LLVM isn't going to (or able to) remove such unnecessary stack spills - at least at this stage of the pipeline.

I think maybe the type: spill-slot is possibly leading to some confusion? Sorry about that - I just copy/pasted from other tests. I think type: default would be more accurate.

I hope that helps and I'm not confusing you further :)

I see. Thank you very much for the explanation!