This is an archive of the discontinued LLVM Phabricator instance.

Differential D61437

[AArch64] Static (de)allocation of SVE stack objects.
ClosedPublic

Authored by sdesmalen on May 2 2019, 5:38 AM.

Details

Reviewers

thegameg
rovka
t.p.northover
efriedma
rengolin
greened

Commits

rG4f99b6f0fe42: [AArch64] Static (de)allocation of SVE stack objects.
rL373585: [AArch64] Static (de)allocation of SVE stack objects.

Summary

Adds support to AArch64FrameLowering to allocate fixed-stack SVE objects.

The focus of this patch is purely to allow the stack frame to
allocate/deallocate space for scalable SVE objects. More dynamic
allocation (at compile-time, i.e. determining placement of SVE objects
on the stack), or resolving frame-index references that include
scalable-sized offsets, are left for subsequent patches.

SVE objects are allocated in the stack frame as a separate region below
the callee-save area, and above the alignment gap. This is done so that
the SVE objects can be accessed directly from the FP at (runtime)
VL-based offsets to benefit from using the VL-scaled addressing modes.

The layout looks as follows:

+-------------+
| stack arg   |   
+-------------+
| Callee Saves|
|   X29, X30  |       (if available)
|-------------| <- FP (if available)
|     :       |   
|  SVE area   |   
|     :       |   
+-------------+
|/////////////| alignment gap.
|     :       |   
| Stack objs  |
|     :       |   
+-------------+ <- SP after call and frame-setup

SVE and non-SVE stack objects are distinguished using different
StackIDs. The offsets for objects with TargetStackID::SVEVector should be
interpreted as purely scalable offsets within their respective SVE region.

Diff Detail

Repository: rL LLVM

Event Timeline

sdesmalen created this revision. May 2 2019, 5:38 AM

Herald added subscribers: kristof.beyls, tschuett, javed.absar. May 2 2019, 5:38 AM

sdesmalen added parent revisions: D61436: [AArch64] NFC: Generalize emitFrameOffset to support more than byte offsets., D61435: [AArch64] NFC: Add generic StackOffset to describe scalable offsets. May 2 2019, 5:39 AM

Your proposed stack layout doesn't really make sense. There are a few issues:

How do you compute the address of a stack argument?
Under the iOS and Windows calling conventions, vararg functions must allocate some fixed slots directly after the stack arguments.
How do you restore SP in the epilogue?

It would make a lot more sense to place the SVE objects somewhere between FP and SP; we already support allocating a variable amount of space between FP and SP, for stack realignment.

Not sure what impact this has on this patch; maybe not much?

sdesmalen mentioned this in D61436: [AArch64] NFC: Generalize emitFrameOffset to support more than byte offsets. May 3 2019, 8:23 AM

We've actually experimented with various layouts and eventually chose this layout for our HPC compiler.

Let me give some more clarification on the spill/fill addressing modes as background for this choice.

When loading regular (non-scalable) data from the stack in the presence of SVE stack objects, the base offset can be materialized using ADDVL, which adds a multiple of the runtime VL to a register. For example, a GPR register spilled at an offset SP + 16 bytes + 2 * sizeof(SVE vector) can be loaded using the sequence:

addvl x8, sp, #2
ldr x0, [x8, #16]

Conversely, the SVE spill/fill addressing modes expect a (runtime) VL scaled offset. For example:

ldr z0, [sp, #2, mul vl]  // loads z0 from SP + 2 * sizeof(SVE vector)

If we want to load SVE vector z0 from an offset SP + 16 bytes + 2 * sizeof(SVE vector), this requires first materializing the base offset by adding 16 bytes, and then using the scaled addressing mode to load z0:

add x8, sp, #16
ldr z0, [x8, #2, mul vl]

Because the additional add <fixed-size offset>, or alternatively addvl <scalable offset>, is expensive, we place fixed-size objects and scalable (SVE) objects in different regions. By allocating the SVE region before all other stack objects (CSRs, spills, locals), the existing frame layout doesn't need to change. More importantly, accesses to almost all fixed-size stack objects (with the exception of stack arguments) will be as efficient as they would be without SVE stack objects, and don't require an extra frame register. In the presence of a frame pointer, we can also access our SVE objects directly from the FP.
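
To make the cost argument concrete, here is a minimal sketch of the addressing trade-off described above. It is an illustration only, not code from the patch: the names MixedOffset, resolveAddress, extraInstrsForGPRAccess and extraInstrsForSVEAccess are hypothetical, and the model assumes every immediate fits its addressing mode.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model (not LLVM code): an offset from a base register with a
// fixed part in bytes and a scalable part counted in whole SVE vectors.
struct MixedOffset {
  int64_t Bytes;   // fixed-size part, in bytes
  int64_t Vectors; // scalable part, in multiples of the runtime vector length
};

// Concrete address once the runtime vector length (in bytes) is known.
inline int64_t resolveAddress(int64_t SP, MixedOffset Off, int64_t VLBytes) {
  // SVE vector lengths are multiples of 128 bits (16 bytes).
  assert(VLBytes > 0 && VLBytes % 16 == 0);
  return SP + Off.Bytes + Off.Vectors * VLBytes;
}

// GPR access, e.g. "ldr x0, [base, #imm]": the addressing mode folds the
// fixed part, so only a non-zero scalable part costs an extra ADDVL.
inline int extraInstrsForGPRAccess(MixedOffset Off) {
  return Off.Vectors != 0 ? 1 : 0;
}

// SVE access, e.g. "ldr z0, [base, #n, mul vl]": the addressing mode folds
// the scalable part, so only a non-zero fixed part costs an extra ADD.
inline int extraInstrsForSVEAccess(MixedOffset Off) {
  return Off.Bytes != 0 ? 1 : 0;
}
```

Under this model, an offset of 16 bytes + 2 vectors costs one extra instruction for either kind of access, whereas a purely fixed or purely scalable offset costs none for the matching access kind. That is the motivation for keeping the two kinds of objects in separate regions.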

How do you compute the address of a stack argument?

We can compute the address of a stack argument using 'addvl' and regular 'add/sub' instructions.

Under the iOS and Windows calling conventions, vararg functions must allocate some fixed slots directly after the stack arguments.

I don't think there is a reason this decision would prevent the ability to create fixed slots directly after stack arguments (with some work to support it for these calling conventions, of course), although I admit we have not had to concern ourselves with this case for our HPC compiler which implements the AAPCS (with SVE extensions). I don't know enough about the iOS and Windows calling conventions to know if there are explicit assumptions made on the frame-layout other than this?

How do you restore SP in the epilogue?

We restore the SP in the epilogue by adding the scalable stack-size to the SP as a last step. For example (from test/CodeGen/AArch64/framelayout-sve.mir)

# CHECK-NEXT: $sp = frame-setup ADDVL_XXI $sp, -2    // allocate scalable-sized stack
# CHECK-NEXT: $sp = frame-setup SUBXri $sp, 16, 0    // allocate fixed-size stack

# CHECK:      $sp = frame-destroy ADDXri $sp, 16, 0  // deallocate fixed-size stack
# CHECK-NEXT: $sp = frame-destroy ADDVL_XXI $sp, 2   // deallocate scalable-sized stack
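
The ordering in the CHECK lines above can be sketched as a small model in which the epilogue undoes the prologue in reverse order, so SP is restored for any runtime vector length. This is a simplified illustration: FrameSizes, emitPrologue and emitEpilogue are hypothetical names, not functions in AArch64FrameLowering.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model (not LLVM code) of a frame with a scalable and a
// fixed-size part.
struct FrameSizes {
  int64_t FixedBytes; // allocated with SUB/ADD, e.g. 16
  int64_t SVEVectors; // allocated with ADDVL, e.g. 2
};

// Mirrors the frame-setup sequence: scalable part first, then fixed part.
inline int64_t emitPrologue(int64_t SP, FrameSizes F, int64_t VLBytes) {
  assert(VLBytes > 0 && VLBytes % 16 == 0);
  SP -= F.SVEVectors * VLBytes; // ADDVL sp, sp, #-SVEVectors
  SP -= F.FixedBytes;           // SUB   sp, sp, #FixedBytes
  return SP;
}

// Mirrors the frame-destroy sequence: fixed part first, then scalable part.
inline int64_t emitEpilogue(int64_t SP, FrameSizes F, int64_t VLBytes) {
  assert(VLBytes > 0 && VLBytes % 16 == 0);
  SP += F.FixedBytes;           // ADD   sp, sp, #FixedBytes
  SP += F.SVEVectors * VLBytes; // ADDVL sp, sp, #SVEVectors
  return SP;
}
```

Because both steps are expressed relative to the runtime vector length, the compiler never needs to know the concrete size of the SVE area to restore SP.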

The description of the addressing modes is helpful; I didn't realize there was native support for vl-relative arithmetic. I guess that makes it more straightforward than I was expecting for stack address computations to skip over the SVE spill slots.

I'm not sure I understand why it's important to allocate the SVE spill slots before the CSRs, as opposed to allocating them between the CSRs and the regular locals/spills. For code which has a frame pointer, placing the SVE spill slots between the CSRs and the locals/spills has a number of benefits over your suggested layout:

the epilogue is cheaper (you don't need an addvl after restoring sp from fp)
it's cheaper to access arguments passed on the stack
it's cheaper to access the SVE spill slots: you can arrange for the frame pointer to point to the top of the SVE spill area, and use negative offsets from it to spill/restore SVE registers in a single instruction.
code using frame pointers can be unwound using a non-SVE-aware DWARF unwinder.

And I don't see any benefits to the other order, unless I'm missing something big.

In cases where we don't have a frame pointer, the two orders are basically equivalent, I guess.

I guess on current SVE implementations, there isn't any advantage to aligning SVE spill slots more than 16 bytes? And you don't expect that to ever change on future implementations?

I'll write out a diagram for my suggested layout, using a similar style to the one in framelayout-sve.mir, to make sure I'm describing it clearly:

#     +--------------+
#     | stack arg    |
#     +--------------+ <- SP before call
#     | Callee Saves |
#     +--------------+
#     | Frame record |
#     +--------------+ <- FP
#     | SVE objs     |
#     +--------------+
#     | gap for stack realignment, if there's an over-aligned local variable
#     +--------------+
#     |     :        |
#     | Stack objs   |
#     |     :        |
#     +--------------+ <- SP after call and frame-setup

I don't know enough about the iOS and Windows calling conventions to know if there are explicit assumptions made on the frame-layout other than this?

That's the only relevant restriction, I think, unless you want to count the Windows unwind rules.

Added more unittests for StackOffset.

sdesmalen mentioned this in D61435: [AArch64] NFC: Add generic StackOffset to describe scalable offsets. May 7 2019, 5:15 AM

Thanks for your suggestions @efriedma!

Just to double check, your suggested layout has the frame-record *after* the callee-saves. The current layout however, puts the frame-record above the callee-saves. Are you suggesting to change that?

I'm not sure I understand why it's important to allocate the SVE spill slots before the CSRs, as opposed to allocating them between the CSRs and the regular locals/spills.

The current layout for our HPC compiler was a trade-off between getting an efficient implementation for SVE spills/fills on one hand, while keeping in mind a way to limit our downstream debt on the other hand. By keeping the layout as unchanged as possible (i.e. keeping all existing offsets to locals/spills the same, with the exception of stack arguments), we figured this simplified the code and reduced the chance of introducing bugs or regressing performance for accesses to regular stack objects in the presence of any SVE slots (with exception of stack arguments).

I spent some time investigating your suggestion to place the SVE area between the callee-saves and locals/spills and found some things worth noting/considering: 

In the presence of an SVE area, the compiler should then no longer use stack-slot scavenging to reuse gaps in the CSR area, because accesses from the SP will be expensive.
The compiler will have less flexibility to choose the best base pointer to access a stack-slot, because using the FP to access a non-SVE local/spill will require an extra ADDVL instruction. For large stack-frames, this may incur an overhead (and would probably require the emergency spill slot).
Allocation of (non-SVE) stack space will always need to happen in separate steps: because the scalable area is inserted in the middle, it will no longer be possible to allocate the entire stack space in one go and then save the callee-saves from the new SP. Instead, the compiler needs to first allocate stack space for callee-saves, store callee-saves, and finally allocate the remaining stack-space. Pre/post-incrementing addressing modes can be used for the first two steps, but I don't know if this would be more expensive than using the regular addressing modes.
The emergency scavenging will always need to be allocated near the SP (or BP), rather than FP. This is not really a problem, but more something that is different when the stack does not contain any SVE objects.
We'd need to change the location of the frame-record within the callee-saves. If we do so, we'll probably want to do that regardless of whether the stack contains SVE spills or not to keep the layouts similar. Also the distance between FP and locals/spills would be smaller, which is probably beneficial. According to the AAPCS, the placement of the FrameRecord within the stack frame is unspecified (section 5.2.3 The Frame Pointer). Do you know if the same freedom holds true for iOS and Windows calling conventions?

the epilogue is cheaper (you don't need an addvl after restoring sp from fp)

In most cases however, LLVM chooses to restore the stack by incrementing the stack-pointer, even when that is suboptimal (e.g. when the FP is available and restoring the SP by adding sizeof(stack) requires more than 1 add instruction). The exception seems to be when the stack is aligned > 16 bytes and it needs to restore it by using the frame-pointer. Do you know if this behaviour is intentional?

it's cheaper to access arguments passed on the stack

Correct.

it's cheaper to access the SVE spill slots: you can arrange for the frame pointer to point to the top of the SVE spill area, and use negative offsets from it to spill/restore SVE registers in a single instruction.

Note that with the layout proposed in this patch, we can overcome that by extending the 16 byte frame-record to be n x 16 bytes <=> sizeof(1 SVE-vec spill), and access all SVE objects directly from FP + 1 + Offset.

code using frame pointers can be unwound using a non-SVE-aware DWARF unwinder.

When using a frame-pointer, that is still the case with the proposed layout, because the FP will always point to the frame-record, so it can always easily find the previous FP and LR, and offsets to the (non-SVE) callee-saves will be unchanged.

I guess on current SVE implementations, there isn't any advantage to aligning SVE spill slots more than 16 bytes? And you don't expect that to ever change on future implementations?

Locals arising from use of the ACLE may be set to a different alignment, but since the ACLE does not allow them being members of structs or arrays, there is probably little value in doing so. One advantage of placing the SVE area as you suggested is that we could easily implement such re-alignment by moving up the alignment gap between the callee-saves and the SVE area.

In D61437#1497980, @sdesmalen wrote:

Thanks for your suggestions @efriedma!

Just to double check, your suggested layout has the frame-record *after* the callee-saves. The current layout however, puts the frame-record above the callee-saves. Are you suggesting to change that?

Yes, I'm suggesting to rearrange them, to make the fp more useful for accessing SVE spills.

I'm not sure I understand why it's important to allocate the SVE spill slots before the CSRs, as opposed to allocating them between the CSRs and the regular locals/spills.

The current layout for our HPC compiler was a trade-off between getting an efficient implementation for SVE spills/fills on one hand, while keeping in mind a way to limit our downstream debt on the other hand. By keeping the layout as unchanged as possible (i.e. keeping all existing offsets to locals/spills the same, with the exception of stack arguments), we figured this simplified the code and reduced the chance of introducing bugs or regressing performance for accesses to regular stack objects in the presence of any SVE slots (with exception of stack arguments).

If the SVE spill area is below the CSRs, you can leverage the existing checks to handle stack realignment, so I don't think it's that complicated to implement. But maybe your approach requires changing fewer places.

I spent some time investigating your suggestion to place the SVE area between the callee-saves and locals/spills and found some things worth noting/considering: 

In the presence of an SVE area, the compiler should then no longer use stack-slot scavenging to reuse gaps in the CSR area, because accesses from the SP will be expensive.

I don't think there's ever more than one 8-byte slot; not a great loss. And if we really wanted to, we could access the slot relative to fp.

The compiler will have less flexibility to choose the best base pointer to access a stack-slot, because using the FP to access a non-SVE local/spill will require an extra ADDVL instruction. For large stack-frames, this may incur an overhead (and would probably require the emergency spill slot).

We don't normally use fp anyway, unless the function has dynamic allocations; the legal negative offsets from fp are much smaller than the legal positive offsets from sp. And if there are dynamic allocations, we often emit a base pointer anyway.

But on a related note, we end up forcing a base pointer in all cases with dynamic allocation and SVE spill slots, which I guess is a potential downside.

Allocation of (non-SVE) stack space will always need to happen in separate steps: because the scalable area is inserted in the middle, it will no longer be possible to allocate the entire stack space in one go and then save the callee-saves from the new SP. Instead, the compiler needs to first allocate stack space for callee-saves, store callee-saves, and finally allocate the remaining stack-space. Pre/post-incrementing addressing modes can be used for the first two steps, but I don't know if this would be more expensive than using the regular addressing modes.

On cortex-a57 etc., the performance of pre/post-increment is basically the same as an extra arithmetic instruction, IIRC. So yes, it's slightly more expensive, but not by a lot.

The emergency scavenging will always need to be allocated near the SP (or BP), rather than FP. This is not really a problem, but more something that is different when the stack does not contain any SVE objects.

This is probably a one-line change, since we already do this in cases with stack realignment.

We'd need to change the location of the frame-record within the callee-saves. If we do so, we'll probably want to do that regardless of whether the stack contains SVE spills or not to keep the layouts similar. Also the distance between FP and locals/spills would be smaller, which is probably beneficial. According to the AAPCS, the placement of the FrameRecord within the stack frame is unspecified (section 5.2.3 The Frame Pointer). Do you know if the same freedom holds true for iOS and Windows calling conventions?

It doesn't matter on iOS. On Windows, the document describing unwind data actually claims the frame record is supposed to be allocated after the local variables for functions with dynamic stack allocations, but we currently don't implement that, and we haven't seen any issues. Maybe there's some interaction between C++ exceptions and dynamic allocation we don't implement correctly? I haven't really spent any time trying to break it, and dynamic allocations combined with C++ exception handling doesn't really show up in real-world code.

the epilogue is cheaper (you don't need an addvl after restoring sp from fp)

In most cases however, LLVM chooses to restore the stack by incrementing the stack-pointer, even when that is suboptimal (e.g. when the FP is available and restoring the SP by adding sizeof(stack) requires more than 1 add instruction). The exception seems to be when the stack is aligned > 16 bytes and it needs to restore it by using the frame-pointer. Do you know if this behaviour is intentional?

That isn't intentional, I think; probably just nobody noticed. Stack frames that require more than one instruction are rare, and frames that require more than two basically never happen.

it's cheaper to access arguments passed on the stack

Correct.

it's cheaper to access the SVE spill slots: you can arrange for the frame pointer to point to the top of the SVE spill area, and use negative offsets from it to spill/restore SVE registers in a single instruction.

Note that with the layout proposed in this patch, we can overcome that by extending the 16 byte frame-record to be n x 16 bytes <=> sizeof(1 SVE-vec spill), and access all SVE objects directly from FP + 1 + Offset.

Oh, that's clever, and I guess it's not that expensive.

code using frame pointers can be unwound using a non-SVE-aware DWARF unwinder.

When using a frame-pointer, that is still the case with the proposed layout, because the FP will always point to the frame-record, so it can always easily find the previous FP and LR, and offsets to the (non-SVE) callee-saves will be unchanged.

Sorry, I didn't state this correctly. The key here would be if code isn't using frame pointers, we could emit a frame pointer for all functions with SVE spill slots, and then get correct unwinding without a SVE-aware unwinder, and without recompiling everything with frame pointers.

I guess on current SVE implementations, there isn't any advantage to aligning SVE spill slots more than 16 bytes? And you don't expect that to ever change on future implementations?

Locals arising from use of the ACLE may be set to a different alignment, but since the ACLE does not allow them being members of structs or arrays, there is probably little value in doing so. One advantage of placing the SVE area as you suggested is that we could easily implement such re-alignment by moving up the alignment gap between the callee-saves and the SVE area.

Yes, that's what I was thinking.

Ignore the bit about Windows unwinding. I remembered how it actually works; the Microsoft document is just wrong, and it's not actually necessary to allocate the frame record in any particular position on Windows (except that it has to be somewhere that isn't allocated using _chkstk... but that wouldn't happen anyway with the layout I'm suggesting).

This sounds like it will report the wrong stack size in PEI for the StackSize remark and the stack size warning. Is that expected?

In D61437#1498210, @efriedma wrote:

If the SVE spill area is below the CSRs, you can leverage the existing checks to handle stack realignment, so I don't think it's that complicated to implement. But maybe your approach requires changing fewer places.

I think you've made some compelling reasons to try the change in layout! I'll actually try this out downstream first before updating this patch, so I can run it through our SVE testing and see if there is any impact on performance or if I run into anything unexpected.

But on a related note, we end up forcing a base pointer in all cases with dynamic allocation and SVE spill slots, which I guess is a potential downside.

Does LLVM need this information to be available before register allocation so it knows whether to use the register or not? Because we would only know if we'd need a BP if there are any SVE instructions that would lead to spills *after* register allocation (unless the BP is reserved during RA and only used for scavenging).

When using a frame-pointer, that is still the case with the proposed layout, because the FP will always point to the frame-record, so it can always easily find the previous FP and LR, and offsets to the (non-SVE) callee-saves will be unchanged.

Sorry, I didn't state this correctly. The key here would be if code isn't using frame pointers, we could emit a frame pointer for all functions with SVE spill slots, and then get correct unwinding without a SVE-aware unwinder, and without recompiling everything with frame pointers.

Okay, so we should always use the FP if the function needs unwind table entries and has SVE spills/locals.

In D61437#1498242, @efriedma wrote:

Ignore the bit about Windows unwinding. I remembered how it actually works; the Microsoft document is just wrong, and it's not actually necessary to allocate the frame record in any particular position on Windows (except that it has to be somewhere that isn't allocated using _chkstk... but that wouldn't happen anyway with the layout I'm suggesting).

Thanks for clarifying!

In D61437#1501444, @thegameg wrote:

This sounds like it will report the wrong stack size in PEI for the StackSize remark and the stack size warning. Is that expected?

For now the answer is yes. One of our primary concerns at the moment is adding basic SVE spill/fill support and we appreciate the caveat that nothing in LLVM really supports the concept of scalable types yet, including offsets and sizes.

Patch D61435 introduces (AArch64)StackOffset, which we'll use to describe offsets composed of a scalable and fixed-size part. Instead of recording sizes and offsets as an 'int' or 'unsigned', they should be described as an instance of a StackOffset/StackSize class. When the scalable-type patch (D32530) lands we should make an effort to roll out the StackOffset class (perhaps with an alias for StackSize) to generic CodeGen interfaces such as getStackSize().
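
As a rough idea of what such a type could look like, here is a minimal sketch of an offset with a fixed and a scalable component. It is deliberately simplified and is not the class from D61435: SketchStackOffset and its members are hypothetical names, and the scalable part is assumed to be expressed in bytes at the minimum vector length of 128 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for a StackOffset-like type.
class SketchStackOffset {
  int64_t Bytes = 0;         // fixed-size part, in bytes
  int64_t ScalableBytes = 0; // scalable part, in bytes per 128 bits of VL

  SketchStackOffset(int64_t B, int64_t S) : Bytes(B), ScalableBytes(S) {}

public:
  SketchStackOffset() = default;

  static SketchStackOffset fixed(int64_t B) { return {B, 0}; }
  static SketchStackOffset scalable(int64_t S) { return {0, S}; }

  // The two parts are added independently and never mixed.
  SketchStackOffset operator+(const SketchStackOffset &RHS) const {
    return {Bytes + RHS.Bytes, ScalableBytes + RHS.ScalableBytes};
  }

  int64_t getBytes() const { return Bytes; }
  int64_t getScalableBytes() const { return ScalableBytes; }

  // True if either part is non-zero.
  explicit operator bool() const { return Bytes != 0 || ScalableBytes != 0; }

  // Concrete size in bytes for a given runtime vector length in bytes.
  int64_t resolve(int64_t VLBytes) const {
    assert(VLBytes > 0 && VLBytes % 16 == 0);
    return Bytes + ScalableBytes * (VLBytes / 16);
  }
};
```

A frame with 16 fixed bytes and two SVE vectors would be `SketchStackOffset::fixed(16) + SketchStackOffset::scalable(32)`. Its concrete size is only known once the vector length is, which is why a plain 'int' or 'unsigned' returned from an interface such as getStackSize() cannot describe it.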

Does LLVM need this information to be available before register allocation so it knows whether to use the register or not?

You have to decide whether the register is reserved before register allocation, so the register allocator doesn't decide to use it, yes. You should be able to change the answer after register allocation: basically, reserve the register through register allocation, then decide after register allocation you don't really need it and "unreserve" it.

I wouldn't really worry about optimizing this; dynamic stack allocation is rare in most C and C++ codebases, and one integer register likely doesn't matter much.

Okay, so we should always use the FP if the function needs unwind table entries and has SVE spills/locals.

Yes. Granted, you probably want a frame pointer anyway for functions with SVE spills/locals.

cameron.mcinally added a subscriber: cameron.mcinally. Jul 30 2019, 1:34 PM

Herald added a reviewer: rengolin. Jul 30 2019, 1:34 PM

greened added a subscriber: greened. Aug 1 2019, 11:20 AM

I wouldn't really worry about optimizing this; dynamic stack allocation is rare in most C and C++ codebases, and one integer register likely doesn't matter much.

Note that the situation is different with Fortran, where dynamic stack allocation is much more common, though I don't know whether this particular issue will impact performance all that much.

greened added inline comments. Aug 1 2019, 12:53 PM

lib/Target/AArch64/AArch64FrameLowering.cpp
187 ↗	(On Diff #198440)	Needs a comment explaining what this does.
865 ↗	(On Diff #198440)	This is confusing. Asserting that `SVEStackSize` is non-zero but the message sort of implies it must be true. Maybe word this similarly to the assert right above: "unexpected function without stack frame but with SVE objects."
1281 ↗	(On Diff #198440)	Maybe this and the uses below are better as a separate NFC patch?
1436 ↗	(On Diff #198440)	It would be helpful to have a comment here explaining why we are not `Done` if `SVEStackSize` is non-zero.
1486 ↗	(On Diff #198440)	Add a comment here explaining what this is doing.
lib/Target/AArch64/AArch64InstrInfo.cpp
3052 ↗	(On Diff #198440)	It's not clear to me what these are. Could you name them a bit more clearly, specifically without acronyms?
lib/Target/AArch64/AArch64StackOffset.h
97 ↗	(On Diff #198440)	Again, `PL` and `VL` are not very clear.

Changed the location of the SVE area within the frame-layout as suggested by @efriedma
Updated the summary.

Herald added subscribers: nhaehnle, jvesely, arsenm. Aug 2 2019, 6:24 AM

sdesmalen added inline comments. Aug 2 2019, 6:24 AM

lib/Target/AArch64/AArch64FrameLowering.cpp
1281 ↗	(On Diff #198440)	This change is no longer in my updated patch.
1436 ↗	(On Diff #198440)	This change is no longer in my updated patch.
1486 ↗	(On Diff #198440)	This change is no longer in my updated patch.
lib/Target/AArch64/AArch64InstrInfo.cpp
3052 ↗	(On Diff #198440)	Good point, I see how this is unclear. I've renamed the variables in my latest revision!

sdesmalen added a parent revision: D65653: [AArch64] Change location of frame-record within callee-save area. Aug 2 2019, 6:25 AM

@efriedma, sorry for taking a while to update this patch with the new layout. Other than being distracted by many other things, I tried it on our downstream repo first to see if this might lead to any negative performance impact. This all seems fine, and I now realise this approach makes the code in AArch64FrameLowering a bit simpler (which was the opposite of what I initially thought). I separated out the patch to reorder the frame-record within the callee-save area into D65653.

sdesmalen added a subscriber: joelkevinjones. Aug 2 2019, 7:10 AM

troyj added a subscriber: troyj. Aug 2 2019, 8:44 AM

I wonder if this should have a test that ensures we generate VL-scaled addressing modes for SVE object addressing. If there's not enough codegen yet to emit the asm, then we should probably add such a test when we can. After all, it's the stated goal of this patch. :)

In D61437#1612435, @greened wrote:

I wonder if this should have a test that ensures we generate VL-scaled addressing modes for SVE object addressing. If there's not enough codegen yet to emit the asm, then we should probably add such a test when we can. After all, it's the stated goal of this patch. :)

You're right. I have a separate patch for that, that I could share next week. This patch only adds the support to allocate the SVE area using ADDVL.
(The compiler currently guards against accessing any stack objects in the presence of an SVE area using an assert).

My previous patch didn't have any context, so I've added it now (git format-patch -U999999).

This LGTM but I think someone else should probably sign off on it as well.

Gentle ping. Now that D65653 has been committed, I think this patch is ready for review again.

(I also have some further patches prepared that follow this one; one that implements sp/fp accesses to SVE objects, and another one that implements the saving/restoring of SVE callee-save registers. If it is appreciated, I can post those on Phabricator for context)

greened added inline comments. Sep 3 2019, 7:11 AM

include/llvm/CodeGen/TargetFrameLowering.h
28 ↗	(On Diff #213058)	Why was the formatting changed here? It's easier for me to read with the newlines.

Updated formatting of TargetStackID enum.

sdesmalen marked an inline comment as done. Sep 19 2019, 12:18 AM

sdesmalen added a child revision: D67749: [AArch64] Stackframe accesses to SVE objects. Sep 19 2019, 12:32 AM

Ping.

@eli.friedman are you happy with this patch now that the SVE region is moved between callee-saves and locals as you suggested in https://reviews.llvm.org/D61437#1490588 ?

I think you've basically landed all the significant pieces of this already? But sure, LGTM.

This revision is now accepted and ready to land. Sep 30 2019, 6:30 PM

Closed by commit rL373585: [AArch64] Static (de)allocation of SVE stack objects. (authored by s.desmalen). Oct 3 2019, 4:34 AM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. Oct 3 2019, 4:34 AM

Revision Contents

Path                                                          Size
llvm/trunk/include/llvm/CodeGen/MIRYamlMapping.h              1 line
llvm/trunk/include/llvm/CodeGen/TargetFrameLowering.h         1 line
llvm/trunk/lib/Target/AArch64/AArch64FrameLowering.h          11 lines
llvm/trunk/lib/Target/AArch64/AArch64FrameLowering.cpp        77 lines
llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.cpp            31 lines
llvm/trunk/lib/Target/AArch64/AArch64MachineFunctionInfo.h    16 lines
llvm/trunk/lib/Target/AArch64/AArch64StackOffset.h            49 lines
llvm/trunk/lib/Target/AMDGPU/SIFrameLowering.cpp              2 lines
llvm/trunk/test/CodeGen/AArch64/framelayout-sve.mir           121 lines
llvm/trunk/unittests/Target/AArch64/TestStackOffset.cpp       75 lines

Diff 222986

llvm/trunk/include/llvm/CodeGen/MIRYamlMapping.h

(first 308 lines not shown; context: struct ScalarEnumerationTraits<FixedMachineStackObject::ObjectType>)
   }
 };

 template <>
 struct ScalarEnumerationTraits<TargetStackID::Value> {
   static void enumeration(yaml::IO &IO, TargetStackID::Value &ID) {
     IO.enumCase(ID, "default", TargetStackID::Default);
     IO.enumCase(ID, "sgpr-spill", TargetStackID::SGPRSpill);
+    IO.enumCase(ID, "sve-vec", TargetStackID::SVEVector);
     IO.enumCase(ID, "noalloc", TargetStackID::NoAlloc);
   }
 };

 template <> struct MappingTraits<FixedMachineStackObject> {
   static void mapping(yaml::IO &YamlIO, FixedMachineStackObject &Object) {
     YamlIO.mapRequired("id", Object.ID);
     YamlIO.mapOptional(
(remaining 310 lines not shown)

llvm/trunk/include/llvm/CodeGen/TargetFrameLowering.h

[... 22 lines not shown ...]	namespace llvm {
class CalleeSavedInfo;		class CalleeSavedInfo;
class MachineFunction;		class MachineFunction;
class RegScavenger;		class RegScavenger;

namespace TargetStackID {		namespace TargetStackID {
enum Value {		enum Value {
Default = 0,		Default = 0,
SGPRSpill = 1,		SGPRSpill = 1,
		SVEVector = 2,
NoAlloc = 255		NoAlloc = 255
};		};
}		}

/// Information about stack frame layout on the target. It holds the direction		/// Information about stack frame layout on the target. It holds the direction
/// of stack growth, the known stack alignment on entry to each function, and		/// of stack growth, the known stack alignment on entry to each function, and
/// the offset to the locals area.		/// the offset to the locals area.
///		///
[... 359 lines not shown ...]

llvm/trunk/lib/Target/AArch64/AArch64FrameLowering.h

[... 81 lines not shown ...]	public:

int getFrameIndexReferencePreferSP(const MachineFunction &MF, int FI,		int getFrameIndexReferencePreferSP(const MachineFunction &MF, int FI,
unsigned &FrameReg,		unsigned &FrameReg,
bool IgnoreSPUpdates) const override;		bool IgnoreSPUpdates) const override;
int getNonLocalFrameIndexReference(const MachineFunction &MF,		int getNonLocalFrameIndexReference(const MachineFunction &MF,
int FI) const override;		int FI) const override;
int getSEHFrameIndexOffset(const MachineFunction &MF, int FI) const;		int getSEHFrameIndexOffset(const MachineFunction &MF, int FI) const;

		bool isSupportedStackID(TargetStackID::Value ID) const override {
		switch (ID) {
		default:
		return false;
		case TargetStackID::Default:
		case TargetStackID::SVEVector:
		case TargetStackID::NoAlloc:
		return true;
		}
		}

private:		private:
bool shouldCombineCSRLocalStackBump(MachineFunction &MF,		bool shouldCombineCSRLocalStackBump(MachineFunction &MF,
unsigned StackBumpBytes) const;		unsigned StackBumpBytes) const;
};		};

} // End llvm namespace		} // End llvm namespace

#endif		#endif

llvm/trunk/lib/Target/AArch64/AArch64FrameLowering.cpp

[... 49 lines not shown ...]
// |                                   | (frame record first)		// |                                   | (frame record first)
// | prev_fp, prev_lr                  | <--'		// | prev_fp, prev_lr                  | <--'
// | (a.k.a. "frame record")           |		// | (a.k.a. "frame record")           |
// |-----------------------------------| <- fp(=x29)		// |-----------------------------------| <- fp(=x29)
// |                                   |		// |                                   |
// | callee-saved fp/simd/SVE regs     |		// | callee-saved fp/simd/SVE regs     |
// |                                   |		// |                                   |
// |-----------------------------------|		// |-----------------------------------|
		// |                                   |
		// |        SVE stack objects          |
		// |                                   |
		// |-----------------------------------|
// |.empty.space.to.make.part.below....|		// |.empty.space.to.make.part.below....|
// |.aligned.in.case.it.needs.more.than| (size of this area is unknown at		// |.aligned.in.case.it.needs.more.than| (size of this area is unknown at
// |.the.standard.16-byte.alignment....|  compile time; if present)		// |.the.standard.16-byte.alignment....|  compile time; if present)
// |-----------------------------------|		// |-----------------------------------|
// |                                   |		// |                                   |
// | local variables of fixed size     |		// | local variables of fixed size     |
// | including spill slots             |		// | including spill slots             |
// |-----------------------------------| <- bp(not defined by ABI,		// |-----------------------------------| <- bp(not defined by ABI,
[... 131 lines not shown ...]	for (MachineInstr &MI : MBB) {
AArch64FrameOffsetCannotUpdate)		AArch64FrameOffsetCannotUpdate)
return 0;		return 0;
}		}
}		}
}		}
return DefaultSafeSPDisplacement;		return DefaultSafeSPDisplacement;
}		}

		/// Returns the size of the entire SVE stackframe (calleesaves + spills).
		static StackOffset getSVEStackSize(const MachineFunction &MF) {
		const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
		return {(int64_t)AFI->getStackSizeSVE(), MVT::nxv1i8};
		}

bool AArch64FrameLowering::canUseRedZone(const MachineFunction &MF) const {		bool AArch64FrameLowering::canUseRedZone(const MachineFunction &MF) const {
if (!EnableRedZone)		if (!EnableRedZone)
return false;		return false;
// Don't use the red zone if the function explicitly asks us not to.		// Don't use the red zone if the function explicitly asks us not to.
// This is typically used for kernel code.		// This is typically used for kernel code.
if (MF.getFunction().hasFnAttribute(Attribute::NoRedZone))		if (MF.getFunction().hasFnAttribute(Attribute::NoRedZone))
return false;		return false;

const MachineFrameInfo &MFI = MF.getFrameInfo();		const MachineFrameInfo &MFI = MF.getFrameInfo();
const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();		const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
unsigned NumBytes = AFI->getLocalStackSize();		unsigned NumBytes = AFI->getLocalStackSize();

return !(MFI.hasCalls() || hasFP(MF) || NumBytes > 128);		return !(MFI.hasCalls() || hasFP(MF) || NumBytes > 128 ||
		getSVEStackSize(MF));
}		}

/// hasFP - Return true if the specified function should have a dedicated frame		/// hasFP - Return true if the specified function should have a dedicated frame
/// pointer register.		/// pointer register.
bool AArch64FrameLowering::hasFP(const MachineFunction &MF) const {		bool AArch64FrameLowering::hasFP(const MachineFunction &MF) const {
const MachineFrameInfo &MFI = MF.getFrameInfo();		const MachineFrameInfo &MFI = MF.getFrameInfo();
const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();		const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
// Win64 EH requires a frame pointer if funclets are present, as the locals		// Win64 EH requires a frame pointer if funclets are present, as the locals
[... 225 lines not shown ...]	if (RegInfo->needsStackRealignment(MF))
return false;		return false;

// This isn't strictly necessary, but it simplifies things a bit since the		// This isn't strictly necessary, but it simplifies things a bit since the
// current RedZone handling code assumes the SP is adjusted by the		// current RedZone handling code assumes the SP is adjusted by the
// callee-save save/restore code.		// callee-save save/restore code.
if (canUseRedZone(MF))		if (canUseRedZone(MF))
return false;		return false;

		// When there is an SVE area on the stack, always allocate the
		// callee-saves and spills/locals separately.
		if (getSVEStackSize(MF))
		return false;

return true;		return true;
}		}

// Given a load or a store instruction, generate an appropriate unwinding SEH		// Given a load or a store instruction, generate an appropriate unwinding SEH
// code on Windows.		// code on Windows.
static MachineBasicBlock::iterator InsertSEH(MachineBasicBlock::iterator MBBI,		static MachineBasicBlock::iterator InsertSEH(MachineBasicBlock::iterator MBBI,
const TargetInstrInfo &TII,		const TargetInstrInfo &TII,
MachineInstr::MIFlag Flag) {		MachineInstr::MIFlag Flag) {
[... 398 lines not shown ...]	void AArch64FrameLowering::emitPrologue(MachineFunction &MF,
// prologue/epilogue.		// prologue/epilogue.
if (MF.getFunction().getCallingConv() == CallingConv::GHC)		if (MF.getFunction().getCallingConv() == CallingConv::GHC)
return;		return;

// Set tagged base pointer to the bottom of the stack frame.		// Set tagged base pointer to the bottom of the stack frame.
// Ideally it should match SP value after prologue.		// Ideally it should match SP value after prologue.
AFI->setTaggedBasePointerOffset(MFI.getStackSize());		AFI->setTaggedBasePointerOffset(MFI.getStackSize());

		const StackOffset &SVEStackSize = getSVEStackSize(MF);

// getStackSize() includes all the locals in its size calculation. We don't		// getStackSize() includes all the locals in its size calculation. We don't
// include these locals when computing the stack size of a funclet, as they		// include these locals when computing the stack size of a funclet, as they
// are allocated in the parent's stack frame and accessed via the frame		// are allocated in the parent's stack frame and accessed via the frame
// pointer from the funclet. We only save the callee saved registers in the		// pointer from the funclet. We only save the callee saved registers in the
// funclet, which are really the callee saved registers of the parent		// funclet, which are really the callee saved registers of the parent
// function, including the funclet.		// function, including the funclet.
int NumBytes = IsFunclet ? (int)getWinEHFuncletFrameSize(MF)		int NumBytes = IsFunclet ? (int)getWinEHFuncletFrameSize(MF)
: (int)MFI.getStackSize();		: (int)MFI.getStackSize();
if (!AFI->hasStackFrame() && !windowsRequiresStackProbe(MF, NumBytes)) {		if (!AFI->hasStackFrame() && !windowsRequiresStackProbe(MF, NumBytes)) {
assert(!HasFP && "unexpected function without stack frame but with FP");		assert(!HasFP && "unexpected function without stack frame but with FP");
		assert(!SVEStackSize &&
		"unexpected function without stack frame but with SVE objects");
// All of the stack allocation is for locals.		// All of the stack allocation is for locals.
AFI->setLocalStackSize(NumBytes);		AFI->setLocalStackSize(NumBytes);
if (!NumBytes)		if (!NumBytes)
return;		return;
// REDZONE: If the stack size is less than 128 bytes, we don't need		// REDZONE: If the stack size is less than 128 bytes, we don't need
// to actually allocate.		// to actually allocate.
if (canUseRedZone(MF)) {		if (canUseRedZone(MF)) {
AFI->setHasRedZone(true);		AFI->setHasRedZone(true);
[... 30 lines not shown ...]	void AArch64FrameLowering::emitPrologue(MachineFunction &MF,
unsigned FixedObject = (IsWin64 && !IsFunclet) ?		unsigned FixedObject = (IsWin64 && !IsFunclet) ?
alignTo(AFI->getVarArgsGPRSize(), 16) : 0;		alignTo(AFI->getVarArgsGPRSize(), 16) : 0;

auto PrologueSaveSize = AFI->getCalleeSavedStackSize() + FixedObject;		auto PrologueSaveSize = AFI->getCalleeSavedStackSize() + FixedObject;
// All of the remaining stack allocations are for locals.		// All of the remaining stack allocations are for locals.
AFI->setLocalStackSize(NumBytes - PrologueSaveSize);		AFI->setLocalStackSize(NumBytes - PrologueSaveSize);
bool CombineSPBump = shouldCombineCSRLocalStackBump(MF, NumBytes);		bool CombineSPBump = shouldCombineCSRLocalStackBump(MF, NumBytes);
if (CombineSPBump) {		if (CombineSPBump) {
		assert(!SVEStackSize && "Cannot combine SP bump with SVE");
emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,		emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
{-NumBytes, MVT::i8}, TII, MachineInstr::FrameSetup, false,		{-NumBytes, MVT::i8}, TII, MachineInstr::FrameSetup, false,
NeedsWinCFI, &HasWinCFI);		NeedsWinCFI, &HasWinCFI);
NumBytes = 0;		NumBytes = 0;
} else if (PrologueSaveSize != 0) {		} else if (PrologueSaveSize != 0) {
MBBI = convertCalleeSaveRestoreToSPPrePostIncDec(		MBBI = convertCalleeSaveRestoreToSPPrePostIncDec(
MBB, MBBI, DL, TII, -PrologueSaveSize, NeedsWinCFI, &HasWinCFI);		MBB, MBBI, DL, TII, -PrologueSaveSize, NeedsWinCFI, &HasWinCFI);
NumBytes -= PrologueSaveSize;		NumBytes -= PrologueSaveSize;
[... 141 lines not shown ...]	if (NeedsWinCFI) {
HasWinCFI = true;		HasWinCFI = true;
BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_StackAlloc))		BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_StackAlloc))
.addImm(NumBytes)		.addImm(NumBytes)
.setMIFlag(MachineInstr::FrameSetup);		.setMIFlag(MachineInstr::FrameSetup);
}		}
NumBytes = 0;		NumBytes = 0;
}		}

		emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP, -SVEStackSize, TII,
		MachineInstr::FrameSetup);

// Allocate space for the rest of the frame.		// Allocate space for the rest of the frame.
if (NumBytes) {		if (NumBytes) {
const bool NeedsRealignment = RegInfo->needsStackRealignment(MF);		const bool NeedsRealignment = RegInfo->needsStackRealignment(MF);
unsigned scratchSPReg = AArch64::SP;		unsigned scratchSPReg = AArch64::SP;

if (NeedsRealignment) {		if (NeedsRealignment) {
scratchSPReg = findScratchNonCalleeSaveRegister(&MBB);		scratchSPReg = findScratchNonCalleeSaveRegister(&MBB);
assert(scratchSPReg != AArch64::NoRegister);		assert(scratchSPReg != AArch64::NoRegister);
[... 332 lines not shown ...]	void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
}		}

if (NeedsWinCFI) {		if (NeedsWinCFI) {
HasWinCFI = true;		HasWinCFI = true;
BuildMI(MBB, LastPopI, DL, TII->get(AArch64::SEH_EpilogStart))		BuildMI(MBB, LastPopI, DL, TII->get(AArch64::SEH_EpilogStart))
.setMIFlag(MachineInstr::FrameDestroy);		.setMIFlag(MachineInstr::FrameDestroy);
}		}

		const StackOffset &SVEStackSize = getSVEStackSize(MF);

// If there is a single SP update, insert it before the ret and we're done.		// If there is a single SP update, insert it before the ret and we're done.
if (CombineSPBump) {		if (CombineSPBump) {
		assert(!SVEStackSize && "Cannot combine SP bump with SVE");
emitFrameOffset(MBB, MBB.getFirstTerminator(), DL, AArch64::SP, AArch64::SP,		emitFrameOffset(MBB, MBB.getFirstTerminator(), DL, AArch64::SP, AArch64::SP,
{NumBytes + (int64_t)AfterCSRPopSize, MVT::i8}, TII,		{NumBytes + (int64_t)AfterCSRPopSize, MVT::i8}, TII,
MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);		MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
if (NeedsWinCFI && HasWinCFI)		if (NeedsWinCFI && HasWinCFI)
BuildMI(MBB, MBB.getFirstTerminator(), DL,		BuildMI(MBB, MBB.getFirstTerminator(), DL,
TII->get(AArch64::SEH_EpilogEnd))		TII->get(AArch64::SEH_EpilogEnd))
.setMIFlag(MachineInstr::FrameDestroy);		.setMIFlag(MachineInstr::FrameDestroy);
return;		return;
}		}

NumBytes -= PrologueSaveSize;		NumBytes -= PrologueSaveSize;
assert(NumBytes >= 0 && "Negative stack allocation size!?");		assert(NumBytes >= 0 && "Negative stack allocation size!?");

		// Deallocate the SVE area.
		if (SVEStackSize)
		if (!AFI->isStackRealigned())
		emitFrameOffset(MBB, LastPopI, DL, AArch64::SP, AArch64::SP, SVEStackSize,
		TII, MachineInstr::FrameDestroy);

if (!hasFP(MF)) {		if (!hasFP(MF)) {
bool RedZone = canUseRedZone(MF);		bool RedZone = canUseRedZone(MF);
// If this was a redzone leaf function, we don't need to restore the		// If this was a redzone leaf function, we don't need to restore the
// stack pointer (but we may need to pop stack args for fastcc).		// stack pointer (but we may need to pop stack args for fastcc).
if (RedZone && AfterCSRPopSize == 0)		if (RedZone && AfterCSRPopSize == 0)
return;		return;

bool NoCalleeSaveRestore = PrologueSaveSize == 0;		bool NoCalleeSaveRestore = PrologueSaveSize == 0;
[... 133 lines not shown ...]	StackOffset AArch64FrameLowering::resolveFrameOffsetReference(
const auto *AFI = MF.getInfo<AArch64FunctionInfo>();		const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();		const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();

int FPOffset = getFPOffset(MF, ObjectOffset).getBytes();		int FPOffset = getFPOffset(MF, ObjectOffset).getBytes();
int Offset = getStackOffset(MF, ObjectOffset).getBytes();		int Offset = getStackOffset(MF, ObjectOffset).getBytes();
bool isCSR =		bool isCSR =
!isFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize());		!isFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize());

		const StackOffset &SVEStackSize = getSVEStackSize(MF);
		if (SVEStackSize)
		llvm_unreachable("Accessing frame indices in presence of SVE "
		"not yet supported");

// Use frame pointer to reference fixed objects. Use it for locals if		// Use frame pointer to reference fixed objects. Use it for locals if
// there are VLAs or a dynamically realigned SP (and thus the SP isn't		// there are VLAs or a dynamically realigned SP (and thus the SP isn't
// reliable as a base). Make sure useFPForScavengingIndex() does the		// reliable as a base). Make sure useFPForScavengingIndex() does the
// right thing for the emergency spill slot.		// right thing for the emergency spill slot.
bool UseFP = false;		bool UseFP = false;
if (AFI->hasStackFrame()) {		if (AFI->hasStackFrame()) {
// Note: Keeping the following as multiple 'if' statements rather than		// Note: Keeping the following as multiple 'if' statements rather than
// merging to a single expression for readability.		// merging to a single expression for readability.
[... 564 lines not shown ...]	void AArch64FrameLowering::determineCalleeSaves(MachineFunction &MF,
}		}

LLVM_DEBUG(dbgs() << "*** determineCalleeSaves\nSaved CSRs:";		LLVM_DEBUG(dbgs() << "*** determineCalleeSaves\nSaved CSRs:";
for (unsigned Reg		for (unsigned Reg
: SavedRegs.set_bits()) dbgs()		: SavedRegs.set_bits()) dbgs()
<< ' ' << printReg(Reg, RegInfo);		<< ' ' << printReg(Reg, RegInfo);
dbgs() << "\n";);		dbgs() << "\n";);

		bool HasSVEStackObjects = [&MFI]() {
		for (int I = MFI.getObjectIndexBegin(); I != 0; ++I)
		if (MFI.getStackID(I) == TargetStackID::SVEVector &&
		MFI.getObjectOffset(I) < 0)
		return true;
		// Note: We don't take allocatable stack objects into
		// account yet, because allocation for those is not yet
		// implemented.
		return false;
		}();

// If any callee-saved registers are used, the frame cannot be eliminated.		// If any callee-saved registers are used, the frame cannot be eliminated.
bool CanEliminateFrame = SavedRegs.count() == 0;		bool CanEliminateFrame = (SavedRegs.count() == 0) && !HasSVEStackObjects;

// The CSR spill slots have not been allocated yet, so estimateStackSize		// The CSR spill slots have not been allocated yet, so estimateStackSize
// won't include them.		// won't include them.
unsigned EstimatedStackSizeLimit = estimateRSStackSizeLimit(MF);		unsigned EstimatedStackSizeLimit = estimateRSStackSizeLimit(MF);
bool BigStack = (EstimatedStackSize + CSStackSize) > EstimatedStackSizeLimit;		bool BigStack = (EstimatedStackSize + CSStackSize) > EstimatedStackSizeLimit;
if (BigStack || !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF))		if (BigStack || !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF))
AFI->setHasStackFrame(true);		AFI->setHasStackFrame(true);

[... 46 lines not shown ...]
bool AArch64FrameLowering::enableStackSlotScavenging(		bool AArch64FrameLowering::enableStackSlotScavenging(
const MachineFunction &MF) const {		const MachineFunction &MF) const {
const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();		const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
return AFI->hasCalleeSaveStackFreeSpace();		return AFI->hasCalleeSaveStackFreeSpace();
}		}

void AArch64FrameLowering::processFunctionBeforeFrameFinalized(		void AArch64FrameLowering::processFunctionBeforeFrameFinalized(
MachineFunction &MF, RegScavenger *RS) const {		MachineFunction &MF, RegScavenger *RS) const {
		MachineFrameInfo &MFI = MF.getFrameInfo();

		assert(getStackGrowthDirection() == TargetFrameLowering::StackGrowsDown &&
		"Upwards growing stack unsupported");

		// Process all fixed stack SVE objects.
		int64_t Offset = 0;
		for (int I = MFI.getObjectIndexBegin(); I != 0; ++I) {
		unsigned StackID = MFI.getStackID(I);
		if (StackID == TargetStackID::SVEVector) {
		int64_t FixedOffset = -MFI.getObjectOffset(I);
		if (FixedOffset > Offset)
		Offset = FixedOffset;
		}
		}

		unsigned MaxAlign = getStackAlignment();
		uint64_t SVEStackSize = alignTo(Offset, MaxAlign);

		AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
		AFI->setStackSizeSVE(SVEStackSize);
		assert(MaxAlign <= 16 && "Cannot align scalable vectors more than 16 bytes");

// If this function isn't doing Win64-style C++ EH, we don't need to do		// If this function isn't doing Win64-style C++ EH, we don't need to do
// anything.		// anything.
if (!MF.hasEHFunclets())		if (!MF.hasEHFunclets())
return;		return;
const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();		const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
MachineFrameInfo &MFI = MF.getFrameInfo();
WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();		WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();

MachineBasicBlock &MBB = MF.front();		MachineBasicBlock &MBB = MF.front();
auto MBBI = MBB.begin();		auto MBBI = MBB.begin();
while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup))		while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup))
++MBBI;		++MBBI;

// Create an UnwindHelp object.		// Create an UnwindHelp object.
[... 48 lines not shown ...]
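
The size computation added to processFunctionBeforeFrameFinalized above can be read in isolation as "take the deepest fixed SVE object and round up to the stack alignment". The following is a minimal standalone sketch of that step, not the LLVM code itself: `FixedObject` and `computeSVEStackSize` are hypothetical names introduced for illustration, and the result is in scalable bytes, as in the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a fixed stack object as seen through
// MachineFrameInfo: only the stack ID and the (negative) offset matter here.
struct FixedObject {
  bool IsSVEVector; // true when the object's stack ID is SVEVector
  int64_t Offset;   // offset of the object; negative for a downward-growing stack
};

// Round Value up to the next multiple of Align (Align must be non-zero).
inline uint64_t alignUp(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) / Align * Align;
}

// Sketch of the loop in the patch: the SVE area must reach at least as far
// down as the lowest fixed SVE object, rounded up to the stack alignment.
inline uint64_t computeSVEStackSize(const std::vector<FixedObject> &Objects,
                                    uint64_t StackAlign = 16) {
  int64_t Offset = 0;
  for (const FixedObject &O : Objects)
    if (O.IsSVEVector)
      Offset = std::max(Offset, -O.Offset);
  return alignUp(static_cast<uint64_t>(Offset), StackAlign);
}
```

With SVE objects at offsets -16 and -18 and a default-stack object at -64, the sketch yields 32 scalable bytes; objects with other stack IDs do not contribute.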

llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.cpp

[... 3,040 lines not shown ...]	static void emitFrameOffsetAdj(MachineBasicBlock &MBB,
switch (Opc) {		switch (Opc) {
case AArch64::ADDXri:		case AArch64::ADDXri:
case AArch64::ADDSXri:		case AArch64::ADDSXri:
case AArch64::SUBXri:		case AArch64::SUBXri:
case AArch64::SUBSXri:		case AArch64::SUBSXri:
MaxEncoding = 0xfff;		MaxEncoding = 0xfff;
ShiftSize = 12;		ShiftSize = 12;
break;		break;
		case AArch64::ADDVL_XXI:
		case AArch64::ADDPL_XXI:
		MaxEncoding = 31;
		ShiftSize = 0;
		if (Offset < 0) {
		MaxEncoding = 32;
		Sign = -1;
		Offset = -Offset;
		}
		break;
default:		default:
llvm_unreachable("Unsupported opcode");		llvm_unreachable("Unsupported opcode");
}		}

// FIXME: If the offset won't fit in 24-bits, compute the offset into a		// FIXME: If the offset won't fit in 24-bits, compute the offset into a
// scratch register. If DestReg is a virtual register, use it as the		// scratch register. If DestReg is a virtual register, use it as the
// scratch register; otherwise, create a new virtual register (to be		// scratch register; otherwise, create a new virtual register (to be
// replaced by the scavenger at the end of PEI). That case can be optimized		// replaced by the scavenger at the end of PEI). That case can be optimized
[... 55 lines not shown ...]
}		}

void llvm::emitFrameOffset(MachineBasicBlock &MBB,		void llvm::emitFrameOffset(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MBBI, const DebugLoc &DL,		MachineBasicBlock::iterator MBBI, const DebugLoc &DL,
unsigned DestReg, unsigned SrcReg,		unsigned DestReg, unsigned SrcReg,
StackOffset Offset, const TargetInstrInfo *TII,		StackOffset Offset, const TargetInstrInfo *TII,
MachineInstr::MIFlag Flag, bool SetNZCV,		MachineInstr::MIFlag Flag, bool SetNZCV,
bool NeedsWinCFI, bool *HasWinCFI) {		bool NeedsWinCFI, bool *HasWinCFI) {
int64_t Bytes;		int64_t Bytes, NumPredicateVectors, NumDataVectors;
Offset.getForFrameOffset(Bytes);		Offset.getForFrameOffset(Bytes, NumPredicateVectors, NumDataVectors);

// First emit non-scalable frame offsets, or a simple 'mov'.		// First emit non-scalable frame offsets, or a simple 'mov'.
if (Bytes || (!Offset && SrcReg != DestReg)) {		if (Bytes || (!Offset && SrcReg != DestReg)) {
assert((DestReg != AArch64::SP || Bytes % 16 == 0) &&		assert((DestReg != AArch64::SP || Bytes % 16 == 0) &&
"SP increment/decrement not 16-byte aligned");		"SP increment/decrement not 16-byte aligned");
unsigned Opc = SetNZCV ? AArch64::ADDSXri : AArch64::ADDXri;		unsigned Opc = SetNZCV ? AArch64::ADDSXri : AArch64::ADDXri;
if (Bytes < 0) {		if (Bytes < 0) {
Bytes = -Bytes;		Bytes = -Bytes;
Opc = SetNZCV ? AArch64::SUBSXri : AArch64::SUBXri;		Opc = SetNZCV ? AArch64::SUBSXri : AArch64::SUBXri;
}		}
emitFrameOffsetAdj(MBB, MBBI, DL, DestReg, SrcReg, Bytes, Opc, TII, Flag,		emitFrameOffsetAdj(MBB, MBBI, DL, DestReg, SrcReg, Bytes, Opc, TII, Flag,
NeedsWinCFI, HasWinCFI);		NeedsWinCFI, HasWinCFI);
SrcReg = DestReg;		SrcReg = DestReg;
}		}

		assert(!(SetNZCV && (NumPredicateVectors || NumDataVectors)) &&
		"SetNZCV not supported with SVE vectors");
		assert(!(NeedsWinCFI && (NumPredicateVectors || NumDataVectors)) &&
		"WinCFI not supported with SVE vectors");

		if (NumDataVectors) {
		emitFrameOffsetAdj(MBB, MBBI, DL, DestReg, SrcReg, NumDataVectors,
		AArch64::ADDVL_XXI, TII, Flag, NeedsWinCFI, nullptr);
		SrcReg = DestReg;
		}

		if (NumPredicateVectors) {
		assert(DestReg != AArch64::SP && "Unaligned access to SP");
		emitFrameOffsetAdj(MBB, MBBI, DL, DestReg, SrcReg, NumPredicateVectors,
		AArch64::ADDPL_XXI, TII, Flag, NeedsWinCFI, nullptr);
		}
}		}

MachineInstr *AArch64InstrInfo::foldMemoryOperandImpl(		MachineInstr *AArch64InstrInfo::foldMemoryOperandImpl(
MachineFunction &MF, MachineInstr &MI, ArrayRef<unsigned> Ops,		MachineFunction &MF, MachineInstr &MI, ArrayRef<unsigned> Ops,
MachineBasicBlock::iterator InsertPt, int FrameIndex,		MachineBasicBlock::iterator InsertPt, int FrameIndex,
LiveIntervals *LIS, VirtRegMap *VRM) const {		LiveIntervals *LIS, VirtRegMap *VRM) const {
// This is a bit of a hack. Consider this instruction:		// This is a bit of a hack. Consider this instruction:
//		//
[... 2,499 lines not shown ...]
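
The new ADDVL_XXI/ADDPL_XXI case in emitFrameOffsetAdj sets MaxEncoding to 31 for positive offsets and 32 for negated ones, which matches a signed immediate range of [-32, 31]. The loop that consumes these values is not visible in this diff, so the sketch below is an assumption about how a larger vector count would be split into several encodable immediates; `splitVLImmediates` is a hypothetical helper, not an LLVM function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Split a count of SVE vectors (or predicates) into immediates that each fit
// the [-32, 31] range, mirroring the MaxEncoding/Sign setup in the patch.
// Assumption: each chunk would become one ADDVL (or ADDPL) instruction.
inline std::vector<int64_t> splitVLImmediates(int64_t Offset) {
  int64_t MaxEncoding = 31;
  int64_t Sign = 1;
  if (Offset < 0) {
    MaxEncoding = 32; // the negative side of the range has one extra value
    Sign = -1;
    Offset = -Offset;
  }

  std::vector<int64_t> Immediates;
  while (Offset > 0) {
    int64_t ThisVal = std::min(Offset, MaxEncoding);
    Immediates.push_back(Sign * ThisVal);
    Offset -= ThisVal;
  }
  return Immediates;
}
```

Under these assumptions, an adjustment of 40 vectors splits into 31 and 9, while -40 splits into -32 and -8.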

llvm/trunk/lib/Target/AArch64/AArch64MachineFunctionInfo.h

[... 89 lines not shown ...]	class AArch64FunctionInfo final : public MachineFunctionInfo {
/// True when the callee-save stack area has unused gaps that may be used for		/// True when the callee-save stack area has unused gaps that may be used for
/// other stack allocations.		/// other stack allocations.
bool CalleeSaveStackHasFreeSpace = false;		bool CalleeSaveStackHasFreeSpace = false;

/// SRetReturnReg - sret lowering includes returning the value of the		/// SRetReturnReg - sret lowering includes returning the value of the
/// returned struct in a register. This field holds the virtual register into		/// returned struct in a register. This field holds the virtual register into
/// which the sret argument is passed.		/// which the sret argument is passed.
unsigned SRetReturnReg = 0;		unsigned SRetReturnReg = 0;
		/// SVE stack size (for predicates and data vectors) are maintained here
		/// rather than in FrameInfo, as the placement and Stack IDs are target
		/// specific.
		uint64_t StackSizeSVE = 0;

		/// HasCalculatedStackSizeSVE indicates whether StackSizeSVE is valid.
		bool HasCalculatedStackSizeSVE = false;

/// Has a value when it is known whether or not the function uses a		/// Has a value when it is known whether or not the function uses a
/// redzone, and no value otherwise.		/// redzone, and no value otherwise.
/// Initialized during frame lowering, unless the function has the noredzone		/// Initialized during frame lowering, unless the function has the noredzone
/// attribute, in which case it is set to false at construction.		/// attribute, in which case it is set to false at construction.
Optional<bool> HasRedZone;		Optional<bool> HasRedZone;

/// ForwardedMustTailRegParms - A list of virtual and physical registers		/// ForwardedMustTailRegParms - A list of virtual and physical registers
[... 20 lines not shown ...]	public:
unsigned getBytesInStackArgArea() const { return BytesInStackArgArea; }		unsigned getBytesInStackArgArea() const { return BytesInStackArgArea; }
void setBytesInStackArgArea(unsigned bytes) { BytesInStackArgArea = bytes; }		void setBytesInStackArgArea(unsigned bytes) { BytesInStackArgArea = bytes; }

unsigned getArgumentStackToRestore() const { return ArgumentStackToRestore; }		unsigned getArgumentStackToRestore() const { return ArgumentStackToRestore; }
void setArgumentStackToRestore(unsigned bytes) {		void setArgumentStackToRestore(unsigned bytes) {
ArgumentStackToRestore = bytes;		ArgumentStackToRestore = bytes;
}		}

		bool hasCalculatedStackSizeSVE() const { return HasCalculatedStackSizeSVE; }

		void setStackSizeSVE(uint64_t S) {
		HasCalculatedStackSizeSVE = true;
		StackSizeSVE = S;
		}

		uint64_t getStackSizeSVE() const { return StackSizeSVE; }

bool hasStackFrame() const { return HasStackFrame; }		bool hasStackFrame() const { return HasStackFrame; }
void setHasStackFrame(bool s) { HasStackFrame = s; }		void setHasStackFrame(bool s) { HasStackFrame = s; }

bool isStackRealigned() const { return StackRealigned; }		bool isStackRealigned() const { return StackRealigned; }
void setStackRealigned(bool s) { StackRealigned = s; }		void setStackRealigned(bool s) { StackRealigned = s; }

bool hasCalleeSaveStackFreeSpace() const {		bool hasCalleeSaveStackFreeSpace() const {
return CalleeSaveStackHasFreeSpace;		return CalleeSaveStackHasFreeSpace;
[... 109 lines not shown ...]

llvm/trunk/lib/Target/AArch64/AArch64StackOffset.h

	[... 29 lines not shown ...]
	/// offsets, e.g.			/// offsets, e.g.
	//			//
	/// StackOffset(1, MVT::nxv16i8) + StackOffset(1, MVT::i64)			/// StackOffset(1, MVT::nxv16i8) + StackOffset(1, MVT::i64)
	//			//
	/// describes an offset that spans the combined storage required for an SVE			/// describes an offset that spans the combined storage required for an SVE
	/// vector and a 64bit GPR.			/// vector and a 64bit GPR.
	class StackOffset {			class StackOffset {
	int64_t Bytes;			int64_t Bytes;
				int64_t ScalableBytes;

	explicit operator int() const;			explicit operator int() const;

	public:			public:
	using Part = std::pair<int64_t, MVT>;			using Part = std::pair<int64_t, MVT>;

	StackOffset() : Bytes(0) {}			StackOffset() : Bytes(0), ScalableBytes(0) {}

	StackOffset(int64_t Offset, MVT::SimpleValueType T) : StackOffset() {			StackOffset(int64_t Offset, MVT::SimpleValueType T) : StackOffset() {
	assert(!MVT(T).isScalableVector() && "Scalable types not supported");			assert(MVT(T).getSizeInBits() % 8 == 0 &&
				"Offset type is not a multiple of bytes");
	*this += Part(Offset, T);			*this += Part(Offset, T);
	}			}

	StackOffset(const StackOffset &Other) : Bytes(Other.Bytes) {}			StackOffset(const StackOffset &Other)
				: Bytes(Other.Bytes), ScalableBytes(Other.ScalableBytes) {}

	StackOffset &operator=(const StackOffset &) = default;			StackOffset &operator=(const StackOffset &) = default;

	StackOffset &operator+=(const StackOffset::Part &Other) {			StackOffset &operator+=(const StackOffset::Part &Other) {
	assert(Other.second.getSizeInBits() % 8 == 0 &&			int64_t OffsetInBytes = Other.first * (Other.second.getSizeInBits() / 8);
	"Offset type is not a multiple of bytes");			if (Other.second.isScalableVector())
	Bytes += Other.first * (Other.second.getSizeInBits() / 8);			ScalableBytes += OffsetInBytes;
				else
				Bytes += OffsetInBytes;
	return *this;			return *this;
	}			}

	StackOffset &operator+=(const StackOffset &Other) {			StackOffset &operator+=(const StackOffset &Other) {
	Bytes += Other.Bytes;			Bytes += Other.Bytes;
				ScalableBytes += Other.ScalableBytes;
	return *this;			return *this;
	}			}

	StackOffset operator+(const StackOffset &Other) const {			StackOffset operator+(const StackOffset &Other) const {
	StackOffset Res(*this);			StackOffset Res(*this);
	Res += Other;			Res += Other;
	return Res;			return Res;
	}			}

	StackOffset &operator-=(const StackOffset &Other) {			StackOffset &operator-=(const StackOffset &Other) {
	Bytes -= Other.Bytes;			Bytes -= Other.Bytes;
				ScalableBytes -= Other.ScalableBytes;
	return *this;			return *this;
	}			}

	StackOffset operator-(const StackOffset &Other) const {			StackOffset operator-(const StackOffset &Other) const {
	StackOffset Res(*this);			StackOffset Res(*this);
	Res -= Other;			Res -= Other;
	return Res;			return Res;
	}			}

	StackOffset operator-() const {			StackOffset operator-() const {
	StackOffset Res = {};			StackOffset Res = {};
	const StackOffset Other(*this);			const StackOffset Other(*this);
	Res -= Other;			Res -= Other;
	return Res;			return Res;
	}			}

				/// Returns the scalable part of the offset in bytes.
				int64_t getScalableBytes() const { return ScalableBytes; }

	/// Returns the non-scalable part of the offset in bytes.			/// Returns the non-scalable part of the offset in bytes.
	int64_t getBytes() const { return Bytes; }			int64_t getBytes() const { return Bytes; }

	/// Returns the offset in parts to which this frame offset can be			/// Returns the offset in parts to which this frame offset can be
	/// decomposed for the purpose of describing a frame offset.			/// decomposed for the purpose of describing a frame offset.
	/// For non-scalable offsets this is simply its byte size.			/// For non-scalable offsets this is simply its byte size.
	void getForFrameOffset(int64_t &ByteSized) const { ByteSized = Bytes; }			void getForFrameOffset(int64_t &NumBytes, int64_t &NumPredicateVectors,
				int64_t &NumDataVectors) const {
				assert(isValid() && "Invalid frame offset");

				NumBytes = Bytes;
				NumDataVectors = 0;
				NumPredicateVectors = ScalableBytes / 2;
				// This method is used to get the offsets to adjust the frame offset.
				// If the function requires ADDPL to be used and needs more than two ADDPL
				// instructions, part of the offset is folded into NumDataVectors so that it
				// uses ADDVL for part of it, reducing the number of ADDPL instructions.
				if (NumPredicateVectors % 8 == 0 || NumPredicateVectors < -64 ||
				NumPredicateVectors > 62) {
				NumDataVectors = NumPredicateVectors / 8;
				NumPredicateVectors -= NumDataVectors * 8;
				}
				}

	/// Returns whether the offset is known zero.			/// Returns whether the offset is known zero.
	explicit operator bool() const { return Bytes; }			explicit operator bool() const { return Bytes || ScalableBytes; }

				bool isValid() const {
				// The smallest scalable element supported by scaled SVE addressing
				// modes are predicates, which are 2 scalable bytes in size. So the scalable
				// byte offset must always be a multiple of 2.
				return ScalableBytes % 2 == 0;
				}
	};			};

	} // end namespace llvm			} // end namespace llvm

	#endif			#endif
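
The decomposition performed by getForFrameOffset can be sketched outside LLVM as follows. This is only an illustrative model: `FrameOffsetParts` and `decomposeFrameOffset` are hypothetical names, not part of the patch, and the bounds -64 and 62 are copied verbatim from the code above. One predicate vector corresponds to 2 scalable bytes (ADDPL) and one data vector to 16 scalable bytes (ADDVL).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the three outputs of getForFrameOffset().
struct FrameOffsetParts {
  int64_t NumBytes;            // fixed-size part, in bytes
  int64_t NumPredicateVectors; // multiples of 2 scalable bytes (ADDPL)
  int64_t NumDataVectors;      // multiples of 16 scalable bytes (ADDVL)
};

// Mirrors the logic shown in the patch; this is not the LLVM API itself.
inline FrameOffsetParts decomposeFrameOffset(int64_t Bytes,
                                             int64_t ScalableBytes) {
  // Same condition as isValid(): predicates are 2 scalable bytes in size.
  assert(ScalableBytes % 2 == 0 && "Invalid frame offset");

  FrameOffsetParts P{Bytes, ScalableBytes / 2, 0};
  // Fold into data vectors when the offset is a whole number of data
  // vectors, or when it lies outside the bounds used by the patch.
  if (P.NumPredicateVectors % 8 == 0 || P.NumPredicateVectors < -64 ||
      P.NumPredicateVectors > 62) {
    P.NumDataVectors = P.NumPredicateVectors / 8;
    P.NumPredicateVectors -= P.NumDataVectors * 8;
  }
  return P;
}
```

With this model, 16 scalable bytes become one data vector, 18 scalable bytes stay as 9 predicate vectors, and 130 scalable bytes split into 8 data vectors plus 1 predicate vector, which matches the expectations in TestStackOffset.cpp.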

llvm/trunk/lib/Target/AMDGPU/SIFrameLowering.cpp

	}			}

	bool SIFrameLowering::isSupportedStackID(TargetStackID::Value ID) const {			bool SIFrameLowering::isSupportedStackID(TargetStackID::Value ID) const {
	switch (ID) {			switch (ID) {
	case TargetStackID::Default:			case TargetStackID::Default:
	case TargetStackID::NoAlloc:			case TargetStackID::NoAlloc:
	case TargetStackID::SGPRSpill:			case TargetStackID::SGPRSpill:
	return true;			return true;
				case TargetStackID::SVEVector:
				return false;
	}			}
	llvm_unreachable("Invalid TargetStackID::Value");			llvm_unreachable("Invalid TargetStackID::Value");
	}			}

	void SIFrameLowering::emitPrologue(MachineFunction &MF,			void SIFrameLowering::emitPrologue(MachineFunction &MF,
	MachineBasicBlock &MBB) const {			MachineBasicBlock &MBB) const {
	SIMachineFunctionInfo *FuncInfo = MF.getInfo<SIMachineFunctionInfo>();			SIMachineFunctionInfo *FuncInfo = MF.getInfo<SIMachineFunctionInfo>();
	if (FuncInfo->isEntryFunction()) {			if (FuncInfo->isEntryFunction()) {

llvm/trunk/test/CodeGen/AArch64/framelayout-sve.mir

				# RUN: llc -mtriple=aarch64-none-linux-gnu -run-pass=prologepilog %s -o - | FileCheck %s
				#
				# Test allocation and deallocation of SVE objects on the stack,
				# as well as using a combination of scalable and non-scalable
				# offsets to access the SVE on the stack.
				#
				# SVE objects are allocated below the (scalar) callee saves,
				# and above spills/locals and the alignment gap, e.g.
				#
				# +-------------+
				# | stack arg   |
				# +-------------+ <- SP before call
				# | Callee Saves|
				# | Frame record| (if available)
				# |-------------| <- FP (if available)
				# |  SVE area   |
				# +-------------+
				# |/////////////| alignment gap.
				# |     :       |
				# | Stack objs  |
				# |     :       |
				# +-------------+ <- SP after call and frame-setup
				#
				--- |

				define void @test_allocate_sve() nounwind { entry: unreachable }
				define void @test_allocate_sve_gpr_callee_saves() nounwind { entry: unreachable }
				define void @test_allocate_sve_gpr_realigned() nounwind { entry: unreachable }

				...
				# +----------+
				# | %fixed-  | // scalable SVE object of n * 18 bytes, aligned to 16 bytes,
				# |  stack.0 | // to be materialized with 2*ADDVL (<=> 2 * n * 16bytes)
				# +----------+
				# | %stack.0 | // not scalable
				# +----------+ <- SP

				# CHECK-LABEL: name: test_allocate_sve
				# CHECK: stackSize: 16

				# CHECK: bb.0.entry:
				# CHECK-NEXT: $sp = frame-setup ADDVL_XXI $sp, -2
				# CHECK-NEXT: $sp = frame-setup SUBXri $sp, 16, 0

				# CHECK-NEXT: $sp = frame-destroy ADDVL_XXI $sp, 2
				# CHECK-NEXT: $sp = frame-destroy ADDXri $sp, 16, 0
				# CHECK-NEXT: RET_ReallyLR
				name: test_allocate_sve
				fixedStack:
				- { id: 0, stack-id: sve-vec, size: 18, alignment: 2, offset: -18 }
				stack:
				- { id: 0, stack-id: default, size: 16, alignment: 8 }
				body: |
				bb.0.entry:
				RET_ReallyLR
				---
				...
				# +----------+
				# | x20, x21 | // callee saves
				# +----------+
				# | %fixed-  | // scalable objects
				# |  stack.0 |
				# +----------+
				# | %stack.0 | // not scalable
				# +----------+ <- SP

				# CHECK-LABEL: name: test_allocate_sve_gpr_callee_saves
				# CHECK: stackSize: 32

				# CHECK: bb.0.entry:
				# CHECK-NEXT: $sp = frame-setup STPXpre killed $x21, killed $x20, $sp, -2
				# CHECK-NEXT: $sp = frame-setup ADDVL_XXI $sp, -2
				# CHECK-NEXT: $sp = frame-setup SUBXri $sp, 16, 0
				# CHECK-NEXT: $x20 = IMPLICIT_DEF
				# CHECK-NEXT: $x21 = IMPLICIT_DEF
				# CHECK-NEXT: $sp = frame-destroy ADDVL_XXI $sp, 2
				# CHECK-NEXT: $sp = frame-destroy ADDXri $sp, 16, 0
				# CHECK-NEXT: $sp, $x21, $x20 = frame-destroy LDPXpost $sp, 2
				# CHECK-NEXT: RET_ReallyLR
				name: test_allocate_sve_gpr_callee_saves
				fixedStack:
				- { id: 0, stack-id: sve-vec, size: 18, alignment: 2, offset: -18 }
				stack:
				- { id: 0, stack-id: default, size: 16, alignment: 8 }
				body: |
				bb.0.entry:
				$x20 = IMPLICIT_DEF
				$x21 = IMPLICIT_DEF
				RET_ReallyLR
				---
				...
				# +----------+
				# | lr, fp   | // frame record
				# +----------+ <- FP
				# | %fixed-  | // scalable objects
				# |  stack.0 |
				# +----------+
				# |//////////| // alignment gap
				# | %stack.0 | // not scalable
				# +----------+ <- SP
				# CHECK-LABEL: name: test_allocate_sve_gpr_realigned
				# CHECK: stackSize: 32

				# CHECK: bb.0.entry:
				# CHECK-NEXT: $sp = frame-setup STPXpre killed $fp, killed $lr, $sp, -2
				# CHECK-NEXT: $fp = frame-setup ADDXri $sp, 0, 0
				# CHECK-NEXT: $sp = frame-setup ADDVL_XXI $sp, -2
				# CHECK-NEXT: $[[TMP:x[0-9]+]] = frame-setup SUBXri $sp, 16, 0
				# CHECK-NEXT: $sp = ANDXri killed $[[TMP]]
				# CHECK-NEXT: $sp = frame-destroy ADDXri $fp, 0, 0
				# CHECK-NEXT: $sp, $fp, $lr = frame-destroy LDPXpost $sp, 2
				# CHECK-NEXT: RET_ReallyLR
				name: test_allocate_sve_gpr_realigned
				fixedStack:
				- { id: 0, stack-id: sve-vec, size: 18, alignment: 2, offset: -18 }
				stack:
				- { id: 0, stack-id: default, size: 16, alignment: 32 }
				body: |
				bb.0.entry:
				RET_ReallyLR
				---
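
The ADDVL immediates checked above follow from the comment in the first test: an 18-byte scalable object, aligned to 16 bytes, is materialized with two ADDVL steps. A minimal sketch of that arithmetic, assuming the SVE area is simply rounded up to a multiple of 16 scalable bytes (`alignUp` and `numADDVLSteps` are hypothetical helpers, not functions from the patch):

```cpp
#include <cassert>
#include <cstdint>

// Round Value up to the next multiple of Align (Align must be positive).
inline int64_t alignUp(int64_t Value, int64_t Align) {
  assert(Align > 0 && "alignment must be positive");
  return (Value + Align - 1) / Align * Align;
}

// Number of 16-scalable-byte steps (one ADDVL unit each) needed to cover
// an SVE area of the given size. Assumed helper for illustration only.
inline int64_t numADDVLSteps(int64_t SVEAreaScalableBytes) {
  return alignUp(SVEAreaScalableBytes, 16) / 16;
}
```

For the 18-byte fixed-stack object used in all three tests this gives 32 scalable bytes, i.e. the `ADDVL_XXI $sp, -2` seen in the prologues.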

llvm/trunk/unittests/Target/AArch64/TestStackOffset.cpp

TEST(StackOffset, MixedSize) {		TEST(StackOffset, MixedSize) {
StackOffset A(1, MVT::i8);		StackOffset A(1, MVT::i8);
EXPECT_EQ(1, A.getBytes());		EXPECT_EQ(1, A.getBytes());

StackOffset B(2, MVT::i32);		StackOffset B(2, MVT::i32);
EXPECT_EQ(8, B.getBytes());		EXPECT_EQ(8, B.getBytes());

StackOffset C(2, MVT::v4i64);		StackOffset C(2, MVT::v4i64);
EXPECT_EQ(64, C.getBytes());		EXPECT_EQ(64, C.getBytes());

		StackOffset D(2, MVT::nxv4i64);
		EXPECT_EQ(64, D.getScalableBytes());

		StackOffset E(2, MVT::v4i64);
		EXPECT_EQ(0, E.getScalableBytes());

		StackOffset F(2, MVT::nxv4i64);
		EXPECT_EQ(0, F.getBytes());
}		}

TEST(StackOffset, Add) {		TEST(StackOffset, Add) {
StackOffset A(1, MVT::i64);		StackOffset A(1, MVT::i64);
StackOffset B(1, MVT::i32);		StackOffset B(1, MVT::i32);
StackOffset C = A + B;		StackOffset C = A + B;
EXPECT_EQ(12, C.getBytes());		EXPECT_EQ(12, C.getBytes());

StackOffset D(1, MVT::i32);		StackOffset D(1, MVT::i32);
D += A;		D += A;
EXPECT_EQ(12, D.getBytes());		EXPECT_EQ(12, D.getBytes());

		StackOffset E(1, MVT::nxv1i32);
		StackOffset F = C + E;
		EXPECT_EQ(12, F.getBytes());
		EXPECT_EQ(4, F.getScalableBytes());
}		}

TEST(StackOffset, Sub) {		TEST(StackOffset, Sub) {
StackOffset A(1, MVT::i64);		StackOffset A(1, MVT::i64);
StackOffset B(1, MVT::i32);		StackOffset B(1, MVT::i32);
StackOffset C = A - B;		StackOffset C = A - B;
EXPECT_EQ(4, C.getBytes());		EXPECT_EQ(4, C.getBytes());

StackOffset D(1, MVT::i64);		StackOffset D(1, MVT::i64);
D -= A;		D -= A;
EXPECT_EQ(0, D.getBytes());		EXPECT_EQ(0, D.getBytes());

		C += StackOffset(2, MVT::nxv1i32);
		StackOffset E = StackOffset(1, MVT::nxv1i32);
		StackOffset F = C - E;
		EXPECT_EQ(4, F.getBytes());
		EXPECT_EQ(4, F.getScalableBytes());
}		}

TEST(StackOffset, isZero) {		TEST(StackOffset, isZero) {
StackOffset A(0, MVT::i64);		StackOffset A(0, MVT::i64);
StackOffset B(0, MVT::i32);		StackOffset B(0, MVT::i32);
EXPECT_TRUE(!A);		EXPECT_TRUE(!A);
EXPECT_TRUE(!(A + B));		EXPECT_TRUE(!(A + B));

		StackOffset C(0, MVT::nxv1i32);
		EXPECT_TRUE(!(A + C));

		StackOffset D(1, MVT::nxv1i32);
		EXPECT_FALSE(!(A + D));
		}

		TEST(StackOffset, isValid) {
		EXPECT_FALSE(StackOffset(1, MVT::nxv8i1).isValid());
		EXPECT_TRUE(StackOffset(2, MVT::nxv8i1).isValid());

		#ifndef NDEBUG
		#ifdef GTEST_HAS_DEATH_TEST
		EXPECT_DEATH(StackOffset(1, MVT::i1),
		"Offset type is not a multiple of bytes");
		EXPECT_DEATH(StackOffset(1, MVT::nxv1i1),
		"Offset type is not a multiple of bytes");
		#endif // defined GTEST_HAS_DEATH_TEST
		#endif // not defined NDEBUG
}		}

TEST(StackOffset, getForFrameOffset) {		TEST(StackOffset, getForFrameOffset) {
StackOffset A(1, MVT::i64);		StackOffset A(1, MVT::i64);
StackOffset B(1, MVT::i32);		StackOffset B(1, MVT::i32);
int64_t ByteSized;		StackOffset C(1, MVT::nxv4i32);
(A + B).getForFrameOffset(ByteSized);
		// If all offsets can be materialized with only ADDVL,
		// make sure PLSized is 0.
		int64_t ByteSized, VLSized, PLSized;
		(A + B + C).getForFrameOffset(ByteSized, PLSized, VLSized);
EXPECT_EQ(12, ByteSized);		EXPECT_EQ(12, ByteSized);
		EXPECT_EQ(1, VLSized);
		EXPECT_EQ(0, PLSized);

		// If we need an ADDPL to materialize the offset, and the number of scalable
		// bytes fits the ADDPL immediate, fold the scalable bytes to fit in PLSized.
		StackOffset D(1, MVT::nxv16i1);
		(C + D).getForFrameOffset(ByteSized, PLSized, VLSized);
		EXPECT_EQ(0, ByteSized);
		EXPECT_EQ(0, VLSized);
		EXPECT_EQ(9, PLSized);

		StackOffset E(4, MVT::nxv4i32);
		StackOffset F(1, MVT::nxv16i1);
		(E + F).getForFrameOffset(ByteSized, PLSized, VLSized);
		EXPECT_EQ(0, ByteSized);
		EXPECT_EQ(0, VLSized);
		EXPECT_EQ(33, PLSized);

		// If the offset requires an ADDPL instruction to materialize, and would
		// require more than two instructions, decompose it into both
		// ADDVL (n x 16 bytes) and ADDPL (n x 2 bytes) instructions.
		StackOffset G(8, MVT::nxv4i32);
		StackOffset H(1, MVT::nxv16i1);
		(G + H).getForFrameOffset(ByteSized, PLSized, VLSized);
		EXPECT_EQ(0, ByteSized);
		EXPECT_EQ(8, VLSized);
		EXPECT_EQ(1, PLSized);
}		}
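
The arithmetic exercised by the Add, Sub and isZero tests can be reproduced with a small standalone model that tracks the fixed and scalable parts separately. This is a sketch only: `OffsetModel` and its `fixed`/`scalable` factories are hypothetical and take byte counts directly, whereas the real StackOffset is constructed from a count and an MVT.

```cpp
#include <cassert>
#include <cstdint>

// Simplified, hypothetical model of a two-part offset; not llvm::StackOffset.
struct OffsetModel {
  int64_t Bytes;         // fixed-size part
  int64_t ScalableBytes; // part that scales with the runtime vector length

  OffsetModel(int64_t B = 0, int64_t S = 0) : Bytes(B), ScalableBytes(S) {}

  static OffsetModel fixed(int64_t N) { return OffsetModel(N, 0); }
  static OffsetModel scalable(int64_t N) { return OffsetModel(0, N); }

  // Both parts are accumulated independently, as in the patch.
  OffsetModel &operator+=(const OffsetModel &O) {
    Bytes += O.Bytes;
    ScalableBytes += O.ScalableBytes;
    return *this;
  }
  OffsetModel operator+(const OffsetModel &O) const {
    OffsetModel R(*this);
    R += O;
    return R;
  }
  OffsetModel &operator-=(const OffsetModel &O) {
    Bytes -= O.Bytes;
    ScalableBytes -= O.ScalableBytes;
    return *this;
  }
  OffsetModel operator-(const OffsetModel &O) const {
    OffsetModel R(*this);
    R -= O;
    return R;
  }

  // True when either part is non-zero.
  explicit operator bool() const { return Bytes || ScalableBytes; }
};
```

In this model, `fixed(8) + fixed(4) + scalable(4)` has 12 fixed bytes and 4 scalable bytes, mirroring the `A + B` plus `nxv1i32` case in the Add test.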

Diff 222986