This is an archive of the discontinued LLVM Phabricator instance.

Differential D125787

[RISCV] Fix RVV stack frame alignment bugs
Closed · Public

Authored by frasercrmck on May 17 2022, 7:02 AM.

Details

Reviewers

rogfer01
HsiangKai
kito-cheng
craig.topper
StephenFan

Commits

rGcb8681a2b3ad: [RISCV] Fix RVV stack frame alignment bugs

Summary

This patch addresses several alignment issues in the stack frame when
RVV objects are taken into account.

One bug is that the RVV stack was never guaranteed to keep the alignment
of the stack *as a whole*. We must maintain a 16-byte aligned stack at
all times, especially when calling other functions. With the standard V
extension this happens automatically, since VLEN is at least 128 bits and
VLENB is therefore always a multiple of 16 bytes. However, we also support
Zvl64b, which does not guarantee this. To fix this, the RVV stack size is
rounded up to a multiple of 16 bytes. In practice this generally makes us
allocate an RVV stack of at least 2*VLENB bytes, and always an even
multiple of VLENB.

|------------------------------| -- <-- FP
| 8-byte callee-save           | |      |
|------------------------------| |      |
| one VLENB-sized RVV object   | |      |
|------------------------------| |      |
| 8-byte local variable        | |      |
|------------------------------| -- <-- SP (must be aligned to 16)

In the example above, with Zvl64b we are decrementing SP by 24 bytes,
which does not leave SP correctly aligned. We therefore introduce an
extra VLENB-sized amount used for alignment. This ensures the total
stack size is 32 bytes (48 for Zvl128b, 80 for Zvl256b, etc):

|------------------------------| -- <-- FP
| 8-byte callee-save           | |      |
|------------------------------| |      |
| one VLENB-sized padding obj  | |      |
| one VLENB-sized RVV object   | |      |
|------------------------------| |      |
| 8-byte local variable        | |      |
|------------------------------| -- <-- SP
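
The arithmetic behind the two diagrams above can be checked with a small sketch. It assumes, as the summary describes, that RVV sizes are tracked in VLENB-sized units against the Zvl64b baseline (8 bytes per unit), so rounding the RVV section up to 16 bytes at that baseline means allocating an even number of VLENB units. `alignUp` and `frameBytes` are hypothetical helpers written for this illustration, not LLVM functions.

```cpp
#include <cstdint>

// Round Value up to the next multiple of Align (Align must be a power of two).
constexpr uint64_t alignUp(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) & ~(Align - 1);
}

// Total frame size in bytes for the example frame: an 8-byte callee-save,
// NumVectorRegs VLENB-sized RVV objects and an 8-byte local variable.
// The RVV size is rounded up to 16 at the Zvl64b baseline (assumption:
// 8 bytes per VLENB unit) and then scaled to the real VLENB.
constexpr uint64_t frameBytes(uint64_t NumVectorRegs, uint64_t VLenB) {
  constexpr uint64_t BaselineVLenB = 8; // Zvl64b
  uint64_t BaselineRVVBytes = alignUp(NumVectorRegs * BaselineVLenB, 16);
  uint64_t RVVBytes = BaselineRVVBytes / BaselineVLenB * VLenB;
  return 8 + RVVBytes + 8;
}

static_assert(frameBytes(1, 8) == 32, "Zvl64b");
static_assert(frameBytes(1, 16) == 48, "Zvl128b");
static_assert(frameBytes(1, 32) == 80, "Zvl256b");
// Without the extra VLENB of padding, Zvl64b would use 8 + 8 + 8 = 24 bytes,
// which is not a multiple of 16.
static_assert((8 + 8 + 8) % 16 != 0, "unpadded Zvl64b frame is misaligned");
```

The checks run at compile time, so building the snippet is enough to confirm the 32/48/80 figures.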

A new RVV invariant has been introduced in this patch, which is that the
base of the RVV stack itself is now always aligned to 16 bytes, not 8 as
before. This keeps us more in line with the scalar stack and should be
easier to reason about. The calculation of the RVV padding has thus
changed to be the amount required to align the scalar local variable
section to the RVV section's alignment. This amount is further rounded
up when setting up the initial stack to keep everything aligned:

|------------------------------| -- <-- FP
| 8-byte callee-save           |
|------------------------------|
|                              |
| RVV objects                  |
| (aligned to at least 16)     |
|                              |
|------------------------------|
| RVV padding of 8 bytes       |
|------------------------------|
| 8-byte local variable        |
|------------------------------| -- <-- SP

In the example above, it's clear that we need 8 bytes of padding to keep
the RVV section aligned to 16 when using SP. But to keep SP *itself*
aligned to 16 we can't decrement the initial stack pointer by 24 - we
have to round up to 32.
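
As a rough sketch of that calculation: the padding is the distance from the scalar local variable section up to the RVV alignment, and the result is then rounded up again to the stack alignment. The helpers below are stand-ins written for this example (they mimic what `alignTo` and `offsetToAlignment` do in the patch), and `scalarStackBytes` is a hypothetical name; varargs save areas are ignored.

```cpp
#include <cstdint>

// Stand-in for alignTo: round Value up to a multiple of Align
// (Align must be a power of two).
constexpr uint64_t alignUp(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) & ~(Align - 1);
}

// Stand-in for offsetToAlignment: bytes needed to reach the next multiple
// of Align.
constexpr uint64_t paddingTo(uint64_t Value, uint64_t Align) {
  return alignUp(Value, Align) - Value;
}

// Amount SP is decremented by for the non-RVV part of the frame.
constexpr uint64_t scalarStackBytes(uint64_t CalleeSaveBytes,
                                    uint64_t ScalarLocalBytes,
                                    uint64_t RVVAlign) {
  // The frame size is first rounded up to the 16-byte stack alignment.
  uint64_t FrameSize = alignUp(CalleeSaveBytes + ScalarLocalBytes, 16);
  // Everything below the callee saves counts as the scalar local section.
  uint64_t ScalarLocalVarSize = FrameSize - CalleeSaveBytes;
  // Pad that section so the base of the RVV section lands on RVVAlign.
  uint64_t RVVPadding = paddingTo(ScalarLocalVarSize, RVVAlign);
  // Round up once more so SP itself stays 16-byte aligned.
  return alignUp(FrameSize + RVVPadding, 16);
}

// The example above: 8-byte callee-save, 8-byte local, RVV aligned to 16.
// 8 bytes of RVV padding give 24, which is rounded up to 32.
static_assert(scalarStackBytes(8, 8, 16) == 32, "round 24 up to 32");
// No callee saves and a 16-byte local section need no padding at all.
static_assert(scalarStackBytes(0, 16, 16) == 16, "already aligned");
```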

The second bug fixed by this patch is that RVV objects themselves were
not correctly aligned. We were previously only guaranteeing an alignment
of 8 bytes, even if they required a higher alignment. With the RVV
section itself now correctly aligned, the fix is relatively simple, and
in practice we see more rounding up of VLENB amounts to account for
alignment padding between objects:

|------------------------------|
| RVV object (aligned to 16)   |
|------------------------------|
| no padding necessary         |
|------------------------------|
| 2*VLENB RVV object (align 16)|
|------------------------------|
| VLENB alignment padding      |
|------------------------------|
| RVV object (align 32)        |
|------------------------------|
| 3*VLENB alignment padding    |
|------------------------------|
| VLENB RVV object (align 32)  |
|------------------------------| -- <-- base of RVV section
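
One way to picture the per-object rounding is a running offset that is rounded up to each object's own alignment after the object is placed. The sketch below is only in the spirit of the diagram above, not a reproduction of it or of the patch: sizes are in bytes at the Zvl64b baseline (8 bytes per VLENB unit), and `RVVObject`, `RVVLayout`, `layoutRVVObjects` and the `Example` data are all made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Round Value up to the next multiple of Align (Align must be a power of two).
constexpr uint64_t alignUp(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) & ~(Align - 1);
}

struct RVVObject {
  uint64_t Size;  // bytes at the Zvl64b baseline
  uint64_t Align; // required alignment in bytes
};

struct RVVLayout {
  uint64_t Size;  // total size of the RVV section, a multiple of 16
  uint64_t Align; // largest alignment seen, at least 16
};

template <std::size_t N>
constexpr RVVLayout layoutRVVObjects(const RVVObject (&Objects)[N]) {
  uint64_t Offset = 0;
  uint64_t MaxAlign = 16; // the section base is always at least 16-aligned
  for (std::size_t I = 0; I < N; ++I) {
    // Every object gets at least 8-byte alignment.
    uint64_t Align = Objects[I].Align < 8 ? 8 : Objects[I].Align;
    // Place the object, then round up so it sits on its own alignment.
    Offset = alignUp(Offset + Objects[I].Size, Align);
    if (Align > MaxAlign)
      MaxAlign = Align;
  }
  return {alignUp(Offset, 16), MaxAlign};
}

// Hypothetical frame: two 16-aligned objects followed by two 32-aligned ones.
constexpr RVVObject Example[] = {{8, 16}, {16, 16}, {8, 32}, {8, 32}};
// Running offsets are 16, 32, 64 and 96.
static_assert(layoutRVVObjects(Example).Size == 96, "total RVV section size");
static_assert(layoutRVVObjects(Example).Align == 32, "RVV section alignment");
```

The objects themselves only occupy 40 baseline bytes here; the remaining 56 are alignment padding, which is the kind of extra rounding the summary warns about.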

Note that a lot of the regressions in codegen owing to the new alignment
rules are correct but actually only strictly necessary for Zvl64b (and
Zvl32b but that's not really supported). I plan a follow-up patch to
take the known VLEN into account when padding for alignment.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

frasercrmck created this revision. May 17 2022, 7:02 AM

Herald added a project: Restricted Project. May 17 2022, 7:02 AM

Herald added subscribers: sunshaoce, VincentWu, luke957 and 27 others.

frasercrmck requested review of this revision. May 17 2022, 7:02 AM

Herald added subscribers: llvm-commits, pcwang-thead, eopXD, MaskRay.

frasercrmck added a parent revision: D110933: [RISCV] Add a test showing incorrect RVV stack alignment. May 17 2022, 7:03 AM

Thank you for an informative and well-written patch description!

Reading through, it seems like the second bug you mention could probably be reproduced and fixed independently of the others? Would it make sense to split that into its own patch?

One bit I don't understand the reasoning on is increasing minimal RVV alignment to 16 bytes. You say that this makes it easier to reason about, but I don't really follow that.

llvm/test/CodeGen/RISCV/rvv/rv32-spill-vector.ll
55	As near as I can tell, these shifts are coming from the 16-byte minimum RVV alignment, right? Or is there some other cause I'm missing? As noted in the top-level comment, I wonder if this is worthwhile.

Harbormaster completed remote builds in B164883: Diff 430046. May 17 2022, 8:04 AM

In D125787#3519315, @reames wrote:

Reading through, it seems like the second bug you mention could probably be reproduced and fixed independently of the others? Would it make sense to split that into its own patch?

It's likely, yeah. It's something I came across most of the way through fixing everything else so I didn't really think about it. I'm sure I could come up with a test that exercises this, and I think the fix is relatively self-contained. I'll give it a bash later.

One bit I don't understand the reasoning on is increasing minimal RVV alignment to 16 bytes. You say that this makes it easier to reason about, but I don't really follow that.

Yeah, it's a good question. To be honest the real reason slipped my mind while writing the description. It was part of fixing the scalar stack alignment. Since the scalar section is underneath the RVV section and needs to be aligned to 16 bytes, unless we know that the RVV section is 16-byte aligned (in size) then we'd have to dynamically realign sp to keep the scalar section below it correctly aligned. This would affect the situation where our (minimum) VLEN is 64.

I thought that dynamically realigning the scalar stack was less desirable than just padding out the RVV stack size, but also that having different alignment requirements depending on the RVV version was overly complicated and just leads to maintenance burden (the simpler the FrameLowering the better, imo). Now that doesn't preclude us from generating different code to satisfy that same alignment requirement depending on our minimum VLEN, so in practice I don't expect the 16-byte alignment requirement to negatively affect us when VLEN >= 128 - once I add in that optimization.

kito-cheng added inline comments. May 18 2022, 9:15 AM

llvm/test/CodeGen/RISCV/rvv/rv32-spill-vector.ll
55	Is it possible to realign this way?
    csrr a0, vlenb
    addi a0, a0, 15
    andi a0, a0, -16
    sub sp, sp, a0

frasercrmck added inline comments. May 19 2022, 12:09 AM

llvm/test/CodeGen/RISCV/rvv/rv32-spill-vector.ll
55	Technically I think we could, but because we may be in a situation where we only have sp/bp and we need to jump over the RVV section to reach callee saves or fixed objects, that would complicate the code we emit for frame offset calculations. I think, on balance, having a "known" size for the RVV section is preferable, even if it may waste stack space on certain (zvl32b/zvl64b) configurations.
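
For readers unfamiliar with the idiom, the suggested sequence rounds VLENB up to a multiple of 16 at run time. A minimal sketch of the same arithmetic (`roundUpTo16` and `vlenbUnits` are names made up for this note) shows why the resulting RVV section size is no longer a fixed multiple of VLENB, which is the complication raised in the reply:

```cpp
#include <cstdint>

// Same arithmetic as `addi a0, a0, 15; andi a0, a0, -16`:
// round Bytes up to the next multiple of 16.
constexpr uint64_t roundUpTo16(uint64_t Bytes) {
  return (Bytes + 15) & ~uint64_t{15};
}

// How many VLENB-sized units the rounded size corresponds to.
constexpr uint64_t vlenbUnits(uint64_t VLenB) {
  return roundUpTo16(VLenB) / VLenB;
}

static_assert(roundUpTo16(8) == 16, "VLEN = 64: one vreg takes 16 bytes");
static_assert(roundUpTo16(16) == 16, "VLEN = 128: already aligned");
static_assert(roundUpTo16(32) == 32, "VLEN = 256: already aligned");

// The multiplier depends on the run-time VLENB, so code that needs to step
// over the RVV section cannot use a single compile-time multiple of VLENB.
static_assert(vlenbUnits(8) == 2, "2 * VLENB when VLEN = 64");
static_assert(vlenbUnits(16) == 1, "1 * VLENB when VLEN = 128");
```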

frasercrmck mentioned this in D125962: [RISCV] Add a test showing overlapping stack offsets with RVV. May 19 2022, 3:32 AM

frasercrmck mentioned this in D125964: [RISCV] Fix logic for determining RVV stack padding. May 19 2022, 3:50 AM

frasercrmck edited the summary of this revision. May 19 2022, 3:54 AM

rebase on D125964

In D125787#3522637, @frasercrmck wrote:

It's likely, yeah. It's something I came across most of the way through fixing everything else so I didn't really think about it. I'm sure I could come up with a test that exercises this, and I think the fix is relatively self-contained. I'll give it a bash later.

Split into D125962 and D125964. Thanks!

fix up a comment: RVV padding isn't only required when *accessing* RVV objects;
it affects the calculation of *any* objects when we have a non-zero RVV stack
section.

Harbormaster completed remote builds in B165302: Diff 430638. May 19 2022, 4:32 AM

frasercrmck added a child revision: D125973: [RISCV] Ensure the entire stack is aligned to the RVV stack alignment. May 19 2022, 7:05 AM

frasercrmck mentioned this in D125973: [RISCV] Ensure the entire stack is aligned to the RVV stack alignment. May 19 2022, 7:09 AM

In D125787#3522637, @frasercrmck wrote:

Yeah, it's a good question. To be honest the real reason slipped my mind while writing the description. It was part of fixing the scalar stack alignment. Since the scalar section is underneath the RVV section and needs to be aligned to 16 bytes, unless we know that the RVV section is 16-byte aligned (in size) then we'd have to dynamically realign sp to keep the scalar section below it correctly aligned. This would affect the situation where our (minimum) VLEN is 64.

I thought that dynamically realigning the scalar stack was less desirable than just padding out the RVV stack size, but also that having different alignment requirements depending on the RVV version was overly complicated and just leads to maintenance burden (the simpler the FrameLowering the better, imo). Now that doesn't preclude us from generating different code to satisfy that same alignment requirement depending on our minimum VLEN, so in practice I don't expect the 16-byte alignment requirement to negatively affect us when VLEN >= 128 - once I add in that optimization.

Ok, I think that makes sense. We can always choose a more complicated scheme later if we don't like the extra padding, but fixing bugs comes first.

Again though, I think this is a separable patch. Can I ask you to split that off? We should be able to demonstrate the misaligned stack without fixing either a) the internal underalignment of RVV objects, or b) your offsetting code (I think?). If I understood you correctly, this basically just ends up needing to set RVVPadding more often.

Hm, though one further question, which maybe shows I don't fully understand. With Zvl128b, is there anything preventing the actual implementation from having a VLEN=196? My understanding was that the ZvlNb variants were a minimum, not an exact value. If so, don't we need to pad out to a 16-byte alignment even with Zvl128b? (Except when the actual register size is known.)

In D125787#3525348, @reames wrote:

Hm, though one further question, which maybe shows I don't fully understand. With Zvl128b, is there anything preventing the actual implementation from having a VLEN=196? My understanding was that the ZvlNb variants were a minimum, not an exact value. If so, don't we need to pad out to a 16-byte alignment even with Zvl128b? (Except when the actual register size is known.)

VLEN must be a power of 2 less than or equal to 65536. From the spec: "The number of bits in a single vector register, VLEN ≥ ELEN, which must be a power of 2, and must be no greater than 2^16." So VLEN=196 is not possible.
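
A quick sanity check of that point, as a sketch: any VLEN that is a power of two and at least 128 gives a VLENB that is a multiple of 16 bytes, while VLEN=64 does not. `isLegalVLen` and the other function names are illustrative only, and the check deliberately ignores the VLEN ≥ ELEN requirement.

```cpp
#include <cstdint>

// VLEN must be a power of two and no greater than 2^16.
constexpr bool isLegalVLen(uint32_t VLen) {
  return VLen != 0 && (VLen & (VLen - 1)) == 0 && VLen <= 65536;
}

// VLENB (VLEN in bytes) is a multiple of 16.
constexpr bool vlenbIsMultipleOf16(uint32_t VLen) {
  return (VLen / 8) % 16 == 0;
}

// Walk every legal VLEN from 128 up to 2^16.
constexpr bool allVLenFrom128Are16ByteAligned() {
  for (uint32_t VLen = 128; VLen <= 65536; VLen *= 2)
    if (!isLegalVLen(VLen) || !vlenbIsMultipleOf16(VLen))
      return false;
  return true;
}

static_assert(!isLegalVLen(196), "196 is not a power of two");
static_assert(isLegalVLen(64) && !vlenbIsMultipleOf16(64),
              "Zvl64b: VLENB = 8 needs padding");
static_assert(allVLenFrom128Are16ByteAligned(),
              "VLEN >= 128 always gives a 16-byte multiple");
```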

In D125787#3525348, @reames wrote:

Ok, I think that makes sense. We can always choose a more complicated scheme later if we don't like the extra padding, but fixing bugs comes first.

Again though, I think this is a separable patch. Can I ask you to split that off? We should be able to demonstrate the misaligned stack without fixing either a) the internal underalignment of RVV objects, or b) your offsetting code (I think?). If I understood you correctly, this basically just ends up needing to set RVVPadding more often.

I think there's something separable but I do think it'd have to bring a lot of this patch with it. To keep the base of the RVV section aligned to 16, we'd need to use this new computation of the RVV padding as a hard-coded 16 doesn't cut it. Once the padding changes, we'd also need to change some or all of the offsetting/frame index reference code to account for it.

(To clarify, with this patch the RVV padding is conceptually only the part at the bottom of the stack between the scalar section and the base of the RVV section. The alignment above the RVV section is taken into account via the alignTo in getStackSizeWithRVVPadding.

On main without this patch, RVV padding is more nebulous and is the combined above/below parts, so the offsetting only needs to round some offsets up to 8, rather than accurately tracking the various sections we're traversing. For example, if the scalar section size is 8 then the RVV padding is effectively zero between the scalar and RVV sections, and is 16 above RVV. If the scalar section size is 12 then the RVV padding is 4 below and 12 above.

I personally found that quite difficult to reason about, so that's why I made the conceptual change. I tried to reflect this in the changes I made to the stack diagrams in the code - I hope it's clear enough. It's quite subtle and does affect how offsets are computed. I think with this patch the whole scheme is a bit clearer - conceptually, at least)
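
To make the two numeric examples concrete, here is one reading of the pre-patch scheme as a sketch: a fixed 16 bytes of padding, of which whatever is needed to bring the scalar section up to 8 sits below the RVV section and the rest sits above it. This is an interpretation of the description above rather than code from the tree, and `PaddingSplit` and `oldPaddingSplit` are made-up names.

```cpp
#include <cstdint>

// Bytes needed to bring Value up to the next multiple of Align
// (Align must be a power of two).
constexpr uint64_t paddingTo(uint64_t Value, uint64_t Align) {
  return ((Value + Align - 1) & ~(Align - 1)) - Value;
}

struct PaddingSplit {
  uint64_t Below; // between the scalar section and the RVV section
  uint64_t Above; // above the RVV section
};

// Assumed pre-patch behaviour: 16 bytes of padding in total, split so that
// offsets into the scalar section only ever need rounding up to 8.
constexpr PaddingSplit oldPaddingSplit(uint64_t ScalarSectionBytes) {
  uint64_t Below = paddingTo(ScalarSectionBytes, 8);
  return {Below, 16 - Below};
}

static_assert(oldPaddingSplit(8).Below == 0, "scalar size 8: nothing below");
static_assert(oldPaddingSplit(8).Above == 16, "scalar size 8: 16 above");
static_assert(oldPaddingSplit(12).Below == 4, "scalar size 12: 4 below");
static_assert(oldPaddingSplit(12).Above == 12, "scalar size 12: 12 above");
```

With the patch, only the "below" part is called RVV padding, and it is computed against the RVV alignment rather than 8.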

Anyway I guess my conclusion is that these two things are more interlinked than it may first appear, and splitting out the 16-byte stuff almost feels as though it's the majority of this patch. The smaller part may actually be ensuring RVV alignments greater than 16 and all the internal RVV alignments, as you point out.

We could also maybe split off the change that increases the size of the RVV stack to be a multiple of 16 bytes, while leaving it underaligned at 8. That might reduce some of the test noise involving the shifts. It might also fix the scalar-stack-align.ll issue too.

frasercrmck mentioned this in rGa351070710f5: [RISCV] Add a test showing overlapping stack offsets with RVV. May 20 2022, 5:12 AM

frasercrmck mentioned this in rGd60ae47f9dab: [RISCV] Fix logic for determining RVV stack padding. May 20 2022, 5:31 AM

This comment has been deleted.

StephenFan added inline comments. May 20 2022, 7:04 AM

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
753	If I understand correctly, SP here is assumed to be aligned to MaxAlign, which is the maximum alignment of all stack objects (including RVV stack objects). But I found that https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/MachineFrameInfo.cpp#L61 only records the maximum alignment of non-scalable stack objects.

I am also worried that, if the RVV padding is big enough, the size of the scalar local variable section could fall outside the range of a 12-bit signed number. Is it necessary to add an extra scavenger spill slot for that case?

In D125787#3527694, @StephenFan wrote:

I am also worried that, if the RVV padding is big enough, the size of the scalar local variable section could fall outside the range of a 12-bit signed number. Is it necessary to add an extra scavenger spill slot for that case?

We already reserve a scavenger slot if there are any RVV objects at all in the frame - I believe that this one reg is sufficient to cover this eventuality. I don't think there's anything special about RVV padding compared with just the number of scalar local variables being too large for a 12-bit int in the "regular" case.

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
753	Yeah, that's right. This patch itself isn't enough - we have to update MaxAlign ourselves with the RVV objects. You probably saw that in D125973. Although, some scalable objects are included in max align, as you can already see in `rvv-stack-align.mir` which does realign the stack to 32 if there's a 32-byte aligned scalable-vector alloca. I haven't dug into that to see why some are included and some aren't, but I don't think it affects our correctness once D125973 is merged. I don't think it makes much sense to merge D125973 first as without this patch, correctly updating the max align will do nothing to fix any of the scalable misalignment issues and I don't think it'll solve any non-scalable bugs either. That's why it's based on this one.

reames mentioned this in D126088: [RISCV] Add clarifying asserts to getFrameIndexReference [NFC]. May 20 2022, 1:12 PM

Still trying to wrap my head around this code. In the process, I've got a small cleanup to propose. https://reviews.llvm.org/D126088

Thinking about what this code does, I think we've got two pieces:

Step 1 - Pick a base register (FP, SP, or BP). There's some legality aspects in this selection (we can't index across either dynamic realign or var sized objects since we don't know their sizes), but it's also a profitability decision.
Step 2 - Given choice from 1, compute offset expression.

I'm wondering if it makes sense to separate the code out this way. If we had a function which only did step 2, then having the asserts to make sure we didn't try to cross var sized objects becomes easy. It also becomes easy to ensure that e.g. all the SP offset paths stay in sync.

Thoughts?

In D125787#3528016, @frasercrmck wrote:

We already reserve a scavenger slot if there are any RVV objects at all in the frame - I believe that this one reg is sufficient to cover this eventuality. I don't think there's anything special about RVV padding compared with just the number of scalar local variables being too large for a 12-bit int in the "regular" case.

You are right. Thanks for your explanation!

This revision is now accepted and ready to land. May 21 2022, 6:24 AM

In D125787#3528414, @reames wrote:

Thinking about what this code does, I think we've got two pieces:

Step 1 - Pick a base register (FP, SP, or BP). There's some legality aspects in this selection (we can't index across either dynamic realign or var sized objects since we don't know their sizes), but it's also a profitability decision.

Step 2 - Given choice from 1, compute offset expression.

I'm wondering if it makes sense to separate the code out this way. If we had a function which only did step 2, then having the asserts to make sure we didn't try to cross var sized objects becomes easy. It also becomes easy to ensure that e.g. all the SP offset paths stay in sync.

Thoughts?

I prototyped this structure, and all of the tests in tree continued to pass.

Given this change has been LGTMed, this is not a blocker, but I think this code organization does make sense as a post cleanup.

I am not particularly confident in the correctness of this code as it stands, whether that is the existing code or this patch.

In D125787#3531829, @reames wrote:

I prototyped this structure, and all of the tests in tree continued to pass.

Given this change has been LGTMed, this is not a blocker, but I think this code organization does make sense as a post cleanup.

I am not particularly confident in the correctness of this code as it stands, whether that is the existing code or this patch.

Sorry, forgot to reply. The change you propose makes sense to me logically. I can't tell if there are gotchas in reorganizing things like that though.

Since I think discussions have been across various patches at this point, which aspect of this patch are you suspicious of?

In D125787#3531852, @frasercrmck wrote:

Sorry, forgot to reply. The change you propose makes sense to me logically. I can't tell if there are gotchas in reorganizing things like that though.

Since I think discussions have been across various patches at this point, which aspect of this patch are you suspicious of?

A couple of themes, but nothing actionable.

Theme 1 - We have confirmed we have untested code. We know this code used to differ between two different cases of using SP as the base register. It's not clear whether this is correct or not. This implies both a lack of test coverage and (probably) a lack of understanding.

Theme 2 - This is very complicated code, and the changes are broad enough to be hard to track.

I am not objecting to this landing. It's probably more correct than what's in tree. I'm just saying that even with this, we probably have uncaught issues here. (Or at least, the code structure doesn't convince me that we *don't*.) I am a strong proponent of reducing code complexity until it can't be simplified further as a means to flush out bugs.

This revision was landed with ongoing or failed builds. May 23 2022, 11:04 PM

Closed by commit rGcb8681a2b3ad: [RISCV] Fix RVV stack frame alignment bugs (authored by frasercrmck).

This revision was automatically updated to reflect the committed changes.

frasercrmck added a commit: rGcb8681a2b3ad: [RISCV] Fix RVV stack frame alignment bugs.

In D125787#3531893, @reames wrote:

A couple of themes, but nothing actionable.

Theme 1 - We have confirmed we have untested code. We know this code used to differ between two different cases of using SP as the base register. It's not clear whether this is correct or not. This implies both a lack of test coverage and (probably) a lack of understanding.

Theme 2 - This is very complicated code, and the changes are broad enough to be hard to track.

I am not objecting to this landing. It's probably more correct than what's in tree. I'm just saying that even with this, we probably have uncaught issues here. (Or at least, the code structure doesn't convince me that we *don't*.) I am a strong proponent of reducing code complexity until it can't be simplified further as a means to flush out bugs.

Yeah, good points. I hope we can do further work on this code to simplify it and increase readability so we can reduce the barrier to entry somewhat. I don't have any plans for Theme 2, really, but for 1) I am going to root out the untested code and work out whether it's a lack of coverage or whether it's logically impossible to reach.

reames mentioned this in D126403: [RISCV] reorganize getFrameIndexReference to reduce code duplication [nfc]. May 25 2022, 11:45 AM

frasercrmck mentioned this in D126465: [RISCV] Use knowledge of VLEN to avoid over-aligning the stack. May 26 2022, 7:08 AM

Revision Contents

Path                                     Size

llvm/lib/Target/RISCV/
  RISCVFrameLowering.h                   5 lines
  RISCVFrameLowering.cpp                 155 lines
  RISCVMachineFunctionInfo.h             5 lines

llvm/test/CodeGen/RISCV/rvv/
  access-fixed-objects-by-rvv.ll         2 lines
  addi-scalable-offset.mir               3 lines
  allocate-lmul-2-4-8.ll                 417 lines
  calling-conv-fastcc.ll                 136 lines
  calling-conv.ll                        32 lines
  emergency-slot.mir                     2 lines
  fixed-vectors-insert-subvector.ll      12 lines
  fixed-vectors-vpscatter.ll             8 lines
  localvar.ll                            72 lines
  memory-args.ll                         28 lines
  no-reserved-frame.ll                   23 lines
  rv32-spill-vector-csr.ll               20 lines
  rv32-spill-vector.ll                   8 lines
  rv32-spill-zvlsseg.ll                  4 lines
  rv64-spill-vector-csr.ll               16 lines
  rv64-spill-vector.ll                   4 lines
  rv64-spill-zvlsseg.ll                  4 lines
  rvv-args-by-mem.ll                     18 lines
  rvv-framelayout.ll                     42 lines
  rvv-stack-align.mir                    64 lines
  scalar-stack-align.ll                  20 lines
  wrong-stack-offset-for-rvv-object.mir  37 lines
  wrong-stack-slot-rv32.mir              2 lines
  wrong-stack-slot-rv64.mir              10 lines

Diff 431592

llvm/lib/Target/RISCV/RISCVFrameLowering.h

(24 lines not shown)
   explicit RISCVFrameLowering(const RISCVSubtarget &STI)
       : TargetFrameLowering(StackGrowsDown,
                             /*StackAlignment=*/Align(16),
                             /*LocalAreaOffset=*/0),
         STI(STI) {}

   void emitPrologue(MachineFunction &MF, MachineBasicBlock &MBB) const override;
   void emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const override;

+  uint64_t getStackSizeWithRVVPadding(const MachineFunction &MF) const;
+
   StackOffset getFrameIndexReference(const MachineFunction &MF, int FI,
                                      Register &FrameReg) const override;

   void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs,
                             RegScavenger *RS) const override;

   void processFunctionBeforeFrameFinalized(MachineFunction &MF,
                                            RegScavenger *RS) const override;
(33 lines not shown)
 private:
   void determineFrameLayout(MachineFunction &MF) const;
   void adjustReg(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
                  const DebugLoc &DL, Register DestReg, Register SrcReg,
                  int64_t Val, MachineInstr::MIFlag Flag) const;
   void adjustStackForRVV(MachineFunction &MF, MachineBasicBlock &MBB,
                          MachineBasicBlock::iterator MBBI, const DebugLoc &DL,
                          int64_t Amount, MachineInstr::MIFlag Flag) const;
-  int64_t assignRVVStackObjectOffsets(MachineFrameInfo &MFI) const;
+  std::pair<int64_t, Align>
+  assignRVVStackObjectOffsets(MachineFrameInfo &MFI) const;
 };
 }
 #endif

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp

Show First 20 Lines • Show All 244 Lines • ▼ Show 20 Lines	return (MFI.hasVarSizedObjects() \|\|
(!hasReservedCallFrame(MF) && (!MFI.isMaxCallFrameSizeComputed() \|\|		(!hasReservedCallFrame(MF) && (!MFI.isMaxCallFrameSizeComputed() \|\|
MFI.getMaxCallFrameSize() != 0))) &&		MFI.getMaxCallFrameSize() != 0))) &&
TRI->hasStackRealignment(MF);		TRI->hasStackRealignment(MF);
}		}

// Determines the size of the frame and maximum call frame size.		// Determines the size of the frame and maximum call frame size.
void RISCVFrameLowering::determineFrameLayout(MachineFunction &MF) const {		void RISCVFrameLowering::determineFrameLayout(MachineFunction &MF) const {
MachineFrameInfo &MFI = MF.getFrameInfo();		MachineFrameInfo &MFI = MF.getFrameInfo();
		auto *RVFI = MF.getInfo<RISCVMachineFunctionInfo>();

// Get the number of bytes to allocate from the FrameInfo.		// Get the number of bytes to allocate from the FrameInfo.
uint64_t FrameSize = MFI.getStackSize();		uint64_t FrameSize = MFI.getStackSize();

// Get the alignment.		// Get the alignment.
Align StackAlign = getStackAlign();		Align StackAlign = getStackAlign();

// Make sure the frame is aligned.		// Make sure the frame is aligned.
FrameSize = alignTo(FrameSize, StackAlign);		FrameSize = alignTo(FrameSize, StackAlign);

// Update frame info.		// Update frame info.
MFI.setStackSize(FrameSize);		MFI.setStackSize(FrameSize);

		// When using SP or BP to access stack objects, we may require extra padding
		// to ensure the bottom of the RVV stack is correctly aligned within the main
		// stack. We calculate this as the amount required to align the scalar local
		// variable section up to the RVV alignment.
		const TargetRegisterInfo *TRI = STI.getRegisterInfo();
		if (RVFI->getRVVStackSize() && (!hasFP(MF) \|\| TRI->hasStackRealignment(MF))) {
		int ScalarLocalVarSize = FrameSize - RVFI->getCalleeSavedStackSize() -
		RVFI->getVarArgsSaveSize();
		if (auto RVVPadding =
		offsetToAlignment(ScalarLocalVarSize, RVFI->getRVVStackAlign()))
		RVFI->setRVVPadding(RVVPadding);
		}
		}

		// Returns the stack size including RVV padding (when required), rounded back
		// up to the required stack alignment.
		uint64_t RISCVFrameLowering::getStackSizeWithRVVPadding(
		const MachineFunction &MF) const {
		const MachineFrameInfo &MFI = MF.getFrameInfo();
		auto *RVFI = MF.getInfo<RISCVMachineFunctionInfo>();
		return alignTo(MFI.getStackSize() + RVFI->getRVVPadding(), getStackAlign());
}		}

void RISCVFrameLowering::adjustReg(MachineBasicBlock &MBB,		void RISCVFrameLowering::adjustReg(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MBBI,		MachineBasicBlock::iterator MBBI,
const DebugLoc &DL, Register DestReg,		const DebugLoc &DL, Register DestReg,
Register SrcReg, int64_t Val,		Register SrcReg, int64_t Val,
MachineInstr::MIFlag Flag) const {		MachineInstr::MIFlag Flag) const {
MachineRegisterInfo &MRI = MBB.getParent()->getRegInfo();		MachineRegisterInfo &MRI = MBB.getParent()->getRegInfo();
[123 lines not shown]	if (int LibCallRegs = getLibCallID(MF, MFI.getCalleeSavedInfo()) + 1) {
// Calculate the size of the frame managed by the libcall. The libcalls are		// Calculate the size of the frame managed by the libcall. The libcalls are
// implemented such that the stack will always be 16 byte aligned.		// implemented such that the stack will always be 16 byte aligned.
unsigned LibCallFrameSize = alignTo((STI.getXLen() / 8) * LibCallRegs, 16);		unsigned LibCallFrameSize = alignTo((STI.getXLen() / 8) * LibCallRegs, 16);
RVFI->setLibCallStackSize(LibCallFrameSize);		RVFI->setLibCallStackSize(LibCallFrameSize);
}		}

// FIXME (note copied from Lanai): This appears to be overallocating. Needs		// FIXME (note copied from Lanai): This appears to be overallocating. Needs
// investigation. Get the number of bytes to allocate from the FrameInfo.		// investigation. Get the number of bytes to allocate from the FrameInfo.
uint64_t StackSize = MFI.getStackSize() + RVFI->getRVVPadding();		uint64_t StackSize = getStackSizeWithRVVPadding(MF);
uint64_t RealStackSize = StackSize + RVFI->getLibCallStackSize();		uint64_t RealStackSize = StackSize + RVFI->getLibCallStackSize();
uint64_t RVVStackSize = RVFI->getRVVStackSize();		uint64_t RVVStackSize = RVFI->getRVVStackSize();

// Early exit if there is no need to allocate on the stack		// Early exit if there is no need to allocate on the stack
if (RealStackSize == 0 && !MFI.adjustsStack() && RVVStackSize == 0)		if (RealStackSize == 0 && !MFI.adjustsStack() && RVVStackSize == 0)
return;		return;

// If the stack pointer has been marked as reserved, then produce an error if		// If the stack pointer has been marked as reserved, then produce an error if
[64 lines not shown]	unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::cfiDefCfa(
nullptr, RI->getDwarfRegNum(FPReg, true), RVFI->getVarArgsSaveSize()));		nullptr, RI->getDwarfRegNum(FPReg, true), RVFI->getVarArgsSaveSize()));
BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))		BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
.addCFIIndex(CFIIndex)		.addCFIIndex(CFIIndex)
.setMIFlag(MachineInstr::FrameSetup);		.setMIFlag(MachineInstr::FrameSetup);
}		}

// Emit the second SP adjustment after saving callee saved registers.		// Emit the second SP adjustment after saving callee saved registers.
if (FirstSPAdjustAmount) {		if (FirstSPAdjustAmount) {
uint64_t SecondSPAdjustAmount = MFI.getStackSize() - FirstSPAdjustAmount;		uint64_t SecondSPAdjustAmount =
		getStackSizeWithRVVPadding(MF) - FirstSPAdjustAmount;
assert(SecondSPAdjustAmount > 0 &&		assert(SecondSPAdjustAmount > 0 &&
"SecondSPAdjustAmount should be greater than zero");		"SecondSPAdjustAmount should be greater than zero");
adjustReg(MBB, MBBI, DL, SPReg, SPReg, -SecondSPAdjustAmount,		adjustReg(MBB, MBBI, DL, SPReg, SPReg, -SecondSPAdjustAmount,
MachineInstr::FrameSetup);		MachineInstr::FrameSetup);

// If we are using a frame-pointer, and thus emitted ".cfi_def_cfa fp, 0",		// If we are using a frame-pointer, and thus emitted ".cfi_def_cfa fp, 0",
// don't emit an sp-based .cfi_def_cfa_offset		// don't emit an sp-based .cfi_def_cfa_offset
if (!hasFP(MF)) {		if (!hasFP(MF)) {
// Emit ".cfi_def_cfa_offset StackSize"		// Emit ".cfi_def_cfa_offset StackSize"
unsigned CFIIndex = MF.addFrameInst(		unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::cfiDefCfaOffset(
MCCFIInstruction::cfiDefCfaOffset(nullptr, MFI.getStackSize()));		nullptr, getStackSizeWithRVVPadding(MF)));
BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))		BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
.addCFIIndex(CFIIndex)		.addCFIIndex(CFIIndex)
.setMIFlag(MachineInstr::FrameSetup);		.setMIFlag(MachineInstr::FrameSetup);
}		}
}		}

if (RVVStackSize)		if (RVVStackSize)
adjustStackForRVV(MF, MBB, MBBI, DL, -RVVStackSize,		adjustStackForRVV(MF, MBB, MBBI, DL, -RVVStackSize,
[73 lines not shown]	void RISCVFrameLowering::emitEpilogue(MachineFunction &MF,

// Skip to before the restores of callee-saved registers		// Skip to before the restores of callee-saved registers
// FIXME: assumes exactly one instruction is used to restore each		// FIXME: assumes exactly one instruction is used to restore each
// callee-saved register.		// callee-saved register.
auto LastFrameDestroy = MBBI;		auto LastFrameDestroy = MBBI;
if (!CSI.empty())		if (!CSI.empty())
LastFrameDestroy = std::prev(MBBI, CSI.size());		LastFrameDestroy = std::prev(MBBI, CSI.size());

uint64_t StackSize = MFI.getStackSize() + RVFI->getRVVPadding();		uint64_t StackSize = getStackSizeWithRVVPadding(MF);
uint64_t RealStackSize = StackSize + RVFI->getLibCallStackSize();		uint64_t RealStackSize = StackSize + RVFI->getLibCallStackSize();
uint64_t FPOffset = RealStackSize - RVFI->getVarArgsSaveSize();		uint64_t FPOffset = RealStackSize - RVFI->getVarArgsSaveSize();
uint64_t RVVStackSize = RVFI->getRVVStackSize();		uint64_t RVVStackSize = RVFI->getRVVStackSize();

// Restore the stack pointer using the value of the frame pointer. Only		// Restore the stack pointer using the value of the frame pointer. Only
// necessary if the stack pointer was modified, meaning the stack size is		// necessary if the stack pointer was modified, meaning the stack size is
// unknown.		// unknown.
if (RI->hasStackRealignment(MF) || MFI.hasVarSizedObjects()) {		if (RI->hasStackRealignment(MF) || MFI.hasVarSizedObjects()) {
assert(hasFP(MF) && "frame pointer should not have been eliminated");		assert(hasFP(MF) && "frame pointer should not have been eliminated");
adjustReg(MBB, LastFrameDestroy, DL, SPReg, FPReg, -FPOffset,		adjustReg(MBB, LastFrameDestroy, DL, SPReg, FPReg, -FPOffset,
MachineInstr::FrameDestroy);		MachineInstr::FrameDestroy);
} else {		} else {
if (RVVStackSize)		if (RVVStackSize)
adjustStackForRVV(MF, MBB, LastFrameDestroy, DL, RVVStackSize,		adjustStackForRVV(MF, MBB, LastFrameDestroy, DL, RVVStackSize,
MachineInstr::FrameDestroy);		MachineInstr::FrameDestroy);
}		}

uint64_t FirstSPAdjustAmount = getFirstSPAdjustAmount(MF);		uint64_t FirstSPAdjustAmount = getFirstSPAdjustAmount(MF);
if (FirstSPAdjustAmount) {		if (FirstSPAdjustAmount) {
uint64_t SecondSPAdjustAmount = MFI.getStackSize() - FirstSPAdjustAmount;		uint64_t SecondSPAdjustAmount =
		getStackSizeWithRVVPadding(MF) - FirstSPAdjustAmount;
assert(SecondSPAdjustAmount > 0 &&		assert(SecondSPAdjustAmount > 0 &&
"SecondSPAdjustAmount should be greater than zero");		"SecondSPAdjustAmount should be greater than zero");

adjustReg(MBB, LastFrameDestroy, DL, SPReg, SPReg, SecondSPAdjustAmount,		adjustReg(MBB, LastFrameDestroy, DL, SPReg, SPReg, SecondSPAdjustAmount,
MachineInstr::FrameDestroy);		MachineInstr::FrameDestroy);
}		}

if (FirstSPAdjustAmount)		if (FirstSPAdjustAmount)
[41 lines not shown]	RISCVFrameLowering::getFrameIndexReference(const MachineFunction &MF, int FI,
}		}

if (FI >= MinCSFI && FI <= MaxCSFI) {		if (FI >= MinCSFI && FI <= MaxCSFI) {
FrameReg = RISCV::X2;		FrameReg = RISCV::X2;

if (FirstSPAdjustAmount)		if (FirstSPAdjustAmount)
Offset += StackOffset::getFixed(FirstSPAdjustAmount);		Offset += StackOffset::getFixed(FirstSPAdjustAmount);
else		else
Offset +=		Offset += StackOffset::getFixed(getStackSizeWithRVVPadding(MF));
StackOffset::getFixed(MFI.getStackSize() + RVFI->getRVVPadding());
} else if (RI->hasStackRealignment(MF) && !MFI.isFixedObjectIndex(FI)) {		} else if (RI->hasStackRealignment(MF) && !MFI.isFixedObjectIndex(FI)) {
// If the stack was realigned, the frame pointer is set in order to allow		// If the stack was realigned, the frame pointer is set in order to allow
// SP to be restored, so we need another base register to record the stack		// SP to be restored, so we need another base register to record the stack
// after realignment.		// after realignment.
if (hasBP(MF)) {		if (hasBP(MF)) {
FrameReg = RISCVABI::getBPReg();		FrameReg = RISCVABI::getBPReg();
// |--------------------------| -- <-- FP		// |--------------------------| -- <-- FP
// | callee-allocated save | | <----|		// | callee-allocated save | | <----|
// | area for register varargs| | |		// | area for register varargs| | |
// |--------------------------| | |		// |--------------------------| | |
// | callee-saved registers | | |		// | callee-saved registers | | |
// |--------------------------| -- |		// |--------------------------| -- |
// | realignment (the size of | | |		// | realignment (the size of | | |
// | this area is not counted | | |		// | this area is not counted | | |
// | in MFI.getStackSize()) | | |		// | in MFI.getStackSize()) | | |
// |--------------------------| -- |
// | Padding after RVV | | |
// | (not counted in | | |
// | MFI.getStackSize()) | | |
// |--------------------------| -- |-- MFI.getStackSize()		// |--------------------------| -- |-- MFI.getStackSize()
		// | RVV alignment padding | | |
		// | (not counted in | | |
		// | MFI.getStackSize() but | | |
		// | counted in | | |
		// | RVFI.getRVVStackSize()) | | |
		// |--------------------------| -- |
// | RVV objects | | |		// | RVV objects | | |
// | (not counted in | | |		// | (not counted in | | |
// | MFI.getStackSize()) | | |		// | MFI.getStackSize()) | | |
// |--------------------------| -- |		// |--------------------------| -- |
// | Padding before RVV | | |		// | padding before RVV | | |
// | (not counted in | | |		// | (not counted in | | |
// | MFI.getStackSize()) | | |		// | MFI.getStackSize() or in | | |
		// | RVFI.getRVVStackSize()) | | |
// |--------------------------| -- |		// |--------------------------| -- |
// | scalar local variables | | <----'		// | scalar local variables | | <----'
// |--------------------------| -- <-- BP		// |--------------------------| -- <-- BP
// | VarSize objects | |		// | VarSize objects | |
// |--------------------------| -- <-- SP		// |--------------------------| -- <-- SP
} else {		} else {
FrameReg = RISCV::X2;		FrameReg = RISCV::X2;
// |--------------------------| -- <-- FP		// |--------------------------| -- <-- FP
// | callee-allocated save | | <----|		// | callee-allocated save | | <----|
// | area for register varargs| | |		// | area for register varargs| | |
// |--------------------------| | |		// |--------------------------| | |
// | callee-saved registers | | |		// | callee-saved registers | | |
// |--------------------------| -- |		// |--------------------------| -- |
// | realignment (the size of | | |		// | realignment (the size of | | |
// | this area is not counted | | |		// | this area is not counted | | |
// | in MFI.getStackSize()) | | |		// | in MFI.getStackSize()) | | |
// |--------------------------| -- |		// |--------------------------| -- |
// | Padding after RVV | | |		// | RVV alignment padding | | |
// | (not counted in | | |		// | (not counted in | | |
// | MFI.getStackSize()) | | |		// | MFI.getStackSize() but | | |
		// | counted in | | |
		// | RVFI.getRVVStackSize()) | | |
// |--------------------------| -- |-- MFI.getStackSize()		// |--------------------------| -- |-- MFI.getStackSize()
// | RVV objects | | |		// | RVV objects | | |
// | (not counted in | | |		// | (not counted in | | |
// | MFI.getStackSize()) | | |		// | MFI.getStackSize()) | | |
// |--------------------------| -- |		// |--------------------------| -- |
// | Padding before RVV | | |		// | padding before RVV | | |
// | (not counted in | | |		// | (not counted in | | |
// | MFI.getStackSize()) | | |		// | MFI.getStackSize() or in | | |
		// | RVFI.getRVVStackSize()) | | |
// |--------------------------| -- |		// |--------------------------| -- |
// | scalar local variables | | <----'		// | scalar local variables | | <----'
// |--------------------------| -- <-- SP		// |--------------------------| -- <-- SP
		StephenFan (inline comment, Not Done): If I understand correctly, SP here is assumed to be aligned to MaxAlign, the maximum alignment of all stack objects (including RVV stack objects). However, https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/MachineFrameInfo.cpp#L61 only records the maximum alignment of non-scalable stack objects.
		frasercrmck (author, inline comment, Done): Yeah, that's right. This patch by itself isn't enough: we have to update MaxAlign ourselves with the RVV objects. You probably saw that in D125973. That said, some scalable objects are included in the max align, as you can already see in `rvv-stack-align.mir`, which does realign the stack to 32 if there's a 32-byte aligned scalable-vector alloca. I haven't dug into why some are included and some aren't, but I don't think it affects our correctness once D125973 is merged. I don't think it makes much sense to merge D125973 first: without this patch, correctly updating the max align does nothing to fix any of the scalable misalignment issues, and I don't think it will solve any non-scalable bugs either. That's why it's based on this one.
}		}
// The total amount of padding surrounding RVV objects is described by		// The total amount of padding surrounding RVV objects is described by
// RVV->getRVVPadding() and it can be zero. It allows us to align the RVV		// RVV->getRVVPadding() and it can be zero. It allows us to align the RVV
// objects to 8 bytes.		// objects to the required alignment.
if (MFI.getStackID(FI) == TargetStackID::Default) {		if (MFI.getStackID(FI) == TargetStackID::Default) {
Offset += StackOffset::getFixed(MFI.getStackSize());		Offset += StackOffset::getFixed(MFI.getStackSize());
if (FI < 0)		if (FI < 0)
Offset += StackOffset::getFixed(RVFI->getLibCallStackSize());		Offset += StackOffset::getFixed(RVFI->getLibCallStackSize());
} else if (MFI.getStackID(FI) == TargetStackID::ScalableVector) {		} else if (MFI.getStackID(FI) == TargetStackID::ScalableVector) {
Offset += StackOffset::get(		// Ensure the base of the RVV stack is correctly aligned: add on the
alignTo(MFI.getStackSize() - RVFI->getCalleeSavedStackSize(), 8),		// alignment padding.
RVFI->getRVVStackSize());		int ScalarLocalVarSize = MFI.getStackSize() -
		RVFI->getCalleeSavedStackSize() +
		RVFI->getRVVPadding();
		Offset += StackOffset::get(ScalarLocalVarSize, RVFI->getRVVStackSize());
}		}
} else {		} else {
FrameReg = RI->getFrameRegister(MF);		FrameReg = RI->getFrameRegister(MF);
if (hasFP(MF)) {		if (hasFP(MF)) {
Offset += StackOffset::getFixed(RVFI->getVarArgsSaveSize());		Offset += StackOffset::getFixed(RVFI->getVarArgsSaveSize());
if (FI >= 0)		if (FI >= 0)
Offset -= StackOffset::getFixed(RVFI->getLibCallStackSize());		Offset -= StackOffset::getFixed(RVFI->getLibCallStackSize());
// When using FP to access scalable vector objects, we need to minus		// When using FP to access scalable vector objects, we need to minus
// the frame size.		// the frame size.
//		//
// |--------------------------| -- <-- FP		// |--------------------------| -- <-- FP
// | callee-allocated save | |		// | callee-allocated save | |
// | area for register varargs| |		// | area for register varargs| |
// |--------------------------| |		// |--------------------------| |
// | callee-saved registers | |		// | callee-saved registers | |
// |--------------------------| | MFI.getStackSize()		// |--------------------------| | MFI.getStackSize()
// | scalar local variables | |		// | scalar local variables | |
// |--------------------------| -- (Offset of RVV objects is from here.)		// |--------------------------| -- (Offset of RVV objects is from here.)
// | RVV objects |		// | RVV objects |
// |--------------------------|		// |--------------------------|
// | VarSize objects |		// | VarSize objects |
// |--------------------------| <-- SP		// |--------------------------| <-- SP
if (MFI.getStackID(FI) == TargetStackID::ScalableVector)		if (MFI.getStackID(FI) == TargetStackID::ScalableVector) {
		// We don't expect any extra RVV alignment padding, as the stack size
		// and RVV object sections should be correctly aligned in their own
		// right.
		assert(MFI.getStackSize() == getStackSizeWithRVVPadding(MF) &&
		"Inconsistent stack layout");
Offset -= StackOffset::getFixed(MFI.getStackSize());		Offset -= StackOffset::getFixed(MFI.getStackSize());
		}
} else {		} else {
// When using SP to access frame objects, we need to add RVV stack size.		// When using SP to access frame objects, we need to add RVV stack size.
//		//
// |--------------------------| -- <-- FP		// |--------------------------| -- <-- FP
// | callee-allocated save | | <----|		// | callee-allocated save | | <----|
// | area for register varargs| | |		// | area for register varargs| | |
// |--------------------------| | |		// |--------------------------| | |
// | callee-saved registers | | |		// | callee-saved registers | | |
// |--------------------------| -- |		// |--------------------------| -- |
// | Padding after RVV | | |		// | RVV alignment padding | | |
// | (not counted in | | |		// | (not counted in | | |
// | MFI.getStackSize()) | | |		// | MFI.getStackSize() but | | |
		// | counted in | | |
		// | RVFI.getRVVStackSize()) | | |
// |--------------------------| -- |		// |--------------------------| -- |
// | RVV objects | | |-- MFI.getStackSize()		// | RVV objects | | |-- MFI.getStackSize()
// | (not counted in | | |		// | (not counted in | | |
// | MFI.getStackSize()) | | |		// | MFI.getStackSize()) | | |
// |--------------------------| -- |		// |--------------------------| -- |
// | Padding before RVV | | |		// | padding before RVV | | |
// | (not counted in | | |		// | (not counted in | | |
// | MFI.getStackSize()) | | |		// | MFI.getStackSize()) | | |
// |--------------------------| -- |		// |--------------------------| -- |
// | scalar local variables | | <----'		// | scalar local variables | | <----'
// |--------------------------| -- <-- SP		// |--------------------------| -- <-- SP
//		//
// The total amount of padding surrounding RVV objects is described by		// The total amount of padding surrounding RVV objects is described by
// RVV->getRVVPadding() and it can be zero. It allows us to align the RVV		// RVV->getRVVPadding() and it can be zero. It allows us to align the RVV
// objects to 8 bytes.		// objects to the required alignment.
if (MFI.getStackID(FI) == TargetStackID::Default) {		if (MFI.getStackID(FI) == TargetStackID::Default) {
if (MFI.isFixedObjectIndex(FI)) {		if (MFI.isFixedObjectIndex(FI)) {
Offset +=		Offset += StackOffset::get(getStackSizeWithRVVPadding(MF) +
StackOffset::get(MFI.getStackSize() + RVFI->getRVVPadding() +
RVFI->getLibCallStackSize(),		RVFI->getLibCallStackSize(),
RVFI->getRVVStackSize());		RVFI->getRVVStackSize());
} else {		} else {
Offset += StackOffset::getFixed(MFI.getStackSize());		Offset += StackOffset::getFixed(MFI.getStackSize());
}		}
} else if (MFI.getStackID(FI) == TargetStackID::ScalableVector) {		} else if (MFI.getStackID(FI) == TargetStackID::ScalableVector) {
int ScalarLocalVarSize = MFI.getStackSize() -		// Ensure the base of the RVV stack is correctly aligned: add on the
RVFI->getCalleeSavedStackSize() -		// alignment padding.
RVFI->getVarArgsSaveSize();		int ScalarLocalVarSize =
Offset += StackOffset::get(		MFI.getStackSize() - RVFI->getCalleeSavedStackSize() -
alignTo(ScalarLocalVarSize, 8),		RVFI->getVarArgsSaveSize() + RVFI->getRVVPadding();
RVFI->getRVVStackSize());		Offset += StackOffset::get(ScalarLocalVarSize, RVFI->getRVVStackSize());
}		}
}		}
}		}

return Offset;		return Offset;
}		}

void RISCVFrameLowering::determineCalleeSaves(MachineFunction &MF,		void RISCVFrameLowering::determineCalleeSaves(MachineFunction &MF,
[36 lines not shown]	if (MF.getSubtarget<RISCVSubtarget>().hasStdExtF()) {
if (RISCV::FPR16RegClass.contains(Regs[i]) ||		if (RISCV::FPR16RegClass.contains(Regs[i]) ||
RISCV::FPR32RegClass.contains(Regs[i]) ||		RISCV::FPR32RegClass.contains(Regs[i]) ||
RISCV::FPR64RegClass.contains(Regs[i]))		RISCV::FPR64RegClass.contains(Regs[i]))
SavedRegs.set(Regs[i]);		SavedRegs.set(Regs[i]);
}		}
}		}
}		}

int64_t		std::pair<int64_t, Align>
RISCVFrameLowering::assignRVVStackObjectOffsets(MachineFrameInfo &MFI) const {		RISCVFrameLowering::assignRVVStackObjectOffsets(MachineFrameInfo &MFI) const {
// Create a buffer of RVV objects to allocate.		// Create a buffer of RVV objects to allocate.
SmallVector<int, 8> ObjectsToAllocate;		SmallVector<int, 8> ObjectsToAllocate;
for (int I = 0, E = MFI.getObjectIndexEnd(); I != E; ++I) {		for (int I = 0, E = MFI.getObjectIndexEnd(); I != E; ++I) {
unsigned StackID = MFI.getStackID(I);		unsigned StackID = MFI.getStackID(I);
if (StackID != TargetStackID::ScalableVector)		if (StackID != TargetStackID::ScalableVector)
continue;		continue;
if (MFI.isDeadObjectIndex(I))		if (MFI.isDeadObjectIndex(I))
continue;		continue;

ObjectsToAllocate.push_back(I);		ObjectsToAllocate.push_back(I);
}		}

// Allocate all RVV locals and spills		// Allocate all RVV locals and spills
int64_t Offset = 0;		int64_t Offset = 0;
		// The minimum alignment is 16 bytes.
		Align RVVStackAlign(16);
for (int FI : ObjectsToAllocate) {		for (int FI : ObjectsToAllocate) {
// ObjectSize in bytes.		// ObjectSize in bytes.
int64_t ObjectSize = MFI.getObjectSize(FI);		int64_t ObjectSize = MFI.getObjectSize(FI);
		auto ObjectAlign = std::max(Align(8), MFI.getObjectAlign(FI));
// If the data type is the fractional vector type, reserve one vector		// If the data type is the fractional vector type, reserve one vector
// register for it.		// register for it.
if (ObjectSize < 8)		if (ObjectSize < 8)
ObjectSize = 8;		ObjectSize = 8;
// Currently, all scalable vector types are aligned to 8 bytes.		Offset = alignTo(Offset + ObjectSize, ObjectAlign);
Offset = alignTo(Offset + ObjectSize, 8);
MFI.setObjectOffset(FI, -Offset);		MFI.setObjectOffset(FI, -Offset);
		// Update the maximum alignment of the RVV stack section
		RVVStackAlign = std::max(RVVStackAlign, ObjectAlign);
}		}

return Offset;		// Ensure the alignment of the RVV stack. Since we want the most-aligned
		// object right at the bottom (i.e., any padding at the top of the frame),
		// readjust all RVV objects down by the alignment padding.
		uint64_t StackSize = Offset;
		if (auto AlignmentPadding = offsetToAlignment(StackSize, RVVStackAlign)) {
		StackSize += AlignmentPadding;
		for (int FI : ObjectsToAllocate)
		MFI.setObjectOffset(FI, MFI.getObjectOffset(FI) - AlignmentPadding);
		}

		return std::make_pair(StackSize, RVVStackAlign);
}		}

static bool hasRVVSpillWithFIs(MachineFunction &MF, const RISCVInstrInfo &TII) {		static bool hasRVVSpillWithFIs(MachineFunction &MF, const RISCVInstrInfo &TII) {
if (!MF.getSubtarget<RISCVSubtarget>().hasVInstructions())		if (!MF.getSubtarget<RISCVSubtarget>().hasVInstructions())
return false;		return false;
return any_of(MF, [&TII](const MachineBasicBlock &MBB) {		return any_of(MF, [&TII](const MachineBasicBlock &MBB) {
return any_of(MBB, [&TII](const MachineInstr &MI) {		return any_of(MBB, [&TII](const MachineInstr &MI) {
return TII.isRVVSpill(MI, /*CheckFIs*/ true);		return TII.isRVVSpill(MI, /*CheckFIs*/ true);
});		});
});		});
}		}

void RISCVFrameLowering::processFunctionBeforeFrameFinalized(		void RISCVFrameLowering::processFunctionBeforeFrameFinalized(
MachineFunction &MF, RegScavenger *RS) const {		MachineFunction &MF, RegScavenger *RS) const {
const RISCVRegisterInfo *RegInfo =		const RISCVRegisterInfo *RegInfo =
MF.getSubtarget<RISCVSubtarget>().getRegisterInfo();		MF.getSubtarget<RISCVSubtarget>().getRegisterInfo();
MachineFrameInfo &MFI = MF.getFrameInfo();		MachineFrameInfo &MFI = MF.getFrameInfo();
const TargetRegisterClass *RC = &RISCV::GPRRegClass;		const TargetRegisterClass *RC = &RISCV::GPRRegClass;
auto *RVFI = MF.getInfo<RISCVMachineFunctionInfo>();		auto *RVFI = MF.getInfo<RISCVMachineFunctionInfo>();

int64_t RVVStackSize = assignRVVStackObjectOffsets(MFI);		int64_t RVVStackSize;
		Align RVVStackAlign;
		std::tie(RVVStackSize, RVVStackAlign) = assignRVVStackObjectOffsets(MFI);

RVFI->setRVVStackSize(RVVStackSize);		RVFI->setRVVStackSize(RVVStackSize);
		RVFI->setRVVStackAlign(RVVStackAlign);

const RISCVInstrInfo &TII = *MF.getSubtarget<RISCVSubtarget>().getInstrInfo();		const RISCVInstrInfo &TII = *MF.getSubtarget<RISCVSubtarget>().getInstrInfo();

// estimateStackSize has been observed to under-estimate the final stack		// estimateStackSize has been observed to under-estimate the final stack
// size, so give ourselves wiggle-room by checking for stack size		// size, so give ourselves wiggle-room by checking for stack size
// representable in an 11-bit signed field rather than 12 bits.		// representable in an 11-bit signed field rather than 12 bits.
// FIXME: It may be possible to craft a function with a small stack that		// FIXME: It may be possible to craft a function with a small stack that
// still needs an emergency spill slot for branch relaxation. This case		// still needs an emergency spill slot for branch relaxation. This case
// would currently be missed.		// would currently be missed.
[22 lines not shown]	void RISCVFrameLowering::processFunctionBeforeFrameFinalized(
for (const auto &Info : MFI.getCalleeSavedInfo()) {		for (const auto &Info : MFI.getCalleeSavedInfo()) {
int FrameIdx = Info.getFrameIdx();		int FrameIdx = Info.getFrameIdx();
if (MFI.getStackID(FrameIdx) != TargetStackID::Default)		if (MFI.getStackID(FrameIdx) != TargetStackID::Default)
continue;		continue;

Size += MFI.getObjectSize(FrameIdx);		Size += MFI.getObjectSize(FrameIdx);
}		}
RVFI->setCalleeSavedStackSize(Size);		RVFI->setCalleeSavedStackSize(Size);

// Padding required to keep the RVV stack aligned to 8 bytes within the main
// stack. We only need this when using SP or BP to access stack objects.
const TargetRegisterInfo *TRI = STI.getRegisterInfo();
if (RVVStackSize && (!hasFP(MF) || TRI->hasStackRealignment(MF)) &&
Size % 8 != 0) {
// Because we add the padding to the size of the stack, adding
// getStackAlign() will keep it aligned.
RVFI->setRVVPadding(getStackAlign().value());
}
}		}

static bool hasRVVFrameObject(const MachineFunction &MF) {		static bool hasRVVFrameObject(const MachineFunction &MF) {
// Originally, the function will scan all the stack objects to check whether		// Originally, the function will scan all the stack objects to check whether
// if there is any scalable vector object on the stack or not. However, it		// if there is any scalable vector object on the stack or not. However, it
// causes errors in the register allocator. In issue 53016, it returns false		// causes errors in the register allocator. In issue 53016, it returns false
// before RA because there is no RVV stack objects. After RA, it returns true		// before RA because there is no RVV stack objects. After RA, it returns true
// because there are spilling slots for RVV values during RA. It will not		// because there are spilling slots for RVV values during RA. It will not
[58 lines not shown]
// sw s3,2012(sp)		// sw s3,2012(sp)
// sw s4,2008(sp)		// sw s4,2008(sp)
// add sp,sp,-64		// add sp,sp,-64
uint64_t		uint64_t
RISCVFrameLowering::getFirstSPAdjustAmount(const MachineFunction &MF) const {		RISCVFrameLowering::getFirstSPAdjustAmount(const MachineFunction &MF) const {
const auto *RVFI = MF.getInfo<RISCVMachineFunctionInfo>();		const auto *RVFI = MF.getInfo<RISCVMachineFunctionInfo>();
const MachineFrameInfo &MFI = MF.getFrameInfo();		const MachineFrameInfo &MFI = MF.getFrameInfo();
const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();		const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
uint64_t StackSize = MFI.getStackSize();		uint64_t StackSize = getStackSizeWithRVVPadding(MF);

// Disable SplitSPAdjust if save-restore libcall is used. The callee-saved		// Disable SplitSPAdjust if save-restore libcall is used. The callee-saved
// registers will be pushed by the save-restore libcalls, so we don't have to		// registers will be pushed by the save-restore libcalls, so we don't have to
// split the SP adjustment in this case.		// split the SP adjustment in this case.
if (RVFI->getLibCallStackSize())		if (RVFI->getLibCallStackSize())
return 0;		return 0;

// Return the FirstSPAdjustAmount if the StackSize can not fit in a signed		// Return the FirstSPAdjustAmount if the StackSize can not fit in a signed
[157 lines not shown]
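
The padding arithmetic in `determineFrameLayout` and `getStackSizeWithRVVPadding` above can be checked on concrete numbers. The following is a minimal standalone sketch, not the LLVM code: `alignUp`, `paddingTo`, and the `FrameSizes` struct are hypothetical stand-ins for `llvm::alignTo`, `llvm::offsetToAlignment`, and the values read from `MachineFrameInfo` and `RISCVMachineFunctionInfo`. It also omits the patch's guard that padding is only computed when an RVV stack exists and SP or BP is used to address it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::alignTo; Align must be a power of two.
inline uint64_t alignUp(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) & ~(Align - 1);
}

// Hypothetical stand-in for llvm::offsetToAlignment: the number of bytes
// that must be added to Value to reach the next multiple of Align.
inline uint64_t paddingTo(uint64_t Value, uint64_t Align) {
  return alignUp(Value, Align) - Value;
}

// Illustrative container for the quantities the patch reads from
// MachineFrameInfo and RISCVMachineFunctionInfo.
struct FrameSizes {
  uint64_t StackSize;       // MFI.getStackSize(), already 16-byte aligned
  uint64_t CalleeSavedSize; // RVFI->getCalleeSavedStackSize()
  uint64_t VarArgsSaveSize; // RVFI->getVarArgsSaveSize()
  uint64_t RVVStackAlign;   // RVFI->getRVVStackAlign(), at least 16
};

// Padding that aligns the scalar local variable section up to the RVV
// alignment, as in the new block of determineFrameLayout.
inline uint64_t rvvPadding(const FrameSizes &F) {
  uint64_t ScalarLocalVarSize =
      F.StackSize - F.CalleeSavedSize - F.VarArgsSaveSize;
  return paddingTo(ScalarLocalVarSize, F.RVVStackAlign);
}

// Stack size including the RVV padding, rounded back up to the 16-byte
// stack alignment, as in getStackSizeWithRVVPadding.
inline uint64_t stackSizeWithRVVPadding(const FrameSizes &F) {
  return alignUp(F.StackSize + rvvPadding(F), 16);
}
```

For a 32-byte frame with 8 bytes of callee saves and a 16-byte-aligned RVV section, the scalar area is 24 bytes, so 8 bytes of padding are needed and the padded stack size rounds up from 40 to 48 bytes.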

llvm/lib/Target/RISCV/RISCVMachineFunctionInfo.h

[51 lines not shown]	private:
int VarArgsSaveSize = 0;		int VarArgsSaveSize = 0;
/// FrameIndex used for transferring values between 64-bit FPRs and a pair		/// FrameIndex used for transferring values between 64-bit FPRs and a pair
/// of 32-bit GPRs via the stack.		/// of 32-bit GPRs via the stack.
int MoveF64FrameIndex = -1;		int MoveF64FrameIndex = -1;
/// Size of any opaque stack adjustment due to save/restore libcalls.		/// Size of any opaque stack adjustment due to save/restore libcalls.
unsigned LibCallStackSize = 0;		unsigned LibCallStackSize = 0;
/// Size of RVV stack.		/// Size of RVV stack.
uint64_t RVVStackSize = 0;		uint64_t RVVStackSize = 0;
		/// Alignment of RVV stack.
		Align RVVStackAlign;
/// Padding required to keep RVV stack aligned within the main stack.		/// Padding required to keep RVV stack aligned within the main stack.
uint64_t RVVPadding = 0;		uint64_t RVVPadding = 0;
/// Size of stack frame to save callee saved registers		/// Size of stack frame to save callee saved registers
unsigned CalleeSavedStackSize = 0;		unsigned CalleeSavedStackSize = 0;

public:		public:
RISCVMachineFunctionInfo(const MachineFunction &MF) {}		RISCVMachineFunctionInfo(const MachineFunction &MF) {}

[19 lines not shown]	bool useSaveRestoreLibCalls(const MachineFunction &MF) const {
return MF.getSubtarget<RISCVSubtarget>().enableSaveRestore() &&		return MF.getSubtarget<RISCVSubtarget>().enableSaveRestore() &&
VarArgsSaveSize == 0 && !MF.getFrameInfo().hasTailCall() &&		VarArgsSaveSize == 0 && !MF.getFrameInfo().hasTailCall() &&
!MF.getFunction().hasFnAttribute("interrupt");		!MF.getFunction().hasFnAttribute("interrupt");
}		}

uint64_t getRVVStackSize() const { return RVVStackSize; }		uint64_t getRVVStackSize() const { return RVVStackSize; }
void setRVVStackSize(uint64_t Size) { RVVStackSize = Size; }		void setRVVStackSize(uint64_t Size) { RVVStackSize = Size; }

		Align getRVVStackAlign() const { return RVVStackAlign; }
		void setRVVStackAlign(Align StackAlign) { RVVStackAlign = StackAlign; }

uint64_t getRVVPadding() const { return RVVPadding; }		uint64_t getRVVPadding() const { return RVVPadding; }
void setRVVPadding(uint64_t Padding) { RVVPadding = Padding; }		void setRVVPadding(uint64_t Padding) { RVVPadding = Padding; }

unsigned getCalleeSavedStackSize() const { return CalleeSavedStackSize; }		unsigned getCalleeSavedStackSize() const { return CalleeSavedStackSize; }
void setCalleeSavedStackSize(unsigned Size) { CalleeSavedStackSize = Size; }		void setCalleeSavedStackSize(unsigned Size) { CalleeSavedStackSize = Size; }

void initializeBaseYamlFields(const yaml::RISCVMachineFunctionInfo &YamlMFI);		void initializeBaseYamlFields(const yaml::RISCVMachineFunctionInfo &YamlMFI);
};		};

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_LIB_TARGET_RISCV_RISCVMACHINEFUNCTIONINFO_H		#endif // LLVM_LIB_TARGET_RISCV_RISCVMACHINEFUNCTIONINFO_H

llvm/test/CodeGen/RISCV/rvv/access-fixed-objects-by-rvv.ll

[... 28 lines not shown ...]
declare <vscale x 1 x i64> @llvm.riscv.vadd.nxv1i64.nxv1i64(		declare <vscale x 1 x i64> @llvm.riscv.vadd.nxv1i64.nxv1i64(
i64);		i64);

define <vscale x 1 x i64> @access_fixed_and_vector_objects(i64 *%val) {		define <vscale x 1 x i64> @access_fixed_and_vector_objects(i64 *%val) {
; RV64IV-LABEL: access_fixed_and_vector_objects:		; RV64IV-LABEL: access_fixed_and_vector_objects:
; RV64IV: # %bb.0:		; RV64IV: # %bb.0:
; RV64IV-NEXT: addi sp, sp, -544		; RV64IV-NEXT: addi sp, sp, -544
; RV64IV-NEXT: .cfi_def_cfa_offset 544		; RV64IV-NEXT: .cfi_def_cfa_offset 544
; RV64IV-NEXT: csrr a0, vlenb		; RV64IV-NEXT: csrr a0, vlenb
		; RV64IV-NEXT: slli a0, a0, 1
; RV64IV-NEXT: sub sp, sp, a0		; RV64IV-NEXT: sub sp, sp, a0
; RV64IV-NEXT: addi a0, sp, 24		; RV64IV-NEXT: addi a0, sp, 24
; RV64IV-NEXT: vl1re64.v v8, (a0)		; RV64IV-NEXT: vl1re64.v v8, (a0)
; RV64IV-NEXT: ld a0, 536(sp)		; RV64IV-NEXT: ld a0, 536(sp)
; RV64IV-NEXT: addi a1, sp, 544		; RV64IV-NEXT: addi a1, sp, 544
; RV64IV-NEXT: vl1re64.v v9, (a1)		; RV64IV-NEXT: vl1re64.v v9, (a1)
; RV64IV-NEXT: vsetvli zero, a0, e64, m1, ta, mu		; RV64IV-NEXT: vsetvli zero, a0, e64, m1, ta, mu
; RV64IV-NEXT: vadd.vv v8, v8, v9		; RV64IV-NEXT: vadd.vv v8, v8, v9
; RV64IV-NEXT: csrr a0, vlenb		; RV64IV-NEXT: csrr a0, vlenb
		; RV64IV-NEXT: slli a0, a0, 1
; RV64IV-NEXT: add sp, sp, a0		; RV64IV-NEXT: add sp, sp, a0
; RV64IV-NEXT: addi sp, sp, 544		; RV64IV-NEXT: addi sp, sp, 544
; RV64IV-NEXT: ret		; RV64IV-NEXT: ret
%local = alloca i64		%local = alloca i64
%vector = alloca <vscale x 1 x i64>		%vector = alloca <vscale x 1 x i64>
%array = alloca [64 x i64]		%array = alloca [64 x i64]
%vptr = bitcast [64 x i64]* %array to <vscale x 1 x i64>*		%vptr = bitcast [64 x i64]* %array to <vscale x 1 x i64>*
%v1 = load <vscale x 1 x i64>, <vscale x 1 x i64>* %vptr		%v1 = load <vscale x 1 x i64>, <vscale x 1 x i64>* %vptr
[... 11 lines not shown ...]

llvm/test/CodeGen/RISCV/rvv/addi-scalable-offset.mir

[... 32 lines not shown ...]
bb.0:		bb.0:
; CHECK-NEXT: SD killed $x1, $x2, 2024 :: (store (s64) into %stack.3)		; CHECK-NEXT: SD killed $x1, $x2, 2024 :: (store (s64) into %stack.3)
; CHECK-NEXT: SD killed $x8, $x2, 2016 :: (store (s64) into %stack.4)		; CHECK-NEXT: SD killed $x8, $x2, 2016 :: (store (s64) into %stack.4)
; CHECK-NEXT: frame-setup CFI_INSTRUCTION offset $x1, -8		; CHECK-NEXT: frame-setup CFI_INSTRUCTION offset $x1, -8
; CHECK-NEXT: frame-setup CFI_INSTRUCTION offset $x8, -16		; CHECK-NEXT: frame-setup CFI_INSTRUCTION offset $x8, -16
; CHECK-NEXT: $x8 = frame-setup ADDI $x2, 2032		; CHECK-NEXT: $x8 = frame-setup ADDI $x2, 2032
; CHECK-NEXT: frame-setup CFI_INSTRUCTION def_cfa $x8, 0		; CHECK-NEXT: frame-setup CFI_INSTRUCTION def_cfa $x8, 0
; CHECK-NEXT: $x2 = frame-setup ADDI $x2, -240		; CHECK-NEXT: $x2 = frame-setup ADDI $x2, -240
; CHECK-NEXT: $x12 = frame-setup PseudoReadVLENB		; CHECK-NEXT: $x12 = frame-setup PseudoReadVLENB
		; CHECK-NEXT: $x12 = frame-setup SLLI killed $x12, 1
; CHECK-NEXT: $x2 = frame-setup SUB $x2, killed $x12		; CHECK-NEXT: $x2 = frame-setup SUB $x2, killed $x12
; CHECK-NEXT: dead $x0 = PseudoVSETVLI killed renamable $x11, 88 /* e64, m1, ta, mu */, implicit-def $vl, implicit-def $vtype		; CHECK-NEXT: dead $x0 = PseudoVSETVLI killed renamable $x11, 88 /* e64, m1, ta, mu */, implicit-def $vl, implicit-def $vtype
; CHECK-NEXT: renamable $v8 = PseudoVLE64_V_M1 killed renamable $x10, $noreg, 6 /* e64 */, implicit $vl, implicit $vtype :: (load unknown-size from %ir.pa, align 8)		; CHECK-NEXT: renamable $v8 = PseudoVLE64_V_M1 killed renamable $x10, $noreg, 6 /* e64 */, implicit $vl, implicit $vtype :: (load unknown-size from %ir.pa, align 8)
; CHECK-NEXT: $x11 = PseudoReadVLENB		; CHECK-NEXT: $x11 = PseudoReadVLENB
		; CHECK-NEXT: $x11 = SLLI killed $x11, 1
; CHECK-NEXT: $x10 = LUI 1048575		; CHECK-NEXT: $x10 = LUI 1048575
; CHECK-NEXT: $x10 = ADDIW killed $x10, 1824		; CHECK-NEXT: $x10 = ADDIW killed $x10, 1824
; CHECK-NEXT: $x10 = ADD $x8, killed $x10		; CHECK-NEXT: $x10 = ADD $x8, killed $x10
; CHECK-NEXT: $x10 = SUB killed $x10, killed $x11		; CHECK-NEXT: $x10 = SUB killed $x10, killed $x11
; CHECK-NEXT: VS1R_V killed renamable $v8, killed renamable $x10		; CHECK-NEXT: VS1R_V killed renamable $v8, killed renamable $x10
; CHECK-NEXT: $x10 = frame-destroy PseudoReadVLENB		; CHECK-NEXT: $x10 = frame-destroy PseudoReadVLENB
		; CHECK-NEXT: $x10 = frame-destroy SLLI killed $x10, 1
; CHECK-NEXT: $x2 = frame-destroy ADD $x2, killed $x10		; CHECK-NEXT: $x2 = frame-destroy ADD $x2, killed $x10
; CHECK-NEXT: $x2 = frame-destroy ADDI $x2, 240		; CHECK-NEXT: $x2 = frame-destroy ADDI $x2, 240
; CHECK-NEXT: $x1 = LD $x2, 2024 :: (load (s64) from %stack.3)		; CHECK-NEXT: $x1 = LD $x2, 2024 :: (load (s64) from %stack.3)
; CHECK-NEXT: $x8 = LD $x2, 2016 :: (load (s64) from %stack.4)		; CHECK-NEXT: $x8 = LD $x2, 2016 :: (load (s64) from %stack.4)
; CHECK-NEXT: $x2 = frame-destroy ADDI $x2, 2032		; CHECK-NEXT: $x2 = frame-destroy ADDI $x2, 2032
; CHECK-NEXT: PseudoRET		; CHECK-NEXT: PseudoRET
%1:gprnox0 = COPY $x11		%1:gprnox0 = COPY $x11
%0:gpr = COPY $x10		%0:gpr = COPY $x10
%2:vr = PseudoVLE64_V_M1 %0, %1, 6 :: (load unknown-size from %ir.pa, align 8)		%2:vr = PseudoVLE64_V_M1 %0, %1, 6 :: (load unknown-size from %ir.pa, align 8)
%3:gpr = ADDI %stack.2, 0		%3:gpr = ADDI %stack.2, 0
VS1R_V killed %2:vr, %3:gpr		VS1R_V killed %2:vr, %3:gpr
PseudoRET		PseudoRET

...		...

llvm/test/CodeGen/RISCV/rvv/allocate-lmul-2-4-8.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=riscv64 -mattr=+m,+v -verify-machineinstrs < %s \			; RUN: llc -mtriple=riscv64 -mattr=+m,+v -verify-machineinstrs < %s \
	; RUN: \| FileCheck %s --check-prefixes=CHECK,NOZBA			; RUN: \| FileCheck %s
	; RUN: llc -mtriple=riscv64 -mattr=+m,+v,+zba -verify-machineinstrs < %s \			; RUN: llc -mtriple=riscv64 -mattr=+m,+v,+zba -verify-machineinstrs < %s \
	; RUN: \| FileCheck %s --check-prefixes=CHECK,ZBA			; RUN: \| FileCheck %s

	define void @lmul1() nounwind {			define void @lmul1() nounwind {
	; CHECK-LABEL: lmul1:			; CHECK-LABEL: lmul1:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: csrr a0, vlenb			; CHECK-NEXT: csrr a0, vlenb
				; CHECK-NEXT: slli a0, a0, 1
	; CHECK-NEXT: sub sp, sp, a0			; CHECK-NEXT: sub sp, sp, a0
	; CHECK-NEXT: csrr a0, vlenb			; CHECK-NEXT: csrr a0, vlenb
				; CHECK-NEXT: slli a0, a0, 1
	; CHECK-NEXT: add sp, sp, a0			; CHECK-NEXT: add sp, sp, a0
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%v = alloca <vscale x 1 x i64>			%v = alloca <vscale x 1 x i64>
	ret void			ret void
	}			}

	define void @lmul2() nounwind {			define void @lmul2() nounwind {
	; CHECK-LABEL: lmul2:			; CHECK-LABEL: lmul2:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: csrr a0, vlenb			; CHECK-NEXT: csrr a0, vlenb
	; CHECK-NEXT: slli a0, a0, 1			; CHECK-NEXT: slli a0, a0, 1
	; CHECK-NEXT: sub sp, sp, a0			; CHECK-NEXT: sub sp, sp, a0
	; CHECK-NEXT: csrr a0, vlenb			; CHECK-NEXT: csrr a0, vlenb
	; CHECK-NEXT: slli a0, a0, 1			; CHECK-NEXT: slli a0, a0, 1
	; CHECK-NEXT: add sp, sp, a0			; CHECK-NEXT: add sp, sp, a0
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%v = alloca <vscale x 2 x i64>			%v = alloca <vscale x 2 x i64>
	ret void			ret void
	}			}

	define void @lmul4() nounwind {			define void @lmul4() nounwind {
	; CHECK-LABEL: lmul4:			; CHECK-LABEL: lmul4:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: addi sp, sp, -32			; CHECK-NEXT: addi sp, sp, -48
	; CHECK-NEXT: sd ra, 24(sp) # 8-byte Folded Spill			; CHECK-NEXT: sd ra, 40(sp) # 8-byte Folded Spill
	; CHECK-NEXT: sd s0, 16(sp) # 8-byte Folded Spill			; CHECK-NEXT: sd s0, 32(sp) # 8-byte Folded Spill
	; CHECK-NEXT: addi s0, sp, 32			; CHECK-NEXT: addi s0, sp, 48
	; CHECK-NEXT: csrr a0, vlenb			; CHECK-NEXT: csrr a0, vlenb
	; CHECK-NEXT: slli a0, a0, 2			; CHECK-NEXT: slli a0, a0, 2
	; CHECK-NEXT: sub sp, sp, a0			; CHECK-NEXT: sub sp, sp, a0
	; CHECK-NEXT: andi sp, sp, -32			; CHECK-NEXT: andi sp, sp, -32
	; CHECK-NEXT: addi sp, s0, -32			; CHECK-NEXT: addi sp, s0, -48
	; CHECK-NEXT: ld ra, 24(sp) # 8-byte Folded Reload			; CHECK-NEXT: ld ra, 40(sp) # 8-byte Folded Reload
	; CHECK-NEXT: ld s0, 16(sp) # 8-byte Folded Reload			; CHECK-NEXT: ld s0, 32(sp) # 8-byte Folded Reload
	; CHECK-NEXT: addi sp, sp, 32			; CHECK-NEXT: addi sp, sp, 48
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%v = alloca <vscale x 4 x i64>			%v = alloca <vscale x 4 x i64>
	ret void			ret void
	}			}

	define void @lmul8() nounwind {			define void @lmul8() nounwind {
	; CHECK-LABEL: lmul8:			; CHECK-LABEL: lmul8:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: addi sp, sp, -64			; CHECK-NEXT: addi sp, sp, -80
	; CHECK-NEXT: sd ra, 56(sp) # 8-byte Folded Spill			; CHECK-NEXT: sd ra, 72(sp) # 8-byte Folded Spill
	; CHECK-NEXT: sd s0, 48(sp) # 8-byte Folded Spill			; CHECK-NEXT: sd s0, 64(sp) # 8-byte Folded Spill
	; CHECK-NEXT: addi s0, sp, 64			; CHECK-NEXT: addi s0, sp, 80
	; CHECK-NEXT: csrr a0, vlenb			; CHECK-NEXT: csrr a0, vlenb
	; CHECK-NEXT: slli a0, a0, 3			; CHECK-NEXT: slli a0, a0, 3
	; CHECK-NEXT: sub sp, sp, a0			; CHECK-NEXT: sub sp, sp, a0
	; CHECK-NEXT: andi sp, sp, -64			; CHECK-NEXT: andi sp, sp, -64
	; CHECK-NEXT: addi sp, s0, -64			; CHECK-NEXT: addi sp, s0, -80
	; CHECK-NEXT: ld ra, 56(sp) # 8-byte Folded Reload			; CHECK-NEXT: ld ra, 72(sp) # 8-byte Folded Reload
	; CHECK-NEXT: ld s0, 48(sp) # 8-byte Folded Reload			; CHECK-NEXT: ld s0, 64(sp) # 8-byte Folded Reload
	; CHECK-NEXT: addi sp, sp, 64			; CHECK-NEXT: addi sp, sp, 80
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%v = alloca <vscale x 8 x i64>			%v = alloca <vscale x 8 x i64>
	ret void			ret void
	}			}

	define void @lmul1_and_2() nounwind {			define void @lmul1_and_2() nounwind {
	; NOZBA-LABEL: lmul1_and_2:			; CHECK-LABEL: lmul1_and_2:
	; NOZBA: # %bb.0:			; CHECK: # %bb.0:
	; NOZBA-NEXT: csrr a0, vlenb			; CHECK-NEXT: csrr a0, vlenb
	; NOZBA-NEXT: slli a1, a0, 1			; CHECK-NEXT: slli a0, a0, 2
	; NOZBA-NEXT: add a0, a1, a0			; CHECK-NEXT: sub sp, sp, a0
	; NOZBA-NEXT: sub sp, sp, a0			; CHECK-NEXT: csrr a0, vlenb
	; NOZBA-NEXT: csrr a0, vlenb			; CHECK-NEXT: slli a0, a0, 2
	; NOZBA-NEXT: slli a1, a0, 1			; CHECK-NEXT: add sp, sp, a0
	; NOZBA-NEXT: add a0, a1, a0			; CHECK-NEXT: ret
	; NOZBA-NEXT: add sp, sp, a0
	; NOZBA-NEXT: ret
	;
	; ZBA-LABEL: lmul1_and_2:
	; ZBA: # %bb.0:
	; ZBA-NEXT: csrr a0, vlenb
	; ZBA-NEXT: sh1add a0, a0, a0
	; ZBA-NEXT: sub sp, sp, a0
	; ZBA-NEXT: csrr a0, vlenb
	; ZBA-NEXT: sh1add a0, a0, a0
	; ZBA-NEXT: add sp, sp, a0
	; ZBA-NEXT: ret
	%v1 = alloca <vscale x 1 x i64>			%v1 = alloca <vscale x 1 x i64>
	%v2 = alloca <vscale x 2 x i64>			%v2 = alloca <vscale x 2 x i64>
	ret void			ret void
	}			}

	define void @lmul2_and_4() nounwind {			define void @lmul2_and_4() nounwind {
	; CHECK-LABEL: lmul2_and_4:			; CHECK-LABEL: lmul2_and_4:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: addi sp, sp, -32			; CHECK-NEXT: addi sp, sp, -48
	; CHECK-NEXT: sd ra, 24(sp) # 8-byte Folded Spill			; CHECK-NEXT: sd ra, 40(sp) # 8-byte Folded Spill
	; CHECK-NEXT: sd s0, 16(sp) # 8-byte Folded Spill			; CHECK-NEXT: sd s0, 32(sp) # 8-byte Folded Spill
	; CHECK-NEXT: addi s0, sp, 32			; CHECK-NEXT: addi s0, sp, 48
	; CHECK-NEXT: csrr a0, vlenb			; CHECK-NEXT: csrr a0, vlenb
	; CHECK-NEXT: li a1, 6			; CHECK-NEXT: slli a0, a0, 3
	; CHECK-NEXT: mul a0, a0, a1
	; CHECK-NEXT: sub sp, sp, a0			; CHECK-NEXT: sub sp, sp, a0
	; CHECK-NEXT: andi sp, sp, -32			; CHECK-NEXT: andi sp, sp, -32
	; CHECK-NEXT: addi sp, s0, -32			; CHECK-NEXT: addi sp, s0, -48
	; CHECK-NEXT: ld ra, 24(sp) # 8-byte Folded Reload			; CHECK-NEXT: ld ra, 40(sp) # 8-byte Folded Reload
	; CHECK-NEXT: ld s0, 16(sp) # 8-byte Folded Reload			; CHECK-NEXT: ld s0, 32(sp) # 8-byte Folded Reload
	; CHECK-NEXT: addi sp, sp, 32			; CHECK-NEXT: addi sp, sp, 48
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%v1 = alloca <vscale x 2 x i64>			%v1 = alloca <vscale x 2 x i64>
	%v2 = alloca <vscale x 4 x i64>			%v2 = alloca <vscale x 4 x i64>
	ret void			ret void
	}			}

	define void @lmul1_and_4() nounwind {			define void @lmul1_and_4() nounwind {
	; NOZBA-LABEL: lmul1_and_4:			; CHECK-LABEL: lmul1_and_4:
	; NOZBA: # %bb.0:			; CHECK: # %bb.0:
	; NOZBA-NEXT: addi sp, sp, -32			; CHECK-NEXT: addi sp, sp, -48
	; NOZBA-NEXT: sd ra, 24(sp) # 8-byte Folded Spill			; CHECK-NEXT: sd ra, 40(sp) # 8-byte Folded Spill
	; NOZBA-NEXT: sd s0, 16(sp) # 8-byte Folded Spill			; CHECK-NEXT: sd s0, 32(sp) # 8-byte Folded Spill
	; NOZBA-NEXT: addi s0, sp, 32			; CHECK-NEXT: addi s0, sp, 48
	; NOZBA-NEXT: csrr a0, vlenb			; CHECK-NEXT: csrr a0, vlenb
	; NOZBA-NEXT: slli a1, a0, 2			; CHECK-NEXT: slli a0, a0, 3
	; NOZBA-NEXT: add a0, a1, a0			; CHECK-NEXT: sub sp, sp, a0
	; NOZBA-NEXT: sub sp, sp, a0			; CHECK-NEXT: andi sp, sp, -32
	; NOZBA-NEXT: andi sp, sp, -32			; CHECK-NEXT: addi sp, s0, -48
	; NOZBA-NEXT: addi sp, s0, -32			; CHECK-NEXT: ld ra, 40(sp) # 8-byte Folded Reload
	; NOZBA-NEXT: ld ra, 24(sp) # 8-byte Folded Reload			; CHECK-NEXT: ld s0, 32(sp) # 8-byte Folded Reload
	; NOZBA-NEXT: ld s0, 16(sp) # 8-byte Folded Reload			; CHECK-NEXT: addi sp, sp, 48
	; NOZBA-NEXT: addi sp, sp, 32			; CHECK-NEXT: ret
	; NOZBA-NEXT: ret
	;
	; ZBA-LABEL: lmul1_and_4:
	; ZBA: # %bb.0:
	; ZBA-NEXT: addi sp, sp, -32
	; ZBA-NEXT: sd ra, 24(sp) # 8-byte Folded Spill
	; ZBA-NEXT: sd s0, 16(sp) # 8-byte Folded Spill
	; ZBA-NEXT: addi s0, sp, 32
	; ZBA-NEXT: csrr a0, vlenb
	; ZBA-NEXT: sh2add a0, a0, a0
	; ZBA-NEXT: sub sp, sp, a0
	; ZBA-NEXT: andi sp, sp, -32
	; ZBA-NEXT: addi sp, s0, -32
	; ZBA-NEXT: ld ra, 24(sp) # 8-byte Folded Reload
	; ZBA-NEXT: ld s0, 16(sp) # 8-byte Folded Reload
	; ZBA-NEXT: addi sp, sp, 32
	; ZBA-NEXT: ret
	%v1 = alloca <vscale x 1 x i64>			%v1 = alloca <vscale x 1 x i64>
	%v2 = alloca <vscale x 4 x i64>			%v2 = alloca <vscale x 4 x i64>
	ret void			ret void
	}			}

	define void @lmul2_and_1() nounwind {			define void @lmul2_and_1() nounwind {
	; NOZBA-LABEL: lmul2_and_1:			; CHECK-LABEL: lmul2_and_1:
	; NOZBA: # %bb.0:			; CHECK: # %bb.0:
	; NOZBA-NEXT: csrr a0, vlenb			; CHECK-NEXT: csrr a0, vlenb
	; NOZBA-NEXT: slli a1, a0, 1			; CHECK-NEXT: slli a0, a0, 2
	; NOZBA-NEXT: add a0, a1, a0			; CHECK-NEXT: sub sp, sp, a0
	; NOZBA-NEXT: sub sp, sp, a0			; CHECK-NEXT: csrr a0, vlenb
	; NOZBA-NEXT: csrr a0, vlenb			; CHECK-NEXT: slli a0, a0, 2
	; NOZBA-NEXT: slli a1, a0, 1			; CHECK-NEXT: add sp, sp, a0
	; NOZBA-NEXT: add a0, a1, a0			; CHECK-NEXT: ret
	; NOZBA-NEXT: add sp, sp, a0
	; NOZBA-NEXT: ret
	;
	; ZBA-LABEL: lmul2_and_1:
	; ZBA: # %bb.0:
	; ZBA-NEXT: csrr a0, vlenb
	; ZBA-NEXT: sh1add a0, a0, a0
	; ZBA-NEXT: sub sp, sp, a0
	; ZBA-NEXT: csrr a0, vlenb
	; ZBA-NEXT: sh1add a0, a0, a0
	; ZBA-NEXT: add sp, sp, a0
	; ZBA-NEXT: ret
	%v1 = alloca <vscale x 2 x i64>			%v1 = alloca <vscale x 2 x i64>
	%v2 = alloca <vscale x 1 x i64>			%v2 = alloca <vscale x 1 x i64>
	ret void			ret void
	}			}

	define void @lmul4_and_1() nounwind {			define void @lmul4_and_1() nounwind {
	; NOZBA-LABEL: lmul4_and_1:			; CHECK-LABEL: lmul4_and_1:
	; NOZBA: # %bb.0:			; CHECK: # %bb.0:
	; NOZBA-NEXT: addi sp, sp, -32			; CHECK-NEXT: addi sp, sp, -48
	; NOZBA-NEXT: sd ra, 24(sp) # 8-byte Folded Spill			; CHECK-NEXT: sd ra, 40(sp) # 8-byte Folded Spill
	; NOZBA-NEXT: sd s0, 16(sp) # 8-byte Folded Spill			; CHECK-NEXT: sd s0, 32(sp) # 8-byte Folded Spill
	; NOZBA-NEXT: addi s0, sp, 32			; CHECK-NEXT: addi s0, sp, 48
	; NOZBA-NEXT: csrr a0, vlenb			; CHECK-NEXT: csrr a0, vlenb
	; NOZBA-NEXT: slli a1, a0, 2			; CHECK-NEXT: slli a0, a0, 3
	; NOZBA-NEXT: add a0, a1, a0			; CHECK-NEXT: sub sp, sp, a0
	; NOZBA-NEXT: sub sp, sp, a0			; CHECK-NEXT: andi sp, sp, -32
	; NOZBA-NEXT: andi sp, sp, -32			; CHECK-NEXT: addi sp, s0, -48
	; NOZBA-NEXT: addi sp, s0, -32			; CHECK-NEXT: ld ra, 40(sp) # 8-byte Folded Reload
	; NOZBA-NEXT: ld ra, 24(sp) # 8-byte Folded Reload			; CHECK-NEXT: ld s0, 32(sp) # 8-byte Folded Reload
	; NOZBA-NEXT: ld s0, 16(sp) # 8-byte Folded Reload			; CHECK-NEXT: addi sp, sp, 48
	; NOZBA-NEXT: addi sp, sp, 32			; CHECK-NEXT: ret
	; NOZBA-NEXT: ret
	;
	; ZBA-LABEL: lmul4_and_1:
	; ZBA: # %bb.0:
	; ZBA-NEXT: addi sp, sp, -32
	; ZBA-NEXT: sd ra, 24(sp) # 8-byte Folded Spill
	; ZBA-NEXT: sd s0, 16(sp) # 8-byte Folded Spill
	; ZBA-NEXT: addi s0, sp, 32
	; ZBA-NEXT: csrr a0, vlenb
	; ZBA-NEXT: sh2add a0, a0, a0
	; ZBA-NEXT: sub sp, sp, a0
	; ZBA-NEXT: andi sp, sp, -32
	; ZBA-NEXT: addi sp, s0, -32
	; ZBA-NEXT: ld ra, 24(sp) # 8-byte Folded Reload
	; ZBA-NEXT: ld s0, 16(sp) # 8-byte Folded Reload
	; ZBA-NEXT: addi sp, sp, 32
	; ZBA-NEXT: ret
	%v1 = alloca <vscale x 4 x i64>			%v1 = alloca <vscale x 4 x i64>
	%v2 = alloca <vscale x 1 x i64>			%v2 = alloca <vscale x 1 x i64>
	ret void			ret void
	}			}

	define void @lmul4_and_2() nounwind {			define void @lmul4_and_2() nounwind {
	; CHECK-LABEL: lmul4_and_2:			; CHECK-LABEL: lmul4_and_2:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: addi sp, sp, -32			; CHECK-NEXT: addi sp, sp, -48
	; CHECK-NEXT: sd ra, 24(sp) # 8-byte Folded Spill			; CHECK-NEXT: sd ra, 40(sp) # 8-byte Folded Spill
	; CHECK-NEXT: sd s0, 16(sp) # 8-byte Folded Spill			; CHECK-NEXT: sd s0, 32(sp) # 8-byte Folded Spill
	; CHECK-NEXT: addi s0, sp, 32			; CHECK-NEXT: addi s0, sp, 48
	; CHECK-NEXT: csrr a0, vlenb			; CHECK-NEXT: csrr a0, vlenb
	; CHECK-NEXT: li a1, 6			; CHECK-NEXT: slli a0, a0, 3
	; CHECK-NEXT: mul a0, a0, a1
	; CHECK-NEXT: sub sp, sp, a0			; CHECK-NEXT: sub sp, sp, a0
	; CHECK-NEXT: andi sp, sp, -32			; CHECK-NEXT: andi sp, sp, -32
	; CHECK-NEXT: addi sp, s0, -32			; CHECK-NEXT: addi sp, s0, -48
	; CHECK-NEXT: ld ra, 24(sp) # 8-byte Folded Reload			; CHECK-NEXT: ld ra, 40(sp) # 8-byte Folded Reload
	; CHECK-NEXT: ld s0, 16(sp) # 8-byte Folded Reload			; CHECK-NEXT: ld s0, 32(sp) # 8-byte Folded Reload
	; CHECK-NEXT: addi sp, sp, 32			; CHECK-NEXT: addi sp, sp, 48
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%v1 = alloca <vscale x 4 x i64>			%v1 = alloca <vscale x 4 x i64>
	%v2 = alloca <vscale x 2 x i64>			%v2 = alloca <vscale x 2 x i64>
	ret void			ret void
	}			}

	define void @lmul4_and_2_x2_0() nounwind {			define void @lmul4_and_2_x2_0() nounwind {
	; CHECK-LABEL: lmul4_and_2_x2_0:			; CHECK-LABEL: lmul4_and_2_x2_0:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: addi sp, sp, -32			; CHECK-NEXT: addi sp, sp, -48
	; CHECK-NEXT: sd ra, 24(sp) # 8-byte Folded Spill			; CHECK-NEXT: sd ra, 40(sp) # 8-byte Folded Spill
	; CHECK-NEXT: sd s0, 16(sp) # 8-byte Folded Spill			; CHECK-NEXT: sd s0, 32(sp) # 8-byte Folded Spill
	; CHECK-NEXT: addi s0, sp, 32			; CHECK-NEXT: addi s0, sp, 48
	; CHECK-NEXT: csrr a0, vlenb			; CHECK-NEXT: csrr a0, vlenb
	; CHECK-NEXT: li a1, 12			; CHECK-NEXT: slli a0, a0, 4
	; CHECK-NEXT: mul a0, a0, a1
	; CHECK-NEXT: sub sp, sp, a0			; CHECK-NEXT: sub sp, sp, a0
	; CHECK-NEXT: andi sp, sp, -32			; CHECK-NEXT: andi sp, sp, -32
	; CHECK-NEXT: addi sp, s0, -32			; CHECK-NEXT: addi sp, s0, -48
	; CHECK-NEXT: ld ra, 24(sp) # 8-byte Folded Reload			; CHECK-NEXT: ld ra, 40(sp) # 8-byte Folded Reload
	; CHECK-NEXT: ld s0, 16(sp) # 8-byte Folded Reload			; CHECK-NEXT: ld s0, 32(sp) # 8-byte Folded Reload
	; CHECK-NEXT: addi sp, sp, 32			; CHECK-NEXT: addi sp, sp, 48
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%v1 = alloca <vscale x 4 x i64>			%v1 = alloca <vscale x 4 x i64>
	%v2 = alloca <vscale x 2 x i64>			%v2 = alloca <vscale x 2 x i64>
	%v3 = alloca <vscale x 4 x i64>			%v3 = alloca <vscale x 4 x i64>
	%v4 = alloca <vscale x 2 x i64>			%v4 = alloca <vscale x 2 x i64>
	ret void			ret void
	}			}

	define void @lmul4_and_2_x2_1() nounwind {			define void @lmul4_and_2_x2_1() nounwind {
	; CHECK-LABEL: lmul4_and_2_x2_1:			; CHECK-LABEL: lmul4_and_2_x2_1:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: addi sp, sp, -32			; CHECK-NEXT: addi sp, sp, -48
	; CHECK-NEXT: sd ra, 24(sp) # 8-byte Folded Spill			; CHECK-NEXT: sd ra, 40(sp) # 8-byte Folded Spill
	; CHECK-NEXT: sd s0, 16(sp) # 8-byte Folded Spill			; CHECK-NEXT: sd s0, 32(sp) # 8-byte Folded Spill
	; CHECK-NEXT: addi s0, sp, 32			; CHECK-NEXT: addi s0, sp, 48
	; CHECK-NEXT: csrr a0, vlenb			; CHECK-NEXT: csrr a0, vlenb
	; CHECK-NEXT: li a1, 12			; CHECK-NEXT: li a1, 12
	; CHECK-NEXT: mul a0, a0, a1			; CHECK-NEXT: mul a0, a0, a1
	; CHECK-NEXT: sub sp, sp, a0			; CHECK-NEXT: sub sp, sp, a0
	; CHECK-NEXT: andi sp, sp, -32			; CHECK-NEXT: andi sp, sp, -32
	; CHECK-NEXT: addi sp, s0, -32			; CHECK-NEXT: addi sp, s0, -48
	; CHECK-NEXT: ld ra, 24(sp) # 8-byte Folded Reload			; CHECK-NEXT: ld ra, 40(sp) # 8-byte Folded Reload
	; CHECK-NEXT: ld s0, 16(sp) # 8-byte Folded Reload			; CHECK-NEXT: ld s0, 32(sp) # 8-byte Folded Reload
	; CHECK-NEXT: addi sp, sp, 32			; CHECK-NEXT: addi sp, sp, 48
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%v1 = alloca <vscale x 4 x i64>			%v1 = alloca <vscale x 4 x i64>
	%v3 = alloca <vscale x 4 x i64>			%v3 = alloca <vscale x 4 x i64>
	%v2 = alloca <vscale x 2 x i64>			%v2 = alloca <vscale x 2 x i64>
	%v4 = alloca <vscale x 2 x i64>			%v4 = alloca <vscale x 2 x i64>
	ret void			ret void
	}			}


	define void @gpr_and_lmul1_and_2() nounwind {			define void @gpr_and_lmul1_and_2() nounwind {
	; NOZBA-LABEL: gpr_and_lmul1_and_2:			; CHECK-LABEL: gpr_and_lmul1_and_2:
	; NOZBA: # %bb.0:			; CHECK: # %bb.0:
	; NOZBA-NEXT: addi sp, sp, -16			; CHECK-NEXT: addi sp, sp, -16
	; NOZBA-NEXT: csrr a0, vlenb			; CHECK-NEXT: csrr a0, vlenb
	; NOZBA-NEXT: slli a1, a0, 1			; CHECK-NEXT: slli a0, a0, 2
	; NOZBA-NEXT: add a0, a1, a0			; CHECK-NEXT: sub sp, sp, a0
	; NOZBA-NEXT: sub sp, sp, a0			; CHECK-NEXT: li a0, 3
	; NOZBA-NEXT: li a0, 3			; CHECK-NEXT: sd a0, 8(sp)
	; NOZBA-NEXT: sd a0, 8(sp)			; CHECK-NEXT: csrr a0, vlenb
	; NOZBA-NEXT: csrr a0, vlenb			; CHECK-NEXT: slli a0, a0, 2
	; NOZBA-NEXT: slli a1, a0, 1			; CHECK-NEXT: add sp, sp, a0
	; NOZBA-NEXT: add a0, a1, a0			; CHECK-NEXT: addi sp, sp, 16
	; NOZBA-NEXT: add sp, sp, a0			; CHECK-NEXT: ret
	; NOZBA-NEXT: addi sp, sp, 16
	; NOZBA-NEXT: ret
	;
	; ZBA-LABEL: gpr_and_lmul1_and_2:
	; ZBA: # %bb.0:
	; ZBA-NEXT: addi sp, sp, -16
	; ZBA-NEXT: csrr a0, vlenb
	; ZBA-NEXT: sh1add a0, a0, a0
	; ZBA-NEXT: sub sp, sp, a0
	; ZBA-NEXT: li a0, 3
	; ZBA-NEXT: sd a0, 8(sp)
	; ZBA-NEXT: csrr a0, vlenb
	; ZBA-NEXT: sh1add a0, a0, a0
	; ZBA-NEXT: add sp, sp, a0
	; ZBA-NEXT: addi sp, sp, 16
	; ZBA-NEXT: ret
	%x1 = alloca i64			%x1 = alloca i64
	%v1 = alloca <vscale x 1 x i64>			%v1 = alloca <vscale x 1 x i64>
	%v2 = alloca <vscale x 2 x i64>			%v2 = alloca <vscale x 2 x i64>
	store volatile i64 3, i64* %x1			store volatile i64 3, i64* %x1
	ret void			ret void
	}			}

	define void @gpr_and_lmul1_and_4() nounwind {			define void @gpr_and_lmul1_and_4() nounwind {
	; NOZBA-LABEL: gpr_and_lmul1_and_4:			; CHECK-LABEL: gpr_and_lmul1_and_4:
	; NOZBA: # %bb.0:			; CHECK: # %bb.0:
	; NOZBA-NEXT: addi sp, sp, -32			; CHECK-NEXT: addi sp, sp, -48
	; NOZBA-NEXT: sd ra, 24(sp) # 8-byte Folded Spill			; CHECK-NEXT: sd ra, 40(sp) # 8-byte Folded Spill
	; NOZBA-NEXT: sd s0, 16(sp) # 8-byte Folded Spill			; CHECK-NEXT: sd s0, 32(sp) # 8-byte Folded Spill
	; NOZBA-NEXT: addi s0, sp, 32			; CHECK-NEXT: addi s0, sp, 48
	; NOZBA-NEXT: csrr a0, vlenb			; CHECK-NEXT: csrr a0, vlenb
	; NOZBA-NEXT: slli a1, a0, 2			; CHECK-NEXT: slli a0, a0, 3
	; NOZBA-NEXT: add a0, a1, a0			; CHECK-NEXT: sub sp, sp, a0
	; NOZBA-NEXT: sub sp, sp, a0			; CHECK-NEXT: andi sp, sp, -32
	; NOZBA-NEXT: andi sp, sp, -32			; CHECK-NEXT: li a0, 3
	; NOZBA-NEXT: li a0, 3			; CHECK-NEXT: sd a0, 8(sp)
	; NOZBA-NEXT: sd a0, 8(sp)			; CHECK-NEXT: addi sp, s0, -48
	; NOZBA-NEXT: addi sp, s0, -32			; CHECK-NEXT: ld ra, 40(sp) # 8-byte Folded Reload
	; NOZBA-NEXT: ld ra, 24(sp) # 8-byte Folded Reload			; CHECK-NEXT: ld s0, 32(sp) # 8-byte Folded Reload
	; NOZBA-NEXT: ld s0, 16(sp) # 8-byte Folded Reload			; CHECK-NEXT: addi sp, sp, 48
	; NOZBA-NEXT: addi sp, sp, 32			; CHECK-NEXT: ret
	; NOZBA-NEXT: ret
	;
	; ZBA-LABEL: gpr_and_lmul1_and_4:
	; ZBA: # %bb.0:
	; ZBA-NEXT: addi sp, sp, -32
	; ZBA-NEXT: sd ra, 24(sp) # 8-byte Folded Spill
	; ZBA-NEXT: sd s0, 16(sp) # 8-byte Folded Spill
	; ZBA-NEXT: addi s0, sp, 32
	; ZBA-NEXT: csrr a0, vlenb
	; ZBA-NEXT: sh2add a0, a0, a0
	; ZBA-NEXT: sub sp, sp, a0
	; ZBA-NEXT: andi sp, sp, -32
	; ZBA-NEXT: li a0, 3
	; ZBA-NEXT: sd a0, 8(sp)
	; ZBA-NEXT: addi sp, s0, -32
	; ZBA-NEXT: ld ra, 24(sp) # 8-byte Folded Reload
	; ZBA-NEXT: ld s0, 16(sp) # 8-byte Folded Reload
	; ZBA-NEXT: addi sp, sp, 32
	; ZBA-NEXT: ret
	%x1 = alloca i64			%x1 = alloca i64
	%v1 = alloca <vscale x 1 x i64>			%v1 = alloca <vscale x 1 x i64>
	%v2 = alloca <vscale x 4 x i64>			%v2 = alloca <vscale x 4 x i64>
	store volatile i64 3, i64* %x1			store volatile i64 3, i64* %x1
	ret void			ret void
	}			}

	define void @lmul_1_2_4_8() nounwind {			define void @lmul_1_2_4_8() nounwind {
	; CHECK-LABEL: lmul_1_2_4_8:			; CHECK-LABEL: lmul_1_2_4_8:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: addi sp, sp, -64			; CHECK-NEXT: addi sp, sp, -80
	; CHECK-NEXT: sd ra, 56(sp) # 8-byte Folded Spill			; CHECK-NEXT: sd ra, 72(sp) # 8-byte Folded Spill
	; CHECK-NEXT: sd s0, 48(sp) # 8-byte Folded Spill			; CHECK-NEXT: sd s0, 64(sp) # 8-byte Folded Spill
	; CHECK-NEXT: addi s0, sp, 64			; CHECK-NEXT: addi s0, sp, 80
	; CHECK-NEXT: csrr a0, vlenb			; CHECK-NEXT: csrr a0, vlenb
	; CHECK-NEXT: slli a1, a0, 4			; CHECK-NEXT: slli a0, a0, 4
	; CHECK-NEXT: sub a0, a1, a0
	; CHECK-NEXT: sub sp, sp, a0			; CHECK-NEXT: sub sp, sp, a0
	; CHECK-NEXT: andi sp, sp, -64			; CHECK-NEXT: andi sp, sp, -64
	; CHECK-NEXT: addi sp, s0, -64			; CHECK-NEXT: addi sp, s0, -80
	; CHECK-NEXT: ld ra, 56(sp) # 8-byte Folded Reload			; CHECK-NEXT: ld ra, 72(sp) # 8-byte Folded Reload
	; CHECK-NEXT: ld s0, 48(sp) # 8-byte Folded Reload			; CHECK-NEXT: ld s0, 64(sp) # 8-byte Folded Reload
	; CHECK-NEXT: addi sp, sp, 64			; CHECK-NEXT: addi sp, sp, 80
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%v1 = alloca <vscale x 1 x i64>			%v1 = alloca <vscale x 1 x i64>
	%v2 = alloca <vscale x 2 x i64>			%v2 = alloca <vscale x 2 x i64>
	%v4 = alloca <vscale x 4 x i64>			%v4 = alloca <vscale x 4 x i64>
	%v8 = alloca <vscale x 8 x i64>			%v8 = alloca <vscale x 8 x i64>
	ret void			ret void
	}			}

	define void @lmul_1_2_4_8_x2_0() nounwind {			define void @lmul_1_2_4_8_x2_0() nounwind {
	; CHECK-LABEL: lmul_1_2_4_8_x2_0:			; CHECK-LABEL: lmul_1_2_4_8_x2_0:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: addi sp, sp, -64			; CHECK-NEXT: addi sp, sp, -80
	; CHECK-NEXT: sd ra, 56(sp) # 8-byte Folded Spill			; CHECK-NEXT: sd ra, 72(sp) # 8-byte Folded Spill
	; CHECK-NEXT: sd s0, 48(sp) # 8-byte Folded Spill			; CHECK-NEXT: sd s0, 64(sp) # 8-byte Folded Spill
	; CHECK-NEXT: addi s0, sp, 64			; CHECK-NEXT: addi s0, sp, 80
	; CHECK-NEXT: csrr a0, vlenb			; CHECK-NEXT: csrr a0, vlenb
	; CHECK-NEXT: li a1, 30			; CHECK-NEXT: slli a0, a0, 5
	; CHECK-NEXT: mul a0, a0, a1
	; CHECK-NEXT: sub sp, sp, a0			; CHECK-NEXT: sub sp, sp, a0
	; CHECK-NEXT: andi sp, sp, -64			; CHECK-NEXT: andi sp, sp, -64
	; CHECK-NEXT: addi sp, s0, -64			; CHECK-NEXT: addi sp, s0, -80
	; CHECK-NEXT: ld ra, 56(sp) # 8-byte Folded Reload			; CHECK-NEXT: ld ra, 72(sp) # 8-byte Folded Reload
	; CHECK-NEXT: ld s0, 48(sp) # 8-byte Folded Reload			; CHECK-NEXT: ld s0, 64(sp) # 8-byte Folded Reload
	; CHECK-NEXT: addi sp, sp, 64			; CHECK-NEXT: addi sp, sp, 80
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%v1 = alloca <vscale x 1 x i64>			%v1 = alloca <vscale x 1 x i64>
	%v2 = alloca <vscale x 1 x i64>			%v2 = alloca <vscale x 1 x i64>
	%v3 = alloca <vscale x 2 x i64>			%v3 = alloca <vscale x 2 x i64>
	%v4 = alloca <vscale x 2 x i64>			%v4 = alloca <vscale x 2 x i64>
	%v5 = alloca <vscale x 4 x i64>			%v5 = alloca <vscale x 4 x i64>
	%v6 = alloca <vscale x 4 x i64>			%v6 = alloca <vscale x 4 x i64>
	%v7 = alloca <vscale x 8 x i64>			%v7 = alloca <vscale x 8 x i64>
	%v8 = alloca <vscale x 8 x i64>			%v8 = alloca <vscale x 8 x i64>
	ret void			ret void
	}			}

	define void @lmul_1_2_4_8_x2_1() nounwind {			define void @lmul_1_2_4_8_x2_1() nounwind {
	; CHECK-LABEL: lmul_1_2_4_8_x2_1:			; CHECK-LABEL: lmul_1_2_4_8_x2_1:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: addi sp, sp, -64			; CHECK-NEXT: addi sp, sp, -80
	; CHECK-NEXT: sd ra, 56(sp) # 8-byte Folded Spill			; CHECK-NEXT: sd ra, 72(sp) # 8-byte Folded Spill
	; CHECK-NEXT: sd s0, 48(sp) # 8-byte Folded Spill			; CHECK-NEXT: sd s0, 64(sp) # 8-byte Folded Spill
	; CHECK-NEXT: addi s0, sp, 64			; CHECK-NEXT: addi s0, sp, 80
	; CHECK-NEXT: csrr a0, vlenb			; CHECK-NEXT: csrr a0, vlenb
	; CHECK-NEXT: li a1, 30			; CHECK-NEXT: slli a0, a0, 5
	; CHECK-NEXT: mul a0, a0, a1
	; CHECK-NEXT: sub sp, sp, a0			; CHECK-NEXT: sub sp, sp, a0
	; CHECK-NEXT: andi sp, sp, -64			; CHECK-NEXT: andi sp, sp, -64
	; CHECK-NEXT: addi sp, s0, -64			; CHECK-NEXT: addi sp, s0, -80
	; CHECK-NEXT: ld ra, 56(sp) # 8-byte Folded Reload			; CHECK-NEXT: ld ra, 72(sp) # 8-byte Folded Reload
	; CHECK-NEXT: ld s0, 48(sp) # 8-byte Folded Reload			; CHECK-NEXT: ld s0, 64(sp) # 8-byte Folded Reload
	; CHECK-NEXT: addi sp, sp, 64			; CHECK-NEXT: addi sp, sp, 80
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%v8 = alloca <vscale x 8 x i64>			%v8 = alloca <vscale x 8 x i64>
	%v7 = alloca <vscale x 8 x i64>			%v7 = alloca <vscale x 8 x i64>
	%v6 = alloca <vscale x 4 x i64>			%v6 = alloca <vscale x 4 x i64>
	%v5 = alloca <vscale x 4 x i64>			%v5 = alloca <vscale x 4 x i64>
	%v4 = alloca <vscale x 2 x i64>			%v4 = alloca <vscale x 2 x i64>
	%v3 = alloca <vscale x 2 x i64>			%v3 = alloca <vscale x 2 x i64>
	%v2 = alloca <vscale x 1 x i64>			%v2 = alloca <vscale x 1 x i64>
	[... 20 lines not shown ...]
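
The updated VLENB multipliers in allocate-lmul-2-4-8.ll (for example 3 becoming 4 in `lmul1_and_2`, 15 becoming 16 in `lmul_1_2_4_8`, and 12 staying 12 in `lmul4_and_2_x2_1`) are consistent with a simple layout model: each RVV object is placed at an offset aligned to its own size, and the total is then rounded up to the largest object alignment, with a floor of 16 bytes. The sketch below is only an illustration of that model, not the code from RISCVFrameLowering.cpp. The helper names `alignTo` and `rvvStackSizeInVLENB` are hypothetical, and the model assumes every object is a `<vscale x LMUL x i64>` alloca with LMUL in {1, 2, 4, 8}, whose alignment equals its minimum size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Round Value up to the next multiple of Align (Align must be a power of two).
uint64_t alignTo(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) & ~(Align - 1);
}

// Hypothetical model of the RVV stack size, returned as a multiple of VLENB.
// Each entry in LMULs describes one <vscale x LMUL x i64> alloca, in
// allocation order. Sizes are tracked in "scalable bytes", where 8 scalable
// bytes correspond to one VLENB.
uint64_t rvvStackSizeInVLENB(const std::vector<unsigned> &LMULs) {
  uint64_t Offset = 0;
  uint64_t StackAlign = 16; // assumed floor: the stack stays 16-byte aligned
  for (unsigned LMUL : LMULs) {
    uint64_t Size = 8ull * LMUL;
    // Assumption: the object's alignment equals its minimum size (>= 8).
    uint64_t ObjAlign = std::max<uint64_t>(8, Size);
    Offset = alignTo(Offset + Size, ObjAlign);
    StackAlign = std::max(StackAlign, ObjAlign);
  }
  // Round the whole RVV area up to the RVV stack alignment.
  return alignTo(Offset, StackAlign) / 8;
}
```

Under these assumptions, `rvvStackSizeInVLENB({1})` gives 2 (the new `slli a0, a0, 1` in `lmul1`), `rvvStackSizeInVLENB({2, 4})` gives 8 (the `slli a0, a0, 3` that replaces `li a1, 6` plus `mul`), and `rvvStackSizeInVLENB({4, 4, 2, 2})` gives 12, which matches the unchanged `li a1, 12` in `lmul4_and_2_x2_1`. The model says nothing about the fixed part of the frame (the `addi sp, sp, -32` to `-48` changes).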

llvm/test/CodeGen/RISCV/rvv/calling-conv-fastcc.ll

[... 280 lines not shown ...]
}		}

declare <vscale x 32 x i32> @ext2(<vscale x 32 x i32>, <vscale x 32 x i32>, i32, i32)		declare <vscale x 32 x i32> @ext2(<vscale x 32 x i32>, <vscale x 32 x i32>, i32, i32)
declare <vscale x 32 x i32> @ext3(<vscale x 32 x i32>, <vscale x 32 x i32>, <vscale x 32 x i32>, i32, i32)		declare <vscale x 32 x i32> @ext3(<vscale x 32 x i32>, <vscale x 32 x i32>, <vscale x 32 x i32>, i32, i32)

define fastcc <vscale x 32 x i32> @ret_nxv32i32_call_nxv32i32_nxv32i32_i32(<vscale x 32 x i32> %x, <vscale x 32 x i32> %y, i32 %w) {		define fastcc <vscale x 32 x i32> @ret_nxv32i32_call_nxv32i32_nxv32i32_i32(<vscale x 32 x i32> %x, <vscale x 32 x i32> %y, i32 %w) {
; RV32-LABEL: ret_nxv32i32_call_nxv32i32_nxv32i32_i32:		; RV32-LABEL: ret_nxv32i32_call_nxv32i32_nxv32i32_i32:
; RV32: # %bb.0:		; RV32: # %bb.0:
; RV32-NEXT: addi sp, sp, -32		; RV32-NEXT: addi sp, sp, -144
; RV32-NEXT: .cfi_def_cfa_offset 32		; RV32-NEXT: .cfi_def_cfa_offset 144
; RV32-NEXT: sw ra, 28(sp) # 4-byte Folded Spill		; RV32-NEXT: sw ra, 140(sp) # 4-byte Folded Spill
; RV32-NEXT: .cfi_offset ra, -4		; RV32-NEXT: .cfi_offset ra, -4
; RV32-NEXT: csrr a1, vlenb		; RV32-NEXT: csrr a1, vlenb
; RV32-NEXT: slli a1, a1, 4		; RV32-NEXT: slli a1, a1, 4
; RV32-NEXT: sub sp, sp, a1		; RV32-NEXT: sub sp, sp, a1
; RV32-NEXT: csrr a1, vlenb		; RV32-NEXT: csrr a1, vlenb
; RV32-NEXT: slli a1, a1, 3		; RV32-NEXT: slli a1, a1, 3
; RV32-NEXT: add a3, a0, a1		; RV32-NEXT: add a3, a0, a1
; RV32-NEXT: vl8re32.v v24, (a3)		; RV32-NEXT: vl8re32.v v24, (a3)
; RV32-NEXT: vl8re32.v v0, (a0)		; RV32-NEXT: vl8re32.v v0, (a0)
; RV32-NEXT: addi a0, sp, 16		; RV32-NEXT: addi a0, sp, 128
; RV32-NEXT: add a0, a0, a1		; RV32-NEXT: add a0, a0, a1
; RV32-NEXT: vs8r.v v16, (a0)		; RV32-NEXT: vs8r.v v16, (a0)
; RV32-NEXT: addi a0, sp, 16		; RV32-NEXT: addi a0, sp, 128
; RV32-NEXT: li a3, 2		; RV32-NEXT: li a3, 2
; RV32-NEXT: addi a1, sp, 16		; RV32-NEXT: addi a1, sp, 128
; RV32-NEXT: vs8r.v v8, (a1)		; RV32-NEXT: vs8r.v v8, (a1)
; RV32-NEXT: vmv8r.v v8, v0		; RV32-NEXT: vmv8r.v v8, v0
; RV32-NEXT: vmv8r.v v16, v24		; RV32-NEXT: vmv8r.v v16, v24
; RV32-NEXT: call ext2@plt		; RV32-NEXT: call ext2@plt
; RV32-NEXT: csrr a0, vlenb		; RV32-NEXT: csrr a0, vlenb
; RV32-NEXT: slli a0, a0, 4		; RV32-NEXT: slli a0, a0, 4
; RV32-NEXT: add sp, sp, a0		; RV32-NEXT: add sp, sp, a0
; RV32-NEXT: lw ra, 28(sp) # 4-byte Folded Reload		; RV32-NEXT: lw ra, 140(sp) # 4-byte Folded Reload
; RV32-NEXT: addi sp, sp, 32		; RV32-NEXT: addi sp, sp, 144
; RV32-NEXT: ret		; RV32-NEXT: ret
;		;
; RV64-LABEL: ret_nxv32i32_call_nxv32i32_nxv32i32_i32:		; RV64-LABEL: ret_nxv32i32_call_nxv32i32_nxv32i32_i32:
; RV64: # %bb.0:		; RV64: # %bb.0:
; RV64-NEXT: addi sp, sp, -32		; RV64-NEXT: addi sp, sp, -144
; RV64-NEXT: .cfi_def_cfa_offset 32		; RV64-NEXT: .cfi_def_cfa_offset 144
; RV64-NEXT: sd ra, 24(sp) # 8-byte Folded Spill		; RV64-NEXT: sd ra, 136(sp) # 8-byte Folded Spill
; RV64-NEXT: .cfi_offset ra, -8		; RV64-NEXT: .cfi_offset ra, -8
; RV64-NEXT: csrr a1, vlenb		; RV64-NEXT: csrr a1, vlenb
; RV64-NEXT: slli a1, a1, 4		; RV64-NEXT: slli a1, a1, 4
; RV64-NEXT: sub sp, sp, a1		; RV64-NEXT: sub sp, sp, a1
; RV64-NEXT: csrr a1, vlenb		; RV64-NEXT: csrr a1, vlenb
; RV64-NEXT: slli a1, a1, 3		; RV64-NEXT: slli a1, a1, 3
; RV64-NEXT: add a3, a0, a1		; RV64-NEXT: add a3, a0, a1
; RV64-NEXT: vl8re32.v v24, (a3)		; RV64-NEXT: vl8re32.v v24, (a3)
; RV64-NEXT: vl8re32.v v0, (a0)		; RV64-NEXT: vl8re32.v v0, (a0)
; RV64-NEXT: addi a0, sp, 24		; RV64-NEXT: addi a0, sp, 128
; RV64-NEXT: add a0, a0, a1		; RV64-NEXT: add a0, a0, a1
; RV64-NEXT: vs8r.v v16, (a0)		; RV64-NEXT: vs8r.v v16, (a0)
; RV64-NEXT: addi a0, sp, 24		; RV64-NEXT: addi a0, sp, 128
; RV64-NEXT: li a3, 2		; RV64-NEXT: li a3, 2
; RV64-NEXT: addi a1, sp, 24		; RV64-NEXT: addi a1, sp, 128
; RV64-NEXT: vs8r.v v8, (a1)		; RV64-NEXT: vs8r.v v8, (a1)
; RV64-NEXT: vmv8r.v v8, v0		; RV64-NEXT: vmv8r.v v8, v0
; RV64-NEXT: vmv8r.v v16, v24		; RV64-NEXT: vmv8r.v v16, v24
; RV64-NEXT: call ext2@plt		; RV64-NEXT: call ext2@plt
; RV64-NEXT: csrr a0, vlenb		; RV64-NEXT: csrr a0, vlenb
; RV64-NEXT: slli a0, a0, 4		; RV64-NEXT: slli a0, a0, 4
; RV64-NEXT: add sp, sp, a0		; RV64-NEXT: add sp, sp, a0
; RV64-NEXT: ld ra, 24(sp) # 8-byte Folded Reload		; RV64-NEXT: ld ra, 136(sp) # 8-byte Folded Reload
; RV64-NEXT: addi sp, sp, 32		; RV64-NEXT: addi sp, sp, 144
; RV64-NEXT: ret		; RV64-NEXT: ret
%t = call fastcc <vscale x 32 x i32> @ext2(<vscale x 32 x i32> %y, <vscale x 32 x i32> %x, i32 %w, i32 2)		%t = call fastcc <vscale x 32 x i32> @ext2(<vscale x 32 x i32> %y, <vscale x 32 x i32> %x, i32 %w, i32 2)
ret <vscale x 32 x i32> %t		ret <vscale x 32 x i32> %t
}		}

define fastcc <vscale x 32 x i32> @ret_nxv32i32_call_nxv32i32_nxv32i32_nxv32i32_i32(<vscale x 32 x i32> %x, <vscale x 32 x i32> %y, <vscale x 32 x i32> %z, i32 %w) {		define fastcc <vscale x 32 x i32> @ret_nxv32i32_call_nxv32i32_nxv32i32_nxv32i32_i32(<vscale x 32 x i32> %x, <vscale x 32 x i32> %y, <vscale x 32 x i32> %z, i32 %w) {
; RV32-LABEL: ret_nxv32i32_call_nxv32i32_nxv32i32_nxv32i32_i32:		; RV32-LABEL: ret_nxv32i32_call_nxv32i32_nxv32i32_nxv32i32_i32:
; RV32: # %bb.0:		; RV32: # %bb.0:
; RV32-NEXT: addi sp, sp, -32		; RV32-NEXT: addi sp, sp, -144
; RV32-NEXT: .cfi_def_cfa_offset 32		; RV32-NEXT: .cfi_def_cfa_offset 144
; RV32-NEXT: sw ra, 28(sp) # 4-byte Folded Spill		; RV32-NEXT: sw ra, 140(sp) # 4-byte Folded Spill
; RV32-NEXT: .cfi_offset ra, -4		; RV32-NEXT: .cfi_offset ra, -4
; RV32-NEXT: csrr a1, vlenb		; RV32-NEXT: csrr a1, vlenb
; RV32-NEXT: li a3, 48		; RV32-NEXT: li a3, 48
; RV32-NEXT: mul a1, a1, a3		; RV32-NEXT: mul a1, a1, a3
; RV32-NEXT: sub sp, sp, a1		; RV32-NEXT: sub sp, sp, a1
; RV32-NEXT: csrr a1, vlenb		; RV32-NEXT: csrr a1, vlenb
; RV32-NEXT: slli a1, a1, 3		; RV32-NEXT: slli a1, a1, 3
; RV32-NEXT: add a3, a2, a1		; RV32-NEXT: add a3, a2, a1
; RV32-NEXT: vl8re32.v v24, (a3)		; RV32-NEXT: vl8re32.v v24, (a3)
; RV32-NEXT: csrr a3, vlenb		; RV32-NEXT: csrr a3, vlenb
; RV32-NEXT: slli a3, a3, 3		; RV32-NEXT: slli a3, a3, 3
; RV32-NEXT: add a3, sp, a3		; RV32-NEXT: add a3, sp, a3
; RV32-NEXT: addi a3, a3, 16		; RV32-NEXT: addi a3, a3, 128
; RV32-NEXT: vs8r.v v24, (a3) # Unknown-size Folded Spill		; RV32-NEXT: vs8r.v v24, (a3) # Unknown-size Folded Spill
; RV32-NEXT: add a3, a0, a1		; RV32-NEXT: add a3, a0, a1
; RV32-NEXT: vl8re32.v v24, (a3)		; RV32-NEXT: vl8re32.v v24, (a3)
; RV32-NEXT: vl8re32.v v0, (a2)		; RV32-NEXT: vl8re32.v v0, (a2)
; RV32-NEXT: addi a2, sp, 16		; RV32-NEXT: addi a2, sp, 128
; RV32-NEXT: vs8r.v v0, (a2) # Unknown-size Folded Spill		; RV32-NEXT: vs8r.v v0, (a2) # Unknown-size Folded Spill
; RV32-NEXT: vl8re32.v v0, (a0)		; RV32-NEXT: vl8re32.v v0, (a0)
; RV32-NEXT: csrr a0, vlenb		; RV32-NEXT: csrr a0, vlenb
; RV32-NEXT: slli a0, a0, 4		; RV32-NEXT: slli a0, a0, 4
; RV32-NEXT: add a0, sp, a0		; RV32-NEXT: add a0, sp, a0
; RV32-NEXT: addi a0, a0, 16		; RV32-NEXT: addi a0, a0, 128
; RV32-NEXT: add a0, a0, a1		; RV32-NEXT: add a0, a0, a1
; RV32-NEXT: vs8r.v v16, (a0)		; RV32-NEXT: vs8r.v v16, (a0)
; RV32-NEXT: csrr a0, vlenb		; RV32-NEXT: csrr a0, vlenb
; RV32-NEXT: slli a0, a0, 5		; RV32-NEXT: slli a0, a0, 5
; RV32-NEXT: add a0, sp, a0		; RV32-NEXT: add a0, sp, a0
; RV32-NEXT: addi a0, a0, 16		; RV32-NEXT: addi a0, a0, 128
; RV32-NEXT: add a0, a0, a1		; RV32-NEXT: add a0, a0, a1
; RV32-NEXT: vs8r.v v24, (a0)		; RV32-NEXT: vs8r.v v24, (a0)
; RV32-NEXT: csrr a0, vlenb		; RV32-NEXT: csrr a0, vlenb
; RV32-NEXT: slli a0, a0, 4		; RV32-NEXT: slli a0, a0, 4
; RV32-NEXT: add a0, sp, a0		; RV32-NEXT: add a0, sp, a0
; RV32-NEXT: addi a0, a0, 16		; RV32-NEXT: addi a0, a0, 128
; RV32-NEXT: vs8r.v v8, (a0)		; RV32-NEXT: vs8r.v v8, (a0)
; RV32-NEXT: csrr a0, vlenb		; RV32-NEXT: csrr a0, vlenb
; RV32-NEXT: slli a0, a0, 5		; RV32-NEXT: slli a0, a0, 5
; RV32-NEXT: add a0, sp, a0		; RV32-NEXT: add a0, sp, a0
; RV32-NEXT: addi a0, a0, 16		; RV32-NEXT: addi a0, a0, 128
; RV32-NEXT: csrr a1, vlenb		; RV32-NEXT: csrr a1, vlenb
; RV32-NEXT: slli a1, a1, 4		; RV32-NEXT: slli a1, a1, 4
; RV32-NEXT: add a1, sp, a1		; RV32-NEXT: add a1, sp, a1
; RV32-NEXT: addi a2, a1, 16		; RV32-NEXT: addi a2, a1, 128
; RV32-NEXT: li a5, 42		; RV32-NEXT: li a5, 42
; RV32-NEXT: csrr a1, vlenb		; RV32-NEXT: csrr a1, vlenb
; RV32-NEXT: slli a1, a1, 5		; RV32-NEXT: slli a1, a1, 5
; RV32-NEXT: add a1, sp, a1		; RV32-NEXT: add a1, sp, a1
; RV32-NEXT: addi a1, a1, 16		; RV32-NEXT: addi a1, a1, 128
; RV32-NEXT: vs8r.v v0, (a1)		; RV32-NEXT: vs8r.v v0, (a1)
; RV32-NEXT: addi a1, sp, 16		; RV32-NEXT: addi a1, sp, 128
; RV32-NEXT: vl8re8.v v8, (a1) # Unknown-size Folded Reload		; RV32-NEXT: vl8re8.v v8, (a1) # Unknown-size Folded Reload
; RV32-NEXT: csrr a1, vlenb		; RV32-NEXT: csrr a1, vlenb
; RV32-NEXT: slli a1, a1, 3		; RV32-NEXT: slli a1, a1, 3
; RV32-NEXT: add a1, sp, a1		; RV32-NEXT: add a1, sp, a1
; RV32-NEXT: addi a1, a1, 16		; RV32-NEXT: addi a1, a1, 128
; RV32-NEXT: vl8re8.v v16, (a1) # Unknown-size Folded Reload		; RV32-NEXT: vl8re8.v v16, (a1) # Unknown-size Folded Reload
; RV32-NEXT: call ext3@plt		; RV32-NEXT: call ext3@plt
; RV32-NEXT: csrr a0, vlenb		; RV32-NEXT: csrr a0, vlenb
; RV32-NEXT: li a1, 48		; RV32-NEXT: li a1, 48
; RV32-NEXT: mul a0, a0, a1		; RV32-NEXT: mul a0, a0, a1
; RV32-NEXT: add sp, sp, a0		; RV32-NEXT: add sp, sp, a0
; RV32-NEXT: lw ra, 28(sp) # 4-byte Folded Reload		; RV32-NEXT: lw ra, 140(sp) # 4-byte Folded Reload
; RV32-NEXT: addi sp, sp, 32		; RV32-NEXT: addi sp, sp, 144
; RV32-NEXT: ret		; RV32-NEXT: ret
;		;
; RV64-LABEL: ret_nxv32i32_call_nxv32i32_nxv32i32_nxv32i32_i32:		; RV64-LABEL: ret_nxv32i32_call_nxv32i32_nxv32i32_nxv32i32_i32:
; RV64: # %bb.0:		; RV64: # %bb.0:
; RV64-NEXT: addi sp, sp, -32		; RV64-NEXT: addi sp, sp, -144
; RV64-NEXT: .cfi_def_cfa_offset 32		; RV64-NEXT: .cfi_def_cfa_offset 144
; RV64-NEXT: sd ra, 24(sp) # 8-byte Folded Spill		; RV64-NEXT: sd ra, 136(sp) # 8-byte Folded Spill
; RV64-NEXT: .cfi_offset ra, -8		; RV64-NEXT: .cfi_offset ra, -8
; RV64-NEXT: csrr a1, vlenb		; RV64-NEXT: csrr a1, vlenb
; RV64-NEXT: li a3, 48		; RV64-NEXT: li a3, 48
; RV64-NEXT: mul a1, a1, a3		; RV64-NEXT: mul a1, a1, a3
; RV64-NEXT: sub sp, sp, a1		; RV64-NEXT: sub sp, sp, a1
; RV64-NEXT: csrr a1, vlenb		; RV64-NEXT: csrr a1, vlenb
; RV64-NEXT: slli a1, a1, 3		; RV64-NEXT: slli a1, a1, 3
; RV64-NEXT: add a3, a2, a1		; RV64-NEXT: add a3, a2, a1
; RV64-NEXT: vl8re32.v v24, (a3)		; RV64-NEXT: vl8re32.v v24, (a3)
; RV64-NEXT: csrr a3, vlenb		; RV64-NEXT: csrr a3, vlenb
; RV64-NEXT: slli a3, a3, 3		; RV64-NEXT: slli a3, a3, 3
; RV64-NEXT: add a3, sp, a3		; RV64-NEXT: add a3, sp, a3
; RV64-NEXT: addi a3, a3, 24		; RV64-NEXT: addi a3, a3, 128
; RV64-NEXT: vs8r.v v24, (a3) # Unknown-size Folded Spill		; RV64-NEXT: vs8r.v v24, (a3) # Unknown-size Folded Spill
; RV64-NEXT: add a3, a0, a1		; RV64-NEXT: add a3, a0, a1
; RV64-NEXT: vl8re32.v v24, (a3)		; RV64-NEXT: vl8re32.v v24, (a3)
; RV64-NEXT: vl8re32.v v0, (a2)		; RV64-NEXT: vl8re32.v v0, (a2)
; RV64-NEXT: addi a2, sp, 24		; RV64-NEXT: addi a2, sp, 128
; RV64-NEXT: vs8r.v v0, (a2) # Unknown-size Folded Spill		; RV64-NEXT: vs8r.v v0, (a2) # Unknown-size Folded Spill
; RV64-NEXT: vl8re32.v v0, (a0)		; RV64-NEXT: vl8re32.v v0, (a0)
; RV64-NEXT: csrr a0, vlenb		; RV64-NEXT: csrr a0, vlenb
; RV64-NEXT: slli a0, a0, 4		; RV64-NEXT: slli a0, a0, 4
; RV64-NEXT: add a0, sp, a0		; RV64-NEXT: add a0, sp, a0
; RV64-NEXT: addi a0, a0, 24		; RV64-NEXT: addi a0, a0, 128
; RV64-NEXT: add a0, a0, a1		; RV64-NEXT: add a0, a0, a1
; RV64-NEXT: vs8r.v v16, (a0)		; RV64-NEXT: vs8r.v v16, (a0)
; RV64-NEXT: csrr a0, vlenb		; RV64-NEXT: csrr a0, vlenb
; RV64-NEXT: slli a0, a0, 5		; RV64-NEXT: slli a0, a0, 5
; RV64-NEXT: add a0, sp, a0		; RV64-NEXT: add a0, sp, a0
; RV64-NEXT: addi a0, a0, 24		; RV64-NEXT: addi a0, a0, 128
; RV64-NEXT: add a0, a0, a1		; RV64-NEXT: add a0, a0, a1
; RV64-NEXT: vs8r.v v24, (a0)		; RV64-NEXT: vs8r.v v24, (a0)
; RV64-NEXT: csrr a0, vlenb		; RV64-NEXT: csrr a0, vlenb
; RV64-NEXT: slli a0, a0, 4		; RV64-NEXT: slli a0, a0, 4
; RV64-NEXT: add a0, sp, a0		; RV64-NEXT: add a0, sp, a0
; RV64-NEXT: addi a0, a0, 24		; RV64-NEXT: addi a0, a0, 128
; RV64-NEXT: vs8r.v v8, (a0)		; RV64-NEXT: vs8r.v v8, (a0)
; RV64-NEXT: csrr a0, vlenb		; RV64-NEXT: csrr a0, vlenb
; RV64-NEXT: slli a0, a0, 5		; RV64-NEXT: slli a0, a0, 5
; RV64-NEXT: add a0, sp, a0		; RV64-NEXT: add a0, sp, a0
; RV64-NEXT: addi a0, a0, 24		; RV64-NEXT: addi a0, a0, 128
; RV64-NEXT: csrr a1, vlenb		; RV64-NEXT: csrr a1, vlenb
; RV64-NEXT: slli a1, a1, 4		; RV64-NEXT: slli a1, a1, 4
; RV64-NEXT: add a1, sp, a1		; RV64-NEXT: add a1, sp, a1
; RV64-NEXT: addi a2, a1, 24		; RV64-NEXT: addi a2, a1, 128
; RV64-NEXT: li a5, 42		; RV64-NEXT: li a5, 42
; RV64-NEXT: csrr a1, vlenb		; RV64-NEXT: csrr a1, vlenb
; RV64-NEXT: slli a1, a1, 5		; RV64-NEXT: slli a1, a1, 5
; RV64-NEXT: add a1, sp, a1		; RV64-NEXT: add a1, sp, a1
; RV64-NEXT: addi a1, a1, 24		; RV64-NEXT: addi a1, a1, 128
; RV64-NEXT: vs8r.v v0, (a1)		; RV64-NEXT: vs8r.v v0, (a1)
; RV64-NEXT: addi a1, sp, 24		; RV64-NEXT: addi a1, sp, 128
; RV64-NEXT: vl8re8.v v8, (a1) # Unknown-size Folded Reload		; RV64-NEXT: vl8re8.v v8, (a1) # Unknown-size Folded Reload
; RV64-NEXT: csrr a1, vlenb		; RV64-NEXT: csrr a1, vlenb
; RV64-NEXT: slli a1, a1, 3		; RV64-NEXT: slli a1, a1, 3
; RV64-NEXT: add a1, sp, a1		; RV64-NEXT: add a1, sp, a1
; RV64-NEXT: addi a1, a1, 24		; RV64-NEXT: addi a1, a1, 128
; RV64-NEXT: vl8re8.v v16, (a1) # Unknown-size Folded Reload		; RV64-NEXT: vl8re8.v v16, (a1) # Unknown-size Folded Reload
; RV64-NEXT: call ext3@plt		; RV64-NEXT: call ext3@plt
; RV64-NEXT: csrr a0, vlenb		; RV64-NEXT: csrr a0, vlenb
; RV64-NEXT: li a1, 48		; RV64-NEXT: li a1, 48
; RV64-NEXT: mul a0, a0, a1		; RV64-NEXT: mul a0, a0, a1
; RV64-NEXT: add sp, sp, a0		; RV64-NEXT: add sp, sp, a0
; RV64-NEXT: ld ra, 24(sp) # 8-byte Folded Reload		; RV64-NEXT: ld ra, 136(sp) # 8-byte Folded Reload
; RV64-NEXT: addi sp, sp, 32		; RV64-NEXT: addi sp, sp, 144
; RV64-NEXT: ret		; RV64-NEXT: ret
%t = call fastcc <vscale x 32 x i32> @ext3(<vscale x 32 x i32> %z, <vscale x 32 x i32> %y, <vscale x 32 x i32> %x, i32 %w, i32 42)		%t = call fastcc <vscale x 32 x i32> @ext3(<vscale x 32 x i32> %z, <vscale x 32 x i32> %y, <vscale x 32 x i32> %x, i32 %w, i32 42)
ret <vscale x 32 x i32> %t		ret <vscale x 32 x i32> %t
}		}

; A test case where the normal calling convention would pass directly via the		; A test case where the normal calling convention would pass directly via the
; stack, but with fastcc can pass indirectly with the extra GPR registers		; stack, but with fastcc can pass indirectly with the extra GPR registers
; allowed.		; allowed.
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%s = add <vscale x 32 x i32> %x, %z		%s = add <vscale x 32 x i32> %x, %z
ret <vscale x 32 x i32> %s		ret <vscale x 32 x i32> %s
}		}

; Calling the function above. Ensure we pass the arguments correctly.		; Calling the function above. Ensure we pass the arguments correctly.
define fastcc <vscale x 32 x i32> @pass_vector_arg_indirect_stack(<vscale x 32 x i32> %x, <vscale x 32 x i32> %y, <vscale x 32 x i32> %z) {		define fastcc <vscale x 32 x i32> @pass_vector_arg_indirect_stack(<vscale x 32 x i32> %x, <vscale x 32 x i32> %y, <vscale x 32 x i32> %z) {
; RV32-LABEL: pass_vector_arg_indirect_stack:		; RV32-LABEL: pass_vector_arg_indirect_stack:
; RV32: # %bb.0:		; RV32: # %bb.0:
; RV32-NEXT: addi sp, sp, -32		; RV32-NEXT: addi sp, sp, -144
; RV32-NEXT: .cfi_def_cfa_offset 32		; RV32-NEXT: .cfi_def_cfa_offset 144
; RV32-NEXT: sw ra, 28(sp) # 4-byte Folded Spill		; RV32-NEXT: sw ra, 140(sp) # 4-byte Folded Spill
; RV32-NEXT: .cfi_offset ra, -4		; RV32-NEXT: .cfi_offset ra, -4
; RV32-NEXT: csrr a0, vlenb		; RV32-NEXT: csrr a0, vlenb
; RV32-NEXT: slli a0, a0, 5		; RV32-NEXT: slli a0, a0, 5
; RV32-NEXT: sub sp, sp, a0		; RV32-NEXT: sub sp, sp, a0
; RV32-NEXT: csrr a0, vlenb		; RV32-NEXT: csrr a0, vlenb
; RV32-NEXT: slli a0, a0, 3		; RV32-NEXT: slli a0, a0, 3
; RV32-NEXT: addi a1, sp, 16		; RV32-NEXT: addi a1, sp, 128
; RV32-NEXT: add a1, a1, a0		; RV32-NEXT: add a1, a1, a0
; RV32-NEXT: vsetvli a2, zero, e32, m8, ta, mu		; RV32-NEXT: vsetvli a2, zero, e32, m8, ta, mu
; RV32-NEXT: vmv.v.i v8, 0		; RV32-NEXT: vmv.v.i v8, 0
; RV32-NEXT: vs8r.v v8, (a1)		; RV32-NEXT: vs8r.v v8, (a1)
; RV32-NEXT: csrr a1, vlenb		; RV32-NEXT: csrr a1, vlenb
; RV32-NEXT: slli a1, a1, 4		; RV32-NEXT: slli a1, a1, 4
; RV32-NEXT: add a1, sp, a1		; RV32-NEXT: add a1, sp, a1
; RV32-NEXT: addi a1, a1, 16		; RV32-NEXT: addi a1, a1, 128
; RV32-NEXT: add a0, a1, a0		; RV32-NEXT: add a0, a1, a0
; RV32-NEXT: vs8r.v v8, (a0)		; RV32-NEXT: vs8r.v v8, (a0)
; RV32-NEXT: addi a0, sp, 16		; RV32-NEXT: addi a0, sp, 128
; RV32-NEXT: vs8r.v v8, (a0)		; RV32-NEXT: vs8r.v v8, (a0)
; RV32-NEXT: li a1, 1		; RV32-NEXT: li a1, 1
; RV32-NEXT: li a2, 2		; RV32-NEXT: li a2, 2
; RV32-NEXT: li a3, 3		; RV32-NEXT: li a3, 3
; RV32-NEXT: li a4, 4		; RV32-NEXT: li a4, 4
; RV32-NEXT: li a5, 5		; RV32-NEXT: li a5, 5
; RV32-NEXT: li a6, 6		; RV32-NEXT: li a6, 6
; RV32-NEXT: li a7, 7		; RV32-NEXT: li a7, 7
; RV32-NEXT: csrr a0, vlenb		; RV32-NEXT: csrr a0, vlenb
; RV32-NEXT: slli a0, a0, 4		; RV32-NEXT: slli a0, a0, 4
; RV32-NEXT: add a0, sp, a0		; RV32-NEXT: add a0, sp, a0
; RV32-NEXT: addi t2, a0, 16		; RV32-NEXT: addi t2, a0, 128
; RV32-NEXT: addi t4, sp, 16		; RV32-NEXT: addi t4, sp, 128
; RV32-NEXT: li t6, 8		; RV32-NEXT: li t6, 8
; RV32-NEXT: csrr a0, vlenb		; RV32-NEXT: csrr a0, vlenb
; RV32-NEXT: slli a0, a0, 4		; RV32-NEXT: slli a0, a0, 4
; RV32-NEXT: add a0, sp, a0		; RV32-NEXT: add a0, sp, a0
; RV32-NEXT: addi a0, a0, 16		; RV32-NEXT: addi a0, a0, 128
; RV32-NEXT: vs8r.v v8, (a0)		; RV32-NEXT: vs8r.v v8, (a0)
; RV32-NEXT: li a0, 0		; RV32-NEXT: li a0, 0
; RV32-NEXT: vmv.v.i v16, 0		; RV32-NEXT: vmv.v.i v16, 0
; RV32-NEXT: call vector_arg_indirect_stack@plt		; RV32-NEXT: call vector_arg_indirect_stack@plt
; RV32-NEXT: csrr a0, vlenb		; RV32-NEXT: csrr a0, vlenb
; RV32-NEXT: slli a0, a0, 5		; RV32-NEXT: slli a0, a0, 5
; RV32-NEXT: add sp, sp, a0		; RV32-NEXT: add sp, sp, a0
; RV32-NEXT: lw ra, 28(sp) # 4-byte Folded Reload		; RV32-NEXT: lw ra, 140(sp) # 4-byte Folded Reload
; RV32-NEXT: addi sp, sp, 32		; RV32-NEXT: addi sp, sp, 144
; RV32-NEXT: ret		; RV32-NEXT: ret
;		;
; RV64-LABEL: pass_vector_arg_indirect_stack:		; RV64-LABEL: pass_vector_arg_indirect_stack:
; RV64: # %bb.0:		; RV64: # %bb.0:
; RV64-NEXT: addi sp, sp, -32		; RV64-NEXT: addi sp, sp, -144
; RV64-NEXT: .cfi_def_cfa_offset 32		; RV64-NEXT: .cfi_def_cfa_offset 144
; RV64-NEXT: sd ra, 24(sp) # 8-byte Folded Spill		; RV64-NEXT: sd ra, 136(sp) # 8-byte Folded Spill
; RV64-NEXT: .cfi_offset ra, -8		; RV64-NEXT: .cfi_offset ra, -8
; RV64-NEXT: csrr a0, vlenb		; RV64-NEXT: csrr a0, vlenb
; RV64-NEXT: slli a0, a0, 5		; RV64-NEXT: slli a0, a0, 5
; RV64-NEXT: sub sp, sp, a0		; RV64-NEXT: sub sp, sp, a0
; RV64-NEXT: csrr a0, vlenb		; RV64-NEXT: csrr a0, vlenb
; RV64-NEXT: slli a0, a0, 3		; RV64-NEXT: slli a0, a0, 3
; RV64-NEXT: addi a1, sp, 24		; RV64-NEXT: addi a1, sp, 128
; RV64-NEXT: add a1, a1, a0		; RV64-NEXT: add a1, a1, a0
; RV64-NEXT: vsetvli a2, zero, e32, m8, ta, mu		; RV64-NEXT: vsetvli a2, zero, e32, m8, ta, mu
; RV64-NEXT: vmv.v.i v8, 0		; RV64-NEXT: vmv.v.i v8, 0
; RV64-NEXT: vs8r.v v8, (a1)		; RV64-NEXT: vs8r.v v8, (a1)
; RV64-NEXT: csrr a1, vlenb		; RV64-NEXT: csrr a1, vlenb
; RV64-NEXT: slli a1, a1, 4		; RV64-NEXT: slli a1, a1, 4
; RV64-NEXT: add a1, sp, a1		; RV64-NEXT: add a1, sp, a1
; RV64-NEXT: addi a1, a1, 24		; RV64-NEXT: addi a1, a1, 128
; RV64-NEXT: add a0, a1, a0		; RV64-NEXT: add a0, a1, a0
; RV64-NEXT: vs8r.v v8, (a0)		; RV64-NEXT: vs8r.v v8, (a0)
; RV64-NEXT: addi a0, sp, 24		; RV64-NEXT: addi a0, sp, 128
; RV64-NEXT: vs8r.v v8, (a0)		; RV64-NEXT: vs8r.v v8, (a0)
; RV64-NEXT: li a1, 1		; RV64-NEXT: li a1, 1
; RV64-NEXT: li a2, 2		; RV64-NEXT: li a2, 2
; RV64-NEXT: li a3, 3		; RV64-NEXT: li a3, 3
; RV64-NEXT: li a4, 4		; RV64-NEXT: li a4, 4
; RV64-NEXT: li a5, 5		; RV64-NEXT: li a5, 5
; RV64-NEXT: li a6, 6		; RV64-NEXT: li a6, 6
; RV64-NEXT: li a7, 7		; RV64-NEXT: li a7, 7
; RV64-NEXT: csrr a0, vlenb		; RV64-NEXT: csrr a0, vlenb
; RV64-NEXT: slli a0, a0, 4		; RV64-NEXT: slli a0, a0, 4
; RV64-NEXT: add a0, sp, a0		; RV64-NEXT: add a0, sp, a0
; RV64-NEXT: addi t2, a0, 24		; RV64-NEXT: addi t2, a0, 128
; RV64-NEXT: addi t4, sp, 24		; RV64-NEXT: addi t4, sp, 128
; RV64-NEXT: li t6, 8		; RV64-NEXT: li t6, 8
; RV64-NEXT: csrr a0, vlenb		; RV64-NEXT: csrr a0, vlenb
; RV64-NEXT: slli a0, a0, 4		; RV64-NEXT: slli a0, a0, 4
; RV64-NEXT: add a0, sp, a0		; RV64-NEXT: add a0, sp, a0
; RV64-NEXT: addi a0, a0, 24		; RV64-NEXT: addi a0, a0, 128
; RV64-NEXT: vs8r.v v8, (a0)		; RV64-NEXT: vs8r.v v8, (a0)
; RV64-NEXT: li a0, 0		; RV64-NEXT: li a0, 0
; RV64-NEXT: vmv.v.i v16, 0		; RV64-NEXT: vmv.v.i v16, 0
; RV64-NEXT: call vector_arg_indirect_stack@plt		; RV64-NEXT: call vector_arg_indirect_stack@plt
; RV64-NEXT: csrr a0, vlenb		; RV64-NEXT: csrr a0, vlenb
; RV64-NEXT: slli a0, a0, 5		; RV64-NEXT: slli a0, a0, 5
; RV64-NEXT: add sp, sp, a0		; RV64-NEXT: add sp, sp, a0
; RV64-NEXT: ld ra, 24(sp) # 8-byte Folded Reload		; RV64-NEXT: ld ra, 136(sp) # 8-byte Folded Reload
; RV64-NEXT: addi sp, sp, 32		; RV64-NEXT: addi sp, sp, 144
; RV64-NEXT: ret		; RV64-NEXT: ret
%s = call fastcc <vscale x 32 x i32> @vector_arg_indirect_stack(i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, <vscale x 32 x i32> zeroinitializer, <vscale x 32 x i32> zeroinitializer, <vscale x 32 x i32> zeroinitializer, i32 8)		%s = call fastcc <vscale x 32 x i32> @vector_arg_indirect_stack(i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, <vscale x 32 x i32> zeroinitializer, <vscale x 32 x i32> zeroinitializer, <vscale x 32 x i32> zeroinitializer, i32 8)
ret <vscale x 32 x i32> %s		ret <vscale x 32 x i32> %s
}		}

llvm/test/CodeGen/RISCV/rvv/calling-conv.ll

; CHECK-NEXT: ret		; CHECK-NEXT: ret
%a = add <vscale x 32 x i32> %x, %y		%a = add <vscale x 32 x i32> %x, %y
ret <vscale x 32 x i32> %a		ret <vscale x 32 x i32> %a
}		}

; Call the function above. Check that we set the arguments correctly.		; Call the function above. Check that we set the arguments correctly.
define <vscale x 32 x i32> @caller_scalable_vector_split_indirect(<vscale x 32 x i32> %x) {		define <vscale x 32 x i32> @caller_scalable_vector_split_indirect(<vscale x 32 x i32> %x) {
; RV32-LABEL: caller_scalable_vector_split_indirect:		; RV32-LABEL: caller_scalable_vector_split_indirect:
; RV32: # %bb.0:		; RV32: # %bb.0:
; RV32-NEXT: addi sp, sp, -48		; RV32-NEXT: addi sp, sp, -144
; RV32-NEXT: .cfi_def_cfa_offset 48		; RV32-NEXT: .cfi_def_cfa_offset 144
; RV32-NEXT: sw ra, 44(sp) # 4-byte Folded Spill		; RV32-NEXT: sw ra, 140(sp) # 4-byte Folded Spill
; RV32-NEXT: .cfi_offset ra, -4		; RV32-NEXT: .cfi_offset ra, -4
; RV32-NEXT: csrr a0, vlenb		; RV32-NEXT: csrr a0, vlenb
; RV32-NEXT: slli a0, a0, 4		; RV32-NEXT: slli a0, a0, 4
; RV32-NEXT: sub sp, sp, a0		; RV32-NEXT: sub sp, sp, a0
; RV32-NEXT: csrr a0, vlenb		; RV32-NEXT: csrr a0, vlenb
; RV32-NEXT: slli a0, a0, 3		; RV32-NEXT: slli a0, a0, 3
; RV32-NEXT: addi a1, sp, 32		; RV32-NEXT: addi a1, sp, 128
; RV32-NEXT: add a0, a1, a0		; RV32-NEXT: add a0, a1, a0
; RV32-NEXT: vs8r.v v16, (a0)		; RV32-NEXT: vs8r.v v16, (a0)
; RV32-NEXT: addi a0, sp, 32		; RV32-NEXT: addi a0, sp, 128
; RV32-NEXT: vs8r.v v8, (a0)		; RV32-NEXT: vs8r.v v8, (a0)
; RV32-NEXT: vsetvli a0, zero, e32, m8, ta, mu		; RV32-NEXT: vsetvli a0, zero, e32, m8, ta, mu
; RV32-NEXT: vmv.v.i v8, 0		; RV32-NEXT: vmv.v.i v8, 0
; RV32-NEXT: addi a0, sp, 32		; RV32-NEXT: addi a0, sp, 128
; RV32-NEXT: vmv.v.i v16, 0		; RV32-NEXT: vmv.v.i v16, 0
; RV32-NEXT: call callee_scalable_vector_split_indirect@plt		; RV32-NEXT: call callee_scalable_vector_split_indirect@plt
; RV32-NEXT: csrr a0, vlenb		; RV32-NEXT: csrr a0, vlenb
; RV32-NEXT: slli a0, a0, 4		; RV32-NEXT: slli a0, a0, 4
; RV32-NEXT: add sp, sp, a0		; RV32-NEXT: add sp, sp, a0
; RV32-NEXT: lw ra, 44(sp) # 4-byte Folded Reload		; RV32-NEXT: lw ra, 140(sp) # 4-byte Folded Reload
; RV32-NEXT: addi sp, sp, 48		; RV32-NEXT: addi sp, sp, 144
; RV32-NEXT: ret		; RV32-NEXT: ret
;		;
; RV64-LABEL: caller_scalable_vector_split_indirect:		; RV64-LABEL: caller_scalable_vector_split_indirect:
; RV64: # %bb.0:		; RV64: # %bb.0:
; RV64-NEXT: addi sp, sp, -32		; RV64-NEXT: addi sp, sp, -144
; RV64-NEXT: .cfi_def_cfa_offset 32		; RV64-NEXT: .cfi_def_cfa_offset 144
; RV64-NEXT: sd ra, 24(sp) # 8-byte Folded Spill		; RV64-NEXT: sd ra, 136(sp) # 8-byte Folded Spill
; RV64-NEXT: .cfi_offset ra, -8		; RV64-NEXT: .cfi_offset ra, -8
; RV64-NEXT: csrr a0, vlenb		; RV64-NEXT: csrr a0, vlenb
; RV64-NEXT: slli a0, a0, 4		; RV64-NEXT: slli a0, a0, 4
; RV64-NEXT: sub sp, sp, a0		; RV64-NEXT: sub sp, sp, a0
; RV64-NEXT: csrr a0, vlenb		; RV64-NEXT: csrr a0, vlenb
; RV64-NEXT: slli a0, a0, 3		; RV64-NEXT: slli a0, a0, 3
; RV64-NEXT: addi a1, sp, 24		; RV64-NEXT: addi a1, sp, 128
; RV64-NEXT: add a0, a1, a0		; RV64-NEXT: add a0, a1, a0
; RV64-NEXT: vs8r.v v16, (a0)		; RV64-NEXT: vs8r.v v16, (a0)
; RV64-NEXT: addi a0, sp, 24		; RV64-NEXT: addi a0, sp, 128
; RV64-NEXT: vs8r.v v8, (a0)		; RV64-NEXT: vs8r.v v8, (a0)
; RV64-NEXT: vsetvli a0, zero, e32, m8, ta, mu		; RV64-NEXT: vsetvli a0, zero, e32, m8, ta, mu
; RV64-NEXT: vmv.v.i v8, 0		; RV64-NEXT: vmv.v.i v8, 0
; RV64-NEXT: addi a0, sp, 24		; RV64-NEXT: addi a0, sp, 128
; RV64-NEXT: vmv.v.i v16, 0		; RV64-NEXT: vmv.v.i v16, 0
; RV64-NEXT: call callee_scalable_vector_split_indirect@plt		; RV64-NEXT: call callee_scalable_vector_split_indirect@plt
; RV64-NEXT: csrr a0, vlenb		; RV64-NEXT: csrr a0, vlenb
; RV64-NEXT: slli a0, a0, 4		; RV64-NEXT: slli a0, a0, 4
; RV64-NEXT: add sp, sp, a0		; RV64-NEXT: add sp, sp, a0
; RV64-NEXT: ld ra, 24(sp) # 8-byte Folded Reload		; RV64-NEXT: ld ra, 136(sp) # 8-byte Folded Reload
; RV64-NEXT: addi sp, sp, 32		; RV64-NEXT: addi sp, sp, 144
; RV64-NEXT: ret		; RV64-NEXT: ret
%c = alloca i64		%c = alloca i64
%a = call <vscale x 32 x i32> @callee_scalable_vector_split_indirect(<vscale x 32 x i32> zeroinitializer, <vscale x 32 x i32> %x)		%a = call <vscale x 32 x i32> @callee_scalable_vector_split_indirect(<vscale x 32 x i32> zeroinitializer, <vscale x 32 x i32> %x)
ret <vscale x 32 x i32> %a		ret <vscale x 32 x i32> %a
}		}

llvm/test/CodeGen/RISCV/rvv/emergency-slot.mir

body: |		body: |
; CHECK-NEXT: frame-setup CFI_INSTRUCTION offset $x24, -72		; CHECK-NEXT: frame-setup CFI_INSTRUCTION offset $x24, -72
; CHECK-NEXT: frame-setup CFI_INSTRUCTION offset $x25, -80		; CHECK-NEXT: frame-setup CFI_INSTRUCTION offset $x25, -80
; CHECK-NEXT: frame-setup CFI_INSTRUCTION offset $x26, -88		; CHECK-NEXT: frame-setup CFI_INSTRUCTION offset $x26, -88
; CHECK-NEXT: frame-setup CFI_INSTRUCTION offset $x27, -96		; CHECK-NEXT: frame-setup CFI_INSTRUCTION offset $x27, -96
; CHECK-NEXT: $x8 = frame-setup ADDI $x2, 2032		; CHECK-NEXT: $x8 = frame-setup ADDI $x2, 2032
; CHECK-NEXT: frame-setup CFI_INSTRUCTION def_cfa $x8, 0		; CHECK-NEXT: frame-setup CFI_INSTRUCTION def_cfa $x8, 0
; CHECK-NEXT: $x2 = frame-setup ADDI $x2, -272		; CHECK-NEXT: $x2 = frame-setup ADDI $x2, -272
; CHECK-NEXT: $x10 = frame-setup PseudoReadVLENB		; CHECK-NEXT: $x10 = frame-setup PseudoReadVLENB
; CHECK-NEXT: $x11 = frame-setup ADDI killed $x0, 51		; CHECK-NEXT: $x11 = frame-setup ADDI killed $x0, 52
; CHECK-NEXT: $x10 = frame-setup MUL killed $x10, killed $x11		; CHECK-NEXT: $x10 = frame-setup MUL killed $x10, killed $x11
; CHECK-NEXT: $x2 = frame-setup SUB $x2, killed $x10		; CHECK-NEXT: $x2 = frame-setup SUB $x2, killed $x10
; CHECK-NEXT: $x2 = frame-setup ANDI $x2, -128		; CHECK-NEXT: $x2 = frame-setup ANDI $x2, -128
; CHECK-NEXT: dead renamable $x15 = PseudoVSETIVLI 1, 72 /* e16, m1, ta, mu */, implicit-def $vl, implicit-def $vtype		; CHECK-NEXT: dead renamable $x15 = PseudoVSETIVLI 1, 72 /* e16, m1, ta, mu */, implicit-def $vl, implicit-def $vtype
; CHECK-NEXT: renamable $v25 = PseudoVMV_V_X_M1 killed renamable $x12, $noreg, 4 /* e16 */, implicit $vl, implicit $vtype		; CHECK-NEXT: renamable $v25 = PseudoVMV_V_X_M1 killed renamable $x12, $noreg, 4 /* e16 */, implicit $vl, implicit $vtype
; CHECK-NEXT: $x11 = PseudoReadVLENB		; CHECK-NEXT: $x11 = PseudoReadVLENB
; CHECK-NEXT: $x10 = ADDI killed $x0, 50		; CHECK-NEXT: $x10 = ADDI killed $x0, 50
; CHECK-NEXT: $x11 = MUL killed $x11, killed $x10		; CHECK-NEXT: $x11 = MUL killed $x11, killed $x10

llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert-subvector.ll

; CHECK-NEXT: ret		; CHECK-NEXT: ret
ret void		ret void
}		}

; Check we don't mistakenly optimize this: we don't know whether this is		; Check we don't mistakenly optimize this: we don't know whether this is
; inserted into the low or high split vector.		; inserted into the low or high split vector.
define void @insert_v2i64_nxv16i64_hi(<2 x i64>* %psv, <vscale x 16 x i64>* %out) {		define void @insert_v2i64_nxv16i64_hi(<2 x i64>* %psv, <vscale x 16 x i64>* %out) {
; CHECK-LABEL: insert_v2i64_nxv16i64_hi:		; CHECK-LABEL: insert_v2i64_nxv16i64_hi:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: addi sp, sp, -16		; CHECK-NEXT: addi sp, sp, -64
; CHECK-NEXT: .cfi_def_cfa_offset 16		; CHECK-NEXT: .cfi_def_cfa_offset 64
; CHECK-NEXT: csrr a2, vlenb		; CHECK-NEXT: csrr a2, vlenb
; CHECK-NEXT: slli a2, a2, 4		; CHECK-NEXT: slli a2, a2, 4
; CHECK-NEXT: sub sp, sp, a2		; CHECK-NEXT: sub sp, sp, a2
; CHECK-NEXT: vsetivli zero, 2, e64, m1, ta, mu		; CHECK-NEXT: vsetivli zero, 2, e64, m1, ta, mu
; CHECK-NEXT: vle64.v v8, (a0)		; CHECK-NEXT: vle64.v v8, (a0)
; CHECK-NEXT: addi a0, sp, 80		; CHECK-NEXT: addi a0, sp, 128
; CHECK-NEXT: vse64.v v8, (a0)		; CHECK-NEXT: vse64.v v8, (a0)
; CHECK-NEXT: csrr a0, vlenb		; CHECK-NEXT: csrr a0, vlenb
; CHECK-NEXT: slli a0, a0, 3		; CHECK-NEXT: slli a0, a0, 3
; CHECK-NEXT: addi a2, sp, 16		; CHECK-NEXT: addi a2, sp, 64
; CHECK-NEXT: add a2, a2, a0		; CHECK-NEXT: add a2, a2, a0
; CHECK-NEXT: vl8re64.v v8, (a2)		; CHECK-NEXT: vl8re64.v v8, (a2)
; CHECK-NEXT: addi a2, sp, 16		; CHECK-NEXT: addi a2, sp, 64
; CHECK-NEXT: vl8re64.v v16, (a2)		; CHECK-NEXT: vl8re64.v v16, (a2)
; CHECK-NEXT: add a0, a1, a0		; CHECK-NEXT: add a0, a1, a0
; CHECK-NEXT: vs8r.v v8, (a0)		; CHECK-NEXT: vs8r.v v8, (a0)
; CHECK-NEXT: vs8r.v v16, (a1)		; CHECK-NEXT: vs8r.v v16, (a1)
; CHECK-NEXT: csrr a0, vlenb		; CHECK-NEXT: csrr a0, vlenb
; CHECK-NEXT: slli a0, a0, 4		; CHECK-NEXT: slli a0, a0, 4
; CHECK-NEXT: add sp, sp, a0		; CHECK-NEXT: add sp, sp, a0
; CHECK-NEXT: addi sp, sp, 16		; CHECK-NEXT: addi sp, sp, 64
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%sv = load <2 x i64>, <2 x i64>* %psv		%sv = load <2 x i64>, <2 x i64>* %psv
%v = call <vscale x 16 x i64> @llvm.experimental.vector.insert.v2i64.nxv16i64(<vscale x 16 x i64> undef, <2 x i64> %sv, i64 8)		%v = call <vscale x 16 x i64> @llvm.experimental.vector.insert.v2i64.nxv16i64(<vscale x 16 x i64> undef, <2 x i64> %sv, i64 8)
store <vscale x 16 x i64> %v, <vscale x 16 x i64>* %out		store <vscale x 16 x i64> %v, <vscale x 16 x i64>* %out
ret void		ret void
}		}

declare <8 x i1> @llvm.experimental.vector.insert.v4i1.v8i1(<8 x i1>, <4 x i1>, i64)		declare <8 x i1> @llvm.experimental.vector.insert.v4i1.v8i1(<8 x i1>, <4 x i1>, i64)

llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vpscatter.ll

	; RV32-NEXT: vsoxei32.v v16, (a0), v8, v0.t			; RV32-NEXT: vsoxei32.v v16, (a0), v8, v0.t
	; RV32-NEXT: ret			; RV32-NEXT: ret
	;			;
	; RV64-LABEL: vpscatter_baseidx_v32i32_v32f64:			; RV64-LABEL: vpscatter_baseidx_v32i32_v32f64:
	; RV64: # %bb.0:			; RV64: # %bb.0:
	; RV64-NEXT: addi sp, sp, -16			; RV64-NEXT: addi sp, sp, -16
	; RV64-NEXT: .cfi_def_cfa_offset 16			; RV64-NEXT: .cfi_def_cfa_offset 16
	; RV64-NEXT: csrr a3, vlenb			; RV64-NEXT: csrr a3, vlenb
	; RV64-NEXT: slli a4, a3, 3			; RV64-NEXT: li a4, 10
	; RV64-NEXT: add a3, a4, a3			; RV64-NEXT: mul a3, a3, a4
	; RV64-NEXT: sub sp, sp, a3			; RV64-NEXT: sub sp, sp, a3
	; RV64-NEXT: li a3, 32			; RV64-NEXT: li a3, 32
	; RV64-NEXT: vsetvli zero, a3, e32, m8, ta, mu			; RV64-NEXT: vsetvli zero, a3, e32, m8, ta, mu
	; RV64-NEXT: vle32.v v24, (a1)			; RV64-NEXT: vle32.v v24, (a1)
	; RV64-NEXT: csrr a1, vlenb			; RV64-NEXT: csrr a1, vlenb
	; RV64-NEXT: add a1, sp, a1			; RV64-NEXT: add a1, sp, a1
	; RV64-NEXT: addi a1, a1, 16			; RV64-NEXT: addi a1, a1, 16
	; RV64-NEXT: vs8r.v v24, (a1) # Unknown-size Folded Spill			; RV64-NEXT: vs8r.v v24, (a1) # Unknown-size Folded Spill
	; RV64-NEXT: vl8re8.v v8, (a2) # Unknown-size Folded Reload			; RV64-NEXT: vl8re8.v v8, (a2) # Unknown-size Folded Reload
	; RV64-NEXT: vslidedown.vi v8, v8, 16			; RV64-NEXT: vslidedown.vi v8, v8, 16
	; RV64-NEXT: vsetivli zero, 16, e64, m8, ta, mu			; RV64-NEXT: vsetivli zero, 16, e64, m8, ta, mu
	; RV64-NEXT: vsext.vf2 v24, v8			; RV64-NEXT: vsext.vf2 v24, v8
	; RV64-NEXT: vsll.vi v8, v24, 3			; RV64-NEXT: vsll.vi v8, v24, 3
	; RV64-NEXT: vsetvli zero, a1, e64, m8, ta, mu			; RV64-NEXT: vsetvli zero, a1, e64, m8, ta, mu
	; RV64-NEXT: vsoxei64.v v16, (a0), v8, v0.t			; RV64-NEXT: vsoxei64.v v16, (a0), v8, v0.t
	; RV64-NEXT: csrr a0, vlenb			; RV64-NEXT: csrr a0, vlenb
	; RV64-NEXT: slli a1, a0, 3			; RV64-NEXT: li a1, 10
	; RV64-NEXT: add a0, a1, a0			; RV64-NEXT: mul a0, a0, a1
	; RV64-NEXT: add sp, sp, a0			; RV64-NEXT: add sp, sp, a0
	; RV64-NEXT: addi sp, sp, 16			; RV64-NEXT: addi sp, sp, 16
	; RV64-NEXT: ret			; RV64-NEXT: ret
	%ptrs = getelementptr inbounds double, double* %base, <32 x i32> %idxs			%ptrs = getelementptr inbounds double, double* %base, <32 x i32> %idxs
	call void @llvm.vp.scatter.v32f64.v32p0f64(<32 x double> %val, <32 x double*> %ptrs, <32 x i1> %m, i32 %evl)			call void @llvm.vp.scatter.v32f64.v32p0f64(<32 x double> %val, <32 x double*> %ptrs, <32 x i1> %m, i32 %evl)
	ret void			ret void
	}			}

	[267 lines not shown]

llvm/test/CodeGen/RISCV/rvv/localvar.ll

[79 lines not shown]
; RV64IV-NEXT: ret
load volatile <vscale x 16 x i8>, <vscale x 16 x i8>* %local0		load volatile <vscale x 16 x i8>, <vscale x 16 x i8>* %local0
load volatile <vscale x 16 x i8>, <vscale x 16 x i8>* %local1		load volatile <vscale x 16 x i8>, <vscale x 16 x i8>* %local1
ret void		ret void
}		}

define void @local_var_m4() {		define void @local_var_m4() {
; RV64IV-LABEL: local_var_m4:		; RV64IV-LABEL: local_var_m4:
; RV64IV: # %bb.0:		; RV64IV: # %bb.0:
; RV64IV-NEXT: addi sp, sp, -32		; RV64IV-NEXT: addi sp, sp, -48
; RV64IV-NEXT: .cfi_def_cfa_offset 32		; RV64IV-NEXT: .cfi_def_cfa_offset 48
; RV64IV-NEXT: sd ra, 24(sp) # 8-byte Folded Spill		; RV64IV-NEXT: sd ra, 40(sp) # 8-byte Folded Spill
; RV64IV-NEXT: sd s0, 16(sp) # 8-byte Folded Spill		; RV64IV-NEXT: sd s0, 32(sp) # 8-byte Folded Spill
; RV64IV-NEXT: .cfi_offset ra, -8		; RV64IV-NEXT: .cfi_offset ra, -8
; RV64IV-NEXT: .cfi_offset s0, -16		; RV64IV-NEXT: .cfi_offset s0, -16
; RV64IV-NEXT: addi s0, sp, 32		; RV64IV-NEXT: addi s0, sp, 48
; RV64IV-NEXT: .cfi_def_cfa s0, 0		; RV64IV-NEXT: .cfi_def_cfa s0, 0
; RV64IV-NEXT: csrr a0, vlenb		; RV64IV-NEXT: csrr a0, vlenb
; RV64IV-NEXT: slli a0, a0, 3		; RV64IV-NEXT: slli a0, a0, 3
; RV64IV-NEXT: sub sp, sp, a0		; RV64IV-NEXT: sub sp, sp, a0
; RV64IV-NEXT: andi sp, sp, -32		; RV64IV-NEXT: andi sp, sp, -32
; RV64IV-NEXT: csrr a0, vlenb		; RV64IV-NEXT: csrr a0, vlenb
; RV64IV-NEXT: slli a0, a0, 2		; RV64IV-NEXT: slli a0, a0, 2
; RV64IV-NEXT: add a0, sp, a0		; RV64IV-NEXT: add a0, sp, a0
; RV64IV-NEXT: addi a0, a0, 16		; RV64IV-NEXT: addi a0, a0, 32
; RV64IV-NEXT: vl4r.v v8, (a0)		; RV64IV-NEXT: vl4r.v v8, (a0)
; RV64IV-NEXT: addi a0, sp, 16		; RV64IV-NEXT: addi a0, sp, 32
; RV64IV-NEXT: vl4r.v v8, (a0)		; RV64IV-NEXT: vl4r.v v8, (a0)
; RV64IV-NEXT: addi sp, s0, -32		; RV64IV-NEXT: addi sp, s0, -48
; RV64IV-NEXT: ld ra, 24(sp) # 8-byte Folded Reload		; RV64IV-NEXT: ld ra, 40(sp) # 8-byte Folded Reload
; RV64IV-NEXT: ld s0, 16(sp) # 8-byte Folded Reload		; RV64IV-NEXT: ld s0, 32(sp) # 8-byte Folded Reload
; RV64IV-NEXT: addi sp, sp, 32		; RV64IV-NEXT: addi sp, sp, 48
; RV64IV-NEXT: ret		; RV64IV-NEXT: ret
%local0 = alloca <vscale x 32 x i8>		%local0 = alloca <vscale x 32 x i8>
%local1 = alloca <vscale x 32 x i8>		%local1 = alloca <vscale x 32 x i8>
load volatile <vscale x 32 x i8>, <vscale x 32 x i8>* %local0		load volatile <vscale x 32 x i8>, <vscale x 32 x i8>* %local0
load volatile <vscale x 32 x i8>, <vscale x 32 x i8>* %local1		load volatile <vscale x 32 x i8>, <vscale x 32 x i8>* %local1
ret void		ret void
}		}

define void @local_var_m8() {		define void @local_var_m8() {
; RV64IV-LABEL: local_var_m8:		; RV64IV-LABEL: local_var_m8:
; RV64IV: # %bb.0:		; RV64IV: # %bb.0:
; RV64IV-NEXT: addi sp, sp, -64		; RV64IV-NEXT: addi sp, sp, -80
; RV64IV-NEXT: .cfi_def_cfa_offset 64		; RV64IV-NEXT: .cfi_def_cfa_offset 80
; RV64IV-NEXT: sd ra, 56(sp) # 8-byte Folded Spill		; RV64IV-NEXT: sd ra, 72(sp) # 8-byte Folded Spill
; RV64IV-NEXT: sd s0, 48(sp) # 8-byte Folded Spill		; RV64IV-NEXT: sd s0, 64(sp) # 8-byte Folded Spill
; RV64IV-NEXT: .cfi_offset ra, -8		; RV64IV-NEXT: .cfi_offset ra, -8
; RV64IV-NEXT: .cfi_offset s0, -16		; RV64IV-NEXT: .cfi_offset s0, -16
; RV64IV-NEXT: addi s0, sp, 64		; RV64IV-NEXT: addi s0, sp, 80
; RV64IV-NEXT: .cfi_def_cfa s0, 0		; RV64IV-NEXT: .cfi_def_cfa s0, 0
; RV64IV-NEXT: csrr a0, vlenb		; RV64IV-NEXT: csrr a0, vlenb
; RV64IV-NEXT: slli a0, a0, 4		; RV64IV-NEXT: slli a0, a0, 4
; RV64IV-NEXT: sub sp, sp, a0		; RV64IV-NEXT: sub sp, sp, a0
; RV64IV-NEXT: andi sp, sp, -64		; RV64IV-NEXT: andi sp, sp, -64
; RV64IV-NEXT: csrr a0, vlenb		; RV64IV-NEXT: csrr a0, vlenb
; RV64IV-NEXT: slli a0, a0, 3		; RV64IV-NEXT: slli a0, a0, 3
; RV64IV-NEXT: add a0, sp, a0		; RV64IV-NEXT: add a0, sp, a0
; RV64IV-NEXT: addi a0, a0, 48		; RV64IV-NEXT: addi a0, a0, 64
; RV64IV-NEXT: vl8r.v v8, (a0)		; RV64IV-NEXT: vl8r.v v8, (a0)
; RV64IV-NEXT: addi a0, sp, 48		; RV64IV-NEXT: addi a0, sp, 64
; RV64IV-NEXT: vl8r.v v8, (a0)		; RV64IV-NEXT: vl8r.v v8, (a0)
; RV64IV-NEXT: addi sp, s0, -64		; RV64IV-NEXT: addi sp, s0, -80
; RV64IV-NEXT: ld ra, 56(sp) # 8-byte Folded Reload		; RV64IV-NEXT: ld ra, 72(sp) # 8-byte Folded Reload
; RV64IV-NEXT: ld s0, 48(sp) # 8-byte Folded Reload		; RV64IV-NEXT: ld s0, 64(sp) # 8-byte Folded Reload
; RV64IV-NEXT: addi sp, sp, 64		; RV64IV-NEXT: addi sp, sp, 80
; RV64IV-NEXT: ret		; RV64IV-NEXT: ret
%local0 = alloca <vscale x 64 x i8>		%local0 = alloca <vscale x 64 x i8>
%local1 = alloca <vscale x 64 x i8>		%local1 = alloca <vscale x 64 x i8>
load volatile <vscale x 64 x i8>, <vscale x 64 x i8>* %local0		load volatile <vscale x 64 x i8>, <vscale x 64 x i8>* %local0
load volatile <vscale x 64 x i8>, <vscale x 64 x i8>* %local1		load volatile <vscale x 64 x i8>, <vscale x 64 x i8>* %local1
ret void		ret void
}		}

[75 lines not shown]
; RV64IV-NEXT: ret
load volatile <vscale x 16 x i8>, <vscale x 16 x i8>* %2		load volatile <vscale x 16 x i8>, <vscale x 16 x i8>* %2
load volatile <vscale x 16 x i8>, <vscale x 16 x i8>* %3		load volatile <vscale x 16 x i8>, <vscale x 16 x i8>* %3
ret void		ret void
}		}

define void @local_var_m2_with_bp(i64 %n) {		define void @local_var_m2_with_bp(i64 %n) {
; RV64IV-LABEL: local_var_m2_with_bp:		; RV64IV-LABEL: local_var_m2_with_bp:
; RV64IV: # %bb.0:		; RV64IV: # %bb.0:
; RV64IV-NEXT: addi sp, sp, -256		; RV64IV-NEXT: addi sp, sp, -272
; RV64IV-NEXT: .cfi_def_cfa_offset 256		; RV64IV-NEXT: .cfi_def_cfa_offset 272
; RV64IV-NEXT: sd ra, 248(sp) # 8-byte Folded Spill		; RV64IV-NEXT: sd ra, 264(sp) # 8-byte Folded Spill
; RV64IV-NEXT: sd s0, 240(sp) # 8-byte Folded Spill		; RV64IV-NEXT: sd s0, 256(sp) # 8-byte Folded Spill
; RV64IV-NEXT: sd s1, 232(sp) # 8-byte Folded Spill		; RV64IV-NEXT: sd s1, 248(sp) # 8-byte Folded Spill
; RV64IV-NEXT: .cfi_offset ra, -8		; RV64IV-NEXT: .cfi_offset ra, -8
; RV64IV-NEXT: .cfi_offset s0, -16		; RV64IV-NEXT: .cfi_offset s0, -16
; RV64IV-NEXT: .cfi_offset s1, -24		; RV64IV-NEXT: .cfi_offset s1, -24
; RV64IV-NEXT: addi s0, sp, 256		; RV64IV-NEXT: addi s0, sp, 272
; RV64IV-NEXT: .cfi_def_cfa s0, 0		; RV64IV-NEXT: .cfi_def_cfa s0, 0
; RV64IV-NEXT: csrr a1, vlenb		; RV64IV-NEXT: csrr a1, vlenb
; RV64IV-NEXT: slli a1, a1, 2		; RV64IV-NEXT: slli a1, a1, 2
; RV64IV-NEXT: sub sp, sp, a1		; RV64IV-NEXT: sub sp, sp, a1
; RV64IV-NEXT: andi sp, sp, -128		; RV64IV-NEXT: andi sp, sp, -128
; RV64IV-NEXT: mv s1, sp		; RV64IV-NEXT: mv s1, sp
; RV64IV-NEXT: addi a0, a0, 15		; RV64IV-NEXT: addi a0, a0, 15
; RV64IV-NEXT: andi a0, a0, -16		; RV64IV-NEXT: andi a0, a0, -16
; RV64IV-NEXT: sub a0, sp, a0		; RV64IV-NEXT: sub a0, sp, a0
; RV64IV-NEXT: mv sp, a0		; RV64IV-NEXT: mv sp, a0
; RV64IV-NEXT: addi a1, s1, 128		; RV64IV-NEXT: addi a1, s1, 128
; RV64IV-NEXT: csrr a2, vlenb		; RV64IV-NEXT: csrr a2, vlenb
; RV64IV-NEXT: slli a2, a2, 1		; RV64IV-NEXT: slli a2, a2, 1
; RV64IV-NEXT: add a2, s1, a2		; RV64IV-NEXT: add a2, s1, a2
; RV64IV-NEXT: addi a2, a2, 232		; RV64IV-NEXT: addi a2, a2, 240
; RV64IV-NEXT: call notdead2@plt		; RV64IV-NEXT: call notdead2@plt
; RV64IV-NEXT: lw a0, 124(s1)		; RV64IV-NEXT: lw a0, 124(s1)
; RV64IV-NEXT: csrr a0, vlenb		; RV64IV-NEXT: csrr a0, vlenb
; RV64IV-NEXT: slli a0, a0, 1		; RV64IV-NEXT: slli a0, a0, 1
; RV64IV-NEXT: add a0, s1, a0		; RV64IV-NEXT: add a0, s1, a0
; RV64IV-NEXT: addi a0, a0, 232		; RV64IV-NEXT: addi a0, a0, 240
; RV64IV-NEXT: vl2r.v v8, (a0)		; RV64IV-NEXT: vl2r.v v8, (a0)
; RV64IV-NEXT: addi a0, s1, 232		; RV64IV-NEXT: addi a0, s1, 240
; RV64IV-NEXT: vl2r.v v8, (a0)		; RV64IV-NEXT: vl2r.v v8, (a0)
; RV64IV-NEXT: lw a0, 120(s1)		; RV64IV-NEXT: lw a0, 120(s1)
; RV64IV-NEXT: addi sp, s0, -256		; RV64IV-NEXT: addi sp, s0, -272
; RV64IV-NEXT: ld ra, 248(sp) # 8-byte Folded Reload		; RV64IV-NEXT: ld ra, 264(sp) # 8-byte Folded Reload
; RV64IV-NEXT: ld s0, 240(sp) # 8-byte Folded Reload		; RV64IV-NEXT: ld s0, 256(sp) # 8-byte Folded Reload
; RV64IV-NEXT: ld s1, 232(sp) # 8-byte Folded Reload		; RV64IV-NEXT: ld s1, 248(sp) # 8-byte Folded Reload
; RV64IV-NEXT: addi sp, sp, 256		; RV64IV-NEXT: addi sp, sp, 272
; RV64IV-NEXT: ret		; RV64IV-NEXT: ret
%1 = alloca i8, i64 %n		%1 = alloca i8, i64 %n
%2 = alloca i32, align 128		%2 = alloca i32, align 128
%local_scalar0 = alloca i32		%local_scalar0 = alloca i32
%local0 = alloca <vscale x 16 x i8>		%local0 = alloca <vscale x 16 x i8>
%local1 = alloca <vscale x 16 x i8>		%local1 = alloca <vscale x 16 x i8>
%local_scalar1 = alloca i32		%local_scalar1 = alloca i32
call void @notdead2(i8* %1, i32* %2, <vscale x 16 x i8>* %local0)		call void @notdead2(i8* %1, i32* %2, <vscale x 16 x i8>* %local0)
[30 lines not shown]

llvm/test/CodeGen/RISCV/rvv/memory-args.ll

[21 lines not shown]
%ret = call <vscale x 64 x i8> @llvm.riscv.vmacc.nxv64i8.nxv64i8(
<vscale x 64 x i8> %arg1,		<vscale x 64 x i8> %arg1,
<vscale x 64 x i8> %arg2, i64 1024, i64 0)		<vscale x 64 x i8> %arg2, i64 1024, i64 0)
ret <vscale x 64 x i8> %ret		ret <vscale x 64 x i8> %ret
}		}

define <vscale x 64 x i8> @caller() {		define <vscale x 64 x i8> @caller() {
; RV64IV-LABEL: caller:		; RV64IV-LABEL: caller:
; RV64IV: # %bb.0:		; RV64IV: # %bb.0:
; RV64IV-NEXT: addi sp, sp, -64		; RV64IV-NEXT: addi sp, sp, -80
; RV64IV-NEXT: .cfi_def_cfa_offset 64		; RV64IV-NEXT: .cfi_def_cfa_offset 80
; RV64IV-NEXT: sd ra, 56(sp) # 8-byte Folded Spill		; RV64IV-NEXT: sd ra, 72(sp) # 8-byte Folded Spill
; RV64IV-NEXT: sd s0, 48(sp) # 8-byte Folded Spill		; RV64IV-NEXT: sd s0, 64(sp) # 8-byte Folded Spill
; RV64IV-NEXT: .cfi_offset ra, -8		; RV64IV-NEXT: .cfi_offset ra, -8
; RV64IV-NEXT: .cfi_offset s0, -16		; RV64IV-NEXT: .cfi_offset s0, -16
; RV64IV-NEXT: addi s0, sp, 64		; RV64IV-NEXT: addi s0, sp, 80
; RV64IV-NEXT: .cfi_def_cfa s0, 0		; RV64IV-NEXT: .cfi_def_cfa s0, 0
; RV64IV-NEXT: csrr a0, vlenb		; RV64IV-NEXT: csrr a0, vlenb
; RV64IV-NEXT: slli a0, a0, 5		; RV64IV-NEXT: slli a0, a0, 5
; RV64IV-NEXT: sub sp, sp, a0		; RV64IV-NEXT: sub sp, sp, a0
; RV64IV-NEXT: andi sp, sp, -64		; RV64IV-NEXT: andi sp, sp, -64
; RV64IV-NEXT: csrr a0, vlenb		; RV64IV-NEXT: csrr a0, vlenb
; RV64IV-NEXT: li a1, 24		; RV64IV-NEXT: li a1, 24
; RV64IV-NEXT: mul a0, a0, a1		; RV64IV-NEXT: mul a0, a0, a1
; RV64IV-NEXT: add a0, sp, a0		; RV64IV-NEXT: add a0, sp, a0
; RV64IV-NEXT: addi a0, a0, 48		; RV64IV-NEXT: addi a0, a0, 64
; RV64IV-NEXT: vl8r.v v8, (a0)		; RV64IV-NEXT: vl8r.v v8, (a0)
; RV64IV-NEXT: csrr a0, vlenb		; RV64IV-NEXT: csrr a0, vlenb
; RV64IV-NEXT: slli a0, a0, 4		; RV64IV-NEXT: slli a0, a0, 4
; RV64IV-NEXT: add a0, sp, a0		; RV64IV-NEXT: add a0, sp, a0
; RV64IV-NEXT: addi a0, a0, 48		; RV64IV-NEXT: addi a0, a0, 64
; RV64IV-NEXT: vl8r.v v16, (a0)		; RV64IV-NEXT: vl8r.v v16, (a0)
; RV64IV-NEXT: csrr a0, vlenb		; RV64IV-NEXT: csrr a0, vlenb
; RV64IV-NEXT: slli a0, a0, 3		; RV64IV-NEXT: slli a0, a0, 3
; RV64IV-NEXT: add a0, sp, a0		; RV64IV-NEXT: add a0, sp, a0
; RV64IV-NEXT: addi a0, a0, 48		; RV64IV-NEXT: addi a0, a0, 64
; RV64IV-NEXT: vl8r.v v24, (a0)		; RV64IV-NEXT: vl8r.v v24, (a0)
; RV64IV-NEXT: addi a0, sp, 48		; RV64IV-NEXT: addi a0, sp, 64
; RV64IV-NEXT: addi a1, sp, 48		; RV64IV-NEXT: addi a1, sp, 64
; RV64IV-NEXT: vs8r.v v24, (a1)		; RV64IV-NEXT: vs8r.v v24, (a1)
; RV64IV-NEXT: call callee@plt		; RV64IV-NEXT: call callee@plt
; RV64IV-NEXT: addi sp, s0, -64		; RV64IV-NEXT: addi sp, s0, -80
; RV64IV-NEXT: ld ra, 56(sp) # 8-byte Folded Reload		; RV64IV-NEXT: ld ra, 72(sp) # 8-byte Folded Reload
; RV64IV-NEXT: ld s0, 48(sp) # 8-byte Folded Reload		; RV64IV-NEXT: ld s0, 64(sp) # 8-byte Folded Reload
; RV64IV-NEXT: addi sp, sp, 64		; RV64IV-NEXT: addi sp, sp, 80
; RV64IV-NEXT: ret		; RV64IV-NEXT: ret
%local0 = alloca <vscale x 64 x i8>		%local0 = alloca <vscale x 64 x i8>
%local1 = alloca <vscale x 64 x i8>		%local1 = alloca <vscale x 64 x i8>
%local2 = alloca <vscale x 64 x i8>		%local2 = alloca <vscale x 64 x i8>
%arg0 = load volatile <vscale x 64 x i8>, <vscale x 64 x i8>* %local0		%arg0 = load volatile <vscale x 64 x i8>, <vscale x 64 x i8>* %local0
%arg1 = load volatile <vscale x 64 x i8>, <vscale x 64 x i8>* %local1		%arg1 = load volatile <vscale x 64 x i8>, <vscale x 64 x i8>* %local1
%arg2 = load volatile <vscale x 64 x i8>, <vscale x 64 x i8>* %local2		%arg2 = load volatile <vscale x 64 x i8>, <vscale x 64 x i8>* %local2
%ret = call <vscale x 64 x i8> @callee(<vscale x 64 x i8> %arg0,		%ret = call <vscale x 64 x i8> @callee(<vscale x 64 x i8> %arg0,
<vscale x 64 x i8> %arg1,		<vscale x 64 x i8> %arg1,
<vscale x 64 x i8> %arg2)		<vscale x 64 x i8> %arg2)
ret <vscale x 64 x i8> %ret		ret <vscale x 64 x i8> %ret
}		}

llvm/test/CodeGen/RISCV/rvv/no-reserved-frame.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=riscv64 -mattr=+v -verify-machineinstrs \			; RUN: llc -mtriple=riscv64 -mattr=+v -verify-machineinstrs \
	; RUN: < %s | FileCheck %s			; RUN: < %s | FileCheck %s

	define signext i32 @foo(i32 signext %aa) #0 {			define signext i32 @foo(i32 signext %aa) #0 {
	; CHECK-LABEL: foo:			; CHECK-LABEL: foo:
	; CHECK: # %bb.0: # %entry			; CHECK: # %bb.0: # %entry
	; CHECK-NEXT: addi sp, sp, -80			; CHECK-NEXT: addi sp, sp, -96
	; CHECK-NEXT: .cfi_def_cfa_offset 80			; CHECK-NEXT: .cfi_def_cfa_offset 96
	; CHECK-NEXT: sd ra, 72(sp) # 8-byte Folded Spill			; CHECK-NEXT: sd ra, 88(sp) # 8-byte Folded Spill
	; CHECK-NEXT: sd s0, 64(sp) # 8-byte Folded Spill			; CHECK-NEXT: sd s0, 80(sp) # 8-byte Folded Spill
	; CHECK-NEXT: sd s1, 56(sp) # 8-byte Folded Spill			; CHECK-NEXT: sd s1, 72(sp) # 8-byte Folded Spill
	; CHECK-NEXT: .cfi_offset ra, -8			; CHECK-NEXT: .cfi_offset ra, -8
	; CHECK-NEXT: .cfi_offset s0, -16			; CHECK-NEXT: .cfi_offset s0, -16
	; CHECK-NEXT: .cfi_offset s1, -24			; CHECK-NEXT: .cfi_offset s1, -24
	; CHECK-NEXT: addi s0, sp, 80			; CHECK-NEXT: addi s0, sp, 96
	; CHECK-NEXT: .cfi_def_cfa s0, 0			; CHECK-NEXT: .cfi_def_cfa s0, 0
	; CHECK-NEXT: csrr a1, vlenb			; CHECK-NEXT: csrr a1, vlenb
				; CHECK-NEXT: slli a1, a1, 1
	; CHECK-NEXT: sub sp, sp, a1			; CHECK-NEXT: sub sp, sp, a1
	; CHECK-NEXT: andi sp, sp, -8			; CHECK-NEXT: andi sp, sp, -8
	; CHECK-NEXT: mv s1, sp			; CHECK-NEXT: mv s1, sp
	; CHECK-NEXT: lw t0, 44(s1)			; CHECK-NEXT: lw t0, 44(s1)
	; CHECK-NEXT: lw a2, 40(s1)			; CHECK-NEXT: lw a2, 40(s1)
	; CHECK-NEXT: lw a3, 36(s1)			; CHECK-NEXT: lw a3, 36(s1)
	; CHECK-NEXT: lw a4, 32(s1)			; CHECK-NEXT: lw a4, 32(s1)
	; CHECK-NEXT: lw a5, 28(s1)			; CHECK-NEXT: lw a5, 28(s1)
	; CHECK-NEXT: lw a6, 24(s1)			; CHECK-NEXT: lw a6, 24(s1)
	; CHECK-NEXT: lw a7, 20(s1)			; CHECK-NEXT: lw a7, 20(s1)
	; CHECK-NEXT: lw t1, 16(s1)			; CHECK-NEXT: lw t1, 16(s1)
	; CHECK-NEXT: lw a1, 12(s1)			; CHECK-NEXT: lw a1, 12(s1)
	; CHECK-NEXT: lw t2, 8(s1)			; CHECK-NEXT: lw t2, 8(s1)
	; CHECK-NEXT: sw a0, 52(s1)			; CHECK-NEXT: sw a0, 52(s1)
	; CHECK-NEXT: sw a0, 48(s1)			; CHECK-NEXT: sw a0, 48(s1)
	; CHECK-NEXT: addi sp, sp, -32			; CHECK-NEXT: addi sp, sp, -32
	; CHECK-NEXT: sd t2, 16(sp)			; CHECK-NEXT: sd t2, 16(sp)
	; CHECK-NEXT: sd a1, 8(sp)			; CHECK-NEXT: sd a1, 8(sp)
	; CHECK-NEXT: addi a1, s1, 48			; CHECK-NEXT: addi a1, s1, 48
	; CHECK-NEXT: sd t1, 0(sp)			; CHECK-NEXT: sd t1, 0(sp)
	; CHECK-NEXT: mv a0, t0			; CHECK-NEXT: mv a0, t0
	; CHECK-NEXT: call gfunc@plt			; CHECK-NEXT: call gfunc@plt
	; CHECK-NEXT: addi sp, sp, 32			; CHECK-NEXT: addi sp, sp, 32
	; CHECK-NEXT: li a0, 0			; CHECK-NEXT: li a0, 0
	; CHECK-NEXT: addi sp, s0, -80			; CHECK-NEXT: addi sp, s0, -96
	; CHECK-NEXT: ld ra, 72(sp) # 8-byte Folded Reload			; CHECK-NEXT: ld ra, 88(sp) # 8-byte Folded Reload
	; CHECK-NEXT: ld s0, 64(sp) # 8-byte Folded Reload			; CHECK-NEXT: ld s0, 80(sp) # 8-byte Folded Reload
	; CHECK-NEXT: ld s1, 56(sp) # 8-byte Folded Reload			; CHECK-NEXT: ld s1, 72(sp) # 8-byte Folded Reload
	; CHECK-NEXT: addi sp, sp, 80			; CHECK-NEXT: addi sp, sp, 96
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%aa.addr = alloca i32, align 4			%aa.addr = alloca i32, align 4
	%local = alloca i32, align 4			%local = alloca i32, align 4
	%a = alloca i32, align 4			%a = alloca i32, align 4
	%b = alloca i32, align 4			%b = alloca i32, align 4
	%c = alloca i32, align 4			%c = alloca i32, align 4
	%d = alloca i32, align 4			%d = alloca i32, align 4
	[28 lines not shown]

llvm/test/CodeGen/RISCV/rvv/rv32-spill-vector-csr.ll

	[39 lines not shown]
	; SPILL-O0-NEXT: slli a0, a0, 1			; SPILL-O0-NEXT: slli a0, a0, 1
	; SPILL-O0-NEXT: add sp, sp, a0			; SPILL-O0-NEXT: add sp, sp, a0
	; SPILL-O0-NEXT: lw ra, 28(sp) # 4-byte Folded Reload			; SPILL-O0-NEXT: lw ra, 28(sp) # 4-byte Folded Reload
	; SPILL-O0-NEXT: addi sp, sp, 32			; SPILL-O0-NEXT: addi sp, sp, 32
	; SPILL-O0-NEXT: ret			; SPILL-O0-NEXT: ret
	;			;
	; SPILL-O2-LABEL: foo:			; SPILL-O2-LABEL: foo:
	; SPILL-O2: # %bb.0:			; SPILL-O2: # %bb.0:
	; SPILL-O2-NEXT: addi sp, sp, -16			; SPILL-O2-NEXT: addi sp, sp, -32
	; SPILL-O2-NEXT: sw ra, 12(sp) # 4-byte Folded Spill			; SPILL-O2-NEXT: sw ra, 28(sp) # 4-byte Folded Spill
	; SPILL-O2-NEXT: sw s0, 8(sp) # 4-byte Folded Spill			; SPILL-O2-NEXT: sw s0, 24(sp) # 4-byte Folded Spill
	; SPILL-O2-NEXT: csrr a1, vlenb			; SPILL-O2-NEXT: csrr a1, vlenb
	; SPILL-O2-NEXT: slli a1, a1, 1			; SPILL-O2-NEXT: slli a1, a1, 1
	; SPILL-O2-NEXT: sub sp, sp, a1			; SPILL-O2-NEXT: sub sp, sp, a1
	; SPILL-O2-NEXT: mv s0, a0			; SPILL-O2-NEXT: mv s0, a0
	; SPILL-O2-NEXT: addi a1, sp, 8			; SPILL-O2-NEXT: addi a1, sp, 16
	; SPILL-O2-NEXT: vs1r.v v8, (a1) # Unknown-size Folded Spill			; SPILL-O2-NEXT: vs1r.v v8, (a1) # Unknown-size Folded Spill
	; SPILL-O2-NEXT: vsetvli zero, a0, e64, m1, ta, mu			; SPILL-O2-NEXT: vsetvli zero, a0, e64, m1, ta, mu
	; SPILL-O2-NEXT: vfadd.vv v9, v8, v9			; SPILL-O2-NEXT: vfadd.vv v9, v8, v9
	; SPILL-O2-NEXT: csrr a0, vlenb			; SPILL-O2-NEXT: csrr a0, vlenb
	; SPILL-O2-NEXT: add a0, sp, a0			; SPILL-O2-NEXT: add a0, sp, a0
	; SPILL-O2-NEXT: addi a0, a0, 8			; SPILL-O2-NEXT: addi a0, a0, 16
	; SPILL-O2-NEXT: vs1r.v v9, (a0) # Unknown-size Folded Spill			; SPILL-O2-NEXT: vs1r.v v9, (a0) # Unknown-size Folded Spill
	; SPILL-O2-NEXT: lui a0, %hi(.L.str)			; SPILL-O2-NEXT: lui a0, %hi(.L.str)
	; SPILL-O2-NEXT: addi a0, a0, %lo(.L.str)			; SPILL-O2-NEXT: addi a0, a0, %lo(.L.str)
	; SPILL-O2-NEXT: call puts@plt			; SPILL-O2-NEXT: call puts@plt
	; SPILL-O2-NEXT: vsetvli zero, s0, e64, m1, ta, mu			; SPILL-O2-NEXT: vsetvli zero, s0, e64, m1, ta, mu
	; SPILL-O2-NEXT: csrr a0, vlenb			; SPILL-O2-NEXT: csrr a0, vlenb
	; SPILL-O2-NEXT: add a0, sp, a0			; SPILL-O2-NEXT: add a0, sp, a0
	; SPILL-O2-NEXT: addi a0, a0, 8			; SPILL-O2-NEXT: addi a0, a0, 16
	; SPILL-O2-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload			; SPILL-O2-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload
	; SPILL-O2-NEXT: addi a0, sp, 8			; SPILL-O2-NEXT: addi a0, sp, 16
	; SPILL-O2-NEXT: vl1r.v v9, (a0) # Unknown-size Folded Reload			; SPILL-O2-NEXT: vl1r.v v9, (a0) # Unknown-size Folded Reload
	; SPILL-O2-NEXT: vfadd.vv v8, v9, v8			; SPILL-O2-NEXT: vfadd.vv v8, v9, v8
	; SPILL-O2-NEXT: csrr a0, vlenb			; SPILL-O2-NEXT: csrr a0, vlenb
	; SPILL-O2-NEXT: slli a0, a0, 1			; SPILL-O2-NEXT: slli a0, a0, 1
	; SPILL-O2-NEXT: add sp, sp, a0			; SPILL-O2-NEXT: add sp, sp, a0
	; SPILL-O2-NEXT: lw ra, 12(sp) # 4-byte Folded Reload			; SPILL-O2-NEXT: lw ra, 28(sp) # 4-byte Folded Reload
	; SPILL-O2-NEXT: lw s0, 8(sp) # 4-byte Folded Reload			; SPILL-O2-NEXT: lw s0, 24(sp) # 4-byte Folded Reload
	; SPILL-O2-NEXT: addi sp, sp, 16			; SPILL-O2-NEXT: addi sp, sp, 32
	; SPILL-O2-NEXT: ret			; SPILL-O2-NEXT: ret
	{			{
	%x = call <vscale x 1 x double> @llvm.riscv.vfadd.nxv1f64.nxv1f64(<vscale x 1 x double> undef, <vscale x 1 x double> %a, <vscale x 1 x double> %b, i32 %gvl)			%x = call <vscale x 1 x double> @llvm.riscv.vfadd.nxv1f64.nxv1f64(<vscale x 1 x double> undef, <vscale x 1 x double> %a, <vscale x 1 x double> %b, i32 %gvl)
	%call = call signext i32 @puts(i8* getelementptr inbounds ([6 x i8], [6 x i8]* @.str, i64 0, i64 0))			%call = call signext i32 @puts(i8* getelementptr inbounds ([6 x i8], [6 x i8]* @.str, i64 0, i64 0))
	%z = call <vscale x 1 x double> @llvm.riscv.vfadd.nxv1f64.nxv1f64(<vscale x 1 x double> undef, <vscale x 1 x double> %a, <vscale x 1 x double> %x, i32 %gvl)			%z = call <vscale x 1 x double> @llvm.riscv.vfadd.nxv1f64.nxv1f64(<vscale x 1 x double> undef, <vscale x 1 x double> %a, <vscale x 1 x double> %x, i32 %gvl)
	ret <vscale x 1 x double> %z			ret <vscale x 1 x double> %z
	}			}

	declare <vscale x 1 x double> @llvm.riscv.vfadd.nxv1f64.nxv1f64(<vscale x 1 x double> %passthru, <vscale x 1 x double> %a, <vscale x 1 x double> %b, i32 %gvl)			declare <vscale x 1 x double> @llvm.riscv.vfadd.nxv1f64.nxv1f64(<vscale x 1 x double> %passthru, <vscale x 1 x double> %a, <vscale x 1 x double> %b, i32 %gvl)
	declare i32 @puts(i8*);			declare i32 @puts(i8*);

llvm/test/CodeGen/RISCV/rvv/rv32-spill-vector.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=riscv32 -mattr=+v -O0 < %s \			; RUN: llc -mtriple=riscv32 -mattr=+v -O0 < %s \
	; RUN: | FileCheck --check-prefix=SPILL-O0 %s			; RUN: | FileCheck --check-prefix=SPILL-O0 %s
	; RUN: llc -mtriple=riscv32 -mattr=+v -O2 < %s \			; RUN: llc -mtriple=riscv32 -mattr=+v -O2 < %s \
	; RUN: | FileCheck --check-prefix=SPILL-O2 %s			; RUN: | FileCheck --check-prefix=SPILL-O2 %s

	define <vscale x 1 x i32> @spill_lmul_mf2(<vscale x 1 x i32> %va) nounwind {			define <vscale x 1 x i32> @spill_lmul_mf2(<vscale x 1 x i32> %va) nounwind {
	; SPILL-O0-LABEL: spill_lmul_mf2:			; SPILL-O0-LABEL: spill_lmul_mf2:
	; SPILL-O0: # %bb.0: # %entry			; SPILL-O0: # %bb.0: # %entry
	; SPILL-O0-NEXT: addi sp, sp, -16			; SPILL-O0-NEXT: addi sp, sp, -16
	; SPILL-O0-NEXT: csrr a0, vlenb			; SPILL-O0-NEXT: csrr a0, vlenb
				; SPILL-O0-NEXT: slli a0, a0, 1
	; SPILL-O0-NEXT: sub sp, sp, a0			; SPILL-O0-NEXT: sub sp, sp, a0
	; SPILL-O0-NEXT: addi a0, sp, 16			; SPILL-O0-NEXT: addi a0, sp, 16
	; SPILL-O0-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill			; SPILL-O0-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill
	; SPILL-O0-NEXT: #APP			; SPILL-O0-NEXT: #APP
	; SPILL-O0-NEXT: #NO_APP			; SPILL-O0-NEXT: #NO_APP
	; SPILL-O0-NEXT: addi a0, sp, 16			; SPILL-O0-NEXT: addi a0, sp, 16
	; SPILL-O0-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload			; SPILL-O0-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload
	; SPILL-O0-NEXT: csrr a0, vlenb			; SPILL-O0-NEXT: csrr a0, vlenb
				; SPILL-O0-NEXT: slli a0, a0, 1
	; SPILL-O0-NEXT: add sp, sp, a0			; SPILL-O0-NEXT: add sp, sp, a0
	; SPILL-O0-NEXT: addi sp, sp, 16			; SPILL-O0-NEXT: addi sp, sp, 16
	; SPILL-O0-NEXT: ret			; SPILL-O0-NEXT: ret
	;			;
	; SPILL-O2-LABEL: spill_lmul_mf2:			; SPILL-O2-LABEL: spill_lmul_mf2:
	; SPILL-O2: # %bb.0: # %entry			; SPILL-O2: # %bb.0: # %entry
	; SPILL-O2-NEXT: addi sp, sp, -16			; SPILL-O2-NEXT: addi sp, sp, -16
	; SPILL-O2-NEXT: csrr a0, vlenb			; SPILL-O2-NEXT: csrr a0, vlenb
				; SPILL-O2-NEXT: slli a0, a0, 1
	; SPILL-O2-NEXT: sub sp, sp, a0			; SPILL-O2-NEXT: sub sp, sp, a0
	; SPILL-O2-NEXT: addi a0, sp, 16			; SPILL-O2-NEXT: addi a0, sp, 16
	; SPILL-O2-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill			; SPILL-O2-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill
	; SPILL-O2-NEXT: #APP			; SPILL-O2-NEXT: #APP
	; SPILL-O2-NEXT: #NO_APP			; SPILL-O2-NEXT: #NO_APP
	; SPILL-O2-NEXT: addi a0, sp, 16			; SPILL-O2-NEXT: addi a0, sp, 16
	; SPILL-O2-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload			; SPILL-O2-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload
	; SPILL-O2-NEXT: csrr a0, vlenb			; SPILL-O2-NEXT: csrr a0, vlenb
				; SPILL-O2-NEXT: slli a0, a0, 1
	; SPILL-O2-NEXT: add sp, sp, a0			; SPILL-O2-NEXT: add sp, sp, a0
	; SPILL-O2-NEXT: addi sp, sp, 16			; SPILL-O2-NEXT: addi sp, sp, 16
	; SPILL-O2-NEXT: ret			; SPILL-O2-NEXT: ret
	entry:			entry:
	call void asm sideeffect "",			call void asm sideeffect "",
	"~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7},~{v8},~{v9},~{v10},~{v11},~{v12},~{v13},~{v14},~{v15},~{v16},~{v17},~{v18},~{v19},~{v20},~{v21},~{v22},~{v23},~{v24},~{v25},~{v26},~{v27},~{v28},~{v29},~{v30},~{v31}"()			"~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7},~{v8},~{v9},~{v10},~{v11},~{v12},~{v13},~{v14},~{v15},~{v16},~{v17},~{v18},~{v19},~{v20},~{v21},~{v22},~{v23},~{v24},~{v25},~{v26},~{v27},~{v28},~{v29},~{v30},~{v31}"()

	ret <vscale x 1 x i32> %va			ret <vscale x 1 x i32> %va
	}			}

	define <vscale x 2 x i32> @spill_lmul_1(<vscale x 2 x i32> %va) nounwind {			define <vscale x 2 x i32> @spill_lmul_1(<vscale x 2 x i32> %va) nounwind {
	; SPILL-O0-LABEL: spill_lmul_1:			; SPILL-O0-LABEL: spill_lmul_1:
	; SPILL-O0: # %bb.0: # %entry			; SPILL-O0: # %bb.0: # %entry
	; SPILL-O0-NEXT: addi sp, sp, -16			; SPILL-O0-NEXT: addi sp, sp, -16
	; SPILL-O0-NEXT: csrr a0, vlenb			; SPILL-O0-NEXT: csrr a0, vlenb
				; SPILL-O0-NEXT: slli a0, a0, 1
				reames (inline comment): Near as I can tell, these shifts are coming from the 16-byte minimum RVV alignment, right? Or is there some other cause I'm missing? As noted in the top-level comment, I wonder if this is worthwhile.
				kito-cheng (inline comment): Is it possible to re-align this way instead?
					csrr a0, vlenb
					addi a0, a0, 15
					andi a0, a0, -16
					sub sp, sp, a0
				frasercrmck (author, inline comment, marked done): Technically I think we could, but we may be in a situation where we only have sp/bp and need to jump over the RVV section to reach callee saves or fixed objects, and that would complicate the code we emit for frame offset calculations. I think, on balance, having a "known" size for the RVV section is preferable, even if it may waste stack space on certain (zvl32b/zvl64b) configurations.
	; SPILL-O0-NEXT: sub sp, sp, a0			; SPILL-O0-NEXT: sub sp, sp, a0
	; SPILL-O0-NEXT: addi a0, sp, 16			; SPILL-O0-NEXT: addi a0, sp, 16
	; SPILL-O0-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill			; SPILL-O0-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill
	; SPILL-O0-NEXT: #APP			; SPILL-O0-NEXT: #APP
	; SPILL-O0-NEXT: #NO_APP			; SPILL-O0-NEXT: #NO_APP
	; SPILL-O0-NEXT: addi a0, sp, 16			; SPILL-O0-NEXT: addi a0, sp, 16
	; SPILL-O0-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload			; SPILL-O0-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload
	; SPILL-O0-NEXT: csrr a0, vlenb			; SPILL-O0-NEXT: csrr a0, vlenb
				; SPILL-O0-NEXT: slli a0, a0, 1
	; SPILL-O0-NEXT: add sp, sp, a0			; SPILL-O0-NEXT: add sp, sp, a0
	; SPILL-O0-NEXT: addi sp, sp, 16			; SPILL-O0-NEXT: addi sp, sp, 16
	; SPILL-O0-NEXT: ret			; SPILL-O0-NEXT: ret
	;			;
	; SPILL-O2-LABEL: spill_lmul_1:			; SPILL-O2-LABEL: spill_lmul_1:
	; SPILL-O2: # %bb.0: # %entry			; SPILL-O2: # %bb.0: # %entry
	; SPILL-O2-NEXT: addi sp, sp, -16			; SPILL-O2-NEXT: addi sp, sp, -16
	; SPILL-O2-NEXT: csrr a0, vlenb			; SPILL-O2-NEXT: csrr a0, vlenb
				; SPILL-O2-NEXT: slli a0, a0, 1
	; SPILL-O2-NEXT: sub sp, sp, a0			; SPILL-O2-NEXT: sub sp, sp, a0
	; SPILL-O2-NEXT: addi a0, sp, 16			; SPILL-O2-NEXT: addi a0, sp, 16
	; SPILL-O2-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill			; SPILL-O2-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill
	; SPILL-O2-NEXT: #APP			; SPILL-O2-NEXT: #APP
	; SPILL-O2-NEXT: #NO_APP			; SPILL-O2-NEXT: #NO_APP
	; SPILL-O2-NEXT: addi a0, sp, 16			; SPILL-O2-NEXT: addi a0, sp, 16
	; SPILL-O2-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload			; SPILL-O2-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload
	; SPILL-O2-NEXT: csrr a0, vlenb			; SPILL-O2-NEXT: csrr a0, vlenb
				; SPILL-O2-NEXT: slli a0, a0, 1
	; SPILL-O2-NEXT: add sp, sp, a0			; SPILL-O2-NEXT: add sp, sp, a0
	; SPILL-O2-NEXT: addi sp, sp, 16			; SPILL-O2-NEXT: addi sp, sp, 16
	; SPILL-O2-NEXT: ret			; SPILL-O2-NEXT: ret
	entry:			entry:
	call void asm sideeffect "",			call void asm sideeffect "",
	"~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7},~{v8},~{v9},~{v10},~{v11},~{v12},~{v13},~{v14},~{v15},~{v16},~{v17},~{v18},~{v19},~{v20},~{v21},~{v22},~{v23},~{v24},~{v25},~{v26},~{v27},~{v28},~{v29},~{v30},~{v31}"()			"~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7},~{v8},~{v9},~{v10},~{v11},~{v12},~{v13},~{v14},~{v15},~{v16},~{v17},~{v18},~{v19},~{v20},~{v21},~{v22},~{v23},~{v24},~{v25},~{v26},~{v27},~{v28},~{v29},~{v30},~{v31}"()

	ret <vscale x 2 x i32> %va			ret <vscale x 2 x i32> %va
	[130 lines not shown]

llvm/test/CodeGen/RISCV/rvv/rv32-spill-zvlsseg.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=riscv32 -mattr=+v -mattr=+m -O0 < %s \		; RUN: llc -mtriple=riscv32 -mattr=+v -mattr=+m -O0 < %s \
; RUN: | FileCheck --check-prefix=SPILL-O0 %s		; RUN: | FileCheck --check-prefix=SPILL-O0 %s
; RUN: llc -mtriple=riscv32 -mattr=+v -mattr=+m -O2 < %s \		; RUN: llc -mtriple=riscv32 -mattr=+v -mattr=+m -O2 < %s \
; RUN: | FileCheck --check-prefix=SPILL-O2 %s		; RUN: | FileCheck --check-prefix=SPILL-O2 %s

define <vscale x 1 x i32> @spill_zvlsseg_nxv1i32(i32* %base, i32 %vl) nounwind {		define <vscale x 1 x i32> @spill_zvlsseg_nxv1i32(i32* %base, i32 %vl) nounwind {
; SPILL-O0-LABEL: spill_zvlsseg_nxv1i32:		; SPILL-O0-LABEL: spill_zvlsseg_nxv1i32:
; SPILL-O0: # %bb.0: # %entry		; SPILL-O0: # %bb.0: # %entry
; SPILL-O0-NEXT: addi sp, sp, -16		; SPILL-O0-NEXT: addi sp, sp, -16
; SPILL-O0-NEXT: csrr a2, vlenb		; SPILL-O0-NEXT: csrr a2, vlenb
		; SPILL-O0-NEXT: slli a2, a2, 1
; SPILL-O0-NEXT: sub sp, sp, a2		; SPILL-O0-NEXT: sub sp, sp, a2
; SPILL-O0-NEXT: vsetvli zero, a1, e32, mf2, ta, mu		; SPILL-O0-NEXT: vsetvli zero, a1, e32, mf2, ta, mu
; SPILL-O0-NEXT: vlseg2e32.v v8, (a0)		; SPILL-O0-NEXT: vlseg2e32.v v8, (a0)
; SPILL-O0-NEXT: vmv1r.v v8, v9		; SPILL-O0-NEXT: vmv1r.v v8, v9
; SPILL-O0-NEXT: addi a0, sp, 16		; SPILL-O0-NEXT: addi a0, sp, 16
; SPILL-O0-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill		; SPILL-O0-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill
; SPILL-O0-NEXT: #APP		; SPILL-O0-NEXT: #APP
; SPILL-O0-NEXT: #NO_APP		; SPILL-O0-NEXT: #NO_APP
; SPILL-O0-NEXT: addi a0, sp, 16		; SPILL-O0-NEXT: addi a0, sp, 16
; SPILL-O0-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload		; SPILL-O0-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload
; SPILL-O0-NEXT: csrr a0, vlenb		; SPILL-O0-NEXT: csrr a0, vlenb
		; SPILL-O0-NEXT: slli a0, a0, 1
; SPILL-O0-NEXT: add sp, sp, a0		; SPILL-O0-NEXT: add sp, sp, a0
; SPILL-O0-NEXT: addi sp, sp, 16		; SPILL-O0-NEXT: addi sp, sp, 16
; SPILL-O0-NEXT: ret		; SPILL-O0-NEXT: ret
;		;
; SPILL-O2-LABEL: spill_zvlsseg_nxv1i32:		; SPILL-O2-LABEL: spill_zvlsseg_nxv1i32:
; SPILL-O2: # %bb.0: # %entry		; SPILL-O2: # %bb.0: # %entry
; SPILL-O2-NEXT: addi sp, sp, -16		; SPILL-O2-NEXT: addi sp, sp, -16
; SPILL-O2-NEXT: csrr a2, vlenb		; SPILL-O2-NEXT: csrr a2, vlenb
[27 lines not shown]
entry:
ret <vscale x 1 x i32> %1		ret <vscale x 1 x i32> %1
}		}

define <vscale x 2 x i32> @spill_zvlsseg_nxv2i32(i32* %base, i32 %vl) nounwind {		define <vscale x 2 x i32> @spill_zvlsseg_nxv2i32(i32* %base, i32 %vl) nounwind {
; SPILL-O0-LABEL: spill_zvlsseg_nxv2i32:		; SPILL-O0-LABEL: spill_zvlsseg_nxv2i32:
; SPILL-O0: # %bb.0: # %entry		; SPILL-O0: # %bb.0: # %entry
; SPILL-O0-NEXT: addi sp, sp, -16		; SPILL-O0-NEXT: addi sp, sp, -16
; SPILL-O0-NEXT: csrr a2, vlenb		; SPILL-O0-NEXT: csrr a2, vlenb
		; SPILL-O0-NEXT: slli a2, a2, 1
; SPILL-O0-NEXT: sub sp, sp, a2		; SPILL-O0-NEXT: sub sp, sp, a2
; SPILL-O0-NEXT: vsetvli zero, a1, e32, m1, ta, mu		; SPILL-O0-NEXT: vsetvli zero, a1, e32, m1, ta, mu
; SPILL-O0-NEXT: vlseg2e32.v v8, (a0)		; SPILL-O0-NEXT: vlseg2e32.v v8, (a0)
; SPILL-O0-NEXT: vmv1r.v v8, v9		; SPILL-O0-NEXT: vmv1r.v v8, v9
; SPILL-O0-NEXT: addi a0, sp, 16		; SPILL-O0-NEXT: addi a0, sp, 16
; SPILL-O0-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill		; SPILL-O0-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill
; SPILL-O0-NEXT: #APP		; SPILL-O0-NEXT: #APP
; SPILL-O0-NEXT: #NO_APP		; SPILL-O0-NEXT: #NO_APP
; SPILL-O0-NEXT: addi a0, sp, 16		; SPILL-O0-NEXT: addi a0, sp, 16
; SPILL-O0-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload		; SPILL-O0-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload
; SPILL-O0-NEXT: csrr a0, vlenb		; SPILL-O0-NEXT: csrr a0, vlenb
		; SPILL-O0-NEXT: slli a0, a0, 1
; SPILL-O0-NEXT: add sp, sp, a0		; SPILL-O0-NEXT: add sp, sp, a0
; SPILL-O0-NEXT: addi sp, sp, 16		; SPILL-O0-NEXT: addi sp, sp, 16
; SPILL-O0-NEXT: ret		; SPILL-O0-NEXT: ret
;		;
; SPILL-O2-LABEL: spill_zvlsseg_nxv2i32:		; SPILL-O2-LABEL: spill_zvlsseg_nxv2i32:
; SPILL-O2: # %bb.0: # %entry		; SPILL-O2: # %bb.0: # %entry
; SPILL-O2-NEXT: addi sp, sp, -16		; SPILL-O2-NEXT: addi sp, sp, -16
; SPILL-O2-NEXT: csrr a2, vlenb		; SPILL-O2-NEXT: csrr a2, vlenb

llvm/test/CodeGen/RISCV/rvv/rv64-spill-vector-csr.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=riscv64 -mattr=+v,+d -mattr=+d -O0 < %s \			; RUN: llc -mtriple=riscv64 -mattr=+v,+d -mattr=+d -O0 < %s \
	; RUN: | FileCheck --check-prefix=SPILL-O0 %s			; RUN: | FileCheck --check-prefix=SPILL-O0 %s
	; RUN: llc -mtriple=riscv64 -mattr=+v,+d -mattr=+d -O2 < %s \			; RUN: llc -mtriple=riscv64 -mattr=+v,+d -mattr=+d -O2 < %s \
	; RUN: | FileCheck --check-prefix=SPILL-O2 %s			; RUN: | FileCheck --check-prefix=SPILL-O2 %s

	@.str = private unnamed_addr constant [6 x i8] c"hello\00", align 1			@.str = private unnamed_addr constant [6 x i8] c"hello\00", align 1

	define <vscale x 1 x double> @foo(<vscale x 1 x double> %a, <vscale x 1 x double> %b, <vscale x 1 x double> %c, i64 %gvl) nounwind			define <vscale x 1 x double> @foo(<vscale x 1 x double> %a, <vscale x 1 x double> %b, <vscale x 1 x double> %c, i64 %gvl) nounwind
	; SPILL-O0-LABEL: foo:			; SPILL-O0-LABEL: foo:
	; SPILL-O0: # %bb.0:			; SPILL-O0: # %bb.0:
	; SPILL-O0-NEXT: addi sp, sp, -32			; SPILL-O0-NEXT: addi sp, sp, -48
	; SPILL-O0-NEXT: sd ra, 24(sp) # 8-byte Folded Spill			; SPILL-O0-NEXT: sd ra, 40(sp) # 8-byte Folded Spill
	; SPILL-O0-NEXT: csrr a1, vlenb			; SPILL-O0-NEXT: csrr a1, vlenb
	; SPILL-O0-NEXT: slli a1, a1, 1			; SPILL-O0-NEXT: slli a1, a1, 1
	; SPILL-O0-NEXT: sub sp, sp, a1			; SPILL-O0-NEXT: sub sp, sp, a1
	; SPILL-O0-NEXT: sd a0, 16(sp) # 8-byte Folded Spill			; SPILL-O0-NEXT: sd a0, 16(sp) # 8-byte Folded Spill
	; SPILL-O0-NEXT: csrr a1, vlenb			; SPILL-O0-NEXT: csrr a1, vlenb
	; SPILL-O0-NEXT: add a1, sp, a1			; SPILL-O0-NEXT: add a1, sp, a1
	; SPILL-O0-NEXT: addi a1, a1, 24			; SPILL-O0-NEXT: addi a1, a1, 32
	; SPILL-O0-NEXT: vs1r.v v8, (a1) # Unknown-size Folded Spill			; SPILL-O0-NEXT: vs1r.v v8, (a1) # Unknown-size Folded Spill
	; SPILL-O0-NEXT: vsetvli zero, a0, e64, m1, ta, mu			; SPILL-O0-NEXT: vsetvli zero, a0, e64, m1, ta, mu
	; SPILL-O0-NEXT: vfadd.vv v8, v8, v9			; SPILL-O0-NEXT: vfadd.vv v8, v8, v9
	; SPILL-O0-NEXT: addi a0, sp, 24			; SPILL-O0-NEXT: addi a0, sp, 32
	; SPILL-O0-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill			; SPILL-O0-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill
	; SPILL-O0-NEXT: lui a0, %hi(.L.str)			; SPILL-O0-NEXT: lui a0, %hi(.L.str)
	; SPILL-O0-NEXT: addi a0, a0, %lo(.L.str)			; SPILL-O0-NEXT: addi a0, a0, %lo(.L.str)
	; SPILL-O0-NEXT: call puts@plt			; SPILL-O0-NEXT: call puts@plt
	; SPILL-O0-NEXT: addi a1, sp, 24			; SPILL-O0-NEXT: addi a1, sp, 32
	; SPILL-O0-NEXT: vl1r.v v9, (a1) # Unknown-size Folded Reload			; SPILL-O0-NEXT: vl1r.v v9, (a1) # Unknown-size Folded Reload
	; SPILL-O0-NEXT: csrr a1, vlenb			; SPILL-O0-NEXT: csrr a1, vlenb
	; SPILL-O0-NEXT: add a1, sp, a1			; SPILL-O0-NEXT: add a1, sp, a1
	; SPILL-O0-NEXT: addi a1, a1, 24			; SPILL-O0-NEXT: addi a1, a1, 32
	; SPILL-O0-NEXT: vl1r.v v8, (a1) # Unknown-size Folded Reload			; SPILL-O0-NEXT: vl1r.v v8, (a1) # Unknown-size Folded Reload
	; SPILL-O0-NEXT: # kill: def $x11 killed $x10			; SPILL-O0-NEXT: # kill: def $x11 killed $x10
	; SPILL-O0-NEXT: ld a0, 16(sp) # 8-byte Folded Reload			; SPILL-O0-NEXT: ld a0, 16(sp) # 8-byte Folded Reload
	; SPILL-O0-NEXT: vsetvli zero, a0, e64, m1, ta, mu			; SPILL-O0-NEXT: vsetvli zero, a0, e64, m1, ta, mu
	; SPILL-O0-NEXT: vfadd.vv v8, v8, v9			; SPILL-O0-NEXT: vfadd.vv v8, v8, v9
	; SPILL-O0-NEXT: csrr a0, vlenb			; SPILL-O0-NEXT: csrr a0, vlenb
	; SPILL-O0-NEXT: slli a0, a0, 1			; SPILL-O0-NEXT: slli a0, a0, 1
	; SPILL-O0-NEXT: add sp, sp, a0			; SPILL-O0-NEXT: add sp, sp, a0
	; SPILL-O0-NEXT: ld ra, 24(sp) # 8-byte Folded Reload			; SPILL-O0-NEXT: ld ra, 40(sp) # 8-byte Folded Reload
	; SPILL-O0-NEXT: addi sp, sp, 32			; SPILL-O0-NEXT: addi sp, sp, 48
	; SPILL-O0-NEXT: ret			; SPILL-O0-NEXT: ret
	;			;
	; SPILL-O2-LABEL: foo:			; SPILL-O2-LABEL: foo:
	; SPILL-O2: # %bb.0:			; SPILL-O2: # %bb.0:
	; SPILL-O2-NEXT: addi sp, sp, -32			; SPILL-O2-NEXT: addi sp, sp, -32
	; SPILL-O2-NEXT: sd ra, 24(sp) # 8-byte Folded Spill			; SPILL-O2-NEXT: sd ra, 24(sp) # 8-byte Folded Spill
	; SPILL-O2-NEXT: sd s0, 16(sp) # 8-byte Folded Spill			; SPILL-O2-NEXT: sd s0, 16(sp) # 8-byte Folded Spill
	; SPILL-O2-NEXT: csrr a1, vlenb			; SPILL-O2-NEXT: csrr a1, vlenb

llvm/test/CodeGen/RISCV/rvv/rv64-spill-vector.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=riscv64 -mattr=+v -O0 < %s \			; RUN: llc -mtriple=riscv64 -mattr=+v -O0 < %s \
	; RUN: | FileCheck --check-prefix=SPILL-O0 %s			; RUN: | FileCheck --check-prefix=SPILL-O0 %s
	; RUN: llc -mtriple=riscv64 -mattr=+v -O2 < %s \			; RUN: llc -mtriple=riscv64 -mattr=+v -O2 < %s \
	; RUN: | FileCheck --check-prefix=SPILL-O2 %s			; RUN: | FileCheck --check-prefix=SPILL-O2 %s

	define <vscale x 1 x i64> @spill_lmul_1(<vscale x 1 x i64> %va) nounwind {			define <vscale x 1 x i64> @spill_lmul_1(<vscale x 1 x i64> %va) nounwind {
	; SPILL-O0-LABEL: spill_lmul_1:			; SPILL-O0-LABEL: spill_lmul_1:
	; SPILL-O0: # %bb.0: # %entry			; SPILL-O0: # %bb.0: # %entry
	; SPILL-O0-NEXT: addi sp, sp, -16			; SPILL-O0-NEXT: addi sp, sp, -16
	; SPILL-O0-NEXT: csrr a0, vlenb			; SPILL-O0-NEXT: csrr a0, vlenb
				; SPILL-O0-NEXT: slli a0, a0, 1
	; SPILL-O0-NEXT: sub sp, sp, a0			; SPILL-O0-NEXT: sub sp, sp, a0
	; SPILL-O0-NEXT: addi a0, sp, 16			; SPILL-O0-NEXT: addi a0, sp, 16
	; SPILL-O0-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill			; SPILL-O0-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill
	; SPILL-O0-NEXT: #APP			; SPILL-O0-NEXT: #APP
	; SPILL-O0-NEXT: #NO_APP			; SPILL-O0-NEXT: #NO_APP
	; SPILL-O0-NEXT: addi a0, sp, 16			; SPILL-O0-NEXT: addi a0, sp, 16
	; SPILL-O0-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload			; SPILL-O0-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload
	; SPILL-O0-NEXT: csrr a0, vlenb			; SPILL-O0-NEXT: csrr a0, vlenb
				; SPILL-O0-NEXT: slli a0, a0, 1
	; SPILL-O0-NEXT: add sp, sp, a0			; SPILL-O0-NEXT: add sp, sp, a0
	; SPILL-O0-NEXT: addi sp, sp, 16			; SPILL-O0-NEXT: addi sp, sp, 16
	; SPILL-O0-NEXT: ret			; SPILL-O0-NEXT: ret
	;			;
	; SPILL-O2-LABEL: spill_lmul_1:			; SPILL-O2-LABEL: spill_lmul_1:
	; SPILL-O2: # %bb.0: # %entry			; SPILL-O2: # %bb.0: # %entry
	; SPILL-O2-NEXT: addi sp, sp, -16			; SPILL-O2-NEXT: addi sp, sp, -16
	; SPILL-O2-NEXT: csrr a0, vlenb			; SPILL-O2-NEXT: csrr a0, vlenb
				; SPILL-O2-NEXT: slli a0, a0, 1
	; SPILL-O2-NEXT: sub sp, sp, a0			; SPILL-O2-NEXT: sub sp, sp, a0
	; SPILL-O2-NEXT: addi a0, sp, 16			; SPILL-O2-NEXT: addi a0, sp, 16
	; SPILL-O2-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill			; SPILL-O2-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill
	; SPILL-O2-NEXT: #APP			; SPILL-O2-NEXT: #APP
	; SPILL-O2-NEXT: #NO_APP			; SPILL-O2-NEXT: #NO_APP
	; SPILL-O2-NEXT: addi a0, sp, 16			; SPILL-O2-NEXT: addi a0, sp, 16
	; SPILL-O2-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload			; SPILL-O2-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload
	; SPILL-O2-NEXT: csrr a0, vlenb			; SPILL-O2-NEXT: csrr a0, vlenb
				; SPILL-O2-NEXT: slli a0, a0, 1
	; SPILL-O2-NEXT: add sp, sp, a0			; SPILL-O2-NEXT: add sp, sp, a0
	; SPILL-O2-NEXT: addi sp, sp, 16			; SPILL-O2-NEXT: addi sp, sp, 16
	; SPILL-O2-NEXT: ret			; SPILL-O2-NEXT: ret
	entry:			entry:
	call void asm sideeffect "",			call void asm sideeffect "",
	"~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7},~{v8},~{v9},~{v10},~{v11},~{v12},~{v13},~{v14},~{v15},~{v16},~{v17},~{v18},~{v19},~{v20},~{v21},~{v22},~{v23},~{v24},~{v25},~{v26},~{v27},~{v28},~{v29},~{v30},~{v31}"()			"~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7},~{v8},~{v9},~{v10},~{v11},~{v12},~{v13},~{v14},~{v15},~{v16},~{v17},~{v18},~{v19},~{v20},~{v21},~{v22},~{v23},~{v24},~{v25},~{v26},~{v27},~{v28},~{v29},~{v30},~{v31}"()

	ret <vscale x 1 x i64> %va			ret <vscale x 1 x i64> %va

llvm/test/CodeGen/RISCV/rvv/rv64-spill-zvlsseg.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=riscv64 -mattr=+v -mattr=+m -O0 < %s \		; RUN: llc -mtriple=riscv64 -mattr=+v -mattr=+m -O0 < %s \
; RUN: | FileCheck --check-prefix=SPILL-O0 %s		; RUN: | FileCheck --check-prefix=SPILL-O0 %s
; RUN: llc -mtriple=riscv64 -mattr=+v -mattr=+m -O2 < %s \		; RUN: llc -mtriple=riscv64 -mattr=+v -mattr=+m -O2 < %s \
; RUN: | FileCheck --check-prefix=SPILL-O2 %s		; RUN: | FileCheck --check-prefix=SPILL-O2 %s

define <vscale x 1 x i32> @spill_zvlsseg_nxv1i32(i32* %base, i64 %vl) nounwind {		define <vscale x 1 x i32> @spill_zvlsseg_nxv1i32(i32* %base, i64 %vl) nounwind {
; SPILL-O0-LABEL: spill_zvlsseg_nxv1i32:		; SPILL-O0-LABEL: spill_zvlsseg_nxv1i32:
; SPILL-O0: # %bb.0: # %entry		; SPILL-O0: # %bb.0: # %entry
; SPILL-O0-NEXT: addi sp, sp, -16		; SPILL-O0-NEXT: addi sp, sp, -16
; SPILL-O0-NEXT: csrr a2, vlenb		; SPILL-O0-NEXT: csrr a2, vlenb
		; SPILL-O0-NEXT: slli a2, a2, 1
; SPILL-O0-NEXT: sub sp, sp, a2		; SPILL-O0-NEXT: sub sp, sp, a2
; SPILL-O0-NEXT: vsetvli zero, a1, e32, mf2, ta, mu		; SPILL-O0-NEXT: vsetvli zero, a1, e32, mf2, ta, mu
; SPILL-O0-NEXT: vlseg2e32.v v8, (a0)		; SPILL-O0-NEXT: vlseg2e32.v v8, (a0)
; SPILL-O0-NEXT: vmv1r.v v8, v9		; SPILL-O0-NEXT: vmv1r.v v8, v9
; SPILL-O0-NEXT: addi a0, sp, 16		; SPILL-O0-NEXT: addi a0, sp, 16
; SPILL-O0-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill		; SPILL-O0-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill
; SPILL-O0-NEXT: #APP		; SPILL-O0-NEXT: #APP
; SPILL-O0-NEXT: #NO_APP		; SPILL-O0-NEXT: #NO_APP
; SPILL-O0-NEXT: addi a0, sp, 16		; SPILL-O0-NEXT: addi a0, sp, 16
; SPILL-O0-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload		; SPILL-O0-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload
; SPILL-O0-NEXT: csrr a0, vlenb		; SPILL-O0-NEXT: csrr a0, vlenb
		; SPILL-O0-NEXT: slli a0, a0, 1
; SPILL-O0-NEXT: add sp, sp, a0		; SPILL-O0-NEXT: add sp, sp, a0
; SPILL-O0-NEXT: addi sp, sp, 16		; SPILL-O0-NEXT: addi sp, sp, 16
; SPILL-O0-NEXT: ret		; SPILL-O0-NEXT: ret
;		;
; SPILL-O2-LABEL: spill_zvlsseg_nxv1i32:		; SPILL-O2-LABEL: spill_zvlsseg_nxv1i32:
; SPILL-O2: # %bb.0: # %entry		; SPILL-O2: # %bb.0: # %entry
; SPILL-O2-NEXT: addi sp, sp, -16		; SPILL-O2-NEXT: addi sp, sp, -16
; SPILL-O2-NEXT: csrr a2, vlenb		; SPILL-O2-NEXT: csrr a2, vlenb
Show All 27 Lines	entry:
ret <vscale x 1 x i32> %1		ret <vscale x 1 x i32> %1
}		}

define <vscale x 2 x i32> @spill_zvlsseg_nxv2i32(i32* %base, i64 %vl) nounwind {		define <vscale x 2 x i32> @spill_zvlsseg_nxv2i32(i32* %base, i64 %vl) nounwind {
; SPILL-O0-LABEL: spill_zvlsseg_nxv2i32:		; SPILL-O0-LABEL: spill_zvlsseg_nxv2i32:
; SPILL-O0: # %bb.0: # %entry		; SPILL-O0: # %bb.0: # %entry
; SPILL-O0-NEXT: addi sp, sp, -16		; SPILL-O0-NEXT: addi sp, sp, -16
; SPILL-O0-NEXT: csrr a2, vlenb		; SPILL-O0-NEXT: csrr a2, vlenb
		; SPILL-O0-NEXT: slli a2, a2, 1
; SPILL-O0-NEXT: sub sp, sp, a2		; SPILL-O0-NEXT: sub sp, sp, a2
; SPILL-O0-NEXT: vsetvli zero, a1, e32, m1, ta, mu		; SPILL-O0-NEXT: vsetvli zero, a1, e32, m1, ta, mu
; SPILL-O0-NEXT: vlseg2e32.v v8, (a0)		; SPILL-O0-NEXT: vlseg2e32.v v8, (a0)
; SPILL-O0-NEXT: vmv1r.v v8, v9		; SPILL-O0-NEXT: vmv1r.v v8, v9
; SPILL-O0-NEXT: addi a0, sp, 16		; SPILL-O0-NEXT: addi a0, sp, 16
; SPILL-O0-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill		; SPILL-O0-NEXT: vs1r.v v8, (a0) # Unknown-size Folded Spill
; SPILL-O0-NEXT: #APP		; SPILL-O0-NEXT: #APP
; SPILL-O0-NEXT: #NO_APP		; SPILL-O0-NEXT: #NO_APP
; SPILL-O0-NEXT: addi a0, sp, 16		; SPILL-O0-NEXT: addi a0, sp, 16
; SPILL-O0-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload		; SPILL-O0-NEXT: vl1r.v v8, (a0) # Unknown-size Folded Reload
; SPILL-O0-NEXT: csrr a0, vlenb		; SPILL-O0-NEXT: csrr a0, vlenb
		; SPILL-O0-NEXT: slli a0, a0, 1
; SPILL-O0-NEXT: add sp, sp, a0		; SPILL-O0-NEXT: add sp, sp, a0
; SPILL-O0-NEXT: addi sp, sp, 16		; SPILL-O0-NEXT: addi sp, sp, 16
; SPILL-O0-NEXT: ret		; SPILL-O0-NEXT: ret
;		;
; SPILL-O2-LABEL: spill_zvlsseg_nxv2i32:		; SPILL-O2-LABEL: spill_zvlsseg_nxv2i32:
; SPILL-O2: # %bb.0: # %entry		; SPILL-O2: # %bb.0: # %entry
; SPILL-O2-NEXT: addi sp, sp, -16		; SPILL-O2-NEXT: addi sp, sp, -16
; SPILL-O2-NEXT: csrr a2, vlenb		; SPILL-O2-NEXT: csrr a2, vlenb

llvm/test/CodeGen/RISCV/rvv/rvv-args-by-mem.ll

Show All 16 Lines	; CHECK-NEXT: ret
%s1 = add <vscale x 16 x i32> %x, %z		%s1 = add <vscale x 16 x i32> %x, %z
%s = add <vscale x 16 x i32> %s0, %s1		%s = add <vscale x 16 x i32> %s0, %s1
ret <vscale x 16 x i32> %s		ret <vscale x 16 x i32> %s
}		}

define <vscale x 16 x i32> @foo(i32 %0, i32 %1, i32 %2, i32 %3, i32 %4, i32 %5, i32 %6, i32 %7, <vscale x 16 x i32> %x) {		define <vscale x 16 x i32> @foo(i32 %0, i32 %1, i32 %2, i32 %3, i32 %4, i32 %5, i32 %6, i32 %7, <vscale x 16 x i32> %x) {
; CHECK-LABEL: foo:		; CHECK-LABEL: foo:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: addi sp, sp, -48		; CHECK-NEXT: addi sp, sp, -80
; CHECK-NEXT: .cfi_def_cfa_offset 48		; CHECK-NEXT: .cfi_def_cfa_offset 80
; CHECK-NEXT: sd ra, 40(sp) # 8-byte Folded Spill		; CHECK-NEXT: sd ra, 72(sp) # 8-byte Folded Spill
; CHECK-NEXT: .cfi_offset ra, -8		; CHECK-NEXT: .cfi_offset ra, -8
; CHECK-NEXT: csrr t0, vlenb		; CHECK-NEXT: csrr t0, vlenb
; CHECK-NEXT: slli t0, t0, 4		; CHECK-NEXT: slli t0, t0, 4
; CHECK-NEXT: sub sp, sp, t0		; CHECK-NEXT: sub sp, sp, t0
; CHECK-NEXT: addi t0, sp, 40		; CHECK-NEXT: addi t0, sp, 64
; CHECK-NEXT: sd t0, 8(sp)		; CHECK-NEXT: sd t0, 8(sp)
; CHECK-NEXT: csrr t0, vlenb		; CHECK-NEXT: csrr t0, vlenb
; CHECK-NEXT: slli t0, t0, 3		; CHECK-NEXT: slli t0, t0, 3
; CHECK-NEXT: add t0, sp, t0		; CHECK-NEXT: add t0, sp, t0
; CHECK-NEXT: addi t0, t0, 40		; CHECK-NEXT: addi t0, t0, 64
; CHECK-NEXT: sd t0, 0(sp)		; CHECK-NEXT: sd t0, 0(sp)
; CHECK-NEXT: addi t0, sp, 40		; CHECK-NEXT: addi t0, sp, 64
; CHECK-NEXT: vs8r.v v8, (t0)		; CHECK-NEXT: vs8r.v v8, (t0)
; CHECK-NEXT: csrr t0, vlenb		; CHECK-NEXT: csrr t0, vlenb
; CHECK-NEXT: slli t0, t0, 3		; CHECK-NEXT: slli t0, t0, 3
; CHECK-NEXT: add t0, sp, t0		; CHECK-NEXT: add t0, sp, t0
; CHECK-NEXT: addi t0, t0, 40		; CHECK-NEXT: addi t0, t0, 64
; CHECK-NEXT: vs8r.v v8, (t0)		; CHECK-NEXT: vs8r.v v8, (t0)
; CHECK-NEXT: vmv8r.v v16, v8		; CHECK-NEXT: vmv8r.v v16, v8
; CHECK-NEXT: call bar@plt		; CHECK-NEXT: call bar@plt
; CHECK-NEXT: csrr a0, vlenb		; CHECK-NEXT: csrr a0, vlenb
; CHECK-NEXT: slli a0, a0, 4		; CHECK-NEXT: slli a0, a0, 4
; CHECK-NEXT: add sp, sp, a0		; CHECK-NEXT: add sp, sp, a0
; CHECK-NEXT: ld ra, 40(sp) # 8-byte Folded Reload		; CHECK-NEXT: ld ra, 72(sp) # 8-byte Folded Reload
; CHECK-NEXT: addi sp, sp, 48		; CHECK-NEXT: addi sp, sp, 80
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%ret = call <vscale x 16 x i32> @bar(i32 %0, i32 %1, i32 %2, i32 %3, i32 %4, i32 %5, i32 %6, i32 %7, <vscale x 16 x i32> %x, <vscale x 16 x i32> %x, <vscale x 16 x i32> %x, <vscale x 16 x i32> %x)		%ret = call <vscale x 16 x i32> @bar(i32 %0, i32 %1, i32 %2, i32 %3, i32 %4, i32 %5, i32 %6, i32 %7, <vscale x 16 x i32> %x, <vscale x 16 x i32> %x, <vscale x 16 x i32> %x, <vscale x 16 x i32> %x)
ret <vscale x 16 x i32> %ret		ret <vscale x 16 x i32> %ret
}		}

llvm/test/CodeGen/RISCV/rvv/rvv-framelayout.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=riscv64 -mattr=+v,+m -verify-machineinstrs < %s | FileCheck %s		; RUN: llc -mtriple=riscv64 -mattr=+v,+m -verify-machineinstrs < %s | FileCheck %s

define void @rvv_vla(i64 %n, i64 %i) nounwind {		define void @rvv_vla(i64 %n, i64 %i) nounwind {
; CHECK-LABEL: rvv_vla:		; CHECK-LABEL: rvv_vla:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: addi sp, sp, -32		; CHECK-NEXT: addi sp, sp, -32
; CHECK-NEXT: sd ra, 24(sp) # 8-byte Folded Spill		; CHECK-NEXT: sd ra, 24(sp) # 8-byte Folded Spill
; CHECK-NEXT: sd s0, 16(sp) # 8-byte Folded Spill		; CHECK-NEXT: sd s0, 16(sp) # 8-byte Folded Spill
; CHECK-NEXT: addi s0, sp, 32		; CHECK-NEXT: addi s0, sp, 32
; CHECK-NEXT: csrr a2, vlenb		; CHECK-NEXT: csrr a2, vlenb
; CHECK-NEXT: slli a3, a2, 1		; CHECK-NEXT: slli a2, a2, 2
; CHECK-NEXT: add a2, a3, a2
; CHECK-NEXT: sub sp, sp, a2		; CHECK-NEXT: sub sp, sp, a2
; CHECK-NEXT: slli a0, a0, 2		; CHECK-NEXT: slli a0, a0, 2
; CHECK-NEXT: addi a0, a0, 15		; CHECK-NEXT: addi a0, a0, 15
; CHECK-NEXT: andi a0, a0, -16		; CHECK-NEXT: andi a0, a0, -16
; CHECK-NEXT: sub a0, sp, a0		; CHECK-NEXT: sub a0, sp, a0
; CHECK-NEXT: mv sp, a0		; CHECK-NEXT: mv sp, a0
; CHECK-NEXT: csrr a2, vlenb		; CHECK-NEXT: csrr a2, vlenb
; CHECK-NEXT: sub a2, s0, a2		; CHECK-NEXT: sub a2, s0, a2
; CHECK-NEXT: addi a2, a2, -32		; CHECK-NEXT: addi a2, a2, -32
; CHECK-NEXT: vl1re64.v v8, (a2)		; CHECK-NEXT: vl1re64.v v8, (a2)
; CHECK-NEXT: csrr a2, vlenb		; CHECK-NEXT: csrr a2, vlenb
; CHECK-NEXT: slli a3, a2, 1		; CHECK-NEXT: slli a2, a2, 2
; CHECK-NEXT: add a2, a3, a2
; CHECK-NEXT: sub a2, s0, a2		; CHECK-NEXT: sub a2, s0, a2
; CHECK-NEXT: addi a2, a2, -32		; CHECK-NEXT: addi a2, a2, -32
; CHECK-NEXT: vl2re64.v v8, (a2)		; CHECK-NEXT: vl2re64.v v8, (a2)
; CHECK-NEXT: slli a1, a1, 2		; CHECK-NEXT: slli a1, a1, 2
; CHECK-NEXT: add a0, a0, a1		; CHECK-NEXT: add a0, a0, a1
; CHECK-NEXT: lw a0, 0(a0)		; CHECK-NEXT: lw a0, 0(a0)
; CHECK-NEXT: addi sp, s0, -32		; CHECK-NEXT: addi sp, s0, -32
; CHECK-NEXT: ld ra, 24(sp) # 8-byte Folded Reload		; CHECK-NEXT: ld ra, 24(sp) # 8-byte Folded Reload
Show All 16 Lines
define void @rvv_overaligned() nounwind {		define void @rvv_overaligned() nounwind {
; CHECK-LABEL: rvv_overaligned:		; CHECK-LABEL: rvv_overaligned:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: addi sp, sp, -128		; CHECK-NEXT: addi sp, sp, -128
; CHECK-NEXT: sd ra, 120(sp) # 8-byte Folded Spill		; CHECK-NEXT: sd ra, 120(sp) # 8-byte Folded Spill
; CHECK-NEXT: sd s0, 112(sp) # 8-byte Folded Spill		; CHECK-NEXT: sd s0, 112(sp) # 8-byte Folded Spill
; CHECK-NEXT: addi s0, sp, 128		; CHECK-NEXT: addi s0, sp, 128
; CHECK-NEXT: csrr a0, vlenb		; CHECK-NEXT: csrr a0, vlenb
; CHECK-NEXT: slli a1, a0, 1		; CHECK-NEXT: slli a0, a0, 2
; CHECK-NEXT: add a0, a1, a0
; CHECK-NEXT: sub sp, sp, a0		; CHECK-NEXT: sub sp, sp, a0
; CHECK-NEXT: andi sp, sp, -64		; CHECK-NEXT: andi sp, sp, -64
; CHECK-NEXT: csrr a0, vlenb		; CHECK-NEXT: csrr a0, vlenb
; CHECK-NEXT: slli a0, a0, 1		; CHECK-NEXT: slli a1, a0, 1
		; CHECK-NEXT: add a0, a1, a0
; CHECK-NEXT: add a0, sp, a0		; CHECK-NEXT: add a0, sp, a0
; CHECK-NEXT: addi a0, a0, 112		; CHECK-NEXT: addi a0, a0, 112
; CHECK-NEXT: vl1re64.v v8, (a0)		; CHECK-NEXT: vl1re64.v v8, (a0)
; CHECK-NEXT: addi a0, sp, 112		; CHECK-NEXT: addi a0, sp, 112
; CHECK-NEXT: vl2re64.v v8, (a0)		; CHECK-NEXT: vl2re64.v v8, (a0)
; CHECK-NEXT: lw a0, 64(sp)		; CHECK-NEXT: lw a0, 64(sp)
; CHECK-NEXT: addi sp, s0, -128		; CHECK-NEXT: addi sp, s0, -128
; CHECK-NEXT: ld ra, 120(sp) # 8-byte Folded Reload		; CHECK-NEXT: ld ra, 120(sp) # 8-byte Folded Reload
Show All 10 Lines	; CHECK-NEXT: ret

%s = load volatile i32, i32* %overaligned, align 64		%s = load volatile i32, i32* %overaligned, align 64
ret void		ret void
}		}

define void @rvv_vla_and_overaligned(i64 %n, i64 %i) nounwind {		define void @rvv_vla_and_overaligned(i64 %n, i64 %i) nounwind {
; CHECK-LABEL: rvv_vla_and_overaligned:		; CHECK-LABEL: rvv_vla_and_overaligned:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: addi sp, sp, -128		; CHECK-NEXT: addi sp, sp, -144
; CHECK-NEXT: sd ra, 120(sp) # 8-byte Folded Spill		; CHECK-NEXT: sd ra, 136(sp) # 8-byte Folded Spill
; CHECK-NEXT: sd s0, 112(sp) # 8-byte Folded Spill		; CHECK-NEXT: sd s0, 128(sp) # 8-byte Folded Spill
; CHECK-NEXT: sd s1, 104(sp) # 8-byte Folded Spill		; CHECK-NEXT: sd s1, 120(sp) # 8-byte Folded Spill
; CHECK-NEXT: addi s0, sp, 128		; CHECK-NEXT: addi s0, sp, 144
; CHECK-NEXT: csrr a2, vlenb		; CHECK-NEXT: csrr a2, vlenb
; CHECK-NEXT: slli a3, a2, 1		; CHECK-NEXT: slli a2, a2, 2
; CHECK-NEXT: add a2, a3, a2
; CHECK-NEXT: sub sp, sp, a2		; CHECK-NEXT: sub sp, sp, a2
; CHECK-NEXT: andi sp, sp, -64		; CHECK-NEXT: andi sp, sp, -64
; CHECK-NEXT: mv s1, sp		; CHECK-NEXT: mv s1, sp
; CHECK-NEXT: slli a0, a0, 2		; CHECK-NEXT: slli a0, a0, 2
; CHECK-NEXT: addi a0, a0, 15		; CHECK-NEXT: addi a0, a0, 15
; CHECK-NEXT: andi a0, a0, -16		; CHECK-NEXT: andi a0, a0, -16
; CHECK-NEXT: sub a0, sp, a0		; CHECK-NEXT: sub a0, sp, a0
; CHECK-NEXT: mv sp, a0		; CHECK-NEXT: mv sp, a0
; CHECK-NEXT: csrr a2, vlenb		; CHECK-NEXT: csrr a2, vlenb
; CHECK-NEXT: slli a2, a2, 1		; CHECK-NEXT: slli a3, a2, 1
		; CHECK-NEXT: add a2, a3, a2
; CHECK-NEXT: add a2, s1, a2		; CHECK-NEXT: add a2, s1, a2
; CHECK-NEXT: addi a2, a2, 104		; CHECK-NEXT: addi a2, a2, 112
; CHECK-NEXT: vl1re64.v v8, (a2)		; CHECK-NEXT: vl1re64.v v8, (a2)
; CHECK-NEXT: addi a2, s1, 104		; CHECK-NEXT: addi a2, s1, 112
; CHECK-NEXT: vl2re64.v v8, (a2)		; CHECK-NEXT: vl2re64.v v8, (a2)
; CHECK-NEXT: lw a2, 64(s1)		; CHECK-NEXT: lw a2, 64(s1)
; CHECK-NEXT: slli a1, a1, 2		; CHECK-NEXT: slli a1, a1, 2
; CHECK-NEXT: add a0, a0, a1		; CHECK-NEXT: add a0, a0, a1
; CHECK-NEXT: lw a0, 0(a0)		; CHECK-NEXT: lw a0, 0(a0)
; CHECK-NEXT: addi sp, s0, -128		; CHECK-NEXT: addi sp, s0, -144
; CHECK-NEXT: ld ra, 120(sp) # 8-byte Folded Reload		; CHECK-NEXT: ld ra, 136(sp) # 8-byte Folded Reload
; CHECK-NEXT: ld s0, 112(sp) # 8-byte Folded Reload		; CHECK-NEXT: ld s0, 128(sp) # 8-byte Folded Reload
; CHECK-NEXT: ld s1, 104(sp) # 8-byte Folded Reload		; CHECK-NEXT: ld s1, 120(sp) # 8-byte Folded Reload
; CHECK-NEXT: addi sp, sp, 128		; CHECK-NEXT: addi sp, sp, 144
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%overaligned = alloca i32, align 64		%overaligned = alloca i32, align 64
%vla.addr = alloca i32, i64 %n		%vla.addr = alloca i32, i64 %n

%v1.addr = alloca <vscale x 1 x i64>		%v1.addr = alloca <vscale x 1 x i64>
%v1 = load volatile <vscale x 1 x i64>, <vscale x 1 x i64>* %v1.addr		%v1 = load volatile <vscale x 1 x i64>, <vscale x 1 x i64>* %v1.addr

%v2.addr = alloca <vscale x 2 x i64>		%v2.addr = alloca <vscale x 2 x i64>
%v2 = load volatile <vscale x 2 x i64>, <vscale x 2 x i64>* %v2.addr		%v2 = load volatile <vscale x 2 x i64>, <vscale x 2 x i64>* %v2.addr

%s1 = load volatile i32, i32* %overaligned, align 64		%s1 = load volatile i32, i32* %overaligned, align 64
%p = getelementptr i32, i32* %vla.addr, i64 %i		%p = getelementptr i32, i32* %vla.addr, i64 %i
%s2 = load volatile i32, i32* %p		%s2 = load volatile i32, i32* %p
ret void		ret void

}		}
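
A minimal sketch of the arithmetic that appears to sit behind the changed `slli` shift amounts in these CHECK lines. It assumes each VLENB-sized unit is costed at the Zvl64b minimum of 8 bytes and that the RVV section is padded up to its required alignment (16 bytes unless an RVV object asks for more). The constant and function names are hypothetical and are not taken from RISCVFrameLowering.cpp.

```python
# Assumption: one VLENB unit is treated as 8 bytes, the minimum under Zvl64b.
# These names are illustrative only; they do not appear in LLVM.
MIN_VLENB_BYTES = 8
STACK_ALIGN_BYTES = 16


def padded_rvv_units(units: int, align: int = STACK_ALIGN_BYTES) -> int:
    """Return the RVV section size, in VLENB units, after alignment padding."""
    min_bytes = units * MIN_VLENB_BYTES
    # Round the worst-case (smallest VLENB) byte size up to the alignment.
    padded_bytes = (min_bytes + align - 1) // align * align
    return padded_bytes // MIN_VLENB_BYTES


print(padded_rvv_units(1))      # one LMUL=1 spill slot
print(padded_rvv_units(3))      # LMUL=1 plus LMUL=2 object, as in rvv_vla
print(padded_rvv_units(2, 32))  # LMUL=2 object with align 32
print(padded_rvv_units(16))     # two LMUL=8 objects
```

Under these assumptions the results are 2, 4, 4 and 16 units. That matches the diff: `spill_lmul_1` gains `slli a0, a0, 1`, `rvv_vla` replaces the 3*VLENB sequence with `slli a2, a2, 2`, `rvv_stack_align32` moves from a shift of 1 to a shift of 2, and `foo` in rvv-args-by-mem.ll keeps `slli t0, t0, 4`.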

llvm/test/CodeGen/RISCV/rvv/rvv-stack-align.mir

Show All 28 Lines	--- \|
; RV32-NEXT: slli a0, a0, 1		; RV32-NEXT: slli a0, a0, 1
; RV32-NEXT: add sp, sp, a0		; RV32-NEXT: add sp, sp, a0
; RV32-NEXT: lw ra, 44(sp) # 4-byte Folded Reload		; RV32-NEXT: lw ra, 44(sp) # 4-byte Folded Reload
; RV32-NEXT: addi sp, sp, 48		; RV32-NEXT: addi sp, sp, 48
; RV32-NEXT: ret		; RV32-NEXT: ret
;		;
; RV64-LABEL: rvv_stack_align8:		; RV64-LABEL: rvv_stack_align8:
; RV64: # %bb.0:		; RV64: # %bb.0:
; RV64-NEXT: addi sp, sp, -32		; RV64-NEXT: addi sp, sp, -48
; RV64-NEXT: sd ra, 24(sp) # 8-byte Folded Spill		; RV64-NEXT: sd ra, 40(sp) # 8-byte Folded Spill
; RV64-NEXT: csrr a0, vlenb		; RV64-NEXT: csrr a0, vlenb
; RV64-NEXT: slli a0, a0, 1		; RV64-NEXT: slli a0, a0, 1
; RV64-NEXT: sub sp, sp, a0		; RV64-NEXT: sub sp, sp, a0
; RV64-NEXT: addi a0, sp, 24		; RV64-NEXT: addi a0, sp, 32
; RV64-NEXT: addi a1, sp, 16		; RV64-NEXT: addi a1, sp, 16
; RV64-NEXT: addi a2, sp, 8		; RV64-NEXT: addi a2, sp, 8
; RV64-NEXT: call extern@plt		; RV64-NEXT: call extern@plt
; RV64-NEXT: csrr a0, vlenb		; RV64-NEXT: csrr a0, vlenb
; RV64-NEXT: slli a0, a0, 1		; RV64-NEXT: slli a0, a0, 1
; RV64-NEXT: add sp, sp, a0		; RV64-NEXT: add sp, sp, a0
; RV64-NEXT: ld ra, 24(sp) # 8-byte Folded Reload		; RV64-NEXT: ld ra, 40(sp) # 8-byte Folded Reload
; RV64-NEXT: addi sp, sp, 32		; RV64-NEXT: addi sp, sp, 48
; RV64-NEXT: ret		; RV64-NEXT: ret
%a = alloca <vscale x 4 x i32>, align 8		%a = alloca <vscale x 4 x i32>, align 8
%b = alloca i64		%b = alloca i64
%c = alloca i64		%c = alloca i64
call void @extern(<vscale x 4 x i32>* %a)		call void @extern(<vscale x 4 x i32>* %a)
ret void		ret void
}		}

; FIXME: The alloca is not correctly aligned to 16 bytes.

define void @rvv_stack_align16() #0 {		define void @rvv_stack_align16() #0 {
; RV32-LABEL: rvv_stack_align16:		; RV32-LABEL: rvv_stack_align16:
; RV32: # %bb.0:		; RV32: # %bb.0:
; RV32-NEXT: addi sp, sp, -48		; RV32-NEXT: addi sp, sp, -48
; RV32-NEXT: sw ra, 44(sp) # 4-byte Folded Spill		; RV32-NEXT: sw ra, 44(sp) # 4-byte Folded Spill
; RV32-NEXT: csrr a0, vlenb		; RV32-NEXT: csrr a0, vlenb
; RV32-NEXT: slli a0, a0, 1		; RV32-NEXT: slli a0, a0, 1
; RV32-NEXT: sub sp, sp, a0		; RV32-NEXT: sub sp, sp, a0
; RV32-NEXT: addi a0, sp, 32		; RV32-NEXT: addi a0, sp, 32
; RV32-NEXT: addi a1, sp, 16		; RV32-NEXT: addi a1, sp, 16
; RV32-NEXT: addi a2, sp, 8		; RV32-NEXT: addi a2, sp, 8
; RV32-NEXT: call extern@plt		; RV32-NEXT: call extern@plt
; RV32-NEXT: csrr a0, vlenb		; RV32-NEXT: csrr a0, vlenb
; RV32-NEXT: slli a0, a0, 1		; RV32-NEXT: slli a0, a0, 1
; RV32-NEXT: add sp, sp, a0		; RV32-NEXT: add sp, sp, a0
; RV32-NEXT: lw ra, 44(sp) # 4-byte Folded Reload		; RV32-NEXT: lw ra, 44(sp) # 4-byte Folded Reload
; RV32-NEXT: addi sp, sp, 48		; RV32-NEXT: addi sp, sp, 48
; RV32-NEXT: ret		; RV32-NEXT: ret
;		;
; RV64-LABEL: rvv_stack_align16:		; RV64-LABEL: rvv_stack_align16:
; RV64: # %bb.0:		; RV64: # %bb.0:
; RV64-NEXT: addi sp, sp, -32		; RV64-NEXT: addi sp, sp, -48
; RV64-NEXT: sd ra, 24(sp) # 8-byte Folded Spill		; RV64-NEXT: sd ra, 40(sp) # 8-byte Folded Spill
; RV64-NEXT: csrr a0, vlenb		; RV64-NEXT: csrr a0, vlenb
; RV64-NEXT: slli a0, a0, 1		; RV64-NEXT: slli a0, a0, 1
; RV64-NEXT: sub sp, sp, a0		; RV64-NEXT: sub sp, sp, a0
; RV64-NEXT: addi a0, sp, 24		; RV64-NEXT: addi a0, sp, 32
; RV64-NEXT: addi a1, sp, 16		; RV64-NEXT: addi a1, sp, 16
; RV64-NEXT: addi a2, sp, 8		; RV64-NEXT: addi a2, sp, 8
; RV64-NEXT: call extern@plt		; RV64-NEXT: call extern@plt
; RV64-NEXT: csrr a0, vlenb		; RV64-NEXT: csrr a0, vlenb
; RV64-NEXT: slli a0, a0, 1		; RV64-NEXT: slli a0, a0, 1
; RV64-NEXT: add sp, sp, a0		; RV64-NEXT: add sp, sp, a0
; RV64-NEXT: ld ra, 24(sp) # 8-byte Folded Reload		; RV64-NEXT: ld ra, 40(sp) # 8-byte Folded Reload
; RV64-NEXT: addi sp, sp, 32		; RV64-NEXT: addi sp, sp, 48
; RV64-NEXT: ret		; RV64-NEXT: ret
%a = alloca <vscale x 4 x i32>, align 16		%a = alloca <vscale x 4 x i32>, align 16
%b = alloca i64		%b = alloca i64
%c = alloca i64		%c = alloca i64
call void @extern(<vscale x 4 x i32>* %a)		call void @extern(<vscale x 4 x i32>* %a)
ret void		ret void
}		}

; FIXME: The alloca is not correctly aligned to 32 bytes.

define void @rvv_stack_align32() #0 {		define void @rvv_stack_align32() #0 {
; RV32-LABEL: rvv_stack_align32:		; RV32-LABEL: rvv_stack_align32:
; RV32: # %bb.0:		; RV32: # %bb.0:
; RV32-NEXT: addi sp, sp, -32		; RV32-NEXT: addi sp, sp, -48
; RV32-NEXT: sw ra, 28(sp) # 4-byte Folded Spill		; RV32-NEXT: sw ra, 44(sp) # 4-byte Folded Spill
; RV32-NEXT: sw s0, 24(sp) # 4-byte Folded Spill		; RV32-NEXT: sw s0, 40(sp) # 4-byte Folded Spill
; RV32-NEXT: addi s0, sp, 32		; RV32-NEXT: addi s0, sp, 48
; RV32-NEXT: csrr a0, vlenb		; RV32-NEXT: csrr a0, vlenb
; RV32-NEXT: slli a0, a0, 1		; RV32-NEXT: slli a0, a0, 2
; RV32-NEXT: sub sp, sp, a0		; RV32-NEXT: sub sp, sp, a0
; RV32-NEXT: andi sp, sp, -32		; RV32-NEXT: andi sp, sp, -32
; RV32-NEXT: addi a0, sp, 24		; RV32-NEXT: addi a0, sp, 32
; RV32-NEXT: addi a1, sp, 16		; RV32-NEXT: addi a1, sp, 16
; RV32-NEXT: addi a2, sp, 8		; RV32-NEXT: addi a2, sp, 8
; RV32-NEXT: call extern@plt		; RV32-NEXT: call extern@plt
; RV32-NEXT: addi sp, s0, -32		; RV32-NEXT: addi sp, s0, -48
; RV32-NEXT: lw ra, 28(sp) # 4-byte Folded Reload		; RV32-NEXT: lw ra, 44(sp) # 4-byte Folded Reload
; RV32-NEXT: lw s0, 24(sp) # 4-byte Folded Reload		; RV32-NEXT: lw s0, 40(sp) # 4-byte Folded Reload
; RV32-NEXT: addi sp, sp, 32		; RV32-NEXT: addi sp, sp, 48
; RV32-NEXT: ret		; RV32-NEXT: ret
;		;
; RV64-LABEL: rvv_stack_align32:		; RV64-LABEL: rvv_stack_align32:
; RV64: # %bb.0:		; RV64: # %bb.0:
; RV64-NEXT: addi sp, sp, -32		; RV64-NEXT: addi sp, sp, -48
; RV64-NEXT: sd ra, 24(sp) # 8-byte Folded Spill		; RV64-NEXT: sd ra, 40(sp) # 8-byte Folded Spill
; RV64-NEXT: sd s0, 16(sp) # 8-byte Folded Spill		; RV64-NEXT: sd s0, 32(sp) # 8-byte Folded Spill
; RV64-NEXT: addi s0, sp, 32		; RV64-NEXT: addi s0, sp, 48
; RV64-NEXT: csrr a0, vlenb		; RV64-NEXT: csrr a0, vlenb
; RV64-NEXT: slli a0, a0, 1		; RV64-NEXT: slli a0, a0, 2
; RV64-NEXT: sub sp, sp, a0		; RV64-NEXT: sub sp, sp, a0
; RV64-NEXT: andi sp, sp, -32		; RV64-NEXT: andi sp, sp, -32
; RV64-NEXT: addi a0, sp, 16		; RV64-NEXT: addi a0, sp, 32
; RV64-NEXT: addi a1, sp, 8		; RV64-NEXT: addi a1, sp, 8
; RV64-NEXT: mv a2, sp		; RV64-NEXT: mv a2, sp
; RV64-NEXT: call extern@plt		; RV64-NEXT: call extern@plt
; RV64-NEXT: addi sp, s0, -32		; RV64-NEXT: addi sp, s0, -48
; RV64-NEXT: ld ra, 24(sp) # 8-byte Folded Reload		; RV64-NEXT: ld ra, 40(sp) # 8-byte Folded Reload
; RV64-NEXT: ld s0, 16(sp) # 8-byte Folded Reload		; RV64-NEXT: ld s0, 32(sp) # 8-byte Folded Reload
; RV64-NEXT: addi sp, sp, 32		; RV64-NEXT: addi sp, sp, 48
; RV64-NEXT: ret		; RV64-NEXT: ret
%a = alloca <vscale x 4 x i32>, align 32		%a = alloca <vscale x 4 x i32>, align 32
%b = alloca i64		%b = alloca i64
%c = alloca i64		%c = alloca i64
call void @extern(<vscale x 4 x i32>* %a)		call void @extern(<vscale x 4 x i32>* %a)
ret void		ret void
}		}


llvm/test/CodeGen/RISCV/rvv/scalar-stack-align.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=riscv32 -mattr=+zve64x -verify-machineinstrs < %s \			; RUN: llc -mtriple=riscv32 -mattr=+zve64x -verify-machineinstrs < %s \
	; RUN: | FileCheck %s --check-prefix=RV32			; RUN: | FileCheck %s --check-prefix=RV32
	; RUN: llc -mtriple=riscv64 -mattr=+zve64x -verify-machineinstrs < %s \			; RUN: llc -mtriple=riscv64 -mattr=+zve64x -verify-machineinstrs < %s \
	; RUN: | FileCheck %s --check-prefix=RV64			; RUN: | FileCheck %s --check-prefix=RV64
	; RUN: llc -mtriple=riscv32 -mattr=+v -verify-machineinstrs < %s \			; RUN: llc -mtriple=riscv32 -mattr=+v -verify-machineinstrs < %s \
	; RUN: | FileCheck %s --check-prefix=RV32			; RUN: | FileCheck %s --check-prefix=RV32
	; RUN: llc -mtriple=riscv64 -mattr=+v -verify-machineinstrs < %s \			; RUN: llc -mtriple=riscv64 -mattr=+v -verify-machineinstrs < %s \
	; RUN: | FileCheck %s --check-prefix=RV64			; RUN: | FileCheck %s --check-prefix=RV64

	; FIXME: The stack is assumed and required to be aligned to 16 bytes, but we			; FIXME: We are over-aligning the stack on V, wasting stack space.
	; only ensure an 8-byte alignment for the size of the section containing RVV
	; objects. After establishing sp, on zve64x the stack is only 8-byte aligned.
	; This is wrong in and of itself, but we can see that this also has the effect
	; that the 16-byte-aligned object at the bottom of the stack is misaligned.

	define i64* @scalar_stack_align16() nounwind {			define i64* @scalar_stack_align16() nounwind {
	; RV32-LABEL: scalar_stack_align16:			; RV32-LABEL: scalar_stack_align16:
	; RV32: # %bb.0:			; RV32: # %bb.0:
	; RV32-NEXT: addi sp, sp, -32			; RV32-NEXT: addi sp, sp, -32
	; RV32-NEXT: sw ra, 28(sp) # 4-byte Folded Spill			; RV32-NEXT: sw ra, 28(sp) # 4-byte Folded Spill
	; RV32-NEXT: csrr a0, vlenb			; RV32-NEXT: csrr a0, vlenb
				; RV32-NEXT: slli a0, a0, 1
	; RV32-NEXT: sub sp, sp, a0			; RV32-NEXT: sub sp, sp, a0
	; RV32-NEXT: addi a0, sp, 16			; RV32-NEXT: addi a0, sp, 16
	; RV32-NEXT: call extern@plt			; RV32-NEXT: call extern@plt
	; RV32-NEXT: mv a0, sp			; RV32-NEXT: mv a0, sp
	; RV32-NEXT: csrr a1, vlenb			; RV32-NEXT: csrr a1, vlenb
				; RV32-NEXT: slli a1, a1, 1
	; RV32-NEXT: add sp, sp, a1			; RV32-NEXT: add sp, sp, a1
	; RV32-NEXT: lw ra, 28(sp) # 4-byte Folded Reload			; RV32-NEXT: lw ra, 28(sp) # 4-byte Folded Reload
	; RV32-NEXT: addi sp, sp, 32			; RV32-NEXT: addi sp, sp, 32
	; RV32-NEXT: ret			; RV32-NEXT: ret
	;			;
	; RV64-LABEL: scalar_stack_align16:			; RV64-LABEL: scalar_stack_align16:
	; RV64: # %bb.0:			; RV64: # %bb.0:
	; RV64-NEXT: addi sp, sp, -16			; RV64-NEXT: addi sp, sp, -32
	; RV64-NEXT: sd ra, 8(sp) # 8-byte Folded Spill			; RV64-NEXT: sd ra, 24(sp) # 8-byte Folded Spill
	; RV64-NEXT: csrr a0, vlenb			; RV64-NEXT: csrr a0, vlenb
				; RV64-NEXT: slli a0, a0, 1
	; RV64-NEXT: sub sp, sp, a0			; RV64-NEXT: sub sp, sp, a0
	; RV64-NEXT: addi a0, sp, 8			; RV64-NEXT: addi a0, sp, 16
	; RV64-NEXT: call extern@plt			; RV64-NEXT: call extern@plt
	; RV64-NEXT: mv a0, sp			; RV64-NEXT: mv a0, sp
	; RV64-NEXT: csrr a1, vlenb			; RV64-NEXT: csrr a1, vlenb
				; RV64-NEXT: slli a1, a1, 1
	; RV64-NEXT: add sp, sp, a1			; RV64-NEXT: add sp, sp, a1
	; RV64-NEXT: ld ra, 8(sp) # 8-byte Folded Reload			; RV64-NEXT: ld ra, 24(sp) # 8-byte Folded Reload
	; RV64-NEXT: addi sp, sp, 16			; RV64-NEXT: addi sp, sp, 32
	; RV64-NEXT: ret			; RV64-NEXT: ret
	%a = alloca <vscale x 2 x i32>			%a = alloca <vscale x 2 x i32>
	%c = alloca i64, align 16			%c = alloca i64, align 16
	call void @extern(<vscale x 2 x i32>* %a)			call void @extern(<vscale x 2 x i32>* %a)
	ret i64* %c			ret i64* %c
	}			}

	declare void @extern(<vscale x 2 x i32>*)			declare void @extern(<vscale x 2 x i32>*)
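
The extra `slli` in the updated checks above follows from rounding the RVV region up to the stack alignment. A minimal sketch of that arithmetic, not the code in RISCVFrameLowering.cpp: the helper name `rvv_region_vlenb_units`, the constant `MIN_VLENB`, and the idea that the region is rounded to the larger of 16 bytes and the strictest RVV object alignment are assumptions made for illustration.

```python
STACK_ALIGN = 16  # stack alignment the summary says must be maintained, in bytes
MIN_VLENB = 8     # assumption: smallest supported VLENB (VLEN = 64, e.g. zve64x)


def align_to(value: int, align: int) -> int:
    """Round value up to the next multiple of align (a power of two)."""
    if align <= 0 or align & (align - 1):
        raise ValueError("align must be a positive power of two")
    return (value + align - 1) & ~(align - 1)


def rvv_region_vlenb_units(min_size_bytes: int, max_object_align: int = 1) -> int:
    """Hypothetical helper: size of the RVV region as a multiple of VLENB.

    min_size_bytes is the size of the RVV objects at the minimum VLEN.
    """
    align = max(STACK_ALIGN, max_object_align)
    return align_to(min_size_bytes, align) // MIN_VLENB


if __name__ == "__main__":
    # <vscale x 2 x i32> is 8 bytes at the minimum VLEN.
    print(rvv_region_vlenb_units(8))       # 2
    # <vscale x 4 x i32>, align 32, is 16 bytes at the minimum VLEN.
    print(rvv_region_vlenb_units(16, 32))  # 4
```

Under these assumptions, 2 units matches the added `slli a0, a0, 1` in scalar_stack_align16, and 4 units matches the change from `slli a0, a0, 1` to `slli a0, a0, 2` for the `align 32` alloca.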

llvm/test/CodeGen/RISCV/rvv/wrong-stack-offset-for-rvv-object.mir

# RUN: llc -mtriple riscv64 -mattr=+m,+v -run-pass=prologepilog \		# RUN: llc -mtriple riscv64 -mattr=+m,+v -run-pass=prologepilog \
# RUN: -riscv-v-vector-bits-min=512 -o - %s | FileCheck %s		# RUN: -riscv-v-vector-bits-min=512 -o - %s | FileCheck %s
#		#
# Stack layout of this program		# Stack layout of this program
# |--------------------------| -- <-- Incoming SP		# |--------------------------| -- <-- Incoming SP
# | a7 (Vaarg) |		# | a7 (Vaarg) |
# | ------------------------ | -- <-- New SP + vlenb + 56		# | ------------------------ | -- <-- New SP + vlenb + 72
# | a6 (Vaarg) |		# | a6 (Vaarg) |
# | ------------------------ | -- <-- New SP + vlenb + 48		# | ------------------------ | -- <-- New SP + vlenb + 64
# | ra (Callee-saved reg) |		# | ra (Callee-saved reg) |
# | ------------------------ | -- <-- New SP + vlenb + 40		# | ------------------------ | -- <-- New SP + vlenb + 56
# | s0 (Callee-saved reg) |		# | s0 (Callee-saved reg) |
# | ------------------------ | -- <-- New SP + vlenb + 32		# | ------------------------ | -- <-- New SP + vlenb + 48
# | s1 (Callee-saved reg) |		# | s1 (Callee-saved reg) |
# | ------------------------ | -- <-- New SP + vlenb + 24		# | ------------------------ | -- <-- New SP + vlenb + 40
		# | 8 bytes of padding |
		# | ------------------------ | -- <-- New SP + vlenb
# | v8 (RVV objects) |		# | v8 (RVV objects) |
# | ------------------------ | -- <-- New SP + 24		# | ------------------------ | -- <-- New SP + 32
# | buf1 |		# | buf1 |
# |--------------------------| -- <-- New SP + 16		# |--------------------------| -- <-- New SP + 16
# | Stack ID 5 |		# | Stack ID 5 |
# |--------------------------| -- <-- New SP + 8		# |--------------------------| -- <-- New SP + 8
# | Stack ID 6 |		# | Stack ID 6 |
# |--------------------------| -- <-- New SP		# |--------------------------| -- <-- New SP

--- |		--- |
machineFunctionInfo:
varArgsFrameIndex: -1		varArgsFrameIndex: -1
varArgsSaveSize: 16		varArgsSaveSize: 16
body: |		body: |
; CHECK-LABEL: name: asm_fprintf		; CHECK-LABEL: name: asm_fprintf
; CHECK: stack:		; CHECK: stack:
; CHECK-NEXT: - { id: 0, name: buf1, type: default, offset: -48, size: 1, alignment: 8,		; CHECK-NEXT: - { id: 0, name: buf1, type: default, offset: -48, size: 1, alignment: 8,
; CHECK-NEXT: stack-id: default, callee-saved-register: '', callee-saved-restored: true,		; CHECK-NEXT: stack-id: default, callee-saved-register: '', callee-saved-restored: true,
; CHECK-NEXT: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }		; CHECK-NEXT: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
; CHECK-NEXT: - { id: 1, name: '', type: spill-slot, offset: -8, size: 8, alignment: 8,		; CHECK-NEXT: - { id: 1, name: '', type: spill-slot, offset: -16, size: 8, alignment: 8,
; CHECK-NEXT: stack-id: scalable-vector, callee-saved-register: '', callee-saved-restored: true,		; CHECK-NEXT: stack-id: scalable-vector, callee-saved-register: '', callee-saved-restored: true,
; CHECK-NEXT: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }		; CHECK-NEXT: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
; CHECK-NEXT: - { id: 2, name: '', type: spill-slot, offset: -24, size: 8, alignment: 8,		; CHECK-NEXT: - { id: 2, name: '', type: spill-slot, offset: -24, size: 8, alignment: 8,
; CHECK-NEXT: stack-id: default, callee-saved-register: '$x1', callee-saved-restored: true,		; CHECK-NEXT: stack-id: default, callee-saved-register: '$x1', callee-saved-restored: true,
; CHECK-NEXT: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }		; CHECK-NEXT: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
; CHECK-NEXT: - { id: 3, name: '', type: spill-slot, offset: -32, size: 8, alignment: 8,		; CHECK-NEXT: - { id: 3, name: '', type: spill-slot, offset: -32, size: 8, alignment: 8,
; CHECK-NEXT: stack-id: default, callee-saved-register: '$x8', callee-saved-restored: true,		; CHECK-NEXT: stack-id: default, callee-saved-register: '$x8', callee-saved-restored: true,
; CHECK-NEXT: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }		; CHECK-NEXT: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
; CHECK-NEXT: - { id: 4, name: '', type: spill-slot, offset: -40, size: 8, alignment: 8,		; CHECK-NEXT: - { id: 4, name: '', type: spill-slot, offset: -40, size: 8, alignment: 8,
; CHECK-NEXT: stack-id: default, callee-saved-register: '$x9', callee-saved-restored: true,		; CHECK-NEXT: stack-id: default, callee-saved-register: '$x9', callee-saved-restored: true,
; CHECK-NEXT: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }		; CHECK-NEXT: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
; CHECK-NEXT: - { id: 5, name: '', type: default, offset: -56, size: 8, alignment: 8,		; CHECK-NEXT: - { id: 5, name: '', type: default, offset: -56, size: 8, alignment: 8,
; CHECK-NEXT: stack-id: default, callee-saved-register: '', callee-saved-restored: true,		; CHECK-NEXT: stack-id: default, callee-saved-register: '', callee-saved-restored: true,
; CHECK-NEXT: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }		; CHECK-NEXT: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
; CHECK-NEXT: - { id: 6, name: '', type: default, offset: -64, size: 8, alignment: 8,		; CHECK-NEXT: - { id: 6, name: '', type: default, offset: -64, size: 8, alignment: 8,
; CHECK-NEXT: stack-id: default, callee-saved-register: '', callee-saved-restored: true,		; CHECK-NEXT: stack-id: default, callee-saved-register: '', callee-saved-restored: true,
; CHECK-NEXT: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }		; CHECK-NEXT: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
; CHECK: bb.0.entry:		; CHECK: bb.0.entry:
; CHECK-NEXT: successors: %bb.1(0x80000000)		; CHECK-NEXT: successors: %bb.1(0x80000000)
; CHECK-NEXT: liveins: $x11, $x14, $x16, $x17, $x1, $x8, $x9		; CHECK-NEXT: liveins: $x11, $x14, $x16, $x17, $x1, $x8, $x9
; CHECK-NEXT: {{ $}}		; CHECK-NEXT: {{ $}}
; CHECK-NEXT: $x2 = frame-setup ADDI $x2, -64		; CHECK-NEXT: $x2 = frame-setup ADDI $x2, -80
; CHECK-NEXT: frame-setup CFI_INSTRUCTION def_cfa_offset 64		; CHECK-NEXT: frame-setup CFI_INSTRUCTION def_cfa_offset 80
; CHECK-NEXT: SD killed $x1, $x2, 40 :: (store (s64) into %stack.2)		; CHECK-NEXT: SD killed $x1, $x2, 56 :: (store (s64) into %stack.2)
; CHECK-NEXT: SD killed $x8, $x2, 32 :: (store (s64) into %stack.3)		; CHECK-NEXT: SD killed $x8, $x2, 48 :: (store (s64) into %stack.3)
; CHECK-NEXT: SD killed $x9, $x2, 24 :: (store (s64) into %stack.4)		; CHECK-NEXT: SD killed $x9, $x2, 40 :: (store (s64) into %stack.4)
; CHECK-NEXT: frame-setup CFI_INSTRUCTION offset $x1, -24		; CHECK-NEXT: frame-setup CFI_INSTRUCTION offset $x1, -24
; CHECK-NEXT: frame-setup CFI_INSTRUCTION offset $x8, -32		; CHECK-NEXT: frame-setup CFI_INSTRUCTION offset $x8, -32
; CHECK-NEXT: frame-setup CFI_INSTRUCTION offset $x9, -40		; CHECK-NEXT: frame-setup CFI_INSTRUCTION offset $x9, -40
; CHECK-NEXT: $x10 = frame-setup PseudoReadVLENB		; CHECK-NEXT: $x10 = frame-setup PseudoReadVLENB
		; CHECK-NEXT: $x10 = frame-setup SLLI killed $x10, 1
; CHECK-NEXT: $x2 = frame-setup SUB $x2, killed $x10		; CHECK-NEXT: $x2 = frame-setup SUB $x2, killed $x10
; CHECK-NEXT: renamable $x8 = COPY $x14		; CHECK-NEXT: renamable $x8 = COPY $x14
; CHECK-NEXT: renamable $x9 = COPY $x11		; CHECK-NEXT: renamable $x9 = COPY $x11
; CHECK-NEXT: $x10 = PseudoReadVLENB		; CHECK-NEXT: $x10 = PseudoReadVLENB
		; CHECK-NEXT: $x10 = SLLI killed $x10, 1
; CHECK-NEXT: $x10 = ADD $x2, killed $x10		; CHECK-NEXT: $x10 = ADD $x2, killed $x10
; CHECK-NEXT: SD killed renamable $x17, killed $x10, 56 :: (store (s64))		; CHECK-NEXT: SD killed renamable $x17, killed $x10, 72 :: (store (s64))
; CHECK-NEXT: $x10 = PseudoReadVLENB		; CHECK-NEXT: $x10 = PseudoReadVLENB
		; CHECK-NEXT: $x10 = SLLI killed $x10, 1
; CHECK-NEXT: $x10 = ADD $x2, killed $x10		; CHECK-NEXT: $x10 = ADD $x2, killed $x10
; CHECK-NEXT: SD killed renamable $x16, killed $x10, 48 :: (store (s64) into %fixed-stack.1, align 16)		; CHECK-NEXT: SD killed renamable $x16, killed $x10, 64 :: (store (s64) into %fixed-stack.1, align 16)
; CHECK-NEXT: dead $x0 = PseudoVSETIVLI 2, 69 /* e8, mf8, ta, mu */, implicit-def $vl, implicit-def $vtype		; CHECK-NEXT: dead $x0 = PseudoVSETIVLI 2, 69 /* e8, mf8, ta, mu */, implicit-def $vl, implicit-def $vtype
; CHECK-NEXT: renamable $v8 = PseudoVMV_V_I_MF8 0, 2, 3 /* e8 */, implicit $vl, implicit $vtype		; CHECK-NEXT: renamable $v8 = PseudoVMV_V_I_MF8 0, 2, 3 /* e8 */, implicit $vl, implicit $vtype
; CHECK-NEXT: $x10 = ADDI $x2, 24		; CHECK-NEXT: $x10 = ADDI $x2, 32
; CHECK-NEXT: PseudoVSPILL_M1 killed renamable $v8, killed $x10 :: (store unknown-size into %stack.1, align 8)		; CHECK-NEXT: PseudoVSPILL_M1 killed renamable $v8, killed $x10 :: (store unknown-size into %stack.1, align 8)
; CHECK-NEXT: {{ $}}		; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.1.while.cond:		; CHECK-NEXT: bb.1.while.cond:
; CHECK-NEXT: successors: %bb.2(0x30000000), %bb.1(0x50000000)		; CHECK-NEXT: successors: %bb.2(0x30000000), %bb.1(0x50000000)
; CHECK-NEXT: liveins: $x8, $x9		; CHECK-NEXT: liveins: $x8, $x9
; CHECK-NEXT: {{ $}}		; CHECK-NEXT: {{ $}}
; CHECK-NEXT: BNE $x0, $x0, %bb.1		; CHECK-NEXT: BNE $x0, $x0, %bb.1
; CHECK-NEXT: PseudoBR %bb.2		; CHECK-NEXT: PseudoBR %bb.2
; CHECK-NEXT: {{ $}}		; CHECK-NEXT: {{ $}}
; CHECK-NEXT: bb.2.sw.bb:		; CHECK-NEXT: bb.2.sw.bb:
; CHECK-NEXT: successors: %bb.1(0x80000000)		; CHECK-NEXT: successors: %bb.1(0x80000000)
; CHECK-NEXT: liveins: $x8, $x9		; CHECK-NEXT: liveins: $x8, $x9
; CHECK-NEXT: {{ $}}		; CHECK-NEXT: {{ $}}
; CHECK-NEXT: dead $x0 = PseudoVSETIVLI 2, 69 /* e8, mf8, ta, mu */, implicit-def $vl, implicit-def $vtype		; CHECK-NEXT: dead $x0 = PseudoVSETIVLI 2, 69 /* e8, mf8, ta, mu */, implicit-def $vl, implicit-def $vtype
; CHECK-NEXT: $x10 = ADDI $x2, 24		; CHECK-NEXT: $x10 = ADDI $x2, 32
; CHECK-NEXT: renamable $v8 = PseudoVRELOAD_M1 killed $x10 :: (load unknown-size from %stack.1, align 8)		; CHECK-NEXT: renamable $v8 = PseudoVRELOAD_M1 killed $x10 :: (load unknown-size from %stack.1, align 8)
; CHECK-NEXT: PseudoVSE8_V_MF8 killed renamable $v8, renamable $x8, 2, 3 /* e8 */, implicit $vl, implicit $vtype :: (store (s16) into %ir.0, align 1)		; CHECK-NEXT: PseudoVSE8_V_MF8 killed renamable $v8, renamable $x8, 2, 3 /* e8 */, implicit $vl, implicit $vtype :: (store (s16) into %ir.0, align 1)
; CHECK-NEXT: $x10 = COPY renamable $x9		; CHECK-NEXT: $x10 = COPY renamable $x9
; CHECK-NEXT: PseudoCALL target-flags(riscv-plt) @fprintf, csr_ilp32d_lp64d, implicit-def dead $x1, implicit killed $x10, implicit-def $x2, implicit-def dead $x10		; CHECK-NEXT: PseudoCALL target-flags(riscv-plt) @fprintf, csr_ilp32d_lp64d, implicit-def dead $x1, implicit killed $x10, implicit-def $x2, implicit-def dead $x10
; CHECK-NEXT: PseudoBR %bb.1		; CHECK-NEXT: PseudoBR %bb.1
bb.0.entry:		bb.0.entry:
successors: %bb.1(0x80000000)		successors: %bb.1(0x80000000)
liveins: $x11, $x14, $x16, $x17		liveins: $x11, $x14, $x16, $x17
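
A quick way to read the constant changes in this test is to check them against the 16-byte requirement. The sketch below only restates numbers visible in the CHECK lines (the `frame-setup ADDI` amount and the `ADDI $x2, N` used as the base of the v8 spill slot); the dictionary keys and the `misaligned` helper are hypothetical names introduced for illustration.

```python
STACK_ALIGN = 16  # required stack alignment in bytes

# Constants copied from the CHECK lines of this test.
before = {"frame_size": 64, "rvv_base_offset": 24}
after = {"frame_size": 80, "rvv_base_offset": 32}


def misaligned(layout: dict, align: int = STACK_ALIGN) -> list:
    """Return the names of all offsets that are not a multiple of align."""
    return [name for name, value in layout.items() if value % align != 0]


if __name__ == "__main__":
    print("before:", misaligned(before))  # before: ['rvv_base_offset']
    print("after:", misaligned(after))    # after: []
```

Before the patch the RVV spill slot started 24 bytes above the new SP, so it was only 8-byte aligned relative to SP. After the patch both the scalar frame size and the RVV base offset are multiples of 16.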

llvm/test/CodeGen/RISCV/rvv/wrong-stack-slot-rv32.mir

--- |
; CHECK-NEXT: sw s0, 72(sp) # 4-byte Folded Spill		; CHECK-NEXT: sw s0, 72(sp) # 4-byte Folded Spill
; CHECK-NEXT: sw s9, 68(sp) # 4-byte Folded Spill		; CHECK-NEXT: sw s9, 68(sp) # 4-byte Folded Spill
; CHECK-NEXT: addi s0, sp, 80		; CHECK-NEXT: addi s0, sp, 80
; CHECK-NEXT: csrr a1, vlenb		; CHECK-NEXT: csrr a1, vlenb
; CHECK-NEXT: slli a1, a1, 1		; CHECK-NEXT: slli a1, a1, 1
; CHECK-NEXT: sub sp, sp, a1		; CHECK-NEXT: sub sp, sp, a1
; CHECK-NEXT: andi sp, sp, -32		; CHECK-NEXT: andi sp, sp, -32
; CHECK-NEXT: sw a0, 32(sp) # 4-byte Folded Spill		; CHECK-NEXT: sw a0, 32(sp) # 4-byte Folded Spill
; CHECK-NEXT: addi a0, sp, 56		; CHECK-NEXT: addi a0, sp, 64
; CHECK-NEXT: vs2r.v v30, (a0) # Unknown-size Folded Spill		; CHECK-NEXT: vs2r.v v30, (a0) # Unknown-size Folded Spill
; CHECK-NEXT: addi sp, s0, -80		; CHECK-NEXT: addi sp, s0, -80
; CHECK-NEXT: lw ra, 76(sp) # 4-byte Folded Reload		; CHECK-NEXT: lw ra, 76(sp) # 4-byte Folded Reload
; CHECK-NEXT: lw s0, 72(sp) # 4-byte Folded Reload		; CHECK-NEXT: lw s0, 72(sp) # 4-byte Folded Reload
; CHECK-NEXT: lw s9, 68(sp) # 4-byte Folded Reload		; CHECK-NEXT: lw s9, 68(sp) # 4-byte Folded Reload
; CHECK-NEXT: addi sp, sp, 80		; CHECK-NEXT: addi sp, sp, 80
; CHECK-NEXT: ret		; CHECK-NEXT: ret
entry:		entry:

llvm/test/CodeGen/RISCV/rvv/wrong-stack-slot-rv64.mir

	# NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	# RUN: llc -mtriple=riscv64 -mattr=+m,+v -o - %s \			# RUN: llc -mtriple=riscv64 -mattr=+m,+v -o - %s \
	# RUN: -start-before=prologepilog | FileCheck %s			# RUN: -start-before=prologepilog | FileCheck %s
	#			#
	# This test checks that we are assigning the right stack slot to GPRs and to			# This test checks that we are assigning the right stack slot to GPRs and to
	# vector registers (VRs). If this test changes, make sure there is no overlap			# vector registers (VRs). If this test changes, make sure there is no overlap
	# between slots for GPRs and VRs.			# between slots for GPRs and VRs.
	--- |			--- |
	define void @foo() #0 {			define void @foo() #0 {
	; CHECK-LABEL: foo:			; CHECK-LABEL: foo:
	; CHECK: # %bb.0: # %entry			; CHECK: # %bb.0: # %entry
	; CHECK-NEXT: addi sp, sp, -32			; CHECK-NEXT: addi sp, sp, -48
	; CHECK-NEXT: sd s9, 24(sp) # 8-byte Folded Spill			; CHECK-NEXT: sd s9, 40(sp) # 8-byte Folded Spill
	; CHECK-NEXT: csrr a1, vlenb			; CHECK-NEXT: csrr a1, vlenb
	; CHECK-NEXT: slli a1, a1, 1			; CHECK-NEXT: slli a1, a1, 1
	; CHECK-NEXT: sub sp, sp, a1			; CHECK-NEXT: sub sp, sp, a1
	; CHECK-NEXT: sd a0, 16(sp) # 8-byte Folded Spill			; CHECK-NEXT: sd a0, 16(sp) # 8-byte Folded Spill
	; CHECK-NEXT: addi a0, sp, 24			; CHECK-NEXT: addi a0, sp, 32
	; CHECK-NEXT: vs2r.v v30, (a0) # Unknown-size Folded Spill			; CHECK-NEXT: vs2r.v v30, (a0) # Unknown-size Folded Spill
	; CHECK-NEXT: csrr a0, vlenb			; CHECK-NEXT: csrr a0, vlenb
	; CHECK-NEXT: slli a0, a0, 1			; CHECK-NEXT: slli a0, a0, 1
	; CHECK-NEXT: add sp, sp, a0			; CHECK-NEXT: add sp, sp, a0
	; CHECK-NEXT: ld s9, 24(sp) # 8-byte Folded Reload			; CHECK-NEXT: ld s9, 40(sp) # 8-byte Folded Reload
	; CHECK-NEXT: addi sp, sp, 32			; CHECK-NEXT: addi sp, sp, 48
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	ret void			ret void
	}			}

	attributes #0 = { nounwind }			attributes #0 = { nounwind }
	...			...
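
The test comment asks that GPR and VR slots never overlap. A rough sketch of how that can be checked by hand for the updated output follows. The byte ranges are read off the new CHECK lines (`sd a0, 16(sp)`, `addi a0, sp, 32` for the `vs2r.v` spill, and `sd s9, 40(sp)` issued before `sub sp, sp, a1`); the function names and slot labels are hypothetical, and the layout is an interpretation of those lines rather than something the patch states.

```python
from itertools import combinations


def slots_after_patch(vlenb: int) -> dict:
    """Byte ranges [start, end) relative to the final sp for a given VLENB."""
    rvv_size = 2 * vlenb  # csrr a1, vlenb; slli a1, a1, 1; sub sp, sp, a1
    return {
        "a0": (16, 24),                        # sd a0, 16(sp)
        "v30_v31": (32, 32 + rvv_size),        # vs2r.v writes two vector registers
        "s9": (rvv_size + 40, rvv_size + 48),  # sd s9, 40(sp) before the sub
    }


def overlapping_pairs(slots: dict) -> list:
    """Return every pair of slot names whose byte ranges intersect."""
    return [
        (a, b)
        for (a, ra), (b, rb) in combinations(slots.items(), 2)
        if ra[0] < rb[1] and rb[0] < ra[1]
    ]


if __name__ == "__main__":
    for vlenb in (16, 32, 64):  # +v implies VLEN >= 128, so VLENB >= 16
        print(vlenb, overlapping_pairs(slots_after_patch(vlenb)))
```

An empty list for each VLENB indicates that, under this reading of the checks, the scalar spill, the vector spill, and the callee-saved slot stay disjoint.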
	---			---

Diff 431592