This is an archive of the discontinued LLVM Phabricator instance.

Differential D99087

[RISCV] Fix stack slot for argument types (Bug 49500)
Closed · Public

Authored by frasercrmck on Mar 22 2021, 9:21 AM.

Details

Reviewers

luismarques
asb
rogfer01
mundaym

Commits

rG43ad058a0188: [RISCV] Fix stack slot for argument types (Bug 49500)

Summary

This is a complementary/alternative fix for D99068. It takes a slightly
different approach by explicitly summing up all of the required split
part type sizes and ensuring we allocate enough space for them. It also
takes the maximum alignment of each part.

Compared with D99068 there are fewer changes to the stack objects in
existing tests. However, @luismarques has shown in that patch that there
are opportunities to reduce our stack usage in the future.
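The size mismatch that the summary describes can be sketched with a small standalone model. This is not LLVM code: the helper names are hypothetical, and it assumes an N-bit integer is split into ceil(N / XLEN) parts of XLEN bits each, which matches the store counts visible in stack-slot-size.ll below.

```cpp
#include <cstdint>

// Simplified model, not LLVM code. Sizes are in bytes, widths in bits.

// Store size of an N-bit integer: its bits rounded up to whole bytes.
constexpr std::uint64_t storeSizeOfInt(std::uint64_t Bits) {
  return (Bits + 7) / 8;
}

// Assumed splitting rule: ceil(Bits / XLen) parts of XLen bits each.
constexpr std::uint64_t numParts(std::uint64_t Bits, std::uint64_t XLen) {
  return (Bits + XLen - 1) / XLen;
}

// Slot size obtained by summing the store size of every part.
constexpr std::uint64_t summedPartSize(std::uint64_t Bits, std::uint64_t XLen) {
  return numParts(Bits, XLen) * (XLen / 8);
}

// i129 on RV64: the original type needs 17 bytes, but three 8-byte
// parts are stored, so a 17-byte slot is too small.
static_assert(storeSizeOfInt(129) == 17, "store size of i129");
static_assert(summedPartSize(129, 64) == 24, "i129 split on RV64");
// i129 on RV32: five 4-byte parts.
static_assert(summedPartSize(129, 32) == 20, "i129 split on RV32");
// i160 on RV64: 20 bytes for the type, 24 bytes for the parts.
static_assert(storeSizeOfInt(160) == 20, "store size of i160");
static_assert(summedPartSize(160, 64) == 24, "i160 split on RV64");
```

The checks run at compile time. In this model, a slot sized from the original type is smaller than the bytes actually written whenever the type is not a multiple of XLEN, which is the gap that summing the part sizes closes.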

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

frasercrmck created this revision. Mar 22 2021, 9:21 AM

Herald added subscribers: vkmr, evandro, apazos and 21 others. Mar 22 2021, 9:21 AM

frasercrmck requested review of this revision. Mar 22 2021, 9:21 AM

Herald added a project: Restricted Project. Mar 22 2021, 9:21 AM

Herald added subscribers: llvm-commits, MaskRay.

Harbormaster completed remote builds in B95020: Diff 332326. Mar 22 2021, 11:27 AM

There are certainly fewer changes to existing tests, compared with D99068. But if @luismarques is right in that there is excessive stack alignment then this patch needs further work.

My analysis was a bit rushed. Looking at just the stack-slot-size.ll tests, the allocas are fine. The stack has some spare space (due to the correct alignment) and the alloca offset was chosen differently than I had expected when I originally looked at it (36 vs 32), but both options are valid.
The other tests I haven't yet looked into in detail to verify if the changes are correct. On the one hand, having this patch with fewer changes is appealing, because it's much easier to review. On the other hand, I see that my patch has reduced the stack size for several test cases, so if those changes are correct they are an improvement, and it would be nice to reap that reward.
While looking into this issue now, I explored some other patch variants and they had even more test case changes, possibly (correct) improvements, so maybe it's worth it looking into this more carefully? Or maybe, given the priorities, we should just commit this patch for now (looks OK to me)... Decisions, decisions. @asb any thoughts?
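The point that both alloca offsets are valid can be checked with a small sketch. It uses the RV64 frame from stack-slot-size.ll in this diff (argument parts stored at 0, 8 and 16 from sp, so bytes [0, 24); ra saved at 40(sp) after the patch and at 24(sp) before it). The helper name is hypothetical and the model ignores everything else in the frame.

```cpp
#include <cstdint>

// Hypothetical helper: a 4-byte i32 at Offset is acceptable if it is
// 4-byte aligned, starts at or after the end of the argument slot
// [0, SlotEnd), and ends at or before the saved ra at RaOffset.
constexpr bool isValidAllocaOffset(std::uint64_t Offset, std::uint64_t SlotEnd,
                                   std::uint64_t RaOffset) {
  return Offset % 4 == 0 && Offset >= SlotEnd && Offset + 4 <= RaOffset;
}

// After the patch: both 36 and 32 sit clear of the 24 bytes of stores.
static_assert(isValidAllocaOffset(36, 24, 40), "offset chosen by the patch");
static_assert(isValidAllocaOffset(32, 24, 40), "offset originally expected");
// Before the patch: the i32 at 20(sp) overlaps the store at 16(sp).
static_assert(!isValidAllocaOffset(20, 24, 24), "old layout is clobbered");
```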

In D99087#2644578, @luismarques wrote:

My analysis was a bit rushed. Looking at just the stack-slot-size.ll tests, the allocas are fine. The stack has some spare space (due to the correct alignment) and the alloca offset was chosen differently than I had expected when I originally looked at it (36 vs 32), but both options are valid.
The other tests I haven't yet looked into in detail to verify if the changes are correct. On the one hand, having this patch with fewer changes is appealing, because it's much easier to review. On the other hand, I see that my patch has reduced the stack size for several test cases, so if those changes are correct they are an improvement, and it would be nice to reap that reward.
While looking into this issue now, I explored some other patch variants and they had even more test case changes, possibly (correct) improvements, so maybe it's worth it looking into this more carefully? Or maybe, given the priorities, we should just commit this patch for now (looks OK to me)... Decisions, decisions. @asb any thoughts?

Sorry for the slow reply, I've been away from LLVM for a few days, and have kind of been waiting in case @asb had some input. In general, I don't mind which patch goes in but, conceptually, splitting it more into "non-functional" and "optimize stack usage" changes probably sounds like better design. I do think it's worth investigating whether there's work to do on reducing the stack size.

I've been a bit busy with some non-LLVM things the past few days too, so sorry for the delay in commenting. I'll try and loop back shortly.

@frasercrmck you mentioned this patch doesn't quite seem right to you - is there a particular part you're concerned about? I've spent some time going over both this and D99068. The logic of this patch makes more intuitive sense to me, but that may be as it's a long time since I have a better memory of the logic it's modifying. Though Luis' approach does indeed seem to reduce stack size in some cases, and of course directly mirrors the matching SystemZ fix.

In D99087#2663805, @asb wrote:

@frasercrmck you mentioned this patch doesn't quite seem right to you - is there a particular part you're concerned about? I've spent some time going over both this and D99068. The logic of this patch makes more intuitive sense to me, but that may be as it's a long time since I have a better memory of the logic it's modifying. Though Luis' approach does indeed seem to reduce stack size in some cases, and of course directly mirrors the matching SystemZ fix.

Er no, it was more a three-fold hunch, given that I had only dipped my toe into this bit of the ABI: that I couldn't find a hook or setting that does it automatically, that other targets hadn't been required to do the same, and that other targets hadn't been able to fix it "nicely" (without two loops).

Thanks for going over it, though. It's a relief that both you and @luismarques seem to agree it does the job.

LGTM.

This revision is now accepted and ready to land. Apr 1 2021, 10:10 AM

luismarques mentioned this in D99068: [RISCV][WIP][RFC] Fix stack slot for argument type sizes not a multiple of 64 bits (Bug 49500). Apr 4 2021, 8:33 AM

BTW, it shouldn't affect correctness but the original CreateStackTemporary would use (as part of that method's implementation) getPrefTypeAlign instead of getEVTAlign. In principle that would allow better alignment choices, although at the moment it probably would never make any difference.

luismarques mentioned this in D97514: [SystemZ] Assign the full space for promoted and split outgoing args. Apr 4 2021, 9:45 AM

frasercrmck mentioned this in D95025: [RISCV] Add a test showing incorrect codegen. Apr 5 2021, 1:36 AM

rebase on top of pre-committed test

frasercrmck retitled this revision from [RISCV][WIP] Fix stack slot for argument types (Bug 49500) to [RISCV] Fix stack slot for argument types (Bug 49500). Apr 5 2021, 4:06 AM

frasercrmck edited the summary of this revision.

In D99087#2668140, @luismarques wrote:

BTW, it shouldn't affect correctness but the original CreateStackTemporary would use (as part of that method's implementation) getPrefTypeAlign instead of getEVTAlign. In principle that would allow better alignment choices, although at the moment it probably would never make any difference.

Hmm good catch. Shame there's no DAG wrapper for that method; targets don't typically use it. I could look into it before merging, if you like?

In D99087#2668840, @frasercrmck wrote:

Hmm good catch. Shame there's no DAG wrapper for that method; targets don't typically use it. I could look into it before merging, if you like?

Using the preferred type align does produce changes in 7 of our tests. Maybe we should leave it to a follow-up patch? We'd be best verifying that each is correct.

Harbormaster completed remote builds in B97112: Diff 335238. Apr 5 2021, 4:45 AM

In D99087#2668842, @frasercrmck wrote:

In D99087#2668840, @frasercrmck wrote:

Hmm good catch. Shame there's no DAG wrapper for that method; targets don't typically use it. I could look into it before merging, if you like?

Using the preferred type align does produce changes in 7 of our tests. Maybe we should leave it to a follow-up patch? We'd be best verifying that each is correct.

My concern with leaving it up to a follow-up patch is that 1) for "normal" types, we're already using getPrefTypeAlign (under the covers), so in a sense this patch would be a slight regression; 2) it's easy to never follow up on these kinds of things.
Do you reckon it would be hard to verify each? (I imagine that the stack allocation is either the same or increases for all of those impacted tests, right?)

update for preferred EVT alignment

In D99087#2668854, @luismarques wrote:

My concern with leaving it up to a follow-up patch is that 1) for "normal" types, we're already using getPrefTypeAlign (under the covers), so in a sense this patch would be a slight regression; 2) it's easy to never follow up on these kinds of things.
Do you reckon it would be hard to verify each? (I imagine that the stack allocation is either the same or increases for all of those impacted tests, right?)

Yes, fair enough. I've updated the diff to use the preferred type alignment. We can always revert for whatever reason. I'll try and take a look through the diffs later.

Harbormaster completed remote builds in B97113: Diff 335240. Apr 5 2021, 6:00 AM

In D99087#2668857, @frasercrmck wrote:

Yes, fair enough. I've updated the diff to use the preferred type alignment. We can always revert for whatever reason. I'll try and take a look through the diffs later.

Thanks. Also, it seems I was wrong, you do sometimes get smaller stack sizes despite getPrefTypeAlign >= getEVTAlign. Weird.

In D99087#2670691, @luismarques wrote:

In D99087#2668857, @frasercrmck wrote:

Yes, fair enough. I've updated the diff to use the preferred type alignment. We can always revert for whatever reason. I'll try and take a look through the diffs later.

Thanks. Also, it seems I was wrong, you do sometimes get smaller stack sizes despite getPrefTypeAlign >= getEVTAlign. Weird.

Yeah it's interesting. Sorry, I haven't had time to investigate yet. I'll do that today. Have you taken a look, by any chance?

In D99087#2685236, @frasercrmck wrote:

Thanks. Also, it seems I was wrong, you do sometimes get smaller stack sizes despite getPrefTypeAlign >= getEVTAlign. Weird.

Yeah it's interesting. Sorry, I haven't had time to investigate yet. I'll do that today. Have you taken a look, by any chance?

Sorry, I didn't.

In D99087#2685260, @luismarques wrote:

Yeah it's interesting. Sorry, I haven't had time to investigate yet. I'll do that today. Have you taken a look, by any chance?

Sorry, I didn't.

No worries. I just noticed my latest "pref" diff mistakenly changed the initial StackAlign from max(align(ArgValueVT), align(Outs[i].ArgVT)) to max(prefalign(ArgValueVT), prefalign(ArgValueVT)). I now think that this is where the diffs came from. The align of the Outs[i].ArgVT type is often likely larger than the split type (like i128 vs. XLEN), so by removing it we reduce the stack size.

Right now I suspect I should revert this back to the preferred alignment of Outs[i].ArgVT so we're at least keeping the same alignment (or safely increasing it to the split type). Any thoughts?

I don't know if we strictly need the alignment of the original type or not.
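The effect of dropping the original type from the maximum can be sketched as follows. This is an illustrative model only: the helper names are hypothetical, and the alignment values (16 bytes for the original type, 8 bytes for an XLEN-sized part on RV64) are assumptions chosen to mirror the "i128 vs. XLEN" example above, not values read from the data layout.

```cpp
#include <cstdint>

// Illustrative model only; the alignment values below are assumptions.
constexpr std::uint64_t maxAlign(std::uint64_t A, std::uint64_t B) {
  return A > B ? A : B;
}

// Round Offset up to the next multiple of Align (Align must be non-zero).
constexpr std::uint64_t alignTo(std::uint64_t Offset, std::uint64_t Align) {
  return (Offset + Align - 1) / Align * Align;
}

constexpr std::uint64_t OrigArgAlign = 16; // assumed: original type, e.g. i128
constexpr std::uint64_t SplitAlign = 8;    // assumed: XLEN-sized part on RV64

// Intended: the original type's alignment takes part in the maximum.
constexpr std::uint64_t Intended = maxAlign(OrigArgAlign, SplitAlign);
// Mistaken: both operands are the split type, so the original is dropped.
constexpr std::uint64_t Mistaken = maxAlign(SplitAlign, SplitAlign);

static_assert(Intended == 16, "original alignment is preserved");
static_assert(Mistaken == 8, "original alignment is lost");
// A slot placed after 4 bytes of other data lands at a different offset,
// which is one way the frame can end up smaller with the mistaken form.
static_assert(alignTo(4, Intended) == 16, "slot offset with intended align");
static_assert(alignTo(4, Mistaken) == 8, "slot offset with mistaken align");
```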

rebase
fix preferred alignment, keeping alignment of original ArgVT
undo most of the test changes

In D99087#2685264, @frasercrmck wrote:

No worries. I just noticed my latest "pref" diff mistakenly changed the initial StackAlign from max(align(ArgValueVT), align(Outs[i].ArgVT)) to max(prefalign(ArgValueVT), prefalign(ArgValueVT)). I now think that this is where the diffs came from. The align of the Outs[i].ArgVT type is often likely larger than the split type (like i128 vs. XLEN), so by removing it we reduce the stack size.

Right now I suspect I should revert this back to the preferred alignment of Outs[i].ArgVT so we're at least keeping the same alignment (or safely increasing it to the split type). Any thoughts?

Ahh, that explains it. Yes, I think we need that change (that you already made).

I don't know if we strictly need the alignment of the original type or not.

I think we do, per the psABI alignment requirements.
I'll go over the patch more thoroughly later, but this now seems more like what I would expect and could probably be committed.

Harbormaster completed remote builds in B98448: Diff 337088. Apr 13 2021, 3:52 AM

rebase

In D99087#2685314, @luismarques wrote:

I think we do, per the psABI alignment requirements.

Yeah that makes sense.

I'll go over the patch more thoroughly later, but this now seems more like what I would expect and could probably be committed.

Have you had a chance to look over it?

Harbormaster completed remote builds in B101351: Diff 341104. Apr 28 2021, 3:23 AM

In D99087#2722025, @frasercrmck wrote:

Have you had a chance to look over it?

Sorry, not yet. I'll go over it again later today, just to confirm everything, but IIRC the patch was now in mergeable shape.

LGTM. Thank you!

Closed by commit rG43ad058a0188: [RISCV] Fix stack slot for argument types (Bug 49500) (authored by frasercrmck). Apr 29 2021, 1:18 AM

This revision was automatically updated to reflect the committed changes.

frasercrmck added a commit: rG43ad058a0188: [RISCV] Fix stack slot for argument types (Bug 49500).

Revision Contents

Path                                                          Size
llvm/lib/Target/RISCV/RISCVISelLowering.cpp                   33 lines
llvm/test/CodeGen/RISCV/rvv/fixed-vectors-calling-conv.ll     36 lines
llvm/test/CodeGen/RISCV/stack-slot-size.ll                    26 lines
llvm/test/CodeGen/RISCV/vector-abi.ll                         7 lines

Diff 341430

llvm/lib/Target/RISCV/RISCVISelLowering.cpp

@@ ... @@ bool RISCVTargetLowering::isEligibleForTailCallOptimization(
   // but less efficient and uglier in LowerCall.
   for (auto &Arg : Outs)
     if (Arg.Flags.isByVal())
       return false;
 
   return true;
 }
 
+static Align getPrefTypeAlign(EVT VT, SelectionDAG &DAG) {
+  return DAG.getDataLayout().getPrefTypeAlign(
+      VT.getTypeForEVT(*DAG.getContext()));
+}
+
 // Lower a call to a callseq_start + CALL + callseq_end chain, and add input
 // and output parameter nodes.
 SDValue RISCVTargetLowering::LowerCall(CallLoweringInfo &CLI,
                                        SmallVectorImpl<SDValue> &InVals) const {
   SelectionDAG &DAG = CLI.DAG;
   SDLoc &DL = CLI.DL;
   SmallVectorImpl<ISD::OutputArg> &Outs = CLI.Outs;
   SmallVectorImpl<SDValue> &OutVals = CLI.OutVals;
@@ ... @@ for (unsigned i = 0, j = 0, e = ArgLocs.size(); i != e; ++i) {
 
     // IsF64OnRV32DSoftABI && VA.isMemLoc() is handled below in the same way
     // as any other MemLoc.
 
     // Promote the value if needed.
     // For now, only handle fully promoted and indirect arguments.
     if (VA.getLocInfo() == CCValAssign::Indirect) {
       // Store the argument in a stack slot and pass its address.
-      SDValue SpillSlot = DAG.CreateStackTemporary(Outs[i].ArgVT);
-      int FI = cast<FrameIndexSDNode>(SpillSlot)->getIndex();
-      MemOpChains.push_back(
-          DAG.getStore(Chain, DL, ArgValue, SpillSlot,
-                       MachinePointerInfo::getFixedStack(MF, FI)));
+      Align StackAlign =
+          std::max(getPrefTypeAlign(Outs[i].ArgVT, DAG),
+                   getPrefTypeAlign(ArgValue.getValueType(), DAG));
+      TypeSize StoredSize = ArgValue.getValueType().getStoreSize();
       // If the original argument was split (e.g. i128), we need
       // to store the required parts of it here (and pass just one address).
       // Vectors may be partly split to registers and partly to the stack, in
       // which case the base address is partly offset and subsequent stores are
       // relative to that.
       unsigned ArgIndex = Outs[i].OrigArgIndex;
       unsigned ArgPartOffset = Outs[i].PartOffset;
       assert(VA.getValVT().isVector() || ArgPartOffset == 0);
+      // Calculate the total size to store. We don't have access to what we're
+      // actually storing other than performing the loop and collecting the
+      // info.
+      SmallVector<std::pair<SDValue, unsigned>> Parts;
       while (i + 1 != e && Outs[i + 1].OrigArgIndex == ArgIndex) {
         SDValue PartValue = OutVals[i + 1];
         unsigned PartOffset = Outs[i + 1].PartOffset - ArgPartOffset;
+        EVT PartVT = PartValue.getValueType();
+        StoredSize += PartVT.getStoreSize();
+        StackAlign = std::max(StackAlign, getPrefTypeAlign(PartVT, DAG));
+        Parts.push_back(std::make_pair(PartValue, PartOffset));
+        ++i;
+      }
+      SDValue SpillSlot = DAG.CreateStackTemporary(StoredSize, StackAlign);
+      int FI = cast<FrameIndexSDNode>(SpillSlot)->getIndex();
+      MemOpChains.push_back(
+          DAG.getStore(Chain, DL, ArgValue, SpillSlot,
+                       MachinePointerInfo::getFixedStack(MF, FI)));
+      for (const auto &Part : Parts) {
+        SDValue PartValue = Part.first;
+        unsigned PartOffset = Part.second;
         SDValue Address = DAG.getNode(ISD::ADD, DL, PtrVT, SpillSlot,
                                       DAG.getIntPtrConstant(PartOffset, DL));
         MemOpChains.push_back(
             DAG.getStore(Chain, DL, PartValue, Address,
                          MachinePointerInfo::getFixedStack(MF, FI)));
-        ++i;
       }
       ArgValue = SpillSlot;
     } else {
       ArgValue = convertValVTToLocVT(DAG, ArgValue, VA, DL, Subtarget);
     }
 
     // Use local copy if it is a byval arg.
     if (Flags.isByVal())
@@ ... @@

llvm/test/CodeGen/RISCV/rvv/fixed-vectors-calling-conv.ll

	[...]
	; LMULMAX4-NEXT: addi sp, s0, -256			; LMULMAX4-NEXT: addi sp, s0, -256
	; LMULMAX4-NEXT: ld s0, 240(sp) # 8-byte Folded Reload			; LMULMAX4-NEXT: ld s0, 240(sp) # 8-byte Folded Reload
	; LMULMAX4-NEXT: ld ra, 248(sp) # 8-byte Folded Reload			; LMULMAX4-NEXT: ld ra, 248(sp) # 8-byte Folded Reload
	; LMULMAX4-NEXT: addi sp, sp, 256			; LMULMAX4-NEXT: addi sp, sp, 256
	; LMULMAX4-NEXT: ret			; LMULMAX4-NEXT: ret
	;			;
	; LMULMAX2-LABEL: call_split_vector_args:			; LMULMAX2-LABEL: call_split_vector_args:
	; LMULMAX2: # %bb.0:			; LMULMAX2: # %bb.0:
	; LMULMAX2-NEXT: addi sp, sp, -256			; LMULMAX2-NEXT: addi sp, sp, -128
	; LMULMAX2-NEXT: .cfi_def_cfa_offset 256			; LMULMAX2-NEXT: .cfi_def_cfa_offset 128
	; LMULMAX2-NEXT: sd ra, 248(sp) # 8-byte Folded Spill			; LMULMAX2-NEXT: sd ra, 120(sp) # 8-byte Folded Spill
	; LMULMAX2-NEXT: sd s0, 240(sp) # 8-byte Folded Spill			; LMULMAX2-NEXT: sd s0, 112(sp) # 8-byte Folded Spill
	; LMULMAX2-NEXT: .cfi_offset ra, -8			; LMULMAX2-NEXT: .cfi_offset ra, -8
	; LMULMAX2-NEXT: .cfi_offset s0, -16			; LMULMAX2-NEXT: .cfi_offset s0, -16
	; LMULMAX2-NEXT: addi s0, sp, 256			; LMULMAX2-NEXT: addi s0, sp, 128
	; LMULMAX2-NEXT: .cfi_def_cfa s0, 0			; LMULMAX2-NEXT: .cfi_def_cfa s0, 0
	; LMULMAX2-NEXT: andi sp, sp, -128			; LMULMAX2-NEXT: andi sp, sp, -128
	; LMULMAX2-NEXT: vsetivli a2, 2, e32,m1,ta,mu			; LMULMAX2-NEXT: vsetivli a2, 2, e32,m1,ta,mu
	; LMULMAX2-NEXT: vle32.v v8, (a0)			; LMULMAX2-NEXT: vle32.v v8, (a0)
	; LMULMAX2-NEXT: vsetivli a0, 8, e32,m2,ta,mu			; LMULMAX2-NEXT: vsetivli a0, 8, e32,m2,ta,mu
	; LMULMAX2-NEXT: vle32.v v14, (a1)			; LMULMAX2-NEXT: vle32.v v14, (a1)
	; LMULMAX2-NEXT: addi a0, a1, 32			; LMULMAX2-NEXT: addi a0, a1, 32
	; LMULMAX2-NEXT: vle32.v v16, (a0)			; LMULMAX2-NEXT: vle32.v v16, (a0)
	; LMULMAX2-NEXT: addi a0, a1, 64			; LMULMAX2-NEXT: addi a0, a1, 64
	; LMULMAX2-NEXT: vle32.v v18, (a0)			; LMULMAX2-NEXT: vle32.v v18, (a0)
	; LMULMAX2-NEXT: addi a0, a1, 96			; LMULMAX2-NEXT: addi a0, a1, 96
	; LMULMAX2-NEXT: vle32.v v20, (a0)			; LMULMAX2-NEXT: vle32.v v20, (a0)
	; LMULMAX2-NEXT: addi a0, sp, 64			; LMULMAX2-NEXT: addi a0, sp, 64
	; LMULMAX2-NEXT: vse32.v v20, (a0)			; LMULMAX2-NEXT: vse32.v v20, (a0)
	; LMULMAX2-NEXT: addi a0, sp, 32			; LMULMAX2-NEXT: addi a0, sp, 32
	; LMULMAX2-NEXT: vse32.v v18, (a0)			; LMULMAX2-NEXT: vse32.v v18, (a0)
	; LMULMAX2-NEXT: mv a0, sp			; LMULMAX2-NEXT: mv a0, sp
	; LMULMAX2-NEXT: vse32.v v16, (sp)			; LMULMAX2-NEXT: vse32.v v16, (sp)
	; LMULMAX2-NEXT: vmv1r.v v9, v8			; LMULMAX2-NEXT: vmv1r.v v9, v8
	; LMULMAX2-NEXT: vmv1r.v v10, v8			; LMULMAX2-NEXT: vmv1r.v v10, v8
	; LMULMAX2-NEXT: vmv1r.v v11, v8			; LMULMAX2-NEXT: vmv1r.v v11, v8
	; LMULMAX2-NEXT: vmv1r.v v12, v8			; LMULMAX2-NEXT: vmv1r.v v12, v8
	; LMULMAX2-NEXT: vmv2r.v v22, v14			; LMULMAX2-NEXT: vmv2r.v v22, v14
	; LMULMAX2-NEXT: call split_vector_args@plt			; LMULMAX2-NEXT: call split_vector_args@plt
	; LMULMAX2-NEXT: addi sp, s0, -256			; LMULMAX2-NEXT: addi sp, s0, -128
	; LMULMAX2-NEXT: ld s0, 240(sp) # 8-byte Folded Reload			; LMULMAX2-NEXT: ld s0, 112(sp) # 8-byte Folded Reload
	; LMULMAX2-NEXT: ld ra, 248(sp) # 8-byte Folded Reload			; LMULMAX2-NEXT: ld ra, 120(sp) # 8-byte Folded Reload
	; LMULMAX2-NEXT: addi sp, sp, 256			; LMULMAX2-NEXT: addi sp, sp, 128
	; LMULMAX2-NEXT: ret			; LMULMAX2-NEXT: ret
	;			;
	; LMULMAX1-LABEL: call_split_vector_args:			; LMULMAX1-LABEL: call_split_vector_args:
	; LMULMAX1: # %bb.0:			; LMULMAX1: # %bb.0:
	; LMULMAX1-NEXT: addi sp, sp, -256			; LMULMAX1-NEXT: addi sp, sp, -128
	; LMULMAX1-NEXT: .cfi_def_cfa_offset 256			; LMULMAX1-NEXT: .cfi_def_cfa_offset 128
	; LMULMAX1-NEXT: sd ra, 248(sp) # 8-byte Folded Spill			; LMULMAX1-NEXT: sd ra, 120(sp) # 8-byte Folded Spill
	; LMULMAX1-NEXT: sd s0, 240(sp) # 8-byte Folded Spill			; LMULMAX1-NEXT: sd s0, 112(sp) # 8-byte Folded Spill
	; LMULMAX1-NEXT: .cfi_offset ra, -8			; LMULMAX1-NEXT: .cfi_offset ra, -8
	; LMULMAX1-NEXT: .cfi_offset s0, -16			; LMULMAX1-NEXT: .cfi_offset s0, -16
	; LMULMAX1-NEXT: addi s0, sp, 256			; LMULMAX1-NEXT: addi s0, sp, 128
	; LMULMAX1-NEXT: .cfi_def_cfa s0, 0			; LMULMAX1-NEXT: .cfi_def_cfa s0, 0
	; LMULMAX1-NEXT: andi sp, sp, -128			; LMULMAX1-NEXT: andi sp, sp, -128
	; LMULMAX1-NEXT: vsetivli a2, 2, e32,m1,ta,mu			; LMULMAX1-NEXT: vsetivli a2, 2, e32,m1,ta,mu
	; LMULMAX1-NEXT: vle32.v v8, (a0)			; LMULMAX1-NEXT: vle32.v v8, (a0)
	; LMULMAX1-NEXT: vsetivli a0, 4, e32,m1,ta,mu			; LMULMAX1-NEXT: vsetivli a0, 4, e32,m1,ta,mu
	; LMULMAX1-NEXT: vle32.v v13, (a1)			; LMULMAX1-NEXT: vle32.v v13, (a1)
	; LMULMAX1-NEXT: addi a0, a1, 16			; LMULMAX1-NEXT: addi a0, a1, 16
	; LMULMAX1-NEXT: vle32.v v14, (a0)			; LMULMAX1-NEXT: vle32.v v14, (a0)
	[22 lines not shown]
	; LMULMAX1-NEXT: vmv1r.v v9, v8			; LMULMAX1-NEXT: vmv1r.v v9, v8
	; LMULMAX1-NEXT: vmv1r.v v10, v8			; LMULMAX1-NEXT: vmv1r.v v10, v8
	; LMULMAX1-NEXT: vmv1r.v v11, v8			; LMULMAX1-NEXT: vmv1r.v v11, v8
	; LMULMAX1-NEXT: vmv1r.v v12, v8			; LMULMAX1-NEXT: vmv1r.v v12, v8
	; LMULMAX1-NEXT: vmv1r.v v21, v13			; LMULMAX1-NEXT: vmv1r.v v21, v13
	; LMULMAX1-NEXT: vmv1r.v v22, v14			; LMULMAX1-NEXT: vmv1r.v v22, v14
	; LMULMAX1-NEXT: vmv1r.v v23, v15			; LMULMAX1-NEXT: vmv1r.v v23, v15
	; LMULMAX1-NEXT: call split_vector_args@plt			; LMULMAX1-NEXT: call split_vector_args@plt
	; LMULMAX1-NEXT: addi sp, s0, -256			; LMULMAX1-NEXT: addi sp, s0, -128
	; LMULMAX1-NEXT: ld s0, 240(sp) # 8-byte Folded Reload			; LMULMAX1-NEXT: ld s0, 112(sp) # 8-byte Folded Reload
	; LMULMAX1-NEXT: ld ra, 248(sp) # 8-byte Folded Reload			; LMULMAX1-NEXT: ld ra, 120(sp) # 8-byte Folded Reload
	; LMULMAX1-NEXT: addi sp, sp, 256			; LMULMAX1-NEXT: addi sp, sp, 128
	; LMULMAX1-NEXT: ret			; LMULMAX1-NEXT: ret
	%a = load <2 x i32>, <2 x i32>* %pa			%a = load <2 x i32>, <2 x i32>* %pa
	%b = load <32 x i32>, <32 x i32>* %pb			%b = load <32 x i32>, <32 x i32>* %pb
	%r = call <32 x i32> @split_vector_args(<2 x i32> %a, <2 x i32> %a, <2 x i32> %a, <2 x i32> %a, <2 x i32> %a, <32 x i32> %b, <32 x i32> %b)			%r = call <32 x i32> @split_vector_args(<2 x i32> %a, <2 x i32> %a, <2 x i32> %a, <2 x i32> %a, <2 x i32> %a, <32 x i32> %b, <32 x i32> %b)
	ret <32 x i32> %r			ret <32 x i32> %r
	}			}

llvm/test/CodeGen/RISCV/stack-slot-size.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=riscv32 -verify-machineinstrs < %s \			; RUN: llc -mtriple=riscv32 -verify-machineinstrs < %s \
	; RUN: | FileCheck -check-prefix=RV32I %s			; RUN: | FileCheck -check-prefix=RV32I %s
	; RUN: llc -mtriple=riscv64 -verify-machineinstrs < %s \			; RUN: llc -mtriple=riscv64 -verify-machineinstrs < %s \
	; RUN: | FileCheck -check-prefix=RV64I %s			; RUN: | FileCheck -check-prefix=RV64I %s

	; When passing a function argument with a size that isn't a multiple of XLEN,			; When passing a function argument with a size that isn't a multiple of XLEN,
	; and the argument is split and passed indirectly, we must ensure that the stack			; and the argument is split and passed indirectly, we must ensure that the stack
	; slot size appropriately reflects the total size of the parts the argument is			; slot size appropriately reflects the total size of the parts the argument is
	; split into. Otherwise, stack writes can clobber neighboring values.			; split into. Otherwise, stack writes can clobber neighboring values.

	declare void @callee129(i129)			declare void @callee129(i129)
	declare void @callee160(i160)			declare void @callee160(i160)
	declare void @callee161(i161)			declare void @callee161(i161)

	; FIXME: Stack write clobbers the spilled value (on RV64).
	define i32 @caller129() nounwind {			define i32 @caller129() nounwind {
	; RV32I-LABEL: caller129:			; RV32I-LABEL: caller129:
	; RV32I: # %bb.0:			; RV32I: # %bb.0:
	; RV32I-NEXT: addi sp, sp, -32			; RV32I-NEXT: addi sp, sp, -32
	; RV32I-NEXT: sw ra, 28(sp) # 4-byte Folded Spill			; RV32I-NEXT: sw ra, 28(sp) # 4-byte Folded Spill
	; RV32I-NEXT: addi a0, zero, 42			; RV32I-NEXT: addi a0, zero, 42
	; RV32I-NEXT: sw a0, 24(sp)			; RV32I-NEXT: sw a0, 24(sp)
	; RV32I-NEXT: sw zero, 16(sp)			; RV32I-NEXT: sw zero, 16(sp)
	; RV32I-NEXT: sw zero, 12(sp)			; RV32I-NEXT: sw zero, 12(sp)
	; RV32I-NEXT: sw zero, 8(sp)			; RV32I-NEXT: sw zero, 8(sp)
	; RV32I-NEXT: sw zero, 4(sp)			; RV32I-NEXT: sw zero, 4(sp)
	; RV32I-NEXT: mv a0, sp			; RV32I-NEXT: mv a0, sp
	; RV32I-NEXT: sw zero, 0(sp)			; RV32I-NEXT: sw zero, 0(sp)
	; RV32I-NEXT: call callee129@plt			; RV32I-NEXT: call callee129@plt
	; RV32I-NEXT: lw a0, 24(sp)			; RV32I-NEXT: lw a0, 24(sp)
	; RV32I-NEXT: lw ra, 28(sp) # 4-byte Folded Reload			; RV32I-NEXT: lw ra, 28(sp) # 4-byte Folded Reload
	; RV32I-NEXT: addi sp, sp, 32			; RV32I-NEXT: addi sp, sp, 32
	; RV32I-NEXT: ret			; RV32I-NEXT: ret
	;			;
	; RV64I-LABEL: caller129:			; RV64I-LABEL: caller129:
	; RV64I: # %bb.0:			; RV64I: # %bb.0:
	; RV64I-NEXT: addi sp, sp, -32			; RV64I-NEXT: addi sp, sp, -48
	; RV64I-NEXT: sd ra, 24(sp) # 8-byte Folded Spill			; RV64I-NEXT: sd ra, 40(sp) # 8-byte Folded Spill
	; RV64I-NEXT: addi a0, zero, 42			; RV64I-NEXT: addi a0, zero, 42
	; RV64I-NEXT: sw a0, 20(sp)			; RV64I-NEXT: sw a0, 36(sp)
	; RV64I-NEXT: sd zero, 16(sp)			; RV64I-NEXT: sd zero, 16(sp)
	; RV64I-NEXT: sd zero, 8(sp)			; RV64I-NEXT: sd zero, 8(sp)
	; RV64I-NEXT: mv a0, sp			; RV64I-NEXT: mv a0, sp
	; RV64I-NEXT: sd zero, 0(sp)			; RV64I-NEXT: sd zero, 0(sp)
	; RV64I-NEXT: call callee129@plt			; RV64I-NEXT: call callee129@plt
	; RV64I-NEXT: lw a0, 20(sp)			; RV64I-NEXT: lw a0, 36(sp)
	; RV64I-NEXT: ld ra, 24(sp) # 8-byte Folded Reload			; RV64I-NEXT: ld ra, 40(sp) # 8-byte Folded Reload
	; RV64I-NEXT: addi sp, sp, 32			; RV64I-NEXT: addi sp, sp, 48
	; RV64I-NEXT: ret			; RV64I-NEXT: ret
	%1 = alloca i32			%1 = alloca i32
	store i32 42, i32* %1			store i32 42, i32* %1
	call void @callee129(i129 0)			call void @callee129(i129 0)
	%2 = load i32, i32* %1			%2 = load i32, i32* %1
	ret i32 %2			ret i32 %2
	}			}

	; FIXME: Stack write clobbers the spilled value (on RV64).
	define i32 @caller160() nounwind {			define i32 @caller160() nounwind {
	; RV32I-LABEL: caller160:			; RV32I-LABEL: caller160:
	; RV32I: # %bb.0:			; RV32I: # %bb.0:
	; RV32I-NEXT: addi sp, sp, -32			; RV32I-NEXT: addi sp, sp, -32
	; RV32I-NEXT: sw ra, 28(sp) # 4-byte Folded Spill			; RV32I-NEXT: sw ra, 28(sp) # 4-byte Folded Spill
	; RV32I-NEXT: addi a0, zero, 42			; RV32I-NEXT: addi a0, zero, 42
	; RV32I-NEXT: sw a0, 24(sp)			; RV32I-NEXT: sw a0, 24(sp)
	; RV32I-NEXT: sw zero, 16(sp)			; RV32I-NEXT: sw zero, 16(sp)
	; RV32I-NEXT: sw zero, 12(sp)			; RV32I-NEXT: sw zero, 12(sp)
	; RV32I-NEXT: sw zero, 8(sp)			; RV32I-NEXT: sw zero, 8(sp)
	; RV32I-NEXT: sw zero, 4(sp)			; RV32I-NEXT: sw zero, 4(sp)
	; RV32I-NEXT: mv a0, sp			; RV32I-NEXT: mv a0, sp
	; RV32I-NEXT: sw zero, 0(sp)			; RV32I-NEXT: sw zero, 0(sp)
	; RV32I-NEXT: call callee160@plt			; RV32I-NEXT: call callee160@plt
	; RV32I-NEXT: lw a0, 24(sp)			; RV32I-NEXT: lw a0, 24(sp)
	; RV32I-NEXT: lw ra, 28(sp) # 4-byte Folded Reload			; RV32I-NEXT: lw ra, 28(sp) # 4-byte Folded Reload
	; RV32I-NEXT: addi sp, sp, 32			; RV32I-NEXT: addi sp, sp, 32
	; RV32I-NEXT: ret			; RV32I-NEXT: ret
	;			;
	; RV64I-LABEL: caller160:			; RV64I-LABEL: caller160:
	; RV64I: # %bb.0:			; RV64I: # %bb.0:
	; RV64I-NEXT: addi sp, sp, -32			; RV64I-NEXT: addi sp, sp, -48
	; RV64I-NEXT: sd ra, 24(sp) # 8-byte Folded Spill			; RV64I-NEXT: sd ra, 40(sp) # 8-byte Folded Spill
	; RV64I-NEXT: addi a0, zero, 42			; RV64I-NEXT: addi a0, zero, 42
	; RV64I-NEXT: sw a0, 20(sp)			; RV64I-NEXT: sw a0, 36(sp)
	; RV64I-NEXT: sd zero, 16(sp)			; RV64I-NEXT: sd zero, 16(sp)
	; RV64I-NEXT: sd zero, 8(sp)			; RV64I-NEXT: sd zero, 8(sp)
	; RV64I-NEXT: mv a0, sp			; RV64I-NEXT: mv a0, sp
	; RV64I-NEXT: sd zero, 0(sp)			; RV64I-NEXT: sd zero, 0(sp)
	; RV64I-NEXT: call callee160@plt			; RV64I-NEXT: call callee160@plt
	; RV64I-NEXT: lw a0, 20(sp)			; RV64I-NEXT: lw a0, 36(sp)
	; RV64I-NEXT: ld ra, 24(sp) # 8-byte Folded Reload			; RV64I-NEXT: ld ra, 40(sp) # 8-byte Folded Reload
	; RV64I-NEXT: addi sp, sp, 32			; RV64I-NEXT: addi sp, sp, 48
	; RV64I-NEXT: ret			; RV64I-NEXT: ret
	%1 = alloca i32			%1 = alloca i32
	store i32 42, i32* %1			store i32 42, i32* %1
	call void @callee160(i160 0)			call void @callee160(i160 0)
	%2 = load i32, i32* %1			%2 = load i32, i32* %1
	ret i32 %2			ret i32 %2
	}			}

	[...]

llvm/test/CodeGen/RISCV/vector-abi.ll

	; RUN: llc -mtriple=riscv32 -stop-after finalize-isel < %s | FileCheck %s -check-prefix=RV32			; RUN: llc -mtriple=riscv32 -stop-after finalize-isel < %s | FileCheck %s -check-prefix=RV32
	; RUN: llc -mtriple=riscv64 -stop-after finalize-isel < %s | FileCheck %s -check-prefix=RV64			; RUN: llc -mtriple=riscv64 -stop-after finalize-isel < %s | FileCheck %s -check-prefix=RV64

	; FIXME: The stack location used to pass the parameter to the function has the
	; incorrect size and alignment for how we use it, and we clobber the stack.

	declare void @callee(<4 x i8> %v)			declare void @callee(<4 x i8> %v)

	define void @caller() {			define void @caller() {
	; RV32-LABEL: name: caller			; RV32-LABEL: name: caller
	; RV32: stack:			; RV32: stack:
	; RV32: - { id: 0, name: '', type: default, offset: 0, size: 4, alignment: 4,			; RV32: - { id: 0, name: '', type: default, offset: 0, size: 16, alignment: 4,
	; RV32-NEXT: stack-id: default, callee-saved-register: '', callee-saved-restored: true,			; RV32-NEXT: stack-id: default, callee-saved-register: '', callee-saved-restored: true,
	; RV32-NEXT: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }			; RV32-NEXT: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
	; RV32: bb.0 (%ir-block.0):			; RV32: bb.0 (%ir-block.0):
	; RV32: ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2			; RV32: ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
	; RV32: [[ADDI:%[0-9]+]]:gpr = ADDI $x0, 7			; RV32: [[ADDI:%[0-9]+]]:gpr = ADDI $x0, 7
	; RV32: SW killed [[ADDI]], %stack.0, 12 :: (store 4 into %stack.0)			; RV32: SW killed [[ADDI]], %stack.0, 12 :: (store 4 into %stack.0)
	; RV32: [[ADDI1:%[0-9]+]]:gpr = ADDI $x0, 6			; RV32: [[ADDI1:%[0-9]+]]:gpr = ADDI $x0, 6
	; RV32: SW killed [[ADDI1]], %stack.0, 8 :: (store 4 into %stack.0)			; RV32: SW killed [[ADDI1]], %stack.0, 8 :: (store 4 into %stack.0)
	; RV32: [[ADDI2:%[0-9]+]]:gpr = ADDI $x0, 5			; RV32: [[ADDI2:%[0-9]+]]:gpr = ADDI $x0, 5
	; RV32: SW killed [[ADDI2]], %stack.0, 4 :: (store 4 into %stack.0)			; RV32: SW killed [[ADDI2]], %stack.0, 4 :: (store 4 into %stack.0)
	; RV32: [[ADDI3:%[0-9]+]]:gpr = ADDI $x0, 4			; RV32: [[ADDI3:%[0-9]+]]:gpr = ADDI $x0, 4
	; RV32: SW killed [[ADDI3]], %stack.0, 0 :: (store 4 into %stack.0)			; RV32: SW killed [[ADDI3]], %stack.0, 0 :: (store 4 into %stack.0)
	; RV32: [[ADDI4:%[0-9]+]]:gpr = ADDI %stack.0, 0			; RV32: [[ADDI4:%[0-9]+]]:gpr = ADDI %stack.0, 0
	; RV32: $x10 = COPY [[ADDI4]]			; RV32: $x10 = COPY [[ADDI4]]
	; RV32: PseudoCALL target-flags(riscv-plt) @callee, csr_ilp32_lp64, implicit-def dead $x1, implicit $x10, implicit-def $x2			; RV32: PseudoCALL target-flags(riscv-plt) @callee, csr_ilp32_lp64, implicit-def dead $x1, implicit $x10, implicit-def $x2
	; RV32: ADJCALLSTACKUP 0, 0, implicit-def dead $x2, implicit $x2			; RV32: ADJCALLSTACKUP 0, 0, implicit-def dead $x2, implicit $x2
	; RV32: PseudoRET			; RV32: PseudoRET
	; RV64-LABEL: name: caller			; RV64-LABEL: name: caller
	; RV64: stack:			; RV64: stack:
	; RV64: - { id: 0, name: '', type: default, offset: 0, size: 4, alignment: 4,			; RV64: - { id: 0, name: '', type: default, offset: 0, size: 32, alignment: 8,
	; RV64-NEXT: stack-id: default, callee-saved-register: '', callee-saved-restored: true,			; RV64-NEXT: stack-id: default, callee-saved-register: '', callee-saved-restored: true,
	; RV64-NEXT: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }			; RV64-NEXT: debug-info-variable: '', debug-info-expression: '', debug-info-location: '' }
	; RV64: bb.0 (%ir-block.0):			; RV64: bb.0 (%ir-block.0):
	; RV64: ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2			; RV64: ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
	; RV64: [[ADDI:%[0-9]+]]:gpr = ADDI $x0, 7			; RV64: [[ADDI:%[0-9]+]]:gpr = ADDI $x0, 7
	; RV64: SD killed [[ADDI]], %stack.0, 24 :: (store 8 into %stack.0)			; RV64: SD killed [[ADDI]], %stack.0, 24 :: (store 8 into %stack.0)
	; RV64: [[ADDI1:%[0-9]+]]:gpr = ADDI $x0, 6			; RV64: [[ADDI1:%[0-9]+]]:gpr = ADDI $x0, 6
	; RV64: SD killed [[ADDI1]], %stack.0, 16 :: (store 8 into %stack.0)			; RV64: SD killed [[ADDI1]], %stack.0, 16 :: (store 8 into %stack.0)
	[12 lines not shown]