This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
lib/Target/RISCV/
-
Target/
-
RISCV/
-
RISCVISelLowering.cpp
-
test/CodeGen/RISCV/
-
CodeGen/
-
RISCV/
-
calling-conv-vector-float.ll
-
fastcc-float.ll

Differential D102852

[RISCV] Fix a crash when lowering split float arguments
ClosedPublic

Authored by frasercrmck on May 20 2021, 9:54 AM.

Download Raw Diff

Details

Reviewers

asb
luismarques
craig.topper
kito-cheng

Commits

rGb115c038d2d4: [RISCV] Fix a crash when lowering split float arguments

Summary

Lowering certain float vectors without legal vector types could cause a
crash due to a bad interaction between passing floats via GPRs and
argument splitting. Split vector floats appear just like scalar floats.
Under certain situations we choose to pass these float arguments via
GPRs and use an XLenVT location and set the 'BCvt' info to track how
they must be converted back to floating-point values. However, later
logic for handling split arguments may take over, in which case we lose
the previous information and set the 'Indirect' info, thus incorrectly
lowering to integer types.

I don't believe that we would have come across the notion of split
floating-point arguments before. This patch addresses the issue by
updating the lowering so that split arguments are only passed indirectly
when they are scalar integer types.

This has some change to how we lower some larger illegal float vectors,
as can be seen in 'fastcc-float.ll' where the vector is now passed
partly in registers and partly on the stack.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

frasercrmck created this revision.May 20 2021, 9:54 AM

Herald added subscribers: vkmr, evandro, apazos and 22 others. · View Herald TranscriptMay 20 2021, 9:54 AM

frasercrmck requested review of this revision.May 20 2021, 9:54 AM

Herald added a project: Restricted Project. · View Herald TranscriptMay 20 2021, 9:54 AM

Herald added subscribers: llvm-commits, MaskRay. · View Herald Transcript

Harbormaster completed remote builds in B105449: Diff 346770.May 20 2021, 10:58 AM

I'm going to try to review this soon.

The last time I looked at this I had some concerns but I didn't have time to fully investigate them.
To help get this going again, could you please clarify:

Is this patch and/or the ABI changes supposed to be (roughly) permanent or just a temporary fix/workaround?
Can you provide more details about the "The register arguments seem to occupy unused stack slots, which I've observed in other vector tests"?

In D102852#2824377, @luismarques wrote:

The last time I looked at this I had some concerns but I didn't have time to fully investigate them.
To help get this going again, could you please clarify:

Is this patch and/or the ABI changes supposed to be (roughly) permanent or just a temporary fix/workaround?

Can you provide more details about the "The register arguments seem to occupy unused stack slots, which I've observed in other vector tests"?

Thanks for taking a look. To quickly address the second part, I was a bit presumptuous. I was observing the above, e.g. the `<32 x float> being split into:

; CHECK-NEXT:    fmv.w.x fa0, a0
; CHECK-NEXT:    fmv.w.x fa1, a1
; CHECK-NEXT:    fmv.w.x fa2, a2
; CHECK-NEXT:    fmv.w.x fa3, a3
; CHECK-NEXT:    fmv.w.x fa4, a4
; CHECK-NEXT:    flw ft0, 64(sp)
; CHECK-NEXT:    flw ft1, 68(sp)
...

We (may) never access 0-63(sp) so to a casual observer it seems wasteful to use the stack for those parts of the split value that go into registers a0-a4. But thinking about it further I now know (believe) that this is required in case the function takes the address of its argument. There needs to be a "full" copy of the argument somewhere in memory. I believe I would edit that out of the summary now.

As for your first question...

My understanding was that this would be a permanent change. Take all of the following with a grain of salt as you no doubt understand the psABI better than I do. I still have misgivings about whether this is the correct thing to do.

I suppose the ultimate problem is that vectors aren't really specified by the psABI as far as I can tell, so we don't have a concrete model to follow. So as long we're correctly handling other types I think we're okay.

The problem is that as far as LLVM's concerned, illegal vector types are "split". "Split" arguments in our backend are definitely encoding some assumptions which, when I authored this patch, I assumed were only for scalar integers. At the least it does seem to assume types which have been assigned integer locations. The psABI doesn't mention anything in the "Hardware Floating-Point Calling Convention" section about passing floating-point types indirectly. However, if the floating-point argument is passed soft, or variadic, or once other float arguments are exhausted, we do fall into the integer handling, at which point these sections cover us on "split" arguments:

Scalars that are 2×XLEN bits wide are passed in a pair of argument registers, with the low-order XLEN bits in the lower-numbered register and the high-order XLEN bits in the higher-numbered register. If no argument registers are available, the scalar is passed on the stack by value. If exactly one register is available, the low-order XLEN bits are passed in the register and the high-order XLEN bits are passed on the stack.

Scalars wider than 2×XLEN are passed by reference and are replaced in the argument list with the address.

Thing is, I can't work out how we would get to either of the code paths above (where we have a split type and push it into PendingLocs) when dealing with anything other than a scalar integer type. Floating-point type are all handled earlier by a combination of hard/float ABIs and assumptions. We certainly don't have any tests that cover it.

If you pass double on riscv32 (which is the first thing I thought of) you get 2*XLEN scalar integer locations so that's unchanged. double on riscv32 +d is specially handled and early-returned (// Handle passing f64 on RV32D with a soft float ABI or when floating point registers are exhausted.). And double on risv32 with a float ABI is passed in a register, or else through the previous case.

I agree the code looks odd in that it doesn't align to what the psABI prescribes, but after having explained this it does look okay.

I'll add @kito-cheng here since they'll presumably know enough about the psABI to help in case I've missed something.

In D102852#2824822, @frasercrmck wrote:

Thanks for taking a look.

Thanks for the detailed feedback! That helps me a lot, as I can better understand where you're coming from :)

To quickly address the second part, I was a bit presumptuous. I was observing the above, e.g. the `<32 x float> being split into:
; CHECK-NEXT:    fmv.w.x fa0, a0
; CHECK-NEXT:    fmv.w.x fa1, a1
; CHECK-NEXT:    fmv.w.x fa2, a2
; CHECK-NEXT:    fmv.w.x fa3, a3
; CHECK-NEXT:    fmv.w.x fa4, a4
; CHECK-NEXT:    flw ft0, 64(sp)
; CHECK-NEXT:    flw ft1, 68(sp)
...
We (may) never access 0-63(sp) so to a casual observer it seems wasteful to use the stack for those parts of the split value that go into registers a0-a4. But thinking about it further I now know (believe) that this is required in case the function takes the address of its argument. There needs to be a "full" copy of the argument somewhere in memory. I believe I would edit that out of the summary now.

Sorry if I'm misunderstanding your point, but we *are* accessing 0-63(sp) (later below). Am I reading the assembly totally wrong or does it seem to be spilling the original values of the floating-point callee-saved registers *after* killing their old values (by loading into them some of the arguments passed on the stack, as part of preparing the fastcc call), and never restoring them?

In D102852#2829300, @luismarques wrote:

Sorry if I'm misunderstanding your point, but we *are* accessing 0-63(sp) (later below). Am I reading the assembly totally wrong or does it seem to be spilling the original values of the floating-point callee-saved registers *after* killing their old values (by loading into them some of the arguments passed on the stack, as part of preparing the fastcc call), and never restoring them?

Sorry, not "spilling", using them to copy the remaining arguments. The code is loading arguments from the stack (plus GPR args) and passing most of them in the fa<N> and ft<N> (I assume that's our fastcc convention), and using the fs<N> registers to copy the remaining arguments to the proper place in the stack, right? So the 0-63(sp) stack usage still seems OK, that's our frame, that we use it to spill ra and to pass the remaining arguments.

In D102852#2829369, @luismarques wrote:

Sorry, not "spilling", using them to copy the remaining arguments. The code is loading arguments from the stack (plus GPR args) and passing most of them in the fa<N> and ft<N> (I assume that's our fastcc convention), and using the fs<N> registers to copy the remaining arguments to the proper place in the stack, right? So the 0-63(sp) stack usage still seems OK, that's our frame, that we use it to spill ra and to pass the remaining arguments.

To be clear, it's fine not to save the floating-point callee-saved registers as that RUN invocation isn't targeting a hard-float ABI. One thing that seems we aren't checking properly is that fastcc still works even if we change -mattr=+f,+d to -mattr=-f,-d, as it still uses the floating-point registers and instructions. I suppose it should either complain or use a different convention?

In D102852#2824822, @frasercrmck wrote:

To be clear, it's fine not to save the floating-point callee-saved registers as that RUN invocation isn't targeting a hard-float ABI. One thing that seems we aren't checking properly is that fastcc still works even if we change -mattr=+f,+d to -mattr=-f,-d, as it still uses the floating-point registers and instructions. I suppose it should either complain or use a different convention?

Arg. Don't drink and drive and don't do reviews on the weekends. That was faulty testing, it does work as expected, which is a relief.

In D102852#2824822, @frasercrmck wrote:
My understanding was that this would be a permanent change. Take all of the following with a grain of salt as you no doubt understand the psABI better than I do. I still have misgivings about whether this is the correct thing to do.

I suppose the ultimate problem is that vectors aren't really specified by the psABI as far as I can tell, so we don't have a concrete model to follow. So as long we're correctly handling other types I think we're okay.

The problem is that as far as LLVM's concerned, illegal vector types are "split". "Split" arguments in our backend are definitely encoding some assumptions which, when I authored this patch, I assumed were only for scalar integers. At the least it does seem to assume types which have been assigned integer locations. The psABI doesn't mention anything in the "Hardware Floating-Point Calling Convention" section about passing floating-point types indirectly. However, if the floating-point argument is passed soft, or variadic, or once other float arguments are exhausted, we do fall into the integer handling, at which point these sections cover us on "split" arguments:
Scalars that are 2×XLEN bits wide are passed in a pair of argument registers, with the low-order XLEN bits in the lower-numbered register and the high-order XLEN bits in the higher-numbered register. If no argument registers are available, the scalar is passed on the stack by value. If exactly one register is available, the low-order XLEN bits are passed in the register and the high-order XLEN bits are passed on the stack.

Scalars wider than 2×XLEN are passed by reference and are replaced in the argument list with the address.
Thing is, I can't work out how we would get to either of the code paths above (where we have a split type and push it into PendingLocs) when dealing with anything other than a scalar integer type. Floating-point type are all handled earlier by a combination of hard/float ABIs and assumptions. We certainly don't have any tests that cover it.

If you pass double on riscv32 (which is the first thing I thought of) you get 2*XLEN scalar integer locations so that's unchanged. double on riscv32 +d is specially handled and early-returned (// Handle passing f64 on RV32D with a soft float ABI or when floating point registers are exhausted.). And double on risv32 with a float ABI is passed in a register, or else through the previous case.

I agree the code looks odd in that it doesn't align to what the psABI prescribes, but after having explained this it does look okay.

You're right that, when it comes to vectors, we're generally out of what the psABI specifies. But, if anything, I would say that this falls within the "Aggregates larger than 2×XLEN bits are passed by reference and are replaced in the argument list with the address". So I'm not particularly convinced that this is the correct long-term solution, as before we were doing exactly that and now we're not.

In D102852#2829300, @luismarques wrote:

Sorry if I'm misunderstanding your point, but we *are* accessing 0-63(sp) (later below).

Yeah sorry I misread the assembly somehow. I thought that the lowering of %A in the non-fastcc @caller was passing the first 8 elements in a0-a7 but were shadowed by stack locations 0-63, which felt wasteful. I thought that because the first load was from 64(sp) onwards. Looking again I now see that this isn't the case because I forgot to notice that the sp is being decremented by 64 first :(. So 64(sp) is indeed the 9th element. I'll remove that bit from the commit summary now. However, this may be irrelevant given the discussion below:

In D102852#2829427, @luismarques wrote:

You're right that, when it comes to vectors, we're generally out of what the psABI specifies. But, if anything, I would say that this falls within the "Aggregates larger than 2×XLEN bits are passed by reference and are replaced in the argument list with the address". So I'm not particularly convinced that this is the correct long-term solution, as before we were doing exactly that and now we're not.

I agree that if vectors are aggregates then this isn't doing the right thing. However, I couldn't find a concrete definition of "aggregates" as per the psABI. The System V ABI is a bit clearer in saying "Aggregates (structures and arrays)" which I don't think covers vectors.

I suppose it may be wise to take a step back in case we're on the wrong track. These vector types we're dealing with are those that are passed by value in the LLVM IR. The frontend never actually generates this code. It would generate define float @foo(<32 x float>* nocapture readonly %0) for the equivalent C code:

typedef float float32 __attribute__((ext_vector_type(32)));

extern float ext(float32);

float foo(float32 v) {
  return ext(v);
}

So I realise I brought it up but is the psABI applicable for by-value vector types in IR? I believe @kito-cheng might have had the opinion that it was effectively a compiler-internal ABI in one of the recent LLVM calls.

If we do want vectors (without the V extension) to behave like the equivalently-sized aggregates in the psABI then we'll need to fix the CC lowering so that it correctly handles float ABIs in conjunction with non-scalar types. I think we've got at least 2 separate symptoms which I would like to fix here. Both of these bugs are bad interactions between split type lowering and floating-point type lowering. Both are fixed by this patch:

; RUN: llc -mtriple=riscv64 -mattr=+f < %s
; RUN: llc -mtriple=riscv64 -mattr=+f --target-abi=lp64f < %s
; RUN: llc -mtriple=riscv64 -mattr=+d --target-abi=lp64d < %s

; CC_RISCVAssign2XLen picks up on this and crashes, returning incorrect part type sizes

define <2 x float> @callee_v2f32(<2 x float> %x) {
  ret <2 x float> %x
}

; RUN: llc -mtriple=riscv32 -mattr=+f --target-abi=ilp32f  < %s
; RUN: llc -mtriple=riscv32 -mattr=+d --target-abi=ilp32d < %s
; RUN: llc -mtriple=riscv64 -mattr=+f --target-abi=lp64f < %s
; RUN: llc -mtriple=riscv64 -mattr=+d --target-abi=lp64d < %s

; The pending locs/split handling takes over, losing the initial "BCvt" LocInfo for the ABI, assigns an XLEN loc, and emits a COPY from float to int registers. This is an impossible reg-to-reg copy.

define <4 x float> @callee_v4f32(<4 x float> %x) {
  ret <4 x float> %x
}

Since the Hardware Floating-point Calling Convention doesn't refer to "aggregates" as much as it refers to "structs" I suppose it follows from your opinion about aggregates that, if the psABI is relevant, we should be looking entirely at the Integer Calling Convention? So there shouldn't be any use of floating-point registers in lowering these illegal vectors at all, no matter which ABI is used?

frasercrmck edited the summary of this revision. (Show Details)Jun 21 2021, 5:14 AM

Some thoughts:

I agree with you that the psABI doesn't (for now, at least) specify this ABI condition so technically we can do whatever we want. Whether we should still follow the spirit of the ABI or be free to make something completely different I'm not sure about. As you noted, this IR construct might arguably also be outside of what a vector-aware ABI would specify (but clang isn't the only frontend).

Besides the psABI not covering vectors explicitly, I think what makes the situation more confusing is that arrays are passed by reference, so the specification of the ABI for non-scalar types ends up being less general than it could be. IMO vectors passed by value are reasonably analogous to the "Floating-point reals" and other structs, the way they are discussed in the spec.

Since the Hardware Floating-point Calling Convention doesn't refer to "aggregates" as much as it refers to "structs" I suppose it follows from your opinion about aggregates that, if the psABI is relevant, we should be looking entirely at the Integer Calling Convention? So there shouldn't be any use of floating-point registers in lowering these illegal vectors at all, no matter which ABI is used?

A struct is an aggregate. I think the FP section just happens to use more specific language, as it details the rules for the actual language constructs (structs, reals, etc) that would be impacted by the FP calling convention. But if we decided to follow the spirit of the psABI my overall reading of it is that I would expect small (<= 2×XLEN) vectors passed by value to follow the FP convention, when available.

I'm not opposed to this implementation as "something that works" and that "doesn't have to be compatible with anything" and that "solves our problems now". It's just not the ABI I would expect. It seemed to me like an unnecessary departure from convention, and that's why I didn't find it particularly appealing. But perhaps it doesn't matter? In any case, thanks for the patch and for your analysis. Let's hear from other people -- @kito-cheng @asb and others.

In D102852#2830433, @luismarques wrote:

Some thoughts:

I agree with you that the psABI doesn't (for now, at least) specify this ABI condition so technically we can do whatever we want. Whether we should still follow the spirit of the ABI or be free to make something completely different I'm not sure about. As you noted, this IR construct might arguably also be outside of what a vector-aware ABI would specify (but clang isn't the only frontend).

Besides the psABI not covering vectors explicitly, I think what makes the situation more confusing is that arrays are passed by reference, so the specification of the ABI for non-scalar types ends up being less general than it could be. IMO vectors passed by value are reasonably analogous to the "Floating-point reals" and other structs, the way they are discussed in the spec.

Since the Hardware Floating-point Calling Convention doesn't refer to "aggregates" as much as it refers to "structs" I suppose it follows from your opinion about aggregates that, if the psABI is relevant, we should be looking entirely at the Integer Calling Convention? So there shouldn't be any use of floating-point registers in lowering these illegal vectors at all, no matter which ABI is used?

A struct is an aggregate. I think the FP section just happens to use more specific language, as it details the rules for the actual language constructs (structs, reals, etc) that would be impacted by the FP calling convention. But if we decided to follow the spirit of the psABI my overall reading of it is that I would expect small (<= 2×XLEN) vectors passed by value to follow the FP convention, when available.

I'm not opposed to this implementation as "something that works" and that "doesn't have to be compatible with anything" and that "solves our problems now". It's just not the ABI I would expect. It seemed to me like an unnecessary departure from convention, and that's why I didn't find it particularly appealing. But perhaps it doesn't matter? In any case, thanks for the patch and for your analysis. Let's hear from other people -- @kito-cheng @asb and others.

Thanks for all of your input and thoughts. It's great to have this discussion as this sort of stuff, as we can see, isn't something I feel comfortable asserting on by myself. I would like to hear from other people, but I should add that another reason I'm wary of treating vectors == aggregates as specified by the psABI because I don't think there's great value in passing vectors <= 2*XLEN (<8 x i8> on RV64, etc) packed into one/two register(s). We'd just have to unpack it back into individual elements with shifts and truncates/extends on the caller side anyway since the vector operations would be scalarized by most compilers.

In D102852#2830495, @frasercrmck wrote:

I would like to hear from other people, but I should add that another reason I'm wary of treating vectors == aggregates as specified by the psABI because I don't think there's great value in passing vectors <= 2*XLEN (<8 x i8> on RV64, etc) packed into one/two register(s). We'd just have to unpack it back into individual elements with shifts and truncates/extends on the caller side anyway since the vector operations would be scalarized by most compilers.

You're right. That's a compelling argument. The patch LGTM.
(Feedback from other people would still be welcome).

This revision is now accepted and ready to land.Jun 21 2021, 8:22 AM

Ping @asb or @kito-cheng any thoughts on this? (see above).

rebase

Harbormaster completed remote builds in B115497: Diff 360713.Jul 22 2021, 12:49 AM

In D102852#2873715, @frasercrmck wrote:

Ping @asb or @kito-cheng any thoughts on this? (see above).

I think you can merge this. In the unlikely case we want to change the ABI, we can always do follow-up patches.
Thank you for the fix and the analysis!

This revision was landed with ongoing or failed builds.Jul 22 2021, 2:04 AM

Closed by commit rGb115c038d2d4: [RISCV] Fix a crash when lowering split float arguments (authored by frasercrmck). · Explain Why

This revision was automatically updated to reflect the committed changes.

frasercrmck added a commit: rGb115c038d2d4: [RISCV] Fix a crash when lowering split float arguments.

In D102852#2895758, @luismarques wrote:

In D102852#2873715, @frasercrmck wrote:

Ping @asb or @kito-cheng any thoughts on this? (see above).

I think you can merge this. In the unlikely case we want to change the ABI, we can always do follow-up patches.
Thank you for the fix and the analysis!

Yeah, fair enough. Thank you for your help and patience!

Revision Contents

Path

Size

llvm/

lib/

Target/

RISCV/

RISCVISelLowering.cpp

5 lines

test/

CodeGen/

RISCV/

calling-conv-vector-float.ll

63 lines

fastcc-float.ll

66 lines

Diff 360734

llvm/lib/Target/RISCV/RISCVISelLowering.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 7,124 Lines • ▼ Show 20 Lines	static bool CC_RISCV(const DataLayout &DL, RISCVABI::ABI ABI, unsigned ValNo,
// Fixed-length vectors are located in the corresponding scalable-vector		// Fixed-length vectors are located in the corresponding scalable-vector
// container types.		// container types.
if (ValVT.isFixedLengthVector())		if (ValVT.isFixedLengthVector())
LocVT = TLI.getContainerForFixedLengthVector(LocVT);		LocVT = TLI.getContainerForFixedLengthVector(LocVT);

// Split arguments might be passed indirectly, so keep track of the pending		// Split arguments might be passed indirectly, so keep track of the pending
// values. Split vectors are passed via a mix of registers and indirectly, so		// values. Split vectors are passed via a mix of registers and indirectly, so
// treat them as we would any other argument.		// treat them as we would any other argument.
if (!LocVT.isVector() && (ArgFlags.isSplit() \|\| !PendingLocs.empty())) {		if (ValVT.isScalarInteger() && (ArgFlags.isSplit() \|\| !PendingLocs.empty())) {
LocVT = XLenVT;		LocVT = XLenVT;
LocInfo = CCValAssign::Indirect;		LocInfo = CCValAssign::Indirect;
PendingLocs.push_back(		PendingLocs.push_back(
CCValAssign::getPending(ValNo, ValVT, LocVT, LocInfo));		CCValAssign::getPending(ValNo, ValVT, LocVT, LocInfo));
PendingArgFlags.push_back(ArgFlags);		PendingArgFlags.push_back(ArgFlags);
if (!ArgFlags.isSplitEnd()) {		if (!ArgFlags.isSplitEnd()) {
return false;		return false;
}		}
}		}

// If the split argument only had two elements, it should be passed directly		// If the split argument only had two elements, it should be passed directly
// in registers or on the stack.		// in registers or on the stack.
if (!LocVT.isVector() && ArgFlags.isSplitEnd() && PendingLocs.size() <= 2) {		if (ValVT.isScalarInteger() && ArgFlags.isSplitEnd() &&
		PendingLocs.size() <= 2) {
assert(PendingLocs.size() == 2 && "Unexpected PendingLocs.size()");		assert(PendingLocs.size() == 2 && "Unexpected PendingLocs.size()");
// Apply the normal calling convention rules to the first half of the		// Apply the normal calling convention rules to the first half of the
// split argument.		// split argument.
CCValAssign VA = PendingLocs[0];		CCValAssign VA = PendingLocs[0];
ISD::ArgFlagsTy AF = PendingArgFlags[0];		ISD::ArgFlagsTy AF = PendingArgFlags[0];
PendingLocs.clear();		PendingLocs.clear();
PendingArgFlags.clear();		PendingArgFlags.clear();
return CC_RISCVAssign2XLen(XLen, State, VA, AF, ValNo, ValVT, LocVT,		return CC_RISCVAssign2XLen(XLen, State, VA, AF, ValNo, ValVT, LocVT,
▲ Show 20 Lines • Show All 1,879 Lines • Show Last 20 Lines

llvm/test/CodeGen/RISCV/calling-conv-vector-float.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=riscv64 -mattr=+f -verify-machineinstrs < %s \
				; RUN: \| FileCheck -check-prefix=RV64 %s
				; RUN: llc -mtriple=riscv64 -mattr=+f -target-abi=lp64f -verify-machineinstrs < %s \
				; RUN: \| FileCheck -check-prefix=RV64LP64F %s

				define <2 x float> @callee_v2f32(<2 x float> %x, <2 x float> %y) {
				; RV64-LABEL: callee_v2f32:
				; RV64: # %bb.0:
				; RV64-NEXT: fmv.w.x ft0, a2
				; RV64-NEXT: fmv.w.x ft1, a0
				; RV64-NEXT: fmv.w.x ft2, a3
				; RV64-NEXT: fmv.w.x ft3, a1
				; RV64-NEXT: fadd.s ft2, ft3, ft2
				; RV64-NEXT: fadd.s ft0, ft1, ft0
				; RV64-NEXT: fmv.x.w a0, ft0
				; RV64-NEXT: fmv.x.w a1, ft2
				; RV64-NEXT: ret
				;
				; RV64LP64F-LABEL: callee_v2f32:
				; RV64LP64F: # %bb.0:
				; RV64LP64F-NEXT: fadd.s fa0, fa0, fa2
				; RV64LP64F-NEXT: fadd.s fa1, fa1, fa3
				; RV64LP64F-NEXT: ret
				%z = fadd <2 x float> %x, %y
				ret <2 x float> %z
				}

				define <4 x float> @callee_v4f32(<4 x float> %x, <4 x float> %y) {
				; RV64-LABEL: callee_v4f32:
				; RV64: # %bb.0:
				; RV64-NEXT: fmv.w.x ft0, a4
				; RV64-NEXT: fmv.w.x ft1, a7
				; RV64-NEXT: fmv.w.x ft2, a3
				; RV64-NEXT: fmv.w.x ft3, a6
				; RV64-NEXT: fmv.w.x ft4, a2
				; RV64-NEXT: fmv.w.x ft5, a5
				; RV64-NEXT: fmv.w.x ft6, a1
				; RV64-NEXT: flw ft7, 0(sp)
				; RV64-NEXT: fadd.s ft5, ft6, ft5
				; RV64-NEXT: fadd.s ft3, ft4, ft3
				; RV64-NEXT: fadd.s ft1, ft2, ft1
				; RV64-NEXT: fadd.s ft0, ft0, ft7
				; RV64-NEXT: fsw ft0, 12(a0)
				; RV64-NEXT: fsw ft1, 8(a0)
				; RV64-NEXT: fsw ft3, 4(a0)
				; RV64-NEXT: fsw ft5, 0(a0)
				; RV64-NEXT: ret
				;
				; RV64LP64F-LABEL: callee_v4f32:
				; RV64LP64F: # %bb.0:
				; RV64LP64F-NEXT: fadd.s ft0, fa0, fa4
				; RV64LP64F-NEXT: fadd.s ft1, fa1, fa5
				; RV64LP64F-NEXT: fadd.s ft2, fa2, fa6
				; RV64LP64F-NEXT: fadd.s ft3, fa3, fa7
				; RV64LP64F-NEXT: fsw ft3, 12(a0)
				; RV64LP64F-NEXT: fsw ft2, 8(a0)
				; RV64LP64F-NEXT: fsw ft1, 4(a0)
				; RV64LP64F-NEXT: fsw ft0, 0(a0)
				; RV64LP64F-NEXT: ret
				%z = fadd <4 x float> %x, %y
				ret <4 x float> %z
				}

llvm/test/CodeGen/RISCV/fastcc-float.ll

	Show All 12 Lines

	; With the fastcc, arguments will be passed by fa0-fa7 and ft0-ft11.			; With the fastcc, arguments will be passed by fa0-fa7 and ft0-ft11.
	; The rest will be pushed on the stack.			; The rest will be pushed on the stack.
	define float @caller(<32 x float> %A) nounwind {			define float @caller(<32 x float> %A) nounwind {
	; CHECK-LABEL: caller:			; CHECK-LABEL: caller:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: addi sp, sp, -64			; CHECK-NEXT: addi sp, sp, -64
	; CHECK-NEXT: sw ra, 60(sp) # 4-byte Folded Spill			; CHECK-NEXT: sw ra, 60(sp) # 4-byte Folded Spill
	; CHECK-NEXT: flw fa0, 0(a0)			; CHECK-NEXT: fmv.w.x fa0, a0
	; CHECK-NEXT: flw fa1, 4(a0)			; CHECK-NEXT: fmv.w.x fa1, a1
	; CHECK-NEXT: flw fa2, 8(a0)			; CHECK-NEXT: fmv.w.x fa2, a2
	; CHECK-NEXT: flw fa3, 12(a0)			; CHECK-NEXT: fmv.w.x fa3, a3
	; CHECK-NEXT: flw fa4, 16(a0)			; CHECK-NEXT: fmv.w.x fa4, a4
	; CHECK-NEXT: flw fa5, 20(a0)			; CHECK-NEXT: flw ft0, 64(sp)
	; CHECK-NEXT: flw fa6, 24(a0)			; CHECK-NEXT: flw ft1, 68(sp)
	; CHECK-NEXT: flw fa7, 28(a0)			; CHECK-NEXT: flw ft2, 72(sp)
	; CHECK-NEXT: flw ft0, 32(a0)			; CHECK-NEXT: flw ft3, 76(sp)
	; CHECK-NEXT: flw ft1, 36(a0)			; CHECK-NEXT: flw ft4, 80(sp)
	; CHECK-NEXT: flw ft2, 40(a0)			; CHECK-NEXT: flw ft5, 84(sp)
	; CHECK-NEXT: flw ft3, 44(a0)			; CHECK-NEXT: flw ft6, 88(sp)
	; CHECK-NEXT: flw ft4, 48(a0)			; CHECK-NEXT: flw ft7, 92(sp)
	; CHECK-NEXT: flw ft5, 52(a0)			; CHECK-NEXT: flw ft8, 96(sp)
	; CHECK-NEXT: flw ft6, 56(a0)			; CHECK-NEXT: flw ft9, 100(sp)
	; CHECK-NEXT: flw ft7, 60(a0)			; CHECK-NEXT: flw ft10, 104(sp)
	; CHECK-NEXT: flw ft8, 64(a0)			; CHECK-NEXT: flw ft11, 108(sp)
	; CHECK-NEXT: flw ft9, 68(a0)			; CHECK-NEXT: flw fs0, 112(sp)
	; CHECK-NEXT: flw ft10, 72(a0)			; CHECK-NEXT: flw fs1, 116(sp)
	; CHECK-NEXT: flw ft11, 76(a0)			; CHECK-NEXT: flw fs2, 120(sp)
	; CHECK-NEXT: flw fs0, 80(a0)			; CHECK-NEXT: flw fs3, 124(sp)
	; CHECK-NEXT: flw fs1, 84(a0)			; CHECK-NEXT: flw fs4, 128(sp)
	; CHECK-NEXT: flw fs2, 88(a0)			; CHECK-NEXT: flw fs5, 132(sp)
	; CHECK-NEXT: flw fs3, 92(a0)			; CHECK-NEXT: flw fs6, 136(sp)
	; CHECK-NEXT: flw fs4, 96(a0)			; CHECK-NEXT: flw fs7, 140(sp)
	; CHECK-NEXT: flw fs5, 100(a0)			; CHECK-NEXT: flw fs8, 144(sp)
	; CHECK-NEXT: flw fs6, 104(a0)			; CHECK-NEXT: flw fs9, 148(sp)
	; CHECK-NEXT: flw fs7, 108(a0)			; CHECK-NEXT: flw fs10, 152(sp)
	; CHECK-NEXT: flw fs8, 112(a0)			; CHECK-NEXT: flw fs11, 156(sp)
	; CHECK-NEXT: flw fs9, 116(a0)			; CHECK-NEXT: fmv.w.x fa5, a5
	; CHECK-NEXT: flw fs10, 120(a0)			; CHECK-NEXT: fmv.w.x fa6, a6
	; CHECK-NEXT: flw fs11, 124(a0)			; CHECK-NEXT: fmv.w.x fa7, a7
	; CHECK-NEXT: fsw fs11, 44(sp)			; CHECK-NEXT: fsw fs11, 44(sp)
	; CHECK-NEXT: fsw fs10, 40(sp)			; CHECK-NEXT: fsw fs10, 40(sp)
	; CHECK-NEXT: fsw fs9, 36(sp)			; CHECK-NEXT: fsw fs9, 36(sp)
	; CHECK-NEXT: fsw fs8, 32(sp)			; CHECK-NEXT: fsw fs8, 32(sp)
	; CHECK-NEXT: fsw fs7, 28(sp)			; CHECK-NEXT: fsw fs7, 28(sp)
	; CHECK-NEXT: fsw fs6, 24(sp)			; CHECK-NEXT: fsw fs6, 24(sp)
	; CHECK-NEXT: fsw fs5, 20(sp)			; CHECK-NEXT: fsw fs5, 20(sp)
	; CHECK-NEXT: fsw fs4, 16(sp)			; CHECK-NEXT: fsw fs4, 16(sp)
	; CHECK-NEXT: fsw fs3, 12(sp)			; CHECK-NEXT: fsw fs3, 12(sp)
	; CHECK-NEXT: fsw fs2, 8(sp)			; CHECK-NEXT: fsw fs2, 8(sp)
	; CHECK-NEXT: fsw fs1, 4(sp)			; CHECK-NEXT: fsw fs1, 4(sp)
	; CHECK-NEXT: fsw fs0, 0(sp)			; CHECK-NEXT: fsw fs0, 0(sp)
	; CHECK-NEXT: call callee			; CHECK-NEXT: call callee@plt
	; CHECK-NEXT: lw ra, 60(sp) # 4-byte Folded Reload			; CHECK-NEXT: lw ra, 60(sp) # 4-byte Folded Reload
	; CHECK-NEXT: addi sp, sp, 64			; CHECK-NEXT: addi sp, sp, 64
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%C = call fastcc float @callee(<32 x float> %A)			%C = call fastcc float @callee(<32 x float> %A)
	ret float %C			ret float %C
	}			}