Page MenuHomePhabricator

[RISCV] Fix a crash when lowering split float arguments
ClosedPublic

Authored by frasercrmck on May 20 2021, 9:54 AM.

Details

Summary

Lowering certain float vectors without legal vector types could cause a
crash due to a bad interaction between passing floats via GPRs and
argument splitting. Split vector floats appear just like scalar floats.
Under certain situations we choose to pass these float arguments via
GPRs and use an XLenVT location and set the 'BCvt' info to track how
they must be converted back to floating-point values. However, later
logic for handling split arguments may take over, in which case we lose
the previous information and set the 'Indirect' info, thus incorrectly
lowering to integer types.

I don't believe that we would have come across the notion of split
floating-point arguments before. This patch addresses the issue by
updating the lowering so that split arguments are only passed indirectly
when they are scalar integer types.

This has some change to how we lower some larger illegal float vectors,
as can be seen in 'fastcc-float.ll' where the vector is now passed
partly in registers and partly on the stack.

Diff Detail

Event Timeline

frasercrmck created this revision.May 20 2021, 9:54 AM
frasercrmck requested review of this revision.May 20 2021, 9:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 20 2021, 9:54 AM

I'm going to try to review this soon.

The last time I looked at this I had some concerns but I didn't have time to fully investigate them.
To help get this going again, could you please clarify:

  • Is this patch and/or the ABI changes supposed to be (roughly) permanent or just a temporary fix/workaround?
  • Can you provide more details about the "The register arguments seem to occupy unused stack slots, which I've observed in other vector tests"?

The last time I looked at this I had some concerns but I didn't have time to fully investigate them.
To help get this going again, could you please clarify:

  • Is this patch and/or the ABI changes supposed to be (roughly) permanent or just a temporary fix/workaround?
  • Can you provide more details about the "The register arguments seem to occupy unused stack slots, which I've observed in other vector tests"?

Thanks for taking a look. To quickly address the second part, I was a bit presumptuous. I was observing the above, e.g. the `<32 x float> being split into:

; CHECK-NEXT:    fmv.w.x fa0, a0
; CHECK-NEXT:    fmv.w.x fa1, a1
; CHECK-NEXT:    fmv.w.x fa2, a2
; CHECK-NEXT:    fmv.w.x fa3, a3
; CHECK-NEXT:    fmv.w.x fa4, a4
; CHECK-NEXT:    flw ft0, 64(sp)
; CHECK-NEXT:    flw ft1, 68(sp)
...

We (may) never access 0-63(sp) so to a casual observer it seems wasteful to use the stack for those parts of the split value that go into registers a0-a4. But thinking about it further I now know (believe) that this is required in case the function takes the address of its argument. There needs to be a "full" copy of the argument somewhere in memory. I believe I would edit that out of the summary now.

As for your first question...

My understanding was that this would be a permanent change. Take all of the following with a grain of salt as you no doubt understand the psABI better than I do. I still have misgivings about whether this is the correct thing to do.

I suppose the ultimate problem is that vectors aren't really specified by the psABI as far as I can tell, so we don't have a concrete model to follow. So as long we're correctly handling other types I think we're okay.

The problem is that as far as LLVM's concerned, illegal vector types are "split". "Split" arguments in our backend are definitely encoding some assumptions which, when I authored this patch, I assumed were only for scalar integers. At the least it does seem to assume types which have been assigned integer locations. The psABI doesn't mention anything in the "Hardware Floating-Point Calling Convention" section about passing floating-point types indirectly. However, if the floating-point argument is passed soft, or variadic, or once other float arguments are exhausted, we do fall into the integer handling, at which point these sections cover us on "split" arguments:

Scalars that are 2×XLEN bits wide are passed in a pair of argument registers, with the low-order XLEN bits in the lower-numbered register and the high-order XLEN bits in the higher-numbered register. If no argument registers are available, the scalar is passed on the stack by value. If exactly one register is available, the low-order XLEN bits are passed in the register and the high-order XLEN bits are passed on the stack.

Scalars wider than 2×XLEN are passed by reference and are replaced in the argument list with the address.

Thing is, I can't work out how we would get to either of the code paths above (where we have a split type and push it into PendingLocs) when dealing with anything other than a scalar integer type. Floating-point type are all handled earlier by a combination of hard/float ABIs and assumptions. We certainly don't have any tests that cover it.

If you pass double on riscv32 (which is the first thing I thought of) you get 2*XLEN scalar integer locations so that's unchanged. double on riscv32 +d is specially handled and early-returned (// Handle passing f64 on RV32D with a soft float ABI or when floating point registers are exhausted.). And double on risv32 with a float ABI is passed in a register, or else through the previous case.

I agree the code looks odd in that it doesn't align to what the psABI prescribes, but after having explained this it does look okay.

I'll add @kito-cheng here since they'll presumably know enough about the psABI to help in case I've missed something.

Thanks for taking a look.

Thanks for the detailed feedback! That helps me a lot, as I can better understand where you're coming from :)

To quickly address the second part, I was a bit presumptuous. I was observing the above, e.g. the `<32 x float> being split into:

; CHECK-NEXT:    fmv.w.x fa0, a0
; CHECK-NEXT:    fmv.w.x fa1, a1
; CHECK-NEXT:    fmv.w.x fa2, a2
; CHECK-NEXT:    fmv.w.x fa3, a3
; CHECK-NEXT:    fmv.w.x fa4, a4
; CHECK-NEXT:    flw ft0, 64(sp)
; CHECK-NEXT:    flw ft1, 68(sp)
...

We (may) never access 0-63(sp) so to a casual observer it seems wasteful to use the stack for those parts of the split value that go into registers a0-a4. But thinking about it further I now know (believe) that this is required in case the function takes the address of its argument. There needs to be a "full" copy of the argument somewhere in memory. I believe I would edit that out of the summary now.

Sorry if I'm misunderstanding your point, but we *are* accessing 0-63(sp) (later below). Am I reading the assembly totally wrong or does it seem to be spilling the original values of the floating-point callee-saved registers *after* killing their old values (by loading into them some of the arguments passed on the stack, as part of preparing the fastcc call), and never restoring them?

luismarques added a comment.EditedJun 20 2021, 2:02 PM

Sorry if I'm misunderstanding your point, but we *are* accessing 0-63(sp) (later below). Am I reading the assembly totally wrong or does it seem to be spilling the original values of the floating-point callee-saved registers *after* killing their old values (by loading into them some of the arguments passed on the stack, as part of preparing the fastcc call), and never restoring them?

Sorry, not "spilling", using them to copy the remaining arguments. The code is loading arguments from the stack (plus GPR args) and passing most of them in the fa<N> and ft<N> (I assume that's our fastcc convention), and using the fs<N> registers to copy the remaining arguments to the proper place in the stack, right? So the 0-63(sp) stack usage still seems OK, that's our frame, that we use it to spill ra and to pass the remaining arguments.

luismarques added a comment.EditedJun 20 2021, 2:29 PM

Sorry, not "spilling", using them to copy the remaining arguments. The code is loading arguments from the stack (plus GPR args) and passing most of them in the fa<N> and ft<N> (I assume that's our fastcc convention), and using the fs<N> registers to copy the remaining arguments to the proper place in the stack, right? So the 0-63(sp) stack usage still seems OK, that's our frame, that we use it to spill ra and to pass the remaining arguments.

To be clear, it's fine not to save the floating-point callee-saved registers as that RUN invocation isn't targeting a hard-float ABI. One thing that seems we aren't checking properly is that fastcc still works even if we change -mattr=+f,+d to -mattr=-f,-d, as it still uses the floating-point registers and instructions. I suppose it should either complain or use a different convention?

To be clear, it's fine not to save the floating-point callee-saved registers as that RUN invocation isn't targeting a hard-float ABI. One thing that seems we aren't checking properly is that fastcc still works even if we change -mattr=+f,+d to -mattr=-f,-d, as it still uses the floating-point registers and instructions. I suppose it should either complain or use a different convention?

Arg. Don't drink and drive and don't do reviews on the weekends. That was faulty testing, it does work as expected, which is a relief.

My understanding was that this would be a permanent change. Take all of the following with a grain of salt as you no doubt understand the psABI better than I do. I still have misgivings about whether this is the correct thing to do.

I suppose the ultimate problem is that vectors aren't really specified by the psABI as far as I can tell, so we don't have a concrete model to follow. So as long we're correctly handling other types I think we're okay.

The problem is that as far as LLVM's concerned, illegal vector types are "split". "Split" arguments in our backend are definitely encoding some assumptions which, when I authored this patch, I assumed were only for scalar integers. At the least it does seem to assume types which have been assigned integer locations. The psABI doesn't mention anything in the "Hardware Floating-Point Calling Convention" section about passing floating-point types indirectly. However, if the floating-point argument is passed soft, or variadic, or once other float arguments are exhausted, we do fall into the integer handling, at which point these sections cover us on "split" arguments:

Scalars that are 2×XLEN bits wide are passed in a pair of argument registers, with the low-order XLEN bits in the lower-numbered register and the high-order XLEN bits in the higher-numbered register. If no argument registers are available, the scalar is passed on the stack by value. If exactly one register is available, the low-order XLEN bits are passed in the register and the high-order XLEN bits are passed on the stack.

Scalars wider than 2×XLEN are passed by reference and are replaced in the argument list with the address.

Thing is, I can't work out how we would get to either of the code paths above (where we have a split type and push it into PendingLocs) when dealing with anything other than a scalar integer type. Floating-point type are all handled earlier by a combination of hard/float ABIs and assumptions. We certainly don't have any tests that cover it.

If you pass double on riscv32 (which is the first thing I thought of) you get 2*XLEN scalar integer locations so that's unchanged. double on riscv32 +d is specially handled and early-returned (// Handle passing f64 on RV32D with a soft float ABI or when floating point registers are exhausted.). And double on risv32 with a float ABI is passed in a register, or else through the previous case.

I agree the code looks odd in that it doesn't align to what the psABI prescribes, but after having explained this it does look okay.

You're right that, when it comes to vectors, we're generally out of what the psABI specifies. But, if anything, I would say that this falls within the "Aggregates larger than 2×XLEN bits are passed by reference and are replaced in the argument list with the address". So I'm not particularly convinced that this is the correct long-term solution, as before we were doing exactly that and now we're not.

Sorry if I'm misunderstanding your point, but we *are* accessing 0-63(sp) (later below).

Yeah sorry I misread the assembly somehow. I thought that the lowering of %A in the non-fastcc @caller was passing the first 8 elements in a0-a7 but were shadowed by stack locations 0-63, which felt wasteful. I thought that because the first load was from 64(sp) onwards. Looking again I now see that this isn't the case because I forgot to notice that the sp is being decremented by 64 first :(. So 64(sp) is indeed the 9th element. I'll remove that bit from the commit summary now. However, this may be irrelevant given the discussion below:

You're right that, when it comes to vectors, we're generally out of what the psABI specifies. But, if anything, I would say that this falls within the "Aggregates larger than 2×XLEN bits are passed by reference and are replaced in the argument list with the address". So I'm not particularly convinced that this is the correct long-term solution, as before we were doing exactly that and now we're not.

I agree that if vectors are aggregates then this isn't doing the right thing. However, I couldn't find a concrete definition of "aggregates" as per the psABI. The System V ABI is a bit clearer in saying "Aggregates (structures and arrays)" which I don't think covers vectors.

I suppose it may be wise to take a step back in case we're on the wrong track. These vector types we're dealing with are those that are passed by value in the LLVM IR. The frontend never actually generates this code. It would generate define float @foo(<32 x float>* nocapture readonly %0) for the equivalent C code:

typedef float float32 __attribute__((ext_vector_type(32)));

extern float ext(float32);

float foo(float32 v) {
  return ext(v);
}

So I realise I brought it up but is the psABI applicable for by-value vector types in IR? I believe @kito-cheng might have had the opinion that it was effectively a compiler-internal ABI in one of the recent LLVM calls.

If we do want vectors (without the V extension) to behave like the equivalently-sized aggregates in the psABI then we'll need to fix the CC lowering so that it correctly handles float ABIs in conjunction with non-scalar types. I think we've got at least 2 separate symptoms which I would like to fix here. Both of these bugs are bad interactions between split type lowering and floating-point type lowering. Both are fixed by this patch:

; RUN: llc -mtriple=riscv64 -mattr=+f < %s
; RUN: llc -mtriple=riscv64 -mattr=+f --target-abi=lp64f < %s
; RUN: llc -mtriple=riscv64 -mattr=+d --target-abi=lp64d < %s

; CC_RISCVAssign2XLen picks up on this and crashes, returning incorrect part type sizes

define <2 x float> @callee_v2f32(<2 x float> %x) {
  ret <2 x float> %x
}
; RUN: llc -mtriple=riscv32 -mattr=+f --target-abi=ilp32f  < %s
; RUN: llc -mtriple=riscv32 -mattr=+d --target-abi=ilp32d < %s
; RUN: llc -mtriple=riscv64 -mattr=+f --target-abi=lp64f < %s
; RUN: llc -mtriple=riscv64 -mattr=+d --target-abi=lp64d < %s

; The pending locs/split handling takes over, losing the initial "BCvt" LocInfo for the ABI, assigns an XLEN loc, and emits a COPY from float to int registers. This is an impossible reg-to-reg copy.

define <4 x float> @callee_v4f32(<4 x float> %x) {
  ret <4 x float> %x
}

Since the Hardware Floating-point Calling Convention doesn't refer to "aggregates" as much as it refers to "structs" I suppose it follows from your opinion about aggregates that, if the psABI is relevant, we should be looking entirely at the Integer Calling Convention? So there shouldn't be any use of floating-point registers in lowering these illegal vectors at all, no matter which ABI is used?

frasercrmck edited the summary of this revision. (Show Details)Jun 21 2021, 5:14 AM

Some thoughts:

  1. I agree with you that the psABI doesn't (for now, at least) specify this ABI condition so technically we can do whatever we want. Whether we should still follow the spirit of the ABI or be free to make something completely different I'm not sure about. As you noted, this IR construct might arguably also be outside of what a vector-aware ABI would specify (but clang isn't the only frontend).
  1. Besides the psABI not covering vectors explicitly, I think what makes the situation more confusing is that arrays are passed by reference, so the specification of the ABI for non-scalar types ends up being less general than it could be. IMO vectors passed by value are reasonably analogous to the "Floating-point reals" and other structs, the way they are discussed in the spec.

Since the Hardware Floating-point Calling Convention doesn't refer to "aggregates" as much as it refers to "structs" I suppose it follows from your opinion about aggregates that, if the psABI is relevant, we should be looking entirely at the Integer Calling Convention? So there shouldn't be any use of floating-point registers in lowering these illegal vectors at all, no matter which ABI is used?

A struct is an aggregate. I think the FP section just happens to use more specific language, as it details the rules for the actual language constructs (structs, reals, etc) that would be impacted by the FP calling convention. But if we decided to follow the spirit of the psABI my overall reading of it is that I would expect small (<= 2×XLEN) vectors passed by value to follow the FP convention, when available.

I'm not opposed to this implementation as "something that works" and that "doesn't have to be compatible with anything" and that "solves our problems now". It's just not the ABI I would expect. It seemed to me like an unnecessary departure from convention, and that's why I didn't find it particularly appealing. But perhaps it doesn't matter? In any case, thanks for the patch and for your analysis. Let's hear from other people -- @kito-cheng @asb and others.

Some thoughts:

  1. I agree with you that the psABI doesn't (for now, at least) specify this ABI condition so technically we can do whatever we want. Whether we should still follow the spirit of the ABI or be free to make something completely different I'm not sure about. As you noted, this IR construct might arguably also be outside of what a vector-aware ABI would specify (but clang isn't the only frontend).
  1. Besides the psABI not covering vectors explicitly, I think what makes the situation more confusing is that arrays are passed by reference, so the specification of the ABI for non-scalar types ends up being less general than it could be. IMO vectors passed by value are reasonably analogous to the "Floating-point reals" and other structs, the way they are discussed in the spec.

Since the Hardware Floating-point Calling Convention doesn't refer to "aggregates" as much as it refers to "structs" I suppose it follows from your opinion about aggregates that, if the psABI is relevant, we should be looking entirely at the Integer Calling Convention? So there shouldn't be any use of floating-point registers in lowering these illegal vectors at all, no matter which ABI is used?

A struct is an aggregate. I think the FP section just happens to use more specific language, as it details the rules for the actual language constructs (structs, reals, etc) that would be impacted by the FP calling convention. But if we decided to follow the spirit of the psABI my overall reading of it is that I would expect small (<= 2×XLEN) vectors passed by value to follow the FP convention, when available.

I'm not opposed to this implementation as "something that works" and that "doesn't have to be compatible with anything" and that "solves our problems now". It's just not the ABI I would expect. It seemed to me like an unnecessary departure from convention, and that's why I didn't find it particularly appealing. But perhaps it doesn't matter? In any case, thanks for the patch and for your analysis. Let's hear from other people -- @kito-cheng @asb and others.

Thanks for all of your input and thoughts. It's great to have this discussion as this sort of stuff, as we can see, isn't something I feel comfortable asserting on by myself. I would like to hear from other people, but I should add that another reason I'm wary of treating vectors == aggregates as specified by the psABI because I don't think there's great value in passing vectors <= 2*XLEN (<8 x i8> on RV64, etc) packed into one/two register(s). We'd just have to unpack it back into individual elements with shifts and truncates/extends on the caller side anyway since the vector operations would be scalarized by most compilers.

luismarques accepted this revision.Jun 21 2021, 8:22 AM

I would like to hear from other people, but I should add that another reason I'm wary of treating vectors == aggregates as specified by the psABI because I don't think there's great value in passing vectors <= 2*XLEN (<8 x i8> on RV64, etc) packed into one/two register(s). We'd just have to unpack it back into individual elements with shifts and truncates/extends on the caller side anyway since the vector operations would be scalarized by most compilers.

You're right. That's a compelling argument. The patch LGTM.
(Feedback from other people would still be welcome).

This revision is now accepted and ready to land.Jun 21 2021, 8:22 AM

Ping @asb or @kito-cheng any thoughts on this? (see above).

Ping @asb or @kito-cheng any thoughts on this? (see above).

I think you can merge this. In the unlikely case we want to change the ABI, we can always do follow-up patches.
Thank you for the fix and the analysis!

This revision was landed with ongoing or failed builds.Thu, Jul 22, 2:04 AM
This revision was automatically updated to reflect the committed changes.

Ping @asb or @kito-cheng any thoughts on this? (see above).

I think you can merge this. In the unlikely case we want to change the ABI, we can always do follow-up patches.
Thank you for the fix and the analysis!

Yeah, fair enough. Thank you for your help and patience!