This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Avoid using x0,x0 vsetvli for vmv.x.s and vfmv.f.s unless we know the sew/lmul ratio is constant.
ClosedPublic

Authored by craig.topper on Jul 20 2021, 2:50 PM.

Details

Summary

Since we're changing VTYPE, we may change VLMAX which could
invalidate the previous VL. If we can't tell if it is safe we
should use an AVL of 1 instead of keeping the old VL.

This is a quick fix. A better fix is to thread VL to the pseudo
instruction instead of making up a value. That will require ISD
opcode changes and changes to the C intrinsic interface.

This fixes the issue raised in D106286.

Diff Detail

Event Timeline

craig.topper created this revision.Jul 20 2021, 2:50 PM
craig.topper requested review of this revision.Jul 20 2021, 2:50 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 20 2021, 2:50 PM
Herald added a subscriber: MaskRay. · View Herald Transcript

Maybe vsetivli zero, 0, * rather than vsetivli zero, 1, *, so that we can tell it's a special mark and won't confused with other intrinsic use VL=1, and easier to identify & optimize that in future?

Use 0 instead of 1.

Use 0 instead of 1.

The patch summary will need update to mention 0 instead of 1.

Interesting that we've been having zero,zero at the beginning of functions all this time and I'd never thought that this is very likely to break.

I'm wondering about the pros/cons of adding a VL to the C functions. I think I'm missing something: would it really change much? If the SEW/LMUL ratio remains the same we'll omit the vsetvli, else we'll add one as we're doing in this patch. Is it likely to improve much about the generated code or is it just the "right" thing to do because it's not arbitrarily decided to be 0?

Use 0 instead of 1.

The patch summary will need update to mention 0 instead of 1.

Interesting that we've been having zero,zero at the beginning of functions all this time and I'd never thought that this is very likely to break.

I'm wondering about the pros/cons of adding a VL to the C functions. I think I'm missing something: would it really change much? If the SEW/LMUL ratio remains the same we'll omit the vsetvli, else we'll add one as we're doing in this patch. Is it likely to improve much about the generated code or is it just the "right" thing to do because it's not arbitrarily decided to be 0?

Having the user provide a value may allow us to use the same vsetvli that's needed by code after the vmv.x.s if the user can tell us that the vmv.x.s and later code all uses the same VL. Our current vsetvli insertion algorithm can't detect that. I suppose we could defer the insertion for vmv.x.s until we see what the next instruction and try to match it ourselves, but that add more complexity.

Out of order cores may want to speculate on the value of VL rather than waiting on the scalar computation for it. This would require the VL to be somewhat predictable. I'm not sure making up a number would play well with that.

It would allow us to remove some code from the vsetvli insertion pass if these instructions weren't special.

OTOH, it is perhaps confusing to provide vl for the intrinsic when the instruction doesn't need it. We'll need to rework isel patterns and may not be able to use ISD::EXTRACT_VECTOR_ELT directly for floating point.

I think we need to weigh the options here. I'm leaning towards adding the VL to the intrinsics and instructions, but I might be persuaded to just keep this patch.

It would allow us to remove some code from the vsetvli insertion pass if these instructions weren't special.

OTOH, it is perhaps confusing to provide vl for the intrinsic when the instruction doesn't need it. We'll need to rework isel patterns and may not be able to use ISD::EXTRACT_VECTOR_ELT directly for floating point.

I think we need to weigh the options here. I'm leaning towards adding the VL to the intrinsics and instructions, but I might be persuaded to just keep this patch.

I'm not super thrilled by requiring the addition of a vl if it is not used.

If we add it, it may be documented as a no-op argument that the compiler can use as a hint for better code generation (in the context of other vector intrinsics using that same vl). It may be a bit of a tough sell though.

It would allow us to remove some code from the vsetvli insertion pass if these instructions weren't special.

OTOH, it is perhaps confusing to provide vl for the intrinsic when the instruction doesn't need it. We'll need to rework isel patterns and may not be able to use ISD::EXTRACT_VECTOR_ELT directly for floating point.

I think we need to weigh the options here. I'm leaning towards adding the VL to the intrinsics and instructions, but I might be persuaded to just keep this patch.

I'm not super thrilled by requiring the addition of a vl if it is not used.

If we add it, it may be documented as a no-op argument that the compiler can use as a hint for better code generation (in the context of other vector intrinsics using that same vl). It may be a bit of a tough sell though.

Ok I won't change the intrinsic. Any objection to this patch as is?

frasercrmck accepted this revision.Jul 23 2021, 7:15 AM

LGTM. Even if there's discussion still to be had about the intrinsics I'd personally prefer to have something in that fixes this issue by the time we branch for 13. We could revisit this later if required.

This revision is now accepted and ready to land.Jul 23 2021, 7:15 AM
This revision was landed with ongoing or failed builds.Jul 23 2021, 9:13 AM
This revision was automatically updated to reflect the committed changes.