This is an archive of the discontinued LLVM Phabricator instance.

[RISCV][VP] Lower FP VP ISD nodes to RVV instructions
ClosedPublic

Authored by frasercrmck on Jun 14 2021, 9:13 AM.

Details

Summary

With the exception of frem, this patch supports the current set of VP
floating-point binary intrinsics by lowering them to to RVV instructions. It
does so by using the existing RISCVISD *_VL custom nodes as an intermediate
layer. Both scalable and fixed-length vectors are supported by using this
method.

The frem node is unsupported due to a lack of available instructions. For
fixed-length vectors we could scalarize but that option is not (currently)
available for scalable-vector types. The support is intentionally left out so
it equivalent for both vector types.

The matching of vector/scalar forms is currently lacking, as scalable vector
types do not lower to the custom VFMV_V_F_VL node. We could either make
floating-point scalable vector splats lower to this node, or support the
matching of multiple kinds of splat via a ComplexPattern, much like we do for
integer types.

Diff Detail

Event Timeline

frasercrmck created this revision.Jun 14 2021, 9:13 AM
frasercrmck requested review of this revision.Jun 14 2021, 9:13 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 14 2021, 9:13 AM

Will frem need to be expanded to a loop in the expansion pass?

rogfer01 added a comment.EditedJun 15 2021, 2:57 AM

Will frem need to be expanded to a loop in the expansion pass?

Would it be reasonable to expand it using the definition of fmod but vector-wise? It seems we'd need a division, a rounding to integer towards zero, convert back to float, a mutiplication and then a subtraction.

Will frem need to be expanded to a loop in the expansion pass?

Would it be reasonable to expand it using the definition of fmod but vector-wise? It seems we'd need a division, a rounding to integer towards zero, convert back to float, a mutiplication and then a subtraction.

I think the VP expansion pass should probably be left to expand vp.frem to regular vector frem since that's a legal target-independent instruction and ExpandVectorPredication is meant to be generic.

As for RISC-V I'm not sure. We could have our own expansion pass but I'd rather lower it to some sequence during ISel if we can. I'll take a look at fmod - I'm unsure if there needs to be additional logic for NaNs and all that fun.

Will frem need to be expanded to a loop in the expansion pass?

Would it be reasonable to expand it using the definition of fmod but vector-wise? It seems we'd need a division, a rounding to integer towards zero, convert back to float, a mutiplication and then a subtraction.

I don't think that gives an exact answer for all inputs since each of the individual operations can be subject to rounding.

I think the VP expansion pass should probably be left to expand vp.frem to regular vector frem since that's a legal target-independent instruction and ExpandVectorPredication is meant to be generic.

As for RISC-V I'm not sure. We could have our own expansion pass but I'd rather lower it to some sequence during ISel if we can. I'll take a look at fmod - I'm unsure if there needs to be additional logic for NaNs and all that fun.

Many targets scalarize frem on fixed vectors. It looks like frem on scalable vectors currently crashes on SVE because the type legalizer can't unroll it.

Many targets scalarize frem on fixed vectors. It looks like frem on scalable vectors currently crashes on SVE because the type legalizer can't unroll it.

Indeed. I don't think we or any target support any level of scalable-vector frem right now, so I think this discussion is orthogonal to this VP patch. But I'll spend some time today seeing if there's anything we can do since it's come up.

Matt added a subscriber: Matt.Jun 16 2021, 8:54 AM

Many targets scalarize frem on fixed vectors. It looks like frem on scalable vectors currently crashes on SVE because the type legalizer can't unroll it.

Indeed. I don't think we or any target support any level of scalable-vector frem right now, so I think this discussion is orthogonal to this VP patch. But I'll spend some time today seeing if there's anything we can do since it's come up.

I spent a bit of time looking at musl's implementation of fmod and running it through Codeplay's whole-function vectorizer. I get vectorized output and it's passing musl's own tests for rounding, NaNs, etc. (not exceptions but I don't think that's a problem for LLVM's fmod instruction), but it's definitely not something we want to emit inline with IRBuilder or with SelectionDAG: control flow, loops, you name it. I think it'd have to be lowered to a vector library call, realistically-speaking. That or we arrive back at the beginning of this discussion and emit a loop with the scalar fmod and it expands itself to a libcall.

rogfer01 accepted this revision.Jun 16 2021, 11:19 PM

Many targets scalarize frem on fixed vectors. It looks like frem on scalable vectors currently crashes on SVE because the type legalizer can't unroll it.

Indeed. I don't think we or any target support any level of scalable-vector frem right now, so I think this discussion is orthogonal to this VP patch. But I'll spend some time today seeing if there's anything we can do since it's come up.

I spent a bit of time looking at musl's implementation of fmod and running it through Codeplay's whole-function vectorizer. I get vectorized output and it's passing musl's own tests for rounding, NaNs, etc. (not exceptions but I don't think that's a problem for LLVM's fmod instruction), but it's definitely not something we want to emit inline with IRBuilder or with SelectionDAG: control flow, loops, you name it. I think it'd have to be lowered to a vector library call, realistically-speaking. That or we arrive back at the beginning of this discussion and emit a loop with the scalar fmod and it expands itself to a libcall.

Thanks for looking into it. In our downstream we defer it too to a function call so we can complete the compilation and elsewhere we can provide an implementation (there is a number of other ISD nodes that we already lower this way like exp, sin, cos, etc.). AFAICT Arm SVE is not lowering any operation to LibCallyet so perhaps this conversation should be raised in the mailing list. Not sure if we need to align compiler-rt/libgcc in terms of naming conventions here.

I'm happy to land this without frem.

This revision is now accepted and ready to land.Jun 16 2021, 11:19 PM
This revision was automatically updated to reflect the committed changes.

Thanks for looking into it. In our downstream we defer it too to a function call so we can complete the compilation and elsewhere we can provide an implementation (there is a number of other ISD nodes that we already lower this way like exp, sin, cos, etc.). AFAICT Arm SVE is not lowering any operation to LibCallyet so perhaps this conversation should be raised in the mailing list. Not sure if we need to align compiler-rt/libgcc in terms of naming conventions here.

I'm happy to land this without frem.

Thanks for the review.

Downstream for us is OpenCL and we have both a custom builtins/math library and whole-function vectorizer to transform fmod OpenCL functions (both scalar and fixed vector) into (wider & possibly scalable) vector forms. So we're currently just transforming any LLVM IR fmod into a call to the equivalent OpenCL builtin to avoid the backend complaining. It's not where I'd like to leave it but it does the job in the short term.

Yes perhaps we should arrange a call with SVE and any other interested parties. We just missed the opportunity to bring this up in the last SVE call for a while :)