[SelectionDAG] unroll unsupported vector FP ops earlier to avoid libcalls on undef elements (PR38527)
ClosedPublic

Authored by spatel on Aug 15 2018, 10:53 AM.

Details

Summary

I haven't dealt with vector legalization, so this might not be the best or even correct approach. But I think this solves the motivating case from:
https://bugs.llvm.org/show_bug.cgi?id=38527

If we are legalizing an FP vector op that maps to 1 of the LLVM intrinsics that mimic libm calls, but we're going to end up with scalar libcalls for that vector type anyway, then we should unroll the vector op into scalars before widening. This avoids libcalls because we've lost the knowledge that some of the scalar elements are undef.

Please have a close look at the test diffs to make sure we didn't drop any necessary ops.

Diff Detail

Repository
rL LLVM
spatel created this revision.Aug 15 2018, 10:53 AM
efriedma added inline comments.Aug 15 2018, 12:22 PM
lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
2445 ↗(On Diff #160847)

I think you need to call TLI.getRegisterType() rather than getTypeToTransformTo()? Consider what happens to, for example, <5 x f32> on AArch64.

spatel added inline comments.Aug 15 2018, 12:52 PM
lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
2445 ↗(On Diff #160847)

I initially didn't step through that test since it showed no diffs, but here's what I see in the debug output for that case:
VT = v5f32
WideVecVT = v8f32 (is it expected that we ask to widen to the next pow-of-2 size?)

Legalizing node: t12: v5f32 = fsin t11
Analyzing result type: v5f32
Widen node result 0: t12: v5f32 = fsin t11

Creating new node: t36: f32 = fsin t2
Creating new node: t37: f32 = fsin t4
Creating new node: t38: f32 = fsin t6
Creating new node: t39: f32 = fsin t8
Creating new node: t40: f32 = fsin t10
Creating new node: t41: v8f32 = BUILD_VECTOR t36, t37, t38, t39, t40, undef:f32, undef:f32, undef:f32

I copied the getTypeToTransformTo() call from the implementation of WidenVecRes_Unary(), so that's what we use if we fall-through. Should there be a difference in the call based on what we're using that type for (UnrollVectorOp() vs. GetWidenedVector())?

efriedma added inline comments.Aug 15 2018, 12:59 PM
lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
2445 ↗(On Diff #160847)

Thinking about it a bit more, you'd probably get the same result if you just didn't try to compute the vector type at all, and just check TLI.isOperationExpand(N->getOpcode(), VT.getScalarType(). (If a vector operation is legal, the equivalent scalar operation is generally also legal.)

spatel added inline comments.Aug 15 2018, 1:20 PM
lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
2445 ↗(On Diff #160847)

We still have to compute the vector type to pass into UnrollVectorOp though? But you're correct that I don't see any tests diffs if we just ask if the scalar type is set to expand. Let me post the updated patch and see if that looks better.

spatel updated this revision to Diff 160907.Aug 15 2018, 1:23 PM

Make the vector type check an assert instead of an actual condition for the transform (no test diffs).

Thinking forward to a future where we have SVML library support for v4f32 and a v8f32 sin/cos. What should we do for v5f32?

Thinking forward to a future where we have SVML library support for v4f32 and a v8f32 sin/cos. What should we do for v5f32?

Not sure. Over in:
https://bugs.llvm.org/show_bug.cgi?id=38528
...we discussed transforming to veclib calls late in IR, so any special lib support would already be taken care of before we reach this point?

Another note about the current patch: I did try sinking the expansion check directly into WidenVecRes_Unary(N), so it would apply to all of the unary opcodes in the switch rather than just the FP libcall opcodes. But that results in a regression in test/CodeGen/X86/bswap-vector.ll because we scalarize something that currently gets turned into a shuffle.

Thinking forward to a future where we have SVML library support for v4f32 and a v8f32 sin/cos. What should we do for v5f32?

Not sure. Over in:
https://bugs.llvm.org/show_bug.cgi?id=38528
...we discussed transforming to veclib calls late in IR, so any special lib support would already be taken care of before we reach this point?

True. But it probably also makes sense to handle them in SDAG so that targets can also provide custom lowering.

Another note about the current patch: I did try sinking the expansion check directly into WidenVecRes_Unary(N), so it would apply to all of the unary opcodes in the switch rather than just the FP libcall opcodes. But that results in a regression in test/CodeGen/X86/bswap-vector.ll because we scalarize something that currently gets turned into a shuffle.

lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
2445 ↗(On Diff #160847)

Makes sense to me.

Ping.

Are we ok with this as-is, needs refinements, or do we want to answer the custom lib lowering (SVML) questions first?

The changes to tests seem fine to me. I don’t see why this behaviour would be incompatible with a future change to add support for SVML & friends at selection/legalisation time.

This revision is now accepted and ready to land.Aug 22 2018, 11:31 AM
This revision was automatically updated to reflect the committed changes.