This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Add legalizations for VECREDUCE_SEQ_FADD
ClosedPublic

Authored by cameron.mcinally on Oct 27 2020, 9:39 AM.

Details

Summary

Continuing from D89162...

I think I stumbled across the motivation for the ExpandReductions pass. If we wait until Legalization to expand the ordered reductions, we end up with suboptimal code for illegal types. And it's pretty bad. We could end up with O(n/2) extra operations.

A good example of this can be seen in @test_v3f32 from vecreduce-fadd-legalization-strict.ll. Here we end up with 4 FADDs, instead of the 3 FADDs required. The newly added FADD is the result of widening the illegal v3f32 vector type to v4f32, where the newly added element in the reduction is the "neutral" value, 0.0.

Question is... should we be optimizing for exceptional cases? Eh, probably not IMHO. But I could see users filing bug reports for the extra operations. What does everyone else think?

[*Also note that I decided to turn these new legalizations on for NEON, and not just SVE. NEON had a number of existing tests, so it made sense to reuse them. And I'm still missing legalization support for some other actions, like SoftenFloat and PromoteFloat. I did not find existing tests for those actions.*]

Diff Detail

Event Timeline

cameron.mcinally requested review of this revision.Oct 27 2020, 9:39 AM

A good example of this can be seen in @test_v3f32 from vecreduce-fadd-legalization-strict.ll. Here we end up with 4 FADDs, instead of the 3 FADDs required. The newly added FADD is the result of widening the illegal v3f32 vector type to v4f32, where the newly added element in the reduction is the "neutral" value, 0.0.

Looking at https://github.com/llvm/llvm-project/blob/5a3ef55a524bf9e072d98286e5febdb218b1fc72/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp#L7477-L7480, shouldn't this just be a matter of using -0.0 as the neutral element instead? If 0.0 is not actually neutral here, then this is not just suboptimal, it's incorrect. (We should fix this for the non-sequential case as well.)

And I'm still missing legalization support for some other actions, like SoftenFloat and PromoteFloat. I did not find existing tests for those actions.

Copying llvm/test/CodeGen/ARM/vecreduce-fadd-legalization-soft-float.ll without the fast fmf should work as test coverage for that.

A good example of this can be seen in @test_v3f32 from vecreduce-fadd-legalization-strict.ll. Here we end up with 4 FADDs, instead of the 3 FADDs required. The newly added FADD is the result of widening the illegal v3f32 vector type to v4f32, where the newly added element in the reduction is the "neutral" value, 0.0.

Looking at https://github.com/llvm/llvm-project/blob/5a3ef55a524bf9e072d98286e5febdb218b1fc72/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp#L7477-L7480, shouldn't this just be a matter of using -0.0 as the neutral element instead? If 0.0 is not actually neutral here, then this is not just suboptimal, it's incorrect. (We should fix this for the non-sequential case as well.)

@spatel

Yup, good catch. The (-0.0+0.0 -> +0.0) case is a problem. The identity should be -0.0.

And I'm still missing legalization support for some other actions, like SoftenFloat and PromoteFloat. I did not find existing tests for those actions.

Copying llvm/test/CodeGen/ARM/vecreduce-fadd-legalization-soft-float.ll without the fast fmf should work as test coverage for that.

Ok, I can build that out. Are we okay with the suboptimal legalization though? I'll wait for that decision before putting more time into this.

Or does anyone see a clever fix for the illegal type legalization? It looks like we lost information during widening, so I'm not sure we can get it back in a non-hacky way.

Ok, I can build that out. Are we okay with the suboptimal legalization though? I'll wait for that decision before putting more time into this.

Or does anyone see a clever fix for the illegal type legalization? It looks like we lost information during widening, so I'm not sure we can get it back in a non-hacky way.

Not sure I follow. If the neutral element is fixed, then the extra fadds should also get folded away. Or is there some additional sub-optimality here?

Ok, I can build that out. Are we okay with the suboptimal legalization though? I'll wait for that decision before putting more time into this.

Or does anyone see a clever fix for the illegal type legalization? It looks like we lost information during widening, so I'm not sure we can get it back in a non-hacky way.

Not sure I follow. If the neutral element is fixed, then the extra fadds should also get folded away. Or is there some additional sub-optimality here?

Ah, I follow now. Didn't catch the connection...

Update 'neutral' element to -0.0.

@nikic, I don't see llvm/test/CodeGen/AArch64/vecreduce-fadd-legalization-soft-float.ll. Is that a test file you have downstream? Or is it just missing for AArch64/?

Ah, I see it in ARM/. That will work...

Looking at https://github.com/llvm/llvm-project/blob/5a3ef55a524bf9e072d98286e5febdb218b1fc72/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp#L7477-L7480, shouldn't this just be a matter of using -0.0 as the neutral element instead? If 0.0 is not actually neutral here, then this is not just suboptimal, it's incorrect. (We should fix this for the non-sequential case as well.)

Agree - looks like the -0.0 was missed in D58015. Note that we have semi-redundant logic for this sort of thing in IR with ConstantExpr::getBinOpIdentity(), so we should confirm that everything is lined up and/or try to consolidate code.

Comment from ARM/ARMISelLowering.cpp:

// Custom Expand smaller than legal vector reductions to prevent false zero
// items being added.
setOperationAction(ISD::VECREDUCE_FADD, MVT::v4f16, Custom);

So it will probably take some work to truly unwind the 0.0 neutral element code. Just FYI...

Comment from ARM/ARMISelLowering.cpp:

// Custom Expand smaller than legal vector reductions to prevent false zero
// items being added.
setOperationAction(ISD::VECREDUCE_FADD, MVT::v4f16, Custom);

Those are for MVE. Maybe try NEON instead?

Updated patch with, I think, all the needed legalizations.

This patch touches both the ARM and AArch64 backends, to exercise all the different legalizations. If this is too much to review in one patch, we might want to split it up into two Diffs, even if there's a small window of unsupported legalizations in the wild. It should be relatively safe to split.

Also note that my confidence in the softening test changes is low. E.g. @test_v4f32_strict. Is it obvious to anyone if the new instruction sequences are bad?

nikic added inline comments.Oct 29 2020, 2:10 PM
llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
713

Looks like there is no definition for this method. There's probably no way to test it (requires X86), but I'd suggest to still include it, as the implementation is the same as the other soft float legalization. I'd also be fine with not having it, in which case this declaration should be dropped as well.

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
826

As this is a recurring pattern, I have extracted an ISD::getVecReduceBaseOpcode() function that does this. Please add VECREDUCE_SEQ_FADD there and then use it here and in expansion.

4865

Again, this is a recurring pattern, so I have extracted a SelectionDAG::getNeutralElement() API for this. The invocation should be:

SDValue NeutralElem = DAG.getNeutralElement(
    ISD::getVecReduceBaseOpcode(N->getOpcode()), dl, ElemVT, Flags);
cameron.mcinally marked 3 inline comments as done.

Update patch based on @nikic's comments...

llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
713

Implemented.

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
4865

I improvised a little here for readability. Is that okay with you, @nikic?

Although, I wonder if getting the BaseOpc is a necessary step here. Did you consider just adding VECREDUCE_* cases to DAG.getNeutralElement(...)?

nikic accepted this revision.Oct 30 2020, 1:06 PM

LGTM with the default action adjusted...

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
4865

I improvised a little here for readability. Is that okay with you, @nikic?

Sure!

Although, I wonder if getting the BaseOpc is a necessary step here. Did you consider just adding VECREDUCE_* cases to DAG.getNeutralElement(...)?

Not strictly necessary, but I personally found it a bit more elegant to not repeat the full VECREDUCE list another time here.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
929 ↗(On Diff #301993)

Rather than explicitly marking them as expanded in targets, VECREDUCE_SEQ_FADD should be marked as default expanded using the list at https://github.com/llvm/llvm-project/blob/aa1c6b79878475d61c90c0d4c2af17312242f18e/llvm/lib/CodeGen/TargetLoweringBase.cpp#L722, to stay consistent with other VECREDUCE opcodes.

This revision is now accepted and ready to land.Oct 30 2020, 1:06 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
929 ↗(On Diff #301993)

Oh, cool. Didn't know that list existed. Thanks.