This is an archive of the discontinued LLVM Phabricator instance.

[DAG] Fold Op(vecreduce(a), vecreduce(b)) into vecreduce(Op(a,b))
ClosedPublic

Authored by dmgreen on Jan 16 2023, 1:02 PM.

Details

Summary

So long as the operation is reassociative, we can reassociate the double vecreduce from for example fadd(vecreduce(a), vecreduce(b)) to vecreduce(fadd(a,b)). This will in general save a few instructions, but some architectures (MVE) require the opposite fold, so a shouldExpandReduction is added to account for it. Only targets that use shouldExpandReduction will be affected.

Diff Detail

Event Timeline

dmgreen created this revision.Jan 16 2023, 1:02 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 16 2023, 1:02 PM
dmgreen requested review of this revision.Jan 16 2023, 1:02 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 16 2023, 1:02 PM

We probably want this for all associate reductions - fmul, min/max, integer ops, etc. Generalize into a helper function that takes an opcode and maps it to the corresponding reduction opcode? Subsequent patches then just need to add set of tests and another case into a switch or something like that.

Should we be doing this at IR level as well? https://simd.godbolt.org/z/aTE8qjMET (to be clear - I'm not expecting this patch to address it)

barannikov88 added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
15695
barannikov88 added inline comments.Jan 16 2023, 1:18 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
15695

Never mind, I misunderstood the transformation.
Please copy a comment showing the transformation from the description to code.

Should we be doing this at IR level as well? https://simd.godbolt.org/z/aTE8qjMET (to be clear - I'm not expecting this patch to address it)

The case that was reported to me was a <16 x float> reduction and a <4 x float> reduction added together, which needs to be handled after legalization. Similar to fadd_reduct_reassoc_v4v8f32 but with a 16x and more going on.

I'm not sure that certain architectures would like the combine in for all reductions. It can depend on whether the reduction+add is cheap. MVE might prefer multiple reductions for integer adds for example, which can require quite precise cost modelling, unfortunately.

We probably want this for all associate reductions - fmul, min/max, integer ops, etc. Generalize into a helper function that takes an opcode and maps it to the corresponding reduction opcode? Subsequent patches then just need to add set of tests and another case into a switch or something like that.

Sounds good. I can take a look, and see if it needs a target-hook to prevent the transform for any backends.

dmgreen updated this revision to Diff 494826.Feb 4 2023, 8:42 AM
dmgreen retitled this revision from [DAG] Fold fadd(vecreduce(a), vecreduce(b)) into vecreduce(fadd(a,b)) to [DAG] Fold Op(vecreduce(a), vecreduce(b)) into vecreduce(Op(a,b)).
dmgreen edited the summary of this revision. (Show Details)

I've update this to now handle all the reduction types, and added tests for various architectures.

spatel added inline comments.Feb 6 2023, 5:05 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1325

Flags param is not used?
They could propagate with:

SelectionDAG::FlagInserter FlagsInserter(DAG, Flags);
dmgreen updated this revision to Diff 495420.Feb 7 2023, 1:35 AM

Thanks - Now using a FlagInserter to copy the flags. The flags for integer ops are not passed through, as nsw/nuw/etc may not apply to the new operations.

spatel accepted this revision.Feb 7 2023, 5:54 AM

LGTM

Thanks - Now using a FlagInserter to copy the flags. The flags for integer ops are not passed through, as nsw/nuw/etc may not apply to the new operations.

Ah, right - I was only thinking of FMF flags in this context. That's worth a header comment on reassociateReduction. It looks like we drop all flags in reassociateOps, but that's not called from FP opcodes.

We could also make it less likely to go wrong by using a default flags param and/or naming that param "FMFFlags" or something like that.
For example, we have functions like this:

SDValue getMemBasePlusOffset(SDValue Base, SDValue Offset, const SDLoc &DL,
                             const SDNodeFlags Flags = SDNodeFlags());
This revision is now accepted and ready to land.Feb 7 2023, 5:54 AM
This revision was landed with ongoing or failed builds.Feb 8 2023, 3:43 AM
This revision was automatically updated to reflect the committed changes.
dzhidzhoev added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5521

Should here be "min/max/...(vecreduce..."?

dmgreen added inline comments.Jun 13 2023, 9:17 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5521