Page MenuHomePhabricator

[x86] form reduction intrinsics over raw IR
ClosedPublic

Authored by spatel on May 30 2020, 6:34 AM.

Details

Summary

This flips the switch on forming reduction intrinsics in the vectorizers. The IR diffs seem ok, but that doesn't provide any info on what happens in expansion/codegen. I will see if we can expose any obvious bugs using PhaseOrdering IR tests that include the expansion pass.

A motivating example is seen in https://bugs.llvm.org/show_bug.cgi?id=43953#c2 - if we had intrinsics there, we might get CGP or InstCombine to fold them.

Diff Detail

Event Timeline

spatel created this revision.May 30 2020, 6:34 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 30 2020, 6:34 AM
nikic added a subscriber: nikic.May 30 2020, 9:16 AM
spatel updated this revision to Diff 268531.Jun 4 2020, 11:02 AM
spatel retitled this revision from [x86][WIP] form reduction intrinsics over raw IR to [x86] form reduction intrinsics over raw IR.

Patch updated:
Added a PhaseOrdering test file (vector-reductions-expanded.ll) that checks the output from close to initial IR from clang through -O2 followed by -expand-reductions.

I haven't figured out how to get the new pass manager to mimic that, so for now the file uses the old pass manager only.

That shows the reduction expansion pass creates FP instructions for accumulation that could be avoided, but the backend squashes those. Everything else is identical.

Note the failure to generate fmin/fmax reductions in SLP - it doesn't seem to recognize the min/max IR intrinsics as reduction candidates, but that's an existing/different issue.

We have codegen test coverage for all of the reduction intrinsics too, so that covers everything from intrinsics to x86 asm. So I think we can enable this now without any regressions.

craig.topper accepted this revision.Jun 4 2020, 11:52 AM

This makes sense to me. LGTM

This revision is now accepted and ready to land.Jun 4 2020, 11:52 AM
fhahn added a subscriber: fhahn.Jun 4 2020, 11:54 AM
fhahn added inline comments.
llvm/test/Transforms/LoopVectorize/X86/imprecise-through-phis.ll
113

maybe it's time to drop the experimental bit? :)

spatel marked 2 inline comments as done.Jun 4 2020, 2:44 PM
spatel added inline comments.
llvm/test/Transforms/LoopVectorize/X86/imprecise-through-phis.ll
113

Yes, and we can clean up the 'v2' at that point too.

I think it's best to push this patch first, see if we get any complaints, then declare the intrinsics non-experimental if things are still ok after a few days.

This revision was automatically updated to reflect the committed changes.
spatel marked an inline comment as done.
nick added a subscriber: nick.Jun 10 2020, 3:38 PM
nick added inline comments.
llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-expanded.ll
258

Is this expected behavior?

spatel marked an inline comment as done.Jun 11 2020, 7:14 AM
spatel added inline comments.
llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-expanded.ll
258

Yes, the "fadd fast 0.0" is expected as commented on previously in the review.
That's because the expansion pass is not doing any simplifications itself. We do expect SDAG or other codegen to simplify that.

If that's not working as expected, please let me know.