This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner][x86] scalarize splatted vector FP ops
ClosedPublic

Authored by spatel on Apr 2 2019, 1:59 PM.

Details

Summary

There are a variety of vector patterns that may be profitably reduced to a scalar op when scalar ops are performed using a subset (typically, the first lane) of the vector register file.

For x86, this is true for float/double ops and element 0 because insert/extract is just a sub-register rename.

Other targets should likely enable the hook in a similar way.

Note that we will move the splat after the binop in instcombine for these patterns, but it's possible that we'll create splats here in the backend and form this pattern.

If this looks ok, the planned follow-ups would be to handle splat constant operands as well as the case where the splat follows the binop.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel created this revision.Apr 2 2019, 1:59 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 2 2019, 1:59 PM
RKSimon added inline comments.Apr 3 2019, 2:38 AM
llvm/include/llvm/CodeGen/TargetLowering.h
2447 ↗(On Diff #193357)

Would we gain anything from supporting separate LHS/RHS splat indices? x86 probably not but other targets where larger vector register types often alias smaller/scalar (ARM NEON?)

llvm/lib/Target/X86/X86ISelLowering.h
1075 ↗(On Diff #193357)

I'm still amazed we haven't ended up with a isExtractElementCheap helper yet.

llvm/test/CodeGen/X86/scalarize-fp.ll
481 ↗(On Diff #193357)

Add these to trunk?

spatel marked an inline comment as done.Apr 3 2019, 5:53 AM
spatel added inline comments.
llvm/lib/Target/X86/X86ISelLowering.h
1075 ↗(On Diff #193357)

Yes, I went back and forth on how to structure and name this.

Just realized something though: the fold that I mentioned is in instcombine already exists here in the DAG. That would move splats (and other matching shuffles) after a binop because that's always a win (removes a shuffle).

But the transform is currently limited by a seemingly unnecessary legality check, so that's why it didn't catch any of the cases that I was looking at.

Let me adjust that existing fold and see if that changes how we want to implement this hook. At the least, instead of starting the match from a binop, we'll start the match from the splat and then check the binop.

spatel marked an inline comment as done.Apr 3 2019, 6:49 AM
spatel added inline comments.
llvm/lib/Target/X86/X86ISelLowering.h
1075 ↗(On Diff #193357)
spatel updated this revision to Diff 193723.Apr 4 2019, 8:25 AM

Patch updated:

  1. Start the match from a splat-shuffle rather a binop.
  2. Rename the TLI hook to the more specific query about extract_element.

This is still independent of D60214 (but it would affect more tests if we do that transform).

This would have exposed some holes in x86's horizontal op + shuffle simplifications, but those should be fixed with:
rL357642
rL357644
rL357703

spatel marked 2 inline comments as done.Apr 4 2019, 8:31 AM
spatel added inline comments.
llvm/test/CodeGen/X86/scalarize-fp.ll
440 ↗(On Diff #193723)

Non-obvious reason for this diff: we use reciprocal estimates for vector FP division by default, but not scalar FP division. This is because the x86 estimate instruction is weak and breaks too much real-world scalar code.

797 ↗(On Diff #193723)

Intentionally chose fdiv with 1.0 divisor to show the follow-on simplification.

Looks ok to me.

RKSimon added inline comments.Apr 4 2019, 9:39 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
18133 ↗(On Diff #193723)

Should we assert that the splat index is coming from N0?

spatel marked an inline comment as done.Apr 4 2019, 9:46 AM
spatel added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
18133 ↗(On Diff #193723)

Sure - I can add that. The check is at line 18080 above here.

spatel updated this revision to Diff 193738.Apr 4 2019, 9:58 AM

Patch updated:
Rather than add an assert that we are splatting from the first operand of the shuffle, I rearranged the order of the folds, so the check is more obvious.
The fact that we may be peeking through a bitcast for the other transforms always makes me nervous that we may be using a different value than we expected.

RKSimon accepted this revision.Apr 4 2019, 1:19 PM

LGTM

This revision is now accepted and ready to land.Apr 4 2019, 1:19 PM
This revision was automatically updated to reflect the committed changes.