This is an archive of the discontinued LLVM Phabricator instance.

[SelectionDAG] allow FP binops in SimplifyDemandedVectorElts
ClosedPublic

Authored by spatel on Oct 4 2018, 3:53 PM.

Details

Summary

This is intended to make the backend on par with functionality that was added to the IR version of SimplifyDemandedVectorElts in:
rL343727
...and the original motivation is that we need to improve demanded-vector-elements in several ways to avoid problems that would be exposed in D51553.

There's an SSE1 regression related to movhpd that I'd prefer to defer because everything else looks like a win or neutral to me.

Diff Detail

Event Timeline

spatel created this revision.Oct 4 2018, 3:53 PM
arsenm added a subscriber: arsenm.Oct 4 2018, 7:21 PM
arsenm added inline comments.
lib/CodeGen/SelectionDAG/TargetLowering.cpp
1641

Should this also handle the unary and ternary ops?

spatel added inline comments.Oct 5 2018, 5:46 AM
lib/CodeGen/SelectionDAG/TargetLowering.cpp
1641

Yes, definitely. But I'm planning to extend this in pieces to limit risk. As we're seeing in this patch, and I found in a draft patch that included ISD::MUL, we have potential regressions as a result of improving this transform. I'm hoping that's a quirk of x86 only, but I can't say for sure because other targets don't seem to have regression tests that are affected.

RKSimon added inline comments.Oct 5 2018, 7:05 AM
test/CodeGen/X86/sse2-schedule.ll
5092 ↗(On Diff #168399)

This should be fixable by tweaking the test so use all the elements from xmm1

spatel updated this revision to Diff 168491.Oct 5 2018, 11:06 AM

Patch updated - no code changes, but:
Rebased after rL343858, so less noise from the SSE1/2 sched tests.

Side note about adding more ops, we'll need to audit our use of splat matching in the backend to avoid regressions as we make demanded-elts smarter.
Example: rL343865

RKSimon added inline comments.Oct 5 2018, 11:20 AM
test/CodeGen/X86/avx-schedule.ll
554

Looks like this test needs to be made more robust as well!

test/CodeGen/X86/sse41-schedule.ll
121

And this one...

spatel updated this revision to Diff 168527.Oct 5 2018, 1:49 PM
spatel added a reviewer: andreadb.

Patch updated - once again, no code changes, but rebased after:
rL343882
...so now we don't have any more sched test diffs.
I was completely missing the point of those tests (to check both reg-reg and reg-mem variants of the opcodes).

craig.topper added inline comments.Oct 5 2018, 2:23 PM
test/CodeGen/X86/avx512-hadd-hsub.ll
183

Do you have a plan to narrow the add here too?

spatel added inline comments.Oct 5 2018, 2:35 PM
test/CodeGen/X86/avx512-hadd-hsub.ll
183

That's the goal, but I'm still not sure how many steps this is going to take. I'm imagining there are 5 groups of patches needed before we get there:

  1. Allow undefs with splat matching.
  2. Propagate more undefs with SimplifyDemandedVectorElts.
  3. Call SimplifyDemandedVectorElts starting from more opcodes.
  4. Add per-element demanded logic for more ops (example: add X, undef --> undef).
  5. Add narrowing transforms.
spatel added inline comments.Oct 5 2018, 2:42 PM
test/CodeGen/X86/avx512-hadd-hsub.ll
183

This particular case (extract_subvector) should be easy. I thought I already had that 1 in IR...but it's not there either.

RKSimon accepted this revision.Oct 12 2018, 3:32 AM

LGTM although may need to be rebased after D53095 lands

This revision is now accepted and ready to land.Oct 12 2018, 3:32 AM

Trying to avoid regressions with improved undef propagation:
rL343940
rL343942
rL343945
rL344525
rL344528
rL344534

This revision was automatically updated to reflect the committed changes.