This is an archive of the discontinued LLVM Phabricator instance.

[X86] Match PSADBW in straight-line code
ClosedPublic

Authored by mkuper on Jul 27 2016, 3:34 PM.

Details

Summary

Up until now, we only had code to match PSADBW patterns similar to what comes out of the loop vectorizer - a partial reduction inside the loop body that gets fed into a horizontal operation in a different block.
This adds support for straight-line patterns, like those generated by the SLP vectorizer.

Diff Detail

Repository
rL LLVM

Event Timeline

mkuper updated this revision to Diff 65819.Jul 27 2016, 3:34 PM
mkuper retitled this revision from to [X86] Match PSADBW in straight-line code.
mkuper updated this object.
mkuper added reviewers: spatel, RKSimon, wmi.
mkuper added a subscriber: llvm-commits.
wmi added a subscriber: danielcdh.Jul 27 2016, 4:07 PM
RKSimon edited edge metadata.Jul 28 2016, 9:58 AM

Is there much scope to share more of the code with combineLoopSADPattern?

lib/Target/X86/X86ISelLowering.cpp
26461 ↗(On Diff #65819)

To support wider types is there any way that you can split the vector, perform PSAD on both and then combine the 2 results?

26500 ↗(On Diff #65819)

Minor, but move this into the for() loop's initializer?

26529 ↗(On Diff #65819)

Move into for loop

Thanks, Simon!

I tried to share as much code as I could, see https://reviews.llvm.org/rL276798 and https://reviews.llvm.org/rL276918 .
I don't see a way to share the "shuffle pyramid" detection code, unfortunately - for the loop case, we are forced to do it in IR, because the pyramid lives in a different basic block.
And the type checks in the beginning didn't look like they were worth it.

If you have suggestions for what I can factor out, let me know, I'll be happy to try.

lib/Target/X86/X86ISelLowering.cpp
26461 ↗(On Diff #65819)

I think so.
Right now we don't do this for the loop version either (I didn't write that code originally :-) ), and I kept the same constraint here. I'd prefer to leave it as a TODO, in both places.

26500 ↗(On Diff #65819)

Sure (for both loops)

Another few minor thoughts, I'm happy if you'd prefer to just make these all future TODOs.

lib/Target/X86/X86ISelLowering.cpp
26450 ↗(On Diff #65819)

Doesn't PSADBW generate unsigned i16 results? IIRC the horizontal sum of absdiff v16i8 / v32i8 / v64i8 should fit into a single i16 correct? Really there's nothing stopping us supporting any result integer type >= 16-bits no?

26461 ↗(On Diff #65819)

OK - a TODO comment is fine.

26481 ↗(On Diff #65819)

This looks like it could be useful for general matching of reduction/horizontal ops - possibly pull it out? I have in mind detecting any_of/all_of tests for vector comparison results that I'd like to use MOVMSK for instead - in that case it'd be the same code but we'd match against OR / AND instead of ADD.

mkuper added inline comments.Jul 29 2016, 10:09 AM
lib/Target/X86/X86ISelLowering.cpp
26450 ↗(On Diff #65819)

You're right, it's another constraint the original loop code had, and removing it seems trivial. I'll add a TODO and fix this in a separate patch.

26481 ↗(On Diff #65819)

Sure, I'll pull it out.

TBH, I wouldn't be surprised if we already had something similar, but I didn't see it. There's isHorizontalBinOp(), but it's different - it matches the x86 "horizontal" operations (like PHADDD).

mkuper updated this revision to Diff 66148.Jul 29 2016, 10:50 AM
mkuper edited edge metadata.

Updated per Simon's comments.

RKSimon accepted this revision.Jul 29 2016, 2:28 PM
RKSimon edited edge metadata.

LGTM

This revision is now accepted and ready to land.Jul 29 2016, 2:28 PM
This revision was automatically updated to reflect the committed changes.