Up until now, we only had code to match PSADBW patterns similar to what comes out of the loop vectorizer - a partial reduction inside the loop body that gets fed into a horizontal operation in a different block.
This adds support for straight-line patterns, like those generated by the SLP vectorizer.
Details
Diff Detail
Event Timeline
Is there much scope to share more of the code with combineLoopSADPattern?
lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
26461 | To support wider types is there any way that you can split the vector, perform PSAD on both and then combine the 2 results? | |
26500 | Minor, but move this into the for() loop's initializer? | |
26529 | Move into for loop |
Thanks, Simon!
I tried to share as much code as I could, see https://reviews.llvm.org/rL276798 and https://reviews.llvm.org/rL276918 .
I don't see a way to share the "shuffle pyramid" detection code, unfortunately - for the loop case, we are forced to do it in IR, because the pyramid lives in a different basic block.
And the type checks in the beginning didn't look like they were worth it.
If you have suggestions for what I can factor out, let me know, I'll be happy to try.
lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
26461 | I think so. | |
26500 | Sure (for both loops) |
Another few minor thoughts, I'm happy if you'd prefer to just make these all future TODOs.
lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
26450 | Doesn't PSADBW generate unsigned i16 results? IIRC the horizontal sum of absdiff v16i8 / v32i8 / v64i8 should fit into a single i16 correct? Really there's nothing stopping us supporting any result integer type >= 16-bits no? | |
26461 | OK - a TODO comment is fine. | |
26481 | This looks like it could be useful for general matching of reduction/horizontal ops - possibly pull it out? I have in mind detecting any_of/all_of tests for vector comparison results that I'd like to use MOVMSK for instead - in that case it'd be the same code but we'd match against OR / AND instead of ADD. |
lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
26450 | You're right, it's another constraint the original loop code had, and removing it seems trivial. I'll add a TODO and fix this in a separate patch. | |
26481 | Sure, I'll pull it out. TBH, I wouldn't be surprised if we already had something similar, but I didn't see it. There's isHorizontalBinOp(), but it's different - it matches the x86 "horizontal" operations (like PHADDD). |
Doesn't PSADBW generate unsigned i16 results? IIRC the horizontal sum of absdiff v16i8 / v32i8 / v64i8 should fit into a single i16 correct? Really there's nothing stopping us supporting any result integer type >= 16-bits no?