Page MenuHomePhabricator

X86: Add pattern matching for PMADDWD
ClosedPublic

Authored by zvi on Jan 7 2018, 12:30 PM.

Details

Summary

In addition to the existing match as part of a loop-reduction, add a straightforward
pattern match for DAG-contained patterns.

Diff Detail

Repository
rL LLVM

Event Timeline

zvi created this revision.Jan 7 2018, 12:30 PM
RKSimon added inline comments.Jan 7 2018, 1:05 PM
test/CodeGen/X86/madd.ll
358 ↗(On Diff #128885)

Whats missing to get SSE2 to lower to 2 x pmaddwd? (TBH I'm more interested in AVX1 but it'd be good for SSE as well).

zvi added inline comments.Jan 7 2018, 2:42 PM
test/CodeGen/X86/madd.ll
358 ↗(On Diff #128885)

That's a good idea. Though it might be tricky to split the 'mul' operands so that type-legalization won't mess-up illegal types. I can try to rework this patch or leave it as a follow-up if it gets too messy.

RKSimon added inline comments.Jan 8 2018, 1:30 AM
test/CodeGen/X86/madd.ll
358 ↗(On Diff #128885)

I did something similar in D41440 for PAVG - we could pull out and generalize the 'LowerToAVG' code to split into legal ops and concat the results.

craig.topper added inline comments.Jan 8 2018, 9:38 AM
lib/Target/X86/X86ISelLowering.cpp
37042 ↗(On Diff #128885)

Second line is indented 2 extra spaces.

37054 ↗(On Diff #128885)

Do you need a hasSSE2 check on the v4i32? I don't see one before this call in combineAdd.

37062 ↗(On Diff #128885)

What ensures the multiply has exactly 2X the elements of the build_vector? Couldn't it have more? Which would cause the truncate later to fail.

37066 ↗(On Diff #128885)

tolerant*

37087 ↗(On Diff #128885)

Is there anything that guarantees even indices will be on the LHS?

zvi added inline comments.Jan 9 2018, 12:09 PM
lib/Target/X86/X86ISelLowering.cpp
37062 ↗(On Diff #128885)

You're right! Will fix and add tests

37087 ↗(On Diff #128885)

Will add checks for both orderings. thanks

zvi updated this revision to Diff 129145.Jan 9 2018, 12:31 PM

Fixes for Craig's comments

zvi updated this revision to Diff 129218.Jan 10 2018, 12:47 AM
  1. Following Simon's suggestion, refactored out the code that splits the vector to legal-types to 'LowerBinTo' (the function name probably needs revision)) and applied to PMADDWD.
  2. Added a missing DAGCombine to let a truncate negate a sext through an EXTRACT_SUBVECTOR.

This patch still needs to be worked on to improve comments and code-style, and broken into 2-3 patches. But would appreciate feedback on item #1 above. Thanks.

zvi marked 9 inline comments as done.Jan 10 2018, 12:49 AM
zvi updated this revision to Diff 129219.Jan 10 2018, 1:15 AM

Average lowering fully using the refactored type-splitting code.

RKSimon added inline comments.Jan 10 2018, 5:36 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
8780 ↗(On Diff #129219)

Add comment describing the combine - any chance that you can add additional tests for this?

zvi added inline comments.Jan 10 2018, 5:54 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
8780 ↗(On Diff #129219)

Sure.

zvi updated this revision to Diff 129372.Jan 10 2018, 4:48 PM

Reabase on top D41925

RKSimon added inline comments.Jan 11 2018, 12:43 PM
lib/Target/X86/X86ISelLowering.cpp
37149 ↗(On Diff #129470)

Can't you do this in a fully general manner, checking both LHS+RHS at the same time that one is the odd and the other is the even? It seems as the moment you can only check for all_odd+all_even or all_even+all_odd.

37171 ↗(On Diff #129470)

Mul.getValueType().getVectorNumElements() != (2 * e)

zvi updated this revision to Diff 129590.Jan 12 2018, 1:44 AM

Check both BUILD_VECTOR nodes together if one is composed of odd indexed extracts and the other composed of even idexed extracts.

You can probably make this completely generic and support fully random odd/even pairs:

define <4 x i32> @pmaddwd_8_interleaved(<8 x i16> %A, <8 x i16> %B) {
   %a = sext <8 x i16> %A to <8 x i32>
   %b = sext <8 x i16> %B to <8 x i32>
   %m = mul nsw <8 x i32> %a, %b
   %odd = shufflevector <8 x i32> %m, <8 x i32> undef, <4 x i32> <i32 0, i32 3, i32 4, i32 7>
   %even = shufflevector <8 x i32> %m, <8 x i32> undef, <4 x i32> <i32 1, i32 2, i32 5, i32 6>
   %ret = add <4 x i32> %even, %odd
   ret <4 x i32> %ret
}
zvi updated this revision to Diff 129751.Jan 13 2018, 12:24 AM

Generalize to account for commutativity of add and mul

RKSimon accepted this revision.Jan 13 2018, 6:44 AM

LGTM - thank you!

This revision is now accepted and ready to land.Jan 13 2018, 6:44 AM
This revision was automatically updated to reflect the committed changes.