In addition to the existing match as part of a loop-reduction, add a straightforward
pattern match for DAG-contained patterns.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
test/CodeGen/X86/madd.ll | ||
---|---|---|
358 ↗ | (On Diff #128885) | Whats missing to get SSE2 to lower to 2 x pmaddwd? (TBH I'm more interested in AVX1 but it'd be good for SSE as well). |
test/CodeGen/X86/madd.ll | ||
---|---|---|
358 ↗ | (On Diff #128885) | That's a good idea. Though it might be tricky to split the 'mul' operands so that type-legalization won't mess-up illegal types. I can try to rework this patch or leave it as a follow-up if it gets too messy. |
lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
37042 ↗ | (On Diff #128885) | Second line is indented 2 extra spaces. |
37054 ↗ | (On Diff #128885) | Do you need a hasSSE2 check on the v4i32? I don't see one before this call in combineAdd. |
37062 ↗ | (On Diff #128885) | What ensures the multiply has exactly 2X the elements of the build_vector? Couldn't it have more? Which would cause the truncate later to fail. |
37066 ↗ | (On Diff #128885) | tolerant* |
37087 ↗ | (On Diff #128885) | Is there anything that guarantees even indices will be on the LHS? |
- Following Simon's suggestion, refactored out the code that splits the vector to legal-types to 'LowerBinTo' (the function name probably needs revision)) and applied to PMADDWD.
- Added a missing DAGCombine to let a truncate negate a sext through an EXTRACT_SUBVECTOR.
This patch still needs to be worked on to improve comments and code-style, and broken into 2-3 patches. But would appreciate feedback on item #1 above. Thanks.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
8780 ↗ | (On Diff #129219) | Add comment describing the combine - any chance that you can add additional tests for this? |
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
8780 ↗ | (On Diff #129219) | Sure. |
lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
37149 ↗ | (On Diff #129470) | Can't you do this in a fully general manner, checking both LHS+RHS at the same time that one is the odd and the other is the even? It seems as the moment you can only check for all_odd+all_even or all_even+all_odd. |
37171 ↗ | (On Diff #129470) | Mul.getValueType().getVectorNumElements() != (2 * e) |
Check both BUILD_VECTOR nodes together if one is composed of odd indexed extracts and the other composed of even idexed extracts.
You can probably make this completely generic and support fully random odd/even pairs:
define <4 x i32> @pmaddwd_8_interleaved(<8 x i16> %A, <8 x i16> %B) { %a = sext <8 x i16> %A to <8 x i32> %b = sext <8 x i16> %B to <8 x i32> %m = mul nsw <8 x i32> %a, %b %odd = shufflevector <8 x i32> %m, <8 x i32> undef, <4 x i32> <i32 0, i32 3, i32 4, i32 7> %even = shufflevector <8 x i32> %m, <8 x i32> undef, <4 x i32> <i32 1, i32 2, i32 5, i32 6> %ret = add <4 x i32> %even, %odd ret <4 x i32> %ret }