This is an archive of the discontinued LLVM Phabricator instance.

[X86] Add pattern matching for PMADDUBSW
ClosedPublic

Authored by craig.topper on Jul 25 2018, 5:21 PM.

Details

Summary

Similar to D49636, but for PMADDUBSW. This instruction has the additional complexity that the addition of the two products saturates to 16-bits rather than wrapping around. And one operand is treated as signed and the other as unsigned.

I changed the madd.ll test command line from sse2 to ssse3 to ensure this instruction was available which also caused some test changes for phadd. I can commit that separately if desired. Or I can add a new run line. Or a new test file. Whatever is preferable

A C example that triggers this pattern

static const int N = 128;

int8_t A[2*N];
uint8_t B[2*N];
int16_t C[N];

#define MIN(x, y) ((x) < (y)) ? (x) : (y)
#define MAX(x, y) ((x) > (y)) ? (x) : (y)

void foo() {
  for (int i = 0; i != N; ++i)
    C[i] = MIN(MAX((int16_t)A[2*i]*(int16_t)B[2*i] + (int16_t)A[2*i+1]*(int16_t)B[2*i+1], -32768), 32767);
}

Diff Detail

Event Timeline

craig.topper created this revision.Jul 25 2018, 5:21 PM
craig.topper edited the summary of this revision. (Show Details)
zvi added a comment.Jul 27 2018, 12:57 PM

LGTM. Regarding the SSE2->SSSE3 test change, i think it's fine. Can you update the --check-prefix to SSSE3 in a follow-up commit? I think it's convenient to review as-is, but in the longer term it would be misleading to leave it as-is.

zvi accepted this revision.Jul 27 2018, 12:58 PM
This revision is now accepted and ready to land.Jul 27 2018, 12:58 PM
RKSimon requested changes to this revision.Jul 30 2018, 3:27 AM
RKSimon added inline comments.
lib/Target/X86/X86ISelLowering.cpp
36821

Couldn't you merge all these sets of canonicalization early-outs together to safe space?

36837

I always get nervous when we add vectorization detection in the DAG.

36839

for (unsigned i = 0, e = N00.getNumOperands(); i != e; ++i) {

or use NumElems?

36865

element

test/CodeGen/X86/madd.ll
2 ↗(On Diff #157406)

prefix doesn't match the sse level - please can you put the pmaddusbw tests in pmaddusbw.ll - I don;t think its worth adding SSSE3 tests to madd.ll tbh

This revision now requires changes to proceed.Jul 30 2018, 3:27 AM
craig.topper added inline comments.Jul 30 2018, 10:34 AM
lib/Target/X86/X86ISelLowering.cpp
36837

Nervous how?

Address Simon's comments. Spell the mnemonic correctly in the test cases.

RKSimon added inline comments.Jul 30 2018, 11:54 AM
lib/Target/X86/X86ISelLowering.cpp
36837

In this case it looks necessary, but it encourages the belief that its safe to perform vectorization in the DAG. In general we should be relying on the vectorizers to perform this using a proper cost analysis. see PR35732 as well.

RKSimon accepted this revision.Jul 31 2018, 2:17 AM

LGTM thanks - would be nice to commit pmaddubsw.ll first so we see the codegen diff from this patch.

This revision is now accepted and ready to land.Jul 31 2018, 2:17 AM
This revision was automatically updated to reflect the committed changes.
hsaito added a subscriber: hsaito.Aug 21 2018, 2:58 PM

Sorry, for the late response to the already closed review.

In this case it looks necessary, but it encourages the belief that its safe to perform vectorization in the DAG. In general we should be relying on the vectorizers to perform this using a proper cost analysis.

Vectorizer to perform a proper cost analysis, I agree 100%.

Vectorizer to perform this part, we can't do that unless we want vectorizer to emit target (in)dependent intrinsics all over the place. As long as we want to keep using sequence of IR instructions, what's more important is to build up common and robust (TTI based?) pattern matchers that can be used at IR level as well as at the DAG level such that what we see at vectorizer can also be captured at DAG Optimizer.