Similar to D49636, but for PMADDUBSW. This instruction has two additional complexities: the addition of the two products saturates to 16 bits rather than wrapping around, and one operand is treated as signed while the other is treated as unsigned.
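For reviewers who want the lane semantics spelled out, here is a minimal scalar sketch of one 16-bit output lane. The helper names `saturate_i16` and `pmaddubsw_lane` are hypothetical and exist only for illustration; they are not part of the patch. It assumes the usual definition of the instruction: each unsigned byte is multiplied by the corresponding signed byte, and the two adjacent products are added with signed saturation.

```c
#include <stdint.h>

/* Clamp a 32-bit value to the int16_t range (signed saturation). */
static int16_t saturate_i16(int32_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Hypothetical helper, for illustration only: one 16-bit output lane.
 * u0/u1 come from the operand treated as unsigned bytes,
 * s0/s1 from the operand treated as signed bytes. */
static int16_t pmaddubsw_lane(uint8_t u0, int8_t s0, uint8_t u1, int8_t s1) {
    /* Each product fits in 16 bits, but their sum may not,
     * so the sum is formed in 32 bits and then saturated. */
    int32_t sum = (int32_t)u0 * s0 + (int32_t)u1 * s1;
    return saturate_i16(sum);
}
```

The extremes show why the clamp is needed: 255*127 + 255*127 = 64770 exceeds 32767, and 255*(-128) + 255*(-128) = -65280 is below -32768, so a wrapping add would give a different result in both cases.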
I changed the madd.ll test command line from sse2 to ssse3 to ensure this instruction is available, which also caused some test changes for phadd. I can commit that separately if desired, add a new run line, or add a new test file, whichever is preferable.
A C example that triggers this pattern:
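#include <stdint.h>

enum { N = 128 };  /* integer constant expression, so the file-scope array sizes are valid C */
int8_t A[2*N];     /* signed byte operand */
uint8_t B[2*N];    /* unsigned byte operand */
int16_t C[N];
/* Fully parenthesized so the macros expand safely inside larger expressions. */
#define MIN(x, y) (((x) < (y)) ? (x) : (y))
#define MAX(x, y) (((x) > (y)) ? (x) : (y))
void foo(void) {
  for (int i = 0; i != N; ++i)
    /* Sum of two adjacent signed*unsigned products, clamped to the int16_t range. */
    C[i] = MIN(MAX((int16_t)A[2*i]*(int16_t)B[2*i] + (int16_t)A[2*i+1]*(int16_t)B[2*i+1], -32768), 32767);
}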