This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] combineMulToPMADDWD - improve recognition of sign/zero extended upper bits
ClosedPublic

Authored by RKSimon on Aug 22 2021, 1:14 PM.

Details

Summary

PMADDWD(v8i16 x, v8i16 y) == (v4i32) { (int)x[0]*y[0] + (int)x[1]*y[1], ..., (int)x[6]*y[6] + (int)x[7]*y[7] }

Currently combineMulToPMADDWD only folds cases where the upper 17 bits of both vXi32 inputs are known zero (i.e. the first half is positive and the second half of the pair is zero in each 2xi16 pair), this can be relaxed to only require one zero-extended input if the other input has at least 17 sign bits.

That way the sign of the result is still preserved, and the second half is still zero.

Noticed while investigating PR47437.

Diff Detail

Event Timeline

RKSimon created this revision.Aug 22 2021, 1:14 PM
RKSimon requested review of this revision.Aug 22 2021, 1:14 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 22 2021, 1:14 PM
pengfei accepted this revision.Sep 1 2021, 6:20 AM

The math looks good to me.
Wild thought: can we extend to zero/signed bits = 16 if the other element has more than 17 bits zero/signed? I think this should be common as a sext/zext <2 x i16> to <2 x i32>.

llvm/lib/Target/X86/X86ISelLowering.cpp
51613

Is there a test to cover this combine?

This revision is now accepted and ready to land.Sep 1 2021, 6:20 AM

The math looks good to me.
Wild thought: can we extend to zero/signed bits = 16 if the other element has more than 17 bits zero/signed? I think this should be common as a sext/zext <2 x i16> to <2 x i32>.

Thanks - we might be able to do something along those lines. I'll do some more testing to see what would be safe. The implicit sign extension of PMADDWD is quite powerful if we use it properly.

llvm/lib/Target/X86/X86ISelLowering.cpp
51613

Yes, without it we get a couple of test regressions as the patch more aggressively generates PMADDWD custom nodes before other nodes further up the DAG have been simplified to zero.

I'll see if I can pull this out as a pre-commit. I left this in such as general state as I wondered about supporting PMADDUBSW as well with the same combine.

This revision was landed with ongoing or failed builds.Sep 2 2021, 9:36 AM
This revision was automatically updated to reflect the committed changes.