This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner][x86] add transform/hook to decompose integer multiply into shift/add
ClosedPublic

Authored by spatel on Sep 17 2018, 3:10 PM.

Details

Summary

This is an alternative to D37896. I don't see a way to decompose multiplies generically without a target hook to tell us when it's profitable.

As a first step, I'm just trying to get the vector cases requested in PR34474:
https://bugs.llvm.org/show_bug.cgi?id=34474

The shakiest test diff here may be SSE4.1 code that uses 'pmulld' with a constant pool load. That can become 4 instructions like:

movdqa %xmm0, %xmm1
pslld $4, %xmm1
paddd %xmm0, %xmm1
movdqa %xmm1, %xmm0

...but I think despite the code-size increase, this is still better performing code. A scan of Agner's timing tables says pmulld always has a latency of at least 4 cycles, and possibly as many as 11. So replacing that with fast ops (and removing the constant load) should be a win even in the minimal case.
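To make the arithmetic in the listing above concrete, here is a minimal scalar sketch in C++. It assumes the multiplier is 17, which is inferred from the shift count of 4 (x * 17 == (x << 4) + x) and is not stated in the summary. The type alias `V4` and the function names are hypothetical and only model four 32-bit lanes; this is not code from the patch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of one XMM register holding four 32-bit lanes.
using V4 = std::array<std::uint32_t, 4>;

// Reference: lane-wise multiply by the assumed constant 17, i.e. what a
// pmulld against a splat constant would compute (wrapping modulo 2^32).
V4 mulBy17(const V4 &x) {
  V4 r{};
  for (std::size_t i = 0; i < x.size(); ++i)
    r[i] = x[i] * 17u;
  return r;
}

// Decomposed form mirroring the pslld/paddd pair: (x << 4) + x per lane.
V4 shlAddBy17(const V4 &x) {
  V4 r{};
  for (std::size_t i = 0; i < x.size(); ++i)
    r[i] = (x[i] << 4) + x[i];
  return r;
}

// Checks that both forms agree, including on lanes that wrap around.
void checkDecomposition() {
  const V4 x = {0u, 1u, 123456789u, 0xFFFFFFFFu};
  assert(mulBy17(x) == shlAddBy17(x));
}
```

Because unsigned arithmetic wraps modulo 2^32 in both forms, the two functions agree on every input, which is why the rewrite is safe regardless of overflow.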

Event Timeline

spatel created this revision. Sep 17 2018, 3:10 PM

> The shakiest test diff here may be SSE4.1 code that uses 'pmulld' with a constant pool load. That can become 4 instructions like:

pmullw could be worse - that often has a latency of just 2-3 cycles

spatel updated this revision to Diff 165992. Sep 18 2018, 9:46 AM

Patch updated:
A more conservative first step for x86 - don't do the transform if the vector multiply is legal (pmullw/pmulld). The remaining cases should always be clear improvements in speed and size.
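The conservative policy described in this update could be sketched roughly as the predicate below. This is an illustration under stated assumptions, not the hook from the patch: `MulInfo`, `isPowerOf2`, and `shouldDecomposeMul` are hypothetical names, and restricting the constant to the form 2^N + 1 is an assumption based on the shift/add pattern in the summary.

```cpp
#include <cstdint>

// Hypothetical description of a multiply-by-constant; not an LLVM type.
struct MulInfo {
  bool isVector;          // only vector cases are targeted as a first step
  bool mulIsLegal;        // target has a native multiply (e.g. pmullw/pmulld)
  std::uint64_t constant; // the splat multiplier
};

// True when v has exactly one bit set.
bool isPowerOf2(std::uint64_t v) { return v != 0 && (v & (v - 1)) == 0; }

// Assumed policy: decompose only vector multiplies that the target cannot
// do natively, and only when the constant is 2^N + 1 (one shift + one add).
bool shouldDecomposeMul(const MulInfo &m) {
  if (!m.isVector || m.mulIsLegal)
    return false;
  // Constants below 3 are trivial or plain shifts, so skip them here.
  return m.constant >= 3 && isPowerOf2(m.constant - 1);
}
```

Under this sketch, a multiply by 17 on a vector type with no legal multiply would be decomposed, while the same multiply on a type where pmulld is available would be left alone.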

RKSimon accepted this revision. Sep 19 2018, 5:20 AM

LGTM

This revision is now accepted and ready to land. Sep 19 2018, 5:20 AM
This revision was automatically updated to reflect the committed changes.