This is an archive of the discontinued LLVM Phabricator instance.

[X86] Make constant `mul` -> `shl` + `add`/`sub` work for vector types
ClosedPublic

Authored by goldstein.w.n on May 10 2023, 4:04 PM.

Details

Summary

Something like:

`%r = mul %x, <33, 33, 33, ...>`

Is best lowered as:

`%tmp = %shl x, <5, 5, 5>; %r = add %tmp, %x`

As well, since vectors have non-destructive shifts, we can also do
cases where the multiply constant is Pow2A +/- Pow2B for arbitrary A
and B, unlike in the scalar case where the extra mov instructions
make it not worth it.

Diff Detail

Event Timeline

goldstein.w.n created this revision.May 10 2023, 4:04 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 10 2023, 4:04 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
goldstein.w.n requested review of this revision.May 10 2023, 4:04 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 10 2023, 4:04 PM

On current chips, pmullw has the same throughput as psllw; it seems unlikely to me that it's profitable to replace a pmullw with a four-instruction sequence.

On current chips, pmullw has the same throughput as psllw; it seems unlikely to me that it's profitable to replace a pmullw with a four-instruction sequence.

On SKX and newer there is an additional shuffle unit (p01). So will make the arbitrary Pow2A + Pow2B only for targets with fast shift.

Only use 2x shift approach on targets with fast shifts (2/cycle)

pengfei added inline comments.May 29 2023, 12:47 AM
llvm/lib/Target/X86/X86.td
562–564

Can we get such info from SchedModel rather than put another tuning?

llvm/lib/Target/X86/X86ISelLowering.cpp
48653–48655

Remove parentheses.

48772

Remove blank line.

llvm/test/CodeGen/X86/omit-urem-of-power-of-two-or-zero-when-comparing-with-zero.ll
323

IIRC, the intention is to generate pslld for new targets. Why this is affected given it's a general turning?

The same for others. I didn't check all, but in most tests we just enable features without specifying a turning target.

goldstein.w.n marked 2 inline comments as done.May 29 2023, 3:20 PM
goldstein.w.n added inline comments.
llvm/lib/Target/X86/X86.td
562–564

Can we get such info from SchedModel rather than put another tuning?

I'm not sure. The transform proposed in this patch works on SDNode which aren't in MachineInst form yet. AFAIK the schedmodel works on MachineInst, not SDNode, but if there is a way to get sched info for an SDNode I agree it would be better to use that.
Is there a way?

llvm/test/CodeGen/X86/omit-urem-of-power-of-two-or-zero-when-comparing-with-zero.ll
323

IIRC, the intention is to generate pslld for new targets. Why this is affected given it's a general turning?

The same for others. I didn't check all, but in most tests we just enable features without specifying a turning target.

That was the 2x pslld case but changed to cover the 1x pslld + 2x padd case as well.

On use shift if 1x extra add/sub
Fix some nits

pengfei added inline comments.May 29 2023, 6:10 PM
llvm/lib/Target/X86/X86.td
562–564

That's a good point. My suggestion was based on an assumption that if a target has fast shifts, all shift instructions, no matter SSE/AVX/AVX512 MI should have the same port info. So we can choose any one of them to check here.
However, I found it is not true, see https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/X86/X86SchedSkylakeServer.td#L416-L419
Adding a tuning flag has the same problem.
One alternative way is to check WriteVecShiftImm/X/Y/Z based on type and predicates. It's not always precise and I think it's a big hammer for this.

goldstein.w.n added inline comments.May 29 2023, 10:40 PM
llvm/lib/Target/X86/X86.td
562–564

Agreed, although this does highlight that it would be nice to have some scalable way to estimate SDNode cost

pengfei accepted this revision.Jun 8 2023, 1:30 AM

LGTM.

llvm/lib/Target/X86/X86ISelLowering.cpp
48707–48710

Maybe just VT.isVector() && VT.isInteger() given we made sure it's legal type by line 48691.

48722–48724

In which case we need to analyze target constant?

This revision is now accepted and ready to land.Jun 8 2023, 1:30 AM
goldstein.w.n added inline comments.Jun 8 2023, 1:09 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
48722–48724

Some of the tests in vector-mul.ll like: @mul_v4i32_17(<4 x i32> %a0). Not exact sure why, but it does cover some cases.

goldstein.w.n marked an inline comment as done.Jun 8 2023, 1:18 PM

Simplify VEC check

This revision was landed with ongoing or failed builds.Jun 10 2023, 12:39 PM
This revision was automatically updated to reflect the committed changes.