This is an archive of the discontinued LLVM Phabricator instance.

[DAG] Fold vector mul(x,0)/mul(x,1) to a clearing mask
ClosedPublic

Authored by RKSimon on Sep 24 2020, 6:16 AM.

Details

Summary

If we're multiplying all elements of a vector by '0' or '1' then we can more efficiently perform this as a clearing mask (that is likely to further simplify to a shuffle blend).

This was noticed when reviewing D87502 but seems to help idiv/irem by constant cases even more as '0'/'1' values are often used for 'passthrough' cases.

I'm not sure what to make of the aarch64 changes - replacing a fused mla with and+add seems beneficial as the mul is by far the costlier op but I might be wrong....

Diff Detail

Event Timeline

RKSimon created this revision.Sep 24 2020, 6:16 AM
RKSimon requested review of this revision.Sep 24 2020, 6:16 AM
foad added a comment.Sep 24 2020, 6:33 AM

Looks good to me but I guess it needs approval from the affected target maintainers.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3711

Maybe reserve() space in ClearMask, or initialize it with an explicit length, for efficiency?

RKSimon updated this revision to Diff 294061.Sep 24 2020, 7:56 AM

Add explicit SmallBitVector::reserve()

lebedev.ri accepted this revision.Sep 24 2020, 8:09 AM
lebedev.ri added a subscriber: lebedev.ri.

Not sure about AArch64, but this LG to me.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3708

It took me 3 tries to read right. I'd suggest something closer to

// Fold ((mul x, 0/undef) -> 0,
//       (mul x, 1)       -> x))
       -> and(x, mask)
This revision is now accepted and ready to land.Sep 24 2020, 8:09 AM
RKSimon updated this revision to Diff 294073.Sep 24 2020, 8:31 AM

improve grokability

RKSimon requested review of this revision.Sep 24 2020, 8:31 AM

reopening for aarch64 review

craig.topper added inline comments.Sep 24 2020, 10:48 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3727

Does this work correctly after type legalization if VT.getScalarType() isn't legal? Though maybe that can't happen because we're already looking at a build vector constants?

There's also the case that the scalar type for the constant needs to be larger than the scalar type of the vector.

efriedma added inline comments.Sep 24 2020, 11:01 AM
llvm/test/CodeGen/AArch64/srem-seteq-vec-nonsplat.ll
19

These test changes seem fine; throughput is probably higher in this form.

RKSimon updated this revision to Diff 294263.Sep 25 2020, 3:26 AM

Ensure the new zero/ones scalar type matches the existing type inside the BUILD_VECTOR.

Any more comments? I think I've covered everything that was raised, cheers.

This revision is now accepted and ready to land.Sep 26 2020, 12:40 AM
This revision was landed with ongoing or failed builds.Sep 26 2020, 6:32 AM
This revision was automatically updated to reflect the committed changes.