This is an archive of the discontinued LLVM Phabricator instance.

[DAG] try to convert multiply to shift via demanded bits
ClosedPublic

Authored by spatel on Feb 20 2022, 12:02 PM.

Details

Summary

This is a fix for a regression discussed in:
https://github.com/llvm/llvm-project/issues/53829

We cleared more high multiplier bits with 995d400, but that can lead to worse codegen because we would fail to recognize the now disguised multiplication by neg-power-of-2 as a shift-left. The problem exists independently of the IR change in the case that the multiply already had cleared high bits. We also convert shl+sub into mul+add in instcombine's negator.

This patch fills in the high-bits to see the shift transform opportunity. Alive2 attempt to show correctness:
https://alive2.llvm.org/ce/z/GgSKVX

The AArch64, RISCV, and MIPS diffs look like clear wins. The x86 code requires an extra move register in the minimal examples, but it's still an improvement to get rid of the multiply on all CPUs that I am aware of (because multiply is never as fast as a shift).

There's a potential follow-up noted by the TODO comment. We should already convert that pattern into shl+add in IR, so it's probably not common:
https://alive2.llvm.org/ce/z/7QY_Ga

Diff Detail

Event Timeline

spatel created this revision.Feb 20 2022, 12:02 PM
spatel requested review of this revision.Feb 20 2022, 12:02 PM
RKSimon added inline comments.Feb 20 2022, 12:32 PM
llvm/test/CodeGen/Mips/urem-seteq-illegal-types.ll
215

How did just a single add become a sub?

xbolva00 added inline comments.
llvm/test/CodeGen/PowerPC/urem-seteq-illegal-types.ll
261 ↗(On Diff #410171)

also interesting change

spatel marked 2 inline comments as done.Feb 20 2022, 5:57 PM
spatel added inline comments.
llvm/test/CodeGen/PowerPC/urem-seteq-illegal-types.ll
261 ↗(On Diff #410171)

Yes - this is the same test as for Mips above here.

After legalization, we have:

       t56: i64 = mul t2, Constant:i64<2>
      t58: i64 = add t55, t56
...
    t59: i64 = add t58, t57
  t65: i64 = and t59, Constant:i64<3>

So we are shifting a single meaningful demanded bit to bit 1, and I think the code is correct as shown here:
https://alive2.llvm.org/ce/z/cqL3SC

Notice that the sub becomes an add in IR with instcombine.

I fixed a similar gap in DAG folding with:
a2963d871ee5
...but we need yet another demanded bits fold or some other sub->add fold.

I suspect it's a rare case, and it didn't seem harmful in these tests at least, so I figured it could be another follow-up if needed.

spatel marked an inline comment as done.Feb 20 2022, 6:00 PM
spatel added inline comments.
llvm/test/CodeGen/PowerPC/urem-seteq-illegal-types.ll
261 ↗(On Diff #410171)

Alternatively, we could probably ignore any multiply by a power-of-2 constant in this fold since that should eventually become a shl by itself.

spatel updated this revision to Diff 410201.Feb 20 2022, 6:11 PM

Patch updated:
Ignore power-of-2 multiplies. That eliminates the add/sub diffs and other reg allocation diffs, so now we just show real improvements in the tests.

spatel updated this revision to Diff 410204.Feb 20 2022, 6:19 PM

Fixed a missed test update that didn't make it into the last revision. There really are no pure-noise tests diffs. :)

AArch64 test look OK to me. They seem to be generating correct code.

RKSimon accepted this revision.Feb 23 2022, 4:55 AM

LGTM - cheers

This revision is now accepted and ready to land.Feb 23 2022, 4:55 AM
This revision was automatically updated to reflect the committed changes.
asb added a comment.Feb 23 2022, 9:16 AM

Thanks Sanjay - I can confirm this fixes all code quality regressions across the GCC torture suite on RISC-V.

Thanks Sanjay - I can confirm this fixes all code quality regressions across the GCC torture suite on RISC-V.

Great - thanks for the bug report and confirming the fix!