[ARM] bottom-top mul support in ARMParallelDSP
ClosedPublic

Authored by samparker on Sep 12 2018, 7:00 AM.

Details

Summary

On failing to find sequences that can be converted into dual macs, try to find sequential 16-bit loads that are used by muls which we can then use smultb, smulbt, smultt with a wide load.

Diff Detail

Repository
rL LLVM
samparker created this revision.Sep 12 2018, 7:00 AM
samparker updated this revision to Diff 165086.Sep 12 2018, 7:30 AM

Added a negative test for loading i8s as well as a positive test for 64-bit macs.

SjoerdMeijer accepted this revision.Sep 13 2018, 8:11 AM
SjoerdMeijer added inline comments.
lib/Target/ARM/ARMParallelDSP.cpp
662 ↗(On Diff #165086)

I think the alias checks are the same as in MatchSMLAD, so it looks like we can share that, and avoid the duplication here. I could definitely live with that being done in a follow up patch.

test/CodeGen/ARM/paralleldsp-top-bottom.ll
28 ↗(On Diff #165086)

These tests are not ideal, in the sense we are not directly checking for the smultb, smulbt, smultt patterns here. Instead, we rely on these ashr + mul patterns that we now generate, to be picked up and turned in to smlab variants later, like it is tested in test/CodeGen/ARM/smul.ll. And thus, since breaking these shr + mul rewrite patterns would give problems in smul.ll, I think these tests in their current form are acceptable as I understood that turning these tests in to llc tests would complicate things a bit.

This revision is now accepted and ready to land.Sep 13 2018, 8:11 AM
samparker added inline comments.Sep 14 2018, 1:07 AM
lib/Target/ARM/ARMParallelDSP.cpp
662 ↗(On Diff #165086)

Will do.

Closed by commit rL342210: [ARM] bottom-top mul support in ARMParallelDSP (authored by sam_parker, committed by ). · Explain WhySep 14 2018, 1:11 AM
This revision was automatically updated to reflect the committed changes.
samparker reopened this revision.Sep 20 2018, 3:47 AM

Commit was reverted in rL342260.

This revision is now accepted and ready to land.Sep 20 2018, 3:47 AM
samparker updated this revision to Diff 166256.EditedSep 20 2018, 3:49 AM
samparker edited reviewers, added: rnk; removed: javed.absar.

The patch caused an assert on some vectorized code because I missed a check that the muls are plain integer types. parallel-dsp-top-bottom-neg-vec.ll has been added which was the reproducer provided.

SjoerdMeijer accepted this revision.Sep 21 2018, 1:55 AM
SjoerdMeijer added inline comments.
lib/Target/ARM/ARMParallelDSP.cpp
650 ↗(On Diff #166256)

Diff'ing this was a bit difficult, but these are the 2 new lines, which matches with your last comment. Looks like a straightforward fix to me.

samparker closed this revision.Mon, Sep 24, 2:38 AM

Committed in rL342870.

samparker reopened this revision.Mon, Oct 1, 8:50 AM

This patch was reverted again in rL343082.

This revision is now accepted and ready to land.Mon, Oct 1, 8:50 AM
samparker updated this revision to Diff 167745.Mon, Oct 1, 8:55 AM
samparker added a reviewer: efriedma.

Two breaking assumptions were that:

  • the base load would be before the offset load.
  • each load would only have one user - this is true but I also really meant and assumed that the sext of the load has one user.

So, I've rewritten the sext user replacement code as the previous version was horrible. I've also added some code to reorder the load and address calculations. The way I've figured out the ordering looks bad, but I couldn't find a simple way in the APIs to find this kind of info out.

I've also created a ParallelDSP subdirectory for the tests and moved the smlad stuff into it as well.

SjoerdMeijer added inline comments.Wed, Oct 17, 2:52 AM
lib/Target/ARM/ARMParallelDSP.cpp
719 ↗(On Diff #167745)

Nit: this loop is doing the same things as the previous loop, can we make and call a function?

730 ↗(On Diff #167745)

Nit: looks like this is workaround for not having comparison operators on these iterators. Perhaps we can put this in little helper function too, sweep it under the carpet a little bit, but with a TODO/FIXME and some comments added to it. If a helper function is too much (this function MatchTopBottomMuls is becoming rather big), than at lease some comments here.

samparker updated this revision to Diff 169992.Wed, Oct 17, 4:54 AM

Cheers Sjoerd, I've added two helper functions to try to clean this up a bit.

SjoerdMeijer accepted this revision.Wed, Oct 17, 4:58 AM

cheers, LGTM

Closed by commit rL344693: [ARM] bottom-top mul support in ARMParallelDSP (authored by sam_parker, committed by ). · Explain WhyWed, Oct 17, 6:05 AM
This revision was automatically updated to reflect the committed changes.