Page MenuHomePhabricator

[RISCV] Add performMULcombine to perform strength-reduction
Needs ReviewPublic

Authored by philipp.tomsich on Feb 6 2023, 6:24 AM.

Details

Summary

The RISC-V backend thus far does not provide strength-reduction, which
causes a long (but not complete) list of 3-instruction patterns listed
to utilize the shift-and-add instruction from Zba and XTHeadBa in
strength-reduction.

This adds the logic to perform strength-reduction through the DAG
combine for ISD::MUL. Initially, we wire this up for XTheadBa only,
until this has had some time to settle and get real-world test
exposure.

The following strength-reductions strategies are currently supported:

  • XTheadBa
    • C = (n + 1) // th.addsl
    • C = (n + 1)k // th.addsl, slli
    • C = (n + 1)(m + 1) // th.addsl, th.addsl
    • C = (n + 1)(m + 1)k // th.addsl, th.addsl, slli
    • C = ((n + 1)m + 1) // th.addsl, th.addsl
    • C = ((n + 1)m + 1)k // th.addslm th.addsl, slli
  • base ISA
    • C being 2 set-bits // slli, slli, add
			       (possibly slli, th.addsl)

Even though the slli+slli+add sequence would we supported without
XTheadBa, this currently is gated to avoid having to update a large
number of test cases (i.e., anything that has a multiplication with a
constant where only 2 bits are set) in this commit.

With the strength reduction now being performed in performMUL combine,
we drop the (now redundant) patterns from RISCVInstrInfoXTHead.td.

Depends on D143029

Diff Detail

Event Timeline

philipp.tomsich created this revision.Feb 6 2023, 6:24 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 6 2023, 6:24 AM
philipp.tomsich requested review of this revision.Feb 6 2023, 6:24 AM
  • rerun clang-format
kito-cheng added inline comments.Feb 6 2023, 11:40 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
8651

Early exit if no Subtarget.hasVendorXTHeadBa()?

8690

Does it applicable on zba?

craig.topper added inline comments.Feb 7 2023, 12:02 AM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
8656

Why allowing for vector?

8694

Divisible*

8722

It's harmless to create a nop ANY_EXTEND. getNode detects it.

8727

It's harmless to create a nop TRUNCATE.

8769

feasible*

craig.topper added inline comments.Feb 7 2023, 12:13 AM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
8722

Though I'm not sure why the VT would be different. The code doesn't look through any extends or truncates so the types should be changing right?

philipp.tomsich marked 4 inline comments as done.
  • don't try to handle vectors (yet)
  • unconditionally insert ANY_EXTEND and TRUNCATE and let later passes clean up
philipp.tomsich added inline comments.Feb 7 2023, 5:13 AM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
8651

The final optimization (turning this into slli + slli + add is applicable to RV64I).
So that block (the one "C has 2 bits set") will eventually move out of the if in a later commit.

Refer to the remark from the commit message:

Even though the slli+slli+add sequence would we supported without
XTheadBa, this currently is gated to avoid having to update a large
number of test cases (i.e., anything that has a multiplication with a
constant where only 2 bits are set) in this commit.

For that reason, I'd like to keep the logic as is.

8690

Yes, it is applicable.
I'd like to keep this for a later commit (once we had a chance to run a full QA run with this enabled for Zba).

Ok to defer?

8722

th.addsl is defined for XLen only (unlike mul, which has a W-form).
When operating on a MVT::i32, this any-extends and then truncate the result (which will be merged into a W-form add or shift on the final instruction).

If we don't any-extend, the compile on RV64 for the function

unsigned func32(unsigned a)
{

return a*200;

}

will keep the first operation separated out as a slliw + add:

func32:
slliw a1, a0, 2
add a0, a0, a1
th.addsl a0, a0, a0, 2
slliw a0, a0, 3
ret

This revision was not accepted when it landed; it landed in state Needs Review.Feb 7 2023, 10:57 PM
This revision was automatically updated to reflect the committed changes.
philipp.tomsich reopened this revision.Feb 7 2023, 11:03 PM

Reopening as this was accidentially pushed and revert using 'arc patch on D143534'.

reames added a subscriber: reames.Feb 8 2023, 7:38 AM

At a high level, it feels wrong for this to be applied only to the t-head versions of the Zba instructions. If this can be profitably done for the standard extension, I'd encourage you to do so. If anything, starting with the standard extension specifically so we get the test coverage would seem like a better strategy.

Have you looked at what it would take to share code with another target here? I glanced at x86, and it seems like there's a bunch of overlap. Maybe we could introduce a set of generic DAG combines which enabled based on a callback or configuration? Haven't given this much thought, so take this as a light suggestion only.

philipp.tomsich planned changes to this revision.Feb 8 2023, 7:44 AM

I'll take the two suggestions (Kito and Philip) on getting Zba supported in this initial change as a hint that there's no point in delaying this.
We'll get a new version ready that adds the following:

  • enables for Zba

I will keep the following for a separate change:

  • move the slli + slli + add out of the guard ... this will be 3 instructions, with two of them independent for a { slli, slli } -> { add } critical path on dual-issue cores)
  • adjust all affected test-cases (there will be a massive ripple effect)

If you want the slli + slli + add moved out of the guard in this change as well, please let me know and we'll fold this into this change as well.

This revision was not accepted when it landed; it landed in state Changes Planned.Feb 17 2023, 10:45 AM
This revision was automatically updated to reflect the committed changes.
philipp.tomsich reopened this revision.Feb 17 2023, 10:47 AM

Accidentially pushed (another 'arc patch' issue) and reverted.