This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Use shift for zero extension when Zbb and Zbp are not enabled
ClosedPublic

Authored by Luhaocong on Jan 5 2022, 7:41 PM.

Details

Summary

Now AND is used for zero extension when Zbb and Zbp are both not enabled.
It‘s maybe better to use shift operation if the trailing ones mask exceeds simm12.

This patch optimzes LUI + ADDI + AND to SLLI + SRLI.

Diff Detail

Event Timeline

Luhaocong created this revision.Jan 5 2022, 7:41 PM
Luhaocong requested review of this revision.Jan 5 2022, 7:41 PM
jrtc27 added inline comments.Jan 5 2022, 7:56 PM
llvm/lib/Target/RISCV/RISCVInstrInfo.td
1045–1051

You could generalise this to the input being a zero-extended all ones value and using a shift of XLen - number of ones, i.e. using isMask_64, countTrailingOnes and ImmSubFromXLen. That'd also result in a single pattern that works for both RV32 and RV64; RISCVInstrInfoB.td only has separate patterns because the RV32 and RV64 ZEXTH instructions are different.

The AND could be better for loops if LICM can move the constant materialization out of the loop.

I’ve wondered about doing this as a machine IR peephole after LICM has its chance. But I haven’t spent any time on it.

I also concern that the 0xffff has multiple uses, such as

unsigned short foo(unsigned short a, unsigned short b, int c, int d) {
  return (a >> c) + (b >> d);
}

And there is only one immediate 0xffff is materialized.

foo:
        srl     a0, a0, a2
        srl     a1, a1, a3
        add     a0, a0, a1
        lui     a1, 16
        addi    a1, a1, -1
        and     a0, a0, a1
        ret

Using a pass seems to be better, two cases should be excluded,

  1. in a loop body
  2. 0xffff has multiple uses

What's more, the pass can be generalized to a common immediate optimization pass, which combines several small optimization rules, just like AArch64's AArch64MIPeepholeOpt pass. However we need not make is so complex in current patch, that would be something TODO in the future.

The AND could be better for loops if LICM can move the constant materialization out of the loop.

I’ve wondered about doing this as a machine IR peephole after LICM has its chance. But I haven’t spent any time on it.

Would it be better as follow?

  1. Always Select SLLI + SRLI when input is all ones value exceeding simm12.
  2. in PeepholeOptimzer or other pass,revert SLLI + SRLI to LUI + ADDI + AND if they are in loop.
Luhaocong updated this revision to Diff 397854.Jan 6 2022, 4:49 AM
Luhaocong retitled this revision from [RISCV] Use shift for zext.h when Zbb and Zbp are not enabled to [RISCV] Use shift for zero extension when Zbb and Zbp are not enabled.
Luhaocong edited the summary of this revision. (Show Details)

Generalise this optimization

The AND could be better for loops if LICM can move the constant materialization out of the loop.

I’ve wondered about doing this as a machine IR peephole after LICM has its chance. But I haven’t spent any time on it.

Would it be better as follow?

  1. Always Select SLLI + SRLI when input is all ones value exceeding simm12.
  2. in PeepholeOptimzer or other pass,revert SLLI + SRLI to LUI + ADDI + AND if they are in loop.

PeepholeOptimizer is after LICM so that would be too late. It would need to be before LICM.

Another thing I just realized. MachineIR LICM doesn't visit every loop, just the outermost loop with a preheader. So LICM would probably move the LUI+ADDI as far out as it can. And rematerialization in register allocation isn't powerful enough to bring it back in when necessary to avoid a spill since it is two instructions.

llvm/lib/Target/RISCV/RISCVInstrInfo.td
414

Make this a PatLeaf like AddiPair and move the one use check in here to get rid of and_const_oneuse.

I think you can then do

return `isInt<12>(N->getSExtValue()) && isMask_64(N->getZExtValue())

No need for Subtarget check.

1046

Now that it's generalized, I don't think you want this predicate check here. The zext.h patterns should get priority since it is look for a specific immediate. We would still want this to fire for other masks.

Luhaocong updated this revision to Diff 398028.Jan 6 2022, 6:48 PM

update pattern

Luhaocong marked 3 inline comments as done.Jan 6 2022, 6:51 PM
Luhaocong updated this revision to Diff 398584.Jan 10 2022, 5:01 AM
Luhaocong edited the summary of this revision. (Show Details)

rebase and update test cases

This revision is now accepted and ready to land.Jan 10 2022, 10:12 AM
This revision was landed with ongoing or failed builds.Jan 10 2022, 6:40 PM
This revision was automatically updated to reflect the committed changes.