This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Match neg (and x, 1) to two shifts to improve codesize
ClosedPublic

Authored by reames on Dec 19 2022, 10:26 AM.

Details

Summary

The negate operation is never compressible (as the destination and rs1 register must differ). The two shift versions will be equal size if the input GPR is reused, or smaller if this is the only use of the input.

For clarity, the operation being performed is (select (low-bit-of x), -1, 0).

Diff Detail

Event Timeline

reames created this revision.Dec 19 2022, 10:26 AM
reames requested review of this revision.Dec 19 2022, 10:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 19 2022, 10:26 AM
reames added inline comments.Dec 19 2022, 10:27 AM
llvm/lib/Target/RISCV/RISCVInstrInfo.td
1244

This shouldn't be specific to RV64, but my attempts to write a pattern which used ImmSubFromVLen kept failing type inference. Any suggestions on how to write such a pattern?

jrtc27 added inline comments.Dec 19 2022, 10:56 AM
llvm/lib/Target/RISCV/RISCVInstrInfo.td
1244

ImmSubFromXLen (XLenVT 1), as the issue is TableGen has no way of knowing that the type of the input is the same as the output, it just sees the input as unconstrained, only the output constrained by its use. Normally that doesn't matter because the input is a pre-existing node whose type is constrained by the pattern being matched.

1245

Why explicit hex?

reames updated this revision to Diff 484029.Dec 19 2022, 11:51 AM
reames marked 2 inline comments as done.
reames added inline comments.
llvm/lib/Target/RISCV/RISCVInstrInfo.td
1244

Thanks!

craig.topper added inline comments.Dec 19 2022, 11:59 AM
llvm/lib/Target/RISCV/RISCVInstrInfo.td
1244

Any harm in doing this regardless of HasStdExtC? I've sort of been of the opinion that we should avoid CodeGen differences between C and non-C if we can.

No RV32 test coverage?

llvm/lib/Target/RISCV/RISCVInstrInfo.td
1246

Wrap this for 80 columns

asb added a comment.EditedDec 19 2022, 12:12 PM

LGTM with Craig's comments addressed + RV32 test coverage.

I think doing this even without the compressed extension probably makes sense. Surprisingly it seems there's a couple of cases in the GCC torture suite at O0 where li a0, 0; sub a0, a0, a1 is selected rather than a neg. This patch also avoids avoids that - not that it's particularly important one way or another given how heavyweight the O0 output is in general.

EDIT: Thanks for Craig pointing out below why this isn't surprising.

LGTM with Craig's comments addressed + RV32 test coverage.

I think doing this even without the compressed extension probably makes sense. Surprisingly it seems there's a couple of cases in the GCC torture suite at O0 where li a0, 0; sub a0, a0, a1 is selected rather than a neg. This patch also avoids avoids that - not that it's particularly important one way or another given how heavyweight the O0 output is in general.

Probably because we emit a copy from X0 and -O0 doesn't run the coalescer.

This revision was not accepted when it landed; it landed in state Needs Review.Dec 19 2022, 2:32 PM
This revision was automatically updated to reflect the committed changes.
reames marked an inline comment as done.