This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Lower multiplication by a constant int to madd
ClosedPublic

Authored by Allen on Sep 21 2022, 12:38 AM.

Details

Summary

Lower a = b * C -1 into madd

a) instcombine change b * C -1 --> b * C + (-1)
b) machine-combine change b * C + (-1) --> madd

Assembler will transform the neg immedate of sub to add, see https://gcc.godbolt.org/z/cTcxePPf4
Fixes AArch64 part of https://github.com/llvm/llvm-project/issues/57255.

Diff Detail

Event Timeline

Allen created this revision.Sep 21 2022, 12:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 21 2022, 12:38 AM
Allen requested review of this revision.Sep 21 2022, 12:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 21 2022, 12:38 AM

That's mysterious... we had this code for years, but only used it for immediate we can synthesize using ORR. Huh.

Do we want to make the same change for MULADDWI_OP1 ?

Allen updated this revision to Diff 462050.Sep 21 2022, 5:23 PM

Address the MULADDWI_OP1 according comment

Code changes seem fine.

This sort of change might have some unexpected performance effects... do you have a benchmarking environment to see if there's an impact?

llvm/test/CodeGen/AArch64/machine-outliner-throw.ll
1

I think this NOTE was intentionally dropped from the test because some of the CHECK lines are manually written?

Allen added a comment.Sep 23 2022, 1:45 AM

Code changes seem fine.

This sort of change might have some unexpected performance effects... do you have a benchmarking environment to see if there's an impact?

Thanks for your suggestion, and I will try a performance test based on spec2017 in the near future.

Allen updated this revision to Diff 462422.Sep 23 2022, 2:05 AM

Drop the Note in file machine-outliner-throw.ll

Allen marked an inline comment as done.EditedSep 26 2022, 2:28 AM

hi, @efriedma

  • base on commit 7328ff75ba, I add this patch and test the performace change, there is no notable regression case base on tsv110 target. (I think there are still some fluctuations in performance and this combination does not significantly affect performance is expected.)
  • ps: the 548.exchange2_r compile fail as it is a fortran source code, see https://www.spec.org/cpu2017/Docs/overview.html#benchmarks

llvm/test/CodeGen/AArch64/machine-outliner-throw.ll
1

Done, thanks

Allen marked an inline comment as done.Sep 27 2022, 12:35 AM

ping

Perf seems fine, then.

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
5942

Sorry, missed an issue here the first time I looked at this...

The current complete set of opcodes that immediate expansion can produce is listed in AArch64ExpandPseudo::expandMOVImm . If the opcode is movz/movn, there are two immediate operands. But the opcode can also be ORRWri/ORRXri; in that case, there a register operand (wzr/xzr), and an immediate operand.

I think this comes up for something like int f(int a) { return a * 44 + 16773120; }.

Allen added inline comments.Sep 27 2022, 9:25 PM
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
5942

Despite the fact that movz/movn have two immediate operands, but it will generate a mov as alias, which only has one immediate operand.
see the following code frag in AArch64_IMM::expandMOVImm.

// Prefer MOVZ/MOVN over ORR because of the rules for the "mov" alias.
if ((BitSize / 16) - OneChunks <= 1 || (BitSize / 16) - ZeroChunks <= 1) {
  expandMOVImmSimple(Imm, BitSize, OneChunks, ZeroChunks, Insn);
  return;
}
efriedma added inline comments.Sep 28 2022, 11:12 AM
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
5942

The first instruction returned by expandMOVImm is always "mov", yes. "mov" is an alias for one of three instructions: movz, movn, and orr. (See section C3.3.4 in the ARM architecture reference.)

As far as I can tell, the operands you're adding are correct for movz and movn, but not orr.

Allen updated this revision to Diff 463827.Sep 29 2022, 3:33 AM

revert to use ORR V, ZR, Imm if valid

Allen marked 2 inline comments as done.Sep 29 2022, 3:37 AM
Allen added inline comments.
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
5942

Done, revert to use orr if valid,thanks

efriedma added inline comments.Sep 29 2022, 9:34 AM
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
5942

Do we have test coverage for any constants like 16773120?

Instead of calling processLogicalImmediate like this, it would be more intuitive to explicitly check that MovI->Opcode is the opcode we expect. The current code works, but it's tightly tied to the exact implementation of expandMOVImm.

Allen updated this revision to Diff 464124.Sep 29 2022, 6:40 PM
Allen marked an inline comment as done.

update to use MovI->Opcode check the opcode

efriedma added inline comments.Sep 30 2022, 1:39 PM
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
5799

Just make this say "mov", since that's always the alias.

5839

Maybe assert that the opcode is movn or movz here? For documentation.

Allen updated this revision to Diff 464530.Oct 1 2022, 5:56 PM
Allen marked an inline comment as done.

add assert to check the opcode

Allen marked 2 inline comments as done.Oct 2 2022, 4:29 PM
Allen added inline comments.
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
5799

Done

5839

Apply your comment, thanks

efriedma accepted this revision.Oct 3 2022, 9:24 AM

LGTM, but please clang-format before you merge.

This revision is now accepted and ready to land.Oct 3 2022, 9:24 AM
Allen updated this revision to Diff 465948.Oct 6 2022, 6:08 PM
Allen marked 2 inline comments as done.

update with clang-format

Allen updated this revision to Diff 466036.Oct 7 2022, 4:23 AM
Allen edited the summary of this revision. (Show Details)

Update with clang-format

This revision was landed with ongoing or failed builds.Oct 7 2022, 4:35 AM
This revision was automatically updated to reflect the committed changes.