Page MenuHomePhabricator

[AArch64][SVE][DAGCombine] Factor out redundant SVE mul/fmul intrinsics

Authored by joechrisellis on Feb 18 2021, 8:10 AM.



This commit implements the following patterns:

fmul (ptrue sv_all) (dup 1.0) V => V
fmul (ptrue sv_all) V (dup 1.0) => V
mul (ptrue sv_all) (dup 1) V => V
mul (ptrue sv_all) V (dup 1) => V

That is: using the SVE mul/fmul intrinsic with an all-true predicate to
multiply a vector X by a vector of all ones is redundant.

The result of this commit is that code such as:

1  #include <arm_sve.h>
3  svfloat64_t foo(svfloat64_t a) {
4    svbool_t t = svptrue_b64();
5    svfloat64_t b = svdup_f64(1.0);
6    return svmul_m(t, a, b);
7  }

will compile to a nop.

This commit does not capture all possibilities; only the simple case as
described above. There is still room for further optimisation.

Diff Detail

Unit TestsFailed

100 msx64 debian > Polly.Isl/CodeGen::scev-backedgetaken.ll
Script: -- : 'RUN: at line 1'; opt -polly-process-unprofitable -polly-remarks-minimal -polly-use-llvm-names -polly-import-jscop-dir=/mnt/disks/ssd0/agent/llvm-project/polly/test/Isl/CodeGen -polly-codegen-verify -polly-codegen -S < /mnt/disks/ssd0/agent/llvm-project/polly/test/Isl/CodeGen/scev-backedgetaken.ll | FileCheck /mnt/disks/ssd0/agent/llvm-project/polly/test/Isl/CodeGen/scev-backedgetaken.ll
760 msx64 windows > MLIR.Examples/Toy/Ch6::jit.toy
Script: -- : 'RUN: at line 1'; toyc-ch6 -emit=jit C:\ws\w64\llvm-project\premerge-checks\mlir\test\Examples\Toy\Ch6\jit.toy
670 msx64 windows > MLIR.Examples/Toy/Ch7::jit.toy
Script: -- : 'RUN: at line 1'; toyc-ch7 -emit=jit C:\ws\w64\llvm-project\premerge-checks\mlir\test\Examples\Toy\Ch7\jit.toy

Event Timeline

joechrisellis created this revision.Feb 18 2021, 8:10 AM
joechrisellis requested review of this revision.Feb 18 2021, 8:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 18 2021, 8:10 AM

Ideas for more tests appreciated. 😄

david-arm added inline comments.

Is it worth naming this something like combineSVEIntrinsicBinOp and similarly for FP you could have combineSVEIntrinsicFPBinOp? A bit like SelectionDAG::simplifyFPBinop. The reason I mention this is that I can imagine you wanting similar things for divides, adds at some point too, i.e. fdiv X, 1.0 -> X or fadd X, 0.0 -> X


Perhaps you can commonise the check for "ptrue all" that appears in both functions? Also, it might be worth inverting the logic and moving to the start of the function, i.e.

SDLoc DL(N);

SDValue Pg = N->getOperand(1);

if (!ptrueAllIntrinsic(Pg))
  return SDValue();

This avoids the extra indentation. Not sure what you think?

joechrisellis marked 2 inline comments as done.

Address @david-arm's comments.

  • more general implementation.
  • factor out checking for ptrue(SV_ALL) intrinsic calls to isPTrueAllIntrinsic.

This is a great suggestion, thank you! Done.


Agreed -- done. 😄

joechrisellis abandoned this revision.Mar 5 2021, 3:00 AM
joechrisellis added a subscriber: paulwalker-arm.

After some discussion with @paulwalker-arm we think this might be better placed as an IR-level optimisation in SVEIntrinsicOpts.cpp. Therefore, I am going to abandon this revision and create a new one. 🙂