Performing the rearrangement for add/sub and mul instructions to match the madd/msub pattern
Details
Diff Detail
Event Timeline
llvm/test/CodeGen/AArch64/madd-combiner.ll | ||
---|---|---|
1–3 | Can you change these check lines to: ; RUN: llc -mtriple=aarch64-apple-darwin -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,CHECK-ISEL ; RUN: llc -mtriple=aarch64-apple-darwin -fast-isel -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK,CHECK-FAST And pre-generate the check lines with update_llc_test_checks. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
12896 | I'd suggest collapsing all conditions together into a single lambda, so that it makes the code structurally simpler (less control flow) and more readable (you can directly see the exact pattern being matched). It may also help handle the case where the N0 is an ADD/SUB which does not match the conditions that follow, where N1 is just Y + 1. e.g. (X+Z)*(Y+1), isn't yet handled by your code. auto IsAddSubWith1= [](SDValue V) -> bool { unsigned Opc = V->getOpcode(); if ((Opc == ISD::ADD || Opc == ISD::SUB) ) && V->hasOneUse()) { SDValue Opnd = Opc == ISD::ADD ? V->getOperand(1) : V->getOperand(0); if (auto C = dyn_cast<ConstantSDNode>(Opnd)) return C->isOne(); } return false; } if (IsAddSubWith1(N->getOperand(0)) { // Rewrite } if (IsAddSubWith1(N->getOperand(1))) { // Rewrite } | |
12923 | nit: this one can be replaced by N1. | |
12930 | nit: this one can be replaced by N0. |
Thanks for the changes!
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
12886 | nit: unnecessary curly braces. | |
12896 | If you change IsAddSubWith1 also return the other operand, the code for rewriting becomes a bit simpler and easier to follow. auto IsAddSubWith1 = [](SDValue V, SDValue &OtherOpnd) -> bool { ... }; SDValue OtherOpnd; if (IsAddSubWith1(N0, OtherOpnd)) { SDValue MulVal = DAG.getNode(ISD::MUL, DL, VT, N1, OtherOpnd); return DAG.getNode(N0->getOpcode(), DL, VT, N1, MulVal); } |
Hi @wwei, have you done any performance measurements with this change on hardware to see what effect it has on some benchmarks?
We found this optimization opportunity on some HPC workloads, but we haven’t tested the performance gain yet.
LGTM! Looks like you've addressed all the review comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
12882 | nit: Could you rename this so it's more obvious, i.e. something like AddSubOpc? |
Should we add ISD::Freeze here? What if X/Y is undef for this pattern?
It looks the transform is not correct unless X/Y is noundef on LLVM IR. I don't know if SDAG have similar limitation or not.
https://alive2.llvm.org/ce/z/ciKKWy
The pattern isn't undef-safe, but it's poison-safe. We haven't been going after transforms like that very aggressively lately, given https://discourse.llvm.org/t/rfc-load-instruction-uninitialized-memory-semantics/67481 . But yes, it should freeze the operand.
nit: Could you rename this so it's more obvious, i.e. something like AddSubOpc?