Page MenuHomePhabricator

[DAGCombine][X86][AArch64][AMDGPU][MIPS][PPC] (sub x, c) -> (add x, -c) vector edition.
Needs ReviewPublic

Authored by lebedev.ri on May 23 2019, 1:12 PM.

Details

Summary

Lots of diff. The entire check-llvm-codegen passes,
so only X86 had conflicting transform. (D62327)

We want this transform because currently every single DAGCombine add %x, C
vector pattern needs to be written twice - for add and for sub.
Not good.

  • AArch64 changes look neutral-positive. I'm not good with that asm, but i think movi v1.2d encodes the entire all-ones as an imm0_255:$imm8, so there should not be codesize penalty?
  • AMDGPU seems to miss a fold: "if this is an addition by an immediate, and immediate needs a load, and negated immediate won't need load if used in add, then transform add to sub" Looks neutral otherwise
  • MIPS - bad, many regressions, same fold as AMDGPU seems missing.
  • PowerPC - not great, some regressions, same fold as AMDGPU seems missing.
  • X86 - in average looks like an improvement :) There are more deletions than additions. We delete 137 unfolded constant-pool loads, but add 56; delete 233 folded constant-pool loads, but add 350. Can't tell yet if there is some missing combines..

Diff Detail

Repository
rL LLVM

Event Timeline

lebedev.ri created this revision.May 23 2019, 1:12 PM

Forgot to add, i'm not dropping anything from DAGCombiner::visitSUB() here,
that should be done as NFC followups.

arsenm added inline comments.May 23 2019, 1:16 PM
test/CodeGen/AMDGPU/sub.v2i16.ll
329

This is worse because it is no longer an inline immediate. We undo this already for the 32-bit case during selection:

Undo sub x, c -> add x, -c canonicalization since c is more likely
an inline immediate than -c.
// TODO: Also do for 64-bit.
def : GCNPat<

(add i32:$src0, (i32 NegSubInlineConst32:$src1)),
(S_SUB_I32 $src0, NegSubInlineConst32:$src1)

;

RKSimon added inline comments.May 23 2019, 1:38 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2819

Do we need a (!VT.isVector() || N1.hasOneUse()) limit?

I do not think it's good for MIPS because this patch replace one subvi instruction by pair of ldi and addv. Are you going to fix such regressions?

lebedev.ri marked an inline comment as done.May 24 2019, 4:45 AM

I do not think it's good for MIPS because this patch replace one subvi instruction by pair of ldi and addv.

Yep, i guessed as much in the patch description.

Are you going to fix such regressions?

I have zero clues about mips/etc. I'll certainly take a look, but i just don't know.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2819

Hmm, i'm not a fan, and let me explain why.
There are three paths, as i see it:

  • Without this patch - keep sub for vectors.
    1. Bad - pattern duplication
    2. Good - it is immediately obvious when the sub fold is missing,
    3. Neutral - we do nothing to constant pool
  • This patch as-is - always transform all sub, with no use-checks.
    1. Better - no pattern duplication
    2. Unknown - this has some effect on constant pool.
  • This patch, but limit vector transform to one-use constants
    1. Bad - the pattern duplication is still there, or Worse - we suddenly loose folds if the constant is not one-use.
    2. Worse - unless one knows about that one-use limitation, the missing fold won't even be noticed

So ignoring the constant pool issues, the one-use limit is the worst path
of them all, even worse than what we have now. Am i missing some other path?

As an alternative, i can suggest some transform to analyze all the vector add %x, C,
all the existing constant-pool entries (==all constant vectors, not just those used by add),
and try to flip some of those add to sub if that allows to kill some pool entires.
The same could be done with locality in mind (within one BB).

arsenm added inline comments.Wed, Jun 19, 6:00 AM
test/CodeGen/AMDGPU/sub.v2i16.ll
329

I have a patch which should implement this for the v2i16 case