This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine][X86][AArch64][AMDGPU][MIPS][PPC] (sub x, c) -> (add x, -c) vector edition.
Abandoned (Public)

Authored by lebedev.ri on May 23 2019, 1:12 PM.

Details

Summary

Lots of diff. The entire check-llvm-codegen suite passes,
so only X86 had a conflicting transform (D62327).

We want this transform because currently every single DAGCombine add %x, C
vector pattern needs to be written twice - once for add and once for sub.
Not good.
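
To make the intent concrete, here is a minimal sketch of how the vector case could be folded in DAGCombiner::visitSUB(), written against the usual N0/N1/DL/VT names of that function; it is an illustration of the transform, not the exact code in this diff:

  // Sketch: extend fold (sub x, c) -> (add x, -c) to constant
  // splat/BUILD_VECTOR right-hand sides, not just scalar constants.
  if (isConstantOrConstantVector(N1, /*NoOpaques=*/true)) {
    // getNode() constant-folds (sub 0, c) for constant c, yielding -c.
    SDValue NegC = DAG.getNode(ISD::SUB, DL, VT,
                               DAG.getConstant(0, DL, VT), N1);
    return DAG.getNode(ISD::ADD, DL, VT, N0, NegC);
  }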

  • AArch64 changes look neutral-positive. I'm not good with that asm, but I think movi v1.2d encodes the entire all-ones as an imm0_255:$imm8, so there should not be a code-size penalty?
  • AMDGPU changes look neutral-positive.
  • MIPS changes are neutral, regressions are being addressed by D66805.
  • PowerPC - not great, some regressions, same fold as MIPS seems missing.
  • X86 - on average looks like an improvement :) There are more deletions than additions. We delete 137 unfolded constant-pool loads but add 56; delete 233 folded constant-pool loads but add 350. Can't tell yet if there are some missing combines.

Diff Detail

Event Timeline

lebedev.ri created this revision.May 23 2019, 1:12 PM

Forgot to add: I'm not dropping anything from DAGCombiner::visitSUB() here;
that should be done as NFC follow-ups.

arsenm added inline comments.May 23 2019, 1:16 PM
test/CodeGen/AMDGPU/sub.v2i16.ll
329 ↗(On Diff #201043)

This is worse because it is no longer an inline immediate. We undo this already for the 32-bit case during selection:

// Undo sub x, c -> add x, -c canonicalization since c is more likely
// an inline immediate than -c.
// TODO: Also do for 64-bit.
def : GCNPat<
  (add i32:$src0, (i32 NegSubInlineConst32:$src1)),
  (S_SUB_I32 $src0, NegSubInlineConst32:$src1)
>;
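
For readers less familiar with AMDGPU, the reason is the asymmetric inline-immediate range: negating a constant can push it out of that range. A small standalone sketch of the idea (the [-16, 64] range and the NegSubInlineConst32 semantics are my assumptions here, not quoted from the backend):

  #include <cstdint>

  // Assumption: AMDGPU integer inline immediates cover [-16, 64].
  static bool isInlineImm(int64_t C) { return C >= -16 && C <= 64; }

  // After the generic (sub x, c) -> (add x, -c) flip, an add immediate in
  // [-64, -17] is no longer inline, but its negation is; selection then
  // flips it back to sub, which is what the GCNPat above expresses for i32.
  static bool shouldUndoSubToAdd(int64_t AddImm) {
    return !isInlineImm(AddImm) && isInlineImm(-AddImm);
  }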

RKSimon added inline comments.May 23 2019, 1:38 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2819 ↗(On Diff #201043)

Do we need a (!VT.isVector() || N1.hasOneUse()) limit?
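
For context, such a limit would slot into a fold like the sketch near the top roughly as follows (hypothetical, not part of the diff):

  // Hypothetical guard: always flip scalars, but flip vectors only when the
  // constant has a single use, so a shared constant is not duplicated in
  // negated form in the constant pool.
  if (isConstantOrConstantVector(N1, /*NoOpaques=*/true) &&
      (!VT.isVector() || N1.hasOneUse())) {
    SDValue NegC = DAG.getNode(ISD::SUB, DL, VT,
                               DAG.getConstant(0, DL, VT), N1);
    return DAG.getNode(ISD::ADD, DL, VT, N0, NegC);
  }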

I do not think it's good for MIPS because this patch replaces one subvi instruction with a pair of ldi and addv. Are you going to fix such regressions?

lebedev.ri marked an inline comment as done.May 24 2019, 4:45 AM

I do not think it's good for MIPS because this patch replaces one subvi instruction with a pair of ldi and addv.

Yep, I guessed as much in the patch description.

Are you going to fix such regressions?

I have zero clues about MIPS/etc. I'll certainly take a look, but I just don't know.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2819 ↗(On Diff #201043)

Hmm, I'm not a fan, and let me explain why.
There are three paths, as I see it:

  • Without this patch - keep sub for vectors.
    1. Bad - pattern duplication.
    2. Good - it is immediately obvious when the sub fold is missing.
    3. Neutral - we do nothing to the constant pool.
  • This patch as-is - always transform all sub, with no use-checks.
    1. Better - no pattern duplication.
    2. Unknown - this has some effect on the constant pool.
  • This patch, but limit the vector transform to one-use constants.
    1. Bad - the pattern duplication is still there; or worse, we suddenly lose folds if the constant is not one-use.
    2. Worse - unless one knows about that one-use limitation, the missing fold won't even be noticed.

So ignoring the constant-pool issues, the one-use limit is the worst path
of them all, even worse than what we have now. Am I missing some other path?

As an alternative, I can suggest some transform that analyzes all the vector add %x, C nodes,
all the existing constant-pool entries (== all constant vectors, not just those used by add),
and tries to flip some of those adds back to sub if that allows killing some pool entries.
The same could be done with locality in mind (within one BB).
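
Roughly, such a transform could count how many things use each splat constant (and its negation) and flip an add back to a sub only when that lets a pool entry die. A toy, self-contained sketch of just the profitability test (the names and counting scheme are made up for illustration):

  #include <cstdint>
  #include <map>

  // PoolUses: how many users each splat constant currently has.
  // Flipping add x, C into sub x, -C pays off when C has no other user
  // but -C is already materialized, so C's constant-pool entry can die.
  static bool shouldFlipAddToSub(int64_t C,
                                 const std::map<int64_t, unsigned> &PoolUses) {
    auto It = PoolUses.find(C);
    bool COnlyUsedByThisAdd = (It == PoolUses.end() || It->second <= 1);
    bool NegCAlreadyExists = PoolUses.count(-C) != 0;
    return COnlyUsedByThisAdd && NegCAlreadyExists;
  }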

arsenm added inline comments.Jun 19 2019, 6:00 AM
test/CodeGen/AMDGPU/sub.v2i16.ll
329 ↗(On Diff #201043)

I have a patch which should implement this for the v2i16 case.

lebedev.ri marked 3 inline comments as done.
lebedev.ri edited the summary of this revision.

Rebased.
AMDGPU regressions are no more (@arsenm, thanks!)
Everything else is still the same - some PPC and MIPS regressions remain.

I do not think it's good for MIPS because this patch replaces one subvi instruction with a pair of ldi and addv. Are you going to fix such regressions?

@atanasyan D66805 addresses all llvm/test/CodeGen/Mips/msa/arithmetic.ll regressions,
but the llvm/test/CodeGen/Mips/msa/i5-s.ll one remains; I don't know what's going on there.
PTAL, thank you.

@atanasyan D66805 addresses all llvm/test/CodeGen/Mips/msa/arithmetic.ll regressions,
but the llvm/test/CodeGen/Mips/msa/i5-s.ll one remains; I don't know what's going on there.
PTAL, thank you.

I'll take a look probably tomorrow.

lebedev.ri edited the summary of this revision. Aug 27 2019, 10:52 AM

@hfinkel should I be worried about PPC regressions here?
Those are there regardless of the patch (instcombine already did this fold).
I'm not yet sure how to handle them; in some of these patterns the constant
is already a constant-pool load, in some cases it's bitcasted, etc.

@hfinkel should I be worried about PPC regressions here?
Those are there regardless of the patch (instcombine already did this fold).
I'm not yet sure how to handle them; in some of these patterns the constant
is already a constant-pool load, in some cases it's bitcasted, etc.

Yeah, those don't look good. @nemanjai, are we just missing some patterns?

@hfinkel should I be worried about PPC regressions here?
Those are there regardless of the patch (instcombine already did this fold).
I'm not yet sure how to handle them; in some of these patterns the constant
is already a constant-pool load, in some cases it's bitcasted, etc.

Yeah, those don't look good. @nemanjai, are we just missing some patterns?

I believe you need something similar to https://reviews.llvm.org/D66805#change-JSdQ9NCYAb2i / D63558,
but I'm not sure how to do that for PPC; it would be great if someone actually familiar with that backend could handle that.

Could you rebase the patch against the master branch?

Could you rebase the patch against the master branch?

Rebased. Are you happy with D66805?

@nemanjai / @hfinkel ping? Any advice on how to proceed here?
It seems PPC is the only regression remaining.

@hfinkel should I be worried about PPC regressions here?
Those are there regardless of the patch (instcombine already did this fold).
I'm not yet sure how to handle them; in some of these patterns the constant
is already a constant-pool load, in some cases it's bitcasted, etc.

Yeah, those don't look good. @nemanjai, are we just missing some patterns?

I believe you need something similar to https://reviews.llvm.org/D66805#change-JSdQ9NCYAb2i / D63558,
but I'm not sure how to do that for PPC; it would be great if someone actually familiar with that backend could handle that.

For vectors this is a big increase in constant pool usage - are we sure we want to do this?

lebedev.ri marked an inline comment as done.Sep 6 2019, 6:56 AM

For vectors this is a big increase in constant pool usage - are we sure we want to do this?

Could you be more specific, please?
Are you thinking of the code where we previously reused the constant from the sub in some other instruction, and now we won't?
One more issue I'm seeing is with some load-folding opportunities.

llvm/test/CodeGen/X86/addsub-constant-folding.ll
460–463

FIXME: Sometimes using sub will allow us to avoid an extra load instruction.

Yes, the vXi8 shifts for instance.

Yes, the vXi8 shifts for instance.

Err, I meant to reply but apparently never did.
Can you point me to the appropriate test case?

lebedev.ri abandoned this revision.Jan 17 2022, 2:40 PM