This is an archive of the discontinued LLVM Phabricator instance.

[SLP] Support for horizontal min/max reduction
ClosedPublic

Authored by ABataev on Dec 16 2016, 5:47 AM.

Details

Summary

SLP vectorizer supports horizontal reductions for Add/FAdd binary operations. Patch adds support for horizontal min/max reductions.
Function getReductionCost() is split to getArithmeticReductionCost() for binary operation reductions and getMinMaxReductionCost() for min/max reductions.
Patch fixes PR26956.

Diff Detail

Repository
rL LLVM

Event Timeline

ABataev updated this revision to Diff 81744.Dec 16 2016, 5:47 AM
ABataev retitled this revision from to [SLP] Support for horizontal min/max reduction.
ABataev updated this object.
ABataev added a subscriber: llvm-commits.
RKSimon edited edge metadata.Jan 6 2017, 6:00 AM

A few comments, but someone with more vectorizer knowledge needs to review as well.

include/llvm/CodeGen/BasicTTIImpl.h
1009 ↗(On Diff #81744)

Missing assert message

1012 ↗(On Diff #81744)

Move this comment out and just above the arithmetic/minmax functions?

1020 ↗(On Diff #81744)

where only the first n/2 elements are meaningful,

1023 ↗(On Diff #81744)

, not n,

lib/Target/X86/X86TargetTransformInfo.cpp
1659 ↗(On Diff #81744)

Unnecessary?

ABataev marked 5 inline comments as done.Jan 18 2017, 11:03 AM
ABataev added inline comments.
lib/Target/X86/X86TargetTransformInfo.cpp
1659 ↗(On Diff #81744)

Yes, missed it, thanks.

ABataev updated this revision to Diff 84867.Jan 18 2017, 11:19 AM
ABataev marked an inline comment as done.

Address Simon's comments

mkuper edited edge metadata.Jan 30 2017, 5:55 PM

This is another example of what I was talking about re smaller patches.

You could do things like name changes (Reduction -> ArithmeticReduction) or moving comments around (from inside the body of getReductionCost to above the declaration) in separate NFC patches. In a lot of cases they're simple enough not to require pre-commit review, and it makes reviewing actual meaningful patches much much easier, because the diff only contains relevant things. You could also start with a patch that only supports one kind of min/max reduction, and add the rest of them as a follow-up.

I'm sorry if it seems like I'm being petty. In fact I'm really interested in getting all of this stuff in. But the SLP vectorizer is not the simplest or the clearest piece of code to begin with, and I'm not deeply familiar with its nuances, so it's really hard for me to meaningfully review SLP patches if they also contain noise, and if they do more than one thing. If there's someone else who's capable of reviewing and LGTMing these patches as is, I don't oppose it. But I can't.

This is another example of what I was talking about re smaller patches.

You could do things like name changes (Reduction -> ArithmeticReduction) or moving comments around (from inside the body of getReductionCost to above the declaration) in separate NFC patches. In a lot of cases they're simple enough not to require pre-commit review, and it makes reviewing actual meaningful patches much much easier, because the diff only contains relevant things. You could also start with a patch that only supports one kind of min/max reduction, and add the rest of them as a follow-up.

I'm sorry if it seems like I'm being petty. In fact I'm really interested in getting all of this stuff in. But the SLP vectorizer is not the simplest or the clearest piece of code to begin with, and I'm not deeply familiar with its nuances, so it's really hard for me to meaningfully review SLP patches if they also contain noise, and if they do more than one thing. If there's someone else who's capable of reviewing and LGTMing these patches as is, I don't oppose it. But I can't.

Michael, no problems at all. Will do my best to split this and other patches into several smaller pieces, but I can't guarantee that it will be possible to do in all cases.

RKSimon resigned from this revision.Feb 8 2017, 3:44 AM
RKSimon added a subscriber: RKSimon.
RKSimon edited edge metadata.Aug 1 2017, 5:21 AM

This looks like it needs rebasing and making dependent on D29826

ABataev updated this revision to Diff 109576.Aug 3 2017, 9:18 AM
ABataev edited the summary of this revision. (Show Details)

Update to latest revision.

RKSimon added inline comments.Aug 3 2017, 9:36 AM
include/llvm/CodeGen/BasicTTIImpl.h
1242 ↗(On Diff #109576)

This seems the same as the comment before getArithmeticReductionCost - in which case is it worth keeping?

1296 ↗(On Diff #109576)

ConcreteTTI->getCmpSelInstrCost(

ABataev updated this revision to Diff 109990.Aug 7 2017, 7:27 AM

Update after review

RKSimon added inline comments.Aug 7 2017, 8:09 AM
include/llvm/CodeGen/BasicTTIImpl.h
1182 ↗(On Diff #109990)

ScalarTy->isFloatingPointTy()

lib/Analysis/CostModel.cpp
194 ↗(On Diff #109990)

They're not public, but maybe keep to style guide (also, maybe drop the class?)

enum ReductionKind {
  RK_None,        /// Not a reduction.
  RK_Arithmetic, /// Binary reduction data.
  RK_MinMax,     /// Min/max reduction data.
};
210 ↗(On Diff #109990)

Do you need the this == &RD? Won't it always match on (Kind == RD.Kind && Opcode == RD.Opcode)?

lib/Target/X86/X86TargetTransformInfo.cpp
1892 ↗(On Diff #109990)

One cost entry per line

lib/Transforms/Vectorize/SLPVectorizer.cpp
4501 ↗(On Diff #109990)

Same as above:

enum ReductionKind {
  RK_Not,        /// Not a reduction.
  RK_Arithmetic, /// Binary reduction data.
  RK_Min,        /// Minimum reduction data.
  RK_UMin,       /// Unsigned minimum reduction data.
  RK_Max,        /// Maximum reduction data.
  RK_UMax,       /// Unsigned maximum reduction data.
};
ABataev updated this revision to Diff 110255.Aug 8 2017, 12:28 PM

Update after review.

I think its almost there now, my only concern is that we're not discriminating between signed/unsigned in getMinMaxReductionCost calls - e.g. SSE is very inconsistent with support for these.

ABataev updated this revision to Diff 111056.Aug 14 2017, 1:20 PM

Update after review

No more comments from me

@mkuper - any thoughts?

RKSimon accepted this revision.Sep 8 2017, 4:25 AM

LGTM with one minor

lib/Transforms/Vectorize/SLPVectorizer.cpp
4583 ↗(On Diff #111056)

Call this RK_None to match the other version?

This revision is now accepted and ready to land.Sep 8 2017, 4:25 AM
This revision was automatically updated to reflect the committed changes.
RKSimon reopened this revision.Sep 17 2017, 7:02 AM

(Partially) reverted in rL313409 due to PR34635

This revision is now accepted and ready to land.Sep 17 2017, 7:02 AM
RKSimon requested changes to this revision.Sep 17 2017, 7:02 AM

PR34635 needs addressing

This revision now requires changes to proceed.Sep 17 2017, 7:02 AM
ABataev updated this revision to Diff 115705.Sep 18 2017, 12:46 PM
ABataev edited edge metadata.

Update after fixing PR34635

Is there any chance that you can simplify the PR34635.ll test case that you committed? There's a lot of metadata/unnecessary code in there which is likely to make the testcase very brittle.

ABataev updated this revision to Diff 115837.Sep 19 2017, 6:50 AM

Update after test update

Is there any chance that you can simplify the PR34635.ll test case that you committed? There's a lot of metadata/unnecessary code in there which is likely to make the testcase very brittle.

Done

RKSimon accepted this revision.Sep 24 2017, 7:19 AM

LGTM, thanks

This revision is now accepted and ready to land.Sep 24 2017, 7:19 AM
This revision was automatically updated to reflect the committed changes.
sabuasal added inline comments.
llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp
4923

Hi,

Why are we filling the NonNan flags for the "OperationData" object the value from the condition of the select instruction instead of the select Instruction itself? Wheb ew get to code gen we check that the value itself is not a Nan, am I missing something?

ABataev added inline comments.Mar 7 2018, 6:24 AM
llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp
4923
  1. SelectInst itself does not have any fp flags, only fcmp does.
  2. I don't fully understand your question, but seems to me you're asking where are the checks for the NaN in the generated code, right? Nowhere, we do not emit these checks.