Download Raw Diff

Details

Reviewers

spatel
RKSimon
dtemirbulatov
anton-afanasyev
sdesmalen
lebedev.ri

Commits

rG04ba80ca4dee: [Instcombiner]Improve emission of logical or/and reductions.

Summary

For logical or/and reductions we emit regular intrinsics @llvm.vector.reduce.or/and.vxi1 calls.
These intrinsics are not effective for the logical or/and reductions,
especially if the optimizer is able to emit short circuit versions of
the scalar or/and instructions and vector code gets less effective than
the scalar version.
Instead, or reduction for i1 can be represented as:

%val = bitcast <ReduxWidth x i1> to iReduxWidth
%res = cmp ne iReduxWidth %val, 0

and reduction for i1 can be represented as:

%val = bitcast <ReduxWidth x i1> to iReduxWidth
%res = cmp eq iReduxWidth %val, 11111

This improves perfromance of the vector code significantly and make it
to outperform short circuit scalar code.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ABataev created this revision.Feb 24 2021, 11:44 AM

Herald added a subscriber: hiraditya. · View Herald TranscriptFeb 24 2021, 11:44 AM

ABataev requested review of this revision.Feb 24 2021, 11:44 AM

Herald added a project: Restricted Project. · View Herald TranscriptFeb 24 2021, 11:44 AM

Summary improvements

ABataev added a parent revision: D57059: [SLP] Initial support for the vectorization of the non-power-of-2 vectors..Feb 24 2021, 1:06 PM

Harbormaster completed remote builds in B90674: Diff 326187.Feb 24 2021, 1:06 PM

Harbormaster completed remote builds in B90662: Diff 326167.Feb 24 2021, 2:38 PM

Hi @ABataev, I saw this patch come by and left some comments, hope that's alright!

llvm/lib/Transforms/Utils/LoopUtils.cpp
1031 ↗	(On Diff #326187)	Can you add the condition that `&& isa<FixedVectorType>(Src)`? (same request for LoopVectorize.cpp and SLPVectorize.cpp) We're starting to make the LoopVectorizer vectorize for scalable VFs. This means we're currently fixing up cases like this where assumptions are made that are only valid for fixed-width vectors. For scalable vectors it might be possible to do the `<vscale x N x i1>` reduction as a compare on `<vscale x 1 x iN>`, but at least for SVE I know that we never want that.
1034–1036 ↗	(On Diff #326187)	nit: `cast<FixedVectorType>(Src->getType())->getNumElements()`
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
6779–6794 ↗	(On Diff #326187)	Should this be part of BasicTTImpl::getArithmeticReductionCost? That way you don't have to implement it twice.

sdesmalen added a reviewer: sdesmalen.Feb 25 2021, 1:19 AM

Why can't we just do the costmodel change, and then canonicalize to the right variant (i.e. away from @llvm.vector.reduce.?.v?i1) in instcombine?

In D97406#2587071, @lebedev.ri wrote:

Why can't we just do the costmodel change, and then canonicalize to the right variant (i.e. away from @llvm.vector.reduce.?.v?i1) in instcombine?

+1 I'd prefer to avoid yet more special cases in the vectorizers.

Yeah, I also thought about moving this to instcombiner. This patch originally is more a question where it is better to implement it. I'll move the transformation to the instcombiner.

llvm/lib/Transforms/Utils/LoopUtils.cpp
1031 ↗	(On Diff #326187)	Sure, will do.

Address comments.

Can't all the vectorizer code be removed now? X86TTIImpl::getArithmeticReductionCost already has bool reduction costs - not sure if we need to improve the default costs ?

In D97406#2590716, @RKSimon wrote:

Can't all the vectorizer code be removed now? X86TTIImpl::getArithmeticReductionCost already has bool reduction costs - not sure if we need to improve the default costs ?

Looks like the bool reduction costs are too high because they still rely on some vector instructions. The cost of bitcast + cmp is much lower than even AVX512 bool reductions, for example. What we can do, is to check ащкall targets that we have or/and bool reduction and fallback to the BasicTTI cost. And teach BasicTTI that the cost of bool or/and reductions is actually just bitcast+cmp.

Harbormaster completed remote builds in B91073: Diff 326737.Feb 26 2021, 11:01 AM

Rebase

Harbormaster completed remote builds in B91575: Diff 327456.Mar 2 2021, 7:57 AM

llvm/lib/Transforms/Vectorize/ changes should not be there, the costmodel itself should be fixed.

In D97406#2597397, @lebedev.ri wrote:

llvm/lib/Transforms/Vectorize/ changes should not be there, the costmodel itself should be fixed.

So, do you suggest modifying getArithmeticReductionCost to return the correct cost for logical reductions?

In D97406#2597419, @ABataev wrote:

In D97406#2597397, @lebedev.ri wrote:

llvm/lib/Transforms/Vectorize/ changes should not be there, the costmodel itself should be fixed.

So, do you suggest modifying getArithmeticReductionCost to return the correct cost for logical reductions?

Something along those lines, yes.

Also, could you please split off the instcombine change? That is basically LG.

llvm/test/Transforms/InstCombine/vector-logical-reductions.ll
4–22	These tests are too complex, they should be just:

In D97406#2597430, @lebedev.ri wrote:

In D97406#2597419, @ABataev wrote:

In D97406#2597397, @lebedev.ri wrote:

llvm/lib/Transforms/Vectorize/ changes should not be there, the costmodel itself should be fixed.

So, do you suggest modifying getArithmeticReductionCost to return the correct cost for logical reductions?

Something along those lines, yes.

Also, could you please split off the instcombine change? That is basically LG.

Ok.

llvm/test/Transforms/InstCombine/vector-logical-reductions.ll
4–22	Will fix

Removed cost update

ABataev retitled this revision from [Vectorizers]Improve emission of logical or/and reductions. to [Instcombiner]Improve emission of logical or/and reductions..Mar 2 2021, 8:59 AM

Harbormaster completed remote builds in B91590: Diff 327477.Mar 2 2021, 9:00 AM

Hm, actually, are we sure this is universally true/better?
For aarch64 this seems to result in larger codegen: https://godbolt.org/z/hczeMo

In D97406#2597601, @lebedev.ri wrote:

Hm, actually, are we sure this is universally true/better?
For aarch64 this seems to result in larger codegen: https://godbolt.org/z/hczeMo

Is this the same as these?
https://bugs.llvm.org/show_bug.cgi?id=41636
https://bugs.llvm.org/show_bug.cgi?id=41635
https://bugs.llvm.org/show_bug.cgi?id=41634

In D97406#2597705, @RKSimon wrote:

In D97406#2597601, @lebedev.ri wrote:

Hm, actually, are we sure this is universally true/better?
For aarch64 this seems to result in larger codegen: https://godbolt.org/z/hczeMo

Is this the same as these?
https://bugs.llvm.org/show_bug.cgi?id=41636
https://bugs.llvm.org/show_bug.cgi?id=41635
https://bugs.llvm.org/show_bug.cgi?id=41634

Yes, looks so.

Okay, let's call this canonicalization.

This revision is now accepted and ready to land.Mar 2 2021, 9:50 AM

This revision was landed with ongoing or failed builds.Mar 4 2021, 8:01 AM

Closed by commit rG04ba80ca4dee: [Instcombiner]Improve emission of logical or/and reductions. (authored by ABataev). · Explain Why

This revision was automatically updated to reflect the committed changes.

ABataev added a commit: rG04ba80ca4dee: [Instcombiner]Improve emission of logical or/and reductions..

This is an archive of the discontinued LLVM Phabricator instance.

[Instcombiner]Improve emission of logical or/and reductions.
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 328178

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp

llvm/test/Transforms/InstCombine/vector-logical-reductions.ll

This is an archive of the discontinued LLVM Phabricator instance.

[Instcombiner]Improve emission of logical or/and reductions.ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 328178

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp

llvm/test/Transforms/InstCombine/vector-logical-reductions.ll

[Instcombiner]Improve emission of logical or/and reductions.
ClosedPublic