This is an archive of the discontinued LLVM Phabricator instance.

[VP] Add vector-predicated reduction intrinsics
ClosedPublic

Authored by frasercrmck on Jun 15 2021, 9:41 AM.

Details

Summary

This patch adds vector-predicated ("VP") reduction intrinsics corresponding to
each of the existing unpredicated llvm.vector.reduce.* versions. Unlike the
unpredicated reductions, all VP reductions have a start value. This start value
is returned when no vector elements are active.

Support for expansion on targets without native vector-predication support is
included.
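
For illustration, a sketch of the shape these intrinsics take, using the integer add reduction (names like %start, %v, %mask and %evl are placeholders, not from the patch itself):

  declare i32 @llvm.vp.reduce.add.v4i32(i32, <4 x i32>, <4 x i1>, i32)

  ; %r combines %start with the sum of the active lanes of %v;
  ; with an all-false %mask or %evl == 0, %r is simply %start.
  %r = call i32 @llvm.vp.reduce.add.v4i32(i32 %start, <4 x i32> %v, <4 x i1> %mask, i32 %evl)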

This patch is based on the "reduction slice" of the LLVM-VP reference patch
(https://reviews.llvm.org/D57504).

Diff Detail

Event Timeline

frasercrmck created this revision.Jun 15 2021, 9:41 AM
frasercrmck requested review of this revision.Jun 15 2021, 9:41 AM
frasercrmck added inline comments.Jun 15 2021, 9:43 AM
llvm/test/CodeGen/Generic/expand-vp.ll
29

Note that I've still got some reductions to add here but I feel the patch itself is good enough to start reviewing.

frasercrmck added inline comments.Jun 15 2021, 9:46 AM
llvm/include/llvm/IR/IntrinsicInst.h
450

This can probably go in VPReductionIntrinsic

I wonder whether the semantics sections in the documentation should just refer back to the semantics sections of the regular reduction intrinsics instead of replicating them. In the end, we use those when vp reductions are expanded anyway: if the standard reductions switch semantics at some point, we will unwittingly follow suit.

llvm/unittests/IR/VPIntrinsicTest.cpp
62

Good ol' printf debugging

  • rebase
  • remove debug print
  • move isVPReduction into VPReductionIntrinsic
  • flesh out expand-vp.ll test
frasercrmck marked an inline comment as done.Jun 18 2021, 4:07 AM

I wonder whether the semantics sections in the documentation should just refer back to the semantics sections of the regular reduction intrinsics instead of replicating them. In the end, we use those when vp reductions are expanded anyway: if the standard reductions switch semantics at some point, we will unwittingly follow suit.

I think that's a good idea. I found it difficult to strike a balance between providing enough "interesting" information (without making readers jump about the page) and plain copy/paste. I do think it's important to clarify how disabled lanes behave (I hope you agree), but almost everything else is just the base version.

llvm/unittests/IR/VPIntrinsicTest.cpp
62

:)

frasercrmck marked an inline comment as done.Jun 18 2021, 4:08 AM


Disabling lanes is really what makes the difference between these and the regular reduction intrinsics.
There is also the corner case that all lanes are disabled and I am unsure what the return value should be then. Any thoughts on that?


Agreed. I'll update the docs accordingly.

We have a couple of options for what happens when all lanes are disabled. For starters, it follows logically from the definition of the expansion into reduce(select %mask, %v, %neutral) that we just return the neutral element. So that's the "easiest" definition in that sense.

We could return undef too which would be closer to how the other VP intrinsics work.

As for other options, it wouldn't make sense to me to specify that we return any of the vector elements (e.g. v[0]) as they're not conceptually active. And I think poison wouldn't make any sense. Are those the only realistic options?

In terms of hardware (which is orthogonal but may help guide us), RVV always takes a start value so we'd just return the neutral element; I admit I might be unduly swayed by that. The lowering for returning the "neutral element" may be complicated on some targets, involving fetching the active length and perhaps doing some kind of scalar select. How would VE work?
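
For reference, a minimal LLVM IR sketch of the select-based expansion mentioned above, assuming the start operand that was eventually added and ignoring %evl (which the expansion pass folds into the mask first); names are placeholders:

  ; inactive lanes are replaced by the neutral element of the operation (0 for add)
  %masked = select <4 x i1> %mask, <4 x i32> %v, <4 x i32> zeroinitializer
  %sum = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %masked)
  ; fold in the start value; with an all-false mask the result is just %start
  %res = add i32 %start, %sum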


Could we add a start value operand to all of the intrinsics? fadd and fmul already have it. If the vectorizer doesn't have a start value it can pass the neutral value. On targets like RISCV we can use that neutral value directly since we need the scalar input anyway. Other targets can detect that the argument is the neutral value and ignore it if it doesn't mesh with their ISA.

simoll added a comment.Jul 1 2021, 2:49 AM


Feeding back the discussions we had off Phabricator into the review thread:

  1. We agreed to have scalar start value parameters in all vp reduction intrinsics (unlike llvm.vector.reduce.* which only have them where needed for non-reassociatable reductions). This makes them regular and there is no need to match the start value.
  2. There appears to be an issue with the %evl == 0 corner case in that the ISAs (RISC-V and VE) do not update the result register in that case. There would need to be a register constraint between the start value register and the result register to enforce this. This is surely an artifact of the intrinsic reaching into isel. However, I am not sure how big of a problem that really is.
  • add start value to all intrinsics
  • clarify %evl==0 semantics in docs

I wasn't sure whether it's best to say explicitly in the docs that vp reductions have a start value, unlike their unpredicated counterparts. I've left it out for now.

craig.topper added inline comments.Jul 14 2021, 10:06 AM
llvm/docs/LangRef.rst
18582

With the start value added, I think it's the second operand now?

18807

Neutral value is -1 or UINT_MAX for AND

19175

Does -QNAN mean anything? Isn't the sign of a NaN mostly ignored?

19182

Even if all vector elements are NaN, the result would depend on the start value

llvm/include/llvm/IR/IntrinsicInst.h
465

Should this be isVPReduction?

llvm/include/llvm/IR/VPIntrinsics.def
251

Why accu instead of start?

llvm/lib/CodeGen/ExpandVectorPredication.cpp
264

I was wondering if we could go through ConstantExpr::getBinOpIdentity to avoid repeating the constants in multiple places but we'd still need special handling for min/max

llvm/test/Verifier/vp-intrinsics.ll
34

Start value is missing

frasercrmck marked 7 inline comments as done.

  • fix Verifier test
  • fix VPReductionIntrinsic::classof
  • add new unit test
  • update docs:
    • fix references to operand indices
    • fix the neutral element for the AND reduction
    • fix the specified return value for FMIN/FMAX and NaNs
    • be more explicit about the return value when EVL==0
    • be more explicit about the start value
    • reference operands by their names in the Arguments sections
llvm/docs/LangRef.rst
18582

Yes, thanks. I didn't do a proper sweep over the docs. I've made further changes to them so it might be worth going over them again.

18807

Good spot, thanks. Updated the "equivalent to" section below, too.

19175

I have heard that the sign of NaNs is often ignored, but I'm not sure it's guaranteed to be ignored. -QNAN does have a bit representation, and ConstantFP and APFloat both take a Negative parameter in their getQNaN methods. It's also what SelectionDAG::getNeutralElement does for FMAXNUM.

19182

Yes true. I've replaced that sentence now.

llvm/include/llvm/IR/IntrinsicInst.h
456

I'm not sure whether to keep this in or not, now that all vp reductions have a start parameter. Currently the start value is always operand 0 and the vector is always operand 1.

465

Oh good spot, thanks. I've fixed that and added a unit test for these VPReductionIntrinsic methods.

llvm/include/llvm/IR/VPIntrinsics.def
251

I think the reference patch uses/used accu so I was flip-flopping between the two. I agree that "start" is better though.

llvm/lib/CodeGen/ExpandVectorPredication.cpp
264

Coming at it from a slightly different angle, I was wondering if there should be a single source of truth for neutral reduction elements between IR and SelectionDAG.

llvm/test/Verifier/vp-intrinsics.ll
34

Cheers.

  • add logic for legalization similar to regular reductions
  • return correct non-SEQ ISD opcode from reassoc reductions
craig.topper added inline comments.Aug 6 2021, 10:42 AM
llvm/docs/LangRef.rst
18710

vale -> value

llvm/include/llvm/IR/VPIntrinsics.def
241

Is this comment still accurate with start value?

297

Should this be %acc or %start?

313

Should this be lined up with vp_reduce_fadd on the line above?

llvm/lib/CodeGen/ExpandVectorPredication.cpp
376

Is the documentation for this in IRBuilder incorrect? It says the accumulator is for ordered reductions. Is it also used for unordered?

/// Create a vector fadd reduction intrinsic of the source vector.
/// The first parameter is a scalar accumulator value for ordered reductions.
CallInst *CreateFAddReduce(Value *Acc, Value *Src);

/// Create a vector fmul reduction intrinsic of the source vector.
/// The first parameter is a scalar accumulator value for ordered reductions.
CallInst *CreateFMulReduce(Value *Acc, Value *Src);
frasercrmck marked 4 inline comments as done.
  • rebase
  • address wee bits of feedback
frasercrmck marked an inline comment as done.Aug 9 2021, 3:19 AM
frasercrmck added inline comments.
llvm/docs/LangRef.rst
18710

Nice spot

llvm/include/llvm/IR/VPIntrinsics.def
241

Nope, but it is now, thanks for catching that.

297

Aye it should be %start but I've reworked the documentation anyway; this specialized macro is now really for reductions with separate "seq" forms. Cheers.

313

Done, cheers.

llvm/lib/CodeGen/ExpandVectorPredication.cpp
376

Yeah, the unordered reduction intrinsics do indeed have an accumulator, so this is misleading. However, since the only difference between an ordered and an unordered reduction is the presence of the reassoc flag, this method always creates an ordered reduction; it's up to the user to add the flag later. That's what we do in replaceOperation by transferring any existing fast-math flags onto the expanded reduction.
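
As a small sketch of that distinction (values are placeholders): both forms take the accumulator, and only the reassoc fast-math flag on the call marks the reduction as unordered:

  ; ordered (sequential): ((((%acc + v0) + v1) + v2) + v3)
  %ord = call float @llvm.vector.reduce.fadd.v4f32(float %acc, <4 x float> %v)
  ; unordered: reassoc allows the reduction to be reassociated, e.g. into a tree
  %unord = call reassoc float @llvm.vector.reduce.fadd.v4f32(float %acc, <4 x float> %v)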

From the SelectionDAG's perspective the unordered reductions don't have an accumulator, because it's split out early in the SelectionDAGBuilder. I wonder if that's where the confusion came from.

I do still think it's worth addressing this documentation, though. I can do that in a separate patch as it's likely to spawn discussion.

frasercrmck marked an inline comment as done.Aug 9 2021, 4:17 AM
frasercrmck added inline comments.
llvm/lib/CodeGen/ExpandVectorPredication.cpp
376

I tried to address this in D107753.

craig.topper added inline comments.Aug 9 2021, 9:44 AM
llvm/lib/CodeGen/ExpandVectorPredication.cpp
27

What did we start using from Operator.h? I couldn't spot it.

  • remove unused header
frasercrmck marked an inline comment as done.Aug 10 2021, 2:45 AM
frasercrmck added inline comments.
llvm/lib/CodeGen/ExpandVectorPredication.cpp
27

Ah, nice; must have been an artifact from an intermediate change. That's that gone now.

This revision is now accepted and ready to land.Aug 10 2021, 8:30 AM
frasercrmck marked an inline comment as done.Aug 11 2021, 2:28 AM

Thanks for the review, Craig. Does the patch LGTY, @simoll?

  • rebase
  • address clang-tidy warnings
frasercrmck edited the summary of this revision. (Show Details)Aug 17 2021, 10:05 AM
This revision was landed with ongoing or failed builds.Aug 17 2021, 10:06 AM
This revision was automatically updated to reflect the committed changes.