Page MenuHomePhabricator

[VP] Add vector-predicated reduction intrinsics
Needs ReviewPublic

Authored by frasercrmck on Jun 15 2021, 9:41 AM.

Details

Summary

This patch adds vector-predicated ("VP") reduction intrinsics
corresponding to each of the existing unpredicated
llvm.vector.reduce.* versions. Support for expansion on targets
without native vector-predication support is included.

This patch is based on the "reduction slice" of the LLVM-VP
reference patch (https://reviews.llvm.org/D57504).

Diff Detail

Unit TestsFailed

TimeTest
2,900 msx64 debian > libarcher.critical::critical.c
Script: -- : 'RUN: at line 15'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/critical/critical.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/critical.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/critical.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/critical.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/critical/critical.c
2,710 msx64 debian > libarcher.parallel::parallel-simple.c
Script: -- : 'RUN: at line 15'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/parallel/parallel-simple.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-simple.c.tmp -latomic && env OMP_TOOL_VERBOSE_INIT=stderr env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-simple.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-simple.c.tmp.log 2>&1 | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/parallel/parallel-simple.c --check-prefixes CHECK,TSAN_ON
2,670 msx64 debian > libarcher.parallel::parallel-simple2.c
Script: -- : 'RUN: at line 15'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/parallel/parallel-simple2.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-simple2.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-simple2.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/parallel/Output/parallel-simple2.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/parallel/parallel-simple2.c
2,630 msx64 debian > libarcher.races::critical-unrelated.c
Script: -- : 'RUN: at line 13'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/critical-unrelated.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/critical-unrelated.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/deflake.bash /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/critical-unrelated.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/critical-unrelated.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/critical-unrelated.c
2,840 msx64 debian > libarcher.races::lock-nested-unrelated.c
Script: -- : 'RUN: at line 13'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/lock-nested-unrelated.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/lock-nested-unrelated.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/deflake.bash /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/lock-nested-unrelated.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/lock-nested-unrelated.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/lock-nested-unrelated.c
View Full Test Results (17 Failed)

Event Timeline

frasercrmck created this revision.Jun 15 2021, 9:41 AM
frasercrmck requested review of this revision.Jun 15 2021, 9:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 15 2021, 9:41 AM
frasercrmck added inline comments.Jun 15 2021, 9:43 AM
llvm/test/CodeGen/Generic/expand-vp.ll
29

Note that I've still got some reductions to add here but I feel the patch itself is good enough to start reviewing.

frasercrmck added inline comments.Jun 15 2021, 9:46 AM
llvm/include/llvm/IR/IntrinsicInst.h
450

This can probably go in VPReductionIntrinsic

I wonder whether the semantics sections in the documentation should just refer back to semantics sections of the regular reduction intrinsics instead of replicating them. In the end, we use those when vp reduction are expanded anyway: if standard reductions switch semantics at some point, we will too unwittingly.

llvm/unittests/IR/VPIntrinsicTest.cpp
67

Good ol' printf debugging

  • rebase
  • remove debug print
  • move isVPReduction into VPReductionIntrinsic
  • flesh out expand-vp.ll test
frasercrmck marked an inline comment as done.Jun 18 2021, 4:07 AM

I wonder whether the semantics sections in the documentation should just refer back to semantics sections of the regular reduction intrinsics instead of replicating them. In the end, we use those when vp reduction are expanded anyway: if standard reductions switch semantics at some point, we will too unwittingly.

I think that's a good idea. I found it difficult to strike a balance between providing enough "interesting" information that doesn't involve jumping about the page and plain copy/paste. I do think it's important to clarify how disabled lanes behave (I hope you agree) but almost everything else is just the base version.

llvm/unittests/IR/VPIntrinsicTest.cpp
67

:)

frasercrmck marked an inline comment as done.Jun 18 2021, 4:08 AM

I wonder whether the semantics sections in the documentation should just refer back to semantics sections of the regular reduction intrinsics instead of replicating them. In the end, we use those when vp reduction are expanded anyway: if standard reductions switch semantics at some point, we will too unwittingly.

I think that's a good idea. I found it difficult to strike a balance between providing enough "interesting" information that doesn't involve jumping about the page and plain copy/paste. I do think it's important to clarify how disabled lanes behave (I hope you agree) but almost everything else is just the base version.

Disabling lanes is really what makes the difference between these and the regular reduction intrinsics.
There is also the corner case that all lanes are disabled and i am unsure what the return value should be then. Any thoughts on that?

Disabling lanes is really what makes the difference between these and the regular reduction intrinsics.
There is also the corner case that all lanes are disabled and i am unsure what the return value should be then. Any thoughts on that?

Agreed. I'll update the docs accordingly.

We have a couple of options for what happens when all lanes are disabled. For starters, it follows logically from the definition of the expansion into reduce(select %mask, %v, %neutral) that we just return the neutral element. So that's the "easiest" definition in that sense.

We could return undef too which would be closer to how the other VP intrinsics work.

As for other options, it wouldn't make sense to me to specify that we return any of the vector elements (e.g. v[0]) as they're not conceptually active. And I think poison wouldn't make any sense. Are those the only realistic options?

In terms of hardware (which is orthogonal but may help guide us), RVV always takes a start value so we'd just return the neutral element; I admit I might be surreptitiously led by that. The lowering for returning the "neutral element" may be complicated on some targets, involving fetching the active length and perhaps doing some kind of scalar select. How would VE work?

Disabling lanes is really what makes the difference between these and the regular reduction intrinsics.
There is also the corner case that all lanes are disabled and i am unsure what the return value should be then. Any thoughts on that?

Agreed. I'll update the docs accordingly.

We have a couple of options for what happens when all lanes are disabled. For starters, it follows logically from the definition of the expansion into reduce(select %mask, %v, %neutral) that we just return the neutral element. So that's the "easiest" definition in that sense.

We could return undef too which would be closer to how the other VP intrinsics work.

As for other options, it wouldn't make sense to me to specify that we return any of the vector elements (e.g. v[0]) as they're not conceptually active. And I think poison wouldn't make any sense. Are those the only realistic options?

In terms of hardware (which is orthogonal but may help guide us), RVV always takes a start value so we'd just return the neutral element; I admit I might be surreptitiously led by that. The lowering for returning the "neutral element" may be complicated on some targets, involving fetching the active length and perhaps doing some kind of scalar select. How would VE work?

Could we add a start value operand to all of the intrinsics? fadd and fmul already have it. If the vectorizer doesn't have a value it can put the neutral value. On targets like RISCV we can use that neutral value directly since we need the scalar input anyway. For other targets they can detect that the argument is neutral and not use it if it doesn't mesh with their ISA.

simoll added a comment.Thu, Jul 1, 2:49 AM

Disabling lanes is really what makes the difference between these and the regular reduction intrinsics.
There is also the corner case that all lanes are disabled and i am unsure what the return value should be then. Any thoughts on that?

Agreed. I'll update the docs accordingly.

We have a couple of options for what happens when all lanes are disabled. For starters, it follows logically from the definition of the expansion into reduce(select %mask, %v, %neutral) that we just return the neutral element. So that's the "easiest" definition in that sense.

We could return undef too which would be closer to how the other VP intrinsics work.

As for other options, it wouldn't make sense to me to specify that we return any of the vector elements (e.g. v[0]) as they're not conceptually active. And I think poison wouldn't make any sense. Are those the only realistic options?

In terms of hardware (which is orthogonal but may help guide us), RVV always takes a start value so we'd just return the neutral element; I admit I might be surreptitiously led by that. The lowering for returning the "neutral element" may be complicated on some targets, involving fetching the active length and perhaps doing some kind of scalar select. How would VE work?

Could we add a start value operand to all of the intrinsics? fadd and fmul already have it. If the vectorizer doesn't have a value it can put the neutral value. On targets like RISCV we can use that neutral value directly since we need the scalar input anyway. For other targets they can detect that the argument is neutral and not use it if it doesn't mesh with their ISA.

Feeding back the discussions we had off Phabricator into the review thread:

  1. We agreed to have scalar start value parameters in all vp reduction intrinsics (unlike llvm.vector.reduce.* which only have them where needed for non-reassociatable reductions). This makes them regular and there is no need to match the start value.
  2. There appears to be an issue with the %evl == 0 corner case in that the ISAs (RISC- V and VE) do not update the result register in that case. There would need to be a register constraint between the start value register and the result register to enforce this. This sure is an artifact of the intrinsic reaching into isel. However, I am not sure how big of a problem that really is.
  • add start value to all intrinsics
  • clarify %evl==0 semantics in docs

Wasn't sure if it's best to explicitly say in the docs that vp reductions have a start value unlike their unpredicated counterparts. I've left it out for now.

craig.topper added inline comments.Wed, Jul 14, 10:06 AM
llvm/docs/LangRef.rst
18582

With the start value added, I think it's the second operand now?

18807

Neutral value is -1 or UINT_MAX for AND

19175

Does -QNAN mean anything? Isn't the sign of a NaN mostly ignored.

19182

Even if all vector elements are NaN, the result would depend on the start value

llvm/include/llvm/IR/IntrinsicInst.h
465

Should this be isVPReduction?

llvm/include/llvm/IR/VPIntrinsics.def
251

Why accu instead of start?

llvm/lib/CodeGen/ExpandVectorPredication.cpp
264

I was wondering if we could go through ConstantExpr::getBinOpIdentity to avoid repeating the constants in multiple places but we'd still need special handling for min/max

llvm/test/Verifier/vp-intrinsics.ll
34

Start value is missing

frasercrmck marked 7 inline comments as done.

fix Verifier test
fix VPReductionIntrinsic::classof
add new unit test
update docs:

  • fix references to operand indices
  • fix neutral element for AND reduction
  • fix specified return value for FMIN/FMAX and NaNs
  • be more explicit about return value EVL==0
  • be more explicit about start value
  • reference operands by their names in Arguments sections
llvm/docs/LangRef.rst
18582

Yes, thanks. I didn't do a proper sweep over the docs. I've made further changes to them so it might be worth going over them again.

18807

Good spot, thanks. Updated the "equivalent to" section below, too.

19175

I have heard that the sign of NaNs is often ignored but I'm not sure it's guaranteed to be ignored? -QNAN does have a bit representation and ConstantFP and APFloat both take a Negative parameter to their getQNAN methods. It's also what SelectionDAG::getNeutralElement is doing for FMAXNUM.

19182

Yes true. I've replaced that sentence now.

llvm/include/llvm/IR/IntrinsicInst.h
456

Not sure whether to keep this in or not, now that all vp reductions have start parameter? Currently start is always 0 and vector is always 1.

465

Oh good spot, thanks. I've fixed that and added a unit test for these VPReductionIntrinsic methods.

llvm/include/llvm/IR/VPIntrinsics.def
251

I think the reference patch uses/used accu so I was flip-flopping between the two. I agree that "start" is better though.

llvm/lib/CodeGen/ExpandVectorPredication.cpp
264

Coming at it from a slightly different angle, I was wondering if there should be a single source of truth for neutral reduction elements between IR and SelectionDAG.

llvm/test/Verifier/vp-intrinsics.ll
34

Cheers.