This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
include/llvm/Analysis/
-
llvm/
-
Analysis/
-
VecFuncs.def
-
lib/Analysis/
-
Analysis/
1/2
LoopAccessAnalysis.cpp
-
test/Transforms/LoopVectorize/X86/
-
Transforms/
-
LoopVectorize/
-
X86/
4
libm-vector-calls-VF2-VF8.ll
-
libm-vector-calls.ll

Differential D116879

[llvm] Allow forced auto-vectorization of sincos() using libmvec
Needs ReviewPublic

Authored by tim.schmielau on Jan 9 2022, 12:29 AM.

Download Raw Diff

Details

Reviewers

venkataramanan.kumar.llvm
abique
fpetrogalli
rengolin
spatel
fhahn
jdoerfert
RKSimon

Commits

rG57cdc52c4df0: Initial support for vectorization using Libmvec (GLIBC vector math library)

Summary

Currently auto-vectorization lacks the ability to analyze
memory dependencies caused by function calls, only dependencies
caused by explicit load and store instructions are considered.

In order to still be able to vectorize loops with calls
to basic mathematical functions, any function listed in
include/llvm/Analysis/VecFuncs.def was implicitly assumed
to be safe.

This prevents addition of sincos() and other functions returning
multiple values via pointer operands to VecFuncs.def.

As a first step we only vectorize functions with pointer
arguments if the user forcibly skips dependency checks
via #pragma clang loop vectorize(assume_safety).

Diff Detail

Repository: rG LLVM Github Monorepo

Unit TestsFailed

	Time	Test
	50 ms	x64 debian > LLVM.Transforms/LoopVectorize/AArch64::scalable-call.ll

Event Timeline

tim.schmielau created this revision.Jan 9 2022, 12:29 AM

Herald added subscribers: pengfei, dmgreen. · View Herald TranscriptJan 9 2022, 12:29 AM

tim.schmielau requested review of this revision.Jan 9 2022, 12:29 AM

Herald added a project: Restricted Project. · View Herald TranscriptJan 9 2022, 12:29 AM

Herald added a subscriber: llvm-commits. · View Herald Transcript

Adding reviewers who commented on D88154.

tim.schmielau added a commit: rG57cdc52c4df0: Initial support for vectorization using Libmvec (GLIBC vector math library).Jan 9 2022, 12:36 AM

tim.schmielau edited the summary of this revision. (Show Details)Jan 9 2022, 12:46 AM

Of the two reproducers linked in https://bugs.llvm.org/show_bug.cgi?id=51530, the sincos_simd.cpp now auto-vectorizes when compiling with clang++ -fveclib=libmvec -O2 -march=core-avx2 sincos_simd.cpp, even when the #pragma omp simd annotation is left out in the source.

On the templated example sincos_simd_template.cpp a helping hand from the user is needed by annotating the loop with #pragma omp simd and adding -fopenmp to the compiler flags, but at least vectorization is now possible without the user having to manually substitute function calls.

In line with existing behavior, I have not added vector function definitions for AVX-512.

I have not included vector function definitions for @llvm.sincos.f(32|64) either as I believe nothing would be able to generate them.
They would be straightforward and I could add them if requested.

I've added regression tests to the same extend as for the existing, i.e. checking auto-vectorisation with float and double for vector widths 2, 4 and 8 where those are covered by SSE or AVX-2.
At some point we might want to limit the number of tests. In that case I'd still recommend keeping all tests I've added, because of the unique signature of sincos() producing two results from one call.
Instead I'd suggest to prune the existing coverage of combinations of sin(), cos(), vector lengths, float/double and libm functions / LLVM intrinsics.

We might also consider updating the documentation by mentioning that further functions might be auto-vectorized on some platforms as well as some combinations not being supported by others, Then again, a conscient user would likely be able to deduce that themselves.

I haven't figured out how to link to https://bugs.llvm.org/show_bug.cgi?id=51530 so that it would automatically get closed on merging. I suppose this isn't possible anymore with bugzilla being frozen?

Harbormaster completed remote builds in B142271: Diff 398397.Jan 9 2022, 1:13 AM

tim.schmielau edited the summary of this revision. (Show Details)Jan 9 2022, 1:15 AM

A potential issue is that working sincos() support in libmvec is "only" a bit over 5 years old.
E.g. a user on Centos 7 compiling code with a vectorizable call to sincos() and compiling with -fveclib=libmvec and -O2 or higher will now run into an undefined reference to the vectorized sincos() function and will either have to deactivate auto-vectorization or update their libmvec.

Given the very specific circumstances where this applies I am not sure whether to consider that a bug or a feature - a user that much concerned with performance of numerical codes might well appreciate the change.

In D116879#3229779, @tim.schmielau wrote:

I haven't figured out how to link to https://bugs.llvm.org/show_bug.cgi?id=51530 so that it would automatically get closed on merging. I suppose this isn't possible anymore with bugzilla being frozen?

Correct. Bugzilla won't change any more. Any changes will be done on Github. I'm not sure if Phab has a way to auto-close a Github issue, though.

This patch looks good to me, pretty straight forward stuff, but I'm not a libmvec expert. Please wait until the relevant folks have a look at it and approve.

Thanks!

The patch looks good to me. But please wait for the approval from other vectorization experts in the reviewer list.

Thank you.
I do not have commit access anyway, so can someone please commit once review is sufficient.

tim.schmielau added a reviewer: jdoerfert.Jan 9 2022, 1:40 PM

RKSimon added a reviewer: RKSimon.Jan 9 2022, 2:57 PM

RKSimon added a subscriber: RKSimon.

RKSimon added inline comments.

llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls-VF2-VF8.ll
362	Is this correct? This looks like it creates a sincos signature that takes vectors of pointers to doubles, but I expect most sincos vector implementations to actually use pointers to vectors of doubles. Something like: void @sincos(<2 x double>, <2 x double>, <2 x double>) I hit something almost identical here: https://llvm.org/PR38424

jdoerfert added a subscriber: ye-luo.Jan 9 2022, 4:10 PM

tim.schmielau added inline comments.Jan 9 2022, 6:42 PM

llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls-VF2-VF8.ll

362

I stumbled over this as well. Unfortunately the libmvec Vector ABI Spec isn't particularly enlightening on the matter:

2.3. Element Data Type to Vector Data Type Mapping
 
The vector data types for parameters are selected depending on ISA, vector length, data type of original parameter, and parameter specification.
For uniform and linear parameters (detailed description could be found in [1]), the original data type is preserved.
For vector parameters, vector data types are selected by the compiler. The mapping from element data type to vector data type is described as below.
* The bit size of vector data type of parameter is computed as: 
size_of_vector_data_type = VLEN * sizeof(original_parameter_data_type) * 8
For instance, for SSE version of vector function with parameter data type "int":
If VLEN = 4, size_of_vector_data_type = 4 * 4 * 8 = 128 (bits), which means one argument of type __m128 to be passed.
* If the size_of_vector_data_type is greater than the width of the vector register, multiple vector registers are selected and the parameter will be passed in multiple vector registers.
For instance, for SSE version of vector function with parameter data type "int": If VLEN = 8, size_of_vector_data_type = 8 * 4 * 8 = 256 (bits), so the vector data type is __m256, which means 2 arguments of type __m128 are to be passed.

I interpret that as the vvv part of the signature indicating the three scalar arguments as being duplicated inside vector registers, which would make the last two arguments vectors of pointers, rather than pointers to vectors. I also tested that the generated code actually works with libmvec.
However, given the lack of specific mention of pointers in the vector ABI spec I don't feel particularly confident about my interpretation.

rengolin added inline comments.Jan 10 2022, 1:02 AM

llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls-VF2-VF8.ll
362	Good catch! I totally missed that. Tim, how did you test this? It's possible that vector of pointers "just worked" on X86 because it's supported, but this would probably break on non-SVE Arm. Regardless, that's the wrong implementation, we want just vectors. Can you share the asm output of this sequence you're getting?

tim.schmielau added inline comments.Jan 10 2022, 12:36 PM

llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls-VF2-VF8.ll
362	[un-inlining the discussion, as testcase + asm output are somewhat lengthy]

I have beefed up my testcase to demonstrate why I had to choose the _ZGVdN4vvv_sincos() variant for correctness, even though _ZGVdN4vl8l8_sincos() would be desirable from a performance perspective:
We have no control over what pointers the user is passing in in different loop iterations.

sincosarr.cpp:

#include <math.h>

void sincos_arr(double* sines, double* cosines, double* phases, int* indices, int size) {
#pragma unroll 1
    for (int i=0; i<size; i++) {
        sincos(phases[indices[i]], sines+indices[i], cosines+indices[i]);
    }
}

main.cpp:

#include <stdio.h>
#include <math.h>

void sincos_arr(double* sins, double* coses, double* phases, int* indices, int size);

int main()
{
    const int N=32;
    int indices[N];
    double phases[N], sins[N], coses[N];
    for (int i=0; i<N; i++) {
        phases[i] = i;
        indices[i] = (i < 2) ? 1 : (indices[i-2] + indices[i-1]) % N;
    }
    sincos_arr(sins, coses, phases, indices, N);
    for (int i=0; i<N; i++) {
        int j = indices[i];
        printf("sin(%2d) == %10f == %10f | cos(%2d) == %10f == %10f\n",
               j, sin(phases[j]), sins[j],
               j, cos(phases[j]), coses[j]);
    }
    return 0;
}

Vectorized inner loop x86 assembly from clang++ -march=core-avx2 -fveclib=libmvec -O2 -S sincosarr.cpp:

    .p2align    4, 0x90
.LBB0_4:                                # =>This Inner Loop Header: Depth=1
    vpmovsxdq   (%r14,%r12), %ymm1
    vpextrq $1, %xmm1, %rax
    vextracti128    $1, %ymm1, %xmm0
    vpextrq $1, %xmm0, %rcx
    vmovq   %xmm0, %rdx
    vmovsd  (%rbx,%rdx,8), %xmm0            # xmm0 = mem[0],zero
    vmovhps (%rbx,%rcx,8), %xmm0, %xmm0     # xmm0 = xmm0[0,1],mem[0,1]
    vmovq   %xmm1, %rcx
    vmovsd  (%rbx,%rcx,8), %xmm2            # xmm2 = mem[0],zero
    vmovhps (%rbx,%rax,8), %xmm2, %xmm2     # xmm2 = xmm2[0,1],mem[0,1]
    vinsertf128 $1, %xmm0, %ymm2, %ymm0
    vpsllq  $3, %ymm1, %ymm2
    vpaddq  48(%rsp), %ymm2, %ymm1          # 32-byte Folded Reload
    vpaddq  16(%rsp), %ymm2, %ymm2          # 32-byte Folded Reload
    callq   _ZGVdN4vvv_sincos
    addq    $16, %r12
    cmpq    %r12, %r15
    jne .LBB0_4

And a variant of the code above shows that even the transformation to the vvv variant isn't safe in all cases.
I am looking into adding variations of the code above into the test-suite ahead of enabling the vectorizing transformation, to be sure the transformation is not applied when unsafe, and that the behavior of the underlying vector library matches my interpretation of the VectorAPI.
I don't see any existing tests around the transformations already performed.
Also, I see lots of regression tests to prevent potential performance regressions, ensuring the vectorizing transformation is not missed. But I don't see any tests currently to guard against correctness issues, ensuring the transformation is not applied in unsafe cases.

I think there is groundwork to be done before this change can be made in confidence. So please do not yet commit, even if someone should approve.

In D116879#3232537, @tim.schmielau wrote:

I have beefed up my testcase to demonstrate why I had to choose the _ZGVdN4vvv_sincos() variant for correctness, even though _ZGVdN4vl8l8_sincos() would be desirable from a performance perspective:
We have no control over what pointers the user is passing in in different loop iterations.

That is true, but the vectoriser won't generate code that it deems unsafe (no known bounds, aliasing) and that's why I'm assuming you need the pragmas to force vectorisation in your tests.

In your example below, size is known and the compiler assumes access to the [ith] element from each pointer is sane (even though it could be undefined), and can vectorise the call inside the loop, regardless of what the original pointers had in hand.

And a variant of the code above shows that even the transformation to the vvv variant isn't safe in all cases.

Is this with pragma or without? The compiler sometimes treats pragmas as "the user said it's safe, then it probably is".

Does this code generate unsafe transformations without any pragma or forced parameters?

I am looking into adding variations of the code above into the test-suite ahead of enabling the vectorizing transformation, to be sure the transformation is not applied when unsafe, and that the behavior of the underlying vector library matches my interpretation of the VectorAPI.
I don't see any existing tests around the transformations already performed.

Awesome, thanks!

@fpetrogalli @spatel @fhahn do you know of any tests for math library vectorisation?

Also, I see lots of regression tests to prevent potential performance regressions, ensuring the vectorizing transformation is not missed. But I don't see any tests currently to guard against correctness issues, ensuring the transformation is not applied in unsafe cases.

It's harder to create adversarial tests than benign ones, unfortunately. That's not an excuse, just a fact. :)

It'd be awesome if we had more of such tests... (wink)

I think there is groundwork to be done before this change can be made in confidence. So please do not yet commit, even if someone should approve.

Ack. We usually don't merge other people's patches unless they ask for it (like those that don't have commit permissions), so we should be safe.

Matt added a subscriber: Matt.Jan 25 2022, 3:13 PM

tim.schmielau mentioned this in D120977: [test-suite] Add unit test for libmvec sincos() auto-vectorisation.Mar 4 2022, 12:39 AM

I have just submitted the test for the testsuite. Once that is merged, this change should also be fine to go in.

And a variant of the code above shows that even the transformation to the vvv variant isn't safe in all cases.

I must have made a mistake there, as I cannot reproduce the failure anymore.

I'm assuming you need the pragmas to force vectorisation in your tests.

The pragma is only for convenience to shorten the loop body for manual inspection. It has no other impact on the result.

We usually don't merge other people's patches unless they ask for it (like those that don't have commit permissions), so we should be safe.

I just wanted to be explicit, as further up I had already requested merging.

Herald added a project: Restricted Project. · View Herald TranscriptMar 4 2022, 1:08 AM

It's been a while since I last looked at this. However there don't seem to be many users of -fveclib=libmvec in general as demonstrated by the fact that it has been broken since before I've submitted this patch.
Before returning to allowing sincos() vectorization, I've thus submitted a separate patch to reenable vectorization of all other functions declared in VecFuncs.def already: D134732.

Herald added a subscriber: • pcwang-thead. · View Herald TranscriptSep 27 2022, 5:28 AM

To finally get this out of the way without having the necessary analysis available, I am retreating a bit and just allow forced auto-vectorization via #pragma clang loop vectorize(assume_safety).

Herald added a subscriber: hiraditya. · View Herald TranscriptSep 28 2022, 6:07 AM

Harbormaster completed remote builds in B189148: Diff 463517.Sep 28 2022, 6:08 AM

tim.schmielau added a parent revision: D134732: fix vectorization of library calls.Sep 28 2022, 6:14 AM

tim.schmielau removed a parent revision: D134732: fix vectorization of library calls.Sep 29 2022, 7:34 AM

tim.schmielau added a parent revision: D134732: fix vectorization of library calls.

I'm struggling to find much documentation for libmvec other than the VectorABI.txt you mentioned - do you happen to know where we can review a reference header or source code please? I've tried searching by there's plenty of libs out there with similar names....

I'm still not sure the vector-of-pointers pattern what we want to support tbh

llvm/lib/Analysis/LoopAccessAnalysis.cpp
2230–2231	I'm not sure this will work well once we want to handle sincos(<2 x double>, <2 x double>, <2 x double>) style patterns as well.

Reset parent to reflect reviewer comment there (removing duplicate call && test).

The failing LLVM.Transforms/LoopVectorize/AArch64::scalable-call.ll test makes me wonder what the exact semantics of the vector-function-abi-variant call attribute is in the presence of pointer arguments.

Apparently it does not imply vectorization is safe, because then we wouldn't need to invoke LoopAccessAnalysis at all. But how is LoopAccessAnalysis supposed to work if we don't know the length of the array pointed at? Do we assume pointers always point to a single element?

tim.schmielau added inline comments.Sep 29 2022, 8:46 AM

llvm/lib/Analysis/LoopAccessAnalysis.cpp
2230–2231	It is indeed only meant as a stopgap solution - don't vectorize any function with pointer arguments unless the user has asserted it is safe to do so. How would your pattern cause an issue if the user has already asserted it is safe to vectorize? (On an unrelated note I should probably `recordAnalysis()` though when rejecting vectorization.)

Harbormaster completed remote builds in B189415: Diff 463900.Sep 29 2022, 9:18 AM

In D116879#3823875, @RKSimon wrote:

I'm struggling to find much documentation for libmvec other than the VectorABI.txt you mentioned - do you happen to know where we can review a reference header or source code please? I've tried searching by there's plenty of libs out there with similar names....

I'm still not sure the vector-of-pointers pattern what we want to support tbh

I find the libmvec sources hard to read, and agree there isn't much useful material out there (at least not that I am aware of).
Most illuminating is probably the commit that changed the sincos() vector variant signature to what it currently is: https://sourceware.org/git/?p=glibc.git;a=commit;h=ee2196bb6766ca7e63a1ba22ebb7619a3266776a

Here is the discussion that lead up to the glibc / libmvec commit linked above: https://marc.info/?t=146472287500003

The glibc Bugzilla ticket probably makes the best reading so far: https://sourceware.org/bugzilla/show_bug.cgi?id=20024

Hi @tim.schmielau, your assumption that libm-vec expects vectors filled with pointers is correct, as shown by https://elixir.bootlin.com/glibc/glibc-2.37.9000/source/sysdeps/x86_64/fpu/test-vector-abi-sincos.h#L46.

Do you plan on continuing to work on this patch?

Herald added subscribers: wangpc, StephenFan. · View Herald TranscriptJul 13 2023, 2:34 AM

Revision Contents

Path

Size

llvm/

include/

llvm/

Analysis/

VecFuncs.def

6 lines

lib/

Analysis/

LoopAccessAnalysis.cpp

16 lines

test/

Transforms/

LoopVectorize/

X86/

libm-vector-calls-VF2-VF8.ll

65 lines

libm-vector-calls.ll

119 lines

Diff 463900

llvm/include/llvm/Analysis/VecFuncs.def

	Show First 20 Lines • Show All 139 Lines • ▼ Show 20 Lines
	TLI_DEFINE_VECFUNC("cosf", "_ZGVdN8v_cosf", FIXED(8))			TLI_DEFINE_VECFUNC("cosf", "_ZGVdN8v_cosf", FIXED(8))

	TLI_DEFINE_VECFUNC("llvm.cos.f64", "_ZGVbN2v_cos", FIXED(2))			TLI_DEFINE_VECFUNC("llvm.cos.f64", "_ZGVbN2v_cos", FIXED(2))
	TLI_DEFINE_VECFUNC("llvm.cos.f64", "_ZGVdN4v_cos", FIXED(4))			TLI_DEFINE_VECFUNC("llvm.cos.f64", "_ZGVdN4v_cos", FIXED(4))

	TLI_DEFINE_VECFUNC("llvm.cos.f32", "_ZGVbN4v_cosf", FIXED(4))			TLI_DEFINE_VECFUNC("llvm.cos.f32", "_ZGVbN4v_cosf", FIXED(4))
	TLI_DEFINE_VECFUNC("llvm.cos.f32", "_ZGVdN8v_cosf", FIXED(8))			TLI_DEFINE_VECFUNC("llvm.cos.f32", "_ZGVdN8v_cosf", FIXED(8))

				TLI_DEFINE_VECFUNC("sincos", "_ZGVbN2vvv_sincos", FIXED(2))
				TLI_DEFINE_VECFUNC("sincos", "_ZGVdN4vvv_sincos", FIXED(4))

				TLI_DEFINE_VECFUNC("sincosf", "_ZGVbN4vvv_sincosf", FIXED(4))
				TLI_DEFINE_VECFUNC("sincosf", "_ZGVdN8vvv_sincosf", FIXED(8))

	TLI_DEFINE_VECFUNC("pow", "_ZGVbN2vv_pow", FIXED(2))			TLI_DEFINE_VECFUNC("pow", "_ZGVbN2vv_pow", FIXED(2))
	TLI_DEFINE_VECFUNC("pow", "_ZGVdN4vv_pow", FIXED(4))			TLI_DEFINE_VECFUNC("pow", "_ZGVdN4vv_pow", FIXED(4))

	TLI_DEFINE_VECFUNC("powf", "_ZGVbN4vv_powf", FIXED(4))			TLI_DEFINE_VECFUNC("powf", "_ZGVbN4vv_powf", FIXED(4))
	TLI_DEFINE_VECFUNC("powf", "_ZGVdN8vv_powf", FIXED(8))			TLI_DEFINE_VECFUNC("powf", "_ZGVdN8vv_powf", FIXED(8))

	TLI_DEFINE_VECFUNC("__pow_finite", "_ZGVbN2vv___pow_finite", FIXED(2))			TLI_DEFINE_VECFUNC("__pow_finite", "_ZGVbN2vv___pow_finite", FIXED(2))
	TLI_DEFINE_VECFUNC("__pow_finite", "_ZGVdN4vv___pow_finite", FIXED(4))			TLI_DEFINE_VECFUNC("__pow_finite", "_ZGVdN4vv___pow_finite", FIXED(4))
	▲ Show 20 Lines • Show All 324 Lines • Show Last 20 Lines

llvm/lib/Analysis/LoopAccessAnalysis.cpp

//===- LoopAccessAnalysis.cpp - Loop Access Analysis Implementation --------==//		//===- LoopAccessAnalysis.cpp - Loop Access Analysis Implementation --------==//
		Lint: Lint Inline Actions clang-format not found in user’s local PATH; not linting file. Lint: Lint: clang-format not found in user’s local PATH; not linting file.
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
▲ Show 20 Lines • Show All 2,199 Lines • ▼ Show 20 Lines	for (Instruction &I : *BB) {
if (auto *Call = dyn_cast<CallInst>(&I)) {		if (auto *Call = dyn_cast<CallInst>(&I)) {
// Many math library functions read the rounding mode. We will only		// Many math library functions read the rounding mode. We will only
// vectorize a loop if it contains known function calls that don't set		// vectorize a loop if it contains known function calls that don't set
// the flag. Therefore, it is safe to ignore this read from memory.		// the flag. Therefore, it is safe to ignore this read from memory.
if (getVectorIntrinsicIDForCall(Call, TLI))		if (getVectorIntrinsicIDForCall(Call, TLI))
continue;		continue;

// If the function has an explicit vectorized counterpart, we can safely		// If the function has an explicit vectorized counterpart, we can safely
// assume that it can be vectorized.		// assume that it can be vectorized unless it has pointer arguments.
if (!Call->isNoBuiltin() && Call->getCalledFunction() &&		if (!Call->isNoBuiltin() && Call->getCalledFunction() &&
!VFDatabase::getMappings(*Call).empty())		!VFDatabase::getMappings(*Call).empty()) {
		// Don't even check if the user asked for vectorization.
		if (IsAnnotatedParallel)
continue;		continue;
		// Scan arguments for pointers, which currently prevent vectorization.
		for (Value *Arg : Call->args()) {
		if (Arg->getType()->isPointerTy()) {
		HasComplexMemInst = true;
		continue;
		}
		}
		// No pointer arguments: safe to vectorize.
		continue;
		}
		RKSimonUnsubmitted Not Done Reply Inline Actions I'm not sure this will work well once we want to handle sincos(<2 x double>, <2 x double>, <2 x double>) style patterns as well. RKSimon: I'm not sure this will work well once we want to handle sincos(<2 x double>, <2 x double>*, <2…
		tim.schmielauAuthorUnsubmitted Done Reply Inline Actions It is indeed only meant as a stopgap solution - don't vectorize any function with pointer arguments unless the user has asserted it is safe to do so. How would your pattern cause an issue if the user has already asserted it is safe to vectorize? (On an unrelated note I should probably `recordAnalysis()` though when rejecting vectorization.) tim.schmielau: It is indeed only meant as a stopgap solution - don't vectorize any function with pointer…
}		}

// If this is a load, save it. If this instruction can read from memory		// If this is a load, save it. If this instruction can read from memory
// but is not a load, then we quit. Notice that we don't handle function		// but is not a load, then we quit. Notice that we don't handle function
// calls that read or write.		// calls that read or write.
if (I.mayReadFromMemory()) {		if (I.mayReadFromMemory()) {
auto *Ld = dyn_cast<LoadInst>(&I);		auto *Ld = dyn_cast<LoadInst>(&I);
if (!Ld) {		if (!Ld) {
▲ Show 20 Lines • Show All 514 Lines • Show Last 20 Lines

llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls-VF2-VF8.ll

	Show First 20 Lines • Show All 350 Lines • ▼ Show 20 Lines
	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret void			ret void
	}			}

	!131 = distinct !{!131, !132, !133}			!131 = distinct !{!131, !132, !133}
	!132 = !{!"llvm.loop.vectorize.width", i32 8}			!132 = !{!"llvm.loop.vectorize.width", i32 8}
	!133 = !{!"llvm.loop.vectorize.enable", i1 true}			!133 = !{!"llvm.loop.vectorize.enable", i1 true}

				define void @sincos_f64(double* nocapture noalias %sinarray, double* nocapture noalias %cosarray) {
				; CHECK-LABEL: @sincos_f64(
				; CHECK-LABEL: vector.body
				; CHECK: call void @_ZGVbN2vvv_sincos(<2 x double> [[TMP4:%.]], <2 x double> [[TMP5:%.]], <2 x double> [[TMP6:%.*]])
				RKSimonUnsubmitted Not Done Reply Inline Actions Is this correct? This looks like it creates a sincos signature that takes vectors of pointers to doubles, but I expect most sincos vector implementations to actually use pointers to vectors of doubles. Something like: void @sincos(<2 x double>, <2 x double>, <2 x double>) I hit something almost identical here: https://llvm.org/PR38424 RKSimon: Is this correct? This looks like it creates a sincos signature that takes vectors of pointers…
				tim.schmielauAuthorUnsubmitted Not Done Reply Inline Actions I stumbled over this as well. Unfortunately the libmvec Vector ABI Spec isn't particularly enlightening on the matter: 2.3. Element Data Type to Vector Data Type Mapping The vector data types for parameters are selected depending on ISA, vector length, data type of original parameter, and parameter specification. For uniform and linear parameters (detailed description could be found in [1]), the original data type is preserved. For vector parameters, vector data types are selected by the compiler. The mapping from element data type to vector data type is described as below. * The bit size of vector data type of parameter is computed as: size_of_vector_data_type = VLEN * sizeof(original_parameter_data_type) * 8 For instance, for SSE version of vector function with parameter data type "int": If VLEN = 4, size_of_vector_data_type = 4 * 4 * 8 = 128 (bits), which means one argument of type __m128 to be passed. * If the size_of_vector_data_type is greater than the width of the vector register, multiple vector registers are selected and the parameter will be passed in multiple vector registers. For instance, for SSE version of vector function with parameter data type "int": If VLEN = 8, size_of_vector_data_type = 8 * 4 * 8 = 256 (bits), so the vector data type is __m256, which means 2 arguments of type __m128 are to be passed. I interpret that as the `vvv` part of the signature indicating the three scalar arguments as being duplicated inside vector registers, which would make the last two arguments vectors of pointers, rather than pointers to vectors. I also tested that the generated code actually works with libmvec. However, given the lack of specific mention of pointers in the vector ABI spec I don't feel particularly confident about my interpretation. tim.schmielau: I stumbled over this as well. Unfortunately the [[ https://sourceware.org/glibc/wiki/libmvec?
				rengolinUnsubmitted Not Done Reply Inline Actions Good catch! I totally missed that. Tim, how did you test this? It's possible that vector of pointers "just worked" on X86 because it's supported, but this would probably break on non-SVE Arm. Regardless, that's the wrong implementation, we want just vectors. Can you share the asm output of this sequence you're getting? rengolin: Good catch! I totally missed that. Tim, how did you test this? It's possible that vector of…
				tim.schmielauAuthorUnsubmitted Not Done Reply Inline Actions [un-inlining the discussion, as testcase + asm output are somewhat lengthy] tim.schmielau: [un-inlining the discussion, as testcase + asm output are somewhat lengthy]
				;
				entry:
				br label %for.body

				for.body:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%t = trunc i64 %iv to i32
				%conv = sitofp i32 %t to double
				%sinptr = getelementptr inbounds double, double* %sinarray, i64 %iv
				%cosptr = getelementptr inbounds double, double* %cosarray, i64 %iv
				call void @sincos(double %conv, double* %sinptr, double* %cosptr), !llvm.access.group !145
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond = icmp eq i64 %iv.next, 1000
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !141

				for.end:
				ret void
				}

				!141 = distinct !{!141, !142, !143, !144}
				!142 = !{!"llvm.loop.vectorize.width", i32 2}
				!143 = !{!"llvm.loop.vectorize.enable", i1 true}
				!144 = !{!"llvm.loop.parallel_accesses", !145}
				!145 = distinct !{}

				define void @sincos_f32(float* nocapture noalias %sinarray, float* nocapture noalias %cosarray) {
				; CHECK-LABEL: @sincos_f32(
				; CHECK-LABEL: vector.body
				; CHECK: call void @_ZGVdN8vvv_sincosf(<8 x float> [[TMP4:%.]], <8 x float> [[TMP5:%.]], <8 x float> [[TMP6:%.*]])
				;
				entry:
				br label %for.body

				for.body:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%t = trunc i64 %iv to i32
				%conv = sitofp i32 %t to float
				%sinptr = getelementptr inbounds float, float* %sinarray, i64 %iv
				%cosptr = getelementptr inbounds float, float* %cosarray, i64 %iv
				call void @sincosf(float %conv, float* %sinptr, float* %cosptr), !llvm.access.group !155
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond = icmp eq i64 %iv.next, 1000
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !151

				for.end:
				ret void
				}

				!151 = distinct !{!151, !152, !153, !154}
				!152 = !{!"llvm.loop.vectorize.width", i32 8}
				!153 = !{!"llvm.loop.vectorize.enable", i1 true}
				!154 = !{!"llvm.loop.parallel_accesses", !155}
				!155 = distinct !{}

				; CHECK-LABEL: ; Function Attrs:

	; functions are in fact "readnone" but clang only emits the weaker "writeonly" as other math functions may write errno.			; functions are in fact "readnone" but clang only emits the weaker "writeonly" as other math functions may write errno.
	attributes #0 = { nounwind writeonly }			attributes #0 = { nounwind writeonly }

	declare double @sin(double) #0			declare double @sin(double) #0
	declare float @sinf(float) #0			declare float @sinf(float) #0
	declare double @llvm.sin.f64(double) #0			declare double @llvm.sin.f64(double) #0
	declare float @llvm.sin.f32(float) #0			declare float @llvm.sin.f32(float) #0
	declare double @cos(double) #0			declare double @cos(double) #0
	declare float @cosf(float) #0			declare float @cosf(float) #0
	declare double @llvm.cos.f64(double) #0			declare double @llvm.cos.f64(double) #0
	declare float @llvm.cos.f32(float) #0			declare float @llvm.cos.f32(float) #0
	declare float @expf(float) #0			declare float @expf(float) #0
	declare float @powf(float, float) #0			declare float @powf(float, float) #0
	declare float @llvm.exp.f32(float) #0			declare float @llvm.exp.f32(float) #0
	declare float @logf(float) #0			declare float @logf(float) #0
	declare float @llvm.pow.f32(float, float) #0			declare float @llvm.pow.f32(float, float) #0

				attributes #1 = { nounwind argmemonly }

				declare void @sincos(double, double, double) #1
				declare void @sincosf(float, float, float) #1

llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls.ll

	Show First 20 Lines • Show All 350 Lines • ▼ Show 20 Lines
	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret void			ret void
	}			}

	!131 = distinct !{!131, !132, !133}			!131 = distinct !{!131, !132, !133}
	!132 = !{!"llvm.loop.vectorize.width", i32 4}			!132 = !{!"llvm.loop.vectorize.width", i32 4}
	!133 = !{!"llvm.loop.vectorize.enable", i1 true}			!133 = !{!"llvm.loop.vectorize.enable", i1 true}

				define void @sincos_f64(double* nocapture noalias %sinarray, double* nocapture noalias %cosarray) {
				; CHECK-LABEL: @sincos_f64(
				; CHECK-LABEL: vector.body
				; CHECK: call void @_ZGVdN4vvv_sincos(<4 x double> [[TMP4:%.]], <4 x double> [[TMP5:%.]], <4 x double> [[TMP6:%.*]])
				;
				entry:
				br label %for.body

				for.body:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%t = trunc i64 %iv to i32
				%conv = sitofp i32 %t to double
				%sinptr = getelementptr inbounds double, double* %sinarray, i64 %iv
				%cosptr = getelementptr inbounds double, double* %cosarray, i64 %iv
				call void @sincos(double %conv, double* %sinptr, double* %cosptr), !llvm.access.group !145
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond = icmp eq i64 %iv.next, 1000
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !141

				for.end:
				ret void
				}

				!141 = distinct !{!141, !142, !143, !144}
				!142 = !{!"llvm.loop.vectorize.width", i32 4}
				!143 = !{!"llvm.loop.vectorize.enable", i1 true}
				!144 = !{!"llvm.loop.parallel_accesses", !145}
				!145 = distinct !{}

				define void @sincos_f32(float* nocapture noalias %sinarray, float* nocapture noalias %cosarray) {
				; CHECK-LABEL: @sincos_f32(
				; CHECK-LABEL: vector.body
				; CHECK: call void @_ZGVbN4vvv_sincosf(<4 x float> [[TMP4:%.]], <4 x float> [[TMP5:%.]], <4 x float> [[TMP6:%.*]])
				;
				entry:
				br label %for.body

				for.body:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%t = trunc i64 %iv to i32
				%conv = sitofp i32 %t to float
				%sinptr = getelementptr inbounds float, float* %sinarray, i64 %iv
				%cosptr = getelementptr inbounds float, float* %cosarray, i64 %iv
				call void @sincosf(float %conv, float* %sinptr, float* %cosptr), !llvm.access.group !155
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond = icmp eq i64 %iv.next, 1000
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !151

				for.end:
				ret void
				}

				!151 = distinct !{!151, !152, !153, !154}
				!152 = !{!"llvm.loop.vectorize.width", i32 4}
				!153 = !{!"llvm.loop.vectorize.enable", i1 true}
				!154 = !{!"llvm.loop.parallel_accesses", !155}
				!155 = distinct !{}

				define void @dependent_sincos_f64(double* nocapture noalias %sinarray, double* nocapture noalias %cosarray) {
				; CHECK-LABEL: @dependent_sincos_f64(
				; CHECK-NOT: @_ZGVdN4vvv_sincos
				;
				entry:
				br label %for.body

				for.body:
				%iv = phi i64 [ 1, %entry ], [ %iv.next, %for.body ]
				%iv.prev = sub nuw nsw i64 %iv, 1
				%phaseptr = getelementptr inbounds double, double* %cosarray, i64 %iv.prev
				%sinptr = getelementptr inbounds double, double* %sinarray, i64 %iv
				%cosptr = getelementptr inbounds double, double* %cosarray, i64 %iv
				%phase = load double, double* %phaseptr
				call void @sincos(double %phase, double* %sinptr, double* %cosptr)
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond = icmp eq i64 %iv.next, 1000
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !161

				for.end:
				ret void
				}

				!161 = distinct !{!161, !162, !163}
				!162 = !{!"llvm.loop.vectorize.width", i32 4}
				!163 = !{!"llvm.loop.vectorize.enable", i1 true}

				define void @dependent_sincos_f32(float* nocapture noalias %sinarray, float* nocapture noalias %cosarray) {
				; CHECK-LABEL: @dependent_sincos_f32(
				; CHECK-NOT: @_ZGVbN4vvv_sincosf
				;
				entry:
				br label %for.body

				for.body:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%iv.prev = sub nuw nsw i64 %iv, 1
				%phaseptr = getelementptr inbounds float, float* %sinarray, i64 %iv.prev
				%sinptr = getelementptr inbounds float, float* %sinarray, i64 %iv
				%cosptr = getelementptr inbounds float, float* %cosarray, i64 %iv
				%phase = load float, float* %phaseptr
				call void @sincosf(float %phase, float* %sinptr, float* %cosptr)
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond = icmp eq i64 %iv.next, 1000
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !171

				for.end:
				ret void
				}

				!171 = distinct !{!171, !172, !173}
				!172 = !{!"llvm.loop.vectorize.width", i32 4}
				!173 = !{!"llvm.loop.vectorize.enable", i1 true}

				; CHECK-LABEL: ; Function Attrs:

	; functions are in fact "readnone" but clang only emits the weaker "writeonly" as other math functions may write errno.			; functions are in fact "readnone" but clang only emits the weaker "writeonly" as other math functions may write errno.
	attributes #0 = { nounwind writeonly }			attributes #0 = { nounwind writeonly }

	declare double @sin(double) #0			declare double @sin(double) #0
	declare float @sinf(float) #0			declare float @sinf(float) #0
	declare double @llvm.sin.f64(double) #0			declare double @llvm.sin.f64(double) #0
	declare float @llvm.sin.f32(float) #0			declare float @llvm.sin.f32(float) #0
	declare double @cos(double) #0			declare double @cos(double) #0
	declare float @cosf(float) #0			declare float @cosf(float) #0
	declare double @llvm.cos.f64(double) #0			declare double @llvm.cos.f64(double) #0
	declare float @llvm.cos.f32(float) #0			declare float @llvm.cos.f32(float) #0
	declare float @expf(float) #0			declare float @expf(float) #0
	declare float @powf(float, float) #0			declare float @powf(float, float) #0
	declare float @llvm.exp.f32(float) #0			declare float @llvm.exp.f32(float) #0
	declare float @logf(float) #0			declare float @logf(float) #0
	declare float @llvm.pow.f32(float, float) #0			declare float @llvm.pow.f32(float, float) #0

				attributes #1 = { nounwind argmemonly }

				declare void @sincos(double, double, double) #1
				declare void @sincosf(float, float, float) #1

This is an archive of the discontinued LLVM Phabricator instance.

[llvm] Allow forced auto-vectorization of sincos() using libmvecNeeds ReviewPublic

Details

Diff Detail

Unit TestsFailed

Event Timeline

Revision Contents

Diff 463900

llvm/include/llvm/Analysis/VecFuncs.def

llvm/lib/Analysis/LoopAccessAnalysis.cpp

llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls-VF2-VF8.ll

llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls.ll

[llvm] Allow forced auto-vectorization of sincos() using libmvec
Needs ReviewPublic