Page MenuHomePhabricator

Initial support for vectorization using Libmvec (GLIBC vector math library).
ClosedPublic

Authored by venkataramanan.kumar.llvm on Sep 23 2020, 8:02 AM.

Diff Detail

Event Timeline

Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptSep 23 2020, 8:02 AM
venkataramanan.kumar.llvm requested review of this revision.Sep 23 2020, 8:02 AM

Initial version I supported the following vector functions (VF 2 and 4 ).

sin
cos
exp
pow
log

Also added test cases similar to SVML under X86. I am not sure about other targets.

Looks good to me.
Regarding the tests, it seems that you check if auto-vectorization takes advantages of libmvec?
Would it be interesting to have a test which declares a vector and call the builtin sin on it?

Thank you very much for the changes! :)

clang/include/clang/Driver/Options.td
1585

I think glibc always refer to the library as "libmvec" in lower case, should we do so here?

rengolin added inline comments.Fri, Sep 25, 1:47 AM
clang/include/clang/Driver/Options.td
1585

Yes, so clang can be a drop in replacement.

Selection of Glibc vector math library is enabled via the option -fvec-lib=libmvec .

Looks good to me.
Regarding the tests, it seems that you check if auto-vectorization takes advantages of libmvec?
Would it be interesting to have a test which declares a vector and call the builtin sin on it?

Thank you very much for the changes! :)

do we we have built-in support for sin that takes vector types?

I tried

m128d compute_sin(m128d x)
{

return __builtin_sin(x);

}

error: passing '__m128d' (vector of 2 'double' values) to parameter of incompatible type 'double'

Pinging for review comments.

spatel added a comment.Tue, Oct 6, 6:59 AM

Looks good to me.
Regarding the tests, it seems that you check if auto-vectorization takes advantages of libmvec?
Would it be interesting to have a test which declares a vector and call the builtin sin on it?

Thank you very much for the changes! :)

do we we have built-in support for sin that takes vector types?

I tried

m128d compute_sin(m128d x)
{

return __builtin_sin(x);

}

error: passing '__m128d' (vector of 2 'double' values) to parameter of incompatible type 'double'

We have LLVM intrinsics for sin/cos that may use vector types:
http://llvm.org/docs/LangRef.html#llvm-sin-intrinsic
...but I don't know of a way to produce those directly from C source.

llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls.ll
2

Why does this test file use command-line options to specify the vector factor and the other uses metadata?
If we can use metadata, then can you vary it to get better coverage (for example <2 x double> or <8 x float>)?

222–225

It would be better to consistently put the FileCheck lines after the 'define'.
Can you auto-generate the CHECK lines using llvm/utils/update_test_checks.py ?

fpetrogalli added inline comments.Thu, Oct 8, 1:47 AM
llvm/include/llvm/Analysis/TargetLibraryInfo.h
91

Can we call this LIBMVEC-X86? Libmvec itself is supposed to support other architectures, I can see list of mappings for each of the supported targets.

Then, the logic of selecting the correct one in the frontent clang would depend on the value of -fvec-lib=libmvec plus the value of -target.

nemanjai added inline comments.Thu, Oct 8, 3:54 AM
llvm/include/llvm/Analysis/TargetLibraryInfo.h
91

So if I follow correctly, we can choose the various vendor-specific libraries as well as libmvec which itself has target-specific ports.

Would it make sense to just add an overload of addVectorizableFunctions() that would consider the Triple and remove any entries from VectorDescs that the target doesn't support? Or even more specifically, simply add the Triple argument to addVectorizableFunctionsFromVecLib() and call something like removeLIBMVECEntriesForTarget(const Triple &T) that would do the job.

And of course, if the triple isn't provided and the user is targeting an architecture that doesn't provide some entry, that is just user error.

fpetrogalli added inline comments.Thu, Oct 8, 7:16 AM
llvm/include/llvm/Analysis/TargetLibraryInfo.h
91

The overload of the addVectorizableFunctions() might be feasible, but for the sake of simplicity I think that having LIBMVEC_<TARGET> in enum VectorLibrary for each of the <TARGET> to support would avoid having to deal with overload of methods. Given that these lists are static, I'd rather see them explicitly instead of having them filled up by add/remove methods.

All in all, I think it is easier to add the logic for the target triple in clang as it is just a matter of modifying the changes in BackendUtils.cpp (warning, pseudocode ahead):

case CodeGenOptions::LIBMVEC: 
  switch(Triple) {
  case X:
     TLII->addVectorizableFunctionsFromVecLib(TargetLibraryInfoImpl::LIBMVEC_X);
   break
  case Y:
     TLII->addVectorizableFunctionsFromVecLib(TargetLibraryInfoImpl::LIBMVEC_Y);
  break;
  case ...
  }
  break;

Looks good to me.
Regarding the tests, it seems that you check if auto-vectorization takes advantages of libmvec?
Would it be interesting to have a test which declares a vector and call the builtin sin on it?

Thank you very much for the changes! :)

do we we have built-in support for sin that takes vector types?

I tried

m128d compute_sin(m128d x)
{

return __builtin_sin(x);

}

error: passing '__m128d' (vector of 2 'double' values) to parameter of incompatible type 'double'

We have LLVM intrinsics for sin/cos that may use vector types:
http://llvm.org/docs/LangRef.html#llvm-sin-intrinsic
...but I don't know of a way to produce those directly from C source.

Ok I see intrinsic for fp128 type in LangRef.

---Snip--
declare fp128 @llvm.cos.f128(fp128)

define fp128 @test_cos(float %float, double %double, fp128 %fp128) {

%cosfp128 = call fp128 @llvm.cos.f128(fp128 %fp128)
ret fp128 %cosfp128

}
--Snip--

f128 is treated a long double I see call to cosl.

vmovaps %xmm2, %xmm0
 jmp     cosl

so I am not sure how to generate vector calls via built-ins.

Changed library naming to LIBMVEC-X86 as per comments and also selected based on Target Tripple in clang.
I am still working on auto generating FileCheck for the test cases.

As per review comments from Sanjay, updated the test case to use metadata. Also autogenerated the checks in the test cases using llvm/utils/update_test_checks.py.

As per review comments from Sanjay, updated the test case to use metadata. Also autogenerated the checks in the test cases using llvm/utils/update_test_checks.py.

Thanks. But wouldn't it be better test coverage to vary the "llvm.loop.vectorize.width". Why is it always "4"?
I don't have any experience with this lib, so no real feedback on the code itself. Hopefully another reviewer can approve if there are no other concerns.

Hi,

Thank you for modifying the implementation to facilitate the extension to other targets.

Please add libmvec-specific tests is clang/Driver/fveclib.c, and simplify the loop-vectorize tests.

Other than that, I only have minor comments.

Francesco

clang/lib/CodeGen/BackendUtil.cpp
377

Nit: is this misaligned?

llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls-finite.ll
3

-inject-tli-mappings is not required here, as a pass itself is required by the loop vectorizer.

7

Nit: it is standard practice to put all declarations at the end of the file.

10

I think you are over-testing here. It is enough to check that inside the vector body there is a call to the vector function you have listed in the mapping. You are not checking for the whole auto-vectorization process here, just the vectorization of the function call. Same for all the tests for this patch in which you are doing something similar to this one test.

; CHECK-LABEL: @exp_f32(
; CHECK-LABEL:       vector.body:
; CHECK: call fast <4 x float> @_ZGVbN4v___expf_finite(<4 x float>
spatel added inline comments.Tue, Oct 13, 6:14 AM
llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls-finite.ll
10

I requested using "utils/update_test_checks.py" to auto-generate the assertions consistently.

We have standardized on this practice for tests in several passes because it provides extra test coverage (at the risk of over-testing), and it makes updating tests in the future nearly automatic.

The time cost of checking the extra lines is negligible vs. the benefit that we have gotten in finding/avoiding bugs. If the consensus is that it's not worth it on this particular file, I'm ok with that. But the general trend is definitely towards auto-generating full checks.

70

The script should have warned you about using variables named "tmp". Independent of whether we choose to use the scripted assertions or not, you should change this value name (even plain "t" for "trunc" is an improvement over "tmp").

fhahn added a subscriber: fhahn.Tue, Oct 13, 6:35 AM
fhahn added inline comments.
llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls-finite.ll
3

I guess it still doesn't hurt to be explicit. Also, can you add a line for the new pass manager?

10

FWIW in this case I would also slightly prefer to have more targeted test lines than auto-generating them (same for most loop-vectorize tests). The tests are large and LV generates a lot of code, and a lot of the code is completely unrelated/uninteresting to the code in the patch. IMO it would be enough to check the arguments of the vector function calls, together with the calls and maybe the induction variable.

The auto-generated check lines make things much more brittle and unrelated changes lead to us requiring to update lots of tests. And I am not sure if it is feasible to audit all details of the generated check lines (in the current patch ~500-800 new CHECK lines). So to me it seems like auto-generating checks here gives a false sense of security and make things harder down the line.

83

I don't think we need this. You can just pass -force-vector-width=4 to the command line and avoid the extra metadata duplicated for each test

fhahn added inline comments.Tue, Oct 13, 6:39 AM
llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls.ll
2

FWIW most LV tests use -force-vector-width. If we decide to just go for the 'essential' check lines, it should be easy to add multiple run lines with difference VFs from the command line?

fhahn added inline comments.Tue, Oct 13, 6:40 AM
llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls-finite.ll
83

Apologies, I missed that this was suggested earlier to test different VFs. If we decide to go with the 'essential' check lines approach it might make sense to just invoke opt with different VFs using force-vector-width.

Updated the patch as per review comments received.

The test cases are updated with the checks based on the below comment from Francesco.
---Snip--
I think you are over-testing here. It is enough to check that inside the vector body there is a call to the vector function you have listed in the mapping.
---Snip--
Florian also suggesting the same.

Another comment from Florian.
---Snip--
If we decide to go with the 'essential' check lines approach it might make sense to just invoke opt with different VFs using force-vector-width.
--Snip--

I still use metadata suggested by Sanjay. It is because I am currently testing only VF=4 .

We have both float and double type lib calls in the test case and libmvec has vector call support for VF=4 . For VF say 8 there is no vector call support for double types.

I can add few more float type tests with meta data for VF=8. please let me know your suggestions.

LGTM from the perspective of making sure that this solution can be extended to any of the architectures that libmvec supports.

Thank you.

spatel added a subscriber: aeubanks.EditedFri, Oct 16, 7:38 AM

I can add few more float type tests with meta data for VF=8. please let me know your suggestions.

I may be missing some subtlety of the vectorizer behavior. Can we vary the test types + metadata in 1 file, so that there is coverage for something like this v2f64 call : TLI_DEFINE_VECFUNC("llvm.sin.f64", "_ZGVbN2v_sin", 2)?
I'm just trying to make sure we don't fall into some blind-spot by only testing VF=4.

llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls-finite.ll
3

We need to be explicit about that pass with new-pass-manager as shown here:
df5576a

cc @aeubanks as I'm not sure if we want to update tests with NPM RUN lines or if we want to silently transition whenever the default gets changed.

Added a test case for testing vector library calls for VF=2 and VF=8.

Remove an incorrect file that got attached with my earlier patch.

spatel accepted this revision.Tue, Oct 20, 9:03 AM

LGTM.
I'm not sure why we need 3 different test files to test the very similar patterns - just programmer preference?
Wait a day or so to commit in case anyone else has comments.

This revision is now accepted and ready to land.Tue, Oct 20, 9:03 AM

Thanks @spatel , @Florian and @fpetrogalli for the review comments and approval. Can someone please commit it on my behalf.