Initial support for vectorization using Libmvec (GLIBC vector math library).
Details
Diff Detail
Event Timeline
Initial version I supported the following vector functions (VF 2 and 4 ).
sin
cos
exp
pow
log
Also added test cases similar to SVML under X86. I am not sure about other targets.
Looks good to me.
Regarding the tests, it seems that you check if auto-vectorization takes advantages of libmvec?
Would it be interesting to have a test which declares a vector and call the builtin sin on it?
Thank you very much for the changes! :)
clang/include/clang/Driver/Options.td | ||
---|---|---|
1582 | I think glibc always refer to the library as "libmvec" in lower case, should we do so here? |
clang/include/clang/Driver/Options.td | ||
---|---|---|
1582 | Yes, so clang can be a drop in replacement. |
do we we have built-in support for sin that takes vector types?
I tried
m128d compute_sin(m128d x)
{
return __builtin_sin(x);
}
error: passing '__m128d' (vector of 2 'double' values) to parameter of incompatible type 'double'
We have LLVM intrinsics for sin/cos that may use vector types:
http://llvm.org/docs/LangRef.html#llvm-sin-intrinsic
...but I don't know of a way to produce those directly from C source.
llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls.ll | ||
---|---|---|
2 | Why does this test file use command-line options to specify the vector factor and the other uses metadata? | |
222–225 | It would be better to consistently put the FileCheck lines after the 'define'. |
llvm/include/llvm/Analysis/TargetLibraryInfo.h | ||
---|---|---|
91 | Can we call this LIBMVEC-X86? Libmvec itself is supposed to support other architectures, I can see list of mappings for each of the supported targets. Then, the logic of selecting the correct one in the frontent clang would depend on the value of -fvec-lib=libmvec plus the value of -target. |
llvm/include/llvm/Analysis/TargetLibraryInfo.h | ||
---|---|---|
91 | So if I follow correctly, we can choose the various vendor-specific libraries as well as libmvec which itself has target-specific ports. Would it make sense to just add an overload of addVectorizableFunctions() that would consider the Triple and remove any entries from VectorDescs that the target doesn't support? Or even more specifically, simply add the Triple argument to addVectorizableFunctionsFromVecLib() and call something like removeLIBMVECEntriesForTarget(const Triple &T) that would do the job. And of course, if the triple isn't provided and the user is targeting an architecture that doesn't provide some entry, that is just user error. |
llvm/include/llvm/Analysis/TargetLibraryInfo.h | ||
---|---|---|
91 | The overload of the addVectorizableFunctions() might be feasible, but for the sake of simplicity I think that having LIBMVEC_<TARGET> in enum VectorLibrary for each of the <TARGET> to support would avoid having to deal with overload of methods. Given that these lists are static, I'd rather see them explicitly instead of having them filled up by add/remove methods. All in all, I think it is easier to add the logic for the target triple in clang as it is just a matter of modifying the changes in BackendUtils.cpp (warning, pseudocode ahead): case CodeGenOptions::LIBMVEC: switch(Triple) { case X: TLII->addVectorizableFunctionsFromVecLib(TargetLibraryInfoImpl::LIBMVEC_X); break case Y: TLII->addVectorizableFunctionsFromVecLib(TargetLibraryInfoImpl::LIBMVEC_Y); break; case ... } break; |
Ok I see intrinsic for fp128 type in LangRef.
---Snip--
declare fp128 @llvm.cos.f128(fp128)
define fp128 @test_cos(float %float, double %double, fp128 %fp128) {
%cosfp128 = call fp128 @llvm.cos.f128(fp128 %fp128) ret fp128 %cosfp128
}
--Snip--
f128 is treated a long double I see call to cosl.
vmovaps %xmm2, %xmm0 jmp cosl
so I am not sure how to generate vector calls via built-ins.
Changed library naming to LIBMVEC-X86 as per comments and also selected based on Target Tripple in clang.
I am still working on auto generating FileCheck for the test cases.
As per review comments from Sanjay, updated the test case to use metadata. Also autogenerated the checks in the test cases using llvm/utils/update_test_checks.py.
Thanks. But wouldn't it be better test coverage to vary the "llvm.loop.vectorize.width". Why is it always "4"?
I don't have any experience with this lib, so no real feedback on the code itself. Hopefully another reviewer can approve if there are no other concerns.
Hi,
Thank you for modifying the implementation to facilitate the extension to other targets.
Please add libmvec-specific tests is clang/Driver/fveclib.c, and simplify the loop-vectorize tests.
Other than that, I only have minor comments.
Francesco
clang/lib/CodeGen/BackendUtil.cpp | ||
---|---|---|
377 | Nit: is this misaligned? | |
llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls-finite.ll | ||
2 | -inject-tli-mappings is not required here, as a pass itself is required by the loop vectorizer. | |
6 | Nit: it is standard practice to put all declarations at the end of the file. | |
9 | I think you are over-testing here. It is enough to check that inside the vector body there is a call to the vector function you have listed in the mapping. You are not checking for the whole auto-vectorization process here, just the vectorization of the function call. Same for all the tests for this patch in which you are doing something similar to this one test. ; CHECK-LABEL: @exp_f32( ; CHECK-LABEL: vector.body: ; CHECK: call fast <4 x float> @_ZGVbN4v___expf_finite(<4 x float> |
llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls-finite.ll | ||
---|---|---|
9 | I requested using "utils/update_test_checks.py" to auto-generate the assertions consistently. We have standardized on this practice for tests in several passes because it provides extra test coverage (at the risk of over-testing), and it makes updating tests in the future nearly automatic. The time cost of checking the extra lines is negligible vs. the benefit that we have gotten in finding/avoiding bugs. If the consensus is that it's not worth it on this particular file, I'm ok with that. But the general trend is definitely towards auto-generating full checks. | |
69 | The script should have warned you about using variables named "tmp". Independent of whether we choose to use the scripted assertions or not, you should change this value name (even plain "t" for "trunc" is an improvement over "tmp"). |
llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls-finite.ll | ||
---|---|---|
2 | I guess it still doesn't hurt to be explicit. Also, can you add a line for the new pass manager? | |
9 | FWIW in this case I would also slightly prefer to have more targeted test lines than auto-generating them (same for most loop-vectorize tests). The tests are large and LV generates a lot of code, and a lot of the code is completely unrelated/uninteresting to the code in the patch. IMO it would be enough to check the arguments of the vector function calls, together with the calls and maybe the induction variable. The auto-generated check lines make things much more brittle and unrelated changes lead to us requiring to update lots of tests. And I am not sure if it is feasible to audit all details of the generated check lines (in the current patch ~500-800 new CHECK lines). So to me it seems like auto-generating checks here gives a false sense of security and make things harder down the line. | |
82 | I don't think we need this. You can just pass -force-vector-width=4 to the command line and avoid the extra metadata duplicated for each test |
llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls.ll | ||
---|---|---|
2 | FWIW most LV tests use -force-vector-width. If we decide to just go for the 'essential' check lines, it should be easy to add multiple run lines with difference VFs from the command line? |
llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls-finite.ll | ||
---|---|---|
82 | Apologies, I missed that this was suggested earlier to test different VFs. If we decide to go with the 'essential' check lines approach it might make sense to just invoke opt with different VFs using force-vector-width. |
Updated the patch as per review comments received.
The test cases are updated with the checks based on the below comment from Francesco.
---Snip--
I think you are over-testing here. It is enough to check that inside the vector body there is a call to the vector function you have listed in the mapping.
---Snip--
Florian also suggesting the same.
Another comment from Florian.
---Snip--
If we decide to go with the 'essential' check lines approach it might make sense to just invoke opt with different VFs using force-vector-width.
--Snip--
I still use metadata suggested by Sanjay. It is because I am currently testing only VF=4 .
We have both float and double type lib calls in the test case and libmvec has vector call support for VF=4 . For VF say 8 there is no vector call support for double types.
I can add few more float type tests with meta data for VF=8. please let me know your suggestions.
LGTM from the perspective of making sure that this solution can be extended to any of the architectures that libmvec supports.
Thank you.
I may be missing some subtlety of the vectorizer behavior. Can we vary the test types + metadata in 1 file, so that there is coverage for something like this v2f64 call : TLI_DEFINE_VECFUNC("llvm.sin.f64", "_ZGVbN2v_sin", 2)?
I'm just trying to make sure we don't fall into some blind-spot by only testing VF=4.
llvm/test/Transforms/LoopVectorize/X86/libm-vector-calls-finite.ll | ||
---|---|---|
2 |
LGTM.
I'm not sure why we need 3 different test files to test the very similar patterns - just programmer preference?
Wait a day or so to commit in case anyone else has comments.
Thanks @spatel , @Florian and @fpetrogalli for the review comments and approval. Can someone please commit it on my behalf.
I think glibc always refer to the library as "libmvec" in lower case, should we do so here?