This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

33/33
RFC-fveclib.md

Differential D54412

[RFC] Re-implementing -fveclib with OpenMP
AbandonedPublic

Authored by fpetrogalli on Nov 11 2018, 7:31 PM.

Download Raw Diff

Details

Reviewers

hfinkel
rengolin
hsaito
jdoerfert

Summary

This RFC encompass the proposal of replacing the current TargetLibraryInfo (TLI) based implementation of the command line -fveclib with an OpenMP based one.

Diff Detail

Event Timeline

fpetrogalli created this revision.Nov 11 2018, 7:31 PM

Herald added a subscriber: guansong. · View Herald TranscriptNov 11 2018, 7:31 PM

hfinkel added a subscriber: hfinkel.Nov 16 2018, 10:59 AM

hfinkel added inline comments.

RFC-fveclib.md
11	RC -> RFC
23	Why is this a question? Isn't this exactly the same infastructure we're using for OpenMP delcare simd, in which case it does exactly this. It also enables other frontends and languages to add things these things as relevant for their own runtime libraries, etc.
29	meta-data -> metadata (here and below)
224	TODO?
235	analisys -> analysis
269	This needs to be in the reserved namespace. Maybe something like: _CLANG_USE_SVML

fpetrogalli updated this revision to Diff 174412.Nov 16 2018, 11:23 AM

fpetrogalli marked 6 inline comments as done.

fpetrogalli added inline comments.

RFC-fveclib.md
224	Done, I have also changed the name to the correct one of the TLI (`getVectorizedFunction`).

Also, in order for us to be able to adopt this mechanism, we need to work on LV's code gen. Currently, parts of it rely on having one consistent VF across the entire loop (i.e., loop-level VF or "serialize"). That's the reason why https://reviews.llvm.org/D53035 is generating illegal call first and then legalize. We need to add this work item as a dependency (to at least some veclib, e.g., SVML).

RFC-fveclib.md
41–42	I think we need to come up with a mechanism to enable the veclib mapping without fully enabling -fopenmp-simd. Decorating the veclib stuff with an extra "veclib" clause added to omp declare simd and create a mode to process just those would do, and I'm open to other suggestions. Rationale: auto-vectorization of loops and slp vectorization, using veclib, is really a separate issue from OpenMP SIMD.

hfinkel added inline comments.Nov 16 2018, 12:05 PM

RFC-fveclib.md
41–42	I agree with this. We should add a clang pragma that we always honor independent of the OpenMP settings.

hsaito added inline comments.Nov 16 2018, 12:17 PM

RFC-fveclib.md
39–40	I think we would want to have a mechanism to include the custom header file from a non-default location, say, for some non-compiler vendor, supplying a new veclib implementation. Something like -fveclib-include=<full-path>/filename.h. By default, it'll be <clang's include dir>/{SVML, Accelerate, ...}.h for those that are shipped with the compiler.

kmitropo added a subscriber: kmitropo.Nov 16 2018, 12:45 PM

fpetrogalli added inline comments.Nov 16 2018, 1:14 PM

RFC-fveclib.md
39–40	Let me see if I got your proposal right. We add in `<clang>/lib/Headers` a `math.h` plus one header file for each of the libraries we need to expose to the vectorizer, as follows: #include_once <math.h> #ifdef _CLANG_USE_ACCELERATE #include "Accelerate.h" #endif #ifdef _CLANG_USE_SVML #include "SVML.h" #endif The header `Accelerate.h` and `SVML.h` would add the `declare simd` directive to the declarations. The option `-fveclib-include=<full-path>/filename.h` trigger the generation of the metadata, by triggering the mechanism described in this RFC to generate the `vector-variant` metadata. This is useful as it allows auto-vectorization of external function calls for libraries that do not relate to `math.h`. The option `-fveclib=X` is a wrapper that (a) selects the correct section in `<clang>/lib/Headersmath.h` with a macro (e.g. `_CLANG_USE_X`), and moreover triggers the generation of the metadata described in this RFC. Under the assumption I interpreted your suggestion right, I think this is a very useful change, as it fully realizes the improvement described in the introduction: Enable support for a developer's own vector libraries without requiring changes to the compiler.
41–42	I agree too, thank you for raising this. Could we do the same by restricting `-fopenmp-simd` to something like `-fopenmpo-declare-simd`? This would process only the `declare simd` directive and its associated `declare variant` for simd. I think that this would: Make out internal header files and the one the libraries would have to provide look more standard (they would be able to use them with any other compiler, in principle). Avoid adding parser code to duplicate the new directive. Does it make sense?

hsaito added inline comments.Nov 16 2018, 2:47 PM

RFC-fveclib.md
39–40	What I wrote was below, but I think what you wrote should also work. Either way is fine with me. If I understand your interpretation correctly, -fveclib and -fveclib-include should be mutually exclusive. -fveclib=X w/o explicit -fveclib-include defaults to the behavior you wrote. Works only for the ones shipped with the compiler. -fveclib=X w/ explicit -fveclib-include should pick up the specified header file (from anywhere in the accessible file system).
41–42	Not quite the same thing, because auto-vectorization can still pick up vector functions outside of veclib. I think the include file can be written with a macro like below and still compatible with other OpenMP compilers. I don't know how the clang FE folks likes/hates parsing this when -fopenmp/-fopenmp-simd isn't thrown, however. #ifdef clang #define CLANG_VECLIB clang_veclib #else #define CLANG_VECLIB #endif snip snip snip #pragma omp declare simd CLANG_VECLIB

hfinkel added inline comments.Nov 16 2018, 7:56 PM

RFC-fveclib.md
39–40	Why do we need a separate flag for this? Isn't it the same as using '-include <full-path>/filename.h'?
41–42	Avoid adding parser code to duplicate the new directive. I think that this is exactly what we should do, however. We can refactor things so that we have a '#pragma clang declare simd', or similar, that we always accept (even if we ignore the '#pragma omp declare simd' spelling when OpenMP is not enabled). An alternative might be to enable OpenMP simd parsing by default?
269	To update my own suggestion, we might use: _CLANG_USE_VECLIB_X (but we should be consistent throughout the document - below you have LIBRARY_X).

@hsaito , @hfinkel

I would avoid having a clang-specific pragma. By just using the OpenMP one, we provide the library vendors with a mechanism that is based on a standard and therefore can be used by any compiler.

Specifically:

An alternative might be to enable OpenMP simd parsing by default?

Yes, I thisnk this is doable, to have #pragma omp declare simd and #pragma omp declare variant (only the simd case) enabled by default.

Are you happy for me to modify the RFC with this?

Francesco

In D54412#1302873, @fpetrogalli wrote:

@hsaito , @hfinkel

I would avoid having a clang-specific pragma. By just using the OpenMP one, we provide the library vendors with a mechanism that is based on a standard and therefore can be used by any compiler.

Specifically:

An alternative might be to enable OpenMP simd parsing by default?

Yes, I thisnk this is doable, to have #pragma omp declare simd and #pragma omp declare variant (only the simd case) enabled by default.

Are you happy for me to modify the RFC with this?

I'm fine with that.

Francesco

In D54412#1303003, @hfinkel wrote:

In D54412#1302873, @fpetrogalli wrote:

An alternative might be to enable OpenMP simd parsing by default?

Yes, I thisnk this is doable, to have #pragma omp declare simd and #pragma omp declare variant (only the simd case) enabled by default.

Are you happy for me to modify the RFC with this?

I'm fine with that.

I'm not totally aligned here, but I'm fine proceeding with the RFC revision along this line. I'll wait for more voices to be raised (or not) on the issue. Please keep a note saying that there is a concern raised and those who care should voice their concerns.

In D54412#1303144, @hsaito wrote:

In D54412#1303003, @hfinkel wrote:

In D54412#1302873, @fpetrogalli wrote:

An alternative might be to enable OpenMP simd parsing by default?

Yes, I thisnk this is doable, to have #pragma omp declare simd and #pragma omp declare variant (only the simd case) enabled by default.

Are you happy for me to modify the RFC with this?

I'm fine with that.

I'm not totally aligned here, but I'm fine proceeding with the RFC revision along this line. I'll wait for more voices to be raised (or not) on the issue. Please keep a note saying that there is a concern raised and those who care should voice their concerns.

So that I understand, what's your preference?

In D54412#1303192, @hfinkel wrote:

In D54412#1303144, @hsaito wrote:

In D54412#1303003, @hfinkel wrote:

In D54412#1302873, @fpetrogalli wrote:

An alternative might be to enable OpenMP simd parsing by default?

Yes, I thisnk this is doable, to have #pragma omp declare simd and #pragma omp declare variant (only the simd case) enabled by default.

Are you happy for me to modify the RFC with this?

I'm fine with that.

I'm not totally aligned here, but I'm fine proceeding with the RFC revision along this line. I'll wait for more voices to be raised (or not) on the issue. Please keep a note saying that there is a concern raised and those who care should voice their concerns.

So that I understand, what's your preference?

I still would like to see the ability to differentiate quasi-intrinsic veclib functions versus user-written vector functions and enable veclib w/o enabling user-written vector functions, even if -fopenmp-simd becomes default. Some of our ICC customers take perf data with and w/o openmp simd (and proprietary simd pragma) and they expect SVML to be used for both cases. I don't have much preferences in how to accomplish it. For example, if we'd like to keep the header file contents totally standard based, we could use something like a command line flag -fclang-vector-intrinsic=file1,file2,..,fileN to indicate that "declare simd" in those header file is "quasi-intrinsic". We could silently do this for the veclibs shipped with clang (e.g., check if "declare simd" is coming from clang include directory) ---- I'm fine with this ----, but that's a bit unfair to the 3rd party veclibs.

I still would like to see the ability to differentiate quasi-intrinsic veclib functions versus user-written vector functions and enable veclib w/o enabling user-written vector functions, even if -fopenmp-simd becomes default. Some of our ICC customers take perf data with and w/o openmp simd (and proprietary simd pragma) and they expect SVML to be used for both cases. I don't have much preferences in how to accomplish it. For example, if we'd like to keep the header file contents totally standard based, we could use something like a command line flag -fclang-vector-intrinsic=file1,file2,..,fileN to indicate that "declare simd" in those header file is "quasi-intrinsic". We could silently do this for the veclibs shipped with clang (e.g., check if "declare simd" is coming from clang include directory) ---- I'm fine with this ----, but that's a bit unfair to the 3rd party veclibs.

I wonder if there is a way to do this programmatically. Maybe we could use clang/ModuleMap to tell clang to parse (and load) math.h and the compiler provided headers with the extra feature -fopenmp-simd?

If not, we can go with the #pragma clang declare simd directive, with clang being ifdef guarded to become omp when _OPENMP is detected.

https://reviews.llvm.org/D54412

IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

Sigh, ignore this, it was an error.

If not, we can go with the #pragma clang declare simd directive, with clang being ifdef guarded to become omp when _OPENMP is detected.

Under the circumstances, this seems like a reasonable strategy to me.

In D54412#1303759, @hfinkel wrote:

If not, we can go with the #pragma clang declare simd directive, with clang being ifdef guarded to become omp when _OPENMP is detected.

Under the circumstances, this seems like a reasonable strategy to me.

Works for me.

@hfinkel , @hsaito ,

I think i have addressed all your comments. Please let me know if I have missed anything.

I have adapted the RFC to your suggestions, in particular the need to keep auto-vectorization and SLP vectorization separated from OpenMP SIMD. As suggested, I have introduced a new #pragma, that is equivalent to the OpenMP ones. I decide to call it #pragma veclib instead of #pragma clang to avoid making it clang-specific. The reason for doing this is that gcc has a -mveclibabi flag that is similar to the -fveclib in clang, so I thought it would add value for those library vendors that wanted to use the header file mechanism also in gcc, if gcc will ever decide to go via the same route we might take in llvm.

fpetrogalli marked 8 inline comments as done.Nov 25 2018, 8:26 PM

fpetrogalli marked an inline comment as done.Nov 25 2018, 8:29 PM

fpetrogalli added inline comments.

RFC-fveclib.md
39–40	Why do we need a separate flag for this? Isn't it the same as using '-include <full-path>/filename.h'? Am I correct thinking that we want to have new separate flag because the #pragma directives are not parsed by default?

hsaito added inline comments.Nov 26 2018, 4:40 PM

RFC-fveclib.md
39–40	Am I correct thinking that we want to have new separate flag because the #pragma directives are not >parsed by default? If we'd like to enable/disable #pragma veclib parsing, yes. Else no (i.e., enable parsing by default. and no way to turn it off, other than #pragma veclib becoming #pragma omp if _OPENMP is seen before parsing the header file). If a programmer doesn't want veclib, don't bother including the header file. That usage model works for me, I think. I think the summary of discussion here would be: No veclib support ==> do not use -include. veclib (shipped with compiler) -include filename.h (look under system include dir by default) veclib (3rd party) -include <path>/filename.h veclib w/o _OPENMP seen: #pragma veclib declare simd clang should parse this w/o a flag. veclib w/ _OPENMP seen: #pragma omp declare simd clang should parse this as part of OpenMP support. using something like #ifndef _OPENMP #define VECLIB_PRAGMA veclib #else #define VECLIB_PRAGMA omp #endif

hsaito added inline comments.Nov 26 2018, 4:50 PM

RFC-fveclib.md
44–45	-include
47–50	Retire -fveclib flag.
52	I think this subsection, explaining current situation, doesn't need any changes.
129	standard
140	I have a reservation about this simple substitution, but if other people think this will work, then I'm fine.

hsaito added inline comments.Nov 26 2018, 5:17 PM

RFC-fveclib.md
209	(SVFS)
211	SVFS
238	This isn't specific to veclib, but we need to think about what to do when there are more than one candidate that "fuzzy matches" the calling context. We should provide a standard "scoring system" that returns the "best match", but that scoring system may not be "one size fits all". For veclib, we probably could take some short cut --- but if we want to standardize with OpenMP declare simd, we should think about multiple fuzzy matches.

fpetrogalli marked an inline comment as done.Nov 26 2018, 7:35 PM

fpetrogalli added inline comments.

RFC-fveclib.md
39–40	Hum - I was thinking something slightly different. I am not really keen in removing the -fveclib option, as it makes very clear to the user what to expect. Let me iterate once again on a high level view of what I am trying to do here. The LLVM view. The backend is informed about the availability of vector functions via the vector-varian attribute, with non standard name redirection. The standard names of the vector variant is generated according to the mangling rule of a vector function ABI based on OpenMP. The vector-variant attribute is all the vectorizer needs to know to be able to vectorize the function. The clang view. clang generates the vector-variant attribute when a function is decorated with #pragma omp declare simd, and with #pragma omp declare variant simd The -fveclib view. We want to retain auto-vectorization of functions even when OpenMP is not enabled in the compiler. 2. This is done by adding the functions of a library in the system header files shipped with clang, and by cloning the #pragma omp declare simd and #pragma omp declare variant simd directive into the "veclib" ones. Parsing of the veclib directive is enabled only with the -fveclib command line option, that select the proper header file section shipped by the compiler (if we are not happy with math.h, we could ship the compiler with a veclib.h that is automatically included when -fveclib is used) The user libraries The option -fveclib-include= activate the parsing of the veclib pragma and loads the list from a user provided header file. Summary: 0. No option -> nothing happens `-fveclib=X` becomes `-fparse-veclib -D__CLANG_ENABLE_LIBRARY_X -lX -include=<path>/to/lib/Headers/veclib.h`. The header veclib.h is shipped with the compiler. `-fveclib-include=/path/to/my/header.h` becomes `-fparse-veclib -include=/path/to/my/header.h`. The user has to provide the correct linker flangs. `-fopenmp[-simd]` no vectorization happens other then for those functions that are marked with OpenMP declare simd. The header veclib.h is not loaded so nothing gets vectorized under the hood `-fopenmp[-simd] -fveclib=X` or `-fopenmp[-simd] -fveclib-include=/path/to/my/header.h` behave like in 1. and 2. Item 1. 2. 3. and 4. use the same mechanism of vector-variant. The only difference is that for the -fveclib based option we replace the openMP directive with the veclib one to make sure we can keep the two functionality separated (vectorize OpenMP delcare simd _without_ also enabling -fveclib).

@hsaito , there are another couple of reasons for which I'd prefer to keep the -fveclib option.

First of all, by using -fveclib=X we inform the compiler that we are also requesting to link against library X. Given that clang can be used to link object files that have been compiled with -fveclib, I think that we would prevent linking errors by retaining -fveclib.

Second, the way I see -fveclib=X is for vector functions of something that is already available in math.h. Libraries like SVML and Accelerate contain only the vector version of the functions that are provided separately by libm or other scalar math library that implement math.h. For use provided library, this is not granted, as the compiler will expect to find both scalar and vector version "somewhere". Therefore I'd rather have a separate -fveclib-include option for external library, which would make explicit the fact that the user needs to provide at link time whatever library is needed to resolve the symbols generated by the compiler (scalar and vector).

Finally, -fveclib=X make it more easy for a Fortran front-end to pick up the list of vector functions associated to the library X, as I believe it doesn't make sense to use an -include option in a Fortran context.

hsaito added inline comments.Nov 28 2018, 2:00 PM

RFC-fveclib.md
39–40	This is fine with me. Aligns enough with my earlier comments. @hfinkel was asking why we need two separate flags and that's why my summary was purely based on -include.
44–45	Taking my earlier comment back.
47–50	Taking my earlier comment back.

fpetrogalli marked 12 inline comments as done.Nov 29 2018, 12:46 PM

fpetrogalli added inline comments.

RFC-fveclib.md
238	I have added the following paragraph (at the end of this section) to address this: Notice that to fully support OpenMP vectorization we need to think about a fuzzy matching mechanism that is able to select a candidate in the calling context. However, this is not needed for `-fveclib` because the scalar-to-vector mappings of `-fveclib` are such that for every scalar function there is only one possible vector function associated. Therefore, extending this behavior to a generic one is an aspect of the implementation that will be treated in a separate RFC about the vectorization pass.

fpetrogalli updated this revision to Diff 175937.Nov 29 2018, 12:49 PM

fpetrogalli marked an inline comment as done.

hsaito added inline comments.Nov 29 2018, 1:16 PM

RFC-fveclib.md
238	Thanks!

fpetrogalli marked 4 inline comments as done.Nov 29 2018, 9:40 PM

steleman added a subscriber: steleman.Nov 29 2018, 9:52 PM

shibatch added a subscriber: shibatch.Nov 30 2018, 4:07 AM

I feel like pointing out that this RFC did *NOT* go to the lists, which makes zero sense, since it's RFC.
Usually, it is recommended to close the old review and submit a new one, making sure that it *does* go to the lists..

RKSimon added a subscriber: RKSimon.Dec 4 2018, 1:58 AM

smaslov added a subscriber: smaslov.Dec 6 2018, 2:48 PM

In D54412#1317294, @lebedev.ri wrote:

I feel like pointing out that this RFC did *NOT* go to the lists, which makes zero sense, since it's RFC.

Hi - sorry about that - I thought that sending it out on llvm-dev and cfe-dev was enough.

Usually, it is recommended to close the old review and submit a new one, making sure that it *does* go to the lists..

I see. I am happy to create a new one if that is a requirement for the RFC submission. One question: how do I make sure it goes to the lists? I thought I did it and I want to avoid making the same error.

Kind regards,

Francesco

In D54412#1325449, @fpetrogalli wrote:

In D54412#1317294, @lebedev.ri wrote:

I feel like pointing out that this RFC did *NOT* go to the lists, which makes zero sense, since it's RFC.

Hi - sorry about that - I thought that sending it out on llvm-dev and cfe-dev was enough.

No.

Usually, it is recommended to close the old review and submit a new one, making sure that it *does* go to the lists..

I see. I am happy to create a new one if that is a requirement for the RFC submission. One question:

how do I make sure it goes to the lists? I thought I did it and I want to avoid making the same error.

By setting correct "repo" field (best) and/or adding the appropriate -commits list as the subscriber.

Kind regards,

Francesco

In D54412#1325453, @lebedev.ri wrote:

In D54412#1325449, @fpetrogalli wrote:

I see. I am happy to create a new one if that is a requirement for the RFC submission. One question:

how do I make sure it goes to the lists? I thought I did it and I want to avoid making the same error.

By setting correct "repo" field (best) and/or adding the appropriate -commits list as the subscriber.

Sure. Can you specify exactly which are the repo fields I have to set? Sorry, it is not cleat to me from the changes you have done here.

Moreover, I can see though that you have subscribed llvm-commits. Is that the only one I need to add to the reviewers field? No cfe-commits list?

In D54412#1325477, @fpetrogalli wrote:

In D54412#1325453, @lebedev.ri wrote:

In D54412#1325449, @fpetrogalli wrote:

I see. I am happy to create a new one if that is a requirement for the RFC submission. One question:

how do I make sure it goes to the lists? I thought I did it and I want to avoid making the same error.

By setting correct "repo" field (best) and/or adding the appropriate -commits list as the subscriber.

Sure. Can you specify exactly which are the repo fields I have to set? Sorry, it is not cleat to me from the changes you have done here.

Moreover, I can see though that you have subscribed llvm-commits. Is that the only one I need to add to the reviewers field? No cfe-commits list?

RFCs should be sent to llvm-dev (or cfe-dev, etc.) and that can't be done from Phabricator.

rengolin added reviewers: hfinkel, rengolin, hsaito.Dec 11 2018, 7:02 AM

rengolin added a subscriber: rengolin.

Just to clarify - to me it is not clear whether I am required to create a new review on phabricator to be able to refer to this proposal as an RFC, and if so, which fields have to be set, and with which values, to make sure that the proposal is recognized as an official RFC by the LLVM community.

Is there a document that explains how to submit an RFC?

In D54412#1327560, @fpetrogalli wrote:

Is there a document that explains how to submit an RFC?

Hi Francesco,

It's not that complex for a document. The text you have here would be more than fine, and the discussion that happened is exactly what we look for, but Phab reviews only copy llvm-commits, not llvm-dev, and that is the main problem.

Since there's a lot of high quality discussions here, I suggest you post the same content, but with a summary of the feedback you had, and the changes in design you had to do because of that.

This would avoid Hal and Hideki from having to say it all over again, and will get more people into the right point in the conversation.

cheers,
--renato

In D54412#1327574, @rengolin wrote:

In D54412#1327560, @fpetrogalli wrote:

Is there a document that explains how to submit an RFC?

Hi Francesco,

It's not that complex for a document. The text you have here would be more than fine, and the discussion that happened is exactly what we look for, but Phab reviews only copy llvm-commits, not llvm-dev, and that is the main problem.

I see! Now it is clear to me - I thought people were asking me to create a new review on phabricator of the same document, and move the discussion there. What was really being asked was to submit the actual text of the proposal in the body of the email announcing the RFC... I will do it.

Since there's a lot of high quality discussions here, I suggest you post the same content, but with a summary of the feedback you had, and the changes in design you had to do because of that.

This would avoid Hal and Hideki from having to say it all over again, and will get more people into the right point in the conversation.

I would like to avoid writing a summary, and take care of clarifying what is being asked by pointing them at specific comments in the review if the question has already been answered in phabricator.

cheers,
--renato

Hello again - I have posted the RFC on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2018-December/128426.html

Regards,

Francesco

chriselrod added a subscriber: chriselrod.Feb 26 2019, 2:19 PM

Herald added a subscriber: jdoerfert. · View Herald TranscriptFeb 26 2019, 2:19 PM

simoll added a subscriber: simoll.Jun 3 2019, 2:21 AM

tim.schmielau added a subscriber: tim.schmielau.Apr 21 2022, 10:50 PM

Herald added a reviewer: jdoerfert. · View Herald TranscriptApr 21 2022, 10:50 PM

Herald added a project: Restricted Project. · View Herald Transcript

Herald added subscribers: sstefan1, yaxunl. · View Herald Transcript

fpetrogalli abandoned this revision.Jan 17 2023, 12:42 AM

Herald added a subscriber: • pcwang-thead. · View Herald TranscriptJan 17 2023, 12:42 AM

Revision Contents

Path

Size

RFC-fveclib.md

297 lines

Diff 173609

RFC-fveclib.md

This file was added.

				---
				title: \[RFC\] Implementing -fveclib using OpenMP.
				toc: yes
				geometry: margin=1.5in
				colorlinks: yes
				fontfamily: mathpazo
				...

				# Introduction

				This RC encompass the proposal of replacing the current
				hfinkelUnsubmitted Done Reply Inline Actions RC -> RFC hfinkel: RC -> RFC
				``TargetLibraryInfo`` (TLI) based implementation of the command line
				``-fveclib`` with an OpenMP based one.

				With this change, `-fveclib` will maintain its current behavior in
				terms of user experience, but the new implementation will
				additionally:

				1. Decouples the compiler front-end that knows about the availability
				of vectorized routines, from the back-end that knows how to make
				use of them.
				2. Enable support for a developer's own vector libraries without
				requiring changes to the compiler?
				hfinkelUnsubmitted Done Reply Inline Actions Why is this a question? Isn't this exactly the same infastructure we're using for OpenMP delcare simd, in which case it does exactly this. It also enables other frontends and languages to add things these things as relevant for their own runtime libraries, etc. hfinkel: Why is this a question? Isn't this exactly the same infastructure we're using for OpenMP…

				The implementation of the proposal will consists of the following
				components:

				1. [Changes in LLVM IR](#llvmIR) to provide information about the
				availability of vector math functions via meta-data attached to an
				hfinkelUnsubmitted Done Reply Inline Actions meta-data -> metadata (here and below) hfinkel: meta-data -> metadata (here and below)
				``llvm::CallInst``.
				2. [An infrastructure](#infrastructure) that can be queried to retrive
				information about the available vector functions associated to a
				``llvm::CallInst``.
				3. [Changes in the LoopVectorizer](#LV) to use the API to query the
				meta-data.
				4. [Changes in clang](#mathdoth) to add the meta-data in the IR via a
				custom header file shipped with the compiler.
				5. [Changes in the clang driver](#driver) to translate ``-fveclib`` in
				a combination of ``-fopenmp-simd`` and library-specific flags.

				hsaitoUnsubmitted Done Reply Inline Actions I think we would want to have a mechanism to include the custom header file from a non-default location, say, for some non-compiler vendor, supplying a new veclib implementation. Something like -fveclib-include=<full-path>/filename.h. By default, it'll be <clang's include dir>/{SVML, Accelerate, ...}.h for those that are shipped with the compiler. hsaito: I think we would want to have a mechanism to include the custom header file from a non-default…
				fpetrogalliAuthorUnsubmitted Done Reply Inline Actions Let me see if I got your proposal right. We add in `<clang>/lib/Headers` a `math.h` plus one header file for each of the libraries we need to expose to the vectorizer, as follows: #include_once <math.h> #ifdef _CLANG_USE_ACCELERATE #include "Accelerate.h" #endif #ifdef _CLANG_USE_SVML #include "SVML.h" #endif The header `Accelerate.h` and `SVML.h` would add the `declare simd` directive to the declarations. The option `-fveclib-include=<full-path>/filename.h` trigger the generation of the metadata, by triggering the mechanism described in this RFC to generate the `vector-variant` metadata. This is useful as it allows auto-vectorization of external function calls for libraries that do not relate to `math.h`. The option `-fveclib=X` is a wrapper that (a) selects the correct section in `<clang>/lib/Headersmath.h` with a macro (e.g. `_CLANG_USE_X`), and moreover triggers the generation of the metadata described in this RFC. Under the assumption I interpreted your suggestion right, I think this is a very useful change, as it fully realizes the improvement described in the introduction: Enable support for a developer's own vector libraries without requiring changes to the compiler. fpetrogalli: Let me see if I got your proposal right. 1. We add in `<clang>/lib/Headers` a `math.h` plus…
				hsaitoUnsubmitted Done Reply Inline Actions What I wrote was below, but I think what you wrote should also work. Either way is fine with me. If I understand your interpretation correctly, -fveclib and -fveclib-include should be mutually exclusive. -fveclib=X w/o explicit -fveclib-include defaults to the behavior you wrote. Works only for the ones shipped with the compiler. -fveclib=X w/ explicit -fveclib-include should pick up the specified header file (from anywhere in the accessible file system). hsaito: What I wrote was below, but I think what you wrote should also work. Either way is fine with me.
				hfinkelUnsubmitted Done Reply Inline Actions Why do we need a separate flag for this? Isn't it the same as using '-include <full-path>/filename.h'? hfinkel: Why do we need a separate flag for this? Isn't it the same as using '-include <full…
				fpetrogalliAuthorUnsubmitted Done Reply Inline Actions Why do we need a separate flag for this? Isn't it the same as using '-include <full-path>/filename.h'? Am I correct thinking that we want to have new separate flag because the #pragma directives are not parsed by default? fpetrogalli: > Why do we need a separate flag for this? Isn't it the same as using '-include <full…
				hsaitoUnsubmitted Done Reply Inline Actions Am I correct thinking that we want to have new separate flag because the #pragma directives are not >parsed by default? If we'd like to enable/disable #pragma veclib parsing, yes. Else no (i.e., enable parsing by default. and no way to turn it off, other than #pragma veclib becoming #pragma omp if _OPENMP is seen before parsing the header file). If a programmer doesn't want veclib, don't bother including the header file. That usage model works for me, I think. I think the summary of discussion here would be: No veclib support ==> do not use -include. veclib (shipped with compiler) -include filename.h (look under system include dir by default) veclib (3rd party) -include <path>/filename.h veclib w/o _OPENMP seen: #pragma veclib declare simd clang should parse this w/o a flag. veclib w/ _OPENMP seen: #pragma omp declare simd clang should parse this as part of OpenMP support. using something like #ifndef _OPENMP #define VECLIB_PRAGMA veclib #else #define VECLIB_PRAGMA omp #endif hsaito: >Am I correct thinking that we want to have new separate flag because the #pragma directives…
				fpetrogalliAuthorUnsubmitted Done Reply Inline Actions Hum - I was thinking something slightly different. I am not really keen in removing the -fveclib option, as it makes very clear to the user what to expect. Let me iterate once again on a high level view of what I am trying to do here. The LLVM view. The backend is informed about the availability of vector functions via the vector-varian attribute, with non standard name redirection. The standard names of the vector variant is generated according to the mangling rule of a vector function ABI based on OpenMP. The vector-variant attribute is all the vectorizer needs to know to be able to vectorize the function. The clang view. clang generates the vector-variant attribute when a function is decorated with #pragma omp declare simd, and with #pragma omp declare variant simd The -fveclib view. We want to retain auto-vectorization of functions even when OpenMP is not enabled in the compiler. 2. This is done by adding the functions of a library in the system header files shipped with clang, and by cloning the #pragma omp declare simd and #pragma omp declare variant simd directive into the "veclib" ones. Parsing of the veclib directive is enabled only with the -fveclib command line option, that select the proper header file section shipped by the compiler (if we are not happy with math.h, we could ship the compiler with a veclib.h that is automatically included when -fveclib is used) The user libraries The option -fveclib-include= activate the parsing of the veclib pragma and loads the list from a user provided header file. Summary: 0. No option -> nothing happens `-fveclib=X` becomes `-fparse-veclib -D__CLANG_ENABLE_LIBRARY_X -lX -include=<path>/to/lib/Headers/veclib.h`. The header veclib.h is shipped with the compiler. `-fveclib-include=/path/to/my/header.h` becomes `-fparse-veclib -include=/path/to/my/header.h`. The user has to provide the correct linker flangs. `-fopenmp[-simd]` no vectorization happens other then for those functions that are marked with OpenMP declare simd. The header veclib.h is not loaded so nothing gets vectorized under the hood `-fopenmp[-simd] -fveclib=X` or `-fopenmp[-simd] -fveclib-include=/path/to/my/header.h` behave like in 1. and 2. Item 1. 2. 3. and 4. use the same mechanism of vector-variant. The only difference is that for the -fveclib based option we replace the openMP directive with the veclib one to make sure we can keep the two functionality separated (vectorize OpenMP delcare simd _without_ also enabling -fveclib). fpetrogalli: Hum - I was thinking something slightly different. I am not really keen in removing the…
				hsaitoUnsubmitted Done Reply Inline Actions This is fine with me. Aligns enough with my earlier comments. @hfinkel was asking why we need two separate flags and that's why my summary was purely based on -include. hsaito: This is fine with me. Aligns enough with my earlier comments. @hfinkel was asking why we need…


				hsaitoUnsubmitted Done Reply Inline Actions I think we need to come up with a mechanism to enable the veclib mapping without fully enabling -fopenmp-simd. Decorating the veclib stuff with an extra "veclib" clause added to omp declare simd and create a mode to process just those would do, and I'm open to other suggestions. Rationale: auto-vectorization of loops and slp vectorization, using veclib, is really a separate issue from OpenMP SIMD. hsaito: I think we need to come up with a mechanism to enable the veclib mapping without fully enabling…
				hfinkelUnsubmitted Done Reply Inline Actions I agree with this. We should add a clang pragma that we always honor independent of the OpenMP settings. hfinkel: I agree with this. We should add a clang pragma that we always honor independent of the OpenMP…
				fpetrogalliAuthorUnsubmitted Done Reply Inline Actions I agree too, thank you for raising this. Could we do the same by restricting `-fopenmp-simd` to something like `-fopenmpo-declare-simd`? This would process only the `declare simd` directive and its associated `declare variant` for simd. I think that this would: Make out internal header files and the one the libraries would have to provide look more standard (they would be able to use them with any other compiler, in principle). Avoid adding parser code to duplicate the new directive. Does it make sense? fpetrogalli: I agree too, thank you for raising this. Could we do the same by restricting `-fopenmp-simd` to…
				hsaitoUnsubmitted Done Reply Inline Actions Not quite the same thing, because auto-vectorization can still pick up vector functions outside of veclib. I think the include file can be written with a macro like below and still compatible with other OpenMP compilers. I don't know how the clang FE folks likes/hates parsing this when -fopenmp/-fopenmp-simd isn't thrown, however. #ifdef clang #define CLANG_VECLIB clang_veclib #else #define CLANG_VECLIB #endif snip snip snip #pragma omp declare simd CLANG_VECLIB hsaito: Not quite the same thing, because auto-vectorization can still pick up vector functions outside…
				hfinkelUnsubmitted Done Reply Inline Actions Avoid adding parser code to duplicate the new directive. I think that this is exactly what we should do, however. We can refactor things so that we have a '#pragma clang declare simd', or similar, that we always accept (even if we ignore the '#pragma omp declare simd' spelling when OpenMP is not enabled). An alternative might be to enable OpenMP simd parsing by default? hfinkel: > Avoid adding parser code to duplicate the new directive. I think that this is exactly what…
				# Current status of `-fveclib`

				## User interface
				hsaitoUnsubmitted Done Reply Inline Actions -include hsaito: -include
				hsaitoUnsubmitted Done Reply Inline Actions Taking my earlier comment back. hsaito: Taking my earlier comment back.

				At the moment, a user can invoke `-fveclib` to generate vector calls
				from two libraries, SVML and Accelerate, as follows:

				```
				hsaitoUnsubmitted Done Reply Inline Actions Retire -fveclib flag. hsaito: Retire -fveclib flag.
				hsaitoUnsubmitted Done Reply Inline Actions Taking my earlier comment back. hsaito: Taking my earlier comment back.
				$> clang -fveclib=[SVML\|Accelerate]
				```
				hsaitoUnsubmitted Done Reply Inline Actions I think this subsection, explaining current situation, doesn't need any changes. hsaito: I think this subsection, explaining current situation, doesn't need any changes.

				## Interface with the loop vectorizer

				The TLI exposes an interface that enables querying the list of
				available mappings by scalar name and number of lanes needed. The TLI
				interface is currently used by the InnerLoopVectorizer to plant vector
				calls in auto-vectorized loops.

				## Extending `-fveclib`

				Adding new libraries require listing the mapping in
				`<llvm>/lib/Analisys/TargetLibraryInfo.cpp`, plus modifying the clang
				front-end to handle the new value for the option - see for example the
				two patches to add SLEEF (<http://sleef.org>) as a target library for
				AArch64: <https://reviews.llvm.org/D53927> (LLVM code-base) and
				<https://reviews.llvm.org/D53928> (clang code-base).

				## Limitations of the current implementation

				The mapping between scalar to vector version of a function is defined
				by the backend, within the TLI specifically. For this reason the
				frontend's -fveclib option is tied to the backend's support for the,
				often language dependent, library. In particular, an IR file that is
				generated with a version of clang that knows about the availability of
				library ``X``, needs to be processed by a backend end that also needs
				to know about the availability of library ``X``.

				# Proposed changes

				We propose an implementation of ``-fveclib`` that makes uses of an
				OpenMP ``declare simd`` based mechanism to inform the backend
				components about the availability of vector version of scalar
				functions found in IR. The mechanism relies in storing such
				information in IR meta-data, and therefore makes the
				auto-vectorization of function calls a mid-end (``opt``) process that
				is independent on the front-end that generated such IR meta-data.

				Moreover, this implementation enhances the extendibility and
				portability of ``-fveclib`` to other libraries and front-ends, and it
				provides a generic mechanism that the users of the LLVM compiler will
				be able to use for interfacing their own vector routines for generic
				code.

				The proposed implementation can also be used to expose
				vectorization-specific descriptors -- for example, like the ``linear``
				and ``uniform`` clauses of the OpenMP ``declare simd`` directive --
				that could be used to finely tune the automatic vectorization of some
				functions (think for example the vectorization of ``double
				sincos(double , double , double )``, where ``linear`` can be used to
				give extra information about the memory layout of the 2 pointers
				parameters in the vector version).


				## Changes in LLVM IR {#llvmIR}

				The IR is enriched with meta-data that details the availability of
				vector versions of an associated scalar function. This metadata is
				attached to the call site of the scalar function.

				The meta-data takes the form of an attribute containing a comma
				separated list of vector function mappings. Each entry has a
				unique name that follows the Vector Function ABI[^3] and real name that
				is used when generating calls to this vector function.

				```
				vfunc_name1(real_name1), vfunc_name2(real_name2)
				```

				The Vector Function ABI name describes the signature of the vector
				function so that properties like vectorisation factor can be queried
				during compilation.

				The real name is optional and assumed to match the vector function ABI
				name when omitted.

				For example, the availability of a 2-lane double precision ``sin``
				function via SVML when targeting AVX on x86 is provided by the
				hsaitoUnsubmitted Done Reply Inline Actions standard hsaito: standard
				following IR.

				```
				// ...
				... = call double @sin(double) #0
				// ...

				#0 = { vector-variant = {"_ZGVcN2v_sin(__svml_sin2),
				_ZGVdN4v_sin(__svml_sin4),
				..."} }
				```
				hsaitoUnsubmitted Done Reply Inline Actions I have a reservation about this simple substitution, but if other people think this will work, then I'm fine. hsaito: I have a reservation about this simple substitution, but if other people think this will work…

				The string ``"_ZGVcN2v_sin(__svml_sin2)"`` in this vector-variant
				attribute provides information on the shape of the vector function via
				the string ``_ZGVcN2v_sin``, mangled according to the Vector Function
				ABI for Intel, and remaps the standard Vector Function ABI name to the
				non-standard name ``__svml_sin2``.

				This meta-data is compatible with the proposal "Proposal for function
				vectorization and loop vectorization with function calls",[^1] that
				uses Vector Function ABI mangled names to inform the vectorizer about
				the availability of vector functions. The proposal extends the
				original by allowing the explicit mapping of the Vector Function ABI
				mangled name to a non-standard name, which allows the use of existing
				vector libraries.

				The ``vector-variant`` attribute needs to be attached on a per-call
				basis to avoid conflicts when merging modules with different vector
				variants.

				## The query infrastructure: SVFS {#infrastructure}

				The Search Vector Function System (SVFM) is constructed from an
				``llvm::Module`` instance so it can create function definitions. The
				SVMS exposes an API with two methods.

				### ``SVFS::isFunctionVectorizable``

				This method queries the avilability of a vectorized version of a
				function. The signature of the method is as follows.

				```
				bool isFunctionVectorizable(llvm::CallInst * Call, ParTypeSet Params);
				```

				The method determine the availability of vector version of the
				function invoked by the ``Call`` parameter by looking at the
				``vector-variant`` meta-data.

				The ``Params`` argument is a set mapping the position of a parameter
				in the CallInst to its ``ParameterType`` descriptor. The
				``ParameterType`` descriptor holds information about the shape of the
				correspondend parameter in the signature of the vector function. This
				``ParamaterType`` is used to query the SVMS about the availability
				of vector version that have ``linear`` or ``uniform`` parameters (in
				the sense of OpenMP 4.0 and onwards).

				The method we propose, when invoked with an empty ``ParTypeSet``, is
				equivalent to the ``TargetLibraryInfo`` method
				``isFunctionVectorizable(StrinRef Name)``

				### ``SVFS::getVectorFunction``

				This method returns the vector function declaration that correspond
				to the needs of the vectorization technique that is being run.

				The signature of the function is as follows.

				```
				std::pair<llvm::FunctionType *, std::string> getVectorFunction(
				llvm::CallInst * Call, unsigned VF, bool IsMasked, ParTypeSet Params);
				```

				The ``Call`` parameter is the call instance that is being vectorized,
				the ``VF`` parameter represent the vectorization factor (how many
				lanes), the ``IsMasked`` parameter decides whether or not the
				signature of the vector function is required to have a mask parameter,
				the ``Params`` parameter describes the shape of the vector function as
				in the ``isFunctionVectorizable`` method.

				hsaitoUnsubmitted Done Reply Inline Actions (SVFS) hsaito: (SVFS)
				The methods uses the ``vector-variant`` meta-data and returns the
				function signature and the name of the function based on the input
				hsaitoUnsubmitted Done Reply Inline Actions SVFS hsaito: SVFS
				parameters.

				The SVFS can add new function definitions, in the same module as the
				``Call``, to provide vector functions that are not present within the
				vector-variant meta-data. For example, if a library provides a vector
				version of a function with a vectorization factor of 2, but the
				vectorizer is requesting a vectorization factor of 4, the SVFS is
				allowed to create a definition that calls the 2-lane version
				(provided by the library) twice. This capability applies similarly for
				providing masked and unmasked versions when the request doesn't match
				what is available in the library.

				This method is equivalent to the TLI method ``getVectorFunction(TODO:
				hfinkelUnsubmitted Done Reply Inline Actions TODO? hfinkel: TODO?
				fpetrogalliAuthorUnsubmitted Done Reply Inline Actions Done, I have also changed the name to the correct one of the TLI (`getVectorizedFunction`). fpetrogalli: Done, I have also changed the name to the correct one of the TLI (`getVectorizedFunction`).
				add signature)``.

				### Scalable vectorization

				Both methods of the SVFS API will be extended with a boolean parameter
				to specify whether scalable signatures are needed by the user of the
				SVFS.

				## Changes in the LoopVectorizer {#LV}

				The LoopVectorizer and the related analisys passes will have to
				hfinkelUnsubmitted Done Reply Inline Actions analisys -> analysis hfinkel: analisys -> analysis
				replace the TLI version of ``isFunctionVectorizable`` and
				``getVectorFunction`` with the SVFS ones.

				hsaitoUnsubmitted Done Reply Inline Actions This isn't specific to veclib, but we need to think about what to do when there are more than one candidate that "fuzzy matches" the calling context. We should provide a standard "scoring system" that returns the "best match", but that scoring system may not be "one size fits all". For veclib, we probably could take some short cut --- but if we want to standardize with OpenMP declare simd, we should think about multiple fuzzy matches. hsaito: This isn't specific to veclib, but we need to think about what to do when there are more than…
				fpetrogalliAuthorUnsubmitted Done Reply Inline Actions I have added the following paragraph (at the end of this section) to address this: Notice that to fully support OpenMP vectorization we need to think about a fuzzy matching mechanism that is able to select a candidate in the calling context. However, this is not needed for `-fveclib` because the scalar-to-vector mappings of `-fveclib` are such that for every scalar function there is only one possible vector function associated. Therefore, extending this behavior to a generic one is an aspect of the implementation that will be treated in a separate RFC about the vectorization pass. fpetrogalli: I have added the following paragraph (at the end of this section) to address this: >…
				hsaitoUnsubmitted Done Reply Inline Actions Thanks! hsaito: Thanks!
				## Changes in clang: shipping ``math.h`` with the compiler {#mathdoth}

				We use clang to generate the meta-data described above. The functions
				available in library ``X`` are listed in a custom ``math.h`` file
				that is shipped with the compiler in ``<clang>/lib/Headers/math.h``.
				The header file is implemented by including "once" the system
				``math.h`` file, followed by ``#ifdef`` guarded re-declarations of the
				functions enriched with ``#pragma omp declare simd`` directives.

				```
				#include_once <math.h>

				// ... cpp extern "C" guards omitted

				#ifdef CONDITION_FOR_LIBRARY_X
				#pragma omp declare simd simdlen(4) notinbranch
				extern double sin(double);
				#endif
				```

				This generates the vector Function ABI mangled name to be used in
				the ``vector-variant`` attribute, for example ``_ZGVcN2v_sin``, when
				targeting AVX code generation.

				The part of the vector-variant attribute that redirects the call to
				``__svml_sin2`` is also added via the header file ``math.h``, by using
				the OpenMP 5.0 directive ``declare variant``,[^2] guarded by SVML
				specific preprocessor macros:

				```
				#ifdef CONDITION_FOR_SVML
				hfinkelUnsubmitted Done Reply Inline Actions This needs to be in the reserved namespace. Maybe something like: _CLANG_USE_SVML hfinkel: This needs to be in the reserved namespace. Maybe something like: _CLANG_USE_SVML
				hfinkelUnsubmitted Done Reply Inline Actions To update my own suggestion, we might use: _CLANG_USE_VECLIB_X (but we should be consistent throughout the document - below you have LIBRARY_X). hfinkel: To update my own suggestion, we might use: _CLANG_USE_VECLIB_X (but we should be consistent…
				#pragma omp declare simd simdlen(4) notinbranch
				extern double sin(double);

				#pragma omp declare variant(double sin(double)) \
				match(construct=simd(simdlen(4),notinbranch), device={isa(avx2)})
				__m256d __svml_sin4(__m256d x);
				#endif
				```

				## Changes in the clang driver {#driver}

				To enable the information provided via ``math.h``, the clang driver
				will translate the ``-fveclib=X`` option into ``-fopenmp-simd
				-DCONDITION_FOR_LIBRARY_X`` to turn on the correct section of the
				header file.

				# Extending auto-vectorization capabilities of LLVM

				When compared to the TLI-based auto-vectorization mechanism, the
				OpenMP-based mechanism has the advantage of enabling users to provide
				their own vector routines (not just the math ones) by adding
				``declare simd`` and ``declare variant`` definitions in their source.

				[^1]: <http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html>

				[^2]: <https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf>

				[^3] Vector Funcion ABI for x86: <https://software.intel.com/en-us/articles/vector-simd-function-abi>. Vector Function ABI for AArch64: https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi

This is an archive of the discontinued LLVM Phabricator instance.

[RFC] Re-implementing -fveclib with OpenMPAbandonedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 173609

RFC-fveclib.md

[RFC] Re-implementing -fveclib with OpenMP
AbandonedPublic