Index: RFC-fveclib.md
===================================================================
--- /dev/null
+++ RFC-fveclib.md
@@ -0,0 +1,428 @@
+---
+title: \[RFC\] Implementing -fveclib using OpenMP.
+toc: yes
+geometry: margin=1.5in
+colorlinks: yes
+fontfamily: mathpazo
+...
+
+# Introduction
+
+This RFC proposes replacing the current ``TargetLibraryInfo`` (TLI)
+based implementation of the ``-fveclib`` command line option with an
+OpenMP-based one.
+
+With this change, `-fveclib` will maintain its current behavior in
+terms of user experience, but the new implementation will
+additionally:
+
+1. Decouple the compiler front-end, which knows about the availability
+   of vectorized routines, from the back-end, which knows how to make
+   use of them.
+2. Enable support for a developer's own vector libraries without
+   requiring changes to the compiler, via the new ``-fveclib-include``
+   command line option.
+3. Enable other front-ends and languages to add scalar-to-vector
+   function mappings relevant to their own runtime libraries.
+
+The implementation of the proposal will consist of the following
+components:
+
+1. [Changes in LLVM IR](#llvmIR) to provide information about the
+   availability of vector math functions via metadata attached to an
+   ``llvm::CallInst``.
+2. [An infrastructure](#infrastructure) that can be queried to retrieve
+   information about the available vector functions associated with an
+   ``llvm::CallInst``.
+3. [Changes in the LoopVectorizer](#LV) to use the API to query the
+   metadata.
+4. [Changes in clang](#mathdoth) to add the metadata in the IR via two
+   mechanisms:
+
+   1. A custom ``math.h`` header file shipped with the compiler.
+   2. A user header file distributed with the library, to be used with
+      the command line option ``-fveclib-include``.
+
+5. [Changes in the clang driver](#driver) to translate ``-fveclib`` into
+   the combination of library-specific flags needed to select the list
+   of available vector functions specified in the header files.
+
+# Current status of `-fveclib`
+
+## User interface
+
+At the moment, a user can invoke `-fveclib` to generate vector calls
+from two libraries, SVML and Accelerate, as follows:
+
+```
+$> clang -fveclib=[SVML|Accelerate]
+```
+
+## Interface with the loop vectorizer
+
+The TLI exposes an interface that enables querying the list of
+available mappings by scalar name and number of lanes needed. The TLI
+interface is currently used by the InnerLoopVectorizer to plant vector
+calls in auto-vectorized loops.
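+
+As an illustration of this query model, the C++ sketch below mimics a
+TLI-style table keyed by scalar name and number of lanes. The type
+``VecMapping``, the function ``lookupVectorName`` and the table
+contents are hypothetical simplifications, not the actual TLI API; the
+SVML names are the ones used in the examples later in this RFC.
+
+```cpp
+#include <cassert>
+#include <string>
+#include <vector>
+
+// Hypothetical, simplified entry of a TLI-style scalar-to-vector table.
+struct VecMapping {
+  std::string ScalarName; // e.g. "sin"
+  std::string VectorName; // e.g. "__svml_sin4"
+  unsigned VF;            // number of lanes of the vector version
+};
+
+// Illustrative table; in the current implementation such lists are
+// hard-coded in the back-end (TargetLibraryInfo.cpp).
+static const std::vector<VecMapping> SVMLTable = {
+    {"sin", "__svml_sin2", 2},
+    {"sin", "__svml_sin4", 4},
+};
+
+// Returns the vector function name for (Scalar, VF), or "" if none exists.
+std::string lookupVectorName(const std::string &Scalar, unsigned VF) {
+  for (const VecMapping &M : SVMLTable)
+    if (M.ScalarName == Scalar && M.VF == VF)
+      return M.VectorName;
+  return "";
+}
+```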
+
+## Extending `-fveclib`
+
+Adding new libraries requires listing the mappings in
+`<llvm>/lib/Analysis/TargetLibraryInfo.cpp`, plus modifying the clang
+front-end to handle the new value for the option - see for example the
+two patches to add SLEEF (<http://sleef.org>) as a target library for
+AArch64: <https://reviews.llvm.org/D53927> (LLVM code-base) and
+<https://reviews.llvm.org/D53928> (clang code-base).
+
+## Limitations of the current implementation
+
+The mapping from the scalar to the vector version of a function is
+defined by the back-end, specifically within the TLI. For this reason,
+the front-end's ``-fveclib`` option is tied to the back-end's support
+for the (often language-dependent) library. In particular, an IR file
+generated by a version of clang that knows about the availability of
+library ``X`` must be processed by a back-end that also knows about the
+availability of library ``X``.
+
+# Proposed changes
+
+We propose an implementation of ``-fveclib`` that makes use of a
+_veclib specific_ pragma, based on the OpenMP ``declare simd`` and
+``declare variant`` mechanisms, to inform the back-end components
+about the availability of vector versions of scalar functions found in
+IR. The mechanism relies on storing such information in IR metadata,
+and therefore makes the auto-vectorization of function calls a mid-end
+(``opt``) process that is independent of the front-end that generated
+such IR metadata.
+
+Moreover, this implementation enhances the extensibility and
+portability of ``-fveclib`` to other libraries and front-ends, and it
+provides a generic mechanism that users of the LLVM compiler can use
+to interface their own vector routines with generic code.
+
+The proposed implementation can also be used to expose
+vectorization-specific descriptors, such as the ``linear`` and
+``uniform`` clauses of the OpenMP ``declare simd`` directive, that
+could be used to fine-tune the automatic vectorization of some
+functions. Consider, for example, the vectorization of
+``void sincos(double, double *, double *)``, where ``linear`` can be
+used to give extra information about the memory layout of the two
+pointer parameters in the vector version.
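+
+To make the ``linear`` case concrete, the C++ sketch below emulates
+what a 2-lane vector ``sincos`` could do when both pointer parameters
+are ``linear`` with a step of one element: lane ``i`` stores through
+``s + i`` and ``c + i``, so the results land in consecutive memory. If
+the pointers were ``uniform`` instead, all lanes would receive the same
+address. The name ``sincos_linear2`` is hypothetical, and the loop only
+stands in for real SIMD code.
+
+```cpp
+#include <array>
+#include <cassert>
+#include <cmath>
+
+// Hypothetical emulation of a 2-lane sincos whose pointer parameters are
+// linear with step 1: lane i writes s[i] and c[i].
+void sincos_linear2(const std::array<double, 2> &x, double *s, double *c) {
+  for (int lane = 0; lane < 2; ++lane) {
+    s[lane] = std::sin(x[lane]);
+    c[lane] = std::cos(x[lane]);
+  }
+}
+```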
+
+The proposed new ``#pragma`` directives are:
+
+1. ``#pragma veclib declare simd``.
+2. ``#pragma veclib declare variant``.
+
+Both directives follow the syntax of the ``declare simd`` and
+``declare variant`` directives of OpenMP, with the exception that
+``veclib declare variant`` is used only for the ``simd`` context.
+
+We define a new ``veclib``-only directive instead of reusing the
+``omp`` ones of OpenMP for the following reasons:
+
+1. Allow the compiler to perform auto-vectorization outside of an
+   OpenMP SIMD context.
+2. Allow library vendors to use a standard, OpenMP-based mechanism to
+   inform the compiler about the availability of vector functions that
+   can be used for auto-vectorization.
+
+A new compiler option, ``-fparse-veclib``, is added to clang to enable
+parsing of the ``veclib`` directives outside an OpenMP context.
+
+## OpenMP compatibility
+
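+Note that the ``veclib`` pragma can be converted to the standard
+OpenMP one with the following preprocessor guard, provided that the
+compiler macro-expands the token that follows ``#pragma`` (the C
+standard permits, but does not require, macro replacement in
+non-``STDC`` pragmas).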
+
+```
+#ifdef _OPENMP
+#define veclib omp
+#endif
+```
+
+Notice also that the ``veclib declare simd`` and
+``veclib declare variant`` directives can be parsed with the same
+infrastructure used for their OpenMP counterparts.
+
+In the rest of this RFC, we describe how the compiler behaves when
+parsing a ``veclib`` pragma. The same behavior is obtained when
+parsing the corresponding OpenMP pragma with the command line options
+that enable OpenMP (``-fopenmp[-simd]``).
+
+## Changes in LLVM IR {#llvmIR}
+
+The IR is enriched with metadata that details the availability of
+vector versions of an associated scalar function. This metadata is
+attached to the call site of the scalar function.
+
+The metadata takes the form of an attribute containing a
+comma-separated list of vector function mappings. Each entry has a
+unique name that follows the Vector Function ABI[^3] and a real name
+that is used when generating calls to this vector function.
+
+```
+vfunc_name1(real_name1), vfunc_name2(real_name2)
+```
+
+The Vector Function ABI name describes the signature of the vector
+function so that properties like the vectorization factor can be
+queried during compilation.
+
+The real name is optional and assumed to match the vector function ABI
+name when omitted.
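+
+A minimal C++ sketch of how a consumer could split such a string into
+(Vector Function ABI name, real name) pairs follows. The function
+``parseVectorVariants`` is hypothetical; it assumes that names never
+contain commas or parentheses, and it applies the default described
+above when the real name is omitted.
+
+```cpp
+#include <cassert>
+#include <cstddef>
+#include <string>
+#include <utility>
+#include <vector>
+
+// Splits a vector-variant string such as
+//   "_ZGVcN2v_sin(__svml_sin2), _ZGVdN4v_sin"
+// into (Vector Function ABI name, real name) pairs. A missing real name
+// defaults to the ABI name.
+std::vector<std::pair<std::string, std::string>>
+parseVectorVariants(const std::string &Attr) {
+  std::vector<std::pair<std::string, std::string>> Result;
+  std::size_t Pos = 0;
+  while (Pos < Attr.size()) {
+    std::size_t Comma = Attr.find(',', Pos);
+    if (Comma == std::string::npos)
+      Comma = Attr.size();
+    std::string Entry = Attr.substr(Pos, Comma - Pos);
+    Pos = Comma + 1;
+    // Trim surrounding whitespace; skip empty entries.
+    std::size_t B = Entry.find_first_not_of(" \t\n");
+    if (B == std::string::npos)
+      continue;
+    std::size_t E = Entry.find_last_not_of(" \t\n");
+    Entry = Entry.substr(B, E - B + 1);
+    std::size_t Open = Entry.find('(');
+    if (Open != std::string::npos && Entry.back() == ')')
+      Result.emplace_back(Entry.substr(0, Open),
+                          Entry.substr(Open + 1, Entry.size() - Open - 2));
+    else
+      Result.emplace_back(Entry, Entry);
+  }
+  return Result;
+}
+```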
+
+For example, the availability of a 2-lane double precision ``sin``
+function via SVML when targeting AVX on x86 is provided by the
+following IR.
+
+```
+; ...
+%r = call double @sin(double %x) #0
+; ...
+
+attributes #0 = { "vector-variant"="_ZGVcN2v_sin(__svml_sin2),_ZGVdN4v_sin(__svml_sin4),..." }
+```
+
+The string ``"_ZGVcN2v_sin(__svml_sin2)"`` in this vector-variant
+attribute provides information on the shape of the vector function via
+the string ``_ZGVcN2v_sin``, mangled according to the Vector Function
+ABI for Intel, and remaps the standard Vector Function ABI name to the
+non-standard name ``__svml_sin2``.
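+
+The structure of such a mangled name can be illustrated with a small
+C++ decoder. This is a simplified sketch, not a complete Vector
+Function ABI demangler: the names ``VectorShape`` and
+``demangleVectorName`` are hypothetical, and only the ``v`` (vector),
+``u`` (uniform) and ``l``[step] (linear) parameter tokens are handled,
+while the ABIs define more.
+
+```cpp
+#include <cassert>
+#include <cctype>
+#include <cstddef>
+#include <optional>
+#include <string>
+#include <vector>
+
+// Simplified view of: _ZGV <isa> <mask> <vlen> <parameters> _ <scalar name>
+struct VectorShape {
+  char ISA;                        // e.g. 'c' = AVX, 'd' = AVX2 on x86
+  bool Masked;                     // 'M' = masked, 'N' = not masked
+  unsigned VLen;                   // number of lanes
+  std::vector<std::string> Params; // one token per parameter
+  std::string ScalarName;
+};
+
+std::optional<VectorShape> demangleVectorName(const std::string &N) {
+  if (N.rfind("_ZGV", 0) != 0 || N.size() < 7)
+    return std::nullopt;
+  VectorShape S;
+  std::size_t I = 4;
+  S.ISA = N[I++];
+  if (N[I] != 'M' && N[I] != 'N')
+    return std::nullopt;
+  S.Masked = N[I++] == 'M';
+  std::size_t Start = I;
+  while (I < N.size() && std::isdigit(static_cast<unsigned char>(N[I])))
+    ++I;
+  if (I == Start)
+    return std::nullopt;
+  S.VLen = std::stoul(N.substr(Start, I - Start));
+  while (I < N.size() && N[I] != '_') {
+    std::string Tok(1, N[I++]);
+    // A linear parameter may carry a numeric step, e.g. "l8".
+    while (Tok[0] == 'l' && I < N.size() &&
+           std::isdigit(static_cast<unsigned char>(N[I])))
+      Tok += N[I++];
+    if (Tok[0] != 'v' && Tok[0] != 'u' && Tok[0] != 'l')
+      return std::nullopt;
+    S.Params.push_back(Tok);
+  }
+  if (I >= N.size() - 1) // no '_' separator or empty scalar name
+    return std::nullopt;
+  S.ScalarName = N.substr(I + 1);
+  return S;
+}
+```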
+
+This metadata is compatible with the proposal "Proposal for function
+vectorization and loop vectorization with function calls",[^1] which
+uses Vector Function ABI mangled names to inform the vectorizer about
+the availability of vector functions. This RFC extends the original
+proposal by allowing the explicit mapping of the Vector Function ABI
+mangled name to a non-standard name, which allows the use of existing
+vector libraries.
+
+The ``vector-variant`` attribute needs to be attached on a per-call
+basis to avoid conflicts when merging modules with different vector
+variants.
+
+## The query infrastructure: SVFS {#infrastructure}
+
+The Search Vector Function System (SVFS) is constructed from an
+``llvm::Module`` instance so it can create function definitions.  The
+SVFS exposes an API with two methods.
+
+### ``SVFS::isFunctionVectorizable``
+
+This method queries the availability of a vectorized version of a
+function. The signature of the method is as follows.
+
+```
+bool isFunctionVectorizable(llvm::CallInst * Call, ParTypeSet Params);
+```
+
+The method determines the availability of a vector version of the
+function invoked by the ``Call`` parameter by looking at the
+``vector-variant`` metadata.
+
+The ``Params`` argument is a set mapping the position of a parameter
+in the ``CallInst`` to its ``ParameterType`` descriptor. The
+``ParameterType`` descriptor holds information about the shape of the
+corresponding parameter in the signature of the vector function. This
+``ParameterType`` is used to query the SVFS about the availability
+of vector versions that have ``linear`` or ``uniform`` parameters (in
+the sense of OpenMP 4.0 and onwards).
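+
+The matching rule can be sketched in C++ as follows. All names are
+hypothetical: ``ParameterType`` is reduced to a kind plus a linear
+step, ``ParTypeSet`` is modeled as a ``std::map``, and, unlike the
+proposed method, the sketch takes an already-decoded list of variants
+instead of an ``llvm::CallInst``. It also assumes that parameter
+positions absent from ``Params`` accept any shape.
+
+```cpp
+#include <cassert>
+#include <map>
+#include <vector>
+
+// Hypothetical shape descriptor of one parameter of the vector function.
+enum class ParamKind { Vector, Uniform, Linear };
+struct ParameterType {
+  ParamKind Kind = ParamKind::Vector;
+  int Step = 0; // only meaningful for Linear
+  bool operator==(const ParameterType &O) const {
+    return Kind == O.Kind && Step == O.Step;
+  }
+};
+
+// Position of the argument in the call -> required shape.
+using ParTypeSet = std::map<unsigned, ParameterType>;
+
+// One decoded entry of the vector-variant attribute.
+struct Variant {
+  std::vector<ParameterType> Params;
+};
+
+// True if some variant matches every position constrained by Params.
+// With an empty ParTypeSet this only checks that a variant exists.
+bool isFunctionVectorizable(const std::vector<Variant> &Variants,
+                            const ParTypeSet &Params) {
+  for (const Variant &V : Variants) {
+    bool Ok = true;
+    for (const auto &[Pos, Ty] : Params)
+      if (Pos >= V.Params.size() || !(V.Params[Pos] == Ty)) {
+        Ok = false;
+        break;
+      }
+    if (Ok)
+      return true;
+  }
+  return false;
+}
+```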
+
+When invoked with an empty ``ParTypeSet``, the proposed method is
+equivalent to the ``TargetLibraryInfo`` method
+``isFunctionVectorizable(StringRef Name)``.
+
+### ``SVFS::getVectorizedFunction``
+
+This method returns the vector function declaration that corresponds
+to the needs of the vectorization technique that is being run.
+
+The signature of the method is as follows.
+
+```
+std::pair<llvm::FunctionType *, std::string> getVectorizedFunction(
+  llvm::CallInst * Call, unsigned VF, bool IsMasked, ParTypeSet Params);
+```
+
+The ``Call`` parameter is the call instruction that is being
+vectorized, the ``VF`` parameter represents the vectorization factor
+(how many lanes), the ``IsMasked`` parameter decides whether or not
+the signature of the vector function is required to have a mask
+parameter, and the ``Params`` parameter describes the shape of the
+vector function as in the ``isFunctionVectorizable`` method.
+
+The method uses the ``vector-variant`` metadata and returns the
+function signature and the name of the function based on the input
+parameters.
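+
+Ignoring parameter shapes and the ``llvm::FunctionType`` half of the
+result, the selection step can be sketched as an exact match on
+vectorization factor and masking. ``VariantInfo`` and
+``getVectorizedFunctionName`` are hypothetical names, and the variants
+are assumed to be already decoded from the ``vector-variant``
+attribute.
+
+```cpp
+#include <cassert>
+#include <optional>
+#include <string>
+#include <vector>
+
+// Hypothetical, already-decoded entry of the vector-variant attribute.
+struct VariantInfo {
+  unsigned VF;          // number of lanes (from the ABI name)
+  bool Masked;          // 'M' vs 'N' in the ABI name
+  std::string RealName; // name to emit in the call, e.g. "__svml_sin4"
+};
+
+// Returns the real name of the variant with exactly the requested VF and
+// masking, or std::nullopt if there is none.
+std::optional<std::string>
+getVectorizedFunctionName(const std::vector<VariantInfo> &Variants,
+                          unsigned VF, bool IsMasked) {
+  for (const VariantInfo &V : Variants)
+    if (V.VF == VF && V.Masked == IsMasked)
+      return V.RealName;
+  return std::nullopt;
+}
+```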
+
+The SVFS can add new function definitions, in the same module as the
+``Call``, to provide vector functions that are not present within the
+vector-variant metadata. For example, if a library provides a vector
+version of a function with a vectorization factor of 2, but the
+vectorizer is requesting a vectorization factor of 4, the SVFS is
+allowed to create a definition that calls the 2-lane version
+(provided by the library) twice. The same capability applies to
+providing masked and unmasked versions when the request does not match
+what is available in the library.
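+
+The synthesis described above can be emulated in plain C++, with
+``std::array`` standing in for vector registers. ``lib_sin2``,
+``synth_sin4`` and ``synth_sin4_masked`` are hypothetical names: real
+SVFS output would be LLVM IR, and ``lib_sin2`` only stands in for a
+2-lane library routine such as ``__svml_sin2``.
+
+```cpp
+#include <array>
+#include <cassert>
+#include <cmath>
+
+using Vec2 = std::array<double, 2>;
+using Vec4 = std::array<double, 4>;
+using Mask4 = std::array<bool, 4>;
+
+// Stand-in for a 2-lane library routine (scalar emulation).
+Vec2 lib_sin2(const Vec2 &X) { return {std::sin(X[0]), std::sin(X[1])}; }
+
+// Synthesized 4-lane version: split the input into two halves and call
+// the 2-lane routine twice.
+Vec4 synth_sin4(const Vec4 &X) {
+  Vec2 Lo = lib_sin2({X[0], X[1]});
+  Vec2 Hi = lib_sin2({X[2], X[3]});
+  return {Lo[0], Lo[1], Hi[0], Hi[1]};
+}
+
+// Masked version derived from the unmasked one: compute all lanes, then
+// keep the passthrough value in inactive lanes. Only valid when
+// evaluating inactive lanes is harmless (no traps or side effects).
+Vec4 synth_sin4_masked(const Vec4 &X, const Mask4 &M, const Vec4 &PassThru) {
+  Vec4 R = synth_sin4(X);
+  for (int I = 0; I < 4; ++I)
+    if (!M[I])
+      R[I] = PassThru[I];
+  return R;
+}
+```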
+
+This method is equivalent to the TLI method ``StringRef
+getVectorizedFunction(StringRef F, unsigned VF) const;``.
+
+Notice that fully supporting OpenMP vectorization requires a fuzzy
+matching mechanism that is able to select a candidate in the calling
+context. However, this is not needed for ``-fveclib``, because the
+scalar-to-vector mappings of ``-fveclib`` are such that for every
+scalar function and vectorization factor there is only one associated
+vector function. Generalizing this behavior is therefore an aspect of
+the implementation that will be addressed in a separate RFC about the
+vectorization pass.
+
+### Scalable vectorization
+
+Both methods of the SVFS API will be extended with a boolean parameter
+to specify whether scalable signatures are needed by the user of the
+SVFS.
+
+## Changes in the LoopVectorizer {#LV}
+
+The LoopVectorizer and the related analysis passes will have to
+replace the TLI version of ``isFunctionVectorizable`` and
+``getVectorizedFunction`` with the SVFS ones.
+
+## Changes in clang: shipping ``math.h`` with the compiler {#mathdoth}
+
+We use clang to generate the metadata described above. The functions
+available in library ``X`` are listed in a custom ``math.h`` file
+that is shipped with the compiler in ``<clang>/lib/Headers/math.h``.
+The header file first includes the system ``math.h`` (via
+``#include_next``), followed by ``#ifdef``-guarded re-declarations of
+the functions enriched with ``#pragma veclib declare simd`` directives.
+
+```
+#include_next <math.h>
+
+// ... C++ extern "C" guards omitted
+
+#ifdef _CLANG_USE_LIBRARY_X
+#pragma veclib declare simd simdlen(4) notinbranch
+extern double sin(double);
+#endif
+```
+
+This generates the Vector Function ABI mangled name to be used in
+the ``vector-variant`` attribute, for example ``_ZGVcN4v_sin`` when
+targeting AVX code generation.
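+
+The derivation of the name from the directive is mechanical, as the
+C++ sketch below shows for one variant at a time. ``mangleVectorName``
+is a hypothetical helper that takes parameters as already-encoded
+tokens; on x86, a ``declare simd`` without an ISA restriction may yield
+one such name per supported ISA letter.
+
+```cpp
+#include <cassert>
+#include <string>
+#include <vector>
+
+// Builds "_ZGV" <isa> <mask> <vlen> <parameters> "_" <scalar name>.
+std::string mangleVectorName(char ISA, bool InBranch, unsigned SimdLen,
+                             const std::vector<std::string> &Params,
+                             const std::string &ScalarName) {
+  std::string Name = "_ZGV";
+  Name += ISA;                  // e.g. 'c' for AVX on x86
+  Name += InBranch ? 'M' : 'N'; // inbranch -> masked, notinbranch -> not
+  Name += std::to_string(SimdLen);
+  for (const std::string &P : Params) // "v" = vector, "u" = uniform, ...
+    Name += P;
+  return Name + "_" + ScalarName;
+}
+```
+
+With ``simdlen(4) notinbranch`` on ``double sin(double)`` and the AVX
+ISA letter, ``mangleVectorName('c', false, 4, {"v"}, "sin")`` yields
+``_ZGVcN4v_sin``.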
+
+The part of the vector-variant attribute that redirects the call to
+``__svml_sin4`` is also added via the header file ``math.h``, by using
+the OpenMP 5.0 directive ``declare variant``,[^2] guarded by
+SVML-specific preprocessor macros:
+
+```
+#ifdef _CLANG_USE_SVML
+#include <immintrin.h> // for __m256d
+
+__m256d __svml_sin4(__m256d x);
+
+#pragma veclib declare simd simdlen(4) notinbranch
+#pragma veclib declare variant(__svml_sin4) \
+    match(construct={simd(simdlen(4),notinbranch)}, device={isa(avx2)})
+extern double sin(double);
+#endif
+```
+
+Note that the ``#ifdef``-guarded function declarations do not need to
+live in the same ``math.h`` file; they can be included in ``math.h``
+from library-specific header files.
+
+## Changes in the clang driver {#driver}
+
+To enable the information provided via ``math.h``, the clang driver
+will translate the ``-fveclib=X`` option into
+``-D_CLANG_USE_LIBRARY_X -lX``, which turns on the correct section of
+the header file and passes the library to the linker.
+
+Note that the ``veclib`` directives are parsed even when *not*
+compiling with OpenMP enabled.
+
+# Extending auto-vectorization capabilities of LLVM
+
+When compared to the TLI-based auto-vectorization mechanism, the
+OpenMP-based mechanism has the advantage of enabling users to provide
+their own vector routines (not just the math ones) by adding
+``veclib declare simd`` and ``veclib declare variant`` directives to
+their source.
+
+For this specific functionality, the following command line option is
+added to clang:
+
+```
+-fveclib-include=path/to/header/file.h
+```
+
+This option enables clang to recognize the ``veclib declare simd`` and
+``veclib declare variant`` directives listed in the given header file.
+
+# Summary
+
+## New ``veclib`` directives in clang
+
+1. ``#pragma veclib declare simd [clause[, clause]...]``, same as
+   ``#pragma omp declare simd`` from OpenMP 4.0+.
+2. ``#pragma veclib declare variant``, same as
+   ``#pragma omp declare variant`` restricted to the ``simd`` context
+   selector, from OpenMP 5.0+.
+
+## New ``math.h`` header file
+
+Shipped in ``<clang>/lib/Headers/math.h``, it contains all the
+declarations of the functions available in the vector library ``X``,
+``#ifdef``-guarded by the macro ``__CLANG_ENABLE_LIBRARY_X``.
+
+## Option behavior, and interaction with OpenMP
+
+The behavior described below makes sure that ``-fveclib`` function
+vectorization and OpenMP function vectorization are orthogonal.
+
+No options
+
+:   No function vectorization via vector library, neither user-provided
+    nor shipped via the internal ``math.h``.
+
+``-fveclib=X``
+
+:   The driver transforms this into ``-fparse-veclib
+    -D__CLANG_ENABLE_LIBRARY_X=1 -lX``. This is used only for users
+    that want to vectorize ``math.h`` functions.
+
+``-fveclib-include=path/to/user/provided/header/file.h``
+
+:   The driver transforms this into ``-fparse-veclib -include
+    path/to/user/provided/header/file.h``. The user has to provide the
+    correct linker flags for both the scalar version and the vector
+    version of whatever functions they have declared in the header
+    file. The header file must use the ``veclib`` directives to inform
+    the compiler about the available vector functions.
+
+``-fopenmp[-simd]``
+
+:   No vectorization happens other than for those functions that are
+    marked with OpenMP ``declare simd``. The header ``math.h`` is
+    loaded, but the ``veclib``-decorated declarations are invisible to
+    the compiler instance because they are hidden behind the
+    ``__CLANG_ENABLE_LIBRARY_X`` macros, which are not defined.
+
+``-fopenmp[-simd] -fveclib=X`` or ``-fopenmp[-simd] -fveclib-include=path/to/user/provided/header/file.h``
+
+:   Same behavior as without the ``-fopenmp[-simd]`` option.
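+
+The option table above can be summarized by a small C++ sketch. The
+function ``translateVeclibOptions`` is hypothetical: it puts compiler
+and linker flags in a single list for brevity (a real driver routes
+them to different jobs), uses the ``__CLANG_ENABLE_LIBRARY_X`` macro
+name from this summary, and forwards the user header through clang's
+existing ``-include`` option. OpenMP flags pass through unchanged,
+which is what keeps the two mechanisms orthogonal.
+
+```cpp
+#include <cassert>
+#include <string>
+#include <vector>
+
+std::vector<std::string>
+translateVeclibOptions(const std::vector<std::string> &Args) {
+  const std::string VeclibEq = "-fveclib=";
+  const std::string IncludeEq = "-fveclib-include=";
+  std::vector<std::string> Out;
+  bool ParseVeclib = false;
+  for (const std::string &A : Args) {
+    if (A.rfind(VeclibEq, 0) == 0) {
+      std::string Lib = A.substr(VeclibEq.size());
+      ParseVeclib = true;
+      Out.push_back("-D__CLANG_ENABLE_LIBRARY_" + Lib + "=1");
+      Out.push_back("-l" + Lib);
+    } else if (A.rfind(IncludeEq, 0) == 0) {
+      ParseVeclib = true;
+      Out.push_back("-include");
+      Out.push_back(A.substr(IncludeEq.size()));
+    } else {
+      Out.push_back(A); // -fopenmp, -fopenmp-simd, ... are untouched
+    }
+  }
+  if (ParseVeclib)
+    Out.insert(Out.begin(), "-fparse-veclib");
+  return Out;
+}
+```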
+
+
+[^1]: <http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html>
+
+[^2]: <https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf>
+
+[^3]: Vector Function ABI for x86: <https://software.intel.com/en-us/articles/vector-simd-function-abi>. Vector Function ABI for AArch64: <https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi>.