Index: RFC-fveclib.md
===================================================================
--- /dev/null
+++ RFC-fveclib.md
@@ -0,0 +1,297 @@
+---
+title: \[RFC\] Implementing -fveclib using OpenMP.
+toc: yes
+geometry: margin=1.5in
+colorlinks: yes
+fontfamily: mathpazo
+...
+
+# Introduction
+
+This RFC proposes replacing the current
+``TargetLibraryInfo`` (TLI) based implementation of the command line
+option ``-fveclib`` with an OpenMP based one.
+
+With this change, `-fveclib` will maintain its current behavior in
+terms of user experience, but the new implementation will
+additionally:
+
+1. Decouple the compiler front-end, which knows about the availability
+   of vectorized routines, from the back-end, which knows how to make
+   use of them.
+2. Enable support for a developer's own vector libraries without
+   requiring changes to the compiler.
+
+The implementation of the proposal consists of the following
+components:
+
+1. [Changes in LLVM IR](#llvmIR) to provide information about the
+   availability of vector math functions via meta-data attached to an
+   ``llvm::CallInst``.
+2. [An infrastructure](#infrastructure) that can be queried to retrieve
+   information about the available vector functions associated with a
+   ``llvm::CallInst``.
+3. [Changes in the LoopVectorizer](#LV) to use the API to query the
+   meta-data.
+4. [Changes in clang](#mathdoth) to add the meta-data in the IR via a
+   custom header file shipped with the compiler.
+5. [Changes in the clang driver](#driver) to translate ``-fveclib`` into
+   a combination of ``-fopenmp-simd`` and library-specific flags.
+
+
+
+# Current status of `-fveclib`
+
+## User interface
+
+At the moment, a user can invoke `-fveclib` to generate vector calls
+from two libraries, SVML and Accelerate, as follows:
+
+```
+$> clang -fveclib=[SVML|Accelerate]
+```
+
+## Interface with the loop vectorizer
+
+The TLI exposes an interface that enables querying the list of
+available mappings by scalar name and number of lanes needed. The TLI
+interface is currently used by the InnerLoopVectorizer to plant vector
+calls in auto-vectorized loops.
+
+## Extending `-fveclib`
+
+Adding a new library requires listing its mappings in
+`<llvm>/lib/Analysis/TargetLibraryInfo.cpp`, plus modifying the clang
+front-end to handle the new value for the option - see for example the
+two patches to add SLEEF (<http://sleef.org>) as a target library for
+AArch64: <https://reviews.llvm.org/D53927> (LLVM code-base) and
+<https://reviews.llvm.org/D53928> (clang code-base).
+
+## Limitations of the current implementation
+
+The mapping from the scalar to the vector version of a function is
+defined by the backend, specifically within the TLI. For this reason the
+frontend's `-fveclib` option is tied to the backend's support for the
+(often language-dependent) library. In particular, an IR file
+generated with a version of clang that knows about the availability of
+library ``X`` must be processed by a backend that also
+knows about the availability of library ``X``.
+
+# Proposed changes
+
+We propose an implementation of ``-fveclib`` that makes use of an
+OpenMP ``declare simd`` based mechanism to inform the backend
+components about the availability of vector versions of scalar
+functions found in IR.  The mechanism relies on storing such
+information in IR meta-data, and therefore makes the
+auto-vectorization of function calls a mid-end (``opt``) process that
+is independent of the front-end that generated such IR meta-data.
+
+Moreover, this implementation enhances the extensibility and
+portability of ``-fveclib`` to other libraries and front-ends, and it
+provides a generic mechanism that the users of the LLVM compiler will
+be able to use for interfacing their own vector routines for generic
+code.
+
+The proposed implementation can also be used to expose
+vectorization-specific descriptors, such as the ``linear``
+and ``uniform`` clauses of the OpenMP ``declare simd`` directive,
+that could be used to fine-tune the automatic vectorization of some
+functions. Consider, for example, the vectorization of ``void
+sincos(double, double *, double *)``, where ``linear`` can be used to
+give extra information about the memory layout of the two pointer
+parameters in the vector version.
+
+
+## Changes in LLVM IR {#llvmIR}
+
+The IR is enriched with meta-data that details the availability of
+vector versions of an associated scalar function. This metadata is
+attached to the call site of the scalar function.
+
+The meta-data takes the form of an attribute containing a
+comma-separated list of vector function mappings. Each entry has a
+unique name that follows the Vector Function ABI[^3] and a real name that
+is used when generating calls to this vector function.
+
+```
+vfunc_name1(real_name1), vfunc_name2(real_name2)
+```
+
+The Vector Function ABI name describes the signature of the vector
+function so that properties like vectorisation factor can be queried
+during compilation.
+
+The real name is optional and is assumed to match the Vector Function ABI
+name when omitted.
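+
+As a rough illustration of how such a list could be split into its
+entries, the C++ sketch below parses the attribute string into (Vector
+Function ABI name, real name) pairs. The helper names ``trim`` and
+``parseVectorVariants`` are hypothetical and are not part of LLVM; the
+only behavior taken from this RFC is the entry syntax and the default
+for an omitted real name.
+
+```cpp
+#include <iostream>
+#include <string>
+#include <utility>
+#include <vector>
+
+// Hypothetical helper: strips surrounding spaces, tabs and newlines.
+static std::string trim(const std::string &S) {
+  const char *WS = " \t\n";
+  size_t B = S.find_first_not_of(WS);
+  if (B == std::string::npos)
+    return "";
+  size_t E = S.find_last_not_of(WS);
+  return S.substr(B, E - B + 1);
+}
+
+// Hypothetical helper: splits "abi1(real1), abi2" into {ABI name, real name}
+// pairs. When "(real)" is omitted, the real name defaults to the ABI name.
+std::vector<std::pair<std::string, std::string>>
+parseVectorVariants(const std::string &Attr) {
+  std::vector<std::pair<std::string, std::string>> Out;
+  size_t Pos = 0;
+  while (Pos <= Attr.size()) {
+    size_t Comma = Attr.find(',', Pos);
+    if (Comma == std::string::npos)
+      Comma = Attr.size();
+    std::string Entry = trim(Attr.substr(Pos, Comma - Pos));
+    Pos = Comma + 1;
+    if (Entry.empty())
+      continue;
+    size_t Open = Entry.find('(');
+    if (Open == std::string::npos) {
+      Out.emplace_back(Entry, Entry); // no remapping given
+      continue;
+    }
+    size_t Close = Entry.find(')', Open);
+    size_t Len = Close == std::string::npos ? std::string::npos
+                                            : Close - Open - 1;
+    Out.emplace_back(trim(Entry.substr(0, Open)),
+                     trim(Entry.substr(Open + 1, Len)));
+  }
+  return Out;
+}
+
+int main() {
+  // The second entry omits the real name, so it falls back to the ABI name.
+  for (const auto &V :
+       parseVectorVariants("_ZGVcN2v_sin(__svml_sin2), _ZGVdN4v_sin"))
+    std::cout << V.first << " -> " << V.second << "\n";
+}
+```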
+
+For example, the availability of a 2-lane double precision ``sin``
+function via SVML when targeting AVX on x86 is provided by the
+following IR.
+
+```
+; ...
+%r = call double @sin(double %x) #0
+; ...
+
+attributes #0 = { "vector-variant"="_ZGVcN2v_sin(__svml_sin2),
+                   _ZGVdN4v_sin(__svml_sin4),
+                   ..." }
+```
+
+The string ``"_ZGVcN2v_sin(__svml_sin2)"`` in this vector-variant
+attribute provides information on the shape of the vector function via
+the string ``_ZGVcN2v_sin``, mangled according to the Vector Function
+ABI for Intel, and remaps the standard Vector Function ABI name to the
+non-standard name ``__svml_sin2``.
+
+This meta-data is compatible with the proposal "Proposal for function
+vectorization and loop vectorization with function calls",[^1] that
+uses Vector Function ABI mangled names to inform the vectorizer about
+the availability of vector functions. This RFC extends that
+proposal by allowing the explicit mapping of the Vector Function ABI
+mangled name to a non-standard name, which allows the use of existing
+vector libraries.
+
+The ``vector-variant`` attribute needs to be attached on a per-call
+basis to avoid conflicts when merging modules with different vector
+variants.
+
+## The query infrastructure: SVFS {#infrastructure}
+
+The Search Vector Function System (SVFS) is constructed from an
+``llvm::Module`` instance so it can create function definitions.  The
+SVFS exposes an API with two methods.
+
+### ``SVFS::isFunctionVectorizable``
+
+This method queries the availability of a vectorized version of a
+function. The signature of the method is as follows.
+
+```
+bool isFunctionVectorizable(llvm::CallInst * Call, ParTypeSet Params);
+```
+
+The method determines the availability of a vector version of the
+function invoked by the ``Call`` parameter by looking at the
+``vector-variant`` meta-data.
+
+The ``Params`` argument is a set mapping the position of a parameter
+in the ``CallInst`` to its ``ParameterType`` descriptor.  The
+``ParameterType`` descriptor holds information about the shape of the
+corresponding parameter in the signature of the vector function. This
+``ParameterType`` is used to query the SVFS about the availability
+of vector versions that have ``linear`` or ``uniform`` parameters (in
+the sense of OpenMP 4.0 and onwards).
+
+The method we propose, when invoked with an empty ``ParTypeSet``, is
+equivalent to the ``TargetLibraryInfo`` method
+``isFunctionVectorizable(StringRef Name)``.
+
+### ``SVFS::getVectorFunction``
+
+This method returns the vector function declaration that corresponds
+to the needs of the vectorization technique that is being run.
+
+The signature of the function is as follows.
+
+```
+std::pair<llvm::FunctionType *, std::string> getVectorFunction(
+  llvm::CallInst * Call, unsigned VF, bool IsMasked, ParTypeSet Params);
+```
+
+The ``Call`` parameter is the call instance that is being vectorized.
+The ``VF`` parameter represents the vectorization factor (how many
+lanes). The ``IsMasked`` parameter decides whether or not the
+signature of the vector function is required to have a mask parameter.
+The ``Params`` parameter describes the shape of the vector function, as
+in the ``isFunctionVectorizable`` method.
+
+The method uses the ``vector-variant`` meta-data and returns the
+function signature and the name of the function based on the input
+parameters.
+
+The SVFS can add new function definitions, in the same module as the
+``Call``, to provide vector functions that are not present within the
+vector-variant meta-data. For example, if a library provides a vector
+version of a function with a vectorization factor of 2, but the
+vectorizer is requesting a vectorization factor of 4, the SVFS is
+allowed to create a definition that calls the 2-lane version
+(provided by the library) twice. This capability applies similarly to
+providing masked and unmasked versions when the request doesn't match
+what is available in the library.
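+
+The widening case can be pictured with the plain C++ sketch below. It
+is only an analogy for the definition the SVFS would emit in IR:
+``libSin2`` is a hypothetical stand-in for a 2-lane routine supplied by
+a vector library, and ``wrappedSin4`` is a hypothetical name for the
+synthesized 4-lane definition.
+
+```cpp
+#include <array>
+#include <cmath>
+#include <iostream>
+
+using Vec2 = std::array<double, 2>;
+using Vec4 = std::array<double, 4>;
+
+// Hypothetical stand-in for a 2-lane routine provided by a vector library.
+Vec2 libSin2(Vec2 X) { return {std::sin(X[0]), std::sin(X[1])}; }
+
+// Sketch of a synthesized VF = 4 definition: split the input in halves,
+// call the 2-lane version twice, and concatenate the results.
+Vec4 wrappedSin4(Vec4 X) {
+  Vec2 Lo = libSin2({X[0], X[1]});
+  Vec2 Hi = libSin2({X[2], X[3]});
+  return {Lo[0], Lo[1], Hi[0], Hi[1]};
+}
+
+int main() {
+  Vec4 R = wrappedSin4({0.0, 0.5, 1.0, 1.5});
+  for (double V : R)
+    std::cout << V << "\n";
+}
+```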
+
+This method is equivalent to the TLI method ``getVectorFunction(TODO:
+add signature)``.
+
+### Scalable vectorization
+
+Both methods of the SVFS API will be extended with a boolean parameter
+to specify whether scalable signatures are needed by the user of the
+SVFS.
+
+## Changes in the LoopVectorizer {#LV}
+
+The LoopVectorizer and the related analysis passes will have to
+replace the TLI version of ``isFunctionVectorizable`` and
+``getVectorFunction`` with the SVFS ones.
+
+## Changes in clang: shipping ``math.h`` with the compiler {#mathdoth}
+
+We use clang to generate the meta-data described above.  The functions
+available in library ``X`` are listed in a custom ``math.h`` file
+that is shipped with the compiler in ``<clang>/lib/Headers/math.h``.
+The header file is implemented by first including the system
+``math.h`` file, followed by ``#ifdef`` guarded re-declarations of the
+functions enriched with ``#pragma omp declare simd`` directives.
+
+```
+#include_next <math.h>
+
+// ... cpp extern "C" guards omitted
+
+#ifdef CONDITION_FOR_LIBRARY_X
+#pragma omp declare simd simdlen(4) notinbranch
+extern double sin(double);
+#endif
+```
+
+This generates the Vector Function ABI mangled name to be used in
+the ``vector-variant`` attribute, for example ``_ZGVcN4v_sin``, when
+targeting AVX code generation.
+
+The part of the vector-variant attribute that redirects the call to a
+library routine such as ``__svml_sin4`` is also added via ``math.h``, by
+using the OpenMP 5.0 directive ``declare variant``,[^2] guarded by
+SVML-specific preprocessor macros:
+
+```
+#ifdef CONDITION_FOR_SVML
+#pragma omp declare simd simdlen(4) notinbranch
+extern double sin(double);
+
+#pragma omp declare variant(double sin(double)) \
+match(construct=simd(simdlen(4),notinbranch), device={isa(avx2)})
+__m256d __svml_sin4(__m256d x);
+#endif
+```
+
+## Changes in the clang driver {#driver}
+
+To enable the information provided via ``math.h``, the clang driver
+will translate the ``-fveclib=X`` option into ``-fopenmp-simd
+-DCONDITION_FOR_LIBRARY_X`` to turn on the correct section of the
+header file.
+
+# Extending auto-vectorization capabilities of LLVM
+
+When compared to the TLI-based auto-vectorization mechanism, the
+OpenMP-based mechanism has the advantage of enabling users to provide
+their own vector routines (not just the math ones) by adding
+``declare simd`` and ``declare variant`` directives in their source.
+
+[^1]: <http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html>
+
+[^2]: <https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf>
+
+[^3]: Vector Function ABI for x86: <https://software.intel.com/en-us/articles/vector-simd-function-abi>. Vector Function ABI for AArch64: <https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi>