Index: RFC-fveclib.md
===================================================================
--- /dev/null
+++ RFC-fveclib.md
@@ -0,0 +1,297 @@
+---
+title: \[RFC\] Implementing -fveclib using OpenMP.
+toc: yes
+geometry: margin=1.5in
+colorlinks: yes
+fontfamily: mathpazo
+...
+
+# Introduction
+
+This RFC proposes replacing the current ``TargetLibraryInfo`` (TLI)
+based implementation of the command-line option ``-fveclib`` with an
+OpenMP based one.
+
+With this change, `-fveclib` will maintain its current behavior in
+terms of user experience, but the new implementation will
+additionally:
+
+1. Decouple the compiler front-end that knows about the availability
+   of vectorized routines, from the back-end that knows how to make
+   use of them.
+2. Enable support for a developer's own vector libraries without
+   requiring changes to the compiler.
+
+The implementation of the proposal will consist of the following
+components:
+
+1. [Changes in LLVM IR](#llvmIR) to provide information about the
+   availability of vector math functions via meta-data attached to an
+   ``llvm::CallInst``.
+2. [An infrastructure](#infrastructure) that can be queried to retrieve
+   information about the available vector functions associated with a
+   ``llvm::CallInst``.
+3. [Changes in the LoopVectorizer](#LV) to use the API to query the
+   meta-data.
+4. [Changes in clang](#mathdoth) to add the meta-data in the IR via a
+   custom header file shipped with the compiler.
+5. [Changes in the clang driver](#driver) to translate ``-fveclib`` into
+   a combination of ``-fopenmp-simd`` and library-specific flags.
+
+
+
+# Current status of `-fveclib`
+
+## User interface
+
+At the moment, a user can invoke `-fveclib` to generate vector calls
+from two libraries, SVML and Accelerate, as follows:
+
+```
+$> clang -fveclib=[SVML|Accelerate]
+```
+
+## Interface with the loop vectorizer
+
+The TLI exposes an interface that enables querying the list of
+available mappings by scalar name and number of lanes needed. The TLI
+interface is currently used by the InnerLoopVectorizer to plant vector
+calls in auto-vectorized loops.
+
+## Extending `-fveclib`
+
+Adding a new library requires listing its mappings in
+`/lib/Analysis/TargetLibraryInfo.cpp`, plus modifying the clang
+front-end to handle the new value for the option - see for example the
+two patches to add SLEEF () as a target library for
+AArch64: (LLVM code-base) and
+ (clang code-base).
+
+## Limitations of the current implementation
+
+The mapping from the scalar version of a function to its vector version
+is defined by the backend, within the TLI specifically. For this reason
+the frontend's -fveclib option is tied to the backend's support for the
+library, which is often language dependent. In particular, an IR file
+that is generated with a version of clang that knows about the
+availability of library ``X`` needs to be processed by a backend that
+also knows about the availability of library ``X``.
+
+# Proposed changes
+
+We propose an implementation of ``-fveclib`` that makes use of an
+OpenMP ``declare simd`` based mechanism to inform the backend
+components about the availability of vector versions of scalar
+functions found in IR. The mechanism relies on storing such
+information in IR meta-data, and therefore makes the
+auto-vectorization of function calls a mid-end (``opt``) process that
+is independent of the front-end that generated such IR meta-data.
+
+Moreover, this implementation enhances the extensibility and
+portability of ``-fveclib`` to other libraries and front-ends, and it
+provides a generic mechanism that the users of the LLVM compiler will
+be able to use for interfacing their own vector routines for generic
+code.
+
+The proposed implementation can also be used to expose
+vectorization-specific descriptors -- for example, the ``linear``
+and ``uniform`` clauses of the OpenMP ``declare simd`` directive --
+that could be used to finely tune the automatic vectorization of some
+functions (consider, for example, the vectorization of ``double
+sincos(double, double *, double *)``, where ``linear`` can be used to
+give extra information about the memory layout of the two pointer
+parameters in the vector version).
+
+
+## Changes in LLVM IR {#llvmIR}
+
+The IR is enriched with meta-data that details the availability of
+vector versions of an associated scalar function. This meta-data is
+attached to the call site of the scalar function.
+
+The meta-data takes the form of an attribute containing a
+comma-separated list of vector function mappings. Each entry has a
+unique name that follows the Vector Function ABI[^3] and a real name
+that is used when generating calls to this vector function.
+
+```
+vfunc_name1(real_name1), vfunc_name2(real_name2)
+```
+
+The Vector Function ABI name describes the signature of the vector
+function so that properties like vectorisation factor can be queried
+during compilation.
+
+The real name is optional and assumed to match the Vector Function ABI
+name when omitted.
+
+For example, the availability of a 2-lane double precision ``sin``
+function via SVML when targeting AVX on x86 is provided by the
+following IR.
+
+```
+// ...
+... = call double @sin(double) #0
+// ...
+
+#0 = { vector-variant = {"_ZGVcN2v_sin(__svml_sin2),
+                          _ZGVdN4v_sin(__svml_sin4),
+                          ..."} }
+```
+
+The string ``"_ZGVcN2v_sin(__svml_sin2)"`` in this vector-variant
+attribute provides information on the shape of the vector function via
+the string ``_ZGVcN2v_sin``, mangled according to the Vector Function
+ABI for Intel, and remaps the standard Vector Function ABI name to the
+non-standard name ``__svml_sin2``.
+
+This meta-data is compatible with the proposal "Proposal for function
+vectorization and loop vectorization with function calls",[^1] which
+uses Vector Function ABI mangled names to inform the vectorizer about
+the availability of vector functions. This RFC extends that
+proposal by allowing the explicit mapping of the Vector Function ABI
+mangled name to a non-standard name, which allows the use of existing
+vector libraries.
+
+The ``vector-variant`` attribute needs to be attached on a per-call
+basis to avoid conflicts when merging modules with different vector
+variants.
+
+## The query infrastructure: SVFS {#infrastructure}
+
+The Search Vector Function System (SVFS) is constructed from an
+``llvm::Module`` instance so it can create function definitions. The
+SVFS exposes an API with two methods.
+
+### ``SVFS::isFunctionVectorizable``
+
+This method queries the availability of a vectorized version of a
+function. The signature of the method is as follows.
+
+```
+bool isFunctionVectorizable(llvm::CallInst * Call, ParTypeSet Params);
+```
+
+The method determines the availability of a vector version of the
+function invoked by the ``Call`` parameter by looking at the
+``vector-variant`` meta-data.
+
+The ``Params`` argument is a set mapping the position of a parameter
+in the CallInst to its ``ParameterType`` descriptor. The
+``ParameterType`` descriptor holds information about the shape of the
+corresponding parameter in the signature of the vector function.
+This ``ParameterType`` descriptor is used to query the SVFS about the
+availability of vector versions that have ``linear`` or ``uniform``
+parameters (in the sense of OpenMP 4.0 and onwards).
+
+The method we propose, when invoked with an empty ``ParTypeSet``, is
+equivalent to the ``TargetLibraryInfo`` method
+``isFunctionVectorizable(StringRef Name)``.
+
+### ``SVFS::getVectorFunction``
+
+This method returns the vector function declaration that corresponds
+to the needs of the vectorization technique that is being run.
+
+The signature of the function is as follows.
+
+```
+std::pair getVectorFunction(
+    llvm::CallInst * Call, unsigned VF, bool IsMasked, ParTypeSet Params);
+```
+
+The ``Call`` parameter is the call instance that is being vectorized,
+the ``VF`` parameter represents the vectorization factor (how many
+lanes), the ``IsMasked`` parameter decides whether or not the
+signature of the vector function is required to have a mask parameter,
+and the ``Params`` parameter describes the shape of the vector function
+as in the ``isFunctionVectorizable`` method.
+
+The method uses the ``vector-variant`` meta-data and returns the
+function signature and the name of the function based on the input
+parameters.
+
+The SVFS can add new function definitions, in the same module as the
+``Call``, to provide vector functions that are not present within the
+vector-variant meta-data. For example, if a library provides a vector
+version of a function with a vectorization factor of 2, but the
+vectorizer is requesting a vectorization factor of 4, the SVFS is
+allowed to create a definition that calls the 2-lane version
+(provided by the library) twice. This capability applies similarly for
+providing masked and unmasked versions when the request doesn't match
+what is available in the library.
+
+This method is equivalent to the TLI method ``getVectorFunction(TODO:
+add signature)``.
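As a rough illustration of the lookup described in this section, the sketch below parses a ``vector-variant`` string and selects an entry by vectorization factor and masking. It is not part of the proposed implementation: ``VariantEntry``, ``parseVectorVariants`` and ``findVariant`` are hypothetical names, the code uses only the C++ standard library rather than LLVM types, and it decodes only the ``_ZGV<isa><mask><vlen>`` prefix of each mangled name, ignoring the parameter descriptors.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one entry of the vector-variant string.
struct VariantEntry {
  std::string AbiName;   // Vector Function ABI name, e.g. "_ZGVcN2v_sin"
  std::string RealName;  // name to call; equals AbiName when omitted
  char Isa = 0;          // ISA letter taken from the mangled name
  bool IsMasked = false; // 'M' means masked, 'N' means unmasked
  unsigned VF = 0;       // number of lanes
};

// Strip leading and trailing blanks.
std::string trim(const std::string &S) {
  const char *WS = " \t\n";
  std::size_t B = S.find_first_not_of(WS);
  if (B == std::string::npos)
    return "";
  return S.substr(B, S.find_last_not_of(WS) - B + 1);
}

// Decode the "_ZGV<isa><mask><vlen>" prefix; false on malformed input.
bool decodePrefix(VariantEntry &E) {
  const std::string &S = E.AbiName;
  if (S.size() < 7 || S.compare(0, 4, "_ZGV") != 0)
    return false;
  if (S[5] != 'M' && S[5] != 'N')
    return false;
  E.Isa = S[4];
  E.IsMasked = S[5] == 'M';
  std::size_t I = 6;
  unsigned VF = 0;
  while (I < S.size() && std::isdigit(static_cast<unsigned char>(S[I])))
    VF = VF * 10 + static_cast<unsigned>(S[I++] - '0');
  if (I == 6)
    return false; // no lane count present
  E.VF = VF;
  return true;
}

// Split "abi1(real1), abi2(real2), abi3" into entries; malformed or
// empty items are skipped.
std::vector<VariantEntry> parseVectorVariants(const std::string &Attr) {
  std::vector<VariantEntry> Result;
  std::size_t Pos = 0;
  while (Pos <= Attr.size()) {
    std::size_t Comma = Attr.find(',', Pos);
    if (Comma == std::string::npos)
      Comma = Attr.size();
    std::string Item = trim(Attr.substr(Pos, Comma - Pos));
    Pos = Comma + 1;
    if (Item.empty())
      continue;
    VariantEntry E;
    std::size_t Open = Item.find('(');
    if (Open != std::string::npos && Item.back() == ')') {
      E.AbiName = trim(Item.substr(0, Open));
      E.RealName = trim(Item.substr(Open + 1, Item.size() - Open - 2));
    } else {
      E.AbiName = Item;
      E.RealName = Item; // the real name is optional
    }
    if (decodePrefix(E))
      Result.push_back(E);
  }
  return Result;
}

// Return the first entry matching the requested shape, or nullptr.
const VariantEntry *findVariant(const std::vector<VariantEntry> &Variants,
                                unsigned VF, bool IsMasked) {
  for (const VariantEntry &E : Variants)
    if (E.VF == VF && E.IsMasked == IsMasked)
      return &E;
  return nullptr;
}
```

With the SVML string used earlier in this document, a request for an unmasked 4-lane variant would select the entry whose real name is ``__svml_sin4``. A null result is the point at which an implementation could fall back to synthesizing a wrapper, as described above.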
+
+### Scalable vectorization
+
+Both methods of the SVFS API will be extended with a boolean parameter
+to specify whether scalable signatures are needed by the user of the
+SVFS.
+
+## Changes in the LoopVectorizer {#LV}
+
+The LoopVectorizer and the related analysis passes will have to
+replace the TLI versions of ``isFunctionVectorizable`` and
+``getVectorFunction`` with the SVFS ones.
+
+## Changes in clang: shipping ``math.h`` with the compiler {#mathdoth}
+
+We use clang to generate the meta-data described above. The functions
+available in library ``X`` are listed in a custom ``math.h`` file
+that is shipped with the compiler in ``/lib/Headers/math.h``.
+The header file is implemented by including "once" the system
+``math.h`` file, followed by ``#ifdef`` guarded re-declarations of the
+functions enriched with ``#pragma omp declare simd`` directives.
+
+```
+#include_once
+
+// ... cpp extern "C" guards omitted
+
+#ifdef CONDITION_FOR_LIBRARY_X
+#pragma omp declare simd simdlen(4) notinbranch
+extern double sin(double);
+#endif
+```
+
+This generates the Vector Function ABI mangled name to be used in
+the ``vector-variant`` attribute, for example ``_ZGVcN4v_sin``, when
+targeting AVX code generation.
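The relation between the ``declare simd`` clauses and the mangled name can be sketched as follows. This is a simplified illustration, not clang's implementation: ``mangleVectorName`` and ``ParamKind`` are hypothetical names, the ISA letter is passed in by the caller (the x86 Vector Function ABI uses ``b`` for SSE, ``c`` for AVX, ``d`` for AVX2 and ``e`` for AVX512), and only vector and uniform parameters are handled.

```cpp
#include <string>
#include <vector>

// Hypothetical parameter descriptor: a plain vector parameter or a
// parameter marked uniform in the declare simd directive.
enum class ParamKind { Vector, Uniform };

// Assemble "_ZGV<isa><mask><vlen><params>_<scalar name>".
std::string mangleVectorName(char Isa, bool IsMasked, unsigned VF,
                             const std::vector<ParamKind> &Params,
                             const std::string &ScalarName) {
  std::string Name = "_ZGV";
  Name += Isa;                  // target ISA letter
  Name += IsMasked ? 'M' : 'N'; // inbranch gives 'M', notinbranch gives 'N'
  Name += std::to_string(VF);   // simdlen
  for (ParamKind K : Params)
    Name += (K == ParamKind::Uniform) ? 'u' : 'v';
  Name += '_';
  Name += ScalarName;
  return Name;
}
```

Under these assumptions, an unmasked 2-lane AVX variant of ``sin`` with one vector parameter yields ``_ZGVcN2v_sin``, and an unmasked 4-lane AVX2 variant yields ``_ZGVdN4v_sin``, the two names used in the IR example earlier in this document.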
+
+The part of the vector-variant attribute that redirects the call to
+``__svml_sin4`` is also added via the header file ``math.h``, by using
+the OpenMP 5.0 directive ``declare variant``,[^2] guarded by SVML
+specific preprocessor macros:
+
+```
+#ifdef CONDITION_FOR_SVML
+#pragma omp declare simd simdlen(4) notinbranch
+extern double sin(double);
+
+#pragma omp declare variant(double sin(double)) \
+match(construct=simd(simdlen(4),notinbranch), device={isa(avx2)})
+__m256d __svml_sin4(__m256d x);
+#endif
+```
+
+## Changes in the clang driver {#driver}
+
+To enable the information provided via ``math.h``, the clang driver
+will translate the ``-fveclib=X`` option into ``-fopenmp-simd
+-DCONDITION_FOR_LIBRARY_X`` to turn on the correct section of the
+header file.
+
+# Extending auto-vectorization capabilities of LLVM
+
+When compared to the TLI-based auto-vectorization mechanism, the
+OpenMP-based mechanism has the advantage of enabling users to provide
+their own vector routines (not just the math ones) by adding
+``declare simd`` and ``declare variant`` definitions in their source.
+
+[^1]:
+
+[^2]:
+
+[^3]: Vector Function ABI for x86: . Vector Function ABI for AArch64: https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi