Index: RFC-fveclib.md
===================================================================
--- /dev/null
+++ RFC-fveclib.md
@@ -0,0 +1,428 @@
+---
+title: \[RFC\] Implementing -fveclib using OpenMP.
+toc: yes
+geometry: margin=1.5in
+colorlinks: yes
+fontfamily: mathpazo
+...
+
+# Introduction
+
+This RFC proposes replacing the current ``TargetLibraryInfo`` (TLI)
+based implementation of the ``-fveclib`` command line option with an
+OpenMP based one.
+
+With this change, `-fveclib` will maintain its current behavior in
+terms of user experience, but the new implementation will
+additionally:
+
+1. Decouple the compiler front-end that knows about the availability
+   of vectorized routines, from the back-end that knows how to make
+   use of them.
+2. Enable support for a developer's own vector libraries without
+   requiring changes to the compiler, via the new ``-fveclib-include``
+   command line option.
+3. Enable other frontends and languages to add scalar-to-vector
+   function mappings as relevant for their own runtime libraries, etc.
+
+The implementation of the proposal will consist of the following
+components:
+
+1. [Changes in LLVM IR](#llvmIR) to provide information about the
+   availability of vector math functions via metadata attached to an
+   ``llvm::CallInst``.
+2. [An infrastructure](#infrastructure) that can be queried to retrieve
+   information about the available vector functions associated with an
+   ``llvm::CallInst``.
+3. [Changes in the LoopVectorizer](#LV) to use the API to query the
+   metadata.
+4. [Changes in clang](#mathdoth) to add the metadata in the IR via two
+   mechanisms:
+
+   1. A custom ``math.h`` header file shipped with the compiler.
+   2. A user header file distributed with the library, to be used with
+      the command line option ``-fveclib-include``.
+
+5.
[Changes in the clang driver](#driver) to translate ``-fveclib`` into
+   the library-specific flags needed to select the list of available
+   vector functions specified in any of the header files.
+
+# Current status of `-fveclib`
+
+## User interface
+
+At the moment, a user can invoke `-fveclib` to generate vector calls
+from two libraries, SVML and Accelerate, as follows:
+
+```
+$> clang -fveclib=[SVML|Accelerate]
+```
+
+## Interface with the loop vectorizer
+
+The TLI exposes an interface that enables querying the list of
+available mappings by scalar name and number of lanes needed. The TLI
+interface is currently used by the InnerLoopVectorizer to plant vector
+calls in auto-vectorized loops.
+
+## Extending `-fveclib`
+
+Adding a new library requires listing its mappings in
+`/lib/Analysis/TargetLibraryInfo.cpp`, plus modifying the clang
+front-end to handle the new value for the option; see for example the
+two patches to add SLEEF () as a target library for
+AArch64: (LLVM code-base) and
+ (clang code-base).
+
+## Limitations of the current implementation
+
+The mapping between the scalar and vector versions of a function is
+defined by the backend, within the TLI specifically. For this reason
+the frontend's ``-fveclib`` option is tied to the backend's support
+for the, often language dependent, library. In particular, an IR file
+that is generated with a version of clang that knows about the
+availability of library ``X`` needs to be processed by a backend that
+also knows about the availability of library ``X``.
+
+# Proposed changes
+
+We propose an implementation of ``-fveclib`` that makes use of a
+_veclib specific_ pragma that is based on the OpenMP ``declare simd``
+and ``declare variant`` mechanism to inform the backend components
+about the availability of vector versions of scalar functions found in
+IR.
The mechanism relies on storing such information in IR metadata,
+and therefore makes the auto-vectorization of function calls a mid-end
+(``opt``) process that is independent of the front-end that generated
+such IR metadata.
+
+Moreover, this implementation enhances the extensibility and
+portability of ``-fveclib`` to other libraries and front-ends, and it
+provides a generic mechanism that the users of the LLVM compiler will
+be able to use for interfacing their own vector routines for generic
+code.
+
+The proposed implementation can also be used to expose
+vectorization-specific descriptors (for example, the ``linear`` and
+``uniform`` clauses of the OpenMP ``declare simd`` directive) that
+could be used to finely tune the automatic vectorization of some
+functions. Think, for example, of the vectorization of ``double
+sincos(double, double *, double *)``, where ``linear`` can be used to
+give extra information about the memory layout of the two pointer
+parameters in the vector version.
+
+The newly proposed ``#pragma`` directives are:
+
+1. ``#pragma veclib declare simd``.
+2. ``#pragma veclib declare variant``.
+
+Both directives follow the syntax of the ``declare simd`` and the
+``declare variant`` directives of OpenMP, with the exception that
+``declare variant`` is used only for the ``simd`` context.
+
+We define new ``veclib``-only directives instead of using the `omp`
+ones of OpenMP for the following reasons:
+
+1. Allow the compiler to perform auto-vectorization outside of an
+   OpenMP SIMD context.
+2. Allow library vendors to provide a standard mechanism, based on
+   OpenMP, to inform the compiler about the availability of vector
+   functions that can be used for auto-vectorization.
+
+A new compiler option, ``-fparse-veclib``, is added to clang to enable
+parsing of the ``veclib`` directives outside an OpenMP context.
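The ``linear`` clause mentioned above for ``sincos`` can be sketched with the
standard OpenMP spelling, since the ``veclib`` spelling is only proposed here.
The routine ``my_sincos`` and the helper ``sincos_array`` are hypothetical
names chosen for illustration, and the clauses are one plausible choice, not
the ones any particular library uses. Without an OpenMP flag the directive is
ignored and the code stays scalar.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical user routine (illustrative name and clauses). The directive
// advertises an unmasked vector variant whose two output pointers advance
// by one element per lane, which is what linear(... : 1) expresses.
#pragma omp declare simd notinbranch linear(s : 1) linear(c : 1)
void my_sincos(double x, double *s, double *c) {
  *s = std::sin(x);
  *c = std::cos(x);
}

// A loop a vectorizer could map onto such a variant: consecutive iterations
// write consecutive elements of `s` and `c`, matching the linear step of 1.
void sincos_array(const double *x, double *s, double *c, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    my_sincos(x[i], &s[i], &c[i]);
}
```

Because the pointers are declared linear, a vector variant can use contiguous
stores for the results instead of scattering through a vector of pointers.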
+
+## OpenMP compatibility
+
+Note that the ``veclib`` pragma can be converted to the standard
+OpenMP one by the following pre-processor test.
+
+```
+#ifdef _OPENMP
+#define veclib omp
+#endif
+```
+
+Notice also that the ``veclib simd`` and ``veclib variant`` directives
+can be parsed with the same infrastructure used for their OpenMP
+counterparts.
+
+In the rest of this RFC, we describe how the compiler behaves when
+parsing a ``veclib`` pragma. The same behavior is obtained when
+parsing the OpenMP based one when the compiler is invoked with the
+command line options that enable OpenMP (``-fopenmp[-simd]``).
+
+## Changes in LLVM IR {#llvmIR}
+
+The IR is enriched with metadata that details the availability of
+vector versions of an associated scalar function. This metadata is
+attached to the call site of the scalar function.
+
+The metadata takes the form of an attribute containing a comma
+separated list of vector function mappings. Each entry has a
+unique name that follows the Vector Function ABI[^3] and a real name
+that is used when generating calls to this vector function.
+
+```
+vfunc_name1(real_name1), vfunc_name2(real_name2)
+```
+
+The Vector Function ABI name describes the signature of the vector
+function so that properties like the vectorization factor can be
+queried during compilation.
+
+The real name is optional and assumed to match the Vector Function ABI
+name when omitted.
+
+For example, the availability of a 2-lane double precision ``sin``
+function via SVML when targeting AVX on x86 is provided by the
+following IR.
+
+```
+// ...
+... = call double @sin(double) #0
+// ...
+
+#0 = { vector-variant = {"_ZGVcN2v_sin(__svml_sin2),
+                          _ZGVdN4v_sin(__svml_sin4),
+                          ..."} }
+```
+
+The string ``"_ZGVcN2v_sin(__svml_sin2)"`` in this vector-variant
+attribute provides information on the shape of the vector function via
+the string ``_ZGVcN2v_sin``, mangled according to the Vector Function
+ABI for Intel, and remaps the standard Vector Function ABI name to the
+non-standard name ``__svml_sin2``.
+
+This metadata is compatible with the proposal "Proposal for function
+vectorization and loop vectorization with function calls",[^1] that
+uses Vector Function ABI mangled names to inform the vectorizer about
+the availability of vector functions. This proposal extends the
+original by allowing the explicit mapping of the Vector Function ABI
+mangled name to a non-standard name, which allows the use of existing
+vector libraries.
+
+The ``vector-variant`` attribute needs to be attached on a per-call
+basis to avoid conflicts when merging modules with different vector
+variants.
+
+## The query infrastructure: SVFS {#infrastructure}
+
+The Search Vector Function System (SVFS) is constructed from an
+``llvm::Module`` instance so it can create function definitions. The
+SVFS exposes an API with two methods.
+
+### ``SVFS::isFunctionVectorizable``
+
+This method queries the availability of a vectorized version of a
+function. The signature of the method is as follows.
+
+```
+bool isFunctionVectorizable(llvm::CallInst * Call, ParTypeSet Params);
+```
+
+The method determines the availability of a vector version of the
+function invoked by the ``Call`` parameter by looking at the
+``vector-variant`` metadata.
+
+The ``Params`` argument is a set mapping the position of a parameter
+in the CallInst to its ``ParameterType`` descriptor. The
+``ParameterType`` descriptor holds information about the shape of the
+corresponding parameter in the signature of the vector function.
This
+``ParameterType`` is used to query the SVFS about the availability
+of vector versions that have ``linear`` or ``uniform`` parameters (in
+the sense of OpenMP 4.0 and onwards).
+
+The method we propose, when invoked with an empty ``ParTypeSet``, is
+equivalent to the ``TargetLibraryInfo`` method
+``isFunctionVectorizable(StringRef Name)``.
+
+### ``SVFS::getVectorizedFunction``
+
+This method returns the vector function declaration that corresponds
+to the needs of the vectorization technique that is being run.
+
+The signature of the function is as follows.
+
+```
+std::pair getVectorizedFunction(
+    llvm::CallInst * Call, unsigned VF, bool IsMasked, ParTypeSet Params);
+```
+
+The ``Call`` parameter is the call instance that is being vectorized,
+the ``VF`` parameter represents the vectorization factor (how many
+lanes), the ``IsMasked`` parameter decides whether or not the
+signature of the vector function is required to have a mask parameter,
+and the ``Params`` parameter describes the shape of the vector function
+as in the ``isFunctionVectorizable`` method.
+
+The method uses the ``vector-variant`` metadata and returns the
+function signature and the name of the function based on the input
+parameters.
+
+The SVFS can add new function definitions, in the same module as the
+``Call``, to provide vector functions that are not present within the
+vector-variant metadata. For example, if a library provides a vector
+version of a function with a vectorization factor of 2, but the
+vectorizer is requesting a vectorization factor of 4, the SVFS is
+allowed to create a definition that calls the 2-lane version
+(provided by the library) twice. This capability applies similarly to
+providing masked and unmasked versions when the request doesn't match
+what is available in the library.
+
+This method is equivalent to the TLI method ``StringRef
+getVectorizedFunction(StringRef F, unsigned VF) const;``.
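As a rough illustration of the lookup that both SVFS methods rely on, the
sketch below parses a ``vector-variant`` string and picks the entry matching a
requested vectorization factor and mask setting. It is standalone C++ and does
not use LLVM types; ``VariantInfo``, ``parseVariant``, ``parseVariantList`` and
``selectVariant`` are hypothetical names, not part of the proposed API. It
assumes fixed-width entries of the form
``_ZGV<isa><mask><lanes><params>_<scalar>[(<real name>)]`` and ignores the ISA
and the parameter shapes when selecting.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// One parsed entry of a "vector-variant" attribute (hypothetical type).
struct VariantInfo {
  char ISA = 0;            // ISA letter of the Vector Function ABI, e.g. 'c'
  bool IsMasked = false;   // 'M' = masked, 'N' = unmasked
  unsigned VF = 0;         // number of lanes
  std::string Params;      // parameter tokens, e.g. "v"
  std::string ScalarName;  // e.g. "sin"
  std::string VectorName;  // symbol to call, e.g. "__svml_sin2"
};

// Parses one entry; returns false on malformed input.
bool parseVariant(std::string Entry, VariantInfo &Out) {
  const char *WS = " \t\n";
  size_t B = Entry.find_first_not_of(WS);
  if (B == std::string::npos)
    return false;
  Entry = Entry.substr(B, Entry.find_last_not_of(WS) - B + 1);

  // Split off the optional "(real_name)" suffix.
  std::string Mangled = Entry, Real;
  size_t Open = Entry.find('(');
  if (Open != std::string::npos) {
    if (Entry.back() != ')')
      return false;
    Mangled = Entry.substr(0, Open);
    Real = Entry.substr(Open + 1, Entry.size() - Open - 2);
  }
  if (Mangled.size() < 6 || Mangled.compare(0, 4, "_ZGV") != 0)
    return false;
  char Mask = Mangled[5];
  if (Mask != 'N' && Mask != 'M')
    return false;

  // Read the lane count, then the parameter tokens up to the next '_'.
  size_t Pos = 6;
  unsigned VF = 0;
  while (Pos < Mangled.size() &&
         std::isdigit(static_cast<unsigned char>(Mangled[Pos])))
    VF = VF * 10 + (Mangled[Pos++] - '0');
  size_t Sep = Mangled.find('_', Pos);
  if (VF == 0 || Sep == std::string::npos || Sep == Pos ||
      Sep + 1 == Mangled.size())
    return false;

  Out.ISA = Mangled[4];
  Out.IsMasked = (Mask == 'M');
  Out.VF = VF;
  Out.Params = Mangled.substr(Pos, Sep - Pos);
  Out.ScalarName = Mangled.substr(Sep + 1);
  // The real name defaults to the ABI name when omitted.
  Out.VectorName = Real.empty() ? Mangled : Real;
  return true;
}

// Splits the comma separated attribute and keeps the well-formed entries.
std::vector<VariantInfo> parseVariantList(const std::string &Attr) {
  std::vector<VariantInfo> Result;
  std::istringstream SS(Attr);
  std::string Item;
  while (std::getline(SS, Item, ',')) {
    VariantInfo Info;
    if (parseVariant(Item, Info))
      Result.push_back(Info);
  }
  return Result;
}

// Returns the symbol for the first entry matching VF and mask, or "".
std::string selectVariant(const std::vector<VariantInfo> &Variants,
                          unsigned VF, bool IsMasked) {
  for (const VariantInfo &V : Variants)
    if (V.VF == VF && V.IsMasked == IsMasked)
      return V.VectorName;
  return std::string();
}
```

An empty result from ``selectVariant`` corresponds to the case where the SVFS
would either report the call as not vectorizable or synthesize a wrapper from
a narrower variant.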
+
+Notice that to fully support OpenMP vectorization we need to think
+about a fuzzy matching mechanism that is able to select a candidate in
+the calling context. However, this is not needed for `-fveclib`
+because the scalar-to-vector mappings of ``-fveclib`` are such that
+for every scalar function there is only one possible associated vector
+function. Therefore, extending this behavior to a generic one is an
+aspect of the implementation that will be treated in a separate RFC
+about the vectorization pass.
+
+### Scalable vectorization
+
+Both methods of the SVFS API will be extended with a boolean parameter
+to specify whether scalable signatures are needed by the user of the
+SVFS.
+
+## Changes in the LoopVectorizer {#LV}
+
+The LoopVectorizer and the related analysis passes will have to
+replace the TLI versions of ``isFunctionVectorizable`` and
+``getVectorizedFunction`` with the SVFS ones.
+
+## Changes in clang: shipping ``math.h`` with the compiler {#mathdoth}
+
+We use clang to generate the metadata described above. The functions
+available in library ``X`` are listed in a custom ``math.h`` file
+that is shipped with the compiler in ``/lib/Headers/math.h``.
+The header file is implemented by including "once" the system
+``math.h`` file, followed by ``#ifdef`` guarded re-declarations of the
+functions enriched with ``#pragma veclib declare simd`` directives.
+
+```
+#include_once
+
+// ... cpp extern "C" guards omitted
+
+#ifdef _CLANG_USE_LIBRARY_X
+#pragma veclib declare simd simdlen(4) notinbranch
+extern double sin(double);
+#endif
+```
+
+This generates the Vector Function ABI mangled name to be used in
+the ``vector-variant`` attribute, for example ``_ZGVcN4v_sin``, when
+targeting AVX code generation.
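How a ``declare simd`` directive turns into a mangled name can be sketched as
plain string assembly. The helper ``mangleVectorName`` is a hypothetical name
used only for illustration, and it covers just the fixed-width x86 layout
``_ZGV<isa><mask><lanes><params>_<scalar>``, where the ISA letters are ``b``
(SSE), ``c`` (AVX), ``d`` (AVX2) and ``e`` (AVX-512), and ``N``/``M`` stand
for ``notinbranch``/``inbranch``.

```cpp
#include <cassert>
#include <string>

// Builds an x86 Vector Function ABI name from the pieces a `declare simd`
// directive provides. Hypothetical helper; `Params` holds one token per
// scalar parameter, e.g. "v" for a single vector parameter.
std::string mangleVectorName(char ISA, bool InBranch, unsigned SimdLen,
                             const std::string &Params,
                             const std::string &ScalarName) {
  return "_ZGV" + std::string(1, ISA) + (InBranch ? "M" : "N") +
         std::to_string(SimdLen) + Params + "_" + ScalarName;
}
```

For the declaration above (``simdlen(4) notinbranch`` on ``sin``), calling
``mangleVectorName('c', false, 4, "v", "sin")`` gives the AVX name and the
same call with ``'d'`` gives the AVX2 one.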
+
+The part of the vector-variant attribute that redirects the call to
+``__svml_sin4`` is also added via the header file ``math.h``, by using
+the OpenMP 5.0 directive ``declare variant``,[^2] guarded by SVML
+specific preprocessor macros:
+
+```
+#ifdef _CLANG_USE_SVML
+#pragma veclib declare simd simdlen(4) notinbranch
+extern double sin(double);
+
+#pragma veclib declare variant(double sin(double)) \
+match(construct=simd(simdlen(4),notinbranch), device={isa(avx2)})
+__m256d __svml_sin4(__m256d x);
+#endif
+```
+
+Note that the list of if-guarded function declarations does not need
+to live in the same ``math.h`` file, but can be included in ``math.h``
+from library-specific header files.
+
+## Changes in the clang driver {#driver}
+
+To enable the information provided via ``math.h``, the clang driver
+will translate the ``-fveclib=X`` option into ``-D_CLANG_USE_LIBRARY_X
+-lX`` to turn on the correct section of the header file and the flag
+for the linker.
+
+Note that the ``veclib`` directives are loaded even when *not*
+compiling for an OpenMP target.
+
+# Extending auto-vectorization capabilities of LLVM
+
+When compared to the TLI-based auto-vectorization mechanism, the
+OpenMP-based mechanism has the advantage of enabling users to provide
+their own vector routines (not just the math ones) by adding ``veclib
+declare simd`` and ``veclib declare variant`` definitions in their
+source.
+
+For this specific functionality, the following command line option is
+added to clang:
+
+```
+-fveclib-include=path/to/header/file.h
+```
+
+This option enables clang to recognize the ``veclib declare simd`` and
+``veclib declare variant`` directives listed in the header file of the
+library.
+
+# Summary
+
+## New ``veclib`` directives in clang
+
+1. ``#pragma veclib declare simd [clause, ]``, same as ``#pragma omp
+   declare simd`` from OpenMP 4.0+.
+2.
``#pragma veclib declare variant``, same as ``#pragma omp declare variant``
+   restricted to the ``simd`` context selector, from OpenMP 5.0+.
+
+## New ``math.h`` header file
+
+Shipped in ``/lib/Headers/math.h``, contains all the
+declarations of the functions available in the vector library ``X``,
+``ifdef`` guarded by the macro ``_CLANG_USE_LIBRARY_X``.
+
+## Option behavior, and interaction with OpenMP
+
+The behavior described below makes sure that ``-fveclib`` function
+vectorization and OpenMP function vectorization are orthogonal.
+
+No options
+
+:   No function vectorization via vector library, neither user
+    provided nor shipped via an internal `math.h`.
+
+``-fveclib=X``
+
+:   The driver transforms this into ``-fparse-veclib
+    -D_CLANG_USE_LIBRARY_X=1 -lX``. This is used only for users
+    that want to vectorize ``math.h`` functions.
+
+``-fveclib-include=path/to/user/provided/header/file.h``
+
+:   The driver transforms this into ``-fparse-veclib
+    -include=path/to/user/provided/header/file.h``. The user has to
+    provide the correct linker flag for both the scalar version and
+    the vector version of whatever function they have defined in the
+    header file. The header file must use the ``veclib`` directives to
+    inform the compiler about the available vector functions.
+
+``-fopenmp[-simd]``
+
+:   No vectorization happens other than for those functions that are
+    marked with OpenMP ``declare simd``. The header ``math.h`` is
+    loaded, but the ``veclib`` decorated declarations are invisible to
+    the compiler instance because they are hidden behind the
+    ``_CLANG_USE_LIBRARY_X`` macros, which are not defined.
+
+``-fopenmp[-simd] -fveclib=X`` or ``-fopenmp[-simd]
+-fveclib-include=path/to/user/provided/header/file.h``
+
+:   Same behavior as without the ``-fopenmp[-simd]`` option.
+
+
+[^1]:
+
+[^2]:
+
+[^3]: Vector Function ABI for x86: . Vector Function ABI for AArch64: https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi